research-article

Efficient and effective algorithms for clustering uncertain graphs

Authors:
Kai Han

University of Science and Technology of China

University of Science and Technology of China
View Profile

,
Fei Gui

National University of Singapore

National University of Singapore
View Profile

,
Xiaokui Xiao

National University of Singapore

National University of Singapore
View Profile

,
Jing Tang

National University of Singapore

National University of Singapore
View Profile

,
Yuntian He

University of Science and Technology of China

University of Science and Technology of China
View Profile

,
Zongmai Cao

University of Science and Technology of China

University of Science and Technology of China
View Profile

,
He Huang

Soochow University

Soochow University
View Profile

Proceedings of the VLDB Endowment Volume 12 Issue 6pp 667–680https://doi.org/10.14778/3311880.3311884

Published:01 February 2019Publication History

Proceedings of the VLDB Endowment

Abstract

We consider the edge uncertainty in an undirected graph and study the k-median (resp. k-center) problems, where the goal is to partition the graph nodes into k clusters such that the average (resp. minimum) connection probability between each node and its cluster's center is maximized. We analyze the hardness of these problems, and propose algorithms that provide considerably improved approximation guarantees than the existing studies do. Specifically, our algorithms offer (1 -- 1/e)-approximations for the k-median problem and (OPTck)-approximations for the k-center problem, where OPTck is the optimal objective function value for k-center. In addition, our algorithms incorporate several non-trivial optimizations that significantly enhance their practical efficiency. Extensive experimental results demonstrate that our algorithms considerably outperform the existing methods on both computation efficiency and the quality of clustering results.

References

https://www.dropbox.com/s/i12see0ochh26gw/CLUGFull.pdf.Google Scholar
https://github.com/Cecca/ugraph.Google Scholar
https://www.openmp.org/.Google Scholar
G. D. Bader and C. W. Hogue. An automated method for finding molecular complexes in large protein interaction networks. BMC bioinformatics, 4:2, 2003.Google ScholarCross Ref
M. O. Ball. Computational complexity of network reliability analysis: An overview. IEEE Transactions on Reliability, 35(3):230--239, 1986.Google ScholarCross Ref
P. Boldi, F. Bonchi, A. Gionis, and T. Tassa. Injecting uncertainty in graphs for identity obfuscation. PVLDB, 5(11):1376--1387, 2012. Google ScholarDigital Library
F. Bonchi, F. Gullo, A. Kaltenbrunner, and Y. Volkovich. Core decomposition of uncertain graphs. In KDD, pages 1316--1325, 2014. Google ScholarDigital Library
C. Borgs, M. Brautbar, J. T. Chayes, and B. Lucier. Maximizing social influence in nearly optimal time. In SODA, pages 946--957, 2014. Google ScholarDigital Library
S. Brohee and J. Van Helden. Evaluation of clustering algorithms for protein-protein interaction networks. BMC bioinformatics, 7:488, 2006.Google ScholarCross Ref
J. Byrka, T. Pensyl, B. Rybicki, A. Srinivasan, and K. Trinh. An improved approximation for k-median, and positive correlation in budgeted optimization. In SODA, pages 737--756, 2015. Google ScholarDigital Library
M. Ceccarello, C. Fantozzi, A. Pietracaprina, G. Pucci, and F. Vandin. Clustering uncertain graphs. PVLDB, 11(4):472--484, 2017. Google ScholarDigital Library
M. Charikar, S. Khuller, D. M. Mount, and G. Narasimhan. Algorithms for facility location problems with outliers. In SODA, pages 642--651, 2001. Google ScholarDigital Library
H. N. Chua, K. Ning, W.-K. Sung, H. W. Leong, and L. Wong. Using indirect protein-protein interactions for protein complex prediction. Journal of bioinformatics and computational biology, 6(3):435--466, 2008.Google Scholar
F. Chung and L. Lu. Concentration inequalities and martingale inequalities: a survey. Internet Mathematics, 3(1):79--127, 2006.Google ScholarCross Ref
K. Fall. A delay-tolerant network architecture for challenged internets. In SIGCOMM, pages 27--34, 2003. Google ScholarDigital Library
X. Gao, Z. Chen, F. Wu, and G. Chen. Energy efficient algorithms for k-sink minimum movement target coverage problem in mobile sensor network. IEEE/ACM Transactions on Networking, 25(6):3616--3627, Dec 2017. Google ScholarDigital Library
A.-C. Gavin, M. Bösche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J. M. Rick, A.-M. Michon, C.-M. Cruciat, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 415:141--147, 2002.Google ScholarCross Ref
T. F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293--306, 1985.Google ScholarCross Ref
A. Goyal, F. Bonchi, and L. V. Lakshmanan. Learning influence probabilities in social networks. In WSDM, pages 241--250, 2010. Google ScholarDigital Library
Y. Gu, C. Gao, G. Cong, and G. Yu. Effective and efficient clustering methods for correlated probabilistic graphs. IEEE Transactions on Knowledge and Data Engineering, 26(5):1117--1130, 2014. Google ScholarDigital Library
S. Guo, Y. Gu, B. Jiang, and T. He. Opportunistic flooding in low-duty-cycle wireless sensor networks with unreliable links. In MobiCom, pages 133--144, 2009. Google ScholarDigital Library
Z. Halim and J. H. Khattak. Density-based clustering of big probabilistic graphs. Evolving Systems, pages 1--18, Mar 2018.Google ScholarCross Ref
B. Han, J. Li, and A. Srinivasan. Your friends have more friends than you do: Identifying influential mobile users through random-walk sampling. IEEE/ACM Transactions on Networking, 22(5):1389--1400, 2014. Google ScholarDigital Library
A. L. Hu and K. C. Chan. Utilizing both topological and attribute information for protein complex identification in ppi networks. IEEE/ACM transactions on computational biology and bioinformatics, 10(3):780--792, 2013. Google ScholarDigital Library
L. Hu, X. Yuan, X. Liu, S. Xiong, and X. Luo. Efficiently detecting protein complexes from protein interaction networks via alternating direction method of multipliers. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2018.Google ScholarCross Ref
X. Huang, W. Lu, and L. V. Lakshmanan. Truss decomposition of probabilistic graphs: Semantics and algorithms. In SIGMOD, pages 77--90, 2016. Google ScholarDigital Library
K. Jain, M. Mahdian, and A. Saberi. A new greedy approach for facility location problems. In STOC, pages 731--740, 2002. Google ScholarDigital Library
S. Jain, K. Fall, and R. Patra. Routing in a delay tolerant network. In SIGCOMM, pages 145--158, 2004. Google ScholarDigital Library
R. Jin, L. Liu, and C. C. Aggarwal. Discovering highly reliable subgraphs in uncertain graphs. In KDD, pages 992--1000, 2011. Google ScholarDigital Library
R. Jin, L. Liu, B. Ding, and H. Wang. Distance-constraint reachability computation in uncertain graphs. PVLDB, 4(9):551--562, 2011. Google ScholarDigital Library
P. Juang, H. Oki, Y. Wang, M. Martonosi, L. Peh, and D. Rubenstein. Energy-efficient computing for wildlife tracking: design tradeoffs and early experiences with zebranet. In ASPLOS, pages 96--107, 2002. Google ScholarDigital Library
A. Khan, F. Bonchi, A. Gionis, and F. Gullo. Fast reliability search in uncertain graphs. In EDBT, pages 535--546, 2014.Google Scholar
A. Khan and L. Chen. On uncertain graphs modeling and queries. PVLDB, 8(12):2042--2043, 2015. Google ScholarDigital Library
A. Khan, F. Gullo, T. Wohler, and F. Bonchi. Top-k reliable edge colors in uncertain graphs. In CIKM, pages 1851--1854, 2015. Google ScholarDigital Library
D. Kim, W. Wang, N. Sohaee, C. Ma, W. Wu, W. Lee, and D. Du. Minimum data-latency-boundk-sink placement problem in wireless sensor networks. IEEE/ACM Transactions on Networking, 19(5):1344--1353, Oct 2011. Google ScholarDigital Library
G. Kollios, M. Potamias, and E. Terzi. Clustering large probabilistic graphs. IEEE Transactions on Knowledge and Data Engineering, 25(2):325--336, 2013. Google ScholarDigital Library
A. Krause and D. Golovin. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems, pages 71--104. Cambridge University Press, 2014.Google Scholar
J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective outbreak detection in networks. In KDD, pages 420--429, 2007. Google ScholarDigital Library
X. Li, M. Wu, C.-K. Kwoh, and S.-K. Ng. Computational approaches for detecting protein complexes from protein interaction networks: a survey. BMC genomics, 11(1):S3, 2010.Google ScholarCross Ref
L. Liu, R. Jin, C. Aggarwal, and Y. Shen. Reliable clustering on uncertain graphs. In ICDM, pages 459--468, 2012. Google ScholarDigital Library
W. Lu, W. Chen, and L. V. Lakshmanan. From competition to complementarity: comparative influence diffusion and maximization. PVLDB, 9(2):60--71, 2015. Google ScholarDigital Library
Z. Lu, X. Sun, and T. L. Porta. Cooperative data offloading in opportunistic mobile networks. In INFOCOM, pages 1--9, 2016.Google ScholarCross Ref
Z. Lu, Y. Wen, and G. Cao. Information diffusion in mobile social networks: The speed perspective. In INFOCOM, pages 1932--1940, 2014.Google ScholarCross Ref
Z. Lu, Y. Wen, W. Zhang, Q. Zheng, and G. Cao. Towards information diffusion in mobile social networks. IEEE Transactions on Mobile Computing, 15(5):1292--1304, 2016. Google ScholarDigital Library
B. Mirzasoleiman, A. Badanidiyuru, A. Karbasi, J. Vondrák, and A. Krause. Lazier than lazy greedy. In AAAI, pages 1812--1818, 2015. Google ScholarDigital Library
P. Parchas, F. Gullo, D. Papadias, and F. Bonchi. The pursuit of a good possible world: extracting representative instances of uncertain graphs. In SIGMOD, pages 967--978, 2014. Google ScholarDigital Library
M. Pellegrini, M. Baglioni, and F. Geraci. Protein complex prediction for large protein protein interaction networks with the core&peel method. BMC bioinformatics, 17(12):372, 2016.Google ScholarCross Ref
L. Pelusi, A. Passarella, and M. Conti. Opportunistic networking: Data forwarding in disconnected mobile ad hoc networks. IEEE Communications, 44(11):134--141, 2006. Google ScholarDigital Library
M. Potamias, F. Bonchi, A. Gionis, and G. Kollios. K-nearest neighbors in uncertain graphs. PVLDB, 3(1):997--1008, 2010. Google ScholarDigital Library
S. Pu, J. Wong, B. Turner, E. Cho, and S. J. Wodak. Up-to-date catalogues of yeast protein complexes. Nucleic acids research, 37(3):825--831, 2008.Google Scholar
M. Purohit, B. A. Prakash, C. Kang, Y. Zhang, and V. Subrahmanian. Fast influence-based coarsening for large networks. In KDD, pages 1296--1305, 2014. Google ScholarDigital Library
S. Rajasegarar, C. Leckie, and M. Palaniswami. Anomaly detection in wireless sensor networks. IEEE Wireless Communications, 15(4):34--40, Aug 2008. Google ScholarDigital Library
R. Raz and S. Safra. A sub-constant error-probability low-degree test, and a sub-constant error-probability pcp characterization of np. In STOC, pages 475--484, 1997. Google ScholarDigital Library
S. Srihari and H. W. Leong. A survey of computational methods for protein complex prediction from protein interaction networks. Journal of Bioinformatics and Computational Biology, 11(2), 2013.Google ScholarCross Ref
J. Tang, X. Tang, X. Xiao, and J. Yuan. Online processing algorithms for influence maximization. In SIGMOD, pages 991--1005, 2018. Google ScholarDigital Library
J. Tang, X. Tang, and J. Yuan. Influence maximization meets efficiency and effectiveness: A hop-based approach. In ASONAM, pages 64--71, 2017. Google ScholarDigital Library
J. Tang, X. Tang, and J. Yuan. An efficient and effective hop-based approach for inluence maximization in social networks. Social Network Analysis and Mining, 8:10, 2018.Google ScholarCross Ref
Y. Tang, Y. Shi, and X. Xiao. Influence maximization in near-linear time: A martingale approach. In SIGMOD, pages 1539--1554, 2015. Google ScholarDigital Library
V. V. Vazirani. Approximation Algorithms. Springer-Verlag, Berlin, 2001. Google ScholarDigital Library
D. P. Williamson and D. B. Shmoys. The design of approximation algorithms. Cambridge university press, 2011. Google ScholarDigital Library
B. Xu, H. Lin, and Z. Yang. Ontology integration to identify protein complex in protein interaction networks. Proteome science, 9(1):S7, 2011.Google ScholarCross Ref
Y. Yuan, G. Wang, L. Chen, and H. Wang. Efficient subgraph similarity search on large probabilistic graph databases. PVLDB, 5(9):800--811, 2012. Google ScholarDigital Library
Y. Yuan, G. Wang, H. Wang, and L. Chen. Efficient subgraph search over large uncertain graphs. PVLDB, 4(11):876--886, 2011.Google ScholarDigital Library
F. Zhao and A. K. Tung. Large scale cohesive subgraphs discovery for social network visual analysis. PVLDB, 6(2):85--96, 2012. Google ScholarDigital Library
W. Zheng, L. Zou, X. Lian, J. X. Yu, S. Song, and D. Zhao. How to build templates for rdf question/answering: An uncertain graph similarity join approach. In SIGMOD, pages 1809--1824, 2015. Google ScholarDigital Library
Z. Zou, H. Gao, and J. Li. Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics. In KDD, pages 633--642, 2010. Google ScholarDigital Library

Recommendations

Effective and Efficient Clustering Methods for Correlated Probabilistic Graphs

Recently, probabilistic graphs have attracted significant interests of the data mining community. It is observed that correlations may exist among adjacent edges in various probabilistic graphs. As one of the basic mining techniques, graph clustering is ...
Read More
Efficient algorithms for Roman domination on some classes of graphs

A Roman dominating function of a graph G=(V,E) is a function f:V->{0,1,2} such that every vertex x with f(x)=0 is adjacent to at least one vertex y with f(y)=2. The weight of a Roman dominating function is defined to be f(V)=@?"x"@?"Vf(x), and the ...
Read More
Stable structural clustering in uncertain graphs
Abstract
The uncertain graph is widely used to model and analyze graph data in which the relation between objects is uncertain. We here study the structural clustering in uncertain graphs. As an important method in graph clustering, structural ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 12, Issue 6
February 2019
100 pages
ISSN:2150-8097
Editors:
Lei Chen
HKUST
,
Fatma Özcan
IBM Research - Almaden
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 February 2019
Published in pvldb Volume 12, Issue 6
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 14
  Total Citations
  View Citations
- 218
  Total Downloads
- Downloads (Last 12 months)23
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Efficient and effective algorithms for clustering uncertain graphs

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Effective and Efficient Clustering Methods for Correlated Probabilistic Graphs

Efficient algorithms for Roman domination on some classes of graphs

Stable structural clustering in uncertain graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Efficient and effective algorithms for clustering uncertain graphs

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Effective and Efficient Clustering Methods for Correlated Probabilistic Graphs

Efficient algorithms for Roman domination on some classes of graphs

Stable structural clustering in uncertain graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media