Abstract
We consider the edge uncertainty in an undirected graph and study the k-median (resp. k-center) problems, where the goal is to partition the graph nodes into k clusters such that the average (resp. minimum) connection probability between each node and its cluster's center is maximized. We analyze the hardness of these problems, and propose algorithms that provide considerably improved approximation guarantees than the existing studies do. Specifically, our algorithms offer (1 -- 1/e)-approximations for the k-median problem and (OPTck)-approximations for the k-center problem, where OPTck is the optimal objective function value for k-center. In addition, our algorithms incorporate several non-trivial optimizations that significantly enhance their practical efficiency. Extensive experimental results demonstrate that our algorithms considerably outperform the existing methods on both computation efficiency and the quality of clustering results.
- https://www.dropbox.com/s/i12see0ochh26gw/CLUGFull.pdf.Google Scholar
- https://github.com/Cecca/ugraph.Google Scholar
- https://www.openmp.org/.Google Scholar
- G. D. Bader and C. W. Hogue. An automated method for finding molecular complexes in large protein interaction networks. BMC bioinformatics, 4:2, 2003.Google ScholarCross Ref
- M. O. Ball. Computational complexity of network reliability analysis: An overview. IEEE Transactions on Reliability, 35(3):230--239, 1986.Google ScholarCross Ref
- P. Boldi, F. Bonchi, A. Gionis, and T. Tassa. Injecting uncertainty in graphs for identity obfuscation. PVLDB, 5(11):1376--1387, 2012. Google ScholarDigital Library
- F. Bonchi, F. Gullo, A. Kaltenbrunner, and Y. Volkovich. Core decomposition of uncertain graphs. In KDD, pages 1316--1325, 2014. Google ScholarDigital Library
- C. Borgs, M. Brautbar, J. T. Chayes, and B. Lucier. Maximizing social influence in nearly optimal time. In SODA, pages 946--957, 2014. Google ScholarDigital Library
- S. Brohee and J. Van Helden. Evaluation of clustering algorithms for protein-protein interaction networks. BMC bioinformatics, 7:488, 2006.Google ScholarCross Ref
- J. Byrka, T. Pensyl, B. Rybicki, A. Srinivasan, and K. Trinh. An improved approximation for k-median, and positive correlation in budgeted optimization. In SODA, pages 737--756, 2015. Google ScholarDigital Library
- M. Ceccarello, C. Fantozzi, A. Pietracaprina, G. Pucci, and F. Vandin. Clustering uncertain graphs. PVLDB, 11(4):472--484, 2017. Google ScholarDigital Library
- M. Charikar, S. Khuller, D. M. Mount, and G. Narasimhan. Algorithms for facility location problems with outliers. In SODA, pages 642--651, 2001. Google ScholarDigital Library
- H. N. Chua, K. Ning, W.-K. Sung, H. W. Leong, and L. Wong. Using indirect protein-protein interactions for protein complex prediction. Journal of bioinformatics and computational biology, 6(3):435--466, 2008.Google Scholar
- F. Chung and L. Lu. Concentration inequalities and martingale inequalities: a survey. Internet Mathematics, 3(1):79--127, 2006.Google ScholarCross Ref
- K. Fall. A delay-tolerant network architecture for challenged internets. In SIGCOMM, pages 27--34, 2003. Google ScholarDigital Library
- X. Gao, Z. Chen, F. Wu, and G. Chen. Energy efficient algorithms for k-sink minimum movement target coverage problem in mobile sensor network. IEEE/ACM Transactions on Networking, 25(6):3616--3627, Dec 2017. Google ScholarDigital Library
- A.-C. Gavin, M. Bösche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J. M. Rick, A.-M. Michon, C.-M. Cruciat, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 415:141--147, 2002.Google ScholarCross Ref
- T. F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293--306, 1985.Google ScholarCross Ref
- A. Goyal, F. Bonchi, and L. V. Lakshmanan. Learning influence probabilities in social networks. In WSDM, pages 241--250, 2010. Google ScholarDigital Library
- Y. Gu, C. Gao, G. Cong, and G. Yu. Effective and efficient clustering methods for correlated probabilistic graphs. IEEE Transactions on Knowledge and Data Engineering, 26(5):1117--1130, 2014. Google ScholarDigital Library
- S. Guo, Y. Gu, B. Jiang, and T. He. Opportunistic flooding in low-duty-cycle wireless sensor networks with unreliable links. In MobiCom, pages 133--144, 2009. Google ScholarDigital Library
- Z. Halim and J. H. Khattak. Density-based clustering of big probabilistic graphs. Evolving Systems, pages 1--18, Mar 2018.Google ScholarCross Ref
- B. Han, J. Li, and A. Srinivasan. Your friends have more friends than you do: Identifying influential mobile users through random-walk sampling. IEEE/ACM Transactions on Networking, 22(5):1389--1400, 2014. Google ScholarDigital Library
- A. L. Hu and K. C. Chan. Utilizing both topological and attribute information for protein complex identification in ppi networks. IEEE/ACM transactions on computational biology and bioinformatics, 10(3):780--792, 2013. Google ScholarDigital Library
- L. Hu, X. Yuan, X. Liu, S. Xiong, and X. Luo. Efficiently detecting protein complexes from protein interaction networks via alternating direction method of multipliers. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2018.Google ScholarCross Ref
- X. Huang, W. Lu, and L. V. Lakshmanan. Truss decomposition of probabilistic graphs: Semantics and algorithms. In SIGMOD, pages 77--90, 2016. Google ScholarDigital Library
- K. Jain, M. Mahdian, and A. Saberi. A new greedy approach for facility location problems. In STOC, pages 731--740, 2002. Google ScholarDigital Library
- S. Jain, K. Fall, and R. Patra. Routing in a delay tolerant network. In SIGCOMM, pages 145--158, 2004. Google ScholarDigital Library
- R. Jin, L. Liu, and C. C. Aggarwal. Discovering highly reliable subgraphs in uncertain graphs. In KDD, pages 992--1000, 2011. Google ScholarDigital Library
- R. Jin, L. Liu, B. Ding, and H. Wang. Distance-constraint reachability computation in uncertain graphs. PVLDB, 4(9):551--562, 2011. Google ScholarDigital Library
- P. Juang, H. Oki, Y. Wang, M. Martonosi, L. Peh, and D. Rubenstein. Energy-efficient computing for wildlife tracking: design tradeoffs and early experiences with zebranet. In ASPLOS, pages 96--107, 2002. Google ScholarDigital Library
- A. Khan, F. Bonchi, A. Gionis, and F. Gullo. Fast reliability search in uncertain graphs. In EDBT, pages 535--546, 2014.Google Scholar
- A. Khan and L. Chen. On uncertain graphs modeling and queries. PVLDB, 8(12):2042--2043, 2015. Google ScholarDigital Library
- A. Khan, F. Gullo, T. Wohler, and F. Bonchi. Top-k reliable edge colors in uncertain graphs. In CIKM, pages 1851--1854, 2015. Google ScholarDigital Library
- D. Kim, W. Wang, N. Sohaee, C. Ma, W. Wu, W. Lee, and D. Du. Minimum data-latency-boundk-sink placement problem in wireless sensor networks. IEEE/ACM Transactions on Networking, 19(5):1344--1353, Oct 2011. Google ScholarDigital Library
- G. Kollios, M. Potamias, and E. Terzi. Clustering large probabilistic graphs. IEEE Transactions on Knowledge and Data Engineering, 25(2):325--336, 2013. Google ScholarDigital Library
- A. Krause and D. Golovin. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems, pages 71--104. Cambridge University Press, 2014.Google Scholar
- J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective outbreak detection in networks. In KDD, pages 420--429, 2007. Google ScholarDigital Library
- X. Li, M. Wu, C.-K. Kwoh, and S.-K. Ng. Computational approaches for detecting protein complexes from protein interaction networks: a survey. BMC genomics, 11(1):S3, 2010.Google ScholarCross Ref
- L. Liu, R. Jin, C. Aggarwal, and Y. Shen. Reliable clustering on uncertain graphs. In ICDM, pages 459--468, 2012. Google ScholarDigital Library
- W. Lu, W. Chen, and L. V. Lakshmanan. From competition to complementarity: comparative influence diffusion and maximization. PVLDB, 9(2):60--71, 2015. Google ScholarDigital Library
- Z. Lu, X. Sun, and T. L. Porta. Cooperative data offloading in opportunistic mobile networks. In INFOCOM, pages 1--9, 2016.Google ScholarCross Ref
- Z. Lu, Y. Wen, and G. Cao. Information diffusion in mobile social networks: The speed perspective. In INFOCOM, pages 1932--1940, 2014.Google ScholarCross Ref
- Z. Lu, Y. Wen, W. Zhang, Q. Zheng, and G. Cao. Towards information diffusion in mobile social networks. IEEE Transactions on Mobile Computing, 15(5):1292--1304, 2016. Google ScholarDigital Library
- B. Mirzasoleiman, A. Badanidiyuru, A. Karbasi, J. Vondrák, and A. Krause. Lazier than lazy greedy. In AAAI, pages 1812--1818, 2015. Google ScholarDigital Library
- P. Parchas, F. Gullo, D. Papadias, and F. Bonchi. The pursuit of a good possible world: extracting representative instances of uncertain graphs. In SIGMOD, pages 967--978, 2014. Google ScholarDigital Library
- M. Pellegrini, M. Baglioni, and F. Geraci. Protein complex prediction for large protein protein interaction networks with the core&peel method. BMC bioinformatics, 17(12):372, 2016.Google ScholarCross Ref
- L. Pelusi, A. Passarella, and M. Conti. Opportunistic networking: Data forwarding in disconnected mobile ad hoc networks. IEEE Communications, 44(11):134--141, 2006. Google ScholarDigital Library
- M. Potamias, F. Bonchi, A. Gionis, and G. Kollios. K-nearest neighbors in uncertain graphs. PVLDB, 3(1):997--1008, 2010. Google ScholarDigital Library
- S. Pu, J. Wong, B. Turner, E. Cho, and S. J. Wodak. Up-to-date catalogues of yeast protein complexes. Nucleic acids research, 37(3):825--831, 2008.Google Scholar
- M. Purohit, B. A. Prakash, C. Kang, Y. Zhang, and V. Subrahmanian. Fast influence-based coarsening for large networks. In KDD, pages 1296--1305, 2014. Google ScholarDigital Library
- S. Rajasegarar, C. Leckie, and M. Palaniswami. Anomaly detection in wireless sensor networks. IEEE Wireless Communications, 15(4):34--40, Aug 2008. Google ScholarDigital Library
- R. Raz and S. Safra. A sub-constant error-probability low-degree test, and a sub-constant error-probability pcp characterization of np. In STOC, pages 475--484, 1997. Google ScholarDigital Library
- S. Srihari and H. W. Leong. A survey of computational methods for protein complex prediction from protein interaction networks. Journal of Bioinformatics and Computational Biology, 11(2), 2013.Google ScholarCross Ref
- J. Tang, X. Tang, X. Xiao, and J. Yuan. Online processing algorithms for influence maximization. In SIGMOD, pages 991--1005, 2018. Google ScholarDigital Library
- J. Tang, X. Tang, and J. Yuan. Influence maximization meets efficiency and effectiveness: A hop-based approach. In ASONAM, pages 64--71, 2017. Google ScholarDigital Library
- J. Tang, X. Tang, and J. Yuan. An efficient and effective hop-based approach for inluence maximization in social networks. Social Network Analysis and Mining, 8:10, 2018.Google ScholarCross Ref
- Y. Tang, Y. Shi, and X. Xiao. Influence maximization in near-linear time: A martingale approach. In SIGMOD, pages 1539--1554, 2015. Google ScholarDigital Library
- V. V. Vazirani. Approximation Algorithms. Springer-Verlag, Berlin, 2001. Google ScholarDigital Library
- D. P. Williamson and D. B. Shmoys. The design of approximation algorithms. Cambridge university press, 2011. Google ScholarDigital Library
- B. Xu, H. Lin, and Z. Yang. Ontology integration to identify protein complex in protein interaction networks. Proteome science, 9(1):S7, 2011.Google ScholarCross Ref
- Y. Yuan, G. Wang, L. Chen, and H. Wang. Efficient subgraph similarity search on large probabilistic graph databases. PVLDB, 5(9):800--811, 2012. Google ScholarDigital Library
- Y. Yuan, G. Wang, H. Wang, and L. Chen. Efficient subgraph search over large uncertain graphs. PVLDB, 4(11):876--886, 2011.Google ScholarDigital Library
- F. Zhao and A. K. Tung. Large scale cohesive subgraphs discovery for social network visual analysis. PVLDB, 6(2):85--96, 2012. Google ScholarDigital Library
- W. Zheng, L. Zou, X. Lian, J. X. Yu, S. Song, and D. Zhao. How to build templates for rdf question/answering: An uncertain graph similarity join approach. In SIGMOD, pages 1809--1824, 2015. Google ScholarDigital Library
- Z. Zou, H. Gao, and J. Li. Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics. In KDD, pages 633--642, 2010. Google ScholarDigital Library
Recommendations
Effective and Efficient Clustering Methods for Correlated Probabilistic Graphs
Recently, probabilistic graphs have attracted significant interests of the data mining community. It is observed that correlations may exist among adjacent edges in various probabilistic graphs. As one of the basic mining techniques, graph clustering is ...
Efficient algorithms for Roman domination on some classes of graphs
A Roman dominating function of a graph G=(V,E) is a function f:V->{0,1,2} such that every vertex x with f(x)=0 is adjacent to at least one vertex y with f(y)=2. The weight of a Roman dominating function is defined to be f(V)=@?"x"@?"Vf(x), and the ...
Stable structural clustering in uncertain graphs
AbstractThe uncertain graph is widely used to model and analyze graph data in which the relation between objects is uncertain. We here study the structural clustering in uncertain graphs. As an important method in graph clustering, structural ...
Comments