skip to main content
research-article

Efficient and effective algorithms for clustering uncertain graphs

Published:01 February 2019Publication History
Skip Abstract Section

Abstract

We consider the edge uncertainty in an undirected graph and study the k-median (resp. k-center) problems, where the goal is to partition the graph nodes into k clusters such that the average (resp. minimum) connection probability between each node and its cluster's center is maximized. We analyze the hardness of these problems, and propose algorithms that provide considerably improved approximation guarantees than the existing studies do. Specifically, our algorithms offer (1 -- 1/e)-approximations for the k-median problem and (OPTck)-approximations for the k-center problem, where OPTck is the optimal objective function value for k-center. In addition, our algorithms incorporate several non-trivial optimizations that significantly enhance their practical efficiency. Extensive experimental results demonstrate that our algorithms considerably outperform the existing methods on both computation efficiency and the quality of clustering results.

References

  1. https://www.dropbox.com/s/i12see0ochh26gw/CLUGFull.pdf.Google ScholarGoogle Scholar
  2. https://github.com/Cecca/ugraph.Google ScholarGoogle Scholar
  3. https://www.openmp.org/.Google ScholarGoogle Scholar
  4. G. D. Bader and C. W. Hogue. An automated method for finding molecular complexes in large protein interaction networks. BMC bioinformatics, 4:2, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  5. M. O. Ball. Computational complexity of network reliability analysis: An overview. IEEE Transactions on Reliability, 35(3):230--239, 1986.Google ScholarGoogle ScholarCross RefCross Ref
  6. P. Boldi, F. Bonchi, A. Gionis, and T. Tassa. Injecting uncertainty in graphs for identity obfuscation. PVLDB, 5(11):1376--1387, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. F. Bonchi, F. Gullo, A. Kaltenbrunner, and Y. Volkovich. Core decomposition of uncertain graphs. In KDD, pages 1316--1325, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. C. Borgs, M. Brautbar, J. T. Chayes, and B. Lucier. Maximizing social influence in nearly optimal time. In SODA, pages 946--957, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. S. Brohee and J. Van Helden. Evaluation of clustering algorithms for protein-protein interaction networks. BMC bioinformatics, 7:488, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  10. J. Byrka, T. Pensyl, B. Rybicki, A. Srinivasan, and K. Trinh. An improved approximation for k-median, and positive correlation in budgeted optimization. In SODA, pages 737--756, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. Ceccarello, C. Fantozzi, A. Pietracaprina, G. Pucci, and F. Vandin. Clustering uncertain graphs. PVLDB, 11(4):472--484, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Charikar, S. Khuller, D. M. Mount, and G. Narasimhan. Algorithms for facility location problems with outliers. In SODA, pages 642--651, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. H. N. Chua, K. Ning, W.-K. Sung, H. W. Leong, and L. Wong. Using indirect protein-protein interactions for protein complex prediction. Journal of bioinformatics and computational biology, 6(3):435--466, 2008.Google ScholarGoogle Scholar
  14. F. Chung and L. Lu. Concentration inequalities and martingale inequalities: a survey. Internet Mathematics, 3(1):79--127, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  15. K. Fall. A delay-tolerant network architecture for challenged internets. In SIGCOMM, pages 27--34, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. X. Gao, Z. Chen, F. Wu, and G. Chen. Energy efficient algorithms for k-sink minimum movement target coverage problem in mobile sensor network. IEEE/ACM Transactions on Networking, 25(6):3616--3627, Dec 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A.-C. Gavin, M. Bösche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J. M. Rick, A.-M. Michon, C.-M. Cruciat, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 415:141--147, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  18. T. F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293--306, 1985.Google ScholarGoogle ScholarCross RefCross Ref
  19. A. Goyal, F. Bonchi, and L. V. Lakshmanan. Learning influence probabilities in social networks. In WSDM, pages 241--250, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Y. Gu, C. Gao, G. Cong, and G. Yu. Effective and efficient clustering methods for correlated probabilistic graphs. IEEE Transactions on Knowledge and Data Engineering, 26(5):1117--1130, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S. Guo, Y. Gu, B. Jiang, and T. He. Opportunistic flooding in low-duty-cycle wireless sensor networks with unreliable links. In MobiCom, pages 133--144, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Z. Halim and J. H. Khattak. Density-based clustering of big probabilistic graphs. Evolving Systems, pages 1--18, Mar 2018.Google ScholarGoogle ScholarCross RefCross Ref
  23. B. Han, J. Li, and A. Srinivasan. Your friends have more friends than you do: Identifying influential mobile users through random-walk sampling. IEEE/ACM Transactions on Networking, 22(5):1389--1400, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. A. L. Hu and K. C. Chan. Utilizing both topological and attribute information for protein complex identification in ppi networks. IEEE/ACM transactions on computational biology and bioinformatics, 10(3):780--792, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. L. Hu, X. Yuan, X. Liu, S. Xiong, and X. Luo. Efficiently detecting protein complexes from protein interaction networks via alternating direction method of multipliers. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  26. X. Huang, W. Lu, and L. V. Lakshmanan. Truss decomposition of probabilistic graphs: Semantics and algorithms. In SIGMOD, pages 77--90, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. K. Jain, M. Mahdian, and A. Saberi. A new greedy approach for facility location problems. In STOC, pages 731--740, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. S. Jain, K. Fall, and R. Patra. Routing in a delay tolerant network. In SIGCOMM, pages 145--158, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. R. Jin, L. Liu, and C. C. Aggarwal. Discovering highly reliable subgraphs in uncertain graphs. In KDD, pages 992--1000, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. R. Jin, L. Liu, B. Ding, and H. Wang. Distance-constraint reachability computation in uncertain graphs. PVLDB, 4(9):551--562, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. P. Juang, H. Oki, Y. Wang, M. Martonosi, L. Peh, and D. Rubenstein. Energy-efficient computing for wildlife tracking: design tradeoffs and early experiences with zebranet. In ASPLOS, pages 96--107, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. A. Khan, F. Bonchi, A. Gionis, and F. Gullo. Fast reliability search in uncertain graphs. In EDBT, pages 535--546, 2014.Google ScholarGoogle Scholar
  33. A. Khan and L. Chen. On uncertain graphs modeling and queries. PVLDB, 8(12):2042--2043, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. A. Khan, F. Gullo, T. Wohler, and F. Bonchi. Top-k reliable edge colors in uncertain graphs. In CIKM, pages 1851--1854, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. D. Kim, W. Wang, N. Sohaee, C. Ma, W. Wu, W. Lee, and D. Du. Minimum data-latency-boundk-sink placement problem in wireless sensor networks. IEEE/ACM Transactions on Networking, 19(5):1344--1353, Oct 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. G. Kollios, M. Potamias, and E. Terzi. Clustering large probabilistic graphs. IEEE Transactions on Knowledge and Data Engineering, 25(2):325--336, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. A. Krause and D. Golovin. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems, pages 71--104. Cambridge University Press, 2014.Google ScholarGoogle Scholar
  38. J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective outbreak detection in networks. In KDD, pages 420--429, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. X. Li, M. Wu, C.-K. Kwoh, and S.-K. Ng. Computational approaches for detecting protein complexes from protein interaction networks: a survey. BMC genomics, 11(1):S3, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  40. L. Liu, R. Jin, C. Aggarwal, and Y. Shen. Reliable clustering on uncertain graphs. In ICDM, pages 459--468, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. W. Lu, W. Chen, and L. V. Lakshmanan. From competition to complementarity: comparative influence diffusion and maximization. PVLDB, 9(2):60--71, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Z. Lu, X. Sun, and T. L. Porta. Cooperative data offloading in opportunistic mobile networks. In INFOCOM, pages 1--9, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  43. Z. Lu, Y. Wen, and G. Cao. Information diffusion in mobile social networks: The speed perspective. In INFOCOM, pages 1932--1940, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  44. Z. Lu, Y. Wen, W. Zhang, Q. Zheng, and G. Cao. Towards information diffusion in mobile social networks. IEEE Transactions on Mobile Computing, 15(5):1292--1304, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. B. Mirzasoleiman, A. Badanidiyuru, A. Karbasi, J. Vondrák, and A. Krause. Lazier than lazy greedy. In AAAI, pages 1812--1818, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. P. Parchas, F. Gullo, D. Papadias, and F. Bonchi. The pursuit of a good possible world: extracting representative instances of uncertain graphs. In SIGMOD, pages 967--978, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. M. Pellegrini, M. Baglioni, and F. Geraci. Protein complex prediction for large protein protein interaction networks with the core&peel method. BMC bioinformatics, 17(12):372, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  48. L. Pelusi, A. Passarella, and M. Conti. Opportunistic networking: Data forwarding in disconnected mobile ad hoc networks. IEEE Communications, 44(11):134--141, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. M. Potamias, F. Bonchi, A. Gionis, and G. Kollios. K-nearest neighbors in uncertain graphs. PVLDB, 3(1):997--1008, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. S. Pu, J. Wong, B. Turner, E. Cho, and S. J. Wodak. Up-to-date catalogues of yeast protein complexes. Nucleic acids research, 37(3):825--831, 2008.Google ScholarGoogle Scholar
  51. M. Purohit, B. A. Prakash, C. Kang, Y. Zhang, and V. Subrahmanian. Fast influence-based coarsening for large networks. In KDD, pages 1296--1305, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. S. Rajasegarar, C. Leckie, and M. Palaniswami. Anomaly detection in wireless sensor networks. IEEE Wireless Communications, 15(4):34--40, Aug 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. R. Raz and S. Safra. A sub-constant error-probability low-degree test, and a sub-constant error-probability pcp characterization of np. In STOC, pages 475--484, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. S. Srihari and H. W. Leong. A survey of computational methods for protein complex prediction from protein interaction networks. Journal of Bioinformatics and Computational Biology, 11(2), 2013.Google ScholarGoogle ScholarCross RefCross Ref
  55. J. Tang, X. Tang, X. Xiao, and J. Yuan. Online processing algorithms for influence maximization. In SIGMOD, pages 991--1005, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. J. Tang, X. Tang, and J. Yuan. Influence maximization meets efficiency and effectiveness: A hop-based approach. In ASONAM, pages 64--71, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. J. Tang, X. Tang, and J. Yuan. An efficient and effective hop-based approach for inluence maximization in social networks. Social Network Analysis and Mining, 8:10, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  58. Y. Tang, Y. Shi, and X. Xiao. Influence maximization in near-linear time: A martingale approach. In SIGMOD, pages 1539--1554, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. V. V. Vazirani. Approximation Algorithms. Springer-Verlag, Berlin, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. D. P. Williamson and D. B. Shmoys. The design of approximation algorithms. Cambridge university press, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. B. Xu, H. Lin, and Z. Yang. Ontology integration to identify protein complex in protein interaction networks. Proteome science, 9(1):S7, 2011.Google ScholarGoogle ScholarCross RefCross Ref
  62. Y. Yuan, G. Wang, L. Chen, and H. Wang. Efficient subgraph similarity search on large probabilistic graph databases. PVLDB, 5(9):800--811, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Y. Yuan, G. Wang, H. Wang, and L. Chen. Efficient subgraph search over large uncertain graphs. PVLDB, 4(11):876--886, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. F. Zhao and A. K. Tung. Large scale cohesive subgraphs discovery for social network visual analysis. PVLDB, 6(2):85--96, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. W. Zheng, L. Zou, X. Lian, J. X. Yu, S. Song, and D. Zhao. How to build templates for rdf question/answering: An uncertain graph similarity join approach. In SIGMOD, pages 1809--1824, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Z. Zou, H. Gao, and J. Li. Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics. In KDD, pages 633--642, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 12, Issue 6
    February 2019
    100 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 February 2019
    Published in pvldb Volume 12, Issue 6

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader