Abstract
Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete count data such as text and images. Applications require LDA to handle both large datasets and large numbers of topics. Although distributed CPU systems have been used, GPU-based systems have emerged as a promising alternative because of the high computational power and memory bandwidth of GPUs. However, existing GPU-based LDA systems cannot support large numbers of topics because they use algorithms on dense data structures whose time and space complexity is linear in the number of topics.
In this paper, we propose SaberLDA, a GPU-based LDA system that implements a sparsity-aware algorithm to achieve sublinear time complexity and scale well to large numbers of topics. To address the challenges introduced by sparsity, we propose a novel data layout, a new warp-based sampling kernel, and an efficient sparse count matrix updating algorithm, which together improve locality, make efficient use of GPU warps, and reduce memory consumption. Experiments show that SaberLDA can learn up to 10,000 topics from datasets containing billions of tokens, almost two orders of magnitude more topics than previous GPU-based systems support. With a single GPU card, SaberLDA learns 10,000 topics from a dataset of billions of tokens in a few hours, a task that was previously achievable only with clusters of tens of machines.
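The core idea behind the sublinear complexity claim can be illustrated with a minimal CPU-side sketch (this is not SaberLDA's GPU kernel, just an assumed toy model): in LDA, a single document's topic counts are sparse, since a document touches far fewer topics than the total number K. A sparsity-aware sampler therefore iterates only over the nonzero counts, paying a cost proportional to the number of distinct topics in the document rather than to K, as a dense sampler would.

```python
import random
from collections import Counter

K = 10_000                                   # total number of topics
doc_topics = Counter({3: 5, 42: 2, 917: 1})  # sparse per-document counts n_dk

def sample_topic_sparse(counts, alpha=0.1):
    """Sample a topic from p(k) proportional to (n_dk + alpha), splitting the
    probability mass into a sparse part (nonzero counts) and a uniform
    smoothing part, so the loop touches only the nonzero entries."""
    sparse_mass = sum(counts.values())       # mass of the count-dependent part
    smooth_mass = alpha * K                  # mass of the uniform alpha part
    u = random.uniform(0, sparse_mass + smooth_mass)
    if u < sparse_mass:
        # Walk only the nonzero entries: O(#nonzeros), not O(K).
        for k, n in counts.items():
            u -= n
            if u < 0:
                return k
    # Otherwise the draw fell into the uniform smoothing term.
    return random.randrange(K)

topic = sample_topic_sparse(doc_topics)
assert 0 <= topic < K
```

Dense samplers must build and scan a length-K probability vector per token, which is why their cost and memory scale linearly with K; the split above is one standard way sparsity-aware LDA algorithms avoid that.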