
A Topic Modeling Toolbox Using Belief Propagation (JMLR, 2012)


Source: pdf

Author: Jia Zeng

Abstract: Latent Dirichlet allocation (LDA) is an important hierarchical Bayesian model for probabilistic topic modeling that has attracted worldwide interest and has many applications in text mining, computer vision, and computational biology. This paper introduces a topic modeling toolbox (TMBP) based on belief propagation (BP) algorithms. The TMBP toolbox is implemented in MEX C++/Matlab/Octave for either Windows 7 or Linux. Compared with existing topic modeling packages, the novelty of this toolbox lies in its BP algorithms for learning LDA-based topic models. The current version includes BP algorithms for latent Dirichlet allocation (LDA), author-topic models (ATM), relational topic models (RTM), and labeled LDA (LaLDA). This toolbox is an ongoing project, and more BP-based algorithms for various topic models will be added in the near future. Interested users may also extend the BP algorithms to learn more complicated topic models. The source code is freely available under the GNU General Public Licence, Version 1.0, at https://mloss.org/software/view/399/.

Keywords: topic models, belief propagation, variational Bayes, Gibbs sampling
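To give a rough feel for what a BP algorithm for LDA might look like, here is a minimal Python sketch of a synchronous message-passing loop. It is not the TMBP code or its API (the toolbox itself is MEX C++/Matlab/Octave). The function name `lda_bp`, the dense message array, the hyperparameter defaults, and the exact form of the update (a message proportional to the document-topic count times the word-topic count, each excluding the current word-document pair, divided by the per-topic total) are all assumptions made for illustration, loosely following the description in Zeng et al. (2011).

```python
import numpy as np

def lda_bp(X, K, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Illustrative synchronous BP loop for LDA (hypothetical helper, not TMBP).

    X is a dense document-word count matrix of shape (D, W); K is the number
    of topics. Returns (theta, phi): theta is (D, K) with rows summing to 1,
    phi is (W, K) with columns summing to 1.
    """
    rng = np.random.default_rng(seed)
    D, W = X.shape
    # mu[d, w, :] is the topic message for word w in document d.
    mu = rng.random((D, W, K))
    mu /= mu.sum(axis=2, keepdims=True)
    counts = X[:, :, None].astype(float)  # counts broadcast over topics

    for _ in range(iters):
        weighted = counts * mu                        # x_{w,d} * mu_{w,d}(k)
        theta = weighted.sum(axis=1, keepdims=True)   # (D, 1, K) document-topic mass
        phi = weighted.sum(axis=0, keepdims=True)     # (1, W, K) word-topic mass
        phi_tot = phi.sum(axis=1, keepdims=True)      # (1, 1, K) per-topic total
        # Assumed update: exclude the current (w, d) contribution from each term.
        new = ((theta - weighted + alpha) * (phi - weighted + beta)
               / (phi_tot - weighted + W * beta))
        mu = new / new.sum(axis=2, keepdims=True)

    # Turn the final messages into smoothed, normalized parameter estimates.
    weighted = counts * mu
    theta = weighted.sum(axis=1) + alpha
    phi = weighted.sum(axis=0) + beta
    theta /= theta.sum(axis=1, keepdims=True)
    phi /= phi.sum(axis=0, keepdims=True)
    return theta, phi

# Toy corpus: four documents over a four-word vocabulary.
X = np.array([[5, 3, 0, 0],
              [4, 4, 0, 0],
              [0, 0, 6, 2],
              [0, 0, 3, 5]])
theta, phi = lda_bp(X, K=2)
```

The dense (D, W, K) message array keeps the sketch short but would not scale to a real corpus; a practical implementation would store messages only for nonzero counts.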


reference text

A. Asuncion. Approximate mean field for Dirichlet-based models. In ICML Workshop on Topic Models, 2010.
A. Asuncion, M. Welling, P. Smyth, and Y. W. Teh. On smoothing and inference for topic models. In UAI, pages 27–34, 2009.
C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. J. Mach. Learn. Res., 3:993–1022, 2003.
J. Chang and D. M. Blei. Hierarchical relational models for document networks. Annals of Applied Statistics, 4(1):124–150, 2010.
T. L. Griffiths and M. Steyvers. Finding scientific topics. Proc. Natl. Acad. Sci., 101:5228–5235, 2004.
F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Inform. Theory, 47(2):498–519, 2001.
D. Ramage, D. Hall, R. Nallapati, and C. D. Manning. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. In Empirical Methods in Natural Language Processing, pages 248–256, 2009.
M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth. The author-topic model for authors and documents. In UAI, pages 487–494, 2004.
J. Zeng, W. K. Cheung, and J. Liu. Learning topic models by belief propagation. arXiv:1109.3437v4 [cs.LG], 2011.
J. Zeng, Z.-Q. Liu, and X.-Q. Cao. A new approach to speeding up topic modeling. arXiv:1204.0170v1 [cs.LG], 2012a.
J. Zeng, Z.-Q. Liu, and X.-Q. Cao. Memory-efficient topic modeling. arXiv:1206.1147v1 [cs.LG], 2012b.
J. Zeng, Z.-Q. Liu, and X.-Q. Cao. Residual belief propagation for topic modeling. arXiv:1204.6610v1 [cs.LG], 2012c.