nips nips2012 nips2012-355 nips2012-355-reference knowledge-graph by maker-knowledge-mining

355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models


Source: pdf

Author: Chong Wang, David M. Blei

Abstract: We present a truncation-free stochastic variational inference algorithm for Bayesian nonparametric models. While traditional variational inference algorithms require truncations on either the model or the variational distribution, our method adapts model complexity on the fly. We study our method with Dirichlet process mixture models and hierarchical Dirichlet process topic models on two large data sets; it performs better than previous stochastic variational inference algorithms.
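
As a concrete illustration of "adapting model complexity on the fly," the sketch below runs a simple CRP-style sequential assignment for a Dirichlet process mixture of one-dimensional Gaussians: each observation is assigned either to an existing component or to a freshly created one, so no truncation level is fixed in advance. This is only a minimal, hedged sketch; the concentration parameter, prior, observation variance, and all function names are illustrative assumptions, and it does not reproduce the paper's truncation-free stochastic variational inference algorithm.

# Illustrative sketch only: grow the number of mixture components on the fly
# (no fixed truncation) for a DP mixture of 1-D Gaussians with known variance
# and a conjugate Gaussian prior on each component mean.
import numpy as np

ALPHA = 1.0        # DP concentration parameter (assumed value)
OBS_VAR = 1.0      # known observation variance (assumed)
PRIOR_MEAN = 0.0   # prior mean of a component's mean (assumed)
PRIOR_VAR = 10.0   # prior variance of a component's mean (assumed)

counts, means, variances = [], [], []   # per-component sufficient statistics

def predictive_logprob(x, mean, var):
    """Log density of x under N(mean, var + OBS_VAR)."""
    v = var + OBS_VAR
    return -0.5 * (np.log(2 * np.pi * v) + (x - mean) ** 2 / v)

def assign(x):
    """Assign x to an existing component or create a new one on the fly."""
    # CRP-style scores: existing components weighted by their counts,
    # a potential new component weighted by ALPHA under the prior predictive.
    log_scores = [np.log(c) + predictive_logprob(x, m, v)
                  for c, m, v in zip(counts, means, variances)]
    log_scores.append(np.log(ALPHA) + predictive_logprob(x, PRIOR_MEAN, PRIOR_VAR))
    log_scores = np.array(log_scores)
    probs = np.exp(log_scores - log_scores.max())
    probs /= probs.sum()
    k = np.random.choice(len(probs), p=probs)
    if k == len(counts):              # grow the model: add a new component
        counts.append(0)
        means.append(PRIOR_MEAN)
        variances.append(PRIOR_VAR)
    # Conjugate (normal-normal) posterior update of component k's mean given x.
    post_prec = 1.0 / variances[k] + 1.0 / OBS_VAR
    means[k] = (means[k] / variances[k] + x / OBS_VAR) / post_prec
    variances[k] = 1.0 / post_prec
    counts[k] += 1
    return k

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(-5, 1, 200), rng.normal(5, 1, 200)])
    rng.shuffle(data)
    for x in data:
        assign(x)
    print("components discovered:", len(counts), "sizes:", counts)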


reference text

[1] Hjort, N., C. Holmes, P. Mueller, et al. Bayesian Nonparametrics: Principles and Practice. Cambridge University Press, Cambridge, UK, 2010.

[2] Antoniak, C. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, 2(6):1152–1174, 1974.

[3] Teh, Y., M. Jordan, M. Beal, et al. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.

[4] Andrieu, C., N. de Freitas, A. Doucet, et al. An introduction to MCMC for machine learning. Machine Learning, 50:5–43, 2003.

[5] Jordan, M., Z. Ghahramani, T. Jaakkola, et al. An introduction to variational methods for graphical models. Machine Learning, 37:183–233, 1999.

[6] Neal, R. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2):249–265, 2000.

[7] Newman, D., A. Asuncion, P. Smyth, et al. Distributed algorithms for topic models. Journal of Machine Learning Research, 10:1801–1828, 2009.

[8] Smola, A., S. Narayanamurthy. An architecture for parallel topic models. Proc. VLDB Endow., 3(1-2):703–710, 2010.

[9] Ahmed, A., M. Aly, J. Gonzalez, et al. Scalable inference in latent variable models. In International Conference on Web Search and Data Mining (WSDM). 2012.

[10] Wainwright, M., M. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2):1–305, 2008.

[11] Hoffman, M., D. M. Blei, C. Wang, et al. Stochastic Variational Inference. ArXiv e-prints, 2012.

[12] Hoffman, M., D. Blei, F. Bach. Online learning for latent Dirichlet allocation. In Advances in Neural Information Processing Systems (NIPS). 2010.

[13] Wang, C., J. Paisley, D. M. Blei. Online variational inference for the hierarchical Dirichlet process. In International Conference on Artificial Intelligence and Statistics (AISTATS). 2011.

[14] Blei, D., M. Jordan. Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1(1):121–144, 2006.

[15] Kurihara, K., M. Welling, Y. Teh. Collapsed variational Dirichlet process mixture models. In International Joint Conferences on Artificial Intelligence (IJCAI). 2007.

[16] Teh, Y., K. Kurihara, M. Welling. Collapsed variational inference for HDP. In Advances in Neural Information Processing Systems (NIPS). 2007.

[17] Kurihara, K., M. Welling, N. Vlassis. Accelerated variational Dirichlet process mixtures. In Advances in Neural Information Processing Systems (NIPS). 2007.

[18] Wang, C., D. Blei. Variational inference for the nested Chinese restaurant process. In Advances in Neural Information Processing Systems (NIPS). 2009.

[19] Gelman, A., J. Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge Univ. Press, 2007.

[20] McLachlan, G., D. Peel. Finite Mixture Models. Wiley-Interscience, 2000.

[21] Escobar, M., M. West. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90:577–588, 1995.

[22] Blei, D., A. Ng, M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

[23] Griffiths, T., Z. Ghahramani. Infinite latent feature models and the Indian buffet process. In Advances in Neural Information Processing Systems (NIPS). 2006.

[24] Teh, Y., D. Gorur, Z. Ghahramani. Stick-breaking construction for the Indian buffet process. In International Conference on Artificial Intelligence and Statistics (AISTATS). 2007.

[25] Blei, D., T. Griffiths, M. Jordan. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. Journal of the ACM, 57(2):1–30, 2010.

[26] Sethuraman, J. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639–650, 1994.

[27] Bishop, C. Pattern Recognition and Machine Learning. Springer, 2006.

[28] Zhai, K., J. Boyd-Graber, N. Asadi, et al. Mr. LDA: A flexible large scale topic modeling package using variational inference in MapReduce. In International World Wide Web Conference (WWW). 2012.

[29] Sato, M. Online model selection based on the variational Bayes. Neural Computation, 13(7):1649–1681, 2001.

[30] Opper, M., O. Winther. From Naive Mean Field Theory to the TAP Equations, pages 1–19. MIT Press, 2001.

[31] MacKay, D. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.

[32] Asuncion, A., M. Welling, P. Smyth, et al. On smoothing and inference for topic models. In Uncertainty in Artificial Intelligence (UAI). 2009.

[33] Sato, I., H. Nakagawa. Rethinking collapsed variational Bayes inference for LDA. In International Conference on Machine Learning (ICML). 2012.

[34] Sato, I., K. Kurihara, H. Nakagawa. Practical collapsed variational Bayes inference for hierarchical Dirichlet process. In International Conference on Knowledge Discovery and Data Mining (KDD), pages 105–113. ACM, 2012.

[35] Minka, T. Divergence measures and message passing. Tech. Rep. TR-2005-173, Microsoft Research, 2005.

[36] Teh, Y., D. Newman, M. Welling. A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In Advances in Neural Information Processing Systems (NIPS). 2006.

[37] Mimno, D., M. Hoffman, D. Blei. Sparse stochastic inference for latent Dirichlet allocation. In International Conference on Machine Learning (ICML). 2012.

[38] Amari, S. Natural gradient works efficiently in learning. Neural Computation, 10(2):251–276, 1998.

[39] Robbins, H., S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400–407, 1951.

[40] Chang, J., J. Boyd-Graber, C. Wang, et al. Reading tea leaves: How humans interpret topic models. In Advances in Neural Information Processing Systems (NIPS). 2009.

[41] Griffiths, T., M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences (PNAS), 2004.

[42] Wallach, H., I. Murray, R. Salakhutdinov, et al. Evaluation methods for topic models. In International Conference on Machine Learning (ICML). 2009.

[43] Pitman, J., M. Yor. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. The Annals of Probability, 25(2):855–900, 1997.

[44] Carlson, A., J. Betteridge, B. Kisiel, et al. Toward an architecture for never-ending language learning. In AAAI Conference on Artificial Intelligence (AAAI). 2010.