jmlr2013-121: Variational Inference in Nonconjugate Models (reference knowledge graph)
Source: pdf
Author: Chong Wang, David M. Blei
Abstract: Mean-field variational methods are widely used for approximate posterior inference in many probabilistic models. In a typical application, mean-field methods approximately compute the posterior with a coordinate-ascent optimization algorithm. When the model is conditionally conjugate, the coordinate updates are easily derived and available in closed form. However, many models of interest, like the correlated topic model and Bayesian logistic regression, are nonconjugate. In these models, mean-field methods cannot be directly applied, and practitioners have had to develop variational algorithms on a case-by-case basis. In this paper, we develop two generic methods for nonconjugate models: Laplace variational inference and delta method variational inference. Our methods have several advantages: they allow variational algorithms to be derived easily for a wide class of nonconjugate models; they extend and unify some of the existing algorithms that have been derived for specific models; and they work well on real-world data sets. We study our methods on the correlated topic model, Bayesian logistic regression, and hierarchical Bayesian logistic regression.
Keywords: variational inference, nonconjugate models, Laplace approximations, the multivariate delta method
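The Laplace approximation at the heart of Laplace variational inference replaces an intractable posterior with a Gaussian centered at the posterior mode, whose covariance is the inverse Hessian of the negative log posterior at that mode. The sketch below illustrates that building block for plain Bayesian logistic regression, one of the models studied in the paper. It is a minimal illustration under assumed names (the prior variance sigma2, the function laplace_approx, and the toy data are all choices made for this example), not the paper's full coordinate-ascent algorithm.

import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(w, X, y, sigma2):
    # Negative log p(w | X, y) up to a constant, with a N(0, sigma2 * I) prior.
    logits = X @ w
    # Bernoulli negative log likelihood in a numerically stable log(1 + e^z) form.
    nll = np.sum(np.logaddexp(0.0, logits) - y * logits)
    return nll + 0.5 * w @ w / sigma2

def neg_log_posterior_grad(w, X, y, sigma2):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probabilities
    return X.T @ (p - y) + w / sigma2

def laplace_approx(X, y, sigma2=1.0):
    # Gaussian approximation q(w) = N(w_map, H^{-1}), where H is the Hessian
    # of the negative log posterior evaluated at the mode w_map.
    d = X.shape[1]
    res = minimize(neg_log_posterior, np.zeros(d), args=(X, y, sigma2),
                   jac=neg_log_posterior_grad, method="BFGS")
    w_map = res.x
    p = 1.0 / (1.0 + np.exp(-(X @ w_map)))
    H = X.T @ (X * (p * (1.0 - p))[:, None]) + np.eye(d) / sigma2
    return w_map, np.linalg.inv(H)

# Toy usage (illustration only): 100 points, 3 features, random labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (rng.uniform(size=100) < 0.5).astype(float)
mean, cov = laplace_approx(X, y)

Roughly speaking, the paper's Laplace variational inference applies this kind of Taylor expansion around a maximum inside each coordinate update of the mean-field algorithm, while delta method variational inference instead expands the variational objective itself.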
A. Ahmed and E. Xing. On tight approximate inference of the logistic normal topic admixture model. In Workshop on Artificial Intelligence and Statistics, 2007.
J. Aitchison. The statistical analysis of compositional data. Journal of the Royal Statistical Society, Series B, 44(2):139–177, 1982.
C. Archambeau, S. Guo, and O. Zoeter. Sparse Bayesian multi-task learning. In Advances in Neural Information Processing Systems, 2011.
A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, 73:243–272, December 2008.
A. Asuncion, M. Welling, P. Smyth, and Y. Teh. On smoothing and inference for topic models. In Uncertainty in Artificial Intelligence, 2009.
H. Attias. A variational Bayesian framework for graphical models. In Advances in Neural Information Processing Systems, 2000.
D. Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
M. Beal. Variational Algorithms for Approximate Bayesian Inference. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003.
J. Bernardo and A. Smith. Bayesian Theory. John Wiley & Sons Ltd., Chichester, 1994.
D. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
P. Bickel and K. Doksum. Mathematical Statistics: Basic Ideas and Selected Topics, volume 1. Pearson Prentice Hall, Upper Saddle River, NJ, 2nd edition, 2007.
C. Bishop. Variational principal components. In International Conference on Artificial Neural Networks, volume 1, pages 509–514. IET, 1999.
C. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.
C. Bishop, D. Spiegelhalter, and J. Winn. VIBES: A variational inference engine for Bayesian networks. In Advances in Neural Information Processing Systems, 2003.
D. Blei. Probabilistic topic models. Communications of the ACM, 55(4):77–84, 2012.
D. Blei and J. Lafferty. Dynamic topic models. In International Conference on Machine Learning, 2006.
D. Blei and J. Lafferty. A correlated topic model of Science. Annals of Applied Statistics, 1(1):17–35, 2007.
D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.
M. Boutell, J. Luo, X. Shen, and C. Brown. Learning multi-label scene classification. Pattern Recognition, 37(9):1757–1771, 2004.
M. Braun and J. McAuliffe. Variational inference for large-scale models of discrete choice. Journal of the American Statistical Association, 2010.
L. Brown. Fundamentals of Statistical Exponential Families. Institute of Mathematical Statistics, Hayward, CA, 1986.
B. Carlin and N. Polson. Inference for nonconjugate Bayesian models using the Gibbs sampler. Canadian Journal of Statistics, 19(4):399–405, 1991.
J. Clinton, S. Jackman, and D. Rivers. The statistical analysis of roll call data. American Political Science Review, 98(2):355–370, 2004.
A. Corduneanu and C. Bishop. Variational Bayesian model selection for mixture distributions. In International Conference on Artificial Intelligence and Statistics, 2001.
W. Croft and J. Lafferty. Language Modeling for Information Retrieval. Kluwer Academic Publishers, Norwell, MA, USA, 2003.
A. Elisseeff and J. Weston. A kernel method for multi-labelled classification. In Advances in Neural Information Processing Systems, 2001.
J. Fox. Bayesian Item Response Modeling: Theory and Applications. Springer-Verlag, 2010.
A. Gelman and J. Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2007.
S. Gershman, M. Hoffman, and D. Blei. Nonparametric variational inference. In International Conference on Machine Learning, 2012.
Z. Ghahramani and M. Jordan. Factorial hidden Markov models. Machine Learning, 31(1), 1997.
A. Honkela and H. Valpola. Unsupervised variational Bayesian learning of nonlinear models. In Advances in Neural Information Processing Systems, 2004.
T. Jaakkola and M. Jordan. Bayesian logistic regression: A variational approach. In Artificial Intelligence and Statistics, 1997.
M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul. Introduction to variational methods for graphical models. Machine Learning, 37:183–233, 1999.
M. Khan, B. Marlin, G. Bouchard, and K. Murphy. Variational bounds for mixed-data factor analysis. In Advances in Neural Information Processing Systems, 2010.
D. Knowles and T. Minka. Non-conjugate variational message passing for multinomial and binary regression. In Advances in Neural Information Processing Systems, 2011.
S. Kotz, N. Balakrishnan, and N. Johnson. Continuous Multivariate Distributions, Models and Applications, volume 334. Wiley-Interscience, 2000.
D. MacKay. Information-based objective functions for active data selection. Neural Computation, 4(4):590–604, 1992.
P. McCullagh and J. Nelder. Generalized Linear Models. Chapman and Hall, London, 1989.
D. Mimno and A. McCallum. Topic models conditioned on arbitrary features with Dirichlet-multinomial regression. In Uncertainty in Artificial Intelligence, 2008.
T. Minka. Expectation propagation for approximate Bayesian inference. In Uncertainty in Artificial Intelligence, 2001.
T. Minka, J. Winn, J. Guiver, and D. Knowles. Infer.NET 2.4. Microsoft Research Cambridge, 2010. http://research.microsoft.com/infernet.
J. Paisley, D. Blei, and M. Jordan. Stick-breaking beta processes and the Poisson process. In Artificial Intelligence and Statistics, 2012a.
J. Paisley, C. Wang, and D. Blei. The discrete infinite logistic normal distribution. Bayesian Analysis, 7(2):235–272, 2012b.
H. Rue, S. Martino, and N. Chopin. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society, Series B (Methodological), 71(2):319–392, 2009.
A. Smola, V. Vishwanathan, and E. Eskin. Laplace propagation. In Advances in Neural Information Processing Systems, 2003.
L. Tierney, R. Kass, and J. Kadane. Fully exponential Laplace approximations to expectations and variances of nonpositive functions. Journal of the American Statistical Association, 84(407), 1989.
M. Wainwright and M. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2):1–305, 2008.
C. Wang, D. Blei, and D. Heckerman. Continuous time dynamic topic models. In Uncertainty in Artificial Intelligence, 2008.
M. Wells. Generalized linear models: A Bayesian perspective. Journal of the American Statistical Association, 96(453):339–355, 2001.
E. Xing. On topic evolution. Technical report CMU-ML TR-05-115, Carnegie Mellon University, 2005.
E. Xing, M. Jordan, and S. Russell. A generalized mean field algorithm for variational inference in exponential families. In Uncertainty in Artificial Intelligence, 2003.
Y. Xue, D. Dunson, and L. Carin. The matrix stick-breaking process for flexible multi-task learning. In International Conference on Machine Learning, 2007.