jmlr2013-121: Variational Inference in Nonconjugate Models (reference knowledge graph)
Source: pdf
Author: Chong Wang, David M. Blei
Abstract: Mean-field variational methods are widely used for approximate posterior inference in many probabilistic models. In a typical application, mean-field methods approximately compute the posterior with a coordinate-ascent optimization algorithm. When the model is conditionally conjugate, the coordinate updates are easily derived and available in closed form. However, many models of interest, like the correlated topic model and Bayesian logistic regression, are nonconjugate. In these models, mean-field methods cannot be directly applied, and practitioners have had to develop variational algorithms on a case-by-case basis. In this paper, we develop two generic methods for nonconjugate models: Laplace variational inference and delta method variational inference. Our methods have several advantages: they allow variational algorithms to be derived easily for a wide class of nonconjugate models; they extend and unify some of the existing algorithms that have been derived for specific models; and they work well on real-world data sets. We study our methods on the correlated topic model, Bayesian logistic regression, and hierarchical Bayesian logistic regression.
Keywords: variational inference, nonconjugate models, Laplace approximations, the multivariate delta method
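The Laplace approximation at the heart of Laplace variational inference replaces an intractable posterior with a Gaussian centered at the posterior mode, whose covariance is the inverse Hessian of the negative log posterior at that mode. The sketch below illustrates that building block for plain Bayesian logistic regression, one of the models studied in the paper. It is a minimal illustration under assumed names (the prior variance sigma2, the function laplace_approx, and the toy data are all choices made for this example), not the paper's full coordinate-ascent algorithm.

import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(w, X, y, sigma2):
    # Negative log p(w | X, y) up to a constant, with a N(0, sigma2 * I) prior.
    logits = X @ w
    # Bernoulli negative log likelihood in a numerically stable log(1 + e^z) form.
    nll = np.sum(np.logaddexp(0.0, logits) - y * logits)
    return nll + 0.5 * w @ w / sigma2

def neg_log_posterior_grad(w, X, y, sigma2):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probabilities
    return X.T @ (p - y) + w / sigma2

def laplace_approx(X, y, sigma2=1.0):
    # Gaussian approximation q(w) = N(w_map, H^{-1}), where H is the Hessian
    # of the negative log posterior evaluated at the mode w_map.
    d = X.shape[1]
    res = minimize(neg_log_posterior, np.zeros(d), args=(X, y, sigma2),
                   jac=neg_log_posterior_grad, method="BFGS")
    w_map = res.x
    p = 1.0 / (1.0 + np.exp(-(X @ w_map)))
    H = X.T @ (X * (p * (1.0 - p))[:, None]) + np.eye(d) / sigma2
    return w_map, np.linalg.inv(H)

# Toy usage (illustration only): 100 points, 3 features, random labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (rng.uniform(size=100) < 0.5).astype(float)
mean, cov = laplace_approx(X, y)

Roughly speaking, the paper's Laplace variational inference applies this kind of Taylor expansion around a maximum inside each coordinate update of the mean-field algorithm, while delta method variational inference instead expands the variational objective itself.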
A. Ahmed and E. Xing. On tight approximate inference of the logistic normal topic admixture model. In Workshop on Artificial Intelligence and Statistics, 2007.
J. Aitchison. The statistical analysis of compositional data. Journal of the Royal Statistical Society, Series B, 44(2):139–177, 1982.
C. Archambeau, S. Guo, and O. Zoeter. Sparse Bayesian multi-task learning. In Advances in Neural Information Processing Systems, 2011.
A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, 73:243–272, December 2008.
A. Asuncion, M. Welling, P. Smyth, and Y. Teh. On smoothing and inference for topic models. In Uncertainty in Artificial Intelligence, 2009.
H. Attias. A variational Bayesian framework for graphical models. In Advances in Neural Information Processing Systems, 2000.
D. Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
M. Beal. Variational Algorithms for Approximate Bayesian Inference. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003.
J. Bernardo and A. Smith. Bayesian Theory. John Wiley & Sons Ltd., Chichester, 1994.
D. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
P. Bickel and K. Doksum. Mathematical Statistics: Basic Ideas and Selected Topics, volume 1. Pearson Prentice Hall, Upper Saddle River, NJ, 2nd edition, 2007.
C. Bishop. Variational principal components. In International Conference on Artificial Neural Networks, volume 1, pages 509–514. IET, 1999.
C. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.
C. Bishop, D. Spiegelhalter, and J. Winn. VIBES: A variational inference engine for Bayesian networks. In Advances in Neural Information Processing Systems, 2003.
D. Blei. Probabilistic topic models. Communications of the ACM, 55(4):77–84, 2012.
D. Blei and J. Lafferty. Dynamic topic models. In International Conference on Machine Learning, 2006.
D. Blei and J. Lafferty. A correlated topic model of Science. Annals of Applied Statistics, 1(1):17–35, 2007.
D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.
M. Boutell, J. Luo, X. Shen, and C. Brown. Learning multi-label scene classification. Pattern Recognition, 37(9):1757–1771, 2004.
M. Braun and J. McAuliffe. Variational inference for large-scale models of discrete choice. Journal of the American Statistical Association, 2010.
L. Brown. Fundamentals of Statistical Exponential Families. Institute of Mathematical Statistics, Hayward, CA, 1986.
B. Carlin and N. Polson. Inference for nonconjugate Bayesian models using the Gibbs sampler. Canadian Journal of Statistics, 19(4):399–405, 1991.
J. Clinton, S. Jackman, and D. Rivers. The statistical analysis of roll call data. American Political Science Review, 98(2):355–370, 2004.
A. Corduneanu and C. Bishop. Variational Bayesian model selection for mixture distributions. In International Conference on Artificial Intelligence and Statistics, 2001.
W. Croft and J. Lafferty. Language Modeling for Information Retrieval. Kluwer Academic Publishers, Norwell, MA, USA, 2003.
A. Elisseeff and J. Weston. A kernel method for multi-labelled classification. In Advances in Neural Information Processing Systems, 2001.
J. Fox. Bayesian Item Response Modeling: Theory and Applications. Springer-Verlag, 2010.
A. Gelman and J. Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2007.
S. Gershman, M. Hoffman, and D. Blei. Nonparametric variational inference. In International Conference on Machine Learning, 2012.
Z. Ghahramani and M. Jordan. Factorial hidden Markov models. Machine Learning, 31(1), 1997.
A. Honkela and H. Valpola. Unsupervised variational Bayesian learning of nonlinear models. In Advances in Neural Information Processing Systems, 2004.
T. Jaakkola and M. Jordan. Bayesian logistic regression: A variational approach. In Artificial Intelligence and Statistics, 1997.
M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul. Introduction to variational methods for graphical models. Machine Learning, 37:183–233, 1999.
M. Khan, B. Marlin, G. Bouchard, and K. Murphy. Variational bounds for mixed-data factor analysis. In Advances in Neural Information Processing Systems, 2010.
D. Knowles and T. Minka. Non-conjugate variational message passing for multinomial and binary regression. In Advances in Neural Information Processing Systems, 2011.
S. Kotz, N. Balakrishnan, and N. Johnson. Continuous Multivariate Distributions, Models and Applications, volume 334. Wiley-Interscience, 2000.
D. MacKay. Information-based objective functions for active data selection. Neural Computation, 4(4):590–604, 1992.
P. McCullagh and J. Nelder. Generalized Linear Models. Chapman and Hall, London, 1989.
D. Mimno and A. McCallum. Topic models conditioned on arbitrary features with Dirichlet-multinomial regression. In Uncertainty in Artificial Intelligence, 2008.
T. Minka. Expectation propagation for approximate Bayesian inference. In Uncertainty in Artificial Intelligence, 2001.
T. Minka, J. Winn, J. Guiver, and D. Knowles. Infer.NET 2.4. Microsoft Research Cambridge, 2010. http://research.microsoft.com/infernet.
J. Paisley, D. Blei, and M. Jordan. Stick-breaking beta processes and the Poisson process. In Artificial Intelligence and Statistics, 2012a.
J. Paisley, C. Wang, and D. Blei. The discrete infinite logistic normal distribution. Bayesian Analysis, 7(2):235–272, 2012b.
H. Rue, S. Martino, and N. Chopin. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society, Series B (Methodological), 71(2):319–392, 2009.
A. Smola, V. Vishwanathan, and E. Eskin. Laplace propagation. In Advances in Neural Information Processing Systems, 2003.
L. Tierney, R. Kass, and J. Kadane. Fully exponential Laplace approximations to expectations and variances of nonpositive functions. Journal of the American Statistical Association, 84(407), 1989.
M. Wainwright and M. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2):1–305, 2008.
C. Wang, D. Blei, and D. Heckerman. Continuous time dynamic topic models. In Uncertainty in Artificial Intelligence, 2008.
M. Wells. Generalized linear models: A Bayesian perspective. Journal of the American Statistical Association, 96(453):339–355, 2001.
E. Xing. On topic evolution. Technical report CMU-ML TR-05-115, Carnegie Mellon University, 2005.
E. Xing, M. Jordan, and S. Russell. A generalized mean field algorithm for variational inference in exponential families. In Uncertainty in Artificial Intelligence, 2003.
Y. Xue, D. Dunson, and L. Carin. The matrix stick-breaking process for flexible multi-task learning. In International Conference on Machine Learning, 2007.