nips2013-143-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shaobo Han, Xuejun Liao, Lawrence Carin
Abstract: We present a non-factorized variational method for full posterior inference in Bayesian hierarchical models, with the goal of capturing the posterior variable dependencies via efficient and possibly parallel computation. Our approach unifies the integrated nested Laplace approximation (INLA) under the variational framework. The proposed method is applicable in more challenging scenarios than typically assumed by INLA, such as the Bayesian Lasso, which is characterized by the non-differentiability of the ℓ1 norm arising from independent Laplace priors. We derive an upper bound on the Kullback-Leibler divergence, which yields a fast closed-form solution via decoupled optimization. Our method is a reliable analytic alternative to Markov chain Monte Carlo (MCMC), and it results in a tighter evidence lower bound than that of the mean-field variational Bayes (VB) method.
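For context, the abstract's two key claims rest on standard identities; the sketch below uses generic notation (y, X, β, σ², λ, θ, q) that is not taken from the paper, and it does not reproduce the paper's specific KL upper bound. In LaTeX:

% Bayesian Lasso posterior [21]: independent Laplace priors
% p(beta_j) = (lambda/2) exp(-lambda|beta_j|) on a Gaussian
% linear model yield an l1 term in the log posterior.
\[
p(\beta \mid y) \;\propto\; \exp\!\Big( -\tfrac{1}{2\sigma^2}\,\|y - X\beta\|_2^2 \;-\; \lambda \|\beta\|_1 \Big),
\]
% which is non-differentiable wherever some beta_j = 0.
% Evidence decomposition for any variational density q:
\[
\log p(y) \;=\; \underbrace{\mathbb{E}_{q}\big[\log p(y,\theta) - \log q(\theta)\big]}_{\mathrm{ELBO}(q)} \;+\; \mathrm{KL}\big(q(\theta)\,\big\|\,p(\theta \mid y)\big).
\]
% Since log p(y) is fixed, minimizing the KL term (or an upper
% bound on it) over a non-factorized family that contains the
% fully factorized mean-field family can only lower the KL,
% and hence can only raise the ELBO relative to mean-field VB.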
[1] D. Gamerman and H. F. Lopes. Markov chain Monte Carlo: stochastic simulation for Bayesian inference. Chapman & Hall Texts in Statistical Science Series. Taylor & Francis, 2006.
[2] C. P. Robert and G. Casella. Monte Carlo Statistical Methods (Springer Texts in Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2005.
[3] R. E. Kass and D. Steffey. Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models). J. Am. Statist. Assoc., 84(407):717–726, 1989.
[4] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. In Learning in graphical models, pages 105–161, Cambridge, MA, 1999. MIT Press.
[5] T. P. Minka. Expectation propagation for approximate Bayesian inference. In J. S. Breese and D. Koller, editors, Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, pages 362–369, 2001.
[6] J. T. Ormerod. Skew-normal variational approximations for Bayesian inference. Technical Report CRGTR-93-1, School of Mathematics and Statistics, University of Sydney, 2011.
[7] H. Rue, S. Martino, and N. Chopin. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society, Series B, 71(2):319–392, 2009.
[8] J. Hensman, M. Rattray, and N. D. Lawrence. Fast variational inference in the conjugate exponential family. In Advances in Neural Information Processing Systems, 2012.
[9] J. Foulds, L. Boyles, C. Dubois, P. Smyth, and M. Welling. Stochastic collapsed variational Bayesian inference for latent Dirichlet allocation. In 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2013.
[10] J. W. Paisley, D. M. Blei, and M. I. Jordan. Variational Bayesian inference with stochastic search. In International Conference on Machine Learning, 2012.
[11] C. Wang and D. M. Blei. Truncation-free online variational inference for Bayesian nonparametric models. In Advances in Neural Information Processing Systems, 2012.
[12] S. J. Gershman, M. D. Hoffman, and D. M. Blei. Nonparametric variational inference. In International Conference on Machine Learning, 2012.
[13] E. Challis and D. Barber. Concave Gaussian variational approximations for inference in large-scale Bayesian linear models. Journal of Machine Learning Research - Proceedings Track, 15:199–207, 2011.
[14] M. E. Khan, S. Mohamed, and K. P. Murphy. Fast Bayesian inference for non-conjugate Gaussian process regression. In Advances in Neural Information Processing Systems, 2012.
[15] M. J. Beal. Variational Algorithms for Approximate Bayesian Inference. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003.
[16] C. Ritter and M. A. Tanner. Facilitating the Gibbs sampler: The Gibbs stopper and the griddy-Gibbs sampler. J. Am. Statist. Assoc., 87(419):861–868, 1992.
[17] M. Opper and C. Archambeau. The variational Gaussian approximation revisited. Neural Comput., 21(3):786–792, 2009.
[18] E. Challis and D. Barber. Affine independence variational inference. In Advances in Neural Information Processing Systems, 2012.
[19] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
[20] L. Tierney and J. B. Kadane. Accurate approximations for posterior moments and marginal densities. J. Am. Statist. Assoc., 81:82–86, 1986.
[21] T. Park and G. Casella. The Bayesian Lasso. J. Am. Statist. Assoc., 103(482):681–686, 2008.
[22] D. F. Andrews and C. L. Mallows. Scale mixtures of normal distributions. Journal of the Royal Statistical Society, Series B, 36(1):99–102, 1974.
[23] R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58:267–288, 1996.
[24] G. H. Golub and C. F. Van Loan. Matrix Computations (third edition). Johns Hopkins University Press, 1996.
[25] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32:407–499, 2004.
[26] C. Hans. Bayesian Lasso regression. Biometrika, 96(4):835–845, 2009.
[27] T. Stamey, J. Kabalin, J. McNeal, I. Johnstone, F. Freiha, E. Redwine, and N. Yang. Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate. II. Radical prostatectomy treated patients. Journal of Urology, 141:1076–1083, 1989.
[28] M. W. Seeger and H. Nickisch. Large scale Bayesian inference and experimental design for sparse linear models. SIAM J. Imaging Sciences, 4(1):166–199, 2011.
[29] B. Cseke and T. Heskes. Approximate marginals in latent Gaussian models. J. Mach. Learn. Res., 12:417–454, 2011.