nips2012-37 — reference knowledge graph by maker-knowledge-mining (source: PDF)
Author: Edward Challis, David Barber
Abstract: We consider inference in a broad class of non-conjugate probabilistic models, based on minimising the Kullback-Leibler divergence between the given target density and an approximating ‘variational’ density. In particular, for generalised linear models we describe approximating densities formed from an affine transformation of independently distributed latent variables, a class that includes many well-known densities as special cases. We show how all relevant quantities can be computed efficiently using the fast Fourier transform. This extends the known class of tractable variational approximations and enables, for example, the fitting of skew variational densities to the target density.
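The FFT computation the abstract refers to rests on a standard fact: the density of a sum (or affine combination) of independent random variables is a convolution of their densities, which can be evaluated on a grid in O(n log n) with FFTs. The sketch below is not the paper's algorithm — just a minimal NumPy illustration of that idea, using two standard normals (whose sum is analytically N(0, 2)) and an illustrative grid of [-20, 20) with step 0.01.

```python
import numpy as np

# Symmetric grid wide enough that both densities decay to ~0 at the edges,
# so circular-convolution wraparound contributes negligible mass.
dx = 0.01
x = np.arange(-20, 20, dx)  # n = 4000 points, period L = 40

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

p1 = gauss(x, 0.0, 1.0)  # density of z1 ~ N(0, 1)
p2 = gauss(x, 0.0, 1.0)  # density of z2 ~ N(0, 1)

# Density of s = z1 + z2 is the convolution (p1 * p2)(s),
# approximated by a discrete convolution via FFT (O(n log n)).
p_sum = np.fft.irfft(np.fft.rfft(p1) * np.fft.rfft(p2)) * dx

# The circular convolution's index 0 corresponds to s = 2*x[0] (mod L);
# with a symmetric grid, rolling by n/2 re-centres the result onto x.
p_sum = np.roll(p_sum, len(x) // 2)

analytic = gauss(x, 0.0, 2.0)  # sum of two standard normals is N(0, 2)
print(np.sum(p_sum) * dx)               # ≈ 1 (valid density)
print(np.max(np.abs(p_sum - analytic))) # small grid-discretisation error
```

The same pattern extends to weighted sums a1*z1 + a2*z2 (rescale each density onto a common grid first) and to non-Gaussian factors such as Laplace or skewed densities, which is what makes the approximating family in the abstract tractable.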
[1] D. Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
[2] D. Barber and C. M. Bishop. Ensemble Learning for Multi-Layer Networks. In Advances in Neural Information Processing Systems, NIPS 10, 1998.
[3] D. Bickson and C. Guestrin. Inference with Multivariate Heavy-Tails in Linear Models. In Advances in Neural Information Processing Systems, NIPS 23, 2010.
[4] C. M. Bishop, N. Lawrence, T. Jaakkola, and M. I. Jordan. Approximating Posterior Distributions in Belief Networks Using Mixtures. In Advances in Neural Information Processing Systems, NIPS 10, 1998.
[5] G. Bouchard and O. Zoeter. Split Variational Inference. In International Conference on Artificial Intelligence and Statistics, AISTATS, 2009.
[6] R. N. Bracewell. The Fourier Transform and its Applications. McGraw-Hill Book Co, Singapore, 2000.
[7] E. Challis and D. Barber. Concave Gaussian Variational Approximations for Inference in Large-Scale Bayesian Linear Models. In International Conference on Artificial Intelligence and Statistics, AISTATS, 2011.
[8] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, New York, 1991.
[9] J. T. A. S. Ferreira and M. F. J. Steel. A New Class of Skewed Multivariate Distributions with Applications To Regression Analysis. Statistica Sinica, 17:505–529, 2007.
[10] S. Gershman, M. Hoffman, and D. Blei. Nonparametric Variational Inference. In International Conference on Machine Learning, ICML 29, 2012.
[11] M. Girolami. A Variational Method for Learning Sparse and Overcomplete Representations. Neural Computation, 13:2517–2532, 2001.
[12] A. Graves. Practical Variational Inference for Neural Networks. In Advances in Neural Information Processing Systems, NIPS 24, 2011.
[13] A. Honkela and H. Valpola. Unsupervised Variational Bayesian Learning of Nonlinear Models. In Advances in Neural Information Processing Systems, NIPS 17, 2005.
[14] T. Jaakkola and M. Jordan. A Variational Approach to Bayesian Logistic Regression Problems and their Extensions. In Artificial Intelligence and Statistics, AISTATS 6, 1996.
[15] M. E. Khan, B. Marlin, G. Bouchard, and K. Murphy. Variational Bounds for Mixed-Data Factor Analysis. In Advances in Neural Information Processing Systems, NIPS 23, 2010.
[16] D. A. Knowles and T. Minka. Non-conjugate Variational Message Passing for Multinomial and Binary Regression. In Advances in Neural Information Processing Systems, NIPS 24, 2011.
[17] M. Kuss. Gaussian Process Models for Robust Regression, Classification, and Reinforcement Learning. PhD thesis, Technische Universität Darmstadt, Darmstadt, Germany, 2006.
[18] H. Nickisch and M. Seeger. Convex Variational Bayesian Inference for Large Scale Generalized Linear Models. In International Conference on Machine Learning, ICML 26, 2009.
[19] J. P. Nolan. Stable Distributions - Models for Heavy Tailed Data. Birkhauser, Boston, 2012. In progress, Chapter 1 online at academic2.american.edu/~jpnolan.
[20] M. Opper and C. Archambeau. The Variational Gaussian Approximation Revisited. Neural Computation, 21(3):786–792, 2009.
[21] J. Ormerod. Skew-Normal Variational Approximations for Bayesian Inference. Technical Report CRGTR-93-1, School of Mathematics and Statistics, University of Sydney, 2011.
[22] A. Palmer, D. Wipf, K. Kreutz-Delgado, and B. Rao. Variational EM Algorithms for Non-Gaussian Latent Variable Models. In Advances in Neural Information Processing Systems, NIPS 20, 2006.
[23] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[24] P. Ruckdeschel and M. Kohl. General Purpose Convolution Algorithm in S4-Classes by means of FFT. Technical Report 1006.0764v2, arXiv.org, 2010.
[25] S. K. Sahu, D. K. Dey, and M. D. Branco. A New Class of Multivariate Skew Distributions with Applications to Bayesian Regression Models. The Canadian Journal of Statistics / La Revue Canadienne de Statistique, 31(2):129–150, 2003.
[26] P. Schaller and G. Temnov. Efficient and precise computation of convolutions: applying FFT to heavy tailed distributions. Computational Methods in Applied Mathematics, 8(2):187–200, 2008.
[27] C. Siddhartha, F. Nardari, and N. Shephard. Markov chain Monte Carlo methods for stochastic volatility models. Journal of Econometrics, 108(2):281–316, 2002.
[28] M. J. Wainwright and M. I. Jordan. Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.