nips nips2012 nips2012-127 nips2012-127-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Emtiyaz Khan, Shakir Mohamed, Kevin P. Murphy
Abstract: We present a new variational inference algorithm for Gaussian process regression with non-conjugate likelihood functions, with application to a wide array of problems including binary and multi-class classification, and ordinal regression. Our method constructs a concave lower bound that is optimized using an efficient fixed-point updating algorithm. We show that the new algorithm has highly competitive computational complexity, matching that of alternative approximate inference methods. We also prove that the use of concave variational bounds provides stable and guaranteed convergence, a property not available to other approaches. We show empirically for both binary and multi-class classification that our new algorithm converges much faster than existing variational methods, and without any degradation in performance.
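The abstract does not spell out the bound or the fixed-point updates, but the general flavor of bound-based variational Gaussian inference for a non-conjugate GP model can be illustrated with the classic Jaakkola–Jordan quadratic bound [12] for binary classification. The sketch below is a hedged illustration, not the paper's algorithm: the RBF kernel, the synthetic data, the iteration count, and the choice of the Jaakkola–Jordan bound are all assumptions made for the example.

```python
# Minimal sketch (not the paper's method): variational Gaussian inference for
# binary GP classification using the Jaakkola-Jordan quadratic bound [12] on the
# logistic likelihood, with the standard fixed-point update of the local
# variational parameters xi.
import numpy as np

def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel; hyperparameters are illustrative assumptions.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def lam(xi):
    # lambda(xi) = tanh(xi/2) / (4 xi), with the xi -> 0 limit equal to 1/8.
    xi = np.maximum(xi, 1e-10)
    return np.tanh(xi / 2.0) / (4.0 * xi)

def vgp_classify(X, y, iters=50, jitter=1e-6):
    """y in {-1, +1}; returns posterior mean m and covariance S over latent f."""
    n = len(y)
    K = rbf_kernel(X) + jitter * np.eye(n)
    xi = np.ones(n)                              # local variational parameters
    for _ in range(iters):
        Lmb = np.diag(2.0 * lam(xi))             # bound-induced likelihood precisions
        S = np.linalg.inv(np.linalg.inv(K) + Lmb)  # Gaussian posterior covariance
        m = S @ (y / 2.0)                        # Gaussian posterior mean under the bound
        xi = np.sqrt(np.diag(S) + m**2)          # closed-form fixed-point update of xi
    return m, S

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 2))
    y = np.sign(X[:, 0] + 0.3 * rng.normal(size=40))
    m, S = vgp_classify(X, y)
    print("training accuracy:", np.mean(np.sign(m) == y))
```

Each outer iteration refits a full Gaussian posterior and then updates the local parameters by their closed-form fixed point; the per-iteration cost is dominated by the O(n^3) covariance update.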
[1] J. Albert and S. Chib. Bayesian analysis of binary and polychotomous response data. J. of the Am. Stat. Assoc., 88(422):669–679, 1993.
[2] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, second edition, 1999.
[3] D. Blei and J. Lafferty. Correlated topic models. In Advances in Neural Information Processing Systems, 2006.
[4] G. Bouchard. Efficient bounds for the softmax and applications to approximate inference in hybrid models. In NIPS 2007 Workshop on Approximate Inference in Hybrid Models, 2007.
[5] M. Braun and J. McAuliffe. Variational inference for large-scale models of discrete choice. Journal of the American Statistical Association, 105(489):324–335, 2010.
[6] E. Challis and D. Barber. Concave Gaussian variational approximations for inference in large-scale Bayesian linear models. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 6, page 7, 2011.
[7] A. Dempster. Covariance selection. Biometrics, 28(1), 1972.
[8] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432, 2008.
[9] S. Frühwirth-Schnatter and R. Frühwirth. Data augmentation and MCMC for binary and multinomial logit models. Statistical Modelling and Regression Structures, pages 111–132, 2010.
[10] M. Girolami and S. Rogers. Variational Bayesian multinomial probit regression with Gaussian process priors. Neural Computation, 18(8):1790–1817, 2006.
[11] C. Holmes and L. Held. Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1(1):145–168, 2006.
[12] T. Jaakkola and M. Jordan. A variational approach to Bayesian logistic regression models and their extensions. In Artificial Intelligence and Statistics, 1996.
[13] P. Jylänki, J. Vanhatalo, and A. Vehtari. Robust Gaussian process regression with a Student-t likelihood. Journal of Machine Learning Research, 12:3227–3257, 2011.
[14] M. Khan, S. Mohamed, B. Marlin, and K. Murphy. A stick-breaking likelihood for categorical data analysis with latent Gaussian models. In Proceedings of the International Conference on Artificial Intelligence and Statistics, 2012.
[15] M. Kuss and C. E. Rasmussen. Assessing approximate inference for binary Gaussian process classification. J. of Machine Learning Research, 6:1679–1704, 2005.
[16] B. Marlin, M. Khan, and K. Murphy. Piecewise bounds for estimating Bernoulli-logistic latent Gaussian models. In Intl. Conf. on Machine Learning, 2011.
[17] T. Minka. Expectation propagation for approximate Bayesian inference. In UAI, 2001.
[18] H. Nickisch and C.E. Rasmussen. Approximations for binary Gaussian process classification. Journal of Machine Learning Research, 9(10), 2008.
[19] M. Opper and C. Archambeau. The variational Gaussian approximation revisited. Neural computation, 21(3):786–792, 2009.
[20] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[21] H. Rue, S. Martino, and N. Chopin. Approximate Bayesian inference for latent Gaussian models using integrated nested Laplace approximations. J. of Royal Stat. Soc. Series B, 71:319–392, 2009.
[22] S. L. Scott. Data augmentation, frequentist estimation, and the Bayesian analysis of multinomial logit models. Statistical Papers, 52(1):87–109, 2011.
[23] M. Seeger. Bayesian Inference and Optimal Design in the Sparse Linear Model. J. of Machine Learning Research, 9:759–813, 2008.
[24] M. Seeger and H. Nickisch. Fast convergent algorithms for expectation propagation approximate Bayesian inference. In Proceedings of the International Conference on Artificial Intelligence and Statistics, 2011.