
Assessing Approximate Inference for Binary Gaussian Process Classification (JMLR 2005)


Source: pdf

Authors: Malte Kuss, Carl Edward Rasmussen

Abstract: Gaussian process priors can be used to define flexible, probabilistic classification models. Unfortunately, exact Bayesian inference is analytically intractable, and various approximation techniques have been proposed. In this work we review and compare Laplace's method and Expectation Propagation for approximate Bayesian inference in the binary Gaussian process classification model. We present a comprehensive comparison of the two approximations, assessing their predictive performance and marginal likelihood estimates against results obtained by MCMC sampling. We explain theoretically and corroborate empirically the advantages of Expectation Propagation compared to Laplace's method.

Keywords: Gaussian process priors, probabilistic classification, Laplace's approximation, expectation propagation, marginal likelihood, evidence, MCMC
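
To make the setting concrete, below is a minimal sketch of one of the two approximations the paper compares: Laplace's method for binary GP classification, using Newton iterations to find the posterior mode, in the style of Algorithm 3.1 of the Rasmussen and Williams (2006) book cited in the references. It assumes a squared-exponential kernel and a logistic likelihood with labels in {-1, +1}; the helper names (rbf_kernel, laplace_gpc) and all hyperparameter values are illustrative choices, not code from the paper.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance k(x, x') = s^2 exp(-|x - x'|^2 / (2 l^2))."""
    d2 = (np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :]
          - 2.0 * X1 @ X2.T)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def laplace_gpc(X, y, lengthscale=1.0, variance=1.0, n_iter=20):
    """Laplace approximation for binary GP classification, y in {-1, +1}.
    Finds the mode f_hat of the latent posterior by Newton's method and
    returns it with the approximate log marginal likelihood."""
    n = X.shape[0]
    K = rbf_kernel(X, X, lengthscale, variance)
    f = np.zeros(n)
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-y * f))   # sigma(y_i * f_i)
        grad = y * (1.0 - pi)               # d log p(y|f) / df
        W = pi * (1.0 - pi)                 # -d^2 log p(y|f) / df^2
        sqrtW = np.sqrt(W)
        # B = I + W^{1/2} K W^{1/2}; its Cholesky factor gives a stable solve
        B = np.eye(n) + sqrtW[:, None] * K * sqrtW[None, :]
        L = cholesky(B, lower=True)
        b = W * f + grad
        a = b - sqrtW * solve_triangular(
            L.T, solve_triangular(L, sqrtW * (K @ b), lower=True))
        f = K @ a                           # Newton update of the mode
    # Approximate log marginal likelihood:
    # log q(y|X) = -1/2 a^T f + log p(y|f) - sum(log diag(L))
    pi = 1.0 / (1.0 + np.exp(-y * f))
    log_Z = -0.5 * a @ f + np.sum(np.log(pi)) - np.sum(np.log(np.diag(L)))
    return f, log_Z

# Example on two separable clusters (illustrative data only)
X = np.vstack([np.random.randn(10, 2) - 2.0, np.random.randn(10, 2) + 2.0])
y = np.hstack([-np.ones(10), np.ones(10)])
f_hat, log_Z = laplace_gpc(X, y)
```

Expectation Propagation replaces the single Gaussian centred at the mode with iteratively refined site approximations; the paper's finding is that this yields markedly better posterior and marginal likelihood approximations than Laplace's method.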


References

P. Abrahamsen. A review of Gaussian random fields and correlation functions. Technical Report 917, Norwegian Computing Center, Oslo, 1997.
C.-C. Chang and C.-J. Lin. LIBSVM: A library for Support Vector Machines, 2001. http://www.csie.ntu.edu.tw/∼cjlin/libsvm.
O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing multiple parameters for support vector machines. Machine Learning, 46(1):131–159, 2002.
W. Chu and Z. Ghahramani. Gaussian processes for ordinal regression. Journal of Machine Learning Research, 6:1019–1041, 2005.
L. Csató and M. Opper. Sparse online Gaussian processes. Neural Computation, 14(2):641–669, 2002.
S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth. Hybrid Monte Carlo. Physics Letters B, 195(2):216–222, 1987.
A. Gelman and X.-L. Meng. Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Statistical Science, 13(2):163–185, 1998.
M. N. Gibbs and D. J. C. MacKay. Variational Gaussian process classifiers. IEEE Transactions on Neural Networks, 11(6):1458–1464, 2000.
G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, second edition, 1989.
S. Hettich, C. L. Blake, and C. J. Merz. UCI repository of machine learning databases, 1998. http://www.ics.uci.edu/∼mlearn/MLRepository.html.
R. E. Kass and A. E. Raftery. Bayes factors. Journal of the American Statistical Association, 90(430):773–795, 1995.
N. Lawrence, M. Seeger, and R. Herbrich. Fast sparse Gaussian process methods: The informative vector machine. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 609–616, Cambridge, MA, 2003. The MIT Press.
J. S. Liu. Monte Carlo Strategies in Scientific Computing. Springer, New York, 2001.
D. J. C. MacKay. Comparison of approximate methods for handling hyperparameters. Neural Computation, 11(5):1035–1068, 1999.
D. J. C. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, Cambridge, UK, 2003.
T. P. Minka. A Family of Algorithms for Approximate Bayesian Inference. PhD thesis, Department of Electrical Engineering and Computer Science, MIT, 2001.
R. M. Neal. Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto, 1993.
R. M. Neal. Regression and classification using Gaussian process priors. In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics 6, pages 475–501. Oxford University Press, 1998.
R. M. Neal. Annealed importance sampling. Statistics and Computing, 11:125–139, 2001.
A. O'Hagan. Curve fitting and optimal design for prediction. Journal of the Royal Statistical Society, Series B, 40(1):1–42, 1978.
M. Opper and O. Winther. Gaussian processes for classification: Mean-field algorithms. Neural Computation, 12(11):2655–2684, 2000.
J. C. Platt. Probabilities for SV machines. In A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 61–73. The MIT Press, Cambridge, MA, 2000.
C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, Cambridge, MA, 2006. In press.
B. D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, UK, 1996.
B. Schölkopf and A. J. Smola. Learning with Kernels. The MIT Press, Cambridge, MA, 2002.
M. Seeger. PAC-Bayesian generalisation error bounds for Gaussian process classification. Journal of Machine Learning Research, 3:233–269, 2002.
M. Seeger. Bayesian Gaussian Process Models: PAC-Bayesian Generalisation Error Bounds and Sparse Approximations. PhD thesis, University of Edinburgh, 2003.
M. Seeger. Expectation propagation for exponential families, 2005. Note obtainable from http://www.kyb.tuebingen.mpg.de/∼seeger.
C. K. I. Williams and D. Barber. Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1342–1351, 1998.