jmlr jmlr2011 jmlr2011-82 jmlr2011-82-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Pasi Jylänki, Jarno Vanhatalo, Aki Vehtari
Abstract: This paper considers the robust and efficient implementation of Gaussian process regression with a Student-t observation model, which has a non-log-concave likelihood. The challenge with the Student-t model is that inference is analytically intractable, which is why several approximate methods have been proposed. Expectation propagation (EP) has been found to be a very accurate method in many empirical studies, but its convergence is known to be problematic with models containing non-log-concave site functions. In this paper we illustrate the situations where standard EP fails to converge and review different modifications and alternative algorithms for improving the convergence. We demonstrate that convergence problems may occur during the type-II maximum a posteriori (MAP) estimation of the hyperparameters, and show that standard EP may not converge at the MAP values with some difficult data sets. We present a robust implementation which relies primarily on parallel EP updates and uses a moment-matching-based double-loop algorithm with adaptively selected step size in difficult cases. The predictive performance of EP is compared with Laplace, variational Bayes, and Markov chain Monte Carlo approximations. Keywords: Gaussian process, robust regression, Student-t distribution, approximate inference, expectation propagation
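The convergence issue summarized in the abstract arises in the EP site update itself: matching the moments of a tilted distribution whose Student-t site is non-log-concave can produce a negative site precision. A minimal sketch of one damped EP site update for a single observation, assuming grid quadrature for the tilted moments (all function names, the quadrature scheme, and parameter values are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def tilted_moments(m_cav, v_cav, y, dof=4.0, sigma=1.0, ngrid=4000):
    """Moments of the tilted distribution: cavity N(m_cav, v_cav) times an
    unnormalised Student-t likelihood (dof degrees of freedom, scale sigma),
    computed by simple grid quadrature (illustrative only)."""
    f = np.linspace(m_cav - 10 * np.sqrt(v_cav),
                    m_cav + 10 * np.sqrt(v_cav), ngrid)
    cavity = np.exp(-0.5 * (f - m_cav) ** 2 / v_cav)
    r = (y - f) / sigma
    lik = (1.0 + r ** 2 / dof) ** (-(dof + 1.0) / 2.0)
    w = cavity * lik
    step = f[1] - f[0]
    Z = w.sum() * step                       # unnormalised zeroth moment
    mean = (f * w).sum() * step / Z          # tilted mean
    var = ((f - mean) ** 2 * w).sum() * step / Z  # tilted variance
    return Z, mean, var

def ep_site_update(m_cav, v_cav, y, damping=0.8, **kw):
    """One damped EP site update in natural parameters (tau = 1/v, nu = m/v):
    match the tilted moments, subtract the cavity, then damp the step."""
    _, mean, var = tilted_moments(m_cav, v_cav, y, **kw)
    tau_site = 1.0 / var - 1.0 / v_cav
    nu_site = mean / var - m_cav / v_cav
    # With a non-log-concave Student-t site, tau_site can go negative for an
    # outlying y; damping (and, per the abstract, a double-loop fallback with
    # adaptive step size) is what keeps the iteration stable.
    return damping * tau_site, damping * nu_site
```

The tilted mean stays close to the cavity mean even for a far-outlying observation (the outlier-rejection property of the Student-t model), while the resulting site precision can turn negative in exactly that regime, which is the source of the divergence the paper addresses.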
Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer Science+Business Media, LLC, 2006.
Botond Cseke and Tom Heskes. Properties of Bethe free energies and message passing in Gaussian models. Journal of Artificial Intelligence Research, 41:1–24, 2011.
A. Philip Dawid. Posterior expectations for large observations. Biometrika, 60(3):664–667, 1973.
Bruno De Finetti. The Bayesian approach to the rejection of outliers. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, pages 199–210. University of California Press, 1961.
Jerome H. Friedman. Multivariate adaptive regression splines. Annals of Statistics, 19(1):1–67, 1991.
Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian Data Analysis. Chapman & Hall/CRC, second edition, 2004.
John Geweke. Bayesian treatment of the independent Student-t linear model. Journal of Applied Econometrics, 8:519–540, 1993.
Charles J. Geyer. Practical Markov chain Monte Carlo. Statistical Science, 7(12):473–483, 1992.
Mark N. Gibbs and David J. C. MacKay. Variational Gaussian process classifiers. IEEE Transactions on Neural Networks, 11(6):1458–1464, 2000.
Paul W. Goldberg, Christopher K. I. Williams, and Christopher M. Bishop. Regression with input-dependent noise: A Gaussian process treatment. In M. I. Jordan, M. J. Kearns, and S. A. Solla, editors, Advances in Neural Information Processing Systems 10. MIT Press, Cambridge, MA, 1998.
José Miguel Hernández-Lobato, Tjeerd Dijkstra, and Tom Heskes. Regulator discovery from gene expression time series of malaria parasites: a hierarchical approach. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 649–656. MIT Press, Cambridge, MA, 2008.
Tom Heskes and Onno Zoeter. Expectation propagation for approximate inference in dynamic Bayesian networks. In A. Darwiche and N. Friedman, editors, Uncertainty in Artificial Intelligence: Proceedings of the Eighteenth Conference (UAI-2002), pages 216–233. Morgan Kaufmann, San Francisco, CA, 2002.
Tom Heskes, Manfred Opper, Wim Wiegerinck, Ole Winther, and Onno Zoeter. Approximate inference techniques with expectation constraints. Journal of Statistical Mechanics: Theory and Experiment, 2005:P11015, 2005.
Malte Kuss. Gaussian Process Models for Robust Regression, Classification, and Reinforcement Learning. PhD thesis, Technische Universität Darmstadt, 2006.
Chuanhai Liu and Donald B. Rubin. ML estimation of the t distribution using EM and its extensions, ECM and ECME. Statistica Sinica, 5:19–39, 1995.
Thomas P. Minka. A Family of Algorithms for Approximate Bayesian Inference. PhD thesis, Massachusetts Institute of Technology, 2001a.
Thomas P. Minka. Expectation propagation for approximate Bayesian inference. In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence (UAI-2001), pages 362–369. Morgan Kaufmann, San Francisco, CA, 2001b.
Thomas P. Minka. Power EP. Technical report, Microsoft Research, Cambridge, 2004.
Thomas P. Minka. Divergence measures and message passing. Technical report, Microsoft Research, Cambridge, 2005.
Thomas P. Minka and John Lafferty. Expectation-propagation for the generative aspect model. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (UAI-2002), pages 352–359. Morgan Kaufmann, San Francisco, CA, 2002.
Andrew Naish-Guzman and Sean Holden. Robust regression with twinned Gaussian processes. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1065–1072. MIT Press, Cambridge, MA, 2008.
Radford M. Neal. Monte Carlo Implementation of Gaussian Process Models for Bayesian Regression and Classification. Technical Report 9702, Dept. of Statistics and Dept. of Computer Science, University of Toronto, 1997.
Radford M. Neal. Annealed importance sampling. Statistics and Computing, 11(2):125–139, 2001.
Hannes Nickisch and Carl E. Rasmussen. Approximations for binary Gaussian process classification. Journal of Machine Learning Research, 9:2035–2078, 2008.
Anthony O'Hagan. On outlier rejection phenomena in Bayes inference. Journal of the Royal Statistical Society (Series B), 41(3):358–367, 1979.
Manfred Opper and Cédric Archambeau. The variational Gaussian approximation revisited. Neural Computation, 21(3):786–792, 2009.
Manfred Opper and Ole Winther. Expectation consistent approximate inference. Journal of Machine Learning Research, 6:2177–2204, 2005.
Carl E. Rasmussen and Hannes Nickisch. Gaussian processes for machine learning (GPML) toolbox. Journal of Machine Learning Research, 11:3011–3015, 2010.
Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.
Håvard Rue, Sara Martino, and Nicolas Chopin. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society (Series B), 71(2):1–35, 2009.
Matthias Seeger. Expectation propagation for exponential families. Technical report, Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2005.
Matthias Seeger. Bayesian inference and optimal design for the sparse linear model. Journal of Machine Learning Research, 9:759–813, 2008.
Matthias Seeger and Hannes Nickisch. Fast convergent algorithms for expectation propagation approximate Bayesian inference. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15, pages 652–660. JMLR W&CP, 2011.
Lawrence F. Shampine. Vectorized adaptive quadrature in MATLAB. Journal of Computational and Applied Mathematics, 211:131–140, 2008.
Oliver Stegle, Sebastian V. Fallert, David J. C. MacKay, and Søren Brage. Gaussian process robust regression for noisy heart rate data. IEEE Transactions on Biomedical Engineering, 55(9):2143–2151, 2008.
Luke Tierney and Joseph B. Kadane. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81(393):82–86, 1986.
Michael E. Tipping and Neil D. Lawrence. A variational approach to robust Bayesian interpolation. In Proceedings of the IEEE International Workshop on Neural Networks for Signal Processing, pages 229–238. IEEE, 2003.
Michael E. Tipping and Neil D. Lawrence. Variational inference for Student-t models: Robust Bayesian interpolation and generalised component analysis. Neurocomputing, 69:123–141, 2005.
Marcel van Gerven, Botond Cseke, Robert Oostenveld, and Tom Heskes. Bayesian source localization with the multivariate Laplace prior. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1901–1909, 2009.
Jarno Vanhatalo and Aki Vehtari. Sparse log Gaussian processes via MCMC for spatial epidemiology. JMLR Workshop and Conference Proceedings, 1:73–89, 2007.
Jarno Vanhatalo, Pasi Jylänki, and Aki Vehtari. Gaussian process regression with Student-t likelihood. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1910–1918, 2009.
Aki Vehtari and Jouko Lampinen. Bayesian model assessment and comparison using cross-validation predictive densities. Neural Computation, 14(10):2439–2468, 2002.
Mike West. Outlier models and prior distributions in Bayesian linear regression. Journal of the Royal Statistical Society (Series B), 46(3):431–439, 1984.
Christopher K. I. Williams and David Barber. Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1342–1351, 1998.
I-Cheng Yeh. Modeling of strength of high performance concrete using artificial neural networks. Cement and Concrete Research, 28(12):1797–1808, 1998.