nips nips2002 nips2002-110 nips2002-110-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Joaquin Quiñonero-Candela, Ole Winther
Abstract: In this paper, we consider Tipping’s relevance vector machine (RVM) [1] and formalize an incremental training strategy as a variant of the expectation-maximization (EM) algorithm that we call Subspace EM (SSEM). Working with a subset of active basis functions, the sparsity of the RVM solution ensures that the number of basis functions, and thereby the computational complexity, is kept low. We also introduce a mean field approach to the intractable classification model that is expected to give a very good approximation to exact Bayesian inference and contains the Laplace approximation as a special case. We test the algorithms on two large data sets with O(10^3 - 10^4) examples. The results indicate that Bayesian learning of large data sets, e.g. the MNIST database, is realistic.
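The sketch below is a minimal illustration, not the paper's exact SSEM algorithm: it shows RVM-style sparse Bayesian regression trained with EM-type hyperparameter updates over an incrementally grown, actively pruned subset of basis functions. The update schedule, pruning threshold, initial values, and growth rate are assumptions made for illustration, and the mean field classification variant is not reproduced here.

# Sketch of sparse Bayesian (RVM) regression with an active basis subset,
# in the spirit of the Subspace EM idea described in the abstract.
# NOTE: schedule, thresholds, and initialization are illustrative assumptions.
import numpy as np

def rvm_subspace_em(Phi, t, n_iter=50, alpha_init=1.0, beta=100.0,
                    prune_threshold=1e6, add_per_iter=2):
    """Phi: (N, M) design matrix (e.g. kernel evaluations), t: (N,) targets.
    Returns posterior mean weights (zeros for pruned bases) and the noise
    precision beta."""
    N, M = Phi.shape
    alpha = np.full(M, alpha_init)              # one precision per basis function
    active = list(range(min(add_per_iter, M)))  # start with a few basis functions
    candidates = list(range(len(active), M))    # bases not yet considered
    mu_full = np.zeros(M)

    for _ in range(n_iter):
        idx = np.array(active)
        Phi_a = Phi[:, idx]
        # Posterior over the active weights: Sigma = (beta Phi'Phi + diag(alpha))^-1
        Sigma = np.linalg.inv(beta * Phi_a.T @ Phi_a + np.diag(alpha[idx]))
        mu = beta * Sigma @ Phi_a.T @ t

        # EM-style update of the per-basis precisions: alpha_i = 1/(mu_i^2 + Sigma_ii)
        alpha[idx] = 1.0 / (mu ** 2 + np.diag(Sigma))

        # EM-style update of the noise precision
        resid = t - Phi_a @ mu
        beta = N / (resid @ resid + np.trace(Phi_a @ Sigma @ Phi_a.T))

        mu_full[:] = 0.0
        mu_full[idx] = mu

        # Prune bases whose precision diverged (weight driven to zero), then
        # admit a few new candidates; the active set stays small throughout.
        keep = [i for i in active if alpha[i] < prune_threshold]
        active = keep + candidates[:add_per_iter]
        candidates = candidates[add_per_iter:]
        if not active:
            break
    return mu_full, beta

Because pruned basis functions drop out of the matrix inverted at each step, the per-iteration cost scales with the size of the current active set rather than with the full basis, which is the property that makes training on data sets with O(10^3 - 10^4) examples tractable.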
[1] Michael E. Tipping, “Sparse Bayesian learning and the relevance vector machine,” Journal of Machine Learning Research, vol. 1, pp. 211–244, 2001.
[2] Vladimir N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[3] Bernhard Schölkopf and Alex J. Smola, Learning with Kernels, MIT Press, Cambridge, 2002.
[4] Carl E. Rasmussen, Evaluation of Gaussian Processes and Other Methods for Non-linear Regression, Ph.D. thesis, Dept. of Computer Science, University of Toronto, 1996.
[5] Chris K. I. Williams and Carl E. Rasmussen, “Gaussian Processes for Regression,” in Advances in Neural Information Processing Systems, 1996, number 8, pp. 514–520.
[6] D. J. C. Mackay, “Gaussian Processes: A replacement for supervised Neural Networks?,” Tech. Rep., Cavendish Laboratory, Cambridge University, 1997, Notes for a tutorial at NIPS 1997.
[7] Radford M. Neal, Bayesian Learning for Neural Networks, Springer, New York, 1996.
[8] Manfred Opper and Ole Winther, “Gaussian processes for classification: Mean field algorithms,” Neural Computation, vol. 12, pp. 2655–2684, 2000.
[9] Michael Tipping and Anita Faul, “Fast marginal likelihood maximisation for sparse Bayesian models,” in International Workshop on Artificial Intelligence and Statistics, 2003.
[10] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. R. Statist. Soc. B, vol. 39, pp. 1–38, 1977.
[11] Chris Williams and Matthias Seeger, “Using the Nyström method to speed up kernel machines,” in Advances in Neural Information Processing Systems, 2001, number 13, pp. 682–688.
[12] Alex J. Smola and Peter L. Bartlett, “Sparse greedy Gaussian process regression,” in Advances in Neural Information Processing Systems, 2001, number 13, pp. 619–625.
[13] Lehel Csató and Manfred Opper, “Sparse representation for Gaussian process models,” in Advances in Neural Information Processing Systems, 2001, number 13, pp. 444–450.
[14] Volker Tresp, “Mixtures of Gaussian processes,” in Advances in Neural Information Processing Systems, 2000, number 12, pp. 654–660.
[15] Carl E. Rasmussen and Zoubin Ghahramani, “Infinite mixtures of Gaussian process experts,” in Advances in Neural Information Processing Systems, 2002, number 14.
[16] Joaquin Quiñonero-Candela and Lars Kai Hansen, “Time series prediction based on the relevance vector machine with adaptive kernels,” in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2002.
[17] Michael E. Tipping, “The relevance vector machine,” in Advances in Neural Information Processing Systems, 2000, number 12, pp. 652–658.
[18] David J. C. MacKay, “Bayesian interpolation,” Neural Computation, vol. 4, no. 3, pp. 415–447, 1992.
[19] Claus Svarer, Lars K. Hansen, Jan Larsen, and Carl E. Rasmussen, “Designer networks for time series processing,” in IEEE NNSP Workshop, 1993, pp. 78–87.
[20] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” in Proceedings of the IEEE, 1998, vol. 86, pp. 2278–2324.