nips nips2002 nips2002-110 nips2002-110-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Joaquin Quiñonero-Candela, Ole Winther
Abstract: In this paper, we consider Tipping’s relevance vector machine (RVM) [1] and formalize an incremental training strategy as a variant of the expectation-maximization (EM) algorithm that we call Subspace EM (SSEM). Working with a subset of active basis functions, the sparsity of the RVM solution ensures that the number of basis functions, and thereby the computational complexity, is kept low. We also introduce a mean field approach to the intractable classification model that is expected to give a very good approximation to exact Bayesian inference and contains the Laplace approximation as a special case. We test the algorithms on two large data sets with O(10^3 - 10^4) examples. The results indicate that Bayesian learning of large data sets, e.g. the MNIST database, is realistic.
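The sketch below is a minimal illustration, not the paper's exact SSEM algorithm: it shows RVM-style sparse Bayesian regression trained with EM-type hyperparameter updates over an incrementally grown, actively pruned subset of basis functions. The update schedule, pruning threshold, initial values, and growth rate are assumptions made for illustration, and the mean field classification variant is not reproduced here.

# Sketch of sparse Bayesian (RVM) regression with an active basis subset,
# in the spirit of the Subspace EM idea described in the abstract.
# NOTE: schedule, thresholds, and initialization are illustrative assumptions.
import numpy as np

def rvm_subspace_em(Phi, t, n_iter=50, alpha_init=1.0, beta=100.0,
                    prune_threshold=1e6, add_per_iter=2):
    """Phi: (N, M) design matrix (e.g. kernel evaluations), t: (N,) targets.
    Returns posterior mean weights (zeros for pruned bases) and the noise
    precision beta."""
    N, M = Phi.shape
    alpha = np.full(M, alpha_init)              # one precision per basis function
    active = list(range(min(add_per_iter, M)))  # start with a few basis functions
    candidates = list(range(len(active), M))    # bases not yet considered
    mu_full = np.zeros(M)

    for _ in range(n_iter):
        idx = np.array(active)
        Phi_a = Phi[:, idx]
        # Posterior over the active weights: Sigma = (beta Phi'Phi + diag(alpha))^-1
        Sigma = np.linalg.inv(beta * Phi_a.T @ Phi_a + np.diag(alpha[idx]))
        mu = beta * Sigma @ Phi_a.T @ t

        # EM-style update of the per-basis precisions: alpha_i = 1/(mu_i^2 + Sigma_ii)
        alpha[idx] = 1.0 / (mu ** 2 + np.diag(Sigma))

        # EM-style update of the noise precision
        resid = t - Phi_a @ mu
        beta = N / (resid @ resid + np.trace(Phi_a @ Sigma @ Phi_a.T))

        mu_full[:] = 0.0
        mu_full[idx] = mu

        # Prune bases whose precision diverged (weight driven to zero), then
        # admit a few new candidates; the active set stays small throughout.
        keep = [i for i in active if alpha[i] < prune_threshold]
        active = keep + candidates[:add_per_iter]
        candidates = candidates[add_per_iter:]
        if not active:
            break
    return mu_full, beta

Because pruned basis functions drop out of the matrix inverted at each step, the per-iteration cost scales with the size of the current active set rather than with the full basis, which is the property that makes training on data sets with O(10^3 - 10^4) examples tractable.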
[1] Michael E. Tipping, “Sparse Bayesian learning and the relevance vector machine,” Journal of Machine Learning Research, vol. 1, pp. 211–244, 2001.
[2] Vladimir N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[3] Bernhard Schölkopf and Alex J. Smola, Learning with Kernels, MIT Press, Cambridge, 2002.
[4] Carl E. Rasmussen, Evaluation of Gaussian Processes and Other Methods for Non-linear Regression, Ph.D. thesis, Dept. of Computer Science, University of Toronto, 1996.
[5] Chris K. I. Williams and Carl E. Rasmussen, “Gaussian Processes for Regression,” in Advances in Neural Information Processing Systems, 1996, number 8, pp. 514–520.
[6] D. J. C. Mackay, “Gaussian Processes: A replacement for supervised Neural Networks?,” Tech. Rep., Cavendish Laboratory, Cambridge University, 1997, Notes for a tutorial at NIPS 1997.
[7] Radford M. Neal, Bayesian Learning for Neural Networks, Springer, New York, 1996.
[8] Manfred Opper and Ole Winther, “Gaussian processes for classification: Mean field algorithms,” Neural Computation, vol. 12, pp. 2655–2684, 2000.
[9] Michael Tipping and Anita Faul, “Fast marginal likelihood maximisation for sparse Bayesian models,” in International Workshop on Artificial Intelligence and Statistics, 2003.
[10] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. R. Statist. Soc. B, vol. 39, pp. 1–38, 1977.
[11] Chris Williams and Matthias Seeger, “Using the Nyström method to speed up kernel machines,” in Advances in Neural Information Processing Systems, 2001, number 13, pp. 682–688.
[12] Alex J. Smola and Peter L. Bartlett, “Sparse greedy Gaussian process regression,” in Advances in Neural Information Processing Systems, 2001, number 13, pp. 619–625.
[13] Lehel Csató and Manfred Opper, “Sparse representation for Gaussian process models,” in Advances in Neural Information Processing Systems, 2001, number 13, pp. 444–450.
[14] Volker Tresp, “Mixtures of Gaussian processes,” in Advances in Neural Information Processing Systems, 2000, number 12, pp. 654–660.
[15] Carl E. Rasmussen and Zoubin Ghahramani, “Infinite mixtures of Gaussian process experts,” in Advances in Neural Information Processing Systems, 2002, number 14.
[16] Joaquin Quiñonero-Candela and Lars Kai Hansen, “Time series prediction based on the relevance vector machine with adaptive kernels,” in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2002.
[17] Michael E. Tipping, “The relevance vector machine,” in Advances in Neural Information Processing Systems, 2000, number 12, pp. 652–658.
[18] David J. C. MacKay, “Bayesian interpolation,” Neural Computation, vol. 4, no. 3, pp. 415–447, 1992.
[19] Claus Svarer, Lars K. Hansen, Jan Larsen, and Carl E. Rasmussen, “Designer networks for time series processing,” in IEEE NNSP Workshop, 1993, pp. 78–87.
[20] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” in Proceedings of the IEEE, 1998, vol. 86, pp. 2278–2324.