nips2001-35 (NIPS 2001, paper 35)
Source: pdf
Author: Anita C. Faul, Michael E. Tipping
Abstract: The recent introduction of the 'relevance vector machine' has effectively demonstrated how sparsity may be obtained in generalised linear models within a Bayesian framework. Using a particular form of Gaussian parameter prior, 'learning' is the maximisation, with respect to hyperparameters, of the marginal likelihood of the data. This paper studies the properties of that objective function, and demonstrates that conditioned on an individual hyperparameter, the marginal likelihood has a unique maximum which is computable in closed form. It is further shown that if a derived 'sparsity criterion' is satisfied, this maximum is exactly equivalent to 'pruning' the corresponding parameter from the model.
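As a rough illustration of the kind of closed-form result the abstract refers to, the following sketch uses the 'sparsity' and 'quality' statistics s_i and q_i that are standard in the sparse Bayesian learning literature; they are not defined in this listing, so treat the notation as an assumption rather than the paper's own statement. Conditioned on a single hyperparameter \alpha_i, the log marginal likelihood contributes a term of the form

    \ell(\alpha_i) = \frac{1}{2}\left[ \log\alpha_i - \log(\alpha_i + s_i) + \frac{q_i^2}{\alpha_i + s_i} \right],

which has a unique finite maximum at

    \alpha_i = \frac{s_i^2}{q_i^2 - s_i} \quad \text{when } q_i^2 > s_i,

and otherwise is maximised as \alpha_i \to \infty, which corresponds to pruning the associated parameter (basis function) from the model.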
[1] C. M. Bishop and M. E. Tipping. Variational relevance vector machines. In C. Boutilier and M. Goldszmidt, editors, Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, pages 46–53. Morgan Kaufmann, 2000.
[2] S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. Technical Report 479, Department of Statistics, Stanford University, 1995.
[3] Y. Grandvalet. Least absolute shrinkage is equivalent to quadratic penalisation. In L. Niklasson, M. Bodén, and T. Ziemke, editors, Proceedings of the Eighth International Conference on Artificial Neural Networks (ICANN98), pages 201–206. Springer, 1998.
[4] D. J. C. MacKay. Bayesian interpolation. Neural Computation, 4(3):415–447, 1992.
[5] A. J. Smola, B. Schölkopf, and G. Rätsch. Linear programs for automatic accuracy control in regression. In Proceedings of the Ninth International Conference on Artificial Neural Networks, pages 575–580, 1999.
[6] M. E. Tipping. The Relevance Vector Machine. In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems 12, pages 652–658. MIT Press, 2000.
[7] M. E. Tipping. Sparse kernel principal component analysis. In Advances in Neural Information Processing Systems 13. MIT Press, 2001.
[8] M. E. Tipping and A. C. Faul. Bayesian pursuit. Submitted to NIPS*01.
[9] V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.