
NIPS 2001, Paper 29: Adaptive Sparseness Using Jeffreys Prior


Source: pdf

Author: Mário Figueiredo

Abstract: In this paper we introduce a new sparseness-inducing prior which does not involve any (hyper)parameters that need to be adjusted or estimated. Although other applications are possible, we focus here on supervised learning problems: regression and classification. Experiments with several publicly available benchmark data sets show that the proposed approach yields state-of-the-art performance. In particular, our method outperforms support vector machines and performs competitively with the best alternative techniques, in terms of both error rates and sparseness, although it involves no tuning or adjusting of sparseness-controlling hyperparameters.
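
To make the idea concrete, the following is a minimal sketch (in Python/NumPy, not taken from the paper itself) of how a Jeffreys hyperprior can induce parameter-free sparseness in linear regression. It assumes the hierarchical model w_i ~ N(0, tau_i) with the Jeffreys hyperprior p(tau_i) proportional to 1/tau_i; integrating out tau_i gives p(w_i) proportional to 1/|w_i|, and an EM-style iteration can then be written in terms of U = diag(|w_i|), so that weights shrunk to zero stay zero and no sparseness-controlling hyperparameter appears. The function name, the specific update form, and the fixed noise variance sigma2 are illustrative assumptions.

import numpy as np

def jeffreys_sparse_regression(H, y, sigma2=1.0, n_iter=100, tol=1e-6):
    # Hypothetical EM-style sketch: sparse linear regression under the
    # hierarchical prior w_i ~ N(0, tau_i) with p(tau_i) proportional to
    # 1/tau_i (equivalently, p(w_i) proportional to 1/|w_i|).
    n, d = H.shape
    w = np.linalg.lstsq(H, y, rcond=None)[0]   # ordinary least-squares start
    for _ in range(n_iter):
        U = np.diag(np.abs(w))                 # U = diag(|w_i|); zeros stay zero
        # Assumed update: w <- U (sigma2 I + U H^T H U)^{-1} U H^T y, a
        # rewriting that avoids dividing by weights that have shrunk to zero.
        A = sigma2 * np.eye(d) + U @ H.T @ H @ U
        w_new = U @ np.linalg.solve(A, U @ (H.T @ y))
        if np.max(np.abs(w_new - w)) < tol:    # stop once the update stalls
            return w_new
        w = w_new
    return w

The abstract also claims the approach applies to classification; the sketch above covers only the regression case, and its point is simply that the shrinkage strength adapts through |w_i| itself rather than through a tunable hyperparameter.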


References

[1] V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory, and Methods. New York: Wiley, 1998.

[2] N. Cristianini and J. Shawe-Taylor, Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.

[3] B. Ripley, Pattern Recognition and Neural Networks. Cambridge University Press, 1996.

[4] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.

[5] A. Hoerl and R. Kennard, “Ridge regression: Biased estimation for nonorthogonal problems,” Technometrics, vol. 12, pp. 55–67, 1970.

[6] C. Bishop, Neural Networks for Pattern Recognition. Oxford University Press, 1995.

[7] R. Neal, Bayesian Learning for Neural Networks. New York: Springer-Verlag, 1996.

[8] C. Williams, “Prediction with Gaussian processes: from linear regression to linear prediction and beyond,” in Learning in Graphical Models (M. Jordan, ed.), Kluwer, 1998.

[9] C. Williams and D. Barber, “Bayesian classification with Gaussian processes,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 20, no. 12, pp. 1342–1351, 1998.

[10] G. Kimeldorf and G. Wahba, “A correspondence between Bayesian estimation of stochastic processes and smoothing by splines,” Annals of Mathematical Statistics, vol. 41, pp. 495–502, 1970.

[11] T. Poggio and F. Girosi, “Networks for approximation and learning,” Proceedings of the IEEE, vol. 78, pp. 1481–1497, 1990.

[12] S. Chen, D. Donoho, and M. Saunders, “Atomic decomposition by basis pursuit,” SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1998.

[13] F. Girosi, “An equivalence between sparse approximation and support vector machines,” Neural Computation, vol. 10, pp. 1445–1480, 1998.

[14] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society (B), vol. 58, pp. 267–288, 1996.

[15] P. Williams, “Bayesian regularization and pruning using a Laplace prior,” Neural Computation, vol. 7, pp. 117–143, 1995.

[16] K. Lange and J. Sinsheimer, “Normal/independent distributions and their applications in robust regression,” Journal of Computational and Graphical Statistics, vol. 2, pp. 175–198, 1993.

[17] M. Figueiredo and R. Nowak, “Wavelet-based image estimation: an empirical Bayes approach using Jeffreys’ noninformative prior,” IEEE Transactions on Image Processing, vol. 10, pp. 1322–1331, 2001.

[18] J. Berger, Statistical Decision Theory and Bayesian Analysis. New York: Springer-Verlag, 1985.

[19] D. MacKay, “Bayesian non-linear modelling for the 1993 energy prediction competition,” in Maximum Entropy and Bayesian Methods, G. Heidbreder, ed., pp. 221–234, Kluwer, 1996.

[20] C. Bishop and M. Tipping, “Variational relevance vector machines,” in Proceedings of the 16th Conference in Uncertainty in Artificial Intelligence, pp. 46–53, Morgan Kaufmann, 2000.

[21] M. Tipping, “The relevance vector machine,” in Advances in Neural Information Processing Systems – NIPS 12 (S. Solla, T. Leen, and K.-R. Müller, eds.), pp. 652–658, MIT Press, 2000.

[22] D. Donoho and I. Johnstone, “Ideal spatial adaptation by wavelet shrinkage,” Biometrika, vol. 81, pp. 425–455, 1994.

[23] M. Osborne, B. Presnell, and B. Turlach, “A new approach to variable selection in least squares problems,” IMA Journal of Numerical Analysis, vol. 20, pp. 389–404, 2000.

[24] P. McCullagh and J. Nelder, Generalized Linear Models. London: Chapman and Hall, 1989.

[25] J. Albert and S. Chib, “Bayesian analysis of binary and polychotomous response data,” Journal of the American Statistical Association, vol. 88, pp. 669–679, 1993.

[26] M. Seeger, “Bayesian model selection for support vector machines, Gaussian processes and other kernel classifiers,” in Advances in Neural Information Processing Systems – NIPS 12 (S. Solla, T. Leen, and K.-R. Müller, eds.), pp. 603–609, MIT Press, 2000.

[27] C. Williams and M. Seeger, “Using the Nyström method to speed up kernel machines,” in Advances in Neural Information Processing Systems – NIPS 13, MIT Press, 2001.