jmlr jmlr2007 jmlr2007-75 jmlr2007-75-reference knowledge-graph by maker-knowledge-mining

75 jmlr-2007-Sparseness vs Estimating Conditional Probabilities: Some Asymptotic Results


Source: pdf

Author: Peter L. Bartlett, Ambuj Tewari

Abstract: One of the nice properties of kernel classifiers such as SVMs is that they often produce sparse solutions. However, the decision functions of these classifiers cannot always be used to estimate the conditional probability of the class label. We investigate the relationship between these two properties and show that they are intimately related: sparseness does not occur when the conditional probabilities can be unambiguously estimated. We consider a family of convex loss functions and derive sharp asymptotic results for the fraction of data that become support vectors. This enables us to characterize the exact trade-off between sparseness and the ability to estimate conditional probabilities for these loss functions.

Keywords: kernel methods, support vector machines, sparseness, estimating conditional probabilities
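As a rough illustration of the trade-off the abstract describes (this is not code from the paper), the short Python sketch below contrasts a hinge-loss kernel SVM, whose solution is sparse in the training data but whose decision values are not probabilities, with a logistic-loss classifier, which does estimate P(y=1|x). The synthetic data, scikit-learn usage, and parameter choices are assumptions made purely for illustration; for simplicity the logistic-loss model here is linear rather than kernelized, whereas the paper's analysis concerns kernel expansions.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# Hinge loss: only a fraction of the training points become support
# vectors (sparse solution), but the decision function does not give
# calibrated conditional probabilities.
svm = SVC(kernel="rbf", C=1.0).fit(X, y)
print("fraction of support vectors:", svm.n_support_.sum() / len(X))

# Logistic loss: conditional probabilities P(y=1|x) can be estimated;
# per the paper's result, a kernelized version of this loss would
# sacrifice sparseness in exchange.
logreg = LogisticRegression(max_iter=1000).fit(X, y)
print("estimated P(y=1 | x_0):", logreg.predict_proba(X[:1])[0, 1])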


reference text

Martin Anthony and Peter L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge, 1999.

Peter L. Bartlett, Michael I. Jordan, and Jon D. McAuliffe. Large margin classifiers: Convex loss, low noise and convergence rates. In Advances in Neural Information Processing Systems 16. MIT Press, 2004.

Anthony V. Fiacco. Introduction to Sensitivity and Stability Analysis in Nonlinear Programming. Academic Press, New York, 1983.

Gábor Lugosi and Nicolas Vayatis. On the Bayes-risk consistency of regularized boosting methods. Annals of Statistics, 32(1):30–55, 2004.

David Pollard. Convergence of Stochastic Processes. Springer-Verlag, New York, 1984.

R. Tyrrell Rockafellar. Convex Analysis. Princeton University Press, Princeton, 1970.

Ingo Steinwart. Sparseness of support vector machines. Journal of Machine Learning Research, 4:1071–1105, 2003.

Ingo Steinwart. Sparseness of support vector machines – some asymptotically sharp bounds. In Advances in Neural Information Processing Systems 16. MIT Press, 2004.

Ingo Steinwart. Consistency of support vector machines and other regularized kernel classifiers. IEEE Transactions on Information Theory, 51(1):128–142, 2005.

Grace Wahba. Soft and hard classification by reproducing kernel Hilbert space methods. Proceedings of the National Academy of Sciences USA, 99(26):16524–16530, 2002.

Tong Zhang. Covering number bounds of certain regularized linear function classes. Journal of Machine Learning Research, 2:527–550, 2002.

Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32(1):56–85, 2004.