jmlr jmlr2008 jmlr2008-21 jmlr2008-21-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Peter L. Bartlett, Marten H. Wegkamp
Abstract: We consider the problem of binary classification where the classifier can, for a particular cost, choose not to classify an observation. Just as in the conventional classification problem, minimization of the sample average of the cost is a difficult optimization problem. As an alternative, we propose the optimization of a certain convex loss function φ, analogous to the hinge loss used in support vector machines (SVMs). Its convexity ensures that the sample average of this surrogate loss can be efficiently minimized. We study its statistical properties. We show that minimizing the expected surrogate loss—the φ-risk—also minimizes the risk. We also study the rate at which the φ-risk approaches its minimum value. We show that fast rates are possible when the conditional probability P(Y = 1|X) is unlikely to be close to certain critical values. Keywords: Bayes classifiers, classification, convex surrogate loss, empirical risk minimization, hinge loss, large margin classifiers, margin condition, reject option, support vector machines
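The abstract leaves the surrogate φ unspecified, so the following is a minimal Python sketch of one natural choice: a piecewise-linear, convex generalization of the hinge loss whose slope on the misclassification side depends on the rejection cost d, paired with a score-thresholding rule that abstains on low-confidence points. The slope (1 - d)/d, the threshold value, and the function names are assumptions for illustration, not details given in this abstract.

```python
import numpy as np

def generalized_hinge_loss(z, d=0.3):
    """Convex surrogate loss for classification with a reject option,
    evaluated at the margin z = y * f(x), with rejection cost d in (0, 1/2).

    NOTE: this exact piecewise-linear form is an assumption for
    illustration; the abstract only says phi is convex and analogous
    to the hinge loss. The slope a = (1 - d) / d >= 1 makes confidently
    wrong predictions costlier when rejection is cheap.
    """
    a = (1.0 - d) / d
    z = np.asarray(z, dtype=float)
    # 1 - a*z for z < 0 (wrong side), hinge max(0, 1 - z) otherwise.
    return np.where(z < 0, 1.0 - a * z, np.maximum(0.0, 1.0 - z))

def predict_with_reject(scores, threshold=0.5):
    """Map real-valued scores f(x) to decisions in {-1, 0, +1}, where 0
    means 'reject' (abstain). A small |f(x)| signals that P(Y=1|X) is
    near the critical values, so the classifier abstains there; the
    threshold is a hypothetical tuning parameter."""
    scores = np.asarray(scores, dtype=float)
    labels = np.sign(scores).astype(int)
    labels[np.abs(scores) <= threshold] = 0  # abstain on low-confidence points
    return labels

if __name__ == "__main__":
    y = np.array([1, -1, 1])
    f = np.array([0.2, -1.5, -0.8])
    print(generalized_hinge_loss(y * f, d=0.3))   # per-example surrogate loss
    print(predict_with_reject(f, threshold=0.5))  # -> [0, -1, -1]
```

Because the surrogate is convex in f, its sample average can be minimized efficiently, which is the point of replacing the discontinuous misclassification-plus-rejection cost in the first place.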
J. Y. Audibert and A. B. Tsybakov. Fast learning rates for plug-in classifiers under margin conditions. Annals of Statistics, 35(2):608–633, 2007.
P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.
G. Blanchard, O. Bousquet, and P. Massart. Statistical performance of support vector machines. Annals of Statistics, 36(2):489–531, 2008.
S. Boucheron, O. Bousquet, and G. Lugosi. Theory of classification: a survey of recent advances. ESAIM: Probability and Statistics, 9:323–375, 2005.
S. Boucheron, O. Bousquet, and G. Lugosi. Introduction to statistical learning theory. In O. Bousquet, U. von Luxburg, and G. Rätsch, editors, Advanced Lectures on Machine Learning, pages 169–207. Springer, 2006.
A. Bounsiar, E. Grall, and P. Beauseroy. A kernel based rejection method for supervised classification. International Journal of Computational Intelligence, 3(4):312–321, 2006.
C. K. Chow. On optimum error and reject trade-off. IEEE Transactions on Information Theory, 16:41–46, 1970.
D. Cox and F. O'Sullivan. Asymptotic analysis of penalized likelihood and related estimators. Annals of Statistics, 18:1676–1695, 1990.
G. Fumera and F. Roli. Support vector machines with embedded reject option. In S. Lee and A. Verri, editors, Pattern Recognition with Support Vector Machines, volume 2388, pages 68–82. Springer, 2002.
G. Fumera and F. Roli. Analysis of error-reject trade-off in linearly combined multiple classifiers. Pattern Recognition, 37:1245–1265, 2004.
G. Fumera, F. Roli, and G. Giacinto. Reject option with multiple thresholds. Pattern Recognition, 33:2099–2101, 2000.
G. Fumera, I. Pillai, and F. Roli. Classification with reject option in text categorisation systems. In Proceedings of the 12th International Conference on Image Analysis and Processing, pages 582–587. IEEE Computer Society, 2003.
M. Golfarelli, D. Maio, and D. Maltoni. On the error-reject trade-off in biometric verification systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19:786–796, 1997.
Y. Guo, P. L. Bartlett, J. Shawe-Taylor, and R. C. Williamson. Covering numbers for support vector machines. IEEE Transactions on Information Theory, 48(1):239–250, 2002.
L. Györfi, Z. Györfi, and I. Vajda. Bayesian decision with rejection. Problems of Control and Information Theory, 8:445–452, 1978.
L. K. Hansen, C. Lissberg, and P. Salamon. The error-reject tradeoff. Open Systems and Information Dynamics, 4:159–184, 1997.
R. Herbei and M. H. Wegkamp. Classification with reject option. Canadian Journal of Statistics, 34(4):709–721, 2006.
G. S. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33:82–95, 1971.
A. N. Kolmogorov and V. M. Tichomirov. ε-entropy and ε-capacity of sets in functional spaces. American Mathematical Society Translations, 17:277–364, 1961.
T. C. W. Landgrebe, D. M. J. Tax, P. Paclik, and R. P. W. Duin. The interaction between classification and reject performance for distance-based reject-option classifiers. Pattern Recognition Letters, 27(8):908–917, 2006.
P. Massart. Concentration Inequalities and Model Selection, volume 1896 of Lecture Notes in Mathematics. Springer, 2007.
B. D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, 1996.
I. Steinwart and C. Scovel. Fast rates for support vector machines using Gaussian kernels. Annals of Statistics, 35(2):575–607, 2007.
B. Tarigan and S. A. van de Geer. Classifiers of support vector machine type with ℓ1 complexity regularization. Bernoulli, 12(6):1045–1076, 2006.
F. Tortorella. Reducing the classification cost of support vector classifiers through an ROC-based rejection rule. Pattern Analysis and Applications, 7:128–143, 2004.
A. B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32:135–166, 2004.
A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer, 1996.
T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32:56–85, 2004.
D. X. Zhou. Capacity of reproducing kernel spaces in learning theory. IEEE Transactions on Information Theory, 49(7):1743–1752, 2003.