jmlr jmlr2008 jmlr2008-21 jmlr2008-21-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Peter L. Bartlett, Marten H. Wegkamp
Abstract: We consider the problem of binary classification where the classifier can, for a particular cost, choose not to classify an observation. Just as in the conventional classification problem, minimization of the sample average of the cost is a difficult optimization problem. As an alternative, we propose the optimization of a certain convex loss function φ, analogous to the hinge loss used in support vector machines (SVMs). Its convexity ensures that the sample average of this surrogate loss can be efficiently minimized. We study its statistical properties. We show that minimizing the expected surrogate loss—the φ-risk—also minimizes the risk. We also study the rate at which the φ-risk approaches its minimum value. We show that fast rates are possible when the conditional probability P(Y = 1|X) is unlikely to be close to certain critical values. Keywords: Bayes classifiers, classification, convex surrogate loss, empirical risk minimization, hinge loss, large margin classifiers, margin condition, reject option, support vector machines
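The abstract leaves the surrogate φ unspecified, so the following is a minimal Python sketch of one natural choice: a piecewise-linear, convex generalization of the hinge loss whose slope on the misclassification side depends on the rejection cost d, paired with a score-thresholding rule that abstains on low-confidence points. The slope (1 - d)/d, the threshold value, and the function names are assumptions for illustration, not details given in this abstract.

```python
import numpy as np

def generalized_hinge_loss(z, d=0.3):
    """Convex surrogate loss for classification with a reject option,
    evaluated at the margin z = y * f(x), with rejection cost d in (0, 1/2).

    NOTE: this exact piecewise-linear form is an assumption for
    illustration; the abstract only says phi is convex and analogous
    to the hinge loss. The slope a = (1 - d) / d >= 1 makes confidently
    wrong predictions costlier when rejection is cheap.
    """
    a = (1.0 - d) / d
    z = np.asarray(z, dtype=float)
    # 1 - a*z for z < 0 (wrong side), hinge max(0, 1 - z) otherwise.
    return np.where(z < 0, 1.0 - a * z, np.maximum(0.0, 1.0 - z))

def predict_with_reject(scores, threshold=0.5):
    """Map real-valued scores f(x) to decisions in {-1, 0, +1}, where 0
    means 'reject' (abstain). A small |f(x)| signals that P(Y=1|X) is
    near the critical values, so the classifier abstains there; the
    threshold is a hypothetical tuning parameter."""
    scores = np.asarray(scores, dtype=float)
    labels = np.sign(scores).astype(int)
    labels[np.abs(scores) <= threshold] = 0  # abstain on low-confidence points
    return labels

if __name__ == "__main__":
    y = np.array([1, -1, 1])
    f = np.array([0.2, -1.5, -0.8])
    print(generalized_hinge_loss(y * f, d=0.3))   # per-example surrogate loss
    print(predict_with_reject(f, threshold=0.5))  # -> [0, -1, -1]
```

Because the surrogate is convex in f, its sample average can be minimized efficiently, which is the point of replacing the discontinuous misclassification-plus-rejection cost in the first place.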
J. Y. Audibert and A. B. Tsybakov. Fast learning rates for plug-in classifiers under margin conditions. Annals of Statistics, 35(2):608–633, 2007.
P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.
G. Blanchard, O. Bousquet, and P. Massart. Statistical performance of support vector machines. Annals of Statistics, 36(2):489–531, 2008.
S. Boucheron, O. Bousquet, and G. Lugosi. Theory of classification: a survey of recent advances. ESAIM: Probability and Statistics, 9:323–375, 2005.
S. Boucheron, O. Bousquet, and G. Lugosi. Introduction to statistical learning theory. In O. Bousquet, U. von Luxburg, and G. Rätsch, editors, Advanced Lectures on Machine Learning, pages 169–207. Springer, 2006.
A. Bounsiar, E. Grall, and P. Beauseroy. A kernel based rejection method for supervised classification. International Journal of Computational Intelligence, 3(4):312–321, 2006.
C. K. Chow. On optimum error and reject trade-off. IEEE Transactions on Information Theory, 16:41–46, 1970.
D. Cox and F. O'Sullivan. Asymptotic analysis of penalized likelihood and related estimators. Annals of Statistics, 18:1676–1695, 1990.
G. Fumera and F. Roli. Support vector machines with embedded reject option. In S. Lee and A. Verri, editors, Pattern Recognition with Support Vector Machines, volume 2388, pages 68–82. Springer, 2002.
G. Fumera and F. Roli. Analysis of error-reject trade-off in linearly combined multiple classifiers. Pattern Recognition, 37:1245–1265, 2004.
G. Fumera, F. Roli, and G. Giacinto. Reject option with multiple thresholds. Pattern Recognition, 33:2099–2101, 2000.
G. Fumera, I. Pillai, and F. Roli. Classification with reject option in text categorisation systems. In Proceedings of the 12th International Conference on Image Analysis and Processing, pages 582–587. IEEE Computer Society, 2003.
M. Golfarelli, D. Maio, and D. Maltoni. On the error-reject trade-off in biometric verification systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19:786–796, 1997.
Y. Guo, P. L. Bartlett, J. Shawe-Taylor, and R. C. Williamson. Covering numbers for support vector machines. IEEE Transactions on Information Theory, 48(1):239–250, 2002.
L. Györfi, Z. Györfi, and I. Vajda. Bayesian decision with rejection. Problems of Control and Information Theory, 8:445–452, 1978.
L. K. Hansen, C. Lissberg, and P. Salamon. The error-reject tradeoff. Open Systems and Information Dynamics, 4:159–184, 1997.
R. Herbei and M. H. Wegkamp. Classification with reject option. Canadian Journal of Statistics, 34(4):709–721, 2006.
G. S. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33:82–95, 1971.
A. N. Kolmogorov and V. M. Tichomirov. ε-entropy and ε-capacity of sets in functional spaces. American Mathematical Society Translations, 17:277–364, 1961.
T. C. W. Landgrebe, D. M. J. Tax, P. Paclik, and R. P. W. Duin. The interaction between classification and reject performance for distance-based reject-option classifiers. Pattern Recognition Letters, 27(8):908–917, 2006.
P. Massart. Concentration Inequalities and Model Selection, volume 1896 of Lecture Notes in Mathematics. Springer, 2007.
B. D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, 1996.
I. Steinwart and C. Scovel. Fast rates for support vector machines using Gaussian kernels. Annals of Statistics, 35(2):575–607, 2007.
B. Tarigan and S. A. van de Geer. Classifiers of support vector machine type with ℓ1 complexity regularization. Bernoulli, 12(6):1045–1076, 2006.
F. Tortorella. Reducing the classification cost of support vector classifiers through an ROC-based rejection rule. Pattern Analysis and Applications, 7:128–143, 2004.
A. B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32:135–166, 2004.
A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer, 1996.
T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32:56–85, 2004.
D. X. Zhou. Capacity of reproducing kernel spaces in learning theory. IEEE Transactions on Information Theory, 49(7):1743–1752, 2003.