jmlr jmlr2010 jmlr2010-85 jmlr2010-85-reference knowledge-graph by maker-knowledge-mining

85 jmlr-2010-On the Foundations of Noise-free Selective Classification

Source: pdf

Author: Ran El-Yaniv, Yair Wiener

Abstract: We consider selective classiﬁcation, a term we adopt here to refer to ‘classiﬁcation with a reject option.’ The essence in selective classiﬁcation is to trade-off classiﬁer coverage for higher accuracy. We term this trade-off the risk-coverage (RC) trade-off. Our main objective is to characterize this trade-off and to construct algorithms that can optimally or near optimally achieve the best possible trade-offs in a controlled manner. For noise-free models we present in this paper a thorough analysis of selective classiﬁcation including characterizations of RC trade-offs in various interesting settings. Keywords: classiﬁcation with a reject option, selective classiﬁcation, perfect learning, high performance classiﬁcation, risk-coverage trade-off

reference text

M. Anthony and P.L. Bartlett. Neural Network Learning; Theoretical Foundations. Cambridge University Press, 1999. A. Antos and G. Lugosi. Strong minimax lower bounds for learning. Machine Learning, 30(1): 31–56, 1998. L. Atlas, D. Cohn, R. Ladner, A.M. El-Sharkawi, and R.J. Marks. Training connectionist networks with queries and selective sampling. In Neural Information Processing Systems (NIPS), pages 566–573, 1990. P.L. Bartlett and M.H. Wegkamp. Classiﬁcation with a reject option using a hinge loss. Technical report M980, Department of Statistics, Florida State University, 2007. J.L. Bentley, H.T. Kung, M. Schkolnick, and C.D. Thompson. On the average number of maxima in a set of vectors and applications. Journal of the ACM, 25, 1978. A. Blumer, A. Ehrenfeucht, D. Haussler, and M.K. Warmuth. Chervonenkis dimension. Journal of the ACM, 36, 1989. Learnability and the Vapnik- A. Bounsiar, E. Grall, and P. Beauseroy. A kernel based rejection method for supervised classiﬁcation. International Journal of Computational Intelligence, 3:312–321, 2006. C.K. Chow. An optimum character recognition system using decision function. IEEE Transactions on Computers, 6(4):247–254, 1957. C.K. Chow. On optimum recognition error and reject trade-off. IEEE Transactions on Information Theory, 16:41–36, 1970. D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 15(2):201–221, 1994. 1639 E L -YANIV AND W IENER T.M. Cover and P. Hart. Neighbor pattern classiﬁcation. IEEE Transactions on Information Theory, 13(1):21–27, 1967. Y. Freund, Y. Mansour, and R.E. Schapire. Generalization bounds for averaged classiﬁers. Annals of Statistics, 32(4):1698–1722, 2004. E. Friedman. Active learning for smooth problems. In Proceedings of the 22nd Annual Conference on Learning Theory, 2009. G. Fumera and F. Roli. Support vector machines with embedded reject option. In Proceedings of the First International Workshop on Pattern Recognition with Support Vector Machines, pages 811–919, 2002. G. Fumera, F. Roli, and G. Giacinto. Multiple reject thresholds for improving classiﬁcation reliability. In Proceedings of the Joint IAPR International Workshops on Advances in Pattern Recognition, pages 863–871, 2000. B. Hanczar and E.R. Dougherty. Classiﬁcation with reject option in gene expression data. Bioinformatics, 24(17):1889–1895, 2008. S. Hanneke. A bound on the label complexity of agnostic active learning. In ICML ’07: Proceedings of the 24th international conference on Machine learning, pages 353–360, 2007. S. Hanneke. Theoretical Foundations of Active Learning. PhD thesis, Carnegie Mellon University, 2009. D. Haussler. Quantifying inductive bias: AI learning algorithms and Valiant’s learning framework. Artiﬁcial Intelligence, 36:177–221, 1988. M.E. Hellman. The nearest neighbor classiﬁcation rule with a reject option. IEEE Transactions on Systems, Man and Cybernetics, 6:179–185, 1970. R. Herbei and M.H. Wegkamp. Classiﬁcation with reject option. The Canadian Journal of Statistics, 34(4):709–721, 2006. W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, March 1963. T.C.W. Landgrebe, D.M.J. Tax, P. Pacl´k, and R.P.W. Duin. The interaction between classiﬁcation ı and reject performance for distance-based reject-option classiﬁers. Pattern Recognition Letters, 27(8):908–917, 2006. J. Langford. Tutorial on practical prediction theory for classiﬁcation. Journal of Machine Learning Research, 6:273–306, 2005. P. S. Meltzer, J. Khan, J. S. Wei, M Ringn´ r, L. H. Saal, M Ladanyi, F Westermann, F Berthold, e M Schwab, C. R. Antonescu, and C Peterson. Classiﬁcation and diagnostic prediction of cancers using gene expression proﬁling and artiﬁcial neural networks. Nature Medicine, 7(6), June 2001. T. Mitchell. Version spaces: a candidate elimination approach to rule learning. In IJCAI’77: Proceedings of the 5th international joint conference on Artiﬁcial Intelligence, pages 305–310, 1977. 1640 O N THE F OUNDATIONS OF N OISE - FREE S ELECTIVE C LASSIFICATION T. Pietraszek. Optimizing abstaining classiﬁers using ROC analysis. In Proceedings of the TwentySecond International Conference on Machine Learning(ICML), pages 665–672, 2005. F.P. Preparata and M.I. Shamos. Computational Geometry: An Introduction. Springer-Verlag, 1990. C.M. Santos-Pereira and A.M. Pires. On optimal reject rules and ROC curves. Pattern Recognition Letters, 26(7):943–952, 2005. F. Tortorella. An optimal reject rule for binary classiﬁers. Lecture Notes in Computer Science, 1876: 611–620, 2001. A.B. Tsybakov. Optimal aggregation of classiﬁers in statistical learning. Annals of Statistics, 32(1): 135–166, 2004. V. Vapnik. Statistical Learning Theory. Wiley Interscience, New York, 1998. V. Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16:264–280, 1971. M.H. Wegkamp. Lasso type classiﬁers with a reject option. Electronic Journal of Statistics, 1: 155–168, 2007. A.P. Yogananda, M.N. Murthy, and G. Lakshmi. A fast linear separability test by projection of positive points on subspaces. In ICML ’07: Proceedings of the 24th international conference on Machine learning, pages 713–720, 2007. 1641