jmlr jmlr2010 jmlr2010-85 jmlr2010-85-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ran El-Yaniv, Yair Wiener
Abstract: We consider selective classification, a term we adopt here to refer to ‘classification with a reject option.’ The essence in selective classification is to trade-off classifier coverage for higher accuracy. We term this trade-off the risk-coverage (RC) trade-off. Our main objective is to characterize this trade-off and to construct algorithms that can optimally or near optimally achieve the best possible trade-offs in a controlled manner. For noise-free models we present in this paper a thorough analysis of selective classification including characterizations of RC trade-offs in various interesting settings. Keywords: classification with a reject option, selective classification, perfect learning, high performance classification, risk-coverage trade-off
M. Anthony and P.L. Bartlett. Neural Network Learning; Theoretical Foundations. Cambridge University Press, 1999. A. Antos and G. Lugosi. Strong minimax lower bounds for learning. Machine Learning, 30(1): 31–56, 1998. L. Atlas, D. Cohn, R. Ladner, A.M. El-Sharkawi, and R.J. Marks. Training connectionist networks with queries and selective sampling. In Neural Information Processing Systems (NIPS), pages 566–573, 1990. P.L. Bartlett and M.H. Wegkamp. Classification with a reject option using a hinge loss. Technical report M980, Department of Statistics, Florida State University, 2007. J.L. Bentley, H.T. Kung, M. Schkolnick, and C.D. Thompson. On the average number of maxima in a set of vectors and applications. Journal of the ACM, 25, 1978. A. Blumer, A. Ehrenfeucht, D. Haussler, and M.K. Warmuth. Chervonenkis dimension. Journal of the ACM, 36, 1989. Learnability and the Vapnik- A. Bounsiar, E. Grall, and P. Beauseroy. A kernel based rejection method for supervised classification. International Journal of Computational Intelligence, 3:312–321, 2006. C.K. Chow. An optimum character recognition system using decision function. IEEE Transactions on Computers, 6(4):247–254, 1957. C.K. Chow. On optimum recognition error and reject trade-off. IEEE Transactions on Information Theory, 16:41–36, 1970. D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 15(2):201–221, 1994. 1639 E L -YANIV AND W IENER T.M. Cover and P. Hart. Neighbor pattern classification. IEEE Transactions on Information Theory, 13(1):21–27, 1967. Y. Freund, Y. Mansour, and R.E. Schapire. Generalization bounds for averaged classifiers. Annals of Statistics, 32(4):1698–1722, 2004. E. Friedman. Active learning for smooth problems. In Proceedings of the 22nd Annual Conference on Learning Theory, 2009. G. Fumera and F. Roli. Support vector machines with embedded reject option. In Proceedings of the First International Workshop on Pattern Recognition with Support Vector Machines, pages 811–919, 2002. G. Fumera, F. Roli, and G. Giacinto. Multiple reject thresholds for improving classification reliability. In Proceedings of the Joint IAPR International Workshops on Advances in Pattern Recognition, pages 863–871, 2000. B. Hanczar and E.R. Dougherty. Classification with reject option in gene expression data. Bioinformatics, 24(17):1889–1895, 2008. S. Hanneke. A bound on the label complexity of agnostic active learning. In ICML ’07: Proceedings of the 24th international conference on Machine learning, pages 353–360, 2007. S. Hanneke. Theoretical Foundations of Active Learning. PhD thesis, Carnegie Mellon University, 2009. D. Haussler. Quantifying inductive bias: AI learning algorithms and Valiant’s learning framework. Artificial Intelligence, 36:177–221, 1988. M.E. Hellman. The nearest neighbor classification rule with a reject option. IEEE Transactions on Systems, Man and Cybernetics, 6:179–185, 1970. R. Herbei and M.H. Wegkamp. Classification with reject option. The Canadian Journal of Statistics, 34(4):709–721, 2006. W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, March 1963. T.C.W. Landgrebe, D.M.J. Tax, P. Pacl´k, and R.P.W. Duin. The interaction between classification ı and reject performance for distance-based reject-option classifiers. Pattern Recognition Letters, 27(8):908–917, 2006. J. Langford. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research, 6:273–306, 2005. P. S. Meltzer, J. Khan, J. S. Wei, M Ringn´ r, L. H. Saal, M Ladanyi, F Westermann, F Berthold, e M Schwab, C. R. Antonescu, and C Peterson. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, 7(6), June 2001. T. Mitchell. Version spaces: a candidate elimination approach to rule learning. In IJCAI’77: Proceedings of the 5th international joint conference on Artificial Intelligence, pages 305–310, 1977. 1640 O N THE F OUNDATIONS OF N OISE - FREE S ELECTIVE C LASSIFICATION T. Pietraszek. Optimizing abstaining classifiers using ROC analysis. In Proceedings of the TwentySecond International Conference on Machine Learning(ICML), pages 665–672, 2005. F.P. Preparata and M.I. Shamos. Computational Geometry: An Introduction. Springer-Verlag, 1990. C.M. Santos-Pereira and A.M. Pires. On optimal reject rules and ROC curves. Pattern Recognition Letters, 26(7):943–952, 2005. F. Tortorella. An optimal reject rule for binary classifiers. Lecture Notes in Computer Science, 1876: 611–620, 2001. A.B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32(1): 135–166, 2004. V. Vapnik. Statistical Learning Theory. Wiley Interscience, New York, 1998. V. Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16:264–280, 1971. M.H. Wegkamp. Lasso type classifiers with a reject option. Electronic Journal of Statistics, 1: 155–168, 2007. A.P. Yogananda, M.N. Murthy, and G. Lakshmi. A fast linear separability test by projection of positive points on subspaces. In ICML ’07: Proceedings of the 24th international conference on Machine learning, pages 713–720, 2007. 1641