
28 nips-2011-Agnostic Selective Classification


Source: pdf

Author: Yair Wiener, Ran El-Yaniv

Abstract: For a learning problem whose associated excess loss class is (β, B)-Bernstein, we show that it is theoretically possible to track the classification performance of the best (unknown) hypothesis in our class, provided that we are free to abstain from prediction in some region of our choice. The (probabilistic) volume of this rejected region of the domain is shown to diminish at rate O(Bθ(√(1/m))^β), where θ is Hanneke's disagreement coefficient. The strategy achieving this performance faces computational barriers because it requires empirical error minimization in an agnostic setting. Nevertheless, we heuristically approximate this strategy and develop a novel selective classification algorithm using constrained SVMs. We show empirically that the resulting algorithm consistently outperforms the traditional rejection mechanism based on distance from the decision boundary.
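For context on the empirical comparison the abstract mentions, the following minimal Python sketch implements the traditional distance-from-boundary rejection baseline, not the paper's constrained-SVM strategy: the classifier abstains whenever a test point's absolute decision value falls below a threshold. The synthetic data, the scikit-learn SVC (the paper's experiments used LIBSVM [15]), and the threshold tau are illustrative assumptions only.

    # Sketch of the distance-from-boundary rejection baseline:
    # abstain when a test point lies within distance tau of the
    # SVM decision boundary. Data, model, and tau are assumptions.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=1000, n_features=20,
                               flip_y=0.1, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    clf = SVC(kernel="rbf").fit(X_tr, y_tr)

    # Signed distance (up to scaling) of each test point from the boundary.
    margin = clf.decision_function(X_te)

    tau = 0.5  # assumed rejection threshold
    accept = np.abs(margin) >= tau  # predict only far from the boundary

    coverage = accept.mean()
    selective_acc = (clf.predict(X_te[accept]) == y_te[accept]).mean()
    print(f"coverage={coverage:.2f}, selective accuracy={selective_acc:.2f}")

Sweeping tau traces out the risk-coverage trade-off: larger tau rejects more of the domain but raises accuracy on the accepted region, which is the curve against which the paper's method is compared.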


reference text

[1] R. El-Yaniv and Y. Wiener. On the foundations of noise-free selective classification. JMLR, 11:1605– 1641, 2010.

[2] C.K. Chow. An optimum character recognition system using decision functions. IRE Transactions on Electronic Computers, EC-6(4):247–254, 1957.

[3] C.K. Chow. On optimum recognition error and reject tradeoff. IEEE Transactions on Information Theory, 16(1):41–46, 1970.

[4] S. Hanneke. A bound on the label complexity of agnostic active learning. In ICML, pages 353–360, 2007.

[5] S. Hanneke. Theoretical Foundations of Active Learning. PhD thesis, Carnegie Mellon University, 2009.

[6] P.L. Bartlett, S. Mendelson, and P. Philips. Local complexities for empirical risk minimization. In COLT, 2004.

[7] V. Koltchinskii. 2004 IMS medallion lecture: Local Rademacher complexities and oracle inequalities in risk minimization. Annals of Statistics, 34:2593–2656, 2006.

[8] P.L. Bartlett and S. Mendelson. Discussion of "2004 IMS medallion lecture: Local Rademacher complexities and oracle inequalities in risk minimization" by V. Koltchinskii. Annals of Statistics, 34:2657–2663, 2006.

[9] A.B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32(1):135–166, 2004.

[10] A. Beygelzimer, S. Dasgupta, and J. Langford. Importance weighted active learning. In ICML ’09: Proceedings of the 26th Annual International Conference on Machine Learning, pages 49–56. ACM, 2009.

[11] O. Bousquet, S. Boucheron, and G. Lugosi. Introduction to statistical learning theory. In Advanced Lectures on Machine Learning, volume 3176 of Lecture Notes in Computer Science, pages 169–207. Springer, 2003.

[12] L. Wang. Smoothness, disagreement coefficient, and the label complexity of agnostic active learning. JMLR, 12:2269–2292, 2011.

[13] S. Dasgupta, D. Hsu, and C. Monteleoni. A general agnostic active learning algorithm. In NIPS, 2007.

[14] A. Beygelzimer, D. Hsu, J. Langford, and T. Zhang. Agnostic active learning without constraints. In NIPS, 2010.

[15] C.C. Chang and C.J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[16] Y. Grandvalet, A. Rakotomamonjy, J. Keshet, and S. Canu. Support vector machines with a reject option. In NIPS, pages 537–544. MIT Press, 2008.

[17] American Cancer Society. Cancer facts and figures. 2010.

[18] Y. Freund, Y. Mansour, and R.E. Schapire. Generalization bounds for averaged classifiers. Annals of Statistics, 32(4):1698–1722, 2004.

[19] R. Herbei and M.H. Wegkamp. Classification with reject option. The Canadian Journal of Statistics, 34(4):709–721, 2006.

[20] P.L. Bartlett and M.H. Wegkamp. Classification with a reject option using a hinge loss. Journal of Machine Learning Research, 9:1823–1840, 2008.

[21] M.H. Wegkamp. Lasso type classifiers with a reject option. Electronic Journal of Statistics, 1:155–168, 2007.

[22] S. Mukherjee, P. Tamayo, D. Slonim, A. Verri, T. Golub, J. P. Mesirov, and T. Poggio. Support vector machine classification of microarray data. Technical report, AI Memo 1677, Massachusetts Institute of Technology, 1998.

[23] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1–3):389–422, 2002.

[24] S. Mukherjee. Classifying microarray data using support vector machines. In A Practical Approach to Microarray Data Analysis, chapter 9. Kluwer Academic Publishers, 2003.

[25] G. Fumera and F. Roli. Support vector machines with embedded reject option. In Pattern Recognition with Support Vector Machines: First International Workshop, pages 68–82, 2002.

[26] R. Sousa, B. Mora, and J.S. Cardoso. An ordinal data method for the classification with reject option. In ICMLA, pages 746–750. IEEE Computer Society, 2009.

[27] R. El-Yaniv and Y. Wiener. Active learning via perfect selective classification. Accepted to JMLR, 2011.