
28 nips-2011-Agnostic Selective Classification


Source: pdf

Author: Yair Wiener, Ran El-Yaniv

Abstract: For a learning problem whose associated excess loss class is (β, B)-Bernstein, we show that it is theoretically possible to track the classification performance of the best (unknown) hypothesis in our class, provided that we are free to abstain from prediction in some region of our choice. The (probabilistic) volume of this rejected region of the domain is shown to diminish at rate O(Bθ(√(1/m))^β), where θ is Hanneke's disagreement coefficient. The strategy achieving this performance faces computational barriers because it requires empirical error minimization in an agnostic setting. Nevertheless, we heuristically approximate this strategy and develop a novel selective classification algorithm using constrained SVMs. We show empirically that the resulting algorithm consistently outperforms the traditional rejection mechanism based on distance from the decision boundary.
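For context on the empirical comparison the abstract mentions, the following minimal Python sketch implements the traditional distance-from-boundary rejection baseline, not the paper's constrained-SVM strategy: the classifier abstains whenever a test point's absolute decision value falls below a threshold. The synthetic data, the scikit-learn SVC (the paper's experiments used LIBSVM [15]), and the threshold tau are illustrative assumptions only.

    # Sketch of the distance-from-boundary rejection baseline:
    # abstain when a test point lies within distance tau of the
    # SVM decision boundary. Data, model, and tau are assumptions.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=1000, n_features=20,
                               flip_y=0.1, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    clf = SVC(kernel="rbf").fit(X_tr, y_tr)

    # Signed distance (up to scaling) of each test point from the boundary.
    margin = clf.decision_function(X_te)

    tau = 0.5  # assumed rejection threshold
    accept = np.abs(margin) >= tau  # predict only far from the boundary

    coverage = accept.mean()
    selective_acc = (clf.predict(X_te[accept]) == y_te[accept]).mean()
    print(f"coverage={coverage:.2f}, selective accuracy={selective_acc:.2f}")

Sweeping tau traces out the risk-coverage trade-off: larger tau rejects more of the domain but raises accuracy on the accepted region, which is the curve against which the paper's method is compared.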


reference text

[1] R. El-Yaniv and Y. Wiener. On the foundations of noise-free selective classification. JMLR, 11:1605– 1641, 2010.

[2] C.K. Chow. An optimum character recognition system using decision functions. IRE Transactions on Electronic Computers, EC-6(4):247–254, 1957.

[3] C.K. Chow. On optimum recognition error and reject tradeoff. IEEE Transactions on Information Theory, 16(1):41–46, 1970.

[4] S. Hanneke. A bound on the label complexity of agnostic active learning. In ICML, pages 353–360, 2007.

[5] S. Hanneke. Theoretical Foundations of Active Learning. PhD thesis, Carnegie Mellon University, 2009.

[6] P.L. Bartlett, S. Mendelson, and P. Philips. Local complexities for empirical risk minimization. In COLT, 2004.

[7] V. Koltchinskii. 2004 IMS medallion lecture: Local Rademacher complexities and oracle inequalities in risk minimization. Annals of Statistics, 34:2593–2656, 2006.

[8] P.L. Bartlett and S. Mendelson. Discussion of "2004 IMS medallion lecture: Local Rademacher complexities and oracle inequalities in risk minimization" by V. Koltchinskii. Annals of Statistics, 34:2657–2663, 2006.

[9] A.B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32(1):135–166, 2004.

[10] A. Beygelzimer, S. Dasgupta, and J. Langford. Importance weighted active learning. In ICML ’09: Proceedings of the 26th Annual International Conference on Machine Learning, pages 49–56. ACM, 2009.

[11] O. Bousquet, S. Boucheron, and G. Lugosi. Introduction to statistical learning theory. In Advanced Lectures on Machine Learning, volume 3176 of Lecture Notes in Computer Science, pages 169–207. Springer, 2003.

[12] L. Wang. Smoothness, disagreement coefficient, and the label complexity of agnostic active learning. JMLR, 12:2269–2292, 2011.

[13] S. Dasgupta, D. Hsu, and C. Monteleoni. A general agnostic active learning algorithm. In NIPS, 2007.

[14] A. Beygelzimer, D. Hsu, J. Langford, and T. Zhang. Agnostic active learning without constraints. In NIPS, 2010.

[15] C.C. Chang and C.J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[16] Y. Grandvalet, A. Rakotomamonjy, J. Keshet, and S. Canu. Support vector machines with a reject option. In NIPS, pages 537–544. MIT Press, 2008.

[17] American Cancer Society. Cancer facts and figures. 2010.

[18] Y. Freund, Y. Mansour, and R.E. Schapire. Generalization bounds for averaged classifiers. Annals of Statistics, 32(4):1698–1722, 2004.

[19] R. Herbei and M.H. Wegkamp. Classification with reject option. The Canadian Journal of Statistics, 34(4):709–721, 2006.

[20] P.L. Bartlett and M.H. Wegkamp. Classification with a reject option using a hinge loss. Journal of Machine Learning Research, 9:1823–1840, 2008.

[21] M.H. Wegkamp. Lasso type classifiers with a reject option. Electronic Journal of Statistics, 1:155–168, 2007.

[22] S. Mukherjee, P. Tamayo, D. Slonim, A. Verri, T. Golub, J. P. Mesirov, and T. Poggio. Support vector machine classification of microarray data. Technical report, AI Memo 1677, Massachusetts Institute of Technology, 1998.

[23] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1–3):389–422, 2002.

[24] S. Mukherjee. Classifying microarray data using support vector machines. In A Practical Approach to Microarray Data Analysis, chapter 9. Kluwer Academic Publishers, 2003.

[25] G. Fumera and F. Roli. Support vector machines with embedded reject option. In Pattern Recognition with Support Vector Machines: First International Workshop, pages 68–82, 2002.

[26] R. Sousa, B. Mora, and J.S. Cardoso. An ordinal data method for the classification with reject option. In ICMLA, pages 746–750. IEEE Computer Society, 2009.

[27] R. El-Yaniv and Y. Wiener. Active learning via perfect selective classification. Accepted to JMLR, 2011.