
Classification Calibration Dimension for General Multiclass Losses



Author: Harish G. Ramaswamy, Shivani Agarwal

Abstract: We study consistency properties of surrogate loss functions for general multiclass classification problems, defined by a general loss matrix. We extend the notion of classification calibration, which has been studied for binary and multiclass 0-1 classification problems (and for certain other specific learning problems), to the general multiclass setting, and derive necessary and sufficient conditions for a surrogate loss to be classification calibrated with respect to a loss matrix in this setting. We then introduce the notion of classification calibration dimension of a multiclass loss matrix, which measures the smallest ‘size’ of a prediction space for which it is possible to design a convex surrogate that is classification calibrated with respect to the loss matrix. We derive both upper and lower bounds on this quantity, and use these results to analyze various loss matrices. In particular, as one application, we provide a different route from the recent result of Duchi et al. (2010) for analyzing the difficulty of designing ‘low-dimensional’ convex surrogates that are consistent with respect to pairwise subset ranking losses. We anticipate that the classification calibration dimension may prove to be a useful tool in the study and design of surrogate losses for general multiclass learning problems.
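As a pointer for readers, here is a minimal LaTeX sketch of the calibration condition the abstract refers to; the notation ($n$ labels, loss matrix $L \in \mathbb{R}_+^{n \times k}$ with columns $\ell_1, \dots, \ell_k$, surrogate $\psi : \mathbb{R}^d \to \mathbb{R}_+^n$ over a $d$-dimensional prediction space, and decoding map $\mathrm{pred} : \mathbb{R}^d \to [k]$) is assumed here rather than quoted from the paper. On this reading, $\psi$ is classification calibrated with respect to $L$ if some $\mathrm{pred}$ satisfies

\[
\forall\, p \in \Delta_n: \qquad
\inf_{\substack{u \in \mathbb{R}^d \\ \mathrm{pred}(u) \notin \operatorname*{argmin}_{j \in [k]} p^\top \ell_j}} p^\top \psi(u)
\;>\;
\inf_{u \in \mathbb{R}^d} p^\top \psi(u),
\]

i.e., any sequence of predictions driving the expected surrogate loss to its infimum must eventually decode to a Bayes-optimal class under $L$. The classification calibration dimension of $L$ is then the smallest $d$ for which some convex $\psi$ on $\mathbb{R}^d$ satisfies this condition.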


References

[1] Gábor Lugosi and Nicolas Vayatis. On the Bayes-risk consistency of regularized boosting methods. Annals of Statistics, 32(1):30–55, 2004.

[2] Wenxin Jiang. Process consistency for AdaBoost. Annals of Statistics, 32(1):13–29, 2004.

[3] Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32(1):56–134, 2004.

[4] Ingo Steinwart. Consistency of support vector machines and other regularized kernel classifiers. IEEE Transactions on Information Theory, 51(1):128–142, 2005.

[5] Peter Bartlett, Michael Jordan, and Jon McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.

[6] Tong Zhang. Statistical analysis of some multi-category large margin classification methods. Journal of Machine Learning Research, 5:1225–1251, 2004.

[7] Ambuj Tewari and Peter Bartlett. On the consistency of multiclass classification methods. Journal of Machine Learning Research, 8:1007–1025, 2007.

[8] David Cossock and Tong Zhang. Statistical analysis of Bayes optimal subset ranking. IEEE Transactions on Information Theory, 54(11):5140–5154, 2008.

[9] Fen Xia, Tie-Yan Liu, Jue Wang, Wensheng Zhang, and Hang Li. Listwise approach to learning to rank: Theory and algorithm. In International Conference on Machine Learning, 2008.

[10] John Duchi, Lester Mackey, and Michael Jordan. On the consistency of ranking algorithms. In International Conference on Machine Learning, 2010.

[11] Pradeep Ravikumar, Ambuj Tewari, and Eunho Yang. On NDCG consistency of listwise ranking methods. In International Conference on Artificial Intelligence and Statistics (AISTATS), volume 15 of JMLR: W&CP, 2011.

[12] David Buffoni, Clément Calauzènes, Patrick Gallinari, and Nicolas Usunier. Learning scoring functions with order-preserving losses and standardized supervision. In International Conference on Machine Learning, 2011.

[13] Wei Gao and Zhi-Hua Zhou. On the consistency of multi-label learning. In Conference on Learning Theory, 2011.

[14] Wojciech Kotlowski, Krzysztof Dembczynski, and Eyke Huellermeier. Bipartite ranking through minimization of univariate loss. In International Conference on Machine Learning, 2011.

[15] Ingo Steinwart. How to compare different loss functions and their risks. Constructive Approximation, 26:225–287, 2007.

[16] Ben Taskar, Carlos Guestrin, and Daphne Koller. Max-margin Markov networks. In Neural Information Processing Systems, 2003.

[17] Deirdre O’Brien, Maya Gupta, and Robert Gray. Cost-sensitive multi-class classification from probability estimates. In International Conference on Machine Learning, 2008.

[18] Nicolas Lambert and Yoav Shoham. Eliciting truthful answers to multiple-choice questions. In ACM Conference on Electronic Commerce, 2009.

[19] Dimitri Bertsekas, Angelia Nedic, and Asuman Ozdaglar. Convex Analysis and Optimization. Athena Scientific, 2003.

[20] Jean Gallier. Notes on convex sets, polytopes, polyhedra, combinatorial topology, Voronoi diagrams and Delaunay triangulations. Technical report, Department of Computer and Information Science, University of Pennsylvania, 2009.

[21] Elodie Vernet, Robert C. Williamson, and Mark D. Reid. Composite multiclass losses. In Neural Information Processing Systems, 2011.