
163 nips-2008: On the Efficient Minimization of Classification Calibrated Surrogates


Source: pdf

Author: Richard Nock, Frank Nielsen

Abstract: Bartlett et al. (2006) recently proved that a ground condition for convex surrogates, classification calibration, ties the minimization of the surrogate risk to that of the classification risk, and left open the algorithmic question of how to minimize these surrogates. In this paper, we propose an algorithm which provably minimizes any strictly convex, differentiable, classification calibrated surrogate (a set whose losses span the exponential, logistic and squared losses), with boosting-type guaranteed convergence rates under a weak learning assumption. A particular subclass of these surrogates, which we call balanced convex surrogates, has a key rationale that ties it to maximum likelihood estimation, zero-sum games and the set of losses satisfying some of the most common requirements for losses in supervised learning. We report experiments with 11 flavors of the algorithm on more than 50 readily available domains, which shed light on new surrogates and on the potential of data-dependent strategies to tune surrogates.
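
The abstract names the exponential, logistic and squared losses as members of the family of strictly convex, differentiable, classification calibrated surrogates that the algorithm minimizes. The Python sketch below is an illustration only, not the paper's algorithm or its convergence analysis: it assumes a generic gradient-style boosting loop over decision stumps, and every name in it (SURROGATES, fit_stump, boost_surrogate, the step size eta) is a hypothetical choice of this sketch.

```python
# Minimal sketch (assumption: generic gradient-style boosting, NOT the paper's algorithm).
# It shows the three surrogates named in the abstract as functions of the margin
# m = y * H(x), y in {-1, +1}, and a greedy loop that adds decision stumps fitted
# to the negative gradient of the chosen surrogate.
import numpy as np

# Each entry maps a surrogate name to (phi(m), phi'(m)).
SURROGATES = {
    "exponential": (lambda m: np.exp(-m),            lambda m: -np.exp(-m)),
    "logistic":    (lambda m: np.log1p(np.exp(-m)),  lambda m: -1.0 / (1.0 + np.exp(m))),
    "squared":     (lambda m: (1.0 - m) ** 2,        lambda m: -2.0 * (1.0 - m)),
}

def fit_stump(X, w):
    """Pick the (feature, threshold, sign) stump most positively correlated with w."""
    best, best_score = None, -np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (+1.0, -1.0):
                h = s * np.where(X[:, j] <= t, 1.0, -1.0)
                score = np.dot(w, h)
                if score > best_score:
                    best, best_score = (j, t, s), score
    j, t, s = best
    return lambda Z: s * np.where(Z[:, j] <= t, 1.0, -1.0)

def boost_surrogate(X, y, surrogate="logistic", T=20, eta=0.5):
    """Greedy additive model H = sum_t eta * h_t; each h_t follows the descent direction."""
    phi, dphi = SURROGATES[surrogate]
    H = np.zeros(len(y))
    ensemble = []
    for _ in range(T):
        # Negative gradient of sum_i phi(y_i * H(x_i)) with respect to H(x_i).
        w = -y * dphi(y * H)
        h = fit_stump(X, w)
        ensemble.append(h)
        H += eta * h(X)
    return ensemble, H

# Tiny usage example on synthetic data.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
    for name in SURROGATES:
        _, H = boost_surrogate(X, y, surrogate=name)
        print(name, "training error:", np.mean(np.sign(H) != y))
```

With the exponential surrogate, the weights w reduce to the familiar AdaBoost-style example weights y * exp(-y * H); swapping in the logistic or squared entry changes only the weighting scheme, which is the point the abstract makes about a single algorithm covering the whole family.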


Reference text

[1] A. Banerjee, X. Guo, and H. Wang. On the optimality of conditional expectation as a Bregman predictor. IEEE Trans. on Information Theory, 51:2664–2669, 2005.

[2] A. Banerjee, S. Merugu, I. Dhillon, and J. Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6:1705–1749, 2005.

[3] P. Bartlett, M. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the Am. Stat. Assoc., 101:138–156, 2006.

[4] P. Bartlett and M. Traskin. AdaBoost is consistent. In NIPS*19, 2006.

[5] L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comp. Math. and Math. Phys., 7:200–217, 1967.

[6] M. Collins, R. Schapire, and Y. Singer. Logistic regression, AdaBoost and Bregman distances. In COLT’00, pages 158–169, 2000.

[7] J. Friedman, T. Hastie, and R. Tibshirani. Additive Logistic Regression: a Statistical View of Boosting. Ann. of Stat., 28:337–374, 2000.

[8] C. Gentile and M. Warmuth. Linear hinge loss and average margin. In NIPS*11, pages 225–231, 1998.

[9] P. Grünwald and P. Dawid. Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory. Ann. of Stat., 32:1367–1433, 2004.

[10] M.J. Kearns and Y. Mansour. On the boosting ability of top-down decision tree learning algorithms. Journal of Comp. Syst. Sci., 58:109–128, 1999.

[11] K. Matsushita. Decision rule, based on distance, for the classification problem. Ann. of the Inst. for Stat. Math., 8:67–77, 1956.

[12] R. Nock and F. Nielsen. A Real Generalization of discrete AdaBoost. Artif. Intell., 171:25–41, 2007.

[13] R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. In COLT’98, pages 80–91, 1998.

[14] M. Warmuth, J. Liao, and G. Rätsch. Totally corrective boosting algorithms that maximize the margin. In ICML’06, pages 1001–1008, 2006.