nips2011-282 reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matus J. Telgarsky
Abstract: This manuscript considers the convergence rate of boosting under a large class of losses, including the exponential and logistic losses, where the best previous rate of convergence was O(exp(1/ε²)). First, it is established that the setting of weak learnability aids the entire class, granting a rate O(ln(1/ε)). Next, the (disjoint) conditions under which the infimal empirical risk is attainable are characterized in terms of the sample and weak learning class, and a new proof is given for the known rate O(ln(1/ε)). Finally, it is established that any instance can be decomposed into two smaller instances resembling the two preceding special cases, yielding a rate O(1/ε), with a matching lower bound for the logistic loss. The principal technical hurdle throughout this work is the potential unattainability of the infimal empirical risk; the technique for overcoming this barrier may be of general interest.
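For concreteness, below is a minimal Python sketch of the object the abstract analyzes: boosting viewed as greedy coordinate descent on an empirical risk over a weak-learner class. This is an illustrative sketch, not the paper's formal algorithm; the names boost, weak_learners, and n_rounds are assumptions, and the closed-form step size is the classical AdaBoost line search, which is specific to the exponential loss (the paper's analysis covers a broader loss class, including the logistic loss, where the line search has no closed form).

# A minimal sketch (illustrative, not the paper's algorithm verbatim) of
# boosting as coordinate descent on the empirical exponential risk.
import numpy as np

def boost(X, y, weak_learners, n_rounds=100):
    """Greedy coordinate descent over a finite weak-learner class.

    X: (m, d) sample; y: (m,) labels in {-1, +1};
    weak_learners: list of callables h(X) -> (m,) predictions in {-1, +1}.
    Returns the coefficient vector over the weak-learner coordinates.
    """
    m = len(y)
    margins = np.zeros(m)              # margin_i = y_i * F(x_i), F the ensemble
    coeffs = np.zeros(len(weak_learners))
    for _ in range(n_rounds):
        w = np.exp(-margins)           # exponential-loss example weights
        w /= w.sum()
        # Choose the coordinate (weak learner) with the largest weighted edge.
        edges = [np.dot(w, y * h(X)) for h in weak_learners]
        j = int(np.argmax(np.abs(edges)))
        gamma = edges[j]
        if abs(gamma) >= 1:            # degenerate edge; clamp so the step is finite
            gamma = np.sign(gamma) * (1 - 1e-12)
        alpha = 0.5 * np.log((1 + gamma) / (1 - gamma))  # AdaBoost line search
        coeffs[j] += alpha
        margins += alpha * y * weak_learners[j](X)
    return coeffs

With, say, decision stumps as the weak-learner class, the abstract's weak learnability condition corresponds to a uniformly positive edge gamma at every round, the regime granting the O(ln(1/ε)) rate; when no such edge bound holds and the infimal risk is unattainable, the general O(1/ε) rate applies.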
[1] Robert E. Schapire. The strength of weak learnability. Machine Learning, 5:197–227, July 1990.
[2] Yoav Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256–285, 1995.
[3] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55(1):119–139, 1997.
[4] Leo Breiman. Prediction games and arcing algorithms. Neural Computation, 11:1493–1517, October 1999.
[5] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.
[6] Peter J. Bickel, Yaacov Ritov, and Alon Zakai. Some theory for generalized boosting algorithms. Journal of Machine Learning Research, 7:705–732, 2006.
[7] Z. Q. Luo and P. Tseng. On the convergence of the coordinate descent method for convex differentiable minimization. Journal of Optimization Theory and Applications, 72:7–35, 1992.
[8] Michael Collins, Robert E. Schapire, and Yoram Singer. Logistic regression, AdaBoost and Bregman distances. Machine Learning, 48(1-3):253–285, 2002.
[9] Gunnar Rätsch, Sebastian Mika, and Manfred K. Warmuth. On the convergence of leveraging. In NIPS, pages 487–494, 2001.
[10] Indraneel Mukherjee, Cynthia Rudin, and Robert Schapire. The convergence rate of AdaBoost. In COLT, 2011.
[11] Robert E. Schapire. The convergence rate of AdaBoost (open problem). In COLT, 2010.
[12] Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. In ICML, pages 322–330, 1997.
[13] Gunnar Rätsch and Manfred K. Warmuth. Maximizing the margin with boosting. In COLT, pages 334–350, 2002.
[14] Manfred K. Warmuth, Karen A. Glocer, and Gunnar Rätsch. Boosting algorithms for maximizing the soft margin. In NIPS, 2007.
[15] Shai Shalev-Shwartz and Yoram Singer. On the equivalence of weak learnability and linear separability: New relaxations and efficient boosting algorithms. In COLT, pages 311–322, 2008.
[16] Llew Mason, Jonathan Baxter, Peter L. Bartlett, and Marcus R. Frean. Functional gradient techniques for combining hypotheses. In A.J. Smola, P.L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 221–246, Cambridge, MA, 2000. MIT Press.
[17] Stephen P. Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[18] Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Fundamentals of Convex Analysis. Springer, 2001.
[19] Jonathan Borwein and Adrian Lewis. Convex Analysis and Nonlinear Optimization. Springer, 2000.
[20] Robert E. Schapire and Yoram Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297–336, 1999.
[21] George B. Dantzig and Mukund N. Thapa. Linear Programming 2: Theory and Extensions. Springer, 2003.
[22] Adi Ben-Israel. Motzkin's transposition theorem, and the related theorems of Farkas, Gordan and Stiemke. In M. Hazewinkel, editor, Encyclopaedia of Mathematics, Supplement III. 2002.