jmlr jmlr2012 jmlr2012-8 jmlr2012-8-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matus Telgarsky
Abstract: Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing:
• Weak learnability aids the whole loss family: for any ε > 0, O(ln(1/ε)) iterations suffice to produce a predictor with empirical risk ε-close to the infimum;
• The circumstances granting the existence of an empirical risk minimizer may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O(ln(1/ε));
• Arbitrary instances may be decomposed into the above two, granting rate O(1/ε), with a matching lower bound provided for the logistic loss.
Keywords: boosting, convex analysis, weak learnability, coordinate descent, maximum entropy
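The abstract frames boosting as coordinate descent on a convex empirical risk, with a dual weighting over examples on which the chosen weak learner is maximally correlated. The following Python sketch illustrates that primal view under stated assumptions: the margin matrix A with A[i, j] = y_i h_j(x_i), the exponential and logistic losses, and a bounded line search are illustrative stand-ins, not the paper's exact algorithm.

```python
# A minimal sketch of boosting as greedy coordinate descent on a convex
# empirical risk, in the spirit of the primal problem described above.
# The matrix `A`, the two losses, and the bounded line search are
# illustrative assumptions, not the paper's exact algorithm or constants.
import numpy as np
from scipy.optimize import minimize_scalar

# exponential loss (AdaBoost) and logistic loss, applied elementwise to margins
exp_loss = lambda z: np.exp(-z)
exp_grad = lambda z: -np.exp(-z)
logistic_loss = lambda z: np.logaddexp(0.0, -z)       # ln(1 + e^{-z}), overflow-safe
logistic_grad = lambda z: -1.0 / (1.0 + np.exp(z))

def boost(A, loss=exp_loss, grad=exp_grad, iterations=100):
    """Minimize lambda -> sum_i loss((A @ lambda)_i) one coordinate at a time.

    A[i, j] = y_i * h_j(x_i): weak hypothesis j evaluated on labeled example i.
    Returns the coefficient vector lambda over the weak hypotheses.
    """
    n, m = A.shape
    lam = np.zeros(m)
    for _ in range(iterations):
        margins = A @ lam
        weights = -grad(margins)            # dual weights on examples (unnormalized)
        scores = A.T @ weights              # correlation of each weak hypothesis with the weights
        j = int(np.argmax(np.abs(scores)))  # greedy choice of coordinate / weak hypothesis
        # bounded 1-d line search; the bounds keep the sketch finite when the
        # unconstrained minimizer sits at infinity (the weakly learnable case)
        step = minimize_scalar(lambda t: loss(margins + t * A[:, j]).sum(),
                               bounds=(-10.0, 10.0), method="bounded").x
        lam[j] += step
    return lam
```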
Adi Ben-Israel. Motzkin's transposition theorem, and the related theorems of Farkas, Gordan and Stiemke. In M. Hazewinkel, editor, Encyclopaedia of Mathematics, Supplement III. 2002.
Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, 2nd edition, 1999.
Peter J. Bickel, Yaacov Ritov, and Alon Zakai. Some theory for generalized boosting algorithms. Journal of Machine Learning Research, 7:705–732, 2006.
Jonathan Borwein and Adrian Lewis. Convex Analysis and Nonlinear Optimization. Springer Publishing Company, Incorporated, 2000.
Stéphane Boucheron, Olivier Bousquet, and Gabor Lugosi. Theory of classification: A survey of some recent advances. ESAIM: Probability and Statistics, 9:323–375, 2005.
Stephen P. Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
Leo Breiman. Prediction games and arcing algorithms. Neural Computation, 11:1493–1517, October 1999.
Michael Collins, Robert E. Schapire, and Yoram Singer. Logistic regression, AdaBoost and Bregman distances. Machine Learning, 48(1-3):253–285, 2002.
George B. Dantzig and Mukund N. Thapa. Linear Programming 2: Theory and Extensions. Springer, 2003.
Yoav Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256–285, 1995.
Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55(1):119–139, 1997.
Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.
Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Fundamentals of Convex Analysis. Springer Publishing Company, Incorporated, 2001.
Russell Impagliazzo. Hard-core distributions for somewhat hard problems. In FOCS, pages 538–545, 1995.
Jyrki Kivinen and Manfred K. Warmuth. Boosting as entropy projection. In COLT, pages 134–144, 1999.
Zhi-Quan Luo and Paul Tseng. On the convergence of the coordinate descent method for convex differentiable minimization. Journal of Optimization Theory and Applications, 72:7–35, 1992.
Llew Mason, Jonathan Baxter, Peter L. Bartlett, and Marcus R. Frean. Functional gradient techniques for combining hypotheses. In A.J. Smola, P.L. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 221–246, Cambridge, MA, 2000. MIT Press.
Indraneel Mukherjee, Cynthia Rudin, and Robert Schapire. The convergence rate of AdaBoost. In COLT, 2011.
Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer, 2nd edition, 2006.
Gunnar Rätsch and Manfred K. Warmuth. Maximizing the margin with boosting. In COLT, pages 334–350, 2002.
Gunnar Rätsch, Sebastian Mika, and Manfred K. Warmuth. On the convergence of leveraging. In NIPS, pages 487–494, 2001.
Robert E. Schapire. The strength of weak learnability. Machine Learning, 5:197–227, July 1990.
Robert E. Schapire. The convergence rate of AdaBoost. In COLT, 2010.
Robert E. Schapire and Yoav Freund. Boosting: Foundations and Algorithms. MIT Press, in preparation.
Robert E. Schapire and Yoram Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297–336, 1999.
Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. In ICML, pages 322–330, 1997.
Shai Shalev-Shwartz and Yoram Singer. On the equivalence of weak learnability and linear separability: New relaxations and efficient boosting algorithms. In COLT, pages 311–322, 2008.
Manfred K. Warmuth, Karen A. Glocer, and Gunnar Rätsch. Boosting algorithms for maximizing the soft margin. In NIPS, 2007.