jmlr jmlr2013 jmlr2013-114 jmlr2013-114-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Indraneel Mukherjee, Cynthia Rudin, Robert E. Schapire
Abstract: The AdaBoost algorithm was designed to combine many “weak” hypotheses that perform slightly better than random guessing into a “strong” hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the “exponential loss.” Unlike previous work, our proofs do not require a weak-learning assumption, nor do they require that minimizers of the exponential loss are finite. Our first result shows that the exponential loss of AdaBoost’s computed parameter vector will be at most ε more than that of any parameter vector of ℓ1-norm bounded by B in a number of rounds that is at most a polynomial in B and 1/ε. We also provide lower bounds showing that a polynomial dependence is necessary. Our second result is that within C/ε iterations, AdaBoost achieves a value of the exponential loss that is at most ε more than the best possible value, where C depends on the data set. We show that this dependence of the rate on ε is optimal up to constant factors, that is, at least Ω(1/ε) rounds are necessary to achieve within ε of the optimal exponential loss.
Keywords: AdaBoost, optimization, coordinate descent, convergence rate
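To illustrate the setting the abstract analyzes, the following is a minimal Python sketch (not the authors' code) of AdaBoost run as greedy coordinate descent on the exponential loss, tracking the loss after each round. The fixed matrix H of {-1, +1}-valued weak-hypothesis predictions, the synthetic data, and all function names are illustrative assumptions.

```python
import numpy as np

def exponential_loss(H, alpha, y):
    """Average exponential loss (1/m) * sum_i exp(-y_i * F(x_i)) for F = H @ alpha."""
    return np.mean(np.exp(-y * (H @ alpha)))

def adaboost(H, y, rounds):
    """Greedy coordinate descent on the exponential loss.

    H: (m, N) matrix of weak-hypothesis predictions in {-1, +1}
    y: (m,) labels in {-1, +1}
    Returns the parameter vector alpha and the loss after each round.
    """
    m, N = H.shape
    alpha = np.zeros(N)
    losses = []
    for _ in range(rounds):
        # Distribution over examples, proportional to exp(-y_i * F(x_i)).
        w = np.exp(-y * (H @ alpha))
        w /= w.sum()
        # Edge (weighted correlation with the labels) of each weak hypothesis.
        edges = H.T @ (w * y)
        j = int(np.argmax(np.abs(edges)))   # coordinate with the largest |edge|
        gamma = float(np.clip(edges[j], -1 + 1e-12, 1 - 1e-12))
        # Exact line search for {-1,+1}-valued hypotheses:
        # the step on coordinate j is (1/2) ln((1 + gamma) / (1 - gamma)).
        alpha[j] += 0.5 * np.log((1 + gamma) / (1 - gamma))
        losses.append(exponential_loss(H, alpha, y))
    return alpha, losses

# Illustrative usage on random predictions (hypothetical data, for demonstration only):
rng = np.random.default_rng(0)
H = rng.choice([-1.0, 1.0], size=(200, 50))
y = rng.choice([-1.0, 1.0], size=(200,))
alpha, losses = adaboost(H, y, rounds=100)
print(losses[0], losses[-1])  # the exponential loss is non-increasing over rounds
```

The suboptimality gap tracked by `losses` is the quantity whose decay rate (polynomial in B and 1/ε, or C/ε without a norm bound) the paper bounds.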
References:
P. L. Bartlett and M. Traskin. AdaBoost is consistent. Journal of Machine Learning Research, 8:2347–2368, 2007.
P. J. Bickel, Y. Ritov, and A. Zakai. Some theory for generalized boosting algorithms. Journal of Machine Learning Research, 7:705–732, 2006.
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
L. Breiman. Prediction games and arcing classifiers. Neural Computation, 11(7):1493–1517, 1999.
M. Collins, R. E. Schapire, and Y. Singer. Logistic regression, AdaBoost and Bregman distances. Machine Learning, 48(1/2/3), 2002.
M. Frean and T. Downs. A simple cost function for boosting. Technical report, Department of Computer Science and Electrical Engineering, University of Queensland, 1998.
Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, Aug. 1997.
J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. Annals of Statistics, 28(2):337–374, Apr. 2000.
J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), Oct. 2001.
A. J. Grove and D. Schuurmans. Boosting in the limit: Maximizing the margin of learned ensembles. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998.
D. G. Luenberger and Y. Ye. Linear and Nonlinear Programming. Springer, third edition, 2008.
Z. Q. Luo and P. Tseng. On the convergence of the coordinate descent method for convex differentiable minimization. Journal of Optimization Theory and Applications, 72(1):7–35, Jan. 1992.
L. Mason, J. Baxter, P. Bartlett, and M. Frean. Boosting algorithms as gradient descent. In Advances in Neural Information Processing Systems 12, 2000.
T. Onoda, G. Rätsch, and K.-R. Müller. An asymptotic analysis of AdaBoost in the binary classification case. In Proceedings of the 8th International Conference on Artificial Neural Networks, pages 195–200, 1998.
G. Rätsch and M. K. Warmuth. Efficient margin maximizing with boosting. Journal of Machine Learning Research, 6:2131–2152, 2005.
G. Rätsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Machine Learning, 42(3):287–320, 2001.
G. Rätsch, S. Mika, and M. K. Warmuth. On the convergence of leveraging. In Advances in Neural Information Processing Systems 14, 2002.
C. Rudin, I. Daubechies, and R. E. Schapire. The dynamics of AdaBoost: Cyclic behavior and convergence of margins. Journal of Machine Learning Research, 5:1557–1595, 2004.
C. Rudin, R. E. Schapire, and I. Daubechies. Analysis of boosting algorithms using the smooth margin function. Annals of Statistics, 35(6):2723–2768, 2007.
R. E. Schapire. The convergence rate of AdaBoost. In The 23rd Conference on Learning Theory, 2010. Open problem.
R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297–336, Dec. 1999.
R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics, 26(5):1651–1686, Oct. 1998.
S. Shalev-Shwartz and Y. Singer. On the equivalence of weak learnability and linear separability: New relaxations and efficient boosting algorithms. In 21st Annual Conference on Learning Theory, 2008.
M. Telgarsky. The convergence rate of AdaBoost and friends. http://arxiv.org/abs/1101.4752, Jan. 2011.
T. Zhang and B. Yu. Boosting with early stopping: Convergence and consistency. Annals of Statistics, 33(4):1538–1579, 2005.