jmlr jmlr2007 jmlr2007-9 jmlr2007-9-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Title: AdaBoost is Consistent
Author: Peter L. Bartlett, Mikhail Traskin
Abstract: The risk, or probability of error, of the classifier produced by the AdaBoost algorithm is investigated. In particular, we consider the stopping strategy to be used in AdaBoost to achieve universal consistency. We show that, provided AdaBoost is stopped after n^{1−ε} iterations (for sample size n and ε ∈ (0, 1)), the sequence of risks of the classifiers it produces approaches the Bayes risk. Keywords: boosting, AdaBoost, consistency
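The abstract's stopping rule is concrete enough to sketch in code. The following Python sketch is illustrative only and not taken from the paper: it runs standard AdaBoost, with decision stumps as an assumed base class, and stops after ceil(n^{1−ε}) rounds. The helper names fit_stump, adaboost, and predict are hypothetical.

import math
import numpy as np

def fit_stump(X, y, w):
    """Pick the stump (feature, threshold, polarity) minimizing weighted error."""
    n, d = X.shape
    best = (math.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] <= thr, s, -s)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, s)
    return best

def adaboost(X, y, eps=0.5):
    """AdaBoost on labels y in {-1, +1}, stopped after ceil(n^(1 - eps)) rounds."""
    n = len(y)
    t_n = math.ceil(n ** (1.0 - eps))  # the n^{1-eps} stopping rule from the abstract
    w = np.full(n, 1.0 / n)            # uniform initial example weights
    ensemble = []
    for _ in range(t_n):
        err, j, thr, s = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1.0 - 1e-12)   # guard against zero/one error
        alpha = 0.5 * math.log((1 - err) / err)   # usual AdaBoost step size
        pred = np.where(X[:, j] <= thr, s, -s)
        w *= np.exp(-alpha * y * pred)            # exponential reweighting
        w /= w.sum()
        ensemble.append((alpha, j, thr, s))
    return ensemble

def predict(ensemble, X):
    """Sign of the weighted vote of the stumps."""
    score = np.zeros(len(X))
    for alpha, j, thr, s in ensemble:
        score += alpha * np.where(X[:, j] <= thr, s, -s)
    return np.where(score >= 0, 1, -1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    ens = adaboost(X, y, eps=0.5)  # ceil(200^0.5) = 15 rounds
    print("training error:", np.mean(predict(ens, X) != y))

For n = 200 training examples and ε = 0.5, the rule stops the ensemble at ceil(200^{0.5}) = 15 rounds; this slow growth of the number of iterations with sample size is exactly what the paper's consistency result requires.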
References:
Martin Anthony and Peter L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
Peter L. Bartlett, Michael I. Jordan, and Jon D. McAuliffe. Discussion of boosting papers. The Annals of Statistics, 32(1):85–91, 2004.
Peter L. Bartlett, Michael I. Jordan, and Jon D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.
Eric Bauer and Ron Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting and variants. Machine Learning, 36:105–139, 1999.
Peter J. Bickel, Ya’acov Ritov, and Alon Zakai. Some theory for generalized boosting algorithms. Journal of Machine Learning Research, 7:705–732, May 2006.
Gilles Blanchard, Gábor Lugosi, and Nicolas Vayatis. On the rate of convergence of regularized boosting classifiers. Journal of Machine Learning Research, 4:861–894, 2003.
Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
Leo Breiman. Arcing the edge. Technical Report 486, Department of Statistics, University of California, Berkeley, 1997.
Leo Breiman. Prediction games and arcing algorithms. Neural Computation, 11:1493–1517, 1999. (Was Department of Statistics, U.C. Berkeley Technical Report 504, 1997.)
Leo Breiman. Arcing classifiers (with discussion). The Annals of Statistics, 26(3):801–849, 1998. (Was Department of Statistics, U.C. Berkeley Technical Report 460, 1996.)
Leo Breiman. Population theory for predictor ensembles. The Annals of Statistics, 32(1):1–11, 2004. (See also Department of Statistics, U.C. Berkeley Technical Report 579, 2000.)
Luc Devroye, László Györfi, and Gábor Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, New York, 1996.
Thomas G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40(2):139–158, 2000.
Harris Drucker and Corinna Cortes. Boosting decision trees. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 479–485. MIT Press, 1996.
Richard M. Dudley. Uniform Central Limit Theorems. Cambridge University Press, Cambridge, 1999.
Yoav Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121:256–285, 1995.
Yoav Freund and Robert E. Schapire. Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning, pages 148–156, San Francisco, 1996. Morgan Kaufmann.
Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28:337–407, 2000.
Adam J. Grove and Dale Schuurmans. Boosting in the limit: Maximizing the margin of learned ensembles. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pages 692–699, Menlo Park, CA, 1998. AAAI Press.
Wenxin Jiang. On weak base hypotheses and their implications for boosting regression and classification. The Annals of Statistics, 30:51–73, 2002.
Wenxin Jiang. Process consistency for AdaBoost. The Annals of Statistics, 32(1):13–29, 2004.
Vladimir Koltchinskii and Dmitry Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. The Annals of Statistics, 30:1–50, 2002.
Michel Ledoux and Michel Talagrand. Probability in Banach Spaces. Springer-Verlag, New York, 1991.
Gábor Lugosi and Nicolas Vayatis. On the Bayes-risk consistency of regularized boosting methods. The Annals of Statistics, 32(1):30–55, 2004.
Shie Mannor and Ron Meir. Weak learners and improved rates of convergence in boosting. In Advances in Neural Information Processing Systems 13, pages 280–286, 2001.
Shie Mannor, Ron Meir, and Tong Zhang. Greedy algorithms for classification – consistency, convergence rates, and adaptivity. Journal of Machine Learning Research, 4:713–742, 2003.
Llew Mason, Jonathan Baxter, Peter L. Bartlett, and Marcus Frean. Boosting algorithms as gradient descent. In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems 12, pages 512–518. MIT Press, 2000.
David Pollard. Convergence of Stochastic Processes. Springer-Verlag, New York, 1984.
David Pollard. Empirical Processes: Theory and Applications. IMS, 1990.
J. Ross Quinlan. Bagging, boosting, and C4.5. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 725–730, Menlo Park, CA, 1996. AAAI Press.
Lev Reyzin and Robert E. Schapire. How boosting the margin can also boost classifier complexity. In ICML ’06: Proceedings of the 23rd International Conference on Machine Learning, pages 753–760, New York, NY, USA, 2006. ACM Press. ISBN 1-59593-383-2. doi: http://doi.acm.org/10.1145/1143844.1143939.
Robert E. Schapire. The strength of weak learnability. Machine Learning, 5:197–227, 1990.
Robert E. Schapire, Yoav Freund, Peter L. Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26:1651–1686, 1998.
Xiaotong Shen, George C. Tseng, Xuegong Zhang, and Wing H. Wong. On ψ-learning. Journal of the American Statistical Association, 98(463):724–734, 2003.
Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 32(1):56–85, 2004.
Tong Zhang and Bin Yu. Boosting with early stopping: Convergence and consistency. The Annals of Statistics, 33:1538–1579, 2005.