NIPS 2006, Paper 21
Author: Peter L. Bartlett, Mikhail Traskin
Abstract: The risk, or probability of error, of the classifier produced by the AdaBoost algorithm is investigated. In particular, we consider the stopping strategy to be used in AdaBoost to achieve universal consistency. We show that, provided AdaBoost is stopped after n^ν iterations, for sample size n and ν < 1, the sequence of risks of the classifiers it produces approaches the Bayes risk if the Bayes risk L∗ > 0.
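To make the stopping rule concrete, the following is a minimal sketch (not the authors' code) of AdaBoost with decision stumps, stopped after ⌊n^ν⌋ rounds as in the abstract. The synthetic data generator, the choice ν = 0.5, and all function names are illustrative assumptions, not taken from the paper.

```python
# Sketch: AdaBoost on decision stumps, stopped after floor(n**nu) rounds.
# Everything below (nu = 0.5, the noisy linear data, the helper names) is an
# illustrative assumption, not the paper's experimental setup.
import numpy as np


def best_stump(X, y, w):
    """Exhaustively pick the axis-aligned stump minimizing weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, sign)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] <= thr, 1, -1)
                err = np.sum(w[pred != y])
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best


def adaboost_early_stop(X, y, nu=0.5):
    """Run AdaBoost for t_n = floor(n**nu) rounds; labels y must be in {-1, +1}."""
    n = len(y)
    t_n = max(1, int(np.floor(n ** nu)))  # stopping rule from the abstract
    w = np.full(n, 1.0 / n)               # uniform initial example weights
    stumps, alphas = [], []
    for _ in range(t_n):
        err, j, thr, sign = best_stump(X, y, w)
        err = np.clip(err, 1e-12, 1 - 1e-12)   # guard against zero/one error
        alpha = 0.5 * np.log((1 - err) / err)  # base-classifier weight
        pred = sign * np.where(X[:, j] <= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)         # exponential reweighting
        w /= w.sum()
        stumps.append((j, thr, sign))
        alphas.append(alpha)
    return stumps, alphas


def predict(X, stumps, alphas):
    """Sign of the weighted vote of the selected stumps."""
    score = np.zeros(len(X))
    for (j, thr, sign), alpha in zip(stumps, alphas):
        score += alpha * sign * np.where(X[:, j] <= thr, 1, -1)
    return np.where(score >= 0, 1, -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 400
    X = rng.normal(size=(n, 2))
    # Labels flipped with probability 0.1, so the Bayes risk L* is strictly
    # positive, matching the regime considered in the abstract.
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    flip = rng.random(n) < 0.1
    y[flip] = -y[flip]
    stumps, alphas = adaboost_early_stop(X, y, nu=0.5)
    print("rounds:", len(alphas),
          "training error:", np.mean(predict(X, stumps, alphas) != y))
```

The exhaustive stump search is quadratic in the sample size and is only meant to keep the sketch self-contained; the point of interest is the number of boosting rounds, which grows like n^ν rather than being tuned on held-out data.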
[1] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
[2] Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
[3] Leo Breiman. Arcing classifiers (with discussion). The Annals of Statistics, 26(3):801–849, 1998. (Originally Department of Statistics, U.C. Berkeley, Technical Report 460, 1996.)
[4] Leo Breiman. Some infinite theory for predictor ensembles. Technical Report 579, Department of Statistics, University of California, Berkeley, 2000.
[5] Wenxin Jiang. On weak base hypotheses and their implications for boosting regression and classification. The Annals of Statistics, 30:51–73, 2002.
[6] Gábor Lugosi and Nicolas Vayatis. On the Bayes-risk consistency of regularized boosting methods. The Annals of Statistics, 32(1):30–55, 2004.
[7] Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 32(1):56–85, 2004.
[8] Tong Zhang and Bin Yu. Boosting with early stopping: convergence and consistency. The Annals of Statistics, 33:1538–1579, 2005.
[9] Wenxin Jiang. Process consistency for AdaBoost. The Annals of Statistics, 32(1):13–29, 2004.
[10] P. J. Bickel, Y. Ritov, and A. Zakai. Some theory for generalized boosting algorithms. Journal of Machine Learning Research, 7:705–732, 2006.
[11] Martin Anthony and Peter Bartlett. Neural network learning: theoretical foundations. Cambridge University Press, 1999.
[12] V. Koltchinskii and D. Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. The Annals of Statistics, 30:1–50, 2002.
[13] Michel Ledoux and Michel Talagrand. Probability in Banach Spaces. Springer-Verlag, New York, 1991.
[14] Richard M. Dudley. Uniform central limit theorems. Cambridge University Press, Cambridge, 1999.
[15] David Pollard. Empirical Processes: Theory and Applications. IMS, 1990.
[16] Luc Devroye, László Györfi, and Gábor Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, New York, 1996.
[17] Peter L. Bartlett, Michael I. Jordan, and Jon D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.