jmlr2006-82-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Peter J. Bickel, Ya'acov Ritov, Alon Zakai
Abstract: We give a review of various aspects of boosting, clarifying the issues through a few simple results, and relate our work and that of others to the minimax paradigm of statistics. We consider the population version of the boosting algorithm and prove its convergence to the Bayes classifier as a corollary of a general result about Gauss-Southwell optimization in Hilbert space. We then investigate the algorithmic convergence of the sample version, and give bounds on the time until perfect separation of the sample. We conclude with some results on the statistical optimality of L2 boosting. Keywords: classification, Gauss-Southwell algorithm, AdaBoost, cross-validation, non-parametric convergence rate
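The abstract refers to the sample version of L2 boosting, where each step greedily selects the base learner (a Gauss-Southwell style choice of direction) that most reduces the squared-error residuals and adds a shrunken copy of it to the fit. The following is a minimal sketch of that idea in Python; the decision-stump base learners, the shrinkage value, and the fixed iteration count are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch of sample-version L2 boosting with componentwise greedy
# (Gauss-Southwell style) selection over decision-stump base learners.
# Illustrative choices throughout; not the paper's exact procedure.
import numpy as np

def fit_stump(x, r):
    """Fit a one-split stump to residuals r using a single feature x.
    Returns (threshold, left_value, right_value, squared_error) or None."""
    best = None
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        lv, rv = left.mean(), right.mean()
        err = ((left - lv) ** 2).sum() + ((right - rv) ** 2).sum()
        if best is None or err < best[3]:
            best = (t, lv, rv, err)
    return best

def l2_boost(X, y, n_iter=100, shrinkage=0.1):
    """Greedy stagewise least-squares boosting: at each step pick the
    (feature, stump) pair that most reduces the residual sum of squares."""
    n, d = X.shape
    F = np.zeros(n)                 # current fitted function on the sample
    ensemble = []                   # list of (feature index, stump parameters)
    for _ in range(n_iter):
        r = y - F                   # residuals of the squared-error loss
        best_j, best_stump = None, None
        for j in range(d):          # Gauss-Southwell-like sweep over directions
            stump = fit_stump(X[:, j], r)
            if stump is not None and (best_stump is None or stump[3] < best_stump[3]):
                best_j, best_stump = j, stump
        if best_stump is None:
            break
        t, lv, rv, _ = best_stump
        update = np.where(X[:, best_j] <= t, lv, rv)
        F += shrinkage * update     # shrunken greedy update
        ensemble.append((best_j, (t, lv, rv)))
    return F, ensemble

# Usage: boost toward a noisy sign target and inspect the training fit.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=200))
F, ensemble = l2_boost(X, y, n_iter=50)
print("training MSE:", np.mean((y - F) ** 2))
```

In practice the number of boosting iterations acts as the regularization parameter and would be chosen by cross-validation or an early-stopping rule rather than fixed in advance.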
P. K. Andersen and R. D. Gill. Cox's regression model for counting processes: a large sample study. Ann. Statist., 10:1100–1120, 1982.
Y. Baraud. Model selection for regression on a random design. Tech. Report, U. Paris Sud, 2001.
A. Barron, L. Birgé, and P. Massart. Risk bounds for model selection under penalization. Prob. Theory and Related Fields, 113:301–413, 1999.
P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Tech. Report 638, Department of Statistics, University of California at Berkeley, 2003.
P. J. Bickel and P. W. Millar. Uniform convergence of probability measures on classes of functions. Statistica Sinica, 2:1–15, 1992.
P. J. Bickel and Y. Ritov. The golden chain. A comment. Ann. Statist., 32:91–96, 2003.
L. Breiman. Arcing classifiers (with discussion). Ann. Statist., 26:801–849, 1998.
L. Breiman. Prediction games and arcing algorithms. Neural Computation, 11:1493–1517, 1999.
L. Breiman. Some infinity theory for predictor ensembles. Technical Report, U.C. Berkeley, 2000.
P. Bühlmann. Consistency for L2 boosting and matching pursuit with trees and tree type base functions. Technical Report, ETH Zürich, 2002.
P. Bühlmann and B. Yu. Boosting with the L2 loss: regression and classification. J. of Amer. Statist. Assoc., 98:324–339, 2003.
D. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Picard. Wavelet shrinkage: asymptopia (with discussion). J. Roy. Statist. Soc. Ser. B, 57:371–394, 1995.
J. H. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting (with discussion). Ann. Statist., 28:337–407, 2000.
Y. Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121:256–285, 1995.
Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In Machine Learning: Proc. 13th International Conference, pages 148–156. Morgan Kaufmann, San Francisco, 1996.
L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer, New York, 2002.
W. Jiang. Process consistency for AdaBoost. Technical Report 00-05, Dept. of Statistics, Northwestern University, 2002.
Y. Lee, Y. Lin, and G. Wahba. Multicategory support vector machines: theory and application to the classification of microarray data and satellite radiance data. J. of Amer. Statist. Assoc., 99:67–81, 2002.
D. G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley Publishing Company, Reading, 1984.
G. Lugosi and N. Vayatis. On the Bayes-risk consistency of boosting methods. Ann. Statist., 32:30–55, 2004.
S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41:3397–3415, 1993.
E. Mammen and A. Tsybakov. Smooth discrimination analysis. Ann. Statist., 27:1808–1829, 1999.
S. Mannor, R. Meir, and T. Zhang. Greedy algorithms for classification – consistency, convergence rates and adaptivity. J. of Machine Learning Research, 4:713–742, 2004.
L. Mason, P. Bartlett, J. Baxter, and M. Frean. Functional gradient techniques for combining hypotheses. In B. Schölkopf, A. Smola, P. Bartlett, and D. Schuurmans (eds.), Advances in Large Margin Classifiers, MIT Press, Boston, 2000.
R. E. Schapire. The strength of weak learnability. Machine Learning, 5:197–227, 1990.
R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37:297–336, 1999.
A. Tsybakov. Optimal aggregation of classifiers in statistical learning. Technical Report, U. of Paris IV, 2001.
A. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer, New York, 1996.
Y. Yang. Minimax nonparametric classification – Part I: Rates of convergence; Part II: Model selection. IEEE Trans. Inf. Theory, 45:2271–2292, 1999.
T. Zhang and B. Yu. Boosting with early stopping: convergence and consistency. Tech. Report 635, Dept. of Statistics, U.C. Berkeley, 2003.
T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Ann. Statist., 32:56–134, 2004.