jmlr2006-82-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Peter J. Bickel, Ya'acov Ritov, Alon Zakai
Abstract: We give a review of various aspects of boosting, clarifying the issues through a few simple results, and relate our work and that of others to the minimax paradigm of statistics. We consider the population version of the boosting algorithm and prove its convergence to the Bayes classifier as a corollary of a general result about Gauss-Southwell optimization in Hilbert space. We then investigate the algorithmic convergence of the sample version, and give bounds on the time until perfect separation of the sample. We conclude with some results on the statistical optimality of L2 boosting. Keywords: classification, Gauss-Southwell algorithm, AdaBoost, cross-validation, non-parametric convergence rate
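The abstract refers to the sample version of L2 boosting, where each step greedily selects the base learner (a Gauss-Southwell style choice of direction) that most reduces the squared-error residuals and adds a shrunken copy of it to the fit. The following is a minimal sketch of that idea in Python; the decision-stump base learners, the shrinkage value, and the fixed iteration count are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch of sample-version L2 boosting with componentwise greedy
# (Gauss-Southwell style) selection over decision-stump base learners.
# Illustrative choices throughout; not the paper's exact procedure.
import numpy as np

def fit_stump(x, r):
    """Fit a one-split stump to residuals r using a single feature x.
    Returns (threshold, left_value, right_value, squared_error) or None."""
    best = None
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        lv, rv = left.mean(), right.mean()
        err = ((left - lv) ** 2).sum() + ((right - rv) ** 2).sum()
        if best is None or err < best[3]:
            best = (t, lv, rv, err)
    return best

def l2_boost(X, y, n_iter=100, shrinkage=0.1):
    """Greedy stagewise least-squares boosting: at each step pick the
    (feature, stump) pair that most reduces the residual sum of squares."""
    n, d = X.shape
    F = np.zeros(n)                 # current fitted function on the sample
    ensemble = []                   # list of (feature index, stump parameters)
    for _ in range(n_iter):
        r = y - F                   # residuals of the squared-error loss
        best_j, best_stump = None, None
        for j in range(d):          # Gauss-Southwell-like sweep over directions
            stump = fit_stump(X[:, j], r)
            if stump is not None and (best_stump is None or stump[3] < best_stump[3]):
                best_j, best_stump = j, stump
        if best_stump is None:
            break
        t, lv, rv, _ = best_stump
        update = np.where(X[:, best_j] <= t, lv, rv)
        F += shrinkage * update     # shrunken greedy update
        ensemble.append((best_j, (t, lv, rv)))
    return F, ensemble

# Usage: boost toward a noisy sign target and inspect the training fit.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=200))
F, ensemble = l2_boost(X, y, n_iter=50)
print("training MSE:", np.mean((y - F) ** 2))
```

In practice the number of boosting iterations acts as the regularization parameter and would be chosen by cross-validation or an early-stopping rule rather than fixed in advance.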
P. K. Andersen and R. D. Gill. Cox's regression model for counting processes: a large sample study. Ann. Statist., 10:1100–1120, 1982.
Y. Baraud. Model selection for regression on a random design. Tech. Report, U. Paris Sud, 2001.
A. Barron, L. Birgé, and P. Massart. Risk bounds for model selection under penalization. Prob. Theory and Related Fields, 113:301–413, 1999.
P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Tech. Report 638, Department of Statistics, University of California at Berkeley, 2003.
P. J. Bickel and P. W. Millar. Uniform convergence of probability measures on classes of functions. Statistica Sinica, 2:1–15, 1992.
P. J. Bickel and Y. Ritov. The golden chain. A comment. Ann. Statist., 32:91–96, 2003.
L. Breiman. Arcing classifiers (with discussion). Ann. Statist., 26:801–849, 1998.
L. Breiman. Prediction games and arcing algorithms. Neural Computation, 11:1493–1517, 1999.
L. Breiman. Some infinity theory for predictor ensembles. Technical Report, U.C. Berkeley, 2000.
P. Bühlmann. Consistency for L2 boosting and matching pursuit with trees and tree type base functions. Technical Report, ETH Zürich, 2002.
P. Bühlmann and B. Yu. Boosting with the L2 loss: regression and classification. J. of Amer. Statist. Assoc., 98:324–339, 2003.
D. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Picard. Wavelet shrinkage: asymptopia (with discussion). J. Roy. Statist. Soc. Ser. B, 57:371–394, 1995.
J. H. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting (with discussion). Ann. Statist., 28:337–407, 2000.
Y. Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121:256–285, 1995.
Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In Machine Learning: Proc. 13th International Conference, pages 148–156. Morgan Kaufmann, San Francisco, 1996.
L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer, New York, 2002.
W. Jiang. Process consistency for AdaBoost. Technical Report 00-05, Dept. of Statistics, Northwestern University, 2002.
Y. Lee, Y. Lin, and G. Wahba. Multicategory support vector machines: theory and application to the classification of microarray data and satellite radiance data. J. of Amer. Statist. Assoc., 99:67–81, 2002.
D. G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley Publishing Company, Reading, 1984.
G. Lugosi and N. Vayatis. On the Bayes-risk consistency of boosting methods. Ann. Statist., 32:30–55, 2004.
S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41:3397–3415, 1993.
E. Mammen and A. Tsybakov. Smooth discrimination analysis. Ann. Statist., 27:1808–1829, 1999.
S. Mannor, R. Meir, and T. Zhang. Greedy algorithms for classification – consistency, convergence rates and adaptivity. J. of Machine Learning Research, 4:713–742, 2004.
L. Mason, P. Bartlett, J. Baxter, and M. Frean. Functional gradient techniques for combining hypotheses. In B. Schölkopf, A. Smola, P. Bartlett, and D. Schuurmans (eds.), Advances in Large Margin Classifiers, MIT Press, Boston, 2000.
R. E. Schapire. The strength of weak learnability. Machine Learning, 5:197–227, 1990.
R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37:297–336, 1999.
A. Tsybakov. Optimal aggregation of classifiers in statistical learning. Technical Report, U. of Paris IV, 2001.
A. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer, New York, 1996.
Y. Yang. Minimax nonparametric classification – Part I: Rates of convergence; Part II: Model selection. IEEE Trans. Inf. Theory, 45:2271–2292, 1999.
T. Zhang and B. Yu. Boosting with early stopping: convergence and consistency. Tech. Report 635, Dept. of Statistics, U.C. Berkeley, 2003.
T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Ann. Statist., 32:56–134, 2004.