
137 nips-2001-On the Convergence of Leveraging


Source: pdf

Author: Gunnar Rätsch, Sebastian Mika, Manfred K. Warmuth

Abstract: We give a unified convergence analysis of ensemble learning methods including, e.g., AdaBoost, Logistic Regression and the Least-Square-Boost algorithm for regression. These methods have in common that they iteratively call a base learning algorithm which returns hypotheses that are then linearly combined. We show that these methods are related to the Gauss-Southwell method known from numerical optimization and state non-asymptotic convergence results for all these methods. Our analysis includes ℓ1-norm regularized cost functions leading to a clean and general way to regularize ensemble learning.
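As a rough illustration of the leveraging scheme the abstract describes (iteratively calling a base learner on reweighted data and linearly combining the returned hypotheses), the following is a minimal AdaBoost-style sketch in Python. The exponential loss, the decision-stump base learner, and all names (fit_stump, leverage, n_iter) are illustrative assumptions, not the paper's general formulation.

import numpy as np

def fit_stump(X, y, w):
    # Weighted decision stump: return (feature, threshold, sign) minimizing weighted error.
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for s in (1.0, -1.0):
                pred = s * np.where(X[:, j] > thr, 1.0, -1.0)
                err = np.sum(w * (pred != y))
                if best is None or err < best[0]:
                    best = (err, j, thr, s)
    return best[1:]

def predict_stump(stump, X):
    j, thr, s = stump
    return s * np.where(X[:, j] > thr, 1.0, -1.0)

def leverage(X, y, n_iter=20):
    # AdaBoost-style leveraging loop for labels y in {-1, +1} with exponential loss.
    n = len(y)
    F = np.zeros(n)                 # current combined hypothesis evaluated on the sample
    ensemble = []
    for t in range(n_iter):
        w = np.exp(-y * F)          # example weights derived from the loss at the current F
        w /= w.sum()
        stump = fit_stump(X, y, w)  # call the base learner on the reweighted sample
        h = predict_stump(stump, X)
        err = np.sum(w * (h != y))
        if err >= 0.5 or err == 0.0:
            break                   # simplification: stop on a useless or perfect hypothesis
        alpha = 0.5 * np.log((1 - err) / err)  # closed-form step size for the exponential loss
        F += alpha * h
        ensemble.append((alpha, stump))
    return ensemble

Each iteration corresponds to one coordinate-wise (Gauss-Southwell-like) update of the linear combination; other leveraging methods covered by the paper differ mainly in the loss and in how the step size alpha is chosen.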


reference text

[1] H.H. Bauschke and J.M. Borwein. Legendre functions and the method of random Bregman projections. Journal of Convex Analysis, 4:27–67, 1997.

[2] K.P. Bennett, A. Demiriz, and J. Shawe-Taylor. A column generation algorithm for boosting. In P. Langley, editor, Proceedings, 17th ICML, pages 65–72. Morgan Kaufmann, 2000.

[3] L. Breiman. Prediction games and arcing algorithms. Neural Comp., 11(7):1493–1518, 1999.

[4] N. Cesa-Bianchi, A. Krogh, and M. Warmuth. Bounds on approximate steepest descent for likelihood maximization in exponential families. IEEE Trans. Inf. Th., 40(4):1215–1220, 1994.

[5] M. Collins, R.E. Schapire, and Y. Singer. Logistic Regression, AdaBoost and Bregman distances. In Proc. COLT, pages 158–169, San Francisco, 2000. Morgan Kaufmann.

[6] J. Copas. Regression, prediction and shrinkage. J.R. Statist. Soc. B, 45:311–354, 1983.

[7] S. Della Pietra, V. Della Pietra, and J. Lafferty. Duality and auxiliary functions for Bregman distances. TR CMU-CS-01-109, Carnegie Mellon University, 2001.

[8] N. Duffy and D.P. Helmbold. A geometric approach to leveraging weak learners. In P. Fischer and H. U. Simon, editors, Proc. EuroCOLT ’99, pages 18–33, 1999.

[9] N. Duffy and D.P. Helmbold. Potential boosters? In S.A. Solla, T.K. Leen, and K.-R. Müller, editors, NIPS, volume 12, pages 258–264. MIT Press, 2000.

[10] Y. Freund and R.E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.

[11] J. Friedman, T. Hastie, and R.J. Tibshirani. Additive Logistic Regression: a statistical view of boosting. Annals of Statistics, 28(2):337–374, 2000.

[12] J.H. Friedman. Greedy function approximation. Tech. rep., Stanford University, 1999.

[13] A.J. Hoffman. On approximate solutions of systems of linear inequalities. Journal of Research of the National Bureau of Standards, 49(4):263–265, October 1952.

[14] J. Kivinen and M. Warmuth. Boosting as entropy projection. In Proc. 12th Annu. Conference on Comput. Learning Theory, pages 134–144. ACM Press, New York, NY, 1999.

[15] D.G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley Publishing Co., Reading, second edition, May 1984. Reprinted with corrections in May, 1989.

[16] Z.-Q. Luo and P. Tseng. On the convergence of the coordinate descent method for convex differentiable minimization. Journal of Optimization Theory and Applications, 72(1):7–35, 1992.

[17] L. Mason, J. Baxter, P.L. Bartlett, and M. Frean. Functional gradient techniques for combining hypotheses. In Adv. Large Margin Class., pages 221–247. MIT Press, 2000.

[18] T. Onoda, G. Rätsch, and K.-R. Müller. An asymptotic analysis of AdaBoost in the binary classification case. In L. Niklasson, M. Bodén, and T. Ziemke, editors, Proc. of the Int. Conf. on Artificial Neural Networks (ICANN'98), pages 195–200, March 1998.

[19] G. Rätsch. Robust Boosting via Convex Optimization. PhD thesis, University of Potsdam, October 2001. http://mlg.anu.edu.au/~raetsch/thesis.ps.gz.

[20] G. Rätsch, A. Demiriz, and K. Bennett. Sparse regression ensembles in infinite and finite hypothesis spaces. Machine Learning, 48(1-3):193–221, 2002.

[21] G. Rätsch, S. Mika, and M.K. Warmuth. On the convergence of leveraging. NeuroCOLT2 Technical Report 98, Royal Holloway College, London, 2001.

[22] G. Rätsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Machine Learning, 42(3):287–320, March 2001. Also NeuroCOLT Technical Report NC-TR-1998-021.

[23] G. Rätsch and M.K. Warmuth. Marginal boosting. NeuroCOLT2 Technical Report 97, 2001.

[24] R.T. Rockafellar. Convex Analysis. Princeton University Press, 1970.

[25] Y. Singer. Leveraged vector machines. In S.A. Solla, T.K. Leen, and K.-R. Müller, editors, NIPS, volume 12, pages 610–616. MIT Press, 2000.

[26] R.J. Tibshirani. Regression shrinkage and selection via the lasso. Technical report, Department of Statistics, University of Toronto, June 1994. ftp://utstat.toronto.edu/pub/tibs/lasso.ps.

[27] T. Zhang. A general greedy approximation algorithm with applications. In Advances in Neural Information Processing Systems, volume 14. MIT Press, 2002. In press.