jmlr jmlr2005 jmlr2005-29 jmlr2005-29-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Gunnar Rätsch, Manfred K. Warmuth
Abstract: AdaBoost produces a linear combination of base hypotheses and predicts with the sign of this linear combination. The linear combination may be viewed as a hyperplane in feature space where the base hypotheses form the features. It has been observed that the generalization error of the algorithm continues to improve even after all examples are on the correct side of the current hyperplane. The improvement is attributed to the experimental observation that the distances (margins) of the examples to the separating hyperplane are increasing even after all examples are on the correct side. We introduce a new version of AdaBoost, called AdaBoost∗ν, that explicitly maximizes the minimum margin of the examples up to a given precision. The algorithm incorporates a current estimate of the achievable margin into its calculation of the linear coefficients of the base hypotheses. The bound on the number of iterations needed by the new algorithm is the same as the number needed by a known version of AdaBoost that must have an explicit estimate of the achievable margin as a parameter. We also illustrate experimentally that our algorithm requires considerably fewer iterations than other algorithms that aim to maximize the margin.
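To make the abstract's description concrete, the following is a minimal Python sketch of an AdaBoost∗ν-style training loop, assuming binary labels y in {-1, +1} and base hypotheses that also predict in {-1, +1}. The coefficient alpha combines the usual AdaBoost term with a correction based on a running margin estimate rho = (minimum edge seen so far) - nu, which is how the abstract's "current estimate of the achievable margin" enters the coefficient calculation. The function names (adaboost_star, stump_learner) and the numerical clipping are illustrative additions, not from the paper.

import numpy as np

def stump_learner(X, y, d):
    # Illustrative weak learner: best weighted decision stump
    # (single feature, single threshold, either sign).
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for s in (1.0, -1.0):
                pred = s * np.where(X[:, j] > thr, 1.0, -1.0)
                edge = float(np.sum(d * y * pred))
                if best is None or edge > best[0]:
                    best = (edge, j, thr, s)
    _, j, thr, s = best
    return lambda Xq, j=j, thr=thr, s=s: s * np.where(Xq[:, j] > thr, 1.0, -1.0)

def adaboost_star(X, y, weak_learner, nu=0.1, T=100):
    # AdaBoost*_nu-style loop: the linear coefficient of each base
    # hypothesis incorporates a running estimate of the achievable margin.
    n = len(y)
    d = np.full(n, 1.0 / n)        # distribution over examples
    eps = 1e-12                    # numerical safeguard (not in the paper)
    hyps, alphas, min_edge = [], [], 1.0
    for _ in range(T):
        h = weak_learner(X, y, d)
        pred = h(X)
        gamma = float(np.clip(np.sum(d * y * pred), -1 + eps, 1 - eps))  # edge
        min_edge = min(min_edge, gamma)
        rho = float(np.clip(min_edge - nu, -1 + eps, 1 - eps))  # margin estimate
        # usual AdaBoost coefficient minus a correction based on rho
        alpha = 0.5 * np.log((1 + gamma) / (1 - gamma)) \
              - 0.5 * np.log((1 + rho) / (1 - rho))
        d = d * np.exp(-alpha * y * pred)   # reweight examples
        d /= d.sum()
        hyps.append(h)
        alphas.append(alpha)
    a = np.array(alphas) / np.sum(alphas)   # normalized coefficients
    return lambda Xq: np.sign(sum(ai * h(Xq) for ai, h in zip(a, hyps)))

A classifier would be obtained as f = adaboost_star(X, y, stump_learner, nu=0.05, T=200) and applied as f(X). Note that the correction term makes alpha vanish when the current edge gamma drops to the estimate rho, so the step sizes shrink as the minimum margin approaches the best edge seen so far to within the precision nu.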
References:
F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In Twenty-First International Conference on Machine Learning. ACM Press, 2004. ISBN 1-58113-828-5.
K. P. Bennett, A. Demiriz, and J. Shawe-Taylor. A column generation algorithm for boosting. In P. Langley, editor, Proceedings, 17th ICML, pages 65–72, San Francisco, 2000. Morgan Kaufmann.
L. Breiman. Are margins relevant in voting? Talk at the NIPS'98 workshop on Large Margin Classifiers, December 1998.
L. Breiman. Prediction games and arcing algorithms. Neural Computation, 11(7):1493–1518, 1999. Also Technical Report 504, Statistics Department, University of California Berkeley, Dec. 1997.
B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap, volume 57 of Monographs on Statistics and Applied Probability. Chapman and Hall/CRC, New York, 1994.
Y. Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256–285, September 1995.
Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In Proc. 13th International Conference on Machine Learning, pages 148–156. Morgan Kaufmann, 1996.
Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
Y. Freund and R. E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29:79–103, 1999.
A. J. Grove and D. Schuurmans. Boosting in the limit: Maximizing the margin of learned ensembles. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998.
R. Hettich and K. O. Kortanek. Semi-infinite programming: Theory, methods and applications. SIAM Review, 35(3):380–429, September 1993.
J. Kivinen and M. Warmuth. Boosting as entropy projection. In Proc. 12th Annu. Conference on Comput. Learning Theory, pages 134–144. ACM Press, New York, NY, 1999.
V. Koltchinskii, D. Panchenko, and F. Lozano. Some new bounds on the generalization error of combined classifiers. In Advances in Neural Information Processing Systems, volume 13, 2001.
J. Lafferty. Additive models, boosting, and inference for generalized divergences. In Proc. 12th Annu. Conf. on Comput. Learning Theory, pages 125–133, New York, NY, 1999. ACM Press.
G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble. A statistical framework for genomic data fusion. Bioinformatics, 2004.
O. L. Mangasarian. Arbitrary-norm separating plane. Operations Research Letters, 24(1):15–23, 1999.
S. Nash and A. Sofer. Linear and Nonlinear Programming. McGraw-Hill, New York, NY, 1996.
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1992.
J. R. Quinlan. Boosting first-order learning. Lecture Notes in Computer Science, 1160:143, 1996.
G. Rätsch. Robust Boosting via Convex Optimization. PhD thesis, University of Potsdam, Neues Palais 10, 14469 Potsdam, Germany, October 2001.
G. Rätsch, A. Demiriz, and K. Bennett. Sparse regression ensembles in infinite and finite hypothesis spaces. Machine Learning, 48(1-3):193–221, 2002. Special Issue on New Methods for Model Selection and Model Combination. Also NeuroCOLT2 Technical Report NC-TR-2000-085.
G. Rätsch, S. Mika, B. Schölkopf, and K.-R. Müller. Constructing boosting algorithms from SVMs: an application to one-class classification. IEEE PAMI, 24(9), September 2002. In press. Earlier version is GMD TechReport No. 119, 2000.
G. Rätsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Machine Learning, 42(3):287–320, March 2001. Also NeuroCOLT Technical Report NC-TR-1998-021.
G. Rätsch and M. K. Warmuth. Maximizing the margin with boosting. In Proc. COLT, volume 2375 of LNAI, pages 319–333, Sydney, 2002. Springer.
S. Rosset, J. Zhu, and T. Hastie. Boosting as a regularized path to a maximum margin separator. Technical report, Department of Statistics, Stanford University, 2002.
C. Rudin, I. Daubechies, and R. E. Schapire. On the dynamics of boosting. In Advances in Neural Information Processing Systems, volume 15, 2004a.
C. Rudin, I. Daubechies, and R. E. Schapire. The dynamics of AdaBoost: Cyclic behavior and convergence of margins. Journal of Machine Learning Research, 2005.
C. Rudin, R. E. Schapire, and I. Daubechies. Analysis of boosting algorithms using the smooth margin function: A study of three algorithms. Unpublished manuscript, October 2004b.
C. Rudin, R. E. Schapire, and I. Daubechies. Boosting based on a smooth margin. In Proc. COLT'04, LNCS. Springer Verlag, July 2004c.
R. E. Schapire. The Design and Analysis of Efficient Learning Algorithms. PhD thesis, MIT Press, 1992.
R. E. Schapire, Y. Freund, P. L. Bartlett, and W. S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651–1686, October 1998.
R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297–336, December 1999.
S. Sonnenburg, G. Rätsch, and C. Schäfer. Learning interpretable SVMs for biological sequence analysis. In Proc. RECOMB'05, LNCB. Springer Verlag, 2005.
J. von Neumann. Zur Theorie der Gesellschaftsspiele. Math. Ann., 100:295–320, 1928.
T. Zhang. Sequential greedy approximation for certain convex optimization problems. Technical report, IBM T. J. Watson Research Center, 2002.