
NIPS 2001, Paper 45: Boosting and Maximum Likelihood for Exponential Models


Source: pdf

Authors: Guy Lebanon, John D. Lafferty

Abstract: We derive an equivalence between AdaBoost and the dual of a convex optimization problem, showing that the only difference between minimizing the exponential loss used by AdaBoost and maximum likelihood for exponential models is that the latter requires the model to be normalized to form a conditional probability distribution over labels. In addition to establishing a simple and easily understood connection between the two methods, this framework enables us to derive new regularization procedures for boosting that directly correspond to penalized maximum likelihood. Experiments on UCI datasets support our theoretical analysis and give additional insight into the relationship between boosting and logistic regression.
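
As a rough illustration of the normalization point made in the abstract (the notation below is generic and not taken verbatim from the paper): with a combined hypothesis F_\lambda(x) = \sum_j \lambda_j h_j(x) and binary labels y \in \{-1,+1\}, AdaBoost minimizes the unnormalized exponential loss

    \mathrm{ExpLoss}(\lambda) = \sum_i \exp\bigl(-y_i F_\lambda(x_i)\bigr),

whereas maximum likelihood for the normalized conditional model p_\lambda(y \mid x) = \exp\bigl(y F_\lambda(x)\bigr) / \bigl[\exp\bigl(F_\lambda(x)\bigr) + \exp\bigl(-F_\lambda(x)\bigr)\bigr] minimizes

    -\log L(\lambda) = \sum_i \log\bigl(1 + \exp\bigl(-2\, y_i F_\lambda(x_i)\bigr)\bigr).

Both criteria depend on the data only through the margins y_i F_\lambda(x_i); the difference is the per-example normalization over labels in the likelihood objective.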


reference text

[1] S. Chen and R. Rosenfeld. A survey of smoothing techniques for ME models. IEEE Transactions on Speech and Audio Processing, 8(1), 2000.

[2] M. Collins, R. E. Schapire, and Y. Singer. Logistic regression, AdaBoost and Bregman distances. Machine Learning, to appear.

[3] S. Della Pietra, V. Della Pietra, and J. Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4), 1997.

[4] S. Della Pietra, V. Della Pietra, and J. Lafferty. Duality and auxiliary functions for Bregman distances. Technical Report CMU-CS-01-109, Carnegie Mellon University, 2001.

[5] N. Duffy and D. Helmbold. Potential boosters? In Neural Information Processing Systems, 2000.

[6] Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In International Conference on Machine Learning, 1996.

[7] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28(2), 2000.

[8] J. Kivinen and M. K. Warmuth. Boosting as entropy projection. In Computational Learning Theory, 1999.

[9] J. Lafferty. Additive models, boosting, and inference for generalized divergences. In Computational Learning Theory, 1999.

[10] L. Mason, J. Baxter, P. Bartlett, and M. Frean. Functional gradient techniques for combining hypotheses. In A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, 1999.

[11] G. Rätsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Machine Learning, 2001.