jmlr jmlr2009 jmlr2009-52 jmlr2009-52-reference knowledge-graph by maker-knowledge-mining

52 jmlr-2009-Margin-based Ranking and an Equivalence between AdaBoost and RankBoost

Source: pdf

Author: Cynthia Rudin, Robert E. Schapire

Abstract: We study boosting algorithms for learning to rank. We give a general margin-based bound for ranking based on covering numbers for the hypothesis space. Our bound suggests that algorithms that maximize the ranking margin will generalize well. We then describe a new algorithm, smooth margin ranking, that precisely converges to a maximum ranking-margin solution. The algorithm is a modiﬁcation of RankBoost, analogous to “approximate coordinate ascent boosting.” Finally, we prove that AdaBoost and RankBoost are equally good for the problems of bipartite ranking and classiﬁcation in terms of their asymptotic behavior on the training set. Under natural conditions, AdaBoost achieves an area under the ROC curve that is equally as good as RankBoost’s; furthermore, RankBoost, when given a speciﬁc intercept, achieves a misclassiﬁcation error that is as good as AdaBoost’s. This may help to explain the empirical observations made by Cortes and Mohri, and Caruana and Niculescu-Mizil, about the excellent performance of AdaBoost as a bipartite ranking algorithm, as measured by the area under the ROC curve. Keywords: ranking, RankBoost, generalization bounds, AdaBoost, area under the ROC curve

reference text

Shivani Agarwal, Thore Graepel, Ralf Herbich, Sariel Har-Peled, and Dan Roth. Generalization bounds for the area under the ROC curve. Journal of Machine Learning Research, 6:393–425, 2005. Peter L. Bartlett. The sample complexity of pattern classiﬁcation with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2):525–536, March 1998. Peter L. Bartlett and Shahar Mendelson. Rademacher and gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002. Olivier Bousquet. New approaches to statistical learning theory. Annals of the Institute of Statistical Mathematics, 55(2):371–389, 2003. Ulf Brefeld and Tobias Scheffer. AUC maximizing support vector learning. In Proceedings of the ICML 2005 Workshop on ROC Analysis in Machine Learning, 2005. Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender. Learning to rank using gradient descent. In Proceedings of the Twenty-second International Conference on Machine Learning, pages 89–96, 2005. Rich Caruana and Alexandru Niculescu-Mizil. An empirical comparison of supervised learning algorithms. In Proceedings of the Twenty-third International Conference on Machine Learning, 2006. St´ phan Clemencon and Nicolas Vayatis. Ranking the best instances. Journal of Machine Learning e ¸ Research, 8:2671–2699, Dec 2007. St´ phan Clemencon, Gabor Lugosi, and Nicolas Vayatis. Ranking and scoring using empirical risk e ¸ minimization. In Proceedings of the Eighteenth Annual Conference on Learning Theory, 2005. St´ phan Clemencon, Gabor Lugosi, and Nicolas Vayatis. Ranking and empirical minimization of e ¸ U-statistics. The Annals of Statistics, 36(2):844–874, 2007. Michael Collins, Robert E. Schapire, and Yoram Singer. Logistic regression, AdaBoost and Bregman distances. Machine Learning, 48(1/2/3), 2002. Corinna Cortes and Mehryar Mohri. AUC optimization vs. error rate minimization. In Advances in Neural Information Processing Systems 16, 2004. Corinna Cortes and Mehryar Mohri. Conﬁdence intervals for the area under the ROC curve. In Advances in Neural Information Processing Systems 17, 2005. 2230 M ARGIN -BASED R ANKING AND A DA B OOST Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, September 1995. David Cossock and Tong Zhang. Statistical analysis of bayes optimal subset ranking. IEEE Trans. Info. Theory, 54:4140–5154, 2008. Ofer Dekel, Christopher Manning, and Yoram Singer. Log-linear models for label ranking. In Advances in Neural Information Processing Systems 16, 2004. Stephen Della Pietra, Vincent Della Pietra, and John Lafferty. Duality and auxiliary functions for Bregman distances. Technical Report CMU-CS-01-109R, School of Computer Science, Carnegie Mellon University, 2002. Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, August 1997. Yoav Freund and Robert E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behaviour, 29:79–103, 1999. Yoav Freund, Raj Iyer, Robert E. Schapire, and Yoram Singer. An efﬁcient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933–969, 2003. Martin Gardner. The Colossal Book of Mathematics, chapter on More Nontransitive Paradoxes, pages 297–311. Norton, 2001. Quoc Le and Alex Smola. Direct optimization of ranking measures. arXiv:0704.3359v1, November 2007. Colin McDiarmid. On the method of bounded differences. In Surveys in Combinatorics 1989, pages 148–188. Cambridge University Press, 1989. Alain Rakotomamonjy. Optimizing AUC with support vector machine (SVM). In Proceedings of European Conference on Artiﬁcial Intelligence Workshop on ROC Curve and AI, Valencia, Spain, 2004. Cynthia Rudin. The P-Norm Push: A simple convex ranking algorithm that concentrates at the top of the list. Journal of Machine Learning Research, 10:2233–2271, October 2009. Cynthia Rudin, Ingrid Daubechies, and Robert E. Schapire. The dynamics of AdaBoost: Cyclic behavior and convergence of margins. Journal of Machine Learning Research, 5:1557–1595, December 2004. Cynthia Rudin, Corinna Cortes, Mehryar Mohri, and Robert E. Schapire. Margin-based ranking meets boosting in the middle. In Proceedings of the Eighteenth Annual Conference on Learning Theory, pages 63–78, 2005. Cynthia Rudin, Robert E. Schapire, and Ingrid Daubechies. Analysis of boosting algorithms using the smooth margin function. The Annals of Statistics, 35(6):2723–2768, 2007. Robert E. Schapire and Yoram Singer. Improved boosting algorithms using conﬁdence-rated predictions. Machine Learning, 37(3):297–336, December 1999. 2231 RUDIN AND S CHAPIRE Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651–1686, October 1998. Shai Shalev-Shwartz and Yoram Singer. Efﬁcient learning of label ranking by soft projections onto polyhedra. Journal of Machine Learning Research, 7:1567–1599, December 2006. Nicolas Usunier, Massih-Reza Amini, and Patrick Gallinari. A data-dependent generalisation error bound for the AUC. In Proceedings of the ICML 2005 Workshop on ROC Analysis in Machine Learning, 2005. Tong Zhang and Bin Yu. Boosting with early stopping - convergence and consistency. The Annals of Statistics, 33(4):1538–1579, 2005. 2232