
jmlr-2011-71: On Equivalence Relationships Between Classification and Ranking Algorithms


Source: pdf

Author: Şeyda Ertekin, Cynthia Rudin

Abstract: We demonstrate that there are machine learning algorithms that can achieve success for two separate tasks simultaneously, namely the tasks of classification and bipartite ranking. This means that advantages gained from solving one task can be carried over to the other task, such as the ability to obtain conditional density estimates, and an order-of-magnitude reduction in computational time for training the algorithm. It also means that some algorithms are robust to the choice of evaluation metric used; they can theoretically perform well when performance is measured either by a misclassification error or by a statistic of the ROC curve (such as the area under the curve). Specifically, we provide such an equivalence relationship between a generalization of Freund et al.’s RankBoost algorithm, called the “P-Norm Push,” and a particular cost-sensitive classification algorithm that generalizes AdaBoost, which we call “P-Classification.” We discuss and validate the potential benefits of this equivalence relationship, and perform controlled experiments to understand P-Classification’s empirical performance. There is no established equivalence relationship for logistic regression and its ranking counterpart, so we introduce a logistic-regression-style algorithm that aims in between classification and ranking, and has promising experimental performance with respect to both tasks.

Keywords: supervised classification, bipartite ranking, area under the curve, rank statistics, boosting, logistic regression
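To make the two evaluation metrics named in the abstract concrete, here is a minimal sketch (not taken from the paper; the function names and data are illustrative assumptions) that scores a small data set with a single scoring function and evaluates it both ways: by misclassification error after thresholding, and by the AUC, which in the bipartite setting equals the fraction of positive-negative pairs that the scores order correctly. An equivalence result of the kind the paper studies means one learned scoring function can do well under both measures.

import numpy as np

def misclassification_error(scores, labels, threshold=0.0):
    """0-1 loss when sign(score - threshold) is used as the class prediction."""
    predictions = np.where(scores > threshold, 1, -1)
    return np.mean(predictions != labels)

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly by the scores."""
    pos = scores[labels == 1]
    neg = scores[labels == -1]
    # Pairwise comparisons via broadcasting; ties count as half a correct ordering.
    correct = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return correct / (len(pos) * len(neg))

# Hypothetical usage with arbitrary real-valued scores and +/-1 labels:
scores = np.array([2.1, 0.3, -0.4, -1.7, 0.9])
labels = np.array([1, 1, -1, -1, -1])
print(misclassification_error(scores, labels))  # classification view
print(auc(scores, labels))                      # bipartite ranking view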


References

Maria-Florina Balcan, Nikhil Bansal, Alina Beygelzimer, Don Coppersmith, John Langford, and Gregory B. Sorkin. Robust reductions from ranking to classification. Machine Learning, 72:139–153, 2008.

Leo Breiman. Arcing the edge. Technical Report 486, Statistics Department, University of California at Berkeley, 1997.

Chris Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning (ICML), pages 89–96, 2005.

Rich Caruana and Alexandru Niculescu-Mizil. An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 161–168, 2006.

Michael Collins, Robert E. Schapire, and Yoram Singer. Logistic regression, AdaBoost and Bregman distances. Machine Learning, 48:253–285, 2002.

Nigel Duffy and David P. Helmbold. A geometric approach to leveraging weak learners. In European Conference on Computational Learning Theory (EuroCOLT). Springer-Verlag, 1999.

Wei Fan, Salvatore J. Stolfo, Junxin Zhang, and Philip K. Chan. AdaCost: Misclassification cost-sensitive boosting. In Proceedings of the 16th International Conference on Machine Learning (ICML), pages 97–105, 1999.

Andrew Frank and Arthur Asuncion. UCI machine learning repository, 2010. URL http://archive.ics.uci.edu/ml.

Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:119–139, 1997.

Yoav Freund, Raj Iyer, Robert E. Schapire, and Yoram Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research (JMLR), 4:933–969, 2003.

Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Additive logistic regression: A statistical view of boosting. Annals of Statistics, 28(2):337–374, 2000.

Wojciech Kotlowski, Krzysztof Dembczynski, and Eyke Huellermeier. Bipartite ranking through minimization of univariate loss. In Proceedings of the 28th International Conference on Machine Learning (ICML), pages 1113–1120, 2011.

Aurélie C. Lozano and Naoki Abe. Multi-class cost-sensitive boosting with p-norm loss functions. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 506–514, 2008.

Llew Mason, Jonathan Baxter, Peter Bartlett, and Marcus Frean. Boosting algorithms as gradient descent. In Proceedings of Neural Information Processing Systems (NIPS), 2000.

David Mease, Abraham J. Wyner, and Andreas Buja. Boosted classification trees and class probability/quantile estimation. Journal of Machine Learning Research (JMLR), 8:409–439, 2007.

Indraneel Mukherjee, Cynthia Rudin, and Robert Schapire. The rate of convergence of AdaBoost. In Proceedings of the 24th Annual Conference on Learning Theory (COLT), 2011.

Gunnar Rätsch, Takashi Onoda, and Klaus-Robert Müller. Soft margins for AdaBoost. Machine Learning, 42(3):287–320, 2001.

Cynthia Rudin. The P-Norm Push: A simple convex ranking algorithm that concentrates at the top of the list. Journal of Machine Learning Research (JMLR), 10:2233–2271, 2009.

Cynthia Rudin and Robert E. Schapire. Margin-based ranking and an equivalence between AdaBoost and RankBoost. Journal of Machine Learning Research (JMLR), 10:2193–2232, 2009.

Robert E. Schapire and Yoav Freund. Boosting: Foundations and Algorithms. MIT Press, 2011. In preparation.

Yanmin Sun, Mohamed S. Kamel, Andrew K. C. Wong, and Yang Wang. Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition, 40:3358–3378, 2007.

David H. Wolpert and William G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.