jmlr jmlr2007 jmlr2007-70 jmlr2007-70-reference knowledge-graph by maker-knowledge-mining

70 jmlr-2007-Ranking the Best Instances

Source: pdf

Author: Stéphan Clémençon, Nicolas Vayatis

Abstract: We formulate a local form of the bipartite ranking problem where the goal is to focus on the best instances. We propose a methodology based on the construction of real-valued scoring functions. We study empirical risk minimization of dedicated statistics which involve empirical quantiles of the scores. We ﬁrst state the problem of ﬁnding the best instances which can be cast as a classiﬁcation problem with mass constraint. Next, we develop special performance measures for the local ranking problem which extend the Area Under an ROC Curve (AUC) criterion and describe the optimal elements of these new criteria. We also highlight the fact that the goal of ranking the best instances cannot be achieved in a stage-wise manner where ﬁrst, the best instances would be tentatively identiﬁed and then a standard AUC criterion could be applied. Eventually, we state preliminary statistical results for the local ranking problem. Keywords: ranking, ROC curve and AUC, empirical risk minimization, fast rates

reference text

S. Agarwal, T. Graepel, R. Herbrich, S. Har-Peled, and D. Roth. Generalization bounds for the area under the ROC curve. Journal of Machine Learning Research, 6:393–425, 2005. P. Bartlett, M. Jordan, and J. McAuliffe. Convexity, classiﬁcation, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006. S. Boucheron, O. Bousquet, and G. Lugosi. Theory of classiﬁcation: A survey of some recent advances. ESAIM: Probability and Statistics, 9:323–375, 2005. L. Cavalier. Nonparametric estimation of regression level sets. Statistics, 29:131–160, 1997. S. Cl´ mencon, G. Lugosi, and N. Vayatis. Ranking and scoring using empirical risk minimization. e ¸ In P. Auer and R. Meir, editors, Proceedings of COLT 2005, volume 3559 of Lecture Notes in Computer Science, pages 1–15. Springer, 2005. S. Cl´ mencon, G. Lugosi, and N. Vayatis. Ranking and empirical risk minimization of U-statistics. e ¸ The Annals of Statistics, To appear. C. Cortes and M. Mohri. Auc optimization vs. error rate minimization. In S. Thrun, L. Saul, and B. Sch¨ lkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, o Cambridge, MA, 2004. D. Cossock and T. Zhang. Statistical analysis of Bayes optimal subset ranking. Technical report, Yahoo! Research, 2006. L. Devroye, L. Gy¨ rﬁ, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, o 1996. 2697 ´ C L E MENC ON AND VAYATIS ¸ L. E. Dodd and M. S. Pepe. Partial AUC estimation and regression. Biometrics, 59(3):614–623, 2003. R.M. Dudley. Uniform Central Limit Theorems. Cambridge University Press, 1999. V. Dupac and J. H´ jek. Asymptotic normality of simple linear rank statistics under alternatives ii. a The Annals of Mathematical Statistics, (6):1992–2017, 1969. J.P. Egan. Signal Detection Theory and ROC Analysis. Academic Press, 1975. Y. Freund, R. D. Iyer, R. E. Schapire, and Y. Singer. An efﬁcient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4, 2003. J. H´ jek and Z. Sid´ k. Theory of Rank Tests. Academic Press, 1967. a a J.A. Hanley and J. McNeil. The meaning and use of the area under a ROC curve. Radiology, (143): 29–36, 1982. K. J¨ rvelin and J. Kek¨ l¨ inen. IR evaluation methods for retrieving highly relevant documents. In a aa N.J. Belkin, P . Ingwersen, , and M.-K. Leong, editors, Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 41–48, 2000. H.L. Koul. Weighted Empirical Processes in Dynamic Nonlinear Models, volume 166 of Lecture Notes in Statistics. Springer, 2nd edition, 2002. H.L. Koul. Some convergence theorems for ranks and weighted empirical cumulatives. The Annals of Mathematical Statistics, (41):1768–1773, 1970. H.L. Koul and Jr. R.G. Staudte. Weak convergence of weighted empirical cumulatives based on ranks. The Annals of Mathematical Statistics, (43):823–841, 1972. P. Li, C. Burges, and Q. Wu. Learning to rank using classiﬁcation and gradient boosting. Technical report MSR-TR-2007-74, Microsoft Research, 2007. ¨ G. Lugosi. Pattern classiﬁcation and learning theory. In L. Gy orﬁ, editor, Principles of Nonparametric Learning, pages 1–56. E. Mammen and A. B. Tsybakov. Smooth discrimination analysis. Annals of Statistics, 27(6): 1808–1829, 1999. P. Massart. Concentration Inequalities and Model Selection. Lecture Notes in Mathematics. Springer, 2006. P. Massart and E. N´ d´ lec. Risk bounds for statistical learning. Annals of Statistics, 34(5), 2006. e e A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, 1965. C. Rudin. Ranking with a P-Norm Push. In H.U. Simon and G. Lugosi, editors, Proceedings of COLT 2006, volume 4005 of Lecture Notes in Computer Science, pages 589–604, 2006. 2698 R ANKING THE B EST I NSTANCES C. Rudin, C. Cortes, M. Mohri, and R. E. Schapire. Margin-based ranking and boosting meet in the middle. In P. Auer and R. Meir, editors, Proceedings of COLT 2005, volume 3559 of Lecture Notes in Computer Science, pages 63–78. Springer, 2005. C. Scott. Performance measures for Neyman-Pearson classiﬁcation. Technical report, Department of Statistics, Rice University, 2005. C. Scott and M. Davenport. Regression level set estimation via cost-sensitive classiﬁcation. IEEE Transactions on Signal Processing, 2006, to appear. C. Scott and R. Nowak. A Neyman-Pearson approach to statistical learning. IEEE Transactions on Information Theory, 51(11):3806–3819, November 2005. C. Scott and R. Nowak. Learning minimum volume sets. Journal of Machine Learning Research, 7:665–704, April 2006. I. Steinwart, D. Hush, and C. Scovel. A classiﬁcation framework for anomaly detection. Journal of Machine Learning Research, 6:211–232, 2005. J. Taylor and R. Tibshirani. A tail strength measure for assessing the overall univariate signiﬁcance in a dataset. Biostatistics, 7(2):167–181, 2006. A. Tsybakov. Optimal aggregation of classiﬁers in statistical learning. Annals of Statistics, 32(1): 135–166, 2004. A. Tsybakov. On nonparametric estimation of density level sets. Annals of Statistics, 25(3):948– 969, 1997. S. van de Geer. Empirical Processes in M-Estimation. Cambridge University Press, 2000. A. van de Vaart. Asymptotic Statistics. Cambridge University Press, 1998. H.L. van Trees. Detection, Estimation, and Modulation Theory, Part I. John Wiley, 1968. R. Vert and J.-P. Vert. Consistency and convergence rates of one-class SVMs and related algorithms. Journal of Machine Learning Research, 7:817–854, May 2006. R. Willett and R. Nowak. Minimax optimal level set estimation. Technical report, Rice University, 2006. 2699