nips nips2008 nips2008-72 nips2008-72-reference knowledge-graph by maker-knowledge-mining

72 nips-2008-Empirical performance maximization for linear rank statistics

Source: pdf

Author: Stéphan J. Clémençcon, Nicolas Vayatis

Abstract: The ROC curve is known to be the golden standard for measuring performance of a test/scoring statistic regarding its capacity of discrimination between two populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be seen as summaries of the ROC curve. This paper highlights the fact that many of these empirical criteria can be expressed as (conditional) linear rank statistics. We investigate the properties of empirical maximizers of such performance criteria and provide preliminary results for the concentration properties of a novel class of random variables that we will call a linear rank process. 1

reference text

[AGH+ 05] S. Agarwal, T. Graepel, R. Herbrich, S. Har-Peled, and D. Roth. Generalization bounds for the area under the ROC curve. Journal of Machine Learning Research, 6:393–425, 2005. [BBL05] S. Boucheron, O. Bousquet, and G. Lugosi. Theory of Classiﬁcation: A Survey of Some Recent Advances. ESAIM: Probability and Statistics, 9:323–375, 2005. [CLV08] S. Cl´ mencon, G. Lugosi, and N. Vayatis. Ranking and empirical risk minimization of e ¸ U-statistics. The Annals of Statistics, 36(2):844–874, 2008. [CS58] J. Chernoff and Savage. Asymptotic normality and efﬁciency of certain non parametric test statistics. Ann. Math. Stat., 29:972–994, 1958. [CV07] S. Cl´ mencon and N. Vayatis. Ranking the best instances. Journal of Machine Learne ¸ ing Research, 8:2671–2699, 2007. [CZ06] D. Cossock and T. Zhang. Subset ranking using regression. In H.U. Simon and G. Lugosi, editors, Proceedings of COLT 2006, volume 4005 of Lecture Notes in Computer Science, pages 605–619, 2006. [DGL96] L. Devroye, L. Gy¨ rﬁ, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. o Springer, 1996. 8 [Dud99] R.M. Dudley. Uniform Central Limit Theorems. Cambridge University Press, 1999. [GKKW02] L. Gy¨ rﬁ, M. K¨ hler, A. Krzyzak, and H. Walk. A Distribution-Free Theory of Nono o parametric Regression. Springer, 2002. [Haj68] J. Hajek. Asymptotic normality of simple linear rank statistics under alternatives. Ann. Math. Stat., 39:325–346, 1968. [Hoe48] W. Hoeffding. A class of statistics with asymptotically normal distribution. Ann. Math. Stat., 19:293–325, 1948. [HS67] J. H´ jek and Z. Sid´ k. Theory of Rank Tests. Academic Press, 1967. a a [Rud06] C. Rudin. Ranking with a P-Norm Push. In H.U. Simon and G. Lugosi, editors, Proceedings of COLT 2006, volume 4005 of Lecture Notes in Computer Science, pages 589–604, 2006. [Ser80] R.J. Serﬂing. Approximation theorems of mathematical statistics. John Wiley & Sons, 1980. 9