nips nips2008 nips2008-72 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Stéphan J. Clémençon, Nicolas Vayatis
Abstract: The ROC curve is known to be the gold standard for measuring performance of a test/scoring statistic regarding its capacity of discrimination between two populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be seen as summaries of the ROC curve. This paper highlights the fact that many of these empirical criteria can be expressed as (conditional) linear rank statistics. We investigate the properties of empirical maximizers of such performance criteria and provide preliminary results for the concentration properties of a novel class of random variables that we will call a linear rank process.
Reference: text
sentIndex sentText sentNum sentScore
1 Empirical performance maximization for linear rank statistics Stéphan Clémençon, Telecom ParisTech (TSI) - LTCI UMR Institut Telecom/CNRS 5141, stephan. [sent-1, score-0.439]
2 Most practical performance measures used in scoring applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be seen as summaries of the ROC curve. [sent-6, score-0.333]
3 This paper highlights the fact that many of these empirical criteria can be expressed as (conditional) linear rank statistics. [sent-7, score-0.441]
4 We investigate the properties of empirical maximizers of such performance criteria and provide preliminary results for the concentration properties of a novel class of random variables that we will call a linear rank process. [sent-8, score-0.666]
5 Even in the simplest framework of bipartite ranking, where a binary label is available, there is no single natural criterion, but many possible options. [sent-10, score-0.057]
6 The ROC curve provides a complete description of performance but its functional nature renders direct optimization strategies rather complex. [sent-11, score-0.071]
7 Empirical risk minimization strategies are thus based on summaries of the ROC curve, which take the form of empirical risk functionals where the averages involved are no longer taken over i.i.d. observations. [sent-12, score-0.254]
8 The most popular choice is the so-called AUC criterion (see [AGH+ 05] or [CLV08] for instance), but when top-ranked instances are more important, various choices can be considered: the Discounted Cumulative Gain or DCG [CZ06], the p-norm push (see [Rud06]), or the local AUC (refer to [CV07]). [sent-16, score-0.176]
9 The present paper starts from the simple observation that all these summary criteria have a common feature: conditioned upon the labels, they all belong to the class of linear rank statistics. [sent-17, score-0.418]
10 Such statistics have been extensively studied in the mathematical statistics literature because of their optimality properties in hypothesis testing, see [HS67]. [sent-18, score-0.128]
11 Now, from the statistical learning viewpoint, where excess risk bounds play a central role, the theory of rank tests needs to be revisited and new problems come up. [sent-19, score-0.317]
12 The arguments required to deal with risk functionals based on linear rank statistics have been sketched in [CV07] in a special case. [sent-20, score-0.429]
13 The empirical AUC, known as the Wilcoxon-Mann-Whitney statistic, is also a U -statistic and this particular dependence structure was extensively exploited in [CLV08]. [sent-21, score-0.056]
14 In the present paper, we describe the generic structure of linear rank statistics as an orthogonal decomposition after projection onto the space of sums of i.i.d. random variables. [sent-22, score-0.515]
15 This projection method is the key to all statistical results related to maximizers of such criteria: consistency, (fast) rates of convergence or model selection. [sent-26, score-0.167]
16 We relate linear rank statistics to performance measures relevant for the ranking problem by showing that the targets of ranking algorithms correspond to optimal ordering rules in that sense (Section 3). [sent-27, score-0.794]
17 Eventually, we provide some preliminary results in Section 4 for empirical maximizers of performance criteria based on linear rank statistics with smooth score-generating functions. [sent-28, score-0.646]
18 2 Criteria based on linear rank statistics. Throughout the paper, we shall consider the standard binary classification model. [sent-29, score-0.365]
19 2.1 Motivation. Most of statistical learning theory has been developed for empirical risk minimizers (ERM) of sums of i.i.d. random variables. [sent-45, score-0.141]
20 Mathematical results were established using empirical process techniques, and particularly concentration inequalities for such processes (see [BBL05] for an overview). [sent-49, score-0.139]
21 For prediction tasks such as ranking or scoring, more involved statistics need to be considered, such as the Area Under the ROC curve (AUC), the local AUC, the Discounted Cumulative Gain (DCG), the p-norm push, etc. [sent-59, score-0.301]
22 For instance, the AUC, a very popular performance measure in various scoring applications, such as medical diagnosis or credit-risk screening, can be seen as the probability of an "event of order two", i.e., an event involving a pair of observations. [sent-60, score-0.352]
23 The first theoretical studies either attempt to get back to sums of i.i.d. random variables or exploit a specific dependence structure (as with U-statistics). [sent-64, score-0.038]
24 We shall see that this approach requires the development of new tools for handling the concentration properties of rank processes, namely collections of rank statistics indexed by classes of functions, which have never been studied before. [sent-69, score-0.649]
25 2.2 Empirical performance of scoring rules. The learning task on which we focus here is known as the bipartite ranking problem. [sent-71, score-0.574]
26 The goal of ranking is to order the instances Xi by means of a real-valued scoring function s : X → R , given the binary labels Yi . [sent-72, score-0.509]
27 It is natural to assume that a good scoring rule s would assign higher ranks to the positive instances (those for which Yi = +1) than to the negative ones. [sent-74, score-0.404]
28 The rank of the observation Xi induced by the scoring function s is expressed as $\mathrm{Rank}(s(X_i)) = \sum_{j=1}^{n} \mathbb{I}\{s(X_j) \le s(X_i)\}$ and ranges from 1 to n. [sent-75, score-0.541]
29 In the present paper, we consider a particular class of simple (conditional) linear rank statistics inspired by the Wilcoxon statistic. [sent-76, score-0.372]
30 We define the "empirical W-ranking performance measure" as the empirical risk functional $W_n(s) = \sum_{i=1}^{n} \phi\left(\frac{\mathrm{Rank}(s(X_i))}{n+1}\right) \mathbb{I}\{Y_i = +1\}, \quad \forall s \in S$. [sent-79, score-0.126]
31 We refer to the book by Serfling [Ser80] for properties and asymptotic theory of rank statistics. [sent-81, score-0.338]
32 We point out that our definition does not exactly match the standard definition of linear rank statistics. [sent-82, score-0.291]
33 Indeed, in our case, the coefficients of the ranks in the sum are random because they involve the labels Yi. [sent-83, score-0.078]
34 We will call the statistics Wn(s) conditional linear rank statistics. [sent-84, score-0.37]
35 It is a very natural idea to consider ranking criteria based on ranks. [sent-85, score-0.277]
36 Observe indeed that, when evaluated through the empirical W-ranking performance measure, the performance of a given scoring function s is invariant under increasing transformations of s. [sent-86, score-0.401]
37 • φ(u) = u · I{u ≥ u0} - this corresponds to the local AUC; such a criterion is of interest when one wants to focus on the highest ranks. [sent-89, score-0.054]
38 • φ(u) = u^p - this is another choice which puts emphasis on high ranks, but in a smoother way than the previous one. [sent-90, score-0.078]
39 This is related to the p-norm push approach taken in [Rud06]. [sent-91, score-0.067]
40 However, we point out that the criterion studied in the latter work relies on a different definition of the rank of an observation. [sent-92, score-0.324]
41 Namely, the rank of positive instances among negative instances (and not in the pooled sample) is used. [sent-93, score-0.38]
42 • φ(u) = φn(u) = c((n + 1)u) · I{u ≥ k/(n+1)} - this corresponds to the DCG criterion in the bipartite setup, one of the "gold standard" quality measures in information retrieval, when grades are binary (namely I{Yi = +1}). [sent-95, score-0.111]
43 The c(i)’s denote the discount factors, c(i) measuring the importance of rank i. [sent-96, score-0.296]
44 The integer k denotes the number of top-ranked instances to take into account. [sent-97, score-0.055]
45 Notice that, with our indexing, top positions correspond to the largest ranks, so the sequence {c(i)} should be chosen to be increasing. [sent-98, score-0.078]
46 3 Uniform approximation of linear rank statistics This subsection describes the main result of the present analysis, which shall serve as the essential tool for deriving statistical properties of maximizers of empirical W -ranking performance measures. [sent-100, score-0.555]
47 For a given scoring function s, we denote by Gs , respectively Hs , the conditional cumulative distribution function of s(X) given Y = +1, respectively Y = −1. [sent-101, score-0.326]
48 Although such statistics are not sums of i.i.d. random variables, the underlying statistical structure can be revealed by orthogonal projections onto the space of sums of i.i.d. random variables. [sent-106, score-0.082]
49 This projection argument was the key for the study of empirical AUC maximization, which involved U -processes, see [CLV08]. [sent-110, score-0.134]
50 In the case of U -statistics, this orthogonal decomposition is known as the Hoeffding decomposition and the remainder may be expressed as a degenerate U -statistic, see [Hoe48]. [sent-111, score-0.126]
51 For rank statistics, a similar though less accurate decomposition can be considered. [sent-112, score-0.308]
52 We refer to [Haj68] for a systematic use of the projection method for investigating the asymptotic properties of general statistics. [sent-113, score-0.124]
53 $\widehat{T} = \sum_{i=1}^{n} \mathbb{E}[T \mid Z_i] - (n-1)\mathbb{E}[T]$ is called the Hájek projection of T. [sent-126, score-0.101]
54 From the perspective of ERM in statistical learning theory, through the projection method, well-known concentration results for standard empirical processes may carry over to more complex collections of r.v.'s, [sent-128, score-0.171]
55 such as rank processes, as shown by the next approximation result. [sent-130, score-0.27]
56 On the terminology of major sets and major classes, we refer to [Dud99]. [sent-139, score-0.097]
57 In this case, the lack of smoothness of φ has to be compensated by smoothness assumptions on the underlying conditional distributions. [sent-144, score-0.085]
58 Another approach would consist of approximating Wn (s) by the empirical W-ranking criterion where the score-generating function ψ would be a smooth approximation of φ. [sent-145, score-0.11]
59 An essential hint to the study of the asymptotic behavior of a linear rank statistic consists in rewriting it as a function of the sample cdf. [sent-147, score-0.382]
60 Denoting by $\widehat{F}_s(x) = n^{-1} \sum_{i=1}^{n} \mathbb{I}\{s(X_i) \le x\}$ the empirical counterpart of $F_s(x)$, we have: $W_n(s) = \sum_{i=1}^{n} \mathbb{I}\{Y_i = +1\}\, \phi\left(\frac{n}{n+1} \widehat{F}_s(s(X_i))\right)$. [sent-148, score-0.056]
61 Proposition 5. Let S0 ⊂ S be a VC major class of functions with VC dimension V, and let φ be a score-generating function of class C^1. [sent-154, score-0.101]
62 Then, as n → ∞, we have with probability one: $\frac{1}{n} \sup_{s \in S_0} |W_n(s) - k W_\phi(s)| \to 0$. [sent-155, score-0.063]
63 n s∈S0 3 Optimality We introduce the class S ∗ of scoring functions obtained as strictly increasing transformations of the regression function η: S ∗ = { s∗ = T ◦ η | T : [0, 1] → R strictly increasing }. [sent-156, score-0.442]
64 The class S ∗ contains the optimal scoring rules for the bipartite ranking problem. [sent-157, score-0.584]
65 The next paragraphs motivate the use of W -ranking performance measures as optimization criteria for this problem. [sent-158, score-0.117]
66 3.1 ROC curves. A classical tool for measuring the performance of a scoring rule s is the so-called ROC curve, $\alpha \in [0,1] \mapsto \mathrm{ROC}(s, \alpha) = 1 - G_s \circ H_s^{-1}(1 - \alpha)$. [sent-160, score-0.368]
67 The set of points (α, β) ∈ [0, 1]2 which can be achieved as (α, ROC(s, α)) for some scoring function s is called the ROC space. [sent-163, score-0.271]
68 It is a well-known fact that the regression function provides an optimal scoring function for the ROC curve. [sent-164, score-0.271]
69 Using the fact that, for a given scoring function, the ROC curve is invariant under increasing transformations of the scoring function, we get the following result. [sent-167, score-0.64]
70 Lemma 6. For any scoring function s and any α ∈ [0, 1], we have: ROC(s, α) ≤ ROC(η, α). [sent-168, score-0.271]
71 The next result states that the set of optimal scoring functions coincides with the set of maximizers of the Wφ -ranking performance, provided that the score-generating function φ is strictly increasing. [sent-170, score-0.412]
72 Remark 3 (ON PLUG-IN RANKING RULES). Theoretically, a possible approach to ranking is the plug-in method ([DGL96]), which consists of using an estimate η̂ of the regression function as a scoring function. [sent-176, score-0.454]
73 3.2 Connection to hypothesis testing. From the angle embraced in this paper, the ranking problem is tightly related to hypothesis testing. [sent-180, score-0.205]
74 As a first go, we can reformulate the ranking problem as that of finding a scoring function s such that s(X−) is stochastically smaller than s(X+), which means that: ∀t ∈ R, P{s(X−) ≥ t} ≤ P{s(X+) ≥ t}. [sent-184, score-0.484]
75 It is easy to see that the latter statement means that the ROC curve of s dominates the first diagonal of the ROC space. [sent-185, score-0.048]
76 We point out the fact that the first diagonal corresponds to nondiscriminating scoring functions s0 such that Hs0 = Gs0 . [sent-186, score-0.271]
77 However, searching for a scoring function s fulfilling this property is generally not sufficient in practice. [sent-187, score-0.271]
78 In this respect, various criteria may be considered such as the L1 -Mallows metric (see the next remark). [sent-190, score-0.119]
79 In the two-sample problem based on the scores s(X1), . . . , s(Xm), it is well-known that nonparametric tests based on linear rank statistics have optimality properties. [sent-198, score-0.371]
80 We refer to Chapter 9 in [Ser80] for an overview of rank procedures for testing homogeneity, which may yield relevant criteria in the ranking context. [sent-199, score-0.596]
81 It is well-known that this criterion may be interpreted as the "rate of concordant pairs": AUC(s) = P{s(X) < s(X′) | Y = −1, Y′ = +1}, where (X, Y) and (X′, Y′) denote independent copies. [sent-201, score-0.054]
82 Furthermore, it may easily be shown that $\mathrm{AUC}(s) = \frac{1}{2} + \int_{-\infty}^{\infty} \{H_s(t) - G_s(t)\}\, dF(t)$, where the cdf F may be taken as any convex combination of Hs and Gs. [sent-202, score-0.048]
83 4 A generalization error bound. We now provide a bound on the generalization ability of scoring rules based on empirical maximization of W-ranking performance criteria. [sent-204, score-0.414]
84 Proposition 8. Set the empirical W-ranking performance maximizer ŝn = arg max_{s∈S} Wn(s). [sent-206, score-0.079]
85 Under the same assumptions as in Proposition 3, and assuming in addition that the class of functions Φs induced by S0 is also a VC major class of functions, we have, for any δ > 0, with probability 1 − δ: $W_\phi^* - W_\phi(\hat{s}_n) \le c_1 \sqrt{\frac{V}{n}} + c_2 \sqrt{\frac{\log(1/\delta)}{n}}$, for some positive constants c1, c2. [sent-207, score-0.101]
86 5 Conclusion. In this paper, we considered a general class of performance measures for ranking/scoring which can be described as conditional linear rank statistics. [sent-209, score-0.378]
87 Our overall setup encompasses in particular known criteria used in medical diagnosis and information retrieval. [sent-210, score-0.187]
88 We have described the statistical nature of such statistics, proved that they are compatible with optimal scoring functions in the bipartite setup, and provided a preliminary generalization bound with a √n-rate of convergence. [sent-211, score-0.351]
89 By doing so, we provided the very first results on a class of linear rank processes. [sent-212, score-0.324]
90 Moreover, it is not yet clear how to formulate convex surrogates for such functionals. [sent-214, score-0.043]
91 Appendix - Proofs. Proof of Proposition 5. By virtue of the finite increment theorem, we have: $\sup_{s \in S_0} |W_n(s) - k W_\phi(s)| \le k \|\phi'\|_\infty \left( \frac{1}{n+1} + \sup_{(s,t) \in S_0 \times \mathbb{R}} |\widehat{F}_s(t) - F_s(t)| \right)$, and the desired result immediately follows from the application of the VC inequality; see Remark 1. [sent-215, score-0.156]
92 Following in the footsteps of [Haj68], we first compute the projection of Bn(s) onto the space Σ of r.v.'s. [sent-217, score-0.077]
93 Eventually, one needs to evaluate the accuracy of the approximation yielded by the projection, i.e., the remainder $B_n(s) - \{P_\Sigma(B_n(s)) - (n-1)\mathbb{E}[B_n(s)]\}$. [sent-231, score-0.056]
94 Write, for all s ∈ S0: $B_n(s) = n U_n(s) + \sum_{i=1}^{n} \mathbb{I}\{Y_i = +1\}\, \frac{1 - F_s(s(X_i))}{n+1}\, \phi'(F_s(s(X_i)))$, where $U_n(s) = \sum_{i \ne j} q_s((X_i, Y_i), (X_j, Y_j)) / (n(n+1))$ is a U-statistic with kernel $q_s((x, y), (x', y')) = \mathbb{I}\{y = +1\} \cdot \mathbb{I}\{s(x') \le s(x)\} \cdot \phi'(F_s(s(x)))$. [sent-232, score-0.12]
95 Hence, we have $n^{-1}\left( B_n(s) - \{P_\Sigma(B_n(s)) - (n-1)\mathbb{E}[B_n(s)]\} \right) = U_n(s) - \{P_\Sigma(U_n(s)) - (n-1)\mathbb{E}[U_n(s)]\}$, which actually corresponds to the degenerate part of the Hoeffding decomposition of the U-statistic Un(s). [sent-233, score-0.065]
96 Now, given that $\sup_{s \in S_0} \|q_s\|_\infty < \infty$, it follows from Theorem 11 in [CLV08] for instance, combined with the basic symmetrization device applied to the kernel qs, that $\sup_{s \in S_0} |U_n(s) - \{P_\Sigma(U_n(s)) - (n-1)\mathbb{E}[U_n(s)]\}| = O_P(n^{-1})$ as n → ∞, which concludes the proof. [sent-234, score-0.123]
97 Proof of Proposition 7. Using the decomposition $F_s = p G_s + (1-p) H_s$, we are led to the following expression: $\int_0^1 \phi(u)\, du - (1-p)\, \mathbb{E}[\phi(F_s(s(X))) \mid Y = -1]$. [sent-235, score-0.038]
98 It is now easy to conclude, since φ is increasing (by assumption) and because of the optimality of the elements of S* in the sense of Lemma 6. [sent-237, score-0.06]
99 Ranking and empirical risk minimization of U-statistics. [sent-257, score-0.103]
100 Asymptotic normality of simple linear rank statistics under alternatives. [sent-299, score-0.368]
wordName wordTfidf (topN-words)
[('fs', 0.571), ('scoring', 0.271), ('rank', 0.27), ('roc', 0.244), ('auc', 0.214), ('ranking', 0.183), ('hs', 0.18), ('bn', 0.161), ('wn', 0.159), ('xi', 0.144), ('gs', 0.126), ('op', 0.121), ('maximizers', 0.111), ('un', 0.11), ('vc', 0.101), ('dcg', 0.097), ('criteria', 0.094), ('dgs', 0.089), ('remark', 0.079), ('ranks', 0.078), ('proposition', 0.072), ('yi', 0.072), ('push', 0.067), ('sup', 0.063), ('qs', 0.06), ('agh', 0.058), ('bipartite', 0.057), ('projection', 0.056), ('empirical', 0.056), ('instances', 0.055), ('criterion', 0.054), ('kw', 0.053), ('mencon', 0.053), ('statistic', 0.05), ('xk', 0.049), ('statistics', 0.048), ('curve', 0.048), ('vn', 0.048), ('risk', 0.047), ('erm', 0.045), ('jek', 0.045), ('pgs', 0.045), ('lugosi', 0.043), ('functionals', 0.043), ('asymptotic', 0.041), ('rules', 0.04), ('ser', 0.039), ('summaries', 0.039), ('decomposition', 0.038), ('sums', 0.038), ('cl', 0.037), ('vayatis', 0.036), ('setup', 0.035), ('concentration', 0.035), ('major', 0.035), ('rn', 0.035), ('umr', 0.033), ('class', 0.033), ('xj', 0.033), ('yj', 0.033), ('optimality', 0.032), ('proof', 0.032), ('medical', 0.031), ('conditional', 0.031), ('stochastically', 0.03), ('virtue', 0.03), ('gy', 0.03), ('simon', 0.03), ('strictly', 0.03), ('normality', 0.029), ('hoeffding', 0.028), ('increasing', 0.028), ('refer', 0.027), ('retrieval', 0.027), ('smoothness', 0.027), ('diagnosis', 0.027), ('cdf', 0.027), ('degenerate', 0.027), ('measuring', 0.026), ('ordering', 0.026), ('ing', 0.026), ('shall', 0.026), ('metric', 0.025), ('zn', 0.025), ('dissimilarity', 0.024), ('cumulative', 0.024), ('maximization', 0.024), ('processes', 0.024), ('performance', 0.023), ('negligible', 0.023), ('orthogonal', 0.023), ('preliminary', 0.023), ('testing', 0.022), ('transformations', 0.022), ('discounted', 0.022), ('involved', 0.022), ('populations', 0.021), ('linear', 0.021), ('onto', 0.021), ('uniformly', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 72 nips-2008-Empirical performance maximization for linear rank statistics
Author: Stéphan J. Clémençon, Nicolas Vayatis
Abstract: The ROC curve is known to be the gold standard for measuring performance of a test/scoring statistic regarding its capacity of discrimination between two populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be seen as summaries of the ROC curve. This paper highlights the fact that many of these empirical criteria can be expressed as (conditional) linear rank statistics. We investigate the properties of empirical maximizers of such performance criteria and provide preliminary results for the concentration properties of a novel class of random variables that we will call a linear rank process.
2 0.40310261 174 nips-2008-Overlaying classifiers: a practical approach for optimal ranking
Author: Stéphan J. Clémençon, Nicolas Vayatis
Abstract: ROC curves are one of the most widely used displays to evaluate performance of scoring functions. In the paper, we propose a statistical method for directly optimizing the ROC curve. The target is known to be the regression function up to an increasing transformation and this boils down to recovering the level sets of the latter. We propose to use classifiers obtained by empirical risk minimization of a weighted classification error and then to construct a scoring rule by overlaying these classifiers. We show the consistency and rate of convergence to the optimal ROC curve of this procedure in terms of supremum norm and also, as a byproduct of the analysis, we derive an empirical estimate of the optimal ROC curve. 1
3 0.28159666 159 nips-2008-On Bootstrapping the ROC Curve
Author: Patrice Bertail, Stéphan J. Clémençon, Nicolas Vayatis
Abstract: This paper is devoted to thoroughly investigating how to bootstrap the ROC curve, a widely used visual tool for evaluating the accuracy of test/scoring statistics in the bipartite setup. The issue of confidence bands for the ROC curve is considered and a resampling procedure based on a smooth version of the empirical distribution called the ”smoothed bootstrap” is introduced. Theoretical arguments and simulation results are presented to show that the ”smoothed bootstrap” is preferable to a ”naive” bootstrap in order to construct accurate confidence bands. 1
4 0.16951099 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
Author: Tao Qin, Tie-yan Liu, Xu-dong Zhang, De-sheng Wang, Hang Li
Abstract: This paper studies global ranking problem by learning to rank methods. Conventional learning to rank methods are usually designed for ‘local ranking’, in the sense that the ranking model is defined on a single object, for example, a document in information retrieval. For many applications, this is a very loose approximation. Relations always exist between objects and it is better to define the ranking model as a function on all the objects to be ranked (i.e., the relations are also included). This paper refers to the problem as global ranking and proposes employing a Continuous Conditional Random Fields (CRF) for conducting the learning task. The Continuous CRF model is defined as a conditional probability distribution over ranking scores of objects conditioned on the objects. It can naturally represent the content information of objects as well as the relation information between objects, necessary for global ranking. Taking two specific information retrieval tasks as examples, the paper shows how the Continuous CRF method can perform global ranking better than baselines.
5 0.11716685 190 nips-2008-Reconciling Real Scores with Binary Comparisons: A New Logistic Based Model for Ranking
Author: Nir Ailon
Abstract: The problem of ranking arises ubiquitously in almost every aspect of life, and in particular in Machine Learning/Information Retrieval. A statistical model for ranking predicts how humans rank subsets V of some universe U . In this work we define a statistical model for ranking that satisfies certain desirable properties. The model automatically gives rise to a logistic regression based approach to learning how to rank, for which the score and comparison based approaches are dual views. This offers a new generative approach to ranking which can be used for IR. There are two main contexts for this work. The first is the theory of econometrics and study of statistical models explaining human choice of alternatives. In this context, we will compare our model with other well known models. The second context is the problem of ranking in machine learning, usually arising in the context of information retrieval. Here, much work has been done in the discriminative setting, where different heuristics are used to define ranking risk functions. Our model is built rigorously and axiomatically based on very simple desirable properties defined locally for comparisons, and automatically implies the existence of a global score function serving as a natural model parameter which can be efficiently fitted to pairwise comparison judgment data by solving a convex optimization problem. 1
6 0.10414447 224 nips-2008-Structured ranking learning using cumulative distribution networks
7 0.092823841 184 nips-2008-Predictive Indexing for Fast Search
8 0.078122579 39 nips-2008-Bounding Performance Loss in Approximate MDP Homomorphisms
9 0.077861384 40 nips-2008-Bounds on marginal probability distributions
10 0.074399263 239 nips-2008-Tighter Bounds for Structured Estimation
11 0.074309207 19 nips-2008-An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis
12 0.072724193 78 nips-2008-Exact Convex Confidence-Weighted Learning
13 0.065244399 178 nips-2008-Performance analysis for L\ 2 kernel classification
14 0.065139636 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data
15 0.064360164 213 nips-2008-Sparse Convolved Gaussian Processes for Multi-output Regression
16 0.058663324 51 nips-2008-Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm
17 0.055757027 106 nips-2008-Inferring rankings under constrained sensing
18 0.055749252 160 nips-2008-On Computational Power and the Order-Chaos Phase Transition in Reservoir Computing
19 0.055009533 161 nips-2008-On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization
20 0.053531669 214 nips-2008-Sparse Online Learning via Truncated Gradient
topicId topicWeight
[(0, -0.179), (1, -0.04), (2, -0.116), (3, 0.046), (4, -0.05), (5, -0.065), (6, 0.407), (7, -0.111), (8, 0.228), (9, 0.319), (10, -0.076), (11, -0.126), (12, 0.07), (13, 0.025), (14, -0.022), (15, -0.05), (16, -0.027), (17, 0.003), (18, -0.08), (19, -0.066), (20, -0.032), (21, -0.061), (22, -0.043), (23, 0.009), (24, 0.003), (25, 0.003), (26, 0.014), (27, 0.004), (28, -0.04), (29, 0.003), (30, -0.031), (31, 0.008), (32, -0.003), (33, 0.033), (34, -0.007), (35, -0.019), (36, -0.015), (37, 0.0), (38, -0.006), (39, -0.012), (40, -0.033), (41, 0.021), (42, 0.018), (43, 0.033), (44, -0.016), (45, -0.02), (46, 0.002), (47, -0.036), (48, 0.015), (49, -0.0)]
simIndex simValue paperId paperTitle
same-paper 1 0.94148725 72 nips-2008-Empirical performance maximization for linear rank statistics
Author: Stéphan J. Clémençon, Nicolas Vayatis
Abstract: The ROC curve is known to be the gold standard for measuring performance of a test/scoring statistic regarding its capacity of discrimination between two populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be seen as summaries of the ROC curve. This paper highlights the fact that many of these empirical criteria can be expressed as (conditional) linear rank statistics. We investigate the properties of empirical maximizers of such performance criteria and provide preliminary results for the concentration properties of a novel class of random variables that we will call a linear rank process.
2 0.92699975 174 nips-2008-Overlaying classifiers: a practical approach for optimal ranking
Author: Stéphan J. Clémençon, Nicolas Vayatis
Abstract: ROC curves are one of the most widely used displays to evaluate performance of scoring functions. In the paper, we propose a statistical method for directly optimizing the ROC curve. The target is known to be the regression function up to an increasing transformation and this boils down to recovering the level sets of the latter. We propose to use classifiers obtained by empirical risk minimization of a weighted classification error and then to construct a scoring rule by overlaying these classifiers. We show the consistency and rate of convergence to the optimal ROC curve of this procedure in terms of supremum norm and also, as a byproduct of the analysis, we derive an empirical estimate of the optimal ROC curve. 1
3 0.91272759 159 nips-2008-On Bootstrapping the ROC Curve
Author: Patrice Bertail, Stéphan J. Clémençon, Nicolas Vayatis
Abstract: This paper is devoted to thoroughly investigating how to bootstrap the ROC curve, a widely used visual tool for evaluating the accuracy of test/scoring statistics in the bipartite setup. The issue of confidence bands for the ROC curve is considered and a resampling procedure based on a smooth version of the empirical distribution called the ”smoothed bootstrap” is introduced. Theoretical arguments and simulation results are presented to show that the ”smoothed bootstrap” is preferable to a ”naive” bootstrap in order to construct accurate confidence bands. 1
4 0.47039318 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
Author: Tao Qin, Tie-yan Liu, Xu-dong Zhang, De-sheng Wang, Hang Li
Abstract: This paper studies global ranking problem by learning to rank methods. Conventional learning to rank methods are usually designed for ‘local ranking’, in the sense that the ranking model is defined on a single object, for example, a document in information retrieval. For many applications, this is a very loose approximation. Relations always exist between objects and it is better to define the ranking model as a function on all the objects to be ranked (i.e., the relations are also included). This paper refers to the problem as global ranking and proposes employing a Continuous Conditional Random Fields (CRF) for conducting the learning task. The Continuous CRF model is defined as a conditional probability distribution over ranking scores of objects conditioned on the objects. It can naturally represent the content information of objects as well as the relation information between objects, necessary for global ranking. Taking two specific information retrieval tasks as examples, the paper shows how the Continuous CRF method can perform global ranking better than baselines.
5 0.42578164 190 nips-2008-Reconciling Real Scores with Binary Comparisons: A New Logistic Based Model for Ranking
Author: Nir Ailon
Abstract: The problem of ranking arises ubiquitously in almost every aspect of life, and in particular in Machine Learning/Information Retrieval. A statistical model for ranking predicts how humans rank subsets V of some universe U . In this work we define a statistical model for ranking that satisfies certain desirable properties. The model automatically gives rise to a logistic regression based approach to learning how to rank, for which the score and comparison based approaches are dual views. This offers a new generative approach to ranking which can be used for IR. There are two main contexts for this work. The first is the theory of econometrics and study of statistical models explaining human choice of alternatives. In this context, we will compare our model with other well known models. The second context is the problem of ranking in machine learning, usually arising in the context of information retrieval. Here, much work has been done in the discriminative setting, where different heuristics are used to define ranking risk functions. Our model is built rigorously and axiomatically based on very simple desirable properties defined locally for comparisons, and automatically implies the existence of a global score function serving as a natural model parameter which can be efficiently fitted to pairwise comparison judgment data by solving a convex optimization problem. 1
6 0.3699511 224 nips-2008-Structured ranking learning using cumulative distribution networks
7 0.30928043 178 nips-2008-Performance analysis for L\ 2 kernel classification
8 0.30363202 78 nips-2008-Exact Convex Confidence-Weighted Learning
9 0.29644629 39 nips-2008-Bounding Performance Loss in Approximate MDP Homomorphisms
10 0.2951645 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data
11 0.28292635 239 nips-2008-Tighter Bounds for Structured Estimation
12 0.27912828 51 nips-2008-Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm
13 0.27093875 196 nips-2008-Relative Margin Machines
14 0.248183 122 nips-2008-Learning with Consistency between Inductive Functions and Kernels
15 0.24781452 151 nips-2008-Non-parametric Regression Between Manifolds
16 0.24330109 225 nips-2008-Supervised Bipartite Graph Inference
17 0.22617818 40 nips-2008-Bounds on marginal probability distributions
18 0.22441362 114 nips-2008-Large Margin Taxonomy Embedding for Document Categorization
19 0.22167194 111 nips-2008-Kernel Change-point Analysis
20 0.22148414 184 nips-2008-Predictive Indexing for Fast Search
topicId topicWeight
[(6, 0.035), (7, 0.042), (12, 0.015), (15, 0.01), (28, 0.692), (57, 0.018), (63, 0.016), (71, 0.022), (77, 0.021), (78, 0.019), (83, 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.99895477 72 nips-2008-Empirical performance maximization for linear rank statistics
Author: Stéphan J. Clémençon, Nicolas Vayatis
Abstract: The ROC curve is known to be the gold standard for measuring performance of a test/scoring statistic regarding its capacity of discrimination between two populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be seen as summaries of the ROC curve. This paper highlights the fact that many of these empirical criteria can be expressed as (conditional) linear rank statistics. We investigate the properties of empirical maximizers of such performance criteria and provide preliminary results for the concentration properties of a novel class of random variables that we will call a linear rank process.
2 0.99655545 190 nips-2008-Reconciling Real Scores with Binary Comparisons: A New Logistic Based Model for Ranking
Author: Nir Ailon
Abstract: The problem of ranking arises ubiquitously in almost every aspect of life, and in particular in Machine Learning/Information Retrieval. A statistical model for ranking predicts how humans rank subsets V of some universe U . In this work we define a statistical model for ranking that satisfies certain desirable properties. The model automatically gives rise to a logistic regression based approach to learning how to rank, for which the score and comparison based approaches are dual views. This offers a new generative approach to ranking which can be used for IR. There are two main contexts for this work. The first is the theory of econometrics and study of statistical models explaining human choice of alternatives. In this context, we will compare our model with other well known models. The second context is the problem of ranking in machine learning, usually arising in the context of information retrieval. Here, much work has been done in the discriminative setting, where different heuristics are used to define ranking risk functions. Our model is built rigorously and axiomatically based on very simple desirable properties defined locally for comparisons, and automatically implies the existence of a global score function serving as a natural model parameter which can be efficiently fitted to pairwise comparison judgment data by solving a convex optimization problem. 1
3 0.99549103 206 nips-2008-Sequential effects: Superstition or rational behavior?
Author: Angela J. Yu, Jonathan D. Cohen
Abstract: In a variety of behavioral tasks, subjects exhibit an automatic and apparently suboptimal sequential effect: they respond more rapidly and accurately to a stimulus if it reinforces a local pattern in stimulus history, such as a string of repetitions or alternations, compared to when it violates such a pattern. This is often the case even if the local trends arise by chance in the context of a randomized design, such that stimulus history has no real predictive power. In this work, we use a normative Bayesian framework to examine the hypothesis that such idiosyncrasies may reflect the inadvertent engagement of mechanisms critical for adapting to a changing environment. We show that prior belief in non-stationarity can induce experimentally observed sequential effects in an otherwise Bayes-optimal algorithm. The Bayesian algorithm is shown to be well approximated by linear-exponential filtering of past observations, a feature also apparent in the behavioral data. We derive an explicit relationship between the parameters and computations of the exact Bayesian algorithm and those of the approximate linear-exponential filter. Since the latter is equivalent to a leaky-integration process, a commonly used model of neuronal dynamics underlying perceptual decision-making and trial-to-trial dependencies, our model provides a principled account of why such dynamics are useful. We also show that parameter-tuning of the leaky-integration process is possible, using stochastic gradient descent based only on the noisy binary inputs. This is a proof of concept that not only can neurons implement near-optimal prediction based on standard neuronal dynamics, but that they can also learn to tune the processing parameters without explicitly representing probabilities. 1
4 0.99280274 115 nips-2008-Learning Bounded Treewidth Bayesian Networks
Author: Gal Elidan, Stephen Gould
Abstract: With the increased availability of data for complex domains, it is desirable to learn Bayesian network structures that are sufficiently expressive for generalization while also allowing for tractable inference. While the method of thin junction trees can, in principle, be used for this purpose, its fully greedy nature makes it prone to overfitting, particularly when data is scarce. In this work we present a novel method for learning Bayesian networks of bounded treewidth that employs global structure modifications and that is polynomial in the size of the graph and the treewidth bound. At the heart of our method is a triangulated graph that we dynamically update in a way that facilitates the addition of chain structures that increase the bound on the model’s treewidth by at most one. We demonstrate the effectiveness of our “treewidth-friendly” method on several real-life datasets. Importantly, we also show that by using global operators, we are able to achieve better generalization even when learning Bayesian networks of unbounded treewidth. 1
5 0.98708761 174 nips-2008-Overlaying classifiers: a practical approach for optimal ranking
Author: Stéphan J. Clémençon, Nicolas Vayatis
Abstract: ROC curves are one of the most widely used displays to evaluate performance of scoring functions. In the paper, we propose a statistical method for directly optimizing the ROC curve. The target is known to be the regression function up to an increasing transformation and this boils down to recovering the level sets of the latter. We propose to use classifiers obtained by empirical risk minimization of a weighted classification error and then to construct a scoring rule by overlaying these classifiers. We show the consistency and rate of convergence to the optimal ROC curve of this procedure in terms of supremum norm and also, as a byproduct of the analysis, we derive an empirical estimate of the optimal ROC curve. 1
6 0.98620528 117 nips-2008-Learning Taxonomies by Dependence Maximization
7 0.98480368 110 nips-2008-Kernel-ARMA for Hand Tracking and Brain-Machine interfacing During 3D Motor Control
8 0.98148036 126 nips-2008-Localized Sliced Inverse Regression
9 0.97714776 222 nips-2008-Stress, noradrenaline, and realistic prediction of mouse behaviour using reinforcement learning
10 0.96358609 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models
11 0.9613902 159 nips-2008-On Bootstrapping the ROC Curve
12 0.94132245 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
13 0.93383163 223 nips-2008-Structure Learning in Human Sequential Decision-Making
14 0.93306237 101 nips-2008-Human Active Learning
15 0.92984205 211 nips-2008-Simple Local Models for Complex Dynamical Systems
16 0.92859757 107 nips-2008-Influence of graph construction on graph-based clustering measures
17 0.92850339 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation
18 0.92565691 112 nips-2008-Kernel Measures of Independence for non-iid Data
19 0.9213503 244 nips-2008-Unifying the Sensory and Motor Components of Sensorimotor Adaptation
20 0.92081964 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel