
174 nips-2008-Overlaying classifiers: a practical approach for optimal ranking


Source: pdf

Author: Stéphan J. Clémençon, Nicolas Vayatis

Abstract: ROC curves are one of the most widely used displays to evaluate performance of scoring functions. In the paper, we propose a statistical method for directly optimizing the ROC curve. The target is known to be the regression function up to an increasing transformation and this boils down to recovering the level sets of the latter. We propose to use classifiers obtained by empirical risk minimization of a weighted classification error and then to construct a scoring rule by overlaying these classifiers. We show the consistency and rate of convergence to the optimal ROC curve of this procedure in terms of supremum norm and also, as a byproduct of the analysis, we derive an empirical estimate of the optimal ROC curve.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Overlaying classifiers: a practical approach for optimal ranking. Stéphan Clémençon, Telecom Paristech (TSI) - LTCI UMR Institut Telecom/CNRS 5141. stephan. [sent-1, score-0.234]

2 Abstract ROC curves are one of the most widely used displays to evaluate performance of scoring functions. [sent-5, score-0.289]

3 The target is known to be the regression function up to an increasing transformation and this boils down to recovering the level sets of the latter. [sent-7, score-0.099]

4 We propose to use classifiers obtained by empirical risk minimization of a weighted classification error and then to construct a scoring rule by overlaying these classifiers. [sent-8, score-0.518]

5 We show the consistency and rate of convergence to the optimal ROC curve of this procedure in terms of supremum norm and also, as a byproduct of the analysis, we derive an empirical estimate of the optimal ROC curve. [sent-9, score-0.387]

6 1 Introduction In applications such as medical diagnosis, credit risk screening or information retrieval, one aims at ordering instances under binary label information. [sent-10, score-0.058]

7 The problem of ranking binary classification data is known in the machine learning literature as the bipartite ranking problem ([FISS03], [AGH+ 05], [CLV08]). [sent-11, score-0.335]

8 A natural approach is to find a real-valued scoring function which mimics the order induced by the regression function. [sent-12, score-0.295]

9 A classical performance measure for scoring functions is the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate against the false positive rate ([vT68], [Ega75]). [sent-13, score-0.543]

10 The ROC curve offers a graphical display which makes it possible to judge rapidly how a scoring rule discriminates between the two populations (positive against negative). [sent-14, score-0.449]

11 A scoring rule whose ROC curve is close to the diagonal line does not discriminate at all, while the one lying above all others is the best possible choice. [sent-15, score-0.436]

12 In the present paper, we propose a statistical methodology to estimate the optimal ROC curve in a stronger sense than the AUC sense, namely in the sense of the supremum norm. [sent-17, score-0.288]

13 At the same time, we will explain how to build a nearly optimal scoring function. [sent-18, score-0.318]

14 Our approach is based on a simple observation: optimal scoring functions can be represented from the collection of level sets of the regression function. [sent-19, score-0.376]

15 Hence, the bipartite ranking problem may be viewed as a ’continuum’ of classification problems with asymmetric costs where the targets are the level sets. [sent-20, score-0.233]

16 In a nonparametric setup, regression or density level sets can be estimated with plug-in methods ([Cav97], [RV06], [AA07], [WN07], ...). [sent-21, score-0.062]

17 Here, we take a different approach based on a weighted empirical risk minimization principle. [sent-25, score-0.14]

18 We provide rates of convergence with which an optimal point of the ROC curve can be recovered according to this principle. [sent-26, score-0.26]

19 We also develop a practical ranking method based on a discretization of the original problem. [sent-27, score-0.131]

20 From the resulting classifiers and their related empirical errors, we show how to build a piecewise linear estimate of the optimal ROC curve and a quasi-optimal piecewise constant scoring function. [sent-28, score-0.553]

21 Rate bounds in terms of the supremum norm on ROC curves for these procedures are also established. [sent-29, score-0.085]

22 2 Bipartite ranking, scoring rules and ROC curves Setup. [sent-31, score-0.312]

23 We study the ranking problem for classification data with binary labels. [sent-32, score-0.131]

24 i.i.d. copies of a random pair (X, Y) ∈ X × {−1, +1}, where X is a random descriptor living in the measurable space X and Y represents its binary label (relevant vs. irrelevant). [sent-36, score-0.043]

25 We denote by P = (µ, η) the distribution of (X, Y ), where µ is the marginal distribution of X and η is the regression function (up to an affine transformation): η(x) = P{Y = 1 | X = x}, x ∈ X . [sent-42, score-0.033]

26 We will also denote by p = P{Y = 1} the proportion of positive labels. [sent-43, score-0.029]

27 We consider the approach where the ordering can be derived by means of a scoring function s : X → R: one expects that the higher the value s(X) is, the more likely the event "Y = +1" should be observed. [sent-46, score-0.248]

28 The following definition sets the goal of learning methods in the setup of bipartite ranking. [sent-47, score-0.073]

29 Definition 1 (Optimal scoring functions) The class of optimal scoring functions is given by the set S∗ = {s∗ = T ∘ η | T : [0, 1] → R strictly increasing}. [sent-48, score-0.548]

30 Interestingly, it is possible to make the connection between an arbitrary (bounded) optimal scoring function s∗ ∈ S ∗ and the distribution P (through the regression function η) completely explicit. [sent-49, score-0.333]

31 Proposition 1 (Optimal scoring functions representation, [CV08]) A bounded scoring function s∗ is optimal if and only if there exist a nonnegative integrable function w and a continuous random variable V in (0, 1) such that: ∀x ∈ X, s∗(x) = inf_{x′∈X} s∗(x′) + E(w(V) · I{η(x) > V}). [sent-50, score-0.593]
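The representation in Proposition 1 is easy to sanity-check numerically. The sketch below is our own illustration (not from the paper): with w ≡ 1 and V uniform on (0, 1), the expectation E(w(V) · I{η(x) > V}) reduces to η(x) itself, and any other nonnegative w yields a strictly increasing transform of η, as the proposition predicts.

```python
import numpy as np

rng = np.random.default_rng(0)

def representation(eta_x, w, n_mc=200_000):
    """Monte Carlo evaluation of E[w(V) * I{eta(x) > V}] with V ~ Uniform(0, 1)."""
    v = rng.uniform(0.0, 1.0, size=n_mc)
    return float(np.mean(w(v) * (eta_x > v)))

eta_values = [0.1, 0.3, 0.5, 0.9]               # hypothetical values of eta(x)

# With w = 1 the representation recovers eta(x) itself ...
s1 = [representation(e, lambda v: np.ones_like(v)) for e in eta_values]
# ... and with w(v) = 2v it recovers eta(x)^2, another increasing transform.
s2 = [representation(e, lambda v: 2.0 * v) for e in eta_values]

print(np.round(s1, 3))   # ~ [0.1, 0.3, 0.5, 0.9]
print(np.round(s2, 3))   # ~ [0.01, 0.09, 0.25, 0.81]
```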

32 A crucial consequence of the last proposition is that solving the bipartite ranking problem amounts to recovering the collection {x ∈ X : η(x) > u}, u ∈ (0, 1), of level sets of the regression function η. [sent-51, score-0.368]

33 Hence, the bipartite ranking problem can be seen as a collection of overlaid classification problems. [sent-52, score-0.241]

34 We now recall the concept of ROC curve and explain why it is a natural choice of performance measure for the ranking problem with classification data. [sent-55, score-0.297]

35 We consider here only true ROC curves which correspond to the situation where the underlying distribution is known. [sent-56, score-0.056]

36 For a given scoring rule s, the conditional cdfs of the random variable s(X) are denoted by Gs and Hs . [sent-58, score-0.294]

37 We set Ḡs = 1 − Gs and H̄s = 1 − Hs to be the residual conditional cdfs of the random variable s(X). [sent-60, score-0.037]

38 We introduce the notation Q(Z, α) to denote the quantile of order 1 − α for the distribution of a random variable Z conditioned on the event Y = −1. [sent-62, score-0.041]

39 Definition 2 (True ROC curve) The ROC curve of a scoring function s is the parametric curve: z ↦ (H̄s(z), Ḡs(z)) for thresholds z ∈ R. [sent-65, score-0.414]

40 By convention, points of the curve corresponding to possible jumps (due to possible degenerate points of Hs or Gs) are connected by line segments, so that the ROC curve is always continuous. [sent-67, score-0.332]

41 The residual cdf Ḡs is also called the true positive rate, while H̄s is the false positive rate, so that the ROC curve is the plot of the true positive rate against the false positive rate. [sent-69, score-0.437]
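To make Definition 2 concrete, here is a small sketch of our own (not code from the paper) that traces the empirical ROC curve of a scoring rule by sweeping the threshold z down through the observed scores; by the convention above, jumps would be connected by line segments.

```python
import numpy as np

def empirical_roc(scores, labels):
    """Empirical (false positive rate, true positive rate) pairs as the threshold
    sweeps down through the observed scores; ties would need extra care."""
    order = np.argsort(-np.asarray(scores, dtype=float))   # decreasing scores
    y = np.asarray(labels)[order]
    tp = np.cumsum(y == +1)                    # positives scored above threshold
    fp = np.cumsum(y == -1)                    # negatives scored above threshold
    tpr = np.concatenate([[0.0], tp / max(tp[-1], 1)])
    fpr = np.concatenate([[0.0], fp / max(fp[-1], 1)])
    return fpr, tpr

# Hypothetical scores: the higher s(X), the more likely Y = +1.
s = [0.9, 0.8, 0.7, 0.55, 0.4, 0.2]
y = [+1, +1, -1, +1, -1, -1]
fpr, tpr = empirical_roc(s, y)
print(fpr)   # [0.  0.  0.  0.33  0.33  0.67  1. ]  (rounded)
print(tpr)   # [0.  0.33  0.67  0.67  1.  1.  1. ]  (rounded)
```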

42 Note that, as a functional criterion, the ROC curve induces a partial order over the space of all scoring functions. [sent-70, score-0.427]

43 Some scoring function might provide a better ranking on some part of the observation space and a worse one on some other. [sent-71, score-0.379]

44 A natural step to take is to consider local properties of the ROC curve in order to focus on best instances but this is not straightforward as explained in [CV07]. [sent-72, score-0.166]

45 We expect optimal scoring functions to be those for which the ROC curve dominates all the others for all α ∈ (0, 1). [sent-73, score-0.483]

46 The next proposition highlights the fact that the ROC curve is relevant when evaluating performance in the bipartite ranking problem. [sent-74, score-0.421]

47 Proposition 2 The class S ∗ of optimal scoring functions provides the best possible ranking with respect to the ROC curve. [sent-75, score-0.431]

48 Indeed, for any scoring function s, we have: ∀α ∈ (0, 1), ROC∗(α) ≥ ROC(s, α), and ∀s∗ ∈ S∗, ∀α ∈ (0, 1), ROC(s∗, α) = ROC∗(α). [sent-76, score-0.248]

49 Proposition 3 We assume that the optimal ROC curve is differentiable. [sent-78, score-0.218]

50 3 Recovering a point on the optimal ROC curve We consider here the problem of recovering a single point of the optimal ROC curve from a sample of i.i.d. observations. [sent-81, score-0.473]

51 This amounts to recovering a single level set of the regression function η, but we aim at controlling the error in terms of false positive and true positive rates. [sent-88, score-0.197]

52 We also define the weighted classification error: Lω(C) = 2p(1 − ω)(1 − β(C)) + 2(1 − p)ω α(C), with ω ∈ (0, 1) being the asymmetry factor. [sent-90, score-0.057]

53 Proposition 4 The optimal set for this error measure is C∗ω = {x : η(x) > ω}. [sent-91, score-0.052]

54 Also, the optimal error is given by: Lω(C∗ω) = 2E min{ω(1 − η(X)), (1 − ω)η(X)}. [sent-93, score-0.052]

55 The excess risk for an arbitrary set C can be written: Lω(C) − Lω(C∗ω) = 2E(|η(X) − ω| I{X ∈ C∆C∗ω}), where ∆ stands for the symmetric difference between sets. [sent-94, score-0.094]
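Proposition 4 can be checked on a toy model. In the sketch below (our own construction: X uniform on [0, 1] and η(x) = x, so that p ≈ 1/2), the weighted error Lω(C) is estimated by Monte Carlo over candidate sets C = {x > t}, and the minimum is attained near t = ω, i.e. at the level set C∗ω.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.uniform(0.0, 1.0, n)                # hypothetical descriptor X
eta = x                                     # hypothetical regression function
y = np.where(rng.uniform(size=n) < eta, +1, -1)
p = np.mean(y == +1)                        # close to 1/2 for this model

def weighted_error(in_C, omega):
    """Monte Carlo version of L_omega(C) = 2p(1-omega)(1-beta(C)) + 2(1-p)omega alpha(C)."""
    alpha = np.mean(in_C[y == -1])          # false positive rate of the set C
    beta = np.mean(in_C[y == +1])           # true positive rate of the set C
    return 2 * p * (1 - omega) * (1 - beta) + 2 * (1 - p) * omega * alpha

omega = 0.3
for t in [0.1, 0.3, 0.5, 0.7]:
    print(t, round(weighted_error(x > t, omega), 3))
# ~ 0.250, 0.210, 0.250, 0.370: the minimum sits at t = omega = 0.3.
```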

56 The empirical counterpart of the weighted classification error can be defined as: L̂ω(C) = (2ω/n) Σ_{i=1}^n I{Yi = −1, Xi ∈ C} + (2(1 − ω)/n) Σ_{i=1}^n I{Yi = +1, Xi ∉ C}. [sent-95, score-0.061]

57 This leads us to consider the weighted empirical risk minimizer over a class C of candidate sets: Ĉω = arg min_{C∈C} L̂ω(C). [sent-96, score-0.135]

58 The next result provides rates of convergence of the weighted empirical risk minimizer Ĉω to the best set in the class in terms of the two types of error α and β. [sent-97, score-0.177]
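A minimal sketch of the weighted empirical risk minimizer (ours, with the candidate class C taken to be half-lines {x > t} purely for illustration):

```python
import numpy as np

def empirical_weighted_risk(x, y, t, omega):
    """L_hat_omega(C) for C = {x > t}:
    (2*omega/n) #{Yi = -1, Xi in C} + (2*(1-omega)/n) #{Yi = +1, Xi not in C}."""
    n = len(y)
    in_C = x > t
    return (2 * omega / n) * np.sum((y == -1) & in_C) \
        + (2 * (1 - omega) / n) * np.sum((y == +1) & ~in_C)

def weighted_erm(x, y, thresholds, omega):
    """C_hat_omega = argmin over the finite class {{x > t} : t in thresholds}."""
    risks = [empirical_weighted_risk(x, y, t, omega) for t in thresholds]
    return thresholds[int(np.argmin(risks))]

rng = np.random.default_rng(2)
x = rng.uniform(size=5_000)
y = np.where(rng.uniform(size=5_000) < x, +1, -1)   # eta(x) = x again
print(weighted_erm(x, y, np.linspace(0, 1, 101), omega=0.3))  # near 0.3
```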

59 Suppose also that both G∗ and H∗ are twice continuously differentiable with strictly positive first derivatives and that ROC∗ has a bounded second derivative. [sent-100, score-0.049]

60 The same result also holds for the excess risk of Ĉω in terms of the true positive rate β, with a factor of (1 − p)ω in the denominator instead. [sent-102, score-0.176]

61 It is noteworthy that, while convergence in terms of classification error is expected to be of the order of n^{−1/2}, its two components, corresponding to the false positive and true positive rates, present slower rates. [sent-103, score-0.149]

62 4 Nearly optimal scoring rule based on overlaying classifiers Main result. [sent-104, score-0.417]

63 We now propose to collect the classifiers studied in the previous section in order to build a scoring function for the bipartite ranking problem. [sent-105, score-0.483]

64 From Proposition 1, we can focus on optimal scoring rules of the form: s∗(x) = ∫ I{x ∈ C∗ω} ν(dω), (1) where the integral is taken w.r.t. [sent-106, score-0.323]

65 any positive measure ν with the same support as the distribution of η(X). [sent-109, score-0.029]

66 We can then construct an estimator of s∗ by overlaying a finite collection of (estimated) density level sets Ĉω1, ..., ĈωK: [sent-114, score-0.17]

67 ŝ(x) = Σ_{i=1}^K I{x ∈ Ĉωi}, which may be seen as an empirical version of a discrete version of the target s∗. [sent-117, score-0.029]

68 In order to consider the performance of such an estimator, we need to compare the ROC curve of ŝ to the optimal ROC curve. [sent-118, score-0.218]
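The overlay itself is a one-liner once the level-set estimates are available. A sketch of ours, where `level_sets` stands in for any list of fitted set-membership tests (here the plug-in sets {x : x > ω} for η(x) = x):

```python
import numpy as np

def overlay_score(x, level_sets):
    """s_hat(x) = sum_i I{x in C_hat_{omega_i}}: the score of x counts how many
    estimated level sets contain it, giving a piecewise constant scoring rule."""
    return sum(C(x).astype(int) for C in level_sets)

omegas = np.linspace(0.1, 0.9, 9)
level_sets = [lambda x, w=w: x > w for w in omegas]   # hypothetical estimates

x = np.array([0.05, 0.25, 0.55, 0.95])
print(overlay_score(x, level_sets))   # [0 2 5 9] -- increasing in eta(x)
```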

69 When the sequence (Ĉωi)i=1,...,K is not decreasing, the computation of the ROC curve as a function of the errors of the overlaying classifiers becomes complicated. [sent-122, score-0.277]

70 The main result of the paper is the next theorem, which is proved for a modified sequence (C̃ωi) that yields a different estimator. [sent-123, score-0.032]

71 The corresponding scoring function is then given by: s̃K(x) = Σ_{i=1}^K I{x ∈ C̃ωi}. (2) [sent-128, score-0.248]

72 Hence, the ROC curve of s̃K is simply the broken line that connects the knots (α(C̃ωi), β(C̃ωi)), 0 ≤ i ≤ K + 1. [sent-129, score-0.282]
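Since s̃K is piecewise constant, its ROC curve is determined by the knots alone. A sketch (ours; it assumes the knot coordinates have already been computed) that builds the broken line and evaluates it anywhere:

```python
import numpy as np

def broken_line_roc(alphas, betas):
    """Polygonal ROC curve through the knots (alpha_i, beta_i), 0 <= i <= K+1,
    with the endpoints (0, 0) and (1, 1) appended."""
    pts = sorted(zip(alphas, betas))                     # order knots by alpha
    a = np.concatenate([[0.0], [q[0] for q in pts], [1.0]])
    b = np.concatenate([[0.0], [q[1] for q in pts], [1.0]])
    return lambda alpha: np.interp(alpha, a, b)          # linear interpolation

# Hypothetical knots (alpha(C_i), beta(C_i)) of three overlaid classifiers.
roc = broken_line_roc(alphas=[0.1, 0.3, 0.6], betas=[0.45, 0.75, 0.9])
print(roc(np.array([0.05, 0.2, 0.5, 0.8])))
```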

73 The next result offers a rate bound in the ROC space, equipped with a sup-norm. [sent-130, score-0.056]

74 To our knowledge, this is the first result on the generalization ability of decision rules in such a functional space. [sent-131, score-0.036]

75 Then, there exists a constant c such that, with probability at least 1 − δ, we have: sup_α |ROC∗(α) − ROC(s̃K, α)| ≤ c (log(1/δ)/n)^{1/3}. [sent-134, score-0.037]

76 In the present paper, we have adopted a scoring approach to ROC analysis which is somehow related to the evaluation of the performance of classifiers in ROC space. [sent-136, score-0.248]

77 Using combinations of such classifiers to improve performance in terms of ROC curves has also been pointed out in [BDH06] and [BCT07]. [sent-137, score-0.041]

78 Note that taking ν = λ, the Lebesgue measure over [0, 1], in the expression of s∗ leads to the regression function: η(x) = ∫₀¹ I{x ∈ C∗ω} dω. [sent-139, score-0.033]

79 An estimator for the regression function could be the following: η̃K(x) = (1/(K+1)) Σ_{i=1}^K I{x ∈ C̃ωi}. [sent-140, score-0.065]

80 The key for the proof of Theorem 2 is the idea of a piecewise linear approximation of the optimal ROC curve. [sent-146, score-0.131]

81 The broken line that connects the knots {(α∗i, β∗i); 0 ≤ i ≤ K + 1} provides a piecewise linear (concave) approximation/interpolation of the optimal ROC curve ROC∗. [sent-159, score-0.374]

82 The piecewise linear approximation ROC∗K of ROC∗ may then be written as: ROC∗K(α) = Σ_{i=1}^K β∗i φ∗i(α). [sent-165, score-0.056]

83 In order to obtain an empirical estimator of ROC∗K(α), we propose: i) to find an estimate Ĉωi of the true level set C∗ωi based on the training sample {(Xi, Yi)}i=1,...,n. [sent-166, score-0.105]

84 We propose the following estimator of ROC∗K(α): ROĈ∗(α) = Σ_{i=1}^K β̂i φ̂i(α), where the φ̂i are the empirical counterparts of the hat functions φ∗i. [sent-175, score-0.045]

85 Hence, ROĈ∗ is the broken line connecting the empirical knots {(α̂i, β̂i)}. The next result takes the form of a deviation bound for the estimation of the optimal ROC curve. [sent-180, score-0.191]
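Putting the pieces together, a sketch of the resulting estimator under the same toy model as before (the plug-in sets {x > ωi} stand in for the weighted-ERM solutions of Section 3; everything else follows the broken-line construction above):

```python
import numpy as np

def empirical_knots(x, y, thresholds):
    """Empirical (alpha_hat_i, beta_hat_i) of the fitted sets C_hat_i = {x > t_i}."""
    alphas = np.array([np.mean(x[y == -1] > t) for t in thresholds])
    betas = np.array([np.mean(x[y == +1] > t) for t in thresholds])
    return alphas, betas

rng = np.random.default_rng(3)
x = rng.uniform(size=20_000)
y = np.where(rng.uniform(size=20_000) < x, +1, -1)      # eta(x) = x

omegas = np.linspace(0.1, 0.9, 9)
a_hat, b_hat = empirical_knots(x, y, omegas)

# alpha decreases in omega, so reverse the knots before interpolating.
grid = np.concatenate([[0.0], a_hat[::-1], [1.0]])
vals = np.concatenate([[0.0], b_hat[::-1], [1.0]])
roc_hat = lambda alpha: np.interp(alpha, grid, vals)    # ROC-hat*(alpha)
print(roc_hat(0.25))   # ~0.75, the exact ROC*(0.25) for this toy model
```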

86 It quantifies the order of magnitude of a confidence band in supremum norm around an empirical estimate based on the previous approximation scheme with empirical counterparts. [sent-181, score-0.118]

87 Then, there exists a constant c such that, with probability at least 1 − δ, sup_{α∈[ε,1−ε]} |ROC∗(α) − ROĈ∗(α)| ≤ c (log(n/δ)/n)^{1/3}. [sent-184, score-0.037]

88 Conclusion We have provided a strategy based on overlaid classifiers to build a nearly-optimal scoring function. [sent-185, score-0.289]

89 Statistical guarantees are provided in terms of rates of convergence for a functional criterion which is the ROC space equipped with a supremum norm. [sent-186, score-0.116]

90 The idea of the proof is to relate the excess risk in terms of α-error to the excess risk in terms of weighted classification error. [sent-190, score-0.243]

91 It follows from a second-order Taylor expansion of ROC∗(α) around α∗ω that there exists α0 ∈ [0, 1] such that the remainder term equals (1/2) d²ROC∗(α0)/dα² · (α − α∗ω)². Using also the fact that ROC∗ dominates any other curve of the ROC space, we have: ∀C ⊂ X measurable, β(C) ≤ ROC∗(α(C)). [sent-194, score-0.197]

92 It remains to get the rate of convergence for the weighted empirical risk. [sent-198, score-0.105]

93 It is well-known folklore in linear approximation theory ([dB01]) that if s̃K is a piecewise constant scoring function whose ROC curve interpolates the points {(α̃i, ROC∗(α̃i))}i=0,... [sent-207, score-0.47]

94 ...,K of the optimal ROC curve, then we can bound the first term (which is positive), ∀α ∈ [0, 1], by: −(1/8) inf_{α∈[0,1]} d²ROC∗(α)/dα² · max_{0≤i≤K} (α̃_{i+1} − α̃_i)². [sent-210, score-0.105]

95 Now, to control the second term, we upper bound the following quantity: |ROC∗(α̃i) − β̃i| ≤ sup_{α∈[0,1]} dROC∗(α)/dα · |α̃i − α∗i| + |β∗i − β̃i|. We further bound: |α̃i − α∗i| ≤ |α̃i − α̂i| + |α̂i − α∗i|, where α̂i = α(Ĉi). [sent-211, score-0.063]

96 For the other term, we use the same result on approximation as in the proof of Theorem 2: |Σ_{i=1}^K ROC∗(α̂i) φ̂i(α) − ROC∗(α)| ≤ −(1/8) inf_{α∈[0,1]} d²ROC∗(α)/dα² · max_{0≤i≤K} (α̂_{i+1} − α̂_i)², with max_{0≤i≤K} (α̃_{i+1} − α̃_i) ≤ max_{0≤i≤K} (α∗_{i+1} − α∗_i) + 2 max_{0≤i≤K} |α∗i − α̂i| + 2 max_{0≤i≤K} |α̃i − α̂i|. [sent-236, score-0.184]

97 Eventually, we get the generalization bound: K^{−2} + (log K/n)^{1/3}, which is optimal for a number of knots K ∼ n^{1/6}. [sent-241, score-0.052]
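The tradeoff behind the choice K ∼ n^{1/6} can be eyeballed numerically (our illustration, constants suppressed):

```python
import numpy as np

ks = np.arange(2, 1000)
for n in [10**4, 10**6, 10**8]:
    bound = ks**-2.0 + (np.log(ks) / n) ** (1.0 / 3.0)  # K^{-2} + (log K / n)^{1/3}
    print(n, ks[np.argmin(bound)])
# The minimizing K grows like n^{1/6} up to log factors: each hundredfold
# increase in n multiplies it by roughly 100**(1/6) ~ 2.15.
```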

98 Ranking and empirical risk minimization of U-statistics. [sent-288, score-0.108]

99 Tree-structured ranking rules and approximation of the optimal ROC curve. [sent-298, score-0.222]

100 Fast rates for plug-in estimators of density level sets. [sent-321, score-0.051]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('roc', 0.878), ('scoring', 0.248), ('curve', 0.166), ('ranking', 0.131), ('overlaying', 0.095), ('gs', 0.09), ('ck', 0.076), ('bipartite', 0.073), ('op', 0.067), ('hs', 0.064), ('knots', 0.064), ('risk', 0.058), ('kn', 0.057), ('classi', 0.054), ('optimal', 0.052), ('ers', 0.052), ('proposition', 0.051), ('mencon', 0.051), ('supremum', 0.044), ('agh', 0.042), ('curves', 0.041), ('piecewise', 0.04), ('recovering', 0.037), ('excess', 0.036), ('cl', 0.035), ('regression', 0.033), ('theorem', 0.032), ('false', 0.032), ('weighted', 0.032), ('estimator', 0.032), ('broken', 0.031), ('level', 0.029), ('empirical', 0.029), ('positive', 0.029), ('supu', 0.028), ('quantile', 0.028), ('vayatis', 0.025), ('asymmetry', 0.025), ('inf', 0.025), ('sk', 0.025), ('max', 0.024), ('remark', 0.024), ('cdfs', 0.024), ('umr', 0.024), ('rate', 0.024), ('measurable', 0.023), ('yi', 0.023), ('rules', 0.023), ('sup', 0.023), ('proof', 0.023), ('overlaid', 0.023), ('rule', 0.022), ('notations', 0.022), ('rates', 0.022), ('eventually', 0.021), ('lebesgue', 0.021), ('connects', 0.021), ('minimization', 0.021), ('copies', 0.02), ('bounded', 0.02), ('convergence', 0.02), ('auc', 0.019), ('build', 0.018), ('arxiv', 0.018), ('dominates', 0.017), ('equipped', 0.017), ('partition', 0.017), ('annals', 0.017), ('fix', 0.017), ('lemma', 0.016), ('log', 0.016), ('errors', 0.016), ('cation', 0.016), ('minimizer', 0.016), ('approximation', 0.016), ('true', 0.015), ('bound', 0.015), ('term', 0.014), ('collection', 0.014), ('exists', 0.014), ('classifiers', 0.014), ('esaim', 0.014), ('additivity', 0.014), ('mimics', 0.014), ('pg', 0.014), ('hence', 0.014), ('partitions', 0.013), ('quantity', 0.013), ('xi', 0.013), ('residual', 0.013), ('statistical', 0.013), ('notation', 0.013), ('propose', 0.013), ('functional', 0.013), ('audibert', 0.013), ('hat', 0.013), ('discriminates', 0.013), ('boucheron', 0.013), ('cachan', 0.013), ('cmla', 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 174 nips-2008-Overlaying classifiers: a practical approach for optimal ranking

Author: Stéphan J. Clémençon, Nicolas Vayatis

Abstract: ROC curves are one of the most widely used displays to evaluate performance of scoring functions. In the paper, we propose a statistical method for directly optimizing the ROC curve. The target is known to be the regression function up to an increasing transformation and this boils down to recovering the level sets of the latter. We propose to use classifiers obtained by empirical risk minimization of a weighted classification error and then to construct a scoring rule by overlaying these classifiers. We show the consistency and rate of convergence to the optimal ROC curve of this procedure in terms of supremum norm and also, as a byproduct of the analysis, we derive an empirical estimate of the optimal ROC curve.

2 0.67855823 159 nips-2008-On Bootstrapping the ROC Curve

Author: Patrice Bertail, Stéphan J. Clémençon, Nicolas Vayatis

Abstract: This paper is devoted to thoroughly investigating how to bootstrap the ROC curve, a widely used visual tool for evaluating the accuracy of test/scoring statistics in the bipartite setup. The issue of confidence bands for the ROC curve is considered and a resampling procedure based on a smooth version of the empirical distribution called the "smoothed bootstrap" is introduced. Theoretical arguments and simulation results are presented to show that the "smoothed bootstrap" is preferable to a "naive" bootstrap in order to construct accurate confidence bands.

3 0.40310261 72 nips-2008-Empirical performance maximization for linear rank statistics

Author: Stéphan J. Clémençon, Nicolas Vayatis

Abstract: The ROC curve is known to be the golden standard for measuring performance of a test/scoring statistic regarding its capacity of discrimination between two populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be seen as summaries of the ROC curve. This paper highlights the fact that many of these empirical criteria can be expressed as (conditional) linear rank statistics. We investigate the properties of empirical maximizers of such performance criteria and provide preliminary results for the concentration properties of a novel class of random variables that we will call a linear rank process.

4 0.094241194 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields

Author: Tao Qin, Tie-yan Liu, Xu-dong Zhang, De-sheng Wang, Hang Li

Abstract: This paper studies global ranking problem by learning to rank methods. Conventional learning to rank methods are usually designed for ‘local ranking’, in the sense that the ranking model is defined on a single object, for example, a document in information retrieval. For many applications, this is a very loose approximation. Relations always exist between objects and it is better to define the ranking model as a function on all the objects to be ranked (i.e., the relations are also included). This paper refers to the problem as global ranking and proposes employing a Continuous Conditional Random Fields (CRF) for conducting the learning task. The Continuous CRF model is defined as a conditional probability distribution over ranking scores of objects conditioned on the objects. It can naturally represent the content information of objects as well as the relation information between objects, necessary for global ranking. Taking two specific information retrieval tasks as examples, the paper shows how the Continuous CRF method can perform global ranking better than baselines.

5 0.076871775 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition

Author: Sudheendra Vijayanarasimhan, Kristen Grauman

Abstract: We introduce a framework for actively learning visual categories from a mixture of weakly and strongly labeled image examples. We propose to allow the category-learner to strategically choose what annotations it receives—based on both the expected reduction in uncertainty as well as the relative costs of obtaining each annotation. We construct a multiple-instance discriminative classifier based on the initial training data. Then all remaining unlabeled and weakly labeled examples are surveyed to actively determine which annotation ought to be requested next. After each request, the current classifier is incrementally updated. Unlike previous work, our approach accounts for the fact that the optimal use of manual annotation may call for a combination of labels at multiple levels of granularity (e.g., a full segmentation on some images and a present/absent flag on others). As a result, it is possible to learn more accurate category models with a lower total expenditure of manual annotation effort.

6 0.067754522 190 nips-2008-Reconciling Real Scores with Binary Comparisons: A New Logistic Based Model for Ranking

7 0.065427914 184 nips-2008-Predictive Indexing for Fast Search

8 0.062514849 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data

9 0.048269365 224 nips-2008-Structured ranking learning using cumulative distribution networks

10 0.047761727 225 nips-2008-Supervised Bipartite Graph Inference

11 0.046598952 39 nips-2008-Bounding Performance Loss in Approximate MDP Homomorphisms

12 0.045902446 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features

13 0.045426834 239 nips-2008-Tighter Bounds for Structured Estimation

14 0.043530289 245 nips-2008-Unlabeled data: Now it helps, now it doesn't

15 0.038698196 173 nips-2008-Optimization on a Budget: A Reinforcement Learning Approach

16 0.037325393 91 nips-2008-Generative and Discriminative Learning with Unknown Labeling Bias

17 0.036415044 228 nips-2008-Support Vector Machines with a Reject Option

18 0.035730354 161 nips-2008-On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization

19 0.035186097 203 nips-2008-Scalable Algorithms for String Kernels with Inexact Matching

20 0.034975056 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.125), (1, -0.03), (2, -0.106), (3, 0.027), (4, -0.067), (5, -0.059), (6, 0.473), (7, -0.163), (8, 0.291), (9, 0.446), (10, -0.117), (11, -0.18), (12, 0.099), (13, -0.07), (14, -0.096), (15, -0.057), (16, -0.131), (17, -0.017), (18, -0.2), (19, -0.163), (20, -0.064), (21, -0.029), (22, -0.023), (23, 0.051), (24, -0.067), (25, -0.005), (26, 0.036), (27, -0.022), (28, -0.088), (29, 0.054), (30, -0.058), (31, 0.048), (32, -0.001), (33, -0.059), (34, -0.025), (35, 0.006), (36, 0.047), (37, -0.008), (38, -0.051), (39, 0.025), (40, 0.07), (41, -0.016), (42, -0.002), (43, -0.018), (44, 0.017), (45, 0.002), (46, -0.026), (47, 0.012), (48, -0.032), (49, -0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98196709 174 nips-2008-Overlaying classifiers: a practical approach for optimal ranking

Author: Stéphan J. Clémençon, Nicolas Vayatis

Abstract: ROC curves are one of the most widely used displays to evaluate performance of scoring functions. In the paper, we propose a statistical method for directly optimizing the ROC curve. The target is known to be the regression function up to an increasing transformation and this boils down to recovering the level sets of the latter. We propose to use classifiers obtained by empirical risk minimization of a weighted classification error and then to construct a scoring rule by overlaying these classifiers. We show the consistency and rate of convergence to the optimal ROC curve of this procedure in terms of supremum norm and also, as a byproduct of the analysis, we derive an empirical estimate of the optimal ROC curve.

2 0.97926235 159 nips-2008-On Bootstrapping the ROC Curve

Author: Patrice Bertail, Stéphan J. Clémençon, Nicolas Vayatis

Abstract: This paper is devoted to thoroughly investigating how to bootstrap the ROC curve, a widely used visual tool for evaluating the accuracy of test/scoring statistics in the bipartite setup. The issue of confidence bands for the ROC curve is considered and a resampling procedure based on a smooth version of the empirical distribution called the "smoothed bootstrap" is introduced. Theoretical arguments and simulation results are presented to show that the "smoothed bootstrap" is preferable to a "naive" bootstrap in order to construct accurate confidence bands.

3 0.79611903 72 nips-2008-Empirical performance maximization for linear rank statistics

Author: Stéphan J. Clémençon, Nicolas Vayatis

Abstract: The ROC curve is known to be the golden standard for measuring performance of a test/scoring statistic regarding its capacity of discrimination between two populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be seen as summaries of the ROC curve. This paper highlights the fact that many of these empirical criteria can be expressed as (conditional) linear rank statistics. We investigate the properties of empirical maximizers of such performance criteria and provide preliminary results for the concentration properties of a novel class of random variables that we will call a linear rank process.

4 0.24885115 39 nips-2008-Bounding Performance Loss in Approximate MDP Homomorphisms

Author: Jonathan Taylor, Doina Precup, Prakash Panagaden

Abstract: We define a metric for measuring behavior similarity between states in a Markov decision process (MDP), which takes action similarity into account. We show that the kernel of our metric corresponds exactly to the classes of states defined by MDP homomorphisms (Ravindran & Barto, 2003). We prove that the difference in the optimal value function of different states can be upper-bounded by the value of this metric, and that the bound is tighter than previous bounds provided by bisimulation metrics (Ferns et al. 2004, 2005). Our results hold both for discrete and for continuous actions. We provide an algorithm for constructing approximate homomorphisms, by using this metric to identify states that can be grouped together, as well as actions that can be matched. Previous research on this topic is based mainly on heuristics.

5 0.18237381 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition

Author: Sudheendra Vijayanarasimhan, Kristen Grauman

Abstract: We introduce a framework for actively learning visual categories from a mixture of weakly and strongly labeled image examples. We propose to allow the category-learner to strategically choose what annotations it receives—based on both the expected reduction in uncertainty as well as the relative costs of obtaining each annotation. We construct a multiple-instance discriminative classifier based on the initial training data. Then all remaining unlabeled and weakly labeled examples are surveyed to actively determine which annotation ought to be requested next. After each request, the current classifier is incrementally updated. Unlike previous work, our approach accounts for the fact that the optimal use of manual annotation may call for a combination of labels at multiple levels of granularity (e.g., a full segmentation on some images and a present/absent flag on others). As a result, it is possible to learn more accurate category models with a lower total expenditure of manual annotation effort.

6 0.17831936 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields

7 0.17471543 225 nips-2008-Supervised Bipartite Graph Inference

8 0.16779573 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data

9 0.16199645 178 nips-2008-Performance analysis for L\ 2 kernel classification

10 0.14757489 190 nips-2008-Reconciling Real Scores with Binary Comparisons: A New Logistic Based Model for Ranking

11 0.13943793 85 nips-2008-Fast Rates for Regularized Objectives

12 0.13617423 162 nips-2008-On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost

13 0.1358505 163 nips-2008-On the Efficient Minimization of Classification Calibrated Surrogates

14 0.13572297 5 nips-2008-A Transductive Bound for the Voted Classifier with an Application to Semi-supervised Learning

15 0.134165 13 nips-2008-Adapting to a Market Shock: Optimal Sequential Market-Making

16 0.13302387 224 nips-2008-Structured ranking learning using cumulative distribution networks

17 0.13225976 173 nips-2008-Optimization on a Budget: A Reinforcement Learning Approach

18 0.13112341 78 nips-2008-Exact Convex Confidence-Weighted Learning

19 0.12942226 228 nips-2008-Support Vector Machines with a Reject Option

20 0.12195812 250 nips-2008-Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.016), (6, 0.071), (7, 0.05), (12, 0.039), (15, 0.017), (26, 0.051), (28, 0.457), (57, 0.027), (63, 0.016), (71, 0.03), (77, 0.026), (78, 0.02), (83, 0.047)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99209327 174 nips-2008-Overlaying classifiers: a practical approach for optimal ranking

Author: Stéphan J. Clémençon, Nicolas Vayatis

Abstract: ROC curves are one of the most widely used displays to evaluate performance of scoring functions. In the paper, we propose a statistical method for directly optimizing the ROC curve. The target is known to be the regression function up to an increasing transformation and this boils down to recovering the level sets of the latter. We propose to use classifiers obtained by empirical risk minimization of a weighted classification error and then to construct a scoring rule by overlaying these classifiers. We show the consistency and rate of convergence to the optimal ROC curve of this procedure in terms of supremum norm and also, as a byproduct of the analysis, we derive an empirical estimate of the optimal ROC curve.

2 0.98793799 115 nips-2008-Learning Bounded Treewidth Bayesian Networks

Author: Gal Elidan, Stephen Gould

Abstract: With the increased availability of data for complex domains, it is desirable to learn Bayesian network structures that are sufficiently expressive for generalization while also allowing for tractable inference. While the method of thin junction trees can, in principle, be used for this purpose, its fully greedy nature makes it prone to overfitting, particularly when data is scarce. In this work we present a novel method for learning Bayesian networks of bounded treewidth that employs global structure modifications and that is polynomial in the size of the graph and the treewidth bound. At the heart of our method is a triangulated graph that we dynamically update in a way that facilitates the addition of chain structures that increase the bound on the model’s treewidth by at most one. We demonstrate the effectiveness of our “treewidth-friendly” method on several real-life datasets. Importantly, we also show that by using global operators, we are able to achieve better generalization even when learning Bayesian networks of unbounded treewidth.

3 0.98753774 110 nips-2008-Kernel-ARMA for Hand Tracking and Brain-Machine interfacing During 3D Motor Control

Author: Lavi Shpigelman, Hagai Lalazar, Eilon Vaadia

Abstract: Using machine learning algorithms to decode intended behavior from neural activity serves a dual purpose. First, these tools allow patients to interact with their environment through a Brain-Machine Interface (BMI). Second, analyzing the characteristics of such methods can reveal the relative significance of various features of neural activity, task stimuli, and behavior. In this study we adapted, implemented and tested a machine learning method called Kernel Auto-Regressive Moving Average (KARMA), for the task of inferring movements from neural activity in primary motor cortex. Our version of this algorithm is used in an online learning setting and is updated after a sequence of inferred movements is completed. We first used it to track real hand movements executed by a monkey in a standard 3D reaching task. We then applied it in a closed-loop BMI setting to infer intended movement, while the monkey’s arms were comfortably restrained, thus performing the task using the BMI alone. KARMA is a recurrent method that learns a nonlinear model of output dynamics. It uses similarity functions (termed kernels) to compare between inputs. These kernels can be structured to incorporate domain knowledge into the method. We compare KARMA to various state-of-the-art methods by evaluating tracking performance and present results from the KARMA-based BMI experiments.

4 0.98679918 190 nips-2008-Reconciling Real Scores with Binary Comparisons: A New Logistic Based Model for Ranking

Author: Nir Ailon

Abstract: The problem of ranking arises ubiquitously in almost every aspect of life, and in particular in Machine Learning/Information Retrieval. A statistical model for ranking predicts how humans rank subsets V of some universe U. In this work we define a statistical model for ranking that satisfies certain desirable properties. The model automatically gives rise to a logistic regression based approach to learning how to rank, for which the score and comparison based approaches are dual views. This offers a new generative approach to ranking which can be used for IR. There are two main contexts for this work. The first is the theory of econometrics and study of statistical models explaining human choice of alternatives. In this context, we will compare our model with other well known models. The second context is the problem of ranking in machine learning, usually arising in the context of information retrieval. Here, much work has been done in the discriminative setting, where different heuristics are used to define ranking risk functions. Our model is built rigorously and axiomatically based on very simple desirable properties defined locally for comparisons, and automatically implies the existence of a global score function serving as a natural model parameter which can be efficiently fitted to pairwise comparison judgment data by solving a convex optimization problem.

5 0.98592436 117 nips-2008-Learning Taxonomies by Dependence Maximization

Author: Matthew Blaschko, Arthur Gretton

Abstract: We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters. The algorithms work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a more informative visualization of complex data than simple clustering; in addition, taking into account the relations between different clusters is shown to substantially improve the quality of the clustering, when compared with state-of-the-art algorithms in the literature (both spectral clustering and a previous dependence maximization approach). We demonstrate our algorithm on image and text data.

6 0.9857493 72 nips-2008-Empirical performance maximization for linear rank statistics

7 0.98552102 206 nips-2008-Sequential effects: Superstition or rational behavior?

8 0.98410463 126 nips-2008-Localized Sliced Inverse Regression

9 0.98144394 222 nips-2008-Stress, noradrenaline, and realistic prediction of mouse behaviour using reinforcement learning

10 0.97625589 159 nips-2008-On Bootstrapping the ROC Curve

11 0.9723143 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models

12 0.96127909 101 nips-2008-Human Active Learning

13 0.96067727 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields

14 0.95530146 55 nips-2008-Cyclizing Clusters via Zeta Function of a Graph

15 0.95445317 223 nips-2008-Structure Learning in Human Sequential Decision-Making

16 0.95333773 102 nips-2008-ICA based on a Smooth Estimation of the Differential Entropy

17 0.9532339 211 nips-2008-Simple Local Models for Complex Dynamical Systems

18 0.95074153 107 nips-2008-Influence of graph construction on graph-based clustering measures

19 0.9505527 112 nips-2008-Kernel Measures of Independence for non-iid Data

20 0.950315 224 nips-2008-Structured ranking learning using cumulative distribution networks