nips nips2007 nips2007-39 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Phil Long, Rocco Servedio
Abstract: We show that any weak ranker that can achieve an area under the ROC curve slightly better than 1/2 (which can be achieved by random guessing) can be efficiently boosted to achieve an area under the ROC curve arbitrarily close to 1. We further show that this boosting can be performed even in the presence of independent misclassification noise, given access to a noise-tolerant weak ranker.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We show that any weak ranker that can achieve an area under the ROC curve slightly better than 1/2 (which can be achieved by random guessing) can be efficiently boosted to achieve an area under the ROC curve arbitrarily close to 1. [sent-6, score-1.178]
2 We further show that this boosting can be performed even in the presence of independent misclassification noise, given access to a noise-tolerant weak ranker. [sent-7, score-0.637]
3 This can be formulated as a ranking problem, where the algorithm takes as input a list of examples of members and non-members of the class, and outputs a function that can be used to rank candidates. [sent-10, score-0.396]
4 ROC curves [12, 3] are often used to evaluate the quality of a ranking function. [sent-12, score-0.222]
5 A point on an ROC curve is obtained by cutting off the ranked list, and checking how many items above the cutoff are members of the target class (“true positives”), and how many are not (“false positives”). [sent-13, score-0.157]
6 It is obtained by rescaling the axes so the true positives and false positives vary between 0 and 1, and, as the name implies, examining the area under the resulting curve. [sent-15, score-0.151]
7 The AUC measures the ability of a ranker to identify regions in feature space that are unusually densely populated with members of a given class. [sent-16, score-0.471]
8 A ranker can succeed according to this criterion even if positive examples are less dense than negative examples everywhere, but, in order to succeed, it must identify where the positive examples tend to be. [sent-17, score-0.757]
9 We show that any weak ranker can be boosted to a strong ranker that achieves AUC arbitrarily close to the best possible value of 1. [sent-21, score-1.352]
10 We show that in this setting, given a noise-tolerant weak ranker (that achieves nontrivial AUC in the presence of noisy data as described above), we can boost to a strong ranker that achieves AUC at least 1 − ǫ, for any η < 1/2 and any ǫ > 0. [sent-23, score-1.521]
11 Freund, Iyer, Schapire and Singer [4] introduced RankBoost, which performs ranking with more fine-grained control over preferences between pairs of items than we consider here. [sent-25, score-0.222]
12 They performed an analysis that implies a bound on the AUC of the boosted ranking function in terms of a different measure of the quality of weak rankers. [sent-26, score-0.643]
13 Kalai and Servedio [7] showed that, if data is corrupted with noise at a rate η, it is possible to boost the accuracy of any noise-tolerant weak learner arbitrarily close to 1 − η, and they showed that it is impossible to boost beyond 1 − η. [sent-30, score-0.82]
14 In contrast, we show that, in the presence of noise at a rate arbitrarily close to 1/2, the AUC can be boosted arbitrarily close to 1. [sent-31, score-0.352]
15 Our noise tolerant boosting algorithm uses as a subroutine the “martingale booster” for classification of Long and Servedio [9]. [sent-32, score-0.235]
16 The key observation is that a weak ranker can be used to find a “two-sided” weak classifier (Lemma 4), which achieves accuracy slightly better than random guessing on both positive and negative examples. [sent-34, score-1.353]
17 Two-sided weak classifiers can be boosted to obtain accuracy arbitrarily close to 1, also on both the positive examples and the negative examples; a proof of this is implicit in the analysis of [9]. [sent-35, score-0.712]
18 Known approaches to noise-tolerant boosting [7, 9] force the weak learner to provide a two-sided weak hypothesis by balancing the distributions that are constructed so that both classes are equally likely. [sent-38, score-0.978]
19 (We note that the lower bound from [7] is proved using a construction in which the class probability of positive examples is less than the noise rate; the essence of that proof is to show that in that situation it is impossible to balance the distribution given access to noisy examples. [sent-40, score-0.355]
20 ) In contrast, having a weak ranker provides enough leverage to yield a two-sided weak classifier without needing any rebalancing. [sent-41, score-1.135]
21 In Section 3, we analyze boosting the AUC when there is no noise in an abstract model where the weak learner is given a distribution and returns a weak ranker, and sampling issues are abstracted away. [sent-44, score-1.058]
22 In Section 4, we consider boosting in the presence of noise in a similarly abstract model. [sent-45, score-0.302]
23 We say that D is nontrivial (for c) if D assigns nonzero probability to both positive and negative examples. [sent-49, score-0.17]
24 We write D+ to denote the marginal distribution over positive examples and D− to denote the marginal distribution over negative examples, so D is a mixture of the distributions D+ and D− . [sent-50, score-0.167]
25 As has been previously pointed out, we may view any function h : X → R as a ranking of X. [sent-51, score-0.222]
26 Note that if h(x1 ) = h(x2 ) then the ranking does not order x1 relative to x2 . [sent-52, score-0.222]
27 Given a ranking function h : X → R, for each value θ ∈ R there is a point (αθ , βθ ) on the ROC curve of h, where αθ is the false positive rate and βθ is the true positive rate of the classifier obtained by thresholding h at θ: αθ = D− [h(x) ≥ θ] and βθ = D+ [h(x) ≥ θ]. [sent-53, score-0.569]
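As a concrete illustration of these definitions, the sketch below computes the empirical ROC points (αθ, βθ) and the trapezoidal AUC of a ranking function on a finite labeled sample. The helper name and the numpy-based implementation are illustrative choices, not code from the paper, and the sample is assumed to contain at least one example of each class.

```python
import numpy as np

def roc_points_and_auc(scores, labels):
    """Empirical ROC points (alpha_theta, beta_theta) and trapezoidal AUC.

    scores: values of a ranking function h(x) on the sample.
    labels: true class labels in {-1, +1} (at least one of each assumed).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = labels == 1, labels == -1
    # Sweep the threshold theta over +infinity and all observed scores.
    thetas = np.r_[np.inf, np.sort(np.unique(scores))[::-1]]
    alphas = [(scores[neg] >= t).mean() for t in thetas]   # false positive rate
    betas = [(scores[pos] >= t).mean() for t in thetas]    # true positive rate
    auc = np.trapz(betas, alphas)                          # area under the curve
    return list(zip(alphas, betas)), auc

# A ranker that puts every positive above every negative gets AUC = 1.
_, auc = roc_points_and_auc([0.9, 0.8, 0.4, 0.3], [1, 1, -1, -1])
print(auc)  # 1.0
```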
28 This motivates the following definition: Definition 1 A weak ranker with advantage γ is an algorithm that, given any nontrivial distribution D, returns a function h : X → R that has AUC(h; D) ≥ 1/2 + γ. [sent-62, score-0.921]
29 In the rest of the paper we show how boosting algorithms originally designed for classification can be adapted to convert weak rankers into “strong” rankers (that achieve AUC at least 1 − ǫ) in a range of different settings. [sent-63, score-0.732]
30 3 From weak to strong AUC The main result of this section is a simple proof that the AUC can be boosted. [sent-64, score-0.396]
31 We achieve this in a relatively straightforward way by using the standard AdaBoost algorithm for boosting classifiers. [sent-65, score-0.178]
32 In this model the booster passes a sequence of distributions to a weak ranker, which returns ranking functions h1 , h2 , . . . [sent-69, score-1.029]
33 In this model we prove the following: Theorem 2 There is an algorithm AUCBoost that, given access to a weak ranker with advantage γ as an oracle, for any nontrivial distribution D, outputs a ranking function with AUC at least 1 − ǫ. [sent-74, score-1.252]
34 The AUCBoost algorithm makes T = O(log(1/ǫ)/γ²) many calls to the weak ranker. [sent-75, score-0.398]
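The following is a minimal sketch of how such a booster can be organized, assuming sample access to D and two helper routines, weak_ranker(sample, weights) for the weak ranker oracle and rank_to_classifier(h, sample, weights) for the Lemma 4 conversion described below; the names and the plain AdaBoost loop are illustrative and not the paper's exact AUCBoost pseudocode. With base classifiers of advantage Ω(γ), the usual AdaBoost analysis gives a round count of the O(log(1/ǫ)/γ²) form quoted above.

```python
import math

def auc_boost(sample, labels, weak_ranker, rank_to_classifier, T):
    """AdaBoost-style loop over a weak ranker; the combined score
    sum_t alpha_t * h_t(x) is used as the final ranking function.

    weak_ranker(sample, weights)           -> real-valued function h
    rank_to_classifier(h, sample, weights) -> {-1,+1}-valued classifier h_t
    (the Lemma 4 thresholding step).  Both helpers are assumed, not defined here.
    """
    m = len(sample)
    w = [1.0 / m] * m                       # distribution D_t over the sample
    ensemble = []                           # list of (alpha_t, h_t)
    for _ in range(T):
        h = weak_ranker(sample, w)
        ht = rank_to_classifier(h, sample, w)
        err = sum(wi for wi, x, y in zip(w, sample, labels) if ht(x) != y)
        err = min(max(err, 1e-12), 1.0 - 1e-12)   # guard against 0/1 error
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, ht))
        # Standard AdaBoost reweighting, followed by renormalization.
        w = [wi * math.exp(-alpha * y * ht(x))
             for wi, x, y in zip(w, sample, labels)]
        z = sum(w)
        w = [wi / z for wi in w]
    return lambda x: sum(alpha * ht(x) for alpha, ht in ensemble)
```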
35 Since the AUC does not depend on the relative frequency of positive and negative examples, it requires a learner to perform well on both positive and negative examples. [sent-77, score-0.306]
36 When such a requirement is imposed on a base classifier, it has been called two-sided weak learning. [sent-78, score-0.355]
37 The key to boosting the AUC is the observation (Lemma 4 below) that a weak ranker can be used to generate a two-sided weak learner. [sent-79, score-1.286]
38 Definition 3 A γ two-sided weak learner is an algorithm that, given a nontrivial distribution D, outputs a hypothesis h that satisfies both Prx∈D+ [h(x) = 1] ≥ 1/2 + γ and Prx∈D− [h(x) = −1] ≥ 1/2 + γ. [sent-80, score-0.577]
39 Lemma 4 Let A be a weak ranking algorithm with advantage γ. [sent-82, score-0.631]
40 Then there is a γ/4 two-sided weak learner A′ based on A that always returns classifiers with equal error rate on positive and negative examples. [sent-83, score-0.621]
41 Proof: Algorithm A′ first runs A to get a real-valued ranking function h : X → R. [sent-84, score-0.222]
42 Thus, for the classifier obtained by thresholding h at θ, the class conditional error rates p+ := D+ [h(x) < θ] and p− := D− [h(x) ≥ θ] satisfy p+ + p− ≤ 1 − γ. [sent-88, score-0.202]
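One way to see why a threshold with this property must exist (a hedged reconstruction, not the paper's exact proof): treating the ROC curve of h as the graph of β as a function of α, and using AUC(h; D) ≥ 1/2 + γ,

```latex
\begin{align*}
\mathrm{AUC}(h;D) - \tfrac{1}{2}
  &= \int_0^1 \bigl(\beta(\alpha) - \alpha\bigr)\, d\alpha
   \;\le\; \max_{\theta}\,\bigl(\beta_\theta - \alpha_\theta\bigr),
   &&\text{so some $\theta$ has } \beta_\theta - \alpha_\theta \ge \gamma,\\
p_+ + p_-
  &= (1 - \beta_\theta) + \alpha_\theta
   \;\le\; 1 - \gamma
   &&\text{for that choice of } \theta.
\end{align*}
```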
43 We will also need the following simple lemma which shows that a classifier that is good on both the positive and the negative examples, when viewed as a ranking function, achieves a good AUC. [sent-101, score-0.43]
44 Figure 1 (horizontal axis: false positive rate): The curved line represents the ROC curve for ranking function h. [sent-110, score-0.427]
45 The second curve shown represents the ROC curve for a ranker h′ which agrees with h on those x for which h(x) ≥ θ but randomly ranks those x for which h(x) < θ. [sent-114, score-0.506]
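A worked version of the Lemma 5-type claim, under the assumption that the binary-valued classifier is viewed as a ranking function into {−1, 1} and its AUC is computed with the usual trapezoidal interpolation: if h errs on an ε₊ fraction of positives and an ε₋ fraction of negatives, its ROC curve passes through (0, 0), (ε₋, 1 − ε₊) and (1, 1), so

```latex
\mathrm{AUC}
  = \tfrac12\,\varepsilon_-\,(1-\varepsilon_+)
  \;+\; \tfrac12\,(1-\varepsilon_-)\,\bigl((1-\varepsilon_+)+1\bigr)
  \;=\; 1 - \frac{\varepsilon_+ + \varepsilon_-}{2},
```

which is at least 1 − ǫ whenever both class-conditional error rates are at most ǫ.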
46 In round t, each copy of AdaBoost passes its reweighted distribution Dt to the weak ranker, and then uses the process of Lemma 4 to convert the resulting weak ranking function into a classifier ht with two-sided advantage at least γ/4. [sent-121, score-1.112]
47 This can be done by sorting the examples using the weak ranking function (in O(m log m) time steps) and processing the examples in the resulting order, keeping running counts of the number of errors of each type. [sent-126, score-0.691]
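A sketch of that scan (hypothetical helper name; the subsequent randomization that Lemma 4 uses to make the two error rates exactly equal is omitted, and the sample is assumed to contain both classes):

```python
def balanced_threshold(h, sample, labels):
    """Pick a cutoff theta for the ranking function h so that the empirical
    class-conditional error rates p+ = Pr[h(x) < theta | y = +1] and
    p- = Pr[h(x) >= theta | y = -1] are as close to equal as possible.

    Runs in O(m log m): one sort plus one linear scan with running counts.
    """
    scored = sorted(zip((h(x) for x in sample), labels), key=lambda t: -t[0])
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_theta, best_gap = float("inf"), 1.0   # theta = +inf gives p+ = 1, p- = 0
    pos_above = neg_above = 0
    for i, (score, y) in enumerate(scored):
        if y == 1:
            pos_above += 1
        else:
            neg_above += 1
        # Only consider cutting after the last occurrence of a given score.
        if i + 1 < len(scored) and scored[i + 1][0] == score:
            continue
        p_plus = 1.0 - pos_above / n_pos       # positives ranked below the cutoff
        p_minus = neg_above / n_neg            # negatives ranked at or above it
        if abs(p_plus - p_minus) < best_gap:
            best_gap, best_theta = abs(p_plus - p_minus), score
    return best_theta  # classify as +1 iff h(x) >= best_theta
```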
48 4 Boosting weak rankers in the presence of misclassification noise The noise model: independent misclassification noise. [sent-127, score-0.676]
49 In this framework there is a noise rate η < 1/2, and each example (positive or negative) drawn from distribution D has its true label c(x) independently flipped with probability η before it is given to the learner. [sent-129, score-0.155]
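A small sketch of how such a noisy example oracle can be simulated, assuming a generator draw_clean_example() for labeled draws from D (names are illustrative):

```python
import random

def noisy_oracle(draw_clean_example, eta, rng=random):
    """Independent misclassification noise: draw (x, c(x)) from D and flip the
    label with probability eta < 1/2 before handing the example to the learner."""
    def draw():
        x, y = draw_clean_example()
        if rng.random() < eta:
            y = -y
        return x, y
    return draw
```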
50 Boosting weak rankers in the presence of independent misclassification noise. [sent-131, score-0.508]
51 We now show how the AUC can be boosted arbitrarily close to 1 even if the data given to the booster is corrupted with independent misclassification noise, using weak rankers that are able to tolerate independent misclassification noise. [sent-132, score-0.68]
52 Figure 2: The branching program produced by the boosting algorithm. [sent-146, score-0.419]
53 Each node vi,t is labeled with a weak classifier hi,t ; left edges correspond to -1 and right edges to 1. [sent-147, score-0.505]
54 (Terminal nodes vj,T +1 of the branching program in Figure 2 are labeled output -1 or output 1.) As in the previous section we begin by abstracting away sampling issues and using a model in which the booster passes a distribution to a weak ranker. [sent-151, score-0.471]
55 Definition 6 A noise-tolerant weak ranker with advantage γ is an algorithm with the following property: for any noise rate η < 1/2, given a noisy distribution Dη , the algorithm outputs a ranking function h : X → R such that AUC(h; D) ≥ 1/2 + γ. [sent-153, score-1.256]
56 Our algorithm for boosting the AUC in the presence of noise uses the Basic MartiBoost algorithm (see Section 4 of [9]). [sent-154, score-0.302]
57 This algorithm boosts any two-sided weak learner to arbitrarily high accuracy and works in a series of rounds. [sent-155, score-0.524]
58 It then calls a two-sided weak learner t times, using each of D0,t , . . . , Dt−1,t in turn. [sent-168, score-0.484]
59 The output of Basic MartiBoost is a layered branching program defined as follows. [sent-177, score-0.268]
60 There is a node vi,t for each round 1 ≤ t ≤ T + 1 and each index 0 ≤ i < t (that is, for each bin constructed during training). [sent-178, score-0.165]
61 An item x is routed through the branching program the same way a labeled example (x, y) would have been routed during the training phase: it starts in node v0,1 , and from each node vi,t it goes to vi,t+1 if hi,t (x) = −1, and to vi+1,t+1 otherwise. [sent-179, score-0.622]
62 When the item x arrives at a terminal node of the branching program in layer T + 1, it is at some node vj,T +1 . [sent-180, score-0.504]
63 The prediction is 1 if j ≥ T /2 and is −1 if j < T /2; in other words, the prediction is according to the majority vote of the weak classifiers that were encountered along the path through the branching program that the example followed. [sent-181, score-0.623]
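A sketch of this routing and prediction rule, assuming the weak classifiers are stored as classifiers[t][i] = hi,t with 0-indexed rounds and bins (an illustrative layout, not the paper's notation):

```python
def route_and_predict(x, classifiers, T):
    """Route x through the martingale-boosting branching program and output the
    majority vote of the weak classifiers encountered along its path.

    classifiers[t][i] is the weak classifier h_{i,t} at node v_{i,t},
    for rounds t = 0,...,T-1 and bins i = 0,...,t (0-indexed here).
    """
    i = 0                          # start at node v_{0,1}
    for t in range(T):
        if classifiers[t][i](x) == 1:
            i += 1                 # a +1 prediction moves one bin to the right
        # a -1 prediction keeps the same bin index in the next layer
    # Terminal node v_{i,T+1}: predict by majority vote, i.e. by which half
    # of the final layer the example landed in.
    return 1 if i >= T / 2 else -1
```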
64 (The crux of the proof is the observation that positive (respectively, negative) examples are routed through the branching program according to a random walk that is biased to the right (respectively, left); hence the name “martingale boosting.”) [sent-184, score-0.491]
65 Then for T = O(log(1/ǫ)/γ²), Basic MartiBoost constructs a branching program H such that D+ [H(x) = −1] ≤ ǫ and D− [H(x) = 1] ≤ ǫ. [sent-189, score-0.305]
66 Given access to a noise-tolerant weak ranker A with advantage γ, at each node vi,t the Basic MartiRank algorithm runs A and proceeds as described in Lemma 4 to obtain a weak classifier hi,t . [sent-191, score-1.371]
67 Basic MartiRank runs Basic MartiBoost with T = O(log(1/ǫ)/γ²) and simply uses the resulting classifier H as its ranking function. [sent-192, score-0.222]
68 The following theorem shows that Basic MartiRank is an effective AUC booster in the presence of independent misclassification noise: Theorem 8 Fix any η < 1/2 and any ǫ > 0. [sent-193, score-0.178]
69 Given access to Dη and a noise-tolerant weak ranker A with advantage γ, Basic MartiRank outputs a branching program H such that AUC(H; D) ≥ 1 − ǫ. [sent-194, score-1.211]
70 Proof: Fix any node vi,t in the branching program. [sent-195, score-0.319]
71 The crux of the proof is the following simple observation: for a labeled example (x, y), the route through the branching program that is taken by (x, y) is determined completely by the predictions of the base classifiers on x, i.e., it does not depend on the label y. [sent-196, score-0.372]
72 So each time the noise-tolerant weak ranker A is invoked at a node vi,t , it is indeed the case that the distribution that it is given is an independent misclassification noise distribution. [sent-204, score-0.982]
73 Consequently A does construct weak rankers with AUC at least 1/2 + γ, and the conversion of Lemma 4 yields weak classifiers that have advantage γ/4 with respect to the underlying distribution Di,t . [sent-205, score-0.877]
74 Given this, Lemma 7 implies that the final classifier H has error at most ǫ on both positive and negative examples drawn from the original distribution D, and Lemma 5 then implies that H, viewed as a ranker, achieves AUC at least 1 − ǫ. [sent-206, score-0.229]
75 In [9], a more complex variant of Basic MartiBoost, called Noise-Tolerant SMartiBoost, is presented and is shown to boost any noise-tolerant weak learning algorithm to any accuracy less than 1 − η in the presence of independent misclassification noise. [sent-207, score-0.523]
76 Formally, we assume that the weak ranker is given access to an oracle for the noisy distribution Dη . [sent-210, score-0.932]
77 We thus now view a noise-tolerant weak ranker with advantage γ as an algorithm A with the following property: for any noise rate η < 1/2, given access to an oracle for Dη , the algorithm outputs a ranking function h : X → R such that AUC(h; D) ≥ 1/2 + γ. [sent-211, score-1.352]
78 We let mA denote the number of examples from each class that suffice for A to construct a ranking function as described above. [sent-212, score-0.309]
79 In other words, if A is provided with a sample of draws from Dη such that each class, positive and negative, has at least mA points in the sample with that true label, then algorithm A outputs a γ-advantage weak ranking function. [sent-213, score-0.785]
80 (Note that for simplicity we are assuming here that the weak ranker always constructs a weak ranking function with the desired advantage, i.e., that it has no failure probability.) [sent-214, score-1.394]
81 We prove that SMartiRank is computationally efficient, has moderate sample complexity, and efficiently generates a high-accuracy final ranking function with respect to the underlying distribution D. [sent-218, score-0.222]
82 The main challenge in [9] is the following: for each node vi,t in the branching program, the boosting algorithm considered there must simulate a balanced version of the induced distribution Di,t which puts equal weight on positive and negative examples. [sent-220, score-0.659]
83 The solution in [9] is to “freeze” any node that is reached by only a tiny fraction of the positive examples and simply classify any example that reaches it as negative; the analysis argues that this freezing only mildly degrades the accuracy of the final hypothesis. [sent-222, score-0.392]
84 In the ranking scenario that we now consider, we do not need to construct balanced distributions, but we do need to obtain a non-negligible number of examples from each class in order to run the weak learner at a given node. [sent-223, score-0.794]
85 So as in [9] we still freeze some nodes, but with a twist: we now freeze nodes which have the property that for some class label (positive or negative), only a tiny fraction of examples from D with that class label reach the node. [sent-224, score-0.469]
86 With this criterion for freezing we can prove that the final classifier constructed has high accuracy both on positive and negative examples, which is what we need to achieve good AUC. [sent-225, score-0.205]
87 Given a node vi,t and a bit b ∈ {−1, 1}, let pb i,t denote D[x reaches vi,t and c(x) = b]. [sent-227, score-0.238]
88 The SMartiRank algorithm is like Basic MartiBoost but with the following difference: for each node vi,t and each value b ∈ {−1, 1}, if pb i,t ≤ ǫ · D[c(x) = b] / (T (T + 1)) (2) then the node vi,t is “frozen,” i.e. it is made a terminal node and labeled −b. [sent-228, score-0.236]
89 (If this condition holds for both values of b at a particular node vi,t then the node is frozen and either output value may be used as the label.) [sent-231, score-0.346]
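A sketch of the freezing test using the idealized criterion (2), i.e. exact probabilities rather than the sample estimates discussed below; the names are hypothetical, and the convention of labeling a frozen node with −b when class b rarely reaches it is an assumption consistent with the freezing argument, not a quoted detail of SMartiRank:

```python
def frozen_label(p_node, class_mass, eps, T):
    """Decide whether a node v_{i,t} should be frozen, per criterion (2).

    p_node[b]     = D[x reaches v_{i,t} and c(x) = b]   for b in {-1, +1}
    class_mass[b] = D[c(x) = b]
    Returns None if the node is not frozen, otherwise the label to assign:
    -b when class b rarely reaches the node (either label if both are rare).
    """
    rare = [b for b in (-1, 1)
            if p_node[b] <= eps * class_mass[b] / (T * (T + 1))]
    if not rare:
        return None          # keep the node: run the weak ranker here
    if len(rare) == 2:
        return 1             # both classes rare: either output value works
    return -rare[0]          # label with the class that does reach the node
```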
90 Then for T = O(log(1/ǫ)/γ²), the final branching program hypothesis H that SMartiRank constructs will have D+ [H(x) = −1] ≤ ǫ and D− [H(x) = 1] ≤ ǫ. [sent-233, score-0.336]
91 Given an unlabeled instance x ∈ X, we say that x freezes at node vi,t if x’s path through the branching program causes it to terminate at a node vi,t with t < T + 1 (i.e., at a frozen node). [sent-235, score-0.598]
92 We have D[x freezes and c(x) = 1] = Σi,t D[x freezes at vi,t and c(x) = 1] ≤ Σi,t ǫ · D[c(x) = 1]/(T (T + 1)) ≤ (ǫ/2) · D[c(x) = 1]. [sent-238, score-0.188]
93 To allow for the fact that they are just estimates, it will be more conservative, and freeze when the estimate of pb i,t is at most ǫ/(4T (T + 1)) times the estimate of D[c(x) = b]. [sent-248, score-0.175]
94 Now in order to determine whether node vi,t should be frozen, we must compare this estimate with a similarly accurate estimate of pb (arguments similar to those of, e. [sent-251, score-0.178]
95 We have pb i,t = D[x reaches vi,t ] · D[c(x) = b | x reaches vi,t ] = Dη [x reaches vi,t ] · Di,t [c(x) = b] = Dη [x reaches vi,t ] · (Dη i,t [y = b] − η)/(1 − 2η). [sent-255, score-0.3]
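The last equality is the standard inversion of independent misclassification noise: because the route of x through the branching program does not depend on its label, at a fixed node vi,t

```latex
D^{\eta}_{i,t}[y = b]
  = (1-\eta)\, D_{i,t}[c(x) = b] + \eta\,\bigl(1 - D_{i,t}[c(x) = b]\bigr)
  = \eta + (1-2\eta)\, D_{i,t}[c(x) = b],
\qquad\text{hence}\qquad
D_{i,t}[c(x) = b] \;=\; \frac{D^{\eta}_{i,t}[y = b] - \eta}{1 - 2\eta}.
```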
96 ) Thus for each vi,t , we can determine whether to freeze it in the execution of SMartiRank using poly(T, 1/ǫ, 1/p, 1/(1 − 2η)) draws from Dη . [sent-259, score-0.2]
97 For each of the nodes that are not frozen, we must run the noise-tolerant weak ranker A using the distribution Dη i,t . [sent-260, score-0.808]
98 Thus, O(T² mA /(ǫp)) many draws from Dη are required in order to run the weak learner A at any particular node. [sent-263, score-0.526]
99 Since there are O(T²) many nodes overall, we have that all in all O(T⁴ mA /(ǫp)) many draws from Dη are required, in addition to the poly(T, 1/ǫ, 1/p, 1/(1 − 2η)) draws required to identify which nodes to freeze. [sent-264, score-0.226]
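A rough accounting of where these totals come from (a sketch under the assumption that p denotes the smaller of the two class probabilities under D): by the conservative freezing criterion, a node that is not frozen receives at least an Ω(ǫp/T²) fraction of each class, so gathering the mA examples of each class needed by A at that node takes O(T² mA/(ǫp)) draws, and

```latex
\underbrace{O(T^{2})}_{\text{nodes}}
\;\times\;
\underbrace{O\!\left(\frac{T^{2}\, m_A}{\epsilon\, p}\right)}_{\text{draws per node}}
\;=\;
O\!\left(\frac{T^{4}\, m_A}{\epsilon\, p}\right)
\quad\text{draws from } D^{\eta}.
```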
100 Given access to an oracle for Dη and a noise-tolerant weak ranker A with advantage γ, the SMartiRank algorithm makes mA · poly(1/ǫ, 1/γ, 1/(1 − 2η), 1/p) calls to Dη , and with probability 1 − δ outputs a branching program H such that AUC(H; D) ≥ 1 − ǫ. [sent-266, score-1.314]
wordName wordTfidf (topN-words)
[('ranker', 0.425), ('auc', 0.398), ('weak', 0.355), ('ranking', 0.222), ('branching', 0.201), ('smartirank', 0.181), ('martiboost', 0.163), ('boosting', 0.151), ('roc', 0.15), ('martirank', 0.126), ('misclassi', 0.12), ('node', 0.118), ('freeze', 0.115), ('frozen', 0.11), ('prx', 0.094), ('freezes', 0.094), ('er', 0.091), ('booster', 0.086), ('rankers', 0.086), ('learner', 0.086), ('draws', 0.085), ('noise', 0.084), ('classi', 0.083), ('curve', 0.081), ('aucboost', 0.072), ('program', 0.067), ('presence', 0.067), ('boosted', 0.066), ('boost', 0.064), ('access', 0.064), ('lemma', 0.063), ('servedio', 0.063), ('def', 0.062), ('adaboost', 0.062), ('pb', 0.06), ('nontrivial', 0.06), ('oracle', 0.06), ('reaches', 0.06), ('negative', 0.059), ('examples', 0.057), ('advantage', 0.054), ('positive', 0.051), ('ht', 0.049), ('poly', 0.048), ('thresholding', 0.048), ('dt', 0.048), ('round', 0.047), ('cortes', 0.046), ('members', 0.046), ('arbitrarily', 0.046), ('basic', 0.045), ('ers', 0.045), ('outputs', 0.045), ('balanced', 0.044), ('routed', 0.043), ('rate', 0.043), ('calls', 0.043), ('positives', 0.043), ('corrupted', 0.041), ('proof', 0.041), ('mohri', 0.04), ('kalai', 0.038), ('martingale', 0.038), ('tiny', 0.038), ('accuracy', 0.037), ('constructs', 0.037), ('guessing', 0.036), ('everywhere', 0.036), ('achieves', 0.035), ('simulate', 0.035), ('area', 0.035), ('rhs', 0.035), ('labeled', 0.032), ('rudin', 0.031), ('crux', 0.031), ('freezing', 0.031), ('pru', 0.031), ('hypothesis', 0.031), ('fix', 0.031), ('passes', 0.03), ('class', 0.03), ('false', 0.03), ('ma', 0.029), ('rocco', 0.029), ('nodes', 0.028), ('label', 0.028), ('noisy', 0.028), ('schapire', 0.028), ('returns', 0.027), ('least', 0.027), ('additive', 0.027), ('achieve', 0.027), ('rankboost', 0.027), ('ipped', 0.027), ('pr', 0.027), ('freund', 0.026), ('list', 0.026), ('cation', 0.026), ('theorem', 0.025), ('nal', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 39 nips-2007-Boosting the Area under the ROC Curve
Author: Phil Long, Rocco Servedio
Abstract: We show that any weak ranker that can achieve an area under the ROC curve slightly better than 1/2 (which can be achieved by random guessing) can be efficiently boosted to achieve an area under the ROC curve arbitrarily close to 1. We further show that this boosting can be performed even in the presence of independent misclassification noise, given access to a noise-tolerant weak ranker.
2 0.18971844 147 nips-2007-One-Pass Boosting
Author: Zafer Barutcuoglu, Phil Long, Rocco Servedio
Abstract: This paper studies boosting algorithms that make a single pass over a set of base classifiers. We first analyze a one-pass algorithm in the setting of boosting with diverse base classifiers. Our guarantee is the same as the best proved for any boosting algorithm, but our one-pass algorithm is much faster than previous approaches. We next exhibit a random source of examples for which a “picky” variant of AdaBoost that skips poor base classifiers can outperform the standard AdaBoost algorithm, which uses every base classifier, by an exponential factor. Experiments with Reuters and synthetic data show that one-pass boosting can substantially improve on the accuracy of Naive Bayes, and that picky boosting can sometimes lead to a further improvement in accuracy.
3 0.17670812 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
Author: Ping Li, Qiang Wu, Christopher J. Burges
Abstract: We cast the ranking problem as (1) multiple classification (“Mc”) (2) multiple ordinal classification, which lead to computationally tractable learning algorithms for relevance ranking in Web search. We consider the DCG criterion (discounted cumulative gain), a standard quality measure in information retrieval. Our approach is motivated by the fact that perfect classifications result in perfect DCG scores and the DCG errors are bounded by classification errors. We propose using the Expected Relevance to convert class probabilities into ranking scores. The class probabilities are learned using a gradient boosting tree algorithm. Evaluations on large-scale datasets show that our approach can improve LambdaRank [5] and the regressions-based ranker [6], in terms of the (normalized) DCG scores. An efficient implementation of the boosting tree algorithm is also presented.
4 0.16222952 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers
Author: Marco Barreno, Alvaro Cardenas, J. D. Tygar
Abstract: We present a new analysis for the combination of binary classifiers. Our analysis makes use of the Neyman-Pearson lemma as a theoretical basis to analyze combinations of classifiers. We give a method for finding the optimal decision rule for a combination of classifiers and prove that it has the optimal ROC curve. We show how our method generalizes and improves previous work on combining classifiers and generating ROC curves. 1
5 0.16203326 90 nips-2007-FilterBoost: Regression and Classification on Large Datasets
Author: Joseph K. Bradley, Robert E. Schapire
Abstract: We study boosting in the filtering setting, where the booster draws examples from an oracle instead of using a fixed training set and so may train efficiently on very large datasets. Our algorithm, which is based on a logistic regression technique proposed by Collins, Schapire, & Singer, requires fewer assumptions to achieve bounds equivalent to or better than previous work. Moreover, we give the first proof that the algorithm of Collins et al. is a strong PAC learner, albeit within the filtering setting. Our proofs demonstrate the algorithm’s strong theoretical properties for both classification and conditional probability estimation, and we validate these results through extensive experiments. Empirically, our algorithm proves more robust to noise and overfitting than batch boosters in conditional probability estimation and proves competitive in classification. 1
6 0.14326903 166 nips-2007-Regularized Boost for Semi-Supervised Learning
7 0.11510673 6 nips-2007-A General Boosting Method and its Application to Learning Ranking Functions for Web Search
8 0.10454079 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors
9 0.09108945 142 nips-2007-Non-parametric Modeling of Partially Ranked Data
10 0.077889055 41 nips-2007-COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking
11 0.070529453 38 nips-2007-Boosting Algorithms for Maximizing the Soft Margin
12 0.063092545 97 nips-2007-Hidden Common Cause Relations in Relational Learning
13 0.061902571 175 nips-2007-Semi-Supervised Multitask Learning
14 0.059136894 144 nips-2007-On Ranking in Survival Analysis: Bounds on the Concordance Index
15 0.055596489 110 nips-2007-Learning Bounds for Domain Adaptation
16 0.053733364 173 nips-2007-Second Order Bilinear Discriminant Analysis for single trial EEG analysis
17 0.053443369 27 nips-2007-Anytime Induction of Cost-sensitive Trees
18 0.051972967 32 nips-2007-Bayesian Co-Training
19 0.051585771 192 nips-2007-Testing for Homogeneity with Kernel Fisher Discriminant Analysis
20 0.046369743 129 nips-2007-Mining Internet-Scale Software Repositories
topicId topicWeight
[(0, -0.169), (1, 0.012), (2, -0.096), (3, 0.109), (4, 0.105), (5, 0.143), (6, 0.2), (7, -0.13), (8, 0.168), (9, -0.178), (10, -0.032), (11, 0.08), (12, 0.039), (13, -0.019), (14, 0.019), (15, -0.057), (16, 0.085), (17, 0.054), (18, -0.142), (19, 0.011), (20, -0.074), (21, 0.156), (22, -0.187), (23, 0.098), (24, -0.039), (25, -0.028), (26, 0.051), (27, 0.018), (28, 0.018), (29, 0.004), (30, 0.006), (31, 0.019), (32, 0.075), (33, 0.095), (34, -0.023), (35, -0.034), (36, 0.016), (37, 0.032), (38, 0.055), (39, 0.01), (40, -0.001), (41, -0.031), (42, 0.004), (43, 0.026), (44, 0.078), (45, -0.048), (46, -0.001), (47, 0.091), (48, -0.024), (49, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.96294302 39 nips-2007-Boosting the Area under the ROC Curve
Author: Phil Long, Rocco Servedio
Abstract: We show that any weak ranker that can achieve an area under the ROC curve slightly better than 1/2 (which can be achieved by random guessing) can be efficiently boosted to achieve an area under the ROC curve arbitrarily close to 1. We further show that this boosting can be performed even in the presence of independent misclassification noise, given access to a noise-tolerant weak ranker.
2 0.82358772 147 nips-2007-One-Pass Boosting
Author: Zafer Barutcuoglu, Phil Long, Rocco Servedio
Abstract: This paper studies boosting algorithms that make a single pass over a set of base classifiers. We first analyze a one-pass algorithm in the setting of boosting with diverse base classifiers. Our guarantee is the same as the best proved for any boosting algorithm, but our one-pass algorithm is much faster than previous approaches. We next exhibit a random source of examples for which a “picky” variant of AdaBoost that skips poor base classifiers can outperform the standard AdaBoost algorithm, which uses every base classifier, by an exponential factor. Experiments with Reuters and synthetic data show that one-pass boosting can substantially improve on the accuracy of Naive Bayes, and that picky boosting can sometimes lead to a further improvement in accuracy.
3 0.75410414 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers
Author: Marco Barreno, Alvaro Cardenas, J. D. Tygar
Abstract: We present a new analysis for the combination of binary classifiers. Our analysis makes use of the Neyman-Pearson lemma as a theoretical basis to analyze combinations of classifiers. We give a method for finding the optimal decision rule for a combination of classifiers and prove that it has the optimal ROC curve. We show how our method generalizes and improves previous work on combining classifiers and generating ROC curves. 1
4 0.7289862 90 nips-2007-FilterBoost: Regression and Classification on Large Datasets
Author: Joseph K. Bradley, Robert E. Schapire
Abstract: We study boosting in the filtering setting, where the booster draws examples from an oracle instead of using a fixed training set and so may train efficiently on very large datasets. Our algorithm, which is based on a logistic regression technique proposed by Collins, Schapire, & Singer, requires fewer assumptions to achieve bounds equivalent to or better than previous work. Moreover, we give the first proof that the algorithm of Collins et al. is a strong PAC learner, albeit within the filtering setting. Our proofs demonstrate the algorithm’s strong theoretical properties for both classification and conditional probability estimation, and we validate these results through extensive experiments. Empirically, our algorithm proves more robust to noise and overfitting than batch boosters in conditional probability estimation and proves competitive in classification. 1
5 0.63762671 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
Author: Ping Li, Qiang Wu, Christopher J. Burges
Abstract: We cast the ranking problem as (1) multiple classification (“Mc”) (2) multiple ordinal classification, which lead to computationally tractable learning algorithms for relevance ranking in Web search. We consider the DCG criterion (discounted cumulative gain), a standard quality measure in information retrieval. Our approach is motivated by the fact that perfect classifications result in perfect DCG scores and the DCG errors are bounded by classification errors. We propose using the Expected Relevance to convert class probabilities into ranking scores. The class probabilities are learned using a gradient boosting tree algorithm. Evaluations on large-scale datasets show that our approach can improve LambdaRank [5] and the regressions-based ranker [6], in terms of the (normalized) DCG scores. An efficient implementation of the boosting tree algorithm is also presented.
6 0.60218692 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors
7 0.55702096 38 nips-2007-Boosting Algorithms for Maximizing the Soft Margin
8 0.55050439 166 nips-2007-Regularized Boost for Semi-Supervised Learning
9 0.42657363 6 nips-2007-A General Boosting Method and its Application to Learning Ranking Functions for Web Search
10 0.40380621 41 nips-2007-COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking
11 0.39322937 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
12 0.38900879 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)
13 0.38709214 27 nips-2007-Anytime Induction of Cost-sensitive Trees
14 0.37921721 142 nips-2007-Non-parametric Modeling of Partially Ranked Data
15 0.30038804 83 nips-2007-Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks
16 0.28815791 32 nips-2007-Bayesian Co-Training
17 0.28002852 188 nips-2007-Subspace-Based Face Recognition in Analog VLSI
18 0.27991122 97 nips-2007-Hidden Common Cause Relations in Relational Learning
19 0.27867585 144 nips-2007-On Ranking in Survival Analysis: Bounds on the Concordance Index
20 0.27678052 173 nips-2007-Second Order Bilinear Discriminant Analysis for single trial EEG analysis
topicId topicWeight
[(4, 0.298), (5, 0.036), (13, 0.044), (16, 0.023), (21, 0.042), (31, 0.013), (34, 0.074), (35, 0.034), (46, 0.011), (47, 0.06), (49, 0.034), (83, 0.117), (85, 0.035), (87, 0.016), (88, 0.023), (90, 0.053)]
simIndex simValue paperId paperTitle
same-paper 1 0.74089426 39 nips-2007-Boosting the Area under the ROC Curve
Author: Phil Long, Rocco Servedio
Abstract: We show that any weak ranker that can achieve an area under the ROC curve slightly better than 1/2 (which can be achieved by random guessing) can be efficiently boosted to achieve an area under the ROC curve arbitrarily close to 1. We further show that this boosting can be performed even in the presence of independent misclassification noise, given access to a noise-tolerant weak ranker.
2 0.71698487 74 nips-2007-EEG-Based Brain-Computer Interaction: Improved Accuracy by Automatic Single-Trial Error Detection
Author: Pierre Ferrez, José Millán
Abstract: Brain-computer interfaces (BCIs), as any other interaction modality based on physiological signals and body channels (e.g., muscular activity, speech and gestures), are prone to errors in the recognition of subject’s intent. An elegant approach to improve the accuracy of BCIs consists in a verification procedure directly based on the presence of error-related potentials (ErrP) in the EEG recorded right after the occurrence of an error. Six healthy volunteer subjects with no prior BCI experience participated in a new human-robot interaction experiment where they were asked to mentally move a cursor towards a target that can be reached within a few steps using motor imagination. This experiment confirms the previously reported presence of a new kind of ErrP. These “Interaction ErrP” exhibit a first sharp negative peak followed by a positive peak and a second broader negative peak (∼290, ∼350 and ∼470 ms after the feedback, respectively). But in order to exploit these ErrP we need to detect them in each single trial using a short window following the feedback associated to the response of the classifier embedded in the BCI. We have achieved an average recognition rate of correct and erroneous single trials of 81.8% and 76.2%, respectively. Furthermore, we have achieved an average recognition rate of the subject’s intent while trying to mentally drive the cursor of 73.1%. These results show that it’s possible to simultaneously extract useful information for mental control to operate a brain-actuated device as well as cognitive states such as error potentials to improve the quality of the braincomputer interaction. Finally, using a well-known inverse model (sLORETA), we show that the main focus of activity at the occurrence of the ErrP are, as expected, in the pre-supplementary motor area and in the anterior cingulate cortex. 1
3 0.68208402 179 nips-2007-SpAM: Sparse Additive Models
Author: Han Liu, Larry Wasserman, John D. Lafferty, Pradeep K. Ravikumar
Abstract: We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties of SpAM is given together with empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models in high dimensional data. 1
4 0.49595451 44 nips-2007-Catching Up Faster in Bayesian Model Selection and Model Averaging
Author: Tim V. Erven, Steven D. Rooij, Peter Grünwald
Abstract: Bayesian model averaging, model selection and their approximations such as BIC are generally statistically consistent, but sometimes achieve slower rates of convergence than other methods such as AIC and leave-one-out cross-validation. On the other hand, these other methods can be inconsistent. We identify the catch-up phenomenon as a novel explanation for the slow convergence of Bayesian methods. Based on this analysis we define the switch-distribution, a modification of the Bayesian model averaging distribution. We prove that in many situations model selection and prediction based on the switch-distribution is both consistent and achieves optimal convergence rates, thereby resolving the AIC-BIC dilemma. The method is practical; we give an efficient algorithm. 1
5 0.48977703 41 nips-2007-COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking
Author: Markus Weimer, Alexandros Karatzoglou, Quoc V. Le, Alex J. Smola
Abstract: In this paper, we consider collaborative filtering as a ranking problem. We present a method which uses Maximum Margin Matrix Factorization and optimizes ranking instead of rating. We employ structured output prediction to optimize directly for ranking scores. Experimental results show that our method gives very good ranking scores and scales well on collaborative filtering tasks. 1
6 0.48873484 167 nips-2007-Regulator Discovery from Gene Expression Time Series of Malaria Parasites: a Hierachical Approach
7 0.48364046 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
8 0.48257756 17 nips-2007-A neural network implementing optimal state estimation based on dynamic spike train decoding
9 0.48244381 63 nips-2007-Convex Relaxations of Latent Variable Training
10 0.4762046 53 nips-2007-Compressed Regression
11 0.4754265 128 nips-2007-Message Passing for Max-weight Independent Set
12 0.4753519 147 nips-2007-One-Pass Boosting
13 0.4745509 121 nips-2007-Local Algorithms for Approximate Inference in Minor-Excluded Graphs
14 0.47325006 49 nips-2007-Colored Maximum Variance Unfolding
15 0.47305593 146 nips-2007-On higher-order perceptron algorithms
16 0.47290409 92 nips-2007-Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations
17 0.47280851 141 nips-2007-New Outer Bounds on the Marginal Polytope
18 0.47230324 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons
19 0.47224462 187 nips-2007-Structured Learning with Approximate Inference
20 0.47134107 134 nips-2007-Multi-Task Learning via Conic Programming