nips nips2010 nips2010-15 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Indraneel Mukherjee, Robert E. Schapire
Abstract: Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the “correct” requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Boosting combines weak classifiers to form highly accurate predictors. [sent-4, score-0.349]
2 Although the case of binary classification is well understood, in the multiclass setting, the “correct” requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. [sent-5, score-1.059]
3 In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements. [sent-6, score-0.431]
4 1 Introduction. Boosting [17] refers to a general technique of combining rules of thumb, or weak classifiers, to form highly accurate combined classifiers. [sent-7, score-0.381]
5 Minimal demands are placed on the weak classifiers, so that a variety of learning algorithms, also called weak-learners, can be employed to discover these simple rules, making the algorithm widely applicable. [sent-8, score-0.394]
6 The theory of boosting is well-developed for the case of binary classification. [sent-9, score-0.437]
7 In particular, the exact requirements on the weak classifiers in this setting are known: any algorithm that predicts better than random on any distribution over the training set is said to satisfy the weak learning assumption. [sent-10, score-0.773]
8 Further, boosting algorithms that minimize loss as efficiently as possible have been designed. [sent-11, score-0.439]
9 However, for such multiclass problems, a complete theoretical understanding of boosting is lacking. [sent-16, score-0.634]
10 In particular, we do not know the “correct” way to define the requirements on the weak classifiers, nor has the notion of optimal boosting been explored in the multiclass setting. [sent-17, score-1.018]
11 Straightforward extensions of the binary weak-learning condition to multiclass do not work. [sent-18, score-0.428]
12 Requiring less error than random guessing on every distribution, as in the binary case, turns out to be too weak for boosting to be possible when there are more than two labels. [sent-19, score-0.908]
13 On the other hand, requiring more than 50% accuracy even when the number of labels is much larger than two is too stringent, and simple weak classifiers like decision stumps fail to meet this criterion, even though they often can be combined to produce highly accurate classifiers [9]. [sent-20, score-0.59]
14 The purpose of a weak-learning condition is to clarify the goal of the weak-learner, thus aiding in its design, while providing a specific minimal guarantee on performance that can be exploited by a boosting algorithm. [sent-22, score-0.575]
15 These considerations may significantly impact learning and generalization because knowing the correct weak-learning conditions might allow the use of simpler weak classifiers, which in turn can help prevent overfitting. [sent-23, score-0.47]
16 Furthermore, boosting algorithms that more efficiently and effectively minimize training error may prevent underfitting, which can also be important. [sent-24, score-0.426]
17 In this paper, we create a broad and general framework for studying multiclass boosting that formalizes the interaction between the boosting algorithm and the weak-learner. [sent-25, score-1.03]
18 Unlike much, but not all, of the previous work on multiclass boosting, we focus specifically on the most natural, and perhaps weakest, case in which the weak classifiers are genuine classifiers in the sense of predicting a single multiclass label for each instance. [sent-26, score-0.849]
19 Within this formalism, we can also now finally make precise what is meant by correct weak-learning conditions that are neither too weak nor too strong. [sent-28, score-0.498]
20 We introduce a whole family of such conditions since there are many ways of randomly guessing on more than two labels, a key difference between the binary and multiclass settings. [sent-30, score-0.478]
21 Although these conditions impose seemingly mild demands on the weak-learner, we show that each one of them is powerful enough to guarantee boostability, meaning that some combination of the weak classifiers has high accuracy. [sent-31, score-0.585]
22 And while no individual member of the family is necessary for boostability, we also show that the entire family taken together is necessary in the sense that for every boostable learning problem, there exists one member of the family that is satisfied. [sent-32, score-0.368]
23 Thus, we have identified a family of conditions which, as a whole, is necessary and sufficient for multiclass boosting. [sent-33, score-0.405]
24 The weak-learning condition of the SAMME algorithm [21] is too weak in the sense that even when the condition is satisfied, no boosting algorithm can guarantee to drive down the training error. [sent-43, score-0.988]
25 Employing proper weak-learning conditions is important, but we also need boosting algorithms that can exploit these conditions to effectively drive down error. [sent-47, score-0.61]
26 For a given weak-learning condition, the boosting algorithm that drives down training error most efficiently in our framework can be understood as the optimal strategy for playing a certain two-player game. [sent-48, score-0.471]
27 However, using the powerful machinery of drifting games [8, 16], we are able to compute the optimal strategy for the games arising out of each weak-learning condition in the family described above. [sent-50, score-0.495]
28 Such results can be used in turn to derive bounds on the generalization error using standard techniques that have been applied to other boosting algorithms [18, 11, 13]. [sent-53, score-0.426]
29 The game-theoretic strategies are non-adaptive in that they presume prior knowledge about the edge, that is, how much better than random the weak classifiers are. [sent-55, score-0.349]
30 We show therefore how to derive an adaptive boosting algorithm by modifying one of the game-theoretic strategies. [sent-57, score-0.438]
31 We present experiments aimed at testing the efficacy of the new methods when working with a very weak weak-learner to check that the conditions we have identified are indeed weaker than others that had previously been used. [sent-58, score-0.436]
32 We find that our new adaptive strategy achieves low test error compared to other multiclass boosting algorithms which usually heavily underfit. [sent-59, score-0.751]
33 This validates the potential practical benefit of a better theoretical understanding of multiclass boosting. [sent-60, score-0.271]
34 The first boosting algorithms were given by Schapire [15] and Freund [6], followed by their AdaBoost algorithm [11]. [sent-62, score-0.396]
35 There are also more general approaches that can be applied to boosting including [2, 3, 4, 12]. [sent-69, score-0.396]
36 The first one [10, 14] views the weak-learning condition as a minimax game, while drifting games [16, 6] were designed to analyze the most efficient boosting algorithms. [sent-71, score-0.714]
37 These games have been further analyzed in the multiclass and continuous time setting in [8]. [sent-72, score-0.317]
38 In multiclass classification, we want to predict the labels of examples lying in some set X. [sent-86, score-0.281]
39 Boosting combines several mildly powerful predictors, called weak classifiers, to form a highly accurate combined classifier, and has been previously applied for multiclass classification. [sent-95, score-0.646]
40 In this paper, we only allow weak classifiers that predict a single class for each example. [sent-96, score-0.349]
41 With binary labels, Booster outputs a distribution in each round, and Weak-Learner returns a weak classifier achieving more than 50% accuracy on that distribution. [sent-100, score-0.39]
42 The multiclass game is an extension of the binary game. [sent-101, score-0.335]
43 (2) Weak-Learner returns some weak classifier ht : X → {1, . . . , k} from a fixed space ht ∈ H so that the cost incurred, Ct • 1ht = Σ_{i=1}^m Ct(i, ht(xi)), [sent-104, score-0.427]
44 is “small enough”, according to some conditions discussed below. [sent-107, score-0.295]
45 (3) Booster computes a weight αt for the current weak classifier based on how much cost was incurred in this round. [sent-109, score-0.401]
46 The restrictions on cost-matrices created by Booster, and the maximum cost Weak-Learner can suffer in each round, together define the weak-learning condition being used. [sent-115, score-0.24]
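The three-step round structure above can be made concrete with a short sketch of the Booster/Weak-Learner loop. This is an illustration of the game's data flow only; the helpers choose_cost_matrix, weak_learner, and choose_weight are hypothetical stand-ins, not functions defined in the paper.

```python
import numpy as np

def boost(X, y, k, T, choose_cost_matrix, weak_learner, choose_weight):
    """Skeleton of the multiclass boosting game (a sketch, not the paper's algorithm).

    X: list of m examples; y: array of m labels in {0, ..., k-1}.
    choose_cost_matrix(votes, y, t): Booster's move, an (m, k) cost matrix.
    weak_learner(C): returns a classifier h mapping an example to a label,
                     trying to keep the cost C . 1_h small.
    choose_weight(C, h, t): Booster's weight alpha_t for the returned classifier.
    """
    m = len(X)
    votes = np.zeros((m, k))      # votes received so far, one row per example
    ensemble = []                 # list of (alpha_t, h_t)
    for t in range(T):
        C = choose_cost_matrix(votes, y, t)          # (1) Booster creates a cost matrix
        h = weak_learner(C)                          # (2) Weak-Learner returns a classifier
        preds = np.array([h(x) for x in X])
        cost = C[np.arange(m), preds].sum()          # C . 1_h = sum_i C(i, h(x_i))
        alpha = choose_weight(C, h, t)               # (3) Booster weights the classifier
        votes[np.arange(m), preds] += alpha          # update the vote counts
        ensemble.append((alpha, h))

    def predict(x):
        # final prediction: weighted plurality vote of the ensemble
        scores = np.zeros(k)
        for alpha, h in ensemble:
            scores[h(x)] += alpha
        return int(np.argmax(scores))
    return predict
```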
47 Given any non-negative weights w(1), . . . , w(m) on the training set, the error of the weak classifier returned is at most (1/2 − γ/2) Σ_i w(i). [sent-119, score-0.425]
48 Then, Weak-Learner searching space H satisfies the binary weak-learning condition if: ∀C ∈ C^bin, ∃h ∈ H : C • (1h − U^bin_γ) ≤ 0. [sent-128, score-0.241]
49 More importantly, by varying the restrictions C^bin on the cost vectors and the matrix U^bin_γ, we can generate a vast variety of weak-learning conditions for the multiclass setting k ≥ 2, as we now show. [sent-131, score-0.467]
50 Let C ⊆ R^{m×k} and matrix B ∈ R^{m×k}, which we call the baseline; we say a weak classifier space H satisfies the condition (C, B) if ∀C ∈ C, ∃h ∈ H : C • (1h − B) ≤ 0, i.e., Σ_{i=1}^m c(i, h(xi)) ≤ Σ_{i=1}^m ⟨c(i), B(i)⟩. [sent-132, score-0.498]
51 The condition therefore states that a weak classifier should not exceed the average cost when weighted according to baseline B. [sent-136, score-0.602]
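For a finite classifier space and a finite (e.g., sampled) family of cost matrices standing in for C, the condition (C, B) can be checked directly from its definition C • (1h − B) ≤ 0; a minimal sketch, assuming such finite representations:

```python
import numpy as np

def satisfies_condition(H_preds, cost_matrices, B):
    """Check that for every cost matrix some h in H achieves C . (1_h - B) <= 0.

    H_preds: (num_classifiers, m) array of each classifier's predicted labels.
    cost_matrices: iterable of (m, k) arrays standing in for the family C.
    B: (m, k) baseline matrix.
    """
    m = H_preds.shape[1]
    rows = np.arange(m)
    for C in cost_matrices:
        # cost of each classifier: sum_i C(i, h(x_i)); baseline cost: C . B
        costs = np.array([C[rows, preds].sum() for preds in H_preds])
        if not np.any(costs <= (C * B).sum()):
            return False   # this cost matrix defeats every classifier in H
    return True
```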
52 By studying this vast class of weak-learning conditions, we hope to find the one that will serve the main purpose of the boosting game: finding a convex combination of weak classifiers that has zero training error. [sent-141, score-0.745]
53 For this to be possible, at the minimum the weak classifiers should be sufficiently rich for such a perfect combination to exist. [sent-142, score-0.349]
54 Formally, a collection H of weak classifiers is eligible for boosting, or simply boostable, if there exists a distribution λ on this space that linearly separates the data: ∀i : argmax_{l∈{1,...,k}} Σ_{h∈H} λ(h) 1[h(xi) = l] = yi. [sent-143, score-0.387]
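For a finite weak classifier space, boostability can be tested by casting the separability requirement as a linear program over the simplex of distributions λ. The sketch below uses scipy; this particular LP formulation (maximizing a separation margin θ) is an assumption of mine rather than the paper's construction.

```python
import numpy as np
from scipy.optimize import linprog

def is_boostable(H_preds, y, k, tol=1e-9):
    """Check whether some distribution lambda over the classifiers linearly separates the data,
    i.e. the weighted plurality vote puts strictly more weight on y_i than on any other label.

    H_preds: (n, m) predicted labels of n classifiers on m examples; y: (m,) true labels.
    LP: max theta  s.t.  sum_h lam_h (1[h(x_i)=l] - 1[h(x_i)=y_i]) + theta <= 0  for all i, l != y_i,
                         sum_h lam_h = 1,  lam >= 0.
    Returns True iff the optimal theta is positive.
    """
    n, m = H_preds.shape
    A_ub, b_ub = [], []
    for i in range(m):
        for l in range(k):
            if l == y[i]:
                continue
            row = (H_preds[:, i] == l).astype(float) - (H_preds[:, i] == y[i]).astype(float)
            A_ub.append(np.append(row, 1.0))   # ... + theta <= 0
            b_ub.append(0.0)
    A_eq = [np.append(np.ones(n), 0.0)]        # lambda sums to one
    res = linprog(c=np.append(np.zeros(n), -1.0),   # minimize -theta, i.e. maximize theta
                  A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array([1.0]),
                  bounds=[(0, None)] * n + [(None, None)],
                  method="highs")
    return bool(res.success and -res.fun > tol)
```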
55 Ideally, the second factor will not cause the weak-learning condition to impose additional restrictions on the weak classifiers; in that case, the weak-learning condition is merely a reformulation of being boostable that is more appropriate for deriving an algorithm. [sent-149, score-0.819]
56 In the next section we will describe conditions captured by our framework that avoid being too weak or too strong. [sent-157, score-0.436]
57 3 Necessary and sufficient weak-learning conditions. The binary weak-learning condition has an appealing form: for any distribution over the examples, the weak classifier needs to achieve error not greater than that of a random player who guesses the correct answer with probability 1/2 + γ. [sent-158, score-0.748]
58 Further, this is the weakest condition under which boosting is possible, as follows from a game-theoretic perspective [10, 14]. [sent-159, score-0.578]
59 In the multiclass setting, we model a random player as a baseline predictor B ∈ R^{m×k} whose rows are distributions over the labels, B(i) ∈ ∆{1, . . . , k}. [sent-162, score-0.348]
60 The prediction on example i is a sample from B(i). [sent-166, score-0.418]
61 We only consider the space of edge-over-random baselines B^eor_γ ⊆ R^{m×k}, which have a faint clue about the correct answer. [sent-167, score-0.452]
62 When k = 2, the space B^eor_γ consists of the unique player U^bin_γ, and the binary weak-learning condition is given by (C^bin, U^bin_γ). [sent-169, score-0.717]
63 In particular, define C^eor to be the multiclass extension of C^bin: any cost-matrix in C^eor should put the least cost on the correct label. [sent-171, score-1.211]
64 Then, for every baseline B ∈ B^eor_γ, we introduce the condition (C^eor, B), which we call an edge-over-random weak-learning condition. [sent-174, score-1.037]
65 Since C • B is the expected cost of the edge-over-random baseline B on matrix C, the constraints (2) imposed by the new condition essentially require better than random performance. [sent-175, score-0.253]
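A small sketch of one edge-over-random baseline and one cost matrix in C^eor, to make the quantities C • B and C • 1h concrete. The specific parameterization of U_γ below, mixing the uniform distribution with an extra mass γ on the correct label, is my reading of the text rather than a quoted definition.

```python
import numpy as np

def edge_over_random_baseline(y, k, gamma):
    """Sketch of U_gamma: each row mixes the uniform distribution over k labels with an extra
    point mass gamma on the correct label, giving the correct label exactly a gamma edge."""
    m = len(y)
    B = np.full((m, k), (1.0 - gamma) / k)
    B[np.arange(m), y] += gamma
    return B

def example_cost_matrix(y, k):
    """A member of C^eor: unit cost on every incorrect label, zero cost on the correct one."""
    m = len(y)
    C = np.ones((m, k))
    C[np.arange(m), y] = 0.0
    return C

# expected cost C . B of the baseline on this cost matrix
y = np.array([0, 1, 2, 1])
B = edge_over_random_baseline(y, k=3, gamma=0.1)
C = example_cost_matrix(y, k=3)
print((C * B).sum())   # = m * (k-1) * (1-gamma) / k, better than uniform random guessing
```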
66 The seemingly mild edge-over-random conditions guarantee eligibility, meaning weak classifiers that satisfy any one such condition can be combined to form a highly accurate combined classifier. [sent-177, score-0.766]
67 If a weak classifier space H satisfies a weak-learning condition (C^eor, B) for some B ∈ B^eor_γ, then H is boostable. [sent-179, score-1.334]
68 On the other hand the family of such conditions, taken as a whole, is necessary for boostability in the sense that every eligible space of weak classifiers satisfies some edge-over-random condition. [sent-181, score-0.548]
69 For every boostable weak classifier space H, there exists a γ > 0 and B ∈ B^eor_γ such that H satisfies the weak-learning condition (C^eor, B). [sent-183, score-1.467]
70 Theorem 2 states that any boostable weak classifier space will satisfy some condition in our family, but it does not help us choose the right condition. [sent-185, score-0.671]
71 Experiments in Section 5 suggest (C^eor, U_γ) is effective with very simple weak-learners compared to popular boosting algorithms. [sent-186, score-1.232]
72 A perhaps extreme way of weakening the condition is by requiring the performance on a cost matrix to be competitive not with a fixed baseline B ∈ B^eor_γ, but with the worst of them: ∀C ∈ C^eor, ∃h ∈ H : C • 1h ≤ max_{B ∈ B^eor_γ} C • B. (3) [sent-189, score-1.115]
73 Condition (3) states that during the course of the same boosting game, Weak-Learner may choose to beat any edge-over-random baseline B ∈ B^eor_γ, possibly a different one for every round and every cost-matrix. [sent-190, score-1.405]
74 In other words, according to our criterion, it is neither too weak nor too strong as a weak-learning condition. [sent-193, score-0.377]
75 Further, the MR condition, and hence (3), can be shown to be neither too weak nor too strong. [sent-198, score-0.377]
76 The SAMME algorithm of [21] requires the weak classifiers to achieve less error than uniform random guessing for multiple labels; in our language, their weak-learning condition is (C = {(−t, t, t, . . . [sent-205, score-0.627]
77 As is well-known, this condition is not sufficient for boosting to be possible. [sent-209, score-0.545]
78 In particular, consider the dataset {(a, 1), (b, 2)} with k = 3, m = 2, and a weak classifier space consisting of h1 , h2 which always predict 1, 2, respectively. [sent-210, score-0.349]
79 In particular, when the cost matrix is C = (c(1) = (−1, +1, 0), c(2) = (+1, −1, 0)) ∈ C^eor, both classifiers in the above example suffer more loss than the random player U_γ, and fail to satisfy our condition. [sent-214, score-0.644]
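This counterexample can be checked numerically; the sketch below uses the same assumed U_γ parameterization as earlier, and hypothetical helper names.

```python
import numpy as np

# Two examples, three labels: h1 always predicts label 1, h2 always predicts label 2
# (labels written 1..3 to match the text).  With the cost matrix below, both classifiers
# incur cost 0, while the edge-over-random player U_gamma incurs cost -2*gamma, so neither
# h1 nor h2 meets the edge-over-random condition.
y = np.array([1, 2])                       # correct labels of examples a and b
C = np.array([[-1.0, +1.0, 0.0],           # c(1) = (-1, +1, 0)
              [+1.0, -1.0, 0.0]])          # c(2) = (+1, -1, 0)

def classifier_cost(pred_label):
    # cost of a constant classifier that always predicts pred_label
    return sum(C[i, pred_label - 1] for i in range(2))

gamma = 0.1
U = np.full((2, 3), (1.0 - gamma) / 3)     # assumed U_gamma parameterization
U[np.arange(2), y - 1] += gamma

print(classifier_cost(1), classifier_cost(2))   # 0.0 and 0.0
print((C * U).sum())                            # -2*gamma = -0.2
```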
80 AdaBoost.MH is a popular multiclass boosting algorithm that is based on the one-against-all reduction [19]. [sent-218, score-0.52]
81 However, we show that its implicit demands on the weak classifier space are too strong. [sent-219, score-0.394]
82 We construct a classifier space that satisfies the condition (C^eor, U_γ) in our family, but cannot satisfy the implicit demands of AdaBoost.MH. [sent-220, score-0.607]
83 4 Algorithms. In this section we devise algorithms by analyzing the boosting games that employ our edge-over-random weak-learning conditions. [sent-233, score-0.475]
84 We compute the optimum Booster strategy against a completely adversarial Weak-Learner, which here is permitted to choose weak classifiers without restriction. [sent-234, score-0.394]
85 Our algorithms are derived from the very general drifting games framework [16] for solving boosting games, in turn inspired by Freund’s Boost-by-majority algorithm [6], which we review next. [sent-239, score-0.54]
86 Fix the number of rounds T and an edge-over-random weak-learning condition (C, B). [sent-241, score-0.256]
87 These potential functions compute an estimate φ^b_t(s_t) of whether an example x will be misclassified, based on its current state s_t, consisting of counts of votes received so far on the various classes, s_t(l) = Σ_{t′=1}^{t−1} 1[h_{t′}(x) = l], and the number of rounds remaining. [sent-260, score-0.244]
88 The theorem implies that the OS strategy chooses the following cost matrix in round t: c(i, l) = φ^{b(i)}_{T−t−1}(s_t(i) + e_l), where s_t(i) is the state of example i in round t. [sent-279, score-0.405]
89 In particular, when the condition is (C^eor, U_γ) and η = (η, η, . . . [sent-288, score-0.567]
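The OS cost-matrix rule c(i, l) = φ^{b(i)}_{T−t−1}(s_t(i) + e_l) translates directly into code once the potential function is available. In the sketch below the potential is an assumed callable phi(state, rounds_left); its recursive definition is not reproduced here.

```python
import numpy as np

def os_cost_matrix(votes, t, T, phi):
    """Cost matrix of the OS strategy at round t (a sketch).

    votes: (m, k) array; votes[i] is the state s_t(i), the vote counts of example i so far.
    phi:   assumed callable phi(state, rounds_left) estimating the chance the example ends up
           misclassified if boosting continues for `rounds_left` more rounds from `state`.
    Returns C with C[i, l] = phi(s_t(i) + e_l, T - t - 1).
    """
    m, k = votes.shape
    C = np.empty((m, k))
    for i in range(m):
        for l in range(k):
            state = votes[i].copy()
            state[l] += 1                      # hypothetically give one more vote to label l
            C[i, l] = phi(state, T - t - 1)    # potential of the resulting state
    return C
```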
90 So far we have required Weak-Learner to beat random by at least a fixed amount γ > 0 in each round of the boosting game. [sent-295, score-0.517]
91 If the fixed edge is too small, not enough progress is made in the initial rounds, and if the edge is too large, Weak-Learner fails to meet the weak-learning condition in later rounds. [sent-298, score-0.277]
92 In either case, we only use the edge-over-random condition (C^eor, U_γ), but with varying values of γ. [sent-303, score-0.567]
93 With edges γ1, . . . , γT, the weak-learning condition (C^eor, U_{γt}) in each round t is different. [sent-308, score-0.662]
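One rough way to set the per-round edge, sketched below, is to measure how much the returned classifier actually beat uniform random guessing on the current cost matrix and use that observed gap as γ_t; this estimator is an illustration of the idea of varying γ, not the paper's update rule.

```python
import numpy as np

def measured_edge(C, preds, y, k):
    """Estimate the edge gamma_t of a returned classifier on cost matrix C (a sketch).

    The classifier's cost C . 1_h is compared with the cost of guessing uniformly at random;
    the gap, normalized by the overall cost scale, is taken as the observed edge this round.
    """
    m = C.shape[0]
    h_cost = C[np.arange(m), preds].sum()      # C . 1_h
    uniform_cost = C.mean(axis=1).sum()        # cost of uniform random guessing
    scale = np.abs(C).sum() + 1e-12
    return max(0.0, (uniform_cost - h_cost) / scale)
```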
94 Figure 1: Figure 1(a) plots the final test-errors of M1 (black, dashed), MH (blue, dotted) and New method (red, solid) against the maximum tree-sizes allowed as weak classifiers. [sent-364, score-0.349]
95 (Figure 1 legend: M1, New Method.) MH refers to the Boostexter implementation of AdaBoost.MH. [sent-380, score-0.349]
96 Test errors after 500 rounds of boosting are plotted in Figure 2. [sent-383, score-0.503]
97 (Figure 2 panels: connect4, forest, letter, pendigits, poker, satimage.) New method after 500 rounds of boosting. [sent-386, score-0.339]
98 As predicted by our theory, our algorithm succeeds in boosting the accuracy even when the tree size is too small to meet the stronger weak learning assumptions of the other algorithms. [sent-389, score-0.82]
99 But since boosting keeps creating harder cost-matrices, very soon the small-tree learning algorithms are no longer able to meet the excessive requirements of M1 and MH. [sent-393, score-0.481]
100 However, our algorithm makes more reasonable demands that are easily met by the weak learner. [sent-394, score-0.394]
wordName wordTfidf (topN-words)
[('eor', 0.418), ('boosting', 0.396), ('weak', 0.349), ('booster', 0.266), ('multiclass', 0.238), ('ft', 0.155), ('condition', 0.149), ('boostable', 0.133), ('classi', 0.121), ('ers', 0.113), ('mh', 0.112), ('os', 0.109), ('rounds', 0.107), ('eft', 0.1), ('er', 0.099), ('round', 0.095), ('robert', 0.095), ('ubin', 0.095), ('yoav', 0.091), ('conditions', 0.087), ('schapire', 0.085), ('games', 0.079), ('ht', 0.078), ('freund', 0.071), ('mr', 0.068), ('el', 0.066), ('drifting', 0.065), ('guessing', 0.061), ('adaboost', 0.06), ('player', 0.058), ('bmh', 0.057), ('bmr', 0.057), ('boostability', 0.057), ('poker', 0.057), ('satimage', 0.057), ('stumps', 0.057), ('game', 0.056), ('cost', 0.052), ('ct', 0.052), ('st', 0.052), ('baseline', 0.052), ('family', 0.051), ('bin', 0.051), ('pendigits', 0.05), ('meet', 0.05), ('rk', 0.05), ('returned', 0.046), ('demands', 0.045), ('rm', 0.045), ('strategy', 0.045), ('supplement', 0.044), ('labels', 0.043), ('loss', 0.043), ('adaptive', 0.042), ('binary', 0.041), ('satisfy', 0.04), ('drive', 0.04), ('edge', 0.039), ('restrictions', 0.039), ('sl', 0.039), ('boostexter', 0.038), ('eligible', 0.038), ('samme', 0.038), ('weaklearner', 0.038), ('weaklearning', 0.038), ('yoram', 0.037), ('satis', 0.036), ('requirements', 0.035), ('letter', 0.035), ('correct', 0.034), ('fail', 0.033), ('weakest', 0.033), ('potential', 0.033), ('forest', 0.033), ('combined', 0.032), ('incorrect', 0.032), ('turns', 0.031), ('error', 0.03), ('guarantee', 0.03), ('necessary', 0.029), ('neither', 0.028), ('powerful', 0.027), ('manfred', 0.027), ('theorem', 0.027), ('yi', 0.027), ('misclassi', 0.026), ('beat', 0.026), ('reductions', 0.026), ('suffered', 0.026), ('requiring', 0.026), ('minimax', 0.025), ('aka', 0.025), ('bl', 0.025), ('payoff', 0.025), ('tree', 0.025), ('mild', 0.024), ('sense', 0.024), ('seemingly', 0.023), ('trevor', 0.023), ('hall', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 15 nips-2010-A Theory of Multiclass Boosting
Author: Indraneel Mukherjee, Robert E. Schapire
Abstract: Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the “correct” requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements. 1
2 0.19663693 282 nips-2010-Variable margin losses for classifier design
Author: Hamed Masnadi-shirazi, Nuno Vasconcelos
Abstract: The problem of controlling the margin of a classifier is studied. A detailed analytical study is presented on how properties of the classification risk, such as its optimal link and minimum risk functions, are related to the shape of the loss, and its margin enforcing properties. It is shown that for a class of risks, denoted canonical risks, asymptotic Bayes consistency is compatible with simple analytical relationships between these functions. These enable a precise characterization of the loss for a popular class of link functions. It is shown that, when the risk is in canonical form and the link is inverse sigmoidal, the margin properties of the loss are determined by a single parameter. Novel families of Bayes consistent loss functions, of variable margin, are derived. These families are then used to design boosting style algorithms with explicit control of the classification margin. The new algorithms generalize well established approaches, such as LogitBoost. Experimental results show that the proposed variable margin losses outperform the fixed margin counterparts used by existing algorithms. Finally, it is shown that best performance can be achieved by cross-validating the margin parameter. 1
3 0.12593226 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
Author: Leonidas Lefakis, Francois Fleuret
Abstract: The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. We introduce a novel algorithm to construct jointly the classifiers of such a cascade, which interprets the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. From this noisy-AND model, we derive a consistent loss and a Boosting procedure to optimize that global probability on the training set. Such a joint learning allows the individual predictors to focus on a more restricted modeling problem, and improves the performance compared to a standard cascade. We demonstrate the efficiency of this approach on face and pedestrian detection with standard data-sets and comparisons with reference baselines. 1
4 0.1171642 42 nips-2010-Boosting Classifier Cascades
Author: Nuno Vasconcelos, Mohammad J. Saberian
Abstract: The problem of optimal and automatic design of a detector cascade is considered. A novel mathematical model is introduced for a cascaded detector. This model is analytically tractable, leads to recursive computation, and accounts for both classification and complexity. A boosting algorithm, FCBoost, is proposed for fully automated cascade design. It exploits the new cascade model, minimizes a Lagrangian cost that accounts for both classification risk and complexity. It searches the space of cascade configurations to automatically determine the optimal number of stages and their predictors, and is compatible with bootstrapping of negative examples and cost sensitive learning. Experiments show that the resulting cascades have state-of-the-art performance in various computer vision problems. 1
5 0.11661258 182 nips-2010-New Adaptive Algorithms for Online Classification
Author: Francesco Orabona, Koby Crammer
Abstract: We propose a general framework to online learning for classification problems with time-varying potential functions in the adversarial setting. This framework allows to design and prove relative mistake bounds for any generic loss function. The mistake bounds can be specialized for the hinge loss, allowing to recover and improve the bounds of known online classification algorithms. By optimizing the general bound we derive a new online classification algorithm, called NAROW, that hybridly uses adaptive- and fixed- second order information. We analyze the properties of the algorithm and illustrate its performance using synthetic dataset. 1
6 0.11371101 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
7 0.11008801 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
8 0.1100354 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
9 0.089565597 226 nips-2010-Repeated Games against Budgeted Adversaries
10 0.087433547 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
11 0.087097079 192 nips-2010-Online Classification with Specificity Constraints
12 0.085422225 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
13 0.08074636 39 nips-2010-Bayesian Action-Graph Games
14 0.079983674 162 nips-2010-Link Discovery using Graph Feature Tracking
15 0.078545868 129 nips-2010-Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks
16 0.071813703 225 nips-2010-Relaxed Clipping: A Global Training Method for Robust Regression and Classification
17 0.066921093 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
18 0.065136798 228 nips-2010-Reverse Multi-Label Learning
19 0.063049972 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
20 0.060300693 151 nips-2010-Learning from Candidate Labeling Sets
topicId topicWeight
[(0, 0.185), (1, 0.028), (2, 0.081), (3, -0.08), (4, 0.035), (5, 0.084), (6, -0.156), (7, -0.052), (8, -0.027), (9, 0.066), (10, -0.031), (11, 0.045), (12, 0.076), (13, 0.094), (14, 0.01), (15, 0.066), (16, 0.067), (17, 0.092), (18, -0.164), (19, 0.035), (20, 0.018), (21, 0.042), (22, -0.002), (23, -0.03), (24, -0.044), (25, 0.023), (26, 0.026), (27, 0.158), (28, -0.069), (29, 0.092), (30, 0.054), (31, 0.073), (32, -0.005), (33, -0.012), (34, 0.077), (35, 0.022), (36, 0.004), (37, 0.14), (38, 0.074), (39, -0.055), (40, 0.044), (41, 0.001), (42, 0.022), (43, 0.068), (44, -0.067), (45, 0.001), (46, 0.092), (47, 0.033), (48, -0.024), (49, -0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.93669045 15 nips-2010-A Theory of Multiclass Boosting
Author: Indraneel Mukherjee, Robert E. Schapire
Abstract: Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the “correct” requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements. 1
2 0.64233619 282 nips-2010-Variable margin losses for classifier design
Author: Hamed Masnadi-shirazi, Nuno Vasconcelos
Abstract: The problem of controlling the margin of a classifier is studied. A detailed analytical study is presented on how properties of the classification risk, such as its optimal link and minimum risk functions, are related to the shape of the loss, and its margin enforcing properties. It is shown that for a class of risks, denoted canonical risks, asymptotic Bayes consistency is compatible with simple analytical relationships between these functions. These enable a precise characterization of the loss for a popular class of link functions. It is shown that, when the risk is in canonical form and the link is inverse sigmoidal, the margin properties of the loss are determined by a single parameter. Novel families of Bayes consistent loss functions, of variable margin, are derived. These families are then used to design boosting style algorithms with explicit control of the classification margin. The new algorithms generalize well established approaches, such as LogitBoost. Experimental results show that the proposed variable margin losses outperform the fixed margin counterparts used by existing algorithms. Finally, it is shown that best performance can be achieved by cross-validating the margin parameter. 1
3 0.64218366 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
Author: Leonidas Lefakis, Francois Fleuret
Abstract: The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. We introduce a novel algorithm to construct jointly the classifiers of such a cascade, which interprets the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. From this noisy-AND model, we derive a consistent loss and a Boosting procedure to optimize that global probability on the training set. Such a joint learning allows the individual predictors to focus on a more restricted modeling problem, and improves the performance compared to a standard cascade. We demonstrate the efficiency of this approach on face and pedestrian detection with standard data-sets and comparisons with reference baselines. 1
4 0.62604117 42 nips-2010-Boosting Classifier Cascades
Author: Nuno Vasconcelos, Mohammad J. Saberian
Abstract: The problem of optimal and automatic design of a detector cascade is considered. A novel mathematical model is introduced for a cascaded detector. This model is analytically tractable, leads to recursive computation, and accounts for both classification and complexity. A boosting algorithm, FCBoost, is proposed for fully automated cascade design. It exploits the new cascade model, minimizes a Lagrangian cost that accounts for both classification risk and complexity. It searches the space of cascade configurations to automatically determine the optimal number of stages and their predictors, and is compatible with bootstrapping of negative examples and cost sensitive learning. Experiments show that the resulting cascades have state-of-the-art performance in various computer vision problems. 1
5 0.5501222 226 nips-2010-Repeated Games against Budgeted Adversaries
Author: Jacob D. Abernethy, Manfred K. Warmuth
Abstract: We study repeated zero-sum games against an adversary on a budget. Given that an adversary has some constraint on the sequence of actions that he plays, we consider what ought to be the player’s best mixed strategy with knowledge of this budget. We show that, for a general class of normal-form games, the minimax strategy is indeed efficiently computable and relies on a “random playout” technique. We give three diverse applications of this new algorithmic template: a cost-sensitive “Hedge” setting, a particular problem in Metrical Task Systems, and the design of combinatorial prediction markets. 1
6 0.51093632 228 nips-2010-Reverse Multi-Label Learning
7 0.50847876 192 nips-2010-Online Classification with Specificity Constraints
8 0.49387696 236 nips-2010-Semi-Supervised Learning with Adversarially Missing Label Information
9 0.48715389 274 nips-2010-Trading off Mistakes and Don't-Know Predictions
10 0.48703119 182 nips-2010-New Adaptive Algorithms for Online Classification
11 0.48308724 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
12 0.48229069 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
13 0.48195198 225 nips-2010-Relaxed Clipping: A Global Training Method for Robust Regression and Classification
14 0.47715241 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
15 0.46589115 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification
16 0.43863073 290 nips-2010-t-logistic regression
17 0.42011216 2 nips-2010-A Bayesian Approach to Concept Drift
18 0.40909952 39 nips-2010-Bayesian Action-Graph Games
19 0.40381104 269 nips-2010-Throttling Poisson Processes
20 0.40278432 243 nips-2010-Smoothness, Low Noise and Fast Rates
topicId topicWeight
[(13, 0.041), (27, 0.047), (30, 0.057), (35, 0.012), (45, 0.135), (50, 0.037), (52, 0.043), (60, 0.022), (77, 0.455), (78, 0.017), (90, 0.052)]
simIndex simValue paperId paperTitle
1 0.95213169 34 nips-2010-Attractor Dynamics with Synaptic Depression
Author: K. Wong, He Wang, Si Wu, Chi Fung
Abstract: Neuronal connection weights exhibit short-term depression (STD). The present study investigates the impact of STD on the dynamics of a continuous attractor neural network (CANN) and its potential roles in neural information processing. We find that the network with STD can generate both static and traveling bumps, and STD enhances the performance of the network in tracking external inputs. In particular, we find that STD endows the network with slow-decaying plateau behaviors, namely, the network being initially stimulated to an active state will decay to silence very slowly in the time scale of STD rather than that of neural signaling. We argue that this provides a mechanism for neural systems to hold short-term memory easily and shut off persistent activities naturally.
2 0.9347102 16 nips-2010-A VLSI Implementation of the Adaptive Exponential Integrate-and-Fire Neuron Model
Author: Sebastian Millner, Andreas Grübl, Karlheinz Meier, Johannes Schemmel, Marc-olivier Schwartz
Abstract: We describe an accelerated hardware neuron being capable of emulating the adaptive exponential integrate-and-fire neuron model. Firing patterns of the membrane stimulated by a step current are analyzed in transistor level simulations and in silicon on a prototype chip. The neuron is destined to be the hardware neuron of a highly integrated wafer-scale system reaching out for new computational paradigms and opening new experimentation possibilities. As the neuron is dedicated as a universal device for neuroscientific experiments, the focus lays on parameterizability and reproduction of the analytical model. 1
3 0.87395281 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations
Author: Hairong Liu, Longin J. Latecki, Shuicheng Yan
Abstract: In this paper, we regard clustering as ensembles of k-ary affinity relations and clusters correspond to subsets of objects with maximal average affinity relations. The average affinity relation of a cluster is relaxed and well approximated by a constrained homogenous function. We present an efficient procedure to solve this optimization problem, and show that the underlying clusters can be robustly revealed by using priors systematically constructed from the data. Our method can automatically select some points to form clusters, leaving other points un-grouped; thus it is inherently robust to large numbers of outliers, which has seriously limited the applicability of classical methods. Our method also provides a unified solution to clustering from k-ary affinity relations with k ≥ 2, that is, it applies to both graph-based and hypergraph-based clustering problems. Both theoretical analysis and experimental results show the superiority of our method over classical solutions to the clustering problem, especially when there exists a large number of outliers.
same-paper 4 0.86801922 15 nips-2010-A Theory of Multiclass Boosting
Author: Indraneel Mukherjee, Robert E. Schapire
Abstract: Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the “correct” requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements. 1
5 0.79104787 142 nips-2010-Learning Bounds for Importance Weighting
Author: Corinna Cortes, Yishay Mansour, Mehryar Mohri
Abstract: This paper presents an analysis of importance weighting for learning from finite samples and gives a series of theoretical and algorithmic results. We point out simple cases where importance weighting can fail, which suggests the need for an analysis of the properties of this technique. We then give both upper and lower bounds for generalization with bounded importance weights and, more significantly, give learning guarantees for the more common case of unbounded importance weights under the weak assumption that the second moment is bounded, a condition related to the R´ nyi divergence of the training and test distributions. e These results are based on a series of novel and general bounds we derive for unbounded loss functions, which are of independent interest. We use these bounds to guide the definition of an alternative reweighting algorithm and report the results of experiments demonstrating its benefits. Finally, we analyze the properties of normalized importance weights which are also commonly used.
6 0.74125618 234 nips-2010-Segmentation as Maximum-Weight Independent Set
7 0.68149096 68 nips-2010-Effects of Synaptic Weight Diffusion on Learning in Decision Making Networks
8 0.63469285 10 nips-2010-A Novel Kernel for Learning a Neuron Model from Spike Train Data
9 0.63369143 252 nips-2010-SpikeAnts, a spiking neuron network modelling the emergence of organization in a complex system
10 0.61291778 253 nips-2010-Spike timing-dependent plasticity as dynamic filter
11 0.60183936 244 nips-2010-Sodium entry efficiency during action potentials: A novel single-parameter family of Hodgkin-Huxley models
12 0.59948331 8 nips-2010-A Log-Domain Implementation of the Diffusion Network in Very Large Scale Integration
13 0.59801227 127 nips-2010-Inferring Stimulus Selectivity from the Spatial Structure of Neural Network Dynamics
14 0.59587944 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts
15 0.57803863 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
16 0.5532729 117 nips-2010-Identifying graph-structured activation patterns in networks
17 0.52780122 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
18 0.52775049 115 nips-2010-Identifying Dendritic Processing
19 0.52272218 30 nips-2010-An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA
20 0.5178256 96 nips-2010-Fractionally Predictive Spiking Neurons