nips nips2007 nips2007-38 knowledge-graph by maker-knowledge-mining

38 nips-2007-Boosting Algorithms for Maximizing the Soft Margin


Source: pdf

Author: Gunnar Rätsch, Manfred K. Warmuth, Karen A. Glocer

Abstract: We present a novel boosting algorithm, called SoftBoost, designed for sets of binary labeled examples that are not necessarily separable by convex combinations of base hypotheses. Our algorithm achieves robustness by capping the distributions on the examples. Our update of the distribution is motivated by minimizing a relative entropy subject to the capping constraints and constraints on the edges of the obtained base hypotheses. The capping constraints imply a soft margin in the dual optimization problem. Our algorithm produces a convex combination of hypotheses whose soft margin is within δ of its maximum. We employ relative entropy projection methods to prove an O(ln(N)/δ²) iteration bound for our algorithm, where N is the number of examples. We compare our algorithm with other approaches including LPBoost, BrownBoost, and SmoothBoost. We show that there exist cases where the number of iterations required by LPBoost grows linearly in N instead of the logarithmic growth for SoftBoost. In simulation studies we show that our algorithm converges about as fast as LPBoost, faster than BrownBoost, and much faster than SmoothBoost. In a benchmark comparison we illustrate the competitiveness of our approach.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Gunnar Rätsch, Friedrich Miescher Laboratory, Max Planck Society, Tübingen, Germany. Abstract: We present a novel boosting algorithm, called SoftBoost, designed for sets of binary labeled examples that are not necessarily separable by convex combinations of base hypotheses. [sent-10, score-0.391]

2 Our algorithm achieves robustness by capping the distributions on the examples. [sent-11, score-0.216]

3 Our update of the distribution is motivated by minimizing a relative entropy subject to the capping constraints and constraints on the edges of the obtained base hypotheses. [sent-12, score-0.443]

4 The capping constraints imply a soft margin in the dual optimization problem. [sent-13, score-0.676]

5 Our algorithm produces a convex combination of hypotheses whose soft margin is within δ of its maximum. [sent-14, score-0.569]

6 We employ relative entropy projection methods to prove an O(ln(N)/δ²) iteration bound for our algorithm, where N is the number of examples. [sent-15, score-0.217]

7 For AdaBoost [7] it was frequently observed that the generalization error of the combined hypotheses kept decreasing after the training error had already reached zero [19]. [sent-21, score-0.174]

8 It became apparent that some of the power of ensemble methods lies in the fact that they tend to increase the margin of the training examples. [sent-23, score-0.274]

9 On such tasks, better generalization can be achieved by not enforcing a large margin on all training points. [sent-25, score-0.23]

10 While this worst-case bound can only capture part of what is going on in practice, it nevertheless suggests that in some cases it pays to allow some points to have small margin or be misclassified if this leads to a larger overall margin on the remaining points. [sent-27, score-0.496]

11 To cope with this problem, it was necessary to construct variants of AdaBoost which trade off the fraction of examples with margin at least ρ with the size of the margin ρ. [sent-28, score-0.514]

12 This idea is implemented in many algorithms including AdaBoost with soft margins [15], MadaBoost [5], ν-Arc [16, 14], SmoothBoost [21], LPBoost [4], and several others (see references in [13]). [sent-30, score-0.25]

13 In parallel, there has been a significant interest in how the linear combination of hypotheses generated by AdaBoost is related to the maximum margin solution [1, 19, 4, 18, 17]. [sent-33, score-0.353]

14 It was shown that AdaBoost generates a combined hypothesis with a large margin, but not necessarily the maximum hard margin [15, 18]. [sent-34, score-0.389]

15 This observation motivated the development of many Boosting algorithms that aim to maximize the margin [1, 8, 4, 17, 22, 18]. [sent-35, score-0.276]

16 AdaBoost∗ [17] and TotalBoost [22] provably converge to the maximum hard margin within precision δ in 2 ln(N)/δ² iterations. [sent-36, score-0.296]

17 In this work we combine these two lines of research into a single algorithm, called SoftBoost, that for the first time implements the soft margin idea in a practical boosting algorithm. [sent-39, score-0.544]

18 SoftBoost finds in O(ln(N)/δ²) iterations a linear combination of base hypotheses whose soft margin is at least the optimum soft margin minus δ. [sent-40, score-1.13]

19 BrownBoost [6] does not always optimize the soft margin. [sent-41, score-0.188]

20 SmoothBoost and MadaBoost can be related to maximizing the soft margin, but while they have known iteration bounds in terms of other criteria, it is unknown how quickly they converge to the maximum soft margin. [sent-42, score-0.474]

21 From a theoretical point of view the optimization problems underlying SoftBoost as well as LPBoost are appealing, since they directly maximize the margin of a (typically large) subset of the training data [16]. [sent-43, score-0.275]

22 Our new algorithm is most similar to LPBoost because its goal is also to optimize the soft margin. [sent-45, score-0.188]

23 In Section 3 we discuss LPBoost and give a separable setting where N/2 iterations are needed by LPBoost to achieve a hard margin within a given precision. [sent-50, score-0.391]

24 2 Preliminaries In the boosting setting, we are given a set of N labeled training examples (xn, yn), n = 1, . . . , N. [sent-54, score-0.263]

25 base learning algorithm), which returns a new base hypothesis h : X → [−1, 1] with a certain guarantee of performance. [sent-65, score-0.338]

26 One measure of the performance of a base hypothesis h with respect to distribution d is its edge, γh = Σ_{n=1}^{N} dn yn h(xn). [sent-67, score-0.322]

27 When the range of h is ±1 instead of the interval [-1,1], then the edge is just an affine transformation of the weighted error ǫh of hypothesis h: ǫh = 1/2 − (1/2)γh. [sent-68, score-0.218]

28 A hypothesis that predicts perfectly has edge γ = 1, a hypothesis that always predicts incorrectly has edge γ = −1, and a random hypothesis has edge γ ≈ 0. [sent-71, score-0.623]

29 The edge of a set of hypotheses is defined as the maximum edge of the set. [sent-73, score-0.262]
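
To make the edge definition concrete, here is a minimal sketch in Python/NumPy; the function names and array conventions are ours, not the paper's.

```python
import numpy as np

def edge(d, y, h_vals):
    """Edge of one base hypothesis: gamma_h = sum_n d_n * y_n * h(x_n).

    d      : distribution over the N examples (non-negative, sums to 1)
    y      : labels in {-1, +1}
    h_vals : predictions h(x_n) in [-1, 1]
    """
    return float(np.dot(d, y * h_vals))

def weighted_error_from_edge(gamma_h):
    """For +/-1-valued hypotheses, the weighted error is eps_h = 1/2 - 1/2 * gamma_h."""
    return 0.5 - 0.5 * gamma_h
```

The edge of a set of hypotheses is then simply the maximum of edge(...) over its members.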

30 Boosting algorithms (for the separable case) commonly update their distribution by placing a constraint on the edge of the most recent hypothesis. [sent-75, score-0.174]

31 In totally corrective updates, one constrains the distribution to have small edge with respect to all of the previous hypotheses [11, 22]. [sent-77, score-0.309]

32 The final output of the boosting algorithm is always a convex combination of base hypotheses fw(xn) = Σ_{t=1}^{T} wt ht(xn), where ht is the hypothesis added at iteration t and wt is its coefficient. [sent-79, score-0.778]

33 The margin of a labeled example (xn, yn) is defined as ρn = yn fw(xn). [sent-80, score-0.418]

34 The (hard) margin of a set of examples is taken to be the minimum margin of the set. [sent-81, score-0.514]

35 It is convenient to define an N-dimensional vector um that combines the base hypothesis hm with the labels yn of the N examples: u^m_n := yn hm(xn). [sent-82, score-0.612]

36 With this notation, the edge of the t-th hypothesis becomes d · ut and the margin of the n-th example w.r.t. [sent-83, score-0.512]

37 a convex combination w of the first t − 1 hypotheses is Σ_{m=1}^{t−1} u^m_n wm. [sent-86, score-0.323]
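
The following sketch collects these quantities in matrix form; U is assumed to be the N × t matrix with entries U[n, m] = y_n h_m(x_n) (our notation for stacking the vectors um).

```python
import numpy as np

def margins(U, w):
    """Margins rho_n = sum_m u^m_n w_m of all N examples under convex weights w."""
    return U @ w

def hard_margin(U, w):
    """Hard margin of the set: the minimum margin over all examples."""
    return float(np.min(U @ w))

def edges(U, d):
    """Edges d . u^m of all hypotheses collected so far, under distribution d."""
    return U.T @ d
```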

38 Given the hypothesis set {h1, . . . , ht}, the following linear programming problem (1) optimizes the minimum soft margin. [sent-90, score-0.289]

39 The term “soft” here refers to a relaxation of the margin constraint. [sent-91, score-0.23]

40 We now allow examples to lie below the margin but penalize them linearly via slack variables ψn . [sent-92, score-0.284]

41 Σ_{m=1}^{t} u^m_n wm ≥ ρ − ψn, for 1 ≤ n ≤ N; in the dual, d ∈ P^N, d ≤ (1/ν)·1. [sent-100, score-0.172]

42 Note that the relationship between capping and the hinge loss has long been exploited by the SVM community [3, 20] and has also been used before for Boosting in [16, 14]. [sent-103, score-0.216]

43 In particular, it is known that ρ in (1) is chosen such that N − ν examples have margin at least ρ. [sent-104, score-0.284]

44 The case ν = 1 is degenerate: there are no capping constraints in (2) and this is equivalent to the hard margin case. [sent-106, score-0.506]
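
As an illustration only, the soft-margin problem (1) can be written as a standard LP and handed to an off-the-shelf solver. The sketch below uses SciPy's linprog; the variable layout, function name, and the assumption that the slacks are penalized with weight 1/ν while w is restricted to the simplex are ours.

```python
import numpy as np
from scipy.optimize import linprog

def soft_margin_lp(U, nu):
    """Sketch of the primal soft-margin LP for U[n, m] = y_n * h_m(x_n).

    Variables are stacked as z = (w_1..w_t, rho, psi_1..psi_N);
    maximize rho - (1/nu) * sum(psi)  <=>  minimize -rho + (1/nu) * sum(psi).
    """
    N, t = U.shape
    c = np.concatenate([np.zeros(t), [-1.0], np.full(N, 1.0 / nu)])
    # Constraint per example n:  rho - psi_n - sum_m U[n, m] w_m <= 0
    A_ub = np.hstack([-U, np.ones((N, 1)), -np.eye(N)])
    b_ub = np.zeros(N)
    # w lies in the simplex: sum_m w_m = 1
    A_eq = np.concatenate([np.ones(t), [0.0], np.zeros(N)])[None, :]
    b_eq = np.array([1.0])
    bounds = [(0, None)] * t + [(None, None)] + [(0, None)] * N
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    # A production version would also check res.status here.
    w, rho, psi = res.x[:t], res.x[t], res.x[t + 1:]
    return w, rho, psi, -res.fun  # -res.fun is the soft margin objective
```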

45 Assumption on the weak learner: We assume that for any distribution d ≤ (1/ν)·1 on the examples, the oracle returns a hypothesis h with edge at least g, for some fixed g. [sent-107, score-0.32]

46 For binary valued features, this is equivalent to the assumption that the base learner always returns a hypothesis with error at most 1/2 − (1/2)g. [sent-109, score-0.257]

47 the entire hypothesis set from which the oracle can choose. [sent-114, score-0.192]

48 Also, the guarantee g of the oracle can be at most γ ∗ (ν) because for the optimal distribution d∗ that realizes γ ∗ (ν), all hypotheses have edge at most γ ∗ (ν). [sent-116, score-0.301]

49 For computational reasons, g might however be lower than γ ∗ (ν) and in that case the optimum soft margin we can achieve is g. [sent-117, score-0.445]

50 3 LPBoost In iteration t, the LPBoost algorithm [4] sends its current distribution dt−1 to the oracle and receives a hypothesis ht that satisfies dt−1 · ut ≥ g. [sent-118, score-0.465]

51 It then updates its distribution to dt by solving the linear programming problem (1) based on the t hypotheses received so far. [sent-119, score-0.336]

52 The goal of the boosting algorithms is to produce a convex combination of T hypotheses such that γT (ν) ≥ g − δ. [sent-120, score-0.295]

53 To our knowledge, there is no known iteration bound for LPBoost even though it provably converges to the δ-optimal solution of the optimization problem after it has seen all hypotheses [4, 10]. [sent-123, score-0.275]

54 For the first time, we are able to establish a lower bound showing that, independent of the optimizer, LPBoost can require Ω(N) iterations: Theorem 1 There exists a case where LPBoost requires N/2 iterations to achieve a hard margin within a given precision. [sent-127, score-0.363]

55 Input: (x1, y1), . . . , (xN, yN), accuracy δ, capping parameter ν ∈ [1, N]. [sent-139, score-0.216]

56 (a) Send dt−1 to the oracle and obtain hypothesis ht. [sent-146, score-0.269]

57 Set u^t_n = ht(xn) yn and γt = min{γt−1, dt−1 · ut}. [sent-147, score-0.237]

58 (Assume dt−1 · ut ≥ g, where the edge guarantee g is unknown.) [sent-148, score-0.192]

59 (b) Update the distribution to any dt that solves the LP problem (dt, γ*_t) = argmin_{d,γ} γ subject to the constraints of the dual problem (2) over the t hypotheses received so far. [sent-149, score-0.206]

60 Output: fw(x) = Σ_{m=1}^{T} wm hm(x), where the coefficients wm maximize the soft margin over the hypothesis set {h1, . . . , hT}. [sent-154, score-0.276]
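
A sketch of the per-iteration LPBoost distribution update as a small LP, again with SciPy's linprog; it solves the capped dual min_{d,γ} γ s.t. d · um ≤ γ for the t hypotheses received so far, Σ_n dn = 1, 0 ≤ dn ≤ 1/ν. Names and layout are our assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def lpboost_update(U_t, nu):
    """One LPBoost distribution update on U_t[n, m] = y_n * h_m(x_n)."""
    N, t = U_t.shape
    c = np.concatenate([np.zeros(N), [1.0]])            # minimize gamma
    A_ub = np.hstack([U_t.T, -np.ones((t, 1))])          # d . u^m - gamma <= 0
    b_ub = np.zeros(t)
    A_eq = np.concatenate([np.ones(N), [0.0]])[None, :]  # sum_n d_n = 1
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0 / nu)] * N + [(None, None)]      # capping: d_n <= 1/nu
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    d, gamma = res.x[:N], res.x[N]
    return d, gamma
```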

61 We assume that in each iteration the oracle will return the remaining hypothesis with maximum edge. [sent-165, score-0.293]

62 At the end of iteration t (1 ≤ t ≤ N/2), the distribution dt will focus all its weight on example N/2 + t, and the optimum mixture of the columns will put all of its weight on the tth hypothesis that was just received. [sent-168, score-0.499]

63 Although the example set used in the above proof is linearly separable, we can modify it explicitly to argue that capping the distribution on examples will not help in the sense that “soft” LPBoost with ν > 1 can still have linear iteration bounds. [sent-177, score-0.401]

64 However, it is not sufficient to improve the iteration bound of LPBoost from linear growth in N to logarithmic. [sent-182, score-0.172]

65 Another attempt might be to modify LPBoost so that at each iteration a base hypothesis is chosen that increases the value of the optimization problem the most. [sent-183, score-0.339]

66 Input: (x1, y1), . . . , (xN, yN), desired accuracy δ, and capping parameter ν ∈ [1, N]. [sent-191, score-0.216]

67 (a) Send dt−1 to the oracle and obtain hypothesis ht . [sent-198, score-0.269]

68 Set u^t_n = ht(xn) yn and γt = min{γt−1, dt−1 · ut}. [sent-199, score-0.237]

69 (Assume dt−1 · ut ≥ g, where the edge guarantee g is unknown.) [sent-200, score-0.192]

70 (c) If the above problem is infeasible or dt contains a zero, then set T = t and break. [sent-204, score-0.191]

71 Output: fw(x) = Σ_{m=1}^{T} wm hm(x), where the coefficients wm maximize the soft margin over the hypothesis set {h1, . . . , hT}. [sent-206, score-0.276]

72 4 SoftBoost In this section, we present the SoftBoost algorithm, which adds capping to the TotalBoost algorithm of [22]. [sent-211, score-0.216]

73 Input: (x1, y1), . . . , (xN, yN), an accuracy parameter δ, and a capping parameter ν. [sent-215, score-0.216]

74 In each iteration t, the algorithm prompts the oracle for a new base hypothesis, incorporates it into the constraint set, and updates its distribution dt−1 to dt by minimizing the relative entropy ∆(d, d0) := Σ_n dn ln(dn/d0_n) subject to linear constraints: [sent-218, score-0.632]

75 dt = argmin_d ∆(d, d0) s.t. d · um ≤ γt − δ for 1 ≤ m ≤ t (where γt = min_{1≤m≤t} dm−1 · um), Σ_n dn = 1, d ≤ (1/ν)·1. [sent-220, score-0.242]
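
The update above is a capped relative-entropy projection. As an illustration it can be handed to a generic constrained solver; the paper's analysis relies on entropy-projection arguments rather than this generic formulation, and the function name, solver choice, and tolerances below are ours.

```python
import numpy as np
from scipy.optimize import minimize

def softboost_update(U_t, d0, gamma_t, delta, nu):
    """Sketch of the SoftBoost update: minimize Delta(d, d0) = sum_n d_n ln(d_n/d0_n)
    s.t. d . u^m <= gamma_t - delta for all m seen so far, sum(d) = 1, 0 <= d_n <= 1/nu.
    Returns None when the constraints are infeasible (the algorithm then stops)."""
    N, t = U_t.shape
    eps = 1e-12

    def objective(d):
        d = np.clip(d, eps, None)
        return float(np.sum(d * np.log(d / d0)))

    constraints = [{"type": "eq", "fun": lambda d: np.sum(d) - 1.0}]
    for m in range(t):
        u_m = U_t[:, m]
        constraints.append({"type": "ineq",
                            "fun": lambda d, u=u_m: (gamma_t - delta) - d @ u})
    bounds = [(0.0, 1.0 / nu)] * N
    res = minimize(objective, x0=d0, bounds=bounds,
                   constraints=constraints, method="SLSQP")
    return res.x if res.success else None
```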

76 If we remove the relative entropy and minimize the upper bound on the edges, then we arrive at the optimization problem of LPBoost, and logarithmic growth in the number of examples is no longer possible. [sent-223, score-0.238]

77 The relative entropy in the objective assures that the probabilities of the examples are always proportional to their exponentiated negative soft margins (not shown). [sent-224, score-0.399]

78 That is, more weight is put on the examples with low soft margin, which are the examples that are hard to classify. [sent-225, score-0.347]

79 4.1 Iteration bounds for SoftBoost Our iteration bound for SoftBoost is very similar to the bound proven for TotalBoost [22], differing only in the additional details related to capping. [sent-227, score-0.194]

80 Also, if dt contains a zero, then since the objective function ∆(d, d0) is strictly convex in d and minimized at the interior point d0, there is no optimal solution in the interior of the simplex. [sent-231, score-0.336]

81 Because adding a new hypothesis in iteration t results in an additional constraint and γt ≤ γt−1, we have Ct ⊆ Ct−1. [sent-236, score-0.225]

82 If t ≤ T − 1, then our termination condition assures that at iteration t − 1 the set Ct−1 has a feasible solution in the interior of the simplex. [sent-237, score-0.179]

83 Also, d0 lies in the interior and dt ∈ Ct ⊆ Ct−1 . [sent-238, score-0.259]

84 Also dt−1 · ut ≥ γt by the definition of γt, and the constraints on the optimization problem assure that dt · ut ≤ γt − δ; thus dt−1 · ut − dt · ut ≥ γt − (γt − δ) = δ. [sent-241, score-0.693]
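
The standard counting argument behind the O(ln(N)/δ²) bound can be sketched as follows; this is our reconstruction along the lines of the TotalBoost-style, Bregman-projection analysis cited above, assuming d0 is the uniform distribution.

```latex
% Per-iteration progress, measured in relative entropy:
\begin{align*}
\Delta(d^{t}, d^{t-1})
  &\ge \tfrac{1}{2}\,\lVert d^{t}-d^{t-1}\rVert_1^2
      && \text{(Pinsker's inequality)}\\
  &\ge \tfrac{1}{2}\,\bigl(d^{t-1}\!\cdot u^{t}-d^{t}\!\cdot u^{t}\bigr)^2
      && \text{since } u^{t}\in[-1,1]^N\\
  &\ge \tfrac{\delta^{2}}{2}.
\end{align*}
% By the generalized Pythagorean theorem for Bregman projections,
% \Delta(d^{t}, d^{0}) \ge \Delta(d^{t}, d^{t-1}) + \Delta(d^{t-1}, d^{0}),
% so each iteration increases \Delta(\cdot, d^{0}) by at least \delta^2/2.
% Since \Delta(d, d^{0}) \le \ln N when d^{0} is uniform, the number of
% iterations is at most 2\ln(N)/\delta^{2}.
```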

85 When ν = 1, capping is vacuous and the algorithm and its iteration bound coincide with those of TotalBoost. [sent-245, score-0.389]

86 The rows of this matrix are our examples, and the columns and their negations are the base hypotheses, giving a total of 200 of them. [sent-251, score-0.18]

87 In every boosting iteration we chose the base hypothesis which has the largest edge with respect to the current distribution on the examples. [sent-256, score-0.547]

88 If ν is large enough, most incorrectly labeled examples are likely to be identified as margin errors (ψi > 0) and the performance should stabilize. [sent-263, score-0.319]

89 For every iteration we record all margins and compute the soft margin objective (1) for optimally chosen ρ and ψ's. [sent-268, score-0.582]
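
For fixed (recorded) margins, the optimal ρ and ψ's in (1) can be found by a one-dimensional scan, since ρ − (1/ν) Σ_n max(0, ρ − margin_n) is concave and piecewise linear in ρ. A small sketch, assuming the (1/ν)-weighted slack penalty:

```python
import numpy as np

def soft_margin_objective(margins, nu):
    """Soft margin objective (1) for fixed margins, optimized over rho and psi.

    For fixed rho the optimal slack is psi_n = max(0, rho - margin_n);
    the best rho is attained at one of the margin values (a breakpoint)."""
    m = np.sort(np.asarray(margins, dtype=float))
    best = -np.inf
    for rho in m:
        psi = np.maximum(0.0, rho - m)
        best = max(best, rho - psi.sum() / nu)
    return best
```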

90 SmoothBoost takes dramatically longer to converge to the maximum soft margin than the other three algorithms. [sent-270, score-0.433]

91 In our experiments it nearly converges to the maximum soft margin objective, even though no theoretical evidence is known for this observed convergence. [sent-271, score-0.433]

92 BrownBoost terminates in fewer iterations than the other algorithms but does not maximize the soft margin. [sent-273, score-0.312]

93 SmoothBoost has two parameters: a guarantee g on the edge of the base learner and a target margin θ of order g/2. [sent-277, score-0.443]

94 (Figure: soft margin objective; classification error.) [sent-285, score-0.16]

95 However, the soft margin algorithms outperform AdaBoost on most data sets. [sent-313, score-0.436]

96 Second, the iteration bound essentially says that column generation methods (of which LPBoost is a canonical example) should not solve the current subproblem at iteration t optimally. [sent-432, score-0.238]

97 The iteration bound for our algorithm is a straightforward extension of a bound given in [22] that is based on Bregman projection methods. [sent-445, score-0.19]

98 Although the proofs seem trivial in hindsight, simple logarithmic iteration bounds for boosting algorithms that maximize the soft margin have eluded many researchers (including the authors) for a long time. [sent-448, score-0.743]

99 For example in [12], a number of iteration bounds for boosting algorithms are proven with both methods. [sent-450, score-0.266]

100 Boosting in the limit: Maximizing the margin of learned ensembles. [sent-500, score-0.23]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('lpboost', 0.604), ('softboost', 0.446), ('margin', 0.23), ('capping', 0.216), ('dt', 0.191), ('soft', 0.188), ('brownboost', 0.173), ('adaboost', 0.14), ('boosting', 0.126), ('hypothesis', 0.124), ('hypotheses', 0.106), ('um', 0.103), ('iteration', 0.101), ('base', 0.082), ('ut', 0.08), ('edge', 0.078), ('ht', 0.077), ('corrective', 0.072), ('smoothboost', 0.072), ('wm', 0.069), ('ct', 0.068), ('oracle', 0.068), ('yn', 0.065), ('iterations', 0.062), ('totalboost', 0.058), ('examples', 0.054), ('interior', 0.049), ('lp', 0.048), ('separable', 0.048), ('margins', 0.044), ('xn', 0.043), ('fw', 0.04), ('entropy', 0.04), ('totally', 0.038), ('bregman', 0.038), ('ln', 0.038), ('madaboost', 0.038), ('bound', 0.036), ('generalization', 0.036), ('dn', 0.036), ('hm', 0.035), ('growth', 0.035), ('tsch', 0.035), ('hard', 0.035), ('guarantee', 0.034), ('counterexample', 0.034), ('cruz', 0.034), ('simplex', 0.034), ('logarithmic', 0.031), ('santa', 0.03), ('adaboostreg', 0.029), ('assure', 0.029), ('assures', 0.029), ('inseparable', 0.029), ('onoda', 0.029), ('maximize', 0.028), ('convex', 0.028), ('benchmark', 0.027), ('optimum', 0.027), ('december', 0.025), ('columns', 0.025), ('constraints', 0.025), ('capped', 0.025), ('ensemble', 0.025), ('relative', 0.025), ('smola', 0.024), ('programming', 0.024), ('rbf', 0.022), ('bounds', 0.021), ('chose', 0.021), ('warmuth', 0.02), ('manfred', 0.02), ('send', 0.02), ('california', 0.02), ('lies', 0.019), ('rows', 0.019), ('synthetic', 0.019), ('learner', 0.019), ('objective', 0.019), ('optimizer', 0.018), ('labeled', 0.018), ('algorithms', 0.018), ('combination', 0.017), ('incorrectly', 0.017), ('projection', 0.017), ('optimization', 0.017), ('terminates', 0.016), ('returns', 0.016), ('put', 0.016), ('performances', 0.016), ('break', 0.016), ('error', 0.016), ('precision', 0.016), ('update', 0.015), ('solver', 0.015), ('converges', 0.015), ('converge', 0.015), ('distribution', 0.015), ('lkopf', 0.015), ('modify', 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 38 nips-2007-Boosting Algorithms for Maximizing the Soft Margin

Author: Gunnar Rätsch, Manfred K. Warmuth, Karen A. Glocer

Abstract: We present a novel boosting algorithm, called SoftBoost, designed for sets of binary labeled examples that are not necessarily separable by convex combinations of base hypotheses. Our algorithm achieves robustness by capping the distributions on the examples. Our update of the distribution is motivated by minimizing a relative entropy subject to the capping constraints and constraints on the edges of the obtained base hypotheses. The capping constraints imply a soft margin in the dual optimization problem. Our algorithm produces a convex combination of hypotheses whose soft margin is within δ of its maximum. We employ relative entropy projection methods to prove an O(ln(N)/δ²) iteration bound for our algorithm, where N is the number of examples. We compare our algorithm with other approaches including LPBoost, BrownBoost, and SmoothBoost. We show that there exist cases where the number of iterations required by LPBoost grows linearly in N instead of the logarithmic growth for SoftBoost. In simulation studies we show that our algorithm converges about as fast as LPBoost, faster than BrownBoost, and much faster than SmoothBoost. In a benchmark comparison we illustrate the competitiveness of our approach.

2 0.18033756 147 nips-2007-One-Pass Boosting

Author: Zafer Barutcuoglu, Phil Long, Rocco Servedio

Abstract: This paper studies boosting algorithms that make a single pass over a set of base classifiers. We first analyze a one-pass algorithm in the setting of boosting with diverse base classifiers. Our guarantee is the same as the best proved for any boosting algorithm, but our one-pass algorithm is much faster than previous approaches. We next exhibit a random source of examples for which a “picky” variant of AdaBoost that skips poor base classifiers can outperform the standard AdaBoost algorithm, which uses every base classifier, by an exponential factor. Experiments with Reuters and synthetic data show that one-pass boosting can substantially improve on the accuracy of Naive Bayes, and that picky boosting can sometimes lead to a further improvement in accuracy.

3 0.12669556 166 nips-2007-Regularized Boost for Semi-Supervised Learning

Author: Ke Chen, Shihai Wang

Abstract: Semi-supervised inductive learning concerns how to learn a decision rule from a data set containing both labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. To our knowledge, however, none of them takes local smoothness constraints among data into account during ensemble learning. In this paper, we introduce a local smoothness regularizer to semi-supervised boosting algorithms based on the universal optimization framework of margin cost functionals. Our regularizer is applicable to existing semi-supervised boosting algorithms to improve their generalization and speed up their training. Comparative results on synthetic, benchmark and real world tasks demonstrate the effectiveness of our local smoothness regularizer. We discuss relevant issues and relate our regularizer to previous work. 1

4 0.10997864 90 nips-2007-FilterBoost: Regression and Classification on Large Datasets

Author: Joseph K. Bradley, Robert E. Schapire

Abstract: We study boosting in the filtering setting, where the booster draws examples from an oracle instead of using a fixed training set and so may train efficiently on very large datasets. Our algorithm, which is based on a logistic regression technique proposed by Collins, Schapire, & Singer, requires fewer assumptions to achieve bounds equivalent to or better than previous work. Moreover, we give the first proof that the algorithm of Collins et al. is a strong PAC learner, albeit within the filtering setting. Our proofs demonstrate the algorithm’s strong theoretical properties for both classification and conditional probability estimation, and we validate these results through extensive experiments. Empirically, our algorithm proves more robust to noise and overfitting than batch boosters in conditional probability estimation and proves competitive in classification. 1

5 0.1025399 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning

Author: Krishnan Kumar, Chiru Bhattacharya, Ramesh Hariharan

Abstract: This paper investigates the application of randomized algorithms for large scale SVM learning. The key contribution of the paper is to show that, by using ideas random projections, the minimal number of support vectors required to solve almost separable classification problems, such that the solution obtained is near optimal with a very high probability, is given by O(log n); if on removal of properly chosen O(log n) points the data becomes linearly separable then it is called almost separable. The second contribution is a sampling based algorithm, motivated from randomized algorithms, which solves a SVM problem by considering subsets of the dataset which are greater in size than the number of support vectors for the problem. These two ideas are combined to obtain an algorithm for SVM classification problems which performs the learning by considering only O(log n) points at a time. Experiments done on synthetic and real life datasets show that the algorithm does scale up state of the art SVM solvers in terms of memory required and execution time without loss in accuracy. It is to be noted that the algorithm presented here nicely complements existing large scale SVM learning approaches as it can be used to scale up any SVM solver. 1

6 0.084324785 110 nips-2007-Learning Bounds for Domain Adaptation

7 0.083962724 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators

8 0.070529453 39 nips-2007-Boosting the Area under the ROC Curve

9 0.070103042 61 nips-2007-Convex Clustering with Exemplar-Based Models

10 0.066422939 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

11 0.057649471 86 nips-2007-Exponential Family Predictive Representations of State

12 0.055775747 62 nips-2007-Convex Learning with Invariances

13 0.052499279 21 nips-2007-Adaptive Online Gradient Descent

14 0.047671981 3 nips-2007-A Bayesian Model of Conditioned Perception

15 0.046737187 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons

16 0.04625101 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors

17 0.044728696 194 nips-2007-The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information

18 0.044251252 120 nips-2007-Linear programming analysis of loopy belief propagation for weighted matching

19 0.04355216 187 nips-2007-Structured Learning with Approximate Inference

20 0.043116406 15 nips-2007-A general agnostic active learning algorithm


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.152), (1, -0.035), (2, -0.074), (3, 0.129), (4, 0.075), (5, 0.027), (6, 0.124), (7, -0.053), (8, 0.032), (9, -0.087), (10, 0.092), (11, 0.039), (12, -0.104), (13, -0.047), (14, 0.036), (15, -0.014), (16, 0.105), (17, 0.071), (18, -0.054), (19, -0.037), (20, -0.093), (21, 0.088), (22, -0.194), (23, 0.065), (24, -0.06), (25, 0.006), (26, -0.117), (27, -0.014), (28, 0.018), (29, -0.125), (30, 0.025), (31, -0.033), (32, -0.131), (33, 0.03), (34, 0.005), (35, 0.053), (36, -0.018), (37, -0.027), (38, -0.071), (39, 0.029), (40, 0.011), (41, -0.053), (42, -0.086), (43, -0.003), (44, -0.045), (45, 0.09), (46, 0.055), (47, 0.021), (48, 0.026), (49, 0.044)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95015758 38 nips-2007-Boosting Algorithms for Maximizing the Soft Margin

Author: Gunnar Rätsch, Manfred K. Warmuth, Karen A. Glocer

Abstract: We present a novel boosting algorithm, called SoftBoost, designed for sets of binary labeled examples that are not necessarily separable by convex combinations of base hypotheses. Our algorithm achieves robustness by capping the distributions on the examples. Our update of the distribution is motivated by minimizing a relative entropy subject to the capping constraints and constraints on the edges of the obtained base hypotheses. The capping constraints imply a soft margin in the dual optimization problem. Our algorithm produces a convex combination of hypotheses whose soft margin is within δ of its maximum. We employ relative enN tropy projection methods to prove an O( ln 2 ) iteration bound for our algorithm, δ where N is number of examples. We compare our algorithm with other approaches including LPBoost, BrownBoost, and SmoothBoost. We show that there exist cases where the number of iterations required by LPBoost grows linearly in N instead of the logarithmic growth for SoftBoost. In simulation studies we show that our algorithm converges about as fast as LPBoost, faster than BrownBoost, and much faster than SmoothBoost. In a benchmark comparison we illustrate the competitiveness of our approach.

2 0.79956466 90 nips-2007-FilterBoost: Regression and Classification on Large Datasets

Author: Joseph K. Bradley, Robert E. Schapire

Abstract: We study boosting in the filtering setting, where the booster draws examples from an oracle instead of using a fixed training set and so may train efficiently on very large datasets. Our algorithm, which is based on a logistic regression technique proposed by Collins, Schapire, & Singer, requires fewer assumptions to achieve bounds equivalent to or better than previous work. Moreover, we give the first proof that the algorithm of Collins et al. is a strong PAC learner, albeit within the filtering setting. Our proofs demonstrate the algorithm’s strong theoretical properties for both classification and conditional probability estimation, and we validate these results through extensive experiments. Empirically, our algorithm proves more robust to noise and overfitting than batch boosters in conditional probability estimation and proves competitive in classification. 1

3 0.76892132 147 nips-2007-One-Pass Boosting

Author: Zafer Barutcuoglu, Phil Long, Rocco Servedio

Abstract: This paper studies boosting algorithms that make a single pass over a set of base classifiers. We first analyze a one-pass algorithm in the setting of boosting with diverse base classifiers. Our guarantee is the same as the best proved for any boosting algorithm, but our one-pass algorithm is much faster than previous approaches. We next exhibit a random source of examples for which a “picky” variant of AdaBoost that skips poor base classifiers can outperform the standard AdaBoost algorithm, which uses every base classifier, by an exponential factor. Experiments with Reuters and synthetic data show that one-pass boosting can substantially improve on the accuracy of Naive Bayes, and that picky boosting can sometimes lead to a further improvement in accuracy.

4 0.59305018 39 nips-2007-Boosting the Area under the ROC Curve

Author: Phil Long, Rocco Servedio

Abstract: We show that any weak ranker that can achieve an area under the ROC curve slightly better than 1/2 (which can be achieved by random guessing) can be efficiently boosted to achieve an area under the ROC curve arbitrarily close to 1. We further show that this boosting can be performed even in the presence of independent misclassification noise, given access to a noise-tolerant weak ranker.

5 0.53445441 166 nips-2007-Regularized Boost for Semi-Supervised Learning

Author: Ke Chen, Shihai Wang

Abstract: Semi-supervised inductive learning concerns how to learn a decision rule from a data set containing both labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. To our knowledge, however, none of them takes local smoothness constraints among data into account during ensemble learning. In this paper, we introduce a local smoothness regularizer to semi-supervised boosting algorithms based on the universal optimization framework of margin cost functionals. Our regularizer is applicable to existing semi-supervised boosting algorithms to improve their generalization and speed up their training. Comparative results on synthetic, benchmark and real world tasks demonstrate the effectiveness of our local smoothness regularizer. We discuss relevant issues and relate our regularizer to previous work. 1

6 0.45373166 110 nips-2007-Learning Bounds for Domain Adaptation

7 0.42864782 159 nips-2007-Progressive mixture rules are deviation suboptimal

8 0.42274427 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators

9 0.4125607 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning

10 0.3877472 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers

11 0.37492827 184 nips-2007-Stability Bounds for Non-i.i.d. Processes

12 0.36642826 40 nips-2007-Bundle Methods for Machine Learning

13 0.3395083 112 nips-2007-Learning Monotonic Transformations for Classification

14 0.33322132 15 nips-2007-A general agnostic active learning algorithm

15 0.33264217 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)

16 0.32464516 62 nips-2007-Convex Learning with Invariances

17 0.32146916 207 nips-2007-Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations

18 0.31334269 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors

19 0.30334404 86 nips-2007-Exponential Family Predictive Representations of State

20 0.29024392 194 nips-2007-The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.036), (8, 0.01), (13, 0.067), (16, 0.026), (18, 0.021), (19, 0.015), (21, 0.043), (31, 0.025), (34, 0.023), (35, 0.042), (47, 0.07), (69, 0.212), (83, 0.189), (85, 0.013), (88, 0.037), (90, 0.047)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.82362586 38 nips-2007-Boosting Algorithms for Maximizing the Soft Margin

Author: Gunnar Rätsch, Manfred K. Warmuth, Karen A. Glocer

Abstract: We present a novel boosting algorithm, called SoftBoost, designed for sets of binary labeled examples that are not necessarily separable by convex combinations of base hypotheses. Our algorithm achieves robustness by capping the distributions on the examples. Our update of the distribution is motivated by minimizing a relative entropy subject to the capping constraints and constraints on the edges of the obtained base hypotheses. The capping constraints imply a soft margin in the dual optimization problem. Our algorithm produces a convex combination of hypotheses whose soft margin is within δ of its maximum. We employ relative enN tropy projection methods to prove an O( ln 2 ) iteration bound for our algorithm, δ where N is number of examples. We compare our algorithm with other approaches including LPBoost, BrownBoost, and SmoothBoost. We show that there exist cases where the number of iterations required by LPBoost grows linearly in N instead of the logarithmic growth for SoftBoost. In simulation studies we show that our algorithm converges about as fast as LPBoost, faster than BrownBoost, and much faster than SmoothBoost. In a benchmark comparison we illustrate the competitiveness of our approach.

2 0.71833509 41 nips-2007-COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking

Author: Markus Weimer, Alexandros Karatzoglou, Quoc V. Le, Alex J. Smola

Abstract: In this paper, we consider collaborative filtering as a ranking problem. We present a method which uses Maximum Margin Matrix Factorization and optimizes ranking instead of rating. We employ structured output prediction to optimize directly for ranking scores. Experimental results show that our method gives very good ranking scores and scales well on collaborative filtering tasks. 1

3 0.71766895 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels

Author: Ronny Luss, Alexandre D'aspremont

Abstract: In this paper, we propose a method for support vector machine classification using indefinite kernels. Instead of directly minimizing or stabilizing a nonconvex loss function, our method simultaneously finds the support vectors and a proxy kernel matrix used in computing the loss. This can be interpreted as a robust classification problem where the indefinite kernel matrix is treated as a noisy observation of the true positive semidefinite kernel. Our formulation keeps the problem convex and relatively large problems can be solved efficiently using the analytic center cutting plane method. We compare the performance of our technique with other methods on several data sets.

4 0.71459007 63 nips-2007-Convex Relaxations of Latent Variable Training

Author: Yuhong Guo, Dale Schuurmans

Abstract: We investigate a new, convex relaxation of an expectation-maximization (EM) variant that approximates a standard objective while eliminating local minima. First, a cautionary result is presented, showing that any convex relaxation of EM over hidden variables must give trivial results if any dependence on the missing values is retained. Although this appears to be a strong negative outcome, we then demonstrate how the problem can be bypassed by using equivalence relations instead of value assignments over hidden variables. In particular, we develop new algorithms for estimating exponential conditional models that only require equivalence relation information over the variable values. This reformulation leads to an exact expression for EM variants in a wide range of problems. We then develop a semidefinite relaxation that yields global training by eliminating local minima. 1

5 0.71152544 215 nips-2007-What makes some POMDP problems easy to approximate?

Author: Wee S. Lee, Nan Rong, Daniel J. Hsu

Abstract: Point-based algorithms have been surprisingly successful in computing approximately optimal solutions for partially observable Markov decision processes (POMDPs) in high dimensional belief spaces. In this work, we seek to understand the belief-space properties that allow some POMDP problems to be approximated efficiently and thus help to explain the point-based algorithms’ success often observed in the experiments. We show that an approximately optimal POMDP solution can be computed in time polynomial in the covering number of a reachable belief space, which is the subset of the belief space reachable from a given belief point. We also show that under the weaker condition of having a small covering number for an optimal reachable space, which is the subset of the belief space reachable under an optimal policy, computing an approximately optimal solution is NP-hard. However, given a suitable set of points that “cover” an optimal reachable space well, an approximate solution can be computed in polynomial time. The covering number highlights several interesting properties that reduce the complexity of POMDP planning in practice, e.g., fully observed state variables, beliefs with sparse support, smooth beliefs, and circulant state-transition matrices. 1

6 0.71133637 166 nips-2007-Regularized Boost for Semi-Supervised Learning

7 0.7084111 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

8 0.70803338 40 nips-2007-Bundle Methods for Machine Learning

9 0.70795292 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering

10 0.70515543 102 nips-2007-Incremental Natural Actor-Critic Algorithms

11 0.70418644 116 nips-2007-Learning the structure of manifolds using random projections

12 0.70351768 187 nips-2007-Structured Learning with Approximate Inference

13 0.70305151 84 nips-2007-Expectation Maximization and Posterior Constraints

14 0.70279241 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons

15 0.70214999 21 nips-2007-Adaptive Online Gradient Descent

16 0.70150942 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs

17 0.70056278 180 nips-2007-Sparse Feature Learning for Deep Belief Networks

18 0.70033175 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning

19 0.70008534 134 nips-2007-Multi-Task Learning via Conic Programming

20 0.70006537 120 nips-2007-Linear programming analysis of loopy belief propagation for weighted matching