nips nips2010 nips2010-282 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hamed Masnadi-shirazi, Nuno Vasconcelos
Abstract: The problem of controlling the margin of a classifier is studied. A detailed analytical study is presented on how properties of the classification risk, such as its optimal link and minimum risk functions, are related to the shape of the loss, and its margin enforcing properties. It is shown that for a class of risks, denoted canonical risks, asymptotic Bayes consistency is compatible with simple analytical relationships between these functions. These enable a precise characterization of the loss for a popular class of link functions. It is shown that, when the risk is in canonical form and the link is inverse sigmoidal, the margin properties of the loss are determined by a single parameter. Novel families of Bayes consistent loss functions, of variable margin, are derived. These families are then used to design boosting style algorithms with explicit control of the classification margin. The new algorithms generalize well established approaches, such as LogitBoost. Experimental results show that the proposed variable margin losses outperform the fixed margin counterparts used by existing algorithms. Finally, it is shown that best performance can be achieved by cross-validating the margin parameter. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Variable margin losses for classifier design Hamed Masnadi-Shirazi, Nuno Vasconcelos Statistical Visual Computing Laboratory, University of California, San Diego, La Jolla, CA 92039 nuno@ucsd. [sent-1, score-0.482]
2 A detailed analytical study is presented on how properties of the classification risk, such as its optimal link and minimum risk functions, are related to the shape of the loss, and its margin enforcing properties. [sent-4, score-0.875]
3 It is shown that for a class of risks, denoted canonical risks, asymptotic Bayes consistency is compatible with simple analytical relationships between these functions. [sent-5, score-0.749]
4 These enable a precise characterization of the loss for a popular class of link functions. [sent-6, score-0.495]
5 It is shown that, when the risk is in canonical form and the link is inverse sigmoidal, the margin properties of the loss are determined by a single parameter. [sent-7, score-1.598]
6 Novel families of Bayes consistent loss functions, of variable margin, are derived. [sent-8, score-0.316]
7 These families are then used to design boosting style algorithms with explicit control of the classification margin. [sent-9, score-0.498]
8 Experimental results show that the proposed variable margin losses outperform the fixed margin counterparts used by existing algorithms. [sent-11, score-0.686]
9 Examples of such losses include the hinge loss, used in SVM design, the exponential loss, used by boosting algorithms such as AdaBoost, or the logistic loss, used in both classical logistic regression and more recent methods, such as LogitBoost. [sent-16, score-0.84]
10 The main idea is that there are three quantities that matter in risk minimization: the loss function φ, a corresponding optimal link function fφ*, which maps posterior class probabilities to classifier predictions, and a minimum risk Cφ*, associated with the optimal link. [sent-22, score-1.105]
11 While the standard approach to classifier design is to define a loss φ, and then optimize it to obtain fφ* and Cφ*, [3] showed that there is an alternative: to specify fφ* and Cφ*, and analytically derive the loss φ. [sent-23, score-0.527]
12 This loss is then used to design a robust boosting algorithm, denoted SavageBoost. [sent-26, score-0.636]
13 SavageBoost has been, more recently, shown to outperform most other boosting algorithms in computer vision problems, where outliers are prevalent [5]. [sent-27, score-0.375]
14 It turns out that many pairs (Cφ*, fφ*) are compatible with any Bayes consistent loss φ. [sent-29, score-0.338]
15 In practice, the design has to resort to trial and error, by 1) testing combinations of the latter and, 2) verifying whether the loss has the desired properties. [sent-32, score-0.334]
16 In this work, we consider one such problem: how to control the size of the margin enforced by the loss. [sent-36, score-0.279]
17 We start by showing that, while many pairs (Cφ*, fφ*) are compatible with a given φ, one of these pairs establishes a very tight connection between the optimal link and the minimum risk: that fφ* is the derivative of Cφ*. [sent-37, score-0.502]
18 We refer to the risk function associated with such a pair as a canonical risk, and show that it leads to an equally tight connection between the pair (Cφ*, fφ*) and the loss φ. [sent-38, score-1.143]
19 For a canonical risk, all three functions can be obtained from each other with one-to-one mappings of trivial analytical tractability. [sent-39, score-0.688]
20 This implies that, for a canonical risk, the choice of a particular link in the inverse-sigmoidal family only impacts the behavior of φ around the origin, i. [sent-45, score-0.856]
21 This is exploited to design parametric families of loss functions that allow explicit control of the classification margin. [sent-52, score-0.445]
22 These losses are applied to the design of novel boosting algorithms of tunable margin. [sent-53, score-0.59]
23 Finally, it is shown that the requirements of 1) a canonical risk, and 2) an inverse-sigmoidal link are not unduly restrictive for classifier design. [sent-54, score-0.851]
24 Given a non-negative loss function L(x, y), the classifier is optimal if it minimizes the risk R(f ) = EX,Y [L(h(x), y)]. [sent-64, score-0.51]
25 This is equivalent to minimizing the conditional risk EY |X [L(h(x), y)|X = x] for all x ∈ X . [sent-65, score-0.286]
26 The associated conditional risk is C0/1(η, f) = η (1 − sign(f))/2 + (1 − η) (1 + sign(f))/2, which equals 1 − η if f ≥ 0 and η if f < 0; the risk is minimized if f(x) > 0 when η(x) > 1/2, f(x) = 0 when η(x) = 1/2, and f(x) < 0 when η(x) < 1/2. [sent-68, score-0.595]
27 Table 1: Loss φ, optimal link fφ*(η), optimal inverse link [fφ*]−1(v), and minimum conditional risk Cφ*(η) for popular learning algorithms. [sent-69, score-0.893]
28 The associated optimal classifier h* = sign[f*] is the well known Bayes decision rule (BDR), and the associated minimum conditional (zero-one) risk is C0/1*(η) = η (1 − sign(2η − 1))/2 + (1 − η) (1 + sign(2η − 1))/2. [sent-71, score-0.435]
29 These include the exponential loss of boosting, the log loss of logistic regression, and the hinge loss of SVMs. [sent-74, score-0.823]
30 The associated conditional risk is Cφ(η, f) = ηφ(f) + (1 − η)φ(−f) (5). [sent-78, score-0.312]
31 It is minimized by the link fφ*(η) = arg min_f Cφ(η, f) (6), leading to the minimum conditional risk function Cφ*(η) = Cφ(η, fφ*). [sent-79, score-0.59]
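As a quick numerical illustration of (6), the sketch below minimizes the conditional risk of the exponential loss over a grid of predictions f and checks that the minimizer and minimum match the closed forms listed in Table 1 for boosting, fφ*(η) = (1/2) log(η/(1 − η)) and Cφ*(η) = 2√(η(1 − η)); a minimal numpy sketch for illustration, not code from the paper.

import numpy as np

# Conditional risk of the exponential loss: C(eta, f) = eta*exp(-f) + (1 - eta)*exp(f)
def cond_risk_exp(eta, f):
    return eta * np.exp(-f) + (1.0 - eta) * np.exp(f)

f_grid = np.linspace(-5.0, 5.0, 200001)          # fine grid over classifier predictions f
for eta in np.linspace(0.05, 0.95, 19):          # posterior probabilities eta(x)
    risks = cond_risk_exp(eta, f_grid)
    f_num = f_grid[np.argmin(risks)]             # numerical minimizer of C(eta, .)
    f_closed = 0.5 * np.log(eta / (1.0 - eta))   # optimal link from Table 1
    c_closed = 2.0 * np.sqrt(eta * (1.0 - eta))  # minimum conditional risk from Table 1
    assert abs(f_num - f_closed) < 1e-3
    assert abs(risks.min() - c_closed) < 1e-6
print("grid minimizers match the closed-form link and minimum risk")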
32 Table 1 lists the loss, optimal link, and minimum risk of some of the most popular classifier design methods. [sent-80, score-0.423]
33 Conditional risk minimization is closely related to classical probability elicitation in statistics [4]. [sent-81, score-0.364]
34 Rather than specifying a loss φ and minimizing Cφ(η, f), so as to obtain whatever optimal link fφ* and minimum expected risk Cφ*(η) result, it is possible to specify fφ* and Cφ*(η) and derive, from (14) with J(η) = −Cφ*(η), the underlying loss φ. [sent-99, score-1.027]
35 The main advantage is the ability to control directly the quantities that matter for classification, namely the predictor and risk of the optimal classifier. [sent-100, score-0.327]
36 3 Canonical risk minimization. In general, given J(η) = −Cφ*(η), there are multiple pairs (φ, fφ*) that satisfy (14). [sent-102, score-0.276]
37 Hence, specification of either the minimum risk or optimal link does not completely characterize the loss. [sent-103, score-0.569]
38 Whenever this holds, the risk is said to be in canonical form, and (f ∗ , J) are denoted a canonical pair [6] . [sent-112, score-1.541]
39 If the optimal link associated with Cφ*(η) is fφ*(η) = J′(η) (16), the risk Cφ(η, f) is said to be in canonical form. [sent-115, score-1.178]
40 fφ*(η) is denoted a canonical link and φ(v), the loss given by (14), a canonical loss. [sent-116, score-1.689]
41 For example, the risk of boosting is derived from the convex, differentiable, and symmetric J(η) = −2√(η(1 − η)). [sent-118, score-0.607]
42 Since this has derivative J′(η) = (2η − 1)/√(η(1 − η)) ≠ (1/2) log(η/(1 − η)) = fφ*(η) (17), the risk is not in canonical form. [sent-119, score-0.939]
43 What follows from (16) is that it is possible to derive a canonical risk for any maximal reward J(η), including that of boosting (J(η) = −2√(η(1 − η))). [sent-120, score-1.242]
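A small numerical check of (16), assuming the closed forms quoted above (boosting: J(η) = −2√(η(1 − η)) with optimal link (1/2) log(η/(1 − η)); logistic regression: J(η) = η log η + (1 − η) log(1 − η) with optimal link log(η/(1 − η))): differentiating J by finite differences shows that the boosting pair violates (16) while the logistic pair satisfies it. A minimal numpy sketch.

import numpy as np

eta = np.linspace(0.01, 0.99, 981)                 # grid of posterior probabilities

def matches_link(J_vals, link_vals):
    # compare a finite-difference derivative of J with the optimal link (interior points only)
    dJ = np.gradient(J_vals, eta)
    return np.allclose(dJ[2:-2], link_vals[2:-2], atol=1e-2)

# Boosting: J(eta) = -2*sqrt(eta*(1-eta)), optimal link (1/2)*log(eta/(1-eta))
J_boost = -2.0 * np.sqrt(eta * (1.0 - eta))
link_boost = 0.5 * np.log(eta / (1.0 - eta))
print("boosting risk canonical:", matches_link(J_boost, link_boost))     # False

# Logistic regression: J(eta) = eta*log(eta) + (1-eta)*log(1-eta), optimal link log(eta/(1-eta))
J_logit = eta * np.log(eta) + (1.0 - eta) * np.log(1.0 - eta)
link_logit = np.log(eta / (1.0 - eta))
print("logistic risk canonical:", matches_link(J_logit, link_logit))     # True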
44 While canonical risks can be easily designed by specifying either J(η) or fφ*(η), and then using (14) and (16), it is much less clear how to directly specify a loss φ(v) for which (14) holds with a canonical pair (f*, J). [sent-122, score-1.535]
45 Let Cφ (η, f ) be the canonical risk derived from a convex and symmetric J(η). [sent-125, score-0.925]
46 Figure 1: Left: canonical losses compatible with an IS optimal link. [sent-139, score-1.133]
47 Right: Average classification rank as a function of margin parameter, on the UCI data. [sent-140, score-0.277]
48 First, it establishes an easy-to-verify necessary condition for the canonical form. [sent-142, score-0.648]
49 For example, logistic regression has [fφ*]−1(v) = 1/(1 + e^(−v)) and φ′(v) = −e^(−v)/(1 + e^(−v)) = [fφ*]−1(v) − 1, while boosting has [fφ*]−1(v) = 1/(1 + e^(−2v)) and φ′(v) = −e^(−v) ≠ [fφ*]−1(v) − 1. [sent-143, score-0.481]
50 This (plus the symmetry of J and fφ*) shows that the former is in canonical form but the latter is not. [sent-144, score-0.698]
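This condition is easy to probe numerically. The sketch below differentiates the logistic loss log(1 + e^(−v)) and the exponential loss e^(−v) by finite differences and compares the result with [fφ*]−1(v) − 1, using the inverse links quoted above; a minimal numpy sketch, assuming those standard loss forms.

import numpy as np

v = np.linspace(-4.0, 4.0, 801)
h = v[1] - v[0]

def centered_derivative(phi_vals):
    # centered finite differences on the interior of the grid
    return (phi_vals[2:] - phi_vals[:-2]) / (2.0 * h)

# Logistic: phi(v) = log(1 + e^{-v}),  [f*]^{-1}(v) = 1/(1 + e^{-v})
lhs = centered_derivative(np.log1p(np.exp(-v)))
rhs = 1.0 / (1.0 + np.exp(-v[1:-1])) - 1.0
print("logistic: phi'(v) = [f*]^{-1}(v) - 1 ?", np.allclose(lhs, rhs, atol=1e-3))   # True

# Exponential (boosting): phi(v) = e^{-v},  [f*]^{-1}(v) = 1/(1 + e^{-2v})
lhs = centered_derivative(np.exp(-v))
rhs = 1.0 / (1.0 + np.exp(-2.0 * v[1:-1])) - 1.0
print("exponential: same condition ?", np.allclose(lhs, rhs, atol=1e-3))            # False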
51 Second, it makes it clear that, up to additive constants, the three components (φ, Cφ*, and fφ*) of a canonical risk are related by one-to-one relationships. [sent-145, score-0.874]
52 Hence, it is possible to control the properties of the three components of the risk by manipulating a single function (which can be any of the three). [sent-146, score-0.318]
53 Finally, it enables a very detailed characterization of the losses compatible with most optimal links of Table 1. [sent-147, score-0.415]
54 4 Inverse-sigmoidal links. Inspection of Table 1 suggests that the classifiers produced by boosting, logistic regression, and variants have sigmoidal inverse links [fφ*]−1. [sent-148, score-0.346]
55 It follows that, as illustrated in Figure 1, the loss φ(v) is convex, monotonically decreasing, linear (with slope −1) for large negative v, constant for large positive v, and has slope −1/2 at the origin. [sent-154, score-0.349]
56 The set of losses compatible with an IS link is, thus, strongly constrained. [sent-156, score-0.478]
57 Figure 2: canonical link (left) and loss (right) for various values of a. [sent-179, score-1.051]
58 What is interesting is that these are the degrees of freedom that control the margin characteristics of the loss φ. [sent-181, score-0.526]
59 Hence, by controlling the behavior of the IS link around the origin, it is possible to control the margin of the optimal classifier. [sent-182, score-0.528]
60 In particular, the margin is a decreasing function of the curvature of the loss at the origin, φ(2)(0). [sent-183, score-0.479]
61 5 Variable margin loss functions. The results above enable the derivation of families of canonical losses with controllable margin. [sent-185, score-1.395]
62 In Section 3, we have seen that the boosting loss is not canonical, but there is a canonical loss for the minimum risk of boosting. [sent-186, score-1.702]
63 From (16), the canonical optimal link is fφ*(η; a) = (2η − 1)/(a√(η(1 − η))) (26), and it can be shown that [fφ*]−1(v; a) = 1/2 + av/(2√(4 + (av)^2)) (27) is an IS link. [sent-188, score-0.983]
64 Using (18), the corresponding canonical loss is φ(v; a) = (1/(2a)) (√(4 + (av)^2) − av) (28). [sent-191, score-0.836]
65 Because it shares the minimum risk of boosting, we refer to this loss as the canonical boosting loss. [sent-192, score-1.483]
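A direct transcription of (26)-(28) into code, sketched below: the canonical boosting link, its inverse, and the variable margin loss, plus a numerical check that the pair satisfies the canonical condition φ′(v; a) = [fφ*]−1(v; a) − 1 for several values of a. A minimal numpy sketch, not the authors' implementation.

import numpy as np

def boost_link(eta, a):
    # canonical boosting link (26): f*(eta; a) = (2*eta - 1) / (a * sqrt(eta * (1 - eta)))
    return (2.0 * eta - 1.0) / (a * np.sqrt(eta * (1.0 - eta)))

def boost_link_inv(v, a):
    # inverse link (27): [f*]^{-1}(v; a) = 1/2 + a*v / (2 * sqrt(4 + (a*v)^2))
    return 0.5 + a * v / (2.0 * np.sqrt(4.0 + (a * v) ** 2))

def boost_loss(v, a):
    # canonical boosting loss (28): phi(v; a) = (sqrt(4 + (a*v)^2) - a*v) / (2*a)
    return (np.sqrt(4.0 + (a * v) ** 2) - a * v) / (2.0 * a)

v = np.linspace(-6.0, 6.0, 1201)
for a in (0.2, 1.0, 4.0):                         # a controls the behaviour around v = 0
    dphi = np.gradient(boost_loss(v, a), v)       # numerical phi'(v; a)
    ok = np.allclose(dphi[2:-2], boost_link_inv(v[2:-2], a) - 1.0, atol=1e-3)
    print(f"a = {a}: canonical condition holds: {ok}")

# sanity check: the inverse really inverts the link
eta = np.linspace(0.05, 0.95, 19)
assert np.allclose(boost_link_inv(boost_link(eta, 0.5), 0.5), eta)

For large |v| the loss approaches max(0, −v) for every a, so a only changes how sharply the two linear regimes are joined near the origin.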
66 This shows that the link is indeed sigmoidal, and that the margin is determined by a. [sent-209, score-0.434]
67 It is also possible to derive variable margin extensions of existing canonical losses. [sent-211, score-0.836]
68 For example, consider the parametric extension of the minimum risk of logistic regression, J(η; a) = (1/a)[η log(η) + (1 − η) log(1 − η)]. [sent-212, score-0.5]
69 From (16), the canonical optimal link is fφ*(η; a) = (1/a) log(η/(1 − η)), with inverse [fφ*]−1(v; a) = e^(av)/(1 + e^(av)) (30); this is again a sigmoidal inverse link and, from (18), φ(v; a) = (1/a)[log(1 + e^(av)) − av] (31). [sent-214, score-0.525]
70 We denote this loss the canonical logistic loss. [sent-215, score-0.965]
71 Note that, in (28) and (31), margin control is not achieved by simply rescaling the domain of the loss function. [sent-218, score-0.474]
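Equations (30)-(31) can be transcribed the same way. The sketch below implements the canonical logistic loss through the numerically stable identity φ(v; a) = (1/a) log(1 + e^(−av)) (equal to (31)), checks that a = 1 recovers the logistic loss, and prints the tail slope and the curvature at the origin for a few values of a; a minimal numpy sketch for illustration.

import numpy as np

def canon_logistic_loss(v, a):
    # phi(v; a) = (1/a) * [log(1 + e^{a v}) - a v] = (1/a) * log(1 + e^{-a v})   (31)
    return np.logaddexp(0.0, -a * v) / a

v = np.linspace(-10.0, 10.0, 2001)

# a = 1 recovers the logistic loss log(1 + e^{-v}) used by LogitBoost
assert np.allclose(canon_logistic_loss(v, 1.0), np.log1p(np.exp(-v)))

for a in (0.2, 1.0, 5.0):
    d2 = np.gradient(np.gradient(canon_logistic_loss(v, a), v), v)   # numerical phi''(v; a)
    v_tail = -30.0 / a                                               # far into the negative tail
    slope = (canon_logistic_loss(v_tail + 1e-3, a) - canon_logistic_loss(v_tail, a)) / 1e-3
    print(f"a = {a}: tail slope ~ {slope:.3f}, curvature at the origin ~ {d2[len(v) // 2]:.3f}")
# the tail slope is ~ -1 for every a (as required of an IS link), while the curvature
# at the origin grows with a, which is what shrinks the margin.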
72 While this type of re-scaling occurs in both families of loss functions above (which are both functions of av), it is localized around the origin, and only influences the margin properties of the loss. [sent-223, score-0.587]
73 6 Experiments. A number of easily reproducible experiments were conducted to study the effect of variable margin losses on the accuracy of the resulting classifiers. [sent-227, score-0.413]
74 The GradientBoost algorithm [8], with histogram-based weak learners, was then used to design boosted classifiers which minimize the canonical logistic and boosting losses, for various margin parameters. [sent-231, score-1.361]
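For concreteness, a heavily simplified stand-in for this setup is sketched below: a GradientBoost-style loop that, at each round, fits a shallow regression tree (instead of the paper's histogram-based weak learners) to the negative functional gradient of the canonical logistic loss (31) and adds it with a fixed shrinkage step. The helper names, the shrinkage constant, and the use of sklearn's DecisionTreeRegressor are all assumptions for illustration, not the authors' implementation.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def dphi_canon_logistic(v, a):
    # derivative of the canonical logistic loss: phi'(v; a) = -1 / (1 + e^{a v})
    return -1.0 / (1.0 + np.exp(a * v))

def fit_canonical_boost(X, y, a=1.0, n_rounds=50, lr=0.5, depth=2):
    # y must be in {-1, +1}; the ensemble score F(x) is grown additively
    F = np.zeros(len(y))
    trees = []
    for _ in range(n_rounds):
        # negative functional gradient of sum_i phi(y_i * F_i; a) with respect to F_i
        residual = -y * dphi_canon_logistic(y * F, a)
        tree = DecisionTreeRegressor(max_depth=depth).fit(X, residual)
        F += lr * tree.predict(X)
        trees.append(tree)
    return trees

def predict_margin(trees, X, lr=0.5):
    # lr must match the value used during fitting
    return lr * sum(t.predict(X) for t in trees)

# toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=400))
trees = fit_canonical_boost(X, y, a=0.5)
print("training accuracy:", np.mean(np.sign(predict_margin(trees, X)) == y))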
75 50 boosting iterations were applied to each training set, for 19 values of a ∈ {0. [sent-234, score-0.327]
76 To explore this question we show, in Figure 1, the average rank of the classifier designed with each loss and margin parameter a. [sent-246, score-0.496]
77 For the canonical logistic loss, the best value of a is in the range 0. [sent-250, score-0.746]
78 Note that the average rank for this range (between 5 and 6) is better than that (close to 7) achieved with the logistic loss of LogitBoost [2] (a = 1). [sent-253, score-0.406]
79 Table 3: Classification error for each loss function and UCI dataset. [sent-254, score-0.836]
80 In fact, as can be seen from Table 2, the canonical logistic loss with a = 1 did not achieve rank 1 on any dataset, whereas canonical logistic losses with 0. [sent-332, score-1.346]
81 For the canonical boosting loss, there is also a range (0. [sent-337, score-0.944]
82 4 produces a loss of much larger margin for canonical boosting. [sent-341, score-1.055]
83 Furthermore, the canonical boosting loss has a heavier tail and approaches zero more slowly than the canonical logistic loss. [sent-342, score-1.909]
84 Although certain ranges of margin parameters seem to produce the best results for both canonical loss functions, the optimal parameter value is likely to be dataset dependent. [sent-343, score-1.113]
85 Table 3 presents the 5-fold cross validation test error (# of misclassified points) obtained for each UCI dataset and canonical loss. [sent-346, score-0.661]
86 The table also shows the results of AdaBoost, LogitBoost (canonical logistic, a = 1), and canonical boosting loss with a = 1. [sent-347, score-1.191]
87 Cross-validating the margin results in better performance for 9 out of 10 (8 out of 10) datasets for the canonical logistic (boosting) loss, when compared to the fixed margin (a = 1) counterparts. [sent-348, score-1.215]
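The cross-validation protocol itself is straightforward; a self-contained sketch is given below, using a simple linear model trained by gradient descent on the canonical logistic loss (31) as a lightweight stand-in for the boosted classifiers, manual 5-fold splits, and an illustrative grid of margin parameters. Every concrete choice here (grid, learner, step sizes) is an assumption for illustration only.

import numpy as np

def train_linear(X, y, a, steps=500, lr=0.1):
    # gradient descent on (1/n) * sum_i phi(y_i * w.x_i; a), with phi from (31)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        m = y * (X @ w)                                        # margins y_i * f(x_i)
        dphi = -1.0 / (1.0 + np.exp(np.clip(a * m, -50, 50)))  # phi'(m_i; a)
        w -= lr * (X.T @ (dphi * y)) / len(y)
    return w

def cv_error(X, y, a, k=5, seed=0):
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = train_linear(X[train], y[train], a)
        errs.append(np.mean(np.sign(X[test] @ w) != y[test]))
    return float(np.mean(errs))

# toy data and an illustrative grid of candidate margin parameters
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = np.sign(X @ np.array([1.0, -1.0, 0.5, 0.0]) + 0.3 * rng.normal(size=300))
grid = [0.2, 0.5, 1.0, 2.0, 4.0]
scores = {a: cv_error(X, y, a) for a in grid}
best_a = min(scores, key=scores.get)
print("cross-validated margin parameter a =", best_a, "with error", scores[best_a])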
88 In this case, canonical logistic and canonical boosting outperform both LogitBoost and AdaBoost in 7 and 5 of the ten datasets, respectively. [sent-354, score-1.757]
89 LogitBoost and AdaBoost outperform both canonical losses in only 2 and 3 datasets, respectively. [sent-357, score-0.811]
90 7 Conclusion. The probability elicitation approach to loss function design, introduced in [3], enables the derivation of new Bayes consistent loss functions. [sent-358, score-0.59]
91 In general, it is difficult to anticipate the properties, and shape, of a loss function that results from combining a certain minimal risk with a certain link function. [sent-360, score-0.691]
92 In this work, we have addressed this problem for the class of canonical risks. [sent-361, score-0.617]
93 We have shown that the associated canonical loss functions lend themselves to analysis, due to a simple connection between the associated minimum conditional risk and optimal link functions. [sent-362, score-1.539]
94 This analysis was shown to enable a precise characterization of 1) the relationships between loss, optimal link, and minimum risk, and 2) the properties of the loss whenever the optimal link is in the family of inverse sigmoid functions. [sent-363, score-0.721]
95 These properties were then exploited to design parametric families of loss functions with explicit margin control. [sent-364, score-0.653]
96 Experiments with boosting algorithms derived from these variable margin losses have shown better performance than those of classical algorithms, such as AdaBoost or LogitBoost. [sent-365, score-0.799]
97 Vasconcelos, “On the design of loss functions for classification: theory, robustness to outliers, and savageboost,” in NIPS, 2008, pp. [sent-376, score-0.317]
98 Bischof, “On robustness of on-line boosting - a competitive study,” in IEEE ICCV Workshop on On-line Computer Vision, 2009. [sent-388, score-0.327]
99 Zhang, “Statistical behavior and consistency of classification methods based on convex risk minimization,” Annals of Statistics, 2004. [sent-394, score-0.309]
100 Friedman, “Greedy function approximation: A gradient boosting machine,” The Annals of Statistics, vol. [sent-397, score-0.327]
wordName wordTfidf (topN-words)
[('canonical', 0.617), ('boosting', 0.327), ('risk', 0.257), ('margin', 0.219), ('loss', 0.219), ('link', 0.215), ('losses', 0.194), ('logitboost', 0.154), ('logistic', 0.129), ('av', 0.117), ('adaboost', 0.112), ('classi', 0.106), ('bayes', 0.101), ('sigmoidal', 0.095), ('uci', 0.081), ('er', 0.073), ('elicitation', 0.071), ('sign', 0.069), ('compatible', 0.069), ('design', 0.069), ('families', 0.066), ('eav', 0.066), ('minimum', 0.063), ('rank', 0.058), ('yf', 0.058), ('boost', 0.056), ('symmetry', 0.055), ('breast', 0.05), ('lim', 0.049), ('slope', 0.047), ('inverse', 0.046), ('ers', 0.046), ('cancer', 0.045), ('bdr', 0.044), ('gradientboost', 0.044), ('savageboost', 0.044), ('ten', 0.042), ('analytical', 0.042), ('decreasing', 0.041), ('reward', 0.041), ('risks', 0.04), ('origin', 0.039), ('links', 0.038), ('log', 0.037), ('classical', 0.036), ('monotonically', 0.036), ('control', 0.036), ('nuno', 0.036), ('vasconcelos', 0.036), ('optimal', 0.034), ('ev', 0.033), ('jolla', 0.032), ('characterization', 0.031), ('establishes', 0.031), ('consistent', 0.031), ('datasets', 0.031), ('enable', 0.03), ('enables', 0.029), ('functions', 0.029), ('counterparts', 0.029), ('conditional', 0.029), ('said', 0.029), ('table', 0.028), ('convex', 0.028), ('derivative', 0.028), ('guaranteeing', 0.028), ('freedom', 0.028), ('py', 0.027), ('parametric', 0.026), ('latter', 0.026), ('margins', 0.026), ('minimized', 0.026), ('associated', 0.026), ('outperform', 0.025), ('regression', 0.025), ('properties', 0.025), ('degrees', 0.024), ('connection', 0.024), ('ranked', 0.024), ('sigmoid', 0.024), ('behavior', 0.024), ('dataset', 0.024), ('diego', 0.024), ('enforced', 0.024), ('derived', 0.023), ('outliers', 0.023), ('holds', 0.022), ('cation', 0.022), ('derivation', 0.021), ('denoted', 0.021), ('la', 0.02), ('specify', 0.02), ('cross', 0.02), ('detailed', 0.02), ('trial', 0.02), ('invertible', 0.02), ('conditions', 0.02), ('pairs', 0.019), ('unduly', 0.019), ('prognostic', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 282 nips-2010-Variable margin losses for classifier design
Author: Hamed Masnadi-shirazi, Nuno Vasconcelos
Abstract: The problem of controlling the margin of a classifier is studied. A detailed analytical study is presented on how properties of the classification risk, such as its optimal link and minimum risk functions, are related to the shape of the loss, and its margin enforcing properties. It is shown that for a class of risks, denoted canonical risks, asymptotic Bayes consistency is compatible with simple analytical relationships between these functions. These enable a precise characterization of the loss for a popular class of link functions. It is shown that, when the risk is in canonical form and the link is inverse sigmoidal, the margin properties of the loss are determined by a single parameter. Novel families of Bayes consistent loss functions, of variable margin, are derived. These families are then used to design boosting style algorithms with explicit control of the classification margin. The new algorithms generalize well established approaches, such as LogitBoost. Experimental results show that the proposed variable margin losses outperform the fixed margin counterparts used by existing algorithms. Finally, it is shown that best performance can be achieved by cross-validating the margin parameter. 1
2 0.19663693 15 nips-2010-A Theory of Multiclass Boosting
Author: Indraneel Mukherjee, Robert E. Schapire
Abstract: Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the “correct” requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements. 1
3 0.1831174 225 nips-2010-Relaxed Clipping: A Global Training Method for Robust Regression and Classification
Author: Min Yang, Linli Xu, Martha White, Dale Schuurmans, Yao-liang Yu
Abstract: Robust regression and classification are often thought to require non-convex loss functions that prevent scalable, global training. However, such a view neglects the possibility of reformulated training methods that can yield practically solvable alternatives. A natural way to make a loss function more robust to outliers is to truncate loss values that exceed a maximum threshold. We demonstrate that a relaxation of this form of “loss clipping” can be made globally solvable and applicable to any standard loss while guaranteeing robustness against outliers. We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems. 1
4 0.1707509 243 nips-2010-Smoothness, Low Noise and Fast Rates
Author: Nathan Srebro, Karthik Sridharan, Ambuj Tewari
Abstract: We establish an excess risk bound of Õ(H R_n^2 + √(H L*) R_n) for ERM with an H-smooth loss function and a hypothesis class with Rademacher complexity R_n, where L* is the best risk achievable by the hypothesis class. For typical hypothesis classes where R_n = √(R/n), this translates to a learning rate of Õ(RH/n) in the separable (L* = 0) case and Õ(RH/n + √(L* RH/n)) more generally. We also provide similar guarantees for online and stochastic convex optimization of a smooth non-negative objective. 1
5 0.1381449 290 nips-2010-t-logistic regression
Author: Nan Ding, S.v.n. Vishwanathan
Abstract: We extend logistic regression by using t-exponential families which were introduced recently in statistical physics. This gives rise to a regularized risk minimization problem with a non-convex loss function. An efficient block coordinate descent optimization scheme can be derived for estimating the parameters. Because of the nature of the loss function, our algorithm is tolerant to label noise. Furthermore, unlike other algorithms which employ non-convex loss functions, our algorithm is fairly robust to the choice of initial values. We verify both these observations empirically on a number of synthetic and real datasets. 1
6 0.1121522 42 nips-2010-Boosting Classifier Cascades
7 0.096841201 228 nips-2010-Reverse Multi-Label Learning
8 0.090492226 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
9 0.085856952 269 nips-2010-Throttling Poisson Processes
10 0.07896512 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
11 0.077366717 22 nips-2010-Active Estimation of F-Measures
12 0.073898666 191 nips-2010-On the Theory of Learnining with Privileged Information
13 0.073169589 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
14 0.070817277 29 nips-2010-An Approximate Inference Approach to Temporal Optimization in Optimal Control
15 0.070637703 162 nips-2010-Link Discovery using Graph Feature Tracking
16 0.070347488 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
17 0.067819513 203 nips-2010-Parametric Bandits: The Generalized Linear Case
18 0.06541124 116 nips-2010-Identifying Patients at Risk of Major Adverse Cardiovascular Events Using Symbolic Mismatch
19 0.06409429 175 nips-2010-Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers
20 0.061508 108 nips-2010-Graph-Valued Regression
topicId topicWeight
[(0, 0.178), (1, 0.033), (2, 0.116), (3, -0.058), (4, 0.071), (5, 0.084), (6, -0.143), (7, -0.076), (8, -0.06), (9, -0.01), (10, -0.043), (11, -0.005), (12, 0.082), (13, 0.099), (14, -0.081), (15, 0.087), (16, 0.088), (17, 0.19), (18, -0.073), (19, -0.025), (20, 0.07), (21, -0.08), (22, -0.004), (23, -0.075), (24, 0.009), (25, 0.017), (26, 0.06), (27, 0.212), (28, 0.094), (29, 0.068), (30, -0.064), (31, -0.012), (32, 0.024), (33, -0.042), (34, 0.026), (35, -0.007), (36, 0.026), (37, -0.064), (38, 0.037), (39, -0.127), (40, 0.036), (41, 0.13), (42, 0.007), (43, -0.028), (44, -0.015), (45, 0.071), (46, 0.107), (47, 0.015), (48, -0.027), (49, 0.07)]
simIndex simValue paperId paperTitle
same-paper 1 0.97896206 282 nips-2010-Variable margin losses for classifier design
Author: Hamed Masnadi-shirazi, Nuno Vasconcelos
Abstract: The problem of controlling the margin of a classifier is studied. A detailed analytical study is presented on how properties of the classification risk, such as its optimal link and minimum risk functions, are related to the shape of the loss, and its margin enforcing properties. It is shown that for a class of risks, denoted canonical risks, asymptotic Bayes consistency is compatible with simple analytical relationships between these functions. These enable a precise characterization of the loss for a popular class of link functions. It is shown that, when the risk is in canonical form and the link is inverse sigmoidal, the margin properties of the loss are determined by a single parameter. Novel families of Bayes consistent loss functions, of variable margin, are derived. These families are then used to design boosting style algorithms with explicit control of the classification margin. The new algorithms generalize well established approaches, such as LogitBoost. Experimental results show that the proposed variable margin losses outperform the fixed margin counterparts used by existing algorithms. Finally, it is shown that best performance can be achieved by cross-validating the margin parameter. 1
2 0.76163906 225 nips-2010-Relaxed Clipping: A Global Training Method for Robust Regression and Classification
Author: Min Yang, Linli Xu, Martha White, Dale Schuurmans, Yao-liang Yu
Abstract: Robust regression and classification are often thought to require non-convex loss functions that prevent scalable, global training. However, such a view neglects the possibility of reformulated training methods that can yield practically solvable alternatives. A natural way to make a loss function more robust to outliers is to truncate loss values that exceed a maximum threshold. We demonstrate that a relaxation of this form of “loss clipping” can be made globally solvable and applicable to any standard loss while guaranteeing robustness against outliers. We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems. 1
3 0.72685856 290 nips-2010-t-logistic regression
Author: Nan Ding, S.v.n. Vishwanathan
Abstract: We extend logistic regression by using t-exponential families which were introduced recently in statistical physics. This gives rise to a regularized risk minimization problem with a non-convex loss function. An efficient block coordinate descent optimization scheme can be derived for estimating the parameters. Because of the nature of the loss function, our algorithm is tolerant to label noise. Furthermore, unlike other algorithms which employ non-convex loss functions, our algorithm is fairly robust to the choice of initial values. We verify both these observations empirically on a number of synthetic and real datasets. 1
4 0.67731845 15 nips-2010-A Theory of Multiclass Boosting
Author: Indraneel Mukherjee, Robert E. Schapire
Abstract: Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the “correct” requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements. 1
5 0.655168 243 nips-2010-Smoothness, Low Noise and Fast Rates
Author: Nathan Srebro, Karthik Sridharan, Ambuj Tewari
Abstract: We establish an excess risk bound of Õ(H R_n^2 + √(H L*) R_n) for ERM with an H-smooth loss function and a hypothesis class with Rademacher complexity R_n, where L* is the best risk achievable by the hypothesis class. For typical hypothesis classes where R_n = √(R/n), this translates to a learning rate of Õ(RH/n) in the separable (L* = 0) case and Õ(RH/n + √(L* RH/n)) more generally. We also provide similar guarantees for online and stochastic convex optimization of a smooth non-negative objective. 1
6 0.63494676 269 nips-2010-Throttling Poisson Processes
7 0.60025114 191 nips-2010-On the Theory of Learnining with Privileged Information
8 0.57571399 228 nips-2010-Reverse Multi-Label Learning
9 0.52706242 42 nips-2010-Boosting Classifier Cascades
10 0.49378452 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
11 0.48646942 61 nips-2010-Direct Loss Minimization for Structured Prediction
12 0.47598684 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
13 0.45644462 255 nips-2010-Static Analysis of Binary Executables Using Structural SVMs
14 0.43842849 116 nips-2010-Identifying Patients at Risk of Major Adverse Cardiovascular Events Using Symbolic Mismatch
15 0.43706906 233 nips-2010-Scrambled Objects for Least-Squares Regression
16 0.42465743 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification
17 0.41297275 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods
18 0.40658525 175 nips-2010-Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers
19 0.40261576 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
20 0.40210956 192 nips-2010-Online Classification with Specificity Constraints
topicId topicWeight
[(13, 0.024), (17, 0.018), (27, 0.074), (30, 0.055), (35, 0.022), (45, 0.227), (50, 0.075), (52, 0.029), (60, 0.041), (66, 0.166), (77, 0.073), (78, 0.037), (90, 0.072)]
simIndex simValue paperId paperTitle
1 0.89567161 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
Author: Surya Ganguli, Haim Sompolinsky
Abstract: Recent proposals suggest that large, generic neuronal networks could store memory traces of past input sequences in their instantaneous state. Such a proposal raises important theoretical questions about the duration of these memory traces and their dependence on network size, connectivity and signal statistics. Prior work, in the case of gaussian input sequences and linear neuronal networks, shows that the duration of memory traces in a network cannot exceed the number of neurons (in units of the neuronal time constant), and that no network can out-perform an equivalent feedforward network. However a more ethologically relevant scenario is that of sparse input sequences. In this scenario, we show how linear neural networks can essentially perform compressed sensing (CS) of past inputs, thereby attaining a memory capacity that exceeds the number of neurons. This enhanced capacity is achieved by a class of “orthogonal” recurrent networks and not by feedforward networks or generic recurrent networks. We exploit techniques from the statistical physics of disordered systems to analytically compute the decay of memory traces in such networks as a function of network size, signal sparsity and integration time. Alternately, viewed purely from the perspective of CS, this work introduces a new ensemble of measurement matrices derived from dynamical systems, and provides a theoretical analysis of their asymptotic performance. 1
same-paper 2 0.89237124 282 nips-2010-Variable margin losses for classifier design
Author: Hamed Masnadi-shirazi, Nuno Vasconcelos
Abstract: The problem of controlling the margin of a classifier is studied. A detailed analytical study is presented on how properties of the classification risk, such as its optimal link and minimum risk functions, are related to the shape of the loss, and its margin enforcing properties. It is shown that for a class of risks, denoted canonical risks, asymptotic Bayes consistency is compatible with simple analytical relationships between these functions. These enable a precise characterization of the loss for a popular class of link functions. It is shown that, when the risk is in canonical form and the link is inverse sigmoidal, the margin properties of the loss are determined by a single parameter. Novel families of Bayes consistent loss functions, of variable margin, are derived. These families are then used to design boosting style algorithms with explicit control of the classification margin. The new algorithms generalize well established approaches, such as LogitBoost. Experimental results show that the proposed variable margin losses outperform the fixed margin counterparts used by existing algorithms. Finally, it is shown that best performance can be achieved by cross-validating the margin parameter. 1
3 0.89092505 280 nips-2010-Unsupervised Kernel Dimension Reduction
Author: Meihong Wang, Fei Sha, Michael I. Jordan
Abstract: We apply the framework of kernel dimension reduction, originally designed for supervised problems, to unsupervised dimensionality reduction. In this framework, kernel-based measures of independence are used to derive low-dimensional representations that maximally capture information in covariates in order to predict responses. We extend this idea and develop similarly motivated measures for unsupervised problems where covariates and responses are the same. Our empirical studies show that the resulting compact representation yields meaningful and appealing visualization and clustering of data. Furthermore, when used in conjunction with supervised learners for classification, our methods lead to lower classification errors than state-of-the-art methods, especially when embedding data in spaces of very few dimensions.
4 0.83334351 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
Author: Leonidas Lefakis, Francois Fleuret
Abstract: The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. We introduce a novel algorithm to construct jointly the classifiers of such a cascade, which interprets the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. From this noisy-AND model, we derive a consistent loss and a Boosting procedure to optimize that global probability on the training set. Such a joint learning allows the individual predictors to focus on a more restricted modeling problem, and improves the performance compared to a standard cascade. We demonstrate the efficiency of this approach on face and pedestrian detection with standard data-sets and comparisons with reference baselines. 1
Author: Li-jia Li, Hao Su, Li Fei-fei, Eric P. Xing
Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meanings. For high level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging on the Object Bank representation, superior performances on high level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.
6 0.82991564 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
7 0.82957906 63 nips-2010-Distributed Dual Averaging In Networks
8 0.82874709 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
9 0.82856637 158 nips-2010-Learning via Gaussian Herding
10 0.82835031 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
11 0.82428986 117 nips-2010-Identifying graph-structured activation patterns in networks
12 0.82236326 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression
13 0.82234991 155 nips-2010-Learning the context of a category
14 0.82204431 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
15 0.82143247 148 nips-2010-Learning Networks of Stochastic Differential Equations
16 0.82103062 243 nips-2010-Smoothness, Low Noise and Fast Rates
17 0.82036978 1 nips-2010-(RF)^2 -- Random Forest Random Field
18 0.81999588 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
19 0.81905955 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors
20 0.81881213 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery