nips nips2010 nips2010-282 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hamed Masnadi-shirazi, Nuno Vasconcelos
Abstract: The problem of controlling the margin of a classifier is studied. A detailed analytical study is presented on how properties of the classification risk, such as its optimal link and minimum risk functions, are related to the shape of the loss, and its margin enforcing properties. It is shown that for a class of risks, denoted canonical risks, asymptotic Bayes consistency is compatible with simple analytical relationships between these functions. These enable a precise characterization of the loss for a popular class of link functions. It is shown that, when the risk is in canonical form and the link is inverse sigmoidal, the margin properties of the loss are determined by a single parameter. Novel families of Bayes consistent loss functions, of variable margin, are derived. These families are then used to design boosting style algorithms with explicit control of the classification margin. The new algorithms generalize well established approaches, such as LogitBoost. Experimental results show that the proposed variable margin losses outperform the fixed margin counterparts used by existing algorithms. Finally, it is shown that best performance can be achieved by cross-validating the margin parameter. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Variable margin losses for classifier design Hamed Masnadi-Shirazi, Nuno Vasconcelos Statistical Visual Computing Laboratory, University of California, San Diego, La Jolla, CA 92039 nuno@ucsd. [sent-1, score-0.482]
2 A detailed analytical study is presented on how properties of the classification risk, such as its optimal link and minimum risk functions, are related to the shape of the loss, and its margin enforcing properties. [sent-4, score-0.875]
3 It is shown that for a class of risks, denoted canonical risks, asymptotic Bayes consistency is compatible with simple analytical relationships between these functions. [sent-5, score-0.749]
4 These enable a precise characterization of the loss for a popular class of link functions. [sent-6, score-0.495]
5 It is shown that, when the risk is in canonical form and the link is inverse sigmoidal, the margin properties of the loss are determined by a single parameter. [sent-7, score-1.598]
6 Novel families of Bayes consistent loss functions, of variable margin, are derived. [sent-8, score-0.316]
7 These families are then used to design boosting style algorithms with explicit control of the classification margin. [sent-9, score-0.498]
8 Experimental results show that the proposed variable margin losses outperform the fixed margin counterparts used by existing algorithms. [sent-11, score-0.686]
9 Examples of such losses include the hinge loss, used in SVM design, the exponential loss, used by boosting algorithms such as AdaBoost, or the logistic loss, used in both classical logistic regression and more recent methods, such as LogitBoost. [sent-16, score-0.84]
10 The main idea is that there are three quantities that matter in risk minimization: the loss function φ, a corresponding optimal link function fφ*, which maps posterior class probabilities to classifier predictions, and a minimum risk Cφ*, associated with the optimal link. [sent-22, score-1.105]
11 While the standard approach to classifier design is to define a loss φ, and then optimize it to obtain fφ* and Cφ*, [3] showed that there is an alternative: to specify fφ* and Cφ*, and analytically derive the loss φ. [sent-23, score-0.527]
12 This loss is then used to design a robust boosting algorithm, denoted SavageBoost. [sent-26, score-0.636]
13 SavageBoost has been, more recently, shown to outperform most other boosting algorithms in computer vision problems, where outliers are prevalent [5]. [sent-27, score-0.375]
14 It turns out that many pairs (Cφ*, fφ*) are compatible with any Bayes consistent loss φ. [sent-29, score-0.338]
15 In practice, the design has to resort to trial and error, by 1) testing combinations of the latter and, 2) verifying whether the loss has the desired properties. [sent-32, score-0.334]
16 In this work, we consider one such problem: how to control the size of the margin enforced by the loss. [sent-36, score-0.279]
17 We start by showing that, while many pairs (Cφ*, fφ*) are compatible with a given φ, one of these pairs establishes a very tight connection between the optimal link and the minimum risk: that fφ* is the derivative of Cφ*. [sent-37, score-0.502]
18 We refer to the risk function associated with such a pair as a canonical risk, and show that it leads to an equally tight connection between the pair (Cφ*, fφ*) and the loss φ. [sent-38, score-1.143]
19 For a canonical risk, all three functions can be obtained from each other with one-to-one mappings of trivial analytical tractability. [sent-39, score-0.688]
20 This implies that, for a canonical risk, the choice of a particular link in the inverse-sigmoidal family only impacts the behavior of φ around the origin, i. [sent-45, score-0.856]
21 This is exploited to design parametric families of loss functions that allow explicit control of the classification margin. [sent-52, score-0.445]
22 These losses are applied to the design of novel boosting algorithms of tunable margin. [sent-53, score-0.59]
23 Finally, it is shown that the requirements of 1) a canonical risk, and 2) an inverse-sigmoidal link are not unduly restrictive for classifier design. [sent-54, score-0.851]
24 Given a non-negative loss function L(x, y), the classifier is optimal if it minimizes the risk R(f ) = EX,Y [L(h(x), y)]. [sent-64, score-0.51]
25 This is equivalent to minimizing the conditional risk EY |X [L(h(x), y)|X = x] for all x ∈ X . [sent-65, score-0.286]
26 The associated conditional risk is C0/1(η, f) = η (1 − sign(f))/2 + (1 − η) (1 + sign(f))/2, which equals 1 − η if f ≥ 0 and η if f < 0; the risk is minimized if f(x) > 0 when η(x) > 1/2, f(x) = 0 when η(x) = 1/2, and f(x) < 0 when η(x) < 1/2. [sent-68, score-0.595]
27 Table 1: Loss φ, optimal link fφ*(η), optimal inverse link [fφ*]−1(v), and minimum conditional risk Cφ*(η) for popular learning algorithms. [sent-69, score-0.893]
28 The associated optimal classifier h* = sign[f*] is the well known Bayes decision rule (BDR), and the associated minimum conditional (zero-one) risk is C0/1*(η) = η (1 − sign(2η − 1))/2 + (1 − η) (1 + sign(2η − 1))/2. [sent-71, score-0.435]
29 These include the exponential loss of boosting, the log loss of logistic regression, and the hinge loss of SVMs. [sent-74, score-0.823]
30 The associated conditional risk is Cφ(η, f) = ηφ(f) + (1 − η)φ(−f) (5). [sent-78, score-0.312]
31 It is minimized by the link fφ*(η) = arg min_f Cφ(η, f) (6), leading to the minimum conditional risk function Cφ*(η) = Cφ(η, fφ*). [sent-79, score-0.59]
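As a quick numerical illustration of (6), the sketch below minimizes the conditional risk of the exponential loss over a grid of predictions f and checks that the minimizer and minimum match the closed forms listed in Table 1 for boosting, fφ*(η) = (1/2) log(η/(1 − η)) and Cφ*(η) = 2√(η(1 − η)); a minimal numpy sketch for illustration, not code from the paper.

import numpy as np

# Conditional risk of the exponential loss: C(eta, f) = eta*exp(-f) + (1 - eta)*exp(f)
def cond_risk_exp(eta, f):
    return eta * np.exp(-f) + (1.0 - eta) * np.exp(f)

f_grid = np.linspace(-5.0, 5.0, 200001)          # fine grid over classifier predictions f
for eta in np.linspace(0.05, 0.95, 19):          # posterior probabilities eta(x)
    risks = cond_risk_exp(eta, f_grid)
    f_num = f_grid[np.argmin(risks)]             # numerical minimizer of C(eta, .)
    f_closed = 0.5 * np.log(eta / (1.0 - eta))   # optimal link from Table 1
    c_closed = 2.0 * np.sqrt(eta * (1.0 - eta))  # minimum conditional risk from Table 1
    assert abs(f_num - f_closed) < 1e-3
    assert abs(risks.min() - c_closed) < 1e-6
print("grid minimizers match the closed-form link and minimum risk")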
32 Table 1 lists the loss, optimal link, and minimum risk of some of the most popular classifier design methods. [sent-80, score-0.423]
33 Conditional risk minimization is closely related to classical probability elicitation in statistics [4]. [sent-81, score-0.364]
34 Rather than specifying a loss φ and minimizing Cφ(η, f), so as to obtain whatever optimal link fφ* and minimum expected risk Cφ*(η) result, it is possible to specify fφ* and Cφ*(η) and derive, from (14) with J(η) = −Cφ*(η), the underlying loss φ. [sent-99, score-1.027]
35 The main advantage is the ability to control directly the quantities that matter for classification, namely the predictor and risk of the optimal classifier. [sent-100, score-0.327]
36 3 Canonical risk minimization. In general, given J(η) = −Cφ*(η), there are multiple pairs (φ, fφ*) that satisfy (14). [sent-102, score-0.276]
37 Hence, specification of either the minimum risk or optimal link does not completely characterize the loss. [sent-103, score-0.569]
38 Whenever this holds, the risk is said to be in canonical form, and (f ∗ , J) are denoted a canonical pair [6] . [sent-112, score-1.541]
39 If the optimal link associated with Cφ*(η) is fφ*(η) = J′(η) (16), the risk Cφ(η, f) is said to be in canonical form. [sent-115, score-1.178]
40 fφ*(η) is denoted a canonical link and φ(v), the loss given by (14), a canonical loss. [sent-116, score-1.689]
41 For example, the risk of boosting is derived from the convex, differentiable, and symmetric J(η) = −2√(η(1 − η)). [sent-118, score-0.607]
42 Since this has derivative J′(η) = (2η − 1)/√(η(1 − η)) ≠ (1/2) log(η/(1 − η)) = fφ*(η) (17), the risk is not in canonical form. [sent-119, score-0.939]
43 What follows from (16) is that it is possible to derive a canonical risk for any maximal reward J(η), including that of boosting (J(η) = −2√(η(1 − η))). [sent-120, score-1.242]
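A small numerical check of (16), assuming the closed forms quoted above (boosting: J(η) = −2√(η(1 − η)) with optimal link (1/2) log(η/(1 − η)); logistic regression: J(η) = η log η + (1 − η) log(1 − η) with optimal link log(η/(1 − η))): differentiating J by finite differences shows that the boosting pair violates (16) while the logistic pair satisfies it. A minimal numpy sketch.

import numpy as np

eta = np.linspace(0.01, 0.99, 981)                 # grid of posterior probabilities

def matches_link(J_vals, link_vals):
    # compare a finite-difference derivative of J with the optimal link (interior points only)
    dJ = np.gradient(J_vals, eta)
    return np.allclose(dJ[2:-2], link_vals[2:-2], atol=1e-2)

# Boosting: J(eta) = -2*sqrt(eta*(1-eta)), optimal link (1/2)*log(eta/(1-eta))
J_boost = -2.0 * np.sqrt(eta * (1.0 - eta))
link_boost = 0.5 * np.log(eta / (1.0 - eta))
print("boosting risk canonical:", matches_link(J_boost, link_boost))     # False

# Logistic regression: J(eta) = eta*log(eta) + (1-eta)*log(1-eta), optimal link log(eta/(1-eta))
J_logit = eta * np.log(eta) + (1.0 - eta) * np.log(1.0 - eta)
link_logit = np.log(eta / (1.0 - eta))
print("logistic risk canonical:", matches_link(J_logit, link_logit))     # True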
44 While canonical risks can be easily designed by specifying either J(η) or fφ*(η), and then using (14) and (16), it is much less clear how to directly specify a loss φ(v) for which (14) holds with a canonical pair (f*, J). [sent-122, score-1.535]
45 Let Cφ (η, f ) be the canonical risk derived from a convex and symmetric J(η). [sent-125, score-0.925]
46 Figure 1: Left: canonical losses compatible with an IS optimal link. [sent-139, score-1.133]
47 Right: Average classification rank as a function of margin parameter, on the UCI data. [sent-140, score-0.277]
48 First, it establishes an easy-to-verify necessary condition for the canonical form. [sent-142, score-0.648]
49 For example, logistic regression has [fφ*]−1(v) = 1/(1 + e^(−v)) and φ′(v) = −e^(−v)/(1 + e^(−v)) = [fφ*]−1(v) − 1, while boosting has [fφ*]−1(v) = 1/(1 + e^(−2v)) and φ′(v) = −e^(−v) ≠ [fφ*]−1(v) − 1. [sent-143, score-0.481]
50 This (plus the symmetry of J and fφ*) shows that the former is in canonical form but the latter is not. [sent-144, score-0.698]
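This condition is easy to probe numerically. The sketch below differentiates the logistic loss log(1 + e^(−v)) and the exponential loss e^(−v) by finite differences and compares the result with [fφ*]−1(v) − 1, using the inverse links quoted above; a minimal numpy sketch, assuming those standard loss forms.

import numpy as np

v = np.linspace(-4.0, 4.0, 801)
h = v[1] - v[0]

def centered_derivative(phi_vals):
    # centered finite differences on the interior of the grid
    return (phi_vals[2:] - phi_vals[:-2]) / (2.0 * h)

# Logistic: phi(v) = log(1 + e^{-v}),  [f*]^{-1}(v) = 1/(1 + e^{-v})
lhs = centered_derivative(np.log1p(np.exp(-v)))
rhs = 1.0 / (1.0 + np.exp(-v[1:-1])) - 1.0
print("logistic: phi'(v) = [f*]^{-1}(v) - 1 ?", np.allclose(lhs, rhs, atol=1e-3))   # True

# Exponential (boosting): phi(v) = e^{-v},  [f*]^{-1}(v) = 1/(1 + e^{-2v})
lhs = centered_derivative(np.exp(-v))
rhs = 1.0 / (1.0 + np.exp(-2.0 * v[1:-1])) - 1.0
print("exponential: same condition ?", np.allclose(lhs, rhs, atol=1e-3))            # False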
51 Second, it makes it clear that, up to additive constants, the three components (φ, Cφ*, and fφ*) of a canonical risk are related by one-to-one relationships. [sent-145, score-0.874]
52 Hence, it is possible to control the properties of the three components of the risk by manipulating a single function (which can be any of the three). [sent-146, score-0.318]
53 Finally, it enables a very detailed characterization of the losses compatible with most optimal links of Table 1. [sent-147, score-0.415]
54 4 Inverse-sigmoidal links. Inspection of Table 1 suggests that the classifiers produced by boosting, logistic regression, and variants have sigmoidal inverse links [fφ*]−1. [sent-148, score-0.346]
55 It follows that, as illustrated in Figure 1, the loss φ(v) is convex, monotonically decreasing, linear (with slope −1) for large negative v, constant for large positive v, and has slope −1/2 at the origin. [sent-154, score-0.349]
56 The set of losses compatible with an IS link is, thus, strongly constrained. [sent-156, score-0.478]
57 Figure 2: canonical link (left) and loss (right) for various values of a. [sent-179, score-1.051]
58 What is interesting is that these are the degrees of freedom that control the margin characteristics of the loss φ. [sent-181, score-0.526]
59 Hence, by controlling the behavior of the IS link around the origin, it is possible to control the margin of the optimal classifier. [sent-182, score-0.528]
60 In particular, the margin is a decreasing function of the curvature of the loss at the origin, φ(2)(0). [sent-183, score-0.479]
61 5 Variable margin loss functions. The results above enable the derivation of families of canonical losses with controllable margin. [sent-185, score-1.395]
62 In Section 3, we have seen that the boosting loss is not canonical, but there is a canonical loss for the minimum risk of boosting. [sent-186, score-1.702]
63 From (16), the canonical optimal link is fφ*(η; a) = (2η − 1)/(a√(η(1 − η))) (26), and it can be shown that [fφ*]−1(v; a) = 1/2 + av/(2√(4 + (av)^2)) (27) is an IS link. [sent-188, score-0.983]
64 Using (18), the corresponding canonical loss is φ(v; a) = (1/(2a)) (√(4 + (av)^2) − av) (28). [sent-191, score-0.836]
65 Because it shares the minimum risk of boosting, we refer to this loss as the canonical boosting loss. [sent-192, score-1.483]
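A direct transcription of (26)-(28) into code, sketched below: the canonical boosting link, its inverse, and the variable margin loss, plus a numerical check that the pair satisfies the canonical condition φ′(v; a) = [fφ*]−1(v; a) − 1 for several values of a. A minimal numpy sketch, not the authors' implementation.

import numpy as np

def boost_link(eta, a):
    # canonical boosting link (26): f*(eta; a) = (2*eta - 1) / (a * sqrt(eta * (1 - eta)))
    return (2.0 * eta - 1.0) / (a * np.sqrt(eta * (1.0 - eta)))

def boost_link_inv(v, a):
    # inverse link (27): [f*]^{-1}(v; a) = 1/2 + a*v / (2 * sqrt(4 + (a*v)^2))
    return 0.5 + a * v / (2.0 * np.sqrt(4.0 + (a * v) ** 2))

def boost_loss(v, a):
    # canonical boosting loss (28): phi(v; a) = (sqrt(4 + (a*v)^2) - a*v) / (2*a)
    return (np.sqrt(4.0 + (a * v) ** 2) - a * v) / (2.0 * a)

v = np.linspace(-6.0, 6.0, 1201)
for a in (0.2, 1.0, 4.0):                         # a controls the behaviour around v = 0
    dphi = np.gradient(boost_loss(v, a), v)       # numerical phi'(v; a)
    ok = np.allclose(dphi[2:-2], boost_link_inv(v[2:-2], a) - 1.0, atol=1e-3)
    print(f"a = {a}: canonical condition holds: {ok}")

# sanity check: the inverse really inverts the link
eta = np.linspace(0.05, 0.95, 19)
assert np.allclose(boost_link_inv(boost_link(eta, 0.5), 0.5), eta)

For large |v| the loss approaches max(0, −v) for every a, so a only changes how sharply the two linear regimes are joined near the origin.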
66 This shows that the link is indeed sigmoidal, and that the margin is determined by a. [sent-209, score-0.434]
67 It is also possible to derive variable margin extensions of existing canonical losses. [sent-211, score-0.836]
68 For example, consider the parametric extension of the minimum risk of logistic regression, J(η; a) = (1/a)[η log(η) + (1 − η) log(1 − η)]. [sent-212, score-0.5]
69 From (16), the canonical optimal link is fφ*(η; a) = (1/a) log(η/(1 − η)), with inverse [fφ*]−1(v; a) = e^(av)/(1 + e^(av)) (30); this is again a sigmoidal inverse link and, from (18), φ(v; a) = (1/a)[log(1 + e^(av)) − av] (31). [sent-214, score-0.525]
70 We denote this loss the canonical logistic loss. [sent-215, score-0.965]
71 Note that, in (28) and (31), margin control is not achieved by simply rescaling the domain of the loss function. [sent-218, score-0.474]
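Equations (30)-(31) can be transcribed the same way. The sketch below implements the canonical logistic loss through the numerically stable identity φ(v; a) = (1/a) log(1 + e^(−av)) (equal to (31)), checks that a = 1 recovers the logistic loss, and prints the tail slope and the curvature at the origin for a few values of a; a minimal numpy sketch for illustration.

import numpy as np

def canon_logistic_loss(v, a):
    # phi(v; a) = (1/a) * [log(1 + e^{a v}) - a v] = (1/a) * log(1 + e^{-a v})   (31)
    return np.logaddexp(0.0, -a * v) / a

v = np.linspace(-10.0, 10.0, 2001)

# a = 1 recovers the logistic loss log(1 + e^{-v}) used by LogitBoost
assert np.allclose(canon_logistic_loss(v, 1.0), np.log1p(np.exp(-v)))

for a in (0.2, 1.0, 5.0):
    d2 = np.gradient(np.gradient(canon_logistic_loss(v, a), v), v)   # numerical phi''(v; a)
    v_tail = -30.0 / a                                               # far into the negative tail
    slope = (canon_logistic_loss(v_tail + 1e-3, a) - canon_logistic_loss(v_tail, a)) / 1e-3
    print(f"a = {a}: tail slope ~ {slope:.3f}, curvature at the origin ~ {d2[len(v) // 2]:.3f}")
# the tail slope is ~ -1 for every a (as required of an IS link), while the curvature
# at the origin grows with a, which is what shrinks the margin.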
72 While this type of re-scaling occurs in both families of loss functions above (which are both functions of av), it is localized around the origin, and only influences the margin properties of the loss. [sent-223, score-0.587]
73 6 Experiments. A number of easily reproducible experiments were conducted to study the effect of variable margin losses on the accuracy of the resulting classifiers. [sent-227, score-0.413]
74 The GradientBoost algorithm [8], with histogram-based weak learners, was then used to design boosted classifiers which minimize the canonical logistic and boosting losses, for various margin parameters. [sent-231, score-1.361]
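For concreteness, a heavily simplified stand-in for this setup is sketched below: a GradientBoost-style loop that, at each round, fits a shallow regression tree (instead of the paper's histogram-based weak learners) to the negative functional gradient of the canonical logistic loss (31) and adds it with a fixed shrinkage step. The helper names, the shrinkage constant, and the use of sklearn's DecisionTreeRegressor are all assumptions for illustration, not the authors' implementation.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def dphi_canon_logistic(v, a):
    # derivative of the canonical logistic loss: phi'(v; a) = -1 / (1 + e^{a v})
    return -1.0 / (1.0 + np.exp(a * v))

def fit_canonical_boost(X, y, a=1.0, n_rounds=50, lr=0.5, depth=2):
    # y must be in {-1, +1}; the ensemble score F(x) is grown additively
    F = np.zeros(len(y))
    trees = []
    for _ in range(n_rounds):
        # negative functional gradient of sum_i phi(y_i * F_i; a) with respect to F_i
        residual = -y * dphi_canon_logistic(y * F, a)
        tree = DecisionTreeRegressor(max_depth=depth).fit(X, residual)
        F += lr * tree.predict(X)
        trees.append(tree)
    return trees

def predict_margin(trees, X, lr=0.5):
    # lr must match the value used during fitting
    return lr * sum(t.predict(X) for t in trees)

# toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=400))
trees = fit_canonical_boost(X, y, a=0.5)
print("training accuracy:", np.mean(np.sign(predict_margin(trees, X)) == y))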
75 50 boosting iterations were applied to each training set, for 19 values of a ∈ {0. [sent-234, score-0.327]
76 To explore this question we show, in Figure 1, the average rank of the classifier designed with each loss and margin parameter a. [sent-246, score-0.496]
77 For the canonical logistic loss, the best value of a is in the range 0. [sent-250, score-0.746]
78 Note that the average rank for this range (between 5 and 6) is better than that (close to 7) achieved with the logistic loss of LogitBoost [2] (a = 1). [sent-253, score-0.406]
79 Table 3: Classification error for each loss function and UCI dataset. [sent-254, score-0.836]
80 In fact, as can be seen from Table 2, the canonical logistic loss with a = 1 did not achieve rank 1 on any dataset, whereas canonical logistic losses with 0. [sent-332, score-1.346]
81 For the canonical boosting loss, there is also a range (0. [sent-337, score-0.944]
82 4 produces a loss of much larger margin for canonical boosting. [sent-341, score-1.055]
83 Furthermore, the canonical boosting loss has a heavier tail and approaches zero more slowly than the canonical logistic loss. [sent-342, score-1.909]
84 Although certain ranges of margin parameters seem to produce the best results for both canonical loss functions, the optimal parameter value is likely to be dataset dependent. [sent-343, score-1.113]
85 Table 3 presents the 5-fold cross validation test error (# of misclassified points) obtained for each UCI dataset and canonical loss. [sent-346, score-0.661]
86 The table also shows the results of AdaBoost, LogitBoost (canonical logistic, a = 1), and canonical boosting loss with a = 1. [sent-347, score-1.191]
87 Cross-validating the margin results in better performance for 9 out of 10 (8 out of 10) datasets for the canonical logistic (boosting) loss, when compared to the fixed margin (a = 1) counterparts. [sent-348, score-1.215]
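The cross-validation protocol itself is straightforward; a self-contained sketch is given below, using a simple linear model trained by gradient descent on the canonical logistic loss (31) as a lightweight stand-in for the boosted classifiers, manual 5-fold splits, and an illustrative grid of margin parameters. Every concrete choice here (grid, learner, step sizes) is an assumption for illustration only.

import numpy as np

def train_linear(X, y, a, steps=500, lr=0.1):
    # gradient descent on (1/n) * sum_i phi(y_i * w.x_i; a), with phi from (31)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        m = y * (X @ w)                                        # margins y_i * f(x_i)
        dphi = -1.0 / (1.0 + np.exp(np.clip(a * m, -50, 50)))  # phi'(m_i; a)
        w -= lr * (X.T @ (dphi * y)) / len(y)
    return w

def cv_error(X, y, a, k=5, seed=0):
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = train_linear(X[train], y[train], a)
        errs.append(np.mean(np.sign(X[test] @ w) != y[test]))
    return float(np.mean(errs))

# toy data and an illustrative grid of candidate margin parameters
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = np.sign(X @ np.array([1.0, -1.0, 0.5, 0.0]) + 0.3 * rng.normal(size=300))
grid = [0.2, 0.5, 1.0, 2.0, 4.0]
scores = {a: cv_error(X, y, a) for a in grid}
best_a = min(scores, key=scores.get)
print("cross-validated margin parameter a =", best_a, "with error", scores[best_a])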
88 In this case, canonical logistic and canonical boosting outperform both LogitBoost and AdaBoost in 7 and 5 of the ten datasets, respectively. [sent-354, score-1.757]
89 LogitBoost and AdaBoost outperform both canonical losses in only 2 and 3 datasets, respectively. [sent-357, score-0.811]
90 7 Conclusion. The probability elicitation approach to loss function design, introduced in [3], enables the derivation of new Bayes consistent loss functions. [sent-358, score-0.59]
91 In general, it is difficult to anticipate the properties, and shape, of a loss function that results from combining a certain minimal risk with a certain link function. [sent-360, score-0.691]
92 In this work, we have addressed this problem for the class of canonical risks. [sent-361, score-0.617]
93 We have shown that the associated canonical loss functions lend themselves to analysis, due to a simple connection between the associated minimum conditional risk and optimal link functions. [sent-362, score-1.539]
94 This analysis was shown to enable a precise characterization of 1) the relationships between loss, optimal link, and minimum risk, and 2) the properties of the loss whenever the optimal link is in the family of inverse sigmoid functions. [sent-363, score-0.721]
95 These properties were then exploited to design parametric families of loss functions with explicit margin control. [sent-364, score-0.653]
96 Experiments with boosting algorithms derived from these variable margin losses have shown better performance than those of classical algorithms, such as AdaBoost or LogitBoost. [sent-365, score-0.799]
97 Vasconcelos, “On the design of loss functions for classification: theory, robustness to outliers, and savageboost,” in NIPS, 2008, pp. [sent-376, score-0.317]
98 Bischof, “On robustness of on-line boosting - a competitive study,” in IEEE ICCV Workshop on On-line Computer Vision, 2009. [sent-388, score-0.327]
99 Zhang, “Statistical behavior and consistency of classification methods based on convex risk minimization,” Annals of Statistics, 2004. [sent-394, score-0.309]
100 Friedman, “Greedy function approximation: A gradient boosting machine,” The Annals of Statistics, vol. [sent-397, score-0.327]
wordName wordTfidf (topN-words)
[('canonical', 0.617), ('boosting', 0.327), ('risk', 0.257), ('margin', 0.219), ('loss', 0.219), ('link', 0.215), ('losses', 0.194), ('logitboost', 0.154), ('logistic', 0.129), ('av', 0.117), ('adaboost', 0.112), ('classi', 0.106), ('bayes', 0.101), ('sigmoidal', 0.095), ('uci', 0.081), ('er', 0.073), ('elicitation', 0.071), ('sign', 0.069), ('compatible', 0.069), ('design', 0.069), ('families', 0.066), ('eav', 0.066), ('minimum', 0.063), ('rank', 0.058), ('yf', 0.058), ('boost', 0.056), ('symmetry', 0.055), ('breast', 0.05), ('lim', 0.049), ('slope', 0.047), ('inverse', 0.046), ('ers', 0.046), ('cancer', 0.045), ('bdr', 0.044), ('gradientboost', 0.044), ('savageboost', 0.044), ('ten', 0.042), ('analytical', 0.042), ('decreasing', 0.041), ('reward', 0.041), ('risks', 0.04), ('origin', 0.039), ('links', 0.038), ('log', 0.037), ('classical', 0.036), ('monotonically', 0.036), ('control', 0.036), ('nuno', 0.036), ('vasconcelos', 0.036), ('optimal', 0.034), ('ev', 0.033), ('jolla', 0.032), ('characterization', 0.031), ('establishes', 0.031), ('consistent', 0.031), ('datasets', 0.031), ('enable', 0.03), ('enables', 0.029), ('functions', 0.029), ('counterparts', 0.029), ('conditional', 0.029), ('said', 0.029), ('table', 0.028), ('convex', 0.028), ('derivative', 0.028), ('guaranteeing', 0.028), ('freedom', 0.028), ('py', 0.027), ('parametric', 0.026), ('latter', 0.026), ('margins', 0.026), ('minimized', 0.026), ('associated', 0.026), ('outperform', 0.025), ('regression', 0.025), ('properties', 0.025), ('degrees', 0.024), ('connection', 0.024), ('ranked', 0.024), ('sigmoid', 0.024), ('behavior', 0.024), ('dataset', 0.024), ('diego', 0.024), ('enforced', 0.024), ('derived', 0.023), ('outliers', 0.023), ('holds', 0.022), ('cation', 0.022), ('derivation', 0.021), ('denoted', 0.021), ('la', 0.02), ('specify', 0.02), ('cross', 0.02), ('detailed', 0.02), ('trial', 0.02), ('invertible', 0.02), ('conditions', 0.02), ('pairs', 0.019), ('unduly', 0.019), ('prognostic', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 282 nips-2010-Variable margin losses for classifier design
Author: Hamed Masnadi-shirazi, Nuno Vasconcelos
Abstract: The problem of controlling the margin of a classifier is studied. A detailed analytical study is presented on how properties of the classification risk, such as its optimal link and minimum risk functions, are related to the shape of the loss, and its margin enforcing properties. It is shown that for a class of risks, denoted canonical risks, asymptotic Bayes consistency is compatible with simple analytical relationships between these functions. These enable a precise characterization of the loss for a popular class of link functions. It is shown that, when the risk is in canonical form and the link is inverse sigmoidal, the margin properties of the loss are determined by a single parameter. Novel families of Bayes consistent loss functions, of variable margin, are derived. These families are then used to design boosting style algorithms with explicit control of the classification margin. The new algorithms generalize well established approaches, such as LogitBoost. Experimental results show that the proposed variable margin losses outperform the fixed margin counterparts used by existing algorithms. Finally, it is shown that best performance can be achieved by cross-validating the margin parameter. 1
2 0.19663693 15 nips-2010-A Theory of Multiclass Boosting
Author: Indraneel Mukherjee, Robert E. Schapire
Abstract: Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the “correct” requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements. 1
3 0.1831174 225 nips-2010-Relaxed Clipping: A Global Training Method for Robust Regression and Classification
Author: Min Yang, Linli Xu, Martha White, Dale Schuurmans, Yao-liang Yu
Abstract: Robust regression and classification are often thought to require non-convex loss functions that prevent scalable, global training. However, such a view neglects the possibility of reformulated training methods that can yield practically solvable alternatives. A natural way to make a loss function more robust to outliers is to truncate loss values that exceed a maximum threshold. We demonstrate that a relaxation of this form of “loss clipping” can be made globally solvable and applicable to any standard loss while guaranteeing robustness against outliers. We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems. 1
4 0.1707509 243 nips-2010-Smoothness, Low Noise and Fast Rates
Author: Nathan Srebro, Karthik Sridharan, Ambuj Tewari
Abstract: We establish an excess risk bound of Õ(H R_n^2 + √(H L*) R_n) for ERM with an H-smooth loss function and a hypothesis class with Rademacher complexity R_n, where L* is the best risk achievable by the hypothesis class. For typical hypothesis classes where R_n = √(R/n), this translates to a learning rate of Õ(RH/n) in the separable (L* = 0) case and Õ(RH/n + √(L* RH/n)) more generally. We also provide similar guarantees for online and stochastic convex optimization of a smooth non-negative objective. 1
5 0.1381449 290 nips-2010-t-logistic regression
Author: Nan Ding, S.v.n. Vishwanathan
Abstract: We extend logistic regression by using t-exponential families which were introduced recently in statistical physics. This gives rise to a regularized risk minimization problem with a non-convex loss function. An efficient block coordinate descent optimization scheme can be derived for estimating the parameters. Because of the nature of the loss function, our algorithm is tolerant to label noise. Furthermore, unlike other algorithms which employ non-convex loss functions, our algorithm is fairly robust to the choice of initial values. We verify both these observations empirically on a number of synthetic and real datasets. 1
6 0.1121522 42 nips-2010-Boosting Classifier Cascades
7 0.096841201 228 nips-2010-Reverse Multi-Label Learning
8 0.090492226 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
9 0.085856952 269 nips-2010-Throttling Poisson Processes
10 0.07896512 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
11 0.077366717 22 nips-2010-Active Estimation of F-Measures
12 0.073898666 191 nips-2010-On the Theory of Learnining with Privileged Information
13 0.073169589 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
14 0.070817277 29 nips-2010-An Approximate Inference Approach to Temporal Optimization in Optimal Control
15 0.070637703 162 nips-2010-Link Discovery using Graph Feature Tracking
16 0.070347488 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
17 0.067819513 203 nips-2010-Parametric Bandits: The Generalized Linear Case
18 0.06541124 116 nips-2010-Identifying Patients at Risk of Major Adverse Cardiovascular Events Using Symbolic Mismatch
19 0.06409429 175 nips-2010-Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers
20 0.061508 108 nips-2010-Graph-Valued Regression
topicId topicWeight
[(0, 0.178), (1, 0.033), (2, 0.116), (3, -0.058), (4, 0.071), (5, 0.084), (6, -0.143), (7, -0.076), (8, -0.06), (9, -0.01), (10, -0.043), (11, -0.005), (12, 0.082), (13, 0.099), (14, -0.081), (15, 0.087), (16, 0.088), (17, 0.19), (18, -0.073), (19, -0.025), (20, 0.07), (21, -0.08), (22, -0.004), (23, -0.075), (24, 0.009), (25, 0.017), (26, 0.06), (27, 0.212), (28, 0.094), (29, 0.068), (30, -0.064), (31, -0.012), (32, 0.024), (33, -0.042), (34, 0.026), (35, -0.007), (36, 0.026), (37, -0.064), (38, 0.037), (39, -0.127), (40, 0.036), (41, 0.13), (42, 0.007), (43, -0.028), (44, -0.015), (45, 0.071), (46, 0.107), (47, 0.015), (48, -0.027), (49, 0.07)]
simIndex simValue paperId paperTitle
same-paper 1 0.97896206 282 nips-2010-Variable margin losses for classifier design
Author: Hamed Masnadi-shirazi, Nuno Vasconcelos
Abstract: The problem of controlling the margin of a classifier is studied. A detailed analytical study is presented on how properties of the classification risk, such as its optimal link and minimum risk functions, are related to the shape of the loss, and its margin enforcing properties. It is shown that for a class of risks, denoted canonical risks, asymptotic Bayes consistency is compatible with simple analytical relationships between these functions. These enable a precise characterization of the loss for a popular class of link functions. It is shown that, when the risk is in canonical form and the link is inverse sigmoidal, the margin properties of the loss are determined by a single parameter. Novel families of Bayes consistent loss functions, of variable margin, are derived. These families are then used to design boosting style algorithms with explicit control of the classification margin. The new algorithms generalize well established approaches, such as LogitBoost. Experimental results show that the proposed variable margin losses outperform the fixed margin counterparts used by existing algorithms. Finally, it is shown that best performance can be achieved by cross-validating the margin parameter. 1
2 0.76163906 225 nips-2010-Relaxed Clipping: A Global Training Method for Robust Regression and Classification
Author: Min Yang, Linli Xu, Martha White, Dale Schuurmans, Yao-liang Yu
Abstract: Robust regression and classification are often thought to require non-convex loss functions that prevent scalable, global training. However, such a view neglects the possibility of reformulated training methods that can yield practically solvable alternatives. A natural way to make a loss function more robust to outliers is to truncate loss values that exceed a maximum threshold. We demonstrate that a relaxation of this form of “loss clipping” can be made globally solvable and applicable to any standard loss while guaranteeing robustness against outliers. We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems. 1
3 0.72685856 290 nips-2010-t-logistic regression
Author: Nan Ding, S.v.n. Vishwanathan
Abstract: We extend logistic regression by using t-exponential families which were introduced recently in statistical physics. This gives rise to a regularized risk minimization problem with a non-convex loss function. An efficient block coordinate descent optimization scheme can be derived for estimating the parameters. Because of the nature of the loss function, our algorithm is tolerant to label noise. Furthermore, unlike other algorithms which employ non-convex loss functions, our algorithm is fairly robust to the choice of initial values. We verify both these observations empirically on a number of synthetic and real datasets. 1
4 0.67731845 15 nips-2010-A Theory of Multiclass Boosting
Author: Indraneel Mukherjee, Robert E. Schapire
Abstract: Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the “correct” requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements. 1
5 0.655168 243 nips-2010-Smoothness, Low Noise and Fast Rates
Author: Nathan Srebro, Karthik Sridharan, Ambuj Tewari
Abstract: We establish an excess risk bound of Õ(H R_n^2 + √(H L*) R_n) for ERM with an H-smooth loss function and a hypothesis class with Rademacher complexity R_n, where L* is the best risk achievable by the hypothesis class. For typical hypothesis classes where R_n = √(R/n), this translates to a learning rate of Õ(RH/n) in the separable (L* = 0) case and Õ(RH/n + √(L* RH/n)) more generally. We also provide similar guarantees for online and stochastic convex optimization of a smooth non-negative objective. 1
6 0.63494676 269 nips-2010-Throttling Poisson Processes
7 0.60025114 191 nips-2010-On the Theory of Learnining with Privileged Information
8 0.57571399 228 nips-2010-Reverse Multi-Label Learning
9 0.52706242 42 nips-2010-Boosting Classifier Cascades
10 0.49378452 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
11 0.48646942 61 nips-2010-Direct Loss Minimization for Structured Prediction
12 0.47598684 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
13 0.45644462 255 nips-2010-Static Analysis of Binary Executables Using Structural SVMs
14 0.43842849 116 nips-2010-Identifying Patients at Risk of Major Adverse Cardiovascular Events Using Symbolic Mismatch
15 0.43706906 233 nips-2010-Scrambled Objects for Least-Squares Regression
16 0.42465743 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification
17 0.41297275 163 nips-2010-Lower Bounds on Rate of Convergence of Cutting Plane Methods
18 0.40658525 175 nips-2010-Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers
19 0.40261576 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
20 0.40210956 192 nips-2010-Online Classification with Specificity Constraints
topicId topicWeight
[(13, 0.024), (17, 0.018), (27, 0.074), (30, 0.055), (35, 0.022), (45, 0.227), (50, 0.075), (52, 0.029), (60, 0.041), (66, 0.166), (77, 0.073), (78, 0.037), (90, 0.072)]
simIndex simValue paperId paperTitle
1 0.89567161 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
Author: Surya Ganguli, Haim Sompolinsky
Abstract: Recent proposals suggest that large, generic neuronal networks could store memory traces of past input sequences in their instantaneous state. Such a proposal raises important theoretical questions about the duration of these memory traces and their dependence on network size, connectivity and signal statistics. Prior work, in the case of gaussian input sequences and linear neuronal networks, shows that the duration of memory traces in a network cannot exceed the number of neurons (in units of the neuronal time constant), and that no network can out-perform an equivalent feedforward network. However a more ethologically relevant scenario is that of sparse input sequences. In this scenario, we show how linear neural networks can essentially perform compressed sensing (CS) of past inputs, thereby attaining a memory capacity that exceeds the number of neurons. This enhanced capacity is achieved by a class of “orthogonal” recurrent networks and not by feedforward networks or generic recurrent networks. We exploit techniques from the statistical physics of disordered systems to analytically compute the decay of memory traces in such networks as a function of network size, signal sparsity and integration time. Alternately, viewed purely from the perspective of CS, this work introduces a new ensemble of measurement matrices derived from dynamical systems, and provides a theoretical analysis of their asymptotic performance. 1
same-paper 2 0.89237124 282 nips-2010-Variable margin losses for classifier design
Author: Hamed Masnadi-shirazi, Nuno Vasconcelos
Abstract: The problem of controlling the margin of a classifier is studied. A detailed analytical study is presented on how properties of the classification risk, such as its optimal link and minimum risk functions, are related to the shape of the loss, and its margin enforcing properties. It is shown that for a class of risks, denoted canonical risks, asymptotic Bayes consistency is compatible with simple analytical relationships between these functions. These enable a precise characterization of the loss for a popular class of link functions. It is shown that, when the risk is in canonical form and the link is inverse sigmoidal, the margin properties of the loss are determined by a single parameter. Novel families of Bayes consistent loss functions, of variable margin, are derived. These families are then used to design boosting style algorithms with explicit control of the classification margin. The new algorithms generalize well established approaches, such as LogitBoost. Experimental results show that the proposed variable margin losses outperform the fixed margin counterparts used by existing algorithms. Finally, it is shown that best performance can be achieved by cross-validating the margin parameter. 1
3 0.89092505 280 nips-2010-Unsupervised Kernel Dimension Reduction
Author: Meihong Wang, Fei Sha, Michael I. Jordan
Abstract: We apply the framework of kernel dimension reduction, originally designed for supervised problems, to unsupervised dimensionality reduction. In this framework, kernel-based measures of independence are used to derive low-dimensional representations that maximally capture information in covariates in order to predict responses. We extend this idea and develop similarly motivated measures for unsupervised problems where covariates and responses are the same. Our empirical studies show that the resulting compact representation yields meaningful and appealing visualization and clustering of data. Furthermore, when used in conjunction with supervised learners for classification, our methods lead to lower classification errors than state-of-the-art methods, especially when embedding data in spaces of very few dimensions.
4 0.83334351 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
Author: Leonidas Lefakis, Francois Fleuret
Abstract: The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. We introduce a novel algorithm to construct jointly the classifiers of such a cascade, which interprets the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. From this noisy-AND model, we derive a consistent loss and a Boosting procedure to optimize that global probability on the training set. Such a joint learning allows the individual predictors to focus on a more restricted modeling problem, and improves the performance compared to a standard cascade. We demonstrate the efficiency of this approach on face and pedestrian detection with standard data-sets and comparisons with reference baselines. 1
Author: Li-jia Li, Hao Su, Li Fei-fei, Eric P. Xing
Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meanings. For high level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging on the Object Bank representation, superior performances on high level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.
6 0.82991564 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
7 0.82957906 63 nips-2010-Distributed Dual Averaging In Networks
8 0.82874709 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
9 0.82856637 158 nips-2010-Learning via Gaussian Herding
10 0.82835031 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
11 0.82428986 117 nips-2010-Identifying graph-structured activation patterns in networks
12 0.82236326 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression
13 0.82234991 155 nips-2010-Learning the context of a category
14 0.82204431 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
15 0.82143247 148 nips-2010-Learning Networks of Stochastic Differential Equations
16 0.82103062 243 nips-2010-Smoothness, Low Noise and Fast Rates
17 0.82036978 1 nips-2010-(RF)^2 -- Random Forest Random Field
18 0.81999588 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
19 0.81905955 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors
20 0.81881213 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery