nips nips2012 nips2012-98 knowledge-graph by maker-knowledge-mining

98 nips-2012-Dimensionality Dependent PAC-Bayes Margin Bound


Source: pdf

Author: Chi Jin, Liwei Wang

Abstract: Margin is one of the most important concepts in machine learning. Previous margin bounds, both for SVM and for boosting, are dimensionality independent. A major advantage of this dimensionality independency is that it can explain the excellent performance of SVM whose feature spaces are often of high or infinite dimension. In this paper we address the problem whether such dimensionality independency is intrinsic for the margin bounds. We prove a dimensionality dependent PAC-Bayes margin bound. The bound is monotone increasing with respect to the dimension when keeping all other factors fixed. We show that our bound is strictly sharper than a previously well-known PAC-Bayes margin bound if the feature space is of finite dimension; and the two bounds tend to be equivalent as the dimension goes to infinity. In addition, we show that the VC bound for linear classifiers can be recovered from our bound under mild conditions. We conduct extensive experiments on benchmark datasets and find that the new bound is useful for model selection and is usually significantly sharper than the dimensionality independent PAC-Bayes margin bound as well as the VC bound for linear classifiers.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Previous margin bounds, both for SVM and for boosting, are dimensionality independent. [sent-6, score-0.549]

2 A major advantage of this dimensionality independency is that it can explain the excellent performance of SVM whose feature spaces are often of high or infinite dimension. [sent-7, score-0.215]

3 In this paper we address the problem whether such dimensionality independency is intrinsic for the margin bounds. [sent-8, score-0.635]

4 The bound is monotone increasing with respect to the dimension when keeping all other factors fixed. [sent-10, score-0.199]

5 We show that our bound is strictly sharper than a previously well-known PAC-Bayes margin bound if the feature space is of finite dimension; and the two bounds tend to be equivalent as the dimension goes to infinity. [sent-11, score-1.063]

6 In addition, we show that the VC bound for linear classifiers can be recovered from our bound under mild conditions. [sent-12, score-0.285]

7 We conduct extensive experiments on benchmark datasets and find that the new bound is useful for model selection and is usually significantly sharper than the dimensionality independent PAC-Bayes margin bound as well as the VC bound for linear classifiers. [sent-13, score-1.228]

8 There has been extensive work on bounding the generalization errors of SVM and boosting in terms of margins (with various definitions such as l2, l1, soft, hard, average, minimum, etc. [sent-16, score-0.268]

9 ) In the 1970s, Vapnik pointed out that a large margin can imply good generalization. [sent-17, score-0.438]

10 This bound was improved and simplified in a series of works [2, 3, 4, 5], mainly based on the PAC-Bayes theory [6], which was originally developed for stochastic classifiers. [sent-20, score-0.225]

11 (See Section 2 for a brief review of the PAC-Bayes theory and the PAC-Bayes margin bounds. [sent-21, score-0.44]

12 ) All these bounds state that if a linear classifier in the feature space induces large margins for most of the training examples, then it has a small generalization error bound independent of the dimensionality of the feature space. [sent-22, score-0.715]

13 The (l1 ) margin has also been extensively studied for boosting to explain its generalization ability. [sent-23, score-0.555]

14 [7] proved a margin bound for the generalization error of voting classifiers. [sent-25, score-0.807]

15 The bound is independent of the number of base classifiers combined in the voting classifier1 . [sent-26, score-0.264]

16 This margin bound was greatly improved in [8, 9] using (local) Rademacher complexities. [sent-27, score-0.573]

17 There also exist improved margin bounds for boosting from the viewpoint of PAC-Bayes theory [10], the diversity of base classifiers [11], and different definitions of margins [12, 13]. [sent-28, score-0.782]

18 1 The bound depends on the VC dimension of the base hypothesis class. [sent-29, score-0.204]

19 Nevertheless, given the VC dimension of the base hypothesis space, the bound does not depend on the number of the base classifiers, which can be seen as the dimension of the feature space. [sent-30, score-0.299]

20 The aforementioned margin bounds are all dimensionality independent. [sent-31, score-0.679]

21 That is, the bounds are solely characterized by the margins on the training data and do not depend on the dimension of feature space. [sent-32, score-0.293]

22 A major advantage of such dimensionality independent margin bounds is that they can explain the generalization ability of SVM and boosting whose feature spaces have high or infinite dimension, in which case the standard VC bound becomes trivial. [sent-33, score-0.966]

23 Although these bounds are very successful in bounding the generalization error, a natural question is whether this dimensionality independency is intrinsic for margin bounds. [sent-34, score-0.729]

24 Building upon the PAC-Bayes theory, we prove a dimensionality dependent margin bound. [sent-36, score-0.616]

25 This bound is monotone increasing with respect to the dimension when keeping all other factors fixed. [sent-37, score-0.199]

26 Compared with the PAC-Bayes margin bound of Langford [4], the new bound is strictly sharper when the feature space is of finite dimension; and the two bounds tend to be equal as the dimension goes to infinity. [sent-38, score-1.063]

27 The experimental results show that the new bound is significantly sharper than the dimensionality independent PAC-Bayes margin bound as well as the VC bound for linear classifiers on relatively large datasets. [sent-40, score-1.162]

28 Section 2 contains a brief review of the PAC-Bayes theory and the dimensionality independent PAC-Bayes margin bound. [sent-43, score-0.57]

29 In Section 3 we give the dimensionality dependent PAC-Bayes margin bound and further improvements. [sent-44, score-0.746]

30 When clear from the context, we often denote by erD (Q) and erS (Q) the generalization and empirical error of the stochastic classifier Q respectively. [sent-62, score-0.207]

31 In this paper we always consider homogeneous linear classifiers2 , or stochastic classifiers whose distribution is over homogeneous linear classifiers. [sent-65, score-0.224]

32 For any w ∈ Rd, the linear classifier cw is defined as cw(·) = sgn(⟨w, ·⟩). [sent-67, score-0.695]

33 When we consider a probability distribution over all homogeneous linear classifiers cw in Rd, we can equivalently consider a distribution over w ∈ Rd. [sent-68, score-0.422]

34 For any P and any δ ∈ (0, 1), with probability 1 − δ over the random draw of n training examples, kl(erS(Q) ‖ erD(Q)) ≤ [KL(Q‖P) + ln((n+1)/δ)] / n (1) holds simultaneously for all distributions Q, where kl(a ‖ b) = a ln(a/b) + (1 − a) ln((1 − a)/(1 − b)) is the KL divergence between Bernoulli distributions with parameters a and b. [sent-78, score-0.246]

35 The above PAC-Bayes theorem states that if a stochastic classifier, whose distribution Q is close (in the sense of KL divergence) to the fixed prior P, has a small training error, then its generalization error is small. [sent-80, score-0.377]
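
To turn a bound like (1) into a numeric upper bound on erD(Q), one inverts the binary kl divergence. The paper does not spell this step out here, so the following is only a minimal sketch of the standard approach (bisection on the second argument of kl); the function names and the example numbers are illustrative, not from the paper.

```python
import math

def binary_kl(a, b, eps=1e-12):
    """kl(a || b) between Bernoulli distributions with parameters a and b."""
    a = min(max(a, eps), 1 - eps)
    b = min(max(b, eps), 1 - eps)
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def kl_inverse(emp_err, rhs, tol=1e-9):
    """Largest q in [emp_err, 1] with kl(emp_err || q) <= rhs, found by bisection."""
    lo, hi = emp_err, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_kl(emp_err, mid) <= rhs:
            lo = mid
        else:
            hi = mid
    return lo

# Illustrative numbers: erS(Q) = 0.05, KL(Q||P) = 10, n = 10000, delta = 0.05.
n, delta = 10_000, 0.05
rhs = (10 + math.log((n + 1) / delta)) / n
print(kl_inverse(0.05, rhs))  # numeric upper bound on erD(Q) implied by (1)
```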

36 Very interestingly, it is shown in [2] that one can derive a margin bound for linear classifiers (including SVM) from the PAC-Bayes theorem quite easily. [sent-85, score-0.629]

37 It is much simpler and slightly tighter than previous margin bounds for SVM [1, 20]. [sent-86, score-0.575]

38 Let Q(µ, ŵ) (µ > 0, ŵ ∈ Rd, ∥ŵ∥ = 1) denote the distribution of homogeneous linear classifiers cw, where w ∼ N(µŵ, I). [sent-91, score-0.422]

39 For any δ ∈ (0, 1), with probability 1 − δ over the random draw of n training examples, kl(erS(Q(µ, ŵ)) ‖ erD(Q(µ, ŵ))) ≤ [µ²/2 + ln((n+1)/δ)] / n (2) holds simultaneously for all µ > 0 and all ŵ ∈ Rd with ∥ŵ∥ = 1. [sent-92, score-0.341]

40 In addition, the empirical error of the stochastic classifier can be written as erS(Q(µ, ŵ)) = ES Φ̄(µγ(ŵ; x, y)), where γ(ŵ; x, y) = y⟨ŵ, x⟩/∥x∥ is the margin of (x, y) with respect to the unit vector ŵ, and Φ̄(t) = 1 − Φ(t) = ∫_t^∞ (1/√(2π)) e^(−τ²/2) dτ is the probability of the upper tail of the Gaussian distribution. [sent-93, score-0.564]

41 Thus, if γ(ŵ; x, y) is large for most (x, y), then choosing a relatively small µ would yield a small erS(Q(µ, ŵ)) and in turn a small upper bound for the generalization error of the stochastic classifier Q(µ, ŵ). [sent-97, score-0.363]
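
As a concrete illustration of how erS(Q(µ, ŵ)) and bound (2) can be evaluated on data, here is a hedged sketch. It reuses kl_inverse from the sketch above; the array conventions (X an n×d matrix of examples, y in {−1, +1}, ŵ a unit vector) and all function names are assumptions made for illustration.

```python
import math
import numpy as np

def gaussian_tail(t):
    """Phi-bar(t) = P(N(0,1) > t), the upper tail of the standard Gaussian."""
    return 0.5 * math.erfc(t / math.sqrt(2))

def stochastic_empirical_error(w_hat, X, y, mu):
    """erS(Q(mu, w_hat)) = (1/n) * sum_i Phi-bar(mu * y_i <w_hat, x_i> / ||x_i||)."""
    margins = y * (X @ w_hat) / np.linalg.norm(X, axis=1)  # w_hat assumed to be unit norm
    return float(np.mean([gaussian_tail(mu * g) for g in margins]))

def di_margin_bound(w_hat, X, y, mu, delta):
    """Numeric value of the dimensionality independent bound (2) on erD(Q(mu, w_hat))."""
    n = len(y)
    rhs = (mu ** 2 / 2 + math.log((n + 1) / delta)) / n
    return kl_inverse(stochastic_empirical_error(w_hat, X, y, mu), rhs)
```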

42 Note that this bound does not depend on the dimensionality d. [sent-98, score-0.26]

43 In fact almost all previously known margin bounds are dimensionality independent3 . [sent-99, score-0.679]

44 There is a close relation between the error of a stochastic classifier defined by distribution Q and the error of the deterministic voting classifier vQ . [sent-102, score-0.38]

45 By Proposition 2.3, one can upper bound the generalization error of the voting classifier vQ associated with Q(µ, ŵ) given in Theorem 2.2. [sent-111, score-0.388]

46 In fact, it is easy to see that vQ = cŵ, i.e., the voting classifier is exactly the linear classifier defined by ŵ. [sent-113, score-0.48]

47 Hence erD(cŵ) ≤ 2 erD(Q(µ, ŵ)). (6) (Footnote 3) There exist dimensionality dependent margin bounds [21]. [sent-115, score-0.746]

48 However these bounds grow unboundedly as the dimensionality tends to infinity. [sent-116, score-0.26]

49 Combining Proposition 2.3 and (6), we have that with probability 1 − δ the following margin bound holds for all classifiers cŵ with ŵ ∈ Rd, ∥ŵ∥ = 1, and all µ > 0: kl(erS(Q(µ, ŵ)) ‖ erD(cŵ)/2) ≤ [µ²/2 + ln((n+1)/δ)] / n. (7) [sent-119, score-1.13]

50 One disadvantage of the bounds in (5), (6) and (7) is that they involve a multiplicative factor of 2. [sent-120, score-0.213]
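
For completeness, a small sketch of how (7) yields a numeric bound on the error of the deterministic classifier cŵ, with the multiplicative factor of 2 made explicit; it reuses the helpers defined in the earlier sketches.

```python
import math

def deterministic_bound_factor2(w_hat, X, y, mu, delta):
    """Bound on erD(c_w_hat) read off from (7): invert the kl, then undo the division by 2."""
    n = len(y)
    rhs = (mu ** 2 / 2 + math.log((n + 1) / delta)) / n
    ers = stochastic_empirical_error(w_hat, X, y, mu)  # from the sketch above
    return min(1.0, 2 * kl_inverse(ers, rhs))          # kl_inverse from the first sketch
```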

51 Let erD,θ(Q(µ, ŵ)) = Ew∼N(µŵ,I) PD(y⟨w, x⟩/∥x∥ ≤ θ) be the error of the stochastic classifier with margin θ. [sent-127, score-0.564]

52 ˆ The bound states that if the stochastic classifier induces small errors with large margin θ, then the linear (voting) classifier has only a slightly larger generalization error than the stochastic classifier. [sent-129, score-0.831]

53 It is also worth pointing out that the margin y⟨w, x⟩/∥x∥ considered in Proposition 2.4 is unnormalized. [sent-132, score-0.419]

54 Following the argument of Theorem 2.2, one can show that for any θ ≥ 0, with probability 1 − δ the following bound is valid for all µ and ŵ uniformly: kl(erS,θ(Q(µ, ŵ)) ‖ erD,θ(Q(µ, ŵ))) ≤ [µ²/2 + ln((n+1)/δ)] / n. [sent-138, score-0.353]

55 For any θ ≥ 0 and any δ > 0, with probability 1 − δ the following bound is valid for all µ and ŵ uniformly: kl(erS,θ(Q(µ, ŵ)) ‖ erD(cŵ) − Φ̄(θ)) ≤ [µ²/2 + ln((n+1)/δ)] / n. [sent-142, score-0.353]

56 Let Q(µ, ŵ) (µ > 0, ŵ ∈ Rd, ∥ŵ∥ = 1) denote the distribution of linear classifiers cw(·) = sgn(⟨w, ·⟩), where w ∼ N(µŵ, I). [sent-150, score-0.36]

57 For any δ ∈ (0, 1), with probability 1 − δ over the random draw of n training examples, kl(erS(Q(µ, ŵ)) ‖ erD(Q(µ, ŵ))) ≤ [(d/2) ln(1 + µ²/d) + ln((n+1)/δ)] / n (11) holds simultaneously for all µ > 0 and all ŵ ∈ Rd with ∥ŵ∥ = 1. [sent-151, score-0.341]

58 The bound (11) is sharper than (2) for any d < ∞, and the two bounds tend to be equivalent as d → ∞. [sent-159, score-0.452]
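
The sharpening comes entirely from replacing the complexity term µ²/2 in (2) by (d/2) ln(1 + µ²/d) in (11). A quick illustrative check of the two terms (the value of µ is chosen arbitrarily):

```python
import math

mu = 5.0
for d in (10, 100, 1000, 10 ** 6):
    dd_term = 0.5 * d * math.log(1 + mu ** 2 / d)  # complexity term of (11)
    di_term = 0.5 * mu ** 2                        # complexity term of (2)
    print(f"d={d:>7}  DD term={dd_term:.4f}  DI term={di_term:.4f}")
# The DD term is strictly smaller for every finite d and increases to mu^2/2 as d grows,
# which is why (11) is sharper than (2) and the two bounds coincide in the limit.
```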

59 Theorem 3.1 is the first dimensionality dependent margin bound that remains nontrivial in infinite dimension. [sent-161, score-0.746]

60 As described in (7) in Section 2, we can also obtain a margin bound for the deterministic linear classifier cŵ by combining (11) with erD(cŵ) ≤ 2 erD(Q(µ, ŵ)). [sent-170, score-0.948]

61 From Theorem 3.1 we can almost recover the VC bound [23], erD(c) ≤ erS(c) + √( [d(1 + ln(2n/d)) + ln(4/δ)] / n ) (12), for homogeneous linear classifiers in Rd under mild conditions. [sent-173, score-0.411]
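
A hedged sketch of evaluating the VC bound (12) numerically; the function name and the example values are illustrative only.

```python
import math

def vc_bound(emp_err, d, n, delta):
    """VC bound (12) for homogeneous linear classifiers in R^d."""
    return emp_err + math.sqrt((d * (1 + math.log(2 * n / d)) + math.log(4 / delta)) / n)

print(vc_bound(emp_err=0.05, d=100, n=10_000, delta=0.05))
```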

62 In a sense, the dimensionality dependent margin bound in Theorem 3.1 unifies the dimensionality independent margin bound and the VC bound for linear classifiers. [sent-182, score-0.746] [sent-183, score-0.834]

64 Proposition 2.3 involves a multiplicative factor of 2 when bounding the error of the deterministic voting classifier by the error of the stochastic classifier. [sent-187, score-0.495]

65 Note that in general erD(cŵ) ≤ 2 erD(Q(µ, ŵ)) cannot be improved (consider the case that with probability one the data has zero margin with respect to ŵ). [sent-188, score-0.443]

66 Here we study how to improve it for large margin classifiers. [sent-189, score-0.419]

67 Proposition 2.4 gives erD(cŵ) ≤ erD,θ(Q(µ, ŵ)) + Φ̄(θ), which bounds the generalization error of the linear classifier in terms of the error of the stochastic classifier with margin θ ≥ 0. [sent-191, score-0.814]

68 It will soon be clear that this brings additional benefits when combined with the dimensionality dependent margin bound. [sent-223, score-0.616]

69 Let erD,θ^N(Q(µ, ŵ)) = Ew∼N(µŵ,I) PD(y⟨w, x⟩/(∥w∥∥x∥) ≤ θ) be the true error of the stochastic classifier Q(µ, ŵ) with normalized margin θ ∈ [−1, 1]. [sent-224, score-0.564]

70 The true margin error erD,θ^N(Q) can be bounded by its empirical version similarly to Theorem 3.1: [sent-232, score-0.514]

71 For any θ ≥ 0 and any δ > 0, with probability 1 − δ, kl(erS,θ^N(Q(µ, ŵ)) ‖ erD,θ^N(Q(µ, ŵ))) ≤ [(d/2) ln(1 + µ²/d) + ln((n+1)/δ)] / n (17) holds simultaneously for all µ > 0 and ŵ ∈ Rd with ∥ŵ∥ = 1. [sent-233, score-0.276]

72 Combining the previous two results, we have a dimensionality dependent margin bound for the linear classifier cŵ. [sent-234, score-0.419]

73 For any θ ≥ 0 and any δ > 0, with probability 1 − δ over the random draw of n training examples, kl(erS,θ^N(Q(µ, ŵ)) ‖ erD(cŵ)Φ(µθ)) ≤ [(d/2) ln(1 + µ²/d) + ln((n+1)/δ)] / n (18) holds simultaneously for all µ > 0 and ŵ ∈ Rd with ∥ŵ∥ = 1. [sent-238, score-0.341]

74 Observe that as µ gets large, erS,θ^N(Q(µ, ŵ)) = Ew∼N(µŵ,I) PS(y⟨w, x⟩/(∥w∥∥x∥) ≤ θ) tends to the empirical error of the linear classifier ŵ with margin θ, i.e., the fraction of training examples whose normalized margin is at most θ. [sent-241, score-0.539]

75 Taking into consideration that the RHS of (18) scales only as O(ln µ), we can choose a relatively large µ, and then (18) gives a dimensionality dependent margin bound whose multiplicative factor can be very close to 1. [sent-245, score-0.855]
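
Putting the pieces together, one way to read (18) numerically is erD(cŵ) ≤ kl⁻¹(erS,θ^N(Q(µ, ŵ)), RHS) / Φ(µθ). The sketch below estimates erS,θ^N by Monte Carlo over w ∼ N(µŵ, I) (the paper may compute this quantity differently) and reuses kl_inverse from the first sketch; the sample size and all names are illustrative assumptions.

```python
import math
import numpy as np

def gaussian_cdf(t):
    """Phi(t), the standard Gaussian CDF."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def dd_margin_bound(w_hat, X, y, mu, theta, delta, n_samples=200, seed=0):
    """Sketch of the DD-margin bound on erD(c_w_hat) obtained from (18)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Monte Carlo estimate of erS,theta^N(Q(mu, w_hat)): draw w ~ N(mu * w_hat, I) and
    # measure how often the normalized margin y<w,x>/(||w|| ||x||) is at most theta.
    W = mu * w_hat + rng.standard_normal((n_samples, d))
    margins = (y[:, None] * (X @ W.T)) / (
        np.linalg.norm(X, axis=1)[:, None] * np.linalg.norm(W, axis=1))
    ers_theta = float(np.mean(margins <= theta))
    rhs = (0.5 * d * math.log(1 + mu ** 2 / d) + math.log((n + 1) / delta)) / n
    return min(1.0, kl_inverse(ers_theta, rhs) / gaussian_cdf(mu * theta))
```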

76 The goal is to see to what extent the Dimensionality Dependent margin bound (referred to below as the DD-margin bound) is sharper than the Dimensionality Independent margin bound (the DI-margin bound) as well as the VC bound. [sent-247, score-1.27]

77 We plot the values of the three bounds (the DD-margin bound, the DI-margin bound, and the VC bound (12)) as well as the test and training errors (see Figures 1–12). [sent-256, score-0.27]

78 For the two margin bounds, since they hold uniformly for µ > 0, we select the optimal µ to make the bounds as small as possible. [sent-257, score-0.549]
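
Because the bounds hold uniformly over µ > 0, picking the minimizing µ afterwards is legitimate. A hedged sketch of this selection step (the grid and names are illustrative; bound_fn stands for one of the bound sketches above, e.g. di_margin_bound or dd_margin_bound):

```python
import numpy as np

def select_mu(bound_fn, mu_grid=None, **kwargs):
    """Return the mu on a grid that minimizes a uniformly-valid bound, and the bound value."""
    if mu_grid is None:
        mu_grid = np.logspace(-1, 3, 60)
    values = [bound_fn(mu=mu, **kwargs) for mu in mu_grid]
    best = int(np.argmin(values))
    return mu_grid[best], values[best]
```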

79 2 respectively to obtain the final bound for the generalization error of the deterministic linear classifiers. [sent-261, score-0.351]

80 All bounds in the figures (as well as the training and test errors) are for the deterministic (voting) classifier. [sent-263, score-0.214]

81 On all these datasets, the DD-margin bounds are significantly sharper than the DI-margin bounds as well as the VC bounds. [sent-268, score-0.432]

82 The DD-margin bounds are still always, but less significantly, sharper than the DI-margin bounds. [sent-276, score-0.302]

83 In sum, the experimental results demonstrate that the DD-margin bound is usually significantly sharper than the DI-margin bound as well as the VC bound if the dataset is relatively large. [sent-278, score-0.588]

84 5 Conclusion. In this paper we study the problem of whether dimensionality independency is intrinsic for margin bounds. [sent-281, score-0.635]

85 This bound is sharper than a previously well-known dimensionality independent margin bound when the feature space is of finite dimension; and they tend to be equivalent as the dimensionality grows to infinity. [sent-283, score-1.152]

86 Experimental results demonstrate that for relatively large datasets the new bound is often useful for model selection and significantly sharper than the previous margin bound as well as the VC bound. [sent-284, score-0.912]

[Figures 1–12 (plot residue removed): each figure plots the DD-margin bound, the DI-margin bound, the VC bound, and the training and test errors for one dataset; the captions visible in the residue include Figure 4: Mushroom, Figure 5: Optdigits, and Figure 10: Glass.]

98 A direction for future work is to study whether there exist dimensionality dependent margin bounds (not necessarily PAC-Bayes) without this multiplicative factor. [sent-377, score-0.808]

99 Empirical margin distributions and bounding the generalization error of combined classifiers. [sent-407, score-0.608]

100 A refined margin analysis for boosting algorithms via equilibrium margin. [sent-423, score-0.493]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('erd', 0.464), ('margin', 0.419), ('cw', 0.335), ('vc', 0.271), ('ers', 0.236), ('sharper', 0.172), ('classi', 0.152), ('proposition', 0.135), ('bounds', 0.13), ('bound', 0.13), ('dimensionality', 0.13), ('dd', 0.13), ('ln', 0.128), ('er', 0.122), ('vq', 0.114), ('voting', 0.101), ('kl', 0.095), ('error', 0.095), ('margins', 0.081), ('pd', 0.08), ('rd', 0.074), ('boosting', 0.074), ('di', 0.071), ('dependent', 0.067), ('independency', 0.064), ('ew', 0.064), ('laviolette', 0.064), ('generalization', 0.062), ('homogeneous', 0.062), ('multiplicative', 0.062), ('theorem', 0.055), ('train', 0.052), ('mario', 0.051), ('stochastic', 0.05), ('francois', 0.049), ('germain', 0.048), ('lacasse', 0.048), ('ps', 0.047), ('sgn', 0.044), ('rhs', 0.043), ('svm', 0.042), ('ec', 0.041), ('dimension', 0.041), ('deterministic', 0.039), ('langford', 0.038), ('amiran', 0.036), ('breastcancer', 0.036), ('emilio', 0.036), ('moe', 0.036), ('peking', 0.036), ('pima', 0.036), ('john', 0.036), ('datasets', 0.035), ('vladimir', 0.034), ('base', 0.033), ('mushroom', 0.032), ('liwei', 0.032), ('marchand', 0.032), ('bounding', 0.032), ('alexandre', 0.031), ('simultaneously', 0.03), ('optdigits', 0.03), ('wdbc', 0.03), ('ndez', 0.03), ('monotone', 0.028), ('koltchinskii', 0.028), ('dmitry', 0.028), ('pendigits', 0.027), ('pac', 0.027), ('relatively', 0.026), ('tighter', 0.026), ('linear', 0.025), ('test', 0.025), ('waveform', 0.024), ('glass', 0.024), ('pascal', 0.024), ('improved', 0.024), ('draw', 0.024), ('holds', 0.023), ('intrinsic', 0.022), ('divergence', 0.022), ('matthias', 0.021), ('factor', 0.021), ('conduct', 0.021), ('feature', 0.021), ('examples', 0.021), ('schapire', 0.021), ('theory', 0.021), ('tend', 0.02), ('peter', 0.02), ('training', 0.02), ('libsvm', 0.019), ('letter', 0.019), ('pointed', 0.019), ('extensive', 0.019), ('unnormalized', 0.019), ('easy', 0.019), ('repository', 0.018), ('benchmark', 0.017), ('perception', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 98 nips-2012-Dimensionality Dependent PAC-Bayes Margin Bound

Author: Chi Jin, Liwei Wang

Abstract: Margin is one of the most important concepts in machine learning. Previous margin bounds, both for SVM and for boosting, are dimensionality independent. A major advantage of this dimensionality independency is that it can explain the excellent performance of SVM whose feature spaces are often of high or infinite dimension. In this paper we address the problem whether such dimensionality independency is intrinsic for the margin bounds. We prove a dimensionality dependent PAC-Bayes margin bound. The bound is monotone increasing with respect to the dimension when keeping all other factors fixed. We show that our bound is strictly sharper than a previously well-known PAC-Bayes margin bound if the feature space is of finite dimension; and the two bounds tend to be equivalent as the dimension goes to infinity. In addition, we show that the VC bound for linear classifiers can be recovered from our bound under mild conditions. We conduct extensive experiments on benchmark datasets and find that the new bound is useful for model selection and is usually significantly sharper than the dimensionality independent PAC-Bayes margin bound as well as the VC bound for linear classifiers.

2 0.22042973 200 nips-2012-Local Supervised Learning through Space Partitioning

Author: Joseph Wang, Venkatesh Saligrama

Abstract: We develop a novel approach for supervised learning based on adaptively partitioning the feature space into different regions and learning local region-specific classifiers. We formulate an empirical risk minimization problem that incorporates both partitioning and classification in to a single global objective. We show that space partitioning can be equivalently reformulated as a supervised learning problem and consequently any discriminative learning method can be utilized in conjunction with our approach. Nevertheless, we consider locally linear schemes by learning linear partitions and linear region classifiers. Locally linear schemes can not only approximate complex decision boundaries and ensure low training error but also provide tight control on over-fitting and generalization error. We train locally linear classifiers by using LDA, logistic regression and perceptrons, and so our scheme is scalable to large data sizes and high-dimensions. We present experimental results demonstrating improved performance over state of the art classification techniques on benchmark datasets. We also show improved robustness to label noise.

3 0.1608263 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

Author: Elad Hazan, Zohar Karnin

Abstract: We present a simplex algorithm for linear programming in a linear classification formulation. The paramount complexity parameter in linear classification problems is called the margin. We prove that for margin values of practical interest our simplex variant performs a polylogarithmic number of pivot steps in the worst case, and its overall running time is near linear. This is in contrast to general linear programming, for which no sub-polynomial pivot rule is known. 1

4 0.11533426 181 nips-2012-Learning Multiple Tasks using Shared Hypotheses

Author: Koby Crammer, Yishay Mansour

Abstract: In this work we consider a setting where we have a very large number of related tasks with few examples from each individual task. Rather than either learning each task individually (and having a large generalization error) or learning all the tasks together using a single hypothesis (and suffering a potentially large inherent error), we consider learning a small pool of shared hypotheses. Each task is then mapped to a single hypothesis in the pool (hard association). We derive VC dimension generalization bounds for our model, based on the number of tasks, shared hypothesis and the VC dimension of the hypotheses class. We conducted experiments with both synthetic problems and sentiment of reviews, which strongly support our approach. 1

5 0.11030393 226 nips-2012-Multiclass Learning Approaches: A Theoretical Comparison with Implications

Author: Amit Daniely, Sivan Sabato, Shai S. Shwartz

Abstract: We theoretically analyze and compare the following five popular multiclass classification methods: One vs. All, All Pairs, Tree-based classifiers, Error Correcting Output Codes (ECOC) with randomly generated code matrices, and Multiclass SVM. In the first four methods, the classification is based on a reduction to binary classification. We consider the case where the binary classifier comes from a class of VC dimension d, and in particular from the class of halfspaces over Rd . We analyze both the estimation error and the approximation error of these methods. Our analysis reveals interesting conclusions of practical relevance, regarding the success of the different approaches under various conditions. Our proof technique employs tools from VC theory to analyze the approximation error of hypothesis classes. This is in contrast to most previous uses of VC theory, which only deal with estimation error. 1

6 0.10741543 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

7 0.098163903 197 nips-2012-Learning with Recursive Perceptual Representations

8 0.089818612 361 nips-2012-Volume Regularization for Binary Classification

9 0.082284778 227 nips-2012-Multiclass Learning with Simplex Coding

10 0.079748526 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

11 0.070783369 291 nips-2012-Reducing statistical time-series problems to binary classification

12 0.070295803 142 nips-2012-Generalization Bounds for Domain Adaptation

13 0.070239626 174 nips-2012-Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs

14 0.066236727 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)

15 0.065357722 50 nips-2012-Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button

16 0.065211251 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

17 0.064462587 324 nips-2012-Stochastic Gradient Descent with Only One Projection

18 0.064087436 37 nips-2012-Affine Independent Variational Inference

19 0.059534844 14 nips-2012-A P300 BCI for the Masses: Prior Information Enables Instant Unsupervised Spelling

20 0.058030512 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.161), (1, 0.013), (2, 0.018), (3, -0.054), (4, 0.096), (5, -0.009), (6, 0.011), (7, 0.157), (8, -0.074), (9, -0.064), (10, 0.083), (11, 0.05), (12, 0.06), (13, 0.012), (14, 0.014), (15, -0.071), (16, -0.114), (17, 0.019), (18, -0.013), (19, 0.106), (20, -0.09), (21, -0.036), (22, -0.066), (23, 0.051), (24, -0.048), (25, -0.059), (26, -0.063), (27, -0.057), (28, -0.076), (29, -0.071), (30, -0.038), (31, 0.148), (32, 0.031), (33, -0.124), (34, -0.041), (35, 0.019), (36, -0.076), (37, 0.068), (38, -0.103), (39, 0.006), (40, 0.023), (41, 0.042), (42, -0.007), (43, -0.065), (44, -0.019), (45, 0.026), (46, -0.041), (47, -0.075), (48, 0.068), (49, -0.052)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97421885 98 nips-2012-Dimensionality Dependent PAC-Bayes Margin Bound

Author: Chi Jin, Liwei Wang

Abstract: Margin is one of the most important concepts in machine learning. Previous margin bounds, both for SVM and for boosting, are dimensionality independent. A major advantage of this dimensionality independency is that it can explain the excellent performance of SVM whose feature spaces are often of high or infinite dimension. In this paper we address the problem whether such dimensionality independency is intrinsic for the margin bounds. We prove a dimensionality dependent PAC-Bayes margin bound. The bound is monotone increasing with respect to the dimension when keeping all other factors fixed. We show that our bound is strictly sharper than a previously well-known PAC-Bayes margin bound if the feature space is of finite dimension; and the two bounds tend to be equivalent as the dimension goes to infinity. In addition, we show that the VC bound for linear classifiers can be recovered from our bound under mild conditions. We conduct extensive experiments on benchmark datasets and find that the new bound is useful for model selection and is usually significantly sharper than the dimensionality independent PAC-Bayes margin bound as well as the VC bound for linear classifiers.

2 0.79432213 200 nips-2012-Local Supervised Learning through Space Partitioning

Author: Joseph Wang, Venkatesh Saligrama

Abstract: We develop a novel approach for supervised learning based on adaptively partitioning the feature space into different regions and learning local region-specific classifiers. We formulate an empirical risk minimization problem that incorporates both partitioning and classification in to a single global objective. We show that space partitioning can be equivalently reformulated as a supervised learning problem and consequently any discriminative learning method can be utilized in conjunction with our approach. Nevertheless, we consider locally linear schemes by learning linear partitions and linear region classifiers. Locally linear schemes can not only approximate complex decision boundaries and ensure low training error but also provide tight control on over-fitting and generalization error. We train locally linear classifiers by using LDA, logistic regression and perceptrons, and so our scheme is scalable to large data sizes and high-dimensions. We present experimental results demonstrating improved performance over state of the art classification techniques on benchmark datasets. We also show improved robustness to label noise.

3 0.7749874 226 nips-2012-Multiclass Learning Approaches: A Theoretical Comparison with Implications

Author: Amit Daniely, Sivan Sabato, Shai S. Shwartz

Abstract: We theoretically analyze and compare the following five popular multiclass classification methods: One vs. All, All Pairs, Tree-based classifiers, Error Correcting Output Codes (ECOC) with randomly generated code matrices, and Multiclass SVM. In the first four methods, the classification is based on a reduction to binary classification. We consider the case where the binary classifier comes from a class of VC dimension d, and in particular from the class of halfspaces over Rd . We analyze both the estimation error and the approximation error of these methods. Our analysis reveals interesting conclusions of practical relevance, regarding the success of the different approaches under various conditions. Our proof technique employs tools from VC theory to analyze the approximation error of hypothesis classes. This is in contrast to most previous uses of VC theory, which only deal with estimation error. 1

4 0.69741189 361 nips-2012-Volume Regularization for Binary Classification

Author: Koby Crammer, Tal Wagner

Abstract: We introduce a large-volume box classification for binary prediction, which maintains a subset of weight vectors, and specifically axis-aligned boxes. Our learning algorithm seeks for a box of large volume that contains “simple” weight vectors which most of are accurate on the training set. Two versions of the learning process are cast as convex optimization problems, and it is shown how to solve them efficiently. The formulation yields a natural PAC-Bayesian performance bound and it is shown to minimize a quantity directly aligned with it. The algorithm outperforms SVM and the recently proposed AROW algorithm on a majority of 30 NLP datasets and binarized USPS optical character recognition datasets. 1

5 0.69217026 227 nips-2012-Multiclass Learning with Simplex Coding

Author: Youssef Mroueh, Tomaso Poggio, Lorenzo Rosasco, Jean-jeacques Slotine

Abstract: In this paper we discuss a novel framework for multiclass learning, defined by a suitable coding/decoding strategy, namely the simplex coding, that allows us to generalize to multiple classes a relaxation approach commonly used in binary classification. In this framework, we develop a relaxation error analysis that avoids constraints on the considered hypotheses class. Moreover, using this setting we derive the first provably consistent regularized method with training/tuning complexity that is independent to the number of classes. We introduce tools from convex analysis that can be used beyond the scope of this paper. 1

6 0.65279138 181 nips-2012-Learning Multiple Tasks using Shared Hypotheses

7 0.6511057 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

8 0.63726729 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses

9 0.60003465 14 nips-2012-A P300 BCI for the Masses: Prior Information Enables Instant Unsupervised Spelling

10 0.59249616 130 nips-2012-Feature-aware Label Space Dimension Reduction for Multi-label Classification

11 0.58335972 197 nips-2012-Learning with Recursive Perceptual Representations

12 0.56084371 291 nips-2012-Reducing statistical time-series problems to binary classification

13 0.55832577 28 nips-2012-A systematic approach to extracting semantic information from functional MRI data

14 0.54931772 174 nips-2012-Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs

15 0.54398346 271 nips-2012-Pointwise Tracking the Optimal Regression Function

16 0.54325992 50 nips-2012-Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button

17 0.50306541 279 nips-2012-Projection Retrieval for Classification

18 0.49586397 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

19 0.4913761 217 nips-2012-Mixability in Statistical Learning

20 0.46696064 48 nips-2012-Augmented-SVM: Automatic space partitioning for combining multiple non-linear dynamics


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.019), (17, 0.01), (21, 0.018), (22, 0.211), (38, 0.165), (39, 0.023), (42, 0.105), (54, 0.035), (55, 0.02), (74, 0.04), (76, 0.134), (80, 0.094), (92, 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.83925045 48 nips-2012-Augmented-SVM: Automatic space partitioning for combining multiple non-linear dynamics

Author: Ashwini Shukla, Aude Billard

Abstract: Non-linear dynamical systems (DS) have been used extensively for building generative models of human behavior. Their applications range from modeling brain dynamics to encoding motor commands. Many schemes have been proposed for encoding robot motions using dynamical systems with a single attractor placed at a predefined target in state space. Although these enable the robots to react against sudden perturbations without any re-planning, the motions are always directed towards a single target. In this work, we focus on combining several such DS with distinct attractors, resulting in a multi-stable DS. We show its applicability in reach-to-grasp tasks where the attractors represent several grasping points on the target object. While exploiting multiple attractors provides more flexibility in recovering from unseen perturbations, it also increases the complexity of the underlying learning problem. Here we present the Augmented-SVM (A-SVM) model which inherits region partitioning ability of the well known SVM classifier and is augmented with novel constraints derived from the individual DS. The new constraints modify the original SVM dual whose optimal solution then results in a new class of support vectors (SV). These new SV ensure that the resulting multistable DS incurs minimum deviation from the original dynamics and is stable at each of the attractors within a finite region of attraction. We show, via implementations on a simulated 10 degrees of freedom mobile robotic platform, that the model is capable of real-time motion generation and is able to adapt on-the-fly to perturbations.

2 0.83346021 334 nips-2012-Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs

Author: Michael Collins, Shay B. Cohen

Abstract: We describe an approach to speed-up inference with latent-variable PCFGs, which have been shown to be highly effective for natural language parsing. Our approach is based on a tensor formulation recently introduced for spectral estimation of latent-variable PCFGs coupled with a tensor decomposition algorithm well-known in the multilinear algebra literature. We also describe an error bound for this approximation, which gives guarantees showing that if the underlying tensors are well approximated, then the probability distribution over trees will also be well approximated. Empirical evaluation on real-world natural language parsing data demonstrates a significant speed-up at minimal cost for parsing performance. 1

same-paper 3 0.82876974 98 nips-2012-Dimensionality Dependent PAC-Bayes Margin Bound

Author: Chi Jin, Liwei Wang

Abstract: Margin is one of the most important concepts in machine learning. Previous margin bounds, both for SVM and for boosting, are dimensionality independent. A major advantage of this dimensionality independency is that it can explain the excellent performance of SVM whose feature spaces are often of high or infinite dimension. In this paper we address the problem whether such dimensionality independency is intrinsic for the margin bounds. We prove a dimensionality dependent PAC-Bayes margin bound. The bound is monotone increasing with respect to the dimension when keeping all other factors fixed. We show that our bound is strictly sharper than a previously well-known PAC-Bayes margin bound if the feature space is of finite dimension; and the two bounds tend to be equivalent as the dimension goes to infinity. In addition, we show that the VC bound for linear classifiers can be recovered from our bound under mild conditions. We conduct extensive experiments on benchmark datasets and find that the new bound is useful for model selection and is usually significantly sharper than the dimensionality independent PAC-Bayes margin bound as well as the VC bound for linear classifiers.

4 0.8093313 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

Author: Deepak Venugopal, Vibhav Gogate

Abstract: First-order probabilistic models combine the power of first-order logic, the de facto tool for handling relational structure, with probabilistic graphical models, the de facto tool for handling uncertainty. Lifted probabilistic inference algorithms for them have been the subject of much recent research. The main idea in these algorithms is to improve the accuracy and scalability of existing graphical models’ inference algorithms by exploiting symmetry in the first-order representation. In this paper, we consider blocked Gibbs sampling, an advanced MCMC scheme, and lift it to the first-order level. We propose to achieve this by partitioning the first-order atoms in the model into a set of disjoint clusters such that exact lifted inference is polynomial in each cluster given an assignment to all other atoms not in the cluster. We propose an approach for constructing the clusters and show how it can be used to trade accuracy with computational complexity in a principled manner. Our experimental evaluation shows that lifted Gibbs sampling is superior to the propositional algorithm in terms of accuracy, scalability and convergence.

5 0.77264667 243 nips-2012-Non-parametric Approximate Dynamic Programming via the Kernel Method

Author: Nikhil Bhat, Vivek Farias, Ciamac C. Moallemi

Abstract: This paper presents a novel non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful approximation and sample complexity guarantees. In particular, we establish both theoretically and computationally that our proposal can serve as a viable alternative to state-of-the-art parametric ADP algorithms, freeing the designer from carefully specifying an approximation architecture. We accomplish this by developing a kernel-based mathematical program for ADP. Via a computational study on a controlled queueing network, we show that our procedure is competitive with parametric ADP approaches. 1

6 0.75898093 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

7 0.75400472 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

8 0.75048614 242 nips-2012-Non-linear Metric Learning

9 0.74788737 275 nips-2012-Privacy Aware Learning

10 0.7453683 358 nips-2012-Value Pursuit Iteration

11 0.74325031 263 nips-2012-Optimal Regularized Dual Averaging Methods for Stochastic Optimization

12 0.7423321 289 nips-2012-Recognizing Activities by Attribute Dynamics

13 0.74223673 292 nips-2012-Regularized Off-Policy TD-Learning

14 0.74066412 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

15 0.74045205 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

16 0.73836035 285 nips-2012-Query Complexity of Derivative-Free Optimization

17 0.73763764 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

18 0.73683065 324 nips-2012-Stochastic Gradient Descent with Only One Projection

19 0.73643911 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

20 0.7361446 227 nips-2012-Multiclass Learning with Simplex Coding