nips nips2007 nips2007-62 knowledge-graph by maker-knowledge-mining

62 nips-2007-Convex Learning with Invariances


Source: pdf

Author: Choon H. Teo, Amir Globerson, Sam T. Roweis, Alex J. Smola

Abstract: Incorporating invariances into a learning algorithm is a common problem in machine learning. We provide a convex formulation which can deal with arbitrary loss functions and arbitrary invariances. In addition, it is a drop-in replacement for most optimization algorithms for kernels, including solvers of the SVMStruct family. The advantage of our setting is that it relies on column generation instead of modifying the underlying optimization problem directly.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Incorporating invariances into a learning algorithm is a common problem in machine learning. [sent-11, score-0.622]

2 We provide a convex formulation which can deal with arbitrary loss functions and arbitrary invariances. [sent-12, score-0.243]

3 In addition, it is a drop-in replacement for most optimization algorithms for kernels, including solvers of the SVMStruct family. [sent-13, score-0.08]

4 In recent years a number of authors have attempted to put learning with invariances on a solid mathematical footing. [sent-17, score-0.622]

5 For instance, [3] discusses how to extract invariant features for estimation and learning globally invariant estimators for a known class of invariance transforms (preferably arising from Lie groups). [sent-18, score-0.721]

6 Our goal in this work is to develop a computationally scalable and broadly applicable approach to supervised learning with invariances which is easily adapted to new types of problems and can take advantage of existing optimization infrastructures. [sent-23, score-0.732]

7 It formulates invariant learning as a convex problem and thus can be implemented directly using any existing convex solver, requiring minimal additional memory and inheriting the convergence properties/guarantees of the underlying implementation. [sent-25, score-0.312]

8 It can deal with arbitrary invariances, including gradual degradations, provided that the user provides a computational recipe to generate invariant equivalents efficiently from a given data vector. [sent-27, score-0.212]
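
For concreteness, here is a minimal sketch (not from the paper) of such a computational recipe for image data, assuming the invariance class S consists of small pixel shifts and rotations; any user-supplied generator of this kind plugs into the formulation.

```python
from scipy.ndimage import shift, rotate

def invariant_equivalents(image, shift_sizes=(1, 2), angles=(-10, 10)):
    """Return transformed copies s(x) of a 2-D image under a small
    invariance class S (pixel shifts and rotations); the label y is
    assumed to be left unchanged by every transform in S."""
    variants = [image]  # the identity transform is always a member of S
    for d in shift_sizes:
        for dy, dx in [(d, 0), (-d, 0), (0, d), (0, -d)]:
            variants.append(shift(image, (dy, dx), order=1, mode="constant"))
    for angle in angles:
        variants.append(rotate(image, angle, reshape=False, order=1, mode="constant"))
    return variants
```

This is essentially the flavour of S used in the MNIST experiment of Section 5.1 (pixel shifts, rotations, scalings).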

9 2 Maximum Margin Loss with Invariances: We begin by describing a maximum margin formulation of supervised learning which naturally incorporates invariance transformations on the input objects. [sent-30, score-0.641]

10 For instance, if we are training a (generative or discriminative) probabilistic model, f(x, y) = log p(y|x), then our prediction is the maximum a-posteriori estimate of the target y given x. [sent-36, score-0.091]

11 For the purpose of our analysis the precise form of f is immaterial, although our experiments focus on kernel machines, due to the availability of scalable optimizers for that class of estimators. [sent-41, score-0.101]

12 2.1 Invariance Transformations and Invariance-Sensitive Cost: The crucial ingredient in formulating invariant learning is to capture the domain knowledge that there exists some class S of invariance transforms s which can act on the input x while leaving the target y essentially unchanged. [sent-43, score-0.651]

13 We denote by {(s(x), y) : s ∈ S} the set of valid transformations of the pair (x, y). [sent-44, score-0.108]

14 For instance, we might believe that slight rotations (in pixel coordinates) of an input image in a pattern recognition problem do not change the image label. [sent-45, score-0.251]

15 For text classification problems such as spam filtering, we may believe that certain editing operations (such as changes in capitalization or substitutions like Viagra → V1agra,V! [sent-46, score-0.225]
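
A toy illustration (our own example, not the paper's actual spam transformations) of such an editing-invariance class: generating single-character substitutions whose label should not change.

```python
def spam_edit_variants(text, subs=None):
    """Generate simple one-character substitutions (e.g. Viagra -> V1agra)
    that a spam classifier is expected to be invariant to.  The substitution
    table here is a made-up example, not the one used in the paper."""
    subs = subs or {"i": "1!", "a": "@4", "o": "0", "e": "3"}
    variants = []
    for pos, ch in enumerate(text):
        for rep in subs.get(ch.lower(), ""):
            variants.append(text[:pos] + rep + text[pos + 1:])
    return variants
```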

16 For instance, rotating an image of the digit 6 too far might change its label to 9; applying both a substitution and an insertion can change Viagra → diagram. [sent-51, score-0.148]

17 Furthermore, certain invariances may only hold for certain pairs of input and target. [sent-52, score-0.661]

18 For example, we might believe that horizontal reflection is a valid invariance for images of digits in classes 0 and 8 but not for digits in class 2. [sent-53, score-0.573]

19 To complete the setup, we adopt the standard assumption that the world or task imposes a cost function such that if the true target for an input x is y and our prediction is ȳ(x), we suffer a cost ∆(y, ȳ(x)). [sent-56, score-0.305]

20 For learning with invariances, we extend the definition of ∆ to include the invariance function s(x), if any, which was applied to the input object: ∆(y, ȳ(s(x)), s). [sent-57, score-0.372]

21 This allows the cost to depend on the transformation; for instance, we might suffer less cost for poor predictions when the input has undergone very extreme transformations. [sent-58, score-0.311]

22 In an image labeling problem, for example, we might believe that a lighting/exposure invariance applies, but we might want to charge only a small cost for extremely over-exposed or under-exposed images since they are almost impossible to label. [sent-59, score-0.545]

23 Similarly, we might assert that scale invariance holds but give small cost to severely spatially downsampled images since they contain very little information. [sent-60, score-0.445]
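
As a toy illustration of such a transformation-dependent cost (our own example, not the paper's), a 0/1 misclassification cost can be discounted according to how severely the transform degraded the input:

```python
def invariance_sensitive_cost(y_true, y_pred, severity):
    """Delta(y, y_pred, s): charge full cost 1 for a wrong prediction on a
    mildly transformed input, and progressively less as the transform s
    becomes extreme (severity in [0, 1], e.g. amount of over-exposure or
    downsampling), since such inputs are nearly impossible to label."""
    if y_pred == y_true:
        return 0.0
    return 1.0 - min(max(severity, 0.0), 1.0)
```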

24 2.2 Max Margin Invariant Loss: Our approach to the invariant learning problem is very natural, yet allows us to make a surprising amount of analytical and algorithmic progress. [sent-62, score-0.174]

25 A key quantity is the cost under the worst case transformation for each example, i. [sent-63, score-0.221]

26 the transformation under which our predicted target suffers the greatest cost. [sent-65, score-0.137]

27 The function Γ(y, y′) is a nonnegative margin scaling which allows different target/prediction pairs to impose different amounts of loss on the final objective function. [sent-70, score-0.302]

28 The numerical scale of Γ also sets the regularization tradeoff between margin violations and the prediction cost ∆. [sent-71, score-0.238]

29 This loss function has two mathematically important properties which allow us to develop scalable and convergent algorithms as proposed above. [sent-72, score-0.213]

30 Lemma 1 The loss l(x, y, f ) is convex in f for any choice of Γ, ∆ and S. [sent-73, score-0.208]

31 Taking the supremum over a set of convex functions yields a convex function. [sent-75, score-0.183]

32 This means that we can plug l into any convex solver, in particular whenever f belongs to a linear function class, as is the case with kernel methods. [sent-76, score-0.123]

33 The primal (sub)gradient of l is easy to write: ∂_f l(x, y, f) = Γ(y, y*)(φ(s*(x), y*) − φ(s*(x), y)) (3), where s*, y* are the values of s, y for which the supremum in Eq. [sent-77, score-0.078]

34 In kernel methods φ is commonly referred to as the feature map, with associated kernel k((x, y), (x′, y′)) = ⟨φ(x, y), φ(x′, y′)⟩. [sent-79, score-0.108]

35 All we need is a computational recipe to obtain the worst case s ∈ S in terms of the scaled margin in Eq. [sent-81, score-0.261]
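
A minimal sketch of such a recipe for finite S and Y by exhaustive search. The loss form used below, l(x, y, f) = max over s, y′ of Γ(y, y′)[f(s(x), y′) − f(s(x), y)] + ∆(y, y′, s), is our margin-rescaled reading inferred from the subgradient in Eq. (3), not a verbatim copy of Eq. (2).

```python
def worst_case_violation(x, y, f, S, Y, Gamma, Delta):
    """Exhaustively search a finite invariance class S and label set Y for
    the pair (s*, y*) attaining the supremum of the (assumed) invariant
    loss; f(x, y) is the current score function, Gamma the margin scale,
    Delta the invariance-sensitive cost.  Returns (loss, s*, y*); the pair
    is exactly what the subgradient in Eq. (3) needs."""
    best_value, best_s, best_y = float("-inf"), None, None
    for s in S:
        sx = s(x)
        score_true = f(sx, y)
        for y_prime in Y:
            value = Gamma(y, y_prime) * (f(sx, y_prime) - score_true) + Delta(y, y_prime, s)
            if value > best_value:
                best_value, best_s, best_y = value, s, y_prime
    return best_value, best_s, best_y
```

If the identity transform is in S and ∆(y, y, ·) = 0, the returned value is nonnegative, consistent with the loss upper-bounding the cost (Lemma 2).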

36 Lemma 2 The loss l(x, y, f) provides an upper bound on C(x, y, f) = sup_{s∈S} ∆(y, ȳ(s(x)), s). [sent-84, score-0.139]

37 The main modifications are the inclusion of a margin scale Γ and the use of an invariance transform s(x). [sent-92, score-0.46]

38 In section 4 we clarify how a number of existing methods for dealing with invariances can be viewed as special cases of Eq. (2). [sent-93, score-0.65]

39 Eq. (2) penalizes estimation errors not only for the observed pair (x, y) but also for patterns s(x) which are “near” x in terms of the invariance transform s. [sent-96, score-0.333]

40 Recall, however, that the cost function ∆ may assign quite a small cost to a transformation s which takes x very far away from the original. [sent-97, score-0.242]

41 Such scaling has been shown to be extremely important and effective in many practical problems especially in structured prediction tasks. [sent-100, score-0.096]

42 For example, the key difference between the large margin settings of [14] and [16] is the incorporation of a sequence-length dependent margin scaling. [sent-101, score-0.254]

43 3 Learning Algorithms for Minimizing Invariant Loss: We now turn to the question of learning algorithms for our invariant loss function. [sent-102, score-0.313]

44 We follow the common approach of minimizing, at training time, our average training loss plus a penalty for model complexity. [sent-110, score-0.197]

45 In the context of kernel methods this can be viewed as a regularized empirical risk functional of the form R[f] = (1/m) Σ_{i=1}^{m} l(x_i, y_i, f) + (λ/2)‖f‖²_H, where f(x, y) = ⟨φ(x, y), w⟩. [sent-111, score-0.277]

46 subject to α_{iys} ≥ 0 and Σ_{y∈Y} Σ_{s∈S} α_{iys} ≤ 1/(λm) for i = 1, …, m (6a)–(6b). Here the entries of the kernel matrix K are given by K_{iys,jy′s′} = Γ(y_i, y) Γ(y_j, y′) ⟨φ(s(x_i), y) − φ(s(x_i), y_i), φ(s′(x_j), y′) − φ(s′(x_j), y_j)⟩ (7). This can be expanded into four kernel functions by using Eq. [sent-113, score-0.201]

47 Moreover, the connection between the dual coefficients α_{iys} and f is given by f(x′, y′) = Σ_{i=1}^{m} Σ_{y∈Y} Σ_{s∈S} α_{iys} [k((s(x_i), y), (x′, y′)) − k((s(x_i), y_i), (x′, y′))]. (8) [sent-115, score-0.138]

48 There are many strategies for attempting to minimize this regularized loss, either in the primal formulation or the dual, using either batch or online algorithms. [sent-116, score-0.128]
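
Each entry of K in Eq. (7) expands into four evaluations of the joint kernel k((x, y), (x′, y′)) = ⟨φ(x, y), φ(x′, y′)⟩; a direct sketch, assuming k is supplied by the user:

```python
def kernel_matrix_entry(k, Gamma, xi, yi, y, s, xj, yj, y2, s2):
    """K_{iys, jy's'} from Eq. (7), written out as four joint-kernel calls."""
    a, b = s(xi), s2(xj)
    inner = (k((a, y), (b, y2)) - k((a, y), (b, yj))
             - k((a, yi), (b, y2)) + k((a, yi), (b, yj)))
    return Gamma(yi, y) * Gamma(yj, y2) * inner
```

Only the user-supplied joint kernel is ever needed, which is what makes the approach a drop-in for existing kernel solvers.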

49 In fact, a number of previous heuristics for dealing with invariances can be viewed as approximately minimizing an approximation to an invariant loss similar to l. [sent-117, score-0.963]

50 For this reason we believe a discussion of optimization is valuable before introducing specific applications of the invariance loss. [sent-118, score-0.408]

51 Whenever there are unlimited combinations of valid transformations and targets (i. [sent-119, score-0.108]

52 the domain S × Y is infinite), the optimization above is a semi-infinite program, hence exact minimization of R[f ] or of its dual are essentially impossible. [sent-121, score-0.108]

53 3.1 A Variant of SVMStruct: The work of [16, 10] on SVMStruct-like optimization methods can be used directly to solve regularized risk minimization problems. [sent-126, score-0.135]

54 The basic idea is to compute gradients of l(xi , yi , f ), either one observation at a time, or for the entire set of observations simultaneously and to perform updates in the dual space. [sent-127, score-0.138]

55 We give an instance of SVMStruct for invariances in Algorithm 1. [sent-129, score-0.662]

56 The basic idea is that instead of checking the constraints arising from the loss function only for y, we check them for pairs (y, s), that is, an invariance in combination with a corresponding label which violates the margin most. [sent-130, score-0.599]
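
A schematic of that column-generation loop (Algorithm 1 in spirit), assuming a finite S, the worst_case_violation oracle sketched earlier, and caller-supplied routines for the restricted dual solve, the scorer of Eq. (8), and the per-example slack; all of those names are placeholders, not the paper's API.

```python
def column_generation_train(data, S, Y, Gamma, Delta,
                            solve_restricted_dual, make_scorer, slack_of,
                            eps=1e-3, max_rounds=100):
    """Cutting-plane / column-generation sketch: repeatedly add, for each
    example i, the pair (y, s) that currently violates the margin most,
    then re-solve the QP restricted to the working set.
    make_scorer(None, ...) should return the zero scorer and
    slack_of(i, None) should return 0 on the first round."""
    working_set = {i: [] for i in range(len(data))}
    alpha = None
    for _ in range(max_rounds):
        f = make_scorer(alpha, working_set, data)          # current f via Eq. (8)
        added = False
        for i, (x, y) in enumerate(data):
            loss, s_star, y_star = worst_case_violation(x, y, f, S, Y, Gamma, Delta)
            if loss > slack_of(i, alpha) + eps and (y_star, s_star) not in working_set[i]:
                working_set[i].append((y_star, s_star))    # new column / constraint
                added = True
        if not added:
            break                                          # eps-feasible: done
        alpha = solve_restricted_dual(working_set, data)   # re-optimize active set
    return alpha, working_set
```

Memory grows only with the number of active (y, s) pairs, which is the column-generation advantage mentioned in the abstract.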

57 Algorithm 2 is an adaptation of their method to learning with our convex invariance loss. [sent-141, score-0.402]

58 In a nutshell, the algorithm performs stochastic gradient descent on the regularized version of the instantaneous loss, using a learning rate of 1/(λt) and projecting the current weight vector back to the feasible region ‖f‖ ≤ √(2R[0]/λ), should it exceed it. [sent-142, score-0.199]
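
A sketch of this procedure for the linear / explicit-feature-map case, reusing the worst_case_violation oracle above; phi, Gamma and Delta are user-supplied, and R[0] is estimated as the average loss of the zero function. This is our illustration of the scheme, not the paper's exact implementation.

```python
import numpy as np

def invariant_pegasos(data, phi, S, Y, Gamma, Delta, lam, T):
    """Projected stochastic subgradient descent for f(x, y) = <w, phi(x, y)>:
    learning rate 1/(lam*t), subgradient from Eq. (3), and projection of w
    back onto the ball ||w|| <= sqrt(2 R[0] / lam) whenever it leaves it."""
    w = np.zeros_like(phi(*data[0]), dtype=float)
    zero_f = lambda sx, yy: 0.0
    R0 = np.mean([worst_case_violation(x, y, zero_f, S, Y, Gamma, Delta)[0]
                  for x, y in data])                      # R[0] = average loss of f = 0
    radius = np.sqrt(2.0 * max(R0, 0.0) / lam)
    for t in range(1, T + 1):
        x, y = data[(t - 1) % len(data)]
        f = lambda sx, yy: float(w @ phi(sx, yy))
        loss, s_star, y_star = worst_case_violation(x, y, f, S, Y, Gamma, Delta)
        grad = lam * w
        if loss > 0.0:
            sx = s_star(x)
            grad = grad + Gamma(y, y_star) * (phi(sx, y_star) - phi(sx, y))  # Eq. (3)
        w = w - grad / (lam * t)
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm                            # project back to feasible ball
    return w
```

The projection radius follows from R[f*] ≤ R[0]: since (λ/2)‖f*‖² ≤ R[f*], the optimum satisfies ‖f*‖ ≤ √(2R[0]/λ).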

59 We can apply [12, Lemma 1] immediately: Theorem 3 Denote by R_t[f] := l(x_{t mod m}, y_{t mod m}, f) + (λ/2)‖f‖² the instantaneous risk at step t. [sent-147, score-0.142]

60 In this case Algorithm 2 satisfies the following bound: (1/T) Σ_{t=1}^{T} R_t[f̄_t] ≤ (1/T) Σ_{t=1}^{T} R_t[f_t] ≤ min_{‖f‖ ≤ √(2R[0]/λ)} (1/T) Σ_{t=1}^{T} R_t[f] + R²(1 + log T)/(2λT). (9) [sent-148, score-0.152]

61 In particular, if T is a multiple of m we obtain bounds for the regularized risk R[f]. [sent-149, score-0.102]

62 4 Related work and specific invariances: While the previous sections gave a theoretical description of the loss, we now discuss a number of special cases which can be viewed as instances of the convex invariance loss function presented here. [sent-150, score-1.191]

63 An extension of this approach is to generate virtual points only from the support vectors (SVs) obtained from training on the original dataset [5]. [sent-152, score-0.242]

64 The advantage of this approach is that it results in far fewer SVs than training on all virtual points. [sent-153, score-0.16]

65 Our current loss based approach does optimize an objective, and generates the required support vectors in the process of the optimization. [sent-155, score-0.221]

66 Second Order Cone Programming for Missing and Uncertain Data: In [2], the authors consider the case where the invariance is in the form of ellipsoids around the original point. [sent-156, score-0.333]

67 This is shown to correspond to a second order cone program (SOCP). [sent-157, score-0.119]

68 Semidefinite Programming for Invariances: Graepel and Herbrich [8] introduce a method for learning when the invariances are polynomial trajectories. [sent-159, score-0.622]

69 Their formulation is again an instance of our general loss based approach. [sent-161, score-0.214]

70 Robust Estimation: Globerson and Roweis [7] address the case where invariances correspond to deletion of a subset of the features (i. [sent-163, score-0.666]

71 In fact, in the next section we introduce a generalization of the invariance in [7] and show how it can be optimized efficiently. [sent-170, score-0.333]

72 5 Experiments: Knowledge about invariances can be useful in a wide array of applications such as image recognition and document processing. [sent-171, score-0.707]

73 Here we study two specific cases: handwritten digit recognition on the MNIST data, and spam filtering on the ECML06 dataset. [sent-172, score-0.366]

74 Also, we take the margin scale Γ(y, y ) to be identically one. [sent-174, score-0.127]

75 5.1 Handwritten Digits Recognition: Humans can recognize handwritten digits even when they are altered in various ways. [sent-177, score-0.145]

76 To test our invariant SVM (Invar-SVM) in this context, we used handwritten digits from the MNIST dataset [11] and modeled 20 invariance transformations: 1-pixel and 2-pixel shifts in 4 and 8 directions, rotations by ±10 degrees, scaling by ±0. [sent-178, score-0.688]

77 To test the effect of learning with these invariances we used small training samples of 10, 20, . [sent-181, score-0.651]

78 In this setting invariances are particularly important since they can compensate for the insufficient training data. [sent-185, score-0.651]

79 We compared Invar-SVM to a related method where all possible transformations were applied in advance to each data point to create virtual samples. [sent-186, score-0.207]

80 The virtual and original samples were used to train a multiclass SVM (VIR-SVM). [sent-187, score-0.176]
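
A sketch of that baseline construction (a hypothetical helper, assuming the transforms in S are callables):

```python
def virtual_dataset(data, S):
    """Build the VIR-SVM style training set: every (s(x), y) for each
    transform s in S, together with the original (x, y)."""
    expanded = list(data)
    for x, y in data:
        for s in S:
            expanded.append((s(x), y))
    return expanded
```

Training on this expanded set is the VIR-SVM baseline; the invariant loss avoids materializing it and instead generates the needed support vectors during optimization.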

81 Finally, we also trained a multiclass SVM that did not use any invariance information (STD-SVM). [sent-188, score-0.378]

82 This comes at a certain cost of using more support vectors, but for Invar-SVM the number of support vectors is roughly half of that in the VIR-SVM. [sent-193, score-0.21]

83 5.2 SPAM Filtering: The task of detecting spam emails is a challenging machine learning problem. [sent-195, score-0.244]

84 One of the key difficulties with such data is that it can change over time as a result of attempts by spam authors to outwit spam filters [4]. [sent-196, score-0.366]

85 In this context, the spam filter should be invariant to the ways in which spam authors will change their style. [sent-197, score-0.54]

86 If documents are ... Figure 1: Results for the MNIST handwritten digits recognition task, comparing SVM trained on original samples (STD-SVM), SVM trained on original and virtual samples (VIR-SVM), and our convex invariance-loss method (Invar-SVM). [sent-199, score-0.4]

87 Right figure shows the number of support vectors corresponding to the optimum of each method. [sent-201, score-0.082]

88 Here we consider a somewhat more general invariance class (FSCALE) where word counts may be scaled by a maximum factor of u (e. [sent-203, score-0.366]

89 The invariances we consider are thus defined by s(x) = {x ◦ α : α ∈ [l, u]^d, l ≤ 1 ≤ u, #{i : α_i ≠ 1} ≤ K}, (10) where ◦ denotes the element-wise product, d is the number of features, and #{·} denotes the cardinality of the set. [sent-211, score-0.622]
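
For a binary linear model f(x) = ⟨w, x⟩ this invariance class admits a simple worst-case oracle; the greedy sketch below is our illustration of the idea, not necessarily the exact procedure used in the paper.

```python
import numpy as np

def worst_case_fscale(x, y, w, l, u, K):
    """Adversarial feature scaling within the FSCALE class of Eq. (10):
    scale at most K coordinates of x by a factor in [l, u] (l <= 1 <= u)
    so as to reduce the margin y * <w, x> as much as possible, for a
    binary label y in {-1, +1} and weight vector w."""
    contrib = y * w * x                        # per-coordinate margin contribution
    alpha_best = np.where(contrib > 0, l, u)   # shrink helpful coords, inflate harmful ones
    gain = contrib * (1.0 - alpha_best)        # margin lost if that coordinate is scaled
    alpha = np.ones_like(x, dtype=float)
    top = np.argsort(gain)[::-1][:K]           # K most damaging coordinates
    chosen = top[gain[top] > 0]                # only scale when it actually hurts the margin
    alpha[chosen] = alpha_best[chosen]
    return x * alpha, alpha
```

Setting l = 0 and u = 1 recovers the FDROP special case, in which at most K features are deleted.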

90 We evaluated the performance of our invariance loss FSCALE and its special case FDROP as well as the standard hinge loss on ECML’06 Discovery Challenge Task A dataset. [sent-214, score-0.674]

91 ecml06a-eval has 4000/7500 training/testing emails with dimensionality 206908, and ecml06a-tune has 4000/2500 training/testing emails with dimensionality 169620. [sent-216, score-0.122]

92 The loss FSCALE and its special case FDROP statistically significantly outperform the standard hinge loss (Hinge). [sent-235, score-0.341]

93 Our cost function is essentially a worst case margin loss, and thus its optimization only relies on finding the worst case invariance for a given data point and model. [sent-237, score-0.733]

94 This approach can allow us to solve invariance problems which previously required solving very large optimization problems (e. [sent-238, score-0.366]

95 We thus expect it to extend the scope of learning with invariances both in terms of the invariances used and efficiency of optimization. [sent-241, score-1.244]

96 A second order cone programming formulation for classifying missing data. [sent-262, score-0.169]

97 Lie transformation groups, integral transforms, and invariant pattern recognition. [sent-297, score-0.248]

98 Comparison of learning algorithms for handwritten digit recognition. [sent-338, score-0.128]

99 A scalable modular convex solver for regularized risk minimization. [sent-378, score-0.273]

100 Large margin methods for structured and interdependent output variables. [sent-385, score-0.16]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('invariances', 0.622), ('invariance', 0.333), ('spam', 0.183), ('invariant', 0.174), ('svmstruct', 0.156), ('iys', 0.153), ('loss', 0.139), ('virtual', 0.131), ('fscale', 0.128), ('margin', 0.127), ('ft', 0.11), ('fdrop', 0.102), ('yi', 0.093), ('cost', 0.084), ('transformations', 0.076), ('handwritten', 0.076), ('cone', 0.075), ('transformation', 0.074), ('mod', 0.071), ('convex', 0.069), ('digits', 0.069), ('mnist', 0.067), ('hinge', 0.063), ('worst', 0.063), ('pegasos', 0.061), ('emails', 0.061), ('regularized', 0.06), ('recognition', 0.055), ('solver', 0.055), ('kernel', 0.054), ('xi', 0.052), ('digit', 0.052), ('hanson', 0.051), ('iz', 0.051), ('viagra', 0.051), ('globerson', 0.051), ('si', 0.051), ('scalable', 0.047), ('solvers', 0.047), ('roweis', 0.047), ('editors', 0.046), ('qp', 0.045), ('multiclass', 0.045), ('supremum', 0.045), ('saul', 0.045), ('dual', 0.045), ('deletion', 0.044), ('giles', 0.044), ('rt', 0.044), ('support', 0.044), ('program', 0.044), ('svm', 0.043), ('risk', 0.042), ('believe', 0.042), ('kdd', 0.041), ('socp', 0.041), ('cowan', 0.041), ('simard', 0.041), ('instance', 0.04), ('transforms', 0.04), ('input', 0.039), ('insertion', 0.038), ('recipe', 0.038), ('vectors', 0.038), ('suffer', 0.036), ('teo', 0.036), ('graepel', 0.036), ('scaling', 0.036), ('formulation', 0.035), ('target', 0.035), ('nicta', 0.034), ('lecun', 0.034), ('sch', 0.033), ('optimization', 0.033), ('primal', 0.033), ('structured', 0.033), ('scaled', 0.033), ('australia', 0.033), ('semide', 0.032), ('smola', 0.032), ('valid', 0.032), ('incorporates', 0.031), ('appealing', 0.031), ('essentially', 0.03), ('programming', 0.03), ('broadly', 0.03), ('image', 0.03), ('australian', 0.029), ('perturbation', 0.029), ('training', 0.029), ('missing', 0.029), ('sdp', 0.028), ('nontrivial', 0.028), ('might', 0.028), ('viewed', 0.028), ('mathematically', 0.027), ('bottou', 0.027), ('thrun', 0.027), ('prediction', 0.027), ('rotation', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 62 nips-2007-Convex Learning with Invariances

Author: Choon H. Teo, Amir Globerson, Sam T. Roweis, Alex J. Smola

Abstract: Incorporating invariances into a learning algorithm is a common problem in machine learning. We provide a convex formulation which can deal with arbitrary loss functions and arbitrary invariances. In addition, it is a drop-in replacement for most optimization algorithms for kernels, including solvers of the SVMStruct family. The advantage of our setting is that it relies on column generation instead of modifying the underlying optimization problem directly.

2 0.17897195 118 nips-2007-Learning with Transformation Invariant Kernels

Author: Christian Walder, Olivier Chapelle

Abstract: This paper considers kernels invariant to translation, rotation and dilation. We show that no non-trivial positive definite (p.d.) kernels exist which are radial and dilation invariant, only conditionally positive definite (c.p.d.) ones. Accordingly, we discuss the c.p.d. case and provide some novel analysis, including an elementary derivation of a c.p.d. representer theorem. On the practical side, we give a support vector machine (s.v.m.) algorithm for arbitrary c.p.d. kernels. For the thinplate kernel this leads to a classifier with only one parameter (the amount of regularisation), which we demonstrate to be as effective as an s.v.m. with the Gaussian kernel, even though the Gaussian involves a second parameter (the length scale). 1

3 0.16962747 179 nips-2007-SpAM: Sparse Additive Models

Author: Han Liu, Larry Wasserman, John D. Lafferty, Pradeep K. Ravikumar

Abstract: We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties of SpAM is given together with empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models in high dimensional data. 1

4 0.11836845 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning

Author: Krishnan Kumar, Chiru Bhattacharya, Ramesh Hariharan

Abstract: This paper investigates the application of randomized algorithms for large scale SVM learning. The key contribution of the paper is to show that, by using ideas random projections, the minimal number of support vectors required to solve almost separable classification problems, such that the solution obtained is near optimal with a very high probability, is given by O(log n); if on removal of properly chosen O(log n) points the data becomes linearly separable then it is called almost separable. The second contribution is a sampling based algorithm, motivated from randomized algorithms, which solves a SVM problem by considering subsets of the dataset which are greater in size than the number of support vectors for the problem. These two ideas are combined to obtain an algorithm for SVM classification problems which performs the learning by considering only O(log n) points at a time. Experiments done on synthetic and real life datasets show that the algorithm does scale up state of the art SVM solvers in terms of memory required and execution time without loss in accuracy. It is to be noted that the algorithm presented here nicely complements existing large scale SVM learning approaches as it can be used to scale up any SVM solver. 1

5 0.11252587 40 nips-2007-Bundle Methods for Machine Learning

Author: Quoc V. Le, Alex J. Smola, S.v.n. Vishwanathan

Abstract: We present a globally convergent method for regularized risk minimization problems. Our method applies to Support Vector estimation, regression, Gaussian Processes, and any other regularized risk minimization setting which leads to a convex optimization problem. SVMPerf can be shown to be a special case of our approach. In addition to the unified framework we present tight convergence bounds, which show that our algorithm converges in O(1/ ) steps to precision for general convex problems and in O(log(1/ )) steps for continuously differentiable problems. We demonstrate in experiments the performance of our approach. 1

6 0.096471436 21 nips-2007-Adaptive Online Gradient Descent

7 0.091997646 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators

8 0.087746926 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels

9 0.087285317 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

10 0.08441902 112 nips-2007-Learning Monotonic Transformations for Classification

11 0.080350004 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering

12 0.07994362 134 nips-2007-Multi-Task Learning via Conic Programming

13 0.074784569 166 nips-2007-Regularized Boost for Semi-Supervised Learning

14 0.073599711 187 nips-2007-Structured Learning with Approximate Inference

15 0.071128726 110 nips-2007-Learning Bounds for Domain Adaptation

16 0.068110146 106 nips-2007-Invariant Common Spatial Patterns: Alleviating Nonstationarities in Brain-Computer Interfacing

17 0.065741532 25 nips-2007-An in-silico Neural Model of Dynamic Routing through Neuronal Coherence

18 0.062878646 160 nips-2007-Random Features for Large-Scale Kernel Machines

19 0.061808303 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning

20 0.057953861 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.224), (1, 0.003), (2, -0.098), (3, 0.124), (4, 0.026), (5, 0.013), (6, 0.096), (7, 0.021), (8, -0.051), (9, 0.106), (10, 0.171), (11, -0.05), (12, 0.028), (13, 0.058), (14, 0.003), (15, -0.052), (16, -0.011), (17, -0.026), (18, 0.026), (19, 0.04), (20, -0.031), (21, -0.077), (22, 0.084), (23, 0.01), (24, -0.065), (25, -0.064), (26, -0.181), (27, 0.044), (28, -0.086), (29, -0.002), (30, 0.104), (31, -0.051), (32, -0.035), (33, -0.02), (34, -0.115), (35, -0.029), (36, -0.127), (37, -0.022), (38, 0.122), (39, -0.056), (40, -0.212), (41, -0.002), (42, -0.158), (43, -0.107), (44, -0.065), (45, 0.015), (46, -0.015), (47, -0.09), (48, -0.076), (49, -0.0)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93086141 62 nips-2007-Convex Learning with Invariances

Author: Choon H. Teo, Amir Globerson, Sam T. Roweis, Alex J. Smola

Abstract: Incorporating invariances into a learning algorithm is a common problem in machine learning. We provide a convex formulation which can deal with arbitrary loss functions and arbitrary invariances. In addition, it is a drop-in replacement for most optimization algorithms for kernels, including solvers of the SVMStruct family. The advantage of our setting is that it relies on column generation instead of modifying the underlying optimization problem directly.

2 0.67664003 118 nips-2007-Learning with Transformation Invariant Kernels

Author: Christian Walder, Olivier Chapelle

Abstract: This paper considers kernels invariant to translation, rotation and dilation. We show that no non-trivial positive definite (p.d.) kernels exist which are radial and dilation invariant, only conditionally positive definite (c.p.d.) ones. Accordingly, we discuss the c.p.d. case and provide some novel analysis, including an elementary derivation of a c.p.d. representer theorem. On the practical side, we give a support vector machine (s.v.m.) algorithm for arbitrary c.p.d. kernels. For the thinplate kernel this leads to a classifier with only one parameter (the amount of regularisation), which we demonstrate to be as effective as an s.v.m. with the Gaussian kernel, even though the Gaussian involves a second parameter (the length scale). 1

3 0.61970776 179 nips-2007-SpAM: Sparse Additive Models

Author: Han Liu, Larry Wasserman, John D. Lafferty, Pradeep K. Ravikumar

Abstract: We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties of SpAM is given together with empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models in high dimensional data. 1

4 0.60189646 24 nips-2007-An Analysis of Inference with the Universum

Author: Olivier Chapelle, Alekh Agarwal, Fabian H. Sinz, Bernhard Schölkopf

Abstract: We study a pattern classification algorithm which has recently been proposed by Vapnik and coworkers. It builds on a new inductive principle which assumes that in addition to positive and negative data, a third class of data is available, termed the Universum. We assay the behavior of the algorithm by establishing links with Fisher discriminant analysis and oriented PCA, as well as with an SVM in a projected subspace (or, equivalently, with a data-dependent reduced kernel). We also provide experimental results. 1

5 0.59404904 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning

Author: Krishnan Kumar, Chiru Bhattacharya, Ramesh Hariharan

Abstract: This paper investigates the application of randomized algorithms for large scale SVM learning. The key contribution of the paper is to show that, by using ideas random projections, the minimal number of support vectors required to solve almost separable classification problems, such that the solution obtained is near optimal with a very high probability, is given by O(log n); if on removal of properly chosen O(log n) points the data becomes linearly separable then it is called almost separable. The second contribution is a sampling based algorithm, motivated from randomized algorithms, which solves a SVM problem by considering subsets of the dataset which are greater in size than the number of support vectors for the problem. These two ideas are combined to obtain an algorithm for SVM classification problems which performs the learning by considering only O(log n) points at a time. Experiments done on synthetic and real life datasets show that the algorithm does scale up state of the art SVM solvers in terms of memory required and execution time without loss in accuracy. It is to be noted that the algorithm presented here nicely complements existing large scale SVM learning approaches as it can be used to scale up any SVM solver. 1

6 0.53481919 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

7 0.5189172 152 nips-2007-Parallelizing Support Vector Machines on Distributed Computers

8 0.51491982 40 nips-2007-Bundle Methods for Machine Learning

9 0.49700502 112 nips-2007-Learning Monotonic Transformations for Classification

10 0.49177912 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels

11 0.48574603 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators

12 0.46456119 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition

13 0.43004474 67 nips-2007-Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation

14 0.4188756 101 nips-2007-How SVMs can estimate quantiles and the median

15 0.38882971 25 nips-2007-An in-silico Neural Model of Dynamic Routing through Neuronal Coherence

16 0.38708878 160 nips-2007-Random Features for Large-Scale Kernel Machines

17 0.38479686 79 nips-2007-Efficient multiple hyperparameter learning for log-linear models

18 0.38012478 21 nips-2007-Adaptive Online Gradient Descent

19 0.37559646 187 nips-2007-Structured Learning with Approximate Inference

20 0.36615244 6 nips-2007-A General Boosting Method and its Application to Learning Ranking Functions for Web Search


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.035), (13, 0.45), (16, 0.015), (21, 0.066), (31, 0.013), (34, 0.023), (35, 0.023), (47, 0.064), (49, 0.013), (83, 0.124), (85, 0.028), (87, 0.014), (90, 0.044)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.96279001 14 nips-2007-A configurable analog VLSI neural network with spiking neurons and self-regulating plastic synapses

Author: Massimiliano Giulioni, Mario Pannunzi, Davide Badoni, Vittorio Dante, Paolo D. Giudice

Abstract: We summarize the implementation of an analog VLSI chip hosting a network of 32 integrate-and-fire (IF) neurons with spike-frequency adaptation and 2,048 Hebbian plastic bistable spike-driven stochastic synapses endowed with a selfregulating mechanism which stops unnecessary synaptic changes. The synaptic matrix can be flexibly configured and provides both recurrent and AER-based connectivity with external, AER compliant devices. We demonstrate the ability of the network to efficiently classify overlapping patterns, thanks to the self-regulating mechanism.

2 0.94452423 191 nips-2007-Temporal Difference Updating without a Learning Rate

Author: Marcus Hutter, Shane Legg

Abstract: We derive an equation for temporal difference learning from statistical principles. Specifically, we start with the variational principle and then bootstrap to produce an updating rule for discounted state value estimates. The resulting equation is similar to the standard equation for temporal difference learning with eligibility traces, so called TD(λ), however it lacks the parameter α that specifies the learning rate. In the place of this free parameter there is now an equation for the learning rate that is specific to each state transition. We experimentally test this new learning rule against TD(λ) and find that it offers superior performance in various settings. Finally, we make some preliminary investigations into how to extend our new temporal difference algorithm to reinforcement learning. To do this we combine our update equation with both Watkins’ Q(λ) and Sarsa(λ) and find that it again offers superior performance without a learning rate parameter. 1

3 0.92953634 71 nips-2007-Discriminative Keyword Selection Using Support Vector Machines

Author: Fred Richardson, William M. Campbell

Abstract: Many tasks in speech processing involve classification of long term characteristics of a speech segment such as language, speaker, dialect, or topic. A natural technique for determining these characteristics is to first convert the input speech into a sequence of tokens such as words, phones, etc. From these tokens, we can then look for distinctive sequences, keywords, that characterize the speech. In many applications, a set of distinctive keywords may not be known a priori. In this case, an automatic method of building up keywords from short context units such as phones is desirable. We propose a method for the construction of keywords based upon Support Vector Machines. We cast the problem of keyword selection as a feature selection problem for n-grams of phones. We propose an alternating filter-wrapper method that builds successively longer keywords. Application of this method to language recognition and topic recognition tasks shows that the technique produces interesting and significant qualitative and quantitative results.

4 0.89777112 22 nips-2007-Agreement-Based Learning

Author: Percy Liang, Dan Klein, Michael I. Jordan

Abstract: The learning of probabilistic models with many hidden variables and nondecomposable dependencies is an important and challenging problem. In contrast to traditional approaches based on approximate inference in a single intractable model, our approach is to train a set of tractable submodels by encouraging them to agree on the hidden variables. This allows us to capture non-decomposable aspects of the data while still maintaining tractability. We propose an objective function for our approach, derive EM-style algorithms for parameter estimation, and demonstrate their effectiveness on three challenging real-world learning tasks. 1

same-paper 5 0.86518228 62 nips-2007-Convex Learning with Invariances

Author: Choon H. Teo, Amir Globerson, Sam T. Roweis, Alex J. Smola

Abstract: Incorporating invariances into a learning algorithm is a common problem in machine learning. We provide a convex formulation which can deal with arbitrary loss functions and arbitrary invariances. In addition, it is a drop-in replacement for most optimization algorithms for kernels, including solvers of the SVMStruct family. The advantage of our setting is that it relies on column generation instead of modifying the underlying optimization problem directly.

6 0.69982916 95 nips-2007-HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation

7 0.64794761 84 nips-2007-Expectation Maximization and Posterior Constraints

8 0.62905049 117 nips-2007-Learning to classify complex patterns using a VLSI network of spiking neurons

9 0.58789837 102 nips-2007-Incremental Natural Actor-Critic Algorithms

10 0.56934971 9 nips-2007-A Probabilistic Approach to Language Change

11 0.56400192 205 nips-2007-Theoretical Analysis of Learning with Reward-Modulated Spike-Timing-Dependent Plasticity

12 0.55601525 98 nips-2007-Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion

13 0.55099744 86 nips-2007-Exponential Family Predictive Representations of State

14 0.5464952 151 nips-2007-Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs

15 0.54018855 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

16 0.54014838 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons

17 0.5385012 63 nips-2007-Convex Relaxations of Latent Variable Training

18 0.5297181 24 nips-2007-An Analysis of Inference with the Universum

19 0.52399385 118 nips-2007-Learning with Transformation Invariant Kernels

20 0.52161539 40 nips-2007-Bundle Methods for Machine Learning