nips nips2008 nips2008-226 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Julien Mairal, Jean Ponce, Guillermo Sapiro, Andrew Zisserman, Francis R. Bach
Abstract: It is now well established that sparse signal models are well suited for restoration tasks and can be effectively learned from audio, image, and video data. Recent research has been aimed at learning discriminative sparse models instead of purely reconstructive ones. This paper proposes a new step in that direction, with a novel sparse representation for signals belonging to different classes in terms of a shared dictionary and discriminative class models. The linear version of the proposed model admits a simple probabilistic interpretation, while its most general variant admits an interpretation in terms of kernels. An optimization framework for learning all the components of the proposed model is presented, along with experimental results on standard handwritten digit and texture classification tasks. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract It is now well established that sparse signal models are well suited for restoration tasks and can be effectively learned from audio, image, and video data. [sent-12, score-0.361]
2 Recent research has been aimed at learning discriminative sparse models instead of purely reconstructive ones. [sent-13, score-0.722]
3 This paper proposes a new step in that direction, with a novel sparse representation for signals belonging to different classes in terms of a shared dictionary and discriminative class models. [sent-14, score-1.058]
4 The linear version of the proposed model admits a simple probabilistic interpretation, while its most general variant admits an interpretation in terms of kernels. [sent-15, score-0.143]
5 An optimization framework for learning all the components of the proposed model is presented, along with experimental results on standard handwritten digit and texture classification tasks. [sent-16, score-0.268]
6 1 Introduction Sparse and overcomplete image models were first introduced in [1] for modeling the spatial receptive fields of simple cells in the human visual system. [sent-17, score-0.119]
7 In [3, 4], sparse decompositions are used with predefined dictionaries for face and signal recognition. [sent-21, score-0.545]
8 In [5], dictionaries are learned for a reconstruction task, and the corresponding sparse models are used as features in an SVM. [sent-22, score-0.536]
9 In [6], a discriminative method is introduced for various classification tasks, learning one dictionary per class; the classification process itself is based on the corresponding reconstruction error, and does not exploit the actual decomposition coefficients. [sent-23, score-0.891]
10 In [7], a generative model for documents is learned at the same time as the parameters of a deep network structure. [sent-24, score-0.226]
11 In [8], multi-task learning is performed by learning features and tasks are selected using a sparsity criterion. [sent-25, score-0.09]
12 Joint generative/discriminative frameworks of this kind have appeared in probabilistic settings, e.g., [10, 11, 12, 13, 14], and in neural networks [15], but not, to the best of our knowledge, in the sparse dictionary learning framework. [sent-29, score-0.662]
13 Section 2 presents a formulation for learning a dictionary tuned for a classification task, which we call supervised dictionary learning, and Section 3 its interpretation in terms of probability and kernel frameworks. [sent-30, score-1.204]
14 2 Supervised dictionary learning We present in this section the core of the proposed model. [sent-32, score-0.498]
15 In classical sparse coding tasks, one considers a signal x in Rn and a fixed dictionary D = [d1, ..., dk] in Rn×k (allowing k > n, making the dictionary overcomplete). [sent-33, score-0.92] [sent-36, score-0.498]
17 In this setting, sparse coding with an ℓ1 regularization amounts to computing R⋆(x, D) = min_{α∈Rk} ||x − Dα||_2^2 + λ1||α||_1. (1) [sent-37, score-0.375]
18 It is well known in the statistics, optimization, and compressed sensing communities that the ℓ1 penalty yields a sparse solution, with very few non-zero coefficients in α, although there is no explicit analytic link between the value of λ1 and the effective sparsity that this model yields. [sent-38, score-0.25]
19 Note that sparse coding with an ℓ0 penalty is an NP-hard problem and is often approximated using greedy algorithms. [sent-41, score-0.291]
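Eq. (1) is the standard ℓ1-regularized (lasso) sparse coding problem. The sketch below, in Python/numpy, solves it with a generic ISTA-style iteration; this is an illustration under that reading, not the solver used by the authors, and the function names are ours.

```python
import numpy as np

def soft_threshold(v, t):
    """Entrywise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(x, D, lam1, n_iter=200):
    """Approximate alpha* = argmin_alpha ||x - D alpha||_2^2 + lam1 * ||alpha||_1 (Eq. 1) with ISTA."""
    alpha = np.zeros(D.shape[1])
    L = 2.0 * np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth reconstruction term
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ alpha - x)       # gradient of ||x - D alpha||_2^2
        alpha = soft_threshold(alpha - grad / L, lam1 / L)
    return alpha
```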
20 We consider a training set of m labeled signals (xi)_{i=1}^m in Rn, associated with binary labels (yi ∈ {−1, +1})_{i=1}^m. [sent-44, score-0.114]
21 Our goal is to learn jointly a single dictionary D adapted to the classification task and a function f which should be positive for any signal in class +1 and negative otherwise. [sent-45, score-0.65]
22 We consider in this paper two different models to use the sparse code α for the classification task: (i) linear in α: f(x, α, θ) = w^T α + b, where θ = {w ∈ Rk, b ∈ R} parametrizes the model. [sent-46, score-0.189]
23 (ii) bilinear in x and α: f(x, α, θ) = x^T Wα + b, where θ = {W ∈ Rn×k, b ∈ R}. [sent-47, score-0.171]
24 In this case, the model is bilinear and f acts on both x and its sparse code α. [sent-48, score-0.358]
25 Note that one can interpret W as a linear filter encoding the input signal x into a model for the coefficients α, which has a role similar to the encoder in [18] but for a discriminative task. [sent-50, score-0.395]
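For concreteness, the two decision functions can be transcribed directly; w, W, and b are taken as already learned in this sketch.

```python
import numpy as np

def f_linear(alpha, w, b):
    """Model (i): f(x, alpha, theta) = w^T alpha + b, acting on the sparse code only."""
    return float(w @ alpha + b)

def f_bilinear(x, alpha, W, b):
    """Model (ii): f(x, alpha, theta) = x^T W alpha + b, acting on both the signal and its code."""
    return float(x @ W @ alpha + b)
```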
26 Such a constraint is classical in sparse coding [2]. [sent-52, score-0.343]
27 In this setting, the classification procedure of a new signal x with an unknown label y, given a learned dictionary D and parameters θ, involves supervised sparse coding: min_{y∈{−1,+1}} S⋆(x, D, θ, y). (6) [sent-57, score-1.058]
28 The learning procedure of Eq. (4) minimizes the sum of the costs for the pairs (xi, yi)_{i=1}^m and corresponds to a generative model. [sent-58, score-0.273]
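A sketch of the test-time rule of Eq. (6): the unknown label is set to whichever choice of y gives the lower supervised sparse coding cost S⋆. The helper `supervised_sparse_code` is hypothetical, since the exact definition of S is not reproduced in this extract; it is assumed to return the optimal code and the corresponding value of S for a candidate label.

```python
import numpy as np

def classify(x, D, theta, supervised_sparse_code):
    """Test-time rule of Eq. (6): pick the label with the smallest supervised sparse coding cost S*."""
    best_label, best_cost = None, np.inf
    for y in (-1, +1):
        _, cost = supervised_sparse_code(x, D, theta, y)   # hypothetical helper: (code, S* value)
        if cost < best_cost:
            best_label, best_cost = y, cost
    return best_label
```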
29 We will refer later to this model as SDL-G (supervised dictionary learning, generative). The ℓ1 norm of a vector x of size n is defined as ||x||_1 = Σ_{i=1}^n |x[i]|. [sent-59, score-0.56]
30 Figure 1: Graphical model for the proposed generative/discriminative learning framework. [sent-65, score-0.286]
31 Note the explicit incorporation of the reconstructive and discriminative component into sparse coding, in addition to the classical reconstructive term (see [9] for a different classification component). [sent-67, score-1.064]
32 This leads to: min_{D,θ} Σ_{i=1}^m C(S⋆(xi, D, θ, −yi) − S⋆(xi, D, θ, yi)) + λ2||θ||_2^2. (7) [sent-70, score-0.144]
33 As detailed below, this problem is more difficult to solve than (4), and therefore we adopt instead a mixed formulation between the minimization of the generative Eq. [sent-71, score-0.228]
34 (4) and its discriminative version (7) (see also [13]); that is, min_{D,θ} Σ_{i=1}^m µC(S⋆(xi, D, θ, −yi) − S⋆(xi, D, θ, yi)) + (1 − µ)S⋆(xi, D, θ, yi) + λ2||θ||_2^2, (8) where µ controls the trade-off between the reconstruction from Eq. [sent-72, score-0.63]
35 This is the proposed generative/discriminative model for sparse signal representation and classification from learned dictionary D and model θ. [sent-75, score-0.854]
36 We will refer to this mixed model as SDL-D (supervised dictionary learning, discriminative). [sent-76, score-0.562]
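For fixed per-sample optimal costs, the mixed objective of Eq. (8) can be written out as below. The choice C(u) = log(1 + exp(−u)) is an assumption consistent with the logistic/softmax discussion that follows; S_wrong and S_right stand for S⋆(xi, D, θ, −yi) and S⋆(xi, D, θ, yi).

```python
import numpy as np

def C(u):
    """Assumed logistic cost C(u) = log(1 + exp(-u)); small when u = S*(wrong) - S*(right) is large."""
    return np.logaddexp(0.0, -u)

def sdl_d_objective(S_wrong, S_right, theta_vec, mu, lam2):
    """Mixed cost of Eq. (8) for fixed per-sample optimal costs (arrays of length m)."""
    discriminative = C(S_wrong - S_right).sum()   # mu-weighted classification term
    generative = S_right.sum()                    # (1 - mu)-weighted generative term
    return float(mu * discriminative + (1.0 - mu) * generative + lam2 * np.dot(theta_vec, theta_vec))
```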
37 All of these formulations admit a straightforward multiclass extension, using softmax discriminative cost functions Ci(x1, ..., xp) = log(Σ_{j=1}^p e^{xj − xi}), which are multiclass versions of the logistic function, and learning one model θi per class. [sent-78, score-0.345] [sent-81, score-0.139]
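The softmax cost Ci transcribed directly (zero-based indexing, computed in a numerically stable way):

```python
import numpy as np

def softmax_cost(scores, i):
    """C_i(x_1, ..., x_p) = log(sum_j exp(x_j - x_i)); approaches zero when the true-class
    score x_i dominates all others."""
    s = np.asarray(scores, dtype=float)
    m = s.max()
    return float(m - s[i] + np.log(np.exp(s - m).sum()))
```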
39 Before presenting the optimization procedure, we provide below two interpretations of the linear and bilinear versions of our formulation in terms of a probabilistic graphical model and a kernel. [sent-84, score-0.315]
40 • The signals xi are generated according to a Gaussian probability distribution conditioned on D and αi: p(xi|αi, D) ∝ e^{−λ0||xi − Dαi||_2^2}. [sent-90, score-0.182]
41 All the xi ’s are considered independent from each other. [sent-91, score-0.119]
42 • The labels yi are generated according to a probability distribution conditioned on w and αi, given by p(yi = ε|αi, w) = e^{−εw^T αi} / (e^{−w^T αi} + e^{w^T αi}). [sent-92, score-0.144]
43 Given D and w, all the triplets (αi , xi , yi ) are independent. [sent-93, score-0.263]
44 Generative training of this model (e.g., [12, 13]) amounts to finding the maximum likelihood estimates for D and w according to the joint distribution p({xi, yi}_{i=1}^m, D, w), where the xi's and the yi's are the training signals and their labels respectively. [sent-96, score-0.561]
45 It can easily be shown (details omitted due to space limitations) that there is an equivalence between this generative training and our formulation in Eq. [sent-97, score-0.172]
46 Although joint generative modeling of x and y through a shared representation has shown great promise [10], we show in this paper that a more discriminative approach is desirable. [sent-99, score-0.382]
47 The same kind of MAP approximation relates this discriminative training formulation to the discriminative model of Eq. [sent-101, score-0.651]
48 Eq. (8) is a classical trade-off between generative and discriminative training (e.g., [12, 13]), where generative components are often added to discriminative frameworks to add robustness. [sent-104, score-0.4] [sent-106, score-0.378]
50 2 A kernel interpretation of the bilinear model Our bilinear model with f (x, α, θ) = xT Wα + b does not admit a straightforward probabilistic interpretation. [sent-110, score-0.484]
51 On the other hand, it can easily be interpreted in terms of kernels: given two signals x1 and x2, with coefficients α1 and α2, using the kernel K(x1, x2) = α1^T α2 x1^T x2 in a logistic regression classifier amounts to finding a decision function of the same form as f. [sent-111, score-0.171]
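The kernel is simply the product of two inner products, one between codes and one between signals; a sketch:

```python
import numpy as np

def sdl_kernel(x1, alpha1, x2, alpha2):
    """K((x1, alpha1), (x2, alpha2)) = (alpha1^T alpha2) * (x1^T x2), matching the bilinear model."""
    return float(np.dot(alpha1, alpha2) * np.dot(x1, x2))
```

Plugging this kernel into a logistic regression, or any other kernel classifier, yields decision functions of the same bilinear form x^T Wα + b.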
52 [5] learn a dictionary adapted to reconstruction on a training set, then train an SVM a posteriori on the decomposition coefficients α. [sent-114, score-0.745]
53 4 Optimization procedure Classical dictionary learning techniques (e.g., [1, 5, 19]) address the problem of learning a reconstructive dictionary D in Rn×k well adapted to a training set, which is presented in Eq. [sent-117, score-0.535] [sent-119, score-0.89]
55 It can be seen as an optimization problem with respect to the dictionary D and the coefficients α. [sent-121, score-0.53]
56 Equation (4) can be rewritten as: min_{D,θ,α} Σ_{i=1}^m S(xi, αi, D, θ, yi) + λ2||θ||_2^2, subject to the usual norm constraint on the columns of D. (9) [sent-128, score-0.144]
57 Block coordinate descent therefore consists of iterating between supervised sparse coding, where D and θ are fixed and one optimizes with respect to the α's, and supervised dictionary update, where the coefficients αi are fixed but D and θ are updated. [sent-134, score-0.908]
58 To reach a local minimum for this difficult non-convex optimization problem, we have chosen a continuation method, starting from the generative case and ending with the discriminative one as in [6]. [sent-140, score-0.451]
59 1 Supervised sparse coding The supervised sparse coding problem from Eq. [sent-143, score-0.683]
60 The fixed-point continuation method (FPC) is used for this step. (Footnote: we are also investigating how to properly estimate D by marginalizing over α instead of maximizing with respect to α.) [sent-145, score-0.093]
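The exact FPC solver is not reproduced in this extract. The sketch below is a generic proximal-gradient (ISTA-style) stand-in for the supervised sparse coding step with the linear model, under the assumption, consistent with the surrounding text, that S(α, x, D, θ, y) combines the classification cost C(y f(x, α, θ)) with λ0 times the reconstruction error and an ℓ1 penalty; the fixed step size is a simplification.

```python
import numpy as np

def soft_threshold(v, t):
    """Entrywise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def supervised_sparse_code_linear(x, y, D, w, b, lam0, lam1, step=1e-2, n_iter=500):
    """Proximal-gradient sketch (not FPC) for the assumed cost
    S(alpha) = log(1 + exp(-y*(w @ alpha + b))) + lam0*||x - D @ alpha||_2^2 + lam1*||alpha||_1.
    Returns (alpha, S(alpha))."""
    alpha = np.zeros(D.shape[1])
    for _ in range(n_iter):
        residual = x - D @ alpha
        u = y * (w @ alpha + b)
        # Gradient of the smooth part: logistic classification cost plus reconstruction term.
        grad = -(1.0 / (1.0 + np.exp(u))) * y * w - 2.0 * lam0 * (D.T @ residual)
        alpha = soft_threshold(alpha - step * grad, step * lam1)
    residual = x - D @ alpha
    cost = (np.logaddexp(0.0, -y * (w @ alpha + b))
            + lam0 * float(residual @ residual)
            + lam1 * float(np.abs(alpha).sum()))
    return alpha, float(cost)
```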
61 Input: n (signal dimension); (xi, yi)_{i=1}^m (training signals and labels); k (size of the dictionary); λ0, λ1, λ2 (parameters); 0 ≤ µ1 ≤ µ2 ≤ ... ≤ µm (increasing sequence for the continuation path).
62 Loop: Repeat until convergence (or a fixed number of iterations):
63 • Supervised sparse coding: solve, for all i = 1, ..., m, α⋆_{i,−} = arg min_α S(α, xi, D, θ, −1) and α⋆_{i,+} = arg min_α S(α, xi, D, θ, +1). (10) • Dictionary and parameters update: solve min_{D,θ} Σ_{i=1}^m µC(S(α⋆_{i,−}, xi, D, θ, −yi) − S(α⋆_{i,+}, xi, D, θ, yi)) + (1 − µ)S(α⋆_{i,yi}, xi, D, θ, yi) + λ2||θ||_2^2, subject to the usual norm constraint on the columns of D. (11)
64 Figure 2: SDL: Supervised dictionary learning algorithm.
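Putting the pieces of Figure 2 together, the outer loop alternates the two steps while µ increases along the continuation path. The sketch keeps the two inner solvers as callables because their implementations (FPC for the codes, projected gradient for D and θ) are not spelled out here; renormalizing the columns is one standard way to enforce the unit-norm constraint mentioned earlier.

```python
import numpy as np

def sdl_train(X, y, D0, theta0, mus, sparse_code_step, dictionary_update_step):
    """Skeleton of the SDL loop of Figure 2.

    X: (m, n) array of signals; y: (m,) labels in {-1, +1}.
    sparse_code_step(x, label, D, theta) -> optimal code for that candidate label (Eq. 10).
    dictionary_update_step(X, y, codes_pos, codes_neg, D, theta, mu) -> updated (D, theta) (Eq. 11).
    """
    D, theta = D0, theta0
    for mu in mus:                                   # continuation path, mu increasing towards 1
        # Supervised sparse coding step: D and theta fixed, codes computed for both candidate labels.
        codes_pos = [sparse_code_step(x, +1, D, theta) for x in X]
        codes_neg = [sparse_code_step(x, -1, D, theta) for x in X]
        # Dictionary and parameter update step: codes fixed, D and theta updated.
        D, theta = dictionary_update_step(X, y, codes_pos, codes_neg, D, theta, mu)
        # Project columns of D back onto the unit ball (keeps ||d_j||_2 <= 1).
        D = D / np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1.0)
    return D, theta
```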
65 Eq. (11) is not convex in general (except when µ is close to 0), but a local minimum can be obtained using projected gradient descent (as in the general literature on dictionary learning, this local minimum has experimentally been found to be good enough in terms of classification performance). [sent-169, score-0.526]
66 Partial derivatives when using our model with multiple classes or with the bilinear models f(x, α, θ) = x^T Wα + b are not presented in this paper due to space limitations. [sent-174, score-0.225]
67 We also compare the performance of the linear (L) and bilinear (BL) models. [sent-176, score-0.196]
68 We first define the sparsity parameter κ = λ1/λ0, which dictates how sparse the decompositions are. [sent-192, score-0.274]
69 For reconstructive tasks, a typical value often used in the literature (e. [sent-195, score-0.29]
70 Nevertheless, for discriminative tasks, increasing the number of parameters is likely to lead to overfitting, and smaller values like k = 64 or k = 32 are preferred. [sent-198, score-0.299]
71 As in logistic regression or support vector machines, this parameter is crucial when the number of training samples is small. [sent-200, score-0.096]
72 Then, a scale factor making the costs S⋆ discriminative for µ > 0 can be chosen during the optimization process: given a set of computed costs S⋆, one can compute a scale factor γ⋆ such that γ⋆ = arg min_γ Σ_{i=1}^m C(γ(S⋆(xi, D, θ, −yi) − S⋆(xi, D, θ, yi))). [sent-204, score-0.542]
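The scale factor γ⋆ solves a one-dimensional problem and can be found by a simple grid search; the sketch below again assumes the logistic form of C.

```python
import numpy as np

def best_scale(S_wrong, S_right, gammas=None):
    """Grid search for gamma* = argmin_gamma sum_i C(gamma * (S*_i(-y_i) - S*_i(y_i))),
    assuming C(u) = log(1 + exp(-u))."""
    if gammas is None:
        gammas = np.logspace(-3, 3, 61)
    diffs = S_wrong - S_right
    costs = [float(np.logaddexp(0.0, -g * diffs).sum()) for g in gammas]
    return float(gammas[int(np.argmin(costs))])
```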
73 Since we are following a continuation path from µ = 0 to µ = 1, the optimal value of µ is found along the path by measuring the classification performance of the model on a validation set during the optimization. [sent-207, score-0.123]
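Selecting µ along the continuation path then amounts to keeping the model with the best validation error; `train_step` and `evaluate` are hypothetical callbacks in this sketch.

```python
def select_mu(mus, train_step, evaluate):
    """Follow the continuation path 0 <= mu_1 <= ... <= 1 and keep the mu with the lowest
    validation error; train_step(mu) returns a (possibly warm-started) model, evaluate(model)
    returns its validation error rate."""
    best = (float("inf"), None, None)              # (error, mu, model)
    for mu in mus:
        model = train_step(mu)
        err = evaluate(model)
        if err < best[0]:
            best = (err, mu, model)
    return best[1], best[2]
```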
74 1 Digits recognition In this section, we present experiments on the popular MNIST [20] and USPS handwritten digit datasets. [sent-209, score-0.106]
75 USPS is composed of 7291 training images and 2007 test images of size 16 × 16. [sent-211, score-0.133]
76 For a given image x, the test procedure consists of selecting the class which receives the most votes from the pairwise classifiers. [sent-222, score-0.117]
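The pairwise voting rule for the multiclass test procedure can be sketched as follows, assuming one trained binary classifier per class pair:

```python
import numpy as np
from itertools import combinations

def vote(x, classifiers, n_classes):
    """One-vs-one voting: classifiers[(i, j)](x) > 0 is read as a vote for class i, otherwise
    for class j; the class with the most votes is returned."""
    votes = np.zeros(n_classes, dtype=int)
    for i, j in combinations(range(n_classes), 2):
        winner = i if classifiers[(i, j)](x) > 0 else j
        votes[winner] += 1
    return int(np.argmax(votes))
```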
77 4% [21] for USPS, using methods tailored to these tasks, whereas ours is generic and has not been tuned for the handwritten digit classification domain. [sent-230, score-0.106]
78 The purpose of our second experiment is not to measure the raw performance of our algorithm, but to answer the question “are the obtained dictionaries D discriminative per se?” [sent-231, score-0.523]
79 To do so, we have trained on the USPS dataset 10 binary classifiers, one per digit in a one vs all fashion on the training set. [sent-233, score-0.157]
80 For a given value of µ, we obtain 10 dictionaries D and 10 sets of parameters θ, learned by the SDL-D L model. [sent-234, score-0.329]
81 To evaluate the discriminative power of the dictionaries D, we discard the learned parameters θ and use the dictionaries as if they had been learned in a reconstructive REC model: for each dictionary, we decompose each image from the training set by solving the simple sparse reconstruction problem from Eq.
82 Figure 3: On the left, a reconstructive and a discriminative dictionary. On the right, average error rate in percents obtained by our dictionaries learned in a discriminative framework (SDL-D L) for various values of µ, when used at test time in a reconstructive framework (REC-L).
83 Table 2: Error rates for the texture classification task using various methods and sizes m of the training set.
86 Repeating the sparse decomposition procedure on the test set permits us to evaluate the performance of these learned linear SVMs. [sent-287, score-0.32]
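In outline, the REC protocol used in this experiment: encode every image with the fixed dictionary via Eq. (1), then train and evaluate a linear SVM on the codes. The sketch assumes a `sparse_code` routine like the lasso example given earlier and uses scikit-learn's LinearSVC as a stand-in for the linear SVM.

```python
import numpy as np
from sklearn.svm import LinearSVC

def rec_evaluation(D, lam1, sparse_code, X_train, y_train, X_test, y_test):
    """REC protocol: encode all images with the *fixed* dictionary via the reconstructive
    problem of Eq. (1), fit a linear SVM on the training codes, and report the test error."""
    A_train = np.stack([sparse_code(x, D, lam1) for x in X_train])
    A_test = np.stack([sparse_code(x, D, lam1) for x in X_test])
    clf = LinearSVC(C=1.0).fit(A_train, y_train)
    return 1.0 - clf.score(A_test, y_test)
```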
87 We see that using the dictionaries obtained with discriminative learning (µ > 0, SDL-D L) dramatically improves the performance of the basic linear classifier learned a posteriori on the α's, showing that our learned dictionaries are discriminative per se. [sent-289, score-0.957]
88 Figure 3 also shows a dictionary adapted to the reconstruction of the MNIST dataset and a discriminative one, adapted to “9 vs all”. [sent-290, score-0.965]
89 2 Texture classification In the digit recognition task, our bilinear framework did not perform better than the linear one L. [sent-292, score-0.255]
90 We have chosen to consider two texture images from the Brodatz dataset, presented in Figure 4, and to build two classes, composed of 12 × 12 patches taken from these two textures. [sent-296, score-0.201]
91 We have compared the classification performance of all our methods, including BL, for a dictionary of size k = 64 and κ = 0. [sent-297, score-0.498]
92 The training set was composed of patches from the left half of each texture and the test sets of patches from the right half, so that there is no overlap between them in the training and test set. [sent-299, score-0.315]
93 Future work will be devoted to adapting the proposed framework to shift-invariant models that are standard in image processing tasks, but not readily generalized to the sparse dictionary learning setting. [sent-307, score-0.714]
94 Right: reconstructive and discriminative dictionaries. Acknowledgments: This paper was supported in part by ANR under grant MGA. [sent-310, score-0.745]
95 Sparse coding with an overcomplete basis set: A strategy employed by v1? [sent-318, score-0.194]
96 Image denoising via sparse and redundant representations over learned dictionaries. [sent-323, score-0.29]
97 Sparse representations for image classification: Learning discriminative and reconstructive non-parametric dictionaries. [sent-371, score-0.641]
98 A fixed-point continuation method for l1-regularized minimization with applications to compressed sensing. [sent-426, score-0.095]
99 Efficient learning of sparse representations with an energy-based model. [sent-433, score-0.195]
100 The K-SVD: An algorithm for designing of overcomplete dictionaries for sparse representations. [sent-440, score-0.462]
wordName wordTfidf (topN-words)
[('dictionary', 0.498), ('reconstructive', 0.29), ('rec', 0.284), ('discriminative', 0.268), ('bl', 0.256), ('dictionaries', 0.231), ('bilinear', 0.171), ('sparse', 0.164), ('yi', 0.144), ('coding', 0.127), ('xi', 0.119), ('classi', 0.108), ('texture', 0.107), ('mnist', 0.104), ('supervised', 0.101), ('usps', 0.096), ('dubbed', 0.083), ('generative', 0.08), ('signal', 0.079), ('reconstruction', 0.074), ('cients', 0.072), ('decompositions', 0.071), ('sdl', 0.071), ('continuation', 0.071), ('learned', 0.067), ('overcomplete', 0.067), ('coef', 0.063), ('signals', 0.063), ('sapiro', 0.062), ('digit', 0.059), ('dj', 0.057), ('classical', 0.052), ('image', 0.052), ('adapted', 0.051), ('training', 0.051), ('cation', 0.051), ('tasks', 0.051), ('costs', 0.049), ('guillermo', 0.047), ('hg', 0.047), ('mairal', 0.047), ('percents', 0.047), ('multiclass', 0.047), ('handwritten', 0.047), ('wt', 0.047), ('logistic', 0.045), ('posteriori', 0.044), ('min', 0.044), ('interpretation', 0.043), ('rk', 0.043), ('ponce', 0.041), ('raina', 0.041), ('mixed', 0.041), ('formulation', 0.041), ('amounts', 0.04), ('norm', 0.039), ('sparsity', 0.039), ('ranzato', 0.038), ('procedure', 0.037), ('rn', 0.037), ('composed', 0.036), ('rodriguez', 0.036), ('patches', 0.035), ('shared', 0.034), ('elad', 0.034), ('optimization', 0.032), ('classes', 0.031), ('representations', 0.031), ('parameters', 0.031), ('xt', 0.03), ('scalar', 0.03), ('enjoys', 0.03), ('admit', 0.03), ('frameworks', 0.03), ('validation', 0.029), ('pairwise', 0.028), ('proven', 0.028), ('convex', 0.028), ('ers', 0.028), ('cvpr', 0.028), ('denoising', 0.028), ('residuals', 0.028), ('decomposition', 0.027), ('admits', 0.026), ('deep', 0.025), ('linear', 0.025), ('per', 0.024), ('compressed', 0.024), ('constrain', 0.024), ('kernels', 0.024), ('model', 0.023), ('images', 0.023), ('kernel', 0.023), ('discrimination', 0.023), ('nips', 0.023), ('vs', 0.023), ('presenting', 0.023), ('jointly', 0.022), ('solve', 0.022), ('investigating', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999917 226 nips-2008-Supervised Dictionary Learning
Author: Julien Mairal, Jean Ponce, Guillermo Sapiro, Andrew Zisserman, Francis R. Bach
Abstract: It is now well established that sparse signal models are well suited for restoration tasks and can be effectively learned from audio, image, and video data. Recent research has been aimed at learning discriminative sparse models instead of purely reconstructive ones. This paper proposes a new step in that direction, with a novel sparse representation for signals belonging to different classes in terms of a shared dictionary and discriminative class models. The linear version of the proposed model admits a simple probabilistic interpretation, while its most general variant admits an interpretation in terms of kernels. An optimization framework for learning all the components of the proposed model is presented, along with experimental results on standard handwritten digit and texture classification tasks. 1
2 0.22696157 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
Author: Kate Saenko, Trevor Darrell
Abstract: Polysemy is a problem for methods that exploit image search engines to build object category models. Existing unsupervised approaches do not take word sense into consideration. We propose a new method that uses a dictionary to learn models of visual word sense from a large collection of unlabeled web data. The use of LDA to discover a latent sense space makes the model robust despite the very limited nature of dictionary definitions. The definitions are used to learn a distribution in the latent space that best represents a sense. The algorithm then uses the text surrounding image links to retrieve images with high probability of a particular dictionary sense. An object classifier is trained on the resulting sense-specific images. We evaluate our method on a dataset obtained by searching the web for polysemous words. Category classification experiments show that our dictionarybased approach outperforms baseline methods. 1
3 0.22101159 62 nips-2008-Differentiable Sparse Coding
Author: J. A. Bagnell, David M. Bradley
Abstract: Prior work has shown that features which appear to be biologically plausible as well as empirically useful can be found by sparse coding with a prior such as a laplacian (L1 ) that promotes sparsity. We show how smoother priors can preserve the benefits of these sparse priors while adding stability to the Maximum A-Posteriori (MAP) estimate that makes it more useful for prediction problems. Additionally, we show how to calculate the derivative of the MAP estimate efficiently with implicit differentiation. One prior that can be differentiated this way is KL-regularization. We demonstrate its effectiveness on a wide variety of applications, and find that online optimization of the parameters of the KL-regularized model can significantly improve prediction performance. 1
4 0.11182868 92 nips-2008-Generative versus discriminative training of RBMs for classification of fMRI images
Author: Tanya Schmah, Geoffrey E. Hinton, Steven L. Small, Stephen Strother, Richard S. Zemel
Abstract: Neuroimaging datasets often have a very large number of voxels and a very small number of training cases, which means that overfitting of models for this data can become a very serious problem. Working with a set of fMRI images from a study on stroke recovery, we consider a classification task for which logistic regression performs poorly, even when L1- or L2- regularized. We show that much better discrimination can be achieved by fitting a generative model to each separate condition and then seeing which model is most likely to have generated the data. We compare discriminative training of exactly the same set of models, and we also consider convex blends of generative and discriminative training. 1
5 0.10857469 75 nips-2008-Estimating vector fields using sparse basis field expansions
Author: Stefan Haufe, Vadim V. Nikulin, Andreas Ziehe, Klaus-Robert Müller, Guido Nolte
Abstract: We introduce a novel framework for estimating vector fields using sparse basis field expansions (S-FLEX). The notion of basis fields, which are an extension of scalar basis functions, arises naturally in our framework from a rotational invariance requirement. We consider a regression setting as well as inverse problems. All variants discussed lead to second-order cone programming formulations. While our framework is generally applicable to any type of vector field, we focus in this paper on applying it to solving the EEG/MEG inverse problem. It is shown that significantly more precise and neurophysiologically more plausible location and shape estimates of cerebral current sources from EEG/MEG measurements become possible with our method when comparing to the state-of-the-art. 1
6 0.10433441 196 nips-2008-Relative Margin Machines
7 0.10160543 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
8 0.097082771 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
9 0.0947726 215 nips-2008-Sparse Signal Recovery Using Markov Random Fields
10 0.094524786 113 nips-2008-Kernelized Sorting
11 0.092633598 78 nips-2008-Exact Convex Confidence-Weighted Learning
12 0.083436511 228 nips-2008-Support Vector Machines with a Reject Option
13 0.082630888 118 nips-2008-Learning Transformational Invariants from Natural Movies
14 0.082300439 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features
15 0.081378229 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data
16 0.080993012 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
17 0.080542348 145 nips-2008-Multi-stage Convex Relaxation for Learning with Sparse Regularization
18 0.079231225 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations
19 0.07786978 178 nips-2008-Performance analysis for L\ 2 kernel classification
20 0.075712867 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
topicId topicWeight
[(0, -0.251), (1, -0.131), (2, -0.019), (3, -0.041), (4, 0.014), (5, 0.091), (6, -0.073), (7, -0.032), (8, -0.055), (9, 0.115), (10, 0.11), (11, -0.028), (12, -0.083), (13, -0.017), (14, -0.097), (15, -0.122), (16, -0.001), (17, -0.134), (18, 0.016), (19, 0.039), (20, 0.029), (21, -0.068), (22, -0.104), (23, 0.003), (24, -0.034), (25, -0.007), (26, -0.013), (27, -0.075), (28, -0.054), (29, -0.112), (30, -0.044), (31, -0.092), (32, 0.155), (33, 0.039), (34, 0.036), (35, 0.083), (36, 0.038), (37, 0.041), (38, 0.008), (39, 0.026), (40, -0.079), (41, 0.036), (42, 0.003), (43, 0.092), (44, -0.067), (45, -0.019), (46, -0.105), (47, 0.04), (48, 0.024), (49, 0.051)]
simIndex simValue paperId paperTitle
same-paper 1 0.94938391 226 nips-2008-Supervised Dictionary Learning
Author: Julien Mairal, Jean Ponce, Guillermo Sapiro, Andrew Zisserman, Francis R. Bach
Abstract: It is now well established that sparse signal models are well suited for restoration tasks and can be effectively learned from audio, image, and video data. Recent research has been aimed at learning discriminative sparse models instead of purely reconstructive ones. This paper proposes a new step in that direction, with a novel sparse representation for signals belonging to different classes in terms of a shared dictionary and discriminative class models. The linear version of the proposed model admits a simple probabilistic interpretation, while its most general variant admits an interpretation in terms of kernels. An optimization framework for learning all the components of the proposed model is presented, along with experimental results on standard handwritten digit and texture classification tasks. 1
2 0.83493441 62 nips-2008-Differentiable Sparse Coding
Author: J. A. Bagnell, David M. Bradley
Abstract: Prior work has shown that features which appear to be biologically plausible as well as empirically useful can be found by sparse coding with a prior such as a laplacian (L1 ) that promotes sparsity. We show how smoother priors can preserve the benefits of these sparse priors while adding stability to the Maximum A-Posteriori (MAP) estimate that makes it more useful for prediction problems. Additionally, we show how to calculate the derivative of the MAP estimate efficiently with implicit differentiation. One prior that can be differentiated this way is KL-regularization. We demonstrate its effectiveness on a wide variety of applications, and find that online optimization of the parameters of the KL-regularized model can significantly improve prediction performance. 1
3 0.59801412 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
Author: Kate Saenko, Trevor Darrell
Abstract: Polysemy is a problem for methods that exploit image search engines to build object category models. Existing unsupervised approaches do not take word sense into consideration. We propose a new method that uses a dictionary to learn models of visual word sense from a large collection of unlabeled web data. The use of LDA to discover a latent sense space makes the model robust despite the very limited nature of dictionary definitions. The definitions are used to learn a distribution in the latent space that best represents a sense. The algorithm then uses the text surrounding image links to retrieve images with high probability of a particular dictionary sense. An object classifier is trained on the resulting sense-specific images. We evaluate our method on a dataset obtained by searching the web for polysemous words. Category classification experiments show that our dictionarybased approach outperforms baseline methods. 1
4 0.59633446 155 nips-2008-Nonparametric regression and classification with joint sparsity constraints
Author: Han Liu, Larry Wasserman, John D. Lafferty
Abstract: We propose new families of models and algorithms for high-dimensional nonparametric learning with joint sparsity constraints. Our approach is based on a regularization method that enforces common sparsity patterns across different function components in a nonparametric additive model. The algorithms employ a coordinate descent approach that is based on a functional soft-thresholding operator. The framework yields several new models, including multi-task sparse additive models, multi-response sparse additive models, and sparse additive multi-category logistic regression. The methods are illustrated with experiments on synthetic data and gene microarray data. 1
5 0.5902319 92 nips-2008-Generative versus discriminative training of RBMs for classification of fMRI images
Author: Tanya Schmah, Geoffrey E. Hinton, Steven L. Small, Stephen Strother, Richard S. Zemel
Abstract: Neuroimaging datasets often have a very large number of voxels and a very small number of training cases, which means that overfitting of models for this data can become a very serious problem. Working with a set of fMRI images from a study on stroke recovery, we consider a classification task for which logistic regression performs poorly, even when L1- or L2- regularized. We show that much better discrimination can be achieved by fitting a generative model to each separate condition and then seeing which model is most likely to have generated the data. We compare discriminative training of exactly the same set of models, and we also consider convex blends of generative and discriminative training. 1
6 0.58746147 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
7 0.57041663 196 nips-2008-Relative Margin Machines
8 0.54971355 215 nips-2008-Sparse Signal Recovery Using Markov Random Fields
9 0.52568102 185 nips-2008-Privacy-preserving logistic regression
10 0.52292824 228 nips-2008-Support Vector Machines with a Reject Option
11 0.52119201 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features
12 0.51524949 78 nips-2008-Exact Convex Confidence-Weighted Learning
13 0.5088746 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models
14 0.50722671 178 nips-2008-Performance analysis for L\ 2 kernel classification
15 0.49954873 114 nips-2008-Large Margin Taxonomy Embedding for Document Categorization
16 0.49610806 27 nips-2008-Artificial Olfactory Brain for Mixture Identification
17 0.49361387 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data
18 0.49029499 119 nips-2008-Learning a discriminative hidden part model for human action recognition
19 0.4900507 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
20 0.48104694 128 nips-2008-Look Ma, No Hands: Analyzing the Monotonic Feature Abstraction for Text Classification
topicId topicWeight
[(6, 0.105), (7, 0.101), (12, 0.031), (15, 0.019), (28, 0.144), (57, 0.095), (59, 0.015), (63, 0.025), (71, 0.015), (77, 0.054), (78, 0.018), (81, 0.226), (83, 0.067)]
simIndex simValue paperId paperTitle
1 0.89031363 180 nips-2008-Playing Pinball with non-invasive BCI
Author: Matthias Krauledat, Konrad Grzeska, Max Sagebaum, Benjamin Blankertz, Carmen Vidaurre, Klaus-Robert Müller, Michael Schröder
Abstract: Compared to invasive Brain-Computer Interfaces (BCI), non-invasive BCI systems based on Electroencephalogram (EEG) signals have not been applied successfully for precisely timed control tasks. In the present study, however, we demonstrate and report on the interaction of subjects with a real device: a pinball machine. Results of this study clearly show that fast and well-timed control well beyond chance level is possible, even though the environment is extremely rich and requires precisely timed and complex predictive behavior. Using machine learning methods for mental state decoding, BCI-based pinball control is possible within the first session without the necessity to employ lengthy subject training. The current study shows clearly that very compelling control with excellent timing and dynamics is possible for a non-invasive BCI. 1
same-paper 2 0.8204143 226 nips-2008-Supervised Dictionary Learning
Author: Julien Mairal, Jean Ponce, Guillermo Sapiro, Andrew Zisserman, Francis R. Bach
Abstract: It is now well established that sparse signal models are well suited for restoration tasks and can be effectively learned from audio, image, and video data. Recent research has been aimed at learning discriminative sparse models instead of purely reconstructive ones. This paper proposes a new step in that direction, with a novel sparse representation for signals belonging to different classes in terms of a shared dictionary and discriminative class models. The linear version of the proposed model admits a simple probabilistic interpretation, while its most general variant admits an interpretation in terms of kernels. An optimization framework for learning all the components of the proposed model is presented, along with experimental results on standard handwritten digit and texture classification tasks. 1
3 0.78730273 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models
Author: Alex J. Smola, Julian J. Mcauley, Tibério S. Caetano
Abstract: Models for near-rigid shape matching are typically based on distance-related features, in order to infer matches that are consistent with the isometric assumption. However, real shapes from image datasets, even when expected to be related by “almost isometric” transformations, are actually subject not only to noise but also, to some limited degree, to variations in appearance and scale. In this paper, we introduce a graphical model that parameterises appearance, distance, and angle features and we learn all of the involved parameters via structured prediction. The outcome is a model for near-rigid shape matching which is robust in the sense that it is able to capture the possibly limited but still important scale and appearance variations. Our experimental results reveal substantial improvements upon recent successful models, while maintaining similar running times. 1
4 0.73166645 75 nips-2008-Estimating vector fields using sparse basis field expansions
Author: Stefan Haufe, Vadim V. Nikulin, Andreas Ziehe, Klaus-Robert Müller, Guido Nolte
Abstract: We introduce a novel framework for estimating vector fields using sparse basis field expansions (S-FLEX). The notion of basis fields, which are an extension of scalar basis functions, arises naturally in our framework from a rotational invariance requirement. We consider a regression setting as well as inverse problems. All variants discussed lead to second-order cone programming formulations. While our framework is generally applicable to any type of vector field, we focus in this paper on applying it to solving the EEG/MEG inverse problem. It is shown that significantly more precise and neurophysiologically more plausible location and shape estimates of cerebral current sources from EEG/MEG measurements become possible with our method when comparing to the state-of-the-art. 1
5 0.71364105 62 nips-2008-Differentiable Sparse Coding
Author: J. A. Bagnell, David M. Bradley
Abstract: Prior work has shown that features which appear to be biologically plausible as well as empirically useful can be found by sparse coding with a prior such as a laplacian (L1 ) that promotes sparsity. We show how smoother priors can preserve the benefits of these sparse priors while adding stability to the Maximum A-Posteriori (MAP) estimate that makes it more useful for prediction problems. Additionally, we show how to calculate the derivative of the MAP estimate efficiently with implicit differentiation. One prior that can be differentiated this way is KL-regularization. We demonstrate its effectiveness on a wide variety of applications, and find that online optimization of the parameters of the KL-regularized model can significantly improve prediction performance. 1
6 0.70133621 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
7 0.6994459 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
8 0.69693381 194 nips-2008-Regularized Learning with Networks of Features
9 0.69282144 27 nips-2008-Artificial Olfactory Brain for Mixture Identification
10 0.69032598 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
11 0.69002461 202 nips-2008-Robust Regression and Lasso
12 0.68945837 143 nips-2008-Multi-label Multiple Kernel Learning
13 0.68885887 200 nips-2008-Robust Kernel Principal Component Analysis
14 0.68764943 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
15 0.68637413 16 nips-2008-Adaptive Template Matching with Shift-Invariant Semi-NMF
16 0.68634939 95 nips-2008-Grouping Contours Via a Related Image
17 0.68608642 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
18 0.68525183 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
19 0.68433183 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
20 0.6837005 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization