nips nips2011 nips2011-276 knowledge-graph by maker-knowledge-mining

276 nips-2011-Structured sparse coding via lateral inhibition


Source: pdf

Author: Arthur D. Szlam, Karol Gregor, Yann L. Cun

Abstract: This work describes a conceptually simple method for structured sparse coding and dictionary design. Supposing a dictionary with K atoms, we introduce a structure as a set of penalties or interactions between every pair of atoms. We describe modifications of standard sparse coding algorithms for inference in this setting, and describe experiments showing that these algorithms are efficient. We show that interesting dictionaries can be learned for interactions that encode tree structures or locally connected structures. Finally, we show that our framework allows us to learn the values of the interactions from the data, rather than having them pre-specified. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Structured sparse coding via lateral inhibition Karol Gregor Janelia Farm, HHMI 19700 Helix Drive Ashburn, VA, 20147 karol. [sent-1, score-0.475]

2 Abstract This work describes a conceptually simple method for structured sparse coding and dictionary design. [sent-7, score-0.378]

3 Supposing a dictionary with K atoms, we introduce a structure as a set of penalties or interactions between every pair of atoms. [sent-8, score-0.265]

4 We describe modifications of standard sparse coding algorithms for inference in this setting, and describe experiments showing that these algorithms are efficient. [sent-9, score-0.192]

5 We show that interesting dictionaries can be learned for interactions that encode tree structures or locally connected structures. [sent-10, score-0.483]

6 Finally, we show that our framework allows us to learn the values of the interactions from the data, rather than having them pre-specified. [sent-11, score-0.168]

7 A set of N data points X in the Euclidean space Rd is written as the approximate product of a d × k dictionary W and k × N coefficients Z, where each column of Z is penalized for having many non-zero entries. [sent-14, score-0.137]

8 In equations, if we take the approximation to X in the least squares sense, and the penalty on the coefficient matrix to be the l1 norm, we wish to find argmin_{Z,W} Σ_k ||W z_k − x_k||^2 + λ||z_k||_1. [sent-15, score-0.204]

9 For example, we may wish to enforce a tree structure on Z, so that certain basis elements can be used by any data point, but others are specific to a few data points; or more generally, a graph structure on Z that specifies which elements can be used with which others. [sent-18, score-0.121]

10 Various forms of structured sparsity are explored in (Kavukcuoglu et al. [sent-19, score-0.226]

11 From an engineering perspective, structured sparse models allow us to access or enforce information about the dependencies between codewords, and to control the expressive power of the model without losing reconstruction accuracy. [sent-24, score-0.192]

12 From a biological perspective, structured sparsity is interesting because structure and sparsity are present in neocortical representations. [sent-25, score-0.27]

13 For example, neurons in the same mini-columns of V1 are receptive to similar orientations and activate together. [sent-26, score-0.205]

14 The l1 penalty is replaced with a set of interactions between the coding units corresponding to intralayer connections in the neocortex. [sent-29, score-0.601]

15 For every pair of units there is an interaction weight that specifies the cost of simultaneously activating both units. [sent-30, score-0.252]
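
As a concrete reading of the model just described, the sketch below evaluates the energy with a least-squares reconstruction term plus the pairwise penalty, where S[i, j] is the cost of activating atoms i and j together. It is a minimal NumPy illustration; the function and variable names are assumptions, not taken from the paper's code.

```python
import numpy as np

def lateral_energy(W, S, z, x):
    """E(z) = ||W z - x||^2 + |z|^T S |z| for a single data point x."""
    residual = W @ z - x
    a = np.abs(z)                 # absolute activations, so units can be +/-
    return residual @ residual + a @ (S @ a)
```

With S set to a constant positive value off the diagonal and zero on the diagonal, every co-active pair is penalized equally, which is one of the settings used in the experiments below.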

16 Code units are placed in a two-dimensional grid above the image (here represented in 1-d for clarity). [sent-34, score-0.324]

17 A given unit connects to a small neighborhood of an input via W and to a small neighborhood of code units via S. [sent-35, score-0.48]

18 The interaction in S is present and positive (inhibitory) if the distance d between units satisfies r1 < d < r2 for some radii. [sent-36, score-0.215]

19 We set the interactions to reflect a prespecified structure. [sent-37, score-0.128]

20 In one example we create a locally connected network with inhibitory connections in a ring around every unit. [sent-38, score-0.385]
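
A minimal sketch of how such a ring-shaped inhibition matrix could be built for code units on an n × n grid, following the rule that S is positive when the distance d between units satisfies r1 < d < r2 and zero otherwise; the function name, default radii, and strength s0 are assumptions for illustration.

```python
import numpy as np

def ring_inhibition(n, r1=1.5, r2=3.5, s0=0.1):
    ys, xs = np.mgrid[0:n, 0:n]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)  # (n*n, 2)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    S = np.where((d > r1) & (d < r2), s0, 0.0)
    np.fill_diagonal(S, 0.0)      # no self-interaction
    return S
```

Setting r1 = 0 and flipping the sign of s0 gives the excitatory variant mentioned later in the text.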

21 When trained on natural images, this leads to dictionaries with Gabor-like edge elements of similar orientations placed in nearby locations, leading to pinwheel patterns analogous to those observed in V1 of higher mammals. [sent-39, score-0.289]

22 We also place the units on a tree, with inhibitory interactions between different branches of the tree, so that edges of similar orientation end up in the same branch of the tree; see, for example, (Hyvarinen and Hoyer, 2001). [sent-40, score-0.857]
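
The tree-structured interactions can be built in the same spirit, following the rule quoted later in this summary (S_ij = 0 when one unit is a descendant of the other, S_ij = S_0 · d(i, j) otherwise, with d the tree distance). The parent-array encoding and helper names in this sketch are my own assumptions, not the paper's.

```python
import numpy as np

def path_to_root(node, parent):
    """List of nodes from `node` up to the root (inclusive)."""
    path = [node]
    while parent[node] >= 0:
        node = parent[node]
        path.append(node)
    return path

def tree_inhibition(parent, s0=0.1):
    """parent[i] = index of unit i's parent, or -1 for the root."""
    k = len(parent)
    paths = [path_to_root(i, parent) for i in range(k)]
    depth = [len(p) - 1 for p in paths]
    S = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i == j or i in paths[j] or j in paths[i]:
                continue                  # same root-to-leaf path: no penalty
            lca = next(a for a in paths[i] if a in paths[j])
            d = (depth[i] - depth[lca]) + (depth[j] - depth[lca])
            S[i, j] = s0 * d              # different branches: cost grows with tree distance
    return S
```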

23 In the second set of experiments we learn the values of the lateral connections instead of setting them, in effect learning the structure. [sent-41, score-0.308]

24 The rest of this paper is organized as follows: in the rest of this section, we will introduce our model and describe its relationship to the other structured sparsity approaches mentioned above. [sent-43, score-0.181]

25 Finally, in section 3, we will display the results of experiments showing that the algorithms are efficient, and that we can effectively learn dictionaries with a given structure, and even learn the structure. [sent-45, score-0.194]

26 Then the Lagrangian of the energy with respect to Z is Σ_{j=1}^N ||W Z_j − X_j||^2 + Z_j^T S Z_j, (2) where S_ij are the dual variables to each of the constraints in U, and are 0 on the unconstrained pairs. [sent-53, score-0.133]

27 At such a point, Sij can be interpreted as the weight of the inhibitory connection between Wi and Wj necessary to keep them from simultaneously activating. [sent-55, score-0.165]

28 2 Lateral inhibition model In practice, it is useful to soften the constraints in U to a fixed, prespecified penalty, instead of a maximization over S as would be suggested by the Lagrangian form. [sent-58, score-0.166]

29 To use units with both positive and negative activations we take absolute values and obtain min_{W,Z} Σ_j ||W Z_j − X_j||^2 + |Z_j|^T S |Z_j|, (3) subject to ||W_j|| = 1 ∀j. [sent-60, score-0.255]

30 As before, instead of taking absolute values, we can constrain Z ≥ 0, allowing us to write the penalty as Z_j^T S Z_j. [sent-63, score-0.142]

31 The Lagrangian optimization tries to increase the inhibition between a forbidden pair whenever it activates. [sent-66, score-0.166]

32 If our goal is to learn the interactions, rather than enforce the ones we have chosen, then it makes sense to do the opposite, and decrease entries of S corresponding to pairs which are often activated simultaneously. [sent-67, score-0.118]
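
The summary does not spell out the exact rule used to learn S, so the sketch below only implements the stated idea in a hedged way: after codes have been inferred for a batch, entries of S whose pair of units is frequently co-active are decreased, and S is then projected back onto simple symmetry and box constraints (in the spirit of the β ≥ S ≥ 0, S = S^T constraints that appear in the convolutional experiment later). The learning rate, projection, and names are assumptions.

```python
import numpy as np

def update_interactions(S, Z, lr=1e-3, beta=1.0):
    """Hedged sketch of one S update; Z is (k, N), the inferred codes for N points."""
    A = np.abs(Z)
    coactivation = A @ A.T / Z.shape[1]   # average |z_i| |z_j| over the batch
    S = S - lr * coactivation             # frequently co-active pairs get cheaper
    S = 0.5 * (S + S.T)                   # keep S symmetric
    np.fill_diagonal(S, 0.0)              # no self-interaction
    return np.clip(S, 0.0, beta)          # keep 0 <= S <= beta
```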

33 3 Lateral inhibition and weighted l1 Suppose we have fixed S and W , and are inferring z from a datapoint x. [sent-73, score-0.207]

34 Relation with previous work As mentioned above, there is a growing literature on structured dictionary learning and structured sparse coding. [sent-78, score-0.382]

35 , 2009) use a greedy approach for structured sparse coding based on OMP or CoSaMP. [sent-81, score-0.241]

36 A second popular basic framework is group sparsity (Kavukcuoglu et al. [sent-84, score-0.134]

37 In our framework, the interactions in S can take any values, giving a different kind of flexibility. [sent-90, score-0.128]

38 Also note that in this setting, recovery theorems with incoherence assumptions are not applicable, because we will learn the dictionaries, and so there is no guarantee that the dictionaries will satisfy such conditions. [sent-92, score-0.154]

39 The interaction between a set of units of the form z^T R z + θ^T z was originally used in Hopfield nets (Hopfield, 1982); there the z are binary vectors and the inference is deterministic. [sent-94, score-0.295]

40 With S fixed, one can consider our work a special case of real valued Hopfield nets with R = W^T W + S and θ = W^T x; because of the form of R and θ, fast inference schemes from sparse coding can be used. [sent-99, score-0.192]

41 In (Garrigues and Olshausen, 2008) lateral connections were modeled as the connections of an Ising model with the Ising units deciding which real valued units (from which input was reconstructed) were on. [sent-101, score-0.806]

42 The system learned to typically connect similar orientations at a given location. [sent-102, score-0.133]

43 Our model is related but different: it has no second layer, the lateral connections control real values instead of binary ones, and inference and learning are simpler, at the cost of a true generative model. [sent-103, score-0.311]

44 In (Druckmann and Chklovskii, 2010) the lateral connections were trained so that solutions zt to a related ODE, starting from the inferred code z0 of an input x, would map via W to points close to x. [sent-104, score-0.402]

45 In that work, the lateral connections were trained in response to the dictionary, rather than simultaneously with it, and did not participate in inference. [sent-105, score-0.307]

46 In contrast, in our case the sparsity coefficients are modulated by the units in the same layer, and we learn the modulation, as opposed to the fixed topology in (Garrigues and Olshausen, 2010). [sent-107, score-0.344]

47 We will describe versions of FISTA (Beck and Teboulle, 2009) and coordinate descent (Wu and Lange, 2008; Li and Osher, 2009). [sent-115, score-0.239]

48 1 A FISTA-like algorithm ISTA (the Iterative Shrinkage-Thresholding Algorithm) minimizes the energy ||W z − x||^2 + λ|z|_1 by following gradient steps on the first term with a “shrinkage”; this can be thought of as gradient steps where any coordinate which crosses zero is thresholded. [sent-119, score-0.273]
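
Below is a minimal ISTA-style sketch for this energy. Plain ISTA shrinks every coordinate by a constant threshold λ; the variant suggested by the weighted-l1 view in the summary above reuses the same step with a per-coordinate threshold taken from S|z| at the current iterate. This is an illustrative adaptation, not necessarily the exact update used in the paper; the step size, iteration count, and names are assumptions.

```python
import numpy as np

def shrink(v, t):
    """Soft-thresholding, applied coordinate-wise (t may be a vector)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lateral(W, S, x, n_iter=200):
    L = 2.0 * np.linalg.norm(W, 2) ** 2   # Lipschitz constant of the gradient of ||Wz - x||^2
    z = np.zeros(W.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * W.T @ (W @ z - x)    # gradient of the reconstruction term
        w = S @ np.abs(z)                 # per-coordinate weights from S |z|
        z = shrink(z - grad / L, w / L)   # weighted shrinkage step
    return z
```

The FISTA version would add the usual Nesterov momentum on top of the same shrinkage step.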

49 Nesterov’s accelerated gradient descent has been found to be effective in the basis pursuit setting, where it is called FISTA (Beck and Teboulle, 2009). [sent-124, score-0.146]

50 2 Coordinate descent The coordinate descent algorithm iteratively selects a single coordinate k of z and, fixing the other coordinates, does a line search to find the value of z(k) with the lowest energy. [sent-131, score-0.478]

51 The coordinate selection can be done by picking the entry with the largest gradient (Wu and Lange, 2008), or by approximating the value of the energy after the line search (Li and Osher, 2009). [sent-132, score-0.273]

52 Suppose at the tth step we have chosen to update the kth coordinate of z^t. [sent-133, score-0.183]

53 Just as in the basis pursuit setting, this has the nice property that by updating B and λ, and using a precomputed W^T W, each update only requires O(K) operations, where K is the number of atoms in the dictionary; in particular, the dictionary only needs to be multiplied by x once. [sent-136, score-0.267]
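
A sketch of one way to organize the coordinate descent loop with the bookkeeping described above: W^T W and W^T x are computed once, and each coordinate update touches a single column of W^T W and of S, so it costs O(K). The greedy selection by largest proposed move and the particular running quantities kept here are my own choices, not necessarily the paper's; the sketch assumes S has a zero diagonal.

```python
import numpy as np

def coord_descent_lateral(W, S, x, n_iter=500, tol=1e-10):
    K = W.shape[1]
    G = W.T @ W                    # precomputed Gram matrix
    b = W.T @ x                    # the only multiplication by x that is needed
    gkk = np.diag(G)               # squared column norms (1 for unit-norm atoms)
    z = np.zeros(K)
    c = -b                         # c = G z - b, maintained incrementally
    w = np.zeros(K)                # w = S |z|, maintained incrementally
    for _ in range(n_iter):
        g = gkk * z - c            # g_k = b_k - sum_{j != k} G_kj z_j
        z_prop = np.sign(g) * np.maximum(np.abs(g) - w, 0.0) / gkk  # 1-d minimizers
        k = int(np.argmax(np.abs(z_prop - z)))   # greedy pick: largest proposed move
        delta = z_prop[k] - z[k]
        if abs(delta) < tol:
            break
        c += G[:, k] * delta                          # O(K) update of G z - b
        w += S[:, k] * (abs(z_prop[k]) - abs(z[k]))   # O(K) update of S |z|
        z[k] = z_prop[k]
    return z
```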

54 In fact, when the actual solution is very sparse and the dictionary is large, the cost of all the iterations is often less than the cost of computing W^T x. [sent-137, score-0.198]

55 We will use coordinate descent for a bilinear model below; in this case, we alternate updates of the left coefficients with updates of the right coefficients. [sent-138, score-0.373]

56 1 Inference First we test the speed of convergence and the quality of the resulting state of ISTA, FISTA and coordinate descent algorithms. [sent-149, score-0.239]

57 where the input consists of image patches and the connections in S define a tree. [sent-151, score-0.228]

58 Figure 2 (left) shows the energy after each iteration of FISTA, ISTA, and CD. [sent-153, score-0.133]

59 Figure 2: On the left: the energy values after each iteration of the 3 methods, averaged over all the data points. [sent-186, score-0.133]

60 On the right: values of the average energy (1/N) Σ_j ||W Z_j − X_j||^2 + η|Z_j|^T S|Z_j| and the average S sparsity (1/N) Σ_j |Z_j|^T S|Z_j|. [sent-187, score-0.222]

61 The “oracle” best tree-structured output, computed by an exhaustive search over the projections of each data point onto each branch of the tree, has the average energy 20. [sent-188, score-0.5]

62 We can see that coordinate descent very quickly moves to its resting state (note that each iteration is much cheaper as well, only requiring a few column operations), but on average it does not reach quite as good a code as ISTA or FISTA. [sent-192, score-0.334]

63 To test the absolute quality of the methods, we also measure against the “oracle”: the lowest possible energy when none of the constraints are broken, that is, when |z|^T S|z| = 0. [sent-194, score-0.173]

64 This energy is obtained by exhaustive search over the projections of each data point onto each branch of the tree. [sent-195, score-0.244]

65 We give the values for the average energy (1/N) Σ_j ||W Z_j − X_j||^2 + η|Z_j|^T S|Z_j| and for the sparsity energy (1/N) Σ_j η|Z_j|^T S|Z_j| for various values of η. [sent-197, score-0.355]

66 This is not the case in the standard l1 sparse coding model (1). [sent-202, score-0.149]

67 In the first part of the experiment we preprocess each image patch by subtracting its mean and set the elements of S to be all equal and positive except for zeros on the diagonal. [sent-205, score-0.146]

68 In the second part of the experiment we use the original image patches without any preprocessing. [sent-206, score-0.12]

69 Sublayer 4 contains simple cells with edge-like receptive fields. [sent-215, score-0.12]

70 Each such cell receives input from a small neighborhood of the input image at its corresponding location. [sent-216, score-0.127]

71 We model this by placing units in a two-dimensional grid above the image and connecting each unit to a small neighborhood of the input (Figure 1). [sent-217, score-0.382]
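
One simple way to realize this local connectivity is to mask W so that each grid unit only sees a small window of the image; the sketch below builds such a mask. The grid spacing, window size, and names are assumptions for illustration, not the paper's parameters.

```python
import numpy as np

def local_mask(img_h, img_w, grid_h, grid_w, win=8, stride=2):
    """Binary mask of shape (img_h*img_w, grid_h*grid_w); multiply W by it elementwise."""
    mask = np.zeros((img_h * img_w, grid_h * grid_w))
    for gy in range(grid_h):
        for gx in range(grid_w):
            window = np.zeros((img_h, img_w))
            window[gy * stride: gy * stride + win, gx * stride: gx * stride + win] = 1.0
            mask[:, gy * grid_w + gx] = window.ravel()
    return mask
```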

72 We also bind connection weights for units that are far enough from each other, to reduce the number of parameters without affecting the local structure (Gregor and LeCun, 2010). [sent-218, score-0.323]

73 Next we connect each unit by inhibitory interactions (the S matrix) to units in its ring-shaped neighborhood: there is a connection between two units if their distance d satisfies r1 < d < r2. Figure 3: (a) Filters learned on the original unprocessed image patches. [sent-219, score-0.872]

74 The S matrix was fully connected, except that the unit corresponding to the upper left corner was not connected to any other unit and learned the mean. [sent-220, score-0.255]

75 We set Sij = 0 if one of i and j is a descendant of the other, and Sij = S_0 d(i, j) otherwise, where d(i, j) is the distance between the units in the tree. [sent-223, score-0.266]

76 Figure 4: (a-b) Filters learned on images in the locally connected framework with the local inhibition shown in Figure 1. [sent-225, score-0.325]

77 The local inhibition matrix has positive value Sij = S_0 > 0 if the distance between code units Zi and Zj satisfies r1 < d(i, j) < r2, and Sij = 0 otherwise. [sent-226, score-0.476]

78 The net learned to place filters of similar orientations close together. [sent-228, score-0.188]

79 (c-d) Filters trained on 10 × 10 image patches with mean subtracted and then normalized. [sent-234, score-0.159]

80 (c) The inhibition matrix was the same as in (a-b). [sent-235, score-0.166]

81 (d) This time there was an l1 penalty on each code unit and the lateral interaction matrix S was excitatory: Sij < 0 if d(i, j) < r2 and zero otherwise. [sent-236, score-0.394]

82 The connection is present if the distance satisfies r1 < d < r2 for some radii r1 and r2 (alternatively, we can put r1 = 0 and create excitatory interactions in a smaller neighborhood). [sent-237, score-0.18]

83 With this arrangement units that turn on simultaneously are typically either close to each other (within r1 ) or far from each other (more distant than r2 ). [sent-238, score-0.215]

84 Training on image patches results in the filters shown in Figure 4. [sent-239, score-0.12]

85 We see that filters with similar orientations are placed together as is observed in V1 (and other experiments on group sparsity, for example (Hyvarinen and Hoyer, 2001)). [sent-240, score-0.133]

86 Here we obtain these patterns by the presence of inhibitory connections. [sent-241, score-0.165]

87 4 Tree structure In this experiment we place the units z on a tree and desire that the units that are on for a given input lie on a single branch of the tree. [sent-243, score-0.637]

88 The model learns to place low-frequency filters close to the root of the tree, and as we go down the branches the filters “refine” their parents (Figure 3b). [sent-246, score-0.137]

89 5 A convolutional image parts model Figure 5: On the left: the dictionary of 16 × 16 filters learned by the convolutional model on faces. [sent-248, score-0.368]

90 On the right: some low energy configurations, generated randomly as in Section 3. [sent-249, score-0.133]

91 We then train a model minimizing the energy Σ_i ||Σ_{j=1}^{20} W_j ∗ z_{ji} − X_i||^2 + p(z)^T S p(z), subject to β ≥ S ≥ 0, S = S^T, |S_j|_1 = α. [sent-259, score-0.133]

92 The updates for Z are done via coordinate descent (coordinate descent in the convolutional setting works exactly as before), the updates for W via least squares, and at each update, S is averaged with . [sent-266, score-0.533]

93 Without any data to reconstruct, the model will collapse to zero, so we will constrain z to have a fixed number of unit entries, and run a few steps of a greedy search to decide which entries should be on. [sent-271, score-0.119]

94 However, even though W is blind to anything larger than a 16 × 16 patch, through the inhibition of S, the model is able to learn the placement of facial structures and long edges. [sent-277, score-0.206]

95 K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. [sent-288, score-0.175]

96 Learning horizontal connections in a sparse coding model of natural images. [sent-310, score-0.257]

97 Group sparse coding with a laplacian scale mixture prior. [sent-315, score-0.149]

98 A two-layer sparse coding model learns simple and complex cell receptive fields and topography from natural images. [sent-344, score-0.227]

99 Tree-guided group lasso for multi-task regression with structured sparsity. [sent-374, score-0.133]

100 Emergence of simple-cell receptive field properties by learning a sparse code for natural images. [sent-384, score-0.234]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('zj', 0.404), ('ista', 0.228), ('units', 0.215), ('fista', 0.187), ('sij', 0.184), ('inhibition', 0.166), ('inhibitory', 0.165), ('lateral', 0.16), ('zk', 0.142), ('coordinate', 0.14), ('dictionary', 0.137), ('energy', 0.133), ('olshausen', 0.131), ('interactions', 0.128), ('dictionaries', 0.114), ('garrigues', 0.109), ('connections', 0.108), ('descent', 0.099), ('zi', 0.096), ('code', 0.095), ('lters', 0.092), ('structured', 0.092), ('sparsity', 0.089), ('coding', 0.088), ('filters', 0.087), ('hop', 0.087), ('teboulle', 0.087), ('orientations', 0.086), ('tree', 0.082), ('receptive', 0.078), ('aharon', 0.076), ('hyvarinen', 0.076), ('branch', 0.07), ('lange', 0.07), ('cod', 0.07), ('updates', 0.067), ('osher', 0.066), ('neighborhood', 0.065), ('connected', 0.064), ('sh', 0.064), ('gregor', 0.062), ('hoyer', 0.062), ('image', 0.062), ('penalty', 0.062), ('convolutional', 0.061), ('sparse', 0.061), ('cients', 0.059), ('baraniuk', 0.059), ('coef', 0.058), ('patches', 0.058), ('boltzman', 0.058), ('druckmann', 0.058), ('prespeci', 0.058), ('szj', 0.058), ('kavukcuoglu', 0.057), ('lecun', 0.057), ('place', 0.055), ('excitatory', 0.052), ('beck', 0.052), ('jenatton', 0.052), ('descendant', 0.051), ('yann', 0.051), ('locally', 0.048), ('placed', 0.047), ('learned', 0.047), ('jacob', 0.047), ('pursuit', 0.047), ('chklovskii', 0.047), ('ackley', 0.047), ('field', 0.046), ('emergence', 0.046), ('et', 0.045), ('wj', 0.044), ('zeros', 0.044), ('update', 0.043), ('inference', 0.043), ('edge', 0.042), ('layer', 0.042), ('activate', 0.041), ('datapoint', 0.041), ('faces', 0.041), ('lasso', 0.041), ('kim', 0.041), ('exhaustive', 0.041), ('absolute', 0.04), ('unit', 0.04), ('learn', 0.04), ('xj', 0.04), ('constrain', 0.04), ('orientation', 0.04), ('patch', 0.04), ('atoms', 0.04), ('ha', 0.04), ('trained', 0.039), ('enforce', 0.039), ('eld', 0.039), ('entries', 0.039), ('xing', 0.039), ('modulation', 0.038), ('interaction', 0.037)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 276 nips-2011-Structured sparse coding via lateral inhibition

Author: Arthur D. Szlam, Karol Gregor, Yann L. Cun

Abstract: This work describes a conceptually simple method for structured sparse coding and dictionary design. Supposing a dictionary with K atoms, we introduce a structure as a set of penalties or interactions between every pair of atoms. We describe modifications of standard sparse coding algorithms for inference in this setting, and describe experiments showing that these algorithms are efficient. We show that interesting dictionaries can be learned for interactions that encode tree structures or locally connected structures. Finally, we show that our framework allows us to learn the values of the interactions from the data, rather than having them pre-specified. 1

2 0.19773071 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure

Author: Animashree Anandkumar, Kamalika Chaudhuri, Daniel J. Hsu, Sham M. Kakade, Le Song, Tong Zhang

Abstract: This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables). We propose the Spectral Recursive Grouping algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our finite sample size bounds for exact recovery of the tree structure reveal certain natural dependencies on underlying statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables from second-order statistics. 1

3 0.16575971 259 nips-2011-Sparse Estimation with Structured Dictionaries

Author: David P. Wipf

Abstract: In the vast majority of recent work on sparse estimation algorithms, performance has been evaluated using ideal or quasi-ideal dictionaries (e.g., random Gaussian or Fourier) characterized by unit ℓ2 norm, incoherent columns or features. But in reality, these types of dictionaries represent only a subset of the dictionaries that are actually used in practice (largely restricted to idealized compressive sensing applications). In contrast, herein sparse estimation is considered in the context of structured dictionaries possibly exhibiting high coherence between arbitrary groups of columns and/or rows. Sparse penalized regression models are analyzed with the purpose of finding, to the extent possible, regimes of dictionary invariant performance. In particular, a Type II Bayesian estimator with a dictionarydependent sparsity penalty is shown to have a number of desirable invariance properties leading to provable advantages over more conventional penalties such as the ℓ1 norm, especially in areas where existing theoretical recovery guarantees no longer hold. This can translate into improved performance in applications such as model selection with correlated features, source localization, and compressive sensing with constrained measurement directions. 1

4 0.16250855 244 nips-2011-Selecting Receptive Fields in Deep Networks

Author: Adam Coates, Andrew Y. Ng

Abstract: Recent deep learning and unsupervised feature learning systems that learn from unlabeled data have achieved high performance in benchmarks by using extremely large architectures with many features (hidden units) at each layer. Unfortunately, for such large architectures the number of parameters can grow quadratically in the width of the network, thus necessitating hand-coded “local receptive fields” that limit the number of connections from lower level features to higher ones (e.g., based on spatial locality). In this paper we propose a fast method to choose these connections that may be incorporated into a wide variety of unsupervised training methods. Specifically, we choose local receptive fields that group together those low-level features that are most similar to each other according to a pairwise similarity metric. This approach allows us to harness the advantages of local receptive fields (such as improved scalability, and reduced data requirements) when we do not know how to specify such receptive fields by hand or where our unsupervised training algorithm has no obvious generalization to a topographic setting. We produce results showing how this method allows us to use even simple unsupervised training algorithms to train successful multi-layered networks that achieve state-of-the-art results on CIFAR and STL datasets: 82.0% and 60.1% accuracy, respectively. 1

5 0.14746055 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms

Author: Liefeng Bo, Xiaofeng Ren, Dieter Fox

Abstract: Extracting good representations from images is essential for many computer vision tasks. In this paper, we propose hierarchical matching pursuit (HMP), which builds a feature hierarchy layer-by-layer using an efficient matching pursuit encoder. It includes three modules: batch (tree) orthogonal matching pursuit, spatial pyramid max pooling, and contrast normalization. We investigate the architecture of HMP, and show that all three components are critical for good performance. To speed up the orthogonal matching pursuit, we propose a batch tree orthogonal matching pursuit that is particularly suitable to encode a large number of observations that share the same large dictionary. HMP is scalable and can efficiently handle full-size images. In addition, HMP enables linear support vector machines (SVM) to match the performance of nonlinear SVM while being scalable to large datasets. We compare HMP with many state-of-the-art algorithms including convolutional deep belief networks, SIFT based single layer sparse coding, and kernel based feature learning. HMP consistently yields superior accuracy on three types of image classification problems: object recognition (Caltech-101), scene recognition (MIT-Scene), and static event recognition (UIUC-Sports). 1

6 0.14192453 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model

7 0.13217086 261 nips-2011-Sparse Filtering

8 0.12987083 182 nips-2011-Nearest Neighbor based Greedy Coordinate Descent

9 0.12855658 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons

10 0.12774011 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries

11 0.11686397 250 nips-2011-Shallow vs. Deep Sum-Product Networks

12 0.11277495 298 nips-2011-Unsupervised learning models of primary cortical receptive fields and receptive field plasticity

13 0.11126408 94 nips-2011-Facial Expression Transfer with Input-Output Temporal Restricted Boltzmann Machines

14 0.095091477 271 nips-2011-Statistical Tests for Optimization Efficiency

15 0.091957361 262 nips-2011-Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation

16 0.088239737 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition

17 0.087728314 154 nips-2011-Learning person-object interactions for action recognition in still images

18 0.086763754 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis

19 0.085631944 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning

20 0.085471526 157 nips-2011-Learning to Search Efficiently in High Dimensions


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.267), (1, 0.132), (2, -0.007), (3, -0.031), (4, -0.02), (5, 0.088), (6, 0.079), (7, 0.207), (8, -0.011), (9, -0.166), (10, 0.031), (11, -0.117), (12, -0.077), (13, 0.002), (14, -0.019), (15, 0.049), (16, 0.038), (17, -0.047), (18, -0.059), (19, 0.052), (20, 0.082), (21, 0.007), (22, -0.034), (23, 0.036), (24, -0.015), (25, -0.037), (26, 0.042), (27, 0.002), (28, -0.052), (29, 0.056), (30, -0.039), (31, -0.129), (32, 0.043), (33, -0.098), (34, -0.051), (35, -0.068), (36, -0.046), (37, -0.002), (38, -0.092), (39, 0.107), (40, -0.021), (41, -0.126), (42, 0.072), (43, 0.01), (44, 0.031), (45, -0.096), (46, -0.099), (47, -0.073), (48, -0.009), (49, 0.07)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95247293 276 nips-2011-Structured sparse coding via lateral inhibition

Author: Arthur D. Szlam, Karol Gregor, Yann L. Cun

Abstract: This work describes a conceptually simple method for structured sparse coding and dictionary design. Supposing a dictionary with K atoms, we introduce a structure as a set of penalties or interactions between every pair of atoms. We describe modifications of standard sparse coding algorithms for inference in this setting, and describe experiments showing that these algorithms are efficient. We show that interesting dictionaries can be learned for interactions that encode tree structures or locally connected structures. Finally, we show that our framework allows us to learn the values of the interactions from the data, rather than having them pre-specified. 1

2 0.6261878 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms

Author: Liefeng Bo, Xiaofeng Ren, Dieter Fox

Abstract: Extracting good representations from images is essential for many computer vision tasks. In this paper, we propose hierarchical matching pursuit (HMP), which builds a feature hierarchy layer-by-layer using an efficient matching pursuit encoder. It includes three modules: batch (tree) orthogonal matching pursuit, spatial pyramid max pooling, and contrast normalization. We investigate the architecture of HMP, and show that all three components are critical for good performance. To speed up the orthogonal matching pursuit, we propose a batch tree orthogonal matching pursuit that is particularly suitable to encode a large number of observations that share the same large dictionary. HMP is scalable and can efficiently handle full-size images. In addition, HMP enables linear support vector machines (SVM) to match the performance of nonlinear SVM while being scalable to large datasets. We compare HMP with many state-of-the-art algorithms including convolutional deep belief networks, SIFT based single layer sparse coding, and kernel based feature learning. HMP consistently yields superior accuracy on three types of image classification problems: object recognition (Caltech-101), scene recognition (MIT-Scene), and static event recognition (UIUC-Sports). 1

3 0.5971697 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition

Author: Nobuyuki Morioka, Shin'ichi Satoh

Abstract: Sparse coding, a method of explaining sensory data with as few dictionary bases as possible, has attracted much attention in computer vision. For visual object category recognition, ℓ1 regularized sparse coding is combined with the spatial pyramid representation to obtain state-of-the-art performance. However, because of its iterative optimization, applying sparse coding onto every local feature descriptor extracted from an image database can become a major bottleneck. To overcome this computational challenge, this paper presents “Generalized Lasso based Approximation of Sparse coding” (GLAS). By representing the distribution of sparse coefficients with slice transform, we fit a piece-wise linear mapping function with the generalized lasso. We also propose an efficient post-refinement procedure to perform mutual inhibition between bases which is essential for an overcomplete setting. The experiments show that GLAS obtains a comparable performance to ℓ1 regularized sparse coding, yet achieves a significant speed up demonstrating its effectiveness for large-scale visual recognition problems. 1

4 0.5958181 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries

Author: Zhen J. Xiang, Hao Xu, Peter J. Ramadge

Abstract: Learning sparse representations on data adaptive dictionaries is a state-of-the-art method for modeling data. But when the dictionary is large and the data dimension is high, it is a computationally challenging problem. We explore three aspects of the problem. First, we derive new, greatly improved screening tests that quickly identify codewords that are guaranteed to have zero weights. Second, we study the properties of random projections in the context of learning sparse representations. Finally, we develop a hierarchical framework that uses incremental random projections and screening to learn, in small stages, a hierarchically structured dictionary for sparse representations. Empirical results show that our framework can learn informative hierarchical sparse representations more efficiently. 1

5 0.58091903 259 nips-2011-Sparse Estimation with Structured Dictionaries

Author: David P. Wipf

Abstract: In the vast majority of recent work on sparse estimation algorithms, performance has been evaluated using ideal or quasi-ideal dictionaries (e.g., random Gaussian or Fourier) characterized by unit ℓ2 norm, incoherent columns or features. But in reality, these types of dictionaries represent only a subset of the dictionaries that are actually used in practice (largely restricted to idealized compressive sensing applications). In contrast, herein sparse estimation is considered in the context of structured dictionaries possibly exhibiting high coherence between arbitrary groups of columns and/or rows. Sparse penalized regression models are analyzed with the purpose of finding, to the extent possible, regimes of dictionary invariant performance. In particular, a Type II Bayesian estimator with a dictionarydependent sparsity penalty is shown to have a number of desirable invariance properties leading to provable advantages over more conventional penalties such as the ℓ1 norm, especially in areas where existing theoretical recovery guarantees no longer hold. This can translate into improved performance in applications such as model selection with correlated features, source localization, and compressive sensing with constrained measurement directions. 1

6 0.57622677 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model

7 0.5715552 143 nips-2011-Learning Anchor Planes for Classification

8 0.55433398 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure

9 0.52299803 261 nips-2011-Sparse Filtering

10 0.51155311 124 nips-2011-ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning

11 0.5008536 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)

12 0.48125154 298 nips-2011-Unsupervised learning models of primary cortical receptive fields and receptive field plasticity

13 0.4786433 244 nips-2011-Selecting Receptive Fields in Deep Networks

14 0.47765785 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis

15 0.46725675 111 nips-2011-Hashing Algorithms for Large-Scale Learning

16 0.46398425 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning

17 0.45999622 27 nips-2011-Advice Refinement in Knowledge-Based SVMs

18 0.44807217 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation

19 0.4455331 182 nips-2011-Nearest Neighbor based Greedy Coordinate Descent

20 0.44487038 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.028), (4, 0.044), (17, 0.197), (20, 0.039), (26, 0.025), (31, 0.091), (33, 0.023), (43, 0.087), (45, 0.099), (57, 0.043), (65, 0.041), (74, 0.102), (83, 0.058), (84, 0.012), (99, 0.04)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.8853966 27 nips-2011-Advice Refinement in Knowledge-Based SVMs

Author: Gautam Kunapuli, Richard Maclin, Jude W. Shavlik

Abstract: Knowledge-based support vector machines (KBSVMs) incorporate advice from domain experts, which can improve generalization significantly. A major limitation that has not been fully addressed occurs when the expert advice is imperfect, which can lead to poorer models. We propose a model that extends KBSVMs and is able to not only learn from data and advice, but also simultaneously improves the advice. The proposed approach is particularly effective for knowledge discovery in domains with few labeled examples. The proposed model contains bilinear constraints, and is solved using two iterative approaches: successive linear programming and a constrained concave-convex approach. Experimental results demonstrate that these algorithms yield useful refinements to expert advice, as well as improve the performance of the learning algorithm overall.

same-paper 2 0.84319788 276 nips-2011-Structured sparse coding via lateral inhibition

Author: Arthur D. Szlam, Karol Gregor, Yann L. Cun

Abstract: This work describes a conceptually simple method for structured sparse coding and dictionary design. Supposing a dictionary with K atoms, we introduce a structure as a set of penalties or interactions between every pair of atoms. We describe modifications of standard sparse coding algorithms for inference in this setting, and describe experiments showing that these algorithms are efficient. We show that interesting dictionaries can be learned for interactions that encode tree structures or locally connected structures. Finally, we show that our framework allows us to learn the values of the interactions from the data, rather than having them pre-specified. 1

3 0.81912374 286 nips-2011-The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning

Author: Marius Kloft, Gilles Blanchard

Abstract: We derive an upper bound on the local Rademacher complexity of ℓp-norm multiple kernel learning, which yields a tighter excess risk bound than global approaches. Previous local approaches analyzed the case p = 1 only, while our analysis covers all cases 1 ≤ p ≤ ∞, assuming the different feature mappings corresponding to the different kernels to be uncorrelated. We also show a lower bound that shows that the bound is tight, and derive consequences regarding excess loss, namely fast convergence rates of the order O(n^(−α/(1+α))), where α is the minimum eigenvalue decay rate of the individual kernels. 1

4 0.70977151 258 nips-2011-Sparse Bayesian Multi-Task Learning

Author: Shengbo Guo, Onno Zoeter, Cédric Archambeau

Abstract: We propose a new sparse Bayesian model for multi-task regression and classification. The model is able to capture correlations between tasks, or more specifically a low-rank approximation of the covariance matrix, while being sparse in the features. We introduce a general family of group sparsity inducing priors based on matrix-variate Gaussian scale mixtures. We show the amount of sparsity can be learnt from the data by combining an approximate inference approach with type II maximum likelihood estimation of the hyperparameters. Empirical evaluations on data sets from biology and vision demonstrate the applicability of the model, where on both regression and classification tasks it achieves competitive predictive performance compared to previously proposed methods. 1

5 0.70857894 68 nips-2011-Demixed Principal Component Analysis

Author: Wieland Brendel, Ranulfo Romo, Christian K. Machens

Abstract: In many experiments, the data points collected live in high-dimensional observation spaces, yet can be assigned a set of labels or parameters. In electrophysiological recordings, for instance, the responses of populations of neurons generally depend on mixtures of experimentally controlled parameters. The heterogeneity and diversity of these parameter dependencies can make visualization and interpretation of such data extremely difficult. Standard dimensionality reduction techniques such as principal component analysis (PCA) can provide a succinct and complete description of the data, but the description is constructed independent of the relevant task variables and is often hard to interpret. Here, we start with the assumption that a particularly informative description is one that reveals the dependency of the high-dimensional data on the individual parameters. We show how to modify the loss function of PCA so that the principal components seek to capture both the maximum amount of variance about the data, while also depending on a minimum number of parameters. We call this method demixed principal component analysis (dPCA) as the principal components here segregate the parameter dependencies. We phrase the problem as a probabilistic graphical model, and present a fast Expectation-Maximization (EM) algorithm. We demonstrate the use of this algorithm for electrophysiological data and show that it serves to demix the parameter-dependence of a neural population response. 1

6 0.70692033 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis

7 0.70647085 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)

8 0.70283401 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

9 0.69469965 186 nips-2011-Noise Thresholds for Spectral Clustering

10 0.69011605 144 nips-2011-Learning Auto-regressive Models from Sequence and Non-sequence Data

11 0.68995982 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations

12 0.68908197 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation

13 0.6856975 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data

14 0.68523455 219 nips-2011-Predicting response time and error rates in visual search

15 0.68465924 285 nips-2011-The Kernel Beta Process

16 0.68463212 281 nips-2011-The Doubly Correlated Nonparametric Topic Model

17 0.68341535 265 nips-2011-Sparse recovery by thresholded non-negative least squares

18 0.6826635 156 nips-2011-Learning to Learn with Compound HD Models

19 0.68223494 102 nips-2011-Generalised Coupled Tensor Factorisation

20 0.68212461 236 nips-2011-Regularized Laplacian Estimation and Fast Eigenvector Approximation