nips nips2006 nips2006-75 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Honglak Lee, Alexis Battle, Rajat Raina, Andrew Y. Ng
Abstract: Sparse coding provides a class of algorithms for finding succinct representations of stimuli; given only unlabeled input data, it discovers basis functions that capture higher-level features in the data. However, finding sparse codes remains a very difficult computational problem. In this paper, we present efficient sparse coding algorithms that are based on iteratively solving two convex optimization problems: an L1-regularized least squares problem and an L2-constrained least squares problem. We propose novel algorithms to solve both of these optimization problems. Our algorithms result in a significant speedup for sparse coding, allowing us to learn larger sparse codes than possible with previously described algorithms. We apply these algorithms to natural images and demonstrate that the inferred sparse codes exhibit end-stopping and non-classical receptive field surround suppression and, therefore, may provide a partial explanation for these two phenomena in V1 neurons.
Reference: text
sentIndex sentText sentNum sentScore
1 Efficient sparse coding algorithms. Honglak Lee, Alexis Battle, Rajat Raina, Andrew Y. Ng; Computer Science Department, Stanford University, Stanford, CA 94305. [sent-1, score-0.335]
2 Abstract: Sparse coding provides a class of algorithms for finding succinct representations of stimuli; given only unlabeled input data, it discovers basis functions that capture higher-level features in the data. [sent-2, score-0.441]
3 In this paper, we present efficient sparse coding algorithms that are based on iteratively solving two convex optimization problems: an L1 -regularized least squares problem and an L2 -constrained least squares problem. [sent-4, score-0.619]
4 Our algorithms result in a significant speedup for sparse coding, allowing us to learn larger sparse codes than possible with previously described algorithms. [sent-6, score-0.324]
5 We apply these algorithms to natural images and demonstrate that the inferred sparse codes exhibit end-stopping and non-classical receptive field surround suppression and, therefore, may provide a partial explanation for these two phenomena in V1 neurons. [sent-7, score-0.595]
6 1 Introduction. Sparse coding provides a class of algorithms for finding succinct representations of stimuli; given only unlabeled input data, it learns basis functions that capture higher-level features in the data. [sent-8, score-0.441]
7 When a sparse coding algorithm is applied to natural images, the learned bases resemble the receptive fields of neurons in the visual cortex [1, 2]; moreover, sparse coding produces localized bases when applied to other natural stimuli such as speech and video [3, 4]. [sent-9, score-2.057]
8 Unlike some other unsupervised learning techniques such as PCA, sparse coding can be applied to learning overcomplete basis sets, in which the number of bases is greater than the input dimension. [sent-10, score-1.15]
9 Sparse coding can also model inhibition between the bases by sparsifying their activations. [sent-11, score-0.759]
10 Similar properties have been observed in biological neurons, thus making sparse coding a plausible model of the visual cortex [2, 5]. [sent-12, score-0.335]
11 Despite the rich promise of sparse coding models, we believe that their development has been hampered by their expensive computational cost. [sent-13, score-0.335]
12 In this paper, we develop a class of efficient sparse coding algorithms that are based on alternating optimization over two subsets of the variables. [sent-15, score-0.403]
13 The optimization problems over each of the subsets of variables are convex; in particular, the optimization over the first subset is an L1 -regularized least squares problem; the optimization over the second subset of variables is an L2 -constrained least squares problem. [sent-16, score-0.388]
14 Our method allows us to efficiently learn large overcomplete bases from natural images. [sent-18, score-0.75]
15 We demonstrate that the resulting learned bases exhibit (i) end-stopping [6] and (ii) modulation by stimuli outside the classical receptive field (nCRF surround suppression) [7]. [sent-19, score-0.804]
16 Thus, sparse coding may also provide a partial explanation for these phenomena in V1 neurons. [sent-20, score-0.393]
17 2 Preliminaries. The goal of sparse coding is to represent input vectors approximately as a weighted linear combination of a small number of (unknown) “basis vectors.” [sent-22, score-0.374]
18 Each input vector $\xi \in \mathbb{R}^k$ is represented using basis vectors $b_1, \dots, b_n \in \mathbb{R}^k$ and a sparse vector of weights or “coefficients” $s \in \mathbb{R}^n$ such that $\xi \approx \sum_j b_j s_j$. [sent-27, score-0.403]
19 The basis set can be overcomplete (n > k), and can thus capture a large number of patterns in the input data. [sent-28, score-0.283]
20 Sparse coding is a method for discovering good basis vectors automatically using only unlabeled data. [sent-29, score-0.354]
21 The standard generative model assumes that the reconstruction error $\xi - \sum_j b_j s_j$ follows a zero-mean Gaussian distribution with covariance $\sigma^2 I$. [sent-30, score-0.27]
22 To favor sparse coefficients, the prior distribution for each coefficient $s_j$ is defined as $P(s_j) \propto \exp(-\beta\,\phi(s_j))$, where $\phi(\cdot)$ is a sparsity function and $\beta$ is a constant. [sent-31, score-0.484]
23 For example, we can use one of the following: $\phi(s_j) = \|s_j\|_1$ (L1 penalty function), $\phi(s_j) = (s_j^2 + \epsilon)^{1/2}$ (epsilonL1 penalty function), or $\phi(s_j) = \log(1 + s_j^2)$ (log penalty function). (1) [sent-32, score-0.399]
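As a small, self-contained illustration (not taken from the paper's own code), the three penalty choices in (1) can be written as NumPy functions; the smoothing constant `eps` used for the epsilonL1 penalty is an assumed placeholder value.

```python
import numpy as np

def l1_penalty(s):
    # phi(s) = |s|: the default penalty used in the paper.
    return np.abs(s)

def epsilon_l1_penalty(s, eps=1e-8):
    # phi(s) = (s^2 + eps)^(1/2): a smooth approximation of |s|.
    return np.sqrt(s ** 2 + eps)

def log_penalty(s):
    # phi(s) = log(1 + s^2): a non-convex sparsity penalty.
    return np.log1p(s ** 2)

s = np.array([-2.0, 0.0, 0.5])
print(l1_penalty(s), epsilon_l1_penalty(s), log_penalty(s))
```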
24 In this paper, we will use the L1 penalty unless otherwise mentioned; L1 regularization is known to produce sparse coefficients and can be robust to irrelevant features [9]. [sent-33, score-0.205]
25 The maximum a posteriori estimate of the bases and coefficients, assuming a uniform prior on the bases, is the solution to the following optimization problem: $\text{minimize}_{\{b_j\},\{s^{(i)}\}}\ \sum_{i=1}^{m} \frac{1}{2\sigma^2}\,\big\|\xi^{(i)} - \sum_{j=1}^{n} b_j s_j^{(i)}\big\|^2 + \beta \sum_{i=1}^{m}\sum_{j=1}^{n} \phi(s_j^{(i)})$ (2), subject to $\|b_j\|^2 \le c,\ \forall j = 1, \dots, n$. [sent-41, score-0.957]
26 This problem can be written more concisely in matrix form: let X ∈ Rk×m be the input matrix (each column is an input vector), let B ∈ Rk×n be the basis matrix (each column is a basis vector), and let S ∈ Rn×m be the coefficient matrix (each column is a coefficient vector). [sent-45, score-0.278]
27 Assuming the use of either the L1 penalty or the epsilonL1 penalty as the sparsity function, the optimization problem is convex in B (while holding S fixed) and convex in S (while holding B fixed), but not convex in both simultaneously. [sent-50, score-0.548]
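To make the structure of (2) concrete, here is a minimal sketch (my own, not the paper's implementation) that evaluates the matrix-form objective and checks the norm constraints; a full learner alternates the two convex subproblems discussed in the following sections.

```python
import numpy as np

def objective(X, B, S, sigma, beta, phi=np.abs):
    """Matrix form of problem (2): reconstruction term plus sparsity penalty."""
    recon = np.sum((X - B @ S) ** 2) / (2.0 * sigma ** 2)
    return recon + beta * np.sum(phi(S))

def feasible(B, c=1.0):
    """Check the quadratic norm constraints ||b_j||^2 <= c on the basis columns."""
    return np.all(np.sum(B ** 2, axis=0) <= c + 1e-12)

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 100))        # toy data: 16-dimensional inputs, 100 examples
B = rng.standard_normal((16, 32))
B /= np.linalg.norm(B, axis=0)            # makes the constraint hold with c = 1
S = np.zeros((32, 100))
print(objective(X, B, S, sigma=1.0, beta=0.1), feasible(B))
# A full learner alternates: fix B and minimize over S (L1-regularized least squares),
# then fix S and minimize over B (L2-constrained least squares), until convergence.
```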
28 For learning the bases B, the optimization problem is a least squares problem with quadratic constraints. [sent-52, score-0.692]
29 However, generic convex optimization solvers are too slow to be applicable to this problem, and gradient descent using iterative projections often shows slow convergence. [sent-56, score-0.338]
30 However, for the L1 sparsity function, the objective is not continuously differentiable and the most straightforward gradient-based methods are difficult to apply. [sent-62, score-0.247]
31 In this paper, we present a new algorithm for solving the L1 -regularized least squares problem and show that it is more efficient for learning sparse coding bases. [sent-67, score-0.427]
32 3 L1-regularized least squares: The feature-sign search algorithm. Consider solving the optimization problem (2) with an L1 penalty over the coefficients $\{s_j^{(i)}\}$ while keeping the bases fixed. [sent-68, score-0.786]
33 This problem can be solved by optimizing over each $s^{(i)}$ individually: $\text{minimize}_{s^{(i)}}\ \big\|\xi^{(i)} - \sum_j b_j s_j^{(i)}\big\|^2 + (2\sigma^2\beta)\sum_j |s_j^{(i)}|$. (4) [sent-69, score-0.27]
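Problem (4) is a standard L1-regularized least squares (Lasso-type) problem, so as a point of reference one can solve it with a generic coordinate-descent solver; the sketch below maps (4) onto scikit-learn's Lasso objective, $(1/(2k))\|\xi - Bs\|^2 + \alpha\|s\|_1$, which gives $\alpha = \sigma^2\beta/k$. This is only a baseline for comparison, not the feature-sign search algorithm introduced next.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
k, n = 64, 128                      # input dimension and number of bases (overcomplete)
B = rng.standard_normal((k, n))
B /= np.linalg.norm(B, axis=0)
xi = rng.standard_normal(k)         # one input vector xi^(i)
sigma, beta = 1.0, 0.2

# Problem (4): minimize ||xi - B s||^2 + (2*sigma^2*beta) * ||s||_1.
# sklearn's Lasso minimizes (1/(2k)) * ||xi - B s||^2 + alpha * ||s||_1,
# so alpha = (2*sigma^2*beta) / (2*k) = sigma^2 * beta / k.
lasso = Lasso(alpha=sigma ** 2 * beta / k, fit_intercept=False, max_iter=10000)
s = lasso.fit(B, xi).coef_
print("nonzero coefficients:", np.count_nonzero(s), "of", n)
```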
34 Notice now that if we know the signs (positive, zero, or negative) of the $s_j^{(i)}$'s at the optimal value, we can replace each of the terms $|s_j^{(i)}|$ with either $s_j^{(i)}$ (if $s_j^{(i)} > 0$), $-s_j^{(i)}$ (if $s_j^{(i)} < 0$), or 0 (if $s_j^{(i)} = 0$). We impose a norm constraint for the bases: $\|b_j\|^2 \le c,\ \forall j = 1, \dots, n$. [sent-70, score-0.916]
35 Norm constraints are necessary because, otherwise, there always exists a linear transformation of the $b_j$'s and $s^{(i)}$'s that keeps $\sum_{j=1}^{n} b_j s_j^{(i)}$ unchanged while making the $s_j^{(i)}$'s approach zero. [sent-74, score-0.54]
36 Our algorithm, therefore, tries to search for, or “guess,” the signs of the coefficients $s_j^{(i)}$; given any such guess, we can efficiently solve the resulting unconstrained QP. [sent-79, score-0.379]
37 It maintains an active set of potentially nonzero coefficients and their corresponding signs—all other coefficients must be zero—and systematically searches for the optimal active set and coefficient signs. [sent-83, score-0.4]
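For a given sign guess, the unconstrained QP mentioned above has a closed-form solution; the sketch below is my own derivation (setting the gradient of $\|y - \hat{A}\hat{x}\|^2 + \gamma\,\hat{\theta}^\top\hat{x}$ to zero), shown here in isolation rather than as the paper's full routine.

```python
import numpy as np

def qp_given_signs(A, y, gamma, active, theta):
    """Solve min_xhat ||y - A_hat xhat||^2 + gamma * theta_hat^T xhat analytically,
    where A_hat keeps only the columns in `active` and theta_hat holds their guessed signs.
    Setting the gradient to zero gives xhat = (A_hat^T A_hat)^{-1} (A_hat^T y - gamma*theta_hat/2)."""
    A_hat = A[:, active]
    theta_hat = theta[active]
    xhat = np.linalg.solve(A_hat.T @ A_hat, A_hat.T @ y - 0.5 * gamma * theta_hat)
    x = np.zeros(A.shape[1])
    x[active] = xhat
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
y = rng.standard_normal(20)
active = np.array([1, 4, 7])
theta = np.zeros(10)
theta[active] = np.array([1.0, -1.0, 1.0])
print(qp_given_signs(A, y, gamma=0.5, active=active, theta=theta))
```

In the full feature-sign step, one would then search along the segment from the current solution toward this candidate, checking the objective wherever a coefficient changes sign, which is what the convergence argument below relies on.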
38 Algorithm 1 (Feature-sign search), Step 1: Initialize $x := 0$, $\theta := 0$, and active set $:= \{\}$, where $\theta_i \in \{-1, 0, 1\}$ denotes $\mathrm{sign}(x_i)$. [sent-86, score-0.251]
39 Step 2: Activate $x_i$ (add $i$ to the active set) only if it locally improves the objective, namely: if $\partial\|y - Ax\|^2 / \partial x_i > \gamma$, then set $\theta_i := -1$ and active set $:= \{i\} \cup \text{active set}$. [sent-88, score-0.522]
40 If $\partial\|y - Ax\|^2 / \partial x_i < -\gamma$, then set $\theta_i := 1$ and active set $:= \{i\} \cup \text{active set}$. [sent-89, score-0.328]
41 Let $\hat{x}$ and $\hat{\theta}$ be the subvectors of $x$ and $\theta$ corresponding to the active set. [sent-91, score-0.201]
42 To sketch the proof of convergence, let a coefficient vector x be called consistent with a given active set and sign vector θ if the following two conditions hold for all i: (i) If i is in the active set, then sign(xi ) = θi , and, (ii) If i is not in the active set, then xi = 0. [sent-97, score-0.652]
43 Consider optimization problem (5) augmented with the additional constraint that x is consistent with a given active set and sign vector. [sent-100, score-0.414]
44 Then, if the current coefficients xc are consistent with the active set and sign vector, but are not optimal for the augmented problem at the start of Step 3, the feature-sign step is guaranteed to strictly reduce the objective. [sent-101, score-0.679]
45 Let $\hat{x}_c$ be the subvector of $x_c$ corresponding to the coefficients in the given active set. [sent-103, score-0.645]
46 Since $\hat{x}_c$ is not an optimal point of $\tilde{f}$, we have $\tilde{f}(\hat{x}_{\mathrm{new}}) < \tilde{f}(\hat{x}_c)$. [sent-105, score-0.262]
47 Consider optimization problem (5) augmented with the additional constraint that x is consistent with a given active set and sign vector. [sent-114, score-0.414]
48 If the coefficients xc at the start of Step 2 are optimal for the augmented problem, but are not optimal for problem (5), the feature-sign step is guaranteed to strictly reduce the objective. [sent-115, score-0.425]
49 Since $x_c$ is optimal for the augmented problem, it satisfies optimality condition (a) but not (b); thus, in Step 2, there is some $i$ such that $|\partial\|y - Ax\|^2 / \partial x_i| > \gamma$; this $i$-th coefficient is activated, and $i$ is added to the active set. [sent-117, score-0.546]
50 From (i) and (ii), the line search direction from $\hat{x}_c$ to $\hat{x}_{\mathrm{new}}$ must be consistent with the sign of the activated $x_i$. [sent-120, score-0.733]
51 Finally, since $f(\hat{x}) = \tilde{f}(\hat{x})$ when $\hat{x}$ is consistent with the active set, either $\hat{x}_{\mathrm{new}}$ is consistent, or the first zero-crossing from $\hat{x}_c$ to $\hat{x}_{\mathrm{new}}$ has a lower objective value (by an argument similar to Lemma 3.1). [sent-121, score-0.921]
52 At the start of Step 2, x either satisfies optimality condition 4(a) or is 0; in either case, x is consistent with the current active set and sign vector, and must be optimal for the augmented problem described in the above lemmas. [sent-128, score-0.454]
53 Since the number of all possible active sets and coefficient signs is finite, and since no pair can be repeated (because the objective value is strictly decreasing), the outer loop of Steps 2–4(b) cannot repeat indefinitely. [sent-129, score-0.331]
54 4 Learning bases using the Lagrange dual. In this subsection, we present a method for solving optimization problem (3) over the bases $B$ given fixed coefficients $S$. [sent-134, score-1.222]
55 In general, this constrained optimization problem can be solved using gradient descent with iterative projection [10]. [sent-140, score-0.205]
56 For each dataset, the input dimension k and the number of bases n are specified as k × n. [sent-168, score-0.571]
57 The relative error for an algorithm was defined as $(f_{\mathrm{obj}} - f^*)/f^*$, where $f_{\mathrm{obj}}$ is the final objective value attained by that algorithm, and $f^*$ is the best objective value attained among all the algorithms. [sent-169, score-0.195]
58 After maximizing $D(\lambda)$, we obtain the optimal bases $B$ as follows: $B^\top = (SS^\top + \Lambda)^{-1}(XS^\top)^\top$. [sent-172, score-0.572]
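A minimal sketch of this dual-based basis update follows. The closed form above needs the maximizing dual variables $\Lambda = \mathrm{diag}(\lambda)$; the dual objective used below, $D(\lambda) = \mathrm{tr}(XX^\top) - \mathrm{tr}\big(XS^\top(SS^\top + \Lambda)^{-1}(XS^\top)^\top\big) - c\,\mathrm{tr}(\Lambda)$, is the standard Lagrange dual of the constrained least squares problem and is not reproduced in this excerpt, so treat it as an assumption of this sketch.

```python
import numpy as np
from scipy.optimize import minimize

def neg_dual(lam, X, S, c):
    # Negative of D(lambda) = tr(X X^T) - tr(XS^T (SS^T + Lambda)^{-1} (XS^T)^T) - c * tr(Lambda)
    Lam = np.diag(lam)
    XSt = X @ S.T
    M = np.linalg.solve(S @ S.T + Lam, XSt.T)      # (SS^T + Lambda)^{-1} (XS^T)^T
    return -(np.trace(X @ X.T) - np.trace(XSt @ M) - c * np.sum(lam))

def learn_bases_dual(X, S, c=1.0):
    n = S.shape[0]
    res = minimize(neg_dual, np.ones(n), args=(X, S, c), method="L-BFGS-B",
                   bounds=[(1e-8, None)] * n)      # lambda must stay nonnegative
    Lam = np.diag(res.x)
    # Closed form from the text: B^T = (SS^T + Lambda)^{-1} (XS^T)^T.
    return np.linalg.solve(S @ S.T + Lam, (X @ S.T).T).T

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 200))
S = rng.standard_normal((32, 200)) * (rng.random((32, 200)) < 0.1)   # sparse coefficients
B = learn_bases_dual(X, S)
print(np.max(np.sum(B ** 2, axis=0)))   # column norms should be close to or below c at the dual optimum
```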
59 Note that the dual formulation is independent of the sparsity function (e.g., the L1 or epsilonL1 penalty). [sent-175, score-0.258]
60 5.1 The feature-sign search algorithm. We evaluated the performance of our algorithms on four natural stimulus datasets: natural images, speech, stereo images, and natural image videos. [sent-179, score-0.345]
61 First, we evaluated the feature-sign search algorithm for learning coefficients with the L1 sparsity function. [sent-181, score-0.255]
62 We compared the running time and accuracy to previous state-of-the-art algorithms: a generic QP solver, a modified version of LARS [12] with early stopping, grafting [13], and the interior-point method of Chen et al. [sent-182, score-0.239]
63 Over all datasets, feature-sign search achieved the best objective values as well as the shortest running times. [sent-186, score-0.236]
64 Moreover, feature-sign search has the crucial advantage that it can be initialized with arbitrary starting coefficients (unlike LARS); we will demonstrate that feature-sign search leads to even further speedup over LARS when applied to iterative coefficient learning. [sent-190, score-0.21]
65 5.2 Total time for learning bases. The Lagrange dual method for one basis learning iteration was much faster than gradient descent with iterative projections, and we omit discussion of those results due to space constraints. [sent-192, score-0.885]
66 Below, we directly present results for the overall time taken by sparse coding for learning bases from natural stimulus datasets. [sent-193, score-0.959]
67 The sparsity penalty for topographic cells can be written as $\sum_l \phi\big(\big(\sum_{j \in \mathrm{cell}\ l} s_j^2\big)^{1/2}\big)$, where $\phi(\cdot)$ is a sparsity function and cell $l$ is a topographic cell (e.g., a group of neighboring bases). [sent-194, score-0.496]
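For concreteness, the grouped penalty above can be computed from a list of index sets defining the cells; this snippet is my own illustration and the cell layout shown is arbitrary.

```python
import numpy as np

def topographic_penalty(s, cells, phi=np.abs):
    """sum_l phi( sqrt( sum_{j in cell l} s_j^2 ) ) for a list of index lists `cells`."""
    return sum(phi(np.sqrt(np.sum(s[np.asarray(cell)] ** 2))) for cell in cells)

s = np.array([0.5, -1.0, 0.0, 2.0, 0.1, 0.0])
cells = [[0, 1, 2], [3, 4, 5]]          # e.g., two cells of three neighboring bases each
print(topographic_penalty(s, cells))
```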
68 Thus, its solution tends to have many very small nonzero coefficients; as a result, the objective values obtained from CVX were always slightly worse than those obtained from feature-sign search or LARS. [sent-206, score-0.198]
69 [Table 2 fragment: only the column labels survive here, namely coefficient-learning / basis-learning algorithm combinations (e.g., ConjGrad / LagDual and ConjGrad / GradDesc) under the L1 sparsity function, evaluated on the natural image and speech datasets.] [sent-209, score-0.285]
70 Table 2: The running time (in seconds) for different algorithm combinations using different sparsity functions. [sent-241, score-0.238]
71 Right: The running time per iteration for modified LARS and grafting as a multiple of the running time per iteration for feature-sign search. [sent-244, score-0.322]
72 We initialized the bases randomly and ran each algorithm combination (by alternately optimizing the coefficients and the bases) until convergence. [sent-247, score-0.569]
73 First, we observe that the Lagrange dual method significantly outperformed gradient descent with iterative projections for both L1 and epsilonL1 sparsity; a typical convergence pattern is shown in Figure 1 (left). [sent-249, score-0.258]
74 This result demonstrates that feature-sign search is particularly efficient for iterative optimization, such as learning sparse coding bases. [sent-252, score-0.458]
75 5.3 Learning highly overcomplete natural image bases. Using our efficient algorithms, we were able to learn highly overcomplete bases of natural images, as shown in Figure 2. [sent-254, score-0.595]
76 For example, we were able to learn a set of 1,024 bases (each 14×14 pixels) in about 2 hours, and a set of 2,000 bases (each 20×20 pixels) in about 10 hours. [sent-255, score-0.668]
77 (We ran each algorithm combination until the relative change of the objective per iteration became less than $10^{-6}$. [sent-262, score-0.286]
78 We also evaluated a generic conjugate gradient implementation on the L1 sparsity function; however, it did not converge even after 60,000 seconds.) [sent-269, score-0.532]
79 In contrast, the gradient descent method for basis learning did not result in any reasonable bases even after running for 24 hours. [sent-270, score-0.803]
80 For instance, many visual neurons display “end-stopping,” in which the neuron’s response to a bar image of optimal orientation and placement is actually suppressed as the bar length exceeds an optimal length [6]. [sent-274, score-0.293]
81 Sparse coding can model the interaction (inhibition) between the bases (neurons) by sparsifying their coefficients (activations), and our algorithms enable these phenomena to be tested with highly overcomplete bases. [sent-275, score-0.961]
82 First, we evaluated whether end-stopping behavior could be observed in the sparse coding framework. [sent-276, score-0.335]
83 We generated random bars with different orientations and lengths in 14×14 image patches, and picked the stimulus bar which most strongly activates each basis, considering only the bases which are significantly activated by one of the test bars. [sent-277, score-0.751]
84 For each such highly activated basis, and the corresponding optimal bar position and orientation, we vary the length of the bar from 1 pixel to the maximal size and run sparse coding to measure the coefficients of the selected basis, relative to their maximum value. [sent-278, score-0.577]
85 As shown in Figure 3 (left), for highly overcomplete bases, we observe many cases in which the coefficient decreases significantly as the bar length is increased beyond the optimal point. [sent-279, score-0.256]
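A sketch of the bar-length sweep described above; the bar generator `make_bar`, the random basis matrix, and the use of a generic Lasso solver in place of the paper's coefficient solver are all assumptions of this illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

def make_bar(size=14, length=5, row=7, col=7, width=1):
    """Hypothetical helper: a horizontal bar of the given length, roughly centered at (row, col)."""
    img = np.zeros((size, size))
    half = length // 2
    img[row:row + width, max(0, col - half):min(size, col - half + length)] = 1.0
    return img.ravel()

def coefficients(B, x, alpha=0.05):
    # Arbitrary penalty level for this illustration.
    lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    return lasso.fit(B, x).coef_

# B: learned (overcomplete) basis matrix of shape (196, n_bases); random here just so the sketch runs.
rng = np.random.default_rng(0)
B = rng.standard_normal((196, 512))
B /= np.linalg.norm(B, axis=0)

j = 0                                   # index of the basis under test
responses = []
for length in range(1, 15):
    s = coefficients(B, make_bar(length=length))
    responses.append(s[j])
responses = np.array(responses) / (np.max(np.abs(responses)) + 1e-12)
print(np.round(responses, 3))           # end-stopping would show a decline beyond the optimal length
```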
86 Second, using the learned overcomplete bases, we tested for center-surround non-classical receptive field (nCRF) effects [7]. [sent-281, score-0.241]
87 We found the optimal bar stimuli for 50 random bases and checked that these bases were among the most strongly activated ones for the optimal stimulus. [sent-282, score-1.319]
88 (We used the Lagrange dual formulation for learning bases, and both conjugate gradient with the epsilonL1 sparsity function and feature-sign search with the L1 sparsity function for learning coefficients. [sent-283, score-0.592]
89 The bases learned with both methods showed qualitatively similar receptive fields. [sent-284, score-0.629]
90 The bases shown in Figure 2 were learned using the epsilonL1 sparsity function and 4,000 input image patches randomly sampled for every iteration.) [sent-285, score-0.806]
91 For each of these bases, we measured the response to its optimal bar stimulus with and without an aligned bar stimulus in the surround region (Figure 3, right). [sent-286, score-0.438]
92 We then compared the basis response in these two cases to measure the suppression or facilitation due to the surround stimulus. [sent-287, score-0.327]
93 The aligned surround stimuli produced a suppression of basis activation; 42 out of 50 bases showed suppression with aligned surround input images, and 13 bases among them showed more than 10% suppression, in qualitative accordance with observed nCRF surround suppression effects. [sent-288, score-1.981]
94 6 Application to self-taught learning. Sparse coding is an unsupervised algorithm that learns to represent input data succinctly using only a small number of bases. [sent-289, score-0.273]
95 For example, using the “image edge” bases in Figure 2, it represents a new image patch ξ as a linear combination of just a small number of these bases bj . [sent-290, score-1.191]
96 We apply our sparse coding algorithms to the unlabeled data to learn bases, which gives us a higher-level representation for images, thus making the supervised learning task easier. [sent-295, score-0.418]
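A sketch of that self-taught learning pipeline, with hypothetical arrays and a generic Lasso encoder standing in for the paper's sparse coding algorithms.

```python
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

def sparse_features(B, X, alpha=0.05):
    """Encode each column of X (an image patch) as sparse coefficients over the learned bases B."""
    lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    return np.column_stack([lasso.fit(B, X[:, i]).coef_ for i in range(X.shape[1])])

rng = np.random.default_rng(0)
B = rng.standard_normal((196, 512))            # bases learned from unlabeled data (random stand-in here)
B /= np.linalg.norm(B, axis=0)

X_train = rng.standard_normal((196, 100))      # labeled patches (one per column), hypothetical
y_train = rng.integers(0, 2, 100)

F_train = sparse_features(B, X_train).T        # one sparse feature vector per labeled example
clf = LogisticRegression(max_iter=1000).fit(F_train, y_train)
print("training accuracy:", clf.score(F_train, y_train))
```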
97 Our algorithms can be used to learn an overcomplete set of bases, and we have shown that sparse coding could partially explain the phenomena of end-stopping and nCRF surround suppression in V1 neurons. [sent-299, score-0.795]
98 Emergence of simple-cell receptive field properties by learning a sparse code for natural images. [sent-308, score-0.246]
99 Sparse coding with an overcomplete basis set: A strategy employed by V1? [sent-315, score-0.446]
100 Nature and interaction of signals from the receptive field center and surround in macaque V1 neurons. [sent-353, score-0.224]
wordName wordTfidf (topN-words)
[('bases', 0.532), ('xc', 0.222), ('lars', 0.217), ('coef', 0.215), ('xnew', 0.206), ('coding', 0.202), ('sj', 0.183), ('sparsity', 0.168), ('active', 0.164), ('cients', 0.159), ('overcomplete', 0.144), ('sparse', 0.133), ('surround', 0.13), ('grafting', 0.13), ('lagrange', 0.11), ('basis', 0.1), ('suppression', 0.097), ('qp', 0.093), ('ncrf', 0.091), ('dual', 0.09), ('bj', 0.087), ('search', 0.087), ('sign', 0.086), ('objective', 0.079), ('cvx', 0.079), ('graddesc', 0.073), ('lagdual', 0.073), ('penalty', 0.072), ('bar', 0.072), ('running', 0.07), ('ax', 0.07), ('receptive', 0.07), ('optimization', 0.068), ('squares', 0.065), ('ss', 0.064), ('xs', 0.062), ('phenomena', 0.058), ('activated', 0.058), ('olshausen', 0.058), ('signs', 0.057), ('descent', 0.055), ('unlabeled', 0.052), ('augmented', 0.052), ('unconstrained', 0.052), ('stimulus', 0.049), ('bs', 0.048), ('succinct', 0.048), ('gradient', 0.046), ('stimuli', 0.045), ('chen', 0.045), ('consistent', 0.044), ('topographic', 0.044), ('modi', 0.044), ('natural', 0.043), ('optimality', 0.041), ('optimal', 0.04), ('image', 0.04), ('guess', 0.04), ('stereo', 0.04), ('step', 0.04), ('generic', 0.039), ('input', 0.039), ('images', 0.037), ('alternatingly', 0.037), ('conjgrad', 0.037), ('fobj', 0.037), ('minimizex', 0.037), ('raina', 0.037), ('subvector', 0.037), ('subvectors', 0.037), ('holding', 0.036), ('iterative', 0.036), ('rk', 0.035), ('speech', 0.034), ('conjugate', 0.033), ('convex', 0.032), ('solver', 0.032), ('nonzero', 0.032), ('video', 0.032), ('battle', 0.032), ('packer', 0.032), ('succinctly', 0.032), ('solvers', 0.031), ('learn', 0.031), ('strictly', 0.031), ('projections', 0.031), ('check', 0.03), ('xi', 0.03), ('neurons', 0.029), ('motorcycles', 0.029), ('learned', 0.027), ('condition', 0.027), ('xd', 0.027), ('codes', 0.027), ('least', 0.027), ('aligned', 0.026), ('iteration', 0.026), ('sparsifying', 0.025), ('cient', 0.025), ('macaque', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 75 nips-2006-Efficient sparse coding algorithms
Author: Honglak Lee, Alexis Battle, Rajat Raina, Andrew Y. Ng
Abstract: Sparse coding provides a class of algorithms for finding succinct representations of stimuli; given only unlabeled input data, it discovers basis functions that capture higher-level features in the data. However, finding sparse codes remains a very difficult computational problem. In this paper, we present efficient sparse coding algorithms that are based on iteratively solving two convex optimization problems: an L1 -regularized least squares problem and an L2 -constrained least squares problem. We propose novel algorithms to solve both of these optimization problems. Our algorithms result in a significant speedup for sparse coding, allowing us to learn larger sparse codes than possible with previously described algorithms. We apply these algorithms to natural images and demonstrate that the inferred sparse codes exhibit end-stopping and non-classical receptive field surround suppression and, therefore, may provide a partial explanation for these two phenomena in V1 neurons. 1
2 0.13795148 167 nips-2006-Recursive ICA
Author: Honghao Shan, Lingyun Zhang, Garrison W. Cottrell
Abstract: Independent Component Analysis (ICA) is a popular method for extracting independent features from visual data. However, as a fundamentally linear technique, there is always nonlinear residual redundancy that is not captured by ICA. Hence there have been many attempts to try to create a hierarchical version of ICA, but so far none of the approaches have a natural way to apply them more than once. Here we show that there is a relatively simple technique that transforms the absolute values of the outputs of a previous application of ICA into a normal distribution, to which ICA maybe applied again. This results in a recursive ICA algorithm that may be applied any number of times in order to extract higher order structure from previous layers. 1
3 0.11597763 153 nips-2006-Online Clustering of Moving Hyperplanes
Author: René Vidal
Abstract: We propose a recursive algorithm for clustering trajectories lying in multiple moving hyperplanes. Starting from a given or random initial condition, we use normalized gradient descent to update the coefficients of a time varying polynomial whose degree is the number of hyperplanes and whose derivatives at a trajectory give an estimate of the vector normal to the hyperplane containing that trajectory. As time proceeds, the estimates of the hyperplane normals are shown to track their true values in a stable fashion. The segmentation of the trajectories is then obtained by clustering their associated normal vectors. The final result is a simple recursive algorithm for segmenting a variable number of moving hyperplanes. We test our algorithm on the segmentation of dynamic scenes containing rigid motions and dynamic textures, e.g., a bird floating on water. Our method not only segments the bird motion from the surrounding water motion, but also determines patterns of motion in the scene (e.g., periodic motion) directly from the temporal evolution of the estimated polynomial coefficients. Our experiments also show that our method can deal with appearing and disappearing motions in the scene.
4 0.10678688 179 nips-2006-Sparse Representation for Signal Classification
Author: Ke Huang, Selin Aviyente
Abstract: In this paper, application of sparse representation (factorization) of signals over an overcomplete basis (dictionary) for signal classification is discussed. Searching for the sparse representation of a signal over an overcomplete dictionary is achieved by optimizing an objective function that includes two terms: one that measures the signal reconstruction error and another that measures the sparsity. This objective function works well in applications where signals need to be reconstructed, like coding and denoising. On the other hand, discriminative methods, such as linear discriminative analysis (LDA), are better suited for classification tasks. However, discriminative methods are usually sensitive to corruption in signals due to lacking crucial properties for signal reconstruction. In this paper, we present a theoretical framework for signal classification with sparse representation. The approach combines the discrimination power of the discriminative methods with the reconstruction property and the sparsity of the sparse representation that enables one to deal with signal corruptions: noise, missing data and outliers. The proposed approach is therefore capable of robust classification with a sparse representation of signals. The theoretical results are demonstrated with signal classification tasks, showing that the proposed approach outperforms the standard discriminative methods and the standard sparse representation in the case of corrupted signals. 1
5 0.10414305 16 nips-2006-A Theory of Retinal Population Coding
Author: Eizaburo Doi, Michael S. Lewicki
Abstract: Efficient coding models predict that the optimal code for natural images is a population of oriented Gabor receptive fields. These results match response properties of neurons in primary visual cortex, but not those in the retina. Does the retina use an optimal code, and if so, what is it optimized for? Previous theories of retinal coding have assumed that the goal is to encode the maximal amount of information about the sensory signal. However, the image sampled by retinal photoreceptors is degraded both by the optics of the eye and by the photoreceptor noise. Therefore, de-blurring and de-noising of the retinal signal should be important aspects of retinal coding. Furthermore, the ideal retinal code should be robust to neural noise and make optimal use of all available neurons. Here we present a theoretical framework to derive codes that simultaneously satisfy all of these desiderata. When optimized for natural images, the model yields filters that show strong similarities to retinal ganglion cell (RGC) receptive fields. Importantly, the characteristics of receptive fields vary with retinal eccentricities where the optical blur and the number of RGCs are significantly different. The proposed model provides a unified account of retinal coding, and more generally, it may be viewed as an extension of the Wiener filter with an arbitrary number of noisy units. 1
6 0.10043645 74 nips-2006-Efficient Structure Learning of Markov Networks using $L 1$-Regularization
7 0.09835013 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
8 0.095005579 186 nips-2006-Support Vector Machines on a Budget
9 0.093304202 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model
10 0.085127577 20 nips-2006-Active learning for misspecified generalized linear models
11 0.075996071 165 nips-2006-Real-time adaptive information-theoretic optimization of neurophysiology experiments
12 0.068255402 95 nips-2006-Implicit Surfaces with Globally Regularised and Compactly Supported Basis Functions
13 0.065919451 92 nips-2006-High-Dimensional Graphical Model Selection Using $\ell 1$-Regularized Logistic Regression
14 0.065336071 15 nips-2006-A Switched Gaussian Process for Estimating Disparity and Segmentation in Binocular Stereo
15 0.062847018 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
16 0.060816228 187 nips-2006-Temporal Coding using the Response Properties of Spiking Neurons
17 0.059477933 104 nips-2006-Large-Scale Sparsified Manifold Regularization
18 0.058429997 126 nips-2006-Logistic Regression for Single Trial EEG Classification
19 0.057982642 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
20 0.056644693 48 nips-2006-Branch and Bound for Semi-Supervised Support Vector Machines
topicId topicWeight
[(0, -0.205), (1, -0.038), (2, 0.05), (3, -0.008), (4, -0.049), (5, -0.011), (6, -0.077), (7, -0.049), (8, -0.043), (9, 0.105), (10, -0.033), (11, -0.122), (12, 0.001), (13, -0.133), (14, -0.05), (15, 0.077), (16, 0.102), (17, 0.09), (18, 0.145), (19, -0.004), (20, 0.027), (21, -0.024), (22, 0.053), (23, -0.155), (24, 0.011), (25, 0.023), (26, 0.005), (27, -0.009), (28, 0.062), (29, -0.019), (30, 0.072), (31, -0.174), (32, 0.062), (33, -0.096), (34, -0.004), (35, 0.212), (36, -0.074), (37, 0.152), (38, 0.029), (39, -0.115), (40, 0.072), (41, 0.02), (42, 0.074), (43, -0.059), (44, -0.169), (45, 0.039), (46, 0.08), (47, 0.049), (48, 0.065), (49, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.96264386 75 nips-2006-Efficient sparse coding algorithms
Author: Honglak Lee, Alexis Battle, Rajat Raina, Andrew Y. Ng
Abstract: Sparse coding provides a class of algorithms for finding succinct representations of stimuli; given only unlabeled input data, it discovers basis functions that capture higher-level features in the data. However, finding sparse codes remains a very difficult computational problem. In this paper, we present efficient sparse coding algorithms that are based on iteratively solving two convex optimization problems: an L1 -regularized least squares problem and an L2 -constrained least squares problem. We propose novel algorithms to solve both of these optimization problems. Our algorithms result in a significant speedup for sparse coding, allowing us to learn larger sparse codes than possible with previously described algorithms. We apply these algorithms to natural images and demonstrate that the inferred sparse codes exhibit end-stopping and non-classical receptive field surround suppression and, therefore, may provide a partial explanation for these two phenomena in V1 neurons. 1
2 0.66484225 179 nips-2006-Sparse Representation for Signal Classification
Author: Ke Huang, Selin Aviyente
Abstract: In this paper, application of sparse representation (factorization) of signals over an overcomplete basis (dictionary) for signal classification is discussed. Searching for the sparse representation of a signal over an overcomplete dictionary is achieved by optimizing an objective function that includes two terms: one that measures the signal reconstruction error and another that measures the sparsity. This objective function works well in applications where signals need to be reconstructed, like coding and denoising. On the other hand, discriminative methods, such as linear discriminative analysis (LDA), are better suited for classification tasks. However, discriminative methods are usually sensitive to corruption in signals due to lacking crucial properties for signal reconstruction. In this paper, we present a theoretical framework for signal classification with sparse representation. The approach combines the discrimination power of the discriminative methods with the reconstruction property and the sparsity of the sparse representation that enables one to deal with signal corruptions: noise, missing data and outliers. The proposed approach is therefore capable of robust classification with a sparse representation of signals. The theoretical results are demonstrated with signal classification tasks, showing that the proposed approach outperforms the standard discriminative methods and the standard sparse representation in the case of corrupted signals. 1
3 0.57931238 16 nips-2006-A Theory of Retinal Population Coding
Author: Eizaburo Doi, Michael S. Lewicki
Abstract: Efficient coding models predict that the optimal code for natural images is a population of oriented Gabor receptive fields. These results match response properties of neurons in primary visual cortex, but not those in the retina. Does the retina use an optimal code, and if so, what is it optimized for? Previous theories of retinal coding have assumed that the goal is to encode the maximal amount of information about the sensory signal. However, the image sampled by retinal photoreceptors is degraded both by the optics of the eye and by the photoreceptor noise. Therefore, de-blurring and de-noising of the retinal signal should be important aspects of retinal coding. Furthermore, the ideal retinal code should be robust to neural noise and make optimal use of all available neurons. Here we present a theoretical framework to derive codes that simultaneously satisfy all of these desiderata. When optimized for natural images, the model yields filters that show strong similarities to retinal ganglion cell (RGC) receptive fields. Importantly, the characteristics of receptive fields vary with retinal eccentricities where the optical blur and the number of RGCs are significantly different. The proposed model provides a unified account of retinal coding, and more generally, it may be viewed as an extension of the Wiener filter with an arbitrary number of noisy units. 1
4 0.56975734 153 nips-2006-Online Clustering of Moving Hyperplanes
Author: René Vidal
Abstract: We propose a recursive algorithm for clustering trajectories lying in multiple moving hyperplanes. Starting from a given or random initial condition, we use normalized gradient descent to update the coefficients of a time varying polynomial whose degree is the number of hyperplanes and whose derivatives at a trajectory give an estimate of the vector normal to the hyperplane containing that trajectory. As time proceeds, the estimates of the hyperplane normals are shown to track their true values in a stable fashion. The segmentation of the trajectories is then obtained by clustering their associated normal vectors. The final result is a simple recursive algorithm for segmenting a variable number of moving hyperplanes. We test our algorithm on the segmentation of dynamic scenes containing rigid motions and dynamic textures, e.g., a bird floating on water. Our method not only segments the bird motion from the surrounding water motion, but also determines patterns of motion in the scene (e.g., periodic motion) directly from the temporal evolution of the estimated polynomial coefficients. Our experiments also show that our method can deal with appearing and disappearing motions in the scene.
5 0.55622536 107 nips-2006-Large Margin Multi-channel Analog-to-Digital Conversion with Applications to Neural Prosthesis
Author: Amit Gore, Shantanu Chakrabartty
Abstract: A key challenge in designing analog-to-digital converters for cortically implanted prosthesis is to sense and process high-dimensional neural signals recorded by the micro-electrode arrays. In this paper, we describe a novel architecture for analog-to-digital (A/D) conversion that combines Σ∆ conversion with spatial de-correlation within a single module. The architecture called multiple-input multiple-output (MIMO) Σ∆ is based on a min-max gradient descent optimization of a regularized linear cost function that naturally lends to an A/D formulation. Using an online formulation, the architecture can adapt to slow variations in cross-channel correlations, observed due to relative motion of the microelectrodes with respect to the signal sources. Experimental results with real recorded multi-channel neural data demonstrate the effectiveness of the proposed algorithm in alleviating cross-channel redundancy across electrodes and performing data-compression directly at the A/D converter. 1
6 0.4992407 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model
7 0.46178916 167 nips-2006-Recursive ICA
8 0.44194564 186 nips-2006-Support Vector Machines on a Budget
9 0.42833346 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
10 0.42286286 149 nips-2006-Nonnegative Sparse PCA
11 0.42037401 29 nips-2006-An Information Theoretic Framework for Eukaryotic Gradient Sensing
12 0.40962088 92 nips-2006-High-Dimensional Graphical Model Selection Using $\ell 1$-Regularized Logistic Regression
13 0.39973435 140 nips-2006-Multiple Instance Learning for Computer Aided Diagnosis
14 0.38866949 191 nips-2006-The Robustness-Performance Tradeoff in Markov Decision Processes
15 0.37796018 129 nips-2006-Map-Reduce for Machine Learning on Multicore
16 0.37557787 95 nips-2006-Implicit Surfaces with Globally Regularised and Compactly Supported Basis Functions
17 0.35560212 178 nips-2006-Sparse Multinomial Logistic Regression via Bayesian L1 Regularisation
18 0.33104521 130 nips-2006-Max-margin classification of incomplete data
19 0.32983422 74 nips-2006-Efficient Structure Learning of Markov Networks using $L 1$-Regularization
20 0.31193373 20 nips-2006-Active learning for misspecified generalized linear models
topicId topicWeight
[(1, 0.151), (3, 0.03), (7, 0.067), (9, 0.057), (18, 0.013), (20, 0.016), (22, 0.075), (44, 0.052), (57, 0.09), (64, 0.281), (65, 0.03), (69, 0.03), (71, 0.034)]
simIndex simValue paperId paperTitle
1 0.86586714 14 nips-2006-A Small World Threshold for Economic Network Formation
Author: Eyal Even-dar, Michael Kearns
Abstract: We introduce a game-theoretic model for network formation inspired by earlier stochastic models that mix localized and long-distance connectivity. In this model, players may purchase edges at distance d at a cost of dα , and wish to minimize the sum of their edge purchases and their average distance to other players. In this model, we show there is a striking “small world” threshold phenomenon: in two dimensions, if α < 2 then every Nash equilibrium results in a network of constant diameter (independent of network size), and if α > 2 then every Nash equilibrium results in a network whose diameter grows as a root of the network size, and thus is unbounded. We contrast our results with those of Kleinberg [8] in a stochastic model, and empirically investigate the “navigability” of equilibrium networks. Our theoretical results all generalize to higher dimensions. 1
same-paper 2 0.82916534 75 nips-2006-Efficient sparse coding algorithms
Author: Honglak Lee, Alexis Battle, Rajat Raina, Andrew Y. Ng
Abstract: Sparse coding provides a class of algorithms for finding succinct representations of stimuli; given only unlabeled input data, it discovers basis functions that capture higher-level features in the data. However, finding sparse codes remains a very difficult computational problem. In this paper, we present efficient sparse coding algorithms that are based on iteratively solving two convex optimization problems: an L1 -regularized least squares problem and an L2 -constrained least squares problem. We propose novel algorithms to solve both of these optimization problems. Our algorithms result in a significant speedup for sparse coding, allowing us to learn larger sparse codes than possible with previously described algorithms. We apply these algorithms to natural images and demonstrate that the inferred sparse codes exhibit end-stopping and non-classical receptive field surround suppression and, therefore, may provide a partial explanation for these two phenomena in V1 neurons. 1
3 0.80268049 150 nips-2006-On Transductive Regression
Author: Corinna Cortes, Mehryar Mohri
Abstract: In many modern large-scale learning applications, the amount of unlabeled data far exceeds that of labeled data. A common instance of this problem is the transductive setting where the unlabeled test points are known to the learning algorithm. This paper presents a study of regression problems in that setting. It presents explicit VC-dimension error bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik when applied to classification. It also presents a new transductive regression algorithm inspired by our bound that admits a primal and kernelized closedform solution and deals efficiently with large amounts of unlabeled data. The algorithm exploits the position of unlabeled points to locally estimate their labels and then uses a global optimization to ensure robust predictions. Our study also includes the results of experiments with several publicly available regression data sets with up to 20,000 unlabeled examples. The comparison with other transductive regression algorithms shows that it performs well and that it can scale to large data sets.
4 0.6191926 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model
Author: Marc'aurelio Ranzato, Christopher Poultney, Sumit Chopra, Yann L. Cun
Abstract: We describe a novel unsupervised method for learning sparse, overcomplete features. The model uses a linear encoder, and a linear decoder preceded by a sparsifying non-linearity that turns a code vector into a quasi-binary sparse code vector. Given an input, the optimal code minimizes the distance between the output of the decoder and the input patch while being as similar as possible to the encoder output. Learning proceeds in a two-phase EM-like fashion: (1) compute the minimum-energy code vector, (2) adjust the parameters of the encoder and decoder so as to decrease the energy. The model produces “stroke detectors” when trained on handwritten numerals, and Gabor-like filters when trained on natural image patches. Inference and learning are very fast, requiring no preprocessing, and no expensive sampling. Using the proposed unsupervised method to initialize the first layer of a convolutional network, we achieved an error rate slightly lower than the best reported result on the MNIST dataset. Finally, an extension of the method is described to learn topographical filter maps. 1
5 0.61572671 160 nips-2006-Part-based Probabilistic Point Matching using Equivalence Constraints
Author: Graham Mcneill, Sethu Vijayakumar
Abstract: Correspondence algorithms typically struggle with shapes that display part-based variation. We present a probabilistic approach that matches shapes using independent part transformations, where the parts themselves are learnt during matching. Ideas from semi-supervised learning are used to bias the algorithm towards finding ‘perceptually valid’ part structures. Shapes are represented by unlabeled point sets of arbitrary size and a background component is used to handle occlusion, local dissimilarity and clutter. Thus, unlike many shape matching techniques, our approach can be applied to shapes extracted from real images. Model parameters are estimated using an EM algorithm that alternates between finding a soft correspondence and computing the optimal part transformations using Procrustes analysis.
6 0.60837328 16 nips-2006-A Theory of Retinal Population Coding
7 0.60620826 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
8 0.60401565 65 nips-2006-Denoising and Dimension Reduction in Feature Space
9 0.60309005 138 nips-2006-Multi-Task Feature Learning
10 0.601322 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
11 0.59952003 32 nips-2006-Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization
12 0.59800237 179 nips-2006-Sparse Representation for Signal Classification
13 0.59734303 163 nips-2006-Prediction on a Graph with a Perceptron
14 0.59664661 95 nips-2006-Implicit Surfaces with Globally Regularised and Compactly Supported Basis Functions
15 0.59631056 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
16 0.59408873 167 nips-2006-Recursive ICA
17 0.59387517 152 nips-2006-Online Classification for Complex Problems Using Simultaneous Projections
18 0.59328032 20 nips-2006-Active learning for misspecified generalized linear models
19 0.59324706 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
20 0.59311873 175 nips-2006-Simplifying Mixture Models through Function Approximation