nips nips2011 nips2011-124 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Quoc V. Le, Alexandre Karpenko, Jiquan Ngiam, Andrew Y. Ng
Abstract: Independent Components Analysis (ICA) and its variants have been successfully used for unsupervised feature learning. However, standard ICA requires an orthonormality constraint to be enforced, which makes it difficult to learn overcomplete features. In addition, ICA is sensitive to whitening. These properties make it challenging to scale ICA to high dimensional data. In this paper, we propose a robust soft reconstruction cost for ICA that allows us to learn highly overcomplete sparse features even on unwhitened data. Our formulation reveals formal connections between ICA and sparse autoencoders, which have previously been observed only empirically. Our algorithm can be used in conjunction with off-the-shelf fast unconstrained optimizers. We show that the soft reconstruction cost can also be used to prevent replicated features in tiled convolutional neural networks. Using our method to learn highly overcomplete sparse features and tiled convolutional neural networks, we obtain competitive performances on a wide variety of object recognition tasks. We achieve state-of-the-art test accuracies on the STL-10 and Hollywood2 datasets. 1
Reference: text
sentIndex sentText sentNum sentScore
1 However, standard ICA requires an orthonormality constraint to be enforced, which makes it difficult to learn overcomplete features. [sent-6, score-0.344]
2 In this paper, we propose a robust soft reconstruction cost for ICA that allows us to learn highly overcomplete sparse features even on unwhitened data. [sent-9, score-0.722]
3 We show that the soft reconstruction cost can also be used to prevent replicated features in tiled convolutional neural networks. [sent-12, score-0.476]
4 Using our method to learn highly overcomplete sparse features and tiled convolutional neural networks, we obtain competitive performances on a wide variety of object recognition tasks. [sent-13, score-0.51]
5 These include: sparse auto-encoders [8], Restricted Boltzmann Machines (RBMs) [9], sparse coding [10] and Independent Component Analysis (ICA) [11]. [sent-17, score-0.199]
6 First, it is difficult to learn overcomplete feature representations (i.e., representations in which the number of features k is larger than the input dimension n). [sent-21, score-0.369]
7 The authors of [6] have shown that classification performance improves for algorithms such as sparse autoencoders [8], K-means [6] and RBMs [9], when the learned features are overcomplete. [sent-25, score-0.309]
8 Second, ICA is sensitive to whitening (a preprocessing step that decorrelates the input data, and cannot always be computed exactly for high dimensional data). [sent-26, score-0.209]
9 In this paper we propose a modification to ICA that not only addresses these shortcomings but also reveals strong connections between ICA, sparse autoencoders and sparse coding. [sent-28, score-0.331]
10 This hard orthonormality constraint, W W T = I, is used to prevent degenerate solutions in the feature matrix W (where each feature is a row of W ). [sent-30, score-0.375]
11 Furthermore, while alternative orthonormalization procedures or score matching can learn overcomplete representations, they are expensive to compute. [sent-37, score-0.5]
12 1 Our algorithm enables ICA to scale to overcomplete representations by replacing the orthonormalization constraint with a linear reconstruction penalty (akin to the one used in sparse auto-encoders). [sent-39, score-0.824]
13 This reconstruction penalty removes the need for a constrained optimizer. [sent-40, score-0.252]
14 In addition, recent ICA-based algorithms, such as tiled convolutional neural networks (also known as local receptive field TICA) [12], also suffer from the difficulty of enforcing the hard orthonormality constraint globally. [sent-45, score-0.583]
15 As a result, orthonormalization is typically performed locally instead, which results in copied (i.e., replicated) filters. [sent-46, score-0.204]
16 Our reconstruction penalty, on the other hand, can be enforced globally across all receptive fields. [sent-49, score-0.322]
17 Furthermore, ICA’s sensitivity to whitening is undesirable because exactly whitening high dimensional data is often not feasible. [sent-51, score-0.362]
18 For example, exact whitening using principal component analysis (PCA) for input images of size 200x200 pixels is challenging, because it requires solving the eigendecomposition of a 40,000 x 40,000 covariance matrix. [sent-52, score-0.243]
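For scale, at double precision that covariance matrix alone occupies 40,000^2 x 8 bytes, roughly 12.8 GB, before any eigendecomposition work begins.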
19 Other methods, such as sparse autoencoders or RBMs, work well using approximate whitening and in some cases work even without any whitening. [sent-53, score-0.415]
20 In particular, on the STL-10 dataset, we learn highly overcomplete representations and achieve 52. [sent-61, score-0.352]
21 2 Standard ICA and Reconstruction ICA We begin by introducing our proposed algorithm for overcomplete ICA. [sent-65, score-0.263]
22 The orthonormality constraint W W T = I is used to prevent the bases in W from becoming degenerate. [sent-72, score-0.37]
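For reference, here is a sketch of the standard ICA feature-learning objective that these sentences (and the later reference to equations 1 and 2) describe; the exact smooth sparsity penalty g and the 1/m normalization are assumptions for illustration rather than quotations from the paper.

```latex
% Sketch of the standard ICA objective (cf. equations 1-2 referenced later in the text).
\begin{aligned}
&\underset{W}{\text{minimize}} \;\; \frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k} g\!\left(W_j x^{(i)}\right)
\qquad \text{subject to } W W^{T} = I,\\
&\text{with whitened inputs } x^{(i)} \in \mathbb{R}^{n},\; W \in \mathbb{R}^{k \times n},
\text{ and a smooth sparsity penalty such as } g(\cdot) = \log\cosh(\cdot).
\end{aligned}
```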
23 This preprocessing step is also known as whitening or sphering the data. [sent-76, score-0.171]
24 For overcomplete representations (k > n) [17, 18], the orthonormality constraint can no longer hold. [sent-77, score-0.598]
25 Here, we focus our attention on ICA and its variants such as ISA and TICA in the context of overcomplete representations, where FastICA does not work. [sent-81, score-0.263]
26 A frequently employed form of non-degeneracy control in auto-encoders and sparse coding is the use of reconstruction costs. [sent-89, score-0.337]
27 As a result, we propose to replace the hard orthonormal constraint in ICA with a soft reconstruction cost. [sent-90, score-0.31]
28 The choice to swap the orthonormality constraint with a reconstruction penalty seems arbitrary at first. [sent-95, score-0.55]
29 And second, the reconstruction penalty works even when W is overcomplete and the data is not fully white. [sent-102, score-0.515]
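As a concrete illustration of this unconstrained formulation, the following is a minimal numerical sketch of an RICA-style objective (soft reconstruction penalty plus a smooth L1 sparsity term) handed to an off-the-shelf L-BFGS solver. The choice g(.) = log cosh(.), the 1/m scaling, and all function and variable names are assumptions for illustration, not the authors' released code.

```python
import numpy as np
from scipy.optimize import minimize

def rica_cost(w_flat, X, k, lam):
    """Soft-reconstruction ICA cost and gradient.

    X: (n, m) data matrix, one (whitened or raw) patch per column.
    W: (k, n) feature matrix, k >= n for overcomplete features.
    """
    n, m = X.shape
    W = w_flat.reshape(k, n)
    WX = W @ X                               # feature activations, (k, m)
    F = W.T @ WX - X                         # reconstruction residual, (n, m)
    cost = (lam / m) * np.sum(F ** 2) + np.sum(np.log(np.cosh(WX))) / m
    # d/dW ||W^T W X - X||_F^2 = 2 W (X F^T + F X^T); d/dW sum log cosh(WX) = tanh(WX) X^T
    grad = (2.0 * lam / m) * (W @ (X @ F.T + F @ X.T)) + (np.tanh(WX) @ X.T) / m
    return cost, grad.ravel()

def fit_rica(X, k, lam=1.0, maxiter=200, seed=0):
    n = X.shape[0]
    W0 = 0.01 * np.random.RandomState(seed).randn(k, n)
    res = minimize(rica_cost, W0.ravel(), args=(X, k, lam),
                   jac=True, method="L-BFGS-B", options={"maxiter": maxiter})
    return res.x.reshape(k, n)
```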
30 3 Connections between orthonormality and reconstruction Sparse autoencoders, sparse coding and ICA have been previously suspected to be strongly connected because they learn edge filters for natural image data. [sent-103, score-0.632]
31 We start by reviewing the optimization problems of two common unsupervised feature learning algorithms: sparse autoencoders and sparse coding. [sent-108, score-0.357]
32 In particular, the objective function of tied-weight sparse autoencoders [8, 21, 22, 23] is: minimize_{W,b,c} (λ/m) sum_{i=1}^{m} ||σ(W^T σ(W x^(i) + b) + c) − x^(i)||_2^2 + S({W, b}, x^(1), ..., x^(m)). [sent-109, score-0.265]
33 From these formulations, it is clear there are links between ICA, RICA, sparse autoencoders and sparse coding. [sent-126, score-0.313]
34 In particular, most methods use the L1 sparsity penalty and, except for ICA, most use reconstruction costs as a non-degeneracy control. [sent-127, score-0.286]
35 ICA’s main distinction compared to sparse coding and autoencoders is its use of the hard orthonormality constraint in lieu of reconstruction costs. [sent-129, score-0.804]
36 However, we will now present a proof (consisting of two lemmas) that derives the relationship between ICA’s orthonormality constraint and RICA’s reconstruction cost. [sent-130, score-0.472]
37 Lemma 3.1: when the input data {x^(i)}_{i=1}^m is whitened, the reconstruction cost (λ/m) sum_{i=1}^{m} ||W^T W x^(i) − x^(i)||_2^2 is equivalent to the orthonormality cost λ ||W^T W − I||_F^2. [sent-144, score-0.498]
38 Our second lemma states that minimizing the column orthonormality and row orthonormality costs turns out to be equivalent due to a property of the Frobenius norm. [sent-145, score-0.575]
39 Lemma 3.2: the column orthonormality cost λ ||W^T W − I_n||_F^2 is equivalent to the row orthonormality cost λ ||W W^T − I_k||_F^2 up to an additive constant. [sent-146, score-0.337]
40 Together these two lemmas tell us that the reconstruction cost is equivalent to both the column and row orthonormality costs for whitened data. [sent-147, score-0.791]
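A brief derivation sketch of the two lemmas as reconstructed above, assuming whitening makes the empirical covariance exactly the identity; it uses only the symmetry of W^T W − I and the trace identity tr[(W^T W)^p] = tr[(W W^T)^p].

```latex
\begin{aligned}
\frac{1}{m}\sum_{i=1}^{m}\left\|W^{T}W x^{(i)} - x^{(i)}\right\|_2^2
 &= \mathrm{tr}\!\left[(W^{T}W - I)^{2}\,\frac{1}{m}\sum_{i=1}^{m} x^{(i)} x^{(i)T}\right]
  = \left\|W^{T}W - I_n\right\|_F^{2} && \text{(Lemma 3.1)}\\[4pt]
\left\|W^{T}W - I_n\right\|_F^{2}
 &= \mathrm{tr}\!\left[(W^{T}W)^{2}\right] - 2\,\mathrm{tr}\!\left[W^{T}W\right] + n
  = \left\|W W^{T} - I_k\right\|_F^{2} + (n - k) && \text{(Lemma 3.2)}
\end{aligned}
```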
41 Furthermore, as λ approaches infinity the orthonormality cost becomes the hard orthonormality constraint of ICA (see equations 1 & 2) if W is complete or undercomplete. [sent-148, score-0.594]
42 Thus, ICA’s hard orthonormality constraint and RICA’s reconstruction cost are related under these conditions. [sent-149, score-0.55]
43 More formally, the following remarks explain this conclusion, and describe the set of conditions under which RICA (and by extension ICA) is equivalent to autoencoders and sparse coding. [sent-150, score-0.265]
44 The column orthonormality cost is zero only if the columns of W are orthonormal. [sent-158, score-0.286]
45 Soft penalties are also preferred if we want to learn overcomplete representations where explicitly constraining W W T = I is not possible (see footnote 3). [sent-165, score-0.352]
46 We derive an additional relationship in the appendix (see supplementary material), which shows that for whitened data denoising autoencoders are equivalent to RICA with weight decay. [sent-166, score-0.399]
47 Another interesting connection between RBMs and denoising autoencoders is derived in [25]. [sent-167, score-0.217]
48 The connections between RBMs, autoencoders, and denoising autoencoders, together with the fact that reconstruction cost captures whitening (by the above lemmas), likely explain why whitening does not matter much for RBMs and autoencoders in [6]. [sent-168, score-0.976]
49 4 Effects of whitening on ICA and RICA In practice, ICA tends to be much more sensitive to whitening compared to sparse autoencoders. [sent-169, score-0.429]
50 In this section, we study empirically how whitening affects ICA and our formulation, RICA. [sent-171, score-0.171]
51 We sampled 20000 patches of size 16x16 from a set of 11 natural images [16] and visualized the filters learned using ICA and RICA with raw images, as well as approximately whitened images. [sent-172, score-0.3]
52 For approximate whitening, we use 1/f whitening with low pass filtering. [sent-173, score-0.171]
53 This 1/f whitening transformation uses Fourier analysis of natural image statistics and produces transformed data which has an approximate identity covariance matrix. [sent-174, score-0.193]
54 As a result, 1/f whitening runs quickly and scales well to high dimensional data. [sent-176, score-0.191]
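For reference, a sketch of approximate 1/f whitening with a low-pass roll-off in the Fourier domain; the specific filter f * exp(-(f/f0)^4), the cutoff f0 = 0.4*N, and the global variance rescaling are common choices assumed here, since the exact settings are not given in these sentences.

```python
import numpy as np

def one_over_f_whiten(patches, f0_frac=0.4):
    """Approximate 1/f whitening of square patches, shape (num_patches, N, N)."""
    n = patches.shape[-1]
    fx = np.fft.fftfreq(n) * n                    # horizontal frequency, cycles per patch
    fy = np.fft.fftfreq(n) * n                    # vertical frequency
    f = np.sqrt(fx[None, :] ** 2 + fy[:, None] ** 2)
    filt = f * np.exp(-(f / (f0_frac * n)) ** 4)  # 1/f whitening with low-pass roll-off
    spectra = np.fft.fft2(patches)                # FFT over the last two axes
    white = np.real(np.fft.ifft2(spectra * filt[None]))
    return white / (white.std() + 1e-8)           # rescale to roughly unit variance
```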
55 (a) ICA on 1/f whitened images (b) ICA on raw images (c) RICA on 1/f whitened images (d) RICA on raw images Figure 1: ICA and RICA on approximately whitened and raw images. [sent-178, score-0.826]
56 Figure 1 shows the results of running ICA and RICA on raw and 1/f whitened images. [sent-184, score-0.214]
57 As can be seen, ICA learns very noisy bases on raw data, as well as approximately whitened data. [sent-185, score-0.276]
58 In contrast, RICA works well for 1/f whitened data and raw data. [sent-186, score-0.214]
59 Our quantitative analysis with kurtosis (not shown due to space limits) agrees with visual inspection: RICA learns more kurtotic representations than ICA on approximately whitened or raw data. [sent-187, score-0.268]
60 Robustness to approximate whitening is desirable, because exactly whitening high dimensional data using PCA may not be feasible. [sent-188, score-0.362]
61 With RICA, approximate whitening or raw data can be used instead. [sent-190, score-0.224]
62 5 Local receptive field TICA The first application of our RICA algorithm that we examine is local receptive field neural networks. [sent-192, score-0.246]
63 Footnote 3: note that when W is overcomplete, some rows may degenerate and become zero, because the reconstruction constraint can be satisfied with only a complete subset of rows. [sent-194, score-0.271]
64 We show that swapping out locally enforced orthogonality constraints with a global reconstruction cost solves this issue. [sent-200, score-0.307]
65 The pre-training step for the TCNN (local receptive field TICA) [12] is performed by minimizing the following cost function: minimize_W sum_{i=1}^{m} sum_{j=1}^{k} sqrt(ε + H_j (W x^(i))^2), subject to W W^T = I, (9) where H is the spatial pooling matrix and W is a learned weight matrix. [sent-209, score-0.204]
66 Figure 2: (a) Local receptive field neural network with fully untied weights. [sent-215, score-0.478]
67 A single map consists of local receptive fields that do not share a location (i.e., nodes of different colors). [sent-216, score-0.179]
68 (b) Hard orthonormalization [12] is applied at each location only (i.e., nodes of the same color), which results in copied filters (for example, see the filters outlined in red; notice that the location of the edge stays the same within the image even though the receptive field areas are different). [sent-220, score-0.181] [sent-222, score-0.272]
70 (c) Global reconstruction (this paper) is applied both within each location and across locations (nodes of the same and different colors), which prevents copying of receptive fields. [sent-223, score-0.369]
71 Enforcing the hard orthonormality constraint on the entire sparse W matrix is challenging because it is typically overcomplete for TCNNs. [sent-224, score-0.64]
72 However, visualizing the filters learned by a TCNN with local orthonormalization shows that many adjacent receptive fields end up learning the same (copied) filters due to the lack of an orthonormality constraint between them. [sent-230, score-0.443]
73 For instance, the green nodes in Figure 2 may end up being copies of the red nodes (see the copied receptive fields in Figure 2b). [sent-231, score-0.208]
74 In order to prevent copied features, we replace the local orthonormalization constraint with a global reconstruction cost (i.e., computing the reconstruction cost ||W^T W x^(i) − x^(i)||_2^2 for the entire overcomplete sparse W matrix). [sent-232, score-0.549] [sent-234, score-0.311]
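A minimal sketch of the resulting unconstrained TCNN pre-training cost: the pooled sparsity of equation (9) is kept and the local orthonormality constraint is replaced by the global reconstruction penalty over the full sparse W. The pooling matrix H, eps and lam are assumed fixed hyperparameters, the sparse structure of W is left implicit, and the code is illustrative rather than the authors' implementation.

```python
import numpy as np

def rica_tcnn_cost(W, X, H, lam=1.0, eps=1e-3):
    """Pooled TICA sparsity plus a global reconstruction penalty (no orthonormality constraint).

    W: (k, n) local-receptive-field weights, X: (n, m) data, H: (p, k) fixed pooling matrix.
    """
    m = X.shape[1]
    WX = W @ X                                # local feature activations, (k, m)
    pooled = np.sqrt(eps + H @ (WX ** 2))     # sqrt(eps + H_j (W x)^2) pooled units, (p, m)
    F = W.T @ WX - X                          # global reconstruction residual, (n, m)
    return np.sum(pooled) / m + (lam / m) * np.sum(F ** 2)
```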
76 Figure 3 shows that the reconstruction penalty produces a better distribution of edge detector locations within the image patch (this also holds true for frequencies and orientations). [sent-236, score-0.35]
77 The reconstruction penalty prevents copied filters, producing a more uniform distribution of edge detectors. [sent-257, score-0.364]
78 6 Experiments The following experiments compare the speed gains of RICA over standard overcomplete ICA. [sent-258, score-0.283]
79 1 Speed improvements for overcomplete ICA In this experiment, we examine the speed performance of RICA and overcomplete ICA with score matching [26]. [sent-261, score-0.608]
80 We trained overcomplete ICA on 20000 gray-scale image patches, each patch of size 16x16. [sent-262, score-0.315]
81 In particular, learning features that are 6x overcomplete takes 1 hour using our method, whereas [26] requires 2 days. [sent-268, score-0.304]
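For concreteness, a hypothetical end-to-end call tying the earlier sketches to the setup just described (20,000 gray-scale 16x16 patches, here 4x overcomplete so k = 1024); one_over_f_whiten and fit_rica are the illustrative names defined in the sketches above, and the random array merely stands in for real sampled patches.

```python
import numpy as np

# Stand-in for 20000 sampled 16x16 gray-scale patches; replace with real image patches.
patches = np.random.randn(20000, 16, 16)

X = one_over_f_whiten(patches).reshape(20000, -1).T  # (256, 20000), approximately white
W = fit_rica(X, k=4 * 256, lam=1.0)                  # 4x overcomplete filters, shape (1024, 256)
```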
82 Training time, score matching ICA vs. RICA:
  2x overcomplete: 33000 seconds vs. 1000 seconds (33x speedup)
  4x overcomplete: 65000 seconds vs. 1600 seconds (40x speedup)
  6x overcomplete: 180000 seconds vs. 3700 seconds (48x speedup)
Figure 4 shows the peak frequencies and orientations for 4x overcomplete bases learned using our method. [sent-270, score-1.401]
83 Figure 4: Scatter plot of peak frequencies and orientations of Gabor functions fitted to the filters learned by RICA on whitened images. [sent-274, score-0.244]
84 2 Overcomplete ICA on STL-10 dataset In this section, we evaluate the overcomplete features learned by our model. [sent-277, score-0.328]
85 The experiments are carried out on the STL-10 dataset [6] where overcomplete representations have been shown to work well. [sent-278, score-0.317]
86 We use RICA to learn overcomplete features on 100,000 randomly sampled color patches from the unlabeled images in the STL-10 dataset. [sent-282, score-0.417]
87 [6] on 96x96 images and 10x10 receptive fields, our soft reconstruction ICA achieves 52. [sent-285, score-0.391]
88 Figure 5 (curves: whitened vs. raw; x-axis: number of features): Classification accuracy on the STL-10 dataset as a function of the number of bases learned (for a patch size of 8x8 pixels). [sent-291, score-0.33]
89 Notice that the reconstruction cost in RICA allows us to learn overcomplete representations that outperform the complete representation obtained by the regular ICA. [sent-295, score-0.61]
90 4 In this section we compare the effects of reconstruction versus orthogonality on classification performance using ISA. [sent-300, score-0.233]
91 In our experiments we swap out the orthonormality constraint employed by ISA with a reconstruction penalty. [sent-301, score-0.505]
92 We observe that the reconstruction penalty tends to work better than orthogonality constraints. [sent-303, score-0.294]
93 7 Discussion In this paper, we presented a novel soft reconstruction approach that enables the learning of overcomplete representations in ICA and TICA. [sent-310, score-0.554]
94 We have also presented mathematical proofs that connect ICA with autoencoders and sparse coding. [sent-311, score-0.278]
95 We showed that our algorithm works well even without whitening; and that the reconstruction cost allows us to fix replicated filters in tiled convolutional neural networks. [sent-312, score-0.362]
96 In particular, we found our method to be 30-50x faster than overcomplete ICA with score matching. [sent-314, score-0.295]
97 Furthermore, our overcomplete features achieve state-of-the-art performance on the STL-10 and Hollywood2 datasets. [sent-315, score-0.304]
98 Linear spatial pyramid matching using sparse coding for image classification. [sent-347, score-0.182]
99 Emergence of simple-cell receptive field properties by learning a sparse code for natural images. [sent-386, score-0.177]
100 Sparse coding with an overcomplete basis set: A strategy employed by V1? [sent-443, score-0.34]
wordName wordTfidf (topN-words)
[('ica', 0.604), ('rica', 0.45), ('overcomplete', 0.263), ('orthonormality', 0.235), ('reconstruction', 0.191), ('autoencoders', 0.175), ('whitening', 0.171), ('whitened', 0.161), ('isa', 0.146), ('orthonormalization', 0.14), ('receptive', 0.108), ('tica', 0.081), ('sparse', 0.069), ('copied', 0.064), ('lters', 0.064), ('bases', 0.062), ('coding', 0.061), ('penalty', 0.061), ('wj', 0.055), ('representations', 0.054), ('tiled', 0.053), ('raw', 0.053), ('rbms', 0.052), ('cost', 0.051), ('convolutional', 0.049), ('constraint', 0.046), ('soft', 0.046), ('images', 0.046), ('coates', 0.045), ('orthogonality', 0.042), ('denoising', 0.042), ('features', 0.041), ('location', 0.041), ('undercomplete', 0.04), ('unconstrained', 0.038), ('learn', 0.035), ('ngiam', 0.035), ('degenerate', 0.034), ('eld', 0.033), ('score', 0.032), ('laptev', 0.032), ('optimizers', 0.032), ('orientations', 0.032), ('activation', 0.032), ('patch', 0.03), ('local', 0.03), ('matching', 0.03), ('prevents', 0.029), ('seconds', 0.029), ('unsupervised', 0.027), ('prevent', 0.027), ('hard', 0.027), ('frequencies', 0.027), ('deep', 0.026), ('tcnn', 0.026), ('unwhitened', 0.026), ('cg', 0.026), ('eigendecomposition', 0.026), ('hyv', 0.026), ('le', 0.025), ('lemmas', 0.024), ('learned', 0.024), ('orthonormalized', 0.023), ('enforced', 0.023), ('cvpr', 0.023), ('image', 0.022), ('unlabelled', 0.021), ('degeneracy', 0.021), ('fastica', 0.021), ('minimize', 0.021), ('equivalent', 0.021), ('dimensional', 0.02), ('optional', 0.02), ('orthogonalization', 0.02), ('speed', 0.02), ('edge', 0.019), ('elds', 0.019), ('rinen', 0.019), ('sparsity', 0.019), ('enforcing', 0.019), ('connect', 0.018), ('invariances', 0.018), ('replicated', 0.018), ('row', 0.018), ('nodes', 0.018), ('connections', 0.018), ('formal', 0.018), ('sensitive', 0.018), ('topographic', 0.017), ('subspace', 0.017), ('swap', 0.017), ('feature', 0.017), ('sigmoid', 0.016), ('employed', 0.016), ('unlabeled', 0.016), ('patches', 0.016), ('proofs', 0.016), ('regular', 0.016), ('networks', 0.016), ('costs', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 124 nips-2011-ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning
Author: Quoc V. Le, Alexandre Karpenko, Jiquan Ngiam, Andrew Y. Ng
Abstract: Independent Components Analysis (ICA) and its variants have been successfully used for unsupervised feature learning. However, standard ICA requires an orthonormality constraint to be enforced, which makes it difficult to learn overcomplete features. In addition, ICA is sensitive to whitening. These properties make it challenging to scale ICA to high dimensional data. In this paper, we propose a robust soft reconstruction cost for ICA that allows us to learn highly overcomplete sparse features even on unwhitened data. Our formulation reveals formal connections between ICA and sparse autoencoders, which have previously been observed only empirically. Our algorithm can be used in conjunction with off-the-shelf fast unconstrained optimizers. We show that the soft reconstruction cost can also be used to prevent replicated features in tiled convolutional neural networks. Using our method to learn highly overcomplete sparse features and tiled convolutional neural networks, we obtain competitive performances on a wide variety of object recognition tasks. We achieve state-of-the-art test accuracies on the STL-10 and Hollywood2 datasets. 1
2 0.35738248 261 nips-2011-Sparse Filtering
Author: Jiquan Ngiam, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, Andrew Y. Ng
Abstract: Unsupervised feature learning has been shown to be effective at learning representations that perform well on image, video and audio classification. However, many existing feature learning algorithms are hard to use and require extensive hyperparameter tuning. In this work, we present sparse filtering, a simple new algorithm which is efficient and only has one hyperparameter, the number of features to learn. In contrast to most other feature learning methods, sparse filtering does not explicitly attempt to construct a model of the data distribution. Instead, it optimizes a simple cost function – the sparsity of l2-normalized features – which can easily be implemented in a few lines of MATLAB code. Sparse filtering scales gracefully to handle high-dimensional inputs, and can also be used to learn meaningful features in additional layers with greedy layer-wise stacking. We evaluate sparse filtering on natural images, object classification (STL-10), and phone classification (TIMIT), and show that our method works well on a range of different modalities. 1
3 0.21176554 244 nips-2011-Selecting Receptive Fields in Deep Networks
Author: Adam Coates, Andrew Y. Ng
Abstract: Recent deep learning and unsupervised feature learning systems that learn from unlabeled data have achieved high performance in benchmarks by using extremely large architectures with many features (hidden units) at each layer. Unfortunately, for such large architectures the number of parameters can grow quadratically in the width of the network, thus necessitating hand-coded “local receptive fields” that limit the number of connections from lower level features to higher ones (e.g., based on spatial locality). In this paper we propose a fast method to choose these connections that may be incorporated into a wide variety of unsupervised training methods. Specifically, we choose local receptive fields that group together those low-level features that are most similar to each other according to a pairwise similarity metric. This approach allows us to harness the advantages of local receptive fields (such as improved scalability, and reduced data requirements) when we do not know how to specify such receptive fields by hand or where our unsupervised training algorithm has no obvious generalization to a topographic setting. We produce results showing how this method allows us to use even simple unsupervised training algorithms to train successful multi-layered networks that achieve state-of-the-art results on CIFAR and STL datasets: 82.0% and 60.1% accuracy, respectively. 1
4 0.15754573 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
Author: Jun-ichiro Hirayama, Aapo Hyvärinen
Abstract: Components estimated by independent component analysis and related methods are typically not independent in real data. A very common form of nonlinear dependency between the components is correlations in their variances or energies. Here, we propose a principled probabilistic model to model the energy-correlations between the latent variables. Our two-stage model includes a linear mixing of latent signals into the observed ones like in ICA. The main new feature is a model of the energy-correlations based on the structural equation model (SEM), in particular, a Linear Non-Gaussian SEM. The SEM is closely related to divisive normalization which effectively reduces energy correlation. Our new two-stage model enables estimation of both the linear mixing and the interactions related to energy-correlations, without resorting to approximations of the likelihood function or other non-principled approaches. We demonstrate the applicability of our method with synthetic dataset, natural images and brain signals. 1
5 0.14533152 298 nips-2011-Unsupervised learning models of primary cortical receptive fields and receptive field plasticity
Author: Maneesh Bhand, Ritvik Mudur, Bipin Suresh, Andrew Saxe, Andrew Y. Ng
Abstract: The efficient coding hypothesis holds that neural receptive fields are adapted to the statistics of the environment, but is agnostic to the timescale of this adaptation, which occurs on both evolutionary and developmental timescales. In this work we focus on that component of adaptation which occurs during an organism’s lifetime, and show that a number of unsupervised feature learning algorithms can account for features of normal receptive field properties across multiple primary sensory cortices. Furthermore, we show that the same algorithms account for altered receptive field properties in response to experimentally altered environmental statistics. Based on these modeling results we propose these models as phenomenological models of receptive field plasticity during an organism’s lifetime. Finally, due to the success of the same models in multiple sensory areas, we suggest that these algorithms may provide a constructive realization of the theory, first proposed by Mountcastle [1], that a qualitatively similar learning algorithm acts throughout primary sensory cortices. 1
6 0.14156607 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
7 0.1034798 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms
8 0.084726408 276 nips-2011-Structured sparse coding via lateral inhibition
9 0.07434798 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
10 0.059567396 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition
11 0.052380078 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)
12 0.051653054 259 nips-2011-Sparse Estimation with Structured Dictionaries
13 0.048990443 44 nips-2011-Bayesian Spike-Triggered Covariance Analysis
14 0.048946921 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
15 0.047593907 4 nips-2011-A Convergence Analysis of Log-Linear Training
16 0.045609474 74 nips-2011-Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
17 0.045475811 287 nips-2011-The Manifold Tangent Classifier
18 0.039311405 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
19 0.039087527 93 nips-2011-Extracting Speaker-Specific Information with a Regularized Siamese Deep Network
20 0.038938288 184 nips-2011-Neuronal Adaptation for Sampling-Based Probabilistic Inference in Perceptual Bistability
topicId topicWeight
[(0, 0.136), (1, 0.12), (2, 0.019), (3, 0.038), (4, 0.017), (5, 0.128), (6, 0.147), (7, 0.277), (8, -0.004), (9, -0.191), (10, -0.077), (11, -0.116), (12, 0.102), (13, -0.046), (14, 0.043), (15, 0.033), (16, 0.048), (17, -0.037), (18, 0.03), (19, 0.003), (20, -0.064), (21, -0.098), (22, 0.009), (23, -0.013), (24, 0.028), (25, 0.031), (26, -0.011), (27, 0.088), (28, -0.086), (29, -0.092), (30, 0.022), (31, -0.056), (32, 0.018), (33, 0.013), (34, -0.143), (35, -0.033), (36, 0.066), (37, 0.039), (38, 0.048), (39, -0.137), (40, -0.025), (41, 0.03), (42, -0.076), (43, 0.012), (44, 0.147), (45, 0.152), (46, -0.045), (47, -0.011), (48, -0.089), (49, -0.091)]
simIndex simValue paperId paperTitle
same-paper 1 0.95409876 124 nips-2011-ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning
Author: Quoc V. Le, Alexandre Karpenko, Jiquan Ngiam, Andrew Y. Ng
Abstract: Independent Components Analysis (ICA) and its variants have been successfully used for unsupervised feature learning. However, standard ICA requires an orthonormality constraint to be enforced, which makes it difficult to learn overcomplete features. In addition, ICA is sensitive to whitening. These properties make it challenging to scale ICA to high dimensional data. In this paper, we propose a robust soft reconstruction cost for ICA that allows us to learn highly overcomplete sparse features even on unwhitened data. Our formulation reveals formal connections between ICA and sparse autoencoders, which have previously been observed only empirically. Our algorithm can be used in conjunction with off-the-shelf fast unconstrained optimizers. We show that the soft reconstruction cost can also be used to prevent replicated features in tiled convolutional neural networks. Using our method to learn highly overcomplete sparse features and tiled convolutional neural networks, we obtain competitive performances on a wide variety of object recognition tasks. We achieve state-of-the-art test accuracies on the STL-10 and Hollywood2 datasets. 1
2 0.81585079 261 nips-2011-Sparse Filtering
Author: Jiquan Ngiam, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, Andrew Y. Ng
Abstract: Unsupervised feature learning has been shown to be effective at learning representations that perform well on image, video and audio classification. However, many existing feature learning algorithms are hard to use and require extensive hyperparameter tuning. In this work, we present sparse filtering, a simple new algorithm which is efficient and only has one hyperparameter, the number of features to learn. In contrast to most other feature learning methods, sparse filtering does not explicitly attempt to construct a model of the data distribution. Instead, it optimizes a simple cost function – the sparsity of l2-normalized features – which can easily be implemented in a few lines of MATLAB code. Sparse filtering scales gracefully to handle high-dimensional inputs, and can also be used to learn meaningful features in additional layers with greedy layer-wise stacking. We evaluate sparse filtering on natural images, object classification (STL-10), and phone classification (TIMIT), and show that our method works well on a range of different modalities. 1
3 0.71087587 244 nips-2011-Selecting Receptive Fields in Deep Networks
Author: Adam Coates, Andrew Y. Ng
Abstract: Recent deep learning and unsupervised feature learning systems that learn from unlabeled data have achieved high performance in benchmarks by using extremely large architectures with many features (hidden units) at each layer. Unfortunately, for such large architectures the number of parameters can grow quadratically in the width of the network, thus necessitating hand-coded “local receptive fields” that limit the number of connections from lower level features to higher ones (e.g., based on spatial locality). In this paper we propose a fast method to choose these connections that may be incorporated into a wide variety of unsupervised training methods. Specifically, we choose local receptive fields that group together those low-level features that are most similar to each other according to a pairwise similarity metric. This approach allows us to harness the advantages of local receptive fields (such as improved scalability, and reduced data requirements) when we do not know how to specify such receptive fields by hand or where our unsupervised training algorithm has no obvious generalization to a topographic setting. We produce results showing how this method allows us to use even simple unsupervised training algorithms to train successful multi-layered networks that achieve state-of-the-art results on CIFAR and STL datasets: 82.0% and 60.1% accuracy, respectively. 1
Author: Maneesh Bhand, Ritvik Mudur, Bipin Suresh, Andrew Saxe, Andrew Y. Ng
Abstract: The efficient coding hypothesis holds that neural receptive fields are adapted to the statistics of the environment, but is agnostic to the timescale of this adaptation, which occurs on both evolutionary and developmental timescales. In this work we focus on that component of adaptation which occurs during an organism’s lifetime, and show that a number of unsupervised feature learning algorithms can account for features of normal receptive field properties across multiple primary sensory cortices. Furthermore, we show that the same algorithms account for altered receptive field properties in response to experimentally altered environmental statistics. Based on these modeling results we propose these models as phenomenological models of receptive field plasticity during an organism’s lifetime. Finally, due to the success of the same models in multiple sensory areas, we suggest that these algorithms may provide a constructive realization of the theory, first proposed by Mountcastle [1], that a qualitatively similar learning algorithm acts throughout primary sensory cortices. 1
5 0.57410103 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
Author: Jun-ichiro Hirayama, Aapo Hyvärinen
Abstract: Components estimated by independent component analysis and related methods are typically not independent in real data. A very common form of nonlinear dependency between the components is correlations in their variances or energies. Here, we propose a principled probabilistic model to model the energy-correlations between the latent variables. Our two-stage model includes a linear mixing of latent signals into the observed ones like in ICA. The main new feature is a model of the energy-correlations based on the structural equation model (SEM), in particular, a Linear Non-Gaussian SEM. The SEM is closely related to divisive normalization which effectively reduces energy correlation. Our new two-stage model enables estimation of both the linear mixing and the interactions related to energy-correlations, without resorting to approximations of the likelihood function or other non-principled approaches. We demonstrate the applicability of our method with synthetic dataset, natural images and brain signals. 1
6 0.53263593 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms
7 0.50314879 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition
8 0.49123532 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
9 0.39886996 276 nips-2011-Structured sparse coding via lateral inhibition
10 0.37100694 93 nips-2011-Extracting Speaker-Specific Information with a Regularized Siamese Deep Network
11 0.34699109 74 nips-2011-Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
12 0.33755678 143 nips-2011-Learning Anchor Planes for Classification
13 0.33012557 130 nips-2011-Inductive reasoning about chimeric creatures
14 0.32763579 260 nips-2011-Sparse Features for PCA-Like Linear Regression
15 0.32240129 252 nips-2011-ShareBoost: Efficient multiclass learning with feature sharing
16 0.30741286 275 nips-2011-Structured Learning for Cell Tracking
17 0.30176607 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)
18 0.30092669 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
19 0.29542413 293 nips-2011-Understanding the Intrinsic Memorability of Images
20 0.29178348 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
topicId topicWeight
[(0, 0.025), (4, 0.022), (20, 0.056), (24, 0.174), (26, 0.01), (31, 0.062), (33, 0.029), (43, 0.039), (45, 0.102), (57, 0.035), (65, 0.221), (74, 0.057), (83, 0.03), (84, 0.011), (99, 0.022)]
simIndex simValue paperId paperTitle
1 0.81678247 93 nips-2011-Extracting Speaker-Specific Information with a Regularized Siamese Deep Network
Author: Ke Chen, Ahmad Salman
Abstract: Speech conveys different yet mixed information ranging from linguistic to speaker-specific components, and each of them should be exclusively used in a specific task. However, it is extremely difficult to extract a specific information component given the fact that nearly all existing acoustic representations carry all types of speech information. Thus, the use of the same representation in both speech and speaker recognition hinders a system from producing better performance due to interference of irrelevant information. In this paper, we present a deep neural architecture to extract speaker-specific information from MFCCs. As a result, a multi-objective loss function is proposed for learning speaker-specific characteristics and regularization via normalizing interference of non-speaker related information and avoiding information loss. With LDC benchmark corpora and a Chinese speech corpus, we demonstrate that a resultant speaker-specific representation is insensitive to text/languages spoken and environmental mismatches and hence outperforms MFCCs and other state-of-the-art techniques in speaker recognition. We discuss relevant issues and relate our approach to previous work. 1
2 0.80754054 298 nips-2011-Unsupervised learning models of primary cortical receptive fields and receptive field plasticity
Author: Maneesh Bhand, Ritvik Mudur, Bipin Suresh, Andrew Saxe, Andrew Y. Ng
Abstract: The efficient coding hypothesis holds that neural receptive fields are adapted to the statistics of the environment, but is agnostic to the timescale of this adaptation, which occurs on both evolutionary and developmental timescales. In this work we focus on that component of adaptation which occurs during an organism’s lifetime, and show that a number of unsupervised feature learning algorithms can account for features of normal receptive field properties across multiple primary sensory cortices. Furthermore, we show that the same algorithms account for altered receptive field properties in response to experimentally altered environmental statistics. Based on these modeling results we propose these models as phenomenological models of receptive field plasticity during an organism’s lifetime. Finally, due to the success of the same models in multiple sensory areas, we suggest that these algorithms may provide a constructive realization of the theory, first proposed by Mountcastle [1], that a qualitatively similar learning algorithm acts throughout primary sensory cortices. 1
same-paper 3 0.7878018 124 nips-2011-ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning
Author: Quoc V. Le, Alexandre Karpenko, Jiquan Ngiam, Andrew Y. Ng
Abstract: Independent Components Analysis (ICA) and its variants have been successfully used for unsupervised feature learning. However, standard ICA requires an orthonormality constraint to be enforced, which makes it difficult to learn overcomplete features. In addition, ICA is sensitive to whitening. These properties make it challenging to scale ICA to high dimensional data. In this paper, we propose a robust soft reconstruction cost for ICA that allows us to learn highly overcomplete sparse features even on unwhitened data. Our formulation reveals formal connections between ICA and sparse autoencoders, which have previously been observed only empirically. Our algorithm can be used in conjunction with off-the-shelf fast unconstrained optimizers. We show that the soft reconstruction cost can also be used to prevent replicated features in tiled convolutional neural networks. Using our method to learn highly overcomplete sparse features and tiled convolutional neural networks, we obtain competitive performances on a wide variety of object recognition tasks. We achieve state-of-the-art test accuracies on the STL-10 and Hollywood2 datasets. 1
4 0.68945122 189 nips-2011-Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels
Author: Vikas Sindhwani, Aurelie C. Lozano
Abstract: We consider regularized risk minimization in a large dictionary of Reproducing kernel Hilbert Spaces (RKHSs) over which the target function has a sparse representation. This setting, commonly referred to as Sparse Multiple Kernel Learning (MKL), may be viewed as the non-parametric extension of group sparsity in linear models. While the two dominant algorithmic strands of sparse learning, namely convex relaxations using l1 norm (e.g., Lasso) and greedy methods (e.g., OMP), have both been rigorously extended for group sparsity, the sparse MKL literature has so far mainly adopted the former with mild empirical success. In this paper, we close this gap by proposing a Group-OMP based framework for sparse MKL. Unlike l1 -MKL, our approach decouples the sparsity regularizer (via a direct l0 constraint) from the smoothness regularizer (via RKHS norms), which leads to better empirical performance and a simpler optimization procedure that only requires a black-box single-kernel solver. The algorithmic development and empirical studies are complemented by theoretical analyses in terms of Rademacher generalization bounds and sparse recovery conditions analogous to those for OMP [27] and Group-OMP [16]. 1
5 0.67550796 261 nips-2011-Sparse Filtering
Author: Jiquan Ngiam, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, Andrew Y. Ng
Abstract: Unsupervised feature learning has been shown to be effective at learning representations that perform well on image, video and audio classification. However, many existing feature learning algorithms are hard to use and require extensive hyperparameter tuning. In this work, we present sparse filtering, a simple new algorithm which is efficient and only has one hyperparameter, the number of features to learn. In contrast to most other feature learning methods, sparse filtering does not explicitly attempt to construct a model of the data distribution. Instead, it optimizes a simple cost function – the sparsity of l2-normalized features – which can easily be implemented in a few lines of MATLAB code. Sparse filtering scales gracefully to handle high-dimensional inputs, and can also be used to learn meaningful features in additional layers with greedy layer-wise stacking. We evaluate sparse filtering on natural images, object classification (STL-10), and phone classification (TIMIT), and show that our method works well on a range of different modalities. 1
6 0.66760689 77 nips-2011-Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression
7 0.64361542 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms
8 0.62261403 244 nips-2011-Selecting Receptive Fields in Deep Networks
9 0.57556182 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition
10 0.54707754 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition
11 0.52691829 74 nips-2011-Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
12 0.52490467 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
13 0.52172518 35 nips-2011-An ideal observer model for identifying the reference frame of objects
14 0.51827323 276 nips-2011-Structured sparse coding via lateral inhibition
15 0.51655549 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
16 0.51614034 287 nips-2011-The Manifold Tangent Classifier
17 0.51168358 156 nips-2011-Learning to Learn with Compound HD Models
18 0.50838947 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)
19 0.5083819 263 nips-2011-Sparse Manifold Clustering and Embedding
20 0.50609446 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries