nips nips2011 nips2011-244 knowledge-graph by maker-knowledge-mining

244 nips-2011-Selecting Receptive Fields in Deep Networks


Source: pdf

Author: Adam Coates, Andrew Y. Ng

Abstract: Recent deep learning and unsupervised feature learning systems that learn from unlabeled data have achieved high performance in benchmarks by using extremely large architectures with many features (hidden units) at each layer. Unfortunately, for such large architectures the number of parameters can grow quadratically in the width of the network, thus necessitating hand-coded “local receptive fields” that limit the number of connections from lower level features to higher ones (e.g., based on spatial locality). In this paper we propose a fast method to choose these connections that may be incorporated into a wide variety of unsupervised training methods. Specifically, we choose local receptive fields that group together those low-level features that are most similar to each other according to a pairwise similarity metric. This approach allows us to harness the advantages of local receptive fields (such as improved scalability, and reduced data requirements) when we do not know how to specify such receptive fields by hand or where our unsupervised training algorithm has no obvious generalization to a topographic setting. We produce results showing how this method allows us to use even simple unsupervised training algorithms to train successful multi-layered networks that achieve state-of-the-art results on CIFAR and STL datasets: 82.0% and 60.1% accuracy, respectively. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Recent deep learning and unsupervised feature learning systems that learn from unlabeled data have achieved high performance in benchmarks by using extremely large architectures with many features (hidden units) at each layer. [sent-6, score-0.74]

2 Unfortunately, for such large architectures the number of parameters can grow quadratically in the width of the network, thus necessitating hand-coded “local receptive fields” that limit the number of connections from lower level features to higher ones (e. [sent-7, score-0.919]

3 Specifically, we choose local receptive fields that group together those low-level features that are most similar to each other according to a pairwise similarity metric. [sent-11, score-1.043]

4 We produce results showing how this method allows us to use even simple unsupervised training algorithms to train successful multi-layered networks that achieve state-of-the-art results on CIFAR and STL datasets: 82.0% and 60.1% accuracy, respectively. [sent-13, score-0.321]

5 An important practical concern in building such networks is to specify how the features in each layer connect to the features in the layers beneath. [sent-17, score-0.964]

6 For instance, in the first layer of a network used for object recognition it is common to connect each feature extractor to a small rectangular area within a larger image instead of connecting every feature to the entire image [14, 15]. [sent-19, score-0.742]

7 In this paper, we propose a method to automatically choose such receptive fields in situations where we do not know how to specify them by hand—a situation that, as we will explain, is commonly encountered in deep networks. [sent-21, score-0.813]

8 A major obstacle to scaling up these representations further is the blowup in the number of network parameters: for n input features, a complete representation with n features requires a matrix of n² weights—one weight for every feature and input. [sent-25, score-0.377]
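
A rough, illustrative calculation of this blowup (not a figure from the paper): the T = 200 receptive field size below matches the field size used later in the paper, while the layer widths are examples only.

```python
# Illustrative comparison of fully connected vs. local-receptive-field weight counts.
# T = 200 matches the receptive field size used later; the layer widths are examples.
T = 200
for n_in, n_out in [(400, 400), (1600, 1600), (1600, 3200)]:
    full = n_in * n_out   # every higher-level feature connects to every input
    local = T * n_out     # each higher-level feature sees only T inputs
    print(f"{n_in} -> {n_out}: full = {full:,} weights, local (T={T}) = {local:,} weights")
```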

9 As mentioned above, we can solve this problem by limiting the “fan in” to each feature by connecting each feature extractor to a small receptive field of inputs. [sent-27, score-0.826]

10 In this work, we will propose a method that chooses these receptive fields automatically during unsupervised training of deep networks. [sent-28, score-0.94]

11 It may not be clear yet why it is necessary to have an automated way to choose receptive fields since, after all, it is already common practice to pick receptive fields simply based on prior knowledge. [sent-31, score-1.279]

12 For instance, in local receptive field architectures for image data, we typically train a bank of linear filters that apply only to a small image patch. [sent-33, score-0.903]

13 These filters are then convolved with the input image to yield the first layer of features. [sent-34, score-0.326]

14 Though there are still spatial relationships amongst the feature values within each map, it is not clear how two features in different maps are related. [sent-37, score-0.479]

15 Thus when we train a second layer of features we must typically resort to connecting each feature to every input map or to a random subset of maps [12, 4] (though we may still take advantage of the remaining spatial organization within each map). [sent-38, score-0.812]

16 At even higher layers of deep networks, this problem becomes extreme: our array of responses will have very small spatial resolution (e.g., 1-by-1). [sent-39, score-0.496]

17 Yet these responses will have a large number of maps, and thus we can no longer make use of spatial receptive fields. [sent-41, score-0.809]

18 In this work we propose a way to address the problem of choosing receptive fields that is not only a flexible addition to unsupervised learning and pre-training pipelines, but that can scale up to the extremely large networks used in state-of-the-art systems. [sent-43, score-0.862]

19 In our method we select local receptive fields that group together (pre-trained) lower-level features according to a pairwise similarity metric between features. [sent-44, score-1.056]

20 Each receptive field is constructed using a greedy selection scheme so that it contains features that are similar according to the similarity metric. [sent-45, score-0.891]

21 Depending on the choice of metric, we can cause our system to choose receptive fields that are similar to those that might be learned implicitly by popular learning algorithms like ICA [11]. [sent-46, score-0.688]

22 Given the learned receptive fields (groups of features) we can subsequently apply an unsupervised learning method independently over each receptive field. [sent-47, score-1.405]

23 Using our method in conjunction with the pipeline proposed by [6], we demonstrate the ability to train multi-layered networks using only vector quantization as our unsupervised learning module. [sent-49, score-0.378]

24 The authors of [1] use a non-parametric Bayesian prior to jointly infer the depth and number of hidden units at each layer of a deep belief network during training. [sent-68, score-0.465]

25 In this work, the receptive fields will be built by analyzing the relationships between feature responses rather than relying on prior knowledge of their organization. [sent-71, score-0.761]

26 In general, these learning algorithms train a set of features (usually linear filters) such that features nearby in a pre-specified topography share certain characteristics. [sent-73, score-0.508]

27 These methods have many advantages but require us to specify a topography first, then solve a large-scale optimization problem in order to organize our features according to the given topographic layout. [sent-79, score-0.399]

28 In this work, we perform this procedure in reverse: our features are pre-trained using whatever method we like, then we will extract a useful grouping of the features post-hoc. [sent-81, score-0.447]

29 In that respect, part of the novelty in our approach is to convert existing notions of topography and statistical dependence in deep networks into a highly scalable “wrapper method” that can be re-used with other algorithms. [sent-83, score-0.329]

30 We describe how to “learn” the receptive field structure of the high-level features from an arbitrary set of data, based on a particular pairwise similarity metric: square correlation of feature responses. [sent-86, score-0.839]

31 These inputs may be raw values (e.g., pixel values) but will usually be features generated by lower layers of a deep network. [sent-94, score-0.646]

32 Similarity of Features: In order to group features together, we must first define a similarity metric between features. [sent-96, score-0.364]

33 By putting such features in the same receptive field, we allow their relationship to be modeled more finely by higher level learning algorithms. [sent-100, score-0.813]

34 Meanwhile, it also makes sense to model seemingly independent subsets of features separately, and thus we would like such features to end up in different receptive fields. [sent-101, score-1.013]

35 The idea is that if our dataset X consists of linearly uncorrelated features (as can be obtained by applying a whitening procedure), then a measure of the higher-order dependence between two features can be obtained by looking at the correlation of their energies (squared responses). [sent-104, score-0.661]

36 We define the similarity between features xj and xk as the correlation between their squared responses: S[xj, xk] = corr(xj², xk²) = (E[xj² xk²] − 1) / sqrt( (E[xj⁴] − 1) · (E[xk⁴] − 1) ). [sent-106, score-0.492]

37 This metric is easy to compute by first whitening our input dataset with ZCA whitening [2], then computing the pairwise similarities between all of the features: Sj,k ≡ SX[xj, xk] ≡ ( Σi xj(i)² xk(i)² − 1 ) / sqrt( Σi (xj(i)⁴ − 1) · Σi (xk(i)⁴ − 1) ). [sent-107, score-0.659]
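
A minimal NumPy sketch of this computation (not the authors' code; it assumes the rows of X are examples and the columns are the lower-level features):

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA-whiten X (m examples x n features): zero mean, unit variance, decorrelated."""
    Xc = X - X.mean(axis=0)
    C = np.cov(Xc, rowvar=False)
    U, S, _ = np.linalg.svd(C)
    return Xc @ (U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T)

def square_correlation_matrix(X):
    """S[j, k] = corr(x_j^2, x_k^2) over ZCA-whitened features (formula above)."""
    Z = zca_whiten(X)
    m = Z.shape[0]
    E2 = Z ** 2
    num = (E2.T @ E2) / m - 1.0                           # E[x_j^2 x_k^2] - 1
    d = np.maximum((Z ** 4).mean(axis=0) - 1.0, 1e-12)    # E[x_j^4] - 1
    return num / np.sqrt(np.outer(d, d))
```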

38 We then construct a receptive field Rn that contains the features xk corresponding to the top T values of Sjn ,k . [sent-124, score-0.909]

39 Upon completion, we have N (possibly overlapping) receptive fields Rn that can be used during training of the next layer of features. [sent-126, score-0.919]
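
A sketch of the greedy grouping step, under the simplifying assumption that seed features are drawn at random (the paper's exact seed-selection scheme may differ): each of N receptive fields collects the T features most similar to its seed.

```python
import numpy as np

def select_receptive_fields(S, N=32, T=200, seed=0):
    """Build N (possibly overlapping) receptive fields of T features each.

    S is the n x n similarity matrix from square_correlation_matrix; each field
    contains the T features most similar to a randomly chosen seed feature.
    """
    rng = np.random.default_rng(seed)
    seeds = rng.choice(S.shape[0], size=N, replace=False)
    return [np.sort(np.argsort(S[j])[::-1][:T]) for j in seeds]
```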

40 Approximate Similarity: Computing the similarity matrix Sj,k using square correlation is practical for fairly large numbers of features using the obvious procedure given above. [sent-128, score-0.354]

41 However, if we want to learn receptive fields over huge numbers of features (as arise, for instance, when we use hundreds or thousands of maps), we may often be unable to compute S directly. [sent-129, score-0.813]

42 For instance, as explained above, if we use square correlation as our similarity criterion then we must perform whitening over a large number of features. [sent-130, score-0.309]

43 To avoid performing the whitening step for all of the input features, we can instead perform pair-wise whitening between features. [sent-133, score-0.31]

44 Specifically, to compute the squared correlation of xj and xk , we whiten the jth and kth columns of X together (independently of all other columns), then compute the square correlation between the whitened values xj and xk . [sent-134, score-0.525]
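
A sketch of this pairwise approximation (assuming the columns of X are already zero-mean and unit-variance): only a 2-by-2 covariance is needed per pair, which is why the required statistics can be accumulated in a single pass over the data.

```python
import numpy as np

def pairwise_square_correlation(X, j, k, eps=1e-5):
    """Approximate S[j, k] by whitening only columns j and k of X together."""
    P = X[:, [j, k]] - X[:, [j, k]].mean(axis=0)
    C = np.cov(P, rowvar=False)                              # 2x2 covariance of the pair
    U, S, _ = np.linalg.svd(C)
    Z = P @ (U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T)      # pairwise ZCA whitening
    e2 = Z ** 2
    num = (e2[:, 0] * e2[:, 1]).mean() - 1.0
    den = np.sqrt(max((Z[:, 0] ** 4).mean() - 1.0, 1e-12) *
                  max((Z[:, 1] ** 4).mean() - 1.0, 1e-12))
    return num / den
```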

45 For instance, for a given “seed”, the receptive field chosen using this approximation typically overlaps with the “true” receptive field (computed with full whitening) by 70% or more. [sent-136, score-1.226]

46 We can typically implement these computations in a single pass over the dataset that accumulates the needed statistics and then selects the receptive fields based on the results. [sent-143, score-0.649]

47 In this section we will briefly review this system as it is used in conjunction with our receptive field learning approach, but it should be noted that our basic method is equally applicable to many other choices of processing pipeline and unsupervised learning method. [sent-150, score-0.826]

48 The architecture proposed by [6] works by constructing a feature representation of a small image patch (say, a 6-by-6 pixel region) and then extracting these features from many overlapping patches within a larger image (much like a convolutional neural net). [sent-151, score-0.826]

49 Let X ∈ R^(m×108) be a dataset composed of a large number of 3-channel (RGB), 6-by-6 pixel image patches extracted from random locations in unlabeled training images and let x(i) ∈ R^108 be the vector of RGB pixel values representing the ith patch. [sent-152, score-0.458]

50 To compute its feature representation we simply extract features from every overlapping patch within the image (using a stride of 1 pixel between patches) and then concatenate all of the features into a single vector, yielding a (usually large) new representation of the entire image. [sent-166, score-0.886]
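
A simplified sketch of that extraction step (the encoder is left abstract; `encode` is a stand-in for whatever unsupervised module maps a flattened 6-by-6-by-3 patch to its K-dimensional feature vector):

```python
import numpy as np

def extract_feature_maps(image, encode, patch=6, stride=1):
    """Apply `encode` to every overlapping patch of an H x W x 3 image.

    Returns an array of shape (H - patch + 1) x (W - patch + 1) x K; flattening
    it gives the single long feature vector for the whole image.
    """
    H, W, _ = image.shape
    rows = range(0, H - patch + 1, stride)
    cols = range(0, W - patch + 1, stride)
    maps = [[encode(image[r:r + patch, c:c + patch, :].ravel()) for c in cols]
            for r in rows]
    return np.asarray(maps)
```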

51 Clearly we can modify this procedure to use choices of receptive fields other than 6-by-6 patches of images. [sent-167, score-0.684]

52 In general, if X is now any training set (not necessarily image patches), we can define XRn as the training set X reduced to include only the features in one receptive field, Rn (that is, we simply discard all of the columns of X that do not correspond to features in Rn ). [sent-170, score-1.177]

53 We may then apply the feature learning and extraction methods above to each XRn separately, just as we would for the hand-chosen patch receptive fields used in previous work. [sent-171, score-0.741]
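
In code, the restriction to one receptive field is just a column selection, after which any unsupervised learner can be trained per field (a sketch; `learn_features` stands in for the vector quantization module used in the paper):

```python
def train_per_receptive_field(X, fields, learn_features):
    """Train one feature dictionary per receptive field.

    X: m x n array of lower-level feature responses; fields: list of column-index
    arrays R_n from the selection step; learn_features: any unsupervised module
    (e.g., vector quantization) mapping a data matrix to a dictionary.
    """
    dictionaries = []
    for R_n in fields:
        X_Rn = X[:, R_n]          # discard all columns of X outside this field
        dictionaries.append(learn_features(X_Rn))
    return dictionaries
```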

54 Network Details: The above components, conceptually, allow us to lump together arbitrary types and quantities of data into our unlabeled training set and then automatically partition them into receptive fields in order to learn higher-level features. [sent-173, score-0.736]

55 The automated receptive field selection can choose receptive fields that span multiple feature maps, but the receptive fields will often span only small spatial areas (since features extracted from locations far apart tend to appear nearly independent). [sent-174, score-2.326]

56 Note that this is mainly to reduce the expense of feature extraction and to allow us to use spatial pooling (which introduces some invariance between layers of features); the receptive field selection method itself can be applied to hundreds of thousands of inputs. [sent-176, score-1.118]

57 First, there is little point in applying the receptive field learning method to the raw pixel layer. [sent-178, score-0.723]

58 Thus, we use 6-by-6 pixel receptive fields with a stride (step) of 1 pixel between them for the first layer of features. [sent-179, score-1.149]

59 Given K1 first-layer filters, a 32-by-32 pixel color image takes on a 27-by-27-by-K1 representation after the first layer of (convolutional) feature extraction. [sent-182, score-0.547]

60 Second, depending on the unsupervised learning module, it can be difficult to learn features that are invariant to image transformations like translation. [sent-183, score-0.4]

61 Thus, applied to the 27-by-27-by-K1 representation from layer 1, this yields a 9-by-9-by-K1 pooled representation. [sent-186, score-0.337]

62 After extracting the 9-by-9-by-K1 pooled representation from the first two layers, we apply our receptive field selection method. [sent-187, score-0.692]

63 As explained above, it is useful to retain spatial structure so that we can perform spatial pooling and convolutional feature extraction. [sent-189, score-0.446]

64 Thus, rather than applying our algorithm to the entire input, we apply the receptive field learning to 2-by-2 spatial regions within the 9-by-9-by-K1 pooled representation. [sent-190, score-0.793]

65 Thus the receptive field learning algorithm must find receptive fields to cover 2 × 2 × K1 inputs. [sent-191, score-1.226]
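
The shape bookkeeping implied by these choices can be traced explicitly (a sketch only; the 3-by-3 pooling factor is inferred from the 27-to-9 reduction described above):

```python
# Trace of the spatial shapes described above (CIFAR-style 32x32 input).
K1 = 1600                                  # first-layer maps
img = 32
conv1 = img - 6 + 1                        # 6x6 patches, stride 1 -> 27x27xK1
pool1 = conv1 // 3                         # 27 -> 9 (inferred 3x3 average pooling)
region = 2                                 # receptive fields learned over 2x2 areas
inputs_per_region = region * region * K1   # 2 x 2 x K1 = 6400 inputs to group
print(conv1, pool1, inputs_per_region)     # 27 9 6400
```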

66 The next layer of feature learning then operates on each receptive field within the 2-by-2 spatial regions separately. [sent-192, score-1.049]

67 This is similar to the structure commonly employed by prior work [4, 12], but here we are able to choose receptive fields that span several feature maps in a deliberate way while also exploiting knowledge of the spatial structure. [sent-193, score-0.948]

68 In our experiments we will benchmark our system on image recognition datasets using K1 = 1600 first layer maps and K2 = 3200 second layer maps learned from N = 32 receptive fields. [sent-194, score-1.485]

69 When we use three layers, we apply an additional 2-by-2 average pooling stage to the layer 2 outputs (with stride of 1) and then train K3 = 3200 third layer maps (again with N = 32 receptive fields). [sent-195, score-1.439]

70 To construct a final feature representation for classification, the outputs of the first and second layers of trained features are average-pooled over quadrants as is done by [6]. [sent-196, score-0.5]

71 Thus, our first layer of features results in 1600 × 4 = 6400 values in the final feature vector, and our second layer of features results in 3200 × 4 = 12800 values. [sent-197, score-0.999]

72 The features for all layers are then concatenated into a single long vector and used to train a linear classifier (L2-SVM). [sent-199, score-0.438]
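
A sketch of how that final vector is assembled (quadrant pooling and the linear SVM stage are written against NumPy/scikit-learn as stand-ins, not the authors' code):

```python
import numpy as np
from sklearn.svm import LinearSVC

def quadrant_pool(maps):
    """Average an H x W x K response array over its four spatial quadrants -> 4*K values."""
    H, W, _ = maps.shape
    h, w = H // 2, W // 2
    quads = [maps[:h, :w], maps[:h, w:], maps[h:, :w], maps[h:, w:]]
    return np.concatenate([q.mean(axis=(0, 1)) for q in quads])

def final_feature_vector(layer1_maps, layer2_maps):
    # 1600 maps -> 6400 values and 3200 maps -> 12800 values, concatenated.
    return np.concatenate([quadrant_pool(layer1_maps), quadrant_pool(layer2_maps)])

# classifier = LinearSVC().fit(train_features, train_labels)   # the L2-SVM stage
```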

73 For each set of experiments we provide test results for 1 to 3 layers of features, where the receptive fields for the 2nd and 3rd layers of features are learned using the method of Section 3. [sent-206, score-1.238]

74 For comparison, we also provide test results in each case using several alternative receptive field choices. [sent-208, score-0.613]

75 In particular, we have also tested architectures where we use a single receptive field (N = 1), where R1 contains all of the inputs, and random receptive fields (N = 32), where Rn is filled according to the same algorithm as in Section 3. [sent-209, score-1.306]

76 For instance, the completely-connected layers are connected to all the maps within a 2-by-2 spatial window. [sent-214, score-0.385]

77 Finally, we will also provide test results using a larger 1st layer representation (K1 = 4800 maps) to verify that the performance gains we achieve are not merely the result of passing more projections of the data to the supervised classification stage. [sent-215, score-0.316]

78 CIFAR-10 Learned 2nd-layer Receptive Fields and Features: Before we look at classification results, we first inspect the learned features and their receptive fields from the second layer. [sent-219, score-1.118]

79 These are the features that take the pooled first-layer responses as their input. [sent-221, score-0.316]

80 Figure 1 shows two typical examples of receptive fields chosen by our method when using square correlation as the similarity metric. [sent-222, score-0.73]

81 In both of the examples, the receptive field incorporates filters with similar orientation tuning but varying phase, frequency and, sometimes, varying color. [sent-223, score-0.613]

82 Figure 1: Two examples of receptive fields chosen from 2-by-2-by-1600 image representations. [sent-226, score-0.681]

83 Only the most strongly dependent features from the T = 200 total features are shown. [sent-228, score-0.4]

84 We also visualize some of the higher-level features constructed by the vector quantization algorithm when applied to these two receptive fields. [sent-232, score-0.869]

85 The filters obtained from VQ assign weights to each of the lower level features in the receptive field. [sent-233, score-0.813]

86 The 5 most inhibitory and excitatory inputs for two learned features are shown in Figure 2 (one from each receptive field in Figure 1). [sent-235, score-0.992]

87 We first note the comparison of our 2nd layer results with the alternative of a single large 1st layer using an equivalent number of maps (4800) and see that, indeed, our 2nd layer created with learned receptive fields performs better (81. [sent-241, score-1.535]

88 We also see that the random and single receptive field choices work poorly, barely matching the smaller single-layer network. [sent-245, score-0.645]

89 This appears to confirm our belief that grouping together similar features is necessary to allow our unsupervised learning module (VQ) to identify useful higher-level structure in the data. [sent-246, score-0.444]

90 It is difficult to assess the strength of feature learning methods on the full CIFAR dataset because the performance may be attributed to the success of the supervised SVM training and not the unsupervised feature training. [sent-276, score-0.382]

91 As with the full CIFAR dataset, we note that it was not possible to achieve equivalent performance by merely expanding the first layer or by using either of the alternative receptive field structures (which, again, make minimal gains over a single layer). [sent-280, score-0.901]

92 We used the same architecture for this dataset as for CIFAR, but rather than train our features each time on the labeled training fold (which is too small), we used 20000 examples taken from the unlabeled dataset. [sent-301, score-0.489]

93 We note, one more time, that none of the alternative architectures (which roughly represent common practice for training deep networks) makes significant gains over the single layer system. [sent-305, score-0.563]

94 Conclusions: We have proposed a method for selecting local receptive fields in deep networks. [sent-306, score-0.785]

95 Inspired by the grouping behavior of topographic learning methods, our algorithm selects qualitatively similar groups of features directly using arbitrary choices of similarity metric, while also being compatible with any unsupervised learning algorithm we wish to use. [sent-307, score-0.604]

96 We expect that the method proposed here is a useful new tool for managing extremely large, higher-level feature representations where more traditional spatio-temporal local receptive fields are unhelpful or impossible to employ successfully. [sent-311, score-0.746]

97 Our networks are still trained unsupervised from the entire training set. [sent-312, score-0.306]

98 An analysis of single-layer networks in unsupervised feature learning. [sent-349, score-0.307]

99 Emergence of complex-like cells in a temporal product network with local receptive fields, 2010. [sent-365, score-0.67]

100 Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. [sent-405, score-0.402]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('receptive', 0.613), ('layer', 0.258), ('elds', 0.201), ('features', 0.2), ('layers', 0.189), ('whitening', 0.155), ('deep', 0.147), ('rf', 0.135), ('unsupervised', 0.132), ('cifar', 0.125), ('jk', 0.125), ('topographic', 0.115), ('pixel', 0.11), ('eld', 0.109), ('stl', 0.103), ('pooling', 0.102), ('maps', 0.101), ('xk', 0.096), ('spatial', 0.095), ('lters', 0.093), ('networks', 0.092), ('vq', 0.088), ('feature', 0.083), ('architecture', 0.082), ('architectures', 0.08), ('similarity', 0.078), ('convolutional', 0.071), ('image', 0.068), ('responses', 0.065), ('ica', 0.06), ('topography', 0.059), ('stride', 0.058), ('quantization', 0.056), ('xj', 0.054), ('pooled', 0.051), ('zca', 0.051), ('pipeline', 0.049), ('train', 0.049), ('inputs', 0.049), ('training', 0.048), ('unlabeled', 0.047), ('learned', 0.047), ('grouping', 0.047), ('extractor', 0.047), ('excitatory', 0.046), ('correlation', 0.046), ('group', 0.045), ('patch', 0.045), ('whitened', 0.044), ('rgb', 0.044), ('rn', 0.044), ('metric', 0.041), ('coates', 0.04), ('recognition', 0.039), ('patches', 0.039), ('squarecorrelation', 0.039), ('xrn', 0.039), ('module', 0.037), ('inhibitory', 0.037), ('seed', 0.037), ('dataset', 0.036), ('expense', 0.036), ('entire', 0.034), ('blowup', 0.034), ('hyvarinen', 0.034), ('network', 0.032), ('overlapping', 0.032), ('choices', 0.032), ('stanford', 0.031), ('pinto', 0.031), ('whiten', 0.031), ('extractors', 0.031), ('scalable', 0.031), ('object', 0.03), ('gains', 0.03), ('coding', 0.03), ('square', 0.03), ('tending', 0.029), ('saxe', 0.029), ('together', 0.028), ('fields', 0.028), ('span', 0.028), ('screening', 0.028), ('representation', 0.028), ('choose', 0.028), ('units', 0.028), ('labeled', 0.027), ('unstructured', 0.027), ('xx', 0.027), ('pairwise', 0.026), ('instance', 0.026), ('organization', 0.026), ('connections', 0.026), ('benchmarks', 0.026), ('automated', 0.025), ('extremely', 0.025), ('adams', 0.025), ('local', 0.025), ('specify', 0.025), ('energies', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.000001 244 nips-2011-Selecting Receptive Fields in Deep Networks

Author: Adam Coates, Andrew Y. Ng

Abstract: Recent deep learning and unsupervised feature learning systems that learn from unlabeled data have achieved high performance in benchmarks by using extremely large architectures with many features (hidden units) at each layer. Unfortunately, for such large architectures the number of parameters can grow quadratically in the width of the network, thus necessitating hand-coded “local receptive fields” that limit the number of connections from lower level features to higher ones (e.g., based on spatial locality). In this paper we propose a fast method to choose these connections that may be incorporated into a wide variety of unsupervised training methods. Specifically, we choose local receptive fields that group together those low-level features that are most similar to each other according to a pairwise similarity metric. This approach allows us to harness the advantages of local receptive fields (such as improved scalability, and reduced data requirements) when we do not know how to specify such receptive fields by hand or where our unsupervised training algorithm has no obvious generalization to a topographic setting. We produce results showing how this method allows us to use even simple unsupervised training algorithms to train successful multi-layered networks that achieve state-of-the-art results on CIFAR and STL datasets: 82.0% and 60.1% accuracy, respectively. 1

2 0.41683188 298 nips-2011-Unsupervised learning models of primary cortical receptive fields and receptive field plasticity

Author: Maneesh Bhand, Ritvik Mudur, Bipin Suresh, Andrew Saxe, Andrew Y. Ng

Abstract: The efficient coding hypothesis holds that neural receptive fields are adapted to the statistics of the environment, but is agnostic to the timescale of this adaptation, which occurs on both evolutionary and developmental timescales. In this work we focus on that component of adaptation which occurs during an organism’s lifetime, and show that a number of unsupervised feature learning algorithms can account for features of normal receptive field properties across multiple primary sensory cortices. Furthermore, we show that the same algorithms account for altered receptive field properties in response to experimentally altered environmental statistics. Based on these modeling results we propose these models as phenomenological models of receptive field plasticity during an organism’s lifetime. Finally, due to the success of the same models in multiple sensory areas, we suggest that these algorithms may provide a constructive realization of the theory, first proposed by Mountcastle [1], that a qualitatively similar learning algorithm acts throughout primary sensory cortices. 1

3 0.23125519 261 nips-2011-Sparse Filtering

Author: Jiquan Ngiam, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, Andrew Y. Ng

Abstract: Unsupervised feature learning has been shown to be effective at learning representations that perform well on image, video and audio classification. However, many existing feature learning algorithms are hard to use and require extensive hyperparameter tuning. In this work, we present sparse filtering, a simple new algorithm which is efficient and only has one hyperparameter, the number of features to learn. In contrast to most other feature learning methods, sparse filtering does not explicitly attempt to construct a model of the data distribution. Instead, it optimizes a simple cost function – the sparsity of 2 -normalized features – which can easily be implemented in a few lines of MATLAB code. Sparse filtering scales gracefully to handle high-dimensional inputs, and can also be used to learn meaningful features in additional layers with greedy layer-wise stacking. We evaluate sparse filtering on natural images, object classification (STL-10), and phone classification (TIMIT), and show that our method works well on a range of different modalities. 1

4 0.21176554 124 nips-2011-ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning

Author: Quoc V. Le, Alexandre Karpenko, Jiquan Ngiam, Andrew Y. Ng

Abstract: Independent Components Analysis (ICA) and its variants have been successfully used for unsupervised feature learning. However, standard ICA requires an orthonoramlity constraint to be enforced, which makes it difficult to learn overcomplete features. In addition, ICA is sensitive to whitening. These properties make it challenging to scale ICA to high dimensional data. In this paper, we propose a robust soft reconstruction cost for ICA that allows us to learn highly overcomplete sparse features even on unwhitened data. Our formulation reveals formal connections between ICA and sparse autoencoders, which have previously been observed only empirically. Our algorithm can be used in conjunction with off-the-shelf fast unconstrained optimizers. We show that the soft reconstruction cost can also be used to prevent replicated features in tiled convolutional neural networks. Using our method to learn highly overcomplete sparse features and tiled convolutional neural networks, we obtain competitive performances on a wide variety of object recognition tasks. We achieve state-of-the-art test accuracies on the STL-10 and Hollywood2 datasets. 1

5 0.20763142 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons

Author: Yan Karklin, Eero P. Simoncelli

Abstract: Efficient coding provides a powerful principle for explaining early sensory coding. Most attempts to test this principle have been limited to linear, noiseless models, and when applied to natural images, have yielded oriented filters consistent with responses in primary visual cortex. Here we show that an efficient coding model that incorporates biologically realistic ingredients – input and output noise, nonlinear response functions, and a metabolic cost on the firing rate – predicts receptive fields and response nonlinearities similar to those observed in the retina. Specifically, we develop numerical methods for simultaneously learning the linear filters and response nonlinearities of a population of model neurons, so as to maximize information transmission subject to metabolic costs. When applied to an ensemble of natural images, the method yields filters that are center-surround and nonlinearities that are rectifying. The filters are organized into two populations, with On- and Off-centers, which independently tile the visual space. As observed in the primate retina, the Off-center neurons are more numerous and have filters with smaller spatial extent. In the absence of noise, our method reduces to a generalized version of independent components analysis, with an adapted nonlinear “contrast” function; in this case, the optimal filters are localized and oriented.

6 0.1966684 250 nips-2011-Shallow vs. Deep Sum-Product Networks

7 0.18612789 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms

8 0.16250855 276 nips-2011-Structured sparse coding via lateral inhibition

9 0.14960915 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)

10 0.14667539 156 nips-2011-Learning to Learn with Compound HD Models

11 0.13760294 44 nips-2011-Bayesian Spike-Triggered Covariance Analysis

12 0.1189393 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition

13 0.10373282 74 nips-2011-Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection

14 0.10156802 287 nips-2011-The Manifold Tangent Classifier

15 0.097573861 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries

16 0.093929537 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

17 0.093223922 184 nips-2011-Neuronal Adaptation for Sampling-Based Probabilistic Inference in Perceptual Bistability

18 0.087318815 93 nips-2011-Extracting Speaker-Specific Information with a Regularized Siamese Deep Network

19 0.086070865 4 nips-2011-A Convergence Analysis of Log-Linear Training

20 0.083765082 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.237), (1, 0.208), (2, 0.044), (3, 0.106), (4, 0.057), (5, 0.124), (6, 0.174), (7, 0.313), (8, -0.036), (9, -0.312), (10, -0.101), (11, -0.116), (12, 0.127), (13, -0.101), (14, 0.032), (15, 0.061), (16, 0.053), (17, -0.032), (18, -0.038), (19, -0.019), (20, -0.126), (21, -0.062), (22, -0.054), (23, -0.02), (24, -0.005), (25, 0.035), (26, -0.007), (27, 0.009), (28, 0.008), (29, -0.093), (30, 0.125), (31, -0.011), (32, -0.046), (33, 0.062), (34, -0.013), (35, 0.103), (36, -0.056), (37, -0.023), (38, 0.049), (39, -0.093), (40, -0.083), (41, 0.024), (42, -0.004), (43, -0.003), (44, -0.024), (45, -0.035), (46, 0.032), (47, 0.064), (48, 0.027), (49, 0.001)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96001881 244 nips-2011-Selecting Receptive Fields in Deep Networks

Author: Adam Coates, Andrew Y. Ng

Abstract: Recent deep learning and unsupervised feature learning systems that learn from unlabeled data have achieved high performance in benchmarks by using extremely large architectures with many features (hidden units) at each layer. Unfortunately, for such large architectures the number of parameters can grow quadratically in the width of the network, thus necessitating hand-coded “local receptive fields” that limit the number of connections from lower level features to higher ones (e.g., based on spatial locality). In this paper we propose a fast method to choose these connections that may be incorporated into a wide variety of unsupervised training methods. Specifically, we choose local receptive fields that group together those low-level features that are most similar to each other according to a pairwise similarity metric. This approach allows us to harness the advantages of local receptive fields (such as improved scalability, and reduced data requirements) when we do not know how to specify such receptive fields by hand or where our unsupervised training algorithm has no obvious generalization to a topographic setting. We produce results showing how this method allows us to use even simple unsupervised training algorithms to train successful multi-layered networks that achieve state-of-the-art results on CIFAR and STL datasets: 82.0% and 60.1% accuracy, respectively. 1

2 0.88567376 298 nips-2011-Unsupervised learning models of primary cortical receptive fields and receptive field plasticity

Author: Maneesh Bhand, Ritvik Mudur, Bipin Suresh, Andrew Saxe, Andrew Y. Ng

Abstract: The efficient coding hypothesis holds that neural receptive fields are adapted to the statistics of the environment, but is agnostic to the timescale of this adaptation, which occurs on both evolutionary and developmental timescales. In this work we focus on that component of adaptation which occurs during an organism’s lifetime, and show that a number of unsupervised feature learning algorithms can account for features of normal receptive field properties across multiple primary sensory cortices. Furthermore, we show that the same algorithms account for altered receptive field properties in response to experimentally altered environmental statistics. Based on these modeling results we propose these models as phenomenological models of receptive field plasticity during an organism’s lifetime. Finally, due to the success of the same models in multiple sensory areas, we suggest that these algorithms may provide a constructive realization of the theory, first proposed by Mountcastle [1], that a qualitatively similar learning algorithm acts throughout primary sensory cortices. 1

3 0.76417261 124 nips-2011-ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning

Author: Quoc V. Le, Alexandre Karpenko, Jiquan Ngiam, Andrew Y. Ng

Abstract: Independent Components Analysis (ICA) and its variants have been successfully used for unsupervised feature learning. However, standard ICA requires an orthonoramlity constraint to be enforced, which makes it difficult to learn overcomplete features. In addition, ICA is sensitive to whitening. These properties make it challenging to scale ICA to high dimensional data. In this paper, we propose a robust soft reconstruction cost for ICA that allows us to learn highly overcomplete sparse features even on unwhitened data. Our formulation reveals formal connections between ICA and sparse autoencoders, which have previously been observed only empirically. Our algorithm can be used in conjunction with off-the-shelf fast unconstrained optimizers. We show that the soft reconstruction cost can also be used to prevent replicated features in tiled convolutional neural networks. Using our method to learn highly overcomplete sparse features and tiled convolutional neural networks, we obtain competitive performances on a wide variety of object recognition tasks. We achieve state-of-the-art test accuracies on the STL-10 and Hollywood2 datasets. 1

4 0.68025571 261 nips-2011-Sparse Filtering

Author: Jiquan Ngiam, Zhenghao Chen, Sonia A. Bhaskar, Pang W. Koh, Andrew Y. Ng

Abstract: Unsupervised feature learning has been shown to be effective at learning representations that perform well on image, video and audio classification. However, many existing feature learning algorithms are hard to use and require extensive hyperparameter tuning. In this work, we present sparse filtering, a simple new algorithm which is efficient and only has one hyperparameter, the number of features to learn. In contrast to most other feature learning methods, sparse filtering does not explicitly attempt to construct a model of the data distribution. Instead, it optimizes a simple cost function – the sparsity of 2 -normalized features – which can easily be implemented in a few lines of MATLAB code. Sparse filtering scales gracefully to handle high-dimensional inputs, and can also be used to learn meaningful features in additional layers with greedy layer-wise stacking. We evaluate sparse filtering on natural images, object classification (STL-10), and phone classification (TIMIT), and show that our method works well on a range of different modalities. 1

5 0.59761423 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons

Author: Yan Karklin, Eero P. Simoncelli

Abstract: Efficient coding provides a powerful principle for explaining early sensory coding. Most attempts to test this principle have been limited to linear, noiseless models, and when applied to natural images, have yielded oriented filters consistent with responses in primary visual cortex. Here we show that an efficient coding model that incorporates biologically realistic ingredients – input and output noise, nonlinear response functions, and a metabolic cost on the firing rate – predicts receptive fields and response nonlinearities similar to those observed in the retina. Specifically, we develop numerical methods for simultaneously learning the linear filters and response nonlinearities of a population of model neurons, so as to maximize information transmission subject to metabolic costs. When applied to an ensemble of natural images, the method yields filters that are center-surround and nonlinearities that are rectifying. The filters are organized into two populations, with On- and Off-centers, which independently tile the visual space. As observed in the primate retina, the Off-center neurons are more numerous and have filters with smaller spatial extent. In the absence of noise, our method reduces to a generalized version of independent components analysis, with an adapted nonlinear “contrast” function; in this case, the optimal filters are localized and oriented.

6 0.5955978 93 nips-2011-Extracting Speaker-Specific Information with a Regularized Siamese Deep Network

7 0.57167333 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms

8 0.54625738 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)

9 0.52850246 250 nips-2011-Shallow vs. Deep Sum-Product Networks

10 0.52669328 74 nips-2011-Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection

11 0.50901103 156 nips-2011-Learning to Learn with Compound HD Models

12 0.49418998 44 nips-2011-Bayesian Spike-Triggered Covariance Analysis

13 0.49116713 184 nips-2011-Neuronal Adaptation for Sampling-Based Probabilistic Inference in Perceptual Bistability

14 0.48350987 287 nips-2011-The Manifold Tangent Classifier

15 0.47830138 276 nips-2011-Structured sparse coding via lateral inhibition

16 0.47515348 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition

17 0.41285592 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis

18 0.40504378 293 nips-2011-Understanding the Intrinsic Memorability of Images

19 0.39650559 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition

20 0.37061799 252 nips-2011-ShareBoost: Efficient multiclass learning with feature sharing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.058), (4, 0.031), (20, 0.046), (26, 0.015), (30, 0.092), (31, 0.065), (33, 0.032), (43, 0.078), (45, 0.123), (57, 0.045), (65, 0.15), (74, 0.066), (83, 0.038), (84, 0.027), (99, 0.052)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90171939 93 nips-2011-Extracting Speaker-Specific Information with a Regularized Siamese Deep Network

Author: Ke Chen, Ahmad Salman

Abstract: Speech conveys different yet mixed information ranging from linguistic to speaker-specific components, and each of them should be exclusively used in a specific task. However, it is extremely difficult to extract a specific information component given the fact that nearly all existing acoustic representations carry all types of speech information. Thus, the use of the same representation in both speech and speaker recognition hinders a system from producing better performance due to interference of irrelevant information. In this paper, we present a deep neural architecture to extract speaker-specific information from MFCCs. As a result, a multi-objective loss function is proposed for learning speaker-specific characteristics and regularization via normalizing interference of non-speaker related information and avoiding information loss. With LDC benchmark corpora and a Chinese speech corpus, we demonstrate that a resultant speaker-specific representation is insensitive to text/languages spoken and environmental mismatches and hence outperforms MFCCs and other state-of-the-art techniques in speaker recognition. We discuss relevant issues and relate our approach to previous work. 1

2 0.89199775 298 nips-2011-Unsupervised learning models of primary cortical receptive fields and receptive field plasticity

Author: Maneesh Bhand, Ritvik Mudur, Bipin Suresh, Andrew Saxe, Andrew Y. Ng

Abstract: The efficient coding hypothesis holds that neural receptive fields are adapted to the statistics of the environment, but is agnostic to the timescale of this adaptation, which occurs on both evolutionary and developmental timescales. In this work we focus on that component of adaptation which occurs during an organism’s lifetime, and show that a number of unsupervised feature learning algorithms can account for features of normal receptive field properties across multiple primary sensory cortices. Furthermore, we show that the same algorithms account for altered receptive field properties in response to experimentally altered environmental statistics. Based on these modeling results we propose these models as phenomenological models of receptive field plasticity during an organism’s lifetime. Finally, due to the success of the same models in multiple sensory areas, we suggest that these algorithms may provide a constructive realization of the theory, first proposed by Mountcastle [1], that a qualitatively similar learning algorithm acts throughout primary sensory cortices. 1

3 0.88213891 189 nips-2011-Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels

Author: Vikas Sindhwani, Aurelie C. Lozano

Abstract: We consider regularized risk minimization in a large dictionary of Reproducing kernel Hilbert Spaces (RKHSs) over which the target function has a sparse representation. This setting, commonly referred to as Sparse Multiple Kernel Learning (MKL), may be viewed as the non-parametric extension of group sparsity in linear models. While the two dominant algorithmic strands of sparse learning, namely convex relaxations using l1 norm (e.g., Lasso) and greedy methods (e.g., OMP), have both been rigorously extended for group sparsity, the sparse MKL literature has so far mainly adopted the former with mild empirical success. In this paper, we close this gap by proposing a Group-OMP based framework for sparse MKL. Unlike l1 -MKL, our approach decouples the sparsity regularizer (via a direct l0 constraint) from the smoothness regularizer (via RKHS norms), which leads to better empirical performance and a simpler optimization procedure that only requires a black-box single-kernel solver. The algorithmic development and empirical studies are complemented by theoretical analyses in terms of Rademacher generalization bounds and sparse recovery conditions analogous to those for OMP [27] and Group-OMP [16]. 1

4 0.86282939 124 nips-2011-ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning

Author: Quoc V. Le, Alexandre Karpenko, Jiquan Ngiam, Andrew Y. Ng

Abstract: Independent Components Analysis (ICA) and its variants have been successfully used for unsupervised feature learning. However, standard ICA requires an orthonoramlity constraint to be enforced, which makes it difficult to learn overcomplete features. In addition, ICA is sensitive to whitening. These properties make it challenging to scale ICA to high dimensional data. In this paper, we propose a robust soft reconstruction cost for ICA that allows us to learn highly overcomplete sparse features even on unwhitened data. Our formulation reveals formal connections between ICA and sparse autoencoders, which have previously been observed only empirically. Our algorithm can be used in conjunction with off-the-shelf fast unconstrained optimizers. We show that the soft reconstruction cost can also be used to prevent replicated features in tiled convolutional neural networks. Using our method to learn highly overcomplete sparse features and tiled convolutional neural networks, we obtain competitive performances on a wide variety of object recognition tasks. We achieve state-of-the-art test accuracies on the STL-10 and Hollywood2 datasets. 1

same-paper 5 0.85860121 244 nips-2011-Selecting Receptive Fields in Deep Networks

Author: Adam Coates, Andrew Y. Ng

Abstract: Recent deep learning and unsupervised feature learning systems that learn from unlabeled data have achieved high performance in benchmarks by using extremely large architectures with many features (hidden units) at each layer. Unfortunately, for such large architectures the number of parameters can grow quadratically in the width of the network, thus necessitating hand-coded “local receptive fields” that limit the number of connections from lower level features to higher ones (e.g., based on spatial locality). In this paper we propose a fast method to choose these connections that may be incorporated into a wide variety of unsupervised training methods. Specifically, we choose local receptive fields that group together those low-level features that are most similar to each other according to a pairwise similarity metric. This approach allows us to harness the advantages of local receptive fields (such as improved scalability, and reduced data requirements) when we do not know how to specify such receptive fields by hand or where our unsupervised training algorithm has no obvious generalization to a topographic setting. We produce results showing how this method allows us to use even simple unsupervised training algorithms to train successful multi-layered networks that achieve state-of-the-art results on CIFAR and STL datasets: 82.0% and 60.1% accuracy, respectively. 1

6 0.85203183 77 nips-2011-Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression

7 0.83892488 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms

8 0.83407772 261 nips-2011-Sparse Filtering

9 0.78283048 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition

10 0.76152271 276 nips-2011-Structured sparse coding via lateral inhibition

11 0.75770468 186 nips-2011-Noise Thresholds for Spectral Clustering

12 0.75729609 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis

13 0.75490481 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition

14 0.75255138 265 nips-2011-Sparse recovery by thresholded non-negative least squares

15 0.74956846 263 nips-2011-Sparse Manifold Clustering and Embedding

16 0.74706841 156 nips-2011-Learning to Learn with Compound HD Models

17 0.74694937 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model

18 0.74682194 287 nips-2011-The Manifold Tangent Classifier

19 0.7428453 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)

20 0.74247897 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons