nips nips2010 nips2010-271 knowledge-graph by maker-knowledge-mining

271 nips-2010-Tiled convolutional neural networks


Source: pdf

Author: Jiquan Ngiam, Zhenghao Chen, Daniel Chia, Pang W. Koh, Quoc V. Le, Andrew Y. Ng

Abstract: Convolutional neural networks (CNNs) have been successfully applied to many tasks such as digit and object recognition. Using convolutional (tied) weights significantly reduces the number of parameters that have to be learned, and also allows translational invariance to be hard-coded into the architecture. In this paper, we consider the problem of learning invariances, rather than relying on hardcoding. We propose tiled convolution neural networks (Tiled CNNs), which use a regular “tiled” pattern of tied weights that does not require that adjacent hidden units share identical weights, but instead requires only that hidden units k steps away from each other to have tied weights. By pooling over neighboring units, this architecture is able to learn complex invariances (such as scale and rotational invariance) beyond translational invariance. Further, it also enjoys much of CNNs’ advantage of having a relatively small number of learned parameters (such as ease of learning and greater scalability). We provide an efficient learning algorithm for Tiled CNNs based on Topographic ICA, and show that learning complex invariant features allows us to achieve highly competitive results for both the NORB and CIFAR-10 datasets. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Using convolutional (tied) weights significantly reduces the number of parameters that have to be learned, and also allows translational invariance to be hard-coded into the architecture. [sent-6, score-0.291]

2 We propose tiled convolution neural networks (Tiled CNNs), which use a regular “tiled” pattern of tied weights that does not require that adjacent hidden units share identical weights, but instead requires only that hidden units k steps away from each other to have tied weights. [sent-8, score-1.061]

3 By pooling over neighboring units, this architecture is able to learn complex invariances (such as scale and rotational invariance) beyond translational invariance. [sent-9, score-0.388]

4 However, one disadvantage of this hard-coding approach is that the pooling architecture captures only translational invariance; the network does not, for example, pool across units that are rotations of each other or capture more complex invariances, such as out-of-plane rotations. [sent-17, score-0.521]

5 Is it better to hard-code translational invariance – since this is a useful form of prior knowledge – or let the network learn its own invariances from unlabeled data? [sent-18, score-0.313]

6 In particular, we present tiled convolutional networks (Tiled CNNs), which use a novel weight-tying scheme (“tiling”) that simultaneously enjoys the benefit of significantly reducing the number of learnable parameters while giving the algorithm flexibility to learn other invariances. [sent-20, score-0.757]

7 In order to learn these invariances from unlabeled data, we employ unsupervised pretraining, which has been shown to help performance [5, 6, 7]. [sent-22, score-0.224]

8 In particular, we use a modification of Topographic ICA (TICA) [8], which learns to organize features in a topographical map by pooling together groups. (Figure 1, Left: Convolutional Neural Networks with local receptive fields and tied weights.) [sent-23, score-0.415]

9 Right: Partially untied local receptive field networks – Tiled CNNs. [sent-24, score-0.32]

10 Units with the same color belong to the same map; within each map, units with the same fill texture have tied weights. [sent-25, score-0.224]

11 By pooling together local groups of features, it produces representations that are robust to local transformations [9]. [sent-28, score-0.26]

12 The resulting Tiled CNNs pretrained with TICA are indeed able to learn invariant representations, with pooling units that are robust to both scaling and rotation. [sent-30, score-0.418]

13 (Section 2: Tiled CNNs.) CNNs [1, 11] are based on two key concepts: local receptive fields, and weight-tying. [sent-32, score-0.172]

14 Using local receptive fields means that each unit in the network only “looks” at a small, localized region of the input image. [sent-33, score-0.248]

15 This is more computationally efficient than having full receptive fields, and allows CNNs to scale up well. [sent-34, score-0.127]

16 This reduces the number of learnable parameters, and (by pooling over neighboring units) further hard-codes translational invariance into the model. [sent-36, score-0.338]

17 Even though weight-tying allows one to hard-code translational invariance, it also prevents the pooling units from capturing more complex invariances, such as scale and rotation invariance. [sent-37, score-0.417]

18 This is because the second layer units are constrained to pool over translations of identical bases. [sent-38, score-0.295]

19 This lets second-layer units pool over simple units that have different basis functions, and hence learn a more complex range of invariances. [sent-40, score-0.429]

20 Tiled CNNs are parametrized by a tile size k: we constrain only units that are k steps away from each other to be tied. [sent-42, score-0.293]
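To make the tiling constraint concrete, here is a minimal sketch (ours, not from the paper; the 1-D layout and the helper name are illustrative assumptions): each simple unit in a map is assigned to one of k shared weight vectors, so units exactly k steps apart are tied; k = 1 recovers the fully tied convolutional case and a large k leaves every unit untied.

```python
import numpy as np

def tiled_weight_indices(num_units, k):
    """Index of the shared weight vector used by each simple unit in a 1-D map.

    Units exactly k steps apart receive the same index (tied weights);
    k = 1 ties every unit (a standard CNN map), k >= num_units unties them all.
    """
    return np.arange(num_units) % k

print(tiled_weight_indices(8, k=1))  # [0 0 0 0 0 0 0 0] -> convolutional (fully tied)
print(tiled_weight_indices(8, k=2))  # [0 1 0 1 0 1 0 1] -> Tiled CNN with tile size 2
print(tiled_weight_indices(8, k=8))  # [0 1 2 3 4 5 6 7] -> fully untied
```

In two dimensions the same modular assignment can be applied along both image axes, giving at most k^2 distinct weight vectors per map.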

21 By varying k, we obtain a spectrum of models which trade off between being able to learn complex invariances, and having few learnable parameters. [sent-43, score-0.133]

22 At one end of the spectrum we have traditional CNNs (k = 1), and at the other, we have fully untied simple units. [sent-44, score-0.13]

23 A map is a set of pooling units and simple units that collectively cover the entire image (see Figure 1-Right). [sent-46, score-0.527]

24 When varying the tiling size, we change the degree of weight tying within each map; for example, if k = 1, the simple units within each map will have the same weights. [sent-47, score-0.256]

25 In our model, simple units in different maps are never tied. [sent-48, score-0.257]

26 By having units in different maps learn different features, our model can learn a rich and diverse set of features. [sent-49, score-0.317]

27 Tiled CNNs with multiple maps enjoy the twin benefits of (i) being able to represent complex invariances, by pooling over (partially) untied weights, and (ii) having a relatively small number of learnable parameters. [sent-50, score-0.43]

28 Unfortunately, existing methods for pretraining CNNs [11, 12] are not suitable for untied weights; for example, the CDBN algorithm [11] breaks down without the weight-tying constraints. [sent-53, score-0.25]

29 In the following sections, we discuss a pretraining method for Tiled CNNs based on the TICA algorithm. [sent-54, score-0.137]

30 (Section 3: Unsupervised feature learning via TICA.) TICA is an unsupervised learning algorithm that learns features from unlabeled image patches. [sent-55, score-0.13]

31 The weights W in the first layer are learned, while the weights V in the second layer are fixed and hard-coded to represent the neighborhood/topographical structure of the neurons in the first layer. [sent-57, score-0.308]

32 Specifically, each second layer hidden unit p_i pools over a small neighborhood of adjacent first layer units h_i. [sent-58, score-0.494]

33 We call the h_i and p_i simple and pooling units, respectively. [sent-59, score-0.233]

34 More precisely, given an input pattern x^{(t)}, the activation of each second-layer unit is p_i(x^{(t)}; W, V) = \sqrt{\sum_{k=1}^{m} V_{ik} (\sum_{j=1}^{n} W_{kj} x_j^{(t)})^2}. [sent-60, score-0.158]
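A minimal numpy sketch of this activation (variable names are ours): the simple units compute linear responses Wx, these are squared, and each pooling unit takes the square root of a V-weighted sum of the squares.

```python
import numpy as np

def tica_pooling_activations(x, W, V):
    """Pooling-unit activations p_i(x; W, V) = sqrt(sum_k V_ik * (W_k . x)^2).

    x: input pattern, shape (n,)
    W: learned first-layer weights, shape (m, n)
    V: fixed 0/1 pooling matrix encoding the topography, shape (m, m)
    """
    h_sq = (W @ x) ** 2       # squared simple-unit responses, shape (m,)
    return np.sqrt(V @ h_sq)  # pooling-unit responses, shape (m,)

# Toy example: n = 4 input dimensions, m = 6 simple units.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W = rng.standard_normal((6, 4))
V = np.eye(6)                 # degenerate topography: each unit pools only over itself
print(tica_pooling_activations(x, W, V))
```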

35 The TICA objective (1) is \min_W \sum_{t=1}^{T} \sum_{i=1}^{m} p_i(x^{(t)}; W, V), subject to W W^T = I. Here, W ∈ R^{m×n} and V ∈ R^{m×m}, where n is the size of the input and m is the number of hidden units in a layer. [sent-62, score-0.194]

36 V is a fixed matrix (V_ij = 1 or 0) that encodes the 2D topography of the hidden units h_i. [sent-63, score-0.252]

37 Specifically, the h_i units lie on a 2D grid, with each p_i connected to a contiguous 3x3 (or other size) block of h_i units. [sent-64, score-0.306]
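The fixed topography matrix V could be built as in the sketch below (ours; it follows the no-wraparound convention mentioned later for the NORB experiments): pooling unit p_i is connected to the 3x3 block of simple units centred at its own position on the grid.

```python
import numpy as np

def pooling_matrix(grid, block=3):
    """Fixed 0/1 matrix V for simple units laid out on a grid x grid 2-D lattice.

    V[i, k] = 1 iff simple unit k lies in the block x block neighbourhood of
    pooling unit i (no wraparound at the borders).
    """
    m = grid * grid
    V = np.zeros((m, m))
    r = block // 2
    for i in range(grid):
        for j in range(grid):
            p = i * grid + j
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    a, b = i + di, j + dj
                    if 0 <= a < grid and 0 <= b < grid:
                        V[p, a * grid + b] = 1.0
    return V

V = pooling_matrix(grid=5)    # 25 simple units, 25 pooling units
print(V.sum(axis=1)[:6])      # interior units pool over 9 neighbours, border units over fewer
```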

38 One important property of TICA is that it can learn invariances even when trained only on unlabeled data, as demonstrated in [8, 9]. [sent-67, score-0.187]

39 This is due both to the pooling architecture, which gives rise to pooling units that are robust to local transformations of their inputs, and the learning algorithm, which promotes selectivity by optimizing for sparsity. [sent-68, score-0.524]

40 If we choose square and square-root activations for the simple and pooling units in the Tiled CNN, we can view the Tiled CNN as a special case of a TICA network, with the topography of the pooling units specifying the matrix V . [sent-70, score-0.684]

41 Crucially, Tiled CNNs incorporate local receptive fields, which play an important role in speeding up TICA. [sent-71, score-0.172]

42 When learning overcomplete representations [14], the orthogonality constraint cannot be satisfied exactly, and we instead try to satisfy an approximate orthogonality constraint [15]. [sent-77, score-0.154]

43 We can avoid approximate orthogonalization by using local receptive fields, which are inherently built into Tiled CNNs. [sent-80, score-0.249]

44 This locality constraint automatically ensures that the weights of any two simple units with non-overlapping receptive fields are orthogonal, without the need for an explicit orthogonality constraint. [sent-82, score-0.428]
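This is easy to check numerically: two weight vectors whose receptive fields do not overlap have disjoint supports when padded to the full input size, so their dot product is zero by construction (the toy sizes below are ours).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16                              # full (padded) input size
w1 = np.zeros(n)
w1[0:8] = rng.standard_normal(8)    # simple unit looking only at pixels 0..7
w2 = np.zeros(n)
w2[8:16] = rng.standard_normal(8)   # simple unit looking only at pixels 8..15
print(np.dot(w1, w2))               # exactly 0.0 -> orthogonal by construction
```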

45 Empirically, we find that orthogonalizing partially overlapping receptive fields is not necessary for learning distinct, informative features either. [sent-83, score-0.15]

46 However, orthogonalization is still needed to decorrelate units that occupy the same position in their respective maps, for they look at the same region on the image. [sent-84, score-0.254]

47 Specifically, so long as l ≤ s, we can demand that these l units that share an input patch be orthogonal. [sent-86, score-0.196]

48 We note that setting k to its maximum value of n − s + 1 gives exactly the untied local TICA model outlined in the previous section. [sent-93, score-0.158]

49 (Tail of Algorithm 1 pseudocode: α ← 0.5α; end while; W ← W_new; until convergence.) Our pretraining algorithm, which is based on gradient descent on the TICA objective function (1), is shown in Algorithm 1. [sent-95, score-0.137]
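A schematic version of such a pretraining loop is sketched below; it is our own reading of the procedure, and `tica_objective`, `tica_gradient`, and `orthogonalize_local_rf` are placeholder callables standing in for the paper's components rather than real library functions.

```python
def pretrain_tica(W, tica_objective, tica_gradient, orthogonalize_local_rf,
                  alpha=1.0, tol=1e-6, max_iters=200):
    """Gradient-descent pretraining sketch with step-size halving.

    Takes a gradient step on the TICA objective, halves the step size until the
    objective no longer gets worse, re-orthogonalizes weights whose receptive
    fields completely overlap, and repeats until convergence.
    """
    f_old = tica_objective(W)
    for _ in range(max_iters):
        g = tica_gradient(W)
        step = alpha
        W_new = orthogonalize_local_rf(W - step * g)
        while tica_objective(W_new) > f_old and step > 1e-12:
            step *= 0.5                              # alpha <- 0.5 * alpha
            W_new = orthogonalize_local_rf(W - step * g)
        f_new = tica_objective(W_new)
        if abs(f_old - f_new) < tol:                 # converged
            return W_new
        W, f_old = W_new, f_new
    return W
```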

50 For example, when trained on natural images, TICA’s first layer weights usually resemble localized Gabor filters (Figure 2-Right). [sent-98, score-0.195]

51 In orthogonalize_local_RF(W_new), we only orthogonalize the weights that have completely overlapping receptive fields. [sent-100, score-0.359]
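One plausible form of such a routine (a sketch under the assumption that units sharing an identical receptive-field position are indexed by precomputed groups; the grouping interface is ours): each group of completely overlapping weight vectors is decorrelated with exact symmetric orthogonalization, and all other weights are left untouched.

```python
import numpy as np

def symmetric_orthogonalize(M):
    """Exact symmetric orthogonalization: M <- (M M^T)^(-1/2) M."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

def orthogonalize_local_rf(W, groups):
    """Orthogonalize only weights with completely overlapping receptive fields.

    W: (m, n_patch) weight matrix for units restricted to their local patches.
    groups: list of index arrays, one per set of units that look at exactly
            the same input patch (at most l <= s units per group).
    """
    W = W.copy()
    for idx in groups:
        W[idx] = symmetric_orthogonalize(W[idx])
    return W
```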

52 (Section heading: Speed-up.) We first establish that the local receptive fields intrinsic to Tiled CNNs allow us to implement TICA learning for overcomplete representations in a much more efficient manner. [sent-106, score-0.246]

53 Figure 3 shows the relative speed-up of pretraining Tiled CNNs over standard TICA using approximate fixed-point orthogonalization, W ← (3/2) W − (1/2) W W^T W [15]. [sent-107, score-0.214]
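For comparison, the approximate fixed-point orthogonalization used by standard (fully untied, full-receptive-field) TICA can be sketched as follows; the normalization step and the iteration count are our assumptions about typical usage, not details taken from the paper.

```python
import numpy as np

def approx_orthogonalize(W, iters=1):
    """Approximate fixed-point orthogonalization: W <- 1.5*W - 0.5*W W^T W.

    Standard TICA applies an update of this form after each gradient step;
    W W^T = I can only be satisfied approximately when W is overcomplete.
    """
    # Rescale so the spectral norm of W is at most 1 (a stabilizing step that
    # is an assumption on our part, not a detail from the paper).
    W = W / np.sqrt(np.linalg.norm(W @ W.T, ord=2))
    for _ in range(iters):
        W = 1.5 * W - 0.5 * W @ W.T @ W
    return W
```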

54 Hence, the speed-up observed here is not from an efficient convolutional implementation, but purely due to the local receptive fields. [sent-112, score-0.298]

55 (Section heading: Classification on NORB.) Next, we show that TICA pretraining for Tiled CNNs performs well on object recognition. [sent-116, score-0.166]

56 In our classification experiments, we fix the size of each local receptive field to 8x8, and set V such that each pooling unit p_i in the second layer pools over a block of 3x3 simple units in the first layer, without wraparound at the borders. [sent-119, score-0.694]

57 The number of pooling units in each map is exactly the same as the number of simple units. [sent-120, score-0.35]

58 We densely tile the input images with overlapping 8x8 local receptive fields, with a step size (or “stride”) of 1. [sent-121, score-0.314]

59 This gives us 25 × 25 = 625 simple units and 625 pooling units per map in our experiments on 32x32 images. [sent-122, score-0.527]
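The 625 figure follows directly from the stated settings; a quick check (ours):

```python
image_size, rf_size, stride = 32, 8, 1
units_per_side = (image_size - rf_size) // stride + 1    # 25
simple_units_per_map = units_per_side ** 2               # 625
pooling_units_per_map = simple_units_per_map             # one pooling unit per simple unit
print(units_per_side, simple_units_per_map, pooling_units_per_map)  # 25 625 625
```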

60 (Section heading: Unsupervised pretraining.) We first consider the case in which the features are learned purely from unsupervised data. [sent-126, score-0.258]

61 We call this initial phase the unsupervised pretraining phase. [sent-147, score-0.198]

62 , the activations of the pooling units) on the labeled training set. [sent-150, score-0.172]

63 During this supervised training phase, only the weights of the linear classifier were learned, while the lower weights of the Tiled CNN model remained fixed. [sent-151, score-0.172]

64 We train a range of models to investigate the role of the tile size k and the number of maps l. [sent-152, score-0.196]

65 Using a randomly sampled hold-out validation set of 2430 examples (10%) taken from the training set, we selected a convolutional model with 48 maps that achieved an accuracy of 94. [sent-154, score-0.263]

66 (Section heading: Supervised finetuning of W.) Next, we study the effects of supervised finetuning [23] on the models produced by the unsupervised pretraining phase. [sent-158, score-0.227]

67 Using softmax regression to calculate the gradients, we backpropagated the error signal from the output back to the learned features in order to update W , the weights of the simple units in the Tiled CNN model. [sent-160, score-0.281]

68 The best performing fine-tuned model on the validation set was the model with 16 maps and k = 2, which achieved a test-set accuracy of 96. [sent-165, score-0.133]

69 (Section heading: Limited training data.) To test the ability of our pretrained features to generalize across rotations and lighting conditions given only a weak supervised signal, we limited the labeled training set to comprise only examples with a particular set of viewing angles and lighting conditions. [sent-170, score-0.226]

70 Models were trained with various untied map sizes k ∈ {1, 2, 9, 16, 25} and number of maps l ∈ {4, 6, 10, 16}. [sent-175, score-0.239]

71 When k = 1, we were able to use an efficient convolutional implementation to scale up the number of maps in the models, allowing us to train additional models with l ∈ {22, 36, 48}. [sent-176, score-0.189]

72 Figure 4: Left: NORB test set accuracy across various tile sizes and numbers of maps, without finetuning. [sent-177, score-0.144]

73 5x overcomplete model with k = 2 and 4 maps obtained an accuracy of 64. [sent-187, score-0.163]

74 (Section heading: Unsupervised pretraining and supervised finetuning.) As before, models were trained with tile size k ∈ {1, 2, 25}, and number of maps l ∈ {4, 10, 16, 22, 32}. [sent-213, score-0.386]

75 The convolutional model (k = 1) was also trained with l = 48 maps. [sent-214, score-0.133]

76 This 48-map convolutional model performed the best on our 10% hold-out validation set, and achieved a test set accuracy of 66. [sent-215, score-0.179]

77 We find that supervised finetuning of these models on CIFAR-10 causes overfitting, and generally reduces test-set accuracy; the top model on the validation set, with 32 maps and k = 1, only achieves 65. [sent-217, score-0.134]

78 (Section heading: Deep Tiled CNNs.) We additionally investigate the possibility of training a deep Tiled CNN in a greedy layer-wise fashion, similar to models such as DBNs [6] and stacked autoencoders [26, 18]. [sent-223, score-0.139]

79 The resulting four-layer network has the structure W1 → V1 → W2 → V2 , where the weights W1 are local receptive fields of size 4x4, and W2 is of size 3x3, i. [sent-225, score-0.296]

80 , each unit in the third layer “looks” at a 3x3 window of each of the 10 maps in the first layer. [sent-227, score-0.203]

81 The number of maps in the third and fourth layer is also 10. [sent-229, score-0.173]

82 After finetuning, we found that the deep model outperformed all previous models on the validation set, and achieved a test set accuracy of 73. [sent-230, score-0.166]

83 This demonstrates the potential of deep Tiled CNNs to learn more complex representations. [sent-232, score-0.146]

84 (Section heading: Effects of optimizing the pooling units.) When the tile size is 1 (i. [sent-234, score-0.444]

85 , a fully tied model), a naïve approach to learn the filter weights is to directly train the first layer filters using small patches (e. [sent-236, score-0.231]

86 We use ICA to learn the first layer weights on CIFAR-10 with 16 filters. [sent-241, score-0.184]

87 These weights are then used in a Tiled CNN with a tile size of 1 and 16 maps. [sent-242, score-0.177]

88 This method is compared to pretraining the model of the same architecture with TICA. [sent-243, score-0.168]

89 54% on the test set, while pretraining with TICA achieves 58. [sent-246, score-0.154]

90 These results confirm that optimizing for sparsity of the pooling units results in better features than just naïvely approximating the first layer weights. [sent-248, score-0.444]

91 Specifically, we find that selecting a tile size of k = 2 achieves the best results for both the NORB and CIFAR-10 datasets, even with deep networks. [sent-250, score-0.212]

92 More importantly, untying weights allows the networks to learn more complex invariances from unlabeled data. [sent-251, score-0.329]

93 By visualizing [28, 29] the range of optimal stimuli that activate each pooling unit in a Tiled CNN, we found units that were scale and rotationally invariant. [sent-252, score-0.376]

94 A natural choice of the tile size k would be to set it to the size of the pooling region p, which in this case is 3. [sent-254, score-0.284]

95 In this case, each pooling unit always combines simple units which are not tied. [sent-255, score-0.358]

96 Our preliminary results on networks pretrained using 250000 unlabeled images from the Tiny images dataset [30] show that performance increases as k goes from 1 to 3, flattening out at k = 4. [sent-258, score-0.173]

97 In this paper, we introduced Tiled CNNs as an extension of CNNs that support both unsupervised pretraining and weight tiling. [sent-260, score-0.198]

98 Furthermore, the use of local receptive fields enables our models to scale up well, producing massively overcomplete representations that perform well on classification tasks. [sent-262, score-0.246]

99 Best practices for convolutional neural networks applied to visual document analysis. [sent-281, score-0.144]

100 Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. [sent-336, score-0.192]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('cnns', 0.526), ('tiled', 0.517), ('tica', 0.33), ('netuning', 0.188), ('units', 0.177), ('norb', 0.175), ('pooling', 0.151), ('pretraining', 0.137), ('receptive', 0.127), ('untied', 0.113), ('convolutional', 0.109), ('tile', 0.099), ('deep', 0.096), ('cnn', 0.095), ('layer', 0.093), ('invariances', 0.087), ('maps', 0.08), ('orthogonalization', 0.077), ('translational', 0.069), ('learnable', 0.066), ('orthogonalize', 0.063), ('weights', 0.061), ('unsupervised', 0.061), ('overcomplete', 0.055), ('invariance', 0.052), ('untying', 0.05), ('wkj', 0.05), ('hi', 0.047), ('tied', 0.047), ('elds', 0.047), ('unlabeled', 0.046), ('local', 0.045), ('pretrained', 0.04), ('orthogonality', 0.04), ('tiling', 0.038), ('vik', 0.038), ('pi', 0.035), ('networks', 0.035), ('topographic', 0.034), ('saxe', 0.033), ('architecture', 0.031), ('tiny', 0.031), ('hyvarinen', 0.03), ('unit', 0.03), ('learn', 0.03), ('network', 0.029), ('object', 0.029), ('supervised', 0.029), ('topography', 0.028), ('accuracy', 0.028), ('lighting', 0.028), ('images', 0.026), ('layers', 0.026), ('validation', 0.025), ('qpm', 0.025), ('pool', 0.025), ('trained', 0.024), ('na', 0.024), ('rf', 0.023), ('locality', 0.023), ('features', 0.023), ('classi', 0.022), ('sees', 0.022), ('ica', 0.022), ('map', 0.022), ('elevations', 0.022), ('autoencoders', 0.022), ('rbm', 0.022), ('bengio', 0.021), ('training', 0.021), ('recognition', 0.021), ('learned', 0.02), ('complex', 0.02), ('lcc', 0.02), ('azimuths', 0.02), ('koh', 0.02), ('invariant', 0.02), ('ranzato', 0.02), ('representations', 0.019), ('binocular', 0.019), ('pools', 0.019), ('rgb', 0.019), ('tying', 0.019), ('courville', 0.019), ('erhan', 0.019), ('rotations', 0.019), ('patch', 0.019), ('tie', 0.018), ('visualizing', 0.018), ('boltzmann', 0.018), ('purely', 0.017), ('localized', 0.017), ('size', 0.017), ('pixels', 0.017), ('spectrum', 0.017), ('coding', 0.017), ('test', 0.017), ('kavukcuoglu', 0.016), ('whitened', 0.016), ('avoided', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 271 nips-2010-Tiled convolutional neural networks

Author: Jiquan Ngiam, Zhenghao Chen, Daniel Chia, Pang W. Koh, Quoc V. Le, Andrew Y. Ng

Abstract: Convolutional neural networks (CNNs) have been successfully applied to many tasks such as digit and object recognition. Using convolutional (tied) weights significantly reduces the number of parameters that have to be learned, and also allows translational invariance to be hard-coded into the architecture. In this paper, we consider the problem of learning invariances, rather than relying on hardcoding. We propose tiled convolution neural networks (Tiled CNNs), which use a regular “tiled” pattern of tied weights that does not require that adjacent hidden units share identical weights, but instead requires only that hidden units k steps away from each other to have tied weights. By pooling over neighboring units, this architecture is able to learn complex invariances (such as scale and rotational invariance) beyond translational invariance. Further, it also enjoys much of CNNs’ advantage of having a relatively small number of learned parameters (such as ease of learning and greater scalability). We provide an efficient learning algorithm for Tiled CNNs based on Topographic ICA, and show that learning complex invariant features allows us to achieve highly competitive results for both the NORB and CIFAR-10 datasets. 1

2 0.17771298 140 nips-2010-Layer-wise analysis of deep networks with Gaussian kernels

Author: Grégoire Montavon, Klaus-Robert Müller, Mikio L. Braun

Abstract: Deep networks can potentially express a learning problem more efficiently than local learning machines. While deep networks outperform local learning machines on some problems, it is still unclear how their nice representation emerges from their complex structure. We present an analysis based on Gaussian kernels that measures how the representation of the learning problem evolves layer after layer as the deep network builds higher-level abstract representations of the input. We use this analysis to show empirically that deep networks build progressively better representations of the learning problem and that the best representations are obtained when the deep network discriminates only in the last layers. 1

3 0.14179838 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition

Author: Koray Kavukcuoglu, Pierre Sermanet, Y-lan Boureau, Karol Gregor, Michael Mathieu, Yann L. Cun

Abstract: We propose an unsupervised method for learning multi-stage hierarchies of sparse convolutional features. While sparse coding has become an increasingly popular method for learning visual features, it is most often trained at the patch level. Applying the resulting filters convolutionally results in highly redundant codes because overlapping patches are encoded in isolation. By training convolutionally over large image windows, our method reduces the redudancy between feature vectors at neighboring locations and improves the efficiency of the overall representation. In addition to a linear decoder that reconstructs the image from sparse features, our method trains an efficient feed-forward encoder that predicts quasisparse features from the input. While patch-based training rarely produces anything but oriented edge detectors, we show that convolutional training produces highly diverse filters, including center-surround filters, corner detectors, cross detectors, and oriented grating detectors. We show that using these filters in multistage convolutional network architecture improves performance on a number of visual recognition and detection tasks. 1

4 0.11484646 206 nips-2010-Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine

Author: George Dahl, Marc'aurelio Ranzato, Abdel-rahman Mohamed, Geoffrey E. Hinton

Abstract: Straightforward application of Deep Belief Nets (DBNs) to acoustic modeling produces a rich distributed representation of speech data that is useful for recognition and yields impressive results on the speaker-independent TIMIT phone recognition task. However, the first-layer Gaussian-Bernoulli Restricted Boltzmann Machine (GRBM) has an important limitation, shared with mixtures of diagonalcovariance Gaussians: GRBMs treat different components of the acoustic input vector as conditionally independent given the hidden state. The mean-covariance restricted Boltzmann machine (mcRBM), first introduced for modeling natural images, is a much more representationally efficient and powerful way of modeling the covariance structure of speech data. Every configuration of the precision units of the mcRBM specifies a different precision matrix for the conditional distribution over the acoustic space. In this work, we use the mcRBM to learn features of speech data that serve as input into a standard DBN. The mcRBM features combined with DBNs allow us to achieve a phone error rate of 20.5%, which is superior to all published results on speaker-independent TIMIT to date. 1

5 0.08888337 59 nips-2010-Deep Coding Network

Author: Yuanqing Lin, Zhang Tong, Shenghuo Zhu, Kai Yu

Abstract: This paper proposes a principled extension of the traditional single-layer flat sparse coding scheme, where a two-layer coding scheme is derived based on theoretical analysis of nonlinear functional approximation that extends recent results for local coordinate coding. The two-layer approach can be easily generalized to deeper structures in a hierarchical multiple-layer manner. Empirically, it is shown that the deep coding approach yields improved performance in benchmark datasets.

6 0.08406055 103 nips-2010-Generating more realistic images using gated MRF's

7 0.079395019 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression

8 0.078420728 99 nips-2010-Gated Softmax Classification

9 0.064574555 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

10 0.054200105 128 nips-2010-Infinite Relational Modeling of Functional Connectivity in Resting State fMRI

11 0.053554326 141 nips-2010-Layered image motion with explicit occlusions, temporal consistency, and depth ordering

12 0.049009785 133 nips-2010-Kernel Descriptors for Visual Recognition

13 0.047017235 101 nips-2010-Gaussian sampling by local perturbations

14 0.046544883 266 nips-2010-The Maximal Causes of Natural Scenes are Edge Filters

15 0.046166461 111 nips-2010-Hallucinations in Charles Bonnet Syndrome Induced by Homeostasis: a Deep Boltzmann Machine Model

16 0.045937464 156 nips-2010-Learning to combine foveal glimpses with a third-order Boltzmann machine

17 0.045188177 224 nips-2010-Regularized estimation of image statistics by Score Matching

18 0.041025124 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts

19 0.040584657 17 nips-2010-A biologically plausible network for the computation of orientation dominance

20 0.039384197 127 nips-2010-Inferring Stimulus Selectivity from the Spatial Structure of Neural Network Dynamics


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.111), (1, 0.058), (2, -0.122), (3, -0.072), (4, 0.028), (5, -0.01), (6, 0.0), (7, 0.043), (8, -0.065), (9, 0.025), (10, 0.027), (11, -0.045), (12, 0.054), (13, -0.156), (14, -0.107), (15, -0.06), (16, -0.042), (17, -0.051), (18, -0.101), (19, -0.127), (20, 0.05), (21, 0.065), (22, -0.062), (23, 0.041), (24, 0.021), (25, 0.111), (26, 0.029), (27, 0.03), (28, 0.056), (29, 0.057), (30, -0.059), (31, -0.107), (32, -0.004), (33, 0.103), (34, 0.077), (35, 0.026), (36, 0.032), (37, 0.026), (38, 0.008), (39, 0.02), (40, 0.01), (41, -0.015), (42, -0.033), (43, -0.007), (44, 0.023), (45, -0.038), (46, 0.025), (47, -0.007), (48, -0.022), (49, -0.075)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92576438 271 nips-2010-Tiled convolutional neural networks

Author: Jiquan Ngiam, Zhenghao Chen, Daniel Chia, Pang W. Koh, Quoc V. Le, Andrew Y. Ng

Abstract: Convolutional neural networks (CNNs) have been successfully applied to many tasks such as digit and object recognition. Using convolutional (tied) weights significantly reduces the number of parameters that have to be learned, and also allows translational invariance to be hard-coded into the architecture. In this paper, we consider the problem of learning invariances, rather than relying on hardcoding. We propose tiled convolution neural networks (Tiled CNNs), which use a regular “tiled” pattern of tied weights that does not require that adjacent hidden units share identical weights, but instead requires only that hidden units k steps away from each other to have tied weights. By pooling over neighboring units, this architecture is able to learn complex invariances (such as scale and rotational invariance) beyond translational invariance. Further, it also enjoys much of CNNs’ advantage of having a relatively small number of learned parameters (such as ease of learning and greater scalability). We provide an efficient learning algorithm for Tiled CNNs based on Topographic ICA, and show that learning complex invariant features allows us to achieve highly competitive results for both the NORB and CIFAR-10 datasets. 1

2 0.82755792 140 nips-2010-Layer-wise analysis of deep networks with Gaussian kernels

Author: Grégoire Montavon, Klaus-Robert Müller, Mikio L. Braun

Abstract: Deep networks can potentially express a learning problem more efficiently than local learning machines. While deep networks outperform local learning machines on some problems, it is still unclear how their nice representation emerges from their complex structure. We present an analysis based on Gaussian kernels that measures how the representation of the learning problem evolves layer after layer as the deep network builds higher-level abstract representations of the input. We use this analysis to show empirically that deep networks build progressively better representations of the learning problem and that the best representations are obtained when the deep network discriminates only in the last layers. 1

3 0.74577129 206 nips-2010-Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine

Author: George Dahl, Marc'aurelio Ranzato, Abdel-rahman Mohamed, Geoffrey E. Hinton

Abstract: Straightforward application of Deep Belief Nets (DBNs) to acoustic modeling produces a rich distributed representation of speech data that is useful for recognition and yields impressive results on the speaker-independent TIMIT phone recognition task. However, the first-layer Gaussian-Bernoulli Restricted Boltzmann Machine (GRBM) has an important limitation, shared with mixtures of diagonalcovariance Gaussians: GRBMs treat different components of the acoustic input vector as conditionally independent given the hidden state. The mean-covariance restricted Boltzmann machine (mcRBM), first introduced for modeling natural images, is a much more representationally efficient and powerful way of modeling the covariance structure of speech data. Every configuration of the precision units of the mcRBM specifies a different precision matrix for the conditional distribution over the acoustic space. In this work, we use the mcRBM to learn features of speech data that serve as input into a standard DBN. The mcRBM features combined with DBNs allow us to achieve a phone error rate of 20.5%, which is superior to all published results on speaker-independent TIMIT to date. 1

4 0.69564748 111 nips-2010-Hallucinations in Charles Bonnet Syndrome Induced by Homeostasis: a Deep Boltzmann Machine Model

Author: Peggy Series, David P. Reichert, Amos J. Storkey

Abstract: The Charles Bonnet Syndrome (CBS) is characterized by complex vivid visual hallucinations in people with, primarily, eye diseases and no other neurological pathology. We present a Deep Boltzmann Machine model of CBS, exploring two core hypotheses: First, that the visual cortex learns a generative or predictive model of sensory input, thus explaining its capability to generate internal imagery. And second, that homeostatic mechanisms stabilize neuronal activity levels, leading to hallucinations being formed when input is lacking. We reproduce a variety of qualitative findings in CBS. We also introduce a modification to the DBM that allows us to model a possible role of acetylcholine in CBS as mediating the balance of feed-forward and feed-back processing. Our model might provide new insights into CBS and also demonstrates that generative frameworks are promising as hypothetical models of cortical learning and perception. 1

5 0.65156341 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition

Author: Koray Kavukcuoglu, Pierre Sermanet, Y-lan Boureau, Karol Gregor, Michael Mathieu, Yann L. Cun

Abstract: We propose an unsupervised method for learning multi-stage hierarchies of sparse convolutional features. While sparse coding has become an increasingly popular method for learning visual features, it is most often trained at the patch level. Applying the resulting filters convolutionally results in highly redundant codes because overlapping patches are encoded in isolation. By training convolutionally over large image windows, our method reduces the redudancy between feature vectors at neighboring locations and improves the efficiency of the overall representation. In addition to a linear decoder that reconstructs the image from sparse features, our method trains an efficient feed-forward encoder that predicts quasisparse features from the input. While patch-based training rarely produces anything but oriented edge detectors, we show that convolutional training produces highly diverse filters, including center-surround filters, corner detectors, cross detectors, and oriented grating detectors. We show that using these filters in multistage convolutional network architecture improves performance on a number of visual recognition and detection tasks. 1

6 0.60763472 99 nips-2010-Gated Softmax Classification

7 0.56349558 59 nips-2010-Deep Coding Network

8 0.54291987 156 nips-2010-Learning to combine foveal glimpses with a third-order Boltzmann machine

9 0.52311844 207 nips-2010-Phoneme Recognition with Large Hierarchical Reservoirs

10 0.47683284 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression

11 0.44892406 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning

12 0.40556392 103 nips-2010-Generating more realistic images using gated MRF's

13 0.3932831 28 nips-2010-An Alternative to Low-level-Sychrony-Based Methods for Speech Detection

14 0.37908107 188 nips-2010-On Herding and the Perceptron Cycling Theorem

15 0.35405609 141 nips-2010-Layered image motion with explicit occlusions, temporal consistency, and depth ordering

16 0.34711111 17 nips-2010-A biologically plausible network for the computation of orientation dominance

17 0.33566248 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

18 0.3286784 76 nips-2010-Energy Disaggregation via Discriminative Sparse Coding

19 0.32634583 266 nips-2010-The Maximal Causes of Natural Scenes are Edge Filters

20 0.31868184 94 nips-2010-Feature Set Embedding for Incomplete Data


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.034), (17, 0.058), (27, 0.089), (30, 0.022), (35, 0.041), (45, 0.156), (46, 0.283), (50, 0.035), (52, 0.077), (59, 0.013), (60, 0.029), (77, 0.035), (78, 0.013), (90, 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.71593255 271 nips-2010-Tiled convolutional neural networks

Author: Jiquan Ngiam, Zhenghao Chen, Daniel Chia, Pang W. Koh, Quoc V. Le, Andrew Y. Ng

Abstract: Convolutional neural networks (CNNs) have been successfully applied to many tasks such as digit and object recognition. Using convolutional (tied) weights significantly reduces the number of parameters that have to be learned, and also allows translational invariance to be hard-coded into the architecture. In this paper, we consider the problem of learning invariances, rather than relying on hardcoding. We propose tiled convolution neural networks (Tiled CNNs), which use a regular “tiled” pattern of tied weights that does not require that adjacent hidden units share identical weights, but instead requires only that hidden units k steps away from each other to have tied weights. By pooling over neighboring units, this architecture is able to learn complex invariances (such as scale and rotational invariance) beyond translational invariance. Further, it also enjoys much of CNNs’ advantage of having a relatively small number of learned parameters (such as ease of learning and greater scalability). We provide an efficient learning algorithm for Tiled CNNs based on Topographic ICA, and show that learning complex invariant features allows us to achieve highly competitive results for both the NORB and CIFAR-10 datasets. 1

2 0.60856384 118 nips-2010-Implicit Differentiation by Perturbation

Author: Justin Domke

Abstract: This paper proposes a simple and efficient finite difference method for implicit differentiation of marginal inference results in discrete graphical models. Given an arbitrary loss function, defined on marginals, we show that the derivatives of this loss with respect to model parameters can be obtained by running the inference procedure twice, on slightly perturbed model parameters. This method can be used with approximate inference, with a loss function over approximate marginals. Convenient choices of loss functions make it practical to fit graphical models with hidden variables, high treewidth and/or model misspecification. 1

3 0.57229191 96 nips-2010-Fractionally Predictive Spiking Neurons

Author: Jaldert Rombouts, Sander M. Bohte

Abstract: Recent experimental work has suggested that the neural firing rate can be interpreted as a fractional derivative, at least when signal variation induces neural adaptation. Here, we show that the actual neural spike-train itself can be considered as the fractional derivative, provided that the neural signal is approximated by a sum of power-law kernels. A simple standard thresholding spiking neuron suffices to carry out such an approximation, given a suitable refractory response. Empirically, we find that the online approximation of signals with a sum of powerlaw kernels is beneficial for encoding signals with slowly varying components, like long-memory self-similar signals. For such signals, the online power-law kernel approximation typically required less than half the number of spikes for similar SNR as compared to sums of similar but exponentially decaying kernels. As power-law kernels can be accurately approximated using sums or cascades of weighted exponentials, we demonstrate that the corresponding decoding of spiketrains by a receiving neuron allows for natural and transparent temporal signal filtering by tuning the weights of the decoding kernel. 1

4 0.57205784 21 nips-2010-Accounting for network effects in neuronal responses using L1 regularized point process models

Author: Ryan Kelly, Matthew Smith, Robert Kass, Tai S. Lee

Abstract: Activity of a neuron, even in the early sensory areas, is not simply a function of its local receptive field or tuning properties, but depends on global context of the stimulus, as well as the neural context. This suggests the activity of the surrounding neurons and global brain states can exert considerable influence on the activity of a neuron. In this paper we implemented an L1 regularized point process model to assess the contribution of multiple factors to the firing rate of many individual units recorded simultaneously from V1 with a 96-electrode “Utah” array. We found that the spikes of surrounding neurons indeed provide strong predictions of a neuron’s response, in addition to the neuron’s receptive field transfer function. We also found that the same spikes could be accounted for with the local field potentials, a surrogate measure of global network states. This work shows that accounting for network fluctuations can improve estimates of single trial firing rate and stimulus-response transfer functions. 1

5 0.56986809 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior

Author: Pierre Garrigues, Bruno A. Olshausen

Abstract: We propose a class of sparse coding models that utilizes a Laplacian Scale Mixture (LSM) prior to model dependencies among coefficients. Each coefficient is modeled as a Laplacian distribution with a variable scale parameter, with a Gamma distribution prior over the scale parameter. We show that, due to the conjugacy of the Gamma prior, it is possible to derive efficient inference procedures for both the coefficients and the scale parameter. When the scale parameters of a group of coefficients are combined into a single variable, it is possible to describe the dependencies that occur due to common amplitude fluctuations among coefficients, which have been shown to constitute a large fraction of the redundancy in natural images [1]. We show that, as a consequence of this group sparse coding, the resulting inference of the coefficients follows a divisive normalization rule, and that this may be efficiently implemented in a network architecture similar to that which has been proposed to occur in primary visual cortex. We also demonstrate improvements in image coding and compressive sensing recovery using the LSM model. 1

6 0.56483638 56 nips-2010-Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication

7 0.56422621 91 nips-2010-Fast detection of multiple change-points shared by many signals using group LARS

8 0.56245762 161 nips-2010-Linear readout from a neural population with partial correlation data

9 0.5618552 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing

10 0.56164849 17 nips-2010-A biologically plausible network for the computation of orientation dominance

11 0.55988663 44 nips-2010-Brain covariance selection: better individual functional connectivity models using population prior

12 0.55983347 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts

13 0.55949402 140 nips-2010-Layer-wise analysis of deep networks with Gaussian kernels

14 0.55685949 65 nips-2010-Divisive Normalization: Justification and Effectiveness as Efficient Coding Transform

15 0.55540782 131 nips-2010-Joint Analysis of Time-Evolving Binary Matrices and Associated Documents

16 0.55302984 268 nips-2010-The Neural Costs of Optimal Control

17 0.55263102 98 nips-2010-Functional form of motion priors in human motion perception

18 0.55254489 18 nips-2010-A novel family of non-parametric cumulative based divergences for point processes

19 0.55074418 117 nips-2010-Identifying graph-structured activation patterns in networks

20 0.5500952 7 nips-2010-A Family of Penalty Functions for Structured Sparsity