nips nips2009 nips2009-151 knowledge-graph by maker-knowledge-mining

151 nips-2009-Measuring Invariances in Deep Networks


Source: pdf

Author: Ian Goodfellow, Honglak Lee, Quoc V. Le, Andrew Saxe, Andrew Y. Ng

Abstract: For many pattern recognition tasks, the ideal input feature would be invariant to multiple confounding properties (such as illumination and viewing angle, in computer vision applications). Recently, deep architectures trained in an unsupervised manner have been proposed as an automatic method for extracting useful features. However, it is difficult to evaluate the learned features by any means other than using them in a classifier. In this paper, we propose a number of empirical tests that directly measure the degree to which these learned features are invariant to different input transformations. We find that stacked autoencoders learn modestly increasingly invariant features with depth when trained on natural images. We find that convolutional deep belief networks learn substantially more invariant features in each layer. These results further justify the use of “deep” vs. “shallower” representations, but suggest that mechanisms beyond merely stacking one autoencoder on top of another may be important for achieving invariance. Our evaluation metrics can also be used to evaluate future work in deep learning, and thus help the development of future algorithms. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Recently, deep architectures trained in an unsupervised manner have been proposed as an automatic method for extracting useful features. [sent-8, score-0.471]

2 We find that stacked autoencoders learn modestly increasingly invariant features with depth when trained on natural images. [sent-11, score-0.588]

3 We find that convolutional deep belief networks learn substantially more invariant features in each layer. [sent-12, score-0.742]

4 The concept of invariance implies a selectivity for complex, high level features of the input and yet a robustness to irrelevant input transformations. [sent-17, score-0.726]

5 In the case of object recognition, an invariant feature should respond only to one stimulus despite changes in translation, rotation, complex illumination, scale, perspective, and other properties. [sent-19, score-0.358]

6 In this paper, we propose to use a suite of “invariance tests” that directly measure the invariance properties of features; this gives us a measure of the quality of features learned in an unsupervised manner by a deep learning algorithm. [sent-20, score-1.118]

7 Our observations lend credence to the common view that invariances to minor shifts, rotations, and deformations are learned in the lower layers and combined in the higher layers to form progressively more invariant features. [sent-25, score-0.477]

8 In computer vision, one can view object recognition performance as a measure of the invariance of the underlying features. [sent-26, score-0.693]

9 While such an end-to-end system performance measure has many benefits, it can also be expensive to compute and does not give much insight into how to directly improve representations in each layer of deep architectures. [sent-27, score-0.575]

10 The test suite presented in this paper provides an alternative that can identify the robustness of deep architectures to specific types of variations. [sent-29, score-0.578]

11 For example, using videos of natural scenes, our invariance tests measure the degree to which the learned representations are invariant to 2-D (in-plane) rotations, 3-D (out-of-plane) rotations, and translations. [sent-30, score-1.089]

12 Our proposed invariance measure is broadly applicable to evaluating many deep learning algorithms for many tasks, but the present paper will focus on two different algorithms applied to computer vision. [sent-33, score-0.958]

13 First, we examine the invariances of stacked autoencoder networks [2]. [sent-34, score-0.42]

14 We find that when trained under these conditions, stacked autoencoders learn increasingly invariant features with depth, but the effect of depth is small compared to other factors such as regularization. [sent-38, score-0.55]

15 Next, we show that convolutional deep belief networks (CDBNs) [5], which are hand-designed to be invariant to certain local image translations, do enjoy dramatically increasing invariance with depth. [sent-39, score-1.333]

16 This suggests that there is a benefit to using deep architectures, but that mechanisms besides simple stacking of autoencoders are important for gaining increasing invariance. [sent-40, score-0.514]

17 By combining the output of lower layers in higher layers, deep networks can represent progressively more complex features of the input. [sent-43, score-0.582]

18 Hinton et al. introduced the deep belief network, in which each layer consists of a restricted Boltzmann machine [4]. [sent-45, score-0.558]

19 Bengio et al. built a deep network using an autoencoder neural network in each layer [2, 3, 6]. [sent-47, score-0.809]

20 Deep networks have been shown to learn good features for classification tasks even when trained on data that does not include examples of the classes to be recognized [5, 9]. [sent-54, score-0.495]

21 Some work in deep architectures draws inspiration from the biology of sensory systems. [sent-55, score-0.436]

22 Lee et al., for example, compared the response properties of the second layer of a sparse deep belief network to V2, the second stage of the visual hierarchy [11]. [sent-58, score-0.712]

23 One important property of the visual system is a progressive increase in the invariance of neural responses in higher layers. [sent-59, score-0.619]

24 Computational models such as the neocognitron [13], HMAX model [14], and Convolutional Network [15] achieve invariance by alternating layers of feature detectors with local pooling and subsampling of the feature maps. [sent-65, score-0.791]

25 This approach has been used to endow deep networks with some degree of translation invariance [8, 5]. [sent-66, score-1.112]

26 Additionally, while deep architectures provide a task-independent method of learning features, convolutional and max-pooling techniques are somewhat specialized to visual and audio processing. [sent-68, score-0.556]

27 3 Network architecture and optimization: We train all of our networks on natural images collected separately (and in geographically different areas) from the videos used in the invariance tests. [sent-69, score-0.922]

28 Specifically, the training set comprises a set of still images taken in outdoor environments free from artificial objects, and was not designed to relate in any way to the invariance tests. [sent-70, score-0.644]

29 1 Stacked autoencoder: The majority of our tests focus on the stacked autoencoder of Bengio et al. [sent-72, score-0.425]

30 This architecture [2] is a deep network consisting of an autoencoding neural network in each layer. [sent-73, score-0.528]

31 Each max-pooling unit pools a small block (e.g., 2x2) of neighboring convolution units, giving it an overall receptive field size of 11x11 pixels in the first layer and 31x31 pixels in the second layer. [sent-87, score-0.36]

32 Because the convolution units share weights and because their outputs are combined in the max-pooling units, the CDBN is explicitly designed to be invariant to small amounts of image translation. [sent-90, score-0.419]

33 We interpret the hidden units as feature detectors that should respond strongly when the feature they represent is present in the input, and respond weakly when it is absent. [sent-92, score-0.567]

34 For example, a face selective neuron might respond strongly whenever a face is present in the image; if it is invariant, it might continue to respond strongly even as the image rotates. [sent-94, score-0.386]

35 In particular we choose a separate threshold for each hidden unit such that all units fire at the same rate when presented with random stimuli. [sent-97, score-0.412]

36 More formally, a hidden unit i is said to fire when si hi (x) > ti , where ti is a threshold chosen by our test for that hidden unit and si ∈ {−1, 1} gives the sign of that hidden unit’s values. [sent-99, score-0.904]

37 The sign term si is necessary because, in general, hidden units are as likely to use low values as to use high values to indicate the presence of the feature that they detect. [sent-100, score-0.359]

38 We therefore choose si to maximize the invariance score. [sent-101, score-0.63]

39 ) In order for a function τ to be useful with our invariance measure, |γ| should relate to the semantic dissimilarity between x and τ (x, γ). [sent-108, score-0.585]

40 Using these definitions, we can measure the robustness of a hidden unit as follows. [sent-112, score-0.34]

41 The local firing rate is the firing rate of a hidden unit when it is applied to local trajectories surrounding inputs z ∈ Z that maximally activate the hidden unit: L(i) = (1/|Z|) Σ_{z∈Z} (1/|T(z)|) Σ_{x∈T(z)} fi(x). [sent-114, score-0.44]

42 That is, L(i) is the proportion of transformed inputs to which the neuron fires, and hence is a measure of the robustness of the neuron’s response to the transformation τ. [sent-116, score-0.367]

43 Our invariance score for a hidden unit hi is given by S(i) = L(i)/G(i). [sent-117, score-0.953]

44 The numerator is a measure of the hidden unit’s robustness to transformation τ near the unit’s optimal inputs, and the denominator ensures that the neuron is selective and not simply always active. [sent-118, score-0.404]

45 In our tests, we tried to select the threshold ti for each hidden unit so that it fires one percent of the time in response to random inputs, that is, G(i) = 0.01. [sent-119, score-0.342]

46 For hidden units that frequently repeat the same activation value (up to machine precision), it is sometimes not possible to choose ti such that G(i) = 0.01 exactly. [sent-121, score-0.361]
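
As an illustration of the thresholding just described, the sketch below picks a threshold ti for one hidden unit so that its global firing rate G(i) lands near the one-percent target; the function name, the percentile-based rule, and the NumPy usage are assumptions for illustration, not the authors' implementation. In practice both candidate signs si ∈ {−1, 1} would be evaluated and the one yielding the larger invariance score kept, as the text describes.

```python
import numpy as np

def choose_threshold(h_unit_random, sign, target_rate=0.01):
    # h_unit_random: activations h_i(x) of one hidden unit on many random stimuli.
    # Returns t_i such that sign * h_i(x) > t_i on roughly a `target_rate` fraction
    # of random stimuli, i.e. the global firing rate G(i) is pinned near one percent.
    return float(np.percentile(sign * h_unit_random, 100.0 * (1.0 - target_rate)))
```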

47 S(i) gives the invariance score for a single hidden unit. [sent-126, score-0.812]

48 The invariance score Invp (N ) of a network N is given by the mean of S(i) over the top-scoring proportion p of hidden units in the deepest layer of N . [sent-127, score-1.243]

49 We discard the (1 − p) worst hidden units because different subpopulations of units may be invariant to different transformations. [sent-128, score-0.573]

50 Reporting the mean of all unit scores would strongly penalize networks that discover several hidden units that are invariant to transformation τ but do not devote more than proportion p of their hidden units to such a task. [sent-129, score-1.051]
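
Putting the definitions above together, here is a minimal sketch of how the firing indicator fi(x), the rates L(i) and G(i), the unit score S(i) = L(i)/G(i), and the network score Invp(N) could be computed. The array layouts and helper names are assumptions; only the formulas follow the text.

```python
import numpy as np

def fires(h, signs, thresholds):
    # Firing indicator f_i(x): True where s_i * h_i(x) > t_i.
    # h: (n_inputs, n_hidden); signs, thresholds: (n_hidden,)
    return (signs * h) > thresholds

def invariance_scores(h_random, h_trajectories, signs, thresholds):
    # G(i): global firing rate on random stimuli.
    G = fires(h_random, signs, thresholds).mean(axis=0)
    # L(i): average over z in Z of the firing rate along the trajectory T(z)
    # around each maximally-activating input, i.e.
    # L(i) = (1/|Z|) * sum_z (1/|T(z)|) * sum_{x in T(z)} f_i(x).
    L = np.mean([fires(h_T, signs, thresholds).mean(axis=0)
                 for h_T in h_trajectories], axis=0)
    return L / np.maximum(G, 1e-12)   # S(i) = L(i) / G(i)

def network_score(S, p=0.2):
    # Inv_p(N): mean of S(i) over the top-scoring proportion p of hidden units.
    k = max(1, int(round(p * len(S))))
    return float(np.sort(S)[-k:].mean())
```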

51 Finally, note that while we use this metric to measure invariances in the visual features learned by deep networks, it could be applied to virtually any kind of feature in virtually any application domain. [sent-130, score-0.631]

52 5 Grating test: Our first invariance test is based on the response of neurons to synthetic images. [sent-131, score-0.758]

53 To implement our invariance measure, we define P (x) as a distribution over grating images. [sent-135, score-0.82]

54 We measure invariance to translation by defining τ (x, γ) to change φ by γ. [sent-136, score-0.746]

55 We measure invariance to rotation by defining τ (x, γ) to change ω by γ. [sent-137, score-0.773]
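
For concreteness, a small sketch of the grating test setup: a sinusoidal grating parameterized by orientation ω and phase φ, with τ(x, γ) shifting φ for the translation test and ω for the rotation test. The image size, spatial frequency, and function names are assumptions rather than the authors' exact settings.

```python
import numpy as np

def grating(size=24, freq=0.1, omega=0.0, phi=0.0):
    # Sinusoidal grating: orientation omega (radians), phase phi,
    # spatial frequency freq in cycles per pixel.
    ys, xs = np.mgrid[0:size, 0:size]
    u = xs * np.cos(omega) + ys * np.sin(omega)   # coordinate along the grating direction
    return np.sin(2.0 * np.pi * freq * u + phi)

def tau_translation(params, gamma):
    # Translation test: tau(x, gamma) shifts the phase phi by gamma.
    return grating(omega=params["omega"], phi=params["phi"] + gamma, freq=params["freq"])

def tau_rotation(params, gamma):
    # Rotation test: tau(x, gamma) changes the orientation omega by gamma.
    return grating(omega=params["omega"] + gamma, phi=params["phi"], freq=params["freq"])
```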

56 Our second suite of invariance tests uses natural video data. [sent-139, score-0.81]

57 Using this method, we will measure the degree to which various learned features are invariant to a wide range of more complex image parameters. [sent-140, score-0.359]

58 This will allow us to perform quantitative comparisons of representations at each layer of a deep network. [sent-141, score-0.53]

59 We also verify that the results using this technique align closely with those obtained with the grating-based invariance tests. [sent-142, score-0.585]

60 2 Invariance calculation: To implement our invariance measure using natural images, we define P (x) as a uniform distribution over image patches contained in the test videos, and τ (x, γ) to be the image patch at the same image location as x but occurring γ video frames later in time. [sent-152, score-0.966]

61 To measure invariance to different types of transformation, we simply use videos that involve each type of transformation. [sent-157, score-0.787]
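
A minimal sketch of the natural-video version of P(x) and τ(x, γ): draw an image patch uniformly from a test video, then take the patch at the same location γ frames later. The video array layout, patch size, and sampling code are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_patch(video, patch=14):
    # P(x): a patch drawn uniformly over locations and frames of a test video.
    # video: array of shape (n_frames, height, width).
    n_frames, H, W = video.shape
    t = int(rng.integers(0, n_frames))
    r = int(rng.integers(0, H - patch + 1))
    c = int(rng.integers(0, W - patch + 1))
    return (t, r, c), video[t, r:r + patch, c:c + patch]

def tau_video(video, loc, gamma, patch=14):
    # tau(x, gamma): the patch at the same image location, gamma frames later.
    t, r, c = loc
    return video[t + gamma, r:r + patch, c:c + patch]
```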

62 1 Relationship between grating test and natural video test: Sinusoidal gratings are already used as a common reference stimulus. [sent-162, score-0.505]

63 To validate our approach of using natural videos, we show that videos involving translation give similar test results to the phase variation grating test. [sent-163, score-0.596]

64 Fig. 1 plots the invariance score for each of 378 one-layer autoencoders regularized with a range of sparsity and weight decay parameters (shown in Fig. [sent-165, score-1.151]

65 We were not able to find as close a correspondence between the grating orientation test and natural videos involving 2-D (in-plane) rotation. [sent-167, score-0.513]

66 To verify that the problem is not that rotation when viewed far from the image center resembles translation, we compare the invariance test scores for translation and for rotation in Fig. [sent-169, score-1.142]

67 For the translation test, local trajectories T (x) are generated by modifying φ from the optimal value φopt to φ = φopt ± {0, · · · , π} in steps of π/20, where φopt is the optimal grating phase shift. [sent-173, score-0.351]

68 For the rotation test, local trajectories T (x) are generated by modifying θ from the optimal value θopt to θ = θopt ± {0, · · · , π} in steps of π/40, where θopt is the optimal grating orientation. [sent-174, score-0.378]
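
As a sketch of how these local trajectories could be enumerated, the helper below sweeps a parameter around its optimum by ±{0, step, ..., π}; the placeholder optima (0.7 and 1.2) stand in for φopt and θopt and are not values from the paper.

```python
import numpy as np

def local_trajectory(opt_value, step):
    # Parameter values opt +/- {0, step, 2*step, ..., pi}, used to build T(x).
    offsets = np.arange(0.0, np.pi + 1e-9, step)
    return np.concatenate([opt_value + offsets, (opt_value - offsets)[1:]])

# Translation test: sweep the grating phase around phi_opt in steps of pi/20.
phases = local_trajectory(opt_value=0.7, step=np.pi / 20)    # 0.7 stands in for phi_opt
# Rotation test: sweep the grating orientation around theta_opt in steps of pi/40.
thetas = local_trajectory(opt_value=1.2, step=np.pi / 40)    # 1.2 stands in for theta_opt
```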

69 Figure 2: We verify that our translation and 2-D rotation videos do indeed capture different transformations. [sent-176, score-0.416]

70 Figure 3: Our invariance measure selects networks that learn edge detectors resembling Gabor functions as the maximally invariant single-layer networks (axes: log10 weight decay vs. log10 target mean activation). [sent-181, score-0.89]

71 2 Pronounced effect of sparsity and weight decay: We trained several single-layer autoencoders using sparsity regularization with various target mean activations and amounts of weight decay. [sent-187, score-0.462]

72 For these experiments, we averaged the invariance scores of all the hidden units to form the network score (i.e., p = 1). [sent-188, score-1.003]

73 We found that sparsity and weight decay have a large effect on the invariance of a single-layer network. [sent-192, score-0.736]

74 In particular, there is a semicircular ridge, trading off sparsity against weight decay, along which invariance scores are high. [sent-193, score-0.786]

75 3 Modest improvements with depth: To investigate the effect of depth on invariance, we chose to extensively cross-validate several depths of autoencoders using only weight decay. [sent-202, score-0.382]

76 Figure 4 (left to right): weight visualizations from layer 1, layer 2, and layer 3 of the autoencoders; layer 1 and layer 2 of the CDBN. [sent-203, score-0.991]

77 We trained a total of 73 networks with weight decay at each layer set to a value from {10, 1, 10^-1, 10^-2, 10^-3, 10^-5, 0}. [sent-209, score-0.398]

78 For these experiments, we averaged the invariance scores of the top 20% of the hidden units to form the network score (i.e., p = 0.2). [sent-210, score-1.003]

79 We also chose si for each hidden unit to maximize the invariance score, since there was no sparsity regularization to impose a sign on the hidden unit values. [sent-213, score-1.158]

80 After performing this grid search, we trained 100 additional copies of the network with the best mean invariance score at each depth, holding the weight decay parameters constant and varying only the random weights used to initialize training. [sent-214, score-0.893]

81 However, the magnitude of the increase in invariance is limited compared to the increase that can be gained with the correct sparsity and weight decay. [sent-217, score-0.684]

82 2 Convolutional Deep Belief Networks: We also ran our invariance tests on a two-layer CDBN. [sent-219, score-0.829]

83 … the autoencoder for the lower layers, but larger than those in the autoencoder for the higher layers. [sent-235, score-0.36]

84 This is the most complex transformation included in our tests, and it is where depth provides the greatest percentagewise increase. [sent-246, score-1.111]

85 Figure 5: Invariance score of the best network at each layer; lines are aligned so that the worst invariance score appears at the same height in each plot. [sent-252, score-0.812]

86 At the level of a single hidden unit, our firing rate invariance measure requires learned features to balance high local firing rates with low global firing rates. [sent-253, score-0.708]

87 As the field becomes more advanced, another appropriate measure of invariance may be a hidden unit’s invariance to object identity. [sent-268, score-1.385]

88 As an initial step in this direction, we attempted to score hidden units by their mutual information with categories in the Caltech 101 dataset [22]. [sent-269, score-0.372]

89 At the network level, our measure requires networks to have at least some subpopulation of hidden units that are invariant to each type of transformation. [sent-272, score-0.639]

90 This is accomplished by using only the top-scoring proportion p of hidden units when calculating the network score. [sent-273, score-0.394]

91 We also reported extensive findings from applying the invariance tests to computer vision tasks. [sent-277, score-0.635]

92 However, the definition of our metric is sufficiently general that it could easily be used to test, for example, invariance of auditory features to rate of speech, or invariance of textual features to author identity. [sent-278, score-1.268]

93 A surprising finding in our experiments with visual data is that stacked autoencoders yield only modest improvements in invariance as depth increases. [sent-279, score-0.973]

94 This suggests that while depth is valuable, mere stacking of shallow architectures may not be sufficient to exploit the full potential of deep architectures to learn invariant features. [sent-280, score-0.812]

95 We also document that explicit approaches such as max-pooling and weight-sharing in CDBNs are currently successful strategies for achieving invariance. [sent-284, score-0.585]

96 This is not surprising given that invariance is hard-wired into the network, but it confirms that our metric faithfully measures invariances. [sent-285, score-0.585]

97 An empirical evaluation of deep architectures on problems with many factors of variation. [sent-315, score-0.436]

98 Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. [sent-331, score-0.464]

99 Sparse deep belief network model for visual area v2. [sent-371, score-0.498]

100 Learning methods for generic object recognition with invariance to pose and lighting. [sent-436, score-0.648]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('invariance', 0.585), ('deep', 0.328), ('grating', 0.235), ('layer', 0.177), ('videos', 0.157), ('autoencoders', 0.151), ('units', 0.145), ('invariant', 0.143), ('rotation', 0.143), ('hidden', 0.14), ('autoencoder', 0.138), ('invariances', 0.117), ('translation', 0.116), ('architectures', 0.108), ('unit', 0.1), ('depth', 0.09), ('score', 0.087), ('convolutional', 0.086), ('layers', 0.084), ('network', 0.083), ('networks', 0.083), ('video', 0.083), ('stacked', 0.082), ('respond', 0.081), ('neuron', 0.08), ('rotations', 0.078), ('ring', 0.072), ('opt', 0.07), ('tests', 0.067), ('cdbn', 0.06), ('images', 0.059), ('robustness', 0.055), ('image', 0.055), ('belief', 0.053), ('decay', 0.052), ('halle', 0.052), ('weight', 0.051), ('transformation', 0.051), ('convolution', 0.05), ('test', 0.05), ('scores', 0.05), ('features', 0.049), ('gratings', 0.049), ('transformations', 0.049), ('sparsity', 0.048), ('si', 0.045), ('measure', 0.045), ('hi', 0.041), ('lee', 0.04), ('ti', 0.038), ('natural', 0.038), ('bengio', 0.038), ('activation', 0.038), ('complex', 0.038), ('response', 0.037), ('selectivity', 0.037), ('stimulus', 0.037), ('suite', 0.037), ('tanh', 0.037), ('inputs', 0.036), ('neurons', 0.036), ('trained', 0.035), ('larochelle', 0.035), ('stacking', 0.035), ('res', 0.035), ('autoencoding', 0.034), ('saxe', 0.034), ('shallower', 0.034), ('visual', 0.034), ('ranzato', 0.034), ('detectors', 0.034), ('receptive', 0.033), ('selective', 0.033), ('orientation', 0.033), ('recognition', 0.033), ('translations', 0.032), ('illumination', 0.032), ('modest', 0.031), ('cdbns', 0.03), ('neocognitron', 0.03), ('quiroga', 0.03), ('object', 0.03), ('learned', 0.029), ('feature', 0.029), ('lecun', 0.029), ('strongly', 0.028), ('sinusoidal', 0.028), ('whitened', 0.028), ('honglak', 0.028), ('threshold', 0.027), ('andrew', 0.026), ('deformations', 0.026), ('berkes', 0.026), ('proportion', 0.026), ('amounts', 0.026), ('systematically', 0.025), ('representations', 0.025), ('boureau', 0.024), ('caltech', 0.024), ('activate', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000011 151 nips-2009-Measuring Invariances in Deep Networks

Author: Ian Goodfellow, Honglak Lee, Quoc V. Le, Andrew Saxe, Andrew Y. Ng

Abstract: For many pattern recognition tasks, the ideal input feature would be invariant to multiple confounding properties (such as illumination and viewing angle, in computer vision applications). Recently, deep architectures trained in an unsupervised manner have been proposed as an automatic method for extracting useful features. However, it is difficult to evaluate the learned features by any means other than using them in a classifier. In this paper, we propose a number of empirical tests that directly measure the degree to which these learned features are invariant to different input transformations. We find that stacked autoencoders learn modestly increasingly invariant features with depth when trained on natural images. We find that convolutional deep belief networks learn substantially more invariant features in each layer. These results further justify the use of “deep” vs. “shallower” representations, but suggest that mechanisms beyond merely stacking one autoencoder on top of another may be important for achieving invariance. Our evaluation metrics can also be used to evaluate future work in deep learning, and thus help the development of future algorithms. 1

2 0.38848785 176 nips-2009-On Invariance in Hierarchical Models

Author: Jake Bouvrie, Lorenzo Rosasco, Tomaso Poggio

Abstract: A goal of central importance in the study of hierarchical models for object recognition – and indeed the mammalian visual cortex – is that of understanding quantitatively the trade-off between invariance and selectivity, and how invariance and discrimination properties contribute towards providing an improved representation useful for learning from data. In this work we provide a general group-theoretic framework for characterizing and understanding invariance in a family of hierarchical models. We show that by taking an algebraic perspective, one can provide a concise set of conditions which must be met to establish invariance, as well as a constructive prescription for meeting those conditions. Analyses in specific cases of particular relevance to computer vision and text processing are given, yielding insight into how and when invariance can be achieved. We find that the minimal intrinsic properties of a hierarchical model needed to support a particular invariance can be clearly described, thereby encouraging efficient computational implementations. 1

3 0.24295647 2 nips-2009-3D Object Recognition with Deep Belief Nets

Author: Vinod Nair, Geoffrey E. Hinton

Abstract: We introduce a new type of top-level model for Deep Belief Nets and evaluate it on a 3D object recognition task. The top-level model is a third-order Boltzmann machine, trained using a hybrid algorithm that combines both generative and discriminative gradients. Performance is evaluated on the NORB database (normalized-uniform version), which contains stereo-pair images of objects under different lighting conditions and viewpoints. Our model achieves 6.5% error on the test set, which is close to the best published result for NORB (5.9%) using a convolutional neural net that has built-in knowledge of translation invariance. It substantially outperforms shallow models such as SVMs (11.6%). DBNs are especially suited for semi-supervised learning, and to demonstrate this we consider a modified version of the NORB recognition task in which additional unlabeled images are created by applying small translations to the images in the database. With the extra unlabeled data (and the same amount of labeled data as before), our model achieves 5.2% error. 1

4 0.2180918 119 nips-2009-Kernel Methods for Deep Learning

Author: Youngmin Cho, Lawrence K. Saul

Abstract: We introduce a new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that we call multilayer kernel machines (MKMs). We evaluate SVMs and MKMs with these kernel functions on problems designed to illustrate the advantages of deep architectures. On several problems, we obtain better results than previous, leading benchmarks from both SVMs with Gaussian kernels as well as deep belief nets. 1

5 0.20832877 219 nips-2009-Slow, Decorrelated Features for Pretraining Complex Cell-like Networks

Author: Yoshua Bengio, James S. Bergstra

Abstract: We introduce a new type of neural network activation function based on recent physiological rate models for complex cells in visual area V1. A single-hiddenlayer neural network of this kind of model achieves 1.50% error on MNIST. We also introduce an existing criterion for learning slow, decorrelated features as a pretraining strategy for image models. This pretraining strategy results in orientation-selective features, similar to the receptive fields of complex cells. With this pretraining, the same single-hidden-layer model achieves 1.34% error, even though the pretraining sample distribution is very different from the fine-tuning distribution. To implement this pretraining strategy, we derive a fast algorithm for online learning of decorrelated features such that each iteration of the algorithm runs in linear time with respect to the number of features. 1

6 0.19288452 253 nips-2009-Unsupervised feature learning for audio classification using convolutional deep belief networks

7 0.13475204 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

8 0.095347323 137 nips-2009-Learning transport operators for image manifolds

9 0.091532014 164 nips-2009-No evidence for active sparsification in the visual cortex

10 0.088231608 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

11 0.087432429 175 nips-2009-Occlusive Components Analysis

12 0.080663197 241 nips-2009-The 'tree-dependent components' of natural scenes are edge filters

13 0.07929758 177 nips-2009-On Learning Rotations

14 0.076785363 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

15 0.071640983 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing

16 0.069346294 246 nips-2009-Time-Varying Dynamic Bayesian Networks

17 0.066594206 236 nips-2009-Structured output regression for detection with partial truncation

18 0.066212021 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning

19 0.065848812 13 nips-2009-A Neural Implementation of the Kalman Filter

20 0.06576319 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.212), (1, -0.167), (2, -0.0), (3, 0.074), (4, -0.042), (5, 0.114), (6, -0.015), (7, 0.222), (8, 0.023), (9, -0.069), (10, 0.076), (11, 0.003), (12, -0.433), (13, 0.15), (14, 0.241), (15, 0.062), (16, 0.129), (17, -0.168), (18, -0.076), (19, 0.039), (20, -0.152), (21, 0.135), (22, -0.011), (23, -0.024), (24, 0.004), (25, -0.013), (26, -0.044), (27, -0.144), (28, 0.017), (29, -0.002), (30, 0.006), (31, 0.045), (32, 0.009), (33, 0.002), (34, -0.017), (35, 0.042), (36, 0.015), (37, -0.083), (38, -0.009), (39, -0.059), (40, 0.148), (41, -0.036), (42, -0.01), (43, 0.125), (44, 0.094), (45, 0.012), (46, -0.112), (47, -0.05), (48, 0.006), (49, -0.048)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96613413 151 nips-2009-Measuring Invariances in Deep Networks

Author: Ian Goodfellow, Honglak Lee, Quoc V. Le, Andrew Saxe, Andrew Y. Ng

Abstract: For many pattern recognition tasks, the ideal input feature would be invariant to multiple confounding properties (such as illumination and viewing angle, in computer vision applications). Recently, deep architectures trained in an unsupervised manner have been proposed as an automatic method for extracting useful features. However, it is difficult to evaluate the learned features by any means other than using them in a classifier. In this paper, we propose a number of empirical tests that directly measure the degree to which these learned features are invariant to different input transformations. We find that stacked autoencoders learn modestly increasingly invariant features with depth when trained on natural images. We find that convolutional deep belief networks learn substantially more invariant features in each layer. These results further justify the use of “deep” vs. “shallower” representations, but suggest that mechanisms beyond merely stacking one autoencoder on top of another may be important for achieving invariance. Our evaluation metrics can also be used to evaluate future work in deep learning, and thus help the development of future algorithms. 1

2 0.84411496 176 nips-2009-On Invariance in Hierarchical Models

Author: Jake Bouvrie, Lorenzo Rosasco, Tomaso Poggio

Abstract: A goal of central importance in the study of hierarchical models for object recognition – and indeed the mammalian visual cortex – is that of understanding quantitatively the trade-off between invariance and selectivity, and how invariance and discrimination properties contribute towards providing an improved representation useful for learning from data. In this work we provide a general group-theoretic framework for characterizing and understanding invariance in a family of hierarchical models. We show that by taking an algebraic perspective, one can provide a concise set of conditions which must be met to establish invariance, as well as a constructive prescription for meeting those conditions. Analyses in specific cases of particular relevance to computer vision and text processing are given, yielding insight into how and when invariance can be achieved. We find that the minimal intrinsic properties of a hierarchical model needed to support a particular invariance can be clearly described, thereby encouraging efficient computational implementations. 1

3 0.60504812 219 nips-2009-Slow, Decorrelated Features for Pretraining Complex Cell-like Networks

Author: Yoshua Bengio, James S. Bergstra

Abstract: We introduce a new type of neural network activation function based on recent physiological rate models for complex cells in visual area V1. A single-hiddenlayer neural network of this kind of model achieves 1.50% error on MNIST. We also introduce an existing criterion for learning slow, decorrelated features as a pretraining strategy for image models. This pretraining strategy results in orientation-selective features, similar to the receptive fields of complex cells. With this pretraining, the same single-hidden-layer model achieves 1.34% error, even though the pretraining sample distribution is very different from the fine-tuning distribution. To implement this pretraining strategy, we derive a fast algorithm for online learning of decorrelated features such that each iteration of the algorithm runs in linear time with respect to the number of features. 1

4 0.59535956 2 nips-2009-3D Object Recognition with Deep Belief Nets

Author: Vinod Nair, Geoffrey E. Hinton

Abstract: We introduce a new type of top-level model for Deep Belief Nets and evaluate it on a 3D object recognition task. The top-level model is a third-order Boltzmann machine, trained using a hybrid algorithm that combines both generative and discriminative gradients. Performance is evaluated on the NORB database (normalized-uniform version), which contains stereo-pair images of objects under different lighting conditions and viewpoints. Our model achieves 6.5% error on the test set, which is close to the best published result for NORB (5.9%) using a convolutional neural net that has built-in knowledge of translation invariance. It substantially outperforms shallow models such as SVMs (11.6%). DBNs are especially suited for semi-supervised learning, and to demonstrate this we consider a modified version of the NORB recognition task in which additional unlabeled images are created by applying small translations to the images in the database. With the extra unlabeled data (and the same amount of labeled data as before), our model achieves 5.2% error. 1

5 0.53544712 253 nips-2009-Unsupervised feature learning for audio classification using convolutional deep belief networks

Author: Honglak Lee, Peter Pham, Yan Largman, Andrew Y. Ng

Abstract: In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. However, to our knowledge, these deep learning approaches have not been extensively studied for auditory data. In this paper, we apply convolutional deep belief networks to audio data and empirically evaluate them on various audio classification tasks. In the case of speech data, we show that the learned features correspond to phones/phonemes. In addition, our feature representations learned from unlabeled audio data show very good performance for multiple audio classification tasks. We hope that this paper will inspire more research on deep learning approaches applied to a wide range of audio recognition tasks. 1

6 0.50850427 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

7 0.47613388 119 nips-2009-Kernel Methods for Deep Learning

8 0.35837242 137 nips-2009-Learning transport operators for image manifolds

9 0.3362312 56 nips-2009-Conditional Neural Fields

10 0.31051484 164 nips-2009-No evidence for active sparsification in the visual cortex

11 0.30035567 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

12 0.29708567 203 nips-2009-Replacing supervised classification learning by Slow Feature Analysis in spiking neural networks

13 0.28550813 189 nips-2009-Periodic Step Size Adaptation for Single Pass On-line Learning

14 0.28543824 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing

15 0.27345014 175 nips-2009-Occlusive Components Analysis

16 0.25969499 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

17 0.25165787 177 nips-2009-On Learning Rotations

18 0.24385113 13 nips-2009-A Neural Implementation of the Kalman Filter

19 0.24189822 70 nips-2009-Discriminative Network Models of Schizophrenia

20 0.23900838 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(24, 0.014), (25, 0.071), (35, 0.086), (36, 0.079), (39, 0.045), (42, 0.182), (58, 0.073), (62, 0.013), (71, 0.054), (81, 0.029), (86, 0.239), (91, 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91643387 151 nips-2009-Measuring Invariances in Deep Networks

Author: Ian Goodfellow, Honglak Lee, Quoc V. Le, Andrew Saxe, Andrew Y. Ng

Abstract: For many pattern recognition tasks, the ideal input feature would be invariant to multiple confounding properties (such as illumination and viewing angle, in computer vision applications). Recently, deep architectures trained in an unsupervised manner have been proposed as an automatic method for extracting useful features. However, it is difficult to evaluate the learned features by any means other than using them in a classifier. In this paper, we propose a number of empirical tests that directly measure the degree to which these learned features are invariant to different input transformations. We find that stacked autoencoders learn modestly increasingly invariant features with depth when trained on natural images. We find that convolutional deep belief networks learn substantially more invariant features in each layer. These results further justify the use of “deep” vs. “shallower” representations, but suggest that mechanisms beyond merely stacking one autoencoder on top of another may be important for achieving invariance. Our evaluation metrics can also be used to evaluate future work in deep learning, and thus help the development of future algorithms. 1

2 0.83486146 175 nips-2009-Occlusive Components Analysis

Author: Jörg Lücke, Richard Turner, Maneesh Sahani, Marc Henniges

Abstract: We study unsupervised learning in a probabilistic generative model for occlusion. The model uses two types of latent variables: one indicates which objects are present in the image, and the other how they are ordered in depth. This depth order then determines how the positions and appearances of the objects present, specified in the model parameters, combine to form the image. We show that the object parameters can be learnt from an unlabelled set of images in which objects occlude one another. Exact maximum-likelihood learning is intractable. However, we show that tractable approximations to Expectation Maximization (EM) can be found if the training images each contain only a small number of objects on average. In numerical experiments it is shown that these approximations recover the correct set of object parameters. Experiments on a novel version of the bars test using colored bars, and experiments on more realistic data, show that the algorithm performs well in extracting the generating causes. Experiments based on the standard bars benchmark test for object learning show that the algorithm performs well in comparison to other recent component extraction approaches. The model and the learning algorithm thus connect research on occlusion with the research field of multiple-causes component extraction methods. 1

3 0.82770687 190 nips-2009-Polynomial Semantic Indexing

Author: Bing Bai, Jason Weston, David Grangier, Ronan Collobert, Kunihiko Sadamasa, Yanjun Qi, Corinna Cortes, Mehryar Mohri

Abstract: We present a class of nonlinear (polynomial) models that are discriminatively trained to directly map from the word content in a query-document or documentdocument pair to a ranking score. Dealing with polynomial models on word features is computationally challenging. We propose a low-rank (but diagonal preserving) representation of our polynomial models to induce feasible memory and computation requirements. We provide an empirical study on retrieval tasks based on Wikipedia documents, where we obtain state-of-the-art performance while providing realistically scalable methods. 1

4 0.81607378 253 nips-2009-Unsupervised feature learning for audio classification using convolutional deep belief networks

Author: Honglak Lee, Peter Pham, Yan Largman, Andrew Y. Ng

Abstract: In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. However, to our knowledge, these deep learning approaches have not been extensively studied for auditory data. In this paper, we apply convolutional deep belief networks to audio data and empirically evaluate them on various audio classification tasks. In the case of speech data, we show that the learned features correspond to phones/phonemes. In addition, our feature representations learned from unlabeled audio data show very good performance for multiple audio classification tasks. We hope that this paper will inspire more research on deep learning approaches applied to a wide range of audio recognition tasks. 1

5 0.81435061 92 nips-2009-Fast Graph Laplacian Regularized Kernel Learning via Semidefinite–Quadratic–Linear Programming

Author: Xiao-ming Wu, Anthony M. So, Zhenguo Li, Shuo-yen R. Li

Abstract: Kernel learning is a powerful framework for nonlinear data modeling. Using the kernel trick, a number of problems have been formulated as semidefinite programs (SDPs). These include Maximum Variance Unfolding (MVU) (Weinberger et al., 2004) in nonlinear dimensionality reduction, and Pairwise Constraint Propagation (PCP) (Li et al., 2008) in constrained clustering. Although in theory SDPs can be efficiently solved, the high computational complexity incurred in numerically processing the huge linear matrix inequality constraints has rendered the SDP approach unscalable. In this paper, we show that a large class of kernel learning problems can be reformulated as semidefinite-quadratic-linear programs (SQLPs), which only contain a simple positive semidefinite constraint, a second-order cone constraint and a number of linear constraints. These constraints are much easier to process numerically, and the gain in speedup over previous approaches is at least of the order m2.5 , where m is the matrix dimension. Experimental results are also presented to show the superb computational efficiency of our approach.

6 0.81130552 176 nips-2009-On Invariance in Hierarchical Models

7 0.80592698 120 nips-2009-Kernels and learning curves for Gaussian process regression on random graphs

8 0.80385482 6 nips-2009-A Biologically Plausible Model for Rapid Natural Scene Identification

9 0.7815522 119 nips-2009-Kernel Methods for Deep Learning

10 0.7717135 137 nips-2009-Learning transport operators for image manifolds

11 0.76006478 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models

12 0.76005697 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning

13 0.75285226 104 nips-2009-Group Sparse Coding

14 0.74788857 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs

15 0.74518192 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels

16 0.74463707 196 nips-2009-Quantification and the language of thought

17 0.74015683 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections

18 0.73611718 13 nips-2009-A Neural Implementation of the Kalman Filter

19 0.73480463 2 nips-2009-3D Object Recognition with Deep Belief Nets

20 0.73431379 96 nips-2009-Filtering Abstract Senses From Image Search Results