2 nips-2009-3D Object Recognition with Deep Belief Nets


Source: pdf

Author: Vinod Nair, Geoffrey E. Hinton

Abstract: We introduce a new type of top-level model for Deep Belief Nets and evaluate it on a 3D object recognition task. The top-level model is a third-order Boltzmann machine, trained using a hybrid algorithm that combines both generative and discriminative gradients. Performance is evaluated on the NORB database (normalized-uniform version), which contains stereo-pair images of objects under different lighting conditions and viewpoints. Our model achieves 6.5% error on the test set, which is close to the best published result for NORB (5.9%) using a convolutional neural net that has built-in knowledge of translation invariance. It substantially outperforms shallow models such as SVMs (11.6%). DBNs are especially suited for semi-supervised learning, and to demonstrate this we consider a modified version of the NORB recognition task in which additional unlabeled images are created by applying small translations to the images in the database. With the extra unlabeled data (and the same amount of labeled data as before), our model achieves 5.2% error.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 The top-level model is a third-order Boltzmann machine, trained using a hybrid algorithm that combines both generative and discriminative gradients. [sent-5, score-0.501]

2 DBNs are especially suited for semi-supervised learning, and to demonstrate this we consider a modified version of the NORB recognition task in which additional unlabeled images are created by applying small translations to the images in the database. [sent-12, score-0.388]

3 With the extra unlabeled data (and the same amount of labeled data as before), our model achieves 5.2% error. [sent-13, score-0.304]

4 1 Introduction Recent work on deep belief nets (DBNs) [10], [13] has shown that it is possible to learn multiple layers of non-linear features that are useful for object classification without requiring labeled data. [sent-15, score-0.628]

5 The features are trained one layer at a time as a restricted Boltzmann machine (RBM) using contrastive divergence (CD) [4], or as some form of autoencoder [20], [16], and the feature activations learned by one module become the data for training the next module. [sent-16, score-0.55]

6 After a pre-training phase that learns layers of features which are good at modeling the statistical structure in a set of unlabeled images, supervised backpropagation can be used to fine-tune the features for classification [7]. [sent-17, score-0.287]

7 Alternatively, classification can be performed by learning a top layer of features that models the joint density of the class labels and the highest layer of unsupervised features [6]. [sent-18, score-0.444]

8 These unsupervised features (plus the class labels) then become the penultimate layer of the deep belief net [6]. [sent-19, score-0.674]

9 Early work on deep belief nets was evaluated using the MNIST dataset of handwritten digits [6] which has the advantage that a few million parameters are adequate for modeling most of the structure in the domain. [sent-20, score-0.384]

10 For 3D object classification, however, many more parameters are probably required to allow a deep belief net with no prior knowledge of spatial structure to capture all of the variations caused by lighting and viewpoint. [sent-21, score-0.458]

11 It is not yet clear how well deep belief nets perform at 3D object classification when compared with shallow techniques such as SVM’s [19], [3] or deep discriminative techniques like convolutional neural networks [11]. [sent-22, score-0.993]

12 In this paper, we describe a better type of top-level model for deep belief nets that is trained using a combination of generative and discriminative gradients [5], [8], [9]. [sent-23, score-0.748]

13 (a) Every clique in the model contains a visible unit, hidden unit, and label unit. [sent-25, score-0.407]

14 There is one clique for every possible triplet of units created by selecting one of each type. [sent-28, score-0.275]

15 The “restricted” architecture precludes cliques with multiple units of the same type. [sent-29, score-0.313]

16 Our model significantly outperforms SVM’s, and it also outperforms convolutional neural nets when given additional unlabeled data produced by small translations of the training images. [sent-34, score-0.486]

17 We use restricted Boltzmann machines trained with one-step contrastive divergence as our basic module for learning layers of features. [sent-35, score-0.36]

18 2 A Third-Order RBM as the Top-Level Model Until now, the only top-level model that has been considered for a DBN is an RBM with two types of observed units (one for the label, another for the penultimate feature vector). [sent-37, score-0.372]

19 We now consider an alternative model for the top-level joint distribution in which the class label multiplicatively interacts with both the penultimate layer units and the hidden units to determine the energy of a full configuration. [sent-38, score-1.126]

20 It is a Boltzmann machine with three-way cliques [17], each containing a penultimate layer unit $v_i$, a hidden unit $h_j$, and a label unit $l_k$. [sent-39, score-1.005]

21 The model can be defined in terms of its energy function $E(v, h, l) = -\sum_{i,j,k} W_{ijk} v_i h_j l_k$ (1), where $W_{ijk}$ is a learnable scalar parameter. [sent-43, score-0.47]
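
A minimal NumPy sketch of equation (1) may help make the three-way interaction concrete. The tensor layout `(Nv, Nh, Nl)`, the toy sizes, and the function name `energy` are assumptions made for illustration, and the model is taken to have no bias terms, as in the equation above.

```python
import numpy as np

def energy(W, v, h, l):
    """E(v, h, l) = -sum_{i,j,k} W[i, j, k] * v[i] * h[j] * l[k]."""
    return -np.einsum("ijk,i,j,k->", W, v, h, l)

rng = np.random.default_rng(0)
Nv, Nh, Nl = 6, 4, 3                      # toy sizes, chosen only for the example
W = rng.normal(scale=0.1, size=(Nv, Nh, Nl))
v = rng.integers(0, 2, size=Nv).astype(float)
h = rng.integers(0, 2, size=Nh).astype(float)
l = np.eye(Nl)[1]                         # 1-of-Nl label with unit k = 1 switched on

e_full = energy(W, v, h, l)
e_slice = -(v @ W[:, :, 1] @ h)           # energy of the RBM given by slice k = 1
```

With a one-hot label the full energy equals the energy of the selected slice, which is the sense in which the label picks out one RBM from the tensor.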

22 The main difference between the new top-level model and the earlier one is that now the class label multiplicatively modulates how the visible and hidden units contribute to the energy of a full configuration. [sent-47, score-0.779]

23 If the label’s k-th unit is 1 (and the rest are 0), then the k-th slice of the tensor determines the energy function. [sent-48, score-0.424]

24 more than one label has non-zero probability), a weighted blend of the tensor’s slices specifies the energy function. [sent-51, score-0.202]

25 The earlier top-level (RBM) model limits the label’s effect to changing the biases into the hidden units, which modifies only how the hidden units contribute to the energy of a full configuration. [sent-52, score-0.668]

26 There is no direct interaction between the label and the visible units. [sent-53, score-0.203]

27 Unlike an RBM, the model structure is not bipartite, but it is still “restricted” in the sense that there are no direct connections between two units of the same type. [sent-56, score-0.273]

28 Once l is observed, the model reduces to an RBM whose parameters are the k-th slice of the 3D parameter tensor. [sent-61, score-0.171]

29 For a restricted third-order model with $N_v$ visible units, $N_h$ hidden units and $N_l$ class labels, the distribution $P(l \mid v)$ can be exactly computed in $O(N_v N_h N_l)$ time. [sent-63, score-0.54]

30 This result follows from two observations: 1) setting $l_k = 1$ reduces the model to an RBM defined by the k-th slice of the tensor, and 2) the negative log probability of v, up to an additive constant, under this RBM is the free energy: $F_k(v) = -\sum_{j=1}^{N_h} \log\left(1 + \exp\left(\sum_{i=1}^{N_v} W_{ijk} v_i\right)\right)$. [sent-64, score-0.466]
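
The two observations above suggest computing the label posterior as a softmax over negative free energies, one per slice. The following is a sketch under stated assumptions: binary hidden units, no bias terms, and a tensor stored as `(Nv, Nh, Nl)`. The function names are hypothetical, and the brute-force routine exists only to check the fast path on a tiny model.

```python
import itertools
import numpy as np

def free_energies(W, v):
    """Free energy of v under each slice RBM; W has assumed shape (Nv, Nh, Nl)."""
    x = np.einsum("ijk,i->jk", W, v)          # total input to hidden unit j under label k
    return -np.logaddexp(0.0, x).sum(axis=0)  # -sum_j log(1 + exp(x_jk)), shape (Nl,)

def label_posterior(W, v):
    """P(l_k = 1 | v) as a softmax over negative free energies, O(Nv * Nh * Nl)."""
    neg_f = -free_energies(W, v)
    neg_f = neg_f - neg_f.max()               # shift for numerical stability
    p = np.exp(neg_f)
    return p / p.sum()

def brute_force_posterior(W, v):
    """Same quantity by summing exp(-E) over all 2^Nh hidden states (tiny models only)."""
    Nv, Nh, Nl = W.shape
    totals = np.zeros(Nl)
    for bits in itertools.product([0.0, 1.0], repeat=Nh):
        h = np.array(bits)
        totals += np.exp(np.einsum("ijk,i,j->k", W, v, h))  # exp(-E) for each label
    return totals / totals.sum()

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(6, 4, 3))
v = rng.integers(0, 2, size=6).astype(float)
p_fast = label_posterior(W, v)
p_slow = brute_force_posterior(W, v)
```

The fast path costs one tensor contraction, while the brute-force check grows exponentially in the number of hidden units, which is why the closed-form free energy matters in practice.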

31 Simply switch the role of v and h in equation 3 to compute the free energy of h under the k-th RBM. [sent-68, score-0.172]

32 Then the model reduces to its k-th slice RBM, from which $v \sim P(v \mid h, \tilde{l}_k = 1)$ can be easily sampled. [sent-73, score-0.332]

33 When trained as the top-level model of a DBN, the visible vector v is a penultimate layer feature vector. [sent-80, score-0.454]

34 We can also train the model directly on images as a shallow model, in which case v is an image (in row vector form). [sent-81, score-0.227]

35 In both cases the label l represents the Nl object categories using 1-of-Nl encoding. [sent-82, score-0.194]

36 Contrastive divergence uses the parameter updates given by three half-steps of this chain, with the chain initialized from a training case (rather than a random state). [sent-86, score-0.199]

37 Given a labeled training pair $\{v^+, l_k^+ = 1\}$, sample $h^+ \sim P(h \mid v^+, l_k^+ = 1)$. [sent-90, score-0.604]
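
This sampling step can be sketched as follows, assuming binary logistic hidden units, no bias terms, and a tensor stored as `(Nv, Nh, Nl)`. None of these details are confirmed by the excerpt, and the helper names are hypothetical. With the label clamped, the hidden units are conditionally independent given v, so each one is sampled from its own Bernoulli distribution.

```python
import numpy as np

def hidden_probs(W, v, k):
    """P(h_j = 1 | v, l_k = 1) under the k-th slice, assuming logistic hidden units."""
    x = v @ W[:, :, k]                        # total input to each hidden unit
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(W, v, k, rng):
    """Draw a binary hidden vector, one independent Bernoulli per unit."""
    p = hidden_probs(W, v, k)
    return (rng.random(p.shape) < p).astype(float)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(6, 4, 3))     # toy tensor, sizes are illustrative
v_pos = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0])
k = 2                                         # index of the true label
h_pos = sample_hidden(W, v_pos, k, rng)

# Outer product of data and sampled hiddens: the kind of positive-phase
# statistic CD-style updates usually accumulate for slice k (an assumption here).
pos_stats = np.outer(v_pos, h_pos)
```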

38 Let $W_{\cdot,\cdot,k}$ denote the $N_h \times N_v$ matrix of parameters corresponding to the k-th slice along the label dimension of the 3D tensor. [sent-100, score-0.254]

39 Typically, the updates computed from a “mini-batch” of training cases (a small subset of the entire training set) are averaged together into one update and then applied to the parameters. [sent-102, score-0.228]

40 In particular, the label l− generated in step 3 above is unlikely to be different from the true label l+ of the training case used in step 1. [sent-104, score-0.314]

41 Empirically, the chain has a tendency to stay “stuck” on the same state for the label variable because in the positive phase the hidden activities are inferred with the label clamped to its true value. [sent-105, score-0.471]

42 So the hidden activities contain information about the true label, which gives it an advantage over the other labels. [sent-106, score-0.178]

43 Consider the extreme case where we initialize the Markov chain with a training pair $\{v^+, l_k^+ = 1\}$ and the label variable never changes from its initial state during the chain’s entire run. [sent-107, score-0.479]

44 In effect, the model that ends up being learned is a class-conditional generative distribution $P(v \mid l_k = 1)$, represented by the k-th slice RBM. [sent-108, score-0.262]

45 The parameter updates are identical to those for training Nl independent RBMs, one per class, with only the training cases of each class being used to learn the RBM for that class. [sent-109, score-0.186]

46 Note that this is very different from the model in section 2: here the energy functions implemented by the class-conditional RBMs are learned independently and their energy units are not commensurate with each other. [sent-110, score-0.437]

47 Also, as the results show, learning a purely discriminative model at the top level of a DBN gives much worse performance. [sent-115, score-0.189]

48 As explained below, this approach 1) avoids the slow mixing of the CD learning for P (v, l), and 2) allows learning with both labeled and unlabeled data. [sent-117, score-0.215]

49 In our experiments, a model trained with this hybrid learning algorithm has the highest classification accuracy, beating both a generative model trained using CD as well as a purely discriminative model. [sent-119, score-0.621]

50 Hybrid learning algorithm for $P(v, l)$: Let $\{v^+, l_k^+ = 1\}$ be a labeled training case. [sent-121, score-0.372]

51 Using the result from step 1 and the true label $l_k = 1$, compute the update $\Delta W^{d}_{\cdot,\cdot,c} = \partial \log P(l \mid v) / \partial W_{\cdot,\cdot,c}$ for $c \in \{1, .$ [sent-140, score-0.394]

52 The two types of update for the c-th slice of the tensor $W_{\cdot,\cdot,c}$ are then combined by a weighted sum: $W_{\cdot,\cdot,c} \leftarrow W_{\cdot,\cdot,c} + \eta\,(\Delta W^{g}_{\cdot,\cdot,c} + \lambda\,\Delta W^{d}_{\cdot,\cdot,c})$ (7), where λ is a parameter that sets the relative weighting of the generative and discriminative updates, and η is the learning rate. [sent-144, score-0.423]
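
A hedged sketch of the weighted combination in equation (7) follows. The discriminative gradient is derived here from the softmax-over-free-energies form of P(l|v), assuming binary hidden units, no biases, and a `(Nv, Nh, Nl)` tensor layout. The generative update is passed in as an argument because its computation is not shown in this excerpt. All function names are hypothetical.

```python
import numpy as np

def label_posterior(W, v):
    """P(l | v) as a softmax over negative free energies of the slice RBMs."""
    x = np.einsum("ijk,i->jk", W, v)
    neg_f = np.logaddexp(0.0, x).sum(axis=0)
    neg_f = neg_f - neg_f.max()
    p = np.exp(neg_f)
    return p / p.sum()

def log_label_prob(W, v, k):
    """log P(l_k = 1 | v), used below only to check the gradient numerically."""
    return np.log(label_posterior(W, v)[k])

def discriminative_grad(W, v, k):
    """d log P(l_k = 1 | v) / dW[i, j, c] = v_i * sigmoid(x_jc) * (1[c == k] - P(l_c | v))."""
    x = np.einsum("ijk,i->jk", W, v)
    s = 1.0 / (1.0 + np.exp(-x))              # shape (Nh, Nl)
    target = np.zeros(W.shape[2])
    target[k] = 1.0
    return np.einsum("i,jc,c->ijc", v, s, target - label_posterior(W, v))

def hybrid_update(W, dW_gen, dW_disc, lr=0.01, lam=1.0):
    """Weighted sum as in equation (7): W + eta * (dW_gen + lambda * dW_disc)."""
    return W + lr * (dW_gen + lam * dW_disc)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.3, size=(5, 4, 3))
v = np.array([1.0, 1.0, 0.0, 1.0, 0.0])
k = 1
g_disc = discriminative_grad(W, v, k)
W_new = hybrid_update(W, np.zeros_like(W), g_disc, lr=0.1, lam=2.0)
```

Setting the generative term to zero, as in the usage line, reduces the step to purely discriminative learning, which is one end of the trade-off that λ controls.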

53 The earlier problem of slow mixing does not appear in the hybrid algorithm because the chain in the generative part does not involve sampling the label. [sent-148, score-0.334]

54 Semi-supervised learning: The hybrid learning algorithm can also make use of unlabeled training cases by treating their labels as missing inputs. [sent-149, score-0.417]

55 The model first infers the missing label by sampling P (l|vu ) for an unlabeled training case vu . [sent-150, score-0.414]

56 The generative update is then computed by treating the inferred label as the true label. [sent-151, score-0.283]

57 Therefore the unlabeled training cases contribute an extra generative term to the parameter update. [sent-153, score-0.397]
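
The label-inference step for an unlabeled case can be sketched as below: compute P(l | v_u), draw one label from it, and hand the one-hot result to the generative update as if it were the true label. This assumes binary hidden units, no biases, and a `(Nv, Nh, Nl)` tensor layout. `infer_label` is a hypothetical helper name.

```python
import numpy as np

def label_posterior(W, v):
    """P(l | v) as a softmax over negative free energies of the slice RBMs."""
    x = np.einsum("ijk,i->jk", W, v)
    neg_f = np.logaddexp(0.0, x).sum(axis=0)
    neg_f = neg_f - neg_f.max()
    p = np.exp(neg_f)
    return p / p.sum()

def infer_label(W, v_u, rng):
    """Sample a label for an unlabeled case and return it as (index, one-hot vector)."""
    p = label_posterior(W, v_u)
    k = int(rng.choice(len(p), p=p))
    return k, np.eye(len(p))[k]

rng = np.random.default_rng(0)
W = np.zeros((5, 4, 3))
W[:, :, 2] = 5.0                              # make label 2 overwhelmingly likely
v_u = np.ones(5)
k_hat, l_hat = infer_label(W, v_u, rng)       # l_hat then plays the role of the true label
```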

58 For logistic units this has a simple derivative of p−q with respect to the total input to a unit. [sent-157, score-0.236]

59 One added advantage of this sparseness penalty is that it revives any hidden units whose average activities are much lower than p. [sent-168, score-0.449]

60 So at test time a trained model has to recognize unseen instances of the same object classes. [sent-176, score-0.194]

61 2 Training Details Model architecture: The two main decisions to make when training DBNs are the number of hidden layers to greedily pre-train and the number of hidden units to use in each layer. [sent-185, score-0.774]

62 To simplify the experiments we constrain the number of hidden units to be the same at all layers (including the top-level model). [sent-186, score-0.434]

63 We have tried hidden layer sizes of 2000, 4000, and 8000 units. [sent-187, score-0.315]

64 We have also tried models with two, one, or no greedily pre-trained hidden layers. [sent-188, score-0.301]

65 The best classification results are given by the DBN with one greedily pre-trained sparse hidden layer of 4000 units (regardless of the type of top-level model). [sent-190, score-0.654]

66 A DBN trained on the pre-processed input with one greedily pre-trained layer of 4000 hidden units and a third-order model on top of it, also with 4000 hidden units, has roughly 116 million learnable parameters in total. [sent-191, score-0.96]
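
As a rough sanity check on the parameter count, the arithmetic below splits the total between the third-order tensor and the first layer. It assumes five label units (NORB has five object categories, which this excerpt does not state explicitly) and ignores bias terms. The implied input size is a back-of-the-envelope deduction, not a figure quoted from the source.

```python
n_hidden = 4000                 # units in the pre-trained layer and in the top-level model
n_labels = 5                    # assumption: five NORB object categories
total_reported = 116_000_000    # "roughly 116 million" from the text

# Third-order tensor: one weight per (penultimate unit, hidden unit, label) triplet.
top_level_params = n_hidden * n_hidden * n_labels

# Whatever remains is attributed to the first layer (input -> 4000 hidden units).
first_layer_params = total_reported - top_level_params
implied_input_size = first_layer_params / n_hidden

print(top_level_params, first_layer_params, implied_input_size)
```

Under these assumptions the tensor alone accounts for 80 million weights, so most of the model's capacity sits in the top-level model rather than in the pre-trained layer.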

67 [15] that uses GPUs to train a deep model with roughly the same number of parameters much more quickly. [sent-195, score-0.252]

68 We put Gaussian units at the lowest (pixel) layer of the DBN, which have been shown to be effective for modelling grayscale images [7]. [sent-196, score-0.497]

69 6 Results The results are presented in three parts: part 1 compares deep models to shallow ones, all trained using CD. [sent-198, score-0.407]

70 Part 2 compares CD to the hybrid learning algorithm for training the top-level model of a DBN. [sent-199, score-0.275]

71 Part 3 compares DBNs trained with and without unlabeled data, using either CD or the hybrid algorithm at the top level. [sent-200, score-0.422]

72 Shallow Models Trained with CD We consider here DBNs with one greedily pre-trained layer and a top-level model that contains the greedily pretrained features as its “visible” layer. [sent-210, score-0.499]

73 The corresponding shallow version trains the top-level model directly on the pixels (using Gaussian visible units), with no pre-trained layers in between. [sent-211, score-0.299]

74 The test error rates for these four models (see table 1) show that one greedily pre-trained layer reduces the error substantially, even without any subsequent fine-tuning of the pre-trained layer. [sent-213, score-0.384]

75 6% Table 1: NORB test set error rates for deep and shallow models trained using CD with two types of top-level models. [sent-218, score-0.47]

76 The third-order RBM outperforms the standard RBM top-level model when they both have the same number of hidden units, but a better comparison might be to match the number of parameters by increasing the hidden layer size of the standard RBM model by five times (i. [sent-219, score-0.482]

77 We have tried training such an RBM, but the error rate is worse than the RBM with 4000 hidden units. [sent-222, score-0.268]

78 All these DBNs share the same greedily pre-trained first layer – only the top-level model differs among them. [sent-227, score-0.327]

79 Learning algorithm CD Hybrid RBM with label unit 11. [sent-228, score-0.172]

80 5% Table 2: NORB test set error rates for top-level models trained using CD and the hybrid learning algorithms. [sent-232, score-0.31]

81 The lower error rates of hybrid learning are partly due to its ability to avoid the poor mixing of the label variable when CD is used to learn the joint density P (v, l) and partly due to its greater emphasis on discrimination (but with strong regularization provided by also learning P (v|l)). [sent-233, score-0.399]

82 Supervised Learning In this final part, we create additional images from the original NORB training set by applying global translations of 2, 4, and 6 pixels in eight directions (two horizontal, two vertical and four diagonal directions) to the original stereo-pair images. [sent-236, score-0.197]
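
The jittering scheme can be sketched as follows. Three magnitudes in eight directions give 24 shifted copies per image, which together with the original makes a 25-fold set. The sketch assumes a diagonal shift of m pixels means m pixels along both axes and that vacated pixels are filled with a constant. Neither detail is specified in the excerpt, and the function names are hypothetical.

```python
import numpy as np

def jitter_offsets(magnitudes=(2, 4, 6)):
    """All (dy, dx) shifts: each magnitude in the eight compass directions."""
    directions = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    return [(m * dy, m * dx) for m in magnitudes for dy, dx in directions]

def translate(img, dy, dx, fill=0.0):
    """Shift a 2D image by (dy, dx); uncovered pixels get `fill` (an assumed choice)."""
    out = np.full_like(img, fill)
    H, W = img.shape
    out[max(dy, 0):H - max(-dy, 0), max(dx, 0):W - max(-dx, 0)] = \
        img[max(-dy, 0):H - max(dy, 0), max(-dx, 0):W - max(dx, 0)]
    return out

offsets = jitter_offsets()
img = np.zeros((8, 8))
img[3, 3] = 1.0
shifted = translate(img, 2, -2)               # down by 2 rows, left by 2 columns
jittered = [translate(img, dy, dx) for dy, dx in offsets]
```

For a stereo pair, the same offset would presumably be applied to both images so that the global translation is consistent across the pair.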

83 These “jittered” images are treated as extra unlabeled training cases that are combined with the original labeled cases to form a much larger training set. [sent-237, score-0.496]

84 Note that we could have assigned the jittered images the same class label as their source images. [sent-238, score-0.32]

85 By treating them as unlabeled, the goal is to test whether improving the unsupervised, generative part of the learning alone can improve discriminative performance. [sent-239, score-0.247]

86 Use it for greedy pre-training of the lower layers only, and then train the top-level model as before, with only labeled data and the hybrid algorithm. [sent-241, score-0.378]

87 Use it for learning the top-level model as well, now with the semi-supervised variant of the hybrid algorithm at the top-level. [sent-244, score-0.201]

88 Top-level model (hybrid learning only): RBM with label unit; Third-order model. Unlabeled jitter for pre-training lower layer? [sent-246, score-0.28]

89 2% Table 3: NORB test set error rates for DBNs trained with and without unlabeled data, and using the hybrid learning algorithm at the top-level. [sent-253, score-0.459]

90 The key conclusion from table 3 is that simply using more unlabeled training data in the unsupervised, greedy pre-training phase alone can significantly improve the classification accuracy of the DBN. [sent-254, score-0.264]

91 Using more unlabeled data also at the top level further improves accuracy, but only slightly, to 5. [sent-258, score-0.175]

92 These models use the same greedily pre-trained lower layer, learned with unlabeled jitter. [sent-263, score-0.287]

93 They differ in how the top-level parameters are initialized, and whether they use the jittered images as extra labeled cases for learning P (l|v). [sent-264, score-0.318]

94 We compare training the discriminative top-level model “from scratch” (random initialization) versus initializing its parameters to those of a generative model learned by the hybrid algorithm [Table 4 column headers: Initialization of top-level parameters; Use jittered images as labeled?; Error] [sent-265, score-0.729]

95 …mentioned before, it is possible to assign the jittered images the same labels as the original NORB [Table 4 cell residue: “1%”, “Model with”, “5.”] [sent-270, score-0.2]

96 images they are generated from, which expands the labeled training set by 25 times. [Table 4 cell residue: “0%”, “from table 3”] [sent-272, score-0.221]

97 The bottom two rows of table 4 compare a discriminative third-order model initialized with and without pre-training. [Table 4: NORB test set error rates for discriminative third-order models at the top] [sent-273, score-0.252]

98 But note that discriminative training only makes a small additional improvement (5. [sent-278, score-0.2]

99 The main two points are: 1) Unsupervised, greedy, generative learning can extract an image representation that supports more accurate object recognition than the raw pixel representation. [sent-282, score-0.2]

100 An empirical evaluation of deep architectures on problems with many factors of variation. [sent-343, score-0.215]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('rbm', 0.352), ('norb', 0.272), ('units', 0.236), ('lk', 0.232), ('cd', 0.225), ('deep', 0.215), ('nl', 0.177), ('hybrid', 0.164), ('nh', 0.158), ('layer', 0.152), ('unlabeled', 0.149), ('dbns', 0.145), ('greedily', 0.138), ('boltzmann', 0.129), ('hidden', 0.128), ('discriminative', 0.126), ('nv', 0.123), ('label', 0.12), ('jittered', 0.119), ('erent', 0.113), ('shallow', 0.109), ('di', 0.107), ('nets', 0.099), ('penultimate', 0.099), ('dbn', 0.096), ('tensor', 0.093), ('generative', 0.091), ('contrastive', 0.088), ('convolutional', 0.085), ('yes', 0.085), ('visible', 0.083), ('trained', 0.083), ('energy', 0.082), ('images', 0.081), ('dk', 0.079), ('training', 0.074), ('object', 0.074), ('slice', 0.071), ('layers', 0.07), ('belief', 0.07), ('labeled', 0.066), ('th', 0.063), ('net', 0.058), ('restricted', 0.056), ('wijk', 0.054), ('chain', 0.053), ('unit', 0.052), ('extra', 0.052), ('hj', 0.051), ('rbms', 0.051), ('activities', 0.05), ('fk', 0.048), ('outer', 0.046), ('architecture', 0.046), ('larochelle', 0.046), ('unsupervised', 0.046), ('qcurrent', 0.045), ('translations', 0.042), ('update', 0.042), ('ring', 0.042), ('lighting', 0.041), ('greedy', 0.041), ('nair', 0.04), ('clique', 0.039), ('toronto', 0.039), ('updates', 0.038), ('bipartite', 0.038), ('model', 0.037), ('foveal', 0.036), ('multiplicatively', 0.036), ('vi', 0.036), ('penalty', 0.035), ('tried', 0.035), ('recognition', 0.035), ('jitter', 0.034), ('softmax', 0.034), ('vu', 0.034), ('features', 0.034), ('divergence', 0.034), ('bengio', 0.033), ('learnable', 0.032), ('rates', 0.032), ('contribute', 0.031), ('hinton', 0.031), ('published', 0.031), ('error', 0.031), ('cliques', 0.031), ('energies', 0.031), ('treating', 0.03), ('module', 0.029), ('raina', 0.029), ('icml', 0.029), ('grayscale', 0.028), ('classi', 0.028), ('blocks', 0.027), ('gradients', 0.027), ('free', 0.027), ('partly', 0.026), ('earlier', 0.026), ('top', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 2 nips-2009-3D Object Recognition with Deep Belief Nets

2 0.24295647 151 nips-2009-Measuring Invariances in Deep Networks

Author: Ian Goodfellow, Honglak Lee, Quoc V. Le, Andrew Saxe, Andrew Y. Ng

Abstract: For many pattern recognition tasks, the ideal input feature would be invariant to multiple confounding properties (such as illumination and viewing angle, in computer vision applications). Recently, deep architectures trained in an unsupervised manner have been proposed as an automatic method for extracting useful features. However, it is difficult to evaluate the learned features by any means other than using them in a classifier. In this paper, we propose a number of empirical tests that directly measure the degree to which these learned features are invariant to different input transformations. We find that stacked autoencoders learn modestly increasingly invariant features with depth when trained on natural images. We find that convolutional deep belief networks learn substantially more invariant features in each layer. These results further justify the use of “deep” vs. “shallower” representations, but suggest that mechanisms beyond merely stacking one autoencoder on top of another may be important for achieving invariance. Our evaluation metrics can also be used to evaluate future work in deep learning, and thus help the development of future algorithms.

3 0.18801914 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

Author: Ruslan Salakhutdinov

Abstract: Markov random fields (MRF’s), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRF’s is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of the Robbins-Monro type that use Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely-connected MRF’s. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data that perform well on digit and object recognition tasks.

4 0.18113963 119 nips-2009-Kernel Methods for Deep Learning

Author: Youngmin Cho, Lawrence K. Saul

Abstract: We introduce a new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that we call multilayer kernel machines (MKMs). We evaluate SVMs and MKMs with these kernel functions on problems designed to illustrate the advantages of deep architectures. On several problems, we obtain better results than previous, leading benchmarks from both SVMs with Gaussian kernels as well as deep belief nets.

5 0.17611705 253 nips-2009-Unsupervised feature learning for audio classification using convolutional deep belief networks

Author: Honglak Lee, Peter Pham, Yan Largman, Andrew Y. Ng

Abstract: In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. However, to our knowledge, these deep learning approaches have not been extensively studied for auditory data. In this paper, we apply convolutional deep belief networks to audio data and empirically evaluate them on various audio classification tasks. In the case of speech data, we show that the learned features correspond to phones/phonemes. In addition, our feature representations learned from unlabeled audio data show very good performance for multiple audio classification tasks. We hope that this paper will inspire more research on deep learning approaches applied to a wide range of audio recognition tasks.

6 0.15564391 219 nips-2009-Slow, Decorrelated Features for Pretraining Complex Cell-like Networks

7 0.14398326 204 nips-2009-Replicated Softmax: an Undirected Topic Model

8 0.13432167 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

9 0.12624422 97 nips-2009-Free energy score space

10 0.10923322 213 nips-2009-Semi-supervised Learning using Sparse Eigenfunction Bases

11 0.10455531 133 nips-2009-Learning models of object structure

12 0.08816272 201 nips-2009-Region-based Segmentation and Object Detection

13 0.087276176 15 nips-2009-A Rate Distortion Approach for Semi-Supervised Conditional Random Fields

14 0.081510313 229 nips-2009-Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data

15 0.07731992 157 nips-2009-Multi-Label Prediction via Compressed Sensing

16 0.072348677 175 nips-2009-Occlusive Components Analysis

17 0.069884837 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections

18 0.069210805 135 nips-2009-Learning to Hash with Binary Reconstructive Embeddings

19 0.068930723 57 nips-2009-Conditional Random Fields with High-Order Features for Sequence Labeling

20 0.066862769 26 nips-2009-Adaptive Regularization for Transductive Support Vector Machine


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.23), (1, -0.101), (2, -0.1), (3, 0.01), (4, -0.05), (5, 0.049), (6, -0.022), (7, 0.193), (8, -0.078), (9, 0.03), (10, 0.027), (11, 0.106), (12, -0.361), (13, 0.014), (14, 0.182), (15, 0.049), (16, 0.089), (17, -0.162), (18, -0.018), (19, -0.057), (20, -0.093), (21, 0.051), (22, -0.006), (23, 0.051), (24, 0.025), (25, -0.145), (26, 0.006), (27, 0.022), (28, -0.014), (29, 0.024), (30, -0.045), (31, 0.079), (32, 0.01), (33, -0.017), (34, 0.016), (35, 0.048), (36, -0.044), (37, 0.04), (38, 0.02), (39, -0.036), (40, -0.092), (41, -0.014), (42, -0.032), (43, -0.091), (44, -0.028), (45, -0.054), (46, 0.091), (47, 0.027), (48, 0.113), (49, 0.062)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94234633 2 nips-2009-3D Object Recognition with Deep Belief Nets

2 0.79024088 253 nips-2009-Unsupervised feature learning for audio classification using convolutional deep belief networks

Author: Honglak Lee, Peter Pham, Yan Largman, Andrew Y. Ng

Abstract: In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. However, to our knowledge, these deep learning approaches have not been extensively studied for auditory data. In this paper, we apply convolutional deep belief networks to audio data and empirically evaluate them on various audio classification tasks. In the case of speech data, we show that the learned features correspond to phones/phonemes. In addition, our feature representations learned from unlabeled audio data show very good performance for multiple audio classification tasks. We hope that this paper will inspire more research on deep learning approaches applied to a wide range of audio recognition tasks.

3 0.69893324 151 nips-2009-Measuring Invariances in Deep Networks

Author: Ian Goodfellow, Honglak Lee, Quoc V. Le, Andrew Saxe, Andrew Y. Ng

Abstract: For many pattern recognition tasks, the ideal input feature would be invariant to multiple confounding properties (such as illumination and viewing angle, in computer vision applications). Recently, deep architectures trained in an unsupervised manner have been proposed as an automatic method for extracting useful features. However, it is difficult to evaluate the learned features by any means other than using them in a classifier. In this paper, we propose a number of empirical tests that directly measure the degree to which these learned features are invariant to different input transformations. We find that stacked autoencoders learn modestly increasingly invariant features with depth when trained on natural images. We find that convolutional deep belief networks learn substantially more invariant features in each layer. These results further justify the use of “deep” vs. “shallower” representations, but suggest that mechanisms beyond merely stacking one autoencoder on top of another may be important for achieving invariance. Our evaluation metrics can also be used to evaluate future work in deep learning, and thus help the development of future algorithms.

4 0.63618159 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

Author: Ruslan Salakhutdinov

Abstract: Markov random fields (MRF’s), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRF’s is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of the Robbins-Monro type that use Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely-connected MRF’s. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data that perform well on digit and object recognition tasks.

5 0.62908942 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

Author: Sanja Fidler, Marko Boben, Ales Leonardis

Abstract: Multi-class object learning and detection is a challenging problem due to the large number of object classes and their high visual variability. Specialized detectors usually excel in performance, while joint representations optimize sharing and reduce inference time, but are complex to train. Conveniently, sequential class learning cuts down training time by transferring existing knowledge to novel classes, but cannot fully exploit the shareability of features among object classes and might depend on ordering of classes during learning. In hierarchical frameworks these issues have been little explored. In this paper, we provide a rigorous experimental analysis of various multiple object class learning strategies within a generative hierarchical framework. Specifically, we propose, evaluate and compare three important types of multi-class learning: (1) independent training of individual categories, (2) joint training of classes, and (3) sequential learning of classes. We explore and compare their computational behavior (space and time) and detection performance as a function of the number of learned object classes on several recognition datasets. We show that sequential training achieves the best trade-off between inference and training times at a comparable detection performance and could thus be used to learn the classes on a larger scale.

6 0.56561989 97 nips-2009-Free energy score space

7 0.556741 219 nips-2009-Slow, Decorrelated Features for Pretraining Complex Cell-like Networks

8 0.53129709 119 nips-2009-Kernel Methods for Deep Learning

9 0.46158478 15 nips-2009-A Rate Distortion Approach for Semi-Supervised Conditional Random Fields

10 0.46022737 56 nips-2009-Conditional Neural Fields

11 0.44835374 204 nips-2009-Replicated Softmax: an Undirected Topic Model

12 0.40236557 127 nips-2009-Learning Label Embeddings for Nearest-Neighbor Multi-class Classification with an Application to Speech Recognition

13 0.39678013 189 nips-2009-Periodic Step Size Adaptation for Single Pass On-line Learning

14 0.39220148 39 nips-2009-Bayesian Belief Polarization

15 0.39040452 49 nips-2009-Breaking Boundaries Between Induction Time and Diagnosis Time Active Information Acquisition

16 0.38850167 130 nips-2009-Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization

17 0.34781331 176 nips-2009-On Invariance in Hierarchical Models

18 0.34372011 57 nips-2009-Conditional Random Fields with High-Order Features for Sequence Labeling

19 0.34348378 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections

20 0.34116402 77 nips-2009-Efficient Match Kernel between Sets of Features for Visual Recognition


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(12, 0.223), (21, 0.012), (24, 0.055), (25, 0.067), (35, 0.066), (36, 0.124), (39, 0.049), (42, 0.017), (58, 0.045), (71, 0.056), (81, 0.016), (86, 0.145), (91, 0.039)]
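
The topic vector above is a sparse list of (topicId, topicWeight) pairs. The listing does not say which measure produces the simValue scores below, so the following is only a minimal sketch, assuming cosine similarity between two such sparse vectors. The function name `sparse_cosine` and the `other_paper` vector are hypothetical and serve illustration only; they are not taken from the source.

```python
import math


def sparse_cosine(a, b):
    """Cosine similarity between two sparse vectors of (topicId, weight) pairs.

    Assumption: cosine is used here for illustration; the page does not
    state which similarity measure it actually applies.
    """
    da, db = dict(a), dict(b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * db[t] for t, w in da.items() if t in db)
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Topic weights for this paper, copied from the listing above.
this_paper = [(12, 0.223), (21, 0.012), (24, 0.055), (25, 0.067),
              (35, 0.066), (36, 0.124), (39, 0.049), (42, 0.017),
              (58, 0.045), (71, 0.056), (81, 0.016), (86, 0.145),
              (91, 0.039)]

# Hypothetical topic weights for some other paper (made up for the example).
other_paper = [(12, 0.180), (36, 0.200), (86, 0.050)]

score = sparse_cosine(this_paper, other_paper)
```

Under this assumption, papers that put weight on the same topics score close to 1, and papers with no topics in common score 0.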

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.85495269 2 nips-2009-3D Object Recognition with Deep Belief Nets

Author: Vinod Nair, Geoffrey E. Hinton

Abstract: We introduce a new type of top-level model for Deep Belief Nets and evaluate it on a 3D object recognition task. The top-level model is a third-order Boltzmann machine, trained using a hybrid algorithm that combines both generative and discriminative gradients. Performance is evaluated on the NORB database (normalized-uniform version), which contains stereo-pair images of objects under different lighting conditions and viewpoints. Our model achieves 6.5% error on the test set, which is close to the best published result for NORB (5.9%) using a convolutional neural net that has built-in knowledge of translation invariance. It substantially outperforms shallow models such as SVMs (11.6%). DBNs are especially suited for semi-supervised learning, and to demonstrate this we consider a modified version of the NORB recognition task in which additional unlabeled images are created by applying small translations to the images in the database. With the extra unlabeled data (and the same amount of labeled data as before), our model achieves 5.2% error.

2 0.83713043 230 nips-2009-Statistical Consistency of Top-k Ranking

Author: Fen Xia, Tie-yan Liu, Hang Li

Abstract: This paper is concerned with the consistency analysis of listwise ranking methods. Among various ranking methods, the listwise methods have competitive performance on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods optimize ranking on the whole list (permutation) of objects; however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of the top-k subgroup of permutations. This framework can include the permutation-level ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions.

3 0.7314505 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

Author: Ruslan Salakhutdinov

Abstract: Markov random fields (MRF’s), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRF’s is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of the Robbins-Monro type that use Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely-connected MRF’s. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data that perform well on digit and object recognition tasks.

4 0.702748 151 nips-2009-Measuring Invariances in Deep Networks

Author: Ian Goodfellow, Honglak Lee, Quoc V. Le, Andrew Saxe, Andrew Y. Ng

Abstract: For many pattern recognition tasks, the ideal input feature would be invariant to multiple confounding properties (such as illumination and viewing angle, in computer vision applications). Recently, deep architectures trained in an unsupervised manner have been proposed as an automatic method for extracting useful features. However, it is difficult to evaluate the learned features by any means other than using them in a classifier. In this paper, we propose a number of empirical tests that directly measure the degree to which these learned features are invariant to different input transformations. We find that stacked autoencoders learn modestly increasingly invariant features with depth when trained on natural images. We find that convolutional deep belief networks learn substantially more invariant features in each layer. These results further justify the use of “deep” vs. “shallower” representations, but suggest that mechanisms beyond merely stacking one autoencoder on top of another may be important for achieving invariance. Our evaluation metrics can also be used to evaluate future work in deep learning, and thus help the development of future algorithms.

5 0.69856668 119 nips-2009-Kernel Methods for Deep Learning

Author: Youngmin Cho, Lawrence K. Saul

Abstract: We introduce a new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that we call multilayer kernel machines (MKMs). We evaluate SVMs and MKMs with these kernel functions on problems designed to illustrate the advantages of deep architectures. On several problems, we obtain better results than previous, leading benchmarks from both SVMs with Gaussian kernels as well as deep belief nets.

6 0.69308817 137 nips-2009-Learning transport operators for image manifolds

7 0.69213504 136 nips-2009-Learning to Rank by Optimizing NDCG Measure

8 0.68856025 129 nips-2009-Learning a Small Mixture of Trees

9 0.68854457 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

10 0.68799531 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs

11 0.68710804 139 nips-2009-Linear-time Algorithms for Pairwise Statistical Problems

12 0.68710309 77 nips-2009-Efficient Match Kernel between Sets of Features for Visual Recognition

13 0.68631327 104 nips-2009-Group Sparse Coding

14 0.68627852 87 nips-2009-Exponential Family Graph Matching and Ranking

15 0.68527269 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations

16 0.68473083 260 nips-2009-Zero-shot Learning with Semantic Output Codes

17 0.68392766 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning

18 0.68330705 112 nips-2009-Human Rademacher Complexity

19 0.6831215 72 nips-2009-Distribution Matching for Transduction

20 0.68196166 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections