nips nips2013 nips2013-331 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hanlin Goh, Nicolas Thome, Matthieu Cord, Joo-Hwee Lim
Abstract: Designing a principled and effective algorithm for learning deep architectures is a challenging problem. The current approach involves two training phases: a fully unsupervised learning followed by a strongly discriminative optimization. We suggest a deep learning strategy that bridges the gap between the two phases, resulting in a three-phase learning procedure. We propose to implement the scheme using a method to regularize deep belief networks with top-down information. The network is constructed from building blocks of restricted Boltzmann machines learned by combining bottom-up and top-down sampled signals. A global optimization procedure that merges samples from a forward bottom-up pass and a top-down pass is used. Experiments on the MNIST dataset show improvements over the existing algorithms for deep belief networks. Object recognition results on the Caltech-101 dataset also yield competitive results. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Designing a principled and effective algorithm for learning deep architectures is a challenging problem. [sent-6, score-0.2]
2 The current approach involves two training phases: a fully unsupervised learning followed by a strongly discriminative optimization. [sent-7, score-0.201]
3 We suggest a deep learning strategy that bridges the gap between the two phases, resulting in a three-phase learning procedure. [sent-8, score-0.2]
4 We propose to implement the scheme using a method to regularize deep belief networks with top-down information. [sent-9, score-0.363]
5 The network is constructed from building blocks of restricted Boltzmann machines learned by combining bottom-up and top-down sampled signals. [sent-10, score-0.213]
6 A global optimization procedure that merges samples from a forward bottom-up pass and a top-down pass is used. [sent-11, score-0.203]
7 Experiments on the MNIST dataset show improvements over the existing algorithms for deep belief networks. [sent-12, score-0.28]
8 However, when the architecture is deep, it is challenging to train the entire network through supervised learning due to the large number of parameters, the non-convex optimization problem and the dilution of the error signal through the layers. [sent-17, score-0.208]
9 Recent developments in unsupervised feature learning and deep learning algorithms have made it possible to learn deep feature hierarchies. [sent-19, score-0.49]
10 The first phase greedily learns unsupervised modules layer-by-layer from the bottom-up [1, 5]. [sent-21, score-0.322]
11 This is subsequently followed by a phase that fine-tunes the network with a supervised, usually discriminative, algorithm such as error backpropagation. [sent-23, score-0.592]
12 The unsupervised learning phase initializes the parameters without taking into account the ultimate task of interest, such as classification. [sent-24, score-0.32]
13 The second phase assumes the entire burden of modifying the model to fit the task. [sent-25, score-0.202]
14 This is done by adding an intermediate training phase between the two existing deep learning phases, which enhances the unsupervised representation by incorporating top-down information. [sent-27, score-0.575]
15 The resulting scheme regularizes the deep belief network (DBN) from the top-down. [sent-30, score-0.364]
16 The new regularization method and deep learning strategy are applied to handwritten digit recognition and dictionary learning for object recognition, with competitive empirical results. [sent-32, score-0.466]
17 A restricted Boltzmann machine (RBM) [8] is a bipartite Markov random field with an input layer x ∈ R^I and a latent layer z ∈ R^J (see Figure 1). [sent-37, score-0.645]
18 The units in a layer are conditionally independent, with distributions given by logistic functions: P(z|x) = ∏j P(zj|x), P(zj = 1|x) = 1/(1 + exp(−wj x − bj)), (3) and P(x|z) = ∏i P(xi|z), P(xi = 1|z) = 1/(1 + exp(−wi z − ci)). (4) [sent-48, score-0.338]
19 The first term samples the data distribution at t = 0, while the second term approximates the equilibrium distribution at t = ∞ with the contrastive divergence method [9], using a small, finite number of sampling steps N to obtain a distribution of reconstructed states at t = N. [sent-51, score-0.189]
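To make the RBM conditionals and the contrastive divergence update above concrete, here is a minimal NumPy sketch of one CD-N step for a binary RBM. It is an illustrative sketch rather than the authors' implementation; the shapes, learning rate, and helper names such as sample_bernoulli are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sample_bernoulli(p):
    # Draw binary states from element-wise Bernoulli probabilities.
    return (rng.random(p.shape) < p).astype(p.dtype)

def cd_update(W, b, c, x0, n_steps=1, lr=0.01):
    """One contrastive-divergence update for a binary RBM.

    W: (I, J) weights, b: (J,) latent biases, c: (I,) input biases,
    x0: (batch, I) data batch sampled at t = 0.
    """
    # Positive phase: P(zj = 1 | x) at t = 0 (Equation 3).
    pz0 = sigmoid(x0 @ W + b)
    z = sample_bernoulli(pz0)

    # Alternating Gibbs sampling for N steps to reach t = N.
    for _ in range(n_steps):
        px = sigmoid(z @ W.T + c)          # P(xi = 1 | z) (Equation 4)
        x = sample_bernoulli(px)
        pz = sigmoid(x @ W + b)
        z = sample_bernoulli(pz)

    # Approximate gradient: <xi zj>_0 - <xi zj>_N.
    batch = x0.shape[0]
    W += lr * (x0.T @ pz0 - x.T @ pz) / batch
    b += lr * (pz0 - pz).mean(axis=0)
    c += lr * (x0 - x).mean(axis=0)
    return W, b, c

# Tiny usage example on random binary data.
I, J = 20, 10
W = 0.01 * rng.standard_normal((I, J))
b, c = np.zeros(J), np.zeros(I)
x_batch = sample_bernoulli(np.full((8, I), 0.3))
W, b, c = cd_update(W, b, c, x_batch, n_steps=1)
```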
20 E(x, y, z) = −z⊤Wx − z⊤Vy − b⊤z − c⊤x − d⊤y. (6) The conditional distribution of the concatenated vector is now P(x, y|z) = P(x|z) P(y|z) = ∏i P(xi|z) ∏c P(yc|z), (7) where P(xi|z) is given in Equation 4 and the outputs yc may be either logistic or softmax units. [sent-53, score-0.22]
21 The RBM may again be trained using the contrastive divergence algorithm [9] to approximate maximum likelihood of the joint distribution. [sent-54, score-0.184]
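For the supervised RBM, the latent layer simply sees the concatenated vector [x, y] through the block weights W and V, and the two blocks factorize given z. The sketch below shows only the conditionals of Equations 6 and 7; the use of mean-field activations, softmax label units, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def supervised_rbm_conditionals(x, y, W, V, b, c, d):
    """Conditionals of a supervised RBM over the concatenated input [x, y].

    W: (I, J) input-to-latent, V: (C, J) label-to-latent,
    b: (J,) latent biases, c: (I,) input biases, d: (C,) label biases.
    """
    # P(z | x, y): the latent layer sees both blocks (Equation 6).
    pz = sigmoid(x @ W + y @ V + b)
    # P(x | z) and P(y | z) factorize given z (Equation 7); mean-field
    # activations are used in place of samples for brevity, and the label
    # block uses softmax units.
    px = sigmoid(pz @ W.T + c)
    py = softmax(pz @ V.T + d)
    return pz, px, py

I, C, J = 20, 5, 10
x = (rng.random((4, I)) < 0.3).astype(float)
y = np.eye(C)[rng.integers(0, C, size=4)]        # one-hot labels
W = 0.01 * rng.standard_normal((I, J))
V = 0.01 * rng.standard_normal((C, J))
pz, px, py = supervised_rbm_conditionals(x, y, W, V, np.zeros(J), np.zeros(I), np.zeros(C))
```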
22 However, if the objective is to train a deep network, then with every new layer, the previous V has to be discarded and retrained. [sent-80, score-0.2]
23 It may also not be desirable to use a discriminative criterion directly from the outputs, especially in the initial layers of the network. [sent-81, score-0.177]
24 Deep belief networks (DBN) [1] are probabilistic graphical models made up of a hierarchy of stochastic latent variables. [sent-83, score-0.188]
25 Training follows a two-phase strategy of unsupervised greedy pre-training followed by supervised fine-tuning. [sent-85, score-0.295]
26 For unsupervised pre-training, a stack of RBMs is trained greedily from the bottom-up, with the latent activations of each layer used as the inputs for the next RBM. [sent-86, score-0.665]
27 Each new layer RBM models the data distribution P (x), such that when higher-level layers are sufficiently large, the variational bound on the likelihood always improves [1]. [sent-87, score-0.391]
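The greedy stacking itself is a short loop: train one RBM, push its mean latent activations up as the next RBM's input, and repeat. A minimal sketch, assuming any single-RBM contrastive divergence trainer with the hypothetical interface train_rbm(data, n_hidden, n_epochs) -> (W, b, c):

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def greedy_pretrain(x, layer_sizes, train_rbm, n_epochs=10):
    """Greedy layer-wise pre-training of a stack of RBMs.

    x: (batch, I) input data; layer_sizes: latent sizes from bottom to top.
    """
    stack, inputs = [], x
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(inputs, n_hidden, n_epochs)
        stack.append((W, b, c))
        # Latent activations of this layer become inputs to the next RBM.
        inputs = sigmoid(inputs @ W + b)
    return stack

# Usage with a dummy trainer (replace with a real CD trainer).
rng = np.random.default_rng(0)
def dummy_train_rbm(data, n_hidden, n_epochs):
    I = data.shape[1]
    return 0.01 * rng.standard_normal((I, n_hidden)), np.zeros(n_hidden), np.zeros(I)

x = (rng.random((8, 20)) < 0.3).astype(float)
stack = greedy_pretrain(x, layer_sizes=[16, 12], train_rbm=dummy_train_rbm)
```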
28 An alternative supervised method is a generative model with a supervised RBM (Figure 2) that models P(x, y) at the top layer. [sent-90, score-0.285]
29 First, a stochastic bottom-up pass is performed and the generative weights are adjusted to be good at reconstructing the layer below. [sent-93, score-0.384]
30 Next, at the top-level supervised RBM, a few iterations of alternating sampling between the concatenated vector and the latent layer are performed using the respective conditional probabilities. [sent-94, score-0.338]
31 Using contrastive divergence, the RBM is updated by fitting to its posterior distribution. [sent-95, score-0.155]
32 Finally, a stochastic top-down pass adjusts bottom-up recognition weights to reconstruct the activations of the layer above. [sent-96, score-0.559]
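The up-down style fine-tuning described here can be compressed into one step per mini-batch for a small DBN with a single intermediate layer. This is a rough sketch under strong simplifications (biases dropped from the updates, mean-field label reconstruction, illustrative names R1, G1, W, V), not the exact algorithm of [1].

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def up_down_step(x, y, params, lr=0.01, n_gibbs=3):
    """One fine-tuning step for a DBN x -> h1 -> top RBM over [h1, y].

    R1/G1 are untied recognition and generative weights between x and h1;
    W, V parameterize the top-level supervised RBM.
    """
    R1, G1, W, V = params
    B = x.shape[0]

    # 1) Stochastic bottom-up pass; adjust the generative weights to be
    #    good at reconstructing the layer below.
    h1 = sample(sigmoid(x @ R1))
    x_rec = sigmoid(h1 @ G1)
    G1 += lr * h1.T @ (x - x_rec) / B

    # 2) Alternating Gibbs sampling at the top-level supervised RBM between
    #    the concatenated vector [h1, y] and its latent layer h2, followed
    #    by a contrastive-divergence update.
    ph2_pos = sigmoid(h1 @ W + y @ V)
    h1_neg, y_neg, h2 = h1, y, sample(ph2_pos)
    for _ in range(n_gibbs):
        h1_neg = sample(sigmoid(h2 @ W.T))
        y_neg = softmax(h2 @ V.T)          # mean-field label reconstruction
        h2 = sample(sigmoid(h1_neg @ W + y_neg @ V))
    ph2_neg = sigmoid(h1_neg @ W + y_neg @ V)
    W += lr * (h1.T @ ph2_pos - h1_neg.T @ ph2_neg) / B
    V += lr * (y.T @ ph2_pos - y_neg.T @ ph2_neg) / B

    # 3) Stochastic top-down pass; adjust the bottom-up recognition weights
    #    to reconstruct the activations of the layer above.
    x_gen = sample(sigmoid(h1_neg @ G1))
    h1_pred = sigmoid(x_gen @ R1)
    R1 += lr * x_gen.T @ (h1_neg - h1_pred) / B
    return params

# Toy usage.
I, J1, J2, C, B = 20, 15, 10, 5, 8
params = (0.01 * rng.standard_normal((I, J1)), 0.01 * rng.standard_normal((J1, I)),
          0.01 * rng.standard_normal((J1, J2)), 0.01 * rng.standard_normal((C, J2)))
x = sample(np.full((B, I), 0.3))
y = np.eye(C)[rng.integers(0, C, B)]
up_down_step(x, y, params)
```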
33 In this work, we extend the existing DBN training strategy by having an additional supervised training phase before the discriminative error backpropagation. [sent-97, score-0.49]
34 The aim is to construct a top-down regularized building block for deep networks, instead of combining the optimization criteria directly [12], as is done for the supervised RBM model (Figure 2). [sent-103, score-0.441]
35 One way to gain control over individual elements of the latent vector is to point-wise bias the activation of each latent variable j [11]. [sent-104, score-0.353]
36 The update rule of the cross-entropy-regularized RBM can be modified to: Δwij ∝ ⟨xi sj⟩0 − ⟨xi zj⟩N, (9) where sj = (1 − λ) zj + λ ẑj (10) is the merger of the latent activation zj and the target activation ẑj used to update the parameters. [sent-106, score-0.391]
37 Here, the influences of zj and ẑj are regulated by the parameter λ. [sent-107, score-0.176]
38 If ẑj = zj, then the parameter update is exactly that of the original contrastive divergence learning algorithm. [sent-110, score-0.331]
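In code, Equations 9 and 10 amount to a single interpolation between the bottom-up activation and its target before the positive-phase statistic is computed. A minimal sketch, assuming the reconstructions at t = N have already been sampled; the argument names and the default λ are illustrative.

```python
import numpy as np

def regularized_cd_update(W, x0, z0, z_target, xN, zN, lam=0.5, lr=0.01):
    """Top-down regularized CD update (Equations 9 and 10).

    x0, z0: data and latent activations at t = 0; xN, zN: reconstructed
    states at t = N; z_target: target activations (z hat) per latent unit.
    """
    s = (1.0 - lam) * z0 + lam * z_target            # Equation 10
    dW = (x0.T @ s - xN.T @ zN) / x0.shape[0]        # <xi sj>0 - <xi zj>N
    return W + lr * dW

# Toy usage with random activations of matching shapes.
rng = np.random.default_rng(0)
B, I, J = 8, 20, 10
W = 0.01 * rng.standard_normal((I, J))
x0, xN = rng.random((B, I)), rng.random((B, I))
z0, zN, z_hat = rng.random((B, J)), rng.random((B, J)), rng.random((B, J))
W = regularized_cd_update(W, x0, z0, z_hat, xN, zN, lam=0.5)
```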
39 The same principle of regularizing the latent activations can be used to combine signals from the bottom-up and top-down. [sent-112, score-0.215]
40 The basic building block is a structure of three consecutive layers: the previous layer zl−1 ∈ R^I, the current layer zl ∈ R^J, and the next layer zl+1 ∈ R^H. [sent-114, score-0.502]
41 The current layer is connected to the previous and next layers by two sets of weight parameters, Wl−1 and Wl, respectively. [sent-115, score-0.238]
42 Meanwhile, sampling from the next layer zl+1 via the weights Wl drives the top-down representations zl,l+1: P(zl,l+1,j = 1 | zl+1; Wl) = 1/(1 + exp(−wl,j zl+1 − cl,j)). (12) [sent-117, score-0.376]
43 The objective is to learn the RBM parameters Wl−1 that map from the previous layer zl−1 to the current latent layer zl,l−1, by maximizing the likelihood of the previous layer P(zl−1) while considering the top-down samples zl,l+1 from the next layer zl+1 as target representations. [sent-118, score-1.156]
44 Additionally, the alternating Gibbs sampling necessary for the contrastive divergence updates is performed from the unbiased bottom-up samples, using Equation 11 and a symmetric decoder: P(zl−1,l,i = 1 | zl,l−1; Wl−1) = 1/(1 + exp(−wl−1,i zl,l−1 − cl−1,i)). [sent-122, score-0.199]
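Putting Equations 10 to 12 together, one update of the building block can be sketched as: sample the bottom-up and top-down activations, merge them, run a short Gibbs chain through the bottom-up weights, and apply the regularized update. Shapes, the merge coefficient, and all helper names are assumptions; biases are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

def building_block_update(W_prev, z_below, W_next, z_above, lam=0.5,
                          n_steps=1, lr=0.01):
    """Top-down regularized RBM update for one building block.

    W_prev: (I, J) weights from the previous layer z_{l-1} (z_below) to the
    current layer; W_next: (J, H) weights from the current layer to the next
    layer z_{l+1} (z_above).
    """
    # Bottom-up activations z_{l,l-1} (Equation 11) and top-down
    # activations z_{l,l+1} sampled from the layer above (Equation 12).
    z_up = sigmoid(z_below @ W_prev)
    z_down = sigmoid(z_above @ W_next.T)

    # Merge: s_l = (1 - lambda) * bottom-up + lambda * top-down (Equation 10).
    s = (1.0 - lam) * z_up + lam * z_down

    # Alternating Gibbs sampling from the unbiased bottom-up samples,
    # using W_prev and its symmetric decoder (requires n_steps >= 1).
    z = sample(z_up)
    for _ in range(n_steps):
        v = sample(sigmoid(z @ W_prev.T))
        z = sigmoid(v @ W_prev)

    # Regularized contrastive-divergence update (Equation 9).
    dW = (z_below.T @ s - v.T @ z) / z_below.shape[0]
    return W_prev + lr * dW
```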
45 Figure 3: The basic building block learns a bottom-up latent representation regularized by top-down signals. [sent-130, score-0.254]
46 Bottom-up zl,l−1 and top-down zl,l+1 latent activations are sampled from zl−1 and zl+1 respectively. [sent-131, score-0.215]
47 They are merged to get the modified activations sl used for parameter updates. [sent-132, score-0.286]
48 In the DBN, RBMs are stacked from the bottom-up in a greedy layer-wise manner, with each new layer modeling the posterior distribution of the previous layer. [sent-135, score-0.371]
49 Similarly, regularized building blocks can also be used to construct the regularized DBN (Figure 4). [sent-136, score-0.18]
50 The network can be trained with a forward and backward strategy (Figure 4(b)). [sent-138, score-0.244]
51 It integrates top-down regularization with contrastive divergence learning, which is given by alternating Gibbs sampling between the layers (Figure 4(c)). [sent-139, score-0.386]
52 (a) Top-down regularized deep belief network. [sent-145, score-0.343]
53 (c) Alternating Gibbs sampling chains for contrastive divergence learning. [sent-151, score-0.189]
54 Figure 4: Constructing a top-down regularized deep belief network (DBN). [sent-152, score-0.427]
55 All the restricted Boltzmann machines (RBM) that make up the network are concurrently optimized. [sent-153, score-0.159]
56 Both bottom-up and top-down activations are used for training the network. [sent-155, score-0.2]
57 (b) Activations for the top-down regularization are obtained by sampling and merging the forward pass and the backward pass. [sent-156, score-0.274]
58 (c) From the activations of the forward pass, the reconstructions can be obtained by performing alternating Gibbs sampling with the previous layer. [sent-157, score-0.306]
59 In the forward pass, given the input features, each layer zl is sampled from the bottom-up, based on the representation of the previous layer zl−1 (Equation 11). [sent-158, score-1.045]
60 Upon reaching the output layer, the backward pass begins. [sent-160, score-0.181]
61 This is repeated until the second layer is reached (l = 2) and s2 is computed. [sent-162, score-0.272]
62 All other backward activations from this point onwards are based on the merged instance-based and class-based representations. [sent-169, score-0.293]
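The forward-backward computation of the merged activations sl can be sketched as two loops: a bottom-up pass that stores every layer's activation, and a top-down pass that blends the propagated class signal with the stored activation at each layer. The list-based layout and the per-layer coefficients lam[l] are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def forward_backward_targets(x, y, weights, lam):
    """Compute merged activations s_l for a top-down regularized DBN.

    weights[l] maps layer l to layer l+1 (layer 0 is the input x, the last
    layer is the output); lam[l] is the top-down coefficient lambda for
    layer l, with the output layer's lam set to 1 so the one-hot labels y
    act as its target. Biases are omitted for brevity.
    """
    # Forward pass: bottom-up activations for every layer.
    z = [x]
    for W in weights:
        z.append(sigmoid(z[-1] @ W))

    L = len(z) - 1                      # index of the output layer
    s = [None] * (L + 1)
    s[L] = (1.0 - lam[L]) * z[L] + lam[L] * y

    # Backward pass: propagate the merged representation down to layer 1
    # (the paper's "second layer"), merging with the forward activations.
    for l in range(L - 1, 0, -1):
        top_down = sigmoid(s[l + 1] @ weights[l].T)
        s[l] = (1.0 - lam[l]) * z[l] + lam[l] * top_down
    return z, s

# Example: input, two latent layers, softmax-sized output layer.
sizes = [20, 16, 12, 5]
weights = [0.01 * rng.standard_normal((a, b)) for a, b in zip(sizes, sizes[1:])]
lam = [None, 0.25, 0.75, 1.0]            # lam[l] per layer; output layer uses 1.0
x = (rng.random((8, sizes[0])) < 0.3).astype(float)
y = np.eye(sizes[-1])[rng.integers(0, sizes[-1], 8)]
z, s = forward_backward_targets(x, y, weights, lam)
```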
63 This suggests that the network can adopt a three-phase strategy for training, whereby the parameters learned in one phase initialize the next, as follows: • Phase 1 – Unsupervised Greedy. [sent-173, score-0.314]
64 The network is constructed by greedily learning a new unsupervised RBM on top of the existing network. [sent-174, score-0.204]
65 The stacking process is repeated for L − 2 RBMs, until layer L − 1 is added to the network. [sent-176, score-0.272]
66 The next phase begins by connecting layer L − 1 to a final output layer, which is activated by the softmax activation function for a classification problem. [sent-178, score-0.296]
67 Using the one-hot coded output vector y ∈ R^C as its target activations and setting λL to 1, the RBM is learned as an associative memory with the following update: ΔwL−1,ic ∝ ⟨zL−1,L−2,i yc⟩0 − ⟨zL−1,L,i zL,L−1,c⟩N. [sent-179, score-0.229]
68 This phase is used to fine-tune the network using generative learning, and binds the layers together by aligning all the parameters of the network with the outputs. [sent-181, score-0.526]
69 Finally, the supervised error backpropagation algorithm is used to improve class discrimination in the representations. [sent-183, score-0.24]
70 In the forward pass, each layer is activated from the bottom-up to obtain the class predictions. [sent-185, score-0.359]
71 The classification error is then computed against the ground truth, and the backward pass performs gradient descent on the parameters by backpropagating the errors through the layers from the top-down. [sent-186, score-0.306]
72 Essentially, the two phases are performing a variant of the contrastive divergence algorithm. [sent-189, score-0.246]
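The overall control flow of the three phases can be summarized in a short driver; the routines it accepts (train_rbm, forward_backward_update, backprop_finetune) stand in for the procedures sketched earlier and are hypothetical names, not the authors' API.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def train_three_phase(x, y, latent_sizes, n_classes, train_rbm,
                      forward_backward_update, backprop_finetune, n_epochs=10):
    """Three-phase training of a top-down regularized DBN (outline only)."""
    # Phase 1 - Unsupervised greedy stacking of RBMs, bottom-up.
    weights, inputs = [], x
    for n_hidden in latent_sizes:
        W = train_rbm(inputs, n_hidden, n_epochs)
        weights.append(W)
        inputs = sigmoid(inputs @ W)

    # Phase 2 - Top-down regularization: connect a softmax output layer
    # (one-hot targets y, lambda_L = 1) and jointly update all RBMs from
    # the merged forward/backward activations.
    weights.append(0.01 * rng.standard_normal((latent_sizes[-1], n_classes)))
    for _ in range(n_epochs):
        weights = forward_backward_update(weights, x, y)

    # Phase 3 - Supervised fine-tuning with error backpropagation.
    return backprop_finetune(weights, x, y, n_epochs)
```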
73 5 Empirical Evaluation In this work, the proposed deep learning strategy and top-down regularization method were evaluated and analyzed using the MNIST handwritten digit dataset [16] and the Caltech-101 object recognition dataset [17]. [sent-191, score-0.434]
74 The setup followed [1] by initially using 44,000 training and 10,000 validation images to train the network before retraining it with the full training set. [sent-207, score-0.218]
75 In Phase 3, sets of 50,000 and 10,000 images were used as the initial training and validation sets. [sent-208, score-0.283]
76 To simplify the parameterization of the forward-backward learning in Phase 2, the top-down modulation parameters λl across the layers were controlled by a single parameter γ using the function λl = |l − 1|^γ / (|l − 1|^γ + |L − l|^γ). [sent-210, score-0.321]
77 The top-down influence for a layer l is also dependent on its relative position in the network. [sent-212, score-0.272]
78 The function assigns λl such that layers nearer the input are influenced more strongly by the input, while layers nearer the output are biased towards the output. [sent-213, score-0.266]
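Computing the per-layer modulation λl from the single parameter γ takes only a few lines. The sketch below uses the normalized form λl = |l − 1|^γ / (|l − 1|^γ + |L − l|^γ), consistent with the behaviour described here (λ close to 0 near the input and equal to 1 at the output layer).

```python
def top_down_coefficients(L, gamma):
    """Per-layer top-down modulation lambda_l for layers l = 1..L.

    lambda_1 = 0 (the input layer keeps its bottom-up signal) and
    lambda_L = 1 (the output layer is fully driven by the labels).
    """
    lam = {}
    for l in range(1, L + 1):
        num = abs(l - 1) ** gamma
        lam[l] = num / (num + abs(L - l) ** gamma)
    return lam

# Example: a 5-layer network with gamma = 2.
print(top_down_coefficients(L=5, gamma=2))
# {1: 0.0, 2: 0.1, 3: 0.5, 4: 0.9, 5: 1.0}
```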
79 For each setup, the intermediate results for each training phase are reported in Table 1. [sent-224, score-0.285]
80 The deep convex net [19], which utilized more complex convex-optimized modules as building blocks but did not perform fine-tuning on a global network level, got a score of 0. [sent-228, score-0.37]
81 An error rate of 0.23% was achieved with a heavy architecture: a committee of 35 deep convolutional neural nets with elastic distortions and image normalization [20]. [sent-231, score-0.3]
82 Setup / Learning algorithm* (Phase 1, Phase 2); Classification error rate (Phase 1, Phase 2, Phase 3). Deep belief network (reported in [1]): 1. [sent-239, score-0.164]
83 Additionally, SIFT descriptors from a spatial neighborhood of 2 × 2 were concatenated to form a macrofeature [22]. [sent-263, score-0.156]
84 Two layers of RBMs were stacked to model the macrofeatures. [sent-265, score-0.19]
85 The resulting representations of the first RBM were then concatenated within each spatial neighborhood of 2 × 2. [sent-269, score-0.172]
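The macrofeature construction is essentially a concatenation of the descriptors, or of the first-layer RBM outputs, within each 2 × 2 spatial neighborhood. A minimal sketch assuming a dense (H, W, D) grid and non-overlapping neighborhoods; the paper's exact grid and stride are not specified in this extract.

```python
import numpy as np

def concat_2x2(features):
    """Concatenate features within each non-overlapping 2x2 neighborhood.

    features: array of shape (H, W, D) on a dense spatial grid; returns an
    array of shape (H//2, W//2, 4*D).
    """
    H, W, D = features.shape
    f = features[: H - H % 2, : W - W % 2]            # trim odd borders
    blocks = f.reshape(H // 2, 2, W // 2, 2, D)
    return blocks.transpose(0, 2, 1, 3, 4).reshape(H // 2, W // 2, 4 * D)

# Example: a 6x6 grid of 128-dimensional SIFT descriptors -> 3x3 macrofeatures.
sift = np.random.rand(6, 6, 128)
macro = concat_2x2(sift)          # shape (3, 3, 512)
```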
86 For each experimental trial, a set of 30 training examples per class (totaling 3,060) was randomly selected for supervised learning. [sent-273, score-0.177]
87 The results demonstrate a consistent improvement moving from Phase 1 to Phase 3. [sent-280, score-0.202]
88 Method / Training phase, Accuracy: Proposed top-down regularized DBN; Phase 1: Unsupervised stacking; Phase 2: Top-down regularization; Phase 3: Error backpropagation, 72.9%. [sent-287, score-0.415]
89 Conclusion We proposed the notion of deep learning by gradually transitioning from fully unsupervised to strongly discriminative learning. [sent-295, score-0.29]
90 This is achieved through the introduction of an intermediate phase between the unsupervised and supervised learning phases. [sent-296, score-0.446]
91 The method is easily integrated into the intermediate learning phase based on simple building blocks. [sent-298, score-0.286]
92 It can be performed to complement greedy layer-wise unsupervised learning and discriminative optimization using error backpropagation. [sent-299, score-0.176]
93 Empirical evaluation shows that the method leads to competitive results on the handwritten digit recognition and object recognition datasets. [sent-300, score-0.265]
94 Teh, “A fast learning algorithm for deep belief networks,” Neural Computation, vol. [sent-306, score-0.28]
95 Bengio, “Learning deep architectures for AI,” Foundations and Trends in Machine Learning, vol. [sent-322, score-0.2]
96 Larochelle, “Greedy layer-wise training of deep networks,” in NIPS, 2006. [sent-330, score-0.253]
97 Ng, “Sparse deep belief net model for visual area V2,” in NIPS, 2008. [sent-355, score-0.28]
98 Bengio, “Representational power of restricted Boltzmann machines and deep belief networks,” Neural Computation, vol. [sent-365, score-0.355]
99 Schmidhuber, “Multi-column deep neural networks for image classification,” in CVPR, 2012. [sent-400, score-0.279]
100 Lim, “Unsupervised and supervised visual codes with restricted Boltzmann machines,” in ECCV, 2012. [sent-413, score-0.157]
wordName wordTfidf (topN-words)
[('zl', 0.448), ('rbm', 0.338), ('layer', 0.272), ('wl', 0.255), ('rbms', 0.22), ('phase', 0.202), ('deep', 0.2), ('dbn', 0.199), ('activations', 0.147), ('supervised', 0.124), ('layers', 0.119), ('backpropagation', 0.116), ('dtrain', 0.103), ('contrastive', 0.101), ('thome', 0.098), ('phases', 0.091), ('unsupervised', 0.09), ('zj', 0.088), ('network', 0.084), ('belief', 0.08), ('boltzmann', 0.079), ('goh', 0.078), ('backward', 0.078), ('pass', 0.075), ('cord', 0.075), ('stacked', 0.071), ('sl', 0.071), ('representations', 0.07), ('topdown', 0.069), ('latent', 0.068), ('concatenated', 0.068), ('merged', 0.068), ('units', 0.066), ('recognition', 0.065), ('regularized', 0.063), ('discriminative', 0.058), ('hinton', 0.055), ('divergence', 0.054), ('yc', 0.054), ('building', 0.054), ('descriptors', 0.054), ('forward', 0.053), ('training', 0.053), ('bengio', 0.052), ('mnist', 0.049), ('object', 0.047), ('digit', 0.045), ('singapore', 0.044), ('alternating', 0.044), ('regularize', 0.043), ('handwritten', 0.043), ('machines', 0.042), ('larochelle', 0.042), ('gibbs', 0.041), ('networks', 0.04), ('hanlin', 0.039), ('macrofeatures', 0.039), ('umi', 0.039), ('image', 0.039), ('generative', 0.037), ('lecun', 0.036), ('classi', 0.035), ('regularization', 0.034), ('sohn', 0.034), ('gated', 0.034), ('backpropagating', 0.034), ('infocomm', 0.034), ('convolutional', 0.034), ('spatial', 0.034), ('sampling', 0.034), ('rc', 0.034), ('activated', 0.034), ('sift', 0.033), ('restricted', 0.033), ('uences', 0.032), ('wrongly', 0.032), ('got', 0.032), ('dictionary', 0.032), ('softmax', 0.032), ('greedily', 0.03), ('intermediate', 0.03), ('pervasive', 0.03), ('pooling', 0.03), ('trained', 0.029), ('inputs', 0.029), ('initializes', 0.028), ('wx', 0.028), ('reconstructions', 0.028), ('hmax', 0.028), ('cvpr', 0.028), ('activation', 0.028), ('images', 0.028), ('output', 0.028), ('greedy', 0.028), ('setup', 0.028), ('ponce', 0.027), ('gradual', 0.027), ('pyramid', 0.027), ('cnrs', 0.027), ('distortions', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 331 nips-2013-Top-Down Regularization of Deep Belief Networks
Author: Hanlin Goh, Nicolas Thome, Matthieu Cord, Joo-Hwee Lim
Abstract: Designing a principled and effective algorithm for learning deep architectures is a challenging problem. The current approach involves two training phases: a fully unsupervised learning followed by a strongly discriminative optimization. We suggest a deep learning strategy that bridges the gap between the two phases, resulting in a three-phase learning procedure. We propose to implement the scheme using a method to regularize deep belief networks with top-down information. The network is constructed from building blocks of restricted Boltzmann machines learned by combining bottom-up and top-down sampled signals. A global optimization procedure that merges samples from a forward bottom-up pass and a top-down pass is used. Experiments on the MNIST dataset show improvements over the existing algorithms for deep belief networks. Object recognition results on the Caltech-101 dataset also yield competitive results. 1
2 0.28165838 221 nips-2013-On the Expressive Power of Restricted Boltzmann Machines
Author: James Martens, Arkadev Chattopadhya, Toni Pitassi, Richard Zemel
Abstract: This paper examines the question: What kinds of distributions can be efficiently represented by Restricted Boltzmann Machines (RBMs)? We characterize the RBM’s unnormalized log-likelihood function as a type of neural network, and through a series of simulation results relate these networks to ones whose representational properties are better understood. We show the surprising result that RBMs can efficiently capture any distribution whose density depends on the number of 1’s in their input. We also provide the first known example of a particular type of distribution that provably cannot be efficiently represented by an RBM, assuming a realistic exponential upper bound on the weights. By formally demonstrating that a relatively simple distribution cannot be represented efficiently by an RBM our results provide a new rigorous justification for the use of potentially more expressive generative models, such as deeper ones. 1
3 0.25816742 251 nips-2013-Predicting Parameters in Deep Learning
Author: Misha Denil, Babak Shakibi, Laurent Dinh, Marc'Aurelio Ranzato, Nando de Freitas
Abstract: We demonstrate that there is significant redundancy in the parameterization of several deep learning models. Given only a few weight values for each feature it is possible to accurately predict the remaining values. Moreover, we show that not only can the parameter values be predicted, but many of them need not be learned at all. We train several different architectures by learning only a small number of weights and predicting the rest. In the best case we are able to predict more than 95% of the weights of a network without any drop in accuracy. 1
4 0.25302657 75 nips-2013-Convex Two-Layer Modeling
Author: Özlem Aslan, Hao Cheng, Xinhua Zhang, Dale Schuurmans
Abstract: Latent variable prediction models, such as multi-layer networks, impose auxiliary latent variables between inputs and outputs to allow automatic inference of implicit features useful for prediction. Unfortunately, such models are difficult to train because inference over latent variables must be performed concurrently with parameter optimization—creating a highly non-convex problem. Instead of proposing another local training method, we develop a convex relaxation of hidden-layer conditional models that admits global training. Our approach extends current convex modeling approaches to handle two nested nonlinearities separated by a non-trivial adaptive latent layer. The resulting methods are able to acquire two-layer models that cannot be represented by any single-layer model over the same features, while improving training quality over local heuristics. 1
5 0.23008844 334 nips-2013-Training and Analysing Deep Recurrent Neural Networks
Author: Michiel Hermans, Benjamin Schrauwen
Abstract: Time series often have a temporal hierarchy, with information that is spread out over multiple time scales. Common recurrent neural networks, however, do not explicitly accommodate such a hierarchy, and most research on them has been focusing on training algorithms rather than on their basic architecture. In this paper we study the effect of a hierarchy of recurrent neural networks on processing time series. Here, each layer is a recurrent network which receives the hidden state of the previous layer as input. This architecture allows us to perform hierarchical processing on difficult temporal tasks, and more naturally capture the structure of time series. We show that they reach state-of-the-art performance for recurrent networks in character-level language modeling when trained with simple stochastic gradient descent. We also offer an analysis of the different emergent time scales. 1
6 0.21237992 315 nips-2013-Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs
7 0.20089671 83 nips-2013-Deep Fisher Networks for Large-Scale Image Classification
8 0.16598836 200 nips-2013-Multi-Prediction Deep Boltzmann Machines
9 0.15702772 30 nips-2013-Adaptive dropout for training deep neural networks
10 0.14239295 5 nips-2013-A Deep Architecture for Matching Short Texts
11 0.13271904 36 nips-2013-Annealing between distributions by averaging moments
12 0.12680694 64 nips-2013-Compete to Compute
13 0.11636496 81 nips-2013-DeViSE: A Deep Visual-Semantic Embedding Model
14 0.10728402 127 nips-2013-Generalized Denoising Auto-Encoders as Generative Models
15 0.10567956 93 nips-2013-Discriminative Transfer Learning with Tree-based Priors
16 0.095733471 84 nips-2013-Deep Neural Networks for Object Detection
17 0.093624093 260 nips-2013-RNADE: The real-valued neural autoregressive density-estimator
18 0.09309613 160 nips-2013-Learning Stochastic Feedforward Neural Networks
19 0.09253414 321 nips-2013-Supervised Sparse Analysis and Synthesis Operators
20 0.089011148 27 nips-2013-Adaptive Multi-Column Deep Neural Networks with Application to Robust Image Denoising
topicId topicWeight
[(0, 0.199), (1, 0.115), (2, -0.202), (3, -0.137), (4, 0.134), (5, -0.155), (6, -0.081), (7, 0.09), (8, 0.046), (9, -0.219), (10, 0.243), (11, 0.018), (12, -0.056), (13, 0.05), (14, 0.071), (15, 0.088), (16, 0.0), (17, -0.068), (18, -0.041), (19, -0.018), (20, 0.036), (21, -0.036), (22, 0.183), (23, 0.017), (24, 0.058), (25, -0.01), (26, -0.252), (27, -0.01), (28, 0.076), (29, 0.039), (30, 0.042), (31, 0.015), (32, -0.063), (33, -0.084), (34, 0.096), (35, 0.016), (36, 0.005), (37, 0.034), (38, -0.066), (39, 0.003), (40, 0.028), (41, 0.044), (42, 0.051), (43, -0.0), (44, 0.029), (45, -0.053), (46, 0.038), (47, 0.028), (48, 0.039), (49, -0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.961496 331 nips-2013-Top-Down Regularization of Deep Belief Networks
Author: Hanlin Goh, Nicolas Thome, Matthieu Cord, Joo-Hwee Lim
Abstract: Designing a principled and effective algorithm for learning deep architectures is a challenging problem. The current approach involves two training phases: a fully unsupervised learning followed by a strongly discriminative optimization. We suggest a deep learning strategy that bridges the gap between the two phases, resulting in a three-phase learning procedure. We propose to implement the scheme using a method to regularize deep belief networks with top-down information. The network is constructed from building blocks of restricted Boltzmann machines learned by combining bottom-up and top-down sampled signals. A global optimization procedure that merges samples from a forward bottom-up pass and a top-down pass is used. Experiments on the MNIST dataset show improvements over the existing algorithms for deep belief networks. Object recognition results on the Caltech-101 dataset also yield competitive results. 1
2 0.82100177 334 nips-2013-Training and Analysing Deep Recurrent Neural Networks
Author: Michiel Hermans, Benjamin Schrauwen
Abstract: Time series often have a temporal hierarchy, with information that is spread out over multiple time scales. Common recurrent neural networks, however, do not explicitly accommodate such a hierarchy, and most research on them has been focusing on training algorithms rather than on their basic architecture. In this paper we study the effect of a hierarchy of recurrent neural networks on processing time series. Here, each layer is a recurrent network which receives the hidden state of the previous layer as input. This architecture allows us to perform hierarchical processing on difficult temporal tasks, and more naturally capture the structure of time series. We show that they reach state-of-the-art performance for recurrent networks in character-level language modeling when trained with simple stochastic gradient descent. We also offer an analysis of the different emergent time scales. 1
3 0.78536129 221 nips-2013-On the Expressive Power of Restricted Boltzmann Machines
Author: James Martens, Arkadev Chattopadhya, Toni Pitassi, Richard Zemel
Abstract: This paper examines the question: What kinds of distributions can be efficiently represented by Restricted Boltzmann Machines (RBMs)? We characterize the RBM’s unnormalized log-likelihood function as a type of neural network, and through a series of simulation results relate these networks to ones whose representational properties are better understood. We show the surprising result that RBMs can efficiently capture any distribution whose density depends on the number of 1’s in their input. We also provide the first known example of a particular type of distribution that provably cannot be efficiently represented by an RBM, assuming a realistic exponential upper bound on the weights. By formally demonstrating that a relatively simple distribution cannot be represented efficiently by an RBM our results provide a new rigorous justification for the use of potentially more expressive generative models, such as deeper ones. 1
4 0.74720627 251 nips-2013-Predicting Parameters in Deep Learning
Author: Misha Denil, Babak Shakibi, Laurent Dinh, Marc'Aurelio Ranzato, Nando de Freitas
Abstract: We demonstrate that there is significant redundancy in the parameterization of several deep learning models. Given only a few weight values for each feature it is possible to accurately predict the remaining values. Moreover, we show that not only can the parameter values be predicted, but many of them need not be learned at all. We train several different architectures by learning only a small number of weights and predicting the rest. In the best case we are able to predict more than 95% of the weights of a network without any drop in accuracy. 1
5 0.74411517 315 nips-2013-Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs
Author: Yann Dauphin, Yoshua Bengio
Abstract: Sparse high-dimensional data vectors are common in many application domains where a very large number of rarely non-zero features can be devised. Unfortunately, this creates a computational bottleneck for unsupervised feature learning algorithms such as those based on auto-encoders and RBMs, because they involve a reconstruction step where the whole input vector is predicted from the current feature values. An algorithm was recently developed to successfully handle the case of auto-encoders, based on an importance sampling scheme stochastically selecting which input elements to actually reconstruct during training for each particular example. To generalize this idea to RBMs, we propose a stochastic ratio-matching algorithm that inherits all the computational advantages and unbiasedness of the importance sampling scheme. We show that stochastic ratio matching is a good estimator, allowing the approach to beat the state-of-the-art on two bag-of-word text classification benchmarks (20 Newsgroups and RCV1), while keeping computational cost linear in the number of non-zeros. 1
6 0.73324311 200 nips-2013-Multi-Prediction Deep Boltzmann Machines
7 0.66692597 83 nips-2013-Deep Fisher Networks for Large-Scale Image Classification
8 0.61868376 75 nips-2013-Convex Two-Layer Modeling
9 0.61390346 64 nips-2013-Compete to Compute
10 0.60085142 5 nips-2013-A Deep Architecture for Matching Short Texts
11 0.58937174 160 nips-2013-Learning Stochastic Feedforward Neural Networks
12 0.57220066 27 nips-2013-Adaptive Multi-Column Deep Neural Networks with Application to Robust Image Denoising
13 0.56300205 30 nips-2013-Adaptive dropout for training deep neural networks
14 0.53229189 36 nips-2013-Annealing between distributions by averaging moments
15 0.49453935 163 nips-2013-Learning a Deep Compact Image Representation for Visual Tracking
16 0.49290478 260 nips-2013-RNADE: The real-valued neural autoregressive density-estimator
17 0.46303231 84 nips-2013-Deep Neural Networks for Object Detection
18 0.45628944 127 nips-2013-Generalized Denoising Auto-Encoders as Generative Models
19 0.42784867 321 nips-2013-Supervised Sparse Analysis and Synthesis Operators
20 0.42257085 81 nips-2013-DeViSE: A Deep Visual-Semantic Embedding Model
topicId topicWeight
[(2, 0.017), (16, 0.035), (33, 0.191), (34, 0.091), (41, 0.023), (49, 0.097), (51, 0.147), (56, 0.073), (70, 0.066), (85, 0.027), (89, 0.049), (93, 0.082)]
simIndex simValue paperId paperTitle
same-paper 1 0.89137316 331 nips-2013-Top-Down Regularization of Deep Belief Networks
Author: Hanlin Goh, Nicolas Thome, Matthieu Cord, Joo-Hwee Lim
Abstract: Designing a principled and effective algorithm for learning deep architectures is a challenging problem. The current approach involves two training phases: a fully unsupervised learning followed by a strongly discriminative optimization. We suggest a deep learning strategy that bridges the gap between the two phases, resulting in a three-phase learning procedure. We propose to implement the scheme using a method to regularize deep belief networks with top-down information. The network is constructed from building blocks of restricted Boltzmann machines learned by combining bottom-up and top-down sampled signals. A global optimization procedure that merges samples from a forward bottom-up pass and a top-down pass is used. Experiments on the MNIST dataset show improvements over the existing algorithms for deep belief networks. Object recognition results on the Caltech-101 dataset also yield competitive results. 1
2 0.87725163 86 nips-2013-Demixing odors - fast inference in olfaction
Author: Agnieszka Grabska-Barwinska, Jeff Beck, Alexandre Pouget, Peter Latham
Abstract: The olfactory system faces a difficult inference problem: it has to determine what odors are present based on the distributed activation of its receptor neurons. Here we derive neural implementations of two approximate inference algorithms that could be used by the brain. One is a variational algorithm (which builds on the work of Beck. et al., 2012), the other is based on sampling. Importantly, we use a more realistic prior distribution over odors than has been used in the past: we use a “spike and slab” prior, for which most odors have zero concentration. After mapping the two algorithms onto neural dynamics, we find that both can infer correct odors in less than 100 ms. Thus, at the behavioral level, the two algorithms make very similar predictions. However, they make different assumptions about connectivity and neural computations, and make different predictions about neural activity. Thus, they should be distinguishable experimentally. If so, that would provide insight into the mechanisms employed by the olfactory system, and, because the two algorithms use very different coding strategies, that would also provide insight into how networks represent probabilities. 1
3 0.84055889 64 nips-2013-Compete to Compute
Author: Rupesh K. Srivastava, Jonathan Masci, Sohrob Kazerounian, Faustino Gomez, Jürgen Schmidhuber
Abstract: Local competition among neighboring neurons is common in biological neural networks (NNs). In this paper, we apply the concept to gradient-based, backprop-trained artificial multilayer NNs. NNs with competing linear units tend to outperform those with non-competing nonlinear units, and avoid catastrophic forgetting when training sets change over time. 1
4 0.83881795 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
Author: Nataliya Shapovalova, Michalis Raptis, Leonid Sigal, Greg Mori
Abstract: We propose a weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video. As part of the proposed approach, we develop a generalization of the Max-Path search algorithm which allows us to efficiently search over a structured space of multiple spatio-temporal paths while also incorporating context information into the model. Instead of using spatial annotations in the form of bounding boxes to guide the latent model during training, we utilize human gaze data in the form of a weak supervisory signal. This is achieved by incorporating eye gaze, along with the classification, into the structured loss within the latent SVM learning framework. Experiments on a challenging benchmark dataset, UCF-Sports, show that our model is more accurate, in terms of classification, and achieves state-of-the-art results in localization. In addition, our model can produce top-down saliency maps conditioned on the classification label and localized latent paths. 1
5 0.8343395 121 nips-2013-Firing rate predictions in optimal balanced networks
Author: David G. Barrett, Sophie Denève, Christian K. Machens
Abstract: How are firing rates in a spiking network related to neural input, connectivity and network function? This is an important problem because firing rates are a key measure of network activity, in both the study of neural computation and neural network dynamics. However, it is a difficult problem, because the spiking mechanism of individual neurons is highly non-linear, and these individual neurons interact strongly through connectivity. We develop a new technique for calculating firing rates in optimal balanced networks. These are particularly interesting networks because they provide an optimal spike-based signal representation while producing cortex-like spiking activity through a dynamic balance of excitation and inhibition. We can calculate firing rates by treating balanced network dynamics as an algorithm for optimising signal representation. We identify this algorithm and then calculate firing rates by finding the solution to the algorithm. Our firing rate calculation relates network firing rates directly to network input, connectivity and function. This allows us to explain the function and underlying mechanism of tuning curves in a variety of systems. 1
6 0.82805163 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis
7 0.82313585 301 nips-2013-Sparse Additive Text Models with Low Rank Background
8 0.82303023 183 nips-2013-Mapping paradigm ontologies to and from the brain
9 0.82083035 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions
10 0.82077444 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
11 0.8207702 30 nips-2013-Adaptive dropout for training deep neural networks
12 0.82073146 345 nips-2013-Variance Reduction for Stochastic Gradient Optimization
13 0.82044744 334 nips-2013-Training and Analysing Deep Recurrent Neural Networks
14 0.82019496 276 nips-2013-Reshaping Visual Datasets for Domain Adaptation
15 0.82003421 141 nips-2013-Inferring neural population dynamics from multiple partial recordings of the same neural circuit
16 0.81935281 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
17 0.81871504 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
18 0.81640714 163 nips-2013-Learning a Deep Compact Image Representation for Visual Tracking
19 0.81496322 251 nips-2013-Predicting Parameters in Deep Learning
20 0.81412995 200 nips-2013-Multi-Prediction Deep Boltzmann Machines