nips nips2009 nips2009-132 knowledge-graph by maker-knowledge-mining

132 nips-2009-Learning in Markov Random Fields using Tempered Transitions


Source: pdf

Author: Ruslan Salakhutdinov

Abstract: Markov random fields (MRF’s), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRF’s is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of the Robbins-Monro type that use Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely-connected MRF’s. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data that perform well on digit and object recognition tasks.

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 In this paper we consider a class of stochastic approximation algorithms of the Robbins-Monro type that use Markov chain Monte Carlo to do approximate maximum likelihood learning. [sent-4, score-0.285]

2 We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely-connected MRF’s. [sent-5, score-1.41]

3 Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data that perform well on digit and object recognition tasks. [sent-6, score-0.09]

4 When modeling high-dimensional, richly structured data, the inference problem becomes much more difficult because the distribution we need to infer is likely to be highly multimodal [17]. [sent-11, score-0.145]

5 The main problem here is the inability of Markov chains to efficiently explore distributions with many isolated modes. [sent-18, score-0.11]

6 1 In this paper we concentrate on the class of stochastic approximation algorithms of the RobbinsMonro type that use MCMC to estimate the model’s expected sufficient statistics, needed for maximum likelihood learning. [sent-19, score-0.225]

7 We first show that using this class of algorithms allows us to make very rapid progress towards finding a fairly good set of parameters, even for models containing millions of parameters. [sent-20, score-0.097]

8 Second, we show that using MCMC operators based on tempered transitions [9] enables the stochastic algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates, particularly in large, densely-connected MRF’s. [sent-21, score-1.365]

9 Our results on the MNIST and NORB datasets demonstrate that the stochastic approximation algorithm together with tempered transitions can be successfully used to model high-dimensional real-world distributions. [sent-22, score-1.186]

10 Except for simple models such as the tree structured graphs exact maximum likelihood learning is intractable, because exact computation of the expectation Ep(x;θ) [·] takes time that is exponential in the treewidth of the graph1. [sent-30, score-0.108]

11 Another approach, called the MCMC maximum likelihood estimator (MCMC-MLE) [3], has been shown to sometimes provide considerably better results than PL [3, 20]. [sent-34, score-0.151]

12 Consider running a Markov chain to obtain samples x(1) , x(2) , . [sent-36, score-0.121]

13 Randomly initialize θ1 and M sample particles {x1,1 , . [sent-44, score-0.141]

14 2: for t = 1 : T (number of iterations) do 3: for m = 1 : M (number of parallel Markov chains) do 4: Sample xt+1,m given xt,m using transition operator Tθt (xt+1,m ← xt,m ). [sent-49, score-0.118]

15 This implies that as the number of samples n, drawn from our proposal distributions, goes to infinity, MCMC-MLE will converge to the true maximum likelihood estimator. [sent-57, score-0.155]

16 In high-dimensional spaces, the variance of an estimator Ln (θ) will be very large, or possibly infinite, unless the proposal distribution p(x; ψ) is a near-perfect approximation to p(x; θ). [sent-59, score-0.103]

17 3 Stochastic Approximation Procedure (SAP) We now consider a stochastic approximation procedure that uses MCMC to estimate the model’s expected sufficient statistics. [sent-61, score-0.14]

18 SAP belongs to the general class of well-studied stochastic approximation algorithms of the Robbins-Monro type [19, 12]. [sent-62, score-0.14]

19 The algorithm itself dates back to 1988 [19], but only recently it has been shown to work surprisingly well when training large MRF’s, including restricted Boltzmann machines [15] and deep Boltzmann machines [14, 13]. [sent-63, score-0.137]

20 Then the state and the parameters are updated sequentially: (7) θt+1 = θt + αt φ(x0 ) − φ(xt+1 ) , where xt+1 ∼ Tθt (xt+1 ← xt ). [sent-66, score-0.12]

21 Given xt , we sample a new state xt+1 using the transition operator Tθt (xt+1 ← xt ) that leaves p(·; θt ) invariant. [sent-67, score-0.341]

22 One important property of this algorithm is that just like MCMC-MLE, it can be shown to asymptotically converge to the maximum likelihood estimator θ∗ . [sent-76, score-0.112]

23 3 In particular, for fully visible discrete 1 MRF’s, if one uses a Gibbs transition operator and the learning rate is set to αt = (t+1)U , where U a. [sent-77, score-0.179]

24 for s = S − 1 : 0 (Forward pass) do Sample xs given xs+1 using Ts (xs ← xs+1 ). [sent-95, score-0.101]

25 for s = 0 : S − 1 (Backward pass) do e x ˜ ˜ ˜ Sample xs+1 given xs using Ts (˜ s+1 ← xs ). [sent-97, score-0.202]

26 Intuitively, as the learning rate becomes sufficiently small compared to the mixing rate of the Markov chain, the chain will stay close to the stationary distribution, even if it is only run for a few MCMC steps per parameter update. [sent-104, score-0.114]

27 However, as the algorithm begins to capture the multimodality of the data distribution, the Markov chain tends to mix poorly, producing highly correlated samples for successive parameter updates. [sent-107, score-0.15]

28 The main problem here is the inability of the Markov chain to efficiently explore a distribution with many isolated modes. [sent-109, score-0.139]

29 However, the transition operators Tθt (xt+1 ← xt ) used in the stochastic approximation algorithm do not necessarily need to be simple Gibbs or Metropolis-Hastings updates to guarantee almost sure convergence. [sent-110, score-0.461]

30 Instead, we propose to use MCMC operators based on tempered transitions [9] that can more efficiently explore highly multimodal distributions. [sent-111, score-1.231]

31 In addition, implementing tempered transitions requires very little extra work beyond the implementation of the Gibbs sampler. [sent-112, score-1.067]

32 , S−1 we define a transition operator Ts (x′ ← x) that leaves ps invariant. [sent-126, score-0.179]

33 We also need to define a reverse transition operator Ts (x ← x′ ) that satisfies the following reversibility condition for all x and x′ : ps (x)Ts (x′ ← x) = Ts (x ← x′ )ps (x′ ). [sent-128, score-0.211]

34 Non-reversible operators are usually composed of several reversible sub-transitions applied in sequence Ts = Q1 . [sent-131, score-0.099]

35 QK , such as the single component updates in a Gibbs sampler. [sent-134, score-0.104]

36 The reverse operator can be simply constructed from the same sub-transitions, but applied in the reverse order Ts = QK . [sent-135, score-0.105]

37 Given the current state x of the Markov chain, tempered transitions apply a sequence of transition operators TS−1 . [sent-139, score-1.195]

38 The x-axis show the number of Gibbs updates and the y-axis displays the training log-probability in nats. [sent-163, score-0.135]

39 Bottom: Classification performance of the semi-restricted Boltzmann machines with 500 hidden units on the full MNIST datasets. [sent-164, score-0.151]

40 Tempered transitions can make major changes to the current state, which allows the Markov chain to produce less correlated samples between successive parameter updates. [sent-168, score-0.426]

41 We therefore propose to alternate between applying a more expensive tempered transitions operator and the standard Gibbs updates. [sent-170, score-1.087]

42 The number of sample particles used for estimating the model’s expected sufficient statistics was also set to 100. [sent-174, score-0.141]

43 For the stochastic approximation algorithm, we always apply a single Gibbs update to the sample particles. [sent-175, score-0.175]

44 1 MNIST The MNIST digit dataset contains 60,000 training and 10,000 test images of ten handwritten digits (0 to 9), with 28×28 pixels. [sent-179, score-0.132]

45 From the training data, a random sample of 10,000 images was set aside for validation. [sent-181, score-0.126]

46 An RBM is a particular type of Markov random field that has a two-layer architecture, in which the visible binary stochastic units x are connected to hidden binary stochastic units h, as shown in Fig. [sent-183, score-0.509]

47 θi xi + θij xi hj + exp  Z(θ) j i i,j h 5 Samples before Tempered Transitions Samples after Tempered Transitions Model Samples Figure 2: Left: Sample particles produced by the stochastic approximation algorithm after 100,000 parameter updates. [sent-186, score-0.246]

48 Middle: Sample particles after applying a tempered transitions run. [sent-187, score-1.152]

49 Right: Samples generated from the current model by randomly initializing all binary states and running the Gibbs sampler for 500,000 steps. [sent-188, score-0.103]

50 After applying tempered transitions, sample particles look more like the samples generated from the current model. [sent-189, score-0.899]

51 The images shown are the probabilities of the visible units given the binary states of the hidden units. [sent-190, score-0.25]

52 This allowed us to calculate the exact value of the partition function simply by summing out the 784 visible units for each configuration of the hiddens. [sent-192, score-0.166]

53 For the stochastic approximation procedure, the total number of parameter updates was 100,000, so the learning took about 25. [sent-193, score-0.266]

54 For comparison, we also trained the same model using exact maximum likelihood with exactly the same learning schedule. [sent-198, score-0.136]

55 Perhaps surprisingly, SAP makes very rapid progress towards the maximum likelihood solution, even though the model contains 8634 free parameters. [sent-199, score-0.156]

56 1 further shows that combining regular Gibbs updates with tempered transitions provides a more accurate estimator. [sent-201, score-1.15]

57 We applied tempered transitions only during the last 50,000 Gibbs steps, alternating between 200 Gibbs updates and a single tempered transitions run that used 50 β’s spaced uniformly from 1 to 0. [sent-202, score-2.251]

58 The acceptance rate for the tempered transitions was about 0. [sent-204, score-1.104]

59 For SAP, parameters were updated after each Gibbs step (see Algorithm 1), whereas for Trans-SAP, parameters were updated after each Gibbs update but not during the tempered transitions run4 . [sent-207, score-1.046]

60 Pseudo-likelihood and MCMC maximum likelihood estimators perform quite poorly, even for this small toy problem. [sent-209, score-0.112]

61 In contrast to RBM’s, the visible units in this model form a fully connected pairwise binary MRF (see Fig. [sent-211, score-0.164]

62 The model had 500 hidden units and was trained to model the joint probability distribution over the digit images and labels. [sent-213, score-0.246]

63 The total number of Gibbs updates was set to 200,000, so the learning took about 19. [sent-214, score-0.126]

64 As in our previous experiment, tempered transitions were only applied during the last 100,000 Gibbs updates, alternating between 1000 Gibbs updates and a single tempered transitions run that used 500 β’s spaced uniformly from 1 to 0. [sent-222, score-2.251]

65 To estimate the models’ partition functions, we used Annealed Importance Sampling [10, 13] – a technique that is very similar to tempered transitions. [sent-227, score-0.75]

66 The plain stochastic approximation algorithm achieved an average test log-probability of -87. [sent-228, score-0.218]

67 4 This reduced the total number of parameter updates from 100, 000 to 50, 000 + 50, 000 ∗ 2/3 = 83, 333. [sent-231, score-0.104]

68 6 Training Samples Model trained with Tempered Transitions Model trained without Tempered Transitions Figure 3: Results on the NORB dataset. [sent-232, score-0.102]

69 Samples generated from the two RBM models, trained using SAP with (Middle) and without (Right) tempered transitions. [sent-234, score-0.77]

70 To get an intuitive picture of how tempered transitions operate, we looked at the sample particles before and after applying a tempered transitions run. [sent-236, score-2.233]

71 Figure 2 shows sample particles after 100,000 parameter updates. [sent-237, score-0.141]

72 Observe that the particles look like the real handwritten digits. [sent-238, score-0.138]

73 However, a run of tempered transitions reveals that the current model is very unbalanced, with more probability mass placed on images of four. [sent-239, score-1.081]

74 To further test whether the “refreshed” particles were representative of the current model, we generated samples from the current model by randomly initializing binary states of the visible and hidden units, and running the Gibbs sampler for 500,000 steps. [sent-240, score-0.359]

75 Clearly, the refreshed particles look more like the samples generated from the true model. [sent-241, score-0.182]

76 2 NORB Results on MNIST show that the stochastic approximation algorithm works well on the relatively simple task of handwritten digit recognition. [sent-244, score-0.206]

77 The training set contains 24,300 stereo image pairs of 25 objects, whereas the test set contains 24,300 stereo pairs of the remaining, different 25 objects. [sent-247, score-0.103]

78 We also augmented the training data with additional unlabeled data by applying simple pixel translations, creating a total of 1,166,400 training instances. [sent-252, score-0.094]

79 To deal with raw pixel data, we followed the approach of [8] by first learning a Gaussian-binary RBM with 4000 hidden units, and then treating the the activities of its hidden layer as “preprocessed” data. [sent-253, score-0.155]

80 The model was trained using contrastive divergence learning for 500 epochs. [sent-254, score-0.1]

81 The learned low-level RBM effectively acts as a preprocessor that transforms greyscale images into 4000-dimensional binary vectors, which we use as the input for training our models. [sent-255, score-0.169]

82 We proceeded to training an RBM with 4000 hidden units using binary representations learned by the preprocessor module6 . [sent-256, score-0.223]

83 The total number of Gibbs updates was set to 400,000. [sent-258, score-0.104]

84 Similar to the previous experiments, tempered transitions were applied during the last 200,000 Gibbs updates, alternating between 1000 Gibbs updates and a single tempered transitions run that used 1000 β’s spaced uniformly from 1 to 0. [sent-261, score-2.251]

85 7 Figure 3 shows samples generated from two models, trained using stochastic approximation with and without tempered transitions. [sent-265, score-0.949]

86 The plain stochastic approximation algorithm produced a very unbalanced model with a large fraction of the model’s probability mass placed on images of humans. [sent-267, score-0.253]

87 Using tempered transitions allowed us to learn a better and more balanced generative model, including the lighting effects. [sent-268, score-1.069]

88 We also tested the classification performance of both models simply by fitting a logistic regression model to the labeled data (using only the 24300 labeled training examples without any translations) using the top-level hidden activities as inputs. [sent-272, score-0.103]

89 The model trained by SAP achieved an error rate of 8. [sent-273, score-0.103]

90 5 Conclusions We have presented a class of stochastic approximation algorithms of the Robbins-Monro type that can be used to efficiently learn parameters in large densely-connected MRF’s. [sent-280, score-0.14]

91 Using MCMC operators based on tempered transitions allows the stochastic approximation algorithm to better explore highly multimodal distributions, which in turn allows us to learn good generative models of handwritten digits and 3D objects in a reasonable amount of computer time. [sent-281, score-1.403]

92 In this paper we have concentrated only on using tempered transition operators. [sent-282, score-0.77]

93 There exist a variety of other methods for sampling from distributions with many isolated modes, including simulated tempering [7] and parallel tempering [3], all of which can be incorporated into SAP. [sent-283, score-0.265]

94 In particular, the concurrent work of [2] employs parallel tempering techniques to imrpove mixing in RBM’s. [sent-284, score-0.116]

95 There are, however, several advantages of using tempered transitions over other existing methods. [sent-285, score-1.046]

96 First, tempered transitions do not require specifying any extra variables, such as the approximate values of normalizing constants of intermediate distributions, which are needed for the simulated tempering method. [sent-286, score-1.23]

97 Second, tempered transitions have modest memory requirements, unlike, for example, parallel tempering, since the acceptance rule can be computed on the fly as the intermediate states are generated. [sent-287, score-1.151]

98 Finally, the implementation of tempered transitions requires almost no extra work beyond implementing the Gibbs sampler, and can be easily integrated into existing code. [sent-288, score-1.067]

99 Tempered Markov chain Monte Carlo for training of restricted Boltzmann machines. [sent-301, score-0.119]

100 Training restricted Boltzmann machines using approximations to the likelihood gradient. [sent-366, score-0.127]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tempered', 0.719), ('transitions', 0.327), ('mrf', 0.186), ('sap', 0.168), ('gibbs', 0.165), ('boltzmann', 0.127), ('rbm', 0.126), ('particles', 0.106), ('norb', 0.105), ('updates', 0.104), ('xs', 0.101), ('stochastic', 0.095), ('xt', 0.094), ('tempering', 0.09), ('ps', 0.087), ('mcmc', 0.076), ('mnist', 0.076), ('units', 0.075), ('operators', 0.072), ('ep', 0.07), ('multimodal', 0.068), ('ts', 0.064), ('visible', 0.06), ('chain', 0.06), ('annealed', 0.056), ('markov', 0.056), ('plain', 0.053), ('likelihood', 0.052), ('trained', 0.051), ('transition', 0.051), ('hidden', 0.051), ('contrastive', 0.049), ('intermediate', 0.048), ('approximation', 0.045), ('operator', 0.041), ('considerably', 0.039), ('samples', 0.039), ('rapid', 0.038), ('classified', 0.037), ('gibbsian', 0.037), ('greyscale', 0.037), ('preprocessor', 0.037), ('refreshed', 0.037), ('stereo', 0.036), ('images', 0.035), ('sample', 0.035), ('digit', 0.034), ('progress', 0.033), ('spaced', 0.033), ('maximum', 0.033), ('nair', 0.033), ('richly', 0.033), ('pixel', 0.032), ('handwritten', 0.032), ('sampler', 0.032), ('reverse', 0.032), ('proposal', 0.031), ('distributions', 0.031), ('partition', 0.031), ('acceptance', 0.031), ('xk', 0.031), ('training', 0.031), ('multimodality', 0.03), ('poorly', 0.029), ('binary', 0.029), ('ln', 0.028), ('deep', 0.028), ('isolated', 0.028), ('restricted', 0.028), ('estimator', 0.027), ('toy', 0.027), ('rate', 0.027), ('inability', 0.027), ('reversible', 0.027), ('parallel', 0.026), ('state', 0.026), ('millions', 0.026), ('normalizing', 0.025), ('unbalanced', 0.025), ('aside', 0.025), ('achieved', 0.025), ('machines', 0.025), ('carlo', 0.024), ('monte', 0.024), ('explore', 0.024), ('kept', 0.024), ('translations', 0.023), ('structured', 0.023), ('lighting', 0.023), ('pl', 0.023), ('took', 0.022), ('running', 0.022), ('panel', 0.022), ('alternating', 0.022), ('approximations', 0.022), ('extra', 0.021), ('highly', 0.021), ('activities', 0.021), ('salakhutdinov', 0.021), ('initializing', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

Author: Ruslan Salakhutdinov

Abstract: Markov random fields (MRF’s), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRF’s is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of the Robbins-Monro type that use Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely-connected MRF’s. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data that perform well on digit and object recognition tasks.

2 0.18801914 2 nips-2009-3D Object Recognition with Deep Belief Nets

Author: Vinod Nair, Geoffrey E. Hinton

Abstract: We introduce a new type of top-level model for Deep Belief Nets and evaluate it on a 3D object recognition task. The top-level model is a third-order Boltzmann machine, trained using a hybrid algorithm that combines both generative and discriminative gradients. Performance is evaluated on the NORB database (normalized-uniform version), which contains stereo-pair images of objects under different lighting conditions and viewpoints. Our model achieves 6.5% error on the test set, which is close to the best published result for NORB (5.9%) using a convolutional neural net that has built-in knowledge of translation invariance. It substantially outperforms shallow models such as SVMs (11.6%). DBNs are especially suited for semi-supervised learning, and to demonstrate this we consider a modified version of the NORB recognition task in which additional unlabeled images are created by applying small translations to the images in the database. With the extra unlabeled data (and the same amount of labeled data as before), our model achieves 5.2% error. 1

3 0.14565077 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

Author: Peter Carbonetto, Matthew King, Firas Hamze

Abstract: We describe a new algorithmic framework for inference in probabilistic models, and apply it to inference for latent Dirichlet allocation (LDA). Our framework adopts the methodology of variational inference, but unlike existing variational methods such as mean field and expectation propagation it is not restricted to tractable classes of approximating distributions. Our approach can also be viewed as a “population-based” sequential Monte Carlo (SMC) method, but unlike existing SMC methods there is no need to design the artificial sequence of distributions. Significantly, our framework offers a principled means to exchange the variance of an importance sampling estimate for the bias incurred through variational approximation. We conduct experiments on a difficult inference problem in population genetics, a problem that is related to inference for LDA. The results of these experiments suggest that our method can offer improvements in stability and accuracy over existing methods, and at a comparable cost. 1

4 0.11652269 187 nips-2009-Particle-based Variational Inference for Continuous Systems

Author: Andrew Frank, Padhraic Smyth, Alexander T. Ihler

Abstract: Since the development of loopy belief propagation, there has been considerable work on advancing the state of the art for approximate inference over distributions defined on discrete random variables. Improvements include guarantees of convergence, approximations that are provably more accurate, and bounds on the results of exact inference. However, extending these methods to continuous-valued systems has lagged behind. While several methods have been developed to use belief propagation on systems with continuous values, recent advances for discrete variables have not as yet been incorporated. In this context we extend a recently proposed particle-based belief propagation algorithm to provide a general framework for adapting discrete message-passing algorithms to inference in continuous systems. The resulting algorithms behave similarly to their purely discrete counterparts, extending the benefits of these more advanced inference techniques to the continuous domain. 1

5 0.10595332 141 nips-2009-Local Rules for Global MAP: When Do They Work ?

Author: Kyomin Jung, Pushmeet Kohli, Devavrat Shah

Abstract: We consider the question of computing Maximum A Posteriori (MAP) assignment in an arbitrary pair-wise Markov Random Field (MRF). We present a randomized iterative algorithm based on simple local updates. The algorithm, starting with an arbitrary initial assignment, updates it in each iteration by first, picking a random node, then selecting an (appropriately chosen) random local neighborhood and optimizing over this local neighborhood. Somewhat surprisingly, we show that this algorithm finds a near optimal assignment within n log 2 n iterations with high probability for any n node pair-wise MRF with geometry (i.e. MRF graph with polynomial growth) with the approximation error depending on (in a reasonable manner) the geometric growth rate of the graph and the average radius of the local neighborhood – this allows for a graceful tradeoff between the complexity of the algorithm and the approximation error. Through extensive simulations, we show that our algorithm finds extremely good approximate solutions for various kinds of MRFs with geometry.

6 0.087885782 204 nips-2009-Replicated Softmax: an Undirected Topic Model

7 0.085998967 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

8 0.083547555 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

9 0.081976332 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains

10 0.081807643 101 nips-2009-Generalization Errors and Learning Curves for Regression with Multi-task Gaussian Processes

11 0.080059201 226 nips-2009-Spatial Normalized Gamma Processes

12 0.073634028 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

13 0.072718978 137 nips-2009-Learning transport operators for image manifolds

14 0.072127938 123 nips-2009-Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process

15 0.063051634 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

16 0.060581297 154 nips-2009-Modeling the spacing effect in sequential category learning

17 0.05928728 219 nips-2009-Slow, Decorrelated Features for Pretraining Complex Cell-like Networks

18 0.057286527 151 nips-2009-Measuring Invariances in Deep Networks

19 0.057151366 94 nips-2009-Fast Learning from Non-i.i.d. Observations

20 0.057080977 242 nips-2009-The Infinite Partially Observable Markov Decision Process


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.182), (1, -0.015), (2, 0.021), (3, -0.094), (4, 0.056), (5, -0.017), (6, 0.085), (7, 0.093), (8, -0.056), (9, -0.062), (10, 0.003), (11, 0.097), (12, -0.138), (13, 0.018), (14, 0.109), (15, -0.017), (16, -0.04), (17, -0.125), (18, 0.087), (19, -0.109), (20, -0.049), (21, 0.022), (22, 0.038), (23, 0.049), (24, 0.087), (25, -0.138), (26, 0.013), (27, 0.074), (28, -0.062), (29, 0.065), (30, 0.029), (31, -0.013), (32, -0.058), (33, -0.115), (34, -0.004), (35, 0.044), (36, 0.07), (37, 0.016), (38, -0.044), (39, -0.049), (40, -0.146), (41, 0.06), (42, -0.068), (43, -0.064), (44, -0.148), (45, 0.01), (46, 0.033), (47, -0.015), (48, 0.061), (49, 0.114)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9252851 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

Author: Ruslan Salakhutdinov

Abstract: Markov random fields (MRF’s), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRF’s is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of the Robbins-Monro type that use Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely-connected MRF’s. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data that perform well on digit and object recognition tasks.

2 0.62521917 187 nips-2009-Particle-based Variational Inference for Continuous Systems

Author: Andrew Frank, Padhraic Smyth, Alexander T. Ihler

Abstract: Since the development of loopy belief propagation, there has been considerable work on advancing the state of the art for approximate inference over distributions defined on discrete random variables. Improvements include guarantees of convergence, approximations that are provably more accurate, and bounds on the results of exact inference. However, extending these methods to continuous-valued systems has lagged behind. While several methods have been developed to use belief propagation on systems with continuous values, recent advances for discrete variables have not as yet been incorporated. In this context we extend a recently proposed particle-based belief propagation algorithm to provide a general framework for adapting discrete message-passing algorithms to inference in continuous systems. The resulting algorithms behave similarly to their purely discrete counterparts, extending the benefits of these more advanced inference techniques to the continuous domain. 1

3 0.60405958 2 nips-2009-3D Object Recognition with Deep Belief Nets

Author: Vinod Nair, Geoffrey E. Hinton

Abstract: We introduce a new type of top-level model for Deep Belief Nets and evaluate it on a 3D object recognition task. The top-level model is a third-order Boltzmann machine, trained using a hybrid algorithm that combines both generative and discriminative gradients. Performance is evaluated on the NORB database (normalized-uniform version), which contains stereo-pair images of objects under different lighting conditions and viewpoints. Our model achieves 6.5% error on the test set, which is close to the best published result for NORB (5.9%) using a convolutional neural net that has built-in knowledge of translation invariance. It substantially outperforms shallow models such as SVMs (11.6%). DBNs are especially suited for semi-supervised learning, and to demonstrate this we consider a modified version of the NORB recognition task in which additional unlabeled images are created by applying small translations to the images in the database. With the extra unlabeled data (and the same amount of labeled data as before), our model achieves 5.2% error. 1

4 0.5569697 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

Author: Samuel Gershman, Ed Vul, Joshua B. Tenenbaum

Abstract: While many perceptual and cognitive phenomena are well described in terms of Bayesian inference, the necessary computations are intractable at the scale of realworld tasks, and it remains unclear how the human mind approximates Bayesian computations algorithmically. We explore the proposal that for some tasks, humans use a form of Markov Chain Monte Carlo to approximate the posterior distribution over hidden variables. As a case study, we show how several phenomena of perceptual multistability can be explained as MCMC inference in simple graphical models for low-level vision. 1

5 0.54268628 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

Author: Peter Carbonetto, Matthew King, Firas Hamze

Abstract: We describe a new algorithmic framework for inference in probabilistic models, and apply it to inference for latent Dirichlet allocation (LDA). Our framework adopts the methodology of variational inference, but unlike existing variational methods such as mean field and expectation propagation it is not restricted to tractable classes of approximating distributions. Our approach can also be viewed as a “population-based” sequential Monte Carlo (SMC) method, but unlike existing SMC methods there is no need to design the artificial sequence of distributions. Significantly, our framework offers a principled means to exchange the variance of an importance sampling estimate for the bias incurred through variational approximation. We conduct experiments on a difficult inference problem in population genetics, a problem that is related to inference for LDA. The results of these experiments suggest that our method can offer improvements in stability and accuracy over existing methods, and at a comparable cost. 1

6 0.46469194 97 nips-2009-Free energy score space

7 0.45709029 39 nips-2009-Bayesian Belief Polarization

8 0.43563303 123 nips-2009-Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process

9 0.42525977 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains

10 0.40365502 226 nips-2009-Spatial Normalized Gamma Processes

11 0.40171096 204 nips-2009-Replicated Softmax: an Undirected Topic Model

12 0.3974728 197 nips-2009-Randomized Pruning: Efficiently Calculating Expectations in Large Dynamic Programs

13 0.39375785 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

14 0.36372679 141 nips-2009-Local Rules for Global MAP: When Do They Work ?

15 0.35984325 186 nips-2009-Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units

16 0.35923493 219 nips-2009-Slow, Decorrelated Features for Pretraining Complex Cell-like Networks

17 0.35638568 94 nips-2009-Fast Learning from Non-i.i.d. Observations

18 0.35580903 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

19 0.34457922 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference

20 0.34149793 189 nips-2009-Periodic Step Size Adaptation for Single Pass On-line Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.165), (12, 0.049), (24, 0.049), (25, 0.075), (35, 0.074), (36, 0.116), (39, 0.046), (58, 0.065), (61, 0.02), (66, 0.014), (71, 0.09), (81, 0.036), (86, 0.086), (91, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.86363298 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

Author: Ruslan Salakhutdinov

Abstract: Markov random fields (MRF’s), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRF’s is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of the Robbins-Monro type that use Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely-connected MRF’s. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data that perform well on digit and object recognition tasks.

2 0.77253437 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

Author: Khashayar Rohanimanesh, Sameer Singh, Andrew McCallum, Michael J. Black

Abstract: Large, relational factor graphs with structure defined by first-order logic or other languages give rise to notoriously difficult inference problems. Because unrolling the structure necessary to represent distributions over all hypotheses has exponential blow-up, solutions are often derived from MCMC. However, because of limitations in the design and parameterization of the jump function, these samplingbased methods suffer from local minima—the system must transition through lower-scoring configurations before arriving at a better MAP solution. This paper presents a new method of explicitly selecting fruitful downward jumps by leveraging reinforcement learning (RL). Rather than setting parameters to maximize the likelihood of the training data, parameters of the factor graph are treated as a log-linear function approximator and learned with methods of temporal difference (TD); MAP inference is performed by executing the resulting policy on held out test data. Our method allows efficient gradient updates since only factors in the neighborhood of variables affected by an action need to be computed—we bypass the need to compute marginals entirely. Our method yields dramatic empirical success, producing new state-of-the-art results on a complex joint model of ontology alignment, with a 48% reduction in error over state-of-the-art in that domain. 1

3 0.76998591 2 nips-2009-3D Object Recognition with Deep Belief Nets

Author: Vinod Nair, Geoffrey E. Hinton

Abstract: We introduce a new type of top-level model for Deep Belief Nets and evaluate it on a 3D object recognition task. The top-level model is a third-order Boltzmann machine, trained using a hybrid algorithm that combines both generative and discriminative gradients. Performance is evaluated on the NORB database (normalized-uniform version), which contains stereo-pair images of objects under different lighting conditions and viewpoints. Our model achieves 6.5% error on the test set, which is close to the best published result for NORB (5.9%) using a convolutional neural net that has built-in knowledge of translation invariance. It substantially outperforms shallow models such as SVMs (11.6%). DBNs are especially suited for semi-supervised learning, and to demonstrate this we consider a modified version of the NORB recognition task in which additional unlabeled images are created by applying small translations to the images in the database. With the extra unlabeled data (and the same amount of labeled data as before), our model achieves 5.2% error. 1

4 0.76907587 230 nips-2009-Statistical Consistency of Top-k Ranking

Author: Fen Xia, Tie-yan Liu, Hang Li

Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects, however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. This framework can include the permutationlevel ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions. 1

5 0.76663399 260 nips-2009-Zero-shot Learning with Semantic Output Codes

Author: Mark Palatucci, Dean Pomerleau, Geoffrey E. Hinton, Tom M. Mitchell

Abstract: We consider the problem of zero-shot learning, where the goal is to learn a classifier f : X → Y that must predict novel values of Y that were omitted from the training set. To achieve this, we define the notion of a semantic output code classifier (SOC) which utilizes a knowledge base of semantic properties of Y to extrapolate to novel classes. We provide a formalism for this type of classifier and study its theoretical properties in a PAC framework, showing conditions under which the classifier can accurately predict novel classes. As a case study, we build a SOC classifier for a neural decoding task and show that it can often predict words that people are thinking about from functional magnetic resonance images (fMRI) of their neural activity, even without training examples for those words. 1

6 0.76612437 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

7 0.7640177 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

8 0.76303971 97 nips-2009-Free energy score space

9 0.7621718 187 nips-2009-Particle-based Variational Inference for Continuous Systems

10 0.76182616 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

11 0.76168925 129 nips-2009-Learning a Small Mixture of Trees

12 0.75966024 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

13 0.75725114 204 nips-2009-Replicated Softmax: an Undirected Topic Model

14 0.75671613 113 nips-2009-Improving Existing Fault Recovery Policies

15 0.7561506 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

16 0.75404668 226 nips-2009-Spatial Normalized Gamma Processes

17 0.75310522 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

18 0.75257462 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

19 0.75233567 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability

20 0.75219637 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA