
349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space


Source: pdf

Author: Lucas Theis, Jascha Sohl-Dickstein, Matthias Bethge

Abstract: We present a new learning strategy based on an efficient blocked Gibbs sampler for sparse overcomplete linear models. Particular emphasis is placed on statistical image modeling, where overcomplete models have played an important role in discovering sparse representations. Our Gibbs sampler is faster than general purpose sampling schemes while also requiring no tuning as it is free of parameters. Using the Gibbs sampler and a persistent variant of expectation maximization, we are able to extract highly sparse distributions over latent sources from data. When applied to natural images, our algorithm learns source distributions which resemble spike-and-slab distributions. We evaluate the likelihood and quantitatively compare the performance of the overcomplete linear model to its complete counterpart as well as a product of experts model, which represents another overcomplete generalization of the complete linear model. In contrast to previous claims, we find that overcomplete representations lead to significant improvements, but that the overcomplete linear model still underperforms other models. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Training sparse natural image models with a fast Gibbs sampler of an extended state space. Jascha Sohl-Dickstein, Redwood Center for Theoretical Neuroscience, jascha@berkeley. [sent-1, score-0.351]

2 Abstract: We present a new learning strategy based on an efficient blocked Gibbs sampler for sparse overcomplete linear models. [sent-4, score-1.01]

3 Particular emphasis is placed on statistical image modeling, where overcomplete models have played an important role in discovering sparse representations. [sent-5, score-0.612]

4 Our Gibbs sampler is faster than general purpose sampling schemes while also requiring no tuning as it is free of parameters. [sent-6, score-0.303]

5 Using the Gibbs sampler and a persistent variant of expectation maximization, we are able to extract highly sparse distributions over latent sources from data. [sent-7, score-0.549]

6 When applied to natural images, our algorithm learns source distributions which resemble spike-and-slab distributions. [sent-8, score-0.266]

7 We evaluate the likelihood and quantitatively compare the performance of the overcomplete linear model to its complete counterpart as well as a product of experts model, which represents another overcomplete generalization of the complete linear model. [sent-9, score-1.234]

8 In contrast to previous claims, we find that overcomplete representations lead to significant improvements, but that the overcomplete linear model still underperforms other models. [sent-10, score-1.117]

9 1 Introduction Here we study learning and inference in the overcomplete linear model given by $x = As$, $p(s) = \prod_i f_i(s_i)$ (1), where $A \in \mathbb{R}^{M \times N}$, $N \geq M$, and each marginal source distribution $f_i$ may depend on additional parameters. [sent-11, score-0.891]
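
As an illustration of the generative process in Equation 1, the following sketch draws sparse sources and maps them through an overcomplete basis. The dimensions, the random basis and the Laplace marginal are placeholders for illustration, not the settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

M, N = 64, 128                                  # observed dimension and number of sources, N >= M
A = rng.standard_normal((M, N)) / np.sqrt(N)    # placeholder overcomplete basis

# Factorial, sparse prior over sources; a Laplace marginal is one common choice.
s = rng.laplace(loc=0.0, scale=1.0, size=N)

# Noiseless observation: x = A s
x = A @ s
```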

10 Most of the literature on overcomplete linear models assumes observations corrupted by additive Gaussian noise, that is, x = As + ε for a Gaussian distributed random variable ε. [sent-13, score-0.529]

11 When the observations are image patches, the source distributions fi (si ) are typically assumed to be sparse or leptokurtotic [e. [sent-15, score-0.502]

12 Figure 1: A: In the noiseless overcomplete linear model, the posterior distribution over hidden sources s lives on a linear subspace. [sent-19, score-0.958]

13 For sparse source distributions, the posterior will generally be heavy-tailed and multimodal, as can be seen on the right. [sent-21, score-0.333]

14 B: A graphical model representation of the overcomplete linear model extended by two sets of auxiliary variables (Equations 2 and 3). [sent-22, score-0.529]

15 We perform blocked Gibbs sampling between λ and z to sample from the posterior distribution over all latent variables given an observation x. [sent-23, score-0.452]

16 For a given λ, the posterior over z becomes Gaussian, while for a given z, the posterior over λ becomes factorial and is thus easy to sample from. [sent-24, score-0.261]

17 A large family of leptokurtotic distributions which also contains the aforementioned distributions as a special case is formed by Gaussian scale mixtures (GSMs), $f_i(s_i) = \int_0^\infty g_i(\lambda_i)\, \mathcal{N}(s_i;\, 0, \lambda_i^{-1})\, d\lambda_i$ (2), where $g_i(\lambda_i)$ is a univariate density over precisions $\lambda_i$. [sent-25, score-0.406]
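
A finite version of Equation 2 is simply a mixture of zero-mean Gaussians indexed by precision. The sketch below evaluates and samples such a finite GSM; the particular precisions and weights are made up for illustration.

```python
import numpy as np
from scipy.stats import norm

def finite_gsm_pdf(s, precisions, weights):
    """Density of a finite Gaussian scale mixture: sum_k pi_k N(s; 0, 1/theta_k)."""
    s = np.atleast_1d(s)[:, None]
    scales = 1.0 / np.sqrt(np.asarray(precisions))[None, :]
    return (np.asarray(weights)[None, :] * norm.pdf(s, loc=0.0, scale=scales)).sum(axis=1)

def finite_gsm_sample(n, precisions, weights, rng):
    """Sample by first picking a precision, then drawing a Gaussian value."""
    k = rng.choice(len(precisions), size=n, p=weights)
    return rng.normal(0.0, 1.0 / np.sqrt(np.asarray(precisions)[k]))

rng = np.random.default_rng(1)
theta = np.array([0.1, 1.0, 10.0, 100.0])   # hypothetical precisions
pi = np.full(4, 0.25)                       # equal prior weights
samples = finite_gsm_sample(10_000, theta, pi, rng)
```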

18 In the following, we will concentrate on linear models whose marginal source distributions can be represented as GSMs. [sent-26, score-0.289]

19 In particular, the posterior distribution over sources p(s | x) is constrained to a linear subspace and can have multiple modes with heavy tails (Figure 1A). [sent-29, score-0.263]

20 Inference can be simplified by assuming additive Gaussian noise, constraining the source distributions to be log-concave or making crude approximations to the posterior. [sent-30, score-0.273]

21 On this account, we use Markov chain Monte Carlo (MCMC) methods to obtain samples with which we represent the posterior distribution. [sent-32, score-0.207]

22 In the following, we will describe an efficient blocked Gibbs sampler which exploits the specific structure of the sparse linear model. [sent-38, score-0.53]

23 2 Sampling and inference In this section, we first review the nullspace sampling algorithm of Chen and Wu [4], which solves the problem of sampling from a linear subspace in the noiseless case of the overcomplete linear model. [sent-39, score-1.093]

24 We then introduce an additional set of auxiliary variables which leads to an efficient blocked Gibbs sampler. [sent-40, score-0.234]

25 2.1 Nullspace sampling The basic idea behind the nullspace sampling algorithm is to extend the overcomplete linear model by an additional set of variables z which essentially makes it complete (Figure 1B), $\begin{bmatrix} x \\ z \end{bmatrix} = \begin{bmatrix} A \\ B \end{bmatrix} s$ (3), where $B \in \mathbb{R}^{(N-M) \times N}$ and square brackets denote concatenation. [sent-42, score-1.038]

26 If the rows of A and B are orthogonal, $AB^\top = 0$, or, in other words, B spans the nullspace of A, we have $s = A^+ x + B^+ z$ (4), where $A^+$ and $B^+$ are the pseudoinverses [24] of A and B, respectively. [sent-44, score-0.223]

27 An orthogonal basis spanning the nullspace of A can be obtained from A’s singular value decomposition [4]. [sent-46, score-0.3]
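
A minimal sketch of this construction: obtain B from the SVD of A so that its rows span the nullspace, then recover s from (x, z) via the pseudoinverses in Equation 4. It assumes A has full row rank and is not the authors' code.

```python
import numpy as np

def nullspace_basis(A):
    """Rows of B span the nullspace of A, so that A @ B.T is (numerically) zero."""
    M, _ = A.shape
    _, _, Vt = np.linalg.svd(A, full_matrices=True)
    return Vt[M:]                      # the last N - M right singular vectors

rng = np.random.default_rng(2)
M, N = 4, 6
A = rng.standard_normal((M, N))
B = nullspace_basis(A)
assert np.allclose(A @ B.T, 0.0)       # rows of A and B are orthogonal

# Recover s from the complete representation (Equation 4): s = A+ x + B+ z
s = rng.standard_normal(N)
x, z = A @ s, B @ s
s_rec = np.linalg.pinv(A) @ x + np.linalg.pinv(B) @ z
assert np.allclose(s_rec, s)
```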

28 2.2 Blocked Gibbs sampling The fact that the marginals $f_i(s_i)$ are expressed as Gaussian mixtures (Equation 2) can be used to derive an efficient blocked Gibbs sampler. [sent-51, score-0.535]

29 The Gibbs sampler alternately samples nullspace representations z and precisions of the source marginals λ. [sent-52, score-0.913]

30 The key observation here is that given the precisions λ, the distribution over x and z becomes Gaussian which makes sampling from the posterior distribution tractable. [sent-53, score-0.356]

31 A similar idea was pursued by Olshausen and Millman [21], who modeled the source distributions with mixtures of Gaussians and conditionally Gibbs sampled precisions one by one. [sent-54, score-0.45]

32 In contrast, here we update all precision variables in parallel by conditioning on the nullspace representation z. [sent-56, score-0.223]

33 Using a finite number of precisions $\vartheta_{ik}$ with prior probabilities $\pi_{ik}$, for example, the posterior probability of $\lambda_i$ being $\vartheta_{ij}$ becomes $p(\lambda_i = \vartheta_{ij} \mid x, z) = \frac{\mathcal{N}(s_i;\, 0, \vartheta_{ij}^{-1})\, \pi_{ij}}{\sum_k \mathcal{N}(s_i;\, 0, \vartheta_{ik}^{-1})\, \pi_{ik}}$. [sent-59, score-0.331]
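
Given the current sources s (determined by x and z), the conditional over each precision is the discrete distribution above and can be computed for all sources in parallel. Below is a sketch under the finite-GSM parameterization; the variable names are placeholders.

```python
import numpy as np

def precision_posterior(s, thetas, pis):
    """p(lambda_i = theta_ij | x, z) ∝ N(s_i; 0, 1/theta_ij) * pi_ij, for all i at once.

    s:      (N,)   current source values
    thetas: (N, K) candidate precisions per source
    pis:    (N, K) prior weights per source
    """
    # log N(s; 0, 1/theta) up to a constant that cancels in the normalization
    log_post = 0.5 * np.log(thetas) - 0.5 * thetas * s[:, None] ** 2 + np.log(pis)
    log_post -= log_post.max(axis=1, keepdims=True)        # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def sample_precisions(post, thetas, rng):
    """Draw one precision per source from its categorical posterior."""
    cum = np.cumsum(post, axis=1)
    u = rng.random((post.shape[0], 1))
    idx = np.minimum((u > cum).sum(axis=1), post.shape[1] - 1)
    return thetas[np.arange(thetas.shape[0]), idx]
```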

34 We use the following computationally efficient method to conditionally sample Gaussian distributions [8, 14]: draw $(\tilde{x}, \tilde{z}) \sim \mathcal{N}(0, \Sigma)$ and set $z = \tilde{z} + \Sigma_{zx} \Sigma_{xx}^{-1} (x - \tilde{x})$. [sent-63, score-0.187]
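
In this model, Σ would be the joint covariance of (x, z) implied by Equation 3 and the sampled precisions, i.e. [A; B] diag(λ)⁻¹ [A; B]ᵀ. The sketch below implements only the generic conditioning trick for a zero-mean joint Gaussian; it is a reading of the formula above, not the paper's implementation.

```python
import numpy as np

def sample_z_given_x(x, Sigma, M, rng):
    """Sample z ~ p(z | x) for (x, z) ~ N(0, Sigma) without forming p(z | x) explicitly.

    Draw (x~, z~) ~ N(0, Sigma), then return z = z~ + Sigma_zx Sigma_xx^{-1} (x - x~).
    """
    joint = rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma)
    x_tilde, z_tilde = joint[:M], joint[M:]
    Sigma_xx = Sigma[:M, :M]
    Sigma_zx = Sigma[M:, :M]
    return z_tilde + Sigma_zx @ np.linalg.solve(Sigma_xx, x - x_tilde)
```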

35 Together, equations 7 and 9 implement a rapidly mixing blocked Gibbs sampler. [sent-65, score-0.234]

36 We empirically show in the results section that for natural image patches the benefits of blocked Gibbs sampling outweigh its computational costs. [sent-67, score-0.507]

37 A closely related sampling algorithm was proposed by Park and Casella [23] for implementing Bayesian inference in the linear regression model with Laplace prior. [sent-68, score-0.19]

38 The main differences here are that we also consider the noiseless case by exploiting the nullspace representation, that instead of using a fixed Laplace prior we will use the sampler to learn the distribution over source variables, and that we apply the algorithm in the context of image modeling. [sent-69, score-0.706]

39 3 Learning In the following, we describe a learning strategy for the overcomplete linear model based on the idea of persistent Markov chains [26, 32, 36], which has already led to improved learning strategies for a number of different models [e. [sent-72, score-0.713]

40 Following Girolami [11] and others, we use expectation maximization (EM) [7] to maximize the likelihood of the overcomplete linear model. [sent-75, score-0.529]

41 Instead of a variational approximation, here we use the blocked Gibbs sampler to sample a hidden state z for every data point x in the E-step. [sent-76, score-0.461]

42 Despite our efforts to make sampling efficient, running the Markov chain until convergence can still be a costly operation due to the generally large number of data points and high dimensionality of posterior samples. [sent-79, score-0.317]

43 To further reduce computational costs, we developed a learning strategy which makes use of persistent Markov chains and only requires a few sampling steps in every iteration. [sent-80, score-0.294]
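
Organized as code, the persistent strategy keeps one chain state per data point and advances it by only a few Gibbs sweeps per EM iteration. The skeleton below is a sketch of that control flow; gibbs_step and m_step stand in for the model-specific updates and are assumptions, not the authors' implementation.

```python
import numpy as np

def persistent_mcem(X, Z0, gibbs_step, m_step, params, n_iters=100, gibbs_steps=2):
    """Persistent Monte Carlo EM.

    X:          (D, M) data points
    Z0:         (D, N-M) initial nullspace states, one persistent chain per data point
    gibbs_step: callable (x, z, params) -> new z   (one blocked Gibbs sweep)
    m_step:     callable (X, Z, params) -> updated params
    """
    Z = Z0.copy()
    for _ in range(n_iters):
        # E-step: advance each persistent chain by a few Gibbs sweeps only
        for d in range(X.shape[0]):
            for _ in range(gibbs_steps):
                Z[d] = gibbs_step(X[d], Z[d], params)
        # M-step: update parameters on the current complete data (X, Z)
        params = m_step(X, Z, params)
    return params, Z
```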

44 Second, if updating the Markov chain has only a small effect on the posterior samples z, the distribution of the complete data (x, z) will also change very little. [sent-85, score-0.273]

45 This causes an inefficient Markov chain to automatically slow down the learning process, so that the posterior samples will always be close to the stationary distribution. [sent-87, score-0.207]

46 Interestingly, improving the lower bound F with respect to q can be accomplished by driving the Markov chain with our Gibbs sampler or some other transition operator [26]. [sent-93, score-0.292]

47 Figure 2 (panels: image model and toy model; x-axes: time in seconds): A: The average energy of posterior samples for different sampling methods after deterministic initialization. [sent-101, score-0.284]

48 This stands in contrast to other contexts where persistent Markov chains have been successful but training can diverge [10]. [sent-110, score-0.184]

49 4 Results We trained several linear models on log-transformed, centered and symmetrically whitened image patches extracted from van Hateren’s dataset of natural images [33]. [sent-113, score-0.274]

50 We explicitly modeled the DC component of the whitened image patches using a mixture of Gaussians and constrained the remaining components of the linear basis to be orthogonal to the DC component. [sent-114, score-0.321]

51 For faster convergence, we initialized the linear basis with the sparse coding algorithm of Olshausen and Field [19], which corresponds to learning with MAP inference and fixed marginal source distributions. [sent-115, score-0.382]

52 After initialization, we optimized the basis using L-BFGS [3] during each M-step and updated the representation of the posterior using 2 steps of Gibbs sampling in each E-step. [sent-116, score-0.295]

53 To represent the source marginals, we used finite GSMs (Equation 8) with 10 precisions $\vartheta_{ij}$ each and equal prior weights, that is, $\pi_{ij} = 0.1$. [sent-117, score-0.309]

54 The source marginals were initialized by fitting them to samples from the Laplace distribution and later optimized using 10 iterations of standard EM at the beginning of each M-step. [sent-119, score-0.271]

55 4.1 Performance of the blocked Gibbs sampler We compared the sampling performance of our Gibbs sampler to MALA sampling—as used by Chen and Wu [4]—as well as HMC sampling [9], which is a generalization of MALA. [sent-121, score-0.84]

56 The HMC sampler has two parameters: a step width and a number of so-called leapfrog steps. [sent-122, score-0.193]

57 Figure 3 (x-axis: basis coefficient i): We trained models with up to four times overcomplete representations using either Laplace marginals or GSM marginals. [sent-128, score-0.604]

58 A four times overcomplete basis set is shown in the center. [sent-129, score-0.557]

59 Basis vectors were normalized so that the corresponding source distributions had unit variance. [sent-130, score-0.24]

60 The right panel shows log-densities of the source distributions corresponding to basis vectors inside the dashed rectangle. [sent-134, score-0.317]

61 The algorithms were tested on one toy model and one two times overcomplete model trained on 8 × 8 image patches. [sent-136, score-0.602]

62 Figure 2 shows trace plots and autocorrelation functions for the different sampling methods. [sent-141, score-0.236]

63 Autocorrelation functions were estimated from single Markov chain runs of equal duration for each sampler and data point. [sent-143, score-0.258]
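
For reference, an autocorrelation function like those in Figure 2 can be estimated from a single chain with a generic estimator such as the one below, applied to a scalar statistic of the samples (e.g. the energy); this is not necessarily the exact estimator used in the paper.

```python
import numpy as np

def autocorrelation(trace, max_lag):
    """Normalized autocorrelation of a scalar chain statistic at lags 0..max_lag."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / (len(x) * var)
                     for k in range(max_lag + 1)])
```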

64 All Markov chains were initialized using 100 burn-in steps of Gibbs sampling, independent of the sampler used to generate the autocorrelation functions. [sent-144, score-0.377]

65 Still, even the best HMC sampler produced more correlated samples than the blocked Gibbs sampler. [sent-148, score-0.461]

66 While the best HMC sampler reached an autocorrelation of 0.05 after about 64 seconds, it took only about 26 seconds with the blocked Gibbs sampler (right-hand side of Figure 2B). [sent-149, score-0.319] [sent-150, score-0.427]

68 [2] found that even for very sparse choices of the Student-t prior, the representations learned by the linear model are barely overcomplete if a variational approximation to the posterior is used. [sent-155, score-0.784]

69 Consistent with the study of Seeger [28], if we fix the source distributions to be Laplacian, our algorithm learns representations which are only slightly overcomplete (Figure 3). [sent-160, score-0.778]

70 However, much more overcomplete representations were obtained when the source distributions were learned from the data. [sent-161, score-0.778]

71 This is in line with the results of Olshausen and Millman [21]. [Figure 4: log-likelihood ± SEM in bit/pixel, shown for 16 × 16 and 8 × 8 image patches.] [sent-162, score-0.319]

72 While using overcomplete representations (OLM) yields substantial improvements over the complete linear model (LM), it still cannot compete with other models of natural image patches. [sent-189, score-0.757]

73 Olshausen and Millman [21] used mixtures of two and three Gaussians as source distributions and obtained two times overcomplete representations for 8 × 8 image patches. [sent-193, score-0.856]

74 Figure 3 suggests that with GSMs as source distributions, the model can make use of three and up to four times overcomplete representations. [sent-194, score-0.651]

75 Our quantitative evaluations confirmed a substantial improvement of the two-times overcomplete model over the complete model. [sent-195, score-0.546]

76 The source distributions discovered by our algorithm were extremely sparse and resembled spike-and-slab distributions, generating mostly values close to zero with the occasional outlier. [sent-197, score-0.294]

77 4.3 Model comparison To compare the performance of the overcomplete linear model to the complete linear model and other image models, we would like to evaluate the overcomplete linear models' log-likelihood on a test set of images. [sent-200, score-1.251]

78 We can then estimate the integral in Equation 13 by sampling states $z_n$ from $q(z \mid x)$ and averaging over $p(x, z_n)/q(z_n \mid x)$, a technique called importance sampling. [sent-203, score-0.205]
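
The basic (non-annealed) version of this estimator is a log-average of importance weights; below is a sketch working in log space for stability. The callables sample_q, log_q and log_joint are placeholders for a proposal constructed, e.g., with AIS as described next.

```python
import numpy as np
from scipy.special import logsumexp

def log_likelihood_is(x, sample_q, log_q, log_joint, n_samples, rng):
    """Estimate log p(x) = log E_{q(z|x)}[ p(x, z) / q(z | x) ] by importance sampling.

    sample_q:  callable (x, rng) -> z drawn from q(z | x)
    log_q:     callable (z, x)   -> log q(z | x)
    log_joint: callable (x, z)   -> log p(x, z)
    """
    log_w = np.array([log_joint(x, z) - log_q(z, x)
                      for z in (sample_q(x, rng) for _ in range(n_samples))])
    return logsumexp(log_w) - np.log(n_samples)
```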

79 A procedure for constructing distributions q(z | x) from transition operators such as our Gibbs sampling operator is annealed importance sampling (AIS) [16]. [sent-205, score-0.404]

80 Here, we used our Gibbs sampler and constructed intermediate distributions by interpolating between a Gaussian distribution and the overcomplete linear model. [sent-208, score-0.791]

81 For the four-times overcomplete model, we used 300 intermediate distributions and 300 importance samples to estimate the density of each data point. [sent-209, score-0.612]

82 We find that the overcomplete linear model is still worse than, for example, a single multivariate GSM with separately modeled DC component (Figure 4; see also Supplementary Section 3). [sent-210, score-0.556]

83 An alternative overcomplete generalization of the complete linear model is the family of products of experts (PoE) [13]. [sent-211, score-0.639]

84 Instead of introducing additional source variables, a PoE can have more factors than visible units, $s = Wx$, $p(x) \propto \prod_i f_i(s_i)$ (14), where $W \in \mathbb{R}^{N \times M}$ and each factor is also called an expert. [sent-212, score-0.284]

85 In contrast to the overcomplete linear model, the prior over hidden sources s here is in general not factorial. [sent-214, score-0.641]

86 A popular choice of PoE in the context of natural images is the product of Student-t (PoT) distributions, in which experts have the form $f_i(s_i) = (1 + s_i^2)^{-\alpha_i}$ [35]. [sent-215, score-0.181]
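
Up to an intractable normalization constant, the PoT log-density therefore has the simple form below; W and alpha are placeholder parameters, not learned values.

```python
import numpy as np

def pot_log_unnorm(x, W, alpha):
    """Unnormalized log-density of a product of Student-t experts:
    log p(x) = -sum_i alpha_i * log(1 + (w_i^T x)^2) + const.
    """
    s = W @ x                       # filter responses, W has one row per expert
    return -np.sum(alpha * np.log1p(s ** 2))
```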

87 We find that the PoT is better suited for modeling the statistics of natural images and takes better advantage of overcomplete representations (Figure 4). [sent-218, score-0.595]

88 While both the estimator for the PoT and the estimator for the overcomplete linear model are consistent, the former tends to overestimate and the latter tends to underestimate the average log-likelihood. [sent-219, score-0.529]

89 5 Discussion We have shown how to efficiently perform inference, training and evaluation in the sparse overcomplete linear model. [sent-221, score-0.583]

90 While general purpose sampling algorithms such as MALA or HMC have the advantage of being more widely applicable, we showed that blocked Gibbs sampling can be much faster when the source distributions are sparse, as for natural images. [sent-222, score-0.72]

91 Another advantage of our sampler is that it is parameter free. [sent-223, score-0.193]

92 Choosing suboptimal parameters for the HMC sampler can lead to extremely poor performance. [sent-224, score-0.193]

93 We showed that by training a model with a persistent variant of Monte Carlo EM, even the number of sampling steps performed in each E-step becomes much less crucial for the success of training. [sent-227, score-0.265]

94 Optimizing and evaluating the likelihood of overcomplete linear models is a challenging problem. [sent-228, score-0.529]

95 To our knowledge, our study is the first to show a clear advantage of the overcomplete linear model over its complete counterpart on natural images. [sent-229, score-0.621]

96 At the same time, we demonstrated that with the assumption of a factorial prior, the overcomplete linear model underperforms other generalizations of the complete linear model. [sent-230, score-0.739]

97 Code for training and evaluating overcomplete linear models is available at http://bethgelab. [sent-233, score-0.529]

98 A variational method for learning sparse and overcomplete representations. [sent-305, score-0.534]

99 Sparse coding with an overcomplete basis set: A strategy employed by V1? [sent-356, score-0.557]

100 Hamiltonian annealed importance sampling for partition function estimation, 2012. [sent-409, score-0.191]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('overcomplete', 0.48), ('hmc', 0.234), ('blocked', 0.234), ('gibbs', 0.228), ('nullspace', 0.223), ('sampler', 0.193), ('gsm', 0.182), ('mala', 0.175), ('source', 0.171), ('pot', 0.149), ('precisions', 0.138), ('xz', 0.131), ('persistent', 0.126), ('autocorrelation', 0.126), ('sampling', 0.11), ('posterior', 0.108), ('laplace', 0.103), ('gsms', 0.099), ('poe', 0.099), ('olshausen', 0.096), ('overcompleteness', 0.095), ('fi', 0.08), ('sources', 0.078), ('image', 0.078), ('basis', 0.077), ('papandreou', 0.074), ('em', 0.074), ('si', 0.071), ('distributions', 0.069), ('dkl', 0.066), ('complete', 0.066), ('marginals', 0.066), ('chain', 0.065), ('markov', 0.062), ('ais', 0.061), ('patches', 0.059), ('xx', 0.059), ('representations', 0.058), ('chains', 0.058), ('sparse', 0.054), ('annealed', 0.052), ('jascha', 0.05), ('leptokurtotic', 0.05), ('millman', 0.05), ('olm', 0.05), ('reichardt', 0.05), ('underperforms', 0.05), ('linear', 0.049), ('gaussian', 0.047), ('ik', 0.046), ('mixtures', 0.045), ('factorial', 0.045), ('experts', 0.044), ('toy', 0.044), ('sem', 0.044), ('andrews', 0.044), ('integrative', 0.044), ('werner', 0.044), ('dc', 0.043), ('mcmc', 0.041), ('noiseless', 0.041), ('ij', 0.039), ('heess', 0.038), ('lucas', 0.038), ('hateren', 0.038), ('langevin', 0.038), ('gi', 0.037), ('gaussians', 0.036), ('wu', 0.035), ('barely', 0.035), ('hamiltonian', 0.035), ('transition', 0.034), ('monte', 0.034), ('ow', 0.034), ('convergence', 0.034), ('hidden', 0.034), ('samples', 0.034), ('zn', 0.033), ('crude', 0.033), ('zz', 0.033), ('visible', 0.033), ('lm', 0.032), ('berkes', 0.032), ('energy', 0.032), ('images', 0.031), ('whitened', 0.031), ('inference', 0.031), ('alternately', 0.03), ('seeger', 0.029), ('equation', 0.029), ('importance', 0.029), ('variant', 0.029), ('carlo', 0.029), ('matthias', 0.029), ('tails', 0.028), ('chen', 0.028), ('schmidt', 0.027), ('maxima', 0.027), ('modeled', 0.027), ('natural', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999946 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space

Author: Lucas Theis, Jascha Sohl-Dickstein, Matthias Bethge

Abstract: We present a new learning strategy based on an efficient blocked Gibbs sampler for sparse overcomplete linear models. Particular emphasis is placed on statistical image modeling, where overcomplete models have played an important role in discovering sparse representations. Our Gibbs sampler is faster than general purpose sampling schemes while also requiring no tuning as it is free of parameters. Using the Gibbs sampler and a persistent variant of expectation maximization, we are able to extract highly sparse distributions over latent sources from data. When applied to natural images, our algorithm learns source distributions which resemble spike-and-slab distributions. We evaluate the likelihood and quantitatively compare the performance of the overcomplete linear model to its complete counterpart as well as a product of experts model, which represents another overcomplete generalization of the complete linear model. In contrast to previous claims, we find that overcomplete representations lead to significant improvements, but that the overcomplete linear model still underperforms other models. 1

2 0.22229247 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

Author: Yichuan Zhang, Zoubin Ghahramani, Amos J. Storkey, Charles A. Sutton

Abstract: Continuous relaxations play an important role in discrete optimization, but have not seen much use in approximate probabilistic inference. Here we show that a general form of the Gaussian Integral Trick makes it possible to transform a wide class of discrete variable undirected models into fully continuous systems. The continuous representation allows the use of gradient-based Hamiltonian Monte Carlo for inference, results in new ways of estimating normalization constants (partition functions), and in general opens up a number of new avenues for inference in difficult discrete systems. We demonstrate some of these continuous relaxation inference algorithms on a number of illustrative problems. 1

3 0.13628662 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

Author: Deepak Venugopal, Vibhav Gogate

Abstract: First-order probabilistic models combine the power of first-order logic, the de facto tool for handling relational structure, with probabilistic graphical models, the de facto tool for handling uncertainty. Lifted probabilistic inference algorithms for them have been the subject of much recent research. The main idea in these algorithms is to improve the accuracy and scalability of existing graphical models’ inference algorithms by exploiting symmetry in the first-order representation. In this paper, we consider blocked Gibbs sampling, an advanced MCMC scheme, and lift it to the first-order level. We propose to achieve this by partitioning the first-order atoms in the model into a set of disjoint clusters such that exact lifted inference is polynomial in each cluster given an assignment to all other atoms not in the cluster. We propose an approach for constructing the clusters and show how it can be used to trade accuracy with computational complexity in a principled manner. Our experimental evaluation shows that lifted Gibbs sampling is superior to the propositional algorithm in terms of accuracy, scalability and convergence.

4 0.12733965 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

Author: Christoph H. Lampert

Abstract: We study the problem of maximum marginal prediction (MMP) in probabilistic graphical models, a task that occurs, for example, as the Bayes optimal decision rule under a Hamming loss. MMP is typically performed as a two-stage procedure: one estimates each variable’s marginal probability and then forms a prediction from the states of maximal probability. In this work we propose a simple yet effective technique for accelerating MMP when inference is sampling-based: instead of the above two-stage procedure we directly estimate the posterior probability of each decision variable. This allows us to identify the point of time when we are sufficiently certain about any individual decision. Whenever this is the case, we dynamically prune the variables we are confident about from the underlying factor graph. Consequently, at any time only samples of variables whose decision is still uncertain need to be created. Experiments in two prototypical scenarios, multi-label classification and image inpainting, show that adaptive sampling can drastically accelerate MMP without sacrificing prediction accuracy. 1

5 0.12039321 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

Author: Daniel Zoran, Yair Weiss

Abstract: Simple Gaussian Mixture Models (GMMs) learned from pixels of natural image patches have been recently shown to be surprisingly strong performers in modeling the statistics of natural images. Here we provide an in depth analysis of this simple yet rich model. We show that such a GMM model is able to compete with even the most successful models of natural images in log likelihood scores, denoising performance and sample quality. We provide an analysis of what such a model learns from natural images as a function of number of mixture components including covariance structure, contrast variation and intricate structures such as textures, boundaries and more. Finally, we show that the salient properties of the GMM learned from natural images can be derived from a simplified Dead Leaves model which explicitly models occlusion, explaining its surprising success relative to other models. 1 GMMs and natural image statistics models Many models for the statistics of natural image patches have been suggested in recent years. Finding good models for natural images is important to many different research areas - computer vision, biological vision and neuroscience among others. Recently, there has been a growing interest in comparing different aspects of models for natural images such as log-likelihood and multi-information reduction performance, and much progress has been achieved [1,2, 3,4,5, 6]. Out of these results there is one which is particularly interesting: simple, unconstrained Gaussian Mixture Models (GMMs) with a relatively small number of mixture components learned from image patches are extraordinarily good in modeling image statistics [6, 4]. This is a surprising result due to the simplicity of GMMs and their ubiquity. Another surprising aspect of this result is that many of the current models may be thought of as GMMs with an exponential or infinite number of components, having different constraints on the covariance structure of the mixture components. In this work we study the nature of GMMs learned from natural image patches. We start with a thorough comparison to some popular and cutting edge image models. We show that indeed, GMMs are excellent performers in modeling natural image patches. We then analyze what properties of natural images these GMMs capture, their dependence on the number of components in the mixture and their relation to the structure of the world around us. Finally, we show that the learned GMM suggests a strong connection between natural image statistics and a simple variant of the dead leaves model [7, 8], explicitly modeling occlusions and explaining some of the success of GMMs in modeling natural images.

6 0.11385672 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

7 0.11378959 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

8 0.11062238 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC

9 0.10468809 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding

10 0.10404634 126 nips-2012-FastEx: Hash Clustering with Exponential Families

11 0.093728036 41 nips-2012-Ancestor Sampling for Particle Gibbs

12 0.092792422 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

13 0.092379406 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

14 0.078245655 205 nips-2012-MCMC for continuous-time discrete-state systems

15 0.077799082 220 nips-2012-Monte Carlo Methods for Maximum Margin Supervised Topic Models

16 0.077349037 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

17 0.073125459 37 nips-2012-Affine Independent Variational Inference

18 0.072011583 60 nips-2012-Bayesian nonparametric models for ranked data

19 0.071068853 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data

20 0.068129092 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.209), (1, 0.045), (2, -0.055), (3, 0.024), (4, -0.151), (5, -0.033), (6, -0.009), (7, -0.04), (8, 0.072), (9, -0.048), (10, -0.031), (11, -0.048), (12, -0.039), (13, -0.018), (14, -0.055), (15, -0.14), (16, -0.025), (17, -0.162), (18, 0.064), (19, -0.079), (20, 0.106), (21, 0.003), (22, -0.083), (23, 0.069), (24, -0.079), (25, 0.041), (26, 0.001), (27, 0.014), (28, -0.023), (29, -0.003), (30, -0.085), (31, 0.027), (32, -0.017), (33, -0.02), (34, -0.024), (35, -0.068), (36, -0.061), (37, -0.043), (38, 0.045), (39, -0.1), (40, 0.068), (41, -0.053), (42, -0.03), (43, -0.087), (44, -0.021), (45, 0.109), (46, 0.033), (47, 0.06), (48, -0.046), (49, -0.061)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94835043 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space

Author: Lucas Theis, Jascha Sohl-Dickstein, Matthias Bethge

Abstract: We present a new learning strategy based on an efficient blocked Gibbs sampler for sparse overcomplete linear models. Particular emphasis is placed on statistical image modeling, where overcomplete models have played an important role in discovering sparse representations. Our Gibbs sampler is faster than general purpose sampling schemes while also requiring no tuning as it is free of parameters. Using the Gibbs sampler and a persistent variant of expectation maximization, we are able to extract highly sparse distributions over latent sources from data. When applied to natural images, our algorithm learns source distributions which resemble spike-and-slab distributions. We evaluate the likelihood and quantitatively compare the performance of the overcomplete linear model to its complete counterpart as well as a product of experts model, which represents another overcomplete generalization of the complete linear model. In contrast to previous claims, we find that overcomplete representations lead to significant improvements, but that the overcomplete linear model still underperforms other models. 1

2 0.81729627 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

Author: Yichuan Zhang, Zoubin Ghahramani, Amos J. Storkey, Charles A. Sutton

Abstract: Continuous relaxations play an important role in discrete optimization, but have not seen much use in approximate probabilistic inference. Here we show that a general form of the Gaussian Integral Trick makes it possible to transform a wide class of discrete variable undirected models into fully continuous systems. The continuous representation allows the use of gradient-based Hamiltonian Monte Carlo for inference, results in new ways of estimating normalization constants (partition functions), and in general opens up a number of new avenues for inference in difficult discrete systems. We demonstrate some of these continuous relaxation inference algorithms on a number of illustrative problems. 1

3 0.68717676 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

Author: Christoph H. Lampert

Abstract: We study the problem of maximum marginal prediction (MMP) in probabilistic graphical models, a task that occurs, for example, as the Bayes optimal decision rule under a Hamming loss. MMP is typically performed as a two-stage procedure: one estimates each variable’s marginal probability and then forms a prediction from the states of maximal probability. In this work we propose a simple yet effective technique for accelerating MMP when inference is sampling-based: instead of the above two-stage procedure we directly estimate the posterior probability of each decision variable. This allows us to identify the point of time when we are sufficiently certain about any individual decision. Whenever this is the case, we dynamically prune the variables we are confident about from the underlying factor graph. Consequently, at any time only samples of variables whose decision is still uncertain need to be created. Experiments in two prototypical scenarios, multi-label classification and image inpainting, show that adaptive sampling can drastically accelerate MMP without sacrificing prediction accuracy. 1

4 0.64752316 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

Author: Deepak Venugopal, Vibhav Gogate

Abstract: First-order probabilistic models combine the power of first-order logic, the de facto tool for handling relational structure, with probabilistic graphical models, the de facto tool for handling uncertainty. Lifted probabilistic inference algorithms for them have been the subject of much recent research. The main idea in these algorithms is to improve the accuracy and scalability of existing graphical models’ inference algorithms by exploiting symmetry in the first-order representation. In this paper, we consider blocked Gibbs sampling, an advanced MCMC scheme, and lift it to the first-order level. We propose to achieve this by partitioning the first-order atoms in the model into a set of disjoint clusters such that exact lifted inference is polynomial in each cluster given an assignment to all other atoms not in the cluster. We propose an approach for constructing the clusters and show how it can be used to trade accuracy with computational complexity in a principled manner. Our experimental evaluation shows that lifted Gibbs sampling is superior to the propositional algorithm in terms of accuracy, scalability and convergence.

5 0.64595103 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

Author: Maksims Volkovs, Richard S. Zemel

Abstract: Bipartite matching problems characterize many situations, ranging from ranking in information retrieval to correspondence in vision. Exact inference in realworld applications of these problems is intractable, making efficient approximation methods essential for learning and inference. In this paper we propose a novel sequential matching sampler based on a generalization of the PlackettLuce model, which can effectively make large moves in the space of matchings. This allows the sampler to match the difficult target distributions common in these problems: highly multimodal distributions with well separated modes. We present experimental results with bipartite matching problems—ranking and image correspondence—which show that the sequential matching sampler efficiently approximates the target distribution, significantly outperforming other sampling approaches. 1

6 0.64424503 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding

7 0.61612672 205 nips-2012-MCMC for continuous-time discrete-state systems

8 0.61575526 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts

9 0.61410302 136 nips-2012-Forward-Backward Activation Algorithm for Hierarchical Hidden Markov Models

10 0.5943656 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data

11 0.5912739 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

12 0.58875185 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

13 0.56298065 41 nips-2012-Ancestor Sampling for Particle Gibbs

14 0.55871999 294 nips-2012-Repulsive Mixtures

15 0.53305441 26 nips-2012-A nonparametric variable clustering model

16 0.5265066 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)

17 0.51716781 8 nips-2012-A Generative Model for Parts-based Object Segmentation

18 0.51176596 126 nips-2012-FastEx: Hash Clustering with Exponential Families

19 0.50716603 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes

20 0.49625444 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.03), (17, 0.01), (21, 0.022), (38, 0.098), (39, 0.013), (42, 0.015), (54, 0.017), (55, 0.016), (74, 0.032), (76, 0.143), (80, 0.101), (92, 0.434)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.9174484 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification

Author: Richard Socher, Brody Huval, Bharath Bath, Christopher D. Manning, Andrew Y. Ng

Abstract: Recent advances in 3D sensing technologies make it possible to easily record color and depth images which together can improve object recognition. Most current methods rely on very well-designed features for this new 3D modality. We introduce a model based on a combination of convolutional and recursive neural networks (CNN and RNN) for learning features and classifying RGB-D images. The CNN layer learns low-level translationally invariant features which are then given as inputs to multiple, fixed-tree RNNs in order to compose higher order features. RNNs can be seen as combining convolution and pooling into one efficient, hierarchical operation. Our main result is that even RNNs with random weights compose powerful features. Our model obtains state of the art performance on a standard RGB-D object dataset while being more accurate and faster during training and testing than comparable architectures such as two-layer CNNs. 1

2 0.90353876 44 nips-2012-Approximating Concavely Parameterized Optimization Problems

Author: Joachim Giesen, Jens Mueller, Soeren Laue, Sascha Swiercy

Abstract: We consider an abstract class of optimization problems that are parameterized concavely in a single parameter, and show that the solution path along the √ parameter can always be approximated with accuracy ε > 0 by a set of size O(1/ ε). A √ lower bound of size Ω(1/ ε) shows that the upper bound is tight up to a constant factor. We also devise an algorithm that calls a step-size oracle and computes an √ approximate path of size O(1/ ε). Finally, we provide an implementation of the oracle for soft-margin support vector machines, and a parameterized semi-definite program for matrix completion. 1

3 0.90059495 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

Author: Yichuan Zhang, Zoubin Ghahramani, Amos J. Storkey, Charles A. Sutton

Abstract: Continuous relaxations play an important role in discrete optimization, but have not seen much use in approximate probabilistic inference. Here we show that a general form of the Gaussian Integral Trick makes it possible to transform a wide class of discrete variable undirected models into fully continuous systems. The continuous representation allows the use of gradient-based Hamiltonian Monte Carlo for inference, results in new ways of estimating normalization constants (partition functions), and in general opens up a number of new avenues for inference in difficult discrete systems. We demonstrate some of these continuous relaxation inference algorithms on a number of illustrative problems. 1

4 0.89620245 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search

Author: Yunchao Gong, Sanjiv Kumar, Vishal Verma, Svetlana Lazebnik

Abstract: This paper focuses on the problem of learning binary codes for efficient retrieval of high-dimensional non-negative data that arises in vision and text applications where counts or frequencies are used as features. The similarity of such feature vectors is commonly measured using the cosine of the angle between them. In this work, we introduce a novel angular quantization-based binary coding (AQBC) technique for such data and analyze its properties. In its most basic form, AQBC works by mapping each non-negative feature vector onto the vertex of the binary hypercube with which it has the smallest angle. Even though the number of vertices (quantization landmarks) in this scheme grows exponentially with data dimensionality d, we propose a method for mapping feature vectors to their smallest-angle binary vertices that scales as O(d log d). Further, we propose a method for learning a linear transformation of the data to minimize the quantization error, and show that it results in improved binary codes. Experiments on image and text datasets show that the proposed AQBC method outperforms the state of the art. 1

same-paper 5 0.86975896 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space

Author: Lucas Theis, Jascha Sohl-Dickstein, Matthias Bethge

Abstract: We present a new learning strategy based on an efficient blocked Gibbs sampler for sparse overcomplete linear models. Particular emphasis is placed on statistical image modeling, where overcomplete models have played an important role in discovering sparse representations. Our Gibbs sampler is faster than general purpose sampling schemes while also requiring no tuning as it is free of parameters. Using the Gibbs sampler and a persistent variant of expectation maximization, we are able to extract highly sparse distributions over latent sources from data. When applied to natural images, our algorithm learns source distributions which resemble spike-and-slab distributions. We evaluate the likelihood and quantitatively compare the performance of the overcomplete linear model to its complete counterpart as well as a product of experts model, which represents another overcomplete generalization of the complete linear model. In contrast to previous claims, we find that overcomplete representations lead to significant improvements, but that the overcomplete linear model still underperforms other models. 1

6 0.8494221 110 nips-2012-Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

7 0.67851359 329 nips-2012-Super-Bit Locality-Sensitive Hashing

8 0.67366362 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

9 0.65887535 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

10 0.63115454 65 nips-2012-Cardinality Restricted Boltzmann Machines

11 0.62971419 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

12 0.62942839 197 nips-2012-Learning with Recursive Perceptual Representations

13 0.62809825 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

14 0.62493455 260 nips-2012-Online Sum-Product Computation Over Trees

15 0.62261176 163 nips-2012-Isotropic Hashing

16 0.61837596 94 nips-2012-Delay Compensation with Dynamical Synapses

17 0.61811864 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations

18 0.61149466 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

19 0.61024433 41 nips-2012-Ancestor Sampling for Particle Gibbs

20 0.60973865 4 nips-2012-A Better Way to Pretrain Deep Boltzmann Machines