nips nips2012 nips2012-82 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yichuan Zhang, Zoubin Ghahramani, Amos J. Storkey, Charles A. Sutton
Abstract: Continuous relaxations play an important role in discrete optimization, but have not seen much use in approximate probabilistic inference. Here we show that a general form of the Gaussian Integral Trick makes it possible to transform a wide class of discrete variable undirected models into fully continuous systems. The continuous representation allows the use of gradient-based Hamiltonian Monte Carlo for inference, results in new ways of estimating normalization constants (partition functions), and in general opens up a number of new avenues for inference in difficult discrete systems. We demonstrate some of these continuous relaxation inference algorithms on a number of illustrative problems. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Continuous relaxations play an important role in discrete optimization, but have not seen much use in approximate probabilistic inference. [sent-14, score-0.204]
2 Here we show that a general form of the Gaussian Integral Trick makes it possible to transform a wide class of discrete variable undirected models into fully continuous systems. [sent-15, score-0.34]
3 The continuous representation allows the use of gradient-based Hamiltonian Monte Carlo for inference, results in new ways of estimating normalization constants (partition functions), and in general opens up a number of new avenues for inference in difficult discrete systems. [sent-16, score-0.472]
4 We demonstrate some of these continuous relaxation inference algorithms on a number of illustrative problems. [sent-17, score-0.191]
5 Although sophisticated inference algorithms exist for these models, including both exact algorithms and variational approximations, it has proven more difficult to develop discrete Markov chain Monte Carlo (MCMC) methods. [sent-19, score-0.27]
6 Despite much work and many recent advances [3], the most commonly used MCMC methods in practice for discrete models are based on Metropolis-Hastings, the effectiveness of which is strongly dependent on the choice of proposal distribution. [sent-20, score-0.134]
7 An appealing idea is to relax the constraint that the random variables of interest take integral values. [sent-21, score-0.17]
8 Continuous problems are appealing because the gradient is on your side: Unlike discrete probability mass functions, in the continuous setting, densities have derivatives, contours, and curvature that can be used to inform sampling algorithms [6, 16, 18, 20, 27]. [sent-23, score-0.287]
9 For this reason, continuous relaxations are widespread in combinatorial optimization, and likewise a major appeal of variational methods is that they convert discrete inference problems into continuous optimization problems. [sent-24, score-0.506]
10 In this paper we provide a method for relaxing a discrete model into a continuous one, using a technique from statistical physics that Hertz et al. [sent-26, score-0.326]
11 [8] call the “Gaussian integral trick,” and that we present in a more general form than is typical. [sent-27, score-0.17]
12 This trick is also known as the Hubbard-Stratonovich transform [10]. [sent-28, score-0.282]
13 Starting with a discrete Markov random field (MRF), the trick introduces an auxiliary Gaussian variable in such a way that the discrete dependencies cancel out. [sent-29, score-0.576]
14 This allows the discrete variables to be summed away, leaving a continuous problem. [sent-30, score-0.279]
15 The continuous representation allows the use of gradient-based Hamiltonian Monte Carlo for inference, highlights an equivalence between Boltzmann machines and the Gaussian-Bernoulli harmonium model [25], and in general opens up a number of new avenues for inference in difficult discrete 1 systems. [sent-31, score-0.59]
16 On synthetic problems and a real world problem in text processing, we show that HMC in the continuous relaxation can be much more accurate than standard MCMC methods on the discrete distribution. [sent-32, score-0.293]
17 The only previous work of which we are aware that uses the Gaussian integral trick for inference in graphical models is Martens and Sutskever [12]. [sent-33, score-0.533]
18 They use the trick to transform an arbitrary MRF into an equivalent restricted Boltzmann machine (RBM), on which they then do block Gibbs sampling. [sent-34, score-0.41]
19 They show that this transformation is useful when each block Gibbs step can be performed in parallel. [sent-35, score-0.128]
20 However, unlike the current work, they do not sum out the discrete variables, so they do not perform a full continuous relaxation. [sent-36, score-0.245]
21 Every discrete undirected model can be converted into a pairwise model at the cost of expanding the state space. [sent-52, score-0.271]
22 The undirected pairwise graphical model can be written in the form p(s) = (1/Z) ∏_{(i,j)∈G} exp(−E_ij(s_i, s_j)) (1), where Z is a normalisation term given by a sum over all valid states of (s_1, s_2, ...). [sent-53, score-0.214]
23 Equivalently we can set E_ij(s_i, s_j) to be very large when s_i and s_j are derived from the same variable t_k (for some k and i ≠ j, and expanding G to include (i, j)), making the resulting product for the terms that break the 1-of-K_i constraints exponentially small. [sent-57, score-0.344]
24 The normalization function is Z = Σ_s exp(a^T s + (1/2) s^T W s). (3) [sent-60, score-0.114]
25 3 Gaussian Integral Trick Inference in Boltzmann machines (which is equivalent to inference in Ising models) has always been a challenging problem. [sent-61, score-0.12]
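For very small models the sum in (3) can be evaluated exhaustively, which is one way to obtain ground truth when judging the approximate estimators discussed later. The sketch below is an illustrative helper, not code from the paper; W and a stand for the Boltzmann machine weight matrix and bias vector.

```python
import numpy as np
from itertools import product

def brute_force_log_Z(W, a):
    """Exact log Z = log sum_s exp(a^T s + 0.5 s^T W s); feasible only for small n."""
    log_terms = []
    for bits in product([0, 1], repeat=len(a)):
        s = np.array(bits, dtype=float)
        log_terms.append(a @ s + 0.5 * s @ W @ s)
    log_terms = np.array(log_terms)
    m = log_terms.max()
    return m + np.log(np.exp(log_terms - m).sum())  # numerically stable log-sum-exp
```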
26 Typically Markov chain Monte Carlo procedures such as Gibbs sampling have been used, but the high levels of connectivity in Boltzmann machines can cause trouble and result in slow mixing in many situations. [sent-62, score-0.176]
27 Furthermore for frustrated systems, such models are highly multimodal [1], often with large potential barriers between the different modes. [sent-63, score-0.219]
28 For this reason, we choose to work with a real valued augmentation of the Boltzmann machine using the Gaussian integral trick. [sent-65, score-0.256]
29 We generalise the standard form of the Gaussian integral trick by using the following form for the conditional distribution of the auxiliary variable x: p(x|s) = N(x; A(W + D)s, A(W + D)A^T) (4), for any choice of invertible matrix A and any diagonal matrix D for which W + D is positive definite. [sent-67, score-0.521]
30 Then the marginal density is p(x) ∝ exp(−(1/2) x^T A^{-1}(W + D)^{-1}(A^{-1})^T x) ∏_i (1 + exp(α_{x;i} + a_i − d_i/2)). [sent-74, score-0.405]
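To make the relaxation concrete, the following NumPy sketch evaluates the unnormalized log marginal density and its gradient for the special case A = I (introduced below), where α_{x;i} = x_i. It is a minimal illustration under that assumption, not the authors' code; W, a and d denote the weights, biases and the diagonal shift that makes W + D positive definite.

```python
import numpy as np

def make_d(W, eps=1e-3):
    """Diagonal shift d so that W + diag(d) is positive definite."""
    lam_min = np.linalg.eigvalsh(W).min()
    return np.full(len(W), max(0.0, -lam_min) + eps)

def log_p_star(x, W, a, d):
    """Unnormalized log p(x) for A = I: Gaussian quadratic term plus softplus terms."""
    quad = -0.5 * x @ np.linalg.solve(W + np.diag(d), x)
    return quad + np.sum(np.logaddexp(0.0, x + a - d / 2.0))

def grad_log_p_star(x, W, a, d):
    """Gradient of log p(x) for A = I: -(W + D)^{-1} x + sigma(x + a - d/2)."""
    sig = 1.0 / (1.0 + np.exp(-(x + a - d / 2.0)))
    return -np.linalg.solve(W + np.diag(d), x) + sig
```

In practice one would factorize W + D once and reuse it; the repeated solves are kept here only for brevity.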
31 We have now converted the discrete distribution p(s) into a corresponding continuous distribution p(x). [sent-79, score-0.283]
32 First, all of the si are independent given x, because s appears only log-linearly in (6). [sent-81, score-0.176]
33 Using the sigmoid σ(z) = (1 + exp{−z})^{-1}, this is p(s_i|x) = σ(−α_{x;i} − a_i + d_i/2)^{1−s_i} σ(α_{x;i} + a_i − d_i/2)^{s_i} (8). Two choices for A are of particular interest because they introduce additional independence relationships into the augmented model. [sent-82, score-0.576]
34 First, if A = Λ^{-1/2} V^T for the eigendecomposition W + D = V Λ V^T, then the result is an undirected bipartite graphical model in the joint space of (x, s): p(x, s) ∝ exp(−(1/2) x^T x + s^T V Λ^{1/2} x + (a − d/2)^T s). (9) [sent-83, score-0.141]
35 This is a Gaussian-Bernoulli form of exponential family harmonium [25]. [sent-84, score-0.147]
36 Hence we see that the Gaussian-Bernoulli harmonium is equivalent to a general Boltzmann machine over the discrete variables only. [sent-85, score-0.281]
37 A given xi determines the Bernoulli probabilities for the variable si , independent of the states of any of the other variables. [sent-87, score-0.209]
38 Then, conditioned on x, the log odds of si = 1 is a recentered version of xi , in particular, xi − ai − di /2. [sent-89, score-0.42]
39 The different versions of the Gaussian integral trick can be compactly summarized by the independence relations that they introduce. [sent-90, score-0.454]
40 All versions of the Gaussian integral trick give us that all s_i and s_j are independent given x. [sent-91, score-0.696]
41 Finally if we instead take A = I, we get that si and sj are independent given only xi and xj . [sent-93, score-0.275]
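For this A = I case the conditional in (8) reduces to independent Bernoulli draws with success probability σ(x_i + a_i − d_i/2); a small hedged sketch, continuing the hypothetical helpers above:

```python
import numpy as np

def sample_s_given_x(x, a, d, rng=None):
    """Draw s ~ p(s | x) with A = I: independent Bernoulli(sigma(x_i + a_i - d_i/2))."""
    rng = np.random.default_rng() if rng is None else rng
    p1 = 1.0 / (1.0 + np.exp(-(x + a - d / 2.0)))
    return (rng.random(len(x)) < p1).astype(int)
```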
42 [Figure 1: Graphical depiction of the different versions of the Gaussian integral trick; the panels show the original MRF p(s), A = Λ^{-1/2} V^T [MS10; HKP91], general A, and A = I (the current approach).] [sent-95, score-0.204]
43 In all of the models here si ∈ {0, 1} while xi ∈ R. [sent-96, score-0.209]
44 3.1 Convexity of Log Density Because probabilistic inference is NP-hard, it is too much to expect that the continuous transformation will always help. [sent-99, score-0.191]
45 Sometimes difficult discrete distributions will be converted into difficult continuous ones. [sent-100, score-0.283]
46 Experimentally we have noticed that highly frustrated systems typically result in multimodal p(x). [sent-101, score-0.219]
47 It is H_x := ∇²_x log p(x) = C_x − (W + D)^{-1} (13), where C_x is a diagonal matrix with elements c_ii = σ(−a_i − x_i + d_i)(1 − σ(−a_i − x_i + d_i)). [sent-120, score-0.238]
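For A = I this Hessian can be formed explicitly, so local log-concavity of the relaxation can be checked numerically. The sketch below follows the A = I parameterization used in the earlier snippets (so the sigmoid argument is x_i + a_i − d_i/2) and is illustrative only.

```python
import numpy as np

def hessian_log_p(x, W, a, d):
    """H_x = C_x - (W + D)^{-1} (cf. eq. 13), with C_x = diag(sigma * (1 - sigma))."""
    sig = 1.0 / (1.0 + np.exp(-(x + a - d / 2.0)))
    return np.diag(sig * (1.0 - sig)) - np.linalg.inv(W + np.diag(d))

def locally_log_concave(x, W, a, d, tol=1e-10):
    """True where the Hessian of log p(x) is negative semidefinite."""
    return bool(np.all(np.linalg.eigvalsh(hessian_log_p(x, W, a, d)) <= tol))
```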
48 3.2 MCMC in the Continuous Relaxation Now we discuss how to perform inference in the augmented distribution resulting from the trick. [sent-130, score-0.124]
49 Therefore one can sample the joint distribution p(x, s) in a block Gibbs style that switches sampling between p(x|s) and p(s|x). [sent-133, score-0.17]
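A minimal sketch of that block Gibbs scheme for A = I, where p(x | s) = N((W + D)s, W + D); it reuses sample_s_given_x from the earlier sketch and is an illustration rather than the implementation used in the paper.

```python
import numpy as np

def block_gibbs(W, a, d, n_iter=1000, rng=None):
    """Alternate x ~ p(x | s) and s ~ p(s | x) in the augmented model (A = I)."""
    rng = np.random.default_rng() if rng is None else rng
    Sigma = W + np.diag(d)                  # covariance of x given s
    L = np.linalg.cholesky(Sigma)
    s = rng.integers(0, 2, size=len(a))
    xs = []
    for _ in range(n_iter):
        x = Sigma @ s + L @ rng.standard_normal(len(a))   # draw x | s
        s = sample_s_given_x(x, a, d, rng)                # draw s | x
        xs.append(x)
    return np.array(xs)
```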
50 In spite of the simplicity of this method, it has the potential difficulty that it may generate highly correlated samples, due to the coupling between discrete and continuous samples. [sent-134, score-0.281]
51 To overcome the drawbacks of block Gibbs sampling, we propose running MCMC directly on the marginal p(x). [sent-135, score-0.229]
52 We refer to the use of HMC on p(x) as discrete Hamiltonian Monte Carlo (DHMC). [sent-139, score-0.134]
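A compact leapfrog HMC update on the relaxed density, reusing log_p_star and grad_log_p_star from the earlier sketch; the step size and number of leapfrog steps here are placeholders (the experiments below tune them empirically).

```python
import numpy as np

def hmc_step(x, W, a, d, step=0.1, n_leap=5, rng=None):
    """One HMC update targeting p(x) on the continuous relaxation."""
    rng = np.random.default_rng() if rng is None else rng
    p0 = rng.standard_normal(len(x))                     # momentum ~ N(0, I)
    x_new, p = x.copy(), p0.copy()
    p += 0.5 * step * grad_log_p_star(x_new, W, a, d)    # half step for momentum
    for i in range(n_leap):
        x_new = x_new + step * p                         # full step for position
        if i < n_leap - 1:
            p += step * grad_log_p_star(x_new, W, a, d)
    p += 0.5 * step * grad_log_p_star(x_new, W, a, d)    # final half step
    log_alpha = (log_p_star(x_new, W, a, d) - 0.5 * p @ p) \
              - (log_p_star(x, W, a, d) - 0.5 * p0 @ p0)
    return x_new if np.log(rng.random()) < log_alpha else x
```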
53 3.3 Estimating Marginal Probabilities Given a set of samples that are approximately distributed according to p(x), we can estimate the marginal distribution over any subset S_q ⊆ S of the discrete variables. [sent-142, score-0.275]
54 The marginal probability p(s_q) can be estimated as p(s_q) ≈ (1/M) Σ_{m=1}^M p(S_q|x^(m)) = (1/M) Σ_{m=1}^M ∏_{s_i∈S_q} p(s_i|x^(m)), with x^(m) ∼ p(x). This gives us a Rao-Blackwellized estimate of p(s_q) without needing to sample s directly. [sent-145, score-0.277]
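In code this Rao-Blackwellized estimate of the single-node marginals is an average of conditional probabilities over the x samples; a sketch under the same A = I assumption, where x_samples is an M-by-n array of draws from p(x):

```python
import numpy as np

def node_marginals(x_samples, a, d):
    """Rao-Blackwellized estimate of p(s_i = 1) for every node i (A = I)."""
    p1 = 1.0 / (1.0 + np.exp(-(x_samples + a - d / 2.0)))  # (M, n) conditionals
    return p1.mean(axis=0)
```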
55 3.4 Normalizing Constants Because the normalizing factor Z^{-1} of the Boltzmann machine is equal to the probability p(s = 0), we can estimate the normalizing factor using the technique from the previous section: Z^{-1} = p(s = 0) ≈ (1/M) Σ_{m=1}^M p(s = 0|x^(m)), with x^(m) ∼ p(x). [sent-148, score-0.112]
56 Using the identity Z^{-1} = Z^{-1} ∫ dx q(x), we have Z^{-1} = Z^{-1} ∫ dx q(x) p*(x)/p*(x) = ∫ dx (q(x)/p*(x)) p(x). A Monte Carlo estimate of this integral is Ẑ^{-1} = (1/M) Σ_m q(x^(m))/p*(x^(m)) for x^(m) ∼ p(x). [sent-158, score-0.263]
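Both estimators of Z^{-1} are short to write given samples x^(m) ∼ p(x). One caveat for the importance-weighted version: under the A = I convention used in these sketches, p*(x) must keep the Gaussian normalizing constant of p(x | s) so that its integral is exactly Z. The choice of q(x) as a Gaussian fitted to the samples is purely illustrative, and none of the helper names come from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_p_star_full(x, W, a, d):
    """log p*(x) for A = I, keeping the Gaussian constant so that int p*(x) dx = Z."""
    Sigma = W + np.diag(d)
    _, logdet = np.linalg.slogdet(Sigma)
    return (-0.5 * len(x) * np.log(2.0 * np.pi) - 0.5 * logdet
            - 0.5 * x @ np.linalg.solve(Sigma, x)
            + np.sum(np.logaddexp(0.0, x + a - d / 2.0)))

def inv_Z_rao_blackwell(x_samples, a, d):
    """Z^{-1} = p(s = 0), estimated as the mean over m of prod_i p(s_i = 0 | x^(m))."""
    log_p0 = -np.logaddexp(0.0, x_samples + a - d / 2.0)   # log p(s_i = 0 | x)
    return np.exp(log_p0.sum(axis=1)).mean()

def inv_Z_importance(x_samples, W, a, d):
    """Z^{-1} estimated as the mean over m of q(x^(m)) / p*(x^(m))."""
    q = multivariate_normal(mean=x_samples.mean(axis=0), cov=np.cov(x_samples.T))
    log_w = q.logpdf(x_samples) - np.array(
        [log_p_star_full(x, W, a, d) for x in x_samples])
    return np.exp(log_w).mean()
```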
57 This importance trick is well known in the statistics literature, e. [sent-165, score-0.25]
58 The Ising model was studied as a model of physical spin systems, and so the dynamics used were typically representative of the physics, with Glauber dynamics [7] being a common model, e. [sent-171, score-0.24]
59 In the context of stochastic neural models, though, there was the potential to examine other dynamics that did not match the standard physics of spin systems. [sent-174, score-0.228]
60 The Gaussian integral trick is also known as the Hubbard-Stratonovich transformation in physics[10]. [sent-178, score-0.42]
61 In the area of neural modelling, the Gaussian integral trick was also common for theoretical reasons rather than as a practical augmentation strategy [8]. [sent-179, score-0.488]
62 The Gaussian integral trick formed a critical part of replica-theoretical analysis [8] for phase analysis of spin systems, as it enabled ensemble averaging of the spin components, leading to saddle-point equations in the continuous domain. [sent-180, score-0.701]
63 The Gaussian integral trick relates the general Boltzmann machines to exponential family harmoniums [25], which generalise the restricted Boltzmann machines. [sent-182, score-0.555]
64 The specific Gaussian-Bernoulli harmonium is in common use, but with the real-valued variables as visible units and the binary variables as hidden units [9]. [sent-183, score-0.198]
65 The only work of which we are aware that uses the Gaussian integral trick for probabilistic inference is that of Martens and Sutskever [12]. [sent-185, score-0.5]
66 This work also considers inference in MRFs, using the special case of the Gaussian integral trick in which A = Λ^{-1/2} V^T. [sent-186, score-0.5]
67 Instead they perform inference directly in the resulting harmonium, using block Gibbs sampling alternating between s and x. [sent-188, score-0.25]
68 On serial computers, they do not find that this expanded representation offers much benefit over performing single-site Gibbs in the discrete space. [sent-189, score-0.134]
69 Indeed they find that the sampler in the augmented model is actually slightly slower than the one in the original discrete space. [sent-190, score-0.276]
70 We evaluate both node-marginal estimation and normalisation-factor estimation on two synthetic models. [sent-193, score-0.253]
71 We compare the accuracy of the discrete HMC sampler to Gibbs sampling in the original discrete model p(s) and to block Gibbs sampling in the augmented model p(x, s). [sent-194, score-0.622]
72 The Gibbs sampler resamples one node at a time from p(si |s−i ). [sent-196, score-0.15]
73 The node marginal probability p(si ) is estimated by the empirical probability of the samples. [sent-197, score-0.153]
74 The block Gibbs sampler over p(x, s) we use is based on [12]. [sent-199, score-0.226]
75 This comparison is designed to evaluate the benefits of summing away the discrete variables. [sent-200, score-0.134]
76 To estimate the node marginals, we use the block Gibbs sampler to generate samples of x and then apply the Rao-Blackwellized estimators from Sections 3. [sent-201, score-0.354]
77 HMC can generate better samples when a large number of leapfrog steps is used, but this requires much more computation time. [sent-206, score-0.194]
78 For a fixed computational budget, using more leapfrog steps causes fewer samples to be generated, which can also undermine the accuracy of the estimator. [sent-207, score-0.158]
79 So, we empirically pick 5 leapfrog steps and tune the leapfrog step size so that the acceptance rate is around 90%. [sent-208, score-0.236]
80 [Figure 2 axis-label residue removed; the axes are the biases scale and the weights scale.] [sent-219, score-0.861]
81 Figure 2: Performance of samplers on synthetic grid-structured Boltzmann machines. [sent-260, score-0.294]
82 The top row shows error in the normalization constant, while the bottom row shows average error in the single-node marginal distributions. [sent-262, score-0.17]
83 In the standard case, for each node si , the biases are generated as ai ∼ c1 N (0, 4). [sent-265, score-0.52]
84 The parameters c1 and c2 define the scales of the biases and weights and determine how hard the problem is. [sent-267, score-0.276]
85 In the frustrated case, we shift the weights to make the problem more difficult. [sent-268, score-0.252]
86 We still generate the weights as w_ij ∼ c2 N(0, 4) but now we generate the biases as a_i ∼ c1 N(Σ_j w_ij, 4). [sent-269, score-0.53]
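A sketch of how such a grid-structured Boltzmann machine could be generated; the grid size, the exact interpretation of the c1 and c2 scaling, and the helper name are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def make_grid_bm(side, c1, c2, frustrated=False, rng=None):
    """Grid Boltzmann machine: symmetric W on grid edges, one bias per node."""
    rng = np.random.default_rng() if rng is None else rng
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            i = r * side + c
            for rr, cc in [(r, c + 1), (r + 1, c)]:       # right and down neighbours
                if rr < side and cc < side:
                    j = rr * side + cc
                    W[i, j] = W[j, i] = c2 * rng.normal(0.0, 2.0)   # c2 * N(0, 4)
    if frustrated:
        a = c1 * rng.normal(W.sum(axis=1), 2.0)           # biases shifted by row sums
    else:
        a = c1 * rng.normal(0.0, 2.0, size=n)             # c1 * N(0, 4)
    return W, a
```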
87 We report the MSE of the node marginal estimate and the log normalising constant estimate averaged over 10 runs. [sent-273, score-0.225]
88 DHMC beats Gibbs on p(s) at the normalization constant and beats block Gibbs on p(x, s) at marginal estimation. [sent-278, score-0.384]
89 The frustrated graphs (Figure 3) are significantly more difficult for DHMC, as expected. [sent-279, score-0.176]
90 All three samplers seem to have trouble in the same area of model space, although DHMC suffers somewhat worse than the other methods in marginal error, while still beating Chib’s method for normalization constants. [sent-280, score-0.291]
91 We observe that in both cases block Gibbs of p(x, s) performs roughly the same at marginal estimation as Gibbs on p(s). [sent-283, score-0.229]
92 [Figure 3 axis-label residue removed; the axes are the biases scale and the weights scale.] [sent-295, score-0.861]
93 Figure 3: Performance of samplers on a set of highly frustrated grid-structured Boltzmann machines. [sent-336, score-0.422]
94 The top row shows error in the normalization constant, while the bottom row shows average error in the single-node marginal distributions. [sent-338, score-0.17]
95 We see that the HMC sampler is much more accurate than either of the other samplers at estimating single-node marginals. [sent-344, score-0.181]
96 The block Gibbs sampler yields both worse estimates of the marginals and a significantly worse estimate of the normalization constant. [sent-346, score-0.328]
97 We described a continuum of different versions of the trick that have different properties. [sent-354, score-0.284]
98 Although we illustrated the benefits of the continuous setting by using Hamiltonian Monte Carlo, in future work other inference methods such as elliptical slice sampling or more advanced HMC methods may prove superior. [sent-355, score-0.274]
99 Bayesian inference for PCFGs via Markov chain Monte Carlo. [sent-446, score-0.136]
100 An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. [sent-467, score-0.128]
wordName wordTfidf (topN-words)
[('hamiltonian', 0.348), ('trick', 0.25), ('gibbs', 0.225), ('hmc', 0.214), ('dhmc', 0.206), ('biases', 0.2), ('boltzmann', 0.192), ('frustrated', 0.176), ('si', 0.176), ('integral', 0.17), ('harmonium', 0.147), ('sq', 0.138), ('discrete', 0.134), ('monte', 0.13), ('block', 0.128), ('leapfrog', 0.118), ('continuous', 0.111), ('mcmc', 0.108), ('carlo', 0.102), ('marginal', 0.101), ('sampler', 0.098), ('ai', 0.092), ('scale', 0.087), ('di', 0.086), ('spin', 0.085), ('ising', 0.083), ('samplers', 0.083), ('inference', 0.08), ('martens', 0.078), ('weights', 0.076), ('normalising', 0.072), ('hop', 0.072), ('relaxations', 0.07), ('cx', 0.07), ('normalization', 0.069), ('gaussian', 0.068), ('sj', 0.066), ('st', 0.065), ('mrf', 0.064), ('undirected', 0.063), ('dynamics', 0.061), ('auxiliary', 0.058), ('chain', 0.056), ('normalizing', 0.056), ('hx', 0.055), ('messages', 0.054), ('estimator', 0.054), ('node', 0.052), ('normalisation', 0.052), ('harmoniums', 0.052), ('valued', 0.051), ('ki', 0.051), ('axes', 0.051), ('physics', 0.049), ('synthetic', 0.048), ('chib', 0.048), ('hertz', 0.048), ('murray', 0.046), ('exp', 0.045), ('wij', 0.045), ('avenues', 0.045), ('augmented', 0.044), ('markov', 0.043), ('eij', 0.043), ('beats', 0.043), ('generalise', 0.043), ('multimodal', 0.043), ('gaussians', 0.043), ('sampling', 0.042), ('sutton', 0.042), ('sutskever', 0.041), ('elliptical', 0.041), ('kingdom', 0.041), ('machines', 0.04), ('samples', 0.04), ('modes', 0.039), ('trouble', 0.038), ('converted', 0.038), ('extraction', 0.037), ('generate', 0.036), ('welling', 0.036), ('expanding', 0.036), ('density', 0.036), ('augmentation', 0.035), ('versions', 0.034), ('summed', 0.034), ('physical', 0.033), ('marginals', 0.033), ('xi', 0.033), ('zoubin', 0.033), ('opens', 0.033), ('xt', 0.033), ('graphical', 0.033), ('neural', 0.033), ('dif', 0.033), ('relaxing', 0.032), ('crf', 0.032), ('cult', 0.032), ('transform', 0.032), ('dx', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo
Author: Yichuan Zhang, Zoubin Ghahramani, Amos J. Storkey, Charles A. Sutton
Abstract: Continuous relaxations play an important role in discrete optimization, but have not seen much use in approximate probabilistic inference. Here we show that a general form of the Gaussian Integral Trick makes it possible to transform a wide class of discrete variable undirected models into fully continuous systems. The continuous representation allows the use of gradient-based Hamiltonian Monte Carlo for inference, results in new ways of estimating normalization constants (partition functions), and in general opens up a number of new avenues for inference in difficult discrete systems. We demonstrate some of these continuous relaxation inference algorithms on a number of illustrative problems. 1
2 0.22229247 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space
Author: Lucas Theis, Jascha Sohl-dickstein, Matthias Bethge
Abstract: We present a new learning strategy based on an efficient blocked Gibbs sampler for sparse overcomplete linear models. Particular emphasis is placed on statistical image modeling, where overcomplete models have played an important role in discovering sparse representations. Our Gibbs sampler is faster than general purpose sampling schemes while also requiring no tuning as it is free of parameters. Using the Gibbs sampler and a persistent variant of expectation maximization, we are able to extract highly sparse distributions over latent sources from data. When applied to natural images, our algorithm learns source distributions which resemble spike-and-slab distributions. We evaluate the likelihood and quantitatively compare the performance of the overcomplete linear model to its complete counterpart as well as a product of experts model, which represents another overcomplete generalization of the complete linear model. In contrast to previous claims, we find that overcomplete representations lead to significant improvements, but that the overcomplete linear model still underperforms other models. 1
3 0.13957424 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction
Author: Christoph H. Lampert
Abstract: We study the problem of maximum marginal prediction (MMP) in probabilistic graphical models, a task that occurs, for example, as the Bayes optimal decision rule under a Hamming loss. MMP is typically performed as a two-stage procedure: one estimates each variable’s marginal probability and then forms a prediction from the states of maximal probability. In this work we propose a simple yet effective technique for accelerating MMP when inference is sampling-based: instead of the above two-stage procedure we directly estimate the posterior probability of each decision variable. This allows us to identify the point of time when we are sufficiently certain about any individual decision. Whenever this is the case, we dynamically prune the variables we are confident about from the underlying factor graph. Consequently, at any time only samples of variables whose decision is still uncertain need to be created. Experiments in two prototypical scenarios, multi-label classification and image inpainting, show that adaptive sampling can drastically accelerate MMP without sacrificing prediction accuracy. 1
4 0.11847179 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC
Author: Ben Calderhead, Mátyás A. Sustik
Abstract: One of the enduring challenges in Markov chain Monte Carlo methodology is the development of proposal mechanisms to make moves distant from the current point, that are accepted with high probability and at low computational cost. The recent introduction of locally adaptive MCMC methods based on the natural underlying Riemannian geometry of such models goes some way to alleviating these problems for certain classes of models for which the metric tensor is analytically tractable, however computational efficiency is not assured due to the necessity of potentially high-dimensional matrix operations at each iteration. In this paper we firstly investigate a sampling-based approach for approximating the metric tensor and suggest a valid MCMC algorithm that extends the applicability of Riemannian Manifold MCMC methods to statistical models that do not admit an analytically computable metric tensor. Secondly, we show how the approximation scheme we consider naturally motivates the use of ℓ1 regularisation to improve estimates and obtain a sparse approximate inverse of the metric, which enables stable and sparse approximations of the local geometry to be made. We demonstrate the application of this algorithm for inferring the parameters of a realistic system of ordinary differential equations using a biologically motivated robust Student-t error model, for which the Expected Fisher Information is analytically intractable. 1
5 0.11706315 111 nips-2012-Efficient Sampling for Bipartite Matching Problems
Author: Maksims Volkovs, Richard S. Zemel
Abstract: Bipartite matching problems characterize many situations, ranging from ranking in information retrieval to correspondence in vision. Exact inference in realworld applications of these problems is intractable, making efficient approximation methods essential for learning and inference. In this paper we propose a novel sequential matching sampler based on a generalization of the PlackettLuce model, which can effectively make large moves in the space of matchings. This allows the sampler to match the difficult target distributions common in these problems: highly multimodal distributions with well separated modes. We present experimental results with bipartite matching problems—ranking and image correspondence—which show that the sequential matching sampler efficiently approximates the target distribution, significantly outperforming other sampling approaches. 1
6 0.10855456 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
7 0.10809717 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models
8 0.10214404 41 nips-2012-Ancestor Sampling for Particle Gibbs
9 0.10045412 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses
10 0.098005377 251 nips-2012-On Lifting the Gibbs Sampling Algorithm
11 0.096554741 84 nips-2012-Convergence Rate Analysis of MAP Coordinate Minimization Algorithms
12 0.093845777 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models
13 0.092908531 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature
14 0.089087352 38 nips-2012-Algorithms for Learning Markov Field Policies
15 0.088011041 126 nips-2012-FastEx: Hash Clustering with Exponential Families
16 0.082847685 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking
17 0.082440302 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes
18 0.081809394 37 nips-2012-Affine Independent Variational Inference
19 0.08047878 147 nips-2012-Graphical Models via Generalized Linear Models
20 0.079731978 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function
topicId topicWeight
[(0, 0.242), (1, 0.034), (2, 0.016), (3, 0.009), (4, -0.173), (5, -0.019), (6, -0.011), (7, -0.048), (8, -0.019), (9, -0.043), (10, -0.041), (11, -0.055), (12, -0.011), (13, 0.032), (14, -0.058), (15, -0.137), (16, -0.026), (17, -0.184), (18, 0.044), (19, -0.025), (20, 0.176), (21, -0.065), (22, -0.075), (23, 0.032), (24, -0.086), (25, 0.042), (26, 0.016), (27, 0.011), (28, 0.029), (29, -0.027), (30, -0.027), (31, 0.004), (32, -0.063), (33, 0.057), (34, -0.009), (35, -0.077), (36, -0.045), (37, -0.052), (38, -0.025), (39, -0.058), (40, 0.08), (41, -0.018), (42, 0.006), (43, -0.024), (44, 0.012), (45, 0.06), (46, -0.015), (47, 0.038), (48, -0.014), (49, -0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.95961857 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo
Author: Yichuan Zhang, Zoubin Ghahramani, Amos J. Storkey, Charles A. Sutton
Abstract: Continuous relaxations play an important role in discrete optimization, but have not seen much use in approximate probabilistic inference. Here we show that a general form of the Gaussian Integral Trick makes it possible to transform a wide class of discrete variable undirected models into fully continuous systems. The continuous representation allows the use of gradient-based Hamiltonian Monte Carlo for inference, results in new ways of estimating normalization constants (partition functions), and in general opens up a number of new avenues for inference in difficult discrete systems. We demonstrate some of these continuous relaxation inference algorithms on a number of illustrative problems. 1
2 0.84943914 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space
Author: Lucas Theis, Jascha Sohl-dickstein, Matthias Bethge
Abstract: We present a new learning strategy based on an efficient blocked Gibbs sampler for sparse overcomplete linear models. Particular emphasis is placed on statistical image modeling, where overcomplete models have played an important role in discovering sparse representations. Our Gibbs sampler is faster than general purpose sampling schemes while also requiring no tuning as it is free of parameters. Using the Gibbs sampler and a persistent variant of expectation maximization, we are able to extract highly sparse distributions over latent sources from data. When applied to natural images, our algorithm learns source distributions which resemble spike-and-slab distributions. We evaluate the likelihood and quantitatively compare the performance of the overcomplete linear model to its complete counterpart as well as a product of experts model, which represents another overcomplete generalization of the complete linear model. In contrast to previous claims, we find that overcomplete representations lead to significant improvements, but that the overcomplete linear model still underperforms other models. 1
3 0.73534811 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction
Author: Christoph H. Lampert
Abstract: We study the problem of maximum marginal prediction (MMP) in probabilistic graphical models, a task that occurs, for example, as the Bayes optimal decision rule under a Hamming loss. MMP is typically performed as a two-stage procedure: one estimates each variable’s marginal probability and then forms a prediction from the states of maximal probability. In this work we propose a simple yet effective technique for accelerating MMP when inference is sampling-based: instead of the above two-stage procedure we directly estimate the posterior probability of each decision variable. This allows us to identify the point of time when we are sufficiently certain about any individual decision. Whenever this is the case, we dynamically prune the variables we are confident about from the underlying factor graph. Consequently, at any time only samples of variables whose decision is still uncertain need to be created. Experiments in two prototypical scenarios, multi-label classification and image inpainting, show that adaptive sampling can drastically accelerate MMP without sacrificing prediction accuracy. 1
4 0.70934922 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)
Author: Gal Elidan, Cobi Cario
Abstract: The empirical success of the belief propagation approximate inference algorithm has inspired numerous theoretical and algorithmic advances. Yet, for continuous non-Gaussian domains performing belief propagation remains a challenging task: recent innovations such as nonparametric or kernel belief propagation, while useful, come with a substantial computational cost and offer little theoretical guarantees, even for tree structured models. In this work we present Nonparanormal BP for performing efficient inference on distributions parameterized by a Gaussian copulas network and any univariate marginals. For tree structured networks, our approach is guaranteed to be exact for this powerful class of non-Gaussian models. Importantly, the method is as efficient as standard Gaussian BP, and its convergence properties do not depend on the complexity of the univariate marginals, even when a nonparametric representation is used. 1
5 0.66365427 111 nips-2012-Efficient Sampling for Bipartite Matching Problems
Author: Maksims Volkovs, Richard S. Zemel
Abstract: Bipartite matching problems characterize many situations, ranging from ranking in information retrieval to correspondence in vision. Exact inference in realworld applications of these problems is intractable, making efficient approximation methods essential for learning and inference. In this paper we propose a novel sequential matching sampler based on a generalization of the PlackettLuce model, which can effectively make large moves in the space of matchings. This allows the sampler to match the difficult target distributions common in these problems: highly multimodal distributions with well separated modes. We present experimental results with bipartite matching problems—ranking and image correspondence—which show that the sequential matching sampler efficiently approximates the target distribution, significantly outperforming other sampling approaches. 1
6 0.65306944 205 nips-2012-MCMC for continuous-time discrete-state systems
7 0.64002144 251 nips-2012-On Lifting the Gibbs Sampling Algorithm
8 0.60988766 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function
9 0.60042351 136 nips-2012-Forward-Backward Activation Algorithm for Hierarchical Hidden Markov Models
10 0.58293039 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data
11 0.57830459 41 nips-2012-Ancestor Sampling for Particle Gibbs
12 0.56438732 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts
13 0.55552524 37 nips-2012-Affine Independent Variational Inference
14 0.55419111 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models
15 0.54111868 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes
16 0.53408915 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data
17 0.5259887 118 nips-2012-Entangled Monte Carlo
18 0.52457756 65 nips-2012-Cardinality Restricted Boltzmann Machines
19 0.52233464 126 nips-2012-FastEx: Hash Clustering with Exponential Families
20 0.51254481 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC
topicId topicWeight
[(0, 0.023), (21, 0.018), (38, 0.098), (42, 0.024), (54, 0.023), (55, 0.014), (74, 0.031), (76, 0.148), (80, 0.104), (92, 0.439)]
simIndex simValue paperId paperTitle
1 0.91719192 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification
Author: Richard Socher, Brody Huval, Bharath Bath, Christopher D. Manning, Andrew Y. Ng
Abstract: Recent advances in 3D sensing technologies make it possible to easily record color and depth images which together can improve object recognition. Most current methods rely on very well-designed features for this new 3D modality. We introduce a model based on a combination of convolutional and recursive neural networks (CNN and RNN) for learning features and classifying RGB-D images. The CNN layer learns low-level translationally invariant features which are then given as inputs to multiple, fixed-tree RNNs in order to compose higher order features. RNNs can be seen as combining convolution and pooling into one efficient, hierarchical operation. Our main result is that even RNNs with random weights compose powerful features. Our model obtains state of the art performance on a standard RGB-D object dataset while being more accurate and faster during training and testing than comparable architectures such as two-layer CNNs. 1
2 0.90527666 44 nips-2012-Approximating Concavely Parameterized Optimization Problems
Author: Joachim Giesen, Jens Mueller, Soeren Laue, Sascha Swiercy
Abstract: We consider an abstract class of optimization problems that are parameterized concavely in a single parameter, and show that the solution path along the parameter can always be approximated with accuracy ε > 0 by a set of size O(1/√ε). A lower bound of size Ω(1/√ε) shows that the upper bound is tight up to a constant factor. We also devise an algorithm that calls a step-size oracle and computes an approximate path of size O(1/√ε). Finally, we provide an implementation of the oracle for soft-margin support vector machines, and a parameterized semi-definite program for matrix completion. 1
same-paper 3 0.9028511 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo
Author: Yichuan Zhang, Zoubin Ghahramani, Amos J. Storkey, Charles A. Sutton
Abstract: Continuous relaxations play an important role in discrete optimization, but have not seen much use in approximate probabilistic inference. Here we show that a general form of the Gaussian Integral Trick makes it possible to transform a wide class of discrete variable undirected models into fully continuous systems. The continuous representation allows the use of gradient-based Hamiltonian Monte Carlo for inference, results in new ways of estimating normalization constants (partition functions), and in general opens up a number of new avenues for inference in difficult discrete systems. We demonstrate some of these continuous relaxation inference algorithms on a number of illustrative problems. 1
4 0.89718676 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search
Author: Yunchao Gong, Sanjiv Kumar, Vishal Verma, Svetlana Lazebnik
Abstract: This paper focuses on the problem of learning binary codes for efficient retrieval of high-dimensional non-negative data that arises in vision and text applications where counts or frequencies are used as features. The similarity of such feature vectors is commonly measured using the cosine of the angle between them. In this work, we introduce a novel angular quantization-based binary coding (AQBC) technique for such data and analyze its properties. In its most basic form, AQBC works by mapping each non-negative feature vector onto the vertex of the binary hypercube with which it has the smallest angle. Even though the number of vertices (quantization landmarks) in this scheme grows exponentially with data dimensionality d, we propose a method for mapping feature vectors to their smallest-angle binary vertices that scales as O(d log d). Further, we propose a method for learning a linear transformation of the data to minimize the quantization error, and show that it results in improved binary codes. Experiments on image and text datasets show that the proposed AQBC method outperforms the state of the art. 1
5 0.87141979 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space
Author: Lucas Theis, Jascha Sohl-dickstein, Matthias Bethge
Abstract: We present a new learning strategy based on an efficient blocked Gibbs sampler for sparse overcomplete linear models. Particular emphasis is placed on statistical image modeling, where overcomplete models have played an important role in discovering sparse representations. Our Gibbs sampler is faster than general purpose sampling schemes while also requiring no tuning as it is free of parameters. Using the Gibbs sampler and a persistent variant of expectation maximization, we are able to extract highly sparse distributions over latent sources from data. When applied to natural images, our algorithm learns source distributions which resemble spike-and-slab distributions. We evaluate the likelihood and quantitatively compare the performance of the overcomplete linear model to its complete counterpart as well as a product of experts model, which represents another overcomplete generalization of the complete linear model. In contrast to previous claims, we find that overcomplete representations lead to significant improvements, but that the overcomplete linear model still underperforms other models. 1
6 0.85076118 110 nips-2012-Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems
7 0.68266314 329 nips-2012-Super-Bit Locality-Sensitive Hashing
8 0.6749413 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection
9 0.66106331 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines
10 0.63214654 65 nips-2012-Cardinality Restricted Boltzmann Machines
11 0.63166177 197 nips-2012-Learning with Recursive Perceptual Representations
12 0.63008577 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video
13 0.62906921 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves
14 0.62479609 163 nips-2012-Isotropic Hashing
15 0.62478268 260 nips-2012-Online Sum-Product Computation Over Trees
16 0.61569101 94 nips-2012-Delay Compensation with Dynamical Synapses
17 0.61547112 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations
18 0.61450398 251 nips-2012-On Lifting the Gibbs Sampling Algorithm
19 0.61270422 41 nips-2012-Ancestor Sampling for Particle Gibbs
20 0.61202085 4 nips-2012-A Better Way to Pretrain Deep Boltzmann Machines