nips nips2012 nips2012-213 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jason Pacheco, Erik B. Sudderth
Abstract: We develop convergent minimization algorithms for Bethe variational approximations which explicitly constrain marginal estimates to families of valid distributions. While existing message passing algorithms define fixed point iterations corresponding to stationary points of the Bethe free energy, their greedy dynamics do not distinguish between local minima and maxima, and can fail to converge. For continuous estimation problems, this instability is linked to the creation of invalid marginal estimates, such as Gaussians with negative variance. Conversely, our approach leverages multiplier methods with well-understood convergence properties, and uses bound projection methods to ensure that marginal approximations are valid at all iterations. We derive general algorithms for discrete and Gaussian pairwise Markov random fields, showing improvements over standard loopy belief propagation. We also apply our method to a hybrid model with both discrete and continuous variables, showing improvements over expectation propagation. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We develop convergent minimization algorithms for Bethe variational approximations which explicitly constrain marginal estimates to families of valid distributions. [sent-5, score-0.2]
2 While existing message passing algorithms define fixed point iterations corresponding to stationary points of the Bethe free energy, their greedy dynamics do not distinguish between local minima and maxima, and can fail to converge. [sent-6, score-0.349]
3 For continuous estimation problems, this instability is linked to the creation of invalid marginal estimates, such as Gaussians with negative variance. [sent-7, score-0.152]
4 Conversely, our approach leverages multiplier methods with well-understood convergence properties, and uses bound projection methods to ensure that marginal approximations are valid at all iterations. [sent-8, score-0.269]
5 We derive general algorithms for discrete and Gaussian pairwise Markov random fields, showing improvements over standard loopy belief propagation. [sent-9, score-0.192]
6 We also apply our method to a hybrid model with both discrete and continuous variables, showing improvements over expectation propagation. [sent-10, score-0.124]
7 Typically the optimization is formulated by minimizing an objective known as the Gibbs free energy [1]. [sent-12, score-0.349]
8 Local message passing algorithms offer a computationally efficient method for extremizing variational free energies. [sent-14, score-0.385]
9 Loopy belief propagation (LBP), for example, optimizes a relaxed objective known as the Bethe free energy [1, 2], which we review in Sec. [sent-15, score-0.504]
10 Expectation propagation (EP) [3] is a generalization of LBP which shares the same objective, but optimizes over a relaxed set of constraints [4] applicable to a broader family of continuous inference problems. [sent-17, score-0.178]
11 Even in simple continuous models, both methods may improperly estimate invalid or degenerate marginal distributions, such as Gaussians with negative variance. [sent-19, score-0.188]
12 Work on optimization of continuous variational free energies has primarily focused on addressing convergence problems [11]. [sent-22, score-0.348]
13 By leveraging gradient projection methods from the extensive literature on constrained nonlinear optimization, we develop an algorithm which ensures that marginal estimates remain valid and normalizable at all iterations. [sent-24, score-0.38]
14 In doing so, we account for important constraints which have been ignored by previous variational derivations of the expectation propagation algorithm [12, 6, 11]. [sent-25, score-0.253]
15 Moreover, by adapting the method of multipliers [13], we guarantee that our inference algorithm converges for most models of practical interest. [sent-26, score-0.146]
16 We briefly review the correspondence between the Lagrangian formalism and message passing and discuss implicit normalizability assumptions which, when violated, lead to degeneracy in message passing algorithms. [sent-29, score-0.394]
17 We derive algorithms (Sec. 4) for discrete MRFs, Gaussian MRFs, and hybrid models with potentials defined by discrete mixtures of Gaussian distributions. [sent-33, score-0.144]
18 Experiments in Sec. 5 demonstrate substantial improvements over baseline message passing algorithms. [sent-35, score-0.128]
19 We are interested in computing the log partition function log Z_p and/or the marginal distributions p(x_s), s ∈ V. [sent-39, score-0.118]
20 We associate each node s ∈ V with an exponential family q_s(x_s; µ_s) with sufficient statistics φ_s(x) ∈ R^{d_s}, and each edge (s, t) ∈ E with a family q_st(x_s, x_t; µ_st), φ_st(x) ∈ R^{d_st}. [sent-42, score-0.25]
21 Because q_s(x_s; µ_s) is a valid probability distribution, µ_s must lie in the set of realizable mean parameters, µ_s ∈ M_s. [sent-43, score-0.128]
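For concreteness, a minimal statement of the exponential-family setup assumed above (standard notation, not taken verbatim from the paper):

```latex
% Node marginals in exponential-family form: natural parameters \theta_s,
% sufficient statistics \phi_s, and mean parameters \mu_s given by the
% expected sufficient statistics, constrained to the realizable set M_s.
\[
q_s(x_s;\theta_s) = \exp\bigl\{\theta_s^{\top}\phi_s(x_s) - A_s(\theta_s)\bigr\},
\qquad
\mu_s = \mathbb{E}_{q_s}\bigl[\phi_s(x_s)\bigr] \in \mathcal{M}_s .
\]
```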
22 We can express the log partition as the solution to an optimization problem, −log Z_p = min_{µ∈M(G)} E_µ[−log p(x)] − H[µ] = min_{µ∈M(G)} F(µ), (3) where H[µ] is the entropy of q(x; µ), E_µ[·] denotes expectation with respect to q(x; µ), and F(µ) is known as the variational free energy. [sent-46, score-0.355]
23 Since our focus is on the continuous case, we briefly review the correspondence between Gaussian LBP fixed points and the Gaussian Bethe free energy. [sent-59, score-0.247]
24 The locally consistent marginal polytope L(G) consists of the constraints C_ts(V) = V_s − V_ts for all nodes s ∈ V and edges (s, t) ∈ E. [sent-63, score-0.118]
25 Taking the derivative of the Lagrangian (9) with respect to the node marginal variance, ∂L/∂V_s = 0, yields the stationary point V_s = (A_s + (1/(n_s − 1)) Σ_{t∈N(s)} λ_ts)^{−1}. [sent-65, score-0.202]
26 For a Gaussian LBP algorithm with messages parametrized as m_{t→s}(x_s) = exp(−(1/2) x_s² Λ_{t→s}), fixed points of the node marginal precision are given by Λ_s = A_s + Σ_{t∈N(s)} Λ_{t→s}; let λ_ts = −(1/2) Σ_{a∈N(s)\t} Λ_{a→s}. [sent-66, score-0.108]
27 Inverting the correspondence between multipliers and message parameters yields the converse V_s^{−1} ⇐ Λ_s. [sent-69, score-0.229]
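The fixed-point correspondence above can be made concrete with a minimal scalar Gaussian LBP sketch. The information-form model p(x) ∝ exp(−½ xᵀAx) and the synchronous schedule below are assumptions for illustration; nothing here enforces positive variances, which is exactly the degeneracy discussed next.

```python
import numpy as np

def gaussian_lbp(A, max_iters=100):
    """Scalar Gaussian LBP on p(x) ~ exp(-0.5 x'Ax), messages stored as
    precisions Lam[t, s], matching m_{t->s}(x_s) = exp(-0.5 x_s^2 Lam_{t->s}).
    On non-pairwise normalizable models the returned variances V_s can
    oscillate or turn negative -- invalid marginals, as described above."""
    n = A.shape[0]
    Lam = np.zeros((n, n))
    for _ in range(max_iters):
        new = np.zeros_like(Lam)
        for t in range(n):
            for s in range(n):
                if s != t and A[s, t] != 0.0:
                    # Cavity precision at t, excluding the message from s
                    cav = A[t, t] + Lam[:, t].sum() - Lam[s, t]
                    new[t, s] = -A[s, t] ** 2 / cav
        Lam = new
    prec = np.diag(A) + Lam.sum(axis=0)  # Lambda_s = A_s + sum_t Lam[t, s]
    return 1.0 / prec                    # node variances V_s (possibly invalid)
```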
28 Message Passing Non-Convergence and Degeneracy. While local message passing algorithms are convenient for many applications, their convergence is not guaranteed in general. [sent-73, score-0.159]
29 For continuous models, message updates may yield degenerate, unnormalizable marginal distributions which do not correspond to stationary points of the Lagrangian. [sent-76, score-0.208]
30 For example, for Gaussian MRFs the Bethe free energy F_B(·) in (5) is derived from expectations with respect to variational distributions q_s(x_s; µ_s), q_st(x_s, x_t; µ_st). [sent-77, score-0.634]
31 If a set of hypothesized marginals is not normalizable (positive variance), the Gaussian Bethe free energy F_GB(·) is invalid and undefined. [sent-78, score-0.63]
32 Here LBP produces marginal variances which oscillate between impossibly large positive and nonsensical negative values. [sent-81, score-0.11]
33 Such degeneracies are arguably more problematic for EP since its moment matching steps require expected values with respect to an augmented distribution [3], which may involve an unbounded integral. [sent-82, score-0.157]
34 (b) Node marginal variance estimates per iteration for a symmetric, single-cycle Gaussian MRF with three nodes (plot is of x_1; other nodes are similar). [sent-89, score-0.127]
35 (c) For the model from (b), the Gaussian Bethe free energy is unbounded on the constraint set. [sent-90, score-0.445]
36 For Gaussian MRFs, the class of pairwise normalizable models is sufficient to guarantee LBP stability and convergence [5]. [sent-93, score-0.241]
37 For non-pairwise normalizable models the Gaussian Bethe free energy is unbounded below [6] on the set of local consistency constraints L(G). [sent-94, score-0.615]
38 We offer a small example consisting of a non-pairwise normalizable symmetric single cycle with 3 nodes. [sent-95, score-0.185]
39 Substituting this parametrization into the Gaussian free energy (8), and performing some simple algebra, yields F_GB(V, ρ) = −(3/2) log V + (3/2) V (1 + …), where the ρ-dependent linear coefficient is negative for this model. [sent-101, score-0.37]
40 The free energy is therefore unbounded below at rate O(−V). [sent-104, score-0.408]
41 Figure 1(c) illustrates the Bethe free energy for this model as a function of V for several values of ρ. [sent-105, score-0.349]
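A quick numerical illustration of this divergence. The exact ρ-dependent coefficient above is only partially recoverable, so the linear coefficient a below is an assumption standing in for it:

```python
import numpy as np

# One-dimensional slice F(V) = -(3/2) log V + a V of the Gaussian Bethe
# free energy.  When the (rho-dependent) coefficient a is negative, F
# decreases without bound as V grows, at the O(-V) rate noted above.
for a in (0.5, 0.0, -0.5):
    V = np.linspace(0.1, 100.0, 5)
    F = -1.5 * np.log(V) + a * V
    print(f"a = {a:+.1f}:  F at V=100 is {F[-1]:8.1f}")
```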
42 More generally, it has been shown that Gaussian EP messages are always normalizable (positive variance) for models with log-concave potentials [14]. [sent-106, score-0.201]
43 Our work seeks to improve inference for non-log-concave models with bounded Bethe free energies. [sent-109, score-0.215]
44 Method of Multipliers. Given our complete constrained formulation of the Bethe variational problem, we avoid convergence and degeneracy problems via direct minimization using the method of multipliers (MoM) [13]. [sent-110, score-0.31]
45 Each MoM iteration minimizes the augmented Lagrangian over x ∈ M; the penalty multiplier can then be updated as c_{k+1} ≥ c_k according to some fixed update schedule, or based on the results of the optimization step. [sent-112, score-0.169]
46 An update rule that we find useful [13] is to increase the penalty parameter by β > 1 if the constraint violation is not improved by a factor 0 < γ < 1 over the previous iteration: c_{k+1} = β c_k if ‖h(x_k)‖ > γ ‖h(x_{k−1})‖, and c_{k+1} = c_k if ‖h(x_k)‖ ≤ γ ‖h(x_{k−1})‖. [sent-113, score-0.128]
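A minimal sketch of the MoM outer loop with this penalty schedule; `solve_inner` is a hypothetical stand-in for the constrained minimization of the augmented Lagrangian (e.g. by the gradient projection method of the next subsection):

```python
import numpy as np

def method_of_multipliers(x0, h, solve_inner, lam0,
                          c0=1.0, beta=4.0, gamma=0.25, max_iters=50):
    """MoM outer loop.  h(x) returns the vector of constraint violations;
    solve_inner(x, lam, c) approximately minimizes L_c(x, lam) over the
    feasible set M.  beta > 1 and 0 < gamma < 1 as in the update rule."""
    x, lam, c = x0, lam0, c0
    prev_viol = np.inf
    for _ in range(max_iters):
        x = solve_inner(x, lam, c)      # x_k ~ argmin_{x in M} L_c(x, lam)
        hx = h(x)
        viol = np.linalg.norm(hx)
        lam = lam + c * hx              # multiplier step: lam += c_k h(x_k)
        if viol > gamma * prev_viol:    # insufficient progress on constraints:
            c = beta * c                #   raise the penalty
        prev_viol = viol
    return x, lam
```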
47 Gradient Projection Methods. The augmented Lagrangian L_c(x, λ) is a partial one, where feasibility of the mean parameters (x ∈ M) is enforced explicitly by projection. [sent-115, score-0.125]
48 A simple gradient projection method [13] defines the sequence x_{k+1} = x_k + α_k (x̄_k − x_k), where x̄_k = [x_k − s_k ∇f(x_k)]^+ and [·]^+ denotes projection onto the constraint set. [sent-116, score-0.411]
49 After taking a step s_k > 0 in the direction of the negative gradient, we project the result onto the constraint set to obtain a feasible point x̄_k. [sent-118, score-0.163]
50 We then compute x_{k+1} by taking a step α_k ∈ (0, 1] in the direction of (x̄_k − x_k). [sent-119, score-0.103]
51 If x_k − s_k ∇f(x_k) is feasible, gradient projection reduces to unconstrained steepest descent. [sent-120, score-0.205]
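A sketch of this inner iteration, assuming a user-supplied projection onto M:

```python
def gradient_projection(x0, grad_f, project, steps=100, s=0.1, alpha=1.0):
    """Gradient projection as described above:
        xbar_k  = P_M[x_k - s_k grad f(x_k)]
        x_{k+1} = x_k + alpha_k (xbar_k - x_k),  alpha_k in (0, 1].
    With alpha = 1 and a feasible gradient step this is steepest descent.
    `project` is an assumed projection operator onto the constraint set M."""
    x = x0
    for _ in range(steps):
        xbar = project(x - s * grad_f(x))  # feasible point
        x = x + alpha * (xbar - x)         # step toward it
    return x
```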
52 Convergence follows from standard results [13] for the Method of Multipliers with a quadratic penalty and multiplier iteration λ_{k+1} = λ_k + c_k h(x_k). [sent-130, score-0.193]
53 For convergence, the initialization of the Lagrange multiplier λ_0 and penalty parameter c_0 must be such that ‖λ_0 − λ*‖ < δ c_0 for some δ > 0 and c_0 ≥ c̄, where δ and c̄ depend on the objective and constraints. [sent-133, score-0.14]
54 In practice, a poor initialization of the multiplier λ_0 can often be offset by a sufficiently high c_0. [sent-134, score-0.103]
55 A final technical note is that convergence proofs assume the sequence of unconstrained optimizations which yield x_k stays in a neighborhood of x* after some k. [sent-135, score-0.134]
56 To invoke existing convergence results, we must show that a local minimum x* exists for each of the free energies we consider; a sufficient condition is then that the Bethe free energy is bounded from below. [sent-137, score-0.605]
57 This property has been previously established for general discrete MRFs [18], for pairwise normalizable Gaussian MRFs [6], and for the clutter model [3]. [sent-138, score-0.321]
58 For non-pairwise normalizable Gaussian MRFs, consider again the example of Section 2.3. [sent-139, score-0.165]
59 It shows that the Bethe variational objective is unbounded below, and further may not contain any local optima. [sent-140, score-0.126]
60 While the method of multipliers does not converge in this situation, its non-convergence is due to fundamental flaws in the Bethe approximation. [sent-141, score-0.121]
61 MoM Algorithms for Probabilistic Inference. We derive MoM algorithms which minimize the Bethe free energy for three different families of graphical models. [sent-142, score-0.372]
62 For each model we define the form of the joint distribution, Bethe free energy (5), local consistency constraints, augmented Lagrangian, and the gradient projection step. [sent-143, score-0.526]
63 The Gaussian Bethe free energy (8) is always unbounded below off the constraint set in the node marginal variances V_s. [sent-147, score-0.587]
64 We correct this by adding an additional fixed penalty in the augmented Lagrangian, L_c(V, Σ, λ) = F_GB(V) + Σ_{s∈V} Σ_{t∈N(s)} λ_ts [V_s − V_ts] + (κ/2) Σ_{s∈V} Σ_{t∈N(s)} [log V_s − log V_ts]² + (c/2) Σ_{s∈V} Σ_{t∈N(s)} [V_s − V_ts]². [sent-148, score-0.156]
65 The gradient projection step is easily expressed in terms of correlation coefficients ρ_st, with Σ_st = [[V_st, ρ_st √(V_st V_ts)], [ρ_st √(V_st V_ts), V_ts]]. [sent-152, score-0.266]
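One simple way to realize the bound projection in this parametrization (a sketch; the paper's exact projection may differ) is to clip the variances to be positive and the correlation to the open interval (−1, 1), which keeps Σ_st positive definite:

```python
import numpy as np

def project_edge_marginal(Vst, Vts, rho, eps=1e-6):
    """Clip (Vst, Vts, rho) to the valid region and rebuild the 2x2
    pairwise covariance Sigma_st used in the gradient projection step."""
    Vst = max(Vst, eps)
    Vts = max(Vts, eps)
    rho = float(np.clip(rho, -1.0 + eps, 1.0 - eps))
    cross = rho * np.sqrt(Vst * Vts)
    return np.array([[Vst, cross], [cross, Vts]])
```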
66 In Section 2.3, we showed that the Gaussian Bethe free energy is unbounded on the constraint set for non-pairwise normalizable models. [sent-157, score-0.61]
67 We run MoM on the symmetric three-node cycle from this discussion and find that it correctly identifies an unbounded direction; Figure 1(b) shows that the node marginal variances indeed diverge to infinity. [sent-158, score-0.221]
68 Discrete Markov Random Fields. Consider a discrete MRF where all variables x_s ∈ X_s = {1, …, K_s}. [sent-160, score-0.443]
69 The variational marginal distributions are then q_s(x_s; τ) = ∏_{k=1}^{K_s} τ_s(k)^{I(x_s, k)}, and have mean parameters τ_s ∈ R^{K_s}. [sent-164, score-0.237]
70 Pairwise marginals have mean parameters τ_st ∈ R^{K_s × K_t}, similarly indexed as τ_st(x_s, x_t). [sent-166, score-0.142]
71 The discrete Bethe free energy is then F_B(τ; ϕ) = Σ_{(s,t)∈E} Σ_{x_s, x_t} τ_st(x_s, x_t)[log τ_st(x_s, x_t) − log φ_st(x_s, x_t)] − Σ_{s∈V} Σ_{x_s} (n_s − 1) τ_s(x_s)[log τ_s(x_s) − log ϕ_s(x_s)]. [sent-167, score-1.142]
72 s∈V xs For this discrete model, our expectation constraints reduce to the following normalization and marginalization constraints: Cs (τ ) = 1 − τs (xs ), Cts (xs ; τ ) = τs (xs ) − xs τst (xs , xt ). [sent-168, score-0.996]
73 xt The augmented Lagrangian is then, Lc (τ, λ, ξ; ϕ) = FB (τ ; ϕ) + λts (xs )Cts (xs ; τ ) + xs (s,t)∈E + ξss Cs (τ ) + s∈V c 2 λst (xt )Cst (xt ; τ ) (11) xt Cs (τ )2 + s∈V c 2 Cts (xs ; τ )2 + (s,t)∈E xs Cst (xt ; τ )2 . [sent-169, score-1.05]
74 This constraint is enforced by the bound projection τ_s(x_s) ← max(0, τ_s(x_s)), and similarly for the pairwise marginals. [sent-171, score-0.159]
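A small sketch of the discrete-case projection and constraint evaluation (array shapes are assumptions; consistency itself is restored by the multiplier updates, not by this projection):

```python
import numpy as np

def project_and_violations(tau_s, tau_st):
    """tau_s: length-K_s node marginal; tau_st: K_s x K_t pairwise marginal.
    Applies the bound projection tau <- max(0, tau) and returns the
    normalization and marginalization constraint values from the text."""
    tau_s = np.maximum(0.0, tau_s)
    tau_st = np.maximum(0.0, tau_st)
    C_s = 1.0 - tau_s.sum()              # C_s(tau) = 1 - sum_{x_s} tau_s(x_s)
    C_ts = tau_s - tau_st.sum(axis=1)    # tau_s(x_s) - sum_{x_t} tau_st(x_s, x_t)
    return tau_s, tau_st, C_s, C_ts
```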
75 Assuming normalizable marginals with V_0 ≥ 0, V_i0 ≥ 0, V_i1 ≥ 0, as always ensured by our multiplier method, we define the Bethe free energy F_CGB(m, V, β) in terms of the mean parameters in the supplemental material. [sent-179, score-0.681]
76 Expectation constraints are given by C_i^mean = E_0[x] − E_i[x] and C_i^var = Var_0[x] − Var_i[x], where E_i[·] and Var_i[·] denote the mean and variance of the Gaussian mixture q_i(x, z_i). [sent-180, score-0.135]
77 Combining the free energy, constraints, and additional positive semidefinite constraints on the marginal variances, we obtain the BVP for the clutter model: minimize F_CGB(m, V, β; ϕ) over (m, V, β), subject to C_i^mean = 0 and C_i^var = 0 for all i = 1, 2, …, N, with V_0 ≥ 0, V_i0 ≥ 0, V_i1 ≥ 0. (12) [sent-181, score-0.433]
78 Derivation of the free energy and augmented Lagrangian is somewhat lengthy, and so is deferred to the supplement. [sent-184, score-0.447]
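The moments entering these constraints are ordinary Gaussian-mixture moments. A sketch for a two-component mixture (the exact component layout of the clutter-model marginals is in the supplement, so the parametrization here is an assumption):

```python
import numpy as np

def mixture_moments(beta_i, m_i0, V_i0, m_i1, V_i1):
    """Mean and variance of q_i(x) = (1 - beta_i) N(m_i0, V_i0)
    + beta_i N(m_i1, V_i1), i.e. the E_i[x] and Var_i[x] appearing in
    the constraints C_i^mean and C_i^var."""
    w = np.array([1.0 - beta_i, beta_i])
    m = np.array([m_i0, m_i1])
    V = np.array([V_i0, V_i1])
    mean = w @ m
    second = w @ (V + m ** 2)      # E[x^2] by the law of total variance
    return mean, second - mean ** 2
```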
79 We sample 500 instances at random from a 10×10 toroidal lattice with each J_st ∼ N(0, 1) and h_s ∼ N(0, 0.…). [sent-189, score-0.152]
80 We report average L1 error of the approximate marginals as compared to the true marginals computed with the junction tree algorithm [20]. [sent-192, score-0.13]
81 Direct evaluation of the Bethe free energy does not take into account constraint violations for non-convergent LBP runs. [sent-194, score-0.386]
82 The augmented Lagrangian penalizes constraint violation, but requires a penalty parameter which LBP does not provide. [sent-195, score-0.172]
83 For an objective comparison, we construct a penalized Bethe free energy by evaluating the augmented Lagrangian with fixed penalty c = 1 and multipliers λ = 0. [sent-196, score-0.605]
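In code, this comparison objective is one line; `F_bethe` and the flattened constraint values are assumed computed elsewhere:

```python
def penalized_bethe(F_bethe, violations, c=1.0):
    """Augmented Lagrangian with multipliers lambda = 0 and fixed penalty
    c = 1: the Bethe free energy plus a quadratic penalty on whatever
    consistency constraints a (possibly non-convergent) run left unmet."""
    return F_bethe + 0.5 * c * sum(v ** 2 for v in violations)
```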
84 As we see in Figure 2(a,bottom), MoM finds a lower free energy for most trials. [sent-198, score-0.349]
85 Wall clock time is measured in seconds across various trials, and the percentiles for LBP are 25%: 1040. [sent-201, score-0.117]
86 Gaussian Markov Random Fields. For the Gaussian case we again sample 500 random instances from a 10×10 lattice with toroidal boundary conditions. [sent-210, score-0.123]
87 We randomly sample only pairwise normalizable instances, and initialization is provided by a single iteration of Gaussian LBP. [sent-211, score-0.259]
88 For pairwise normalizable models, Gaussian LBP is guaranteed to converge to the unique fixed point of the Bethe free energy, so it is reassuring that MoM optimization matches LBP performance. [sent-214, score-0.4]
89 The value of the augmented Lagrangian at the final iteration is shown in Figure 2(b,bottom) and again shows that MoM matches Gaussian LBP on pairwise normalizable models. [sent-215, score-0.332]
90 Bottom: Penalized Bethe free energy constructed by setting λ = 0, c = 1 in the augmented Lagrangian. [sent-271, score-0.447]
91 Discrete Mixtures of Gaussian Potentials. To test the benefits of avoiding degenerate marginals, we consider the clutter model of Sec. [sent-273, score-0.103]
92 A good initialization of the multipliers is critical to the performance of MoM. [sent-279, score-0.146]
93 We generate 10 initializations by running 5 iterations of EP, each with a different random message update schedule, compute the corresponding Lagrange multipliers for each, and use the one with the lowest value of the augmented Lagrangian. [sent-280, score-0.295]
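A hedged sketch of this initialization heuristic; all three callables are hypothetical stand-ins for the paper's EP runs, the message-to-multiplier correspondence of Sec. 2, and the augmented Lagrangian evaluation:

```python
import numpy as np

def init_multipliers(run_ep, messages_to_multipliers, aug_lagrangian,
                     n_restarts=10, n_ep_iters=5, seed=0):
    """Run a few EP iterations under several random message schedules,
    convert each message set to Lagrange multipliers, and keep the
    candidate with the lowest augmented Lagrangian value."""
    rng = np.random.default_rng(seed)
    best_lam, best_val = None, np.inf
    for _ in range(n_restarts):
        msgs = run_ep(n_iters=n_ep_iters, schedule_rng=rng)
        lam = messages_to_multipliers(msgs)
        val = aug_lagrangian(lam)
        if val < best_val:
            best_lam, best_val = lam, val
    return best_lam
```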
94 Similarly, the augmented Lagrangian comparison is shown in Figure 2(c,bottom), and MoM often finds a better penalized free energy. [sent-285, score-0.288]
95 While MoM and EP can both suffer from local optima, MoM avoids non-convergence and the output of invalid (negative variance) marginal distributions. [sent-286, score-0.127]
96 This method directly avoids the creation of degenerate distributions — for example with negative variance — which frequently occur in more greedy approaches for minimizing the Bethe free energy. [sent-297, score-0.253]
97 Properties of Bethe free energies and message passing in Gaussian models. [sent-332, score-0.785]
98 CCCP algorithms to minimize the Bethe and Kikuchi free energies: Convergent alternatives to Belief Propagation. [sent-336, score-0.19]
99 On the uniqueness of loopy belief propagation fixed points. [sent-392, score-0.189]
100 libDAI: A free and open source C++ library for discrete approximate inference in graphical models. [sent-404, score-0.259]
wordName wordTfidf (topN-words)
[('bethe', 0.432), ('xs', 0.399), ('mom', 0.36), ('lbp', 0.233), ('free', 0.19), ('vts', 0.188), ('st', 0.187), ('normalizable', 0.165), ('energy', 0.159), ('jst', 0.138), ('multipliers', 0.121), ('lagrangian', 0.117), ('xk', 0.103), ('augmented', 0.098), ('cts', 0.097), ('vs', 0.095), ('qs', 0.094), ('toroidal', 0.094), ('vst', 0.094), ('mrfs', 0.09), ('propagation', 0.086), ('ep', 0.085), ('multiplier', 0.078), ('xt', 0.077), ('message', 0.076), ('marginal', 0.076), ('belief', 0.069), ('clutter', 0.067), ('variational', 0.067), ('degeneracy', 0.065), ('marginals', 0.065), ('gaussian', 0.063), ('unbounded', 0.059), ('lagrange', 0.058), ('fgb', 0.055), ('ck', 0.054), ('passing', 0.052), ('invalid', 0.051), ('lc', 0.051), ('projection', 0.05), ('qst', 0.047), ('seconds', 0.046), ('pairwise', 0.045), ('discrete', 0.044), ('constraints', 0.042), ('zi', 0.042), ('bvp', 0.041), ('normalizability', 0.041), ('mst', 0.041), ('percentiles', 0.041), ('cs', 0.04), ('constraint', 0.037), ('penalty', 0.037), ('ns', 0.036), ('degenerate', 0.036), ('heskes', 0.036), ('potentials', 0.036), ('energies', 0.035), ('expectation', 0.035), ('valid', 0.034), ('zp', 0.034), ('mrf', 0.034), ('loopy', 0.034), ('variances', 0.034), ('fb', 0.033), ('correspondence', 0.032), ('node', 0.032), ('convergence', 0.031), ('cst', 0.031), ('eqst', 0.031), ('jss', 0.031), ('lck', 0.031), ('rks', 0.031), ('stationary', 0.031), ('clock', 0.03), ('fields', 0.03), ('lattice', 0.029), ('gradient', 0.029), ('hs', 0.029), ('ts', 0.028), ('bst', 0.028), ('wall', 0.027), ('variance', 0.027), ('enforced', 0.027), ('ising', 0.027), ('constrained', 0.026), ('inference', 0.025), ('initialization', 0.025), ('continuous', 0.025), ('var', 0.024), ('supplemental', 0.024), ('iteration', 0.024), ('derivations', 0.023), ('families', 0.023), ('sk', 0.023), ('ci', 0.022), ('yedidia', 0.022), ('log', 0.021), ('symmetric', 0.02), ('hybrid', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999946 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation
Author: Jason Pacheco, Erik B. Sudderth
Abstract: We develop convergent minimization algorithms for Bethe variational approximations which explicitly constrain marginal estimates to families of valid distributions. While existing message passing algorithms define fixed point iterations corresponding to stationary points of the Bethe free energy, their greedy dynamics do not distinguish between local minima and maxima, and can fail to converge. For continuous estimation problems, this instability is linked to the creation of invalid marginal estimates, such as Gaussians with negative variance. Conversely, our approach leverages multiplier methods with well-understood convergence properties, and uses bound projection methods to ensure that marginal approximations are valid at all iterations. We derive general algorithms for discrete and Gaussian pairwise Markov random fields, showing improvements over standard loopy belief propagation. We also apply our method to a hybrid model with both discrete and continuous variables, showing improvements over expectation propagation. 1
2 0.31441045 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models
Author: Nicholas Ruozzi
Abstract: Sudderth, Wainwright, and Willsky conjectured that the Bethe approximation corresponding to any fixed point of the belief propagation algorithm over an attractive, pairwise binary graphical model provides a lower bound on the true partition function. In this work, we resolve this conjecture in the affirmative by demonstrating that, for any graphical model with binary variables whose potential functions (not necessarily pairwise) are all log-supermodular, the Bethe partition function always lower bounds the true partition function. The proof of this result follows from a new variant of the “four functions” theorem that may be of independent interest. 1
3 0.27166921 147 nips-2012-Graphical Models via Generalized Linear Models
Author: Eunho Yang, Genevera Allen, Zhandong Liu, Pradeep K. Ravikumar
Abstract: Undirected graphical models, also known as Markov networks, enjoy popularity in a variety of applications. The popular instances of these models such as Gaussian Markov Random Fields (GMRFs), Ising models, and multinomial discrete models, however do not capture the characteristics of data in many settings. We introduce a new class of graphical models based on generalized linear models (GLMs) by assuming that node-wise conditional distributions arise from exponential families. Our models allow one to estimate multivariate Markov networks given any univariate exponential distribution, such as Poisson, negative binomial, and exponential, by fitting penalized GLMs to select the neighborhood for each node. A major contribution of this paper is the rigorous statistical analysis showing that with high probability, the neighborhood of our graphical models can be recovered exactly. We also provide examples of non-Gaussian high-throughput genomic networks learned via our GLM graphical models. 1
4 0.2426524 331 nips-2012-Symbolic Dynamic Programming for Continuous State and Observation POMDPs
Author: Zahra Zamani, Scott Sanner, Pascal Poupart, Kristian Kersting
Abstract: Point-based value iteration (PBVI) methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to partiallyobservable Markov decision processes (POMDPs) when a set of initial belief states is known. However, no PBVI work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of observations, there are only a finite number of continuous observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is considered. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous state MDPs can be generalized to continuous state POMDPs with discrete observations, and (2) we show how recently developed symbolic integration methods allow this solution to be extended to PBVI for continuous state and observation POMDPs with potentially correlated, multivariate continuous observation spaces. 1
5 0.14937048 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses
Author: Po-ling Loh, Martin J. Wainwright
Abstract: We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. Based on our population-level results, we show how the graphical Lasso may be used to recover the edge structure of certain classes of discrete graphical models, and present simulations to verify our theoretical results. 1
6 0.12095097 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function
7 0.11662927 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems
8 0.10689329 300 nips-2012-Scalable nonconvex inexact proximal splitting
9 0.09278091 324 nips-2012-Stochastic Gradient Descent with Only One Projection
10 0.080843091 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature
11 0.080334961 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)
12 0.076941557 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo
13 0.070331439 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression
14 0.068676725 282 nips-2012-Proximal Newton-type methods for convex optimization
15 0.067271605 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization
16 0.066801608 84 nips-2012-Convergence Rate Analysis of MAP Coordinate Minimization Algorithms
17 0.063102707 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering
18 0.062653206 305 nips-2012-Selective Labeling via Error Bound Minimization
19 0.060918558 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models
20 0.060302898 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function
topicId topicWeight
[(0, 0.173), (1, 0.014), (2, 0.124), (3, -0.013), (4, -0.053), (5, -0.009), (6, -0.018), (7, -0.199), (8, -0.033), (9, 0.002), (10, -0.122), (11, 0.014), (12, 0.078), (13, 0.076), (14, -0.129), (15, -0.084), (16, -0.009), (17, -0.052), (18, -0.143), (19, 0.201), (20, 0.004), (21, -0.142), (22, -0.218), (23, -0.1), (24, 0.062), (25, 0.174), (26, 0.223), (27, 0.1), (28, -0.058), (29, 0.036), (30, -0.082), (31, 0.018), (32, 0.105), (33, -0.059), (34, 0.034), (35, 0.113), (36, 0.039), (37, 0.022), (38, -0.054), (39, 0.05), (40, 0.093), (41, -0.045), (42, 0.021), (43, 0.014), (44, 0.104), (45, -0.038), (46, 0.005), (47, -0.053), (48, -0.012), (49, 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.95348901 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation
Author: Jason Pacheco, Erik B. Sudderth
Abstract: We develop convergent minimization algorithms for Bethe variational approximations which explicitly constrain marginal estimates to families of valid distributions. While existing message passing algorithms define fixed point iterations corresponding to stationary points of the Bethe free energy, their greedy dynamics do not distinguish between local minima and maxima, and can fail to converge. For continuous estimation problems, this instability is linked to the creation of invalid marginal estimates, such as Gaussians with negative variance. Conversely, our approach leverages multiplier methods with well-understood convergence properties, and uses bound projection methods to ensure that marginal approximations are valid at all iterations. We derive general algorithms for discrete and Gaussian pairwise Markov random fields, showing improvements over standard loopy belief propagation. We also apply our method to a hybrid model with both discrete and continuous variables, showing improvements over expectation propagation. 1
2 0.80290389 331 nips-2012-Symbolic Dynamic Programming for Continuous State and Observation POMDPs
Author: Zahra Zamani, Scott Sanner, Pascal Poupart, Kristian Kersting
Abstract: Point-based value iteration (PBVI) methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to partiallyobservable Markov decision processes (POMDPs) when a set of initial belief states is known. However, no PBVI work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of observations, there are only a finite number of continuous observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is considered. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous state MDPs can be generalized to continuous state POMDPs with discrete observations, and (2) we show how recently developed symbolic integration methods allow this solution to be extended to PBVI for continuous state and observation POMDPs with potentially correlated, multivariate continuous observation spaces. 1
3 0.73544014 147 nips-2012-Graphical Models via Generalized Linear Models
Author: Eunho Yang, Genevera Allen, Zhandong Liu, Pradeep K. Ravikumar
Abstract: Undirected graphical models, also known as Markov networks, enjoy popularity in a variety of applications. The popular instances of these models such as Gaussian Markov Random Fields (GMRFs), Ising models, and multinomial discrete models, however do not capture the characteristics of data in many settings. We introduce a new class of graphical models based on generalized linear models (GLMs) by assuming that node-wise conditional distributions arise from exponential families. Our models allow one to estimate multivariate Markov networks given any univariate exponential distribution, such as Poisson, negative binomial, and exponential, by fitting penalized GLMs to select the neighborhood for each node. A major contribution of this paper is the rigorous statistical analysis showing that with high probability, the neighborhood of our graphical models can be recovered exactly. We also provide examples of non-Gaussian high-throughput genomic networks learned via our GLM graphical models. 1
4 0.64667207 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models
Author: Nicholas Ruozzi
Abstract: Sudderth, Wainwright, and Willsky conjectured that the Bethe approximation corresponding to any fixed point of the belief propagation algorithm over an attractive, pairwise binary graphical model provides a lower bound on the true partition function. In this work, we resolve this conjecture in the affirmative by demonstrating that, for any graphical model with binary variables whose potential functions (not necessarily pairwise) are all log-supermodular, the Bethe partition function always lower bounds the true partition function. The proof of this result follows from a new variant of the “four functions” theorem that may be of independent interest. 1
5 0.56215364 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function
Author: Stefano Ermon, Ashish Sabharwal, Bart Selman, Carla P. Gomes
Abstract: Given a probabilistic graphical model, its density of states is a distribution that, for any likelihood value, gives the number of configurations with that probability. We introduce a novel message-passing algorithm called Density Propagation (DP) for estimating this distribution. We show that DP is exact for tree-structured graphical models and is, in general, a strict generalization of both sum-product and max-product algorithms. Further, we use density of states and tree decomposition to introduce a new family of upper and lower bounds on the partition function. For any tree decomposition, the new upper bound based on finer-grained density of state information is provably at least as tight as previously known bounds based on convexity of the log-partition function, and strictly stronger if a general condition holds. We conclude with empirical evidence of improvement over convex relaxations and mean-field based bounds. 1
6 0.47858149 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses
7 0.43391058 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)
8 0.3791647 53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization
9 0.36047158 359 nips-2012-Variational Inference for Crowdsourcing
10 0.35063624 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo
11 0.33833125 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization
12 0.33317748 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems
13 0.32826531 37 nips-2012-Affine Independent Variational Inference
14 0.3260214 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction
15 0.31566572 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions
16 0.31463996 206 nips-2012-Majorization for CRFs and Latent Likelihoods
17 0.31197897 214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover
18 0.30206764 129 nips-2012-Fast Variational Inference in the Conjugate Exponential Family
19 0.2994459 300 nips-2012-Scalable nonconvex inexact proximal splitting
20 0.28672493 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression
topicId topicWeight
[(0, 0.017), (17, 0.014), (21, 0.027), (35, 0.239), (38, 0.182), (39, 0.03), (42, 0.027), (54, 0.059), (74, 0.027), (76, 0.118), (80, 0.109), (92, 0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.84101462 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation
Author: Jason Pacheco, Erik B. Sudderth
Abstract: We develop convergent minimization algorithms for Bethe variational approximations which explicitly constrain marginal estimates to families of valid distributions. While existing message passing algorithms define fixed point iterations corresponding to stationary points of the Bethe free energy, their greedy dynamics do not distinguish between local minima and maxima, and can fail to converge. For continuous estimation problems, this instability is linked to the creation of invalid marginal estimates, such as Gaussians with negative variance. Conversely, our approach leverages multiplier methods with well-understood convergence properties, and uses bound projection methods to ensure that marginal approximations are valid at all iterations. We derive general algorithms for discrete and Gaussian pairwise Markov random fields, showing improvements over standard loopy belief propagation. We also apply our method to a hybrid model with both discrete and continuous variables, showing improvements over expectation propagation. 1
2 0.77780044 162 nips-2012-Inverse Reinforcement Learning through Structured Classification
Author: Edouard Klein, Matthieu Geist, Bilal Piot, Olivier Pietquin
Abstract: This paper adresses the inverse reinforcement learning (IRL) problem, that is inferring a reward for which a demonstrated expert behavior is optimal. We introduce a new algorithm, SCIRL, whose principle is to use the so-called feature expectation of the expert as the parameterization of the score function of a multiclass classifier. This approach produces a reward function for which the expert policy is provably near-optimal. Contrary to most of existing IRL algorithms, SCIRL does not require solving the direct RL problem. Moreover, with an appropriate heuristic, it can succeed with only trajectories sampled according to the expert behavior. This is illustrated on a car driving simulator. 1
3 0.77101892 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines
Author: Nitish Srivastava, Ruslan Salakhutdinov
Abstract: A Deep Boltzmann Machine is described for learning a generative model of data that consists of multiple and diverse input modalities. The model can be used to extract a unified representation that fuses modalities together. We find that this representation is useful for classification and information retrieval tasks. The model works by learning a probability density over the space of multimodal inputs. It uses states of latent variables as representations of the input. The model can extract this representation even when some modalities are absent by sampling from the conditional distribution over them and filling them in. Our experimental results on bi-modal data consisting of images and text show that the Multimodal DBM can learn a good generative model of the joint space of image and text inputs that is useful for information retrieval from both unimodal and multimodal queries. We further demonstrate that this model significantly outperforms SVMs and LDA on discriminative tasks. Finally, we compare our model to other deep learning methods, including autoencoders and deep belief networks, and show that it achieves noticeable gains. 1
4 0.74373782 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes
Author: Michael Bryant, Erik B. Sudderth
Abstract: Variational methods provide a computationally scalable alternative to Monte Carlo methods for large-scale, Bayesian nonparametric learning. In practice, however, conventional batch and online variational methods quickly become trapped in local optima. In this paper, we consider a nonparametric topic model based on the hierarchical Dirichlet process (HDP), and develop a novel online variational inference algorithm based on split-merge topic updates. We derive a simpler and faster variational approximation of the HDP, and show that by intelligently splitting and merging components of the variational posterior, we can achieve substantially better predictions of test data than conventional online and batch variational algorithms. For streaming analysis of large datasets where batch analysis is infeasible, we show that our split-merge updates better capture the nonparametric properties of the underlying model, allowing continual learning of new topics.
5 0.74096012 291 nips-2012-Reducing statistical time-series problems to binary classification
Author: Daniil Ryabko, Jeremie Mary
Abstract: We show how binary classification methods developed to work on i.i.d. data can be used for solving statistical problems that are seemingly unrelated to classification and concern highly-dependent time series. Specifically, the problems of time-series clustering, homogeneity testing and the three-sample problem are addressed. The algorithms that we construct for solving these problems are based on a new metric between time-series distributions, which can be evaluated using binary classification methods. Universal consistency of the proposed algorithms is proven under most general assumptions. The theoretical results are illustrated with experiments on synthetic and real-world data. 1
6 0.72204071 255 nips-2012-On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
7 0.72109461 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)
8 0.71765006 241 nips-2012-No-Regret Algorithms for Unconstrained Online Convex Optimization
9 0.71238595 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification
10 0.71206802 227 nips-2012-Multiclass Learning with Simplex Coding
11 0.71189207 292 nips-2012-Regularized Off-Policy TD-Learning
12 0.71086097 324 nips-2012-Stochastic Gradient Descent with Only One Projection
13 0.70961654 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration
14 0.70936215 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback
15 0.70920587 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes
16 0.70839894 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models
17 0.70835108 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress
18 0.70800167 187 nips-2012-Learning curves for multi-task Gaussian process regression
19 0.70711428 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction
20 0.70691997 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning