nips nips2012 nips2012-213 knowledge-graph by maker-knowledge-mining

213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation


Source: pdf

Author: Jason Pacheco, Erik B. Sudderth

Abstract: We develop convergent minimization algorithms for Bethe variational approximations which explicitly constrain marginal estimates to families of valid distributions. While existing message passing algorithms define fixed point iterations corresponding to stationary points of the Bethe free energy, their greedy dynamics do not distinguish between local minima and maxima, and can fail to converge. For continuous estimation problems, this instability is linked to the creation of invalid marginal estimates, such as Gaussians with negative variance. Conversely, our approach leverages multiplier methods with well-understood convergence properties, and uses bound projection methods to ensure that marginal approximations are valid at all iterations. We derive general algorithms for discrete and Gaussian pairwise Markov random fields, showing improvements over standard loopy belief propagation. We also apply our method to a hybrid model with both discrete and continuous variables, showing improvements over expectation propagation. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We develop convergent minimization algorithms for Bethe variational approximations which explicitly constrain marginal estimates to families of valid distributions. [sent-5, score-0.2]

2 While existing message passing algorithms define fixed point iterations corresponding to stationary points of the Bethe free energy, their greedy dynamics do not distinguish between local minima and maxima, and can fail to converge. [sent-6, score-0.349]

3 For continuous estimation problems, this instability is linked to the creation of invalid marginal estimates, such as Gaussians with negative variance. [sent-7, score-0.152]

4 Conversely, our approach leverages multiplier methods with well-understood convergence properties, and uses bound projection methods to ensure that marginal approximations are valid at all iterations. [sent-8, score-0.269]

5 We derive general algorithms for discrete and Gaussian pairwise Markov random fields, showing improvements over standard loopy belief propagation. [sent-9, score-0.192]

6 We also apply our method to a hybrid model with both discrete and continuous variables, showing improvements over expectation propagation. [sent-10, score-0.124]

7 Typically the optimization is formulated by minimizing an objective known as the Gibbs free energy [1]. [sent-12, score-0.349]

8 Local message passing algorithms offer a computationally efficient method for extremizing variational free energies. [sent-14, score-0.385]

9 Loopy belief propagation (LBP), for example, optimizes a relaxed objective known as the Bethe free energy [1, 2], which we review in Sec. [sent-15, score-0.504]

10 Expectation propagation (EP) [3] is a generalization of LBP which shares the same objective, but optimizes over a relaxed set of constraints [4] applicable to a broader family of continuous inference problems. [sent-17, score-0.178]

11 Even in simple continuous models, both methods may improperly estimate invalid or degenerate marginal distributions, such as Gaussians with negative variance. [sent-19, score-0.188]

12 Work on optimization of continuous variational free energies has primarily focused on addressing convergence problems [11]. [sent-22, score-0.348]

13 By leveraging gradient projection methods from the extensive literature on constrained nonlinear optimization, we develop an algorithm which ensures that marginal estimates remain valid and normalizable at all iterations. [sent-24, score-0.38]

14 In doing so, we account for important constraints which have been ignored by previous variational derivations of the expectation propagation algorithm [12, 6, 11]. [sent-25, score-0.253]

15 Moreover, by adapting the method of multipliers [13], we guarantee that our inference algorithm converges for most models of practical interest. [sent-26, score-0.146]

16 We briefly review the correspondence between the Lagrangian formalism and message passing and discuss implicit normalizability assumptions which, when violated, lead to degeneracy in message passing algorithms. [sent-29, score-0.394]

17 We derive algorithms (Sec. 4) for discrete MRFs, Gaussian MRFs, and hybrid models with potentials defined by discrete mixtures of Gaussian distributions. [sent-33, score-0.144]

18 Experiments in Sec. 5 demonstrate substantial improvements over baseline message passing algorithms. [sent-35, score-0.128]

19 We are interested in computing the log partition function log Zp, and/or the marginal distributions p(xs), s ∈ V. [sent-39, score-0.118]

20 We associate each node s ∈ V with an exponential family qs(xs; µs), φs(x) ∈ R^{ds}, and each edge (s, t) ∈ E with a family qst(xs, xt; µst), φst(x) ∈ R^{dst}. [sent-42, score-0.25]

21 Because qs(xs; µs) is a valid probability distribution, µs must lie in a set of realizable mean parameters, µs ∈ Ms. [sent-43, score-0.128]

22 We can express the log partition as the solution to an optimization problem, − log Zp = min_{µ∈M(G)} Eµ[− log p(x)] − H[µ] = min_{µ∈M(G)} F(µ), (3) where H[µ] is the entropy of q(x; µ), Eµ[·] denotes expectation with respect to q(x; µ), and F(µ) is known as the variational free energy. [sent-46, score-0.355]
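
To make the role of (3) concrete, here is a minimal sketch, assuming a hypothetical 3-node discrete chain with illustrative random potentials, of computing log Zp by brute-force enumeration; this is exactly the quantity the variational minimization approximates when enumeration is intractable.

```python
import itertools
import numpy as np

# Hypothetical 3-node chain MRF; `psi` (node) and `phi` (edge) potentials
# are illustrative names, not taken from the paper.
K = 2                                                    # values {0, 1}
rng = np.random.default_rng(0)
psi = {s: rng.random(K) + 0.1 for s in range(3)}
phi = {e: rng.random((K, K)) + 0.1 for e in [(0, 1), (1, 2)]}

# log Z_p by exhaustive enumeration: feasible only for tiny models, which
# is why variational approximations of (3) are needed in general.
Z = 0.0
for x in itertools.product(range(K), repeat=3):
    p = np.prod([psi[s][x[s]] for s in range(3)])
    p *= np.prod([phi[(s, t)][x[s], x[t]] for (s, t) in phi])
    Z += p
print("log Z_p =", np.log(Z))
```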

23 Since our focus is on the continuous case we briefly review the correspondence between Gaussian LBP fixed points and the Gaussian Bethe free energy. [sent-59, score-0.247]

24 The locally consistent marginal polytope L(G) consists of the constraints Cts(V) = Vs − Vts for all nodes s ∈ V and edges (s, t) ∈ E. [sent-63, score-0.118]

25 Taking the derivative of the Lagrangian (9) with respect to the node marginal variance, ∂L/∂Vs = 0 yields the stationary point Vs⁻¹ = As + (1/(ns − 1)) Σ_{t∈N(s)} λts. [sent-65, score-0.202]

26 For a Gaussian LBP algorithm with messages parametrized as m_{t→s}(xs) = exp(−½ Λ_{t→s} xs²), fixed points of the node marginal precision are given by Λs = As + Σ_{t∈N(s)} Λ_{t→s}. Let λts = −½ Σ_{a∈N(s)\t} Λ_{a→s}. [sent-66, score-0.108]
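
A minimal sketch of these precision fixed points, assuming a scalar pairwise model with precision matrix A; the message-precision recursion below is the standard textbook Gaussian BP form, assumed here rather than quoted from the paper.

```python
import numpy as np

# Pairwise normalizable symmetric 3-cycle: diagonal A_ss = 1, couplings 0.3.
A = np.array([[1.0, 0.3, 0.3],
              [0.3, 1.0, 0.3],
              [0.3, 0.3, 1.0]])
n = A.shape[0]
Lam = np.zeros((n, n))                  # Lam[t, s] = precision of m_{t->s}

for _ in range(100):
    new = np.zeros_like(Lam)
    for t in range(n):
        for s in range(n):
            if s != t and A[t, s] != 0.0:
                # cavity precision at t, excluding the message coming from s
                cav = A[t, t] + sum(Lam[a, t] for a in range(n)
                                    if a not in (t, s) and A[a, t] != 0.0)
                new[t, s] = -A[t, s] ** 2 / cav
    Lam = new

# Fixed point of the node marginal precision: Lambda_s = A_ss + sum_t Lam[t, s].
Lambda = A.diagonal() + Lam.sum(axis=0)
print("LBP marginal precisions:", Lambda)
# For reference only: LBP variances are approximate on loopy graphs.
print("exact precisions:", 1.0 / np.diag(np.linalg.inv(A)))
```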

27 Inverting the correspondence between multipliers and message parameters yields the converse Vs⁻¹ ⇐ Λs. [sent-69, score-0.229]

28 2 Message Passing Non-Convergence and Degeneracy While local message passing algorithms are convenient for many applications, their convergence is not guaranteed in general. [sent-73, score-0.159]

29 For continuous models message updates may yield degenerate, unnormalizable marginal distributions which do not correspond to stationary points of the Lagrangian. [sent-76, score-0.208]

30 For example, for Gaussian MRFs the Bethe free energy F_B(·) in (5) is derived from expectations with respect to variational distributions qs(xs; µs), qst(xs, xt; µst). [sent-77, score-0.634]

31 If a set of hypothesized marginals is not normalizable (i.e., does not have positive variance), the Gaussian Bethe free energy F_GB(·) is invalid and undefined. [sent-78, score-0.63]

32 Here LBP produces marginal variances which oscillate between impossibly large positive and nonsensical negative values. [sent-81, score-0.11]

33 Such degeneracies are arguably more problematic for EP since its moment matching steps require expected values with respect to an augmented distribution [3], which may involve an unbounded integral. [sent-82, score-0.157]

34 (b) Node marginal variance estimates per iteration for a symmetric, single-cycle Gaussian MRF with three nodes (plot is of x1; other nodes are similar). [sent-89, score-0.127]

35 (c) For the model from (b), the Gaussian Bethe free energy is unbounded on the constraint set. [sent-90, score-0.445]

36 For Gaussian MRFs, the class of pairwise normalizable models is sufficient to guarantee LBP stability and convergence [5]. [sent-93, score-0.241]

37 For non-pairwise normalizable models the Gaussian Bethe free energy is unbounded below [6] on the set of local consistency constraints L(G). [sent-94, score-0.615]

38 We offer a small example consisting of a non-pairwise normalizable symmetric single cycle with 3 nodes. [sent-95, score-0.185]

39 Substituting this parametrization into the Gaussian free energy (8), and performing some simple algebra, yields FGB(V, ρ) = −(3/2) log V + (3/2) V(1 + 1.2ρ) − (3/2) log(1 − ρ²), up to additive constants. [sent-101, score-0.37]

40 For ρ < −1/1.2, the free energy is unbounded below at rate O(−V). [sent-104, score-0.408]

41 Figure 1(c) illustrates the Bethe free energy for this model as a function of V , and for several values of ρ. [sent-105, score-0.349]
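
A quick numeric check of the expression for FGB(V, ρ) above (constants dropped); the underlying parameters As = 1 and Jst = 0.6 are inferred from the coefficient 1.2 and should be treated as assumptions.

```python
import numpy as np

def f_gb(V, rho):
    # Reconstructed F_GB(V, rho) for the symmetric 3-node cycle, up to constants.
    return (-1.5 * np.log(V) + 1.5 * V * (1.0 + 1.2 * rho)
            - 1.5 * np.log(1.0 - rho ** 2))

for rho in (-0.5, -0.9):
    vals = [f_gb(V, rho) for V in (1.0, 10.0, 100.0, 1000.0)]
    print(f"rho={rho:+.1f}:", ["%.1f" % v for v in vals])
# rho = -0.5 (> -1/1.2): grows with V, bounded below.
# rho = -0.9 (< -1/1.2): decreases linearly in V, i.e. unbounded below.
```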

42 More generally, it has been shown that Gaussian EP messages are always normalizable (positive variance) for models with log-concave potentials [14]. [sent-106, score-0.201]

43 Our work seeks to improve inference for non-log-concave models with bounded Bethe free energies. [sent-109, score-0.215]

44 3 Method of Multipliers Given our complete constrained formulation of the Bethe variational problem, we avoid convergence and degeneracy problems via direct minimization using the method of multipliers (MoM) [13]. [sent-110, score-0.31]

45 The penalty multiplier can be updated as ck+1 ≥ ck according to some fixed update schedule, or based on the results of the optimization step. [sent-112, score-0.169]

46 An update rule that we find useful [13] is to increase the penalty parameter by β > 1 if the constraint violation is not improved by a factor 0 < γ < 1 over the previous iteration: ck+1 = βck if ‖h(xk)‖ > γ‖h(xk−1)‖, and ck+1 = ck if ‖h(xk)‖ ≤ γ‖h(xk−1)‖. [sent-113, score-0.128]
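
A sketch of this schedule; β = 10 and γ = 0.25 are illustrative defaults, not values fixed by the text.

```python
import numpy as np

def update_penalty(c_k, h_k, h_prev, beta=10.0, gamma=0.25):
    """Return c_{k+1} given constraint residuals at iterations k and k-1."""
    if np.linalg.norm(h_k) > gamma * np.linalg.norm(h_prev):
        return beta * c_k   # violation not sufficiently reduced: penalize harder
    return c_k              # sufficient progress: keep the current penalty
```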

47 Gradient Projection Methods: The augmented Lagrangian Lc(x, λ) is a partial one, where feasibility of the mean parameters (x ∈ M) is enforced explicitly by projection. [sent-115, score-0.125]

48 A simple gradient projection method [13] defines the sequence xk+1 = xk + αk (x̄k − xk), where x̄k = [xk − sk ∇f(xk)]⁺. [sent-116, score-0.411]

49 After taking a step sk > 0 in the direction of the negative gradient, we project the result onto the constraint set to obtain a feasible direction x̄k. [sent-118, score-0.163]

50 We then compute xk+1 by taking a step αk ∈ (0, 1] in the direction of (x̄k − xk). [sent-119, score-0.103]

51 If xk − sk ∇f(xk) is feasible, gradient projection reduces to unconstrained steepest descent. [sent-120, score-0.205]
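
A minimal sketch of one such update; `project` is a placeholder for the model-specific bound projection, and the step sizes are illustrative.

```python
import numpy as np

def gradient_projection_step(x_k, grad, project, s_k=0.1, alpha_k=1.0):
    """One update: step against the gradient, project onto the feasible set,
    then move a fraction alpha_k toward the projected point."""
    x_bar = project(x_k - s_k * grad(x_k))      # feasible direction endpoint
    return x_k + alpha_k * (x_bar - x_k)        # x_{k+1}

# Example: minimize f(x) = ||x - 1||^2 over the nonnegative orthant.
x = np.array([-2.0, 3.0])
for _ in range(50):
    x = gradient_projection_step(x, lambda z: 2.0 * (z - 1.0),
                                 lambda z: np.maximum(z, 0.0))
print(x)  # approaches [1, 1]
```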

52 Convergence results hold [13] for the Method of Multipliers with a quadratic penalty and multiplier iteration λk+1 = λk + ck h(xk). [sent-130, score-0.193]
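
Putting the pieces together, a hedged sketch of the MoM outer loop on a toy equality-constrained problem; the toy objective, constraint, and all step sizes are assumptions for illustration, not the paper's routine.

```python
import numpy as np

def mom(x, lam=0.0, c=1.0, beta=10.0, gamma=0.25, outer=20, inner=200):
    """Toy MoM: min ||x||^2 subject to x_1 + x_2 = 1."""
    f_grad = lambda z: 2.0 * z                       # gradient of ||x||^2
    h = lambda z: z[0] + z[1] - 1.0                  # equality constraint
    h_prev = abs(h(x))
    for _ in range(outer):
        for _ in range(inner):                       # inner gradient descent
            g = f_grad(x) + (lam + c * h(x)) * np.ones(2)
            x = x - 0.01 * g                         # minimize L_c(x, lam)
        lam += c * h(x)                              # multiplier iteration
        if abs(h(x)) > gamma * h_prev:
            c *= beta                                # tighten the penalty
        h_prev = abs(h(x))
    return x, lam

print(mom(np.array([2.0, -1.0])))                    # approaches [0.5, 0.5]
```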

53 For convergence, the initialization of the Lagrange multiplier λ0 and penalty parameter c0 must be such that ‖λ0 − λ∗‖ < δc0 for some δ > 0 and c0 ≥ c̄, where δ and c̄ depend on the objective and constraints. [sent-133, score-0.14]

54 In practice, a poor initialization of the multiplier λ0 can often be offset by a sufficiently high c0. [sent-134, score-0.103]

55 A final technical note is that convergence proofs assume the sequence of unconstrained optimizations yielding xk stays in a neighborhood of x∗ after some k. [sent-135, score-0.134]

56 To invoke existing convergence results we must show that a local minimum x∗ exists for each of the free energies we consider; a sufficient condition is then that the Bethe free energy is bounded from below. [sent-137, score-0.605]

57 This property has been previously established for general discrete MRFs [18], for pairwise normalizable Gaussian MRFs [6], and for the clutter model [3]. [sent-138, score-0.321]

58 For non-pairwise normalizable Gaussian MRFs, the example of Section 2.3 shows that the Bethe variational objective is unbounded below, and further may not contain any local optima. [sent-139, score-0.165] [sent-140, score-0.126]

60 While the method of multipliers does not converge in this situation, its non-convergence is due to fundamental flaws in the Bethe approximation. [sent-141, score-0.121]

61 4 MoM Algorithms for Probabilistic Inference We derive MoM algorithms which minimize the Bethe free energy for three different families of graphical models. [sent-142, score-0.372]

62 For each model we define the form of the joint distribution, Bethe free energy (5), local consistency constraints, augmented Lagrangian, and the gradient projection step. [sent-143, score-0.526]

63 The Gaussian Bethe free energy (8) is always unbounded below off the constraint set in the node marginal variances Vs. [sent-147, score-0.587]

64 We correct this by adding an additional fixed penalty in the augmented Lagrangian, Lc(V, Σ, λ) = FGB(V) + Σ_s Σ_{t∈N(s)} λts [Vs − Vts] + (κ/2) Σ_s Σ_{t∈N(s)} [log Vs − log Vts]² + (c/2) Σ_s Σ_{t∈N(s)} [Vs − Vts]². [sent-148, score-0.156]
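
A sketch evaluating just the multiplier and penalty terms of this augmented Lagrangian (FGB(V) itself is model-specific and omitted); `pairs`, `V`, `Vts`, and `lam` are illustrative names.

```python
import numpy as np

def gaussian_al_terms(V, Vts, lam, kappa, c, pairs):
    """V[s]: node variances; Vts[(t, s)]: edge-copy variances of x_s;
    lam[(t, s)]: Lagrange multipliers for the constraints V[s] = Vts[(t, s)]."""
    val = 0.0
    for (t, s) in pairs:
        diff = V[s] - Vts[(t, s)]
        val += lam[(t, s)] * diff                                  # multiplier term
        val += 0.5 * kappa * (np.log(V[s]) - np.log(Vts[(t, s)])) ** 2  # fixed log penalty
        val += 0.5 * c * diff ** 2                                 # quadratic penalty
    return val
```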

65 The gradient projection step is easily expressed in terms of correlation coefficients ρst, with Σst = [ Vst, ρst√(Vst Vts); ρst√(Vst Vts), Vts ]. [sent-152, score-0.266]

66 In Section 2.3, we showed that the Gaussian Bethe free energy is unbounded on the constraint set for non-pairwise normalizable models. [sent-157, score-0.61]

67 We run MoM on the symmetric three-node cycle from this discussion and find that MoM correctly identifies an unbounded direction; Figure 1(b) shows that the node marginal variances indeed diverge to infinity. [sent-158, score-0.221]

68 Discrete Markov Random Fields: Consider a discrete MRF where all variables xs ∈ Xs = {1, …, Ks}. [sent-160, score-0.443]

69 The variational marginal distributions are then qs(xs; τ) = Π_{k=1}^{Ks} τs(k)^{I(xs,k)}, and have mean parameters τs ∈ R^{Ks}. [sent-164, score-0.237]

70 Pairwise marginals have mean parameters τst ∈ R^{Ks×Kt}, similarly indexed as τst(xs, xt). [sent-166, score-0.142]

71 The discrete Bethe free energy is then FB(τ; ϕ) = Σ_{(s,t)∈E} Σ_{xs,xt} τst(xs, xt)[log τst(xs, xt) − log ϕst(xs, xt)] − Σ_{s∈V} Σ_{xs} (ns − 1) τs(xs)[log τs(xs) − log ϕs(xs)]. [sent-167, score-1.142]

72 For this discrete model, our expectation constraints reduce to the following normalization and marginalization constraints: Cs(τ) = 1 − Σ_{xs} τs(xs), Cts(xs; τ) = τs(xs) − Σ_{xt} τst(xs, xt). [sent-168, score-0.996]

73 The augmented Lagrangian is then Lc(τ, λ, ξ; ϕ) = FB(τ; ϕ) + Σ_{(s,t)∈E} [ Σ_{xs} λts(xs) Cts(xs; τ) + Σ_{xt} λst(xt) Cst(xt; τ) ] + Σ_{s∈V} ξs Cs(τ) + (c/2) Σ_{s∈V} Cs(τ)² + (c/2) Σ_{(s,t)∈E} [ Σ_{xs} Cts(xs; τ)² + Σ_{xt} Cst(xt; τ)² ]. (11) [sent-169, score-1.05]

74 This nonnegativity constraint is enforced by a bound projection τs(xs) ← max(0, τs(xs)), and similarly for the pairwise marginals. [sent-171, score-0.159]
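
A minimal sketch of the discrete Bethe free energy above together with the bound projection; the container names and the epsilon guard on the logarithms are assumptions.

```python
import numpy as np

def discrete_bethe(tau_s, tau_st, phi_s, phi_st, edges, eps=1e-12):
    """F_B(tau; phi) for a pairwise MRF. tau_s[s]: node marginal (K_s,);
    tau_st[(s, t)]: pairwise marginal (K_s, K_t); phi_*: potentials."""
    f = 0.0
    for (s, t) in edges:
        T = tau_st[(s, t)]
        f += np.sum(T * (np.log(T + eps) - np.log(phi_st[(s, t)])))
    for s, tau in tau_s.items():
        n_s = sum(1 for e in edges if s in e)          # neighbor count
        f -= (n_s - 1) * np.sum(tau * (np.log(tau + eps) - np.log(phi_s[s])))
    return f

def project_nonneg(tau):
    """Bound projection enforcing nonnegativity of the marginals."""
    return np.maximum(tau, 0.0)
```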

75 Assuming normalizable marginals with V0 ≥ 0, Vi0 ≥ 0, Vi1 ≥ 0, as always ensured by our multiplier method, we define the Bethe free energy F_CGB(m, V, β) in terms of the mean parameters in the supplemental material. [sent-179, score-0.681]

76 Expectation constraints are given by Ci^mean = E0[x] − Ei[x] and Ci^var = Var0[x] − Vari[x], where Ei[·] and Vari[·] denote the mean and variance of the Gaussian mixture qi(x, zi). [sent-180, score-0.135]

77 Combining the free energy, constraints, and additional positive semidefinite constraints on the marginal variances, we have the BVP for the clutter model: minimize FCGB(m, V, β; ϕ) over (m, V, β), subject to Ci^mean = 0 and Ci^var = 0 for all i = 1, 2, …, N, and V0 ≥ 0, Vi0 ≥ 0, Vi1 ≥ 0. (12) [sent-181, score-0.433]

78 Derivation of the free energy and augmented Lagrangian is somewhat lengthy, and so is deferred to the supplement. [sent-184, score-0.447]

79 We sample 500 instances at random from a 10x10 toroidal lattice with each Jst ∼ N (0, 1) and hs ∼ N (0, 0. [sent-189, score-0.152]

80 We report average L1 error of the approximate marginals as compared to the true marginals computed with the junction tree algorithm [20]. [sent-192, score-0.13]

81 Direct evaluation of the Bethe free energy does not take into account constraint violations for nonconvergent LBP runs. [sent-194, score-0.386]

82 The augmented Lagrangian penalizes constraint violation, but requires a penalty parameter which LBP does not provide. [sent-195, score-0.172]

83 For an objective comparison, we construct a penalized Bethe free energy by evaluating the augmented Lagrangian with fixed penalty c = 1 and multipliers λ = 0. [sent-196, score-0.605]

84 As we see in Figure 2(a,bottom), MoM finds a lower free energy for most trials. [sent-198, score-0.349]

85 Wall clock time is measured in seconds across various trials, and the percentiles for LBP are 25%: 1040. [sent-201, score-0.117]

86 Gaussian Markov Random Fields: For the Gaussian case we again sample 500 random instances from a 10x10 lattice with toroidal boundary conditions. [sent-210, score-0.123]

87 We randomly sample instances, keeping only those that are pairwise normalizable, and initialization is provided by a single iteration of Gaussian LBP. [sent-211, score-0.259]

88 For pairwise normalizable models, Gaussian LBP is guaranteed to converge to the unique fixed point of the Bethe free energy, so it is reassuring that MoM optimization matches LBP performance. [sent-214, score-0.4]

89 The value of the augmented Lagrangian at the final iteration is shown in Figure 2(b,bottom) and again shows that MoM matches Gaussian LBP on pairwise normalizable models. [sent-215, score-0.332]

90 Bottom: Penalized Bethe free energy constructed by setting λ = 0, c = 1 in the augmented Lagrangian. [sent-271, score-0.447]

91 Discrete Mixtures of Gaussian Potentials: To test the benefits of avoiding degenerate marginals, we consider the clutter model of Sec. [sent-273, score-0.103]

92 A good initialization of the multipliers is critical to performance of MoM. [sent-279, score-0.146]

93 We generate 10 initializations by running 5 iterations of EP, each with a different random message update schedule, compute the corresponding Lagrange multipliers for each, and use the one with the lowest value of the augmented Lagrangian. [sent-280, score-0.295]

94 Similarly, the augmented Lagrangian comparison is shown in Figure 2(c,bottom) and MoM often finds a better penalized free energy. [sent-285, score-0.288]

95 While MoM and EP can both suffer from local optima, MoM avoids non-convergence and the output of invalid (negative variance) marginal distributions. [sent-286, score-0.127]

96 This method directly avoids the creation of degenerate distributions (for example, those with negative variance) which frequently occur in greedier approaches for minimizing the Bethe free energy. [sent-297, score-0.253]

97 Properties of Bethe free energies and message passing in Gaussian models. [sent-332, score-0.785]

98 CCCP algorithms to minimize the Bethe and Kikuchi free energies: Convergent alternatives to Belief Propagation. [sent-336, score-0.19]

99 On the uniqueness of loopy belief propagation fixed points. [sent-392, score-0.189]

100 libDAI: A free and open source C++ library for discrete approximate inference in graphical models. [sent-404, score-0.259]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('bethe', 0.432), ('xs', 0.399), ('mom', 0.36), ('lbp', 0.233), ('free', 0.19), ('vts', 0.188), ('st', 0.187), ('normalizable', 0.165), ('energy', 0.159), ('jst', 0.138), ('multipliers', 0.121), ('lagrangian', 0.117), ('xk', 0.103), ('augmented', 0.098), ('cts', 0.097), ('vs', 0.095), ('qs', 0.094), ('toroidal', 0.094), ('vst', 0.094), ('mrfs', 0.09), ('propagation', 0.086), ('ep', 0.085), ('multiplier', 0.078), ('xt', 0.077), ('message', 0.076), ('marginal', 0.076), ('belief', 0.069), ('clutter', 0.067), ('variational', 0.067), ('degeneracy', 0.065), ('marginals', 0.065), ('gaussian', 0.063), ('unbounded', 0.059), ('lagrange', 0.058), ('fgb', 0.055), ('ck', 0.054), ('passing', 0.052), ('invalid', 0.051), ('lc', 0.051), ('projection', 0.05), ('qst', 0.047), ('seconds', 0.046), ('pairwise', 0.045), ('discrete', 0.044), ('constraints', 0.042), ('zi', 0.042), ('bvp', 0.041), ('normalizability', 0.041), ('mst', 0.041), ('percentiles', 0.041), ('cs', 0.04), ('constraint', 0.037), ('penalty', 0.037), ('ns', 0.036), ('degenerate', 0.036), ('heskes', 0.036), ('potentials', 0.036), ('energies', 0.035), ('expectation', 0.035), ('valid', 0.034), ('zp', 0.034), ('mrf', 0.034), ('loopy', 0.034), ('variances', 0.034), ('fb', 0.033), ('correspondence', 0.032), ('node', 0.032), ('convergence', 0.031), ('cst', 0.031), ('eqst', 0.031), ('jss', 0.031), ('lck', 0.031), ('rks', 0.031), ('stationary', 0.031), ('clock', 0.03), ('fields', 0.03), ('lattice', 0.029), ('gradient', 0.029), ('hs', 0.029), ('ts', 0.028), ('bst', 0.028), ('wall', 0.027), ('variance', 0.027), ('enforced', 0.027), ('ising', 0.027), ('constrained', 0.026), ('inference', 0.025), ('initialization', 0.025), ('continuous', 0.025), ('var', 0.024), ('supplemental', 0.024), ('iteration', 0.024), ('derivations', 0.023), ('families', 0.023), ('sk', 0.023), ('ci', 0.022), ('yedidia', 0.022), ('log', 0.021), ('symmetric', 0.02), ('hybrid', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999946 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation

Author: Jason Pacheco, Erik B. Sudderth

Abstract: We develop convergent minimization algorithms for Bethe variational approximations which explicitly constrain marginal estimates to families of valid distributions. While existing message passing algorithms define fixed point iterations corresponding to stationary points of the Bethe free energy, their greedy dynamics do not distinguish between local minima and maxima, and can fail to converge. For continuous estimation problems, this instability is linked to the creation of invalid marginal estimates, such as Gaussians with negative variance. Conversely, our approach leverages multiplier methods with well-understood convergence properties, and uses bound projection methods to ensure that marginal approximations are valid at all iterations. We derive general algorithms for discrete and Gaussian pairwise Markov random fields, showing improvements over standard loopy belief propagation. We also apply our method to a hybrid model with both discrete and continuous variables, showing improvements over expectation propagation. 1

2 0.31441045 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

Author: Nicholas Ruozzi

Abstract: Sudderth, Wainwright, and Willsky conjectured that the Bethe approximation corresponding to any fixed point of the belief propagation algorithm over an attractive, pairwise binary graphical model provides a lower bound on the true partition function. In this work, we resolve this conjecture in the affirmative by demonstrating that, for any graphical model with binary variables whose potential functions (not necessarily pairwise) are all log-supermodular, the Bethe partition function always lower bounds the true partition function. The proof of this result follows from a new variant of the “four functions” theorem that may be of independent interest. 1

3 0.27166921 147 nips-2012-Graphical Models via Generalized Linear Models

Author: Eunho Yang, Genevera Allen, Zhandong Liu, Pradeep K. Ravikumar

Abstract: Undirected graphical models, also known as Markov networks, enjoy popularity in a variety of applications. The popular instances of these models such as Gaussian Markov Random Fields (GMRFs), Ising models, and multinomial discrete models, however do not capture the characteristics of data in many settings. We introduce a new class of graphical models based on generalized linear models (GLMs) by assuming that node-wise conditional distributions arise from exponential families. Our models allow one to estimate multivariate Markov networks given any univariate exponential distribution, such as Poisson, negative binomial, and exponential, by fitting penalized GLMs to select the neighborhood for each node. A major contribution of this paper is the rigorous statistical analysis showing that with high probability, the neighborhood of our graphical models can be recovered exactly. We also provide examples of non-Gaussian high-throughput genomic networks learned via our GLM graphical models. 1

4 0.2426524 331 nips-2012-Symbolic Dynamic Programming for Continuous State and Observation POMDPs

Author: Zahra Zamani, Scott Sanner, Pascal Poupart, Kristian Kersting

Abstract: Point-based value iteration (PBVI) methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to partiallyobservable Markov decision processes (POMDPs) when a set of initial belief states is known. However, no PBVI work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of observations, there are only a finite number of continuous observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is considered. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous state MDPs can be generalized to continuous state POMDPs with discrete observations, and (2) we show how recently developed symbolic integration methods allow this solution to be extended to PBVI for continuous state and observation POMDPs with potentially correlated, multivariate continuous observation spaces. 1

5 0.14937048 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

Author: Po-ling Loh, Martin J. Wainwright

Abstract: We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. Based on our population-level results, we show how the graphical Lasso may be used to recover the edge structure of certain classes of discrete graphical models, and present simulations to verify our theoretical results. 1

6 0.12095097 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

7 0.11662927 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems

8 0.10689329 300 nips-2012-Scalable nonconvex inexact proximal splitting

9 0.09278091 324 nips-2012-Stochastic Gradient Descent with Only One Projection

10 0.080843091 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature

11 0.080334961 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)

12 0.076941557 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

13 0.070331439 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression

14 0.068676725 282 nips-2012-Proximal Newton-type methods for convex optimization

15 0.067271605 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

16 0.066801608 84 nips-2012-Convergence Rate Analysis of MAP Coordinate Minimization Algorithms

17 0.063102707 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering

18 0.062653206 305 nips-2012-Selective Labeling via Error Bound Minimization

19 0.060918558 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

20 0.060302898 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.173), (1, 0.014), (2, 0.124), (3, -0.013), (4, -0.053), (5, -0.009), (6, -0.018), (7, -0.199), (8, -0.033), (9, 0.002), (10, -0.122), (11, 0.014), (12, 0.078), (13, 0.076), (14, -0.129), (15, -0.084), (16, -0.009), (17, -0.052), (18, -0.143), (19, 0.201), (20, 0.004), (21, -0.142), (22, -0.218), (23, -0.1), (24, 0.062), (25, 0.174), (26, 0.223), (27, 0.1), (28, -0.058), (29, 0.036), (30, -0.082), (31, 0.018), (32, 0.105), (33, -0.059), (34, 0.034), (35, 0.113), (36, 0.039), (37, 0.022), (38, -0.054), (39, 0.05), (40, 0.093), (41, -0.045), (42, 0.021), (43, 0.014), (44, 0.104), (45, -0.038), (46, 0.005), (47, -0.053), (48, -0.012), (49, 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95348901 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation

Author: Jason Pacheco, Erik B. Sudderth

Abstract: We develop convergent minimization algorithms for Bethe variational approximations which explicitly constrain marginal estimates to families of valid distributions. While existing message passing algorithms define fixed point iterations corresponding to stationary points of the Bethe free energy, their greedy dynamics do not distinguish between local minima and maxima, and can fail to converge. For continuous estimation problems, this instability is linked to the creation of invalid marginal estimates, such as Gaussians with negative variance. Conversely, our approach leverages multiplier methods with well-understood convergence properties, and uses bound projection methods to ensure that marginal approximations are valid at all iterations. We derive general algorithms for discrete and Gaussian pairwise Markov random fields, showing improvements over standard loopy belief propagation. We also apply our method to a hybrid model with both discrete and continuous variables, showing improvements over expectation propagation. 1

2 0.80290389 331 nips-2012-Symbolic Dynamic Programming for Continuous State and Observation POMDPs

Author: Zahra Zamani, Scott Sanner, Pascal Poupart, Kristian Kersting

Abstract: Point-based value iteration (PBVI) methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to partiallyobservable Markov decision processes (POMDPs) when a set of initial belief states is known. However, no PBVI work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of observations, there are only a finite number of continuous observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is considered. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous state MDPs can be generalized to continuous state POMDPs with discrete observations, and (2) we show how recently developed symbolic integration methods allow this solution to be extended to PBVI for continuous state and observation POMDPs with potentially correlated, multivariate continuous observation spaces. 1

3 0.73544014 147 nips-2012-Graphical Models via Generalized Linear Models

Author: Eunho Yang, Genevera Allen, Zhandong Liu, Pradeep K. Ravikumar

Abstract: Undirected graphical models, also known as Markov networks, enjoy popularity in a variety of applications. The popular instances of these models such as Gaussian Markov Random Fields (GMRFs), Ising models, and multinomial discrete models, however do not capture the characteristics of data in many settings. We introduce a new class of graphical models based on generalized linear models (GLMs) by assuming that node-wise conditional distributions arise from exponential families. Our models allow one to estimate multivariate Markov networks given any univariate exponential distribution, such as Poisson, negative binomial, and exponential, by fitting penalized GLMs to select the neighborhood for each node. A major contribution of this paper is the rigorous statistical analysis showing that with high probability, the neighborhood of our graphical models can be recovered exactly. We also provide examples of non-Gaussian high-throughput genomic networks learned via our GLM graphical models. 1

4 0.64667207 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

Author: Nicholas Ruozzi

Abstract: Sudderth, Wainwright, and Willsky conjectured that the Bethe approximation corresponding to any fixed point of the belief propagation algorithm over an attractive, pairwise binary graphical model provides a lower bound on the true partition function. In this work, we resolve this conjecture in the affirmative by demonstrating that, for any graphical model with binary variables whose potential functions (not necessarily pairwise) are all log-supermodular, the Bethe partition function always lower bounds the true partition function. The proof of this result follows from a new variant of the “four functions” theorem that may be of independent interest. 1

5 0.56215364 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

Author: Stefano Ermon, Ashish Sabharwal, Bart Selman, Carla P. Gomes

Abstract: Given a probabilistic graphical model, its density of states is a distribution that, for any likelihood value, gives the number of configurations with that probability. We introduce a novel message-passing algorithm called Density Propagation (DP) for estimating this distribution. We show that DP is exact for tree-structured graphical models and is, in general, a strict generalization of both sum-product and max-product algorithms. Further, we use density of states and tree decomposition to introduce a new family of upper and lower bounds on the partition function. For any tree decomposition, the new upper bound based on finer-grained density of state information is provably at least as tight as previously known bounds based on convexity of the log-partition function, and strictly stronger if a general condition holds. We conclude with empirical evidence of improvement over convex relaxations and mean-field based bounds. 1

6 0.47858149 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

7 0.43391058 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)

8 0.3791647 53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization

9 0.36047158 359 nips-2012-Variational Inference for Crowdsourcing

10 0.35063624 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

11 0.33833125 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

12 0.33317748 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems

13 0.32826531 37 nips-2012-Affine Independent Variational Inference

14 0.3260214 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

15 0.31566572 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions

16 0.31463996 206 nips-2012-Majorization for CRFs and Latent Likelihoods

17 0.31197897 214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover

18 0.30206764 129 nips-2012-Fast Variational Inference in the Conjugate Exponential Family

19 0.2994459 300 nips-2012-Scalable nonconvex inexact proximal splitting

20 0.28672493 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.017), (17, 0.014), (21, 0.027), (35, 0.239), (38, 0.182), (39, 0.03), (42, 0.027), (54, 0.059), (74, 0.027), (76, 0.118), (80, 0.109), (92, 0.061)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84101462 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation

Author: Jason Pacheco, Erik B. Sudderth

Abstract: We develop convergent minimization algorithms for Bethe variational approximations which explicitly constrain marginal estimates to families of valid distributions. While existing message passing algorithms define fixed point iterations corresponding to stationary points of the Bethe free energy, their greedy dynamics do not distinguish between local minima and maxima, and can fail to converge. For continuous estimation problems, this instability is linked to the creation of invalid marginal estimates, such as Gaussians with negative variance. Conversely, our approach leverages multiplier methods with well-understood convergence properties, and uses bound projection methods to ensure that marginal approximations are valid at all iterations. We derive general algorithms for discrete and Gaussian pairwise Markov random fields, showing improvements over standard loopy belief propagation. We also apply our method to a hybrid model with both discrete and continuous variables, showing improvements over expectation propagation. 1

2 0.77780044 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

Author: Edouard Klein, Matthieu Geist, Bilal Piot, Olivier Pietquin

Abstract: This paper adresses the inverse reinforcement learning (IRL) problem, that is inferring a reward for which a demonstrated expert behavior is optimal. We introduce a new algorithm, SCIRL, whose principle is to use the so-called feature expectation of the expert as the parameterization of the score function of a multiclass classifier. This approach produces a reward function for which the expert policy is provably near-optimal. Contrary to most of existing IRL algorithms, SCIRL does not require solving the direct RL problem. Moreover, with an appropriate heuristic, it can succeed with only trajectories sampled according to the expert behavior. This is illustrated on a car driving simulator. 1

3 0.77101892 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

Author: Nitish Srivastava, Ruslan Salakhutdinov

Abstract: A Deep Boltzmann Machine is described for learning a generative model of data that consists of multiple and diverse input modalities. The model can be used to extract a unified representation that fuses modalities together. We find that this representation is useful for classification and information retrieval tasks. The model works by learning a probability density over the space of multimodal inputs. It uses states of latent variables as representations of the input. The model can extract this representation even when some modalities are absent by sampling from the conditional distribution over them and filling them in. Our experimental results on bi-modal data consisting of images and text show that the Multimodal DBM can learn a good generative model of the joint space of image and text inputs that is useful for information retrieval from both unimodal and multimodal queries. We further demonstrate that this model significantly outperforms SVMs and LDA on discriminative tasks. Finally, we compare our model to other deep learning methods, including autoencoders and deep belief networks, and show that it achieves noticeable gains. 1

4 0.74373782 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

Author: Michael Bryant, Erik B. Sudderth

Abstract: Variational methods provide a computationally scalable alternative to Monte Carlo methods for large-scale, Bayesian nonparametric learning. In practice, however, conventional batch and online variational methods quickly become trapped in local optima. In this paper, we consider a nonparametric topic model based on the hierarchical Dirichlet process (HDP), and develop a novel online variational inference algorithm based on split-merge topic updates. We derive a simpler and faster variational approximation of the HDP, and show that by intelligently splitting and merging components of the variational posterior, we can achieve substantially better predictions of test data than conventional online and batch variational algorithms. For streaming analysis of large datasets where batch analysis is infeasible, we show that our split-merge updates better capture the nonparametric properties of the underlying model, allowing continual learning of new topics.

5 0.74096012 291 nips-2012-Reducing statistical time-series problems to binary classification

Author: Daniil Ryabko, Jeremie Mary

Abstract: We show how binary classification methods developed to work on i.i.d. data can be used for solving statistical problems that are seemingly unrelated to classification and concern highly-dependent time series. Specifically, the problems of time-series clustering, homogeneity testing and the three-sample problem are addressed. The algorithms that we construct for solving these problems are based on a new metric between time-series distributions, which can be evaluated using binary classification methods. Universal consistency of the proposed algorithms is proven under most general assumptions. The theoretical results are illustrated with experiments on synthetic and real-world data. 1

6 0.72204071 255 nips-2012-On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes

7 0.72109461 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)

8 0.71765006 241 nips-2012-No-Regret Algorithms for Unconstrained Online Convex Optimization

9 0.71238595 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

10 0.71206802 227 nips-2012-Multiclass Learning with Simplex Coding

11 0.71189207 292 nips-2012-Regularized Off-Policy TD-Learning

12 0.71086097 324 nips-2012-Stochastic Gradient Descent with Only One Projection

13 0.70961654 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

14 0.70936215 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

15 0.70920587 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

16 0.70839894 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

17 0.70835108 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

18 0.70800167 187 nips-2012-Learning curves for multi-task Gaussian process regression

19 0.70711428 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

20 0.70691997 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning