nips nips2007 nips2007-123 knowledge-graph by maker-knowledge-mining

123 nips-2007-Loop Series and Bethe Variational Bounds in Attractive Graphical Models


Source: pdf

Author: Alan S. Willsky, Erik B. Sudderth, Martin J. Wainwright

Abstract: Variational methods are frequently used to approximate or bound the partition or likelihood function of a Markov random field. Methods based on mean field theory are guaranteed to provide lower bounds, whereas certain types of convex relaxations provide upper bounds. In general, loopy belief propagation (BP) often provides accurate approximations, but not bounds. We prove that for a class of attractive binary models, the so–called Bethe approximation associated with any fixed point of loopy BP always lower bounds the true likelihood. Empirically, this bound is much tighter than the naive mean field bound, and requires no further work beyond running BP. We establish these lower bounds using a loop series expansion due to Chertkov and Chernyak, which we show can be derived as a consequence of the tree reparameterization characterization of BP fixed points. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In general, loopy belief propagation (BP) often provides accurate approximations, but not bounds. [sent-11, score-0.233]

2 We prove that for a class of attractive binary models, the so–called Bethe approximation associated with any fixed point of loopy BP always lower bounds the true likelihood. [sent-12, score-0.483]

3 We establish these lower bounds using a loop series expansion due to Chertkov and Chernyak, which we show can be derived as a consequence of the tree reparameterization characterization of BP fixed points. [sent-14, score-0.657]

4 The variational framework provides a suite of candidate methods, including mean field approximations [3, 9], the sum–product or belief propagation (BP) algorithm [11, 14], Kikuchi and cluster variational methods [23], and related convex relaxations [21]. [sent-17, score-0.308]

5 The likelihood or partition function of an undirected graphical model is of fundamental interest in many contexts, including parameter estimation, error bounds in hypothesis testing, and combinatorial enumeration. [sent-18, score-0.296]

6 In rough terms, particular variational methods can be understood as solving optimization problems whose optima approximate the log partition function. [sent-19, score-0.249]

7 Although “convexified” relaxations of the Bethe problem yield upper bounds [21], to date the best known lower bounds on the partition function are based on mean field theory. [sent-22, score-0.379]

8 Recent work has studied loop series expansions [2, 4] of the partition function, which generate better approximations but not, in general, bounds. [sent-23, score-0.524]

9 Such models often encode “smoothness” priors, and thus have attractive interactions which encourage connected variables to share common values. [sent-26, score-0.215]

10 The first main contribution of this paper is to demonstrate a family of attractive models for which the Bethe variational method always yields lower bounds on the true likelihood. [sent-27, score-0.372]

11 For such models, these lower bounds are easily computed from any fixed point of loopy BP, and empirically improve substantially on naive mean field bounds. [sent-29, score-0.267]

12 Sec. 3 uses the reparameterization characterization of BP fixed points [20] to provide a simple derivation for the loop series expansion of Chertkov and Chernyak [2]. [sent-32, score-0.506]

13 The Bethe approximation is the first term in this representation of the true partition function. [sent-33, score-0.199]

14 Sec. 4 then identifies attractive models for which all terms in this expansion are positive, thus establishing the Bethe lower bound. [sent-35, score-0.33]

15 2 Undirected Graphical Models. Given an undirected graph G = (V, E), with edges (s, t) ∈ E connecting n vertices s ∈ V, a graphical model associates each node with a random variable Xs taking values xs ∈ X. [sent-37, score-0.785]

16 Letting xc := {xs | s ∈ c}, the corresponding joint distribution equals p(x) = (1/Z(ψ)) ∏_{s∈V} ψs(xs) ∏_{c∈C} ψc(xc) (2), where as before Z(ψ) = Σ_{x∈X^n} ∏_{s∈V} ψs(xs) ∏_{c∈C} ψc(xc). [sent-41, score-0.292]

17 Computationally tractable families of bounds on the true partition function are thus of great practical interest. [sent-49, score-0.285]

18 We say that a pairwise MRF, with compatibility functions ψst : {0, 1}² → R+, has attractive interactions if ψst(0, 0) ψst(1, 1) ≥ ψst(0, 1) ψst(1, 0) (3) for each edge (s, t) ∈ E. [sent-52, score-0.365]
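
As a quick illustration (our sketch, not code from the paper), condition (3) is a one-line check; the Ising construction anticipates the spin parameterization described next and satisfies it whenever θst ≥ 0.

import math

def is_attractive(psi):
    # Condition (3): psi maps (x_s, x_t) in {0,1}^2 to a positive value.
    return psi[(0, 0)] * psi[(1, 1)] >= psi[(0, 1)] * psi[(1, 0)]

# An Ising edge with coupling theta >= 0, written via spins z = 2x - 1.
theta = 0.5
psi_st = {(xs, xt): math.exp(theta * (2 * xs - 1) * (2 * xt - 1))
          for xs in (0, 1) for xt in (0, 1)}
print(is_attractive(psi_st))  # True for any theta >= 0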

19 In the statistical physics literature, Ising models are typically expressed by coupling random spins zs ∈ {−1, +1} with symmetric potentials log ψst (zs , zt ) = θst zs zt . [sent-59, score-0.379]
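
The equivalence between θst ≥ 0 and condition (3) follows from a one-line calculation (ours, using zs = 2xs − 1, so that zs zt = +1 exactly when xs = xt):

\frac{\psi_{st}(0,0)\,\psi_{st}(1,1)}{\psi_{st}(0,1)\,\psi_{st}(1,0)}
  = \frac{e^{\theta_{st}}\, e^{\theta_{st}}}{e^{-\theta_{st}}\, e^{-\theta_{st}}}
  = e^{4\theta_{st}} \ \ge\ 1
  \quad\Longleftrightarrow\quad \theta_{st} \ge 0 .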

20 Furthermore, pairwise MRFs satisfy the regularity condition of [10], and thus allow tractable MAP estimation via graph cuts [5], if and only if they are attractive. [sent-62, score-0.218]

21 Even for attractive models, however, calculation of the partition function in non–planar graphs is #P–complete [8]. [sent-63, score-0.395]

22 To define families of higher–order attractive potentials, we first consider a probability distribution τc (xc ) on k = |c| binary variables. [sent-64, score-0.242]

23 Given these definitions, we say that a probability distribution τc (xc ) is attractive if the central moments associated with all subsets a ⊆ c of binary variables are non–negative (κa ≥ 0). [sent-69, score-0.278]

24 Similarly, a compatibility function ψc (xc ) is attractive if the probability distribution attained by normalizing its values has non–negative central moments. [sent-70, score-0.253]
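
A small sketch (ours, with an invented "all-equal" potential as the example) that normalizes a compatibility function and checks the central-moment condition; singleton central moments vanish by definition, so only subsets with |a| ≥ 2 are tested.

import itertools
from math import prod

def central_moments(tau_c, k):
    # kappa_a = E[prod_{s in a}(X_s - tau_s)] for every subset a with |a| >= 2.
    tau = {s: sum(p for x, p in tau_c.items() if x[s] == 1) for s in range(k)}
    return {a: sum(p * prod(x[s] - tau[s] for s in a) for x, p in tau_c.items())
            for r in range(2, k + 1)
            for a in itertools.combinations(range(k), r)}

# Normalize a potential that rewards all-equal configurations of 3 variables.
psi_c = {x: (2.0 if len(set(x)) == 1 else 1.0)
         for x in itertools.product([0, 1], repeat=3)}
Z = sum(psi_c.values())
kappa = central_moments({x: v / Z for x, v in psi_c.items()}, 3)
print(all(v >= -1e-12 for v in kappa.values()))  # True: this potential is attractive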

25 Loopy belief propagation (BP) approximates these marginals via a series of messages passed among nodes of the graphical model [14, 23]. [sent-76, score-0.419]

26 The BP algorithm then iterates the following message updates: msc(xs) ← ψs(xs) ∏_{d∈Γ(s)\c} m̄ds(xs) and m̄cs(xs) ← Σ_{xc\s} ψc(xc) ∏_{t∈c\s} mtc(xt) (8). The left–hand expression updates the message msc(xs) passed from variable node s to factor c. [sent-78, score-0.239]
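
A minimal sketch of one round of these updates (our dict-based representation, with the domain fixed to {0, 1}; practical implementations also normalize messages and often damp the updates):

from itertools import product
from math import prod

def bp_round(messages, factors, psi_s, psi_c, domain=(0, 1)):
    # factors maps each factor node c to the tuple of variable nodes in c;
    # messages holds lists indexed by x in {0,1} under keys ('v2f', s, c)
    # and ('f2v', c, s).
    new = dict(messages)
    for c, scope in factors.items():
        for s in scope:
            # m_sc(x_s) <- psi_s(x_s) prod_{d in Gamma(s)\c} mbar_ds(x_s)
            nbrs = [d for d, sc in factors.items() if s in sc and d != c]
            new[('v2f', s, c)] = [psi_s[s][x] *
                                  prod(messages[('f2v', d, s)][x] for d in nbrs)
                                  for x in domain]
    for c, scope in factors.items():
        for i, s in enumerate(scope):
            # mbar_cs(x_s) <- sum_{x_{c\s}} psi_c(x_c) prod_{t in c\s} m_tc(x_t)
            out = [0.0] * len(domain)
            for xc in product(domain, repeat=len(scope)):
                out[xc[i]] += psi_c[c][xc] * prod(
                    new[('v2f', t, c)][xc[j]]
                    for j, t in enumerate(scope) if t != s)
            new[('f2v', c, s)] = out
    return new

Messages can be initialized to all-ones vectors for every (factor, variable) pair; at a fixed point, pseudo-marginals then follow from products of these messages with the local potentials (cf. eq. (9)).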

27 A wide range of inference algorithms can be derived via variational approximations [9] to the true partition function. [sent-83, score-0.265]

28 From eq. (2): p(x) = (1/Z(τ)) ∏_{s∈V} τs(xs) ∏_{c∈C} [τc(xc)/∏_{t∈c} τt(xt)] (11). For pairwise MRFs, the reparameterized compatibility functions equal τst(xs, xt)/(τs(xs) τt(xt)). [sent-89, score-0.432]

29 From eq. (10), it is easily shown that the Bethe approximation Zβ(τ; τ) = 1 for any joint distribution defined by reparameterized potentials as in eq. (11). [sent-94, score-0.264]

30 For simplicity, the remainder of this paper focuses on reparameterized models of this form, and analyzes properties of the corresponding exact partition function Z(τ ). [sent-96, score-0.289]

31 The resulting expansions and bounds are then related to the original MRF’s partition function via the positive constant Z(ψ)/Z(τ) = Zβ(ψ; τ) of eq. (10). [sent-97, score-0.285]

32 Recently, Chertkov and Chernyak proposed a finite loop series expansion [2] of the partition function, whose first term coincides with the Bethe approximation. [sent-99, score-0.574]

33 3.1 Pairwise Loop Series Expansions. We begin by developing a loop series expansion for pairwise MRFs. [sent-105, score-0.529]

34 Given an undirected graph G = (V, E), and some subset F ⊆ E of the graph’s edges, let ds (F ) denote the degree (number of neighbors) of node s in the subgraph induced by F . [sent-106, score-0.236]

35 As illustrated in Fig. 1, any subset F for which no node s ∈ V has degree ds(F) = 1 defines a generalized loop [2]. [sent-108, score-0.465]
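
Brute-force enumeration makes the definition concrete (our sketch; it is exponential in |E|, in line with the intractability noted below):

import itertools

def generalized_loops(edges):
    # Nonempty edge subsets F in which every incident node has degree >= 2,
    # i.e. no node has d_s(F) = 1.
    loops = []
    for r in range(1, len(edges) + 1):
        for F in itertools.combinations(edges, r):
            deg = {}
            for s, t in F:
                deg[s] = deg.get(s, 0) + 1
                deg[t] = deg.get(t, 0) + 1
            if all(d >= 2 for d in deg.values()):
                loops.append(F)
    return loops

# Two triangles sharing a vertex: each triangle alone and their union,
# so exactly three generalized loops.
print(len(generalized_loops([(0, 1), (1, 2), (0, 2),
                             (2, 3), (3, 4), (2, 4)])))  # 3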

36 The partition function for any binary, pairwise MRF can then be expanded via an associated set of loop corrections. [sent-109, score-0.492]

37 Consider a pairwise MRF defined on an undirected graph G = (V, E), with reparameterized potentials as in eq. (11). [sent-111, score-0.378]

38 The associated partition function then equals Z(τ) = 1 + Σ_{∅≠F⊆E} β_F ∏_{s∈V} E_{τs}[(Xs − τs)^{ds(F)}], where β_F := ∏_{(s,t)∈F} βst (12) and βst := (τst − τs τt)/(τs(1 − τs) τt(1 − τt)) = Cov_{τst}(Xs, Xt)/(Var_{τs}(Xs) Var_{τt}(Xt)) (13); only generalized loops F lead to non–zero terms in the sum of eq. (12). [sent-113, score-0.303]

39 To establish eq. (12), we exploit the following polynomial representation of reparameterized pairwise compatibility functions: τst(xs, xt)/(τs(xs) τt(xt)) = 1 + βst(xs − τs)(xt − τt) (15). As verified in [17], this expression is satisfied for any (xs, xt) ∈ {0, 1}² if βst is defined as in eq. (13). [sent-117, score-0.566]
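
Eq. (12) can be checked numerically on a small example. The sketch below (ours, not code from the paper) builds a reparameterized 4-cycle directly from singleton pseudo-marginals and edge covariances via eqs. (13) and (15), then compares the brute-force partition function against the loop series.

import itertools
from math import prod

def loop_series_check(tau, cov, edges, n):
    # Edge weights beta_st of eq. (13); cov[(s, t)] = tau_st(1,1) - tau_s tau_t.
    beta = {(s, t): cov[(s, t)] / (tau[s] * (1 - tau[s]) * tau[t] * (1 - tau[t]))
            for (s, t) in edges}
    # Brute-force Z(tau) of the reparameterized model, via eqs. (11) and (15).
    Z = sum(prod(tau[s] if x[s] else 1 - tau[s] for s in range(n)) *
            prod(1 + beta[(s, t)] * (x[s] - tau[s]) * (x[t] - tau[t])
                 for (s, t) in edges)
            for x in itertools.product([0, 1], repeat=n))
    # Bernoulli central moments E[(X_s - tau_s)^d]; d = 1 gives exactly zero,
    # so edge subsets that are not generalized loops contribute nothing.
    def mom(s, d):
        return (1 - tau[s]) * (-tau[s]) ** d + tau[s] * (1 - tau[s]) ** d
    series = 1.0
    for r in range(1, len(edges) + 1):
        for F in itertools.combinations(edges, r):
            deg = [0] * n
            for s, t in F:
                deg[s] += 1
                deg[t] += 1
            series += prod(beta[e] for e in F) * prod(
                mom(s, deg[s]) for s in range(n) if deg[s])
    return Z, series

# A 4-cycle with uniform attractive pseudo-marginals (tau_s <= 1/2).
n, edges = 4, [(0, 1), (1, 2), (2, 3), (3, 0)]
Z, series = loop_series_check({s: 0.3 for s in range(n)},
                              {e: 0.04 for e in edges}, edges, n)
print(Z, series)  # the two agree, and both exceed 1 (the Bethe lower bound)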

40 There is thus one loop correction for each generalized loop F , in which all connected nodes have degree at least two. [sent-124, score-0.645]

41 Figure 1: A pairwise MRF coupling ten binary variables (left), and the nine generalized loops in its loop series expansion (right). [sent-125, score-0.716]

42 For attractive potentials, two of the generalized loops may have negative signs (second & third from right), while the core graph of Thm. [sent-126, score-0.438]

43 Figure 1 illustrates the set of generalized loops associated with a particular pairwise MRF. [sent-128, score-0.238]

44 These loops effectively define corrections to the Bethe estimate Z(τ ) ≈ 1 of the partition function for reparameterized models. [sent-129, score-0.417]

45 Tree–structured graphs do not contain any non–trivial generalized loops, and the Bethe variational approximation is thus exact. [sent-130, score-0.208]

46 The loop expansion formulas of [2] can be precisely recovered by transforming binary variables to a spin representation, and refactoring terms from the denominator of edge weights βst to adjacent vertices. [sent-131, score-0.458]

47 Explicit computation of these loop corrections is in general intractable; for example, fully connected graphs with n ≥ 5 nodes have more than 2ⁿ generalized loops. [sent-132, score-0.507]

48 In some cases, accounting for a small set of significant loop corrections may lead to improved approximations to Z(ψ) [4], or more accurate belief estimates for LDPC codes [1]. [sent-133, score-0.361]

49 3.2 Factor Graph Loop Series Expansions. We now extend the loop series expansion to higher–order MRFs defined on hypergraphs G = (V, C). [sent-137, score-0.446]

50 We define a generalized loop to be a subset F ⊆ E of edges such that all connected factor and variable nodes have degree at least two. [sent-140, score-0.534]

51 Consider any factor graph G = (V, C) with reparameterized potentials as in eq. (11). [sent-142, score-0.378]

52 The partition function then equals Z(τ) = 1 + Σ_{∅≠F⊆E} β_F ∏_{s∈V} E_{τs}[(Xs − τs)^{ds(F)}], where β_F := ∏_{c∈C} β_{ac(F)} (18) and βa := κa/∏_{t∈a} τt(1 − τt) = E_{τc}[∏_{s∈a}(Xs − τs)]/∏_{t∈a} Var_{τt}(Xt) (19), with ac(F) := {s ∈ c | (s, c) ∈ F} denoting the subset of variables linked to factor node c by the edges in F. [sent-144, score-0.382]

53 From eq. (11): τc(xc)/∏_{t∈c} τt(xt) = 1 + Σ_{a⊆c, |a|≥2} βa ∏_{s∈a}(xs − τs) (20). For factor graphs with attractive reparameterized potentials, the constant βa ≥ 0 for all a ⊆ c. [sent-149, score-0.426]
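
Eq. (20) is a multilinear identity in the variables (xs − τs); under the βa definition of eq. (19) it appears to hold for any distribution on binary variables, which the brute-force sketch below (ours) confirms numerically.

import itertools
import random
from math import prod

def check_factor_identity(tau_c, k):
    # Verify tau_c(x_c) / prod_t tau_t(x_t)
    #   = 1 + sum_{a in c, |a| >= 2} beta_a prod_{s in a}(x_s - tau_s),
    # with beta_a = kappa_a / prod_{t in a} tau_t (1 - tau_t) as in eq. (19).
    tau = {s: sum(p for x, p in tau_c.items() if x[s] == 1) for s in range(k)}
    beta = {}
    for r in range(2, k + 1):
        for a in itertools.combinations(range(k), r):
            kappa = sum(p * prod(x[s] - tau[s] for s in a)
                        for x, p in tau_c.items())
            beta[a] = kappa / prod(tau[t] * (1 - tau[t]) for t in a)
    for x, p in tau_c.items():
        lhs = p / prod(tau[s] if x[s] else 1 - tau[s] for s in range(k))
        rhs = 1 + sum(b * prod(x[s] - tau[s] for s in a)
                      for a, b in beta.items())
        assert abs(lhs - rhs) < 1e-9

raw = {x: random.random() + 0.1 for x in itertools.product([0, 1], repeat=3)}
Z = sum(raw.values())
check_factor_identity({x: v / Z for x, v in raw.items()}, 3)
print("eq. (20) holds on this example")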

54 As in eq. (16), we may express the partition function of the reparameterized factor graph as follows: Z(τ) = E_τ̃[∏_{c∈C} τc(Xc)/∏_{t∈c} τt(Xt)] = E_τ̃[∏_{c∈C} (1 + Σ_{∅≠a⊆c} βa ∏_{s∈a}(Xs − τs))] (21). Note that βa = 0 for any subset where |a| = 1. [sent-156, score-0.444]

55 For a term in this loop series to be non–zero, there must be no degree-one variables, since E_{τs}[Xs − τs] = 0. [sent-160, score-0.324]

56 Figure 2: A factor graph (left) with three binary variables (circles) and four factor nodes (squares), and the thirteen generalized loops in its loop series expansion (right, along with the full graph). [sent-162, score-0.876]

57 4 Lower Bounds in Attractive Binary Models. The Bethe approximation underlying loopy BP differs from mean field methods [9], which lower bound the true log partition function log Z(ψ), in two key ways. [sent-163, score-0.418]

58 Nevertheless, we now show that for a large family of attractive graphical models, the Bethe approximation Zβ(ψ; τ) of eq. (10) lower bounds the true partition function. [sent-167, score-0.235]

59 We then define the core graph H = (VH , EH ) as the node–induced subgraph obtained by discarding edges from nodes outside VH , so that EH = {(s, t) ∈ E | s, t ∈ VH }. [sent-174, score-0.266]

60 The unique core graph H underlying any graph G can be efficiently constructed by iteratively pruning degree one nodes, or leaves, until all remaining nodes have two or more neighbors. [sent-175, score-0.332]
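
The pruning procedure is easy to sketch (our implementation; isolated nodes are dropped along with leaves):

def core_graph(vertices, edges):
    # Iteratively remove degree-one nodes (leaves) until every remaining
    # node has at least two neighbors.
    V, E = set(vertices), set(edges)
    while True:
        deg = {v: 0 for v in V}
        for s, t in E:
            deg[s] += 1
            deg[t] += 1
        leaves = {v for v in V if deg[v] <= 1}
        if not leaves:
            return V, E
        V -= leaves
        E = {(s, t) for s, t in E if s not in leaves and t not in leaves}

# A triangle with a pendant path: the core graph is the triangle alone.
V, E = core_graph(range(5), {(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)})
print(sorted(V), sorted(E))  # [0, 1, 2] [(0, 1), (0, 2), (1, 2)]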

61 The following theorem identifies conditions under which all terms in the loop series expansion must be non–negative. [sent-176, score-0.44]

62 Let H = (VH, EH) be the core graph for a pairwise binary MRF, with attractive potentials satisfying eq. (3). [sent-178, score-0.589]

63 Consider any BP fixed point for which all nodes s ∈ VH with three or more neighbors in H have marginals τs ≤ 1/2 (or, equivalently, τs ≥ 1/2). [sent-180, score-0.196]

64 The corresponding Bethe variational approximation Zβ(ψ; τ) then lower bounds the true partition function Z(ψ). [sent-181, score-0.372]

65 It is sufficient to show that Z(τ) ≥ 1 for any reparameterized pairwise MRF, as in eq. (11). [sent-183, score-0.244]

66 From eq. (9), note that loopy BP estimates the pseudo–marginal τst(xs, xt) via the product of ψst(xs, xt) with message functions of single variables. [sent-186, score-0.409]

67 For this reason, attractive pairwise compatibilities always lead to BP fixed points with attractive pseudo–marginals satisfying τst ≥ τs τt . [sent-187, score-0.508]

68 From eq. (13), attractive models lead to edge weights βst ≥ 0. [sent-191, score-0.202]

69 It is thus sufficient to show that ∏_{s∈V} E_{τs}[(Xs − τs)^{ds(F)}] ≥ 0 for each generalized loop F ⊆ E. [sent-192, score-0.284]

70 Suppose first that the graph has a single cycle, and thus exactly one non–zero generalized loop F . [sent-193, score-0.369]

71 More generally, we clearly have Z(τ ) ≥ 1 in graphs where every generalized loop F associates an even number of neighbors ds (F ) with each node. [sent-195, score-0.417]

72 Focusing on generalized loops containing nodes with odd degree d ≥ 3, eq. [sent-196, score-0.242]

73 In particular, the symmetric fixed point τs = 1/2 leads to uniformly positive generalized loop corrections. [sent-199, score-0.284]
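
The role of the 1/2 threshold is visible in the closed form of the Bernoulli central moments (a short calculation in the notation above):

E_{\tau_s}\!\left[(X_s - \tau_s)^d\right]
  = (1 - \tau_s)(-\tau_s)^d + \tau_s (1 - \tau_s)^d
  = \tau_s (1 - \tau_s)\left[(1 - \tau_s)^{d-1} - (-\tau_s)^{d-1}\right].

For even d this is non-negative for any τs; for odd d it is non-negative exactly when (1 − τs)^{d−1} ≥ τs^{d−1}, i.e. when τs ≤ 1/2.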

74 More generally, the marginals of nodes s for which ds (F ) ≤ 2 for every generalized loop F do not influence the expansion’s positivity. [sent-200, score-0.509]

75 Theorem 1 discards these nodes by examining the topology of the core graph H (see Fig. 1). [sent-201, score-0.218]

76 For fixed points where τs ≥ 1/2 for all nodes, we rewrite the polynomial in the loop expansion of eq. (15) [sent-203, score-0.354]

77 as 1 + βst(τs − xs)(τt − xt), and employ an analogous line of reasoning. [sent-204, score-0.693]

78 Our arguments show that the true partition function monotonically increases as additional edges with attractive reparameterized potentials, as in eq. (15), are added to the graph. [sent-206, score-0.596]

79 For such models, the accumulation of particular loop corrections, as explored by [4], produces a sequence of increasingly tight bounds on Z(ψ). [sent-208, score-0.306]

80 For attractive Ising models in which some nodes have marginals τs > 1/2 and others τt < 1/2, the loop series expansion may contain negative terms. [sent-212, score-0.772]

81 It is possible to use upper bounds on the edge weights βst, which follow from τst ≤ min(τs, τt), to cancel negative loop corrections with larger positive terms. [sent-214, score-0.387]

82 The lower bound Z(ψ) ≥ Zβ(ψ; τ) thus continues to hold for many (perhaps all) attractive Ising models with less homogeneous marginal biases. [sent-217, score-0.251]

83 4.2 Partition Function Bounds for Factor Graphs. Given a factor graph G = (V, C) relating binary variables, define a core graph H = (VH, CH) by excluding variable and factor nodes which are not members of any generalized loops. [sent-219, score-0.495]

84 Let Γ(s) denote the set of factor nodes neighboring variable node s in the core graph H. [sent-222, score-0.296]

85 Let H = (VH, CH) be the core graph for a binary factor graph, and consider an attractive BP fixed point for which one of the following conditions holds: (i) τs ≤ 1/2, or (ii) τs ≥ 1/2, for all nodes s ∈ VH with |Γ(s)| ≥ 3, and κa ≥ 0 for all a ⊆ c, c ∈ CH. [sent-224, score-0.5]

86 The Bethe approximation Zβ (ψ; τ ) then lower bounds the true partition function Z(ψ). [sent-226, score-0.306]

87 When τs ≥ 1/2, we replace all (xs − τs) terms by (τs − xs) in the expansion of eq. (21). [sent-230, score-0.684]

88 For factor graphs, it is more challenging to determine which compatibility functions ψc (xc ) necessarily lead to attractive fixed points. [sent-234, score-0.279]

89 Using the spin representation zs ∈ {−1, +1}, we examine Ising models with attractive pairwise potentials log ψst (zs , zt ) = θst zs zt of varying strengths θst ≥ 0. [sent-240, score-0.64]

90 We also consider a set of random 10 × 10 nearest–neighbor grids, with inhomogeneous pairwise potentials sampled according to |θst| ∼ N(0, θ̄²), and observation potentials log ψs(zs) = θs zs, |θs| ∼ N(0, 0.1²). [sent-250, score-0.46]

91 For each candidate θ̄, we sample 100 random MRFs, and plot the average difference log Zβ(ψ; τ) − log Z(ψ) between the true partition function and the BP (or mean field) fixed point reached from a random initialization. [sent-252, score-0.255]
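
For concreteness, a sketch of sampling one such grid (all variable names are ours, and the 0.1 observation scale is reconstructed from the garbled source, so treat both as assumptions):

import numpy as np

rng = np.random.default_rng(0)
n, theta_bar = 10, 0.5  # one candidate coupling scale
grid = [(i, j) for i in range(n) for j in range(n)]
edges = ([((i, j), (i + 1, j)) for i in range(n - 1) for j in range(n)] +
         [((i, j), (i, j + 1)) for i in range(n) for j in range(n - 1)])
theta_st = {e: abs(rng.normal(0.0, theta_bar)) for e in edges}  # attractive
theta_s = {v: rng.normal(0.0, 0.1) for v in grid}               # observation biases

Each of the 100 trials per θ̄ would then run loopy BP and naive mean field on these parameters and compare the resulting bounds.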

92 Although this sometimes leads to BP fixed points with negative associated loop corrections, the Bethe variational approximation nevertheless always lower bounds the true partition function in these examples. [sent-259, score-0.601]

93 5 Discussion. We have provided an alternative, direct derivation of the partition function’s loop series expansion, based on the reparameterization characterization of BP fixed points. [sent-261, score-0.535]

94 We use this expansion to prove that the Bethe approximation lower bounds the true partition function in a family of binary attractive models. [sent-262, score-0.643]

95 Figure 3: Bethe (dark blue, top) and naive mean field (light green, bottom) lower bounds on log Z(ψ) for three families of attractive, pairwise Ising models. [sent-278, score-0.316]

96 Acknowledgments. The authors thank Yair Weiss for suggesting connections to loop series expansions, and for helpful conversations. [sent-287, score-0.295]

97 On the uniqueness of loopy belief propagation fixed points. [sent-336, score-0.233]

98 Loop series and Bethe variational bounds in attractive graphical models. [sent-425, score-0.423]

99 Comparison of graph cuts with belief propagation for stereo, using identical MRF parameters. [sent-432, score-0.223]

100 A new class of upper bounds on the log partition function. [sent-459, score-0.26]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('xs', 0.559), ('bethe', 0.288), ('xc', 0.272), ('st', 0.261), ('bp', 0.244), ('loop', 0.229), ('attractive', 0.175), ('partition', 0.154), ('reparameterized', 0.135), ('xt', 0.134), ('expansion', 0.125), ('vh', 0.124), ('loopy', 0.119), ('pairwise', 0.109), ('potentials', 0.108), ('marginals', 0.093), ('mrfs', 0.091), ('graph', 0.085), ('nodes', 0.084), ('mrf', 0.079), ('bounds', 0.077), ('loops', 0.074), ('pseudo', 0.074), ('zs', 0.073), ('graphs', 0.066), ('variational', 0.066), ('series', 0.066), ('chertkov', 0.065), ('reparameterization', 0.062), ('ising', 0.06), ('non', 0.059), ('belief', 0.057), ('propagation', 0.057), ('generalized', 0.055), ('expansions', 0.054), ('corrections', 0.054), ('compatibility', 0.054), ('stu', 0.052), ('factor', 0.05), ('core', 0.049), ('edges', 0.048), ('ds', 0.048), ('eld', 0.046), ('ac', 0.041), ('graphical', 0.039), ('chernyak', 0.039), ('mcs', 0.039), ('binary', 0.037), ('field', 0.035), ('ch', 0.035), ('eh', 0.034), ('inhomogeneous', 0.033), ('families', 0.03), ('lower', 0.03), ('log', 0.029), ('degree', 0.029), ('node', 0.028), ('edge', 0.027), ('zt', 0.027), ('satisfying', 0.026), ('undirected', 0.026), ('cumulant', 0.026), ('hypergraphs', 0.026), ('msc', 0.026), ('mtc', 0.026), ('torus', 0.026), ('sudderth', 0.026), ('tree', 0.025), ('wainwright', 0.024), ('marginal', 0.024), ('characterization', 0.024), ('true', 0.024), ('cuts', 0.024), ('central', 0.024), ('messages', 0.023), ('compatibilities', 0.023), ('mooij', 0.023), ('spins', 0.023), ('bound', 0.022), ('naive', 0.022), ('message', 0.022), ('xed', 0.022), ('relaxations', 0.022), ('approximation', 0.021), ('moments', 0.021), ('cycles', 0.021), ('jaakkola', 0.021), ('variables', 0.021), ('approximations', 0.021), ('subset', 0.02), ('conditions', 0.02), ('equals', 0.02), ('physics', 0.019), ('cycle', 0.019), ('spin', 0.019), ('mean', 0.019), ('establish', 0.019), ('neighbors', 0.019), ('var', 0.019), ('connected', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 123 nips-2007-Loop Series and Bethe Variational Bounds in Attractive Graphical Models

Author: Alan S. Willsky, Erik B. Sudderth, Martin J. Wainwright

Abstract: Variational methods are frequently used to approximate or bound the partition or likelihood function of a Markov random field. Methods based on mean field theory are guaranteed to provide lower bounds, whereas certain types of convex relaxations provide upper bounds. In general, loopy belief propagation (BP) often provides accurate approximations, but not bounds. We prove that for a class of attractive binary models, the so–called Bethe approximation associated with any fixed point of loopy BP always lower bounds the true likelihood. Empirically, this bound is much tighter than the naive mean field bound, and requires no further work beyond running BP. We establish these lower bounds using a loop series expansion due to Chertkov and Chernyak, which we show can be derived as a consequence of the tree reparameterization characterization of BP fixed points. 1

2 0.17393941 18 nips-2007-A probabilistic model for generating realistic lip movements from speech

Author: Gwenn Englebienne, Tim Cootes, Magnus Rattray

Abstract: The present work aims to model the correspondence between facial motion and speech. The face and sound are modelled separately, with phonemes being the link between both. We propose a sequential model and evaluate its suitability for the generation of the facial animation from a sequence of phonemes, which we obtain from speech. We evaluate the results both by computing the error between generated sequences and real video, as well as with a rigorous double-blind test with human subjects. Experiments show that our model compares favourably to other existing methods and that the sequences generated are comparable to real video sequences. 1

3 0.15933846 102 nips-2007-Incremental Natural Actor-Critic Algorithms

Author: Shalabh Bhatnagar, Mohammad Ghavamzadeh, Mark Lee, Richard S. Sutton

Abstract: We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms. 1

4 0.12898757 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations

Author: Charles L. Isbell, Michael P. Holmes, Alexander G. Gray

Abstract: Machine learning contains many computational bottlenecks in the form of nested summations over datasets. Kernel estimators and other methods are burdened by these expensive computations. Exact evaluation is typically O(n²) or higher, which severely limits application to large datasets. We present a multi-stage stratified Monte Carlo method for approximating such summations with probabilistic relative error control. The essential idea is fast approximation by sampling in trees. This method differs from many previous scalability techniques (such as standard multi-tree methods) in that its error is stochastic, but we derive conditions for error control and demonstrate that they work. Further, we give a theoretical sample complexity for the method that is independent of dataset size, and show that this appears to hold in experiments, where speedups reach as high as 10¹⁴, many orders of magnitude beyond the previous state of the art. 1

5 0.11897756 141 nips-2007-New Outer Bounds on the Marginal Polytope

Author: David Sontag, Tommi S. Jaakkola

Abstract: We give a new class of outer bounds on the marginal polytope, and propose a cutting-plane algorithm for efficiently optimizing over these constraints. When combined with a concave upper bound on the entropy, this gives a new variational inference algorithm for probabilistic inference in discrete Markov Random Fields (MRFs). Valid constraints on the marginal polytope are derived through a series of projections onto the cut polytope. As a result, we obtain tighter upper bounds on the log-partition function. We also show empirically that the approximations of the marginals are significantly more accurate when using the tighter outer bounds. Finally, we demonstrate the advantage of the new constraints for finding the MAP assignment in protein structure prediction. 1

6 0.11255074 121 nips-2007-Local Algorithms for Approximate Inference in Minor-Excluded Graphs

7 0.10944937 86 nips-2007-Exponential Family Predictive Representations of State

8 0.097012587 64 nips-2007-Cooled and Relaxed Survey Propagation for MRFs

9 0.084315971 127 nips-2007-Measuring Neural Synchrony by Message Passing

10 0.080612913 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs

11 0.080168203 197 nips-2007-The Infinite Markov Model

12 0.079943962 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process

13 0.07495515 92 nips-2007-Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations

14 0.072858989 187 nips-2007-Structured Learning with Approximate Inference

15 0.069683708 213 nips-2007-Variational Inference for Diffusion Processes

16 0.068858735 208 nips-2007-TrueSkill Through Time: Revisiting the History of Chess

17 0.066887237 42 nips-2007-CPR for CSPs: A Probabilistic Relaxation of Constraint Propagation

18 0.065524332 111 nips-2007-Learning Horizontal Connections in a Sparse Coding Model of Natural Images

19 0.064329788 120 nips-2007-Linear programming analysis of loopy belief propagation for weighted matching

20 0.063146949 151 nips-2007-Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.178), (1, -0.092), (2, -0.018), (3, -0.036), (4, -0.052), (5, -0.164), (6, -0.02), (7, 0.09), (8, 0.051), (9, -0.148), (10, 0.004), (11, 0.069), (12, -0.114), (13, 0.107), (14, 0.009), (15, 0.02), (16, -0.009), (17, 0.016), (18, -0.169), (19, 0.172), (20, 0.015), (21, 0.058), (22, 0.252), (23, 0.024), (24, 0.023), (25, 0.186), (26, 0.129), (27, 0.081), (28, 0.09), (29, 0.021), (30, -0.004), (31, -0.04), (32, -0.055), (33, -0.002), (34, -0.038), (35, 0.098), (36, 0.074), (37, -0.021), (38, -0.019), (39, 0.082), (40, -0.029), (41, 0.132), (42, 0.088), (43, -0.174), (44, 0.031), (45, -0.11), (46, 0.075), (47, 0.051), (48, 0.023), (49, -0.091)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9731794 123 nips-2007-Loop Series and Bethe Variational Bounds in Attractive Graphical Models

Author: Alan S. Willsky, Erik B. Sudderth, Martin J. Wainwright

Abstract: Variational methods are frequently used to approximate or bound the partition or likelihood function of a Markov random field. Methods based on mean field theory are guaranteed to provide lower bounds, whereas certain types of convex relaxations provide upper bounds. In general, loopy belief propagation (BP) often provides accurate approximations, but not bounds. We prove that for a class of attractive binary models, the so–called Bethe approximation associated with any fixed point of loopy BP always lower bounds the true likelihood. Empirically, this bound is much tighter than the naive mean field bound, and requires no further work beyond running BP. We establish these lower bounds using a loop series expansion due to Chertkov and Chernyak, which we show can be derived as a consequence of the tree reparameterization characterization of BP fixed points. 1

2 0.60131562 121 nips-2007-Local Algorithms for Approximate Inference in Minor-Excluded Graphs

Author: Kyomin Jung, Devavrat Shah

Abstract: We present a new local approximation algorithm for computing MAP and the log-partition function for arbitrary exponential family distributions represented by a finite-valued pair-wise Markov random field (MRF), say G. Our algorithm is based on decomposing G into appropriately chosen small components; computing estimates locally in each of these components and then producing a good global solution. We prove that the algorithm can provide an approximate solution within arbitrary accuracy when G excludes some finite-sized graph as its minor and G has bounded degree: all planar graphs with bounded degree are examples of such graphs. The running time of the algorithm is Θ(n) (n is the number of nodes in G), with the constant dependent on accuracy, the degree of the graph, and the size of the graph that is excluded as a minor (constant for planar graphs). Our algorithm for minor-excluded graphs uses the decomposition scheme of Klein, Plotkin and Rao (1993). In general, our algorithm works with any decomposition scheme and provides a quantifiable approximation guarantee that depends on the decomposition scheme.

3 0.57793391 141 nips-2007-New Outer Bounds on the Marginal Polytope

Author: David Sontag, Tommi S. Jaakkola

Abstract: We give a new class of outer bounds on the marginal polytope, and propose a cutting-plane algorithm for efficiently optimizing over these constraints. When combined with a concave upper bound on the entropy, this gives a new variational inference algorithm for probabilistic inference in discrete Markov Random Fields (MRFs). Valid constraints on the marginal polytope are derived through a series of projections onto the cut polytope. As a result, we obtain tighter upper bounds on the log-partition function. We also show empirically that the approximations of the marginals are significantly more accurate when using the tighter outer bounds. Finally, we demonstrate the advantage of the new constraints for finding the MAP assignment in protein structure prediction. 1

4 0.52324593 86 nips-2007-Exponential Family Predictive Representations of State

Author: David Wingate, Satinder S. Baveja

Abstract: In order to represent state in controlled, partially observable, stochastic dynamical systems, some sort of sufficient statistic for history is necessary. Predictive representations of state (PSRs) capture state as statistics of the future. We introduce a new model of such systems called the “Exponential family PSR,” which defines as state the time-varying parameters of an exponential family distribution which models n sequential observations in the future. This choice of state representation explicitly connects PSRs to state-of-the-art probabilistic modeling, which allows us to take advantage of current efforts in high-dimensional density estimation, and in particular, graphical models and maximum entropy models. We present a parameter learning algorithm based on maximum likelihood, and we show how a variety of current approximate inference methods apply. We evaluate the quality of our model with reinforcement learning by directly evaluating the control performance of the model. 1

5 0.50291055 18 nips-2007-A probabilistic model for generating realistic lip movements from speech

Author: Gwenn Englebienne, Tim Cootes, Magnus Rattray

Abstract: The present work aims to model the correspondence between facial motion and speech. The face and sound are modelled separately, with phonemes being the link between both. We propose a sequential model and evaluate its suitability for the generation of the facial animation from a sequence of phonemes, which we obtain from speech. We evaluate the results both by computing the error between generated sequences and real video, as well as with a rigorous double-blind test with human subjects. Experiments show that our model compares favourably to other existing methods and that the sequences generated are comparable to real video sequences. 1

6 0.48934251 127 nips-2007-Measuring Neural Synchrony by Message Passing

7 0.43072316 102 nips-2007-Incremental Natural Actor-Critic Algorithms

8 0.41153958 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations

9 0.40144774 191 nips-2007-Temporal Difference Updating without a Learning Rate

10 0.39289758 64 nips-2007-Cooled and Relaxed Survey Propagation for MRFs

11 0.38255107 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process

12 0.38106772 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images

13 0.37586176 208 nips-2007-TrueSkill Through Time: Revisiting the History of Chess

14 0.3687377 187 nips-2007-Structured Learning with Approximate Inference

15 0.31050315 197 nips-2007-The Infinite Markov Model

16 0.30435169 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis

17 0.30159444 28 nips-2007-Augmented Functional Time Series Representation and Forecasting with Gaussian Processes

18 0.2963233 97 nips-2007-Hidden Common Cause Relations in Relational Learning

19 0.29048103 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs

20 0.28047526 87 nips-2007-Fast Variational Inference for Large-scale Internet Diagnosis


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.274), (5, 0.055), (13, 0.045), (16, 0.031), (18, 0.02), (21, 0.057), (31, 0.017), (34, 0.073), (35, 0.031), (47, 0.046), (49, 0.015), (83, 0.137), (85, 0.031), (87, 0.013), (90, 0.062)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80371976 123 nips-2007-Loop Series and Bethe Variational Bounds in Attractive Graphical Models

Author: Alan S. Willsky, Erik B. Sudderth, Martin J. Wainwright

Abstract: Variational methods are frequently used to approximate or bound the partition or likelihood function of a Markov random field. Methods based on mean field theory are guaranteed to provide lower bounds, whereas certain types of convex relaxations provide upper bounds. In general, loopy belief propagation (BP) often provides accurate approximations, but not bounds. We prove that for a class of attractive binary models, the so–called Bethe approximation associated with any fixed point of loopy BP always lower bounds the true likelihood. Empirically, this bound is much tighter than the naive mean field bound, and requires no further work beyond running BP. We establish these lower bounds using a loop series expansion due to Chertkov and Chernyak, which we show can be derived as a consequence of the tree reparameterization characterization of BP fixed points. 1

2 0.75217116 28 nips-2007-Augmented Functional Time Series Representation and Forecasting with Gaussian Processes

Author: Nicolas Chapados, Yoshua Bengio

Abstract: We introduce a functional representation of time series which allows forecasts to be performed over an unspecified horizon with progressively-revealed information sets. By virtue of using Gaussian processes, a complete covariance matrix between forecasts at several time-steps is available. This information is put to use in an application to actively trade price spreads between commodity futures contracts. The approach delivers impressive out-of-sample risk-adjusted returns after transaction costs on a portfolio of 30 spreads. 1

3 0.66010189 186 nips-2007-Statistical Analysis of Semi-Supervised Regression

Author: Larry Wasserman, John D. Lafferty

Abstract: Semi-supervised methods use unlabeled data in addition to labeled data to construct predictors. While existing semi-supervised methods have shown some promising empirical performance, their development has been largely based on heuristics. In this paper we study semi-supervised learning from the viewpoint of minimax theory. Our first result shows that some common methods based on regularization using graph Laplacians do not lead to faster minimax rates of convergence. Thus, the estimators that use the unlabeled data do not have smaller risk than the estimators that use only labeled data. We then develop several new approaches that provably lead to improved performance. The statistical tools of minimax analysis are thus used to offer some new perspective on the problem of semi-supervised learning. 1

4 0.55715901 44 nips-2007-Catching Up Faster in Bayesian Model Selection and Model Averaging

Author: Tim V. Erven, Steven D. Rooij, Peter Grünwald

Abstract: Bayesian model averaging, model selection and their approximations such as BIC are generally statistically consistent, but sometimes achieve slower rates of convergence than other methods such as AIC and leave-one-out cross-validation. On the other hand, these other methods can be inconsistent. We identify the catch-up phenomenon as a novel explanation for the slow convergence of Bayesian methods. Based on this analysis we define the switch-distribution, a modification of the Bayesian model averaging distribution. We prove that in many situations model selection and prediction based on the switch-distribution is both consistent and achieves optimal convergence rates, thereby resolving the AIC-BIC dilemma. The method is practical; we give an efficient algorithm. 1

5 0.55267727 187 nips-2007-Structured Learning with Approximate Inference

Author: Alex Kulesza, Fernando Pereira

Abstract: In many structured prediction problems, the highest-scoring labeling is hard to compute exactly, leading to the use of approximate inference methods. However, when inference is used in a learning algorithm, a good approximation of the score may not be sufficient. We show in particular that learning can fail even with an approximate inference method with rigorous approximation guarantees. There are two reasons for this. First, approximate methods can effectively reduce the expressivity of an underlying model by making it impossible to choose parameters that reliably give good predictions. Second, approximations can respond to parameter changes in such a way that standard learning algorithms are misled. In contrast, we give two positive results in the form of learning bounds for the use of LP-relaxed inference in structured perceptron and empirical risk minimization settings. We argue that without understanding combinations of inference and learning, such as these, that are appropriately compatible, learning performance under approximate inference cannot be guaranteed. 1

6 0.55246878 63 nips-2007-Convex Relaxations of Latent Variable Training

7 0.55202812 92 nips-2007-Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations

8 0.54735363 49 nips-2007-Colored Maximum Variance Unfolding

9 0.54677486 141 nips-2007-New Outer Bounds on the Marginal Polytope

10 0.54668379 66 nips-2007-Density Estimation under Independent Similarly Distributed Sampling Assumptions

11 0.54634684 128 nips-2007-Message Passing for Max-weight Independent Set

12 0.54551756 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting

13 0.5454672 17 nips-2007-A neural network implementing optimal state estimation based on dynamic spike train decoding

14 0.54545379 146 nips-2007-On higher-order perceptron algorithms

15 0.54274476 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning

16 0.54218417 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images

17 0.54147899 177 nips-2007-Simplified Rules and Theoretical Analysis for Information Bottleneck Optimization and PCA with Spiking Neurons

18 0.54078513 156 nips-2007-Predictive Matrix-Variate t Models

19 0.53940666 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs

20 0.5391643 121 nips-2007-Local Algorithms for Approximate Inference in Minor-Excluded Graphs