nips nips2000 nips2000-62 knowledge-graph by maker-knowledge-mining

62 nips-2000-Generalized Belief Propagation


Source: pdf

Author: Jonathan S. Yedidia, William T. Freeman, Yair Weiss

Abstract: Belief propagation (BP) was only supposed to work for tree-like networks but works surprisingly well in many applications involving networks with loops, including turbo codes. However, there has been little understanding of the algorithm or the nature of the solutions it finds for general graphs. We show that BP can only converge to a stationary point of an approximate free energy, known as the Bethe free energy in statistical physics. This result characterizes BP fixed-points and makes connections with variational approaches to approximate inference. More importantly, our analysis lets us build on the progress made in statistical physics since Bethe's approximation was introduced in 1935. Kikuchi and others have shown how to construct more accurate free energy approximations, of which Bethe's approximation is the simplest. Exploiting the insights from our analysis, we derive generalized belief propagation (GBP) versions of these Kikuchi approximations. These new message passing algorithms can be significantly more accurate than ordinary BP, at an adjustable increase in complexity. We illustrate such a new GBP algorithm on a grid Markov network and show that it gives much more accurate marginal probabilities than those found using ordinary BP. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Belief propagation (BP) was only supposed to work for tree-like networks but works surprisingly well in many applications involving networks with loops, including turbo codes. [sent-7, score-0.203]

2 We show that BP can only converge to a stationary point of an approximate free energy, known as the Bethe free energy in statistical physics. [sent-9, score-0.434]

3 More importantly, our analysis lets us build on the progress made in statistical physics since Bethe's approximation was introduced in 1935. [sent-11, score-0.079]

4 Kikuchi and others have shown how to construct more accurate free energy approximations, of which Bethe's approximation is the simplest. [sent-12, score-0.33]

5 Exploiting the insights from our analysis, we derive generalized belief propagation (GBP) versions of these Kikuchi approximations. [sent-13, score-0.257]

6 These new message passing algorithms can be significantly more accurate than ordinary BP, at an adjustable increase in complexity. [sent-14, score-0.334]

7 We illustrate such a new GBP algorithm on a grid Markov network and show that it gives much more accurate marginal probabilities than those found using ordinary BP. [sent-15, score-0.258]

8 1 Introduction Local "belief propagation" (BP) algorithms such as those introduced by Pearl are guaranteed to converge to the correct marginal posterior probabilities in tree-like graphical models. [sent-16, score-0.176]

9 On the one hand, a number of researchers have empirically demonstrated good performance for BP algorithms applied to networks with loops. [sent-18, score-0.081]

10 One dramatic case is the near Shannon-limit performance of "Turbo codes" , whose decoding algorithm is equivalent to BP on a loopy network [2, 6]. [sent-19, score-0.067]

11 For some problems in computer vision involving networks with loops, BP has also been shown to be accurate and to converge very quickly [2, 1, 7]. [sent-20, score-0.113]

12 We show that BP is the first in a progression of local message-passing algorithms, each giving equivalent results to a corresponding approximation from statistical physics known as the "Kikuchi" approximation to the Gibbs free energy. [sent-24, score-0.276]

13 These algorithms have the attractive property of being user-adjustable: by paying some additional computational cost, one can obtain considerable improvement in the accuracy of one's approximation, and can sometimes obtain a convergent message-passing algorithm when ordinary BP does not converge. [sent-25, score-0.189]

14 2 Belief propagation fixed-points are zero gradient points of the Bethe free energy. We assume that we are given an undirected graphical model of N nodes with pairwise potentials (a Markov network). [sent-26, score-0.555]

15 The state of each node i is denoted by $x_i$, and the joint probability distribution function is given by [sent-28, score-0.077]

16 $P(x_1, x_2, \ldots, x_N) = \frac{1}{Z} \prod_{ij} \psi_{ij}(x_i, x_j) \prod_i \psi_i(x_i)$ (1), where $\psi_i(x_i)$ is the local "evidence" for node i, $\psi_{ij}(x_i, x_j)$ is the compatibility matrix between nodes i and j, and Z is a normalization constant. [sent-31, score-0.32]

17 Note that we are subsuming any fixed evidence nodes into our definition of 'l/Ji(Xi). [sent-32, score-0.129]

18 The standard BP update rules are: $m_{ij}(x_j) \leftarrow \alpha \sum_{x_i} \psi_{ij}(x_i, x_j)\, \psi_i(x_i) \prod_{k \in N(i) \setminus j} m_{ki}(x_i)$ (2) and $b_i(x_i) \leftarrow \alpha\, \psi_i(x_i) \prod_{k \in N(i)} m_{ki}(x_i)$ (3), where $\alpha$ denotes a normalization constant and $N(i) \setminus j$ means all nodes neighboring node i, except j. [sent-33, score-0.34]

19 Here mij refers to the message that node i sends to node j and bi is the belief (approximate marginal posterior probability) at node i, obtained by multiplying all incoming messages to that node by the local evidence. [sent-34, score-0.937]

20 Similarly, we can define the belief $b_{ij}(x_i, x_j)$ at the pair of nodes $(x_i, x_j)$ as the product of the local potentials and all messages incoming to the pair of nodes: $b_{ij}(x_i, x_j) = \alpha\, \phi_{ij}(x_i, x_j) \prod_{k \in N(i) \setminus j} m_{ki}(x_i) \prod_{l \in N(j) \setminus i} m_{lj}(x_j)$, where $\phi_{ij}(x_i, x_j) = \psi_{ij}(x_i, x_j)\, \psi_i(x_i)\, \psi_j(x_j)$. [sent-35, score-0.55]
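
To make the update rules above concrete, here is a minimal sketch of Eqs. 2-3 in Python/NumPy; the dictionary-based graph representation and the fixed number of synchronous sweeps are illustrative assumptions, not details taken from the paper.

import numpy as np

# Minimal sketch of standard BP (Eqs. 2-3) on a pairwise Markov network.
# Assumptions (not from the paper): a small dictionary-based graph and
# synchronous updates for a fixed number of sweeps.
# psi_i[i]: local evidence vector; psi_ij[(i, j)]: compatibility matrix (both orderings stored);
# neighbors[i]: list of neighbors of node i.
def run_bp(psi_i, psi_ij, neighbors, n_sweeps=50):
    # m[(i, j)] is the message that node i sends to node j (a vector over x_j)
    m = {(i, j): np.ones_like(psi_i[j]) for i in neighbors for j in neighbors[i]}
    for _ in range(n_sweeps):
        new_m = {}
        for (i, j) in m:
            # Eq. 2: sum over x_i of psi_ij * psi_i * incoming messages, excluding the one from j
            others = [m[(k, i)] for k in neighbors[i] if k != j]
            incoming = np.prod(others, axis=0) if others else np.ones_like(psi_i[i])
            msg = psi_ij[(i, j)].T @ (psi_i[i] * incoming)
            new_m[(i, j)] = msg / msg.sum()  # alpha: normalize
        m = new_m
    # Eq. 3: single-node beliefs from local evidence and all incoming messages
    b = {}
    for i in neighbors:
        bi = psi_i[i] * np.prod([m[(k, i)] for k in neighbors[i]], axis=0)
        b[i] = bi / bi.sum()
    return m, b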

21 Claim 1: Let $\{m_{ij}\}$ be a set of BP messages and let $\{b_{ij}, b_i\}$ be the beliefs calculated from those messages. [sent-36, score-0.389]

22 Then the beliefs are fixed-points of the BP algorithm if and only if they are zero gradient points of the Bethe free energy, $F_\beta = \sum_{ij} \sum_{x_i, x_j} b_{ij}(x_i, x_j) \left[ \ln b_{ij}(x_i, x_j) - \ln \phi_{ij}(x_i, x_j) \right] - \sum_i (q_i - 1) \sum_{x_i} b_i(x_i) \left[ \ln b_i(x_i) - \ln \psi_i(x_i) \right]$ (with $q_i$ the number of neighbors of node i), subject to the normalization and marginalization constraints: $\sum_{x_i} b_i(x_i) = 1$, $\sum_{x_i} b_{ij}(x_i, x_j) = b_j(x_j)$. [sent-37, score-0.365]
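
As a numerical check on Claim 1, the Bethe free energy can be evaluated directly from a set of beliefs; the sketch below assumes the expression for $F_\beta$ given above and the same dictionary conventions as the BP sketch earlier (an illustration, not the authors' code).

import numpy as np

# Sketch: evaluate the Bethe free energy F_beta from beliefs {b_i}, {b_ij}.
# Assumes b_ij is stored once per undirected edge (i, j), and that psi_i, psi_ij,
# neighbors follow the same conventions as the BP sketch above.
def bethe_free_energy(b_i, b_ij, psi_i, psi_ij, neighbors, eps=1e-12):
    F = 0.0
    for (i, j), bij in b_ij.items():
        phi = psi_ij[(i, j)] * np.outer(psi_i[i], psi_i[j])  # phi_ij = psi_ij * psi_i * psi_j
        F += np.sum(bij * (np.log(bij + eps) - np.log(phi)))
    for i, bi in b_i.items():
        q_i = len(neighbors[i])  # number of neighbors of node i
        F -= (q_i - 1) * np.sum(bi * (np.log(bi + eps) - np.log(psi_i[i])))
    return F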

23 To prove this claim we add Lagrange multipliers to form a Lagrangian L: $\lambda_{ij}(x_j)$ is the multiplier corresponding to the constraint that $b_{ij}(x_i, x_j)$ marginalizes down to $b_j(x_j)$, and $\gamma_{ij}$, $\gamma_i$ are multipliers corresponding to the normalization constraints. [sent-39, score-0.239]

24 $\ln \psi_i(x_i) + \sum_{j \in N(i)} \lambda_{ji}(x_i) + \gamma_i$. Setting $\lambda_{ij}(x_j) = \ln \prod_{k \in N(j) \setminus i} m_{kj}(x_j)$ and using the marginalization constraints, we find that the stationary conditions on the Lagrangian are equivalent to the BP fixed-point conditions. [sent-44, score-0.082]

25 (Empirically, we find that stable BP fixed-points correspond to local minima of the Bethe free energy, rather than maxima or saddle-points. [sent-45, score-0.128]

26 The free energy formulation clarifies the relationship to variational approaches which also minimize an approximate free energy [3]. [sent-49, score-0.522]

27 For example, the mean field approximation finds a set of $\{b_i\}$ that minimize $F_{MF}(\{b_i\}) = - \sum_{ij} \sum_{x_i, x_j} b_i(x_i)\, b_j(x_j) \ln \psi_{ij}(x_i, x_j) - \cdots$ [sent-50, score-0.099]

28 The BP free energy includes first-order terms bi(Xi) as well as second-order terms bij (Xi, Xj), while the mean field free energy uses only the first order ones. [sent-53, score-0.573]

29 It is easy to show that the BP free energy is exact for trees while the mean field one is not. [sent-54, score-0.248]

30 Kabashima and Saad [4] have previously pointed out the correspondence between BP and the Bethe approximation (expressed using the TAP formalism) for some specific graphical models with random disorder. [sent-56, score-0.095]

31 " [4] 3 Kikuchi Approximations to the Gibbs Free Energy The Bethe approximation, for which the energy and entropy are approximated by terms that involve at most pairs of nodes, is the simplest version of the Kikuchi "cluster variational method. [sent-58, score-0.146]

32 " [5, 10] In a general Kikuchi approximation, the free energy is approximated as a sum of the free energies of basic clusters of nodes, minus the free energy of over-counted cluster intersections, minus the free energy of the over-counted intersections of intersections, and so on. [sent-59, score-1.229]

33 Let R be a set of regions that include some chosen basic clusters of nodes, their intersections, the intersections of the intersections, and so on. [sent-60, score-0.313]

34 The choice of basic clusters determines the Kikuchi approximation: for the Bethe approximation, the basic clusters consist of all linked pairs of nodes. [sent-61, score-0.286]

35 Let $x_r$ be the state of the nodes in region r and $b_r(x_r)$ be the "belief" in $x_r$. [sent-62, score-0.209]

36 We define the energy of a region by $E_r(x_r) \equiv -\ln \prod_{ij} \psi_{ij}(x_i, x_j) - \ln \prod_i \psi_i(x_i) \equiv -\ln \psi_r(x_r)$, where the products are over all interactions contained within the region r. [sent-63, score-0.309]

37 For models with higher than pair-wise interactions, the region energy is generalized to include those interactions as well. [sent-64, score-0.26]
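
In code, the region energy is just the negative log of the product of all potentials whose nodes lie entirely inside the region; a small sketch under the same dictionary conventions as above (illustrative helper names, pairwise interactions only).

import numpy as np

# Sketch: E_r(x_r) = -ln of the product of all psi_ij and psi_i contained in region r,
# evaluated at one assignment x (a dict mapping node -> state). Pairwise interactions only;
# node labels are assumed comparable so each edge is counted once via i < j.
def region_energy(region, x, psi_i, psi_ij):
    E = 0.0
    for i in region:
        E -= np.log(psi_i[i][x[i]])
    for (i, j), M in psi_ij.items():
        if i < j and i in region and j in region:  # count each interaction once
            E -= np.log(M[x[i], x[j]])
    return E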

38 The Kikuchi free energy is $F_K = \sum_{r \in R} c_r \left( \sum_{x_r} b_r(x_r) E_r(x_r) + \sum_{x_r} b_r(x_r) \ln b_r(x_r) \right)$ (6) [sent-65, score-0.248]

39 where $c_r$ is the over-counting number of region r, defined by $c_r = 1 - \sum_{s \in super(r)} c_s$, where super(r) is the set of all super-regions of r. [sent-67, score-0.08]

40 The belief $b_r(x_r)$ in region r has several constraints: it must sum to one and be consistent with the beliefs in regions which intersect with r. [sent-69, score-0.48]
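
The over-counting numbers in Eq. 6 can be computed recursively from the containment relation among the regions, processing larger regions first; a small sketch (the frozenset representation and the example regions are illustrative assumptions).

# Sketch: over-counting numbers c_r = 1 - sum of c_s over all super-regions s of r in R.
# Regions are represented as frozensets of node labels (an illustrative choice).
def overcounting_numbers(regions):
    c = {}
    for r in sorted(regions, key=len, reverse=True):  # largest regions first
        supers = [s for s in regions if s > r]        # strict supersets of r within R
        c[r] = 1 - sum(c[s] for s in supers)
    return c

# Example: Bethe-level regions for a three-node chain 0-1-2
regions = [frozenset({0, 1}), frozenset({1, 2}), frozenset({0}), frozenset({1}), frozenset({2})]
print(overcounting_numbers(regions))
# each pair gets c = 1; node 1 gets c = -1 and nodes 0, 2 get c = 0, i.e. 1 - q_i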

41 In general, increasing the size of the basic clusters improves the approximation one obtains by minimizing the Kikuchi free energy. [sent-70, score-0.319]

42 4 Generalized belief propagation (GBP). Minimizing the Kikuchi free energy subject to the constraints on the beliefs is not simple. [sent-71, score-0.636]

43 Nearly all applications of the Kikuchi approximation in the physics literature exploit symmetries in the underlying physical system and the choice of clusters to reduce the number of equations that need to be solved from O(N) to 0(1). [sent-72, score-0.189]

44 But just as the Bethe free energy can be minimized by the BP algorithm, we introduce a class of analogous generalized belief propagation (GBP) algorithms that minimize an arbitrary Kikuchi free energy. [sent-73, score-0.66]

45 We present a "canonical" GBP algorithm which has the nice property of reducing to ordinary BP at the Bethe level. [sent-76, score-0.161]

46 We introduce messages mrs(xs) between all regions r and their "direct sub-regions" s. [sent-77, score-0.321]

47 It is helpful to think of this as a message from those nodes in r but not in s (which we denote by r\s) to the nodes in s. [sent-79, score-0.35]

48 Intuitively, we want messages to propagate information that lies outside of a region into it. [sent-80, score-0.356]

49 Thus, for a given region r, we want the belief $b_r(x_r)$ to depend on exactly those messages $m_{r's'}$ that start outside of the region r and go into the region r. [sent-81, score-0.636]

50 We define this set of messages M(r) to be those messages $m_{r's'}(x_{s'})$ such that region $r' \setminus s'$ has no nodes in common with region r, and such that region $s'$ is a sub-region of r or the same as region r. [sent-82, score-1.001]

51 We also define the set M(r, s) of messages to be all those messages that start in a sub-region of r and also belong to M(s), and we define M(r)\M(s) to be those messages that are in M(r) but not in M(s). [sent-83, score-0.828]

52 The canonical generalized belief propagation update rules are: $m_{rs}(x_s) \leftarrow \alpha \left[ \sum_{x_{r \setminus s}} \psi_{r \setminus s}(x_{r \setminus s}) \prod_{m_{r's'} \in M(r) \setminus M(s)} m_{r's'}(x_{s'}) \right] / \prod_{m_{r''s''} \in M(r,s)} m_{r''s''}(x_{s''})$ (7) and $b_r(x_r) \leftarrow \alpha\, \psi_r(x_r) \prod_{m_{r's'} \in M(r)} m_{r's'}(x_{s'})$ (8). [sent-84, score-0.679]

53 ,EM(r) where for brevity we have suppressed the functional dependences of the beliefs and messages. [sent-85, score-0.113]

54 The messages are updated starting with the messages into the smallest regions first. [sent-86, score-0.597]

55 One can then use the newly computed messages in the product over M(r, s) of the message-update rule. [sent-87, score-0.276]

56 Claim 2: Let {mrs(xs)} be a set of canonical GBP messages and let {br(xr)} be the beliefs calculated from those messages. [sent-89, score-0.54]

57 Then the beliefs are fixed-points of the canonical GBP algorithm if and only if they are zero gradient points of the constrained Kikuchi free energy FK. [sent-90, score-0.512]

58 We prove this claim by adding Lagrange multipliers: $\gamma_r$ to enforce the normalization of $b_r$ and $\lambda_{rs}(x_s)$ to enforce the consistency of each region r with all of its direct sub-regions s. [sent-91, score-0.374]

59 This set of consistency constraints is actually more than sufficient, but there is no harm in adding extra constraints. [sent-92, score-0.083]

60 We then rotate to another set of Lagrange multipliers $\mu_{rs}(x_s)$ of equal dimensionality which enforce a linear combination of the original constraints: $\mu_{rs}(x_s)$ enforces all those constraints involving marginalizations by all direct super-regions r' of s into s except that of region r itself. [sent-93, score-0.274]

61 Here $R(\mu_{rs})$ is the set of all regions which receive the message $\mu_{rs}$ in the belief update rule of the canonical algorithm. [sent-96, score-0.468]

62 Finally, we differentiate the Kikuchi free energy with respect to $b_r(x_r)$, and identify $\mu_{rs}(x_s) = \ln m_{rs}(x_s)$ to obtain the canonical GBP belief update rules, Eq. [sent-102, score-0.579]

63 Using the belief update rules in the marginalization constraints, we obtain the canonical GBP message update rules, Eq. [sent-104, score-0.566]

64 It is clear from this proof outline that other GBP message passing algorithms which are equivalent to the Kikuchi approximation exist. [sent-106, score-0.208]

65 If one writes any set of constraints which are sufficient to ensure the consistency of all Kikuchi regions, one can associate the exponentiated Lagrange multipliers of those constraints with a set of messages. [sent-107, score-0.18]

66 The GBP algorithms we have described solve exactly those networks which have the topology of a tree of basic clusters. [sent-108, score-0.083]

67 This is reminiscent of Pearl's method of clustering [8], wherein one groups clusters of nodes into "super-nodes," and then applies a belief propagation method to the equivalent super-node lattice. [sent-109, score-0.564]

68 We can show that the clustering method, using Kikuchi clusters as super-nodes, also gives results equivalent to the Kikuchi approximation for those lattices and cluster choices where there are no intersections between the intersections of the Kikuchi basic clusters. [sent-110, score-0.588]

69 5 Application to Specific Lattices. We illustrate the canonical GBP algorithm for the Kikuchi approximation of overlapping 4-node clusters on a square lattice of nodes. [sent-112, score-0.451]

70 Figure 1 (a), (b), (c) illustrates the beliefs at a node, pair of nodes, and at a cluster of 4 nodes, in terms of messages propagated in the network. [sent-113, score-0.438]

71 Vectors are the single index messages also used in ordinary BP. [sent-114, score-0.437]

72 Vectors with line segments indicate the double-indexed messages arising from the Kikuchi approximation used here. [sent-115, score-0.324]

73 These can be thought of as correction terms accounting for correlations between messages that ordinary BP treats as independent. [sent-116, score-0.437]

74 Fig. 1 (d), (e), (f) shows the corresponding marginal computations for the triangular lattice (with all triangles chosen as the basic Kikuchi clusters). [sent-118, score-0.19]

75 We find the message update rules by equating marginalizations of Fig. [sent-119, score-0.225]

76 The update rule (a) is like that for ordinary BP, with the addition of two double-indexed messages. [sent-123, score-0.221]

77 The update rule for the double-indexed messages involves division by the newly-computed single-indexed messages. [sent-124, score-0.361]

78 Fixed points of these message update equations give beliefs that are stationary points (empirically minima) of the corresponding Kikuchi approximation to the free energy. [sent-125, score-0.461]

79 Figure 1: Marginal probabilities in terms of the node links and GBP messages. [sent-135, score-0.1]

80 For (a) node, (b) line, (c) square cluster, using a Kikuchi approximation with 4-node clusters on a square lattice. [sent-136, score-0.228]

81 8, written here using node labels): $b_{ab}(x_a, x_b) = \alpha\, \psi_{ab}(x_a, x_b)\, \psi_a(x_a)\, \psi_b(x_b)$ times a product of single- and double-indexed messages $M$, where super- and subscripts indicate which nodes message M goes from and to. [sent-140, score-0.329]

82 (d), (e), (f): Marginal probabilities for triangular lattice with 3-node Kikuchi clusters. [sent-141, score-0.14]

83 Figure 2: (a) Graphical depiction of message update equations (Eq. [sent-143, score-0.152]

84 7; marginalize over nodes shown unfilled) for GBP using overlapping 4-node Kikuchi clusters. [sent-144, score-0.15]

85 (b) Update equation for double-indexed messages (involves a division by the single-index messages on the left hand side). [sent-147, score-0.577]

86 We constructed such a network, known in the physics literature as the square lattice Ising spin glass in a random magnetic field. [sent-149, score-0.171]

87 The nodes are on a square lattice, with nearest-neighbor nodes connected by compatibility matrices of the form $\psi_{ij} = \begin{pmatrix} \exp(J_{ij}) & \exp(-J_{ij}) \\ \exp(-J_{ij}) & \exp(J_{ij}) \end{pmatrix}$ [sent-150, score-0.293]

88 and local evidence vectors of the form $\psi_i = (\exp(h_i), \exp(-h_i))$. [sent-152, score-0.102]

89 The following results are for n by n lattices with toroidal boundary conditions and with J = 1, and h = 0. [sent-154, score-0.074]
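
For reference, the compatibility matrices and evidence vectors above can be built as follows; the particular random draws for the couplings $J_{ij}$ and the fields $h_i$ are illustrative assumptions (the summary only gives J = 1 and the lattice geometry).

import numpy as np

# Sketch: build psi_ij and psi_i for an n x n square-lattice Ising spin glass in a random
# field, with toroidal (wrap-around) boundary conditions. The random choices below
# (J_ij = +/- J with equal probability, h_i uniform in [-h, h]) are illustrative assumptions.
def build_spin_glass(n, J=1.0, h=0.1, seed=0):
    rng = np.random.default_rng(seed)
    psi_i, psi_ij, neighbors = {}, {}, {}
    nodes = [(r, c) for r in range(n) for c in range(n)]
    for v in nodes:
        h_v = rng.uniform(-h, h)
        psi_i[v] = np.array([np.exp(h_v), np.exp(-h_v)])
        neighbors[v] = []
    for (r, c) in nodes:
        for dr, dc in [(0, 1), (1, 0)]:                  # one edge to the right, one down
            u, w = (r, c), ((r + dr) % n, (c + dc) % n)  # % n gives the toroidal wrap-around
            J_uw = J * rng.choice([-1.0, 1.0])
            M = np.array([[np.exp(J_uw), np.exp(-J_uw)],
                          [np.exp(-J_uw), np.exp(J_uw)]])
            psi_ij[(u, w)], psi_ij[(w, u)] = M, M.T
            neighbors[u].append(w)
            neighbors[w].append(u)
    return psi_i, psi_ij, neighbors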

90 This model is designed to show off the weaknesses of ordinary BP, which performs well for many other networks. [sent-156, score-0.161]

91 Ordinary BP is a special case of canonical GBP, so we exploited this to use the same general-purpose GBP code for both ordinary BP and canonical GBP using overlapping square four-node clusters, thus making computational cost comparisons reasonable. [sent-157, score-0.519]

92 We started with randomized messages and only stepped half-way towards the computed values of the messages at each iteration in order to help convergence. [sent-158, score-0.552]
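
The half-step update mentioned above amounts to damping the messages; a one-line sketch (the 0.5 step size comes from the text, the helper name is an assumption).

# Sketch: damped message update, stepping only half-way from the old message
# toward the newly computed one, then renormalizing.
def damped_update(m_old, m_new, step=0.5):
    m = (1.0 - step) * m_old + step * m_new
    return m / m.sum()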

93 We found that canonical GBP took about twice as long as ordinary BP per iteration, but would typically reach a given level of convergence in many fewer iterations. [sent-159, score-0.312]

94 In fact, for the majority of the dozens of samples that we looked at, BP did not converge at all, while canonical GBP always converged for this model and always to accurate answers. [sent-160, score-0.223]

95 (We found that for the zero-field 3-dimensional spin glass with toroidal boundary conditions, which is an even more difficult model, canonical GBP with 2x2x2 cubic clusters would also fail to converge). [sent-161, score-0.307]

96 For n = 20 or larger, it was difficult to make comparisons with any other algorithm, because ordinary BP did not converge and Monte Carlo simulations suffered from extremely slow equilibration. [sent-162, score-0.199]

97 However, generalized belief propagation converged reasonably rapidly to plausible-looking beliefs. [sent-163, score-0.257]

98 To give a qualitative feel for the results, we compare ordinary BP, canonical GBP, and the exact results for an n = 10 lattice where ordinary BP did converge. [sent-165, score-0.559]

99 Listing the values of the one-node marginal probabilities in one of the rows, we find that ordinary BP gives (. [sent-166, score-0.224]

100 Loopy belief propagation for approximate inference: an empirical study. [sent-246, score-0.226]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('kikuchi', 0.47), ('bp', 0.415), ('gbp', 0.314), ('messages', 0.276), ('ordinary', 0.161), ('canonical', 0.151), ('bethe', 0.149), ('xs', 0.149), ('xr', 0.147), ('nodes', 0.129), ('free', 0.128), ('intersections', 0.125), ('br', 0.122), ('energy', 0.12), ('belief', 0.12), ('beliefs', 0.113), ('clusters', 0.11), ('xi', 0.108), ('propagation', 0.106), ('message', 0.092), ('xj', 0.091), ('lattice', 0.086), ('xb', 0.081), ('region', 0.08), ('node', 0.077), ('bij', 0.077), ('cr', 0.064), ('update', 0.06), ('loops', 0.057), ('bi', 0.054), ('ij', 0.051), ('cluster', 0.049), ('constraints', 0.049), ('approximation', 0.048), ('multipliers', 0.048), ('pearl', 0.048), ('lattices', 0.047), ('mij', 0.047), ('mki', 0.047), ('mrs', 0.047), ('graphical', 0.047), ('regions', 0.045), ('rules', 0.042), ('xa', 0.041), ('marginalization', 0.041), ('mt', 0.04), ('marginal', 0.04), ('converge', 0.038), ('aij', 0.037), ('lagrange', 0.035), ('square', 0.035), ('consistency', 0.034), ('turbo', 0.034), ('phone', 0.034), ('accurate', 0.034), ('claim', 0.034), ('basic', 0.033), ('normalization', 0.032), ('aji', 0.031), ('broadway', 0.031), ('compatibility', 0.031), ('fmf', 0.031), ('jji', 0.031), ('lnbi', 0.031), ('marginalizations', 0.031), ('merl', 0.031), ('super', 0.031), ('triangular', 0.031), ('generalized', 0.031), ('physics', 0.031), ('empirically', 0.031), ('minimized', 0.03), ('clustering', 0.03), ('interactions', 0.029), ('algorithms', 0.028), ('bj', 0.027), ('toroidal', 0.027), ('variational', 0.026), ('decoding', 0.026), ('division', 0.025), ('enforce', 0.025), ('potentials', 0.025), ('yedidia', 0.024), ('ken', 0.024), ('kabashima', 0.024), ('wherein', 0.024), ('bd', 0.023), ('probabilities', 0.023), ('direct', 0.022), ('networks', 0.022), ('berkeley', 0.021), ('equivalent', 0.021), ('overlapping', 0.021), ('lagrangian', 0.02), ('loopy', 0.02), ('minus', 0.02), ('stationary', 0.02), ('involving', 0.019), ('passing', 0.019), ('spin', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999952 62 nips-2000-Generalized Belief Propagation

Author: Jonathan S. Yedidia, William T. Freeman, Yair Weiss

Abstract: Belief propagation (BP) was only supposed to work for tree-like networks but works surprisingly well in many applications involving networks with loops, including turbo codes. However, there has been little understanding of the algorithm or the nature of the solutions it finds for general graphs. We show that BP can only converge to a stationary point of an approximate free energy, known as the Bethe free energy in statistical physics. This result characterizes BP fixed-points and makes connections with variational approaches to approximate inference. More importantly, our analysis lets us build on the progress made in statistical physics since Bethe's approximation was introduced in 1935. Kikuchi and others have shown how to construct more accurate free energy approximations, of which Bethe's approximation is the simplest. Exploiting the insights from our analysis, we derive generalized belief propagation (GBP) versions of these Kikuchi approximations. These new message passing algorithms can be significantly more accurate than ordinary BP, at an adjustable increase in complexity. We illustrate such a new GBP algorithm on a grid Markov network and show that it gives much more accurate marginal probabilities than those found using ordinary BP. 1

2 0.15800074 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation

Author: Brendan J. Frey, Anitha Kannan

Abstract: One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. probability propagation algorithm), while ignoring the fact that the graph has cycles. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce

3 0.11000451 97 nips-2000-Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping

Author: Rich Caruana, Steve Lawrence, C. Lee Giles

Abstract: The conventional wisdom is that backprop nets with excess hidden units generalize poorly. We show that nets with excess capacity generalize well when trained with backprop and early stopping. Experiments suggest two reasons for this: 1) Overfitting can vary significantly in different regions of the model. Excess capacity allows better fit to regions of high non-linearity, and backprop often avoids overfitting the regions of low non-linearity. 2) Regardless of size, nets learn task subcomponents in similar sequence. Big nets pass through stages similar to those learned by smaller nets. Early stopping can stop training the large net when it generalizes comparably to a smaller net. We also show that conjugate gradient can yield worse generalization because it overfits regions of low non-linearity when learning to fit regions of high non-linearity.

4 0.10999725 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles

Author: Martin J. Wainwright, Erik B. Sudderth, Alan S. Willsky

Abstract: We present the embedded trees algorithm, an iterative technique for estimation of Gaussian processes defined on arbitrary graphs. By exactly solving a series of modified problems on embedded spanning trees, it computes the conditional means with an efficiency comparable to or better than other techniques. Unlike other methods, the embedded trees algorithm also computes exact error covariances. The error covariance computation is most efficient for graphs in which removing a small number of edges reveals an embedded tree. In this context, we demonstrate that sparse loopy graphs can provide a significant increase in modeling power relative to trees, with only a minor increase in estimation complexity. 1

5 0.10706961 47 nips-2000-Error-correcting Codes on a Bethe-like Lattice

Author: Renato Vicente, David Saad, Yoshiyuki Kabashima

Abstract: We analyze Gallager codes by employing a simple mean-field approximation that distorts the model geometry and preserves important interactions between sites. The method naturally recovers the probability propagation decoding algorithm as an extremization of a proper free-energy. We find a thermodynamic phase transition that coincides with information theoretical upper-bounds and explain the practical code performance in terms of the free-energy landscape.

6 0.10271623 114 nips-2000-Second Order Approximations for Probability Models

7 0.09035708 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

8 0.079049475 103 nips-2000-Probabilistic Semantic Video Indexing

9 0.078062087 38 nips-2000-Data Clustering by Markovian Relaxation and the Information Bottleneck Method

10 0.071333848 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data

11 0.069446437 13 nips-2000-A Tighter Bound for Graphical Models

12 0.069383681 143 nips-2000-Using the Nyström Method to Speed Up Kernel Machines

13 0.063267648 12 nips-2000-A Support Vector Method for Clustering

14 0.060541268 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks

15 0.059665747 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks

16 0.054072075 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks

17 0.047552947 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach

18 0.046974879 145 nips-2000-Weak Learners and Improved Rates of Convergence in Boosting

19 0.043065283 148 nips-2000-`N-Body' Problems in Statistical Learning

20 0.0426763 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.152), (1, -0.001), (2, 0.118), (3, -0.053), (4, 0.178), (5, 0.011), (6, 0.051), (7, -0.001), (8, -0.046), (9, 0.252), (10, -0.028), (11, 0.07), (12, -0.138), (13, 0.05), (14, 0.034), (15, -0.044), (16, -0.08), (17, 0.093), (18, 0.142), (19, -0.012), (20, -0.039), (21, 0.203), (22, -0.18), (23, 0.104), (24, 0.075), (25, 0.009), (26, -0.032), (27, 0.022), (28, -0.099), (29, -0.036), (30, -0.129), (31, -0.052), (32, -0.055), (33, -0.263), (34, -0.131), (35, 0.107), (36, 0.056), (37, -0.021), (38, 0.088), (39, -0.081), (40, -0.063), (41, -0.149), (42, -0.011), (43, -0.1), (44, -0.058), (45, -0.021), (46, -0.138), (47, -0.066), (48, 0.037), (49, -0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96256936 62 nips-2000-Generalized Belief Propagation

Author: Jonathan S. Yedidia, William T. Freeman, Yair Weiss

Abstract: Belief propagation (BP) was only supposed to work for tree-like networks but works surprisingly well in many applications involving networks with loops, including turbo codes. However, there has been little understanding of the algorithm or the nature of the solutions it finds for general graphs. We show that BP can only converge to a stationary point of an approximate free energy, known as the Bethe free energy in statistical physics. This result characterizes BP fixed-points and makes connections with variational approaches to approximate inference. More importantly, our analysis lets us build on the progress made in statistical physics since Bethe's approximation was introduced in 1935. Kikuchi and others have shown how to construct more accurate free energy approximations, of which Bethe's approximation is the simplest. Exploiting the insights from our analysis, we derive generalized belief propagation (GBP) versions ofthese Kikuchi approximations. These new message passing algorithms can be significantly more accurate than ordinary BP, at an adjustable increase in complexity. We illustrate such a new GBP algorithm on a grid Markov network and show that it gives much more accurate marginal probabilities than those found using ordinary BP. 1

2 0.56311274 97 nips-2000-Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping

Author: Rich Caruana, Steve Lawrence, C. Lee Giles

Abstract: The conventional wisdom is that backprop nets with excess hidden units generalize poorly. We show that nets with excess capacity generalize well when trained with backprop and early stopping. Experiments suggest two reasons for this: 1) Overfitting can vary significantly in different regions of the model. Excess capacity allows better fit to regions of high non-linearity, and backprop often avoids overfitting the regions of low non-linearity. 2) Regardless of size, nets learn task subcomponents in similar sequence. Big nets pass through stages similar to those learned by smaller nets. Early stopping can stop training the large net when it generalizes comparably to a smaller net. We also show that conjugate gradient can yield worse generalization because it overfits regions of low non-linearity when learning to fit regions of high non-linearity.

3 0.53585875 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation

Author: Brendan J. Frey, Anitha Kannan

Abstract: One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. probability propagation algorithm), while ignoring the fact that the graph has cycles. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce

4 0.42431572 143 nips-2000-Using the Nyström Method to Speed Up Kernel Machines

Author: Christopher K. I. Williams, Matthias Seeger

Abstract: unkown-abstract

5 0.39787811 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles

Author: Martin J. Wainwright, Erik B. Sudderth, Alan S. Willsky

Abstract: We present the embedded trees algorithm, an iterative technique for estimation of Gaussian processes defined on arbitrary graphs. By exactly solving a series of modified problems on embedded spanning trees, it computes the conditional means with an efficiency comparable to or better than other techniques. Unlike other methods, the embedded trees algorithm also computes exact error covariances. The error covariance computation is most efficient for graphs in which removing a small number of edges reveals an embedded tree. In this context, we demonstrate that sparse loopy graphs can provide a significant increase in modeling power relative to trees, with only a minor increase in estimation complexity. 1

6 0.34744614 47 nips-2000-Error-correcting Codes on a Bethe-like Lattice

7 0.34381536 103 nips-2000-Probabilistic Semantic Video Indexing

8 0.28721535 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks

9 0.27627981 12 nips-2000-A Support Vector Method for Clustering

10 0.27041662 114 nips-2000-Second Order Approximations for Probability Models

11 0.25709048 38 nips-2000-Data Clustering by Markovian Relaxation and the Information Bottleneck Method

12 0.25572273 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data

13 0.21073571 93 nips-2000-On Iterative Krylov-Dogleg Trust-Region Steps for Solving Neural Networks Nonlinear Least Squares Problems

14 0.19555306 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

15 0.18258484 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks

16 0.18110055 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling

17 0.17369109 99 nips-2000-Periodic Component Analysis: An Eigenvalue Method for Representing Periodic Structure in Speech

18 0.17204365 13 nips-2000-A Tighter Bound for Graphical Models

19 0.17092836 61 nips-2000-Generalizable Singular Value Decomposition for Ill-posed Datasets

20 0.15872829 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.012), (17, 0.055), (32, 0.024), (33, 0.042), (35, 0.014), (54, 0.014), (55, 0.013), (62, 0.038), (65, 0.055), (67, 0.068), (75, 0.027), (76, 0.039), (79, 0.012), (81, 0.033), (90, 0.041), (91, 0.034), (92, 0.365), (97, 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84693718 62 nips-2000-Generalized Belief Propagation

Author: Jonathan S. Yedidia, William T. Freeman, Yair Weiss

Abstract: Belief propagation (BP) was only supposed to work for tree-like networks but works surprisingly well in many applications involving networks with loops, including turbo codes. However, there has been little understanding of the algorithm or the nature of the solutions it finds for general graphs. We show that BP can only converge to a stationary point of an approximate free energy, known as the Bethe free energy in statistical physics. This result characterizes BP fixed-points and makes connections with variational approaches to approximate inference. More importantly, our analysis lets us build on the progress made in statistical physics since Bethe's approximation was introduced in 1935. Kikuchi and others have shown how to construct more accurate free energy approximations, of which Bethe's approximation is the simplest. Exploiting the insights from our analysis, we derive generalized belief propagation (GBP) versions ofthese Kikuchi approximations. These new message passing algorithms can be significantly more accurate than ordinary BP, at an adjustable increase in complexity. We illustrate such a new GBP algorithm on a grid Markov network and show that it gives much more accurate marginal probabilities than those found using ordinary BP. 1

2 0.75637746 47 nips-2000-Error-correcting Codes on a Bethe-like Lattice

Author: Renato Vicente, David Saad, Yoshiyuki Kabashima

Abstract: We analyze Gallager codes by employing a simple mean-field approximation that distorts the model geometry and preserves important interactions between sites. The method naturally recovers the probability propagation decoding algorithm as an extremization of a proper free-energy. We find a thermodynamic phase transition that coincides with information theoretical upper-bounds and explain the practical code performance in terms of the free-energy landscape.

3 0.31755874 68 nips-2000-Improved Output Coding for Classification Using Continuous Relaxation

Author: Koby Crammer, Yoram Singer

Abstract: Output coding is a general method for solving multiclass problems by reducing them to multiple binary classification problems. Previous research on output coding has employed, almost solely, predefined discrete codes. We describe an algorithm that improves the performance of output codes by relaxing them to continuous codes. The relaxation procedure is cast as an optimization problem and is reminiscent of the quadratic program for support vector machines. We describe experiments with the proposed algorithm, comparing it to standard discrete output codes. The experimental results indicate that continuous relaxations of output codes often improve the generalization performance, especially for short codes.

4 0.30715871 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics

Author: Thomas Natschläger, Wolfgang Maass, Eduardo D. Sontag, Anthony M. Zador

Abstract: Experimental data show that biological synapses behave quite differently from the symbolic synapses in common artificial neural network models. Biological synapses are dynamic, i.e., their

5 0.30391607 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

Author: Zoubin Ghahramani, Matthew J. Beal

Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1

6 0.3036961 146 nips-2000-What Can a Single Neuron Compute?

7 0.30027428 85 nips-2000-Mixtures of Gaussian Processes

8 0.29963878 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm

9 0.29929653 69 nips-2000-Incorporating Second-Order Functional Knowledge for Better Option Pricing

10 0.2969563 74 nips-2000-Kernel Expansions with Unlabeled Examples

11 0.29485306 79 nips-2000-Learning Segmentation by Random Walks

12 0.29429039 122 nips-2000-Sparse Representation for Gaussian Process Models

13 0.29317227 38 nips-2000-Data Clustering by Markovian Relaxation and the Information Bottleneck Method

14 0.29156521 134 nips-2000-The Kernel Trick for Distances

15 0.28942367 22 nips-2000-Algorithms for Non-negative Matrix Factorization

16 0.28920412 78 nips-2000-Learning Joint Statistical Models for Audio-Visual Fusion and Segregation

17 0.2888945 36 nips-2000-Constrained Independent Component Analysis

18 0.28843093 52 nips-2000-Fast Training of Support Vector Classifiers

19 0.28792208 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data

20 0.28770816 49 nips-2000-Explaining Away in Weight Space