nips nips2001 nips2001-188 knowledge-graph by maker-knowledge-mining

188 nips-2001-The Unified Propagation and Scaling Algorithm


Source: pdf

Author: Yee W. Teh, Max Welling

Abstract: In this paper we will show that a restricted class of constrained minimum divergence problems, named generalized inference problems, can be solved by approximating the KL divergence with a Bethe free energy. The algorithm we derive is closely related to both loopy belief propagation and iterative scaling. This unified propagation and scaling algorithm reduces to a convergent alternative to loopy belief propagation when no constraints are present. Experiments show the viability of our algorithm.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract In this paper we will show that a restricted class of constrained minimum divergence problems, named generalized inference problems, can be solved by approximating the KL divergence with a Bethe free energy. [sent-8, score-0.687]

2 The algorithm we derive is closely related to both loopy belief propagation and iterative scaling. [sent-9, score-0.851]

3 This unified propagation and scaling algorithm reduces to a convergent alternative to loopy belief propagation when no constraints are present. [sent-10, score-1.267]

4 1 Introduction For many interesting models, exact inference is intractable. [sent-12, score-0.161]

5 BP on loopy graphs can still be understood as a form of approximate inference since its fixed points are stationary points of the Bethe free energy [2]. [sent-14, score-0.971]

6 A seemingly unrelated problem is that of finding the distribution with minimum KL divergence to a prior distribution subject to some constraints. [sent-15, score-0.123]

7 This problem can be solved through the iterative scaling (IS) procedure [3]. [sent-16, score-0.164]
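
Written out in our own notation (the paper's equations did not survive extraction, so the symbols Q, P, q-hat and Obs below are assumptions), the constrained minimum divergence problem and the IS update take the standard textbook form: find the distribution closest in KL divergence to the prior subject to fixed marginals at the observed nodes, and repeatedly rescale the current distribution so that one constrained marginal at a time is matched exactly.

```latex
% Standard form of the constrained minimum divergence problem and the IS update.
% Q, P, \hat{q}_i and Obs are our notation, not necessarily the paper's symbols.
\min_{Q} \; \mathrm{KL}(Q \,\|\, P)
\quad \text{s.t.} \quad Q(x_i) = \hat{q}_i(x_i) \;\; \text{for all } i \in \mathrm{Obs}

\text{IS update for one } i \in \mathrm{Obs}: \qquad
Q^{\mathrm{new}}(\mathbf{x}) \;\propto\; Q(\mathbf{x}) \,
\frac{\hat{q}_i(x_i)}{Q(x_i)}
```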

8 Although a lot of work has been done on approximate inference, there seems to be no counterpart in the literature on approximate minimum divergence problems. [sent-17, score-0.252]

9 This paper shows that the Bethe free energy can be used as an approximation to the KL divergence and derives a novel approximate minimum divergence algorithm which we call unified propagation and scaling (UPS). [sent-18, score-0.794]

10 In section 2 we introduce generalized inference and the iterative scaling (IS) algorithm. [sent-19, score-0.433]

11 In section 3, we approximate the KL divergence with the Bethe free energy and derive fixed point equations to perform approximate generalized inference. [sent-20, score-0.601]

12 We also show in what sense our fixed point equations are related to loopy BP and IS. [sent-21, score-0.635]

13 Section 4 describes unified propagation and scaling (UPS), a novel algorithm to minimize the Bethe free energy, while section 5 shows experiments on the efficiency and accuracy of UPS. [sent-22, score-0.472]

14 2 Generalized Inference In this section we will introduce generalized inference and review some of the literature on iterative scaling (IS). [sent-23, score-0.433]

15 Let the prior be a pairwise Markov random field over a graph [the defining equation was lost in extraction], where one product ranges over the edges of the graph, the other ranges over its nodes, and each node has an associated number of neighbours. [sent-29, score-0.209]
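
A standard pairwise MRF factorization consistent with the surrounding text is the following; this is a reconstruction, since the paper's exact parameterization did not survive extraction, and the symbols E, V, Z and n_i are our notation.

```latex
% Reconstruction of the prior over a graph with edge set E and node set V.
P(\mathbf{x}) \;=\; \frac{1}{Z}
\prod_{(ij)\in E} \psi_{ij}(x_i, x_j)
\prod_{i \in V} \psi_i(x_i),
\qquad n_i = |\{\, j : (ij) \in E \,\}|
```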

16 Generalized inference is the process by which we determine the generalized posterior (see footnote 1). [sent-34, score-0.269]

17 [Part of Theorem 1; the equations were lost in extraction] ... then the generalized posterior is [the expression stated in the theorem] for each node. [sent-38, score-0.186]

18 Theorem 1 If [condition lost in extraction], then [conclusion lost in extraction], where [the notation is defined] for a subset of nodes. [sent-39, score-0.117]

19 The above theorem shows that if the constrained marginals are delta functions, i.e. [sent-41, score-0.304]

20 the observations are hard, then the generalized posterior reduces to a trivial extension of the ordinary posterior, hence explaining our use of the term generalized inference. [sent-43, score-0.447]

21 Since generalized inference is a constrained minimum divergence problem, a standard way of solving it is to use Lagrange multipliers. [sent-44, score-0.454]

22 IS is a specific case of the generalized iterative scaling (GIS) algorithm [4], which updates the Lagrange multipliers for a subset of nodes using [an update equation lost in extraction]. [sent-49, score-0.591]
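
A minimal sketch of what the scaling updates do, on a toy joint distribution small enough to store explicitly (the variable names, the toy prior and the target marginals below are illustrative; the paper applies the same idea to MRFs whose marginals must themselves be approximated):

```python
import numpy as np

# Minimal iterative scaling (IS) sketch on an explicit joint distribution.
rng = np.random.default_rng(0)
prior = rng.random((2, 2, 2))
prior /= prior.sum()                       # prior P over three binary variables

targets = {0: np.array([0.9, 0.1]),        # desired marginal for variable 0
           2: np.array([0.3, 0.7])}        # desired marginal for variable 2

Q = prior.copy()
for sweep in range(100):
    for axis, q_hat in targets.items():
        # current marginal of Q for this variable
        other_axes = tuple(a for a in range(Q.ndim) if a != axis)
        marginal = Q.sum(axis=other_axes)
        # IS update: multiply by (target / current) along this axis
        shape = [1, 1, 1]
        shape[axis] = 2
        Q *= (q_hat / marginal).reshape(shape)
    # Q stays normalized because each target marginal sums to one

for axis, q_hat in targets.items():
    other_axes = tuple(a for a in range(Q.ndim) if a != axis)
    print(axis, Q.sum(axis=other_axes), "target:", q_hat)
```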

23 Parallel GIS steps can be understood as performing IS updates in parallel, but damping the steps such that the algorithm is still guaranteed to converge. [sent-50, score-0.209]

24 [Equation lost in extraction.] Ordinary inference is needed to compute the current marginals required by (4). [sent-51, score-0.358]

25 If the graph is singly connected, then belief propagation (BP) can be used to compute the required marginals. [sent-52, score-0.319]
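
For reference, a minimal sum-product BP sketch on a three-node chain (a tree), in our own notation; on a tree the resulting beliefs are the exact marginals that the scaling updates need.

```python
import numpy as np

# Minimal sum-product BP sketch on a 3-node chain (a tree); toy potentials.
np.random.seed(1)
psi = [np.random.rand(2) for _ in range(3)]           # node potentials psi_i(x_i)
psi_e = {(0, 1): np.random.rand(2, 2),                # edge potentials psi_ij(x_i, x_j)
         (1, 2): np.random.rand(2, 2)}
neighbours = {0: [1], 1: [0, 2], 2: [1]}
msgs = {(i, j): np.ones(2) for i in neighbours for j in neighbours[i]}

def edge_pot(i, j):
    # return the potential oriented as (x_i, x_j)
    return psi_e[(i, j)] if (i, j) in psi_e else psi_e[(j, i)].T

for _ in range(5):                                    # a few sweeps suffice on a tree
    for (i, j) in list(msgs):
        inc = psi[i].copy()
        for k in neighbours[i]:
            if k != j:
                inc *= msgs[(k, i)]                   # product of other incoming messages
        m = edge_pot(i, j).T @ inc                    # m_{i->j}(x_j)
        msgs[(i, j)] = m / m.sum()

def belief(i):
    b = psi[i].copy()
    for k in neighbours[i]:
        b *= msgs[(k, i)]
    return b / b.sum()

# brute-force check against the exact marginal of node 0
joint = np.einsum('a,b,c,ab,bc->abc', psi[0], psi[1], psi[2],
                  psi_e[(0, 1)], psi_e[(1, 2)])
joint /= joint.sum()
print(belief(0), joint.sum(axis=(1, 2)))              # should agree on a tree
```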

26 Otherwise, exact inference or sampling algorithms like Markov chain Monte Carlo can be used, but usually are computationally taxing. [sent-53, score-0.187]

27 Alternative approximate inference algorithms like variational methods and loopy BP can be used instead to estimate the required marginals. (Footnote 1: To avoid confusion, we will explicitly use “ordinary inference” for normal inference, but when there is no confusion “inference” by itself will mean generalized inference.) [sent-54, score-0.934]

28 A more principled approach is to first approximate the KL divergence, then derive algorithms to minimize the approximation. [sent-59, score-0.125]

29 In the next section, we describe a Bethe free energy approximation to the KL divergence. [sent-60, score-0.189]
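
The Bethe free energy used as the approximation has the standard form below; this is our notation and a reconstruction, since the paper's exact expression (written as an approximation to the KL divergence) was lost in extraction. It is minimized over single-node beliefs b_i and pairwise beliefs b_ij subject to normalization, marginalization (MN) and Obs constraints.

```latex
% Standard Bethe free energy for a pairwise MRF; b_i, b_{ij}, \psi and n_i are
% our notation, not necessarily the paper's exact formula.
F_{\mathrm{Bethe}}(b) \;=\;
\sum_{(ij)\in E} \sum_{x_i, x_j} b_{ij}(x_i,x_j)
\log \frac{b_{ij}(x_i,x_j)}{\psi_{ij}(x_i,x_j)\,\psi_i(x_i)\,\psi_j(x_j)}
\;-\; \sum_{i\in V} (n_i - 1) \sum_{x_i} b_i(x_i)
\log \frac{b_i(x_i)}{\psi_i(x_i)}
```

Its stationary points under the MN constraints give BP-style message updates at hidden nodes, and adding the Obs constraints at observed nodes gives scaling-style updates, which is the correspondence the surrounding sentences describe.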

30 The fixed point equations reduce to BP propagation updates at hidden nodes, and to IS scaling updates at observed nodes. [sent-62, score-0.56]

31 As a consequence, using loopy BP to approximate the required marginals turns out to be a particular scheduling of the fixed point equations. [sent-63, score-0.932]

32 Because the Bethe free energy is fairly well understood, and is quite accurate in many regimes [5, 2, 6], we conclude that IS with loopy BP is a viable approximate generalized inference technique. [sent-64, score-1.076]

33 However, in section 4 we describe more efficient algorithms for approximate generalized inference based upon the Bethe free energy. [sent-65, score-0.452]

34 We can also use Lagrange multipliers to impose the normalization and observational constraints, but this reduces to simply keeping the node and pairwise beliefs normalized, and keeping the beliefs at the observed nodes fixed at their constrained values. [sent-72, score-0.234]

35 Setting the derivatives [of the Lagrangian] to 0, we get the fixed point equations (7) [equations lost in extraction]. For a quick example, consider a two-node Boltzmann machine with a weight and biases, where the desired means on both nodes are [values lost in extraction]. [sent-76, score-0.255]

36 Then using either naive mean field or naive TAP equations to estimate the marginals required by IS will not converge. [sent-77, score-0.283]
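
A sketch of the setup behind this example, assuming illustrative parameter values (the paper's actual weight, biases and desired means were not recoverable): naive mean field supplies the estimated means, and the IS step then shifts the extra biases so the estimated means match the targets. The behaviour the authors report can be probed by running a loop of this kind.

```python
import numpy as np

# Two-node Boltzmann machine: p(x) ∝ exp(w*x1*x2 + th1*x1 + th2*x2), x in {0,1}.
# Naive mean field estimates the means; IS adjusts extra biases lam_i to hit targets.
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

w, th = 4.0, np.array([-2.0, -2.0])     # illustrative values, not the paper's
target = np.array([0.5, 0.5])           # desired means on both nodes
lam = np.zeros(2)                       # IS Lagrange multipliers (extra biases)

m = np.full(2, 0.5)
for step in range(20):
    # inner loop: naive mean-field fixed point for the current biases
    for _ in range(100):
        m = sigmoid(th + lam + w * m[::-1])
    # IS scaling update in the log-odds domain: move lam so the estimated
    # mean matches the target
    lam += np.log(target / (1 - target)) - np.log(m / (1 - m))
    print(step, m.round(3), lam.round(3))
```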

37 If the constrained marginal is a delta function, then (11) reduces to a form in which, instead of bouncing back, messages going into the observed node get absorbed. [sent-80, score-0.248]

38 [Updating each multiplier using (4) is identified with update (12); details lost in extraction.] Theorem 3 states the unexpected result that the scaling updates (4) are just fixed point equations that minimize the Bethe free energy. [sent-84, score-0.388]

39 Further, the required marginals are computed using (9), which is exactly loopy BP. [sent-85, score-0.798]

40 Hence using loopy BP to approximate the marginals required by IS is just a particular scheduling of the fixed point equations (9,10). [sent-86, score-0.981]

41 4 Algorithms to Minimize the Bethe Free Energy Inspired by [2], we can run the fixed point equations (9,10) and hope that they converge. [sent-87, score-0.23]

42 If loopy IS converges, it will converge to stationary points of the Bethe free energy; in simulations we find that it always gets to a good local minimum, if not the global minimum. [sent-90, score-0.106]

43 However loopy IS does not necessarily converge, especially when the variables are strongly correlated. [sent-91, score-0.564]

44 Firstly, the loopy BP component (9) may fail to converge. [sent-93, score-0.594]

45 However this is not serious as past results indicate that loopy BP often fails only when the Bethe approximation is not accurate [6]. [sent-94, score-0.59]

46 Secondly, the IS component (10) may fail to converge, since it is not run sequentially and the estimated marginals are inaccurate. [sent-95, score-0.274]

47 We will show in section 5 that this is a serious problem for loopy IS. [sent-96, score-0.59]

48 One way to mitigate the second problem is to use the scaling updates (4), and approximate the required marginals using an inner phase of loopy BP (call this algorithm IS+BP). [sent-97, score-1.086]
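
A minimal IS+BP sketch on a small loopy graph, in our own notation with toy potentials (the correspondence to the paper's equations (4), (9), (10) is an assumption): the outer loop performs one scaling update per observed node, and the inner phase runs loopy BP to estimate the marginals it needs.

```python
import numpy as np

# IS+BP sketch on a 4-cycle: scaling updates with an inner phase of loopy BP.
np.random.seed(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
nbrs = {i: [] for i in range(4)}
for i, j in edges:
    nbrs[i].append(j)
    nbrs[j].append(i)

psi = [np.random.rand(2) + 0.5 for _ in range(4)]        # node potentials
psi_e = {e: np.random.rand(2, 2) + 0.5 for e in edges}   # edge potentials
targets = {0: np.array([0.8, 0.2])}                      # Obs constraints

def edge_pot(i, j):
    # return the potential oriented as (x_i, x_j)
    return psi_e[(i, j)] if (i, j) in psi_e else psi_e[(j, i)].T

def loopy_bp(psi, sweeps=50):
    # loopy BP run for a fixed number of sweeps (approximate convergence)
    msgs = {(i, j): np.ones(2) for i in nbrs for j in nbrs[i]}
    for _ in range(sweeps):
        for (i, j) in list(msgs):
            inc = psi[i].copy()
            for k in nbrs[i]:
                if k != j:
                    inc *= msgs[(k, i)]
            m = edge_pot(i, j).T @ inc
            msgs[(i, j)] = m / m.sum()
    beliefs = []
    for i in nbrs:
        b = psi[i].copy()
        for k in nbrs[i]:
            b *= msgs[(k, i)]
        beliefs.append(b / b.sum())
    return beliefs

for outer in range(20):
    beliefs = loopy_bp(psi)                    # inner phase: loopy BP
    for i, q_hat in targets.items():           # scaling update at observed nodes
        psi[i] = psi[i] * q_hat / beliefs[i]
print("final beliefs:", loopy_bp(psi))
```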

49 Theorem 3 shows that IS+BP is just a particular scheduling of loopy IS, hence it inherits the accuracy of loopy IS while converging more often. [sent-98, score-1.226]

50 However because we have to run loopy BP until convergence for each scaling update, IS+BP is not particularly efficient. [sent-99, score-0.762]

51 Another way to promote convergence is to damp the loopy IS updates. [sent-100, score-0.599]

52 same fixed point equations (9,10) which is guaranteed to converge without damping. [sent-105, score-0.15]

53 In subsection 4.1 we describe UPS-T, an algorithm which applies when the graph is a tree and the Obs constraints are on the leaves of the tree. [sent-107, score-0.168]

54 4.1 Constraining the leaves of trees. Suppose that the graph is a tree, and all observed nodes are leaves of the tree. [sent-111, score-0.214]

55 Since the graph is a tree, the Bethe free energy is exact. [sent-112, score-0.166]

56 Therefore if the fixed point equations (9,10) converge, they will converge to the unique global minimum. [sent-116, score-0.15]

57 Further, since (9) is exactly a propagation update, and (10) is exactly a scaling update, the following scheduling of (9,10) will always converge: alternately run (9) until convergence and perform a single (10) update. [sent-117, score-0.416]

58 The schedule essentially implements the IS+BP procedure, except that loopy BP is exact for a tree. [sent-118, score-0.601]

59 Our algorithm essentially implements the scheduling, except that unnecessary propagation updates are not performed. [sent-119, score-0.271]

60 Perform the scaling update (10) for the observed leaf node and its unique neighbour. [sent-127, score-0.178]

61 For each edge on the path from one observed node to the next, apply the propagation update (9). [sent-129, score-0.178]

62 For graphs with cycles [the Bethe free energy is no longer exact]; however, we can make use of the fact that it is exact on trees to find a local minimum (or saddle point). [sent-137, score-0.131]

63 The idea is that we clamp a number of hidden nodes to their current marginals such that the rest of the hidden nodes become singly connected, and apply UPS-T. [sent-138, score-0.588]

64 Once UPS-T has converged, we clamp a different set of hidden nodes and apply UPS-T again. [sent-139, score-0.174]

65 The algorithm can be understood as coordinate descent, where we minimize the Bethe free energy with respect to the unclamped nodes at each iteration. [sent-140, score-0.234]

66 Let there be a set of clamped nodes such that every loop in the graph contains a node from this set. [sent-141, score-0.228]

67 For each clamped node, replicate it as many times as it has neighbours, and connect each replica to one neighbour of the original node and no other nodes. [sent-143, score-0.102]
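
A small sketch of this replication construction (our own data structures; node and edge labels are illustrative): splitting each clamped node into one replica per incident edge breaks every loop that passes through the clamped set.

```python
# Node-replication sketch: clamp a set of nodes and split each clamped node
# into one replica per incident edge, leaving a singly connected graph.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]          # a 4-cycle
clamp = {0}

def replicate(edges, clamp):
    new_edges, counter = [], {}
    for i, j in edges:
        a, b = i, j
        if i in clamp:
            counter[i] = counter.get(i, 0) + 1
            a = (i, counter[i])                    # replica (node id, replica index)
        if j in clamp:
            counter[j] = counter.get(j, 0) + 1
            b = (j, counter[j])
        new_edges.append((a, b))
    return new_edges

print(replicate(edges, clamp))
# [((0, 1), 1), (1, 2), (2, 3), (3, (0, 2))]  -- the 4-cycle becomes a chain
```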

68 [Equation (14) and its surrounding definitions, referring to the original nodes of the graph, were lost in extraction.] [sent-149, score-0.117]

69 [Definitions lost in extraction; they involve the number of neighbours of a node.] We can show the following. [sent-153, score-0.117]

70 Theorem 4 [statement lost in extraction]. To minimize the Bethe free energy, all we now have to do is minimize each term individually. [sent-156, score-0.102]

71 By clamping the marginals of the nodes in the clamped set, we have reduced the problem to one solved by UPS-T, where the observed nodes are taken to include the clamped nodes. [sent-158, score-0.522]

72 Find a set of nodes such that every loop is broken by the set; then, using UPS-T, update the beliefs so that the MN and Obs constraints are satisfied. [sent-164, score-0.745]

73 Now, by using the fact that both scaling and propagation updates are fixed point equations for finding stationary points of the Bethe free energy, it is clear that the algorithm will converge with the MN and Obs constraints satisfied. [sent-170, score-0.611]

74 In the first experiment we compared the speed of convergence against other methods which minimize the Bethe free energy. [sent-173, score-0.147]

75 In the second experiment we compared the accuracy of UPS against loopy IS. [sent-174, score-0.62]

76 The desired marginals are set using values sampled from a Gaussian with mean 0 and a standard deviation [value lost in extraction]. [sent-178, score-0.232]

77 We find that the result is not sensitive to these settings, so long as the algorithms are able to converge without damping. [sent-183, score-0.105]

78 IS+BP and GIS+BP are slow because the loopy BP phase is expensive. [sent-185, score-0.564]

79 However the next experiment shows that it also converges less frequently and there is a trade off between the speed of loopy IS and the stability of UPS. [sent-189, score-0.625]

80 [Figure 1 residue: panels (a)-(e); y-axis: Number of Updates (ticks 10^3, 10^4, 10^5); legend: GIS+BP, IS+BP, UPS-H, UPS-HV, loopy IS.] Figure 1: (a) Network structure. [sent-190, score-0.564]

81 Circles are hidden nodes and black squares are observationally constrained nodes. [sent-191, score-0.196]

82 (d) Replicating each clamped and observed node in (c). [sent-195, score-0.111]

83 An algorithm or subroutine is considered converged if the beliefs change by less than a small threshold [value lost in extraction]. Experiment 2: Accuracy of Estimated Marginals. We compared the accuracy of the posterior marginals obtained using UPS-HV and loopy IS for four possible types of constraints, as shown in figure 2. [sent-199, score-0.953]

84 In case (a), the constraint marginals are delta functions, so that generalized inference reduces to ordinary inference, loopy IS reduces to loopy BP, and UPS becomes a convergent alternative to loopy BP. [sent-200, score-2.427]

85 In case (b), we did not enforce any Obs constraints, so that the problem is one of estimating the marginals of the prior distribution. [sent-201, score-0.268]

86 The general trend is that loopy BP and UPS are comparable, and they perform worse as weights get larger, biases get smaller or there is less evidence. [sent-202, score-0.658]

87 Further, we see that when loopy BP did not converge, UPS’s estimates are not better than loopy BP’s estimates. [sent-204, score-1.128]

88 In these cases UPS and loopy IS did equally well when the latter converged, but UPS continued to perform well even when loopy IS did not converge. [sent-209, score-1.128]

89 Since loopy BP always converged when UPS performed well (for cases (a) and (b)), and we used very high damping, we conclude that loopy IS’s failure to converge must be due to performing scaling updates before accurate marginals were available. [sent-210, score-1.676]

90 Concluding, we see that UPS is comparable to loopy IS when generalized inference reduces to ordinary inference, but in the presence of Obs constraints it is better. [sent-211, score-1.013]

91 6 Discussion In this paper we have shown that approximating the KL divergence with the Bethe free energy leads to viable algorithms for approximate generalized inference. [sent-212, score-0.561]

92 We also find that there is an interesting and fruitful relationship between IS and loopy BP. [sent-213, score-0.564]

93 Our novel algorithm UPS can also be used as a convergent alternative to loopy BP for ordinary inference. [sent-214, score-0.74]

94 Interesting extensions are to cluster nodes together to get more accurate approximations to the KL divergence analogous to the Kikuchi free energy, and to handle marginal constraints over subsets of nodes. [sent-215, score-0.417]

95 Another interesting direction for future work is algorithms for learning in log linear models by approximating the free energy. [sent-218, score-0.136]

96 The top plots show errors for loopy IS and bottom plots show errors for UPS. [sent-228, score-0.564]

97 The inset shows the cases (black) when loopy IS did not converge within 2000 iterations, with linear damping slowly [rest of sentence lost in extraction]. [sent-229, score-0.693]

98 Loopy belief propagation for approximate inference: An empirical study. [sent-253, score-0.394]

99 Belief optimization for binary networks: A stable alternative to loopy belief propagation. [sent-260, score-0.657]

100 CCCP algorithms to minimize the Bethe and Kikuchi free energies: Convergent alternatives to belief propagation. [sent-265, score-0.231]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('loopy', 0.564), ('bp', 0.382), ('ups', 0.269), ('obs', 0.213), ('marginals', 0.204), ('bethe', 0.184), ('propagation', 0.154), ('generalized', 0.145), ('inference', 0.124), ('scaling', 0.123), ('divergence', 0.123), ('nodes', 0.117), ('gis', 0.107), ('updates', 0.093), ('mn', 0.091), ('kl', 0.09), ('free', 0.086), ('clamping', 0.084), ('energy', 0.08), ('converge', 0.079), ('fq', 0.077), ('ordinary', 0.076), ('lagrange', 0.072), ('node', 0.071), ('belief', 0.068), ('singly', 0.067), ('constraints', 0.064), ('scheduling', 0.064), ('uni', 0.06), ('xed', 0.055), ('convergent', 0.051), ('minimize', 0.051), ('damping', 0.05), ('tp', 0.05), ('messages', 0.049), ('converged', 0.049), ('equations', 0.049), ('multipliers', 0.048), ('approximate', 0.048), ('ddd', 0.046), ('neighbours', 0.046), ('welling', 0.043), ('understood', 0.042), ('theorem', 0.041), ('iterative', 0.041), ('posterior', 0.041), ('uf', 0.04), ('clamped', 0.04), ('reduces', 0.04), ('biases', 0.04), ('run', 0.04), ('trees', 0.039), ('speed', 0.039), ('exact', 0.037), ('beliefs', 0.037), ('convergence', 0.035), ('accuracy', 0.034), ('observational', 0.034), ('minimum', 0.033), ('clamp', 0.031), ('bouncing', 0.031), ('neighbour', 0.031), ('cccp', 0.031), ('delta', 0.03), ('fail', 0.03), ('required', 0.03), ('leaves', 0.029), ('constrained', 0.029), ('marginalization', 0.029), ('viable', 0.029), ('tree', 0.028), ('sampled', 0.028), ('get', 0.027), ('stationary', 0.027), ('confusion', 0.027), ('vg', 0.027), ('ef', 0.027), ('ed', 0.026), ('algorithms', 0.026), ('serious', 0.026), ('gures', 0.026), ('subsection', 0.026), ('met', 0.026), ('hidden', 0.026), ('gure', 0.025), ('alternative', 0.025), ('rows', 0.025), ('kikuchi', 0.024), ('update', 0.024), ('algorithm', 0.024), ('approximating', 0.024), ('satis', 0.024), ('black', 0.024), ('describe', 0.023), ('boltzmann', 0.023), ('ranges', 0.023), ('college', 0.023), ('experiment', 0.022), ('point', 0.022), ('saddle', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 188 nips-2001-The Unified Propagation and Scaling Algorithm

Author: Yee W. Teh, Max Welling

Abstract: In this paper we will show that a restricted class of constrained minimum divergence problems, named generalized inference problems, can be solved by approximating the KL divergence with a Bethe free energy. The algorithm we derive is closely related to both loopy belief propagation and iterative scaling. This unified propagation and scaling algorithm reduces to a convergent alternative to loopy belief propagation when no constraints are present. Experiments show the viability of our algorithm.

2 0.25858638 132 nips-2001-Novel iteration schemes for the Cluster Variation Method

Author: Hilbert J. Kappen, Wim Wiegerinck

Abstract: The Cluster Variation method is a class of approximation methods containing the Bethe and Kikuchi approximations as special cases. We derive two novel iteration schemes for the Cluster Variation Method. One is a fixed point iteration scheme which gives a significant improvement over loopy BP, mean field and TAP methods on directed graphical models. The other is a gradient based method, that is guaranteed to converge and is shown to give useful results on random graphs with mild frustration. We conclude that the methods are of significant practical value for large inference problems. 1

3 0.23733485 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs

Author: Martin J. Wainwright, Tommi Jaakkola, Alan S. Willsky

Abstract: We develop a tree-based reparameterization framework that provides a new conceptual view of a large class of iterative algorithms for computing approximate marginals in graphs with cycles. It includes belief propagation (BP), which can be reformulated as a very local form of reparameterization. More generally, we consider algorithms that perform exact computations over spanning trees of the full graph. On the practical side, we find that such tree reparameterization (TRP) algorithms have convergence properties superior to BP. The reparameterization perspective also provides a number of theoretical insights into approximate inference, including a new characterization of fixed points; and an invariance intrinsic to TRP /BP. These two properties enable us to analyze and bound the error between the TRP /BP approximations and the actual marginals. While our results arise naturally from the TRP perspective, most of them apply in an algorithm-independent manner to any local minimum of the Bethe free energy. Our results also have natural extensions to more structured approximations [e.g. , 1, 2]. 1

4 0.18162327 196 nips-2001-Very loopy belief propagation for unwrapping phase images

Author: Brendan J. Frey, Ralf Koetter, Nemanja Petrovic

Abstract: Since the discovery that the best error-correcting decoding algorithm can be viewed as belief propagation in a cycle-bound graph, researchers have been trying to determine under what circumstances

5 0.14952168 98 nips-2001-Information Geometrical Framework for Analyzing Belief Propagation Decoder

Author: Shiro Ikeda, Toshiyuki Tanaka, Shun-ichi Amari

Abstract: The mystery of belief propagation (BP) decoder, especially of the turbo decoding, is studied from information geometrical viewpoint. The loopy belief network (BN) of turbo codes makes it difficult to obtain the true “belief” by BP, and the characteristics of the algorithm and its equilibrium are not clearly understood. Our study gives an intuitive understanding of the mechanism, and a new framework for the analysis. Based on the framework, we reveal basic properties of the turbo decoding.

6 0.13880467 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity

7 0.091821015 180 nips-2001-The Concave-Convex Procedure (CCCP)

8 0.084876053 115 nips-2001-Linear-time inference in Hierarchical HMMs

9 0.084363803 97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes

10 0.076660827 190 nips-2001-Thin Junction Trees

11 0.071370021 117 nips-2001-MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation

12 0.066193834 21 nips-2001-A Variational Approach to Learning Curves

13 0.066144377 118 nips-2001-Matching Free Trees with Replicator Equations

14 0.060561124 129 nips-2001-Multiplicative Updates for Classification by Mixture Models

15 0.059206557 3 nips-2001-ACh, Uncertainty, and Cortical Inference

16 0.052153181 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks

17 0.051461857 84 nips-2001-Global Coordination of Local Linear Models

18 0.050656047 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family

19 0.050508171 17 nips-2001-A Quantitative Model of Counterfactual Reasoning

20 0.046280932 169 nips-2001-Small-World Phenomena and the Dynamics of Information


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.16), (1, 0.005), (2, 0.045), (3, -0.085), (4, -0.026), (5, -0.334), (6, 0.024), (7, -0.248), (8, 0.12), (9, 0.162), (10, -0.061), (11, -0.034), (12, 0.05), (13, 0.051), (14, -0.014), (15, 0.057), (16, -0.157), (17, 0.11), (18, 0.001), (19, -0.051), (20, 0.099), (21, -0.179), (22, -0.178), (23, -0.048), (24, -0.04), (25, -0.002), (26, -0.072), (27, 0.029), (28, 0.028), (29, -0.104), (30, -0.043), (31, -0.031), (32, -0.051), (33, 0.052), (34, 0.028), (35, -0.108), (36, -0.048), (37, 0.031), (38, -0.123), (39, -0.106), (40, 0.073), (41, -0.134), (42, 0.048), (43, 0.075), (44, 0.089), (45, 0.084), (46, -0.019), (47, -0.005), (48, 0.027), (49, 0.094)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97862458 188 nips-2001-The Unified Propagation and Scaling Algorithm

Author: Yee W. Teh, Max Welling

Abstract: In this paper we will show that a restricted class of constrained minimum divergence problems, named generalized inference problems, can be solved by approximating the KL divergence with a Bethe free energy. The algorithm we derive is closely related to both loopy belief propagation and iterative scaling. This unified propagation and scaling algorithm reduces to a convergent alternative to loopy belief propagation when no constraints are present. Experiments show the viability of our algorithm.

2 0.89833641 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs

Author: Martin J. Wainwright, Tommi Jaakkola, Alan S. Willsky

Abstract: We develop a tree-based reparameterization framework that provides a new conceptual view of a large class of iterative algorithms for computing approximate marginals in graphs with cycles. It includes belief propagation (BP), which can be reformulated as a very local form of reparameterization. More generally, we consider algorithms that perform exact computations over spanning trees of the full graph. On the practical side, we find that such tree reparameterization (TRP) algorithms have convergence properties superior to BP. The reparameterization perspective also provides a number of theoretical insights into approximate inference, including a new characterization of fixed points; and an invariance intrinsic to TRP /BP. These two properties enable us to analyze and bound the error between the TRP /BP approximations and the actual marginals. While our results arise naturally from the TRP perspective, most of them apply in an algorithm-independent manner to any local minimum of the Bethe free energy. Our results also have natural extensions to more structured approximations [e.g. , 1, 2]. 1

3 0.77350718 196 nips-2001-Very loopy belief propagation for unwrapping phase images

Author: Brendan J. Frey, Ralf Koetter, Nemanja Petrovic

Abstract: Since the discovery that the best error-correcting decoding algorithm can be viewed as belief propagation in a cycle-bound graph, researchers have been trying to determine under what circumstances

4 0.7263037 132 nips-2001-Novel iteration schemes for the Cluster Variation Method

Author: Hilbert J. Kappen, Wim Wiegerinck

Abstract: The Cluster Variation method is a class of approximation methods containing the Bethe and Kikuchi approximations as special cases. We derive two novel iteration schemes for the Cluster Variation Method. One is a fixed point iteration scheme which gives a significant improvement over loopy BP, mean field and TAP methods on directed graphical models. The other is a gradient based method, that is guaranteed to converge and is shown to give useful results on random graphs with mild frustration. We conclude that the methods are of significant practical value for large inference problems. 1

5 0.49508065 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity

Author: Lehel Csató, Manfred Opper, Ole Winther

Abstract: The adaptive TAP Gibbs free energy for a general densely connected probabilistic model with quadratic interactions and arbritary single site constraints is derived. We show how a specific sequential minimization of the free energy leads to a generalization of Minka’s expectation propagation. Lastly, we derive a sparse representation version of the sequential algorithm. The usefulness of the approach is demonstrated on classification and density estimation with Gaussian processes and on an independent component analysis problem.

6 0.39614049 180 nips-2001-The Concave-Convex Procedure (CCCP)

7 0.32837132 98 nips-2001-Information Geometrical Framework for Analyzing Belief Propagation Decoder

8 0.31163764 97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes

9 0.29952508 3 nips-2001-ACh, Uncertainty, and Cortical Inference

10 0.26323014 115 nips-2001-Linear-time inference in Hierarchical HMMs

11 0.24618518 190 nips-2001-Thin Junction Trees

12 0.23678054 118 nips-2001-Matching Free Trees with Replicator Equations

13 0.23602045 17 nips-2001-A Quantitative Model of Counterfactual Reasoning

14 0.21328761 117 nips-2001-MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation

15 0.21133459 169 nips-2001-Small-World Phenomena and the Dynamics of Information

16 0.21013764 21 nips-2001-A Variational Approach to Learning Curves

17 0.19783062 129 nips-2001-Multiplicative Updates for Classification by Mixture Models

18 0.18869419 119 nips-2001-Means, Correlations and Bounds

19 0.18567909 84 nips-2001-Global Coordination of Local Linear Models

20 0.18133365 149 nips-2001-Probabilistic Abstraction Hierarchies


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.037), (17, 0.028), (19, 0.022), (27, 0.155), (30, 0.057), (38, 0.022), (43, 0.227), (59, 0.054), (68, 0.014), (72, 0.09), (79, 0.063), (83, 0.031), (91, 0.111)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.93906832 158 nips-2001-Receptive field structure of flow detectors for heading perception

Author: J. A. Beintema, M. Lappe, Alexander C. Berg

Abstract: Observer translation relative to the world creates image flow that expands from the observer's direction of translation (heading) from which the observer can recover heading direction. Yet, the image flow is often more complex, depending on rotation of the eye, scene layout and translation velocity. A number of models [1-4] have been proposed on how the human visual system extracts heading from flow in a neurophysiologically plausible way. These models represent heading by a set of neurons that respond to large image flow patterns and receive input from motion sensed at different image locations. We analysed these models to determine the exact receptive field of these heading detectors. We find most models predict that, contrary to widespread belief, the contributing motion sensors have a preferred motion directed circularly rather than radially around the detector's preferred heading. Moreover, the results suggest to look for more refined structure within the circular flow, such as bi-circularity or local motion-opponency.

same-paper 2 0.88267601 188 nips-2001-The Unified Propagation and Scaling Algorithm

Author: Yee W. Teh, Max Welling

Abstract: In this paper we will show that a restricted class of constrained minimum divergence problems, named generalized inference problems, can be solved by approximating the KL divergence with a Bethe free energy. The algorithm we derive is closely related to both loopy belief propagation and iterative scaling. This unified propagation and scaling algorithm reduces to a convergent alternative to loopy belief propagation when no constraints are present. Experiments show the viability of our algorithm.

3 0.86445546 79 nips-2001-Gaussian Process Regression with Mismatched Models

Author: Peter Sollich

Abstract: Learning curves for Gaussian process regression are well understood when the 'student' model happens to match the 'teacher' (true data generation process). I derive approximations to the learning curves for the more generic case of mismatched models, and find very rich behaviour: For large input space dimensionality, where the results become exact, there are universal (student-independent) plateaux in the learning curve, with transitions in between that can exhibit arbitrarily many over-fitting maxima; over-fitting can occur even if the student estimates the teacher noise level correctly. In lower dimensions, plateaux also appear, and the learning curve remains dependent on the mismatch between student and teacher even in the asymptotic limit of a large number of training examples. Learning with excessively strong smoothness assumptions can be particularly dangerous: For example, a student with a standard radial basis function covariance function will learn a rougher teacher function only logarithmically slowly. All predictions are confirmed by simulations. 1

4 0.70329469 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs

Author: Martin J. Wainwright, Tommi Jaakkola, Alan S. Willsky

Abstract: We develop a tree-based reparameterization framework that provides a new conceptual view of a large class of iterative algorithms for computing approximate marginals in graphs with cycles. It includes belief propagation (BP), which can be reformulated as a very local form of reparameterization. More generally, we consider algorithms that perform exact computations over spanning trees of the full graph. On the practical side, we find that such tree reparameterization (TRP) algorithms have convergence properties superior to BP. The reparameterization perspective also provides a number of theoretical insights into approximate inference, including a new characterization of fixed points; and an invariance intrinsic to TRP /BP. These two properties enable us to analyze and bound the error between the TRP /BP approximations and the actual marginals. While our results arise naturally from the TRP perspective, most of them apply in an algorithm-independent manner to any local minimum of the Bethe free energy. Our results also have natural extensions to more structured approximations [e.g. , 1, 2]. 1

5 0.70167214 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior

Author: Mário Figueiredo

Abstract: In this paper we introduce a new sparseness inducing prior which does not involve any (hyper)parameters that need to be adjusted or estimated. Although other applications are possible, we focus here on supervised learning problems: regression and classification. Experiments with several publicly available benchmark data sets show that the proposed approach yields state-of-the-art performance. In particular, our method outperforms support vector machines and performs competitively with the best alternative techniques, both in terms of error rates and sparseness, although it involves no tuning or adjusting of sparsenesscontrolling hyper-parameters.

6 0.69926405 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family

7 0.69869465 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding

8 0.69562191 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources

9 0.69498622 13 nips-2001-A Natural Policy Gradient

10 0.69362682 8 nips-2001-A General Greedy Approximation Algorithm with Applications

11 0.69184673 95 nips-2001-Infinite Mixtures of Gaussian Process Experts

12 0.69165897 157 nips-2001-Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bias in Reinforcement Learning

13 0.68746984 132 nips-2001-Novel iteration schemes for the Cluster Variation Method

14 0.68744898 114 nips-2001-Learning from Infinite Data in Finite Time

15 0.68735343 155 nips-2001-Quantizing Density Estimators

16 0.6872865 190 nips-2001-Thin Junction Trees

17 0.68707687 84 nips-2001-Global Coordination of Local Linear Models

18 0.68616974 92 nips-2001-Incorporating Invariances in Non-Linear Support Vector Machines

19 0.6861245 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes

20 0.68547392 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines