nips nips2005 nips2005-204 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dmitry Malioutov, Alan S. Willsky, Jason K. Johnson
Abstract: This paper presents a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose correlations between variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlations. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. This perspective leads to a better understanding of Gaussian belief propagation and of its convergence in loopy graphs. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper presents a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. [sent-5, score-0.451]
2 The key idea is to decompose correlations between variables as a sum over all walks between those variables in the graph. [sent-6, score-0.382]
3 The weight of each walk is given by a product of edgewise partial correlations. [sent-7, score-0.141]
4 We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. [sent-8, score-0.492]
5 This perspective leads to a better understanding of Gaussian belief propagation and of its convergence in loopy graphs. [sent-9, score-0.245]
6 The nodes of the graph denote random variables and the edges indicate statistical dependencies between variables. [sent-11, score-0.159]
7 This information matrix is sparse, reflecting the structure of the defining graph such that only the diagonal elements and those off-diagonal elements corresponding to edges of the graph are non-zero. [sent-15, score-0.214]
8 , in chains, trees and in graphs with “thin” junction trees. [sent-20, score-0.139]
9 For these models, belief propagation (BP) or its junction tree variants efficiently compute the marginals [1]. [sent-21, score-0.376]
10 Then, approximate methods such as loopy belief propagation (LBP) provide a tractable alternative to exact inference [1, 2, 3, 4]. [sent-23, score-0.23]
11 We develop a “walk-sum” formulation for computation of means, variances and correlations that holds in a wide class of Gauss-Markov models which we call walk-summable. [sent-24, score-0.272]
12 In particular, this leads to a new interpretation of BP in trees and of LBP in general. [sent-25, score-0.118]
13 We also explain why, in walk-summable models, LBP converges to the correct means but not to the correct variances (proving “walk-sum” analogs of results in [3]). [sent-30, score-0.249]
14 Hence, we also provide a tighter (essentially necessary) condition for convergence of LBP variances based on walk-summability of the LBP computation tree. [sent-32, score-0.282]
15 This provides deeper insight into why LBP can fail to converge – because the LBP computation tree is not always well-posed – which suggests connections to [5]. [sent-33, score-0.26]
16 In (b) and (c) we illustrate the first three levels of the LBP computation tree rooted at nodes 1 and 2. [sent-55, score-0.388]
17 After 3 iterations of LBP in (a), the marginals at nodes 1 and 2 are identical to the marginals at the root of (b) and (c) respectively. [sent-56, score-0.213]
18 In trees, the marginal of any given node can be efficiently computed by sequentially eliminating leaves of the tree until just that node remains. [sent-57, score-0.59]
19 BP may be seen as a message-passing form of Gaussian elimination (GE) in which a message passed from node i to node j ∈ N (i) captures the effect of eliminating the subtree rooted at i. [sent-58, score-0.57]
20 The mean and variance are given by the corresponding marginal formulas. In trees, this is the marginal at node i conditioned on zero boundary conditions at nodes (n + 1) steps away, and LBP converges to the correct marginals after a finite number of steps equal to the diameter of the tree. [sent-65, score-0.371]
21 A useful fact about LBP is the following [2, 3, 5]: the marginal computed at node i after n iterations is identical to the marginal at the root of the n-step computation tree rooted at node i. [sent-67, score-0.823]
22 This tree is obtained by “unwinding” the loopy graph for n steps (see Fig. [sent-68, score-0.3]
23 Note that each node of the graph may be replicated many times in the computation tree. [sent-70, score-0.323]
24 Also, neighbors of a node in the computation tree correspond exactly with neighbors of the associated node in the original graph (except at the last level of the tree where some neighbors are missing). [sent-71, score-0.895]
25 The corresponding J matrix defined on the computation tree has the same node and edge values as in the original GMM. [sent-72, score-0.464]
26 Consider a 5-node cycle with normalized information matrix J, which has all partial correlations on the edges set to ρ. [sent-87, score-0.144]
27 A walk w = (w0 , . . . , wl ) is a sequence of nodes wi ∈ V connected by edges {wi , wi+1 } ∈ E, where l is the length of the walk. [sent-104, score-0.245]
28 The weight ρ(w) of walk w is the product of edge weights along the walk: ρ(w) = ∏_{s=1}^{l} ρ_{w_{s−1}, w_s} (11). At each node i ∈ V , we also define a zero-length walk w = (i) for which ρ(w) = 1. [sent-105, score-0.141]
29 Given a set of walks W, we define the walk-sum over W by ρ(W) = Σ_{w∈W} ρ(w) (12), which is well-defined (i. [sent-107, score-0.347]
30 Let W^l_{i→j} denote the set of l-length walks from i to j and let W_{i→j} = ∪_{l=0}^{∞} W^l_{i→j}. [sent-110, score-0.347]
31 The relation between walks and the geometric series (10) is that the entries of R^l correspond to walk-sums over l-length walks from i to j in the graph, i. [sent-111, score-0.747]
32 Hence, Σ_{l=0}^{∞} (R^l)_{i,j} = P_{i,j} = Σ_l ρ(W^l_{i→j}) = ρ(∪_l W^l_{i→j}) = ρ(W_{i→j}) (13). In particular, the variance σ²_i ≡ P_{i,i} of variable i is the walk-sum taken over the set W_{i→i} of self-return walks that begin and end at i (defined so that (i) ∈ W_{i→i}). [sent-114, score-0.365]
33 Means are reweighted walk-sums, where each walk is scaled by the potential at the start of the walk: ρ(w; h) = h_{w_0} ρ(w), and ρ(W; h) = Σ_{w∈W} ρ(w; h). [sent-117, score-0.134]
34 Then, µ_i = Σ_{j∈V} P_{i,j} h_j = Σ_{j∈V} ρ(W_{j→i}) h_j = ρ(W_{*→i}; h) (14), where W_{*→i} ≡ ∪_{j∈V} W_{j→i} is the set of all walks which end at node i. [sent-118, score-0.548]
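Equations (13) and (14) can be checked numerically on a small model. The 3-node chain, edge weight, and potential vector below are our own illustrative choices, not from the paper:

```python
import numpy as np

# Illustrative 3-node chain with partial correlation r on both edges;
# J = I - R is the normalized information matrix.
r = 0.4
R = np.array([[0.0, r, 0.0],
              [r, 0.0, r],
              [0.0, r, 0.0]])
J = np.eye(3) - R
h = np.array([1.0, -2.0, 0.5])

# Partial sums of the geometric series sum_l R^l: entry (i, j) of R^l is the
# walk-sum over length-l walks between i and j, so the series gives P = J^{-1}.
P_series = sum(np.linalg.matrix_power(R, l) for l in range(200))
P_exact = np.linalg.inv(J)
assert np.allclose(P_series, P_exact)

# Variances are self-return walk-sums (the diagonal of P); means are the
# reweighted walk-sums of equation (14), i.e. mu = P h.
variances = np.diag(P_exact)
mu = P_series @ h
assert np.allclose(mu, np.linalg.solve(J, h))
```

Here the chain is walk-summable (trees always are), so truncating the series at 200 terms already agrees with the exact inverse to numerical precision.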
35 We have found that a wide class of GMMs is walk-summable: Proposition 1 (Walk-Summable GMMs) All of the following classes of GMMs are walk-summable: (i) attractive models, (ii) trees, and (iii) pairwise-normalizable models. [sent-119, score-0.122]
36 In (i), R has non-negative elements, so ρ̄(R) = ρ(R) < 1. In (ii) & (iii), negating any ρ_ij, it still holds that J = I − R ≻ 0: (ii) negating ρ_ij does not affect the eigenvalues of J (remove edge {i, j} and, in each eigenvector, negate all entries in one subtree); (iii) negating ρ_ij preserves J_{i,j} ⪰ 0 in (4), so J ≻ 0. [sent-125, score-0.15]
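The walk-summability criterion ρ̄(R) = ρ(|R|) < 1 is easy to test numerically, and it separates validity from walk-summability. The helper functions and the 5-node cycle below (with one optionally negated edge, making the cycle "frustrated") are our own illustrative choices:

```python
import numpy as np

def is_walk_summable(J):
    """Check rho(|R|) < 1 for a normalized information matrix J = I - R."""
    R = np.eye(len(J)) - J
    return float(np.max(np.abs(np.linalg.eigvalsh(np.abs(R))))) < 1.0

def build_cycle(n, rho, frustrated=False):
    """n-node cycle with edge partial correlations +-rho (illustrative)."""
    R = np.zeros((n, n))
    for i in range(n):
        j = (i + 1) % n
        R[i, j] = R[j, i] = -rho if (frustrated and i == 0) else rho
    return np.eye(n) - R

J = build_cycle(5, 0.55, frustrated=True)
# The model is valid (J is positive definite) ...
assert np.all(np.linalg.eigvalsh(J) > 0)
# ... but not walk-summable: rho_bar(R) = 2 * 0.55 = 1.1 > 1.
assert not is_walk_summable(J)
# An attractive 5-node cycle with rho = 0.45 is walk-summable (rho_bar = 0.9).
assert is_walk_summable(build_cycle(5, 0.45))
```

The frustrated cycle is a valid-but-not-walk-summable model of the kind discussed later in the LBP convergence analysis.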
37 4 Recursive Walk-Sum Calculations on Trees In this section we derive a recursive algorithm which accrues the walk-sums (over infinite sets of walks) necessary for exact inference on trees and relate this to BP. [sent-129, score-0.143]
38 Walk-summability guarantees correctness of this algorithm which reorders walks in a non-trivial way. [sent-130, score-0.374]
39 We start with a chain of N nodes: its graph G has nodes V = {1, . [sent-131, score-0.152]
40 The set W_{i→i} can be partitioned according to the number of times that walks return to node i: W_{i→i} = ∪_{r=0}^{∞} W^{(r)}_{i→i}, where W^{(r)}_{i→i} is the set of all self-return walks which return to i exactly r times. [sent-138, score-0.894]
41 A walk which starts at node i and returns r times is a concatenation of r single-revisit self-return walks, so ρ(W^{(r)}_{i→i}) = ρ(W^{(1)}_{i→i})^r. [sent-140, score-0.295]
42 The single-revisit walks at node i consist of walks in the left subchain, and walks in the right subchain. [sent-143, score-1.224]
43 Let Wi→i\j be the set of self-return walks of i which never visit j, so e. [sent-144, score-0.347]
44 With this notation: ρ(W^{(1)}_{i→i}) = ρ(W^{(1)}_{i→i\i+1}) + ρ(W^{(1)}_{i→i\i−1}) (16). The left single-revisit self-return walk-sums ρ(W^{(1)}_{i→i\i+1}) can be computed recursively starting from node 1. [sent-150, score-0.183]
45 A single-revisit self-return walk from node i in the left subchain consists of a step to node i − 1, then some number of self-return walks in the subgraph {1, . [sent-152, score-0.92]
46 , i − 1}, and a step from i − 1 back to i: ρ(W^{(1)}_{i→i\i+1}) = ρ²_{i,i−1} ρ(W_{i−1→i−1\i}) = ρ²_{i,i−1} / (1 − ρ(W^{(1)}_{i−1→i−1\i})) (17). Thus single-revisit (and multiple-revisit) walk-sums in the left subchain of every node i can be calculated in one forward pass through the chain. [sent-155, score-0.257]
47 The same can be done for the right subchain walk-sums at every node i, by starting at node N , and going backwards. [sent-156, score-0.44]
48 Using equations (15) and (16) these quantities suffice to calculate the variances at all nodes of the chain. [sent-157, score-0.23]
49 A similar forwards-backwards procedure computes the means as reweighted walksums over the left and right single-visit walks for node i, which start at an arbitrary node (in the left or right subchain) and end at i, never visiting i before that [6]. [sent-158, score-0.854]
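The forward-backward variance recursion of equations (15)-(17) can be sketched directly; the chain size, random edge weights, and variable names (`alpha` for left, `beta` for right single-revisit walk-sums) are our own choices:

```python
import numpy as np

# Chain of N nodes with edge partial correlations rho[i] between nodes i, i+1
# (a minimal sketch of the forward-backward walk-sum recursion, eqs. (15)-(17)).
N = 6
rng = np.random.default_rng(0)
rho = rng.uniform(-0.4, 0.4, size=N - 1)

# Forward pass: left single-revisit self-return walk-sums alpha[i], eq. (17).
alpha = np.zeros(N)
for i in range(1, N):
    alpha[i] = rho[i - 1] ** 2 / (1.0 - alpha[i - 1])

# Backward pass: right single-revisit self-return walk-sums beta[i].
beta = np.zeros(N)
for i in range(N - 2, -1, -1):
    beta[i] = rho[i] ** 2 / (1.0 - beta[i + 1])

# Multiple-revisit geometric series (eq. (15)):
# variance = 1 / (1 - single-revisit walk-sum), with eq. (16) splitting
# the single-revisit sum into left and right subchain contributions.
var = 1.0 / (1.0 - alpha - beta)

# Compare with the exact marginal variances diag(J^{-1}).
J = np.eye(N)
for i in range(N - 1):
    J[i, i + 1] = J[i + 1, i] = -rho[i]
assert np.allclose(var, np.diag(np.linalg.inv(J)))
```

One forward and one backward pass thus recover all N variances in O(N) time, matching the direct matrix inverse.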
50 The same strategy applies for trees: both single-revisit and single-visit walks at node i can be partitioned according to which subtree (rooted at a neighbor j ∈ N (i) of i) the walk lives in. [sent-162, score-0.698]
51 This leads to a two-pass walks-sum calculation on trees (from the leaves to the root, and back) to calculate means and variances at all nodes. [sent-163, score-0.319]
52 5 Walk-sum Analysis of Loopy Belief Propagation First, we analyze LBP in the case that the model is walk-summable and show that LBP converges and includes all the walks for the means, but only a subset of the walks for the variances. [sent-164, score-0.752]
53 Then, we consider the case of non-walksummable models and relate convergence of the LBP variances to walk-summability of the computation tree. [sent-165, score-0.277]
54 1 LBP in walk-summable models To compute means and variances in a walk-summable model, we need to calculate walksums for certain sets of walks in the graph G. [sent-167, score-0.684]
55 Running LBP in G is equivalent to exact inference in the computation tree for G, and hence calculating walk-sums for certain walks in the computation tree. [sent-168, score-0.678]
56 In the computation tree rooted at node i, walks ending at the root have a one-to-one correspondence with walks ending at node i in G. [sent-169, score-1.505]
57 Hence, LBP captures all of the walks necessary to calculate the means. [sent-170, score-0.367]
58 For variances, the walks captured by LBP have to start and end at the root in the computation tree. [sent-171, score-0.51]
59 However, some of the self-return walks in G translate to walks in the computation tree that end at the root but start at a replica of the root, rather than at the root itself. [sent-172, score-1.131]
60 These walks are not captured by the LBP variances. [sent-173, score-0.347]
61 1(a), the walk (1, 2, 3, 1) is a self-return walk in the original graph G but is not a self-return walk in the computation tree shown in Fig. [sent-175, score-0.636]
62 LBP variances capture only those self-return walks of the original graph G which also are self-return walks in the computation tree – e. [sent-177, score-1.153]
63 , the walk (1, 3, 2, 3, 4, 3, 1) is a self-return walk in both Figs. [sent-179, score-0.254]
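The distinction in these examples can be made mechanical: a self-return walk is backtracking exactly when repeatedly deleting "spurs" (a step out to a node and immediately back) reduces it to a single node. This reduction rule is our own restatement of the notion used here, not code from the paper:

```python
def is_backtracking(walk):
    """Return True if the self-return walk reduces to a single node by
    repeatedly removing spurs (..., a, b, a, ...) -> (..., a, ...)."""
    w = list(walk)
    reduced = True
    while reduced and len(w) > 1:
        reduced = False
        for k in range(1, len(w) - 1):
            if w[k - 1] == w[k + 1]:
                del w[k:k + 2]   # delete the spur step and its retrace
                reduced = True
                break
    return len(w) == 1

# (1, 2, 3, 1) goes once around the cycle without retracing: not backtracking.
assert not is_backtracking([1, 2, 3, 1])
# (1, 3, 2, 3, 4, 3, 1) retraces every step: backtracking.
assert is_backtracking([1, 3, 2, 3, 4, 3, 1])
```

Backtracking walks are precisely the self-return walks of G that survive as self-return walks in the computation tree.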
64 These simple observations lead to our main result: Proposition 2 (Convergence of LBP for walk-summable GMMs) If the model is walksummable, then LBP converges: the means converge to the true means and the LBP variances converge to walk-sums over just the backtracking self-return walks at each node. [sent-182, score-0.707]
65 All backtracking walks have positive weights, since each edge is traversed an even number of times. [sent-184, score-0.439]
66 For a walk-summable model, LBP variances are walk-sums over the backtracking walks and are therefore monotonically increasing with the iterations. [sent-185, score-0.594]
67 For the means: the series Σ_{l=0}^{∞} R^l h converges absolutely since |R^l h| ≤ R̄^l |h|, and the series Σ_l R̄^l |h| is a linear combination of terms of the absolutely convergent series Σ_l R̄^l. [sent-187, score-0.228]
68 The LBP means are a rearrangement of the absolutely convergent series Σ_{l=0}^{∞} R^l h, so they converge to the same values. [sent-188, score-0.173]
69 Also, in attractive models, the LBP variances are less than or equal to the true variances. [sent-190, score-0.189]
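A minimal Gaussian LBP sketch on a walk-summable attractive cycle illustrates Proposition 2: the means converge to the exact means, while the variances converge to underestimates. The message parameterization is the standard information-form update; the 5-node cycle, edge weight, and all concrete numbers are our own:

```python
import numpy as np

# Attractive 5-node cycle with partial correlation rho = 0.3 (walk-summable,
# since rho_bar = 2 * 0.3 = 0.6 < 1); illustrative model, not from the paper.
n, rho, iters = 5, 0.3, 200
J = np.eye(n)
edges = [(i, (i + 1) % n) for i in range(n)]
for i, j in edges:
    J[i, j] = J[j, i] = -rho
h = np.arange(1.0, n + 1.0)

nbrs = {i: [] for i in range(n)}
for i, j in edges:
    nbrs[i].append(j); nbrs[j].append(i)

# Messages i -> j carry a precision increment dJ and a potential increment dh.
dJ = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
dh = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
for _ in range(iters):
    new_dJ, new_dh = dict(dJ), dict(dh)
    for i in range(n):
        for j in nbrs[i]:
            Jc = J[i, i] + sum(dJ[(k, i)] for k in nbrs[i] if k != j)
            hc = h[i] + sum(dh[(k, i)] for k in nbrs[i] if k != j)
            new_dJ[(i, j)] = -J[i, j] ** 2 / Jc
            new_dh[(i, j)] = -J[i, j] * hc / Jc
    dJ, dh = new_dJ, new_dh

Jhat = np.array([J[i, i] + sum(dJ[(k, i)] for k in nbrs[i]) for i in range(n)])
mu = np.array([(h[i] + sum(dh[(k, i)] for k in nbrs[i])) / Jhat[i] for i in range(n)])
var_lbp, var_true = 1.0 / Jhat, np.diag(np.linalg.inv(J))

assert np.allclose(mu, np.linalg.solve(J, h))   # means are exact
assert np.all(var_lbp <= var_true + 1e-12)      # variances are underestimated
```

On this cycle the LBP variances settle at 1.25 while the true variances are slightly larger, consistent with the missing non-backtracking walk-sums, which are positive in an attractive model.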
70 They also show that LBP variances omit some terms needed for the correct variances. [sent-192, score-0.159]
71 These terms correspond to correlations between the root and its replicas in the computation tree. [sent-193, score-0.158]
72 In our framework, each such correlation is a walk-sum over the subset of non-backtracking self-return walks in G which, in the computation tree, begin at a particular replica of the root. [sent-194, score-0.43]
73 In Fig. 2(a) we show LBP variances for node 1 (the other nodes are similar) vs. [sent-207, score-0.393]
74 Figure 2 (panels (a) and (b); legend: LBP converges / LBP does not converge): (a) LBP variances vs.
75 2 LBP in non-walksummable models We extend our analysis to develop a tighter condition for convergence of LBP variances based on walk-summability of the computation tree (rather than walk-summability on G). [sent-228, score-0.459]
76 J ≻ 0 ⇔ ρ(R) < 1, hence our condition is equivalent to validity of the computation tree. [sent-231, score-0.119]
77 First, we note that when a model on G is valid (J is positive-definite) but not walksummable, then some finite computation trees may be invalid (indefinite). [sent-232, score-0.223]
78 This turns out to be the reason why LBP variances can fail to converge. [sent-233, score-0.159]
79 But if the GMM is not walk-summable, then its computation tree may or may not be walksummable. [sent-235, score-0.221]
80 395 the computation tree is still walk-summable (even though the model on G is not) and LBP converges. [sent-237, score-0.221]
81 4, the computation tree is not walk-summable and LBP does not converge. [sent-239, score-0.221]
82 Indeed, LBP is not even well-posed in this case (because the computation tree is indefinite) which explains its strange behavior seen in the bottom plot of Fig. [sent-240, score-0.221]
83 We characterize walk-summability of the computation tree as follows. [sent-244, score-0.221]
84 Let Tn be the n-step computation tree rooted at some node i and define Rn ≜ In − Jn, where Jn is the normalized information matrix on Tn and In is the identity matrix of the same dimension. [sent-245, score-0.547]
85 The n-step computation tree Tn is walk-summable (valid) if and only if ρ̄(Rn) < 1 (in trees, ρ̄(Rn) = ρ(Rn)). [sent-246, score-0.221]
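The condition ρ̄(Rn) < 1 can be checked by explicitly unwinding the loopy graph into its computation tree; the function below is our own sketch, not the paper's code. For a cycle the computation tree is a path, so its spectral radius approaches 2|ρ| with depth, giving models (e.g. a frustrated cycle with |ρ| > 1/2) that are valid but whose deep computation trees become invalid:

```python
import numpy as np

def computation_tree_R(R, root, depth):
    """Unwind the loopy graph with edge matrix R into its depth-step LBP
    computation tree rooted at `root`; return the tree edge matrix Rn."""
    nodes = [(root, None)]           # (original node, parent tree-index)
    edges, frontier = [], [0]
    for _ in range(depth):
        new_frontier = []
        for t in frontier:
            u, p = nodes[t]
            for v in np.flatnonzero(R[u]):
                if p is not None and v == nodes[p][0]:
                    continue         # children are all neighbors except parent
                nodes.append((v, t))
                edges.append((t, len(nodes) - 1, R[u, v]))
                new_frontier.append(len(nodes) - 1)
        frontier = new_frontier
    Rn = np.zeros((len(nodes), len(nodes)))
    for a, b, w in edges:
        Rn[a, b] = Rn[b, a] = w
    return Rn

# Frustrated 5-node cycle: valid (J positive definite) but not walk-summable.
rho = 0.55
R = np.zeros((5, 5))
for i in range(5):
    j = (i + 1) % 5
    R[i, j] = R[j, i] = rho
R[0, 1] = R[1, 0] = -rho
assert np.all(np.linalg.eigvalsh(np.eye(5) - R) > 0)

# The computation tree of a cycle is a path; its spectral radius tends to
# 2 * |rho| = 1.1 > 1 with depth, so deep computation trees become invalid.
Rn = computation_tree_R(R, root=0, depth=40)
assert float(np.max(np.abs(np.linalg.eigvalsh(np.abs(Rn))))) > 1.0
```

Shallow trees of the same model are still walk-summable, matching the statement that the tree "eventually" becomes invalid.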
86 Proposition 3 (LBP validity/variance convergence) (i) If limn→∞ ρ(Rn) < 1, then all computation trees are valid and the LBP variances converge. [sent-249, score-0.295]
87 (ii) If limn→∞ ρ(Rn) > 1, then the computation tree eventually becomes invalid and LBP is ill-posed. [sent-250, score-0.186]
88 (i) For some δ > 0, ρ(Rn) ≤ 1 − δ for all n, which implies: all computation trees are walk-summable and variances monotonically increase; λmax(Rn) ≤ 1 − δ, λmin(Jn) ≥ δ, and (Pn)i,i ≤ λmax(Pn) ≤ 1/δ. [sent-252, score-0.341]
89 The variances are monotonically increasing and bounded above by 1/δ, hence they converge. We can focus on one tree: if the computation tree rooted at node i is walk-summable, then so is the computation tree rooted at any node j. [sent-253, score-1.228]
90 Also, if a finite computation tree rooted at node i is not walk-summable, then some finite tree at node j also becomes non-walksummable for n large enough. [sent-254, score-0.863]
91 (ii) If limn→∞ ρ(Rn) > 1, then there exists an m for which ρ(Rn) > 1 for all n ≥ m and the computation tree is invalid. [sent-256, score-0.221]
92 Hence, it is easily detected if the LBP computation tree becomes invalid. [sent-258, score-0.221]
93 In non-walksummable models, the series LBP computes for the means is not absolutely convergent and may diverge even when variances converge (e. [sent-272, score-0.349]
94 However, in all cases where variances converge we have observed that with enough damping of BP messages we also obtain convergence of the means. [sent-276, score-0.238]
95 Apparently, it is the validity of the computation tree that is critical for convergence of Gaussian LBP. [sent-277, score-0.295]
96 In future work, we plan to develop extended walk-sum algorithms which gather more walks than LBP. [sent-279, score-0.347]
97 Another approach is to estimate variances by sampling random walks in the graph. [sent-280, score-0.506]
98 We also are interested to explore possible connections between results in [8] – based on selfavoiding walks in Ising models – and sufficient conditions for convergence of discrete LBP [9] which have some parallels to our walk-sum analysis in the Gaussian case. [sent-281, score-0.404]
99 Correctness of belief propagation in Gaussian graphical models of arbitrary topology. [sent-296, score-0.161]
100 An analysis of belief propagation on the turbo decoding graph with Gaussian densities. [sent-301, score-0.223]
wordName wordTfidf (topN-words)
[('lbp', 0.759), ('walks', 0.347), ('node', 0.183), ('wi', 0.165), ('tree', 0.16), ('variances', 0.159), ('rooted', 0.116), ('walk', 0.112), ('trees', 0.092), ('ji', 0.084), ('graph', 0.079), ('belief', 0.079), ('hi', 0.078), ('subchain', 0.074), ('rn', 0.069), ('propagation', 0.065), ('gmms', 0.064), ('rl', 0.063), ('root', 0.062), ('loopy', 0.061), ('computation', 0.061), ('backtracking', 0.059), ('converges', 0.058), ('gmm', 0.055), ('xv', 0.051), ('nodes', 0.051), ('marginals', 0.05), ('bp', 0.047), ('walksummable', 0.044), ('valid', 0.044), ('convergence', 0.04), ('converge', 0.039), ('negating', 0.039), ('je', 0.039), ('subtree', 0.039), ('messages', 0.036), ('limn', 0.036), ('jn', 0.035), ('convergent', 0.035), ('correlations', 0.035), ('series', 0.034), ('validity', 0.034), ('absolutely', 0.033), ('edge', 0.033), ('means', 0.032), ('tn', 0.031), ('attractive', 0.03), ('message', 0.03), ('malioutov', 0.03), ('pairwisenormalizable', 0.03), ('selfreturn', 0.03), ('walksums', 0.03), ('marginal', 0.029), ('partial', 0.029), ('edges', 0.029), ('monotonically', 0.029), ('correctness', 0.027), ('matrix', 0.027), ('recursive', 0.026), ('interpretation', 0.026), ('gaussian', 0.026), ('invalid', 0.026), ('inde', 0.026), ('fused', 0.026), ('nite', 0.026), ('inference', 0.025), ('graphs', 0.025), ('hence', 0.024), ('cycle', 0.024), ('proposition', 0.024), ('ii', 0.024), ('ending', 0.023), ('iii', 0.023), ('absolute', 0.023), ('ij', 0.023), ('neighbors', 0.023), ('tighter', 0.022), ('replica', 0.022), ('junction', 0.022), ('reweighted', 0.022), ('johnson', 0.022), ('ising', 0.022), ('calculations', 0.022), ('ce', 0.022), ('start', 0.022), ('subgraph', 0.021), ('calculate', 0.02), ('eliminating', 0.019), ('ge', 0.019), ('geometric', 0.019), ('end', 0.018), ('elimination', 0.018), ('strict', 0.017), ('models', 0.017), ('partitioned', 0.017), ('computes', 0.017), ('implies', 0.016), ('leaves', 0.016), ('var', 0.016), ('pn', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 204 nips-2005-Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation
Author: Dmitry Malioutov, Alan S. Willsky, Jason K. Johnson
Abstract: This paper presents a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose correlations between variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlations. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. This perspective leads to a better understanding of Gaussian belief propagation and of its convergence in loopy graphs. 1
2 0.13432463 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery
Author: Jeremy Kubica, Joseph Masiero, Robert Jedicke, Andrew Connolly, Andrew W. Moore
Abstract: In this paper we consider the problem of finding sets of points that conform to a given underlying model from within a dense, noisy set of observations. This problem is motivated by the task of efficiently linking faint asteroid detections, but is applicable to a range of spatial queries. We survey current tree-based approaches, showing a trade-off exists between single tree and multiple tree algorithms. To this end, we present a new type of multiple tree algorithm that uses a variable number of trees to exploit the advantages of both approaches. We empirically show that this algorithm performs well using both simulated and astronomical data.
3 0.12286837 46 nips-2005-Consensus Propagation
Author: Benjamin V. Roy, Ciamac C. Moallemi
Abstract: We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better scaling properties than pairwise averaging, an alternative that has received much recent attention. Consensus propagation can be viewed as a special case of belief propagation, and our results contribute to the belief propagation literature. In particular, beyond singly-connected graphs, there are very few classes of relevant problems for which belief propagation is known to converge. 1
4 0.097987168 125 nips-2005-Message passing for task redistribution on sparse graphs
Author: K. Wong, Zhuo Gao, David Tax
Abstract: The problem of resource allocation in sparse graphs with real variables is studied using methods of statistical physics. An efficient distributed algorithm is devised on the basis of insight gained from the analysis and is examined using numerical simulations, showing excellent performance and full agreement with the theoretical results.
5 0.091235429 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach
Author: O. P. Kreidl, Alan S. Willsky
Abstract: Given a directed graphical model with binary-valued hidden nodes and real-valued noisy observations, consider deciding upon the maximum a-posteriori (MAP) or the maximum posterior-marginal (MPM) assignment under the restriction that each node broadcasts only to its children exactly one single-bit message. We present a variational formulation, viewing the processing rules local to all nodes as degrees-of-freedom, that minimizes the loss in expected (MAP or MPM) performance subject to such online communication constraints. The approach leads to a novel message-passing algorithm to be executed offline, or before observations are realized, which mitigates the performance loss by iteratively coupling all rules in a manner implicitly driven by global statistics. We also provide (i) illustrative examples, (ii) assumptions that guarantee convergence and efficiency and (iii) connections to active research areas. 1
6 0.089190237 70 nips-2005-Fast Information Value for Graphical Models
7 0.080182485 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
8 0.072158262 122 nips-2005-Logic and MRF Circuitry for Labeling Occluding and Thinline Visual Contours
9 0.067733765 178 nips-2005-Soft Clustering on Graphs
10 0.06659022 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs
11 0.064836197 69 nips-2005-Fast Gaussian Process Regression using KD-Trees
12 0.062308516 175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining
13 0.057411142 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
14 0.056723658 153 nips-2005-Policy-Gradient Methods for Planning
15 0.056072447 121 nips-2005-Location-based activity recognition
16 0.055429168 65 nips-2005-Estimating the wrong Markov random field: Benefits in the computation-limited setting
17 0.049228858 75 nips-2005-Fixing two weaknesses of the Spectral Method
18 0.048746835 78 nips-2005-From Weighted Classification to Policy Search
19 0.046293896 139 nips-2005-Non-iterative Estimation with Perturbed Gaussian Markov Processes
20 0.043061797 50 nips-2005-Convex Neural Networks
topicId topicWeight
[(0, 0.136), (1, 0.052), (2, 0.025), (3, -0.056), (4, -0.085), (5, -0.046), (6, -0.194), (7, 0.163), (8, 0.06), (9, 0.177), (10, 0.134), (11, -0.046), (12, -0.084), (13, -0.013), (14, 0.036), (15, 0.012), (16, -0.068), (17, 0.011), (18, 0.07), (19, 0.02), (20, -0.019), (21, 0.065), (22, -0.09), (23, -0.067), (24, -0.002), (25, -0.02), (26, -0.01), (27, -0.045), (28, 0.039), (29, 0.027), (30, -0.03), (31, -0.085), (32, 0.054), (33, -0.023), (34, -0.065), (35, 0.033), (36, -0.008), (37, 0.028), (38, 0.102), (39, 0.057), (40, 0.0), (41, -0.026), (42, -0.056), (43, 0.043), (44, -0.002), (45, -0.064), (46, -0.048), (47, -0.008), (48, 0.053), (49, 0.076)]
simIndex simValue paperId paperTitle
same-paper 1 0.97024393 204 nips-2005-Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation
Author: Dmitry Malioutov, Alan S. Willsky, Jason K. Johnson
Abstract: This paper presents a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose correlations between variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlations. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. This perspective leads to a better understanding of Gaussian belief propagation and of its convergence in loopy graphs. 1
2 0.7806949 125 nips-2005-Message passing for task redistribution on sparse graphs
Author: K. Wong, Zhuo Gao, David Tax
Abstract: The problem of resource allocation in sparse graphs with real variables is studied using methods of statistical physics. An efficient distributed algorithm is devised on the basis of insight gained from the analysis and is examined using numerical simulations, showing excellent performance and full agreement with the theoretical results.
3 0.72096229 46 nips-2005-Consensus Propagation
Author: Benjamin V. Roy, Ciamac C. Moallemi
Abstract: We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better scaling properties than pairwise averaging, an alternative that has received much recent attention. Consensus propagation can be viewed as a special case of belief propagation, and our results contribute to the belief propagation literature. In particular, beyond singly-connected graphs, there are very few classes of relevant problems for which belief propagation is known to converge. 1
4 0.64651585 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach
Author: O. P. Kreidl, Alan S. Willsky
Abstract: Given a directed graphical model with binary-valued hidden nodes and real-valued noisy observations, consider deciding upon the maximum a-posteriori (MAP) or the maximum posterior-marginal (MPM) assignment under the restriction that each node broadcasts only to its children exactly one single-bit message. We present a variational formulation, viewing the processing rules local to all nodes as degrees-of-freedom, that minimizes the loss in expected (MAP or MPM) performance subject to such online communication constraints. The approach leads to a novel message-passing algorithm to be executed offline, or before observations are realized, which mitigates the performance loss by iteratively coupling all rules in a manner implicitly driven by global statistics. We also provide (i) illustrative examples, (ii) assumptions that guarantee convergence and efficiency and (iii) connections to active research areas. 1
5 0.62012309 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery
Author: Jeremy Kubica, Joseph Masiero, Robert Jedicke, Andrew Connolly, Andrew W. Moore
Abstract: In this paper we consider the problem of finding sets of points that conform to a given underlying model from within a dense, noisy set of observations. This problem is motivated by the task of efficiently linking faint asteroid detections, but is applicable to a range of spatial queries. We survey current tree-based approaches, showing a trade-off exists between single tree and multiple tree algorithms. To this end, we present a new type of multiple tree algorithm that uses a variable number of trees to exploit the advantages of both approaches. We empirically show that this algorithm performs well using both simulated and astronomical data.
6 0.53610766 121 nips-2005-Location-based activity recognition
7 0.53301704 70 nips-2005-Fast Information Value for Graphical Models
8 0.45595765 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
9 0.44863677 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs
10 0.4105716 122 nips-2005-Logic and MRF Circuitry for Labeling Occluding and Thinline Visual Contours
11 0.36253709 175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining
12 0.35005355 32 nips-2005-Augmented Rescorla-Wagner and Maximum Likelihood Estimation
13 0.34256503 139 nips-2005-Non-iterative Estimation with Perturbed Gaussian Markov Processes
14 0.33266789 69 nips-2005-Fast Gaussian Process Regression using KD-Trees
15 0.31534129 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations
16 0.30857816 153 nips-2005-Policy-Gradient Methods for Planning
17 0.30716255 38 nips-2005-Beyond Gaussian Processes: On the Distributions of Infinite Networks
18 0.30688348 105 nips-2005-Large-Scale Multiclass Transduction
19 0.29006481 62 nips-2005-Efficient Estimation of OOMs
20 0.28869861 116 nips-2005-Learning Topology with the Generative Gaussian Graph and the EM Algorithm
topicId topicWeight
[(3, 0.039), (10, 0.039), (27, 0.016), (31, 0.504), (34, 0.06), (41, 0.011), (55, 0.036), (69, 0.034), (73, 0.032), (77, 0.014), (88, 0.044), (91, 0.056)]
simIndex simValue paperId paperTitle
1 0.96750593 164 nips-2005-Representing Part-Whole Relationships in Recurrent Neural Networks
Author: Viren Jain, Valentin Zhigulin, H. S. Seung
Abstract: There is little consensus about the computational function of top-down synaptic connections in the visual system. Here we explore the hypothesis that top-down connections, like bottom-up connections, reflect partwhole relationships. We analyze a recurrent network with bidirectional synaptic interactions between a layer of neurons representing parts and a layer of neurons representing wholes. Within each layer, there is lateral inhibition. When the network detects a whole, it can rigorously enforce part-whole relationships by ignoring parts that do not belong. The network can complete the whole by filling in missing parts. The network can refuse to recognize a whole, if the activated parts do not conform to a stored part-whole relationship. Parameter regimes in which these behaviors happen are identified using the theory of permitted and forbidden sets [3, 4]. The network behaviors are illustrated by recreating Rumelhart and McClelland’s “interactive activation” model [7]. In neural network models of visual object recognition [2, 6, 8], patterns of synaptic connectivity often reflect part-whole relationships between the features that are represented by neurons. For example, the connections of Figure 1 reflect the fact that feature B both contains simpler features A1, A2, and A3, and is contained in more complex features C1, C2, and C3. Such connectivity allows neurons to follow the rule that existence of the part is evidence for existence of the whole. By combining synaptic input from multiple sources of evidence for a feature, a neuron can “decide” whether that feature is present. 1 The synapses shown in Figure 1 are purely bottom-up, directed from simple to complex features. However, there are also top-down connections in the visual system, and there is little consensus about their function. One possibility is that top-down connections also reflect part-whole relationships. 
They allow feature detectors to make decisions using the rule that existence of the whole is evidence for existence of its parts. In this paper, we analyze the dynamics of a recurrent network in which part-whole relationships are stored as bidirectional synaptic interactions, rather than the unidirectional interactions of Figure 1. The network has a number of interesting computational capabilities. When the network detects a whole, it can rigorously enforce part-whole relationships by ignoring parts that do not belong. (Footnote: Synaptic connectivity may reflect other relationships besides part-whole. For example, invariances can be implemented by connecting detectors of several instances of the same feature to the same target, which is consequently an invariant detector of the feature.) Figure 1: The synaptic connections (arrows) of neuron B represent part-whole relationships. Feature B both contains simpler features (A1, A2, A3) and is contained in more complex features (C1, C2, C3). The synaptic interactions are drawn one-way, as in most models of visual object recognition. Existence of the part is regarded as evidence for existence of the whole. This paper makes the interactions bidirectional, allowing the existence of the whole to be evidence for the existence of its parts. The network can complete the whole by filling in missing parts. The network can refuse to recognize a whole, if the activated parts do not conform to a stored part-whole relationship. Parameter regimes in which these behaviors happen are identified using the recently developed theory of permitted and forbidden sets [3, 4]. Our model is closely related to the interactive activation model of word recognition, which was proposed by McClelland and Rumelhart to explain the word superiority effect studied by visual psychologists [7]. 
Here our concern is not to model a psychological effect, but to characterize mathematically how computations involving part-whole relationships can be carried out by a recurrent network. 1 Network model Suppose that we are given a set of part-whole relationships specified by $\xi_i^a = 1$ if part $i$ is contained in whole $a$, and $\xi_i^a = 0$ otherwise. We assume that every whole contains at least one part, and every part is contained in at least one whole. The stimulus drives a layer of neurons that detect parts. These neurons also interact with a layer of neurons that detect wholes. We will refer to part-detectors as “P-neurons” and whole-detectors as “W-neurons.” The part-whole relationships are directly stored in the synaptic connections between P and W neurons. If $\xi_i^a = 1$, the $i$th neuron in the P layer and the $a$th neuron in the W layer have an excitatory interaction of strength $\gamma$. If $\xi_i^a = 0$, the neurons have an inhibitory interaction of strength $\sigma$. Furthermore, the P-neurons inhibit each other with strength $\beta$, and the W-neurons inhibit each other with strength $\alpha$. All of these interactions are symmetric, and all activation functions are the rectification nonlinearity $[z]_+ = \max\{z, 0\}$. Then the dynamics of the network takes the form
$$\dot{W}_a + W_a = \Big[\gamma \sum_i \xi_i^a P_i - \sigma \sum_i (1 - \xi_i^a) P_i - \alpha \sum_{b \neq a} W_b\Big]_+ , \quad (1)$$
$$\dot{P}_i + P_i = \Big[\gamma \sum_a \xi_i^a W_a - \sigma \sum_a (1 - \xi_i^a) W_a - \beta \sum_{j \neq i} P_j + B_i\Big]_+ , \quad (2)$$
where $B_i$ is the input to the P layer from the stimulus. Figure 2 shows an example of a network with two wholes. Each whole contains two parts. One of the parts is contained in both wholes. Figure 2: Model in example configuration: $\xi = \{(1, 1, 0), (0, 1, 1)\}$. [The figure shows the W layer (neurons $W_a$, $W_b$ with mutual inhibition of strength $\alpha$) and the P layer (neurons $P_1$, $P_2$, $P_3$ with mutual inhibition of strength $\beta$ and inputs $B_1$, $B_2$, $B_3$), coupled by excitation $\gamma$ and inhibition $\sigma$.] When a stimulus is presented, it activates some of the P-neurons, which activate some of the W-neurons. The network eventually converges to a stable steady state. We will assume that $\alpha > 1$. 
In the Appendix, we prove that this leads to unconditional winner-take-all behavior in the W layer. In other words, no more than one W-neuron can be active at a stable steady state. If a single W-neuron is active, then a whole has been detected. Potentially there are also many P-neurons active, indicating detection of parts. This representation may have different properties, depending on the choice of parameters β, γ, and σ. As discussed below, these include rigorous enforcement of part-whole relationships, completion of wholes by “filling in” missing parts, and non-recognition of parts that do not conform to a whole. 2 Enforcement of part-whole relationships Suppose that a single W-neuron is active at a stable steady state, so that a whole has been detected. Part-whole relationships are said to be enforced if the network always ignores parts that are not contained in the detected whole, despite potentially strong bottom-up evidence for them. It can be shown that enforcement follows from the inequality $\sigma^2 + \beta^2 + \gamma^2 + 2\sigma\beta\gamma > 1$, (3) which guarantees that neuron $i$ in the P layer is inactive if neuron $a$ in the W layer is active and $\xi_i^a = 0$. When part-whole relations are enforced, prior knowledge about legal combinations of parts strictly constrains what may be perceived. This result is proven in the Appendix, and only an intuitive explanation is given here. Enforcement is easiest to understand when there is interlayer inhibition ($\sigma > 0$). In this case, the active W-neuron directly inhibits the forbidden P-neurons. The case of $\sigma = 0$ is more subtle. Then enforcement is mediated by lateral inhibition in the P layer. Excitatory feedback from the W-neuron has the effect of counteracting the lateral inhibition between the P-neurons that belong to the whole. As a result, these P-neurons become activated strongly enough to inhibit the rest of the P layer. 
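As a concrete sanity check, the dynamics of Eqs. (1)-(2) can be integrated numerically. The sketch below is our own minimal NumPy implementation under hypothetical parameter values (the `simulate` helper and every constant are illustrative choices, not taken from the paper); the parameters satisfy both α > 1, so the W layer should behave as a winner-take-all, and inequality (3), so the part outside the detected whole should be suppressed despite nonzero bottom-up input.

```python
import numpy as np

# A minimal sketch (ours, not the authors' code) that Euler-integrates the
# rectified dynamics of Eqs. (1)-(2). All parameter values are hypothetical,
# chosen so that alpha > 1 (winner-take-all) and inequality (3) holds
# (enforcement): 0.5^2 + 0.5^2 + 0.6^2 + 2*0.5*0.5*0.6 = 1.16 > 1.
def simulate(xi, B, alpha=2.0, beta=0.5, gamma=0.6, sigma=0.5,
             dt=0.01, steps=20000):
    """xi[a, i] = 1 if part i is contained in whole a, else 0."""
    relu = lambda z: np.maximum(z, 0.0)
    W = np.zeros(xi.shape[0])          # whole-detectors
    P = np.zeros(xi.shape[1])          # part-detectors
    for _ in range(steps):
        dW = -W + relu(gamma * (xi @ P) - sigma * ((1 - xi) @ P)
                       - alpha * (W.sum() - W))                    # Eq. (1)
        dP = -P + relu(gamma * (xi.T @ W) - sigma * ((1 - xi).T @ W)
                       - beta * (P.sum() - P) + B)                 # Eq. (2)
        W, P = W + dt * dW, P + dt * dP
    return W, P

# Figure 2 configuration: whole 0 = {parts 1, 2}, whole 1 = {parts 2, 3}
xi = np.array([[1.0, 1.0, 0.0],
               [0.0, 1.0, 1.0]])
B = np.array([1.0, 1.0, 0.2])          # strong evidence for parts 1 and 2 only
W, P = simulate(xi, B)
print("W:", W)  # a single active W-neuron (winner-take-all)
print("P:", P)  # part 3 suppressed despite nonzero input (enforcement)
```

Under these assumed values the active subnetwork is linear, so the steady state can be checked by hand: with whole 0 detected and P3 silenced, $P_1 = P_2 = 1/0.78 \approx 1.28$ and $W_0 = 1.2 P_1 \approx 1.54$.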
3 Completion of wholes by filling in missing parts If a W-neuron is active, it excites the P-neurons that belong to the whole. As a result, even if one of these P-neurons receives no bottom-up input ($B_i = 0$), it is still active. We call this phenomenon “completion,” and it is guaranteed to happen when $\gamma > \sqrt{\beta}$. (4) The network may thus “imagine” parts that are consistent with the recognized whole, but are not actually present in the stimulus. As with enforcement, this condition depends on top-down connections. In the special case $\gamma = \sqrt{\beta}$, the interlayer excitation between a W-neuron and its P-neurons exactly cancels out the lateral inhibition between the P-neurons at a steady state. So the recurrent connections effectively vanish, letting the activity of the P-neurons be determined by their feedforward inputs. When the interlayer excitation is stronger than this, the inequality (4) holds, and completion occurs. 4 Non-recognition of a whole If there is no interlayer inhibition ($\sigma = 0$), then a single W-neuron is always active, assuming that there is some activity in the P layer. To see this, suppose for the sake of contradiction that all the W-neurons are inactive. Then they receive no inhibition to counteract the excitation from the P layer. This means some of them must be active, which contradicts our assumption. This means that the network always recognizes a whole, even if the stimulus is very different from any part-whole combination that is stored in the network. However, if interlayer inhibition is sufficiently strong (large σ), the network may refuse to recognize a whole. Neurons in the P layer are activated, but there is no activity in the W layer. Formal conditions on σ can be derived, but are not given here because of space limitations. In case of non-recognition, constraints on the P-layer are not enforced. It is possible for the network to detect a configuration of parts that is not consistent with any stored whole. 
5 Example: Interactive Activation model To illustrate the computational capabilities of our network, we use it to recreate the interactive activation (IA) model of McClelland and Rumelhart. Figure 3 shows numerical simulations of a network containing three layers of neurons representing strokes, letters, and words, respectively. There are 16 possible strokes in each of four letter positions. For each stroke, there are two neurons, one signaling the presence of the stroke and the other signaling its absence. Letter neurons represent each letter of the alphabet in each of four positions. Word neurons represent each of 1200 common four letter words. The letter and word layers correspond to the P and W layers that were introduced previously. There are bidirectional interactions between the letter and word layers, and lateral inhibition within the layers. The letter neurons also receive input from the stroke neurons, but this interaction is unidirectional. Our network differs in two ways from the original IA model. First, all interactions involving letter and word neurons are symmetric. In the original model, the interactions between the letter and word layers were asymmetric. In particular, inhibitory connections only ran from letter neurons to word neurons, and not vice versa. Second, the only nonlinearity in our model is rectification. These two aspects allow us to apply the full machinery of the theory of permitted and forbidden sets. Figure 3 shows the result of presenting the stimulus “MO M” for four different settings of parameters. In each of the four cases, the word layer of the network converges to the same result, detecting the word “MOON”, which is the closest stored word to the stimulus. However, the activity in the letter layer is different in the four cases. 
[Figure 3 panels: columns labeled “input,” “P layer reconstruction,” and “W layer”; panel conditions: completion, non-completion, enforcement, non-enforcement.] Figure 3: Simulation of 4 different parameter regimes in a letter-word recognition network. Within each panel, the middle column presents a feature-layer reconstruction based on the letter activity shown in the left column. W layer activity is shown in the right column. The top row shows the network state after 10 iterations of the dynamics. The bottom row shows the steady state. In the left column, the parameters obey the inequality (3), so that part-whole relationships are enforced. The activity of the letter layer is visualized by activating the strokes corresponding to each active letter neuron. The activated letters are part of the word “MOON”. In the top left, the inequality (4) is satisfied, so that the missing “O” in the stimulus is filled in. In the bottom left, completion does not occur. In the simulations of the right column, parameters are such that part-whole relationships are not enforced. Consequently, the word layer is much more active. Bottom-up input provides evidence for several other letters, which is not suppressed. In the top right, the inequality (4) is satisfied, so that the missing “O” in the stimulus is filled in. In the bottom right, the “O” neuron is not activated in the third position, so there is no completion. However, some letter neurons for the third position are activated, due to the input from neurons that indicate the absence of strokes. Figure 4: Simulation of a non-recognition event and an example of multistability. Figure 4 shows simulations for large σ, deep in the enforcement regime where non-recognition is a possibility. From one initial condition, the network converges to a state in which no W-neurons are active, a non-recognition. From another initial condition, the network detects the word “NORM”. 
Deep in the enforcement regime, the top-down feedback can be so strong that the network has multiple stable states, many of which bear little resemblance to the stimulus. This is a problematic aspect of this network. It can be prevented by setting parameters at the edge of the enforcement regime. 6 Discussion We have analyzed a recurrent network that performs computations involving part-whole relationships. The network can fill in missing parts and suppress parts that do not belong. These two computations are distinct and can be dissociated from each other, as shown in Figure 3. While these two computations can also be performed by associative memory models, they are not typically dissociable in these models. For example, in the Hopfield model pattern completion and noise suppression are both the result of recall of one of a finite number of stereotyped activity patterns. We believe that our model is more appropriate for perceptual systems, because its behavior is piecewise linear, due to its reliance on the rectification nonlinearity. Therefore, analog aspects of computation are able to coexist with the part-whole relationships. Furthermore, in our model the stimulus is encoded in maintained synaptic input to the network, rather than as an initial condition of the dynamics. A Appendix: Permitted and forbidden sets Our mathematical results depend on the theory of permitted and forbidden sets [3, 4], which is summarized briefly here. The theory is applicable to neural networks with rectification nonlinearity, of the form $\dot{x}_i + x_i = [b_i + \sum_j W_{ij} x_j]_+$. Neuron $i$ is said to be active when $x_i > 0$. For a network of $N$ neurons, there are $2^N$ possible sets of active neurons. For each active set, consider the submatrix of $W_{ij}$ corresponding to the synapses between active neurons. If all eigenvalues of this submatrix have real parts less than or equal to unity, then the active set is said to be permitted. Otherwise the active set is said to be forbidden. 
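The permitted/forbidden test just described is a plain eigenvalue check, and can be sketched in a few lines. The `is_permitted` helper below is our own illustrative code, not the authors'; it is exercised on the mutually inhibiting W-neuron pair discussed in Section A.1 of the Appendix.

```python
import numpy as np

# Hedged sketch of the permitted/forbidden-set test: an active set is
# permitted iff every eigenvalue of the corresponding submatrix of the
# synaptic matrix has real part at most one. `is_permitted` is an
# illustrative helper, not code from the paper.
def is_permitted(W, active_set):
    sub = W[np.ix_(active_set, active_set)]
    return bool(np.all(np.linalg.eigvals(sub).real <= 1.0))

# Two W-neurons inhibiting each other with strength alpha > 1 (Section A.1)
alpha = 1.5
W2 = np.array([[0.0, -alpha],
               [-alpha, 0.0]])
print(is_permitted(W2, [0, 1]))  # False: eigenvalues are +/- alpha, and alpha > 1
print(is_permitted(W2, [0]))     # True: a single neuron has eigenvalue 0
```

By the nesting property for symmetric matrices, the `False` result for the pair already forbids every larger set of W-neurons.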
A set is permitted if and only if there exists an input vector $b$ such that those neurons are active at a stable steady state. Permitted sets can be regarded as memories stored in the synaptic connections $W_{ij}$. If $W_{ij}$ is a symmetric matrix, the nesting property holds: every subset of a permitted set is permitted, and every superset of a forbidden set is forbidden. The present model can be seen as a general method for storing permitted sets in a recurrent network. This method introduces a neuron for each permitted set, relying on a unary or “grandmother cell” representation. In contrast, Xie et al. [9] used lateral inhibition in a single layer of neurons to store permitted sets. By introducing extra neurons, the present model achieves superior storage capacity, much as unary models of associative memory [1] surpass distributed models [5]. A.1 Unconditional winner-take-all in the W layer The synapses between two W-neurons have strengths
$$\begin{pmatrix} 0 & -\alpha \\ -\alpha & 0 \end{pmatrix}.$$
The eigenvalues of this matrix are $\pm\alpha$. Therefore two W-neurons constitute a forbidden set if $\alpha > 1$. By the nesting property, it follows that any set of more than two W-neurons is also forbidden, and that the W layer has the unconditional winner-take-all property. A.2 Part-whole combinations as permitted sets Theorem 1. Suppose that $\beta < 1$. If $\gamma^2 < \beta + (1 - \beta)/k$, then any combination of $k \geq 1$ parts consistent with a whole corresponds to a permitted set. Proof. Consider $k$ parts belonging to a whole. They are represented by one W-neuron and $k$ P-neurons, with synaptic connections given by the $(k + 1) \times (k + 1)$ matrix
$$M = \begin{pmatrix} -\beta(\mathbf{1}\mathbf{1}^T - I) & \gamma\mathbf{1} \\ \gamma\mathbf{1}^T & 0 \end{pmatrix}, \quad (5)$$
where $\mathbf{1}$ is the $k$-dimensional vector whose elements are all equal to one. Two eigenvectors of $M$ are of the form $(\mathbf{1}^T, c)$, and have the same eigenvalues as the $2 \times 2$ matrix
$$\begin{pmatrix} -\beta(k - 1) & \gamma k \\ \gamma & 0 \end{pmatrix}.$$
This matrix has eigenvalues less than one when $\gamma^2 < \beta + (1 - \beta)/k$ and $\beta(k - 1) + 2 > 0$. The other $k - 1$ eigenvectors are of the form $(d^T, 0)$, where $d^T \mathbf{1} = 0$. These have eigenvalue $\beta$. 
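Theorem 1's threshold can be confirmed numerically by building the matrix $M$ of Eq. (5) directly. The helper and parameter values below are illustrative choices of ours, picked to land on either side of the bound $\gamma^2 < \beta + (1-\beta)/k$.

```python
import numpy as np

# Hedged numerical check of Theorem 1: the (k+1)x(k+1) matrix M of Eq. (5)
# has top eigenvalue below one exactly when gamma^2 < beta + (1 - beta)/k.
# The helper and parameter values are illustrative, not from the paper.
def top_eig_M(k, beta, gamma):
    M = np.zeros((k + 1, k + 1))
    M[:k, :k] = -beta * (np.ones((k, k)) - np.eye(k))  # lateral inhibition among P
    M[:k, k] = gamma                                   # symmetric P <-> W excitation
    M[k, :k] = gamma
    return np.max(np.linalg.eigvalsh(M))               # M is symmetric

beta, k = 0.5, 3
bound = beta + (1 - beta) / k        # = 2/3
print(top_eig_M(k, beta, 0.7) < 1)   # 0.7**2 = 0.49 < 2/3  -> True (permitted)
print(top_eig_M(k, beta, 0.9) < 1)   # 0.9**2 = 0.81 > 2/3  -> False (forbidden)
```

The top eigenvalue can also be read off the reduced 2x2 matrix in the proof: its larger root is $(-\beta(k-1) + \sqrt{\beta^2(k-1)^2 + 4\gamma^2 k})/2$, which crosses one exactly at the stated bound.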
Therefore all eigenvalues of $M$ are less than one if the condition of the theorem is satisfied. A.3 Constraints on combining parts Here, we derive conditions under which the network can enforce the constraint that steady-state activity be confined to parts that constitute a whole. Theorem 2. Suppose that $\beta > 0$ and $\sigma^2 + \beta^2 + \gamma^2 + 2\sigma\beta\gamma > 1$. If a W-neuron is active, then only P-neurons corresponding to parts contained in the relevant whole can be active at a stable steady state. Proof. Consider P-neurons $P_i$, $P_j$, and W-neuron $W_a$. Suppose that $\xi_i^a = 1$ but $\xi_j^a = 0$. As shown in Figure 5, the matrix of connections is given by
$$W = \begin{pmatrix} 0 & -\beta & \gamma \\ -\beta & 0 & -\sigma \\ \gamma & -\sigma & 0 \end{pmatrix}. \quad (6)$$
Figure 5: A set of one W-neuron and two P-neurons is forbidden if one part belongs to the whole and the other does not. This set is permitted if all eigenvalues of $W - I$ have negative real parts. The characteristic equation of $I - W$ is $\lambda^3 + b_1\lambda^2 + b_2\lambda + b_3 = 0$, where $b_1 = 3$, $b_2 = 3 - \sigma^2 - \beta^2 - \gamma^2$, and $b_3 = 1 - 2\sigma\beta\gamma - \sigma^2 - \beta^2 - \gamma^2$. According to the Routh-Hurwitz theorem, all the eigenvalues have negative real parts if and only if $b_1 > 0$, $b_3 > 0$, and $b_1 b_2 > b_3$. Clearly, the first condition is always satisfied. The second condition is more restrictive than the third. It is satisfied only when $\sigma^2 + \beta^2 + \gamma^2 + 2\sigma\beta\gamma < 1$. Hence, one of the eigenvalues has a positive real part when this condition is broken, i.e., when $\sigma^2 + \beta^2 + \gamma^2 + 2\sigma\beta\gamma > 1$. By the nesting property, any larger set of P-neurons inconsistent with the W-neuron is also forbidden. A.4 Completion of wholes Theorem 3. If $\gamma > \sqrt{\beta}$ and a single W-neuron $a$ is active at a steady state, then $P_i > 0$ for all $i$ such that $\xi_i^a = 1$. Proof. Suppose that the detected whole has $k$ parts. At the steady state,
$$P_i = \left[\frac{\xi_i^a B_i - (\beta - \gamma^2) P_{tot}}{1 - \beta}\right]_+ \quad \text{where} \quad P_{tot} = \sum_i P_i = \frac{1}{1 - \beta + (\beta - \gamma^2)k} \sum_{i=1}^k B_i \xi_i^a . \quad (7)$$
A.5 Preventing runaway If feedback loops cause the network activity to diverge, then the preceding analyses are not relevant. 
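The Routh-Hurwitz boundary of Theorem 2 above is easy to probe numerically: the triple $\{P_i, P_j, W_a\}$ with the connection matrix of Eq. (6) should become forbidden (top eigenvalue above one) exactly when $\sigma^2 + \beta^2 + \gamma^2 + 2\sigma\beta\gamma > 1$. The sketch below is ours, with illustrative parameter values on either side of the threshold.

```python
import numpy as np

# Hedged check of Theorem 2's boundary: the set {P_i, P_j, W_a} with the
# connection matrix of Eq. (6) is forbidden (largest eigenvalue > 1) exactly
# when sigma^2 + beta^2 + gamma^2 + 2*sigma*beta*gamma > 1. Parameter values
# are illustrative.
def top_eig(beta, gamma, sigma):
    W = np.array([[0.0,   -beta,   gamma],
                  [-beta,  0.0,   -sigma],
                  [gamma, -sigma,  0.0]])
    return np.max(np.linalg.eigvalsh(W))   # symmetric matrix

def threshold(beta, gamma, sigma):
    return sigma**2 + beta**2 + gamma**2 + 2 * sigma * beta * gamma

print(threshold(0.3, 0.4, 0.3), top_eig(0.3, 0.4, 0.3))  # below 1 -> permitted
print(threshold(0.6, 0.7, 0.6), top_eig(0.6, 0.7, 0.6))  # above 1 -> forbidden
```

Evaluating the characteristic polynomial at $\lambda = 1$ gives $1 - (\sigma^2 + \beta^2 + \gamma^2) - 2\sigma\beta\gamma$, so the sign of the quantity in Theorem 2 directly decides whether the top eigenvalue exceeds one.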
Here we give a sufficient condition guaranteeing that runaway instability does not happen. It is not a necessary condition. Interestingly, the condition implies the condition of Theorem 1. Theorem 4. Suppose that P and W obey the dynamics of Eqs. (1) and (2), and define the objective function
$$E = \frac{1 - \alpha}{2} \sum_a W_a^2 + \frac{\alpha}{2} \Big(\sum_a W_a\Big)^2 + \frac{1 - \beta}{2} \sum_i P_i^2 + \frac{\beta}{2} \Big(\sum_i P_i\Big)^2 - \sum_i B_i P_i - \gamma \sum_{ia} \xi_i^a P_i W_a + \sigma \sum_{ia} (1 - \xi_i^a) P_i W_a . \quad (8)$$
Then $E$ is a Lyapunov-like function that, given $\beta > \gamma^2 - \frac{1 - \gamma^2}{N - 1}$, ensures convergence of the dynamics to a stable steady state. Proof. (sketch) Differentiation of $E$ with respect to time shows that $E$ is nonincreasing in the nonnegative orthant and constant only at steady states of the network dynamics. We must also show that $E$ is radially unbounded, which is true if the quadratic part of $E$ is copositive definite. Note that the last term of $E$ is lower-bounded by zero and the previous term is upper-bounded by $\gamma \sum_{ia} P_i W_a$. We assume $\alpha > 1$. Thus, we can use Cauchy's inequality, $\sum_i P_i^2 \geq (\sum_i P_i)^2 / N$, and the fact that $\sum_a W_a^2 \leq (\sum_a W_a)^2$ for $W_a \geq 0$, to derive
$$E \geq \frac{1}{2}\left[\Big(\sum_a W_a\Big)^2 + \frac{1 - \beta + \beta N}{N}\Big(\sum_i P_i\Big)^2 - 2\gamma\Big(\sum_a W_a\Big)\Big(\sum_i P_i\Big)\right] - \sum_i B_i P_i . \quad (9)$$
If $\beta > \gamma^2 - \frac{1 - \gamma^2}{N - 1}$, the quadratic form in the inequality is positive definite and $E$ is radially unbounded. References [1] E. B. Baum, J. Moody, and F. Wilczek. Internal representations for associative memory. Biol. Cybern., 59:217-228, 1988. [2] K. Fukushima. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern., 36(4):193-202, 1980. [3] R. H. Hahnloser, R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, and H. S. Seung. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, 405(6789):947-51, Jun 2000. [4] R. H. Hahnloser, H. S. Seung, and J.-J. Slotine. Permitted and forbidden sets in symmetric threshold-linear networks. Neural Computation, 15:621-638, 2003. [5] J. J. Hopfield. 
Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA, 79(8):2554-8, Apr 1982. [6] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Comput., 1:541-551, 1989. [7] J. L. McClelland and D. E. Rumelhart. An interactive activation model of context effects in letter perception: Part I. An account of basic findings. Psychological Review, 88(5):375-407, Sep 1981. [8] M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nat. Neurosci., 2(11):1019-25, Nov 1999. [9] X. Xie, R. H. Hahnloser, and H. S. Seung. Selectively grouping neurons in recurrent networks of lateral inhibition. Neural Computation, 14:2627-2646, 2002.
same-paper 2 0.95122755 204 nips-2005-Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation
Author: Dmitry Malioutov, Alan S. Willsky, Jason K. Johnson
Abstract: This paper presents a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose correlations between variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlations. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. This perspective leads to a better understanding of Gaussian belief propagation and of its convergence in loopy graphs. 1
3 0.90859365 65 nips-2005-Estimating the wrong Markov random field: Benefits in the computation-limited setting
Author: Martin J. Wainwright
Abstract: Consider the problem of joint parameter estimation and prediction in a Markov random field: i.e., the model parameters are estimated on the basis of an initial set of data, and then the fitted model is used to perform prediction (e.g., smoothing, denoising, interpolation) on a new noisy observation. Working in the computation-limited setting, we analyze a joint method in which the same convex variational relaxation is used to construct an M-estimator for fitting parameters, and to perform approximate marginalization for the prediction step. The key result of this paper is that in the computation-limited setting, using an inconsistent parameter estimator (i.e., an estimator that returns the “wrong” model even in the infinite data limit) is provably beneficial, since the resulting errors can partially compensate for errors made by using an approximate prediction technique. En route to this result, we analyze the asymptotic properties of M-estimators based on convex variational relaxations, and establish a Lipschitz stability property that holds for a broad class of variational methods. We show that joint estimation/prediction based on the reweighted sum-product algorithm substantially outperforms a commonly used heuristic based on ordinary sum-product. Keywords: Markov random fields; variational method; message-passing algorithms; sum-product; belief propagation; parameter estimation; learning. 1
4 0.90684026 108 nips-2005-Layered Dynamic Textures
Author: Antoni B. Chan, Nuno Vasconcelos
Abstract: A dynamic texture is a video model that treats a video as a sample from a spatio-temporal stochastic process, specifically a linear dynamical system. One problem associated with the dynamic texture is that it cannot model video where there are multiple regions of distinct motion. In this work, we introduce the layered dynamic texture model, which addresses this problem. We also introduce a variant of the model, and present the EM algorithm for learning each of the models. Finally, we demonstrate the efficacy of the proposed model for the tasks of segmentation and synthesis of video.
5 0.90393329 145 nips-2005-On Local Rewards and Scaling Distributed Reinforcement Learning
Author: Drew Bagnell, Andrew Y. Ng
Abstract: We consider the scaling of the number of examples necessary to achieve good performance in distributed, cooperative, multi-agent reinforcement learning, as a function of the number of agents n. We prove a worst-case lower bound showing that algorithms that rely solely on a global reward signal to learn policies confront a fundamental limit: They require a number of real-world examples that scales roughly linearly in the number of agents. For settings of interest with a very large number of agents, this is impractical. We demonstrate, however, that there is a class of algorithms that, by taking advantage of local reward signals in large distributed Markov Decision Processes, are able to ensure good performance with a number of samples that scales as O(log n). This makes them applicable even in settings with a very large number of agents n. 1
6 0.64956105 78 nips-2005-From Weighted Classification to Policy Search
7 0.60759908 153 nips-2005-Policy-Gradient Methods for Planning
8 0.60084826 46 nips-2005-Consensus Propagation
9 0.57508993 142 nips-2005-Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games
10 0.54705471 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
11 0.54648519 111 nips-2005-Learning Influence among Interacting Markov Chains
12 0.54492432 187 nips-2005-Temporal Abstraction in Temporal-difference Networks
13 0.53186172 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach
14 0.52505988 124 nips-2005-Measuring Shared Information and Coordinated Activity in Neuronal Networks
15 0.51152962 36 nips-2005-Bayesian models of human action understanding
16 0.50864506 139 nips-2005-Non-iterative Estimation with Perturbed Gaussian Markov Processes
17 0.50754029 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
18 0.49962342 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs
19 0.4972654 181 nips-2005-Spiking Inputs to a Winner-take-all Network
20 0.48948196 199 nips-2005-Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions