nips nips2011 nips2011-158 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xaq Pitkow, Yashar Ahmadian, Ken D. Miller
Abstract: Loopy belief propagation performs approximate inference on graphical models with loops. One might hope to compensate for the approximation by adjusting model parameters. Learning algorithms for this purpose have been explored previously, and the claim has been made that every set of locally consistent marginals can arise from belief propagation run on a graphical model. On the contrary, here we show that many probability distributions have marginals that cannot be reached by belief propagation using any set of model parameters or any learning algorithm. We call such marginals ‘unbelievable.’ This problem occurs whenever the Hessian of the Bethe free energy is not positive-definite at the target marginals. All learning algorithms for belief propagation necessarily fail in these cases, producing beliefs or sets of beliefs that may even be worse than the pre-learning approximation. We then show that averaging inaccurate beliefs, each obtained from belief propagation using model parameters perturbed about some learned mean values, can achieve the unbelievable marginals. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Learning unbelievable probabilities Xaq Pitkow Department of Brain and Cognitive Science University of Rochester Rochester, NY 14607 xaq@neurotheory. [sent-1, score-0.336]
2 edu Abstract Loopy belief propagation performs approximate inference on graphical models with loops. [sent-7, score-0.495]
3 Learning algorithms for this purpose have been explored previously, and the claim has been made that every set of locally consistent marginals can arise from belief propagation run on a graphical model. [sent-9, score-0.854]
4 On the contrary, here we show that many probability distributions have marginals that cannot be reached by belief propagation using any set of model parameters or any learning algorithm. [sent-10, score-0.812]
5 This problem occurs whenever the Hessian of the Bethe free energy is not positive-definite at the target marginals. [sent-12, score-0.366]
6 All learning algorithms for belief propagation necessarily fail in these cases, producing beliefs or sets of beliefs that may even be worse than the pre-learning approximation. [sent-13, score-0.835]
7 We then show that averaging inaccurate beliefs, each obtained from belief propagation using model parameters perturbed about some learned mean values, can achieve the unbelievable marginals. [sent-14, score-0.802]
8 One popular technique is belief propagation (BP), in particular the sum-product rule, which is a message-passing algorithm for performing inference on a graphical model [2]. [sent-17, score-0.495]
9 In this paper, we prove that some sets of marginals simply cannot be achieved by belief propagation. [sent-20, score-0.58]
10 We will write the collection of all these marginals as a vector p. [sent-27, score-0.363]
11 Instead we will use approximate inference via loopy belief propagation to match the target. [sent-33, score-0.542]
12 1 Belief propagation The sum-product algorithm for belief propagation on a graphical model with energy function (2) uses the following message updates [4]: m_{i→α}(x_i) ∝ ∏_{β∋i, β≠α} m_{β→i}(x_i) and m_{α→i}(x_i) ∝ Σ_{x_α\x_i} ψ_α(x_α) ∏_{j∈α\i} m_{j→α}(x_j). [sent-35, score-0.846]
13 Once these messages converge, the single-node and factor beliefs are given by b_i(x_i) ∝ ∏_{α∋i} m_{α→i}(x_i) and b_α(x_α) ∝ ψ_α(x_α) ∏_{i∈α} m_{i→α}(x_i). [sent-40, score-0.273]
14 For tree graphs, these beliefs exactly equal the marginals of the graphical model Q0 (x). [sent-43, score-0.621]
15 For loopy graphs, the beliefs at stable fixed points are often good approximations of the marginals. [sent-44, score-0.451]
16 While they are guaranteed to be locally consistent, Σ_{x_α\x_i} b_α(x_α) = b_i(x_i), they are not necessarily globally consistent: There may not exist a single joint distribution B(x) of which the beliefs are the marginals [5]. [sent-45, score-0.588]
17 We use a vector b to refer to the set of both node and factor beliefs produced by belief propagation. [sent-47, score-0.456]
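To make the sum-product updates above concrete, the following is a minimal sketch (my own illustration, not the authors' code) of loopy BP for a binary pairwise model with x_i ∈ {−1, +1}; the energy convention E(x) = −Σ_i h_i x_i − Σ_{(ij)} J_ij x_i x_j, the damping factor, and all variable names are assumptions of this sketch.

import numpy as np

def loopy_bp(h, J, max_iter=500, tol=1e-9, damp=0.5):
    """Sum-product loopy BP for a pairwise binary model, x_i in {-1, +1}.
    h: length-n fields; J: symmetric n-by-n couplings (zero means no edge).
    Returns node beliefs b[i, :] over (x_i=-1, x_i=+1) and pair beliefs
    bij[i, j, :, :] over (x_i, x_j) for each edge with i < j."""
    n = len(h)
    x = np.array([-1.0, 1.0])
    psi = np.exp(np.outer(h, x))                  # node factors exp(h_i * x_i)
    m = np.full((n, n, 2), 0.5)                   # m[i, j, :] = message i -> j over x_j
    edge = (J != 0)
    for _ in range(max_iter):
        m_new = np.full((n, n, 2), 0.5)
        for i in range(n):
            for j in range(n):
                if i == j or not edge[i, j]:
                    continue
                # node factor times incoming messages to i, excluding the one from j
                prod = psi[i].copy()
                for k in range(n):
                    if k != i and k != j and edge[k, i]:
                        prod *= m[k, i]
                pair = np.exp(J[i, j] * np.outer(x, x))   # pair factor over (x_i, x_j)
                msg = pair.T @ prod                       # marginalize out x_i
                m_new[i, j] = msg / msg.sum()
        m_new = damp * m + (1.0 - damp) * m_new
        done = np.max(np.abs(m_new - m)) < tol
        m = m_new
        if done:
            break
    # node beliefs: b_i(x_i) proportional to psi_i(x_i) times all incoming messages
    b = psi.copy()
    for i in range(n):
        for k in range(n):
            if k != i and edge[k, i]:
                b[i] *= m[k, i]
    b /= b.sum(axis=1, keepdims=True)
    # pair beliefs: pair factor times the two cavity beliefs b_i / m_{j->i}
    bij = np.zeros((n, n, 2, 2))
    for i in range(n):
        for j in range(i + 1, n):
            if not edge[i, j]:
                continue
            pair = np.exp(J[i, j] * np.outer(x, x))
            tab = pair * np.outer(b[i] / m[j, i], b[j] / m[i, j])
            bij[i, j] = tab / tab.sum()
    return b, bij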
18 2 Bethe free energy Despite its limitations, BP is found empirically to work well in many circumstances. [sent-49, score-0.305]
19 Some theoretical justification for loopy belief propagation emerged with proofs that its stable fixed points are local minima of the Bethe free energy [6, 7]. [sent-50, score-1.024]
20 Free energies are important quantities in machine learning because the Kullback-Leibler divergence between the data and model distributions can be expressed in terms of free energies, so models can be optimized by minimizing free energies appropriately. [sent-51, score-0.431]
21 For tree-structured graphical models, which factorize as Q(x) = ∏_α q_α(x_α) ∏_i q_i(x_i)^{1−d_i}, where d_i is the number of factors containing node i, the Bethe entropy is exact, and hence so is the Bethe free energy. [sent-54, score-0.405]
22 Nonetheless, the Bethe free energy is often close enough to the Gibbs free energy that its minima approximate the true marginals [8]. [sent-58, score-1.006]
23 Since stable fixed points of BP are minima of the Bethe free energy [6, 7], this helped explain why belief propagation is often so successful. [sent-59, score-0.907]
24 To emphasize that the Bethe free energy directly depends only on the marginals and not the joint distribution, we will write F[q] where q is a vector of pseudomarginals q_α(x_α) for all α and all x_α. [sent-60, score-0.838]
25 Pseudomarginal space is the convex set [5] of all q that satisfy the positivity and local consistency constraints: 0 ≤ q_α(x_α) ≤ 1, Σ_{x_α\x_i} q_α(x_α) = q_i(x_i), and Σ_{x_i} q_i(x_i) = 1 (11).
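As a concrete reference for F[q], the following sketch (again mine, not from the paper) evaluates the standard Bethe form F[q] = U[q] − S_Bethe[q] for the binary pairwise model, with pseudomarginals given by q_i = q_i(x_i = +1) and q_ij = q_ij(x_i = x_j = +1), the parameterization used later in the text; the energy convention is an assumption of the sketch.

import numpy as np

def bethe_free_energy(h, J, qi, qij):
    """Bethe free energy U[q] - S_Bethe[q] for a binary pairwise model.
    qi[i]     = q_i(x_i = +1);  qij[i, j] = q_ij(x_i = x_j = +1) for edges i < j.
    Assumed energy convention: E(x) = -sum_i h_i x_i - sum_(ij) J_ij x_i x_j."""
    eps = 1e-12
    n = len(h)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if J[i, j] != 0]
    deg = np.zeros(n, dtype=int)
    for i, j in pairs:
        deg[i] += 1
        deg[j] += 1
    U, S = 0.0, 0.0
    for i in range(n):
        p = np.array([qi[i], 1.0 - qi[i]])                  # (x_i = +1, x_i = -1)
        U += -h[i] * (qi[i] - (1.0 - qi[i]))                # <E_i> = -h_i <x_i>
        S += (1 - deg[i]) * (-(p * np.log(p + eps)).sum())  # counting-number weighted entropy
    for i, j in pairs:
        # full pair table (++, +-, -+, --) reconstructed from the three parameters
        t = np.array([qij[i, j], qi[i] - qij[i, j], qi[j] - qij[i, j],
                      1.0 - qi[i] - qi[j] + qij[i, j]])
        corr = t[0] - t[1] - t[2] + t[3]                    # <x_i x_j> under q_ij
        U += -J[i, j] * corr
        S += -(t * np.log(t + eps)).sum()
    return U - S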
26 3 Pseudo-moment matching We now wish to correct for the deficiencies of belief propagation by identifying the parameters θ so that BP produces beliefs b matching the true marginals p of the target distribution P(x). [sent-62, score-1.175]
27 Since the fixed points of BP are stationary points of F [6], one may simply try to find parameters θ that produce a stationary point in pseudomarginal space at p, which is a necessary condition for BP to reach a stable fixed point there. [sent-63, score-0.375]
28 In contrast, here we are using it to solve for the parameters needed to move beliefs to a target location. [sent-66, score-0.293]
29 This is much easier, since the Bethe free energy is linear in θ. [sent-67, score-0.305]
30 (A) A slice through the Bethe free energy (solid lines) along one axis v_1 of pseudomarginal space, for three different values of parameters θ. [sent-73, score-0.455]
31 The energy U is linear in the pseudomarginals (dotted lines), so varying the parameters only changes the tilt of the free energy. [sent-74, score-0.529]
32 During a run of Bethe wake-sleep learning, the beliefs (blue dots) proceed along v_2 toward the target marginals p. [sent-79, score-0.646]
33 Stable fixed points of BP can exist only in the believable region (cyan), but the target p resides in an unbelievable region (yellow). [sent-80, score-0.513]
34 As learning equilibrates, the stable fixed points jump between believable regions on either side of the unbelievable zone. [sent-81, score-0.546]
35 4 Unbelievable marginals It is well known that BP may converge on stable fixed points that cannot be realized as marginals of any joint distribution. [sent-83, score-0.907]
36 In this section we show that the converse is also true: There are some distributions whose marginals cannot be realized as beliefs for any set of couplings. [sent-84, score-0.584]
37 This is surprising in view of claims to the contrary: [9, 5] state that belief propagation run after pseudo-moment matching can always reach a fixed point that reproduces the target marginals. [sent-86, score-0.567]
38 While BP does technically have such fixed points, they are not always stable and thus may not be reachable by running belief propagation. [sent-87, score-0.33]
39 A set of marginals is ‘unbelievable’ if belief propagation cannot converge to it for any set of parameters. [sent-89, score-0.806]
40 For belief propagation to converge to the target — namely, the marginals p — a zero gradient is not sufficient: The Bethe free energy must also be a local minimum [7]. [sent-90, score-1.219]
41 If not, then the target cannot be a stable fixed point of loopy belief propagation. [sent-94, score-0.49]
42 The simplest unbelievable example is a binary graphical model with pairwise interactions between four nodes, x ∈ {−1, +1}^4, and the energy E(x) = −J Σ_{(ij)} x_i x_j. [sent-100, score-0.638]
43 By symmetry and (1), marginals of this target P(x) are the same for all nodes and pairs: p_i(x_i) = 1/2 and p_ij(x_i = x_j) = ρ = (2 + 4/(1 + e^{2J} − e^{4J} + e^{6J}))^{−1}. [sent-102, score-0.446]
44 Substituting these marginals into the appropriate Bethe Hessian (22) gives a matrix that has a negative eigenvalue for all ρ > 3/8, or equivalently J > 0.316. [sent-103, score-0.363]
45 Thus the Bethe free energy does not have a minimum at the marginals of these P (x). [sent-106, score-0.668]
46 Stable fixed points of BP occur only at local minima of the Bethe free energy [7], and so BP cannot reproduce the marginals p for any parameters. [sent-107, score-0.776]
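Without the paper's closed-form Bethe Hessian (22), one can still probe this condition numerically: evaluate the Hessian of the Bethe free energy in the (q_i, q_ij) coordinates at the target marginals and look for a negative eigenvalue. The sketch below is my own illustration, not the authors' procedure, and it assumes the bethe_free_energy helper sketched earlier; the step size delta is an assumed tuning choice.

import numpy as np

def has_negative_bethe_curvature(h, J, qi, qij, delta=1e-4):
    """Finite-difference Hessian of the Bethe free energy at (qi, qij).
    A negative eigenvalue means the target marginals cannot sit at a local
    minimum of F, i.e. they are 'unbelievable' for any parameters."""
    n = len(h)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if J[i, j] != 0]

    def F(v):
        qi_v = v[:n]
        qij_v = np.zeros((n, n))
        for k, (i, j) in enumerate(pairs):
            qij_v[i, j] = v[n + k]
        return bethe_free_energy(h, J, qi_v, qij_v)

    v0 = np.concatenate([np.asarray(qi, float),
                         [qij[i, j] for i, j in pairs]])
    d = len(v0)
    H = np.zeros((d, d))
    for a in range(d):
        for b in range(d):
            def shifted(sa, sb):
                v = v0.copy()
                v[a] += sa * delta
                v[b] += sb * delta
                return F(v)
            H[a, b] = (shifted(+1, +1) - shifted(+1, -1)
                       - shifted(-1, +1) + shifted(-1, -1)) / (4.0 * delta ** 2)
    eigs = np.linalg.eigvalsh(0.5 * (H + H.T))   # symmetrize before the eigen-test
    return eigs.min() < 0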
47 Not only do unbelievable marginals exist, but they are actually quite common, as we will see in Section 3. [sent-109, score-0.68]
48 On the other hand, all marginals with sufficiently small correlations are believable because they are guaranteed to have a positive-definite Bethe Hessian [12]. [sent-111, score-0.446]
49 5 Bethe wake-sleep algorithm When pseudo-moment matching fails to reproduce unbelievable marginals, an alternative is to use a gradient descent procedure for learning, analogous to the wake-sleep algorithm used to train Boltzmann machines [13]. [sent-114, score-0.394]
50 That original rule can be derived as gradient descent of the Kullback-Leibler divergence D_KL between the target P(x) and the Boltzmann distribution Q_0(x) (1), D_KL[P||Q_0] = Σ_x P(x) log(P(x)/Q_0(x)) = F[P] − F[Q_0] ≥ 0 (17), where F is the Gibbs free energy (5). [sent-115, score-0.428]
51 Note that this free energy depends on the same energy function E (2) that defines the Boltzmann distribution Q_0 (1), and achieves its minimal value of −log Z for that distribution. [sent-116, score-0.49]
52 By changing the energy E and thus Q0 to decrease this divergence, the graphical model moves closer to the target distribution. [sent-118, score-0.278]
53 Furthermore, the entropy terms of the free energies do not depend explicitly on θ, so dD/dθ = ∂U(p)/∂θ − ∂U(b)/∂θ = −η(p) + η(b) (20), where η(q) = Σ_x q(x) φ(x) are the expectations of the sufficient statistics φ(x) under the pseudomarginals q. [sent-121, score-0.411]
54 At each step in learning, belief propagation is run, obtaining beliefs b for the current parameters θ. [sent-123, score-0.655]
55 This generally increases the Bethe free energy for the beliefs while decreasing that of the data, hopefully allowing BP to draw closer to the data marginals. [sent-125, score-0.527]
56 The Bethe wake-sleep learning rule sometimes places a minimum of F at the true data distribution, such that belief propagation can give the true marginals as one of its (possibly multiple) stable fixed points. [sent-131, score-0.899]
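A minimal sketch of this learning loop (my own illustration, continuing the hypothetical loopy_bp helper above): each iteration runs BP at the current parameters, compares the moments η(b) under the beliefs with the target moments η(p), and nudges (h, J) along the difference. The learning rate, iteration count, and the idea of recording the belief trajectory for later averaging are assumptions of the sketch.

import numpy as np

def bethe_wake_sleep(h, J, m1_p, m2_p, lr=0.02, steps=2000):
    """Bethe wake-sleep style updates toward target moments.
    m1_p[i]    = <x_i> under the target distribution p
    m2_p[i, j] = <x_i x_j> under p, given for each edge with i < j
    Returns the final parameters and the trajectory of beliefs."""
    h = np.array(h, float)
    J = np.array(J, float)
    edge = (J != 0)                       # graph structure fixed by the initial couplings
    traj = []
    for _ in range(steps):
        b, bij = loopy_bp(h, J)           # beliefs at the current parameters
        m1_b = b[:, 1] - b[:, 0]          # <x_i> under node beliefs
        m2_b = (bij[:, :, 0, 0] + bij[:, :, 1, 1]
                - bij[:, :, 0, 1] - bij[:, :, 1, 0])   # <x_i x_j> under pair beliefs
        # move parameters along eta(p) - eta(b)
        h += lr * (m1_p - m1_b)
        J += lr * (m2_p - m2_b) * np.triu(edge, 1)
        J = np.triu(J, 1) + np.triu(J, 1).T            # keep the couplings symmetric
        traj.append((b.copy(), bij.copy()))
    return h, J, traj

For believable targets this loop can settle at parameters where BP reproduces p; for unbelievable targets the parameters keep moving, which is exactly the behavior the next subsection exploits.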
57 6 Ensemble belief propagation When the Bethe wake-sleep algorithm attempts to learn unbelievable marginals, the parameters and beliefs do not reach a fixed point but instead continue to vary over time (Figure 2A,B). [sent-134, score-0.994]
58 Still, if learning reaches equilibrium, then the temporal average of beliefs is equal to the unbelievable marginals. [sent-135, score-0.538]
59 If the Bethe wake-sleep algorithm reaches equilibrium, then unbelievable marginals are matched by the belief propagation stable fixed points averaged over the equilibrium ensemble of parameters. [sent-137, score-1.415]
60 After learning has equilibrated, stable fixed points of belief propagation occur with just the right frequency so that they can be averaged together to reproduce the target distribution exactly (Figure 2C). [sent-142, score-0.654]
61 We call this inference algorithm ensemble belief propagation (eBP). [sent-144, score-0.525]
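The averaging step itself is then simple; a sketch (mine, operating on the hypothetical trajectory returned by bethe_wake_sleep above), where the burn-in length is an assumed tuning choice:

import numpy as np

def ensemble_bp(traj, burn_in=500):
    """Average node and pair beliefs over the equilibrium portion of the
    learning trajectory; these averages serve as the eBP marginals."""
    tail = traj[burn_in:]
    b_bar = np.mean([b for b, _ in tail], axis=0)
    bij_bar = np.mean([bij for _, bij in tail], axis=0)
    return b_bar, bij_bar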
62 Ensemble BP produces perfect marginals by exploiting constant, small-amplitude learning, and thus assumes that the correct marginals are perpetually available. [sent-145, score-0.726]
63 There may also be no equilibrium if belief propagation at each learning iteration fails to converge. [sent-151, score-0.478]
64 The energy function is E(x) = −Σ_i h_i x_i − Σ_{(ij)} J_ij x_i x_j. [sent-153, score-0.265]
65 We parameterize pseudomarginals as {q_i, q_ij} where q_i = q_i(x_i = +1) and q_ij = q_ij(x_i = x_j = +1) [8]. [sent-156, score-0.841]
66 Positivity constraints and local consistency constraints then appear as 0 ≤ q_i ≤ 1 and max(0, q_i + q_j − 1) ≤ q_ij ≤ min(q_i, q_j). [sent-158, score-0.549]
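For completeness, a small helper (an illustration, not the paper's code) that checks these positivity and local-consistency constraints in the {q_i, q_ij} parameterization:

def in_local_polytope(qi, qij, pairs, tol=1e-12):
    """True if the pseudomarginals satisfy 0 <= q_i <= 1 and
    max(0, q_i + q_j - 1) <= q_ij <= min(q_i, q_j) on every edge."""
    if not all(-tol <= q <= 1.0 + tol for q in qi):
        return False
    for i, j in pairs:
        lo = max(0.0, qi[i] + qi[j] - 1.0)
        hi = min(qi[i], qi[j])
        if not (lo - tol <= qij[i, j] <= hi + tol):
            return False
    return True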
67 (A) As learning proceeds, the Bethe wake-sleep algorithm causes parameters θ to converge on a discrete limit cycle when attempting to learn unbelievable marginals. [sent-161, score-0.39]
68 (C) The corresponding beliefs b during the limit cycle (blue circles), projected onto the first two principal components v_1 and v_2 of the trajectory through pseudomarginal space. [sent-163, score-0.343]
69 Believable regions of pseudomarginal space are colored with cyan and the unbelievable regions with yellow, and inconsistent pseudomarginals are black. [sent-164, score-0.627]
70 Over the limit cycle, the average beliefs b̄ (blue ×) are precisely equal to the target marginals p (black squares). [sent-165, score-0.63]
71 The average b̄ (red +) over many stable fixed points of BP (red dots) generated from randomly perturbed parameters θ̄ + δθ still produces a better approximation of the target marginals than any of the individual believable stable fixed points. [sent-166, score-0.808]
72 (D) Even the best amongst several BP stable fixed points cannot match unbelievable marginals (black and grey). [sent-167, score-0.826]
73 For J ≳ 1/3, most marginals cannot be reproduced by belief propagation with any parameters, because the Bethe Hessian (22) has a negative eigenvalue. [sent-170, score-0.786]
74 [Figure 3 panel residue: (A) fraction unbelievable vs. coupling standard deviation J, with BP and eBP curves; (B) Bethe divergence D[p||b] and (C) Euclidean distance |p − b| for methods i–v.] [sent-171, score-0.359]
75 Figure 3: Performance in learning unbelievable marginals. [sent-174, score-0.35]
76 (B,C) Performance of five models on 370 unbelievable random target marginals (Section 3), measured with Bethe divergence D[p||b] (B) and Euclidean distance |p − b| (C). [sent-177, score-0.774]
77 We generated 500 Ising model targets using J = 1/3, selected the unbelievable ones, and evaluated the performance of BP and ensemble BP for various methods of choosing parameters θ. [sent-181, score-0.425]
78 We evaluated BP performance for the actual parameters that generated the target (1), pseudo-moment matching (15), and at best-matching beliefs obtained at any time during Bethe wake-sleep learning. [sent-184, score-0.345]
79 Belief propagation gave a poor approximation of the target marginals, as expected for a model with many strong loops. [sent-186, score-0.284]
80 Even with learning, BP could never get the correct marginals, which was guaranteed by selection of unbelievable targets. [sent-187, score-0.317]
81 Using the exact parameter ensemble gave orders of magnitude improvement, limited by the number of beliefs being averaged. [sent-189, score-0.32]
82 For another example, when the Hessian is positive definite throughout pseudomarginal space, then the Bethe free energy is convex and thus BP has a unique stable fixed point [18]. [sent-193, score-0.528]
83 One might hope that by adjusting the parameters of belief propagation in some systematic way, θ → … [sent-195, score-0.48]
84 In this paper we proved that this is a futile hope, because belief propagation simply can never converge to certain marginals. [sent-197, score-0.443]
85 However, we also provided an algorithm that does work: Ensemble belief propagation uses BP on several different parameters with different stable fixed points and averages the results. [sent-198, score-0.595]
86 When belief propagation is used during learning, the model will fail even on known training examples if they happen to be unbelievable. [sent-204, score-0.423]
87 This paper addressed learning in fully-observed models only, where marginals for all variables were available during training. [sent-207, score-0.363]
88 Yet unbelievable marginals exist for models with hidden variables as well. [sent-208, score-0.699]
89 When inference is hard, neural computations may resort to approximations, perhaps including belief propagation [20, 21, 22, 23, 24]. [sent-211, score-0.443]
90 Note added in proof: After submission of this work, [25] presented partially overlapping results showing that some marginals cannot be achieved by belief propagation. [sent-218, score-0.58]
91 [7] Heskes T (2003) Stable fixed points of loopy belief propagation are minima of the Bethe free energy. [sent-234, score-0.728]
92 [8] Welling M, Teh Y (2001) Belief optimization for binary networks: A stable alternative to loopy belief propagation. [sent-236, score-0.444]
93 [9] Wainwright MJ, Jaakkola TS, Willsky AS (2003) Tree-reweighted belief propagation algorithms and approximate ML estimation by pseudo-moment matching. [sent-241, score-0.423]
94 [12] Watanabe Y, Fukumizu K (2011) Loopy belief propagation, Bethe free energy and graph zeta function. [sent-248, score-0.522]
95 [15] Yedidia J, Freeman W, Weiss Y (2005) Constructing free-energy approximations and generalized belief propagation algorithms. [sent-258, score-0.423]
96 [16] Mooij J, Kappen H (2005) On the properties of the Bethe approximation and loopy belief propagation on binary networks. [sent-260, score-0.537]
97 [17] Mooij J, Kappen H (2005) Validity estimates for loopy belief propagation on binary real-world networks. [sent-262, score-0.537]
98 [18] Heskes T (2004) On the uniqueness of loopy belief propagation fixed points. [sent-266, score-0.522]
99 [22] Ott T, Stoop R (2007) The neurodynamics of belief propagation on binary Markov random fields. [sent-274, score-0.438]
100 [23] Shon A, Rao R (2005) Implementing belief propagation in neural circuits. [sent-278, score-0.423]
wordName wordTfidf (topN-words)
[('bethe', 0.525), ('marginals', 0.363), ('unbelievable', 0.317), ('bp', 0.291), ('belief', 0.217), ('beliefs', 0.206), ('propagation', 0.206), ('pseudomarginals', 0.17), ('energy', 0.165), ('qi', 0.146), ('free', 0.14), ('ebp', 0.124), ('qij', 0.119), ('stable', 0.113), ('pseudomarginal', 0.11), ('loopy', 0.099), ('hessian', 0.088), ('believable', 0.083), ('ensemble', 0.082), ('target', 0.061), ('energies', 0.059), ('equilibrium', 0.055), ('qj', 0.053), ('messages', 0.052), ('graphical', 0.052), ('jij', 0.05), ('xi', 0.048), ('qik', 0.045), ('entropy', 0.042), ('couplings', 0.036), ('rochester', 0.036), ('divergence', 0.033), ('minima', 0.033), ('points', 0.033), ('ising', 0.031), ('welling', 0.03), ('xed', 0.03), ('cyan', 0.03), ('hi', 0.03), ('gradient', 0.029), ('ken', 0.028), ('pseudomoment', 0.028), ('tilt', 0.028), ('xaq', 0.028), ('cycle', 0.027), ('dd', 0.026), ('parameters', 0.026), ('di', 0.025), ('boltzmann', 0.025), ('reproduce', 0.024), ('compensate', 0.024), ('matching', 0.024), ('gibbs', 0.023), ('reach', 0.022), ('xj', 0.022), ('yellow', 0.022), ('iv', 0.021), ('qk', 0.021), ('reproduces', 0.021), ('kappen', 0.021), ('inference', 0.02), ('minimal', 0.02), ('converge', 0.02), ('averaging', 0.02), ('dkl', 0.02), ('heskes', 0.02), ('mooij', 0.02), ('inferences', 0.02), ('exist', 0.019), ('probabilities', 0.019), ('yedidia', 0.019), ('stationary', 0.019), ('pairwise', 0.019), ('yet', 0.019), ('node', 0.018), ('mj', 0.018), ('local', 0.018), ('gave', 0.017), ('blind', 0.017), ('lq', 0.017), ('draw', 0.016), ('positivity', 0.016), ('adjusting', 0.016), ('ny', 0.016), ('perturbed', 0.016), ('run', 0.016), ('exact', 0.015), ('overcomplete', 0.015), ('reaches', 0.015), ('binary', 0.015), ('graphs', 0.015), ('hope', 0.015), ('realized', 0.015), ('factor', 0.015), ('marginal', 0.015), ('consistency', 0.014), ('dots', 0.014), ('arti', 0.014), ('matched', 0.014), ('slice', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 158 nips-2011-Learning unbelievable probabilities
Author: Xaq Pitkow, Yashar Ahmadian, Ken D. Miller
Abstract: Loopy belief propagation performs approximate inference on graphical models with loops. One might hope to compensate for the approximation by adjusting model parameters. Learning algorithms for this purpose have been explored previously, and the claim has been made that every set of locally consistent marginals can arise from belief propagation run on a graphical model. On the contrary, here we show that many probability distributions have marginals that cannot be reached by belief propagation using any set of model parameters or any learning algorithm. We call such marginals ‘unbelievable.’ This problem occurs whenever the Hessian of the Bethe free energy is not positive-definite at the target marginals. All learning algorithms for belief propagation necessarily fail in these cases, producing beliefs or sets of beliefs that may even be worse than the pre-learning approximation. We then show that averaging inaccurate beliefs, each obtained from belief propagation using model parameters perturbed about some learned mean values, can achieve the unbelievable marginals. 1
2 0.26425132 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs
Author: Yusuke Watanabe
Abstract: While loopy Belief Propagation (LBP) has been utilized in a wide variety of applications with empirical success, it comes with few theoretical guarantees. Especially, if the interactions of random variables in a graphical model are strong, the behaviors of the algorithm can be difficult to analyze due to underlying phase transitions. In this paper, we develop a novel approach to the uniqueness problem of the LBP fixed point; our new “necessary and sufficient” condition is stated in terms of graphs and signs, where the sign denotes the types (attractive/repulsive) of the interaction (i.e., compatibility function) on the edge. In all previous works, uniqueness is guaranteed only in the situations where the strength of the interactions are “sufficiently” small in certain senses. In contrast, our condition covers arbitrary strong interactions on the specified class of signed graphs. The result of this paper is based on the recent theoretical advance in the LBP algorithm; the connection with the graph zeta function.
3 0.12372596 31 nips-2011-An Application of Tree-Structured Expectation Propagation for Channel Decoding
Author: Pablo M. Olmos, Luis Salamanca, Juan Fuentes, Fernando Pérez-Cruz
Abstract: We show an application of a tree structure for approximate inference in graphical models using the expectation propagation algorithm. These approximations are typically used over graphs with short-range cycles. We demonstrate that these approximations also help in sparse graphs with long-range loops, as the ones used in coding theory to approach channel capacity. For asymptotically large sparse graphs, the expectation propagation algorithm together with the tree structure yields a completely disconnected approximation to the graphical model but, for finite-length practical sparse graphs, the tree structure approximation to the code graph provides accurate estimates for the marginal of each variable. Furthermore, we propose a new method for constructing the tree structure on the fly that might be more amenable for sparse graphs with general factors. 1
4 0.1192392 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
Author: Jiarong Jiang, Piyush Rai, Hal Daume
Abstract: We consider a general inference setting for discrete probabilistic graphical models where we seek maximum a posteriori (MAP) estimates for a subset of the random variables (max nodes), marginalizing over the rest (sum nodes). We present a hybrid message-passing algorithm to accomplish this. The hybrid algorithm passes a mix of sum and max messages depending on the type of source node (sum or max). We derive our algorithm by showing that it falls out as the solution of a particular relaxation of a variational framework. We further show that the Expectation Maximization algorithm can be seen as an approximation to our algorithm. Experimental results on synthetic and real-world datasets, against several baselines, demonstrate the efficacy of our proposed algorithm. 1
5 0.10015263 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
Author: Rina Foygel, Ohad Shamir, Nati Srebro, Ruslan Salakhutdinov
Abstract: We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions. We show that the standard weighted-trace norm might fail when the sampling distribution is not a product distribution (i.e. when row and column indexes are not selected independently), present a corrected variant for which we establish strong learning guarantees, and demonstrate that it works better in practice. We provide guarantees when weighting by either the true or empirical sampling distribution, and suggest that even if the true distribution is known (or is uniform), weighting by the empirical distribution may be beneficial. 1
6 0.098668583 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation
7 0.081212521 229 nips-2011-Query-Aware MCMC
8 0.079981215 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems
9 0.075110368 79 nips-2011-Efficient Offline Communication Policies for Factored Multiagent POMDPs
10 0.073157221 212 nips-2011-Periodic Finite State Controllers for Efficient POMDP and DEC-POMDP Planning
11 0.061631288 306 nips-2011-t-divergence Based Approximate Inference
12 0.061192367 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
13 0.060255453 188 nips-2011-Non-conjugate Variational Message Passing for Multinomial and Binary Regression
14 0.055388913 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
15 0.051388077 174 nips-2011-Monte Carlo Value Iteration with Macro-Actions
16 0.051040728 253 nips-2011-Signal Estimation Under Random Time-Warpings and Nonlinear Signal Alignment
17 0.050962262 246 nips-2011-Selective Prediction of Financial Trends with Hidden Markov Models
18 0.050426722 137 nips-2011-Iterative Learning for Reliable Crowdsourcing Systems
19 0.048525561 123 nips-2011-How biased are maximum entropy models?
20 0.047941409 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)
topicId topicWeight
[(0, 0.13), (1, 0.009), (2, 0.021), (3, -0.013), (4, -0.06), (5, -0.044), (6, -0.06), (7, -0.088), (8, 0.05), (9, -0.134), (10, -0.048), (11, -0.088), (12, -0.065), (13, -0.04), (14, -0.048), (15, -0.0), (16, 0.114), (17, -0.08), (18, -0.015), (19, 0.078), (20, -0.021), (21, 0.066), (22, 0.204), (23, -0.074), (24, 0.231), (25, 0.034), (26, -0.057), (27, -0.018), (28, -0.02), (29, 0.075), (30, -0.099), (31, -0.01), (32, -0.006), (33, 0.106), (34, -0.031), (35, 0.124), (36, -0.118), (37, -0.066), (38, -0.097), (39, 0.093), (40, 0.076), (41, 0.137), (42, 0.021), (43, -0.101), (44, 0.007), (45, -0.012), (46, -0.157), (47, -0.09), (48, 0.016), (49, -0.071)]
simIndex simValue paperId paperTitle
same-paper 1 0.96257728 158 nips-2011-Learning unbelievable probabilities
Author: Xaq Pitkow, Yashar Ahmadian, Ken D. Miller
Abstract: Loopy belief propagation performs approximate inference on graphical models with loops. One might hope to compensate for the approximation by adjusting model parameters. Learning algorithms for this purpose have been explored previously, and the claim has been made that every set of locally consistent marginals can arise from belief propagation run on a graphical model. On the contrary, here we show that many probability distributions have marginals that cannot be reached by belief propagation using any set of model parameters or any learning algorithm. We call such marginals ‘unbelievable.’ This problem occurs whenever the Hessian of the Bethe free energy is not positive-definite at the target marginals. All learning algorithms for belief propagation necessarily fail in these cases, producing beliefs or sets of beliefs that may even be worse than the pre-learning approximation. We then show that averaging inaccurate beliefs, each obtained from belief propagation using model parameters perturbed about some learned mean values, can achieve the unbelievable marginals. 1
2 0.80019021 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs
Author: Yusuke Watanabe
Abstract: While loopy Belief Propagation (LBP) has been utilized in a wide variety of applications with empirical success, it comes with few theoretical guarantees. Especially, if the interactions of random variables in a graphical model are strong, the behaviors of the algorithm can be difficult to analyze due to underlying phase transitions. In this paper, we develop a novel approach to the uniqueness problem of the LBP fixed point; our new “necessary and sufficient” condition is stated in terms of graphs and signs, where the sign denotes the types (attractive/repulsive) of the interaction (i.e., compatibility function) on the edge. In all previous works, uniqueness is guaranteed only in the situations where the strength of the interactions are “sufficiently” small in certain senses. In contrast, our condition covers arbitrary strong interactions on the specified class of signed graphs. The result of this paper is based on the recent theoretical advance in the LBP algorithm; the connection with the graph zeta function.
3 0.71658683 31 nips-2011-An Application of Tree-Structured Expectation Propagation for Channel Decoding
Author: Pablo M. Olmos, Luis Salamanca, Juan Fuentes, Fernando Pérez-Cruz
Abstract: We show an application of a tree structure for approximate inference in graphical models using the expectation propagation algorithm. These approximations are typically used over graphs with short-range cycles. We demonstrate that these approximations also help in sparse graphs with long-range loops, as the ones used in coding theory to approach channel capacity. For asymptotically large sparse graphs, the expectation propagation algorithm together with the tree structure yields a completely disconnected approximation to the graphical model but, for finite-length practical sparse graphs, the tree structure approximation to the code graph provides accurate estimates for the marginal of each variable. Furthermore, we propose a new method for constructing the tree structure on the fly that might be more amenable for sparse graphs with general factors. 1
4 0.47423422 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
Author: Jiarong Jiang, Piyush Rai, Hal Daume
Abstract: We consider a general inference setting for discrete probabilistic graphical models where we seek maximum a posteriori (MAP) estimates for a subset of the random variables (max nodes), marginalizing over the rest (sum nodes). We present a hybrid message-passing algorithm to accomplish this. The hybrid algorithm passes a mix of sum and max messages depending on the type of source node (sum or max). We derive our algorithm by showing that it falls out as the solution of a particular relaxation of a variational framework. We further show that the Expectation Maximization algorithm can be seen as an approximation to our algorithm. Experimental results on synthetic and real-world datasets, against several baselines, demonstrate the efficacy of our proposed algorithm. 1
5 0.46323395 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
Author: Animashree Anandkumar, Vincent Tan, Alan S. Willsky
Abstract: We consider the problem of Ising and Gaussian graphical model selection given n i.i.d. samples from the model. We propose an efficient threshold-based algorithm for structure estimation based on conditional mutual information thresholding. This simple local algorithm requires only low-order statistics of the data and decides whether two nodes are neighbors in the unknown graph. We identify graph families for which the proposed algorithm has low sample and computational complexities. Under some transparent assumptions, we establish that the proposed algorithm is structurally consistent (or sparsistent) when the number of samples scales as n = Ω(J_min^{−4} log p), where p is the number of nodes and J_min is the minimum edge potential. We also develop novel non-asymptotic techniques for obtaining necessary conditions for graphical model selection. Keywords: Graphical model selection, high-dimensional learning, local-separation property, necessary conditions, typical sets, Fano’s inequality.
6 0.44473773 274 nips-2011-Structure Learning for Optimization
7 0.43146688 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation
8 0.4220373 212 nips-2011-Periodic Finite State Controllers for Efficient POMDP and DEC-POMDP Planning
9 0.405801 306 nips-2011-t-divergence Based Approximate Inference
10 0.39901119 253 nips-2011-Signal Estimation Under Random Time-Warpings and Nonlinear Signal Alignment
11 0.39482081 188 nips-2011-Non-conjugate Variational Message Passing for Multinomial and Binary Regression
12 0.38868517 67 nips-2011-Data Skeletonization via Reeb Graphs
13 0.36207572 79 nips-2011-Efficient Offline Communication Policies for Factored Multiagent POMDPs
14 0.32912332 229 nips-2011-Query-Aware MCMC
15 0.32657066 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
16 0.32595721 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC
17 0.32450664 55 nips-2011-Collective Graphical Models
18 0.31510416 225 nips-2011-Probabilistic amplitude and frequency demodulation
19 0.31081232 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems
20 0.30728701 137 nips-2011-Iterative Learning for Reliable Crowdsourcing Systems
topicId topicWeight
[(0, 0.012), (4, 0.043), (20, 0.025), (26, 0.047), (31, 0.112), (33, 0.039), (34, 0.281), (43, 0.048), (45, 0.06), (57, 0.024), (65, 0.017), (74, 0.08), (83, 0.056), (84, 0.011), (89, 0.015), (99, 0.038)]
simIndex simValue paperId paperTitle
1 0.81975943 11 nips-2011-A Reinforcement Learning Theory for Homeostatic Regulation
Author: Mehdi Keramati, Boris S. Gutkin
Abstract: Reinforcement learning models address animal’s behavioral adaptation to its changing “external” environment, and are based on the assumption that Pavlovian, habitual and goal-directed responses seek to maximize reward acquisition. Negative-feedback models of homeostatic regulation, on the other hand, are concerned with behavioral adaptation in response to the “internal” state of the animal, and assume that animals’ behavioral objective is to minimize deviations of some key physiological variables from their hypothetical setpoints. Building upon the drive-reduction theory of reward, we propose a new analytical framework that integrates learning and regulatory systems, such that the two seemingly unrelated objectives of reward maximization and physiological-stability prove to be identical. The proposed theory shows behavioral adaptation to both internal and external states in a disciplined way. We further show that the proposed framework allows for a unified explanation of some behavioral pattern like motivational sensitivity of different associative learning mechanism, anticipatory responses, interaction among competing motivational systems, and risk aversion.
same-paper 2 0.76898259 158 nips-2011-Learning unbelievable probabilities
Author: Xaq Pitkow, Yashar Ahmadian, Ken D. Miller
Abstract: Loopy belief propagation performs approximate inference on graphical models with loops. One might hope to compensate for the approximation by adjusting model parameters. Learning algorithms for this purpose have been explored previously, and the claim has been made that every set of locally consistent marginals can arise from belief propagation run on a graphical model. On the contrary, here we show that many probability distributions have marginals that cannot be reached by belief propagation using any set of model parameters or any learning algorithm. We call such marginals ‘unbelievable.’ This problem occurs whenever the Hessian of the Bethe free energy is not positive-definite at the target marginals. All learning algorithms for belief propagation necessarily fail in these cases, producing beliefs or sets of beliefs that may even be worse than the pre-learning approximation. We then show that averaging inaccurate beliefs, each obtained from belief propagation using model parameters perturbed about some learned mean values, can achieve the unbelievable marginals. 1
3 0.62574983 257 nips-2011-SpaRCS: Recovering low-rank and sparse matrices from compressive measurements
Author: Andrew E. Waters, Aswin C. Sankaranarayanan, Richard Baraniuk
Abstract: We consider the problem of recovering a matrix M that is the sum of a low-rank matrix L and a sparse matrix S from a small set of linear measurements of the form y = A(M) = A(L + S). This model subsumes three important classes of signal recovery problems: compressive sensing, affine rank minimization, and robust principal component analysis. We propose a natural optimization problem for signal recovery under this model and develop a new greedy algorithm called SpaRCS to solve it. Empirically, SpaRCS inherits a number of desirable properties from the state-of-the-art CoSaMP and ADMiRA algorithms, including exponential convergence and efficient implementation. Simulation results with video compressive sensing, hyperspectral imaging, and robust matrix completion data sets demonstrate both the accuracy and efficacy of the algorithm. 1
4 0.58683795 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
Author: Le Song, Eric P. Xing, Ankur P. Parikh
Abstract: Latent tree graphical models are natural tools for expressing long range and hierarchical dependencies among many variables which are common in computer vision, bioinformatics and natural language processing problems. However, existing models are largely restricted to discrete and Gaussian variables due to computational constraints; furthermore, algorithms for estimating the latent tree structure and learning the model parameters are largely restricted to heuristic local search. We present a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables. Our method can recover the latent tree structures with provable guarantees and perform local-minimum free parameter learning and efficient inference. Experiments on simulated and real data show the advantage of our proposed approach. 1
5 0.54826856 157 nips-2011-Learning to Search Efficiently in High Dimensions
Author: Zhen Li, Huazhong Ning, Liangliang Cao, Tong Zhang, Yihong Gong, Thomas S. Huang
Abstract: High dimensional similarity search in large scale databases becomes an important challenge due to the advent of the Internet. For such applications, specialized data structures are required to achieve computational efficiency. Traditional approaches relied on algorithmic constructions that are often data independent (such as Locality Sensitive Hashing) or weakly dependent (such as kd-trees, k-means trees). While supervised learning algorithms have been applied to related problems, those proposed in the literature mainly focused on learning hash codes optimized for compact embedding of the data rather than search efficiency. Consequently such an embedding has to be used with linear scan or another search algorithm. Hence learning to hash does not directly address the search efficiency issue. This paper considers a new framework that applies supervised learning to directly optimize a data structure that supports efficient large scale search. Our approach takes both search quality and computational cost into consideration. Specifically, we learn a boosted search forest that is optimized using pair-wise similarity labeled examples. The output of this search forest can be efficiently converted into an inverted indexing data structure, which can leverage modern text search infrastructure to achieve both scalability and efficiency. Experimental results show that our approach significantly outperforms the state-of-the-art learning to hash methods (such as spectral hashing), as well as state-of-the-art high dimensional search algorithms (such as LSH and k-means trees).
6 0.54807079 75 nips-2011-Dynamical segmentation of single trials from population neural data
7 0.51940286 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
8 0.51508689 249 nips-2011-Sequence learning with hidden units in spiking neural networks
9 0.51172191 229 nips-2011-Query-Aware MCMC
10 0.50820053 68 nips-2011-Demixed Principal Component Analysis
11 0.50747305 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs
12 0.50742429 133 nips-2011-Inferring spike-timing-dependent plasticity from spike train data
13 0.50341737 66 nips-2011-Crowdclustering
14 0.50338149 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
15 0.50009507 102 nips-2011-Generalised Coupled Tensor Factorisation
16 0.49986276 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing
17 0.49947387 180 nips-2011-Multiple Instance Filtering
18 0.49890986 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems
19 0.4987334 221 nips-2011-Priors over Recurrent Continuous Time Processes
20 0.49841145 258 nips-2011-Sparse Bayesian Multi-Task Learning