nips2003-169 (knowledge-graph by maker-knowledge-mining)

169 nips-2003-Sample Propagation


Source: pdf

Author: Mark A. Paskin

Abstract: Rao–Blackwellization is an approximation technique for probabilistic inference that flexibly combines exact inference with sampling. It is useful in models where conditioning on some of the variables leaves a simpler inference problem that can be solved tractably. This paper presents Sample Propagation, an efficient implementation of Rao–Blackwellized approximate inference for a large class of models. Sample Propagation tightly integrates sampling with message passing in a junction tree, and is named for its simple, appealing structure: it walks the clusters of a junction tree, sampling some of the current cluster’s variables and then passing a message to one of its neighbors. We discuss the application of Sample Propagation to conditional Gaussian inference problems such as switching linear dynamical systems. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Sample Propagation tightly integrates sampling with message passing in a junction tree, and is named for its simple, appealing structure: it walks the clusters of a junction tree, sampling some of the current cluster’s variables and then passing a message to one of its neighbors. [sent-6, score-1.935]

2 (Section 1: Introduction) Message passing on junction trees is an efficient means of solving many probabilistic inference problems [1, 2]. [sent-8, score-0.589]

3 This happens when the messages become too expensive to compute, as in discrete models of large treewidth or conditional Gaussian models [3]. [sent-10, score-0.445]

4 In these settings it is natural to investigate whether junction tree techniques can be combined with sampling to yield fast, accurate approximate inference algorithms. [sent-11, score-0.665]

5 This strategy has two disadvantages: first, the samples must be stored, which limits the sample size by space constraints (rather than time constraints); and second, variables are sampled using only local information, leading to samples that may not be likely under the entire model. [sent-13, score-0.282]

6 Another way to integrate sampling and message passing is via Rao–Blackwellization, where we repeatedly sample a subset of the model’s variables and then compute all of the messages exactly, conditioned on these sample values. [sent-14, score-1.18]

7 This technique, suggested in [6] and studied in [7], yields a powerful and flexible approximate inference algorithm; however, it can be expensive because the junction tree algorithm must be run for every sample. [sent-15, score-0.617]

8 In this paper, we present a simple implementation of Rao–Blackwellized approximate inference that avoids running the entire junction tree algorithm for every sample. [sent-16, score-0.588]

9 We develop a new message passing algorithm for junction trees that supports fast retraction of evidence, and we tightly integrate it with a blocking Gibbs sampler so that only one message must be recomputed per sample. [sent-17, score-1.393]

10 The resulting algorithm, Sample Propagation, has an appealing structure: it walks the clusters of a junction tree, resampling some of the current cluster’s variables and then passing a message to the next cluster in the walk. [sent-18, score-1.157]

11 (Section 2: Rao–Blackwellized approximation using junction tree inference) We start by presenting our notation and assumptions on the probability model. [sent-19, score-0.588]

12 Then we summarize the three basic ingredients of our approach: message passing in a junction tree, Rao–Blackwellized approximation, and sampling via Markov chain Monte Carlo. [sent-20, score-0.981]

13 We use pA (·) to denote the marginal density of XA and pA|B (· | ·) to denote the conditional density of XA given XB . [sent-39, score-0.301]

14 (Section heading: Junction tree inference) Given a density of the form (1), we view the problem of probabilistic inference as that of computing the expectation of a function f: $E[f(X)] = \sum_{u \in X} p(u) f(u)$. [sent-42, score-0.476]

15 We can compute this marginal via message passing on a junction tree [1, 2]. [sent-46, score-1.074]

16 A junction tree for C is a singly-connected, undirected graph (C, E) with the junction tree property: for each pair of nodes (or clusters) A, B ∈ C that contain some i ∈ I, every cluster on the unique path between A and B also contains i. [sent-47, score-1.199]
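The junction tree (running-intersection) property described in sentence 16 can be checked mechanically. The Python sketch below is not from the paper; the helper names, cluster sets, and edges are invented for illustration, and a connected tree is assumed:

from itertools import combinations

def path(tree_adj, a, b):
    """Return the unique path of cluster indices from a to b in a tree (DFS)."""
    stack = [(a, [a])]
    while stack:
        node, p = stack.pop()
        if node == b:
            return p
        for nxt in tree_adj[node]:
            if nxt not in p:
                stack.append((nxt, p + [nxt]))
    return None

def has_junction_tree_property(clusters, edges):
    """clusters: list of frozensets of variable indices; edges: (i, j) cluster-index pairs."""
    adj = {i: set() for i in range(len(clusters))}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    for a, b in combinations(range(len(clusters)), 2):
        shared = clusters[a] & clusters[b]
        if not shared:
            continue
        # every cluster on the unique a-b path must contain each shared variable
        for c in path(adj, a, b):
            if not shared <= clusters[c]:
                return False
    return True

# Example: clusters {0,1}, {1,2}, {2,3} arranged in a chain satisfy the property.
C = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})]
print(has_junction_tree_property(C, [(0, 1), (1, 2)]))  # True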

17 In what follows we assume we have a junction tree for C with a cluster that covers D, the input of f . [sent-48, score-0.743]

18 (Such a junction tree can always be found, but we may have to enlarge the subsets in C.) [sent-49, score-0.528]

19 Whereas the HUGIN message passing algorithm [2] may be more familiar, Sample Propagation is most easily described by extending the Shafer–Shenoy algorithm [1]. [sent-50, score-0.463]

20 In this algorithm, we define for each edge B → C of the junction tree a potential over B ∩ C: $\mu_{BC}(u) \equiv \sum_{v \in X_{B \setminus C}} \psi_B(u \cup v) \prod_{(A,B) \in \mathcal{E},\, A \neq C} \mu_{AB}(u_A \cup v_A)$ for $u \in X_{B \cap C}$ (3); µBC is called the message from B to C. [sent-51, score-0.804]

21 Note that this definition is recursive—messages can depend on each other—with the base case being messages from leaf clusters of the junction tree. [sent-52, score-0.611]

22 For each cluster C we define a potential βC over C by $\beta_C(u) \equiv \psi_C(u) \prod_{(B,C) \in \mathcal{E}} \mu_{BC}(u_B)$ for $u \in X_C$ (4); βC is called the cluster belief of C, and it follows that $\beta_C \propto p_C$. [sent-53, score-0.446]
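To make equations (3) and (4) concrete, here is a small Python sketch on a hypothetical two-cluster junction tree over binary variables; the potential tables are invented and numpy arrays stand in for whatever potential representation is actually used:

import numpy as np

# Clusters B = {0,1} and C = {1,2}, separator {1}; arbitrary nonnegative tables.
psi_B = np.array([[1.0, 2.0], [3.0, 4.0]])   # axes: x0, x1
psi_C = np.array([[0.5, 1.5], [2.5, 0.5]])   # axes: x1, x2

# Message from B to C over the separator (eq. 3, leaf cluster): sum psi_B over x0.
mu_BC = psi_B.sum(axis=0)                    # indexed by x1
# Message from C to B: sum psi_C over x2.
mu_CB = psi_C.sum(axis=1)                    # indexed by x1

# Cluster beliefs (eq. 4): local potential times incoming messages.
beta_B = psi_B * mu_CB[np.newaxis, :]        # proportional to p(x0, x1)
beta_C = psi_C * mu_BC[:, np.newaxis]        # proportional to p(x1, x2)

# Both beliefs marginalize to the same (unnormalized) separator marginal over x1.
print(beta_B.sum(axis=0), beta_C.sum(axis=1))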

23 That is, the cluster beliefs are the marginals over their respective variables (up to renormalization). [sent-55, score-0.384]

24 Thus we can use the (normalized) cluster beliefs βC for some C ⊇ D to compute the expectation (2). [sent-56, score-0.342]

25 In what follows we will also be interested in computing conditional cluster densities given an evidence assignment w to a subset of the variables XE . [sent-57, score-0.649]

26 Because pI\E|E (u | w) ∝ p(u ∪ w), we can “enter in” this evidence by instantiating w in every cluster potential ψC . [sent-58, score-0.307]

27 The cluster beliefs (4) will then be proportional to the conditional density pC\E|E (· | w). [sent-59, score-0.443]

28 Junction tree inference is often the most efficient means of computing exact solutions to inference problems of the sort described above. [sent-60, score-0.395]

29 However, the sums required by the messages (3) or the function expectations (2) are often prohibitively expensive to compute. [sent-61, score-0.301]

30 If the variables are all finite-valued, this happens when the clusters of the junction tree are too large; if the model is conditional-Gaussian, this happens when the messages, which are mixtures of Gaussians, have too many mixture components [3]. [sent-62, score-0.676]

31 Many models have the property that while computing exact expectations is intractable, there exists a subset of random variables XE such that the conditional expectation E [f (XD ) | XE = xE ] can be computed efficiently. [sent-66, score-0.391]

32 Algorithm 1 (Rao–Blackwell estimation on a junction tree). Input: a set of samples {wn : 1 ≤ n ≤ N } of XE , a function f of XD , and a cluster C ⊇ D. Output: an estimate $\hat{f} \approx E[f(X_D)]$. 1: Initialize the estimator $\hat{f} = 0$. [sent-69, score-0.815]

33 2: for n = 1 to N do 3: Enter the assignment wn as evidence into the junction tree. [sent-70, score-0.725]

34 4: Use message passing to compute the beliefs βC ∝ pC\E|E (· | wn ) via (3) and (4). [sent-71, score-0.802]

35 5: Compute the expectation E [f (XD ) | wn ] via (7). [sent-72, score-0.284]

36 (Algorithm 1, step 7: Set $\hat{f}$ …) However, the Rao–Blackwellized estimate (6) is more expensive to compute than (5) because we must compute conditional expectations. [sent-75, score-0.279]

37 In many cases, message passing in a junction tree can be used to implement these computations (see Algorithm 1). [sent-76, score-0.961]

38 We can enter each sample assignment wn as evidence into the junction tree and use message passing to compute the conditional density pC\E|E (· | wn ) for some cluster C that covers D. [sent-77, score-2.196]

39 We then compute the conditional expectation as $E[f(X_D) \mid w^n] = \sum_{u \in X_{C \setminus E}} p_{C \setminus E|E}(u \mid w^n)\, f(u_D \cup w^n_D)$ (7). [sent-78, score-0.657]
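A minimal sketch of Algorithm 1 and equation (7) on a made-up model: one sampled binary variable plays the role of XE and one target binary variable plays XD, so exact conditioning is a table lookup that stands in for the message passing of steps 3–5. All numbers are invented:

import numpy as np

rng = np.random.default_rng(0)

# Toy joint over two binary variables: rows index x0 (the sampled set E),
# columns index x1 (the target set D).
joint = np.array([[0.10, 0.30],
                  [0.20, 0.40]])
p_x0 = joint.sum(axis=1)                      # marginal of the sampled variable
f = np.array([0.0, 1.0])                      # f(x1) = x1

# Algorithm 1 (sketch): average exact conditional expectations over samples of X_E.
N = 5000
samples = rng.choice(2, size=N, p=p_x0)       # stand-in for the samples w^n
estimate = 0.0
for x0 in samples:
    cond = joint[x0] / joint[x0].sum()        # p(x1 | x0): the "cluster belief" here
    estimate += cond @ f                      # eq. (7): E[f(X_D) | w^n]
estimate /= N

print(estimate, joint.sum(axis=0) @ f)        # RB estimate vs. exact E[f(X1)]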

40 Markov chain Monte Carlo (MCMC) is a powerful technique for generating samples from a complex distribution p; we design a Markov chain whose stationary distribution is p, and simulate the chain to obtain samples [8]. [sent-80, score-0.316]

41 To obtain the benefits of sampling in a smaller space, we would like to sample directly from the marginal pE ; however, this requires us to sum out the nuisance variables XI\E from the joint density p. [sent-83, score-0.378]

42 Blocking Gibbs sampling is particularly attractive in this setting because message passing can be used to implement the required marginalizations. [sent-84, score-0.54]

43 Assume that the current state of the Markov chain over XE is wn . [sent-85, score-0.295]

44 To generate the next state of the chain $w^{n+1}$ we choose a cluster C (randomly, or according to a schedule) and resample $X_{C \cap E}$ given $w^n_{E \setminus C}$. [sent-86, score-0.395]

45 That is, we resample the E variables within C given the E variables outside C. [sent-88, score-0.273]

46 The transition density can be computed by entering the evidence $w^n_{E \setminus C}$ into the junction tree, computing the cluster belief at C, and marginalizing down to a conditional density over XC∩E . [sent-89, score-0.948]

47 (Section 3: Sample Propagation) Algorithms 1 and 2 represent two of the three key ideas behind our proposal: both Gibbs sampling and Rao–Blackwellized estimation can be implemented efficiently using message passing on a junction tree. [sent-91, score-0.882]

48 The third idea is that these two uses of message passing can be interleaved so that each sample requires only one message to be computed. [sent-92, score-0.872]

49 (Footnote 1) Interestingly, the blocking Gibbs proposal [9] makes a different use of junction tree inference than we do here: they use message passing within a block of variables to efficiently generate a sample. [sent-93, score-1.264]

50 (Footnote 2) In cases where the transition density $p_{C \cap E|E \setminus C}(\cdot \mid w^n_{E \setminus C})$ is too large to represent or too difficult to sample from, we can use the Metropolis-Hastings algorithm, where we instead sample from a simpler proposal distribution qC∩E and then accept or reject the proposal [8]. [sent-94, score-0.346]

51 Algorithm 2 (Blocking Gibbs sampler on a junction tree). Input: a subset of variables XE to sample and a sample size N. Output: a set of samples {wn : 1 ≤ n ≤ N } of XE. 1: Choose an initial assignment w0 ∈ XE . [sent-95, score-1.082]

52 4: Enter the evidence $w^{n-1}_{E \setminus C}$ into the junction tree. [sent-97, score-0.413]

53 5: Use message passing to compute the beliefs $\beta_C \propto p_{C|E \setminus C}(\cdot \mid w^{n-1}_{E \setminus C})$ via (3) and (4). [sent-98, score-0.583]
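A compact sketch of Algorithm 2 on a made-up three-variable toy: the joint is stored as an explicit table, so exact conditioning on the evidence stands in for the message passing of steps 4–5, and the two-cluster schedule, all numbers, and the seed are invented (no burn-in or thinning; this is only a sketch):

import numpy as np

rng = np.random.default_rng(1)

# Toy joint over three binary variables as a 2x2x2 table, normalized.
p = np.array([[[1., 2.], [3., 4.]], [[2., 1.], [4., 3.]]])
p /= p.sum()
clusters = [(0, 1), (1, 2)]        # blocks to resample (sorted tuples so axes line up)

def block_conditional(table, block, state):
    """p(X_block | the other variables fixed to `state`); stands in for the cluster belief."""
    cond = table
    for i, v in enumerate(state):
        if i not in block:                     # clamp the evidence variables, keeping the axis
            cond = np.take(cond, [v], axis=i)
    cond = cond.squeeze(axis=tuple(i for i in range(table.ndim) if i not in block))
    return cond / cond.sum()

# Algorithm 2 (sketch): sweep over clusters, resampling each block given the rest.
state = [0, 0, 0]
count_x2 = 0
n_samples = 4000
for n in range(n_samples):
    C = clusters[n % len(clusters)]            # a fixed schedule
    cond = block_conditional(p, C, state)
    flat = rng.choice(cond.size, p=cond.ravel())
    for var, val in zip(C, np.unravel_index(flat, cond.shape)):
        state[var] = int(val)
    count_x2 += state[2]

print(count_x2 / n_samples, p.sum(axis=(0, 1))[1])   # Gibbs estimate vs. exact P(X2 = 1)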

54 The second advantage is that, by being selective about when the Rao–Blackwellized estimator is updated, we can compute the messages once, not twice, per sample. [sent-103, score-0.333]

55 When the Gibbs sampler chooses to resample a cluster C that covers D (the input of f ), we can update the Rao–Blackwellized estimator for free. [sent-104, score-0.499]

56 In particular, the Gibbs sampler computes the cluster belief $\beta_C \propto p_{C|E \setminus C}(\cdot \mid w^{n-1}_{E \setminus C})$ in order to compute the transition density $p_{C \cap E|E \setminus C}(\cdot \mid w^{n-1}_{E \setminus C})$. [sent-105, score-0.477]

57 Once it samples $w^n_{C \cap E}$ from this density, we can instantiate the sample in the belief βC to obtain the conditional density $p_{C \setminus E|E}(\cdot \mid w^n)$ needed by the Rao–Blackwellized estimator. [sent-106, score-0.652]

58 In fact, when it is tractable to do so, we can simply use the cluster belief βC to update the estimator in (7); because it treats more variables exactly, it can yield a lower-variance estimate. [sent-108, score-0.382]

59 Therefore, if we are willing to update the Rao–Blackwellized estimator only when the Gibbs sampler chooses a cluster that covers the function’s inputs, we can focus on reducing the computational requirements of the Gibbs sampler. [sent-109, score-0.408]

60 In parallel estimation problems where every cluster is computing expectations, every sample will be used to update an estimate, but not every estimate will be updated by every sample. [sent-111, score-0.358]

61 The Gibbs sampler computes the messages so that it can compute the cluster belief βC when it resamples within a cluster C. [sent-114, score-0.846]

62 An important property of the computation (4) is that it requires only those messages directed towards C; thus, we have again reduced by half the number of messages required per sample. [sent-115, score-0.524]

63 The difficulty in further minimizing the number of messages computed by the Gibbs sampler is that the evidence on the junction tree is constantly changing. [sent-116, score-0.918]

64 It will therefore be useful to modify the message passing so that, rather than instantiating all the evidence and then passing messages, the evidence is instantiated on the fly, on a per-cluster basis. [sent-117, score-0.821]

65 This is the conditional message from B to C given evidence w on XE . [sent-119, score-0.503]

66 2: Choose a cluster C1 ∈ C, initialize the estimator $\hat{f} = 0$, and set the sample count M = 0. [sent-121, score-0.354]

67 3: for n = 1 to N do; 4: Compute the conditional cluster belief $\beta_{C_n|E}(\cdot, w^{n-1}) \propto p_{C_n|E \setminus C_n}(\cdot \mid w^{n-1}_{E \setminus C_n})$ via (9). [sent-122, score-0.392]

68 9: Compute the expectation E [f (XD ) | wn ] via (7). [sent-126, score-0.284]

69 10: Set $\hat{f} = \hat{f} + E[f(X_D) \mid w^n]$ and increment the sample count M. [sent-127, score-0.322]

70 12: Recompute the message $\mu_{C_n C_{n+1}|E}(\cdot, w^n)$ via (8). [sent-129, score-0.548]

71 It is easy to verify that the conditional belief βC|E given by $\beta_{C|E}(u, w) \equiv \psi_C(u) \prod_{(B,C) \in \mathcal{E}} \mu_{BC|E}(u_B, w)$ for $u \in X_C$, $w \in X_E$ (9) is proportional to the conditional density $p_{C|E \setminus C}(u \mid w_{E \setminus C})$. [sent-132, score-0.36]

72 In particular, the conditional messages have the following important property: Proposition. [sent-134, score-0.36]

73 Let w and w′ be two assignments to E such that $w_{E \setminus D} = w'_{E \setminus D}$ for some cluster D. [sent-135, score-0.282]

74 Figure 1: A Venn diagram showing how the ranges of the assignment variables in (8) cover the cluster B. [sent-137, score-0.387]

75 Otherwise, by the junction property we know that if i ∈ B and i ∈ D, then i ∈ C, so again we get $w_{B \setminus C} = w'_{B \setminus C}$. [sent-142, score-0.341]

76 Thus, when we resample a cluster Cn , we have $w^{n-1}_{E \setminus C_n} = w^n_{E \setminus C_n}$ and so only those messages directed away from Cn change. [sent-143, score-0.562]

77 In addition, as argued above, when we resample Cn+1 in iteration n + 1, we only require the messages directed towards Cn+1 . [sent-144, score-0.359]

78 Combining these two arguments, we find that only the messages on the directed path from Cn to Cn+1 must be recomputed in iteration n. [sent-145, score-0.294]

79 If we choose Cn+1 to be a neighbor of Cn , we only have to recompute a single message in each iteration. [sent-146, score-0.354]
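A small sketch (not from the paper; the cluster tree and helper name are invented) of the bookkeeping in sentences 76–79: the messages to recompute in iteration n are exactly the directed edges on the path from Cn to Cn+1, so a neighbor walk recomputes a single message per step:

def directed_path_edges(adj, a, b):
    """Directed edges (u, v) on the unique path from cluster a to cluster b in a tree."""
    stack = [(a, [a])]
    while stack:
        node, path = stack.pop()
        if node == b:
            return list(zip(path[:-1], path[1:]))
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return []

# A small cluster tree: 0 - 1 - 2, with cluster 3 hanging off cluster 1.
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

# Neighbor walk (as in Sample Propagation): one message recomputed per step.
for a, b in [(0, 1), (1, 2), (2, 1), (1, 3)]:
    print(a, "->", b, "recompute:", directed_path_edges(adj, a, b))

# A non-neighbor jump would force recomputing every message on the path.
print(0, "->", 2, "recompute:", directed_path_edges(adj, 0, 2))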

80 (Footnote 3) The modified message passing scheme we describe can be viewed as an implementation of fast retraction for Shafer-Shenoy messages, analogous to the scheme described for HUGIN in [2, §6. [sent-148, score-0.496]

81 In the Shafer–Shenoy algorithm, the space complexity of representing the exact message (3) is O(|XB∩C |), and the time complexity of computing it is O(|XB |) (since for each assignment to B ∩ C we must sum over assignments to B\C). [sent-154, score-0.605]

82 In contrast, when computing the conditional message (8), we only sum over assignments to B\(C ∪ E), since E ∩ (B\C) is instantiated by the current sample. [sent-155, score-0.567]

83 This makes the conditional message cheaper to compute than the exact message: in the finite case the time complexity is O(|XB\(E∩(B\C)) |). [sent-156, score-0.575]

84 The space complexity of representing the conditional message is O(|XB∩C |)—the same as the exact message, since it is a potential over the same variables. [sent-157, score-0.495]

85 As we sample more variables, the conditional messages become cheaper to compute. [sent-158, score-0.492]

86 However, note that the space complexity of representing the conditional message is independent of the choice of sampled variables E; even if we sample a given variable, it remains a free parameter of the conditional message. [sent-159, score-0.786]

87 Thus, the time complexity of computing conditional messages can be reduced by sampling more variables, but only up to a point: the time complexity of computing the conditional message must be Ω(|XB∩C |). [sent-161, score-1.019]
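A made-up numeric instance of the complexity comparison in sentences 81–87, assuming 10 binary variables in B, 3 of them in the separator B∩C, and 5 of the 7 variables of B\C already in the sampled set E:

# Time is measured in table entries summed over; space in table entries stored.
n_B, n_sep, n_sampled = 10, 3, 5          # invented sizes
print("exact message (3):       time ~", 2 ** n_B, " space ~", 2 ** n_sep)
print("conditional message (8): time ~", 2 ** (n_B - n_sampled), " space ~", 2 ** n_sep)
# Sampling more of B\C shrinks the time, but never below the 2**n_sep entries of the
# message itself, which is the "up to a point" noted in sentence 87.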

88 However, to achieve this their algorithm runs the entire junction tree algorithm in each iteration, and does not reuse messages between iterations. [sent-163, score-0.732]

89 (Section 4: Application to conditional Gaussian models) A conditional Gaussian (CG) model is a probability distribution over a set of discrete variables {Xi : i ∈ ∆} and continuous variables {Xi : i ∈ Γ} such that the conditional distribution of XΓ given X∆ is multivariate Gaussian. [sent-165, score-0.59]

90 For example, consider polytree models: when all of the variables are discrete or all are Gaussian, exact inference is linear in the size of the model; but if the model is CG then approximate inference is NP-hard [11]. [sent-167, score-0.33]

91 In traditional junction tree inference, our goal is to compute the marginal for each cluster. [sent-168, score-0.588]

92 However, when p is a CG model, each cluster marginal is a mixture of |X∆ | Gaussians, and is intractable to represent. [sent-169, score-0.279]

93 That is, for each cluster we compute the best conditional Gaussian approximation of pC . [sent-172, score-0.38]

94 Unfortunately, it is often intractable because it requires strongly rooted junction trees, which can have clusters that contain most or all of the discrete variables [3]. [sent-174, score-0.535]

95 The structure of CG models makes it possible to use Sample Propagation to approximate the weak cluster marginals: we choose E = ∆, since conditioning on the discrete variables leaves a tractable Gaussian inference problem. [sent-175, score-0.461]

96 The expectations we must compute are of the sufficient statistics of the weak cluster marginals: for each cluster C, we need the distribution of XC∩∆ and the conditional means and covariances of XC∩Γ given XC∩∆ . [sent-176, score-0.643]
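A sketch, with invented numbers, of how the sufficient statistics in sentence 96 could be accumulated from Rao–Blackwellized samples: for each sample, message passing would deliver an exact conditional distribution over the cluster's discrete variables and a conditional Gaussian over its continuous variables; the stub below is a hypothetical stand-in for that computation, and the loop moment-matches the weak marginal:

import numpy as np

rng = np.random.default_rng(3)
K, d = 2, 2          # 2 discrete values in the cluster, 2 continuous dimensions (invented)

def cluster_belief_given_sample():
    """Hypothetical stand-in for the exact conditional cluster belief Sample Propagation
    would compute for one sample: a distribution over the cluster's discrete variable
    and a Gaussian (mean, covariance) for each discrete value."""
    q = rng.dirichlet(np.ones(K) * 2.0)
    means = [rng.normal(size=d), rng.normal(size=d) + 3.0]
    covs = [np.eye(d), 0.5 * np.eye(d)]
    return q, means, covs

N = 2000
w = np.zeros(K)                  # running estimate of P(X_{C∩∆} = k)
m1 = np.zeros((K, d))            # running estimate of E[X_{C∩Γ} 1{X_{C∩∆} = k}]
m2 = np.zeros((K, d, d))         # running estimate of E[X X^T 1{X_{C∩∆} = k}]
for _ in range(N):
    q, means, covs = cluster_belief_given_sample()
    for k in range(K):
        w[k] += q[k]
        m1[k] += q[k] * means[k]
        m2[k] += q[k] * (covs[k] + np.outer(means[k], means[k]))
w, m1, m2 = w / N, m1 / N, m2 / N

# Weak-marginal parameters: mixing probabilities, conditional means, conditional covariances.
cond_mean = m1 / w[:, None]
cond_cov = m2 / w[:, None, None] - np.einsum('ki,kj->kij', cond_mean, cond_mean)
print(w); print(cond_mean); print(cond_cov)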

97 (Footnote 5) We cannot choose E = Γ because computing the conditional messages (8) may require summing discrete variables out of CG potentials, which leads to representational difficulties [3]. [sent-181, score-0.536]

98 [Figure 2(b) residue: cluster labels Z2, Z3, …, ZT−1, ZT; plot legend: Assumed Density Filtering, Sample Propagation, Gibbs Sampling; y-axis: average position error] Lauritzen’s algorithm is intractable in this case because any strongly rooted junction tree for this network must have a cluster containing all of the discrete variables [3, Thm. [sent-197, score-0.883]

99 Both Gibbs sampling and Sample Propagation were run with a forwards–backwards sampling schedule; Sample Propagation used the junction tree of Figure 2(b). [sent-201, score-0.652]

100 HUGS: Combining exact inference and Gibbs sampling in junction trees. [sent-232, score-0.538]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('junction', 0.342), ('message', 0.306), ('messages', 0.234), ('wn', 0.219), ('cn', 0.218), ('xd', 0.218), ('blackwellized', 0.214), ('xe', 0.209), ('rao', 0.207), ('cluster', 0.203), ('pc', 0.185), ('gibbs', 0.183), ('passing', 0.157), ('tree', 0.156), ('xc', 0.142), ('conditional', 0.126), ('sampler', 0.115), ('propagation', 0.111), ('xb', 0.109), ('sample', 0.103), ('assignment', 0.093), ('resample', 0.091), ('variables', 0.091), ('inference', 0.09), ('blocking', 0.086), ('bc', 0.08), ('assignments', 0.079), ('wb', 0.078), ('cg', 0.078), ('sampling', 0.077), ('chain', 0.076), ('evidence', 0.071), ('density', 0.068), ('pcn', 0.066), ('xa', 0.061), ('instantiate', 0.052), ('compute', 0.051), ('bidyuk', 0.049), ('hugin', 0.049), ('ub', 0.049), ('estimator', 0.048), ('beliefs', 0.046), ('monte', 0.045), ('samples', 0.044), ('marginals', 0.044), ('shafer', 0.043), ('covers', 0.042), ('expectation', 0.042), ('belief', 0.04), ('enter', 0.04), ('markov', 0.039), ('marginal', 0.039), ('expectations', 0.038), ('carlo', 0.038), ('intractable', 0.037), ('zt', 0.037), ('lauritzen', 0.036), ('proposal', 0.036), ('subset', 0.035), ('clusters', 0.035), ('directed', 0.034), ('xi', 0.034), ('complexity', 0.034), ('blackwell', 0.033), ('blackwellization', 0.033), ('hugs', 0.033), ('instantiating', 0.033), ('retraction', 0.033), ('shenoy', 0.033), ('wc', 0.033), ('wcn', 0.033), ('ab', 0.033), ('kj', 0.033), ('backwards', 0.031), ('subsets', 0.03), ('hybrid', 0.03), ('computing', 0.03), ('discrete', 0.03), ('exact', 0.029), ('expensive', 0.029), ('cheaper', 0.029), ('dechter', 0.029), ('carter', 0.029), ('happens', 0.026), ('marginalize', 0.026), ('ud', 0.026), ('recomputed', 0.026), ('instantiated', 0.026), ('choose', 0.025), ('xt', 0.024), ('forwards', 0.024), ('position', 0.024), ('recompute', 0.023), ('walks', 0.023), ('via', 0.023), ('estimate', 0.022), ('reduced', 0.022), ('weak', 0.022), ('tightly', 0.022), ('dawid', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 169 nips-2003-Sample Propagation

Author: Mark A. Paskin

Abstract: Rao–Blackwellization is an approximation technique for probabilistic inference that flexibly combines exact inference with sampling. It is useful in models where conditioning on some of the variables leaves a simpler inference problem that can be solved tractably. This paper presents Sample Propagation, an efficient implementation of Rao–Blackwellized approximate inference for a large class of models. Sample Propagation tightly integrates sampling with message passing in a junction tree, and is named for its simple, appealing structure: it walks the clusters of a junction tree, sampling some of the current cluster’s variables and then passing a message to one of its neighbors. We discuss the application of Sample Propagation to conditional Gaussian inference problems such as switching linear dynamical systems. 1

2 0.19757882 189 nips-2003-Tree-structured Approximations by Expectation Propagation

Author: Yuan Qi, Tom Minka

Abstract: Approximation structure plays an important role in inference on loopy graphs. As a tractable structure, tree approximations have been utilized in the variational method of Ghahramani & Jordan (1997) and the sequential projection method of Frey et al. (2000). However, belief propagation represents each factor of the graph with a product of single-node messages. In this paper, belief propagation is extended to represent factors with tree approximations, by way of the expectation propagation framework. That is, each factor sends a “message” to all pairs of nodes in a tree structure. The result is more accurate inferences and more frequent convergence than ordinary belief propagation, at a lower cost than variational trees or double-loop algorithms. 1

3 0.17444634 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data

Author: Amos J. Storkey

Abstract: Discrete Fourier transforms and other related Fourier methods have been practically implementable due to the fast Fourier transform (FFT). However there are many situations where doing fast Fourier transforms without complete data would be desirable. In this paper it is recognised that formulating the FFT algorithm as a belief network allows suitable priors to be set for the Fourier coefficients. Furthermore efficient generalised belief propagation methods between clusters of four nodes enable the Fourier coefficients to be inferred and the missing data to be estimated in near to O(n log n) time, where n is the total of the given and missing data points. This method is compared with a number of common approaches such as setting missing data to zero or to interpolation. It is tested on generated data and for a Fourier analysis of a damaged audio signal. 1

4 0.16484004 58 nips-2003-Efficient Multiscale Sampling from Products of Gaussian Mixtures

Author: Alexander T. Ihler, Erik B. Sudderth, William T. Freeman, Alan S. Willsky

Abstract: The problem of approximating the product of several Gaussian mixture distributions arises in a number of contexts, including the nonparametric belief propagation (NBP) inference algorithm and the training of product of experts models. This paper develops two multiscale algorithms for sampling from a product of Gaussian mixtures, and compares their performance to existing methods. The first is a multiscale variant of previously proposed Monte Carlo techniques, with comparable theoretical guarantees but improved empirical convergence rates. The second makes use of approximate kernel density evaluation methods to construct a fast approximate sampler, which is guaranteed to sample points to within a tunable parameter of their true probability. We compare both multiscale samplers on a set of computational examples motivated by NBP, demonstrating significant improvements over existing methods. 1

5 0.13729867 82 nips-2003-Geometric Clustering Using the Information Bottleneck Method

Author: Susanne Still, William Bialek, Léon Bottou

Abstract: We argue that K–means and deterministic annealing algorithms for geometric clustering can be derived from the more general Information Bottleneck approach. If we cluster the identities of data points to preserve information about their location, the set of optimal solutions is massively degenerate. But if we treat the equations that define the optimal solution as an iterative algorithm, then a set of “smooth” initial conditions selects solutions with the desired geometrical properties. In addition to conceptual unification, we argue that this approach can be more efficient and robust than classic algorithms. 1

6 0.12277301 74 nips-2003-Finding the M Most Probable Configurations using Loopy Belief Propagation

7 0.11295025 32 nips-2003-Approximate Expectation Maximization

8 0.1109255 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation

9 0.10141076 152 nips-2003-Pairwise Clustering and Graphical Models

10 0.096832663 142 nips-2003-On the Concentration of Expectation and Approximate Inference in Layered Networks

11 0.09604951 117 nips-2003-Linear Response for Approximate Inference

12 0.084532641 73 nips-2003-Feature Selection in Clustering Problems

13 0.083590358 30 nips-2003-Approximability of Probability Distributions

14 0.082986675 100 nips-2003-Laplace Propagation

15 0.081354119 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs

16 0.078076363 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices

17 0.078047037 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles

18 0.070287324 111 nips-2003-Learning the k in k-means

19 0.069278218 46 nips-2003-Clustering with the Connectivity Kernel

20 0.066107936 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.203), (1, -0.052), (2, -0.113), (3, 0.254), (4, 0.042), (5, -0.151), (6, 0.068), (7, 0.167), (8, -0.014), (9, 0.102), (10, -0.075), (11, 0.099), (12, -0.022), (13, -0.065), (14, 0.033), (15, -0.012), (16, 0.007), (17, 0.049), (18, 0.06), (19, 0.051), (20, -0.029), (21, 0.142), (22, 0.027), (23, 0.022), (24, 0.011), (25, -0.069), (26, 0.128), (27, -0.013), (28, 0.062), (29, -0.135), (30, 0.004), (31, -0.04), (32, -0.105), (33, -0.027), (34, 0.025), (35, 0.005), (36, -0.133), (37, -0.104), (38, -0.012), (39, -0.123), (40, -0.043), (41, -0.024), (42, 0.071), (43, 0.119), (44, -0.081), (45, -0.011), (46, 0.026), (47, -0.04), (48, 0.096), (49, 0.08)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97642213 169 nips-2003-Sample Propagation

Author: Mark A. Paskin

Abstract: Rao–Blackwellization is an approximation technique for probabilistic inference that flexibly combines exact inference with sampling. It is useful in models where conditioning on some of the variables leaves a simpler inference problem that can be solved tractably. This paper presents Sample Propagation, an efficient implementation of Rao–Blackwellized approximate inference for a large class of models. Sample Propagation tightly integrates sampling with message passing in a junction tree, and is named for its simple, appealing structure: it walks the clusters of a junction tree, sampling some of the current cluster’s variables and then passing a message to one of its neighbors. We discuss the application of Sample Propagation to conditional Gaussian inference problems such as switching linear dynamical systems. 1

2 0.74980551 74 nips-2003-Finding the M Most Probable Configurations using Loopy Belief Propagation

Author: Chen Yanover, Yair Weiss

Abstract: Loopy belief propagation (BP) has been successfully used in a number of difficult graphical models to find the most probable configuration of the hidden variables. In applications ranging from protein folding to image analysis one would like to find not just the best configuration but rather the top M . While this problem has been solved using the junction tree formalism, in many real world problems the clique size in the junction tree is prohibitively large. In this work we address the problem of finding the M best configurations when exact inference is impossible. We start by developing a new exact inference algorithm for calculating the best configurations that uses only max-marginals. For approximate inference, we replace the max-marginals with the beliefs calculated using max-product BP and generalized BP. We show empirically that the algorithm can accurately and rapidly approximate the M best configurations in graphs with hundreds of variables. 1

3 0.6639716 58 nips-2003-Efficient Multiscale Sampling from Products of Gaussian Mixtures

Author: Alexander T. Ihler, Erik B. Sudderth, William T. Freeman, Alan S. Willsky

Abstract: The problem of approximating the product of several Gaussian mixture distributions arises in a number of contexts, including the nonparametric belief propagation (NBP) inference algorithm and the training of product of experts models. This paper develops two multiscale algorithms for sampling from a product of Gaussian mixtures, and compares their performance to existing methods. The first is a multiscale variant of previously proposed Monte Carlo techniques, with comparable theoretical guarantees but improved empirical convergence rates. The second makes use of approximate kernel density evaluation methods to construct a fast approximate sampler, which is guaranteed to sample points to within a tunable parameter of their true probability. We compare both multiscale samplers on a set of computational examples motivated by NBP, demonstrating significant improvements over existing methods. 1

4 0.65545702 189 nips-2003-Tree-structured Approximations by Expectation Propagation

Author: Yuan Qi, Tom Minka

Abstract: Approximation structure plays an important role in inference on loopy graphs. As a tractable structure, tree approximations have been utilized in the variational method of Ghahramani & Jordan (1997) and the sequential projection method of Frey et al. (2000). However, belief propagation represents each factor of the graph with a product of single-node messages. In this paper, belief propagation is extended to represent factors with tree approximations, by way of the expectation propagation framework. That is, each factor sends a “message” to all pairs of nodes in a tree structure. The result is more accurate inferences and more frequent convergence than ordinary belief propagation, at a lower cost than variational trees or double-loop algorithms. 1

5 0.5456329 32 nips-2003-Approximate Expectation Maximization

Author: Tom Heskes, Onno Zoeter, Wim Wiegerinck

Abstract: We discuss the integration of the expectation-maximization (EM) algorithm for maximum likelihood learning of Bayesian networks with belief propagation algorithms for approximate inference. Specifically we propose to combine the outer-loop step of convergent belief propagation algorithms with the M-step of the EM algorithm. This then yields an approximate EM algorithm that is essentially still double loop, with the important advantage of an inner loop that is guaranteed to converge. Simulations illustrate the merits of such an approach. 1

6 0.5308798 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data

7 0.49201345 152 nips-2003-Pairwise Clustering and Graphical Models

8 0.44583657 30 nips-2003-Approximability of Probability Distributions

9 0.42790309 22 nips-2003-An Improved Scheme for Detection and Labelling in Johansson Displays

10 0.41811296 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation

11 0.39919233 83 nips-2003-Hierarchical Topic Models and the Nested Chinese Restaurant Process

12 0.39201245 196 nips-2003-Wormholes Improve Contrastive Divergence

13 0.36365846 142 nips-2003-On the Concentration of Expectation and Approximate Inference in Layered Networks

14 0.34688875 172 nips-2003-Semi-Supervised Learning with Trees

15 0.34321067 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles

16 0.34027049 176 nips-2003-Sequential Bayesian Kernel Regression

17 0.327613 6 nips-2003-A Fast Multi-Resolution Method for Detection of Significant Spatial Disease Clusters

18 0.31921324 73 nips-2003-Feature Selection in Clustering Problems

19 0.31723711 111 nips-2003-Learning the k in k-means

20 0.30842698 79 nips-2003-Gene Expression Clustering with Functional Mixture Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.053), (11, 0.016), (29, 0.02), (33, 0.018), (35, 0.072), (53, 0.074), (59, 0.369), (69, 0.031), (71, 0.058), (76, 0.051), (85, 0.063), (91, 0.049), (99, 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90902478 18 nips-2003-A Summating, Exponentially-Decaying CMOS Synapse for Spiking Neural Systems

Author: Rock Z. Shi, Timothy K. Horiuchi

Abstract: Synapses are a critical element of biologically-realistic, spike-based neural computation, serving the role of communication, computation, and modification. Many different circuit implementations of synapse function exist with different computational goals in mind. In this paper we describe a new CMOS synapse design that separately controls quiescent leak current, synaptic gain, and time-constant of decay. This circuit implements part of a commonly-used kinetic model of synaptic conductance. We show a theoretical analysis and experimental data for prototypes fabricated in a commercially-available 1.5µm CMOS process. 1

same-paper 2 0.81359911 169 nips-2003-Sample Propagation

Author: Mark A. Paskin

Abstract: Rao–Blackwellization is an approximation technique for probabilistic inference that flexibly combines exact inference with sampling. It is useful in models where conditioning on some of the variables leaves a simpler inference problem that can be solved tractably. This paper presents Sample Propagation, an efficient implementation of Rao–Blackwellized approximate inference for a large class of models. Sample Propagation tightly integrates sampling with message passing in a junction tree, and is named for its simple, appealing structure: it walks the clusters of a junction tree, sampling some of the current cluster’s variables and then passing a message to one of its neighbors. We discuss the application of Sample Propagation to conditional Gaussian inference problems such as switching linear dynamical systems. 1

3 0.75477678 185 nips-2003-The Doubly Balanced Network of Spiking Neurons: A Memory Model with High Capacity

Author: Yuval Aviel, David Horn, Moshe Abeles

Abstract: A balanced network leads to contradictory constraints on memory models, as exemplified in previous work on accommodation of synfire chains. Here we show that these constraints can be overcome by introducing a 'shadow' inhibitory pattern for each excitatory pattern of the model. This is interpreted as a doublebalance principle, whereby there exists both global balance between average excitatory and inhibitory currents and local balance between the currents carrying coherent activity at any given time frame. This principle can be applied to networks with Hebbian cell assemblies, leading to a high capacity of the associative memory. The number of possible patterns is limited by a combinatorial constraint that turns out to be P=0.06N within the specific model that we employ. This limit is reached by the Hebbian cell assembly network. To the best of our knowledge this is the first time that such high memory capacities are demonstrated in the asynchronous state of models of spiking neurons. 1

4 0.4466812 129 nips-2003-Minimising Contrastive Divergence in Noisy, Mixed-mode VLSI Neurons

Author: Hsin Chen, Patrice Fleury, Alan F. Murray

Abstract: This paper presents VLSI circuits with continuous-valued probabilistic behaviour realized by injecting noise into each computing unit(neuron). Interconnecting the noisy neurons forms a Continuous Restricted Boltzmann Machine (CRBM), which has shown promising performance in modelling and classifying noisy biomedical data. The Minimising-Contrastive-Divergence learning algorithm for CRBM is also implemented in mixed-mode VLSI, to adapt the noisy neurons’ parameters on-chip. 1

5 0.42746156 10 nips-2003-A Low-Power Analog VLSI Visual Collision Detector

Author: Reid R. Harrison

Abstract: We have designed and tested a single-chip analog VLSI sensor that detects imminent collisions by measuring radially expansive optic flow. The design of the chip is based on a model proposed to explain leg-extension behavior in flies during landing approaches. A new elementary motion detector (EMD) circuit was developed to measure optic flow. This EMD circuit models the bandpass nature of large monopolar cells (LMCs) immediately postsynaptic to photoreceptors in the fly visual system. A 16 × 16 array of 2-D motion detectors was fabricated on a 2.24 mm × 2.24 mm die in a standard 0.5-µm CMOS process. The chip consumes 140 µW of power from a 5 V supply. With the addition of wide-angle optics, the sensor is able to detect collisions around 500 ms before impact in complex, real-world scenes. 1

6 0.41333359 125 nips-2003-Maximum Likelihood Estimation of a Stochastic Integrate-and-Fire Neural Model

7 0.41285768 93 nips-2003-Information Dynamics and Emergent Computation in Recurrent Circuits of Spiking Neurons

8 0.41128448 16 nips-2003-A Recurrent Model of Orientation Maps with Simple and Complex Cells

9 0.40063399 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data

10 0.38891944 61 nips-2003-Entrainment of Silicon Central Pattern Generators for Legged Locomotory Control

11 0.38796833 122 nips-2003-Margin Maximizing Loss Functions

12 0.38674721 189 nips-2003-Tree-structured Approximations by Expectation Propagation

13 0.38430479 27 nips-2003-Analytical Solution of Spike-timing Dependent Plasticity Based on Synaptic Biophysics

14 0.38345382 172 nips-2003-Semi-Supervised Learning with Trees

15 0.38329315 78 nips-2003-Gaussian Processes in Reinforcement Learning

16 0.38293752 183 nips-2003-Synchrony Detection by Analogue VLSI Neurons with Bimodal STDP Synapses

17 0.38011491 158 nips-2003-Policy Search by Dynamic Programming

18 0.37946302 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates

19 0.37846279 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes

20 0.37830332 113 nips-2003-Learning with Local and Global Consistency