nips nips2013 nips2013-161 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andreas Stuhlmüller, Jacob Taylor, Noah Goodman
Abstract: We describe a class of algorithms for amortized inference in Bayesian networks. In this setting, we invest computation upfront to support rapid online inference for a wide range of queries. Our approach is based on learning an inverse factorization of a model’s joint distribution: a factorization that turns observations into root nodes. Our algorithms accumulate information to estimate the local conditional distributions that constitute such a factorization. These stochastic inverses can be used to invert each of the computation steps leading to an observation, sampling backwards in order to quickly find a likely explanation. We show that estimated inverses converge asymptotically in number of (prior or posterior) training samples. To make use of inverses before convergence, we describe the Inverse MCMC algorithm, which uses stochastic inverses to make block proposals for a Metropolis-Hastings sampler. We explore the efficiency of this sampler for a variety of parameter regimes and Bayes nets. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Goodman Department of Psychology Stanford University Abstract We describe a class of algorithms for amortized inference in Bayesian networks. [sent-2, score-0.187]
2 Our approach is based on learning an inverse factorization of a model’s joint distribution: a factorization that turns observations into root nodes. [sent-4, score-0.342]
3 These stochastic inverses can be used to invert each of the computation steps leading to an observation, sampling backwards in order to quickly find a likely explanation. [sent-6, score-0.508]
4 We show that estimated inverses converge asymptotically in number of (prior or posterior) training samples. [sent-7, score-0.474]
5 To make use of inverses before convergence, we describe the Inverse MCMC algorithm, which uses stochastic inverses to make block proposals for a Metropolis-Hastings sampler. [sent-8, score-1.036]
6 Humans and robots are in the setting of amortized inference: they have to solve many similar inference problems, and can thus offload part of the computational work to shared precomputation and adaptation over time. [sent-17, score-0.253]
7 While much of this work is focused on adaptation for a single posterior inference, amortized inference calls for adaptation across many different inferences. [sent-21, score-0.379]
8 In this setting, we will often have considerable training data available in the form of posterior samples from previous inferences; how should we use this data to adapt our inference procedure? [sent-22, score-0.395]
9 We consider using training samples to learn the inverse structure of a directed model. [sent-23, score-0.438]
10 Posterior inference is the task of inverting a probabilistic model: Bayes’ theorem turns p(d|h) into p(h|d); vision is commonly understood as inverse graphics (Horn, 1977) and, more recently, as inverse physics (Sanborn et al. [sent-24, score-0.527]
11 However, while this is a good description of the problem that inference solves, conditional sampling usually does not proceed backwards step-by-step. [sent-28, score-0.226]
12 ally and actually learning the inverse conditionals needed to invert the model. [sent-30, score-0.334]
13 Knowing the conditionals for this inverse factorization would allow us to rapidly sample the latent variables given an observation. [sent-33, score-0.375]
14 In this paper, we will explore what these factorizations look like for Bayesian networks, how to learn them, and how to use them to construct block proposals for MCMC. [sent-34, score-0.323]
15 We say that a Bayesian network H expresses an inverse factorization of p if the observations y do not have parents (but may themselves be parents of some xi ): m p(xi |paH (xi )) p(x, y) = p(y) i=1 As an example, consider the forward and inverse networks shown in Figure 1. [sent-44, score-0.892]
16 We call the conditional distributions p(xi |paH (xi )) stochastic inverses, with inputs paH (xi ) and output xi . [sent-45, score-0.213]
17 If we could sample from these distributions, we could produce samples from p(x|y) for arbitrary y, which solves the problem of inference for all queries with the same set of observation nodes. [sent-46, score-0.222]
18 This fact will be important in Section 4 when we resample subsets of inverse graphs. [sent-49, score-0.239]
19 Algorithm 1 gives a heuristic method for computing an inverse factorization given Bayes net G, observation nodes y, and desired leaf node xi . [sent-50, score-0.622]
20 We then add the nodes in order to the inverse graph, with dependencies determined by the graph structure of the original network. [sent-52, score-0.392]
21 In the setting of amortized inference, past tasks provide approximate posterior samples for the corresponding observations. [sent-53, score-0.342]
22 We therefore investigate learning inverses from such samples, and ways of using approximate stochastic inverses for improving the efficiency of solving future inference tasks. [sent-54, score-0.893]
23 (Learning from prior samples) Let H be an inverse factorization. [sent-57, score-0.246]
24 We now show that valid inverse factorizations allow us to learn from posterior samples as well. [sent-60, score-0.49]
25 (Learning from posterior samples) Let H be an inverse factorization. [sent-62, score-0.338]
26 The nodes y are root nodes in H and hence do not descend from xi . [sent-68, score-0.328]
27 In addition, we can combine samples from distributions conditioned on several different observation sets to produce more accurate estimates of the inverse conditionals. [sent-71, score-0.363]
28 , (z (k) , xi ) in S based on distance to z (1) (k) 2: construct density estimate q on xi , . [sent-80, score-0.321]
29 Then, use a consistent density estimator to construct a density estimate on the nearby previous outputs and sample an output xi from the estimated distribution. [sent-84, score-0.26]
30 4 Inverse MCMC We have described how to compute the structure of inverse Bayes nets, and how to learn the associated conditional distributions and densities from prior and posterior samples. [sent-89, score-0.461]
31 We propose the following Inverse MCMC procedure (Algorithm 3): Offline, use Algorithm 1 to compute an inverse graph for each latent node and train each local inverse in this graph from (posterior or prior) samples. [sent-92, score-0.591]
32 With little training data, we will want to make small proposals (small k) in order to achieve a reasonable acceptance rate; with more training data, we can make larger proposals and expect to succeed. [sent-94, score-0.668]
33 Let G be a Bayesian network, let θ be a consistent estimator (for inverse conditionals), let {Hi }i∈1. [sent-96, score-0.276]
34 m be a collection of inverse graphs produced using Algorithm 1, and assume a source of training samples (prior or posterior) with full support. [sent-98, score-0.386]
35 Then, as training set size |S| → ∞, Inverse MCMC with proposal size k converges to block-Gibbs sampling where blocks are the last k nodes in each Hi . [sent-99, score-0.288]
36 In particular, it converges to Gibbs sampling for proposal size k = 1 and to exact posterior sampling for k = |G|. [sent-100, score-0.288]
37 We must show that proposals are made from the conditional posterior in the limit of large training data. [sent-102, score-0.45]
38 Fix an inverse H, and let x be the last k variables in H. [sent-103, score-0.214]
39 Now the conditional distribution over x factorizes along the inverse graph: |H| p(x|paH (x)) = i=k p(xi |paH (xi )). [sent-106, score-0.253]
40 Hence, using the estimated inverses to sequentially sample the x variables results, in the limit, in samples from the conditional distribution given remaining variables. [sent-108, score-0.531]
41 (Note that, in the limit, these proposals will always be accepted. [sent-109, score-0.21]
42 4 Algorithm 3: Inverse MCMC Algorithm 4: Inverse MCMC proposer Input: Prior or posterior samples S Output: Samples x(1) , . [sent-115, score-0.219]
43 m do 4: train inverse θS (xj |paHi (xj )) 5: end for 6: end for Online (MH with inverse proposals): 1: for t in 1 . [sent-124, score-0.472]
44 T do 2: x , pfw , pbw from Algorithm 4 3: x ← x with MH acceptance rule 4: end for Input: State x, observations y, ordered inverse graphs {Hi }i∈1. [sent-127, score-0.488]
45 m , proposal size kmax , inverses θ Output: Proposed state x , forward and backward probabilities pfw and pbw 1: H ∼ Uniform({Hi }i∈1. [sent-129, score-0.715]
46 , kmax − 1}) 3: x ← x 4: pfw , pbw ← 0 5: for j in n − k, . [sent-134, score-0.209]
47 , n do 6: let xl be jth variable in H 7: xl ∼ θ(xl |paH (xl )) 8: pfw ← pfw · pθ (xl |paH (xl )) 9: pbw ← pbw · pθ (xl |paH (xl )) 10: end for Instead of learning the k=1 “Gibbs” conditionals for each inverse graph, we can often precompute these distributions to “seed” our sampler. [sent-137, score-0.901]
48 This suggests a bootstrapping procedure for amortized inference on observations y (1) , . [sent-138, score-0.212]
49 , y (t) : first, precompute the “Gibbs” distributions so that k=1 proposals will be reasonably effective; then iterate between training on previously generated approximate posterior samples and doing inference on the next observation. [sent-141, score-0.675]
50 For networks with near-deterministic dependencies, Gibbs may be unable to generate training samples of sufficient quality. [sent-143, score-0.221]
51 This poses a chicken-and-egg problem: we need a sufficiently good posterior sampler to generate the data required to train our sampler. [sent-144, score-0.239]
52 To address this problem, we propose a simple annealing scheme: We introduce a temperature parameter t that controls the extent to which (almost-)deterministic dependencies in a network are relaxed. [sent-145, score-0.241]
53 We produce a sequence of trained samplers, one for each temperature, by generating samples for a network with temperature ti+1 using a sampler trained on approximate samples for the network with next-higher temperature ti . [sent-146, score-0.585]
54 Finally, we discard all samplers except for the sampler trained on the network with t = 0, the network of interest. [sent-147, score-0.242]
55 We start by studying the behavior of the Inverse MCMC algorithm with empirical frequency estimator on a 225-node rectangular grid network from the UAI 2008 inference competition. [sent-150, score-0.299]
56 This network has binary nodes and approximately 50% deterministic dependencies, which we relax to dependencies with strength . [sent-151, score-0.214]
57 We select the 15 nodes on the diagonal as observations and remove any nodes below, leaving a triangular network with 120 nodes and treewidth 15 (Figure 2). [sent-153, score-0.342]
58 , 2010), and calculate the error of our estimates P s as error = 1 N N i=1 1 |Xi | |P ∗ (Xi = xi ) − P s (Xi = xi )|. [sent-155, score-0.344]
59 xi ∈Xi We generate 20 inference tasks as sources of training samples by sampling values for the 15 observation nodes uniformly at random. [sent-156, score-0.615]
60 We precompute the “final” inverse conditionals as outlined above, producing a Gibbs sampler when k=1. [sent-157, score-0.449]
61 For each inference task, we use this sampler to generate 105 approximate posterior samples. [sent-158, score-0.294]
62 00 Error in marginals Gibbs Inverses (10x10) Inverses (10x100) Inverses (10x1000) Figure 3: The effect of training on approximate posterior samples for 10 inference tasks. [sent-169, score-0.446]
63 As the number of training samples per task increases, Inverse MCMC with proposals of size 20 performs new inference tasks more quickly. [sent-170, score-0.516]
64 1e+01 1e+02 1e+03 1e+04 1e+05 Number of training samples Figure 4: Learning an inverse distribution for the brightness constancy model (Figure 1) from prior samples using the KNN density predictor. [sent-171, score-0.596]
65 More training samples result in better estimates after the same number of MCMC steps. [sent-172, score-0.172]
66 Figures 3 and 5 show the effect of training the frequency estimator on 10 inference tasks and testing on a different task (averaged over 20 runs). [sent-173, score-0.311]
67 Inverse proposals of (up to) size k=20 do worse than pure Gibbs sampling with little training (due to higher rejection rate), but they speed convergence as the number of training samples increases. [sent-174, score-0.502]
68 More generally, large proposals are likely to be rejected without training, but improve convergence after training. [sent-175, score-0.21]
69 Figure 6 illustrates how the number of inference tasks influences error and MH acceptance ratio in a setting where the total number of training samples is kept constant. [sent-176, score-0.424]
70 Surprisingly, increasing the number of training tasks from 5 to 15 has little effect on error and acceptance ratio for this network. [sent-177, score-0.23]
71 That is, it seems relatively unimportant which posterior the training samples are drawn from; we may expect different results when posteriors are more sparse. [sent-178, score-0.296]
72 Figure 7 shows how different sources of training data affect the quality of the trained sampler (averaged over 20 runs). [sent-179, score-0.175]
73 As the strength of near-deterministic dependencies increases, direct training on Gibbs samples becomes infeasible. [sent-180, score-0.224]
74 In this regime, we can still train on prior samples and on Gibbs samples for networks with relaxed dependencies. [sent-181, score-0.348]
75 01, 0]—that is, we start by learning inverses for the relaxed network where all CPT probabilities are constrained to lie within [. [sent-188, score-0.502]
76 8]; we then use these inverses as proposers for MCMC inference on a network constrained to CPT probabilities in [. [sent-190, score-0.568]
77 While the empirical frequency estimator used in the above experiments provides an attractive asymptotic convergence guarantee (Theorem 3), it is likely to generalize slowly from small amounts of training data. [sent-193, score-0.203]
78 We compare inference using a logistic regression estimator with L2 regularization (with and without interaction terms) to inference using the empirical frequency estimator. [sent-198, score-0.328]
79 The regression estimator with interaction terms results in significantly better results when training on few posterior samples, but is ultimately overtaken by the consistent empirical estimator. [sent-200, score-0.263]
80 Next, we use the KNN density predictor to learn inverse distributions for the continuous Bayesian network shown in Figure 1. [sent-201, score-0.403]
81 16 Maximum proposal size Maximum proposal size 30 4 1 Log10(training samples per task) 2 3 4 Log10(training samples per task) Figure 5: Without training, big inverse proposals result in high error, as they are unlikely to be accepted. [sent-219, score-0.77]
82 As we increase the number of approximate posterior samples used to train the MCMC sampler, the acceptance probability for big proposals goes up, which decreases overall error. [sent-220, score-0.567]
83 , samples for other observations) we train on has little effect on acceptance ratio (and error) if we keep the total number of training samples constant. [sent-239, score-0.405]
84 For others, we can use prior samples, Gibbs samples for relaxed networks, and samples from a sequence of annealed Inverse samplers. [sent-241, score-0.255]
85 As we refine the inverses using forward samples, the error in the estimated marginals decreases towards 0, providing evidence for convergence towards a posterior sampler (Figure 4). [sent-244, score-0.698]
86 To evaluate Inverse MCMC in more breadth, we run the algorithm on all binary Bayes nets with up to 500 nodes that have been submitted to the UAI 08 inference competition (216 networks). [sent-245, score-0.23]
87 Since many of these networks exhibit strong determinism, we train on prior samples and apply the annealing scheme outlined above to generate approximate posterior samples. [sent-246, score-0.421]
88 While performance varies across network classes—with extremely deterministic networks making the acquisition of training data challenging—the comparison with Gibbs suggests that learned block proposals frequently help. [sent-253, score-0.44]
89 Overall, these results indicate that Inverse MCMC is of practical benefit for learning block proposals in reasonably large Bayes nets and using a realistic amount of training data (an amount that might result from amortizing over five or ten inferences). [sent-254, score-0.36]
90 2 10 Gibbs error integral 20 50 100 200 500 Number of training samples Figure 8: Each mark represents a single run of a model from the UAI 08 inference competition. [sent-263, score-0.295]
91 1 q qq q qqqq qqqqq q q q q q qq q qq q q q q qq q q q q q qq q q q qq q q q qq qq qq qqq q q qq q q q q q qq q q q qq q Frequency estimator Logistic regression (L2) Logistic regression (L2 + ^2) 0. [sent-266, score-1.874]
92 0 Inverse MCMC error integral q Figure 9: Integrated error (over 1s of inference) as a function of the number of samples used to train inverses, comparing logistic regression with and without interaction terms to an empirical frequency estimator. [sent-272, score-0.255]
93 Related work A recognition network (Morris, 2001) is a multilayer perceptron used to predict posterior marginals. [sent-273, score-0.196]
94 By learning local inverses our technique generalizes in a more fine-grained way, and can be combined with MCMC to provide unbiased samples. [sent-275, score-0.397]
95 These techniques typically learn Bayes nets which are directed “forward”, which means that the conditional distributions must be learned from posterior samples, creating a chicken-and-egg problem. [sent-281, score-0.282]
96 Because our trained model is directed “backwards”, we can learn from both prior and posterior samples. [sent-282, score-0.235]
97 It is well-known that this can be addressed using block proposals, but such proposals typically need to be built manually for each model. [sent-284, score-0.242]
98 In our framework, block proposals are learned from past samples, with a natural parameter for adjusting the block size. [sent-285, score-0.274]
99 We have given simple methods for estimating and using these inverses as part of an MCMC algorithm. [sent-287, score-0.397]
100 Jags: A program for analysis of bayesian graphical models using gibbs sampling. [sent-346, score-0.175]
wordName wordTfidf (topN-words)
[('pah', 0.615), ('inverses', 0.397), ('inverse', 0.214), ('proposals', 0.21), ('qq', 0.151), ('xi', 0.148), ('mcmc', 0.147), ('gibbs', 0.13), ('posterior', 0.124), ('luminance', 0.112), ('inference', 0.099), ('conditionals', 0.097), ('samples', 0.095), ('acceptance', 0.094), ('determinism', 0.09), ('pbw', 0.09), ('pfw', 0.09), ('nodes', 0.09), ('amortized', 0.088), ('xl', 0.08), ('proposal', 0.078), ('training', 0.077), ('network', 0.072), ('sampler', 0.071), ('factorization', 0.064), ('temperature', 0.063), ('estimator', 0.062), ('druzdzel', 0.054), ('haario', 0.054), ('ortiz', 0.054), ('annealing', 0.054), ('dependencies', 0.052), ('marginals', 0.051), ('parents', 0.05), ('networks', 0.049), ('ectance', 0.048), ('node', 0.047), ('bayes', 0.046), ('backwards', 0.045), ('bayesian', 0.045), ('train', 0.044), ('precompute', 0.044), ('illumination', 0.044), ('sampling', 0.043), ('nets', 0.041), ('predictor', 0.04), ('importance', 0.039), ('conditional', 0.039), ('frequency', 0.038), ('roberts', 0.038), ('hi', 0.038), ('mh', 0.036), ('graph', 0.036), ('cpt', 0.036), ('hernandez', 0.036), ('jags', 0.036), ('mateescu', 0.036), ('pag', 0.036), ('plummer', 0.036), ('rosenthal', 0.036), ('salmeron', 0.036), ('sanborn', 0.036), ('tasks', 0.035), ('knn', 0.034), ('adaptation', 0.034), ('relaxed', 0.033), ('watanabe', 0.032), ('kaelbling', 0.032), ('precomputation', 0.032), ('shachter', 0.032), ('block', 0.032), ('prior', 0.032), ('leaf', 0.031), ('factorizations', 0.031), ('cheng', 0.031), ('forward', 0.031), ('logistic', 0.03), ('constancy', 0.029), ('wingate', 0.029), ('brightness', 0.029), ('kmax', 0.029), ('observation', 0.028), ('grid', 0.028), ('trained', 0.027), ('xj', 0.027), ('learn', 0.026), ('neighbors', 0.026), ('slowly', 0.026), ('directed', 0.026), ('adaptive', 0.026), ('distributions', 0.026), ('rapid', 0.025), ('resample', 0.025), ('bootstrapping', 0.025), ('integrated', 0.025), ('density', 0.025), ('explore', 0.024), ('error', 0.024), ('invert', 0.023), ('outlined', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 161 nips-2013-Learning Stochastic Inverses
Author: Andreas Stuhlmüller, Jacob Taylor, Noah Goodman
Abstract: We describe a class of algorithms for amortized inference in Bayesian networks. In this setting, we invest computation upfront to support rapid online inference for a wide range of queries. Our approach is based on learning an inverse factorization of a model’s joint distribution: a factorization that turns observations into root nodes. Our algorithms accumulate information to estimate the local conditional distributions that constitute such a factorization. These stochastic inverses can be used to invert each of the computation steps leading to an observation, sampling backwards in order to quickly find a likely explanation. We show that estimated inverses converge asymptotically in number of (prior or posterior) training samples. To make use of inverses before convergence, we describe the Inverse MCMC algorithm, which uses stochastic inverses to make block proposals for a Metropolis-Hastings sampler. We explore the efficiency of this sampler for a variety of parameter regimes and Bayes nets. 1
2 0.14666294 283 nips-2013-Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model
Author: Fang Han, Han Liu
Abstract: In this paper we focus on the principal component regression and its application to high dimension non-Gaussian data. The major contributions are two folds. First, in low dimensions and under the Gaussian model, by borrowing the strength from recent development in minimax optimal principal component estimation, we first time sharply characterize the potential advantage of classical principal component regression over least square estimation. Secondly, we propose and analyze a new robust sparse principal component regression on high dimensional elliptically distributed data. The elliptical distribution is a semiparametric generalization of the Gaussian, including many well known distributions such as multivariate Gaussian, rank-deficient Gaussian, t, Cauchy, and logistic. It allows the random vector to be heavy tailed and have tail dependence. These extra flexibilities make it very suitable for modeling finance and biomedical imaging data. Under the elliptical model, we prove that our method can estimate the regression coefficients in the optimal parametric rate and therefore is a good alternative to the Gaussian based methods. Experiments on synthetic and real world data are conducted to illustrate the empirical usefulness of the proposed method. 1
3 0.12080061 243 nips-2013-Parallel Sampling of DP Mixture Models using Sub-Cluster Splits
Author: Jason Chang, John W. Fisher III
Abstract: We present an MCMC sampler for Dirichlet process mixture models that can be parallelized to achieve significant computational gains. We combine a nonergodic, restricted Gibbs iteration with split/merge proposals in a manner that produces an ergodic Markov chain. Each cluster is augmented with two subclusters to construct likely split moves. Unlike some previous parallel samplers, the proposed sampler enforces the correct stationary distribution of the Markov chain without the need for finite approximations. Empirical results illustrate that the new sampler exhibits better convergence properties than current methods. 1
4 0.11532406 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations
Author: Tamir Hazan, Subhransu Maji, Tommi Jaakkola
Abstract: In this paper we describe how MAP inference can be used to sample efficiently from Gibbs distributions. Specifically, we provide means for drawing either approximate or unbiased samples from Gibbs’ distributions by introducing low dimensional perturbations and solving the corresponding MAP assignments. Our approach also leads to new ways to derive lower bounds on partition functions. We demonstrate empirically that our method excels in the typical “high signal high coupling” regime. The setting results in ragged energy landscapes that are challenging for alternative approaches to sampling and/or lower bounds. 1
5 0.10557924 46 nips-2013-Bayesian Estimation of Latently-grouped Parameters in Undirected Graphical Models
Author: Jie Liu, David Page
Abstract: In large-scale applications of undirected graphical models, such as social networks and biological networks, similar patterns occur frequently and give rise to similar parameters. In this situation, it is beneficial to group the parameters for more efficient learning. We show that even when the grouping is unknown, we can infer these parameter groups during learning via a Bayesian approach. We impose a Dirichlet process prior on the parameters. Posterior inference usually involves calculating intractable terms, and we propose two approximation algorithms, namely a Metropolis-Hastings algorithm with auxiliary variables and a Gibbs sampling algorithm with “stripped” Beta approximation (Gibbs SBA). Simulations show that both algorithms outperform conventional maximum likelihood estimation (MLE). Gibbs SBA’s performance is close to Gibbs sampling with exact likelihood calculation. Models learned with Gibbs SBA also generalize better than the models learned by MLE on real-world Senate voting data. 1
6 0.10099876 169 nips-2013-Learning to Prune in Metric and Non-Metric Spaces
7 0.087499149 258 nips-2013-Projecting Ising Model Parameters for Fast Mixing
8 0.084191792 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation
9 0.074582227 34 nips-2013-Analyzing Hogwild Parallel Gaussian Gibbs Sampling
10 0.071153797 143 nips-2013-Integrated Non-Factorized Variational Inference
11 0.070366167 312 nips-2013-Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex
12 0.069315344 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC
13 0.068321556 160 nips-2013-Learning Stochastic Feedforward Neural Networks
14 0.068313882 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion
15 0.068038419 144 nips-2013-Inverse Density as an Inverse Problem: the Fredholm Equation Approach
16 0.066130526 168 nips-2013-Learning to Pass Expectation Propagation Messages
17 0.062438503 107 nips-2013-Embed and Project: Discrete Sampling with Universal Hashing
18 0.060485203 37 nips-2013-Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs
19 0.060109239 13 nips-2013-A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks
20 0.059761684 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks
topicId topicWeight
[(0, 0.177), (1, 0.061), (2, -0.067), (3, -0.015), (4, 0.046), (5, 0.122), (6, 0.136), (7, 0.011), (8, 0.072), (9, -0.026), (10, 0.045), (11, 0.019), (12, 0.038), (13, 0.005), (14, 0.01), (15, 0.001), (16, -0.061), (17, -0.046), (18, -0.035), (19, 0.068), (20, 0.037), (21, 0.007), (22, -0.083), (23, 0.066), (24, 0.068), (25, 0.055), (26, -0.0), (27, -0.026), (28, 0.081), (29, 0.123), (30, 0.122), (31, -0.034), (32, -0.073), (33, -0.091), (34, 0.011), (35, -0.09), (36, 0.005), (37, -0.143), (38, 0.042), (39, 0.055), (40, -0.033), (41, 0.063), (42, -0.023), (43, 0.07), (44, 0.016), (45, -0.072), (46, -0.006), (47, -0.025), (48, -0.03), (49, -0.097)]
simIndex simValue paperId paperTitle
same-paper 1 0.91676784 161 nips-2013-Learning Stochastic Inverses
Author: Andreas Stuhlmüller, Jacob Taylor, Noah Goodman
Abstract: We describe a class of algorithms for amortized inference in Bayesian networks. In this setting, we invest computation upfront to support rapid online inference for a wide range of queries. Our approach is based on learning an inverse factorization of a model’s joint distribution: a factorization that turns observations into root nodes. Our algorithms accumulate information to estimate the local conditional distributions that constitute such a factorization. These stochastic inverses can be used to invert each of the computation steps leading to an observation, sampling backwards in order to quickly find a likely explanation. We show that estimated inverses converge asymptotically in number of (prior or posterior) training samples. To make use of inverses before convergence, we describe the Inverse MCMC algorithm, which uses stochastic inverses to make block proposals for a Metropolis-Hastings sampler. We explore the efficiency of this sampler for a variety of parameter regimes and Bayes nets. 1
2 0.75675213 46 nips-2013-Bayesian Estimation of Latently-grouped Parameters in Undirected Graphical Models
Author: Jie Liu, David Page
Abstract: In large-scale applications of undirected graphical models, such as social networks and biological networks, similar patterns occur frequently and give rise to similar parameters. In this situation, it is beneficial to group the parameters for more efficient learning. We show that even when the grouping is unknown, we can infer these parameter groups during learning via a Bayesian approach. We impose a Dirichlet process prior on the parameters. Posterior inference usually involves calculating intractable terms, and we propose two approximation algorithms, namely a Metropolis-Hastings algorithm with auxiliary variables and a Gibbs sampling algorithm with “stripped” Beta approximation (Gibbs SBA). Simulations show that both algorithms outperform conventional maximum likelihood estimation (MLE). Gibbs SBA’s performance is close to Gibbs sampling with exact likelihood calculation. Models learned with Gibbs SBA also generalize better than the models learned by MLE on real-world Senate voting data. 1
3 0.64065695 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations
Author: Tamir Hazan, Subhransu Maji, Tommi Jaakkola
Abstract: In this paper we describe how MAP inference can be used to sample efficiently from Gibbs distributions. Specifically, we provide means for drawing either approximate or unbiased samples from Gibbs’ distributions by introducing low dimensional perturbations and solving the corresponding MAP assignments. Our approach also leads to new ways to derive lower bounds on partition functions. We demonstrate empirically that our method excels in the typical “high signal high coupling” regime. The setting results in ragged energy landscapes that are challenging for alternative approaches to sampling and/or lower bounds. 1
4 0.63962102 258 nips-2013-Projecting Ising Model Parameters for Fast Mixing
Author: Justin Domke, Xianghang Liu
Abstract: Inference in general Ising models is difficult, due to high treewidth making treebased algorithms intractable. Moreover, when interactions are strong, Gibbs sampling may take exponential time to converge to the stationary distribution. We present an algorithm to project Ising model parameters onto a parameter set that is guaranteed to be fast mixing, under several divergences. We find that Gibbs sampling using the projected parameters is more accurate than with the original parameters when interaction strengths are strong and when limited time is available for sampling.
5 0.60895151 34 nips-2013-Analyzing Hogwild Parallel Gaussian Gibbs Sampling
Author: Matthew Johnson, James Saunderson, Alan Willsky
Abstract: Sampling inference methods are computationally difficult to scale for many models in part because global dependencies can reduce opportunities for parallel computation. Without strict conditional independence structure among variables, standard Gibbs sampling theory requires sample updates to be performed sequentially, even if dependence between most variables is not strong. Empirical work has shown that some models can be sampled effectively by going “Hogwild” and simply running Gibbs updates in parallel with only periodic global communication, but the successes and limitations of such a strategy are not well understood. As a step towards such an understanding, we study the Hogwild Gibbs sampling strategy in the context of Gaussian distributions. We develop a framework which provides convergence conditions and error bounds along with simple proofs and connections to methods in numerical linear algebra. In particular, we show that if the Gaussian precision matrix is generalized diagonally dominant, then any Hogwild Gibbs sampler, with any update schedule or allocation of variables to processors, yields a stable sampling process with the correct sample mean. 1
6 0.57173842 47 nips-2013-Bayesian Hierarchical Community Discovery
7 0.56445408 283 nips-2013-Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model
8 0.56313181 243 nips-2013-Parallel Sampling of DP Mixture Models using Sub-Cluster Splits
9 0.52283382 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models
10 0.51685256 312 nips-2013-Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex
11 0.51097125 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation
12 0.50637966 176 nips-2013-Linear decision rule as aspiration for simple decision heuristics
13 0.50483614 169 nips-2013-Learning to Prune in Metric and Non-Metric Spaces
14 0.50416297 152 nips-2013-Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions
15 0.49993533 160 nips-2013-Learning Stochastic Feedforward Neural Networks
16 0.48629418 58 nips-2013-Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent
17 0.47009912 13 nips-2013-A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks
18 0.46886817 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
19 0.46492022 52 nips-2013-Bayesian inference as iterated random functions with applications to sequential inference in graphical models
20 0.44372624 36 nips-2013-Annealing between distributions by averaging moments
topicId topicWeight
[(0, 0.213), (2, 0.011), (16, 0.054), (33, 0.154), (34, 0.134), (41, 0.029), (49, 0.036), (56, 0.085), (70, 0.042), (85, 0.045), (89, 0.036), (93, 0.045), (95, 0.016), (99, 0.019)]
simIndex simValue paperId paperTitle
1 0.94637257 246 nips-2013-Perfect Associative Learning with Spike-Timing-Dependent Plasticity
Author: Christian Albers, Maren Westkott, Klaus Pawelzik
Abstract: Recent extensions of the Perceptron as the Tempotron and the Chronotron suggest that this theoretical concept is highly relevant for understanding networks of spiking neurons in the brain. It is not known, however, how the computational power of the Perceptron might be accomplished by the plasticity mechanisms of real synapses. Here we prove that spike-timing-dependent plasticity having an anti-Hebbian form for excitatory synapses as well as a spike-timing-dependent plasticity of Hebbian shape for inhibitory synapses are sufficient for realizing the original Perceptron Learning Rule if these respective plasticity mechanisms act in concert with the hyperpolarisation of the post-synaptic neurons. We also show that with these simple yet biologically realistic dynamics Tempotrons and Chronotrons are learned. The proposed mechanism enables incremental associative learning from a continuous stream of patterns and might therefore underly the acquisition of long term memories in cortex. Our results underline that learning processes in realistic networks of spiking neurons depend crucially on the interactions of synaptic plasticity mechanisms with the dynamics of participating neurons.
2 0.83174878 183 nips-2013-Mapping paradigm ontologies to and from the brain
Author: Yannick Schwartz, Bertrand Thirion, Gael Varoquaux
Abstract: Imaging neuroscience links brain activation maps to behavior and cognition via correlational studies. Due to the nature of the individual experiments, based on eliciting neural response from a small number of stimuli, this link is incomplete, and unidirectional from the causal point of view. To come to conclusions on the function implied by the activation of brain regions, it is necessary to combine a wide exploration of the various brain functions and some inversion of the statistical inference. Here we introduce a methodology for accumulating knowledge towards a bidirectional link between observed brain activity and the corresponding function. We rely on a large corpus of imaging studies and a predictive engine. Technically, the challenges are to find commonality between the studies without denaturing the richness of the corpus. The key elements that we contribute are labeling the tasks performed with a cognitive ontology, and modeling the long tail of rare paradigms in the corpus. To our knowledge, our approach is the first demonstration of predicting the cognitive content of completely new brain images. To that end, we propose a method that predicts the experimental paradigms across different studies. 1
same-paper 3 0.81877756 161 nips-2013-Learning Stochastic Inverses
Author: Andreas Stuhlmüller, Jacob Taylor, Noah Goodman
Abstract: We describe a class of algorithms for amortized inference in Bayesian networks. In this setting, we invest computation upfront to support rapid online inference for a wide range of queries. Our approach is based on learning an inverse factorization of a model’s joint distribution: a factorization that turns observations into root nodes. Our algorithms accumulate information to estimate the local conditional distributions that constitute such a factorization. These stochastic inverses can be used to invert each of the computation steps leading to an observation, sampling backwards in order to quickly find a likely explanation. We show that estimated inverses converge asymptotically in number of (prior or posterior) training samples. To make use of inverses before convergence, we describe the Inverse MCMC algorithm, which uses stochastic inverses to make block proposals for a Metropolis-Hastings sampler. We explore the efficiency of this sampler for a variety of parameter regimes and Bayes nets. 1
4 0.73721927 173 nips-2013-Least Informative Dimensions
Author: Fabian Sinz, Anna Stockl, January Grewe, January Benda
Abstract: We present a novel non-parametric method for finding a subspace of stimulus features that contains all information about the response of a system. Our method generalizes similar approaches to this problem such as spike triggered average, spike triggered covariance, or maximally informative dimensions. Instead of maximizing the mutual information between features and responses directly, we use integral probability metrics in kernel Hilbert spaces to minimize the information between uninformative features and the combination of informative features and responses. Since estimators of these metrics access the data via kernels, are easy to compute, and exhibit good theoretical convergence properties, our method can easily be generalized to populations of neurons or spike patterns. By using a particular expansion of the mutual information, we can show that the informative features must contain all information if we can make the uninformative features independent of the rest. 1
5 0.73523152 77 nips-2013-Correlations strike back (again): the case of associative memory retrieval
Author: Cristina Savin, Peter Dayan, Mate Lengyel
Abstract: It has long been recognised that statistical dependencies in neuronal activity need to be taken into account when decoding stimuli encoded in a neural population. Less studied, though equally pernicious, is the need to take account of dependencies between synaptic weights when decoding patterns previously encoded in an auto-associative memory. We show that activity-dependent learning generically produces such correlations, and failing to take them into account in the dynamics of memory retrieval leads to catastrophically poor recall. We derive optimal network dynamics for recall in the face of synaptic correlations caused by a range of synaptic plasticity rules. These dynamics involve well-studied circuit motifs, such as forms of feedback inhibition and experimentally observed dendritic nonlinearities. We therefore show how addressing the problem of synaptic correlations leads to a novel functional account of key biophysical features of the neural substrate. 1
6 0.73507255 201 nips-2013-Multi-Task Bayesian Optimization
7 0.73388386 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
8 0.73380589 350 nips-2013-Wavelets on Graphs via Deep Learning
9 0.73281717 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits
10 0.73164439 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
11 0.72990763 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
12 0.72944373 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture
13 0.7293281 287 nips-2013-Scalable Inference for Logistic-Normal Topic Models
14 0.72882754 5 nips-2013-A Deep Architecture for Matching Short Texts
15 0.72797388 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation
16 0.72746426 43 nips-2013-Auxiliary-variable Exact Hamiltonian Monte Carlo Samplers for Binary Distributions
17 0.72743648 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
18 0.72736138 294 nips-2013-Similarity Component Analysis
19 0.72643417 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
20 0.72632343 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields