nips nips2013 nips2013-168 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nicolas Heess, Daniel Tarlow, John Winn
Abstract: Expectation Propagation (EP) is a popular approximate posterior inference algorithm that often provides a fast and accurate alternative to sampling-based methods. However, while the EP framework in theory allows for complex non-Gaussian factors, there is still a significant practical barrier to using them within EP, because doing so requires the implementation of message update operators, which can be difficult and require hand-crafted approximations. In this work, we study the question of whether it is possible to automatically derive fast and accurate EP updates by learning a discriminative model (e.g., a neural network or random forest) to map EP message inputs to EP message outputs. We address the practical concerns that arise in the process, and we provide empirical analysis on several challenging and diverse factors, indicating that there is a space of factors where this approach appears promising.
Reference: text
sentIndex sentText sentNum sentScore
1 However, while the EP framework in theory allows for complex non-Gaussian factors, there is still a significant practical barrier to using them within EP, because doing so requires the implementation of message update operators, which can be difficult and require hand-crafted approximations. [sent-2, score-0.367]
2 (e.g., a neural network or random forest) to map EP message inputs to EP message outputs. [sent-5, score-0.822]
3 The cost of this flexibility is that inference routines have a more superficial understanding of the model structure, being unaware of symmetries and other idiosyncrasies of its components, which makes the already challenging inference task even harder. [sent-15, score-0.226]
4 For example, EP message update operations, which are used by Infer.NET. [sent-22, score-0.367]
5 In this work, we aim to bridge the gap between the informed and the uninformed cases and achieve the best of both worlds by automatically implementing the computational operations required for the informed case from a specification such as would be given in the uninformed case. [sent-24, score-0.292]
6 We train high-capacity discriminative models that learn to map EP message inputs to EP message outputs for each message operation needed for EP inference. [sent-25, score-1.154]
7 Models may then be constructed from any combination of these learned modules and previously implemented modules. [sent-28, score-0.2]
8 1 Background and Notation. Factor graphs, directed graphical models, and probabilistic programming: As is common for message passing algorithms, we assume that models of interest are represented as factor graphs: the joint distribution over a set of random variables x = {x1, . . . , xD} is written as a product of factors, p(x) = (1/Z) ∏_j ψ_j(x_{ψ_j}). [sent-30, score-0.697]
9 Here xψj is used to mean the set of variables that factor ψj is defined over and whose index set we will denote by Scope(ψj ). [sent-37, score-0.219]
10 The set x may have a mix of discrete and continuous random variables and factors can operate over variables of mixed types. [sent-39, score-0.21]
11 Note that this formulation allows for conditioning on variables by attaching factors with no inputs to variables which constrain the variable to be equal to a particular value, but we suppress this detail for simplicity of presentation. [sent-41, score-0.263]
12 factors of the form ψj (xout(j) | xin(j) ) which directly specify the (conditional) distribution (or density) over xout(j) as a function of the vector of inputs xin(j) (here xψj is the set of variables {xout(j) } ∪ xin(j) ). [sent-44, score-0.234]
13 This can be related to the factor graph notation by introducing forward-sampling functions f1 , . [sent-50, score-0.19]
14 The key difference is that the factor graph specification usually assumes that an analytic expression will be given to define ψj (xout(j) | xin(j) ), while the forward-sampling formulation allows for fj to execute an arbitrary piece of computer code. [sent-56, score-0.298]
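To make the contrast concrete, here is a minimal sketch (our own illustration in Python; the particular function is hypothetical, not one of the paper's factors) of a forward-sampling specification: an arbitrary stochastic program that returns a sample of xout given xin, with no analytic density ever written down.

    import numpy as np

    def f_j(x_in, rng=np.random.default_rng()):
        # Any executable stochastic code is allowed: branching, loops,
        # calls to black-box samplers. Only the ability to draw x_out
        # given x_in is required; psi_j(x_out | x_in) is never evaluated.
        if x_in > 0:
            return np.exp(x_in) + rng.standard_normal()
        return rng.gamma(shape=2.0, scale=1.0)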
15 2 Expectation Propagation. Expectation Propagation (EP) is a message passing algorithm that is a generalization of sum-product belief propagation. [sent-59, score-0.413]
16 The second aspect is the form of the message from a factor ψ to a variable i. [sent-65, score-0.557]
17 It is defined as follows: m_{ψ→i}(x_i) = proj[ ∫ ψ(xout | xin) ∏_{i'∈Scope(ψ)} m_{i'→ψ}(x_{i'}) dx_{Scope(ψ)∖{i}} ] / m_{i→ψ}(x_i). [sent-66, score-0.528]
18 The proj operator ensures that the message being passed is a distribution of type type(xi ) – it only has an effect if its argument is outside the approximating family used for the target message. [sent-68, score-0.539]
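As a minimal sketch of what proj and the subsequent division amount to in the Gaussian case (our own illustration; the moment-matching and natural-parameter arithmetic are standard, not code from the paper):

    import numpy as np

    def proj_to_gaussian(samples, weights):
        # Projection onto the Gaussian family = moment matching:
        # keep the weighted mean and variance of the argument.
        mean = np.average(samples, weights=weights)
        var = np.average((samples - mean) ** 2, weights=weights)
        return mean, var

    def gaussian_divide(mean_num, var_num, mean_den, var_den):
        # Divide two Gaussians in natural parameters (precision, precision * mean),
        # as needed to remove the incoming message m_{i->psi} from the projection.
        prec = 1.0 / var_num - 1.0 / var_den
        h = mean_num / var_num - mean_den / var_den
        return h / prec, 1.0 / prec  # note: prec can be <= 0, which EP implementations must handle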
19 The goal is to allow a user to specify a factor to be used in EP solely by specifying a forward sampling procedure; that is, we assume that the user provides an executable stochastic function f(xin) which, given xin, returns a sample of xout. [sent-73, score-1.127]
20 The user further specifies the families of distributions with which to represent the messages associated with the variables of the factor (e. [sent-74, score-0.504]
21 Below we show how to learn fast EP message operators so that the new factor can be used alongside existing factors in a variety of models. [sent-77, score-0.709]
22 Computing Targets with Importance Sampling. Our goal is to compute EP messages from the factor ψ that is associated with f, as if we had access to an analytic expression for ψ(xout | xin). [sent-78, score-0.95]
23 The only way a factor interacts with the rest of the model is via the incoming and outgoing messages, so we can focus on this mapping and the resulting operator can be used in any model. [sent-79, score-0.343]
24 Given incoming messages {miψ (xi )}i∈Scope(ψ) , the simplest approach to computing mψi (xi ) is to use importance sampling. [sent-80, score-0.479]
25 To use this procedure for computing messages m_{ψ→i}(x_i), we use importance sampling with a proposal distribution r, weighting each forward sample by the product of incoming messages ∏_{i'} m_{i'→ψ}(x_{i'}) divided by the proposal density. [sent-82, score-0.456]
26 Generation of Training Data (Algorithm 1, Generate training data): for a given set of incoming messages {m_{i'→ψ}(x_{i'})}_{i'∈Scope(ψ)}, we can produce a target outgoing message using the importance sampling procedure above. [sent-88, score-0.488]
27 Algorithm 1 takes as input ψ and i (specifying that we are learning to send the message m_{ψ→i}(x_i)), a training distribution D_m over incoming messages {m_{i'→ψ}(x_{i'})}_{i'∈Scope(ψ)}, and an importance sampling distribution q(xin). [sent-89, score-1.348]
28 For n = 1 : N it draws incoming messages m^n_{i'→ψ}(x_{i'}) from D_m, draws samples x^{nk}_in ~ q(xin), sets x^{nk}_out = f(x^{nk}_in), and computes importance weights w_{nk} = ∏_{i'∈Scope(ψ)} m^n_{i'→ψ}(x^{nk}_{i'}) / q(x^{nk}_in). [sent-93, score-0.215]
29 It then computes the target µ̂^n(x_i) = proj[ Σ_k w_{nk} δ(x_i − x^{nk}_i) / Σ_k w_{nk} ] and adds the pair ({m^n_{i'→ψ}}, µ̂^n) to the training set. [sent-94, score-0.374]
30 To train a model to automatically compute these messages, we need many example incoming and target outgoing message pairs, generated by drawing incoming messages from some specified distribution and then computing the target outgoing message as above. [sent-100, score-0.833]
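A minimal sketch of one iteration of this data-generation loop (our own illustration, assuming a single Gaussian incoming message on xin and a Gaussian target message on xout; the training distribution and proposal below are placeholders, not the paper's choices):

    import numpy as np

    def make_training_pair(f, rng, K=10000):
        # Draw one incoming-message configuration from a training distribution D_m
        # (here simply a random Gaussian message on x_in).
        m_mean, m_var = rng.normal(0.0, 3.0), rng.uniform(0.1, 5.0)
        # Importance-sampling proposal q(x_in): a broadened version of that message.
        q_mean, q_var = m_mean, 4.0 * m_var
        x_in = rng.normal(q_mean, np.sqrt(q_var), size=K)
        x_out = np.array([f(x) for x in x_in])   # forward-sample the factor
        # Importance weights: incoming-message density / proposal density
        # (constant normalisation terms omitted; they cancel under self-normalisation).
        log_w = -0.5 * (x_in - m_mean) ** 2 / m_var + 0.5 * (x_in - q_mean) ** 2 / q_var
        w = np.exp(log_w - log_w.max())
        # proj: moment-match the weighted samples of x_out to the target family.
        mu = np.average(x_out, weights=w)
        var = np.average((x_out - mu) ** 2, weights=w)
        return np.array([m_mean, m_var]), np.array([mu, var])   # (net input, regression target)

Repeating this N times yields the dataset of (incoming-message statistics, target-message statistics) pairs used in the learning step below.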
31 Learning: Given the training data, we learn a neural network model that takes as input the sufficient statistics defining m^n = {m^n_{i'→ψ}(x_{i'})}_{i'∈Scope(ψ)} and outputs sufficient statistics defining the approximation g(m^n) to the target µ̂^n(x_i). [sent-101, score-0.19]
32 For each output message that the factor needs to send, we train a separate network. [sent-102, score-0.557]
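A sketch of this regression step (our illustration; the use of scikit-learn, the architecture, and the placeholder data are assumptions rather than details from the text above):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    # X: incoming-message statistics (here a Gaussian mean and variance on x_in);
    # Y: target outgoing-message statistics. In practice these come from the
    # importance-sampling sketch above; here they are crude placeholders.
    X = rng.normal(size=(2000, 2))
    X[:, 1] = np.abs(X[:, 1]) + 0.1
    Y = np.column_stack([sigmoid(X[:, 0]), 0.25 * X[:, 1]])

    # One regressor per outgoing message the factor must send.
    net = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=1000)
    net.fit(X, Y)

    # At inference time a message update is just a forward pass.
    predicted_stats = net.predict(np.array([[0.1, 0.8]]))

In practice it usually helps to regress on transformed parameters (e.g. log-variances) so that the network's outputs are always valid sufficient statistics; this is our suggestion rather than a detail stated in the text above.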
33 Choice of Decomposition Structure. So far, we have shown how to incorporate factors into a model when the definition of the factor is via the forward-sample procedure f rather than as an analytic expression ψ. [sent-105, score-0.395]
34 If the new collapsed factor is sufficiently structured in a statistical sense, then this may lead to improved accuracy. [sent-110, score-0.19]
35 A key property of our approach is that a factor may be learned once, then used in a variety of different models without need for re-training. [sent-116, score-0.316]
36 This approach employs a very similar importance sampling strategy but performs inference simply by sending the messages that we use as training data. [sent-118, score-0.54]
37 The advantage is that no function approximator needs to be chosen, and if enough samples are drawn for each message update, the accuracy should be good. [sent-119, score-0.398]
38 The downside is that generation and weighting of a sufficient number of samples can be very expensive, and it is usually not practical to generate enough samples every time a message needs to be sent. [sent-121, score-0.367]
39 Our formulation allows for a very large number of samples to be generated once as an up-front cost then, as long as the learning is done effectively, each message computation is much faster while still effectively drawing on a large number of samples. [sent-122, score-0.402]
40 Empirically we have found that using importance sampling but reducing the number of samples so as to make runtime computation times close to our method can lead to unreliable inference. [sent-124, score-0.217]
41 The primary question of interest is whether, given f, it is feasible to learn the mapping from EP message inputs to outputs in such a way that the learned factors can be used within nontrivial models. [sent-131, score-0.698]
42 This obviously depends on the specifics of f and the model in which the learned factor is used. [sent-132, score-0.316]
43 First, we wanted a simple factor to prove the concept and give an indication of the performance that we might expect. [sent-135, score-0.215]
44 For this, we chose the sigmoid factor, which deterministically computes xout = f(xin) = 1 / (1 + exp(−xin)). [sent-136, score-0.499]
45 For this factor, sensible choices for the messages to xout and xin are Beta and Gaussian distributions respectively. [sent-137, score-1.126]
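For instance, a target Beta message to xout can be obtained by sampling the Gaussian belief on xin, pushing the samples through the sigmoid, and moment-matching a Beta distribution (a sketch of one straightforward way to do this; the moment-matching formulas are the standard ones, and when the incoming message on xout is informative it would additionally enter the importance weights and be divided out, as in the EP update above):

    import numpy as np

    def beta_message_to_xout(mean_in, var_in, rng=np.random.default_rng(), K=100000):
        # Push samples of the Gaussian belief on x_in through the sigmoid.
        s = 1.0 / (1.0 + np.exp(-rng.normal(mean_in, np.sqrt(var_in), size=K)))
        m, v = s.mean(), s.var()
        # Moment-match a Beta(alpha, beta): recover (alpha, beta) from mean and variance.
        common = m * (1.0 - m) / v - 1.0
        return m * common, (1.0 - m) * common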
46 For the first of these, we chose a compound Gamma factor, which is sampled by first drawing a random Gamma variable r2 with rate r1 and shape s1 , then drawing another random Gamma variable xout with rate r2 and shape s2 . [sent-139, score-0.618]
47 This defines xout = f(r1, s1, s2), which is a challenging factor because, depending on the choice of inputs, it can produce a very heavy-tailed distribution over xout. [sent-140, score-1.062]
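The corresponding forward sampler is only a couple of lines (a sketch using NumPy's shape/scale parameterization, where rate = 1/scale):

    import numpy as np

    def compound_gamma_sample(r1, s1, s2, rng=np.random.default_rng()):
        # First draw: Gamma with shape s1 and rate r1.
        r2 = rng.gamma(shape=s1, scale=1.0 / r1)
        # Second draw: Gamma with shape s2 and rate r2; heavy-tailed because a
        # small r2 in the first draw inflates the scale of the second.
        return rng.gamma(shape=s2, scale=1.0 / r2)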
48 Another challenging factor we experiment with is the product factor, which uses xout = f (xin,1 , xin,2 ) = xin,1 × xin,2 . [sent-141, score-0.676]
49 While this is a conceptually simple function, it is highly challenging to use within EP for several reasons, including symmetries due to signs, and the fact that message outputs can change very quickly as functions of message inputs (see Fig. 3). [sent-142, score-0.848]
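A quick toy computation (ours, not from the paper) makes the difficulty tangible: when the incoming beliefs on the two inputs straddle zero, the samples of z = x * y are strongly non-Gaussian, the moment-matched message has a near-zero mean with inflated variance, and a small shift in one input mean flips the sign of the output mean.

    import numpy as np

    rng = np.random.default_rng(0)

    def moment_matched_z(mean_x, var_x, mean_y, var_y, K=200000):
        # Sample the two inputs under their (Gaussian) beliefs and moment-match z = x * y.
        z = rng.normal(mean_x, np.sqrt(var_x), K) * rng.normal(mean_y, np.sqrt(var_y), K)
        return z.mean(), z.var()

    print(moment_matched_z(0.0, 1.0, 0.0, 1.0))    # mean ~ 0; the non-Gaussian shape is hidden in one variance
    print(moment_matched_z(0.1, 1.0, 1.0, 1.0))    # small positive mean
    print(moment_matched_z(-0.1, 1.0, 1.0, 1.0))   # sign of the mean flips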
50 One main reason for the above factor choices is that there are existing hand-crafted implementations in Infer.NET. [sent-144, score-0.215]
51 We can use these implementations to evaluate our learned message operators. [sent-145, score-0.493]
52 Finally, we developed a factor that models the throwing of a ball, which is representative of the type of factors that we believe our framework to be well-suited for, and which is not easily implemented with hand-crafted approximations. [sent-147, score-0.502]
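As an illustration of such a simulator-style factor (a hypothetical sketch: the paper's exact parameterization of the throw is not reproduced in the text above, so the variables and noise model below are our assumptions), the forward-sampling function can simply run the physics of the throw:

    import numpy as np

    def throw_distance(height, velocity, angle, g=9.81):
        # Deterministic projectile model: horizontal distance travelled by a ball
        # released at the given height (m), speed (m/s) and angle (radians).
        vx, vy = velocity * np.cos(angle), velocity * np.sin(angle)
        t_flight = (vy + np.sqrt(vy ** 2 + 2.0 * g * height)) / g
        return vx * t_flight

    def forward_sample_throw(rng, mean_height, velocity_rate, mean_angle):
        # Person-specific randomness: sampled height, Gamma-distributed velocity,
        # noisy release angle.
        h = rng.normal(mean_height, 0.05)
        v = rng.gamma(shape=3.0, scale=1.0 / velocity_rate)
        a = rng.normal(mean_angle, 0.05)
        return throw_distance(h, v, a)

Crucially, nothing here needs to be invertible or have a closed-form density; the learned message operators only ever see samples from it.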
53 For all factors, we use the extensible factor interface in Infer.NET. [sent-148, score-0.19]
54 This interface lets us create factors that compute messages by running a forward pass of the learned neural network. [sent-149, score-0.597]
55 We then studied these factors in a variety of models, using the default Infer.NET settings. [sent-150, score-0.192]
56 First, we learned a factor using the methodology described in Section 3 and evaluated how well the network was able to reconstruct the training data. [sent-156, score-0.394]
57 In Fig. 1 we show histograms of KL errors for the network trained to send forward messages (Fig. 1a) and the network trained to send backwards messages (Fig. 1b). [sent-158, score-0.407] [sent-159, score-0.373]
59 We then used the learned factor within a Bayesian logistic regression model where the output nonlinearity is implemented using either the default Infer.NET factor or our learned factor. [sent-163, score-0.39]
60 In the inset are message parameters (left: Gaussian mean and precision; right: Beta α and β) for the true (top line) and predicted (middle line) message, along with the KL (bottom line). [sent-175, score-0.391]
61 5 Compound Gamma Factor. The compound Gamma factor is useful as a heavy-tailed prior over precisions of Gaussian random variables. [sent-176, score-0.406]
62 Accordingly, we evaluate performance in the context of models where the factor provides a prior precision for learning a Gaussian or mixture of Gaussians model. [sent-177, score-0.252]
63 For this factor, we fixed the value of the inputs xin , which is a standard way that the compound Gamma construction is used as a prior. [sent-179, score-0.604]
64 Here shapein and ratein denote the parameters of the message being sent from the precision variable to the compound Gamma factor. [sent-186, score-0.526]
65 Overlaid on these plots are the message values that were actually encountered when running the experiments in Fig. [sent-192, score-0.397]
66 First column: Log sum of importance weights arising from improved importance sampler (top) and naive sampler (bottom) as a function of the incoming context message. [sent-203, score-0.431]
67 Second column: Improved importance sampler estimate of outgoing message shape parameter (top) and rate parameter (bottom) as a function of the incoming context message. [sent-204, score-0.68]
68 Parameters of the variable-to-factor messages encountered when running the experiments in Fig. [sent-208, score-0.315]
69 Infer.NET (“default”) factor for 20 and 100 datapoints respectively, and true precisions: λ1 = 0. [sent-211, score-0.19]
70 We compare to the same construction but using two hand-crafted Gamma factors to implement the compound Gamma prior. [sent-215, score-0.281]
71 Fig. 8 in the supplementary material shows the means of the learned precisions for two choices of compound Gamma parameters (top is (3, 3, 1), bottom is (1, 1, 1)). [sent-217, score-0.373]
72 Even though some messages were passed in regions with little representation under the importance sampling, the factor was still able to perform reliably. [sent-218, score-0.621]
73 We next evaluate performance of the compound Gamma factors when learning a mixture of Gaussians. [sent-219, score-0.313]
74 We generated data from a mixture of two Gaussians with fixed means but widely different variances, using the compound Gamma prior on the precisions of both Gaussians in the mixture. [sent-220, score-0.248]
75 We see that both factors sometimes under-estimate the true variance, but the learned factor is equally as reliable as the hand-crafted version. [sent-223, score-0.468]
76 We also observed in these experiments that the learned factor was an order of magnitude faster than the built-in factor (total runtime was 11s for the learned factor vs. [sent-224, score-0.854]
77 We compare the built-in factor's result (green) to our learned factor (red) and an importance sampler that is given the same runtime budget as the learned model (black). [sent-350, score-0.634]
78 Inset gives KL between built-in factor and learned factor (red) and IS factor (black). [sent-353, score-0.696]
79 Product Factor The product factor is a surprisingly difficult factor to work with. [sent-354, score-0.413]
80 To illustrate some of the difficulty, we provide plots of output message parameters along slices in input message space (Fig. 3). [sent-355, score-0.734]
81 NET product factor to using our learned product factor for the multiplication of η and the output of the inner products. [sent-363, score-0.572]
82 We applied the learned product factor within a 1-latent-dimensional binary matrix-factorization model, using our learned product factor for the products of latent variables. [sent-377, score-0.349]
83 This is the most challenging model we have considered yet, because (a) EP is known to be unreliable in matrix factorization models [13], and (b) the loopiness of the graph pushes the factor into more extreme ranges, which it might not have been trained as reliably for and/or where the importance sampling estimates used for generating training data are unreliable. [sent-381, score-0.317] [sent-382, score-0.968] [sent-387, score-0.218] [sent-389, score-0.378]
(Figure 3 caption: Message surfaces and failure case plot for the product factor (computing z = xy). Left: mean of the factor-to-z message as a function of the mean parameters of the incoming messages from x and y. Right: failure case plot (posterior means µ by senator index); senators are ordered according to ideal-point means inferred with [13] (SHG).)
87 1 We evaluated the model based on how well the learned factor version recovered the posteriors over senator latent factors that were found by the built-in product factor and the approximate product factor of [13]. [sent-391, score-1.036]
88 The result of this experiment was that midway through inference, the learned factor version produced posteriors with means that were consistent with the built-in factors, although the variances were slightly larger, and the means were noisier. [sent-392, score-0.427]
89 Investigating the cause of this result, we found that a large number of zero-precision messages were being sent, which happens when the projected distribution has larger variance than the context (cavity) distribution it is divided by. (Footnote 1: Data obtained from http://www.) [sent-395, score-0.285]
90 This leads to perhaps an obvious point, which is that our approach will fail when messages required by inference are significantly different from those that were in the training distribution. [sent-401, score-0.395]
91 This can happen via the choice of too-extreme priors, via too many observations driving precisions to extreme values, or via complicated effects arising from the dynamics of message passing on loopy graphs. [sent-402, score-0.5]
92 We learn the factor as before. [sent-409, score-0.475]
93 In the first model, we have person-specific distributions over height (Gaussian), log slope (Gaussian), and the rate parameter (Gamma) of a distribution over throwing velocities. [sent-420, score-0.307] [sent-421, score-0.267]
(Figure 5 caption: Throwing a ball factor experiments. True distributions over individual throwing velocities (black) and predictive distribution based on the learned posterior over velocity rates; panels show persons #1, #2, #4, #5 and #7, with velocity on the horizontal axis.)
95 We then use our learned factor to infer posteriors over the personspecific parameters. [sent-424, score-0.403]
96 6 Discussion We have shown that it is possible to learn to pass EP messages in several challenging cases. [sent-439, score-0.319]
97 Its success depends on (a) the ability of the function approximator to represent the required message updates (which may be highly discontinuous) and (b) the availability of reliable samples of these mappings (some factors may be very hard to invert). [sent-442, score-0.586]
98 A critical ingredient is here an appropriate choice of the distribution of training messages which, at the current stage, can require some manual tuning and experimentation. [sent-445, score-0.328]
99 This leads to an interesting extension, which would be to maintain an estimate of the quality of the approximation over the domain of the factor, and to re-train the factor on the fly when a message is encountered that lies in a low-confidence region. [sent-446, score-0.587]
100 For example, we may be able to ask the network to learn the best message updates subject to a constraint that guarantees convergence. [sent-448, score-0.438]
wordName wordTfidf (topN-words)
[('xin', 0.422), ('xout', 0.419), ('message', 0.367), ('messages', 0.285), ('ep', 0.205), ('factor', 0.19), ('factors', 0.152), ('gamma', 0.133), ('compound', 0.129), ('learned', 0.126), ('importance', 0.108), ('proj', 0.106), ('scope', 0.093), ('uninformed', 0.088), ('precisions', 0.087), ('posteriors', 0.087), ('incoming', 0.086), ('sigmoid', 0.08), ('mi', 0.075), ('xnk', 0.07), ('throwing', 0.07), ('inference', 0.067), ('outgoing', 0.067), ('informed', 0.058), ('xi', 0.058), ('person', 0.057), ('fj', 0.055), ('propagation', 0.054), ('analytic', 0.053), ('send', 0.053), ('inputs', 0.053), ('kl', 0.052), ('sampler', 0.052), ('passing', 0.046), ('velocity', 0.045), ('ball', 0.045), ('stan', 0.044), ('minka', 0.044), ('training', 0.043), ('mn', 0.042), ('modules', 0.04), ('unreliable', 0.04), ('default', 0.04), ('shg', 0.04), ('numerator', 0.038), ('passed', 0.038), ('sampling', 0.037), ('gaussians', 0.037), ('directed', 0.036), ('updates', 0.036), ('drawing', 0.035), ('senator', 0.035), ('senators', 0.035), ('bugs', 0.035), ('network', 0.035), ('challenging', 0.034), ('beta', 0.034), ('forward', 0.034), ('implemented', 0.034), ('product', 0.033), ('wingate', 0.032), ('runtime', 0.032), ('nk', 0.032), ('mixture', 0.032), ('bottom', 0.031), ('routines', 0.031), ('church', 0.031), ('approximator', 0.031), ('herbrich', 0.031), ('microsoft', 0.03), ('precision', 0.03), ('xd', 0.03), ('encountered', 0.03), ('representative', 0.03), ('probabilistic', 0.029), ('kw', 0.029), ('variables', 0.029), ('inferred', 0.028), ('target', 0.028), ('nn', 0.028), ('height', 0.027), ('exempli', 0.027), ('symmetries', 0.027), ('believe', 0.026), ('expressions', 0.026), ('proposal', 0.026), ('goodman', 0.026), ('posterior', 0.026), ('exibility', 0.025), ('plot', 0.025), ('wanted', 0.025), ('winn', 0.025), ('naive', 0.025), ('dx', 0.025), ('specifying', 0.025), ('implementations', 0.025), ('nh', 0.024), ('inset', 0.024), ('accurate', 0.024), ('variances', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000013 168 nips-2013-Learning to Pass Expectation Propagation Messages
Author: Nicolas Heess, Daniel Tarlow, John Winn
Abstract: Expectation Propagation (EP) is a popular approximate posterior inference algorithm that often provides a fast and accurate alternative to sampling-based methods. However, while the EP framework in theory allows for complex nonGaussian factors, there is still a significant practical barrier to using them within EP, because doing so requires the implementation of message update operators, which can be difficult and require hand-crafted approximations. In this work, we study the question of whether it is possible to automatically derive fast and accurate EP updates by learning a discriminative model (e.g., a neural network or random forest) to map EP message inputs to EP message outputs. We address the practical concerns that arise in the process, and we provide empirical analysis on several challenging and diverse factors, indicating that there is a space of factors where this approach appears promising. 1
Author: Yuchen Zhang, John Duchi, Michael Jordan, Martin J. Wainwright
Abstract: We establish lower bounds on minimax risks for distributed statistical estimation under a communication budget. Such lower bounds reveal the minimum amount of communication required by any procedure to achieve the centralized minimax-optimal rates for statistical estimation. We study two classes of protocols: one in which machines send messages independently, and a second allowing for interactive communication. We establish lower bounds for several problems, including various types of location models, as well as for parameter estimation in regression models. 1
3 0.14357582 189 nips-2013-Message Passing Inference with Chemical Reaction Networks
Author: Nils E. Napp, Ryan P. Adams
Abstract: Recent work on molecular programming has explored new possibilities for computational abstractions with biomolecules, including logic gates, neural networks, and linear systems. In the future such abstractions might enable nanoscale devices that can sense and control the world at a molecular scale. Just as in macroscale robotics, it is critical that such devices can learn about their environment and reason under uncertainty. At this small scale, systems are typically modeled as chemical reaction networks. In this work, we develop a procedure that can take arbitrary probabilistic graphical models, represented as factor graphs over discrete random variables, and compile them into chemical reaction networks that implement inference. In particular, we show that marginalization based on sum-product message passing can be implemented in terms of reactions between chemical species whose concentrations represent probabilities. We show algebraically that the steady state concentration of these species correspond to the marginal distributions of the random variables in the graph and validate the results in simulations. As with standard sum-product inference, this procedure yields exact results for tree-structured graphs, and approximate solutions for loopy graphs.
4 0.12452736 318 nips-2013-Structured Learning via Logistic Regression
Author: Justin Domke
Abstract: A successful approach to structured learning is to write the learning objective as a joint function of linear parameters and inference messages, and iterate between updates to each. This paper observes that if the inference problem is “smoothed” through the addition of entropy terms, for fixed messages, the learning objective reduces to a traditional (non-structured) logistic regression problem with respect to parameters. In these logistic regression problems, each training example has a bias term determined by the current set of messages. Based on this insight, the structured energy function can be extended from linear factors to any function class where an “oracle” exists to minimize a logistic loss.
5 0.097501568 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
Author: Daniel S. Levine, Jonathan P. How
Abstract: We consider the sensor selection problem on multivariate Gaussian distributions where only a subset of latent variables is of inferential interest. For pairs of vertices connected by a unique path in the graph, we show that there exist decompositions of nonlocal mutual information into local information measures that can be computed efficiently from the output of message passing algorithms. We integrate these decompositions into a computationally efficient greedy selector where the computational expense of quantification can be distributed across nodes in the network. Experimental results demonstrate the comparative efficiency of our algorithms for sensor selection in high-dimensional distributions. We additionally derive an online-computable performance bound based on augmentations of the relevant latent variable set that, when such a valid augmentation exists, is applicable for any distribution with nuisances. 1
6 0.07805638 16 nips-2013-A message-passing algorithm for multi-agent trajectory planning
7 0.070989013 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction
8 0.06974455 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning
9 0.069446206 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing
10 0.066130526 161 nips-2013-Learning Stochastic Inverses
11 0.065545149 258 nips-2013-Projecting Ising Model Parameters for Fast Mixing
12 0.06520009 52 nips-2013-Bayesian inference as iterated random functions with applications to sequential inference in graphical models
13 0.06514775 160 nips-2013-Learning Stochastic Feedforward Neural Networks
14 0.063205533 149 nips-2013-Latent Structured Active Learning
15 0.062660605 107 nips-2013-Embed and Project: Discrete Sampling with Universal Hashing
16 0.062374469 59 nips-2013-Blind Calibration in Compressed Sensing using Message Passing Algorithms
17 0.06141502 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations
18 0.057310458 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC
19 0.055120215 317 nips-2013-Streaming Variational Bayes
20 0.053723566 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation
topicId topicWeight
[(0, 0.176), (1, 0.052), (2, -0.046), (3, -0.019), (4, 0.035), (5, 0.082), (6, 0.057), (7, -0.023), (8, 0.047), (9, -0.02), (10, 0.049), (11, 0.02), (12, -0.003), (13, -0.011), (14, -0.042), (15, 0.046), (16, -0.045), (17, -0.034), (18, -0.02), (19, 0.021), (20, -0.096), (21, -0.005), (22, 0.01), (23, -0.002), (24, -0.009), (25, 0.061), (26, -0.01), (27, 0.023), (28, 0.107), (29, -0.047), (30, 0.004), (31, 0.046), (32, -0.02), (33, 0.127), (34, 0.151), (35, -0.033), (36, -0.094), (37, 0.021), (38, 0.051), (39, -0.031), (40, -0.07), (41, -0.046), (42, -0.087), (43, -0.057), (44, 0.06), (45, 0.03), (46, 0.047), (47, -0.091), (48, 0.054), (49, 0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.93454617 168 nips-2013-Learning to Pass Expectation Propagation Messages
Author: Nicolas Heess, Daniel Tarlow, John Winn
Abstract: Expectation Propagation (EP) is a popular approximate posterior inference algorithm that often provides a fast and accurate alternative to sampling-based methods. However, while the EP framework in theory allows for complex nonGaussian factors, there is still a significant practical barrier to using them within EP, because doing so requires the implementation of message update operators, which can be difficult and require hand-crafted approximations. In this work, we study the question of whether it is possible to automatically derive fast and accurate EP updates by learning a discriminative model (e.g., a neural network or random forest) to map EP message inputs to EP message outputs. We address the practical concerns that arise in the process, and we provide empirical analysis on several challenging and diverse factors, indicating that there is a space of factors where this approach appears promising. 1
2 0.84395319 189 nips-2013-Message Passing Inference with Chemical Reaction Networks
Author: Nils E. Napp, Ryan P. Adams
Abstract: Recent work on molecular programming has explored new possibilities for computational abstractions with biomolecules, including logic gates, neural networks, and linear systems. In the future such abstractions might enable nanoscale devices that can sense and control the world at a molecular scale. Just as in macroscale robotics, it is critical that such devices can learn about their environment and reason under uncertainty. At this small scale, systems are typically modeled as chemical reaction networks. In this work, we develop a procedure that can take arbitrary probabilistic graphical models, represented as factor graphs over discrete random variables, and compile them into chemical reaction networks that implement inference. In particular, we show that marginalization based on sum-product message passing can be implemented in terms of reactions between chemical species whose concentrations represent probabilities. We show algebraically that the steady state concentration of these species correspond to the marginal distributions of the random variables in the graph and validate the results in simulations. As with standard sum-product inference, this procedure yields exact results for tree-structured graphs, and approximate solutions for loopy graphs.
3 0.6377455 318 nips-2013-Structured Learning via Logistic Regression
Author: Justin Domke
Abstract: A successful approach to structured learning is to write the learning objective as a joint function of linear parameters and inference messages, and iterate between updates to each. This paper observes that if the inference problem is “smoothed” through the addition of entropy terms, for fixed messages, the learning objective reduces to a traditional (non-structured) logistic regression problem with respect to parameters. In these logistic regression problems, each training example has a bias term determined by the current set of messages. Based on this insight, the structured energy function can be extended from linear factors to any function class where an “oracle” exists to minimize a logistic loss.
4 0.62765199 184 nips-2013-Marginals-to-Models Reducibility
Author: Tim Roughgarden, Michael Kearns
Abstract: We consider a number of classical and new computational problems regarding marginal distributions, and inference in models specifying a full joint distribution. We prove general and efficient reductions between a number of these problems, which demonstrate that algorithmic progress in inference automatically yields progress for “pure data” problems. Our main technique involves formulating the problems as linear programs, and proving that the dual separation oracle required by the ellipsoid method is provided by the target problem. This technique may be of independent interest in probabilistic inference. 1
Author: Arash Amini, Xuanlong Nguyen
Abstract: We propose a general formalism of iterated random functions with semigroup property, under which exact and approximate Bayesian posterior updates can be viewed as specific instances. A convergence theory for iterated random functions is presented. As an application of the general theory we analyze convergence behaviors of exact and approximate message-passing algorithms that arise in a sequential change point detection problem formulated via a latent variable directed graphical model. The sequential inference algorithm and its supporting theory are illustrated by simulated examples.
7 0.56560546 59 nips-2013-Blind Calibration in Compressed Sensing using Message Passing Algorithms
8 0.56083012 107 nips-2013-Embed and Project: Discrete Sampling with Universal Hashing
9 0.54652369 160 nips-2013-Learning Stochastic Feedforward Neural Networks
10 0.54484701 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing
11 0.52561843 152 nips-2013-Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions
12 0.52377737 8 nips-2013-A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles
13 0.51903123 340 nips-2013-Understanding variable importances in forests of randomized trees
14 0.51131177 110 nips-2013-Estimating the Unseen: Improved Estimators for Entropy and other Properties
15 0.50917846 94 nips-2013-Distributed $k$-means and $k$-median Clustering on General Topologies
16 0.50514615 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
17 0.49835175 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations
18 0.49823791 332 nips-2013-Tracking Time-varying Graphical Structure
19 0.49495342 115 nips-2013-Factorized Asymptotic Bayesian Inference for Latent Feature Models
20 0.49421141 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models
topicId topicWeight
[(16, 0.032), (33, 0.141), (34, 0.12), (41, 0.036), (49, 0.03), (56, 0.075), (70, 0.031), (85, 0.39), (89, 0.015), (93, 0.043), (95, 0.015)]
simIndex simValue paperId paperTitle
1 0.9303174 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel
Author: Tai Qin, Karl Rohe
Abstract: Spectral clustering is a fast and popular algorithm for finding clusters in networks. Recently, Chaudhuri et al. [1] and Amini et al. [2] proposed inspired variations on the algorithm that artificially inflate the node degrees for improved statistical performance. The current paper extends the previous statistical estimation results to the more canonical spectral clustering algorithm in a way that removes any assumption on the minimum degree and provides guidance on the choice of the tuning parameter. Moreover, our results show how the “star shape” in the eigenvectors–a common feature of empirical networks–can be explained by the Degree-Corrected Stochastic Blockmodel and the Extended Planted Partition model, two statistical models that allow for highly heterogeneous degrees. Throughout, the paper characterizes and justifies several of the variations of the spectral clustering algorithm in terms of these models. 1
2 0.90516746 7 nips-2013-A Gang of Bandits
Author: Nicolò Cesa-Bianchi, Claudio Gentile, Giovanni Zappella
Abstract: Multi-armed bandit problems formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such as online advertisement and, more generally, recommendation systems. In many cases, however, these applications have a strong social component, whose integration in the bandit algorithm could lead to a dramatic performance increase. For instance, content may be served to a group of users by taking advantage of an underlying network of social relationships among them. In this paper, we introduce novel algorithmic approaches to the solution of such networked bandit problems. More specifically, we design and analyze a global recommendation strategy which allocates a bandit algorithm to each network node (user) and allows it to “share” signals (contexts and payoffs) with the neghboring nodes. We then derive two more scalable variants of this strategy based on different ways of clustering the graph nodes. We experimentally compare the algorithm and its variants to state-of-the-art methods for contextual bandits that do not use the relational information. Our experiments, carried out on synthetic and real-world datasets, show a consistent increase in prediction performance obtained by exploiting the network structure. 1
same-paper 3 0.88231212 168 nips-2013-Learning to Pass Expectation Propagation Messages
Author: Nicolas Heess, Daniel Tarlow, John Winn
Abstract: Expectation Propagation (EP) is a popular approximate posterior inference algorithm that often provides a fast and accurate alternative to sampling-based methods. However, while the EP framework in theory allows for complex nonGaussian factors, there is still a significant practical barrier to using them within EP, because doing so requires the implementation of message update operators, which can be difficult and require hand-crafted approximations. In this work, we study the question of whether it is possible to automatically derive fast and accurate EP updates by learning a discriminative model (e.g., a neural network or random forest) to map EP message inputs to EP message outputs. We address the practical concerns that arise in the process, and we provide empirical analysis on several challenging and diverse factors, indicating that there is a space of factors where this approach appears promising. 1
4 0.83322603 332 nips-2013-Tracking Time-varying Graphical Structure
Author: Erich Kummerfeld, David Danks
Abstract: Structure learning algorithms for graphical models have focused almost exclusively on stable environments in which the underlying generative process does not change; that is, they assume that the generating model is globally stationary. In real-world environments, however, such changes often occur without warning or signal. Real-world data often come from generating models that are only locally stationary. In this paper, we present LoSST, a novel, heuristic structure learning algorithm that tracks changes in graphical model structure or parameters in a dynamic, real-time manner. We show by simulation that the algorithm performs comparably to batch-mode learning when the generating graphical structure is globally stationary, and significantly better when it is only locally stationary. 1
5 0.81052059 140 nips-2013-Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
Author: Bruno Scherrer
Abstract: Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal “-discounted optimal policy. We consider two variations of PI: Howard’s PI that changes the actions in all states with a positive advantage, and Simplex-PI that only changes the action in the state with maximal Ï advantage. We show that Howard’s PI terminates 1 2Ì 1 1 22 1 1 nm 1 after at most n(m ≠ 1) 1≠“ log 1≠“ = O 1≠“ log 1≠“ iterations, improving by a factor O(log 1 a result by [3], while Simplex-PI terminates n) 1 22 1 2 1 22 2 1 1 2 after at most n (m ≠ 1) 1 + 1≠“ log 1≠“ = O n m log 1≠“ 1≠“ iterations, improving by a factor O(log n) a result by [11]. Under some structural assumptions of the MDP, we then consider bounds that are independent of the discount factor “: given a measure of the maximal transient time ·t and the maximal time ·r to revisit states in recurrent classes under all policies, we show that Simplex-PI terminates after at most n2 (m≠ # $ 1) (Á·r log(n·r )Ë + Á·r log(n·t )Ë) (m ≠ 1)Án·t log(n·t )Ë + Án·t log(n2 ·t )Ë = !
6 0.73179251 196 nips-2013-Modeling Overlapping Communities with Node Popularities
7 0.70643592 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices
8 0.70520639 89 nips-2013-Dimension-Free Exponentiated Gradient
9 0.69530541 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models
10 0.68680161 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
11 0.68061632 232 nips-2013-Online PCA for Contaminated Data
12 0.6634258 289 nips-2013-Scalable kernels for graphs with continuous attributes
13 0.63315439 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
14 0.63184851 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence
15 0.62937385 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks
16 0.62881362 189 nips-2013-Message Passing Inference with Chemical Reaction Networks
17 0.62070757 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
18 0.61983079 316 nips-2013-Stochastic blockmodel approximation of a graphon: Theory and consistent estimation
19 0.61310667 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
20 0.610255 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic