nips nips2008 nips2008-129 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ydo Wexler, Christopher Meek
Abstract: We propose a multiplicative approximation scheme (MAS) for inference problems in graphical models, which can be applied to various inference algorithms. The method uses ε-decompositions which decompose functions used throughout the inference procedure into functions over smaller sets of variables with a known error ε. MAS translates these local approximations into bounds on the accuracy of the results. We show how to optimize ε-decompositions and provide a fast closed-form solution for an L2 approximation. Applying MAS to the Variable Elimination inference algorithm, we introduce an algorithm we call DynaDecomp which is extremely fast in practice and provides guaranteed error bounds on the result. The superior accuracy and efficiency of DynaDecomp is demonstrated. 1
Reference: text
sentIndex sentText sentNum sentScore
1 MAS: a multiplicative approximation scheme for probabilistic inference Christopher Meek Microsoft Research Redmond, WA 98052 meek@microsoft.com [sent-1, score-0.533]
2 Abstract We propose a multiplicative approximation scheme (MAS) for inference problems in graphical models, which can be applied to various inference algorithms. [sent-3, score-0.677]
3 The method uses ε-decompositions which decompose functions used throughout the inference procedure into functions over smaller sets of variables with a known error ε. [sent-4, score-0.615]
4 MAS translates these local approximations into bounds on the accuracy of the results. [sent-5, score-0.153]
5 Applying MAS to the Variable Elimination inference algorithm, we introduce an algorithm we call DynaDecomp which is extremely fast in practice and provides guaranteed error bounds on the result. [sent-7, score-0.351]
6 1 Introduction Probabilistic graphical models have gained popularity in recent decades due to their intuitive representation and because they enable the user to query the value distribution of variables of interest [19]. [sent-9, score-0.152]
7 Although very appealing, these models suffer from the problem that performing inference in the model (e.g., computing marginal probabilities) is computationally hard. [sent-10, score-0.189]
8 As a result, a variety of approximate inference methods have been developed. [sent-13, score-0.152]
9 Among these methods are loopy message propagation algorithms [24], variational methods [16, 12], mini buckets [10], edge deletion [8], and a variety of Monte Carlo sampling techniques [13, 19, 21, 4, 25]. [sent-14, score-0.204]
10 Approximation algorithms that have useful error bounds and speedup while maintaining high accuracy include the work of Dechter and colleagues [2, 3, 10, 17], which provides both upper and lower bounds on probabilities, and upper bounds suggested by Wainwright et al. [sent-15, score-0.397]
11 The approximation is based on a local operation called an ε-decomposition, which decomposes functions used in the inference procedure into functions over smaller subsets of variables, with a guarantee on the error introduced. [sent-19, score-0.604]
12 The main difference from existing approximations is the ability to translate the error introduced in the local decompositions performed during execution of the algorithm into bounds on the accuracy of the entire inference procedure. [sent-20, score-0.522]
13 We note that this approximation can also be applied to the more general class of multiplicative models introduced in [27]. [sent-21, score-0.271]
14 As an example we show how to apply MAS to the Variable Elimination (VE) algorithm [9, 20], and present an algorithm called DynaDecomp, which dynamically decomposes functions in the VE algorithm. [sent-25, score-0.177]
15 2 Multiplicative Approximation Scheme (MAS) We propose an approximation scheme, called the Multiplicative Approximation Scheme (MAS) for inference problems in graphical models. [sent-28, score-0.258]
16 The basic operations of the scheme are local approximations called ε-decompositions that decouple dependencies between variables. [sent-29, score-0.134]
17 Every such local decomposition has an associated error that our scheme combines into an error bound on the result. [sent-30, score-0.287]
18 Throughout the paper we denote variables and sets of variables with capital letters and denote a value assigned to them with lowercase letters. [sent-35, score-0.142]
19 In addition to approximating functions ψ by which the original model is defined, we also may wish to approximate other functions such as intermediate functions created in the course of an inference algorithm. [sent-40, score-0.598]
20 We can write the result of marginalizing out a set of hidden variables as a product of functions f_i. [sent-41, score-0.258]
21 The log of the probability distribution the model encodes after such marginalization can then be written as log P(A, E) = Σ_i log f_i(U_i) = Σ_i φ_i(U_i)  (1), where A ⊆ H. [sent-42, score-0.382]
22 When A = H we can choose sets Ui = Di and functions fi (Ui ) = ψi (Di ). [sent-43, score-0.187]
23 Definition 1 (ε-decomposition) Given a set of variables W, and a function φ(W) that assigns real values to every instantiation W = w, a set of m functions φ̃_l(W_l), l = 1, . . . [sent-44, score-0.178]
24 m, where W_l ⊆ W, is an ε-decomposition if ∪_l W_l = W and 1/(1 + ε) ≤ (Σ_l φ̃_l(w_l)) / φ(w) ≤ 1 + ε  (2) for some ε ≥ 0, where w_l is the projection of w on W_l. [sent-47, score-0.217]
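The definition above lends itself to a direct numerical check. The sketch below is my own illustration (the dense-table representation, the variable names and the crude toy split are assumptions, not the paper's code): it computes the smallest ε for which Eq. 2 holds for a candidate decomposition of a log-domain function φ, returning infinity when the ratio in Eq. 2 changes sign, which is related to the remark on zero-probability assignments that follows.

```python
# Sketch: measure the epsilon of Eq. 2 for a candidate decomposition of phi(W).
# Assumption (mine): phi and the components are stored as dense dicts over binary variables.
import itertools, random

def epsilon_of_decomposition(phi, scope, components):
    """phi: dict mapping full assignments (tuples ordered as `scope`) to real values.
    components: list of (sub_scope, table) pairs with sub_scope a tuple of names in scope."""
    eps = 0.0
    for w, phi_w in phi.items():
        assignment = dict(zip(scope, w))
        approx = sum(tab[tuple(assignment[v] for v in sub)] for sub, tab in components)
        ratio = approx / phi_w
        if ratio <= 0:        # Eq. 2 cannot hold for any epsilon if the signs disagree
            return float('inf')
        eps = max(eps, ratio - 1.0, 1.0 / ratio - 1.0)
    return eps

# Toy usage: a crude split of phi(A, B, C) into phi_1(A, B) + phi_2(B, C).
random.seed(0)
scope = ('A', 'B', 'C')
phi = {w: random.uniform(1.0, 2.0) for w in itertools.product((0, 1), repeat=3)}
phi1 = {w: 0.5 * phi[(w[0], w[1], 0)] for w in itertools.product((0, 1), repeat=2)}
phi2 = {w: 0.5 * phi[(0, w[0], w[1])] for w in itertools.product((0, 1), repeat=2)}
print(epsilon_of_decomposition(phi, scope, [(('A', 'B'), phi1), (('B', 'C'), phi2)]))
```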
25 We also note that when approximating models in which some assignments have zero probability, the theoretical error bounds can be arbitrarily bad, yet, in practice the approximation can sometimes yield good results. [sent-53, score-0.353]
26 1, then the log of the joint probability P(a, e) can be approximated within a multiplicative factor of 1 + ε_max using a set of ε_i-decompositions, where ε_max = max_i{ε_i}. [sent-56, score-0.426]
27 Proof: Recall that Σ_j (c_j)^r ≤ (Σ_j c_j)^r for any set of numbers c_j ≥ 0 and r ≥ 1. [sent-58, score-0.21]
28 Similarly, Σ_j (c_j)^r ≥ (Σ_j c_j)^r for any such set when 0 ≤ r ≤ 1. Note that whenever E = ∅, Theorem 1 claims that the log of all marginal probabilities can be approximated within a multiplicative factor of 1 + ε_max. [sent-60, score-0.548]
29 In addition, for any E ⊆ X, by setting A = ∅ the log-likelihood log P(e) can be approximated within the same factor. [sent-61, score-0.154]
30 1, for a set H̄ ⊆ H we can write log ⊕_h̄ P(h, e) = Σ_i log f_i(U_i) = Σ_i φ_i(U_i)  (3). Theorem 2 Given a set A ⊆ H, the log of the MAP probability log max_a ⊕_{H\A=h−} P(h, e) can be approximated within a multiplicative factor of 1 + ε_max using a set of ε_i-decompositions. [sent-66, score-0.759]
31 Proof: The proof follows that of Theorem 1 with the addition of the fact that max_j (c_j)^r = (max_j c_j)^r for any set of real numbers c_j ≥ 0 and r ≥ 0. [sent-67, score-0.284]
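The two proofs rest on elementary facts about sums and maxima of non-negative numbers raised to a power; a quick numeric sanity check of these facts (my own illustration, not part of the paper):

```python
# For c_j >= 0: sum_j c_j**r <= (sum_j c_j)**r when r >= 1, the inequality reverses
# when 0 <= r <= 1, and max_j (c_j**r) == (max_j c_j)**r for every r >= 0.
cs = [0.3, 1.7, 2.0, 0.05]
for r in (1.0, 1.5, 3.0):
    assert sum(c ** r for c in cs) <= sum(cs) ** r + 1e-12
for r in (0.2, 0.5, 1.0):
    assert sum(c ** r for c in cs) >= sum(cs) ** r - 1e-12
for r in (0.0, 0.7, 2.5):
    assert abs(max(c ** r for c in cs) - max(cs) ** r) < 1e-12
print("power-sum and max identities hold on the example")
```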
32 An immediate conclusion from Theorem 2 is that the MPE probability can also be approximated with the same error bounds, by choosing A = H. [sent-68, score-0.136]
33 2.1 Compounded Approximation The results on using ε-decompositions assume that we decompose functions f_i as in Eqs. 1 and 3. [sent-70, score-0.259]
34 Here we consider decompositions of any function created during the inference procedure, and in particular compounded decompositions of functions that were already decomposed. [sent-72, score-0.588]
35 Suppose that a function φ̃(W), which already incurs an error ε_1 compared to a function φ(W), can be decomposed with an error ε_2. [sent-73, score-0.188]
36 To understand the guaranteed error for an entire inference procedure, consider a directed graph where the nodes represent functions of the inference procedure, and each node v has an associated error r_v. [sent-76, score-0.737]
37 The nodes representing the initial potential functions of the model ψi have no parents in the model and are associated with zero error (rv = 1). [sent-77, score-0.179]
38 An ε-decomposition, on the other hand, has a single source node s with an associated error r_s, representing the decomposed function, and several target nodes T, with an error r_t = (1 + ε)r_s for every t ∈ T. [sent-79, score-0.228]
39 The guaranteed error for the entire inference procedure is then the error associated with the sink function in the graph. [sent-80, score-0.372]
40 In Figure 1 we illustrate such a graph for an inference procedure that starts with four functions (f_a, f_b, f_c and f_d) and decomposes three functions, f_a, f_g and f_j, with errors ε_1, ε_2 and ε_3 respectively. [sent-81, score-0.574]
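The error-propagation graph can be simulated directly. The sketch below is my own illustration: it assumes, following the argument behind Theorem 1, that a node obtained by multiplying several functions inherits the maximum error factor of its parents, while every target of an ε-decomposition of a source s gets r_t = (1 + ε)·r_s; the node names and wiring only loosely mimic Figure 1.

```python
# Propagate error factors through a DAG of inference-procedure functions.
def guaranteed_error(nodes, parents, eps_of):
    """nodes: topological order; parents: node -> list of parent nodes;
    eps_of: node -> eps, present only for targets of an epsilon-decomposition
    (which have a single parent, the decomposed source). Returns the sink's factor."""
    r = {}
    for v in nodes:
        ps = parents.get(v, [])
        if not ps:                      # initial potential psi_i: zero error, r = 1
            r[v] = 1.0
        elif v in eps_of:               # target of an epsilon-decomposition
            (s,) = ps
            r[v] = (1.0 + eps_of[v]) * r[s]
        else:                           # product of previously computed functions
            r[v] = max(r[p] for p in ps)
    return r[nodes[-1]]

# Toy run: f_a is decomposed with eps_1 = 0.01, a later product f_g with eps_2 = 0.02,
# and f_j with eps_3 = 0.03; the compounded guarantee is 1.01 * 1.02 * 1.03 ~= 1.0611.
nodes = ['fa', 'fb', 'fc', 'fd', 'fa1', 'fa2', 'fg', 'fg1', 'fg2', 'fj', 'fj1', 'fj2', 'sink']
parents = {'fa1': ['fa'], 'fa2': ['fa'], 'fg': ['fa1', 'fb'], 'fg1': ['fg'], 'fg2': ['fg'],
           'fj': ['fg1', 'fc'], 'fj1': ['fj'], 'fj2': ['fj'],
           'sink': ['fj1', 'fj2', 'fa2', 'fg2', 'fd']}
eps_of = {'fa1': 0.01, 'fa2': 0.01, 'fg1': 0.02, 'fg2': 0.02, 'fj1': 0.03, 'fj2': 0.03}
print(guaranteed_error(nodes, parents, eps_of))   # ~1.0611, i.e. a 6.1% bound on the log
```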
41 2.2 ε-decomposition Optimization ε-decompositions can be utilized in inference algorithms to reduce the computational cost by parsimoniously approximating factors that occur during the course of computation. [sent-84, score-0.213]
42 Here we consider the problem of optimizing the approximating functions φ̃_i given a selected factorization W_i. [sent-88, score-0.168]
43 Given a function f(W) = e^{φ(W)} and the sets W_i, the goal is to optimize the functions φ̃_i(W_i) in order to minimize the error ε_f introduced in the decomposition. [sent-89, score-0.179]
44 On the other hand, functions defined over a small domain often cannot be decomposed without introducing a large error. [sent-102, score-0.151]
45 4 to choose the functions φ̃_i, we may increase the worst-case penalty error but not necessarily the actual error achieved by the approximation. [sent-106, score-0.251]
46 Hence we are left with the task of minimizing: Figure 1: A schematic description of an inference procedure along with the associated error. [sent-115, score-0.186]
47 The procedure starts with four functions (f_a, f_b, f_c and f_d) and decomposes three functions, f_a, f_g and f_j, with errors ε_1, ε_2 and ε_3 respectively. [sent-116, score-0.422]
48 Figure 2: An irreducible minor graph of a 4 × 4 Ising model that can be obtained via VE without creating functions of more than 3 variables. [sent-118, score-0.152]
49 Applying MAS, only one function over three variables needs to be decomposed into two functions over overlapping sets of variables in order to complete inference using only functions over three or fewer variables. [sent-119, score-0.602]
50 min_{(φ̃_1, . . . , φ̃_m)} Σ_{w∈W} ( Σ_i φ̃_i(w_i) − φ(w) )^2  (7) We use the notation w ≈ w_k to denote an instantiation W = w that is consistent with the instantiation W_k = w_k. [sent-123, score-0.196]
51 Although no closed-form solution is known for the minimization problem in Eq. 9, iterative algorithms were devised for variational approximation, which start with arbitrary functions φ̃_i(W_i) and converge to a local minimum [16, 12]. [sent-136, score-0.22]
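Since the prediction Σ_i φ̃_i(w_i) is linear in the unknown table entries, the L2 objective of Eq. 7 can also be attacked with a generic least-squares solver. The sketch below is my own illustration, not the paper's specialized closed form, and only enumerates assignments of small binary scopes; the problem is rank-deficient when the W_i overlap, in which case lstsq returns the minimum-norm solution.

```python
# Fit component tables phi_i(W_i) minimizing sum_w (sum_i phi_i(w_i) - phi(w))^2.
import itertools
import numpy as np

def l2_decomposition(phi_table, scope, sub_scopes, card=2):
    """phi_table: values of phi(w) in lexicographic order over `scope` (binary variables).
    sub_scopes: list of tuples of variable names, each a subset of scope."""
    assignments = list(itertools.product(range(card), repeat=len(scope)))
    offsets, ncols = [], 0
    for sub in sub_scopes:              # one unknown per entry of each component table
        offsets.append(ncols)
        ncols += card ** len(sub)
    A = np.zeros((len(assignments), ncols))
    for row, w in enumerate(assignments):
        a = dict(zip(scope, w))
        for k, sub in enumerate(sub_scopes):
            idx = 0
            for v in sub:               # flat index of the entry selected by w's projection
                idx = idx * card + a[v]
            A[row, offsets[k] + idx] = 1.0
    x, *_ = np.linalg.lstsq(A, np.asarray(phi_table, dtype=float), rcond=None)
    return [x[offsets[k]: offsets[k] + card ** len(sub)].reshape((card,) * len(sub))
            for k, sub in enumerate(sub_scopes)]

# Toy usage: decompose phi(A, B, C) into phi_1(A, B) + phi_2(B, C) and report the fit.
phi = np.random.default_rng(1).normal(size=8)
phi1, phi2 = l2_decomposition(phi, ('A', 'B', 'C'), [('A', 'B'), ('B', 'C')])
recon = np.array([phi1[a, b] + phi2[b, c] for a, b, c in itertools.product((0, 1), repeat=3)])
print(np.max(np.abs(recon - phi)))      # worst-case residual of the L2-optimal additive fit
```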
52 3 Applying MAS to Inference Algorithms Our multiplicative approximation scheme offers a way to reduce the computational cost of inference by decoupling variables via ε-decompositions. [sent-139, score-0.615]
53 The fact that many existing inference algorithms compute and utilize multiplicative factors during the course of computation means that the scheme can be applied widely. [sent-140, score-0.419]
54 The approach does require a mechanism to select functions to decompose; however, the flexibility of the scheme allows a variety of alternative mechanisms. [sent-141, score-0.202]
55 Below we consider the application of our approximation scheme to variable elimination with yet another selection strategy. [sent-144, score-0.31]
56 The ideal application of our scheme is likely to depend both on the specific inference algorithm and the application of interest. [sent-146, score-0.247]
57 3.1 Dynamic Decompositions One family of decomposition strategies of particular interest are those that allow for dynamic decompositions during the inference procedure. [sent-148, score-0.381]
58 In this dynamic framework, MAS can be incorporated into known exact inference algorithms for graphical models, provided that local functions can be bounded according to Eq. 2. [sent-149, score-0.339]
59 A dynamic decomposition strategy applies ε-decompositions to functions by which the original model is defined and to intermediate functions created in the course of the inference algorithm, according to Eq. 2. [sent-151, score-0.482]
60 Unlike other approximation methods, such as the variational approach [16] or the edge deletion approach [8], dynamic decompositions have the capability of decoupling two variables in some contexts while maintaining their dependence in others. [sent-154, score-0.581]
61 If we wish to restrict ourselves to functions over three or fewer variables when performing inference on a 4 × 4 Ising model, the model in Figure 2 is an inevitable minor, and from this point of the elimination, approximation is mandatory. [sent-155, score-0.392]
62 In the variational framework, an edge in the graph should be removed, disconnecting the direct dependence between two or more variables (e.g. [sent-156, score-0.228]
63 removing the edge A-C would result in breaking the set ABC into the sets AB and BC and breaking the set ACD into AD and CD). [sent-158, score-0.126]
64 Dynamic decompositions allow for a more refined decoupling, where the dependence is removed only in some of the functions. [sent-160, score-0.145]
65 In our example breaking the set ABC into AB and BC while keeping the set ACD intact is possible and is also sufficient for reducing the complexity of inference to functions of no more than three variables (the elimination order would be: A,B,F,H,C,E,D,G). [sent-161, score-0.524]
66 2, then we are guaranteed not to exceed this error for the entire approximate inference procedure. [sent-163, score-0.266]
67 It is possible to decompose the function over the set ABC into two functions over the sets AB and BC with an arbitrarily small error, while the same is not possible for the function over the set ACD. [sent-165, score-0.215]
68 Hence, in this example the result of our method will be nearly equal to the solution of exact inference on the model, and the theoretical error bounds will be arbitrarily small, while other approaches, such as the variational method, can yield arbitrarily bad approximations. [sent-166, score-0.494]
69 In this algorithm variables V ∈ H are summed out iteratively after multiplying all existing functions that include V, yielding intermediate functions f(W ⊆ X) where V ∉ W. [sent-168, score-0.317]
70 MAS can be incorporated into the VE algorithm by identifying ε-decompositions for some of the intermediate functions f. [sent-169, score-0.284]
71 This results in the elimination of f from the pool of functions and adding instead the functions f̃_i(W_i) = e^{φ̃_i(W_i)}. [sent-170, score-0.447]
72 Throughout the algorithm the maximal error ε_max introduced by the decompositions is maintained. Algorithm 1: DynaDecomp. Table 1: Accuracy and speedup for grid-like models. [sent-173, score-0.337]
73 Upper panel: attractive Ising models; Middle panel: repulsive Ising models; Lower panel: Bayesian network grids with random probabilities. [sent-174, score-0.169]
74 Input: a set of variables X = {X_1, . . . , X_n} and functions ψ_i(D_i ⊆ X) that encode P(X) = Π_i ψ_i(D_i); a set E = X \ H of observed variables and their assignment E = e; an elimination order R over the variables in H; scalars M and η. [sent-270, score-0.434]
75 The approximating functions in this algorithm are strictly disjoint, of size no more than M , and with the variables assigned randomly to the functions. [sent-276, score-0.239]
76 There we use the notation ⊗(T ) to denote multiplication of the functions f ∈ T , and (f ) to denote decomposition of function f . [sent-278, score-0.155]
77 The outcome of (f) is a pair (ε, F̃) where the functions f̃_i ∈ F̃ are over disjoint sets of variables. [sent-279, score-0.236]
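A stripped-down sketch in the spirit of the algorithm just described is given below. It is my own illustration rather than Algorithm 1: the block-mean fitting heuristic, the positive log-potentials in the toy model, and the per-factor error bookkeeping (products take the maximum of their inputs' error factors, decompositions multiply it by 1 + ε) are all assumptions; the real algorithm also uses the scalars M and η to decide when decomposition is worthwhile.

```python
# Variable elimination over log-domain factors with dynamic epsilon-decompositions.
# Factors are (scope, log_table, r) triples over binary variables; r is the accumulated
# multiplicative error factor on the log (r = 1 means exact).
import numpy as np
from scipy.special import logsumexp

def embed(sub_scope, table, scope):
    """Broadcast a log-table defined over sub_scope into the axis order of scope."""
    present = [v for v in scope if v in sub_scope]
    t = np.transpose(table, [sub_scope.index(v) for v in present])
    return t.reshape([2 if v in sub_scope else 1 for v in scope])

def multiply(factors):
    scope = tuple(sorted(set().union(*(set(s) for s, _, _ in factors))))
    table = sum(embed(s, t, scope) for s, t, _ in factors)
    return scope, table, max(r for _, _, r in factors)

def sum_out(scope, table, r, var):
    ax = scope.index(var)
    return tuple(v for v in scope if v != var), logsumexp(table, axis=ax), r

def decompose(scope, table, r, M):
    """Split a log-factor into strictly disjoint blocks of at most M variables (in order
    here, randomly in the paper) and measure the epsilon of Eq. 2 that the split incurs."""
    blocks = [scope[i:i + M] for i in range(0, len(scope), M)]
    m, overall = len(blocks), table.mean()
    pieces = []
    for blk in blocks:
        other = tuple(i for i, v in enumerate(scope) if v not in blk)
        pieces.append((blk, table.mean(axis=other) - (1 - 1 / m) * overall))
    ratio = sum(embed(s, t, scope) for s, t in pieces) / table
    eps = float('inf') if np.any(ratio <= 0) else float(max(ratio.max(), (1 / ratio).max()) - 1)
    return [(s, t, (1 + eps) * r) for s, t in pieces]

def dynadecomp_ve(model, order, M=3):
    """model: list of (scope, log_table); order must cover every variable (no evidence)."""
    pool = [(s, t, 1.0) for s, t in model]
    for var in order:
        bucket = [f for f in pool if var in f[0]]
        if not bucket:
            continue
        pool = [f for f in pool if var not in f[0]]
        scope, table, r = sum_out(*multiply(bucket), var)
        if len(scope) > M:
            pool.extend(decompose(scope, table, r, M))
        else:
            pool.append((scope, table, r))
    log_z = sum(float(t) for _, t, _ in pool)        # remaining factors are scalars
    return log_z, max(r for _, _, r in pool) - 1.0   # value and its multiplicative bound

# Toy model: positive log-potentials (so the Eq. 2 ratio is well defined); eliminating X0
# first creates a function over three variables, which is then split into blocks of size 2.
rng = np.random.default_rng(0)
model = [(('X0', 'X1'), rng.uniform(1.0, 1.2, size=(2, 2))),
         (('X0', 'X2'), rng.uniform(1.0, 1.2, size=(2, 2))),
         (('X0', 'X3'), rng.uniform(1.0, 1.2, size=(2, 2))),
         (('X1', 'X2', 'X3'), rng.uniform(1.0, 1.2, size=(2, 2, 2)))]
print(dynadecomp_ve(model, ['X0', 'X1', 'X2', 'X3'], M=2))
```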
78 We note that MAS can also be used on top of other common, widely used algorithms for exact inference in probabilistic models, thus gaining benefits similar to those the exact algorithms offer. [sent-280, score-0.241]
79 For example, when applying MAS to the junction tree algorithm [14], a decomposition can decouple variables in messages sent from one node in the junction tree to another, approximating all marginal distributions of single variables in the model in a single run, with similar guarantees on the error. [sent-281, score-0.331]
80 4 Results We demonstrate the power of MAS by reporting the accuracy and theoretical bounds for our DynaDecomp algorithm for a variety of models. [sent-283, score-0.153]
81 The quality of approximation is measured in terms of accuracy and speedup. [sent-287, score-0.13]
82 The accuracy is reported as max{ log L̃ / log L, log L / log L̃ } − 1, where L is the likelihood and L̃ is the approximate likelihood achieved by DynaDecomp. [sent-288, score-0.496]
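A one-line helper (my own illustration) that computes this accuracy measure from the exact and approximate log-likelihoods:

```python
import math

def log_likelihood_accuracy(log_l_exact, log_l_approx):
    ratio = log_l_approx / log_l_exact
    return max(ratio, 1.0 / ratio) - 1.0

print(log_likelihood_accuracy(math.log(1e-20), math.log(1.3e-20)))   # ~0.0057
```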
83 We also report the theoretical accuracy which is the maximum error introduced by decomposition operations. [sent-289, score-0.188]
84 The speedup is reported as a ratio of run-times for obtaining the approximated and exact solutions, in addition to the absolute time of approximation. [sent-290, score-0.166]
85 The parameters i and m, which are the maximal number of variables and functions in a mini-bucket, were initially set to 3 and 1 respectively. [sent-296, score-0.178]
86 The first is an Ising model with random attractive or repulsive pair-wise potentials, as was used in [28]. [sent-300, score-0.136]
87 When computing likelihood in these models we randomly assigned values to 10% of the variables in the model. [sent-301, score-0.142]
88 As a side note, the MB algorithm performed significantly better on attractive Ising models than on repulsive ones. [sent-320, score-0.173]
89 We applied our method to probabilistic phylogenetic models. [sent-329, score-0.194]
90 Previous works [15, 26] have obtained upper and lower bounds on the likelihood of evidence in the models suggested in [22] using variational methods, reporting an error of 1%. [sent-331, score-0.341]
91 0.01% error on average within a few seconds, which improves over previous results by two orders of magnitude both in terms of accuracy and speedup. [sent-333, score-0.14]
92 In addition, we applied DynaDecomp to 24 models from the UAI’06 evaluation of probabilistic inference repository [1] with η = 1%. [sent-334, score-0.241]
93 Only models that did not have zeros and that our exact inference algorithm could solve in less than an hour were used. [sent-335, score-0.189]
94 References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] Evaluation of probabilistic inference systems: http://tinyurl. [sent-345, score-0.204]
95 A variational approach for approximating Bayesian networks by edge deletion. [sent-357, score-0.218]
96 The computational complexity of probabilistic inference using Bayesian belief networks. [sent-360, score-0.255]
97 Approximating probabilistic inference in Bayesian belief networks is NP-hard. [sent-363, score-0.255]
98 A variational inference procedure allowing internal structure for overlapping clusters and deterministic constraints. [sent-382, score-0.349]
99 Simulation approaches to general probabilistic inference on belief networks. [sent-411, score-0.255]
100 A new class of upper bounds on the log partition function. [sent-417, score-0.175]
wordName wordTfidf (topN-words)
[('dynadecomp', 0.433), ('mas', 0.409), ('wl', 0.217), ('wi', 0.189), ('multiplicative', 0.172), ('elimination', 0.153), ('inference', 0.152), ('ising', 0.146), ('decompositions', 0.145), ('abc', 0.144), ('mpe', 0.144), ('variational', 0.113), ('functions', 0.107), ('cj', 0.105), ('mb', 0.101), ('wk', 0.098), ('repulsive', 0.096), ('uil', 0.096), ('wexler', 0.096), ('scheme', 0.095), ('ui', 0.092), ('log', 0.09), ('bounds', 0.085), ('fi', 0.08), ('acd', 0.072), ('dechter', 0.072), ('gmf', 0.072), ('phylogenetic', 0.072), ('meek', 0.072), ('error', 0.072), ('decompose', 0.072), ('uai', 0.071), ('variables', 0.071), ('decomposes', 0.07), ('speedup', 0.07), ('xij', 0.07), ('accuracy', 0.068), ('approximated', 0.064), ('decoupling', 0.063), ('approximation', 0.062), ('approximating', 0.061), ('fa', 0.058), ('sw', 0.058), ('geiger', 0.058), ('il', 0.056), ('probabilistic', 0.052), ('belief', 0.051), ('overlapping', 0.05), ('max', 0.05), ('disjoint', 0.049), ('gbp', 0.048), ('jojic', 0.048), ('siepel', 0.048), ('decomposition', 0.048), ('ab', 0.047), ('deletion', 0.047), ('minor', 0.045), ('di', 0.044), ('seconds', 0.044), ('decomposed', 0.044), ('graphical', 0.044), ('edge', 0.044), ('dj', 0.043), ('redmond', 0.042), ('bidyuk', 0.042), ('alse', 0.042), ('fg', 0.042), ('maxj', 0.042), ('guaranteed', 0.042), ('breaking', 0.041), ('attractive', 0.04), ('rs', 0.04), ('bc', 0.04), ('compounded', 0.039), ('decouple', 0.039), ('aposteriori', 0.039), ('fc', 0.039), ('jair', 0.039), ('fb', 0.039), ('shachter', 0.039), ('recomb', 0.039), ('models', 0.037), ('arbitrarily', 0.036), ('ve', 0.036), ('dd', 0.036), ('dynamic', 0.036), ('marginal', 0.034), ('procedure', 0.034), ('junction', 0.034), ('rv', 0.034), ('likelihood', 0.034), ('probabilities', 0.033), ('fd', 0.033), ('anytime', 0.033), ('maxa', 0.033), ('grids', 0.033), ('intermediate', 0.032), ('bayesian', 0.032), ('addition', 0.032), ('encodes', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
Author: Ydo Wexler, Christopher Meek
Abstract: We propose a multiplicative approximation scheme (MAS) for inference problems in graphical models, which can be applied to various inference algorithms. The method uses ε-decompositions which decompose functions used throughout the inference procedure into functions over smaller sets of variables with a known error ε. MAS translates these local approximations into bounds on the accuracy of the results. We show how to optimize ε-decompositions and provide a fast closed-form solution for an L2 approximation. Applying MAS to the Variable Elimination inference algorithm, we introduce an algorithm we call DynaDecomp which is extremely fast in practice and provides guaranteed error bounds on the result. The superior accuracy and efficiency of DynaDecomp is demonstrated. 1
2 0.10857394 96 nips-2008-Hebbian Learning of Bayes Optimal Decisions
Author: Bernhard Nessler, Michael Pfeiffer, Wolfgang Maass
Abstract: Uncertainty is omnipresent when we perceive or interact with our environment, and the Bayesian framework provides computational methods for dealing with it. Mathematical models for Bayesian decision making typically require datastructures that are hard to implement in neural networks. This article shows that even the simplest and experimentally best supported type of synaptic plasticity, Hebbian learning, in combination with a sparse, redundant neural code, can in principle learn to infer optimal Bayesian decisions. We present a concrete Hebbian learning rule operating on log-probability ratios. Modulated by reward-signals, this Hebbian plasticity rule also provides a new perspective for understanding how Bayesian inference could support fast reinforcement learning in the brain. In particular we show that recent experimental results by Yang and Shadlen [1] on reinforcement learning of probabilistic inference in primates can be modeled in this way. 1
3 0.10469931 40 nips-2008-Bounds on marginal probability distributions
Author: Joris M. Mooij, Hilbert J. Kappen
Abstract: We propose a novel bound on single-variable marginal probability distributions in factor graphs with discrete variables. The bound is obtained by propagating local bounds (convex sets of probability distributions) over a subtree of the factor graph, rooted in the variable of interest. By construction, the method not only bounds the exact marginal probability distribution of a variable, but also its approximate Belief Propagation marginal (“belief”). Thus, apart from providing a practical means to calculate bounds on marginals, our contribution also lies in providing a better understanding of the error made by Belief Propagation. We show that our bound outperforms the state-of-the-art on some inference problems arising in medical diagnosis. 1
4 0.096287057 2 nips-2008-A Convex Upper Bound on the Log-Partition Function for Binary Distributions
Author: Laurent E. Ghaoui, Assane Gueye
Abstract: We consider the problem of bounding from above the log-partition function corresponding to second-order Ising models for binary distributions. We introduce a new bound, the cardinality bound, which can be computed via convex optimization. The corresponding error on the logpartition function is bounded above by twice the distance, in model parameter space, to a class of “standard” Ising models, for which variable inter-dependence is described via a simple mean field term. In the context of maximum-likelihood, using the new bound instead of the exact log-partition function, while constraining the distance to the class of standard Ising models, leads not only to a good approximation to the log-partition function, but also to a model that is parsimonious, and easily interpretable. We compare our bound with the log-determinant bound introduced by Wainwright and Jordan (2006), and show that when the l1 -norm of the model parameter vector is small enough, the latter is outperformed by the new bound. 1 1.1
5 0.093160816 69 nips-2008-Efficient Exact Inference in Planar Ising Models
Author: Nicol N. Schraudolph, Dmitry Kamenetsky
Abstract: We give polynomial-time algorithms for the exact computation of lowest-energy states, worst margin violators, partition functions, and marginals in certain binary undirected graphical models. Our approach provides an interesting alternative to the well-known graph cut paradigm in that it does not impose any submodularity constraints; instead we require planarity to establish a correspondence with perfect matchings in an expanded dual graph. Maximum-margin parameter estimation for a boundary detection task shows our approach to be efficient and effective. A C++ implementation is available from http://nic.schraudolph.org/isinf/. 1
6 0.086039566 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms
7 0.078071423 214 nips-2008-Sparse Online Learning via Truncated Gradient
8 0.074728668 161 nips-2008-On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization
9 0.071025804 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables
10 0.068495989 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models
11 0.066421606 250 nips-2008-Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning
12 0.065658882 127 nips-2008-Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction
13 0.057494301 73 nips-2008-Estimating Robust Query Models with Convex Optimization
14 0.056756601 118 nips-2008-Learning Transformational Invariants from Natural Movies
15 0.056602333 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE
16 0.056567572 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
17 0.056358609 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
18 0.056283526 62 nips-2008-Differentiable Sparse Coding
19 0.056164812 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees
20 0.055173226 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations
topicId topicWeight
[(0, -0.201), (1, 0.009), (2, -0.032), (3, 0.036), (4, 0.051), (5, -0.076), (6, 0.03), (7, 0.051), (8, -0.039), (9, -0.055), (10, -0.026), (11, 0.098), (12, 0.137), (13, -0.012), (14, -0.019), (15, -0.033), (16, 0.011), (17, -0.038), (18, 0.021), (19, -0.03), (20, 0.024), (21, 0.002), (22, -0.091), (23, -0.114), (24, 0.051), (25, 0.09), (26, 0.072), (27, -0.014), (28, -0.066), (29, -0.044), (30, -0.16), (31, 0.023), (32, -0.011), (33, -0.037), (34, -0.023), (35, -0.001), (36, 0.09), (37, 0.025), (38, 0.066), (39, -0.092), (40, 0.056), (41, -0.092), (42, -0.08), (43, 0.046), (44, 0.016), (45, 0.013), (46, -0.037), (47, 0.155), (48, -0.055), (49, 0.06)]
simIndex simValue paperId paperTitle
same-paper 1 0.94429016 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
Author: Ydo Wexler, Christopher Meek
Abstract: We propose a multiplicative approximation scheme (MAS) for inference problems in graphical models, which can be applied to various inference algorithms. The method uses ε-decompositions which decompose functions used throughout the inference procedure into functions over smaller sets of variables with a known error ε. MAS translates these local approximations into bounds on the accuracy of the results. We show how to optimize ε-decompositions and provide a fast closed-form solution for an L2 approximation. Applying MAS to the Variable Elimination inference algorithm, we introduce an algorithm we call DynaDecomp which is extremely fast in practice and provides guaranteed error bounds on the result. The superior accuracy and efficiency of DynaDecomp is demonstrated. 1
2 0.65951753 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms
Author: Michael Isard, John MacCormick, Kannan Achan
Abstract: Continuously-Adaptive Discretization for Message-Passing (CAD-MP) is a new message-passing algorithm for approximate inference. Most message-passing algorithms approximate continuous probability distributions using either: a family of continuous distributions such as the exponential family; a particle-set of discrete samples; or a fixed, uniform discretization. In contrast, CAD-MP uses a discretization that is (i) non-uniform, and (ii) adaptive to the structure of the marginal distributions. Non-uniformity allows CAD-MP to localize interesting features (such as sharp peaks) in the marginal belief distributions with time complexity that scales logarithmically with precision, as opposed to uniform discretization which scales at best linearly. We give a principled method for altering the non-uniform discretization according to information-based measures. CAD-MP is shown in experiments to estimate marginal beliefs much more precisely than competing approaches for the same computational expense. 1
3 0.65633011 2 nips-2008-A Convex Upper Bound on the Log-Partition Function for Binary Distributions
Author: Laurent E. Ghaoui, Assane Gueye
Abstract: We consider the problem of bounding from above the log-partition function corresponding to second-order Ising models for binary distributions. We introduce a new bound, the cardinality bound, which can be computed via convex optimization. The corresponding error on the logpartition function is bounded above by twice the distance, in model parameter space, to a class of “standard” Ising models, for which variable inter-dependence is described via a simple mean field term. In the context of maximum-likelihood, using the new bound instead of the exact log-partition function, while constraining the distance to the class of standard Ising models, leads not only to a good approximation to the log-partition function, but also to a model that is parsimonious, and easily interpretable. We compare our bound with the log-determinant bound introduced by Wainwright and Jordan (2006), and show that when the l1 -norm of the model parameter vector is small enough, the latter is outperformed by the new bound. 1 1.1
4 0.6203981 40 nips-2008-Bounds on marginal probability distributions
Author: Joris M. Mooij, Hilbert J. Kappen
Abstract: We propose a novel bound on single-variable marginal probability distributions in factor graphs with discrete variables. The bound is obtained by propagating local bounds (convex sets of probability distributions) over a subtree of the factor graph, rooted in the variable of interest. By construction, the method not only bounds the exact marginal probability distribution of a variable, but also its approximate Belief Propagation marginal (“belief”). Thus, apart from providing a practical means to calculate bounds on marginals, our contribution also lies in providing a better understanding of the error made by Belief Propagation. We show that our bound outperforms the state-of-the-art on some inference problems arising in medical diagnosis. 1
5 0.61844128 105 nips-2008-Improving on Expectation Propagation
Author: Manfred Opper, Ulrich Paquet, Ole Winther
Abstract: A series of corrections is developed for the fixed points of Expectation Propagation (EP), which is one of the most popular methods for approximate probabilistic inference. These corrections can lead to improvements of the inference approximation or serve as a sanity check, indicating when EP yields unrealiable results.
6 0.55297661 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
7 0.54247266 69 nips-2008-Efficient Exact Inference in Planar Ising Models
8 0.53682256 96 nips-2008-Hebbian Learning of Bayes Optimal Decisions
9 0.50684112 115 nips-2008-Learning Bounded Treewidth Bayesian Networks
10 0.50489211 233 nips-2008-The Gaussian Process Density Sampler
11 0.50164443 37 nips-2008-Biasing Approximate Dynamic Programming with a Lower Discount Factor
12 0.47744784 98 nips-2008-Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data
13 0.47340101 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models
14 0.47181818 186 nips-2008-Probabilistic detection of short events, with application to critical care monitoring
15 0.46877387 89 nips-2008-Gates
16 0.46619245 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences
17 0.45386255 83 nips-2008-Fast High-dimensional Kernel Summations Using the Monte Carlo Multipole Method
18 0.43990967 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
19 0.43633953 82 nips-2008-Fast Computation of Posterior Mode in Multi-Level Hierarchical Models
20 0.42961797 250 nips-2008-Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning
topicId topicWeight
[(6, 0.062), (7, 0.089), (12, 0.04), (28, 0.183), (57, 0.092), (59, 0.016), (63, 0.033), (71, 0.028), (77, 0.047), (83, 0.032), (91, 0.277)]
simIndex simValue paperId paperTitle
1 0.85353547 67 nips-2008-Effects of Stimulus Type and of Error-Correcting Code Design on BCI Speller Performance
Author: Jeremy Hill, Jason Farquhar, Suzanna Martens, Felix Biessmann, Bernhard Schölkopf
Abstract: From an information-theoretic perspective, a noisy transmission system such as a visual Brain-Computer Interface (BCI) speller could benefit from the use of errorcorrecting codes. However, optimizing the code solely according to the maximal minimum-Hamming-distance criterion tends to lead to an overall increase in target frequency of target stimuli, and hence a significantly reduced average target-to-target interval (TTI), leading to difficulties in classifying the individual event-related potentials (ERPs) due to overlap and refractory effects. Clearly any change to the stimulus setup must also respect the possible psychophysiological consequences. Here we report new EEG data from experiments in which we explore stimulus types and codebooks in a within-subject design, finding an interaction between the two factors. Our data demonstrate that the traditional, rowcolumn code has particular spatial properties that lead to better performance than one would expect from its TTIs and Hamming-distances alone, but nonetheless error-correcting codes can improve performance provided the right stimulus type is used. 1
same-paper 2 0.7618807 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
Author: Ydo Wexler, Christopher Meek
Abstract: We propose a multiplicative approximation scheme (MAS) for inference problems in graphical models, which can be applied to various inference algorithms. The method uses ε-decompositions which decompose functions used throughout the inference procedure into functions over smaller sets of variables with a known error ε. MAS translates these local approximations into bounds on the accuracy of the results. We show how to optimize ε-decompositions and provide a fast closed-form solution for an L2 approximation. Applying MAS to the Variable Elimination inference algorithm, we introduce an algorithm we call DynaDecomp which is extremely fast in practice and provides guaranteed error bounds on the result. The superior accuracy and efficiency of DynaDecomp is demonstrated. 1
3 0.75017279 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
Author: Liu Yang, Rong Jin, Rahul Sukthankar
Abstract: The cluster assumption is exploited by most semi-supervised learning (SSL) methods. However, if the unlabeled data is merely weakly related to the target classes, it becomes questionable whether driving the decision boundary to the low density regions of the unlabeled data will help the classification. In such case, the cluster assumption may not be valid; and consequently how to leverage this type of unlabeled data to enhance the classification accuracy becomes a challenge. We introduce “Semi-supervised Learning with Weakly-Related Unlabeled Data” (SSLW), an inductive method that builds upon the maximum-margin approach, towards a better usage of weakly-related unlabeled information. Although the SSLW could improve a wide range of classification tasks, in this paper, we focus on text categorization with a small training pool. The key assumption behind this work is that, even with different topics, the word usage patterns across different corpora tends to be consistent. To this end, SSLW estimates the optimal wordcorrelation matrix that is consistent with both the co-occurrence information derived from the weakly-related unlabeled documents and the labeled documents. For empirical evaluation, we present a direct comparison with a number of stateof-the-art methods for inductive semi-supervised learning and text categorization. We show that SSLW results in a significant improvement in categorization accuracy, equipped with a small training set and an unlabeled resource that is weakly related to the test domain.
4 0.64855027 118 nips-2008-Learning Transformational Invariants from Natural Movies
Author: Charles Cadieu, Bruno A. Olshausen
Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1
5 0.64589876 62 nips-2008-Differentiable Sparse Coding
Author: J. A. Bagnell, David M. Bradley
Abstract: Prior work has shown that features which appear to be biologically plausible as well as empirically useful can be found by sparse coding with a prior such as a laplacian (L1 ) that promotes sparsity. We show how smoother priors can preserve the benefits of these sparse priors while adding stability to the Maximum A-Posteriori (MAP) estimate that makes it more useful for prediction problems. Additionally, we show how to calculate the derivative of the MAP estimate efficiently with implicit differentiation. One prior that can be differentiated this way is KL-regularization. We demonstrate its effectiveness on a wide variety of applications, and find that online optimization of the parameters of the KL-regularized model can significantly improve prediction performance. 1
6 0.64221823 138 nips-2008-Modeling human function learning with Gaussian processes
7 0.64155614 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
8 0.64137691 4 nips-2008-A Scalable Hierarchical Distributed Language Model
9 0.64123368 200 nips-2008-Robust Kernel Principal Component Analysis
10 0.63987851 66 nips-2008-Dynamic visual attention: searching for coding length increments
11 0.6389327 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization
12 0.63881671 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables
13 0.63731951 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
14 0.63671994 31 nips-2008-Bayesian Exponential Family PCA
15 0.63609004 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
16 0.63586932 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE
17 0.63470966 231 nips-2008-Temporal Dynamics of Cognitive Control
18 0.63463843 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
19 0.63337213 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences
20 0.63299143 60 nips-2008-Designing neurophysiology experiments to optimally constrain receptive field models along parametric submanifolds