nips nips2008 nips2008-50 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Michael Isard, John MacCormick, Kannan Achan
Abstract: Continuously-Adaptive Discretization for Message-Passing (CAD-MP) is a new message-passing algorithm for approximate inference. Most message-passing algorithms approximate continuous probability distributions using either: a family of continuous distributions such as the exponential family; a particle-set of discrete samples; or a fixed, uniform discretization. In contrast, CAD-MP uses a discretization that is (i) non-uniform, and (ii) adaptive to the structure of the marginal distributions. Non-uniformity allows CAD-MP to localize interesting features (such as sharp peaks) in the marginal belief distributions with time complexity that scales logarithmically with precision, as opposed to uniform discretization which scales at best linearly. We give a principled method for altering the non-uniform discretization according to information-based measures. CAD-MP is shown in experiments to estimate marginal beliefs much more precisely than competing approaches for the same computational expense. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Most message-passing algorithms approximate continuous probability distributions using either: a family of continuous distributions such as the exponential family; a particle-set of discrete samples; or a fixed, uniform discretization. [sent-2, score-0.319]
2 In contrast, CAD-MP uses a discretization that is (i) non-uniform, and (ii) adaptive to the structure of the marginal distributions. [sent-3, score-0.713]
3 Non-uniformity allows CAD-MP to localize interesting features (such as sharp peaks) in the marginal belief distributions with time complexity that scales logarithmically with precision, as opposed to uniform discretization which scales at best linearly. [sent-4, score-0.973]
4 We give a principled method for altering the non-uniform discretization according to information-based measures. [sent-5, score-0.597]
5 CAD-MP is shown in experiments to estimate marginal beliefs much more precisely than competing approaches for the same computational expense. [sent-6, score-0.174]
6 As the dimensionality of the state-space increases, a naïve uniform discretization rapidly becomes intractable [8]. [sent-17, score-0.656]
7 When models are complex functions of the observations, sampling methods such as non-parametric belief propagation (NBP) [9, 10] have been successful. [sent-18, score-0.29]
8 “Message passing” is a class of algorithms for approximating these distributions, in which messages are iteratively updated between factors and variables. [sent-20, score-0.26]
9 When a given message is to be updated, all other messages in the graph are fixed and treated as though they were exact. [sent-21, score-0.39]
10 The algorithm proceeds by picking, from a family of approximate functions, the message that minimizes a divergence to the local “exact” message. [sent-22, score-0.279]
11 In some forms of the approach [12] this minimization takes place over approximate belief distributions rather than approximate messages. [sent-23, score-0.327]
12 A general recipe for producing message passing algorithms, summarized by Minka [13], is as follows: (i) pick a family of approximating distributions; (ii) pick a divergence measure to minimize; (iii) construct an optimization algorithm to perform this minimization within the approximating family. [sent-24, score-0.456]
13 For step (i), we advocate an approximating family that has received little attention in recent years: piecewise-constant probability densities with a bounded number of piecewise-constant regions. [sent-26, score-0.13]
14 We show that for a special class of piecewise-constant probability densities (the so-called naturally-weighted densities), the minimal divergence is achieved by a distribution of minimum entropy, leading to an intuitive and easily-implemented algorithm. [sent-30, score-0.131]
15 2 Discretizing a factor graph Let us consider what it means to discretize an inference problem represented by a factor graph with factors fi and continuous variables xα taking values in some subset of R^N. [sent-34, score-0.418]
16 One constructs a nonuniform discretization of the factor graph by partitioning the state space of each variable xα into K regions Hα^k for k = 1, . . . , K. [sent-35, score-0.907]
17 This discretization induces a discrete approximation fi of the factors, which are now regarded as functions of discrete variables xα taking integer values in the set {1, 2, . . . , K}. [sent-39, score-0.829]
18 Thus, given a factor graph of continuous variables and a particular choice of discretization {Hα^k}, one gets a piecewise-constant approximation to the marginals by first discretizing the variables according to (1), then using BP according to (2)–(4). [sent-59, score-0.307]
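A minimal sketch of this discretize-then-run-BP recipe may help; it is not the authors' code, and since equations (1)–(4) are not reproduced in this summary, the cell-averaging integrals, the factor shapes and the partition boundaries below are all illustrative assumptions.

```python
# Hedged sketch: discretize a tiny two-variable chain x1 -- f12 -- x2 (unary
# evidence on x1 only) and run discrete BP on the induced model.  Cell averages
# stand in for the definite integrals of equation (1); the message/belief sums
# mirror the discrete BP updates (2)-(4).  All numbers are illustrative.
import numpy as np

# Non-uniform partitions of [0, 1] (finer near the evidence peak at 0.4).
parts = [(0.0, 0.25), (0.25, 0.375), (0.375, 0.5), (0.5, 1.0)]
widths = np.array([hi - lo for lo, hi in parts])

unary = lambda x: np.exp(-50.0 * (x - 0.4) ** 2)           # local evidence on x1
pairwise = lambda xa, xb: np.exp(-20.0 * (xa - xb) ** 2)    # smoothness factor f12

def cell_avg_unary(f, parts, n=64):
    return np.array([f(np.linspace(lo, hi, n)).mean() for lo, hi in parts])

def cell_avg_pairwise(f, parts, n=32):
    K = len(parts)
    F = np.zeros((K, K))
    for i, (la, ha) in enumerate(parts):
        for j, (lb, hb) in enumerate(parts):
            xa = np.linspace(la, ha, n)[:, None]
            xb = np.linspace(lb, hb, n)[None, :]
            F[i, j] = f(xa, xb).mean()
    return F

g1 = cell_avg_unary(unary, parts)      # piecewise-constant unary on x1
g2 = np.ones(len(parts))               # x2 carries no evidence (flat unary)
F12 = cell_avg_pairwise(pairwise, parts)

# Message from x2 into x1 approximates the integral of f12(x, y) * g2(y) over y.
m_2to1 = F12 @ (g2 * widths)
b1 = g1 * m_2to1 * widths              # approximate marginal *mass* per partition
b1 /= b1.sum()
print("piecewise-constant marginal mass of x1 per partition:", b1.round(3))
```

Placing more, smaller partitions where the evidence is peaked is exactly the non-uniform behaviour that CAD-MP later automates.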
19 The error in the approximation to the true marginals arises from (3) when fi (x) is not constant over x in the given partition. [sent-60, score-0.227]
20 Consider the task of selecting between discretizations of a continuous probability distribution p(x) over some subset U of Euclidean space. [sent-61, score-0.121]
21 A discretization of p consists in partitioning U into K disjoint subsets V1, . . . , VK [sent-62, score-0.714]
22 and assigning a weight wk to each Vk, with Σk wk = 1. [sent-65, score-0.164]
23 We are interested in finding a discretization for which the KL divergence KL(p||q) is as small as possible. [sent-67, score-0.681]
24 The optimal choice of the wk for any fixed partitioning V1, . . . , VK is the natural weighting, in which wk equals the probability mass that p assigns to Vk. [sent-68, score-0.199]
25 Figure 1: Expanding a hypercube in two dimensions. [sent-85, score-0.12]
26 Informed belief values are computed for the re-combined hypercubes, including a new estimate for b̂(H) (f), by summing the beliefs in the finer-scale partitioning. [sent-88, score-0.323]
27 The new estimates are more accurate since the error introduced by the discretization decreases as the partitions become smaller. [sent-89, score-0.685]
28 There is a simple relationship between the quality of a naturally-weighted discretization and its entropy H(·): Theorem 1. [sent-91, score-0.653]
29 Among any collection of naturally-weighted discretizations of p(x), the minimum KL divergence to p(x) is achieved by a discretization of minimal entropy. [sent-92, score-0.773]
30 For a naturally-weighted discretization q, KL(p||q) = −Σk wk log(wk/|Vk|) + ∫U p log p = H(q) − H(p). [sent-94, score-0.679]
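As a numerical sanity check on this identity (a sketch under illustrative assumptions: the toy density, the grid resolution and the partition edges are all invented for the example), the code below builds the naturally-weighted piecewise-constant q for two partitionings of a 1-D density and verifies that KL(p||q) equals H(q) − H(p), so the lower-entropy, peak-adapted partitioning also has the lower divergence.

```python
# Hedged numeric check of the Theorem 1 relationship on a toy 1-D density.
import numpy as np

grid = np.linspace(0.0, 1.0, 8192, endpoint=False)
dx = grid[1] - grid[0]
p = np.exp(-0.5 * ((grid - 0.3) / 0.02) ** 2) + 0.05     # sharp peak plus a floor
p /= p.sum() * dx

def natural_q(edges):
    """Piecewise-constant q with the natural weights w_k = mass of p in V_k."""
    q = np.zeros_like(p)
    H_q = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (grid >= lo) & (grid < hi)
        w = p[mask].sum() * dx
        q[mask] = w / (hi - lo)
        H_q -= w * np.log(w / (hi - lo))                  # entropy of q
    return q, H_q

H_p = -np.sum(p * np.log(p)) * dx
for name, edges in [("uniform", np.linspace(0.0, 1.0, 5)),
                    ("adapted", [0.0, 0.28, 0.30, 0.32, 1.0])]:
    q, H_q = natural_q(edges)
    kl = np.sum(p * np.log(p / q)) * dx
    print(name, "KL =", round(float(kl), 4), " H(q) - H(p) =", round(float(H_q - H_p), 4))
```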
31 Suppose we are given a discretization {Hα^k} and have computed messages and beliefs for every node using (2)–(4). [sent-96, score-0.945]
32 The messages have not necessarily reached a fixed point, but we nevertheless have some current estimate for them. [sent-97, score-0.216]
33 For any arbitrary hypercube H at xα (not necessarily in its current discretization) we can define the informed belief, denoted b̂(H), to be the belief H would receive if all other nodes and their incoming messages were left unaltered. [sent-98, score-0.815]
34 To compute the informed belief, one first computes new discrete factor function values involving H using integrals like (1). [sent-99, score-0.374]
35 These values are fed into (2), (3) to produce “informed” messages mi,α (H) arriving at xα from each neighbor fi . [sent-100, score-0.314]
36 Finally, the informed messages are fed into (4) to obtain the informed belief b̂(H). [sent-101, score-0.833]
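A hedged sketch of this computation for pairwise factors is given below. The Box and Neighbor containers and the midpoint-rule integration are stand-ins for equations (1)–(4), which this summary does not reproduce, so every name here is an assumption rather than the paper's API.

```python
# Illustrative reconstruction (not the authors' code) of the informed belief
# b_hat(H) for a candidate hypercube H at a variable x_alpha.
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence, Tuple
import numpy as np

Box = Tuple[Tuple[float, float], ...]          # per-dimension (lo, hi) bounds

def box_midpoints(box: Box, n: int = 4):
    """Midpoint sample locations and the volume of one sample cell."""
    axes = [np.linspace(lo, hi, n, endpoint=False) + (hi - lo) / (2 * n)
            for lo, hi in box]
    cell = np.prod([(hi - lo) / n for lo, hi in box])
    return list(product(*axes)), cell

def integrate_pairwise(f, box_a: Box, box_b: Box) -> float:
    """Midpoint-rule estimate of the integral of f(x, y) over box_a x box_b."""
    pts_a, cell_a = box_midpoints(box_a)
    pts_b, cell_b = box_midpoints(box_b)
    total = sum(f(np.array(xa), np.array(xb)) for xa in pts_a for xb in pts_b)
    return total * cell_a * cell_b

@dataclass
class Neighbor:
    factor: Callable[[np.ndarray, np.ndarray], float]  # pairwise factor f_i(x_alpha, x_other)
    other_boxes: List[Box]                             # fixed partitioning of the other variable
    other_messages: List[float]                        # fixed messages from x_other into f_i

def informed_belief(H: Box, neighbors: Sequence[Neighbor]) -> float:
    """b_hat(H): product over neighbors of the informed message m_{i,alpha}(H)."""
    belief = 1.0
    for nb in neighbors:
        # Informed message: re-integrated discrete factor values for (H, each
        # unchanged partition of the other variable), weighted by the fixed
        # incoming messages and summed out.
        m = sum(integrate_pairwise(nb.factor, H, box) * msg
                for box, msg in zip(nb.other_boxes, nb.other_messages))
        belief *= m
    return belief
```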
37 3 Continuously-adaptive discretization The core of the CAD-MP algorithm is the procedure for passing a message to a variable xα . [sent-102, score-0.759]
38 Given fixed approximations at every other node, any discretization of α induces an approximate belief distribution qα (xα ). [sent-103, score-0.85]
39 The task of the algorithm is to select the best discretization, and as Theorem 1 shows, a good strategy for this selection is to look for a naturally-weighted discretization that minimizes the entropy of qα . [sent-104, score-0.653]
40 CAD-MP employs an axis-aligned binary-split kd-tree [15] to represent the discrete partitioning of a D-dimensional continuous state space at each variable (the same representation was used in [14] where it was called a Binary Split Partitioning). [sent-106, score-0.237]
41 The root is assigned the whole space, and any internal vertex splits its hypercube equally between its two children using an axis-aligned plane. [sent-108, score-0.182]
42 We build the kd-tree greedily by recursively splitting leaf vertices: at each step we must choose a hypercube Hα^k in the current partitioning to split, and a dimension d to split it. [sent-110, score-0.393]
43 According to Theorem 1, we should choose k and d to minimize the entropy of the resulting discretization, provided that this discretization has “natural” weights. [sent-111, score-0.653]
44 In practice, the natural weights are estimated using informed beliefs; we nevertheless proceed as though they were exact and choose the k- and d-values leading to lowest entropy. [sent-112, score-0.262]
45 A subroutine of the algorithm involves “expanding” a hypercube into sub-cubes as illustrated in the two-dimensional case in Figure 1. [sent-113, score-0.12]
46 We can now describe the CAD-MP algorithm using informed splitting, which re-partitions a variable of the factor graph by producing a new kd-tree whose leaves are the hypercubes in the new partitioning: [sent-121, score-0.398]
47 1. Initialize the root vertex of the kd-tree with its associated hypercube being the whole state space, with belief 1. [sent-122, score-0.435]
48 2. While the number of leaves |L| is less than the desired number of partitions in the discretized model: (a) Pick the leaf H and split dimension d that minimize the split-entropy (6). [sent-125, score-0.224]
49 All variables in the factor graph are initialized with the trivial discretization (a single partition). [sent-128, score-0.712]
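The greedy loop can be sketched as follows; this is an illustrative reconstruction, not the authors' implementation. The informed-belief computation is abstracted as a callable, and because equation (6) for the split-entropy is not reproduced in this summary, the score below is simply the entropy of the naturally-weighted discretization implied by each candidate split, in the spirit of Theorem 1.

```python
# Hedged sketch of CAD-MP's greedy "informed splitting" of one variable.
import math
from typing import Callable, List, Tuple

Box = Tuple[Tuple[float, float], ...]          # per-dimension (lo, hi) bounds

def volume(box: Box) -> float:
    return math.prod(hi - lo for lo, hi in box)

def split_box(box: Box, d: int) -> Tuple[Box, Box]:
    """Split a hypercube in half along dimension d with an axis-aligned plane."""
    lo, hi = box[d]
    mid = 0.5 * (lo + hi)
    return (box[:d] + ((lo, mid),) + box[d + 1:],
            box[:d] + ((mid, hi),) + box[d + 1:])

def discretization_entropy(leaves: List[Box], beliefs: List[float]) -> float:
    """Entropy of the piecewise-constant density with mass beliefs[k] on leaves[k]."""
    z = sum(beliefs) or 1.0
    h = 0.0
    for box, b in zip(leaves, beliefs):
        w = b / z
        if w > 0.0:
            h -= w * math.log(w / volume(box))
    return h

def informed_splitting(state_space: Box, K: int,
                       informed_belief: Callable[[Box], float]) -> List[Box]:
    leaves = [state_space]                      # step 1: root hypercube = whole space
    beliefs = [informed_belief(state_space)]
    while len(leaves) < K:                      # step 2: grow until K partitions
        best = None
        for i, leaf in enumerate(leaves):
            for d in range(len(state_space)):   # candidate: split leaf i along dimension d
                left, right = split_box(leaf, d)
                bl, br = informed_belief(left), informed_belief(right)
                cand_leaves = leaves[:i] + leaves[i + 1:] + [left, right]
                cand_beliefs = beliefs[:i] + beliefs[i + 1:] + [bl, br]
                h = discretization_entropy(cand_leaves, cand_beliefs)
                if best is None or h < best[0]:
                    best = (h, i, left, right, bl, br)
        _, i, left, right, bl, br = best
        leaves[i:i + 1] = [left, right]         # replace the chosen leaf by its children
        beliefs[i:i + 1] = [bl, br]
    return leaves
```

This straightforward version rescores the whole discretization for every candidate split; an incremental entropy update would remove that overhead without changing the result.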
50 A simple example showing the evolution of the belief at one variable is shown in Figure 2. [sent-130, score-0.255]
51 If the variable being repartitioned has T neighbors and we require a partitioning of K hypercubes, then a straightforward implementation of this algorithm requires the computation of 2K × 2^D × KT message components. [sent-131, score-0.235]
52 Roughly speaking, then, informed splitting pays a factor of 2^(D+1) over BP, which must compute K^2 T message components. [sent-132, score-0.453]
53 But CAD-MP trades this for an exponential factor in K since it can home in on interesting areas of the state space using binary search, so if BP requires K partitions for a given level of accuracy, CAD-MP (empirically) achieves the same accuracy with only O(log K) partitions. [sent-133, score-0.185]
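A back-of-the-envelope comparison makes the trade-off concrete; all of the numbers below are illustrative assumptions, not figures taken from the paper.

```python
# Illustrative cost comparison for a 2-D problem (D = 2) with T = 2 neighbors
# per variable, assuming uniform BP needs K_bp partitions for some target
# accuracy while CAD-MP needs only a small multiple of log2(K_bp).
D, T = 2, 2
K_bp = 4096                                   # uniform partitions per variable (assumed)
K_cad = 24                                    # ~2x log2(K_bp) = 24 (assumed)
bp_cost = K_bp ** 2 * T                       # K^2 * T message components
cad_cost = 2 * K_cad * 2 ** D * K_cad * T     # 2K * 2^D * K * T message components
print(bp_cost, cad_cost)                      # 33554432 vs 9216
```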
54 4 Experiments We would like to compare our candidate algorithms against the marginal belief distributions that would be computed by exact inference; however, no exact inference algorithm is known for our models. [sent-135, score-0.496]
55 Instead, for each experiment we construct a fine-scale uniform discretization Df of the model and input data, and compute the marginal belief distributions p(xα ; Df ) at each variable xα using the standard forward-backward BP algorithm. [sent-136, score-0.973]
56 Given a candidate approximation C we can then compare the marginals p(xα ; C) under that approximation to the fine-scale discretization by computing the KL-divergence KL(p(xα ; Df )||p(xα ; C)) at each variable. [sent-137, score-0.778]
57 While a “fine-enough” uniform discretization will tend to the true marginals, we do not a priori know how fine that is. [sent-139, score-0.151]
58 We therefore construct a sequence of coarser uniform discretizations Dc^i of the same model and data, and compute µ(Dc^i) for each of them. [sent-140, score-0.151]
59 If µ(Dc^i) is converging rapidly enough to zero, as is the case in the experiments below, we have confidence that the fine-scale discretization is a good approximation to the exact marginals. [sent-141, score-0.686]
60 Figure 2: Evolution of discretization at a single variable (leftmost panel: the observation / local factor; (a)–(c): the evolving discretization). [sent-142, score-0.597]
61 The left image is the local (single-variable) factor at the first node in a simple chain MRF whose nodes have 2-D state spaces. [sent-143, score-0.121]
62 The next three images, from left to right, show the evolution of the informed belief. [sent-144, score-0.241]
63 Initially (a) the partitioning is informed simply by the local factor, but after messages have been passed once along the chain and back (b), the posterior marginal estimate has shifted and the discretization has adapted accordingly. [sent-145, score-1.197]
64 For this toy example only 16 partitions are used, and the normalized log of the belief is displayed to make the structure of the distribution more apparent. [sent-147, score-0.303]
65 We compare our adaptive discretization algorithm against non-parametric belief propagation (NBP) [9, 10] which represents the marginal distribution at a variable by a particle set. [sent-148, score-1.042]
66 Particle sets typically do not approximate the tails of a distribution well, leading to zeros in the approximate marginals and divergences that tend to infinity. [sent-150, score-0.358]
67 We therefore regularize all divergence computations as follows: KL*(p||q) = Σk p*_k log(p*_k / q*_k), where p*_k = (ε + ∫H^k p(x) dx) / Σn (ε + ∫H^n p(x) dx) and q*_k = (ε + ∫H^k q(x) dx) / Σn (ε + ∫H^n q(x) dx), (7) with ε = 10−4 and {H^k} the partitions in the fine-scale discretization Df. [sent-151, score-0.827]
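A direct implementation of this regularized divergence is straightforward; the sketch below assumes the two marginals are supplied as (possibly unnormalized) masses over the partitions {H^k} of Df, with ε = 10−4 as in the text.

```python
# Hedged implementation of the regularized divergence KL*(p||q) defined above.
import numpy as np

def kl_star(p_mass, q_mass, eps=1e-4):
    """Additive-eps regularization of both sets of partition masses, then KL."""
    p_star = eps + np.asarray(p_mass, dtype=float)
    q_star = eps + np.asarray(q_mass, dtype=float)
    p_star /= p_star.sum()
    q_star /= q_star.sum()
    return float(np.sum(p_star * np.log(p_star / q_star)))

# Example: a candidate marginal that misses a mode still gets a finite score.
p = np.array([0.70, 0.25, 0.05, 0.00])
q = np.array([0.72, 0.00, 0.28, 0.00])
print(kl_star(p, q))
```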
68 We begin with a set of experiments over ten randomly generated input sequences of a one-dimensional target moving through structured clutter of similar-looking distractors. [sent-153, score-0.151]
69 There are stationary clutter distractors, and also periodic “forkings” where a moving clutter distractor emerges from the target and proceeds for a few time-steps before disappearing. [sent-156, score-0.124]
70 Each sequence contains 256 timesteps, and the “exact” marginals (Figure 3b) are computed using standard discrete BP with 15360 states per time-step. [sent-157, score-0.154]
71 The modes of the marginals generated by all the experiments are similar to those in Figure 3b, except for one run of NBP shown in Figure 3c that failed entirely to find the mode (red line) due to an unlucky random seed. [sent-158, score-0.131]
72 Figure 4a shows the divergences µ(·) for the various discrete algorithms: both uniform discretization at various degrees of coarseness, and adaptive discretization using CAD-MP with varying numbers of partitions. [sent-160, score-1.454]
73 Each data point shows the mean divergence µ(·) for one of the ten simulated one-dimensional datasets. [sent-161, score-0.177]
74 As the number of adaptive partitions increases, the variance of µ(·) across trials increases, but the divergence stays small. [sent-162, score-0.222]
75 Higher divergences in CAD-MP trials correspond to a mis-estimation of the tails of the marginal belief at a few time-steps. [sent-163, score-0.462]
76 The straight line on the log/log plot for the uniform discretizations gives us confidence that the fine-scale discretization is a close approximation to the exact beliefs. [sent-164, score-0.837]
77 The adaptive discretization provides a very faithful approximation to this “exact” distribution with vastly fewer partitions. [sent-165, score-0.721]
78 Figure 4b shows the divergences for the same ten one-dimensional trial sequences when the marginals are computed using NBP with varying numbers of particles. [sent-166, score-0.281]
79 The region of the white rectangle in (b) is expanded in (d)–(g), with beliefs now plotted on log intensity scale to expand their dynamic range. [sent-169, score-0.186]
80 CAD-MP using only 16 partitions per time-step (e) already produces a faithful approximation to the exact belief (d), and increasing to 128 partitions (f) fills in more details. [sent-170, score-0.526]
81 The NBP algorithm using 800 particles (g) does not approximate the tails of the distribution well. [sent-171, score-0.15]
83 The marked points signify runs on which NBP incorrectly located the mode of the marginal belief distribution at some or all time-steps, as in Figure 3c. [sent-178, score-0.311]
84 This time the input data is a 64 × 64 image grid, and the “exact” fine-scale discretization is at a resolution of 512 × 512 giving 262144 discrete states in total. [sent-180, score-0.65]
85 Figures 4c and 4d show that adaptive discretization still greatly outperforms NBP for an equivalent computational cost. [sent-181, score-0.647]
86 Again there is a straight-line trend in the log/log plots for both CAD-MP and uniform discretization, though as in the one-dimensional case the variance of the divergences increases with more partitions. [sent-182, score-0.157]
87 NBP again performs less accurately, and frequently fails to find the high-weight regions of the belief at all at some time-steps, even with 3200 particles. [sent-183, score-0.215]
88 Adaptive discretization seems to correct some of the well-known limitations of particle-based methods. [sent-184, score-0.597]
89 The discrete distribution is able to represent probability mass well into the tails of the distribution, which leads to a more faithful approximation to the exact beliefs. [sent-185, score-0.271]
90 Moreover, CAD-MP’s computational complexity scales linearly with the number of incoming messages at a factor. [sent-187, score-0.279]
91 NBP has to resort to heuristics to sample from the product of incoming messages once the number of messages is greater than two. [sent-188, score-0.495]
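The contrast can be made concrete with a one-line operation: on a shared partitioning, combining any number of piecewise-constant incoming messages is an elementwise product over the partitions, with cost linear in the number of messages (the toy numbers below are illustrative).

```python
# Product of three piecewise-constant messages defined on the same partitions.
import numpy as np

messages = np.array([[0.1, 0.6, 0.3],     # message 1, one value per partition
                     [0.2, 0.2, 0.6],     # message 2
                     [0.5, 0.4, 0.1]])    # message 3
belief = messages.prod(axis=0)            # elementwise product across messages
belief /= belief.sum()
print(belief)
```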
92 Another interesting approach is to retain the uniform discretization, but enforce sparsity on messages to reduce computational cost. [sent-195, score-0.275]
93 However, these approaches appear to suffer when multiplying messages with disjoint peaks whose tails have been truncated to enforce sparsity: such peaks are unable to fuse their evidence correctly. [sent-197, score-0.379]
94 It substantially outperforms the two standard methods for inference in this setting: uniform-discretization and non-parametric belief propagation. [sent-201, score-0.248]
95 The main challenge in applying the technique to an arbitrary factor graph is the tractability of the definite integrals (1). [sent-204, score-0.143]
96 Also, we employ a greedy heuristic to select a partitioning with low entropy rather than exhaustively computing a minimum entropy over some family of discretizations. [sent-207, score-0.268]
97 The choices reported here seem to give the best accuracy on our problems for a given computational budget; however, many others are possible, and we hope this work will serve as a starting point for a renewed interest in adaptive discretization in a variety of inference settings. [sent-210, score-0.68]
98 Approximating probabilistic inference in Bayesian belief networks is NP-hard. [sent-218, score-0.248]
99 Dense motion and disparity estimation via loopy belief propagation. [sent-256, score-0.215]
100 Shape matching with belief propagation: Using dynamic quantization to accommodate occlusion and clutter. [sent-316, score-0.215]
wordName wordTfidf (topN-words)
[('discretization', 0.597), ('nbp', 0.367), ('messages', 0.216), ('belief', 0.215), ('informed', 0.201), ('bp', 0.149), ('hypercube', 0.12), ('message', 0.118), ('partitioning', 0.117), ('beliefs', 0.108), ('vk', 0.101), ('marginals', 0.101), ('fi', 0.098), ('divergences', 0.098), ('discretizations', 0.092), ('partitions', 0.088), ('divergence', 0.084), ('tails', 0.083), ('wk', 0.082), ('splitting', 0.075), ('propagation', 0.075), ('wv', 0.074), ('yx', 0.069), ('marginal', 0.066), ('kl', 0.064), ('incoming', 0.063), ('clutter', 0.062), ('exact', 0.061), ('factor', 0.059), ('uniform', 0.059), ('df', 0.057), ('entropy', 0.056), ('graph', 0.056), ('hypercubes', 0.055), ('discrete', 0.053), ('adaptive', 0.05), ('ut', 0.048), ('densities', 0.047), ('dc', 0.047), ('faithful', 0.046), ('coughlan', 0.046), ('isard', 0.046), ('kozlov', 0.046), ('split', 0.046), ('approximating', 0.044), ('passing', 0.044), ('ss', 0.043), ('evolution', 0.04), ('expanded', 0.04), ('peaks', 0.04), ('nonuniform', 0.04), ('valley', 0.04), ('family', 0.039), ('particle', 0.039), ('approximate', 0.038), ('state', 0.038), ('expand', 0.038), ('seeds', 0.037), ('distributions', 0.036), ('cvpr', 0.036), ('leaf', 0.035), ('expanding', 0.035), ('discretizing', 0.034), ('silicon', 0.034), ('computes', 0.033), ('inference', 0.033), ('microsoft', 0.033), ('vertex', 0.033), ('junction', 0.033), ('kannan', 0.033), ('onedimensional', 0.033), ('ten', 0.032), ('felzenszwalb', 0.031), ('recipe', 0.031), ('mi', 0.03), ('mode', 0.03), ('metropolis', 0.03), ('loopy', 0.03), ('vision', 0.029), ('continuous', 0.029), ('root', 0.029), ('qk', 0.029), ('particles', 0.029), ('mountain', 0.029), ('approximation', 0.028), ('simulated', 0.028), ('discretized', 0.028), ('integrals', 0.028), ('discretize', 0.028), ('leaves', 0.027), ('fj', 0.027), ('pick', 0.026), ('trial', 0.026), ('tree', 0.025), ('candidate', 0.024), ('sequences', 0.024), ('kt', 0.024), ('node', 0.024), ('freeman', 0.023), ('hybrid', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms
Author: Michael Isard, John MacCormick, Kannan Achan
Abstract: Continuously-Adaptive Discretization for Message-Passing (CAD-MP) is a new message-passing algorithm for approximate inference. Most message-passing algorithms approximate continuous probability distributions using either: a family of continuous distributions such as the exponential family; a particle-set of discrete samples; or a fixed, uniform discretization. In contrast, CAD-MP uses a discretization that is (i) non-uniform, and (ii) adaptive to the structure of the marginal distributions. Non-uniformity allows CAD-MP to localize interesting features (such as sharp peaks) in the marginal belief distributions with time complexity that scales logarithmically with precision, as opposed to uniform discretization which scales at best linearly. We give a principled method for altering the non-uniform discretization according to information-based measures. CAD-MP is shown in experiments to estimate marginal beliefs much more precisely than competing approaches for the same computational expense. 1
2 0.16148145 40 nips-2008-Bounds on marginal probability distributions
Author: Joris M. Mooij, Hilbert J. Kappen
Abstract: We propose a novel bound on single-variable marginal probability distributions in factor graphs with discrete variables. The bound is obtained by propagating local bounds (convex sets of probability distributions) over a subtree of the factor graph, rooted in the variable of interest. By construction, the method not only bounds the exact marginal probability distribution of a variable, but also its approximate Belief Propagation marginal (“belief”). Thus, apart from providing a practical means to calculate bounds on marginals, our contribution also lies in providing a better understanding of the error made by Belief Propagation. We show that our bound outperforms the state-of-the-art on some inference problems arising in medical diagnosis. 1
3 0.139865 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation
Author: Lukas Kroc, Ashish Sabharwal, Bart Selman
Abstract: We show that an important and computationally challenging solution space feature of the graph coloring problem (COL), namely the number of clusters of solutions, can be accurately estimated by a technique very similar to one for counting the number of solutions. This cluster counting approach can be naturally written in terms of a new factor graph derived from the factor graph representing the COL instance. Using a variant of the Belief Propagation inference framework, we can efficiently approximate cluster counts in random COL problems over a large range of graph densities. We illustrate the algorithm on instances with up to 100, 000 vertices. Moreover, we supply a methodology for computing the number of clusters exactly using advanced techniques from the knowledge compilation literature. This methodology scales up to several hundred variables. 1
4 0.11203903 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations
Author: David Sontag, Amir Globerson, Tommi S. Jaakkola
Abstract: We propose a new class of consistency constraints for Linear Programming (LP) relaxations for finding the most probable (MAP) configuration in graphical models. Usual cluster-based LP relaxations enforce joint consistency on the beliefs of a cluster of variables, with computational cost increasing exponentially with the size of the clusters. By partitioning the state space of a cluster and enforcing consistency only across partitions, we obtain a class of constraints which, although less tight, are computationally feasible for large clusters. We show how to solve the cluster selection and partitioning problem monotonically in the dual LP, using the current beliefs to guide these choices. We obtain a dual message passing algorithm and apply it to protein design problems where the variables have large state spaces and the usual cluster-based relaxations are very costly. The resulting method solves many of these problems exactly, and significantly faster than a method that does not use partitioning. 1
5 0.10052736 144 nips-2008-Multi-resolution Exploration in Continuous Spaces
Author: Ali Nouri, Michael L. Littman
Abstract: The essence of exploration is acting to try to decrease uncertainty. We propose a new methodology for representing uncertainty in continuous-state control problems. Our approach, multi-resolution exploration (MRE), uses a hierarchical mapping to identify regions of the state space that would benefit from additional samples. We demonstrate MRE’s broad utility by using it to speed up learning in a prototypical model-based and value-based reinforcement-learning method. Empirical results show that MRE improves upon state-of-the-art exploration approaches. 1
6 0.09085983 89 nips-2008-Gates
7 0.086039566 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
8 0.077274948 177 nips-2008-Particle Filter-based Policy Gradient in POMDPs
9 0.076289698 76 nips-2008-Estimation of Information Theoretic Measures for Continuous Random Variables
10 0.073572457 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
11 0.073388271 141 nips-2008-Multi-Agent Filtering with Infinitely Nested Beliefs
12 0.064437293 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models
13 0.062862113 62 nips-2008-Differentiable Sparse Coding
14 0.061957531 223 nips-2008-Structure Learning in Human Sequential Decision-Making
15 0.057011031 84 nips-2008-Fast Prediction on a Tree
16 0.051697936 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
17 0.050358001 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees
18 0.049783375 235 nips-2008-The Infinite Hierarchical Factor Regression Model
19 0.049718328 69 nips-2008-Efficient Exact Inference in Planar Ising Models
20 0.048668601 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
topicId topicWeight
[(0, -0.173), (1, 0.042), (2, 0.013), (3, 0.012), (4, 0.052), (5, -0.09), (6, 0.048), (7, 0.006), (8, 0.006), (9, -0.121), (10, -0.054), (11, 0.056), (12, 0.102), (13, -0.087), (14, -0.117), (15, -0.02), (16, 0.038), (17, -0.034), (18, 0.056), (19, 0.031), (20, 0.064), (21, -0.047), (22, -0.151), (23, -0.112), (24, 0.017), (25, -0.071), (26, 0.141), (27, 0.008), (28, -0.011), (29, -0.066), (30, -0.189), (31, 0.047), (32, 0.04), (33, 0.022), (34, -0.014), (35, 0.04), (36, 0.15), (37, 0.138), (38, 0.058), (39, 0.037), (40, -0.063), (41, -0.011), (42, -0.042), (43, -0.12), (44, 0.029), (45, -0.06), (46, -0.006), (47, -0.085), (48, -0.0), (49, 0.004)]
simIndex simValue paperId paperTitle
same-paper 1 0.95458388 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms
Author: Michael Isard, John MacCormick, Kannan Achan
Abstract: Continuously-Adaptive Discretization for Message-Passing (CAD-MP) is a new message-passing algorithm for approximate inference. Most message-passing algorithms approximate continuous probability distributions using either: a family of continuous distributions such as the exponential family; a particle-set of discrete samples; or a fixed, uniform discretization. In contrast, CAD-MP uses a discretization that is (i) non-uniform, and (ii) adaptive to the structure of the marginal distributions. Non-uniformity allows CAD-MP to localize interesting features (such as sharp peaks) in the marginal belief distributions with time complexity that scales logarithmically with precision, as opposed to uniform discretization which scales at best linearly. We give a principled method for altering the non-uniform discretization according to information-based measures. CAD-MP is shown in experiments to estimate marginal beliefs much more precisely than competing approaches for the same computational expense. 1
2 0.70115942 40 nips-2008-Bounds on marginal probability distributions
Author: Joris M. Mooij, Hilbert J. Kappen
Abstract: We propose a novel bound on single-variable marginal probability distributions in factor graphs with discrete variables. The bound is obtained by propagating local bounds (convex sets of probability distributions) over a subtree of the factor graph, rooted in the variable of interest. By construction, the method not only bounds the exact marginal probability distribution of a variable, but also its approximate Belief Propagation marginal (“belief”). Thus, apart from providing a practical means to calculate bounds on marginals, our contribution also lies in providing a better understanding of the error made by Belief Propagation. We show that our bound outperforms the state-of-the-art on some inference problems arising in medical diagnosis. 1
3 0.65413159 89 nips-2008-Gates
Author: Tom Minka, John Winn
Abstract: Gates are a new notation for representing mixture models and context-sensitive independence in factor graphs. Factor graphs provide a natural representation for message-passing algorithms, such as expectation propagation. However, message passing in mixture models is not well captured by factor graphs unless the entire mixture is represented by one factor, because the message equations have a containment structure. Gates capture this containment structure graphically, allowing both the independences and the message-passing equations for a model to be readily visualized. Different variational approximations for mixture models can be understood as different ways of drawing the gates in a model. We present general equations for expectation propagation and variational message passing in the presence of gates. 1
4 0.57928079 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation
Author: Lukas Kroc, Ashish Sabharwal, Bart Selman
Abstract: We show that an important and computationally challenging solution space feature of the graph coloring problem (COL), namely the number of clusters of solutions, can be accurately estimated by a technique very similar to one for counting the number of solutions. This cluster counting approach can be naturally written in terms of a new factor graph derived from the factor graph representing the COL instance. Using a variant of the Belief Propagation inference framework, we can efficiently approximate cluster counts in random COL problems over a large range of graph densities. We illustrate the algorithm on instances with up to 100, 000 vertices. Moreover, we supply a methodology for computing the number of clusters exactly using advanced techniques from the knowledge compilation literature. This methodology scales up to several hundred variables. 1
5 0.55908424 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations
Author: David Sontag, Amir Globerson, Tommi S. Jaakkola
Abstract: We propose a new class of consistency constraints for Linear Programming (LP) relaxations for finding the most probable (MAP) configuration in graphical models. Usual cluster-based LP relaxations enforce joint consistency on the beliefs of a cluster of variables, with computational cost increasing exponentially with the size of the clusters. By partitioning the state space of a cluster and enforcing consistency only across partitions, we obtain a class of constraints which, although less tight, are computationally feasible for large clusters. We show how to solve the cluster selection and partitioning problem monotonically in the dual LP, using the current beliefs to guide these choices. We obtain a dual message passing algorithm and apply it to protein design problems where the variables have large state spaces and the usual cluster-based relaxations are very costly. The resulting method solves many of these problems exactly, and significantly faster than a method that does not use partitioning. 1
6 0.55526453 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
7 0.44423509 144 nips-2008-Multi-resolution Exploration in Continuous Spaces
8 0.41338137 104 nips-2008-Improved Moves for Truncated Convex Models
9 0.40980333 2 nips-2008-A Convex Upper Bound on the Log-Partition Function for Binary Distributions
10 0.40816149 82 nips-2008-Fast Computation of Posterior Mode in Multi-Level Hierarchical Models
11 0.4063558 33 nips-2008-Bayesian Model of Behaviour in Economic Games
12 0.4029347 69 nips-2008-Efficient Exact Inference in Planar Ising Models
13 0.39726958 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees
14 0.3885262 141 nips-2008-Multi-Agent Filtering with Infinitely Nested Beliefs
15 0.36695465 115 nips-2008-Learning Bounded Treewidth Bayesian Networks
16 0.36374322 105 nips-2008-Improving on Expectation Propagation
17 0.35984144 29 nips-2008-Automatic online tuning for fast Gaussian summation
18 0.35320517 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences
19 0.35255399 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models
20 0.35014135 236 nips-2008-The Mondrian Process
topicId topicWeight
[(6, 0.055), (7, 0.066), (12, 0.071), (21, 0.01), (28, 0.201), (34, 0.026), (57, 0.074), (59, 0.018), (63, 0.027), (71, 0.03), (77, 0.063), (78, 0.012), (83, 0.051), (88, 0.172), (89, 0.018)]
simIndex simValue paperId paperTitle
1 0.90875894 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization
Author: Siwei Lyu, Eero P. Simoncelli
Abstract: We consider the problem of transforming a signal to a representation in which the components are statistically independent. When the signal is generated as a linear transformation of independent Gaussian or non-Gaussian sources, the solution may be computed using a linear transformation (PCA or ICA, respectively). Here, we consider a complementary case, in which the source is non-Gaussian but elliptically symmetric. Such a source cannot be decomposed into independent components using a linear transform, but we show that a simple nonlinear transformation, which we call radial Gaussianization (RG), is able to remove all dependencies. We apply this methodology to natural signals, demonstrating that the joint distributions of nearby bandpass filter responses, for both sounds and images, are closer to being elliptically symmetric than linearly transformed factorial sources. Consistent with this, we demonstrate that the reduction in dependency achieved by applying RG to either pairs or blocks of bandpass filter responses is significantly greater than that achieved by PCA or ICA.
same-paper 2 0.8883813 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms
Author: Michael Isard, John MacCormick, Kannan Achan
Abstract: Continuously-Adaptive Discretization for Message-Passing (CAD-MP) is a new message-passing algorithm for approximate inference. Most message-passing algorithms approximate continuous probability distributions using either: a family of continuous distributions such as the exponential family; a particle-set of discrete samples; or a fixed, uniform discretization. In contrast, CAD-MP uses a discretization that is (i) non-uniform, and (ii) adaptive to the structure of the marginal distributions. Non-uniformity allows CAD-MP to localize interesting features (such as sharp peaks) in the marginal belief distributions with time complexity that scales logarithmically with precision, as opposed to uniform discretization which scales at best linearly. We give a principled method for altering the non-uniform discretization according to information-based measures. CAD-MP is shown in experiments to estimate marginal beliefs much more precisely than competing approaches for the same computational expense. 1
3 0.80673802 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations
Author: David Sontag, Amir Globerson, Tommi S. Jaakkola
Abstract: We propose a new class of consistency constraints for Linear Programming (LP) relaxations for finding the most probable (MAP) configuration in graphical models. Usual cluster-based LP relaxations enforce joint consistency on the beliefs of a cluster of variables, with computational cost increasing exponentially with the size of the clusters. By partitioning the state space of a cluster and enforcing consistency only across partitions, we obtain a class of constraints which, although less tight, are computationally feasible for large clusters. We show how to solve the cluster selection and partitioning problem monotonically in the dual LP, using the current beliefs to guide these choices. We obtain a dual message passing algorithm and apply it to protein design problems where the variables have large state spaces and the usual cluster-based relaxations are very costly. The resulting method solves many of these problems exactly, and significantly faster than a method that does not use partitioning. 1
4 0.80176783 4 nips-2008-A Scalable Hierarchical Distributed Language Model
Author: Andriy Mnih, Geoffrey E. Hinton
Abstract: Neural probabilistic language models (NPLMs) have been shown to be competitive with and occasionally superior to the widely-used n-gram language models. The main drawback of NPLMs is their extremely long training and testing times. Morin and Bengio have proposed a hierarchical language model built around a binary tree of words, which was two orders of magnitude faster than the nonhierarchical model it was based on. However, it performed considerably worse than its non-hierarchical counterpart in spite of using a word tree created using expert knowledge. We introduce a fast hierarchical language model along with a simple feature-based algorithm for automatic construction of word trees from the data. We then show that the resulting models can outperform non-hierarchical neural models as well as the best n-gram models. 1
5 0.80140251 118 nips-2008-Learning Transformational Invariants from Natural Movies
Author: Charles Cadieu, Bruno A. Olshausen
Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1
6 0.79935026 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
7 0.7980895 231 nips-2008-Temporal Dynamics of Cognitive Control
9 0.79644471 40 nips-2008-Bounds on marginal probability distributions
10 0.79542327 195 nips-2008-Regularized Policy Iteration
11 0.79503727 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
12 0.79483318 219 nips-2008-Spectral Hashing
13 0.79480767 138 nips-2008-Modeling human function learning with Gaussian processes
14 0.79427326 113 nips-2008-Kernelized Sorting
15 0.79289192 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
16 0.79174942 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
17 0.7905935 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models
18 0.78994089 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
19 0.78878832 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
20 0.78870958 200 nips-2008-Robust Kernel Principal Component Analysis