nips nips2011 nips2011-55 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Daniel R. Sheldon, Thomas G. Dietterich
Abstract: There are many settings in which we wish to fit a model of the behavior of individuals but where our data consist only of aggregate information (counts or low-dimensional contingency tables). This paper introduces Collective Graphical Models—a framework for modeling and probabilistic inference that operates directly on the sufficient statistics of the individual model. We derive a highly efficient Gibbs sampling algorithm for sampling from the posterior distribution of the sufficient statistics conditioned on noisy aggregate observations, prove its correctness, and demonstrate its effectiveness experimentally.
Reference: text
sentIndex sentText sentNum sentScore
1 There are many settings in which we wish to fit a model of the behavior of individuals but where our data consist only of aggregate information (counts or low-dimensional contingency tables). [sent-6, score-0.31]
2 We derive a highly efficient Gibbs sampling algorithm for sampling from the posterior distribution of the sufficient statistics conditioned on noisy aggregate observations, prove its correctness, and demonstrate its effectiveness experimentally. [sent-8, score-0.217]
3 Far more readily available are aggregated data in the form of counts or low-dimensional contingency tables. [sent-10, score-0.247]
4 This paper introduces a formalism in which one starts with a graphical model describing the behavior of individuals, and then derives a new graphical model — the Collective Graphical Model (CGM) — on the sufficient statistics of a population drawn from that model. [sent-16, score-0.232]
5 This paper is devoted to the problem of inference in CGMs, where the goal is to calculate conditional probabilities over the sufficient statistics given partial observations made at the population level. [sent-18, score-0.21]
6 We consider both an exact observation model where subtables of the sufficient statistics are observed directly, and a noisy observation model where these counts are corrupted. [sent-19, score-0.159]
7 Figure 1(a) shows the graphical model plate notation for the bird migration model from [1, 2], in which birds transition stochastically among a discrete set of locations (say, grid cells on a map) according to a Markov chain (the individual model). [sent-23, score-0.532]
8 The variable Xt denotes the location of the mth bird at time t, and birds are independent and identically distributed. [sent-24, score-0.2]
9 Suppose, for example, that very accurate surveys reveal the number of birds nt (i) in each location i at each time t, and these numbers are collected into a single vector nt for each time step. [sent-26, score-0.193]
10 Figure 1(b) shows the CGM for this model, which we obtain by analytically marginalizing away the individual variables to get a new model on their sufficient statistics, which are the tables nt,t+1 with entries nt,t+1 (i, j) equaling the number of birds that fly from i to j from time t to t + 1. [sent-42, score-0.485]
11 Here, we are faced with yet another challenge: the CGM has hard constraints encoded into its distribution, and our MCMC moves must preserve these constraints yet still connect the state space. [sent-48, score-0.354]
12 To understand this, observe that the hidden variables in this example comprise a flow of M units through the trellis graph of the Markov chain, with the interpretation that nt,t+1 (i, j) birds “flow” along edge (i, j) at time t (see Figure 1(c) and [1]). [sent-49, score-0.191]
13 The constraints are that (1) flow is conserved at each trellis node, and (2) the number of birds that enter location i at time t equals the observed number nt (i). [sent-50, score-0.211]
14 How can we design a set of moves that connect any two M-unit flows while preserving these constraints? [sent-52, score-0.313]
15 The answer is to make moves that send flow around cycles. [sent-53, score-0.25]
16 Cycles of the form illustrated in Figure 1(d) preserve flow conservation but change the amount of flow through some trellis nodes. [sent-54, score-0.149]
17 One can show by graph-theoretic arguments that moves of these two general classes are enough to connect any two flows. [sent-56, score-0.313]
18 This gives us the skeleton of an ergodic MCMC sampler: starting with a feasible flow, select cycles from these two classes uniformly at random and propose moves that send δ units of flow around the cycle. [sent-57, score-0.25]
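A minimal sketch (not the authors' MATLAB code) of the two classes of cycle moves on the trellis flow described above. The array layout flow[t, i, j], the sizes, and all function names are assumptions made for this illustration.

```python
import numpy as np

# Assumed representation: flow[t, i, j] = number of birds flying from
# location i to location j between times t and t+1.
T, L = 4, 3
flow = np.full((T - 1, L, L), 10, dtype=int)   # a feasible starting flow

def degree_two_move(flow, t, i, ip, j, jp, delta):
    """Swap move on table n_{t,t+1} (in the spirit of Figure 1(e)): preserves
    all row and column sums, hence node throughputs and observed counts."""
    out = flow.copy()
    out[t, i, j] += delta; out[t, ip, jp] += delta
    out[t, i, jp] -= delta; out[t, ip, j] -= delta
    return out

def degree_one_cycle(flow, t, i, j, jp, k, delta):
    """Cycle through two consecutive tables (in the spirit of Figure 1(d)):
    preserves flow conservation but shifts delta units of throughput from
    trellis node (t+1, jp) to node (t+1, j)."""
    out = flow.copy()
    out[t, i, j] += delta;  out[t + 1, j, k] += delta
    out[t, i, jp] -= delta; out[t + 1, jp, k] -= delta
    return out

def conserved(flow):
    """Flow conservation at every internal trellis node:
    total inflow at (t, j) equals total outflow at (t, j)."""
    inflow = flow[:-1].sum(axis=1)     # inflow[s, j]  = sum_i flow[s, i, j]
    outflow = flow[1:].sum(axis=2)     # outflow[s, j] = sum_k flow[s+1, j, k]
    return np.array_equal(inflow, outflow)

assert conserved(flow)
assert conserved(degree_two_move(flow, t=1, i=0, ip=1, j=0, jp=2, delta=5))
assert conserved(degree_one_cycle(flow, t=1, i=0, j=0, jp=1, k=0, delta=5))
```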
19 This paper formally develops these concepts in a way that generalizes the construction of Figure 1 to allow arbitrary graphical models inside the plate, and a more general observation model that includes both noisy observations and observations involving multiple variables. [sent-63, score-0.241]
20 We develop an efficient Gibbs sampler to conduct inference in CGMs that builds on existing work for conducting exact tests in contingency tables and makes several novel technical contributions. [sent-64, score-0.639]
21 Foremost is the analysis of the distribution over the move size δ, which we show to be a discrete univariate distribution that generalizes both the binomial and hypergeometric distributions. [sent-65, score-0.181]
22 The bird migration model of [1, 2] is a special case of CGMs where the individual model is a Markov chain and observations are made for single variables only. [sent-69, score-0.425]
23 Sampling methods for exact tests in contingency tables (e.g., [4]) generate tables with the same sufficient statistics as an observed table. [sent-71, score-0.475] [sent-73, score-0.255]
25 Our work differs in that our observations are not sufficient, and we are sampling the sufficient statistics instead of the complete contingency table. [sent-74, score-0.34]
26 Diaconis and Sturmfels [5] broadly introduced the concept of Markov bases, which are sets of moves that connect the state space when sampling from conditional distributions by MCMC. [sent-75, score-0.351]
27 Inference in CGMs can be viewed as a form of lifted inference [8–12]. [sent-79, score-0.15]
28 The counting arguments used to derive the CGM distribution (see below) are similar to the operations of counting elimination [9] and counting conversion [10] used in exact lifted inference algorithms for first-order probabilistic models. [sent-80, score-0.325]
29 For example, when applied to the bird migration model, the C-FOVE algorithm of Milch et al. [10] cannot introduce contingency tables over pairs of variables (Xt, Xt+1) as required to represent the sufficient statistics; it can only introduce histograms over single variables Xt. [sent-82, score-0.229] [sent-83, score-0.499]
31 A collection A is decomposable if there is a junction tree T = (A, E(T )) on vertex set A [7]. [sent-104, score-0.244]
32 Any collection A can be extended to a decomposable collection B such that A ⊑ B (every set in A is contained in some set of B); this corresponds to adding fill-in edges to a graphical model. [sent-105, score-0.26]
33 A contingency table n = (n(i))i∈X has entries n(i) = Σ_{m=1}^{M} I{x^(m) = i} that count the number of times each element i ∈ X appears in the sample. [sent-110, score-0.301]
34 We use index variables such as i, j ∈ X (instead of x ∈ X) to refer to cells of the contingency table, where i = (i1, . . . , in). [sent-111, score-0.243]
35 Let tbl(A) denote the set of all valid contingency tables on the domain XA . [sent-115, score-0.437]
36 For a full table n ∈ tbl(V) and A ⊆ V, let the marginal table n ↓ A ∈ tbl(A) be defined as (n ↓ A)(iA) = Σ_{m=1}^{M} I{x_A^(m) = iA} = Σ_{iB ∈ X_{V \ A}} n(iA, iB). [sent-117, score-0.166]
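A small illustration of the table notation just defined (a sketch, not from the paper): building the full contingency table n from a sample of binary variables and computing a marginal table n ↓ A. The variable count, population size, and function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
V, M = 4, 1000                               # assumed: 4 binary variables, M individuals
sample = rng.integers(0, 2, size=(M, V))

# Full table n over X_V: n(i) counts occurrences of each joint setting i.
n = np.zeros((2,) * V, dtype=int)
for x in sample:
    n[tuple(x)] += 1
assert n.sum() == M

def marginalize(n, A):
    """n ↓ A: sum the full table over all variables outside A."""
    other = tuple(v for v in range(n.ndim) if v not in A)
    return n.sum(axis=other)

n_01 = marginalize(n, A=(0, 1))              # e.g. the pairwise table over {X1, X2}
assert n_01.sum() == M
```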
37 A sample {x^(1), . . . , x^(M)} is drawn from the individual model, resulting in a complete, but unobserved, contingency table nV. [sent-125, score-0.315]
38 We then observe the marginal tables nD = nV ↓ D for each set D in a collection of observed margins D, which we require to be decomposable. [sent-126, score-0.331]
39 Write this overall collection of tables as nD = {nD }D∈D . [sent-127, score-0.267]
40 In a discrete graphical model, the sufficient statistics are the contingency tables nC = {nC }C∈C over cliques. [sent-131, score-0.532]
41 Let µC be the table of marginal probabilities for clique C (i.e., µC(iC) = p(xC = iC) under the individual model). [sent-134, score-0.195]
42 Let S be the collection of separators of TC (with repetition if the same set appears as a separator multiple times) and let nS and µS be the tables of counts and marginal probabilities for the separator S ∈ S. [sent-137, score-0.56]
43 It is this distribution that we call the collective graphical model; the parameters are the marginal probabilities of the individual model. [sent-142, score-0.249]
44 To understand the conditional distribution given the observations, let us further assume that D ⊑ C (if not, add additional fill-in edges for variables that co-occur within D), so that each observed table is determined by some clique table. [sent-143, score-0.162]
45 Write nD ⊑ nC to express the condition that the tables nC produce the observations nD: formally, this means that D ⊑ C and that D ⊆ C implies nD = nC ↓ D. [sent-144, score-0.285]
46 In general, the number of contingency tables over small sets of variables leads to huge state spaces that prohibit exact inference schemes using (2) and (3). [sent-147, score-0.554]
47 First, the clique tables must match the observations (i.e., nD ⊑ nC must hold). [sent-150, score-0.365]
48 Second, implicit in (2) is the constraint that the tables nC must be consistent in the sense that they are the sufficient statistics of some sample, otherwise p(nC ) = 0. [sent-153, score-0.287]
49 Refer to the set of contingency tables nA = {nA }A∈A as a configuration. [sent-155, score-0.437]
50 Consistency requires, for example, that any two tables must agree on their common marginal, which yields the flow conservation constraints in the bird migration model. [sent-157, score-0.493]
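A tiny sketch of the consistency requirement just mentioned, assuming a chain with cliques {X1, X2} and {X2, X3}: the two clique tables must agree on the separator marginal over X2, which is exactly the flow-conservation constraint in the bird migration example. The array layout is an assumption.

```python
import numpy as np

def locally_consistent(n_12, n_23):
    """Do the clique tables over {X1, X2} and {X2, X3} agree on X2?"""
    from_left = n_12.sum(axis=0)    # marginal over X2 from the first clique
    from_right = n_23.sum(axis=1)   # marginal over X2 from the second clique
    return np.array_equal(from_left, from_right)

n_12 = np.array([[3, 1], [2, 4]])   # indexed [x1, x2]
n_23 = np.array([[2, 3], [4, 1]])   # indexed [x2, x3]
print(locally_consistent(n_12, n_23))   # True: both marginals over X2 are (5, 5)
```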
51 3 Inference Our goal is to develop a sampler for p(nC | nD ) given the observed tables nD . [sent-160, score-0.341]
52 Doing so without instantiating huge intermediate tables requires a careful sequence of operations on the two junction trees TC and TD . [sent-164, score-0.316]
53 Let A be a decomposable collection with junction tree TA . [sent-167, score-0.244]
54 In the bird migration example, Theorem 1 guarantees that preserving flow conservation is enough to maintain consistency. [sent-172, score-0.268]
55 This is analogous to a classical result (e.g., [15]) which asserts that marginal probability tables {µA}A∈A that are locally consistent are realizable as the marginals of some joint distribution p(x). [sent-175, score-0.321]
56 However, the integrality requirements of contingency tables necessitate a different style of construction. [sent-177, score-0.437]
57 1 Markov Basis The first key challenge in designing the MCMC sampler is constructing a set of moves that preserve the constraints mentioned above, yet still connect any two points in the support of the distribution. [sent-179, score-0.47]
58 Such a set of moves is called a Markov basis [5]. [sent-180, score-0.329]
59 A set of moves M is a Markov basis for the set F if, for any two configurations n, n′ ∈ F, there is a sequence of moves z1, . . . , zL ∈ M such that n′ = n + Σ_{l=1}^{L} zl and every partial sum n + Σ_{l=1}^{k} zl (k ≤ L) remains in F. [sent-182, score-0.579]
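A toy illustration of the Markov-basis property in the simplest case (a sketch with assumed data): two 2x2 tables with the same row and column sums are connected by repeated swap moves, with every intermediate table remaining nonnegative.

```python
import numpy as np

z = np.array([[1, -1], [-1, 1]])    # the single swap move for 2x2 tables
n      = np.array([[3, 1], [0, 4]])
n_goal = np.array([[1, 3], [2, 2]]) # same row sums (4, 4) and column sums (3, 5)

steps = 0
while not np.array_equal(n, n_goal):
    d = -1 if n[0, 0] > n_goal[0, 0] else 1   # for 2x2 tables, n[0,0] fixes the table
    n = n + d * z
    assert (n >= 0).all()                     # intermediate configurations stay feasible
    steps += 1
print(steps, "moves connect the two tables")  # 2
```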
60 Dobra [6] showed how to construct a Markov basis for moves in a complete contingency table given a decomposable set of margins. [sent-195, score-0.703]
61 Let Md=2 (A, S, B) be the set of all degree-two moves generated from this partition. [sent-201, score-0.25]
62 These are extensions of the well-known “swap moves” for two-dimensional contingency tables. [sent-202, score-0.437]
63 In this arrangement, it is clear that any such move preserves the marginal table nA (row sums) and the marginal table nB (column sums); in other words, z ↓ A = 0 and z ↓ B = 0. [sent-205, score-0.305]
64 The cycle in Figure 1(e) is a degree-two move on the table n1,2, with A = {X1}, S = ∅, B = {X2}. [sent-207, score-0.159]
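A minimal sketch of a degree-two "swap" move on a two-dimensional table (illustrative names and data, not the paper's code): +1 at (i, j) and (i′, j′), −1 at (i, j′) and (i′, j), leaving both the row-sum marginal nA and the column-sum marginal nB unchanged.

```python
import numpy as np

def swap_move(shape, i, ip, j, jp):
    z = np.zeros(shape, dtype=int)
    z[i, j] = z[ip, jp] = 1
    z[i, jp] = z[ip, j] = -1
    return z

n = np.array([[4, 2, 1],
              [2, 3, 5]])
z = swap_move(n.shape, i=0, ip=1, j=0, jp=2)
n_new = n + z
assert (n_new.sum(axis=1) == n.sum(axis=1)).all()   # z ↓ A = 0: row sums preserved
assert (n_new.sum(axis=0) == n.sum(axis=0)).all()   # z ↓ B = 0: column sums preserved
assert (n_new >= 0).all()                           # still a valid contingency table
```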
65 Let M∗A be the union of the sets of degree-two moves Md=2(A, S, B) where S is a separator of TA and (A, S, B) is the corresponding decomposition of V. [sent-210, score-0.347]
66 Thus, the image of the Dobra basis under A is a Markov basis for FnA . [sent-217, score-0.158]
67 The practical message so far is that to sample from p(nC | nD ), it suffices to generate moves from the projected Dobra basis MD . [sent-234, score-0.329]
68 This is done by first selecting a degree-two move z ∈ M∗ , and then marginalizing z onto each clique of C. [sent-235, score-0.203]
69 However, we will show that z ↓ C will be zero for many cliques, a fact we can exploit to implement moves more efficiently. [sent-237, score-0.25]
70 Let us now consider settings where some variables are not part of any observed table, which may happen when the individual model has hidden variables, or, later, with noisy observations. [sent-244, score-0.256]
71 Additional moves are needed to connect two configurations that disagree on marginal tables involving the unobserved variables: degree-one moves z ∈ Md=1(A, B), which partition the variables into two sets (A, B) and have two nonzero entries z(i, j) = 1, z(i′, j) = −1 for i ≠ i′ ∈ XA, j ∈ XB. [sent-245, score-0.684]
72 Algorithm 1 (the projected Dobra basis MA). Input: junction tree TA with separators SA. Before sampling: for each S ∈ SA, find the associated decomposition (A, S, B); find the cliques C ∈ C that have non-empty intersection with both A and B (call this set CS); let AS = A ∩ VS and BS = B ∩ VS. During sampling, to generate a move for separator S ∈ SA: select z ∈ Md=2(AS, S, BS) and, for each clique C ∈ CS, calculate z ↓ C.
73 In the parlance of two-dimensional tables, these degree-one moves adjust two entries in a single column so they preserve the column sums (nB) but modify the row sums (nA). [sent-251, score-0.491]
74 The cycle in Figure 1(d) is a degree-one move which adjusts the marginal table over A = {X2 }, but preserves the marginal table over B = {X1 , X3 }. [sent-252, score-0.338]
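For contrast, a sketch of a degree-one move on the same kind of table (assumed layout, illustrative data): two nonzero entries in one column, which keeps the column sums (nB) fixed while changing the row sums (nA), matching the behaviour of the Figure 1(d) cycle.

```python
import numpy as np

def degree_one_move(shape, i, ip, j):
    z = np.zeros(shape, dtype=int)
    z[i, j], z[ip, j] = 1, -1
    return z

n = np.array([[4, 2], [3, 5]])
z = degree_one_move(n.shape, i=0, ip=1, j=1)
assert ((n + z).sum(axis=0) == n.sum(axis=0)).all()       # column sums (nB) unchanged
assert not ((n + z).sum(axis=1) == n.sum(axis=1)).all()   # row sums (nA) change
```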
75 We proceed once again by constructing a basis for complete tables and then marginalizing the moves onto cliques. [sent-253, score-0.602]
76 Let U be any decomposable collection on the set of unobserved variables U = V \ D, and let D = D ∪ U. [sent-255, score-0.212]
77 Let M∗ for the extended collection D ∪ U consist of the moves M∗ for D together with the moves Md=1(A, V \ A) for each A ∈ U. [sent-256, score-0.5]
78 Then M∗ is a Markov basis for FnD , and M = {Az : z ∈ M∗ } is a Markov basis for FnD . [sent-257, score-0.158]
79 The degree-one moves also become local upon marginalization: it is easy to check that z ↓ C is zero unless C ∩ A is nonempty. [sent-259, score-0.25]
80 This has the effect of adding degree-one moves for each clique of C. [sent-262, score-0.33]
81 By matching the structure of TC , many of the additional degree-two moves become zero upon marginalization. [sent-263, score-0.25]
82 2 Constructing an efficient MCMC sampler The second key challenge in constructing the MCMC sampler is utilizing the moves from the Markov basis in a way that efficiently explores the state space. [sent-265, score-0.561]
83 A standard approach is to select a random move z, a direction δ = ±1 (each with probability 1/2), and then propose the move nC + δz in a Metropolis Hastings sampler. [sent-266, score-0.15]
84 Although these moves are enough to connect any two configurations, we are particularly interested in problems where M is large, for which moving by increments of ±1 will be prohibitively slow. [sent-267, score-0.313]
85 For general Markov bases, Diaconis and Sturmfels [5] suggest instead to construct a Gibbs sampler that uses the moves as directions for longer steps, by choosing the value of δ from the following distribution: p(δ) ∝ p(nC + δz | nD ), δ ∈ {δ : nC + δz ≥ 0}. [sent-268, score-0.366]
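A generic sketch of this Gibbs step (not the authors' implementation): given the current clique tables, a move, and an unnormalized log posterior, enumerate the integer step sizes δ that keep every table nonnegative, evaluate (5) up to a constant, and sample δ. The dict-of-arrays representation and the log_posterior callback are assumptions.

```python
import numpy as np

def feasible_range(tables, move):
    """Integer deltas with n_C + delta * z_C >= 0 for every clique C in the move."""
    lo, hi = -10**9, 10**9                      # finite in practice: moves have +/- entries
    for C, zC in move.items():
        nC, pos, neg = tables[C], zC > 0, zC < 0
        if pos.any():
            lo = max(lo, int(np.ceil((-nC[pos] / zC[pos]).max())))
        if neg.any():
            hi = min(hi, int(np.floor((nC[neg] / -zC[neg]).min())))
    return lo, hi

def gibbs_delta_step(tables, move, log_posterior, rng):
    """Sample delta with p(delta) proportional to p(n_C + delta*z | n_D), as in (5)."""
    lo, hi = feasible_range(tables, move)
    deltas = np.arange(lo, hi + 1)
    logp = np.array([
        log_posterior({C: tables[C] + d * zC for C, zC in move.items()})
        for d in deltas
    ])
    p = np.exp(logp - logp.max())
    d = rng.choice(deltas, p=p / p.sum())
    return {C: tables[C] + d * zC for C, zC in move.items()}
```

For moves that touch only a single clique, this generic enumeration can be replaced by direct sampling from the binomial or (noncentral) hypergeometric distributions noted below.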
86 Consider the Markov chain with moves δz generated by first choosing z uniformly at random from M and then choosing δ according to (5). [sent-271, score-0.303]
87 For a separator S ∈ S, define zS as zC ↓ S for any clique C containing S. [sent-277, score-0.177]
88 Notably, each move in our basis satisfies |I+(zA) ∪ I−(zA)| ≤ 4, so p(δ) can be evaluated by examining at most four entries in each table for cliques in C(z). [sent-301, score-0.325]
89 It is worth noting that Equation (7) reduces to the binomial distribution for degree-one moves and the (noncentral) hypergeometric distribution for degree-two moves, so we may sample from these distributions directly when |C(z)| = 1. [sent-302, score-0.325]
90 The proof of Theorem 4, which is found in the supplementary material, then pairs each separator S with a clique C and uses properties of the moves to show that pC (δ)/pS (δ) is also log-concave. [sent-309, score-0.427]
91 3 Noisy Observations Population-level counts from real survey data are rarely exact, and it is thus important to incorporate noisy observations into our model. [sent-317, score-0.151]
92 In this section, we describe how to modify the sampler for the case when all observations are noisy; it is a straightforward generalization to allow both noisy and exact observations. [sent-318, score-0.298]
93 Suppose that we make noisy observations yR = {yR : R ∈ R} corresponding to the true marginal tables nR for a collection R ⊑ C (that need not be decomposable). [sent-319, score-0.447]
94 A canonical example from the bird migration model is p(y | n) = Poisson(αn), so the survey count is Poisson with mean proportional to the true number of birds present. [sent-323, score-0.32]
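A sketch of how such a Poisson observation factor could enter the unnormalized log posterior used by the δ step above; the interface, the treatment of each observed margin as a clique table, and the value of alpha are assumptions for illustration.

```python
import numpy as np
from scipy.stats import poisson

def noisy_obs_loglik(tables, observations, alpha=0.5):
    """Sum of log p(y_R | n_R) with p(y | n) = Poisson(alpha * n), one term per
    observed margin R (here assumed to coincide with a clique table)."""
    total = 0.0
    for R, y_R in observations.items():
        total += poisson.logpmf(y_R, alpha * tables[R]).sum()
    return total
```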
95 4 Experiments We implemented our sampler in MATLAB using Murphy’s Bayes net toolbox [18] for the underlying operations on graphical models and junction trees. [sent-334, score-0.272]
96 The task was to estimate E[n2,3 | n1 , n3 ] in the bird migration model for L = 2, T = 3, and varying M . [sent-337, score-0.229]
97 We generated 30 random Bayes nets on 10 binary variables, and generated two sets of observed tables for a population of M = 100, 000: the set NODES has a table for each single variable, while the set CHAIN has tables for pairs of variables that are adjacent in a random ordering. [sent-340, score-0.604]
98 The sampler converged quickly in all cases with the more complex CHAIN observation model taking longer than NODES, and noisy observations taking slightly longer than exact ones. [sent-348, score-0.27]
99 Collective inference on Markov models for modeling bird migration. [sent-365, score-0.157]
100 Some results about decomposable (or Markov-type) models for multidimensional contingency tables: distribution of marginals and partitioning of tests. [sent-436, score-0.323]
wordName wordTfidf (topN-words)
[('nc', 0.517), ('moves', 0.25), ('tables', 0.225), ('contingency', 0.212), ('dobra', 0.206), ('fna', 0.206), ('cgm', 0.171), ('nv', 0.147), ('yr', 0.125), ('na', 0.122), ('cgms', 0.12), ('migration', 0.12), ('sampler', 0.116), ('decomposable', 0.111), ('bird', 0.109), ('za', 0.105), ('tbl', 0.103), ('lifted', 0.102), ('separator', 0.097), ('ic', 0.091), ('junction', 0.091), ('birds', 0.091), ('fnd', 0.09), ('tc', 0.09), ('markov', 0.088), ('cliques', 0.082), ('xa', 0.08), ('clique', 0.08), ('basis', 0.079), ('ow', 0.079), ('move', 0.075), ('population', 0.072), ('diaconis', 0.069), ('trellis', 0.069), ('collective', 0.068), ('nb', 0.068), ('guration', 0.066), ('graphical', 0.065), ('marginal', 0.064), ('connect', 0.063), ('ia', 0.062), ('ta', 0.062), ('observations', 0.06), ('xc', 0.059), ('noisy', 0.056), ('aggregate', 0.055), ('az', 0.055), ('chain', 0.053), ('zc', 0.052), ('mcmc', 0.052), ('individual', 0.052), ('anv', 0.051), ('sturmfels', 0.051), ('nt', 0.051), ('table', 0.051), ('md', 0.05), ('marginalizing', 0.048), ('inference', 0.048), ('sheldon', 0.045), ('individuals', 0.043), ('collection', 0.042), ('hypergeometric', 0.042), ('plate', 0.042), ('preserve', 0.041), ('elimination', 0.041), ('conservation', 0.039), ('sampling', 0.038), ('entries', 0.038), ('exact', 0.038), ('issn', 0.037), ('poisson', 0.036), ('ns', 0.035), ('xv', 0.035), ('counts', 0.035), ('gibbs', 0.035), ('apsel', 0.034), ('ows', 0.034), ('nr', 0.034), ('cycle', 0.033), ('binomial', 0.033), ('oregon', 0.033), ('consistent', 0.032), ('counting', 0.032), ('nd', 0.031), ('con', 0.031), ('variables', 0.031), ('rejection', 0.031), ('univariate', 0.031), ('ib', 0.03), ('statistics', 0.03), ('sa', 0.03), ('unobserved', 0.028), ('modify', 0.028), ('suf', 0.028), ('pk', 0.028), ('zl', 0.028), ('zr', 0.028), ('devroye', 0.028), ('milch', 0.028), ('sums', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 55 nips-2011-Collective Graphical Models
Author: Daniel R. Sheldon, Thomas G. Dietterich
Abstract: There are many settings in which we wish to fit a model of the behavior of individuals but where our data consist only of aggregate information (counts or low-dimensional contingency tables). This paper introduces Collective Graphical Models—a framework for modeling and probabilistic inference that operates directly on the sufficient statistics of the individual model. We derive a highlyefficient Gibbs sampling algorithm for sampling from the posterior distribution of the sufficient statistics conditioned on noisy aggregate observations, prove its correctness, and demonstrate its effectiveness experimentally. 1
2 0.08683449 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation
Author: Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, Bart Selman
Abstract: We propose a novel Adaptive Markov Chain Monte Carlo algorithm to compute the partition function. In particular, we show how to accelerate a flat histogram sampling technique by significantly reducing the number of “null moves” in the chain, while maintaining asymptotic convergence properties. Our experiments show that our method converges quickly to highly accurate solutions on a range of benchmark instances, outperforming other state-of-the-art methods such as IJGP, TRW, and Gibbs sampling both in run-time and accuracy. We also show how obtaining a so-called density of states distribution allows for efficient weight learning in Markov Logic theories. 1
3 0.073730856 229 nips-2011-Query-Aware MCMC
Author: Michael L. Wick, Andrew McCallum
Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1
4 0.067832723 104 nips-2011-Generalized Beta Mixtures of Gaussians
Author: Artin Armagan, Merlise Clyde, David B. Dunson
Abstract: In recent years, a rich variety of shrinkage priors have been proposed that have great promise in addressing massive regression problems. In general, these new priors can be expressed as scale mixtures of normals, but have more complex forms and better properties than traditional Cauchy and double exponential priors. We first propose a new class of normal scale mixtures through a novel generalized beta distribution that encompasses many interesting priors as special cases. This encompassing framework should prove useful in comparing competing priors, considering properties and revealing close connections. We then develop a class of variational Bayes approximations through the new hierarchy presented that will scale more efficiently to the types of truly massive data sets that are now encountered routinely. 1
5 0.066434942 274 nips-2011-Structure Learning for Optimization
Author: Shulin Yang, Ali Rahimi
Abstract: We describe a family of global optimization procedures that automatically decompose optimization problems into smaller loosely coupled problems. The solutions of these are subsequently combined with message passing algorithms. We show empirically that these methods produce better solutions with fewer function evaluations than existing global optimization methods. To develop these methods, we introduce a notion of coupling between variables of optimization. This notion of coupling generalizes the notion of independence between random variables in statistics, sparseness of the Hessian in nonlinear optimization, and the generalized distributive law. Despite its generality, this notion of coupling is easier to verify empirically, making structure estimation easy, while allowing us to migrate well-established inference methods on graphical models to the setting of global optimization. 1
6 0.060433507 275 nips-2011-Structured Learning for Cell Tracking
7 0.058520224 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure
8 0.057721093 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
9 0.057486244 173 nips-2011-Modelling Genetic Variations using Fragmentation-Coagulation Processes
10 0.057067372 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference
11 0.055912543 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
12 0.055640437 131 nips-2011-Inference in continuous-time change-point models
13 0.054848582 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
14 0.052936088 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
15 0.052898124 221 nips-2011-Priors over Recurrent Continuous Time Processes
16 0.052831046 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC
17 0.052730698 249 nips-2011-Sequence learning with hidden units in spiking neural networks
18 0.052676313 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods
19 0.049927637 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
20 0.048203547 40 nips-2011-Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints
topicId topicWeight
[(0, 0.153), (1, 0.017), (2, 0.021), (3, -0.019), (4, -0.037), (5, -0.083), (6, -0.048), (7, -0.074), (8, 0.043), (9, -0.012), (10, 0.026), (11, -0.065), (12, 0.013), (13, -0.021), (14, -0.019), (15, -0.006), (16, 0.041), (17, -0.098), (18, -0.043), (19, 0.041), (20, 0.023), (21, 0.037), (22, -0.003), (23, -0.054), (24, -0.053), (25, 0.058), (26, -0.051), (27, 0.046), (28, 0.028), (29, 0.068), (30, 0.015), (31, -0.043), (32, -0.028), (33, 0.022), (34, -0.081), (35, -0.061), (36, -0.018), (37, -0.014), (38, -0.045), (39, -0.017), (40, -0.025), (41, -0.03), (42, -0.129), (43, 0.021), (44, -0.013), (45, -0.017), (46, -0.017), (47, -0.056), (48, 0.004), (49, -0.067)]
simIndex simValue paperId paperTitle
same-paper 1 0.94465035 55 nips-2011-Collective Graphical Models
Author: Daniel R. Sheldon, Thomas G. Dietterich
Abstract: There are many settings in which we wish to fit a model of the behavior of individuals but where our data consist only of aggregate information (counts or low-dimensional contingency tables). This paper introduces Collective Graphical Models—a framework for modeling and probabilistic inference that operates directly on the sufficient statistics of the individual model. We derive a highlyefficient Gibbs sampling algorithm for sampling from the posterior distribution of the sufficient statistics conditioned on noisy aggregate observations, prove its correctness, and demonstrate its effectiveness experimentally. 1
2 0.76550704 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation
Author: Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, Bart Selman
Abstract: We propose a novel Adaptive Markov Chain Monte Carlo algorithm to compute the partition function. In particular, we show how to accelerate a flat histogram sampling technique by significantly reducing the number of “null moves” in the chain, while maintaining asymptotic convergence properties. Our experiments show that our method converges quickly to highly accurate solutions on a range of benchmark instances, outperforming other state-of-the-art methods such as IJGP, TRW, and Gibbs sampling both in run-time and accuracy. We also show how obtaining a so-called density of states distribution allows for efficient weight learning in Markov Logic theories. 1
3 0.69255042 221 nips-2011-Priors over Recurrent Continuous Time Processes
Author: Ardavan Saeedi, Alexandre Bouchard-côté
Abstract: We introduce the Gamma-Exponential Process (GEP), a prior over a large family of continuous time stochastic processes. A hierarchical version of this prior (HGEP; the Hierarchical GEP) yields a useful model for analyzing complex time series. Models based on HGEPs display many attractive properties: conjugacy, exchangeability and closed-form predictive distribution for the waiting times, and exact Gibbs updates for the time scale parameters. After establishing these properties, we show how posterior inference can be carried efficiently using Particle MCMC methods [1]. This yields a MCMC algorithm that can resample entire sequences atomically while avoiding the complications of introducing slice and stick auxiliary variables of the beam sampler [2]. We applied our model to the problem of estimating the disease progression in multiple sclerosis [3], and to RNA evolutionary modeling [4]. In both domains, we found that our model outperformed the standard rate matrix estimation approach. 1
4 0.68749791 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference
Author: David Wingate, Noah Goodman, Andreas Stuhlmueller, Jeffrey M. Siskind
Abstract: Probabilistic programming languages allow modelers to specify a stochastic process using syntax that resembles modern programming languages. Because the program is in machine-readable format, a variety of techniques from compiler design and program analysis can be used to examine the structure of the distribution represented by the probabilistic program. We show how nonstandard interpretations of probabilistic programs can be used to craft efficient inference algorithms: information about the structure of a distribution (such as gradients or dependencies) is generated as a monad-like side computation while executing the program. These interpretations can be easily coded using special-purpose objects and operator overloading. We implement two examples of nonstandard interpretations in two different languages, and use them as building blocks to construct inference algorithms: automatic differentiation, which enables gradient based methods, and provenance tracking, which enables efficient construction of global proposals. 1
5 0.64948857 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC
Author: Trung T. Pham, Tat-jun Chin, Jin Yu, David Suter
Abstract: Multi-structure model fitting has traditionally taken a two-stage approach: First, sample a (large) number of model hypotheses, then select the subset of hypotheses that optimise a joint fitting and model selection criterion. This disjoint two-stage approach is arguably suboptimal and inefficient — if the random sampling did not retrieve a good set of hypotheses, the optimised outcome will not represent a good fit. To overcome this weakness we propose a new multi-structure fitting approach based on Reversible Jump MCMC. Instrumental in raising the effectiveness of our method is an adaptive hypothesis generator, whose proposal distribution is learned incrementally and online. We prove that this adaptive proposal satisfies the diminishing adaptation property crucial for ensuring ergodicity in MCMC. Our method effectively conducts hypothesis sampling and optimisation simultaneously, and yields superior computational efficiency over previous two-stage methods. 1
6 0.64776719 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference
7 0.61120141 131 nips-2011-Inference in continuous-time change-point models
8 0.60045809 197 nips-2011-On Tracking The Partition Function
9 0.59673876 8 nips-2011-A Model for Temporal Dependencies in Event Streams
10 0.58904022 229 nips-2011-Query-Aware MCMC
11 0.57572883 173 nips-2011-Modelling Genetic Variations using Fragmentation-Coagulation Processes
12 0.54893756 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning
13 0.5348255 228 nips-2011-Quasi-Newton Methods for Markov Chain Monte Carlo
14 0.51365656 69 nips-2011-Differentially Private M-Estimators
15 0.50088388 14 nips-2011-A concave regularization technique for sparse mixture models
16 0.4953618 104 nips-2011-Generalized Beta Mixtures of Gaussians
17 0.49028441 136 nips-2011-Inverting Grice's Maxims to Learn Rules from Natural Language Extractions
18 0.48914692 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
19 0.48124114 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing
20 0.48066926 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data
topicId topicWeight
[(0, 0.024), (4, 0.049), (16, 0.27), (20, 0.069), (26, 0.026), (31, 0.08), (33, 0.056), (43, 0.06), (45, 0.058), (57, 0.06), (65, 0.013), (74, 0.045), (83, 0.046), (84, 0.014), (99, 0.041)]
simIndex simValue paperId paperTitle
same-paper 1 0.74108446 55 nips-2011-Collective Graphical Models
Author: Daniel R. Sheldon, Thomas G. Dietterich
Abstract: There are many settings in which we wish to fit a model of the behavior of individuals but where our data consist only of aggregate information (counts or low-dimensional contingency tables). This paper introduces Collective Graphical Models—a framework for modeling and probabilistic inference that operates directly on the sufficient statistics of the individual model. We derive a highlyefficient Gibbs sampling algorithm for sampling from the posterior distribution of the sufficient statistics conditioned on noisy aggregate observations, prove its correctness, and demonstrate its effectiveness experimentally. 1
2 0.60965562 109 nips-2011-Greedy Model Averaging
Author: Dong Dai, Tong Zhang
Abstract: This paper considers the problem of combining multiple models to achieve a prediction accuracy not much worse than that of the best single model for least squares regression. It is known that if the models are mis-specified, model averaging is superior to model selection. Specifically, let n be the sample size, then the worst case regret of the former decays at the rate of O(1/n) while the worst √ case regret of the latter decays at the rate of O(1/ n). In the literature, the most important and widely studied model averaging method that achieves the optimal O(1/n) average regret is the exponential weighted model averaging (EWMA) algorithm. However this method suffers from several limitations. The purpose of this paper is to present a new greedy model averaging procedure that improves EWMA. We prove strong theoretical guarantees for the new procedure and illustrate our theoretical results with empirical examples. 1
3 0.52427113 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
Author: Soumya Ghosh, Andrei B. Ungureanu, Erik B. Sudderth, David M. Blei
Abstract: The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data [1]. The ddCRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This paper examines the ddCRP in a spatial setting with the goal of natural image segmentation. We explore the biases of the spatial ddCRP model and propose a novel hierarchical extension better suited for producing “human-like” segmentations. We then study the sensitivity of the models to various distance and appearance hyperparameters, and provide the first rigorous comparison of nonparametric Bayesian models in the image segmentation domain. On unsupervised image segmentation, we demonstrate that similar performance to existing nonparametric Bayesian models is possible with substantially simpler models and algorithms.
4 0.52125877 227 nips-2011-Pylon Model for Semantic Segmentation
Author: Victor Lempitsky, Andrea Vedaldi, Andrew Zisserman
Abstract: Graph cut optimization is one of the standard workhorses of image segmentation since for binary random field representations of the image, it gives globally optimal results and there are efficient polynomial time implementations. Often, the random field is applied over a flat partitioning of the image into non-intersecting elements, such as pixels or super-pixels. In the paper we show that if, instead of a flat partitioning, the image is represented by a hierarchical segmentation tree, then the resulting energy combining unary and boundary terms can still be optimized using graph cut (with all the corresponding benefits of global optimality and efficiency). As a result of such inference, the image gets partitioned into a set of segments that may come from different layers of the tree. We apply this formulation, which we call the pylon model, to the task of semantic segmentation where the goal is to separate an image into areas belonging to different semantic classes. The experiments highlight the advantage of inference on a segmentation tree (over a flat partitioning) and demonstrate that the optimization in the pylon model is able to flexibly choose the level of segmentation across the image. Overall, the proposed system has superior segmentation accuracy on several datasets (Graz-02, Stanford background) compared to previously suggested approaches. 1
5 0.51333928 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling
Author: Adrian Ion, Joao Carreira, Cristian Sminchisescu
Abstract: We present a joint image segmentation and labeling model (JSL) which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales, constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag [1], followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure. The partition function over tilings and labelings is increasingly more accurately approximated by including incorrect configurations that a not-yet-competent model rates probable during learning. We show that the proposed methodology matches the current state of the art in the Stanford dataset [2], as well as in VOC2010, where 41.7% accuracy on the test set is achieved.
6 0.50771916 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data
7 0.50610846 180 nips-2011-Multiple Instance Filtering
8 0.5042578 66 nips-2011-Crowdclustering
9 0.50297624 303 nips-2011-Video Annotation and Tracking with Active Learning
10 0.50077891 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
11 0.4985956 219 nips-2011-Predicting response time and error rates in visual search
12 0.49839506 127 nips-2011-Image Parsing with Stochastic Scene Grammar
13 0.49819914 99 nips-2011-From Stochastic Nonlinear Integrate-and-Fire to Generalized Linear Models
14 0.49811864 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
15 0.49678499 75 nips-2011-Dynamical segmentation of single trials from population neural data
16 0.49609566 102 nips-2011-Generalised Coupled Tensor Factorisation
17 0.49503189 103 nips-2011-Generalization Bounds and Consistency for Latent Structural Probit and Ramp Loss
18 0.4948341 156 nips-2011-Learning to Learn with Compound HD Models
19 0.49417818 188 nips-2011-Non-conjugate Variational Message Passing for Multinomial and Binary Regression
20 0.49358714 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation