nips nips2011 nips2011-55 knowledge-graph by maker-knowledge-mining

55 nips-2011-Collective Graphical Models


Source: pdf

Author: Daniel R. Sheldon, Thomas G. Dietterich

Abstract: There are many settings in which we wish to fit a model of the behavior of individuals but where our data consist only of aggregate information (counts or low-dimensional contingency tables). This paper introduces Collective Graphical Models—a framework for modeling and probabilistic inference that operates directly on the sufficient statistics of the individual model. We derive a highly efficient Gibbs sampling algorithm for sampling from the posterior distribution of the sufficient statistics conditioned on noisy aggregate observations, prove its correctness, and demonstrate its effectiveness experimentally.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 There are many settings in which we wish to fit a model of the behavior of individuals but where our data consist only of aggregate information (counts or low-dimensional contingency tables). [sent-6, score-0.31]

2 We derive a highly efficient Gibbs sampling algorithm for sampling from the posterior distribution of the sufficient statistics conditioned on noisy aggregate observations, prove its correctness, and demonstrate its effectiveness experimentally. [sent-8, score-0.217]

3 Far more readily available are aggregated data in the form of counts or low-dimensional contingency tables. [sent-10, score-0.247]

4 This paper introduces a formalism in which one starts with a graphical model describing the behavior of individuals, and then derives a new graphical model — the Collective Graphical Model (CGM) — on the sufficient statistics of a population drawn from that model. [sent-16, score-0.232]

5 This paper is devoted to the problem of inference in CGMs, where the goal is to calculate conditional probabilities over the sufficient statistics given partial observations made at the population level. [sent-18, score-0.21]

6 We consider both an exact observation model where subtables of the sufficient statistics are observed directly, and a noisy observation model where these counts are corrupted. [sent-19, score-0.159]

7 Figure 1(a) shows the graphical model plate notation for the bird migration model from [1, 2], in which birds transition stochastically among a discrete set of locations (say, grid cells on a map) according to a Markov chain (the individual model). [sent-23, score-0.532]

8 The variable Xt denotes the location of the mth bird at time t, and birds are independent and identically distributed. [sent-24, score-0.2]

9 Suppose, for example, that very accurate surveys reveal the number of birds nt (i) in each location i at each time t, and these numbers are collected into a single vector nt for each time step. [sent-26, score-0.193]

10 Figure 1(b) shows the CGM for this model, which we obtain by analytically marginalizing away the individual variables to get a new model on their sufficient statistics, which are the tables nt,t+1 with entries nt,t+1 (i, j) equaling the number of birds that fly from i to j from time t to t + 1. [sent-42, score-0.485]
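
To make the setup concrete, here is a minimal Python sketch (illustrative only: the transition matrix P, the initial distribution pi0, and the function name are assumptions, not taken from the paper) that simulates M independent birds following a Markov chain and tallies the pairwise tables nt,t+1(i, j):

    import numpy as np

    def simulate_tables(M, T, pi0, P, seed=0):
        """Simulate M iid Markov chains of length T and return tables
        n[t][i, j] = number of individuals that move from location i at
        time t to location j at time t+1 (the CGM sufficient statistics)."""
        rng = np.random.default_rng(seed)
        L = len(pi0)
        n = np.zeros((T - 1, L, L), dtype=int)
        for _ in range(M):
            x = rng.choice(L, p=pi0)              # initial location
            for t in range(T - 1):
                x_next = rng.choice(L, p=P[x])    # one Markov transition
                n[t, x, x_next] += 1
                x = x_next
        return n

    P = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.6, 0.2],
                  [0.1, 0.3, 0.6]])
    n = simulate_tables(M=1000, T=4, pi0=np.ones(3) / 3, P=P)
    # flow conservation: arrivals at each location at time 1 (column sums
    # of n[0]) equal departures from it at time 1 (row sums of n[1])
    assert (n[0].sum(axis=0) == n[1].sum(axis=1)).all()

The node counts nt(i) mentioned above are just the margins (row or column sums) of these pairwise tables.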

11 Here, we are faced with yet another challenge: the CGM has hard constraints encoded into its distribution, and our MCMC moves must preserve these constraints yet still connect the state space. [sent-48, score-0.354]

12 To understand this, observe that the hidden variables in this example comprise a flow of M units through the trellis graph of the Markov chain, with the interpretation that nt,t+1 (i, j) birds “flow” along edge (i, j) at time t (see Figure 1(c) and [1]). [sent-49, score-0.191]

13 The constraints are that (1) flow is conserved at each trellis node, and (2) the number of birds that enter location i at time t equals the observed number nt (i). [sent-50, score-0.211]

14 How can we design a set of moves that connect any two M-unit flows while preserving these constraints? [sent-52, score-0.313]

15 The answer is to make moves that send flow around cycles. [sent-53, score-0.25]

16 Cycles of the form illustrated in Figure 1(d) preserve flow conservation but change the amount of flow through some trellis nodes. [sent-54, score-0.149]

17 One can show by graph-theoretic arguments that moves of these two general classes are enough to connect any two flows. [sent-56, score-0.313]

18 This gives us the skeleton of an ergodic MCMC sampler: starting with a feasible flow, select cycles from these two classes uniformly at random and propose moves that send δ units of flow around the cycle. [sent-57, score-0.25]
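
As an illustration of the first class of cycle moves (the within-table swaps of Figure 1(e)), a hypothetical helper for sending δ units of flow around a four-edge cycle might look as follows; this is a sketch, not the paper's implementation:

    import numpy as np

    def propose_swap(n_t, i, j, k, l, delta):
        """Send `delta` units of flow around the cycle that raises the
        counts on edges (i, k) and (j, l) and lowers them on (i, l) and
        (j, k) in a single table n_t = n_{t,t+1}.  Row and column sums
        are unchanged, so flow conservation and the observed node counts
        are preserved.  Returns None if any count would go negative."""
        new = n_t.copy()
        new[i, k] += delta
        new[j, l] += delta
        new[i, l] -= delta
        new[j, k] -= delta
        return None if (new < 0).any() else new

The second class of cycles (Figure 1(d)) spans two consecutive tables and changes the flow through a trellis node; it corresponds to the degree-one moves described later.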

19 This paper formally develops these concepts in a way that generalizes the construction of Figure 1 to allow arbitrary graphical models inside the plate, and a more general observation model that includes both noisy observations and observations involving multiple variables. [sent-63, score-0.241]

20 We develop an efficient Gibbs sampler to conduct inference in CGMs that builds on existing work for conducting exact tests in contingency tables and makes several novel technical contributions. [sent-64, score-0.639]

21 Foremost is the analysis of the distribution over the move size δ, which we show to be a discrete univariate distribution that generalizes both the binomial and hypergeometric distributions. [sent-65, score-0.181]

22 The bird migration model of [1, 2] is a special case of CGMs where the individual model is a Markov chain and observations are made for single variables only. [sent-69, score-0.425]

23 Sampling methods for exact tests in contingency tables (e.g. [sent-71, score-0.475]

24 [4]) generate tables with the same sufficient statistics as an observed table. [sent-73, score-0.255]

25 Our work differs in that our observations are not sufficient, and we are sampling the sufficient statistics instead of the complete contingency table. [sent-74, score-0.34]

26 Diaconis and Sturmfels [5] broadly introduced the concept of Markov bases, which are sets of moves that connect the state space when sampling from conditional distributions by MCMC. [sent-75, score-0.351]

27 Inference in CGMs can be viewed as a form of lifted inference [8–12]. [sent-79, score-0.15]

28 The counting arguments used to derive the CGM distribution (see below) are similar to the operations of counting elimination [9] and counting conversion [10] used in exact lifted inference algorithms for first-order probabilistic models. [sent-80, score-0.325]

29 For example, when applied to the bird migration model, the C-FOVE algorithm of Milch et al. [sent-82, score-0.229]

30 [10] cannot introduce contingency tables over pairs of variables (Xt , Xt+1 ) as required to represent the sufficient statistics; it can only introduce histograms over single variables Xt . [sent-83, score-0.499]

31 A collection A is decomposable if there is a junction tree T = (A, E(T )) on vertex set A [7]. [sent-104, score-0.244]

32 Any collection A can be extended to a decomposable collection B such that every set in A is contained in some set of B; this corresponds to adding fill-in edges to a graphical model. [sent-105, score-0.26]

33 A contingency table n = (n(i)), i ∈ X, has entries n(i) = Σ_{m=1..M} I{x^{(m)} = i} that count the number of times each element i ∈ X appears in the sample. [sent-110, score-0.301]

34 We use index variables such as i, j ∈ X (instead of x ∈ X ) to refer to cells of the contingency table, where i = (i1 , . [sent-111, score-0.243]

35 Let tbl(A) denote the set of all valid contingency tables on the domain XA . [sent-115, score-0.437]

36 For a full table n ∈ tbl(V) and A ⊆ V, let the marginal table n ↓ A ∈ tbl(A) be defined as (n ↓ A)(iA) = Σ_{m=1..M} I{xA^{(m)} = iA} = Σ_{iB ∈ XV\A} n(iA, iB). [sent-117, score-0.166]
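
Both definitions map directly onto array operations; a small numpy sketch (function names are illustrative):

    import numpy as np

    def contingency_table(sample, sizes):
        """n(i) = number of sample points equal to cell i, for a sample
        of tuples over discrete variables with the given domain sizes."""
        n = np.zeros(sizes, dtype=int)
        for x in sample:
            n[tuple(x)] += 1
        return n

    def marginalize(n, keep_axes):
        """The marginal table n ↓ A: sum over every variable not in A."""
        drop = tuple(ax for ax in range(n.ndim) if ax not in keep_axes)
        return n.sum(axis=drop)

    # three binary variables, M = 5
    sample = [(0, 1, 1), (1, 1, 0), (0, 0, 0), (0, 1, 1), (1, 0, 1)]
    n_V = contingency_table(sample, sizes=(2, 2, 2))
    n_12 = marginalize(n_V, keep_axes=(0, 1))   # table over {X1, X2}
    assert n_V.sum() == 5 and n_12.sum() == 5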

37 A sample {x^{(1)}, . . . , x^{(M)}} is drawn from the individual model, resulting in a complete, but unobserved, contingency table nV. [sent-125, score-0.315]

38 We then observe the marginal tables nD = nV ↓ D for each set D in a collection of observed margins D, which we require to be decomposable. [sent-126, score-0.331]

39 Write this overall collection of tables as nD = {nD }D∈D . [sent-127, score-0.267]

40 In a discrete graphical model, the sufficient statistics are the contingency tables nC = {nC }C∈C over cliques. [sent-131, score-0.532]

41 Let µC be the table of marginal probabilities for clique C (i. [sent-134, score-0.195]

42 Let S be the collection of separators of TC (with repetition if the same set appears as a separator multiple times) and let nS and µS be the tables of counts and marginal probabilities for the separator S ∈ S. [sent-137, score-0.56]

43 It is this distribution that we call the collective graphical model; the parameters are the marginal probabilities of the individual model. [sent-142, score-0.249]
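
For a decomposable individual model, the counting argument behind this distribution (of the kind in the "distribution of marginals" result for decomposable contingency-table models cited in sentence 100) leads to a closed form. The display below is a hedged reconstruction of its general shape, written in the notation above; it may differ in detail from the paper's own equation:

    p(n_\mathcal{C}) \;=\; M!\;
      \prod_{C \in \mathcal{C}} \prod_{i_C} \frac{\mu_C(i_C)^{\,n_C(i_C)}}{n_C(i_C)!}\;
      \prod_{S \in \mathcal{S}} \prod_{i_S} \frac{n_S(i_S)!}{\mu_S(i_S)^{\,n_S(i_S)}},

supported on configurations nC that arise as the sufficient statistics of some size-M sample: the factorial terms count the individual-level samples consistent with the tables, and the µ terms give the probability of each such sample under the individual model.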

44 To understand the conditional distribution given the observations, let us further assume that every set in D is contained in some clique in C (if not, add additional fill-in edges for variables that co-occur within D), so that each observed table is determined by some clique table. [sent-143, score-0.162]

45 The condition that the tables nC produce the observations nD means, formally, that every D ∈ D is contained in some C ∈ C, and that D ⊆ C implies nD = nC ↓ D. [sent-144, score-0.285]

46 In general, the number of contingency tables over small sets of variables leads to huge state spaces that prohibit exact inference schemes using (2) and (3). [sent-147, score-0.554]

47 First, the clique tables must match the observations (i. [sent-150, score-0.365]

48 Second, implicit in (2) is the constraint that the tables nC must be consistent in the sense that they are the sufficient statistics of some sample, otherwise p(nC ) = 0. [sent-153, score-0.287]

49 Refer to the set of contingency tables nA = {nA }A∈A as a configuration. [sent-155, score-0.437]

50 Consistency requires, for example, that any two tables must agree on their common marginal, which yields the flow conservation constraints in the bird migration model. [sent-157, score-0.493]

51 3 Inference Our goal is to develop a sampler for p(nC | nD ) given the observed tables nD . [sent-160, score-0.341]

52 Doing so without instantiating huge intermediate tables requires a careful sequence of operations on the two junction trees TC and TD . [sent-164, score-0.316]

53 Let A be a decomposable collection with junction tree TA . [sent-167, score-0.244]

54 In the bird migration example, Theorem 1 guarantees that preserving flow conservation is enough to maintain consistency. [sent-172, score-0.268]

55 , [15]) which asserts that marginal probability tables {µA }A∈A that are locally consistent are realizable as the marginals of some joint distribution p(x). [sent-175, score-0.321]

56 However, the integrality requirements of contingency tables necessitate a different style of construction. [sent-177, score-0.437]

57 1 Markov Basis The first key challenge in designing the MCMC sampler is constructing a set of moves that preserve the constraints mentioned above, yet still connect any two points in the support of the distribution. [sent-179, score-0.47]

58 Such a set of moves is called a Markov basis [5]. [sent-180, score-0.329]

59 A set of moves M is a Markov basis for the set F if, for any two configurations n, n′ ∈ F, there is a sequence of moves z1, . . . , zL in M such that n′ = n + Σ_{l=1..L} zl and every partial sum remains in F. [sent-182, score-0.579]

60 Dobra [6] showed how to construct a Markov basis for moves in a complete contingency table given a decomposable set of margins. [sent-195, score-0.703]

61 Let Md=2 (A, S, B) be the set of all degree-two moves generated from this partition. [sent-201, score-0.25]

62 These are extensions of the well-known “swap moves” for two-dimensional contingency tables (e. [sent-202, score-0.437]

63 In this arrangement, it is clear that any such move preserves the marginal table nA (row sums) and the marginal table nB (column sums); in other words, z ↓ A = 0 and z ↓ B = 0. [sent-205, score-0.305]
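
A quick numerical check of this property on a toy two-dimensional table (purely illustrative):

    import numpy as np

    # a degree-two "swap" move on a 4 x 3 table: +1 at (0, 1) and (2, 2),
    # -1 at (0, 2) and (2, 1)
    z = np.zeros((4, 3), dtype=int)
    z[0, 1], z[2, 2] = 1, 1
    z[0, 2], z[2, 1] = -1, -1

    assert (z.sum(axis=1) == 0).all()   # z ↓ A = 0: row sums unchanged
    assert (z.sum(axis=0) == 0).all()   # z ↓ B = 0: column sums unchanged

    n = np.array([[3, 1, 2],
                  [0, 5, 1],
                  [2, 2, 2],
                  [4, 0, 1]])
    for delta in range(-1, 3):          # the feasible range for this n
        moved = n + delta * z
        if (moved >= 0).all():
            assert (moved.sum(axis=1) == n.sum(axis=1)).all()
            assert (moved.sum(axis=0) == n.sum(axis=0)).all()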

64 The cycle in Figure 1(e) is a degree-two move on the table n1,2, with A = {X1}, S = ∅, B = {X2}. [sent-207, score-0.159]

65 Let M∗_A be the union of the sets of degree-two moves Md=2(A, S, B), where S is a separator of TA and (A, S, B) is the corresponding decomposition of V. [sent-210, score-0.347]

66 Thus, the image of the Dobra basis under A is a Markov basis for FnA . [sent-217, score-0.158]

67 The practical message so far is that to sample from p(nC | nD ), it suffices to generate moves from the projected Dobra basis MD . [sent-234, score-0.329]

68 This is done by first selecting a degree-two move z ∈ M∗ , and then marginalizing z onto each clique of C. [sent-235, score-0.203]

69 However, we will show that z ↓ C will be zero for many cliques, a fact we can exploit to implement moves more efficiently. [sent-237, score-0.25]

70 Let us now consider settings where some variables are not part of any observed table, which may happen when the individual model has hidden variables, or, later, with noisy observations. [sent-244, score-0.256]

Algorithm 1 (the projected Dobra basis MA). Input: junction tree TA with separators SA. Before sampling: for each S ∈ SA, find the associated decomposition (A, S, B); find the cliques C ∈ C that have non-empty intersection with both A and B (call this set CS); let AS = A ∩ VS and BS = B ∩ VS. During sampling, to generate a move for separator S ∈ SA: select z ∈ Md=2(AS, S, BS), and for each clique C ∈ CS calculate z ↓ C.

71 Additional moves are needed to connect two configurations that disagree on marginal tables involving these variables. [sent-245, score-0.684]

72 These are the degree-one moves z ∈ Md=1(A, B), which partition the variables into two sets (A, B) and have two nonzero entries z(i, j) = 1, z(i′, j) = −1 for i ≠ i′ ∈ XA, j ∈ XB. [sent-250, score-0.491]

73 In the parlance of two-dimensional tables, these moves adjust two entries in a single column, so they preserve the column sums (nB) but modify the row sums (nA). [sent-251, score-0.491]

74 The cycle in Figure 1(d) is a degree-one move which adjusts the marginal table over A = {X2 }, but preserves the marginal table over B = {X1 , X3 }. [sent-252, score-0.338]

75 We proceed once again by constructing a basis for complete tables and then marginalizing the moves onto cliques. [sent-253, score-0.602]

76 Let U be any decomposable collection on the set U of unobserved variables (those not covered by any set in D), and let D′ = D ∪ U. [sent-255, score-0.212]

77 Let M∗_D′ consist of the moves M∗_D together with the moves Md=1(A, V \ A) for each A ∈ U. [sent-256, score-0.5]

78 Then M∗_D′ is a Markov basis for FnD, and M = {Az : z ∈ M∗_D′} is a Markov basis for FnD. [sent-257, score-0.158]

79 The degree-one moves also become local upon marginalization: it is easy to check that z ↓ C is zero unless C ∩ A is nonempty. [sent-259, score-0.25]

80 This has the effect of adding degree-one moves for each clique of C. [sent-262, score-0.33]

81 By matching the structure of TC , many of the additional degree-two moves become zero upon marginalization. [sent-263, score-0.25]

82 2 Constructing an efficient MCMC sampler The second key challenge in constructing the MCMC sampler is utilizing the moves from the Markov basis in a way that efficiently explores the state space. [sent-265, score-0.561]

83 A standard approach is to select a random move z, a direction δ = ±1 (each with probability 1/2), and then propose the move nC + δz in a Metropolis Hastings sampler. [sent-266, score-0.15]

84 Although these moves are enough to connect any two configurations, we are particularly interested in problems where M is large, for which moving by increments of ±1 will be prohibitively slow. [sent-267, score-0.313]

85 For general Markov bases, Diaconis and Sturmfels [5] suggest instead to construct a Gibbs sampler that uses the moves as directions for longer steps, by choosing the value of δ from the following distribution: p(δ) ∝ p(nC + δz | nD ), δ ∈ {δ : nC + δz ≥ 0}. [sent-268, score-0.366]
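
A sketch of one such long-step update when the feasible range of δ is small enough to enumerate (the log-probability callback stands in for the actual conditional p(· | nD); explicit enumeration is an assumption made here for clarity, not the paper's closed-form evaluation):

    import numpy as np

    def gibbs_delta_step(n, z, log_p, rng):
        """Sample delta from p(delta) ∝ p(n + delta*z) over the feasible
        set {delta : n + delta*z >= 0}, then apply the chosen move.
        n and z are integer arrays; a basis move z has entries of both
        signs, so the feasible range computed below is finite."""
        pos, neg = z > 0, z < 0
        lo = -int((n[pos] // z[pos]).min())       # most negative feasible delta
        hi = int((n[neg] // (-z[neg])).min())     # most positive feasible delta
        deltas = np.arange(lo, hi + 1)
        log_w = np.array([log_p(n + d * z) for d in deltas])
        w = np.exp(log_w - log_w.max())           # stabilize before normalizing
        d = rng.choice(deltas, p=w / w.sum())
        return n + d * z

As noted in sentence 88, the actual sampler does not need to recompute the full conditional for every candidate δ: each move touches only a few entries per clique table.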

86 Consider the Markov chain with moves δz generated by first choosing z uniformly at random from M and then choosing δ according to (5). [sent-271, score-0.303]

87 For a separator S ∈ S, define zS as zC ↓ S for any clique C containing S. [sent-277, score-0.177]

88 Notably, each move in our basis satisfies |I+(zA) ∪ I−(zA)| ≤ 4, so p(δ) in Equations (7) and (8) can be evaluated by examining at most four entries in each table for cliques in C(z). [sent-301, score-0.325]

89 It is worth noting that Equation (7) reduces to the binomial distribution for degree-one moves and the (noncentral) hypergeometric distribution for degree-two moves, so we may sample from these distributions directly when |C(z)| = 1. [sent-302, score-0.325]
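
To see where the binomial arises in the degree-one case, consider the simplest setting of a single multinomial table n of counts of M iid draws with cell probabilities µ, and a move that transfers δ counts from cell i′ into cell i. Writing N = n(i) + n(i′) (a symbol introduced only for this sketch), only the two affected cells contribute to the dependence on δ:

    p(\delta) \;\propto\;
      \frac{\mu_i^{\,n(i)+\delta}\,\mu_{i'}^{\,n(i')-\delta}}{(n(i)+\delta)!\,(n(i')-\delta)!}
    \;\propto\;
      \binom{N}{n(i)+\delta}
      \left(\frac{\mu_i}{\mu_i+\mu_{i'}}\right)^{n(i)+\delta}
      \left(\frac{\mu_{i'}}{\mu_i+\mu_{i'}}\right)^{N-(n(i)+\delta)},

so n(i) + δ follows a binomial distribution truncated to the feasible range; the analogous four-cell calculation for a degree-two swap with fixed row and column sums yields the noncentral hypergeometric form.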

90 The proof of Theorem 4, which is found in the supplementary material, then pairs each separator S with a clique C and uses properties of the moves to show that pC (δ)/pS (δ) is also log-concave. [sent-309, score-0.427]

91 3 Noisy Observations Population-level counts from real survey data are rarely exact, and it is thus important to incorporate noisy observations into our model. [sent-317, score-0.151]

92 In this section, we describe how to modify the sampler for the case when all observations are noisy; it is a straightforward generalization to allow both noisy and exact observations. [sent-318, score-0.298]

93 Suppose that we make noisy observations yR = {yR : R ∈ R} corresponding to the true marginal tables nR, for a collection R each of whose sets is contained in some clique in C (R need not be decomposable). [sent-319, score-0.447]

94 A canonical example from the bird migration model is p(y | n) = Poisson(αn), so the survey count is Poisson with mean proportional to the true number of birds present. [sent-323, score-0.32]
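
With such a noise model, the only change to the δ-update sketched earlier is an extra likelihood factor for the noisy counts; a hedged Python fragment (the rate parameter alpha and the function name are illustrative):

    import numpy as np
    from scipy.stats import poisson

    def log_lik_noisy(y_R, n_R, alpha):
        """log p(y_R | n_R) for independent Poisson survey counts whose
        means are alpha times the true counts, summed over observed cells."""
        return poisson.logpmf(y_R, alpha * np.asarray(n_R, dtype=float)).sum()

    # inside the delta-update, the unnormalized log-weight of a candidate
    # delta gains the term  log_lik_noisy(y_R, n_R + delta * z_R, alpha),
    # where z_R is the move marginalized onto the observed sets R.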

95 4 Experiments We implemented our sampler in MATLAB using Murphy’s Bayes net toolbox [18] for the underlying operations on graphical models and junction trees. [sent-334, score-0.272]

96 The task was to estimate E[n2,3 | n1 , n3 ] in the bird migration model for L = 2, T = 3, and varying M . [sent-337, score-0.229]

97 We generated 30 random Bayes nets on 10 binary variables, and generated two sets of observed tables for a population of M = 100, 000: the set NODES has a table for each single variable, while the set CHAIN has tables for pairs of variables that are adjacent in a random ordering. [sent-340, score-0.604]

98 The sampler converged quickly in all cases with the more complex CHAIN observation model taking longer than NODES, and noisy observations taking slightly longer than exact ones. [sent-348, score-0.27]

99 Collective inference on Markov models for modeling bird migration. [sent-365, score-0.157]

100 Some results about decomposable (or Markov-type) models for multidimensional contingency tables: distribution of marginals and partitioning of tests. [sent-436, score-0.323]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('nc', 0.517), ('moves', 0.25), ('tables', 0.225), ('contingency', 0.212), ('dobra', 0.206), ('fna', 0.206), ('cgm', 0.171), ('nv', 0.147), ('yr', 0.125), ('na', 0.122), ('cgms', 0.12), ('migration', 0.12), ('sampler', 0.116), ('decomposable', 0.111), ('bird', 0.109), ('za', 0.105), ('tbl', 0.103), ('lifted', 0.102), ('separator', 0.097), ('ic', 0.091), ('junction', 0.091), ('birds', 0.091), ('fnd', 0.09), ('tc', 0.09), ('markov', 0.088), ('cliques', 0.082), ('xa', 0.08), ('clique', 0.08), ('basis', 0.079), ('ow', 0.079), ('move', 0.075), ('population', 0.072), ('diaconis', 0.069), ('trellis', 0.069), ('collective', 0.068), ('nb', 0.068), ('guration', 0.066), ('graphical', 0.065), ('marginal', 0.064), ('connect', 0.063), ('ia', 0.062), ('ta', 0.062), ('observations', 0.06), ('xc', 0.059), ('noisy', 0.056), ('aggregate', 0.055), ('az', 0.055), ('chain', 0.053), ('zc', 0.052), ('mcmc', 0.052), ('individual', 0.052), ('anv', 0.051), ('sturmfels', 0.051), ('nt', 0.051), ('table', 0.051), ('md', 0.05), ('marginalizing', 0.048), ('inference', 0.048), ('sheldon', 0.045), ('individuals', 0.043), ('collection', 0.042), ('hypergeometric', 0.042), ('plate', 0.042), ('preserve', 0.041), ('elimination', 0.041), ('conservation', 0.039), ('sampling', 0.038), ('entries', 0.038), ('exact', 0.038), ('issn', 0.037), ('poisson', 0.036), ('ns', 0.035), ('xv', 0.035), ('counts', 0.035), ('gibbs', 0.035), ('apsel', 0.034), ('ows', 0.034), ('nr', 0.034), ('cycle', 0.033), ('binomial', 0.033), ('oregon', 0.033), ('consistent', 0.032), ('counting', 0.032), ('nd', 0.031), ('con', 0.031), ('variables', 0.031), ('rejection', 0.031), ('univariate', 0.031), ('ib', 0.03), ('statistics', 0.03), ('sa', 0.03), ('unobserved', 0.028), ('modify', 0.028), ('suf', 0.028), ('pk', 0.028), ('zl', 0.028), ('zr', 0.028), ('devroye', 0.028), ('milch', 0.028), ('sums', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 55 nips-2011-Collective Graphical Models

Author: Daniel R. Sheldon, Thomas G. Dietterich

Abstract: There are many settings in which we wish to fit a model of the behavior of individuals but where our data consist only of aggregate information (counts or low-dimensional contingency tables). This paper introduces Collective Graphical Models—a framework for modeling and probabilistic inference that operates directly on the sufficient statistics of the individual model. We derive a highly efficient Gibbs sampling algorithm for sampling from the posterior distribution of the sufficient statistics conditioned on noisy aggregate observations, prove its correctness, and demonstrate its effectiveness experimentally.

2 0.08683449 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation

Author: Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, Bart Selman

Abstract: We propose a novel Adaptive Markov Chain Monte Carlo algorithm to compute the partition function. In particular, we show how to accelerate a flat histogram sampling technique by significantly reducing the number of “null moves” in the chain, while maintaining asymptotic convergence properties. Our experiments show that our method converges quickly to highly accurate solutions on a range of benchmark instances, outperforming other state-of-the-art methods such as IJGP, TRW, and Gibbs sampling both in run-time and accuracy. We also show how obtaining a so-called density of states distribution allows for efficient weight learning in Markov Logic theories. 1

3 0.073730856 229 nips-2011-Query-Aware MCMC

Author: Michael L. Wick, Andrew McCallum

Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1

4 0.067832723 104 nips-2011-Generalized Beta Mixtures of Gaussians

Author: Artin Armagan, Merlise Clyde, David B. Dunson

Abstract: In recent years, a rich variety of shrinkage priors have been proposed that have great promise in addressing massive regression problems. In general, these new priors can be expressed as scale mixtures of normals, but have more complex forms and better properties than traditional Cauchy and double exponential priors. We first propose a new class of normal scale mixtures through a novel generalized beta distribution that encompasses many interesting priors as special cases. This encompassing framework should prove useful in comparing competing priors, considering properties and revealing close connections. We then develop a class of variational Bayes approximations through the new hierarchy presented that will scale more efficiently to the types of truly massive data sets that are now encountered routinely. 1

5 0.066434942 274 nips-2011-Structure Learning for Optimization

Author: Shulin Yang, Ali Rahimi

Abstract: We describe a family of global optimization procedures that automatically decompose optimization problems into smaller loosely coupled problems. The solutions of these are subsequently combined with message passing algorithms. We show empirically that these methods produce better solutions with fewer function evaluations than existing global optimization methods. To develop these methods, we introduce a notion of coupling between variables of optimization. This notion of coupling generalizes the notion of independence between random variables in statistics, sparseness of the Hessian in nonlinear optimization, and the generalized distributive law. Despite its generality, this notion of coupling is easier to verify empirically, making structure estimation easy, while allowing us to migrate well-established inference methods on graphical models to the setting of global optimization. 1

6 0.060433507 275 nips-2011-Structured Learning for Cell Tracking

7 0.058520224 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure

8 0.057721093 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables

9 0.057486244 173 nips-2011-Modelling Genetic Variations using Fragmentation-Coagulation Processes

10 0.057067372 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference

11 0.055912543 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models

12 0.055640437 131 nips-2011-Inference in continuous-time change-point models

13 0.054848582 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

14 0.052936088 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions

15 0.052898124 221 nips-2011-Priors over Recurrent Continuous Time Processes

16 0.052831046 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC

17 0.052730698 249 nips-2011-Sequence learning with hidden units in spiking neural networks

18 0.052676313 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods

19 0.049927637 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries

20 0.048203547 40 nips-2011-Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.153), (1, 0.017), (2, 0.021), (3, -0.019), (4, -0.037), (5, -0.083), (6, -0.048), (7, -0.074), (8, 0.043), (9, -0.012), (10, 0.026), (11, -0.065), (12, 0.013), (13, -0.021), (14, -0.019), (15, -0.006), (16, 0.041), (17, -0.098), (18, -0.043), (19, 0.041), (20, 0.023), (21, 0.037), (22, -0.003), (23, -0.054), (24, -0.053), (25, 0.058), (26, -0.051), (27, 0.046), (28, 0.028), (29, 0.068), (30, 0.015), (31, -0.043), (32, -0.028), (33, 0.022), (34, -0.081), (35, -0.061), (36, -0.018), (37, -0.014), (38, -0.045), (39, -0.017), (40, -0.025), (41, -0.03), (42, -0.129), (43, 0.021), (44, -0.013), (45, -0.017), (46, -0.017), (47, -0.056), (48, 0.004), (49, -0.067)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94465035 55 nips-2011-Collective Graphical Models

Author: Daniel R. Sheldon, Thomas G. Dietterich

Abstract: There are many settings in which we wish to fit a model of the behavior of individuals but where our data consist only of aggregate information (counts or low-dimensional contingency tables). This paper introduces Collective Graphical Models—a framework for modeling and probabilistic inference that operates directly on the sufficient statistics of the individual model. We derive a highlyefficient Gibbs sampling algorithm for sampling from the posterior distribution of the sufficient statistics conditioned on noisy aggregate observations, prove its correctness, and demonstrate its effectiveness experimentally. 1

2 0.76550704 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation

Author: Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, Bart Selman

Abstract: We propose a novel Adaptive Markov Chain Monte Carlo algorithm to compute the partition function. In particular, we show how to accelerate a flat histogram sampling technique by significantly reducing the number of “null moves” in the chain, while maintaining asymptotic convergence properties. Our experiments show that our method converges quickly to highly accurate solutions on a range of benchmark instances, outperforming other state-of-the-art methods such as IJGP, TRW, and Gibbs sampling both in run-time and accuracy. We also show how obtaining a so-called density of states distribution allows for efficient weight learning in Markov Logic theories. 1

3 0.69255042 221 nips-2011-Priors over Recurrent Continuous Time Processes

Author: Ardavan Saeedi, Alexandre Bouchard-côté

Abstract: We introduce the Gamma-Exponential Process (GEP), a prior over a large family of continuous time stochastic processes. A hierarchical version of this prior (HGEP; the Hierarchical GEP) yields a useful model for analyzing complex time series. Models based on HGEPs display many attractive properties: conjugacy, exchangeability and closed-form predictive distribution for the waiting times, and exact Gibbs updates for the time scale parameters. After establishing these properties, we show how posterior inference can be carried efficiently using Particle MCMC methods [1]. This yields a MCMC algorithm that can resample entire sequences atomically while avoiding the complications of introducing slice and stick auxiliary variables of the beam sampler [2]. We applied our model to the problem of estimating the disease progression in multiple sclerosis [3], and to RNA evolutionary modeling [4]. In both domains, we found that our model outperformed the standard rate matrix estimation approach. 1

4 0.68749791 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference

Author: David Wingate, Noah Goodman, Andreas Stuhlmueller, Jeffrey M. Siskind

Abstract: Probabilistic programming languages allow modelers to specify a stochastic process using syntax that resembles modern programming languages. Because the program is in machine-readable format, a variety of techniques from compiler design and program analysis can be used to examine the structure of the distribution represented by the probabilistic program. We show how nonstandard interpretations of probabilistic programs can be used to craft efficient inference algorithms: information about the structure of a distribution (such as gradients or dependencies) is generated as a monad-like side computation while executing the program. These interpretations can be easily coded using special-purpose objects and operator overloading. We implement two examples of nonstandard interpretations in two different languages, and use them as building blocks to construct inference algorithms: automatic differentiation, which enables gradient based methods, and provenance tracking, which enables efficient construction of global proposals. 1

5 0.64948857 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC

Author: Trung T. Pham, Tat-jun Chin, Jin Yu, David Suter

Abstract: Multi-structure model fitting has traditionally taken a two-stage approach: First, sample a (large) number of model hypotheses, then select the subset of hypotheses that optimise a joint fitting and model selection criterion. This disjoint two-stage approach is arguably suboptimal and inefficient — if the random sampling did not retrieve a good set of hypotheses, the optimised outcome will not represent a good fit. To overcome this weakness we propose a new multi-structure fitting approach based on Reversible Jump MCMC. Instrumental in raising the effectiveness of our method is an adaptive hypothesis generator, whose proposal distribution is learned incrementally and online. We prove that this adaptive proposal satisfies the diminishing adaptation property crucial for ensuring ergodicity in MCMC. Our method effectively conducts hypothesis sampling and optimisation simultaneously, and yields superior computational efficiency over previous two-stage methods. 1

6 0.64776719 201 nips-2011-On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference

7 0.61120141 131 nips-2011-Inference in continuous-time change-point models

8 0.60045809 197 nips-2011-On Tracking The Partition Function

9 0.59673876 8 nips-2011-A Model for Temporal Dependencies in Event Streams

10 0.58904022 229 nips-2011-Query-Aware MCMC

11 0.57572883 173 nips-2011-Modelling Genetic Variations using Fragmentation-Coagulation Processes

12 0.54893756 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning

13 0.5348255 228 nips-2011-Quasi-Newton Methods for Markov Chain Monte Carlo

14 0.51365656 69 nips-2011-Differentially Private M-Estimators

15 0.50088388 14 nips-2011-A concave regularization technique for sparse mixture models

16 0.4953618 104 nips-2011-Generalized Beta Mixtures of Gaussians

17 0.49028441 136 nips-2011-Inverting Grice's Maxims to Learn Rules from Natural Language Extractions

18 0.48914692 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions

19 0.48124114 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing

20 0.48066926 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.024), (4, 0.049), (16, 0.27), (20, 0.069), (26, 0.026), (31, 0.08), (33, 0.056), (43, 0.06), (45, 0.058), (57, 0.06), (65, 0.013), (74, 0.045), (83, 0.046), (84, 0.014), (99, 0.041)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.74108446 55 nips-2011-Collective Graphical Models

Author: Daniel R. Sheldon, Thomas G. Dietterich

Abstract: There are many settings in which we wish to fit a model of the behavior of individuals but where our data consist only of aggregate information (counts or low-dimensional contingency tables). This paper introduces Collective Graphical Models—a framework for modeling and probabilistic inference that operates directly on the sufficient statistics of the individual model. We derive a highlyefficient Gibbs sampling algorithm for sampling from the posterior distribution of the sufficient statistics conditioned on noisy aggregate observations, prove its correctness, and demonstrate its effectiveness experimentally. 1

2 0.60965562 109 nips-2011-Greedy Model Averaging

Author: Dong Dai, Tong Zhang

Abstract: This paper considers the problem of combining multiple models to achieve a prediction accuracy not much worse than that of the best single model for least squares regression. It is known that if the models are mis-specified, model averaging is superior to model selection. Specifically, let n be the sample size, then the worst case regret of the former decays at the rate of O(1/n) while the worst √ case regret of the latter decays at the rate of O(1/ n). In the literature, the most important and widely studied model averaging method that achieves the optimal O(1/n) average regret is the exponential weighted model averaging (EWMA) algorithm. However this method suffers from several limitations. The purpose of this paper is to present a new greedy model averaging procedure that improves EWMA. We prove strong theoretical guarantees for the new procedure and illustrate our theoretical results with empirical examples. 1

3 0.52427113 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

Author: Soumya Ghosh, Andrei B. Ungureanu, Erik B. Sudderth, David M. Blei

Abstract: The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data [1]. The ddCRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This paper examines the ddCRP in a spatial setting with the goal of natural image segmentation. We explore the biases of the spatial ddCRP model and propose a novel hierarchical extension better suited for producing “human-like” segmentations. We then study the sensitivity of the models to various distance and appearance hyperparameters, and provide the first rigorous comparison of nonparametric Bayesian models in the image segmentation domain. On unsupervised image segmentation, we demonstrate that similar performance to existing nonparametric Bayesian models is possible with substantially simpler models and algorithms.

4 0.52125877 227 nips-2011-Pylon Model for Semantic Segmentation

Author: Victor Lempitsky, Andrea Vedaldi, Andrew Zisserman

Abstract: Graph cut optimization is one of the standard workhorses of image segmentation since for binary random field representations of the image, it gives globally optimal results and there are efficient polynomial time implementations. Often, the random field is applied over a flat partitioning of the image into non-intersecting elements, such as pixels or super-pixels. In the paper we show that if, instead of a flat partitioning, the image is represented by a hierarchical segmentation tree, then the resulting energy combining unary and boundary terms can still be optimized using graph cut (with all the corresponding benefits of global optimality and efficiency). As a result of such inference, the image gets partitioned into a set of segments that may come from different layers of the tree. We apply this formulation, which we call the pylon model, to the task of semantic segmentation where the goal is to separate an image into areas belonging to different semantic classes. The experiments highlight the advantage of inference on a segmentation tree (over a flat partitioning) and demonstrate that the optimization in the pylon model is able to flexibly choose the level of segmentation across the image. Overall, the proposed system has superior segmentation accuracy on several datasets (Graz-02, Stanford background) compared to previously suggested approaches. 1

5 0.51333928 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling

Author: Adrian Ion, Joao Carreira, Cristian Sminchisescu

Abstract: We present a joint image segmentation and labeling model (JSL) which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales, constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag [1], followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure. The partition function over tilings and labelings is increasingly more accurately approximated by including incorrect configurations that a not-yet-competent model rates probable during learning. We show that the proposed methodology matches the current state of the art in the Stanford dataset [2], as well as in VOC2010, where 41.7% accuracy on the test set is achieved.

6 0.50771916 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data

7 0.50610846 180 nips-2011-Multiple Instance Filtering

8 0.5042578 66 nips-2011-Crowdclustering

9 0.50297624 303 nips-2011-Video Annotation and Tracking with Active Learning

10 0.50077891 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

11 0.4985956 219 nips-2011-Predicting response time and error rates in visual search

12 0.49839506 127 nips-2011-Image Parsing with Stochastic Scene Grammar

13 0.49819914 99 nips-2011-From Stochastic Nonlinear Integrate-and-Fire to Generalized Linear Models

14 0.49811864 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

15 0.49678499 75 nips-2011-Dynamical segmentation of single trials from population neural data

16 0.49609566 102 nips-2011-Generalised Coupled Tensor Factorisation

17 0.49503189 103 nips-2011-Generalization Bounds and Consistency for Latent Structural Probit and Ramp Loss

18 0.4948341 156 nips-2011-Learning to Learn with Compound HD Models

19 0.49417818 188 nips-2011-Non-conjugate Variational Message Passing for Multinomial and Binary Regression

20 0.49358714 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation