nips nips2012 nips2012-251 knowledge-graph by maker-knowledge-mining

251 nips-2012-On Lifting the Gibbs Sampling Algorithm


Source: pdf

Author: Deepak Venugopal, Vibhav Gogate

Abstract: First-order probabilistic models combine the power of first-order logic, the de facto tool for handling relational structure, with probabilistic graphical models, the de facto tool for handling uncertainty. Lifted probabilistic inference algorithms for them have been the subject of much recent research. The main idea in these algorithms is to improve the accuracy and scalability of existing graphical models’ inference algorithms by exploiting symmetry in the first-order representation. In this paper, we consider blocked Gibbs sampling, an advanced MCMC scheme, and lift it to the first-order level. We propose to achieve this by partitioning the first-order atoms in the model into a set of disjoint clusters such that exact lifted inference is polynomial in each cluster given an assignment to all other atoms not in the cluster. We propose an approach for constructing the clusters and show how it can be used to trade accuracy with computational complexity in a principled manner. Our experimental evaluation shows that lifted Gibbs sampling is superior to the propositional algorithm in terms of accuracy, scalability and convergence.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We propose to achieve this by partitioning the first-order atoms in the model into a set of disjoint clusters such that exact lifted inference is polynomial in each cluster given an assignment to all other atoms not in the cluster. [sent-8, score-1.447]

2 Our experimental evaluation shows that lifted Gibbs sampling is superior to the propositional algorithm in terms of accuracy, scalability and convergence. [sent-10, score-0.628]

3 An alternative approach, which has gained prominence since the work of Poole [25], is lifted or first-order inference. [sent-22, score-0.395]

4 The algorithms are called lifted algorithms because they identify symmetry by consulting the first-order representation without grounding the model. [sent-24, score-0.488]

5 Notable approximate inference algorithms are lifted Belief propagation [30] and lifted importance sampling [8, 9], which lift belief propagation [20] and importance sampling respectively. [sent-27, score-1.022]

6 Blocked Gibbs sampling improves upon the Gibbs sampling algorithm by grouping variables (each group is called a block) and then jointly sampling all variables in the block [10, 16]. [sent-29, score-0.34]

7 We circumvent these issues by lifting the blocked Gibbs sampling algorithm, which as we show is more amenable to lifting. [sent-38, score-0.348]

8 Our main idea in applying the blocking approach to SRL models is to partition the set of first-order atoms in the model into disjoint clusters such that PTP (an exact lifted inference scheme) is feasible in each cluster given an assignment to all other atoms not in the cluster. [sent-39, score-1.497]

9 Given such a set of clusters, we show that Gibbs sampling is essentially a message passing algorithm over the cluster graph formed by connecting clusters that have atoms that are in the Markov blanket of each other. [sent-40, score-0.94]

10 Each message from a sender to a receiving cluster is a truth assignment to all ground atoms that are in the Markov blanket of the receiving cluster. [sent-41, score-0.926]

11 We show how to store this message compactly by taking advantage of the first-order representation yielding a lifted MCMC algorithm. [sent-42, score-0.527]

12 We present experimental results comparing the performance of lifted blocked Gibbs sampling with (propositional) blocked Gibbs sampling, MC-SAT [26, 27] and Lifted BP [30] on various benchmark SRL models. [sent-43, score-0.906]

13 Our experiments show that lifted Gibbs sampling is superior to blocked Gibbs sampling and MC-SAT in terms of convergence, accuracy and scalability. [sent-44, score-0.775]

14 It is also more accurate than lifted BP on some instances. [sent-45, score-0.395]

15 2 Notation and Preliminaries. In this section, we describe notation and preliminaries on propositional logic, first-order logic, Markov logic networks and Gibbs sampling. [sent-46, score-0.285]

16 The language of propositional logic consists of atomic sentences called propositions or atoms, and logical connectives such as ∧ (conjunction), ∨ (disjunction), ¬ (negation), ⇒ (implication) and ⇔ (equivalence). [sent-48, score-0.39]

17 A propositional formula f is an atom, or any complex formula that can be constructed from atoms using logical connectives. [sent-50, score-0.731]

18 For example, A, B and C are propositional atoms and f = A ∨ ¬B ∧ C is a propositional formula. [sent-51, score-0.639]

19 A world is a truth assignment to all atoms in the KB. [sent-53, score-0.469]

20 First-order logic (FOL) generalizes propositional logic by allowing atoms to have internal structure; an atom in FOL is a predicate that represents relations between objects. [sent-54, score-1.009]

21 A formula in first-order logic is a predicate (atom), or any complex sentence that can be constructed from atoms using logical connectives and quantifiers. [sent-65, score-0.709]

22 We assume that each formula f is of the form ∀x f′, where x is the set of variables in f′ and f′ is a conjunction or disjunction of literals; each literal is an atom or its negation. [sent-71, score-0.327]

23 A ground KB is a KB containing all possible groundings of all of its formulas. [sent-81, score-0.296]

24 A world in FOL is a truth assignment to all atoms in its grounding. [sent-83, score-0.469]

25 A soft formula or a weighted formula is a pair (f, w) where f is a formula in FOL and w is a real number. [sent-85, score-0.262]

26 A Markov logic network (MLN), denoted by M, is a set of weighted formulas (fi , wi ). [sent-86, score-0.271]

27 PM(ω) = (1/Z(M)) exp(Σi wi N(fi, ω))  (1), where ω is a world, N(fi, ω) is the number of groundings of fi that evaluate to True in the world ω and Z(M) is a normalization constant or the partition function. [sent-89, score-0.259]
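
As a concrete illustration of Eq. (1), the following is a minimal, brute-force Python sketch that evaluates PM(ω) for a toy MLN with a single weighted formula over a two-constant domain. The predicate names (Smokes, Asthma), the weight and the domain are illustrative choices, not taken from the paper's benchmarks, and exhaustive enumeration of worlds is only feasible because the example is tiny.

    import itertools
    import math

    # Toy MLN with one weighted formula, Smokes(x) => Asthma(x) with weight 1.5,
    # over the domain {A, B}. Names, weight and domain are made up for illustration.
    domain = ["A", "B"]
    w = 1.5

    atoms = [(p, c) for p in ("Smokes", "Asthma") for c in domain]

    def n_true_groundings(world):
        # N(f, world): number of groundings of Smokes(x) => Asthma(x) that are True.
        return sum((not world[("Smokes", c)]) or world[("Asthma", c)] for c in domain)

    def unnormalized(world):
        # exp(sum_i w_i * N(f_i, world)); Eq. (1) divides this by Z(M).
        return math.exp(w * n_true_groundings(world))

    Z = sum(unnormalized(dict(zip(atoms, vals)))
            for vals in itertools.product([False, True], repeat=len(atoms)))

    some_world = {a: True for a in atoms}
    print("P_M(world) =", unnormalized(some_world) / Z)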

28 A normal MLN [11] is an MLN that satisfies the following two properties: (1) There are no constants in any formula, and (2) If two distinct atoms with the same predicate symbol have variables x and y in the same position then ∆x = ∆y . [sent-94, score-0.473]

29 Note that in a normal MLN, we assume that the terms in each atom are ordered and therefore we can identify each term by its position in the order. [sent-95, score-0.261]

30 2.1 Gibbs Sampling and Blocking. Given an MLN, a set of query atoms and evidence, we can adapt the basic (propositional) Gibbs sampling algorithm [6] for computing the marginal probabilities of the query atoms given the evidence as follows. [sent-97, score-0.829]

31 Second, we instantiate all the evidence atoms in the network. [sent-99, score-0.339]

32 The main idea in blocked Gibbs sampling [10] is grouping variables to form a block, and then jointly sampling all variables in a block given an assignment to all other variables not in the block. [sent-133, score-0.578]
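
The blocked Gibbs scheme described above can be sketched in a few lines of Python on a small ground model. The factors, weights and block choices below are invented purely for illustration, and the joint block distribution is obtained by brute-force enumeration; in the lifted algorithm of this paper that enumeration is replaced by PTP.

    import itertools
    import math
    import random

    random.seed(0)

    # A tiny ground model over four binary variables; weights are illustrative only.
    factors = [(1.0, lambda w: w["x1"] or not w["x2"]),
               (0.5, lambda w: w["x2"] or w["x3"]),
               (2.0, lambda w: (not w["x3"]) or w["x4"])]
    variables = ["x1", "x2", "x3", "x4"]
    blocks = [("x1", "x2"), ("x3", "x4")]          # groups of variables sampled jointly

    def log_weight(world):
        return sum(wt for wt, f in factors if f(world))

    world = {v: False for v in variables}
    for it in range(1000):
        for block in blocks:
            # Enumerate all joint assignments to the block given the rest of the world.
            options = list(itertools.product([False, True], repeat=len(block)))
            weights = []
            for vals in options:
                world.update(dict(zip(block, vals)))
                weights.append(math.exp(log_weight(world)))
            world.update(dict(zip(block, random.choices(options, weights=weights)[0])))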

33 However, the computational complexity of jointly sampling all variables in a block typically increases with the treewidth of the Markov network projected on the block. [sent-135, score-0.256]

34 Thus, in practice, given time and memory resource constraints, the main issue in blocked Gibbs sampling is finding the right balance between computational complexity and accuracy. [sent-136, score-0.335]

35 Note that the problem of computing the partition function of this MLN for arbitrary domain sizes is non-trivial; it cannot be polynomially solved using existing exact lifted approaches such as PTP [8] and lifted VE [2]. [sent-138, score-0.811]

36 Our main idea is to partition the set of atoms into disjoint blocks (clusters) such that PTP is polynomial in each cluster and then sample all atoms in the cluster jointly. [sent-139, score-1.058]

37 PTP is polynomial if we can recursively apply its two lifting rules (defined next), the power rule and the generalized binomial rule, until the treewidth of the remaining ground network is bounded by a constant. [sent-140, score-0.312]

38 Given a normal MLN M, a set of logical variables, denoted by x, is called a decomposer if it satisfies the following two conditions: (i) Every atom in M contains exactly one variable from x, and (ii) For any predicate symbol R, there exists a position s.t. [sent-142, score-0.427]

39 variables from x only appear at that position in atoms of R. [sent-144, score-0.392]

40 The generalized binomial rule is used to sample singleton atoms efficiently (the rule also requires that the atom is not involved in self-joins, i.e., it does not appear more than once in the same formula). [sent-147, score-0.744]

41 Given a normal MLN M having a singleton atom R(x), we can show that Z(M) = Σ_{i=0}^{|∆x|} (|∆x| choose i) Z(M|R̄i) w(i) 2^{p(i)}, where R̄i is a sample of R s.t. exactly i of its groundings are set to True. [sent-150, score-0.28]

42 M|R̄i is obtained as follows: (i) Ground all R(x) and set its groundings to have the same assignment as R̄i, (ii) Delete formulas that evaluate to either True or False, (iii) Delete all groundings of R(x) and (iv) Convert the resulting MLN to a normal MLN. [sent-153, score-0.687]

43 w(i) is the exponentiated sum of the weights of formulas that evaluate to True and p(i) is the number of ground atoms that are removed from the MLN as a result of removing formulas (these are essentially don’t care atoms which can be assigned to either True or False). [sent-154, score-0.925]
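
The rule above amounts to evaluating Z(M) = Σ_{i=0}^{d} (d choose i) · Z(M|R̄i) · w(i) · 2^{p(i)} with d = |∆x|. The Python sketch below computes this sum; Z_reduced, w_of and p_of are hypothetical stand-ins for the quantities produced by the reduced MLNs, with made-up values used only to show the shape of the computation.

    from math import comb, exp

    d = 4                      # |Delta_x|: number of groundings of the singleton atom R(x)

    def Z_reduced(i):          # stands in for Z(M | R_i-bar); placeholder value
        return 1.0 + i

    def w_of(i):               # exponentiated sum of weights of formulas that became True
        return exp(0.3 * i)

    def p_of(i):               # number of "don't care" ground atoms removed along the way
        return d - i

    Z = sum(comb(d, i) * Z_reduced(i) * w_of(i) * 2 ** p_of(i) for i in range(d + 1))
    print("Z(M) =", Z)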

44 Let us put each first-order atom in a cluster by itself, namely we have three clusters: R(x, y), S(y, z) and T(z, u) (see Figure 1(a)). [sent-156, score-0.382]

45 Note that each (first-order) cluster represents all groundings of all atoms in the cluster. [sent-157, score-0.759]

46 To perform Gibbs sampling over this clustering, we need to compute three conditional distributions: P(R(x, y) | S̄(y, z), T̄(z, u)), P(S(y, z) | R̄(x, y), T̄(z, u)) and P(T(z, u) | R̄(x, y), S̄(y, z)), where R̄(x, y) denotes a truth assignment to all possible groundings of R. [sent-158, score-0.452]

47 Naively, given an assignment to all other atoms not in the cluster, we will need O(2^(n²)) time and space for computing and specifying these distributions. (Figure 1: Two possible clusterings for lifted blocked Gibbs sampling on the example MLN having two weighted formulas.) [sent-160, score-1.132]

48 This is because there are n² ground atoms associated with each cluster. [sent-162, score-0.396]

49 Notice however that all groundings of each first-order atom are conditionally independent of each other given a truth assignment to all other atoms. [sent-163, score-0.57]

50 In other words, we can apply PTP here and compute each conditional distribution in O(n³) time and space (since there are n³ groundings of each formula and we need to process each ground formula at least once). [sent-164, score-0.458]

51 Thus, the complexity of sampling all atoms in all clusters is O(n³). [sent-165, score-0.521]

52 Note that the complexity of sampling all variables using propositional Gibbs sampling is also O(n³). [sent-166, score-0.379]
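
The O(n³) bound for this first clustering can be made concrete with a short sketch. Assuming, for illustration, that R(x, y) appears only in the formula R(x, y) ∨ S(y, z) with weight w1, the groundings of R are conditionally independent given a truth assignment S̄ to all groundings of S, so each R(Xi, Yj) can be sampled from a two-point distribution; the three nested loops make the O(n³) cost explicit. The weight and the random S̄ below are made up.

    import math
    import random

    random.seed(0)
    n = 3
    X, Y, Z = ([f"{p}{i}" for i in range(n)] for p in ("X", "Y", "Z"))
    w1 = 1.2                                    # weight of R(x,y) v S(y,z); illustrative

    # A fixed truth assignment to all groundings of S (the conditioning information).
    S_bar = {(y, z): random.random() < 0.5 for y in Y for z in Z}

    R_sample = {}
    for x in X:
        for y in Y:
            sat_if_true = len(Z)                             # clause satisfied for every z
            sat_if_false = sum(S_bar[(y, z)] for z in Z)     # only where S(y,z) is True
            p_true = math.exp(w1 * sat_if_true)
            p_false = math.exp(w1 * sat_if_false)
            R_sample[(x, y)] = random.random() < p_true / (p_true + p_false)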

53 Intuitively, this clustering is likely to yield better accuracy than the previous one because more atoms will be sampled jointly. [sent-168, score-0.376]

54 To perform blocked Gibbs sampling over Clustering 2, we need to compute two distributions: P(R(x, y), S(y, z) | T̄(z, u)) and P(T(z, u) | R̄(x, y), S̄(y, z)). [sent-170, score-0.297]

55 If we instantiate all groundings of T, we get the following reduced MLN: {R(x, y) ∨ S(y, Zi), w1} and {S(y, Zi), ki·w2} for i = 1, . . . , n, where Zi ∈ ∆z and ki is the number of False groundings of T(Zi, u). [sent-172, score-0.478]

56 R(x, Y) is a singleton atom and therefore, applying the generalized binomial rule, we will get n + 1 reduced MLNs, each containing n atoms of the form S(Y, Zi), i = 1, . . . , n. [sent-175, score-0.662]

57 These atoms are conditionally independent of each other and a distribution over them can be computed in O(n) time. [sent-176, score-0.339]

58 Therefore, the complexity of sampling all atoms using the clustering shown in Figure 1(b) is O(n²). [sent-181, score-0.497]

59 4 The Lifted Blocked Gibbs Sampling Algorithm. Next, we will formalize the discussion in the previous section, yielding a lifted blocked Gibbs sampling algorithm. [sent-187, score-0.721]

60 We define a cluster as a set of first-order atoms (these atoms will be sampled jointly in a lifted Gibbs sampling iteration). [sent-189, score-1.357]

61 Given a set of clusters {C1, . . . , Cm}, the Markov blanket of a cluster Ci is the set of clusters that have at least one atom that is in the Markov blanket of an atom in Ci. [sent-193, score-0.912]

62 Given an MLN M, the Gibbs cluster graph is a graph G (each vertex of G is a cluster) such that: (i) Each atom in the MLN is in exactly one cluster of G, and (ii) Two clusters Ci and Cj in G are connected by an edge if Cj is in the Markov blanket of Ci. [sent-194, score-0.872]

63 Note that by definition if Ci is in the Markov blanket of Cj , then Cj is in the Markov blanket of Ci . [sent-195, score-0.268]
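
A Gibbs cluster graph of this kind is straightforward to construct once the Markov blanket relation is available. The sketch below uses first-order co-occurrence in a formula as a stand-in for the ground-level Markov blanket relation and builds the graph for the three singleton clusters of the running example; the formula list is a simplified, assumed encoding of that example.

    from itertools import combinations

    formulas = [("R", "S"), ("S", "T")]                 # atoms that co-occur, per formula
    clusters = {"C1": {"R"}, "C2": {"S"}, "C3": {"T"}}  # each atom in its own cluster

    def in_markov_blanket(a, b):
        # Proxy: two first-order atoms are blanket neighbours if they share a formula.
        return any(a in f and b in f for f in formulas)

    edges = set()
    for (ci, atoms_i), (cj, atoms_j) in combinations(clusters.items(), 2):
        if any(in_markov_blanket(a, b) for a in atoms_i for b in atoms_j):
            edges.add((ci, cj))

    print(sorted(edges))    # [('C1', 'C2'), ('C2', 'C3')]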

64 The lifted blocked Gibbs sampling algorithm (see Algorithm 1) can be envisioned as a message passing algorithm over a Gibbs cluster graph G. [sent-196, score-1.015]

65 The message from Ci to Cj contains the current truth assignment to all groundings of all atoms (we will discuss how to represent the truth assignment in a lifted manner shortly) that are in the Markov blanket of one or more atoms in Ci . [sent-198, score-1.791]

66 Then at each Gibbs iteration, we generate a sample over all atoms by sampling the clusters along an ordering (C1, . . . , Cm). [sent-200, score-0.483]

67 At each cluster, we first use PTP to compute a conditional joint distribution over all atoms in the cluster given an assignment to the atoms in their Markov blanket. [sent-204, score-0.941]

68 Then, we sample all atoms in the cluster from the joint distribution and update the estimate for query atoms in the cluster as well as all outgoing messages. [sent-206, score-1.107]

69 Algorithm 1: Lifted Blocked Gibbs Sampling. Input: a normal MLN M, a Gibbs cluster graph G, an integer N and a set of query atoms. Output: a marginal distribution over the query atoms. For t = 1 to N do: let (C1, . . . , Cm) be an ordering of the clusters of G. [sent-209, score-1.016]

70 4.1 Lifted Message Representation. We say that a representation of truth assignments to the groundings of an atom is lifted if we only specify the number of true (or false) assignments to its full or partial grounding. [sent-213, score-0.901]

71 We can represent the truth assignment (R(X1 , Y1 ) = 1, R(X1 , Y2 ) = 0, R(X2 , Y1 ) = 1, R(X2 , Y2 ) = 0) in a lifted manner using either an integer 2 or a vector ([Y1 , 2], [Y2 , 0]). [sent-216, score-0.525]

72 The first representation says that 2 groundings of R(x, y) are true while the second representation says that 2 groundings of R(x, Y1 ) and 0 groundings of R(x, Y2 ) are true. [sent-217, score-0.753]
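
The following small sketch shows the second (vector) form of the lifted representation: the full truth assignment to the groundings of R(x, y) is collapsed into per-constant counts of true groundings. The particular assignment is the four-atom example from the text.

    from collections import Counter

    # Ground truth assignment to R(x, y) over {X1, X2} x {Y1, Y2}, as in the text.
    ground_assignment = {("X1", "Y1"): 1, ("X1", "Y2"): 0,
                         ("X2", "Y1"): 1, ("X2", "Y2"): 0}

    def lift_message(assignment):
        counts = Counter()
        for (_, y), value in assignment.items():
            counts[y] += value          # number of true groundings of R(x, y) for each y
        return sorted(counts.items())

    print(lift_message(ground_assignment))   # [('Y1', 2), ('Y2', 0)]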

73 Next, we state sufficient conditions for representing a message in a lifted manner while ensuring correctness, summarized in Theorem 2. [sent-218, score-0.48]

74 A position i of an atom R is said to be shared with the atoms {S1, . . . , Sk} if there exists a formula f such that, in f, a logical variable appears at position i in R and in one or more atoms in {S1, . . . , Sk}. [sent-232, score-0.528]

75 Given a Gibbs cluster graph G and an MLN M, let R be an atom in Ci and let Cj be a neighbor of Ci in G. [sent-241, score-0.439]

76 Let SR,Cj be the set of atoms formed by taking the intersection between the Markov blanket of R and the union of the Markov blankets of the atoms in Cj. [sent-242, score-0.946]

77 Let the outgoing message from Ci to Cj be represented using a vector of |∆x | pairs of the form [Xk , rk ] where ∆x is the Cartesian product of the domains of all terms in x, Xk ∈ ∆x is the k-th element in ∆x and rk is the number of groundings of R(Xk , y) that are true in the current assignment. [sent-247, score-0.357]

78 If all messages in the lifted Blocked Gibbs sampling algorithm (Algorithm 1) use the aforementioned representation, then the stationary distribution of the Markov chain induced by the algorithm is the distribution represented by the input normal MLN. [sent-248, score-0.544]

79 The generalized Binomial rule states that all MLNs obtained by conditioning on a singleton atom S with exactly k of its groundings set to true are equivalent to each other. [sent-251, score-0.547]

80 Consider the MLN M′ which is obtained from M by first removing all formulas that do not mention atoms in Cj and then (partially) grounding all the shared terms of R. [sent-254, score-0.514]

80 Moreover, each atom R′_k(y′) is a singleton and therefore it follows from the generalized binomial rule that, in order to compute the distribution associated with M′ conditioned on R′_k(y′), we only need to know how many of its possible groundings are true. [sent-257, score-0.547]

82 Since Ci sends precisely this information to Cj using the message defined in the statement of this theorem, it follows that the lifted Blocked Gibbs sampling algorithm which uses a lifted message representation is equivalent to the algorithm (Algorithm 1) that uses a propositional representation. [sent-258, score-1.211]

83 Given a Gibbs cluster graph G and an MLN M, let the outgoing message from cluster Ci to cluster Cj in G be defined over the set {R1, . . . , Rm}. [sent-264, score-0.718]

84 Note that the time/space requirements of the algorithm are the sum of the time/space required to run PTP for a cluster and the time/space for the message from the cluster. [sent-270, score-0.266]

85 If neither the power rule nor the generalized binomial rule can be applied at any point during search, the complexity of PTP is exponential in the treewidth of the remaining ground network. [sent-275, score-0.318]

86 More precisely, the complexity of PTP is O(exp(g) × exp(w + 1)) where g is the number of times the generalized binomial rule is applied and w is the treewidth (computed heuristically) of the remaining ground network. [sent-276, score-0.258]

87 In our implementation, we measure coupling using the number of times two atoms appear together in a formula. [sent-284, score-0.339]

88 The algorithm begins by constructing a Gibbs cluster graph in which each first-order atom is in a cluster by itself. [sent-286, score-0.643]

89 At each iteration, given the current cluster graph G, for every possible pair of clusters (Ci , Cj ) of G, the algorithm creates a new cluster graph G′ from G by merging Ci and Cj . [sent-288, score-0.537]

90 Note that increasing the cluster size may decrease the complexity of the cluster graph in some cases and therefore we require steps 6 and 7, which add G′ to the feasible set if its complexity is smaller than that of G. [sent-291, score-0.495]

91 Also note that the algorithm is not guaranteed to return a cluster graph that satisfies the input complexity bounds, even if such a cluster graph exists. [sent-292, score-0.514]
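
A greedy variant of this construction is sketched below. Starting from singleton clusters, it repeatedly merges a pair of clusters as long as an estimated complexity stays within the input bound. The cost model (squared cluster size) is a deliberately crude placeholder for the PTP-based estimate and coupling heuristic used in the paper, so the sketch only mirrors the overall search strategy, including its lack of a guarantee on meeting the bound.

    from itertools import combinations

    def estimated_complexity(cluster_graph):
        # Placeholder cost model standing in for the PTP-based complexity estimate.
        return max(len(c) ** 2 for c in cluster_graph)

    def construct_cluster_graph(atoms, bound):
        graph = [{a} for a in atoms]              # start with singleton clusters
        improved = True
        while improved:
            improved = False
            for i, j in combinations(range(len(graph)), 2):
                candidate = [c for k, c in enumerate(graph) if k not in (i, j)]
                candidate.append(graph[i] | graph[j])
                if estimated_complexity(candidate) <= bound:
                    graph = candidate             # accept the merge and restart the scan
                    improved = True
                    break
        return graph

    print(construct_cluster_graph(["R", "S", "T"], bound=4))   # e.g. [{'T'}, {'R', 'S'}]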

92 5 Experiments. In this section, we compare the performance of lifted blocked Gibbs sampling (LBG) with (propositional) blocked Gibbs sampling (BG), lazy MC-SAT [26, 27] and lifted belief propagation (LBP) [30]. [sent-294, score-1.384]

93 For each MLN, we set 10% randomly selected ground atoms as evidence. [sent-297, score-0.396]

94 Note that for lifted BP, the values displayed are the ones obtained after the algorithm has converged. [sent-318, score-0.395]

95 6 Summary and Future Work. In this paper, we proposed lifted blocked Gibbs sampling, a new algorithm that improves blocked Gibbs sampling by exploiting relational or first-order structure. [sent-335, score-0.754]

96 Our algorithm operates by constructing a Gibbs cluster graph, which represents a partitioning of atoms into clusters and then performs message passing over the graph. [sent-336, score-0.689]

97 Each message is a truth assignment to the Markov blanket of the cluster and we showed how to represent it in a lifted manner. [sent-337, score-0.925]

98 We proposed an algorithm for constructing the Gibbs cluster graph and showed that it can be used to trade accuracy with computational complexity. [sent-338, score-0.261]

99 Our experiments demonstrate clearly that lifted blocked Gibbs sampling is more accurate and scalable than propositional blocked Gibbs sampling as well as MC-SAT. [sent-339, score-1.139]

100 Blocking Gibbs sampling in very large probabilistic expert systems. [sent-399, score-0.344]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('lifted', 0.395), ('mln', 0.369), ('atoms', 0.339), ('groundings', 0.239), ('gibbs', 0.237), ('blocked', 0.214), ('atom', 0.201), ('ptp', 0.197), ('lbg', 0.184), ('cluster', 0.181), ('propositional', 0.15), ('ci', 0.14), ('logic', 0.135), ('blanket', 0.134), ('cj', 0.103), ('formulas', 0.095), ('bg', 0.094), ('srl', 0.087), ('message', 0.085), ('sampling', 0.083), ('assignment', 0.082), ('formula', 0.081), ('logical', 0.08), ('fol', 0.076), ('smokes', 0.076), ('asthma', 0.074), ('markov', 0.063), ('relational', 0.062), ('clusters', 0.061), ('ground', 0.057), ('graph', 0.057), ('binomial', 0.056), ('kb', 0.056), ('grounding', 0.056), ('lifting', 0.051), ('blocking', 0.05), ('pgms', 0.05), ('predicate', 0.049), ('truth', 0.048), ('treewidth', 0.047), ('singleton', 0.047), ('gogate', 0.043), ('mlns', 0.043), ('rule', 0.041), ('mcmc', 0.041), ('zi', 0.041), ('lbp', 0.04), ('complexity', 0.038), ('richardson', 0.038), ('clustering', 0.037), ('decomposer', 0.037), ('lift', 0.034), ('query', 0.034), ('messages', 0.034), ('outgoing', 0.033), ('aaai', 0.033), ('ana', 0.033), ('normal', 0.032), ('inference', 0.032), ('xk', 0.031), ('bob', 0.03), ('yielding', 0.029), ('position', 0.028), ('xi', 0.027), ('poon', 0.026), ('variables', 0.025), ('scan', 0.025), ('alchemy', 0.025), ('connectives', 0.025), ('existential', 0.025), ('jha', 0.025), ('milch', 0.025), ('smoke', 0.025), ('probabilistic', 0.024), ('shared', 0.024), ('objects', 0.023), ('constructing', 0.023), ('false', 0.023), ('network', 0.022), ('singla', 0.022), ('dallas', 0.022), ('block', 0.021), ('domain', 0.021), ('kl', 0.02), ('typed', 0.02), ('aperiodic', 0.02), ('disjunction', 0.02), ('jointly', 0.02), ('fi', 0.02), ('power', 0.019), ('weighted', 0.019), ('symmetry', 0.019), ('cm', 0.019), ('bp', 0.019), ('facto', 0.019), ('domingos', 0.019), ('generalized', 0.019), ('divergence', 0.018), ('representation', 0.018), ('disjoint', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

Author: Deepak Venugopal, Vibhav Gogate

Abstract: First-order probabilistic models combine the power of first-order logic, the de facto tool for handling relational structure, with probabilistic graphical models, the de facto tool for handling uncertainty. Lifted probabilistic inference algorithms for them have been the subject of much recent research. The main idea in these algorithms is to improve the accuracy and scalability of existing graphical models’ inference algorithms by exploiting symmetry in the first-order representation. In this paper, we consider blocked Gibbs sampling, an advanced MCMC scheme, and lift it to the first-order level. We propose to achieve this by partitioning the first-order atoms in the model into a set of disjoint clusters such that exact lifted inference is polynomial in each cluster given an assignment to all other atoms not in the cluster. We propose an approach for constructing the clusters and show how it can be used to trade accuracy with computational complexity in a principled manner. Our experimental evaluation shows that lifted Gibbs sampling is superior to the propositional algorithm in terms of accuracy, scalability and convergence.

2 0.19592947 314 nips-2012-Slice Normalized Dynamic Markov Logic Networks

Author: Tivadar Papai, Henry Kautz, Daniel Stefankovic

Abstract: Markov logic is a widely used tool in statistical relational learning, which uses a weighted first-order logic knowledge base to specify a Markov random field (MRF) or a conditional random field (CRF). In many applications, a Markov logic network (MLN) is trained in one domain, but used in a different one. This paper focuses on dynamic Markov logic networks, where the size of the discretized time-domain typically varies between training and testing. It has been previously pointed out that the marginal probabilities of truth assignments to ground atoms can change if one extends or reduces the domains of predicates in an MLN. We show that in addition to this problem, the standard way of unrolling a Markov logic theory into a MRF may result in time-inhomogeneity of the underlying Markov chain. Furthermore, even if these representational problems are not significant for a given domain, we show that the more practical problem of generating samples in a sequential conditional random field for the next slice relying on the samples from the previous slice has high computational cost in the general case, due to the need to estimate a normalization factor for each sample. We propose a new discriminative model, slice normalized dynamic Markov logic networks (SN-DMLN), that suffers from none of these issues. It supports efficient online inference, and can directly model influences between variables within a time slice that do not have a causal direction, in contrast with fully directed models (e.g., DBNs). Experimental results show an improvement in accuracy over previous approaches to online inference in dynamic Markov logic networks. 1

3 0.14533404 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

Author: Nicholas Foti, Sinead Williamson

Abstract: A number of dependent nonparametric processes have been proposed to model non-stationary data with unknown latent dimensionality. However, the inference algorithms are often slow and unwieldy, and are in general highly specific to a given model formulation. In this paper, we describe a large class of dependent nonparametric processes, including several existing models, and present a slice sampler that allows efficient inference across this class of models. 1

4 0.14056598 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes

Author: Dahua Lin, John W. Fisher

Abstract: Mixture distributions are often used to model complex data. In this paper, we develop a new method that jointly estimates mixture models over multiple data sets by exploiting the statistical dependencies between them. Specifically, we introduce a set of latent Dirichlet processes as sources of component models (atoms), and for each data set, we construct a nonparametric mixture model by combining sub-sampled versions of the latent DPs. Each mixture model may acquire atoms from different latent DPs, while each atom may be shared by multiple mixtures. This multi-to-multi association distinguishes the proposed method from previous ones that require the model structure to be a tree or a chain, allowing more flexible designs. We also derive a sampling algorithm that jointly infers the model parameters and present experiments on both document analysis and image modeling. 1

5 0.13628662 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space

Author: Lucas Theis, Jascha Sohl-dickstein, Matthias Bethge

Abstract: We present a new learning strategy based on an efficient blocked Gibbs sampler for sparse overcomplete linear models. Particular emphasis is placed on statistical image modeling, where overcomplete models have played an important role in discovering sparse representations. Our Gibbs sampler is faster than general purpose sampling schemes while also requiring no tuning as it is free of parameters. Using the Gibbs sampler and a persistent variant of expectation maximization, we are able to extract highly sparse distributions over latent sources from data. When applied to natural images, our algorithm learns source distributions which resemble spike-and-slab distributions. We evaluate the likelihood and quantitatively compare the performance of the overcomplete linear model to its complete counterpart as well as a product of experts model, which represents another overcomplete generalization of the complete linear model. In contrast to previous claims, we find that overcomplete representations lead to significant improvements, but that the overcomplete linear model still underperforms other models. 1

6 0.12440249 60 nips-2012-Bayesian nonparametric models for ranked data

7 0.12244976 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

8 0.10265872 126 nips-2012-FastEx: Hash Clustering with Exponential Families

9 0.098369464 69 nips-2012-Clustering Sparse Graphs

10 0.098005377 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

11 0.087755255 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

12 0.081815906 358 nips-2012-Value Pursuit Iteration

13 0.074776314 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

14 0.074358985 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

15 0.061408587 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data

16 0.05625825 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

17 0.055503163 26 nips-2012-A nonparametric variable clustering model

18 0.055404544 286 nips-2012-Random Utility Theory for Social Choice

19 0.054135241 204 nips-2012-MAP Inference in Chains using Column Generation

20 0.052841611 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.145), (1, 0.035), (2, 0.013), (3, -0.006), (4, -0.139), (5, -0.088), (6, -0.015), (7, -0.07), (8, 0.086), (9, 0.084), (10, 0.025), (11, -0.116), (12, 0.004), (13, -0.043), (14, 0.041), (15, -0.172), (16, -0.13), (17, -0.091), (18, -0.012), (19, -0.057), (20, 0.008), (21, -0.038), (22, -0.016), (23, 0.01), (24, -0.083), (25, -0.014), (26, -0.039), (27, -0.025), (28, -0.037), (29, 0.058), (30, -0.026), (31, 0.05), (32, 0.021), (33, 0.068), (34, 0.001), (35, -0.06), (36, -0.059), (37, -0.035), (38, 0.007), (39, -0.155), (40, -0.105), (41, -0.112), (42, -0.068), (43, 0.127), (44, -0.081), (45, -0.009), (46, 0.006), (47, 0.066), (48, -0.074), (49, 0.006)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94698066 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

Author: Deepak Venugopal, Vibhav Gogate

Abstract: First-order probabilistic models combine the power of first-order logic, the de facto tool for handling relational structure, with probabilistic graphical models, the de facto tool for handling uncertainty. Lifted probabilistic inference algorithms for them have been the subject of much recent research. The main idea in these algorithms is to improve the accuracy and scalability of existing graphical models’ inference algorithms by exploiting symmetry in the first-order representation. In this paper, we consider blocked Gibbs sampling, an advanced MCMC scheme, and lift it to the first-order level. We propose to achieve this by partitioning the first-order atoms in the model into a set of disjoint clusters such that exact lifted inference is polynomial in each cluster given an assignment to all other atoms not in the cluster. We propose an approach for constructing the clusters and show how it can be used to trade accuracy with computational complexity in a principled manner. Our experimental evaluation shows that lifted Gibbs sampling is superior to the propositional algorithm in terms of accuracy, scalability and convergence.

2 0.71419698 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

Author: Nicholas Foti, Sinead Williamson

Abstract: A number of dependent nonparametric processes have been proposed to model non-stationary data with unknown latent dimensionality. However, the inference algorithms are often slow and unwieldy, and are in general highly specific to a given model formulation. In this paper, we describe a large class of dependent nonparametric processes, including several existing models, and present a slice sampler that allows efficient inference across this class of models. 1

3 0.61130571 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

Author: Ke Jiang, Brian Kulis, Michael I. Jordan

Abstract: Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the kmeans and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that features the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis. 1

4 0.57396585 89 nips-2012-Coupling Nonparametric Mixtures via Latent Dirichlet Processes

Author: Dahua Lin, John W. Fisher

Abstract: Mixture distributions are often used to model complex data. In this paper, we develop a new method that jointly estimates mixture models over multiple data sets by exploiting the statistical dependencies between them. Specifically, we introduce a set of latent Dirichlet processes as sources of component models (atoms), and for each data set, we construct a nonparametric mixture model by combining sub-sampled versions of the latent DPs. Each mixture model may acquire atoms from different latent DPs, while each atom may be shared by multiple mixtures. This multi-to-multi association distinguishes the proposed method from previous ones that require the model structure to be a tree or a chain, allowing more flexible designs. We also derive a sampling algorithm that jointly infers the model parameters and present experiments on both document analysis and image modeling. 1

5 0.56986761 126 nips-2012-FastEx: Hash Clustering with Exponential Families

Author: Amr Ahmed, Sujith Ravi, Alex J. Smola, Shravan M. Narayanamurthy

Abstract: Clustering is a key component in any data analysis toolbox. Despite its importance, scalable algorithms often eschew rich statistical models in favor of simpler descriptions such as k-means clustering. In this paper we present a sampler, capable of estimating mixtures of exponential families. At its heart lies a novel proposal distribution using random projections to achieve high throughput in generating proposals, which is crucial for clustering models with large numbers of clusters. 1

6 0.53058302 26 nips-2012-A nonparametric variable clustering model

7 0.52313858 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process

8 0.51836681 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space

9 0.48751965 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

10 0.48659605 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

11 0.48169893 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

12 0.47464141 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data

13 0.47320095 60 nips-2012-Bayesian nonparametric models for ranked data

14 0.47039372 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

15 0.41159686 358 nips-2012-Value Pursuit Iteration

16 0.40995097 219 nips-2012-Modelling Reciprocating Relationships with Hawkes Processes

17 0.40757778 69 nips-2012-Clustering Sparse Graphs

18 0.4070766 314 nips-2012-Slice Normalized Dynamic Markov Logic Networks

19 0.39973268 59 nips-2012-Bayesian nonparametric models for bipartite graphs

20 0.38668808 294 nips-2012-Repulsive Mixtures


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.04), (21, 0.031), (22, 0.237), (38, 0.062), (42, 0.032), (54, 0.037), (55, 0.015), (74, 0.038), (76, 0.105), (80, 0.204), (92, 0.078)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.81050324 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

Author: Deepak Venugopal, Vibhav Gogate

Abstract: First-order probabilistic models combine the power of first-order logic, the de facto tool for handling relational structure, with probabilistic graphical models, the de facto tool for handling uncertainty. Lifted probabilistic inference algorithms for them have been the subject of much recent research. The main idea in these algorithms is to improve the accuracy and scalability of existing graphical models’ inference algorithms by exploiting symmetry in the first-order representation. In this paper, we consider blocked Gibbs sampling, an advanced MCMC scheme, and lift it to the first-order level. We propose to achieve this by partitioning the first-order atoms in the model into a set of disjoint clusters such that exact lifted inference is polynomial in each cluster given an assignment to all other atoms not in the cluster. We propose an approach for constructing the clusters and show how it can be used to trade accuracy with computational complexity in a principled manner. Our experimental evaluation shows that lifted Gibbs sampling is superior to the propositional algorithm in terms of accuracy, scalability and convergence.

2 0.76281625 48 nips-2012-Augmented-SVM: Automatic space partitioning for combining multiple non-linear dynamics

Author: Ashwini Shukla, Aude Billard

Abstract: Non-linear dynamical systems (DS) have been used extensively for building generative models of human behavior. Their applications range from modeling brain dynamics to encoding motor commands. Many schemes have been proposed for encoding robot motions using dynamical systems with a single attractor placed at a predefined target in state space. Although these enable the robots to react against sudden perturbations without any re-planning, the motions are always directed towards a single target. In this work, we focus on combining several such DS with distinct attractors, resulting in a multi-stable DS. We show its applicability in reach-to-grasp tasks where the attractors represent several grasping points on the target object. While exploiting multiple attractors provides more flexibility in recovering from unseen perturbations, it also increases the complexity of the underlying learning problem. Here we present the Augmented-SVM (A-SVM) model which inherits region partitioning ability of the well known SVM classifier and is augmented with novel constraints derived from the individual DS. The new constraints modify the original SVM dual whose optimal solution then results in a new class of support vectors (SV). These new SV ensure that the resulting multistable DS incurs minimum deviation from the original dynamics and is stable at each of the attractors within a finite region of attraction. We show, via implementations on a simulated 10 degrees of freedom mobile robotic platform, that the model is capable of real-time motion generation and is able to adapt on-the-fly to perturbations.

3 0.73157924 334 nips-2012-Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs

Author: Michael Collins, Shay B. Cohen

Abstract: We describe an approach to speed-up inference with latent-variable PCFGs, which have been shown to be highly effective for natural language parsing. Our approach is based on a tensor formulation recently introduced for spectral estimation of latent-variable PCFGs coupled with a tensor decomposition algorithm well-known in the multilinear algebra literature. We also describe an error bound for this approximation, which gives guarantees showing that if the underlying tensors are well approximated, then the probability distribution over trees will also be well approximated. Empirical evaluation on real-world natural language parsing data demonstrates a significant speed-up at minimal cost for parsing performance. 1

4 0.71214283 100 nips-2012-Discriminative Learning of Sum-Product Networks

Author: Robert Gens, Pedro Domingos

Abstract: Sum-product networks are a new deep architecture that can perform fast, exact inference on high-treewidth models. Only generative methods for training SPNs have been proposed to date. In this paper, we present the first discriminative training algorithms for SPNs, combining the high accuracy of the former with the representational power and tractability of the latter. We show that the class of tractable discriminative SPNs is broader than the class of tractable generative ones, and propose an efficient backpropagation-style algorithm for computing the gradient of the conditional log likelihood. Standard gradient descent suffers from the diffusion problem, but networks with many layers can be learned reliably using “hard” gradient descent, where marginal inference is replaced by MPE inference (i.e., inferring the most probable state of the non-evidence variables). The resulting updates have a simple and intuitive form. We test discriminative SPNs on standard image classification tasks. We obtain the best results to date on the CIFAR-10 dataset, using fewer features than prior methods with an SPN architecture that learns local image structure discriminatively. We also report the highest published test accuracy on STL-10 even though we only use the labeled portion of the dataset. 1

5 0.70253879 314 nips-2012-Slice Normalized Dynamic Markov Logic Networks

Author: Tivadar Papai, Henry Kautz, Daniel Stefankovic

Abstract: Markov logic is a widely used tool in statistical relational learning, which uses a weighted first-order logic knowledge base to specify a Markov random field (MRF) or a conditional random field (CRF). In many applications, a Markov logic network (MLN) is trained in one domain, but used in a different one. This paper focuses on dynamic Markov logic networks, where the size of the discretized time-domain typically varies between training and testing. It has been previously pointed out that the marginal probabilities of truth assignments to ground atoms can change if one extends or reduces the domains of predicates in an MLN. We show that in addition to this problem, the standard way of unrolling a Markov logic theory into a MRF may result in time-inhomogeneity of the underlying Markov chain. Furthermore, even if these representational problems are not significant for a given domain, we show that the more practical problem of generating samples in a sequential conditional random field for the next slice relying on the samples from the previous slice has high computational cost in the general case, due to the need to estimate a normalization factor for each sample. We propose a new discriminative model, slice normalized dynamic Markov logic networks (SN-DMLN), that suffers from none of these issues. It supports efficient online inference, and can directly model influences between variables within a time slice that do not have a causal direction, in contrast with fully directed models (e.g., DBNs). Experimental results show an improvement in accuracy over previous approaches to online inference in dynamic Markov logic networks. 1

6 0.70043486 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

7 0.69685054 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

8 0.69060493 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems

9 0.69036287 204 nips-2012-MAP Inference in Chains using Column Generation

10 0.6815396 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses

11 0.67621297 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions

12 0.66842532 197 nips-2012-Learning with Recursive Perceptual Representations

13 0.66034108 200 nips-2012-Local Supervised Learning through Space Partitioning

14 0.65641052 171 nips-2012-Latent Coincidence Analysis: A Hidden Variable Model for Distance Metric Learning

15 0.65597135 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

16 0.65118498 279 nips-2012-Projection Retrieval for Classification

17 0.64779299 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

18 0.64359784 168 nips-2012-Kernel Latent SVM for Visual Recognition

19 0.63981849 65 nips-2012-Cardinality Restricted Boltzmann Machines

20 0.63820934 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models