nips nips2013 nips2013-122 knowledge-graph by maker-knowledge-mining

122 nips-2013-First-order Decomposition Trees

Source: pdf

Author: Nima Taghipour, Jesse Davis, Hendrik Blockeel

Abstract: Lifting attempts to speedup probabilistic inference by exploiting symmetries in the model. Exact lifted inference methods, like their propositional counterparts, work by recursively decomposing the model and the problem. In the propositional case, there exist formal structures, such as decomposition trees (dtrees), that represent such a decomposition and allow us to determine the complexity of inference a priori. However, there is currently no equivalent structure nor analogous complexity results for lifted inference. In this paper, we introduce FO-dtrees, which upgrade propositional dtrees to the ﬁrst-order level. We show how these trees can characterize a lifted inference solution for a probabilistic logical model (in terms of a sequence of lifted operations), and make a theoretical analysis of the complexity of lifted inference in terms of the novel notion of lifted width for the tree. 1

Reference: text

Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 First-Order Decomposition Trees Nima Taghipour Jesse Davis Hendrik Blockeel Department of Computer Science, KU Leuven Celestijnenlaan 200A, B-3001 Heverlee, Belgium Abstract Lifting attempts to speedup probabilistic inference by exploiting symmetries in the model. [sent-1, score-0.239]

2 Exact lifted inference methods, like their propositional counterparts, work by recursively decomposing the model and the problem. [sent-2, score-0.81]

3 In the propositional case, there exist formal structures, such as decomposition trees (dtrees), that represent such a decomposition and allow us to determine the complexity of inference a priori. [sent-3, score-0.471]

4 However, there is currently no equivalent structure nor analogous complexity results for lifted inference. [sent-4, score-0.601]

5 In this paper, we introduce FO-dtrees, which upgrade propositional dtrees to the ﬁrst-order level. [sent-5, score-0.249]

6 We show how these trees can characterize a lifted inference solution for a probabilistic logical model (in terms of a sequence of lifted operations), and make a theoretical analysis of the complexity of lifted inference in terms of the novel notion of lifted width for the tree. [sent-6, score-2.63]

7 To address this, Poole [12] introduced the concept of lifted probabilistic inference, i. [sent-9, score-0.596]

8 , inference that exploits the symmetries in the model to improve efﬁciency. [sent-11, score-0.205]

9 Various lifted algorithms have been proposed, mainly by lifting propositional inference algorithms [3, 6, 8, 9, 10, 13, 15, 17, 18, 19, 21, 22]. [sent-12, score-0.906]

10 While the relation between the propositional algorithms is well studied, we have far less insight into their lifted counterparts. [sent-13, score-0.715]

11 The performance of propositional inference, such as variable elimination [4, 14] or recursive conditioning [2], is characterized in terms of a corresponding tree decomposition of the model, and their complexity is measured based on properties of the decomposition, mainly its width. [sent-14, score-0.437]

12 However, the existing notion of treewidth does not provide a tight upper bound for the complexity of lifted inference, since it ignores the opportunities that lifting exploits to improve efﬁciency. [sent-18, score-0.753]

13 Currently, there exists no notion analogous to treewidth for lifted inference to analyze inference complexity based on the model structure. [sent-19, score-0.824]

14 Our work centers around a new structure for specifying and analyzing a lifted solution to an inference problem, and makes the following contributions. [sent-21, score-0.656]

15 First, building on the existing structure of dtrees for propositional graphical models, we propose the structure of First-Order dtrees (FO-dtrees) for PLMs. [sent-22, score-0.344]

16 An FO-dtree represents both the decomposition of a PLM and the symmetries that lifting exploits for performing inference. [sent-23, score-0.285]

17 Second, we show how to determine whether an FO-dtree has a lifted solution, from its structure alone. [sent-24, score-0.561]

18 Third, we present a method to read a lifted solution (a sequence of lifted inference operations) from a liftable FO-dtree, just like we can read a propositional inference solution from a dtree. [sent-25, score-1.63]

19 We formally analyze the complexity of lifted inference in terms of the novel, symmetry-aware notion of lifted width for FO-dtrees. [sent-27, score-1.316]

20 As such, FO-dtrees serve as the ﬁrst formal tool for ﬁnding, evaluating, and choosing among lifted solutions. [sent-28, score-0.583]

21 We use logvar for logical variables and randvar for random variables. [sent-30, score-0.204]

22 1 Propositional and ﬁrst-order graphical models Probabilistic graphical models such as Bayesian networks, Markov networks and factor graphs compactly represent a joint distribution over a set of randvars V = {V1 , . [sent-38, score-0.369]

23 Parfactors use parametrized randvars (PRVs) to represent entire sets of randvars. [sent-48, score-0.343]

24 It represents the set of all randvars P (x) where x ∈ D(X) and x satisﬁes C; this set is denoted rv(P (X)|C). [sent-51, score-0.343]

25 Formally, a parfactor is of the form φ(A)|C, where A = (Ai )n is a sequence of PRVs, C is a i=1 constraint on the logvars appearing in A, and φ is a potential function. [sent-55, score-0.343]

26 The set of logvars occurring in A is denoted logvar(A). [sent-56, score-0.201]

27 A parfactor g represents the set of all factors that can be obtained by applying a grounding substitution to g that is consistent with C; this set is called the grounding of g, and is denoted gr(g). [sent-58, score-0.32]

28 Following the literature, we assume that the model is in a normal form, such that (i) each pair of logvars have either identical or disjoint domains, and (ii) for each pair of co-domain logvars X, X in a parfactor φ(A)|C, (X = X ) ∈ C. [sent-61, score-0.544]

29 Then, the decomposi1 Similarly to existing studies on propositional inference [2, 4], our analysis only considers the model’s global structure, and makes no assumptions about its local structure. [sent-74, score-0.249]

30 A decomposition tree (dtree) is a structure that represents the decomposition used by a speciﬁc solution and allows us to determine its complexity [2]. [sent-89, score-0.236]

31 Formally, a dtree is a rooted tree in which each leaf represents a factor in the model. [sent-90, score-0.29]

32 2 Each node in the tree represents a decomposition of the model into the models under its child subtrees. [sent-91, score-0.249]

33 Child(T ) refers to T ’s child nodes; rv(T ) refers to the randvars under T , which are those in its factor if T is a leaf and rv(T ) = ∪T ∈Child(T ) rv(T ) otherwise. [sent-93, score-0.436]

34 Intuitively, the properties of dtree nodes help us analyze the size of subproblems solved during inference. [sent-96, score-0.233]

35 Lifted inference aims at exploiting symmetries among a model’s isomorphic parts. [sent-102, score-0.403]

36 Two constructs are isomorphic if there is a structure preserving bijection between their components. [sent-103, score-0.23]

37 All exact lifted inference methods use two main tools for exploiting symmetries, i. [sent-106, score-0.678]

38 Count the number of isomorphic conﬁgurations for a group of interchangeable variables instead of enumerating all possible conﬁgurations. [sent-110, score-0.32]

39 Below, we show how these tools are used by lifted variable elimination (LVE) [3, 10, 12, 17, 18]. [sent-117, score-0.656]

40 The ﬁrst lifting tool identiﬁes cases where the application of the decomposition rule results in a product of isomorphic sum-product problems. [sent-119, score-0.396]

41 In LVE, this corresponds to lifted elimination, which uses the operations of lifted multiplication and lifted sum-out on parfactors to evaluate a single representative problem. [sent-121, score-1.835]

42 To sum-out the randvars F using the decomposition rule, we partition the ground factors into six groups of the form {φ(F (x, y), F (y, x)), φ(F (y, x), F (x, y))}, i. [sent-127, score-0.457]

43 Since no randvars are shared between the groups, this decomposes the problem into the product of six isomorphic sums F (x,y),F (y,x) φ(F (x, y), F (y, x)) · φ(F (y, x), F (x, y)). [sent-130, score-0.542]

44 Whereas isomorphic decomposition exploits symmetry among problems, counting exploits symmetries within a problem, by identifying interchangeable randvars. [sent-134, score-0.62]

45 A group of (k-tuples of) randvars are interchangeable, if permuting the assignment of values to the group results in an equivalent model. [sent-135, score-0.395]

46 Consider a sumproduct subproblem V M (V, V ) that contains a set of n interchangeable (k-tuples of) randvars V = {(Vi1 , Vi2 , . [sent-136, score-0.438]

47 The interchangeability allows us to rewrite V into a single counting i=1 randvar #[V], whose value is the histogram h = {(v1 , n1 ), . [sent-140, score-0.22]

48 This lifting tool is employed in LVE by counting conversion, which rewrites the model in terms of counting randvars. [sent-147, score-0.37]

49 , S(xn )} are interchangeable here, since under any value assignment where nt randvars are true and nf randvars are f alse, the model evaluates to the same value φ (nt , nf ) = φ(t, t)nt . [sent-153, score-0.934]

50 4 First-order decomposition trees In this section, we propose the structure of FO-dtrees, which compactly represent a recursive decomposition for a PLM and the symmetries therein. [sent-159, score-0.326]

51 1 Structure An FO-dtree provides a compact representation of a propositional dtree, just like a PLM is a compact representation of a propositional model. [sent-161, score-0.308]

52 It does so by explicitly capturing isomorphic decomposition, which in a dtree correspond to a node with isomorphic children. [sent-162, score-0.669]

53 , Xk } of its logvars that all have the same domain DX . [sent-178, score-0.224]

54 The key intuition behind DPG is that for any x, x ⊆k DX , Gx is isomorphic to Gx , since any bijection from x to x yields a bijection from Gx to Gx . [sent-195, score-0.261]

55 DP G can be applied to a whole model G = {gi }m , if G’s logvars are (re-)named such that (i) i=1 only co-domain logvars share the same name, and (ii) logvars X appear in all parfactors. [sent-196, score-0.603]

56 FO-dtrees simply add to dtrees special nodes for representing DPGs in parfactor models. [sent-203, score-0.269]

57 Xk } is a set of logvars with the same domain DX , x = {x1 , . [sent-207, score-0.224]

58 For this, each TX has a single child distinguished as Tx , under which the model is a representative instance of the isomorphic models Gx in the DPG. [sent-214, score-0.332]

59 each leaf with a representative object x is the descendent of exactly one DPG node TX = (X, x, C), such that x ∈ x 4. [sent-218, score-0.207]

60 , which are isomorphic i=1 up to a permutation of the representative objects x. [sent-225, score-0.296]

61 Figure 3 (a) shows the dtree of Example 3 and its corresponding FO-dtree, which only has one instance Tx of all isomorphic subtrees Txi . [sent-230, score-0.4]

62 However, we use both to distinguish between a whole group of randvars (a PRV P (X)), and a representative of this group (a representative randvar P (x)). [sent-233, score-0.608]

63 2 Properties Darwiche [2] showed that important properties of a recursive decomposition are captured in the properties of dtree nodes. [sent-235, score-0.311]

64 Adapting the definitions of the dtree properties, such as cutset, context, and cluster, for FO-dtrees requires accounting for the semantics of an FO-dtree, which uses DPG nodes and representative objects. [sent-237, score-0.304]

65 Second, for two sets A = {ai }n and B = {bi }n i=1 i=1 of (representative) randvars we deﬁne: A ∩θ B = {ai |∃θ ∈ Θ : ai θ ∈ B}, with Θ the set of grounding substitutions to their representative objects. [sent-240, score-0.512]

66 FO-dtrees capture the ﬁrst lifting tool, isomorphic decomposition, explicitly in DPG nodes. [sent-249, score-0.295]

67 The second tool, counting, can be simply captured by rewriting interchangeable randvars in clusters of the tree nodes with counting randvars. [sent-250, score-0.658]

68 This can be done in FO-dtrees similarly to the operation of counting conversion on logvars in LVE. [sent-251, score-0.353]

69 5 Liftable FO-dtrees When inference can be performed using the lifted operations (i. [sent-255, score-0.706]

70 Not all FO-dtrees have a lifted solution, which is easy to see since not 4 The only non-trivial property is cutset of DPG nodes. [sent-259, score-0.656]

71 5 Fortunately, we can structurally identify the FO-dtrees for which we know a lifted solution. [sent-262, score-0.561]

72 Lifted inference identiﬁes isomorphic problems and solves only one instance of those. [sent-264, score-0.294]

73 Similar to propositional inference, for a lifted method the difﬁculty of each sub-problem increases with the number of variables in the problem– those that appear in the clusters of FO-dtree nodes. [sent-265, score-0.751]

74 While traditional inference is then intractable, lifting may be able to exploit the interchangeability among the randvars and reduce the complexity by counting. [sent-268, score-0.609]

75 Thus, whether a problem has a lifted solution boils down to whether we can rewrite it such that it only contains a bounded (domain-independent) number of counting randvars and ground randvars. [sent-269, score-1.053]

76 This requires the problem to have enough symmetries in it such that all the randvars V = V1 , . [sent-270, score-0.43]

77 Vn in each cluster can be divided into k groups of interchangeable (tuples of) randvars V1 , V2 , . [sent-273, score-0.484]

78 Theorem 1 A (non-counted) FO-dtree has a lifted inference solution if its clusters only consist of (representative) randvars and 1-logvar PRVs. [sent-277, score-1.035]

79 7 6 Lifted inference based on FO-dtrees A dtree can prescribe the operations performed by propositional inference, such as VE [2]. [sent-281, score-0.5]

80 In this section, we show how a liftable FO-dtree can prescribe an LVE solution for the model, thus providing the ﬁrst formal method for symbolic operation selection in lifted inference. [sent-282, score-0.667]

81 Darwiche [2] shows how we can read a (partial) elimination order from a dtree (by assigning elimination of each randvar to some tree node). [sent-284, score-0.529]

82 For this, we assign to each node a set of lifted operations, including lifted elimination of PRVs (using multiplication and sum-out), and counting conversion and aggregation of logvars: • V : A PRV V is eliminated at a node T , if V ∈ cluster(T ) \ context(T ). [sent-286, score-1.536]

83 A lifted solution can be characterized by a sequence of these operations. [sent-289, score-0.561]

84 7 Complexity of lifted inference In this section, we show how to compute the complexity of lifted inference based on an FO-dtree. [sent-296, score-1.352]

85 Just as the complexity of ground inference for a dtree is parametrized in terms of the tree’s width, we deﬁne a lifted width for FO-dtrees and use it to parametrize the complexity of lifted inference. [sent-297, score-1.592]

86 However, unlike in standard inference, this complexity need not be exponential in |rv(cluster(T ))|, since the clusters can contain counting randvars that allow us to handle interchangeable randvars more efﬁciently. [sent-304, score-0.971]

87 To accommodate this in our analysis, we deﬁne two widths for a cluster: a ground width wg , which is the number of ground randvars in the cluster, and a counting width, w# , which is the number of counting randvars in it. [sent-305, score-1.12]

88 The cornerstone of our analysis is that the complexity of an operation performed at node T is exponential only in wg , and polynomial in the domain size with degree w# . [sent-306, score-0.21]

89 Deﬁnition 4 (Lifted width) The lifted width of an FO-dtree T is a pair (wg , w# ), where wg is the largest ground width among the clusters of T and and w# is the largest counting width among them. [sent-309, score-1.0]

90 Theorem 2 The complexity of lifted variable elimination for a counted liftable FO-dtree T is: (w ·r# ) O(nT · log n · exp(wg ) · n# # ), where nT is the number of nodes in T , (wg , w# ) is its lifted width, n (resp. [sent-310, score-1.454]

91 , n# ) is the the largest domain size among its logvars (resp. [sent-311, score-0.224]

92 Roughly speaking, lifting allows us to perform nT /nG of the ground operations by isomorphic decomposition. [sent-324, score-0.38]

93 w# ) in its complexity, instead of n## for lifted inference. [sent-327, score-0.561]

94 These speedups, achieved by counting, are the most signiﬁcant for lifted inference, and what allows it to tackle high treewidth models. [sent-329, score-0.594]

95 An FO-dtree explicitly shows the symmetry between its isomorphic parts, and can thus show a form of decomposition that lifted inference methods employ. [sent-331, score-0.934]

96 We showed how to decide whether an FO-dtree is liftable (has a corresponding lifted solution), and how to derive the sequence of lifted operations and the complexity of LVE based on such a tree. [sent-332, score-1.318]

97 While we focused on LVE, our analysis is also applicable to lifted search-based methods, such as lifted recursive conditioning [13], weighted ﬁrstorder model counting [21], and probabilistic theorem proving [6]. [sent-333, score-1.302]

98 FO-dtrees are also useful to approximate lifted inference algorithms, such as lifted blocked Gibbs sampling [22] and RCR [20], that attempt to improve their inference accuracy by identifying liftable subproblems and handling them by exact inference. [sent-336, score-1.418]

99 On the completeness of ﬁrst-order knowledge compilation for lifted probabilistic inference. [sent-411, score-0.596]

100 Lifted relax, compensate and then recover: From approximate to exact lifted probabilistic inference. [sent-414, score-0.596]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('lifted', 0.561), ('randvars', 0.343), ('tx', 0.231), ('dpg', 0.225), ('dtree', 0.201), ('logvars', 0.201), ('isomorphic', 0.199), ('lve', 0.177), ('propositional', 0.154), ('parfactor', 0.142), ('counting', 0.114), ('liftable', 0.106), ('prvs', 0.104), ('lifting', 0.096), ('elimination', 0.095), ('inference', 0.095), ('cutset', 0.095), ('dtrees', 0.095), ('interchangeable', 0.095), ('logvar', 0.095), ('symmetries', 0.087), ('rv', 0.086), ('decomposition', 0.079), ('grounding', 0.077), ('wg', 0.077), ('randvar', 0.071), ('representative', 0.071), ('node', 0.07), ('child', 0.062), ('width', 0.059), ('counted', 0.059), ('plm', 0.059), ('gx', 0.055), ('nf', 0.052), ('nima', 0.052), ('taghipour', 0.052), ('operations', 0.05), ('nt', 0.049), ('prv', 0.048), ('acutset', 0.047), ('txy', 0.047), ('cluster', 0.046), ('jesse', 0.045), ('complexity', 0.04), ('guy', 0.04), ('dx', 0.039), ('logical', 0.038), ('conversion', 0.038), ('tree', 0.038), ('clusters', 0.036), ('den', 0.036), ('ground', 0.035), ('agg', 0.035), ('descendent', 0.035), ('hendrik', 0.035), ('interchangeability', 0.035), ('plms', 0.035), ('probabilistic', 0.035), ('treewidth', 0.033), ('nodes', 0.032), ('xk', 0.032), ('broeck', 0.031), ('parfactors', 0.031), ('poole', 0.031), ('leaf', 0.031), ('bijection', 0.031), ('recursive', 0.031), ('davis', 0.029), ('read', 0.029), ('vi', 0.027), ('aggregation', 0.027), ('vibhav', 0.027), ('group', 0.026), ('compactly', 0.026), ('objects', 0.026), ('tuples', 0.026), ('friends', 0.025), ('substitution', 0.024), ('bacchus', 0.024), ('bloodt', 0.024), ('groundings', 0.024), ('moke', 0.024), ('rewrites', 0.024), ('ype', 0.024), ('trees', 0.024), ('domain', 0.023), ('ty', 0.023), ('gr', 0.023), ('exploits', 0.023), ('exploiting', 0.022), ('ve', 0.022), ('tool', 0.022), ('vk', 0.021), ('fierens', 0.021), ('kersting', 0.021), ('kristian', 0.021), ('starai', 0.021), ('ai', 0.021), ('intelligence', 0.021), ('rooted', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999994 122 nips-2013-First-order Decomposition Trees

Author: Nima Taghipour, Jesse Davis, Hendrik Blockeel

2 0.34763312 220 nips-2013-On the Complexity and Approximation of Binary Evidence in Lifted Inference

Author: Guy van den Broeck, Adnan Darwiche

Abstract: Lifted inference algorithms exploit symmetries in probabilistic models to speed up inference. They show impressive performance when calculating unconditional probabilities in relational models, but often resort to non-lifted inference when computing conditional probabilities. The reason is that conditioning on evidence breaks many of the model’s symmetries, which can preempt standard lifting techniques. Recent theoretical results show, for example, that conditioning on evidence which corresponds to binary relations is #P-hard, suggesting that no lifting is to be expected in the worst case. In this paper, we balance this negative result by identifying the Boolean rank of the evidence as a key parameter for characterizing the complexity of conditioning in lifted inference. In particular, we show that conditioning on binary evidence with bounded Boolean rank is efﬁcient. This opens up the possibility of approximating evidence by a low-rank Boolean matrix factorization, which we investigate both theoretically and empirically. 1

3 0.074675649 34 nips-2013-Analyzing Hogwild Parallel Gaussian Gibbs Sampling

Author: Matthew Johnson, James Saunderson, Alan Willsky

Abstract: Sampling inference methods are computationally difﬁcult to scale for many models in part because global dependencies can reduce opportunities for parallel computation. Without strict conditional independence structure among variables, standard Gibbs sampling theory requires sample updates to be performed sequentially, even if dependence between most variables is not strong. Empirical work has shown that some models can be sampled effectively by going “Hogwild” and simply running Gibbs updates in parallel with only periodic global communication, but the successes and limitations of such a strategy are not well understood. As a step towards such an understanding, we study the Hogwild Gibbs sampling strategy in the context of Gaussian distributions. We develop a framework which provides convergence conditions and error bounds along with simple proofs and connections to methods in numerical linear algebra. In particular, we show that if the Gaussian precision matrix is generalized diagonally dominant, then any Hogwild Gibbs sampler, with any update schedule or allocation of variables to processors, yields a stable sampling process with the correct sample mean. 1

4 0.054488547 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction

Author: Jukka Corander, Tomi Janhunen, Jussi Rintanen, Henrik Nyman, Johan Pensar

Abstract: We investigate the problem of learning the structure of a Markov network from data. It is shown that the structure of such networks can be described in terms of constraints which enables the use of existing solver technology with optimization capabilities to compute optimal networks starting from initial scores computed from the data. To achieve efﬁcient encodings, we develop a novel characterization of Markov network structure using a balancing condition on the separators between cliques forming the network. The resulting translations into propositional satisﬁability and its extensions such as maximum satisﬁability, satisﬁability modulo theories, and answer set programming, enable us to prove optimal certain networks which have been previously found by stochastic search. 1

5 0.049515959 350 nips-2013-Wavelets on Graphs via Deep Learning

Author: Raif Rustamov, Leonidas Guibas

Abstract: An increasing number of applications require processing of signals deﬁned on weighted graphs. While wavelets provide a ﬂexible tool for signal processing in the classical setting of regular domains, the existing graph wavelet constructions are less ﬂexible – they are guided solely by the structure of the underlying graph and do not take directly into consideration the particular class of signals to be processed. This paper introduces a machine learning framework for constructing graph wavelets that can sparsely represent a given class of signals. Our construction uses the lifting scheme, and is based on the observation that the recurrent nature of the lifting scheme gives rise to a structure resembling a deep auto-encoder network. Particular properties that the resulting wavelets must satisfy determine the training objective and the structure of the involved neural networks. The training is unsupervised, and is conducted similarly to the greedy pre-training of a stack of auto-encoders. After training is completed, we obtain a linear wavelet transform that can be applied to any graph signal in time and memory linear in the size of the graph. Improved sparsity of our wavelet transform for the test signals is conﬁrmed via experiments both on synthetic and real data. 1

6 0.040574286 318 nips-2013-Structured Learning via Logistic Regression

7 0.039978836 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis

8 0.038021464 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching

9 0.038013164 289 nips-2013-Scalable kernels for graphs with continuous attributes

10 0.037080962 82 nips-2013-Decision Jungles: Compact and Rich Models for Classification

11 0.036133725 7 nips-2013-A Gang of Bandits

12 0.036006205 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies

13 0.034988225 98 nips-2013-Documents as multiple overlapping windows into grids of counts

14 0.03477294 66 nips-2013-Computing the Stationary Distribution Locally

15 0.034390029 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks

16 0.033893812 93 nips-2013-Discriminative Transfer Learning with Tree-based Priors

17 0.032915175 249 nips-2013-Polar Operators for Structured Sparse Estimation

18 0.031776376 63 nips-2013-Cluster Trees on Manifolds

19 0.031268802 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition

20 0.031159231 196 nips-2013-Modeling Overlapping Communities with Node Popularities

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.089), (1, 0.025), (2, -0.018), (3, 0.014), (4, 0.041), (5, 0.045), (6, 0.051), (7, -0.042), (8, 0.037), (9, 0.003), (10, -0.001), (11, -0.016), (12, 0.108), (13, -0.014), (14, -0.025), (15, -0.002), (16, -0.027), (17, -0.013), (18, -0.006), (19, -0.011), (20, 0.025), (21, 0.033), (22, -0.025), (23, -0.022), (24, 0.027), (25, 0.002), (26, 0.005), (27, 0.002), (28, 0.031), (29, 0.062), (30, -0.03), (31, -0.082), (32, 0.002), (33, 0.057), (34, -0.004), (35, -0.087), (36, 0.022), (37, 0.034), (38, 0.18), (39, -0.012), (40, -0.37), (41, 0.079), (42, 0.139), (43, -0.24), (44, -0.162), (45, 0.109), (46, 0.104), (47, 0.22), (48, -0.01), (49, -0.162)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95691973 122 nips-2013-First-order Decomposition Trees

Author: Nima Taghipour, Jesse Davis, Hendrik Blockeel

2 0.90299606 220 nips-2013-On the Complexity and Approximation of Binary Evidence in Lifted Inference

Author: Guy van den Broeck, Adnan Darwiche

3 0.34618714 37 nips-2013-Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs

Author: Vikash Mansinghka, Tejas D. Kulkarni, Yura N. Perov, Josh Tenenbaum

Abstract: The idea of computer vision as the Bayesian inverse problem to computer graphics has a long history and an appealing elegance, but it has proved difﬁcult to directly implement. Instead, most vision tasks are approached via complex bottom-up processing pipelines. Here we show that it is possible to write short, simple probabilistic graphics programs that deﬁne ﬂexible generative models and to automatically invert them to interpret real-world images. Generative probabilistic graphics programs (GPGP) consist of a stochastic scene generator, a renderer based on graphics software, a stochastic likelihood model linking the renderer’s output and the data, and latent variables that adjust the ﬁdelity of the renderer and the tolerance of the likelihood. Representations and algorithms from computer graphics are used as the deterministic backbone for highly approximate and stochastic generative models. This formulation combines probabilistic programming, computer graphics, and approximate Bayesian computation, and depends only on generalpurpose, automatic inference techniques. We describe two applications: reading sequences of degraded and adversarially obscured characters, and inferring 3D road models from vehicle-mounted camera images. Each of the probabilistic graphics programs we present relies on under 20 lines of probabilistic code, and yields accurate, approximately Bayesian inferences about real-world images. 1

4 0.33986527 337 nips-2013-Transportability from Multiple Environments with Limited Experiments

Author: Elias Bareinboim, Sanghack Lee, Vasant Honavar, Judea Pearl

Abstract: This paper considers the problem of transferring experimental ﬁndings learned from multiple heterogeneous domains to a target domain, in which only limited experiments can be performed. We reduce questions of transportability from multiple domains and with limited scope to symbolic derivations in the causal calculus, thus extending the original setting of transportability introduced in [1], which assumes only one domain with full experimental information available. We further provide different graphical and algorithmic conditions for computing the transport formula in this setting, that is, a way of fusing the observational and experimental information scattered throughout different domains to synthesize a consistent estimate of the desired effects in the target domain. We also consider the issue of minimizing the variance of the produced estimand in order to increase power. 1 Motivation Transporting and synthesizing experimental knowledge from heterogeneous settings are central to scientiﬁc discovery. Conclusions that are obtained in a laboratory setting are transported and applied elsewhere in an environment that differs in many aspects from that of the laboratory. In data-driven sciences, experiments are conducted on disparate domains, but the intention is almost invariably to fuse the acquired knowledge, and translate it into some meaningful claim about a target domain, which is usually different than any of the individual study domains. However, the conditions under which this extrapolation can be legitimized have not been formally articulated until very recently. Although the problem has been discussed in many areas of statistics, economics, and the health sciences, under rubrics such as “external validity” [2, 3], “meta-analysis” [4], “quasi-experiments” [5], “heterogeneity” [6], these discussions are limited to verbal narratives in the form of heuristic guidelines for experimental researchers – no formal treatment of the problem has been attempted to answer the practical challenge of generalizing causal knowledge across multiple heterogeneous domains with disparate experimental data posed in this paper. The ﬁelds of artiﬁcial intelligence and statistics provide the theoretical underpinnings necessary for tackling transportability. First, the distinction between statistical and causal knowledge has received syntactic representation through causal diagrams [7, 8, 9], which became a popular tool for causal inference in data-driven ﬁelds. Second, the inferential machinery provided by the causal calculus (do-calculus) [7, 9, 10] is particularly suitable for handling knowledge transfer across domains. Armed with these techniques, [1] introduced a formal language for encoding differences and commonalities between domains accompanied with necessary or sufﬁcient conditions under which transportability of empirical ﬁndings is feasible between two domains, a source and a target; then, these conditions were extended for a complete characterization for transportability in one domain with unrestricted experimental data [11]. Subsequently, these results were generalized for the settings when ∗ These authors contributed equally to this paper. The authors’ addresses are respectively eb@cs.ucla.edu, sxl439@ist.psu.edu, vhonavar@ist.psu.edu, judea@cs.ucla.edu. 1 only limited experiments are available in the source domain [12, 13], and further for when multiple source domains with unrestricted experimental information are available [14, 15]. This paper broadens these discussions introducing a more general setting in which multiple heterogeneous sources with limited and distinct experiments are available, a task that we call here “mz-transportability”.1 More formally, the mz-transportability problem concerns with the transfer of causal knowledge from a heterogeneous collection of source domains Π = {π1 , ..., πn } to a target domain π ∗ . In each domain πi ∈ Π, experiments over a set of variables Zi can be performed, and causal knowledge gathered. In π ∗ , potentially different from πi , only passive observations can be collected (this constraint is weakened later on). The problem is to infer a causal relationship R in π ∗ using knowledge obtained in Π. Clearly, if nothing is known about the relationship between Π and π ∗ , the problem is trivial; no transfer can be justiﬁed. Yet the fact that all scientiﬁc experiments are conducted with the intent of being used elsewhere (e.g., outside the lab) implies that scientiﬁc progress relies on the assumption that certain domains share common characteristics and that, owed to these commonalities, causal claims would be valid in new settings even where experiments cannot be conducted. The problem stated in this paper generalizes the one-dimensional version of transportability with limited scope and the multiple dimensional with unlimited scope. Remarkably, while the effects of interest might not be individually transportable to the target domain from the experiments in any of the available sources, combining different pieces from the various sources may enable the estimation of the desired effects (to be shown later on). The goal of this paper is to formally understand under which conditions the target quantity is (non-parametrically) estimable from the available data. 2 Previous work and our contributions Consider Fig. 1(a) in which the node S represents factors that produce differences between source and target populations. Assume that we conduct a randomized trial in Los Angeles (LA) and estimate the causal effect of treatment X on outcome Y for every age group Z = z, denoted by P (y|do(x), z). We now wish to generalize the results to the population of the United States (U.S.), but we ﬁnd the distribution P (x, y, z) in LA to be different from the one in the U.S. (call the latter P ∗ (x, y, z)). In particular, the average age in the U.S. is signiﬁcantly higher than that in LA. How are we to estimate the causal effect of X on Y in U.S., denoted R = P ∗ (y|do(x))? 2 3 The selection diagram for this example (Fig. 1(a)) conveys the assumption that the only difference between the two populations are factors determining age distributions, shown as S → Z, while agespeciﬁc effects P ∗ (y|do(x), Z = z) are invariant across populations. Difference-generating factors are represented by a special set of variables called selection variables S (or simply S-variables), which are graphically depicted as square nodes ( ). From this assumption, the overall causal effect in the U.S. can be derived as follows: R P ∗ (y|do(x), z)P ∗ (z) = z P (y|do(x), z)P ∗ (z) = (1) z The last line is the transport formula for R. It combines experimental results obtained in LA, P (y|do(x), z), with observational aspects of the U.S. population, P ∗ (z), to obtain an experimental claim P ∗ (y|do(x)) about the U.S.. In this trivial example, the transport formula amounts to a simple re-calibration (or re-weighting) of the age-speciﬁc effects to account for the new age distribution. In general, however, a more involved mixture of experimental and observational ﬁndings would be necessary to obtain a bias-free estimate of the target relation R. Fig. 1(b) depicts the smallest example in which transportability is not feasible even when experiments over X in π are available. In real world applications, it may happen that certain controlled experiments cannot be conducted in the source environment (for ﬁnancial, ethical, or technical reasons), so only a limited amount 1 The machine learning literature has been concerned about discrepancies among domains in the context, almost exclusively, on predictive or classiﬁcation tasks as opposed to learning causal or counterfactual measures [16, 17]. Interestingly enough, recent work on anticausal learning moves towards more general modalities of learning and also leverages knowledge about the underlying data-generating structure [18, 19]. 2 We will use Px (y | z) interchangeably with P (y | do(x), z). 3 We use the structural interpretation of causal diagrams as described in [9, pp. 205]. 2 S S Z Z1 S S X Y (a) X Y (b) Z2 X Y (c) Figure 1: The selection variables S are depicted as square nodes ( ). (a) Selection diagram illustrating when transportability between two domains is trivially solved through simple recalibration. (b) The smallest possible selection diagram in which a causal relation is not transportable. (c) Selection diagram illustrating transportability when only experiments over {Z1 } are available in the source. of experimental information can be gathered. A natural question arises whether the investigator in possession of a limited set of experiments would still be able to estimate the desired effects at the target domain. For instance, we assume in Fig. 1(c) that experiments over Z1 are available and the target quantity is R = P ∗ (y|do(x)), which can be shown to be equivalent to P (y|x, do(Z1 )), the conditional distribution of Y given X in the experimental study when Z1 is randomized. 4 One might surmise that multiple pairwise z-transportability would be sufﬁcient to solve the mztransportability problem, but this is not the case. To witness, consider Fig. 2(a,b) which concerns the transport of experimental results from two sources ({πa , πb }) to infer the effect of X on Y in π ∗ , R = P ∗ (y|do(x)). In these diagrams, X may represent the treatment (e.g., cholesterol level), Z1 represents a pre-treatment variable (e.g., diet), Z2 represents an intermediate variable (e.g., biomarker), and Y represents the outcome (e.g., heart failure). We assume that experimental studies randomizing {Z1 , Z2 } can be conducted in both domains. A simple analysis based on [12] can show that R cannot be z-transported from either source alone, but it turns out that combining in a special way experiments from both sources allows one to determine the effect in the target. More interestingly, we consider the more stringent scenario where only certain experiments can be performed in each of the domains. For instance, assume that it is only possible to conduct experiments over {Z2 } in πa and over {Z1 } in πb . Obviously, R will not be z-transported individually from these domains, but it turns out that taking both sets of experiments into account, R = z2 P (a) (y|do(z2 ))P (b) (z2 |x, do(Z1 )), which fully uses all pieces of experimental data available. In other words, we were able to decompose R into subrelations such that each one is separately z-transportable from the source domains, and so is the desired target quantity. Interestingly, it is the case in this example that if the domains in which experiments were conducted were reversed (i.e., {Z1 } randomized in πa , {Z2 } in πb ), it will not be possible to transport R by any method – the target relation is simply not computable from the available data (formally shown later on). This illustrates some of the subtle issues mz-transportability entails, which cannot be immediately cast in terms of previous instances of the transportability class. In the sequel, we try to better understand some of these issues, and we develop sufﬁcient or (speciﬁc) necessary conditions for deciding special transportability for arbitrary collection of selection diagrams and set of experiments. We further construct an algorithm for deciding mz-transportability of joint causal effects and returning the correct transport formula whenever this is possible. We also consider issues relative to the variance of the estimand aiming for improving sample efﬁciency and increasing statistical power. 3 Graphical conditions for mz-transportability The basic semantical framework in our analysis rests on structural causal models as deﬁned in [9, pp. 205], also called data-generating models. In the structural causal framework [9, Ch. 7], actions are modiﬁcations of functional relationships, and each action do(x) on a causal model M produces 4 A typical example is whether we can estimate the effect of cholesterol (X) on heart failure (Y ) by experiments on diet (Z1 ) given that cholesterol levels cannot be randomized [20]. 3 Z2 Z1 Z1 X Z1 X X Z2 W Z2 Y (a) Z2 Z1 Y X Z3 W U U Y Y (b) (c) Z3 (d) Figure 2: Selection diagrams illustrating impossibility of estimating R = P ∗ (y|do(x)) through individual transportability from πa and πb even when Z = {Z1 , Z2 } (for (a, b), (c, d))). If we assume, more stringently, availability of experiments Za = {Z2 }, Zb = {Z1 }, Z∗ = {}, a more elaborated analysis can show that R can be estimated combining different pieces from both domains. a new model Mx = U, V, Fx , P (U) , where Fx is obtained after replacing fX ∈ F for every X ∈ X with a new function that outputs a constant value x given by do(x). 5 We follow the conventions given in [9]. We denote variables by capital letters and their realized values by small letters. Similarly, sets of variables will be denoted by bold capital letters, sets of realized values by bold small letters. We use the typical graph-theoretic terminology with the corresponding abbreviations P a(Y)G and An(Y)G , which will denote respectively the set of observable parents and ancestors of the node set Y in G. A graph GY will denote the induced subgraph G containing nodes in Y and all arrows between such nodes. Finally, GXZ stands for the edge subgraph of G where all incoming arrows into X and all outgoing arrows from Z are removed. Key to the analysis of transportability is the notion of “identiﬁability,” deﬁned below, which expresses the requirement that causal effects are computable from a combination of data P and assumptions embodied in a causal graph G. Deﬁnition 1 (Causal Effects Identiﬁability (Pearl, 2000, pp. 77)). The causal effect of an action do(x) on a set of variables Y such that Y ∩ X = ∅ is said to be identiﬁable from P in G if Px (y) is uniquely computable from P (V) in any model that induces G. Causal models and their induced graphs are usually associated with one particular domain (also called setting, study, population, or environment). In ordinary transportability, this representation was extended to capture properties of two domains simultaneously. This is possible if we assume that the structural equations share the same set of arguments, though the functional forms of the equations may vary arbitrarily [11]. 6 Deﬁnition 2 (Selection Diagram). Let M, M ∗ be a pair of structural causal models [9, pp. 205] relative to domains π, π ∗ , sharing a causal diagram G. M, M ∗ is said to induce a selection diagram D if D is constructed as follows: 1. Every edge in G is also an edge in D; 2. D contains an extra edge Si → Vi whenever there might exist a discrepancy fi = fi∗ or P (Ui ) = P ∗ (Ui ) between M and M ∗ . In words, the S-variables locate the mechanisms where structural discrepancies between the two domains are suspected to take place.7 Alternatively, the absence of a selection node pointing to a variable represents the assumption that the mechanism responsible for assigning value to that variable is identical in both domains. 5 The results presented here are also valid in other formalisms for causality based on potential outcomes. As discussed in the reference, the assumption of no structural changes between domains can be relaxed, but some structural assumptions regarding the discrepancies between domains must still hold. 7 Transportability assumes that enough structural knowledge about both domains is known in order to substantiate the production of their respective causal diagrams. In the absence of such knowledge, causal discovery algorithms might be used to infer the diagrams from data [8, 9]. 6 4 Armed with the concept of identiﬁability and selection diagrams, mz-transportability of causal effects can be deﬁned as follows: Deﬁnition 3 (mz-Transportability). Let D = {D(1) , ..., D(n) } be a collection of selection diagrams relative to source domains Π = {π1 , ..., πn }, and target domain π ∗ , respectively, and Zi (and Z∗ ) i be the variables in which experiments can be conducted in domain πi (and π ∗ ). Let P i , Iz be i i the pair of observational and interventional distributions of πi , where Iz = Z ⊆Zi P (v|do(z )), ∗ and in an analogous manner, P ∗ , Iz be the observational and interventional distributions of π ∗ . ∗ ∗ The causal effect R = Px (y|w) is said to be mz-transportable from Π to π ∗ in D if Px (y|w) is i ∗ uniquely computable from i=1,...,n P i , Iz ∪ P ∗ , Iz in any model that induces D. ∗ i The requirement that R is uniquely computable from P ∗ , Iz and P i , Iz from all sources has a syntactic image in the causal calculus, which is captured by the following sufﬁcient condition. Theorem 1. Let D = {D(1) , ..., D(n) } be a collection of selection diagrams relative to source domains Π = {π1 , ..., πn }, and target domain π ∗ , respectively, and Si represents the collection of i ∗ S-variables in the selection diagram D(i) . Let { P i , Iz } and P ∗ , Iz be respectively the pairs of observational and interventional distributions in the sources Π and target π ∗ . The relation R = P ∗ (y|do(x), w) is mz-transportable from Π to π ∗ in D if the expression P (y|do(x), w, S1 , ..., Sn ) is reducible, using the rules of the causal calculus, to an expression in which (1) do-operators that ∗ i apply to subsets of Iz have no Si -variables or (2) do-operators apply only to subsets of Iz . This result provides a powerful way to syntactically establish mz-transportability, but it is not immediately obvious whether a sequence of applications of the rules of causal calculus that achieves the reduction required by the theorem exists, and even if such sequence exists, it is not obvious how to obtain it. For concreteness, we illustrate this result using the selection diagrams in Fig. 2(a,b). Corollary 1. P ∗ (y|do(x)) is mz-transportable in Fig. 2(a,b) with Za = {Z2 } and Zb = {Z1 }. Proof. The goal is to show that R = P ∗ (y|do(x)) is mz-transportable from {πa , πb } to π ∗ using experiments conducted over {Z2 } in πa and {Z1 } in πb . Note that naively trying to transport R from each of the domains individually is not possible, but R can be decomposed as follows: P ∗ (y|do(x)) = P ∗ (y|do(x), do(Z1 )) (2) P ∗ (y|do(x), do(Z1 ), z2 )P ∗ (z2 |do(x), do(Z1 )) (3) P ∗ (y|do(x), do(Z1 ), do(z2 ))P ∗ (z2 |do(x), do(Z1 )), = (4) z2 = z2 where Eq. (2) follows by rule 3 of the causal calculus since (Z1 ⊥ Y |X)DX,Z holds, we con⊥ 1 dition on Z2 in Eq. (3), and Eq. (4) follows by rule 2 of the causal calculus since (Z2 ⊥ ⊥ Y |X, Z1 )DX,Z ,Z , where D is the diagram in π ∗ (despite the location of the S-nodes). 1 2 Now we can rewrite the ﬁrst term of Eq. (4) as indicated by the Theorem (and suggested by Def. 2): P ∗ (y|do(x), do(Z1 ), do(z2 )) = P (y|do(x), do(Z1 ), do(z2 ), Sa , Sb ) (5) = P (y|do(x), do(Z1 ), do(z2 ), Sb ) (6) = P (y|do(z2 ), Sb ) (7) = P (a) (y|do(z2 )), (8) where Eq. (5) follows from the theorem (and the deﬁnition of selection diagram), Eq. (6) follows , Eq. (7) follows from rule from rule 1 of the causal calculus since (Sa ⊥ Y |Z1 , Z2 , X)D(a) ⊥ Z1 ,Z2 ,X 3 of the causal calculus since (Z1 , X ⊥ Y |Z2 )D(a) ⊥ . Note that this equation matches with the Z1 ,Z2 ,X a syntactic goal of Theorem 1 since we have precisely do(z2 ) separated from Sa (and Z2 ∈ Iz ); so, we can rewrite the expression which results in Eq. (8) by the deﬁnition of selection diagram. Finally, we can rewrite the second term of Eq. (4) as follows: P ∗ (z2 |do(x), do(Z1 )) = P (z2 |do(x), do(Z1 ), Sa , Sb ) = P (z2 |do(x), do(Z1 ), Sa ) = P (z2 |x, do(Z1 ), Sa ) = P (b) (z2 |x, do(Z1 )), 5 (9) (10) (11) (12) where Eq. (9) follows from the theorem (and the deﬁnition of selection diagram), Eq. (10) follows from rule 1 of the causal calculus since (Sb ⊥ Z2 |Z1 , X)D(b) , Eq. (11) follows from rule 2 of ⊥ Z1 ,X the causal calculus since (X ⊥ Z2 |Z1 )D(b) . Note that this equation matches the condition of the ⊥ Z1 X theorem, separate do(Z1 ) from Sb (i.e., experiments over Z1 can be used since they are available in πb ), so we can rewrite Eq. (12) using the deﬁnition of selection diagrams, the corollary follows. The next condition for mz-transportability is more visible than Theorem 1 (albeit weaker), which also demonstrates the challenge of relating mz-transportability to other types of transportability. Corollary 2. R = P ∗ (y|do(x)) is mz-transportable in D if there exists Zi ⊆ Zi such that all paths from Zi to Y are blocked by X, (Si ⊥ Y|X, Zi )D(i) , and R is computable from do(Zi ). ⊥ X,Z i Remarkably, randomizing Z2 when applying Corollary 1 was instrumental to yield transportability in the previous example, despite the fact that the directed paths from Z2 to Y were not blocked by X, which suggests how different this transportability is from z-identiﬁability. So, it is not immediately obvious how to combine the topological relations of Zi ’s with X and Y in order to create a general condition for mz-transportability, the relationship between the distributions in the different domains can get relatively intricate, but we defer this discussion for now and consider a simpler case. It is not usually trivial to pursue a derivation of mz-transportability in causal calculus, and next we show an example in which such derivation does not even exist. Consider again the diagrams in Fig. 2(a,b), and assume that randomized experiments are available over {Z1 } in πa and {Z2 } in πb . Theorem 2. P ∗ (y|do(x)) is not mz-transportable in Fig. 2(a,b) with Za = {Z1 } and Zb = {Z2 }. Proof. Formally, we need to display two models M1 , M2 such that the following relations hold (as implied by Def. 3):  (a) (a)  PM1 (Z1 , X, Z2 , Y ) = PM2 (Z1 , X, Z2 , Y ),   (b)   P (Z1 , X, Z2 , Y ) = P (b) (Z1 , X, Z2 , Y ),  M1 M2 (a) (a) (13) PM1 (X, Z2 , Y |do(Z1 )) = PM2 (X, Z2 , Y |do(Z1 )),  (b)   P (Z , X, Y |do(Z )) = P (b) (Z , X, Y |do(Z )),  M 2 1 2  M2  ∗1 1 ∗ PM1 (Z1 , X, Z2 , Y ) = PM2 (Z1 , X, Z2 , Y ), for all values of Z1 , X, Z2 , and Y , and also, ∗ ∗ PM1 (Y |do(X)) = PM2 (Y |do(X)), (14) for some value of X and Y . Let V be the set of observable variables and U be the set of unobservable variables in D. Let us assume that all variables in U ∪ V are binary. Let U1 , U2 ∈ U be the common causes of Z1 and X and Z2 , respectively; let U3 , U4 , U5 ∈ U be a random disturbance exclusive to Z1 , Z2 , and Y , respectively, and U6 ∈ U be an extra random disturbance exclusive to Z2 , and U7 , U8 ∈ U to Y . Let Sa and Sb index the model in the following way: the tuples Sa = 1, Sb = 0 , Sa = 0, Sb = 1 , Sa = 0, Sb = 0 represent domains πa , πb , and π ∗ , respectively. Deﬁne the two models as follows:    Z1 = U1 ⊕ U2 ⊕ U3 ∧ Sa  Z1 = U1 ⊕ U2 ⊕ U3 ∧ Sa     X=U X = Z1 ⊕ U1 1 M1 = M2 =  Z2 = (X ⊕ U2 ⊕ (U4 ∧ Sa )) ∨ U6  Z2 = U4 ∧ Sa ∧ U6 ⊕ U6     Y = (Z2 ∧ U5 ) ⊕ (U5 ∧ U7 ) ⊕ (Sb ∧ U8 ) Y = (Z2 ∧ U5 ) ⊕ (U5 ∧ U7 ) ⊕ (Sb ∧ U8 ) where ⊕ represents the exclusive or function. Both models agree in respect to P (U), which is deﬁned as P (Ui ) = 1/2, i = 1, ..., 8. It is not difﬁcult to evaluate these models and note that the constraints given in Eqs. (13) and (14) are satisﬁed (including positivity), the theorem follows. 4 Algorithm for computing mz-transportability In this section, we build on previous analyses of identiﬁability [7, 21, 22, 23] in order to obtain a mechanic procedure in which a collection of selection diagrams and experimental data is inputted, and the procedure returns a transport formula whenever it is able to produce one. More speciﬁcally, 6 PROCEDURE TRmz (y, x, P, I, S, W, D) INPUT: x, y: value assignments; P: local distribution relative to domain S (S = 0 indexes π ∗ ) and active experiments I; W: weighting scheme; D: backbone of selection diagram; Si : selection nodes in πi (S0 = ∅ (i) relative to π ∗ ); [The following set and distributions are globally deﬁned: Zi , P ∗ , PZi .] (i) ∗ ∗ OUTPUT: Px (y) in terms of P ∗ , PZ , PZi or F AIL(D, C0 ). P 1 if x = ∅, return V\Y P. P 2 if V \ An(Y)D = ∅, return TRmz (y, x ∩ An(Y)D , V\An(Y)D P, I, S, W, DAn(Y) ). 3 set W = (V \ X) \ An(Y)DX . if W = ∅, return TRmz (y, x ∪ w, P, I, S, W, D). Q P 4 if C(D \ X) = {C0 , C1 , ..., Ck }, return V\{Y,X} i TRmz (ci , v \ ci , P, I, S, W, D). 5 if C(D \ X) = {C0 }, 6 if C(D) = {D}, Q P P 7 if C0 ∈ C(D), return i|Vi ∈C0 V\V (i) P/ V\V (i−1) P. D 8 9 10 11 12 D if (∃C )C0 ⊂ C ∈ C(D), (i−1) for {i|Vi ∈ C }, set κi = κi ∪ vD \C . Q (i−1) mz return TR (y, x ∩ C , i|Vi ∈C P(Vi |VD ∩ C , κi ), I, S, W, C ). else, if I`= ∅, for i = 0, ..., |D|, ´ if (Si ⊥ Y | X)D(i) ∧ (Zi ∩ X = ∅) , Ei = TRmz (y, x \ zi , P, Zi ∩ X, i, W, D \ {Zi ∩ X}). ⊥ PX (j) if |E| > 0, return |E| wi Ei . i=1 else, FAIL(D, C0 ). Figure 3: Modiﬁed version of identiﬁcation algorithm capable of recognizing mz-transportability. our algorithm is called TRmz (see Fig. 3), and is based on the C-component decomposition for identiﬁcation of causal effects [22, 23] (and a version of the identiﬁcation algorithm called ID). The rationale behind TRmz is to apply Tian’s factorization and decompose the target relation into smaller, more manageable sub-expressions, and then try to evaluate whether each sub-expression can be computed in the target domain. Whenever this evaluation fails, TRmz tries to use the experiments available from the target and, if possible, from the sources; this essentially implements the declarative condition delineated in Theorem 1. Next, we consider the soundness of the algorithm. ∗ Theorem 3 (soundness). Whenever TRmz returns an expression for Px (y), it is correct. In the sequel, we demonstrate how the algorithm works through the mz-transportability of Q = P ∗ (y|do(x)) in Fig. 2(c,d) with Z∗ = {Z1 }, Za = {Z2 }, and Zb = {Z1 }. Since (V \ X) \ An(Y)DX = {Z2 }, TRmz invokes line 3 with {Z2 } ∪ {X} as interventional set. The new call triggers line 4 and C(D \ {X, Z2 }) = {C0 , C1 , C2 , C3 }, where C0 = DZ1 , C1 = DZ3 , C2 = DU , and C3 = DW,Y , we invoke line 4 and try to mz-transport individ∗ ∗ ∗ ually Q0 = Px,z2 ,z3 ,u,w,y (z1 ), Q1 = Px,z1 ,z2 ,u,w,y (z3 ), Q2 = Px,z1 ,z2 ,z3 ,w,y (u), and Q3 = ∗ Px,z1 ,z2 ,z3 ,u (w, y). Thus the original problem reduces to try to evaluate the equivalent expression ∗ ∗ ∗ ∗ z1 ,z3 ,u,w Px,z2 ,z3 ,u,w,y (z1 )Px,z1 ,z2 ,u,w,y (z3 ) Px,z1 ,z2 ,z3 ,w,y (u)Px,z1 ,z2 ,z3 ,u (w, y). First, TRmz evaluates the expression Q0 and triggers line 2, noting that all nodes can be ignored ∗ since they are not ancestors of {Z1 }, which implies after line 1 that Px,z2 ,z3 ,u,w,y (z1 ) = P ∗ (z1 ). Second, TRmz evaluates the expression Q1 triggering line 2, which implies that ∗ ∗ Px,z1 ,z2 ,u,w,y (z3 ) = Px,z1 ,z2 (z3 ) with induced subgraph D1 = DX,Z1 ,Z2 ,Z3 . TRmz goes to line 5, in which in the local call C(D \ {X, Z1 , Z2 }) = {DZ3 }. Thus it proceeds to line 6 testing whether C(D \ {X, Z1 , Z2 }) is different from D1 , which is false. In this call, ordinary identiﬁability would fail, but TRmz proceeds to line 9. The goal of this line is to test whether some experiment can help for computing Q1 . In this case, πa fails immediately the test in line 10, but πb and π ∗ succeed, (i) which means experiments in these domains may eventually help; the new call is Px,z2 (z3 )D\Z1 , for mz i = {b, ∗} with induced graph D1 = DX,Z2 ,Z3 . Finally, TR triggers line 8 since X is not part of Z3 ’s components in D1 (or, Z3 ∈ C = {Z2 Z3 }), so line 2 is triggered since Z2 is no longer an ancestor of Z3 in D1 , and then line 1 is triggered since the interventional set is empty in (i) (i) ∗ this local call, so Px,z1 ,z2 (z3 ) = Z Pz1 (z3 |x, Z2 )Pz1 (Z2 ), for i = {b, ∗}. 2 7 ∗ Third, evaluating the expression Q2 , TRmz goes to line 2, which implies that Px,z1 ,z2 ,z3 ,w,y (u) = ∗ mz Px,z1 ,z2 ,z3 ,w (u) with induced subgraph D2 = DX,Z1 ,Z2 ,Z3 ,W,U . TR goes to line 5, and in this local call C(D \ {X, Z1 , Z2 , Z3 , W }) = {DU }, and the test in 6 succeed, since there are more components in D. So, it triggers line 8 since W is not part of U ’s component ∗ ∗ in D2 . The algorithm makes Px,z1 ,z2 ,z3 ,w (u) = Px,z1 ,z2 ,z3 (u)D2 |W (and update the working distribution); note that in this call, ordinary identiﬁability would fail since the nodes are in the same C-component and the test in line 6 fails. But TRmz proceeds to line 9 trying to ﬁnd experiments that can help in Q2 ’s computation. In this case, πb cannot help but πa (a) and π ∗ perhaps can, noting that new calls are launched for computing Px,z1 ,z3 (u)D2 \Z2 |W rel∗ ative to πa , and Px,z2 ,z3 (u)D2 \Z1 |W relative to π ∗ with the corresponding data structures set. (a) (a) In πa , the algorithm triggers line 7, which yields Px,z1 ,z3 (u)D2 \Z2 |W = Pz2 (u|w, z3 , x, z1 ), ∗ and a bit more involved analysis for πb yields (after simpliﬁcation) Px,z2 ,z3 (u)D2 \Z1 |W = ∗ ∗ ∗ ∗ ∗ Z Pz1 (z3 |x, Z2 )Pz1 (Z2 ) . Z Pz1 (u|w, z3 , x, Z2 )Pz1 (z3 |x, Z2 )Pz1 (Z2 ) / 2 2 mz Fourth, TR evaluates the expression Q3 and triggers line 5, C(D\{X, Z1 , Z2 , Z3 , U }) = DW,Y . ∗ In turn, both tests at lines 6 and 7 succeed, which makes the procedure to return Px,z1 ,z2 ,z3 ,u (w, y) = ∗ ∗ P (w|z3 , x, z1 , z2 )P (y|w, x, z1 , z2 , z3 , u). The composition of the return of these calls generates the following expression: ! ∗ Px (y) = X ∗ P (z1 ) z1 ,z3 ,w,u (2) w1 (1) w1 X ∗ ∗ Pz1 (z3 |x, Z2 )Pz1 (Z2 ) X (b) (b) Pz1 (z3 |x, Z2 )Pz1 (Z2 ) Z2 Z2 „X + (1) w2 « « „X ∗ ∗ ∗ ∗ ∗ Pz1 (z3 |x, Z2 )Pz1 (Z2 ) Pz1 (u|w, z3 , x, Z2 )Pz1 (z3 |x, Z2 )Pz1 (Z2 ) / Z2 Z2 ! (2) (a) + w2 Pz2 (u|w, z3 , x, z1 ) P ∗ (w|x, z1 , z2 , z3 ) P ∗ (y|x, z1 , z2 , z3 , w, u) (15) (k) where wi represents the weight for each factor in estimand k (i = 1, ..., nk ), and nk is the number of feasible estimands of k. Eq. (15) depicts a powerful way to estimate P ∗ (y|do(x)) in the target domain, and depending on weighting choice a different estimand will be entailed. For instance, one might use an analogous to inverse-variance weighting, which sets the weights for the normalized (k) nk −2 −2 2 inverse of their variances (i.e., wi = σi / j=1 σj , where σj is the variance of the jth component of estimand k). Our strategy resembles the approach taken in meta-analysis [4], albeit the latter usually disregards the intricacies of the relationships between variables, so producing a statistically less powerful estimand. Our method leverages this non-trivial and highly structured relationships, as exempliﬁed in Eq. (15), which yields an estimand with less variance and statistically more powerful. 5 Conclusions In this paper, we treat a special type of transportability in which experiments can be conducted only over limited sets of variables in the sources and target domains, and the goal is to infer whether a certain effect can be estimated in the target using the information scattered throughout the domains. We provide a general sufﬁcient graphical conditions for transportability based on the causal calculus along with a necessary condition for a speciﬁc scenario, which should be generalized for arbitrary structures. We further provide a procedure for computing transportability, that is, generate a formula for fusing the available observational and experimental data to synthesize an estimate of the desired causal effects. Our algorithm also allows for generic weighting schemes, which generalizes standard statistical procedures and leads to the construction of statistically more powerful estimands. Acknowledgment The work of Judea Pearl and Elias Bareinboim was supported in part by grants from NSF (IIS1249822, IIS-1302448), and ONR (N00014-13-1-0153, N00014-10-1-0933). The work of Sanghack Lee and Vasant Honavar was partially completed while they were with the Department of Computer Science at Iowa State University. The work of Vasant Honavar while working at the National Science Foundation (NSF) was supported by the NSF. The work of Sanghack Lee was supported in part by the grant from NSF (IIS-0711356). Any opinions, ﬁndings, and conclusions contained in this article are those of the authors and do not necessarily reﬂect the views of the sponsors. 8 References [1] J. Pearl and E. Bareinboim. Transportability of causal and statistical relations: A formal approach. In W. Burgard and D. Roth, editors, Proceedings of the Twenty-Fifth National Conference on Artiﬁcial Intelligence, pages 247–254. AAAI Press, Menlo Park, CA, 2011. [2] D. Campbell and J. Stanley. Experimental and Quasi-Experimental Designs for Research. Wadsworth Publishing, Chicago, 1963. [3] C. Manski. Identiﬁcation for Prediction and Decision. Harvard University Press, Cambridge, Massachusetts, 2007. [4] L. V. Hedges and I. Olkin. Statistical Methods for Meta-Analysis. Academic Press, January 1985. [5] W.R. Shadish, T.D. Cook, and D.T. Campbell. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton-Mifﬂin, Boston, second edition, 2002. [6] S. Morgan and C. Winship. Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research). Cambridge University Press, New York, NY, 2007. [7] J. Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669–710, 1995. [8] P. Spirtes, C.N. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, Cambridge, MA, 2nd edition, 2000. [9] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2000. 2nd edition, 2009. [10] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009. [11] E. Bareinboim and J. Pearl. Transportability of causal effects: Completeness results. In J. Hoffmann and B. Selman, editors, Proceedings of the Twenty-Sixth National Conference on Artiﬁcial Intelligence, pages 698–704. AAAI Press, Menlo Park, CA, 2012. [12] E. Bareinboim and J. Pearl. Causal transportability with limited experiments. In M. desJardins and M. Littman, editors, Proceedings of the Twenty-Seventh National Conference on Artiﬁcial Intelligence, pages 95–101, Menlo Park, CA, 2013. AAAI Press. [13] S. Lee and V. Honavar. Causal transportability of experiments on controllable subsets of variables: ztransportability. In A. Nicholson and P. Smyth, editors, Proceedings of the Twenty-Ninth Conference on Uncertainty in Artiﬁcial Intelligence (UAI), pages 361–370. AUAI Press, 2013. [14] E. Bareinboim and J. Pearl. Meta-transportability of causal effects: A formal approach. In C. Carvalho and P. Ravikumar, editors, Proceedings of the Sixteenth International Conference on Artiﬁcial Intelligence and Statistics (AISTATS), pages 135–143. JMLR W&CP; 31, 2013. [15] S. Lee and V. Honavar. m-transportability: Transportability of a causal effect from multiple environments. In M. desJardins and M. Littman, editors, Proceedings of the Twenty-Seventh National Conference on Artiﬁcial Intelligence, pages 583–590, Menlo Park, CA, 2013. AAAI Press. [16] H. Daume III and D. Marcu. Domain adaptation for statistical classiﬁers. Journal of Artiﬁcial Intelligence Research, 26:101–126, 2006. [17] A.J. Storkey. When training and test sets are different: characterising learning transfer. In J. Candela, M. Sugiyama, A. Schwaighofer, and N.D. Lawrence, editors, Dataset Shift in Machine Learning, pages 3–28. MIT Press, Cambridge, MA, 2009. [18] B. Sch¨ lkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij. On causal and anticausal o learning. In J Langford and J Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML), pages 1255–1262, New York, NY, USA, 2012. Omnipress. [19] K. Zhang, B. Sch¨ lkopf, K. Muandet, and Z. Wang. Domain adaptation under target and conditional o shift. In Proceedings of the 30th International Conference on Machine Learning (ICML). JMLR: W&CP; volume 28, 2013. [20] E. Bareinboim and J. Pearl. Causal inference by surrogate experiments: z-identiﬁability. In N. Freitas and K. Murphy, editors, Proceedings of the Twenty-Eighth Conference on Uncertainty in Artiﬁcial Intelligence (UAI), pages 113–120. AUAI Press, 2012. [21] M. Kuroki and M. Miyakawa. Identiﬁability criteria for causal effects of joint interventions. Journal of the Royal Statistical Society, 29:105–117, 1999. [22] J. Tian and J. Pearl. A general identiﬁcation condition for causal effects. In Proceedings of the Eighteenth National Conference on Artiﬁcial Intelligence, pages 567–573. AAAI Press/The MIT Press, Menlo Park, CA, 2002. [23] I. Shpitser and J. Pearl. Identiﬁcation of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First National Conference on Artiﬁcial Intelligence, pages 1219–1226. AAAI Press, Menlo Park, CA, 2006. 9

5 0.3216508 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models

Author: Khaled Refaat, Arthur Choi, Adnan Darwiche

Abstract: EDML is a recently proposed algorithm for learning parameters in Bayesian networks. It was originally derived in terms of approximate inference on a metanetwork, which underlies the Bayesian approach to parameter estimation. While this initial derivation helped discover EDML in the ﬁrst place and provided a concrete context for identifying some of its properties (e.g., in contrast to EM), the formal setting was somewhat tedious in the number of concepts it drew on. In this paper, we propose a greatly simpliﬁed perspective on EDML, which casts it as a general approach to continuous optimization. The new perspective has several advantages. First, it makes immediate some results that were non-trivial to prove initially. Second, it facilitates the design of EDML algorithms for new graphical models, leading to a new algorithm for learning parameters in Markov networks. We derive this algorithm in this paper, and show, empirically, that it can sometimes learn estimates more efﬁciently from complete data, compared to commonly used optimization methods, such as conjugate gradient and L-BFGS. 1

6 0.30890593 350 nips-2013-Wavelets on Graphs via Deep Learning

7 0.27971959 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields

8 0.27644676 186 nips-2013-Matrix factorization with binary components

9 0.267088 34 nips-2013-Analyzing Hogwild Parallel Gaussian Gibbs Sampling

10 0.26674846 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning

11 0.26185614 134 nips-2013-Graphical Models for Inference with Missing Data

12 0.25793958 184 nips-2013-Marginals-to-Models Reducibility

13 0.2554552 199 nips-2013-More data speeds up training time in learning halfspaces over sparse vectors

14 0.2489195 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests

15 0.24566609 343 nips-2013-Unsupervised Structure Learning of Stochastic And-Or Grammars

16 0.24421559 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks

17 0.23884317 340 nips-2013-Understanding variable importances in forests of randomized trees

18 0.23227151 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction

19 0.23045503 139 nips-2013-How to Hedge an Option Against an Adversary: Black-Scholes Pricing is Minimax Optimal

20 0.22281492 167 nips-2013-Learning the Local Statistics of Optical Flow

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(16, 0.024), (33, 0.052), (34, 0.478), (41, 0.019), (49, 0.024), (56, 0.067), (59, 0.014), (70, 0.02), (85, 0.029), (89, 0.014), (93, 0.024), (95, 0.126), (99, 0.01)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.92619514 256 nips-2013-Probabilistic Principal Geodesic Analysis

Author: Miaomiao Zhang, P.T. Fletcher

Abstract: Principal geodesic analysis (PGA) is a generalization of principal component analysis (PCA) for dimensionality reduction of data on a Riemannian manifold. Currently PGA is deﬁned as a geometric ﬁt to the data, rather than as a probabilistic model. Inspired by probabilistic PCA, we present a latent variable model for PGA that provides a probabilistic framework for factor analysis on manifolds. To compute maximum likelihood estimates of the parameters in our model, we develop a Monte Carlo Expectation Maximization algorithm, where the expectation is approximated by Hamiltonian Monte Carlo sampling of the latent variables. We demonstrate the ability of our method to recover the ground truth parameters in simulated sphere data, as well as its effectiveness in analyzing shape variability of a corpus callosum data set from human brain images. 1

same-paper 2 0.92526281 122 nips-2013-First-order Decomposition Trees

Author: Nima Taghipour, Jesse Davis, Hendrik Blockeel

3 0.91829938 210 nips-2013-Noise-Enhanced Associative Memories

Author: Amin Karbasi, Amir Hesam Salavati, Amin Shokrollahi, Lav R. Varshney

Abstract: Recent advances in associative memory design through structured pattern sets and graph-based inference algorithms allow reliable learning and recall of exponential numbers of patterns. Though these designs correct external errors in recall, they assume neurons compute noiselessly, in contrast to highly variable neurons in hippocampus and olfactory cortex. Here we consider associative memories with noisy internal computations and analytically characterize performance. As long as internal noise is less than a speciﬁed threshold, error probability in the recall phase can be made exceedingly small. More surprisingly, we show internal noise actually improves performance of the recall phase. Computational experiments lend additional support to our theoretical analysis. This work suggests a functional beneﬁt to noisy neurons in biological neuronal networks. 1

4 0.91817892 351 nips-2013-What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach

Author: Zhenwen Dai, Georgios Exarchakis, Jörg Lücke

Abstract: We study optimal image encoding based on a generative approach with non-linear feature combinations and explicit position encoding. By far most approaches to unsupervised learning of visual features, such as sparse coding or ICA, account for translations by representing the same features at different positions. Some earlier models used a separate encoding of features and their positions to facilitate invariant data encoding and recognition. All probabilistic generative models with explicit position encoding have so far assumed a linear superposition of components to encode image patches. Here, we for the ﬁrst time apply a model with non-linear feature superposition and explicit position encoding for patches. By avoiding linear superpositions, the studied model represents a closer match to component occlusions which are ubiquitous in natural images. In order to account for occlusions, the non-linear model encodes patches qualitatively very different from linear models by using component representations separated into mask and feature parameters. We ﬁrst investigated encodings learned by the model using artiﬁcial data with mutually occluding components. We ﬁnd that the model extracts the components, and that it can correctly identify the occlusive components with the hidden variables of the model. On natural image patches, the model learns component masks and features for typical image components. By using reverse correlation, we estimate the receptive ﬁelds associated with the model’s hidden units. We ﬁnd many Gabor-like or globular receptive ﬁelds as well as ﬁelds sensitive to more complex structures. Our results show that probabilistic models that capture occlusions and invariances can be trained efﬁciently on image patches, and that the resulting encoding represents an alternative model for the neural encoding of images in the primary visual cortex. 1

5 0.91139013 202 nips-2013-Multiclass Total Variation Clustering

Author: Xavier Bresson, Thomas Laurent, David Uminsky, James von Brecht

Abstract: Ideas from the image processing literature have recently motivated a new set of clustering algorithms that rely on the concept of total variation. While these algorithms perform well for bi-partitioning tasks, their recursive extensions yield unimpressive results for multiclass clustering tasks. This paper presents a general framework for multiclass total variation clustering that does not rely on recursion. The results greatly outperform previous total variation algorithms and compare well with state-of-the-art NMF approaches. 1

6 0.90553778 129 nips-2013-Generalized Random Utility Models with Multiple Types

7 0.89498252 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC

8 0.88302439 38 nips-2013-Approximate Dynamic Programming Finally Performs Well in the Game of Tetris

9 0.8671546 219 nips-2013-On model selection consistency of penalized M-estimators: a geometric theory

10 0.86107391 143 nips-2013-Integrated Non-Factorized Variational Inference

11 0.75788546 347 nips-2013-Variational Planning for Graph-based MDPs

12 0.75406408 346 nips-2013-Variational Inference for Mahalanobis Distance Metrics in Gaussian Process Regression

13 0.75012732 39 nips-2013-Approximate Gaussian process inference for the drift function in stochastic differential equations

14 0.7107318 86 nips-2013-Demixing odors - fast inference in olfaction

15 0.71069205 322 nips-2013-Symbolic Opportunistic Policy Iteration for Factored-Action MDPs

16 0.70855594 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents

17 0.70674562 220 nips-2013-On the Complexity and Approximation of Binary Evidence in Lifted Inference

18 0.70568675 312 nips-2013-Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex

19 0.70373982 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture

20 0.70126504 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning