jmlr2008-20 (JMLR 2008): Causal Reasoning with Ancestral Graphs (Special Topic on Causality)
Source: pdf
Author: Jiji Zhang
Abstract: Causal reasoning is primarily concerned with what would happen to a system under external interventions. In particular, we are often interested in predicting the probability distribution of some random variables that would result if some other variables were forced to take certain values. One prominent approach to tackling this problem is based on causal Bayesian networks, using directed acyclic graphs as causal diagrams to relate post-intervention probabilities to pre-intervention probabilities that are estimable from observational data. However, such causal diagrams are seldom fully testable given observational data. In consequence, many causal discovery algorithms based on data-mining can only output an equivalence class of causal diagrams (rather than a single one). This paper is concerned with causal reasoning given an equivalence class of causal diagrams, represented by a (partial) ancestral graph. We present two main results. The first result extends Pearl (1995)’s celebrated do-calculus to the context of ancestral graphs. In the second result, we focus on a key component of Pearl’s calculus—the property of invariance under interventions, and give stronger graphical conditions for this property than those implied by the first result. The second result also improves the earlier, similar results due to Spirtes et al. (1993). Keywords: ancestral graphs, causal Bayesian network, do-calculus, intervention
Reference: text
sentIndex sentText sentNum sentScore
1 One prominent approach to tackling this problem is based on causal Bayesian networks, using directed acyclic graphs as causal diagrams to relate post-intervention probabilities to pre-intervention probabilities that are estimable from observational data. [sent-6, score-0.749]
2 In consequence, many causal discovery algorithms based on data-mining can only output an equivalence class of causal diagrams (rather than a single one). [sent-8, score-0.633]
3 This paper is concerned with causal reasoning given an equivalence class of causal diagrams, represented by a (partial) ancestral graph. [sent-9, score-0.785]
4 Keywords: ancestral graphs, causal Bayesian network, do-calculus, intervention 1. [sent-15, score-0.528]
5 A prominent machinery for causal reasoning of this kind is known as causal Bayesian network (Spirtes et al. [sent-20, score-0.584]
6 So even if the causal DAG (with latent variables) is fully known, we may not be able to predict certain intervention effects because we only have data from the marginal distribution over the observed variables instead of the joint distribution over all causally relevant variables. [sent-27, score-0.531]
7 The familiar curse is that very rarely can observational data determine a unique causal structure, and many causal discovery algorithms in the literature output an equivalence class of causal structures based on observational data (Spirtes et al. [sent-32, score-0.983]
8 The first part is what some causal discovery algorithms attempt to achieve, namely, to learn something about the causal structure—usually features shared by all causal structures in an equivalence class—from the pre-intervention distribution of O. [sent-48, score-0.925]
9 First, we extend the do-calculus of Pearl (1995) to the context of ancestral graphs (Section 4), so that the resulting calculus is based on an equivalence class of causal DAGs with latent variables rather than a single one. [sent-58, score-0.657]
10 Our result improves upon the Spirtes-Glymour-Scheines conditions for invariance formulated with respect to the so-called inducing path graphs, whose relationship with ancestral graphs is discussed in Appendix A. [sent-61, score-0.654]
11 In a causal Bayesian network, the DAG G is interpreted causally, as a representation of the causal structure over V. [sent-64, score-0.584]
12 The postulate that the (pre-intervention) joint distribution P factorizes according to the causal DAG G is known as the causal Markov condition. [sent-67, score-0.584]
13 Note that in the case of a null intervention (when X = ∅), the intervention principle implies the factorization of the pre-intervention distribution P according to G, which is just the causal Markov condition. [sent-75, score-0.522]
14 So the intervention principle generalizes the causal Markov condition: it assumes that the post-intervention distribution also satisfies the causal Markov condition with the post-intervention causal graph. [sent-76, score-0.991]
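The truncated-factorization reading of the intervention principle can be sketched numerically. Below is a minimal, self-contained example on a hypothetical three-variable model L → X → Y with L → Y (all names and numbers are illustrative, not from the paper): dropping the factor P(x|l) yields the post-intervention P(y|do(x)), which differs from the observational P(y|x).

```python
# Hypothetical binary causal Bayesian network: L -> X, L -> Y, X -> Y.
pL = {0: 0.6, 1: 0.4}                       # P(l)
pX_L = {(0, 0): 0.8, (1, 0): 0.2,
        (0, 1): 0.3, (1, 1): 0.7}           # P(x | l)
pY_XL = {(1, 0, 0): 0.1, (1, 1, 0): 0.5,
         (1, 0, 1): 0.4, (1, 1, 1): 0.9}    # P(y=1 | x, l)

def pre_joint(l, x, y):
    # Pre-intervention factorization: P(l, x, y) = P(l) P(x|l) P(y|x,l)
    py1 = pY_XL[(1, x, l)]
    return pL[l] * pX_L[(x, l)] * (py1 if y == 1 else 1 - py1)

def post_do_x(l, y, x):
    # Truncated factorization under do(X = x): the factor P(x|l) is removed
    py1 = pY_XL[(1, x, l)]
    return pL[l] * (py1 if y == 1 else 1 - py1)

# P(y=1 | do(x=1)) via the truncated product
p_do = sum(post_do_x(l, 1, 1) for l in (0, 1))
# P(y=1 | x=1) via ordinary conditioning in the pre-intervention joint
num = sum(pre_joint(l, 1, 1) for l in (0, 1))
den = sum(pre_joint(l, 1, y) for l in (0, 1) for y in (0, 1))
p_cond = num / den
```

Because L confounds X and Y, the two quantities disagree, which is exactly why the interventional query is not the same as the conditional one.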
15 Given a set of variables V, and two variables A, B ∈ V, a variable C (not necessarily included in V) is called a common direct cause of A and B relative to V if C has a direct causal influence on A and also a direct causal influence on B relative to V ∪ {C}. [sent-82, score-0.726]
16 Taking both complications into account, the interesting question is this: what causal reasoning is warranted given the causal information learnable by algorithms that do not assume causal sufficiency for the set of observed variables, such as the FCI algorithm presented in Spirtes et al. [sent-90, score-0.876]
17 In Figure 1(a), for example, B is a collider on the path ⟨A, B, D⟩, but is a non-collider on the path ⟨C, B, D⟩. [sent-128, score-0.568]
18 A collider path is a path on which every vertex except for the endpoints is a collider. [sent-129, score-0.64]
19 For example, in Figure 1(a), the path ⟨C, A, B, D⟩ is a collider path because both A and B are colliders on the path. [sent-130, score-0.589]
20 An inducing path relative to L is a path on which every vertex not in L (except for the endpoints) is a collider on the path and every collider is an ancestor of an endpoint of the path. [sent-132, score-1.314]
21 For example, any single-edge path is trivially an inducing path relative to any set of vertices (because the definition does not constrain the endpoints of the path). [sent-133, score-0.723]
22 In Figure 1(a), the path ⟨C, B, D⟩ is an inducing path relative to {B}, but not an inducing path relative to the empty set (because B is not a collider). [sent-134, score-1.145]
23 However, the path ⟨C, A, B, D⟩ is an inducing path relative to the empty set, because both A and B are colliders on the path, A is an ancestor of D, and B is an ancestor of C. [sent-135, score-0.865]
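The inducing-path definition above can be checked mechanically. Here is a minimal sketch for DAGs given as sets of (parent, child) edges; the four-variable graph below is hypothetical, not the paper's Figure 1(a):

```python
# Hypothetical DAG: C -> B, A -> B, A -> D, B -> D.
edges = {("C", "B"), ("A", "B"), ("A", "D"), ("B", "D")}

def ancestors(v):
    # proper ancestors of v via repeated parent lookup
    anc, frontier = set(), {v}
    while frontier:
        frontier = {p for (p, c) in edges if c in frontier} - anc
        anc |= frontier
    return anc

def is_collider(prev, v, nxt):
    # v is a collider on a path iff both path edges point into v
    return (prev, v) in edges and (nxt, v) in edges

def is_inducing_path(path, L):
    x, y = path[0], path[-1]
    for i in range(1, len(path) - 1):
        v = path[i]
        coll = is_collider(path[i - 1], v, path[i + 1])
        if v not in L and not coll:
            return False          # non-collider not in L
        if coll and v not in ancestors(x) and v not in ancestors(y):
            return False          # collider must be an ancestor of an endpoint
    return True
```

As the text notes, a single-edge path is trivially inducing relative to any L, since the definition constrains only the non-endpoint vertices.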
24 for each pair of variables A, B ∈ O, A and B are adjacent in MG if and only if there is an inducing path between them relative to L in G; [sent-168, score-0.563]
25 So, if G is the causal DAG for O, L , it is fair to call MG the causal MAG for O. [sent-172, score-0.584]
26 Different causal DAGs may correspond to the same causal MAG. [sent-174, score-0.584]
27 A causal MAG thus carries uncertainty about what the true causal DAG is, but also reveals features that must be satisfied by the underlying causal DAG. [sent-176, score-0.876]
28 There is then a natural causal interpretation of the edges in MAGs, derivative from the causal interpretation of DAGs. [sent-177, score-0.633]
29 (i.e., there is a latent variable L in the underlying DAG such that there is a directed path from L to A and a directed path from L to B). [sent-180, score-0.731]
30 The causal MAG that corresponds to the causal DAG is depicted in Figure 3(a), which syntactically happens to be a DAG in this case. [sent-193, score-0.603]
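The DAG-to-MAG construction described in the surrounding text (A and B are adjacent iff there is an inducing path between them relative to L; an endpoint gets a tail iff it is an ancestor of the other endpoint) can be prototyped directly. The DAG below, with a single latent variable H, is a hypothetical example, not the paper's figure:

```python
from itertools import combinations

# Hypothetical DAG over O = {A, B, D} with latent H: H -> A, H -> B, A -> D, B -> D.
O, L = {"A", "B", "D"}, {"H"}
edges = {("H", "A"), ("H", "B"), ("A", "D"), ("B", "D")}

def ancestors(v):
    anc, frontier = set(), {v}
    while frontier:
        frontier = {p for (p, c) in edges if c in frontier} - anc
        anc |= frontier
    return anc

def simple_paths(x, y):
    # all simple paths between x and y in the skeleton, by DFS
    stack = [(x, (x,))]
    while stack:
        v, path = stack.pop()
        if v == y:
            yield path
            continue
        nbrs = {c for (p, c) in edges if p == v} | {p for (p, c) in edges if c == v}
        for w in sorted(nbrs - set(path)):
            stack.append((w, path + (w,)))

def inducing(path):
    x, y = path[0], path[-1]
    for i in range(1, len(path) - 1):
        v = path[i]
        coll = (path[i - 1], v) in edges and (path[i + 1], v) in edges
        if v not in L and not coll:
            return False
        if coll and v not in ancestors(x) and v not in ancestors(y):
            return False
    return True

# Adjacency via inducing paths relative to L; marks via ancestry.
# Mark encoding: "-" tail, "<"/">" arrowhead, so "->" is a directed
# edge and "<>" is a bi-directed edge.
mag = {}
for a, b in combinations(sorted(O), 2):
    if any(inducing(p) for p in simple_paths(a, b)):
        mark_a = "-" if a in ancestors(b) else "<"
        mark_b = "-" if b in ancestors(a) else ">"
        mag[(a, b)] = mark_a + mark_b
```

On this input the latent common cause H produces the bi-directed edge A ↔ B, while the direct causal edges come out as A → D and B → D, illustrating the causal reading of MAG edges described above.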
31 Assuming the causal Markov condition and its converse, the causal Faithfulness condition, there is a provably correct independence-constraint-based algorithm to learn a PAG from an oracle of conditional independence relations (Spirtes et al. [sent-232, score-0.632]
32 A path ⟨V0 = X, . . . , Vn = Y⟩ is called a possibly directed path from X to Y if for every 0 < i ≤ n, the edge between Vi−1 and Vi is not into Vi−1. [sent-240, score-0.515]
33 In terms of d-separation, the causal Markov condition says that d-separation in a causal DAG implies conditional independence in the (pre-intervention) population distribution. [sent-252, score-0.606]
34 The causal Faithfulness condition says that d-connection in a causal DAG implies conditional dependence in the (pre-intervention) population distribution. [sent-253, score-0.584]
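A compact way to test the d-separation relation invoked by the Markov and Faithfulness conditions is the moralization criterion: X and Y are d-separated by Z in a DAG iff they are disconnected in the moralized ancestral subgraph over {X, Y} ∪ Z. A minimal sketch on a hypothetical DAG A → B ← C, B → D:

```python
# Hypothetical DAG: A -> B <- C, B -> D.
edges = {("A", "B"), ("C", "B"), ("B", "D")}

def ancestors_of(vs):
    anc, changed = set(vs), True
    while changed:
        changed = False
        for (p, c) in edges:
            if c in anc and p not in anc:
                anc.add(p)
                changed = True
    return anc

def d_separated(x, y, z):
    keep = ancestors_of({x, y} | set(z))
    sub = {(p, c) for (p, c) in edges if p in keep and c in keep}
    und = {frozenset(e) for e in sub}          # undirected skeleton
    for v in keep:                             # moralize: marry co-parents
        ps = [p for (p, c) in sub if c == v]
        und |= {frozenset((p, q)) for p in ps for q in ps if p != q}
    seen, stack = set(z), [x]                  # search avoiding Z
    while stack:
        v = stack.pop()
        if v == y:
            return False
        if v in seen:
            continue
        seen.add(v)
        stack.extend(w for e in und if v in e for w in e if w != v)
    return True
```

Note how the collider B behaves as in the text: A and C are d-separated by the empty set but become d-connected once B is conditioned on.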
35 It is not hard to see that if there is a definite m-connecting path between X and Y given Z in a PAG, then in every MAG represented by the PAG, the corresponding path is an m-connecting path between X and Y given Z. [sent-283, score-0.734]
36 For example, in Figure 4 the path ⟨I, S, G⟩ is definitely m-connecting given L, and this path is m-connecting given L in every member of the equivalence class. [sent-284, score-0.588]
37 Another analogue of m-connecting path is the following: Definition 5 (Possibly m-connecting path) In a partial mixed graph, a path p between vertices X and Y is possibly m-connecting relative to a (possibly empty) set of vertices Z (X, Y ∉ Z) if: [sent-286, score-0.635]
38 In the PAG, the path ⟨X, Y, Z, W⟩ is a possibly m-connecting path but not a definite m-connecting path relative to {Y, Z}, because Y and Z are neither colliders nor definite non-colliders on the path. [sent-293, score-0.775]
39 So unlike a definite m-connecting path, a mere possibly m-connecting path in a PAG does not necessarily correspond to an m-connecting path (or imply the existence of an m-connecting path) in a representative MAG in the equivalence class. [sent-296, score-0.528]
40 Figure 5: Difference between possible and definite m-connecting paths: in the PAG on the right, ⟨X, Y, Z, W⟩ is a possibly m-connecting path relative to {Y, Z} but not a definite m-connecting path relative to {Y, Z}. [sent-304, score-0.598]
41 Definition 8 (Visibility) Given a MAG M , a directed edge A → B in M is visible if there is a vertex C not adjacent to B, such that either there is an edge between C and A that is into A, or there is a collider path between C and A that is into A and every vertex on the path is a parent of B. [sent-343, score-1.282]
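The first clause of Definition 8 (a vertex C not adjacent to B with an edge into A) can be tested directly; the collider-path clause is omitted from this sketch for brevity. The two-edge MAG below is hypothetical:

```python
# Hypothetical MAG: C <-> A, A -> B.
# Edge representation: (u, w) -> (mark at u, mark at w); '-' tail, '>' arrowhead.
mag = {
    ("C", "A"): (">", ">"),   # C <-> A
    ("A", "B"): ("-", ">"),   # A -> B
}

def adjacent(u, v):
    return (u, v) in mag or (v, u) in mag

def into(v, edge):
    # the edge is "into" v iff the mark at v is an arrowhead
    u, w = edge
    return mag[edge][0 if v == u else 1] == ">"

def visible_simple(a, b):
    # a -> b is visible if some vertex c not adjacent to b has an edge into a
    if mag.get((a, b)) != ("-", ">"):
        return False
    for edge in mag:
        for c in edge:
            if c not in (a, b) and a in edge and not adjacent(c, b) and into(a, edge):
                return True
    return False
```

Here C ↔ A is into A and C is not adjacent to B, so A → B comes out visible, matching the single-edge clause of the definition.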
42 For any A, B ∈ O, if A ∈ AnG (B), and there is an inducing path relative to L between A and B that is into A in G , then there is a directed edge A → B in M that is invisible. [sent-348, score-0.7]
43 Taking the contrapositive of Lemma 9 gives us the fact that if A → B is visible in a MAG, then in every DAG represented by the MAG, there is no inducing path between A and B relative to the set of latent variables that is also into A. [sent-357, score-0.689]
44 This implies that for every such DAG G, G A —the graph resulting from eliminating edges out of A in G—will not contain any inducing path between A and B relative to the set of latent variables, which means that the MAG that represents G A will not contain any edge between A and B. [sent-358, score-0.791]
45 Obviously A ← LAB → B is an inducing path between A and B relative to the set of latent variables. [sent-365, score-0.532]
46 The definition of visibility still makes sense in PAGs, except that we will call a directed edge in a PAG definitely visible if it satisfies the condition for visibility in Definition 8, in order to emphasize that this edge is visible in all MAGs in the equivalence class. [sent-434, score-0.591]
47 For any A, B ∈ O and C ⊆ O that does not contain A or B, if a path p between A and B is mconnecting given C in MYX , then p, the same sequence of variables, forms a possibly m-connecting path between A and B given C in PYX . [sent-450, score-0.502]
48 For our purpose, what we need is the obvious consequence of the lemma that if there is an m-connecting path in MYX , then there is a possibly m-connecting path in PYX . [sent-461, score-0.526]
49 What they meant is that if the conditions are not satisfied, then the causal structure does not entail the invariance, although there may exist some particular distribution compatible with the causal structure such that P(y|z) is invariant under some particular intervention on X. [sent-530, score-0.793]
50 From now on when we speak of invariance entailed by the causal DAG, we mean that the conditions in Proposition 18 are satisfied—or equivalently, that the invariance follows from an application of rule 2 or rule 3 in the DAG-based do-calculus. [sent-531, score-0.569]
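For reference, the two DAG-based rules invoked here can be stated as follows; these are the standard statements from Pearl (1995), not reproduced from this extraction. Here $G_{\overline{X}}$ denotes $G$ with edges into $X$ deleted and $G_{\underline{X}}$ denotes $G$ with edges out of $X$ deleted:

```latex
% Rule 2 (exchange of actions and observations):
P(y \mid \hat{x}, \hat{z}, w) = P(y \mid \hat{x}, z, w)
  \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\,\underline{Z}}}

% Rule 3 (deletion of actions):
P(y \mid \hat{x}, \hat{z}, w) = P(y \mid \hat{x}, w)
  \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\,\overline{Z(W)}}}
```

where $Z(W)$ is the set of $Z$-nodes that are not ancestors of any $W$-node in $G_{\overline{X}}$.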
51 For any A, B ∈ O and C ⊆ O that does not contain A or B, if there is a path d-connecting A and B given C in G that is into A, then there is a path m-connecting A and B given C in M that is either into A or contains an invisible edge out of A. [sent-556, score-0.712]
52 For example, given the MAG in Figure 3(a), P(L|G, S) is invariant under interventions on S, because the only m-connecting path between L and S given G is ⟨L, S⟩, which contains a visible directed edge out of S, and so the relevant clause in Theorem 24, clause (1), is satisfied. [sent-586, score-0.816]
53 By contrast, P(L|G, S) is not entailed to be invariant under interventions on G given the MAG—in the sense that there exists a causal DAG compatible with the MAG given which P(L|G, S) is not entailed to be invariant under interventions on G—because clause (1) is not satisfied. [sent-587, score-0.845]
54 Furthermore, if there is an m-connecting path in M that is either into A or out of A with an invisible directed edge, then there is a definite m-connecting path in P that does not start with a definitely visible edge out of A. [sent-601, score-0.895]
55 P(L|G, S) is also invariant under interventions on S because the only definitely m-connecting path between L and S given {G} is S → L, which contains a definitely visible edge out of S, satisfying the relevant clause, clause (1), in Theorem 30. [sent-638, score-0.598]
56 We include in Appendix A a description of the inducing path graphs (IPGs) as well as their relationship to ancestral graphs. [sent-662, score-0.566]
57 As shown there, syntactically the class of ancestral graphs is a proper subclass of the class of inducing path graphs. [sent-663, score-0.585]
58 Regarding the case in Figure 5, for example, their theorems do not imply that P(W |Y, Z) is entailed to be invariant under interventions on X, due to the presence of the possibly m-connecting path in the graph (which in this case is also the POIPG). [sent-676, score-0.532]
59 In this paper we have provided some results about causal reasoning under weaker causal assumptions, represented by a maximal ancestral graph or a partial ancestral graph, the latter of which is fully testable with observational data (assuming the causal Faithfulness condition). [sent-682, score-1.227]
60 In particular, we justify an independent syntactic definition of inducing path graphs, which makes it clear that syntactically the class of MAGs is a subclass of inducing path graphs. [sent-708, score-0.841]
61 An inducing path graph (IPG) is a directed mixed graph, defined relative to a DAG through the following construction (input: a DAG G over ⟨O, L⟩; output: an IPG IG over O): 1. [sent-709, score-0.644]
62 for each pair of variables A, B ∈ O, A and B are adjacent in IG if and only if there is an inducing path between them relative to L in G; 2. [sent-710, score-0.563]
63 for each pair of adjacent vertices A, B in IG , mark the A-end of the edge as an arrowhead if there is an inducing path between A and B that is into A, otherwise mark the A-end of the edge as a tail. [sent-711, score-0.898]
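To see concretely how IPG end marks can differ from MAG end marks (and why the IPG construction can yield a non-ancestral graph), consider the hypothetical DAG H → A, H → B, A → B with H latent: the MAG orients A → B, since A is an ancestor of B, while the IPG orients A ↔ B, because the inducing path A ← H → B is into A. A self-contained sketch:

```python
# Hypothetical DAG: H latent; H -> A, H -> B, A -> B.
O, L = {"A", "B"}, {"H"}
edges = {("H", "A"), ("H", "B"), ("A", "B")}

def ancestors(v):
    anc, frontier = set(), {v}
    while frontier:
        frontier = {p for (p, c) in edges if c in frontier} - anc
        anc |= frontier
    return anc

def simple_paths(x, y):
    stack = [(x, (x,))]
    while stack:
        v, path = stack.pop()
        if v == y:
            yield path
            continue
        nbrs = {c for (p, c) in edges if p == v} | {p for (p, c) in edges if c == v}
        for w in sorted(nbrs - set(path)):
            stack.append((w, path + (w,)))

def inducing(path):
    x, y = path[0], path[-1]
    for i in range(1, len(path) - 1):
        v = path[i]
        coll = (path[i - 1], v) in edges and (path[i + 1], v) in edges
        if v not in L and not coll:
            return False
        if coll and v not in ancestors(x) and v not in ancestors(y):
            return False
    return True

def ipg_mark(a, b):
    # IPG rule: arrowhead at a iff some inducing path between a and b is into a
    ps = [p for p in simple_paths(a, b) if inducing(p)]
    return ">" if any((p[1], p[0]) in edges for p in ps) else "-"

def mag_mark(a, b):
    # MAG rule: tail at a iff a is an ancestor of b
    return "-" if a in ancestors(b) else ">"
```

The IPG puts arrowheads at both ends of the A-B edge even though A is an ancestor of B, so the result is not ancestral, which matches the claim above that MAGs form a proper syntactic subclass of IPGs.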
64 Furthermore, IG encodes information about inducing paths in the original graph, which in turn implies features of the original DAG that bear causal significance. [sent-713, score-0.519]
65 If there is a directed cycle ⟨O1, . . . , On, O1⟩ in IG, then between any pair of adjacent nodes in the cycle, Oi and Oi+1 (1 ≤ i ≤ n and On+1 = O1), there is an inducing path between them in G relative to L. [sent-728, score-0.54]
66 By the construction, there is an inducing path relative to L in G between A and O1 that is into O1, and an inducing path relative to L in G between B and On that is into On, and for every 1 ≤ i ≤ n − 1, there is an inducing path relative to L in G between Oi and Oi+1 that is into both. [sent-741, score-1.399]
67 By Lemma 32 in Appendix B, it follows that there is an inducing path between A and B relative to L in G , and so A and B should be adjacent in I , a contradiction. [sent-742, score-0.54]
68 So there is no inducing path between A and B relative to L in G , which means that A and B are not adjacent in IG . [sent-774, score-0.54]
69 Furthermore, if the said inducing path between V0 and V1 is into V0 , then s is into V0 , and if the said inducing path between Vn−1 and Vn is into Vn , then s is into Vn . [sent-796, score-0.822]
70 ) Lemma 32 gives a way to argue for the presence of an inducing path between two variables in a DAG, and hence is very useful for demonstrating that two variables are adjacent in the corresponding MAG. [sent-804, score-0.538]
71 Proof of Lemma 9. Since there is an inducing path between A and B relative to L in G, A and B are adjacent in M. [sent-806, score-0.54]
72 To show this, it suffices to show that for any C, if in M there is an edge between C and A that is into A or there is a collider path between C and A that is into A and every vertex on the path is a parent of B, then C is adjacent to B, which means that the condition for visibility cannot be met. [sent-809, score-0.91]
73 Given u, u′, and the fact that A ∈ AnG(B), we can apply Lemma 32 to conclude that there is an inducing path between C and B relative to L in G, which means C and B are adjacent in M. [sent-814, score-0.54]
74 Case 2: There is a collider path p in M between C and A that is into A and every non-endpoint vertex on the path is a parent of B. [sent-815, score-0.69]
75 It follows that there is an inducing path in G between Vi and Vi+1 relative to L such that the path is into Vi+1 , and is into Vi unless possibly Vi = C. [sent-817, score-0.711]
76 Given these inducing paths and the fact that every variable other than C on p is an ancestor of B, we can apply Lemma 32 to conclude that there is an inducing path between C and B relative to L in G , which means C and B are adjacent in M . [sent-818, score-0.868]
77 Since X ← LXY → Y is an inducing path between X and Y relative to L in G , X and Y are adjacent in MG . [sent-835, score-0.54]
78 The edge between them is not Ok−1 ↔ B, for otherwise there would be an inducing path between X and Y relative to L in G that includes fewer observed variables than p does, which contradicts our choice of p. [sent-868, score-0.621]
79 The edge is not Om ← B on pain of making M non-ancestral; and the edge is not Om ↔ B on pain of creating an inducing path that includes fewer observed variables than p does. [sent-873, score-0.712]
80 So the edge between X and B in M is into B, but then there is an inducing path between X and Y relative to L in G that includes fewer observed variables than p does, which is a contradiction with our choice of p. [sent-878, score-0.621]
81 There is no inducing path between X and Y relative to L in G , and hence X and Y are not adjacent in MG . [sent-880, score-0.54]
82 For that purpose, we first establish two facts: (1) every directed edge in MGYX is also in MYX; and (2) for every bi-directed edge S ↔ T in MGYX, S and T are also adjacent in MYX, and the edge between S and T is either a bi-directed edge or an invisible directed edge in MYX. [sent-886, score-1.143]
83 Furthermore, because GYX is a subgraph of G , any inducing path between P and Q relative to L in GYX is also present in G , and any directed path from P to Q in the former is also present in the latter. [sent-889, score-0.788]
84 Suppose for the sake of contradiction that S → T is visible in MYX, i.e., that there is a vertex R not adjacent to T such that either R∗→ S is in MYX or there is a collider path c in MYX between R and S that is into S on which every collider is a parent of T. [sent-902, score-0.739]
85 Because R → T is visible, by definition, there is a vertex Q not adjacent to T such that Q∗→ R is in M or there is a collider path in M between Q and R that is into R on which every collider is a parent of T. [sent-925, score-0.658]
86 On the other hand, suppose there is a collider path c into R on which every collider is a parent of T . [sent-930, score-0.527]
87 Then if there is a collider P on c such that P ↔ S is in M , we immediately get a collider path between Q and S that is into S on which every collider is a parent of T . [sent-931, score-0.641]
88 Case 2: There is a collider path c in MYX between R and S that is into S on which every collider is a parent of T . [sent-934, score-0.527]
89 Thus the edge between S and T is either a bi-directed edge or an invisible directed edge in MYX . [sent-948, score-0.638]
90 Proof of Lemma 16. It is not hard to check that for any two variables P, Q ∈ O, if P and Q are adjacent in MYX, then they are adjacent in PYX (though the converse is not necessarily true, because an edge not definitely visible in P may still be visible in M). [sent-965, score-0.519]
91 Let p be a minimal d-connecting path between A and B relative to C in G that is into A, minimal in the sense that no other d-connecting path between A and B relative to C that is into A is composed of fewer variables than p is. [sent-970, score-0.573]
92 For each 0 ≤ n ≤ m − 1, if T[n] is in O, then S0 [n] = T[n]; otherwise T[n] is a (latent) collider on p, which, by the fact that p is d-connecting given C, implies that there is a directed path from T[n] to a member of C. [sent-975, score-0.506]
93 Proof of Lemma 23. Note that because A is not an ancestor of any member of C, if there is a path out of A d-connecting A and B given C in G, the path must be a directed path from A to B. [sent-1011, score-0.925]
94 The sub-sequence of that path consisting of observed variables then constitutes a directed path from A to B in M , which is of course out of A and also m-connecting given C in M . [sent-1013, score-0.579]
95 All we need to show is that if the edge between A and D is not a definitely visible edge A → D in P , then there exists a MAG represented by P in which the edge between A and D is not a visible edge out of A. [sent-1017, score-0.749]
96 Obviously if the edge in P is not A → D, there exists a MAG represented in P in which the edge is not A → D, which follows from the fact that P , by definition, displays all edge marks that are shared by all MAGs in the equivalence class. [sent-1018, score-0.535]
97 Then there exists in M a vertex E not adjacent to D such that either E∗→ A or there is a collider path between E and A that is into A and every collider on the path is a parent of D. [sent-1028, score-0.885]
98 Proof of Lemma 28. Note that since A does not have a descendant in C, an m-connecting path out of A given C in M has to be a directed path from A to B on which no vertex is in C. [sent-1056, score-0.874]
99 Then a shortest such path has to be uncovered, and so will correspond to a definite m-connecting path between A and B given C in P (on which every vertex is a definite non-collider). [sent-1057, score-0.526]
100 A triple of vertices X,Y, Z in a graph is called an unshielded triple if there is an edge between X and Y , an edge between Y and Z, but no edge between X and Z. [sent-1063, score-0.522]
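The unshielded-triple definition is straightforward to state in code; the adjacency skeleton below is hypothetical:

```python
# Hypothetical skeleton X - Y - Z - W, as a set of unordered adjacent pairs.
adj = {frozenset(p) for p in [("X", "Y"), ("Y", "Z"), ("Z", "W")]}

def unshielded(a, b, c):
    # <a, b, c> is unshielded: a-b and b-c are edges, but a-c is not
    return (frozenset((a, b)) in adj and frozenset((b, c)) in adj
            and frozenset((a, c)) not in adj)
```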
same-paper 1 0.9740507 20 jmlr-2008-Causal Reasoning with Ancestral Graphs (Special Topic on Causality)
Author: Jiji Zhang
Abstract: Causal reasoning is primarily concerned with what would happen to a system under external interventions. In particular, we are often interested in predicting the probability distribution of some random variables that would result if some other variables were forced to take certain values. One prominent approach to tackling this problem is based on causal Bayesian networks, using directed acyclic graphs as causal diagrams to relate post-intervention probabilities to pre-intervention probabilities that are estimable from observational data. However, such causal diagrams are seldom fully testable given observational data. In consequence, many causal discovery algorithms based on data-mining can only output an equivalence class of causal diagrams (rather than a single one). This paper is concerned with causal reasoning given an equivalence class of causal diagrams, represented by a (partial) ancestral graph. We present two main results. The first result extends Pearl (1995)’s celebrated do-calculus to the context of ancestral graphs. In the second result, we focus on a key component of Pearl’s calculus—the property of invariance under interventions, and give stronger graphical conditions for this property than those implied by the first result. The second result also improves the earlier, similar results due to Spirtes et al. (1993). Keywords: ancestral graphs, causal Bayesian network, do-calculus, intervention
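For reference, the do-calculus that this abstract extends to ancestral graphs consists of Pearl's (1995) three rules. In the usual notation, for disjoint vertex sets $X, Y, Z, W$, with $G_{\overline{X}}$ the graph with all edges into $X$ removed and $G_{\underline{Z}}$ the graph with all edges out of $Z$ removed:

```latex
% Pearl's (1995) do-calculus rules.
\begin{align*}
\textbf{Rule 1: } & P(y \mid do(x), z, w) = P(y \mid do(x), w)
  && \text{if } (Y \perp Z \mid X, W)_{G_{\overline{X}}} \\
\textbf{Rule 2: } & P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)
  && \text{if } (Y \perp Z \mid X, W)_{G_{\overline{X}\,\underline{Z}}} \\
\textbf{Rule 3: } & P(y \mid do(x), do(z), w) = P(y \mid do(x), w)
  && \text{if } (Y \perp Z \mid X, W)_{G_{\overline{X}\,\overline{Z(W)}}}
\end{align*}
```

where $Z(W)$ denotes the set of $Z$-nodes that are not ancestors of any $W$-node in $G_{\overline{X}}$.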
2 0.73968363 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy (Special Topic on Causality)
Author: Ilya Shpitser, Judea Pearl
3 0.71957743 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models
Author: Tianjiao Chu, Clark Glymour
Abstract: Pointwise consistent, feasible procedures for estimating contemporaneous linear causal structure from time series data have been developed using multiple conditional independence tests, but no such procedures are available for non-linear systems. We describe a feasible procedure for learning a class of non-linear time series structures, which we call additive non-linear time series. We show that for data generated from stationary models of this type, two classes of conditional independence relations among time series variables and their lags can be tested efficiently and consistently using tests based on additive model regression. Combining results of statistical tests for these two classes of conditional independence relations and the temporal structure of time series data, a new consistent model specification procedure is able to extract relatively detailed causal information. We investigate the finite sample behavior of the procedure through simulation, and illustrate the application of this method through analysis of the possible causal connections among four ocean indices. Several variants of the procedure are also discussed. Keywords: conditional independence test, contemporaneous causation, additive model regression, Granger causality, ocean indices
Author: Yang-Bo He, Zhi Geng
Abstract: Causal discovery from data is important for various scientific investigations. Because we cannot distinguish the different directed acyclic graphs (DAGs) in a Markov equivalence class learned from observational data, we have to collect further information on causal structures from experiments with external interventions. In this paper, we propose an active learning approach for discovering causal structures in which we first find a Markov equivalence class from observational data, and then we orient undirected edges in every chain component via intervention experiments separately. In the experiments, some variables are manipulated through external interventions. We discuss two kinds of intervention experiments, randomized experiments and quasi-experiments. Furthermore, we give two optimal designs of experiments, a batch-intervention design and a sequential-intervention design, to minimize the number of manipulated variables and the set of candidate structures based on the minimax and the maximum entropy criteria. We show theoretically that structural learning can be done locally in subgraphs of chain components without the need to check illegal v-structures and cycles in the whole network, and that a Markov equivalence subclass obtained after each intervention can still be depicted as a chain graph. Keywords: active learning, causal networks, directed acyclic graphs, intervention, Markov equivalence class, optimal design, structural learning
5 0.65883392 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning (Special Topic on Causality)
Author: Jean-Philippe Pellet, André Elisseeff
Abstract: We show how a generic feature-selection algorithm returning strongly relevant variables can be turned into a causal structure-learning algorithm. We prove this under the Faithfulness assumption for the data distribution. In a causal graph, the strongly relevant variables for a node X are its parents, children, and children’s parents (or spouses), also known as the Markov blanket of X. Identifying the spouses leads to the detection of the V-structure patterns and thus to causal orientations. Repeating the task for all variables yields a valid partially oriented causal graph. We first show an efficient way to identify the spouse links. We then perform several experiments in the continuous domain using the Recursive Feature Elimination feature-selection algorithm with Support Vector Regression and empirically verify the intuition of this direct (but computationally expensive) approach. Within the same framework, we then devise a fast and consistent algorithm, Total Conditioning (TC), and a variant, TCbw, with an explicit backward feature-selection heuristic, for Gaussian data. After running a series of comparative experiments on five artificial networks, we argue that Markov blanket algorithms such as TC/TCbw or Grow-Shrink scale better than the reference PC algorithm and provide higher structural accuracy. Keywords: causal structure learning, feature selection, Markov blanket, partial correlation, statistical test of conditional independence
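The Gaussian-data algorithms in this abstract rely on a statistical test of vanishing partial correlation. A standard choice for such a test, sketched here as a plausible building block rather than the paper's exact implementation, is the Fisher z-transform test:

```python
import math

def fisher_z_test(r, n, k):
    """Two-sided p-value for H0: the partial correlation is zero,
    given a sample partial correlation `r` from `n` observations,
    conditioning on `k` variables (Gaussian data, n > k + 3 assumed)."""
    r = max(min(r, 0.999999), -0.999999)   # guard against |r| = 1
    z = 0.5 * math.log((1 + r) / (1 - r))  # Fisher z-transform
    stat = math.sqrt(n - k - 3) * abs(z)   # approx. standard normal under H0
    # Standard normal survival function via math.erf.
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(stat / math.sqrt(2.0))))
```

A large p-value means the test cannot reject conditional independence at the chosen level, so the corresponding edge would be removed from the skeleton.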
6 0.6350497 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models
7 0.4735001 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs
8 0.37285081 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
9 0.2497122 77 jmlr-2008-Probabilistic Characterization of Random Decision Trees
10 0.24001925 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks
11 0.17464152 61 jmlr-2008-Mixed Membership Stochastic Blockmodels
12 0.15596813 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure
13 0.13204399 9 jmlr-2008-Active Learning by Spherical Subdivision
14 0.10835762 60 jmlr-2008-Minimal Nonlinear Distortion Principle for Nonlinear Independent Component Analysis
15 0.10793691 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration
17 0.10154138 38 jmlr-2008-Generalization from Observed to Unobserved Features by Clustering
18 0.10071874 1 jmlr-2008-A Bahadur Representation of the Linear Support Vector Machine
19 0.099148273 13 jmlr-2008-An Error Bound Based on a Worst Likely Assignment
20 0.093417875 72 jmlr-2008-On the Size and Recovery of Submatrices of Ones in a Random Binary Matrix
topicId topicWeight
[(0, 0.071), (5, 0.015), (17, 0.016), (26, 0.356), (31, 0.011), (40, 0.016), (54, 0.025), (58, 0.028), (61, 0.072), (66, 0.037), (76, 0.034), (87, 0.063), (88, 0.05), (92, 0.057), (94, 0.028), (99, 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.73549449 20 jmlr-2008-Causal Reasoning with Ancestral Graphs (Special Topic on Causality)
Author: Jiji Zhang
Author: Yang-Bo He, Zhi Geng
3 0.35259533 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs
Author: Xianchao Xie, Zhi Geng
4 0.32248896 26 jmlr-2008-Consistency of Trace Norm Minimization
Author: Francis R. Bach
Abstract: Regularization by the sum of singular values, also referred to as the trace norm, is a popular technique for estimating low-rank rectangular matrices. In this paper, we extend some of the consistency results of the Lasso to provide necessary and sufficient conditions for rank consistency of trace norm minimization with the square loss. We also provide an adaptive version that is rank consistent even when the necessary condition for the non-adaptive version is not fulfilled. Keywords: convex optimization, singular value decomposition, trace norm, consistency
5 0.32205003 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models
Author: Mathias Drton, Thomas S. Richardson
Abstract: In graphical modelling, a bi-directed graph encodes marginal independences among random variables that are identified with the vertices of the graph. We show how to transform a bi-directed graph into a maximal ancestral graph that (i) represents the same independence structure as the original bi-directed graph, and (ii) minimizes the number of arrowheads among all ancestral graphs satisfying (i). Here the number of arrowheads of an ancestral graph is the number of directed edges plus twice the number of bi-directed edges. In Gaussian models, this construction can be used for more efficient iterative maximization of the likelihood function and to determine when maximum likelihood estimates are equal to empirical counterparts. Keywords: ancestral graph, covariance graph, graphical model, marginal independence, maximum likelihood estimation, multivariate normal distribution
6 0.32068908 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
7 0.31563836 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
8 0.29233405 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks
9 0.28809249 11 jmlr-2008-Aggregation of SVM Classifiers Using Sobolev Spaces
10 0.28500247 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy (Special Topic on Causality)
11 0.28453594 15 jmlr-2008-An Information Criterion for Variable Selection in Support Vector Machines (Special Topic on Model Selection)
12 0.27673557 3 jmlr-2008-A Moment Bound for Multi-hinge Classifiers
13 0.27515072 21 jmlr-2008-Classification with a Reject Option using a Hinge Loss
14 0.26975304 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration
15 0.26968613 27 jmlr-2008-Consistency of the Group Lasso and Multiple Kernel Learning
16 0.26449436 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
17 0.26124918 68 jmlr-2008-Nearly Uniform Validation Improves Compression-Based Error Bounds
18 0.25995424 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models
19 0.25958925 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning (Special Topic on Causality)
20 0.25770718 9 jmlr-2008-Active Learning by Spherical Subdivision