jmlr jmlr2008 jmlr2008-24 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ilya Shpitser, Judea Pearl
Abstract: We consider a hierarchy of queries about causal relationships in graphical models, where each level in the hierarchy requires more detailed information than the one below. The hierarchy consists of three levels: associative relationships, derived from a joint distribution over the observable variables; cause-effect relationships, derived from distributions resulting from external interventions; and counterfactuals, derived from distributions that span multiple “parallel worlds” and resulting from simultaneous, possibly conflicting observations and interventions. We completely characterize cases where a given causal query can be computed from information lower in the hierarchy, and provide algorithms that accomplish this computation. Specifically, we show when effects of interventions can be computed from observational studies, and when probabilities of counterfactuals can be computed from experimental studies. We also provide a graphical characterization of those queries which cannot be computed (by any method) from queries at a lower layer of the hierarchy. Keywords: causality, graphical causal models, identification
Reference: text
sentIndex sentText sentNum sentScore
1 A key point here is that in order to make causal inferences from statistics, additional causal assumptions are needed. [sent-29, score-0.73]
2 In this paper, the language that we use for this purpose is the language of graphs, and our causal assumptions will be encoded by a special directed graph called a causal diagram. [sent-34, score-0.835]
3 A causal diagram encodes variables of interest as nodes, and possible direct causal influences between two variables as arrows. [sent-37, score-0.785]
4 Associated with each node in a causal diagram is a stable causal mechanism which determines its value in terms of the values of its parents. [sent-38, score-0.836]
5 The crucial feature of causal effect queries which distinguishes them from more complex questions in our hierarchy is that they are restricted to the post-intervention world alone. [sent-45, score-0.55]
6 The third and final class of queries we consider are counterfactual or “what-if” questions which arise when we simultaneously ask about multiple hypothetical worlds, with potentially conflicting 1942 C OMPLETE I DENTIFICATION M ETHODS FOR THE C AUSAL H IERARCHY interventions or observations. [sent-46, score-0.72]
7 If everything about our causal domain is known, in other words if we have knowledge of both the causal mechanisms and the distributions over unobservable variables, it is possible to compute counterfactual questions directly (Balke and Pearl, 1994b). [sent-50, score-1.366]
8 We therefore consider the more practical question of how to compute counterfactual questions from both experimental studies and the structure of the causal diagram. [sent-52, score-0.936]
9 Section 4 considers identification of counterfactual queries, while Section 5 summarizes the conclusions. [sent-61, score-0.531]
10 Notation and Definitions The primary object of causal inquiry is a probabilistic causal model. [sent-66, score-0.73]
11 The causal diagram, our vehicle for expressing causal assumptions, is defined by the causal model as follows. [sent-83, score-1.095]
12 A graph defined as above from a causal model M is said to be a causal diagram induced by M. [sent-90, score-0.89]
13 For any given u, the effect of do(x) on a set of variables Y will be represented by counterfactual variables Yx (u), where Y ∈ Y. [sent-101, score-0.557]
14 We now define joint probabilities on counterfactuals, in multiple worlds, which will serve as the formalization of counterfactual queries. [sent-105, score-0.531]
15 In this way, P(u) induces a distribution on all possible counterfactual variables in M. [sent-120, score-0.531]
16 In this paper, we will represent counterfactual utterances by joint distributions such as P(γ) or conditional distributions such as P(γ|δ), where γ and δ are conjunctions of counterfactual events. [sent-121, score-1.062]
17 First, having defined causal diagrams induced by natural causal models, we consider the graphs induced by models derived from interventional and counterfactual queries. [sent-167, score-1.382]
18 ∧ yk k , as we already discussed invokes multiple hypothetical causal x x worlds, each represented by a submodel, where all worlds share the same background context U. [sent-173, score-0.536]
19 In Section 4 we discuss this issue in more detail and suggest a more appropriate graphical representation of counterfactual situations. [sent-176, score-0.531]
20 The goal of this paper is a complete characterization of causal graphs which permit the answering of causal queries of a given type. [sent-186, score-0.875]
21 Since intervening on X cuts off X from the normal causal influences of its parents in the graph, we can interpret the causal effect of X on Y as the flow of dependence which leaves X via outgoing arrows only. [sent-209, score-0.827]
22 Such counterfactual questions will be considered in the next section. [sent-369, score-0.571]
23 We conclude this section by showing that our notion of a causal theory as a set of independencies embodied by the causal graph, together with rules of probability and do-calculus is complete for computing causal effects, if we also take statistical data embodied by P(v) as axiomatic. [sent-401, score-1.095]
24 1954 C OMPLETE I DENTIFICATION M ETHODS FOR THE C AUSAL H IERARCHY A A=false A*=true H H H* (a) (b) Figure 8: (a) A causal graph for the aspirin/headache domain (b) A corresponding twin network ∗ graph for the query P(Ha∗ =true |A = f alse). [sent-402, score-0.699]
25 An informal counterfactual statement in natural language such as “would I have a headache had I taken an aspirin” talks about multiple worlds: the actual world, and other, hypothetical worlds which differ in some small respect from the actual world (e. [sent-409, score-0.802]
26 In this paper, we represent the actual world by a causal model in its natural state, devoid of any interventions, while the alternative worlds are represented by submodels M x where the action do(x) implements the hypothetical change from the actual state of affairs considered. [sent-412, score-0.587]
27 People make sense of informal statements involving multiple, possibly conflicting worlds because they expect not only the causal rules to be invariant across these worlds (e. [sent-413, score-0.619]
28 ” The actual world referenced by this query is represented by a causal model containing two variables, headache and aspirin, with aspirin being a parent of headache, see Fig. [sent-419, score-0.735]
29 The two worlds share the background variables U, and so can be represented by a 1955 S HPITSER AND P EARL single causal model with the graph shown in Fig. [sent-424, score-0.597]
30 The graphs representing two hypothetical worlds invoked by a counterfactual query like the one shown in Fig. [sent-430, score-0.865]
31 Instead, the typical state of knowledge of a causal domain is the statistical behavior of the observable variables in the domain, summarized by the distribution P(v), together with knowledge of causal directionality, obtained either from expert judgment (e. [sent-436, score-0.813]
32 Nevertheless, there are reasons to consider computing counterfactual quantities from experimental, rather than observational studies. [sent-442, score-0.56]
33 In general, a counterfactual can posit worlds with features contradictory to what has actually been observed. [sent-443, score-0.658]
34 The question that we ask in this section, then, is whether it is possible to identify a query P(γ|δ), where γ, δ are conjunctions of counterfactual events (with δ possibly empty), from the graph G and the set of all experiments P∗ . [sent-449, score-0.749]
35 Before tackling the problem of identifying counterfactual queries from experiments, we extend our example in Fig. [sent-452, score-0.623]
36 8 (b) to a general graphical representation for worlds invoked by a counterfactual query. [sent-453, score-0.658]
37 (a) Original causal diagram (b) Parallel worlds graph for P(y x |x , zd , d) (the two nodes denoted by U are the same). [sent-458, score-0.802]
38 9 shows the original causal graph and the parallel worlds graph for γ = yx ∧ x ∧ zd ∧ d. [sent-468, score-0.97]
39 The first is that by removing node duplication, we also determine which syntactically distinct counterfactual variables correspond to the same random variable. [sent-479, score-0.582]
40 By identifying such equivalence classes of counterfactual variables, we guarantee that syntactically different variables are in fact different, and this makes it simpler to reason about counterfactuals in order to identify them. [sent-480, score-0.681]
41 For instance, a counterfactual P(y x , y ) may either be non-identifiable or inconsistent (and so identifiable to equal 0), depending on whether Yx and Y are the same variable. [sent-481, score-0.531]
42 If two distinct nodes in a causal diagram represent the same random variable, the diagram contains redundant information, and the nodes must be merged. [sent-495, score-0.649]
43 The new node ω we obtain from Lemma 25 can be thought of as a new counterfactual variable. [sent-506, score-0.582]
44 Since we replaced α, β by ω, we replace any mention of α, β in our given counterfactual query P(γ) by ω. [sent-512, score-0.613]
45 1958 C OMPLETE I DENTIFICATION M ETHODS FOR THE C AUSAL H IERARCHY function make-cg(G, γ) INPUT: G a causal diagram, γ a conjunction of counterfactual events OUTPUT: A counterfactual graph Gγ , and either a set of events γ s. [sent-513, score-1.594]
46 Figure 10: An algorithm for constructing counterfactual graphs Note that since α, β are the same, their value assignments must be the same (say equal to y). [sent-523, score-0.641]
47 The first is that if variables Yx ,Yz were judged to be the same by Lemma 24, but γ assigns them different values, this implies that the original set of counterfactual events γ is inconsistent, and so P(γ) = 0. [sent-530, score-0.562]
48 Finally, because the algorithm can make an arbitrary choice picking a parent of ω each time Lemma 25 is applied, both the counterfactual graph G , and the corresponding modified counterfactual γ are not unique. [sent-534, score-1.243]
49 However, in our counterfactual query, which is P(y x |x , zd , d), the variable D happens to be observed to attain the value d, the same as the intervention value for the parent of Zd . [sent-554, score-0.735]
50 9 (c), which is a counterfactual graph for our query. [sent-589, score-0.636]
51 Having constructed a graphical representation of worlds mentioned in counterfactual queries, we can turn to identification. [sent-590, score-0.658]
52 We construct two algorithms for this task, the first is called ID* and works for unconditional queries, while the second, IDC*, works on queries with counterfactual evidence and calls the first as a subroutine. [sent-591, score-0.595]
53 ) the set of values (either set or observed) appearing in a given counterfactual conjunction (or set of counterfactual events), while val(. [sent-597, score-1.062]
54 ) is the value assigned to a given counterfactual variable. [sent-598, score-0.531]
55 This notation is used to extract variables and values present in the original causal model from a counterfactual which refers to parallel worlds. [sent-599, score-0.92]
56 A counterfactual sr , where s, r are value assignments to sets of nodes, represents the event “the node set S attains values s under intervention do(r). [sent-603, score-0.676]
57 to mean some counterfactual variable derived from X where x appears in the subscript (the rest of the subscript can be arbitrary), which also attains value x. [sent-607, score-0.599]
58 The second line states that if γ contains a counterfactual which violates the Axiom of Effectiveness (Pearl, 2000), then γ is inconsistent, and we return probability 0. [sent-611, score-0.607]
59 The third line states that if a counterfactual contains its own value in the subscript, then it is a tautological event, and it can be removed from γ without affecting its probability. [sent-612, score-0.581]
60 Line 4 invokes make-cg to construct a counterfactual graph G , and the corresponding relabeled counterfactual γ . [sent-613, score-1.167]
61 Line 5 returns probability 0 if an inconsistency was found during the construction of the counterfactual graph, for example, if two variables found to be the same in γ had different value assignments. [sent-614, score-0.531]
62 Line 6 is analogous to Line 4 in the ID algorithm, it decomposes the problem into a set of subproblems, one for each C-component in the counterfactual graph. [sent-615, score-0.531]
63 Line 7 is the base case, where our counterfactual graph has a single Ccomponent. [sent-618, score-0.636]
64 The problem with counterfactual distributions is there is no simple way to prevent non-positive distributions spanning multiple worlds from arising, even if the original P(v) was positive—hence the explicit check. [sent-625, score-0.658]
65 The second line constructs the counterfactual graph, except since make-cg can only take conjunctions, we provide it with a joint counterfactual γ ∧ δ. [sent-626, score-1.112]
66 Here in IDC*, we move a counterfactual value assignment Yx = y from being observed (that is being a part of δ), to being fixed (that is appearing in every subscript of γ ) if there are no back-door paths from Yx to the counterfactual of interest γ . [sent-630, score-1.136]
67 Finally, line 5 of IDC* is the analogue of line 2 of IDC, we attempt to identify a joint counterfactual probability, and then obtain a conditional counterfactual probability from the result. [sent-631, score-1.162]
68 Since P(x , zd , d) is not inconsistent, we proceed to construct the counterfactual graph on line 2. [sent-633, score-0.749]
69 Note that since the subscripts in one of the variables of our query changed, the counterfactual graph generated will change as well. [sent-637, score-0.744]
70 This is not a coincidence, since a counterfactual graph can be thought of as a causal graph for a particular large causal model which happens to have some distinct nodes share the same causal mechanisms. [sent-649, score-1.923]
71 This means that all the theorems and definitions used in the previous sections for causal diagrams transfer over without change to counterfactual graphs. [sent-650, score-0.896]
72 Note that since Wx,z is a counterfactual variable derived from W , it shares its domain with W . [sent-660, score-0.531]
73 1963 S HPITSER AND P EARL of counterfactual events does not contain conflicting value assignments to any variable, obtained either by observation or intervention, then taking the union of all actions of the events results in a consistent action. [sent-662, score-0.622]
74 We catalogue all difficult counterfactual graphs which arise from queries which cannot be identified from P∗ . [sent-666, score-0.676]
75 The simplest difficult counterfactual graph arises from the query P(y x , yx ) named “probability of necessity and sufficiency” by Pearl (2000). [sent-669, score-0.899]
76 The intuitive explanation for this result is that P(yx , yx ) is derived from the joint distribution over the counterfactual variables in the w-graph, while if we restrict ourselves to P∗ , we only have access to marginal distributions—one marginal for each possible world. [sent-680, score-0.712]
77 Because counterfactual variables Yx and Yx share an unobserved parent U, they are dependent, and their joint distribution cannot be decomposed into a product of marginals. [sent-681, score-0.607]
78 This intuitive argument can be generalized to a counterfactual graph with more than two nodes, the so-called “zig-zag graphs” an example of which is shown in Fig. [sent-683, score-0.636]
79 In order to continue, we must provide two lemmas which allow us to transform difficult graphs in various ways by adding nodes and edges, while retaining the non-identifiability of the underlying counterfactual from P∗ . [sent-702, score-0.699]
80 , yn m } be a subset x x of counterfactual events in γ. [sent-707, score-0.562]
81 x x 1964 C OMPLETE I DENTIFICATION M ETHODS FOR THE C AUSAL H IERARCHY X Y W 1 x W 2 Z Y (a) x’ W1 W2 Z (b) Figure 13: (a) Causal diagram (b) Corresponding counterfactual graph for the non-identifiable query P(Yx ,W 1 ,W 2 , Zx ). [sent-720, score-0.773]
82 If a given P(γ) cannot be identified from P∗ , this implies that there exist two models which agree on P∗ , but disagree on P(γ), where γ is a set of counterfactual causes. [sent-723, score-0.581]
83 It is then possible to augment these models using the one-to-one function in question to obtain disagreement on P(δ), where δ is a set of counterfactual effects of γ. [sent-724, score-0.578]
84 If we have a causal model with a graph G where some counterfactual P(γ) is not identifiable, then a coarser, more “near-sighted” view of G which merges two distinct variables with their own mechanisms into a single variable with a single mechanism will not render P(γ) identifiable. [sent-740, score-1.034]
85 It turns out that whenever ID* fails on P(γ), the corresponding counterfactual graph contains a subgraph which can be obtained by a set of applications of the previous two lemmas to the w-graph and the zig-zag graphs. [sent-745, score-0.662]
86 Since ID* is complete for P(γ) queries, we can give a graphical characterization of counterfactual graphs where P(γ) cannot be identified from P∗ . [sent-748, score-0.612]
87 Specifically, we provide complete algorithms for identifying causal effects and conditional causal effects from observational studies, and show that a graphical structure called a hedge completely characterizes all cases where causal effects are non-identifiable. [sent-756, score-1.398]
88 We also provide complete algorithms for identifying counterfactual queries (possibly conditional) from experimental studies. [sent-758, score-0.623]
89 If we view the structure of the causal graph as experimentally testable, as is often the case in practice, this result can be viewed as giving a full characterization of testable counterfactuals assuming structural semantics. [sent-759, score-0.592]
90 The most important point in these proofs is that counterfactual graphs are generally no different from causal diagrams discussed in Sections 2 and 3, with their only special feature being that by construction, some nodes in the graph happen to share functions. [sent-1037, score-1.169]
91 But we know that the counterfactual graph is just a causal diagram for a model where some nodes share functions, so the same reasoning applies. [sent-1053, score-1.143]
92 To show completeness of ID* and IDC*, we first prove a utility lemma which will make it easier to construct counterexamples which agree on P∗ but disagree on a given counterfactual query. [sent-1055, score-0.628]
93 The next result generalizes Lemma 27 to a wider set of counterfactual graphs which result from non-identifiable queries. [sent-1063, score-0.612]
94 To obtain a full characterization of non-identifiable counterfactual graphs, we augment the difficult graphs we obtained from the previous two results using certain graph transformation rules which preserve non-identifiability. [sent-1094, score-0.717]
95 , yn m } be a subset of counterfactual events in γ. [sent-1100, score-0.562]
96 However, since γ is left alone if X,Y do not occur in γ (the third condition), only the remaining four of these conditions result in an actual modification of a counterfactual variable in γ. [sent-1143, score-0.531]
97 The first clearly results in the same counterfactual variable. [sent-1145, score-0.531]
98 Proof The difficult step is to show that after line 5 is reached, if P∗ , G id P(γ, δ) then P∗ , G id P(γ|δ). [sent-1185, score-0.622]
99 Augment the counterexample models which induce counterfactual graph G with an additional binary node for Y , and let the value of D be set as the old value plus Y modulo |D|. [sent-1193, score-0.687]
100 In case 1, we must construct the distributions along the back-door path in such a way that if P∗ , G id P(Y |δ ) then P∗ , G id P(Y |δ ), where Y is a node preceding Y on the path. [sent-1202, score-0.655]
wordName wordTfidf (topN-words)
[('counterfactual', 0.531), ('causal', 0.365), ('px', 0.311), ('id', 0.286), ('yx', 0.181), ('idc', 0.17), ('worlds', 0.127), ('counterfactuals', 0.122), ('aspirin', 0.112), ('graph', 0.105), ('hedge', 0.105), ('dentification', 0.101), ('earl', 0.101), ('hpitser', 0.101), ('ierarchy', 0.101), ('identi', 0.087), ('nodes', 0.087), ('observable', 0.083), ('query', 0.082), ('graphs', 0.081), ('judea', 0.081), ('pearl', 0.08), ('parity', 0.079), ('parent', 0.076), ('headache', 0.074), ('omplete', 0.07), ('ethods', 0.065), ('ausal', 0.065), ('intervention', 0.065), ('queries', 0.064), ('zd', 0.063), ('submodel', 0.063), ('bidirected', 0.06), ('mod', 0.058), ('soundness', 0.055), ('diagram', 0.055), ('node', 0.051), ('line', 0.05), ('agree', 0.05), ('mx', 0.048), ('effects', 0.047), ('lemma', 0.047), ('si', 0.046), ('child', 0.045), ('bit', 0.045), ('hypothetical', 0.044), ('parents', 0.043), ('twin', 0.042), ('yw', 0.042), ('able', 0.042), ('interventions', 0.041), ('interventional', 0.04), ('paths', 0.04), ('questions', 0.04), ('balke', 0.037), ('arrow', 0.037), ('arcs', 0.037), ('zx', 0.035), ('subscript', 0.034), ('mechanisms', 0.033), ('path', 0.032), ('bow', 0.032), ('ilya', 0.032), ('shpitser', 0.032), ('unobservable', 0.032), ('events', 0.031), ('merge', 0.031), ('arc', 0.03), ('incoming', 0.03), ('hierarchy', 0.029), ('observational', 0.029), ('odd', 0.029), ('assignments', 0.029), ('wx', 0.028), ('downward', 0.028), ('identifying', 0.028), ('arrows', 0.028), ('ow', 0.028), ('proof', 0.027), ('alse', 0.027), ('avin', 0.027), ('confounded', 0.027), ('nonidenti', 0.027), ('pcm', 0.027), ('effect', 0.026), ('return', 0.026), ('gx', 0.026), ('subscripts', 0.026), ('icting', 0.026), ('sub', 0.026), ('world', 0.026), ('subgraph', 0.026), ('action', 0.025), ('parallel', 0.024), ('theorem', 0.024), ('ws', 0.024), ('ancestors', 0.024), ('pv', 0.024), ('ui', 0.024), ('pa', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999899 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy (Special Topic on Causality)
Author: Ilya Shpitser, Judea Pearl
Abstract: We consider a hierarchy of queries about causal relationships in graphical models, where each level in the hierarchy requires more detailed information than the one below. The hierarchy consists of three levels: associative relationships, derived from a joint distribution over the observable variables; cause-effect relationships, derived from distributions resulting from external interventions; and counterfactuals, derived from distributions that span multiple “parallel worlds” and resulting from simultaneous, possibly conflicting observations and interventions. We completely characterize cases where a given causal query can be computed from information lower in the hierarchy, and provide algorithms that accomplish this computation. Specifically, we show when effects of interventions can be computed from observational studies, and when probabilities of counterfactuals can be computed from experimental studies. We also provide a graphical characterization of those queries which cannot be computed (by any method) from queries at a lower layer of the hierarchy. Keywords: causality, graphical causal models, identification
2 0.19929232 20 jmlr-2008-Causal Reasoning with Ancestral Graphs (Special Topic on Causality)
Author: Jiji Zhang
Abstract: Causal reasoning is primarily concerned with what would happen to a system under external interventions. In particular, we are often interested in predicting the probability distribution of some random variables that would result if some other variables were forced to take certain values. One prominent approach to tackling this problem is based on causal Bayesian networks, using directed acyclic graphs as causal diagrams to relate post-intervention probabilities to pre-intervention probabilities that are estimable from observational data. However, such causal diagrams are seldom fully testable given observational data. In consequence, many causal discovery algorithms based on data-mining can only output an equivalence class of causal diagrams (rather than a single one). This paper is concerned with causal reasoning given an equivalence class of causal diagrams, represented by a (partial) ancestral graph. We present two main results. The first result extends Pearl (1995)’s celebrated do-calculus to the context of ancestral graphs. In the second result, we focus on a key component of Pearl’s calculus—the property of invariance under interventions, and give stronger graphical conditions for this property than those implied by the first result. The second result also improves the earlier, similar results due to Spirtes et al. (1993). Keywords: ancestral graphs, causal Bayesian network, do-calculus, intervention
3 0.15546399 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning (Special Topic on Causality)
Author: Jean-Philippe Pellet, André Elisseeff
Abstract: We show how a generic feature-selection algorithm returning strongly relevant variables can be turned into a causal structure-learning algorithm. We prove this under the Faithfulness assumption for the data distribution. In a causal graph, the strongly relevant variables for a node X are its parents, children, and children’s parents (or spouses), also known as the Markov blanket of X. Identifying the spouses leads to the detection of the V-structure patterns and thus to causal orientations. Repeating the task for all variables yields a valid partially oriented causal graph. We first show an efficient way to identify the spouse links. We then perform several experiments in the continuous domain using the Recursive Feature Elimination feature-selection algorithm with Support Vector Regression and empirically verify the intuition of this direct (but computationally expensive) approach. Within the same framework, we then devise a fast and consistent algorithm, Total Conditioning (TC), and a variant, TCbw , with an explicit backward feature-selection heuristics, for Gaussian data. After running a series of comparative experiments on five artificial networks, we argue that Markov blanket algorithms such as TC/TCbw or Grow-Shrink scale better than the reference PC algorithm and provides higher structural accuracy. Keywords: causal structure learning, feature selection, Markov blanket, partial correlation, statistical test of conditional independence
Author: Yang-Bo He, Zhi Geng
Abstract: The causal discovery from data is important for various scientific investigations. Because we cannot distinguish the different directed acyclic graphs (DAGs) in a Markov equivalence class learned from observational data, we have to collect further information on causal structures from experiments with external interventions. In this paper, we propose an active learning approach for discovering causal structures in which we first find a Markov equivalence class from observational data, and then we orient undirected edges in every chain component via intervention experiments separately. In the experiments, some variables are manipulated through external interventions. We discuss two kinds of intervention experiments, randomized experiment and quasi-experiment. Furthermore, we give two optimal designs of experiments, a batch-intervention design and a sequential-intervention design, to minimize the number of manipulated variables and the set of candidate structures based on the minimax and the maximum entropy criteria. We show theoretically that structural learning can be done locally in subgraphs of chain components without need of checking illegal v-structures and cycles in the whole network and that a Markov equivalence subclass obtained after each intervention can still be depicted as a chain graph. Keywords: active learning, causal networks, directed acyclic graphs, intervention, Markov equivalence class, optimal design, structural learning
5 0.1373857 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models
Author: Tianjiao Chu, Clark Glymour
Abstract: Pointwise consistent, feasible procedures for estimating contemporaneous linear causal structure from time series data have been developed using multiple conditional independence tests, but no such procedures are available for non-linear systems. We describe a feasible procedure for learning a class of non-linear time series structures, which we call additive non-linear time series. We show that for data generated from stationary models of this type, two classes of conditional independence relations among time series variables and their lags can be tested efficiently and consistently using tests based on additive model regression. Combining results of statistical tests for these two classes of conditional independence relations and the temporal structure of time series data, a new consistent model specification procedure is able to extract relatively detailed causal information. We investigate the finite sample behavior of the procedure through simulation, and illustrate the application of this method through analysis of the possible causal connections among four ocean indices. Several variants of the procedure are also discussed. Keywords: conditional independence test, contemporaneous causation, additive model regression, Granger causality, ocean indices
6 0.086737901 13 jmlr-2008-An Error Bound Based on a Worst Likely Assignment
7 0.077085726 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs
8 0.074457079 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
9 0.065543093 59 jmlr-2008-Maximal Causes for Non-linear Component Extraction
10 0.060197558 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure
11 0.059023131 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models
13 0.044034235 70 jmlr-2008-On Relevant Dimensions in Kernel Feature Spaces
14 0.040829044 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks
15 0.04057955 47 jmlr-2008-Learning Balls of Strings from Edit Corrections
16 0.030813318 29 jmlr-2008-Cross-Validation Optimization for Large Scale Structured Classification Kernel Methods
17 0.028125597 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
18 0.026780231 44 jmlr-2008-Incremental Identification of Qualitative Models of Biological Systems using Inductive Logic Programming
19 0.026725622 11 jmlr-2008-Aggregation of SVM Classifiers Using Sobolev Spaces
20 0.026224216 64 jmlr-2008-Model Selection in Kernel Based Regression using the Influence Function (Special Topic on Model Selection)
topicId topicWeight
[(0, 0.201), (1, 0.303), (2, 0.019), (3, 0.009), (4, 0.021), (5, -0.073), (6, 0.02), (7, 0.001), (8, -0.045), (9, -0.066), (10, -0.015), (11, 0.088), (12, 0.066), (13, -0.399), (14, 0.159), (15, -0.016), (16, 0.103), (17, 0.014), (18, -0.196), (19, -0.041), (20, -0.037), (21, -0.016), (22, 0.026), (23, 0.098), (24, -0.07), (25, -0.036), (26, -0.062), (27, -0.057), (28, -0.008), (29, -0.062), (30, -0.089), (31, -0.014), (32, 0.058), (33, -0.102), (34, 0.017), (35, 0.081), (36, -0.043), (37, -0.023), (38, -0.001), (39, -0.135), (40, 0.027), (41, 0.137), (42, 0.003), (43, -0.017), (44, -0.022), (45, -0.046), (46, -0.071), (47, -0.103), (48, -0.034), (49, -0.106)]
simIndex simValue paperId paperTitle
same-paper 1 0.95972103 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy (Special Topic on Causality)
Author: Ilya Shpitser, Judea Pearl
Abstract: We consider a hierarchy of queries about causal relationships in graphical models, where each level in the hierarchy requires more detailed information than the one below. The hierarchy consists of three levels: associative relationships, derived from a joint distribution over the observable variables; cause-effect relationships, derived from distributions resulting from external interventions; and counterfactuals, derived from distributions that span multiple “parallel worlds” and resulting from simultaneous, possibly conflicting observations and interventions. We completely characterize cases where a given causal query can be computed from information lower in the hierarchy, and provide algorithms that accomplish this computation. Specifically, we show when effects of interventions can be computed from observational studies, and when probabilities of counterfactuals can be computed from experimental studies. We also provide a graphical characterization of those queries which cannot be computed (by any method) from queries at a lower layer of the hierarchy. Keywords: causality, graphical causal models, identification
2 0.6923033 20 jmlr-2008-Causal Reasoning with Ancestral Graphs (Special Topic on Causality)
Author: Jiji Zhang
Abstract: Causal reasoning is primarily concerned with what would happen to a system under external interventions. In particular, we are often interested in predicting the probability distribution of some random variables that would result if some other variables were forced to take certain values. One prominent approach to tackling this problem is based on causal Bayesian networks, using directed acyclic graphs as causal diagrams to relate post-intervention probabilities to pre-intervention probabilities that are estimable from observational data. However, such causal diagrams are seldom fully testable given observational data. In consequence, many causal discovery algorithms based on data-mining can only output an equivalence class of causal diagrams (rather than a single one). This paper is concerned with causal reasoning given an equivalence class of causal diagrams, represented by a (partial) ancestral graph. We present two main results. The first result extends Pearl (1995)’s celebrated do-calculus to the context of ancestral graphs. In the second result, we focus on a key component of Pearl’s calculus—the property of invariance under interventions, and give stronger graphical conditions for this property than those implied by the first result. The second result also improves the earlier, similar results due to Spirtes et al. (1993). Keywords: ancestral graphs, causal Bayesian network, do-calculus, intervention
3 0.63766927 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning (Special Topic on Causality)
Author: Jean-Philippe Pellet, André Elisseeff
Abstract: We show how a generic feature-selection algorithm returning strongly relevant variables can be turned into a causal structure-learning algorithm. We prove this under the Faithfulness assumption for the data distribution. In a causal graph, the strongly relevant variables for a node X are its parents, children, and children’s parents (or spouses), also known as the Markov blanket of X. Identifying the spouses leads to the detection of the V-structure patterns and thus to causal orientations. Repeating the task for all variables yields a valid partially oriented causal graph. We first show an efficient way to identify the spouse links. We then perform several experiments in the continuous domain using the Recursive Feature Elimination feature-selection algorithm with Support Vector Regression and empirically verify the intuition of this direct (but computationally expensive) approach. Within the same framework, we then devise a fast and consistent algorithm, Total Conditioning (TC), and a variant, TCbw , with an explicit backward feature-selection heuristics, for Gaussian data. After running a series of comparative experiments on five artificial networks, we argue that Markov blanket algorithms such as TC/TCbw or Grow-Shrink scale better than the reference PC algorithm and provides higher structural accuracy. Keywords: causal structure learning, feature selection, Markov blanket, partial correlation, statistical test of conditional independence
4 0.56888193 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models
Author: Tianjiao Chu, Clark Glymour
Abstract: Pointwise consistent, feasible procedures for estimating contemporaneous linear causal structure from time series data have been developed using multiple conditional independence tests, but no such procedures are available for non-linear systems. We describe a feasible procedure for learning a class of non-linear time series structures, which we call additive non-linear time series. We show that for data generated from stationary models of this type, two classes of conditional independence relations among time series variables and their lags can be tested efficiently and consistently using tests based on additive model regression. Combining results of statistical tests for these two classes of conditional independence relations and the temporal structure of time series data, a new consistent model specification procedure is able to extract relatively detailed causal information. We investigate the finite sample behavior of the procedure through simulation, and illustrate the application of this method through analysis of the possible causal connections among four ocean indices. Several variants of the procedure are also discussed. Keywords: conditional independence test, contemporaneous causation, additive model regression, Granger causality, ocean indices
Author: Yang-Bo He, Zhi Geng
Abstract: The causal discovery from data is important for various scientific investigations. Because we cannot distinguish the different directed acyclic graphs (DAGs) in a Markov equivalence class learned from observational data, we have to collect further information on causal structures from experiments with external interventions. In this paper, we propose an active learning approach for discovering causal structures in which we first find a Markov equivalence class from observational data, and then we orient undirected edges in every chain component via intervention experiments separately. In the experiments, some variables are manipulated through external interventions. We discuss two kinds of intervention experiments, randomized experiment and quasi-experiment. Furthermore, we give two optimal designs of experiments, a batch-intervention design and a sequential-intervention design, to minimize the number of manipulated variables and the set of candidate structures based on the minimax and the maximum entropy criteria. We show theoretically that structural learning can be done locally in subgraphs of chain components without need of checking illegal v-structures and cycles in the whole network and that a Markov equivalence subclass obtained after each intervention can still be depicted as a chain graph. Keywords: active learning, causal networks, directed acyclic graphs, intervention, Markov equivalence class, optimal design, structural learning
6 0.35010046 59 jmlr-2008-Maximal Causes for Non-linear Component Extraction
7 0.30062476 13 jmlr-2008-An Error Bound Based on a Worst Likely Assignment
8 0.27372465 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks
9 0.2566945 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models
10 0.20647739 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure
12 0.17711051 47 jmlr-2008-Learning Balls of Strings from Edit Corrections
13 0.1765593 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs
14 0.16277802 44 jmlr-2008-Incremental Identification of Qualitative Models of Biological Systems using Inductive Logic Programming
15 0.16090822 64 jmlr-2008-Model Selection in Kernel Based Regression using the Influence Function (Special Topic on Model Selection)
16 0.14658457 55 jmlr-2008-Linear-Time Computation of Similarity Measures for Sequential Data
17 0.13804644 49 jmlr-2008-Learning Control Knowledge for Forward Search Planning
18 0.13595077 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
19 0.13391255 70 jmlr-2008-On Relevant Dimensions in Kernel Feature Spaces
20 0.13341169 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
topicId topicWeight
[(0, 0.03), (1, 0.011), (5, 0.018), (17, 0.401), (30, 0.012), (31, 0.031), (40, 0.028), (54, 0.035), (58, 0.03), (61, 0.021), (66, 0.043), (76, 0.05), (78, 0.011), (87, 0.023), (88, 0.072), (92, 0.034), (94, 0.036), (99, 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.77572489 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy (Special Topic on Causality)
Author: Ilya Shpitser, Judea Pearl
Abstract: We consider a hierarchy of queries about causal relationships in graphical models, where each level in the hierarchy requires more detailed information than the one below. The hierarchy consists of three levels: associative relationships, derived from a joint distribution over the observable variables; cause-effect relationships, derived from distributions resulting from external interventions; and counterfactuals, derived from distributions that span multiple “parallel worlds” and resulting from simultaneous, possibly conflicting observations and interventions. We completely characterize cases where a given causal query can be computed from information lower in the hierarchy, and provide algorithms that accomplish this computation. Specifically, we show when effects of interventions can be computed from observational studies, and when probabilities of counterfactuals can be computed from experimental studies. We also provide a graphical characterization of those queries which cannot be computed (by any method) from queries at a lower layer of the hierarchy. Keywords: causality, graphical causal models, identification
2 0.28757811 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks
Author: Gal Elidan, Stephen Gould
Abstract: With the increased availability of data for complex domains, it is desirable to learn Bayesian network structures that are sufficiently expressive for generalization while at the same time allow for tractable inference. While the method of thin junction trees can, in principle, be used for this purpose, its fully greedy nature makes it prone to overfitting, particularly when data is scarce. In this work we present a novel method for learning Bayesian networks of bounded treewidth that employs global structure modifications and that is polynomial both in the size of the graph and the treewidth bound. At the heart of our method is a dynamic triangulation that we update in a way that facilitates the addition of chain structures that increase the bound on the model’s treewidth by at most one. We demonstrate the effectiveness of our “treewidth-friendly” method on several real-life data sets and show that it is superior to the greedy approach as soon as the bound on the treewidth is nontrivial. Importantly, we also show that by making use of global operators, we are able to achieve better generalization even when learning Bayesian networks of unbounded treewidth. Keywords: Bayesian networks, structure learning, model selection, bounded treewidth
3 0.2866807 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
Author: Michael Collins, Amir Globerson, Terry Koo, Xavier Carreras, Peter L. Bartlett
Abstract: Log-linear and maximum-margin models are two commonly-used methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or maxmargin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O( 1 ) EG ε updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log( 1 )) updates are required. For both the max-margin and log-linear cases, our bounds suggest ε that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be efficiently applied to problems such as sequence learning or natural language parsing. We perform extensive evaluation of the algorithms, comparing them to L-BFGS and stochastic gradient descent for log-linear models, and to SVM-Struct for max-margin models. The algorithms are applied to a multi-class problem as well as to a more complex large-scale parsing task. In all these settings, the EG algorithms presented here outperform the other methods. Keywords: exponentiated gradient, log-linear models, maximum-margin models, structured prediction, conditional random fields ∗. These authors contributed equally. c 2008 Michael Col
4 0.28073409 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
Author: Zongming Ma, Xianchao Xie, Zhi Geng
Abstract: Chain graphs present a broad class of graphical models for description of conditional independence structures, including both Markov networks and Bayesian networks as special cases. In this paper, we propose a computationally feasible method for the structural learning of chain graphs based on the idea of decomposing the learning problem into a set of smaller scale problems on its decomposed subgraphs. The decomposition requires conditional independencies but does not require the separators to be complete subgraphs. Algorithms for both skeleton recovery and complex arrow orientation are presented. Simulations under a variety of settings demonstrate the competitive performance of our method, especially when the underlying graph is sparse. Keywords: chain graph, conditional independence, decomposition, graphical model, structural learning
5 0.27982712 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
Author: Matthias W. Seeger
Abstract: The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007). Keywords: sparse linear model, Laplace prior, expectation propagation, approximate inference, optimal design, Bayesian statistics, gene network recovery, image coding, compressive sensing
6 0.27218235 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers
7 0.27200061 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning
8 0.27033651 20 jmlr-2008-Causal Reasoning with Ancestral Graphs (Special Topic on Causality)
10 0.26959139 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning (Special Topic on Causality)
11 0.26957718 57 jmlr-2008-Manifold Learning: The Price of Normalization
12 0.26900035 58 jmlr-2008-Max-margin Classification of Data with Absent Features
13 0.26882043 9 jmlr-2008-Active Learning by Spherical Subdivision
14 0.26845682 56 jmlr-2008-Magic Moments for Structured Output Prediction
15 0.26831806 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration
16 0.26612335 83 jmlr-2008-Robust Submodular Observation Selection
17 0.2648837 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection
18 0.26455784 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs
19 0.26375449 72 jmlr-2008-On the Size and Recovery of Submatrices of Ones in a Random Binary Matrix
20 0.26320326 86 jmlr-2008-SimpleMKL