jmlr jmlr2008 jmlr2008-24 knowledge-graph by maker-knowledge-mining

24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy (Special Topic on Causality)

Source: pdf

Author: Ilya Shpitser, Judea Pearl

Abstract: We consider a hierarchy of queries about causal relationships in graphical models, where each level in the hierarchy requires more detailed information than the one below. The hierarchy consists of three levels: associative relationships, derived from a joint distribution over the observable variables; cause-effect relationships, derived from distributions resulting from external interventions; and counterfactuals, derived from distributions that span multiple “parallel worlds” and resulting from simultaneous, possibly conﬂicting observations and interventions. We completely characterize cases where a given causal query can be computed from information lower in the hierarchy, and provide algorithms that accomplish this computation. Speciﬁcally, we show when effects of interventions can be computed from observational studies, and when probabilities of counterfactuals can be computed from experimental studies. We also provide a graphical characterization of those queries which cannot be computed (by any method) from queries at a lower layer of the hierarchy. Keywords: causality, graphical causal models, identiﬁcation

Reference: text

Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 A key point here is that in order to make causal inferences from statistics, additional causal assumptions are needed. [sent-29, score-0.73]

2 In this paper, the language that we use for this purpose is the language of graphs, and our causal assumptions will be encoded by a special directed graph called a causal diagram. [sent-34, score-0.835]

3 A causal diagram encodes variables of interest as nodes, and possible direct causal inﬂuences between two variables as arrows. [sent-37, score-0.785]

4 Associated with each node in a causal diagram is a stable causal mechanism which determines its value in terms of the values of its parents. [sent-38, score-0.836]

5 The crucial feature of causal effect queries which distinguishes them from more complex questions in our hierarchy is that they are restricted to the post-intervention world alone. [sent-45, score-0.55]

6 The third and ﬁnal class of queries we consider are counterfactual or “what-if” questions which arise when we simultaneously ask about multiple hypothetical worlds, with potentially conﬂicting 1942 C OMPLETE I DENTIFICATION M ETHODS FOR THE C AUSAL H IERARCHY interventions or observations. [sent-46, score-0.72]

7 If everything about our causal domain is known, in other words if we have knowledge of both the causal mechanisms and the distributions over unobservable variables, it is possible to compute counterfactual questions directly (Balke and Pearl, 1994b). [sent-50, score-1.366]

8 We therefore consider the more practical question of how to compute counterfactual questions from both experimental studies and the structure of the causal diagram. [sent-52, score-0.936]

9 Section 4 considers identiﬁcation of counterfactual queries, while Section 5 summarizes the conclusions. [sent-61, score-0.531]

10 Notation and Deﬁnitions The primary object of causal inquiry is a probabilistic causal model. [sent-66, score-0.73]

11 The causal diagram, our vehicle for expressing causal assumptions, is deﬁned by the causal model as follows. [sent-83, score-1.095]

12 A graph deﬁned as above from a causal model M is said to be a causal diagram induced by M. [sent-90, score-0.89]

13 For any given u, the effect of do(x) on a set of variables Y will be represented by counterfactual variables Yx (u), where Y ∈ Y. [sent-101, score-0.557]

14 We now deﬁne joint probabilities on counterfactuals, in multiple worlds, which will serve as the formalization of counterfactual queries. [sent-105, score-0.531]

15 In this way, P(u) induces a distribution on all possible counterfactual variables in M. [sent-120, score-0.531]

16 In this paper, we will represent counterfactual utterances by joint distributions such as P(γ) or conditional distributions such as P(γ|δ), where γ and δ are conjunctions of counterfactual events. [sent-121, score-1.062]

17 First, having deﬁned causal diagrams induced by natural causal models, we consider the graphs induced by models derived from interventional and counterfactual queries. [sent-167, score-1.382]

18 ∧ yk k , as we already discussed invokes multiple hypothetical causal x x worlds, each represented by a submodel, where all worlds share the same background context U. [sent-173, score-0.536]

19 In Section 4 we discuss this issue in more detail and suggest a more appropriate graphical representation of counterfactual situations. [sent-176, score-0.531]

20 The goal of this paper is a complete characterization of causal graphs which permit the answering of causal queries of a given type. [sent-186, score-0.875]

21 Since intervening on X cuts off X from the normal causal inﬂuences of its parents in the graph, we can interpret the causal effect of X on Y as the ﬂow of dependence which leaves X via outgoing arrows only. [sent-209, score-0.827]

22 Such counterfactual questions will be considered in the next section. [sent-369, score-0.571]

23 We conclude this section by showing that our notion of a causal theory as a set of independencies embodied by the causal graph, together with rules of probability and do-calculus is complete for computing causal effects, if we also take statistical data embodied by P(v) as axiomatic. [sent-401, score-1.095]

24 1954 C OMPLETE I DENTIFICATION M ETHODS FOR THE C AUSAL H IERARCHY A A=false A*=true H H H* (a) (b) Figure 8: (a) A causal graph for the aspirin/headache domain (b) A corresponding twin network ∗ graph for the query P(Ha∗ =true |A = f alse). [sent-402, score-0.699]

25 An informal counterfactual statement in natural language such as “would I have a headache had I taken an aspirin” talks about multiple worlds: the actual world, and other, hypothetical worlds which differ in some small respect from the actual world (e. [sent-409, score-0.802]

26 In this paper, we represent the actual world by a causal model in its natural state, devoid of any interventions, while the alternative worlds are represented by submodels M x where the action do(x) implements the hypothetical change from the actual state of affairs considered. [sent-412, score-0.587]

27 People make sense of informal statements involving multiple, possibly conﬂicting worlds because they expect not only the causal rules to be invariant across these worlds (e. [sent-413, score-0.619]

28 ” The actual world referenced by this query is represented by a causal model containing two variables, headache and aspirin, with aspirin being a parent of headache, see Fig. [sent-419, score-0.735]

29 The two worlds share the background variables U, and so can be represented by a 1955 S HPITSER AND P EARL single causal model with the graph shown in Fig. [sent-424, score-0.597]

30 The graphs representing two hypothetical worlds invoked by a counterfactual query like the one shown in Fig. [sent-430, score-0.865]

31 Instead, the typical state of knowledge of a causal domain is the statistical behavior of the observable variables in the domain, summarized by the distribution P(v), together with knowledge of causal directionality, obtained either from expert judgment (e. [sent-436, score-0.813]

32 Nevertheless, there are reasons to consider computing counterfactual quantities from experimental, rather than observational studies. [sent-442, score-0.56]

33 In general, a counterfactual can posit worlds with features contradictory to what has actually been observed. [sent-443, score-0.658]

34 The question that we ask in this section, then, is whether it is possible to identify a query P(γ|δ), where γ, δ are conjunctions of counterfactual events (with δ possibly empty), from the graph G and the set of all experiments P∗ . [sent-449, score-0.749]

35 Before tackling the problem of identifying counterfactual queries from experiments, we extend our example in Fig. [sent-452, score-0.623]

36 8 (b) to a general graphical representation for worlds invoked by a counterfactual query. [sent-453, score-0.658]

37 (a) Original causal diagram (b) Parallel worlds graph for P(y x |x , zd , d) (the two nodes denoted by U are the same). [sent-458, score-0.802]

38 9 shows the original causal graph and the parallel worlds graph for γ = yx ∧ x ∧ zd ∧ d. [sent-468, score-0.97]

39 The ﬁrst is that by removing node duplication, we also determine which syntactically distinct counterfactual variables correspond to the same random variable. [sent-479, score-0.582]

40 By identifying such equivalence classes of counterfactual variables, we guarantee that syntactically different variables are in fact different, and this makes it simpler to reason about counterfactuals in order to identify them. [sent-480, score-0.681]

41 For instance, a counterfactual P(y x , y ) may either be non-identiﬁable or inconsistent (and so identiﬁable to equal 0), depending on whether Yx and Y are the same variable. [sent-481, score-0.531]

42 If two distinct nodes in a causal diagram represent the same random variable, the diagram contains redundant information, and the nodes must be merged. [sent-495, score-0.649]

43 The new node ω we obtain from Lemma 25 can be thought of as a new counterfactual variable. [sent-506, score-0.582]

44 Since we replaced α, β by ω, we replace any mention of α, β in our given counterfactual query P(γ) by ω. [sent-512, score-0.613]

45 1958 C OMPLETE I DENTIFICATION M ETHODS FOR THE C AUSAL H IERARCHY function make-cg(G, γ) INPUT: G a causal diagram, γ a conjunction of counterfactual events OUTPUT: A counterfactual graph Gγ , and either a set of events γ s. [sent-513, score-1.594]

46 Figure 10: An algorithm for constructing counterfactual graphs Note that since α, β are the same, their value assignments must be the same (say equal to y). [sent-523, score-0.641]

47 The ﬁrst is that if variables Yx ,Yz were judged to be the same by Lemma 24, but γ assigns them different values, this implies that the original set of counterfactual events γ is inconsistent, and so P(γ) = 0. [sent-530, score-0.562]

48 Finally, because the algorithm can make an arbitrary choice picking a parent of ω each time Lemma 25 is applied, both the counterfactual graph G , and the corresponding modiﬁed counterfactual γ are not unique. [sent-534, score-1.243]

49 However, in our counterfactual query, which is P(y x |x , zd , d), the variable D happens to be observed to attain the value d, the same as the intervention value for the parent of Zd . [sent-554, score-0.735]

50 9 (c), which is a counterfactual graph for our query. [sent-589, score-0.636]

51 Having constructed a graphical representation of worlds mentioned in counterfactual queries, we can turn to identiﬁcation. [sent-590, score-0.658]

52 We construct two algorithms for this task, the ﬁrst is called ID* and works for unconditional queries, while the second, IDC*, works on queries with counterfactual evidence and calls the ﬁrst as a subroutine. [sent-591, score-0.595]

53 ) the set of values (either set or observed) appearing in a given counterfactual conjunction (or set of counterfactual events), while val(. [sent-597, score-1.062]

54 ) is the value assigned to a given counterfactual variable. [sent-598, score-0.531]

55 This notation is used to extract variables and values present in the original causal model from a counterfactual which refers to parallel worlds. [sent-599, score-0.92]

56 A counterfactual sr , where s, r are value assignments to sets of nodes, represents the event “the node set S attains values s under intervention do(r). [sent-603, score-0.676]

57 to mean some counterfactual variable derived from X where x appears in the subscript (the rest of the subscript can be arbitrary), which also attains value x. [sent-607, score-0.599]

58 The second line states that if γ contains a counterfactual which violates the Axiom of Effectiveness (Pearl, 2000), then γ is inconsistent, and we return probability 0. [sent-611, score-0.607]

59 The third line states that if a counterfactual contains its own value in the subscript, then it is a tautological event, and it can be removed from γ without affecting its probability. [sent-612, score-0.581]

60 Line 4 invokes make-cg to construct a counterfactual graph G , and the corresponding relabeled counterfactual γ . [sent-613, score-1.167]

61 Line 5 returns probability 0 if an inconsistency was found during the construction of the counterfactual graph, for example, if two variables found to be the same in γ had different value assignments. [sent-614, score-0.531]

62 Line 6 is analogous to Line 4 in the ID algorithm, it decomposes the problem into a set of subproblems, one for each C-component in the counterfactual graph. [sent-615, score-0.531]

63 Line 7 is the base case, where our counterfactual graph has a single Ccomponent. [sent-618, score-0.636]

64 The problem with counterfactual distributions is there is no simple way to prevent non-positive distributions spanning multiple worlds from arising, even if the original P(v) was positive—hence the explicit check. [sent-625, score-0.658]

65 The second line constructs the counterfactual graph, except since make-cg can only take conjunctions, we provide it with a joint counterfactual γ ∧ δ. [sent-626, score-1.112]

66 Here in IDC*, we move a counterfactual value assignment Yx = y from being observed (that is being a part of δ), to being ﬁxed (that is appearing in every subscript of γ ) if there are no back-door paths from Yx to the counterfactual of interest γ . [sent-630, score-1.136]

67 Finally, line 5 of IDC* is the analogue of line 2 of IDC, we attempt to identify a joint counterfactual probability, and then obtain a conditional counterfactual probability from the result. [sent-631, score-1.162]

68 Since P(x , zd , d) is not inconsistent, we proceed to construct the counterfactual graph on line 2. [sent-633, score-0.749]

69 Note that since the subscripts in one of the variables of our query changed, the counterfactual graph generated will change as well. [sent-637, score-0.744]

70 This is not a coincidence, since a counterfactual graph can be thought of as a causal graph for a particular large causal model which happens to have some distinct nodes share the same causal mechanisms. [sent-649, score-1.923]

71 This means that all the theorems and deﬁnitions used in the previous sections for causal diagrams transfer over without change to counterfactual graphs. [sent-650, score-0.896]

72 Note that since Wx,z is a counterfactual variable derived from W , it shares its domain with W . [sent-660, score-0.531]

73 1963 S HPITSER AND P EARL of counterfactual events does not contain conﬂicting value assignments to any variable, obtained either by observation or intervention, then taking the union of all actions of the events results in a consistent action. [sent-662, score-0.622]

74 We catalogue all difﬁcult counterfactual graphs which arise from queries which cannot be identiﬁed from P∗ . [sent-666, score-0.676]

75 The simplest difﬁcult counterfactual graph arises from the query P(y x , yx ) named “probability of necessity and sufﬁciency” by Pearl (2000). [sent-669, score-0.899]

76 The intuitive explanation for this result is that P(yx , yx ) is derived from the joint distribution over the counterfactual variables in the w-graph, while if we restrict ourselves to P∗ , we only have access to marginal distributions—one marginal for each possible world. [sent-680, score-0.712]

77 Because counterfactual variables Yx and Yx share an unobserved parent U, they are dependent, and their joint distribution cannot be decomposed into a product of marginals. [sent-681, score-0.607]

78 This intuitive argument can be generalized to a counterfactual graph with more than two nodes, the so-called “zig-zag graphs” an example of which is shown in Fig. [sent-683, score-0.636]

79 In order to continue, we must provide two lemmas which allow us to transform difﬁcult graphs in various ways by adding nodes and edges, while retaining the non-identiﬁability of the underlying counterfactual from P∗ . [sent-702, score-0.699]

80 , yn m } be a subset x x of counterfactual events in γ. [sent-707, score-0.562]

81 x x 1964 C OMPLETE I DENTIFICATION M ETHODS FOR THE C AUSAL H IERARCHY X Y W 1 x W 2 Z Y (a) x’ W1 W2 Z (b) Figure 13: (a) Causal diagram (b) Corresponding counterfactual graph for the non-identiﬁable query P(Yx ,W 1 ,W 2 , Zx ). [sent-720, score-0.773]

82 If a given P(γ) cannot be identiﬁed from P∗ , this implies that there exist two models which agree on P∗ , but disagree on P(γ), where γ is a set of counterfactual causes. [sent-723, score-0.581]

83 It is then possible to augment these models using the one-to-one function in question to obtain disagreement on P(δ), where δ is a set of counterfactual effects of γ. [sent-724, score-0.578]

84 If we have a causal model with a graph G where some counterfactual P(γ) is not identiﬁable, then a coarser, more “near-sighted” view of G which merges two distinct variables with their own mechanisms into a single variable with a single mechanism will not render P(γ) identiﬁable. [sent-740, score-1.034]

85 It turns out that whenever ID* fails on P(γ), the corresponding counterfactual graph contains a subgraph which can be obtained by a set of applications of the previous two lemmas to the w-graph and the zig-zag graphs. [sent-745, score-0.662]

86 Since ID* is complete for P(γ) queries, we can give a graphical characterization of counterfactual graphs where P(γ) cannot be identiﬁed from P∗ . [sent-748, score-0.612]

87 Speciﬁcally, we provide complete algorithms for identifying causal effects and conditional causal effects from observational studies, and show that a graphical structure called a hedge completely characterizes all cases where causal effects are non-identiﬁable. [sent-756, score-1.398]

88 We also provide complete algorithms for identifying counterfactual queries (possibly conditional) from experimental studies. [sent-758, score-0.623]

89 If we view the structure of the causal graph as experimentally testable, as is often the case in practice, this result can be viewed as giving a full characterization of testable counterfactuals assuming structural semantics. [sent-759, score-0.592]

90 The most important point in these proofs is that counterfactual graphs are generally no different from causal diagrams discussed in Sections 2 and 3, with their only special feature being that by construction, some nodes in the graph happen to share functions. [sent-1037, score-1.169]

91 But we know that the counterfactual graph is just a causal diagram for a model where some nodes share functions, so the same reasoning applies. [sent-1053, score-1.143]

92 To show completeness of ID* and IDC*, we ﬁrst prove a utility lemma which will make it easier to construct counterexamples which agree on P∗ but disagree on a given counterfactual query. [sent-1055, score-0.628]

93 The next result generalizes Lemma 27 to a wider set of counterfactual graphs which result from non-identiﬁable queries. [sent-1063, score-0.612]

94 To obtain a full characterization of non-identiﬁable counterfactual graphs, we augment the difﬁcult graphs we obtained from the previous two results using certain graph transformation rules which preserve non-identiﬁability. [sent-1094, score-0.717]

95 , yn m } be a subset of counterfactual events in γ. [sent-1100, score-0.562]

96 However, since γ is left alone if X,Y do not occur in γ (the third condition), only the remaining four of these conditions result in an actual modiﬁcation of a counterfactual variable in γ. [sent-1143, score-0.531]

97 The ﬁrst clearly results in the same counterfactual variable. [sent-1145, score-0.531]

98 Proof The difﬁcult step is to show that after line 5 is reached, if P∗ , G id P(γ, δ) then P∗ , G id P(γ|δ). [sent-1185, score-0.622]

99 Augment the counterexample models which induce counterfactual graph G with an additional binary node for Y , and let the value of D be set as the old value plus Y modulo |D|. [sent-1193, score-0.687]

100 In case 1, we must construct the distributions along the back-door path in such a way that if P∗ , G id P(Y |δ ) then P∗ , G id P(Y |δ ), where Y is a node preceding Y on the path. [sent-1202, score-0.655]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('counterfactual', 0.531), ('causal', 0.365), ('px', 0.311), ('id', 0.286), ('yx', 0.181), ('idc', 0.17), ('worlds', 0.127), ('counterfactuals', 0.122), ('aspirin', 0.112), ('graph', 0.105), ('hedge', 0.105), ('dentification', 0.101), ('earl', 0.101), ('hpitser', 0.101), ('ierarchy', 0.101), ('identi', 0.087), ('nodes', 0.087), ('observable', 0.083), ('query', 0.082), ('graphs', 0.081), ('judea', 0.081), ('pearl', 0.08), ('parity', 0.079), ('parent', 0.076), ('headache', 0.074), ('omplete', 0.07), ('ethods', 0.065), ('ausal', 0.065), ('intervention', 0.065), ('queries', 0.064), ('zd', 0.063), ('submodel', 0.063), ('bidirected', 0.06), ('mod', 0.058), ('soundness', 0.055), ('diagram', 0.055), ('node', 0.051), ('line', 0.05), ('agree', 0.05), ('mx', 0.048), ('effects', 0.047), ('lemma', 0.047), ('si', 0.046), ('child', 0.045), ('bit', 0.045), ('hypothetical', 0.044), ('parents', 0.043), ('twin', 0.042), ('yw', 0.042), ('able', 0.042), ('interventions', 0.041), ('interventional', 0.04), ('paths', 0.04), ('questions', 0.04), ('balke', 0.037), ('arrow', 0.037), ('arcs', 0.037), ('zx', 0.035), ('subscript', 0.034), ('mechanisms', 0.033), ('path', 0.032), ('bow', 0.032), ('ilya', 0.032), ('shpitser', 0.032), ('unobservable', 0.032), ('events', 0.031), ('merge', 0.031), ('arc', 0.03), ('incoming', 0.03), ('hierarchy', 0.029), ('observational', 0.029), ('odd', 0.029), ('assignments', 0.029), ('wx', 0.028), ('downward', 0.028), ('identifying', 0.028), ('arrows', 0.028), ('ow', 0.028), ('proof', 0.027), ('alse', 0.027), ('avin', 0.027), ('confounded', 0.027), ('nonidenti', 0.027), ('pcm', 0.027), ('effect', 0.026), ('return', 0.026), ('gx', 0.026), ('subscripts', 0.026), ('icting', 0.026), ('sub', 0.026), ('world', 0.026), ('subgraph', 0.026), ('action', 0.025), ('parallel', 0.024), ('theorem', 0.024), ('ws', 0.024), ('ancestors', 0.024), ('pv', 0.024), ('ui', 0.024), ('pa', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999899 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy (Special Topic on Causality)

Author: Ilya Shpitser, Judea Pearl

2 0.19929232 20 jmlr-2008-Causal Reasoning with Ancestral Graphs (Special Topic on Causality)

Author: Jiji Zhang

Abstract: Causal reasoning is primarily concerned with what would happen to a system under external interventions. In particular, we are often interested in predicting the probability distribution of some random variables that would result if some other variables were forced to take certain values. One prominent approach to tackling this problem is based on causal Bayesian networks, using directed acyclic graphs as causal diagrams to relate post-intervention probabilities to pre-intervention probabilities that are estimable from observational data. However, such causal diagrams are seldom fully testable given observational data. In consequence, many causal discovery algorithms based on data-mining can only output an equivalence class of causal diagrams (rather than a single one). This paper is concerned with causal reasoning given an equivalence class of causal diagrams, represented by a (partial) ancestral graph. We present two main results. The ﬁrst result extends Pearl (1995)’s celebrated do-calculus to the context of ancestral graphs. In the second result, we focus on a key component of Pearl’s calculus—the property of invariance under interventions, and give stronger graphical conditions for this property than those implied by the ﬁrst result. The second result also improves the earlier, similar results due to Spirtes et al. (1993). Keywords: ancestral graphs, causal Bayesian network, do-calculus, intervention

3 0.15546399 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning (Special Topic on Causality)

Author: Jean-Philippe Pellet, André Elisseeff

Abstract: We show how a generic feature-selection algorithm returning strongly relevant variables can be turned into a causal structure-learning algorithm. We prove this under the Faithfulness assumption for the data distribution. In a causal graph, the strongly relevant variables for a node X are its parents, children, and children’s parents (or spouses), also known as the Markov blanket of X. Identifying the spouses leads to the detection of the V-structure patterns and thus to causal orientations. Repeating the task for all variables yields a valid partially oriented causal graph. We ﬁrst show an efﬁcient way to identify the spouse links. We then perform several experiments in the continuous domain using the Recursive Feature Elimination feature-selection algorithm with Support Vector Regression and empirically verify the intuition of this direct (but computationally expensive) approach. Within the same framework, we then devise a fast and consistent algorithm, Total Conditioning (TC), and a variant, TCbw , with an explicit backward feature-selection heuristics, for Gaussian data. After running a series of comparative experiments on ﬁve artiﬁcial networks, we argue that Markov blanket algorithms such as TC/TCbw or Grow-Shrink scale better than the reference PC algorithm and provides higher structural accuracy. Keywords: causal structure learning, feature selection, Markov blanket, partial correlation, statistical test of conditional independence

4 0.14358401 10 jmlr-2008-Active Learning of Causal Networks with Intervention Experiments and Optimal Designs (Special Topic on Causality)

Author: Yang-Bo He, Zhi Geng

Abstract: The causal discovery from data is important for various scientiﬁc investigations. Because we cannot distinguish the different directed acyclic graphs (DAGs) in a Markov equivalence class learned from observational data, we have to collect further information on causal structures from experiments with external interventions. In this paper, we propose an active learning approach for discovering causal structures in which we ﬁrst ﬁnd a Markov equivalence class from observational data, and then we orient undirected edges in every chain component via intervention experiments separately. In the experiments, some variables are manipulated through external interventions. We discuss two kinds of intervention experiments, randomized experiment and quasi-experiment. Furthermore, we give two optimal designs of experiments, a batch-intervention design and a sequential-intervention design, to minimize the number of manipulated variables and the set of candidate structures based on the minimax and the maximum entropy criteria. We show theoretically that structural learning can be done locally in subgraphs of chain components without need of checking illegal v-structures and cycles in the whole network and that a Markov equivalence subclass obtained after each intervention can still be depicted as a chain graph. Keywords: active learning, causal networks, directed acyclic graphs, intervention, Markov equivalence class, optimal design, structural learning

5 0.1373857 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models

Author: Tianjiao Chu, Clark Glymour

Abstract: Pointwise consistent, feasible procedures for estimating contemporaneous linear causal structure from time series data have been developed using multiple conditional independence tests, but no such procedures are available for non-linear systems. We describe a feasible procedure for learning a class of non-linear time series structures, which we call additive non-linear time series. We show that for data generated from stationary models of this type, two classes of conditional independence relations among time series variables and their lags can be tested efﬁciently and consistently using tests based on additive model regression. Combining results of statistical tests for these two classes of conditional independence relations and the temporal structure of time series data, a new consistent model speciﬁcation procedure is able to extract relatively detailed causal information. We investigate the ﬁnite sample behavior of the procedure through simulation, and illustrate the application of this method through analysis of the possible causal connections among four ocean indices. Several variants of the procedure are also discussed. Keywords: conditional independence test, contemporaneous causation, additive model regression, Granger causality, ocean indices

6 0.086737901 13 jmlr-2008-An Error Bound Based on a Worst Likely Assignment

7 0.077085726 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs

8 0.074457079 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition

9 0.065543093 59 jmlr-2008-Maximal Causes for Non-linear Component Extraction

10 0.060197558 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure

11 0.059023131 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models

12 0.048264857 41 jmlr-2008-Graphical Models for Structured Classification, with an Application to Interpreting Images of Protein Subcellular Location Patterns

13 0.044034235 70 jmlr-2008-On Relevant Dimensions in Kernel Feature Spaces

14 0.040829044 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks

15 0.04057955 47 jmlr-2008-Learning Balls of Strings from Edit Corrections

16 0.030813318 29 jmlr-2008-Cross-Validation Optimization for Large Scale Structured Classification Kernel Methods

17 0.028125597 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction

18 0.026780231 44 jmlr-2008-Incremental Identification of Qualitative Models of Biological Systems using Inductive Logic Programming

19 0.026725622 11 jmlr-2008-Aggregation of SVM Classifiers Using Sobolev Spaces

20 0.026224216 64 jmlr-2008-Model Selection in Kernel Based Regression using the Influence Function (Special Topic on Model Selection)

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.201), (1, 0.303), (2, 0.019), (3, 0.009), (4, 0.021), (5, -0.073), (6, 0.02), (7, 0.001), (8, -0.045), (9, -0.066), (10, -0.015), (11, 0.088), (12, 0.066), (13, -0.399), (14, 0.159), (15, -0.016), (16, 0.103), (17, 0.014), (18, -0.196), (19, -0.041), (20, -0.037), (21, -0.016), (22, 0.026), (23, 0.098), (24, -0.07), (25, -0.036), (26, -0.062), (27, -0.057), (28, -0.008), (29, -0.062), (30, -0.089), (31, -0.014), (32, 0.058), (33, -0.102), (34, 0.017), (35, 0.081), (36, -0.043), (37, -0.023), (38, -0.001), (39, -0.135), (40, 0.027), (41, 0.137), (42, 0.003), (43, -0.017), (44, -0.022), (45, -0.046), (46, -0.071), (47, -0.103), (48, -0.034), (49, -0.106)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95972103 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy (Special Topic on Causality)

Author: Ilya Shpitser, Judea Pearl

2 0.6923033 20 jmlr-2008-Causal Reasoning with Ancestral Graphs (Special Topic on Causality)

Author: Jiji Zhang

3 0.63766927 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning (Special Topic on Causality)

Author: Jean-Philippe Pellet, André Elisseeff

4 0.56888193 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models

Author: Tianjiao Chu, Clark Glymour

5 0.37144062 10 jmlr-2008-Active Learning of Causal Networks with Intervention Experiments and Optimal Designs (Special Topic on Causality)

Author: Yang-Bo He, Zhi Geng

6 0.35010046 59 jmlr-2008-Maximal Causes for Non-linear Component Extraction

7 0.30062476 13 jmlr-2008-An Error Bound Based on a Worst Likely Assignment

8 0.27372465 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks

9 0.2566945 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models

10 0.20647739 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure

11 0.19833803 41 jmlr-2008-Graphical Models for Structured Classification, with an Application to Interpreting Images of Protein Subcellular Location Patterns

12 0.17711051 47 jmlr-2008-Learning Balls of Strings from Edit Corrections

13 0.1765593 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs

14 0.16277802 44 jmlr-2008-Incremental Identification of Qualitative Models of Biological Systems using Inductive Logic Programming

15 0.16090822 64 jmlr-2008-Model Selection in Kernel Based Regression using the Influence Function (Special Topic on Model Selection)

16 0.14658457 55 jmlr-2008-Linear-Time Computation of Similarity Measures for Sequential Data

17 0.13804644 49 jmlr-2008-Learning Control Knowledge for Forward Search Planning

18 0.13595077 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers

19 0.13391255 70 jmlr-2008-On Relevant Dimensions in Kernel Feature Spaces

20 0.13341169 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.03), (1, 0.011), (5, 0.018), (17, 0.401), (30, 0.012), (31, 0.031), (40, 0.028), (54, 0.035), (58, 0.03), (61, 0.021), (66, 0.043), (76, 0.05), (78, 0.011), (87, 0.023), (88, 0.072), (92, 0.034), (94, 0.036), (99, 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.77572489 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy (Special Topic on Causality)

Author: Ilya Shpitser, Judea Pearl

2 0.28757811 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks

Author: Gal Elidan, Stephen Gould

Abstract: With the increased availability of data for complex domains, it is desirable to learn Bayesian network structures that are sufﬁciently expressive for generalization while at the same time allow for tractable inference. While the method of thin junction trees can, in principle, be used for this purpose, its fully greedy nature makes it prone to overﬁtting, particularly when data is scarce. In this work we present a novel method for learning Bayesian networks of bounded treewidth that employs global structure modiﬁcations and that is polynomial both in the size of the graph and the treewidth bound. At the heart of our method is a dynamic triangulation that we update in a way that facilitates the addition of chain structures that increase the bound on the model’s treewidth by at most one. We demonstrate the effectiveness of our “treewidth-friendly” method on several real-life data sets and show that it is superior to the greedy approach as soon as the bound on the treewidth is nontrivial. Importantly, we also show that by making use of global operators, we are able to achieve better generalization even when learning Bayesian networks of unbounded treewidth. Keywords: Bayesian networks, structure learning, model selection, bounded treewidth

3 0.2866807 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks

Author: Michael Collins, Amir Globerson, Terry Koo, Xavier Carreras, Peter L. Bartlett

Abstract: Log-linear and maximum-margin models are two commonly-used methods in supervised machine learning, and are frequently used in structured prediction problems. Efﬁcient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or maxmargin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O( 1 ) EG ε updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log( 1 )) updates are required. For both the max-margin and log-linear cases, our bounds suggest ε that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments conﬁrm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be efﬁciently applied to problems such as sequence learning or natural language parsing. We perform extensive evaluation of the algorithms, comparing them to L-BFGS and stochastic gradient descent for log-linear models, and to SVM-Struct for max-margin models. The algorithms are applied to a multi-class problem as well as to a more complex large-scale parsing task. In all these settings, the EG algorithms presented here outperform the other methods. Keywords: exponentiated gradient, log-linear models, maximum-margin models, structured prediction, conditional random ﬁelds ∗. These authors contributed equally. c 2008 Michael Col

4 0.28073409 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition

Author: Zongming Ma, Xianchao Xie, Zhi Geng

Abstract: Chain graphs present a broad class of graphical models for description of conditional independence structures, including both Markov networks and Bayesian networks as special cases. In this paper, we propose a computationally feasible method for the structural learning of chain graphs based on the idea of decomposing the learning problem into a set of smaller scale problems on its decomposed subgraphs. The decomposition requires conditional independencies but does not require the separators to be complete subgraphs. Algorithms for both skeleton recovery and complex arrow orientation are presented. Simulations under a variety of settings demonstrate the competitive performance of our method, especially when the underlying graph is sparse. Keywords: chain graph, conditional independence, decomposition, graphical model, structural learning

5 0.27982712 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model

Author: Matthias W. Seeger

Abstract: The linear model with sparsity-favouring prior on the coefﬁcients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identiﬁcation from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in signiﬁcant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identiﬁcation application appears in Steinke et al. (2007). Keywords: sparse linear model, Laplace prior, expectation propagation, approximate inference, optimal design, Bayesian statistics, gene network recovery, image coding, compressive sensing

6 0.27218235 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers

7 0.27200061 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning

8 0.27033651 20 jmlr-2008-Causal Reasoning with Ancestral Graphs (Special Topic on Causality)

9 0.269869 41 jmlr-2008-Graphical Models for Structured Classification, with an Application to Interpreting Images of Protein Subcellular Location Patterns

10 0.26959139 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning (Special Topic on Causality)

11 0.26957718 57 jmlr-2008-Manifold Learning: The Price of Normalization

12 0.26900035 58 jmlr-2008-Max-margin Classification of Data with Absent Features

13 0.26882043 9 jmlr-2008-Active Learning by Spherical Subdivision

14 0.26845682 56 jmlr-2008-Magic Moments for Structured Output Prediction

15 0.26831806 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration

16 0.26612335 83 jmlr-2008-Robust Submodular Observation Selection

17 0.2648837 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection

18 0.26455784 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs

19 0.26375449 72 jmlr-2008-On the Size and Recovery of Submatrices of Ones in a Random Binary Matrix

20 0.26320326 86 jmlr-2008-SimpleMKL