nips nips2013 nips2013-337 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Elias Bareinboim, Sanghack Lee, Vasant Honavar, Judea Pearl
Abstract: This paper considers the problem of transferring experimental findings learned from multiple heterogeneous domains to a target domain, in which only limited experiments can be performed. We reduce questions of transportability from multiple domains and with limited scope to symbolic derivations in the causal calculus, thus extending the original setting of transportability introduced in [1], which assumes only one domain with full experimental information available. We further provide different graphical and algorithmic conditions for computing the transport formula in this setting, that is, a way of fusing the observational and experimental information scattered throughout different domains to synthesize a consistent estimate of the desired effects in the target domain. We also consider the issue of minimizing the variance of the produced estimand in order to increase power. 1 Motivation Transporting and synthesizing experimental knowledge from heterogeneous settings are central to scientific discovery. Conclusions that are obtained in a laboratory setting are transported and applied elsewhere in an environment that differs in many aspects from that of the laboratory. In data-driven sciences, experiments are conducted on disparate domains, but the intention is almost invariably to fuse the acquired knowledge, and translate it into some meaningful claim about a target domain, which is usually different than any of the individual study domains. However, the conditions under which this extrapolation can be legitimized have not been formally articulated until very recently. Although the problem has been discussed in many areas of statistics, economics, and the health sciences, under rubrics such as “external validity” [2, 3], “meta-analysis” [4], “quasi-experiments” [5], “heterogeneity” [6], these discussions are limited to verbal narratives in the form of heuristic guidelines for experimental researchers – no formal treatment of the problem has been attempted to answer the practical challenge of generalizing causal knowledge across multiple heterogeneous domains with disparate experimental data posed in this paper. The fields of artificial intelligence and statistics provide the theoretical underpinnings necessary for tackling transportability. First, the distinction between statistical and causal knowledge has received syntactic representation through causal diagrams [7, 8, 9], which became a popular tool for causal inference in data-driven fields. Second, the inferential machinery provided by the causal calculus (do-calculus) [7, 9, 10] is particularly suitable for handling knowledge transfer across domains. Armed with these techniques, [1] introduced a formal language for encoding differences and commonalities between domains accompanied with necessary or sufficient conditions under which transportability of empirical findings is feasible between two domains, a source and a target; then, these conditions were extended for a complete characterization for transportability in one domain with unrestricted experimental data [11]. Subsequently, these results were generalized for the settings when ∗ These authors contributed equally to this paper. The authors’ addresses are respectively eb@cs.ucla.edu, sxl439@ist.psu.edu, vhonavar@ist.psu.edu, judea@cs.ucla.edu. 1 only limited experiments are available in the source domain [12, 13], and further for when multiple source domains with unrestricted experimental information are available [14, 15]. This paper broadens these discussions introducing a more general setting in which multiple heterogeneous sources with limited and distinct experiments are available, a task that we call here “mz-transportability”.1 More formally, the mz-transportability problem concerns with the transfer of causal knowledge from a heterogeneous collection of source domains Π = {π1 , ..., πn } to a target domain π ∗ . In each domain πi ∈ Π, experiments over a set of variables Zi can be performed, and causal knowledge gathered. In π ∗ , potentially different from πi , only passive observations can be collected (this constraint is weakened later on). The problem is to infer a causal relationship R in π ∗ using knowledge obtained in Π. Clearly, if nothing is known about the relationship between Π and π ∗ , the problem is trivial; no transfer can be justified. Yet the fact that all scientific experiments are conducted with the intent of being used elsewhere (e.g., outside the lab) implies that scientific progress relies on the assumption that certain domains share common characteristics and that, owed to these commonalities, causal claims would be valid in new settings even where experiments cannot be conducted. The problem stated in this paper generalizes the one-dimensional version of transportability with limited scope and the multiple dimensional with unlimited scope. Remarkably, while the effects of interest might not be individually transportable to the target domain from the experiments in any of the available sources, combining different pieces from the various sources may enable the estimation of the desired effects (to be shown later on). The goal of this paper is to formally understand under which conditions the target quantity is (non-parametrically) estimable from the available data. 2 Previous work and our contributions Consider Fig. 1(a) in which the node S represents factors that produce differences between source and target populations. Assume that we conduct a randomized trial in Los Angeles (LA) and estimate the causal effect of treatment X on outcome Y for every age group Z = z, denoted by P (y|do(x), z). We now wish to generalize the results to the population of the United States (U.S.), but we find the distribution P (x, y, z) in LA to be different from the one in the U.S. (call the latter P ∗ (x, y, z)). In particular, the average age in the U.S. is significantly higher than that in LA. How are we to estimate the causal effect of X on Y in U.S., denoted R = P ∗ (y|do(x))? 2 3 The selection diagram for this example (Fig. 1(a)) conveys the assumption that the only difference between the two populations are factors determining age distributions, shown as S → Z, while agespecific effects P ∗ (y|do(x), Z = z) are invariant across populations. Difference-generating factors are represented by a special set of variables called selection variables S (or simply S-variables), which are graphically depicted as square nodes ( ). From this assumption, the overall causal effect in the U.S. can be derived as follows: R P ∗ (y|do(x), z)P ∗ (z) = z P (y|do(x), z)P ∗ (z) = (1) z The last line is the transport formula for R. It combines experimental results obtained in LA, P (y|do(x), z), with observational aspects of the U.S. population, P ∗ (z), to obtain an experimental claim P ∗ (y|do(x)) about the U.S.. In this trivial example, the transport formula amounts to a simple re-calibration (or re-weighting) of the age-specific effects to account for the new age distribution. In general, however, a more involved mixture of experimental and observational findings would be necessary to obtain a bias-free estimate of the target relation R. Fig. 1(b) depicts the smallest example in which transportability is not feasible even when experiments over X in π are available. In real world applications, it may happen that certain controlled experiments cannot be conducted in the source environment (for financial, ethical, or technical reasons), so only a limited amount 1 The machine learning literature has been concerned about discrepancies among domains in the context, almost exclusively, on predictive or classification tasks as opposed to learning causal or counterfactual measures [16, 17]. Interestingly enough, recent work on anticausal learning moves towards more general modalities of learning and also leverages knowledge about the underlying data-generating structure [18, 19]. 2 We will use Px (y | z) interchangeably with P (y | do(x), z). 3 We use the structural interpretation of causal diagrams as described in [9, pp. 205]. 2 S S Z Z1 S S X Y (a) X Y (b) Z2 X Y (c) Figure 1: The selection variables S are depicted as square nodes ( ). (a) Selection diagram illustrating when transportability between two domains is trivially solved through simple recalibration. (b) The smallest possible selection diagram in which a causal relation is not transportable. (c) Selection diagram illustrating transportability when only experiments over {Z1 } are available in the source. of experimental information can be gathered. A natural question arises whether the investigator in possession of a limited set of experiments would still be able to estimate the desired effects at the target domain. For instance, we assume in Fig. 1(c) that experiments over Z1 are available and the target quantity is R = P ∗ (y|do(x)), which can be shown to be equivalent to P (y|x, do(Z1 )), the conditional distribution of Y given X in the experimental study when Z1 is randomized. 4 One might surmise that multiple pairwise z-transportability would be sufficient to solve the mztransportability problem, but this is not the case. To witness, consider Fig. 2(a,b) which concerns the transport of experimental results from two sources ({πa , πb }) to infer the effect of X on Y in π ∗ , R = P ∗ (y|do(x)). In these diagrams, X may represent the treatment (e.g., cholesterol level), Z1 represents a pre-treatment variable (e.g., diet), Z2 represents an intermediate variable (e.g., biomarker), and Y represents the outcome (e.g., heart failure). We assume that experimental studies randomizing {Z1 , Z2 } can be conducted in both domains. A simple analysis based on [12] can show that R cannot be z-transported from either source alone, but it turns out that combining in a special way experiments from both sources allows one to determine the effect in the target. More interestingly, we consider the more stringent scenario where only certain experiments can be performed in each of the domains. For instance, assume that it is only possible to conduct experiments over {Z2 } in πa and over {Z1 } in πb . Obviously, R will not be z-transported individually from these domains, but it turns out that taking both sets of experiments into account, R = z2 P (a) (y|do(z2 ))P (b) (z2 |x, do(Z1 )), which fully uses all pieces of experimental data available. In other words, we were able to decompose R into subrelations such that each one is separately z-transportable from the source domains, and so is the desired target quantity. Interestingly, it is the case in this example that if the domains in which experiments were conducted were reversed (i.e., {Z1 } randomized in πa , {Z2 } in πb ), it will not be possible to transport R by any method – the target relation is simply not computable from the available data (formally shown later on). This illustrates some of the subtle issues mz-transportability entails, which cannot be immediately cast in terms of previous instances of the transportability class. In the sequel, we try to better understand some of these issues, and we develop sufficient or (specific) necessary conditions for deciding special transportability for arbitrary collection of selection diagrams and set of experiments. We further construct an algorithm for deciding mz-transportability of joint causal effects and returning the correct transport formula whenever this is possible. We also consider issues relative to the variance of the estimand aiming for improving sample efficiency and increasing statistical power. 3 Graphical conditions for mz-transportability The basic semantical framework in our analysis rests on structural causal models as defined in [9, pp. 205], also called data-generating models. In the structural causal framework [9, Ch. 7], actions are modifications of functional relationships, and each action do(x) on a causal model M produces 4 A typical example is whether we can estimate the effect of cholesterol (X) on heart failure (Y ) by experiments on diet (Z1 ) given that cholesterol levels cannot be randomized [20]. 3 Z2 Z1 Z1 X Z1 X X Z2 W Z2 Y (a) Z2 Z1 Y X Z3 W U U Y Y (b) (c) Z3 (d) Figure 2: Selection diagrams illustrating impossibility of estimating R = P ∗ (y|do(x)) through individual transportability from πa and πb even when Z = {Z1 , Z2 } (for (a, b), (c, d))). If we assume, more stringently, availability of experiments Za = {Z2 }, Zb = {Z1 }, Z∗ = {}, a more elaborated analysis can show that R can be estimated combining different pieces from both domains. a new model Mx = U, V, Fx , P (U) , where Fx is obtained after replacing fX ∈ F for every X ∈ X with a new function that outputs a constant value x given by do(x). 5 We follow the conventions given in [9]. We denote variables by capital letters and their realized values by small letters. Similarly, sets of variables will be denoted by bold capital letters, sets of realized values by bold small letters. We use the typical graph-theoretic terminology with the corresponding abbreviations P a(Y)G and An(Y)G , which will denote respectively the set of observable parents and ancestors of the node set Y in G. A graph GY will denote the induced subgraph G containing nodes in Y and all arrows between such nodes. Finally, GXZ stands for the edge subgraph of G where all incoming arrows into X and all outgoing arrows from Z are removed. Key to the analysis of transportability is the notion of “identifiability,” defined below, which expresses the requirement that causal effects are computable from a combination of data P and assumptions embodied in a causal graph G. Definition 1 (Causal Effects Identifiability (Pearl, 2000, pp. 77)). The causal effect of an action do(x) on a set of variables Y such that Y ∩ X = ∅ is said to be identifiable from P in G if Px (y) is uniquely computable from P (V) in any model that induces G. Causal models and their induced graphs are usually associated with one particular domain (also called setting, study, population, or environment). In ordinary transportability, this representation was extended to capture properties of two domains simultaneously. This is possible if we assume that the structural equations share the same set of arguments, though the functional forms of the equations may vary arbitrarily [11]. 6 Definition 2 (Selection Diagram). Let M, M ∗ be a pair of structural causal models [9, pp. 205] relative to domains π, π ∗ , sharing a causal diagram G. M, M ∗ is said to induce a selection diagram D if D is constructed as follows: 1. Every edge in G is also an edge in D; 2. D contains an extra edge Si → Vi whenever there might exist a discrepancy fi = fi∗ or P (Ui ) = P ∗ (Ui ) between M and M ∗ . In words, the S-variables locate the mechanisms where structural discrepancies between the two domains are suspected to take place.7 Alternatively, the absence of a selection node pointing to a variable represents the assumption that the mechanism responsible for assigning value to that variable is identical in both domains. 5 The results presented here are also valid in other formalisms for causality based on potential outcomes. As discussed in the reference, the assumption of no structural changes between domains can be relaxed, but some structural assumptions regarding the discrepancies between domains must still hold. 7 Transportability assumes that enough structural knowledge about both domains is known in order to substantiate the production of their respective causal diagrams. In the absence of such knowledge, causal discovery algorithms might be used to infer the diagrams from data [8, 9]. 6 4 Armed with the concept of identifiability and selection diagrams, mz-transportability of causal effects can be defined as follows: Definition 3 (mz-Transportability). Let D = {D(1) , ..., D(n) } be a collection of selection diagrams relative to source domains Π = {π1 , ..., πn }, and target domain π ∗ , respectively, and Zi (and Z∗ ) i be the variables in which experiments can be conducted in domain πi (and π ∗ ). Let P i , Iz be i i the pair of observational and interventional distributions of πi , where Iz = Z ⊆Zi P (v|do(z )), ∗ and in an analogous manner, P ∗ , Iz be the observational and interventional distributions of π ∗ . ∗ ∗ The causal effect R = Px (y|w) is said to be mz-transportable from Π to π ∗ in D if Px (y|w) is i ∗ uniquely computable from i=1,...,n P i , Iz ∪ P ∗ , Iz in any model that induces D. ∗ i The requirement that R is uniquely computable from P ∗ , Iz and P i , Iz from all sources has a syntactic image in the causal calculus, which is captured by the following sufficient condition. Theorem 1. Let D = {D(1) , ..., D(n) } be a collection of selection diagrams relative to source domains Π = {π1 , ..., πn }, and target domain π ∗ , respectively, and Si represents the collection of i ∗ S-variables in the selection diagram D(i) . Let { P i , Iz } and P ∗ , Iz be respectively the pairs of observational and interventional distributions in the sources Π and target π ∗ . The relation R = P ∗ (y|do(x), w) is mz-transportable from Π to π ∗ in D if the expression P (y|do(x), w, S1 , ..., Sn ) is reducible, using the rules of the causal calculus, to an expression in which (1) do-operators that ∗ i apply to subsets of Iz have no Si -variables or (2) do-operators apply only to subsets of Iz . This result provides a powerful way to syntactically establish mz-transportability, but it is not immediately obvious whether a sequence of applications of the rules of causal calculus that achieves the reduction required by the theorem exists, and even if such sequence exists, it is not obvious how to obtain it. For concreteness, we illustrate this result using the selection diagrams in Fig. 2(a,b). Corollary 1. P ∗ (y|do(x)) is mz-transportable in Fig. 2(a,b) with Za = {Z2 } and Zb = {Z1 }. Proof. The goal is to show that R = P ∗ (y|do(x)) is mz-transportable from {πa , πb } to π ∗ using experiments conducted over {Z2 } in πa and {Z1 } in πb . Note that naively trying to transport R from each of the domains individually is not possible, but R can be decomposed as follows: P ∗ (y|do(x)) = P ∗ (y|do(x), do(Z1 )) (2) P ∗ (y|do(x), do(Z1 ), z2 )P ∗ (z2 |do(x), do(Z1 )) (3) P ∗ (y|do(x), do(Z1 ), do(z2 ))P ∗ (z2 |do(x), do(Z1 )), = (4) z2 = z2 where Eq. (2) follows by rule 3 of the causal calculus since (Z1 ⊥ Y |X)DX,Z holds, we con⊥ 1 dition on Z2 in Eq. (3), and Eq. (4) follows by rule 2 of the causal calculus since (Z2 ⊥ ⊥ Y |X, Z1 )DX,Z ,Z , where D is the diagram in π ∗ (despite the location of the S-nodes). 1 2 Now we can rewrite the first term of Eq. (4) as indicated by the Theorem (and suggested by Def. 2): P ∗ (y|do(x), do(Z1 ), do(z2 )) = P (y|do(x), do(Z1 ), do(z2 ), Sa , Sb ) (5) = P (y|do(x), do(Z1 ), do(z2 ), Sb ) (6) = P (y|do(z2 ), Sb ) (7) = P (a) (y|do(z2 )), (8) where Eq. (5) follows from the theorem (and the definition of selection diagram), Eq. (6) follows , Eq. (7) follows from rule from rule 1 of the causal calculus since (Sa ⊥ Y |Z1 , Z2 , X)D(a) ⊥ Z1 ,Z2 ,X 3 of the causal calculus since (Z1 , X ⊥ Y |Z2 )D(a) ⊥ . Note that this equation matches with the Z1 ,Z2 ,X a syntactic goal of Theorem 1 since we have precisely do(z2 ) separated from Sa (and Z2 ∈ Iz ); so, we can rewrite the expression which results in Eq. (8) by the definition of selection diagram. Finally, we can rewrite the second term of Eq. (4) as follows: P ∗ (z2 |do(x), do(Z1 )) = P (z2 |do(x), do(Z1 ), Sa , Sb ) = P (z2 |do(x), do(Z1 ), Sa ) = P (z2 |x, do(Z1 ), Sa ) = P (b) (z2 |x, do(Z1 )), 5 (9) (10) (11) (12) where Eq. (9) follows from the theorem (and the definition of selection diagram), Eq. (10) follows from rule 1 of the causal calculus since (Sb ⊥ Z2 |Z1 , X)D(b) , Eq. (11) follows from rule 2 of ⊥ Z1 ,X the causal calculus since (X ⊥ Z2 |Z1 )D(b) . Note that this equation matches the condition of the ⊥ Z1 X theorem, separate do(Z1 ) from Sb (i.e., experiments over Z1 can be used since they are available in πb ), so we can rewrite Eq. (12) using the definition of selection diagrams, the corollary follows. The next condition for mz-transportability is more visible than Theorem 1 (albeit weaker), which also demonstrates the challenge of relating mz-transportability to other types of transportability. Corollary 2. R = P ∗ (y|do(x)) is mz-transportable in D if there exists Zi ⊆ Zi such that all paths from Zi to Y are blocked by X, (Si ⊥ Y|X, Zi )D(i) , and R is computable from do(Zi ). ⊥ X,Z i Remarkably, randomizing Z2 when applying Corollary 1 was instrumental to yield transportability in the previous example, despite the fact that the directed paths from Z2 to Y were not blocked by X, which suggests how different this transportability is from z-identifiability. So, it is not immediately obvious how to combine the topological relations of Zi ’s with X and Y in order to create a general condition for mz-transportability, the relationship between the distributions in the different domains can get relatively intricate, but we defer this discussion for now and consider a simpler case. It is not usually trivial to pursue a derivation of mz-transportability in causal calculus, and next we show an example in which such derivation does not even exist. Consider again the diagrams in Fig. 2(a,b), and assume that randomized experiments are available over {Z1 } in πa and {Z2 } in πb . Theorem 2. P ∗ (y|do(x)) is not mz-transportable in Fig. 2(a,b) with Za = {Z1 } and Zb = {Z2 }. Proof. Formally, we need to display two models M1 , M2 such that the following relations hold (as implied by Def. 3): (a) (a) PM1 (Z1 , X, Z2 , Y ) = PM2 (Z1 , X, Z2 , Y ), (b) P (Z1 , X, Z2 , Y ) = P (b) (Z1 , X, Z2 , Y ), M1 M2 (a) (a) (13) PM1 (X, Z2 , Y |do(Z1 )) = PM2 (X, Z2 , Y |do(Z1 )), (b) P (Z , X, Y |do(Z )) = P (b) (Z , X, Y |do(Z )), M 2 1 2 M2 ∗1 1 ∗ PM1 (Z1 , X, Z2 , Y ) = PM2 (Z1 , X, Z2 , Y ), for all values of Z1 , X, Z2 , and Y , and also, ∗ ∗ PM1 (Y |do(X)) = PM2 (Y |do(X)), (14) for some value of X and Y . Let V be the set of observable variables and U be the set of unobservable variables in D. Let us assume that all variables in U ∪ V are binary. Let U1 , U2 ∈ U be the common causes of Z1 and X and Z2 , respectively; let U3 , U4 , U5 ∈ U be a random disturbance exclusive to Z1 , Z2 , and Y , respectively, and U6 ∈ U be an extra random disturbance exclusive to Z2 , and U7 , U8 ∈ U to Y . Let Sa and Sb index the model in the following way: the tuples Sa = 1, Sb = 0 , Sa = 0, Sb = 1 , Sa = 0, Sb = 0 represent domains πa , πb , and π ∗ , respectively. Define the two models as follows: Z1 = U1 ⊕ U2 ⊕ U3 ∧ Sa Z1 = U1 ⊕ U2 ⊕ U3 ∧ Sa X=U X = Z1 ⊕ U1 1 M1 = M2 = Z2 = (X ⊕ U2 ⊕ (U4 ∧ Sa )) ∨ U6 Z2 = U4 ∧ Sa ∧ U6 ⊕ U6 Y = (Z2 ∧ U5 ) ⊕ (U5 ∧ U7 ) ⊕ (Sb ∧ U8 ) Y = (Z2 ∧ U5 ) ⊕ (U5 ∧ U7 ) ⊕ (Sb ∧ U8 ) where ⊕ represents the exclusive or function. Both models agree in respect to P (U), which is defined as P (Ui ) = 1/2, i = 1, ..., 8. It is not difficult to evaluate these models and note that the constraints given in Eqs. (13) and (14) are satisfied (including positivity), the theorem follows. 4 Algorithm for computing mz-transportability In this section, we build on previous analyses of identifiability [7, 21, 22, 23] in order to obtain a mechanic procedure in which a collection of selection diagrams and experimental data is inputted, and the procedure returns a transport formula whenever it is able to produce one. More specifically, 6 PROCEDURE TRmz (y, x, P, I, S, W, D) INPUT: x, y: value assignments; P: local distribution relative to domain S (S = 0 indexes π ∗ ) and active experiments I; W: weighting scheme; D: backbone of selection diagram; Si : selection nodes in πi (S0 = ∅ (i) relative to π ∗ ); [The following set and distributions are globally defined: Zi , P ∗ , PZi .] (i) ∗ ∗ OUTPUT: Px (y) in terms of P ∗ , PZ , PZi or F AIL(D, C0 ). P 1 if x = ∅, return V\Y P. P 2 if V \ An(Y)D = ∅, return TRmz (y, x ∩ An(Y)D , V\An(Y)D P, I, S, W, DAn(Y) ). 3 set W = (V \ X) \ An(Y)DX . if W = ∅, return TRmz (y, x ∪ w, P, I, S, W, D). Q P 4 if C(D \ X) = {C0 , C1 , ..., Ck }, return V\{Y,X} i TRmz (ci , v \ ci , P, I, S, W, D). 5 if C(D \ X) = {C0 }, 6 if C(D) = {D}, Q P P 7 if C0 ∈ C(D), return i|Vi ∈C0 V\V (i) P/ V\V (i−1) P. D 8 9 10 11 12 D if (∃C )C0 ⊂ C ∈ C(D), (i−1) for {i|Vi ∈ C }, set κi = κi ∪ vD \C . Q (i−1) mz return TR (y, x ∩ C , i|Vi ∈C P(Vi |VD ∩ C , κi ), I, S, W, C ). else, if I`= ∅, for i = 0, ..., |D|, ´ if (Si ⊥ Y | X)D(i) ∧ (Zi ∩ X = ∅) , Ei = TRmz (y, x \ zi , P, Zi ∩ X, i, W, D \ {Zi ∩ X}). ⊥ PX (j) if |E| > 0, return |E| wi Ei . i=1 else, FAIL(D, C0 ). Figure 3: Modified version of identification algorithm capable of recognizing mz-transportability. our algorithm is called TRmz (see Fig. 3), and is based on the C-component decomposition for identification of causal effects [22, 23] (and a version of the identification algorithm called ID). The rationale behind TRmz is to apply Tian’s factorization and decompose the target relation into smaller, more manageable sub-expressions, and then try to evaluate whether each sub-expression can be computed in the target domain. Whenever this evaluation fails, TRmz tries to use the experiments available from the target and, if possible, from the sources; this essentially implements the declarative condition delineated in Theorem 1. Next, we consider the soundness of the algorithm. ∗ Theorem 3 (soundness). Whenever TRmz returns an expression for Px (y), it is correct. In the sequel, we demonstrate how the algorithm works through the mz-transportability of Q = P ∗ (y|do(x)) in Fig. 2(c,d) with Z∗ = {Z1 }, Za = {Z2 }, and Zb = {Z1 }. Since (V \ X) \ An(Y)DX = {Z2 }, TRmz invokes line 3 with {Z2 } ∪ {X} as interventional set. The new call triggers line 4 and C(D \ {X, Z2 }) = {C0 , C1 , C2 , C3 }, where C0 = DZ1 , C1 = DZ3 , C2 = DU , and C3 = DW,Y , we invoke line 4 and try to mz-transport individ∗ ∗ ∗ ually Q0 = Px,z2 ,z3 ,u,w,y (z1 ), Q1 = Px,z1 ,z2 ,u,w,y (z3 ), Q2 = Px,z1 ,z2 ,z3 ,w,y (u), and Q3 = ∗ Px,z1 ,z2 ,z3 ,u (w, y). Thus the original problem reduces to try to evaluate the equivalent expression ∗ ∗ ∗ ∗ z1 ,z3 ,u,w Px,z2 ,z3 ,u,w,y (z1 )Px,z1 ,z2 ,u,w,y (z3 ) Px,z1 ,z2 ,z3 ,w,y (u)Px,z1 ,z2 ,z3 ,u (w, y). First, TRmz evaluates the expression Q0 and triggers line 2, noting that all nodes can be ignored ∗ since they are not ancestors of {Z1 }, which implies after line 1 that Px,z2 ,z3 ,u,w,y (z1 ) = P ∗ (z1 ). Second, TRmz evaluates the expression Q1 triggering line 2, which implies that ∗ ∗ Px,z1 ,z2 ,u,w,y (z3 ) = Px,z1 ,z2 (z3 ) with induced subgraph D1 = DX,Z1 ,Z2 ,Z3 . TRmz goes to line 5, in which in the local call C(D \ {X, Z1 , Z2 }) = {DZ3 }. Thus it proceeds to line 6 testing whether C(D \ {X, Z1 , Z2 }) is different from D1 , which is false. In this call, ordinary identifiability would fail, but TRmz proceeds to line 9. The goal of this line is to test whether some experiment can help for computing Q1 . In this case, πa fails immediately the test in line 10, but πb and π ∗ succeed, (i) which means experiments in these domains may eventually help; the new call is Px,z2 (z3 )D\Z1 , for mz i = {b, ∗} with induced graph D1 = DX,Z2 ,Z3 . Finally, TR triggers line 8 since X is not part of Z3 ’s components in D1 (or, Z3 ∈ C = {Z2 Z3 }), so line 2 is triggered since Z2 is no longer an ancestor of Z3 in D1 , and then line 1 is triggered since the interventional set is empty in (i) (i) ∗ this local call, so Px,z1 ,z2 (z3 ) = Z Pz1 (z3 |x, Z2 )Pz1 (Z2 ), for i = {b, ∗}. 2 7 ∗ Third, evaluating the expression Q2 , TRmz goes to line 2, which implies that Px,z1 ,z2 ,z3 ,w,y (u) = ∗ mz Px,z1 ,z2 ,z3 ,w (u) with induced subgraph D2 = DX,Z1 ,Z2 ,Z3 ,W,U . TR goes to line 5, and in this local call C(D \ {X, Z1 , Z2 , Z3 , W }) = {DU }, and the test in 6 succeed, since there are more components in D. So, it triggers line 8 since W is not part of U ’s component ∗ ∗ in D2 . The algorithm makes Px,z1 ,z2 ,z3 ,w (u) = Px,z1 ,z2 ,z3 (u)D2 |W (and update the working distribution); note that in this call, ordinary identifiability would fail since the nodes are in the same C-component and the test in line 6 fails. But TRmz proceeds to line 9 trying to find experiments that can help in Q2 ’s computation. In this case, πb cannot help but πa (a) and π ∗ perhaps can, noting that new calls are launched for computing Px,z1 ,z3 (u)D2 \Z2 |W rel∗ ative to πa , and Px,z2 ,z3 (u)D2 \Z1 |W relative to π ∗ with the corresponding data structures set. (a) (a) In πa , the algorithm triggers line 7, which yields Px,z1 ,z3 (u)D2 \Z2 |W = Pz2 (u|w, z3 , x, z1 ), ∗ and a bit more involved analysis for πb yields (after simplification) Px,z2 ,z3 (u)D2 \Z1 |W = ∗ ∗ ∗ ∗ ∗ Z Pz1 (z3 |x, Z2 )Pz1 (Z2 ) . Z Pz1 (u|w, z3 , x, Z2 )Pz1 (z3 |x, Z2 )Pz1 (Z2 ) / 2 2 mz Fourth, TR evaluates the expression Q3 and triggers line 5, C(D\{X, Z1 , Z2 , Z3 , U }) = DW,Y . ∗ In turn, both tests at lines 6 and 7 succeed, which makes the procedure to return Px,z1 ,z2 ,z3 ,u (w, y) = ∗ ∗ P (w|z3 , x, z1 , z2 )P (y|w, x, z1 , z2 , z3 , u). The composition of the return of these calls generates the following expression: ! ∗ Px (y) = X ∗ P (z1 ) z1 ,z3 ,w,u (2) w1 (1) w1 X ∗ ∗ Pz1 (z3 |x, Z2 )Pz1 (Z2 ) X (b) (b) Pz1 (z3 |x, Z2 )Pz1 (Z2 ) Z2 Z2 „X + (1) w2 « « „X ∗ ∗ ∗ ∗ ∗ Pz1 (z3 |x, Z2 )Pz1 (Z2 ) Pz1 (u|w, z3 , x, Z2 )Pz1 (z3 |x, Z2 )Pz1 (Z2 ) / Z2 Z2 ! (2) (a) + w2 Pz2 (u|w, z3 , x, z1 ) P ∗ (w|x, z1 , z2 , z3 ) P ∗ (y|x, z1 , z2 , z3 , w, u) (15) (k) where wi represents the weight for each factor in estimand k (i = 1, ..., nk ), and nk is the number of feasible estimands of k. Eq. (15) depicts a powerful way to estimate P ∗ (y|do(x)) in the target domain, and depending on weighting choice a different estimand will be entailed. For instance, one might use an analogous to inverse-variance weighting, which sets the weights for the normalized (k) nk −2 −2 2 inverse of their variances (i.e., wi = σi / j=1 σj , where σj is the variance of the jth component of estimand k). Our strategy resembles the approach taken in meta-analysis [4], albeit the latter usually disregards the intricacies of the relationships between variables, so producing a statistically less powerful estimand. Our method leverages this non-trivial and highly structured relationships, as exemplified in Eq. (15), which yields an estimand with less variance and statistically more powerful. 5 Conclusions In this paper, we treat a special type of transportability in which experiments can be conducted only over limited sets of variables in the sources and target domains, and the goal is to infer whether a certain effect can be estimated in the target using the information scattered throughout the domains. We provide a general sufficient graphical conditions for transportability based on the causal calculus along with a necessary condition for a specific scenario, which should be generalized for arbitrary structures. We further provide a procedure for computing transportability, that is, generate a formula for fusing the available observational and experimental data to synthesize an estimate of the desired causal effects. Our algorithm also allows for generic weighting schemes, which generalizes standard statistical procedures and leads to the construction of statistically more powerful estimands. Acknowledgment The work of Judea Pearl and Elias Bareinboim was supported in part by grants from NSF (IIS1249822, IIS-1302448), and ONR (N00014-13-1-0153, N00014-10-1-0933). The work of Sanghack Lee and Vasant Honavar was partially completed while they were with the Department of Computer Science at Iowa State University. The work of Vasant Honavar while working at the National Science Foundation (NSF) was supported by the NSF. The work of Sanghack Lee was supported in part by the grant from NSF (IIS-0711356). Any opinions, findings, and conclusions contained in this article are those of the authors and do not necessarily reflect the views of the sponsors. 8 References [1] J. Pearl and E. Bareinboim. Transportability of causal and statistical relations: A formal approach. In W. Burgard and D. Roth, editors, Proceedings of the Twenty-Fifth National Conference on Artificial Intelligence, pages 247–254. AAAI Press, Menlo Park, CA, 2011. [2] D. Campbell and J. Stanley. Experimental and Quasi-Experimental Designs for Research. Wadsworth Publishing, Chicago, 1963. [3] C. Manski. Identification for Prediction and Decision. Harvard University Press, Cambridge, Massachusetts, 2007. [4] L. V. Hedges and I. Olkin. Statistical Methods for Meta-Analysis. Academic Press, January 1985. [5] W.R. Shadish, T.D. Cook, and D.T. Campbell. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton-Mifflin, Boston, second edition, 2002. [6] S. Morgan and C. Winship. Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research). Cambridge University Press, New York, NY, 2007. [7] J. Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669–710, 1995. [8] P. Spirtes, C.N. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, Cambridge, MA, 2nd edition, 2000. [9] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2000. 2nd edition, 2009. [10] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009. [11] E. Bareinboim and J. Pearl. Transportability of causal effects: Completeness results. In J. Hoffmann and B. Selman, editors, Proceedings of the Twenty-Sixth National Conference on Artificial Intelligence, pages 698–704. AAAI Press, Menlo Park, CA, 2012. [12] E. Bareinboim and J. Pearl. Causal transportability with limited experiments. In M. desJardins and M. Littman, editors, Proceedings of the Twenty-Seventh National Conference on Artificial Intelligence, pages 95–101, Menlo Park, CA, 2013. AAAI Press. [13] S. Lee and V. Honavar. Causal transportability of experiments on controllable subsets of variables: ztransportability. In A. Nicholson and P. Smyth, editors, Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI), pages 361–370. AUAI Press, 2013. [14] E. Bareinboim and J. Pearl. Meta-transportability of causal effects: A formal approach. In C. Carvalho and P. Ravikumar, editors, Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (AISTATS), pages 135–143. JMLR W&CP; 31, 2013. [15] S. Lee and V. Honavar. m-transportability: Transportability of a causal effect from multiple environments. In M. desJardins and M. Littman, editors, Proceedings of the Twenty-Seventh National Conference on Artificial Intelligence, pages 583–590, Menlo Park, CA, 2013. AAAI Press. [16] H. Daume III and D. Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26:101–126, 2006. [17] A.J. Storkey. When training and test sets are different: characterising learning transfer. In J. Candela, M. Sugiyama, A. Schwaighofer, and N.D. Lawrence, editors, Dataset Shift in Machine Learning, pages 3–28. MIT Press, Cambridge, MA, 2009. [18] B. Sch¨ lkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij. On causal and anticausal o learning. In J Langford and J Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML), pages 1255–1262, New York, NY, USA, 2012. Omnipress. [19] K. Zhang, B. Sch¨ lkopf, K. Muandet, and Z. Wang. Domain adaptation under target and conditional o shift. In Proceedings of the 30th International Conference on Machine Learning (ICML). JMLR: W&CP; volume 28, 2013. [20] E. Bareinboim and J. Pearl. Causal inference by surrogate experiments: z-identifiability. In N. Freitas and K. Murphy, editors, Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI), pages 113–120. AUAI Press, 2012. [21] M. Kuroki and M. Miyakawa. Identifiability criteria for causal effects of joint interventions. Journal of the Royal Statistical Society, 29:105–117, 1999. [22] J. Tian and J. Pearl. A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 567–573. AAAI Press/The MIT Press, Menlo Park, CA, 2002. [23] I. Shpitser and J. Pearl. Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First National Conference on Artificial Intelligence, pages 1219–1226. AAAI Press, Menlo Park, CA, 2006. 9
Reference: text
sentIndex sentText sentNum sentScore
1 We reduce questions of transportability from multiple domains and with limited scope to symbolic derivations in the causal calculus, thus extending the original setting of transportability introduced in [1], which assumes only one domain with full experimental information available. [sent-2, score-1.713]
2 We also consider the issue of minimizing the variance of the produced estimand in order to increase power. [sent-4, score-0.094]
3 In data-driven sciences, experiments are conducted on disparate domains, but the intention is almost invariably to fuse the acquired knowledge, and translate it into some meaningful claim about a target domain, which is usually different than any of the individual study domains. [sent-7, score-0.195]
4 First, the distinction between statistical and causal knowledge has received syntactic representation through causal diagrams [7, 8, 9], which became a popular tool for causal inference in data-driven fields. [sent-11, score-1.779]
5 Second, the inferential machinery provided by the causal calculus (do-calculus) [7, 9, 10] is particularly suitable for handling knowledge transfer across domains. [sent-12, score-0.675]
6 1 only limited experiments are available in the source domain [12, 13], and further for when multiple source domains with unrestricted experimental information are available [14, 15]. [sent-24, score-0.504]
7 This paper broadens these discussions introducing a more general setting in which multiple heterogeneous sources with limited and distinct experiments are available, a task that we call here “mz-transportability”. [sent-25, score-0.197]
8 1 More formally, the mz-transportability problem concerns with the transfer of causal knowledge from a heterogeneous collection of source domains Π = {π1 , . [sent-26, score-0.794]
9 In each domain πi ∈ Π, experiments over a set of variables Zi can be performed, and causal knowledge gathered. [sent-30, score-0.617]
10 The problem is to infer a causal relationship R in π ∗ using knowledge obtained in Π. [sent-32, score-0.525]
11 Yet the fact that all scientific experiments are conducted with the intent of being used elsewhere (e. [sent-34, score-0.106]
12 , outside the lab) implies that scientific progress relies on the assumption that certain domains share common characteristics and that, owed to these commonalities, causal claims would be valid in new settings even where experiments cannot be conducted. [sent-36, score-0.71]
13 The problem stated in this paper generalizes the one-dimensional version of transportability with limited scope and the multiple dimensional with unlimited scope. [sent-37, score-0.48]
14 Remarkably, while the effects of interest might not be individually transportable to the target domain from the experiments in any of the available sources, combining different pieces from the various sources may enable the estimation of the desired effects (to be shown later on). [sent-38, score-0.482]
15 The goal of this paper is to formally understand under which conditions the target quantity is (non-parametrically) estimable from the available data. [sent-39, score-0.108]
16 1(a) in which the node S represents factors that produce differences between source and target populations. [sent-41, score-0.132]
17 Assume that we conduct a randomized trial in Los Angeles (LA) and estimate the causal effect of treatment X on outcome Y for every age group Z = z, denoted by P (y|do(x), z). [sent-42, score-0.593]
18 How are we to estimate the causal effect of X on Y in U. [sent-51, score-0.555]
19 1(a)) conveys the assumption that the only difference between the two populations are factors determining age distributions, shown as S → Z, while agespecific effects P ∗ (y|do(x), Z = z) are invariant across populations. [sent-55, score-0.121]
20 From this assumption, the overall causal effect in the U. [sent-57, score-0.555]
21 can be derived as follows: R P ∗ (y|do(x), z)P ∗ (z) = z P (y|do(x), z)P ∗ (z) = (1) z The last line is the transport formula for R. [sent-59, score-0.195]
22 It combines experimental results obtained in LA, P (y|do(x), z), with observational aspects of the U. [sent-60, score-0.158]
23 In this trivial example, the transport formula amounts to a simple re-calibration (or re-weighting) of the age-specific effects to account for the new age distribution. [sent-65, score-0.251]
24 In general, however, a more involved mixture of experimental and observational findings would be necessary to obtain a bias-free estimate of the target relation R. [sent-66, score-0.27]
25 1(b) depicts the smallest example in which transportability is not feasible even when experiments over X in π are available. [sent-68, score-0.476]
26 3 We use the structural interpretation of causal diagrams as described in [9, pp. [sent-72, score-0.749]
27 (a) Selection diagram illustrating when transportability between two domains is trivially solved through simple recalibration. [sent-75, score-0.758]
28 (b) The smallest possible selection diagram in which a causal relation is not transportable. [sent-76, score-0.746]
29 (c) Selection diagram illustrating transportability when only experiments over {Z1 } are available in the source. [sent-77, score-0.66]
30 A natural question arises whether the investigator in possession of a limited set of experiments would still be able to estimate the desired effects at the target domain. [sent-79, score-0.232]
31 1(c) that experiments over Z1 are available and the target quantity is R = P ∗ (y|do(x)), which can be shown to be equivalent to P (y|x, do(Z1 )), the conditional distribution of Y given X in the experimental study when Z1 is randomized. [sent-81, score-0.187]
32 2(a,b) which concerns the transport of experimental results from two sources ({πa , πb }) to infer the effect of X on Y in π ∗ , R = P ∗ (y|do(x)). [sent-84, score-0.235]
33 We assume that experimental studies randomizing {Z1 , Z2 } can be conducted in both domains. [sent-94, score-0.129]
34 A simple analysis based on [12] can show that R cannot be z-transported from either source alone, but it turns out that combining in a special way experiments from both sources allows one to determine the effect in the target. [sent-95, score-0.172]
35 Obviously, R will not be z-transported individually from these domains, but it turns out that taking both sets of experiments into account, R = z2 P (a) (y|do(z2 ))P (b) (z2 |x, do(Z1 )), which fully uses all pieces of experimental data available. [sent-98, score-0.133]
36 In other words, we were able to decompose R into subrelations such that each one is separately z-transportable from the source domains, and so is the desired target quantity. [sent-99, score-0.132]
37 Interestingly, it is the case in this example that if the domains in which experiments were conducted were reversed (i. [sent-100, score-0.235]
38 , {Z1 } randomized in πa , {Z2 } in πb ), it will not be possible to transport R by any method – the target relation is simply not computable from the available data (formally shown later on). [sent-102, score-0.287]
39 This illustrates some of the subtle issues mz-transportability entails, which cannot be immediately cast in terms of previous instances of the transportability class. [sent-103, score-0.445]
40 In the sequel, we try to better understand some of these issues, and we develop sufficient or (specific) necessary conditions for deciding special transportability for arbitrary collection of selection diagrams and set of experiments. [sent-104, score-0.732]
41 We further construct an algorithm for deciding mz-transportability of joint causal effects and returning the correct transport formula whenever this is possible. [sent-105, score-0.763]
42 We also consider issues relative to the variance of the estimand aiming for improving sample efficiency and increasing statistical power. [sent-106, score-0.094]
43 3 Graphical conditions for mz-transportability The basic semantical framework in our analysis rests on structural causal models as defined in [9, pp. [sent-107, score-0.576]
44 7], actions are modifications of functional relationships, and each action do(x) on a causal model M produces 4 A typical example is whether we can estimate the effect of cholesterol (X) on heart failure (Y ) by experiments on diet (Z1 ) given that cholesterol levels cannot be randomized [20]. [sent-110, score-0.728]
45 3 Z2 Z1 Z1 X Z1 X X Z2 W Z2 Y (a) Z2 Z1 Y X Z3 W U U Y Y (b) (c) Z3 (d) Figure 2: Selection diagrams illustrating impossibility of estimating R = P ∗ (y|do(x)) through individual transportability from πa and πb even when Z = {Z1 , Z2 } (for (a, b), (c, d))). [sent-111, score-0.65]
46 Key to the analysis of transportability is the notion of “identifiability,” defined below, which expresses the requirement that causal effects are computable from a combination of data P and assumptions embodied in a causal graph G. [sent-120, score-1.633]
47 The causal effect of an action do(x) on a set of variables Y such that Y ∩ X = ∅ is said to be identifiable from P in G if Px (y) is uniquely computable from P (V) in any model that induces G. [sent-123, score-0.635]
48 In ordinary transportability, this representation was extended to capture properties of two domains simultaneously. [sent-125, score-0.154]
49 Let M, M ∗ be a pair of structural causal models [9, pp. [sent-128, score-0.576]
50 205] relative to domains π, π ∗ , sharing a causal diagram G. [sent-129, score-0.806]
51 M, M ∗ is said to induce a selection diagram D if D is constructed as follows: 1. [sent-130, score-0.192]
52 In words, the S-variables locate the mechanisms where structural discrepancies between the two domains are suspected to take place. [sent-133, score-0.248]
53 As discussed in the reference, the assumption of no structural changes between domains can be relaxed, but some structural assumptions regarding the discrepancies between domains must still hold. [sent-136, score-0.453]
54 7 Transportability assumes that enough structural knowledge about both domains is known in order to substantiate the production of their respective causal diagrams. [sent-137, score-0.73]
55 In the absence of such knowledge, causal discovery algorithms might be used to infer the diagrams from data [8, 9]. [sent-138, score-0.698]
56 6 4 Armed with the concept of identifiability and selection diagrams, mz-transportability of causal effects can be defined as follows: Definition 3 (mz-Transportability). [sent-139, score-0.673]
57 , D(n) } be a collection of selection diagrams relative to source domains Π = {π1 , . [sent-143, score-0.465]
58 , πn }, and target domain π ∗ , respectively, and Zi (and Z∗ ) i be the variables in which experiments can be conducted in domain πi (and π ∗ ). [sent-146, score-0.286]
59 Let P i , Iz be i i the pair of observational and interventional distributions of πi , where Iz = Z ⊆Zi P (v|do(z )), ∗ and in an analogous manner, P ∗ , Iz be the observational and interventional distributions of π ∗ . [sent-147, score-0.408]
60 ∗ ∗ The causal effect R = Px (y|w) is said to be mz-transportable from Π to π ∗ in D if Px (y|w) is i ∗ uniquely computable from i=1,. [sent-148, score-0.635]
61 ∗ i The requirement that R is uniquely computable from P ∗ , Iz and P i , Iz from all sources has a syntactic image in the causal calculus, which is captured by the following sufficient condition. [sent-152, score-0.698]
62 , D(n) } be a collection of selection diagrams relative to source domains Π = {π1 , . [sent-157, score-0.465]
63 , πn }, and target domain π ∗ , respectively, and Si represents the collection of i ∗ S-variables in the selection diagram D(i) . [sent-160, score-0.36]
64 Let { P i , Iz } and P ∗ , Iz be respectively the pairs of observational and interventional distributions in the sources Π and target π ∗ . [sent-161, score-0.349]
65 , Sn ) is reducible, using the rules of the causal calculus, to an expression in which (1) do-operators that ∗ i apply to subsets of Iz have no Si -variables or (2) do-operators apply only to subsets of Iz . [sent-165, score-0.564]
66 For concreteness, we illustrate this result using the selection diagrams in Fig. [sent-167, score-0.238]
67 (2) follows by rule 3 of the causal calculus since (Z1 ⊥ Y |X)DX,Z holds, we con⊥ 1 dition on Z2 in Eq. [sent-175, score-0.675]
68 (4) follows by rule 2 of the causal calculus since (Z2 ⊥ ⊥ Y |X, Z1 )DX,Z ,Z , where D is the diagram in π ∗ (despite the location of the S-nodes). [sent-177, score-0.802]
69 (7) follows from rule from rule 1 of the causal calculus since (Sa ⊥ Y |Z1 , Z2 , X)D(a) ⊥ Z1 ,Z2 ,X 3 of the causal calculus since (Z1 , X ⊥ Y |Z2 )D(a) ⊥ . [sent-183, score-1.35]
70 Note that this equation matches with the Z1 ,Z2 ,X a syntactic goal of Theorem 1 since we have precisely do(z2 ) separated from Sa (and Z2 ∈ Iz ); so, we can rewrite the expression which results in Eq. [sent-184, score-0.098]
71 (10) follows from rule 1 of the causal calculus since (Sb ⊥ Z2 |Z1 , X)D(b) , Eq. [sent-189, score-0.675]
72 (11) follows from rule 2 of ⊥ Z1 ,X the causal calculus since (X ⊥ Z2 |Z1 )D(b) . [sent-190, score-0.675]
73 ⊥ X,Z i Remarkably, randomizing Z2 when applying Corollary 1 was instrumental to yield transportability in the previous example, despite the fact that the directed paths from Z2 to Y were not blocked by X, which suggests how different this transportability is from z-identifiability. [sent-198, score-0.921]
74 It is not usually trivial to pursue a derivation of mz-transportability in causal calculus, and next we show an example in which such derivation does not even exist. [sent-200, score-0.525]
75 Let U1 , U2 ∈ U be the common causes of Z1 and X and Z2 , respectively; let U3 , U4 , U5 ∈ U be a random disturbance exclusive to Z1 , Z2 , and Y , respectively, and U6 ∈ U be an extra random disturbance exclusive to Z2 , and U7 , U8 ∈ U to Y . [sent-211, score-0.122]
76 Let Sa and Sb index the model in the following way: the tuples Sa = 1, Sb = 0 , Sa = 0, Sb = 1 , Sa = 0, Sb = 0 represent domains πa , πb , and π ∗ , respectively. [sent-212, score-0.154]
77 Q (i−1) mz return TR (y, x ∩ C , i|Vi ∈C P(Vi |VD ∩ C , κi ), I, S, W, C ). [sent-233, score-0.093]
78 3), and is based on the C-component decomposition for identification of causal effects [22, 23] (and a version of the identification algorithm called ID). [sent-242, score-0.608]
79 The rationale behind TRmz is to apply Tian’s factorization and decompose the target relation into smaller, more manageable sub-expressions, and then try to evaluate whether each sub-expression can be computed in the target domain. [sent-243, score-0.22]
80 Whenever this evaluation fails, TRmz tries to use the experiments available from the target and, if possible, from the sources; this essentially implements the declarative condition delineated in Theorem 1. [sent-244, score-0.139]
81 Since (V \ X) \ An(Y)DX = {Z2 }, TRmz invokes line 3 with {Z2 } ∪ {X} as interventional set. [sent-250, score-0.159]
82 First, TRmz evaluates the expression Q0 and triggers line 2, noting that all nodes can be ignored ∗ since they are not ancestors of {Z1 }, which implies after line 1 that Px,z2 ,z3 ,u,w,y (z1 ) = P ∗ (z1 ). [sent-253, score-0.287]
83 Second, TRmz evaluates the expression Q1 triggering line 2, which implies that ∗ ∗ Px,z1 ,z2 ,u,w,y (z3 ) = Px,z1 ,z2 (z3 ) with induced subgraph D1 = DX,Z1 ,Z2 ,Z3 . [sent-254, score-0.14]
84 TRmz goes to line 5, in which in the local call C(D \ {X, Z1 , Z2 }) = {DZ3 }. [sent-255, score-0.092]
85 In this case, πa fails immediately the test in line 10, but πb and π ∗ succeed, (i) which means experiments in these domains may eventually help; the new call is Px,z2 (z3 )D\Z1 , for mz i = {b, ∗} with induced graph D1 = DX,Z2 ,Z3 . [sent-259, score-0.335]
86 2 7 ∗ Third, evaluating the expression Q2 , TRmz goes to line 2, which implies that Px,z1 ,z2 ,z3 ,w,y (u) = ∗ mz Px,z1 ,z2 ,z3 ,w (u) with induced subgraph D2 = DX,Z1 ,Z2 ,Z3 ,W,U . [sent-261, score-0.198]
87 So, it triggers line 8 since W is not part of U ’s component ∗ ∗ in D2 . [sent-263, score-0.159]
88 But TRmz proceeds to line 9 trying to find experiments that can help in Q2 ’s computation. [sent-265, score-0.096]
89 (a) (a) In πa , the algorithm triggers line 7, which yields Px,z1 ,z3 (u)D2 \Z2 |W = Pz2 (u|w, z3 , x, z1 ), ∗ and a bit more involved analysis for πb yields (after simplification) Px,z2 ,z3 (u)D2 \Z1 |W = ∗ ∗ ∗ ∗ ∗ Z Pz1 (z3 |x, Z2 )Pz1 (Z2 ) . [sent-267, score-0.159]
90 Z Pz1 (u|w, z3 , x, Z2 )Pz1 (z3 |x, Z2 )Pz1 (Z2 ) / 2 2 mz Fourth, TR evaluates the expression Q3 and triggers line 5, C(D\{X, Z1 , Z2 , Z3 , U }) = DW,Y . [sent-268, score-0.256]
91 (2) (a) + w2 Pz2 (u|w, z3 , x, z1 ) P ∗ (w|x, z1 , z2 , z3 ) P ∗ (y|x, z1 , z2 , z3 , w, u) (15) (k) where wi represents the weight for each factor in estimand k (i = 1, . [sent-272, score-0.094]
92 (15) depicts a powerful way to estimate P ∗ (y|do(x)) in the target domain, and depending on weighting choice a different estimand will be entailed. [sent-277, score-0.207]
93 , wi = σi / j=1 σj , where σj is the variance of the jth component of estimand k). [sent-280, score-0.094]
94 (15), which yields an estimand with less variance and statistically more powerful. [sent-283, score-0.094]
95 We provide a general sufficient graphical conditions for transportability based on the causal calculus along with a necessary condition for a specific scenario, which should be generalized for arbitrary structures. [sent-285, score-1.12]
96 We further provide a procedure for computing transportability, that is, generate a formula for fusing the available observational and experimental data to synthesize an estimate of the desired causal effects. [sent-286, score-0.8]
97 Transportability of causal and statistical relations: A formal approach. [sent-296, score-0.525]
98 Causal transportability of experiments on controllable subsets of variables: ztransportability. [sent-370, score-0.476]
99 m-transportability: Transportability of a causal effect from multiple environments. [sent-386, score-0.555]
100 Identification of joint interventional distributions in recursive semi-Markovian causal models. [sent-447, score-0.619]
wordName wordTfidf (topN-words)
[('causal', 0.525), ('transportability', 0.445), ('trmz', 0.285), ('iz', 0.188), ('diagrams', 0.173), ('domains', 0.154), ('calculus', 0.15), ('sa', 0.134), ('diagram', 0.127), ('sb', 0.12), ('observational', 0.11), ('bareinboim', 0.107), ('menlo', 0.107), ('transport', 0.095), ('interventional', 0.094), ('estimand', 0.094), ('triggers', 0.094), ('target', 0.083), ('effects', 0.083), ('zi', 0.081), ('px', 0.074), ('selection', 0.065), ('line', 0.065), ('sources', 0.062), ('domain', 0.061), ('mz', 0.058), ('computable', 0.055), ('editors', 0.054), ('cholesterol', 0.053), ('sanghack', 0.053), ('vasant', 0.053), ('park', 0.052), ('pearl', 0.052), ('structural', 0.051), ('conducted', 0.05), ('zb', 0.05), ('source', 0.049), ('experimental', 0.048), ('honavar', 0.047), ('za', 0.046), ('identi', 0.045), ('judea', 0.043), ('discrepancies', 0.043), ('heterogeneous', 0.042), ('expression', 0.039), ('aaai', 0.039), ('age', 0.038), ('ca', 0.037), ('subgraph', 0.036), ('anticausal', 0.036), ('diet', 0.036), ('pzi', 0.036), ('limited', 0.035), ('return', 0.035), ('formula', 0.035), ('ndings', 0.033), ('illustrating', 0.032), ('si', 0.032), ('national', 0.032), ('fx', 0.032), ('exclusive', 0.032), ('fusing', 0.031), ('disparate', 0.031), ('penn', 0.031), ('randomizing', 0.031), ('experiments', 0.031), ('syntactic', 0.031), ('weighting', 0.03), ('effect', 0.03), ('vi', 0.03), ('pieces', 0.029), ('disturbance', 0.029), ('elias', 0.029), ('commonalities', 0.029), ('ucla', 0.029), ('soundness', 0.029), ('relation', 0.029), ('intelligence', 0.028), ('rewrite', 0.028), ('arti', 0.028), ('lee', 0.028), ('unrestricted', 0.027), ('arrows', 0.027), ('ability', 0.027), ('call', 0.027), ('press', 0.026), ('succeed', 0.026), ('synthesize', 0.026), ('triggered', 0.026), ('try', 0.025), ('individually', 0.025), ('uniquely', 0.025), ('la', 0.025), ('available', 0.025), ('desjardins', 0.025), ('scattered', 0.025), ('elsewhere', 0.025), ('whenever', 0.025), ('collection', 0.024), ('nodes', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 337 nips-2013-Transportability from Multiple Environments with Limited Experiments
Author: Elias Bareinboim, Sanghack Lee, Vasant Honavar, Judea Pearl
Abstract: This paper considers the problem of transferring experimental findings learned from multiple heterogeneous domains to a target domain, in which only limited experiments can be performed. We reduce questions of transportability from multiple domains and with limited scope to symbolic derivations in the causal calculus, thus extending the original setting of transportability introduced in [1], which assumes only one domain with full experimental information available. We further provide different graphical and algorithmic conditions for computing the transport formula in this setting, that is, a way of fusing the observational and experimental information scattered throughout different domains to synthesize a consistent estimate of the desired effects in the target domain. We also consider the issue of minimizing the variance of the produced estimand in order to increase power. 1 Motivation Transporting and synthesizing experimental knowledge from heterogeneous settings are central to scientific discovery. Conclusions that are obtained in a laboratory setting are transported and applied elsewhere in an environment that differs in many aspects from that of the laboratory. In data-driven sciences, experiments are conducted on disparate domains, but the intention is almost invariably to fuse the acquired knowledge, and translate it into some meaningful claim about a target domain, which is usually different than any of the individual study domains. However, the conditions under which this extrapolation can be legitimized have not been formally articulated until very recently. Although the problem has been discussed in many areas of statistics, economics, and the health sciences, under rubrics such as “external validity” [2, 3], “meta-analysis” [4], “quasi-experiments” [5], “heterogeneity” [6], these discussions are limited to verbal narratives in the form of heuristic guidelines for experimental researchers – no formal treatment of the problem has been attempted to answer the practical challenge of generalizing causal knowledge across multiple heterogeneous domains with disparate experimental data posed in this paper. The fields of artificial intelligence and statistics provide the theoretical underpinnings necessary for tackling transportability. First, the distinction between statistical and causal knowledge has received syntactic representation through causal diagrams [7, 8, 9], which became a popular tool for causal inference in data-driven fields. Second, the inferential machinery provided by the causal calculus (do-calculus) [7, 9, 10] is particularly suitable for handling knowledge transfer across domains. Armed with these techniques, [1] introduced a formal language for encoding differences and commonalities between domains accompanied with necessary or sufficient conditions under which transportability of empirical findings is feasible between two domains, a source and a target; then, these conditions were extended for a complete characterization for transportability in one domain with unrestricted experimental data [11]. Subsequently, these results were generalized for the settings when ∗ These authors contributed equally to this paper. The authors’ addresses are respectively eb@cs.ucla.edu, sxl439@ist.psu.edu, vhonavar@ist.psu.edu, judea@cs.ucla.edu. 1 only limited experiments are available in the source domain [12, 13], and further for when multiple source domains with unrestricted experimental information are available [14, 15]. This paper broadens these discussions introducing a more general setting in which multiple heterogeneous sources with limited and distinct experiments are available, a task that we call here “mz-transportability”.1 More formally, the mz-transportability problem concerns with the transfer of causal knowledge from a heterogeneous collection of source domains Π = {π1 , ..., πn } to a target domain π ∗ . In each domain πi ∈ Π, experiments over a set of variables Zi can be performed, and causal knowledge gathered. In π ∗ , potentially different from πi , only passive observations can be collected (this constraint is weakened later on). The problem is to infer a causal relationship R in π ∗ using knowledge obtained in Π. Clearly, if nothing is known about the relationship between Π and π ∗ , the problem is trivial; no transfer can be justified. Yet the fact that all scientific experiments are conducted with the intent of being used elsewhere (e.g., outside the lab) implies that scientific progress relies on the assumption that certain domains share common characteristics and that, owed to these commonalities, causal claims would be valid in new settings even where experiments cannot be conducted. The problem stated in this paper generalizes the one-dimensional version of transportability with limited scope and the multiple dimensional with unlimited scope. Remarkably, while the effects of interest might not be individually transportable to the target domain from the experiments in any of the available sources, combining different pieces from the various sources may enable the estimation of the desired effects (to be shown later on). The goal of this paper is to formally understand under which conditions the target quantity is (non-parametrically) estimable from the available data. 2 Previous work and our contributions Consider Fig. 1(a) in which the node S represents factors that produce differences between source and target populations. Assume that we conduct a randomized trial in Los Angeles (LA) and estimate the causal effect of treatment X on outcome Y for every age group Z = z, denoted by P (y|do(x), z). We now wish to generalize the results to the population of the United States (U.S.), but we find the distribution P (x, y, z) in LA to be different from the one in the U.S. (call the latter P ∗ (x, y, z)). In particular, the average age in the U.S. is significantly higher than that in LA. How are we to estimate the causal effect of X on Y in U.S., denoted R = P ∗ (y|do(x))? 2 3 The selection diagram for this example (Fig. 1(a)) conveys the assumption that the only difference between the two populations are factors determining age distributions, shown as S → Z, while agespecific effects P ∗ (y|do(x), Z = z) are invariant across populations. Difference-generating factors are represented by a special set of variables called selection variables S (or simply S-variables), which are graphically depicted as square nodes ( ). From this assumption, the overall causal effect in the U.S. can be derived as follows: R P ∗ (y|do(x), z)P ∗ (z) = z P (y|do(x), z)P ∗ (z) = (1) z The last line is the transport formula for R. It combines experimental results obtained in LA, P (y|do(x), z), with observational aspects of the U.S. population, P ∗ (z), to obtain an experimental claim P ∗ (y|do(x)) about the U.S.. In this trivial example, the transport formula amounts to a simple re-calibration (or re-weighting) of the age-specific effects to account for the new age distribution. In general, however, a more involved mixture of experimental and observational findings would be necessary to obtain a bias-free estimate of the target relation R. Fig. 1(b) depicts the smallest example in which transportability is not feasible even when experiments over X in π are available. In real world applications, it may happen that certain controlled experiments cannot be conducted in the source environment (for financial, ethical, or technical reasons), so only a limited amount 1 The machine learning literature has been concerned about discrepancies among domains in the context, almost exclusively, on predictive or classification tasks as opposed to learning causal or counterfactual measures [16, 17]. Interestingly enough, recent work on anticausal learning moves towards more general modalities of learning and also leverages knowledge about the underlying data-generating structure [18, 19]. 2 We will use Px (y | z) interchangeably with P (y | do(x), z). 3 We use the structural interpretation of causal diagrams as described in [9, pp. 205]. 2 S S Z Z1 S S X Y (a) X Y (b) Z2 X Y (c) Figure 1: The selection variables S are depicted as square nodes ( ). (a) Selection diagram illustrating when transportability between two domains is trivially solved through simple recalibration. (b) The smallest possible selection diagram in which a causal relation is not transportable. (c) Selection diagram illustrating transportability when only experiments over {Z1 } are available in the source. of experimental information can be gathered. A natural question arises whether the investigator in possession of a limited set of experiments would still be able to estimate the desired effects at the target domain. For instance, we assume in Fig. 1(c) that experiments over Z1 are available and the target quantity is R = P ∗ (y|do(x)), which can be shown to be equivalent to P (y|x, do(Z1 )), the conditional distribution of Y given X in the experimental study when Z1 is randomized. 4 One might surmise that multiple pairwise z-transportability would be sufficient to solve the mztransportability problem, but this is not the case. To witness, consider Fig. 2(a,b) which concerns the transport of experimental results from two sources ({πa , πb }) to infer the effect of X on Y in π ∗ , R = P ∗ (y|do(x)). In these diagrams, X may represent the treatment (e.g., cholesterol level), Z1 represents a pre-treatment variable (e.g., diet), Z2 represents an intermediate variable (e.g., biomarker), and Y represents the outcome (e.g., heart failure). We assume that experimental studies randomizing {Z1 , Z2 } can be conducted in both domains. A simple analysis based on [12] can show that R cannot be z-transported from either source alone, but it turns out that combining in a special way experiments from both sources allows one to determine the effect in the target. More interestingly, we consider the more stringent scenario where only certain experiments can be performed in each of the domains. For instance, assume that it is only possible to conduct experiments over {Z2 } in πa and over {Z1 } in πb . Obviously, R will not be z-transported individually from these domains, but it turns out that taking both sets of experiments into account, R = z2 P (a) (y|do(z2 ))P (b) (z2 |x, do(Z1 )), which fully uses all pieces of experimental data available. In other words, we were able to decompose R into subrelations such that each one is separately z-transportable from the source domains, and so is the desired target quantity. Interestingly, it is the case in this example that if the domains in which experiments were conducted were reversed (i.e., {Z1 } randomized in πa , {Z2 } in πb ), it will not be possible to transport R by any method – the target relation is simply not computable from the available data (formally shown later on). This illustrates some of the subtle issues mz-transportability entails, which cannot be immediately cast in terms of previous instances of the transportability class. In the sequel, we try to better understand some of these issues, and we develop sufficient or (specific) necessary conditions for deciding special transportability for arbitrary collection of selection diagrams and set of experiments. We further construct an algorithm for deciding mz-transportability of joint causal effects and returning the correct transport formula whenever this is possible. We also consider issues relative to the variance of the estimand aiming for improving sample efficiency and increasing statistical power. 3 Graphical conditions for mz-transportability The basic semantical framework in our analysis rests on structural causal models as defined in [9, pp. 205], also called data-generating models. In the structural causal framework [9, Ch. 7], actions are modifications of functional relationships, and each action do(x) on a causal model M produces 4 A typical example is whether we can estimate the effect of cholesterol (X) on heart failure (Y ) by experiments on diet (Z1 ) given that cholesterol levels cannot be randomized [20]. 3 Z2 Z1 Z1 X Z1 X X Z2 W Z2 Y (a) Z2 Z1 Y X Z3 W U U Y Y (b) (c) Z3 (d) Figure 2: Selection diagrams illustrating impossibility of estimating R = P ∗ (y|do(x)) through individual transportability from πa and πb even when Z = {Z1 , Z2 } (for (a, b), (c, d))). If we assume, more stringently, availability of experiments Za = {Z2 }, Zb = {Z1 }, Z∗ = {}, a more elaborated analysis can show that R can be estimated combining different pieces from both domains. a new model Mx = U, V, Fx , P (U) , where Fx is obtained after replacing fX ∈ F for every X ∈ X with a new function that outputs a constant value x given by do(x). 5 We follow the conventions given in [9]. We denote variables by capital letters and their realized values by small letters. Similarly, sets of variables will be denoted by bold capital letters, sets of realized values by bold small letters. We use the typical graph-theoretic terminology with the corresponding abbreviations P a(Y)G and An(Y)G , which will denote respectively the set of observable parents and ancestors of the node set Y in G. A graph GY will denote the induced subgraph G containing nodes in Y and all arrows between such nodes. Finally, GXZ stands for the edge subgraph of G where all incoming arrows into X and all outgoing arrows from Z are removed. Key to the analysis of transportability is the notion of “identifiability,” defined below, which expresses the requirement that causal effects are computable from a combination of data P and assumptions embodied in a causal graph G. Definition 1 (Causal Effects Identifiability (Pearl, 2000, pp. 77)). The causal effect of an action do(x) on a set of variables Y such that Y ∩ X = ∅ is said to be identifiable from P in G if Px (y) is uniquely computable from P (V) in any model that induces G. Causal models and their induced graphs are usually associated with one particular domain (also called setting, study, population, or environment). In ordinary transportability, this representation was extended to capture properties of two domains simultaneously. This is possible if we assume that the structural equations share the same set of arguments, though the functional forms of the equations may vary arbitrarily [11]. 6 Definition 2 (Selection Diagram). Let M, M ∗ be a pair of structural causal models [9, pp. 205] relative to domains π, π ∗ , sharing a causal diagram G. M, M ∗ is said to induce a selection diagram D if D is constructed as follows: 1. Every edge in G is also an edge in D; 2. D contains an extra edge Si → Vi whenever there might exist a discrepancy fi = fi∗ or P (Ui ) = P ∗ (Ui ) between M and M ∗ . In words, the S-variables locate the mechanisms where structural discrepancies between the two domains are suspected to take place.7 Alternatively, the absence of a selection node pointing to a variable represents the assumption that the mechanism responsible for assigning value to that variable is identical in both domains. 5 The results presented here are also valid in other formalisms for causality based on potential outcomes. As discussed in the reference, the assumption of no structural changes between domains can be relaxed, but some structural assumptions regarding the discrepancies between domains must still hold. 7 Transportability assumes that enough structural knowledge about both domains is known in order to substantiate the production of their respective causal diagrams. In the absence of such knowledge, causal discovery algorithms might be used to infer the diagrams from data [8, 9]. 6 4 Armed with the concept of identifiability and selection diagrams, mz-transportability of causal effects can be defined as follows: Definition 3 (mz-Transportability). Let D = {D(1) , ..., D(n) } be a collection of selection diagrams relative to source domains Π = {π1 , ..., πn }, and target domain π ∗ , respectively, and Zi (and Z∗ ) i be the variables in which experiments can be conducted in domain πi (and π ∗ ). Let P i , Iz be i i the pair of observational and interventional distributions of πi , where Iz = Z ⊆Zi P (v|do(z )), ∗ and in an analogous manner, P ∗ , Iz be the observational and interventional distributions of π ∗ . ∗ ∗ The causal effect R = Px (y|w) is said to be mz-transportable from Π to π ∗ in D if Px (y|w) is i ∗ uniquely computable from i=1,...,n P i , Iz ∪ P ∗ , Iz in any model that induces D. ∗ i The requirement that R is uniquely computable from P ∗ , Iz and P i , Iz from all sources has a syntactic image in the causal calculus, which is captured by the following sufficient condition. Theorem 1. Let D = {D(1) , ..., D(n) } be a collection of selection diagrams relative to source domains Π = {π1 , ..., πn }, and target domain π ∗ , respectively, and Si represents the collection of i ∗ S-variables in the selection diagram D(i) . Let { P i , Iz } and P ∗ , Iz be respectively the pairs of observational and interventional distributions in the sources Π and target π ∗ . The relation R = P ∗ (y|do(x), w) is mz-transportable from Π to π ∗ in D if the expression P (y|do(x), w, S1 , ..., Sn ) is reducible, using the rules of the causal calculus, to an expression in which (1) do-operators that ∗ i apply to subsets of Iz have no Si -variables or (2) do-operators apply only to subsets of Iz . This result provides a powerful way to syntactically establish mz-transportability, but it is not immediately obvious whether a sequence of applications of the rules of causal calculus that achieves the reduction required by the theorem exists, and even if such sequence exists, it is not obvious how to obtain it. For concreteness, we illustrate this result using the selection diagrams in Fig. 2(a,b). Corollary 1. P ∗ (y|do(x)) is mz-transportable in Fig. 2(a,b) with Za = {Z2 } and Zb = {Z1 }. Proof. The goal is to show that R = P ∗ (y|do(x)) is mz-transportable from {πa , πb } to π ∗ using experiments conducted over {Z2 } in πa and {Z1 } in πb . Note that naively trying to transport R from each of the domains individually is not possible, but R can be decomposed as follows: P ∗ (y|do(x)) = P ∗ (y|do(x), do(Z1 )) (2) P ∗ (y|do(x), do(Z1 ), z2 )P ∗ (z2 |do(x), do(Z1 )) (3) P ∗ (y|do(x), do(Z1 ), do(z2 ))P ∗ (z2 |do(x), do(Z1 )), = (4) z2 = z2 where Eq. (2) follows by rule 3 of the causal calculus since (Z1 ⊥ Y |X)DX,Z holds, we con⊥ 1 dition on Z2 in Eq. (3), and Eq. (4) follows by rule 2 of the causal calculus since (Z2 ⊥ ⊥ Y |X, Z1 )DX,Z ,Z , where D is the diagram in π ∗ (despite the location of the S-nodes). 1 2 Now we can rewrite the first term of Eq. (4) as indicated by the Theorem (and suggested by Def. 2): P ∗ (y|do(x), do(Z1 ), do(z2 )) = P (y|do(x), do(Z1 ), do(z2 ), Sa , Sb ) (5) = P (y|do(x), do(Z1 ), do(z2 ), Sb ) (6) = P (y|do(z2 ), Sb ) (7) = P (a) (y|do(z2 )), (8) where Eq. (5) follows from the theorem (and the definition of selection diagram), Eq. (6) follows , Eq. (7) follows from rule from rule 1 of the causal calculus since (Sa ⊥ Y |Z1 , Z2 , X)D(a) ⊥ Z1 ,Z2 ,X 3 of the causal calculus since (Z1 , X ⊥ Y |Z2 )D(a) ⊥ . Note that this equation matches with the Z1 ,Z2 ,X a syntactic goal of Theorem 1 since we have precisely do(z2 ) separated from Sa (and Z2 ∈ Iz ); so, we can rewrite the expression which results in Eq. (8) by the definition of selection diagram. Finally, we can rewrite the second term of Eq. (4) as follows: P ∗ (z2 |do(x), do(Z1 )) = P (z2 |do(x), do(Z1 ), Sa , Sb ) = P (z2 |do(x), do(Z1 ), Sa ) = P (z2 |x, do(Z1 ), Sa ) = P (b) (z2 |x, do(Z1 )), 5 (9) (10) (11) (12) where Eq. (9) follows from the theorem (and the definition of selection diagram), Eq. (10) follows from rule 1 of the causal calculus since (Sb ⊥ Z2 |Z1 , X)D(b) , Eq. (11) follows from rule 2 of ⊥ Z1 ,X the causal calculus since (X ⊥ Z2 |Z1 )D(b) . Note that this equation matches the condition of the ⊥ Z1 X theorem, separate do(Z1 ) from Sb (i.e., experiments over Z1 can be used since they are available in πb ), so we can rewrite Eq. (12) using the definition of selection diagrams, the corollary follows. The next condition for mz-transportability is more visible than Theorem 1 (albeit weaker), which also demonstrates the challenge of relating mz-transportability to other types of transportability. Corollary 2. R = P ∗ (y|do(x)) is mz-transportable in D if there exists Zi ⊆ Zi such that all paths from Zi to Y are blocked by X, (Si ⊥ Y|X, Zi )D(i) , and R is computable from do(Zi ). ⊥ X,Z i Remarkably, randomizing Z2 when applying Corollary 1 was instrumental to yield transportability in the previous example, despite the fact that the directed paths from Z2 to Y were not blocked by X, which suggests how different this transportability is from z-identifiability. So, it is not immediately obvious how to combine the topological relations of Zi ’s with X and Y in order to create a general condition for mz-transportability, the relationship between the distributions in the different domains can get relatively intricate, but we defer this discussion for now and consider a simpler case. It is not usually trivial to pursue a derivation of mz-transportability in causal calculus, and next we show an example in which such derivation does not even exist. Consider again the diagrams in Fig. 2(a,b), and assume that randomized experiments are available over {Z1 } in πa and {Z2 } in πb . Theorem 2. P ∗ (y|do(x)) is not mz-transportable in Fig. 2(a,b) with Za = {Z1 } and Zb = {Z2 }. Proof. Formally, we need to display two models M1 , M2 such that the following relations hold (as implied by Def. 3): (a) (a) PM1 (Z1 , X, Z2 , Y ) = PM2 (Z1 , X, Z2 , Y ), (b) P (Z1 , X, Z2 , Y ) = P (b) (Z1 , X, Z2 , Y ), M1 M2 (a) (a) (13) PM1 (X, Z2 , Y |do(Z1 )) = PM2 (X, Z2 , Y |do(Z1 )), (b) P (Z , X, Y |do(Z )) = P (b) (Z , X, Y |do(Z )), M 2 1 2 M2 ∗1 1 ∗ PM1 (Z1 , X, Z2 , Y ) = PM2 (Z1 , X, Z2 , Y ), for all values of Z1 , X, Z2 , and Y , and also, ∗ ∗ PM1 (Y |do(X)) = PM2 (Y |do(X)), (14) for some value of X and Y . Let V be the set of observable variables and U be the set of unobservable variables in D. Let us assume that all variables in U ∪ V are binary. Let U1 , U2 ∈ U be the common causes of Z1 and X and Z2 , respectively; let U3 , U4 , U5 ∈ U be a random disturbance exclusive to Z1 , Z2 , and Y , respectively, and U6 ∈ U be an extra random disturbance exclusive to Z2 , and U7 , U8 ∈ U to Y . Let Sa and Sb index the model in the following way: the tuples Sa = 1, Sb = 0 , Sa = 0, Sb = 1 , Sa = 0, Sb = 0 represent domains πa , πb , and π ∗ , respectively. Define the two models as follows: Z1 = U1 ⊕ U2 ⊕ U3 ∧ Sa Z1 = U1 ⊕ U2 ⊕ U3 ∧ Sa X=U X = Z1 ⊕ U1 1 M1 = M2 = Z2 = (X ⊕ U2 ⊕ (U4 ∧ Sa )) ∨ U6 Z2 = U4 ∧ Sa ∧ U6 ⊕ U6 Y = (Z2 ∧ U5 ) ⊕ (U5 ∧ U7 ) ⊕ (Sb ∧ U8 ) Y = (Z2 ∧ U5 ) ⊕ (U5 ∧ U7 ) ⊕ (Sb ∧ U8 ) where ⊕ represents the exclusive or function. Both models agree in respect to P (U), which is defined as P (Ui ) = 1/2, i = 1, ..., 8. It is not difficult to evaluate these models and note that the constraints given in Eqs. (13) and (14) are satisfied (including positivity), the theorem follows. 4 Algorithm for computing mz-transportability In this section, we build on previous analyses of identifiability [7, 21, 22, 23] in order to obtain a mechanic procedure in which a collection of selection diagrams and experimental data is inputted, and the procedure returns a transport formula whenever it is able to produce one. More specifically, 6 PROCEDURE TRmz (y, x, P, I, S, W, D) INPUT: x, y: value assignments; P: local distribution relative to domain S (S = 0 indexes π ∗ ) and active experiments I; W: weighting scheme; D: backbone of selection diagram; Si : selection nodes in πi (S0 = ∅ (i) relative to π ∗ ); [The following set and distributions are globally defined: Zi , P ∗ , PZi .] (i) ∗ ∗ OUTPUT: Px (y) in terms of P ∗ , PZ , PZi or F AIL(D, C0 ). P 1 if x = ∅, return V\Y P. P 2 if V \ An(Y)D = ∅, return TRmz (y, x ∩ An(Y)D , V\An(Y)D P, I, S, W, DAn(Y) ). 3 set W = (V \ X) \ An(Y)DX . if W = ∅, return TRmz (y, x ∪ w, P, I, S, W, D). Q P 4 if C(D \ X) = {C0 , C1 , ..., Ck }, return V\{Y,X} i TRmz (ci , v \ ci , P, I, S, W, D). 5 if C(D \ X) = {C0 }, 6 if C(D) = {D}, Q P P 7 if C0 ∈ C(D), return i|Vi ∈C0 V\V (i) P/ V\V (i−1) P. D 8 9 10 11 12 D if (∃C )C0 ⊂ C ∈ C(D), (i−1) for {i|Vi ∈ C }, set κi = κi ∪ vD \C . Q (i−1) mz return TR (y, x ∩ C , i|Vi ∈C P(Vi |VD ∩ C , κi ), I, S, W, C ). else, if I`= ∅, for i = 0, ..., |D|, ´ if (Si ⊥ Y | X)D(i) ∧ (Zi ∩ X = ∅) , Ei = TRmz (y, x \ zi , P, Zi ∩ X, i, W, D \ {Zi ∩ X}). ⊥ PX (j) if |E| > 0, return |E| wi Ei . i=1 else, FAIL(D, C0 ). Figure 3: Modified version of identification algorithm capable of recognizing mz-transportability. our algorithm is called TRmz (see Fig. 3), and is based on the C-component decomposition for identification of causal effects [22, 23] (and a version of the identification algorithm called ID). The rationale behind TRmz is to apply Tian’s factorization and decompose the target relation into smaller, more manageable sub-expressions, and then try to evaluate whether each sub-expression can be computed in the target domain. Whenever this evaluation fails, TRmz tries to use the experiments available from the target and, if possible, from the sources; this essentially implements the declarative condition delineated in Theorem 1. Next, we consider the soundness of the algorithm. ∗ Theorem 3 (soundness). Whenever TRmz returns an expression for Px (y), it is correct. In the sequel, we demonstrate how the algorithm works through the mz-transportability of Q = P ∗ (y|do(x)) in Fig. 2(c,d) with Z∗ = {Z1 }, Za = {Z2 }, and Zb = {Z1 }. Since (V \ X) \ An(Y)DX = {Z2 }, TRmz invokes line 3 with {Z2 } ∪ {X} as interventional set. The new call triggers line 4 and C(D \ {X, Z2 }) = {C0 , C1 , C2 , C3 }, where C0 = DZ1 , C1 = DZ3 , C2 = DU , and C3 = DW,Y , we invoke line 4 and try to mz-transport individ∗ ∗ ∗ ually Q0 = Px,z2 ,z3 ,u,w,y (z1 ), Q1 = Px,z1 ,z2 ,u,w,y (z3 ), Q2 = Px,z1 ,z2 ,z3 ,w,y (u), and Q3 = ∗ Px,z1 ,z2 ,z3 ,u (w, y). Thus the original problem reduces to try to evaluate the equivalent expression ∗ ∗ ∗ ∗ z1 ,z3 ,u,w Px,z2 ,z3 ,u,w,y (z1 )Px,z1 ,z2 ,u,w,y (z3 ) Px,z1 ,z2 ,z3 ,w,y (u)Px,z1 ,z2 ,z3 ,u (w, y). First, TRmz evaluates the expression Q0 and triggers line 2, noting that all nodes can be ignored ∗ since they are not ancestors of {Z1 }, which implies after line 1 that Px,z2 ,z3 ,u,w,y (z1 ) = P ∗ (z1 ). Second, TRmz evaluates the expression Q1 triggering line 2, which implies that ∗ ∗ Px,z1 ,z2 ,u,w,y (z3 ) = Px,z1 ,z2 (z3 ) with induced subgraph D1 = DX,Z1 ,Z2 ,Z3 . TRmz goes to line 5, in which in the local call C(D \ {X, Z1 , Z2 }) = {DZ3 }. Thus it proceeds to line 6 testing whether C(D \ {X, Z1 , Z2 }) is different from D1 , which is false. In this call, ordinary identifiability would fail, but TRmz proceeds to line 9. The goal of this line is to test whether some experiment can help for computing Q1 . In this case, πa fails immediately the test in line 10, but πb and π ∗ succeed, (i) which means experiments in these domains may eventually help; the new call is Px,z2 (z3 )D\Z1 , for mz i = {b, ∗} with induced graph D1 = DX,Z2 ,Z3 . Finally, TR triggers line 8 since X is not part of Z3 ’s components in D1 (or, Z3 ∈ C = {Z2 Z3 }), so line 2 is triggered since Z2 is no longer an ancestor of Z3 in D1 , and then line 1 is triggered since the interventional set is empty in (i) (i) ∗ this local call, so Px,z1 ,z2 (z3 ) = Z Pz1 (z3 |x, Z2 )Pz1 (Z2 ), for i = {b, ∗}. 2 7 ∗ Third, evaluating the expression Q2 , TRmz goes to line 2, which implies that Px,z1 ,z2 ,z3 ,w,y (u) = ∗ mz Px,z1 ,z2 ,z3 ,w (u) with induced subgraph D2 = DX,Z1 ,Z2 ,Z3 ,W,U . TR goes to line 5, and in this local call C(D \ {X, Z1 , Z2 , Z3 , W }) = {DU }, and the test in 6 succeed, since there are more components in D. So, it triggers line 8 since W is not part of U ’s component ∗ ∗ in D2 . The algorithm makes Px,z1 ,z2 ,z3 ,w (u) = Px,z1 ,z2 ,z3 (u)D2 |W (and update the working distribution); note that in this call, ordinary identifiability would fail since the nodes are in the same C-component and the test in line 6 fails. But TRmz proceeds to line 9 trying to find experiments that can help in Q2 ’s computation. In this case, πb cannot help but πa (a) and π ∗ perhaps can, noting that new calls are launched for computing Px,z1 ,z3 (u)D2 \Z2 |W rel∗ ative to πa , and Px,z2 ,z3 (u)D2 \Z1 |W relative to π ∗ with the corresponding data structures set. (a) (a) In πa , the algorithm triggers line 7, which yields Px,z1 ,z3 (u)D2 \Z2 |W = Pz2 (u|w, z3 , x, z1 ), ∗ and a bit more involved analysis for πb yields (after simplification) Px,z2 ,z3 (u)D2 \Z1 |W = ∗ ∗ ∗ ∗ ∗ Z Pz1 (z3 |x, Z2 )Pz1 (Z2 ) . Z Pz1 (u|w, z3 , x, Z2 )Pz1 (z3 |x, Z2 )Pz1 (Z2 ) / 2 2 mz Fourth, TR evaluates the expression Q3 and triggers line 5, C(D\{X, Z1 , Z2 , Z3 , U }) = DW,Y . ∗ In turn, both tests at lines 6 and 7 succeed, which makes the procedure to return Px,z1 ,z2 ,z3 ,u (w, y) = ∗ ∗ P (w|z3 , x, z1 , z2 )P (y|w, x, z1 , z2 , z3 , u). The composition of the return of these calls generates the following expression: ! ∗ Px (y) = X ∗ P (z1 ) z1 ,z3 ,w,u (2) w1 (1) w1 X ∗ ∗ Pz1 (z3 |x, Z2 )Pz1 (Z2 ) X (b) (b) Pz1 (z3 |x, Z2 )Pz1 (Z2 ) Z2 Z2 „X + (1) w2 « « „X ∗ ∗ ∗ ∗ ∗ Pz1 (z3 |x, Z2 )Pz1 (Z2 ) Pz1 (u|w, z3 , x, Z2 )Pz1 (z3 |x, Z2 )Pz1 (Z2 ) / Z2 Z2 ! (2) (a) + w2 Pz2 (u|w, z3 , x, z1 ) P ∗ (w|x, z1 , z2 , z3 ) P ∗ (y|x, z1 , z2 , z3 , w, u) (15) (k) where wi represents the weight for each factor in estimand k (i = 1, ..., nk ), and nk is the number of feasible estimands of k. Eq. (15) depicts a powerful way to estimate P ∗ (y|do(x)) in the target domain, and depending on weighting choice a different estimand will be entailed. For instance, one might use an analogous to inverse-variance weighting, which sets the weights for the normalized (k) nk −2 −2 2 inverse of their variances (i.e., wi = σi / j=1 σj , where σj is the variance of the jth component of estimand k). Our strategy resembles the approach taken in meta-analysis [4], albeit the latter usually disregards the intricacies of the relationships between variables, so producing a statistically less powerful estimand. Our method leverages this non-trivial and highly structured relationships, as exemplified in Eq. (15), which yields an estimand with less variance and statistically more powerful. 5 Conclusions In this paper, we treat a special type of transportability in which experiments can be conducted only over limited sets of variables in the sources and target domains, and the goal is to infer whether a certain effect can be estimated in the target using the information scattered throughout the domains. We provide a general sufficient graphical conditions for transportability based on the causal calculus along with a necessary condition for a specific scenario, which should be generalized for arbitrary structures. We further provide a procedure for computing transportability, that is, generate a formula for fusing the available observational and experimental data to synthesize an estimate of the desired causal effects. Our algorithm also allows for generic weighting schemes, which generalizes standard statistical procedures and leads to the construction of statistically more powerful estimands. Acknowledgment The work of Judea Pearl and Elias Bareinboim was supported in part by grants from NSF (IIS1249822, IIS-1302448), and ONR (N00014-13-1-0153, N00014-10-1-0933). The work of Sanghack Lee and Vasant Honavar was partially completed while they were with the Department of Computer Science at Iowa State University. The work of Vasant Honavar while working at the National Science Foundation (NSF) was supported by the NSF. The work of Sanghack Lee was supported in part by the grant from NSF (IIS-0711356). Any opinions, findings, and conclusions contained in this article are those of the authors and do not necessarily reflect the views of the sponsors. 8 References [1] J. Pearl and E. Bareinboim. Transportability of causal and statistical relations: A formal approach. In W. Burgard and D. Roth, editors, Proceedings of the Twenty-Fifth National Conference on Artificial Intelligence, pages 247–254. AAAI Press, Menlo Park, CA, 2011. [2] D. Campbell and J. Stanley. Experimental and Quasi-Experimental Designs for Research. Wadsworth Publishing, Chicago, 1963. [3] C. Manski. Identification for Prediction and Decision. Harvard University Press, Cambridge, Massachusetts, 2007. [4] L. V. Hedges and I. Olkin. Statistical Methods for Meta-Analysis. Academic Press, January 1985. [5] W.R. Shadish, T.D. Cook, and D.T. Campbell. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton-Mifflin, Boston, second edition, 2002. [6] S. Morgan and C. Winship. Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research). Cambridge University Press, New York, NY, 2007. [7] J. Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669–710, 1995. [8] P. Spirtes, C.N. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, Cambridge, MA, 2nd edition, 2000. [9] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2000. 2nd edition, 2009. [10] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009. [11] E. Bareinboim and J. Pearl. Transportability of causal effects: Completeness results. In J. Hoffmann and B. Selman, editors, Proceedings of the Twenty-Sixth National Conference on Artificial Intelligence, pages 698–704. AAAI Press, Menlo Park, CA, 2012. [12] E. Bareinboim and J. Pearl. Causal transportability with limited experiments. In M. desJardins and M. Littman, editors, Proceedings of the Twenty-Seventh National Conference on Artificial Intelligence, pages 95–101, Menlo Park, CA, 2013. AAAI Press. [13] S. Lee and V. Honavar. Causal transportability of experiments on controllable subsets of variables: ztransportability. In A. Nicholson and P. Smyth, editors, Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI), pages 361–370. AUAI Press, 2013. [14] E. Bareinboim and J. Pearl. Meta-transportability of causal effects: A formal approach. In C. Carvalho and P. Ravikumar, editors, Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (AISTATS), pages 135–143. JMLR W&CP; 31, 2013. [15] S. Lee and V. Honavar. m-transportability: Transportability of a causal effect from multiple environments. In M. desJardins and M. Littman, editors, Proceedings of the Twenty-Seventh National Conference on Artificial Intelligence, pages 583–590, Menlo Park, CA, 2013. AAAI Press. [16] H. Daume III and D. Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26:101–126, 2006. [17] A.J. Storkey. When training and test sets are different: characterising learning transfer. In J. Candela, M. Sugiyama, A. Schwaighofer, and N.D. Lawrence, editors, Dataset Shift in Machine Learning, pages 3–28. MIT Press, Cambridge, MA, 2009. [18] B. Sch¨ lkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij. On causal and anticausal o learning. In J Langford and J Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML), pages 1255–1262, New York, NY, USA, 2012. Omnipress. [19] K. Zhang, B. Sch¨ lkopf, K. Muandet, and Z. Wang. Domain adaptation under target and conditional o shift. In Proceedings of the 30th International Conference on Machine Learning (ICML). JMLR: W&CP; volume 28, 2013. [20] E. Bareinboim and J. Pearl. Causal inference by surrogate experiments: z-identifiability. In N. Freitas and K. Murphy, editors, Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI), pages 113–120. AUAI Press, 2012. [21] M. Kuroki and M. Miyakawa. Identifiability criteria for causal effects of joint interventions. Journal of the Royal Statistical Society, 29:105–117, 1999. [22] J. Tian and J. Pearl. A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 567–573. AAAI Press/The MIT Press, Menlo Park, CA, 2002. [23] I. Shpitser and J. Pearl. Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First National Conference on Artificial Intelligence, pages 1219–1226. AAAI Press, Menlo Park, CA, 2006. 9
2 0.20569523 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models
Author: Jonas Peters, Dominik Janzing, Bernhard Schölkopf
Abstract: Causal inference uses observational data to infer the causal structure of the data generating system. We study a class of restricted Structural Equation Models for time series that we call Time Series Models with Independent Noise (TiMINo). These models require independent residual time series, whereas traditional methods like Granger causality exploit the variance of residuals. This work contains two main contributions: (1) Theoretical: By restricting the model class (e.g. to additive noise) we provide general identifiability results. They cover lagged and instantaneous effects that can be nonlinear and unfaithful, and non-instantaneous feedbacks between the time series. (2) Practical: If there are no feedback loops between time series, we propose an algorithm based on non-linear independence tests of time series. We show empirically that when the data are causally insufficient or the model is misspecified, the method avoids incorrect answers. We extend the theoretical and the algorithmic part to situations in which the time series have been measured with different time delays. TiMINo is applied to artificial and real data and code is provided. 1
3 0.14118236 276 nips-2013-Reshaping Visual Datasets for Domain Adaptation
Author: Boqing Gong, Kristen Grauman, Fei Sha
Abstract: In visual recognition problems, the common data distribution mismatches between training and testing make domain adaptation essential. However, image data is difficult to manually divide into the discrete domains required by adaptation algorithms, and the standard practice of equating datasets with domains is a weak proxy for all the real conditions that alter the statistics in complex ways (lighting, pose, background, resolution, etc.) We propose an approach to automatically discover latent domains in image or video datasets. Our formulation imposes two key properties on domains: maximum distinctiveness and maximum learnability. By maximum distinctiveness, we require the underlying distributions of the identified domains to be different from each other to the maximum extent; by maximum learnability, we ensure that a strong discriminative model can be learned from the domain. We devise a nonparametric formulation and efficient optimization procedure that can successfully discover domains among both training and test data. We extensively evaluate our approach on object recognition and human activity recognition tasks. 1
4 0.082799874 145 nips-2013-It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals
Author: Barbara Rakitsch, Christoph Lippert, Karsten Borgwardt, Oliver Stegle
Abstract: Multi-task prediction methods are widely used to couple regressors or classification models by sharing information across related tasks. We propose a multi-task Gaussian process approach for modeling both the relatedness between regressors and the task correlations in the residuals, in order to more accurately identify true sharing between regressors. The resulting Gaussian model has a covariance term in form of a sum of Kronecker products, for which efficient parameter inference and out of sample prediction are feasible. On both synthetic examples and applications to phenotype prediction in genetics, we find substantial benefits of modeling structured noise compared to established alternatives. 1
5 0.065255605 134 nips-2013-Graphical Models for Inference with Missing Data
Author: Karthika Mohan, Judea Pearl, Jin Tian
Abstract: We address the problem of recoverability i.e. deciding whether there exists a consistent estimator of a given relation Q, when data are missing not at random. We employ a formal representation called ‘Missingness Graphs’ to explicitly portray the causal mechanisms responsible for missingness and to encode dependencies between these mechanisms and the variables being measured. Using this representation, we derive conditions that the graph should satisfy to ensure recoverability and devise algorithms to detect the presence of these conditions in the graph. 1
6 0.054836597 296 nips-2013-Sinkhorn Distances: Lightspeed Computation of Optimal Transport
7 0.053201493 226 nips-2013-One-shot learning by inverting a compositional causal process
8 0.050927632 211 nips-2013-Non-Linear Domain Adaptation with Boosting
9 0.048382934 9 nips-2013-A Kernel Test for Three-Variable Interactions
10 0.047885668 323 nips-2013-Synthesizing Robust Plans under Incomplete Domain Models
11 0.047642339 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors
12 0.046477873 183 nips-2013-Mapping paradigm ontologies to and from the brain
13 0.042634811 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
14 0.041750003 91 nips-2013-Dirty Statistical Models
15 0.03966422 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks
16 0.039607383 36 nips-2013-Annealing between distributions by averaging moments
18 0.038140006 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.
19 0.03803489 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests
20 0.036923531 265 nips-2013-Reconciling "priors" & "priors" without prejudice?
topicId topicWeight
[(0, 0.117), (1, 0.026), (2, -0.004), (3, 0.01), (4, 0.013), (5, 0.014), (6, 0.037), (7, -0.0), (8, -0.043), (9, 0.0), (10, -0.028), (11, -0.079), (12, -0.008), (13, -0.032), (14, -0.035), (15, 0.035), (16, 0.019), (17, 0.041), (18, -0.052), (19, -0.024), (20, -0.013), (21, 0.0), (22, 0.023), (23, -0.015), (24, 0.013), (25, -0.026), (26, 0.03), (27, -0.015), (28, -0.077), (29, -0.051), (30, 0.03), (31, -0.019), (32, 0.014), (33, 0.022), (34, 0.02), (35, -0.034), (36, -0.014), (37, 0.071), (38, 0.06), (39, 0.12), (40, -0.225), (41, -0.009), (42, -0.022), (43, 0.047), (44, 0.134), (45, -0.072), (46, 0.046), (47, 0.076), (48, -0.131), (49, -0.005)]
simIndex simValue paperId paperTitle
same-paper 1 0.94576126 337 nips-2013-Transportability from Multiple Environments with Limited Experiments
Author: Elias Bareinboim, Sanghack Lee, Vasant Honavar, Judea Pearl
Abstract: This paper considers the problem of transferring experimental findings learned from multiple heterogeneous domains to a target domain, in which only limited experiments can be performed. We reduce questions of transportability from multiple domains and with limited scope to symbolic derivations in the causal calculus, thus extending the original setting of transportability introduced in [1], which assumes only one domain with full experimental information available. We further provide different graphical and algorithmic conditions for computing the transport formula in this setting, that is, a way of fusing the observational and experimental information scattered throughout different domains to synthesize a consistent estimate of the desired effects in the target domain. We also consider the issue of minimizing the variance of the produced estimand in order to increase power. 1 Motivation Transporting and synthesizing experimental knowledge from heterogeneous settings are central to scientific discovery. Conclusions that are obtained in a laboratory setting are transported and applied elsewhere in an environment that differs in many aspects from that of the laboratory. In data-driven sciences, experiments are conducted on disparate domains, but the intention is almost invariably to fuse the acquired knowledge, and translate it into some meaningful claim about a target domain, which is usually different than any of the individual study domains. However, the conditions under which this extrapolation can be legitimized have not been formally articulated until very recently. Although the problem has been discussed in many areas of statistics, economics, and the health sciences, under rubrics such as “external validity” [2, 3], “meta-analysis” [4], “quasi-experiments” [5], “heterogeneity” [6], these discussions are limited to verbal narratives in the form of heuristic guidelines for experimental researchers – no formal treatment of the problem has been attempted to answer the practical challenge of generalizing causal knowledge across multiple heterogeneous domains with disparate experimental data posed in this paper. The fields of artificial intelligence and statistics provide the theoretical underpinnings necessary for tackling transportability. First, the distinction between statistical and causal knowledge has received syntactic representation through causal diagrams [7, 8, 9], which became a popular tool for causal inference in data-driven fields. Second, the inferential machinery provided by the causal calculus (do-calculus) [7, 9, 10] is particularly suitable for handling knowledge transfer across domains. Armed with these techniques, [1] introduced a formal language for encoding differences and commonalities between domains accompanied with necessary or sufficient conditions under which transportability of empirical findings is feasible between two domains, a source and a target; then, these conditions were extended for a complete characterization for transportability in one domain with unrestricted experimental data [11]. Subsequently, these results were generalized for the settings when ∗ These authors contributed equally to this paper. The authors’ addresses are respectively eb@cs.ucla.edu, sxl439@ist.psu.edu, vhonavar@ist.psu.edu, judea@cs.ucla.edu. 1 only limited experiments are available in the source domain [12, 13], and further for when multiple source domains with unrestricted experimental information are available [14, 15]. This paper broadens these discussions introducing a more general setting in which multiple heterogeneous sources with limited and distinct experiments are available, a task that we call here “mz-transportability”.1 More formally, the mz-transportability problem concerns with the transfer of causal knowledge from a heterogeneous collection of source domains Π = {π1 , ..., πn } to a target domain π ∗ . In each domain πi ∈ Π, experiments over a set of variables Zi can be performed, and causal knowledge gathered. In π ∗ , potentially different from πi , only passive observations can be collected (this constraint is weakened later on). The problem is to infer a causal relationship R in π ∗ using knowledge obtained in Π. Clearly, if nothing is known about the relationship between Π and π ∗ , the problem is trivial; no transfer can be justified. Yet the fact that all scientific experiments are conducted with the intent of being used elsewhere (e.g., outside the lab) implies that scientific progress relies on the assumption that certain domains share common characteristics and that, owed to these commonalities, causal claims would be valid in new settings even where experiments cannot be conducted. The problem stated in this paper generalizes the one-dimensional version of transportability with limited scope and the multiple dimensional with unlimited scope. Remarkably, while the effects of interest might not be individually transportable to the target domain from the experiments in any of the available sources, combining different pieces from the various sources may enable the estimation of the desired effects (to be shown later on). The goal of this paper is to formally understand under which conditions the target quantity is (non-parametrically) estimable from the available data. 2 Previous work and our contributions Consider Fig. 1(a) in which the node S represents factors that produce differences between source and target populations. Assume that we conduct a randomized trial in Los Angeles (LA) and estimate the causal effect of treatment X on outcome Y for every age group Z = z, denoted by P (y|do(x), z). We now wish to generalize the results to the population of the United States (U.S.), but we find the distribution P (x, y, z) in LA to be different from the one in the U.S. (call the latter P ∗ (x, y, z)). In particular, the average age in the U.S. is significantly higher than that in LA. How are we to estimate the causal effect of X on Y in U.S., denoted R = P ∗ (y|do(x))? 2 3 The selection diagram for this example (Fig. 1(a)) conveys the assumption that the only difference between the two populations are factors determining age distributions, shown as S → Z, while agespecific effects P ∗ (y|do(x), Z = z) are invariant across populations. Difference-generating factors are represented by a special set of variables called selection variables S (or simply S-variables), which are graphically depicted as square nodes ( ). From this assumption, the overall causal effect in the U.S. can be derived as follows: R P ∗ (y|do(x), z)P ∗ (z) = z P (y|do(x), z)P ∗ (z) = (1) z The last line is the transport formula for R. It combines experimental results obtained in LA, P (y|do(x), z), with observational aspects of the U.S. population, P ∗ (z), to obtain an experimental claim P ∗ (y|do(x)) about the U.S.. In this trivial example, the transport formula amounts to a simple re-calibration (or re-weighting) of the age-specific effects to account for the new age distribution. In general, however, a more involved mixture of experimental and observational findings would be necessary to obtain a bias-free estimate of the target relation R. Fig. 1(b) depicts the smallest example in which transportability is not feasible even when experiments over X in π are available. In real world applications, it may happen that certain controlled experiments cannot be conducted in the source environment (for financial, ethical, or technical reasons), so only a limited amount 1 The machine learning literature has been concerned about discrepancies among domains in the context, almost exclusively, on predictive or classification tasks as opposed to learning causal or counterfactual measures [16, 17]. Interestingly enough, recent work on anticausal learning moves towards more general modalities of learning and also leverages knowledge about the underlying data-generating structure [18, 19]. 2 We will use Px (y | z) interchangeably with P (y | do(x), z). 3 We use the structural interpretation of causal diagrams as described in [9, pp. 205]. 2 S S Z Z1 S S X Y (a) X Y (b) Z2 X Y (c) Figure 1: The selection variables S are depicted as square nodes ( ). (a) Selection diagram illustrating when transportability between two domains is trivially solved through simple recalibration. (b) The smallest possible selection diagram in which a causal relation is not transportable. (c) Selection diagram illustrating transportability when only experiments over {Z1 } are available in the source. of experimental information can be gathered. A natural question arises whether the investigator in possession of a limited set of experiments would still be able to estimate the desired effects at the target domain. For instance, we assume in Fig. 1(c) that experiments over Z1 are available and the target quantity is R = P ∗ (y|do(x)), which can be shown to be equivalent to P (y|x, do(Z1 )), the conditional distribution of Y given X in the experimental study when Z1 is randomized. 4 One might surmise that multiple pairwise z-transportability would be sufficient to solve the mztransportability problem, but this is not the case. To witness, consider Fig. 2(a,b) which concerns the transport of experimental results from two sources ({πa , πb }) to infer the effect of X on Y in π ∗ , R = P ∗ (y|do(x)). In these diagrams, X may represent the treatment (e.g., cholesterol level), Z1 represents a pre-treatment variable (e.g., diet), Z2 represents an intermediate variable (e.g., biomarker), and Y represents the outcome (e.g., heart failure). We assume that experimental studies randomizing {Z1 , Z2 } can be conducted in both domains. A simple analysis based on [12] can show that R cannot be z-transported from either source alone, but it turns out that combining in a special way experiments from both sources allows one to determine the effect in the target. More interestingly, we consider the more stringent scenario where only certain experiments can be performed in each of the domains. For instance, assume that it is only possible to conduct experiments over {Z2 } in πa and over {Z1 } in πb . Obviously, R will not be z-transported individually from these domains, but it turns out that taking both sets of experiments into account, R = z2 P (a) (y|do(z2 ))P (b) (z2 |x, do(Z1 )), which fully uses all pieces of experimental data available. In other words, we were able to decompose R into subrelations such that each one is separately z-transportable from the source domains, and so is the desired target quantity. Interestingly, it is the case in this example that if the domains in which experiments were conducted were reversed (i.e., {Z1 } randomized in πa , {Z2 } in πb ), it will not be possible to transport R by any method – the target relation is simply not computable from the available data (formally shown later on). This illustrates some of the subtle issues mz-transportability entails, which cannot be immediately cast in terms of previous instances of the transportability class. In the sequel, we try to better understand some of these issues, and we develop sufficient or (specific) necessary conditions for deciding special transportability for arbitrary collection of selection diagrams and set of experiments. We further construct an algorithm for deciding mz-transportability of joint causal effects and returning the correct transport formula whenever this is possible. We also consider issues relative to the variance of the estimand aiming for improving sample efficiency and increasing statistical power. 3 Graphical conditions for mz-transportability The basic semantical framework in our analysis rests on structural causal models as defined in [9, pp. 205], also called data-generating models. In the structural causal framework [9, Ch. 7], actions are modifications of functional relationships, and each action do(x) on a causal model M produces 4 A typical example is whether we can estimate the effect of cholesterol (X) on heart failure (Y ) by experiments on diet (Z1 ) given that cholesterol levels cannot be randomized [20]. 3 Z2 Z1 Z1 X Z1 X X Z2 W Z2 Y (a) Z2 Z1 Y X Z3 W U U Y Y (b) (c) Z3 (d) Figure 2: Selection diagrams illustrating impossibility of estimating R = P ∗ (y|do(x)) through individual transportability from πa and πb even when Z = {Z1 , Z2 } (for (a, b), (c, d))). If we assume, more stringently, availability of experiments Za = {Z2 }, Zb = {Z1 }, Z∗ = {}, a more elaborated analysis can show that R can be estimated combining different pieces from both domains. a new model Mx = U, V, Fx , P (U) , where Fx is obtained after replacing fX ∈ F for every X ∈ X with a new function that outputs a constant value x given by do(x). 5 We follow the conventions given in [9]. We denote variables by capital letters and their realized values by small letters. Similarly, sets of variables will be denoted by bold capital letters, sets of realized values by bold small letters. We use the typical graph-theoretic terminology with the corresponding abbreviations P a(Y)G and An(Y)G , which will denote respectively the set of observable parents and ancestors of the node set Y in G. A graph GY will denote the induced subgraph G containing nodes in Y and all arrows between such nodes. Finally, GXZ stands for the edge subgraph of G where all incoming arrows into X and all outgoing arrows from Z are removed. Key to the analysis of transportability is the notion of “identifiability,” defined below, which expresses the requirement that causal effects are computable from a combination of data P and assumptions embodied in a causal graph G. Definition 1 (Causal Effects Identifiability (Pearl, 2000, pp. 77)). The causal effect of an action do(x) on a set of variables Y such that Y ∩ X = ∅ is said to be identifiable from P in G if Px (y) is uniquely computable from P (V) in any model that induces G. Causal models and their induced graphs are usually associated with one particular domain (also called setting, study, population, or environment). In ordinary transportability, this representation was extended to capture properties of two domains simultaneously. This is possible if we assume that the structural equations share the same set of arguments, though the functional forms of the equations may vary arbitrarily [11]. 6 Definition 2 (Selection Diagram). Let M, M ∗ be a pair of structural causal models [9, pp. 205] relative to domains π, π ∗ , sharing a causal diagram G. M, M ∗ is said to induce a selection diagram D if D is constructed as follows: 1. Every edge in G is also an edge in D; 2. D contains an extra edge Si → Vi whenever there might exist a discrepancy fi = fi∗ or P (Ui ) = P ∗ (Ui ) between M and M ∗ . In words, the S-variables locate the mechanisms where structural discrepancies between the two domains are suspected to take place.7 Alternatively, the absence of a selection node pointing to a variable represents the assumption that the mechanism responsible for assigning value to that variable is identical in both domains. 5 The results presented here are also valid in other formalisms for causality based on potential outcomes. As discussed in the reference, the assumption of no structural changes between domains can be relaxed, but some structural assumptions regarding the discrepancies between domains must still hold. 7 Transportability assumes that enough structural knowledge about both domains is known in order to substantiate the production of their respective causal diagrams. In the absence of such knowledge, causal discovery algorithms might be used to infer the diagrams from data [8, 9]. 6 4 Armed with the concept of identifiability and selection diagrams, mz-transportability of causal effects can be defined as follows: Definition 3 (mz-Transportability). Let D = {D(1) , ..., D(n) } be a collection of selection diagrams relative to source domains Π = {π1 , ..., πn }, and target domain π ∗ , respectively, and Zi (and Z∗ ) i be the variables in which experiments can be conducted in domain πi (and π ∗ ). Let P i , Iz be i i the pair of observational and interventional distributions of πi , where Iz = Z ⊆Zi P (v|do(z )), ∗ and in an analogous manner, P ∗ , Iz be the observational and interventional distributions of π ∗ . ∗ ∗ The causal effect R = Px (y|w) is said to be mz-transportable from Π to π ∗ in D if Px (y|w) is i ∗ uniquely computable from i=1,...,n P i , Iz ∪ P ∗ , Iz in any model that induces D. ∗ i The requirement that R is uniquely computable from P ∗ , Iz and P i , Iz from all sources has a syntactic image in the causal calculus, which is captured by the following sufficient condition. Theorem 1. Let D = {D(1) , ..., D(n) } be a collection of selection diagrams relative to source domains Π = {π1 , ..., πn }, and target domain π ∗ , respectively, and Si represents the collection of i ∗ S-variables in the selection diagram D(i) . Let { P i , Iz } and P ∗ , Iz be respectively the pairs of observational and interventional distributions in the sources Π and target π ∗ . The relation R = P ∗ (y|do(x), w) is mz-transportable from Π to π ∗ in D if the expression P (y|do(x), w, S1 , ..., Sn ) is reducible, using the rules of the causal calculus, to an expression in which (1) do-operators that ∗ i apply to subsets of Iz have no Si -variables or (2) do-operators apply only to subsets of Iz . This result provides a powerful way to syntactically establish mz-transportability, but it is not immediately obvious whether a sequence of applications of the rules of causal calculus that achieves the reduction required by the theorem exists, and even if such sequence exists, it is not obvious how to obtain it. For concreteness, we illustrate this result using the selection diagrams in Fig. 2(a,b). Corollary 1. P ∗ (y|do(x)) is mz-transportable in Fig. 2(a,b) with Za = {Z2 } and Zb = {Z1 }. Proof. The goal is to show that R = P ∗ (y|do(x)) is mz-transportable from {πa , πb } to π ∗ using experiments conducted over {Z2 } in πa and {Z1 } in πb . Note that naively trying to transport R from each of the domains individually is not possible, but R can be decomposed as follows: P ∗ (y|do(x)) = P ∗ (y|do(x), do(Z1 )) (2) P ∗ (y|do(x), do(Z1 ), z2 )P ∗ (z2 |do(x), do(Z1 )) (3) P ∗ (y|do(x), do(Z1 ), do(z2 ))P ∗ (z2 |do(x), do(Z1 )), = (4) z2 = z2 where Eq. (2) follows by rule 3 of the causal calculus since (Z1 ⊥ Y |X)DX,Z holds, we con⊥ 1 dition on Z2 in Eq. (3), and Eq. (4) follows by rule 2 of the causal calculus since (Z2 ⊥ ⊥ Y |X, Z1 )DX,Z ,Z , where D is the diagram in π ∗ (despite the location of the S-nodes). 1 2 Now we can rewrite the first term of Eq. (4) as indicated by the Theorem (and suggested by Def. 2): P ∗ (y|do(x), do(Z1 ), do(z2 )) = P (y|do(x), do(Z1 ), do(z2 ), Sa , Sb ) (5) = P (y|do(x), do(Z1 ), do(z2 ), Sb ) (6) = P (y|do(z2 ), Sb ) (7) = P (a) (y|do(z2 )), (8) where Eq. (5) follows from the theorem (and the definition of selection diagram), Eq. (6) follows , Eq. (7) follows from rule from rule 1 of the causal calculus since (Sa ⊥ Y |Z1 , Z2 , X)D(a) ⊥ Z1 ,Z2 ,X 3 of the causal calculus since (Z1 , X ⊥ Y |Z2 )D(a) ⊥ . Note that this equation matches with the Z1 ,Z2 ,X a syntactic goal of Theorem 1 since we have precisely do(z2 ) separated from Sa (and Z2 ∈ Iz ); so, we can rewrite the expression which results in Eq. (8) by the definition of selection diagram. Finally, we can rewrite the second term of Eq. (4) as follows: P ∗ (z2 |do(x), do(Z1 )) = P (z2 |do(x), do(Z1 ), Sa , Sb ) = P (z2 |do(x), do(Z1 ), Sa ) = P (z2 |x, do(Z1 ), Sa ) = P (b) (z2 |x, do(Z1 )), 5 (9) (10) (11) (12) where Eq. (9) follows from the theorem (and the definition of selection diagram), Eq. (10) follows from rule 1 of the causal calculus since (Sb ⊥ Z2 |Z1 , X)D(b) , Eq. (11) follows from rule 2 of ⊥ Z1 ,X the causal calculus since (X ⊥ Z2 |Z1 )D(b) . Note that this equation matches the condition of the ⊥ Z1 X theorem, separate do(Z1 ) from Sb (i.e., experiments over Z1 can be used since they are available in πb ), so we can rewrite Eq. (12) using the definition of selection diagrams, the corollary follows. The next condition for mz-transportability is more visible than Theorem 1 (albeit weaker), which also demonstrates the challenge of relating mz-transportability to other types of transportability. Corollary 2. R = P ∗ (y|do(x)) is mz-transportable in D if there exists Zi ⊆ Zi such that all paths from Zi to Y are blocked by X, (Si ⊥ Y|X, Zi )D(i) , and R is computable from do(Zi ). ⊥ X,Z i Remarkably, randomizing Z2 when applying Corollary 1 was instrumental to yield transportability in the previous example, despite the fact that the directed paths from Z2 to Y were not blocked by X, which suggests how different this transportability is from z-identifiability. So, it is not immediately obvious how to combine the topological relations of Zi ’s with X and Y in order to create a general condition for mz-transportability, the relationship between the distributions in the different domains can get relatively intricate, but we defer this discussion for now and consider a simpler case. It is not usually trivial to pursue a derivation of mz-transportability in causal calculus, and next we show an example in which such derivation does not even exist. Consider again the diagrams in Fig. 2(a,b), and assume that randomized experiments are available over {Z1 } in πa and {Z2 } in πb . Theorem 2. P ∗ (y|do(x)) is not mz-transportable in Fig. 2(a,b) with Za = {Z1 } and Zb = {Z2 }. Proof. Formally, we need to display two models M1 , M2 such that the following relations hold (as implied by Def. 3): (a) (a) PM1 (Z1 , X, Z2 , Y ) = PM2 (Z1 , X, Z2 , Y ), (b) P (Z1 , X, Z2 , Y ) = P (b) (Z1 , X, Z2 , Y ), M1 M2 (a) (a) (13) PM1 (X, Z2 , Y |do(Z1 )) = PM2 (X, Z2 , Y |do(Z1 )), (b) P (Z , X, Y |do(Z )) = P (b) (Z , X, Y |do(Z )), M 2 1 2 M2 ∗1 1 ∗ PM1 (Z1 , X, Z2 , Y ) = PM2 (Z1 , X, Z2 , Y ), for all values of Z1 , X, Z2 , and Y , and also, ∗ ∗ PM1 (Y |do(X)) = PM2 (Y |do(X)), (14) for some value of X and Y . Let V be the set of observable variables and U be the set of unobservable variables in D. Let us assume that all variables in U ∪ V are binary. Let U1 , U2 ∈ U be the common causes of Z1 and X and Z2 , respectively; let U3 , U4 , U5 ∈ U be a random disturbance exclusive to Z1 , Z2 , and Y , respectively, and U6 ∈ U be an extra random disturbance exclusive to Z2 , and U7 , U8 ∈ U to Y . Let Sa and Sb index the model in the following way: the tuples Sa = 1, Sb = 0 , Sa = 0, Sb = 1 , Sa = 0, Sb = 0 represent domains πa , πb , and π ∗ , respectively. Define the two models as follows: Z1 = U1 ⊕ U2 ⊕ U3 ∧ Sa Z1 = U1 ⊕ U2 ⊕ U3 ∧ Sa X=U X = Z1 ⊕ U1 1 M1 = M2 = Z2 = (X ⊕ U2 ⊕ (U4 ∧ Sa )) ∨ U6 Z2 = U4 ∧ Sa ∧ U6 ⊕ U6 Y = (Z2 ∧ U5 ) ⊕ (U5 ∧ U7 ) ⊕ (Sb ∧ U8 ) Y = (Z2 ∧ U5 ) ⊕ (U5 ∧ U7 ) ⊕ (Sb ∧ U8 ) where ⊕ represents the exclusive or function. Both models agree in respect to P (U), which is defined as P (Ui ) = 1/2, i = 1, ..., 8. It is not difficult to evaluate these models and note that the constraints given in Eqs. (13) and (14) are satisfied (including positivity), the theorem follows. 4 Algorithm for computing mz-transportability In this section, we build on previous analyses of identifiability [7, 21, 22, 23] in order to obtain a mechanic procedure in which a collection of selection diagrams and experimental data is inputted, and the procedure returns a transport formula whenever it is able to produce one. More specifically, 6 PROCEDURE TRmz (y, x, P, I, S, W, D) INPUT: x, y: value assignments; P: local distribution relative to domain S (S = 0 indexes π ∗ ) and active experiments I; W: weighting scheme; D: backbone of selection diagram; Si : selection nodes in πi (S0 = ∅ (i) relative to π ∗ ); [The following set and distributions are globally defined: Zi , P ∗ , PZi .] (i) ∗ ∗ OUTPUT: Px (y) in terms of P ∗ , PZ , PZi or F AIL(D, C0 ). P 1 if x = ∅, return V\Y P. P 2 if V \ An(Y)D = ∅, return TRmz (y, x ∩ An(Y)D , V\An(Y)D P, I, S, W, DAn(Y) ). 3 set W = (V \ X) \ An(Y)DX . if W = ∅, return TRmz (y, x ∪ w, P, I, S, W, D). Q P 4 if C(D \ X) = {C0 , C1 , ..., Ck }, return V\{Y,X} i TRmz (ci , v \ ci , P, I, S, W, D). 5 if C(D \ X) = {C0 }, 6 if C(D) = {D}, Q P P 7 if C0 ∈ C(D), return i|Vi ∈C0 V\V (i) P/ V\V (i−1) P. D 8 9 10 11 12 D if (∃C )C0 ⊂ C ∈ C(D), (i−1) for {i|Vi ∈ C }, set κi = κi ∪ vD \C . Q (i−1) mz return TR (y, x ∩ C , i|Vi ∈C P(Vi |VD ∩ C , κi ), I, S, W, C ). else, if I`= ∅, for i = 0, ..., |D|, ´ if (Si ⊥ Y | X)D(i) ∧ (Zi ∩ X = ∅) , Ei = TRmz (y, x \ zi , P, Zi ∩ X, i, W, D \ {Zi ∩ X}). ⊥ PX (j) if |E| > 0, return |E| wi Ei . i=1 else, FAIL(D, C0 ). Figure 3: Modified version of identification algorithm capable of recognizing mz-transportability. our algorithm is called TRmz (see Fig. 3), and is based on the C-component decomposition for identification of causal effects [22, 23] (and a version of the identification algorithm called ID). The rationale behind TRmz is to apply Tian’s factorization and decompose the target relation into smaller, more manageable sub-expressions, and then try to evaluate whether each sub-expression can be computed in the target domain. Whenever this evaluation fails, TRmz tries to use the experiments available from the target and, if possible, from the sources; this essentially implements the declarative condition delineated in Theorem 1. Next, we consider the soundness of the algorithm. ∗ Theorem 3 (soundness). Whenever TRmz returns an expression for Px (y), it is correct. In the sequel, we demonstrate how the algorithm works through the mz-transportability of Q = P ∗ (y|do(x)) in Fig. 2(c,d) with Z∗ = {Z1 }, Za = {Z2 }, and Zb = {Z1 }. Since (V \ X) \ An(Y)DX = {Z2 }, TRmz invokes line 3 with {Z2 } ∪ {X} as interventional set. The new call triggers line 4 and C(D \ {X, Z2 }) = {C0 , C1 , C2 , C3 }, where C0 = DZ1 , C1 = DZ3 , C2 = DU , and C3 = DW,Y , we invoke line 4 and try to mz-transport individ∗ ∗ ∗ ually Q0 = Px,z2 ,z3 ,u,w,y (z1 ), Q1 = Px,z1 ,z2 ,u,w,y (z3 ), Q2 = Px,z1 ,z2 ,z3 ,w,y (u), and Q3 = ∗ Px,z1 ,z2 ,z3 ,u (w, y). Thus the original problem reduces to try to evaluate the equivalent expression ∗ ∗ ∗ ∗ z1 ,z3 ,u,w Px,z2 ,z3 ,u,w,y (z1 )Px,z1 ,z2 ,u,w,y (z3 ) Px,z1 ,z2 ,z3 ,w,y (u)Px,z1 ,z2 ,z3 ,u (w, y). First, TRmz evaluates the expression Q0 and triggers line 2, noting that all nodes can be ignored ∗ since they are not ancestors of {Z1 }, which implies after line 1 that Px,z2 ,z3 ,u,w,y (z1 ) = P ∗ (z1 ). Second, TRmz evaluates the expression Q1 triggering line 2, which implies that ∗ ∗ Px,z1 ,z2 ,u,w,y (z3 ) = Px,z1 ,z2 (z3 ) with induced subgraph D1 = DX,Z1 ,Z2 ,Z3 . TRmz goes to line 5, in which in the local call C(D \ {X, Z1 , Z2 }) = {DZ3 }. Thus it proceeds to line 6 testing whether C(D \ {X, Z1 , Z2 }) is different from D1 , which is false. In this call, ordinary identifiability would fail, but TRmz proceeds to line 9. The goal of this line is to test whether some experiment can help for computing Q1 . In this case, πa fails immediately the test in line 10, but πb and π ∗ succeed, (i) which means experiments in these domains may eventually help; the new call is Px,z2 (z3 )D\Z1 , for mz i = {b, ∗} with induced graph D1 = DX,Z2 ,Z3 . Finally, TR triggers line 8 since X is not part of Z3 ’s components in D1 (or, Z3 ∈ C = {Z2 Z3 }), so line 2 is triggered since Z2 is no longer an ancestor of Z3 in D1 , and then line 1 is triggered since the interventional set is empty in (i) (i) ∗ this local call, so Px,z1 ,z2 (z3 ) = Z Pz1 (z3 |x, Z2 )Pz1 (Z2 ), for i = {b, ∗}. 2 7 ∗ Third, evaluating the expression Q2 , TRmz goes to line 2, which implies that Px,z1 ,z2 ,z3 ,w,y (u) = ∗ mz Px,z1 ,z2 ,z3 ,w (u) with induced subgraph D2 = DX,Z1 ,Z2 ,Z3 ,W,U . TR goes to line 5, and in this local call C(D \ {X, Z1 , Z2 , Z3 , W }) = {DU }, and the test in 6 succeed, since there are more components in D. So, it triggers line 8 since W is not part of U ’s component ∗ ∗ in D2 . The algorithm makes Px,z1 ,z2 ,z3 ,w (u) = Px,z1 ,z2 ,z3 (u)D2 |W (and update the working distribution); note that in this call, ordinary identifiability would fail since the nodes are in the same C-component and the test in line 6 fails. But TRmz proceeds to line 9 trying to find experiments that can help in Q2 ’s computation. In this case, πb cannot help but πa (a) and π ∗ perhaps can, noting that new calls are launched for computing Px,z1 ,z3 (u)D2 \Z2 |W rel∗ ative to πa , and Px,z2 ,z3 (u)D2 \Z1 |W relative to π ∗ with the corresponding data structures set. (a) (a) In πa , the algorithm triggers line 7, which yields Px,z1 ,z3 (u)D2 \Z2 |W = Pz2 (u|w, z3 , x, z1 ), ∗ and a bit more involved analysis for πb yields (after simplification) Px,z2 ,z3 (u)D2 \Z1 |W = ∗ ∗ ∗ ∗ ∗ Z Pz1 (z3 |x, Z2 )Pz1 (Z2 ) . Z Pz1 (u|w, z3 , x, Z2 )Pz1 (z3 |x, Z2 )Pz1 (Z2 ) / 2 2 mz Fourth, TR evaluates the expression Q3 and triggers line 5, C(D\{X, Z1 , Z2 , Z3 , U }) = DW,Y . ∗ In turn, both tests at lines 6 and 7 succeed, which makes the procedure to return Px,z1 ,z2 ,z3 ,u (w, y) = ∗ ∗ P (w|z3 , x, z1 , z2 )P (y|w, x, z1 , z2 , z3 , u). The composition of the return of these calls generates the following expression: ! ∗ Px (y) = X ∗ P (z1 ) z1 ,z3 ,w,u (2) w1 (1) w1 X ∗ ∗ Pz1 (z3 |x, Z2 )Pz1 (Z2 ) X (b) (b) Pz1 (z3 |x, Z2 )Pz1 (Z2 ) Z2 Z2 „X + (1) w2 « « „X ∗ ∗ ∗ ∗ ∗ Pz1 (z3 |x, Z2 )Pz1 (Z2 ) Pz1 (u|w, z3 , x, Z2 )Pz1 (z3 |x, Z2 )Pz1 (Z2 ) / Z2 Z2 ! (2) (a) + w2 Pz2 (u|w, z3 , x, z1 ) P ∗ (w|x, z1 , z2 , z3 ) P ∗ (y|x, z1 , z2 , z3 , w, u) (15) (k) where wi represents the weight for each factor in estimand k (i = 1, ..., nk ), and nk is the number of feasible estimands of k. Eq. (15) depicts a powerful way to estimate P ∗ (y|do(x)) in the target domain, and depending on weighting choice a different estimand will be entailed. For instance, one might use an analogous to inverse-variance weighting, which sets the weights for the normalized (k) nk −2 −2 2 inverse of their variances (i.e., wi = σi / j=1 σj , where σj is the variance of the jth component of estimand k). Our strategy resembles the approach taken in meta-analysis [4], albeit the latter usually disregards the intricacies of the relationships between variables, so producing a statistically less powerful estimand. Our method leverages this non-trivial and highly structured relationships, as exemplified in Eq. (15), which yields an estimand with less variance and statistically more powerful. 5 Conclusions In this paper, we treat a special type of transportability in which experiments can be conducted only over limited sets of variables in the sources and target domains, and the goal is to infer whether a certain effect can be estimated in the target using the information scattered throughout the domains. We provide a general sufficient graphical conditions for transportability based on the causal calculus along with a necessary condition for a specific scenario, which should be generalized for arbitrary structures. We further provide a procedure for computing transportability, that is, generate a formula for fusing the available observational and experimental data to synthesize an estimate of the desired causal effects. Our algorithm also allows for generic weighting schemes, which generalizes standard statistical procedures and leads to the construction of statistically more powerful estimands. Acknowledgment The work of Judea Pearl and Elias Bareinboim was supported in part by grants from NSF (IIS1249822, IIS-1302448), and ONR (N00014-13-1-0153, N00014-10-1-0933). The work of Sanghack Lee and Vasant Honavar was partially completed while they were with the Department of Computer Science at Iowa State University. The work of Vasant Honavar while working at the National Science Foundation (NSF) was supported by the NSF. The work of Sanghack Lee was supported in part by the grant from NSF (IIS-0711356). Any opinions, findings, and conclusions contained in this article are those of the authors and do not necessarily reflect the views of the sponsors. 8 References [1] J. Pearl and E. Bareinboim. Transportability of causal and statistical relations: A formal approach. In W. Burgard and D. Roth, editors, Proceedings of the Twenty-Fifth National Conference on Artificial Intelligence, pages 247–254. AAAI Press, Menlo Park, CA, 2011. [2] D. Campbell and J. Stanley. Experimental and Quasi-Experimental Designs for Research. Wadsworth Publishing, Chicago, 1963. [3] C. Manski. Identification for Prediction and Decision. Harvard University Press, Cambridge, Massachusetts, 2007. [4] L. V. Hedges and I. Olkin. Statistical Methods for Meta-Analysis. Academic Press, January 1985. [5] W.R. Shadish, T.D. Cook, and D.T. Campbell. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton-Mifflin, Boston, second edition, 2002. [6] S. Morgan and C. Winship. Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research). Cambridge University Press, New York, NY, 2007. [7] J. Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669–710, 1995. [8] P. Spirtes, C.N. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, Cambridge, MA, 2nd edition, 2000. [9] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2000. 2nd edition, 2009. [10] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009. [11] E. Bareinboim and J. Pearl. Transportability of causal effects: Completeness results. In J. Hoffmann and B. Selman, editors, Proceedings of the Twenty-Sixth National Conference on Artificial Intelligence, pages 698–704. AAAI Press, Menlo Park, CA, 2012. [12] E. Bareinboim and J. Pearl. Causal transportability with limited experiments. In M. desJardins and M. Littman, editors, Proceedings of the Twenty-Seventh National Conference on Artificial Intelligence, pages 95–101, Menlo Park, CA, 2013. AAAI Press. [13] S. Lee and V. Honavar. Causal transportability of experiments on controllable subsets of variables: ztransportability. In A. Nicholson and P. Smyth, editors, Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI), pages 361–370. AUAI Press, 2013. [14] E. Bareinboim and J. Pearl. Meta-transportability of causal effects: A formal approach. In C. Carvalho and P. Ravikumar, editors, Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (AISTATS), pages 135–143. JMLR W&CP; 31, 2013. [15] S. Lee and V. Honavar. m-transportability: Transportability of a causal effect from multiple environments. In M. desJardins and M. Littman, editors, Proceedings of the Twenty-Seventh National Conference on Artificial Intelligence, pages 583–590, Menlo Park, CA, 2013. AAAI Press. [16] H. Daume III and D. Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26:101–126, 2006. [17] A.J. Storkey. When training and test sets are different: characterising learning transfer. In J. Candela, M. Sugiyama, A. Schwaighofer, and N.D. Lawrence, editors, Dataset Shift in Machine Learning, pages 3–28. MIT Press, Cambridge, MA, 2009. [18] B. Sch¨ lkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij. On causal and anticausal o learning. In J Langford and J Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML), pages 1255–1262, New York, NY, USA, 2012. Omnipress. [19] K. Zhang, B. Sch¨ lkopf, K. Muandet, and Z. Wang. Domain adaptation under target and conditional o shift. In Proceedings of the 30th International Conference on Machine Learning (ICML). JMLR: W&CP; volume 28, 2013. [20] E. Bareinboim and J. Pearl. Causal inference by surrogate experiments: z-identifiability. In N. Freitas and K. Murphy, editors, Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI), pages 113–120. AUAI Press, 2012. [21] M. Kuroki and M. Miyakawa. Identifiability criteria for causal effects of joint interventions. Journal of the Royal Statistical Society, 29:105–117, 1999. [22] J. Tian and J. Pearl. A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 567–573. AAAI Press/The MIT Press, Menlo Park, CA, 2002. [23] I. Shpitser and J. Pearl. Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First National Conference on Artificial Intelligence, pages 1219–1226. AAAI Press, Menlo Park, CA, 2006. 9
2 0.69295025 276 nips-2013-Reshaping Visual Datasets for Domain Adaptation
Author: Boqing Gong, Kristen Grauman, Fei Sha
Abstract: In visual recognition problems, the common data distribution mismatches between training and testing make domain adaptation essential. However, image data is difficult to manually divide into the discrete domains required by adaptation algorithms, and the standard practice of equating datasets with domains is a weak proxy for all the real conditions that alter the statistics in complex ways (lighting, pose, background, resolution, etc.) We propose an approach to automatically discover latent domains in image or video datasets. Our formulation imposes two key properties on domains: maximum distinctiveness and maximum learnability. By maximum distinctiveness, we require the underlying distributions of the identified domains to be different from each other to the maximum extent; by maximum learnability, we ensure that a strong discriminative model can be learned from the domain. We devise a nonparametric formulation and efficient optimization procedure that can successfully discover domains among both training and test data. We extensively evaluate our approach on object recognition and human activity recognition tasks. 1
3 0.68368846 134 nips-2013-Graphical Models for Inference with Missing Data
Author: Karthika Mohan, Judea Pearl, Jin Tian
Abstract: We address the problem of recoverability i.e. deciding whether there exists a consistent estimator of a given relation Q, when data are missing not at random. We employ a formal representation called ‘Missingness Graphs’ to explicitly portray the causal mechanisms responsible for missingness and to encode dependencies between these mechanisms and the variables being measured. Using this representation, we derive conditions that the graph should satisfy to ensure recoverability and devise algorithms to detect the presence of these conditions in the graph. 1
4 0.52743661 211 nips-2013-Non-Linear Domain Adaptation with Boosting
Author: Carlos J. Becker, Christos M. Christoudias, Pascal Fua
Abstract: A common assumption in machine vision is that the training and test samples are drawn from the same distribution. However, there are many problems when this assumption is grossly violated, as in bio-medical applications where different acquisitions can generate drastic variations in the appearance of the data due to changing experimental conditions. This problem is accentuated with 3D data, for which annotation is very time-consuming, limiting the amount of data that can be labeled in new acquisitions for training. In this paper we present a multitask learning algorithm for domain adaptation based on boosting. Unlike previous approaches that learn task-specific decision boundaries, our method learns a single decision boundary in a shared feature space, common to all tasks. We use the boosting-trick to learn a non-linear mapping of the observations in each task, with no need for specific a-priori knowledge of its global analytical form. This yields a more parameter-free domain adaptation approach that successfully leverages learning on new tasks where labeled data is scarce. We evaluate our approach on two challenging bio-medical datasets and achieve a significant improvement over the state of the art. 1
5 0.51705098 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models
Author: Jonas Peters, Dominik Janzing, Bernhard Schölkopf
Abstract: Causal inference uses observational data to infer the causal structure of the data generating system. We study a class of restricted Structural Equation Models for time series that we call Time Series Models with Independent Noise (TiMINo). These models require independent residual time series, whereas traditional methods like Granger causality exploit the variance of residuals. This work contains two main contributions: (1) Theoretical: By restricting the model class (e.g. to additive noise) we provide general identifiability results. They cover lagged and instantaneous effects that can be nonlinear and unfaithful, and non-instantaneous feedbacks between the time series. (2) Practical: If there are no feedback loops between time series, we propose an algorithm based on non-linear independence tests of time series. We show empirically that when the data are causally insufficient or the model is misspecified, the method avoids incorrect answers. We extend the theoretical and the algorithmic part to situations in which the time series have been measured with different time delays. TiMINo is applied to artificial and real data and code is provided. 1
6 0.48135555 323 nips-2013-Synthesizing Robust Plans under Incomplete Domain Models
7 0.43105754 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning
8 0.41659096 9 nips-2013-A Kernel Test for Three-Variable Interactions
9 0.41113913 332 nips-2013-Tracking Time-varying Graphical Structure
10 0.39864773 176 nips-2013-Linear decision rule as aspiration for simple decision heuristics
11 0.39743015 290 nips-2013-Scoring Workers in Crowdsourcing: How Many Control Questions are Enough?
12 0.39394084 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests
13 0.38589245 110 nips-2013-Estimating the Unseen: Improved Estimators for Entropy and other Properties
14 0.38476002 261 nips-2013-Rapid Distance-Based Outlier Detection via Sampling
15 0.38150465 199 nips-2013-More data speeds up training time in learning halfspaces over sparse vectors
16 0.38124365 275 nips-2013-Reservoir Boosting : Between Online and Offline Ensemble Learning
17 0.38113555 183 nips-2013-Mapping paradigm ontologies to and from the brain
18 0.37598905 122 nips-2013-First-order Decomposition Trees
19 0.37486237 316 nips-2013-Stochastic blockmodel approximation of a graphon: Theory and consistent estimation
20 0.37143984 294 nips-2013-Similarity Component Analysis
topicId topicWeight
[(2, 0.012), (16, 0.023), (19, 0.014), (33, 0.095), (34, 0.102), (41, 0.033), (49, 0.063), (52, 0.301), (56, 0.103), (70, 0.021), (85, 0.039), (89, 0.044), (93, 0.051), (95, 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.74845517 337 nips-2013-Transportability from Multiple Environments with Limited Experiments
Author: Elias Bareinboim, Sanghack Lee, Vasant Honavar, Judea Pearl
Abstract: This paper considers the problem of transferring experimental findings learned from multiple heterogeneous domains to a target domain, in which only limited experiments can be performed. We reduce questions of transportability from multiple domains and with limited scope to symbolic derivations in the causal calculus, thus extending the original setting of transportability introduced in [1], which assumes only one domain with full experimental information available. We further provide different graphical and algorithmic conditions for computing the transport formula in this setting, that is, a way of fusing the observational and experimental information scattered throughout different domains to synthesize a consistent estimate of the desired effects in the target domain. We also consider the issue of minimizing the variance of the produced estimand in order to increase power. 1 Motivation Transporting and synthesizing experimental knowledge from heterogeneous settings are central to scientific discovery. Conclusions that are obtained in a laboratory setting are transported and applied elsewhere in an environment that differs in many aspects from that of the laboratory. In data-driven sciences, experiments are conducted on disparate domains, but the intention is almost invariably to fuse the acquired knowledge, and translate it into some meaningful claim about a target domain, which is usually different than any of the individual study domains. However, the conditions under which this extrapolation can be legitimized have not been formally articulated until very recently. Although the problem has been discussed in many areas of statistics, economics, and the health sciences, under rubrics such as “external validity” [2, 3], “meta-analysis” [4], “quasi-experiments” [5], “heterogeneity” [6], these discussions are limited to verbal narratives in the form of heuristic guidelines for experimental researchers – no formal treatment of the problem has been attempted to answer the practical challenge of generalizing causal knowledge across multiple heterogeneous domains with disparate experimental data posed in this paper. The fields of artificial intelligence and statistics provide the theoretical underpinnings necessary for tackling transportability. First, the distinction between statistical and causal knowledge has received syntactic representation through causal diagrams [7, 8, 9], which became a popular tool for causal inference in data-driven fields. Second, the inferential machinery provided by the causal calculus (do-calculus) [7, 9, 10] is particularly suitable for handling knowledge transfer across domains. Armed with these techniques, [1] introduced a formal language for encoding differences and commonalities between domains accompanied with necessary or sufficient conditions under which transportability of empirical findings is feasible between two domains, a source and a target; then, these conditions were extended for a complete characterization for transportability in one domain with unrestricted experimental data [11]. Subsequently, these results were generalized for the settings when ∗ These authors contributed equally to this paper. The authors’ addresses are respectively eb@cs.ucla.edu, sxl439@ist.psu.edu, vhonavar@ist.psu.edu, judea@cs.ucla.edu. 1 only limited experiments are available in the source domain [12, 13], and further for when multiple source domains with unrestricted experimental information are available [14, 15]. This paper broadens these discussions introducing a more general setting in which multiple heterogeneous sources with limited and distinct experiments are available, a task that we call here “mz-transportability”.1 More formally, the mz-transportability problem concerns with the transfer of causal knowledge from a heterogeneous collection of source domains Π = {π1 , ..., πn } to a target domain π ∗ . In each domain πi ∈ Π, experiments over a set of variables Zi can be performed, and causal knowledge gathered. In π ∗ , potentially different from πi , only passive observations can be collected (this constraint is weakened later on). The problem is to infer a causal relationship R in π ∗ using knowledge obtained in Π. Clearly, if nothing is known about the relationship between Π and π ∗ , the problem is trivial; no transfer can be justified. Yet the fact that all scientific experiments are conducted with the intent of being used elsewhere (e.g., outside the lab) implies that scientific progress relies on the assumption that certain domains share common characteristics and that, owed to these commonalities, causal claims would be valid in new settings even where experiments cannot be conducted. The problem stated in this paper generalizes the one-dimensional version of transportability with limited scope and the multiple dimensional with unlimited scope. Remarkably, while the effects of interest might not be individually transportable to the target domain from the experiments in any of the available sources, combining different pieces from the various sources may enable the estimation of the desired effects (to be shown later on). The goal of this paper is to formally understand under which conditions the target quantity is (non-parametrically) estimable from the available data. 2 Previous work and our contributions Consider Fig. 1(a) in which the node S represents factors that produce differences between source and target populations. Assume that we conduct a randomized trial in Los Angeles (LA) and estimate the causal effect of treatment X on outcome Y for every age group Z = z, denoted by P (y|do(x), z). We now wish to generalize the results to the population of the United States (U.S.), but we find the distribution P (x, y, z) in LA to be different from the one in the U.S. (call the latter P ∗ (x, y, z)). In particular, the average age in the U.S. is significantly higher than that in LA. How are we to estimate the causal effect of X on Y in U.S., denoted R = P ∗ (y|do(x))? 2 3 The selection diagram for this example (Fig. 1(a)) conveys the assumption that the only difference between the two populations are factors determining age distributions, shown as S → Z, while agespecific effects P ∗ (y|do(x), Z = z) are invariant across populations. Difference-generating factors are represented by a special set of variables called selection variables S (or simply S-variables), which are graphically depicted as square nodes ( ). From this assumption, the overall causal effect in the U.S. can be derived as follows: R P ∗ (y|do(x), z)P ∗ (z) = z P (y|do(x), z)P ∗ (z) = (1) z The last line is the transport formula for R. It combines experimental results obtained in LA, P (y|do(x), z), with observational aspects of the U.S. population, P ∗ (z), to obtain an experimental claim P ∗ (y|do(x)) about the U.S.. In this trivial example, the transport formula amounts to a simple re-calibration (or re-weighting) of the age-specific effects to account for the new age distribution. In general, however, a more involved mixture of experimental and observational findings would be necessary to obtain a bias-free estimate of the target relation R. Fig. 1(b) depicts the smallest example in which transportability is not feasible even when experiments over X in π are available. In real world applications, it may happen that certain controlled experiments cannot be conducted in the source environment (for financial, ethical, or technical reasons), so only a limited amount 1 The machine learning literature has been concerned about discrepancies among domains in the context, almost exclusively, on predictive or classification tasks as opposed to learning causal or counterfactual measures [16, 17]. Interestingly enough, recent work on anticausal learning moves towards more general modalities of learning and also leverages knowledge about the underlying data-generating structure [18, 19]. 2 We will use Px (y | z) interchangeably with P (y | do(x), z). 3 We use the structural interpretation of causal diagrams as described in [9, pp. 205]. 2 S S Z Z1 S S X Y (a) X Y (b) Z2 X Y (c) Figure 1: The selection variables S are depicted as square nodes ( ). (a) Selection diagram illustrating when transportability between two domains is trivially solved through simple recalibration. (b) The smallest possible selection diagram in which a causal relation is not transportable. (c) Selection diagram illustrating transportability when only experiments over {Z1 } are available in the source. of experimental information can be gathered. A natural question arises whether the investigator in possession of a limited set of experiments would still be able to estimate the desired effects at the target domain. For instance, we assume in Fig. 1(c) that experiments over Z1 are available and the target quantity is R = P ∗ (y|do(x)), which can be shown to be equivalent to P (y|x, do(Z1 )), the conditional distribution of Y given X in the experimental study when Z1 is randomized. 4 One might surmise that multiple pairwise z-transportability would be sufficient to solve the mztransportability problem, but this is not the case. To witness, consider Fig. 2(a,b) which concerns the transport of experimental results from two sources ({πa , πb }) to infer the effect of X on Y in π ∗ , R = P ∗ (y|do(x)). In these diagrams, X may represent the treatment (e.g., cholesterol level), Z1 represents a pre-treatment variable (e.g., diet), Z2 represents an intermediate variable (e.g., biomarker), and Y represents the outcome (e.g., heart failure). We assume that experimental studies randomizing {Z1 , Z2 } can be conducted in both domains. A simple analysis based on [12] can show that R cannot be z-transported from either source alone, but it turns out that combining in a special way experiments from both sources allows one to determine the effect in the target. More interestingly, we consider the more stringent scenario where only certain experiments can be performed in each of the domains. For instance, assume that it is only possible to conduct experiments over {Z2 } in πa and over {Z1 } in πb . Obviously, R will not be z-transported individually from these domains, but it turns out that taking both sets of experiments into account, R = z2 P (a) (y|do(z2 ))P (b) (z2 |x, do(Z1 )), which fully uses all pieces of experimental data available. In other words, we were able to decompose R into subrelations such that each one is separately z-transportable from the source domains, and so is the desired target quantity. Interestingly, it is the case in this example that if the domains in which experiments were conducted were reversed (i.e., {Z1 } randomized in πa , {Z2 } in πb ), it will not be possible to transport R by any method – the target relation is simply not computable from the available data (formally shown later on). This illustrates some of the subtle issues mz-transportability entails, which cannot be immediately cast in terms of previous instances of the transportability class. In the sequel, we try to better understand some of these issues, and we develop sufficient or (specific) necessary conditions for deciding special transportability for arbitrary collection of selection diagrams and set of experiments. We further construct an algorithm for deciding mz-transportability of joint causal effects and returning the correct transport formula whenever this is possible. We also consider issues relative to the variance of the estimand aiming for improving sample efficiency and increasing statistical power. 3 Graphical conditions for mz-transportability The basic semantical framework in our analysis rests on structural causal models as defined in [9, pp. 205], also called data-generating models. In the structural causal framework [9, Ch. 7], actions are modifications of functional relationships, and each action do(x) on a causal model M produces 4 A typical example is whether we can estimate the effect of cholesterol (X) on heart failure (Y ) by experiments on diet (Z1 ) given that cholesterol levels cannot be randomized [20]. 3 Z2 Z1 Z1 X Z1 X X Z2 W Z2 Y (a) Z2 Z1 Y X Z3 W U U Y Y (b) (c) Z3 (d) Figure 2: Selection diagrams illustrating impossibility of estimating R = P ∗ (y|do(x)) through individual transportability from πa and πb even when Z = {Z1 , Z2 } (for (a, b), (c, d))). If we assume, more stringently, availability of experiments Za = {Z2 }, Zb = {Z1 }, Z∗ = {}, a more elaborated analysis can show that R can be estimated combining different pieces from both domains. a new model Mx = U, V, Fx , P (U) , where Fx is obtained after replacing fX ∈ F for every X ∈ X with a new function that outputs a constant value x given by do(x). 5 We follow the conventions given in [9]. We denote variables by capital letters and their realized values by small letters. Similarly, sets of variables will be denoted by bold capital letters, sets of realized values by bold small letters. We use the typical graph-theoretic terminology with the corresponding abbreviations P a(Y)G and An(Y)G , which will denote respectively the set of observable parents and ancestors of the node set Y in G. A graph GY will denote the induced subgraph G containing nodes in Y and all arrows between such nodes. Finally, GXZ stands for the edge subgraph of G where all incoming arrows into X and all outgoing arrows from Z are removed. Key to the analysis of transportability is the notion of “identifiability,” defined below, which expresses the requirement that causal effects are computable from a combination of data P and assumptions embodied in a causal graph G. Definition 1 (Causal Effects Identifiability (Pearl, 2000, pp. 77)). The causal effect of an action do(x) on a set of variables Y such that Y ∩ X = ∅ is said to be identifiable from P in G if Px (y) is uniquely computable from P (V) in any model that induces G. Causal models and their induced graphs are usually associated with one particular domain (also called setting, study, population, or environment). In ordinary transportability, this representation was extended to capture properties of two domains simultaneously. This is possible if we assume that the structural equations share the same set of arguments, though the functional forms of the equations may vary arbitrarily [11]. 6 Definition 2 (Selection Diagram). Let M, M ∗ be a pair of structural causal models [9, pp. 205] relative to domains π, π ∗ , sharing a causal diagram G. M, M ∗ is said to induce a selection diagram D if D is constructed as follows: 1. Every edge in G is also an edge in D; 2. D contains an extra edge Si → Vi whenever there might exist a discrepancy fi = fi∗ or P (Ui ) = P ∗ (Ui ) between M and M ∗ . In words, the S-variables locate the mechanisms where structural discrepancies between the two domains are suspected to take place.7 Alternatively, the absence of a selection node pointing to a variable represents the assumption that the mechanism responsible for assigning value to that variable is identical in both domains. 5 The results presented here are also valid in other formalisms for causality based on potential outcomes. As discussed in the reference, the assumption of no structural changes between domains can be relaxed, but some structural assumptions regarding the discrepancies between domains must still hold. 7 Transportability assumes that enough structural knowledge about both domains is known in order to substantiate the production of their respective causal diagrams. In the absence of such knowledge, causal discovery algorithms might be used to infer the diagrams from data [8, 9]. 6 4 Armed with the concept of identifiability and selection diagrams, mz-transportability of causal effects can be defined as follows: Definition 3 (mz-Transportability). Let D = {D(1) , ..., D(n) } be a collection of selection diagrams relative to source domains Π = {π1 , ..., πn }, and target domain π ∗ , respectively, and Zi (and Z∗ ) i be the variables in which experiments can be conducted in domain πi (and π ∗ ). Let P i , Iz be i i the pair of observational and interventional distributions of πi , where Iz = Z ⊆Zi P (v|do(z )), ∗ and in an analogous manner, P ∗ , Iz be the observational and interventional distributions of π ∗ . ∗ ∗ The causal effect R = Px (y|w) is said to be mz-transportable from Π to π ∗ in D if Px (y|w) is i ∗ uniquely computable from i=1,...,n P i , Iz ∪ P ∗ , Iz in any model that induces D. ∗ i The requirement that R is uniquely computable from P ∗ , Iz and P i , Iz from all sources has a syntactic image in the causal calculus, which is captured by the following sufficient condition. Theorem 1. Let D = {D(1) , ..., D(n) } be a collection of selection diagrams relative to source domains Π = {π1 , ..., πn }, and target domain π ∗ , respectively, and Si represents the collection of i ∗ S-variables in the selection diagram D(i) . Let { P i , Iz } and P ∗ , Iz be respectively the pairs of observational and interventional distributions in the sources Π and target π ∗ . The relation R = P ∗ (y|do(x), w) is mz-transportable from Π to π ∗ in D if the expression P (y|do(x), w, S1 , ..., Sn ) is reducible, using the rules of the causal calculus, to an expression in which (1) do-operators that ∗ i apply to subsets of Iz have no Si -variables or (2) do-operators apply only to subsets of Iz . This result provides a powerful way to syntactically establish mz-transportability, but it is not immediately obvious whether a sequence of applications of the rules of causal calculus that achieves the reduction required by the theorem exists, and even if such sequence exists, it is not obvious how to obtain it. For concreteness, we illustrate this result using the selection diagrams in Fig. 2(a,b). Corollary 1. P ∗ (y|do(x)) is mz-transportable in Fig. 2(a,b) with Za = {Z2 } and Zb = {Z1 }. Proof. The goal is to show that R = P ∗ (y|do(x)) is mz-transportable from {πa , πb } to π ∗ using experiments conducted over {Z2 } in πa and {Z1 } in πb . Note that naively trying to transport R from each of the domains individually is not possible, but R can be decomposed as follows: P ∗ (y|do(x)) = P ∗ (y|do(x), do(Z1 )) (2) P ∗ (y|do(x), do(Z1 ), z2 )P ∗ (z2 |do(x), do(Z1 )) (3) P ∗ (y|do(x), do(Z1 ), do(z2 ))P ∗ (z2 |do(x), do(Z1 )), = (4) z2 = z2 where Eq. (2) follows by rule 3 of the causal calculus since (Z1 ⊥ Y |X)DX,Z holds, we con⊥ 1 dition on Z2 in Eq. (3), and Eq. (4) follows by rule 2 of the causal calculus since (Z2 ⊥ ⊥ Y |X, Z1 )DX,Z ,Z , where D is the diagram in π ∗ (despite the location of the S-nodes). 1 2 Now we can rewrite the first term of Eq. (4) as indicated by the Theorem (and suggested by Def. 2): P ∗ (y|do(x), do(Z1 ), do(z2 )) = P (y|do(x), do(Z1 ), do(z2 ), Sa , Sb ) (5) = P (y|do(x), do(Z1 ), do(z2 ), Sb ) (6) = P (y|do(z2 ), Sb ) (7) = P (a) (y|do(z2 )), (8) where Eq. (5) follows from the theorem (and the definition of selection diagram), Eq. (6) follows , Eq. (7) follows from rule from rule 1 of the causal calculus since (Sa ⊥ Y |Z1 , Z2 , X)D(a) ⊥ Z1 ,Z2 ,X 3 of the causal calculus since (Z1 , X ⊥ Y |Z2 )D(a) ⊥ . Note that this equation matches with the Z1 ,Z2 ,X a syntactic goal of Theorem 1 since we have precisely do(z2 ) separated from Sa (and Z2 ∈ Iz ); so, we can rewrite the expression which results in Eq. (8) by the definition of selection diagram. Finally, we can rewrite the second term of Eq. (4) as follows: P ∗ (z2 |do(x), do(Z1 )) = P (z2 |do(x), do(Z1 ), Sa , Sb ) = P (z2 |do(x), do(Z1 ), Sa ) = P (z2 |x, do(Z1 ), Sa ) = P (b) (z2 |x, do(Z1 )), 5 (9) (10) (11) (12) where Eq. (9) follows from the theorem (and the definition of selection diagram), Eq. (10) follows from rule 1 of the causal calculus since (Sb ⊥ Z2 |Z1 , X)D(b) , Eq. (11) follows from rule 2 of ⊥ Z1 ,X the causal calculus since (X ⊥ Z2 |Z1 )D(b) . Note that this equation matches the condition of the ⊥ Z1 X theorem, separate do(Z1 ) from Sb (i.e., experiments over Z1 can be used since they are available in πb ), so we can rewrite Eq. (12) using the definition of selection diagrams, the corollary follows. The next condition for mz-transportability is more visible than Theorem 1 (albeit weaker), which also demonstrates the challenge of relating mz-transportability to other types of transportability. Corollary 2. R = P ∗ (y|do(x)) is mz-transportable in D if there exists Zi ⊆ Zi such that all paths from Zi to Y are blocked by X, (Si ⊥ Y|X, Zi )D(i) , and R is computable from do(Zi ). ⊥ X,Z i Remarkably, randomizing Z2 when applying Corollary 1 was instrumental to yield transportability in the previous example, despite the fact that the directed paths from Z2 to Y were not blocked by X, which suggests how different this transportability is from z-identifiability. So, it is not immediately obvious how to combine the topological relations of Zi ’s with X and Y in order to create a general condition for mz-transportability, the relationship between the distributions in the different domains can get relatively intricate, but we defer this discussion for now and consider a simpler case. It is not usually trivial to pursue a derivation of mz-transportability in causal calculus, and next we show an example in which such derivation does not even exist. Consider again the diagrams in Fig. 2(a,b), and assume that randomized experiments are available over {Z1 } in πa and {Z2 } in πb . Theorem 2. P ∗ (y|do(x)) is not mz-transportable in Fig. 2(a,b) with Za = {Z1 } and Zb = {Z2 }. Proof. Formally, we need to display two models M1 , M2 such that the following relations hold (as implied by Def. 3): (a) (a) PM1 (Z1 , X, Z2 , Y ) = PM2 (Z1 , X, Z2 , Y ), (b) P (Z1 , X, Z2 , Y ) = P (b) (Z1 , X, Z2 , Y ), M1 M2 (a) (a) (13) PM1 (X, Z2 , Y |do(Z1 )) = PM2 (X, Z2 , Y |do(Z1 )), (b) P (Z , X, Y |do(Z )) = P (b) (Z , X, Y |do(Z )), M 2 1 2 M2 ∗1 1 ∗ PM1 (Z1 , X, Z2 , Y ) = PM2 (Z1 , X, Z2 , Y ), for all values of Z1 , X, Z2 , and Y , and also, ∗ ∗ PM1 (Y |do(X)) = PM2 (Y |do(X)), (14) for some value of X and Y . Let V be the set of observable variables and U be the set of unobservable variables in D. Let us assume that all variables in U ∪ V are binary. Let U1 , U2 ∈ U be the common causes of Z1 and X and Z2 , respectively; let U3 , U4 , U5 ∈ U be a random disturbance exclusive to Z1 , Z2 , and Y , respectively, and U6 ∈ U be an extra random disturbance exclusive to Z2 , and U7 , U8 ∈ U to Y . Let Sa and Sb index the model in the following way: the tuples Sa = 1, Sb = 0 , Sa = 0, Sb = 1 , Sa = 0, Sb = 0 represent domains πa , πb , and π ∗ , respectively. Define the two models as follows: Z1 = U1 ⊕ U2 ⊕ U3 ∧ Sa Z1 = U1 ⊕ U2 ⊕ U3 ∧ Sa X=U X = Z1 ⊕ U1 1 M1 = M2 = Z2 = (X ⊕ U2 ⊕ (U4 ∧ Sa )) ∨ U6 Z2 = U4 ∧ Sa ∧ U6 ⊕ U6 Y = (Z2 ∧ U5 ) ⊕ (U5 ∧ U7 ) ⊕ (Sb ∧ U8 ) Y = (Z2 ∧ U5 ) ⊕ (U5 ∧ U7 ) ⊕ (Sb ∧ U8 ) where ⊕ represents the exclusive or function. Both models agree in respect to P (U), which is defined as P (Ui ) = 1/2, i = 1, ..., 8. It is not difficult to evaluate these models and note that the constraints given in Eqs. (13) and (14) are satisfied (including positivity), the theorem follows. 4 Algorithm for computing mz-transportability In this section, we build on previous analyses of identifiability [7, 21, 22, 23] in order to obtain a mechanic procedure in which a collection of selection diagrams and experimental data is inputted, and the procedure returns a transport formula whenever it is able to produce one. More specifically, 6 PROCEDURE TRmz (y, x, P, I, S, W, D) INPUT: x, y: value assignments; P: local distribution relative to domain S (S = 0 indexes π ∗ ) and active experiments I; W: weighting scheme; D: backbone of selection diagram; Si : selection nodes in πi (S0 = ∅ (i) relative to π ∗ ); [The following set and distributions are globally defined: Zi , P ∗ , PZi .] (i) ∗ ∗ OUTPUT: Px (y) in terms of P ∗ , PZ , PZi or F AIL(D, C0 ). P 1 if x = ∅, return V\Y P. P 2 if V \ An(Y)D = ∅, return TRmz (y, x ∩ An(Y)D , V\An(Y)D P, I, S, W, DAn(Y) ). 3 set W = (V \ X) \ An(Y)DX . if W = ∅, return TRmz (y, x ∪ w, P, I, S, W, D). Q P 4 if C(D \ X) = {C0 , C1 , ..., Ck }, return V\{Y,X} i TRmz (ci , v \ ci , P, I, S, W, D). 5 if C(D \ X) = {C0 }, 6 if C(D) = {D}, Q P P 7 if C0 ∈ C(D), return i|Vi ∈C0 V\V (i) P/ V\V (i−1) P. D 8 9 10 11 12 D if (∃C )C0 ⊂ C ∈ C(D), (i−1) for {i|Vi ∈ C }, set κi = κi ∪ vD \C . Q (i−1) mz return TR (y, x ∩ C , i|Vi ∈C P(Vi |VD ∩ C , κi ), I, S, W, C ). else, if I`= ∅, for i = 0, ..., |D|, ´ if (Si ⊥ Y | X)D(i) ∧ (Zi ∩ X = ∅) , Ei = TRmz (y, x \ zi , P, Zi ∩ X, i, W, D \ {Zi ∩ X}). ⊥ PX (j) if |E| > 0, return |E| wi Ei . i=1 else, FAIL(D, C0 ). Figure 3: Modified version of identification algorithm capable of recognizing mz-transportability. our algorithm is called TRmz (see Fig. 3), and is based on the C-component decomposition for identification of causal effects [22, 23] (and a version of the identification algorithm called ID). The rationale behind TRmz is to apply Tian’s factorization and decompose the target relation into smaller, more manageable sub-expressions, and then try to evaluate whether each sub-expression can be computed in the target domain. Whenever this evaluation fails, TRmz tries to use the experiments available from the target and, if possible, from the sources; this essentially implements the declarative condition delineated in Theorem 1. Next, we consider the soundness of the algorithm. ∗ Theorem 3 (soundness). Whenever TRmz returns an expression for Px (y), it is correct. In the sequel, we demonstrate how the algorithm works through the mz-transportability of Q = P ∗ (y|do(x)) in Fig. 2(c,d) with Z∗ = {Z1 }, Za = {Z2 }, and Zb = {Z1 }. Since (V \ X) \ An(Y)DX = {Z2 }, TRmz invokes line 3 with {Z2 } ∪ {X} as interventional set. The new call triggers line 4 and C(D \ {X, Z2 }) = {C0 , C1 , C2 , C3 }, where C0 = DZ1 , C1 = DZ3 , C2 = DU , and C3 = DW,Y , we invoke line 4 and try to mz-transport individ∗ ∗ ∗ ually Q0 = Px,z2 ,z3 ,u,w,y (z1 ), Q1 = Px,z1 ,z2 ,u,w,y (z3 ), Q2 = Px,z1 ,z2 ,z3 ,w,y (u), and Q3 = ∗ Px,z1 ,z2 ,z3 ,u (w, y). Thus the original problem reduces to try to evaluate the equivalent expression ∗ ∗ ∗ ∗ z1 ,z3 ,u,w Px,z2 ,z3 ,u,w,y (z1 )Px,z1 ,z2 ,u,w,y (z3 ) Px,z1 ,z2 ,z3 ,w,y (u)Px,z1 ,z2 ,z3 ,u (w, y). First, TRmz evaluates the expression Q0 and triggers line 2, noting that all nodes can be ignored ∗ since they are not ancestors of {Z1 }, which implies after line 1 that Px,z2 ,z3 ,u,w,y (z1 ) = P ∗ (z1 ). Second, TRmz evaluates the expression Q1 triggering line 2, which implies that ∗ ∗ Px,z1 ,z2 ,u,w,y (z3 ) = Px,z1 ,z2 (z3 ) with induced subgraph D1 = DX,Z1 ,Z2 ,Z3 . TRmz goes to line 5, in which in the local call C(D \ {X, Z1 , Z2 }) = {DZ3 }. Thus it proceeds to line 6 testing whether C(D \ {X, Z1 , Z2 }) is different from D1 , which is false. In this call, ordinary identifiability would fail, but TRmz proceeds to line 9. The goal of this line is to test whether some experiment can help for computing Q1 . In this case, πa fails immediately the test in line 10, but πb and π ∗ succeed, (i) which means experiments in these domains may eventually help; the new call is Px,z2 (z3 )D\Z1 , for mz i = {b, ∗} with induced graph D1 = DX,Z2 ,Z3 . Finally, TR triggers line 8 since X is not part of Z3 ’s components in D1 (or, Z3 ∈ C = {Z2 Z3 }), so line 2 is triggered since Z2 is no longer an ancestor of Z3 in D1 , and then line 1 is triggered since the interventional set is empty in (i) (i) ∗ this local call, so Px,z1 ,z2 (z3 ) = Z Pz1 (z3 |x, Z2 )Pz1 (Z2 ), for i = {b, ∗}. 2 7 ∗ Third, evaluating the expression Q2 , TRmz goes to line 2, which implies that Px,z1 ,z2 ,z3 ,w,y (u) = ∗ mz Px,z1 ,z2 ,z3 ,w (u) with induced subgraph D2 = DX,Z1 ,Z2 ,Z3 ,W,U . TR goes to line 5, and in this local call C(D \ {X, Z1 , Z2 , Z3 , W }) = {DU }, and the test in 6 succeed, since there are more components in D. So, it triggers line 8 since W is not part of U ’s component ∗ ∗ in D2 . The algorithm makes Px,z1 ,z2 ,z3 ,w (u) = Px,z1 ,z2 ,z3 (u)D2 |W (and update the working distribution); note that in this call, ordinary identifiability would fail since the nodes are in the same C-component and the test in line 6 fails. But TRmz proceeds to line 9 trying to find experiments that can help in Q2 ’s computation. In this case, πb cannot help but πa (a) and π ∗ perhaps can, noting that new calls are launched for computing Px,z1 ,z3 (u)D2 \Z2 |W rel∗ ative to πa , and Px,z2 ,z3 (u)D2 \Z1 |W relative to π ∗ with the corresponding data structures set. (a) (a) In πa , the algorithm triggers line 7, which yields Px,z1 ,z3 (u)D2 \Z2 |W = Pz2 (u|w, z3 , x, z1 ), ∗ and a bit more involved analysis for πb yields (after simplification) Px,z2 ,z3 (u)D2 \Z1 |W = ∗ ∗ ∗ ∗ ∗ Z Pz1 (z3 |x, Z2 )Pz1 (Z2 ) . Z Pz1 (u|w, z3 , x, Z2 )Pz1 (z3 |x, Z2 )Pz1 (Z2 ) / 2 2 mz Fourth, TR evaluates the expression Q3 and triggers line 5, C(D\{X, Z1 , Z2 , Z3 , U }) = DW,Y . ∗ In turn, both tests at lines 6 and 7 succeed, which makes the procedure to return Px,z1 ,z2 ,z3 ,u (w, y) = ∗ ∗ P (w|z3 , x, z1 , z2 )P (y|w, x, z1 , z2 , z3 , u). The composition of the return of these calls generates the following expression: ! ∗ Px (y) = X ∗ P (z1 ) z1 ,z3 ,w,u (2) w1 (1) w1 X ∗ ∗ Pz1 (z3 |x, Z2 )Pz1 (Z2 ) X (b) (b) Pz1 (z3 |x, Z2 )Pz1 (Z2 ) Z2 Z2 „X + (1) w2 « « „X ∗ ∗ ∗ ∗ ∗ Pz1 (z3 |x, Z2 )Pz1 (Z2 ) Pz1 (u|w, z3 , x, Z2 )Pz1 (z3 |x, Z2 )Pz1 (Z2 ) / Z2 Z2 ! (2) (a) + w2 Pz2 (u|w, z3 , x, z1 ) P ∗ (w|x, z1 , z2 , z3 ) P ∗ (y|x, z1 , z2 , z3 , w, u) (15) (k) where wi represents the weight for each factor in estimand k (i = 1, ..., nk ), and nk is the number of feasible estimands of k. Eq. (15) depicts a powerful way to estimate P ∗ (y|do(x)) in the target domain, and depending on weighting choice a different estimand will be entailed. For instance, one might use an analogous to inverse-variance weighting, which sets the weights for the normalized (k) nk −2 −2 2 inverse of their variances (i.e., wi = σi / j=1 σj , where σj is the variance of the jth component of estimand k). Our strategy resembles the approach taken in meta-analysis [4], albeit the latter usually disregards the intricacies of the relationships between variables, so producing a statistically less powerful estimand. Our method leverages this non-trivial and highly structured relationships, as exemplified in Eq. (15), which yields an estimand with less variance and statistically more powerful. 5 Conclusions In this paper, we treat a special type of transportability in which experiments can be conducted only over limited sets of variables in the sources and target domains, and the goal is to infer whether a certain effect can be estimated in the target using the information scattered throughout the domains. We provide a general sufficient graphical conditions for transportability based on the causal calculus along with a necessary condition for a specific scenario, which should be generalized for arbitrary structures. We further provide a procedure for computing transportability, that is, generate a formula for fusing the available observational and experimental data to synthesize an estimate of the desired causal effects. Our algorithm also allows for generic weighting schemes, which generalizes standard statistical procedures and leads to the construction of statistically more powerful estimands. Acknowledgment The work of Judea Pearl and Elias Bareinboim was supported in part by grants from NSF (IIS1249822, IIS-1302448), and ONR (N00014-13-1-0153, N00014-10-1-0933). The work of Sanghack Lee and Vasant Honavar was partially completed while they were with the Department of Computer Science at Iowa State University. The work of Vasant Honavar while working at the National Science Foundation (NSF) was supported by the NSF. The work of Sanghack Lee was supported in part by the grant from NSF (IIS-0711356). Any opinions, findings, and conclusions contained in this article are those of the authors and do not necessarily reflect the views of the sponsors. 8 References [1] J. Pearl and E. Bareinboim. Transportability of causal and statistical relations: A formal approach. In W. Burgard and D. Roth, editors, Proceedings of the Twenty-Fifth National Conference on Artificial Intelligence, pages 247–254. AAAI Press, Menlo Park, CA, 2011. [2] D. Campbell and J. Stanley. Experimental and Quasi-Experimental Designs for Research. Wadsworth Publishing, Chicago, 1963. [3] C. Manski. Identification for Prediction and Decision. Harvard University Press, Cambridge, Massachusetts, 2007. [4] L. V. Hedges and I. Olkin. Statistical Methods for Meta-Analysis. Academic Press, January 1985. [5] W.R. Shadish, T.D. Cook, and D.T. Campbell. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton-Mifflin, Boston, second edition, 2002. [6] S. Morgan and C. Winship. Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research). Cambridge University Press, New York, NY, 2007. [7] J. Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669–710, 1995. [8] P. Spirtes, C.N. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, Cambridge, MA, 2nd edition, 2000. [9] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2000. 2nd edition, 2009. [10] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009. [11] E. Bareinboim and J. Pearl. Transportability of causal effects: Completeness results. In J. Hoffmann and B. Selman, editors, Proceedings of the Twenty-Sixth National Conference on Artificial Intelligence, pages 698–704. AAAI Press, Menlo Park, CA, 2012. [12] E. Bareinboim and J. Pearl. Causal transportability with limited experiments. In M. desJardins and M. Littman, editors, Proceedings of the Twenty-Seventh National Conference on Artificial Intelligence, pages 95–101, Menlo Park, CA, 2013. AAAI Press. [13] S. Lee and V. Honavar. Causal transportability of experiments on controllable subsets of variables: ztransportability. In A. Nicholson and P. Smyth, editors, Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI), pages 361–370. AUAI Press, 2013. [14] E. Bareinboim and J. Pearl. Meta-transportability of causal effects: A formal approach. In C. Carvalho and P. Ravikumar, editors, Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (AISTATS), pages 135–143. JMLR W&CP; 31, 2013. [15] S. Lee and V. Honavar. m-transportability: Transportability of a causal effect from multiple environments. In M. desJardins and M. Littman, editors, Proceedings of the Twenty-Seventh National Conference on Artificial Intelligence, pages 583–590, Menlo Park, CA, 2013. AAAI Press. [16] H. Daume III and D. Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26:101–126, 2006. [17] A.J. Storkey. When training and test sets are different: characterising learning transfer. In J. Candela, M. Sugiyama, A. Schwaighofer, and N.D. Lawrence, editors, Dataset Shift in Machine Learning, pages 3–28. MIT Press, Cambridge, MA, 2009. [18] B. Sch¨ lkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij. On causal and anticausal o learning. In J Langford and J Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML), pages 1255–1262, New York, NY, USA, 2012. Omnipress. [19] K. Zhang, B. Sch¨ lkopf, K. Muandet, and Z. Wang. Domain adaptation under target and conditional o shift. In Proceedings of the 30th International Conference on Machine Learning (ICML). JMLR: W&CP; volume 28, 2013. [20] E. Bareinboim and J. Pearl. Causal inference by surrogate experiments: z-identifiability. In N. Freitas and K. Murphy, editors, Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI), pages 113–120. AUAI Press, 2012. [21] M. Kuroki and M. Miyakawa. Identifiability criteria for causal effects of joint interventions. Journal of the Royal Statistical Society, 29:105–117, 1999. [22] J. Tian and J. Pearl. A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 567–573. AAAI Press/The MIT Press, Menlo Park, CA, 2002. [23] I. Shpitser and J. Pearl. Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First National Conference on Artificial Intelligence, pages 1219–1226. AAAI Press, Menlo Park, CA, 2006. 9
2 0.62793314 358 nips-2013-q-OCSVM: A q-Quantile Estimator for High-Dimensional Distributions
Author: Assaf Glazer, Michael Lindenbaum, Shaul Markovitch
Abstract: In this paper we introduce a novel method that can efficiently estimate a family of hierarchical dense sets in high-dimensional distributions. Our method can be regarded as a natural extension of the one-class SVM (OCSVM) algorithm that finds multiple parallel separating hyperplanes in a reproducing kernel Hilbert space. We call our method q-OCSVM, as it can be used to estimate q quantiles of a highdimensional distribution. For this purpose, we introduce a new global convex optimization program that finds all estimated sets at once and show that it can be solved efficiently. We prove the correctness of our method and present empirical results that demonstrate its superiority over existing methods. 1
3 0.53609771 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis
Author: Nikhil Rao, Christopher Cox, Rob Nowak, Timothy T. Rogers
Abstract: Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available features can be organized into subsets according to a notion of similarity and (2) features useful in one task are similar, but not necessarily identical, to the features best suited for other tasks. The main contribution of this paper is a new procedure called Sparse Overlapping Sets (SOS) lasso, a convex optimization that automatically selects similar features for related learning tasks. Error bounds are derived for SOSlasso and its consistency is established for squared error loss. In particular, SOSlasso is motivated by multisubject fMRI studies in which functional activity is classified using brain voxels as features. Experiments with real and synthetic data demonstrate the advantages of SOSlasso compared to the lasso and group lasso. 1
4 0.53530121 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
Author: David Carlson, Vinayak Rao, Joshua T. Vogelstein, Lawrence Carin
Abstract: With simultaneous measurements from ever increasing populations of neurons, there is a growing need for sophisticated tools to recover signals from individual neurons. In electrophysiology experiments, this classically proceeds in a two-step process: (i) threshold the waveforms to detect putative spikes and (ii) cluster the waveforms into single units (neurons). We extend previous Bayesian nonparametric models of neural spiking to jointly detect and cluster neurons using a Gamma process model. Importantly, we develop an online approximate inference scheme enabling real-time analysis, with performance exceeding the previous state-of-theart. Via exploratory data analysis—using data with partial ground truth as well as two novel data sets—we find several features of our model collectively contribute to our improved performance including: (i) accounting for colored noise, (ii) detecting overlapping spikes, (iii) tracking waveform dynamics, and (iv) using multiple channels. We hope to enable novel experiments simultaneously measuring many thousands of neurons and possibly adapting stimuli dynamically to probe ever deeper into the mysteries of the brain. 1
5 0.5325678 121 nips-2013-Firing rate predictions in optimal balanced networks
Author: David G. Barrett, Sophie Denève, Christian K. Machens
Abstract: How are firing rates in a spiking network related to neural input, connectivity and network function? This is an important problem because firing rates are a key measure of network activity, in both the study of neural computation and neural network dynamics. However, it is a difficult problem, because the spiking mechanism of individual neurons is highly non-linear, and these individual neurons interact strongly through connectivity. We develop a new technique for calculating firing rates in optimal balanced networks. These are particularly interesting networks because they provide an optimal spike-based signal representation while producing cortex-like spiking activity through a dynamic balance of excitation and inhibition. We can calculate firing rates by treating balanced network dynamics as an algorithm for optimising signal representation. We identify this algorithm and then calculate firing rates by finding the solution to the algorithm. Our firing rate calculation relates network firing rates directly to network input, connectivity and function. This allows us to explain the function and underlying mechanism of tuning curves in a variety of systems. 1
6 0.53218967 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries
7 0.53175384 69 nips-2013-Context-sensitive active sensing in humans
8 0.53161937 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
9 0.53142905 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents
10 0.5309726 79 nips-2013-DESPOT: Online POMDP Planning with Regularization
11 0.52991301 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction
12 0.52973276 99 nips-2013-Dropout Training as Adaptive Regularization
13 0.52894032 5 nips-2013-A Deep Architecture for Matching Short Texts
15 0.52807724 70 nips-2013-Contrastive Learning Using Spectral Methods
16 0.52745873 249 nips-2013-Polar Operators for Structured Sparse Estimation
17 0.52712184 221 nips-2013-On the Expressive Power of Restricted Boltzmann Machines
18 0.5270502 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
19 0.52675712 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models
20 0.52586943 280 nips-2013-Robust Data-Driven Dynamic Programming