jmlr jmlr2012 jmlr2012-56 knowledge-graph by maker-knowledge-mining

56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables


Source: pdf

Author: Antti Hyttinen, Frederick Eberhardt, Patrik O. Hoyer

Abstract: Identifying cause-effect relationships between variables of interest is a central problem in science. Given a set of experiments we describe a procedure that identifies linear models that may contain cycles and latent variables. We provide a detailed description of the model family, full proofs of the necessary and sufficient conditions for identifiability, a search algorithm that is complete, and a discussion of what can be done when the identifiability conditions are not satisfied. The algorithm is comprehensively tested in simulations, comparing it to competing algorithms in the literature. Furthermore, we adapt the procedure to the problem of cellular network inference, applying it to the biologically realistic data of the DREAM challenges. The paper provides a full theoretical foundation for the causal discovery procedure first presented by Eberhardt et al. (2010) and Hyttinen et al. (2010). Keywords: causality, graphical models, randomized experiments, structural equation models, latent variables, latent confounders, cycles

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 For instance, some causal discovery methods require assuming that the causal structure is acyclic (has no directed cycles), while others require causal sufficiency, that is, that there are no unmeasured common causes affecting the measured variables. [sent-39, score-0.475]

2 While for certain kinds of experimental data it is easy to identify the full causal structure, we show that significant savings either in the number of experiments or in the number of randomized variables per experiment can be achieved. [sent-69, score-0.412]

3 The value of each variable x j ( j = 1, . . . , n) is determined by a linear combination of the values of its causal parents xi ∈ pa(x j ) and an additive disturbance (‘noise’) term e j : x j := ∑xi ∈pa(x j ) b ji xi + e j . [sent-102, score-0.339]

4 Representing all the observed variables as a vector x and the corresponding disturbances as a vector e, these structural equations can be represented by a single matrix equation x := Bx + e, (1) where B is the (n × n)-matrix of coefficients b ji . [sent-103, score-0.35]
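A minimal sketch of the equilibrium implied by x := Bx + e, namely x = (I − B)⁻¹ e. The coefficient matrix below is illustrative (not from the paper), chosen only so that the model is stable:

```python
import numpy as np

# Hypothetical 4-variable cyclic model; the coefficients b_ji are
# illustrative and include the cycle x2 <-> x3.
B = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.3, 0.0, 0.2, 0.0],   # x2 := 0.3*x1 + 0.2*x3 + e2
              [0.0, 0.4, 0.0, 0.0],   # x3 := 0.4*x2 + e3
              [0.0, 0.0, 0.5, 0.0]])  # x4 := 0.5*x3 + e4
e = np.array([1.0, 0.5, -0.2, 0.3])   # one sample of the disturbance vector

# Equilibrium of x := Bx + e, assuming I - B is invertible:
x = np.linalg.solve(np.eye(4) - B, e)
assert np.allclose(x, B @ x + e)      # x satisfies the structural equations
```

Because the model is cyclic, x cannot be computed by a single topological pass; the equilibrium is the fixed point of the equation system, obtained here by one linear solve.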

5 A graphical representation of such a causal model is given by representing any non-zero causal effect b ji by an edge xi → x j in the corresponding graph. [sent-104, score-0.406]

6 Typically, a cyclic model is used to represent a causal process that is collapsed over the time dimension and where it is assumed that the data sample is taken after the causal process has ‘settled down’. [sent-122, score-0.396]

7 , 2000; Pearl, 2000), we consider in this paper randomized “surgical” interventions that break all incoming causal influences to the intervened variables by setting the intervened variables to values determined by an exogenous intervention distribution with mean µk and covariance cov(c) = Σk . [sent-162, score-0.741]

8 (4) For an intervened variable x j ∈ Jk , the manipulated model in Equation 4 replaces the original equation x j := ∑i∈pa( j) b ji xi + e j with the equation x j := c j , while the equations for passively observed variables xu ∈ Uk remain unchanged. [sent-172, score-0.755]

9 Definition 3 (Asymptotic Stability) A linear cyclic model with latent variables (B, Σe ) is asymptotically stable if and only if for every possible experiment Ek = (Jk , Uk ), the eigenvalues λi of the matrix Uk B satisfy ∀i : |λi | < 1. [sent-179, score-0.41]
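Definition 3 can be checked mechanically by enumerating every experiment. A hedged sketch, using an illustrative coefficient matrix (the helper name `is_asymptotically_stable` is ours, not the paper's):

```python
import numpy as np
from itertools import combinations

def is_asymptotically_stable(B):
    """Check |lambda_i| < 1 for the eigenvalues of U_k B over all
    experiments E_k = (J_k, U_k); U_k zeroes the rows of intervened variables."""
    n = B.shape[0]
    for k in range(n + 1):
        for J in combinations(range(n), k):
            U = np.eye(n)
            for j in J:
                U[j, j] = 0.0
            if np.max(np.abs(np.linalg.eigvals(U @ B))) >= 1.0:
                return False
    return True

B = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.3, 0.0, 0.2, 0.0],   # cycle x2 <-> x3 with product 0.08
              [0.0, 0.4, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.0]])
print(is_asymptotically_stable(B))    # True
```

Intervening can only remove feedback here, so the binding case is the passive experiment (J_k = ∅), where the cycle x2 ↔ x3 contributes eigenvalues ±√0.08 ≈ ±0.28.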

10 This notational simplification makes the partition into intervened and passively observed variables the only parameter specifying an experiment, and allows us to derive the theory purely in terms of the covariance matrices Ck of an experiment. [sent-189, score-0.358]

11 We can now focus on analyzing the covariance matrix obtained from a canonical experiment Ek = (Jk , Uk ) on a canonical model (B, Σe ). [sent-205, score-0.33]

12 The upper left hand block is the identity matrix I, since in a canonical experiment the intervened variables are randomized independently with unit variance. [sent-208, score-0.384]

13 The lower left hand block Tk consists of covariances that represent the so-called experimental effects of the intervened xi ∈ Jk on the passively observed xu ∈ Uk . [sent-211, score-0.726]

14 An experimental effect t(xi xu ||Jk ) is the overall causal effect of a variable xi on a variable xu in the experiment Ek = (Jk , Uk ); it corresponds to the coefficient of xi when xu is regressed on the set of intervened variables in this experiment. [sent-212, score-1.337]
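The regression reading of an experimental effect can be demonstrated by simulation. A sketch, assuming a made-up 3-variable model and canonical (unit-variance, independent) interventions:

```python
import numpy as np

rng = np.random.default_rng(0)
B = np.array([[0.0, 0.0, 0.0],
              [0.3, 0.0, 0.0],   # x1 -> x2
              [0.2, 0.4, 0.0]])  # x1 -> x3 <- x2   (illustrative values)
n, N = 3, 100000
U = np.diag([0.0, 1.0, 1.0])     # experiment: x1 intervened, x2/x3 observed
Jm = np.eye(n) - U
c = rng.standard_normal((N, n))  # intervention values (unit variance)
e = rng.standard_normal((N, n))  # disturbances
A = np.linalg.inv(np.eye(n) - U @ B)
X = (e @ U + c @ Jm) @ A.T       # manipulated model at equilibrium

# Regress x3 on the intervened set {x1}: the coefficient is the
# experimental effect t(x1 ~> x3 || {x1}) = 0.2 + 0.3*0.4 = 0.32.
coef = np.linalg.lstsq(X[:, [0]], X[:, 2], rcond=None)[0][0]
assert np.isclose(coef, A[2, 0], atol=0.02)
```

Here A = (I − U B)⁻¹ plays the role of the path-sum: its (u, i) entry is the experimental effect of x_i on x_u in this experiment.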

15 If only variable xi is intervened on in the experiment, then the experimental effect t(xi xu ||{xi }) is standardly called the total effect and denoted simply as t(xi xu ). [sent-213, score-0.744]

16 If all observed variables except for xu are intervened on, then an experimental effect is called a direct effect: t(xi xu ||V \ {xu }) = b(xi → xu ) = (B)ui = bui . [sent-214, score-1.072]
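The identity t(xi xu ||V \ {xu }) = bui can be confirmed in a small sketch (illustrative coefficients, including a cycle x1 ↔ x2):

```python
import numpy as np

B = np.array([[0.0, 0.2, 0.0],
              [0.3, 0.0, 0.0],   # cycle x1 <-> x2
              [0.1, 0.4, 0.0]])
n, u = 3, 2                       # passively observe only x3
U = np.zeros((n, n)); U[u, u] = 1.0

# A[u, i] = t(x_i ~> x_u || V \ {x_u}); with everything else intervened,
# every path collapses to the single direct edge x_i -> x_u.
A = np.linalg.inv(np.eye(n) - U @ B)
others = [i for i in range(n) if i != u]
assert np.allclose(A[u, others], B[u, others])   # equals b_u1, b_u2
```

Intervening on all variables but x_u cuts every indirect path, which is why the experimental effects degenerate to the direct effects.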

17 In our case, these trek rules imply that the experimental effect t(xi xu ||Jk ) can be expressed as the sum of contributions by all directed paths starting at xi and ending in xu in the manipulated graph, denoted by the set P (xi xu ||Jk ). [sent-217, score-0.934]

18 Any directed paths through marginalized variables are transformed into directed edges in B̃, and any confounding effect of the marginalized variables is integrated into the covariance matrix Σ̃e of the disturbances. [sent-248, score-0.479]

19 We term this assumption weak stability: Definition 6 (Weak Stability) A linear cyclic causal model with latent variables (B, Σe ) is weakly stable if and only if for every experiment Ek = (Jk , Uk ), the matrix I − Uk B is invertible. [sent-267, score-0.585]

20 The interpretation of any learned weakly stable model (B, Σe ) is then only that the distribution over the observed variables produced at equilibrium by the true underlying asymptotically stable model has mean and covariance as described by Equations 7 and 8. [sent-276, score-0.47]

21 4 In the following two Lemmas, we give the details of how the canonical model over the observed variables is related to the original linear cyclic model in the case of hidden variables and self-cycles (respectively). [sent-278, score-0.329]

22 The marginalized model (B̃, Σ̃e ) over the variables Ṽ = V \ M, defined by B̃ = BṼ Ṽ + BṼ M (I − BM M )−1 BM Ṽ and Σ̃e = (I − B̃) [(I − B)−1 Σe (I − B)−T ]Ṽ Ṽ (I − B̃)T , is also a weakly stable linear cyclic causal model with latent variables. [sent-286, score-0.488]

23 The marginalized covariance matrix of the original model and the covariance matrix of the marginalized model are equal in any experiments where any subset of the variables in Ṽ are intervened on. [sent-287, score-0.626]

24 First, the coefficient matrix B̃ of the marginalized model is given by the existing coefficients between the variables in Ṽ in the original model, plus any paths in the original model from variables in Ṽ through variables in M and back to variables in Ṽ . [sent-289, score-0.417]
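The marginalization formulas can be exercised numerically. A sketch with an illustrative 4-variable model in which x4 is hidden (M = {x4}); the check compares covariances of the full and marginalized models in a canonical experiment intervening on x2:

```python
import numpy as np

B = np.array([[0.0, 0.0, 0.0, 0.2],   # x4 -> x1
              [0.3, 0.0, 0.0, 0.0],   # x1 -> x2
              [0.0, 0.4, 0.0, 0.0],   # x2 -> x3
              [0.0, 0.0, 0.5, 0.0]])  # x3 -> x4
Se = np.diag([1.0, 0.8, 1.2, 0.9])    # illustrative disturbance covariance
V, M = [0, 1, 2], [3]

# Marginalized coefficients: direct edges plus paths routed through M.
Bm_inv = np.linalg.inv(np.eye(len(M)) - B[np.ix_(M, M)])
Bt = B[np.ix_(V, V)] + B[np.ix_(V, M)] @ Bm_inv @ B[np.ix_(M, V)]
# Marginalized disturbance covariance from the passive observational block.
C0 = np.linalg.inv(np.eye(4) - B) @ Se @ np.linalg.inv(np.eye(4) - B).T
St = (np.eye(3) - Bt) @ C0[np.ix_(V, V)] @ (np.eye(3) - Bt).T

def canonical_cov(Bmat, Smat, J):
    n = Bmat.shape[0]
    U = np.diag([0.0 if i in J else 1.0 for i in range(n)])
    A = np.linalg.inv(np.eye(n) - U @ Bmat)
    return A @ ((np.eye(n) - U) + U @ Smat @ U) @ A.T

Ck_full = canonical_cov(B, Se, {1})   # intervene on x2 in the full model
Ck_marg = canonical_cov(Bt, St, {1})  # same experiment, marginalized model
assert np.allclose(Ck_marg, Ck_full[np.ix_(V, V)])
```

Hiding x4 turns the chain x3 → x4 → x1 into the edge x3 → x1 with weight 0.5 · 0.2 = 0.1, and folds the disturbance of x4 into that of x1.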

25 For a weakly stable model (B, Σe ) containing a self-loop for variable xi with coefficient bii , we can define a model without that self-loop given by B̃ = B − (bii /(1 − bii )) Ui (I − B) and Σ̃e = (I + (bii /(1 − bii )) Ui ) Σe (I + (bii /(1 − bii )) Ui )T . [sent-296, score-0.933]

26 The resulting model (B̃, Σ̃e ) is also weakly stable and yields the same observations at equilibrium in all experiments. [sent-297, score-0.501]
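The self-cycle removal can be checked directly: the transformed model has no self-loop yet produces the same passive observational covariance. A sketch with illustrative numbers (self-loop b22 = 0.2 on x2):

```python
import numpy as np

B = np.array([[0.0, 0.0, 0.0],
              [0.3, 0.2, 0.0],   # b22 = 0.2 is a self-loop on x2
              [0.0, 0.4, 0.0]])
Se = np.diag([1.0, 0.8, 1.2])
i, bii = 1, B[1, 1]
Ui = np.zeros((3, 3)); Ui[i, i] = 1.0
F = bii / (1.0 - bii)

Bt = B - F * Ui @ (np.eye(3) - B)                   # self-loop removed
St = (np.eye(3) + F * Ui) @ Se @ (np.eye(3) + F * Ui).T

def cov(Bmat, Smat):
    A = np.linalg.inv(np.eye(3) - Bmat)
    return A @ Smat @ A.T

assert np.isclose(Bt[i, i], 0.0)                    # no self-loop left
assert np.allclose(cov(B, Se), cov(Bt, St))         # same equilibrium covariance
```

Effectively the row of x2 is rescaled by 1/(1 − b22) (here 0.3 becomes 0.375) and its disturbance variance is rescaled accordingly, so the self-loop is absorbed rather than observable.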

27 Note that Equation 12 relates the experimental effects of intervening on {x1 , x2 } to the experimental effects of intervening on {x1 , x2 , x4 }. [sent-327, score-0.522]

28 Thus, if we had a set of experiments that allowed us to infer all the experimental effects of all the experiments that intervene on all but one variable, then we would have determined all the direct effects and would thereby have identified the B-matrix. [sent-332, score-0.393]

29 On the other hand, Equation 12 shows how the measured experimental effects can be used to construct linear constraints on the (unknown) direct effects b ji . [sent-333, score-0.377]

30 The example in Equation 12 can be generalized in the following way: As stated earlier, for an asymptotically stable model, the experimental effect t(xi xu ||Jk ) of xi ∈ Jk on xu ∈ Uk in experiment Ek = (Jk , Uk ) is the sum-product of coefficients on all directed paths from xi to xu . [sent-339, score-1.135]

31 The sum-product of all those paths is equal to the experimental effect t(xi xu ||Jk ∪ {x j }), since all paths through x j are intercepted by additionally intervening on x j . [sent-342, score-0.44]

32 Second, the remaining paths are all of the form xi x̃ j xu , where x̃ j is the last occurrence of x j on the path (recall that paths may contain cycles, so there may be multiple occurrences of x j on the path). [sent-343, score-0.365]

33 The sum-product of coefficients on all subpaths xi x̃ j is given by t(xi x j || Jk ) and the sum-product of coefficients on all subpaths x̃ j xu is t(x j xu || Jk ∪ {x j }). [sent-344, score-0.547]

34 Taking all combinations of subpaths xi x̃ j and x̃ j xu , we obtain the contribution of all the paths through x j as the product t(xi x j || Jk ) t(x j xu || Jk ∪ {x j }). [sent-345, score-0.579]

35 We thus obtain t(xi xu || Jk ) = t(xi xu || Jk ∪ {x j }) + t(xi x j || Jk )t(x j xu || Jk ∪ {x j }). [sent-346, score-0.738]
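This identity can be verified numerically on an illustrative cyclic model (none of the numbers are from the paper); the experimental effects are read off (I − U B)⁻¹:

```python
import numpy as np

B = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.3, 0.0, 0.2, 0.0],   # cycle x2 <-> x3
              [0.0, 0.4, 0.0, 0.0],
              [0.2, 0.0, 0.5, 0.0]])  # x1 -> x4 and x3 -> x4
n = 4

def effects(J):
    """(I - U B)^{-1}: entry [u, i] is t(x_i ~> x_u || J) for i in J, u not in J."""
    U = np.diag([0.0 if a in J else 1.0 for a in range(n)])
    return np.linalg.inv(np.eye(n) - U @ B)

i, j, u = 0, 1, 3                    # decompose t(x1 ~> x4 || {x1}) over x2
J = {i}
lhs = effects(J)[u, i]
rhs = effects(J | {j})[u, i] + effects(J)[j, i] * effects(J | {j})[u, j]
assert np.isclose(lhs, rhs)
```

Additionally intervening on x2 intercepts exactly the paths through it; the second term restores their contribution, which is what makes the decomposition exact even with the x2 ↔ x3 cycle present.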

36 Now, if the matrix on the left is invertible, we can directly solve for the experimental effects of the third experiment just from the experimental effects in the first two. [sent-354, score-0.506]

37 Since there are no experimental effects in experiments intervening on ∅ or V , the experimental effects are considered to be determined trivially in those cases. [sent-358, score-0.489]

38 In a canonical model the coefficients b(• → xu ) on the arcs into variable xu (the direct effects of the other variables on that variable) are equal to the experimental effects when intervening on everything except xu , that is, b(• → xu ) = t(• xu ||V \ {xu }). [sent-361, score-1.752]

39 A set of experiments {Ek }k=1,...,K satisfies the pair condition for an ordered pair of variables (xi , xu ) ∈ V × V (with xi ≠ xu ) whenever there is an experiment Ek = (Jk , Uk ) in {Ek }k=1,...,K [sent-371, score-0.864]

40 such that xi ∈ Jk (xi is intervened on) and xu ∈ Uk (xu is passively observed). [sent-374, score-0.489]

41 From a set of experiments satisfying the pair condition for all ordered pairs, we can find for each xi ≠ xu an experiment Ẽi = (J̃i , Ũi ) satisfying the pair condition for the pair (xi , xu ). [sent-377, score-0.925]

42 Now, by iteratively using Lemma 9, we can determine the experimental effects in the union experiment Ẽ∪ = (J̃∪ , Ũ∪ ), where the variables in the set J̃∪ = ∪i≠u J̃i are intervened on. [sent-379, score-0.481]

43 Variable xu was passively observed in each experiment, thus xu ∉ J̃∪ . [sent-381, score-0.569]

44 The experimental effects of this union experiment intervening on J̃∪ = V \ {xu } are thus the direct effects b(• → xu ). [sent-382, score-0.733]

45 Given a set of experiments {Ek }k=1,...,K , a weakly stable canonical model (B, Σe ) over the variables V is identifiable if the set of experiments satisfies the pair condition for each ordered pair of variables (xi , x j ) ∈ V × V (with xi ≠ x j ) and the covariance condition for each unordered pair of variables {xi , x j } ⊆ V . [sent-401, score-0.779]

46 Then a model with coefficient matrix B̃ defined by B̃K V = BK V , B̃LL = [ 0 bi j ; b ji + δ 0 ], and B̃LK = (I − B̃LL )(I − BLL )−1 BLK will produce the same experimental effects as B for any experiment that does not satisfy the pair condition for the pair (xi , x j ). [sent-431, score-0.524]

47 As in our example, it is generally the case that for δ ≠ 0 the models B and B̃ will produce different experimental effects in any experiment that satisfies the pair condition for the pair (xi , x j ). [sent-434, score-0.428]

48 To see the effect of the perturbation more clearly, we can write it explicitly as follows: b̃lk = blk for all l ≠ j and all k; b̃ ji = b ji + δ; b̃ j j = 0; and b̃ jk = b jk − δ (bik + bi j b jk )/(1 − bi j b ji ) for all k ∉ {i, j}. [sent-436, score-1.717]

49 If the pair condition is not satisfied for several pairs, then Lemma 13 can be applied iteratively for each missing pair to arrive at a model with different coefficients that produces the same experimental effects as the original for all experiments not satisfying the pair condition for the pairs in question. [sent-442, score-0.413]
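The underdetermination can be exhibited concretely. A sketch, assuming an illustrative 3-variable model and perturbing the pair (x1, x2) (indices i = 0, j = 1): experiments that never intervene on x1 while observing x2 cannot tell the two models apart, while experiments satisfying that pair condition can:

```python
import numpy as np

B = np.array([[0.0,  0.2, 0.15],
              [0.3,  0.0, 0.25],
              [0.1,  0.4, 0.0 ]])
n, i, j, d = 3, 0, 1, 0.5
Bt = B.copy()
Bt[j, i] = B[j, i] + d                   # b~_ji = b_ji + delta
for k in range(n):
    if k not in (i, j):                  # compensating change on b~_jk
        Bt[j, k] = B[j, k] - d * (B[i, k] + B[i, j] * B[j, k]) / (1 - B[i, j] * B[j, i])

def effects(Bmat, J):
    U = np.diag([0.0 if a in J else 1.0 for a in range(n)])
    return np.linalg.inv(np.eye(n) - U @ Bmat)

for J in [{1}, {2}, {1, 2}]:             # pair condition for (x1, x2) NOT satisfied
    A, At = effects(B, J), effects(Bt, J)
    for a in J:
        for u in set(range(n)) - J:
            assert np.isclose(A[u, a], At[u, a])   # indistinguishable effects
for J in [{0}, {0, 2}]:                  # pair condition for (x1, x2) satisfied
    assert not np.isclose(effects(B, J)[1, 0], effects(Bt, J)[1, 0])
```

The compensation on b̃ jk is what keeps the paths into x2 from third variables consistent with the inflated edge x1 → x2, so the mismatch surfaces only when x1 itself is randomized while x2 is observed.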

50 However, the following lemma shows that the covariance matrix of disturbances can always be perturbed such that the two models become completely indistinguishable for any experiment that does not satisfy the pair condition for some pair (xi , x j ), as was the case in Figure 6. [sent-445, score-0.421]

51 If for all k = 1, . . . , K and all xi ∈ Jk and x j ∈ Uk it produces the same experimental effects t(xi x j || Jk ), then the model (B̃, Σ̃e ) with Σ̃e = (I − B̃)(I − B)−1 Σe (I − B)−T (I − B̃)T produces data covariance matrices C̃k = Ck for all k = 1, . . . , K. [sent-454, score-0.359]

52 Given a set of experiments {Ek }k=1,...,K over the variables in V , all coefficients b(xi → x j ) of a weakly stable canonical model are identified if and only if the pair condition is satisfied for all ordered pairs of variables with respect to these experiments. [sent-466, score-0.429]

53 A set of experiments {Ek }k=1,...,K satisfies the pair condition for all ordered pairs (xi , x j ) ∈ V × V (such that xi ≠ x j ) and the covariance condition for all unordered pairs {xi , x j } ⊆ V . [sent-478, score-0.335]

54 Since the covariance matrix Ck of an experiment Ek contains the experimental effects for all pairs (xi , x j ) with xi ∈ Jk and x j ∈ Uk , each experiment generates mk = |Jk | × |Uk | constraints of the form of Equation 16. [sent-497, score-0.576]

55 We thus have a matrix equation T b = t, (17) where T is a ((∑k=1,...,K mk ) × (n² − n))-matrix of (measured) experimental effects, b is the (n² − n)-vector of unknown b ji , and t is a (∑k=1,...,K mk )-ary vector corresponding to the (measured) experimental effects on the left-hand side of Equation 16. [sent-503, score-0.361]

56 Since each constraint (i.e., Equation 16) only includes unknowns of the type bu• , corresponding to edge-coefficients for edges into some node xu ∈ Uk , we can rearrange the equations such that the system can be presented in the block-diagonal form diag(T11 , T22 , . . . , Tnn ) (b1 , b2 , . . . , bn )T = (t1 , t2 , . . . , tn )T . (18) [sent-507, score-0.347]

57 Instead of solving the equation system in Equation 17 with (n2 − n) unknowns, Equation 18 allows us to separate the system into n blocks each constraining direct effects bu• into a different xu . [sent-517, score-0.406]

58 For example, in the case of the experiment intervening on Jk = {x1 , x2 } of the 4-variable model in Figure 3, we obtain the following experimental covariance matrix:   1 0 t(x1 x3 ||{x1 , x2 }) t(x1 x4 ||{x1 , x2 })  0 1 t(x2 x3 ||{x1 , x2 }) t(x2 x4 ||{x1 , x2 })  . [sent-520, score-0.335]

59 When the covariance condition is not satisfied for a particular pair, then the covariance of the disturbances for that pair remains undefined. [sent-537, score-0.338]

60 One can take a more conservative approach and treat any b jk as undetermined for all k whenever there exists an i such that the pair condition is not fulfilled for the ordered pair (xi , x j ). [sent-560, score-0.679]

61 Similarly, the fifth step of the algorithm implements a conservative condition for the identifiability of the covariance matrix: covariance σi j can be treated as determined if the covariance condition is satisfied for the pair {xi , x j } and the direct effects B{xi ,x j },V are determined. [sent-562, score-0.514]

62 Given a set of experiments {Ek }k=1,...,K , determine which ordered pairs of variables satisfy the pair condition and which pairs of variables satisfy the covariance condition. [sent-582, score-0.365]

63 (b) From the estimated covariance matrix, extract the experimental effects t(xi xu ||Jk ) for all (xi , xu ) ∈ Jk × Uk . [sent-585, score-0.775]

64 (c) For each pair (xi , xu ) ∈ Jk × Uk add an equation bui + ∑x j ∈Uk \{xu } t(xi x j ||Jk ) bu j = t(xi xu ||Jk ) into the system Tb = t. [sent-586, score-0.631]
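Steps (b) and (c) can be sketched end to end. This is a minimal, noiseless illustration (not the paper's LLC implementation): the experimental effects are computed analytically from a made-up 3-variable model, the constraint system T b = t is assembled, and least squares recovers B:

```python
import numpy as np

B = np.array([[0.0, 0.2, 0.0],
              [0.3, 0.0, 0.0],   # cycle x1 <-> x2, plus x2 -> x3
              [0.0, 0.4, 0.0]])
n = B.shape[0]
experiments = [{0}, {1}, {2}]    # pair condition holds for all ordered pairs

def effects(J):
    U = np.diag([0.0 if a in J else 1.0 for a in range(n)])
    return np.linalg.inv(np.eye(n) - U @ B)

unknowns = [(u, i) for u in range(n) for i in range(n) if i != u]  # the b_ui
rows, rhs = [], []
for J in experiments:
    A = effects(J)
    observed = [u for u in range(n) if u not in J]
    for i in J:
        for u in observed:
            row = np.zeros(len(unknowns))
            row[unknowns.index((u, i))] = 1.0              # coefficient of b_ui
            for j in observed:
                if j != u:
                    row[unknowns.index((u, j))] = A[j, i]  # t(x_i ~> x_j || J)
            rows.append(row)
            rhs.append(A[u, i])                            # t(x_i ~> x_u || J)

b_hat, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
B_hat = np.zeros((n, n))
for (u, i), val in zip(unknowns, b_hat):
    B_hat[u, i] = val
assert np.allclose(B_hat, B)
```

With sample estimates in place of the analytic effects, the same least-squares solve yields the estimated model rather than an exact recovery.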

65 Output the estimated model (B, Σe ), a list of ordered pairs of variables for which the pair condition is not satisfied, and a list of pairs of variables for which the covariance condition is not satisfied. [sent-593, score-0.417]

66 experimental effects or the entire covariance matrix for union or intersection experiments of the available experiments even if the set of experiments does not satisfy the identifiability conditions. [sent-594, score-0.408]

67 For two comparable sets of experiments (e.g., the set of four experiments on six variables above versus a set of six experiments each intervening on a single variable), the sequence of experiments intervening on multiple variables simultaneously will provide a better estimate of the underlying model even if the total sample size is the same. [sent-632, score-0.373]

68 linear acyclic models without latent variables, linear cyclic models without latent variables, linear acyclic models with latent variables, linear cyclic models with latent variables, and non-linear acyclic models without latent variables. [sent-666, score-0.517]

69 We can estimate the values of the variables xu such that u ∉ {i, j} using the interpretation of the experimental effects as regression coefficients: x̂u (i, j,ko ) = t(xi xu ||{xi , x j }) · xi (i, j,ko ) + t(x j xu ||{xi , x j }) · x j (i, j,ko ) . [sent-810, score-1.236]

70 Following Lauritzen and Richardson (2002) we refer to this most common interpretation of cyclic models as the deterministic equilibrium interpretation, since the value of the observed variables x at equilibrium is a deterministic function of the disturbances e. [sent-829, score-0.417]

71 However, interventions needn’t be “surgical” in this sense, but could instead only add an additional influence to the intervened variable without breaking the relations between the intervened variable and its causal parents. [sent-855, score-0.464]

72 Assuming that the influence of the soft interventions on the intervened variables is known, that is, that c is measured, and that multiple simultaneous soft interventions are performed independently, it can be shown that one can still determine the experimental effects of the intervened variables. [sent-860, score-0.634]

73 Lastly, it is worth noting that the LLC-Algorithm presented here uses the measured experimental effects t(xi xu ||J ) to linearly constrain the unknown direct effects b ji of B. [sent-865, score-0.623]

74 There may be circumstances in which it might be beneficial to instead use the experimental effects to linearly constrain the total effects t(xi xu ). [sent-866, score-0.571]

75 Given an experiment Ek = (Jk , Uk ), the linear constraint of the measured experimental effects on the unknown total effects t(xi xu ) is then given by t(xi xu ) = t(xi xu ||Jk ) + ∑x j ∈Jk \{xi } t(xi x j ) t(x j xu ||Jk ). [sent-869, score-1.404]

76 Recall that the total effect corresponds to the experimental effect in the single-intervention experiment where only the cause is subject to intervention, that is, t(xi xu ) = t(xi xu ||{xi }). [sent-875, score-0.65]
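A quick numeric check of this constraint, using an illustrative acyclic 3-variable model (the identity itself holds for cyclic models too under weak stability):

```python
import numpy as np

B = np.array([[0.0, 0.0, 0.0],
              [0.3, 0.0, 0.0],   # x1 -> x2
              [0.2, 0.4, 0.0]])  # x1 -> x3 <- x2
n = 3

def effects(J):
    U = np.diag([0.0 if a in J else 1.0 for a in range(n)])
    return np.linalg.inv(np.eye(n) - U @ B)

i, u, J = 0, 2, {0, 1}
total = effects({i})[u, i]       # t(x1 ~> x3) = 0.2 + 0.3*0.4 = 0.32
rhs = effects(J)[u, i] + sum(effects({i})[j, i] * effects(J)[u, j]
                             for j in J - {i})
assert np.isclose(total, rhs)    # 0.32 = 0.2 + 0.3*0.4
```

Each additionally intervened variable x_j in J_k splits the total effect into the part that avoids x_j and the part routed through it, which is exactly the summand t(xi x j ) t(x j xu ||Jk ).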

77 The centering of Equation 27 implies that instead of randomizing the intervened variables in Jk with mean (µk^c )Jk and covariance (Σk^c )Jk Jk , the centered variables are considered to be randomized with mean (µ̄k^c )Jk = (µk^c − µ0 )Jk and covariance (Σ̄k^c )Jk Jk = (Σk^c )Jk Jk . [sent-911, score-0.451]

78 The theory in the paper can be used to estimate the direct effects matrix B and covariance matrix Σe , as the data covariance matrices are independent of the mean of the disturbances. [sent-917, score-0.355]

79 Weak Stability: We show that if the full model (B, Σe ) is weakly stable then the marginalized model (B̃, Σ̃e ) is also weakly stable; assume to the contrary that (B̃, Σ̃e ) is weakly unstable, thus there exists an experiment for which I − Ũk B̃ is not invertible. [sent-946, score-0.333]

80 The goal is to derive Equation 31, which means that both models produce the same experimental effects from xi ∈ J̃k to xu ∈ Ũk . [sent-957, score-0.495]

81 If xi ∈ Jk we have that Uk Ui = 0n×n , so that Uk B̃ = Uk (I − bii Ui )B + bii Uk Ui = Uk B and Uk B̃v = Uk Bv = v. [sent-983, score-0.347]

82 Alternatively, if xi ∈ Uk we have that Uk Ui = Ui , so that Uk B̃v = Uk (I − bii Ui )Bv + bii Uk Ui v = (I − bii Ui )Uk Bv + bii Ui v (multiplication of diagonal matrices commutes) = (I − bii Ui )v + bii Ui v = v. [sent-984, score-0.931]

83 First, if variable xi ∈ Jk , then Uk Ui = 0n×n , Uk B̃ = Uk B (as shown above) and Uk Σ̃e Uk = Uk (I + (bii /(1 − bii )) Ui ) Σe (I + (bii /(1 − bii )) Ui )T Uk = Uk Σe Uk . [sent-989, score-0.347]

84 The covariance matrices are then trivially equal: C̃k = (I − Uk B̃)−1 (Jk + Uk Σ̃e Uk )(I − Uk B̃)−T = (I − Uk B)−1 (Jk + Uk Σe Uk )(I − Uk B)−T = Ck . [sent-990, score-0.381]

85 In the remaining case xi ∈ Uk , expanding C̃k = (I − Uk B̃)−1 (Jk + Uk Σ̃e Uk )(I − Uk B̃)−T and commuting the diagonal factors (I + (bii /(1 − bii )) Ui ) through Jk and Uk shows that C̃k = Ck here as well. [sent-994, score-1.168]

86 Derivation of Equation 13: Lemma 7 (Marginalization) showed that weak stability and experimental effects from an intervened variable xi ∈ Jk to an observed variable xu ∈ Uk are preserved (as part of the covariance matrix) when some variables in Uk are marginalized. [sent-997, score-0.83]

87 Then, it is sufficient to show that Equation 13 applies in a weakly stable model where variables Uk \ {x j , xu } are marginalized. [sent-998, score-0.431]

88 Examine experiment Ek = (Jk , Uk ) where Uk = {x j , xu } in the marginalized model (B, Σe ). [sent-1000, score-0.429]

89 The experimental effects in the experiment intervening on Jk ∪ {x j } are just the direct effects t(xi xu ||Jk ∪ {x j }) = bui and t(x j xu ||Jk ∪ {x j }) = bu j . [sent-1001, score-1.094]

90 Generalizations of Equation 13: Equation 13 can be generalized to relate some experimental effects in Ek = (Jk , Uk ) to some experimental effects in Ek∪l = (Jk ∪ Jl , Uk ∩ Ul ) by applying Equation 13 iteratively: t(xi xu ||Jk ) = t(xi xu ||Jk ∪ Jl ) + ∑x j ∈Jl ∩Uk t(xi x j ||Jk ) t(x j xu ||Jk ∪ Jl ). [sent-1007, score-1.126]

91 Another way of writing the generalization relates some experimental effects in Ek = (Jk , Uk ) to experimental effects in Ek∩l = (Jk ∩ Jl , Uk ∪ Ul ): t(xi xu ||Jk ∩ Jl ) = t(xi xu ||Jk ) + ∑x j ∈Jk \Jl t(xi x j ||Jk ∩ Jl ) t(x j xu ||Jk ). [sent-1009, score-1.126]
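The union form of the generalization can be sanity-checked on an illustrative cyclic model (Jl ∩ Uk equals Jl \ Jk, since Uk is the complement of Jk):

```python
import numpy as np

B = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.3, 0.0, 0.2, 0.0],   # cycle x2 <-> x3
              [0.0, 0.4, 0.0, 0.0],
              [0.2, 0.0, 0.5, 0.0]])
n = 4

def effects(J):
    U = np.diag([0.0 if a in J else 1.0 for a in range(n)])
    return np.linalg.inv(np.eye(n) - U @ B)

i, u, Jk, Jl = 0, 3, {0}, {1, 2}
lhs = effects(Jk)[u, i]                       # t(x1 ~> x4 || {x1})
rhs = effects(Jk | Jl)[u, i] + sum(effects(Jk)[j, i] * effects(Jk | Jl)[u, j]
                                   for j in Jl - Jk)
assert np.isclose(lhs, rhs)
```

Here the union experiment intervenes on {x1, x2, x3}, so its experimental effects into x4 are just the direct effects, and the sum restores the contributions of the paths through x2 and x3 that the extra interventions cut.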

92 For each pair (xk , xu ) with xk ∈ K and xu ∈ O we can form an equation of the form of Equation 34 using experimental effects from experiment Ek : t(xk xu ||Jk ∪ Jl ) + ∑x j ∈Jl ∩Uk t(xk x j ||Jk ) t(x j xu ||Jk ∪ Jl ) = t(xk xu ||Jk ). [sent-1013, score-1.602]

93 Similarly, equations can be formed for all pairs (xk , xu ) with xk ∈ L and xu ∈ O using experimental effects from experiment El . [sent-1015, score-0.841]

94 For pairs (xk , xu ) with xk ∈ I and xu ∈ O , equations could be formed using the experimental effects from either experiment, but it turns out that only equations using the experimental effects of experiment Ek are needed. [sent-1016, score-1.07]

95 Earlier we showed that experimental effects are equal when x j is intervened on; this holds in particular for experiment (Jk ∪ L , Uk \ L ). [sent-1047, score-0.423]

96 By Lemma 9 (Union/Intersection Experiment) the effects of an intersection experiment Ek are defined by the experimental effects of the two original experiments, so the experimental effects must be equal in experiment Ek . [sent-1048, score-0.709]

97 Say we have conducted experiment Ek observing covariance matrix Ck and experiment El observing covariance matrix Cl . [sent-1067, score-0.414]

98 To predict the whole covariance matrix in the intersection experiment, we need the passive observational data covariance matrix C0 in addition to the observations in experiments Ek and El . [sent-1074, score-0.34]

99 In an arbitrary experiment Ek , equations for all pairs (xi , xu ) with xi ∈ Jk and xu ∈ Uk can be represented neatly in matrix notation: B{xu }Jk + B{xu }(Uk \{xu }) (Tk )(Uk \{xu })Jk = (Tk ){xu }Jk , or equivalently (taking transposes), (B{xu }Jk )T + ((Tk )(Uk \{xu })Jk )T (B{xu }(Uk \{xu }) )T = ((Tk ){xu }Jk )T . [sent-1085, score-0.725]

100 As we considered arbitrary xu ∈ O , the same procedure can be repeated for each xu ∈ O . [sent-1099, score-0.492]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('jk', 0.515), ('uk', 0.488), ('xu', 0.246), ('buk', 0.22), ('ek', 0.152), ('tk', 0.148), ('bii', 0.146), ('causal', 0.139), ('intervened', 0.134), ('effects', 0.131), ('ck', 0.113), ('bm', 0.108), ('berhardt', 0.097), ('oyer', 0.097), ('yclic', 0.097), ('yttinen', 0.097), ('cyclic', 0.097), ('experiment', 0.095), ('llc', 0.089), ('covariance', 0.089), ('atent', 0.083), ('equilibrium', 0.082), ('disturbances', 0.075), ('ausal', 0.069), ('dream', 0.067), ('intervening', 0.067), ('marginalized', 0.067), ('experimental', 0.063), ('weakly', 0.059), ('bu', 0.059), ('variables', 0.058), ('jl', 0.058), ('interventions', 0.057), ('ul', 0.057), ('bll', 0.056), ('bui', 0.056), ('xi', 0.055), ('pair', 0.054), ('passively', 0.054), ('ji', 0.052), ('canonical', 0.051), ('eberhardt', 0.049), ('intervention', 0.049), ('inear', 0.048), ('odels', 0.048), ('stable', 0.047), ('ui', 0.046), ('latent', 0.046), ('ok', 0.045), ('passive', 0.044), ('bv', 0.043), ('dcg', 0.041), ('gh', 0.041), ('observational', 0.038), ('disturbance', 0.038), ('ju', 0.038), ('equations', 0.035), ('hyttinen', 0.035), ('experiments', 0.034), ('blk', 0.034), ('cycles', 0.033), ('identi', 0.033), ('ol', 0.032), ('earning', 0.032), ('oi', 0.032), ('paths', 0.032), ('condition', 0.031), ('stability', 0.031), ('acyclic', 0.031), ('edges', 0.031), ('coef', 0.03), ('tl', 0.029), ('equation', 0.029), ('directed', 0.027), ('underdetermination', 0.026), ('pairs', 0.025), ('cients', 0.025), ('ordered', 0.025), ('steady', 0.025), ('bk', 0.024), ('matrix', 0.023), ('asymptotically', 0.023), ('randomized', 0.023), ('ff', 0.023), ('observed', 0.023), ('knocked', 0.022), ('ability', 0.022), ('invertible', 0.022), ('el', 0.022), ('model', 0.021), ('faithfulness', 0.021), ('teams', 0.021), ('covariances', 0.02), ('uncorrelated', 0.02), ('generating', 0.019), ('knockout', 0.019), ('cx', 0.019), ('manipulated', 0.019), ('lk', 0.019), ('equilibrating', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999893 56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables

Author: Antti Hyttinen, Frederick Eberhardt, Patrik O. Hoyer

Abstract: Identifying cause-effect relationships between variables of interest is a central problem in science. Given a set of experiments we describe a procedure that identifies linear models that may contain cycles and latent variables. We provide a detailed description of the model family, full proofs of the necessary and sufficient conditions for identifiability, a search algorithm that is complete, and a discussion of what can be done when the identifiability conditions are not satisfied. The algorithm is comprehensively tested in simulations, comparing it to competing algorithms in the literature. Furthermore, we adapt the procedure to the problem of cellular network inference, applying it to the biologically realistic data of the DREAM challenges. The paper provides a full theoretical foundation for the causal discovery procedure first presented by Eberhardt et al. (2010) and Hyttinen et al. (2010). Keywords: causality, graphical models, randomized experiments, structural equation models, latent variables, latent confounders, cycles

2 0.092164718 42 jmlr-2012-Facilitating Score and Causal Inference Trees for Large Observational Studies

Author: Xiaogang Su, Joseph Kang, Juanjuan Fan, Richard A. Levine, Xin Yan

Abstract: Assessing treatment effects in observational studies is a multifaceted problem that not only involves heterogeneous mechanisms of how the treatment or cause is exposed to subjects, known as propensity, but also differential causal effects across sub-populations. We introduce a concept termed the facilitating score to account for both the confounding and interacting impacts of covariates on the treatment effect. Several approaches for estimating the facilitating score are discussed. In particular, we put forward a machine learning method, called causal inference tree (CIT), to provide a piecewise constant approximation of the facilitating score. With interpretable rules, CIT splits data in such a way that both the propensity and the treatment effect become more homogeneous within each resultant partition. Causal inference at different levels can be made on the basis of CIT. Together with an aggregated grouping procedure, CIT stratifies data into strata where causal effects can be conveniently assessed within each. Besides, a feasible way of predicting individual causal effects (ICE) is made available by aggregating ensemble CIT models. Both the stratified results and the estimated ICE provide an assessment of heterogeneity of causal effects and can be integrated for estimating the average causal effect (ACE). Mean square consistency of CIT is also established. We evaluate the performance of proposed methods with simulations and illustrate their use with the NSW data in Dehejia and Wahba (1999) where the objective is to assess the impact of c 2012 Xiaogang Su, Joseph Kang, Juanjuan Fan, Richard A. Levine and Xin Yan. S U , K ANG , FAN , L EVINE AND YAN a labor training program, the National Supported Work (NSW) demonstration, on post-intervention earnings. Keywords: CART, causal inference, confounding, interaction, observational study, personalized medicine, recursive partitioning

3 0.085463397 68 jmlr-2012-Minimax Manifold Estimation

Author: Christopher Genovese, Marco Perone-Pacifico, Isabella Verdinelli, Larry Wasserman

Abstract: We find the minimax rate of convergence in Hausdorff distance for estimating a manifold M of dimension d embedded in RD given a noisy sample from the manifold. Under certain conditions, we show that the optimal rate of convergence is n−2/(2+d) . Thus, the minimax rate depends only on the dimension of the manifold, not on the dimension of the space in which M is embedded. Keywords: manifold learning, minimax estimation

4 0.084680289 24 jmlr-2012-Causal Bounds and Observable Constraints for Non-deterministic Models

Author: Roland R. Ramsahai

Abstract: Conditional independence relations involving latent variables do not necessarily imply observable independences. They may imply inequality constraints on observable parameters and causal bounds, which can be used for falsification and identification. The literature on computing such constraints often involve a deterministic underlying data generating process in a counterfactual framework. If an analyst is ignorant of the nature of the underlying mechanisms then they may wish to use a model which allows the underlying mechanisms to be probabilistic. A method of computation for a weaker model without any determinism is given here and demonstrated for the instrumental variable model, though applicable to other models. The approach is based on the analysis of mappings with convex polytopes in a decision theoretic framework and can be implemented in readily available polyhedral computation software. Well known constraints and bounds are replicated in a probabilistic model and novel ones are computed for instrumental variable models without non-deterministic versions of the randomization, exclusion restriction and monotonicity assumptions respectively. Keywords: instrumental variables, instrumental inequality, causal bounds, convex polytope, latent variables, directed acyclic graph

5 0.081364669 114 jmlr-2012-Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies

Author: Ioannis Tsamardinos, Sofia Triantafillou, Vincenzo Lagani

Abstract: We present methods able to predict the presence and strength of conditional and unconditional dependencies (correlations) between two variables Y and Z never jointly measured on the same samples, based on multiple data sets measuring a set of common variables. The algorithms are specializations of prior work on learning causal structures from overlapping variable sets. This problem has also been addressed in the field of statistical matching. The proposed methods are applied to a wide range of domains and are shown to accurately predict the presence of thousands of dependencies. Compared against prototypical statistical matching algorithms and within the scope of our experiments, the proposed algorithms make predictions that are better correlated with the sample estimates of the unknown parameters on test data; this is particularly the case when the number of commonly measured variables is low. The enabling idea behind the methods is to induce one or all causal models that are simultaneously consistent with (fit) all available data sets and prior knowledge and reason with them. This allows constraints stemming from causal assumptions (e.g., Causal Markov Condition, Faithfulness) to propagate. Several methods have been developed based on this idea, for which we propose the unifying name Integrative Causal Analysis (INCA). A contrived example is presented demonstrating the theoretical potential to develop more general methods for co-analyzing heterogeneous data sets. The computational experiments with the novel methods provide evidence that causally-inspired assumptions such as Faithfulness often hold to a good degree of approximation in many real systems and could be exploited for statistical inference. Code, scripts, and data are available at www.mensxmachina.org. Keywords: integrative causal analysis, causal discovery, Bayesian networks, maximal ancestral graphs, structural equation models, causality, statistical matching, data fusion

6 0.075814828 25 jmlr-2012-Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs

7 0.061258055 96 jmlr-2012-Refinement of Operator-valued Reproducing Kernels

8 0.053289581 39 jmlr-2012-Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications

9 0.048071764 14 jmlr-2012-Activized Learning: Transforming Passive to Active with Improved Label Complexity

10 0.047436085 51 jmlr-2012-Integrating a Partial Model into Model Free Reinforcement Learning

11 0.045138132 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis

12 0.044820115 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression

13 0.044426151 40 jmlr-2012-Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso

14 0.043400876 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion

15 0.042794511 82 jmlr-2012-On the Necessity of Irrelevant Variables

16 0.041268248 119 jmlr-2012-glm-ie: Generalised Linear Models Inference & Estimation Toolbox

17 0.039999235 34 jmlr-2012-Dynamic Policy Programming

18 0.035837658 19 jmlr-2012-An Introduction to Artificial Prediction Markets for Classification

19 0.035144061 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems

20 0.034720644 117 jmlr-2012-Variable Selection in High-dimensional Varying-coefficient Models with Global Optimality


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.162), (1, 0.11), (2, 0.046), (3, -0.094), (4, 0.094), (5, 0.077), (6, -0.114), (7, 0.141), (8, 0.219), (9, 0.0), (10, -0.009), (11, 0.027), (12, -0.068), (13, 0.075), (14, 0.09), (15, -0.057), (16, -0.083), (17, 0.06), (18, -0.105), (19, -0.06), (20, 0.004), (21, -0.031), (22, 0.155), (23, 0.052), (24, 0.106), (25, -0.048), (26, 0.027), (27, -0.078), (28, -0.018), (29, 0.047), (30, -0.102), (31, -0.206), (32, 0.015), (33, -0.014), (34, 0.054), (35, -0.11), (36, 0.064), (37, 0.006), (38, -0.1), (39, -0.021), (40, 0.226), (41, 0.029), (42, -0.128), (43, 0.034), (44, -0.061), (45, 0.134), (46, -0.059), (47, -0.031), (48, -0.019), (49, -0.089)]
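The simValue scores in the list below are presumably derived by comparing topic-weight vectors like the one above; a standard comparison is cosine similarity. A minimal sketch using the first few LSI weights from this paper's vector (the second vector is hypothetical, invented here for illustration):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two dense topic-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# First five LSI topic weights of this paper, copied from the listing above.
paper = [-0.162, 0.11, 0.046, -0.094, 0.094]
# A hypothetical comparison paper's weights over the same topics.
other = [-0.150, 0.09, 0.050, -0.080, 0.100]

sim = cosine_similarity(paper, other)
```

Vectors pointing in nearly the same direction in topic space score close to 1, which is the pattern the same-paper entry at the top of each list exhibits.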

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97888637 56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables

Author: Antti Hyttinen, Frederick Eberhardt, Patrik O. Hoyer

Abstract: Identifying cause-effect relationships between variables of interest is a central problem in science. Given a set of experiments we describe a procedure that identifies linear models that may contain cycles and latent variables. We provide a detailed description of the model family, full proofs of the necessary and sufficient conditions for identifiability, a search algorithm that is complete, and a discussion of what can be done when the identifiability conditions are not satisfied. The algorithm is comprehensively tested in simulations, comparing it to competing algorithms in the literature. Furthermore, we adapt the procedure to the problem of cellular network inference, applying it to the biologically realistic data of the DREAM challenges. The paper provides a full theoretical foundation for the causal discovery procedure first presented by Eberhardt et al. (2010) and Hyttinen et al. (2010). Keywords: causality, graphical models, randomized experiments, structural equation models, latent variables, latent confounders, cycles

2 0.47431433 24 jmlr-2012-Causal Bounds and Observable Constraints for Non-deterministic Models

Author: Roland R. Ramsahai

Abstract: Conditional independence relations involving latent variables do not necessarily imply observable independences. They may imply inequality constraints on observable parameters and causal bounds, which can be used for falsification and identification. The literature on computing such constraints often involves a deterministic underlying data generating process in a counterfactual framework. If an analyst is ignorant of the nature of the underlying mechanisms then they may wish to use a model which allows the underlying mechanisms to be probabilistic. A method of computation for a weaker model without any determinism is given here and demonstrated for the instrumental variable model, though applicable to other models. The approach is based on the analysis of mappings with convex polytopes in a decision theoretic framework and can be implemented in readily available polyhedral computation software. Well known constraints and bounds are replicated in a probabilistic model and novel ones are computed for instrumental variable models without non-deterministic versions of the randomization, exclusion restriction and monotonicity assumptions respectively. Keywords: instrumental variables, instrumental inequality, causal bounds, convex polytope, latent variables, directed acyclic graph

3 0.46549094 42 jmlr-2012-Facilitating Score and Causal Inference Trees for Large Observational Studies

Author: Xiaogang Su, Joseph Kang, Juanjuan Fan, Richard A. Levine, Xin Yan

Abstract: Assessing treatment effects in observational studies is a multifaceted problem that not only involves heterogeneous mechanisms of how the treatment or cause is exposed to subjects, known as propensity, but also differential causal effects across sub-populations. We introduce a concept termed the facilitating score to account for both the confounding and interacting impacts of covariates on the treatment effect. Several approaches for estimating the facilitating score are discussed. In particular, we put forward a machine learning method, called causal inference tree (CIT), to provide a piecewise constant approximation of the facilitating score. With interpretable rules, CIT splits data in such a way that both the propensity and the treatment effect become more homogeneous within each resultant partition. Causal inference at different levels can be made on the basis of CIT. Together with an aggregated grouping procedure, CIT stratifies data into strata where causal effects can be conveniently assessed within each. Besides, a feasible way of predicting individual causal effects (ICE) is made available by aggregating ensemble CIT models. Both the stratified results and the estimated ICE provide an assessment of heterogeneity of causal effects and can be integrated for estimating the average causal effect (ACE). Mean square consistency of CIT is also established. We evaluate the performance of proposed methods with simulations and illustrate their use with the NSW data in Dehejia and Wahba (1999) where the objective is to assess the impact of a labor training program, the National Supported Work (NSW) demonstration, on post-intervention earnings. Keywords: CART, causal inference, confounding, interaction, observational study, personalized medicine, recursive partitioning

4 0.46457922 114 jmlr-2012-Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies

Author: Ioannis Tsamardinos, Sofia Triantafillou, Vincenzo Lagani

Abstract: We present methods able to predict the presence and strength of conditional and unconditional dependencies (correlations) between two variables Y and Z never jointly measured on the same samples, based on multiple data sets measuring a set of common variables. The algorithms are specializations of prior work on learning causal structures from overlapping variable sets. This problem has also been addressed in the field of statistical matching. The proposed methods are applied to a wide range of domains and are shown to accurately predict the presence of thousands of dependencies. Compared against prototypical statistical matching algorithms and within the scope of our experiments, the proposed algorithms make predictions that are better correlated with the sample estimates of the unknown parameters on test data; this is particularly the case when the number of commonly measured variables is low. The enabling idea behind the methods is to induce one or all causal models that are simultaneously consistent with (fit) all available data sets and prior knowledge and reason with them. This allows constraints stemming from causal assumptions (e.g., Causal Markov Condition, Faithfulness) to propagate. Several methods have been developed based on this idea, for which we propose the unifying name Integrative Causal Analysis (INCA). A contrived example is presented demonstrating the theoretical potential to develop more general methods for co-analyzing heterogeneous data sets. The computational experiments with the novel methods provide evidence that causally-inspired assumptions such as Faithfulness often hold to a good degree of approximation in many real systems and could be exploited for statistical inference. Code, scripts, and data are available at www.mensxmachina.org. Keywords: integrative causal analysis, causal discovery, Bayesian networks, maximal ancestral graphs, structural equation models, causality, statistical matching, data fusion

5 0.37115109 25 jmlr-2012-Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs

Author: Alain Hauser, Peter Bühlmann

Abstract: The investigation of directed acyclic graphs (DAGs) encoding the same Markov property, that is the same conditional independence relations of multivariate observational distributions, has a long tradition; many algorithms exist for model selection and structure learning in Markov equivalence classes. In this paper, we extend the notion of Markov equivalence of DAGs to the case of interventional distributions arising from multiple intervention experiments. We show that under reasonable assumptions on the intervention experiments, interventional Markov equivalence defines a finer partitioning of DAGs than observational Markov equivalence and hence improves the identifiability of causal models. We give a graph theoretic criterion for two DAGs being Markov equivalent under interventions and show that each interventional Markov equivalence class can, analogously to the observational case, be uniquely represented by a chain graph called interventional essential graph (also known as CPDAG in the observational case). These are key insights for deriving a generalization of the Greedy Equivalence Search algorithm aimed at structure learning from interventional data. This new algorithm is evaluated in a simulation study. Keywords: causal inference, interventions, graphical model, Markov equivalence, greedy equivalence search

6 0.36156723 19 jmlr-2012-An Introduction to Artificial Prediction Markets for Classification

7 0.36012837 68 jmlr-2012-Minimax Manifold Estimation

8 0.32227755 119 jmlr-2012-glm-ie: Generalised Linear Models Inference & Estimation Toolbox

9 0.31210124 96 jmlr-2012-Refinement of Operator-valued Reproducing Kernels

10 0.28970423 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression

11 0.25303116 117 jmlr-2012-Variable Selection in High-dimensional Varying-coefficient Models with Global Optimality

12 0.23837988 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems

13 0.22354829 70 jmlr-2012-Multi-Assignment Clustering for Boolean Data

14 0.22035006 57 jmlr-2012-Learning Symbolic Representations of Hybrid Dynamical Systems

15 0.21652511 34 jmlr-2012-Dynamic Policy Programming

16 0.21414939 15 jmlr-2012-Algebraic Geometric Comparison of Probability Distributions

17 0.21365394 27 jmlr-2012-Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection

18 0.20985229 39 jmlr-2012-Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications

19 0.19139422 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models

20 0.1885924 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(21, 0.025), (26, 0.027), (29, 0.495), (49, 0.013), (56, 0.014), (60, 0.018), (75, 0.045), (77, 0.014), (79, 0.021), (81, 0.012), (92, 0.117), (96, 0.073)]
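The LDA weights above are stored sparsely as (topicId, weight) pairs; before two papers can be compared, each list would typically be expanded into a dense vector over the full topic set. A minimal sketch, assuming unlisted topics have weight zero and a hypothetical total of 120 topics (the true topic count is not stated in this listing):

```python
def to_dense(sparse, n_topics):
    """Expand a sparse (topicId, weight) list into a dense vector.

    Topics not present in the sparse list are assumed to have weight 0.
    """
    dense = [0.0] * n_topics
    for topic_id, weight in sparse:
        dense[topic_id] = weight
    return dense

# Leading LDA weights of this paper, copied from the listing above.
paper_sparse = [(21, 0.025), (26, 0.027), (29, 0.495), (49, 0.013)]
paper_dense = to_dense(paper_sparse, n_topics=120)
```

Once both papers are dense vectors of the same length, any standard similarity (cosine, dot product, Hellinger) can produce a single score like the simValue entries below.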

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93265182 56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables

Author: Antti Hyttinen, Frederick Eberhardt, Patrik O. Hoyer

Abstract: Identifying cause-effect relationships between variables of interest is a central problem in science. Given a set of experiments we describe a procedure that identifies linear models that may contain cycles and latent variables. We provide a detailed description of the model family, full proofs of the necessary and sufficient conditions for identifiability, a search algorithm that is complete, and a discussion of what can be done when the identifiability conditions are not satisfied. The algorithm is comprehensively tested in simulations, comparing it to competing algorithms in the literature. Furthermore, we adapt the procedure to the problem of cellular network inference, applying it to the biologically realistic data of the DREAM challenges. The paper provides a full theoretical foundation for the causal discovery procedure first presented by Eberhardt et al. (2010) and Hyttinen et al. (2010). Keywords: causality, graphical models, randomized experiments, structural equation models, latent variables, latent confounders, cycles

2 0.91075075 38 jmlr-2012-Entropy Search for Information-Efficient Global Optimization

Author: Philipp Hennig, Christian J. Schuler

Abstract: Contemporary global optimization algorithms are based on local measures of utility, rather than a probability measure over location and value of the optimum. They thus attempt to collect low function values, not to learn about the optimum. The reason for the absence of probabilistic global optimizers is that the corresponding inference problem is intractable in several ways. This paper develops desiderata for probabilistic optimization algorithms, then presents a concrete algorithm which addresses each of the computational intractabilities with a sequence of approximations and explicitly addresses the decision problem of maximizing information gain from each evaluation. Keywords: optimization, probability, information, Gaussian processes, expectation propagation

3 0.85621935 4 jmlr-2012-A Kernel Two-Sample Test

Author: Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, Alexander Smola

Abstract: We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests. Keywords: kernel methods, two-sample test, uniform convergence bounds, schema matching, integral probability metric, hypothesis testing
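As a rough illustration of the quadratic-time statistic this abstract mentions, here is a minimal sketch of a biased squared-MMD estimate with a Gaussian kernel on scalar samples. The kernel choice and bandwidth are assumptions made for illustration, not the paper's own heuristic:

```python
import math

def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel for scalar inputs."""
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def mmd2_biased(xs, ys, sigma=1.0):
    """Biased quadratic-time estimate of the squared MMD between two samples."""
    m, n = len(xs), len(ys)
    kxx = sum(rbf(a, b, sigma) for a in xs for b in xs) / (m * m)
    kyy = sum(rbf(a, b, sigma) for a in ys for b in ys) / (n * n)
    kxy = sum(rbf(a, b, sigma) for a in xs for b in ys) / (m * n)
    return kxx + kyy - 2 * kxy

# Identical samples give zero MMD; well-separated samples give a large one.
same = mmd2_biased([0.0, 0.1, 0.2], [0.0, 0.1, 0.2])
far = mmd2_biased([0.0, 0.1, 0.2], [5.0, 5.1, 5.2])
```

The double sums make the cost quadratic in the sample size, which is why the abstract also mentions linear-time approximations.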

4 0.56100696 114 jmlr-2012-Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies

Author: Ioannis Tsamardinos, Sofia Triantafillou, Vincenzo Lagani

Abstract: We present methods able to predict the presence and strength of conditional and unconditional dependencies (correlations) between two variables Y and Z never jointly measured on the same samples, based on multiple data sets measuring a set of common variables. The algorithms are specializations of prior work on learning causal structures from overlapping variable sets. This problem has also been addressed in the field of statistical matching. The proposed methods are applied to a wide range of domains and are shown to accurately predict the presence of thousands of dependencies. Compared against prototypical statistical matching algorithms and within the scope of our experiments, the proposed algorithms make predictions that are better correlated with the sample estimates of the unknown parameters on test data; this is particularly the case when the number of commonly measured variables is low. The enabling idea behind the methods is to induce one or all causal models that are simultaneously consistent with (fit) all available data sets and prior knowledge and reason with them. This allows constraints stemming from causal assumptions (e.g., Causal Markov Condition, Faithfulness) to propagate. Several methods have been developed based on this idea, for which we propose the unifying name Integrative Causal Analysis (INCA). A contrived example is presented demonstrating the theoretical potential to develop more general methods for co-analyzing heterogeneous data sets. The computational experiments with the novel methods provide evidence that causally-inspired assumptions such as Faithfulness often hold to a good degree of approximation in many real systems and could be exploited for statistical inference. Code, scripts, and data are available at www.mensxmachina.org. Keywords: integrative causal analysis, causal discovery, Bayesian networks, maximal ancestral graphs, structural equation models, causality, statistical matching, data fusion

5 0.5141784 21 jmlr-2012-Bayesian Mixed-Effects Inference on Classification Performance in Hierarchical Data Sets

Author: Kay H. Brodersen, Christoph Mathys, Justin R. Chumbley, Jean Daunizeau, Cheng Soon Ong, Joachim M. Buhmann, Klaas E. Stephan

Abstract: Classification algorithms are frequently used on data with a natural hierarchical structure. For instance, classifiers are often trained and tested on trial-wise measurements, separately for each subject within a group. One important question is how classification outcomes observed in individual subjects can be generalized to the population from which the group was sampled. To address this question, this paper introduces novel statistical models that are guided by three desiderata. First, all models explicitly respect the hierarchical nature of the data, that is, they are mixed-effects models that simultaneously account for within-subjects (fixed-effects) and across-subjects (random-effects) variance components. Second, maximum-likelihood estimation is replaced by full Bayesian inference in order to enable natural regularization of the estimation problem and to afford conclusions in terms of posterior probability statements. Third, inference on classification accuracy is complemented by inference on the balanced accuracy, which avoids inflated accuracy estimates for imbalanced data sets. We introduce hierarchical models that satisfy these criteria and demonstrate their advantages over conventional methods using MCMC implementations for model inversion and model selection on both synthetic and empirical data. We envisage that our approach will improve the sensitivity and validity of statistical inference in future hierarchical classification studies. Keywords: beta-binomial, normal-binomial, balanced accuracy, Bayesian inference, group studies

6 0.51127374 42 jmlr-2012-Facilitating Score and Causal Inference Trees for Large Observational Studies

7 0.48708603 109 jmlr-2012-Stability of Density-Based Clustering

8 0.48529783 100 jmlr-2012-Robust Kernel Density Estimation

9 0.48351851 82 jmlr-2012-On the Necessity of Irrelevant Variables

10 0.47550574 96 jmlr-2012-Refinement of Operator-valued Reproducing Kernels

11 0.46964169 118 jmlr-2012-Variational Multinomial Logit Gaussian Process

12 0.45546302 119 jmlr-2012-glm-ie: Generalised Linear Models Inference & Estimation Toolbox

13 0.45062613 87 jmlr-2012-PAC-Bayes Bounds with Data Dependent Priors

14 0.4477855 27 jmlr-2012-Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection

15 0.44509706 57 jmlr-2012-Learning Symbolic Representations of Hybrid Dynamical Systems

16 0.44265896 10 jmlr-2012-A Unified View of Performance Metrics: Translating Threshold Choice into Expected Classification Loss

17 0.42943475 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models

18 0.42614913 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems

19 0.4257277 104 jmlr-2012-Security Analysis of Online Centroid Anomaly Detection

20 0.42317656 80 jmlr-2012-On Ranking and Generalization Bounds