jmlr jmlr2009 jmlr2009-74 knowledge-graph by maker-knowledge-mining

74 jmlr-2009-Properties of Monotonic Effects on Directed Acyclic Graphs

Source: pdf

Author: Tyler J. VanderWeele, James M. Robins

Abstract: Various relationships are shown hold between monotonic effects and weak monotonic effects and the monotonicity of certain conditional expectations. Counterexamples are provided to show that the results do not hold under less restrictive conditions. Monotonic effects are furthermore used to relate signed edges on a causal directed acyclic graph to qualitative effect modiﬁcation. The theory is applied to an example concerning the direct effect of smoking on cardiovascular disease controlling for hypercholesterolemia. Monotonicity assumptions are used to construct a test for whether there is a variable that confounds the relationship between the mediator, hypercholesterolemia, and the outcome, cardiovascular disease. Keywords: Bayesian networks, conditional expectation, covariance, directed acyclic graphs, effect modiﬁcation, monotonicity

Reference: text

Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 EDU Departments of Epidemiology and Biostatistics Harvard School of Public Health Boston, MA 02115, USA Editor: Peter Spirtes Abstract Various relationships are shown hold between monotonic effects and weak monotonic effects and the monotonicity of certain conditional expectations. [sent-6, score-0.965]

2 Monotonic effects are furthermore used to relate signed edges on a causal directed acyclic graph to qualitative effect modiﬁcation. [sent-8, score-0.837]

3 Keywords: Bayesian networks, conditional expectation, covariance, directed acyclic graphs, effect modiﬁcation, monotonicity 1. [sent-11, score-0.572]

4 Introduction Several papers have considered various monotonicity relationships on Bayesian networks or directed acyclic graphs. [sent-12, score-0.476]

5 VanderWeele and Robins (2009) introduced the concept of a monotonic effect which is closely related to Wellman’s qualitative inﬂuence and considered the relationship between monotonicity properties and causal effects, covariance, bias and confounding. [sent-17, score-0.781]

6 In this paper we develop a number of probabilistic properties concerning monotonic effects and weak monotonic effects. [sent-18, score-0.813]

7 In Section 4, we deﬁne the concepts of a monotonic effect and a weak monotonic effect in the directed acyclic graph causal framework, the latter essentially being equivalent to Wellman’s (1990) qualitative inﬂuence. [sent-27, score-1.599]

8 In Section 5, we give a number of results relating weak monotonic effects to the monotonicity in the conditioning argument of certain conditional expectations; we also return to the motivating example and show how the theory developed can be applied to this example. [sent-28, score-0.64]

9 Finally, in Section 6, we give a number of results that relate weak monotonic effects to the existence of qualitative effect modiﬁers. [sent-29, score-0.662]

10 Notation and Directed Acyclic Graphs Following Pearl (1995), a causal directed acyclic graph is a set of nodes (X1 , . [sent-32, score-0.568]

11 , Xn ) and directed edges amongst nodes such that the graph has no cycles and such that for each node Xi on the graph the corresponding variable is given by its non-parametric structural equation Xi = fi (pai , εi ) where pai are the parents of Xi on the graph and the εi are mutually independent. [sent-35, score-0.551]

12 Further discussion of the causal interpretation of directed acyclic graphs can be found elsewhere (Pearl, 1995, 2000; Spirtes et al. [sent-42, score-0.509]

13 A path is a sequence of nodes connected by edges regardless of arrowhead direction; a directed path is a path which follows the edges in the direction indicated by the graph’s arrows. [sent-44, score-0.609]

14 A node C is said to be a common cause of A and Y if there exists a directed path from C to Y not through A and a directed path from C to A not through Y . [sent-45, score-0.698]

15 A collider is a particular node on a path such that both the preceding and subsequent nodes on the path have directed edges going into that node, that is, both the edge to and the edge from that node have arrowheads into the node. [sent-50, score-0.814]

16 A path between A and B is said to be blocked given some set of variables Z if either there is a variable in Z on the path that is not a collider or if there is a collider on the path such that neither the collider itself nor any of its descendants are in Z. [sent-51, score-0.917]

17 The directed acyclic graph causal framework has proven to be particularly useful in determining whether conditioning on a given set of variables, or none at all, is sufﬁcient to control for confounding. [sent-57, score-0.603]

18 A back-door path from some node A to another node Y is a path which begins with a directed edge into A. [sent-59, score-0.548]

19 Suppose that Figure 1 represents a causal directed acyclic graph. [sent-66, score-0.491]

20 On the Deﬁnition of a Monotonic Effect The deﬁnition of a monotonic effect is given in terms of a directed acyclic graph’s nonparametric structural equations. [sent-80, score-0.817]

21 Similarly A is said to have a negative monotonic effect on Y if for all paY and εY , f (paY , A1 , εY ) ≤ f (paY , A2 , εY ) whenever A1 ≥ A2 . [sent-83, score-0.469]

22 As we have deﬁned it above, a causal direct acyclic graph corresponds to a set of non-parametric structural equations and as such the deﬁnition of a monotonic effect given above is relative to a particular set of non-parametric structural equations. [sent-84, score-0.814]

23 The presence of a monotonic effect is closely related to the monotonicity of counterfactual variables as is made clear by the following proposition. [sent-85, score-0.56]

24 In the context of characterizations of causal directed acyclic graphs that make reference to counterfactuals but not to non-parametric structural equations (Robins, 2003), a positive monotonic effect could instead be deﬁned to be present if for all paY and a1 ≥ a0 , P(Ya1 ,paY ≥ Ya0 ,paY ) = 1. [sent-93, score-1.005]

25 We thus introduce the concept of a weak monotonic effect which is a special case of Wellman’s positive qualitative inﬂuence (Wellman, 1990). [sent-98, score-0.63]

26 The definition of a weak monotonic effect does not make reference to counterfactuals and thus can be used in characterizations of causal directed acyclic graphs that do not employ the concept of counterfactuals (Spirtes et al. [sent-99, score-1.095]

27 If A has a positive monotonic effect on Y then A has a weak positive monotonic effect on Y . [sent-106, score-0.942]

28 We note that for parent A and child Y , the deﬁnition of a weak monotonic effect coincides with Wellman’s (1990) deﬁnition of positive qualitative inﬂuence when the ”context” for qualitative inﬂuence is chosen to be the parents of Y other than A. [sent-107, score-0.848]

29 A monotonic effect is a relation between two nodes on a directed acyclic graph and as such it is associated with an edge. [sent-108, score-0.865]

30 The deﬁnition of the sign of an edge can be given either in terms of monotonic effects or weak monotonic effects. [sent-109, score-0.913]

31 We can deﬁne the sign of an edge as the sign of the monotonic effect or weak monotonic effect to which the edge corresponds; this in turn gives rise to a natural deﬁnition for the sign of a path. [sent-110, score-1.24]

32 An edge on a causal directed acyclic graph from X to Y is said to be of positive sign if X has a positive monotonic effect on Y . [sent-112, score-1.129]

33 An edge from X to Y is said to be of negative sign if X has a negative monotonic effect on Y . [sent-113, score-0.59]

34 If X has neither a positive monotonic effect nor a negative monotonic effect on Y , then the edge from X to Y is said to be without a sign. [sent-114, score-0.964]

35 The sign of a path on a causal directed acyclic graph is the product of the signs of the edges that constitute that path. [sent-116, score-0.721]

36 We will call a causal directed acyclic graph with signs on those edges which allow them a signed causal directed acyclic graph. [sent-118, score-1.087]

37 The theorems in this paper are given in terms of signed paths so as to be applicable to both monotonic effects and weak monotonic effects. [sent-119, score-0.997]

38 For the conditional expectation of some variable Y to be monotonic in a conditioning argument A, it requires that the conditioning set includes variables that block all backdoor paths from A to Y . [sent-140, score-0.901]

39 , Rm ) denote an ordered list of some set of nodes on directed paths from A to Y such that for each i the backdoor paths from Ri to Y are blocked by R1 , . [sent-147, score-1.041]

40 ,Vn−1 be an ordered list of all the nodes which are not in R but which are on directed paths from A to Y such that at least one of the directed paths from each node to Y is not blocked by R. [sent-154, score-1.056]

41 , Rm ) denote an ordered list of some set of nodes on directed paths from A to Y such that for each i the backdoor paths from Ri to Y are blocked by R1 , . [sent-167, score-1.041]

42 Lemma 3 which consists of the set of ancestors of A or Y which are not descendents of A with some other set X which consists of some set of non-descendents of A that blocks all backdoor paths from A to Y . [sent-185, score-0.547]

43 Propositions 5-8 relax the condition that the conditioning set includes variables that block all backdoor paths A to Y and impose certain other conditions; the proofs of each of these propositions make use of Proposition 4. [sent-186, score-0.554]

44 Propositions 7 and 8 consider the monotonicity of conditional expectations while conditioning on variables other than the variable in which monotonicity holds but not conditioning on variables that are sufﬁcient to block all backdoor paths between A and Y . [sent-202, score-0.785]

45 If all directed paths from A to Y are of positive sign and all directed paths from C to A not through Q are of the same sign as all directed paths from C to Y not through {Q, A} then E[Y |A, Q] is non-decreasing in A. [sent-208, score-1.279]

46 If all directed paths from A to Y are of positive sign and all directed paths from C to A not through Q are of the same sign as all directed paths from C to Y not through {Q, A} then E[A|Y, Q] is non-decreasing in Y . [sent-214, score-1.279]

47 If all directed paths from A to Y (or from A to Y ) are of positive sign and all directed paths from C to A not through {Q,Y } are of the same sign as all directed paths from C to Y not through {Q, A} then E[Y |A, Q] is non-decreasing in Y. [sent-219, score-1.279]

48 Consider once again the causal directed acyclic graph given in Figure 1. [sent-227, score-0.539]

49 Suppose that we may assume that A has a weak monotonic effect on Y . [sent-228, score-0.512]

50 Let ni jq denote the number of individuals in stratum Q = q with A = i and R = j and let let di jq denote the number of individuals in stratum Q = q with A = i and R = j and Y = 1. [sent-239, score-1.062]

51 Let pi jq denote the true probability P(Y = 1|A = i, R = j, Q = q). [sent-240, score-0.488]

52 From the null hypothesis that U is absent from Figure 1, it follows by Proposition 4 that d jq d jq p1 jq − p0 jq ≤ 0 for all j and q. [sent-241, score-2.008]

53 Thus we have di jq ∼ Bin(ni jq , pi jq ) with E[ nii jq ] = pi jq and Var( nii jq ) = d d pi jq (1−pi jq ) . [sent-242, score-3.904]

54 ni jq By the central limit central limit theorem d sky’s theorem we have 1 jq 0 jq p (1−p ) p1 jq (1−p1 jq ) + 0 jq n 0 jq n1 jq 0 jq . [sent-243, score-4.392]

55 ∼ N(0, 1) and by Slut- d ( n1 jq − n0 jq )−(p1 jq −p0 jq ) 1 jq ( n1 jq − n0 jq )−(p1 jq −p0 jq ) . [sent-244, score-4.392]

56 Effect Modiﬁcation and Monotonic Effects If when conditioning on a particular variable, the sign of the effect of another variable on the outcome varies between strata of the conditioning variable, then the conditioning variable is said to be a qualitative effect modiﬁer. [sent-258, score-0.587]

57 Monotonic effects and weak monotonic effects are closely related to the concept of qualitative effect modiﬁcation. [sent-263, score-0.694]

58 Essentially, the presence of a monotonic effect precludes the possibility of qualitative effect modiﬁcation. [sent-264, score-0.639]

59 Suppose that some parent A1 of Y is such that the A1 − Y edge is of positive sign then there can be no other parent, A2 , of Y which is a qualitative effect modiﬁer for causal effect of A1 on Y , either unconditionally or within some stratum C = c of the parents of Y other than A1 and A2 . [sent-267, score-0.723]

60 Suppose that all directed paths from A to Y are of positive sign (or are all of negative sign) then there exists no qualitative effect modiﬁer Q on the directed acyclic graph for the causal effect of A on Y . [sent-274, score-1.284]

61 Consider the signed directed acyclic graph given in Figure 4 in which the A −Y edge is of positive sign. [sent-277, score-0.5]

62 However, by Proposition 9 or 10, since A has a (weak) monotonic effect on Y , none of Q1 , Q2 , Q3 , Q4 or Q5 can serve as qualitative effect modiﬁers for the causal effect of A on Y . [sent-279, score-0.863]

63 Conversely, if it is found that one of Q1 , Q2 , Q3 , Q4 or Q5 is a qualitative effect modiﬁer for the causal effect of A on Y then the A −Y edge cannot be of positive (or negative) sign. [sent-280, score-0.498]

64 Concluding Remarks In this paper we have related weak monotonic effects to the monotonicity of certain conditional expectations in the conditioning argument and to qualitative effect modiﬁcation. [sent-282, score-0.872]

65 When the variables on a causal directed acyclic graph exhibit weak monotonic effects the results can be used to construct tests for the presence of unmeasured confounding variables. [sent-283, score-1.083]

66 5 Proof of Lemma 2 We will say a path from A to B is a frontdoor path from A to B if the path begins with a directed edge with the arrowhead pointing out of A. [sent-298, score-0.659]

67 Thus all frontdoor paths from Rm to Vk will be blocked given A,V1 , . [sent-320, score-0.458]

68 All backdoor paths from Rm to Vk with an edge going into Vk will be blocked given A,V1 , . [sent-327, score-0.708]

69 ,Vk−1 , Q, Rk since there is a directed path from Vk to Y and Q includes all ancestors of Y not on directed paths from A to Y . [sent-336, score-0.73]

70 All backdoor paths from Rm to Vk with an edge going out from Vk will be blocked given A, Q, R1 , . [sent-337, score-0.708]

71 , Rm−1 by hypothesis; otherwise there would be a backdoor path from Rm through Vk to Y not blocked by A, Q, R1 , . [sent-340, score-0.545]

72 But all backdoor paths from Rm to Vk with an edge going out from Vk which are blocked by A, Q, R1 , . [sent-344, score-0.708]

73 ,Vk−1 were a collider then the path would in fact be blocked by the parents of the collider since all the parents of V1 , . [sent-370, score-0.7]

74 ,Vk−1 , say Vp , were a descendent of the collider then none of the directed paths from the collider to Vp could contain nodes in R1 , . [sent-383, score-0.747]

75 ,Vp−1 since it is an ancestor of Vp with a directed path to Vp not blocked by R. [sent-396, score-0.523]

76 ,Vp−1 then the path would in fact be blocked by the parents of the collider since all the parents of V1 , . [sent-400, score-0.576]

77 From this it follows that all backdoor paths from Rm to Vk with an edge going out from Vk are blocked by A,V1 , . [sent-410, score-0.708]

78 All backdoor paths from Vk to Q\Qk will be blocked given A,V1 , . [sent-459, score-0.623]

79 Since Vk is not a descendent of Q\Qk all frontdoor paths from Vk to Q\Qk will involve at least one collider which is a descendent of Vk . [sent-463, score-0.532]

80 ,Vn−1 be an ordered list of all the nodes which are not in R but which are on directed paths from A to Y such that at least one of the directed paths from each node to Y is not blocked by R. [sent-499, score-1.056]

81 , n − 1 since Vi has either a weak positive monotonic effect or no effect on Vn . [sent-515, score-0.603]

82 , vn−2 , q, r since Vi has either a weak positive monotonic effect or no effect on Vn−1 . [sent-529, score-0.603]

83 |A,V1 , Q, R] is a non-decreasing function of v1 and v0 = a and since A has either a weak positive monotonic effect or no effect on V1 , S(v1 |a, q, r) = S(v1 |pav1 ) will be non-decreasing in a and thus by Lemma 1, S(y|a, q, r) = E[1(Vn > y)|A, Q, R] = E[E[. [sent-540, score-0.603]

84 Note that if for each i the backdoor paths from Ri to Y are blocked by R1 , . [sent-549, score-0.623]

85 , Ri−1 , A and X then these backdoor paths will also be blocked by R1 , . [sent-552, score-0.623]

87 All backdoor paths from A to W ′ are blocked given R and X by X since X blocks all backdoor paths from A to Y . [sent-559, score-1.062]

88 Suppose the collider were some node Ri ; by hypothesis all backdoor paths from Ri to Y are blocked by R1 , . [sent-562, score-0.784]

89 , Ri−1 and X for otherwise there would be a backdoor path from Ri through W ′ to Y not blocked by A, R1 , . [sent-568, score-0.545]

90 From this it follows that every frontdoor path from A to W ′ must be blocked given R and X either by a collider or by a node in R or X. [sent-572, score-0.541]

91 12 Proof of Proposition 9 Note that by Proposition 3 above if A1 has a weak positive monotonic effect on Y then E[Y |A1 = a1 , A2 = a2 ,C = c] must be non-decreasing in a1 and if A1 has a weak negative monotonic effect on Y then E[Y |A1 = a1 , A2 = a2 ,C = c] must be non-increasing in a1 . [sent-604, score-1.024]

92 The proof for weak negative monotonic effects is similar. [sent-615, score-0.453]

93 Let C denote all non-descendents of A which are either parents of Y or parents of a node on a directed path between A and Y . [sent-616, score-0.49]

94 ∏ to show that (Y Q|C, A)GA where GA denotes the graph obtained by deleting from the original directed acyclic graph all arrows pointing into A. [sent-620, score-0.454]

95 Any backdoor path from Y to Q in GA will be blocked by C. [sent-622, score-0.545]

96 Since C will block all backdoor paths from A to Y we have by the backdoor path adjustment theorem ∑c E[Y |C = c, A = a1 ]P(C = c|Q = q) − ∑c E[Y |C = c, A = a0 ]P(C = c|Q = q) = ∑c {E[Y |C = c, A = a1 ] − E[Y |C = c, A = a0 ]}P(C = c|Q = q). [sent-624, score-0.744]

97 But since all paths between A and Y are of positive sign and since C blocks all backdoor paths from A to Y we have by Proposition 4 that E[Y |C = c, A = a] is non-decreasing in a and so E[YA=a1 |Q = q0 ] − E[YA=a0 |Q = q0 ] = ∑c {E[Y |C = c, A = a1 ] − E[Y |C = c, A = a0 ]}P(C = c|Q = q0 ) ≥ 0. [sent-626, score-0.671]

98 Clearly then C has a positive monotonic effect on A and on Y and A has a positive monotonic effect on Y and so A and Y are positively monotonically associated. [sent-636, score-0.937]

99 Clearly then C has a positive monotonic effect on A and on Y and A has a positive monotonic effect on Y and so A and Y are positively monotonically associated. [sent-645, score-0.937]

100 Four types of effect modiﬁcation, a classiﬁcation based on directed acyclic graphs. [sent-722, score-0.449]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('jq', 0.488), ('monotonic', 0.339), ('vk', 0.305), ('backdoor', 0.235), ('directed', 0.213), ('blocked', 0.212), ('paths', 0.176), ('robins', 0.148), ('acyclic', 0.145), ('causal', 0.133), ('pay', 0.13), ('proposition', 0.125), ('collider', 0.124), ('qualitative', 0.118), ('ya', 0.115), ('wellman', 0.113), ('monotonicity', 0.1), ('path', 0.098), ('effect', 0.091), ('vn', 0.091), ('eele', 0.087), ('vanderw', 0.087), ('vanderweele', 0.087), ('weak', 0.082), ('descendent', 0.081), ('propositions', 0.079), ('qk', 0.079), ('descendents', 0.078), ('onotonic', 0.078), ('rm', 0.075), ('parents', 0.071), ('frontdoor', 0.07), ('henrion', 0.07), ('druzdzel', 0.066), ('edge', 0.065), ('conditioning', 0.064), ('pavk', 0.061), ('roperties', 0.06), ('cardiovascular', 0.06), ('ffects', 0.06), ('pearl', 0.056), ('sign', 0.056), ('tian', 0.055), ('rk', 0.054), ('graph', 0.048), ('ri', 0.048), ('unmeasured', 0.046), ('qc', 0.046), ('kang', 0.046), ('confounding', 0.045), ('counterexamples', 0.044), ('hypercholesterolemia', 0.043), ('stratum', 0.043), ('tyler', 0.043), ('positively', 0.042), ('said', 0.039), ('counterfactuals', 0.037), ('vp', 0.037), ('node', 0.037), ('monotonically', 0.035), ('smoking', 0.033), ('effects', 0.032), ('judea', 0.032), ('ancestors', 0.03), ('counterfactual', 0.03), ('structural', 0.029), ('signed', 0.029), ('nodes', 0.029), ('parent', 0.029), ('absent', 0.028), ('vi', 0.028), ('blocks', 0.028), ('edges', 0.028), ('null', 0.028), ('ci', 0.027), ('unconditionally', 0.026), ('df', 0.025), ('james', 0.024), ('suppose', 0.024), ('expectations', 0.023), ('conditional', 0.023), ('qd', 0.022), ('ber', 0.022), ('integration', 0.022), ('concerning', 0.021), ('disease', 0.021), ('modi', 0.02), ('spirtes', 0.02), ('going', 0.02), ('lemma', 0.02), ('qn', 0.02), ('regularity', 0.019), ('verma', 0.018), ('auai', 0.018), ('ga', 0.018), ('graphs', 0.018), ('relationships', 0.018), ('arrowhead', 0.017), ('confounds', 0.017), ('gaag', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999923 74 jmlr-2009-Properties of Monotonic Effects on Directed Acyclic Graphs

Author: Tyler J. VanderWeele, James M. Robins

2 0.095758662 72 jmlr-2009-Polynomial-Delay Enumeration of Monotonic Graph Classes

Author: Jan Ramon, Siegfried Nijssen

Abstract: Algorithms that list graphs such that no two listed graphs are isomorphic, are important building blocks of systems for mining and learning in graphs. Algorithms are already known that solve this problem efﬁciently for many classes of graphs of restricted topology, such as trees. In this article we introduce the concept of a dense augmentation schema, and introduce an algorithm that can be used to enumerate any class of graphs with polynomial delay, as long as the class of graphs can be described using a monotonic predicate operating on a dense augmentation schema. In practice this means that this is the ﬁrst enumeration algorithm that can be applied theoretically efﬁciently in any frequent subgraph mining algorithm, and that this algorithm generalizes to situations beyond the standard frequent subgraph mining setting. Keywords: graph mining, enumeration, monotonic graph classes

3 0.083480634 54 jmlr-2009-Markov Properties for Linear Causal Models with Correlated Errors (Special Topic on Causality)

Author: Changsung Kang, Jin Tian

Abstract: A linear causal model with correlated errors, represented by a DAG with bi-directed edges, can be tested by the set of conditional independence relations implied by the model. A global Markov property speciﬁes, by the d-separation criterion, the set of all conditional independence relations holding in any model associated with a graph. A local Markov property speciﬁes a much smaller set of conditional independence relations which will imply all other conditional independence relations which hold under the global Markov property. For DAGs with bi-directed edges associated with arbitrary probability distributions, a local Markov property is given in Richardson (2003) which may invoke an exponential number of conditional independencies. In this paper, we show that for a class of linear structural equation models with correlated errors, there is a local Markov property which will invoke only a linear number of conditional independence relations. For general linear models, we provide a local Markov property that often invokes far fewer conditional independencies than that in Richardson (2003). The results have applications in testing linear structural equation models with correlated errors. Keywords: Markov properties, linear causal models, linear structural equation models, graphical models

4 0.079596944 6 jmlr-2009-An Algorithm for Reading Dependencies from the Minimal Undirected Independence Map of a Graphoid that Satisfies Weak Transitivity

Author: Jose M. Peña, Roland Nilsson, Johan Björkegren, Jesper Tegnér

Abstract: We present a sound and complete graphical criterion for reading dependencies from the minimal undirected independence map G of a graphoid M that satisﬁes weak transitivity. Here, complete means that it is able to read all the dependencies in M that can be derived by applying the graphoid properties and weak transitivity to the dependencies used in the construction of G and the independencies obtained from G by vertex separation. We argue that assuming weak transitivity is not too restrictive. As an intermediate step in the derivation of the graphical criterion, we prove that for any undirected graph G there exists a strictly positive discrete probability distribution with the prescribed sample spaces that is faithful to G. We also report an algorithm that implements the graphical criterion and whose running time is considered to be at most O(n2 (e + n)) for n nodes and e edges. Finally, we illustrate how the graphical criterion can be used within bioinformatics to identify biologically meaningful gene dependencies. Keywords: graphical models, vertex separation, graphoids, weak transitivity, bioinformatics

5 0.078717001 17 jmlr-2009-Computing Maximum Likelihood Estimates in Recursive Linear Models with Correlated Errors

Author: Mathias Drton, Michael Eichler, Thomas S. Richardson

Abstract: In recursive linear models, the multivariate normal joint distribution of all variables exhibits a dependence structure induced by a recursive (or acyclic) system of linear structural equations. These linear models have a long tradition and appear in seemingly unrelated regressions, structural equation modelling, and approaches to causal inference. They are also related to Gaussian graphical models via a classical representation known as a path diagram. Despite the models’ long history, a number of problems remain open. In this paper, we address the problem of computing maximum likelihood estimates in the subclass of ‘bow-free’ recursive linear models. The term ‘bow-free’ refers to the condition that the errors for variables i and j be uncorrelated if variable i occurs in the structural equation for variable j. We introduce a new algorithm, termed Residual Iterative Conditional Fitting (RICF), that can be implemented using only least squares computations. In contrast to existing algorithms, RICF has clear convergence properties and yields exact maximum likelihood estimates after the ﬁrst iteration whenever the MLE is available in closed form. Keywords: linear regression, maximum likelihood estimation, path diagram, structural equation model, recursive semi-Markov model, residual iterative conditional ﬁtting

6 0.071247295 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models

7 0.054763872 11 jmlr-2009-Bayesian Network Structure Learning by Recursive Autonomy Identification

8 0.047585886 44 jmlr-2009-Learning Acyclic Probabilistic Circuits Using Test Paths

9 0.04007408 89 jmlr-2009-Strong Limit Theorems for the Bayesian Scoring Criterion in Bayesian Networks

10 0.039595783 41 jmlr-2009-Improving the Reliability of Causal Discovery from Small Data Sets Using Argumentation (Special Topic on Causality)

11 0.032989413 85 jmlr-2009-Settable Systems: An Extension of Pearl's Causal Model with Optimization, Equilibrium, and Learning

12 0.030826541 90 jmlr-2009-Structure Spaces

13 0.028666487 19 jmlr-2009-Controlling the False Discovery Rate of the Association Causality Structure Learned with the PC Algorithm (Special Topic on Mining and Learning with Graphs and Relations)

14 0.02596006 91 jmlr-2009-Subgroup Analysis via Recursive Partitioning

15 0.023216035 12 jmlr-2009-Bi-Level Path Following for Cross Validated Solution of Kernel Quantile Regression

16 0.022022838 100 jmlr-2009-When Is There a Representer Theorem? Vector Versus Matrix Regularizers

17 0.020514987 33 jmlr-2009-Exploring Strategies for Training Deep Neural Networks

18 0.020312574 71 jmlr-2009-Perturbation Corrections in Approximate Inference: Mixture Modelling Applications

19 0.018805742 18 jmlr-2009-Consistency and Localizability

20 0.018650236 23 jmlr-2009-Discriminative Learning Under Covariate Shift

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.106), (1, -0.053), (2, 0.091), (3, 0.146), (4, -0.025), (5, -0.063), (6, -0.251), (7, -0.136), (8, 0.122), (9, -0.048), (10, -0.223), (11, 0.05), (12, 0.082), (13, 0.07), (14, -0.004), (15, -0.153), (16, 0.007), (17, -0.016), (18, -0.046), (19, -0.078), (20, -0.004), (21, 0.013), (22, 0.002), (23, -0.114), (24, -0.015), (25, 0.037), (26, -0.132), (27, 0.108), (28, 0.054), (29, 0.009), (30, 0.144), (31, 0.039), (32, -0.154), (33, -0.061), (34, 0.112), (35, -0.042), (36, -0.082), (37, 0.093), (38, -0.035), (39, -0.113), (40, 0.088), (41, -0.068), (42, -0.01), (43, 0.047), (44, -0.071), (45, 0.092), (46, -0.014), (47, 0.076), (48, 0.21), (49, 0.135)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98303008 74 jmlr-2009-Properties of Monotonic Effects on Directed Acyclic Graphs

Author: Tyler J. VanderWeele, James M. Robins

2 0.53954339 72 jmlr-2009-Polynomial-Delay Enumeration of Monotonic Graph Classes

Author: Jan Ramon, Siegfried Nijssen

3 0.48291942 44 jmlr-2009-Learning Acyclic Probabilistic Circuits Using Test Paths

Author: Dana Angluin, James Aspnes, Jiang Chen, David Eisenstat, Lev Reyzin

Abstract: We deﬁne a model of learning probabilistic acyclic circuits using value injection queries, in which ﬁxed values are assigned to an arbitrary subset of the wires and the value on the single output wire is observed. We adapt the approach of using test paths from the Circuit Builder algorithm (Angluin et al., 2009) to show that there is a polynomial time algorithm that uses value injection queries to learn acyclic Boolean probabilistic circuits of constant fan-in and log depth. We establish upper and lower bounds on the attenuation factor for general and transitively reduced Boolean probabilistic circuits of test paths versus general experiments. We give computational evidence that a polynomial time learning algorithm using general value injection experiments may not do much better than one using test paths. For probabilistic circuits with alphabets of size three or greater, we show that the test path lemmas (Angluin et al., 2009, 2008b) fail utterly. To overcome this obstacle, we introduce function injection queries, in which the values on a wire may be mapped to other values rather than just to themselves or constants, and prove a generalized test path lemma for this case. Keywords: nonadaptive learning algorithms, probabilistic circuits, causal Bayesian networks, value injection queries, test paths

4 0.46398494 6 jmlr-2009-An Algorithm for Reading Dependencies from the Minimal Undirected Independence Map of a Graphoid that Satisfies Weak Transitivity

Author: Jose M. Peña, Roland Nilsson, Johan Björkegren, Jesper Tegnér

5 0.42061037 54 jmlr-2009-Markov Properties for Linear Causal Models with Correlated Errors (Special Topic on Causality)

Author: Changsung Kang, Jin Tian

6 0.31969288 11 jmlr-2009-Bayesian Network Structure Learning by Recursive Autonomy Identification

7 0.30445206 17 jmlr-2009-Computing Maximum Likelihood Estimates in Recursive Linear Models with Correlated Errors

8 0.30398676 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models

9 0.25272971 41 jmlr-2009-Improving the Reliability of Causal Discovery from Small Data Sets Using Argumentation (Special Topic on Causality)

10 0.23440805 85 jmlr-2009-Settable Systems: An Extension of Pearl's Causal Model with Optimization, Equilibrium, and Learning

11 0.21262398 12 jmlr-2009-Bi-Level Path Following for Cross Validated Solution of Kernel Quantile Regression

12 0.20033951 90 jmlr-2009-Structure Spaces

13 0.13249786 19 jmlr-2009-Controlling the False Discovery Rate of the Association Causality Structure Learned with the PC Algorithm (Special Topic on Mining and Learning with Graphs and Relations)

14 0.12933099 91 jmlr-2009-Subgroup Analysis via Recursive Partitioning

15 0.11745475 16 jmlr-2009-Classification with Gaussians and Convex Loss

16 0.11122961 94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs

17 0.11056601 47 jmlr-2009-Learning Linear Ranking Functions for Beam Search with Application to Planning

18 0.09846215 89 jmlr-2009-Strong Limit Theorems for the Bayesian Scoring Criterion in Bayesian Networks

19 0.097268164 61 jmlr-2009-Nonextensive Information Theoretic Kernels on Measures

20 0.096129812 46 jmlr-2009-Learning Halfspaces with Malicious Noise

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(11, 0.127), (18, 0.487), (26, 0.016), (38, 0.018), (47, 0.011), (52, 0.027), (55, 0.037), (58, 0.03), (66, 0.076), (90, 0.037), (96, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80214703 74 jmlr-2009-Properties of Monotonic Effects on Directed Acyclic Graphs

Author: Tyler J. VanderWeele, James M. Robins

2 0.31375891 17 jmlr-2009-Computing Maximum Likelihood Estimates in Recursive Linear Models with Correlated Errors

Author: Mathias Drton, Michael Eichler, Thomas S. Richardson

3 0.31276506 11 jmlr-2009-Bayesian Network Structure Learning by Recursive Autonomy Identification

Author: Raanan Yehezkel, Boaz Lerner

Abstract: We propose the recursive autonomy identiﬁcation (RAI) algorithm for constraint-based (CB) Bayesian network structure learning. The RAI algorithm learns the structure by sequential application of conditional independence (CI) tests, edge direction and structure decomposition into autonomous sub-structures. The sequence of operations is performed recursively for each autonomous substructure while simultaneously increasing the order of the CI test. While other CB algorithms d-separate structures and then direct the resulted undirected graph, the RAI algorithm combines the two processes from the outset and along the procedure. By this means and due to structure decomposition, learning a structure using RAI requires a smaller number of CI tests of high orders. This reduces the complexity and run-time of the algorithm and increases the accuracy by diminishing the curse-of-dimensionality. When the RAI algorithm learned structures from databases representing synthetic problems, known networks and natural problems, it demonstrated superiority with respect to computational complexity, run-time, structural correctness and classiﬁcation accuracy over the PC, Three Phase Dependency Analysis, Optimal Reinsertion, greedy search, Greedy Equivalence Search, Sparse Candidate, and Max-Min Hill-Climbing algorithms. Keywords: Bayesian networks, constraint-based structure learning

4 0.29682592 100 jmlr-2009-When Is There a Representer Theorem? Vector Versus Matrix Regularizers

Author: Andreas Argyriou, Charles A. Micchelli, Massimiliano Pontil

Abstract: We consider a general class of regularization methods which learn a vector of parameters on the basis of linear measurements. It is well known that if the regularizer is a nondecreasing function of the L2 norm, then the learned vector is a linear combination of the input data. This result, known as the representer theorem, lies at the basis of kernel-based methods in machine learning. In this paper, we prove the necessity of the above condition, in the case of differentiable regularizers. We further extend our analysis to regularization methods which learn a matrix, a problem which is motivated by the application to multi-task learning. In this context, we study a more general representer theorem, which holds for a larger class of regularizers. We provide a necessary and sufﬁcient condition characterizing this class of matrix regularizers and we highlight some concrete examples of practical importance. Our analysis uses basic principles from matrix theory, especially the useful notion of matrix nondecreasing functions. Keywords: kernel methods, matrix learning, minimal norm interpolation, multi-task learning, regularization

5 0.29177266 13 jmlr-2009-Bounded Kernel-Based Online Learning

Author: Francesco Orabona, Joseph Keshet, Barbara Caputo

Abstract: A common problem of kernel-based online algorithms, such as the kernel-based Perceptron algorithm, is the amount of memory required to store the online hypothesis, which may increase without bound as the algorithm progresses. Furthermore, the computational load of such algorithms grows linearly with the amount of memory used to store the hypothesis. To attack these problems, most previous work has focused on discarding some of the instances, in order to keep the memory bounded. In this paper we present a new algorithm, in which the instances are not discarded, but are instead projected onto the space spanned by the previous online hypothesis. We call this algorithm Projectron. While the memory size of the Projectron solution cannot be predicted before training, we prove that its solution is guaranteed to be bounded. We derive a relative mistake bound for the proposed algorithm, and deduce from it a slightly different algorithm which outperforms the Perceptron. We call this second algorithm Projectron++. We show that this algorithm can be extended to handle the multiclass and the structured output settings, resulting, as far as we know, in the ﬁrst online bounded algorithm that can learn complex classiﬁcation tasks. The method of bounding the hypothesis representation can be applied to any conservative online algorithm and to other online algorithms, as it is demonstrated for ALMA2 . Experimental results on various data sets show the empirical advantage of our technique compared to various bounded online algorithms, both in terms of memory and accuracy. Keywords: online learning, kernel methods, support vector machines, bounded support set

6 0.2833561 54 jmlr-2009-Markov Properties for Linear Causal Models with Correlated Errors (Special Topic on Causality)

7 0.25259811 19 jmlr-2009-Controlling the False Discovery Rate of the Association Causality Structure Learned with the PC Algorithm (Special Topic on Mining and Learning with Graphs and Relations)

8 0.24159145 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models

9 0.22219646 89 jmlr-2009-Strong Limit Theorems for the Bayesian Scoring Criterion in Bayesian Networks

10 0.22218063 94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs

11 0.2211751 85 jmlr-2009-Settable Systems: An Extension of Pearl's Causal Model with Optimization, Equilibrium, and Learning

12 0.2169608 71 jmlr-2009-Perturbation Corrections in Approximate Inference: Mixture Modelling Applications

13 0.21681486 78 jmlr-2009-Refinement of Reproducing Kernels

14 0.2155142 80 jmlr-2009-Reproducing Kernel Banach Spaces for Machine Learning

15 0.21502855 9 jmlr-2009-Analysis of Perceptron-Based Active Learning

16 0.21328399 47 jmlr-2009-Learning Linear Ranking Functions for Beam Search with Application to Planning

17 0.20994189 22 jmlr-2009-Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods

18 0.20869091 62 jmlr-2009-Nonlinear Models Using Dirichlet Process Mixtures

19 0.20839296 97 jmlr-2009-Ultrahigh Dimensional Feature Selection: Beyond The Linear Model

20 0.20720047 82 jmlr-2009-Robustness and Regularization of Support Vector Machines