jmlr jmlr2009 jmlr2009-74 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tyler J. VanderWeele, James M. Robins
Abstract: Various relationships are shown hold between monotonic effects and weak monotonic effects and the monotonicity of certain conditional expectations. Counterexamples are provided to show that the results do not hold under less restrictive conditions. Monotonic effects are furthermore used to relate signed edges on a causal directed acyclic graph to qualitative effect modification. The theory is applied to an example concerning the direct effect of smoking on cardiovascular disease controlling for hypercholesterolemia. Monotonicity assumptions are used to construct a test for whether there is a variable that confounds the relationship between the mediator, hypercholesterolemia, and the outcome, cardiovascular disease. Keywords: Bayesian networks, conditional expectation, covariance, directed acyclic graphs, effect modification, monotonicity
Reference: text
sentIndex sentText sentNum sentScore
1 EDU Departments of Epidemiology and Biostatistics Harvard School of Public Health Boston, MA 02115, USA Editor: Peter Spirtes Abstract Various relationships are shown hold between monotonic effects and weak monotonic effects and the monotonicity of certain conditional expectations. [sent-6, score-0.965]
2 Monotonic effects are furthermore used to relate signed edges on a causal directed acyclic graph to qualitative effect modification. [sent-8, score-0.837]
3 Keywords: Bayesian networks, conditional expectation, covariance, directed acyclic graphs, effect modification, monotonicity 1. [sent-11, score-0.572]
4 Introduction Several papers have considered various monotonicity relationships on Bayesian networks or directed acyclic graphs. [sent-12, score-0.476]
5 VanderWeele and Robins (2009) introduced the concept of a monotonic effect which is closely related to Wellman’s qualitative influence and considered the relationship between monotonicity properties and causal effects, covariance, bias and confounding. [sent-17, score-0.781]
6 In this paper we develop a number of probabilistic properties concerning monotonic effects and weak monotonic effects. [sent-18, score-0.813]
7 In Section 4, we define the concepts of a monotonic effect and a weak monotonic effect in the directed acyclic graph causal framework, the latter essentially being equivalent to Wellman’s (1990) qualitative influence. [sent-27, score-1.599]
8 In Section 5, we give a number of results relating weak monotonic effects to the monotonicity in the conditioning argument of certain conditional expectations; we also return to the motivating example and show how the theory developed can be applied to this example. [sent-28, score-0.64]
9 Finally, in Section 6, we give a number of results that relate weak monotonic effects to the existence of qualitative effect modifiers. [sent-29, score-0.662]
10 Notation and Directed Acyclic Graphs Following Pearl (1995), a causal directed acyclic graph is a set of nodes (X1 , . [sent-32, score-0.568]
11 , Xn ) and directed edges amongst nodes such that the graph has no cycles and such that for each node Xi on the graph the corresponding variable is given by its non-parametric structural equation Xi = fi (pai , εi ) where pai are the parents of Xi on the graph and the εi are mutually independent. [sent-35, score-0.551]
12 Further discussion of the causal interpretation of directed acyclic graphs can be found elsewhere (Pearl, 1995, 2000; Spirtes et al. [sent-42, score-0.509]
13 A path is a sequence of nodes connected by edges regardless of arrowhead direction; a directed path is a path which follows the edges in the direction indicated by the graph’s arrows. [sent-44, score-0.609]
14 A node C is said to be a common cause of A and Y if there exists a directed path from C to Y not through A and a directed path from C to A not through Y . [sent-45, score-0.698]
15 A collider is a particular node on a path such that both the preceding and subsequent nodes on the path have directed edges going into that node, that is, both the edge to and the edge from that node have arrowheads into the node. [sent-50, score-0.814]
16 A path between A and B is said to be blocked given some set of variables Z if either there is a variable in Z on the path that is not a collider or if there is a collider on the path such that neither the collider itself nor any of its descendants are in Z. [sent-51, score-0.917]
17 The directed acyclic graph causal framework has proven to be particularly useful in determining whether conditioning on a given set of variables, or none at all, is sufficient to control for confounding. [sent-57, score-0.603]
18 A back-door path from some node A to another node Y is a path which begins with a directed edge into A. [sent-59, score-0.548]
19 Suppose that Figure 1 represents a causal directed acyclic graph. [sent-66, score-0.491]
20 On the Definition of a Monotonic Effect The definition of a monotonic effect is given in terms of a directed acyclic graph’s nonparametric structural equations. [sent-80, score-0.817]
21 Similarly A is said to have a negative monotonic effect on Y if for all paY and εY , f (paY , A1 , εY ) ≤ f (paY , A2 , εY ) whenever A1 ≥ A2 . [sent-83, score-0.469]
22 As we have defined it above, a causal direct acyclic graph corresponds to a set of non-parametric structural equations and as such the definition of a monotonic effect given above is relative to a particular set of non-parametric structural equations. [sent-84, score-0.814]
23 The presence of a monotonic effect is closely related to the monotonicity of counterfactual variables as is made clear by the following proposition. [sent-85, score-0.56]
24 In the context of characterizations of causal directed acyclic graphs that make reference to counterfactuals but not to non-parametric structural equations (Robins, 2003), a positive monotonic effect could instead be defined to be present if for all paY and a1 ≥ a0 , P(Ya1 ,paY ≥ Ya0 ,paY ) = 1. [sent-93, score-1.005]
25 We thus introduce the concept of a weak monotonic effect which is a special case of Wellman’s positive qualitative influence (Wellman, 1990). [sent-98, score-0.63]
26 The definition of a weak monotonic effect does not make reference to counterfactuals and thus can be used in characterizations of causal directed acyclic graphs that do not employ the concept of counterfactuals (Spirtes et al. [sent-99, score-1.095]
27 If A has a positive monotonic effect on Y then A has a weak positive monotonic effect on Y . [sent-106, score-0.942]
28 We note that for parent A and child Y , the definition of a weak monotonic effect coincides with Wellman’s (1990) definition of positive qualitative influence when the ”context” for qualitative influence is chosen to be the parents of Y other than A. [sent-107, score-0.848]
29 A monotonic effect is a relation between two nodes on a directed acyclic graph and as such it is associated with an edge. [sent-108, score-0.865]
30 The definition of the sign of an edge can be given either in terms of monotonic effects or weak monotonic effects. [sent-109, score-0.913]
31 We can define the sign of an edge as the sign of the monotonic effect or weak monotonic effect to which the edge corresponds; this in turn gives rise to a natural definition for the sign of a path. [sent-110, score-1.24]
32 An edge on a causal directed acyclic graph from X to Y is said to be of positive sign if X has a positive monotonic effect on Y . [sent-112, score-1.129]
33 An edge from X to Y is said to be of negative sign if X has a negative monotonic effect on Y . [sent-113, score-0.59]
34 If X has neither a positive monotonic effect nor a negative monotonic effect on Y , then the edge from X to Y is said to be without a sign. [sent-114, score-0.964]
35 The sign of a path on a causal directed acyclic graph is the product of the signs of the edges that constitute that path. [sent-116, score-0.721]
36 We will call a causal directed acyclic graph with signs on those edges which allow them a signed causal directed acyclic graph. [sent-118, score-1.087]
37 The theorems in this paper are given in terms of signed paths so as to be applicable to both monotonic effects and weak monotonic effects. [sent-119, score-0.997]
38 For the conditional expectation of some variable Y to be monotonic in a conditioning argument A, it requires that the conditioning set includes variables that block all backdoor paths from A to Y . [sent-140, score-0.901]
39 , Rm ) denote an ordered list of some set of nodes on directed paths from A to Y such that for each i the backdoor paths from Ri to Y are blocked by R1 , . [sent-147, score-1.041]
40 ,Vn−1 be an ordered list of all the nodes which are not in R but which are on directed paths from A to Y such that at least one of the directed paths from each node to Y is not blocked by R. [sent-154, score-1.056]
41 , Rm ) denote an ordered list of some set of nodes on directed paths from A to Y such that for each i the backdoor paths from Ri to Y are blocked by R1 , . [sent-167, score-1.041]
42 Lemma 3 which consists of the set of ancestors of A or Y which are not descendents of A with some other set X which consists of some set of non-descendents of A that blocks all backdoor paths from A to Y . [sent-185, score-0.547]
43 Propositions 5-8 relax the condition that the conditioning set includes variables that block all backdoor paths A to Y and impose certain other conditions; the proofs of each of these propositions make use of Proposition 4. [sent-186, score-0.554]
44 Propositions 7 and 8 consider the monotonicity of conditional expectations while conditioning on variables other than the variable in which monotonicity holds but not conditioning on variables that are sufficient to block all backdoor paths between A and Y . [sent-202, score-0.785]
45 If all directed paths from A to Y are of positive sign and all directed paths from C to A not through Q are of the same sign as all directed paths from C to Y not through {Q, A} then E[Y |A, Q] is non-decreasing in A. [sent-208, score-1.279]
46 If all directed paths from A to Y are of positive sign and all directed paths from C to A not through Q are of the same sign as all directed paths from C to Y not through {Q, A} then E[A|Y, Q] is non-decreasing in Y . [sent-214, score-1.279]
47 If all directed paths from A to Y (or from A to Y ) are of positive sign and all directed paths from C to A not through {Q,Y } are of the same sign as all directed paths from C to Y not through {Q, A} then E[Y |A, Q] is non-decreasing in Y. [sent-219, score-1.279]
48 Consider once again the causal directed acyclic graph given in Figure 1. [sent-227, score-0.539]
49 Suppose that we may assume that A has a weak monotonic effect on Y . [sent-228, score-0.512]
50 Let ni jq denote the number of individuals in stratum Q = q with A = i and R = j and let let di jq denote the number of individuals in stratum Q = q with A = i and R = j and Y = 1. [sent-239, score-1.062]
51 Let pi jq denote the true probability P(Y = 1|A = i, R = j, Q = q). [sent-240, score-0.488]
52 From the null hypothesis that U is absent from Figure 1, it follows by Proposition 4 that d jq d jq p1 jq − p0 jq ≤ 0 for all j and q. [sent-241, score-2.008]
53 Thus we have di jq ∼ Bin(ni jq , pi jq ) with E[ nii jq ] = pi jq and Var( nii jq ) = d d pi jq (1−pi jq ) . [sent-242, score-3.904]
54 ni jq By the central limit central limit theorem d sky’s theorem we have 1 jq 0 jq p (1−p ) p1 jq (1−p1 jq ) + 0 jq n 0 jq n1 jq 0 jq . [sent-243, score-4.392]
55 ∼ N(0, 1) and by Slut- d ( n1 jq − n0 jq )−(p1 jq −p0 jq ) 1 jq ( n1 jq − n0 jq )−(p1 jq −p0 jq ) . [sent-244, score-4.392]
56 Effect Modification and Monotonic Effects If when conditioning on a particular variable, the sign of the effect of another variable on the outcome varies between strata of the conditioning variable, then the conditioning variable is said to be a qualitative effect modifier. [sent-258, score-0.587]
57 Monotonic effects and weak monotonic effects are closely related to the concept of qualitative effect modification. [sent-263, score-0.694]
58 Essentially, the presence of a monotonic effect precludes the possibility of qualitative effect modification. [sent-264, score-0.639]
59 Suppose that some parent A1 of Y is such that the A1 − Y edge is of positive sign then there can be no other parent, A2 , of Y which is a qualitative effect modifier for causal effect of A1 on Y , either unconditionally or within some stratum C = c of the parents of Y other than A1 and A2 . [sent-267, score-0.723]
60 Suppose that all directed paths from A to Y are of positive sign (or are all of negative sign) then there exists no qualitative effect modifier Q on the directed acyclic graph for the causal effect of A on Y . [sent-274, score-1.284]
61 Consider the signed directed acyclic graph given in Figure 4 in which the A −Y edge is of positive sign. [sent-277, score-0.5]
62 However, by Proposition 9 or 10, since A has a (weak) monotonic effect on Y , none of Q1 , Q2 , Q3 , Q4 or Q5 can serve as qualitative effect modifiers for the causal effect of A on Y . [sent-279, score-0.863]
63 Conversely, if it is found that one of Q1 , Q2 , Q3 , Q4 or Q5 is a qualitative effect modifier for the causal effect of A on Y then the A −Y edge cannot be of positive (or negative) sign. [sent-280, score-0.498]
64 Concluding Remarks In this paper we have related weak monotonic effects to the monotonicity of certain conditional expectations in the conditioning argument and to qualitative effect modification. [sent-282, score-0.872]
65 When the variables on a causal directed acyclic graph exhibit weak monotonic effects the results can be used to construct tests for the presence of unmeasured confounding variables. [sent-283, score-1.083]
66 5 Proof of Lemma 2 We will say a path from A to B is a frontdoor path from A to B if the path begins with a directed edge with the arrowhead pointing out of A. [sent-298, score-0.659]
67 Thus all frontdoor paths from Rm to Vk will be blocked given A,V1 , . [sent-320, score-0.458]
68 All backdoor paths from Rm to Vk with an edge going into Vk will be blocked given A,V1 , . [sent-327, score-0.708]
69 ,Vk−1 , Q, Rk since there is a directed path from Vk to Y and Q includes all ancestors of Y not on directed paths from A to Y . [sent-336, score-0.73]
70 All backdoor paths from Rm to Vk with an edge going out from Vk will be blocked given A, Q, R1 , . [sent-337, score-0.708]
71 , Rm−1 by hypothesis; otherwise there would be a backdoor path from Rm through Vk to Y not blocked by A, Q, R1 , . [sent-340, score-0.545]
72 But all backdoor paths from Rm to Vk with an edge going out from Vk which are blocked by A, Q, R1 , . [sent-344, score-0.708]
73 ,Vk−1 were a collider then the path would in fact be blocked by the parents of the collider since all the parents of V1 , . [sent-370, score-0.7]
74 ,Vk−1 , say Vp , were a descendent of the collider then none of the directed paths from the collider to Vp could contain nodes in R1 , . [sent-383, score-0.747]
75 ,Vp−1 since it is an ancestor of Vp with a directed path to Vp not blocked by R. [sent-396, score-0.523]
76 ,Vp−1 then the path would in fact be blocked by the parents of the collider since all the parents of V1 , . [sent-400, score-0.576]
77 From this it follows that all backdoor paths from Rm to Vk with an edge going out from Vk are blocked by A,V1 , . [sent-410, score-0.708]
78 All backdoor paths from Vk to Q\Qk will be blocked given A,V1 , . [sent-459, score-0.623]
79 Since Vk is not a descendent of Q\Qk all frontdoor paths from Vk to Q\Qk will involve at least one collider which is a descendent of Vk . [sent-463, score-0.532]
80 ,Vn−1 be an ordered list of all the nodes which are not in R but which are on directed paths from A to Y such that at least one of the directed paths from each node to Y is not blocked by R. [sent-499, score-1.056]
81 , n − 1 since Vi has either a weak positive monotonic effect or no effect on Vn . [sent-515, score-0.603]
82 , vn−2 , q, r since Vi has either a weak positive monotonic effect or no effect on Vn−1 . [sent-529, score-0.603]
83 |A,V1 , Q, R] is a non-decreasing function of v1 and v0 = a and since A has either a weak positive monotonic effect or no effect on V1 , S(v1 |a, q, r) = S(v1 |pav1 ) will be non-decreasing in a and thus by Lemma 1, S(y|a, q, r) = E[1(Vn > y)|A, Q, R] = E[E[. [sent-540, score-0.603]
84 Note that if for each i the backdoor paths from Ri to Y are blocked by R1 , . [sent-549, score-0.623]
85 , Ri−1 , A and X then these backdoor paths will also be blocked by R1 , . [sent-552, score-0.623]
86 Since Q blocks all backdoor paths from A to Y we have S(y|a, x, r) = E[E[1(Y > y)|a, Q, x, r]|a, x, r] = E[E[1(Y > y)|a, Q, r]|a, x, r] = E[E[1(Y > y)|a,W, r]|a, x, r] where W is the subset of Q which are either parents of Y or parents of a node on a directed path from A to Y . [sent-557, score-0.929]
87 All backdoor paths from A to W ′ are blocked given R and X by X since X blocks all backdoor paths from A to Y . [sent-559, score-1.062]
88 Suppose the collider were some node Ri ; by hypothesis all backdoor paths from Ri to Y are blocked by R1 , . [sent-562, score-0.784]
89 , Ri−1 and X for otherwise there would be a backdoor path from Ri through W ′ to Y not blocked by A, R1 , . [sent-568, score-0.545]
90 From this it follows that every frontdoor path from A to W ′ must be blocked given R and X either by a collider or by a node in R or X. [sent-572, score-0.541]
91 12 Proof of Proposition 9 Note that by Proposition 3 above if A1 has a weak positive monotonic effect on Y then E[Y |A1 = a1 , A2 = a2 ,C = c] must be non-decreasing in a1 and if A1 has a weak negative monotonic effect on Y then E[Y |A1 = a1 , A2 = a2 ,C = c] must be non-increasing in a1 . [sent-604, score-1.024]
92 The proof for weak negative monotonic effects is similar. [sent-615, score-0.453]
93 Let C denote all non-descendents of A which are either parents of Y or parents of a node on a directed path between A and Y . [sent-616, score-0.49]
94 ∏ to show that (Y Q|C, A)GA where GA denotes the graph obtained by deleting from the original directed acyclic graph all arrows pointing into A. [sent-620, score-0.454]
95 Any backdoor path from Y to Q in GA will be blocked by C. [sent-622, score-0.545]
96 Since C will block all backdoor paths from A to Y we have by the backdoor path adjustment theorem ∑c E[Y |C = c, A = a1 ]P(C = c|Q = q) − ∑c E[Y |C = c, A = a0 ]P(C = c|Q = q) = ∑c {E[Y |C = c, A = a1 ] − E[Y |C = c, A = a0 ]}P(C = c|Q = q). [sent-624, score-0.744]
97 But since all paths between A and Y are of positive sign and since C blocks all backdoor paths from A to Y we have by Proposition 4 that E[Y |C = c, A = a] is non-decreasing in a and so E[YA=a1 |Q = q0 ] − E[YA=a0 |Q = q0 ] = ∑c {E[Y |C = c, A = a1 ] − E[Y |C = c, A = a0 ]}P(C = c|Q = q0 ) ≥ 0. [sent-626, score-0.671]
98 Clearly then C has a positive monotonic effect on A and on Y and A has a positive monotonic effect on Y and so A and Y are positively monotonically associated. [sent-636, score-0.937]
99 Clearly then C has a positive monotonic effect on A and on Y and A has a positive monotonic effect on Y and so A and Y are positively monotonically associated. [sent-645, score-0.937]
100 Four types of effect modification, a classification based on directed acyclic graphs. [sent-722, score-0.449]
wordName wordTfidf (topN-words)
[('jq', 0.488), ('monotonic', 0.339), ('vk', 0.305), ('backdoor', 0.235), ('directed', 0.213), ('blocked', 0.212), ('paths', 0.176), ('robins', 0.148), ('acyclic', 0.145), ('causal', 0.133), ('pay', 0.13), ('proposition', 0.125), ('collider', 0.124), ('qualitative', 0.118), ('ya', 0.115), ('wellman', 0.113), ('monotonicity', 0.1), ('path', 0.098), ('effect', 0.091), ('vn', 0.091), ('eele', 0.087), ('vanderw', 0.087), ('vanderweele', 0.087), ('weak', 0.082), ('descendent', 0.081), ('propositions', 0.079), ('qk', 0.079), ('descendents', 0.078), ('onotonic', 0.078), ('rm', 0.075), ('parents', 0.071), ('frontdoor', 0.07), ('henrion', 0.07), ('druzdzel', 0.066), ('edge', 0.065), ('conditioning', 0.064), ('pavk', 0.061), ('roperties', 0.06), ('cardiovascular', 0.06), ('ffects', 0.06), ('pearl', 0.056), ('sign', 0.056), ('tian', 0.055), ('rk', 0.054), ('graph', 0.048), ('ri', 0.048), ('unmeasured', 0.046), ('qc', 0.046), ('kang', 0.046), ('confounding', 0.045), ('counterexamples', 0.044), ('hypercholesterolemia', 0.043), ('stratum', 0.043), ('tyler', 0.043), ('positively', 0.042), ('said', 0.039), ('counterfactuals', 0.037), ('vp', 0.037), ('node', 0.037), ('monotonically', 0.035), ('smoking', 0.033), ('effects', 0.032), ('judea', 0.032), ('ancestors', 0.03), ('counterfactual', 0.03), ('structural', 0.029), ('signed', 0.029), ('nodes', 0.029), ('parent', 0.029), ('absent', 0.028), ('vi', 0.028), ('blocks', 0.028), ('edges', 0.028), ('null', 0.028), ('ci', 0.027), ('unconditionally', 0.026), ('df', 0.025), ('james', 0.024), ('suppose', 0.024), ('expectations', 0.023), ('conditional', 0.023), ('qd', 0.022), ('ber', 0.022), ('integration', 0.022), ('concerning', 0.021), ('disease', 0.021), ('modi', 0.02), ('spirtes', 0.02), ('going', 0.02), ('lemma', 0.02), ('qn', 0.02), ('regularity', 0.019), ('verma', 0.018), ('auai', 0.018), ('ga', 0.018), ('graphs', 0.018), ('relationships', 0.018), ('arrowhead', 0.017), ('confounds', 0.017), ('gaag', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999923 74 jmlr-2009-Properties of Monotonic Effects on Directed Acyclic Graphs
Author: Tyler J. VanderWeele, James M. Robins
Abstract: Various relationships are shown hold between monotonic effects and weak monotonic effects and the monotonicity of certain conditional expectations. Counterexamples are provided to show that the results do not hold under less restrictive conditions. Monotonic effects are furthermore used to relate signed edges on a causal directed acyclic graph to qualitative effect modification. The theory is applied to an example concerning the direct effect of smoking on cardiovascular disease controlling for hypercholesterolemia. Monotonicity assumptions are used to construct a test for whether there is a variable that confounds the relationship between the mediator, hypercholesterolemia, and the outcome, cardiovascular disease. Keywords: Bayesian networks, conditional expectation, covariance, directed acyclic graphs, effect modification, monotonicity
2 0.095758662 72 jmlr-2009-Polynomial-Delay Enumeration of Monotonic Graph Classes
Author: Jan Ramon, Siegfried Nijssen
Abstract: Algorithms that list graphs such that no two listed graphs are isomorphic, are important building blocks of systems for mining and learning in graphs. Algorithms are already known that solve this problem efficiently for many classes of graphs of restricted topology, such as trees. In this article we introduce the concept of a dense augmentation schema, and introduce an algorithm that can be used to enumerate any class of graphs with polynomial delay, as long as the class of graphs can be described using a monotonic predicate operating on a dense augmentation schema. In practice this means that this is the first enumeration algorithm that can be applied theoretically efficiently in any frequent subgraph mining algorithm, and that this algorithm generalizes to situations beyond the standard frequent subgraph mining setting. Keywords: graph mining, enumeration, monotonic graph classes
3 0.083480634 54 jmlr-2009-Markov Properties for Linear Causal Models with Correlated Errors (Special Topic on Causality)
Author: Changsung Kang, Jin Tian
Abstract: A linear causal model with correlated errors, represented by a DAG with bi-directed edges, can be tested by the set of conditional independence relations implied by the model. A global Markov property specifies, by the d-separation criterion, the set of all conditional independence relations holding in any model associated with a graph. A local Markov property specifies a much smaller set of conditional independence relations which will imply all other conditional independence relations which hold under the global Markov property. For DAGs with bi-directed edges associated with arbitrary probability distributions, a local Markov property is given in Richardson (2003) which may invoke an exponential number of conditional independencies. In this paper, we show that for a class of linear structural equation models with correlated errors, there is a local Markov property which will invoke only a linear number of conditional independence relations. For general linear models, we provide a local Markov property that often invokes far fewer conditional independencies than that in Richardson (2003). The results have applications in testing linear structural equation models with correlated errors. Keywords: Markov properties, linear causal models, linear structural equation models, graphical models
Author: Jose M. Peña, Roland Nilsson, Johan Björkegren, Jesper Tegnér
Abstract: We present a sound and complete graphical criterion for reading dependencies from the minimal undirected independence map G of a graphoid M that satisfies weak transitivity. Here, complete means that it is able to read all the dependencies in M that can be derived by applying the graphoid properties and weak transitivity to the dependencies used in the construction of G and the independencies obtained from G by vertex separation. We argue that assuming weak transitivity is not too restrictive. As an intermediate step in the derivation of the graphical criterion, we prove that for any undirected graph G there exists a strictly positive discrete probability distribution with the prescribed sample spaces that is faithful to G. We also report an algorithm that implements the graphical criterion and whose running time is considered to be at most O(n2 (e + n)) for n nodes and e edges. Finally, we illustrate how the graphical criterion can be used within bioinformatics to identify biologically meaningful gene dependencies. Keywords: graphical models, vertex separation, graphoids, weak transitivity, bioinformatics
5 0.078717001 17 jmlr-2009-Computing Maximum Likelihood Estimates in Recursive Linear Models with Correlated Errors
Author: Mathias Drton, Michael Eichler, Thomas S. Richardson
Abstract: In recursive linear models, the multivariate normal joint distribution of all variables exhibits a dependence structure induced by a recursive (or acyclic) system of linear structural equations. These linear models have a long tradition and appear in seemingly unrelated regressions, structural equation modelling, and approaches to causal inference. They are also related to Gaussian graphical models via a classical representation known as a path diagram. Despite the models’ long history, a number of problems remain open. In this paper, we address the problem of computing maximum likelihood estimates in the subclass of ‘bow-free’ recursive linear models. The term ‘bow-free’ refers to the condition that the errors for variables i and j be uncorrelated if variable i occurs in the structural equation for variable j. We introduce a new algorithm, termed Residual Iterative Conditional Fitting (RICF), that can be implemented using only least squares computations. In contrast to existing algorithms, RICF has clear convergence properties and yields exact maximum likelihood estimates after the first iteration whenever the MLE is available in closed form. Keywords: linear regression, maximum likelihood estimation, path diagram, structural equation model, recursive semi-Markov model, residual iterative conditional fitting
6 0.071247295 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models
7 0.054763872 11 jmlr-2009-Bayesian Network Structure Learning by Recursive Autonomy Identification
8 0.047585886 44 jmlr-2009-Learning Acyclic Probabilistic Circuits Using Test Paths
9 0.04007408 89 jmlr-2009-Strong Limit Theorems for the Bayesian Scoring Criterion in Bayesian Networks
10 0.039595783 41 jmlr-2009-Improving the Reliability of Causal Discovery from Small Data Sets Using Argumentation (Special Topic on Causality)
11 0.032989413 85 jmlr-2009-Settable Systems: An Extension of Pearl's Causal Model with Optimization, Equilibrium, and Learning
12 0.030826541 90 jmlr-2009-Structure Spaces
14 0.02596006 91 jmlr-2009-Subgroup Analysis via Recursive Partitioning
15 0.023216035 12 jmlr-2009-Bi-Level Path Following for Cross Validated Solution of Kernel Quantile Regression
16 0.022022838 100 jmlr-2009-When Is There a Representer Theorem? Vector Versus Matrix Regularizers
17 0.020514987 33 jmlr-2009-Exploring Strategies for Training Deep Neural Networks
18 0.020312574 71 jmlr-2009-Perturbation Corrections in Approximate Inference: Mixture Modelling Applications
19 0.018805742 18 jmlr-2009-Consistency and Localizability
20 0.018650236 23 jmlr-2009-Discriminative Learning Under Covariate Shift
topicId topicWeight
[(0, 0.106), (1, -0.053), (2, 0.091), (3, 0.146), (4, -0.025), (5, -0.063), (6, -0.251), (7, -0.136), (8, 0.122), (9, -0.048), (10, -0.223), (11, 0.05), (12, 0.082), (13, 0.07), (14, -0.004), (15, -0.153), (16, 0.007), (17, -0.016), (18, -0.046), (19, -0.078), (20, -0.004), (21, 0.013), (22, 0.002), (23, -0.114), (24, -0.015), (25, 0.037), (26, -0.132), (27, 0.108), (28, 0.054), (29, 0.009), (30, 0.144), (31, 0.039), (32, -0.154), (33, -0.061), (34, 0.112), (35, -0.042), (36, -0.082), (37, 0.093), (38, -0.035), (39, -0.113), (40, 0.088), (41, -0.068), (42, -0.01), (43, 0.047), (44, -0.071), (45, 0.092), (46, -0.014), (47, 0.076), (48, 0.21), (49, 0.135)]
simIndex simValue paperId paperTitle
same-paper 1 0.98303008 74 jmlr-2009-Properties of Monotonic Effects on Directed Acyclic Graphs
Author: Tyler J. VanderWeele, James M. Robins
Abstract: Various relationships are shown hold between monotonic effects and weak monotonic effects and the monotonicity of certain conditional expectations. Counterexamples are provided to show that the results do not hold under less restrictive conditions. Monotonic effects are furthermore used to relate signed edges on a causal directed acyclic graph to qualitative effect modification. The theory is applied to an example concerning the direct effect of smoking on cardiovascular disease controlling for hypercholesterolemia. Monotonicity assumptions are used to construct a test for whether there is a variable that confounds the relationship between the mediator, hypercholesterolemia, and the outcome, cardiovascular disease. Keywords: Bayesian networks, conditional expectation, covariance, directed acyclic graphs, effect modification, monotonicity
2 0.53954339 72 jmlr-2009-Polynomial-Delay Enumeration of Monotonic Graph Classes
Author: Jan Ramon, Siegfried Nijssen
Abstract: Algorithms that list graphs such that no two listed graphs are isomorphic, are important building blocks of systems for mining and learning in graphs. Algorithms are already known that solve this problem efficiently for many classes of graphs of restricted topology, such as trees. In this article we introduce the concept of a dense augmentation schema, and introduce an algorithm that can be used to enumerate any class of graphs with polynomial delay, as long as the class of graphs can be described using a monotonic predicate operating on a dense augmentation schema. In practice this means that this is the first enumeration algorithm that can be applied theoretically efficiently in any frequent subgraph mining algorithm, and that this algorithm generalizes to situations beyond the standard frequent subgraph mining setting. Keywords: graph mining, enumeration, monotonic graph classes
3 0.48291942 44 jmlr-2009-Learning Acyclic Probabilistic Circuits Using Test Paths
Author: Dana Angluin, James Aspnes, Jiang Chen, David Eisenstat, Lev Reyzin
Abstract: We define a model of learning probabilistic acyclic circuits using value injection queries, in which fixed values are assigned to an arbitrary subset of the wires and the value on the single output wire is observed. We adapt the approach of using test paths from the Circuit Builder algorithm (Angluin et al., 2009) to show that there is a polynomial time algorithm that uses value injection queries to learn acyclic Boolean probabilistic circuits of constant fan-in and log depth. We establish upper and lower bounds on the attenuation factor for general and transitively reduced Boolean probabilistic circuits of test paths versus general experiments. We give computational evidence that a polynomial time learning algorithm using general value injection experiments may not do much better than one using test paths. For probabilistic circuits with alphabets of size three or greater, we show that the test path lemmas (Angluin et al., 2009, 2008b) fail utterly. To overcome this obstacle, we introduce function injection queries, in which the values on a wire may be mapped to other values rather than just to themselves or constants, and prove a generalized test path lemma for this case. Keywords: nonadaptive learning algorithms, probabilistic circuits, causal Bayesian networks, value injection queries, test paths
Author: Jose M. Peña, Roland Nilsson, Johan Björkegren, Jesper Tegnér
Abstract: We present a sound and complete graphical criterion for reading dependencies from the minimal undirected independence map G of a graphoid M that satisfies weak transitivity. Here, complete means that it is able to read all the dependencies in M that can be derived by applying the graphoid properties and weak transitivity to the dependencies used in the construction of G and the independencies obtained from G by vertex separation. We argue that assuming weak transitivity is not too restrictive. As an intermediate step in the derivation of the graphical criterion, we prove that for any undirected graph G there exists a strictly positive discrete probability distribution with the prescribed sample spaces that is faithful to G. We also report an algorithm that implements the graphical criterion and whose running time is considered to be at most O(n2 (e + n)) for n nodes and e edges. Finally, we illustrate how the graphical criterion can be used within bioinformatics to identify biologically meaningful gene dependencies. Keywords: graphical models, vertex separation, graphoids, weak transitivity, bioinformatics
5 0.42061037 54 jmlr-2009-Markov Properties for Linear Causal Models with Correlated Errors (Special Topic on Causality)
Author: Changsung Kang, Jin Tian
Abstract: A linear causal model with correlated errors, represented by a DAG with bi-directed edges, can be tested by the set of conditional independence relations implied by the model. A global Markov property specifies, by the d-separation criterion, the set of all conditional independence relations holding in any model associated with a graph. A local Markov property specifies a much smaller set of conditional independence relations which will imply all other conditional independence relations which hold under the global Markov property. For DAGs with bi-directed edges associated with arbitrary probability distributions, a local Markov property is given in Richardson (2003) which may invoke an exponential number of conditional independencies. In this paper, we show that for a class of linear structural equation models with correlated errors, there is a local Markov property which will invoke only a linear number of conditional independence relations. For general linear models, we provide a local Markov property that often invokes far fewer conditional independencies than that in Richardson (2003). The results have applications in testing linear structural equation models with correlated errors. Keywords: Markov properties, linear causal models, linear structural equation models, graphical models
6 0.31969288 11 jmlr-2009-Bayesian Network Structure Learning by Recursive Autonomy Identification
7 0.30445206 17 jmlr-2009-Computing Maximum Likelihood Estimates in Recursive Linear Models with Correlated Errors
8 0.30398676 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models
10 0.23440805 85 jmlr-2009-Settable Systems: An Extension of Pearl's Causal Model with Optimization, Equilibrium, and Learning
11 0.21262398 12 jmlr-2009-Bi-Level Path Following for Cross Validated Solution of Kernel Quantile Regression
12 0.20033951 90 jmlr-2009-Structure Spaces
14 0.12933099 91 jmlr-2009-Subgroup Analysis via Recursive Partitioning
15 0.11745475 16 jmlr-2009-Classification with Gaussians and Convex Loss
16 0.11122961 94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
17 0.11056601 47 jmlr-2009-Learning Linear Ranking Functions for Beam Search with Application to Planning
18 0.09846215 89 jmlr-2009-Strong Limit Theorems for the Bayesian Scoring Criterion in Bayesian Networks
19 0.097268164 61 jmlr-2009-Nonextensive Information Theoretic Kernels on Measures
20 0.096129812 46 jmlr-2009-Learning Halfspaces with Malicious Noise
topicId topicWeight
[(11, 0.127), (18, 0.487), (26, 0.016), (38, 0.018), (47, 0.011), (52, 0.027), (55, 0.037), (58, 0.03), (66, 0.076), (90, 0.037), (96, 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.80214703 74 jmlr-2009-Properties of Monotonic Effects on Directed Acyclic Graphs
Author: Tyler J. VanderWeele, James M. Robins
Abstract: Various relationships are shown hold between monotonic effects and weak monotonic effects and the monotonicity of certain conditional expectations. Counterexamples are provided to show that the results do not hold under less restrictive conditions. Monotonic effects are furthermore used to relate signed edges on a causal directed acyclic graph to qualitative effect modification. The theory is applied to an example concerning the direct effect of smoking on cardiovascular disease controlling for hypercholesterolemia. Monotonicity assumptions are used to construct a test for whether there is a variable that confounds the relationship between the mediator, hypercholesterolemia, and the outcome, cardiovascular disease. Keywords: Bayesian networks, conditional expectation, covariance, directed acyclic graphs, effect modification, monotonicity
2 0.31375891 17 jmlr-2009-Computing Maximum Likelihood Estimates in Recursive Linear Models with Correlated Errors
Author: Mathias Drton, Michael Eichler, Thomas S. Richardson
Abstract: In recursive linear models, the multivariate normal joint distribution of all variables exhibits a dependence structure induced by a recursive (or acyclic) system of linear structural equations. These linear models have a long tradition and appear in seemingly unrelated regressions, structural equation modelling, and approaches to causal inference. They are also related to Gaussian graphical models via a classical representation known as a path diagram. Despite the models’ long history, a number of problems remain open. In this paper, we address the problem of computing maximum likelihood estimates in the subclass of ‘bow-free’ recursive linear models. The term ‘bow-free’ refers to the condition that the errors for variables i and j be uncorrelated if variable i occurs in the structural equation for variable j. We introduce a new algorithm, termed Residual Iterative Conditional Fitting (RICF), that can be implemented using only least squares computations. In contrast to existing algorithms, RICF has clear convergence properties and yields exact maximum likelihood estimates after the first iteration whenever the MLE is available in closed form. Keywords: linear regression, maximum likelihood estimation, path diagram, structural equation model, recursive semi-Markov model, residual iterative conditional fitting
3 0.31276506 11 jmlr-2009-Bayesian Network Structure Learning by Recursive Autonomy Identification
Author: Raanan Yehezkel, Boaz Lerner
Abstract: We propose the recursive autonomy identification (RAI) algorithm for constraint-based (CB) Bayesian network structure learning. The RAI algorithm learns the structure by sequential application of conditional independence (CI) tests, edge direction and structure decomposition into autonomous sub-structures. The sequence of operations is performed recursively for each autonomous substructure while simultaneously increasing the order of the CI test. While other CB algorithms d-separate structures and then direct the resulted undirected graph, the RAI algorithm combines the two processes from the outset and along the procedure. By this means and due to structure decomposition, learning a structure using RAI requires a smaller number of CI tests of high orders. This reduces the complexity and run-time of the algorithm and increases the accuracy by diminishing the curse-of-dimensionality. When the RAI algorithm learned structures from databases representing synthetic problems, known networks and natural problems, it demonstrated superiority with respect to computational complexity, run-time, structural correctness and classification accuracy over the PC, Three Phase Dependency Analysis, Optimal Reinsertion, greedy search, Greedy Equivalence Search, Sparse Candidate, and Max-Min Hill-Climbing algorithms. Keywords: Bayesian networks, constraint-based structure learning
4 0.29682592 100 jmlr-2009-When Is There a Representer Theorem? Vector Versus Matrix Regularizers
Author: Andreas Argyriou, Charles A. Micchelli, Massimiliano Pontil
Abstract: We consider a general class of regularization methods which learn a vector of parameters on the basis of linear measurements. It is well known that if the regularizer is a nondecreasing function of the L2 norm, then the learned vector is a linear combination of the input data. This result, known as the representer theorem, lies at the basis of kernel-based methods in machine learning. In this paper, we prove the necessity of the above condition, in the case of differentiable regularizers. We further extend our analysis to regularization methods which learn a matrix, a problem which is motivated by the application to multi-task learning. In this context, we study a more general representer theorem, which holds for a larger class of regularizers. We provide a necessary and sufficient condition characterizing this class of matrix regularizers and we highlight some concrete examples of practical importance. Our analysis uses basic principles from matrix theory, especially the useful notion of matrix nondecreasing functions. Keywords: kernel methods, matrix learning, minimal norm interpolation, multi-task learning, regularization
5 0.29177266 13 jmlr-2009-Bounded Kernel-Based Online Learning
Author: Francesco Orabona, Joseph Keshet, Barbara Caputo
Abstract: A common problem of kernel-based online algorithms, such as the kernel-based Perceptron algorithm, is the amount of memory required to store the online hypothesis, which may increase without bound as the algorithm progresses. Furthermore, the computational load of such algorithms grows linearly with the amount of memory used to store the hypothesis. To attack these problems, most previous work has focused on discarding some of the instances, in order to keep the memory bounded. In this paper we present a new algorithm, in which the instances are not discarded, but are instead projected onto the space spanned by the previous online hypothesis. We call this algorithm Projectron. While the memory size of the Projectron solution cannot be predicted before training, we prove that its solution is guaranteed to be bounded. We derive a relative mistake bound for the proposed algorithm, and deduce from it a slightly different algorithm which outperforms the Perceptron. We call this second algorithm Projectron++. We show that this algorithm can be extended to handle the multiclass and the structured output settings, resulting, as far as we know, in the first online bounded algorithm that can learn complex classification tasks. The method of bounding the hypothesis representation can be applied to any conservative online algorithm and to other online algorithms, as it is demonstrated for ALMA2 . Experimental results on various data sets show the empirical advantage of our technique compared to various bounded online algorithms, both in terms of memory and accuracy. Keywords: online learning, kernel methods, support vector machines, bounded support set
6 0.2833561 54 jmlr-2009-Markov Properties for Linear Causal Models with Correlated Errors (Special Topic on Causality)
8 0.24159145 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models
9 0.22219646 89 jmlr-2009-Strong Limit Theorems for the Bayesian Scoring Criterion in Bayesian Networks
10 0.22218063 94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
11 0.2211751 85 jmlr-2009-Settable Systems: An Extension of Pearl's Causal Model with Optimization, Equilibrium, and Learning
12 0.2169608 71 jmlr-2009-Perturbation Corrections in Approximate Inference: Mixture Modelling Applications
13 0.21681486 78 jmlr-2009-Refinement of Reproducing Kernels
14 0.2155142 80 jmlr-2009-Reproducing Kernel Banach Spaces for Machine Learning
15 0.21502855 9 jmlr-2009-Analysis of Perceptron-Based Active Learning
16 0.21328399 47 jmlr-2009-Learning Linear Ranking Functions for Beam Search with Application to Planning
17 0.20994189 22 jmlr-2009-Deterministic Error Analysis of Support Vector Regression and Related Regularized Kernel Methods
18 0.20869091 62 jmlr-2009-Nonlinear Models Using Dirichlet Process Mixtures
19 0.20839296 97 jmlr-2009-Ultrahigh Dimensional Feature Selection: Beyond The Linear Model
20 0.20720047 82 jmlr-2009-Robustness and Regularization of Support Vector Machines