jmlr jmlr2008 jmlr2008-6 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xianchao Xie, Zhi Geng
Abstract: In this paper, we propose a recursive method for structural learning of directed acyclic graphs (DAGs), in which a problem of structural learning for a large DAG is first decomposed into two problems of structural learning for two small vertex subsets, each of which is then decomposed recursively into two problems of smaller subsets until no subset can be decomposed further. In our approach, the search for separators of a pair of variables in a large DAG is localized to small subsets, and thus the approach can improve the efficiency of searches and the power of statistical tests for structural learning. We show how recent advances in the learning of undirected graphical models can be employed to facilitate the decomposition. Simulations are given to demonstrate the performance of the proposed method. Keywords: Bayesian network, conditional independence, decomposition, directed acyclic graph, structural learning
Reference: text
sentIndex sentText sentNum sentScore
1 In a constraint-based method, search for separators of vertex pairs is a key issue for orientation of edges and for recovering DAG structures and causal relationships among variables. [sent-26, score-0.513]
2 In this paper, we propose a recursive algorithm in which a problem of structural learning for a large DAG is split recursively into problems of structural learning for small vertex subsets. [sent-33, score-0.71]
3 Our algorithm can be depicted as a binary tree whose top node is the full set of all vertices or variables and whose other nodes are proper subsets of the vertex set at its parent node. [sent-34, score-0.56]
4 At each step, the decomposition is achieved by learning an undirected graph, known as an independence graph, for a variable subset. [sent-37, score-0.812]
5 A directed edge from a vertex u to a vertex v is denoted by $\langle u, v\rangle$. [sent-55, score-0.705]
6 We say that u is a parent of v and v is a child of u if there is a directed edge $\langle u, v\rangle$, and denote the set of all parents of a vertex v by pa(v) and the set of all children of v by ch(v). [sent-57, score-0.424]
7 A path l between two distinct vertices u and v is a sequence of distinct vertices in which the first vertex is u, the last one is v, and every two consecutive vertices are connected by an edge, that is, $l = (c_0 = u, c_1, \ldots, c_n = v)$. [sent-59, score-0.83]
8 Two disjoint sets X and Y of vertices are d-separated by a set Z if Z d-separates every path from any vertex in X to any vertex in Y; we call Z a d-separator of X and Y. [sent-69, score-0.781]
9 Let $\bar{G}_V = (V, \bar{E}_V)$ denote an undirected graph, where $\bar{E}_V$ is a set of undirected edges. [sent-71, score-0.669]
10 An undirected edge between two vertices u and v is denoted by (u, v). [sent-72, score-0.536]
11 An undirected graph is called complete if any pair of vertices is connected by an edge. [sent-73, score-0.56]
12 Define the moral graph $\bar{G}^m_V$ for a DAG $G_V$ to be an undirected graph $\bar{G}^m_V = (V, \bar{E}^m_V)$ whose vertex set is V and whose edge set is constructed by marrying parents and dropping directions, that is, $\bar{E}^m_V = \{(u, v) : \langle u, v\rangle \text{ or } \langle v, u\rangle \in E_V\} \cup \{(u, v) : (u, w, v) \text{ forms a v-structure}\}$ (Lauritzen, 1996). [sent-74, score-0.976]
13 An undirected edge added for marrying parents is called a moral edge. [sent-75, score-0.453]
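To make the construction concrete, here is a minimal sketch in Python, assuming a DAG is represented as a dict mapping each vertex to the set of its parents; the representation and the name moral_graph are ours, not the paper's.

```python
import itertools

def moral_graph(pa):
    """Edge set of the moral graph of a DAG given as {vertex: set_of_parents}:
    drop directions on every directed edge and marry every pair of parents
    that share a child. Edges are frozensets of two vertices."""
    edges = set()
    for v, parents in pa.items():
        for u in parents:
            edges.add(frozenset((u, v)))          # drop directions
        for u, w in itertools.combinations(parents, 2):
            edges.add(frozenset((u, w)))          # moral edge: marry parents
    return edges

# Tiny example: a -> c, a -> d, b -> d; marrying the parents of d adds the
# moral edge (a, b).
pa = {'a': set(), 'b': set(), 'c': {'a'}, 'd': {'a', 'b'}}
print(moral_graph(pa))
```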
14 For an undirected graph, we say that vertices u and v are separated by a set of vertices Z if each path between u and v passes through Z. [sent-76, score-0.658]
15 We say that two disjoint vertex sets X and Y are separated by Z if Z separates every pair of vertices u and v for any u ∈ X and v ∈ Y . [sent-77, score-0.478]
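The moral graph also yields a concrete d-separation test via the classical reduction (Lauritzen, 1996), which is what Lemma 1 in the proofs below appeals to: X and Y are d-separated by Z in a DAG exactly when Z separates X and Y in the moral graph of the subgraph induced by the ancestral set An(X ∪ Y ∪ Z). A minimal sketch under the same parent-dict representation, with the moralization step inlined:

```python
from collections import deque

def ancestors(pa, vs):
    """An(vs): the vertices in vs together with all of their ancestors."""
    seen, stack = set(vs), list(vs)
    while stack:
        for u in pa.get(stack.pop(), ()):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen

def d_separated(pa, X, Y, Z):
    X, Y, Z = set(X), set(Y), set(Z)
    anc = ancestors(pa, X | Y | Z)
    adj = {v: set() for v in anc}
    for v in anc:                          # moralize the induced ancestral DAG
        parents = set(pa.get(v, ())) & anc
        for u in parents:
            adj[u].add(v)                  # drop directions
            adj[v].add(u)
        for u in parents:
            adj[u] |= parents - {u}        # marry parents of v
    frontier = deque(X - Z)                # undirected reachability avoiding Z
    seen = set(frontier)
    while frontier:
        v = frontier.popleft()
        if v in Y:
            return False                   # reached Y, so not d-separated
        for w in adj[v] - Z:
            if w not in seen:
                seen.add(w)
                frontier.append(w)
    return True
```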
16 For a set K ⊆ V, we say that an undirected graph $\bar{G}_K$ is an undirected independence graph for a DAG $G_V$ if the fact that a set Z separates X and Y in $\bar{G}_K$ implies that Z d-separates X and Y in $G_V$. [sent-80, score-1.002]
17 An undirected independence graph is minimal if the proper subgraph obtained by deleting any edge is no longer an undirected independence graph. [sent-81, score-1.126]
18 The moral graph $\bar{G}^m_V$ is the minimal undirected independence graph for $G_V$ with K = V (Lauritzen, 1996). [sent-82, score-0.778]
19 It can also be obtained by connecting each vertex u with all vertices in its Markov blanket Mb(u), which is the minimal set by which u is d-separated from the remaining set in V (that is, V \ [Mb(u) ∪ {u}]). [sent-83, score-0.552]
20 For a subset K ⊆ V , the Markov blanket for a vertex u ∈ K can be defined similarly, that is, it is the minimum set that is contained in K and d-separates u from the remaining set in K. [sent-84, score-0.424]
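When the DAG itself is known, Mb(u) for K = V can be read off directly as the parents, the children, and the other parents of u's children; a minimal sketch under the same parent-dict assumption:

```python
def markov_blanket(pa, u):
    """Mb(u) in a DAG {vertex: set_of_parents}: pa(u), ch(u), and the
    co-parents of u's children."""
    mb = set(pa.get(u, ()))                # parents of u
    for v, parents in pa.items():
        if u in parents:
            mb.add(v)                      # v is a child of u
            mb |= set(parents) - {u}       # co-parents of v
    return mb
```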
21 Define the local skeleton for a variable set K ⊆ V with respect to $G_V$ as an undirected graph $\bar{L}_K = (K, E)$, where K is the vertex set and E = {(u, v) : no subset S of K d-separates u and v in $G_V$} is the edge set. [sent-86, score-1.025]
22 Note that though both minimal undirected independence graphs and local skeletons are undirected graphs and defined on the same vertex subset, they may be different. [sent-87, score-1.25]
23 Thus the edge set of the minimal undirected independence graph contains the edge set of the local skeleton. [sent-89, score-0.813]
24 The global skeleton is an undirected graph obtained by dropping the directions of the edges in a DAG, and it coincides with the local skeleton for K = V. [sent-90, score-0.981]
25 [Figure 1: A directed graph, a moral graph, a decomposition and a local skeleton; panel (c) shows one decomposition based on $\bar{G}^m_V$ and panel (d) a local skeleton.] [sent-93, score-0.656]
26 A path l = (c, a, d) is d-separated by vertex a, while the path l′ = (c, f, h, g) is d-separated by an empty set. [sent-101, score-0.389]
27 The moral graph $\bar{G}^m_V$ is given in Figure 1 (b), where the edges (b, c), (b, g), (c, g), (c, d) and (f, g) are moral edges. [sent-104, score-0.411]
28 Note that the set {c, d} separates {a} and {b, e, f, g, h} in $\bar{G}^m_V$; thus ({a}, {b, e, f, g, h}, {c, d}) forms a decomposition of the undirected graph $\bar{G}^m_V$, and the decomposed undirected independence subgraphs for {a, c, d} and {b, c, d, e, f, g, h} are shown in Figure 1 (c). [sent-105, score-1.15]
29 The graph in Figure 1 (d) is the local skeleton $\bar{L}_K = (K, E)$ for K = {a, c, d} because c and d are d-separated by {a} in $G_V$. [sent-106, score-0.373]
30 Note that the minimal undirected independence graph for {a, c, d} in Figure 1(c) coincides with its local skeleton in Figure 1 (d), which does not hold in general. [sent-107, score-0.827]
31 For example, the local skeleton for K = {c, e, g} does not have the edge (c, g), while the corresponding minimal undirected independence graph is complete. [sent-108, score-0.924]
32 Let $X \perp Y$ denote the independence of X and Y, and $X \perp Y \mid Z$ the conditional independence of X and Y given Z. [sent-114, score-0.398]
33 We also discuss how to learn from data the undirected independence graphs which are used to achieve the recursive decomposition at each recursive step. [sent-120, score-1.03]
34 According to Theorem 1, we can see that all edges falling in A or crossing between A and C in the local skeleton $\bar{L}(K, E)$ with K = A ∪ C ∪ B can be validly recovered from the marginal distribution of the variables in A ∪ C. [sent-127, score-0.378]
35 These two theorems guarantee that, for any partition (A, B, C) of a vertex set K ⊆ V that satisfies $A \perp B \mid C$, two nonadjacent vertices u and v in K are d-separated by a subset S of K in $G_V$ if and only if they are d-separated by a subset S′ of either A ∪ C or B ∪ C in $G_V$. [sent-136, score-0.475]
36 Then the local skeleton $\bar{L}_K = (K, E_K)$ can be constructed by combining the local skeletons $\bar{L}_{A \cup C} = (A \cup C, E_{A \cup C})$ and $\bar{L}_{B \cup C} = (B \cup C, E_{B \cup C})$ as follows: (1) the vertex set K = A ∪ C ∪ B, and (2) the edge set $E_K = (E_{A \cup C} \cup E_{B \cup C}) \setminus \{(u, v) : u, v \in C \text{ and } (u, v) \notin E_{A \cup C} \cap E_{B \cup C}\}$. [sent-140, score-0.771]
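A minimal sketch of this combination rule, with each edge represented as a frozenset of its two endpoints (our convention): an edge with both endpoints inside the separator C survives only if it appears in both local skeletons.

```python
def combine_skeletons(E_ac, E_bc, C):
    """Combine the edge sets of the local skeletons over A ∪ C and B ∪ C;
    C is the separator as a set of vertices."""
    E = E_ac | E_bc
    inside_C = {e for e in E if e <= C}    # edges with both endpoints in C
    return (E - inside_C) | (inside_C & E_ac & E_bc)
```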
37 At the top-down step, a variable set is decomposed into two subsets whenever a conditional independence $A \perp B \mid C$ is found, and this decomposition is repeated until no new decomposition can be found. [sent-145, score-0.568]
38 The decomposition at each step is done by learning an undirected independence graph over the vertex subset at the tree node, which will be discussed in Subsection 3.3. [sent-146, score-1.014]
39 Main Algorithm (the recursive decomposition for structural learning of DAGs), Step 1. [sent-150, score-0.399]
40 For each d-separator $S_{uv}$ in the separator list S, orient the local skeleton u − w − v as a v-structure u → w ← v if u − w − v (note that there is no edge between u and v) appears in the global skeleton and w is not contained in the separator $S_{uv}$. [sent-155, score-0.916]
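A minimal sketch of this orientation rule, assuming the global skeleton is given as an adjacency map and the separator list as a dict keyed by the nonadjacent pair (both structures are our own illustration):

```python
def orient_v_structures(adj, separators):
    """adj: vertex -> set of neighbours in the global skeleton; separators:
    frozenset({u, v}) -> recorded d-separator S_uv for a nonadjacent pair.
    Returns directed pairs (x, w), meaning x -> w."""
    oriented = set()
    for pair, S in separators.items():
        u, v = tuple(pair)                 # u and v are nonadjacent
        for w in adj[u] & adj[v]:          # a structure u - w - v
            if w not in S:
                oriented.add((u, w))       # orient as u -> w <- v
                oriented.add((v, w))
    return oriented
```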
41 Construct an undirected independence graph $\bar{G}_K$; [sent-162, score-0.575]
42 For any vertex pair (u, v) in the set K, if there exists a subset $S_{uv}$ of K \ {u, v} such that $u \perp v \mid S_{uv}$, then delete the edge (u, v) and save (u, v, $S_{uv}$) to the d-separator list S. [sent-164, score-0.378]
43 Combine $\bar{L}_U = (U, E_U)$ and $\bar{L}_V = (V, E_V)$ into an undirected graph $\bar{L}_{U \cup V} = (U \cup V, E_{U \cup V})$, where $E_{U \cup V} = (E_U \cup E_V) \setminus \{(u, v) : u, v \in U \cap V \text{ and } (u, v) \notin E_U \cap E_V\}$. [sent-168, score-0.395]
44 Since a decomposition (A, B, C) of the undirected independence graph $\bar{G}_K$ implies $A \perp B \mid C$, it is obvious by Theorems 1 and 2 that our algorithm is correct. [sent-171, score-0.72]
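Pulling the steps together, a minimal sketch of the recursion; build_indep_graph, find_decomposition, and find_separator are hypothetical stand-ins for Step 1, the decomposition search, and the separator search respectively, and combine_skeletons is the combination rule sketched earlier.

```python
from itertools import combinations

def decomp_recovery(K, data, sep_list):
    """Sketch of Procedure DecompRecovery; returns the edge set of the local
    skeleton over K and records separators in sep_list for later orientation."""
    G_K = build_indep_graph(K, data)           # Step 1 (hypothetical helper)
    dec = find_decomposition(G_K)              # Step 2: (A, B, C) or None
    if dec is not None:
        A, B, C = dec
        E1 = decomp_recovery(A | C, data, sep_list)
        E2 = decomp_recovery(B | C, data, sep_list)
        return combine_skeletons(E1, E2, C)    # CombineSubgraphs step
    # 'Else' part: K cannot be decomposed; build its local skeleton directly.
    E = set()
    for u, v in combinations(sorted(K), 2):
        S = find_separator(u, v, K, data)      # search subsets of K \ {u, v}
        if S is None:
            E.add(frozenset((u, v)))
        else:
            sep_list[frozenset((u, v))] = S    # saved for the orientation step
    return E
```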
45 , 1987); therefore, we may use some sub-optimal method to construct a junction tree for an undirected graph (Jensen and Jensen, 1994; Becker and Geiger, 2001). [sent-177, score-0.476]
46 3.2 Tests of Conditional Independence: A conditional independence test of two variables u and v given a set C of variables is required at Step 1 and at the 'Else' part of Step 2 of Procedure DecompRecovery, to construct an undirected independence graph and a local skeleton respectively. [sent-187, score-1.04]
47 With the recursive decomposition, independence tests are localized to smaller and smaller subsets of variables, and thus the recursive algorithm has higher power for statistical tests. [sent-200, score-0.748]
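A standard concrete choice for these tests in the Gaussian case is the Fisher-z test of partial correlation; the following is a minimal sketch of ours, not necessarily the exact test used in the paper's experiments.

```python
import numpy as np
from scipy.stats import norm

def ci_test(X, i, j, cond, alpha=0.05):
    """Fisher-z test of column i independent of column j given columns cond,
    for an n-by-p Gaussian data matrix X; True means independence is accepted."""
    idx = [i, j] + list(cond)
    prec = np.linalg.pinv(np.corrcoef(X[:, idx], rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])    # partial correlation
    r = float(np.clip(r, -1 + 1e-12, 1 - 1e-12))
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(X.shape[0] - len(cond) - 3)
    return 2 * norm.sf(abs(z)) > alpha                    # p-value above level
```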
48 3.3 Constructing Undirected Independence Graphs: In this subsection, we discuss how to construct undirected independence graphs at Step 1 of Procedure DecompRecovery. [sent-202, score-0.537]
49 At first we call DecompRecovery with the full set V as the input argument, and construct an undirected independence graph $\bar{G}_V$ at Step 1. [sent-203, score-0.608]
50 To construct an undirected independence graph $\bar{G}_V$, we start with a complete undirected graph, and then we check the edge between each pair of vertices u and v. [sent-205, score-1.144]
51 For linear Gaussian models, the undirected graph can be constructed by removing an edge (u, v) if and only if the corresponding entry in the inverse covariance matrix is zero (Dempster, 1972; Whittaker, 1990). [sent-207, score-0.492]
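A minimal sketch of this Gaussian construction; since sample estimates are never exactly zero, an illustrative numerical cutoff (or a formal test such as the one above) stands in for the exact-zero condition.

```python
import numpy as np

def gaussian_indep_graph(X, tol=1e-2):
    """Keep the edge (u, v) iff the (u, v) entry of the estimated inverse
    covariance matrix of X (n-by-p) is not numerically zero; tol is ours."""
    prec = np.linalg.pinv(np.cov(X, rowvar=False))
    p = prec.shape[0]
    return {frozenset((u, v))
            for u in range(p) for v in range(u + 1, p)
            if abs(prec[u, v]) > tol}
```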
52 After decomposing a graph $\bar{G}_{A \cup B \cup C}$ into two subsets A ∪ C and B ∪ C, we need to construct a local undirected independence graph $\bar{G}_K$ (say $\bar{G}_{A \cup C}$) at Step 1 of Procedure DecompRecovery. [sent-208, score-0.636]
53 The former is used to determine an edge in a DAG, and the latter is used to determine an edge in an undirected independence graph. [sent-214, score-0.648]
54 According to this theorem, there exists an edge (u, v) in the minimal undirected independence graph $\bar{G}_{A \cup C}$ for u in A and v in A ∪ C if and only if there exists an edge (u, v) in the minimal undirected independence graph $\bar{G}_{A \cup B \cup C}$. [sent-215, score-1.344]
55 Thus, given an undirected independence graph $\bar{G}_{A \cup B \cup C}$ obtained in the preceding step, an undirected independence graph $\bar{G}_{A \cup C}$ has the same set of edges as $\bar{G}_{A \cup B \cup C}$, each of which has at least one vertex in A, but all possible edges within the separator C need to be checked for $\bar{G}_{A \cup C}$. [sent-216, score-1.805]
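A minimal sketch of this reuse rule; has_edge is a hypothetical conditional-independence-test helper that is applied only to pairs inside the separator C, since every edge with an endpoint in A carries over unchanged.

```python
from itertools import combinations

def subgraph_over(E_big, A, C, has_edge):
    """Build the independence graph over A ∪ C from the one over A ∪ B ∪ C."""
    K = A | C
    E = {e for e in E_big if e <= K and (e & A)}   # carried over unchanged
    for u, v in combinations(sorted(C), 2):        # recheck pairs within C
        if has_edge(u, v, K):
            E.add(frozenset((u, v)))
    return E
```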
56 Another way of learning undirected independence graphs is to apply currently available Markov blanket learning algorithms. [sent-220, score-0.571]
57 By connecting each vertex with those in its Markov blanket, an independence graph is then obtained. [sent-221, score-0.621]
58 Another particular method for learning the undirected independence graph may use Lasso-type estimators (Tibshirani, 1996; Meinshausen and Bühlmann, 2006; Zhao and Yu, 2006; Wainwright et al. [sent-227, score-0.575]
59 Note that it is not necessary to learn neighborhoods exactly in our algorithm, and there may be extra edges in our undirected independence graph. [sent-232, score-0.61]
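A minimal sketch of such a neighbourhood-selection construction in the style of Meinshausen and Bühlmann (2006); the regularization level and the simple OR-rule for combining neighbourhoods are illustrative choices of ours.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_indep_graph(X, alpha=0.1):
    """Lasso-regress each column of the n-by-p matrix X on all other columns
    and connect it to the columns with nonzero coefficients (OR-rule)."""
    n, p = X.shape
    edges = set()
    for j in range(p):
        others = [k for k in range(p) if k != j]
        coef = Lasso(alpha=alpha).fit(X[:, others], X[:, j]).coef_
        edges |= {frozenset((j, k)) for k, c in zip(others, coef) if c != 0}
    return edges
```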
60 We compare the recursive algorithm with the decomposition algorithm proposed in Xie et al. [sent-241, score-0.379]
61 (2006), in which an entire undirected independence graph is first constructed and then it is decomposed into many small subgraphs at one step instead of recursive steps. [sent-242, score-0.933]
62 We show that, in our algorithm, search for separators is localized to smaller vertex subsets than those obtained by using the decomposition algorithm. [sent-243, score-0.543]
63 At the top of the binary tree, the first decomposition is done by splitting the full vertex set V in $G_1$ (that is, the moral graph) into two subsets {a, c, d} and {b, c, d, e, f, g, h}. [sent-248, score-0.522]
64 Next we learn the undirected independence graphs G2 and G3 for the two subsets separately. [sent-252, score-0.547]
65 To construct the subgraphs G2 and G3 , by Theorem 5, we only need to check the edge (c, d) in the separator {c, d}, and other edges in G2 and G3 can be obtained directly from G1 . [sent-253, score-0.456]
66 For each vertex pair (u, v) in K, we search for a separator set $S_{uv}$ among all possible subsets of K \ {u, v} to construct the local skeleton. [sent-264, score-0.523]
67 For example, the vertices c and d are adjacent in the local skeleton over K1 since no vertex set in K1 d-separates them, whereas b and g are non-adjacent in the local skeleton over K2 since the empty set d-separates them in $G_V$. [sent-266, score-0.95]
68 We can see that the undirected independence graphs and the local skeletons are different, as shown in Figure 2 and Figure 4 respectively, and that the former have more edges than the latter. [sent-273, score-0.771]
69 For example, there is a d-separator {a} in S which d-separates c and d, and there is a structure c − f − d in the global skeleton $\bar{L}_V$ where f is not contained in the separator {a}. [sent-276, score-0.406]
70 In this equivalence class, the undirected edge (a, c) cannot be oriented uniquely because either of its orientations leads to a Markov equivalent DAG. [sent-282, score-0.443]
71 Below we compare the recursive algorithm with the decomposition algorithm proposed in Xie et al. [sent-283, score-0.379]
72 When there are a lot of v-structures in a DAG, many moral edges can be deleted in construction of a subgraph, and thus the recursive algorithm is more efficient than the decomposition algorithm. [sent-286, score-0.558]
73 (2006), a ‘d-separation tree’ is built from an undirected independence graph (that is, the moral graph in this example), and the full variable set is decomposed into three subsets of variables at one time; see Figure 8 (a). [sent-291, score-0.896]
74 By using the recursive algorithm proposed in this paper, we can decompose the graph into four subgraphs in Figure 8 (b), which have smaller subsets of variables. [sent-292, score-0.476]
75 This is because the undirected independence graph over {a, b, c} in Figure 8 (b) is re-constructed and the edge (b, c) is deleted for $b \perp c \mid a$. [sent-293, score-0.672]
76 We mainly focus on the number of conditional independence tests for constructing the equivalence class since decomposition of graphs is a computationally simple task compared to the conditional independence tests. [sent-671, score-0.689]
77 In the recursive algorithm DecompRecovery, two steps (Step 1 for constructing an undirected independence graph $\bar{G}_K$ and the 'Else' part of Step 2 for constructing a local skeleton $\bar{L}_K$) involve conditional independence tests, where K is the vertex set of the subgraph. [sent-672, score-1.56]
78 At Step 1, an undirected independence graph can be constructed by testing independence between any pair of variables conditionally on the other variables, and thus the complexity is $O(|K|^2)$, where |K| denotes the number of vertices in the set K. [sent-673, score-0.92]
79 As discussed in Subsection 3.3, an undirected independence graph $\bar{G}_{A \cup C}$ can be constructed from the previous graph $\bar{G}_{A \cup B \cup C}$ by checking only all possible edges within the separator C. [sent-675, score-0.944]
80 Thus the complexity for constructing an undirected independence graph can be reduced. [sent-676, score-0.575]
81 At Step 2, we construct a local skeleton over a vertex subset K. [sent-677, score-0.566]
82 Below we consider the total expenses and suppose that the full vertex set V is recursively decomposed into H subsets {K1, …, KH}. [sent-680, score-0.481]
83 For each decomposition, we need to construct an undirected independence graph, and thus the total expenses for all decompositions are less than $O(Hn^2)$. [sent-684, score-0.53]
84 [Table 4: Results relative to the recursive algorithm for other networks: extra edges, missing edges, and SHD.] … the complexity of our algorithm becomes the same as the IC algorithm, which reflects the fact that structural learning of DAGs is an NP-hard problem (Chickering et al. [sent-828, score-0.405]
85 (2005) requires that each separator induce a complete undirected subgraph. [sent-845, score-0.396]
86 Xie et al. (2006) removed the condition, but their algorithm performs decomposition only based on the entire undirected independence graph $\bar{G}_V$ of the full vertex set V and cannot perform decomposition of undirected independence subgraphs. [sent-847, score-1.571]
87 Theorems 1, 2 and 3 in this paper relax this requirement, and they do not require the union set K = A ∪ B ∪ C of a decomposition (A, B, C) to be equal to the full vertex set V. [sent-848, score-0.397]
88 Thus the recursive algorithm can delete more edges in undirected independence subgraphs and further decompose them, see Example 2. [sent-849, score-0.892]
89 Since An({u, v} ∪ S) = An({u, v}), we have from Lemma 1 that there is a path l connecting u and v in $[G_{An(\{u,v\})}]^m$ which is not separated by S in the moral graph, that is, the path l does not contain any vertex in S. [sent-876, score-0.51]
90 Thus we obtain that S and its subsets do not contain the middle vertex w or its descendants, which implies that l is d-separated by any subset of S at the collider s → w ← t. [sent-905, score-0.399]
91 Because W (⊇ S ) d-separates a and d and thus d-separates l but S does not, there must exist at least one vertex on the path l which is contained in W \ S (⊆ B). [sent-917, score-0.411]
92 Since every path li between c and c′ is d-separated by S, which equals S1 ∪ S2, we have that for each path li there is at least one vertex contained in S \ Si. [sent-954, score-0.521]
93 Below we show that l′ is not d-separated by C, that is, the middle vertex of each collider or its descendant is in C but none of the other vertices on l′ is in C. [sent-961, score-0.616]
94 For any vertex u which is not the middle vertex of a collider on l1, since u is in an(c) ∪ an(c′) and l1 is not d-separated by S1, we have that u ∉ S1 and thus u ∉ C. [sent-962, score-0.68]
95 Similarly, we can show that C does not contain any vertex u which is not the middle vertex of a collider on l2. [sent-963, score-0.68]
96 Thus we have shown that C does not contain any vertex which is not a middle vertex of a collider on l′, except that vertex c has not yet been considered. [sent-964, score-0.87]
97 Now we show that vertex c is a middle vertex of a collider on l′. [sent-965, score-0.68]
98 Thus we have shown that C does not contain any vertex which is not a middle vertex of a collider on l′. [sent-971, score-0.589]
99 For any vertex u which is a middle vertex of a collider on li, u or its descendant must be in Si; otherwise li, and so l′, are d-separated by Si, which contradicts the supposition. [sent-972, score-0.845]
100 Thus we have shown that the middle vertex of each collider on l′ or its descendant is in C. [sent-975, score-0.451]
wordName wordTfidf (topN-words)
[('gv', 0.453), ('dag', 0.283), ('vertex', 0.281), ('undirected', 0.274), ('skeleton', 0.208), ('recursive', 0.205), ('independence', 0.18), ('vertices', 0.165), ('decomprecovery', 0.139), ('ecursive', 0.128), ('edges', 0.126), ('separator', 0.122), ('graph', 0.121), ('decomposition', 0.116), ('shd', 0.109), ('skeletons', 0.097), ('edge', 0.097), ('collider', 0.091), ('ethod', 0.089), ('mmhc', 0.089), ('lv', 0.085), ('eng', 0.083), ('ev', 0.083), ('moral', 0.082), ('tpda', 0.082), ('xie', 0.082), ('rec', 0.081), ('ie', 0.078), ('structural', 0.078), ('subgraphs', 0.078), ('contained', 0.076), ('decomposed', 0.075), ('pc', 0.074), ('lk', 0.073), ('ave', 0.072), ('sc', 0.07), ('dags', 0.067), ('blanket', 0.067), ('ga', 0.064), ('separators', 0.062), ('spirtes', 0.055), ('tsamardinos', 0.055), ('independencies', 0.055), ('suv', 0.054), ('path', 0.054), ('descendant', 0.052), ('graphs', 0.05), ('pai', 0.048), ('directed', 0.046), ('tests', 0.045), ('markov', 0.045), ('gk', 0.045), ('ic', 0.044), ('local', 0.044), ('causal', 0.044), ('subsets', 0.043), ('combinesubgraphs', 0.043), ('expenses', 0.043), ('earning', 0.043), ('tree', 0.042), ('equivalence', 0.042), ('geng', 0.041), ('kh', 0.041), ('localized', 0.041), ('pdag', 0.041), ('connecting', 0.039), ('bayesian', 0.039), ('junction', 0.039), ('orient', 0.039), ('recursively', 0.039), ('conditional', 0.038), ('lauritzen', 0.037), ('simulation', 0.037), ('alarm', 0.037), ('pearl', 0.035), ('missing', 0.034), ('construct', 0.033), ('graphical', 0.033), ('aliferis', 0.032), ('separates', 0.032), ('kullback', 0.032), ('ancestor', 0.031), ('sec', 0.031), ('acyclic', 0.03), ('oriented', 0.03), ('gan', 0.03), ('extra', 0.03), ('theorems', 0.029), ('contradicts', 0.029), ('eu', 0.029), ('algorithm', 0.029), ('li', 0.028), ('necessity', 0.028), ('lu', 0.028), ('mb', 0.028), ('verma', 0.028), ('middle', 0.027), ('alg', 0.027), ('castelo', 0.027), ('explorer', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs
Author: Xianchao Xie, Zhi Geng
Abstract: In this paper, we propose a recursive method for structural learning of directed acyclic graphs (DAGs), in which a problem of structural learning for a large DAG is first decomposed into two problems of structural learning for two small vertex subsets, each of which is then decomposed recursively into two problems of smaller subsets until no subset can be decomposed further. In our approach, the search for separators of a pair of variables in a large DAG is localized to small subsets, and thus the approach can improve the efficiency of searches and the power of statistical tests for structural learning. We show how recent advances in the learning of undirected graphical models can be employed to facilitate the decomposition. Simulations are given to demonstrate the performance of the proposed method. Keywords: Bayesian network, conditional independence, decomposition, directed acyclic graph, structural learning
2 0.43097061 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
Author: Zongming Ma, Xianchao Xie, Zhi Geng
Abstract: Chain graphs present a broad class of graphical models for description of conditional independence structures, including both Markov networks and Bayesian networks as special cases. In this paper, we propose a computationally feasible method for the structural learning of chain graphs based on the idea of decomposing the learning problem into a set of smaller scale problems on its decomposed subgraphs. The decomposition requires conditional independencies but does not require the separators to be complete subgraphs. Algorithms for both skeleton recovery and complex arrow orientation are presented. Simulations under a variety of settings demonstrate the competitive performance of our method, especially when the underlying graph is sparse. Keywords: chain graph, conditional independence, decomposition, graphical model, structural learning
Author: Yang-Bo He, Zhi Geng
Abstract: The causal discovery from data is important for various scientific investigations. Because we cannot distinguish the different directed acyclic graphs (DAGs) in a Markov equivalence class learned from observational data, we have to collect further information on causal structures from experiments with external interventions. In this paper, we propose an active learning approach for discovering causal structures in which we first find a Markov equivalence class from observational data, and then we orient undirected edges in every chain component via intervention experiments separately. In the experiments, some variables are manipulated through external interventions. We discuss two kinds of intervention experiments, randomized experiment and quasi-experiment. Furthermore, we give two optimal designs of experiments, a batch-intervention design and a sequential-intervention design, to minimize the number of manipulated variables and the set of candidate structures based on the minimax and the maximum entropy criteria. We show theoretically that structural learning can be done locally in subgraphs of chain components without need of checking illegal v-structures and cycles in the whole network and that a Markov equivalence subclass obtained after each intervention can still be depicted as a chain graph. Keywords: active learning, causal networks, directed acyclic graphs, intervention, Markov equivalence class, optimal design, structural learning
4 0.20212118 20 jmlr-2008-Causal Reasoning with Ancestral Graphs (Special Topic on Causality)
Author: Jiji Zhang
Abstract: Causal reasoning is primarily concerned with what would happen to a system under external interventions. In particular, we are often interested in predicting the probability distribution of some random variables that would result if some other variables were forced to take certain values. One prominent approach to tackling this problem is based on causal Bayesian networks, using directed acyclic graphs as causal diagrams to relate post-intervention probabilities to pre-intervention probabilities that are estimable from observational data. However, such causal diagrams are seldom fully testable given observational data. In consequence, many causal discovery algorithms based on data-mining can only output an equivalence class of causal diagrams (rather than a single one). This paper is concerned with causal reasoning given an equivalence class of causal diagrams, represented by a (partial) ancestral graph. We present two main results. The first result extends Pearl (1995)’s celebrated do-calculus to the context of ancestral graphs. In the second result, we focus on a key component of Pearl’s calculus—the property of invariance under interventions, and give stronger graphical conditions for this property than those implied by the first result. The second result also improves the earlier, similar results due to Spirtes et al. (1993). Keywords: ancestral graphs, causal Bayesian network, do-calculus, intervention
5 0.17975952 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure
Author: Eric Perrier, Seiya Imoto, Satoru Miyano
Abstract: Classical approaches used to learn Bayesian network structure from data have disadvantages in terms of complexity and lower accuracy of their results. However, a recent empirical study has shown that a hybrid algorithm improves sensitively accuracy and speed: it learns a skeleton with an independency test (IT) approach and constrains on the directed acyclic graphs (DAG) considered during the search-and-score phase. Subsequently, we theorize the structural constraint by introducing the concept of super-structure S, which is an undirected graph that restricts the search to networks whose skeleton is a subgraph of S. We develop a super-structure constrained optimal search (COS): its time complexity is upper bounded by O(γm n ), where γm < 2 depends on the maximal degree m of S. Empirically, complexity depends on the average degree m and sparse structures ˜ allow larger graphs to be calculated. Our algorithm is faster than an optimal search by several orders and even finds more accurate results when given a sound super-structure. Practically, S can be approximated by IT approaches; significance level of the tests controls its sparseness, enabling to control the trade-off between speed and accuracy. For incomplete super-structures, a greedily post-processed version (COS+) still enables to significantly outperform other heuristic searches. Keywords: subset Bayesian networks, structure learning, optimal search, super-structure, connected
6 0.17831527 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning (Special Topic on Causality)
7 0.16108961 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models
8 0.079722166 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models
9 0.079656057 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks
10 0.077085726 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy (Special Topic on Causality)
12 0.056628909 9 jmlr-2008-Active Learning by Spherical Subdivision
13 0.047995653 62 jmlr-2008-Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data
14 0.0435686 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
15 0.038458932 68 jmlr-2008-Nearly Uniform Validation Improves Compression-Based Error Bounds
16 0.033928715 58 jmlr-2008-Max-margin Classification of Data with Absent Features
17 0.033818584 80 jmlr-2008-Ranking Individuals by Group Comparisons
18 0.03338467 29 jmlr-2008-Cross-Validation Optimization for Large Scale Structured Classification Kernel Methods
19 0.031356297 56 jmlr-2008-Magic Moments for Structured Output Prediction
20 0.030874414 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes
topicId topicWeight
[(0, 0.284), (1, 0.574), (2, 0.036), (3, 0.04), (4, 0.092), (5, 0.004), (6, 0.005), (7, 0.007), (8, 0.054), (9, 0.054), (10, 0.049), (11, -0.049), (12, -0.01), (13, 0.247), (14, -0.09), (15, 0.007), (16, -0.001), (17, 0.041), (18, 0.123), (19, 0.023), (20, 0.033), (21, 0.038), (22, -0.049), (23, -0.051), (24, 0.046), (25, 0.022), (26, -0.023), (27, 0.027), (28, 0.017), (29, 0.059), (30, -0.061), (31, 0.012), (32, -0.02), (33, 0.047), (34, -0.039), (35, -0.023), (36, -0.002), (37, 0.02), (38, -0.007), (39, 0.08), (40, -0.121), (41, -0.02), (42, -0.056), (43, 0.024), (44, 0.008), (45, -0.086), (46, 0.062), (47, 0.041), (48, -0.005), (49, -0.001)]
simIndex simValue paperId paperTitle
same-paper 1 0.97445267 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs
Author: Xianchao Xie, Zhi Geng
Abstract: In this paper, we propose a recursive method for structural learning of directed acyclic graphs (DAGs), in which a problem of structural learning for a large DAG is first decomposed into two problems of structural learning for two small vertex subsets, each of which is then decomposed recursively into two problems of smaller subsets until no subset can be decomposed further. In our approach, the search for separators of a pair of variables in a large DAG is localized to small subsets, and thus the approach can improve the efficiency of searches and the power of statistical tests for structural learning. We show how recent advances in the learning of undirected graphical models can be employed to facilitate the decomposition. Simulations are given to demonstrate the performance of the proposed method. Keywords: Bayesian network, conditional independence, decomposition, directed acyclic graph, structural learning
2 0.92374349 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
Author: Zongming Ma, Xianchao Xie, Zhi Geng
Abstract: Chain graphs present a broad class of graphical models for description of conditional independence structures, including both Markov networks and Bayesian networks as special cases. In this paper, we propose a computationally feasible method for the structural learning of chain graphs based on the idea of decomposing the learning problem into a set of smaller scale problems on its decomposed subgraphs. The decomposition requires conditional independencies but does not require the separators to be complete subgraphs. Algorithms for both skeleton recovery and complex arrow orientation are presented. Simulations under a variety of settings demonstrate the competitive performance of our method, especially when the underlying graph is sparse. Keywords: chain graph, conditional independence, decomposition, graphical model, structural learning
Author: Yang-Bo He, Zhi Geng
Abstract: The causal discovery from data is important for various scientific investigations. Because we cannot distinguish the different directed acyclic graphs (DAGs) in a Markov equivalence class learned from observational data, we have to collect further information on causal structures from experiments with external interventions. In this paper, we propose an active learning approach for discovering causal structures in which we first find a Markov equivalence class from observational data, and then we orient undirected edges in every chain component via intervention experiments separately. In the experiments, some variables are manipulated through external interventions. We discuss two kinds of intervention experiments, randomized experiment and quasi-experiment. Furthermore, we give two optimal designs of experiments, a batch-intervention design and a sequential-intervention design, to minimize the number of manipulated variables and the set of candidate structures based on the minimax and the maximum entropy criteria. We show theoretically that structural learning can be done locally in subgraphs of chain components without need of checking illegal v-structures and cycles in the whole network and that a Markov equivalence subclass obtained after each intervention can still be depicted as a chain graph. Keywords: active learning, causal networks, directed acyclic graphs, intervention, Markov equivalence class, optimal design, structural learning
4 0.78338152 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models
Author: Mathias Drton, Thomas S. Richardson
Abstract: In graphical modelling, a bi-directed graph encodes marginal independences among random variables that are identified with the vertices of the graph. We show how to transform a bi-directed graph into a maximal ancestral graph that (i) represents the same independence structure as the original bi-directed graph, and (ii) minimizes the number of arrowheads among all ancestral graphs satisfying (i). Here the number of arrowheads of an ancestral graph is the number of directed edges plus twice the number of bi-directed edges. In Gaussian models, this construction can be used for more efficient iterative maximization of the likelihood function and to determine when maximum likelihood estimates are equal to empirical counterparts. Keywords: ancestral graph, covariance graph, graphical model, marginal independence, maximum likelihood estimation, multivariate normal distribution
5 0.58344173 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure
Author: Eric Perrier, Seiya Imoto, Satoru Miyano
Abstract: Classical approaches used to learn Bayesian network structure from data have disadvantages in terms of complexity and lower accuracy of their results. However, a recent empirical study has shown that a hybrid algorithm improves sensitively accuracy and speed: it learns a skeleton with an independency test (IT) approach and constrains on the directed acyclic graphs (DAG) considered during the search-and-score phase. Subsequently, we theorize the structural constraint by introducing the concept of super-structure S, which is an undirected graph that restricts the search to networks whose skeleton is a subgraph of S. We develop a super-structure constrained optimal search (COS): its time complexity is upper bounded by O(γm n ), where γm < 2 depends on the maximal degree m of S. Empirically, complexity depends on the average degree m and sparse structures ˜ allow larger graphs to be calculated. Our algorithm is faster than an optimal search by several orders and even finds more accurate results when given a sound super-structure. Practically, S can be approximated by IT approaches; significance level of the tests controls its sparseness, enabling to control the trade-off between speed and accuracy. For incomplete super-structures, a greedily post-processed version (COS+) still enables to significantly outperform other heuristic searches. Keywords: subset Bayesian networks, structure learning, optimal search, super-structure, connected
6 0.52774811 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning (Special Topic on Causality)
7 0.52056313 20 jmlr-2008-Causal Reasoning with Ancestral Graphs (Special Topic on Causality)
8 0.39208606 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks
9 0.25616539 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy (Special Topic on Causality)
11 0.20530935 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models
12 0.19303156 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes
13 0.18569779 9 jmlr-2008-Active Learning by Spherical Subdivision
14 0.17753485 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
15 0.16205958 56 jmlr-2008-Magic Moments for Structured Output Prediction
16 0.16091684 62 jmlr-2008-Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data
17 0.14482056 42 jmlr-2008-HPB: A Model for Handling BN Nodes with High Cardinality Parents
18 0.14419541 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields
19 0.13625528 80 jmlr-2008-Ranking Individuals by Group Comparisons
20 0.13140066 8 jmlr-2008-Accelerated Neural Evolution through Cooperatively Coevolved Synapses
topicId topicWeight
[(0, 0.036), (5, 0.012), (28, 0.045), (30, 0.042), (31, 0.013), (40, 0.024), (54, 0.025), (58, 0.032), (61, 0.403), (66, 0.05), (76, 0.062), (87, 0.024), (88, 0.056), (92, 0.037), (94, 0.044), (99, 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.85219765 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs
Author: Xianchao Xie, Zhi Geng
Abstract: In this paper, we propose a recursive method for structural learning of directed acyclic graphs (DAGs), in which a problem of structural learning for a large DAG is first decomposed into two problems of structural learning for two small vertex subsets, each of which is then decomposed recursively into two problems of smaller subsets until no subset can be decomposed further. In our approach, the search for separators of a pair of variables in a large DAG is localized to small subsets, and thus the approach can improve the efficiency of searches and the power of statistical tests for structural learning. We show how recent advances in the learning of undirected graphical models can be employed to facilitate the decomposition. Simulations are given to demonstrate the performance of the proposed method. Keywords: Bayesian network, conditional independence, decomposition, directed acyclic graph, structural learning
2 0.73637706 26 jmlr-2008-Consistency of Trace Norm Minimization
Author: Francis R. Bach
Abstract: Regularization by the sum of singular values, also referred to as the trace norm, is a popular technique for estimating low rank rectangular matrices. In this paper, we extend some of the consistency results of the Lasso to provide necessary and sufficient conditions for rank consistency of trace norm minimization with the square loss. We also provide an adaptive version that is rank consistent even when the necessary condition for the non adaptive version is not fulfilled. Keywords: convex optimization, singular value decomposition, trace norm, consistency
3 0.52670628 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition
Author: Zongming Ma, Xianchao Xie, Zhi Geng
Abstract: Chain graphs present a broad class of graphical models for description of conditional independence structures, including both Markov networks and Bayesian networks as special cases. In this paper, we propose a computationally feasible method for the structural learning of chain graphs based on the idea of decomposing the learning problem into a set of smaller scale problems on its decomposed subgraphs. The decomposition requires conditional independencies but does not require the separators to be complete subgraphs. Algorithms for both skeleton recovery and complex arrow orientation are presented. Simulations under a variety of settings demonstrate the competitive performance of our method, especially when the underlying graph is sparse. Keywords: chain graph, conditional independence, decomposition, graphical model, structural learning
4 0.44518173 20 jmlr-2008-Causal Reasoning with Ancestral Graphs (Special Topic on Causality)
Author: Jiji Zhang
Abstract: Causal reasoning is primarily concerned with what would happen to a system under external interventions. In particular, we are often interested in predicting the probability distribution of some random variables that would result if some other variables were forced to take certain values. One prominent approach to tackling this problem is based on causal Bayesian networks, using directed acyclic graphs as causal diagrams to relate post-intervention probabilities to pre-intervention probabilities that are estimable from observational data. However, such causal diagrams are seldom fully testable given observational data. In consequence, many causal discovery algorithms based on data-mining can only output an equivalence class of causal diagrams (rather than a single one). This paper is concerned with causal reasoning given an equivalence class of causal diagrams, represented by a (partial) ancestral graph. We present two main results. The first result extends Pearl (1995)’s celebrated do-calculus to the context of ancestral graphs. In the second result, we focus on a key component of Pearl’s calculus—the property of invariance under interventions, and give stronger graphical conditions for this property than those implied by the first result. The second result also improves the earlier, similar results due to Spirtes et al. (1993). Keywords: ancestral graphs, causal Bayesian network, do-calculus, intervention
Author: Yang-Bo He, Zhi Geng
Abstract: The causal discovery from data is important for various scientific investigations. Because we cannot distinguish the different directed acyclic graphs (DAGs) in a Markov equivalence class learned from observational data, we have to collect further information on causal structures from experiments with external interventions. In this paper, we propose an active learning approach for discovering causal structures in which we first find a Markov equivalence class from observational data, and then we orient undirected edges in every chain component via intervention experiments separately. In the experiments, some variables are manipulated through external interventions. We discuss two kinds of intervention experiments, randomized experiment and quasi-experiment. Furthermore, we give two optimal designs of experiments, a batch-intervention design and a sequential-intervention design, to minimize the number of manipulated variables and the set of candidate structures based on the minimax and the maximum entropy criteria. We show theoretically that structural learning can be done locally in subgraphs of chain components without need of checking illegal v-structures and cycles in the whole network and that a Markov equivalence subclass obtained after each intervention can still be depicted as a chain graph. Keywords: active learning, causal networks, directed acyclic graphs, intervention, Markov equivalence class, optimal design, structural learning
6 0.37448332 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks
7 0.35068843 27 jmlr-2008-Consistency of the Group Lasso and Multiple Kernel Learning
8 0.34582022 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models
9 0.34421685 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning (Special Topic on Causality)
10 0.31507879 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models
11 0.31157243 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure
12 0.30486739 9 jmlr-2008-Active Learning by Spherical Subdivision
13 0.28974301 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy (Special Topic on Causality)
14 0.27483624 72 jmlr-2008-On the Size and Recovery of Submatrices of Ones in a Random Binary Matrix
15 0.27287832 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
16 0.26374045 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model
17 0.25785205 38 jmlr-2008-Generalization from Observed to Unobserved Features by Clustering
18 0.25430062 56 jmlr-2008-Magic Moments for Structured Output Prediction
19 0.25324231 57 jmlr-2008-Manifold Learning: The Price of Normalization
20 0.25284296 83 jmlr-2008-Robust Submodular Observation Selection