jmlr jmlr2008 jmlr2008-40 knowledge-graph by maker-knowledge-mining

40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models


Source: pdf

Author: Mathias Drton, Thomas S. Richardson

Abstract: In graphical modelling, a bi-directed graph encodes marginal independences among random variables that are identified with the vertices of the graph. We show how to transform a bi-directed graph into a maximal ancestral graph that (i) represents the same independence structure as the original bi-directed graph, and (ii) minimizes the number of arrowheads among all ancestral graphs satisfying (i). Here the number of arrowheads of an ancestral graph is the number of directed edges plus twice the number of bi-directed edges. In Gaussian models, this construction can be used for more efficient iterative maximization of the likelihood function and to determine when maximum likelihood estimates are equal to empirical counterparts. Keywords: ancestral graph, covariance graph, graphical model, marginal independence, maximum likelihood estimation, multivariate normal distribution

Reference: text


Summary: the most important sentences generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 We show how to transform a bi-directed graph into a maximal ancestral graph that (i) represents the same independence structure as the original bi-directed graph, and (ii) minimizes the number of arrowheads among all ancestral graphs satisfying (i). [sent-7, score-1.107]

2 Here the number of arrowheads of an ancestral graph is the number of directed edges plus twice the number of bi-directed edges. [sent-8, score-0.619]

3 Introduction In graphical modelling, bi-directed graphs encode marginal independences among random variables that are identified with the vertices of the graph (Pearl and Wermuth, 1994; Kauermann, 1996; Richardson, 2003). [sent-11, score-0.517]

4 Figure 1: A bi-directed graph G with (unique) minimally oriented graph Gmin. [sent-19, score-0.696]

5 In this paper we employ the connection between bi-directed graphs and the more general ancestral graphs with undirected, directed, and bi-directed edges (Section 2). [sent-25, score-0.514]

6 We show how to construct a maximal ancestral graph Gmin, which we call a minimally oriented graph, that is Markov equivalent to a given bi-directed graph G and such that the number of arrowheads is minimal (Sections 3–4). [sent-27, score-1.074]

7 Two ancestral graphs are Markov equivalent if the independence models associated with the two graphs coincide; see for example Roverato (2005) for some recent results on Markov equivalence of different types of graphs. [sent-28, score-0.519]

8 For example, the graph G in Figure 1 is not Markov equivalent to an undirected graph because Gmin is not an undirected graph, and G is not Markov equivalent to a DAG because Gmin contains a bi-directed edge. [sent-31, score-0.548]

9 The graph G in Figure 1 has a unique minimally oriented graph but in general, minimally oriented graphs are not unique. [sent-32, score-1.092]

10 Varying the order one may obtain all minimally oriented graphs (Theorem 17). [sent-34, score-0.422]

11 For covariance graph models, minimally oriented graphs allow one to determine when the maximum likelihood estimate of a variance or covariance is available explicitly as its empirical counterpart (Section 5). [sent-35, score-0.751]

12 For example, since no arrowheads appear at the vertices 1 and 4 in the graph Gmin in Figure 1, the maximum likelihood estimates of σ11 and σ44 must be equal to the empirical variances of X1 and X4 , respectively. [sent-36, score-0.433]

13 However, when a minimally oriented graph reveals that a parameter estimate is equal to an empirical quantity (such as σ11 and σ44 in the above example), then even if the likelihood function is multi-modal this parameter will take the same value at every mode. [sent-38, score-0.537]

14 Perhaps most importantly, minimally oriented graphs allow for computationally more efficient maximum likelihood fitting; see Remark 24 and the example in Section 5. [sent-39, score-0.472]

15 An induced subgraph of G over a vertex set A is the mixed graph GA = (A, EA ) where EA is the restriction of the edge map E on A × A. [sent-55, score-0.436]

16 These three types of graph are special cases of ancestral graphs (Richardson and Spirtes, 2002). [sent-72, score-0.544]

17 Definition 1 A simple mixed graph G is an ancestral graph if it holds that (i) G does not contain any directed cycles; (ii) if v − w ∈ G, then there does not exist u such that u → v ∈ G or u ↔ v ∈ G; (iii) if v ↔ w ∈ G, then v is not an ancestor of w. [sent-73, score-0.716]

18 A non-endpoint vertex vi on a path is a collider on the path if the edges preceding and succeeding v i on the path both have an arrowhead at vi , that is, vi−1 → vi ← vi+1 , vi−1 → vi ↔ vi+1 , vi−1 ↔ vi ← vi+1 or vi−1 ↔ vi ↔ vi+1 is part of the path. [sent-78, score-1.162]
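The collider condition above depends only on the marks at vi carried by its two incident edges. As a hedged sketch (the mark-list representation below is an assumption for illustration, not the paper's notation):

```python
# Sketch: colliders on a path. Each edge on the path is given as a pair of
# end marks ('tail' or 'arrow'), listed in path order, so edges[i] holds
# (mark at v_i, mark at v_{i+1}).

def colliders(vertices, edges):
    """Return the non-endpoint vertices that are colliders on the path."""
    out = []
    for i in range(1, len(vertices) - 1):
        # collider: arrowhead at v_i on both the preceding and succeeding edge
        if edges[i - 1][1] == 'arrow' and edges[i][0] == 'arrow':
            out.append(vertices[i])
    return out
```

On the path v0 → v1 ↔ v2 ← v3, both v1 and v2 are colliders, matching the patterns vi−1 → vi ↔ vi+1 and vi−1 ↔ vi ← vi+1 listed in the text.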

19 Definition 2 A path π between vertices v and w in a simple mixed graph G is m-connecting given a possibly empty set C ⊆ V \ {v, w} if (i) every non-collider on π is not in C, and (ii) every collider on π is in An(C). [sent-82, score-0.419]

20 Let G = (V, E) be an ancestral graph whose vertices index a random vector (Xv | v ∈ V ). [sent-85, score-0.537]

21 Two ancestral graphs G1 and G2 are Markov equivalent if they have the same vertex set and the global Markov property states the same independences for G1 as for G2 . [sent-93, score-0.573]

22 If an ancestral graph G is not maximal, then there exists a unique Markov equivalent maximal ancestral graph Ḡ that contains all the edges present in G. [sent-96, score-0.933]

23 (ii) If Ḡ is an ancestral graph that is Markov equivalent to a maximal ancestral graph G and has the same skeleton as G, then Ḡ is also a maximal ancestral graph. [sent-101, score-1.202]

24 (iii) Bi-directed, undirected and directed acyclic graphs are maximal ancestral graphs. [sent-102, score-0.556]

25 Definition 4 A simple mixed graph G has the boundary containment property if for all distinct vertices v, w ∈ V the presence of an edge v − w implies that Bd(v) = Bd(w) and the presence of an edge v → w in G implies that Bd(v) ⊆ Bd(w). [sent-106, score-0.74]
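Definition 4 can be checked mechanically once the graph is stored explicitly. A hedged sketch, assuming a simple mixed graph held as three edge sets and taking Bd(v) to be v together with all vertices adjacent to v (an assumption about the notation, which the excerpt does not restate):

```python
# Sketch: checking the boundary containment property (Definition 4).
# Edge sets: `undirected` and `bidirected` hold one tuple per edge,
# `directed` holds (a, b) meaning a -> b.

def boundaries(vertices, undirected, directed, bidirected):
    bd = {v: {v} for v in vertices}      # Bd(v) assumed to include v itself
    for (a, b) in undirected | bidirected:
        bd[a].add(b); bd[b].add(a)
    for (a, b) in directed:
        bd[a].add(b); bd[b].add(a)
    return bd

def has_boundary_containment(vertices, undirected, directed, bidirected):
    bd = boundaries(vertices, undirected, directed, bidirected)
    for (v, w) in undirected:            # v - w requires Bd(v) = Bd(w)
        if bd[v] != bd[w]:
            return False
    for (v, w) in directed:              # v -> w requires Bd(v) ⊆ Bd(w)
        if not bd[v] <= bd[w]:
            return False
    return True
```

For the graph u → v ← w mentioned later in the text, Bd(u) = {u, v} ⊆ Bd(v) = {u, v, w}, so the property holds; reversing one edge to v → u breaks it.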

26 Theorem 5 If Ḡ is an ancestral graph that has the same skeleton as a bi-directed graph G, then G and Ḡ are Markov equivalent iff Ḡ has the boundary containment property. [sent-109, score-0.919]

27 Simplicial Graphs In this section we show how simplicial vertex sets of a bi-directed graph can be used to construct a Markov equivalent maximal ancestral graph by removing arrowheads from certain bi-directed edges. [sent-171, score-1.063]

28 The simplicial graph Gs is the simple mixed graph obtained by dropping all the arrowheads at simplicial vertices of G. [sent-183, score-1.046]
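The construction just described can be sketched directly. This is a hedged illustration: the adjacency-dict representation is an assumption, and a vertex is taken to be simplicial when its neighbours are pairwise adjacent (the excerpt does not restate the definition).

```python
# Sketch: the simplicial graph G^s of a bi-directed graph G.
# G is {vertex: set of spouses}.

def is_simplicial(G, v):
    # simplicial: the neighbours of v are pairwise adjacent
    return all(b in G[a] for a in G[v] for b in G[v] if a != b)

def simplicial_graph(G):
    """Return (undirected, directed, bidirected) edge sets of G^s."""
    simp = {v for v in G if is_simplicial(G, v)}
    und, dirr, bid = set(), set(), set()
    for v in G:
        for w in G[v]:
            if v < w:                        # visit each edge once
                if v in simp and w in simp:
                    und.add((v, w))          # both arrowheads dropped: v - w
                elif v in simp:
                    dirr.add((v, w))         # arrowhead at v dropped: v -> w
                elif w in simp:
                    dirr.add((w, v))         # arrowhead at w dropped: w -> v
                else:
                    bid.add((v, w))          # stays bi-directed
    return und, dirr, bid
```

On the chain u ↔ v ↔ w from the text, u and w are simplicial but v is not, so the output is u → v ← w, exactly the simplicial graph cited in sentence 40 below.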

29 Parts (i) and (ii) of the next lemma show that simplicial graphs have the boundary containment property. [sent-185, score-0.604]

30 Lemma 9 Let v and w be adjacent vertices in a simplicial graph G s . [sent-186, score-0.563]

31 Theorem 10 The simplicial graph Gs of a bi-directed graph G is a maximal ancestral graph that is Markov equivalent to G. [sent-190, score-1.059]

32 Lemma 11 If G is an ancestral graph that has the boundary containment property, then dropping all arrowheads at simplicial vertices of G yields an ancestral graph. [sent-193, score-1.317]

33 Figure 3: Bi-directed graphs with simplicial and minimally oriented graphs. [sent-194, score-0.643]

34 Proof Let Ḡ be the graph obtained by dropping the arrowheads at simplicial vertices. [sent-195, score-0.493]

35 Since v → w ∈ Ḡ and there are no arrowheads at simplicial vertices in Ḡ, no vertex on π, including the endpoints v and w, can be simplicial. [sent-197, score-0.519]

36 Proposition 12 A bi-directed graph G is Markov equivalent to an undirected graph iff the simplicial graph Gs induced by G is an undirected graph iff G is a disjoint union of complete (bi-directed) graphs. [sent-208, score-1.243]
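Proposition 12 turns "Markov equivalent to an undirected graph" into a purely combinatorial test: every connected component of the bi-directed graph must be complete. A hedged sketch under the same adjacency-dict assumption as before:

```python
# Sketch: is the bi-directed graph a disjoint union of complete graphs?
# (By Proposition 12, this holds iff G is Markov equivalent to an
# undirected graph.)

def is_union_of_complete_graphs(G):
    seen = set()
    for start in G:
        if start in seen:
            continue
        # collect the connected component containing `start`
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(G[v])
        seen |= comp
        # a component is complete iff each vertex is adjacent to all others
        if any(G[v] != comp - {v} for v in comp):
            return False
    return True
```

A triangle plus a disjoint edge passes; the chain u ↔ v ↔ w fails because u and w lie in one component without being adjacent.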

37 By Theorem 5, U has the boundary containment property, which implies that every vertex is simplicial and thus that Gs is an undirected graph (and equal to U). [sent-212, score-0.819]

38 The simplicial graph Gs is an undirected graph iff the vertex set of the inducing bi-directed graph G can be partitioned into pairwise disjoint sets A1 , . [sent-213, score-1.001]

39 899 D RTON AND R ICHARDSON Under multivariate normality, a bi-directed graph that is Markov equivalent to an undirected graph represents a hypothesis that is linear in the covariance matrix as well as in its inverse. [sent-220, score-0.519]

40 For example, the graph u ↔ v ↔ w has the simplicial graph u → v ← w. [sent-224, score-0.587]

41 However, there exist bi-directed graphs that are Markov equivalent to a DAG and yet the simplicial graph contains bi-directed edges. [sent-225, score-0.535]

42 Hence, some arrowheads may be dropped from bi-directed edges in a simplicial graph while preserving Markov equivalence. [sent-227, score-0.528]

43 In this section we construct maximal ancestral graphs from which no arrowheads may be dropped without destroying Markov equivalence. [sent-228, score-0.483]

44 A minimally oriented graph of G is a graph Gmin that satisfies the following three properties: (i) Gmin is a maximal ancestral graph; (ii) G and Gmin are Markov equivalent; (iii) Gmin has the minimum number of arrowheads of all maximal ancestral graphs that are Markov equivalent to G. [sent-232, score-1.468]

45 Here the number of arrowheads of an ancestral graph G with d directed and b bi-directed edges is defined as arr(G) = d + 2b. [sent-233, score-0.619]
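The arrowhead count is a direct tally: one arrowhead per directed edge, two per bi-directed edge, none for undirected edges. A minimal sketch:

```python
# Sketch: arr(G) = d + 2b, the arrowhead count of an ancestral graph,
# given its directed and bi-directed edge sets.

def arr(directed, bidirected):
    return len(directed) + 2 * len(bidirected)
```

For u ↔ v ↔ w, arr = 4, while its simplicial graph u → v ← w has arr = 2; minimally oriented graphs minimize exactly this quantity.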

46 By Lemma 3, a minimally oriented graph Gmin has the same skeleton as the underlying bi-directed graph G. [sent-234, score-0.698]

47 Examples of minimally oriented graphs are shown in Figure 3. [sent-236, score-0.422]

48 Given the small number of vertices of these graphs the claim that these graphs are indeed minimally oriented graphs can be verified directly. [sent-237, score-0.769]

49 The example of graph G1 in Figure 3 also illustrates that minimally oriented graphs are not unique. [sent-238, score-0.605]

50 By symmetry, reversing the direction of the edge v → w in the depicted Gmin1 yields a second minimally oriented graph for G1. [sent-239, score-0.579]

51 Create a new graph Gmin< as follows: (a) find the simplicial graph Gs of G; (b) set Gmin< = Gs; (c) replace every bi-directed edge v ↔ w ∈ Gmin< with Bd(v) ⊆ Bd(w) and v < w by the directed edge v → w. [sent-245, score-0.788]
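Steps (a)–(c) can be sketched end-to-end. This is a hedged, literal transcription of the excerpt: the adjacency-dict representation and the reading Bd(v) = {v} plus the spouses of v are assumptions, and step (c) is applied exactly as quoted (only in the v < w direction).

```python
# Sketch of Algorithm 14: construct Gmin_< from a bi-directed graph G
# and a total order on the vertices.

def is_simplicial(G, v):
    # v is simplicial when its neighbours are pairwise adjacent
    return all(b in G[a] for a in G[v] for b in G[v] if a != b)

def minimally_oriented(G, order):
    pos = {v: i for i, v in enumerate(order)}
    bd = {v: G[v] | {v} for v in G}          # Bd(v): v plus its spouses
    simp = {v for v in G if is_simplicial(G, v)}
    und, dirr, bid = set(), set(), set()
    for v in G:
        for w in G[v]:
            if pos[v] < pos[w]:              # visit each edge {v, w} once
                if v in simp and w in simp:  # (a)/(b): simplicial graph
                    und.add((v, w))
                elif v in simp:
                    dirr.add((v, w))         # arrowhead at v dropped: v -> w
                elif w in simp:
                    dirr.add((w, v))
                elif bd[v] <= bd[w]:         # (c): Bd(v) ⊆ Bd(w) and v < w
                    dirr.add((v, w))
                else:
                    bid.add((v, w))          # stays bi-directed
    return und, dirr, bid
```

On the chain u ↔ v ↔ w this reduces to the simplicial graph u → v ← w. In the test below (K4 minus one edge), step (c) orients the remaining b ↔ c edge as b → c because Bd(b) = Bd(c) and b precedes c in the chosen order; a different order would orient it the other way, illustrating why minimally oriented graphs need not be unique.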

52 Clearly, by Theorem 5, in order for Gmin< to be a minimally oriented graph it is necessary that it satisfies the boundary containment property. [sent-247, score-0.714]

53 If v < w, then, according to the definition of the simplicial graph and step (c) of Algorithm 14, the edge between v and w in Gmin< cannot have an arrowhead at v and thus cannot be bi-directed. [sent-258, score-0.548]

54 Theorem 16 The graph Gmin< constructed in Algorithm 14 is a minimally oriented graph for the bi-directed graph G. [sent-267, score-0.853]

55 (i) Gmin< is a maximal ancestral graph: By Lemma 3 it suffices to show that Gmin< is an ancestral graph. [sent-270, score-0.519]

56 (iii) Gmin< has the minimal number of arrowheads: Let Ḡ be a maximal ancestral graph that is Markov equivalent to the (bi-directed) graph G, which requires that Ḡ and G, and thus also Gmin<, have the same skeleton. [sent-277, score-0.655]

57 The global Markov property of Gmin< states that x ⊥⊥ y. Since Ḡ is an ancestral graph and v − w ∈ Ḡ, however, there … [sent-282, score-0.466]

58 …which yields that the global Markov property of G states that v ⊥⊥ w. The next result shows that our construction of minimally oriented graphs is complete in the sense that every minimally oriented graph can be obtained as the output of Algorithm 14 by an appropriate choice of total order on the vertex set. [sent-289, score-1.047]

59 Theorem 17 If Gmin is a minimally oriented graph for a bi-directed graph G, then there exists a total order ≤ on the vertex set such that Gmin = Gmin<. [sent-290, score-0.768]

60 Proof The graph Gmin is an ancestral graph and thus contains no directed cycles. [sent-291, score-0.678]

61 First note that if v is a simplicial vertex of G, then there are no arrowheads at v in Gmin. [sent-305, score-0.434]

62 Otherwise, we could drop all arrowheads at simplicial vertices in Gmin to obtain an ancestral graph (Lemma 11) with fewer arrowheads. [sent-306, score-0.873]

63 The new graph would have the boundary containment property and thus be Markov equivalent to G (by Theorem 5). [sent-307, score-0.448]

64 The observation about simplicial vertices implies that an undirected edge in the simplicial graph Gs is also an undirected edge in Gmin . [sent-309, score-1.036]

65 Since Gmin has the boundary containment property, it follows from Proposition 7 that both v and w are simplicial vertices. [sent-311, score-0.448]

66 2 Markov Equivalence Results The following corollary is an immediate consequence of Proposition 12 because a minimally oriented graph Gmin is an undirected graph iff Gs is an undirected graph. [sent-322, score-0.868]

67 Corollary 18 Let Gmin be a minimally oriented graph for a bi-directed graph G. [sent-323, score-0.67]

68 If G is Markov equivalent to an undirected graph U, then Gmin = U is the unique minimally oriented graph of G. [sent-324, score-0.761]

69 A minimally oriented graph also reveals whether the original bi-directed graph is Markov equivalent to a DAG. [sent-325, score-0.683]

70 Theorem 19 Let Gmin be a minimally oriented graph for a bi-directed graph G. [sent-326, score-0.67]

71 Since the sets Ai are inclusion-maximal simplicial sets, no vertex in Ai, i ≠ j, is adjacent to any vertex in Aj. [sent-345, score-0.465]

72 The sink orientation of the graph G1 in Figure 3 has the directed edges of Gs1 but an undirected edge v − w. [sent-359, score-0.446]

73 Proposition 20 Let Gmin be a minimally oriented graph for a connected bi-directed graph G. [sent-366, score-0.67]

74 If Gmin contains no bi-directed edges, then the set A of all simplicial vertices is non-empty, the induced subgraph (Gmin)A is a disjoint union of complete undirected graphs, the induced subgraph (Gmin)V\A is a complete DAG, and an edge v → w joins any two vertices v ∈ A and w ∈ V \ A in Gmin. [sent-367, score-0.728]

75 If |V \ A| = 0, then the connected graph Gmin is a complete undirected graph and there is nothing to show. [sent-384, score-0.444]

76 Since subsequent theorems on the structure of the likelihood equations are obtained via Gaussian ancestral graph models, we briefly review the parametrization of these models. [sent-421, score-0.511]

77 Let G be an ancestral graph and unG ⊆ V the set of vertices v such that every edge with endpoint v has a tail at v. [sent-422, score-0.603]

78 Let N(G) be the Gaussian ancestral graph model associated with G, that is, the family of all centered normal distributions that are globally Markov with respect to G. [sent-428, score-0.426]

79 Since a bi-directed graph G and a minimally oriented graph Gmin are Markov equivalent and maximal, the parametrization map for Gmin, (Λ, B, Ω) → Σ(Λ, B, Ω), has image equal to P(G). [sent-432, score-0.726]

80 …equations (1), then Σ̂(S) … Proof By Theorem 10, the covariance graph model N(G) and the Gaussian ancestral graph model N(Gs) based on the simplicial graph Gs are equal. [sent-442, score-1.079]

81 …5) and Lemma 21 imply that every solution to the likelihood equations for Λ, B, Ω in the Gaussian ancestral graph model N(Gs) satisfies (Λ̂−1)Ai×Ai = SAi×Ai for all i = 1, … [sent-457, score-0.494]

82 The conditional parameters we consider are the regression coefficients and conditional variance for the conditional distribution of variable v given its parents pa(v) = {w ∈ V | w → v ∈ Gmin } in a minimally oriented graph Gmin . [sent-463, score-0.487]

83 Theorem 23 Let Gmin be a minimally oriented graph for a bi-directed graph G, S a symmetric positive definite matrix, and Σ̂(S) ∈ P(G) a solution to the likelihood equations (1). [sent-465, score-0.738]

84 On the other hand, if one runs the ancestral graph extension of iterative conditional fitting described in Drton and Richardson (2004b) on a minimally oriented graph, then unnecessary computations are avoided by implicitly exploiting Theorems 22 and 23. [sent-477, score-0.744]

85 If a bi-directed graph G has a minimally oriented graph Gmin without bi-directed edges, then G is Markov equivalent to a DAG (Theorem 19) and the likelihood equations have a unique solution that is a rational function of the empirical covariance matrix S. [sent-480, score-0.874]

86 For these data, the covariance graph model induced by the graph G in Figure 5(i) has a deviance of 8. [sent-559, score-0.438]

87 Figure 5(ii) shows the unique minimally oriented graph Gmin. [sent-562, score-0.513]

88 The use of a minimally oriented graph Gmin leads to a considerable gain in computational efficiency in the iterative calculation of the maximum likelihood estimate Σ̂. [sent-566, score-0.551]

89 However, this is still four times as many iterations as required in iterative conditional fitting based on the minimally oriented graph Gmin; recall that in addition each iteration is also simpler. [sent-582, score-0.527]

90 Conclusion We showed how to remove a maximal number of arrowheads from the edges of a bi-directed graph G such that one obtains a maximal ancestral graph Gmin that is Markov equivalent to G. [sent-586, score-0.812]

91 The graph Gmin , called a minimally oriented graph, reveals whether G is Markov equivalent to an undirected graph, and also whether G is Markov equivalent to a DAG. [sent-587, score-0.591]

92 For the (Gaussian) covariance graph model associated with G, a minimally oriented graph Gmin yields an alternative parametrization that provides insight into likelihood inference. [sent-588, score-0.811]

93 We thus believe that analogs to the Gaussian results established here will hold in discrete models, but a general parametrization of discrete ancestral graph models is required to fully access the potential of the results obtained in this paper. [sent-596, score-0.455]

94 Lemma 25 If a simple mixed graph G satisfies the boundary containment property, vi−1, vi and vi+1 are three consecutive vertices on a path π in G, and vi is a non-collider on π, then vi−1 and vi+1 are adjacent. [sent-608, score-0.865]

95 Proof If vi is a non-collider, then the edge between vi and vi−1 or the edge between vi and vi+1 must have a tail at vi . [sent-609, score-0.636]

96 If vi is a non-endpoint vertex on π and there is an arrowhead at vi on the edge between vi−1 and vi, then either (i) vi ∈ An(C) or (ii) the path (vi, vi+1, … [sent-617, score-0.674]

97 Lemma 27 If G is an ancestral graph that satisfies the boundary containment property and π = (v, v1 , . [sent-633, score-0.678]

98 Corollary 28 If G is an ancestral graph that satisfies the boundary containment property and π = (v = v0 , v1 , . [sent-663, score-0.678]

99 2 Lemma 29 If G is an ancestral graph that satisfies the boundary containment property, and π = (v = v0 , v1 , . [sent-675, score-0.653]

100 Since the edge between vi−1 and vi has an arrowhead at vi and vi → c ∈ G, the edge between vi−1 and c must have an arrowhead at c because otherwise the fact that G is an ancestral graph would be contradicted. [sent-691, score-1.092]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('gmin', 0.599), ('bd', 0.447), ('ancestral', 0.243), ('simplicial', 0.221), ('minimally', 0.205), ('graph', 0.183), ('containment', 0.17), ('gs', 0.139), ('vi', 0.126), ('graphs', 0.118), ('vertices', 0.111), ('oriented', 0.099), ('vertex', 0.098), ('vk', 0.093), ('arrowheads', 0.089), ('vq', 0.085), ('richardson', 0.079), ('arrowhead', 0.078), ('drton', 0.078), ('undirected', 0.078), ('markov', 0.077), ('directed', 0.069), ('ichardson', 0.067), ('rton', 0.067), ('edge', 0.066), ('independences', 0.061), ('raphical', 0.061), ('pa', 0.06), ('boundary', 0.057), ('path', 0.054), ('ovariance', 0.051), ('likelihood', 0.05), ('adjacent', 0.048), ('covariance', 0.048), ('odels', 0.046), ('iff', 0.042), ('iii', 0.041), ('lemma', 0.038), ('mixed', 0.038), ('spirtes', 0.038), ('aussian', 0.037), ('vw', 0.037), ('edges', 0.035), ('collider', 0.033), ('maximal', 0.033), ('dag', 0.031), ('colliders', 0.031), ('ung', 0.031), ('ii', 0.031), ('arr', 0.03), ('kauermann', 0.03), ('vv', 0.03), ('skeleton', 0.028), ('subgraph', 0.027), ('min', 0.026), ('aq', 0.026), ('wermuth', 0.026), ('graphical', 0.025), ('property', 0.025), ('quintic', 0.024), ('induced', 0.024), ('theorem', 0.02), ('marginal', 0.019), ('cox', 0.018), ('cographs', 0.018), ('spa', 0.018), ('equations', 0.018), ('contradiction', 0.018), ('parametrization', 0.017), ('subgraphs', 0.016), ('chaudhuri', 0.015), ('sink', 0.015), ('conversely', 0.015), ('independence', 0.015), ('acyclic', 0.015), ('global', 0.015), ('matrix', 0.014), ('dags', 0.014), ('bv', 0.014), ('paths', 0.014), ('iterative', 0.014), ('algebra', 0.013), ('disjoint', 0.013), ('ai', 0.013), ('equivalent', 0.013), ('genes', 0.013), ('roots', 0.013), ('pearl', 0.012), ('bdgw', 0.012), ('brandst', 0.012), ('galois', 0.012), ('greuel', 0.012), ('gw', 0.012), ('letac', 0.012), ('mathias', 0.012), ('svv', 0.012), ('unsolvable', 0.012), ('wishart', 0.012), ('models', 0.012), ('implies', 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000006 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models

Author: Mathias Drton, Thomas S. Richardson

Abstract: In graphical modelling, a bi-directed graph encodes marginal independences among random variables that are identified with the vertices of the graph. We show how to transform a bi-directed graph into a maximal ancestral graph that (i) represents the same independence structure as the original bi-directed graph, and (ii) minimizes the number of arrowheads among all ancestral graphs satisfying (i). Here the number of arrowheads of an ancestral graph is the number of directed edges plus twice the number of bi-directed edges. In Gaussian models, this construction can be used for more efficient iterative maximization of the likelihood function and to determine when maximum likelihood estimates are equal to empirical counterparts. Keywords: ancestral graph, covariance graph, graphical model, marginal independence, maximum likelihood estimation, multivariate normal distribution

2 0.18287985 10 jmlr-2008-Active Learning of Causal Networks with Intervention Experiments and Optimal Designs    (Special Topic on Causality)

Author: Yang-Bo He, Zhi Geng

Abstract: The causal discovery from data is important for various scientific investigations. Because we cannot distinguish the different directed acyclic graphs (DAGs) in a Markov equivalence class learned from observational data, we have to collect further information on causal structures from experiments with external interventions. In this paper, we propose an active learning approach for discovering causal structures in which we first find a Markov equivalence class from observational data, and then we orient undirected edges in every chain component via intervention experiments separately. In the experiments, some variables are manipulated through external interventions. We discuss two kinds of intervention experiments, randomized experiment and quasi-experiment. Furthermore, we give two optimal designs of experiments, a batch-intervention design and a sequential-intervention design, to minimize the number of manipulated variables and the set of candidate structures based on the minimax and the maximum entropy criteria. We show theoretically that structural learning can be done locally in subgraphs of chain components without need of checking illegal v-structures and cycles in the whole network and that a Markov equivalence subclass obtained after each intervention can still be depicted as a chain graph. Keywords: active learning, causal networks, directed acyclic graphs, intervention, Markov equivalence class, optimal design, structural learning

3 0.16108961 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs

Author: Xianchao Xie, Zhi Geng

Abstract: In this paper, we propose a recursive method for structural learning of directed acyclic graphs (DAGs), in which a problem of structural learning for a large DAG is first decomposed into two problems of structural learning for two small vertex subsets, each of which is then decomposed recursively into two problems of smaller subsets until none subset can be decomposed further. In our approach, search for separators of a pair of variables in a large DAG is localized to small subsets, and thus the approach can improve the efficiency of searches and the power of statistical tests for structural learning. We show how the recent advances in the learning of undirected graphical models can be employed to facilitate the decomposition. Simulations are given to demonstrate the performance of the proposed method. Keywords: Bayesian network, conditional independence, decomposition, directed acyclic graph, structural learning

4 0.14758833 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition

Author: Zongming Ma, Xianchao Xie, Zhi Geng

Abstract: Chain graphs present a broad class of graphical models for description of conditional independence structures, including both Markov networks and Bayesian networks as special cases. In this paper, we propose a computationally feasible method for the structural learning of chain graphs based on the idea of decomposing the learning problem into a set of smaller scale problems on its decomposed subgraphs. The decomposition requires conditional independencies but does not require the separators to be complete subgraphs. Algorithms for both skeleton recovery and complex arrow orientation are presented. Simulations under a variety of settings demonstrate the competitive performance of our method, especially when the underlying graph is sparse. Keywords: chain graph, conditional independence, decomposition, graphical model, structural learning

5 0.13622151 20 jmlr-2008-Causal Reasoning with Ancestral Graphs    (Special Topic on Causality)

Author: Jiji Zhang

Abstract: Causal reasoning is primarily concerned with what would happen to a system under external interventions. In particular, we are often interested in predicting the probability distribution of some random variables that would result if some other variables were forced to take certain values. One prominent approach to tackling this problem is based on causal Bayesian networks, using directed acyclic graphs as causal diagrams to relate post-intervention probabilities to pre-intervention probabilities that are estimable from observational data. However, such causal diagrams are seldom fully testable given observational data. In consequence, many causal discovery algorithms based on data-mining can only output an equivalence class of causal diagrams (rather than a single one). This paper is concerned with causal reasoning given an equivalence class of causal diagrams, represented by a (partial) ancestral graph. We present two main results. The first result extends Pearl (1995)’s celebrated do-calculus to the context of ancestral graphs. In the second result, we focus on a key component of Pearl’s calculus—the property of invariance under interventions, and give stronger graphical conditions for this property than those implied by the first result. The second result also improves the earlier, similar results due to Spirtes et al. (1993). Keywords: ancestral graphs, causal Bayesian network, do-calculus, intervention

6 0.11749471 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning    (Special Topic on Causality)

7 0.064130776 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure

8 0.059023131 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy    (Special Topic on Causality)

9 0.049985617 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models

10 0.046321902 41 jmlr-2008-Graphical Models for Structured Classification, with an Application to Interpreting Images of Protein Subcellular Location Patterns

11 0.043848217 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks

12 0.036438763 62 jmlr-2008-Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data

13 0.035772499 9 jmlr-2008-Active Learning by Spherical Subdivision

14 0.034027252 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration

15 0.032707956 17 jmlr-2008-Automatic PCA Dimension Selection for High Dimensional Data and Small Sample Sizes

16 0.025609564 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes

17 0.025276996 61 jmlr-2008-Mixed Membership Stochastic Blockmodels

18 0.022723757 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection

19 0.022553682 51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers

20 0.020182319 16 jmlr-2008-Approximations for Binary Gaussian Process Classification


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.172), (1, 0.363), (2, -0.024), (3, -0.002), (4, 0.093), (5, -0.032), (6, -0.025), (7, -0.007), (8, 0.024), (9, 0.029), (10, 0.062), (11, -0.015), (12, 0.059), (13, 0.072), (14, -0.011), (15, 0.008), (16, -0.036), (17, -0.003), (18, 0.018), (19, 0.0), (20, -0.008), (21, -0.048), (22, 0.024), (23, -0.034), (24, 0.031), (25, -0.104), (26, -0.003), (27, 0.022), (28, 0.024), (29, -0.014), (30, 0.071), (31, 0.108), (32, -0.073), (33, 0.025), (34, -0.015), (35, 0.023), (36, -0.0), (37, -0.063), (38, -0.11), (39, 0.149), (40, 0.114), (41, -0.062), (42, -0.001), (43, 0.089), (44, -0.002), (45, -0.008), (46, 0.047), (47, 0.179), (48, 0.094), (49, -0.095)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95277107 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models

Author: Mathias Drton, Thomas S. Richardson

Abstract: In graphical modelling, a bi-directed graph encodes marginal independences among random variables that are identified with the vertices of the graph. We show how to transform a bi-directed graph into a maximal ancestral graph that (i) represents the same independence structure as the original bi-directed graph, and (ii) minimizes the number of arrowheads among all ancestral graphs satisfying (i). Here the number of arrowheads of an ancestral graph is the number of directed edges plus twice the number of bi-directed edges. In Gaussian models, this construction can be used for more efficient iterative maximization of the likelihood function and to determine when maximum likelihood estimates are equal to empirical counterparts. Keywords: ancestral graph, covariance graph, graphical model, marginal independence, maximum likelihood estimation, multivariate normal distribution

2 0.78806871 10 jmlr-2008-Active Learning of Causal Networks with Intervention Experiments and Optimal Designs    (Special Topic on Causality)

Author: Yang-Bo He, Zhi Geng

Abstract: The causal discovery from data is important for various scientific investigations. Because we cannot distinguish the different directed acyclic graphs (DAGs) in a Markov equivalence class learned from observational data, we have to collect further information on causal structures from experiments with external interventions. In this paper, we propose an active learning approach for discovering causal structures in which we first find a Markov equivalence class from observational data, and then we orient undirected edges in every chain component via intervention experiments separately. In the experiments, some variables are manipulated through external interventions. We discuss two kinds of intervention experiments, randomized experiment and quasi-experiment. Furthermore, we give two optimal designs of experiments, a batch-intervention design and a sequential-intervention design, to minimize the number of manipulated variables and the set of candidate structures based on the minimax and the maximum entropy criteria. We show theoretically that structural learning can be done locally in subgraphs of chain components without need of checking illegal v-structures and cycles in the whole network and that a Markov equivalence subclass obtained after each intervention can still be depicted as a chain graph. Keywords: active learning, causal networks, directed acyclic graphs, intervention, Markov equivalence class, optimal design, structural learning

3 0.67500889 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs

Author: Xianchao Xie, Zhi Geng

Abstract: In this paper, we propose a recursive method for structural learning of directed acyclic graphs (DAGs), in which the problem of structural learning for a large DAG is first decomposed into two problems of structural learning for two small vertex subsets, each of which is then decomposed recursively into two problems for smaller subsets until no subset can be decomposed further. In our approach, the search for separators of a pair of variables in a large DAG is localized to small subsets, which improves both the efficiency of the searches and the power of the statistical tests used for structural learning. We show how recent advances in the learning of undirected graphical models can be employed to facilitate the decomposition. Simulations are given to demonstrate the performance of the proposed method. Keywords: Bayesian network, conditional independence, decomposition, directed acyclic graph, structural learning
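The recursion itself can be sketched abstractly. Here `split` is a hypothetical helper standing in for the separator search of the actual method: it either returns two (possibly overlapping) vertex subsets or reports that the set cannot be decomposed further.

```python
def recursive_decompose(vertices, split):
    """Toy sketch of the recursion: leaves are the small subproblems that
    get solved directly; split(vs) returns two subsets or None."""
    parts = split(vertices)
    if parts is None:
        return [vertices]
    left, right = parts
    return recursive_decompose(left, split) + recursive_decompose(right, split)

# Hypothetical splitter: halve any tuple of more than two vertices,
# keeping the middle vertex in both halves (a one-vertex "separator").
def halve(vs):
    if len(vs) <= 2:
        return None
    mid = len(vs) // 2
    return vs[:mid + 1], vs[mid:]

print(recursive_decompose((1, 2, 3, 4), halve))  # [(1, 2), (2, 3), (3, 4)]
```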

4 0.62511444 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition

Author: Zongming Ma, Xianchao Xie, Zhi Geng

Abstract: Chain graphs form a broad class of graphical models for describing conditional independence structures, including both Markov networks and Bayesian networks as special cases. In this paper, we propose a computationally feasible method for the structural learning of chain graphs based on decomposing the learning problem into a set of smaller-scale problems on its decomposed subgraphs. The decomposition requires conditional independencies but does not require the separators to be complete subgraphs. Algorithms for both skeleton recovery and complex arrow orientation are presented. Simulations under a variety of settings demonstrate the competitive performance of our method, especially when the underlying graph is sparse. Keywords: chain graph, conditional independence, decomposition, graphical model, structural learning
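Skeleton recovery, the first of the two steps mentioned, can be illustrated with a brute-force sketch. Here `indep` is a hypothetical conditional-independence oracle; a real implementation (including the decomposition above) searches conditioning sets far more cleverly than this exhaustive enumeration.

```python
from itertools import combinations

def skeleton(vertices, indep):
    """Toy skeleton recovery: keep an edge u - v unless some conditioning
    set S renders u and v independent, as reported by the oracle indep."""
    edges = set()
    vs = sorted(vertices)
    for u, v in combinations(vs, 2):
        others = [w for w in vs if w not in (u, v)]
        seps = (set(c) for r in range(len(others) + 1)
                for c in combinations(others, r))
        if not any(indep(u, v, S) for S in seps):
            edges.add((u, v))
    return edges

# Oracle for the chain 1 - 2 - 3: only 1 and 3 are separated, by {2}.
oracle = lambda u, v, S: (u, v) == (1, 3) and 2 in S
print(skeleton({1, 2, 3}, oracle))  # the two chain edges
```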

5 0.60846102 20 jmlr-2008-Causal Reasoning with Ancestral Graphs    (Special Topic on Causality)

Author: Jiji Zhang

Abstract: Causal reasoning is primarily concerned with what would happen to a system under external interventions. In particular, we are often interested in predicting the probability distribution of some random variables that would result if some other variables were forced to take certain values. One prominent approach to tackling this problem is based on causal Bayesian networks, using directed acyclic graphs as causal diagrams to relate post-intervention probabilities to pre-intervention probabilities that are estimable from observational data. However, such causal diagrams are seldom fully testable given observational data. In consequence, many causal discovery algorithms based on data mining can only output an equivalence class of causal diagrams (rather than a single one). This paper is concerned with causal reasoning given an equivalence class of causal diagrams, represented by a (partial) ancestral graph. We present two main results. The first result extends Pearl's (1995) celebrated do-calculus to the context of ancestral graphs. In the second result, we focus on a key component of Pearl's calculus, the property of invariance under interventions, and give stronger graphical conditions for this property than those implied by the first result. The second result also improves on the earlier, similar results of Spirtes et al. (1993). Keywords: ancestral graphs, causal Bayesian network, do-calculus, intervention
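The relation between post- and pre-intervention probabilities that do-calculus formalizes is easiest to see in the textbook back-door adjustment, shown here on a hypothetical three-variable DAG with made-up numbers. This is standard Pearl-style adjustment, not the paper's ancestral-graph extension.

```python
# DAG Z -> X, Z -> Y, X -> Y.  Z blocks the back-door path, so
# P(Y=1 | do(X=x)) = sum_z P(Z=z) * P(Y=1 | X=x, Z=z).
P_Z = {0: 0.6, 1: 0.4}
P_Y1 = {(0, 0): 0.1, (0, 1): 0.3,   # P(Y=1 | X=x, Z=z), illustrative numbers
        (1, 0): 0.7, (1, 1): 0.9}

def p_y1_do_x(x):
    """Post-intervention probability from pre-intervention quantities."""
    return sum(P_Z[z] * P_Y1[(x, z)] for z in P_Z)

print(p_y1_do_x(1))  # 0.6*0.7 + 0.4*0.9 = 0.78
```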

6 0.43079257 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning    (Special Topic on Causality)

7 0.26152679 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy    (Special Topic on Causality)

8 0.2408146 62 jmlr-2008-Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data

9 0.20836478 9 jmlr-2008-Active Learning by Spherical Subdivision

10 0.19657746 61 jmlr-2008-Mixed Membership Stochastic Blockmodels

11 0.19540554 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection

12 0.19403817 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration

13 0.19025321 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models

14 0.1646481 17 jmlr-2008-Automatic PCA Dimension Selection for High Dimensional Data and Small Sample Sizes

15 0.16193785 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes

16 0.1557664 41 jmlr-2008-Graphical Models for Structured Classification, with an Application to Interpreting Images of Protein Subcellular Location Patterns

17 0.13867483 96 jmlr-2008-Visualizing Data using t-SNE

18 0.12425073 16 jmlr-2008-Approximations for Binary Gaussian Process Classification

19 0.12303749 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction

20 0.12131511 26 jmlr-2008-Consistency of Trace Norm Minimization


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.519), (5, 0.01), (40, 0.013), (54, 0.022), (58, 0.018), (61, 0.07), (66, 0.051), (76, 0.031), (87, 0.043), (88, 0.034), (92, 0.028), (94, 0.023), (99, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94204897 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models

Author: Mathias Drton, Thomas S. Richardson


2 0.87653828 11 jmlr-2008-Aggregation of SVM Classifiers Using Sobolev Spaces

Author: Sébastien Loustau

Abstract: This paper investigates the statistical performance of Support Vector Machines (SVM) and considers the problem of adaptation to the margin parameter and to complexity. In particular, we provide a classifier with no tuning parameter, built as a combination of SVM classifiers. Our contribution is two-fold: (1) we propose learning rates for SVM using Sobolev spaces and build a numerically realizable aggregate that converges at the same rate; (2) we present practical experiments with this method of aggregation for SVM using both Sobolev spaces and Gaussian kernels. Keywords: classification, support vector machines, learning rates, approximation, aggregation of classifiers
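As a schematic stand-in for aggregation of classifiers, the classical exponential-weights scheme weights each candidate by its empirical loss. The paper's aggregate is tuned to SVM learning rates; this generic sketch only illustrates the general idea.

```python
import math

def exp_weight_aggregate(losses, temperature=1.0):
    """Generic exponential-weights aggregation: classifiers with smaller
    empirical loss receive larger convex-combination weight."""
    ws = [math.exp(-loss / temperature) for loss in losses]
    total = sum(ws)
    return [w / total for w in ws]

# Three candidate classifiers with empirical losses 0.10, 0.12, 0.40:
weights = exp_weight_aggregate([0.10, 0.12, 0.40])
print(weights[0] > weights[1] > weights[2])  # best classifier dominates
```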

3 0.8376137 15 jmlr-2008-An Information Criterion for Variable Selection in Support Vector Machines    (Special Topic on Model Selection)

Author: Gerda Claeskens, Christophe Croux, Johan Van Kerckhoven

Abstract: Support vector machines for classification have the advantage that the curse of dimensionality is circumvented. It has been shown that reducing the dimension of the input space leads to even better results. For this purpose, we propose two information criteria which can be computed directly from the definition of the support vector machine. We assess the predictive performance of the models selected by our new criteria and compare them to existing variable selection techniques in a simulation study. The simulation results show that the new criteria are competitive in terms of generalization error rate while being much easier to compute. We arrive at the same findings when comparing on real-world benchmark data sets. Keywords: information criterion, supervised classification, support vector machine, variable selection
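As context for what an information criterion trades off, here is the generic Akaike criterion; the paper's SVM-specific criteria differ, and this block is only a schematic stand-in with invented numbers.

```python
def aic(log_likelihood, n_params):
    """Generic Akaike information criterion (lower is better): twice the
    parameter count penalizes complexity, the log-likelihood rewards fit."""
    return 2 * n_params - 2 * log_likelihood

# A 3-variable model beats an 8-variable model whose fit is only slightly better:
print(aic(-120.0, 3), aic(-118.5, 8))  # 246.0 253.0
```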

4 0.46582702 20 jmlr-2008-Causal Reasoning with Ancestral Graphs    (Special Topic on Causality)

Author: Jiji Zhang


5 0.45177674 10 jmlr-2008-Active Learning of Causal Networks with Intervention Experiments and Optimal Designs    (Special Topic on Causality)

Author: Yang-Bo He, Zhi Geng


6 0.45044953 21 jmlr-2008-Classification with a Reject Option using a Hinge Loss

7 0.42576936 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition

8 0.39961028 19 jmlr-2008-Bouligand Derivatives and Robustness of Support Vector Machines for Regression

9 0.39090911 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs

10 0.37097892 3 jmlr-2008-A Moment Bound for Multi-hinge Classifiers

11 0.35254177 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning    (Special Topic on Causality)

12 0.34236062 80 jmlr-2008-Ranking Individuals by Group Comparisons

13 0.33363983 27 jmlr-2008-Consistency of the Group Lasso and Multiple Kernel Learning

14 0.32341921 26 jmlr-2008-Consistency of Trace Norm Minimization

15 0.31853151 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning

16 0.31631854 36 jmlr-2008-Finite-Time Bounds for Fitted Value Iteration

17 0.31611124 68 jmlr-2008-Nearly Uniform Validation Improves Compression-Based Error Bounds

18 0.31423178 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks

19 0.30721447 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks

20 0.3065649 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes