jmlr jmlr2008 jmlr2008-88 knowledge-graph by maker-knowledge-mining

88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition


Source: pdf

Author: Zongming Ma, Xianchao Xie, Zhi Geng

Abstract: Chain graphs present a broad class of graphical models for description of conditional independence structures, including both Markov networks and Bayesian networks as special cases. In this paper, we propose a computationally feasible method for the structural learning of chain graphs based on the idea of decomposing the learning problem into a set of smaller scale problems on its decomposed subgraphs. The decomposition requires conditional independencies but does not require the separators to be complete subgraphs. Algorithms for both skeleton recovery and complex arrow orientation are presented. Simulations under a variety of settings demonstrate the competitive performance of our method, especially when the underlying graph is sparse. Keywords: chain graph, conditional independence, decomposition, graphical model, structural learning

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Keywords: chain graph, conditional independence, decomposition, graphical model, structural learning 1. [sent-12, score-0.498]

2 To our limited knowledge, Studený (1997) is the only work that addresses the issue of learning chain graph structures in the literature. [sent-35, score-0.516]

3 Recently, Drton and Perlman (2008) studied the special case of Gaussian chain graph models using a multiple testing procedure, which requires prior knowledge of the dependence chain structure. [sent-36, score-0.883]

4 As a consequence of this versatility, chain graph models have recently received growing attention as a modeling tool in statistical applications. [sent-40, score-0.516]

5 (1999) constructed a chain graph model for credit scoring in a case study in finance. [sent-42, score-0.516]

6 One important reason, we believe, is the lack of readily available algorithms for chain graph structure recovery. [sent-46, score-0.516]

7 In this paper, we focus on developing a computationally feasible method for the structural learning of chain graphs based on this decomposition approach. [sent-51, score-0.511]

8 However, unlike the case of Bayesian networks, the structural learning of chain graph models is more complicated due to the extended Markov property of chain graphs and the presence of both directed and undirected edges. [sent-53, score-1.141]

9 In Section 2, we introduce necessary background for chain graph models and the concept of decomposition via separation trees. [sent-60, score-0.814]

10 The skeleton G of a graph G is the undirected graph obtained by replacing the arrows of G by lines. [sent-79, score-0.719]

11 If every edge in graph G is undirected, then for any vertex u, we define the neighborhood neG (u) of u to be the set of vertices v such that u − v in G . [sent-80, score-0.549]

12 A route in G is a sequence of vertices (v0 , · · · , vk ), k ≥ 0, such that (vi−1 , vi ) ∈ E or (vi , vi−1 ) ∈ E for i = 1, · · · , k, and the vertices v0 and vk are called terminals of the route. [sent-81, score-0.577]

13 A graph with only undirected edges is called an undirected graph (UG). [sent-87, score-0.588]

14 A graph with only directed edges and without directed cycles is called a directed acyclic graph (DAG). [sent-88, score-0.552]

15 A graph that has no directed (pseudo) cycles is called a chain graph. [sent-89, score-0.566]

16 A complex in G is a path (v0 , · · · , vk ), k ≥ 2, such that v0 → v1 , vi −vi+1 (for i = 1, · · · , k −2) and vk−1 ← vk in G , and no additional edge exists among vertices {v0 , · · · , vk } in G . [sent-96, score-0.538]

17 An arrow in a graph G is called a complex arrow if it belongs to a complex in G. [sent-100, score-0.581]

18 The moral graph G m of G is the graph obtained by first joining the parents of each complex by a line and then turning arrows of the obtained graph into lines. [sent-102, score-0.685]
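The definitions above translate directly into code. Below is a minimal Python sketch (our own illustration, not the authors' implementation) that stores a chain graph as a set of arrows and a set of lines, computes the skeleton, enumerates complexes by brute force, and moralizes. Vertex labels are assumed comparable so that a complex and its reversal can be deduplicated.

    from itertools import combinations

    def skeleton(directed, undirected):
        """Skeleton: replace every arrow u -> v by the line u - v."""
        return {frozenset((u, v)) for (u, v) in directed} | set(undirected)

    def _undirected_paths(undirected, start, max_len):
        """Yield simple paths (as tuples) whose edges are all undirected."""
        adj = {}
        for e in undirected:
            u, v = tuple(e)
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        stack = [(start,)]
        while stack:
            path = stack.pop()
            yield path
            if len(path) < max_len:
                for w in adj.get(path[-1], ()):
                    if w not in path:
                        stack.append(path + (w,))

    def complexes(directed, undirected, max_len=6):
        """Brute-force enumeration of complexes (v0, ..., vk), k >= 2:
        v0 -> v1, an undirected path v1 - ... - v_{k-1}, v_{k-1} <- vk,
        and no additional edge among {v0, ..., vk}.  Exponential; for
        illustration only."""
        skel = skeleton(directed, undirected)
        out = set()
        for (v0, v1) in directed:
            for mid in _undirected_paths(undirected, v1, max_len - 2):
                for (vk, tgt) in directed:
                    if tgt != mid[-1] or vk == v0:
                        continue
                    vs = (v0,) + mid + (vk,)
                    if len(set(vs)) != len(vs):
                        continue
                    on_path = {frozenset(p) for p in zip(vs, vs[1:])}
                    present = {frozenset(p) for p in combinations(vs, 2)
                               if frozenset(p) in skel}
                    if present == on_path:
                        out.add(min(vs, vs[::-1]))  # dedupe reversals
        return out

    def moral_graph(directed, undirected):
        """Join the two parents v0, vk of each complex by a line, then
        turn every remaining arrow into a line."""
        extra = {frozenset((c[0], c[-1])) for c in complexes(directed, undirected)}
        return skeleton(directed, undirected) | extra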

19 Studený and Bouckaert (2001) introduced the notion of c-separation for chain graph models. [sent-103, score-0.516]

20 Definition 1 Let A, B, S be three disjoint subsets of the vertex set V of a chain graph G , such that A, B are nonempty. [sent-108, score-0.665]

21 For a chain graph G = (V, E), let each v ∈ V be associated with a random variable Xv with domain 𝒳v, and let µ be the underlying probability measure on ∏v∈V 𝒳v. [sent-111, score-0.516]

22 The following result from Frydenberg (1990) characterizes equivalence classes graphically: Two chain graphs are Markov equivalent if and only if they have the same skeleton and complexes, that is, they have the same pattern. [sent-117, score-0.639]
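As a hedged usage note building on the sketch above, this graphical criterion can be checked mechanically: two chain graphs have the same pattern exactly when the comparison below succeeds (all names are the hypothetical helpers from the previous sketch).

    def same_pattern(g1, g2):
        """Each graph is a (directed, undirected) pair of edge sets."""
        (d1, u1), (d2, u2) = g1, g2
        return (skeleton(d1, u1) == skeleton(d2, u2) and
                complexes(d1, u1) == complexes(d2, u2))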

23 Figure 1: (a) a chain graph G; (b) its moral graph G^m. [sent-128, score-0.693]

24 The concept is similar to the junction tree of cliques and the independence tree introduced for DAGs as ‘d-separation trees’ (Xie et al. [sent-131, score-0.512]

25 The term ‘node’ is used for a separation tree to distinguish from the term ‘vertex’ for a graph in general. [sent-136, score-0.616]

26 Definition 2 A tree T with node set C is said to be a separation tree for a chain graph G = (V, E) if 1. [sent-141, score-1.192]

27 G Notice that a separator is defined in terms of a tree whose nodes consist of variable sets, while the c-separator is defined based on a chain graph. [sent-144, score-0.686]

28 In general, these two concepts are not related, though for a separation tree its separator must be some corresponding c-separator in the underlying chain graph. [sent-145, score-0.955]

29 Not surprisingly, the separation tree could be regarded as a scheme for decomposing the knowledge represented by the chain graph into local subsets. [sent-156, score-0.955]

30 The definition of separation trees for chain graphs is similar to that of junction trees of cliques, see Cowell et al. [sent-158, score-0.876]

31 Actually, it is not difficult to see that a junction tree of a chain graph G is also its separation tree. [sent-160, score-1.038]

32 Structural Learning of Chain Graphs In this section, we discuss how separation trees can be used to facilitate the decomposition of the structural learning of chain graphs. [sent-163, score-0.748]

33 It should be emphasized that even with perfect knowledge on the underlying distribution, any two chain graph structures in the same equivalence class are indistinguishable. [sent-165, score-0.55]

34 Thus, we can only expect to recover a pattern from the observed data, that is, an equivalence class to which the underlying chain graph belongs. [sent-166, score-0.516]

35 Theoretical Results. It is known that for P faithful to a chain graph G = (V, E), an edge (u, v) is in the skeleton G if and only if Xu ⊥̸ Xv | XS for every S ⊆ V \ {u, v} (see Studený 1997, Lemma 3. [sent-171, score-0.81]

36 The following theorem shows that with a separation tree, one can localize the search into one node or a small number of tree nodes. [sent-174, score-0.506]

37 Theorem 3 Let T be a separation tree for a chain graph G . [sent-175, score-0.955]

38 Given the skeleton of a chain graph, we extend it into the equivalence class by identifying and directing all the complex arrows, to which the following result is helpful. [sent-194, score-0.616]

39 Proposition 4 Let G be a chain graph and T a separation tree of G . [sent-195, score-0.955]

40 Skeleton Recovery with Separation Tree. In this subsection, we propose an algorithm based on Theorem 3 for identifying the chain graph skeleton given separation tree information. [sent-199, score-1.203]

41 Let G be an unknown chain graph of interest and P be a probability distribution that is faithful to it. [sent-200, score-0.516]

42 (Continued) We apply Algorithm 1 to the chain graph G in Fig. [sent-221, score-0.516]

43 Input: A separation tree T of G ; perfect conditional independence knowledge about P. [sent-230, score-0.609]

44-45 Algorithm 1 (skeleton recovery with a separation tree):

     1  Set S = ∅;
     2  foreach tree node Ch do
     3      start from a complete undirected graph Gh with vertex set Ch;
     4      foreach vertex pair {u, v} ⊂ Ch do
     5          if ∃ Suv ⊂ Ch s.t. u ⊥ v | Suv then
     6              delete the edge (u, v) in Gh;
     7              add Suv to S;
     8          end
     9      end
    10  end
    11  Combine the graphs Gh = (Ch, Eh), h = 1, · · · , H into an undirected graph G = (V, ∪_{h=1}^{H} Eh);
    12  foreach vertex pair {u, v} contained in more than one tree node and (u, v) ∈ G do
    13      if ∃ Ch s.t. … [sent-232, score-0.568] [sent-234, score-0.752]
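A minimal Python rendering of the listing above (a sketch under our own naming, not the paper's code): ci_test(u, v, S) stands in for the conditional-independence oracle, and tree_nodes holds the node variable sets C1, ..., CH of the separation tree. The truncated lines 12-13 are simplified here by removing any combined edge for which some node already produced a separator.

    from itertools import chain, combinations

    def powerset(xs):
        xs = list(xs)
        return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

    def recover_skeleton(tree_nodes, ci_test):
        seps, local_edges = {}, []
        for C in tree_nodes:                      # one local skeleton per node
            E = {frozenset(p) for p in combinations(sorted(C), 2)}
            for u, v in combinations(sorted(C), 2):
                for S in powerset(C - {u, v}):
                    if ci_test(u, v, set(S)):
                        E.discard(frozenset((u, v)))
                        seps[frozenset((u, v))] = set(S)
                        break
            local_edges.append(E)
        G = set().union(*local_edges)             # combine the local skeletons
        for e in list(G):                         # lines 12-13, simplified:
            if e in seps:                         # separated somewhere => drop
                G.discard(e)
        return G, seps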

46 As in the previous subsection, we assume separation tree information and perfect conditional independence knowledge. [sent-244, score-0.609]

47 2; (b) partially recovered global skeleton of G in part 2 of Algorithm 1; (c) completely recovered global skeleton of G in part 3 of Algorithm 1. [sent-251, score-0.55]

48 4, which is exactly the pattern for our original chain graph G in Fig. [sent-259, score-0.516]

49 (2006), which guarantees that their method for constructing a separation tree from data is valid for chain graph models. [sent-323, score-0.955]

50 Then we propose an algorithm for constructing a separation tree from background knowledge encoded in a labeled block ordering (Roverato and La Rocca, 2006) of the underlying chain graph. [sent-324, score-0.963]

51 From Undirected Independence Graph to Separation Tree. For a chain graph G = (V, E) and a faithful distribution P, an undirected graph U with vertex set V is an undirected independence graph (UIG) of G if for any u, v ∈ V, (u, v) is not an edge of U ⇒ u ⊥ v | V \ {u, v} in P. [sent-329, score-1.371]
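For Gaussian data this UIG condition can be read off the precision matrix, since the partial correlation of Xu and Xv given all remaining variables vanishes exactly when the (u, v) entry of the inverse covariance is zero. A hedged sketch, in which a fixed threshold stands in for a proper test such as Fisher's z:

    import numpy as np

    def gaussian_uig(X, threshold=0.05):
        """X: n-by-p data matrix (assumes n > p so the covariance is
        invertible); returns an undirected edge set over 0..p-1."""
        prec = np.linalg.inv(np.cov(X, rowvar=False))
        d = np.sqrt(np.diag(prec))
        pcorr = -prec / np.outer(d, d)        # partial correlations
        p = X.shape[1]
        return {frozenset((i, j))
                for i in range(p) for j in range(i + 1, p)
                if abs(pcorr[i, j]) > threshold}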

52 Then a junction tree constructed from any undirected independence graph of G is a separation tree for G . [sent-333, score-1.051]
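The UIG-to-junction-tree step can be realized with the classical recipe: triangulate under a min-fill elimination ordering, keep the maximal elimination cliques, and join them by a maximum-weight spanning tree on separator sizes. The sketch below is our illustration of that recipe, not the paper's code, and assumes comparable vertex labels.

    def junction_tree(vertices, edges):
        adj = {v: set() for v in vertices}
        for e in edges:
            u, v = tuple(e)
            adj[u].add(v); adj[v].add(u)
        cliques, remaining = [], set(vertices)
        while remaining:
            def fill(v):                   # new edges created by eliminating v
                nb = adj[v] & remaining
                return sum(1 for a in nb for b in nb if a < b and b not in adj[a])
            v = min(remaining, key=fill)
            nb = adj[v] & remaining
            for a in nb:                   # triangulate: connect the neighbourhood
                adj[a] |= nb - {a}
            cliques.append(frozenset(nb | {v}))
            remaining.discard(v)
        cliques = [c for c in set(cliques) if not any(c < d for d in cliques)]
        tree, inside = [], {cliques[0]}    # max-weight spanning tree (Prim-style)
        while len(inside) < len(cliques):
            c, d = max(((c, d) for c in inside for d in cliques if d not in inside),
                       key=lambda cd: len(cd[0] & cd[1]))
            tree.append((c, d)); inside.add(d)
        return cliques, tree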

53 We say a chain graph G = (V, E) is B-consistent if 1. [sent-352, score-0.516]

54 Suppose that the underlying chain graph is the one in Fig. [sent-366, score-0.516]

55 By taking the junction tree of this DAG and replacing the blocks by the vertices they contain, we obtain the tree structure in Fig. [sent-371, score-0.569]

56 This is a separation tree of the chain graph in Fig. [sent-373, score-0.955]

57 Figure 7: (a) a chain graph with a labeled block ordering; (b) the DAG constructed for the blocks; (c) the tree structure obtained from the junction tree of the DAG in (b). [sent-375, score-1.041]

58 Below we propose Algorithm 3 for constructing a separation tree from labeled block ordering knowledge. [sent-376, score-0.593]

59 By taking the junction tree of this particular DAG, we obtain a separation tree of the underlying chain graph model. [sent-380, score-1.208]
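A heavily hedged sketch of that construction: treat each block as a super-vertex, moralize the block-level DAG, build its junction tree (reusing the junction_tree sketch above), and finally substitute every block by the variables it contains. The block-level DAG edges are taken as given input here, since the extracted text does not fully specify how they are formed from the labeled block ordering.

    from itertools import combinations

    def separation_tree_from_blocks(blocks, block_dag_edges):
        """blocks: list of variable sets; block_dag_edges: arrows (i, j)."""
        moral = {frozenset(e) for e in block_dag_edges}
        parents = {}
        for a, b in block_dag_edges:           # marry parents of each block
            parents.setdefault(b, set()).add(a)
        for ps in parents.values():
            moral |= {frozenset(p) for p in combinations(sorted(ps), 2)}
        cliques, tree = junction_tree(range(len(blocks)), moral)
        blow = lambda c: frozenset().union(*(blocks[i] for i in c))
        return [blow(c) for c in cliques], [(blow(a), blow(b)) for a, b in tree]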

60 Separation Tree Refinement. In practice, the nodes of a separation tree constructed from a labeled block ordering or other methods may still be large. [sent-395, score-0.593]

61 Since the complexities of our skeleton and complex recovery algorithms are largely dominated by the cardinality of the largest node on the separation tree, it is desirable to further refine our separation tree by reducing the sizes of the nodes. [sent-396, score-1.117]

62 Input: A crude separation tree Tc for G ; perfect conditional independence knowledge. [sent-399, score-0.609]

63 Algorithm (separation tree refinement):

     1  Construct an undirected independence subgraph over each node of Tc;
     2  Combine the subgraphs into a global undirected independence graph Ḡ whose edge set is the union of all edge sets of the subgraphs;
     3  Construct a junction tree T̄ of Ḡ. [sent-401, score-1.015]

64 If the current separation tree contains a node whose cardinality is still relatively large, the above algorithm can be repeatedly used until no further refinement is available. [sent-402, score-0.537]
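A hedged sketch of this refinement loop, where local_uig(C) is a placeholder returning the undirected independence edges inside node C and junction_tree is the sketch above:

    def refine_separation_tree(tree_nodes, local_uig):
        while True:
            edges = set().union(*(local_uig(C) for C in tree_nodes))
            verts = sorted(set().union(*tree_nodes))
            cliques, _ = junction_tree(verts, edges)
            if max(map(len, cliques)) >= max(map(len, tree_nodes)):
                return tree_nodes          # largest node stopped shrinking
            tree_nodes = cliques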

65 However, we remark that the cardinality of the largest node on the separation tree is eventually determined by the sparseness of the underlying chain graph together with other factors including the specific algorithms for constructing undirected independence subgraphs and junction tree. [sent-403, score-1.287]

66 Therefore, the complexity of investigating each possible edge in the skeleton is O(2^p), and hence the complexity of constructing the global skeleton is O(p^2 · 2^p). [sent-412, score-0.511]
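As an illustration (our numbers, not the paper's): for p = 40 the unrestricted search would face up to 2^40 ≈ 1.1 × 10^12 conditioning sets per vertex pair, whereas if the largest separation-tree node contains only c = 10 variables, the localized search of Algorithm 1 examines at most 2^(c-2) = 256 subsets per pair within that node.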

67 This can be done by performing a graph traversal algorithm on the undirected graph obtained by deleting all existing arrows from the current graph, with the adjacency relations checked on the original graph. [sent-437, score-0.74]
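A minimal sketch of that traversal, assuming the graph representation from the earlier sketches: drop every arrow, then run a breadth-first search over the remaining lines.

    from collections import deque

    def undirected_reachable(start, undirected):
        adj = {}
        for e in undirected:
            u, v = tuple(e)
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        seen, todo = {start}, deque([start])
        while todo:
            x = todo.popleft()
            for y in adj.get(x, ()):
                if y not in seen:
                    seen.add(y); todo.append(y)
        return seen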

68 By incorporating the information obtained in the skeleton recovery stage, we greatly reduce the number of conditional independence tests to be checked and hence obtain an algorithm of only polynomial time complexity for the complex recovery stage. [sent-442, score-0.574]

69 Note that d is defined as the maximum degree of the vertices in the partially recovered skeleton G obtained after step 2 of Algorithm 1, where G is exactly the skeleton for the underlying DAG in the current situation (i. [sent-454, score-0.668]

70 Therefore, if we believe that the true graph is sparse (the case where our decomposition approach is most applicable), we can apply our general chain graph structural learning algorithm without worrying much about significant extra cost even when the underlying graph is indeed a DAG. [sent-458, score-0.99]

71 We first demonstrate various aspects of our algorithms by running them on randomly generated chain graph models. [sent-461, score-0.516]

72 We generate a random chain graph on V as follows: 1. [sent-474, score-0.516]

73 This procedure then yields an adjacency matrix A for a chain graph, with A_ij = A_ji = 1 representing an undirected edge between V_i and V_j, and A_ij = 1, A_ji = 0 representing a directed edge from V_i to V_j. [sent-481, score-0.813]
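One hedged way to realize this encoding while ruling out directed pseudocycles by construction is to fix an ordered partition into chain components first, placing lines only within components and arrows only from earlier to later components. The edge probabilities below are illustrative, not the paper's settings.

    import numpy as np

    def random_chain_graph(p, k, p_edge=0.1, rng=None):
        rng = np.random.default_rng(rng)
        comp = np.sort(rng.integers(k, size=p))   # ordered component labels
        A = np.zeros((p, p), dtype=int)
        for i in range(p):
            for j in range(i + 1, p):
                if rng.random() >= p_edge:
                    continue
                if comp[i] == comp[j]:
                    A[i, j] = A[j, i] = 1         # undirected: A_ij = A_ji = 1
                else:
                    A[i, j] = 1                   # directed:   A_ij = 1, A_ji = 0
        return A, comp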

74 Given a randomly generated chain graph G with ordered chain components C1, · · · , Ck, we generate a Gaussian distribution on it via the incomplete block-recursive regression described in Wermuth (1992). [sent-483, score-0.855]

75 For the chain component Ci , suppose the corresponding vertices are Vi1 , · · · ,Vir , and in general, let B∗ [Vl ,Vm ] be the element of B∗ that corresponds to the vertex pair (Vl ,Vm ). [sent-511, score-0.634]
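A heavily hedged stand-in for the sampling step (a simplified construction, not Wermuth's exact block-recursive regression): each chain component is drawn from a linear Gaussian model in its parents, with a within-component precision matrix whose zeros match the missing lines.

    import numpy as np

    def sample_gaussian_cg(A, comp, n, rng=None):
        rng = np.random.default_rng(rng)
        p = A.shape[0]
        X = np.zeros((n, p))
        for c in np.unique(comp):
            idx = np.flatnonzero(comp == c)
            pa = np.flatnonzero(comp < c)
            # 0.5 and 0.3 below are arbitrary illustrative coefficients
            B = 0.5 * A[np.ix_(pa, idx)]      # effects flow only along arrows
            mean = X[:, pa] @ B
            m = len(idx)
            K = m * np.eye(m) - 0.3 * A[np.ix_(idx, idx)]   # diag-dominant => PD
            X[:, idx] = mean + rng.multivariate_normal(
                np.zeros(m), np.linalg.inv(K), size=n)
        return X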

76 A UIG is obtained after Step 2, and its junction tree is computed and used as the separation tree in the algorithms. [sent-553, score-0.692]
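Wired together, the simulation pipeline sketched in this section reads roughly as follows; every name is one of the hypothetical helpers above, and ci_test is the conditional-independence test placeholder from the Algorithm 1 sketch (e.g., a Fisher-z partial-correlation test).

    A, comp = random_chain_graph(p=40, k=8, rng=0)
    X = sample_gaussian_cg(A, comp, n=1000, rng=0)
    uig_edges = gaussian_uig(X)                        # Step 2: estimate the UIG
    cliques, _ = junction_tree(range(X.shape[1]), uig_edges)
    G_skel, seps = recover_skeleton(cliques, ci_test)  # Algorithm 1 on tree nodes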

77 Figure 8: Error measures of the algorithms for randomly generated Gaussian chain graph models: average over 25 repetitions with 10 variables (x-axis: log10(sample size)). [sent-623, score-0.516]

78 Figure 9: Error measures of the algorithms for randomly generated Gaussian chain graph models: average over 25 repetitions with 40 variables (x-axis: log10(sample size)). [sent-699, score-0.516]

79 Figure 10: Error measures of the algorithms for randomly generated Gaussian chain graph models: average over 25 repetitions with 80 variables (x-axis: log10(sample size)). [sent-778, score-0.516]

80 Figure 11: Running times of the algorithms on randomly generated Gaussian chain graph models: average over 25 repetitions (x-axis: log10(sample size)). [sent-848, score-0.516]

81 performed on the data to learn the UIG and the junction tree of the UIG is supplied as the separation tree for the algorithm. [sent-1091, score-0.692]

82 Discussion In this paper, we presented a computationally feasible method for structural learning of chain graph models. [sent-1382, score-0.576]

83 The method can be used to facilitate the investigation of both response-explanatory and symmetric association relations among a set of variables simultaneously within the framework of chain graph models, a merit not shared by either Bayesian networks or Markov networks. [sent-1383, score-0.546]

84 Finally, our approach might be extendible to the structural learning of chain graphs with alternative Markov properties, for example, AMP chain graphs (Andersson et al. [sent-1399, score-0.998]

85 Definition 7 Let T be a separation tree for a CG G with the node set C = {C1 , · · · ,CH }. [sent-1412, score-0.506]

86 Lemma 8 Let ρ be a route from u to v in a chain graph G , and W the set of all vertices on ρ (W may or may not contain the two end vertices). [sent-1415, score-0.75]

87 Proof. Since ρ is intervented by S and W ⊂ S, there must be a non-head-to-head section σ of ρ that is hit by S, and in fact every non-head-to-head section of ρ is hit by S. [sent-1418, score-0.537]

88 Lemma 9 Let T be a separation tree for a chain graph over vertex set V and K a separator of T which separates T into two subtrees T1 and T2 with variable sets V1 and V2 . [sent-1421, score-1.281]

89 Lemma 10 Let u and v be two non-adjacent vertices in a chain graph G and ρ a route from u to v in T. [sent-1451, score-0.891]

90 Lemma 11 Let T be a separation tree for a chain graph G over V and C a node of T . [sent-1457, score-1.022]

91 For situation 3, since chain graphs do not admit directed pseudocycles, we know that x ∉ An(u) and y = v. [sent-1476, score-0.502]

92 Lemma 12 Suppose that u and v are two adjacent vertices in G. Then for any separation tree T for G, there exists a node C in T that contains both u and v. [sent-1538, score-0.697]

93 2), we know that for any chain graph G, there is an edge between two vertices u and v if and only if u ⊥̸ v | S for every subset S of V. [sent-1563, score-0.739]

94 By Proposition 4, Lemma 12 and the correctness of Algorithm 1, we know that every ordered vertex triple u, v, w in G with u → w a complex arrow and v the parent of (one of) the corresponding complex(es) is considered in line 4 of Algorithm 2. [sent-1573, score-0.487]

95 assuming positivity, both DAG models and chain graph models are closed subsets of graphoids under the 5 axioms, see Pearl (1988) and Studený and Bouckaert (2001); 2. [sent-1582, score-0.516]

96 With the block vertices substituted, by the definition of separation trees, the output T of Algorithm 3 is a separation tree of G . [sent-1586, score-0.923]

97 Predicting protein folds with structural repeats using a chain graph model. [sent-1732, score-0.576]

98 A discrete variable chain graph for applicants for credit. [sent-1778, score-0.516]

99 On chain graph models for description of conditional independence structures. [sent-1795, score-0.563]

100 On substantive research hypotheses, conditional independence graphs and graphical chain models. [sent-1823, score-0.61]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('chain', 0.339), ('separation', 0.269), ('skeleton', 0.217), ('lcd', 0.204), ('separator', 0.177), ('graph', 0.177), ('tree', 0.17), ('xie', 0.165), ('dag', 0.161), ('shd', 0.157), ('ecomposition', 0.155), ('hain', 0.155), ('intervented', 0.155), ('tructural', 0.155), ('vertex', 0.149), ('bdg', 0.148), ('vertices', 0.146), ('arrow', 0.142), ('suv', 0.14), ('studen', 0.126), ('raphs', 0.118), ('eng', 0.107), ('tpr', 0.103), ('ie', 0.1), ('fpr', 0.096), ('non', 0.096), ('hit', 0.095), ('undirected', 0.093), ('independence', 0.089), ('route', 0.088), ('graphs', 0.083), ('junction', 0.083), ('vi', 0.081), ('edge', 0.077), ('orient', 0.077), ('block', 0.069), ('node', 0.067), ('recovery', 0.065), ('sep', 0.063), ('structural', 0.06), ('correctness', 0.06), ('complex', 0.06), ('uig', 0.058), ('vk', 0.058), ('recovered', 0.058), ('wermuth', 0.058), ('xv', 0.057), ('arrows', 0.055), ('cance', 0.055), ('graphical', 0.052), ('earning', 0.052), ('ordering', 0.052), ('skeletons', 0.052), ('trees', 0.051), ('directed', 0.05), ('neg', 0.049), ('scd', 0.049), ('studeny', 0.049), ('edges', 0.048), ('conditional', 0.047), ('ch', 0.046), ('adjacent', 0.045), ('alarm', 0.045), ('rec', 0.044), ('contained', 0.044), ('lauritzen', 0.043), ('delete', 0.041), ('bayesian', 0.041), ('legitimate', 0.041), ('foreach', 0.041), ('disorientation', 0.039), ('uigs', 0.039), ('line', 0.039), ('ci', 0.038), ('triple', 0.037), ('simulation', 0.036), ('xs', 0.035), ('perfect', 0.034), ('mmhc', 0.034), ('fp', 0.034), ('sec', 0.033), ('xa', 0.033), ('labeled', 0.033), ('lemma', 0.031), ('candidate', 0.031), ('xb', 0.031), ('algorithm', 0.031), ('relations', 0.03), ('subsection', 0.03), ('situation', 0.03), ('roverato', 0.029), ('eparation', 0.029), ('gvi', 0.029), ('rocca', 0.029), ('sfg', 0.029), ('zongming', 0.029), ('decomposition', 0.029), ('sample', 0.028), ('testing', 0.028), ('markov', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999934 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition

Author: Zongming Ma, Xianchao Xie, Zhi Geng

Abstract: Chain graphs present a broad class of graphical models for description of conditional independence structures, including both Markov networks and Bayesian networks as special cases. In this paper, we propose a computationally feasible method for the structural learning of chain graphs based on the idea of decomposing the learning problem into a set of smaller scale problems on its decomposed subgraphs. The decomposition requires conditional independencies but does not require the separators to be complete subgraphs. Algorithms for both skeleton recovery and complex arrow orientation are presented. Simulations under a variety of settings demonstrate the competitive performance of our method, especially when the underlying graph is sparse. Keywords: chain graph, conditional independence, decomposition, graphical model, structural learning

2 0.43097061 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs

Author: Xianchao Xie, Zhi Geng

Abstract: In this paper, we propose a recursive method for structural learning of directed acyclic graphs (DAGs), in which a problem of structural learning for a large DAG is first decomposed into two problems of structural learning for two small vertex subsets, each of which is then decomposed recursively into two problems of smaller subsets until none subset can be decomposed further. In our approach, search for separators of a pair of variables in a large DAG is localized to small subsets, and thus the approach can improve the efficiency of searches and the power of statistical tests for structural learning. We show how the recent advances in the learning of undirected graphical models can be employed to facilitate the decomposition. Simulations are given to demonstrate the performance of the proposed method. Keywords: Bayesian network, conditional independence, decomposition, directed acyclic graph, structural learning

3 0.28620893 10 jmlr-2008-Active Learning of Causal Networks with Intervention Experiments and Optimal Designs    (Special Topic on Causality)

Author: Yang-Bo He, Zhi Geng

Abstract: The causal discovery from data is important for various scientific investigations. Because we cannot distinguish the different directed acyclic graphs (DAGs) in a Markov equivalence class learned from observational data, we have to collect further information on causal structures from experiments with external interventions. In this paper, we propose an active learning approach for discovering causal structures in which we first find a Markov equivalence class from observational data, and then we orient undirected edges in every chain component via intervention experiments separately. In the experiments, some variables are manipulated through external interventions. We discuss two kinds of intervention experiments, randomized experiment and quasi-experiment. Furthermore, we give two optimal designs of experiments, a batch-intervention design and a sequential-intervention design, to minimize the number of manipulated variables and the set of candidate structures based on the minimax and the maximum entropy criteria. We show theoretically that structural learning can be done locally in subgraphs of chain components without need of checking illegal v-structures and cycles in the whole network and that a Markov equivalence subclass obtained after each intervention can still be depicted as a chain graph. Keywords: active learning, causal networks, directed acyclic graphs, intervention, Markov equivalence class, optimal design, structural learning

4 0.15727217 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure

Author: Eric Perrier, Seiya Imoto, Satoru Miyano

Abstract: Classical approaches used to learn Bayesian network structure from data have disadvantages in terms of complexity and lower accuracy of their results. However, a recent empirical study has shown that a hybrid algorithm improves sensitively accuracy and speed: it learns a skeleton with an independency test (IT) approach and constrains on the directed acyclic graphs (DAG) considered during the search-and-score phase. Subsequently, we theorize the structural constraint by introducing the concept of super-structure S, which is an undirected graph that restricts the search to networks whose skeleton is a subgraph of S. We develop a super-structure constrained optimal search (COS): its time complexity is upper bounded by O(γm n ), where γm < 2 depends on the maximal degree m of S. Empirically, complexity depends on the average degree m and sparse structures ˜ allow larger graphs to be calculated. Our algorithm is faster than an optimal search by several orders and even finds more accurate results when given a sound super-structure. Practically, S can be approximated by IT approaches; significance level of the tests controls its sparseness, enabling to control the trade-off between speed and accuracy. For incomplete super-structures, a greedily post-processed version (COS+) still enables to significantly outperform other heuristic searches. Keywords: subset Bayesian networks, structure learning, optimal search, super-structure, connected

5 0.14758833 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models

Author: Mathias Drton, Thomas S. Richardson

Abstract: In graphical modelling, a bi-directed graph encodes marginal independences among random variables that are identified with the vertices of the graph. We show how to transform a bi-directed graph into a maximal ancestral graph that (i) represents the same independence structure as the original bi-directed graph, and (ii) minimizes the number of arrowheads among all ancestral graphs satisfying (i). Here the number of arrowheads of an ancestral graph is the number of directed edges plus twice the number of bi-directed edges. In Gaussian models, this construction can be used for more efficient iterative maximization of the likelihood function and to determine when maximum likelihood estimates are equal to empirical counterparts. Keywords: ancestral graph, covariance graph, graphical model, marginal independence, maximum likelihood estimation, multivariate normal distribution

6 0.12949243 20 jmlr-2008-Causal Reasoning with Ancestral Graphs    (Special Topic on Causality)

7 0.1285888 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks

8 0.099544168 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning    (Special Topic on Causality)

9 0.094643518 41 jmlr-2008-Graphical Models for Structured Classification, with an Application to Interpreting Images of Protein Subcellular Location Patterns

10 0.074457079 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy    (Special Topic on Causality)

11 0.06087568 62 jmlr-2008-Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data

12 0.059676509 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction

13 0.054632463 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models

14 0.052524395 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection

15 0.050644491 9 jmlr-2008-Active Learning by Spherical Subdivision

16 0.047401972 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes

17 0.043875698 56 jmlr-2008-Magic Moments for Structured Output Prediction

18 0.041514955 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields

19 0.040568307 77 jmlr-2008-Probabilistic Characterization of Random Decision Trees

20 0.040101476 19 jmlr-2008-Bouligand Derivatives and Robustness of Support Vector Machines for Regression


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.307), (1, 0.525), (2, 0.035), (3, 0.018), (4, 0.078), (5, 0.026), (6, -0.021), (7, -0.009), (8, 0.084), (9, 0.087), (10, 0.017), (11, -0.06), (12, -0.055), (13, 0.338), (14, -0.112), (15, 0.004), (16, -0.038), (17, 0.025), (18, 0.123), (19, -0.007), (20, 0.012), (21, 0.064), (22, -0.005), (23, -0.065), (24, 0.036), (25, 0.063), (26, 0.02), (27, 0.001), (28, 0.004), (29, 0.024), (30, -0.058), (31, 0.0), (32, 0.003), (33, 0.002), (34, -0.062), (35, -0.002), (36, -0.005), (37, 0.009), (38, 0.043), (39, 0.06), (40, -0.089), (41, -0.048), (42, 0.018), (43, -0.012), (44, -0.054), (45, -0.044), (46, 0.024), (47, 0.075), (48, -0.036), (49, -0.047)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97186446 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition

Author: Zongming Ma, Xianchao Xie, Zhi Geng

Abstract: Chain graphs present a broad class of graphical models for description of conditional independence structures, including both Markov networks and Bayesian networks as special cases. In this paper, we propose a computationally feasible method for the structural learning of chain graphs based on the idea of decomposing the learning problem into a set of smaller scale problems on its decomposed subgraphs. The decomposition requires conditional independencies but does not require the separators to be complete subgraphs. Algorithms for both skeleton recovery and complex arrow orientation are presented. Simulations under a variety of settings demonstrate the competitive performance of our method, especially when the underlying graph is sparse. Keywords: chain graph, conditional independence, decomposition, graphical model, structural learning

2 0.94332117 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs

Author: Xianchao Xie, Zhi Geng

Abstract: In this paper, we propose a recursive method for structural learning of directed acyclic graphs (DAGs), in which a problem of structural learning for a large DAG is first decomposed into two problems of structural learning for two small vertex subsets, each of which is then decomposed recursively into two problems of smaller subsets until none subset can be decomposed further. In our approach, search for separators of a pair of variables in a large DAG is localized to small subsets, and thus the approach can improve the efficiency of searches and the power of statistical tests for structural learning. We show how the recent advances in the learning of undirected graphical models can be employed to facilitate the decomposition. Simulations are given to demonstrate the performance of the proposed method. Keywords: Bayesian network, conditional independence, decomposition, directed acyclic graph, structural learning

3 0.81128746 10 jmlr-2008-Active Learning of Causal Networks with Intervention Experiments and Optimal Designs    (Special Topic on Causality)

Author: Yang-Bo He, Zhi Geng

Abstract: The causal discovery from data is important for various scientific investigations. Because we cannot distinguish the different directed acyclic graphs (DAGs) in a Markov equivalence class learned from observational data, we have to collect further information on causal structures from experiments with external interventions. In this paper, we propose an active learning approach for discovering causal structures in which we first find a Markov equivalence class from observational data, and then we orient undirected edges in every chain component via intervention experiments separately. In the experiments, some variables are manipulated through external interventions. We discuss two kinds of intervention experiments, randomized experiment and quasi-experiment. Furthermore, we give two optimal designs of experiments, a batch-intervention design and a sequential-intervention design, to minimize the number of manipulated variables and the set of candidate structures based on the minimax and the maximum entropy criteria. We show theoretically that structural learning can be done locally in subgraphs of chain components without need of checking illegal v-structures and cycles in the whole network and that a Markov equivalence subclass obtained after each intervention can still be depicted as a chain graph. Keywords: active learning, causal networks, directed acyclic graphs, intervention, Markov equivalence class, optimal design, structural learning

4 0.75419199 40 jmlr-2008-Graphical Methods for Efficient Likelihood Inference in Gaussian Covariance Models

Author: Mathias Drton, Thomas S. Richardson

Abstract: In graphical modelling, a bi-directed graph encodes marginal independences among random variables that are identified with the vertices of the graph. We show how to transform a bi-directed graph into a maximal ancestral graph that (i) represents the same independence structure as the original bi-directed graph, and (ii) minimizes the number of arrowheads among all ancestral graphs satisfying (i). Here the number of arrowheads of an ancestral graph is the number of directed edges plus twice the number of bi-directed edges. In Gaussian models, this construction can be used for more efficient iterative maximization of the likelihood function and to determine when maximum likelihood estimates are equal to empirical counterparts. Keywords: ancestral graph, covariance graph, graphical model, marginal independence, maximum likelihood estimation, multivariate normal distribution

5 0.55207032 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure

Author: Eric Perrier, Seiya Imoto, Satoru Miyano

Abstract: Classical approaches used to learn Bayesian network structure from data have disadvantages in terms of complexity and lower accuracy of their results. However, a recent empirical study has shown that a hybrid algorithm improves sensitively accuracy and speed: it learns a skeleton with an independency test (IT) approach and constrains on the directed acyclic graphs (DAG) considered during the search-and-score phase. Subsequently, we theorize the structural constraint by introducing the concept of super-structure S, which is an undirected graph that restricts the search to networks whose skeleton is a subgraph of S. We develop a super-structure constrained optimal search (COS): its time complexity is upper bounded by O(γm n ), where γm < 2 depends on the maximal degree m of S. Empirically, complexity depends on the average degree m and sparse structures ˜ allow larger graphs to be calculated. Our algorithm is faster than an optimal search by several orders and even finds more accurate results when given a sound super-structure. Practically, S can be approximated by IT approaches; significance level of the tests controls its sparseness, enabling to control the trade-off between speed and accuracy. For incomplete super-structures, a greedily post-processed version (COS+) still enables to significantly outperform other heuristic searches. Keywords: subset Bayesian networks, structure learning, optimal search, super-structure, connected

6 0.53747994 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks

7 0.42679179 20 jmlr-2008-Causal Reasoning with Ancestral Graphs    (Special Topic on Causality)

8 0.39267346 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning    (Special Topic on Causality)

9 0.32388783 41 jmlr-2008-Graphical Models for Structured Classification, with an Application to Interpreting Images of Protein Subcellular Location Patterns

10 0.2660954 31 jmlr-2008-Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction

11 0.22476254 81 jmlr-2008-Regularization on Graphs with Function-adapted Diffusion Processes

12 0.21512191 62 jmlr-2008-Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data

13 0.19358443 9 jmlr-2008-Active Learning by Spherical Subdivision

14 0.18917905 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy    (Special Topic on Causality)

15 0.18568735 43 jmlr-2008-Hit Miss Networks with Applications to Instance Selection

16 0.17469575 56 jmlr-2008-Magic Moments for Structured Output Prediction

17 0.16123827 55 jmlr-2008-Linear-Time Computation of Similarity Measures for Sequential Data

18 0.15832759 39 jmlr-2008-Gradient Tree Boosting for Training Conditional Random Fields

19 0.15063067 61 jmlr-2008-Mixed Membership Stochastic Blockmodels

20 0.14352636 5 jmlr-2008-A New Algorithm for Estimating the Effective Dimension-Reduction Subspace


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.045), (5, 0.015), (28, 0.306), (40, 0.036), (54, 0.026), (58, 0.038), (61, 0.106), (66, 0.071), (72, 0.01), (76, 0.064), (87, 0.025), (88, 0.065), (92, 0.034), (94, 0.039), (99, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.77765357 88 jmlr-2008-Structural Learning of Chain Graphs via Decomposition

Author: Zongming Ma, Xianchao Xie, Zhi Geng

Abstract: Chain graphs present a broad class of graphical models for description of conditional independence structures, including both Markov networks and Bayesian networks as special cases. In this paper, we propose a computationally feasible method for the structural learning of chain graphs based on the idea of decomposing the learning problem into a set of smaller scale problems on its decomposed subgraphs. The decomposition requires conditional independencies but does not require the separators to be complete subgraphs. Algorithms for both skeleton recovery and complex arrow orientation are presented. Simulations under a variety of settings demonstrate the competitive performance of our method, especially when the underlying graph is sparse. Keywords: chain graph, conditional independence, decomposition, graphical model, structural learning

2 0.54214841 6 jmlr-2008-A Recursive Method for Structural Learning of Directed Acyclic Graphs

Author: Xianchao Xie, Zhi Geng

Abstract: In this paper, we propose a recursive method for structural learning of directed acyclic graphs (DAGs), in which a problem of structural learning for a large DAG is first decomposed into two problems of structural learning for two small vertex subsets, each of which is then decomposed recursively into two problems of smaller subsets until none subset can be decomposed further. In our approach, search for separators of a pair of variables in a large DAG is localized to small subsets, and thus the approach can improve the efficiency of searches and the power of statistical tests for structural learning. We show how the recent advances in the learning of undirected graphical models can be employed to facilitate the decomposition. Simulations are given to demonstrate the performance of the proposed method. Keywords: Bayesian network, conditional independence, decomposition, directed acyclic graph, structural learning

3 0.4711439 26 jmlr-2008-Consistency of Trace Norm Minimization

Author: Francis R. Bach

Abstract: Regularization by the sum of singular values, also referred to as the trace norm, is a popular technique for estimating low rank rectangular matrices. In this paper, we extend some of the consistency results of the Lasso to provide necessary and sufficient conditions for rank consistency of trace norm minimization with the square loss. We also provide an adaptive version that is rank consistent even when the necessary condition for the non adaptive version is not fulfilled. Keywords: convex optimization, singular value decomposition, trace norm, consistency

4 0.40331504 48 jmlr-2008-Learning Bounded Treewidth Bayesian Networks

Author: Gal Elidan, Stephen Gould

Abstract: With the increased availability of data for complex domains, it is desirable to learn Bayesian network structures that are sufficiently expressive for generalization while at the same time allow for tractable inference. While the method of thin junction trees can, in principle, be used for this purpose, its fully greedy nature makes it prone to overfitting, particularly when data is scarce. In this work we present a novel method for learning Bayesian networks of bounded treewidth that employs global structure modifications and that is polynomial both in the size of the graph and the treewidth bound. At the heart of our method is a dynamic triangulation that we update in a way that facilitates the addition of chain structures that increase the bound on the model’s treewidth by at most one. We demonstrate the effectiveness of our “treewidth-friendly” method on several real-life data sets and show that it is superior to the greedy approach as soon as the bound on the treewidth is nontrivial. Importantly, we also show that by making use of global operators, we are able to achieve better generalization even when learning Bayesian networks of unbounded treewidth. Keywords: Bayesian networks, structure learning, model selection, bounded treewidth

5 0.39965591 20 jmlr-2008-Causal Reasoning with Ancestral Graphs    (Special Topic on Causality)

Author: Jiji Zhang

Abstract: Causal reasoning is primarily concerned with what would happen to a system under external interventions. In particular, we are often interested in predicting the probability distribution of some random variables that would result if some other variables were forced to take certain values. One prominent approach to tackling this problem is based on causal Bayesian networks, using directed acyclic graphs as causal diagrams to relate post-intervention probabilities to pre-intervention probabilities that are estimable from observational data. However, such causal diagrams are seldom fully testable given observational data. In consequence, many causal discovery algorithms based on data-mining can only output an equivalence class of causal diagrams (rather than a single one). This paper is concerned with causal reasoning given an equivalence class of causal diagrams, represented by a (partial) ancestral graph. We present two main results. The first result extends Pearl (1995)’s celebrated do-calculus to the context of ancestral graphs. In the second result, we focus on a key component of Pearl’s calculus—the property of invariance under interventions, and give stronger graphical conditions for this property than those implied by the first result. The second result also improves the earlier, similar results due to Spirtes et al. (1993). Keywords: ancestral graphs, causal Bayesian network, do-calculus, intervention

6 0.37810263 27 jmlr-2008-Consistency of the Group Lasso and Multiple Kernel Learning

7 0.37422469 10 jmlr-2008-Active Learning of Causal Networks with Intervention Experiments and Optimal Designs    (Special Topic on Causality)

8 0.36797526 72 jmlr-2008-On the Size and Recovery of Submatrices of Ones in a Random Binary Matrix

9 0.36781028 35 jmlr-2008-Finding Optimal Bayesian Network Given a Super-Structure

10 0.36266175 34 jmlr-2008-Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks

11 0.35811099 18 jmlr-2008-Bayesian Inference and Optimal Design for the Sparse Linear Model

12 0.35382053 9 jmlr-2008-Active Learning by Spherical Subdivision

13 0.351868 57 jmlr-2008-Manifold Learning: The Price of Normalization

14 0.35127556 83 jmlr-2008-Robust Submodular Observation Selection

15 0.35072783 93 jmlr-2008-Using Markov Blankets for Causal Structure Learning    (Special Topic on Causality)

16 0.34981355 15 jmlr-2008-An Information Criterion for Variable Selection in Support Vector Machines    (Special Topic on Model Selection)

17 0.34930345 58 jmlr-2008-Max-margin Classification of Data with Absent Features

18 0.34596011 89 jmlr-2008-Support Vector Machinery for Infinite Ensemble Learning

19 0.3459464 24 jmlr-2008-Complete Identification Methods for the Causal Hierarchy    (Special Topic on Causality)

20 0.3458277 84 jmlr-2008-Search for Additive Nonlinear Time Series Causal Models