jmlr jmlr2012 jmlr2012-48 knowledge-graph by maker-knowledge-mining

48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion


Source: pdf

Author: Animashree Anandkumar, Vincent Y.F. Tan, Furong Huang, Alan S. Willsky

Abstract: We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm, when the number of samples n = Ω(J_min^{-2} log p), where p is the number of variables and J_min is the minimum (absolute) edge potential of the graphical model. The sufficient conditions for sparsistency are based on the notion of walk-summability of the model and the presence of sparse local vertex separators in the underlying graph. We also derive novel non-asymptotic necessary conditions on the number of samples required for sparsistency. Keywords: Gaussian graphical model selection, high-dimensional learning, local-separation property, walk-summability, necessary conditions for model selection

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. [sent-12, score-0.348]

2 Note that these graphs can simultaneously possess many short cycles as well as large node degrees (growing with the number of nodes), and thus, we can incorporate a wide class of graphs for high-dimensional estimation. [sent-33, score-0.677]

3 We propose a simple local algorithm for Gaussian graphical model selection, termed the conditional covariance threshold test (CMIT), based on a set of conditional covariance thresholding tests. [sent-43, score-0.641]

4 The conditional covariance is thus computed for each node pair (i, j) ∈ V², and the conditioning set which achieves the minimum is found over all subsets of cardinality at most η; if the minimum value exceeds the threshold ξ_{n,p}, then the node pair is declared an edge. [sent-63, score-0.373]
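
To make the test concrete, here is a minimal Python sketch (our own illustration, not the authors' implementation): it computes the empirical conditional covariance via a Schur complement of the sample covariance and declares an edge when the minimum over small conditioning sets exceeds the threshold. The brute-force search over all subsets of size at most η is kept only for clarity.

```python
import itertools
import numpy as np

def conditional_cov(S, i, j, cond):
    """Empirical conditional covariance Sigma(i, j | cond) via Schur complement."""
    cond = list(cond)
    if not cond:
        return S[i, j]
    A = S[np.ix_([i, j], [i, j])]
    B = S[np.ix_([i, j], cond)]
    C = S[np.ix_(cond, cond)]
    return (A - B @ np.linalg.solve(C, B.T))[0, 1]

def cc_threshold_test(X, eta, xi):
    """Declare (i, j) an edge iff the minimum conditional covariance over
    all conditioning sets of size <= eta exceeds the threshold xi."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    edges = set()
    for i, j in itertools.combinations(range(p), 2):
        others = [k for k in range(p) if k not in (i, j)]
        min_cc = min(
            abs(conditional_cov(S, i, j, c))
            for size in range(eta + 1)
            for c in itertools.combinations(others, size)
        )
        if min_cc > xi:
            edges.add((i, j))
    return edges
```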

5 These include Erdős-Rényi random graphs, power-law graphs and small-world graphs, as discussed previously. [sent-68, score-0.427]

6 We also develop novel techniques to obtain necessary conditions for consistent structure estimation of Erdős-Rényi random graphs and other ensembles with a non-uniform distribution of graphs. [sent-73, score-0.492]

7 As a by-product of our analysis, we also establish the correctness of loopy belief propagation for walk-summable Gaussian graphical models Markov on locally tree-like graphs (see Section 5 for details). [sent-80, score-0.551]

8 (2002), Kalisch and Bühlmann (2007) and Xie and Geng (2008) propose conditional-independence tests for learning Bayesian networks on directed acyclic graphs (DAGs). [sent-108, score-0.306]

9 (2010) proposed a faster greedy algorithm, based on conditional entropy, for graphs with large girth and bounded degree. [sent-110, score-0.453]

10 We allow for graphs where the maximum degree can grow with the number of nodes. [sent-116, score-0.297]

11 These works consider graphs drawn uniformly from the class of bounded degree graphs and establish that n = Ω(∆^k log p) samples are required for consistent structure estimation in a p-node graph with maximum degree ∆, where k is typically a small positive integer. [sent-122, score-0.759]

12 However, a direct application of these methods yields poor lower bounds if the ensemble of graphs has a highly non-uniform distribution. [sent-123, score-0.35]

13 This is the case with the ensemble of Erdős-Rényi random graphs (Bollobás, 1985). [sent-124, score-0.529]

14 Necessary conditions for structure estimation of Erdős-Rényi random graphs were derived for Ising models by Anandkumar et al. [sent-125, score-0.492]

15 We characterize the set of typical graphs for the Erdős-Rényi ensemble, derive a modified form of Fano’s inequality, and obtain a non-asymptotic lower bound on the sample complexity involving the average degree and the number of nodes. [sent-129, score-0.605]
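
For reference, here is a sketch of the standard (unmodified) Fano inequality that underlies this argument; the paper's modification restricts the hypothesis class to the typical set T_ε of the Erdős-Rényi ensemble rather than assuming a uniform prior over all graphs.

```latex
% Fano's inequality for a graph G drawn uniformly from a candidate set T,
% estimated by \hat{G}(X^n) from n samples:
P_e \;=\; \Pr\bigl[\hat{G}(X^n) \neq G\bigr]
      \;\geq\; 1 - \frac{I(G; X^n) + 1}{\log_2 |T|},
% so driving P_e \to 0 requires I(G; X^n) = \Omega(\log_2 |T|),
% which in turn translates into a lower bound on the sample size n.
```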

16 Note that the assumption of a Gaussian graphical model Markov on a sparse graph is one such formulation. [sent-131, score-0.333]

17 A Gaussian graphical model Markov on G has a probability density function (pdf) that may be parameterized as f_X(x) ∝ exp(−(1/2) x^T J_G x + h^T x), (2) where J_G is a positive-definite symmetric matrix whose sparsity pattern corresponds to that of the graph G. [sent-156, score-0.333]
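
A minimal sketch of instantiating such a model from an adjacency matrix and sampling from it (our construction; the uniform edge potential of 0.15 is an arbitrary illustrative value, not a quantity from the paper):

```python
import numpy as np

def gaussian_mrf_from_graph(A, potential=0.15):
    """Potential matrix J_G = I + potential * A; its off-diagonal sparsity
    pattern matches the adjacency matrix A of the graph G."""
    p = A.shape[0]
    J = np.eye(p) + potential * A
    # J must be positive definite for the density in (2) to be normalizable.
    assert np.linalg.eigvalsh(J).min() > 0, "choose a smaller edge potential"
    return J

def sample(J, n, seed=0):
    """Draw n samples from N(0, J^{-1}) (taking h = 0 in (2))."""
    rng = np.random.default_rng(seed)
    Sigma = np.linalg.inv(J)
    return rng.multivariate_normal(np.zeros(J.shape[0]), Sigma, size=n)
```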

18 2 Tractable Graph Families We consider the class of Gaussian graphical models Markov on a graph G_p belonging to some ensemble G(p) of graphs with p nodes. [sent-182, score-0.712]

19 We emphasize that in our formulation the graph ensemble G(p) can either be deterministic or random—in the latter, we also specify a probability measure over the set of graphs in G(p). [sent-186, score-0.515]

20 In the setting where G(p) is a random-graph ensemble, let PX,G denote the joint probability distribution of the variables X and the graph G ∼ G(p), and let fX|G denote the conditional (Gaussian) density of the variables Markov on the given graph G. [sent-187, score-0.43]

21 We now characterize the ensemble of graphs amenable for consistent structure estimation under our formulation. [sent-198, score-0.35]

22 Definition 1 (γ-Local Separator) Given a graph G, a γ-local separator S_γ(i, j) between i and j, for (i, j) ∈ G, is a minimal vertex separator with respect to the subgraph H_{γ,i}. [sent-205, score-0.287]
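
A sketch of computing such a separator with networkx, under our reading of the definition: H_{γ,i} is taken as the subgraph induced by the γ-hop ball around i, and any direct (i, j) edge is removed before computing the minimum vertex cut (networkx requires non-adjacent terminals). The helper name is ours.

```python
import networkx as nx

def local_separator(G, i, j, gamma):
    """Minimal vertex separator between i and j in the subgraph H_{gamma,i}
    induced by nodes within gamma hops of i (assumes j lies in that ball)."""
    H = nx.ego_graph(G, i, radius=gamma).copy()  # B_gamma(i) with induced edges
    if H.has_edge(i, j):
        H.remove_edge(i, j)  # vertex cuts are defined for non-adjacent pairs
    # minimum_node_cut returns a minimum (i, j)-vertex cut in H
    return nx.minimum_node_cut(H, i, j)
```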

23 We now characterize the ensemble of graphs based on the size of local separators. [sent-208, score-0.379]

24 Definition 2 ((η, γ)-Local Separation Property) An ensemble of graphs satisfies the (η, γ)-local separation property if for a. [sent-209, score-0.439]

25 In Section 3, we propose an efficient algorithm for graphical model selection when the underlying graph belongs to a graph ensemble G(p; η, γ) with sparse local separators (i. [sent-213, score-0.669]

26 does not apply to deterministic graph ensembles G(p) where no randomness is assumed, and in this setting, we assume that the property Q holds for every graph in the ensemble. [sent-222, score-0.386]

27 Any (deterministic or random) ensemble of degree-bounded graphs GDeg (p, ∆) satisfies the (η, γ)-local separation property with η = ∆ and arbitrary γ ∈ N. [sent-231, score-0.439]

28 Our goal in this paper is to relax the usual bounded-degree assumption and to consider ensembles of graphs G(p) whose maximum degrees may grow with the number of nodes p. [sent-236, score-0.389]

29 Thus, a graph with girth g satisfies the (η, γ)-local separation property with η = 1 and γ = g/2. [sent-244, score-0.359]

30 Let GGirth (p; g) denote the ensemble of graphs with girth at most g. [sent-245, score-0.455]

31 For instance, the ensemble of Erdős-Rényi graphs GER (p, c/p), where an edge between any node pair appears with probability c/p, independently of other node pairs, is locally tree-like. [sent-258, score-0.705]

32 Proposition 3 (Random Graphs are Locally Tree-Like) The ensemble of Erdős-Rényi graphs GER (p, c/p) satisfies the (η, γ)-local separation property in (4) with η = 2, γ ≤ log p / (4 log c). (5) [sent-264, score-0.618]

33 Thus, there are at most two paths of length smaller than γ between any two nodes in Erdős-Rényi graphs a. [sent-265, score-0.546]
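
This can be checked empirically; a quick sketch (all parameter values are arbitrary illustrative choices) that counts simple paths shorter than γ between random node pairs in a sampled G(p, c/p):

```python
import math
import random
import networkx as nx

def max_short_paths(p=500, c=1.5, trials=200, seed=0):
    """Count paths of length < gamma between random node pairs in G(p, c/p)."""
    rng = random.Random(seed)
    G = nx.gnp_random_graph(p, c / p, seed=seed)
    gamma = int(math.log(p) / (4 * math.log(c)))  # the bound in (5)
    worst = 0
    for _ in range(trials):
        u, v = rng.sample(range(p), 2)
        # paths with at most gamma - 1 edges, i.e., of length < gamma
        n_paths = sum(1 for _ in nx.all_simple_paths(G, u, v, cutoff=gamma - 1))
        worst = max(worst, n_paths)
    return worst  # Proposition 3 predicts at most 2, asymptotically
```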

34 Along similar lines, the ensemble of ∆-random regular graphs, denoted by GReg (p, ∆), which is the uniform ensemble of regular graphs with degree ∆, has no overlapping cycles of length at most Θ(log_{∆−1} p) a. [sent-272, score-0.582]

35 The former class of graphs can have short cycles but the maximum degree needs to be constant, while the latter class of graphs can have a large maximum degree but the number of overlapping short cycles needs to be small. [sent-280, score-0.756]

36 The class of hybrid graphs or augmented graphs (Chung and Lu, 2006, Ch. [sent-282, score-0.496]

37 12) consists of graphs which are the union of two graphs: a “local” graph having short cycles and a “global” graph having small average distances. [sent-283, score-0.659]

38 The simplest model GWatts (p, d, c/p), first studied by Watts and Strogatz (1998), consists of the union of a d-dimensional grid and an Erdős-Rényi random graph with parameter c. [sent-285, score-0.344]
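
A sketch of this hybrid construction with networkx, using a one-dimensional ring lattice as the "local" graph for simplicity (d = 1); the union keeps an edge if it appears in either graph:

```python
import networkx as nx

def watts_union_graph(p=100, local_degree=2, c=1.2, seed=0):
    """Union of a ring lattice (the 'local' graph with short cycles) and an
    Erdos-Renyi G(p, c/p) graph (the 'global' graph with small distances)."""
    local = nx.watts_strogatz_graph(p, local_degree, 0.0, seed=seed)  # rewiring prob 0: pure ring
    er = nx.gnp_random_graph(p, c / p, seed=seed)
    return nx.compose(local, er)  # edge set is the union of both edge sets
```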

39 Thus, we see that a wide range of graphs satisfy the property of having sparse local separators, and that it is possible for graphs with large degrees as well as many short cycles to have this property. [sent-291, score-0.666]

40 4 COUNTER-EXAMPLE: DENSE GRAPHS While the above examples illustrate that a large class of graphs satisfies the local separation criterion, there indeed exist graphs which do not satisfy it. [sent-294, score-0.587]

41 Such graphs tend to be “dense”, that is, the number of edges scales super-linearly in the number of nodes. [sent-295, score-0.292]

42 For instance, the Erdős-Rényi graphs GER (p, c/p) in the dense regime, where the average degree scales as c = Ω(p^2). [sent-296, score-0.476]

43 We require that the number of nodes p → ∞ to exploit the local separation properties of the class of graphs under consideration. [sent-306, score-0.418]

44 (A3) Local-Separation Property: We assume that the ensemble of graphs G(p; η, γ) satisfies the (η, γ)-local separation property with η, γ satisfying: η = O(1), J_min D_min^{-1} α^{-γ} = ω(1), (8) where α is given by (7) and D_min := min_i J(i, i) is the minimum diagonal entry of the potential matrix J. [sent-315, score-0.508]

45 1 EXAMPLE 1: DEGREE-BOUNDED ENSEMBLES To gain a better understanding of conditions (A1)–(A5), consider the ensemble of graphs GDeg (p; ∆) with bounded degree ∆ ∈ N. [sent-334, score-0.435]

46 Intuitively, we require a larger path threshold γ, as the degree bound ∆ on the graph ensemble increases. [sent-339, score-0.377]

47 This implies that graphs with fairly large degrees and short cycles can be recovered successfully using our algorithm. [sent-342, score-0.362]

48 2 EXAMPLE 2: GIRTH-BOUNDED ENSEMBLES The condition in (11) can be specialized for the ensemble of girth-bounded graphs GGirth (p; g) in a straightforward manner as ∆α^{g/2} = o(1), (12) where g corresponds to the girth of the graphs in the ensemble. [sent-353, score-0.703]

49 The condition in (12) demonstrates a natural tradeoff between the girth and the maximum degree; graphs with large degrees can be learned efficiently if their girths are large. [sent-354, score-0.386]

50 Erdős-Rényi graph G ∼ GER (p, c/p) satisfies (8) when c = O(poly(log p)) under the best-possible scaling of J_min subject to the walk-summability constraint in (7) (i. [sent-360, score-0.344]

51 Thus, the Erdős-Rényi graphs are amenable to successful recovery when the average degree c = O(poly(log p)). [sent-372, score-0.512]

52 Similarly, for the small-world ensemble GWatts (p, d, c/p), when d = O(1) and c = O(poly(log p)), the graphs are amenable for consistent estimation. [sent-373, score-0.35]

53 Theorem 4 (Structural consistency of CMIT) For structure learning of Gaussian graphical models Markov on a graph G_p ∼ G(p; η, γ), CMIT(x^n; ξ_{n,p}, η) is consistent for a. [sent-377, score-0.362]

54 Comparison with Meinshausen and Bühlmann (2006): The work by Meinshausen and Bühlmann (2006) considers ℓ1-penalized linear regression for neighborhood selection of Gaussian graphical models and establishes a sample complexity of n = Ω((∆ + J_min^{-2}) log p). [sent-401, score-0.313]

55 We note that our guarantees allow for graphs which do not necessarily satisfy the conditions imposed by Meinshausen and Bühlmann (2006). [sent-402, score-0.342]

56 Note that the class of Gaussian graphical models with diagonally-dominant covariance matrices forms a strict sub-class of walk-summable models, and thus satisfies assumption (A2) for the theorem to hold. [sent-404, score-0.302]

57 Comparison with Ising models: Our above result for learning Gaussian graphical models is analogous to structure estimation of Ising models subject to an upper bound on the edge potentials (Anandkumar et al. [sent-419, score-0.295]

58 Using the threshold ξ_{n,p} defined in (10), the conditional mutual information test CMIT is given by the threshold test min_{S ⊂ V\{i,j}, |S| ≤ η} I(X_i; X_j | X_S) > ξ²_{n,p}, and node pairs (i, j) exceeding the threshold are added to the estimate G_n. [sent-430, score-0.335]
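
Since the variables are Gaussian, each term I(X_i; X_j | X_S) in this test has a closed form in the covariance matrix through the partial correlation; a sketch (in nats; the helper name is ours):

```python
import numpy as np

def gaussian_cmi(S, i, j, cond):
    """I(X_i; X_j | X_S) = -1/2 * log(1 - rho^2) for jointly Gaussian X,
    where rho is the partial correlation of (i, j) given the set cond."""
    idx = [i, j] + list(cond)
    sub = S[np.ix_(idx, idx)]
    K = np.linalg.inv(sub)                        # precision of the marginal over idx
    rho = -K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])   # partial correlation rho_{ij.cond}
    return -0.5 * np.log(1.0 - rho**2)
```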

59 Theorem 5 (Structural consistency of CMIT) For structure learning of the Gaussian graphical model on a graph G_p ∼ G(p; η, γ), CMIT(x^n; ξ_{n,p}, η) is consistent for a. [sent-432, score-0.333]

60 For Gaussian random variables, conditional covariances and conditional mutual information are equivalent tests for conditional independence. [sent-438, score-0.396]

61 This is due to faster decay of conditional mutual information on the edges compared to the decay of conditional covariances. [sent-441, score-0.31]

62 Thus, conditional covariances are more efficient for Gaussian graphical model selection compared to conditional mutual information. [sent-442, score-0.464]

63 Necessary Conditions for Model Selection In the previous sections, we proposed and analyzed efficient algorithms for learning the structure of Gaussian graphical models Markov on graph ensembles satisfying the local-separation property. [sent-444, score-0.391]

64 2)) turns out to be too weak for the class of Erdős-Rényi graphs GER (p, c/p), where the average degree c is much smaller than the maximum degree. [sent-451, score-0.427]

65 A graph G is drawn from the ensemble of Erdős-Rényi graphs G ∼ GER (p, c/p). [sent-457, score-0.694]

66 2 Necessary Conditions for Exact Recovery To derive the necessary condition for learning Gaussian graphical models Markov on sparse Erdős-Rényi graphs G ∼ GER (p, c/p), we assume the strict walk-summability condition with parameter α, according to (7). [sent-475, score-0.624]

67 We are then able to demonstrate the following: Theorem 6 (Weak Converse for Gaussian Models) For a walk-summable Gaussian graphical model satisfying (7) with parameter α, for almost every graph G ∼ GER (p, c/p) as p → ∞, in order. [sent-476, score-0.333]

68 The edit distance d : G_p × G_p → N ∪ {0} between two undirected graphs G = (V, E) and G′ = (V, E′) is defined as d(G, G′) := |E △ E′|, where △ denotes the symmetric difference between the edge sets E and E′. [sent-490, score-0.359]
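
A direct implementation of this distance, representing each graph by its set of undirected edges:

```python
def edit_distance(E1, E2):
    """d(G, G') = |E symmetric-difference E'| for undirected edge sets."""
    norm = lambda e: tuple(sorted(e))  # treat (u, v) and (v, u) as the same edge
    return len({norm(e) for e in E1} ^ {norm(e) for e in E2})

# Example: two graphs differing in exactly two edges.
assert edit_distance({(0, 1), (1, 2)}, {(1, 0), (2, 3)}) == 2
```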

69 In our context, given a graph G, we define d(G) to be the ratio of the number of edges of G to the total number of nodes p. [sent-511, score-0.288]

70 Every graph G ∈ Tε has an average number of edges that is ε-close to c/2 in the Erdős-Rényi ensemble. [sent-515, score-0.427]

71 Parts 1 and 3 of Lemma 8 respectively say that the set of typical graphs has high probability and has very small cardinality relative to the total number of graphs with p nodes, |G_p| = 2^{p(p−1)/2}. [sent-529, score-0.575]

72 As a by-product of our previous analysis on graphical model selection, we now show the asymptotic correctness of LBP on walk-summable Gaussian models when the underlying graph is locally tree-like. [sent-537, score-0.362]

73 1 Background The belief propagation (BP) algorithm is a distributed algorithm where messages (or beliefs) are passed among the neighbors to draw inferences at the nodes of a graphical model. [sent-539, score-0.357]

74 However, if the graph is sparse (consists of few edges), the computation of node marginals may be sped up dramatically by exploiting the graph structure and using distributed algorithms to parallelize the computations. [sent-541, score-0.397]
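
A scalar-message sketch of Gaussian LBP in this spirit; the message parameterization follows the standard Gaussian BP recursions (as in Malioutov et al., whom the paper cites), the fixed iteration count is an illustrative simplification, and all variable names are ours. For walk-summable models the computed means are exact at convergence.

```python
import numpy as np

def gaussian_lbp(J, h, iters=50):
    """Loopy BP for a Gaussian model with potential matrix J and potential
    vector h; returns approximate marginal means and variances."""
    p = J.shape[0]
    nbrs = [np.flatnonzero(J[i]).tolist() for i in range(p)]
    for i in range(p):
        nbrs[i].remove(i)  # drop the diagonal entry
    dJ = np.zeros((p, p))  # dJ[i, j]: precision message i -> j
    dh = np.zeros((p, p))  # dh[i, j]: potential message i -> j
    for _ in range(iters):
        new_dJ, new_dh = np.zeros((p, p)), np.zeros((p, p))
        for i in range(p):
            for j in nbrs[i]:
                # aggregate incoming messages at i, excluding the one from j
                Jhat = J[i, i] + sum(dJ[k, i] for k in nbrs[i] if k != j)
                hhat = h[i] + sum(dh[k, i] for k in nbrs[i] if k != j)
                new_dJ[i, j] = -J[i, j] ** 2 / Jhat
                new_dh[i, j] = -J[i, j] * hhat / Jhat
        dJ, dh = new_dJ, new_dh
    Jm = np.array([J[i, i] + sum(dJ[k, i] for k in nbrs[i]) for i in range(p)])
    hm = np.array([h[i] + sum(dh[k, i] for k in nbrs[i]) for i in range(p)])
    return hm / Jm, 1.0 / Jm  # approximate means and variances
```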

75 Consider the event that the neighborhood of a node i has no cycles up to graph distance γ, given by Γ(i; γ, G) := {Bγ (i; G) does not contain any cycles}. [sent-552, score-0.35]

76 We assume a random graph ensemble G(p) such that for a given node i ∈ V , we have P[Γc (i; γ, G)] = o(1). [sent-553, score-0.334]

77 Recall that the class of random regular graphs G ∼ GReg (p, ∆) has a girth of O(log_{∆−1} p). [sent-565, score-0.353]

78 We measure the performance of methods using the notion of the edit distance between the true and estimated graphs (for synthetic data). [sent-571, score-0.347]

79 1 Data Sets Synthetic data: In order to evaluate the performance of our algorithm in terms of error in reconstructing the graph structure, we generated samples from Gaussian graphical models for different graphs. [sent-597, score-0.362]

80 These include a single cycle graph with p = 80 nodes, an Erdős-Rényi (ER) graph G ∼ GER (p, c/p) with average degree c = 1.2, [sent-598, score-0.558]

81 and a Watts-Strogatz model GWatts (p, d, c/p) with degree of the local graph d = 2 and average degree of the global graph c = 1. [sent-599, score-0.457]

82 For the foreign exchange data, we limited the number of edges to 100, while for synthetic data, we limited it to 100 for cycle and Erdős-Rényi (ER) graphs and to 200 for the Watts-Strogatz model. [sent-620, score-0.616]

83 The choice of parameters and graphs results in valid models in our experiments, that is, the potential matrix is positive definite. [sent-624, score-0.308]

84 Table 1: Normalized edit distance under CMIT, ℓ1-penalized MLE and ℓ1-penalized neighborhood selection on synthetic data from graphs listed above, where n denotes the number of samples. [sent-661, score-0.508]

85 We observe from the reconstructed graphs that geography plays a crucial role in the foreign exchange trends. [sent-673, score-0.363]

86 This algorithm succeeds on a wide range of graph ensembles such as the Erdős-Rényi ensemble, small-world networks, etc. [sent-738, score-0.373]

87 Figure 7: CMIT, ℓ1-penalized MLE and ℓ1-penalized neighborhood selection methods; panels (a) Cycle, (b) Erdős-Rényi, (c) Watts-Strogatz, each plotting error against sample size n. [sent-773, score-0.34]

88 3, we relate walk-summability in (7) to the notion of correlation decay, where the effect of faraway nodes on covariances can be controlled and the local-separation property of the graphs under consideration can be exploited. [sent-804, score-0.384]

89 The adjacency matrix A_G of a graph G with maximum degree ∆_G satisfies λ_max(A_G) ≤ ∆_G, since it is dominated by a ∆_G-regular graph, which has maximum eigenvalue ∆_G. [sent-807, score-0.379]
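
A quick numerical illustration of this eigenvalue bound on a random graph (parameter values are arbitrary):

```python
import networkx as nx
import numpy as np

G = nx.gnp_random_graph(200, 0.03, seed=1)
A = nx.to_numpy_array(G)
lam_max = np.linalg.eigvalsh(A).max()          # largest eigenvalue of A_G
max_deg = max(dict(G.degree()).values())        # Delta_G
assert lam_max <= max_deg + 1e-9                # lambda_max(A_G) <= Delta_G
```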

90 When the graph G is an Erdős-Rényi random graph, G ∼ GER (p, c/p), we can provide better bounds. [sent-811, score-0.344]

91 3 Implications of Walk-Summability Recall that Σ_G denotes the covariance matrix for the Gaussian graphical model on graph G and that J_G = Σ_G^{-1}, with J_G = I − R_G as in (19). [sent-821, score-0.438]

92 Let Bγ (i) denote the set of nodes within γ hops from node i in graph G. [sent-824, score-0.311]

93 In other words, R_{H_{γ;ij}} is formed by considering the Gaussian graphical model over the graph H_{γ;ij}. [sent-831, score-0.333]

94 Thus, for walk-summable Gaussian graphical models, we have α := ‖R̄_G‖ < 1, implying that the error in (22) in approximating the covariance by the local neighborhood decays exponentially with distance. [sent-834, score-0.339]
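
The walk-summability parameter is easy to check numerically; a sketch assuming a normalized model with unit-diagonal potential matrix, where α is the spectral norm of the entrywise absolute value of R_G:

```python
import numpy as np

def walk_summability_alpha(J):
    """alpha = spectral norm of |R_G| where J_G = I - R_G (unit-diagonal J);
    the model is walk-summable iff alpha < 1."""
    R = np.eye(J.shape[0]) - J
    return np.linalg.norm(np.abs(R), 2)  # largest singular value of |R_G|

J = np.eye(3) + 0.2 * (np.ones((3, 3)) - np.eye(3))  # illustrative unit-diagonal J
assert walk_summability_alpha(J) < 1  # alpha = 0.4 here: walk-summable
```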

95 Thus, the conditional covariance in (25) consists of walks in the original graph G, not passing through nodes in S. [sent-859, score-0.523]
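
A small numerical check of this interpretation (our own construction): conditioning on S restricts the precision matrix to V \ S, so the conditional covariance equals the walk-sum over powers of R_G restricted to V \ S, that is, walks that avoid S.

```python
import numpy as np

# A walk-summable 4-cycle with unit diagonal: J = I - R, alpha = 0.4 < 1.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
R = 0.2 * A
J = np.eye(4) - R
keep, S = [0, 1, 3], [2]                            # condition on node 2
Sigma_cond = np.linalg.inv(J[np.ix_(keep, keep)])   # Sigma(keep | S)
R_sub = R[np.ix_(keep, keep)]                       # walks restricted to V \ S
walk_sum = sum(np.linalg.matrix_power(R_sub, k) for k in range(200))
assert np.allclose(Sigma_cond, walk_sum)            # walks not passing through S
```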

96 1 Conditional Covariance between Non-Neighbors: Normalized Case We now provide bounds on the conditional covariance for Gaussian graphical models Markov on a graph G ∼ G(p; η, γ) satisfying the local-separation property (η, γ), as per Definition 2. [sent-865, score-0.594]

97 Lemma 12 (Conditional Covariance Between Non-neighbors) For a walk-summable Gaussian graphical model, the conditional covariance between non-neighbors i and j, conditioned on S_γ, the γ-local separator between i and j, satisfies max_{j∉N(i)} Σ(i, j | S_γ) = O(‖R_G‖^γ). [sent-866, score-0.462]

98 Thus using (27), the conditional covariance on graph G can be bounded as Σ_G(i, j | S) = O(max(‖E_γ‖, ‖F_γ‖)). [sent-874, score-0.37]

99 Conditional Covariance between Non-neighbors: The conditional covariance between non-neighbors i and j, conditioned on S_γ, the γ-local separator between i and j, satisfies max_{j∉N(i)} Σ(i, j | S_γ) = O(α^γ / D_min), (30) where D_min := min_i D(i, i) = min_i J(i, i). [sent-894, score-0.37]

100 3 Conditional Covariance between Neighbors: General Case We provide a lower bound on conditional covariance among the neighbors for the graphs under consideration. [sent-901, score-0.518]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('jmin', 0.338), ('cmit', 0.314), ('graphs', 0.248), ('rg', 0.198), ('nyi', 0.179), ('illsky', 0.177), ('imensional', 0.177), ('nandkumar', 0.177), ('graphical', 0.168), ('tan', 0.165), ('graph', 0.165), ('erd', 0.155), ('raphical', 0.151), ('hb', 0.136), ('jg', 0.131), ('igh', 0.117), ('uang', 0.117), ('dmin', 0.117), ('anandkumar', 0.113), ('xn', 0.111), ('ger', 0.105), ('election', 0.105), ('covariance', 0.105), ('girth', 0.105), ('ensemble', 0.102), ('conditional', 0.1), ('aussian', 0.1), ('odel', 0.096), ('separator', 0.089), ('cycles', 0.081), ('lbp', 0.08), ('nodes', 0.079), ('walks', 0.074), ('foreign', 0.072), ('gaussian', 0.072), ('pe', 0.072), ('mle', 0.072), ('edit', 0.069), ('node', 0.067), ('mutual', 0.066), ('penalized', 0.062), ('separation', 0.062), ('fano', 0.06), ('bic', 0.059), ('hlmann', 0.058), ('meinshausen', 0.056), ('ising', 0.056), ('jmax', 0.056), ('ravikumar', 0.055), ('degree', 0.049), ('edges', 0.044), ('distortion', 0.043), ('exchange', 0.043), ('markov', 0.043), ('edge', 0.042), ('xample', 0.041), ('ag', 0.041), ('bollob', 0.04), ('malioutov', 0.04), ('nbd', 0.04), ('paths', 0.04), ('separators', 0.04), ('propagation', 0.039), ('neighbors', 0.038), ('mini', 0.038), ('neighborhood', 0.037), ('chung', 0.037), ('lemma', 0.037), ('recovery', 0.036), ('conditions', 0.036), ('rh', 0.035), ('walk', 0.035), ('bresler', 0.034), ('threshold', 0.034), ('loopy', 0.034), ('singapore', 0.034), ('subgraph', 0.033), ('belief', 0.033), ('degrees', 0.033), ('xs', 0.032), ('gdeg', 0.032), ('gwatts', 0.032), ('lanka', 0.032), ('ounded', 0.032), ('sri', 0.032), ('summable', 0.032), ('thailand', 0.032), ('relaxation', 0.032), ('potential', 0.031), ('precision', 0.031), ('covariances', 0.03), ('pk', 0.03), ('synthetic', 0.03), ('models', 0.029), ('local', 0.029), ('ensembles', 0.029), ('india', 0.029), ('transparent', 0.029), ('bound', 0.027), ('property', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion

Author: Animashree Anandkumar, Vincent Y.F. Tan, Furong Huang, Alan S. Willsky

Abstract: We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm, when the number of samples n = Ω(J_min^{-2} log p), where p is the number of variables and J_min is the minimum (absolute) edge potential of the graphical model. The sufficient conditions for sparsistency are based on the notion of walk-summability of the model and the presence of sparse local vertex separators in the underlying graph. We also derive novel non-asymptotic necessary conditions on the number of samples required for sparsistency. Keywords: Gaussian graphical model selection, high-dimensional learning, local-separation property, walk-summability, necessary conditions for model selection

2 0.15140289 40 jmlr-2012-Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso

Author: Rahul Mazumder, Trevor Hastie

Abstract: We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples. Keywords: sparse inverse covariance selection, sparsity, graphical lasso, Gaussian graphical models, graph connected components, concentration graph, large scale covariance estimation

3 0.12299568 113 jmlr-2012-The huge Package for High-dimensional Undirected Graph Estimation in R

Author: Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, Larry Wasserman

Abstract: We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortran, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) more functions like data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) the package allows the user to apply both lossless and lossy screening rules to scale up large-scale problems, making a tradeoff between computational and statistical efficiency. Keywords: high-dimensional undirected graph estimation, glasso, huge, semiparametric graph estimation, data-dependent model selection, lossless screening, lossy screening 1. Overview Undirected graphs are a natural approach to describe the conditional independence among many variables. Each node of the graph represents a single variable and no edge between two variables implies that they are conditionally independent given all other variables. In the past decade, significant progress has been made on designing efficient algorithms to learn undirected graphs from high-dimensional observational data sets. Most of these methods are based on either the penalized maximum-likelihood estimation (Friedman et al., 2007) or penalized regression methods (Meinshausen and Bühlmann, 2006). Existing packages include glasso, Covpath and CLIME. In particular, the glasso package has been widely adopted by statisticians and computer scientists due to its friendly user interface and efficiency. In this paper we describe a newly developed R package named huge (High-dimensional Undirected Graph Estimation) coded in C. The package includes a wide range of functional modules and addresses some drawbacks of the graphical lasso algorithm. To gain more scalability, the package supports two modes of screening, lossless (Witten et al., 2011) and lossy screening. When using lossy screening, the user can select the desired screening level to scale up for high-dimensional problems, but this introduces some estimation bias. 2. Software Design and Implementation The package huge aims to provide a general framework for high-dimensional undirected graph estimation. The package includes six functional modules (M1-M6) that facilitate a flexible pipeline for analysis (Figure 1). M1. Data Generator: The function huge.generator() can simulate multivariate Gaussian data with different undirected graphs, including hub, cluster, band, scale-free, and Erdős-Rényi random graphs. The sparsity level of the obtained graph and signal-to-noise ratio can also be set up by users. M2. Semiparametric Transformation: The function huge.npn() implements the nonparanormal method (Liu et al., 2009, 2012) for estimating a semiparametric Gaussian copula model. The nonparanormal family extends the Gaussian distribution by marginally transforming the variables.
Computationally, the nonparanormal transformation only requires one pass through the data matrix. M3. Graph Screening: The scr argument in the main function huge() controls the use of large-scale correlation screening before graph estimation. The function supports the lossless screening (Witten et al., 2011) and the lossy screening. Such screening procedures can greatly reduce the computational cost and achieve equal or even better estimation by reducing the variance at the expense of increased bias. Figure 1: The graph estimation pipeline. M4. Graph Estimation: Similar to the glasso package, the method argument in the huge() function supports two estimation methods: (i) the neighborhood pursuit algorithm (Meinshausen and Bühlmann, 2006) and (ii) the graphical lasso algorithm (Friedman et al., 2007). We apply the coordinate descent with active set and covariance update, as well as other tricks suggested in Friedman et al. (2010). We modified the warm start trick to address the potential divergence problem of the graphical lasso algorithm (Mazumder and Hastie, 2011). The code is also memory-optimized using the sparse matrix data structure when estimating and storing full regularization paths for large data sets. We also provide a complementary graph estimation method based on thresholding the sample correlation matrix, which is computationally efficient and widely applied in biomedical research. M5. Model Selection: The function huge.select() provides two regularization parameter selection methods: the stability approach for regularization selection (StARS) (Liu et al., 2010) and the rotation information criterion (RIC). We also provide a likelihood-based extended Bayesian information criterion. M6. Graph Visualization: The plotting functions huge.plot() and plot() provide visualizations of the simulated data sets, estimated graphs and paths. The implementation is based on the igraph package. 3. User Interface by Example We illustrate the user interface by analyzing stock market data which we contribute to the huge package. We acquired closing prices from all stocks in the S&P 500 for all the days that the market was open between Jan 1, 2003 and Jan 1, 2008. This gave us 1258 samples for the 452 stocks that remained in the S&P 500 during the entire time period. > > > > > library(huge) data(stockdata) # Load the data x = log(stockdata$data[2:1258,]/stockdata$data[1:1257,]) # Preprocessing x.npn = huge.npn(x, npn.func=

4 0.08728338 51 jmlr-2012-Integrating a Partial Model into Model Free Reinforcement Learning

Author: Aviv Tamar, Dotan Di Castro, Ron Meir

Abstract: In reinforcement learning an agent uses online feedback from the environment in order to adaptively select an effective policy. Model free approaches address this task by directly mapping environmental states to actions, while model based methods attempt to construct a model of the environment, followed by a selection of optimal actions based on that model. Given the complementary advantages of both approaches, we suggest a novel procedure which augments a model free algorithm with a partial model. The resulting hybrid algorithm switches between a model based and a model free mode, depending on the current state and the agent’s knowledge. Our method relies on a novel definition for a partially known model, and an estimator that incorporates such knowledge in order to reduce uncertainty in stochastic approximation iterations. We prove that such an approach leads to improved policy evaluation whenever environmental knowledge is available, without compromising performance when such knowledge is absent. Numerical simulations demonstrate the effectiveness of the approach on policy gradient and Q-learning algorithms, and its usefulness in solving a call admission control problem. Keywords: reinforcement learning, temporal difference, stochastic approximation, markov decision processes, hybrid model based model free algorithms

5 0.081507102 29 jmlr-2012-Consistent Model Selection Criteria on High Dimensions

Author: Yongdai Kim, Sunghoon Kwon, Hosik Choi

Abstract: Asymptotic properties of model selection criteria for high-dimensional regression models are studied where the dimension of covariates is much larger than the sample size. Several sufficient conditions for model selection consistency are provided. Non-Gaussian error distributions are considered and it is shown that the maximal number of covariates for model selection consistency depends on the tail behavior of the error distribution. Also, sufficient conditions for model selection consistency are given when the variance of the noise is neither known nor estimated consistently. Results of simulation studies as well as real data analysis are given to illustrate that finite sample performances of consistent model selection criteria can be quite different. Keywords: model selection consistency, general information criteria, high dimension, regression

6 0.077895932 25 jmlr-2012-Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs

7 0.06951049 99 jmlr-2012-Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise

8 0.062878735 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming

9 0.062549978 39 jmlr-2012-Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications

10 0.061399013 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models

11 0.05779282 20 jmlr-2012-Analysis of a Random Forests Model

12 0.056973655 27 jmlr-2012-Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection

13 0.051741678 82 jmlr-2012-On the Necessity of Irrelevant Variables

14 0.044182684 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development

15 0.043400876 56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables

16 0.043317303 59 jmlr-2012-Linear Regression With Random Projections

17 0.042594194 114 jmlr-2012-Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies

18 0.04257137 5 jmlr-2012-A Local Spectral Method for Graphs: With Applications to Improving Graph Partitions and Exploring Data Graphs Locally

19 0.042165995 76 jmlr-2012-Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics

20 0.040794544 80 jmlr-2012-On Ranking and Generalization Bounds


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.212), (1, 0.159), (2, 0.024), (3, -0.162), (4, 0.071), (5, 0.219), (6, -0.104), (7, -0.028), (8, 0.008), (9, -0.064), (10, 0.057), (11, 0.109), (12, -0.096), (13, -0.149), (14, -0.196), (15, -0.065), (16, 0.089), (17, -0.112), (18, 0.124), (19, 0.07), (20, 0.033), (21, -0.14), (22, -0.045), (23, 0.03), (24, -0.091), (25, 0.074), (26, -0.056), (27, -0.118), (28, 0.001), (29, 0.067), (30, -0.016), (31, -0.002), (32, -0.012), (33, 0.083), (34, -0.051), (35, 0.024), (36, 0.004), (37, 0.002), (38, -0.041), (39, -0.028), (40, -0.03), (41, 0.071), (42, 0.077), (43, 0.027), (44, -0.022), (45, 0.044), (46, -0.054), (47, 0.037), (48, 0.175), (49, -0.007)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94149458 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion

Author: Animashree Anandkumar, Vincent Y.F. Tan, Furong Huang, Alan S. Willsky

Abstract: We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm, when the number of samples n = Ω(J_min^{-2} log p), where p is the number of variables and J_min is the minimum (absolute) edge potential of the graphical model. The sufficient conditions for sparsistency are based on the notion of walk-summability of the model and the presence of sparse local vertex separators in the underlying graph. We also derive novel non-asymptotic necessary conditions on the number of samples required for sparsistency. Keywords: Gaussian graphical model selection, high-dimensional learning, local-separation property, walk-summability, necessary conditions for model selection

2 0.75627369 40 jmlr-2012-Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso

Author: Rahul Mazumder, Trevor Hastie

Abstract: We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples. Keywords: sparse inverse covariance selection, sparsity, graphical lasso, Gaussian graphical models, graph connected components, concentration graph, large scale covariance estimation

3 0.68737739 113 jmlr-2012-The huge Package for High-dimensional Undirected Graph Estimation in R

Author: Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, Larry Wasserman

Abstract: We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortran, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) more functions like data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) the package allows the user to apply both lossless and lossy screening rules to scale up large-scale problems, making a tradeoff between computational and statistical efficiency. Keywords: high-dimensional undirected graph estimation, glasso, huge, semiparametric graph estimation, data-dependent model selection, lossless screening, lossy screening 1. Overview Undirected graphs are a natural approach to describe the conditional independence among many variables. Each node of the graph represents a single variable and no edge between two variables implies that they are conditionally independent given all other variables. In the past decade, significant progress has been made on designing efficient algorithms to learn undirected graphs from high-dimensional observational data sets. Most of these methods are based on either the penalized maximum-likelihood estimation (Friedman et al., 2007) or penalized regression methods (Meinshausen and Bühlmann, 2006). Existing packages include glasso, Covpath and CLIME. In particular, the glasso package has been widely adopted by statisticians and computer scientists due to its friendly user interface and efficiency. In this paper we describe a newly developed R package named huge (High-dimensional Undirected Graph Estimation) coded in C. The package includes a wide range of functional modules and addresses some drawbacks of the graphical lasso algorithm. To gain more scalability, the package supports two modes of screening, lossless (Witten et al., 2011) and lossy screening. When using lossy screening, the user can select the desired screening level to scale up for high-dimensional problems, but this introduces some estimation bias. 2. Software Design and Implementation The package huge aims to provide a general framework for high-dimensional undirected graph estimation. The package includes six functional modules (M1-M6) that facilitate a flexible pipeline for analysis (Figure 1). M1. Data Generator: The function huge.generator() can simulate multivariate Gaussian data with different undirected graphs, including hub, cluster, band, scale-free, and Erdős-Rényi random graphs. The sparsity level of the obtained graph and signal-to-noise ratio can also be set up by users. M2. Semiparametric Transformation: The function huge.npn() implements the nonparanormal method (Liu et al., 2009, 2012) for estimating a semiparametric Gaussian copula model. The nonparanormal family extends the Gaussian distribution by marginally transforming the variables.
Computationally, the nonparanormal transformation only requires one pass through the data matrix. M3. Graph Screening: The scr argument in the main function huge() controls the use of large-scale correlation screening before graph estimation. The function supports the lossless screening (Witten et al., 2011) and the lossy screening. Such screening procedures can greatly reduce the computational cost and achieve equal or even better estimation by reducing the variance at the expense of increased bias. Figure 1: The graph estimation pipeline. M4. Graph Estimation: Similar to the glasso package, the method argument in the huge() function supports two estimation methods: (i) the neighborhood pursuit algorithm (Meinshausen and Bühlmann, 2006) and (ii) the graphical lasso algorithm (Friedman et al., 2007). We apply the coordinate descent with active set and covariance update, as well as other tricks suggested in Friedman et al. (2010). We modified the warm start trick to address the potential divergence problem of the graphical lasso algorithm (Mazumder and Hastie, 2011). The code is also memory-optimized using the sparse matrix data structure when estimating and storing full regularization paths for large data sets. We also provide a complementary graph estimation method based on thresholding the sample correlation matrix, which is computationally efficient and widely applied in biomedical research. M5. Model Selection: The function huge.select() provides two regularization parameter selection methods: the stability approach for regularization selection (StARS) (Liu et al., 2010) and the rotation information criterion (RIC). We also provide a likelihood-based extended Bayesian information criterion. M6. Graph Visualization: The plotting functions huge.plot() and plot() provide visualizations of the simulated data sets, estimated graphs and paths. The implementation is based on the igraph package. 3. User Interface by Example We illustrate the user interface by analyzing stock market data which we contribute to the huge package. We acquired closing prices from all stocks in the S&P 500 for all the days that the market was open between Jan 1, 2003 and Jan 1, 2008. This gave us 1258 samples for the 452 stocks that remained in the S&P 500 during the entire time period. > > > > > library(huge) data(stockdata) # Load the data x = log(stockdata$data[2:1258,]/stockdata$data[1:1257,]) # Preprocessing x.npn = huge.npn(x, npn.func=

4 0.51486319 25 jmlr-2012-Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs

Author: Alain Hauser, Peter Bühlmann

Abstract: The investigation of directed acyclic graphs (DAGs) encoding the same Markov property, that is the same conditional independence relations of multivariate observational distributions, has a long tradition; many algorithms exist for model selection and structure learning in Markov equivalence classes. In this paper, we extend the notion of Markov equivalence of DAGs to the case of interventional distributions arising from multiple intervention experiments. We show that under reasonable assumptions on the intervention experiments, interventional Markov equivalence defines a finer partitioning of DAGs than observational Markov equivalence and hence improves the identifiability of causal models. We give a graph theoretic criterion for two DAGs being Markov equivalent under interventions and show that each interventional Markov equivalence class can, analogously to the observational case, be uniquely represented by a chain graph called interventional essential graph (also known as CPDAG in the observational case). These are key insights for deriving a generalization of the Greedy Equivalence Search algorithm aimed at structure learning from interventional data. This new algorithm is evaluated in a simulation study. Keywords: causal inference, interventions, graphical model, Markov equivalence, greedy equivalence search

5 0.42226464 29 jmlr-2012-Consistent Model Selection Criteria on High Dimensions

Author: Yongdai Kim, Sunghoon Kwon, Hosik Choi

Abstract: Asymptotic properties of model selection criteria for high-dimensional regression models are studied where the dimension of covariates is much larger than the sample size. Several sufficient conditions for model selection consistency are provided. Non-Gaussian error distributions are considered and it is shown that the maximal number of covariates for model selection consistency depends on the tail behavior of the error distribution. Also, sufficient conditions for model selection consistency are given when the variance of the noise is neither known nor estimated consistently. Results of simulation studies as well as real data analysis are given to illustrate that finite sample performances of consistent model selection criteria can be quite different. Keywords: model selection consistency, general information criteria, high dimension, regression

6 0.41638041 51 jmlr-2012-Integrating a Partial Model into Model Free Reinforcement Learning

7 0.40242025 99 jmlr-2012-Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise

8 0.38147563 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models

9 0.38060409 27 jmlr-2012-Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection

10 0.37210017 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development

11 0.34178054 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming

12 0.30458686 38 jmlr-2012-Entropy Search for Information-Efficient Global Optimization

13 0.29843661 114 jmlr-2012-Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies

14 0.28544468 20 jmlr-2012-Analysis of a Random Forests Model

15 0.27441961 76 jmlr-2012-Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics

16 0.26293516 59 jmlr-2012-Linear Regression With Random Projections

17 0.25597391 5 jmlr-2012-A Local Spectral Method for Graphs: With Applications to Improving Graph Partitions and Exploring Data Graphs Locally

18 0.25317967 17 jmlr-2012-An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity

19 0.24263249 118 jmlr-2012-Variational Multinomial Logit Gaussian Process

20 0.23397712 80 jmlr-2012-On Ranking and Generalization Bounds


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.014), (21, 0.045), (26, 0.034), (29, 0.04), (35, 0.011), (49, 0.018), (56, 0.013), (57, 0.014), (64, 0.012), (75, 0.034), (77, 0.01), (79, 0.456), (92, 0.107), (96, 0.1)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.85601902 58 jmlr-2012-Linear Fitted-Q Iteration with Multiple Reward Functions

Author: Daniel J. Lizotte, Michael Bowling, Susan A. Murphy

Abstract: We present a general and detailed development of an algorithm for finite-horizon fitted-Q iteration with an arbitrary number of reward signals and linear value function approximation using an arbitrary number of state features. This includes a detailed treatment of the 3-reward function case using triangulation primitives from computational geometry and a method for identifying globally dominated actions. We also present an example of how our methods can be used to construct a real-world decision aid by considering symptom reduction, weight gain, and quality of life in sequential treatments for schizophrenia. Finally, we discuss future directions in which to take this work that will further enable our methods to make a positive impact on the field of evidence-based clinical decision support. Keywords: reinforcement learning, dynamic programming, decision making, linear regression, preference elicitation

2 0.82646406 17 jmlr-2012-An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity

Author: Nir Ailon

Abstract: Given a set V of n elements we wish to linearly order them given pairwise preference labels which may be non-transitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: The number of disagreements (loss) and the query complexity (number of pairwise preference labels). Our algorithm adaptively queries at most O(ε−6 n log5 n) preference labels for a regret of ε times the optimal loss. As a function of n, this is asymptotically better than standard (non-adaptive) learning bounds achievable for the same problem. Our main result takes us a step closer toward settling an open problem posed by learning-to-rank (from pairwise information) theoreticians and practitioners: What is a provably correct way to sample preference labels? To further show the power and practicality of our solution, we analyze a typical test case in which a large margin linear relaxation is used for efficiently solving the simpler learning problems in our decomposition. Keywords: statistical learning theory, active learning, ranking, pairwise ranking, preferences

same-paper 3 0.78230125 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion

Author: Animashree Anandkumar, Vincent Y.F. Tan, Furong Huang, Alan S. Willsky

Abstract: We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm, when the number of samples n = Ω(J_min^{-2} log p), where p is the number of variables and J_min is the minimum (absolute) edge potential of the graphical model. The sufficient conditions for sparsistency are based on the notion of walk-summability of the model and the presence of sparse local vertex separators in the underlying graph. We also derive novel non-asymptotic necessary conditions on the number of samples required for sparsistency. Keywords: Gaussian graphical model selection, high-dimensional learning, local-separation property, walk-summability, necessary conditions for model selection

4 0.40670931 84 jmlr-2012-Online Submodular Minimization

Author: Elad Hazan, Satyen Kale

Abstract: We consider an online decision problem over a discrete space in which the loss function is submodular. We give algorithms which are computationally efficient and are Hannan-consistent in both the full information and partial feedback settings. Keywords: submodular optimization, online learning, regret minimization

5 0.40287519 105 jmlr-2012-Selective Sampling and Active Learning from Single and Multiple Teachers

Author: Ofer Dekel, Claudio Gentile, Karthik Sridharan

Abstract: We present a new online learning algorithm in the selective sampling framework, where labels must be actively queried before they are revealed. We prove bounds on the regret of our algorithm and on the number of labels it queries when faced with an adaptive adversarial strategy of generating the instances. Our bounds both generalize and strictly improve over previous bounds in similar settings. Additionally, our selective sampling algorithm can be converted into an efficient statistical active learning algorithm. We extend our algorithm and analysis to the multiple-teacher setting, where the algorithm can choose which subset of teachers to query for each label. Finally, we demonstrate the effectiveness of our techniques on a real-world Internet search problem. Keywords: online learning, regret, label-efficient, crowdsourcing

6 0.39773574 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting

7 0.3825385 103 jmlr-2012-Sampling Methods for the Nyström Method

8 0.38150018 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models

9 0.38018915 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning

10 0.37922123 15 jmlr-2012-Algebraic Geometric Comparison of Probability Distributions

11 0.37919602 80 jmlr-2012-On Ranking and Generalization Bounds

12 0.37800533 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems

13 0.3772034 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches

14 0.37427306 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class

15 0.37198195 27 jmlr-2012-Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection

16 0.37081873 117 jmlr-2012-Variable Selection in High-dimensional Varying-coefficient Models with Global Optimality

17 0.36939985 5 jmlr-2012-A Local Spectral Method for Graphs: With Applications to Improving Graph Partitions and Exploring Data Graphs Locally

18 0.36900085 111 jmlr-2012-Structured Sparsity and Generalization

19 0.36852372 68 jmlr-2012-Minimax Manifold Estimation

20 0.36791742 13 jmlr-2012-Active Learning via Perfect Selective Classification