jmlr jmlr2012 jmlr2012-40 knowledge-graph by maker-knowledge-mining

40 jmlr-2012-Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso


Source: pdf

Author: Rahul Mazumder, Trevor Hastie

Abstract: We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples. Keywords: sparse inverse covariance selection, sparsity, graphical lasso, Gaussian graphical models, graph connected components, concentration graph, large scale covariance estimation

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We consider the sparse inverse covariance regularization problem, or graphical lasso, with regularization parameter λ. [sent-3, score-0.594]

2 Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. [sent-4, score-1.429]

3 We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. [sent-5, score-2.704]

4 This characterizes a very interesting property of a path of graphical lasso solutions. [sent-6, score-0.302]

5 For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. [sent-8, score-0.325]

6 Keywords: sparse inverse covariance selection, sparsity, graphical lasso, Gaussian graphical models, graph connected components, concentration graph, large scale covariance estimation. [sent-10, score-1.605]

7 Consider a data matrix X_{n×p} comprising n sample realizations from a p-dimensional Gaussian distribution with zero mean and (unknown) positive definite covariance matrix Σ_{p×p}, that is, i.i.d. observations. [sent-11, score-0.373]

8 ℓ1-regularized sparse inverse covariance selection is also known as the graphical lasso (Friedman et al., 2008; Yuan and Lin, 2007). [sent-18, score-0.262]

9 It estimates the covariance matrix Σ under the assumption that the inverse covariance matrix Σ−1 is sparse. [sent-20, score-0.613]

10 This is achieved by minimizing the regularized negative log-likelihood function: minimize over Θ ≻ 0 the objective −log det(Θ) + tr(SΘ) + λ ∑_{i,j} |Θ_{ij}|   (1), where S is the sample covariance matrix. [sent-21, score-0.301]
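
A minimal sketch of solving (1) with an off-the-shelf solver; this is not the authors' code, and it assumes scikit-learn's GraphicalLasso as a stand-in, whose penalty convention (for example, how the diagonal is treated) may differ slightly from (1):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))   # n = 200 samples, p = 50 variables (toy data)
S = np.cov(X, rowvar=False)          # sample covariance matrix S

lam = 0.1                            # regularization parameter lambda
model = GraphicalLasso(alpha=lam, max_iter=200).fit(X)
Theta_hat = model.precision_         # estimated inverse covariance Theta^(lambda)
Sigma_hat = model.covariance_        # estimated covariance
```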

11 We note that (1) can also be used in a more non-parametric fashion for any positive semi-definite input matrix S, not necessarily the sample covariance matrix of an MVN sample as described above. [sent-24, score-0.407]

12 This paper is about one such (surprising) property: namely, establishing an equivalence between the vertex-partition induced by the connected components of the non-zero pattern of Θ^(λ) and that induced by the thresholded sample covariance matrix S. [sent-35, score-1.413]

13 This strategy becomes extremely effective in solving large problems over a range of values of λ—sufficiently restricted to ensure sparsity and the separation into connected components. [sent-41, score-0.437]

14 A finite undirected graph G on p vertices is given by the ordered tuple G = (V , E ), where V is the set of nodes and E the collection of (undirected) edges. [sent-47, score-0.344]

15 We use the convention that a node is not connected to itself, so the diagonals of the adjacency matrix are all zeros. [sent-49, score-0.534]

16 We say two nodes u, v ∈ V are connected if there is a path between them. [sent-51, score-0.506]

17 A maximal connected subgraph is a connected component of the graph G . [sent-52, score-1.091]

18 Connectedness is an equivalence relation that decomposes a graph G into its connected components {(Vℓ , E ℓ )}_{1≤ℓ≤K}, with G = ∪_{ℓ=1}^{K} (Vℓ , E ℓ ), where K denotes the number of connected components. [sent-53, score-1.255]

19 Throughout this paper we will often refer to this partition as the vertex-partition induced by the components of the graph G . [sent-59, score-0.561]

20 Suppose a graph G defined on the set of vertices V admits the following decomposition into connected components: G = ∪_{ℓ=1}^{K} (Vℓ , E ℓ ). [sent-61, score-0.755]

21 The vertex-partitions induced by the connected components of G and G′ are equal if K = K′ and there is a permutation π on {1, . . . , K} such that Vℓ = V′_{π(ℓ)} for every ℓ. [sent-66, score-0.746]

22 Section 2 describes the covariance graph thresholding idea along with theoretical justification and related work, followed by complexity analysis of the algorithmic framework in Section 3. [sent-74, score-0.627]

23 This defines a symmetric graph G (λ) = (V , E (λ) ) (Equation 2), namely the estimated concentration graph (Cox and Wermuth, 1996; Lauritzen, 1996), on the nodes V = {1, . . . , p}. [sent-78, score-0.644]

24 Suppose the graph G (λ) admits a decomposition into κ(λ) connected components: G (λ) = ∪_{ℓ=1}^{κ(λ)} G_ℓ^(λ)   (3), where G_ℓ^(λ) = (V_ℓ^(λ), E_ℓ^(λ)) are the components of the graph G (λ) . [sent-82, score-1.133]

25 Given λ, we perform a thresholding on the entries of the sample covariance matrix S and obtain a graph edge skeleton E(λ) ∈ {0, 1}^{p×p} defined by: E(λ)_{ij} = 1 if |S_{ij}| > λ and i ≠ j, and E(λ)_{ij} = 0 otherwise   (4). [sent-88, score-0.719]

26 The symmetric matrix E(λ) defines a symmetric graph G(λ) = (V, E(λ)) on the nodes V = {1, . . . , p}. [sent-89, score-0.373]

27 We refer to this as the thresholded sample covariance graph. [sent-93, score-0.673]

28 Similar to the decomposition in (3), the graph G(λ) also admits a decomposition into connected components: G(λ) = ∪_{ℓ=1}^{k(λ)} G_ℓ^(λ)   (5), where G_ℓ^(λ) = (V_ℓ^(λ), E_ℓ^(λ)) are the components of the graph G(λ) . [sent-94, score-1.157]
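
As a concrete illustration of (4) and (5), the following is a minimal sketch (ours, not the authors' code) that builds the thresholded sample covariance graph with NumPy and extracts its connected components with SciPy; the names S, lam, and threshold_components are our own:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def threshold_components(S, lam):
    """Vertex-partition of the thresholded sample covariance graph G(lambda)."""
    E = np.abs(S) > lam                    # edge skeleton, Equation (4)
    np.fill_diagonal(E, False)             # convention: no self-loops
    n_comp, labels = connected_components(csr_matrix(E), directed=False)
    return n_comp, labels                  # labels[i] = component index of node i, Equation (5)

# Example: the labels array encodes the vertex-partition {V_1, ..., V_k(lambda)}.
# n_comp, labels = threshold_components(S, lam=0.1)
```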

29 Construction of G(λ) and its components requires operating on S, an operation that can be performed completely independently of the optimization problem (1), which is arguably more expensive (see Section 3). [sent-96, score-0.262]

30 The surprising message we describe in this paper is that the vertex-partition of the connected components of (5) is exactly equal to that of (3). [sent-97, score-0.642]

31 We obtain a very interesting property of the path of solutions {Θ^(λ)}_{λ≥0}: the behavior of the connected components of the estimated concentration graph can be completely understood by simple screening rules on S. [sent-99, score-1.151]

32 The cost of computing the connected components of the thresholded sample covariance graph (5) is orders of magnitude smaller than the cost of fitting graphical models (1). [sent-101, score-1.67]

33 Furthermore, the computations pertaining to the covariance graph can be done off-line and are amenable to parallel computation (See Section 3). [sent-102, score-0.577]

34 Suppose that for λ0 , there are k(λ0 ) components and the graphical model computations are distributed. [sent-111, score-0.411]

35 Consider a distributed computing architecture, where every machine allows operating on a graphical lasso problem (1) of maximal size pmax . [sent-114, score-0.382]

36 Then with relatively small effort we can find the smallest value of λ = λ_{pmax}, such that there are no connected components of size larger than pmax. [sent-115, score-0.712]
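
A simple way to locate λ_{pmax} is sketched below (the helper names and function signature are our own): since the vertex-partition can only change at the absolute values of the off-diagonal entries of S, it suffices to scan those sorted values with the threshold_components helper from the earlier sketch (a bisection over the sorted values would be faster, since the partitions are nested in λ):

```python
import numpy as np

def smallest_lambda_for_capacity(S, p_max):
    """Smallest candidate lambda at which no connected component exceeds p_max nodes."""
    p = S.shape[0]
    candidates = np.unique(np.abs(S[np.triu_indices(p, k=1)]))   # sorted ascending
    for lam in candidates:
        _, labels = threshold_components(S, lam)
        if np.bincount(labels).max() <= p_max:
            return lam
    return candidates[-1]   # at the largest entry every node is isolated
```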

37 The following theorem establishes the main technical contribution of this paper—the equivalence of the vertex-partitions induced by the connected components of the thresholded sample covariance graph and the estimated concentration graph. [sent-118, score-1.736]

38 Theorem 1 For any λ > 0, the components of the estimated concentration graph G (λ) , as defined in (2) and (3) induce exactly the same vertex-partition as that of the thresholded sample covariance graph G(λ) , defined in (4) and (5). [sent-119, score-1.443]

39 Since the decomposition of a symmetric graph into its connected components depends upon the ordering/labeling of the components, the permutation π appears in Theorem 1. [sent-129, score-0.992]

40 Distributing these operations depends upon the number of processors available, their capacities, communication lag, the number of components and the maximal size of the blocks across all machines. [sent-133, score-0.315]

41 Theorem 1 leads to a special property of the path-of-solutions to (1), that is, the vertex-partitions induced by the connected components of G (λ) are nested with increasing λ. [sent-136, score-0.752]

42 Theorem 2 Consider two values of the regularization parameter such that λ > λ′ > 0, with corresponding concentration graphs G (λ) and G (λ′) as in (2) and connected components as in (3). [sent-138, score-0.685]

43 Then the vertex-partition induced by the components of G (λ) is nested within the partition induced by the components of G (λ′) . [sent-139, score-0.676]

44 Remark 2 It is worth noting that Theorem 2 addresses the nesting of the edges across connected components and not within a component. [sent-143, score-0.651]

45 In general, the edge-set E (λ) of the estimated concentration graph need not be nested as a function of λ: for λ > λ′ , in general, E (λ) ⊄ E (λ′) . [sent-144, score-0.381]

46 Furthermore, the ith diagonal entries of the estimated covariance and precision matrices are given by (Sii + λ) and 1/(Sii + λ), respectively. [sent-150, score-0.411]

47 For λ ≥ max_{i=1,...,p} {max_{j≠i} |Sij |}, the estimated covariance and precision matrices obtained from (1) are both diagonal. [sent-154, score-0.338]
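
Written out explicitly in this regime (our notation, and assuming, as in the preceding sentences, that the diagonal is penalized in (1)), the fully diagonal solution is:

```latex
\hat{\Sigma}^{(\lambda)} = \mathrm{diag}\bigl(S_{11}+\lambda,\ \ldots,\ S_{pp}+\lambda\bigr),
\qquad
\hat{\Theta}^{(\lambda)} = \mathrm{diag}\!\Bigl(\tfrac{1}{S_{11}+\lambda},\ \ldots,\ \tfrac{1}{S_{pp}+\lambda}\Bigr).
```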

48 At the time of our writing, an earlier version of their paper was available (Witten and Friedman, 2011); it proposed a scheme to detect isolated nodes for problem (1) via a simple screening of the entries of S, but no block screening. [sent-158, score-0.372]

49 A hard thresholding is performed on the ℓ1 -penalized regression coefficient estimates at every node to obtain the graph structure. [sent-166, score-0.362]

50 The overall complexity of our proposal depends upon (a) the graph partition stage and (b) solving (sub)problems of the form (1). [sent-170, score-0.357]

51 The cost of computing the connected components of the thresholded covariance graph is fairly negligible when compared to solving a similarly sized graphical lasso problem (1); see also our simulation studies in Section 4. [sent-172, score-1.775]

52 Given the n samples, the cost of creating the sample covariance matrix S is O(n · p²). [sent-176, score-0.337]

53 Obtaining the connected components of the thresholded covariance graph costs O(|E(λ) | + p) (Tarjan, 1972). [sent-178, score-1.485]

54 Since we are interested in a regime where the thresholded covariance graph is sparse enough to be broken into smaller connected components, we have |E(λ)| ≪ p². [sent-179, score-1.298]

55 Note that all computations pertaining to the construction of the connected components and the task of computing S can be performed off-line. [sent-180, score-0.7]

56 Gazit (1991), for example, describes parallel algorithms for computing the connected components of a graph; they have time complexity O(log(p)) and require O((|E(λ)| + p)/ log(p)) processors with space O(p + |E(λ)|). [sent-182, score-0.642]

57 It is quite clear that the role played by covariance thresholding is indeed crucial in this context. [sent-194, score-0.375]

58 Suppose that for a given λ the thresholded sample covariance graph has k(λ) components; the total cost of solving these smaller problems is then ∑_{i=1}^{k(λ)} O(|V_i^(λ)|^J) ≪ O(p^J), with J ∈ {3, 4}. [sent-196, score-0.929]
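
A minimal sketch of the wrapper strategy itself, assuming scikit-learn's GraphicalLasso as the per-block solver and the threshold_components helper from the earlier sketch; the paper's own implementation instead calls the glasso Fortran code, and the per-block solver's diagonal-penalty convention may differ slightly from (1):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def blockwise_graphical_lasso(S, X, lam):
    """Solve (1) block by block over the components of the thresholded covariance graph.

    By Theorem 1 the estimated precision matrix is block-diagonal with respect to
    this vertex-partition; isolated nodes get the closed-form entry 1 / (S_ii + lam).
    """
    p = S.shape[0]
    Theta = np.zeros((p, p))
    _, labels = threshold_components(S, lam)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        if len(idx) == 1:                               # isolated node: closed form
            i = idx[0]
            Theta[i, i] = 1.0 / (S[i, i] + lam)
        else:                                           # smaller tractable subproblem
            sub = GraphicalLasso(alpha=lam, max_iter=200).fit(X[:, idx])
            Theta[np.ix_(idx, idx)] = sub.precision_
    return Theta
```

Each subproblem could equally well be shipped to a different machine, which is the distributed setting discussed above.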

59 This is what makes large scale graphical lasso problems solvable! [sent-198, score-0.262]

60 For obtaining the connected components of a symmetric adjacency matrix we used the MATLAB function graphconncomp. [sent-219, score-0.716]

61 The sample covariance matrix is S = S + σ ·UU ′ . [sent-235, score-0.337]

62 In all the examples shown in Table 1, we set λI := (λmax + λmin )/2, where for all values of λ in the interval [λmin , λmax ] the thresholded version of the sample covariance matrix has exactly K connected components. [sent-238, score-0.746]

63 We also took a larger value of λ, that is, λII := λmax , which gave sparser estimates of the precision matrix, but the number of connected components was the same. [sent-239, score-0.715]

64 The computations across different connected blocks could be distributed across as many machines. [sent-240, score-0.521]

65 Table 1 shows the rather remarkable improvements obtained by using our proposed covariance thresholding strategy as compared to operating on the whole matrix. [sent-243, score-0.403]

66 The time taken by the graph-partitioning step in splitting the thresholded covariance graph into its connected components is negligible as compared to the timings for the optimization problem. [sent-247, score-1.554]

67 The column ‘graph partition’ lists the time for computing the connected components of the thresholded sample covariance graph. [sent-303, score-1.291]

68 The graphical lasso is often used for learning connectivity networks from gene-microarray data (Friedman et al., 2008). [sent-308, score-0.262]

69 Since in most real examples the number of genes p is in the tens of thousands, obtaining an inverse covariance matrix by solving (1) is computationally impractical. [sent-310, score-0.374]

70 The covariance thresholding method we propose easily applies to these problems and, as we see, gracefully delivers solutions over a large range of the parameter λ. [sent-311, score-0.375]

71 We study three different micro-array examples and observe that as one varies λ from large to small values, the thresholded covariance graph splits into a number of non-trivial connected components of varying sizes. [sent-312, score-1.509]

72 We continue until a small/moderate value of λ at which the maximal size of a connected component exceeds a predefined machine capacity, or the ‘computational budget’, for a single graphical lasso problem. [sent-313, score-0.741]

73 Note that in relevant micro-array applications, since p ≫ n (n, the number of samples, is at most a few hundred), heavy regularization is required to control the variance of the covariance estimates, so it does seem reasonable to restrict to solutions of (1) for large values of λ. [sent-314, score-0.267]

74 The exact thresholding methodology could also have been applied to the sample covariance matrix. [sent-328, score-0.409]

75 Figure 1 shows how the component sizes of the thresholded covariance graph change across λ. [sent-330, score-0.925]

76 Note that the connected components change only at the absolute values of the entries of S. [sent-332, score-0.664]

77 From the sorted absolute values of the off-diagonal entries of S, we obtained the smallest value of λ, say λ′_min, for which the size of the maximal connected component was at most 1500. [sent-333, score-0.5]

78 For a grid of values of λ down to λ′_min, we computed the connected components of the thresholded sample covariance matrix and obtained the size distribution of the various connected components. [sent-334, score-1.46]
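
A minimal sketch of this size-distribution computation (as summarized in Figure 1), reusing the helpers from the earlier sketches; the grid construction and all names here are our own:

```python
import numpy as np

def component_size_profile(S, max_size=1500, n_grid=50):
    """Counts of component sizes of the thresholded graph over a grid of lambda values."""
    p = S.shape[0]
    offdiag = np.abs(S[np.triu_indices(p, k=1)])
    lam_min = smallest_lambda_for_capacity(S, max_size)   # see the earlier sketch
    grid = np.linspace(offdiag.max(), lam_min, n_grid)    # from large lambda downwards
    profile = {}
    for lam in grid:
        _, labels = threshold_components(S, lam)
        sizes = np.bincount(labels)                       # size of each component
        profile[lam] = np.bincount(sizes)[1:]             # entry s-1: number of components of size s
    return profile
```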

79 The number of connected components of a particular size is denoted by a color-scheme, described by the color-bar in the figures. [sent-336, score-0.618]

80 With increasing λ, the larger connected components gradually disappear as they decompose into smaller components; the sizes of the connected components decrease and the frequency of the smaller components increases. [sent-337, score-1.47]

81 Since these are all correlation matrices, for λ ≥ 1 all the nodes in the graph become isolated. [sent-338, score-0.285]

82 The range of λ values for which the maximal size of the components is smaller than 1500 differs across the three examples. [sent-339, score-0.287]

83 Note that by Theorem 1, the pattern of components appearing in Figure 1 is exactly the same as that of the components appearing in the solution of (1) for that λ. [sent-341, score-0.468]

84 Figure 1: Size distribution (on the log10 scale) of the connected components arising from the thresholded sample covariance graph for examples (A)-(C). [sent-379, score-1.519]

85 For every value of λ (vertical axis), the horizontal slice denotes the sizes of the different components appearing in the thresholded covariance graph. [sent-380, score-0.898]

86 The colors represent the number of components in the graph having that specific size. [sent-381, score-0.437]

87 For every figure, the range of λ values is chosen such that the maximal size of the connected components does not exceed 1500. [sent-382, score-0.663]

88 The property is fairly surprising: the vertex-partitions induced by the connected components of the non-zero pattern of the estimated concentration matrix (at λ) and of the thresholded sample covariance matrix S (at λ) are exactly equal. [sent-386, score-1.592]

89 Our observation not only provides interesting insights into the properties of the graphical lasso solution-path but also opens the door to solving large-scale graphical lasso problems, which are otherwise intractable. [sent-388, score-0.552]

90 Under this ordering of the vertices, the edge-matrix of the thresholded covariance graph is block-diagonal, E(λ) = blockdiag(E_1^(λ), E_2^(λ), . . . , E_{k(λ)}^(λ)), with all entries outside the diagonal blocks equal to zero. [sent-404, score-0.867]

91 Note that by construction of the thresholded sample covariance graph, if i ∈ V_ℓ^(λ) and j ∈ V_{ℓ′}^(λ) with ℓ ≠ ℓ′, then |Sij | ≤ λ. [sent-453, score-0.673]

92 The above argument shows that the connected components obtained from the estimated precision graph G (λ) lead to a partition of the vertices {V_ℓ^(λ)}_{1≤ℓ≤κ(λ)}, indexed by ℓ ∈ {1, . . . , κ(λ)}. [sent-457, score-0.981]

93 This proves that the connected components of G(λ) lead to a partition of the vertices which is finer than the vertex-partition induced by the components of G (λ) . [sent-467, score-0.951]

94 The permutation π in the theorem appears since the labeling of the connected components is not unique. [sent-470, score-0.74]

95 Proof of Theorem 2. This proof is a direct consequence of Theorem 1, which establishes that the vertex-partitions induced by the connected components of the estimated precision graph and the thresholded sample covariance graph are equal. [sent-472, score-1.904]

96 Observe that, by construction, the connected components of the thresholded sample covariance graph, that is, of G(λ), are nested within the connected components of G(λ′). [sent-473, score-1.957]

97 In particular, the vertex-partition induced by the components of the thresholded sample covariance graph at λ is contained inside the vertex-partition induced by the components of the thresholded sample covariance graph at λ′ . [sent-474, score-2.392]

98 An optimal randomized parallel algorithm for finding connected components in a graph. [sent-530, score-0.618]

99 Adaptive first-order methods for general sparse inverse covariance selection. [sent-551, score-0.332]

100 High-dimensional covariance estimation based on Gaussian graphical models. [sent-656, score-0.418]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('connected', 0.409), ('thresholded', 0.372), ('smacs', 0.342), ('covariance', 0.267), ('glasso', 0.257), ('graph', 0.228), ('components', 0.209), ('screening', 0.16), ('graphical', 0.151), ('astie', 0.141), ('azumder', 0.141), ('hresholding', 0.121), ('lasso', 0.111), ('thresholding', 0.108), ('raphical', 0.103), ('asso', 0.093), ('xact', 0.093), ('induced', 0.086), ('witten', 0.071), ('block', 0.07), ('timings', 0.069), ('concentration', 0.067), ('banerjee', 0.067), ('friedman', 0.061), ('si', 0.06), ('vertices', 0.059), ('nodes', 0.057), ('stanford', 0.052), ('computations', 0.051), ('nested', 0.048), ('wrapper', 0.048), ('tissue', 0.047), ('enormous', 0.047), ('pmax', 0.047), ('entries', 0.046), ('maximal', 0.045), ('inverse', 0.043), ('permutation', 0.042), ('kkt', 0.042), ('wi', 0.042), ('path', 0.04), ('dated', 0.04), ('mvn', 0.04), ('serially', 0.04), ('uu', 0.04), ('isolated', 0.039), ('proposal', 0.039), ('partition', 0.038), ('estimated', 0.038), ('matrix', 0.036), ('adjacency', 0.036), ('admits', 0.035), ('screen', 0.034), ('rahul', 0.034), ('sii', 0.034), ('sample', 0.034), ('across', 0.033), ('precision', 0.033), ('oligonucleotide', 0.031), ('rothman', 0.031), ('pertaining', 0.031), ('issn', 0.03), ('lu', 0.029), ('appears', 0.029), ('fortran', 0.029), ('colon', 0.029), ('mazumder', 0.029), ('took', 0.028), ('operating', 0.028), ('blocks', 0.028), ('solving', 0.028), ('boyd', 0.027), ('diagonal', 0.027), ('scheinberg', 0.027), ('diagonals', 0.027), ('pj', 0.027), ('signature', 0.027), ('url', 0.026), ('doi', 0.026), ('node', 0.026), ('symmetric', 0.026), ('theorem', 0.026), ('arguably', 0.025), ('breast', 0.025), ('till', 0.025), ('sizes', 0.025), ('labeling', 0.025), ('appearing', 0.025), ('ii', 0.024), ('complexity', 0.024), ('cox', 0.024), ('ner', 0.024), ('splits', 0.024), ('surprising', 0.024), ('decomposition', 0.024), ('operate', 0.023), ('speedup', 0.023), ('matlab', 0.023), ('yuan', 0.023), ('sparse', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999928 40 jmlr-2012-Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso

Author: Rahul Mazumder, Trevor Hastie

Abstract: We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples. Keywords: sparse inverse covariance selection, sparsity, graphical lasso, Gaussian graphical models, graph connected components, concentration graph, large scale covariance estimation

2 0.24144302 113 jmlr-2012-The huge Package for High-dimensional Undirected Graph Estimation in R

Author: Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, Larry Wasserman

Abstract: We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortan, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) more functions like data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) the package allows the user to apply both lossless and lossy screening rules to scale up large-scale problems, making a tradeoff between computational and statistical efficiency. Keywords: high-dimensional undirected graph estimation, glasso, huge, semiparametric graph estimation, data-dependent model selection, lossless screening, lossy screening 1. Overview Undirected graphs is a natural approach to describe the conditional independence among many variables. Each node of the graph represents a single variable and no edge between two variables implies that they are conditional independent given all other variables. In the past decade, significant progress has been made on designing efficient algorithms to learn undirected graphs from high-dimensional observational data sets. Most of these methods are based on either the penalized maximum-likelihood estimation (Friedman et al., 2007) or penalized regression methods (Meinshausen and B¨ hlmann, 2006). Existing packages include glasso, Covpath and CLIME. In particuu ∗. Also in the Department of Biostatistics. †. Also in the Department of Machine Learning. c 2012 Zhao, Liu, Roeder, Lafferty and Wasserman. Z HAO , L IU , ROEDER , L AFFERTY AND WASSERMAN lar, the glasso package has been widely adopted by statisticians and computer scientists due to its friendly user-inference and efficiency. In this paper1 we describe a newly developed R package named huge (High-dimensional Undirected Graph Estimation) coded in C. The package includes a wide range of functional modules and addresses some drawbacks of the graphical lasso algorithm. To gain more scalability, the package supports two modes of screening, lossless (Witten et al., 2011) and lossy screening. When using lossy screening, the user can select the desired screening level to scale up for high-dimensional problems, but this introduces some estimation bias. 2. Software Design and Implementation The package huge aims to provide a general framework for high-dimensional undirected graph estimation. The package includes Six functional modules (M1-M6) facilitate a flexible pipeline for analysis (Figure 1). M1. Data Generator: The function huge.generator() can simulate multivariate Gaussian data with different undirected graphs, including hub, cluster, band, scale-free, and Erd¨ s-R´ nyi o e random graphs. The sparsity level of the obtained graph and signal-to-noise ratio can also be set up by users. M2. Semiparametric Transformation: The function huge.npn() implements the nonparanormal method (Liu et al., 2009, 2012) for estimating a semiparametric Gaussian copula model.The nonparanormal family extends the Gaussian distribution by marginally transforming the variables. 
Computationally, the nonparanormal transformation only requires one pass through the data matrix. M3. Graph Screening: The scr argument in the main function huge() controls the use of largescale correlation screening before graph estimation. The function supports the lossless screening (Witten et al., 2011) and the lossy screening. Such screening procedures can greatly reduce the computational cost and achieve equal or even better estimation by reducing the variance at the expense of increased bias. Figure 1: The graph estimation pipeline. M4. Graph Estimation: Similar to the glasso package, the method argument in the huge() function supports two estimation methods: (i) the neighborhood pursuit algorithm (Meinshausen and B¨ hlmann, 2006) and (ii) the graphical lasso algorithm (Friedman et al., 2007). We apply u the coordinate descent with active set and covariance update, as well as other tricks suggested in Friedman et al. (2010). We modified the warm start trick to address the potential divergence problem of the graphical lasso algorithm (Mazumder and Hastie, 2011). The code is also memory-optimized using the sparse matrix data structure when estimating and storing full regularization paths for large 1. This paper is only a summary of the package huge. For more details please refer to the online vignette. 1060 H IGH - DIMENSIONAL U NDIRECTED G RAPH E STIMATION data sets. we also provide a complementary graph estimation method based on thresholding the sample correlation matrix, which is computationally efficient and widely applied in biomedical research. M5. Model Selection: The function huge.select() provides two regularization parameter selection methods: the stability approach for regularization selection (StARS) (Liu et al., 2010); and rotation information criterion (RIC). We also provide a likelihood-based extended Bayesian information criterion. M6. Graph Visualization: The plotting functions huge.plot() and plot() provide visualizations of the simulated data sets, estimated graphs and paths. The implementation is based on the igraph package. 3. User Interface by Example We illustrate the user interface by analyzing a stock market data which we contribute to the huge package. We acquired closing prices from all stocks in the S&P; 500 for all the days that the market was open between Jan 1, 2003 and Jan 1, 2008. This gave us 1258 samples for the 452 stocks that remained in the S&P; 500 during the entire time period. > > > > > library(huge) data(stockdata) # Load the data x = log(stockdata$data[2:1258,]/stockdata$data[1:1257,]) # Preprocessing x.npn = huge.npn(x, npn.func=

3 0.15140289 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion

Author: Animashree Anandkumar, Vincent Y.F. Tan, Furong Huang, Alan S. Willsky

Abstract: We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm, when the number of −2 samples n = Ω(Jmin log p), where p is the number of variables and Jmin is the minimum (absolute) edge potential of the graphical model. The sufficient conditions for sparsistency are based on the notion of walk-summability of the model and the presence of sparse local vertex separators in the underlying graph. We also derive novel non-asymptotic necessary conditions on the number of samples required for sparsistency. Keywords: Gaussian graphical model selection, high-dimensional learning, local-separation property, walk-summability, necessary conditions for model selection

4 0.07155101 39 jmlr-2012-Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications

Author: Jian Huang, Cun-Hui Zhang

Abstract: The ℓ1 -penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ1 -penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted ℓ1 -penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. Adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results. Keywords: variable selection, penalized estimation, oracle inequality, generalized linear models, selection consistency, sparsity

5 0.069872178 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression

Author: Christopher R. Genovese, Jiashun Jin, Larry Wasserman, Zhigang Yao

Abstract: The lasso is an important method for sparse, high-dimensional regression problems, with efficient algorithms available, a long history of practical success, and a large body of theoretical results supporting and explaining its performance. But even with the best available algorithms, finding the lasso solutions remains a computationally challenging task in cases where the number of covariates vastly exceeds the number of data points. Marginal regression, where each dependent variable is regressed separately on each covariate, offers a promising alternative in this case because the estimates can be computed roughly two orders faster than the lasso solutions. The question that remains is how the statistical performance of the method compares to that of the lasso in these cases. In this paper, we study the relative statistical performance of the lasso and marginal regression for sparse, high-dimensional regression problems. We consider the problem of learning which coefficients are non-zero. Our main results are as follows: (i) we compare the conditions under which the lasso and marginal regression guarantee exact recovery in the fixed design, noise free case; (ii) we establish conditions under which marginal regression provides exact recovery with high probability in the fixed design, noise free, random coefficients case; and (iii) we derive rates of convergence for both procedures, where performance is measured by the number of coefficients with incorrect sign, and characterize the regions in the parameter space recovery is and is not possible under this metric. In light of the computational advantages of marginal regression in very high dimensional problems, our theoretical and simulations results suggest that the procedure merits further study. Keywords: high-dimensional regression, lasso, phase diagram, regularization

6 0.067942642 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models

7 0.064605124 25 jmlr-2012-Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs

8 0.04651754 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development

9 0.044426151 56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables

10 0.043699265 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches

11 0.042746626 118 jmlr-2012-Variational Multinomial Logit Gaussian Process

12 0.039767534 17 jmlr-2012-An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity

13 0.039017271 3 jmlr-2012-A Geometric Approach to Sample Compression

14 0.038121618 28 jmlr-2012-Confidence-Weighted Linear Classification for Text Categorization

15 0.03740279 5 jmlr-2012-A Local Spectral Method for Graphs: With Applications to Improving Graph Partitions and Exploring Data Graphs Locally

16 0.036699347 7 jmlr-2012-A Multi-Stage Framework for Dantzig Selector and LASSO

17 0.036638953 47 jmlr-2012-GPLP: A Local and Parallel Computation Toolbox for Gaussian Process Regression

18 0.036196373 73 jmlr-2012-Multi-task Regression using Minimal Penalties

19 0.034534045 114 jmlr-2012-Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies

20 0.034153767 12 jmlr-2012-Active Clustering of Biological Sequences


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.178), (1, 0.148), (2, 0.041), (3, -0.146), (4, 0.055), (5, 0.354), (6, -0.077), (7, -0.084), (8, -0.008), (9, -0.068), (10, 0.091), (11, 0.041), (12, -0.2), (13, -0.296), (14, -0.224), (15, -0.175), (16, 0.046), (17, 0.008), (18, 0.112), (19, 0.099), (20, 0.052), (21, 0.01), (22, -0.037), (23, 0.13), (24, -0.008), (25, 0.062), (26, -0.023), (27, 0.032), (28, -0.008), (29, 0.012), (30, -0.003), (31, -0.011), (32, -0.028), (33, -0.085), (34, -0.04), (35, 0.072), (36, -0.019), (37, 0.023), (38, 0.058), (39, -0.048), (40, 0.003), (41, 0.025), (42, 0.007), (43, -0.065), (44, 0.004), (45, -0.02), (46, 0.035), (47, 0.089), (48, -0.029), (49, -0.041)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97860855 40 jmlr-2012-Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso

Author: Rahul Mazumder, Trevor Hastie

Abstract: We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples. Keywords: sparse inverse covariance selection, sparsity, graphical lasso, Gaussian graphical models, graph connected components, concentration graph, large scale covariance estimation

2 0.9416554 113 jmlr-2012-The huge Package for High-dimensional Undirected Graph Estimation in R

Author: Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, Larry Wasserman

Abstract: We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortan, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) more functions like data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) the package allows the user to apply both lossless and lossy screening rules to scale up large-scale problems, making a tradeoff between computational and statistical efficiency. Keywords: high-dimensional undirected graph estimation, glasso, huge, semiparametric graph estimation, data-dependent model selection, lossless screening, lossy screening 1. Overview Undirected graphs is a natural approach to describe the conditional independence among many variables. Each node of the graph represents a single variable and no edge between two variables implies that they are conditional independent given all other variables. In the past decade, significant progress has been made on designing efficient algorithms to learn undirected graphs from high-dimensional observational data sets. Most of these methods are based on either the penalized maximum-likelihood estimation (Friedman et al., 2007) or penalized regression methods (Meinshausen and B¨ hlmann, 2006). Existing packages include glasso, Covpath and CLIME. In particuu ∗. Also in the Department of Biostatistics. †. Also in the Department of Machine Learning. c 2012 Zhao, Liu, Roeder, Lafferty and Wasserman. Z HAO , L IU , ROEDER , L AFFERTY AND WASSERMAN lar, the glasso package has been widely adopted by statisticians and computer scientists due to its friendly user-inference and efficiency. In this paper1 we describe a newly developed R package named huge (High-dimensional Undirected Graph Estimation) coded in C. The package includes a wide range of functional modules and addresses some drawbacks of the graphical lasso algorithm. To gain more scalability, the package supports two modes of screening, lossless (Witten et al., 2011) and lossy screening. When using lossy screening, the user can select the desired screening level to scale up for high-dimensional problems, but this introduces some estimation bias. 2. Software Design and Implementation The package huge aims to provide a general framework for high-dimensional undirected graph estimation. The package includes Six functional modules (M1-M6) facilitate a flexible pipeline for analysis (Figure 1). M1. Data Generator: The function huge.generator() can simulate multivariate Gaussian data with different undirected graphs, including hub, cluster, band, scale-free, and Erd¨ s-R´ nyi o e random graphs. The sparsity level of the obtained graph and signal-to-noise ratio can also be set up by users. M2. Semiparametric Transformation: The function huge.npn() implements the nonparanormal method (Liu et al., 2009, 2012) for estimating a semiparametric Gaussian copula model.The nonparanormal family extends the Gaussian distribution by marginally transforming the variables. 
Computationally, the nonparanormal transformation only requires one pass through the data matrix. M3. Graph Screening: The scr argument in the main function huge() controls the use of largescale correlation screening before graph estimation. The function supports the lossless screening (Witten et al., 2011) and the lossy screening. Such screening procedures can greatly reduce the computational cost and achieve equal or even better estimation by reducing the variance at the expense of increased bias. Figure 1: The graph estimation pipeline. M4. Graph Estimation: Similar to the glasso package, the method argument in the huge() function supports two estimation methods: (i) the neighborhood pursuit algorithm (Meinshausen and B¨ hlmann, 2006) and (ii) the graphical lasso algorithm (Friedman et al., 2007). We apply u the coordinate descent with active set and covariance update, as well as other tricks suggested in Friedman et al. (2010). We modified the warm start trick to address the potential divergence problem of the graphical lasso algorithm (Mazumder and Hastie, 2011). The code is also memory-optimized using the sparse matrix data structure when estimating and storing full regularization paths for large 1. This paper is only a summary of the package huge. For more details please refer to the online vignette. 1060 H IGH - DIMENSIONAL U NDIRECTED G RAPH E STIMATION data sets. we also provide a complementary graph estimation method based on thresholding the sample correlation matrix, which is computationally efficient and widely applied in biomedical research. M5. Model Selection: The function huge.select() provides two regularization parameter selection methods: the stability approach for regularization selection (StARS) (Liu et al., 2010); and rotation information criterion (RIC). We also provide a likelihood-based extended Bayesian information criterion. M6. Graph Visualization: The plotting functions huge.plot() and plot() provide visualizations of the simulated data sets, estimated graphs and paths. The implementation is based on the igraph package. 3. User Interface by Example We illustrate the user interface by analyzing a stock market data which we contribute to the huge package. We acquired closing prices from all stocks in the S&P; 500 for all the days that the market was open between Jan 1, 2003 and Jan 1, 2008. This gave us 1258 samples for the 452 stocks that remained in the S&P; 500 during the entire time period. > > > > > library(huge) data(stockdata) # Load the data x = log(stockdata$data[2:1258,]/stockdata$data[1:1257,]) # Preprocessing x.npn = huge.npn(x, npn.func=

3 0.69292182 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion

Author: Animashree Anandkumar, Vincent Y.F. Tan, Furong Huang, Alan S. Willsky

Abstract: We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm, when the number of −2 samples n = Ω(Jmin log p), where p is the number of variables and Jmin is the minimum (absolute) edge potential of the graphical model. The sufficient conditions for sparsistency are based on the notion of walk-summability of the model and the presence of sparse local vertex separators in the underlying graph. We also derive novel non-asymptotic necessary conditions on the number of samples required for sparsistency. Keywords: Gaussian graphical model selection, high-dimensional learning, local-separation property, walk-summability, necessary conditions for model selection

4 0.44436124 25 jmlr-2012-Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs

Author: Alain Hauser, Peter Bühlmann

Abstract: The investigation of directed acyclic graphs (DAGs) encoding the same Markov property, that is the same conditional independence relations of multivariate observational distributions, has a long tradition; many algorithms exist for model selection and structure learning in Markov equivalence classes. In this paper, we extend the notion of Markov equivalence of DAGs to the case of interventional distributions arising from multiple intervention experiments. We show that under reasonable assumptions on the intervention experiments, interventional Markov equivalence defines a finer partitioning of DAGs than observational Markov equivalence and hence improves the identifiability of causal models. We give a graph theoretic criterion for two DAGs being Markov equivalent under interventions and show that each interventional Markov equivalence class can, analogously to the observational case, be uniquely represented by a chain graph called interventional essential graph (also known as CPDAG in the observational case). These are key insights for deriving a generalization of the Greedy Equivalence Search algorithm aimed at structure learning from interventional data. This new algorithm is evaluated in a simulation study. Keywords: causal inference, interventions, graphical model, Markov equivalence, greedy equivalence search

5 0.37679392 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models

Author: Neil D. Lawrence

Abstract: We introduce a new perspective on spectral dimensionality reduction which views these methods as Gaussian Markov random fields (GRFs). Our unifying perspective is based on the maximum entropy principle which is in turn inspired by maximum variance unfolding. The resulting model, which we call maximum entropy unfolding (MEU) is a nonlinear generalization of principal component analysis. We relate the model to Laplacian eigenmaps and isomap. We show that parameter fitting in the locally linear embedding (LLE) is approximate maximum likelihood MEU. We introduce a variant of LLE that performs maximum likelihood exactly: Acyclic LLE (ALLE). We show that MEU and ALLE are competitive with the leading spectral approaches on a robot navigation visualization and a human motion capture data set. Finally the maximum likelihood perspective allows us to introduce a new approach to dimensionality reduction based on L1 regularization of the Gaussian random field via the graphical lasso.

6 0.28303322 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development

7 0.27575204 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression

8 0.26385912 39 jmlr-2012-Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications

9 0.24711436 5 jmlr-2012-A Local Spectral Method for Graphs: With Applications to Improving Graph Partitions and Exploring Data Graphs Locally

10 0.24038814 114 jmlr-2012-Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies

11 0.20003192 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches

12 0.19651173 17 jmlr-2012-An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity

13 0.19377068 47 jmlr-2012-GPLP: A Local and Parallel Computation Toolbox for Gaussian Process Regression

14 0.18764067 38 jmlr-2012-Entropy Search for Information-Efficient Global Optimization

15 0.18332864 3 jmlr-2012-A Geometric Approach to Sample Compression

16 0.17661285 118 jmlr-2012-Variational Multinomial Logit Gaussian Process

17 0.16574065 78 jmlr-2012-Nonparametric Guidance of Autoencoder Representations using Label Information

18 0.16251981 10 jmlr-2012-A Unified View of Performance Metrics: Translating Threshold Choice into Expected Classification Loss

19 0.15804502 76 jmlr-2012-Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics

20 0.15284842 12 jmlr-2012-Active Clustering of Biological Sequences


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(21, 0.036), (26, 0.029), (29, 0.018), (56, 0.014), (57, 0.033), (75, 0.041), (79, 0.015), (92, 0.057), (96, 0.641)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9918294 40 jmlr-2012-Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso

Author: Rahul Mazumder, Trevor Hastie

Abstract: We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples. Keywords: sparse inverse covariance selection, sparsity, graphical lasso, Gaussian graphical models, graph connected components, concentration graph, large scale covariance estimation

2 0.99037862 32 jmlr-2012-Discriminative Hierarchical Part-based Models for Human Parsing and Action Recognition

Author: Yang Wang, Duan Tran, Zicheng Liao, David Forsyth

Abstract: We consider the problem of parsing human poses and recognizing their actions in static images with part-based models. Most previous work in part-based models only considers rigid parts (e.g., torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate. In this paper, we introduce hierarchical poselets—a new representation for modeling the pose configuration of human bodies. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g., torso + left arm). In the extreme case, they can be the whole bodies. The hierarchical poselets are organized in a hierarchical way via a structured model. Human parsing can be achieved by inferring the optimal labeling of this hierarchical model. The pose information captured by this hierarchical model can also be used as a intermediate representation for other high-level tasks. We demonstrate it in action recognition from static images. Keywords: human parsing, action recognition, part-based models, hierarchical poselets, maxmargin structured learning

3 0.97773319 28 jmlr-2012-Confidence-Weighted Linear Classification for Text Categorization

Author: Koby Crammer, Mark Dredze, Fernando Pereira

Abstract: Confidence-weighted online learning is a generalization of margin-based learning of linear classifiers in which the margin constraint is replaced by a probabilistic constraint based on a distribution over classifier weights that is updated online as examples are observed. The distribution captures a notion of confidence on classifier weights, and in some cases it can also be interpreted as replacing a single learning rate by adaptive per-weight rates. Confidence-weighted learning was motivated by the statistical properties of natural-language classification tasks, where most of the informative features are relatively rare. We investigate several versions of confidence-weighted learning that use a Gaussian distribution over weight vectors, updated at each observed example to achieve high probability of correct classification for the example. Empirical evaluation on a range of textcategorization tasks show that our algorithms improve over other state-of-the-art online and batch methods, learn faster in the online setting, and lead to better classifier combination for a type of distributed training commonly used in cloud computing. Keywords: online learning, confidence prediction, text categorization

4 0.96673036 52 jmlr-2012-Iterative Reweighted Algorithms for Matrix Rank Minimization

Author: Karthik Mohan, Maryam Fazel

Abstract: The problem of minimizing the rank of a matrix subject to affine constraints has applications in several areas including machine learning, and is known to be NP-hard. A tractable relaxation for this problem is nuclear norm (or trace norm) minimization, which is guaranteed to find the minimum rank matrix under suitable assumptions. In this paper, we propose a family of Iterative Reweighted Least Squares algorithms IRLS-p (with 0 ≤ p ≤ 1), as a computationally efficient way to improve over the performance of nuclear norm minimization. The algorithms can be viewed as (locally) minimizing certain smooth approximations to the rank function. When p = 1, we give theoretical guarantees similar to those for nuclear norm minimization, that is, recovery of low-rank matrices under certain assumptions on the operator defining the constraints. For p < 1, IRLSp shows better empirical performance in terms of recovering low-rank matrices than nuclear norm minimization. We provide an efficient implementation for IRLS-p, and also present a related family of algorithms, sIRLS-p. These algorithms exhibit competitive run times and improved recovery when compared to existing algorithms for random instances of the matrix completion problem, as well as on the MovieLens movie recommendation data set. Keywords: matrix rank minimization, matrix completion, iterative algorithms, null-space property

5 0.9279682 91 jmlr-2012-Plug-in Approach to Active Learning

Author: Stanislav Minsker

Abstract: We present a new active learning algorithm based on nonparametric estimators of the regression function. Our investigation provides probabilistic bounds for the rates of convergence of the generalization error achievable by proposed method over a broad class of underlying distributions. We also prove minimax lower bounds which show that the obtained rates are almost tight. Keywords: active learning, selective sampling, model selection, classification, confidence bands

6 0.78346479 106 jmlr-2012-Sign Language Recognition using Sub-Units

7 0.76727581 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms

8 0.76278013 49 jmlr-2012-Hope and Fear for Discriminative Training of Statistical Translation Models

9 0.75260204 45 jmlr-2012-Finding Recurrent Patterns from Continuous Sign Language Sentences for Automated Extraction of Signs

10 0.73909044 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression

11 0.73806781 37 jmlr-2012-Eliminating Spammers and Ranking Annotators for Crowdsourced Labeling Tasks

12 0.73751211 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis

13 0.73571628 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development

14 0.73501301 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization

15 0.73495305 23 jmlr-2012-Breaking the Curse of Kernelization: Budgeted Stochastic Gradient Descent for Large-Scale SVM Training

16 0.7314555 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models

17 0.72927225 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints

18 0.72854471 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices

19 0.72728676 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems

20 0.72409952 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices