jmlr jmlr2012 jmlr2012-40 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Rahul Mazumder, Trevor Hastie
Abstract: We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples. Keywords: sparse inverse covariance selection, sparsity, graphical lasso, Gaussian graphical models, graph connected components, concentration graph, large scale covariance estimation
Reference: text
sentIndex sentText sentNum sentScore
1 Department of Statistics, Stanford University, Stanford, CA 94305. Editor: Francis Bach. Abstract: We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. [sent-3, score-0.594]
2 Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. [sent-4, score-1.429]
3 We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. [sent-5, score-2.704]
4 This characterizes a very interesting property of a path of graphical lasso solutions. [sent-6, score-0.302]
5 For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. [sent-8, score-0.325]
6 Keywords: sparse inverse covariance selection, sparsity, graphical lasso, Gaussian graphical models, graph connected components, concentration graph, large scale covariance estimation 1. [sent-10, score-1.605]
7 Introduction. Consider a data matrix X_{n×p} comprising n sample realizations from a p-dimensional Gaussian distribution with zero mean and positive definite covariance matrix Σ_{p×p} (unknown), that is, i.i.d. samples. [sent-11, score-0.373]
8 ℓ1 regularized Sparse Inverse Covariance Selection also known as graphical lasso (Friedman et al. [sent-18, score-0.262]
9 , 2008; Yuan and Lin, 2007) estimates the covariance matrix Σ under the assumption that the inverse covariance matrix, that is, Σ^{-1}, is sparse. [sent-20, score-0.613]
10 This is achieved by minimizing the regularized negative log-likelihood function: minimize_{Θ ⪰ 0} − log det(Θ) + tr(SΘ) + λ ∑_{i,j} |Θ_{ij}| (1), where S is the sample covariance matrix. [sent-21, score-0.301]
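As an illustrative aside (ours, not part of the paper), problem (1) can be solved for a single λ with off-the-shelf software; the sketch below uses scikit-learn's graphical_lasso as a stand-in solver, so the names S, lam and the convention of penalizing only off-diagonal entries are assumptions of this example rather than the authors' Fortran-based glasso.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

# Sketch: solve the graphical lasso (1) for one value of lambda.
# Note: scikit-learn penalizes only the off-diagonal entries of Theta,
# whereas (1) as written sums over all (i, j); this is a convention difference.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))      # n = 200 samples, p = 30 variables
S = np.cov(X, rowvar=False)             # sample covariance matrix
lam = 0.2

cov_hat, prec_hat = graphical_lasso(S, alpha=lam)
print("nonzero off-diagonal entries of Theta:",
      int(np.sum(np.abs(prec_hat) > 1e-8)) - S.shape[0])
```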
11 We note that (1) can also be used in a more non-parametric fashion for any positive semi-definite input matrix S, not necessarily a sample covariance matrix of an MVN sample as described above. [sent-24, score-0.407]
12 This paper is about one such (surprising) property—namely establishing an equivalence between the vertex-partition induced by the connected components of the non-zero pattern of Θ(λ) and the thresholded sample covariance matrix S. [sent-35, score-1.413]
13 This strategy becomes extremely effective in solving large problems over a range of values of λ—sufficiently restricted to ensure sparsity and the separation into connected components. [sent-41, score-0.437]
14 A finite undirected graph G on p vertices is given by the ordered tuple G = (V , E ), where V is the set of nodes and E the collection of (undirected) edges. [sent-47, score-0.344]
15 We use the convention that a node is not connected to itself, so the diagonals of the adjacency matrix are all zeros. [sent-49, score-0.534]
16 We say two nodes u, v ∈ V are connected if there is a path between them. [sent-51, score-0.506]
17 A maximal connected subgraph is a connected component of the graph G . [sent-52, score-1.091]
18 Connectedness is an equivalence relation that decomposes a graph G into its connected components {(Vℓ, E ℓ)}_{1≤ℓ≤K}, with G = ∪_{ℓ=1}^{K} (Vℓ, E ℓ), where K denotes the number of connected components. [sent-53, score-1.255]
19 Throughout this paper we will often refer to this partition as the vertex-partition induced by the components of the graph G . [sent-59, score-0.561]
20 Suppose a graph G defined on the set of vertices V admits the following decomposition into connected components: G = ∪_{ℓ=1}^{K} (Vℓ, E ℓ). [sent-61, score-0.755]
21 The vertex-partitions induced by the connected components of G and G′ are equal if K = K′ and there is a permutation π on {1, . [sent-66, score-0.746]
22 Section 2 describes the covariance graph thresholding idea along with theoretical justification and related work, followed by complexity analysis of the algorithmic framework in Section 3. [sent-74, score-0.627]
23 Equation (2), which records the non-zero pattern of the solution Θ(λ) (that is, (i, j) ∈ E (λ) whenever Θij(λ) ≠ 0 and i ≠ j), defines a symmetric graph G (λ) = (V, E (λ)), namely the estimated concentration graph (Cox and Wermuth, 1996; Lauritzen, 1996) defined on the nodes V = {1, . [sent-78, score-0.644]
24 Suppose the graph G (λ) admits a decomposition into κ(λ) connected components: G (λ) = ∪_{ℓ=1}^{κ(λ)} Gℓ(λ), (3) where Gℓ(λ) = (Vℓ(λ), E ℓ(λ)) are the components of the graph G (λ). [sent-82, score-1.133]
25 Given λ, we perform a thresholding on the entries of the sample covariance matrix S and obtain a graph edge skeleton E(λ) ∈ {0, 1}^{p×p} defined by: Eij(λ) = 1 if |Sij| > λ and i ≠ j, and 0 otherwise. (4) [sent-88, score-0.719]
26 The symmetric matrix E(λ) defines a symmetric graph on the nodes V = {1, . . ., p}. [sent-89, score-0.373]
27 We refer to this as the thresholded sample covariance graph. [sent-93, score-0.673]
28 Similar to the decomposition in (3), the graph G(λ) also admits a decomposition into connected components: G(λ) = ∪_{ℓ=1}^{k(λ)} Gℓ(λ), (5) where Gℓ(λ) = (Vℓ(λ), Eℓ(λ)) are the components of the graph G(λ). [sent-94, score-1.157]
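The thresholding rule (4) and the decomposition (5) are cheap to compute; the sketch below (ours, with the function name threshold_components and the toy matrix chosen purely for illustration) builds the edge skeleton and extracts its connected components with scipy.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def threshold_components(S, lam):
    """Build the thresholded covariance graph (4) and return its components (5)."""
    E = np.abs(S) > lam                 # |S_ij| > lambda
    np.fill_diagonal(E, False)          # rule (4) excludes the diagonal (i != j)
    n_comp, labels = connected_components(csr_matrix(E), directed=False)
    return n_comp, labels               # labels[i] = component index of node i

# Toy covariance with two obvious blocks:
S = np.array([[1.0, 0.8, 0.0, 0.0],
              [0.8, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.6],
              [0.0, 0.0, 0.6, 1.0]])
print(threshold_components(S, lam=0.5))   # -> (2, array([0, 0, 1, 1]))
```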
29 Construction of G(λ) and its components requires operating on S—an operation that can be performed completely independently of the optimization problem (1), which is arguably more expensive (see Section 3). [sent-96, score-0.262]
30 The surprising message we describe in this paper is that the vertex-partition of the connected components of (5) is exactly equal to that of (3). [sent-97, score-0.642]
31 We obtain a very interesting property of the path of solutions {Θ(λ)}_{λ≥0}—the behavior of the connected components of the estimated concentration graph can be completely understood by simple screening rules on S. [sent-99, score-1.151]
32 The cost of computing the connected components of the thresholded sample covariance graph (5) is orders of magnitude smaller than the cost of fitting graphical models (1). [sent-101, score-1.67]
33 Furthermore, the computations pertaining to the covariance graph can be done off-line and are amenable to parallel computation (See Section 3). [sent-102, score-0.577]
34 Suppose that for λ0 , there are k(λ0 ) components and the graphical model computations are distributed. [sent-111, score-0.411]
35 Consider a distributed computing architecture, where every machine allows operating on a graphical lasso problem (1) of maximal size p_max. [sent-114, score-0.382]
36 Then with relatively small effort we can find the smallest value of λ = λ_{p_max}, such that there are no connected components of size larger than p_max. [sent-115, score-0.712]
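One simple way to locate such a λ_{p_max} (our own sketch; the paper only notes that the search is inexpensive) is to scan the distinct values |Sij|, since by (4) the thresholded graph can only change at those values; a binary search is an obvious refinement given the nesting property discussed below.

```python
import numpy as np

def smallest_lambda_for_budget(S, p_max):
    """Sketch: smallest threshold whose components all have size <= p_max.

    Candidate thresholds are 0 and the distinct |S_ij| (i != j), because the
    thresholded graph (4) only changes at those values.  Uses
    threshold_components from the earlier sketch.
    """
    p = S.shape[0]
    iu = np.triu_indices(p, k=1)
    candidates = np.concatenate(([0.0], np.unique(np.abs(S[iu]))))
    for lam in candidates:              # ascending scan; a binary search also works
        _, labels = threshold_components(S, lam)
        if np.bincount(labels).max() <= p_max:
            return lam
    return candidates[-1]               # all edges removed: every component is a singleton
```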
37 The following theorem establishes the main technical contribution of this paper—the equivalence of the vertex-partitions induced by the connected components of the thresholded sample covariance graph and the estimated concentration graph. [sent-118, score-1.736]
38 Theorem 1 For any λ > 0, the components of the estimated concentration graph G (λ) , as defined in (2) and (3) induce exactly the same vertex-partition as that of the thresholded sample covariance graph G(λ) , defined in (4) and (5). [sent-119, score-1.443]
39 Since the decomposition of a symmetric graph into its connected components depends upon the ordering/labeling of the components, the permutation π appears in Theorem 1. [sent-129, score-0.992]
40 Distributing these operations depends upon the number of processors available, their capacities, communication lag, the number of components and the maximal size of the blocks across all machines. [sent-133, score-0.315]
41 Theorem 1 leads to a special property of the path-of-solutions to (1), that is, the vertex-partitions induced by the connected components of G (λ) are nested with increasing λ. [sent-136, score-0.752]
42 Theorem 2 Consider two values of the regularization parameter such that λ > λ′ > 0, with corresponding concentration graphs G (λ) and G (λ′) as in (2) and connected components (3). [sent-138, score-0.685]
43 Then the vertex-partition induced by the components of G (λ) is nested within the partition induced by the components of G (λ′). [sent-139, score-0.676]
44 Remark 2 It is worth noting that Theorem 2 addresses the nesting of the edges across connected components and not within a component. [sent-143, score-0.651]
45 In general, the edge-set E (λ) of the estimated concentration graph need not be nested as a function of λ: for λ > λ′, in general, E (λ) ⊄ E (λ′). [sent-144, score-0.381]
46 Furthermore, the ith diagonal entries of the estimated covariance and precision matrices are given by (Sii + λ) and 1/(Sii + λ), respectively. [sent-150, score-0.411]
47 If λ > max_{i=1,...,p} max_{j≠i} |Sij|, the estimated covariance and precision matrices obtained from (1) are both diagonal. [sent-154, score-0.338]
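Restating the last two remarks in one display (our hedged reconstruction; the hat notation is ours):

```latex
% If \lambda exceeds every off-diagonal |S_{ij}|, all nodes are isolated and (1)
% admits the diagonal closed-form solution described in the surrounding text:
\[
\widehat{\Sigma}^{(\lambda)} = \operatorname{diag}\big(S_{11}+\lambda,\,\dots,\,S_{pp}+\lambda\big),
\qquad
\widehat{\Theta}^{(\lambda)} = \operatorname{diag}\Big(\tfrac{1}{S_{11}+\lambda},\,\dots,\,\tfrac{1}{S_{pp}+\lambda}\Big).
\]
```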
48 At the time of our writing, an earlier version of their paper was available (Witten and Friedman, 2011); it proposed a scheme to detect isolated nodes for problem (1) via a simple screening of the entries of S, but no block screening. [sent-158, score-0.372]
49 A hard thresholding is performed on the ℓ1-penalized regression coefficient estimates at every node to obtain the graph structure. [sent-166, score-0.362]
50 Computational Complexity. The overall complexity of our proposal depends upon (a) the graph partition stage and (b) solving (sub)problems of the form (1). [sent-170, score-0.357]
51 The cost of computing the connected components of the thresholded covariance graph is fairly negligible when compared to solving a similar sized graphical lasso problem (1)—see also our simulation studies in Section 4. [sent-172, score-1.775]
52 For the n samples, the cost for creating the sample covariance matrix S is O(n · p²). [sent-176, score-0.337]
53 Obtaining the connected components of the thresholded covariance graph costs O(|E(λ)| + p) (Tarjan, 1972). [sent-178, score-1.485]
54 Since we are interested in a region where the thresholded covariance graph is sparse enough to be broken into smaller connected components, we have |E(λ)| ≪ p². [sent-179, score-1.298]
55 Note that all computations pertaining to the construction of the connected components and the task of computing S can be performed off-line. [sent-180, score-0.7]
56 Gazit (1991, for example) describes parallel algorithms for computing connected components of a graph—they have a time complexity O(log(p)) and require O((|E(λ)| + p)/log(p)) processors with space O(p + |E(λ)|). [sent-182, score-0.642]
57 It is quite clear that the role played by covariance thresholding is indeed crucial in this context. [sent-194, score-0.375]
58 Suppose for a given λ, the thresholded sample covariance graph has k(λ) components—the total cost of solving these smaller problems is then ∑_{i=1}^{k(λ)} O(|Vi(λ)|^J) ≪ O(p^J), with J ∈ {3, 4}. [sent-196, score-0.929]
59 This is what makes large scale graphical lasso problems solvable! [sent-198, score-0.262]
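Putting the pieces together, the block-wise strategy can be sketched as the wrapper below (again ours, reusing threshold_components from the earlier sketch, with scikit-learn's solver standing in for the authors' glasso; because of the diagonal-penalty convention noted earlier, the sketch illustrates the structure of the computation rather than reproducing the exact estimates).

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def blockwise_graphical_lasso(S, lam):
    """Sketch of the exact-thresholding wrapper around the graphical lasso.

    By Theorem 1, the estimated precision matrix is (up to relabeling of nodes)
    block diagonal with respect to the components of the thresholded covariance
    graph, so each block can be solved as a separate, much smaller problem.
    """
    p = S.shape[0]
    n_comp, labels = threshold_components(S, lam)      # from the earlier sketch
    Theta = np.zeros((p, p))
    for c in range(n_comp):
        idx = np.flatnonzero(labels == c)
        if idx.size == 1:
            # Isolated node: closed form under the diagonal-penalized formulation (1).
            Theta[idx[0], idx[0]] = 1.0 / (S[idx[0], idx[0]] + lam)
        else:
            # Solve the smaller graphical lasso subproblem on this block.
            _, prec_block = graphical_lasso(S[np.ix_(idx, idx)], alpha=lam)
            Theta[np.ix_(idx, idx)] = prec_block
    return Theta
```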
60 For obtaining the connected components of a symmetric adjacency matrix we used the MATLAB function graphconncomp. [sent-219, score-0.716]
61 The sample covariance matrix is S = S + σ ·UU ′ . [sent-235, score-0.337]
62 In all the examples shown in Table 1, we set λ_I := (λ_max + λ_min)/2, where for all values of λ in the interval [λ_min, λ_max] the thresholded version of the sample covariance matrix has exactly K connected components. [sent-238, score-0.746]
63 We also took a larger value of λ, that is, λ_II := λ_max, which gave sparser estimates of the precision matrix, but the number of connected components was the same. [sent-239, score-0.715]
64 The computations for the different connected blocks could be distributed across as many machines. [sent-240, score-0.521]
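Since the subproblems are independent, they can also be farmed out to separate workers; a minimal sketch (ours, reusing the helpers above, and needing an `if __name__ == "__main__":` guard on some platforms) is:

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np
from sklearn.covariance import graphical_lasso

def _solve_block(args):
    # Helper: solve one graphical lasso subproblem (block of S, lambda).
    S_block, lam = args
    _, prec_block = graphical_lasso(S_block, alpha=lam)
    return prec_block

def solve_blocks_in_parallel(S, lam, max_workers=4):
    """Sketch: distribute the per-block subproblems across worker processes."""
    n_comp, labels = threshold_components(S, lam)       # from the earlier sketch
    blocks = [np.flatnonzero(labels == c) for c in range(n_comp)]
    jobs = [(S[np.ix_(idx, idx)], lam) for idx in blocks if idx.size > 1]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(_solve_block, jobs))        # one precision block per job
```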
65 Table 1 shows the rather remarkable improvements obtained by using our proposed covariance thresholding strategy as compared to operating on the whole matrix. [sent-243, score-0.403]
66 The time taken by the graph-partitioning step in splitting the thresholded covariance graph into its connected components is negligible as compared to the timings for the optimization problem. [sent-247, score-1.554]
67 The column ‘graph partition’ lists the time for computing the connected components of the thresholded sample covariance graph. [sent-303, score-1.291]
68 2 Micro-array Data Examples The graphical lasso is often used in learning connectivity networks in gene-microarray data (Friedman et al. [sent-308, score-0.262]
69 Since in most real examples the number of genes p is around tens of thousands, obtaining an inverse covariance matrix by solving (1) is computationally impractical. [sent-310, score-0.374]
70 The covariance thresholding method we propose easily applies to these problems—and as we see, gracefully delivers solutions over a large range of the parameter λ. [sent-311, score-0.375]
71 We study three different micro-array examples and observe that as one varies λ from large to small values, the thresholded covariance graph splits into a number of non-trivial connected components of varying sizes. [sent-312, score-1.509]
72 We continue until a small/moderate value of λ at which the maximal size of a connected component exceeds a predefined machine capacity or the ‘computational budget’ for a single graphical lasso problem. [sent-313, score-0.741]
73 Note that in relevant micro-array applications, since p ≫ n (n, the number of samples is at most a few hundred) heavy regularization is required to control the variance of the covariance estimates—so it does seem reasonable to restrict to solutions of (1) for large values of λ. [sent-314, score-0.267]
74 The exact thresholding methodology could have also been applied to the sample covariance matrix. [sent-328, score-0.409]
75 Figure 1 shows how the component sizes of the thresholded covariance graph change across λ. [sent-330, score-0.925]
76 Note that the connected components change only at the absolute values of the entries of S. [sent-332, score-0.664]
77 From the sorted absolute values of the off-diagonal entries of S, we obtained the smallest value of λ, say λ′_min, for which the size of the maximal connected component was 1500. [sent-333, score-0.5]
78 For a grid of values of λ till λ′_min, we computed the connected components of the thresholded sample-covariance matrix and obtained the size-distribution of the various connected components. [sent-334, score-1.46]
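The size distribution plotted in Figure 1 can be tabulated with a few lines (our sketch; component_size_profile and the grid of λ values are illustrative names):

```python
import numpy as np

def component_size_profile(S, lambdas):
    """Sketch: for each lambda, count how many components of each size appear
    in the thresholded covariance graph (the information shown in Figure 1)."""
    profile = {}
    for lam in lambdas:
        _, labels = threshold_components(S, lam)   # from the earlier sketch
        sizes = np.bincount(labels)
        uniq, counts = np.unique(sizes, return_counts=True)
        profile[float(lam)] = dict(zip(uniq.tolist(), counts.tolist()))
    return profile
```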
79 The number of connected components of a particular size is denoted by a color-scheme, described by the color-bar in the figures. [sent-336, score-0.618]
80 With increasing λ: the larger connected components gradually disappear as they decompose into smaller components; the sizes of the connected components decrease and the frequency of the smaller components increases. [sent-337, score-1.47]
81 Since these are all correlation matrices, for λ ≥ 1 all the nodes in the graph become isolated. [sent-338, score-0.285]
82 The range of λ values for which the maximal size of the components is smaller than 1500 differs across the three examples. [sent-339, score-0.287]
83 Note that by Theorem 1, the pattern of the components appearing in Figure 1 is exactly the same as that of the components appearing in the solution of (1) for that λ. [sent-341, score-0.468]
84 Figure 1: Figure showing the size distribution (in the log-scale; horizontal axis: log10(Size of Components)) of connected components arising from the thresholded sample covariance graph for examples (A)-(C). [sent-379, score-1.519]
85 For every value of λ (vertical axis), the horizontal slice denotes the sizes of the different components appearing in the thresholded covariance graph. [sent-380, score-0.898]
86 The colors represent the number of components in the graph having that specific size. [sent-381, score-0.437]
87 For every figure, the range of λ values is chosen such that the maximal size of the connected components does not exceed 1500. [sent-382, score-0.663]
88 The property is fairly surprising—the vertex-partitions induced by the connected components of the non-zero pattern of the estimated concentration matrix (at λ) and of the thresholded sample covariance matrix S (at λ) are exactly equal. [sent-386, score-1.592]
89 Our observation not only provides interesting insights into the properties of the graphical lasso solution-path but also opens the door to solving large-scale graphical lasso problems, which are otherwise intractable. [sent-388, score-0.552]
90 Under this ordering of the vertices, the edge-matrix of the thresholded covariance graph is block diagonal, with diagonal blocks Eℓ(λ), ℓ = 1, . . ., k(λ), and zeros in all off-diagonal blocks. [sent-404, score-0.867]
91 Note that by construction of the thresholded sample covariance graph, if i ∈ Vℓ(λ) and j ∈ Vℓ′(λ) with ℓ ≠ ℓ′, then |Sij| ≤ λ. [sent-453, score-0.673]
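For readers following the proof, the first-order (KKT) condition of (1) that this step relies on can be restated as below; this is a standard subgradient condition in our own notation, not a verbatim excerpt from the paper.

```latex
% Stationarity condition for the graphical lasso problem (1):
\[
-\,\Theta^{-1} + S + \lambda\,\Gamma = 0,
\qquad
\Gamma_{ij} \in
\begin{cases}
\{\operatorname{sign}(\Theta_{ij})\} & \text{if } \Theta_{ij} \neq 0,\\
[-1,\,1] & \text{if } \Theta_{ij} = 0.
\end{cases}
\]
% For a block-diagonal \Theta, (\Theta^{-1})_{ij} = 0 across blocks, so the cross-block
% equations reduce to S_{ij} + \lambda\,\Gamma_{ij} = 0, i.e., \Gamma_{ij} = -S_{ij}/\lambda,
% which is a valid subgradient precisely because |S_{ij}| \le \lambda across blocks.
```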
92 The above argument shows that the connected components obtained from the estimated precision graph G (λ) lead to a partition of the vertices {Vℓ(λ)}_{1≤ℓ≤κ(λ)} such that for every ℓ ∈ {1, . [sent-457, score-0.981]
93 This proves that the connected components of G(λ) lead to a partition of the vertices which is finer than the vertex-partition induced by the components of G (λ). [sent-467, score-0.951]
94 The permutation π in the theorem appears since the labeling of the connected components is not unique. [sent-470, score-0.74]
95 Proof of Theorem 2: This proof is a direct consequence of Theorem 1, which establishes that the vertex-partitions induced by the connected components of the estimated precision graph and the thresholded sample covariance graph are equal. [sent-472, score-1.904]
96 Observe that, by construction, the connected components of the thresholded sample covariance graph, that is, G(λ), are nested within the connected components of G(λ′). [sent-473, score-1.957]
97 In particular, the vertex-partition induced by the components of the thresholded sample covariance graph at λ is contained inside the vertex-partition induced by the components of the thresholded sample covariance graph at λ′. [sent-474, score-2.392]
98 An optimal randomized parallel algorithm for finding connected components in a graph. [sent-530, score-0.618]
99 Adaptive first-order methods for general sparse inverse covariance selection. [sent-551, score-0.332]
100 High-dimensional covariance estimation based on Gaussian graphical models. [sent-656, score-0.418]
wordName wordTfidf (topN-words)
[('connected', 0.409), ('thresholded', 0.372), ('smacs', 0.342), ('covariance', 0.267), ('glasso', 0.257), ('graph', 0.228), ('components', 0.209), ('screening', 0.16), ('graphical', 0.151), ('astie', 0.141), ('azumder', 0.141), ('hresholding', 0.121), ('lasso', 0.111), ('thresholding', 0.108), ('raphical', 0.103), ('asso', 0.093), ('xact', 0.093), ('induced', 0.086), ('witten', 0.071), ('block', 0.07), ('timings', 0.069), ('concentration', 0.067), ('banerjee', 0.067), ('friedman', 0.061), ('si', 0.06), ('vertices', 0.059), ('nodes', 0.057), ('stanford', 0.052), ('computations', 0.051), ('nested', 0.048), ('wrapper', 0.048), ('tissue', 0.047), ('enormous', 0.047), ('pmax', 0.047), ('entries', 0.046), ('maximal', 0.045), ('inverse', 0.043), ('permutation', 0.042), ('kkt', 0.042), ('wi', 0.042), ('path', 0.04), ('dated', 0.04), ('mvn', 0.04), ('serially', 0.04), ('uu', 0.04), ('isolated', 0.039), ('proposal', 0.039), ('partition', 0.038), ('estimated', 0.038), ('matrix', 0.036), ('adjacency', 0.036), ('admits', 0.035), ('screen', 0.034), ('rahul', 0.034), ('sii', 0.034), ('sample', 0.034), ('across', 0.033), ('precision', 0.033), ('oligonucleotide', 0.031), ('rothman', 0.031), ('pertaining', 0.031), ('issn', 0.03), ('lu', 0.029), ('appears', 0.029), ('fortran', 0.029), ('colon', 0.029), ('mazumder', 0.029), ('took', 0.028), ('operating', 0.028), ('blocks', 0.028), ('solving', 0.028), ('boyd', 0.027), ('diagonal', 0.027), ('scheinberg', 0.027), ('diagonals', 0.027), ('pj', 0.027), ('signature', 0.027), ('url', 0.026), ('doi', 0.026), ('node', 0.026), ('symmetric', 0.026), ('theorem', 0.026), ('arguably', 0.025), ('breast', 0.025), ('till', 0.025), ('sizes', 0.025), ('labeling', 0.025), ('appearing', 0.025), ('ii', 0.024), ('complexity', 0.024), ('cox', 0.024), ('ner', 0.024), ('splits', 0.024), ('surprising', 0.024), ('decomposition', 0.024), ('operate', 0.023), ('speedup', 0.023), ('matlab', 0.023), ('yuan', 0.023), ('sparse', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999928 40 jmlr-2012-Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso
Author: Rahul Mazumder, Trevor Hastie
Abstract: We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples. Keywords: sparse inverse covariance selection, sparsity, graphical lasso, Gaussian graphical models, graph connected components, concentration graph, large scale covariance estimation
2 0.24144302 113 jmlr-2012-The huge Package for High-dimensional Undirected Graph Estimation in R
Author: Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, Larry Wasserman
Abstract: We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortan, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) more functions like data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) the package allows the user to apply both lossless and lossy screening rules to scale up large-scale problems, making a tradeoff between computational and statistical efficiency. Keywords: high-dimensional undirected graph estimation, glasso, huge, semiparametric graph estimation, data-dependent model selection, lossless screening, lossy screening 1. Overview Undirected graphs is a natural approach to describe the conditional independence among many variables. Each node of the graph represents a single variable and no edge between two variables implies that they are conditional independent given all other variables. In the past decade, significant progress has been made on designing efficient algorithms to learn undirected graphs from high-dimensional observational data sets. Most of these methods are based on either the penalized maximum-likelihood estimation (Friedman et al., 2007) or penalized regression methods (Meinshausen and B¨ hlmann, 2006). Existing packages include glasso, Covpath and CLIME. In particuu ∗. Also in the Department of Biostatistics. †. Also in the Department of Machine Learning. c 2012 Zhao, Liu, Roeder, Lafferty and Wasserman. Z HAO , L IU , ROEDER , L AFFERTY AND WASSERMAN lar, the glasso package has been widely adopted by statisticians and computer scientists due to its friendly user-inference and efficiency. In this paper1 we describe a newly developed R package named huge (High-dimensional Undirected Graph Estimation) coded in C. The package includes a wide range of functional modules and addresses some drawbacks of the graphical lasso algorithm. To gain more scalability, the package supports two modes of screening, lossless (Witten et al., 2011) and lossy screening. When using lossy screening, the user can select the desired screening level to scale up for high-dimensional problems, but this introduces some estimation bias. 2. Software Design and Implementation The package huge aims to provide a general framework for high-dimensional undirected graph estimation. The package includes Six functional modules (M1-M6) facilitate a flexible pipeline for analysis (Figure 1). M1. Data Generator: The function huge.generator() can simulate multivariate Gaussian data with different undirected graphs, including hub, cluster, band, scale-free, and Erd¨ s-R´ nyi o e random graphs. The sparsity level of the obtained graph and signal-to-noise ratio can also be set up by users. M2. Semiparametric Transformation: The function huge.npn() implements the nonparanormal method (Liu et al., 2009, 2012) for estimating a semiparametric Gaussian copula model.The nonparanormal family extends the Gaussian distribution by marginally transforming the variables. 
Computationally, the nonparanormal transformation only requires one pass through the data matrix. M3. Graph Screening: The scr argument in the main function huge() controls the use of largescale correlation screening before graph estimation. The function supports the lossless screening (Witten et al., 2011) and the lossy screening. Such screening procedures can greatly reduce the computational cost and achieve equal or even better estimation by reducing the variance at the expense of increased bias. Figure 1: The graph estimation pipeline. M4. Graph Estimation: Similar to the glasso package, the method argument in the huge() function supports two estimation methods: (i) the neighborhood pursuit algorithm (Meinshausen and B¨ hlmann, 2006) and (ii) the graphical lasso algorithm (Friedman et al., 2007). We apply u the coordinate descent with active set and covariance update, as well as other tricks suggested in Friedman et al. (2010). We modified the warm start trick to address the potential divergence problem of the graphical lasso algorithm (Mazumder and Hastie, 2011). The code is also memory-optimized using the sparse matrix data structure when estimating and storing full regularization paths for large 1. This paper is only a summary of the package huge. For more details please refer to the online vignette. 1060 H IGH - DIMENSIONAL U NDIRECTED G RAPH E STIMATION data sets. we also provide a complementary graph estimation method based on thresholding the sample correlation matrix, which is computationally efficient and widely applied in biomedical research. M5. Model Selection: The function huge.select() provides two regularization parameter selection methods: the stability approach for regularization selection (StARS) (Liu et al., 2010); and rotation information criterion (RIC). We also provide a likelihood-based extended Bayesian information criterion. M6. Graph Visualization: The plotting functions huge.plot() and plot() provide visualizations of the simulated data sets, estimated graphs and paths. The implementation is based on the igraph package. 3. User Interface by Example We illustrate the user interface by analyzing a stock market data which we contribute to the huge package. We acquired closing prices from all stocks in the S&P; 500 for all the days that the market was open between Jan 1, 2003 and Jan 1, 2008. This gave us 1258 samples for the 452 stocks that remained in the S&P; 500 during the entire time period. > > > > > library(huge) data(stockdata) # Load the data x = log(stockdata$data[2:1258,]/stockdata$data[1:1257,]) # Preprocessing x.npn = huge.npn(x, npn.func=
3 0.15140289 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion
Author: Animashree Anandkumar, Vincent Y.F. Tan, Furong Huang, Alan S. Willsky
Abstract: We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm, when the number of −2 samples n = Ω(Jmin log p), where p is the number of variables and Jmin is the minimum (absolute) edge potential of the graphical model. The sufficient conditions for sparsistency are based on the notion of walk-summability of the model and the presence of sparse local vertex separators in the underlying graph. We also derive novel non-asymptotic necessary conditions on the number of samples required for sparsistency. Keywords: Gaussian graphical model selection, high-dimensional learning, local-separation property, walk-summability, necessary conditions for model selection
Author: Jian Huang, Cun-Hui Zhang
Abstract: The ℓ1 -penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ1 -penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted ℓ1 -penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. Adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results. Keywords: variable selection, penalized estimation, oracle inequality, generalized linear models, selection consistency, sparsity
5 0.069872178 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression
Author: Christopher R. Genovese, Jiashun Jin, Larry Wasserman, Zhigang Yao
Abstract: The lasso is an important method for sparse, high-dimensional regression problems, with efficient algorithms available, a long history of practical success, and a large body of theoretical results supporting and explaining its performance. But even with the best available algorithms, finding the lasso solutions remains a computationally challenging task in cases where the number of covariates vastly exceeds the number of data points. Marginal regression, where each dependent variable is regressed separately on each covariate, offers a promising alternative in this case because the estimates can be computed roughly two orders faster than the lasso solutions. The question that remains is how the statistical performance of the method compares to that of the lasso in these cases. In this paper, we study the relative statistical performance of the lasso and marginal regression for sparse, high-dimensional regression problems. We consider the problem of learning which coefficients are non-zero. Our main results are as follows: (i) we compare the conditions under which the lasso and marginal regression guarantee exact recovery in the fixed design, noise free case; (ii) we establish conditions under which marginal regression provides exact recovery with high probability in the fixed design, noise free, random coefficients case; and (iii) we derive rates of convergence for both procedures, where performance is measured by the number of coefficients with incorrect sign, and characterize the regions in the parameter space recovery is and is not possible under this metric. In light of the computational advantages of marginal regression in very high dimensional problems, our theoretical and simulations results suggest that the procedure merits further study. Keywords: high-dimensional regression, lasso, phase diagram, regularization
6 0.067942642 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
7 0.064605124 25 jmlr-2012-Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs
8 0.04651754 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
9 0.044426151 56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables
10 0.043699265 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
11 0.042746626 118 jmlr-2012-Variational Multinomial Logit Gaussian Process
12 0.039767534 17 jmlr-2012-An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity
13 0.039017271 3 jmlr-2012-A Geometric Approach to Sample Compression
14 0.038121618 28 jmlr-2012-Confidence-Weighted Linear Classification for Text Categorization
16 0.036699347 7 jmlr-2012-A Multi-Stage Framework for Dantzig Selector and LASSO
17 0.036638953 47 jmlr-2012-GPLP: A Local and Parallel Computation Toolbox for Gaussian Process Regression
18 0.036196373 73 jmlr-2012-Multi-task Regression using Minimal Penalties
19 0.034534045 114 jmlr-2012-Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies
20 0.034153767 12 jmlr-2012-Active Clustering of Biological Sequences
topicId topicWeight
[(0, -0.178), (1, 0.148), (2, 0.041), (3, -0.146), (4, 0.055), (5, 0.354), (6, -0.077), (7, -0.084), (8, -0.008), (9, -0.068), (10, 0.091), (11, 0.041), (12, -0.2), (13, -0.296), (14, -0.224), (15, -0.175), (16, 0.046), (17, 0.008), (18, 0.112), (19, 0.099), (20, 0.052), (21, 0.01), (22, -0.037), (23, 0.13), (24, -0.008), (25, 0.062), (26, -0.023), (27, 0.032), (28, -0.008), (29, 0.012), (30, -0.003), (31, -0.011), (32, -0.028), (33, -0.085), (34, -0.04), (35, 0.072), (36, -0.019), (37, 0.023), (38, 0.058), (39, -0.048), (40, 0.003), (41, 0.025), (42, 0.007), (43, -0.065), (44, 0.004), (45, -0.02), (46, 0.035), (47, 0.089), (48, -0.029), (49, -0.041)]
simIndex simValue paperId paperTitle
same-paper 1 0.97860855 40 jmlr-2012-Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso
Author: Rahul Mazumder, Trevor Hastie
Abstract: We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples. Keywords: sparse inverse covariance selection, sparsity, graphical lasso, Gaussian graphical models, graph connected components, concentration graph, large scale covariance estimation
2 0.9416554 113 jmlr-2012-The huge Package for High-dimensional Undirected Graph Estimation in R
Author: Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, Larry Wasserman
Abstract: We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortan, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) more functions like data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) the package allows the user to apply both lossless and lossy screening rules to scale up large-scale problems, making a tradeoff between computational and statistical efficiency. Keywords: high-dimensional undirected graph estimation, glasso, huge, semiparametric graph estimation, data-dependent model selection, lossless screening, lossy screening 1. Overview Undirected graphs is a natural approach to describe the conditional independence among many variables. Each node of the graph represents a single variable and no edge between two variables implies that they are conditional independent given all other variables. In the past decade, significant progress has been made on designing efficient algorithms to learn undirected graphs from high-dimensional observational data sets. Most of these methods are based on either the penalized maximum-likelihood estimation (Friedman et al., 2007) or penalized regression methods (Meinshausen and B¨ hlmann, 2006). Existing packages include glasso, Covpath and CLIME. In particuu ∗. Also in the Department of Biostatistics. †. Also in the Department of Machine Learning. c 2012 Zhao, Liu, Roeder, Lafferty and Wasserman. Z HAO , L IU , ROEDER , L AFFERTY AND WASSERMAN lar, the glasso package has been widely adopted by statisticians and computer scientists due to its friendly user-inference and efficiency. In this paper1 we describe a newly developed R package named huge (High-dimensional Undirected Graph Estimation) coded in C. The package includes a wide range of functional modules and addresses some drawbacks of the graphical lasso algorithm. To gain more scalability, the package supports two modes of screening, lossless (Witten et al., 2011) and lossy screening. When using lossy screening, the user can select the desired screening level to scale up for high-dimensional problems, but this introduces some estimation bias. 2. Software Design and Implementation The package huge aims to provide a general framework for high-dimensional undirected graph estimation. The package includes Six functional modules (M1-M6) facilitate a flexible pipeline for analysis (Figure 1). M1. Data Generator: The function huge.generator() can simulate multivariate Gaussian data with different undirected graphs, including hub, cluster, band, scale-free, and Erd¨ s-R´ nyi o e random graphs. The sparsity level of the obtained graph and signal-to-noise ratio can also be set up by users. M2. Semiparametric Transformation: The function huge.npn() implements the nonparanormal method (Liu et al., 2009, 2012) for estimating a semiparametric Gaussian copula model.The nonparanormal family extends the Gaussian distribution by marginally transforming the variables. 
Computationally, the nonparanormal transformation only requires one pass through the data matrix. M3. Graph Screening: The scr argument in the main function huge() controls the use of largescale correlation screening before graph estimation. The function supports the lossless screening (Witten et al., 2011) and the lossy screening. Such screening procedures can greatly reduce the computational cost and achieve equal or even better estimation by reducing the variance at the expense of increased bias. Figure 1: The graph estimation pipeline. M4. Graph Estimation: Similar to the glasso package, the method argument in the huge() function supports two estimation methods: (i) the neighborhood pursuit algorithm (Meinshausen and B¨ hlmann, 2006) and (ii) the graphical lasso algorithm (Friedman et al., 2007). We apply u the coordinate descent with active set and covariance update, as well as other tricks suggested in Friedman et al. (2010). We modified the warm start trick to address the potential divergence problem of the graphical lasso algorithm (Mazumder and Hastie, 2011). The code is also memory-optimized using the sparse matrix data structure when estimating and storing full regularization paths for large 1. This paper is only a summary of the package huge. For more details please refer to the online vignette. 1060 H IGH - DIMENSIONAL U NDIRECTED G RAPH E STIMATION data sets. we also provide a complementary graph estimation method based on thresholding the sample correlation matrix, which is computationally efficient and widely applied in biomedical research. M5. Model Selection: The function huge.select() provides two regularization parameter selection methods: the stability approach for regularization selection (StARS) (Liu et al., 2010); and rotation information criterion (RIC). We also provide a likelihood-based extended Bayesian information criterion. M6. Graph Visualization: The plotting functions huge.plot() and plot() provide visualizations of the simulated data sets, estimated graphs and paths. The implementation is based on the igraph package. 3. User Interface by Example We illustrate the user interface by analyzing a stock market data which we contribute to the huge package. We acquired closing prices from all stocks in the S&P; 500 for all the days that the market was open between Jan 1, 2003 and Jan 1, 2008. This gave us 1258 samples for the 452 stocks that remained in the S&P; 500 during the entire time period. > > > > > library(huge) data(stockdata) # Load the data x = log(stockdata$data[2:1258,]/stockdata$data[1:1257,]) # Preprocessing x.npn = huge.npn(x, npn.func=
3 0.69292182 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion
Author: Animashree Anandkumar, Vincent Y.F. Tan, Furong Huang, Alan S. Willsky
Abstract: We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm, when the number of −2 samples n = Ω(Jmin log p), where p is the number of variables and Jmin is the minimum (absolute) edge potential of the graphical model. The sufficient conditions for sparsistency are based on the notion of walk-summability of the model and the presence of sparse local vertex separators in the underlying graph. We also derive novel non-asymptotic necessary conditions on the number of samples required for sparsistency. Keywords: Gaussian graphical model selection, high-dimensional learning, local-separation property, walk-summability, necessary conditions for model selection
Author: Alain Hauser, Peter Bühlmann
Abstract: The investigation of directed acyclic graphs (DAGs) encoding the same Markov property, that is the same conditional independence relations of multivariate observational distributions, has a long tradition; many algorithms exist for model selection and structure learning in Markov equivalence classes. In this paper, we extend the notion of Markov equivalence of DAGs to the case of interventional distributions arising from multiple intervention experiments. We show that under reasonable assumptions on the intervention experiments, interventional Markov equivalence defines a finer partitioning of DAGs than observational Markov equivalence and hence improves the identifiability of causal models. We give a graph theoretic criterion for two DAGs being Markov equivalent under interventions and show that each interventional Markov equivalence class can, analogously to the observational case, be uniquely represented by a chain graph called interventional essential graph (also known as CPDAG in the observational case). These are key insights for deriving a generalization of the Greedy Equivalence Search algorithm aimed at structure learning from interventional data. This new algorithm is evaluated in a simulation study. Keywords: causal inference, interventions, graphical model, Markov equivalence, greedy equivalence search
5 0.37679392 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
Author: Neil D. Lawrence
Abstract: We introduce a new perspective on spectral dimensionality reduction which views these methods as Gaussian Markov random fields (GRFs). Our unifying perspective is based on the maximum entropy principle which is in turn inspired by maximum variance unfolding. The resulting model, which we call maximum entropy unfolding (MEU) is a nonlinear generalization of principal component analysis. We relate the model to Laplacian eigenmaps and isomap. We show that parameter fitting in the locally linear embedding (LLE) is approximate maximum likelihood MEU. We introduce a variant of LLE that performs maximum likelihood exactly: Acyclic LLE (ALLE). We show that MEU and ALLE are competitive with the leading spectral approaches on a robot navigation visualization and a human motion capture data set. Finally the maximum likelihood perspective allows us to introduce a new approach to dimensionality reduction based on L1 regularization of the Gaussian random field via the graphical lasso.
6 0.28303322 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
7 0.27575204 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression
10 0.24038814 114 jmlr-2012-Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies
11 0.20003192 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
12 0.19651173 17 jmlr-2012-An Active Learning Algorithm for Ranking from Pairwise Preferences with an Almost Optimal Query Complexity
13 0.19377068 47 jmlr-2012-GPLP: A Local and Parallel Computation Toolbox for Gaussian Process Regression
14 0.18764067 38 jmlr-2012-Entropy Search for Information-Efficient Global Optimization
15 0.18332864 3 jmlr-2012-A Geometric Approach to Sample Compression
16 0.17661285 118 jmlr-2012-Variational Multinomial Logit Gaussian Process
17 0.16574065 78 jmlr-2012-Nonparametric Guidance of Autoencoder Representations using Label Information
18 0.16251981 10 jmlr-2012-A Unified View of Performance Metrics: Translating Threshold Choice into Expected Classification Loss
19 0.15804502 76 jmlr-2012-Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics
20 0.15284842 12 jmlr-2012-Active Clustering of Biological Sequences
topicId topicWeight
[(21, 0.036), (26, 0.029), (29, 0.018), (56, 0.014), (57, 0.033), (75, 0.041), (79, 0.015), (92, 0.057), (96, 0.641)]
simIndex simValue paperId paperTitle
same-paper 1 0.9918294 40 jmlr-2012-Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso
Author: Rahul Mazumder, Trevor Hastie
Abstract: We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples. Keywords: sparse inverse covariance selection, sparsity, graphical lasso, Gaussian graphical models, graph connected components, concentration graph, large scale covariance estimation
2 0.99037862 32 jmlr-2012-Discriminative Hierarchical Part-based Models for Human Parsing and Action Recognition
Author: Yang Wang, Duan Tran, Zicheng Liao, David Forsyth
Abstract: We consider the problem of parsing human poses and recognizing their actions in static images with part-based models. Most previous work in part-based models only considers rigid parts (e.g., torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate. In this paper, we introduce hierarchical poselets—a new representation for modeling the pose configuration of human bodies. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g., torso + left arm). In the extreme case, they can be the whole bodies. The hierarchical poselets are organized in a hierarchical way via a structured model. Human parsing can be achieved by inferring the optimal labeling of this hierarchical model. The pose information captured by this hierarchical model can also be used as a intermediate representation for other high-level tasks. We demonstrate it in action recognition from static images. Keywords: human parsing, action recognition, part-based models, hierarchical poselets, maxmargin structured learning
3 0.97773319 28 jmlr-2012-Confidence-Weighted Linear Classification for Text Categorization
Author: Koby Crammer, Mark Dredze, Fernando Pereira
Abstract: Confidence-weighted online learning is a generalization of margin-based learning of linear classifiers in which the margin constraint is replaced by a probabilistic constraint based on a distribution over classifier weights that is updated online as examples are observed. The distribution captures a notion of confidence on classifier weights, and in some cases it can also be interpreted as replacing a single learning rate by adaptive per-weight rates. Confidence-weighted learning was motivated by the statistical properties of natural-language classification tasks, where most of the informative features are relatively rare. We investigate several versions of confidence-weighted learning that use a Gaussian distribution over weight vectors, updated at each observed example to achieve high probability of correct classification for the example. Empirical evaluation on a range of textcategorization tasks show that our algorithms improve over other state-of-the-art online and batch methods, learn faster in the online setting, and lead to better classifier combination for a type of distributed training commonly used in cloud computing. Keywords: online learning, confidence prediction, text categorization
4 0.96673036 52 jmlr-2012-Iterative Reweighted Algorithms for Matrix Rank Minimization
Author: Karthik Mohan, Maryam Fazel
Abstract: The problem of minimizing the rank of a matrix subject to affine constraints has applications in several areas including machine learning, and is known to be NP-hard. A tractable relaxation for this problem is nuclear norm (or trace norm) minimization, which is guaranteed to find the minimum rank matrix under suitable assumptions. In this paper, we propose a family of Iterative Reweighted Least Squares algorithms IRLS-p (with 0 ≤ p ≤ 1), as a computationally efficient way to improve over the performance of nuclear norm minimization. The algorithms can be viewed as (locally) minimizing certain smooth approximations to the rank function. When p = 1, we give theoretical guarantees similar to those for nuclear norm minimization, that is, recovery of low-rank matrices under certain assumptions on the operator defining the constraints. For p < 1, IRLSp shows better empirical performance in terms of recovering low-rank matrices than nuclear norm minimization. We provide an efficient implementation for IRLS-p, and also present a related family of algorithms, sIRLS-p. These algorithms exhibit competitive run times and improved recovery when compared to existing algorithms for random instances of the matrix completion problem, as well as on the MovieLens movie recommendation data set. Keywords: matrix rank minimization, matrix completion, iterative algorithms, null-space property
5 0.9279682 91 jmlr-2012-Plug-in Approach to Active Learning
Author: Stanislav Minsker
Abstract: We present a new active learning algorithm based on nonparametric estimators of the regression function. Our investigation provides probabilistic bounds for the rates of convergence of the generalization error achievable by proposed method over a broad class of underlying distributions. We also prove minimax lower bounds which show that the obtained rates are almost tight. Keywords: active learning, selective sampling, model selection, classification, confidence bands
6 0.78346479 106 jmlr-2012-Sign Language Recognition using Sub-Units
7 0.76727581 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
8 0.76278013 49 jmlr-2012-Hope and Fear for Discriminative Training of Statistical Translation Models
9 0.75260204 45 jmlr-2012-Finding Recurrent Patterns from Continuous Sign Language Sentences for Automated Extraction of Signs
10 0.73909044 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression
11 0.73806781 37 jmlr-2012-Eliminating Spammers and Ranking Annotators for Crowdsourced Labeling Tasks
12 0.73751211 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
13 0.73571628 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
14 0.73501301 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
15 0.73495305 23 jmlr-2012-Breaking the Curse of Kernelization: Budgeted Stochastic Gradient Descent for Large-Scale SVM Training
16 0.7314555 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models
17 0.72927225 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
18 0.72854471 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
19 0.72728676 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems
20 0.72409952 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices