jmlr jmlr2010 jmlr2010-46 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ming Yuan
Abstract: This paper considers the problem of estimating a high dimensional inverse covariance matrix that can be well approximated by “sparse” matrices. Taking advantage of the connection between multivariate linear regression and entries of the inverse covariance matrix, we propose an estimating procedure that can effectively exploit such “sparsity”. The proposed method can be computed using linear programming and therefore has the potential to be used in very high dimensional problems. Oracle inequalities are established for the estimation error in terms of several operator norms, showing that the method is adaptive to different types of sparsity of the problem. Keywords: covariance selection, Dantzig selector, Gaussian graphical model, inverse covariance matrix, Lasso, linear programming, oracle inequality, sparsity
Reference: text
sentIndex sentText sentNum sentScore
1 School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0205, USA. Editor: John Lafferty. Abstract This paper considers the problem of estimating a high dimensional inverse covariance matrix that can be well approximated by “sparse” matrices. [sent-3, score-0.737]
2 Taking advantage of the connection between multivariate linear regression and entries of the inverse covariance matrix, we propose an estimating procedure that can effectively exploit such “sparsity”. [sent-4, score-0.73]
3 Keywords: covariance selection, Dantzig selector, Gaussian graphical model, inverse covariance matrix, Lasso, linear programming, oracle inequality, sparsity 1. [sent-7, score-1.233]
4 Introduction One of the classical problems in multivariate statistics is to estimate the covariance matrix or its inverse. [sent-8, score-0.561]
5 Let X = (X1 , . . . , Xp )′ be a p-dimensional random vector with an unknown covariance matrix Σ0 . [sent-12, score-0.455]
6 The usual sample covariance matrix is most often adopted for this purpose: $S = \frac{1}{n}\sum_{i=1}^{n}(X^{(i)} - \bar{X})(X^{(i)} - \bar{X})'$, where $\bar{X} = \sum_{i=1}^{n} X^{(i)}/n$. [sent-17, score-0.455]
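As a quick illustration of the estimator just defined, here is a minimal NumPy sketch; the function name and the data layout (an n × p array whose rows are the observations) are my own conventions, not the paper's.

```python
import numpy as np

def sample_covariance(X):
    """Sample covariance S = (1/n) * sum_i (X_i - Xbar)(X_i - Xbar)'.

    X is an (n, p) array whose rows are the observations X^(1), ..., X^(n).
    Note the 1/n normalization used in the text (np.cov defaults to 1/(n-1)).
    """
    n = X.shape[0]
    Xbar = X.mean(axis=0)             # sample mean
    centered = X - Xbar               # rows X^(i) - Xbar
    return centered.T @ centered / n  # (p, p) matrix S
```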
7 On the other hand, with the recent advances in science and technology, we are more and more often faced with the problem of high dimensional covariance matrix estimation where the dimensionality p is large when compared with the sample size n. [sent-21, score-0.549]
8 Motivated by the practical demands and the failure of classical methods, a number of sparse models and approaches have been introduced in recent years to deal with high dimensional covariance matrix estimation. [sent-24, score-0.573]
9 Bickel and Levina (2008a) pioneered the theoretical study of high dimensional sparse covariance matrices. [sent-27, score-0.483]
10 They consider the case where the magnitude of the entries of Σ0 decays at a polynomial rate in their distance from the diagonal, and show that banding the sample covariance matrix S leads to well-behaved estimates. [sent-28, score-0.568]
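For concreteness, a hedged sketch of the banding idea mentioned here: keep entries within a bandwidth k of the diagonal and zero out the rest. The function name and the choice of k are illustrative only.

```python
import numpy as np

def band_covariance(S, k):
    """Banding estimator: keep S_ij when |i - j| <= k, set it to zero otherwise."""
    p = S.shape[0]
    i, j = np.indices((p, p))
    return np.where(np.abs(i - j) <= k, S, 0.0)
```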
11 More recently, Cai, Zhang and Zhou (2010) established minimax convergence rates for estimating this type of covariance matrices. [sent-29, score-0.469]
12 A more general class of covariance matrix models is investigated in Bickel and Levina (2008b), where the rows or columns of Σ0 are assumed to come from an ℓα ball with 0 < α < 1. [sent-30, score-0.455]
13 In addition to the aforementioned methods, sparse models have also been proposed for the modified Cholesky factor of the covariance matrix in a series of papers by Pourahmadi and co-authors (Pourahmadi, 1999; Pourahmadi, 2000; Wu and Pourahmadi, 2003; Huang et al. [sent-32, score-0.52]
14 In this paper, we focus on another type of sparsity—sparsity in terms of the entries of the inverse covariance matrix. [sent-34, score-0.616]
15 This type of sparsity naturally connects with the problem of covariance selection (Dempster, 1972) and Gaussian graphical models (see, e. [sent-35, score-0.605]
16 (2008) suggest that, although better than the sample covariance matrix, these methods may not perform well when p is larger than the sample size n. [sent-48, score-0.365]
17 It remains unclear to what extent the sparsity of the inverse covariance matrix entails well-behaved covariance matrix estimates. [sent-49, score-1.183]
18 Through the study of a new estimating procedure, we show here that the estimability of a high dimensional inverse covariance matrix is related to how well it can be approximated by a graphical model with a relatively low degree. [sent-50, score-0.88]
19 A preliminary estimate is first constructed using a well known relationship between inverse covariance matrix and multivariate linear regression. [sent-53, score-0.738]
20 We show that the preliminary estimate, although often dismissed as an estimate of the inverse covariance matrix, can be easily modified to produce a satisfactory estimate for the inverse covariance matrix. [sent-54, score-1.211]
21 The probabilistic bounds we prove suggest that the estimation error of the proposed method adapts to the sparseness of the true inverse covariance matrix. [sent-56, score-0.583]
22 The implications of these oracle inequalities are demonstrated on a couple of popular covariance matrix models. [sent-57, score-0.621]
23 When Ω0 corresponds to a Gaussian graphical model of degree d, we show that the proposed method can achieve a convergence rate of the order $O_p\big(d\,(n^{-1}\log p)^{1/2}\big)$ in terms of several matrix operator norms. [sent-58, score-0.317]
24 Neighborhood selection aims at identifying the correct graphical model whereas our goal is to estimate the covariance matrix. [sent-64, score-0.595]
25 The distinction is clear when the inverse covariance matrix is only “approximately” sparse and does not have many zero entries. [sent-65, score-0.733]
26 Even when the inverse covariance matrix is indeed sparse, the two tasks of estimation and selection can be different. [sent-66, score-0.713]
27 2 Initial Estimate. From the aforementioned relationship, a zero entry in the ith column of the inverse covariance matrix implies a zero entry in the regression coefficient θ(i) , and vice versa. [sent-97, score-0.744]
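The relationship invoked here is the standard identity linking the ith column of Ω = Σ⁻¹ to the regression of Xi on the remaining coordinates. A small population-level sketch (using the true Σ; purely illustrative and not the paper's estimator) makes it explicit:

```python
import numpy as np

def inverse_cov_from_regressions(Sigma):
    """Illustrate the regression characterization of Omega = Sigma^{-1}.

    Regressing X_i on X_{-i} gives coefficients theta_i = Sigma_{-i,-i}^{-1} Sigma_{-i,i}
    and residual variance sigma2_i = Sigma_{ii} - Sigma_{i,-i} theta_i.  Then
    Omega_{ii} = 1 / sigma2_i and Omega_{-i,i} = -theta_i / sigma2_i.
    """
    p = Sigma.shape[0]
    Omega = np.zeros((p, p))
    for i in range(p):
        rest = [j for j in range(p) if j != i]
        theta = np.linalg.solve(Sigma[np.ix_(rest, rest)], Sigma[rest, i])
        sigma2 = Sigma[i, i] - Sigma[i, rest] @ theta
        Omega[i, i] = 1.0 / sigma2
        Omega[rest, i] = -theta / sigma2
    return Omega
```

Replacing Σ by the sample covariance S and the exact regressions by sparse regressions yields the kind of preliminary estimate discussed below.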
28 This property is exploited by Meinshausen and Bühlmann (2006) to identify the zero pattern of the inverse covariance matrix. [sent-98, score-0.658]
29 Such assumptions may be unrealistic and can be relaxed if the purpose is to estimate the covariance matrix. [sent-108, score-0.409]
30 With such a distinction in mind, the question now is whether similar strategies of applying sparse multivariate linear regression to recover the inverse covariance matrix remain useful. [sent-109, score-0.795]
31 In the case of q = 1 and q = ∞, the matrix norm can be given more explicitly as $\|A\|_{\ell_1} = \max_{1\le j\le p}\sum_{i=1}^{p}|a_{ij}|$ and $\|A\|_{\ell_\infty} = \max_{1\le i\le p}\sum_{j=1}^{p}|a_{ij}|$. [sent-141, score-0.452]
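A two-line numerical check of these two norms (a sketch; the function names are mine):

```python
import numpy as np

def matrix_l1_norm(A):
    """||A||_{l1}: maximum absolute column sum."""
    return np.abs(A).sum(axis=0).max()

def matrix_linf_norm(A):
    """||A||_{l_inf}: maximum absolute row sum."""
    return np.abs(A).sum(axis=1).max()
```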
32 To this end, we propose to adjust the initial estimate to the symmetric matrix closest to $\tilde{\Omega}$ in the matrix ℓ1 norm, that is, our estimate solves $\min_{\Omega\ \mathrm{symmetric}}\|\Omega - \tilde{\Omega}\|_{\ell_1}$; recall that $\|\Omega - \tilde{\Omega}\|_{\ell_1} = \max_{1\le j\le p}\sum_{i=1}^{p}|\Omega_{ij} - \tilde{\Omega}_{ij}|$. [sent-145, score-0.402]
33 To sum up, our estimate of the inverse covariance matrix is obtained in the following steps. Algorithm for computing $\hat\Omega$. Input: sample covariance matrix S; tuning parameter δ. [sent-148, score-0.455]
34 Output: an estimate of the inverse covariance matrix, $\hat\Omega$. [sent-149, score-0.676]
35 It is worth pointing out that the proposed method depends on the data only through the sample covariance matrix. [sent-154, score-0.365]
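To make the two steps concrete, the following Python sketch assembles the whole pipeline: each column of the preliminary estimate is obtained from a Dantzig-selector-type linear program applied to S, and the result is then symmetrized. This is a hedged reconstruction, not the paper's code: the function names are mine, the residual-variance formula mirrors the expression for Ω̃ii used later in the proofs, and the final averaging is only a crude stand-in for the paper's ℓ1 symmetrization (which is itself a linear program).

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_column(S, i, delta):
    """Solve min ||theta||_1  s.t.  ||S_{-i,i} - S_{-i,-i} theta||_inf <= delta
    as a linear program, writing theta = u - v with u, v >= 0."""
    p = S.shape[0]
    rest = [j for j in range(p) if j != i]
    A = S[np.ix_(rest, rest)]                # S_{-i,-i}
    b = S[rest, i]                           # S_{-i,i}
    m = p - 1
    c = np.ones(2 * m)                       # objective: sum(u) + sum(v) = ||theta||_1
    A_ub = np.vstack([np.hstack([A, -A]),    #  A(u - v) <=  b + delta
                      np.hstack([-A, A])])   # -A(u - v) <= -b + delta
    b_ub = np.concatenate([b + delta, -b + delta])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    uv = res.x
    return uv[:m] - uv[m:]

def estimate_precision(S, delta):
    """Column-by-column preliminary estimate, then a crude symmetrization."""
    p = S.shape[0]
    Omega_tilde = np.zeros((p, p))
    for i in range(p):
        rest = [j for j in range(p) if j != i]
        theta = dantzig_column(S, i, delta)
        resid_var = S[i, i] - 2 * S[i, rest] @ theta + theta @ S[np.ix_(rest, rest)] @ theta
        Omega_tilde[i, i] = 1.0 / resid_var
        Omega_tilde[rest, i] = -theta * Omega_tilde[i, i]
    # The paper symmetrizes by an l1 projection onto symmetric matrices (another LP);
    # simple averaging below is only a rough stand-in for that step.
    return 0.5 * (Omega_tilde + Omega_tilde.T)
```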
36 We refer to O(ν, η, τ) as an “oracle” set because its definition requires knowledge of the true covariance matrix Σ0 . [sent-167, score-0.455]
37 The requirement (5) is in place to ensure that the true inverse covariance matrix is indeed “approximately” sparse. [sent-176, score-0.632]
38 The bound on the matrix ℓ2 norm has great practical implications when we are interested in estimating the covariance matrix or need a positive definite estimate of $\hat\Omega$. [sent-186, score-0.765]
39 When considering a particular class of inverse covariance matrices, we can use the oracle inequalities established here with a proper choice of the oracle set O. [sent-192, score-0.918]
40 When X follows a multivariate normal distribution, the sparsity of the entries of the inverse covariance matrix relates to the notion of conditional independence: the (i, j) entry of Ω0 being zero implies that Xi is independent of X j conditional on the remaining variables and vice versa. [sent-197, score-0.905]
41 Motivated by this connection, we consider the following class of inverse covariance matrices: $\mathcal{M}_1(\tau_0,\nu_0,d) = \{A \succ 0 : \|A\|_{\ell_1} < \tau_0,\ \nu_0^{-1} < \lambda_{\min}(A) < \lambda_{\max}(A) < \nu_0,\ \deg(A) < d\}$, where $\tau_0,\nu_0 > 1$ and $\deg(A) = \max_i\sum_j I(A_{ij} \neq 0)$. [sent-202, score-0.542]
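The degree appearing in this class is simply the maximum number of nonzero entries in any row; an illustrative helper (the tolerance argument is my own addition for numerical zeros):

```python
import numpy as np

def degree(A, tol=0.0):
    """deg(A) = max_i #{j : A_ij != 0}; a tolerance can absorb numerical zeros."""
    return int((np.abs(A) > tol).sum(axis=1).max())
```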
42 Theorem 5 indicates that the estimability of a sparse inverse covariance matrix is dictated by its degree as opposed to the total number of nonzero entries. [sent-212, score-0.781]
43 As mentioned before, the goal of the neighborhood selection from Meinshausen and Bühlmann (2006) is to select the correct graphical model, whereas our focus here is on estimating the covariance matrix. [sent-217, score-0.746]
44 However, the neighborhood selection method can be followed by the maximum likelihood estimation based on the selected graphical model to yield a covariance matrix estimate. [sent-218, score-0.742]
45 It turns out that selecting the graphical model can be more difficult than estimating the covariance matrix, as reflected by the more restrictive assumptions made in Meinshausen and Bühlmann (2006). [sent-220, score-0.727]
46 In particular, to be able to identify the nonzero entries of the inverse covariance matrix, it is necessary that they are sufficiently large in magnitude, whereas such a requirement is generally not needed for the purpose of estimation. [sent-221, score-0.661]
47 3 Approximately Sparse Models In many applications, the inverse covariance matrix is only approximately sparse. [sent-224, score-0.632]
48 A popular way to model this class of covariance matrices is to assume that their rows or columns belong to an ℓα ball (0 < α < 1): $\mathcal{M}_2(\tau_0,\nu_0,\alpha,M) = \{A \succ 0 : \|A^{-1}\|_{\ell_1} < \tau_0,\ \nu_0^{-1} \le \lambda_{\min}(A) \le \lambda_{\max}(A) \le \nu_0,\ \max_{1\le i\le p}\sum_{j=1}^{p}|A_{ij}|^{\alpha} \le M\}$, where $\tau_0,\nu_0 > 1$ and $0 < \alpha < 1$. [sent-225, score-0.455]
49 Assuming that Σ0 ∈ M2 , Bickel and Levina (2008b) study a thresholding estimator of the covariance matrix. [sent-233, score-0.404]
50 It is however interesting to note that Bickel and Levina (2008b) show that thresholding the sample covariance matrix S at an appropriate level can achieve the same rate given by the right-hand side of (8). [sent-235, score-0.494]
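A minimal sketch of such a thresholding estimator (keeping the diagonal untouched is a common convention, and in practice the level t would be tied to $(n^{-1}\log p)^{1/2}$; both choices here are illustrative, not prescribed by the text):

```python
import numpy as np

def threshold_covariance(S, t):
    """Hard-thresholding estimator: zero out entries with |S_ij| <= t."""
    T = np.where(np.abs(S) > t, S, 0.0)
    np.fill_diagonal(T, np.diag(S))  # keep the diagonal untouched
    return T
```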
51 Then there exists a constant C > 0, depending only on τ0 , ν0 and α, such that $\inf_{\bar\Omega}\sup_{\Sigma_0\in\mathcal{M}_2(\tau_0,\nu_0,\alpha,M)} P\Big(\|\bar\Omega - \Omega_0\|_{\ell_1} \ge CM\big(\tfrac{\log p}{n}\big)^{\frac{1-\alpha}{2}}\Big) > 0 \quad (9)$ and $\inf_{\bar\Sigma}\sup_{\Sigma_0\in\mathcal{M}_2(\tau_0,\nu_0,\alpha,M)} P\Big(\|\bar\Sigma - \Sigma_0\|_{\ell_1} \ge CM\big(\tfrac{\log p}{n}\big)^{\frac{1-\alpha}{2}}\Big) > 0, \quad (10)$ where the infimum is taken over all estimates, $\bar\Omega$ or $\bar\Sigma$, based on observations X (1) , . . [sent-239, score-0.308]
52 Specifically, we generated n = 50 observations from a multivariate normal distribution with mean 0 and variance-covariance matrix given by $\Sigma^0_{ij} = \rho^{|i-j|}$ for varying values of ρ. [sent-245, score-0.517]
53 Its inverse covariance matrix is banded with the magnitude of ρ determining the strength of the dependence among the coordinates. [sent-247, score-0.671]
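A short sketch reproducing this data-generating mechanism; only n = 50 and the AR(1) form of Σ0 come from the text, while the dimension p, the value of ρ, and the random seed are illustrative choices of mine.

```python
import numpy as np

def simulate_ar1_data(n=50, p=100, rho=0.5, seed=0):
    """Draw n observations from N(0, Sigma0) with Sigma0_ij = rho^{|i-j|}.

    The inverse of this Sigma0 is tridiagonal (banded), matching the setup above;
    p, rho and the seed are illustrative values, not fixed by the paper.
    """
    idx = np.arange(p)
    Sigma0 = rho ** np.abs(idx[:, None] - idx[None, :])
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(np.zeros(p), Sigma0, size=n)
    return X, Sigma0
```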
54 For each simulated data set, we ran the proposed method to construct an estimate of the inverse covariance matrix. [sent-256, score-0.586]
55 For comparison purposes, we included a couple of popular alternative covariance matrix estimates in the study. [sent-258, score-0.495]
56 As pointed out earlier, the goal of the neighborhood selection is to identify the underlying graphical model rather than to estimate the covariance matrix. [sent-262, score-0.63]
57 Figure 1: Estimation error of the proposed method (black solid lines), the ℓ1 -penalized likelihood estimate (red dashed lines) and the maximum likelihood estimate based on the graphical model selected through neighborhood selection (green dotted lines), plotted against ρ. [sent-327, score-0.397]
58 Recall that the inverse covariance matrix is banded with nonzero entries increasing in magnitude with ρ. [sent-332, score-0.79]
59 This benefit diminishes for small values of ρ, as identifying nonzero entries in the inverse covariance matrix becomes more difficult. [sent-335, score-0.793]
60 Both the ℓ1 -penalized likelihood estimate and the neighborhood-selection-based method are computed using the graphical Lasso algorithm of Friedman, Hastie and Tibshirani (2008), which iteratively solves a sequence of p Lasso problems using a modified Lars algorithm (Efron et al. [sent-342, score-0.32]
61 Discussions. High dimensional (inverse) covariance matrix estimation is becoming increasingly common in various scientific and technological areas. [sent-349, score-0.549]
62 Most of the existing methods are designed to benefit from sparsity of the covariance matrix, and are based on banding or thresholding the sample covariance matrix. [sent-350, score-0.904]
63 Sparse models for the inverse covariance matrix, despite their practical appeal and close connection to graphical modeling, are more difficult to take advantage of, due to the heavy computational cost as well as the lack of a coherent theory on how such sparsity can be effectively exploited. [sent-351, score-0.783]
64 We also note that the method can be easily extended to handle prior information regarding the sparsity patterns of the inverse covariance matrices. [sent-358, score-0.638]
65 Lemma 8 Under the event that $\|S - \Sigma_0\|_{\max} < C_0\lambda_{\max}(\Sigma_0)\big((A+1)n^{-1}\log p\big)^{1/2}$, we have $\|S_{-i,i} - S_{-i,-i}\gamma\|_{\ell_\infty} \le \delta$, provided that $\delta \ge \eta\nu + C_0\tau\nu\lambda_{\max}(\Sigma_0)\big((A+1)n^{-1}\log p\big)^{1/2}$. [sent-374, score-0.328]
66 Proof By the definition of O(ν, η, τ), for any j ≠ i, $|\Sigma^0_{j\cdot}\Omega_{\cdot i}| = \Omega_{ii}\,|\Sigma^0_{ji} - \Sigma^0_{j,-i}\gamma| \le \|\Sigma^0\Omega - I\|_{\max} \le \eta$, which implies that $\|\Sigma^0_{-i,i} - \Sigma^0_{-i,-i}\gamma\|_{\ell_\infty} = \max_{j\neq i}|\Sigma^0_{ji} - \Sigma^0_{j,-i}\gamma| \le \Omega_{ii}^{-1}\eta \le \lambda_{\min}^{-1}(\Omega)\eta \le \eta\nu$. [sent-375, score-0.32]
67 An application of the triangle inequality now yields $\|S_{-i,i} - S_{-i,-i}\gamma\|_{\ell_\infty} \le \|S_{-i,i} - \Sigma^0_{-i,i}\|_{\ell_\infty} + \|(S_{-i,-i} - \Sigma^0_{-i,-i})\gamma\|_{\ell_\infty} + \|\Sigma^0_{-i,i} - \Sigma^0_{-i,-i}\gamma\|_{\ell_\infty} \le \|S - \Sigma^0\|_{\max} + \|S - \Sigma^0\|_{\max}\|\gamma\|_{\ell_1} + \eta\nu \le \|S - \Sigma^0\|_{\max}\,\|\Omega_{\cdot i}\|_{\ell_1}/\Omega_{ii} + \eta\nu \le \tau\nu\|S - \Sigma^0\|_{\max} + \eta\nu$. [sent-376, score-1.024]
68 Lemma 9 Under the event that $\|S - \Sigma_0\|_{\max} < C_0\lambda_{\max}(\Sigma_0)\big((A+1)n^{-1}\log p\big)^{1/2}$, we have $\|\Sigma^0_{-i,-i}(\hat\theta - \gamma)\|_{\ell_\infty} \le 2\delta$, provided that $\delta \ge \eta\nu + C_0\tau\nu\lambda_{\max}(\Sigma_0)\big((A+1)n^{-1}\log p\big)^{1/2}$. [sent-388, score-0.328]
69 To bound the first term on the right-hand side, note that $\|(S_{-i,-i} - \Sigma^0_{-i,-i})\hat\theta\|_{\ell_\infty} \le \|S_{-i,-i} - \Sigma^0_{-i,-i}\|_{\max}\|\hat\theta\|_{\ell_1} \le \|S - \Sigma^0\|_{\max}\|\hat\theta\|_{\ell_1}$. Also recall that $\|S_{-i,i} - S_{-i,-i}\hat\theta\|_{\ell_\infty} \le \delta$, and $\Sigma^0_{-i,-i}\theta = \Sigma^0_{-i,i}$. [sent-395, score-0.32]
70 To sum up, $\|\Sigma^0_{-i,-i}(\hat\theta - \gamma)\|_{\ell_\infty} \le \delta + \nu\eta + \|S - \Sigma^0\|_{\max}(1 + \|\gamma\|_{\ell_1}) \le \delta + \nu\eta + \|S - \Sigma^0\|_{\max}\,\|\Omega_{\cdot i}\|_{\ell_1}/\Omega_{ii} \le \delta + \nu\eta + \|S - \Sigma^0\|_{\max}\,\|\Omega\|_{\ell_1}\lambda_{\min}^{-1}(\Omega) \le \delta + \nu\eta + \tau\nu\|S - \Sigma^0\|_{\max}$. [sent-397, score-0.96]
71 for i = 1, . . . , p, $\|\hat\theta - \gamma\|_{\ell_1} \le 8\lambda_{\max}^2(\Omega_0)\,d_J\,\delta$, if $\|S - \Sigma_0\|_{\max} < C_0\lambda_{\max}(\Sigma_0)\big((A+1)n^{-1}\log p\big)^{1/2}$. [sent-402, score-0.404]
72 We begin with the diagonal elements $\tilde\Omega_{ii}$. [sent-404, score-0.329]
73 Lemma 10 Assume that $\|S - \Sigma_0\|_{\max} < C_0\lambda_{\max}(\Sigma_0)\big((A+1)n^{-1}\log p\big)^{1/2}$ and $\delta\lambda_{\max}(\Omega_0)\big[\nu\tau + 8\lambda_{\max}^2(\Omega_0)\lambda_{\min}^{-1}(\Omega_0)d_J\big] + \nu\tau\lambda_{\max}(\Omega_0)\lambda_{\min}^{-2}(\Omega_0)\|\Omega - \Omega_0\|_{\ell_1} \le c_0$ for some numerical constant $0 < c_0 < 1$. [sent-405, score-0.492]
74 Then $|\Omega^0_{ii} - \tilde\Omega_{ii}| \le \frac{1}{1-c_0}\Big(\delta\lambda_{\max}^2(\Omega_0)\big[\nu\tau + 8\lambda_{\max}^2(\Omega_0)d_J\lambda_{\min}^{-1}(\Omega_0)\big] + \nu\tau\lambda_{\min}^{-2}(\Omega_0)\lambda_{\max}^2(\Omega_0)\|\Omega - \Omega_0\|_{\ell_1}\Big)$, provided that $\delta \ge \eta\nu + C_0\tau\nu\lambda_{\max}(\Sigma_0)\big((A+1)n^{-1}\log p\big)^{1/2}$. [sent-406, score-0.981]
75 Therefore $(\Omega^0_{ii})^{-1} = \Sigma^0_{ii} - 2\Sigma^0_{i,-i}\theta + \theta'\Sigma^0_{-i,-i}\theta = \Sigma^0_{ii} - \Sigma^0_{i,-i}\theta$. Because $\tilde\Omega_{ii}^{-1} = S_{ii} - 2S_{i,-i}\hat\theta + \hat\theta' S_{-i,-i}\hat\theta$, we have the following bound on $|\tilde\Omega_{ii}^{-1} - (\Omega^0_{ii})^{-1}|$. [sent-408, score-0.845]
76 $|S_{i,-i}\hat\theta - \Sigma^0_{i,-i}\theta| \le |(S_{i,-i} - \Sigma^0_{i,-i})\hat\theta| + |\Sigma^0_{i,-i}(\hat\theta - \theta)| \le \|S - \Sigma^0\|_{\max}\|\hat\theta\|_{\ell_1} + \lambda_{\max}(\Sigma^0)\big(\|\hat\theta - \gamma\|_{\ell_1} + \|\gamma - \theta\|_{\ell_1}\big) \le \|S - \Sigma^0\|_{\max}\|\hat\theta\|_{\ell_1} + \lambda_{\min}^{-1}(\Omega_0)\big(\|\hat\theta - \gamma\|_{\ell_1} + \|\gamma - \theta\|_{\ell_1}\big)$. [sent-415, score-0.48]
77 Together with the fact that $\Omega^0_{ii} \le \lambda_{\max}(\Omega_0)$, this gives $\big|\Omega^0_{ii}/\tilde\Omega_{ii} - 1\big| \le \delta\lambda_{\max}(\Omega_0)\big[\nu\tau + 8\lambda_{\max}^2(\Omega_0)\lambda_{\min}^{-1}(\Omega_0)d_J\big] + \lambda_{\max}(\Omega_0)\lambda_{\min}^{-1}(\Omega_0)\|\gamma - \theta\|_{\ell_1}$. [sent-417, score-0.915]
78 Moreover, observe that $\|\gamma - \theta\|_{\ell_1} \le (\Omega^0_{ii})^{-1}\|\Omega_{-i,i} - \Omega^0_{-i,i}\|_{\ell_1} + \Omega_{ii}^{-1}(\Omega^0_{ii})^{-1}|\Omega_{ii} - \Omega^0_{ii}|\,\|\Omega_{-i,i}\|_{\ell_1} \le \lambda_{\min}^{-1}(\Omega_0)\|\Omega - \Omega_0\|_{\ell_1} + \lambda_{\min}^{-1}(\Omega_0)(\nu\tau - 1)\|\Omega - \Omega_0\|_{\ell_1} \le \nu\tau\lambda_{\min}^{-1}(\Omega_0)\|\Omega - \Omega_0\|_{\ell_1}$. [sent-418, score-0.755]
79 Therefore, $\big|\Omega^0_{ii}/\tilde\Omega_{ii} - 1\big| \le \delta\lambda_{\max}(\Omega_0)\big[\nu\tau + 8\lambda_{\max}^2(\Omega_0)\lambda_{\min}^{-1}(\Omega_0)d_J\big] + \nu\tau\lambda_{\max}(\Omega_0)\lambda_{\min}^{-2}(\Omega_0)\|\Omega - \Omega_0\|_{\ell_1}$, (16) which implies that $\Omega^0_{ii}/\tilde\Omega_{ii} \ge 1 - c_0$. [sent-419, score-0.915]
80 Together with (16), this implies $|\Omega^0_{ii} - \tilde\Omega_{ii}| \le \big|\Omega^0_{ii}/\tilde\Omega_{ii} - 1\big|\,\tilde\Omega_{ii} \le \frac{1}{1-c_0}\delta\lambda_{\max}^2(\Omega_0)\big[\nu\tau + 8\lambda_{\max}^2(\Omega_0)\lambda_{\min}^{-1}(\Omega_0)d_J\big] + \frac{1}{1-c_0}\nu\tau\lambda_{\min}^{-2}(\Omega_0)\lambda_{\max}^2(\Omega_0)\|\Omega - \Omega_0\|_{\ell_1}$. [sent-421, score-1.235]
81 Therefore, $\|\tilde\Omega_{-i,\cdot} - \Omega^0_{-i,\cdot}\|_{\ell_1} = |\tilde\Omega_{ii} - \Omega^0_{ii}| + \|\tilde\Omega_{-i,i} - \Omega^0_{-i,i}\|_{\ell_1} \le \delta\Big[\frac{1}{1-c_0}\nu^2\tau^2\lambda_{\max}^2(\Omega_0) + 8\Big(1 + \frac{\nu\tau}{1-c_0}\Big)\lambda_{\max}^2(\Omega_0)d_J\lambda_{\min}^{-1}(\Omega_0)\Big] + \Big(1 + \frac{\nu\tau}{1-c_0}\Big)\lambda_{\max}^2(\Omega_0)\,\nu\tau\lambda_{\min}^{-2}(\Omega_0)\,\|\Omega - \Omega_0\|_{\ell_1}$. [sent-425, score-0.897]
82 From Lemma 11, it is clear that under the assumptions of Lemma 10, $\|\tilde\Omega - \Omega_0\|_{\ell_1} \le C\inf_{\Omega\in\mathcal{O}(\nu,\eta,\tau)}\big(\|\Omega - \Omega_0\|_{\ell_1} + \deg(\Omega)\,\delta\big)$, (17) where $C = \max\{C_1, C_2, C_3\}$ is a positive constant depending only on ν, τ, $\lambda_{\min}(\Omega_0)$ and $\lambda_{\max}(\Omega_0)$. [sent-426, score-0.391]
83 To complete the proof, we appeal to the following lemma showing that $\|S - \Sigma_0\|_{\max} \le C_0\lambda_{\max}(\Sigma_0)\sqrt{\frac{t + \log p}{n}}$ for a numerical constant $C_0 > 0$, with probability at least $1 - e^{-t}$. [sent-429, score-0.318]
84 Taking t = A log p yields $P\big(\|S - \Sigma_0\|_{\max} < C_0\lambda_{\max}(\Sigma_0)((A+1)n^{-1}\log p)^{1/2}\big) \ge 1 - p^{-A}$. [sent-430, score-0.328]
85 To sum up, $P\{\|S - \Sigma_0\|_{\max} \ge x\} \le p^2\max_{1\le i,j\le p} P\big(|S_{ij} - \Sigma^0_{ij}| \ge x\big) \le p^2\big[P(\Delta_1 \ge x/2) + P(\Delta_2 \ge x/2)\big] \le 4p^2\exp(-c_3 n x^2)$. [sent-460, score-0.32]
86 More specifically, it suffices to find a collection of inverse covariance matrices M ′ = {Ω1 , . [sent-466, score-0.58]
87 More specifically, $\Omega^k_{11} = 1$, $\Omega^k_{-1,1} = (\Omega^k_{1,-1})' = a_n b_k$, and $\Omega^k_{-1,-1} = I_{p-1}$; that is, $\Omega^k = \begin{pmatrix} 1 & a_n b_k' \\ a_n b_k & I \end{pmatrix}$, where $a_n = a_0(n^{-1}\log p)^{1/2}$ with a constant $0 < a_0 < 1$ to be determined later. [sent-484, score-0.362]
88 To this end, we need to find an “oracle” inverse covariance matrix Ω. [sent-496, score-0.632]
89 First observe that $\|\Omega - \Omega_0\|_{\ell_1} \le \max_{1\le i\le p}\sum_{j=1}^{p}|\Omega^0_{ij}|\,1\{|\Omega^0_{ij}| \le \zeta\} \le \zeta^{1-\alpha}\max_{1\le i\le p}\sum_{j=1}^{p}|\Omega^0_{ij}|^{\alpha}\,1\{|\Omega^0_{ij}| \le \zeta\} \le M\zeta^{1-\alpha}$. [sent-499, score-0.308]
90 Similar to Theorem 5, there exists a collection of inverse covariance matrices M ′ = {Ω1 , . . [sent-516, score-0.58]
91 By the block matrix inversion formula, $\Sigma^k = \begin{pmatrix} \frac{1}{1 - a_n^2 b_k' b_k} & -\frac{a_n}{1 - a_n^2 b_k' b_k} b_k' \\ -\frac{a_n}{1 - a_n^2 b_k' b_k} b_k & I + \frac{a_n^2}{1 - a_n^2 b_k' b_k} b_k b_k' \end{pmatrix}$. [sent-525, score-0.507]
92 The only difference from before is the calculation of the Kullback-Leibler divergence, which in this case is $K(\mathbb{P}(\Sigma^k), \mathbb{P}(\Sigma^{K+1})) = \frac{n}{2}\big(\operatorname{trace}(\Omega^k) + \log\det(\Sigma^k) - p\big)$. [sent-538, score-0.362]
93 Operator norm consistent estimation of large dimensional sparse covariance matrices. [sent-626, score-0.566]
94 High dimensional covariance matrix estimation using a factor model. [sent-632, score-0.549]
95 Sparsistency and rates of convergence in large covariance matrices estimation. [sent-655, score-0.403]
96 Sparse estimation of large covariance matrices via a nested lasso penalty. [sent-672, score-0.52]
97 Maximum likelihood estimation of generalized linear models for multivariate normal covariance matrix. [sent-689, score-0.501]
98 High-dimensional covariance estimation by minimizing ℓ1 -penalized log-determinant divergence. [sent-703, score-0.406]
99 A path following algorithm for sparse pseudo-likelihood inverse covariance estimation. [sent-709, score-0.607]
100 Nonparametric estimation of large covariance matrices of longitudinal data. [sent-734, score-0.483]
wordName wordTfidf (topN-words)
[('covariance', 0.365), ('dj', 0.315), ('deg', 0.265), ('en', 0.208), ('ovariance', 0.199), ('stimation', 0.179), ('uan', 0.179), ('inverse', 0.177), ('ii', 0.169), ('levina', 0.165), ('max', 0.16), ('bk', 0.139), ('rothman', 0.133), ('atrix', 0.131), ('oracle', 0.126), ('bickel', 0.126), ('min', 0.124), ('meinshausen', 0.121), ('hlmann', 0.116), ('eet', 0.105), ('graphical', 0.104), ('dantzig', 0.103), ('sparsity', 0.096), ('yuan', 0.094), ('matrix', 0.09), ('selector', 0.085), ('log', 0.084), ('pourahmadi', 0.083), ('ravikumar', 0.077), ('lasso', 0.076), ('inf', 0.076), ('entries', 0.074), ('neighborhood', 0.069), ('sparse', 0.065), ('el', 0.063), ('xi', 0.062), ('multivariate', 0.062), ('biometrika', 0.06), ('sii', 0.06), ('var', 0.055), ('triangular', 0.055), ('dimensional', 0.053), ('tuning', 0.053), ('estimating', 0.052), ('minimax', 0.052), ('aspremont', 0.051), ('si', 0.05), ('regressing', 0.05), ('fan', 0.047), ('banerjee', 0.046), ('rocha', 0.045), ('nonzero', 0.045), ('estimate', 0.044), ('ghaoui', 0.044), ('constants', 0.043), ('identifying', 0.042), ('norm', 0.042), ('estimation', 0.041), ('appeal', 0.041), ('raskutti', 0.041), ('entry', 0.041), ('couple', 0.04), ('selection', 0.04), ('thresholding', 0.039), ('theorem', 0.039), ('operator', 0.039), ('banded', 0.039), ('banding', 0.039), ('deng', 0.039), ('dismissed', 0.039), ('eetxi', 0.039), ('estimability', 0.039), ('korostelev', 0.039), ('ledoit', 0.039), ('longitudinal', 0.039), ('overwhelming', 0.039), ('whittaker', 0.039), ('georgia', 0.039), ('lam', 0.039), ('matrices', 0.038), ('annals', 0.037), ('inequality', 0.036), ('distinction', 0.036), ('likelihood', 0.033), ('ming', 0.033), ('birg', 0.033), ('edwards', 0.033), ('sup', 0.033), ('lemma', 0.033), ('det', 0.033), ('wainwright', 0.033), ('depending', 0.031), ('hundred', 0.03), ('penalized', 0.03), ('ith', 0.03), ('lin', 0.029), ('norms', 0.029), ('shall', 0.029), ('bounded', 0.028), ('symmetric', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999928 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming
Author: Ming Yuan
Abstract: This paper considers the problem of estimating a high dimensional inverse covariance matrix that can be well approximated by “sparse” matrices. Taking advantage of the connection between multivariate linear regression and entries of the inverse covariance matrix, we propose an estimating procedure that can effectively exploit such “sparsity”. The proposed method can be computed using linear programming and therefore has the potential to be used in very high dimensional problems. Oracle inequalities are established for the estimation error in terms of several operator norms, showing that the method is adaptive to different types of sparsity of the problem. Keywords: covariance selection, Dantzig selector, Gaussian graphical model, inverse covariance matrix, Lasso, linear programming, oracle inequality, sparsity
2 0.18502542 99 jmlr-2010-Restricted Eigenvalue Properties for Correlated Gaussian Designs
Author: Garvesh Raskutti, Martin J. Wainwright, Bin Yu
Abstract: Methods based on ℓ1 -relaxation, such as basis pursuit and the Lasso, are very popular for sparse regression in high dimensions. The conditions for success of these methods are now well-understood: (1) exact recovery in the noiseless setting is possible if and only if the design matrix X satisfies the restricted nullspace property, and (2) the squared ℓ2 -error of a Lasso estimate decays at the minimax optimal rate k log p , where k is the sparsity of the p-dimensional regression problem with additive n Gaussian noise, whenever the design satisfies a restricted eigenvalue condition. The key issue is thus to determine when the design matrix X satisfies these desirable properties. Thus far, there have been numerous results showing that the restricted isometry property, which implies both the restricted nullspace and eigenvalue conditions, is satisfied when all entries of X are independent and identically distributed (i.i.d.), or the rows are unitary. This paper proves directly that the restricted nullspace and eigenvalue conditions hold with high probability for quite general classes of Gaussian matrices for which the predictors may be highly dependent, and hence restricted isometry conditions can be violated with high probability. In this way, our results extend the attractive theoretical guarantees on ℓ1 -relaxations to a much broader class of problems than the case of completely independent or unitary designs. Keywords: Lasso, basis pursuit, random matrix theory, Gaussian comparison inequality, concentration of measure
3 0.14897354 96 jmlr-2010-Rate Minimaxity of the Lasso and Dantzig Selector for thelqLoss inlrBalls
Author: Fei Ye, Cun-Hui Zhang
Abstract: We consider the estimation of regression coefficients in a high-dimensional linear model. For regression coefficients in ℓr balls, we provide lower bounds for the minimax ℓq risk and minimax quantiles of the ℓq loss for all design matrices. Under an ℓ0 sparsity condition on a target coefficient vector, we sharpen and unify existing oracle inequalities for the Lasso and Dantzig selector. We derive oracle inequalities for target coefficient vectors with many small elements and smaller threshold levels than the universal threshold. These oracle inequalities provide sufficient conditions on the design matrix for the rate minimaxity of the Lasso and Dantzig selector for the ℓq risk and loss in ℓr balls, 0 ≤ r ≤ 1 ≤ q ≤ ∞. By allowing q = ∞, our risk bounds imply the variable selection consistency of threshold Lasso and Dantzig selectors. Keywords: variable selection, estimation, oracle inequality, minimax, linear regression, penalized least squares, linear programming
4 0.14272739 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
Author: Jitkomut Songsiri, Lieven Vandenberghe
Abstract: An algorithm is presented for topology selection in graphical models of autoregressive Gaussian time series. The graph topology of the model represents the sparsity pattern of the inverse spectrum of the time series and characterizes conditional independence relations between the variables. The method proposed in the paper is based on an ℓ1 -type nonsmooth regularization of the conditional maximum likelihood estimation problem. We show that this reduces to a convex optimization problem and describe a large-scale algorithm that solves the dual problem via the gradient projection method. Results of experiments with randomly generated and real data sets are also included. Keywords: graphical models, time series, topology selection, convex optimization
Author: Dapo Omidiran, Martin J. Wainwright
Abstract: We consider the problem of high-dimensional variable selection: given n noisy observations of a k-sparse vector β∗ ∈ R p , estimate the subset of non-zero entries of β∗ . A significant body of work has studied behavior of ℓ1 -relaxations when applied to random measurement matrices that are dense (e.g., Gaussian, Bernoulli). In this paper, we analyze sparsified measurement ensembles, and consider the trade-off between measurement sparsity, as measured by the fraction γ of nonzero entries, and the statistical efficiency, as measured by the minimal number of observations n required for correct variable selection with probability converging to one. Our main result is to prove that it is possible to let the fraction on non-zero entries γ → 0 at some rate, yielding measurement matrices with a vanishing fraction of non-zeros per row, while retaining the same statistical efficiency as dense ensembles. A variety of simulation results confirm the sharpness of our theoretical predictions. Keywords: variable selection, sparse random projections, high-dimensional statistics, Lasso, consistency, ℓ1 -regularization
6 0.11248519 72 jmlr-2010-Matrix Completion from Noisy Entries
7 0.10137545 21 jmlr-2010-Classification Methods with Reject Option Based on Convex Risk Minimization
8 0.1004642 41 jmlr-2010-Gaussian Processes for Machine Learning (GPML) Toolbox
9 0.077285945 62 jmlr-2010-Learning Gradients: Predictive Models that Infer Geometry and Statistical Dependence
10 0.074263722 29 jmlr-2010-Covariance in Unsupervised Learning of Probabilistic Grammars
11 0.072468415 87 jmlr-2010-Online Learning for Matrix Factorization and Sparse Coding
12 0.068215355 84 jmlr-2010-On Spectral Learning
13 0.06815058 12 jmlr-2010-Analysis of Multi-stage Convex Relaxation for Sparse Regularization
14 0.060731214 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing
15 0.056426413 108 jmlr-2010-Stochastic Complexity and Generalization Error of a Restricted Boltzmann Machine in Bayesian Estimation
16 0.05353846 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
17 0.052227549 36 jmlr-2010-Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity
18 0.050901439 97 jmlr-2010-Regret Bounds and Minimax Policies under Partial Monitoring
19 0.049833652 3 jmlr-2010-A Fast Hybrid Algorithm for Large-Scalel1-Regularized Logistic Regression
20 0.049576554 105 jmlr-2010-Spectral Regularization Algorithms for Learning Large Incomplete Matrices
topicId topicWeight
[(0, -0.261), (1, -0.137), (2, 0.161), (3, -0.075), (4, -0.247), (5, -0.163), (6, 0.155), (7, -0.173), (8, -0.111), (9, -0.111), (10, 0.052), (11, -0.068), (12, -0.077), (13, 0.046), (14, -0.008), (15, -0.08), (16, -0.132), (17, 0.005), (18, -0.002), (19, 0.011), (20, 0.069), (21, -0.098), (22, -0.031), (23, -0.066), (24, -0.049), (25, 0.024), (26, 0.058), (27, 0.052), (28, 0.131), (29, -0.025), (30, -0.125), (31, -0.062), (32, -0.112), (33, 0.007), (34, -0.113), (35, 0.013), (36, 0.003), (37, -0.072), (38, -0.01), (39, 0.039), (40, 0.012), (41, -0.051), (42, 0.009), (43, -0.062), (44, 0.046), (45, 0.003), (46, -0.001), (47, -0.072), (48, 0.034), (49, -0.087)]
simIndex simValue paperId paperTitle
same-paper 1 0.96696752 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming
Author: Ming Yuan
Abstract: This paper considers the problem of estimating a high dimensional inverse covariance matrix that can be well approximated by “sparse” matrices. Taking advantage of the connection between multivariate linear regression and entries of the inverse covariance matrix, we propose an estimating procedure that can effectively exploit such “sparsity”. The proposed method can be computed using linear programming and therefore has the potential to be used in very high dimensional problems. Oracle inequalities are established for the estimation error in terms of several operator norms, showing that the method is adaptive to different types of sparsity of the problem. Keywords: covariance selection, Dantzig selector, Gaussian graphical model, inverse covariance matrix, Lasso, linear programming, oracle inequality, sparsity
2 0.848822 99 jmlr-2010-Restricted Eigenvalue Properties for Correlated Gaussian Designs
Author: Garvesh Raskutti, Martin J. Wainwright, Bin Yu
Abstract: Methods based on ℓ1 -relaxation, such as basis pursuit and the Lasso, are very popular for sparse regression in high dimensions. The conditions for success of these methods are now well-understood: (1) exact recovery in the noiseless setting is possible if and only if the design matrix X satisfies the restricted nullspace property, and (2) the squared ℓ2 -error of a Lasso estimate decays at the minimax optimal rate k log p , where k is the sparsity of the p-dimensional regression problem with additive n Gaussian noise, whenever the design satisfies a restricted eigenvalue condition. The key issue is thus to determine when the design matrix X satisfies these desirable properties. Thus far, there have been numerous results showing that the restricted isometry property, which implies both the restricted nullspace and eigenvalue conditions, is satisfied when all entries of X are independent and identically distributed (i.i.d.), or the rows are unitary. This paper proves directly that the restricted nullspace and eigenvalue conditions hold with high probability for quite general classes of Gaussian matrices for which the predictors may be highly dependent, and hence restricted isometry conditions can be violated with high probability. In this way, our results extend the attractive theoretical guarantees on ℓ1 -relaxations to a much broader class of problems than the case of completely independent or unitary designs. Keywords: Lasso, basis pursuit, random matrix theory, Gaussian comparison inequality, concentration of measure
3 0.79926914 96 jmlr-2010-Rate Minimaxity of the Lasso and Dantzig Selector for thelqLoss inlrBalls
Author: Fei Ye, Cun-Hui Zhang
Abstract: We consider the estimation of regression coefficients in a high-dimensional linear model. For regression coefficients in ℓr balls, we provide lower bounds for the minimax ℓq risk and minimax quantiles of the ℓq loss for all design matrices. Under an ℓ0 sparsity condition on a target coefficient vector, we sharpen and unify existing oracle inequalities for the Lasso and Dantzig selector. We derive oracle inequalities for target coefficient vectors with many small elements and smaller threshold levels than the universal threshold. These oracle inequalities provide sufficient conditions on the design matrix for the rate minimaxity of the Lasso and Dantzig selector for the ℓq risk and loss in ℓr balls, 0 ≤ r ≤ 1 ≤ q ≤ ∞. By allowing q = ∞, our risk bounds imply the variable selection consistency of threshold Lasso and Dantzig selectors. Keywords: variable selection, estimation, oracle inequality, minimax, linear regression, penalized least squares, linear programming
Author: Dapo Omidiran, Martin J. Wainwright
Abstract: We consider the problem of high-dimensional variable selection: given n noisy observations of a k-sparse vector β∗ ∈ R p , estimate the subset of non-zero entries of β∗ . A significant body of work has studied behavior of ℓ1 -relaxations when applied to random measurement matrices that are dense (e.g., Gaussian, Bernoulli). In this paper, we analyze sparsified measurement ensembles, and consider the trade-off between measurement sparsity, as measured by the fraction γ of nonzero entries, and the statistical efficiency, as measured by the minimal number of observations n required for correct variable selection with probability converging to one. Our main result is to prove that it is possible to let the fraction on non-zero entries γ → 0 at some rate, yielding measurement matrices with a vanishing fraction of non-zeros per row, while retaining the same statistical efficiency as dense ensembles. A variety of simulation results confirm the sharpness of our theoretical predictions. Keywords: variable selection, sparse random projections, high-dimensional statistics, Lasso, consistency, ℓ1 -regularization
5 0.53459859 72 jmlr-2010-Matrix Completion from Noisy Entries
Author: Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh
Abstract: Given a matrix M of low-rank, we consider the problem of reconstructing it from noisy observations of a small, random subset of its entries. The problem arises in a variety of applications, from collaborative filtering (the ‘Netflix problem’) to structure-from-motion and positioning. We study a low complexity algorithm introduced by Keshavan, Montanari, and Oh (2010), based on a combination of spectral techniques and manifold optimization, that we call here O PT S PACE. We prove performance guarantees that are order-optimal in a number of circumstances. Keywords: matrix completion, low-rank matrices, spectral methods, manifold optimization
6 0.50501662 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
7 0.45892304 62 jmlr-2010-Learning Gradients: Predictive Models that Infer Geometry and Statistical Dependence
8 0.40192592 41 jmlr-2010-Gaussian Processes for Machine Learning (GPML) Toolbox
9 0.39453819 97 jmlr-2010-Regret Bounds and Minimax Policies under Partial Monitoring
10 0.3835268 84 jmlr-2010-On Spectral Learning
11 0.37346247 104 jmlr-2010-Sparse Spectrum Gaussian Process Regression
12 0.34853205 108 jmlr-2010-Stochastic Complexity and Generalization Error of a Restricted Boltzmann Machine in Bayesian Estimation
13 0.33824882 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing
14 0.33445445 21 jmlr-2010-Classification Methods with Reject Option Based on Convex Risk Minimization
15 0.33414257 12 jmlr-2010-Analysis of Multi-stage Convex Relaxation for Sparse Regularization
16 0.33363158 87 jmlr-2010-Online Learning for Matrix Factorization and Sparse Coding
17 0.32331949 36 jmlr-2010-Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity
18 0.32235995 43 jmlr-2010-Generalized Power Method for Sparse Principal Component Analysis
19 0.31297341 105 jmlr-2010-Spectral Regularization Algorithms for Learning Large Incomplete Matrices
20 0.27821752 29 jmlr-2010-Covariance in Unsupervised Learning of Probabilistic Grammars
topicId topicWeight
[(1, 0.051), (3, 0.068), (4, 0.041), (8, 0.044), (14, 0.011), (21, 0.019), (25, 0.129), (31, 0.049), (32, 0.052), (36, 0.042), (37, 0.109), (52, 0.024), (75, 0.176), (81, 0.025), (85, 0.063), (96, 0.016)]
simIndex simValue paperId paperTitle
1 0.89328271 93 jmlr-2010-PyBrain
Author: Tom Schaul, Justin Bayer, Daan Wierstra, Yi Sun, Martin Felder, Frank Sehnke, Thomas Rückstieß, Jürgen Schmidhuber
Abstract: PyBrain is a versatile machine learning library for Python. Its goal is to provide flexible, easyto-use yet still powerful algorithms for machine learning tasks, including a variety of predefined environments and benchmarks to test and compare algorithms. Implemented algorithms include Long Short-Term Memory (LSTM), policy gradient methods, (multidimensional) recurrent neural networks and deep belief networks. Keywords: Python, neural networks, reinforcement learning, optimization
same-paper 2 0.87741536 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming
Author: Ming Yuan
Abstract: This paper considers the problem of estimating a high dimensional inverse covariance matrix that can be well approximated by “sparse” matrices. Taking advantage of the connection between multivariate linear regression and entries of the inverse covariance matrix, we propose an estimating procedure that can effectively exploit such “sparsity”. The proposed method can be computed using linear programming and therefore has the potential to be used in very high dimensional problems. Oracle inequalities are established for the estimation error in terms of several operator norms, showing that the method is adaptive to different types of sparsity of the problem. Keywords: covariance selection, Dantzig selector, Gaussian graphical model, inverse covariance matrix, Lasso, linear programming, oracle inequality, sparsity
3 0.79970604 103 jmlr-2010-Sparse Semi-supervised Learning Using Conjugate Functions
Author: Shiliang Sun, John Shawe-Taylor
Abstract: In this paper, we propose a general framework for sparse semi-supervised learning, which concerns using a small portion of unlabeled data and a few labeled data to represent target functions and thus has the merit of accelerating function evaluations when predicting the output of a new example. This framework makes use of Fenchel-Legendre conjugates to rewrite a convex insensitive loss involving a regularization with unlabeled data, and is applicable to a family of semi-supervised learning methods such as multi-view co-regularized least squares and single-view Laplacian support vector machines (SVMs). As an instantiation of this framework, we propose sparse multi-view SVMs which use a squared ε-insensitive loss. The resultant optimization is an inf-sup problem and the optimal solutions have arguably saddle-point properties. We present a globally optimal iterative algorithm to optimize the problem. We give the margin bound on the generalization error of the sparse multi-view SVMs, and derive the empirical Rademacher complexity for the induced function class. Experiments on artificial and real-world data show their effectiveness. We further give a sequential training approach to show their possibility and potential for uses in large-scale problems and provide encouraging experimental results indicating the efficacy of the margin bound and empirical Rademacher complexity on characterizing the roles of unlabeled data for semi-supervised learning. Keywords: semi-supervised learning, Fenchel-Legendre conjugate, representer theorem, multiview regularization, support vector machine, statistical learning theory
4 0.77996546 113 jmlr-2010-Tree Decomposition for Large-Scale SVM Problems
Author: Fu Chang, Chien-Yang Guo, Xiao-Rong Lin, Chi-Jen Lu
Abstract: To handle problems created by large data sets, we propose a method that uses a decision tree to decompose a given data space and train SVMs on the decomposed regions. Although there are other means of decomposing a data space, we show that the decision tree has several merits for large-scale SVM training. First, it can classify some data points by its own means, thereby reducing the cost of SVM training for the remaining data points. Second, it is efficient in determining the parameter values that maximize the validation accuracy, which helps maintain good test accuracy. Third, the tree decomposition method can derive a generalization error bound for the classifier. For data sets whose size can be handled by current non-linear, or kernel-based, SVM training techniques, the proposed method can speed up the training by a factor of thousands, and still achieve comparable test accuracy. Keywords: binary tree, generalization error bound, margin-based theory, pattern classification, tree decomposition, support vector machine, VC theory
5 0.77956724 104 jmlr-2010-Sparse Spectrum Gaussian Process Regression
Author: Miguel Lázaro-Gredilla, Joaquin Quiñonero-Candela, Carl Edward Rasmussen, Aníbal R. Figueiras-Vidal
Abstract: We present a new sparse Gaussian Process (GP) model for regression. The key novel idea is to sparsify the spectral representation of the GP. This leads to a simple, practical algorithm for regression tasks. We compare the achievable trade-offs between predictive accuracy and computational requirements, and show that these are typically superior to existing state-of-the-art sparse approximations. We discuss both the weight space and function space representations, and note that the new construction implies priors over functions which are always stationary, and can approximate any covariance function in this class. Keywords: Gaussian process, probabilistic regression, sparse approximation, power spectrum, computational efficiency
6 0.77826524 74 jmlr-2010-Maximum Relative Margin and Data-Dependent Regularization
7 0.77779216 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
8 0.7745114 69 jmlr-2010-Lp-Nested Symmetric Distributions
9 0.77402574 105 jmlr-2010-Spectral Regularization Algorithms for Learning Large Incomplete Matrices
10 0.77354115 96 jmlr-2010-Rate Minimaxity of the Lasso and Dantzig Selector for thelqLoss inlrBalls
11 0.77288079 57 jmlr-2010-Iterative Scaling and Coordinate Descent Methods for Maximum Entropy Models
12 0.77153844 97 jmlr-2010-Regret Bounds and Minimax Policies under Partial Monitoring
13 0.76957768 25 jmlr-2010-Composite Binary Losses
14 0.76763254 109 jmlr-2010-Stochastic Composite Likelihood
15 0.7663123 82 jmlr-2010-On Learning with Integral Operators
16 0.76475525 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
17 0.76474351 102 jmlr-2010-Semi-Supervised Novelty Detection
18 0.76350808 84 jmlr-2010-On Spectral Learning
19 0.76309448 42 jmlr-2010-Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data
20 0.76286817 106 jmlr-2010-Stability Bounds for Stationary φ-mixing and β-mixing Processes