nips nips2013 nips2013-194 knowledge-graph by maker-knowledge-mining

194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition


Source: pdf

Author: Adel Javanmard, Andrea Montanari

Abstract: In the high-dimensional regression model a response variable is linearly related to p covariates, but the sample size n is smaller than p. We assume that only a small subset of covariates is ‘active’ (i.e., the corresponding coefficients are non-zero), and consider the model-selection problem of identifying the active covariates. A popular approach is to estimate the regression coefficients through the Lasso (ℓ1-regularized least squares). This is known to correctly identify the active set only if the irrelevant covariates are roughly orthogonal to the relevant ones, as quantified through the so called ‘irrepresentability’ condition. In this paper we study the ‘Gauss-Lasso’ selector, a simple two-stage method that first solves the Lasso, and then performs ordinary least squares restricted to the Lasso active set. We formulate ‘generalized irrepresentability condition’ (GIC), an assumption that is substantially weaker than irrepresentability. We prove that, under GIC, the Gauss-Lasso correctly recovers the active set. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 This is known to correctly identify the active set only if the irrelevant covariates are roughly orthogonal to the relevant ones, as quantified through the so called ‘irrepresentability’ condition. [sent-8, score-0.216]

2 In this paper we study the ‘Gauss-Lasso’ selector, a simple two-stage method that first solves the Lasso, and then performs ordinary least squares restricted to the Lasso active set. [sent-9, score-0.087]

3 We formulate ‘generalized irrepresentability condition’ (GIC), an assumption that is substantially weaker than irrepresentability. [sent-10, score-0.607]

4 We prove that, under GIC, the Gauss-Lasso correctly recovers the active set. [sent-11, score-0.204]

5 . . . , Yn)^T and denoting by X the design matrix with rows X_1^T, . . . [sent-19, score-0.154]

6 Such developments pose the following theoretical question: For which vectors θ0 , designs X, and noise levels σ, the support S can be identified, with high probability, through computationally efficient procedures? [sent-33, score-0.211]

7 The same question can be asked for random designs X and, in this case, ‘high probability’ will refer both to the noise realization W , and to the design realization X. [sent-34, score-0.204]

8 The Lasso estimator is defined by θ^n(Y, X; λ) ≡ arg min_{θ∈R^p} { (1/(2n)) ‖Y − Xθ‖_2^2 + λ ‖θ‖_1 }. [sent-40, score-0.147]
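
The objective above uses the 1/(2n) scaling on the squared loss, which coincides with scikit-learn's Lasso parametrization when alpha = λ. A minimal sketch of computing this estimator (illustrative only, not the authors' code; the function name is chosen here for exposition):

```python
# Minimal sketch of the Lasso estimator in Eq. (3); the objective
# (1/(2n))||Y - X theta||_2^2 + lambda*||theta||_1 matches
# sklearn.linear_model.Lasso's objective when alpha = lambda.
import numpy as np
from sklearn.linear_model import Lasso

def lasso_estimator(X, y, lam):
    """Return theta^n(y, X; lam) as a numpy array of length p."""
    model = Lasso(alpha=lam, fit_intercept=False, max_iter=100000)
    model.fit(X, y)
    return model.coef_
```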

9 (A closely related method is the so-called Dantzig selector [6]: it would be interesting to explore whether our results can be generalized to that approach.) [sent-46, score-0.304]

10 This assumption is formalized by the so-called ‘irrepresentability condition’, that can be stated in terms of the empirical covariance matrix Σ = (X^T X/n). [sent-48, score-0.123]

11 Letting Σ_{A,B} be the submatrix (Σ_{i,j})_{i∈A, j∈B}, irrepresentability requires ‖Σ_{S^c,S} Σ_{S,S}^{-1} sign(θ_{0,S})‖_∞ ≤ 1 − η, (4) for some η > 0 (here sign(u)_i = +1, 0, −1 if, respectively, u_i > 0, = 0, < 0). [sent-49, score-0.621]
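
For concreteness, a small numerical check of condition (4) could look as follows; this is a hedged sketch, with the helper name and interface chosen here for illustration only:

```python
# Sketch: evaluate the slack in the irrepresentability condition (4) for a
# given (empirical) covariance Sigma, support S, and sign pattern of theta_0.
import numpy as np

def irrepresentability_slack(Sigma, S, sign_theta0_S):
    """Return 1 - ||Sigma_{S^c,S} Sigma_{S,S}^{-1} sign(theta_{0,S})||_inf;
    condition (4) holds with some eta > 0 exactly when this value is positive."""
    p = Sigma.shape[0]
    Sc = np.setdiff1d(np.arange(p), S)
    v = np.linalg.solve(Sigma[np.ix_(S, S)], sign_theta0_S)  # Sigma_{S,S}^{-1} sign(theta_{0,S})
    return 1.0 - np.max(np.abs(Sigma[np.ix_(Sc, S)] @ v))
```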

12 In an early breakthrough, Zhao and Yu [23] proved that, if this condition holds with η uniformly bounded away from 0, it guarantees correct model selection also in the high-dimensional regime p ≫ n. [sent-50, score-0.238]

13 Also, for specific classes of random Gaussian designs (including X with i.i.d. …): [sent-57, score-0.123]

14 Namely, there exist c_l, c_u > 0 such that the Lasso fails with high probability if n < c_l s0 log p and succeeds with high probability if n ≥ c_u s0 log p. [sent-61, score-0.183]

15 Two aspects of the above theory cannot be improved substantially: (i) The non-zero entries must satisfy the condition θ_min ≥ cσ/√n to be detected with high probability. [sent-63, score-0.142]

16 Indeed, if this is not the case, for each θ0 with support of size |S| = s0, there is a one-parameter family {θ0(t) = θ0 + t v}_{t∈R} with supp(θ0(t)) ⊆ S, Xθ0(t) = Xθ0 and, for specific values of t, the support of θ0(t) is strictly contained in S. [sent-70, score-0.134]

17 On the other hand, there is no fundamental reason to assume the irrepresentability condition (4). [sent-71, score-0.655]

18 In this paper we prove that the Gauss-Lasso selector has nearly optimal model selection properties under a condition that is strictly weaker than irrepresentability. [sent-73, score-0.455]

19 2 GAUSS-LASSO SELECTOR: Model selector for high-dimensional problems. Input: measurement vector y, design model X, regularization parameter λ, support size s0. [sent-74, score-0.407]

20 1: Let T = supp(θ^n) be the support of the Lasso estimator θ^n = θ^n(y, X; λ), given by θ^n(Y, X; λ) ≡ arg min_{θ∈R^p} { (1/(2n)) ‖Y − Xθ‖_2^2 + λ ‖θ‖_1 }. [sent-76, score-0.214]

21 2: Construct the estimator θ^GL as follows: θ^GL_T = (X_T^T X_T)^{-1} X_T^T y, θ^GL_{T^c} = 0. [sent-77, score-0.109]
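
A compact sketch of this two-stage procedure (Lasso support, then ordinary least squares refit on that support), with illustrative function names and scikit-learn standing in for the Lasso step:

```python
# Sketch of the Gauss-Lasso selector: Stage 1 computes the Lasso support T,
# Stage 2 refits by ordinary least squares restricted to the columns in T.
import numpy as np
from sklearn.linear_model import Lasso

def gauss_lasso(X, y, lam):
    n, p = X.shape
    theta_lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=100000).fit(X, y).coef_
    T = np.flatnonzero(theta_lasso)          # Lasso active set
    theta_gl = np.zeros(p)
    if T.size > 0:
        sol, *_ = np.linalg.lstsq(X[:, T], y, rcond=None)
        theta_gl[T] = sol                    # OLS restricted to T
    return theta_gl, T
```

Here `lam` plays the role of λ, and the returned pair is (θ^GL, T); the support estimate is simply the set of nonzero Lasso coordinates.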

22 We call this condition the generalized irrepresentability condition (GIC). [sent-79, score-0.802]

23 It then constructs a new estimator by ordinary least squares regression of the data Y onto the model T . [sent-84, score-0.166]

24 … Ŝ = S) under conditions comparable to the ones assumed in [14, 23, 21], while replacing irrepresentability by the weaker generalized irrepresentability condition. [sent-87, score-1.307]

25 In order to build some intuition about the difference between irrepresentability and generalized irrepresentability, it is convenient to consider the Lasso cost function at ‘zero noise’: G(θ; ξ) ≡ (1/(2n)) ‖X(θ − θ0)‖_2^2 + ξ ‖θ‖_1 = (1/2) ⟨(θ − θ0), Σ(θ − θ0)⟩ + ξ ‖θ‖_1. [sent-89, score-0.64]
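
Since G(θ; ξ) is a quadratic plus an ℓ1 term, its minimizer can be approximated by proximal gradient descent (ISTA). The following is an illustrative sketch under the assumption that Σ is positive semidefinite, not code from the paper:

```python
# Sketch: minimize G(theta; xi) = 0.5*(theta-theta0)' Sigma (theta-theta0) + xi*||theta||_1
# by ISTA (gradient step on the quadratic part, then soft-thresholding).
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def zero_noise_lasso(Sigma, theta0, xi, n_iter=5000):
    L = np.linalg.eigvalsh(Sigma)[-1]        # step size 1/L, with L the largest eigenvalue
    theta = np.zeros_like(theta0, dtype=float)
    for _ in range(n_iter):
        grad = Sigma @ (theta - theta0)      # gradient of the smooth quadratic part
        theta = soft_threshold(theta - grad / L, xi / L)
    return theta
```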

26 Generalized irrepresentability requires that the above inequality holds with some small slack η > 0 bounded away from zero, i.e. … [sent-95, score-0.668]

27 Notice that this assumption reduces to standard irrepresentability, cf. … [sent-98, score-0.574]

28 In other words, earlier work [14, 23, 21] required generalized irrepresentability plus sign-consistency in zero noise, and established sign consistency in non-zero noise. [sent-101, score-0.821]

29 From a different point of view, GIC demands that irrepresentability holds for a superset of the true support S. [sent-103, score-0.747]

30 It was indeed argued in the literature that such a relaxation of irrepresentability allows one to cover a significantly broader set of cases (see for instance [3, Section 7. [sent-104, score-0.574]

31 However, it was never clarified why such a superset irrepresentability condition should be significantly more general than simple irrepresentability. [sent-107, score-0.731]

32 Our contributions can therefore be summarized as follows: • By tying it to the KKT condition for the zero-noise problem, we justify the expectation that generalized irrepresentability should hold for a broad class of design matrices. [sent-109, score-0.802]

33 • We thus provide a specific formulation of superset irrepresentability, prescribing both the superset T and the sign vector vT , that is, by itself, significantly more general than simple irrepresentability. [sent-110, score-0.309]

34 • We show that, under GIC, exact support recovery can be guaranteed using the Gauss-Lasso, and formulate the appropriate ‘minimum coefficient’ conditions that guarantee this. [sent-111, score-0.126]

35 As a side remark, even when simple irrepresentability holds, our results strengthen somewhat the estimates of [21] (see below for details). [sent-112, score-0.574]

36 Section 2 treats the case of deterministic designs X, and develops our main results on the basis of the GIC. [sent-116, score-0.207]

37 As for the design matrix, the first p − 1 covariates are orthogonal at the population level, i.e. … [sent-141, score-0.201]

38 The Lasso correctly recovers the support S from n ≥ c s0 log p samples, provided θ_min ≥ c √((log p)/n). [sent-158, score-0.232]

39 Namely, if a ∈ [0, (1−η)/s0) ∪ ((1−η)/√s0, 1/√s0), then it recovers S from n ≥ c s0 log p samples, provided θ_min ≥ c √((log p)/n). [sent-162, score-0.105]

40 2 Further related work The restricted isometry property [7, 6] (or the related restricted eigenvalue [2] or compatibility conditions [19]) have been used to establish guarantees on the estimation and model selection errors of the Lasso or similar approaches. [sent-165, score-0.212]

41 While the main objective of this work is to prove high-dimensional ℓ2 consistency with a sparse estimated model, the author also proves partial model selection guarantees. [sent-169, score-0.162]

42 Namely, the method correctly recovers a subset of large coefficients S_L ⊆ S, provided |θ_{0,i}| ≥ cσ √(s0 (log p)/n), for i ∈ S_L. [sent-170, score-0.134]

43 Lounici [13] proves correct model selection under the assumption max_{i≠j} |Σ_{ij}| = O(1/s0). [sent-173, score-0.127]

44 Candès and Plan [4] also assume mutual incoherence, albeit with a much weaker requirement, namely max_{i≠j} |Σ_{ij}| = O(1/(log p)). [sent-175, score-0.098]

45 Under this condition, they establish model selection guarantees for an ideal scaling of the non-zero coefficients θ_min ≥ cσ √((log p)/n). [sent-176, score-0.091]

46 The authors in [22] consider the variable selection problem, and under the same assumptions on the non-zero coefficients as in the present paper, guarantee support recovery under a cone condition. [sent-178, score-0.159]

47 The latter condition however is stronger than the generalized irrepresentability condition. [sent-179, score-0.721]

48 The work [20] studies the adaptive and the thresholded Lasso estimators and proves correct model selection assuming the non-zero coefficients are of order √(s0 (log p)/n). [sent-182, score-0.165]

49 Finally, model selection consistency can be obtained without irrepresentability through other methods. [sent-183, score-0.667]

50 For a matrix A and sets of indices I, J, we let A_J denote the submatrix containing just the columns in J and A_{I,J} denote the submatrix formed by the rows in I and columns in J. [sent-188, score-0.227]
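
In numpy terms (an illustrative aside, not notation from the paper), these submatrices correspond to column and row/column fancy indexing:

```python
# Sketch: the submatrix notation A_J and A_{I,J} in numpy indexing.
import numpy as np

A = np.arange(20.0).reshape(4, 5)
I, J = [0, 2], [1, 3, 4]
A_J = A[:, J]              # columns in J
A_IJ = A[np.ix_(I, J)]     # rows in I and columns in J
```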

51 Throughout, we denote the rows of the design matrix X by X_1, . . . , X_n. [sent-204, score-0.154]

52 Further, for a vector v, sign(v) is the vector with entries sign(v)i = +1 if vi > 0, sign(v)i = −1 if vi < 0, and sign(v)i = 0 otherwise. [sent-211, score-0.086]

53 2 Deterministic designs An outline of this section is as follows: (1) We first consider the zero-noise problem W = 0, and prove several useful properties of the Lasso estimator in this case. [sent-212, score-0.266]

54 In particular, we show that there exists a threshold for the regularization parameter below which the support of the Lasso estimator remains the same and contains supp(θ0 ). [sent-213, score-0.176]

55 Moreover, the Lasso estimator support is not much larger than supp(θ0 ). [sent-214, score-0.176]

56 (2) We then turn to the noisy problem, and introduce the generalized irrepresentability condition (GIC) that is motivated by the properties of the Lasso in the zero-noise case. [sent-215, score-0.721]

57 We prove that under GIC (and other technical conditions), with high probability, the signed support of the Lasso estimator is the same as that in the zero-noise problem. [sent-216, score-0.374]

58 (3) We show that the Gauss-Lasso selector correctly recovers the signed support of θ0 . [sent-217, score-0.582]

59 1 Zero-noise problem Recall that Σ ≡ (X^T X/n) denotes the empirical covariance of the rows of the design matrix. [sent-219, score-0.195]

60 Given Σ ∈ R^{p×p}, Σ ⪰ 0, θ0 ∈ R^p and ξ ∈ R+, we define the zero-noise Lasso estimator as θ^ZN(ξ) ≡ arg min_{θ∈R^p} { (1/2) ⟨(θ − θ0), Σ(θ − θ0)⟩ + ξ ‖θ‖_1 }. [sent-220, score-0.109]

61 Following [2], we introduce a restricted eigenvalue constant for the empirical covariance matrix Σ: κ(s, c0) ≡ min_{J⊆[p], |J|≤s} min_{u∈R^p : ‖u_{J^c}‖_1 ≤ c0 ‖u_J‖_1} ⟨u, Σu⟩ / ‖u‖_2^2. (6) [sent-222, score-0.188]

62 Our first result states that supp(θ^ZN(ξ)) is not much larger than the support of θ0, for any ξ > 0. [sent-223, score-0.105]

63 Then sign(θ^ZN) = v if and only if ‖Σ_{T^c,T} Σ_{T,T}^{-1} v_T‖_∞ ≤ 1 and v_T = sign(θ_{0,T} − ξ Σ_{T,T}^{-1} v_T). [sent-242, score-0.157]

64 Motivated by this result, we introduce the generalized irrepresentability condition (GIC) for deterministic designs. [sent-244, score-0.779]

65 The pair (Σ, θ0), Σ ∈ R^{p×p}, θ0 ∈ R^p, satisfies the generalized irrepresentability condition with parameter η > 0 if the following happens. [sent-246, score-0.721]

66 (10) In other words we require the dual feasibility condition (8), which always holds, to hold with a positive slack η. [sent-250, score-0.11]
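
As an illustration of this definition, one can check GIC numerically by solving the zero-noise problem for a small ξ (e.g., with the zero_noise_lasso sketch above), reading off its support T and sign vector v_T, and measuring the slack in the dual feasibility condition. The helper below is a hypothetical sketch, not the paper's formal statement:

```python
# Sketch: slack of the dual feasibility condition at a zero-noise solution.
# GIC (with parameter eta) asks that the returned value be at least eta > 0.
import numpy as np

def gic_slack(Sigma, theta_zn):
    p = Sigma.shape[0]
    T = np.flatnonzero(theta_zn)
    if T.size == 0:
        return 1.0
    Tc = np.setdiff1d(np.arange(p), T)
    v_T = np.sign(theta_zn[T])
    w = np.linalg.solve(Sigma[np.ix_(T, T)], v_T)      # Sigma_{T,T}^{-1} v_T
    return 1.0 - np.max(np.abs(Sigma[np.ix_(Tc, T)] @ w))
```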

67 We begin with a standard characterization of sign(θn ), the signed support of the Lasso estimator (3). [sent-253, score-0.319]

68 Then the signed support of the Lasso estimator is given by sign(θ^n) = z if and only if ‖Σ_{T^c,T} Σ_{T,T}^{-1} z_T + (1/λ)(r_{T^c} − Σ_{T^c,T} Σ_{T,T}^{-1} r_T)‖_∞ ≤ 1, (11) and z_T = sign(θ_{0,T} − Σ_{T,T}^{-1}(λ z_T − r_T)). [sent-259, score-0.476]

69 Consider the deterministic design model with empirical covariance matrix Σ ≡ (X^T X)/n, and assume Σ_{i,i} ≤ 1 for i ∈ [p]. [sent-262, score-0.23]

70 (ii) The pair (Σ, θ0 ) satisfies the generalized irrepresentability condition with parameter η. [sent-266, score-0.721]

71 Consider the Lasso estimator θn = θn (y, X; λ) defined as per Eq. [sent-267, score-0.109]

72 5 requires the submatrix Σ_{T0,T0} to have minimum singular value bounded away from zero. [sent-276, score-0.133]

73 Requiring the minimum singular value of ΣT0 ,T0 to be bounded away from zero is not much more restrictive since T0 is comparable in size with S, as stated in Lemma 2. [sent-278, score-0.118]

74 We next show that the Gauss-Lasso selector correctly recovers the support of θ0 . [sent-280, score-0.439]

75 Consider the deterministic design model with empirical covariance matrix Σ ≡ (X^T X)/n, and assume that Σ_{i,i} ≤ 1 for i ∈ [p]. [sent-283, score-0.23]

76 3 Random Gaussian designs In the previous section, we studied the case of deterministic design models which allowed for a straightforward analysis. [sent-287, score-0.262]

77 Within the random Gaussian design model, the rows X_i are distributed as X_i ∼ N(0, Σ) for some (unknown) covariance matrix Σ ≻ 0. [sent-289, score-0.22]

78 In order to study the performance of Gauss-Lasso selector in this case, we first define the population-level estimator. [sent-290, score-0.238]

79 Given Σ ∈ R^{p×p}, Σ ≻ 0, θ0 ∈ R^p and ξ ∈ R+, the population-level estimator θ^∞(ξ) = θ^∞(ξ; θ0, Σ) is defined as θ^∞(ξ) ≡ arg min_{θ∈R^p} { (1/2) ⟨(θ − θ0), Σ(θ − θ0)⟩ + ξ ‖θ‖_1 }. (16) [sent-291, score-0.147]

80 In fact, the population-level estimator is obtained by assuming that the response vector Y is noiseless and n = ∞, hence replacing the empirical covariance (X^T X/n) with the exact covariance Σ in the Lasso optimization problem (3). [sent-292, score-0.623]

81 Note that the population-level estimator θ∞ is deterministic, albeit X is a random design. [sent-293, score-0.133]

82 We show that under some conditions on the covariance Σ and vector θ0, T ≡ supp(θ^n) = supp(θ^∞), i.e. [sent-294, score-0.102]

83 , the population-level estimator and the Lasso estimator share the same (signed) support. [sent-296, score-0.218]

84 This observation allows for a simple analysis of the Gauss-Lasso selector θGL . [sent-299, score-0.238]

85 An outline of the section is as follows: (1) We begin by noting that the population-level estimator θ^∞(ξ) has properties similar to those of θ^ZN(ξ) stated in Section 2. [sent-300, score-0.141]

86 (2) We show that under GIC for covariance matrix Σ (and other sufficient conditions), with high probability, the signed support of the Lasso estimator is the same as the signed support of the population-level estimator. [sent-304, score-0.641]

87 (3) Following the previous steps, we show that the Gauss-Lasso selector correctly recovers the signed support of θ0 . [sent-305, score-0.582]

88 2 The high-dimensional problem We now consider the Lasso estimator (3). [sent-312, score-0.109]

89 Consider the Gaussian random design model with covariance matrix Σ ≻ 0, and assume that Σ_{i,i} ≤ 1 for i ∈ [p]. [sent-317, score-0.172]

90 (ii) The pair (Σ, θ0 ) satisfies the generalized irrepresentability condition with parameter η. [sent-321, score-0.721]

91 Consider the Lasso estimator θn = θn (y, X; λ) defined as per Eq. [sent-322, score-0.109]

92 Below, we show that the Gauss-Lasso selector correctly recovers the signed support of θ0 . [sent-336, score-0.582]

93 Consider the random Gaussian design model with covariance matrix Σ ≻ 0, and assume that Σ_{i,i} ≤ 1 for i ∈ [p]. [sent-339, score-0.172]

94 [Detection level] Let θ_min ≡ min_{i∈S} |θ_{0,i}| be the minimum magnitude of the nonzero entries of vector θ0. [sent-345, score-0.092]

95 3, the Gauss-Lasso selector correctly recovers supp(θ0), with probability greater than 1 − p e^{−s0/10} − 6 e^{−n/2} − 10 p^{1−c1}, if n ≥ max(M1, M2) t0 log p and θ_min ≥ Cσ √((log p)/n) (1 + ‖Σ_{T0,T0}^{-1}‖_∞), for some constant C. [sent-347, score-0.434]
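
A toy end-to-end simulation under the random Gaussian design model; the parameter values and the gauss_lasso helper from the earlier sketch are illustrative, not the constants of the theorem:

```python
# Sketch: generate a random Gaussian design, run the Gauss-Lasso, and check
# whether supp(theta_0) is recovered exactly.
import numpy as np

rng = np.random.default_rng(0)
n, p, s0, sigma = 200, 500, 5, 0.5
theta0 = np.zeros(p)
S = rng.choice(p, size=s0, replace=False)
theta0[S] = rng.choice([-1.0, 1.0], size=s0)           # well-separated nonzeros

X = rng.standard_normal((n, p))                        # Sigma = identity in this toy example
y = X @ theta0 + sigma * rng.standard_normal(n)

lam = 2 * sigma * np.sqrt(np.log(p) / n)               # a common choice of scaling for lambda
theta_gl, T = gauss_lasso(X, y, lam)                   # from the Gauss-Lasso sketch above
print("exact support recovery:", set(T) == set(S))
```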

96 Hypothesis testing in high-dimensional regression under the gaussian random design model: Asymptotic theory. [sent-414, score-0.148]

97 Model selection for high-dimensional regression under the generalized irrepresentability condition. [sent-420, score-0.741]

98 Sup-norm convergence rate and sign concentration property of lasso and dantzig estimators. [sent-430, score-0.565]

99 Rate minimaxity of the lasso and dantzig selector for the lq loss in lr balls. [sent-497, score-0.646]

100 Thresholded Lasso for high dimensional variable selection and statistical estimation. [sent-506, score-0.09]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('irrepresentability', 0.574), ('lasso', 0.332), ('supp', 0.263), ('gic', 0.25), ('selector', 0.238), ('zn', 0.21), ('gl', 0.158), ('sign', 0.157), ('signed', 0.143), ('cmin', 0.137), ('designs', 0.123), ('estimator', 0.109), ('rp', 0.094), ('covariates', 0.083), ('condition', 0.081), ('design', 0.081), ('dantzig', 0.076), ('superset', 0.076), ('recovers', 0.074), ('selection', 0.069), ('xt', 0.067), ('support', 0.067), ('vt', 0.066), ('covariance', 0.066), ('generalized', 0.066), ('cients', 0.065), ('coef', 0.064), ('correctly', 0.06), ('deterministic', 0.058), ('rows', 0.048), ('javanmard', 0.048), ('submatrix', 0.047), ('bolasso', 0.046), ('ncmin', 0.046), ('cand', 0.042), ('namely', 0.041), ('lemma', 0.04), ('entries', 0.04), ('notations', 0.039), ('covariate', 0.039), ('min', 0.038), ('thresholded', 0.038), ('orthogonal', 0.037), ('active', 0.036), ('conditions', 0.036), ('mini', 0.035), ('away', 0.035), ('proves', 0.035), ('gaussian', 0.035), ('letting', 0.034), ('prove', 0.034), ('stanford', 0.033), ('weaker', 0.033), ('geer', 0.033), ('ritov', 0.033), ('kkt', 0.033), ('eigenvalue', 0.033), ('stated', 0.032), ('regression', 0.032), ('pe', 0.032), ('genomics', 0.032), ('log', 0.031), ('holds', 0.03), ('columns', 0.03), ('hlmann', 0.029), ('slack', 0.029), ('minimizer', 0.029), ('minimum', 0.029), ('bickel', 0.029), ('zt', 0.028), ('incoherence', 0.028), ('sl', 0.028), ('remark', 0.027), ('wi', 0.027), ('meinshausen', 0.027), ('cu', 0.027), ('response', 0.026), ('develops', 0.026), ('restricted', 0.026), ('succeeds', 0.025), ('squares', 0.025), ('matrix', 0.025), ('consistency', 0.024), ('replacing', 0.024), ('albeit', 0.024), ('ca', 0.024), ('correct', 0.023), ('rt', 0.023), ('nonzero', 0.023), ('recovery', 0.023), ('covered', 0.023), ('vi', 0.023), ('singular', 0.022), ('uj', 0.022), ('xi', 0.022), ('establish', 0.022), ('identifying', 0.022), ('zhao', 0.021), ('high', 0.021), ('xn', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999958 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition

Author: Adel Javanmard, Andrea Montanari

Abstract: In the high-dimensional regression model a response variable is linearly related to p covariates, but the sample size n is smaller than p. We assume that only a small subset of covariates is ‘active’ (i.e., the corresponding coefficients are non-zero), and consider the model-selection problem of identifying the active covariates. A popular approach is to estimate the regression coefficients through the Lasso (ℓ1-regularized least squares). This is known to correctly identify the active set only if the irrelevant covariates are roughly orthogonal to the relevant ones, as quantified through the so called ‘irrepresentability’ condition. In this paper we study the ‘Gauss-Lasso’ selector, a simple two-stage method that first solves the Lasso, and then performs ordinary least squares restricted to the Lasso active set. We formulate ‘generalized irrepresentability condition’ (GIC), an assumption that is substantially weaker than irrepresentability. We prove that, under GIC, the Gauss-Lasso correctly recovers the active set. 1

2 0.29148036 109 nips-2013-Estimating LASSO Risk and Noise Level

Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari

Abstract: We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the ℓ1-penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the ℓ2 estimation risk ‖θ − θ0‖_2 and the variance of the noise when distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. We establish validity for a broader class of Gaussian designs, conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature. 1

3 0.25157732 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models

Author: Adel Javanmard, Andrea Montanari

Abstract: Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance as confidence intervals or p-values. We consider here a broad class of regression problems, and propose an efficient algorithm for constructing confidence intervals and p-values. The resulting confidence intervals have nearly optimal size. When testing for the null hypothesis that a certain parameter is vanishing, our method has nearly optimal power. Our approach is based on constructing a ‘de-biased’ version of regularized M-estimators. The new construction improves over recent work in the field in that it does not assume a special structure on the design matrix. Furthermore, proofs are remarkably simple. We test our method on a diabetes prediction problem. 1

4 0.19208738 219 nips-2013-On model selection consistency of penalized M-estimators: a geometric theory

Author: Jason Lee, Yuekai Sun, Jonathan E. Taylor

Abstract: Penalized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Often, the penalties are geometrically decomposable, i.e. can be expressed as a sum of support functions over convex sets. We generalize the notion of irrepresentable to geometrically decomposable penalties and develop a general framework for establishing consistency and model selection consistency of M-estimators with such penalties. We then use this framework to derive results for some special cases of interest in bioinformatics and statistical learning. 1

5 0.18645141 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms

Author: Fabian L. Wauthier, Nebojsa Jojic, Michael Jordan

Abstract: The Lasso is a cornerstone of modern multivariate data analysis, yet its performance suffers in the common situation in which covariates are correlated. This limitation has led to a growing number of Preconditioned Lasso algorithms that pre-multiply X and y by matrices PX , Py prior to running the standard Lasso. A direct comparison of these and similar Lasso-style algorithms to the original Lasso is difficult because the performance of all of these methods depends critically on an auxiliary penalty parameter λ. In this paper we propose an agnostic framework for comparing Preconditioned Lasso algorithms to the Lasso without having to choose λ. We apply our framework to three Preconditioned Lasso instances and highlight cases when they will outperform the Lasso. Additionally, our theory reveals fragilities of these algorithms to which we provide partial solutions. 1

6 0.12735887 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis

7 0.10988762 147 nips-2013-Lasso Screening Rules via Dual Polytope Projection

8 0.098142564 91 nips-2013-Dirty Statistical Models

9 0.093685821 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

10 0.08769083 354 nips-2013-When in Doubt, SWAP: High-Dimensional Sparse Recovery from Correlated Measurements

11 0.081579968 293 nips-2013-Sign Cauchy Projections and Chi-Square Kernel

12 0.07972306 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

13 0.075298473 3 nips-2013-A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables

14 0.070515588 188 nips-2013-Memory Limited, Streaming PCA

15 0.070149429 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

16 0.070024341 185 nips-2013-Matrix Completion From any Given Set of Observations

17 0.066873632 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching

18 0.06310226 247 nips-2013-Phase Retrieval using Alternating Minimization

19 0.062869512 321 nips-2013-Supervised Sparse Analysis and Synthesis Operators

20 0.062341519 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.178), (1, 0.064), (2, 0.114), (3, 0.069), (4, -0.039), (5, 0.092), (6, -0.045), (7, 0.046), (8, -0.194), (9, 0.007), (10, 0.147), (11, -0.069), (12, -0.142), (13, -0.19), (14, -0.199), (15, -0.111), (16, 0.12), (17, -0.099), (18, 0.002), (19, -0.038), (20, 0.036), (21, 0.073), (22, -0.063), (23, 0.062), (24, 0.031), (25, 0.097), (26, -0.055), (27, -0.03), (28, -0.043), (29, -0.025), (30, 0.066), (31, -0.008), (32, 0.046), (33, -0.029), (34, -0.044), (35, 0.049), (36, 0.013), (37, -0.0), (38, -0.033), (39, -0.084), (40, -0.021), (41, 0.014), (42, 0.062), (43, 0.02), (44, -0.014), (45, -0.02), (46, -0.045), (47, 0.094), (48, 0.005), (49, 0.003)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95334744 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition

Author: Adel Javanmard, Andrea Montanari

Abstract: In the high-dimensional regression model a response variable is linearly related to p covariates, but the sample size n is smaller than p. We assume that only a small subset of covariates is ‘active’ (i.e., the corresponding coefficients are non-zero), and consider the model-selection problem of identifying the active covariates. A popular approach is to estimate the regression coefficients through the Lasso (ℓ1-regularized least squares). This is known to correctly identify the active set only if the irrelevant covariates are roughly orthogonal to the relevant ones, as quantified through the so called ‘irrepresentability’ condition. In this paper we study the ‘Gauss-Lasso’ selector, a simple two-stage method that first solves the Lasso, and then performs ordinary least squares restricted to the Lasso active set. We formulate ‘generalized irrepresentability condition’ (GIC), an assumption that is substantially weaker than irrepresentability. We prove that, under GIC, the Gauss-Lasso correctly recovers the active set. 1

2 0.89396006 219 nips-2013-On model selection consistency of penalized M-estimators: a geometric theory

Author: Jason Lee, Yuekai Sun, Jonathan E. Taylor

Abstract: Penalized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Often, the penalties are geometrically decomposable, i.e. can be expressed as a sum of support functions over convex sets. We generalize the notion of irrepresentable to geometrically decomposable penalties and develop a general framework for establishing consistency and model selection consistency of M-estimators with such penalties. We then use this framework to derive results for some special cases of interest in bioinformatics and statistical learning. 1

3 0.87770212 109 nips-2013-Estimating LASSO Risk and Noise Level

Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari

Abstract: We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the ℓ1-penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the ℓ2 estimation risk ‖θ − θ0‖_2 and the variance of the noise when distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. We establish validity for a broader class of Gaussian designs, conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature. 1

4 0.8722266 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms

Author: Fabian L. Wauthier, Nebojsa Jojic, Michael Jordan

Abstract: The Lasso is a cornerstone of modern multivariate data analysis, yet its performance suffers in the common situation in which covariates are correlated. This limitation has led to a growing number of Preconditioned Lasso algorithms that pre-multiply X and y by matrices PX , Py prior to running the standard Lasso. A direct comparison of these and similar Lasso-style algorithms to the original Lasso is difficult because the performance of all of these methods depends critically on an auxiliary penalty parameter λ. In this paper we propose an agnostic framework for comparing Preconditioned Lasso algorithms to the Lasso without having to choose λ. We apply our framework to three Preconditioned Lasso instances and highlight cases when they will outperform the Lasso. Additionally, our theory reveals fragilities of these algorithms to which we provide partial solutions. 1

5 0.80182064 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis

Author: Nikhil Rao, Christopher Cox, Rob Nowak, Timothy T. Rogers

Abstract: Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available features can be organized into subsets according to a notion of similarity and (2) features useful in one task are similar, but not necessarily identical, to the features best suited for other tasks. The main contribution of this paper is a new procedure called Sparse Overlapping Sets (SOS) lasso, a convex optimization that automatically selects similar features for related learning tasks. Error bounds are derived for SOSlasso and its consistency is established for squared error loss. In particular, SOSlasso is motivated by multisubject fMRI studies in which functional activity is classified using brain voxels as features. Experiments with real and synthetic data demonstrate the advantages of SOSlasso compared to the lasso and group lasso. 1

6 0.76937056 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models

7 0.71445 354 nips-2013-When in Doubt, SWAP: High-Dimensional Sparse Recovery from Correlated Measurements

8 0.63110983 3 nips-2013-A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables

9 0.62272668 265 nips-2013-Reconciling "priors" & "priors" without prejudice?

10 0.57029825 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

11 0.52618992 147 nips-2013-Lasso Screening Rules via Dual Polytope Projection

12 0.51875317 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration

13 0.51661193 91 nips-2013-Dirty Statistical Models

14 0.47929195 130 nips-2013-Generalizing Analytic Shrinkage for Arbitrary Covariance Structures

15 0.45186955 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions

16 0.43382999 209 nips-2013-New Subsampling Algorithms for Fast Least Squares Regression

17 0.41158095 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning

18 0.40284723 225 nips-2013-One-shot learning and big data with n=2

19 0.40011013 290 nips-2013-Scoring Workers in Crowdsourcing: How Many Control Questions are Enough?

20 0.39764261 249 nips-2013-Polar Operators for Structured Sparse Estimation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(16, 0.034), (19, 0.045), (33, 0.163), (34, 0.104), (41, 0.012), (49, 0.032), (56, 0.132), (70, 0.024), (85, 0.024), (89, 0.118), (93, 0.028), (94, 0.176), (95, 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.87469637 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition

Author: Adel Javanmard, Andrea Montanari

Abstract: In the high-dimensional regression model a response variable is linearly related to p covariates, but the sample size n is smaller than p. We assume that only a small subset of covariates is ‘active’ (i.e., the corresponding coefficients are non-zero), and consider the model-selection problem of identifying the active covariates. A popular approach is to estimate the regression coefficients through the Lasso (ℓ1-regularized least squares). This is known to correctly identify the active set only if the irrelevant covariates are roughly orthogonal to the relevant ones, as quantified through the so called ‘irrepresentability’ condition. In this paper we study the ‘Gauss-Lasso’ selector, a simple two-stage method that first solves the Lasso, and then performs ordinary least squares restricted to the Lasso active set. We formulate ‘generalized irrepresentability condition’ (GIC), an assumption that is substantially weaker than irrepresentability. We prove that, under GIC, the Gauss-Lasso correctly recovers the active set. 1

2 0.83530205 74 nips-2013-Convex Tensor Decomposition via Structured Schatten Norm Regularization

Author: Ryota Tomioka, Taiji Suzuki

Abstract: We study a new class of structured Schatten norms for tensors that includes two recently proposed norms (“overlapped” and “latent”) for convex-optimizationbased tensor decomposition. We analyze the performance of “latent” approach for tensor decomposition, which was empirically found to perform better than the “overlapped” approach in some settings. We show theoretically that this is indeed the case. In particular, when the unknown true tensor is low-rank in a specific unknown mode, this approach performs as well as knowing the mode with the smallest rank. Along the way, we show a novel duality result for structured Schatten norms, which is also interesting in the general context of structured sparsity. We confirm through numerical simulations that our theory can precisely predict the scaling behaviour of the mean squared error. 1

3 0.82385647 325 nips-2013-The Pareto Regret Frontier

Author: Wouter M. Koolen

Abstract: Performance guarantees for online learning algorithms typically take the form of regret bounds, which express that the cumulative loss overhead compared to the best expert in hindsight is small. In the common case of large but structured expert sets we typically wish to keep the regret especially small compared to simple experts, at the cost of modest additional overhead compared to more complex others. We study which such regret trade-offs can be achieved, and how. We analyse regret w.r.t. each individual expert as a multi-objective criterion in the simple but fundamental case of absolute loss. We characterise the achievable and Pareto optimal trade-offs, and the corresponding optimal strategies for each sample size both exactly for each finite horizon and asymptotically. 1

4 0.81807971 32 nips-2013-Aggregating Optimistic Planning Trees for Solving Markov Decision Processes

Author: Gunnar Kedenburg, Raphael Fonteneau, Remi Munos

Abstract: This paper addresses the problem of online planning in Markov decision processes using a randomized simulator, under a budget constraint. We propose a new algorithm which is based on the construction of a forest of planning trees, where each tree corresponds to a random realization of the stochastic environment. The trees are constructed using a “safe” optimistic planning strategy combining the optimistic principle (in order to explore the most promising part of the search space first) with a safety principle (which guarantees a certain amount of uniform exploration). In the decision-making step of the algorithm, the individual trees are aggregated and an immediate action is recommended. We provide a finite-sample analysis and discuss the trade-off between the principles of optimism and safety. We also report numerical results on a benchmark problem. Our algorithm performs as well as state-of-the-art optimistic planning algorithms, and better than a related algorithm which additionally assumes the knowledge of all transition distributions. 1

5 0.81242514 91 nips-2013-Dirty Statistical Models

Author: Eunho Yang, Pradeep Ravikumar

Abstract: We provide a unified framework for the high-dimensional analysis of “superposition-structured” or “dirty” statistical models: where the model parameters are a superposition of structurally constrained parameters. We allow for any number and types of structures, and any statistical model. We consider the general class of M -estimators that minimize the sum of any loss function, and an instance of what we call a “hybrid” regularization, that is the infimal convolution of weighted regularization functions, one for each structural component. We provide corollaries showcasing our unified framework for varied statistical models such as linear regression, multiple regression and principal component analysis, over varied superposition structures. 1

6 0.81008035 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models

7 0.80536896 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles

8 0.80324954 109 nips-2013-Estimating LASSO Risk and Noise Level

9 0.80303675 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification

10 0.80114865 234 nips-2013-Online Variational Approximations to non-Exponential Family Change Point Models: With Application to Radar Tracking

11 0.80047172 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.

12 0.79881632 273 nips-2013-Reinforcement Learning in Robust Markov Decision Processes

13 0.79583526 305 nips-2013-Spectral methods for neural characterization using generalized quadratic models

14 0.79156446 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

15 0.79067576 83 nips-2013-Deep Fisher Networks for Large-Scale Image Classification

16 0.78601408 259 nips-2013-Provable Subspace Clustering: When LRR meets SSC

17 0.7824294 233 nips-2013-Online Robust PCA via Stochastic Optimization

18 0.78062427 177 nips-2013-Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation

19 0.78028488 237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels

20 0.77938712 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions