nips nips2013 nips2013-109 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari
Abstract: We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the ℓ1-penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the ℓ2 estimation risk ‖θ − θ0‖₂ and the variance of the noise when distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. We establish validity for a broader class of Gaussian designs, conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. [sent-5, score-0.283]
2 In this context, we develop new estimators for the ℓ2 estimation risk ‖θ − θ0‖₂ and the variance of the noise when distributions of θ0 and w are unknown. [sent-7, score-0.426]
3 Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. [sent-9, score-0.487]
4 We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. [sent-10, score-0.197]
5 To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. [sent-12, score-0.438]
6 1 Introduction In the Gaussian random design model for linear regression, we seek to reconstruct an unknown coefficient vector θ0 ∈ Rp from a vector of noisy linear measurements y ∈ Rn : y = Xθ0 + w, (1.1) [sent-14, score-0.182]
7 where X ∈ Rn×p is a measurement (or feature) matrix with iid rows generated through a multivariate normal density. [sent-15, score-0.198]
8 The noise vector, w, has iid entries with mean 0 and variance σ². [sent-16, score-0.312]
9 Research has established a number of important properties of the LASSO estimator under suitable conditions on the design matrix X, and for sufficiently sparse vectors θ0. [sent-25, score-0.432]
10 Under weaker conditions, such as restricted isometry or compatibility properties, correct recovery of the support can fail; however, the ℓ2 estimation error ‖θ − θ0‖₂ is of the same order as the one achieved by an oracle estimator that knows the support [CRT06, CT07, BRT09, BdG11]. [sent-27, score-0.357]
11 Finally, [DMM09, RFG09, BM12b] provided asymptotic formulas for MSE or other operating characteristics of θ, for Gaussian design matrices X. [sent-28, score-0.383]
12 Finally, estimating the noise level is necessary for applying these formulae, and this is in itself a challenging question. [sent-37, score-0.14]
13 The results of [DMM09, BM12b] provide exact asymptotic formulae for the risk, and its dependence on the regularization parameter λ. [sent-38, score-0.241]
14 The formulae of [DMM09, BM12b] depend on the empirical distribution¹ of the entries of θ0, which is of course unknown, as well as on the noise level². [sent-40, score-0.213]
15 A step towards the resolution of this problem was taken in [DMM11], which determined the least favorable noise level and distribution of entries, and hence suggested a prescription for λ, and a predicted risk in this case. [sent-41, score-0.345]
16 While this settles the question (in an asymptotic sense) from a minimax point of view, it would be preferable to have a prescription that is adaptive to the distribution of the entries of θ0 and to the noise level. [sent-42, score-0.332]
17 Our starting point is the asymptotic results of [DMM09, DMM11, BM12a, BM12b]. [sent-43, score-0.134]
18 The LASSO estimator θ is obtained by applying a denoiser function to θu . [sent-45, score-0.352]
19 We then use Stein’s Unbiased Risk Estimate (SURE) [Ste81] to derive an expression for the ℓ2 risk (mean squared error) of this operation. [sent-46, score-0.205]
20 Finally, by modifying this formula we obtain an estimator for the noise level. [sent-48, score-0.378]
21 We prove that these estimators are asymptotically consistent for sequences of design matrices X with converging aspect ratio and iid Gaussian entries. [sent-49, score-0.642]
22 In particular, for the case of general Gaussian design matrices, consistency holds conditionally on a conjectured formula stated in [JM13] on the basis of the “replica method” from statistical physics. [sent-51, score-0.258]
23 For the sake of concreteness, let us briefly describe our method in the case of standard Gaussian design, that is, when the design matrix X has iid Gaussian entries. [sent-52, score-0.481]
24 We construct the unbiased pseudo-data vector by θu = θ + Xᵀ(y − Xθ)/[n − ‖θ‖₀] . (1.3) [sent-53, score-0.176]
25 Our estimator of the mean squared error is derived by applying SURE to the unbiased pseudo-data. [sent-55, score-0.39]
26 In particular, our estimator is R(y, X, λ, τ) where R(y, X, λ, τ) ≡ τ²(2‖θ‖₀/p − 1) + ‖Xᵀ(y − Xθ)‖₂² / [p(n − ‖θ‖₀)²] . (1.4) [sent-56, score-0.25]
27 Here θ(λ; X, y) is the LASSO estimator and τ = ‖y − Xθ‖₂/[n − ‖θ‖₀]. [sent-57, score-0.25]
28 Our estimator of the noise level is σ²/n = τ² − R(y, X, λ, τ)/δ where δ = n/p. [sent-58, score-0.39]
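As a concrete illustration of the estimators just described, here is a minimal sketch (not the authors' code) of how θu, τ, R(y, X, λ, τ) and the noise-level estimate could be computed from a single LASSO fit; the use of scikit-learn and the penalty mapping alpha = λ/n are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_risk_and_noise(X, y, lam):
    # Hypothetical helper; the mapping lam -> alpha = lam / n is an assumed convention.
    n, p = X.shape
    delta = n / p
    theta = Lasso(alpha=lam / n, fit_intercept=False).fit(X, y).coef_
    df = np.count_nonzero(theta)                # ||theta||_0 (assumed < n)
    resid = y - X @ theta
    tau = np.linalg.norm(resid) / (n - df)      # tau = ||y - X theta||_2 / [n - ||theta||_0]
    theta_u = theta + X.T @ resid / (n - df)    # unbiased pseudo-data
    # SURE-based estimate of the LASSO mean squared error
    R = tau**2 * (2 * df / p - 1.0) + np.linalg.norm(X.T @ resid)**2 / (p * (n - df)**2)
    sigma2_over_n = tau**2 - R / delta          # noise-level estimate
    return theta_u, R, sigma2_over_n
```

Sweeping λ over a grid and picking the value that minimizes R(y, X, λ, τ) is one way such estimates could be used to choose the regularization parameter, as suggested in the abstract.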
29 Although our rigorous results are asymptotic in the problem dimensions, we show through numerical simulations that they are accurate already on problems with a few thousand variables. ¹The probability distribution that puts a point mass 1/p at each of the p entries of the vector. [sent-59, score-0.23]
30 ²Note that our definition of the noise level σ corresponds to σ√n in most of the compressed sensing literature. [sent-60, score-0.272]
31 Figure 1: Red represents the values estimated by our estimators and green represents the true values to be estimated. [sent-83, score-0.235]
32 2, X ∈ Rn×p with iid N1 (0, 1) entries where n = 4000. [sent-87, score-0.174]
33 We compare our approach with earlier work on the estimation of the noise level. [sent-93, score-0.136]
34 The authors of [NSvdG10] target this problem by using an ℓ1-penalized maximum log-likelihood estimator (PMLE); a related method called “Scaled Lasso” [SZ12] (also studied by [BC13]) considers an iterative algorithm to jointly estimate the noise level and θ0. [sent-94, score-0.39]
35 Under some conditions, the aforementioned studies provide consistency results for their noise level estimators. [sent-96, score-0.178]
36 We compare our estimator with these methods through extensive numerical simulations. [sent-97, score-0.25]
37 The necessary background on SURE and asymptotic distributional characterization of the LASSO is presented in Section 3. [sent-100, score-0.192]
38 We also analyze the behavior of our variance estimator as λ varies, along with four other methods. [sent-103, score-0.298]
39 We simulated across 50 replications; within each, we generated a new Gaussian design matrix X. [sent-111, score-0.182]
40 For each λ, a new signal θ0 and noise independent from X were generated. [sent-114, score-0.143]
41 Figure 2: Red represents the values estimated by our estimators and green represents the true values to be estimated. [sent-137, score-0.235]
42 2, rows of X ∈ Rn×p are iid from Np (0, Σ) where n = 5000 and Σ has entries 1 on the main diagonal, 0. [sent-141, score-0.212]
43 For each replication, we used a design matrix X where Xi,j ∼ N1 (0, 1). [sent-154, score-0.182]
44 For each replication, we used a design matrix X where each row is independently generated through Np (0, Σ) where Σ has 1 on the main diagonal and 0. [sent-160, score-0.182]
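A short sketch of how one such replication's design matrix could be generated; the off-diagonal correlation is left as a parameter rho because its exact value is truncated in this extraction.

```python
import numpy as np

def correlated_design(n, p, rho, seed=0):
    # Sigma has ones on the diagonal and a constant value rho off the diagonal (0 <= rho < 1).
    rng = np.random.default_rng(seed)
    Sigma = np.full((p, p), rho)
    np.fill_diagonal(Sigma, 1.0)
    L = np.linalg.cholesky(Sigma)               # rows of Z @ L.T are iid N_p(0, Sigma)
    return rng.standard_normal((n, p)) @ L.T
```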
45 As can be seen from the figures, the asymptotic theory applies quite well to the finite dimensional data. [sent-162, score-0.169]
46 For an appropriate sequence of non-linear denoisers {ηt}t≥0, the AMP algorithm constructs a sequence of estimates {θt}t≥0, pseudo-data {yt}t≥0, and residuals {εt}t≥0, where θt, yt ∈ Rp and εt ∈ Rn. [sent-166, score-0.249]
47 For the random variable Θ0 ∼ pθ0, a positive constant σ², and a given sequence of non-linear denoisers {ηt}t≥0, define the sequence {τt²}t≥0 iteratively by τt+1² = Ft(τt²) , Ft(τ²) ≡ σ² + (1/δ) E{[ηt(Θ0 + τZ) − Θ0]²} , (3. [sent-175, score-0.213]
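A Monte Carlo sketch of this state-evolution recursion; the 1/δ factor and the soft-thresholding denoiser with threshold ξt = ατt are filled in from the standard AMP literature, since the extracted display is garbled, so they should be read as assumptions.

```python
import numpy as np

def soft(x, xi):
    # soft-thresholding denoiser eta(x; xi)
    return np.sign(x) * np.maximum(np.abs(x) - xi, 0.0)

def state_evolution(prior_sample, sigma2, delta, alpha=2.0, iters=30, mc=200_000, seed=0):
    # prior_sample(m) must return m iid draws from the prior p_{Theta0}.
    rng = np.random.default_rng(seed)
    theta0 = prior_sample(mc)
    z = rng.standard_normal(mc)
    tau2 = sigma2 + np.mean(theta0**2) / delta  # crude initialization
    for _ in range(iters):
        xi = alpha * np.sqrt(tau2)
        mse = np.mean((soft(theta0 + np.sqrt(tau2) * z, xi) - theta0)**2)
        tau2 = sigma2 + mse / delta             # tau_{t+1}^2 = F_t(tau_t^2)
    return tau2
```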
48 It is shown in [BM12a] that the pseudo-data yt has the same asymptotic distribution as Θ0 + τt Z. [sent-180, score-0.134]
49 This result can be roughly interpreted as saying that the pseudo-data generated by AMP is the sum of the true signal and zero-mean, normally distributed noise. [sent-181, score-0.173]
50 The importance of this result will appear later when we use Stein’s method in order to obtain an estimator for the MSE and the variance of the noise. [sent-186, score-0.298]
51 We will use state evolution in order to describe the behavior of a specific type of converging sequence, defined as follows: Definition 1. [sent-187, score-0.314]
52 We provide rigorous results for the special class of converging sequences in which the entries of X are iid N1(0, 1) (i.e., the standard Gaussian design model). [sent-194, score-0.398]
53 We also establish validity (provided Conjecture 4.4 is correct) when the rows of X are iid multivariate normal Np(0, Σ) (i.e., the general Gaussian design model). [sent-198, score-0.198]
54 In order to discuss the LASSO connection for the AMP algorithm, we need to use a specific class of denoisers and apply an appropriate calibration to the state evolution. [sent-201, score-0.15]
55 We will use the AMP algorithm with the soft-thresholding denoiser ηt ( · ) = η( · ; ξt ) along with a suitable sequence of thresholds {ξt }t≥0 in order to obtain a connection to the LASSO. [sent-205, score-0.17]
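A sketch of the resulting AMP iteration; it assumes the design matrix has been rescaled to have (approximately) unit-norm columns, and the threshold rule ξt = ατt with a fixed α is an illustrative choice rather than the paper's calibration α(λ).

```python
import numpy as np

def soft(x, xi):
    return np.sign(x) * np.maximum(np.abs(x) - xi, 0.0)

def amp_soft_threshold(A, y, alpha=2.0, iters=30):
    n, p = A.shape
    theta = np.zeros(p)
    z = y.copy()                                   # residual epsilon^t
    for _ in range(iters):
        tau = np.linalg.norm(z) / np.sqrt(n)       # empirical proxy for tau_t
        pseudo = theta + A.T @ z                   # pseudo-data y^t
        theta = soft(pseudo, alpha * tau)
        # Onsager correction: (1/delta) * average of eta'(.) equals ||theta||_0 / n
        z = y - A @ theta + z * (np.count_nonzero(theta) / n)
    return theta
```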
56 Let {θ0(n), X(n), σ²(n)}n∈N be a converging sequence of instances of the standard Gaussian design model. [sent-214, score-0.45]
57 Denote the LASSO estimator of θ0(n) by θ(n, λ) and the unbiased pseudo-data generated by the LASSO by θu(n, λ) ≡ θ + Xᵀ(y − Xθ)/[n − ‖θ‖₀]. [sent-215, score-0.426]
58 The above theorem combined with the stationarity condition of the LASSO implies that the empirical distribution of {(θi, θ0,i)}, i = 1, …, p, converges weakly to the joint distribution of (η(Θ0 + τ∗Z; ξ∗), Θ0), where ξ∗ = α(λ)τ∗(α(λ)). [sent-217, score-0.163]
59 It is also important to emphasize a relation between the asymptotic MSE, τ∗², and the model variance. [sent-218, score-0.134]
60 Combining this with the state evolution recursion, almost surely, lim p→∞ ‖θ − θ0‖₂²/p = E{[η(Θ0 + τ∗Z; ξ∗) − Θ0]²} = δ(τ∗² − σ₀²) , (3.3) [sent-220, score-0.212]
61 which will be helpful to get an estimator for the noise level. [sent-221, score-0.34]
62 3 Stein’s Unbiased Risk Estimator In [Ste81], Stein proposed a method to estimate the risk of an almost arbitrary estimator of the mean of a multivariate normal vector. [sent-223, score-0.447]
63 Suppose that µ̂(x) ∈ Rn is an estimator of µ for which µ̂(x) = x + g(x), that g : Rn → Rn is weakly differentiable, and that ∀i, j ∈ [n], Eν[|xi gi(x)| + |xj gj(x)|] < ∞, where ν is the measure corresponding to the multivariate Gaussian distribution Nn(µ, V). [sent-228, score-0.372]
64 In the statistics literature, the above estimator is called “Stein’s Unbiased Risk Estimator” or SURE. [sent-233, score-0.25]
65 If we consider the risk of the soft-thresholding estimator η(xi; ξ) for µi when xi ∼ N1(µi, σ²) for i ∈ [m], the above formula suggests the functional S(x, η( · ; ξ))/m = σ² − (2σ²/m) Σ_{i=1}^m 1{|xi| ≤ ξ} + (1/m) Σ_{i=1}^m [min{|xi|, ξ}]² as an estimator of the corresponding MSE. [sent-236, score-0.692]
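A direct transcription of this functional as a hypothetical helper (the function name and interface are not from the paper):

```python
import numpy as np

def sure_soft_threshold(x, xi, sigma2):
    # S(x, eta(.; xi)) / m: estimated per-coordinate MSE of soft thresholding at level xi
    return (sigma2
            - 2.0 * sigma2 * np.mean(np.abs(x) <= xi)
            + np.mean(np.minimum(np.abs(x), xi)**2))
```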
66 Also, for y ∈ Rn and X ∈ Rn×p, denote by R(y, X, λ, τ) the estimator of the mean squared error of the LASSO, where R(y, X, λ, τ) ≡ (τ²/p)(2‖θ‖₀ − p) + ‖Xᵀ(y − Xθ)‖₂² / [p(n − ‖θ‖₀)²] . [sent-242, score-0.301]
67 We are now ready to state the following theorem on the asymptotic MSE of the AMP: Theorem 4. [sent-245, score-0.218]
68 Let {θ0(n), X(n), σ²(n)}n∈N be a converging sequence of instances of the standard Gaussian design model. [sent-247, score-0.45]
69 Denote the sequence of estimators of θ0(n) by {θt(n)}t≥0, the pseudo-data by {yt(n)}t≥0, and the residuals by {εt(n)}t≥0, produced by the AMP algorithm using the sequence of Lipschitz continuous functions {ηt}t≥0 as in Eq. [sent-248, score-0.347]
70 More precisely, with probability one, lim n→∞ ‖θt+1 − θ0‖₂²/p(n) = lim n→∞ Rηt(yt, τt) . (4.1) [sent-252, score-0.232]
71 In other words, Rηt(yt, τt) is a consistent estimator of the asymptotic mean squared error of the AMP algorithm at iteration t + 1. [sent-254, score-0.469]
72 The above theorem allows us to accurately predict how far the AMP estimate is from the true signal at iteration t + 1, and this can be utilized as a stopping rule for the AMP algorithm. [sent-255, score-0.161]
73 Let {θ0(n), X(n), σ²(n)}n∈N be a converging sequence of instances of the standard Gaussian design model. [sent-264, score-0.45]
74 Then, with probability one, lim n→∞ ‖θ − θ0‖₂²/p(n) = lim n→∞ R(y, X, λ, τ) , where τ = ‖y − Xθ‖₂/[n − ‖θ‖₀]. [sent-266, score-0.232]
75 In other words, R(y, X, λ, τ ) is a consistent estimator of the asymptotic mean squared error of the LASSO. [sent-267, score-0.469]
76 Theorem 4.2 enables us to assess the quality of the LASSO estimation without knowing the true signal itself or the noise (or their distribution). [sent-269, score-0.219]
77 In the standard Gaussian design model, the variance of the noise can be accurately estimated by σ²/n ≡ τ² − R(y, X, λ, τ)/δ, where δ = n/p and the other variables are defined as in Theorem 4.2. [sent-275, score-0.39]
78 The estimate converges to the true noise variance almost surely, providing us with a consistent estimator for the variance of the noise in the LASSO. [sent-278, score-0.422]
79 Additionally, using the exponential convergence of the AMP algorithm for the standard Gaussian design model, proved by [BM12b], one can use O(log(1/ε)) iterations of the AMP algorithm and Theorem 4. [sent-285, score-0.281]
80 We devised our estimators based on the standard Gaussian design model. [sent-289, score-0.27]
81 Let {Ω(n)}n∈N be a sequence of inverse covariance matrices. [sent-292, score-0.134]
82 Define the general Gaussian design model by the converging sequence of instances {θ0(n), X(n), σ²(n)}n∈N where, for each n, the rows of the design matrix X(n) are iid multivariate Gaussian. [sent-293, score-0.83]
83 Let {θ0(n), X(n), σ²(n)}n∈N be a converging sequence of instances under the general Gaussian design model with a sequence of proper inverse covariance matrices {Ω(n)}n∈N. [sent-298, score-0.62]
84 Denote the LASSO estimator of θ0(n) by θ(n, λ) and the LASSO pseudo-data by θu(n, λ) ≡ θ + ΩXᵀ(y − Xθ)/[n − ‖θ‖₀]. [sent-300, score-0.25]
85 A heuristic justification of this conjecture using the replica method from statistical physics is offered in [JM13]. [sent-303, score-0.228]
86 Using the above conjecture, we define the following generalized estimator of the linearly transformed risk under the general Gaussian design model. [sent-304, score-0.586]
87 The construction of the estimator is essentially the same as before. [sent-305, score-0.25]
88 A notable case is V = I, which corresponds to the mean squared error of the LASSO for the general Gaussian design; the estimator R(y, X, λ, τ) is just a special case of the estimator ΓΩ(y, X, τ, λ, V). [sent-314, score-0.733]
89 Let {θ0(n), X(n), σ²(n)}n∈N be a converging sequence of instances of the general Gaussian design model with the inverse covariance matrices {Ω(n)}n∈N. [sent-320, score-0.552]
90 If Conjecture 4.4 holds, then, with probability one, lim n→∞ ‖θ − θ0‖₂²/p(n) = lim n→∞ ΓΩ(y, X, τ, λ, I) , where τ = ‖y − Xθ‖₂/[n − ‖θ‖₀]. [sent-323, score-0.232]
91 In other words, ΓΩ (y, X, τ , λ, I) is a consistent estimator of the asymptotic MSE of the LASSO. [sent-324, score-0.418]
92 In fact, for the general case, the replica method suggests the relation lim n→∞ ‖Ω^(−1/2)(θ − θ0)‖₂²/p(n) = δ(τ² − σ₀²). [sent-326, score-0.218]
93 We state the following result on the general Gaussian design model. [sent-328, score-0.221]
94 In the general Gaussian design model, the variance of the noise can be accurately estimated by σ̂²(n, Ω)/n ≡ τ² − ΓΩ(y, X, τ, λ, Ω^(−1/2))/δ , where δ = n/p and the other variables are defined as in Theorem 4.5. [sent-333, score-0.39]
95 Also, we have lim n→∞ σ̂²/n = σ₀², almost surely, providing us a consistent estimator for the noise level in the LASSO. [sent-335, score-0.54]
96 The proof follows arguments similar to those in the standard Gaussian design model. [sent-341, score-0.182]
97 Montanari, The dynamics of message passing on dense graphs, with applications to compressed sensing, IEEE Trans. [sent-358, score-0.16]
98 [BM12b], The LASSO risk for Gaussian matrices, IEEE Trans. [sent-361, score-0.253]
99 Montanari, Hypothesis testing in high-dimensional regression under the Gaussian random design model: Asymptotic theory, preprint available at arXiv:1301. [sent-406, score-0.281]
100 Goyal, Asymptotic analysis of MAP estimation via the replica method and applications to compressed sensing, 2009. [sent-423, score-0.218]
wordName wordTfidf (topN-words)
[('lasso', 0.453), ('amp', 0.331), ('mse', 0.293), ('estimator', 0.25), ('design', 0.182), ('stein', 0.155), ('risk', 0.154), ('converging', 0.15), ('asymptotic', 0.134), ('conjecture', 0.126), ('iid', 0.117), ('lim', 0.116), ('pmle', 0.116), ('rcv', 0.116), ('denoiser', 0.102), ('replica', 0.102), ('gaussian', 0.099), ('rn', 0.097), ('bayati', 0.094), ('noise', 0.09), ('unbiased', 0.089), ('estimators', 0.088), ('sure', 0.088), ('montanari', 0.088), ('erdogdu', 0.087), ('pseudodata', 0.087), ('weakly', 0.079), ('denoisers', 0.077), ('rp', 0.07), ('compressed', 0.07), ('sequence', 0.068), ('formulae', 0.066), ('sensing', 0.062), ('bpdn', 0.058), ('ndes', 0.058), ('distributional', 0.058), ('evolution', 0.057), ('entries', 0.057), ('hlmann', 0.056), ('np', 0.056), ('signal', 0.053), ('corollary', 0.052), ('pursuit', 0.051), ('confidence', 0.051), ('prescription', 0.051), ('squared', 0.051), ('instances', 0.05), ('scaled', 0.05), ('level', 0.05), ('variance', 0.048), ('wss', 0.047), ('estimation', 0.046), ('message', 0.045), ('theorem', 0.045), ('passing', 0.045), ('replication', 0.044), ('multivariate', 0.043), ('bai', 0.042), ('geer', 0.042), ('tted', 0.042), ('regularization', 0.041), ('surely', 0.041), ('bands', 0.04), ('selector', 0.04), ('color', 0.04), ('state', 0.039), ('rigorous', 0.039), ('dantzig', 0.039), ('dg', 0.039), ('converges', 0.039), ('consistency', 0.038), ('rows', 0.038), ('formula', 0.038), ('estimated', 0.037), ('ft', 0.037), ('residuals', 0.036), ('matrices', 0.036), ('inverse', 0.036), ('asymptotics', 0.035), ('reader', 0.035), ('sequences', 0.035), ('dimensional', 0.035), ('remark', 0.035), ('ei', 0.035), ('consistent', 0.034), ('prescribed', 0.034), ('calibration', 0.034), ('donoho', 0.033), ('tao', 0.033), ('accurately', 0.033), ('annals', 0.032), ('wavelet', 0.032), ('stanford', 0.032), ('squares', 0.032), ('versus', 0.031), ('oracle', 0.031), ('formulas', 0.031), ('true', 0.03), ('covariance', 0.03), ('recovery', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 109 nips-2013-Estimating LASSO Risk and Noise Level
Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari
Abstract: We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the ℓ1-penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the ℓ2 estimation risk ‖θ − θ0‖₂ and the variance of the noise when distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. We establish validity for a broader class of Gaussian designs, conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature.
2 0.29148036 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
Author: Adel Javanmard, Andrea Montanari
Abstract: In the high-dimensional regression model a response variable is linearly related to p covariates, but the sample size n is smaller than p. We assume that only a small subset of covariates is ‘active’ (i.e., the corresponding coefficients are non-zero), and consider the model-selection problem of identifying the active covariates. A popular approach is to estimate the regression coefficients through the Lasso (ℓ1-regularized least squares). This is known to correctly identify the active set only if the irrelevant covariates are roughly orthogonal to the relevant ones, as quantified through the so called ‘irrepresentability’ condition. In this paper we study the ‘Gauss-Lasso’ selector, a simple two-stage method that first solves the Lasso, and then performs ordinary least squares restricted to the Lasso active set. We formulate ‘generalized irrepresentability condition’ (GIC), an assumption that is substantially weaker than irrepresentability. We prove that, under GIC, the Gauss-Lasso correctly recovers the active set.
3 0.2694163 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
Author: Adel Javanmard, Andrea Montanari
Abstract: Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance as confidence intervals or p-values. We consider here a broad class of regression problems, and propose an efficient algorithm for constructing confidence intervals and p-values. The resulting confidence intervals have nearly optimal size. When testing for the null hypothesis that a certain parameter is vanishing, our method has nearly optimal power. Our approach is based on constructing a ‘de-biased’ version of regularized Mestimators. The new construction improves over recent work in the field in that it does not assume a special structure on the design matrix. Furthermore, proofs are remarkably simple. We test our method on a diabetes prediction problem. 1
4 0.26582372 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing
Author: Ryosuke Matsushita, Toshiyuki Tanaka
Abstract: We study the problem of reconstructing low-rank matrices from their noisy observations. We formulate the problem in the Bayesian framework, which allows us to exploit structural properties of matrices in addition to low-rankedness, such as sparsity. We propose an efficient approximate message passing algorithm, derived from the belief propagation algorithm, to perform the Bayesian inference for matrix reconstruction. We have also successfully applied the proposed algorithm to a clustering problem, by reformulating it as a low-rank matrix reconstruction problem with an additional structural property. Numerical experiments show that the proposed algorithm outperforms Lloyd’s K-means algorithm. 1
5 0.21542297 219 nips-2013-On model selection consistency of penalized M-estimators: a geometric theory
Author: Jason Lee, Yuekai Sun, Jonathan E. Taylor
Abstract: Penalized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Often, the penalties are geometrically decomposable, i.e. can be expressed as a sum of support functions over convex sets. We generalize the notion of irrepresentable to geometrically decomposable penalties and develop a general framework for establishing consistency and model selection consistency of M-estimators with such penalties. We then use this framework to derive results for some special cases of interest in bioinformatics and statistical learning. 1
6 0.19582835 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms
7 0.15834895 59 nips-2013-Blind Calibration in Compressed Sensing using Message Passing Algorithms
8 0.15702257 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis
9 0.13555217 147 nips-2013-Lasso Screening Rules via Dual Polytope Projection
10 0.13364838 290 nips-2013-Scoring Workers in Crowdsourcing: How Many Control Questions are Enough?
11 0.12968524 196 nips-2013-Modeling Overlapping Communities with Node Popularities
12 0.1131428 171 nips-2013-Learning with Noisy Labels
13 0.10691227 265 nips-2013-Reconciling "priors" & "priors" without prejudice?
14 0.10616366 108 nips-2013-Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion
15 0.10471921 3 nips-2013-A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables
16 0.10363677 91 nips-2013-Dirty Statistical Models
17 0.096578203 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima
18 0.087278731 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
19 0.085664652 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
20 0.08529526 354 nips-2013-When in Doubt, SWAP: High-Dimensional Sparse Recovery from Correlated Measurements
topicId topicWeight
[(0, 0.22), (1, 0.082), (2, 0.11), (3, 0.078), (4, -0.036), (5, 0.128), (6, -0.021), (7, 0.039), (8, -0.213), (9, 0.016), (10, 0.214), (11, -0.054), (12, -0.136), (13, -0.251), (14, -0.241), (15, -0.152), (16, 0.143), (17, -0.12), (18, -0.018), (19, -0.058), (20, 0.055), (21, 0.03), (22, -0.081), (23, 0.004), (24, 0.062), (25, 0.153), (26, -0.064), (27, 0.036), (28, 0.096), (29, -0.128), (30, 0.037), (31, 0.04), (32, 0.096), (33, 0.005), (34, 0.039), (35, -0.125), (36, 0.017), (37, -0.015), (38, -0.012), (39, 0.029), (40, 0.069), (41, 0.014), (42, 0.039), (43, -0.028), (44, -0.049), (45, -0.014), (46, -0.032), (47, 0.093), (48, -0.018), (49, 0.083)]
simIndex simValue paperId paperTitle
same-paper 1 0.95492744 109 nips-2013-Estimating LASSO Risk and Noise Level
Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari
Abstract: We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the ℓ1-penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the ℓ2 estimation risk ‖θ − θ0‖₂ and the variance of the noise when distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. We establish validity for a broader class of Gaussian designs, conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature.
2 0.83221614 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
Author: Adel Javanmard, Andrea Montanari
Abstract: In the high-dimensional regression model a response variable is linearly related to p covariates, but the sample size n is smaller than p. We assume that only a small subset of covariates is ‘active’ (i.e., the corresponding coefficients are non-zero), and consider the model-selection problem of identifying the active covariates. A popular approach is to estimate the regression coefficients through the Lasso (ℓ1-regularized least squares). This is known to correctly identify the active set only if the irrelevant covariates are roughly orthogonal to the relevant ones, as quantified through the so called ‘irrepresentability’ condition. In this paper we study the ‘Gauss-Lasso’ selector, a simple two-stage method that first solves the Lasso, and then performs ordinary least squares restricted to the Lasso active set. We formulate ‘generalized irrepresentability condition’ (GIC), an assumption that is substantially weaker than irrepresentability. We prove that, under GIC, the Gauss-Lasso correctly recovers the active set.
3 0.72090548 219 nips-2013-On model selection consistency of penalized M-estimators: a geometric theory
Author: Jason Lee, Yuekai Sun, Jonathan E. Taylor
Abstract: Penalized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Often, the penalties are geometrically decomposable, i.e. can be expressed as a sum of support functions over convex sets. We generalize the notion of irrepresentable to geometrically decomposable penalties and develop a general framework for establishing consistency and model selection consistency of M-estimators with such penalties. We then use this framework to derive results for some special cases of interest in bioinformatics and statistical learning. 1
4 0.71001416 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms
Author: Fabian L. Wauthier, Nebojsa Jojic, Michael Jordan
Abstract: The Lasso is a cornerstone of modern multivariate data analysis, yet its performance suffers in the common situation in which covariates are correlated. This limitation has led to a growing number of Preconditioned Lasso algorithms that pre-multiply X and y by matrices PX , Py prior to running the standard Lasso. A direct comparison of these and similar Lasso-style algorithms to the original Lasso is difficult because the performance of all of these methods depends critically on an auxiliary penalty parameter λ. In this paper we propose an agnostic framework for comparing Preconditioned Lasso algorithms to the Lasso without having to choose λ. We apply our framework to three Preconditioned Lasso instances and highlight cases when they will outperform the Lasso. Additionally, our theory reveals fragilities of these algorithms to which we provide partial solutions. 1
5 0.6952703 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
Author: Adel Javanmard, Andrea Montanari
Abstract: Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance as confidence intervals or p-values. We consider here a broad class of regression problems, and propose an efficient algorithm for constructing confidence intervals and p-values. The resulting confidence intervals have nearly optimal size. When testing for the null hypothesis that a certain parameter is vanishing, our method has nearly optimal power. Our approach is based on constructing a ‘de-biased’ version of regularized Mestimators. The new construction improves over recent work in the field in that it does not assume a special structure on the design matrix. Furthermore, proofs are remarkably simple. We test our method on a diabetes prediction problem. 1
6 0.65938658 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis
7 0.64250326 354 nips-2013-When in Doubt, SWAP: High-Dimensional Sparse Recovery from Correlated Measurements
8 0.59509748 3 nips-2013-A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables
9 0.59428799 265 nips-2013-Reconciling "priors" & "priors" without prejudice?
10 0.56277925 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration
11 0.54901618 180 nips-2013-Low-rank matrix reconstruction and clustering via approximate message passing
12 0.52393293 290 nips-2013-Scoring Workers in Crowdsourcing: How Many Control Questions are Enough?
13 0.518471 59 nips-2013-Blind Calibration in Compressed Sensing using Message Passing Algorithms
14 0.51058412 130 nips-2013-Generalizing Analytic Shrinkage for Arbitrary Covariance Structures
15 0.45758635 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima
16 0.45603091 209 nips-2013-New Subsampling Algorithms for Fast Least Squares Regression
17 0.44479877 225 nips-2013-One-shot learning and big data with n=2
18 0.44207311 147 nips-2013-Lasso Screening Rules via Dual Polytope Projection
19 0.42825371 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning
20 0.41629502 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
topicId topicWeight
[(16, 0.035), (19, 0.02), (33, 0.12), (34, 0.086), (41, 0.017), (49, 0.019), (56, 0.12), (70, 0.017), (85, 0.021), (89, 0.441), (93, 0.025)]
simIndex simValue paperId paperTitle
1 0.8697378 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification
Author: George H. Chen, Stanislav Nikolov, Devavrat Shah
Abstract: For classifying time series, a nearest-neighbor approach is widely used in practice with performance often competitive with or better than more elaborate methods such as neural networks, decision trees, and support vector machines. We develop theoretical justification for the effectiveness of nearest-neighbor-like classification of time series. Our guiding hypothesis is that in many applications, such as forecasting which topics will become trends on Twitter, there aren’t actually that many prototypical time series to begin with, relative to the number of time series we have access to, e.g., topics become trends on Twitter only in a few distinct manners whereas we can collect massive amounts of Twitter data. To operationalize this hypothesis, we propose a latent source model for time series, which naturally leads to a “weighted majority voting” classification rule that can be approximated by a nearest-neighbor classifier. We establish nonasymptotic performance guarantees of both weighted majority voting and nearest-neighbor classification under our model accounting for how much of the time series we observe and the model complexity. Experimental results on synthetic data show weighted majority voting achieving the same misclassification rate as nearest-neighbor classification while observing less of the time series. We then use weighted majority to forecast which news topics on Twitter become trends, where we are able to detect such “trending topics” in advance of Twitter 79% of the time, with a mean early advantage of 1 hour and 26 minutes, a true positive rate of 95%, and a false positive rate of 4%. 1
same-paper 2 0.86860806 109 nips-2013-Estimating LASSO Risk and Noise Level
Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari
Abstract: We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the ℓ1-penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the ℓ2 estimation risk ‖θ − θ0‖₂ and the variance of the noise when distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. We establish validity for a broader class of Gaussian designs, conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature.
3 0.86453921 305 nips-2013-Spectral methods for neural characterization using generalized quadratic models
Author: Il M. Park, Evan W. Archer, Nicholas Priebe, Jonathan W. Pillow
Abstract: We describe a set of fast, tractable methods for characterizing neural responses to high-dimensional sensory stimuli using a model we refer to as the generalized quadratic model (GQM). The GQM consists of a low-rank quadratic function followed by a point nonlinearity and exponential-family noise. The quadratic function characterizes the neuron’s stimulus selectivity in terms of a set linear receptive fields followed by a quadratic combination rule, and the invertible nonlinearity maps this output to the desired response range. Special cases of the GQM include the 2nd-order Volterra model [1, 2] and the elliptical Linear-Nonlinear-Poisson model [3]. Here we show that for “canonical form” GQMs, spectral decomposition of the first two response-weighted moments yields approximate maximumlikelihood estimators via a quantity called the expected log-likelihood. The resulting theory generalizes moment-based estimators such as the spike-triggered covariance, and, in the Gaussian noise case, provides closed-form estimators under a large class of non-Gaussian stimulus distributions. We show that these estimators are fast and provide highly accurate estimates with far lower computational cost than full maximum likelihood. Moreover, the GQM provides a natural framework for combining multi-dimensional stimulus sensitivity and spike-history dependencies within a single model. We show applications to both analog and spiking data using intracellular recordings of V1 membrane potential and extracellular recordings of retinal spike trains. 1
4 0.84449565 91 nips-2013-Dirty Statistical Models
Author: Eunho Yang, Pradeep Ravikumar
Abstract: We provide a unified framework for the high-dimensional analysis of “superposition-structured” or “dirty” statistical models: where the model parameters are a superposition of structurally constrained parameters. We allow for any number and types of structures, and any statistical model. We consider the general class of M -estimators that minimize the sum of any loss function, and an instance of what we call a “hybrid” regularization, that is the infimal convolution of weighted regularization functions, one for each structural component. We provide corollaries showcasing our unified framework for varied statistical models such as linear regression, multiple regression and principal component analysis, over varied superposition structures. 1
5 0.75405151 83 nips-2013-Deep Fisher Networks for Large-Scale Image Classification
Author: Karen Simonyan, Andrea Vedaldi, Andrew Zisserman
Abstract: As massively parallel computations have become broadly available with modern GPUs, deep architectures trained on very large datasets have risen in popularity. Discriminatively trained convolutional neural networks, in particular, were recently shown to yield state-of-the-art performance in challenging image classification benchmarks such as ImageNet. However, elements of these architectures are similar to standard hand-crafted representations used in computer vision. In this paper, we explore the extent of this analogy, proposing a version of the stateof-the-art Fisher vector image encoding that can be stacked in multiple layers. This architecture significantly improves on standard Fisher vectors, and obtains competitive results with deep convolutional networks at a smaller computational learning cost. Our hybrid architecture allows us to assess how the performance of a conventional hand-crafted image classification pipeline changes with increased depth. We also show that convolutional networks and Fisher vector encodings are complementary in the sense that their combination further improves the accuracy. 1
7 0.67016751 273 nips-2013-Reinforcement Learning in Robust Markov Decision Processes
8 0.62328213 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
9 0.60491419 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.
10 0.60117257 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
11 0.5858236 237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels
12 0.5756256 130 nips-2013-Generalizing Analytic Shrinkage for Arbitrary Covariance Structures
13 0.57194382 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models
14 0.57042909 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
15 0.56778103 147 nips-2013-Lasso Screening Rules via Dual Polytope Projection
16 0.5553565 80 nips-2013-Data-driven Distributionally Robust Polynomial Optimization
17 0.5530501 259 nips-2013-Provable Subspace Clustering: When LRR meets SSC
18 0.55091041 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
19 0.55063617 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration
20 0.54258436 177 nips-2013-Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation