nips2013-109 reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari
Abstract: We study the fundamental problems of variance and risk estimation in high-dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the ℓ1-penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the ℓ2 estimation risk ‖θ̂ − θ0‖₂ and the variance of the noise when the distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein's unbiased risk estimate [Ste81] and the recent results of [BM12a, BM12b] on the analysis of approximate message passing and the risk of the LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions with independent Gaussian entries, and we establish validity for a broader class of Gaussian designs conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO based solely on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature.
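To make the objects in the abstract concrete, the following is a minimal sketch of the approximate message passing (AMP) iteration of [DMM09] for the LASSO, together with the effective noise level τ̂ that state evolution tracks. This is not the paper's full risk or variance estimator; the threshold multiplier alpha, the column normalization of X, and the function names are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    # Componentwise soft thresholding: eta(x; t) = sign(x) * max(|x| - t, 0).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def amp_lasso(X, y, alpha=1.5, n_iter=100):
    # AMP iteration of [DMM09] for the LASSO. Assumes the entries of X are
    # i.i.d. N(0, 1/n) so that state evolution applies; `alpha` is an
    # illustrative threshold multiplier, not a value prescribed by the paper.
    n, p = X.shape
    theta = np.zeros(p)
    z = y.copy()
    tau = np.linalg.norm(z) / np.sqrt(n)
    for _ in range(n_iter):
        # Effective noise level tau_t estimated from the current residual.
        tau = np.linalg.norm(z) / np.sqrt(n)
        # Denoising step applied to the pseudo-data theta^t + X^T z^t.
        theta = soft_threshold(theta + X.T @ z, alpha * tau)
        # Residual with the Onsager correction (1/delta) * <eta'> * z^{t-1},
        # where <eta'> = ||theta||_0 / p and delta = n / p, so the
        # coefficient simplifies to ||theta||_0 / n.
        z = y - X @ theta + (np.count_nonzero(theta) / n) * z
    return theta, tau

# Illustrative use on synthetic data (dimensions chosen arbitrarily):
n, p = 500, 1000
X = np.random.randn(n, p) / np.sqrt(n)
theta0 = np.zeros(p); theta0[:50] = 1.0
y = X @ theta0 + 0.1 * np.random.randn(n)
theta_hat, tau_hat = amp_lasso(X, y)
```

Roughly speaking, in the asymptotic theory of [BM12a, BM12b] the fixed point satisfies τ*² = σ² + (1/δ) · (per-coordinate ℓ2 risk), with δ = n/p; it is this state-evolution relation that lets an estimate of τ be converted into estimates of the noise variance σ² and of the risk, which is what the paper develops rigorously.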
[BC13] A. Belloni and V. Chernozhukov, Least Squares after Model Selection in High-Dimensional Sparse Models, Bernoulli (2013).
[BdG11] P. Bühlmann and S. van de Geer, Statistics for High-Dimensional Data, Springer-Verlag Berlin Heidelberg, 2011.
[BEM13] M. Bayati, M. A. Erdogdu, and A. Montanari, Estimating LASSO Risk and Noise Level, long version (in preparation), 2013.
[BM12a] M. Bayati and A. Montanari, The dynamics of message passing on dense graphs, with applications to compressed sensing, IEEE Trans. on Inform. Theory 57 (2012), 764–785.
[BM12b] M. Bayati and A. Montanari, The LASSO risk for Gaussian matrices, IEEE Trans. on Inform. Theory 58 (2012).
[BRT09] P. Bickel, Y. Ritov, and A. Tsybakov, Simultaneous Analysis of Lasso and Dantzig Selector, The Annals of Statistics 37 (2009), 1705–1732.
[BS05] Z. Bai and J. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, Springer, 2005.
[BT09] A. Beck and M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM J. Imaging Sciences 2 (2009), 183–202.
[BY93] Z. D. Bai and Y. Q. Yin, Limit of the Smallest Eigenvalue of a Large Dimensional Sample Covariance Matrix, The Annals of Probability 21 (1993), 1275–1294.
[CD95] S. S. Chen and D. L. Donoho, Examples of basis pursuit, Proceedings of Wavelet Applications in Signal and Image Processing III (San Diego, CA), 1995.
[CRT06] E. Candès, J. K. Romberg, and T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics 59 (2006), 1207–1223.
[CT07] E. Candès and T. Tao, The Dantzig selector: statistical estimation when p is much larger than n, Annals of Statistics 35 (2007), 2313–2351.
[DMM09] D. L. Donoho, A. Maleki, and A. Montanari, Message Passing Algorithms for Compressed Sensing, Proceedings of the National Academy of Sciences 106 (2009), 18914–18919.
[DMM11] D. L. Donoho, A. Maleki, and A. Montanari, The noise-sensitivity phase transition in compressed sensing, IEEE Trans. on Inform. Theory 57 (2011), no. 10, 6920–6941.
[FGH12] J. Fan, S. Guo, and N. Hao, Variance estimation using refitted cross-validation in ultrahigh dimensional regression, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74 (2012), 37–65.
[JM13] A. Javanmard and A. Montanari, Hypothesis testing in high-dimensional regression under the Gaussian random design model: asymptotic theory, preprint, arXiv:1301.4240, 2013.
[Joh12] I. Johnstone, Gaussian estimation: Sequence and wavelet models, book draft, 2012.
[MB06] N. Meinshausen and P. Bühlmann, High-dimensional graphs and variable selection with the lasso, The Annals of Statistics 34 (2006), no. 3, 1436–1462.
[NSvdG10] N. Städler, P. Bühlmann, and S. van de Geer, ℓ1-penalization for Mixture Regression Models (with discussion), Test 19 (2010), 209–285.
[RFG09] S. Rangan, A. K. Fletcher, and V. K. Goyal, Asymptotic analysis of MAP estimation via the replica method and applications to compressed sensing, 2009.
[SJKG07] S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, An Interior-Point Method for Large-Scale ℓ1-Regularized Least Squares, IEEE Journal on Selected Topics in Signal Processing 4 (2007), 606–617.
[Ste81] C. Stein, Estimation of the mean of a multivariate normal distribution, The Annals of Statistics 9 (1981), 1135–1151.
[SZ12] T. Sun and C. H. Zhang, Scaled sparse linear regression, Biometrika (2012), 1–20.
[Tib96] R. Tibshirani, Regression shrinkage and selection via the lasso, J. Royal Statist. Soc. B 58 (1996), 267–288.
[Wai09] M. J. Wainwright, Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming, IEEE Trans. on Inform. Theory 55 (2009), no. 5, 2183–2202.
[ZY06] P. Zhao and B. Yu, On model selection consistency of Lasso, The Journal of Machine Learning Research 7 (2006), 2541–2563.