
A Comparison of the Lasso and Marginal Regression


Source: pdf

Author: Christopher R. Genovese, Jiashun Jin, Larry Wasserman, Zhigang Yao

Abstract: The lasso is an important method for sparse, high-dimensional regression problems, with efficient algorithms available, a long history of practical success, and a large body of theoretical results supporting and explaining its performance. But even with the best available algorithms, finding the lasso solutions remains a computationally challenging task in cases where the number of covariates vastly exceeds the number of data points. Marginal regression, where the dependent variable is regressed separately on each covariate, offers a promising alternative in this case because the estimates can be computed roughly two orders of magnitude faster than the lasso solutions. The question that remains is how the statistical performance of the method compares to that of the lasso in these cases. In this paper, we study the relative statistical performance of the lasso and marginal regression for sparse, high-dimensional regression problems. We consider the problem of learning which coefficients are non-zero. Our main results are as follows: (i) we compare the conditions under which the lasso and marginal regression guarantee exact recovery in the fixed design, noise free case; (ii) we establish conditions under which marginal regression provides exact recovery with high probability in the fixed design, noise free, random coefficients case; and (iii) we derive rates of convergence for both procedures, where performance is measured by the number of coefficients with incorrect sign, and characterize the regions in the parameter space where recovery is and is not possible under this metric. In light of the computational advantages of marginal regression in very high dimensional problems, our theoretical and simulation results suggest that the procedure merits further study. Keywords: high-dimensional regression, lasso, phase diagram, regularization
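
The contrast the abstract draws is concrete: marginal regression reduces to a single matrix-vector product X^T y followed by thresholding, while the lasso solves a full ℓ1-penalized least-squares problem. Below is a minimal sketch of the noise-free support-recovery comparison; it is not the authors' experimental code, and the dimensions (n, p, s), the signal strength, the regularization level alpha, and the keep-the-s-largest selection rule for marginal regression are all illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the paper's experiments):
# compare the lasso and marginal regression for exact support recovery on a
# sparse, noise-free problem with p >> n.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 1000, 3                        # data points, covariates, non-zeros
X = rng.standard_normal((n, p)) / np.sqrt(n)  # random design, roughly unit-norm columns
beta = np.zeros(p)
beta[:s] = 3.0                                # true non-zero coefficients
y = X @ beta                                  # noise-free response

# Marginal regression: regress y on each covariate separately. With roughly
# unit-norm columns this is just the vector of correlations X^T y, computed
# in a single O(np) pass; here we keep the s largest, treating s as known.
marginal = X.T @ y
support_marginal = set(np.argsort(np.abs(marginal))[-s:].tolist())

# Lasso: solve the full l1-penalized least-squares problem by coordinate
# descent (sklearn objective: (1/(2n)) * ||y - Xb||^2 + alpha * ||b||_1).
lasso = Lasso(alpha=0.1, fit_intercept=False, max_iter=10000).fit(X, y)
support_lasso = set(np.flatnonzero(np.abs(lasso.coef_) > 1e-8).tolist())

true_support = set(range(s))
print("marginal regression recovers support:", support_marginal == true_support)
print("lasso recovers support:              ", support_lasso == true_support)
```

Whether either procedure succeeds depends on the design, the sparsity level, and the signal strength; characterizing exactly when each one does is the regime comparison the paper formalizes.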


reference text

P. Bühlmann, M. Kalisch, and M. H. Maathuis. Variable selection in high-dimensional linear models: partially faithful distributions and the PC-simple algorithm. Biometrika, 97:261–278, 2009.
T. Cai, L. Wang, and G. Xu. Shifting inequality and recovery of sparse signals. IEEE Transactions on Signal Processing, 59(3):1300–1308, 2010.
E. J. Candès and Y. Plan. Near-ideal model selection by ℓ1 minimization. The Annals of Statistics, 37:2145–2177, 2009.
E. J. Candès and T. Tao. The Dantzig selector: statistical estimation when p is much larger than n. The Annals of Statistics, 35:2313–2351, 2007.
S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33–61, 1998.
D. Donoho. For most large underdetermined systems of equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution. Communications on Pure and Applied Mathematics, 59(7):907–934, 2006.
D. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proceedings of the National Academy of Sciences of the United States of America, 100(5):2197–2202, 2003.
D. Donoho and X. Huo. Uncertainty principles and ideal atomic decomposition. IEEE Transactions on Information Theory, 47(7):2845–2862, 2001.
D. Donoho and J. Jin. Higher criticism for detecting sparse heterogeneous mixtures. The Annals of Statistics, 32(3):962–994, 2004.
B. Efron, R. Tibshirani, J. Storey, and V. Tusher. Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association, 96:1151–1160, 2001.
B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407–499, 2004.
J. Fan and J. Lv. Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5):849–911, 2008.
J. J. Fuchs. Recovery of exact sparse representations in the presence of noise. IEEE Transactions on Information Theory, 51(10):3601–3608, 2005.
P. Ji and J. Jin. UPS delivers optimal phase diagram in high dimensional variable selection. The Annals of Statistics, 40(1):73–103, 2012.
J. Jin. Proportion of nonzero normal means: oracle equivalence and uniformly consistent estimators. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(3):461–493, 2007.
K. Knight and W. J. Fu. Asymptotics for lasso-type estimators. The Annals of Statistics, 28:1356–1378, 2000.
N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3):1436–1462, 2006.
N. Meinshausen and J. Rice. Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. The Annals of Statistics, 34(1):373–393, 2006.
P. Ravikumar. Personal communication, 2007.
J. M. Robins, R. Scheines, P. Spirtes, and L. Wasserman. Uniform consistency in causal inference. Biometrika, 90(3):491–515, 2003.
G. R. Shorack and J. A. Wellner. Empirical Processes with Applications to Statistics. John Wiley & Sons, NY, 1986.
P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search (Lecture Notes in Statistics). Springer-Verlag, NY, 1993.
T. Sun and C.-H. Zhang. Scaled sparse linear regression. Manuscript available at http://arxiv.org/abs/1104.4595, 2011.
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 58(1):267–288, 1996.
J. Tropp. Greed is good: algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10):2231–2242, 2004.
R. Vershynin. Introduction to the Non-asymptotic Analysis of Random Matrices. Lecture notes, Department of Mathematics, University of Michigan, 2010. Available electronically via www-personal.umich.edu/~romanv/teaching/2006-07/280/course.html.
M. Wainwright. Sharp Threshold for High-dimensional and Noisy Recovery of Sparsity. Technical report, Department of Statistics, University of California, Berkeley, 2006.
L. Wasserman. All of Nonparametric Statistics. Springer Texts in Statistics. Springer, New York, 2006.
L. Wasserman and K. Roeder. High-dimensional variable selection. The Annals of Statistics, 37(5):2178–2201, 2009.
P. Zhao and B. Yu. On model selection consistency of lasso. Journal of Machine Learning Research, 7:2541–2563, 2006.
H. Zou. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476):1418–1429, 2006.