
179 nips-2008-Phase transitions for high-dimensional joint support recovery


Source: pdf

Author: Sahand Negahban, Martin J. Wainwright

Abstract: Given a collection of r ≥ 2 linear regression problems in p dimensions, suppose that the regression coefficients share partially common supports. This set-up suggests the use of ℓ1/ℓ∞-regularized regression for joint estimation of the p × r matrix of regression coefficients. We analyze the high-dimensional scaling of ℓ1/ℓ∞-regularized quadratic programming, considering both consistency rates in ℓ∞-norm, and also how the minimal sample size n required for performing variable selection grows as a function of the model dimension, sparsity, and overlap between the supports. We begin by establishing bounds on the ℓ∞ error as well as sufficient conditions for exact variable selection, both for fixed design matrices and for designs drawn randomly from general Gaussian ensembles. These results show that the high-dimensional scaling of ℓ1/ℓ∞-regularization is qualitatively similar to that of ordinary ℓ1-regularization. Our second set of results applies to design matrices drawn from standard Gaussian ensembles, for which we provide a sharp set of necessary and sufficient conditions: the ℓ1/ℓ∞-regularized method undergoes a phase transition characterized by the rescaled sample size θ1,∞(n, p, s, α) = n / {(4 − 3α) s log(p − (2 − α)s)}. More precisely, for any δ > 0, the probability of successfully recovering both supports converges to 1 for scalings such that θ1,∞ ≥ 1 + δ, and converges to 0 for scalings for which θ1,∞ ≤ 1 − δ. An implication of this threshold is that use of ℓ1,∞-regularization yields improved statistical efficiency if the overlap parameter is large enough (α > 2/3), but performs worse than a naive Lasso-based approach for moderate to small overlap (α < 2/3). We illustrate the close agreement between these theoretical predictions and the actual behavior in simulations.
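As a concrete reading of the estimator described above, the joint regression can be posed as a convex program that penalizes the sum, over the p rows of the coefficient matrix, of each row's ℓ∞ norm; this block penalty couples the r problems so that a covariate tends to enter all regressions or none. The sketch below is only an illustration (cvxpy is not part of the paper, and the common design matrix, noise level, and regularization weight lam are assumptions made here for the example), showing one way to set up the ℓ1/ℓ∞-regularized quadratic program and read off the estimated joint support:

import numpy as np
import cvxpy as cp

# Illustrative sizes: n observations, p covariates, r tasks, s-sparse shared support.
n, p, r, s = 200, 128, 2, 8

rng = np.random.default_rng(0)
X = rng.standard_normal((n, p))              # common design from a standard Gaussian ensemble
B_true = np.zeros((p, r))
support = rng.choice(p, size=s, replace=False)
B_true[support] = rng.standard_normal((s, r))
Y = X @ B_true + 0.1 * rng.standard_normal((n, r))

# l1/l_inf block regularization: sum over rows of the row-wise l_inf norm.
lam = 0.1 * np.sqrt(np.log(p) / n)           # illustrative choice, not the paper's tuning
B = cp.Variable((p, r))
row_linf = cp.max(cp.abs(B), axis=1)         # length-p vector of row-wise l_inf norms
objective = 0.5 / n * cp.sum_squares(Y - X @ B) + lam * cp.sum(row_linf)
cp.Problem(cp.Minimize(objective)).solve()

# Declare a row to be in the joint support if its estimated l_inf norm is non-negligible.
support_hat = np.where(np.max(np.abs(B.value), axis=1) > 1e-3)[0]
print(sorted(support), sorted(support_hat))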

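The phase-transition result can also be read as a sample-size calculation: setting θ1,∞ = 1 and solving for n gives the critical sample size n ≈ (4 − 3α) s log(p − (2 − α)s) at which joint support recovery starts to succeed. The helper below evaluates this quantity and, for comparison, the ordinary Lasso threshold n ≈ 2 s log(p − s) from the single-problem analysis (cf. [10]); the particular values of p, s, and α are illustrative only, and the leading constants of the two thresholds coincide at α = 2/3, matching the crossover described in the abstract.

import numpy as np

def n_joint(p, s, alpha):
    # Sample size at the l1/l_inf phase transition, i.e. theta_{1,inf}(n, p, s, alpha) = 1.
    return (4.0 - 3.0 * alpha) * s * np.log(p - (2.0 - alpha) * s)

def n_lasso(p, s):
    # Approximate single-problem Lasso threshold n = 2 s log(p - s) (see [10]).
    return 2.0 * s * np.log(p - s)

p, s = 1024, 16
for alpha in (0.4, 2.0 / 3.0, 0.9, 1.0):
    nj, nl = n_joint(p, s, alpha), n_lasso(p, s)
    better = "l1/l_inf" if nj < nl else "Lasso"
    print(f"alpha={alpha:.2f}: joint threshold ~{nj:5.0f}, Lasso threshold ~{nl:5.0f} -> {better}")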

reference text

[1] V. V. Buldygin and Y. V. Kozachenko. Metric characterization of random variables and random processes. American Mathematical Society, Providence, RI, 2000.

[2] E. Candes and T. Tao. The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics, 2006.

[3] S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Computing, 20(1):33–61, 1998.

[4] D. L. Donoho and J. M. Tanner. Counting faces of randomly-projected polytopes when the projection radically lowers dimension. Technical report, Stanford University, 2006. Submitted to Journal of the AMS.

[5] S. Negahban and M. J. Wainwright. Joint support recovery under high-dimensional scaling: Benefits and perils of 1,∞ -regularization. Technical report, Department of Statistics, UC Berkeley, January 2009.

[6] G. Obozinski, B. Taskar, and M. Jordan. Joint covariate selection for grouped classification. Technical report, Statistics Department, UC Berkeley, 2007.

[7] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.

[8] J. A. Tropp, A. C. Gilbert, and M. J. Strauss. Algorithms for simultaneous sparse approximation. Signal Processing, 86:572–602, April 2006. Special issue on "Sparse approximations in signal and image processing".

[9] B. Turlach, W. N. Venables, and S. J. Wright. Simultaneous variable selection. Technometrics, 47:349–363, 2005.

[10] M. J. Wainwright. Sharp thresholds for high-dimensional and noisy recovery of sparsity using ℓ1-constrained quadratic programs. Technical Report 709, Department of Statistics, UC Berkeley, 2006.

[11] Y. Kim, J. Kim, and Y. Kim. Blockwise sparse regression. Statistica Sinica, 16(2), 2006.

[12] P. Zhao, G. Rocha, and B. Yu. Grouped and hierarchical model selection through composite absolute penalties. Technical report, Statistics Department, UC Berkeley, 2007.