
144 nips-2009-Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness


Source: pdf

Author: Garvesh Raskutti, Bin Yu, Martin J. Wainwright

Abstract: We study minimax rates for estimating high-dimensional nonparametric regression models with sparse additive structure and smoothness constraints. More precisely, our goal is to estimate a function $f^* : \mathbb{R}^p \to \mathbb{R}$ that has an additive decomposition of the form $f^*(X_1, \ldots, X_p) = \sum_{j \in S} h^*_j(X_j)$, where each component function $h^*_j$ lies in some class $\mathcal{H}$ of “smooth” functions, and $S \subset \{1, \ldots, p\}$ is an unknown subset with cardinality $s = |S|$. Given $n$ i.i.d. observations of $f^*(X)$ corrupted with additive white Gaussian noise, where the covariate vectors $(X_1, X_2, X_3, \ldots, X_p)$ are drawn with i.i.d. components from some distribution $\mathbb{P}$, we determine lower bounds on the minimax rate for estimating the regression function with respect to squared $L^2(\mathbb{P})$ error. Our main result is a lower bound on the minimax rate that scales as $\max\bigl\{\tfrac{s \log(p/s)}{n},\; s\,\epsilon_n^2(\mathcal{H})\bigr\}$. The first term reflects the sample size required for performing subset selection, and is independent of the function class $\mathcal{H}$. The second term $s\,\epsilon_n^2(\mathcal{H})$ is an $s$-dimensional estimation term corresponding to the sample size required for estimating a sum of $s$ univariate functions, each chosen from the function class $\mathcal{H}$. It depends linearly on the sparsity index $s$ but is independent of the global dimension $p$. As a special case, if $\mathcal{H}$ corresponds to functions that are $m$-times differentiable (an $m$th-order Sobolev space), then the $s$-dimensional estimation term takes the form $s\,\epsilon_n^2(\mathcal{H}) \asymp s\, n^{-2m/(2m+1)}$. Either of the two terms may be dominant in different regimes, depending on the relation between the sparsity and smoothness of the additive decomposition.
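
For readability, the abstract's main claim can be restated in display form; note that the infimum/supremum framing over estimators and the generic constant hidden in $\gtrsim$ are assumed notation not spelled out in the abstract itself:

\[
\inf_{\widehat{f}} \; \sup_{f^*} \; \mathbb{E}\bigl\|\widehat{f} - f^*\bigr\|_{L^2(\mathbb{P})}^2
\;\gtrsim\; \max\Bigl\{ \frac{s \log(p/s)}{n}, \; s\,\epsilon_n^2(\mathcal{H}) \Bigr\},
\qquad
s\,\epsilon_n^2(\mathcal{H}) \asymp s\, n^{-2m/(2m+1)} \text{ in the $m$th-order Sobolev case.}
\]

As a concrete illustration of the two regimes, take $m = 2$: the estimation term is then $s\,n^{-4/5}$, so the subset-selection term $s\log(p/s)/n$ dominates precisely when $\log(p/s) \gtrsim n^{1/5}$, i.e. when the ambient dimension $p$ is very large relative to the sample size.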


reference text

[1] P. Bickel, Y. Ritov, and A. Tsybakov. Simultaneous analysis of the Lasso and Dantzig selector. Annals of Statistics, 2009. To appear.

[2] L. Birgé. Approximation dans les espaces métriques et théorie de l'estimation. Z. Wahrsch. verw. Gebiete, 65:181–327, 1983.

[3] M. S. Birman and M. Z. Solomjak. Piecewise-polynomial approximations of functions of the classes W_p^α. Math. USSR-Sbornik, 2(3):295–317, 1967.

[4] T. S. Han and S. Verdu. Generalizing the Fano inequality. IEEE Transactions on Information Theory, 40:1247–1251, 1994.

[5] R. Z. Has’minskii. A lower bound on the risks of nonparametric estimates of densities in the uniform metric. Theory Prob. Appl., 23:794–798, 1978.

[6] T. Hastie and R. Tibshirani. Generalized Additive Models. Chapman and Hall Ltd, Boca Raton, 1999.

[7] V. Koltchinskii and M. Yuan. Sparse recovery in large ensembles of kernel machines. In Proceedings of COLT, 2008.

[8] T. Kühn. A lower estimate for entropy numbers. Journal of Approximation Theory, 110:120–124, 2001.

[9] L. Meier, S. van de Geer, and P. Bühlmann. High-dimensional additive modeling. Annals of Statistics, To appear.

[10] N. Meinshausen and B. Yu. Lasso-type recovery of sparse representations for high-dimensional data. Annals of Statistics, 37(1):246–270, 2009.

[11] G. Raskutti, M. J. Wainwright, and B. Yu. Minimax rates of estimation for high-dimensional linear regression over ℓq -balls. Technical Report arXiv:0910.2042, UC Berkeley, Department of Statistics, 2009.

[12] P. Ravikumar, H. Liu, J. Lafferty, and L. Wasserman. Sparse additive models. Journal of the Royal Statistical Society, To appear.

[13] C. J. Stone. Optimal global rates of convergence for nonparametric regression. Annals of Statistics, 10:1040–1053, 1982.

[14] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.

[15] S. van de Geer. Empirical Processes in M-Estimation. Cambridge University Press, 2000.

[16] M. J. Wainwright. Information-theoretic bounds for sparsity recovery in the high-dimensional and noisy setting. IEEE Transactions on Information Theory, December 2009. Presented at the International Symposium on Information Theory, June 2007.

[17] Y. Yang and A. Barron. Information-theoretic determination of minimax rates of convergence. Annals of Statistics, 27(5):1564–1599, 1999.

[18] B. Yu. Assouad, Fano and Le Cam. Research Papers in Probability and Statistics: Festschrift in Honor of Lucien Le Cam, pages 423–435, 1996.

[19] C. H. Zhang and J. Huang. The sparsity and bias of the lasso selection in high-dimensional linear regression. Annals of Statistics, 36:1567–1594, 2008.