168 nips-2005-Rodeo: Sparse Nonparametric Regression in High Dimensions


Source: pdf

Author: Larry Wasserman, John D. Lafferty

Abstract: We present a method for nonparametric regression that performs bandwidth selection and variable selection simultaneously. The approach is based on the technique of incrementally decreasing the bandwidth in directions where the gradient of the estimator with respect to bandwidth is large. When the unknown function satisfies a sparsity condition, our approach avoids the curse of dimensionality, achieving the optimal minimax rate of convergence, up to logarithmic factors, as if the relevant variables were known in advance. The method, called rodeo (regularization of derivative expectation operator), conducts a sequence of hypothesis tests and is easy to implement. A modified version that replaces hard with soft thresholding effectively solves a sequence of lasso problems.
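The bandwidth-shrinking idea in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: it swaps in a local-constant (Nadaraya-Watson) estimator with a Gaussian product kernel, treats the noise level `sigma` as known, and assumes a hard threshold of the form s_j * sqrt(2 log n). The function name `rodeo_local_constant` and the defaults `h0`, `beta`, and `max_steps` are hypothetical choices made for illustration.

```python
import numpy as np

def rodeo_local_constant(X, y, x, sigma, h0=1.0, beta=0.9, max_steps=200):
    """Illustrative hard-thresholding bandwidth search at a single point x.

    Assumptions (not from the source): local-constant estimator, Gaussian
    product kernel, known noise level sigma, threshold s_j * sqrt(2 log n).
    """
    n, d = X.shape
    h = np.full(d, h0)                      # start with large bandwidths
    active = np.ones(d, dtype=bool)         # coordinates still being tested
    sq = (X - x) ** 2                       # (n, d) squared coordinate distances
    lam_scale = np.sqrt(2.0 * np.log(n))

    for _ in range(max_steps):
        if not active.any():
            break
        # Kernel weights, computed stably by shifting the log-weights.
        logw = -0.5 * (sq / h ** 2).sum(axis=1)
        w = np.exp(logw - logw.max())
        ell = w / w.sum()                   # normalized weights, sum to 1
        # Derivative of each log-weight with respect to h_j.
        c = sq / h ** 3
        # The estimate is sum_i ell_i * y_i, so its derivative with respect
        # to h_j is linear in y: Z_j = sum_i G[i, j] * y_i.
        G = ell[:, None] * (c - ell @ c)
        Z = G.T @ y
        s = sigma * np.sqrt((G ** 2).sum(axis=0))   # std. dev. of Z_j
        # Test each active coordinate: shrink if the derivative is large,
        # otherwise freeze that bandwidth for good.
        shrink = active & (np.abs(Z) > lam_scale * s)
        h[shrink] *= beta
        active = shrink
    return h

# Toy usage: only the first of five coordinates carries signal.
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 5))
y = 5.0 * np.sin(6.0 * X[:, 0]) + rng.standard_normal(300)
h = rodeo_local_constant(X, y, x=np.full(5, 0.5), sigma=1.0)
print(h)  # one bandwidth per coordinate
```

Each pass through the loop is one round of hypothesis tests: a coordinate whose derivative statistic stays below its threshold keeps its current bandwidth permanently, which is how variable selection and bandwidth selection happen in the same pass.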


reference text

L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and regression trees. Wadsworth Publishing Co Inc, 1984.
P. Bühlmann and B. Yu. Boosting, model selection, lasso and nonnegative garrote. Technical report, Berkeley, 2005.
D. Donoho. For most large underdetermined systems of equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution. Technical report, Stanford, 2004.
B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32:407–499, 2004.
J. H. Friedman. Multivariate adaptive regression splines. The Annals of Statistics, 19:1–67, 1991.
W. Fu and K. Knight. Asymptotics for lasso type estimators. The Annals of Statistics, 28:1356–1378, 2000.
L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer-Verlag, 2002.
T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2001.
M. Hristache, A. Juditsky, J. Polzehl, and V. Spokoiny. Structure adaptive approach for dimension reduction. The Annals of Statistics, 29:1537–1566, 2001.
O. V. Lepski, E. Mammen, and V. G. Spokoiny. Optimal spatial adaptation to inhomogeneous smoothness: An approach based on kernel estimates with variable bandwidth selectors. The Annals of Statistics, 25:929–947, 1997.
L. Li, R. D. Cook, and C. Nachtsheim. Model-free variable selection. Journal of the Royal Statistical Society, Series B, 67:285–299, 2005.
J. Rice. Bandwidth choice for nonparametric regression. The Annals of Statistics, 12:1215–1230, 1984.
D. Ruppert. Empirical-bias bandwidths for local polynomial nonparametric regression and density estimation. Journal of the American Statistical Association, 92:1049–1062, 1997.
D. Ruppert and M. P. Wand. Multivariate locally weighted least squares regression. The Annals of Statistics, 22:1346–1370, 1994.
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, Methodological, 58:267–288, 1996.
H. Zhang, G. Wahba, Y. Lin, M. Voelker, R. K. Ferris, and B. Klein. Variable selection and model building via likelihood basis pursuit. Journal of the American Statistical Association, 99(467):659–672, 2005.

Figure 3 (plot not reproduced; left panel axes: Rodeo Step vs. Average Bandwidth; right panel axes: Greedy Rodeo Step vs. Bandwidth). Left: Average bandwidth output by the rodeo for a function with r = 2 relevant variables in d = 20 dimensions (n = 500, with 50 trials). Covariates are generated as x_i ∼ Uniform(0, 1), the true function is m(x) = 2(x_1 + 1)^3 + 2 sin(10 x_2), and σ = 1, fit at the test point x = (1/2, ..., 1/2). The variance is greater for large step sizes since the rodeo runs that long for fewer data sets. Right: Greedy rodeo on the diabetes data, used to illustrate LARS (Efron et al. 2004). A set of k = 100 of the total n = 442 points were sampled (d = 10), and the bandwidth for the variable with largest average |Z_j|/λ_j was reduced in each step. The variables were selected in the order 3 (body mass index), 9 (serum), 7 (serum), 4 (blood pressure), 1 (age), 2 (sex), 8 (serum), 5 (serum), 10 (serum), 6 (serum). The parametric LARS algorithm adds variables in the order 3, 9, 4, 7, 2, 10, 5, 8, 6, 1. One notable difference is in the position of the age variable.
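
The synthetic setting described in the Figure 3 caption (left panel) can be reproduced along the following lines. This is a minimal sketch: the caption gives the covariate distribution, the true function, σ, n, d, and the test point, while the Gaussian form of the noise, the seed, and the function names `true_function` and `simulate` are assumptions made here for illustration.

```python
import numpy as np

def true_function(x):
    """m(x) = 2 (x_1 + 1)^3 + 2 sin(10 x_2); only two coordinates matter."""
    x = np.atleast_2d(x)
    return 2.0 * (x[:, 0] + 1.0) ** 3 + 2.0 * np.sin(10.0 * x[:, 1])

def simulate(n=500, d=20, sigma=1.0, seed=0):
    """Draw one data set: uniform covariates plus additive noise.

    Gaussian noise is an assumption; the caption only states sigma = 1.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, d))
    y = true_function(X) + sigma * rng.standard_normal(n)
    return X, y

X, y = simulate()
x_test = np.full(20, 0.5)   # the test point (1/2, ..., 1/2)
print(X.shape, y.shape)
```

Repeating `simulate` with 50 different seeds would give the 50 trials over which the caption reports average bandwidths.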