Title: Sparse Boosting
Authors: Peter Bühlmann, Bin Yu
Abstract: We propose Sparse Boosting (the SparseL2Boost algorithm), a variant of boosting with the squared error loss. SparseL2Boost yields sparser solutions than the previously proposed L2Boosting by minimizing penalized L2-loss functions, namely FPE model selection criteria, through small-step gradient descent. Although boosting may already give relatively sparse solutions, for example corresponding to the soft-thresholding estimator in orthogonal linear models, more sparseness is sometimes desired to increase prediction accuracy and to improve variable selection: such goals can be achieved with SparseL2Boost. We prove an equivalence of SparseL2Boost to Breiman's nonnegative garrote estimator for orthogonal linear models and demonstrate the generic nature of SparseL2Boost for nonparametric interaction modeling. For an automatic selection of the tuning parameter in SparseL2Boost we propose to employ the gMDL model selection criterion, which can also be used for early stopping of L2Boosting. Consequently, we can select between SparseL2Boost and L2Boosting by comparing their gMDL scores.

Keywords: lasso, minimum description length (MDL), model selection, nonnegative garrote, regression
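The componentwise gradient-descent mechanism behind these results is simple enough to sketch. The following Python code is a minimal illustration of plain L2Boosting with a componentwise linear least squares base learner; the names l2_boost, nu, and m_stop are our own choices, not from the paper. SparseL2Boost modifies only the selection rule, picking at each step the component that minimizes a penalized, FPE-type criterion instead of the raw residual sum of squares.

import numpy as np

def l2_boost(X, y, m_stop=100, nu=0.1):
    # Componentwise L2Boosting: at each iteration, fit the current
    # residuals by univariate least squares with every predictor,
    # keep the best-fitting one, and take a small step of size nu.
    n, p = X.shape
    coef = np.zeros(p)
    resid = y.astype(float).copy()
    col_ss = (X ** 2).sum(axis=0)  # per-column sums of squares (assumed nonzero)
    for _ in range(m_stop):
        beta = X.T @ resid / col_ss                    # univariate LS coefficient per predictor
        sse = (resid ** 2).sum() - beta ** 2 * col_ss  # residual sum of squares after each fit
        j = int(np.argmin(sse))                        # component reducing the residual SS most
        coef[j] += nu * beta[j]                        # small gradient-descent step
        resid -= nu * beta[j] * X[:, j]                # update residuals
    return coef

For an orthogonal design and small nu, the coefficient paths of this procedure approximate the soft-thresholding estimator mentioned above, with the stopping iteration m_stop acting as the tuning parameter that the paper proposes to select by the gMDL criterion.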
References:
H. Akaike. Statistical predictor identification. Ann. Inst. Statist. Math., 22:203, 1970.
L. Breiman. Better subset regression using the nonnegative garrote. Technometrics, 37:373–384, 1995.
L. Breiman. Bagging predictors. Machine Learning, 24:123–140, 1996.
L. Breiman. Arcing classifiers (with discussion). Ann. Statist., 26:801–849, 1998.
L. Breiman. Prediction games and arcing algorithms. Neural Computation, 11:1493–1517, 1999.
P. Bühlmann. Boosting for high-dimensional linear models. To appear in Ann. Statist., 34, 2006.
P. Bühlmann and B. Yu. Boosting with the L2 loss: regression and classification. J. Amer. Statist. Assoc., 98:324–339, 2003.
B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression (with discussion). Ann. Statist., 32:407–451, 2004.
Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In Machine Learning: Proc. Thirteenth Intern. Conf., pages 148–156. Morgan Kaufmann, 1996.
J. H. Friedman. Multivariate adaptive regression splines (with discussion). Ann. Statist., 19:1–141, 1991.
J. H. Friedman. Greedy function approximation: a gradient boosting machine. Ann. Statist., 29:1189–1232, 2001.
J. H. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting (with discussion). Ann. Statist., 28:337–407, 2000.
P. J. Green and B. W. Silverman. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall, 1994.
M. Hansen and B. Yu. Bridging AIC and BIC: an MDL model selection criterion. In IEEE Information Theory Workshop on Detection, Imaging and Estimation, Santa Fe, 1999.
M. Hansen and B. Yu. Model selection and the minimum description length principle. J. Amer. Statist. Assoc., 96:746–774, 2001.
M. Hansen and B. Yu. Minimum description length model selection criteria for generalized linear models. IMS Lecture Notes – Monograph Series, Vol. 40, 2002.
G. Lugosi and N. Vayatis. On the Bayes-risk consistency of regularized boosting methods (with discussion). Ann. Statist., 32:30–55 (disc. pp. 85–134), 2004.
S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Proc., 41:3397–3415, 1993.
N. Meinshausen. Lasso with relaxation. Technical report, 2005.
G. Rätsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Machine Learning, 42:287–320, 2001.
G. Rätsch, A. Demiriz, and K. Bennett. Sparse regression ensembles in infinite and finite hypothesis spaces. Machine Learning, 48:193–221, 2002.
T. Speed and B. Yu. Model selection and prediction: normal regression. Ann. Inst. Statist. Math., 45:35–54, 1993.
R. Tibshirani. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc., Ser. B, 58:267–288, 1996.
J. W. Tukey. Exploratory Data Analysis. Addison-Wesley, 1977.
M. West, C. Blanchette, H. Dressman, E. Huang, S. Ishida, R. Spang, H. Zuzan, J. Olson, J. Marks, and J. Nevins. Predicting the clinical status of human breast cancer by using gene expression profiles. Proc. Nat. Acad. Sci. (USA), 98:11462–11467, 2001.