21 jmlr-2009-Data-driven Calibration of Penalties for Least-Squares Regression

Source: pdf

Author: Sylvain Arlot, Pascal Massart

Abstract: Penalization procedures often suffer from their dependence on multiplying factors, whose optimal values are either unknown or hard to estimate from data. We propose a completely data-driven calibration algorithm for these parameters in the least-squares regression framework, without assuming a particular shape for the penalty. Our algorithm relies on the concept of minimal penalty, recently introduced by Birgé and Massart (2007) in the context of penalized least squares for Gaussian homoscedastic regression. On the positive side, the minimal penalty can be evaluated from the data themselves, leading to a data-driven estimation of an optimal penalty which can be used in practice; on the negative side, their approach heavily relies on the homoscedastic Gaussian nature of their stochastic framework. The purpose of this paper is twofold: stating a more general heuristics for designing a data-driven penalty (the slope heuristics) and proving that it works for penalized least-squares regression with a random design, even for heteroscedastic non-Gaussian data. For technical reasons, some exact mathematical results will be proved only for regressogram bin-width selection. This is at least a first step towards further results, since the approach and the method that we use are indeed general.

Keywords: data-driven calibration, non-parametric regression, model selection by penalization, heteroscedastic data, regressogram
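To make the idea of calibrating a penalty from a minimal penalty more concrete, here is a minimal sketch for regressogram bin-width selection. It is an illustration under stated assumptions, not the authors' implementation: the penalty shape is taken proportional to the number of bins D (the abstract stresses that no particular shape is required), the minimal constant is located by the largest drop in the selected dimension along a grid of constants, and the final penalty is taken as twice the minimal one, which is the usual reading of the slope heuristics. All function names, the grid, and the simulated data are hypothetical.

```python
import math
import random


def regressogram_risks(xs, ys, max_bins):
    """Empirical least-squares risk of the regressogram on [0, 1]
    for regular partitions with 1..max_bins bins."""
    n = len(xs)
    risks = []
    for d in range(1, max_bins + 1):
        sums, sq_sums, counts = [0.0] * d, [0.0] * d, [0] * d
        for x, y in zip(xs, ys):
            k = min(int(x * d), d - 1)  # bin index; x == 1.0 goes to the last bin
            sums[k] += y
            sq_sums[k] += y * y
            counts[k] += 1
        # Residual sum of squares in a bin: sum(y^2) - (sum y)^2 / count
        rss = sum(sq - s * s / c
                  for s, sq, c in zip(sums, sq_sums, counts) if c > 0)
        risks.append(rss / n)
    return risks


def select_dimension(risks, n, constant):
    """Number of bins minimising empirical risk + constant * D / n
    (assumed penalty shape: D / n)."""
    crit = [r + constant * (d + 1) / n for d, r in enumerate(risks)]
    return crit.index(min(crit)) + 1


def slope_heuristic(risks, n, constants):
    """Dimension-jump sketch: `constants` must be an increasing grid.
    Returns (estimated minimal constant, finally selected dimension)."""
    if len(constants) < 2:
        raise ValueError("need at least two candidate constants")
    dims = [select_dimension(risks, n, c) for c in constants]
    # Largest drop of the selected dimension between consecutive constants
    jumps = [dims[i] - dims[i + 1] for i in range(len(dims) - 1)]
    i_jump = jumps.index(max(jumps))
    c_min = constants[i_jump + 1]
    # Assumed rule: optimal penalty is about twice the minimal penalty
    return c_min, select_dimension(risks, n, 2.0 * c_min)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 500
    xs = [rng.random() for _ in range(n)]
    # Heteroscedastic noise: standard deviation grows with x (illustrative choice)
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.2 + 0.5 * x) for x in xs]
    risks = regressogram_risks(xs, ys, max_bins=60)
    grid = [0.01 * k for k in range(0, 201)]
    c_min, d_hat = slope_heuristic(risks, n, grid)
    print("estimated minimal constant:", c_min, "selected number of bins:", d_hat)
```

The grid of constants and the D / n shape are the two choices most likely to need adapting in practice; for heteroscedastic data in particular, a shape that is not proportional to the dimension may be more appropriate.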

reference text

Hirotugu Akaike. Statistical predictor identification. Ann. Inst. Statist. Math., 22:203–217, 1970.
Hirotugu Akaike. Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (Tsahkadsor, 1971), pages 267–281. Akadémiai Kiadó, Budapest, 1973.
David M. Allen. The relationship between variable selection and data augmentation and a method for prediction. Technometrics, 16:125–127, 1974.
Sylvain Arlot. Resampling and Model Selection. PhD thesis, University Paris-Sud 11, December 2007. oai:tel.archives-ouvertes.fr:tel-00198803 v1.
Sylvain Arlot. Suboptimality of penalties proportional to the dimension for model selection in heteroscedastic regression, December 2008a. arXiv:0812.3141.
Sylvain Arlot. V-fold cross-validation improved: V-fold penalization, February 2008b. arXiv:0802.0566v2.
Sylvain Arlot. Model selection by resampling penalization, March 2008c. oai:hal.archives-ouvertes.fr:hal-00262478 v1.
Yannick Baraud. Model selection for regression on a fixed design. Probab. Theory Related Fields, 117(4):467–493, 2000.
Yannick Baraud. Model selection for regression on a random design. ESAIM Probab. Statist., 6:127–146 (electronic), 2002.
Yannick Baraud, Christophe Giraud, and Sylvie Huet. Gaussian model selection with unknown variance, 2007. To appear in The Annals of Statistics. arXiv:math.ST/0701250.
Andrew Barron, Lucien Birgé, and Pascal Massart. Risk bounds for model selection via penalization. Probab. Theory Related Fields, 113(3):301–413, 1999.
Peter L. Bartlett, Stéphane Boucheron, and Gábor Lugosi. Model selection and error estimation. Machine Learning, 48:85–113, 2002.
Peter L. Bartlett, Olivier Bousquet, and Shahar Mendelson. Local Rademacher complexities. Ann. Statist., 33(4):1497–1537, 2005.
Jean-Patrick Baudry. Clustering through model selection criteria, June 2007. Poster session at One Day Statistical Workshop in Lisieux. http://www.math.u-psud.fr/~baudry.
Lucien Birgé and Pascal Massart. Gaussian model selection. J. Eur. Math. Soc. (JEMS), 3(3):203–268, 2001.
Lucien Birgé and Pascal Massart. Minimal penalties for Gaussian model selection. Probab. Theory Related Fields, 138(1-2):33–73, 2007.
Stéphane Boucheron and Pascal Massart. A poor man’s Wilks phenomenon, March 2008. Personal communication.
Prabir Burman. Estimation of equifrequency histograms. Statist. Probab. Lett., 56(3):227–238, 2002.
Imre Csiszár. Large-scale typicality of Markov sample paths and consistency of MDL order estimators. IEEE Trans. Inform. Theory, 48(6):1616–1628, 2002.
Imre Csiszár and Paul C. Shields. The consistency of the BIC Markov order estimator. Ann. Statist., 28(6):1601–1619, 2000.
Luc Devroye and Gábor Lugosi. Combinatorial Methods in Density Estimation. Springer Series in Statistics. Springer-Verlag, New York, 2001.
Bradley Efron. Estimating the error rate of a prediction rule: improvement on cross-validation. J. Amer. Statist. Assoc., 78(382):316–331, 1983.
Seymour Geisser. The predictive sample reuse method with applications. J. Amer. Statist. Assoc., 70:320–328, 1975.
Edward I. George and Dean P. Foster. Calibration and empirical Bayes variable selection. Biometrika, 87(4):731–747, 2000.
Clifford M. Hurvich and Chih-Ling Tsai. Regression and time series model selection in small samples. Biometrika, 76(2):297–307, 1989.
Vladimir Koltchinskii. Rademacher penalties and structural risk minimization. IEEE Trans. Inform. Theory, 47(5):1902–1914, 2001.
Vladimir Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Statist., 34(6):2593–2656, 2006.
Marc Lavielle. Using penalized contrasts for the change-point problem. Signal Proces., 85(8):1501–1510, 2005.
Émilie Lebarbier. Detecting multiple change-points in the mean of a Gaussian process by model selection. Signal Proces., 85:717–736, 2005.
Guillaume Lecué. Méthodes d’Agrégation : Optimalité et Vitesses Rapides. PhD thesis, LPMA, University Paris VII, May 2007.
Vincent Lepez. Some Estimation Problems Related to Oil Reserves. PhD thesis, University Paris XI, 2002.
Ker-Chau Li. Asymptotic optimality for C_p, C_L, cross-validation and generalized cross-validation: discrete index set. Ann. Statist., 15(3):958–975, 1987.
Fernando Lozano. Model selection using Rademacher penalization. In Proceedings of the 2nd ICSC Symp. on Neural Computation (NC2000). Berlin, Germany. ICSC Academic Press, 2000.
Colin L. Mallows. Some comments on C_p. Technometrics, 15:661–675, 1973.
Pascal Massart. Concentration Inequalities and Model Selection, volume 1896 of Lecture Notes in Mathematics. Springer, Berlin, 2007.
Cathy Maugis and Bertrand Michel. A non asymptotic penalized criterion for Gaussian mixture model selection. Technical Report 6549, INRIA, 2008.
Boris T. Polyak and Alexandre B. Tsybakov. Asymptotic optimality of the C_p-test in the projection estimation of a regression. Teor. Veroyatnost. i Primenen., 35(2):305–317, 1990.
Xiaotong Shen and Jianming Ye. Adaptive model selection. J. Amer. Statist. Assoc., 97(457):210–221, 2002.
Ritei Shibata. An optimal selection of regression variables. Biometrika, 68(1):45–54, 1981.
Charles J. Stone. An asymptotically optimal histogram selection rule. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. II (Berkeley, Calif., 1983), Wadsworth Statist./Probab. Ser., pages 513–520, Belmont, CA, 1985. Wadsworth.
M. Stone. Cross-validatory choice and assessment of statistical predictions. J. Roy. Statist. Soc. Ser. B, 36:111–147, 1974.
Nariaki Sugiura. Further analysis of the data by Akaike’s information criterion and the finite corrections. Comm. Statist. A-Theory Methods, 7(1):13–26, 1978.
Alexandre B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Ann. Statist., 32(1):135–166, 2004.
Nicolas Verzelen. Gaussian Graphical Models and Model Selection. PhD thesis, University Paris XI, December 2008.
Fanny Villers. Tests et Sélection de Modèles pour l’Analyse de Données Protéomiques et Transcriptomiques. PhD thesis, University Paris XI, December 2007.