nips2009-64: reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sylvain Arlot, Francis R. Bach
Abstract: This paper tackles the problem of selecting among several linear estimators in non-parametric regression; this includes model selection for linear regression, the choice of a regularization parameter in kernel ridge regression or spline smoothing, and the choice of a kernel in multiple kernel learning. We propose a new algorithm which first consistently estimates the noise variance, based upon the concept of minimal penalty, previously introduced in the context of model selection. Plugging this variance estimate into Mallows' C_L penalty is then proved to yield an algorithm satisfying an oracle inequality. Simulation experiments with kernel ridge regression and multiple kernel learning show that the proposed algorithm often improves significantly on existing calibration procedures such as 10-fold cross-validation or generalized cross-validation.
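The abstract describes a two-step procedure: estimate the noise variance from the minimal-penalty behaviour of the penalized empirical risk, then plug that estimate into Mallows' C_L penalty to choose among a family of linear estimators, such as kernel ridge regression smoothers. The Python sketch below is a simplified, hypothetical illustration of that idea under stated assumptions, not the authors' exact algorithm: it uses tr(A) as the penalty shape, locates the variance estimate at the largest jump in the selected effective dimension, and relies on toy data; all function names are illustrative.

```python
import numpy as np

def gaussian_kernel(X, bandwidth=1.0):
    # Pairwise squared distances -> Gaussian (RBF) kernel matrix.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def smoother_matrix(K, lam):
    # Kernel ridge regression is a linear estimator: y_hat = A_lam @ y.
    n = K.shape[0]
    return K @ np.linalg.inv(K + n * lam * np.eye(n))

def minimal_penalty_variance(y, smoothers, grid_C):
    # Simplified minimal-penalty heuristic (assumption, not the paper's exact
    # penalty shape): for each constant C, minimise the penalised empirical
    # risk ||y - A y||^2 + C * tr(A).  The effective dimension tr(A) of the
    # selected estimator drops sharply once C exceeds the noise variance, so
    # we locate the largest drop along the grid.
    traces = np.array([np.trace(A) for A in smoothers])
    risks = np.array([np.sum((y - A @ y) ** 2) for A in smoothers])
    selected_df = np.array(
        [traces[np.argmin(risks + C * traces)] for C in grid_C]
    )
    jump = np.argmax(selected_df[:-1] - selected_df[1:])
    return grid_C[jump + 1]

def select_by_CL(y, smoothers, sigma2):
    # Mallows' C_L criterion: ||y - A y||^2 + 2 * sigma^2 * tr(A).
    scores = [np.sum((y - A @ y) ** 2) + 2 * sigma2 * np.trace(A)
              for A in smoothers]
    return int(np.argmin(scores))

# Toy usage: select the ridge parameter of a Gaussian-kernel smoother.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-1, 1, size=(n, 1))
y = np.sin(3 * np.pi * X[:, 0]) + 0.3 * rng.standard_normal(n)
K = gaussian_kernel(X, bandwidth=0.2)
lambdas = np.logspace(-6, 0, 30)
smoothers = [smoother_matrix(K, lam) for lam in lambdas]
sigma2_hat = minimal_penalty_variance(y, smoothers, np.logspace(-3, 1, 200))
best = select_by_CL(y, smoothers, sigma2_hat)
print(f"estimated noise variance: {sigma2_hat:.3f}, "
      f"selected lambda: {lambdas[best]:.2e}")
```

The two functions mirror the structure of the abstract: the variance estimate comes only from the data and the family of smoothers, and is then reused inside the C_L score, so no held-out data or cross-validation loop is needed.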
[1] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
[2] B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2001.
[3] O. Chapelle and V. Vapnik. Model selection for support vector machines. In Advances in Neural Information Processing Systems (NIPS), 1999.
[4] C. E. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[5] F. Bach. Consistency of the group Lasso and multiple kernel learning. Journal of Machine Learning Research, 9:1179–1225, 2008.
[6] L. Birgé and P. Massart. Minimal penalties for Gaussian model selection. Probab. Theory Related Fields, 138(1-2):33–73, 2007.
[7] S. Arlot and P. Massart. Data-driven calibration of penalties for least-squares regression. J. Mach. Learn. Res., 10:245–279, 2009.
[8] P. Craven and G. Wahba. Smoothing noisy data with spline functions. Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math., 31(4):377–403, 1978/79.
[9] G. Wahba. Spline Models for Observational Data. SIAM, 1990.
[10] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of The Royal Statistical Society Series B, 68(1):49–67, 2006.
[11] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. J. Mach. Learn. Res., 5:27–72, 2003/04.
[12] R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of The Royal Statistical Society Series B, 58(1):267–288, 1996.
[13] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.
[14] D. M. Allen. The relationship between variable selection and data augmentation and a method for prediction. Technometrics, 16:125–127, 1974.
[15] M. Stone. Cross-validatory choice and assessment of statistical predictions. J. Roy. Statist. Soc. Ser. B, 36:111–147, 1974.
[16] T. Zhang. Learning bounds for kernel regression using effective data dimensionality. Neural Comput., 17(9):2077–2098, 2005.
[17] C. L. Mallows. Some comments on C_p. Technometrics, 15:661–675, 1973.
[18] B. Efron. How biased is the apparent error rate of a prediction rule? J. Amer. Statist. Assoc., 81(394):461–470, 1986.
[19] Y. Cao and Y. Golubev. On oracle inequalities related to smoothing splines. Math. Methods Statist., 15(4):398–414 (2007), 2006.
[20] S. Arlot and F. Bach. Data-driven calibration of linear estimators with minimal penalties, September 2009. Long version, arXiv:0909.1884v1.
[21] É. Lebarbier. Detecting multiple change-points in the mean of a Gaussian process by model selection. Signal Process., 85:717–736, 2005.
[22] C. Maugis and B. Michel. Slope heuristics for variable selection and clustering via Gaussian mixtures. Technical Report 6550, INRIA, 2008.
[23] K.-C. Li. Asymptotic optimality for C_p, C_L, cross-validation and generalized cross-validation: discrete index set. Ann. Statist., 15(3):958–975, 1987.