nips2007-99-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Marie Szafranski, Yves Grandvalet, Pierre Morizet-Mahoudeaux
Abstract: Hierarchical penalization is a generic framework for incorporating prior information in the fitting of statistical models, when the explanatory variables are organized in a hierarchical structure. The penalizer is a convex functional that performs soft selection at the group level and shrinks variables within each group. This favors solutions with few leading terms in the final combination. The framework, originally derived for taking prior knowledge into account, is shown to be useful in linear regression, when several parameters are used to model the influence of one feature, or in kernel regression, for learning multiple kernels. Keywords: Optimization: constrained and convex optimization. Supervised learning: regression, kernel methods, sparsity and feature selection.
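The abstract describes a penalizer that performs soft selection at the group level while shrinking variables within each group. Below is a minimal sketch of that idea in Python: a least-squares fit under the l1/l2 group penalty of Yuan and Lin [3], used here as a stand-in for the paper's own penalizer (whose exact functional is not reproduced in this extract). The group structure, data, and regularization strength are invented for illustration.

import numpy as np
from scipy.optimize import minimize

# Toy data: 10 features in 3 groups, only the first group is relevant.
rng = np.random.default_rng(0)
n, p = 100, 10
groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.5, -2.0, 1.0]
y = X @ beta_true + 0.1 * rng.normal(size=n)

lam, eps = 5.0, 1e-8  # assumed penalty strength; eps smooths the norm at 0

def objective(beta):
    resid = y - X @ beta
    # Soft selection at the group level: sum_k sqrt(d_k) * ||beta_k||_2,
    # smoothed with eps so a quasi-Newton solver can be applied.
    penalty = sum(np.sqrt(len(g)) * np.sqrt(beta[g] @ beta[g] + eps)
                  for g in groups)
    return 0.5 * resid @ resid + lam * penalty

res = minimize(objective, np.zeros(p), method="L-BFGS-B")
print(np.round(res.x, 2))  # coefficients of groups 2 and 3 are driven near zero

The group-level square-root term is what produces whole groups of (near-)zero coefficients; the paper's penalizer additionally shrinks coefficients within the surviving groups, favoring few leading terms.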
[1] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B, 58(1):267–288, 1996.
[2] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32(2):407–499, 2004.
[3] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society. Series B, 68(1):49–67, 2006.
[Figure 3, displaced from the paper body: hierarchical penalization applied to kernel smoothing on the motorcycle data. Panels show the combined estimate and the isolated bandwidths h1, ..., h5 (powers of ten from 10^-1), for ν = 10, 25, 50. Combined: the points represent data and the solid line the estimated response function. Isolated bandwidths: the points represent partial residuals and the solid line the contribution of that bandwidth to the model. A toy reconstruction of this set-up appears after the references.]
[4] Y. Grandvalet and S. Canu. Adaptive scaling for feature selection in SVMs. In Advances in Neural Information Processing Systems, volume 15. MIT Press, 2003.
[5] M. R. Osborne, B. Presnell, and B. A. Turlach. On the lasso and its dual. Journal of Computational and Graphical Statistics, 9(2):319–337, 2000.
[6] D.J. Newman, S. Hettich, C.L. Blake, and C.J. Merz. UCI repository of machine learning databases, 1998. URL http://www.ics.uci.edu/~mlearn/MLRepository.html.
[7] Delve: Data for evaluating learning in valid experiments. URL http://www.cs.toronto.edu/~delve/.
[8] G. Lanckriet, T. De Bie, N. Cristianini, M. Jordan, and W. Noble. A statistical framework for genomic data fusion. Bioinformatics, 20:2626–2635, 2004.
[9] W. Härdle. Applied Nonparametric Regression, volume 19 of Econometric Society Monographs, 1990.
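Figure 3 combines kernel smoothers with several bandwidths into a single estimate. The sketch below reconstructs that set-up with a Nadaraya-Watson smoother; synthetic data stand in for the motorcycle data of Härdle [9], and the bandwidth grid and combination weights are assumed here, whereas the paper learns the weights through the hierarchical penalizer.

import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 60.0, size=120))  # e.g. time in ms
y = np.sin(x / 6.0) * np.exp(-x / 40.0) + 0.1 * rng.normal(size=x.size)

def nadaraya_watson(x_train, y_train, x_eval, h):
    # Gaussian-kernel smoother with bandwidth h.
    d = (x_eval[:, None] - x_train[None, :]) / h
    w = np.exp(-0.5 * d**2)
    return (w @ y_train) / w.sum(axis=1)

bandwidths = [1.0, 3.0, 10.0]  # assumed grid, loosely following the figure
weights = [0.2, 0.6, 0.2]      # hand-picked here; the paper estimates these

x_grid = np.linspace(0.0, 60.0, 200)
combined = sum(w * nadaraya_watson(x, y, x_grid, h)
               for w, h in zip(weights, bandwidths))
print(np.round(combined[:5], 3))

Each per-bandwidth smoother plays the role of one "isolated bandwidth" panel in Figure 3; the combined curve corresponds to the "combined" panel, with the weights taking the place of the coefficients selected by hierarchical penalization.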