jmlr jmlr2011 jmlr2011-40 jmlr2011-40-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Stéphane Gaïffas, Guillaume Lecué
Abstract: Given a finite set F of functions and a learning sample, the aim of an aggregation procedure is to have a risk as close as possible to risk of the best function in F. Up to now, optimal aggregation procedures are convex combinations of every elements of F. In this paper, we prove that optimal aggregation procedures combining only two functions in F exist. Such algorithms are of particular interest when F contains many irrelevant functions that should not appear in the aggregation procedure. Since selectors are suboptimal aggregation procedures, this proves that two is the minimal number of elements of F required for the construction of an optimal aggregation procedure in every situations. Then, we perform a numerical study for the problem of selection of the regularization parameters of the Lasso and the Elastic-net estimators. We compare on simulated examples our aggregation algorithms to aggregation with exponential weights, to Mallow’s Cp and to cross-validation selection procedures. Keywords: aggregation, exact oracle inequality, empirical risk minimization, empirical process theory, sparsity, Lasso, Lars
Radosław Adamczak. A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electron. J. Probab., 13:no. 34, 1000–1034, 2008. ISSN 1083-6489. Jean-Yves Audibert. Fast learning rates in statistical inference through aggregation. Ann. Statist., 37:1591, 2009. URL doi:10.1214/08-AOS623. Olivier Catoni. Statistical Learning Theory and Stochastic Optimization. Ecole d’´ t´ de Probabilit´ s ee e de Saint-Flour 2001, Lecture Notes in Mathematics. Springer, N.Y., 2001. Arnak S. Dalalyan and Alexandre B. Tsybakov. Aggregation by exponential weighting and sharp oracle inequalities. In COLT, pages 97–111, 2007. Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. Least angle regression. Ann. Statist., 32(2):407–499, 2004. ISSN 0090-5364. With discussion, and a rejoinder by the authors. Evarist Gin´ and Joel Zinn. Some limit theorems for empirical processes. Ann. Probab., 12(4): e 929–998, 1984. ISSN 0091-1798. Anatoli Juditsky, Alexander V. Nazin, Alexandre B. Tsybakov, and Nicolas Vayatis. Recursive aggregation of estimators by the mirror descent method with averaging. Problemy Peredachi Informatsii, 41(4):78–96, 2005. ISSN 0555-2923. Anatoli Juditsky, Philippe Rigollet, and Alexandre B. Tsybakov. Learning by mirror averaging. Ann. Statist., 36(5):2183–2206, 2008. ISSN 0090-5364. doi: 10.1214/07-AOS546. URL http: //dx.doi.org/10.1214/07-AOS546. 1832 H YPER -S PARSE O PTIMAL AGGREGATION Guillaume Lecu´ and Shahar Mendelson. e Aggregation via empirical risk minimization. Probab. Theory Related Fields, 145(3-4):591–613, 2009. ISSN 0178-8051. doi: 10.1007/ s00440-008-0180-8. URL https://dx.doi.org/10.1007/s00440-008-0180-8. Guillaume Lecu´ and Shahar Mendelson. On the optimality of the aggregate with exponential e weights for small temperatures. Submitted, 2010. Gilbert Leung and Andrew R. Barron. Information theory and mixing least-squares regressions. IEEE Trans. Inform. Theory, 52(8):3396–3410, 2006. ISSN 0018-9448. Nicolai Meinshausen and Peter B¨ hlmann. Stability selection. Journal of the Royal Statistical u Society: Series B (Statistical Methodology), 72(4):417–473, 2010. Michel Talagrand. New concentration inequalities in product spaces. Invent. Math., 126(3):505– 563, 1996. ISSN 0020-9910. Robert Tibshirani. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B, 58 (1):267–288, 1996. ISSN 0035-9246. Alexandre. B. Tsybakov. Optimal rates of aggregation. Computational Learning Theory and Kernel Machines. B.Sch¨ lkopf and M.Warmuth, eds. Lecture Notes in Artificial Intelligence, 2777:303– o 313, 2003. Springer, Heidelberg. Yuhong Yang. Mixing strategies for density estimation. Ann. Statist., 28(1):75–87, 2000. ISSN 0090-5364. doi: 10.1214/aos/1016120365. URL http://dx.doi.org/10.1214/aos/ 1016120365. Yuhong Yang. Aggregating regression procedures to improve performance. Bernoulli, 10(1):25–47, 2004. ISSN 1350-7265. Hui Zou. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476):1418–1429, 2006. Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol., 67(2):301–320, 2005. ISSN 1369-7412. doi: 10.1111/j.1467-9868.2005. 00503.x. URL http://dx.doi.org/10.1111/j.1467-9868.2005.00503.x. 1833