nips2011-109-reference (knowledge-graph by maker-knowledge-mining)
Source: pdf
Authors: Dong Dai, Tong Zhang
Abstract: This paper considers the problem of combining multiple models to achieve a prediction accuracy not much worse than that of the best single model for least squares regression. It is known that if the models are mis-specified, model averaging is superior to model selection. Specifically, let n be the sample size; then the worst-case regret of the former decays at the rate O(1/n), while the worst-case regret of the latter decays at the rate O(1/√n). In the literature, the most important and widely studied model averaging method that achieves the optimal O(1/n) average regret is the exponential weighted model averaging (EWMA) algorithm. However, this method suffers from several limitations. The purpose of this paper is to present a new greedy model averaging procedure that improves on EWMA. We prove strong theoretical guarantees for the new procedure and illustrate our theoretical results with empirical examples.
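For concreteness, the following is a minimal Python sketch of the generic exponential-weighting idea the abstract refers to, not the authors' exact procedure: each model is weighted in proportion to exp(-squared loss / temperature), and predictions are averaged under these weights. The function name ewma_combine and the temperature parameter are illustrative assumptions.

    import numpy as np

    def ewma_combine(preds, y, temperature=1.0):
        # preds: (M, n) array; row j holds model j's predictions on the n samples.
        # y: (n,) array of observed responses.
        # Weight each model by exp(-squared loss / temperature); subtracting
        # losses.min() rescales the exponents to avoid numerical underflow.
        losses = np.sum((preds - y) ** 2, axis=1)
        weights = np.exp(-(losses - losses.min()) / temperature)
        weights /= weights.sum()
        return weights @ preds  # (n,) aggregated prediction

Averaging under such weights, rather than selecting the single lowest-loss model, is what underlies the O(1/n) versus O(1/√n) regret gap discussed above.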
[1] Jean-Yves Audibert. Progressive mixture rules are deviation suboptimal. In NIPS’07, 2008.
[2] Olivier Catoni. Statistical learning theory and stochastic optimization. Springer-Verlag, 2004.
[3] Arnak Dalalyan and Joseph Salmon. Optimal aggregation of affine estimators. In COLT’11, 2011.
[4] L. K. Jones. A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training. The Annals of Statistics, 20(1):608–613, 1992.
[5] Anatoli Juditsky, Philippe Rigollet, and Alexandre Tsybakov. Learning by mirror averaging. The Annals of Statistics, 36:2183–2206, 2008.
[6] Gilbert Leung and Andrew R. Barron. Information theory and mixing least-squares regressions. IEEE Transactions on Information Theory, 52(8):3396–3410, August 2006.
[7] Philippe Rigollet. Kullback–Leibler aggregation and misspecified generalized linear models. arXiv:0911.2919, November 2010.
[8] Philippe Rigollet and Alexandre Tsybakov. Exponential screening and optimal rates of sparse estimation. The Annals of Statistics, 39:731–771, 2011.
[9] Yuhong Yang. Adaptive regression by mixing. Journal of the American Statistical Association, 96:574–588, 2001.