
Scrambled Objects for Least-Squares Regression (NIPS 2010, paper 233)


Source: pdf

Author: Odalric Maillard, Rémi Munos

Abstract: We consider least-squares regression using a randomly generated subspace G_P ⊂ F of finite dimension P, where F is a function space of infinite dimension, e.g. L²([0,1]^d). G_P is defined as the span of P random features that are linear combinations of the basis functions of F weighted by random Gaussian i.i.d. coefficients. In particular, we consider multi-resolution random combinations at all scales of a given mother function, such as a hat function or a wavelet. In the latter case, the resulting Gaussian objects are called scrambled wavelets, and we show that they make it possible to approximate functions in the Sobolev spaces H^s([0,1]^d). As a result, given N data points, the least-squares estimate g built from P scrambled wavelets has excess risk ‖f* − g‖²_P = O(‖f*‖²_{H^s([0,1]^d)} (log N)/P + P(log N)/N) for target functions f* ∈ H^s([0,1]^d) of smoothness order s > d/2. An interesting aspect of the resulting bounds is that they do not depend on the distribution P from which the data are generated, which matters in the statistical regression setting considered here: randomization makes it possible to adapt to any possible distribution. We conclude by describing an efficient numerical implementation using lazy expansions, with numerical complexity Õ(2^d N^{3/2} log N + N²), where d is the dimension of the input space.
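To make the construction concrete, here is a minimal sketch (in Python/NumPy, not the authors' code) of least-squares regression on a random subspace G_P: each of the P random features is an i.i.d. Gaussian combination of a multi-resolution hat-function basis on [0,1], as in the abstract. All names (hat, basis, scrambled_features, depth) are illustrative; the paper's scrambled wavelets additionally weight coefficients across scales to target Sobolev balls, and the lazy-expansion implementation is omitted here.

```python
# Sketch: least-squares regression on a random subspace spanned by
# Gaussian combinations of a multi-resolution hat-function basis.
import numpy as np

rng = np.random.default_rng(0)

def hat(t):
    """Mother hat function: triangle of height 1 supported on [0, 1]."""
    return np.maximum(0.0, 1.0 - np.abs(2.0 * t - 1.0))

def basis(x, depth):
    """Multi-resolution hat basis on [0,1]: scales j = 0..depth-1,
    translations k = 0..2^j - 1. Returns an (len(x), F) design matrix."""
    cols = [hat(2**j * x[:, None] - np.arange(2**j)) for j in range(depth)]
    return np.hstack(cols)

def scrambled_features(x, depth, A):
    """P random features: rows of A are i.i.d. Gaussian coefficients
    over the F basis functions. Returns an (len(x), P) matrix."""
    return basis(x, depth) @ A.T

# Synthetic 1-d regression data.
N, P, depth = 200, 50, 6
F = sum(2**j for j in range(depth))      # number of basis functions
A = rng.standard_normal((P, F))          # random Gaussian i.i.d. coefficients
x = rng.uniform(0.0, 1.0, N)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(N)

# Ordinary least squares in the random subspace G_P.
Phi = scrambled_features(x, depth, A)
alpha, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Predict on a test grid, reusing the same random matrix A.
x_test = np.linspace(0.0, 1.0, 101)
g = scrambled_features(x_test, depth, A) @ alpha
```

In this toy version, P plays the same role as in the excess-risk bound above: increasing P shrinks the approximation term (log N)/P while growing the estimation term P(log N)/N, so P is chosen to balance the two.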


reference text

[1] Andrew Barron, Albert Cohen, Wolfgang Dahmen, and Ronald DeVore. Approximation and learning by greedy algorithms. Annals of Statistics, 36(1):64–94, 2008.

[2] Gérard Bourdaud. Ondelettes et espaces de Besov. Rev. Mat. Iberoamericana, 11(3):477–512, 1995.

[3] Hans-Joachim Bungartz and Michael Griebel. Sparse grids. In Arieh Iserles, editor, Acta Numerica, volume 13. Cambridge University Press, 2004.

[4] Stéphane Canu, Xavier Mary, and Alain Rakotomamonjy. Functional learning through kernel. arXiv, October 2009.

[5] D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. In STOC ’87: Proceedings of the nineteenth annual ACM symposium on Theory of computing, pages 1–6, New York, NY, USA, 1987. ACM.

[6] R. DeVore. Nonlinear approximation. Acta Numerica, 7:51–150, 1998.

[7] L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A distribution-free theory of nonparametric regression. Springer-Verlag, 2002.

[8] Stéphane Jaffard. Décompositions en ondelettes. In Development of mathematics 1950–2000, pages 609–634. Birkhäuser, Basel, 2000.

[9] Svante Janson. Gaussian Hilbert spaces. Cambridge University Press, Cambridge, UK, 1997.

[10] Odalric-Ambrym Maillard and Rémi Munos. Compressed Least-Squares Regression. In NIPS 2009, Vancouver, Canada, 2009.

[11] Odalric-Ambrym Maillard and Rémi Munos. Linear regression with random projections. Technical report, HAL INRIA: http://hal.archives-ouvertes.fr/inria-00483014/, 2010.

[12] Stéphane Mallat. A Wavelet Tour of Signal Processing. Academic Press, 1999.

[13] Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In John C. Platt, Daphne Koller, Yoram Singer, and Sam T. Roweis, editors, NIPS. MIT Press, 2007.

[14] Ali Rahimi and Benjamin Recht. Uniform approximation of functions with random bases. In Proceedings of the 46th Annual Allerton Conference on Communication, Control, and Computing, 2008.

[15] S. Saitoh. Theory of Reproducing Kernels and its Applications. Longman Scientific & Technical, Harlow, UK, 1988.

[16] Robert Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.

[17] A. N. Tikhonov. Solution of incorrectly formulated problems and the regularization method. Soviet Math. Dokl., 4:1035–1038, 1963.

[18] C. Zenger. Sparse grids. In W. Hackbusch, editor, Parallel Algorithms for Partial Differential Equations, Proceedings of the Sixth GAMM-Seminar, volume 31 of Notes on Num. Fluid Mech., Kiel, 1990. Vieweg-Verlag.