
59 jmlr-2012-Linear Regression With Random Projections


Source: pdf

Author: Odalric-Ambrym Maillard, Rémi Munos

Abstract: We investigate a method for regression that makes use of a randomly generated subspace GP ⊂ F (of finite dimension P) of a given large (possibly infinite) dimensional function space F , for example, L2 ([0, 1]d ; R). GP is defined as the span of P random features that are linear combinations of basis functions of F weighted by random Gaussian i.i.d. coefficients. We give practical motivation for this approach, detail the link that this random projection method shares with RKHS and Gaussian object theory, prove approximation error bounds, in both deterministic and random design, when searching for the best regression function in GP rather than in F , and derive excess risk bounds for a specific regression algorithm (least-squares regression in GP ). This paper stresses the motivation for studying such methods; the analysis is therefore kept simple for expository purposes and leaves room for future developments. Keywords: regression, random matrices, dimension reduction
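The construction described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's algorithm: the choice of a cosine basis for F, the dimensions F, P, n, and the target function are all assumptions made here for concreteness. The random features are P Gaussian linear combinations of the F basis functions, and the regression is ordinary least squares in the resulting subspace GP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: F basis functions of the ambient space,
# P random features spanning the random subspace G_P, n samples.
F, P, n = 50, 10, 200

def basis(x):
    # psi_f(x) = cos(pi * f * x) on [0, 1]; shape (len(x), F).
    # (The paper allows e.g. wavelet bases; cosines are just a simple choice.)
    return np.cos(np.pi * np.arange(F)[None, :] * x[:, None])

# P random features: linear combinations of the F basis functions
# with i.i.d. Gaussian coefficients (rows of A).
A = rng.standard_normal((P, F)) / np.sqrt(F)

def random_features(x):
    return basis(x) @ A.T  # shape (len(x), P)

# Noisy regression data from an arbitrary target function.
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)

# Least-squares regression restricted to G_P.
Phi = random_features(x)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

def predict(x_new):
    return random_features(x_new) @ w
```

The point of the method is that the fit is computed in the P-dimensional random subspace rather than the full F-dimensional (or infinite-dimensional) space, at the cost of the approximation error the paper bounds.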


reference text

Dimitris Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4):671–687, June 2003.

Nir Ailon and Bernard Chazelle. Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform. In Proceedings of the 38th Annual ACM Symposium on Theory of Computing, STOC '06, pages 557–563, New York, NY, USA, 2006. ACM. ISBN 1-59593-134-1.

Pierre Alquier. PAC-Bayesian bounds for randomized empirical risk minimizers. Mathematical Methods of Statistics, 17(4):279–304, December 2008.

Pierre Alquier and Karim Lounici. PAC-Bayesian bounds for sparse regression estimation with exponential weights. Electronic Journal of Statistics, 5:127–145, 2011.

Jean-Marie Aubry and Stéphane Jaffard. Random wavelet series. Communications in Mathematical Physics, 227:483–514, 2002.

Jean-Yves Audibert and Olivier Catoni. Robust linear regression through PAC-Bayesian truncation. arXiv, 2010.

Andrew Barron, Albert Cohen, Wolfgang Dahmen, and Ronald DeVore. Approximation and learning by greedy algorithms. Annals of Statistics, 36(1):64–94, 2008.

Lucien Birgé and Pascal Massart. Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli, 4(3):329–375, September 1998.

Gérard Bourdaud. Ondelettes et espaces de Besov. Revista Matemática Iberoamericana, 11(3):477–512, 1995.

Hans-Joachim Bungartz and Michaël Griebel. Sparse grids. In Arieh Iserles, editor, Acta Numerica, volume 13. Cambridge University Press, 2004.

Stéphane Canu, Xavier Mary, and Alain Rakotomamonjy. Functional learning through kernel. In Advances in Learning Theory: Methods, Models and Applications, NATO Science Series III: Computer and Systems Sciences, pages 89–110. IOS Press, 2002.

Andrea Caponnetto and Ernesto De Vito. Optimal rates for the regularized least-squares algorithm. Foundations of Computational Mathematics, 7:331–368, July 2007. ISSN 1615-3375.

Olivier Catoni. Statistical Learning Theory and Stochastic Optimization. Springer-Verlag, 2004.

Don Coppersmith and Shmuel Winograd. Matrix multiplication via arithmetic progressions. In Proceedings of the 19th Annual ACM Symposium on Theory of Computing, STOC '87, pages 1–6, New York, NY, USA, 1987. ACM.

Sanjoy Dasgupta and Yoav Freund. Random projection trees and low dimensional manifolds. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing, STOC '08, pages 537–546, New York, NY, USA, 2008. ACM.

Sébastien Deguy and Albert Benassi. A flexible noise model for designing maps. In Proceedings of the Vision Modeling and Visualization Conference 2001, VMV '01, pages 299–308. Aka GmbH, 2001. ISBN 3-89838-028-9.

Ronald DeVore. Nonlinear approximation. Acta Numerica, 1997.

Richard M. Dudley. Real Analysis and Probability. Wadsworth, Belmont, CA, 1989.

Arnaud Durand. Random wavelet series based on a tree-indexed Markov chain. Communications in Mathematical Physics, 283:451–477, 2008.

Michaël Frazier and Björn Jawerth. Decomposition of Besov spaces. Indiana University Mathematics Journal, (34), 1985.

László Györfi, Michaël Kohler, Adam Krzyżak, and Harro Walk. A Distribution-Free Theory of Nonparametric Regression. Springer-Verlag, New York, 2002.

Svante Janson. Gaussian Hilbert Spaces. Cambridge University Press, Cambridge, UK, 1997.

Vladimir Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization. The Annals of Statistics, 34(6):2593–2656, 2006.

Edo Liberty, Nir Ailon, and Amit Singer. Dense fast random projections and lean Walsh transforms. In Proceedings of the 11th International Workshop, APPROX 2008, and 12th International Workshop, RANDOM 2008, on Approximation, Randomization and Combinatorial Optimization: Algorithms and Techniques, APPROX '08 / RANDOM '08, pages 512–522, Berlin, Heidelberg, 2008. Springer-Verlag. ISBN 978-3-540-85362-6.

Mikhail A. Lifshits. Gaussian Random Functions. Kluwer Academic Publishers, Dordrecht, Boston, 1995.

Odalric-Ambrym Maillard and Rémi Munos. Compressed least-squares regression. In Yoshua Bengio, Dale Schuurmans, John D. Lafferty, Chris K. I. Williams, and Aron Culotta, editors, Proceedings of the 23rd Conference on Advances in Neural Information Processing Systems, NIPS '09, pages 1213–1221, Vancouver, British Columbia, Canada, December 2009.

Odalric-Ambrym Maillard and Rémi Munos. Scrambled objects for least-squares regression. In John D. Lafferty, Chris K. I. Williams, John Shawe-Taylor, Richard S. Zemel, and Aron Culotta, editors, Proceedings of the 24th Conference on Advances in Neural Information Processing Systems, NIPS '10, pages 1549–1557, Vancouver, British Columbia, Canada, December 2010.

Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In John C. Platt, Daphne Koller, Yoram Singer, and Sam T. Roweis, editors, Proceedings of the 21st Conference on Advances in Neural Information Processing Systems, NIPS '07, Vancouver, British Columbia, Canada, December 2007. MIT Press.

Ali Rahimi and Benjamin Recht. Uniform approximation of functions with random bases. In Proceedings of the 46th Annual Allerton Conference, 2008.

Saburou Saitoh. Theory of Reproducing Kernels and Its Applications. Longman Scientific & Technical, Harlow, UK, 1988.

Tamás Sarlós. Improved approximation algorithms for large matrices via random projections. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS '06, pages 143–152, 2006.

Richard S. Sutton. Generalization in reinforcement learning: successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems 8, pages 1038–1044. MIT Press, 1996.

Richard S. Sutton and Steven D. Whitehead. Online learning with random representations. In Proceedings of the 10th International Conference on Machine Learning, ICML '93, pages 314–321. Morgan Kaufmann, 1993.

Robert Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58:267–288, 1996.

Andrei N. Tikhonov. Solution of incorrectly formulated problems and the regularization method. Soviet Mathematics Doklady, 4:1035–1038, 1963.

Alexandre B. Tsybakov. Optimal rates of aggregation. In Proceedings of the 16th Annual Conference on Learning Theory, pages 303–313, 2003.

Adriaan Zaanen. Linear Analysis. North Holland Publishing, 1960.

Christoph Zenger. Sparse grids. In W. Hackbusch, editor, Parallel Algorithms for Partial Differential Equations, Proceedings of the Sixth GAMM-Seminar, volume 31 of Notes on Numerical Fluid Mechanics, Kiel, 1990. Vieweg-Verlag.

Bin Zhao and Changshui Zhang. Compressed spectral clustering. In Proceedings of the 2009 IEEE International Conference on Data Mining Workshops, ICDMW '09, pages 344–349, Washington, DC, USA, 2009. IEEE Computer Society.