nips nips2012 nips2012-34 nips2012-34-reference knowledge-graph by maker-knowledge-mining

34 nips-2012-Active Learning of Multi-Index Function Models


Source: pdf

Author: Hemant Tyagi, Volkan Cevher

Abstract: We consider the problem of actively learning multi-index functions of the form $f(x) = g(Ax) = \sum_{i=1}^{k} g_i(a_i^T x)$ from point evaluations of $f$. We assume that the function $f$ is defined on an $\ell_2$-ball in $\mathbb{R}^d$, $g$ is twice continuously differentiable almost everywhere, and $A \in \mathbb{R}^{k \times d}$ is a rank-$k$ matrix, where $k \ll d$. We propose a randomized, active sampling scheme for estimating such functions with uniform approximation guarantees. Our theoretical developments leverage recent techniques from low-rank matrix recovery, which enables us to derive an estimator of the function $f$ along with sample complexity bounds. We also characterize the noise robustness of the scheme, and provide empirical evidence that the high-dimensional scaling of our sample complexity bounds is quite accurate.
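
To make the sampling idea concrete, here is a minimal, hypothetical Python/numpy sketch (the function name estimate_index_space and all parameters are illustrative, not taken from the paper). It exploits the fact that every gradient $\nabla f(x) = A^T \nabla g(Ax)$ lies in the $k$-dimensional row space of $A$, and recovers that space from finite-difference gradient estimates and an SVD. The paper's actual scheme instead uses compressive directional derivatives together with low-rank matrix recovery, and comes with uniform approximation and sample complexity guarantees that this simplification does not.

import numpy as np

# NOTE: a simplified, hypothetical sketch, not the paper's estimator. It only
# illustrates that grad f(x) = A^T grad g(Ax) lies in the k-dimensional row
# space of A, so span(A^T) can be recovered from a few approximate gradients.
# The paper instead uses compressive directional derivatives and
# nuclear-norm-based low-rank matrix recovery.

def estimate_index_space(f, d, k, num_points=200, eps=1e-4, radius=1.0, seed=None):
    """Return a d x k orthonormal basis approximating span(A^T)."""
    rng = np.random.default_rng(seed)
    grads = np.zeros((d, num_points))
    for j in range(num_points):
        # draw a sampling point uniformly from the l2-ball of the given radius
        x = rng.standard_normal(d)
        x *= radius * rng.uniform() ** (1.0 / d) / np.linalg.norm(x)
        fx = f(x)
        for i in range(d):
            e = np.zeros(d)
            e[i] = eps
            grads[i, j] = (f(x + e) - fx) / eps   # forward finite difference
    # the top-k left singular vectors approximately span the row space of A
    U, _, _ = np.linalg.svd(grads, full_matrices=False)
    return U[:, :k]

# Toy usage: k = 2 hidden directions in d = 20 ambient dimensions.
if __name__ == "__main__":
    d, k = 20, 2
    rng = np.random.default_rng(0)
    A = np.linalg.qr(rng.standard_normal((d, k)))[0].T   # k x d with orthonormal rows
    f = lambda x: np.sin(A[0] @ x) + (A[1] @ x) ** 2     # f(x) = g1(a1^T x) + g2(a2^T x)
    U_hat = estimate_index_space(f, d, k, seed=1)
    # singular values of A @ U_hat are the cosines of the principal angles
    # between the true and estimated index spaces (all close to 1 on success)
    print(np.round(np.linalg.svd(A @ U_hat, compute_uv=False), 3))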


reference text

[1] P. Bühlmann and S. van de Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer-Verlag New York Inc, 2011.

[2] L. Carin, R.G. Baraniuk, V. Cevher, D. Dunson, M.I. Jordan, G. Sapiro, and M.B. Wakin. Learning low-dimensional signal models. Signal Processing Magazine, IEEE, 28(2):39–51, 2011.

[3] M. Hristache, A. Juditsky, J. Polzehl, and V. Spokoiny. Structure adaptive approach for dimension reduction. The Annals of Statistics, 29(6):1537–1566, 2001.

[4] K.C. Li. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, pages 316–327, 1991.

[5] P. Hall and K.C. Li. On almost linearity of low dimensional projections from high dimensional data. The Annals of Statistics, pages 867–889, 1993.

[6] Y. Xia, H. Tong, W.K. Li, and L.X. Zhu. An adaptive estimation of dimension reduction space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3):363–410, 2002.

[7] Y. Xia. A multiple-index model and dimension reduction. Journal of the American Statistical Association, 103(484):1631–1640, 2008.

[8] Y. Lin and H.H. Zhang. Component selection and smoothing in multivariate nonparametric regression. The Annals of Statistics, 34(5):2272–2297, 2006.

[9] L. Meier, S. van de Geer, and P. Bühlmann. High-dimensional additive modeling. The Annals of Statistics, 37(6B):3779–3821, 2009.

[10] G. Raskutti, M. J. Wainwright, and B. Yu. Minimax-optimal rates for sparse additive models over kernel classes via convex programming. Technical Report, UC Berkeley, Department of Statistics, August 2010.

[11] P. Ravikumar, J. Lafferty, H. Liu, and L. Wasserman. Sparse additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(5):1009–1030, 2009.

[12] V. Koltchinskii and M. Yuan. Sparsity in multiple kernel learning. The Annals of Statistics, 38(6):3660–3695, 2010.

[13] A. Cohen, I. Daubechies, R. A. DeVore, G. Kerkyacharian, and D. Picard. Capturing ridge functions in high dimensions from point queries. Constr. Approx., pages 1–19, 2011.

[14] M. Fornasier, K. Schnass, and J. Vybíral. Learning functions of few arbitrary linear parameters in high dimensions. Preprint, 2010.

[15] H. Tyagi and V. Cevher. Learning ridge functions with randomized sampling in high dimensions. In ICASSP, 2011.

[16] J.F. Traub, G.W. Wasilkowski, and H. Woźniakowski. Information-based complexity. Academic Press, New York, 1988.

[17] R. DeVore and G.G. Lorentz. Constructive approximation. vol. 303, Grundlehren, Springer Verlag, N.Y., 1993.

[18] E. Novak and H. Woźniakowski. Approximation of infinitely differentiable multivariate functions is intractable. J. Complex., 25:398–404, August 2009.

[19] W. Härdle. Applied nonparametric regression, volume 26. Cambridge Univ Press, 1990.

[20] J.H. Friedman and W. Stuetzle. Projection pursuit regression. J. Amer. Statist. Assoc., 76:817–823, 1981.

[21] D.L. Donoho and I.M. Johnstone. Projection based regression and a duality with kernel methods. Ann. Statist., 17:58–106, 1989.

[22] P.J. Huber. Projection pursuit. Ann. Statist., 13:435–475, 1985.

[23] A. Pinkus. Approximation theory of the MLP model in neural networks. Acta Numerica, 8:143–195, 1999.

[24] E.J. Candès. Harmonic analysis of neural networks. Appl. Comput. Harmon. Anal., 6(2):197–218, 1999.

[25] N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. To appear in the IEEE Trans. on Information Theory, 2012.

[26] Hemant Tyagi and Volkan Cevher. Learning non-parametric basis independent models from point queries via low-rank methods. Technical Report, Infoscience EPFL, 2012.

[27] E.J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.

[28] E.J. Candès and T. Tao. The power of convex relaxation: near-optimal matrix completion. IEEE Trans. Inf. Theor., 56:2053–2080, May 2010.

[29] E.J. Candès and Y. Plan. Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements. CoRR, abs/1001.0339, 2010.

[30] B. Recht, M. Fazel, and P.A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52:471–501, 2010.

[31] B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. The Annals of Statistics, 28(5):1302–1338, 2000.