
199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression


Source: pdf

Author: Gilles Blanchard, Nicole Krämer

Abstract: We prove rates of convergence in the statistical sense for kernel-based least squares regression using a conjugate gradient algorithm, where regularization against overfitting is obtained by early stopping. This method is directly related to Kernel Partial Least Squares, a regression method that combines supervised dimensionality reduction with least squares projection. The rates depend on two key quantities: first, on the regularity of the target regression function, and second, on the effective dimensionality of the data mapped into the kernel space. Lower bounds on attainable rates depending on these two quantities were established in earlier literature, and we obtain upper bounds for the considered method that match these lower bounds (up to a log factor) if the true regression function belongs to the reproducing kernel Hilbert space. If this assumption is not fulfilled, we obtain similar convergence rates provided additional unlabeled data are available. The order of the learning rates matches state-of-the-art results that were recently obtained for least squares support vector machines and for linear regularization operators.
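The method described in the abstract (conjugate gradient iterations on the kernel matrix, with the number of iterations acting as the regularization parameter) can be sketched in a few lines. The sketch below is illustrative only, not the exact weighted variant analyzed in the paper: it runs textbook conjugate gradient on the system K alpha = y and stops early; the Gaussian kernel, the function names, and all parameter values are choices made for this example.

```python
import numpy as np

def gaussian_kernel(X, Z, gamma=1.0):
    # Illustrative RBF kernel; any positive semidefinite kernel could be used.
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def kernel_cg_regression(K, y, n_iter=10):
    """Standard conjugate gradient on K @ alpha = y, stopped after n_iter steps.

    Early stopping is the regularizer: a small n_iter plays the role that an
    explicit ridge penalty plays in kernel ridge regression. K is assumed
    symmetric positive definite (true for a Gaussian kernel on distinct points).
    """
    alpha = np.zeros(len(y))
    r = y - K @ alpha          # residual
    p = r.copy()               # search direction
    for _ in range(n_iter):
        rr = r @ r
        if rr < 1e-12:         # already (numerically) interpolating the data
            break
        Kp = K @ p
        step = rr / (p @ Kp)
        alpha += step * p
        r = r - step * Kp
        p = r + ((r @ r) / rr) * p
    return alpha

# Toy usage: fit on training data, predict by f(x) = sum_i alpha_i k(x, x_i).
rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(50, 1))
y_train = np.sin(3 * X_train[:, 0]) + 0.1 * rng.standard_normal(50)
K = gaussian_kernel(X_train, X_train, gamma=5.0)
alpha = kernel_cg_regression(K, y_train, n_iter=8)
X_test = np.linspace(-1, 1, 200)[:, None]
y_pred = gaussian_kernel(X_test, X_train, gamma=5.0) @ alpha
```

In practice the stopping iteration would be chosen by a data-driven rule such as cross-validation; the paper's contribution is to show that a suitably chosen stopping time attains (near-)optimal learning rates.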


reference text

[1] F. Bauer, S. Pereverzev, and L. Rosasco. On Regularization Algorithms in Learning Theory. Journal of Complexity, 23:52–72, 2007.

[2] N. Bissantz, T. Hohage, A. Munk, and F. Ruymgaart. Convergence Rates of General Regularization Methods for Statistical Inverse Problems and Applications. SIAM Journal on Numerical Analysis, 45(6):2610–2636, 2007.

[3] G. Blanchard and N. Krämer. Kernel Partial Least Squares is Universally Consistent. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, JMLR Workshop & Conference Proceedings, 9:57–64, 2010.

[4] G. Blanchard and P. Massart. Discussion of V. Koltchinskii’s “Local Rademacher complexities and oracle inequalities in risk minimization”. Annals of Statistics, 34(6):2664–2671, 2006.

[5] A. Caponnetto. Optimal Rates for Regularization Operators in Learning Theory. Technical Report CBCL Paper 264/ CSAIL-TR 2006-062, Massachusetts Institute of Technology, 2006.

[6] A. Caponnetto and E. De Vito. Optimal Rates for the Regularized Least-Squares Algorithm. Foundations of Computational Mathematics, 7(3):331–368, 2007.

[7] A. Caponnetto and Y. Yao. Cross-validation based Adaptation for Regularization Operators in Learning Theory. Analysis and Applications, 8(2):161–183, 2010.

[8] H. Chun and S. Keles. Sparse Partial Least Squares for Simultaneous Dimension Reduction and Variable Selection. Journal of the Royal Statistical Society B, 72(1):3–25, 2010.

[9] E. De Vito, L. Rosasco, A. Caponnetto, U. De Giovannini, and F. Odone. Learning from Examples as an Inverse Problem. Journal of Machine Learning Research, 6:883–904, 2005.

[10] L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer, 2002.

[11] M. Hanke. Conjugate Gradient Type Methods for Linear Ill-posed Problems. Pitman Research Notes in Mathematics Series, 327, 1995.

[12] L. Lo Gerfo, L. Rosasco, F. Odone, E. De Vito, and A. Verri. Spectral Algorithms for Supervised Learning. Neural Computation, 20:1873–1897, 2008.

[13] S. Mendelson and J. Neeman. Regularization in Kernel Learning. The Annals of Statistics, 38(1):526–565, 2010.

[14] P. Naik and C.L. Tsai. Partial Least Squares Estimator for Single-index Models. Journal of the Royal Statistical Society B, 62(4):763–771, 2000.

[15] A. S. Nemirovskii. The Regularizing Properties of the Adjoint Gradient Method in Ill-posed Problems. USSR Computational Mathematics and Mathematical Physics, 26(2):7–16, 1986.

[16] C. S. Ong. Kernels: Regularization and Optimization. Doctoral dissertation, Australian National University, 2005.

[17] C. S. Ong, X. Mary, S. Canu, and A. J. Smola. Learning with Non-positive Kernels. In Proceedings of the 21st International Conference on Machine Learning, pages 639–646, 2004.

[18] R. Rosipal and L.J. Trejo. Kernel Partial Least Squares Regression in Reproducing Kernel Hilbert Spaces. Journal of Machine Learning Research, 2:97–123, 2001.

[19] R. Rosipal, L.J. Trejo, and B. Matthews. Kernel PLS-SVC for Linear and Nonlinear Classification. In Proceedings of the Twentieth International Conference on Machine Learning, pages 640–647, Washington, DC, 2003.

[20] I. Steinwart, D. Hush, and C. Scovel. Optimal Rates for Regularized Least Squares Regression. In Proceedings of the 22nd Annual Conference on Learning Theory, pages 79–93, 2009.

[21] S. Wold, A. Ruhe, H. Wold, and W.J. Dunn III. The Collinearity Problem in Linear Regression. The Partial Least Squares (PLS) Approach to Generalized Inverses. SIAM Journal on Scientific and Statistical Computing, 5:735–743, 1984.

[22] T. Zhang. Learning Bounds for Kernel Regression Using Effective Data Dimensionality. Neural Computation, 17(9):2077–2098, 2005.