nips nips2012 nips2012-231 nips2012-231-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hachem Kadri, Alain Rakotomamonjy, Philippe Preux, Francis R. Bach
Abstract: Positive definite operator-valued kernels generalize the well-known notion of reproducing kernels, and are naturally adapted to multi-output learning situations. This paper addresses the problem of learning a finite linear combination of infinite-dimensional operator-valued kernels which are suitable for extending functional data analysis methods to nonlinear contexts. We study this problem in the case of kernel ridge regression for functional responses with an ℓr-norm constraint on the combination coefficients (r ≥ 1). The resulting optimization problem is more involved than those of multiple scalar-valued kernel learning since operator-valued kernels pose more technical and theoretical issues. We propose a multiple operator-valued kernel learning algorithm based on solving a system of linear operator equations by using a block coordinate-descent procedure. We experimentally validate our approach on a functional regression task in the context of finger movement prediction in brain-computer interfaces.
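The alternating scheme described in the abstract can be illustrated in a finite-dimensional setting. The sketch below is not the authors' implementation. It assumes that functional responses are discretized into vectors of length p, that each operator-valued kernel is separable, K_k(x, x') = g_k(x, x') T_k with a scalar Gram matrix g_k and a positive semidefinite p × p matrix T_k, and that the combination weights are refreshed with the closed-form ℓr-norm update known from scalar multiple kernel learning [15]. The names `gaussian_gram`, `solve_ridge`, and `movkl_ridge` are hypothetical.

```python
import numpy as np


def gaussian_gram(X, gamma):
    """Scalar Gaussian Gram matrix exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def solve_ridge(blocks, d, y, lam):
    """Solve (sum_k d_k K_k + lam * I) c = y for the stacked coefficients c."""
    K = sum(dk * B for dk, B in zip(d, blocks))
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)


def movkl_ridge(grams, ops, Y, lam=0.1, r=2.0, n_iter=20):
    """Block coordinate descent over coefficients c and kernel weights d.

    grams: list of (n, n) scalar Gram matrices g_k (assumed PSD)
    ops:   list of (p, p) PSD matrices T_k acting on discretized outputs
    Y:     (n, p) array of discretized functional responses
    """
    n, p = Y.shape
    # Block kernel matrix of each separable kernel: kron(g_k, T_k).
    blocks = [np.kron(G, T) for G, T in zip(grams, ops)]
    m = len(blocks)
    d = np.full(m, m ** (-1.0 / r))   # feasible start with ||d||_r = 1
    y = Y.reshape(-1)                 # row-major stacking matches kron(G, T)
    for _ in range(n_iter):
        # Step 1: with d fixed, solve the linear system for c.
        c = solve_ridge(blocks, d, y, lam)
        # Step 2: with c fixed, update d in closed form.
        # ||f_k||^2 = d_k^2 * c^T K_k c for the k-th component function.
        norms2 = np.array([dk ** 2 * (c @ B @ c) for dk, B in zip(d, blocks)])
        norms2 = np.maximum(norms2, 1e-12)  # guard against round-off
        num = norms2 ** (1.0 / (r + 1.0))
        den = np.sum(norms2 ** (r / (r + 1.0))) ** (1.0 / r)
        d = num / den
    # Final solve so the returned coefficients match the returned weights.
    c = solve_ridge(blocks, d, y, lam)
    return d, c.reshape(n, p)
```

In this sketch the dense solve costs O((np)^3) per iteration, which is only practical for small problems; the infinite-dimensional case treated in the paper requires solving operator equations instead of a matrix system.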
[1] J. Aflalo, A. Ben-Tal, C. Bhattacharyya, J. Saketha Nath, and S. Raman. Variable sparsity kernel learning. JMLR, 12:565–592, 2011.
[2] A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243–272, 2008.
[3] F. Bach. Consistency of the group Lasso and multiple kernel learning. JMLR, 9:1179–1225, 2008.
[4] C. Brouard, F. d’Alché-Buc, and M. Szafranski. Semi-supervised penalized output kernel regression for link prediction. In Proc. ICML, 2011.
[5] A. Caponnetto, C. A. Micchelli, M. Pontil, and Y. Ying. Universal multi-task kernels. JMLR, 9:1615–1646, 2008.
[6] C. Carmeli, E. De Vito, and A. Toigo. Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem. Analysis and Applications, 4:377–408, 2006.
[7] C. Carmeli, E. De Vito, and A. Toigo. Vector valued reproducing kernel Hilbert spaces and universality. Analysis and Applications, 8:19–61, 2010.
[8] C. Cortes, M. Mohri, and A. Rostamizadeh. L2 regularization for learning kernels. In Proc. UAI, 2009.
[9] C. Cortes, M. Mohri, and A. Rostamizadeh. Generalization bounds for learning kernels. In Proc. ICML, 2010.
[10] F. Dinuzzo, C. S. Ong, P. Gehler, and G. Pillonetto. Learning output kernels with block coordinate descent. In Proc. ICML, 2011.
[11] T. Evgeniou, C. A. Micchelli, and M. Pontil. Learning multiple tasks with kernel methods. JMLR, 6:615–637, 2005.
[12] H. Kadri, E. Duflos, P. Preux, S. Canu, and M. Davy. Nonlinear functional regression: a functional RKHS approach. In Proc. AISTATS, pages 111–125, 2010.
[13] H. Kadri, A. Rabaoui, P. Preux, E. Duflos, and A. Rakotomamonjy. Functional regularized least squares classification with operator-valued kernels. In Proc. ICML, 2011.
[14] H. Kadri, A. Rakotomamonjy, F. Bach, and P. Preux. Multiple operator-valued kernel learning. Technical Report 00677012, INRIA, 2012.
[15] M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien. ℓp-norm multiple kernel learning. JMLR, 12:953–997, 2011.
[16] S. Kurcyusz. On the existence and nonexistence of Lagrange multipliers in Banach spaces. Journal of Optimization Theory and Applications, 20:81–110, 1976.
[17] A. Kurdila and M. Zabarankin. Convex Functional Analysis. Birkhäuser Verlag, 2005.
[18] G. Lanckriet, N. Cristianini, L. El Ghaoui, P. Bartlett, and M. Jordan. Learning the kernel matrix with semi-definite programming. JMLR, 5:27–72, 2004.
[19] H. Lian. Nonlinear functional models for functional responses in reproducing kernel Hilbert spaces. The Canadian Journal of Statistics, 35:597–606, 2007.
[20] C. Micchelli and M. Pontil. Learning the kernel function via regularization. JMLR, 6:1099–1125, 2005.
[21] C. A. Micchelli and M. Pontil. On learning vector-valued functions. Neural Comput., 17:177–204, 2005.
[22] K. J. Miller and G. Schalk. Prediction of finger flexion: 4th brain-computer interface data competition. BCI Competition IV, 2008.
[23] T. Pistohl, T. Ball, A. Schulze-Bonhage, A. Aertsen, and C. Mehring. Prediction of arm movement trajectories from ECoG-recordings in humans. Journal of Neuroscience Methods, 167(1):105–114, 2008.
[24] A. Rakotomamonjy, F. Bach, Y. Grandvalet, and S. Canu. SimpleMKL. JMLR, 9:2491–2521, 2008.
[25] J. O. Ramsay and B. W. Silverman. Functional Data Analysis, 2nd ed. Springer Verlag, New York, 2005.
[26] J. A. Rice and B. W. Silverman. Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society. Series B, 53(1):233–243, 1991.
[27] G. Schalk, D. J. McFarland, T. Hinterberger, N. Birbaumer, and J. R. Wolpaw. BCI2000: a general-purpose brain-computer interface system. Biomedical Engineering, IEEE Trans. on, 51:1034–1043, 2004.
[28] B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, USA, 2002.
[29] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf. Large scale multiple kernel learning. JMLR, 7:1531–1565, 2006.
[30] P. Tseng. Convergence of block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl., 109:475–494, 2001.