
16 jmlr-2006: Bounds for Linear Multi-Task Learning


Source: pdf

Author: Andreas Maurer

Abstract: We give dimension-free and data-dependent bounds for linear multi-task learning, where a common linear operator is chosen to preprocess data for a vector of task-specific linear-thresholding classifiers. The complexity penalty of multi-task learning is bounded by a simple expression involving the margins of the task-specific classifiers, the Hilbert-Schmidt norm of the selected preprocessor and the Hilbert-Schmidt norm of the covariance operator for the total mixture of all task distributions, or, alternatively, the Frobenius norm of the total Gramian matrix in the data-dependent version. The results can be compared to state-of-the-art results on linear single-task learning.

Keywords: learning to learn, transfer learning, multi-task learning
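To make the shape of the result concrete, the following LaTeX sketch renders one schematic form of a bound of the kind the abstract describes. The notation is assumed here purely for illustration (m tasks with n examples each, task margins gamma_l, preprocessing operator T, total covariance operator C, confidence parameter delta); the constant c and the exact exponents are placeholders, not the paper's precise statement.

% Schematic multi-task margin bound, illustrating how the penalty combines
% the task margins, the Hilbert-Schmidt norm of the preprocessor T and the
% Hilbert-Schmidt norm of the total covariance operator C. All symbols and
% constants are assumed for illustration; see the paper for the exact bound.
\[
\frac{1}{m}\sum_{\ell=1}^{m} \mathrm{err}_\ell
\;\le\;
\frac{1}{m}\sum_{\ell=1}^{m} \widehat{\mathrm{err}}_{\gamma_\ell}
\;+\;
c\,\frac{\|T\|_{\mathrm{HS}}\,\sqrt{\|C\|_{\mathrm{HS}}}}{\sqrt{n}}
\left(\frac{1}{m}\sum_{\ell=1}^{m}\frac{1}{\gamma_\ell^{2}}\right)^{1/2}
\;+\;
\sqrt{\frac{\ln(1/\delta)}{2nm}}.
\]

The point of the sketch is the structure rather than the constants: an empirical margin error, a complexity penalty that is dimension-free because it depends on the data distribution only through the Hilbert-Schmidt norm of C (or, in the data-dependent version, the Frobenius norm of the total Gramian matrix), and a confidence term.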


Reference text

[1] R. K. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6: 1817-1853, 2005.

[2] M. Anthony and P. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge, UK, 1999.

[3] P. L. Bartlett and S. Mendelson. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results. Journal of Machine Learning Research, 3: 463-482, 2002.

[4] P. Bartlett, O. Bousquet and S. Mendelson. Local Rademacher complexities. Available online: http://www.stat.berkeley.edu/~bartlett/papers/bbm-lrc-02b.pdf.

[5] P. Baxendale. Gaussian measures on function spaces. Amer. J. Math., 98:891-952, 1976.

[6] J. Baxter. Theoretical Models of Learning to Learn. In Learning to Learn, S. Thrun and L. Pratt, editors. Springer, 1998.

[7] J. Baxter. A Model of Inductive Bias Learning. Journal of Artificial Intelligence Research, 12: 149-198, 2000.

[8] S. Ben-David and R. Schuller. Exploiting task relatedness for multiple task learning. In Proceedings of the Conference on Learning Theory (COLT), 2003.

[9] R. Caruana. Multitask Learning. In Learning to Learn, S. Thrun and L. Pratt, editors. Springer, 1998.

[10] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.

[11] T. Evgeniou and M. Pontil. Regularized multi-task learning. In Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD), 2004.

[12] V. Koltchinskii and D. Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. The Annals of Statistics, 30(1): 1-50, 2002.

[13] C. McDiarmid. Concentration. In Probabilistic Methods for Algorithmic Discrete Mathematics, pages 195-248. Springer, Berlin, 1998.

[14] C. A. Micchelli and M. Pontil. Kernels for multi-task learning. Available online, 2005.

[15] S. Mika, B. Schölkopf, A. Smola, K.-R. Müller, M. Scholz and G. Rätsch. Kernel PCA and De-noising in Feature Spaces. Advances in Neural Information Processing Systems 11, 1998.

[16] J. Shawe-Taylor and N. Cristianini. Estimating the moments of a random vector. In Proceedings of the GRETSI 2003 Conference, I: 47-52, 2003.

[17] M. Reed and B. Simon. Functional Analysis, Part I of Methods of Modern Mathematical Physics. Academic Press, 1980.

[18] S. Thrun. Lifelong Learning Algorithms. In Learning to Learn, S. Thrun and L. Pratt, editors. Springer, 1998.