jmlr jmlr2005 jmlr2005-45 jmlr2005-45-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Theodoros Evgeniou, Charles A. Micchelli, Massimiliano Pontil
Abstract: We study the problem of learning many related tasks simultaneously using kernel methods and regularization. The standard single-task kernel methods, such as support vector machines and regularization networks, are extended to the case of multi-task learning. Our analysis shows that the problem of estimating many task functions with regularization can be cast as a single-task learning problem if a family of multi-task kernel functions we define is used. These kernels model relations among the tasks and are derived from a novel form of regularizers. Specific kernels that can be used for multi-task learning are provided and experimentally tested on two real data sets. In agreement with past empirical work on multi-task learning, the experiments show that learning multiple related tasks simultaneously using the proposed approach can significantly outperform standard single-task learning, particularly when there are many related tasks but few data per task.
Keywords: multi-task learning, kernels, vector-valued functions, regularization, learning algorithms
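To make the abstract's central idea concrete, the sketch below casts several regression tasks as one kernel ridge regression problem over (input, task) pairs. It is a minimal illustration only: the specific kernel K((x, s), (x', t)) = (mu + [s == t]) * k(x, x'), the Gaussian base kernel, and all parameter names are assumptions chosen in the spirit of the paper, not the exact kernel family it defines.

# Minimal sketch (assumed, illustrative): multi-task learning as a single
# kernel method over (input, task) pairs.  mu -> 0 recovers independent
# single-task learning; large mu pushes all tasks toward one shared function.
import numpy as np

def base_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel between two sets of inputs."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def multitask_kernel(X1, t1, X2, t2, mu=1.0, gamma=1.0):
    """K((x, s), (x', t)) = (mu + [s == t]) * k(x, x') -- an illustrative
    multi-task kernel coupling tasks through the shared term mu."""
    same_task = (t1[:, None] == t2[None, :]).astype(float)
    return (mu + same_task) * base_kernel(X1, X2, gamma)

def fit(X, t, y, lam=0.1, mu=1.0, gamma=1.0):
    """Solve one regularized problem over all (input, task) pairs."""
    G = multitask_kernel(X, t, X, t, mu, gamma)
    alpha = np.linalg.solve(G + lam * np.eye(len(y)), y)
    return alpha

def predict(X_new, t_new, X, t, alpha, mu=1.0, gamma=1.0):
    """Predict for new inputs, each tagged with the task it belongs to."""
    return multitask_kernel(X_new, t_new, X, t, mu, gamma) @ alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two related tasks: noisy, shifted variants of the same function.
    X = rng.uniform(-1, 1, size=(40, 1))
    t = np.repeat([0, 1], 20)
    y = np.sin(3 * X[:, 0]) + 0.2 * t + 0.1 * rng.standard_normal(40)
    alpha = fit(X, t, y)
    print(predict(np.array([[0.5]]), np.array([0]), X, t, alpha))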
References:
G. M. Allenby and P. E. Rossi. Marketing models of consumer heterogeneity. Journal of Econometrics, 89, pp. 57–78, 1999.
R. K. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Technical Report RC23462, IBM T. J. Watson Research Center, 2004.
N. Arora, G. M. Allenby, and J. Ginter. A hierarchical Bayes model of primary and secondary demand. Marketing Science, 17(1), pp. 29–44, 1998.
N. Aronszajn. Theory of reproducing kernels. Trans. Amer. Math. Soc., 68, pp. 337–404, 1950.
B. Bakker and T. Heskes. Task clustering and gating for Bayesian multi-task learning. Journal of Machine Learning Research, 4, pp. 83–99, 2003.
J. Baxter. A Bayesian/information theoretic model of learning to learn via multiple task sampling. Machine Learning, 28, pp. 7–39, 1997.
J. Baxter. A model for inductive bias learning. Journal of Artificial Intelligence Research, 12, pp. 149–198, 2000.
S. Ben-David, J. Gehrke, and R. Schuller. A theoretical framework for learning from a pool of disparate data sources. Proceedings of Knowledge Discovery and Data Mining (KDD), 2002.
S. Ben-David and R. Schuller. Exploiting task relatedness for multiple task learning. Proceedings of Computational Learning Theory (COLT), 2003.
L. Breiman and J. H. Friedman. Predicting multivariate responses in multiple linear regression. Journal of the Royal Statistical Society, Series B, 1998.
P. J. Brown and J. V. Zidek. Adaptive multivariate ridge regression. The Annals of Statistics, 8(1), pp. 64–74, 1980.
R. Caruana. Multi-task learning. Machine Learning, 28, pp. 41–75, 1997.
F. R. K. Chung. Spectral Graph Theory. CBMS Series, AMS, Providence, 1997.
T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13, pp. 1–50, 2000.
T. Evgeniou, C. Boussios, and G. Zacharia. Generalized robust conjoint estimation. Marketing Science, 2005 (forthcoming).
T. Evgeniou and M. Pontil. Regularized multi-task learning. Proceedings of the 10th Conference on Knowledge Discovery and Data Mining (KDD), Seattle, WA, August 2004.
F. Girosi. Demographic Forecasting. PhD thesis, Harvard University, 2003.
W. Greene. Econometric Analysis. Prentice Hall, fifth edition, 2002.
B. Heisele, T. Serre, M. Pontil, T. Vetter, and T. Poggio. Categorization by learning and combining object parts. In Advances in Neural Information Processing Systems 14, Vancouver, Canada, Vol. 2, pp. 1239–1245, 2002.
T. Heskes. Empirical Bayes for learning to learn. Proceedings of ICML 2000, ed. P. Langley, pp. 367–374, 2000.
T. Jebara. Multi-task feature and kernel selection for SVMs. International Conference on Machine Learning (ICML), July 2004.
M. I. Jordan and R. A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 1993.
G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semi-definite programming. Journal of Machine Learning Research, 5, pp. 27–72, 2004.
G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble. A framework for genomic data fusion and its application to membrane protein prediction. Technical Report CSD-03-1273, Division of Computer Science, University of California, Berkeley, 2003.
O. L. Mangasarian. Nonlinear Programming. Classics in Applied Mathematics, SIAM, 1994.
C. A. Micchelli and M. Pontil. Learning the kernel via regularization. Research Note RN/04/11, Dept. of Computer Science, UCL, September 2004.
C. A. Micchelli and M. Pontil. On learning vector-valued functions. Neural Computation, 17, pp. 177–204, 2005.
C. A. Micchelli and M. Pontil. Kernels for multi-task learning. Proc. of the 18th Conf. on Neural Information Processing Systems, 2005.
R. Rifkin, S. Mukherjee, P. Tamayo, S. Ramaswamy, C. Yeang, M. Angelo, M. Reich, T. Poggio, E. Lander, T. Golub, and J. Mesirov. An analytical method for multi-class molecular cancer classification. SIAM Review, 45(4), pp. 706–723, 2003.
J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
B. Schölkopf and A. J. Smola. Learning with Kernels. The MIT Press, Cambridge, MA, USA, 2002.
D. L. Silver and R. E. Mercer. The parallel transfer of task knowledge using dynamic learning rates based on a measure of relatedness. Connection Science, 8, pp. 277–294, 1996.
V. Srivastava and T. Dwivedi. Estimation of seemingly unrelated regression equations: a brief survey. Journal of Econometrics, 10, pp. 15–32, 1971.
D. M. J. Tax and R. P. W. Duin. Support vector domain description. Pattern Recognition Letters, 20(11–13), pp. 1191–1199, 1999.
S. Thrun and L. Pratt. Learning to Learn. Kluwer Academic Publishers, November 1997.
S. Thrun and J. O'Sullivan. Clustering learning tasks and the selective cross-task transfer of knowledge. In Learning to Learn, S. Thrun and L. Y. Pratt, Eds., Kluwer Academic Publishers, 1998.
V. N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
G. Wahba. Spline Models for Observational Data. Series in Applied Mathematics, Vol. 59, SIAM, Philadelphia, 1990.
A. Zellner. An efficient method for estimating seemingly unrelated regression equations and tests for aggregation bias. Journal of the American Statistical Association, 57, pp. 348–368, 1962.