nips2007-135 (NIPS 2007): reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Edwin V. Bonilla, Kian M. Chai, Christopher Williams
Abstract: In this paper we investigate multi-task learning in the context of Gaussian processes (GPs). We propose a model that learns a shared covariance function on input-dependent features and a “free-form” covariance matrix over tasks. This gives considerable flexibility when modelling inter-task dependencies while avoiding the need for large amounts of training data. We show that under the assumption of noise-free observations and a block design, predictions for a given task depend only on its own target values, so that inter-task transfer cancels. We evaluate the benefits of our model on two practical applications: a compiler performance prediction problem and an exam score prediction task. Additionally, we make use of GP approximations and properties of our model in order to provide scalability to large data sets.
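To make the model concrete, below is a minimal NumPy sketch of the prediction step under the covariance structure the abstract describes, cov(f_l(x), f_k(x')) = Kf[l, k] * kx(x, x'), for a block design in which every task is observed at the same inputs. This is an illustrative reconstruction, not the authors' code: the names (kx, mtgp_predict, Kf), the squared-exponential choice for kx, and the small noise term (added for numerical stability) are all assumptions.

import numpy as np

def kx(X1, X2, length_scale=1.0):
    """Shared input covariance; squared-exponential is one possible choice."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def mtgp_predict(X, Y, Xstar, Kf, noise_var=1e-2):
    """Posterior mean for all tasks at test inputs Xstar.

    X: (n, d) training inputs shared by all m tasks (block design);
    Y: (n, m) targets, one column per task; Kf: (m, m) PSD task covariance.
    """
    n, m = Y.shape
    Kxx = kx(X, X)                       # (n, n) input covariance
    K = np.kron(Kf, Kxx)                 # joint covariance Kf "kron" Kx
    K += noise_var * np.eye(m * n)       # iid noise on every observation
    kstar = np.kron(Kf, kx(Xstar, X))    # cross-covariance to test points
    alpha = np.linalg.solve(K, Y.T.reshape(-1))  # targets stacked task-by-task
    return (kstar @ alpha).reshape(m, -1).T      # (n*, m) predicted means

# Usage: two correlated tasks observed at the same five inputs.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(5, 1))
f = np.sin(X)                            # latent function for task 1
Y = np.hstack([f, 0.8 * f])              # task 2 is a scaled copy of task 1
Kf = np.array([[1.0, 0.8], [0.8, 0.64]]) # rank-1 task covariance
print(mtgp_predict(X, Y, np.array([[0.0]]), Kf))

In the noise-free limit (noise_var = 0) with a block design, the joint covariance factorizes exactly as Kf kron Kx and, as the abstract notes, the posterior mean for each task reduces to a function of that task's own targets, so inter-task transfer cancels.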
[1] Jonathan Baxter. A Model of Inductive Bias Learning. Journal of Artificial Intelligence Research, 12:149–198, March 2000.
[2] Rich Caruana. Multitask Learning. Machine Learning, 28(1):41–75, July 1997.
[3] Edwin V. Bonilla, Felix V. Agakov, and Christopher K. I. Williams. Kernel Multi-task Learning using Task-specific Features. In Proceedings of the 11th AISTATS, March 2007.
[4] Kai Yu, Wei Chu, Shipeng Yu, Volker Tresp, and Zhao Xu. Stochastic Relational Models for Discriminative Link Prediction. In NIPS 19, Cambridge, MA, 2007. MIT Press.
[5] Yee Whye Teh, Matthias Seeger, and Michael I. Jordan. Semiparametric latent factor models. In Proceedings of the 10th AISTATS, pages 333–340, January 2005.
[6] Hao Zhang. Maximum-likelihood estimation for multivariate spatial linear coregionalization models. Environmetrics, 18(2):125–139, 2007.
[7] Hans Wackernagel. Multivariate Geostatistics: An Introduction with Applications. Springer-Verlag, Berlin, 2nd edition, 1998.
[8] A. O’Hagan. A Markov property for covariance structures. Statistics Research Report 98-13, Nottingham University, 1998.
[9] C. K. I. Williams, K. M. A. Chai, and E. V. Bonilla. A note on noise-free Gaussian process prediction with separable covariance functions and grid designs. Technical report, University of Edinburgh, 2007.
[10] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.
[11] Joaquin Quiñonero-Candela, Carl Edward Rasmussen, and Christopher K. I. Williams. Approximation Methods for Gaussian Process Regression. In Large Scale Kernel Machines. MIT Press, 2007. To appear.
[12] Michael E. Tipping and Christopher M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 61(3):611–622, 1999.
[13] S. Thrun. Is Learning the n-th Thing Any Easier Than Learning the First? In NIPS 8, Cambridge, MA, 1996. MIT Press.
[14] Thomas P. Minka and Rosalind W. Picard. Learning How to Learn is Learning with Point Sets. Unpublished manuscript, 1999.
[15] Neil D. Lawrence and John C. Platt. Learning to learn with the Informative Vector Machine. In Proceedings of the 21st International Conference on Machine Learning, July 2004.
[16] Kai Yu, Volker Tresp, and Anton Schwaighofer. Learning Gaussian Processes from Multiple Tasks. In Proceedings of the 22nd International Conference on Machine Learning, 2005.
[17] Anton Schwaighofer, Volker Tresp, and Kai Yu. Learning Gaussian Process Kernels via Hierarchical Bayes. In NIPS 17, Cambridge, MA, 2005. MIT Press.
[18] Shipeng Yu, Kai Yu, Volker Tresp, and Hans-Peter Kriegel. Collaborative Ordinal Regression. In Proceedings of the 23rd International Conference on Machine Learning, June 2006.
[19] Theodoros Evgeniou, Charles A. Micchelli, and Massimiliano Pontil. Learning Multiple Tasks with Kernel Methods. Journal of Machine Learning Research, 6:615–637, April 2005.
[20] Bart Bakker and Tom Heskes. Task Clustering and Gating for Bayesian Multitask Learning. Journal of Machine Learning Research, 4:83–99, May 2003.