jmlr jmlr2012 jmlr2012-1 jmlr2012-1-reference knowledge-graph by maker-knowledge-mining

1 jmlr-2012-A Case Study on Meta-Generalising: A Gaussian Processes Approach


Source: pdf

Author: Grigorios Skolidis, Guido Sanguinetti

Abstract: We propose a novel model for meta-generalisation, that is, performing prediction on novel tasks based on information from multiple different but related tasks. The model is based on two coupled Gaussian processes with structured covariance function; one model performs predictions by learning a constrained covariance function encapsulating the relations between the various training tasks, while the second model determines the similarity of new tasks to previously seen tasks. We demonstrate empirically on several real and synthetic data sets both the strengths of the approach and its limitations due to the distributional assumptions underpinning it.

Keywords: transfer learning, meta-generalising, multi-task learning, Gaussian processes, mixture of experts
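To make the abstract's construction concrete, the sketch below gives a minimal numpy illustration of its two ingredients. This is an illustrative reading, not the authors' implementation: the structured covariance is written in the Kronecker form of the multi-task GP of Bonilla et al. (2008), which the paper cites, and the second model is stood in for by a simple soft similarity score over task descriptors; all names here (rbf_kernel, multitask_covariance, task_responsibilities, Z_train) are assumptions made for the sketch.

import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    # Squared-exponential covariance between rows of X1 (n1 x d) and X2 (n2 x d).
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def multitask_covariance(Kf, X, lengthscale=1.0):
    # Structured covariance over all (task, input) pairs: K = Kf kron Kx,
    # where Kf couples the M tasks and Kx is the covariance over inputs.
    Kx = rbf_kernel(X, X, lengthscale)
    return np.kron(Kf, Kx)

def task_responsibilities(z_new, Z_train, lengthscale=1.0):
    # Soft similarity of a new task's descriptor to the M training-task
    # descriptors; a stand-in for the paper's second (task-assignment) model.
    s = rbf_kernel(z_new[None, :], Z_train, lengthscale).ravel()
    return s / s.sum()

# Toy usage: M = 3 tasks sharing N = 5 inputs in 2-D.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
Kf = np.array([[1.0, 0.8, 0.1],
               [0.8, 1.0, 0.2],
               [0.1, 0.2, 1.0]])    # assumed positive-definite task covariance
K = multitask_covariance(Kf, X)     # 15 x 15 joint covariance over task/input pairs
Z_train = rng.normal(size=(3, 4))   # hypothetical task descriptors
w = task_responsibilities(Z_train[0] + 0.1, Z_train)
print(K.shape, w.round(3))

In a mixture-of-experts reading (Jacobs et al., 1991, also cited below), the weights w would mix the per-task predictive distributions when labelling points from the new task.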


reference text

J. H. Albert and S. Chib. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422):669–679, 1993.
R.K. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. The Journal of Machine Learning Research, 6:1817–1853, 2005.
A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243–272, 2008.
A. Arnold, R. Nallapati, and W.W. Cohen. A comparative study of methods for transductive transfer learning. In Proceedings of the 7th IEEE International Conference on Data Mining Workshops, pages 77–82, Omaha, Nebraska, USA, 2007.
B. Bakker and T. Heskes. Task clustering and gating for Bayesian multitask learning. The Journal of Machine Learning Research, 4:83–99, 2003.
J. Baxter. A model of inductive bias learning. Journal of Artificial Intelligence Research, 12:149–198, 2000.
S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. In Advances in Neural Information Processing Systems 19, pages 137–145, Vancouver, Canada, 2007.
S. Ben-David, T. Luu, T. Lu, and D. Pál. Impossibility theorems for domain adaptation. In Proceedings of the 13th International Workshop on Artificial Intelligence and Statistics, volume 13, pages 129–136, Sardinia, Italy, 2010.
S. Bickel, M. Brückner, and T. Scheffer. Discriminative learning under covariate shift. The Journal of Machine Learning Research, 10:2137–2155, 2009.
E. Bonilla, K.M. Chai, and C.K.I. Williams. Multi-task Gaussian process prediction. In Advances in Neural Information Processing Systems 20, pages 153–160, Vancouver, Canada, 2008.
K.H. Brodersen, C.S. Ong, K.E. Stephan, and J.M. Buhmann. The binormal assumption on precision-recall curves. In Proceedings of the 2010 International Conference on Pattern Recognition, pages 4263–4266, Istanbul, Turkey, 2010.
R. Caruana. Multi-task learning. Machine Learning, 28(1):41–75, 1997.
O. Chapelle, B. Schölkopf, and A. Zien. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2006.
K. Crammer, M. Kearns, and J. Wortman. Learning from multiple sources. The Journal of Machine Learning Research, 9:1757–1774, 2008.
N.A.C. Cressie. Statistics for Spatial Data. John Wiley & Sons, New York, USA, 1993.
L. Csató, E. Fokoué, M. Opper, B. Schottky, and O. Winther. Efficient approaches to Gaussian process classification. In Advances in Neural Information Processing Systems 12, pages 251–257, Denver, Colorado, 2000.
H. Daumé III. Frustratingly easy domain adaptation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 256–263, 2007.
H. Daumé III. Bayesian multitask learning with latent hierarchies. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, pages 135–142, Montreal, Canada, 2009.
H. Daumé III and D. Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26(1):101–126, 2006.
J. Davis and M. Goadrich. The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, pages 233–240, Pittsburgh, USA, 2006.
I. Deak. Three digit accurate multiple normal probabilities. Numerische Mathematik, 35(4):369–380, 1980.
H. I. Gassmann, I. Deak, and T. Szantai. Computing multivariate normal probabilities: A new look. Journal of Computational and Graphical Statistics, 11(4):920–949, 2002.
A. Genz. Numerical computation of multivariate normal probabilities. Journal of Computational and Graphical Statistics, 1(2):141–149, 1992.
M. Girolami and S. Rogers. Variational Bayesian multinomial probit regression with Gaussian process priors. Neural Computation, 18(8):1790–1817, 2006.
M. Girolami and M. Zhong. Data integration for classification problems employing Gaussian process priors. In Advances in Neural Information Processing Systems 19, pages 465–472, Vancouver, Canada, 2007.
A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. Ch. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23):215–220, 2000.
A. K. Gupta and D. K. Nagar. Matrix Variate Distributions. Chapman & Hall/CRC, 2000.
J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1):29–36, April 1982.
G.E. Hinton. Connectionist learning procedures. Artificial Intelligence, 40(1-3):185–234, 1989.
J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf. Correcting sample selection bias by unlabeled data. In Advances in Neural Information Processing Systems 19, pages 601–608, Vancouver, Canada, 2007.
R.A. Jacobs, M.I. Jordan, S.J. Nowlan, and G.E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79–87, 1991.
L.I. Kuncheva. A theoretical study on six classifier fusion strategies. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2):281–286, 2002.
Q. Liu, X. Liao, H. Li, J. R. Stack, and L. Carin. Semisupervised multitask learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6):1074–1086, 2009.
D.J.C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
Y. Mansour, M. Mohri, and A. Rostamizadeh. Domain adaptation with multiple sources. In Advances in Neural Information Processing Systems 21, pages 1041–1048, Vancouver, Canada, 2009.
T.P. Minka. Expectation propagation for approximate Bayesian inference. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, volume 17, pages 362–369, San Francisco, CA, USA, 2001.
M. Opper and O. Winther. Gaussian processes for classification: Mean-field algorithms. Neural Computation, 12(11):2655–2684, 2000.
S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010.
S.J. Pan, I.W. Tsang, J.T. Kwok, and Q. Yang. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2):199–210, 2011.
R. Raina, A. Battle, H. Lee, B. Packer, and A.Y. Ng. Self-taught learning: Transfer learning from unlabeled data. In Proceedings of the 24th International Conference on Machine Learning, pages 759–766, Corvallis, OR, USA, 2007.
C.E. Rasmussen and C.K.I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2005.
C.E. Rasmussen and Z. Ghahramani. Infinite mixtures of Gaussian process experts. In Advances in Neural Information Processing Systems 14, pages 881–888, Vancouver, Canada, 2001.
R. Rebonato and P. Jäckel. The most general methodology to create a valid correlation matrix for risk management and option pricing purposes. Journal of Risk, 2(2), 2000.
G. Skolidis and G. Sanguinetti. Bayesian multitask classification with Gaussian process priors. IEEE Transactions on Neural Networks, 22(12):2011–2021, December 2011.
G. Skolidis, R.H. Clayton, and G. Sanguinetti. Automatic classification of arrhythmic beats using Gaussian processes. In Computers in Cardiology 2008, pages 921–924, Bologna, Italy, 2008.
E. Snelson and Z. Ghahramani. Sparse Gaussian processes using pseudo-inputs. In Advances in Neural Information Processing Systems 18, pages 1257–1264, Vancouver, Canada, 2006.
A. J. Storkey and M. Sugiyama. Mixture regression for covariate shift. In Advances in Neural Information Processing Systems 19, pages 1337–1344, Vancouver, Canada, 2007.
M. Sugiyama, M. Krauledat, and K.R. Müller. Covariate shift adaptation by importance weighted cross validation. The Journal of Machine Learning Research, 8:985–1005, 2007.
V. Tresp. Mixtures of Gaussian processes. In Advances in Neural Information Processing Systems 13, pages 654–660, Vancouver, Canada, 2000.
S.R. Waterhouse. Classification and Regression Using Mixtures of Experts. PhD thesis, Department of Engineering, Cambridge University, 1997.
Y. Xue, X. Liao, L. Carin, and B. Krishnapuram. Multi-task learning for classification with Dirichlet process priors. The Journal of Machine Learning Research, 8:35–63, 2007.
K. Yu, V. Tresp, and A. Schwaighofer. Learning Gaussian processes from multiple tasks. In Proceedings of the 22nd International Conference on Machine Learning, pages 1012–1019, Bonn, Germany, 2005.