Source: pdf
Author: Mauricio A. Álvarez, Neil D. Lawrence
Abstract: Recently there has been an increasing interest in regression methods that deal with multiple outputs. This has been motivated partly by frameworks like multitask learning, multisensor networks or structured output data. From a Gaussian processes perspective, the problem reduces to specifying an appropriate covariance function that, whilst being positive semi-definite, captures the dependencies between all the data points and across all the outputs. One approach to account for non-trivial correlations between outputs employs convolution processes. Under a latent function interpretation of the convolution transform we establish dependencies between output variables. The main drawbacks of this approach are the associated computational and storage demands. In this paper we address these issues. We present different efficient approximations for dependent output Gaussian processes constructed through the convolution formalism. We exploit the conditional independencies present naturally in the model. This leads to a form of the covariance similar in spirit to the so-called PITC and FITC approximations for a single output. We show experimental results with synthetic and real data; in particular, we show results in school exam score prediction, pollution prediction and gene expression data.
Keywords: Gaussian processes, convolution processes, efficient approximations, multitask learning, structured outputs, multivariate processes
Mauricio A. Álvarez and Neil D. Lawrence. Sparse convolved Gaussian processes for multi-output regression. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 57–64. MIT Press, Cambridge, MA, 2009.
Mauricio A. Álvarez, David Luengo, and Neil D. Lawrence. Latent Force Models. In David van Dyk and Max Welling, editors, Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, pages 9–16. JMLR W&CP 5, Clearwater Beach, Florida, 16-18 April 2009.
Mauricio A. Álvarez, David Luengo, Michalis K. Titsias, and Neil D. Lawrence. Efficient multioutput Gaussian processes through variational inducing kernels. In Yee Whye Teh and Mike Titterington, editors, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 25–32. JMLR W&CP 9, Chia Laguna, Sardinia, Italy, 13-15 May 2010.
Mauricio A. Álvarez, Jan Peters, Bernhard Schölkopf, and Neil D. Lawrence. Switched latent force models for movement segmentation. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 55–63. MIT Press, Cambridge, MA, 2011a.
Mauricio A. Álvarez, Lorenzo Rosasco, and Neil D. Lawrence. Kernels for vector-valued functions: a review, 2011b. Universidad Tecnológica de Pereira, Massachusetts Institute of Technology and University of Sheffield. In preparation.
Bart Bakker and Tom Heskes. Task clustering and gating for Bayesian multitask learning. Journal of Machine Learning Research, 4:83–99, 2003.
Ronald Paul Barry and Jay M. Ver Hoef. Blackbox kriging: spatial prediction without specifying variogram models. Journal of Agricultural, Biological and Environmental Statistics, 1(3):297–322, 1996.
Edwin V. Bonilla, Kian Ming Chai, and Christopher K. I. Williams. Multi-task Gaussian process prediction. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 153–160. MIT Press, Cambridge, MA, 2008.
Phillip Boyle and Marcus Frean. Dependent Gaussian processes. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems 17, pages 217–224. MIT Press, Cambridge, MA, 2005.
Michael Brookes. The matrix reference manual. Available on-line, 2005. http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html.
Catherine A. Calder. Exploring Latent Structure in Spatial Temporal Processes Using Process Convolutions. PhD thesis, Institute of Statistics and Decision Sciences, Duke University, Durham, NC, USA, 2003.
Catherine A. Calder. Dynamic factor process convolution models for multivariate space-time data with application to air quality assessment. Environmental and Ecological Statistics, 14(3):229–247, 2007.
Catherine A. Calder and Noel Cressie. Some topics in convolution-based spatial modeling. In Proceedings of the 56th Session of the International Statistics Institute, August 2007.
Rich Caruana. Multitask learning. Machine Learning, 28:41–75, 1997.
Kian Ming A. Chai, Christopher K. I. Williams, Stefan Klanke, and Sethu Vijayakumar. Multi-task Gaussian process learning of robot inverse dynamics. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 265–272. MIT Press, Cambridge, MA, 2009.
Noel A. C. Cressie. Statistics for Spatial Data. John Wiley & Sons (Revised edition), USA, 1993.
Lehel Csató and Manfred Opper. Sparse representation for Gaussian process models. In Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, Advances in Neural Information Processing Systems 13, pages 444–450. MIT Press, Cambridge, MA, 2001.
Theodoros Evgeniou and Massimiliano Pontil. Regularized Multi-task Learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 109–117, 2004.
Theodoros Evgeniou, Charles A. Micchelli, and Massimiliano Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6:615–637, 2005.
Montserrat Fuentes. Interpolation of nonstationary air pollution processes: a spatial spectral approach. Statistical Modelling, 2:281–298, 2002a.
Montserrat Fuentes. Spectral methods for nonstationary spatial processes. Biometrika, 89(1):197–210, 2002b.
Pei Gao, Antti Honkela, Magnus Rattray, and Neil D. Lawrence. Gaussian process modelling of latent chemical species: Applications to inferring transcription factor activities. Bioinformatics, 24:i70–i75, 2008. doi: 10.1093/bioinformatics/btn278.
Marc G. Genton. Classes of kernels for machine learning: A statistics perspective. Journal of Machine Learning Research, 2:299–312, 2001.
Harvey Goldstein. Multilevel modelling of survey data. The Statistician, 40(2):235–244, 1991.
Pierre Goovaerts. Geostatistics For Natural Resources Evaluation. Oxford University Press, USA, 1997.
Michel Goulard and Marc Voltz. Linear coregionalization model: Tools for estimation and choice of cross-variogram matrix. Mathematical Geology, 24(3):269–286, 1992.
Jeffrey D. Helterbrand and Noel Cressie. Universal cokriging under intrinsic coregionalization. Mathematical Geology, 26(2):205–226, 1994.
Tom Heskes. Empirical Bayes for learning to learn. In P. Langley, editor, Proceedings of the Seventeenth International Conference on Machine Learning 17, pages 367–374. Morgan Kaufmann, San Francisco, CA, June 29-July 2 2000.
David M. Higdon. A process-convolution approach to modeling temperatures in the north atlantic ocean. Journal of Ecological and Environmental Statistics, 5:173–190, 1998.
David M. Higdon. Space and space-time modelling using process convolutions. In C. Anderson, V. Barnett, P. Chatwin, and A. El-Shaarawi, editors, Quantitative Methods for Current Environmental Issues, pages 37–56. Springer-Verlag, 2002.
David M. Higdon, Jenise Swall, and John Kern. Non-stationary spatial modeling. In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics 6, pages 761–768. Oxford University Press, 1998.
Antti Honkela, Charles Girardot, E. Hilary Gustafson, Ya-Hsin Liu, Eileen E. M. Furlong, Neil D. Lawrence, and Magnus Rattray. Model-based method for transcription factor target identification with limited data. Proc. Natl. Acad. Sci., 107(17):7793–7798, 2010.
Andre G. Journel and Charles J. Huijbregts. Mining Geostatistics. Academic Press, London, 1978. ISBN 0-12391-050-1.
Neil D. Lawrence. Learning for larger datasets with the Gaussian process latent variable model. In Marina Meila and Xiaotong Shen, editors, AISTATS 11, pages 243–250. Omnipress, San Juan, Puerto Rico, 21-24 March 2007.
Neil D. Lawrence, Matthias Seeger, and Ralf Herbrich. Fast sparse Gaussian process methods: The informative vector machine. In Sue Becker, Sebastian Thrun, and Klaus Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 625–632. MIT Press, Cambridge, MA, 2003.
Neil D. Lawrence, Guido Sanguinetti, and Magnus Rattray. Modelling transcriptional regulation using Gaussian processes. In Bernhard Schölkopf, John C. Platt, and Thomas Hofmann, editors, Advances in Neural Information Processing Systems 19, pages 785–792. MIT Press, Cambridge, MA, 2007.
Feng Liang, Kai Mao, Ming Liao, Sayan Mukherjee, and Mike West. Non-parametric Bayesian kernel models. Department of Statistical Science, Duke University, Discussion Paper 07-10. (Submitted for publication), 2009.
Desmond L. Nuttall, Harvey Goldstein, Robert Prosser, and Jon Rasbash. Differential school effectiveness. International Journal of Educational Research, 13(7):769–776, 1989.
Michael A. Osborne and Stephen J. Roberts. Gaussian processes for prediction. Technical report, Department of Engineering Science, University of Oxford, 2007.
Michael A. Osborne, Alex Rogers, Sarvapali D. Ramchurn, Stephen J. Roberts, and Nicholas R. Jennings. Towards real-time information processing of sensor network data using computationally efficient multi-output Gaussian processes. In Proceedings of the International Conference on Information Processing in Sensor Networks (IPSN 2008), 2008.
Christopher J. Paciorek and Mark J. Schervish. Nonstationary covariance functions for Gaussian process regression. In Sebastian Thrun, Lawrence Saul, and Bernhard Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.
Natesh S. Pillai, Qiang Wu, Feng Liang, Sayan Mukherjee, and Robert L. Wolpert. Characterizing the function space for Bayesian kernel models. Journal of Machine Learning Research, 8:1769–1797, 2007.
Joaquin Quiñonero-Candela and Carl Edward Rasmussen. A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6:1939–1959, 2005.
Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006. ISBN 0-262-18253-X.
Matthias Seeger, Christopher K. I. Williams, and Neil D. Lawrence. Fast forward selection to speed up sparse Gaussian process regression. In Christopher M. Bishop and Brendan J. Frey, editors, Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West, FL, 3–6 Jan 2003.
Alexander J. Smola and Peter L. Bartlett. Sparse greedy Gaussian process regression. In Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, Advances in Neural Information Processing Systems 13, pages 619–625. MIT Press, Cambridge, MA, 2001.
Edward Snelson and Zoubin Ghahramani. Sparse Gaussian processes using pseudo-inputs. In Yair Weiss, Bernhard Schölkopf, and John C. Platt, editors, Advances in Neural Information Processing Systems 18, pages 1257–1264. MIT Press, Cambridge, MA, 2006.
Edward Snelson and Zoubin Ghahramani. Local and global sparse Gaussian process approximations. In Marina Meila and Xiaotong Shen, editors, AISTATS 11, pages 524–531, San Juan, Puerto Rico, 21-24 March 2007. Omnipress.
Yee Whye Teh, Matthias Seeger, and Michael I. Jordan. Semiparametric latent factor models. In Robert G. Cowell and Zoubin Ghahramani, editors, AISTATS 10, pages 333–340. Society for Artificial Intelligence and Statistics, Barbados, 6-8 January 2005.
Michalis K. Titsias. Variational learning of inducing variables in sparse Gaussian processes. In David van Dyk and Max Welling, editors, Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, pages 567–574. JMLR W&CP 5, Clearwater Beach, Florida, 16-18 April 2009.
Pavel Tomancak, Amy Beaton, Richard Weiszmann, Elaine Kwan, ShengQiang Shu, Suzanna E Lewis, Stephen Richards, Michael Ashburner, Volker Hartenstein, Susan E Celniker, and Gerald M Rubin. Systematic determination of patterns of gene expression during drosophila embryogenesis. Genome Biology, 3(12):research0088.1–0088.14, 2002.
Jay M. Ver Hoef and Ronald Paul Barry. Constructing and fitting models for cokriging and multivariable spatial prediction. Journal of Statistical Planning and Inference, 69:275–294, 1998.
Hans Wackernagel. Multivariate Geostatistics. Springer-Verlag Heidelberg New York, 2003.
Christopher K. Wikle. A kernel-based spectral model for non-Gaussian spatio-temporal processes. Statistical Modelling, 2:299–314, 2002.
Christopher K. Wikle. Hierarchical Bayesian models for predicting the spread of ecological processes. Ecology, 84(6):1382–1394, 2003.
Christopher K. Wikle, L. Mark Berliner, and Noel Cressie. Hierarchical Bayesian space-time models. Environmental and Ecological Statistics, 5:117–154, 1998.
Christopher K. I. Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, Advances in Neural Information Processing Systems 13, pages 682–688. MIT Press, Cambridge, MA, 2001.
Ya Xue, Xuejun Liao, and Lawrence Carin. Multi-task learning for classification with Dirichlet process priors. Journal of Machine Learning Research, 8:35–63, 2007.
Robert P. Zinzen, Charles Girardot, Julien Gagneur, Martina Braun, and Eileen E. M. Furlong. Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature, 462:65–70, 2009.