nips2010-147: reference knowledge graph by maker-knowledge-mining
Source: pdf
Author: Yi Zhang, Jeff G. Schneider
Abstract: In this paper, we propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the matrix correspond to tasks and features, respectively. Following the matrix-variate normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of the row covariance and the column covariance, which characterizes both task relatedness and feature representation. Several recently proposed methods are variants or special cases of this formulation. To address overfitting and select meaningful task and feature structures, we incorporate sparse covariance selection into our matrix-normal regularization via ℓ1 penalties on the task and feature inverse covariances. We empirically study the proposed method and compare it with related models on two real-world problems: detecting landmines in multiple fields and recognizing faces across different subjects. Experimental results show that the proposed framework provides an effective and flexible way to model various structures of multiple tasks.
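For concreteness, a minimal LaTeX sketch of the regularizer the abstract describes, assuming a parameter matrix W in R^{m x p} whose m rows index tasks and p columns index features; the symbols Omega (task/row covariance), Sigma (feature/column covariance), and the trade-off weights lambda, gamma are illustrative and not necessarily the paper's notation:

% Matrix-variate normal prior: vec(W) ~ N(0, \Sigma \otimes \Omega), i.e. the
% full covariance of the matrix elements is the Kronecker product of the
% feature (column) covariance \Sigma and the task (row) covariance \Omega.
% Up to constants and scaling, the negative log-density yields the coupling penalty
\operatorname{tr}\!\left( \Omega^{-1} W \Sigma^{-1} W^{\top} \right)
  - p \log\det\!\left( \Omega^{-1} \right)
  - m \log\det\!\left( \Sigma^{-1} \right)
% and sparse covariance selection adds \ell_1 penalties on both inverses:
  + \lambda \left\lVert \Omega^{-1} \right\rVert_{1}
  + \gamma \left\lVert \Sigma^{-1} \right\rVert_{1} .

Sparse Omega^{-1} encodes conditional independence between tasks and sparse Sigma^{-1} between features, which is how the formulation selects task and feature structures while coupling the models.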
[1] R. K. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6:1817–1853, 2005.
[2] A. Argyriou, T. Evgeniou, and M. Pontil. Multi-task feature learning. In NIPS, 2006.
[3] A. Argyriou, C. A. Micchelli, M. Pontil, and Y. Ying. A spectral regularization framework for multi-task structure learning. In NIPS, 2007.
[4] B. Bakker and T. Heskes. Task clustering and gating for Bayesian multitask learning. Journal of Machine Learning Research, 4:83–99, 2003.
[5] O. Banerjee, L. E. Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9:485–516, 2008.
[6] J. Baxter. Learning Internal Representations. In COLT, pages 311–320, 1995.
[7] E. Bonilla, K. M. Chai, and C. Williams. Multi-task Gaussian process prediction. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, NIPS, pages 153–160, 2008.
[8] E. V. Bonilla, F. V. Agakov, and C. K. I. Williams. Kernel multi-task learning using task-specific features. In AISTATS, 2007.
[9] P. J. Brown and M. Vannucci. Multivariate Bayesian Variable Selection and Prediction. Journal of the Royal Statistical Society, Series B, 60(3):627–641, 1998.
[10] D. Cai, X. He, J. Han, and H. Zhang. Orthogonal Laplacianfaces for face recognition. IEEE Transactions on Image Processing, 15(11):3608–3614, 2006.
[11] R. Caruana. Multitask Learning. Machine Learning, 28:41–75, 1997.
[12] J. Chen, L. Tang, J. Liu, and J. Ye. A Convex Formulation for Learning Shared Structures from Multiple Tasks. In ICML, 2009.
[13] A. P. Dawid. Some matrix-variate distribution theory: Notational considerations and a bayesian application. Biometrika, 68(1):265–274, 1981.
[14] A. P. Dempster. Covariance selection. Biometrics, 28(1):157–175, 1972.
[15] J. Duchi, S. Gould, and D. Koller. Projected subgradient methods for learning sparse gaussians. In Proceedings of the Twenty-fourth Conference on Uncertainty in AI (UAI), 2008.
[16] P. Dutilleul. The MLE Algorithm for the Matrix Normal Distribution. J. Statist. Comput. Simul., 64:105–123, 1999.
[17] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 2007.
[18] A. K. Gupta and D. K. Nagar. Matrix Variate Distributions. Chapman & Hall, 1999.
[19] B. Hariharan, S. Vishwanathan, and M. Varma. Large scale max-margin multi-label classification with priors. In ICML, 2010.
[20] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
[21] L. Jacob, F. Bach, and J. P. Vert. Clustered multi-task learning: A convex formulation. In NIPS, pages 745–752, 2008.
[22] J. Nocedal and S. Wright. Numerical Optimization. Springer, 2000.
[23] G. Obozinski, B. Taskar, and M. I. Jordan. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, 2009.
[24] S. Thrun and J. O’Sullivan. Discovering Structure in Multiple Learning Tasks: The TC Algorithm. In ICML, pages 489–497, 1996.
[25] L. Vandenberghe, S. Boyd, and S.-P. Wu. Determinant maximization with linear matrix inequality constraints. SIAM Journal on Matrix Analysis and Applications, 19:499–533, 1996.
[26] Y. Xue, X. Liao, L. Carin, and B. Krishnapuram. Multi-task learning for classification with Dirichlet process priors. Journal of Machine Learning Research, 8:35–63, 2007.
[27] K. Yu, W. Chu, S. Yu, V. Tresp, and Z. Xu. Stochastic relational models for discriminative link prediction. In NIPS, pages 1553–1560, 2007.
[28] K. Yu, J. Lafferty, S. Zhu, and Y. Gong. Large-scale collaborative prediction using a nonparametric random effects model. In ICML, pages 1185–1192, 2009.
[29] S. Yu, V. Tresp, and K. Yu. Robust multi-task learning with t-processes. In ICML, page 1103, 2007.
[30] J. Zhang, Z. Ghahramani, and Y. Yang. Learning multiple related tasks using latent independent component analysis. In NIPS, pages 1585–1592, 2006.