nips nips2008 nips2008-47 nips2008-47-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Laurent Jacob, Jean-Philippe Vert, Francis R. Bach
Abstract: In multi-task learning, several related tasks are considered simultaneously, with the hope that by an appropriate sharing of information across tasks, each task may benefit from the others. In the context of learning linear functions for supervised classification or regression, this can be achieved by including a priori information about the weight vectors associated with the tasks, and how they are expected to be related to each other. In this paper, we assume that tasks are clustered into groups, which are unknown beforehand, and that tasks within a group have similar weight vectors. We design a new spectral norm that encodes this a priori assumption, without prior knowledge of the partition of tasks into groups, resulting in a new convex optimization formulation for multi-task learning. We show in simulations on synthetic examples and on the IEDB MHC-I binding dataset that our approach outperforms well-known convex methods for multi-task learning, as well as related non-convex methods dedicated to the same problem.
[1] G. Wahba. Spline Models for Observational Data, volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, 1990.
[2] F. Girosi, M. Jones, and T. Poggio. Regularization Theory and Neural Networks Architectures. Neural Comput., 7(2):219–269, 1995.
[3] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Stat. Soc. B, 58:267–288, 1996.
[4] B. Bakker and T. Heskes. Task clustering and gating for Bayesian multitask learning. J. Mach. Learn. Res., 4:83–99, 2003.
[5] T. Evgeniou, C. Micchelli, and M. Pontil. Learning multiple tasks with kernel methods. J. Mach. Learn. Res., 6:615–637, 2005.
[6] J. Abernethy, F. Bach, T. Evgeniou, and J.-P. Vert. Low-rank matrix factorization with attributes. Technical Report cs/0611124, arXiv, 2006.
[7] A. Argyriou, T. Evgeniou, and M. Pontil. Multi-task feature learning. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Adv. NIPS 19, pages 41–48, Cambridge, MA, 2007. MIT Press.
[8] G.R.G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M.I. Jordan. Learning the Kernel Matrix with Semidefinite Programming. J. Mach. Learn. Res., 5:27–72, 2004.
[9] M. Deodhar and J. Ghosh. A framework for simultaneous co-clustering and learning from complex data. In KDD ’07, pages 250–259, New York, NY, USA, 2007. ACM.
[10] B. Peters, H.-H. Bui, S. Frankild, M. Nielson, C. Lundegaard, E. Kostem, D. Basch, K. Lamberth, M. Harndahl, W. Fleri, S. S. Wilson, J. Sidney, O. Lund, S. Buus, and A. Sette. A community resource benchmarking predictions of peptide binding to MHC-I molecules. PLoS Comput. Biol., 2(6):e65, 2006.
[11] D. Heckerman, C. Kadie, and J. Listgarten. Leveraging information across HLA alleles/supertypes improves epitope prediction. J. Comput. Biol., 14(6):736–746, 2007.
[12] L. Jacob and J.-P. Vert. Efficient peptide-MHC-I binding prediction for alleles with few known binders. Bioinformatics, 24(3):358–366, Feb 2008.