
138 nips-2006-Multi-Task Feature Learning


Source: pdf

Author: Andreas Argyriou, Theodoros Evgeniou, Massimiliano Pontil

Abstract: We present a method for learning a low-dimensional representation which is shared across a set of multiple related tasks. The method builds upon the well-known 1-norm regularization problem using a new regularizer which controls the number of learned features common to all the tasks. We show that this problem is equivalent to a convex optimization problem and develop an iterative algorithm for solving it. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the latter step we learn common-across-tasks representations and in the former step we learn task-specific functions using these representations. We report experiments on a simulated and a real data set which demonstrate that the proposed method dramatically improves the performance relative to learning each task independently. Our algorithm can also be used, as a special case, to simply select – not learn – a few common features across the tasks.
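
A minimal sketch of the alternating scheme described in the abstract, assuming squared loss, a shared unit-trace positive semidefinite matrix D over features, and hypothetical names (multitask_feature_learning, Xs, ys, gamma); this is an illustrative reading of the abstract, not the authors' reference implementation.

```python
import numpy as np

def multitask_feature_learning(Xs, ys, gamma=1.0, n_iter=50, eps=1e-6):
    """Xs, ys: lists of per-task design matrices (n_t x d) and targets (n_t,)."""
    d, T = Xs[0].shape[1], len(Xs)
    D = np.eye(d) / d                      # shared feature matrix, trace(D) = 1
    W = np.zeros((d, T))                   # one weight vector per task (columns)
    for _ in range(n_iter):
        # Supervised step: per-task ridge-style solve under the shared metric D.
        D_inv = np.linalg.inv(D + eps * np.eye(d))
        for t in range(T):
            A = Xs[t].T @ Xs[t] + gamma * D_inv
            W[:, t] = np.linalg.solve(A, Xs[t].T @ ys[t])
        # Unsupervised step: re-estimate the common representation from W by
        # setting D proportional to (W W^T)^{1/2}, normalized to unit trace.
        C = W @ W.T + eps * np.eye(d)
        evals, evecs = np.linalg.eigh(C)
        sqrtC = (evecs * np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T
        D = sqrtC / np.trace(sqrtC)
    return W, D

# Toy usage: two related regression tasks sharing a one-dimensional structure.
rng = np.random.default_rng(0)
u = rng.normal(size=5)
Xs = [rng.normal(size=(30, 5)) for _ in range(2)]
ys = [X @ (u * (i + 1)) + 0.1 * rng.normal(size=30) for i, X in enumerate(Xs)]
W, D = multitask_feature_learning(Xs, ys)
```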


reference text

[1] J. Abernethy, F. Bach, T. Evgeniou and J-P. Vert. Low-rank matrix factorization with attributes. Technical Report N24/06/MM, Ecole des Mines de Paris, 2006.

[2] R.K. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6: 1817–1853, 2005.

[3] B. Bakker and T. Heskes. Task clustering and gating for Bayesian multi-task learning. Journal of Machine Learning Research, 4: 83–99, 2003.

[4] J. Baxter. A model for inductive bias learning. Journal of Artificial Intelligence Research, 12: 149–198, 2000.

[5] S. Ben-David and R. Schuller. Exploiting task relatedness for multiple task learning. Proceedings of Computational Learning Theory (COLT), 2003.

[6] R. Caruana. Multi-task learning. Machine Learning, 28: 41–75, 1997.

[7] D. Donoho. For most large underdetermined systems of linear equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution. Preprint, Dept. of Statistics, Stanford University, 2004.

[8] T. Evgeniou, C.A. Micchelli and M. Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6: 615–637, 2005.

[9] T. Evgeniou, M. Pontil and O. Toubia. A convex optimization approach to modeling consumer heterogeneity in conjoint estimation. INSEAD Working Paper No. 2006/62/TOM/DS, 2006.

[10] M. Fazel, H. Hindi and S.P. Boyd. A rank minimization heuristic with application to minimum order system approximation. Proceedings of the American Control Conference, vol. 6, 2001.

[11] T. Hastie, R. Tibshirani and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer Series in Statistics, Springer, New York, 2001.

[12] T. Jebara. Multi-task feature and kernel selection for SVMs. Proceedings of ICML, 2004.

[13] P.J. Lenk, W.S. DeSarbo, P.E. Green and M.R. Young. Hierarchical Bayes conjoint analysis: recovery of partworth heterogeneity from reduced experimental designs. Marketing Science, 15(2): 173–191, 1996.

[14] C.A. Micchelli and A. Pinkus. Variational problems arising from balancing several error criteria. Rendiconti di Matematica, Serie VII, 14: 37–86, 1994.

[15] C.A. Micchelli and M. Pontil. On learning vector-valued functions. Neural Computation, 17: 177–204, 2005.

[16] T. Serre, M. Kouh, C. Cadieu, U. Knoblich, G. Kreiman and T. Poggio. Theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex. AI Memo 2005-036, MIT, Cambridge, MA, October 2005.

[17] N. Srebro, J.D.M. Rennie and T.S. Jaakkola. Maximum-margin matrix factorization. Advances in Neural Information Processing Systems (NIPS), 2004.

[18] A. Torralba, K.P. Murphy and W.T. Freeman. Sharing features: efficient boosting procedures for multiclass object detection. Proceedings of CVPR, pages 762–769, 2004.

[19] K. Yu, V. Tresp and A. Schwaighofer. Learning Gaussian processes from multiple tasks. Proceedings of ICML, 2005.

[20] J. Zhang, Z. Ghahramani and Y. Yang. Learning multiple related tasks using latent independent component analysis. Advances in Neural Information Processing Systems (NIPS), 2006.