
51 nips-2011-Clustered Multi-Task Learning Via Alternating Structure Optimization


Source: pdf

Author: Jiayu Zhou, Jianhui Chen, Jieping Ye

Abstract: Multi-task learning (MTL) learns multiple related tasks simultaneously to improve generalization performance. Alternating structure optimization (ASO) is a popular MTL method that learns a shared low-dimensional predictive structure on hypothesis spaces from multiple related tasks, and it has been applied successfully in many real-world applications. As an alternative MTL approach, clustered multi-task learning (CMTL) assumes that the tasks follow a clustered structure, i.e., the tasks are partitioned into groups such that tasks within a group are similar to each other, and that this clustered structure is unknown a priori. The objectives in ASO and CMTL differ in how the tasks are related. Interestingly, we show in this paper an equivalence relationship between ASO and CMTL, providing significant new insights into both formulations and their inherent relationship. The CMTL formulation is non-convex, and we adopt a convex relaxation of it. We further establish an equivalence relationship between the proposed convex relaxation of CMTL and an existing convex relaxation of ASO, and show that the proposed convex CMTL formulation is significantly more efficient, especially for high-dimensional data. In addition, we present three algorithms for solving the convex CMTL formulation. We report experimental results on benchmark datasets to demonstrate the efficiency of the proposed algorithms.
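For orientation, the following is a minimal sketch, in LaTeX, of how the two objectives are commonly written. It is reconstructed from the cited works rather than from the paper itself: the ASO form follows Ando and Zhang [5], and the CMTL clustering penalty uses the K-means spectral relaxation of [19, 20]. The loss terms L_t, the trade-off parameters \alpha and \beta, and the constraint details are assumptions of this sketch, not necessarily the paper's exact notation.

% ASO [5]: each task predictor u_t = w_t + \Theta^\top v_t combines a task-specific
% component w_t with a projection v_t onto a shared low-dimensional structure \Theta.
\min_{\{w_t, v_t\}, \Theta} \; \sum_{t=1}^{m} \left( L_t(w_t + \Theta^\top v_t) + \alpha \|w_t\|_2^2 \right)
\quad \text{s.t.} \quad \Theta \Theta^\top = I_h

% CMTL via the K-means connection [19, 20]: with W = [w_1, ..., w_m] and a relaxed
% orthonormal cluster-indicator matrix F in R^{m x k}, the clustering penalty is the
% spectral-relaxation form of the K-means objective.
\min_{W, F} \; \sum_{t=1}^{m} L_t(w_t) + \alpha \left( \mathrm{tr}(W^\top W) - \mathrm{tr}(F^\top W^\top W F) \right) + \beta \, \mathrm{tr}(W^\top W)
\quad \text{s.t.} \quad F^\top F = I_k

On this reading, the equivalence claimed above is plausible because both regularizers reduce to the trace of a projection of W^\top W onto an orthonormal subspace; the convex relaxations then replace the non-convex orthonormality constraints with convex spectral sets.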


reference text

[1] T. Evgeniou, M. Pontil, and O. Toubia. A convex optimization approach to modeling consumer heterogeneity in conjoint estimation. Marketing Science, 26(6):805–818, 2007.

[2] R.K. Ando. Applying alternating structure optimization to word sense disambiguation. In Proceedings of the Tenth Conference on Computational Natural Language Learning, pages 77–84, 2006.

[3] A. Torralba, K.P. Murphy, and W.T. Freeman. Sharing features: efficient boosting procedures for multiclass object detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 762–769, 2004.

[4] J. Baxter. A model of inductive bias learning. Journal of Artificial Intelligence Research, 12:149–198, 2000.

[5] R.K. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. The Journal of Machine Learning Research, 6:1817–1853, 2005.

[6] S. Ben-David and R. Schuller. Exploiting task relatedness for multiple task learning. Lecture Notes in Computer Science, pages 567–580, 2003.

[7] S. Bickel, J. Bogojeska, T. Lengauer, and T. Scheffer. Multi-task learning for HIV therapy screening. In Proceedings of the 25th International Conference on Machine Learning, pages 56–63. ACM, 2008.

[8] T. Evgeniou, C.A. Micchelli, and M. Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6:615–637, 2005.

[9] A. Argyriou, C.A. Micchelli, M. Pontil, and Y. Ying. A spectral regularization framework for multi-task structure learning. Advances in Neural Information Processing Systems, 20:25–32, 2008.

[10] R. Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.

[11] J. Blitzer, R. McDonald, and F. Pereira. Domain adaptation with structural correspondence learning. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 120–128, 2006.

[12] A. Quattoni, M. Collins, and T. Darrell. Learning visual representations using images with captions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8. IEEE, 2007.

[13] J. Chen, L. Tang, J. Liu, and J. Ye. A convex formulation for learning shared structures from multiple tasks. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 137–144. ACM, 2009.

[14] S. Thrun and J. O’Sullivan. Clustering learning tasks and the selective cross-task transfer of knowledge. In Learning to Learn, pages 181–209, 1998.

[15] B. Bakker and T. Heskes. Task clustering and gating for Bayesian multitask learning. The Journal of Machine Learning Research, 4:83–99, 2003.

[16] Y. Xue, X. Liao, L. Carin, and B. Krishnapuram. Multi-task learning for classification with Dirichlet process priors. The Journal of Machine Learning Research, 8:35–63, 2007.

[17] L. Jacob, F. Bach, and J.P. Vert. Clustered multi-task learning: A convex formulation. arXiv preprint arXiv:0809.2085, 2008.

[18] F. Wang, X. Wang, and T. Li. Semi-supervised multi-task learning with task regularizations. In Ninth IEEE International Conference on Data Mining (ICDM), pages 562–568. IEEE, 2009.

[19] C. Ding and X. He. K-means clustering via principal component analysis. In Proceedings of the Twenty-First International Conference on Machine Learning, page 29. ACM, 2004.

[20] H. Zha, X. He, C. Ding, M. Gu, and H. Simon. Spectral relaxation for k-means clustering. Advances in Neural Information Processing Systems, 2:1057–1064, 2002.

[21] K. Fan. On a theorem of Weyl concerning eigenvalues of linear transformations I. Proceedings of the National Academy of Sciences of the United States of America, 35(11):652–655, 1949.

[22] J. Nocedal and S.J. Wright. Numerical optimization. Springer Verlag, 1999.

[23] A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243–272, 2008.

[24] Y. Nesterov. Gradient methods for minimizing composite objective function. CORE Discussion Paper 2007/76, Université catholique de Louvain, 2007.

[25] S.P. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, 2004.

[26] J. Gauvin and F. Dubeau. Differential properties of the marginal function in mathematical programming. Optimality and Stability in Mathematical Programming, pages 101–119, 1982.

[27] M. Wu, B. Schölkopf, and G. Bakır. A direct method for building sparse kernel learning algorithms. The Journal of Machine Learning Research, 7:603–624, 2006.

[28] T. Evgeniou and M. Pontil. Regularized multi-task learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 109–117. ACM, 2004.