
138 nips-2010-Large Margin Multi-Task Metric Learning


Source: pdf

Author: Shibin Parameswaran, Kilian Q. Weinberger

Abstract: Multi-task learning (MTL) improves the prediction performance on multiple, different but related, learning problems through shared parameters or representations. One of the most prominent multi-task learning algorithms is an extension to support vector machines (svm) by Evgeniou et al. [15]. Although very elegant, multi-task svm is inherently restricted by the fact that support vector machines require each class to be addressed explicitly with its own weight vector, which, in a multi-task setting, requires the different learning tasks to share the same set of classes. This paper proposes an alternative formulation for multi-task learning by extending the recently published large margin nearest neighbor (lmnn) algorithm to the MTL paradigm. Instead of relying on separating hyperplanes, its decision function is based on the nearest neighbor rule, which inherently extends to many classes and becomes a natural fit for multi-task learning. We evaluate the resulting multi-task lmnn on real-world insurance data and speech classification problems and show that it consistently outperforms single-task kNN under several metrics and state-of-the-art MTL classifiers.
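For intuition: the decision function described above is kNN under a per-task Mahalanobis metric. In the paper's formulation, every task t combines a metric M0 shared across all tasks with its own component Mt, measuring squared distances as (xi - xj)^T (M0 + Mt) (xi - xj). The following minimal NumPy sketch illustrates that distance computation; the function name and toy data are illustrative, not taken from the paper:

import numpy as np

def mt_distance(xi, xj, M0, Mt):
    # Squared Mahalanobis distance under the task metric M0 + Mt,
    # where M0 is shared across tasks and Mt is learned per task
    # (both positive semidefinite in mt-lmnn).
    diff = xi - xj
    return float(diff @ (M0 + Mt) @ diff)

# Toy usage with two tasks: same shared metric, different task components.
rng = np.random.default_rng(0)
d = 4
M0 = np.eye(d)                              # shared component
Mt = [np.zeros((d, d)), 0.5 * np.eye(d)]    # per-task components
xi, xj = rng.normal(size=d), rng.normal(size=d)
for t in range(2):
    print(t, mt_distance(xi, xj, M0, Mt[t]))

Setting every Mt = 0 recovers a single global metric (standard lmnn on the pooled data), while setting M0 = 0 decouples the tasks entirely; regularizing between these two extremes is what couples the learning problems.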


reference text

[1] B. Bakker and T. Heskes. Task clustering and gating for Bayesian multitask learning. Journal of Machine Learning Research, 4:83–99, 2003.

[2] S. Ben-David, J. Gehrke, and R. Schuller. A theoretical framework for learning from a pool of disparate data sources. In KDD, pages 443–449, 2002.

[3] S. Ben-David and R. Schuller. Exploiting task relatedness for multiple task learning. In COLT, pages 567–580, 2003.

[4] B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on Computational learning theory, pages 144–152. ACM New York, NY, USA, 1992.

[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[6] R. Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.

[7] G. Chechik, U. Shalit, V. Sharma, and S. Bengio. An online algorithm for large scale image similarity learning. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 306–314. 2009.

[8] R. Collobert and J. Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th international conference on Machine learning, pages 160–167. ACM New York, NY, USA, 2008.

[9] T. Cover and P. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1):21–27, 1967.

[10] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. The Journal of Machine Learning Research, 2:265–292, 2002.

[11] H. Daumé III. Frustratingly easy domain adaptation. In Annual Meeting of the Association for Computational Linguistics, volume 45, page 256, 2007.

[12] J. Davis, B. Kulis, P. Jain, S. Sra, and I. Dhillon. Information-theoretic metric learning. In Proceedings of the 24th international conference on Machine learning, 2007.

[13] V. Digalakis, D. Rtischev, and L. Neumeyer. Fast speaker adaptation using constrained estimation of Gaussian mixtures. IEEE Transactions on Speech and Audio Processing, 3(5):357–366, 1995.

[14] T. Evgeniou, C. Micchelli, and M. Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6:615–637, 2005.

[15] T. Evgeniou and M. Pontil. Regularized multi-task learning. In KDD, pages 109–117, 2004.

[16] M. A. Fanty and R. Cole. Spoken letter recognition. In Advances in Neural Information Processing Systems 4, page 220. MIT Press, 1990.

[17] J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood components analysis. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, pages 513– 520, Cambridge, MA, 2005. MIT Press.

[18] I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1986.

[19] A. Quattoni, X. Carreras, M. Collins, and T. Darrell. A projected subgradient method for scalable multi-task learning. Technical report, Massachusetts Institute of Technology, 2008.

[20] K. Q. Weinberger and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. The Journal of Machine Learning Research, 10:207–244, 2009.

[21] J. Weston and C. Watkins. Support vector machines for multi-class pattern recognition. In ESANN, page 219, 1999.