NIPS 2012, Paper 181: Learning Multiple Tasks using Shared Hypotheses
Authors: Koby Crammer, Yishay Mansour
Abstract: In this work we consider a setting with a very large number of related tasks and few examples from each individual task. Rather than either learning each task individually (incurring a large generalization error) or learning all tasks together with a single hypothesis (suffering a potentially large inherent error), we learn a small pool of shared hypotheses. Each task is then mapped to a single hypothesis in the pool (a hard association). We derive VC-dimension generalization bounds for our model, based on the number of tasks, the number of shared hypotheses, and the VC dimension of the hypothesis class. Experiments on both synthetic problems and sentiment classification of reviews strongly support our approach.
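One natural way to realize the hard association described in the abstract is a k-means-style alternating procedure: assign each task to the hypothesis in the pool with the lowest error on that task, then refit each hypothesis on the pooled data of its assigned tasks. The sketch below illustrates that idea only; the alternating scheme, the function names (learn_shared_pool, fit_linear), and the choice of least-squares linear classifiers are all assumptions for illustration, not the paper's algorithm or bounds.

```python
import numpy as np

# Sketch of learning a small pool of k shared hypotheses with hard task
# association. Assumptions (not from the paper): hypotheses are least-squares
# linear classifiers, labels are in {-1, +1}, and k <= number of tasks.

def fit_linear(X, y):
    # Least-squares fit of a linear classifier; y is a vector in {-1, +1}.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def task_error(w, X, y):
    # Fraction of one task's examples misclassified by hypothesis w.
    return np.mean(np.sign(X @ w) != y)

def learn_shared_pool(tasks, k, n_iters=20, seed=0):
    # tasks: list of (X, y) pairs, one per task; k: size of the shared pool.
    rng = np.random.default_rng(seed)
    init = rng.choice(len(tasks), size=k, replace=False)
    pool = [fit_linear(*tasks[i]) for i in init]
    assign = [0] * len(tasks)
    for _ in range(n_iters):
        # Hard association: map each task to its lowest-error hypothesis.
        assign = [min(range(k), key=lambda j: task_error(pool[j], X, y))
                  for X, y in tasks]
        # Refit each hypothesis on the union of its assigned tasks' data.
        for j in range(k):
            idx = [t for t, a in enumerate(assign) if a == j]
            if idx:
                Xj = np.vstack([tasks[t][0] for t in idx])
                yj = np.concatenate([tasks[t][1] for t in idx])
                pool[j] = fit_linear(Xj, yj)
    return pool, assign
```

Note how this sketch interpolates between the two extremes contrasted in the abstract: k = 1 recovers the single shared hypothesis (large inherent error), while k equal to the number of tasks recovers per-task learning (large generalization error).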
[1] Yonatan Amit, Michael Fink, Nathan Srebro, and Shimon Ullman. Uncovering shared structures in multiclass classification. In ICML, pages 17–24, 2007.
[2] Rie Kubota Ando and Tong Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6:1817–1853, 2005.
[3] Martin Anthony and Peter L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
[4] Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243–272, 2008.
[5] Bart Bakker and Tom Heskes. Task clustering and gating for Bayesian multitask learning. Journal of Machine Learning Research, 4:83–99, 2003.
[6] Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. A theory of learning from different domains. Machine Learning, 79(1-2):151–175, 2010.
[7] Gilles Blanchard, Gyemin Lee, and Clay Scott. Generalizing from several related classification tasks to a new unlabeled sample. In NIPS, 2011.
[8] John Blitzer, Mark Dredze, and Fernando Pereira. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Association for Computational Linguistics (ACL), 2007.
[9] Edwin V. Bonilla, Felix V. Agakov, and Christopher K. I. Williams. Kernel multi-task learning using task-specific features. Journal of Machine Learning Research - Proceedings Track, 2:43–50, 2007.
[10] Koby Crammer, Michael Kearns, and Jennifer Wortman. Learning from multiple sources. Journal of Machine Learning Research, 9:1757–1774, 2008.
[11] Koby Crammer, Michael Kearns, and Jennifer Wortman. Learning from data of variable quality. In NIPS, 2005.
[12] Theodoros Evgeniou, Charles A. Micchelli, and Massimiliano Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6:615–637, 2005.
[13] Theodoros Evgeniou and Massimiliano Pontil. Regularized multi-task learning. In KDD, pages 109–117, 2004.
[14] Yoav Freund and Robert E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277–296, 1999.
[15] Hal Daumé III. Frustratingly easy domain adaptation. In ACL, 2007.
[16] Hal Daumé III. Bayesian multitask learning with latent hierarchies. In UAI, pages 135–142, 2009.
[17] Neil D. Lawrence and John C. Platt. Learning to learn with the informative vector machine. In ICML, 2004.
[18] Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Domain adaptation with multiple sources. In NIPS, pages 1041–1048, 2008.
[19] Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Domain adaptation: Learning bounds and algorithms. In COLT, 2009.
[20] Guillaume Obozinski, Ben Taskar, and Michael I. Jordan. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, 20(2):231–252, 2010.
[21] Kai Yu, Volker Tresp, and Anton Schwaighofer. Learning Gaussian processes from multiple tasks. In ICML, pages 1012–1019, 2005.
[22] Shipeng Yu, Volker Tresp, and Kai Yu. Robust multi-task learning with t-processes. In ICML, pages 1103–1110, 2007.