
65 nips-2008-Domain Adaptation with Multiple Sources


Source: pdf

Author: Yishay Mansour, Mehryar Mohri, Afshin Rostamizadeh

Abstract: This paper presents a theoretical analysis of the problem of domain adaptation with multiple sources. For each source domain, the distribution over the input points as well as a hypothesis with error at most ε are given. The problem consists of combining these hypotheses to derive a hypothesis with small error with respect to the target domain. We present several theoretical results relating to this problem. In particular, we prove that standard convex combinations of the source hypotheses may in fact perform very poorly and that, instead, combinations weighted by the source distributions benefit from favorable theoretical guarantees. Our main result shows that, remarkably, for any fixed target function, there exists a distribution weighted combining rule that has a loss of at most ε with respect to any target mixture of the source distributions. We further generalize the setting from a single target function to multiple consistent target functions and show the existence of a combining rule with error at most 3ε. Finally, we report empirical results for a multiple source adaptation problem with a real-world dataset.
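
As a reading aid, the two combining rules contrasted in the abstract can be written out explicitly. The notation below is assumed from the paper rather than stated in the abstract itself: k source domains with distributions D_1, ..., D_k, source hypotheses h_1, ..., h_k, and a mixture weight vector z in the simplex.

    % standard convex combination of the source hypotheses
    h_z(x) = \sum_{i=1}^{k} z_i \, h_i(x)

    % distribution weighted combining rule: each h_i(x) is weighted by the
    % relative mass z_i D_i(x) that source i assigns to the point x
    h_z(x) = \sum_{i=1}^{k} \frac{z_i D_i(x)}{\sum_{j=1}^{k} z_j D_j(x)} \, h_i(x)

The abstract's main result then says that, for any fixed target function, some choice of z makes the second rule's loss at most ε under every mixture of the source distributions, whereas the first rule can fail badly.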


reference text

[1] Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. In Proceedings of NIPS 2006. MIT Press, 2007.

[2] Jacob Benesty, M. Mohan Sondhi, and Yiteng Huang, editors. Springer Handbook of Speech Processing. Springer, 2008.

[3] John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman. Learning bounds for domain adaptation. In Proceedings of NIPS 2007. MIT Press, 2008.

[4] John Blitzer, Mark Dredze, and Fernando Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In Proceedings of ACL 2007, Prague, Czech Republic, 2007.

[5] Koby Crammer, Michael Kearns, and Jennifer Wortman. Learning from Data of Variable Quality. In Proceedings of NIPS 2005, 2006.

[6] Koby Crammer, Michael Kearns, and Jennifer Wortman. Learning from multiple sources. In Proceedings of NIPS 2006, 2007.

[7] Mark Dredze, John Blitzer, Partha Pratim Talukdar, Kuzman Ganchev, João Graça, and Fernando Pereira. Frustratingly Hard Domain Adaptation for Parsing. In CoNLL 2007, Prague, Czech Republic, 2007.

[8] Jean-Luc Gauvain and Chin-Hui Lee. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing, 2(2):291–298, 1994.

[9] Frederick Jelinek. Statistical Methods for Speech Recognition. The MIT Press, 1998.

[10] Jing Jiang and ChengXiang Zhai. Instance Weighting for Domain Adaptation in NLP. In Proceedings of ACL 2007, pages 264–271, Prague, Czech Republic, 2007. Association for Computational Linguistics.

[11] C. J. Leggetter and Phil C. Woodland. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language, 9(2):171–185, 1995.

[12] Aleix M. Martínez. Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class. IEEE Trans. Pattern Anal. Mach. Intell., 24(6):748–763, 2002.

[13] Stephen Della Pietra, Vincent Della Pietra, Robert L. Mercer, and Salim Roukos. Adaptive language modeling using minimum discriminant estimation. In HLT ’91: Proceedings of the Workshop on Speech and Natural Language, pages 103–106, Morristown, NJ, USA, 1992. Association for Computational Linguistics.

[14] Brian Roark and Michiel Bacchiani. Supervised and unsupervised PCFG adaptation to novel domains. In Proceedings of HLT-NAACL, 2003.

[15] Roni Rosenfeld. A Maximum Entropy Approach to Adaptive Statistical Language Modeling. Computer Speech and Language, 10:187–228, 1996.

[16] Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.

[17] Vladimir N. Vapnik. Statistical Learning Theory. Wiley-Interscience, New York, 1998.