
emnlp-2013-29: Automatic Domain Partitioning for Multi-Domain Learning


Source: pdf

Author: Di Wang; Chenyan Xiong; William Yang Wang

Abstract: Models trained on data from a single domain might not be generalizable to other domains (Ben-David et al., 2006; Ben-David et al., 2010). Multi-domain learning (MDL) assumes that the domain labels in the dataset are known. However, when multiple metadata attributes are available, it is not always straightforward to select a single best attribute for domain partitioning, and combining more than one metadata attribute (including continuous attributes) can lead to better MDL performance. In this work, we propose an automatic domain partitioning approach that aims to provide better domain identities for MDL. We use a supervised clustering approach that learns the domain distance between data instances and then clusters the data into better domains for MDL. Our experiments on real multi-domain datasets show that using our automatically generated domain partitions improves over popular MDL methods.
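The abstract gives no implementation details, but a minimal sketch of the pipeline it describes could look like the following: learn a supervised distance metric over instances, cluster them in the learned space to obtain automatic domain identities, and train one model per induced domain. Everything here is an illustrative assumption rather than the authors' method: the function names (partition_domains, train_per_domain), the use of scikit-learn's NeighborhoodComponentsAnalysis as a stand-in for the paper's learned domain distance, and k-means plus logistic regression as stand-ins for the clustering and MDL steps.

# Minimal sketch (not the authors' implementation) of the pipeline the
# abstract describes: supervised metric learning -> clustering into
# pseudo-domains -> one classifier per induced domain.
import numpy as np
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def partition_domains(X, meta_labels, n_domains=4, random_state=0):
    """Learn a metric from a discrete metadata attribute, then cluster
    instances in the learned space into pseudo-domains."""
    # Supervised metric learning (NCA here, as an assumed stand-in for the
    # paper's learned domain distance) pulls instances that share metadata
    # values closer together.
    nca = NeighborhoodComponentsAnalysis(random_state=random_state)
    X_proj = nca.fit_transform(X, meta_labels)
    # Cluster in the learned space to obtain automatic domain identities.
    km = KMeans(n_clusters=n_domains, n_init=10, random_state=random_state)
    return km.fit_predict(X_proj)

def train_per_domain(X, y, domains):
    """Train one classifier per automatically induced domain."""
    models = {}
    for d in np.unique(domains):
        idx = domains == d
        models[d] = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    return models

In the paper the distance is learned from multiple metadata attributes (including continuous ones) rather than a single discrete label, in the spirit of distance metric learning with side information (Xing et al., 2002); the NCA-plus-k-means combination above is just one concrete way to realize that supervised-metric-then-cluster step.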


Reference text

Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. 2006. Analysis of representations for domain adaptation. In Advances in Neural Information Processing Systems (NIPS), pages 137–144.

Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. 2010. A theory of learning from different domains. Machine Learning, 79(1-2):151–175.

Mark Dredze and Koby Crammer. 2008. Online methods for multi-domain learning and adaptation. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 689–697.

Mark Dredze, Koby Crammer, and Fernando Pereira. 2008. Confidence-weighted linear classification. In Machine Learning, Proceedings of the Twenty-Fifth International Conference (ICML), pages 264–271.

Jenny Rose Finkel and Christopher D. Manning. 2009. Hierarchical Bayesian domain adaptation. In Proceedings of the 2009 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 602–610.

Hal Daumé III. 2007. Frustratingly easy domain adaptation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL).

Nitin Jindal and Bing Liu. 2008. Opinion spam and analysis. In Proceedings of the International Conference on Web Search and Web Data Mining (WSDM), pages 219–230.

Mahesh Joshi, Mark Dredze, William W. Cohen, and Carolyn Penstein Rosé. 2012. Multi-domain learning: When do domains matter? In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 1302–1312.

Mahesh Joshi, Mark Dredze, William W. Cohen, and Carolyn P. Rosé. 2013. What's in a domain? Multi-domain learning for multi-attribute data. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 685–690, Atlanta, Georgia, June. Association for Computational Linguistics.

Eric P. Xing, Andrew Y. Ng, Michael I. Jordan, and Stuart J. Russell. 2002. Distance metric learning with application to clustering with side-information. In Advances in Neural Information Processing Systems (NIPS), pages 505–512.