
53 nips-2011-Co-Training for Domain Adaptation


Source: pdf

Author: Minmin Chen, Kilian Q. Weinberger, John Blitzer

Abstract: Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and the instances on which the current algorithm is most confident. Our algorithm is a variant of co-training [7], and we name it CODA (Co-training for Domain Adaptation). Unlike the original co-training work, we do not assume a particular feature split. Instead, for each iteration of co-training, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor. CODA significantly outperforms the state of the art on the 12-domain benchmark data set of Blitzer et al. [4]. Indeed, over a wide range of target supervision levels (65 of 84 comparisons), CODA achieves the best performance.
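
The loop the abstract describes can be sketched concretely. Below is a minimal, hypothetical Python sketch using scikit-learn; it is not the authors' implementation. The function name coda_style_cotraining, the random feature split, the fixed per-round batch size, and the logistic-regression base learners are all placeholder assumptions standing in for the joint optimization over predictor, views, and feature subset that CODA actually solves at each iteration.

import numpy as np
from sklearn.linear_model import LogisticRegression


def coda_style_cotraining(X_src, y_src, X_tgt, n_rounds=10, batch=20, seed=0):
    """Gradually absorb confidently pseudo-labeled target instances."""
    rng = np.random.default_rng(seed)
    d = X_src.shape[1]
    # Placeholder: a random split of features into two views. CODA instead
    # *learns* this split (and a feature subset) jointly with the predictor.
    perm = rng.permutation(d)
    views = [perm[: d // 2], perm[d // 2:]]

    X_train, y_train, pool = X_src.copy(), y_src.copy(), X_tgt.copy()
    for _ in range(n_rounds):
        if len(pool) == 0:
            break
        # One classifier per view, trained on the current labeled set.
        clfs = [LogisticRegression(max_iter=1000).fit(X_train[:, v], y_train)
                for v in views]
        probs = [c.predict_proba(pool[:, v]) for c, v in zip(clfs, views)]
        conf = [p.max(axis=1) for p in probs]
        # Each view nominates its most confident target points; a point
        # nominated by both views keeps the more confident view's label.
        take = np.unique(np.concatenate([np.argsort(-c)[:batch] for c in conf]))
        labels = [clfs[i].classes_[probs[i][take].argmax(axis=1)] for i in range(2)]
        pseudo = np.where(conf[0][take] >= conf[1][take], labels[0], labels[1])
        # Move the pseudo-labeled points from the pool into the training set.
        X_train = np.vstack([X_train, pool[take]])
        y_train = np.concatenate([y_train, pseudo])
        pool = np.delete(pool, take, axis=0)
    # Final predictor sees all features, source labels, and pseudo-labels.
    return LogisticRegression(max_iter=1000).fit(X_train, y_train)

The one design choice this sketch does mirror faithfully is the gradual schedule: target instances enter the training set only in small confidence-ranked batches per round, rather than being pseudo-labeled all at once.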


reference text

[1] R.K. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6:1817–1853, 2005.

[2] M.F. Balcan, A. Blum, and K. Yang. Co-training and expansion: Towards bridging theory and practice. In NIPS 17, pages 89–96, 2004.

[3] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. Wortman. A theory of learning from different domains. Machine Learning, 2009.

[4] J. Blitzer, M. Dredze, and F. Pereira. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Association for Computational Linguistics, Prague, Czech Republic, 2007.

[5] J. Blitzer, D. Foster, and S. Kakade. Domain adaptation with coupled subspaces. In Conference on Artificial Intelligence and Statistics, Fort Lauderdale, 2011.

[6] J. Blitzer, R. McDonald, and F. Pereira. Domain adaptation with structural correspondence learning. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 120–128. Association for Computational Linguistics, 2006.

[7] A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the eleventh annual conference on Computational learning theory, pages 92–100. ACM, 1998.

[8] R. Caruana. Multitask learning. Machine Learning, 28:41–75, 1997.

[9] O. Chapelle, P. Shivaswamy, S. Vadrevu, K.Q. Weinberger, Y. Zhang, and B. Tseng. Multi-task learning for boosting with application to web search ranking. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’10, pages 1189–1198, New York, NY, USA, 2010. ACM.

[10] M. Chen, K.Q. Weinberger, and Y. Chen. Automatic Feature Decomposition for Single View Co-training. In International Conference on Machine Learning, 2011.

[11] H. Daume III. Frustratingly easy domain adaptation. In Association for Computational Linguistics, 2007.

[12] T. Evgeniou, C.A. Micchelli, and M. Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6:615–637, 2005.

[13] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Verlag, New York, 2009.

[14] J. Huang, A.J. Smola, A. Gretton, K.M. Borgwardt, and B. Schölkopf. Correcting sample selection bias by unlabeled data. In NIPS 19, pages 601–608. MIT Press, Cambridge, MA, 2007.

[15] H. Daume III, A. Kumar, and A. Saha. Co-regularization based semi-supervised domain adaptation. In NIPS 23, pages 478–486. MIT Press, 2010.

[16] J. Jiang and C.X. Zhai. Instance weighting for domain adaptation in NLP. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 264–271, Prague, Czech Republic, June 2007. Association for Computational Linguistics.

[17] Q. Liu, A. Mackey, D. Roos, and F. Pereira. Evigan: a hidden variable model for integrating gene evidence for eukaryotic gene prediction. Bioinformatics, 2008.

[18] Y. Mansour, M. Mohri, and A. Rostamizadeh. Domain adaptation with multiple sources. In NIPS 21, pages 1041–1048. MIT Press, 2009.

[19] D. McClosky, E. Charniak, and M. Johnson. Reranking and self-training for parser adaptation. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 337–344. Association for Computational Linguistics, 2006.

[20] K. Nigam and R. Ghani. Analyzing the effectiveness and applicability of co-training. In Proceedings of the ninth international conference on Information and knowledge management, pages 86–93. ACM, 2000.

[21] S. Parameswaran and K.Q. Weinberger. Large margin multi-task metric learning. In NIPS 23, pages 1867–1875. 2010.

[22] J.C. Platt. Probabilities for SV machines. In Advances in Large Margin Classifiers, pages 61–74. MIT Press, 1999.

[23] K. Saenko, B. Kulis, M. Fritz, and T. Darrell. Adapting visual category models to new domains. Computer Vision–ECCV 2010, pages 213–226, 2010.

[24] G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523, 1988.

[25] S. Satpal and S. Sarawagi. Domain adaptation of conditional probability models via feature subsetting. Knowledge Discovery in Databases: PKDD 2007, pages 224–235, 2007.

[26] B. Settles. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison, 2009.

[27] K.Q. Weinberger, A. Dasgupta, J. Langford, A. Smola, and J. Attenberg. Feature hashing for large scale multitask learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1113–1120. ACM, 2009.

[28] G. Xue, W. Dai, Q. Yang, and Y. Yu. Topic-bridged PLSA for cross-domain text classification. In SIGIR, 2008.