acl acl2012 acl2012-120 acl2012-120-reference knowledge-graph by maker-knowledge-mining

120 acl-2012-Information-theoretic Multi-view Domain Adaptation

Source: pdf

Author: Pei Yang ; Wei Gao ; Qi Tan ; Kam-Fai Wong

Abstract: We use multiple views for cross-domain document classification. The main idea is to strengthen the views’ consistency for target data with source training data by identifying the correlations of domain-specific features from different domains. We present an Information-theoretic Multi-view Adaptation Model (IMAM) based on a multi-way clustering scheme, where word and link clusters can draw together seemingly unrelated domain-specific features from both sides and iteratively boost the consistency between document clusterings based on word and link views. Experiments show that IMAM significantly outperforms state-of-the-art baselines.

reference text

Steven Abney. 2002. Bootstrapping. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 360-367. John Blitzer, Mark Dredze and Fernado Pereira. 2007. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 440-447. John Blitzer, Sham Kakade and Dean P. Foster. 2011. Domain Adaptation with Coupled Subspaces. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 173181. Avrim Blum and Tom Mitchell. 1998. Combining Labeled and Unlabeled Data with Co-Training. In Proceedings of the 11th Annual Conference on Computational Learning Theory, pages 92-100. Minmin Chen, Killian Q. Weinberger and John Blitzer. 2011. Co-Training for Domain Adaptation. In Proceedings of NIPS, pages 1-9. Wenyuan Dai, Gui-Rong Xue, Qiang Yang and Yong Yu. 2007. Co-clustering Based Classification for Outof-domain Documents. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 210-219. Sanjoy Dasgupta, Michael L. Littman and David McAllester. 2001. PAC Generalization Bounds for Co-Training. In Proceeding of NIPS, pages 375-382. Hal Daum e´ III and Daniel Marcu. 2006. Domain Adaptation for Statistical Classifiers. Journal of Artificial Intelligence Research, 26(2006): 101-126. Inderjit S. Dhillon, Subramanyam Mallela and Dharmendra S. Modha. 2003. Information-Theoretic Coclustering. In Proceedings of the ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 210-219. Thorsten Joachims. 1999. Transductive Inference for Text Classification using Support Vector Machines. In Proceedings of Sixteenth International Conference on Machine Learning, pages 200-209. Jing Jiang and Chengxiang Zhai. 2007. Instance Weighting for Domain Adaptation in NLP. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 264-271. Andrew K. McCallum, Kamal Nigam, Jason Rennie and Kristie Seymore. 2000. Automating the Construction of Internet Portals with Machine Learning. Information Retrieval, 3(2): 127-163. Stephan R ¨uping and Tobias Scheffer. 2005. Learning with Multiple Views. In Proceedings of ICML Workshop on Learning with Multiple Views. 274 Kanoksri Sarinnapakorn and Miroslav Kubat. 2007. Combining Sub-classifiers in Text Categorization: A DST-Based Solution and a Case Study. IEEE Transactions Knowledge and Data Engineering, 19(12): 16381651. Karthik Sridharan and Sham M. Kakade. 2008. An Information Theoretic Framework for Multi-view Learning. In Proceedings of the 21st Annual Conference on Computational Learning Theory, pages 403-414. Dan Zhang, Jingrui He, Yan Liu, Luo Si and Richard D. Lawrence. 2011. Multi-view Transfer Learning with a Large Margin Approach. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1208-1216.