acl acl2012 acl2012-62 acl2012-62-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xinfan Meng; Furu Wei; Xiaohua Liu; Ming Zhou; Ge Xu; Houfeng Wang
Abstract: The amount of labeled sentiment data in English is much larger than that in other languages. Such a disproportion arouses interest in cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g. Chinese) using labeled data in the source language (e.g. English). Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language. This approach suffers from the limited vocabulary coverage of the machine translation results. In this paper, we propose a generative cross-lingual mixture model (CLMM) to leverage unlabeled bilingual parallel data. By fitting parameters to maximize the likelihood of the bilingual parallel data, the proposed model learns previously unseen sentiment words from the large bilingual parallel data and improves vocabulary coverage significantly. Experiments on multiple data sets show that CLMM is consistently effective in two settings: (1) labeled data in the target language are unavailable; and (2) labeled data in the target language are also available.
John Blitzer, Ryan McDonald, and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 120–128.
Avrim Blum and Tom Mitchell. 1998. Combining labeled and unlabeled data with co-training. In Proceedings of the eleventh annual conference on Computational learning theory, pages 92–100.
Dmitry Davidov, Oren Tsur, and Ari Rappoport. 2010. Enhanced sentiment learning using Twitter hashtags and smileys. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 241–249.
Arthur Dempster, Nan Laird, and Donald Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), pages 1–38.
Kevin Duh, Akinori Fujino, and Masaaki Nagata. 2011. Is machine translation ripe for cross-lingual sentiment classification? In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 429–433, Portland, Oregon, USA, June. Association for Computational Linguistics.
Michael Gamon. 2004. Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis. In Proceedings of the 20th international conference on Computational Linguistics, page 841.
Mingqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 168–177.
Tao Li, Yi Zhang, and Vikas Sindhwani. 2009. A nonnegative matrix tri-factorization approach to sentiment classification with lexical prior knowledge. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 244–252, Suntec, Singapore, August. Association for Computational Linguistics.
Percy Liang, Ben Taskar, and Dan Klein. 2006. Alignment by agreement. In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 104–111.
Bin Lu, Chenhao Tan, Claire Cardie, and Benjamin K. Tsou. 2011. Joint bilingual sentiment classification with unlabeled parallel corpora. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 320–330.
Dragos Stefan Munteanu and Daniel Marcu. 2005. Improving machine translation performance by exploiting non-parallel corpora. Computational Linguistics, 31(4):477–504.
Kamal Nigam, Andrew Kachites McCallum, Sebastian Thrun, and Tom Mitchell. 2000. Text classification from labeled and unlabeled documents using EM. Machine learning, 39(2):103–134.
Junfeng Pan, Gui-Rong Xue, Yong Yu, and Yang Wang. 2011. Cross-lingual sentiment classification via bi-view non-negative matrix tri-factorization. Advances in Knowledge Discovery and Data Mining, pages 289–300.
Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Found. Trends Inf. Retr., 2(1–2):1–135, January.
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing, Volume 10, pages 79–86.
Peter Prettenhofer and Benno Stein. 2011. Cross-lingual adaptation using structural correspondence learning. ACM Transactions on Intelligent Systems and Technology (TIST), 3(1):13.
Yohei Seki, David Kirk Evans, Lun-Wei Ku, Hsin-Hsi Chen, Noriko Kando, and Chin-Yew Lin. 2007. Overview of opinion analysis pilot task at NTCIR-6. In Proceedings of NTCIR-6 Workshop Meeting, pages 265–278.
Yohei Seki, David Kirk Evans, Lun-Wei Ku, Le Sun, Hsin-Hsi Chen, Noriko Kando, and Chin-Yew Lin. 2008. Overview of multilingual opinion analysis task at NTCIR-7. In Proc. of the Seventh NTCIR Workshop.
Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, and Manfred Stede. 2011. Lexicon-based methods for sentiment analysis. Comput. Linguist., page to appear.
Peter D Turney. 2002. Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 417–424.
Xiaojun Wan. 2008. Using bilingual knowledge and ensemble techniques for unsupervised Chinese sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’08, pages 553–561, Stroudsburg, PA, USA. Association for Computational Linguistics.
Xiaojun Wan. 2009. Co-training for cross-lingual sentiment classification. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1, pages 235–243.
Janyce Wiebe, Theresa Wilson, and Claire Cardie. 2005. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2):165–210.
Taras Zagibalov and John Carroll. 2008. Automatic seed word selection for unsupervised sentiment classification of Chinese text. In Proceedings of the 22nd International Conference on Computational Linguistics, Volume 1, pages 1073–1080.