
326 acl-2011-Using Bilingual Information for Cross-Language Document Summarization

Source: pdf

Author: Xiaojun Wan

Abstract: Cross-language document summarization is defined as the task of producing a summary in a target language (e.g. Chinese) for a set of documents in a source language (e.g. English). Existing methods for addressing this task make use of either the information from the original documents in the source language or the information from the translated documents in the target language. In this study, we propose to use the bilingual information from both the source and translated documents for this task. Two summarization methods (SimFusion and CoRank) are proposed to leverage the bilingual information in the graph-based ranking framework for cross-language summary extraction. Experimental results on the DUC2001 dataset with manually translated reference Chinese summaries show the effectiveness of the proposed methods.

reference text

A. Aker, T. Cohn, and R. Gaizauskas. 2010. Multidocument summarization using A* search and discriminative training. In Proceedings of EMNLP2010.
M. R. Amini and P. Gallinari. 2002. The Use of Unlabeled Data to Improve Supervised Learning for Text Summarization. In Proceedings of SIGIR2002.
G. de Chalendar, R. Besançon, O. Ferret, G. Grefenstette, and O. Mesnard. 2005. Crosslingual summarization with thematic extraction, syntactic sentence simplification, and bilingual generation. In Workshop on Crossing Barriers in Text Summarization Research, 5th International Conference on Recent Advances in Natural Language Processing (RANLP2005).
A. Celikyilmaz and D. Hakkani-Tur. 2010. A hybrid hierarchical model for multi-document summarization. In Proceedings of ACL2010.
G. Erkan and D. R. Radev. 2004. LexPageRank: Prestige in Multi-Document Text Summarization. In Proceedings of EMNLP2004.
D. Klein and C. D. Manning. 2002. Fast Exact Inference with a Factored Model for Natural Language Parsing. In Proceedings of NIPS2002.
J. Kupiec, J. Pedersen, and F. Chen. 1995. A Trainable Document Summarizer. In Proceedings of SIGIR1995.
A. Leuski, C.-Y. Lin, L. Zhou, U. Germann, F. J. Och, and E. Hovy. 2003. Cross-lingual C*ST*RD: English access to Hindi information. ACM Transactions on Asian Language Information Processing, 2(3): 245-269.
J.-M. Lim, I.-S. Kang, and J.-H. Lee. 2004. Multidocument summarization using cross-language texts. In Proceedings of NTCIR-4.
C.-Y. Lin and E. Hovy. 2000. The Automated Acquisition of Topic Signatures for Text Summarization. In Proceedings of the 17th Conference on Computational Linguistics.
C.-Y. Lin and E. H. Hovy. 2003. Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics. In Proceedings of HLT-NAACL-03.
C.-Y. Lin, L. Zhou, and E. Hovy. 2005. Multilingual summarization evaluation 2005: automatic evaluation report. In Proceedings of MSE (ACL2005 Workshop).
M. Litvak, M. Last, and M. Friedman. 2010. A new approach to improving multilingual summarization using a genetic algorithm. In Proceedings of ACL2010.
H. P. Luhn. 1969. The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development, 2(2).
R. Mihalcea and P. Tarau. 2004. TextRank: Bringing Order into Texts. In Proceedings of EMNLP2004.
R. Mihalcea and P. Tarau. 2005. A language independent algorithm for single and multiple document summarization. In Proceedings of IJCNLP-05.
A. Nenkova and A. Louis. 2008. Can you summarize this? Identifying correlates of input difficulty for generic multi-document summarization. In Proceedings of ACL-08:HLT.
A. Nenkova, R. Passonneau, and K. McKeown. 2007. The Pyramid method: incorporating human content selection variation in summarization evaluation. ACM Transactions on Speech and Language Processing (TSLP), 4(2).
C. Orasan and O. A. Chiorean. 2008. Evaluation of a Crosslingual Romanian-English Multidocument Summariser. In Proceedings of 6th Language Resources and Evaluation Conference (LREC2008).
P. Pingali, J. Jagarlamudi, and V. Varma. 2007. Experiments in cross language query focused multi-document summarization. In Workshop on Cross Lingual Information Access Addressing the Information Need of Multilingual Societies in IJCAI2007.
E. Pitler, A. Louis, and A. Nenkova. 2010. Automatic evaluation of linguistic quality in multidocument summarization. In Proceedings of ACL2010.
D. R. Radev, H. Y. Jing, M. Stys, and D. Tam. 2004. Centroid-based summarization of multiple documents. Information Processing and Management, 40: 919-938.
A. Siddharthan and K. McKeown. 2005. Improving multilingual summarization: using redundancy in the input to correct MT errors. In Proceedings of HLT/EMNLP-2005.
X. Wan, H. Li, and J. Xiao. 2010. Cross-language document summarization based on machine translation quality prediction. In Proceedings of ACL2010.
X. Wan, J. Yang, and J. Xiao. 2006. Using cross-document random walks for topic-focused multi-document summarization. In Proceedings of WI2006.
X. Wan and J. Yang. 2008. Multi-document summarization using cluster-based link analysis. In Proceedings of SIGIR-08.
X. Wan, J. Yang, and J. Xiao. 2007. Towards an Iterative Reinforcement Approach for Simultaneous Document Summarization and Keyword Extraction. In Proceedings of ACL2007.
K.-F. Wong, M. Wu, and W. Li. 2008. Extractive summarization using supervised and semi-supervised learning. In Proceedings of COLING-08.
H. Y. Zha. 2002. Generic Summarization and Keyphrase Extraction Using Mutual Reinforcement Principle and Sentence Clustering. In Proceedings of SIGIR2002.