
77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction

Source: pdf

Author: Xiaojun Wan ; Huiying Li ; Jianguo Xiao

Abstract: Cross-language document summarization is the task of producing a summary in one language for a document set in a different language. Existing methods simply use machine translation for document translation or summary translation. However, current machine translation services are far from satisfactory, so the quality of the resulting cross-language summary is usually very poor, both in readability and in content. In this paper, we propose to consider the translation quality of each sentence in the English-to-Chinese cross-language summarization process. First, the translation quality of each English sentence in the document set is predicted with the SVM regression method, and then the quality score of each sentence is incorporated into the summarization process. Finally, the English sentences with high translation quality and high informativeness are selected and translated to form the Chinese summary. Experimental results demonstrate the effectiveness and usefulness of the proposed approach.
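
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the predicted translation quality and the informativeness of a sentence are combined, so the linear mix below, the `weight` parameter, and the names `Sentence`, `combined_score`, and `select_sentences` are all assumptions made for illustration. The scores are taken as precomputed inputs; the SVM regression step that would produce the quality score is not shown.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    informativeness: float  # hypothetical summarization score in [0, 1]
    quality: float          # hypothetical predicted translation quality in [0, 1]


def combined_score(s: Sentence, weight: float = 0.5) -> float:
    """Linear mix of the two scores (an assumed combination, not the paper's)."""
    return (1.0 - weight) * s.informativeness + weight * s.quality


def select_sentences(sentences, k, weight=0.5):
    """Return the k sentences with the highest combined score, best first."""
    ranked = sorted(sentences, key=lambda s: combined_score(s, weight), reverse=True)
    return ranked[:k]


# Toy candidates: "A" is informative but predicted to translate badly.
candidates = [
    Sentence("A", informativeness=0.9, quality=0.2),
    Sentence("B", informativeness=0.7, quality=0.8),
    Sentence("C", informativeness=0.4, quality=0.9),
]
summary = select_sentences(candidates, k=2)
print([s.text for s in summary])
```

With `weight=0.0` the ranking falls back to informativeness alone, which corresponds to summarizing without any translation quality signal; raising `weight` favors sentences that are expected to translate well.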

reference text

J. Albrecht and R. Hwa. 2007. A re-examination of machine learning approaches for sentence-level MT evaluation. In Proceedings of ACL2007.
M. R. Amini and P. Gallinari. 2002. The Use of Unlabeled Data to Improve Supervised Learning for Text Summarization. In Proceedings of SIGIR2002.
J. Blatz, E. Fitzgerald, G. Foster, S. Gandrabur, C. Goutte, A. Kulesza, A. Sanchis, and N. Ueffing. 2003. Confidence estimation for statistical machine translation. Johns Hopkins Summer Workshop Final Report.
J. Chae and A. Nenkova. 2009. Predicting the fluency of text with shallow structural features: case studies of machine translation and human-written text. In Proceedings of EACL2009.
G. de Chalendar, R. Besançon, O. Ferret, G. Grefenstette, and O. Mesnard. 2005. Crosslingual summarization with thematic extraction, syntactic sentence simplification, and bilingual generation. In Workshop on Crossing Barriers in Text Summarization Research, 5th International Conference on Recent Advances in Natural Language Processing (RANLP2005).
C.-C. Chang and C.-J. Lin. 2001. LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
G. Erkan and D. R. Radev. 2004. LexPageRank: Prestige in Multi-Document Text Summarization. In Proceedings of EMNLP2004.
M. Gamon, A. Aue, and M. Smets. 2005. Sentence-level MT evaluation without reference translations: beyond language modeling. In Proceedings of EAMT2005.
D. Klein and C. D. Manning. 2002. Fast Exact Inference with a Factored Model for Natural Language Parsing. In Proceedings of NIPS2002.
J. Kupiec, J. Pedersen, and F. Chen. 1995. A Trainable Document Summarizer. In Proceedings of SIGIR1995.
A. Leuski, C.-Y. Lin, L. Zhou, U. Germann, F. J. Och, and E. Hovy. 2003. Cross-lingual C*ST*RD: English access to Hindi information. ACM Transactions on Asian Language Information Processing, 2(3): 245-269.
J.-M. Lim, I.-S. Kang, and J.-H. Lee. 2004. Multidocument summarization using cross-language texts. In Proceedings of NTCIR-4.
C.-Y. Lin and E. Hovy. 2000. The Automated Acquisition of Topic Signatures for Text Summarization. In Proceedings of the 17th Conference on Computational Linguistics.
C.-Y. Lin and E. H. Hovy. 2002. From Single to Multi-document Summarization: A Prototype System and its Evaluation. In Proceedings of ACL-02.
C.-Y. Lin and E. H. Hovy. 2003. Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics. In Proceedings of HLT-NAACL-03.
C.-Y. Lin, L. Zhou, and E. Hovy. 2005. Multilingual summarization evaluation 2005: automatic evaluation report. In Proceedings of MSE (ACL-2005 Workshop).
H. P. Luhn. 1969. The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development, 2(2).
R. Mihalcea and P. Tarau. 2004. TextRank: Bringing Order into Texts. In Proceedings of EMNLP2004.
R. Mihalcea and P. Tarau. 2005. A language independent algorithm for single and multiple document summarization. In Proceedings of IJCNLP-05.
A. Nenkova and A. Louis. 2008. Can you summarize this? Identifying correlates of input difficulty for generic multi-document summarization. In Proceedings of ACL-08:HLT.
A. Nenkova, R. Passonneau, and K. McKeown. 2007. The Pyramid method: incorporating human content selection variation in summarization evaluation. ACM Transactions on Speech and Language Processing (TSLP), 4(2).
C. Orasan and O. A. Chiorean. 2008. Evaluation of a Crosslingual Romanian-English Multi-document Summariser. In Proceedings of 6th Language Resources and Evaluation Conference (LREC2008).
P. Pingali, J. Jagarlamudi, and V. Varma. 2007. Experiments in cross language query focused multidocument summarization. In Workshop on Cross Lingual Information Access Addressing the Information Need of Multilingual Societies in IJCAI2007.
C. Quirk. 2004. Training a sentence-level machine translation confidence measure. In Proceedings of LREC2004.
D. R. Radev, H. Y. Jing, M. Stys, and D. Tam. 2004. Centroid-based summarization of multiple documents. Information Processing and Management, 40: 919-938.
A. Siddharthan and K. McKeown. 2005. Improving multilingual summarization: using redundancy in the input to correct MT errors. In Proceedings of HLT/EMNLP-2005.
L. Specia, Z. Wang, M. Turchi, J. Shawe-Taylor, and C. Saunders. 2009. Improving the Confidence of Machine Translation Quality Estimates. In MT Summit 2009 (Machine Translation Summit XII).
V. Vapnik. 1995. The Nature of Statistical Learning Theory. Springer.
X. Wan, H. Li, and J. Xiao. 2010. EUSUM: extracting easy-to-understand English summaries for nonnative readers. In Proceedings of SIGIR2010.
X. Wan, J. Yang, and J. Xiao. 2006. Using cross-document random walks for topic-focused multi-document summarization. In Proceedings of WI2006.
X. Wan and J. Yang. 2008. Multi-document summarization using cluster-based link analysis. In Proceedings of SIGIR-08.
X. Wan, J. Yang, and J. Xiao. 2007. Towards an Iterative Reinforcement Approach for Simultaneous Document Summarization and Keyword Extraction. In Proceedings of ACL2007.
K.-F. Wong, M. Wu, and W. Li. 2008. Extractive summarization using supervised and semi-supervised learning. In Proceedings of COLING-08.
H. Y. Zha. 2002. Generic Summarization and Keyphrase Extraction Using Mutual Reinforcement Principle and Sentence Clustering. In Proceedings of SIGIR2002.