
Sentence Simplification by Monolingual Machine Translation (ACL 2012, paper 178)

Source: pdf

Authors: Sander Wubben; Antal van den Bosch; Emiel Krahmer

Abstract: In this paper we describe a method for simplifying sentences using Phrase-Based Machine Translation, augmented with a re-ranking heuristic based on dissimilarity, and trained on a monolingual parallel corpus. We compare our system to a word-substitution baseline and two state-of-the-art systems, all trained and tested on paired sentences from the English part of Wikipedia and Simple Wikipedia. Human test subjects judge the output of the different systems. Analysing the judgements shows that by relatively careful phrase-based paraphrasing our model achieves similar simplification results to state-of-the-art systems, while generating better-formed output. We also argue that text readability metrics such as the Flesch-Kincaid grade level should be used with caution when evaluating the output of simplification systems.
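The re-ranking idea mentioned in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: that dissimilarity is measured as word-level edit distance (the reference list cites Levenshtein, 1966), that the decoder supplies an n-best list of candidate outputs as plain strings, and that the most dissimilar candidate is the one selected. The function names `word_edit_distance` and `rerank_by_dissimilarity` are hypothetical and are not taken from the paper or from any toolkit.

```python
from typing import List, Sequence


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over tokens; insert, delete and substitute cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        curr = [i]  # turning i tokens of a into an empty sequence
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def rerank_by_dissimilarity(source: str, n_best: List[str]) -> str:
    """Pick the candidate farthest from the source (assumed heuristic).

    Ties keep the decoder's original order, because max() returns the
    first maximal element. An empty n-best list falls back to the source.
    """
    if not n_best:
        return source
    src_tokens = source.lower().split()
    return max(
        n_best,
        key=lambda cand: word_edit_distance(src_tokens, cand.lower().split()),
    )


if __name__ == "__main__":
    source = "the cat sat on the mat"
    candidates = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "a cat sat on a rug",
    ]
    print(rerank_by_dissimilarity(source, candidates))
```

In practice one would probably restrict the choice to the top few decoder hypotheses, so that a highly dissimilar but poorly scored candidate is not selected; that cut-off is a design choice left open here.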

reference text

Colin Bannard and Chris Callison-Burch. 2005. Paraphrasing with bilingual parallel corpora. In ACL ’05: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pages 597–604, Morristown, NJ, USA. Association for Computational Linguistics.
Chris Callison-Burch. 2008. Syntactic constraints on paraphrases extracted from parallel corpora. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’08, pages 196–205, Stroudsburg, PA, USA. Association for Computational Linguistics.
Yvonne Canning, John Tait, Jackie Archibald, and Ros Crawley. 2000. Cohesive regeneration of syntactically simplified newspaper text. In Proceedings of ROMAND 2000, Lausanne.
John Carroll, Guido Minnen, Yvonne Canning, Siobhan Devlin, and John Tait. 1998. Practical simplification of English newspaper text to assist aphasic readers. In AAAI-98 Workshop on Integrating Artificial Intelligence and Assistive Technology, Madison, Wisconsin.
John Carroll, Guido Minnen, Darren Pearce, Yvonne Canning, Siobhan Devlin, and John Tait. 1999. Simplifying text for language-impaired readers. In Proceedings of EACL’99, Bergen. ACL.
R. Chandrasekar and B. Srinivas. 1997. Automatic rules for text simplification. Knowledge-Based Systems, 10:183–190.
Raman Chandrasekar, Christine Doran, and Bangalore Srinivas. 1996. Motivations and methods for text simplification. In Proceedings of the Sixteenth International Conference on Computational Linguistics (COLING’96), pages 1041–1044.
David Chiang, Adam Lopez, Nitin Madnani, Christof Monz, Philip Resnik, and Michael Subotin. 2005. The Hiero machine translation system: extensions, evaluation, and analysis. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT ’05, pages 779–786, Stroudsburg, PA, USA. Association for Computational Linguistics.
Will Coster and David Kauchak. 2011. Learning to simplify sentences using Wikipedia. In Proceedings of the Workshop on Monolingual Text-To-Text Generation, pages 1–9, Portland, Oregon, June. Association for Computational Linguistics.
Walter Daelemans, Jakub Zavrel, Peter Berck, and Steven Gillis. 1996. MBT: A Memory-Based Part of Speech Tagger-Generator. In Proc. of Fourth Workshop on Very Large Corpora, pages 14–27. ACL SIGDAT.
Walter Daelemans, Anja Hothker, and Erik Tjong Kim Sang. 2004. Automatic sentence simplification for subtitling in Dutch and English. In Proceedings of the 4th International Conference on Language Resources and Evaluation, pages 1045–1048.
George Doddington. 2002. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In Proceedings of the second international conference on Human Language Technology Research, HLT ’02, pages 138–145, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database. The MIT Press, May.
Katja Filippova and Michael Strube. 2008. Sentence fusion via dependency graph compression. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 177–185, Honolulu, Hawaii, October. Association for Computational Linguistics.
Kentaro Inui, Atsushi Fujita, Tetsuro Takahashi, Ryu Iida, and Tomoya Iwakura. 2003. Text simplification for reading assistance: A project note. In Proceedings of the Second International Workshop on Paraphrasing, pages 9–16, Sapporo, Japan, July. Association for Computational Linguistics.
Kevin Knight and Daniel Marcu. 2000. Statistics-based summarization – step one: Sentence compression. In Proceedings of the 17th National Conference on Artificial Intelligence (AAAI), pages 703–710, Austin, Texas, USA, July 30 – August 3.
Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris C. Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In ACL. The Association for Computer Linguistics.
V. Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707–710.
Zhifei Li, Chris Callison-Burch, Chris Dyer, Juri Ganitkevitch, Sanjeev Khudanpur, Lane Schwartz, Wren N. G. Thornton, Jonathan Weese, and Omar F. Zaidan. 2009. Joshua: an open source toolkit for parsing-based machine translation. In Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 135–139, Stroudsburg, PA, USA. Association for Computational Linguistics.
Nitin Madnani, Necip Fazil Ayan, Philip Resnik, and Bonnie J. Dorr. 2007. Using paraphrases for parameter tuning in statistical machine translation. In Proceedings of the Second Workshop on Statistical Machine Translation, StatMT ’07, pages 120–127, Stroudsburg, PA, USA. Association for Computational Linguistics.
Rani Nelken and Stuart M. Shieber. 2006. Towards robust context-sensitive sentence alignment for monolingual corpora. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-06), Trento, Italy, 3–7 April.
Franz J. Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Comput. Linguist., 29(1):19–51, March.
Chris Quirk, Chris Brockett, and William Dolan. 2004. Monolingual machine translation for paraphrase generation. In Dekang Lin and Dekai Wu, editors, Proceedings of EMNLP 2004, pages 142–149, Barcelona, Spain, July. Association for Computational Linguistics.
Advaith Siddharthan. 2002. An architecture for a text simplification system. In Language Engineering Conference, page 64. IEEE Computer Society.
David A. Smith and Jason Eisner. 2006. Quasi-synchronous grammars: Alignment by soft projection of syntactic dependencies. In Proceedings of the HLT-NAACL Workshop on Statistical Machine Translation, pages 23–30, New York, June.
Andreas Stolcke. 2002. SRILM - An Extensible Language Modeling Toolkit. In Proc. Int. Conf. on Spoken Language Processing, pages 901–904, Denver, Colorado.
D. Vickrey and D. Koller. 2008. Sentence simplification for semantic role labeling. In Proceedings of the 46th Meeting of the Association for Computational Linguistics: Human Language Technologies.
Willian Massami Watanabe, Arnaldo Candido Junior, Vinícius Rodriguez de Uzêda, Renata Pontin de Mattos Fortes, Thiago Alexandre Salgueiro Pardo, and Sandra M. Aluísio. 2009. Facilita: reading assistance for low-literacy readers. In Brad Mehlenbacher, Aristidis Protopsaltis, Ashley Williams, and Shaun Slattery, editors, SIGDOC, pages 29–36. ACM.
Kristian Woodsend and Mirella Lapata. 2011. Learning to simplify sentences with quasi-synchronous grammar and integer programming. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 409–420, Edinburgh, Scotland, UK, July. Association for Computational Linguistics.
Sander Wubben, Antal van den Bosch, and Emiel Krahmer. 2010. Paraphrase generation as monolingual translation: data and evaluation. In Proceedings of the 6th International Natural Language Generation Conference, INLG ’10, pages 203–207, Stroudsburg, PA, USA. Association for Computational Linguistics.
Kenji Yamada and Kevin Knight. 2001. A syntax-based statistical translation model. In Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, ACL ’01, pages 523–530, Stroudsburg, PA, USA. Association for Computational Linguistics.
Mark Yatskar, Bo Pang, Cristian Danescu-Niculescu-Mizil, and Lillian Lee. 2010. For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia. In Proceedings of the NAACL, pages 365–368.
Shiqi Zhao, Xiang Lan, Ting Liu, and Sheng Li. 2009. Application-driven statistical paraphrase generation. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2, ACL ’09, pages 834–842, Stroudsburg, PA, USA. Association for Computational Linguistics.
Zhemin Zhu, Delphine Bernhard, and Iryna Gurevych. 2010. A monolingual tree-based translation model for sentence simplification. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 1353–1361, Beijing, China, August. Coling 2010 Organizing Committee.