
283 acl-2011-Simple English Wikipedia: A New Text Simplification Task

Source: pdf

Author: William Coster ; David Kauchak

Abstract: In this paper we examine the task of sentence simplification, which aims to reduce the reading complexity of a sentence by incorporating more accessible vocabulary and sentence structure. We introduce a new data set that pairs English Wikipedia with Simple English Wikipedia and is orders of magnitude larger than any previously examined for sentence simplification. The data contains the full range of simplification operations, including rewording, reordering, insertion and deletion. We provide an analysis of this corpus as well as preliminary results using a phrase-based translation approach for simplification.
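
The simplification operations named in the abstract can be illustrated with a small word-level diff between a normal sentence and its simplified counterpart. The sketch below is not the paper's method: it assumes whitespace tokenization and uses Python's standard `difflib.SequenceMatcher`. The function name `word_edit_ops` and the example sentences are hypothetical, chosen only for illustration. A `replace` span roughly corresponds to rewording. Reordering is not detected directly and would typically appear as a delete paired with an insert.

```python
from difflib import SequenceMatcher


def word_edit_ops(normal: str, simple: str):
    """Return (op, normal_words, simple_words) for every non-equal span.

    op is one of difflib's tags: "replace", "delete", or "insert".
    Tokenization is plain whitespace splitting (an assumption of this sketch).
    """
    src = normal.split()
    tgt = simple.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            ops.append((tag, src[i1:i2], tgt[j1:j2]))
    return ops


if __name__ == "__main__":
    # Made-up sentence pair, not taken from the data set.
    for op in word_edit_ops("he quickly ran home", "he ran home"):
        print(op)
```

A diff of this kind only approximates the operations. Treating simplification as phrase-based translation, as the abstract describes, instead learns such correspondences from many aligned sentence pairs.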

reference text

Regina Barzilay and Noemie Elhadad. 2003. Sentence alignment for monolingual comparable corpora. In Proceedings of EMNLP.
John Carroll, Gido Minnen, Yvonne Canning, Siobhan Devlin, and John Tait. 1998. Practical simplification of English newspaper text to assist aphasic readers. In Proceedings of AAAI Workshop on Integrating AI and Assistive Technology.
Raman Chandrasekar and Bangalore Srinivas. 1997. Automatic induction of rules for text simplification. In Knowledge Based Systems.
David Chiang. 2010. Learning to translate with source and target syntax. In Proceedings of ACL.
James Clarke and Mirella Lapata. 2006. Models for sentence compression: A comparison across domains, training requirements and evaluation measures. In Proceedings of ACL.
Trevor Cohn and Mirella Lapata. 2009. Sentence compression as tree transduction. Journal of Artificial Intelligence Research.
Lijun Feng. 2008. Text simplification: A survey. CUNY Technical Report.
Michel Galley and Kathleen McKeown. 2007. Lexicalized Markov grammars for sentence compression. In Proceedings of HLT/NAACL.
Ruifang Ge and Raymond Mooney. 2006. Discriminative reranking for semantic parsing. In Proceedings of COLING.
Siddhartha Jonnalagadda, Luis Tari, Jorg Hakenberg, Chitta Baral, and Graciela Gonzalez. 2009. Towards effective sentence simplification for automatic processing of biomedical text. In Proceedings of HLT/NAACL.
Dan Klein and Christopher Manning. 2003. Accurate unlexicalized parsing. In Proceedings of ACL.
Kevin Knight and Daniel Marcu. 2002. Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artificial Intelligence.
Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of ACL.
Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Proceedings of EMNLP.
Ryan McDonald. 2006. Discriminative sentence compression with soft syntactic evidence. In Proceedings of EACL.
Makoto Miwa, Rune Saetre, Yusuke Miyao, and Jun’ichi Tsujii. 2010. Entity-focused sentence simplification for relation extraction. In Proceedings of COLING.
Courtney Napoles and Mark Dredze. 2010. Learning simple Wikipedia: A cogitation in ascertaining abecedarian language. In Proceedings of HLT/NAACL Workshop on Computational Linguistics and Writing.
Rani Nelken and Stuart Shieber. 2006. Towards robust context-sensitive sentence alignment for monolingual corpora. In Proceedings of AMTA.
Tadashi Nomoto. 2007. Discriminative sentence compression with conditional random fields. In Information Processing and Management.
Tadashi Nomoto. 2008. A generic sentence trimmer with CRFs. In Proceedings of HLT/NAACL.
Tadashi Nomoto. 2009. A comparison of model free versus model intensive approaches to sentence compression. In Proceedings of EMNLP.
Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51.
Franz Och and Hermann Ney. 2004. The alignment template approach to statistical machine translation. Computational Linguistics.
Franz Josef Och, Kenji Yamada, Stanford U, Alex Fraser, Daniel Gildea, and Viren Jain. 2004. A smorgasbord of features for statistical machine translation. In Proceedings of HLT/NAACL.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of ACL.
Emily Pitler. 2010. Methods for sentence compression. Technical Report MS-CIS-10-20, University of Pennsylvania.
Jenine Turner and Eugene Charniak. 2005. Supervised and unsupervised learning for sentence compression. In Proceedings of ACL.
David Vickrey and Daphne Koller. 2008. Sentence simplification for semantic role labeling. In Proceedings of ACL.
Elif Yamangil and Rani Nelken. 2008. Mining Wikipedia revision histories for improving sentence compression. In ACL.
Mark Yatskar, Bo Pang, Cristian Danescu-Niculescu-Mizil, and Lillian Lee. 2010. For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia. In HLT/NAACL Short Papers.