322 acl-2013-Simple, readable sub-sentences

Source: pdf

Author: Sigrid Klerke ; Anders Søgaard

Abstract: We present experiments using a new unsupervised approach to automatic text simplification, which builds on sampling and ranking via a loss function informed by readability research. The main idea is that a loss function can distinguish good simplification candidates among randomly sampled sub-sentences of the input sentence. Native speakers rate our approach as equally grammatical and as appropriate for beginner readers as a supervised SMT-based baseline system, but our setup performs more radical changes that better resemble the variation observed in human-generated simplifications.
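The sample-and-rank idea in the abstract can be illustrated with a toy sketch. The abstract does not specify the loss function or the sampling procedure, so everything below is an assumption made for illustration: the loss is a LIX-style readability score (sentence length plus percentage of words longer than six letters), sub-sentences are sampled by dropping words independently at random, and the names `lix`, `sample_sub_sentences`, and `simplify` are hypothetical. It is not the authors' system.

```python
import random


def lix(words):
    """LIX-style score for one sentence: word count plus the
    percentage of long words (more than six letters)."""
    if not words:
        return float("inf")
    long_words = sum(1 for w in words if len(w.strip(".,;:!?")) > 6)
    return len(words) + 100.0 * long_words / len(words)


def sample_sub_sentences(words, n_samples, keep_prob, rng):
    """Sample sub-sentences by dropping each word independently
    (assumed sampling scheme); word order is preserved."""
    candidates = set()
    for _ in range(n_samples):
        kept = tuple(w for w in words if rng.random() < keep_prob)
        if kept:
            candidates.add(kept)
    return candidates


def simplify(sentence, n_samples=200, keep_prob=0.7, min_words=3, seed=0):
    """Return the sampled sub-sentence with the lowest loss."""
    rng = random.Random(seed)
    words = sentence.split()
    sampled = sample_sub_sentences(words, n_samples, keep_prob, rng)
    candidates = {c for c in sampled if len(c) >= min_words}
    candidates.add(tuple(words))  # the input itself is always a candidate
    # Rank by loss; the tuple itself breaks ties deterministically.
    best = min(candidates, key=lambda c: (lix(c), c))
    return " ".join(best)


if __name__ == "__main__":
    print(simplify("The committee subsequently postponed the "
                   "controversial decision until further notice"))
```

A readability score alone favors short candidates without long words, regardless of whether they are grammatical or preserve meaning, so a usable loss would need additional terms for those properties; the `min_words` floor here only prevents the most degenerate outputs.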

reference text

S.M. Aluísio, Lucia Specia, T.A.S. Pardo, E.G. Maziero, H.M. Caseli, and R.P.M. Fortes. 2008. A corpus analysis of simple account texts and the proposal of simplification strategies: first steps towards text simplification systems. In Proceedings of the 26th annual ACM international conference on Design of communication, pages 15–22. ACM.

Jonathan Anderson. 1983. LIX and RIX: Variations on a little-known readability index. Journal of Reading, 26(6):490–496.

C. H. Bjornsson. 1983. Readability of Newspapers in 11 Languages. Reading Research Quarterly, 18(4):480–497.

B Bohnet. 2010. Very high accuracy and fast dependency parsing is not a contradiction. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 89–97. Association for Computational Linguistics.

S. Bott, H. Saggion, and D. Figueroa. 2012. A hybrid system for spanish text simplification. In Third Workshop on Speech and Language Processing for Assistive Technologies (SLPAT), Montreal, Canada.

Julian Brooke, Vivian Tsang, David Jacob, Fraser Shein, and Graeme Hirst. 2012. Building Readability Lexicons with Unannotated Corpora. In Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations, pages 33–39, Montréal, Canada, June. Association for Computational Linguistics.

Y. Canning, J. Tait, J. Archibald, and R. Crawley. 2000. Cohesive generation of syntactically simplified newspaper text. Springer.

John Carroll, G. Minnen, D. Pearce, Yvonne Canning, S. Devlin, and J. Tait. 1999. Simplifying text for language-impaired readers. In Proceedings of EACL, volume 99, pages 269–270. Citeseer.

R. Chandrasekar, Christine Doran, and B Srinivas. 1996. Motivations and methods for text simplification. In Proceedings of the 16th conference on Computational linguistics-Volume 2, pages 1041–1044. Association for Computational Linguistics.

William Coster and David Kauchak. 2011. Simple English Wikipedia: a new text simplification task. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume 2, volume 2, pages 665–669. Association for Computational Linguistics.

W. Daelemans, A. Höthker, and E.T.K. Sang. 2004. Automatic sentence simplification for subtitling in dutch and english. In Proceedings of the 4th International Conference on Language Resources and Evaluation, pages 1045–1048.

A. Davison and R.N. Kantor. 1982. On the failure of readability formulas to define readable texts: A case study from adaptations. Reading Research Quarterly, pages 187–209.

J. De Belder and M.F. Moens. 2012. A dataset for the evaluation of lexical simplification. Computational Linguistics and Intelligent Text Processing, pages 426–437.

Anna Decker. 2003. Towards automatic grammatical simplification of Swedish text. Master's thesis, Stockholm University.

Biljana Drndarevic and Horacio Saggion. 2012. Towards Automatic Lexical Simplification in Spanish: An Empirical Study. In Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations, pages 8–16, Montréal, Canada, June. Association for Computational Linguistics.

M Federico, N Bertoldi, and M Cettolo. 2008. IRSTLM: an open source toolkit for handling large scale language models. In Ninth Annual Conference of the International Speech Communication Association.

Rudolph Flesch. 1948. A new readability yardstick. Journal of applied psychology, 32(3):221.

Michael Heilman and Noah A Smith. 2010. Extracting simplified statements for factual question generation. In Proceedings of the Third Workshop on Question Generation.

Sigrid Klerke and Anders Søgaard. 2012. DSim, a Danish Parallel Corpus for Text Simplification. In Proceedings of Language Resources and Evaluation (LREC 2012), pages 4015–4018.

Sigrid Klerke. 2012. Automatic text simplification in danish. sampling a restricted space of rewrites to optimize readability using lexical substitutions and dependency analyses. Master's thesis, University of Copenhagen.

P Koehn, H Hoang, A Birch, C Callison-Burch, M Federico, N Bertoldi, B Cowan, W Shen, C Moran, R Zens, and Others. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pages 177–180. Association for Computational Linguistics.

M T Kromann. 2003. The Danish Dependency Treebank and the DTAG treebank tool. In Proceedings of the Second Workshop on Treebanks and Linguistic Theories (TLT), page 217.

Julie Medero. 2011. Identifying Targets for Syntactic Simplification. In Proceedings of Speech and Language Technology in Education.

F.J. Och and H. Ney. 2000. A comparison of alignment models for statistical machine translation. In Proceedings of the 18th conference on Computational linguistics-Volume 2, pages 1086–1090. Association for Computational Linguistics.

S.E. Petersen and Mari Ostendorf. 2007. Text simplification for language learners: a corpus analysis. In the Proceedings of the Speech and Language Technology for Education Workshop, pages 69–72. Citeseer.

S. Petrov, D. Das, and R. McDonald. 2011. A universal part-of-speech tagset. Arxiv preprint ArXiv:1104.2086.

Emily Pitler and Ani Nenkova. 2008. Revisiting readability: A unified framework for predicting text quality. Proceedings of the Conference on Empirical Methods in Natural Language Processing.

Jonas Rybing and Christian Smith. 2009. CogFLUX Grunden till ett automatiskt textförenklingssystem för svenska. Master's thesis, Linköpings Universitet.

Sarah E Schwarm and Mari Ostendorf. 2005. Reading Level Assessment Using Support Vector Machines and Statistical Language Models. In Proceedings of the 43rd Annual Meeting of the ACL, pages 523–530.

V. Seretan. 2012. Acquisition of syntactic simplification rules for french. In Proceedings of Language Resources and Evaluation (LREC 2012).

Advaith Siddharthan and Napoleon Katsos. 2012. Offline Sentence Processing Measures for testing Readability with Users. In Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations, pages 17–24, Montréal, Canada, June. Association for Computational Linguistics.

Advaith Siddharthan. 2010. Complex lexico-syntactic reformulation of sentences using typed dependency representations. Proceedings of the 6th International Natural Language Generation Conference.

Advaith Siddharthan. 2011. Text Simplification using Typed Dependencies: A Comparison of the Robustness of Different Generation Strategies. In Proceedings of the 13th European Workshop on Natural Language Generation, pages 2–11.

L. Specia, S.K. Jauhar, and R. Mihalcea. 2012. Semeval-2012 task 1: English lexical simplification. In Proceedings of the 6th International Workshop on Semantic Evaluation (SemEval 2012), pages 347–355.

L. Specia. 2010. Translating from complex to simplified sentences. In Proceedings of the 9th international conference on Computational Processing of the Portuguese Language, pages 30–39.

Andreas Stolcke. 2002. SRILM - an extensible language modeling toolkit. In Proceedings of the Seventh International Conference on Spoken Language Processing.

S. Vajjala and D. Meurers. 2012. On improving the accuracy of readability classification using insights from second language acquisition. In Proceedings of the 7th Workshop on Innovative Use of NLP for Building Educational Applications (BEA7), pages 163–173.

Kristian Woodsend and Mirella Lapata. 2011. Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (2011), pages 409–420.

Mark Yatskar, Bo Pang, C. Danescu-Niculescu-Mizil, and Lillian Lee. 2010. For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 365–368. Association for Computational Linguistics.

Zhemin Zhu, Delphine Bernhard, and I. Gurevych. 2010. A monolingual tree-based translation model for sentence simplification. In Proceedings of The 23rd International Conference on Computational Linguistics, pages 1353–1361. Association for Computational Linguistics.