
38 acl-2010-Automatic Evaluation of Linguistic Quality in Multi-Document Summarization


Source: pdf

Author: Emily Pitler ; Annie Louis ; Ani Nenkova

Abstract: To date, few attempts have been made to develop and validate methods for automatic evaluation of linguistic quality in text summarization. We present the first systematic assessment of several diverse classes of metrics designed to capture various aspects of well-written text. We train and test linguistic quality models on consecutive years of NIST evaluation data in order to show the generality of results. For grammaticality, the best results come from a set of syntactic features. Focus, coherence and referential clarity are best evaluated by a class of features measuring local coherence on the basis of cosine similarity between sentences, coreference information, and summarization-specific features. Our best results are 90% accuracy for pairwise comparisons of competing systems over a test set of several inputs and 70% for ranking summaries of a specific input.
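The abstract mentions features that measure local coherence through cosine similarity between sentences. As a rough illustration only, the sketch below computes the cosine similarity of each pair of adjacent sentences over raw word counts and summarizes it as minimum, maximum, and mean. The tokenization, the use of unweighted counts, the choice of summary statistics, and all function names are assumptions made for this example; the abstract does not specify how the paper's features are actually computed.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercase a sentence and count its alphanumeric tokens (assumed tokenization)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence shares nothing with its neighbor
    return dot / (norm_a * norm_b)


def adjacent_similarities(sentences):
    """Cosine similarity for each pair of consecutive sentences."""
    vectors = [bag_of_words(s) for s in sentences]
    return [cosine(u, v) for u, v in zip(vectors, vectors[1:])]


def continuity_features(sentences):
    """Summarize adjacent-sentence similarity (hypothetical feature set)."""
    sims = adjacent_similarities(sentences)
    if not sims:
        return {"min": 0.0, "max": 0.0, "mean": 0.0}
    return {"min": min(sims), "max": max(sims), "mean": sum(sims) / len(sims)}
```

Calling `continuity_features` on the list of sentences in a summary yields three numbers that could serve as inputs to a learned quality model; a summary whose neighboring sentences share no vocabulary gets a minimum of zero.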


reference text

R. Barzilay and M. Lapata. 2008. Modeling local coherence: An entity-based approach. Computational Linguistics, 34(1):1–34.
C. Callison-Burch, C. Fordyce, P. Koehn, C. Monz, and J. Schroeder. 2008. Further meta-evaluation of machine translation. In Proceedings of the Third Workshop on Statistical Machine Translation, pages 70–106.
J. Chae and A. Nenkova. 2009. Predicting the fluency of text with shallow structural features: case studies of machine translation and human-written text. In Proceedings of EACL, pages 139–147.
E. Charniak and M. Elsner. 2009. EM works for pronoun anaphora resolution. In Proceedings of EACL, pages 148–156.
J.M. Conroy and H.T. Dang. 2008. Mind the gap: dangers of divorcing evaluations of summary content from linguistic quality. In Proceedings of COLING, pages 145–152.
S. Deerwester, S.T. Dumais, G.W. Furnas, T.K. Landauer, and R. Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41:391–407.
M. Elsner and E. Charniak. 2008. Coreference-inspired coherence modeling. In Proceedings of ACL/HLT: Short Papers, pages 41–44.
M. Elsner, J. Austerweil, and E. Charniak. 2007. A unified local and global model for discourse coherence. In Proceedings of NAACL/HLT.
J.R. Finkel, T. Grenager, and C. Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of ACL, pages 363–370.
K. Fraurud. 1990. Definiteness and the processing of noun phrases in natural discourse. Journal of Semantics, 7(4):395.
A.C. Graesser, D.S. McNamara, M.M. Louwerse, and Z. Cai. 2004. Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, and Computers, 36(2):193–202.
B. Grosz, A. Joshi, and S. Weinstein. 1995. Centering: a framework for modelling the local coherence of discourse. Computational Linguistics, 21(2):203–226.
K.F. Haberlandt and A.C. Graesser. 1985. Component processes in text comprehension and some of their interactions. Journal of Experimental Psychology: General, 114(3):357–374.
M.A.K. Halliday and R. Hasan. 1976. Cohesion in English. Longman Group Ltd, London, U.K.
T. Joachims. 2002. Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 133–142.
M.A. Just and P.A. Carpenter. 1987. The psychology of reading and language comprehension. Allyn and Bacon, Boston, MA.
D. Klein and C.D. Manning. 2003. Accurate unlexicalized parsing. In Proceedings of ACL, pages 423–430.
K. Knight and D. Marcu. 2002. Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artificial Intelligence, 139(1):91–107.
M. Lapata and R. Barzilay. 2005. Automatic evaluation of text coherence: Models and representations. In International Joint Conference On Artificial Intelligence, volume 19, page 1085.
M. Lapata. 2003. Probabilistic text structuring: Experiments with sentence ordering. In Proceedings of ACL, pages 545–552.
C.Y. Lin and E. Hovy. 2003. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of NAACL/HLT, page 78.
C.Y. Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004), pages 25–26.
A. Nenkova and K. McKeown. 2003. References to named entities: a corpus study. In Proceedings of HLT/NAACL 2003 (short paper).
J. Otterbacher, D. Radev, and A. Luo. 2002. Revisions that improve cohesion in multi-document summaries: a preliminary study. In Proceedings of the Workshop on Automatic Summarization, ACL.
P. Over, H. Dang, and D. Harman. 2007. DUC in context. Information Processing & Management, 43(6):1506–1520.
C.D. Paice. 1980. The automatic generation of literature abstracts: an approach based on the identification of self-indicating phrases. In Proceedings of the 3rd annual ACM conference on Research and development in information retrieval, pages 172–191.
C.D. Paice. 1990. Constructing literature abstracts by computer: Techniques and prospects. Information Processing & Management, 26(1):171–186.
E.F. Prince. 1981. Toward a taxonomy of given-new information. Radical pragmatics, 223:255.
H. Saggion. 2009. A Classification Algorithm for Predicting the Structure of Summaries. Proceedings of the 2009 Workshop on Language Generation and Summarisation, page 31.
R. Soricut and D. Marcu. 2006. Discourse generation using utility-trained coherence models. In Proceedings of ACL.
J. Steinberger, M. Poesio, M.A. Kabadjov, and K. Ježek. 2007. Two uses of anaphora resolution in summarization. Information Processing & Management, 43(6):1663–1680.
A. Stolcke. 2002. SRILM - an extensible language modeling toolkit. In Seventh International Conference on Spoken Language Processing, volume 3.