
35 emnlp-2013-Automatically Detecting and Attributing Indirect Quotations

Source: pdf

Author: Silvia Pareti ; Tim O'Keefe ; Ioannis Konstas ; James R. Curran ; Irena Koprinska

Abstract: Direct quotations are used for opinion mining and information extraction because they have an easy-to-extract span and can be attributed to a speaker with high accuracy. However, focusing only on direct quotations ignores around half of all reported speech, which takes the form of indirect or mixed speech. This work presents the first large-scale experiments in indirect and mixed quotation extraction and attribution. We propose two methods of extracting all quote types from news articles and evaluate them on two large annotated corpora, one of which is a contribution of this work. We further show that direct quotation attribution methods can be successfully applied to indirect and mixed quotation attribution.

reference text

Sabine Bergler. 1992. Evidential analysis of reported speech. Ph.D. thesis, Brandeis University.

Sabine Bergler, Monia Doandes, Christine Gerard, and René Witte. 2004. Attributions. In Exploring Attitude and Affect in Text: Theories and Applications, Technical Report SS-04-07, pages 16–19. Papers from the 2004 AAAI Spring Symposium.

Eric de La Clergerie, Benoit Sagot, Rosa Stern, Pascal Denis, Gaelle Recource, and Victor Mignot. 2011. Extracting and visualizing quotations from news wires. Human Language Technology. Challenges for Computer Science and Linguistics, pages 522–532.

Cícero Nogueira dos Santos and Ruy Luiz Milidiú. 2009. Entropy guided transformation learning. In Foundations of Computational Intelligence Volume 1, Studies in Computational Intelligence, pages 159–184. Springer.

David K. Elson and Kathleen R. McKeown. 2010. Automatic attribution of quoted speech in literary narrative. In Proceedings of the Twenty-Fourth Conference of the Association for the Advancement of Artificial Intelligence, pages 1013–1019.

Christine Fellbaum. 1998. WordNet: An electronic lexical database. MIT Press, Cambridge, MA.

William Paulo Ducca Fernandes, Eduardo Motta, and Ruy Luiz Milidiú. 2011. Quotation extraction for portuguese. In Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology (STIL 2011), pages 204–208.

Kevin Glass and Shaun Bangay. 2007. A naive salience-based method for speaker identification in fiction books. In Proceedings of the 18th Annual Symposium of the Pattern Recognition Association of South Africa (PRASA07), pages 1–6.

Bill Hollingsworth and Simone Teufel. 2005. Human annotation of lexical chains: Coverage and agreement measures. In ELECTRA Workshop on Methodologies and Evaluation of Lexical Cohesion Techniques in Real-world Applications (Beyond Bag of Words), page 26.

Dan Klein and Christopher D Manning. 2002. Fast exact inference with a factored model for natural language parsing. In Advances in neural information processing systems, pages 3–10.

Ralf Krestel, Sabine Bergler, and René Witte. 2008. Minding the source: Automatic tagging of reported speech in newspaper articles. In Proceedings of the Sixth International Language Resources and Evaluation (LREC’08).

Jisheng Liang, Navdeep Dhillon, and Krzysztof Koperski. 2010. A large-scale system for annotating and querying quotations in news feeds. In Proceedings of the 3rd International Semantic Search Workshop, pages 1–5.

Nuno Mamede and Pedro Chaleira. 2004. Character identification in children stories. Advances in Natural Language Processing, pages 82–90.

Tim O’Keefe, Silvia Pareti, James R. Curran, Irena Koprinska, and Matthew Honnibal. 2012. A sequence labelling approach to quote attribution. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 790–799.

Silvia Pareti. 2012. A database of attribution relations. In Proceedings of the Eight International Conference on Language Resources and Evaluation, pages 3213–3217.

Bruno Pouliquen, Ralf Steinberger, and Clive Best. 2007. Automatic detection of quotations in multilingual news. In Proceedings of Recent Advances in Natural Language Processing, pages 487–492.

Rashmi Prasad, Nikhil Dinesh, Alan Lee, Aravind Joshi, and Bonnie Webber. 2006. Annotating attribution in the Penn Discourse TreeBank. In Proceedings of the Workshop on Sentiment and Subjectivity in Text, pages 31–38.

Rashmi Prasad, Eleni Miltsakaki, Nikhil Dinesh, Alan Lee, Aravind Joshi, Livio Robaldo, and Bonnie Webber. 2008. The Penn Discourse TreeBank 2.0 annotation manual. In Technical report, University of Pennsylvania: Institute for Research in Cognitive Science.

Luis Sarmento and Sergio Nunes. 2009. Automatic extraction of quotes and topics from news feeds. In 4th Doctoral Symposium on Informatics Engineering.

Roser Saurí and James Pustejovsky. 2009. Factbank: A corpus annotated with event factuality. In Language Resources and Evaluation, pages 227–268.

Nathan Schneider, Rebecca Hwa, Philip Gianfortoni, Dipanjan Das, Michael Heilman, Alan W. Black, Frederik L. Crabbe, and Noah A. Smith. 2010. Visualizing topical quotations over time to understand news discourse. Technical report, Carnegie Mellon University.

Karin K. Schuler. 2005. Verbnet: A Broad-Coverage, Comprehensive Verb Lexicon. Ph.D. thesis, Faculties of Computer and Information Science of the University of Pennsylvania.

Peter R. Skadhauge and Daniel Hardt. 2005. Syntactic identification of attribution in the RST treebank. In Proceedings of the Sixth International Workshop on Linguistically Interpreted Corpora.

Janyce Wiebe and Ellen Riloff. 2005. Creating subjective and objective sentence classifiers from unannotated texts. In Computational Linguistics and Intelligent Text Processing, pages 486–497. Springer.

Jason Zhang, Alan Black, and Richard Sproat. 2003. Identifying speakers in children’s stories for speech synthesis. In Proceedings of EUROSPEECH, pages 2041–2044.