9 emnlp-2012-A Sequence Labelling Approach to Quote Attribution

Source: pdf

Author: Timothy O'Keefe ; Silvia Pareti ; James R. Curran ; Irena Koprinska ; Matthew Honnibal

Abstract: Quote extraction and attribution is the task of automatically extracting quotes from text and attributing each quote to its correct speaker. The present state-of-the-art system uses gold standard information from previous decisions in its features, which, when removed, results in a large drop in performance. We treat the problem as a sequence labelling task, which allows us to incorporate sequence features without using gold standard information. We present results on two new corpora and an augmented version of a third, achieving a new state-of-the-art for systems using only realistic features.

reference text

James R. Curran and Stephen Clark. 2003. Investigating GIS and smoothing for maximum entropy taggers. In Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics, pages 91–98.

Peter T. Davis, David K. Elson, and Judith L. Klavans. 2003. Methods for precise named entity matching in digital collections. In Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital libraries, pages 125–127.

Eric de La Clergerie, Benoit Sagot, Rosa Stern, Pascal Denis, Gaelle Recource, and Victor Mignot. 2011. Extracting and visualizing quotations from news wires. Human Language Technology. Challenges for Computer Science and Linguistics, pages 522–532.

David K. Elson and Kathleen R. McKeown. 2010. Automatic attribution of quoted speech in literary narrative. In Proceedings of AAAI, pages 1013–1019.

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874.

Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pages 363–370.

Kevin Glass and Shaun Bangay. 2007. A naive salience-based method for speaker identification in fiction books. In Proceedings of the 18th Annual Symposium of the Pattern Recognition Association of South Africa (PRASA07), pages 1–6.

Ben Hachey, Will Radford, Joel Nothman, Matthew Honnibal, and James R. Curran. 2012. Evaluating entity linking with Wikipedia. Artificial Intelligence. (in press).

John Lafferty, Andrew McCallum, and Fernando C.N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. International Conference on Machine Learning, pages 282–289.

Nuno Mamede and Pedro Chaleira. 2004. Character identification in children stories. Advances in Natural Language Processing, pages 82–90.

Naoaki Okazaki. 2007. CRFsuite: a fast implementation of Conditional Random Fields (CRFs). URL http://www.chokkan.org/software/crfsuite/.

Silvia Pareti. 2012. A database of attribution relations. In Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC’12), pages 3213–3217.

Bruno Pouliquen, Ralf Steinberger, and Clive Best. 2007. Automatic detection of quotations in multilingual news. In Proceedings of Recent Advances in Natural Language Processing, pages 487–492.

Benoît Sagot, Laurence Danlos, and Rosa Stern. 2010. A lexicon of French quotation verbs for automatic quotation extraction. In 7th international conference on Language Resources and Evaluation - LREC 2010.

Luis Sarmento and Sergio Nunes. 2009. Automatic extraction of quotes and topics from news feeds. In 4th Doctoral Symposium on Informatics Engineering.

Nathan Schneider, Rebecca Hwa, Philip Gianfortoni, Dipanjan Das, Michael Heilman, Alan W. Black, Frederik L. Crabbe, and Noah A. Smith. 2010. Visualizing topical quotations over time to understand news discourse. Technical Report CMU-LTI-01-013, Carnegie Mellon University.

Ralph Weischedel and Ada Brunstein. 2005. BBN pronoun coreference and entity type corpus. Linguistic Data Consortium, Philadelphia.

Jason Zhang, Alan Black, and Richard Sproat. 2003. Identifying speakers in children’s stories for speech synthesis. In Proceedings of EUROSPEECH, pages 2041–2044.