293 acl-2013-Random Walk Factoid Annotation for Collective Discourse

Author: Ben King ; Rahul Jha ; Dragomir Radev ; Robert Mankoff

Abstract: In this paper, we study the problem of automatically annotating the factoids present in collective discourse. Factoids are information units that are shared between instances of collective discourse and may have many different ways of being realized in words. Our approach divides this problem into two steps, using a graph-based approach for each step: (1) factoid discovery, finding groups of words that correspond to the same factoid, and (2) factoid assignment, using these groups of words to mark collective discourse units that contain the respective factoids. We study this on two novel data sets: the New Yorker caption contest data set, and the crossword clues data set.

reference text

David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022.
Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008.
Dipanjan Das and Slav Petrov. 2011. Unsupervised part-of-speech tagging with bilingual graph-based projections. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 600–609.
Ahmed Hassan and Dragomir Radev. 2010. Identifying text polarity using random walks. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 395–403. Association for Computational Linguistics.
Leonhard Hennig, Ernesto William De Luca, and Sahin Albayrak. 2010. Learning summary content units with topic modeling. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, COLING ’10, pages 391–399, Stroudsburg, PA, USA. Association for Computational Linguistics.
Andrew Kachites McCallum. 2002. Mallet: A machine learning for language toolkit.
Ani Nenkova and Rebecca Passonneau. 2004. Evaluating content selection in summarization: The pyramid method.
Vahed Qazvinian and Dragomir R Radev. 2008. Scientific paper summarization using citation summary networks. In Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1, pages 689–696. Association for Computational Linguistics.
Vahed Qazvinian and Dragomir R Radev. 2011. Learning from collective human behavior to introduce diversity in lexical choice. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1098–1108.
Kristina Toutanova, Dan Klein, Christopher D Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1, pages 173–180. Association for Computational Linguistics.
Hans Van Halteren and Simone Teufel. 2003. Examining the consensus between human summaries: initial experiments with factoid analysis. In Proceedings of the HLT-NAACL 03 on Text summarization workshop-Volume 5, pages 57–64. Association for Computational Linguistics.