16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering

Source: pdf

Authors: Michael Roth; Anette Frank

Abstract: Generating coherent discourse is an important aspect of natural language generation. Our aim is to learn factors that constitute coherent discourse from data, with a focus on how to realize predicate-argument structures in a model that goes beyond the sentence level. We present an important subtask for this overall goal, in which we align predicates across comparable texts, admitting partial argument structure correspondence. The contribution of this work is twofold. First, we construct a large corpus resource of comparable texts, including an evaluation set with manual predicate alignments. Second, we present a novel approach for aligning predicates across comparable texts using graph-based clustering with Mincuts. Our method significantly outperforms other alignment techniques when applied to this novel alignment task, by a margin of at least 6.5 percentage points in F1-score.
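The abstract names graph-based clustering with Mincuts but does not spell out the procedure. As a rough, hypothetical illustration (not the authors' implementation), the sketch below treats the predicates of two comparable texts as nodes, uses made-up similarity scores as edge weights, drops edges under an assumed threshold (`min_weight`), and recursively splits each connected component along its global minimum cut until no component holds more than one predicate from the same text; that stopping rule is also an assumption. The min cut is computed with the Stoer-Wagner algorithm, chosen here only for brevity; a max-flow method such as Goldberg and Tarjan (1986), which the reference list cites, would be another way to obtain cuts.

```python
"""Hypothetical sketch: align predicates of two texts by recursive min-cut clustering."""


def build_graph(scores, min_weight):
    """Symmetric adjacency dict; edges below min_weight (an assumed threshold) are dropped."""
    adj = {}
    for (a, b), weight in scores.items():
        adj.setdefault(a, {})
        adj.setdefault(b, {})
        if weight >= min_weight:
            adj[a][b] = weight
            adj[b][a] = weight
    return adj


def components(nodes, adj):
    """Connected components of the subgraph induced by `nodes`."""
    seen, comps = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def min_cut(nodes, adj):
    """Global minimum cut (Stoer-Wagner) of the subgraph induced by `nodes` (>= 2 nodes)."""
    g = {u: {v: w for v, w in adj[u].items() if v in nodes} for u in sorted(nodes)}
    merged = {u: {u} for u in g}  # original nodes contracted into each super-node
    best_weight, best_side = float("inf"), set()
    while len(g) > 1:
        # Maximum-adjacency ordering: repeatedly add the most tightly connected node.
        start = next(iter(g))
        order = [start]
        conn = {v: 0.0 for v in g if v != start}
        for v, w in g[start].items():
            conn[v] += w
        phase_weight = 0.0
        while conn:
            nxt = max(conn, key=conn.get)
            phase_weight = conn.pop(nxt)
            order.append(nxt)
            for v, w in g[nxt].items():
                if v in conn:
                    conn[v] += w
        s, t = order[-2], order[-1]
        # "Cut of the phase": separates t's contracted nodes from all other nodes.
        if phase_weight < best_weight:
            best_weight, best_side = phase_weight, set(merged[t])
        # Contract t into s, summing parallel edge weights.
        merged[s] |= merged.pop(t)
        for v, w in g.pop(t).items():
            if v != s:
                g[s][v] = g[s].get(v, 0.0) + w
                g[v][s] = g[v].get(s, 0.0) + w
                del g[v][t]
        g[s].pop(t, None)
    return best_weight, best_side


def cluster(adj):
    """Split along min cuts until each cluster has at most one predicate per text (assumed rule)."""
    final, todo = [], components(set(adj), adj)
    while todo:
        comp = todo.pop()
        texts = [text for text, _ in comp]
        if len(texts) == len(set(texts)):
            final.append(comp)
            continue
        _, side = min_cut(comp, adj)
        todo.extend(components(side, adj))
        todo.extend(components(comp - side, adj))
    return final


def align(scores, min_weight=0.15):
    """Return (predicate in text A, predicate in text B) pairs from two-node clusters."""
    pairs = []
    for c in cluster(build_graph(scores, min_weight)):
        if len(c) == 2:
            (_, a), (_, b) = sorted(c)
            pairs.append((a, b))
    return sorted(pairs)


# Toy similarity scores between predicates of text "A" and text "B" (made-up numbers).
SCORES = {
    (("A", "acquire"), ("B", "buy")): 0.9,
    (("A", "acquire"), ("B", "say")): 0.1,
    (("A", "announce"), ("B", "say")): 0.8,
    (("A", "announce"), ("B", "buy")): 0.2,
    (("A", "pay"), ("B", "spend")): 0.7,
    (("A", "pay"), ("B", "buy")): 0.3,
}

if __name__ == "__main__":
    print(align(SCORES))
```

With these toy scores the sketch prints `[('acquire', 'buy'), ('announce', 'say'), ('pay', 'spend')]`: the weakest remaining links (announce/buy at 0.2, then pay/buy at 0.3) are cut first, while the 0.1 edge never enters the graph because of the threshold.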

reference text

Eneko Agirre, Daniel Cer, Mona Diab, and Aitor Gonzalez-Agirre. 2012. SemEval-2012 Task 6: A pilot on semantic textual similarity. In Proceedings of the 6th International Workshop on Semantic Evaluations, Montreal, Canada, June. To appear.

Regina Barzilay and Mirella Lapata. 2008. Modeling local coherence: An entity-based approach. Computational Linguistics, 34(1):1–34.

Regina Barzilay and Lillian Lee. 2004. Catching the drift: Probabilistic content models, with applications to generation and summarization. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Boston, Mass., 2–7 May 2004, pages 113–120.

Anja Belz, Eric Kow, Jette Viethen, and Albert Gatt. 2009. The GREC Main Subject Reference Generation Challenge 2009: Overview and evaluation results. In Proceedings of the 2009 Workshop on Language Generation and Summarisation, pages 79–87.

Luisa Bentivogli, Ido Dagan, Hoa Trang Dang, Danilo Giampiccolo, and Bernardo Magnini. 2009. The fifth PASCAL recognizing textual entailment challenge. In Proceedings of TAC.

Anders Björkelund, Bernd Bohnet, Love Hafdell, and Pierre Nugues. 2010. A high-performance syntactic and semantic dependency parser. In Coling 2010: Demonstration Volume, pages 33–36, Beijing, China, August. Coling 2010 Organizing Committee.

Bernd Bohnet. 2010. Top accuracy and fast dependency parsing is not a contradiction. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 89–97, Beijing, China, August. Coling 2010 Organizing Committee.

Chris Brockett. 2007. Aligning the RTE 2006 Corpus. Microsoft Research.

Peter F. Brown, Vincent J. Della Pietra, Stephan A. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19:263–311.

Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20:37–46.

Paul R. Cohen. 1995. Empirical methods for artificial intelligence. MIT Press, Cambridge, MA, USA.

Trevor Cohn, Chris Callison-Burch, and Mirella Lapata. 2008. Constructing Corpora for Development and Evaluation of Paraphrase Systems. 34(4).

Ido Dagan, Oren Glickman, and Bernardo Magnini. 2006. The PASCAL recognising textual entailment challenge. In J. Quiñonero-Candela, I. Dagan, and B. Magnini, editors, Machine Learning Challenges, pages 177–190. Springer, Heidelberg, Germany.

William B. Dolan and Chris Brockett. 2005. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing.

Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, Mass.

Andrew V. Goldberg and Robert E. Tarjan. 1986. A new approach to the maximum flow problem. In Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing, pages 136–146, New York, NY, USA.

Barbara J. Grosz, Aravind K. Joshi, and Scott Weinstein. 1995. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics, 21(2):203–225.

Weiwei Guo and Mona Diab. 2011. Semantic topic models: Combining word distributional statistics and dictionary definitions. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 552–561, July.

Shudong Huang, David Graff, and George Doddington. 2002. Multiple-Translation Chinese Corpus. Linguistic Data Consortium, Philadelphia.

Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. 2008. A Large-scale Classification of English Verbs. 42(1):21–40.

Thomas K. Landauer and Susan T. Dumais. 1997. A solution to Plato’s problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104:211–240.

Percy Liang, Benjamin Taskar, and Dan Klein. 2006. Alignment by agreement. In North American Association for Computational Linguistics (NAACL), pages 104–111.

Dekang Lin. 1998. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning, Madison, Wisc., 24–27 July 1998, pages 296–304.

Bill MacCartney, Michael Galley, and Christopher D. Manning. 2008. A phrase-based alignment model for natural language inference. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Waikiki, Honolulu, Hawaii, 25–27 October 2008.

Adam Meyers, Ruth Reeves, and Catherine Macleod. 2008. NomBank v1.0. Linguistic Data Consortium, Philadelphia.

Shachar Mirkin, Jonathan Berant, Ido Dagan, and Eyal Shnarch. 2010a. Recognising entailment within discourse. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), Beijing, China, August. Coling 2010 Organizing Committee.

Shachar Mirkin, Ido Dagan, and Sebastian Padó. 2010b. Assessing the role of discourse references in entailment inference. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, 11–16 July 2010.

Jeff Mitchell and Mirella Lapata. 2010. Composition in Distributional Models of Semantics. 34(8):1388–1429.

Franz Josef Och and Hermann Ney. 2003. A Systematic Comparison of Various Statistical Alignment Models. 29(1):19–51.

Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–105.

Robert Parker, David Graff, Jumbo Kong, Ke Chen, and Kazuaki Maeda. 2011. English Gigaword Fifth Edition. Linguistic Data Consortium, Philadelphia.

Ted Pedersen, Siddharth Patwardhan, and Jason Michelizzi. 2004. WordNet::Similarity: Measuring the relatedness of concepts. In Companion Volume to the Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Boston, Mass., 2–7 May 2004, pages 267–270.

Tom Richens. 2008. Anomalies in the WordNet verb hierarchy. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 729–736. Association for Computational Linguistics.

Michael Roth and Anette Frank. 2012. Aligning predicate argument structures in monolingual comparable texts: A new corpus for a new task. In Proceedings of the First Joint Conference on Lexical and Computational Semantics, Montreal, Canada, June.

Josef Ruppenhofer, Caroline Sporleder, Roser Morante, Collin Baker, and Martha Palmer. 2010. SemEval-2010 Task 10: Linking Events and Their Participants in Discourse. In Proceedings of the 5th International Workshop on Semantic Evaluations, pages 45–50, Uppsala, Sweden, July.

Richard Socher, Eric H. Huang, Jeffrey Pennington, Andrew Y. Ng, and Christopher D. Manning. 2011. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In Advances in Neural Information Processing Systems (NIPS 2011).

Ivan Titov and Mikhail Kozhevnikov. 2010. Bootstrapping semantic analyzers from non-contradictory texts. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, 11–16 July 2010, pages 958–967.

Christopher Walker, Stephanie Strassel, Julie Medero, and Kazuaki Maeda. 2006. ACE 2005 Multilingual Training Corpus. Linguistic Data Consortium, Philadelphia.

Stephen Wan, Mark Dras, Robert Dale, and Cecile Paris. 2006. Using dependency-based features to take the "Para-farce" out of paraphrase. In Proceedings of the Australasian Language Technology Workshop, pages 131–138.

Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, and Ann Houston. 2011. OntoNotes Release 4.0. Linguistic Data Consortium, Philadelphia.

Sander Wubben, Antal van den Bosch, Emiel Krahmer, and Erwin Marsi. 2009. Clustering and matching headlines for automatic paraphrase acquisition. In Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009), pages 122–125, Athens, Greece, March. Association for Computational Linguistics.