
12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity


Source: pdf

Author: Eduardo Blanco ; Dan Moldovan

Abstract: This paper presents a novel approach to determine textual similarity. A layered methodology to transform text into logic forms is proposed, and semantic features are derived from a logic prover. Experimental results show that incorporating the semantic structure of sentences is beneficial. When training data is unavailable, scores obtained from the logic prover in an unsupervised manner outperform supervised methods.


reference text

Eneko Agirre, Daniel Cer, Mona Diab, and Aitor Gonzalez-Agirre. 2012. Semeval-2012 task 6: A pilot on semantic textual similarity. In Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 385–393, Montréal, Canada, 7-8 June.
Collin F. Baker, Charles J. Fillmore, and John B. Lowe. 1998. The Berkeley FrameNet Project. In Proceedings of the 17th international conference on Computational Linguistics, Montreal, Canada.
Carmen Banea, Samer Hassan, Michael Mohler, and Rada Mihalcea. 2012. Unt: A supervised synergistic approach to semantic text similarity. In Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 635–642, Montréal, Canada, 7-8 June.
Daniel Bär, Chris Biemann, Iryna Gurevych, and Torsten Zesch. 2012. Ukp: Computing semantic textual similarity by combining multiple content similarity measures. In Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 435–440, Montréal, Canada, 7-8 June.
Johan Bos and Katja Markert. 2006. Recognising textual entailment with robust logical inference. In Proceedings of the First international conference on Machine Learning Challenges: evaluating Predictive Uncertainty, Visual Object Classification, and Recognizing Textual Entailment, MLCW’05, pages 404–426, Berlin, Heidelberg. Springer-Verlag.
Johan Bos, Stephen Clark, Mark Steedman, James R. Curran, and Julia Hockenmaier. 2004. Wide-coverage semantic representations from a ccg parser. In Proceedings of Coling 2004, pages 1240–1246, Geneva, Switzerland, Aug 23–Aug 27. COLING.
William B. Dolan and Chris Brockett. 2005. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing (IWP2005). Association for Computational Linguistics.
Danilo Giampiccolo, Bernardo Magnini, Ido Dagan, and Bill Dolan. 2007. The third pascal recognizing textual entailment challenge. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, pages 1–9, Prague, June. Association for Computational Linguistics.
Roxana Girju, Preslav Nakov, Vivi Nastase, Stan Szpakowicz, Peter Turney, and Deniz Yuret. 2007. SemEval-2007 Task 04: Classification of Semantic Relations between Nominals. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 13–18, Prague, Czech Republic, June. Association for Computational Linguistics.
Demetrios Glinos. 2012. Ata-sem: Chunk-based determination of semantic text similarity. In Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 547–551, Montréal, Canada, 7-8 June.
Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. 2009. The weka data mining software: an update. SIGKDD Explor. Newsl., 11(1):10–18.
Vasileios Hatzivassiloglou, Judith L. Klavans, and Eleazar Eskin. 1999. Detecting text similarity over short passages: exploring linguistic feature combinations via machine learning. In Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 203–212.
Iris Hendrickx, Su Nam Kim, Zornitsa Kozareva, Preslav Nakov, Diarmuid Ó Séaghdha, Sebastian Padó, Marco Pennacchiotti, Lorenza Romano, and Stan Szpakowicz. 2010. Semeval-2010 task 8: Multi-way classification of semantic relations between pairs of nominals. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 33–38, Uppsala, Sweden, July. Association for Computational Linguistics.
Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel. 2006. OntoNotes: the 90% Solution. In NAACL ’06: Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers on XX, pages 57–60, Morristown, NJ, USA. Association for Computational Linguistics.
J.J. Jiang and D.W. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proc. of the Int’l. Conf. on Research in Computational Linguistics.
C. Leacock and M. Chodorow, 1998. Combining local context and WordNet similarity for word sense identification, pages 305–332. In C. Fellbaum (Ed.), MIT Press.
Michael Lesk. 1986. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In Proceedings of the 5th annual international conference on Systems documentation, SIGDOC ’86, pages 24–26, New York, NY, USA. ACM.
Yuri Lin, Jean-Baptiste Michel, Erez Aiden Lieberman, Jon Orwant, Will Brockman, and Slav Petrov. 2012. Syntactic annotations for the google books ngram corpus. In Proceedings of the ACL 2012 System Demonstrations, pages 169–174, Jeju Island, Korea, July. Association for Computational Linguistics.
Dekang Lin. 1998. An information-theoretic definition of similarity. In Proceedings of the Fifteenth International Conference on Machine Learning, ICML ’98, pages 296–304, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Nitin Madnani and Bonnie J. Dorr. 2010. Generating phrasal and sentential paraphrases: A survey of data-driven methods. Comput. Linguist., 36(3):341–387, September.
William McCune and Larry Wos. 1997. Otter: The cade-13 competition incarnations. Journal of Automated Reasoning, 18:211–220.
Adam Meyers, Ruth Reeves, Catherine Macleod, Rachel Szekely, Veronika Zielinska, Brian Young, and Ralph Grishman. 2004. Annotating noun argument structure for nombank. In LREC. European Language Resources Association.
Rada Mihalcea, Courtney Corley, and Carlo Strapparava. 2006. Corpus-based and knowledge-based measures of text semantic similarity. In Proceedings of the 21st national conference on Artificial intelligence, AAAI’06, pages 775–780. AAAI Press.
George A. Miller. 1995. WordNet: A Lexical Database for English. In Communications of the ACM, volume 38, pages 39–41.
Michael Mohler, Razvan Bunescu, and Rada Mihalcea. 2011. Learning to grade short answer questions using semantic similarity measures and dependency graph alignments. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 752–762, Portland, Oregon, USA, June. Association for Computational Linguistics.
Dan Moldovan and Eduardo Blanco. 2012. Polaris: Lymba’s semantic parser. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), pages 66–72, Istanbul, Turkey, May. European Language Resources Association (ELRA). ACL Anthology Identifier: L12-1040.
D. Moldovan, S. Harabagiu, R. Girju, P. Morarescu, F. Lacatusu, A. Novischi, A. Badulescu, and O. Bolohan. 2002. Lcc tools for question answering. In Voorhees and Buckland, editors, Proceedings of the 11th Text REtrieval Conference (TREC-2002), NIST, Gaithersburg.
Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics, 31(1):71–106.
Hoifung Poon and Pedro Domingos. 2009. Unsupervised Semantic Parsing. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 1–10, Singapore, August. Association for Computational Linguistics.
James Pustejovsky and Marc Verhagen. 2009. SemEval-2010 Task 13: Evaluating Events, Time Expressions, and Temporal Relations (TempEval-2). In Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009), pages 112–116, Boulder, Colorado, June. Association for Computational Linguistics.
Ross J. Quinlan. 1992. Learning with continuous classes. In 5th Australian Joint Conference on Artificial Intelligence, pages 343–348, Singapore. World Scientific.
Rajat Raina, Andrew Y. Ng, and Christopher D. Manning. 2005. Robust textual inference via learning and abductive reasoning. In Proceedings of the 20th national conference on Artificial intelligence - Volume 3, AAAI’05, pages 1099–1105. AAAI Press.
Philip Resnik. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1, IJCAI’95, pages 448–453, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Miguel Rios, Wilker Aziz, and Lucia Specia. 2012. Uow: Semantically informed text similarity. In Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 673–678, Montréal, Canada, 7-8 June.
Evan Sandhaus. 2008. The new york times annotated corpus. In Linguistic Data Consortium, Philadelphia, PA.
Marta Tatu and Dan Moldovan. 2005. A semantic approach to recognizing textual entailment. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT ’05, pages 371–378, Stroudsburg, PA, USA. Association for Computational Linguistics.
Frane Šarić, Goran Glavaš, Mladen Karan, Jan Šnajder, and Bojana Dalbelo Bašić. 2012. Takelab: Systems for measuring semantic text similarity. In Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 441–448, Montréal, Canada, 7-8 June.
Y. Wang and I. H. Witten. 1997. Induction of model trees for predicting continuous classes. In Poster papers of the 9th European Conference on Machine Learning. Springer.
Zhibiao Wu and Martha Palmer. 1994. Verbs semantics and lexical selection. In Proceedings of the 32nd annual meeting on Association for Computational Linguistics, ACL ’94, pages 133–138, Stroudsburg, PA, USA. Association for Computational Linguistics.
Luke Zettlemoyer and Michael Collins. 2005. Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI-05), pages 658–666, Arlington, Virginia. AUAI Press.