
306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates


Author: Tiziano Flati ; Roberto Navigli

Abstract: We present SPred, a novel method for the creation of large repositories of semantic predicates. We start from existing collocations to form lexical predicates (e.g., break ∗) and learn the semantic classes that best fit the ∗ argument. To do this, we extract all the occurrences in Wikipedia which match the predicate and abstract its arguments to general semantic classes (e.g., break BODY PART, break AGREEMENT, etc.). Our experiments show that we are able to create a large collection of semantic predicates from the Oxford Advanced Learner’s Dictionary with high precision and recall, and perform well against the most similar approach.
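
The idea in the abstract (match a lexical predicate such as break ∗ against text, then abstract each filler of ∗ to a semantic class) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `harvest` function, the regular expression, the determiner list, and the `TOY_CLASSES` lookup table are all assumptions made up for illustration, and a real system would work over Wikipedia with a much richer class inventory.

```python
import re
from collections import Counter

# Hypothetical toy mapping from argument fillers to semantic classes.
# The paper derives classes from Wikipedia; this table is only an illustration.
TOY_CLASSES = {
    "leg": "BODY PART",
    "arm": "BODY PART",
    "contract": "AGREEMENT",
    "treaty": "AGREEMENT",
}


def harvest(predicate_verb, sentences, classes=TOY_CLASSES):
    """Count the semantic classes that fill the * slot of '<verb> *'."""
    # Match the verb, an optional determiner, and one word (the filler).
    pattern = re.compile(
        rf"\b{re.escape(predicate_verb)}\s+(?:(?:the|a|an|his|her|their)\s+)?(\w+)",
        re.IGNORECASE,
    )
    counts = Counter()
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            filler = match.group(1).lower()
            semantic_class = classes.get(filler)
            # Fillers without a known class are skipped in this sketch.
            if semantic_class is not None:
                counts[semantic_class] += 1
    return counts


if __name__ == "__main__":
    examples = [
        "He did not want to break his leg.",
        "They break the treaty.",
        "Do not break a contract.",
        "She will break an arm.",
        "We break the window.",
    ]
    for semantic_class, count in harvest("break", examples).most_common():
        print(f"break {semantic_class}: {count}")
```

The sketch only handles the base form of the verb and a single-word filler; inflected forms, multi-word arguments, and ranking of classes are left out on purpose.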

reference text

Jonathan Berant, Ido Dagan, and Jacob Goldberger. 2012. Learning entailment relations by global graph structure optimization. Computational Linguistics, 38(1):73–111.
Shane Bergsma, Dekang Lin, and Randy Goebel. 2008. Discriminative learning of selectional preference from unlabeled text. In Proc. of EMNLP, pages 59–68, Stroudsburg, PA, USA.
Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann. 2009. DBpedia - a crystallization point for the Web of Data. Web Semantics, 7(3):154–165.
Gerlof Bouma. 2010. Collocation Extraction beyond the Independence Assumption. In Proc. of ACL, Short Papers, pages 109–114, Uppsala, Sweden.
Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka, and Tom M. Mitchell. 2010. Toward an architecture for never-ending language learning. In Proc. of AAAI, pages 1306–1313, Atlanta, Georgia.
Nathanael Chambers and Dan Jurafsky. 2010. Improving the use of pseudo-words for evaluating selectional preferences. In Proc. of ACL, pages 445–453, Stroudsburg, PA, USA.
Tim Chklovski and Patrick Pantel. 2004. VerbOcean: Mining the Web for fine-grained semantic verb relations. In Proc. of EMNLP, pages 33–40, Barcelona, Spain.
Jennifer Chu-Carroll and John Prager. 2007. An experimental study of the impact of information extraction accuracy on semantic search performance. In Proc. of CIKM, pages 505–514, Lisbon, Portugal.
Massimiliano Ciaramita and Yasemin Altun. 2006. Broad-Coverage Sense Disambiguation and Information Extraction with a Supersense Sequence Tagger. In Proc. of EMNLP, pages 594–602, Sydney, Australia.
Stephen Clark and David Weir. 2002. Class-based probability estimation using a semantic hierarchy. Computational Linguistics, 28(2):187–206.
Jonathan Crowther, editor. 1998. Oxford Advanced Learner’s Dictionary. Cornelsen & Oxford, 5th edition.
Flavio De Benedictis, Stefano Faralli, and Roberto Navigli. 2013. GlossBoot: Bootstrapping multilingual domain glossaries from the Web. In Proc. of ACL, Sofia, Bulgaria.
Gerard de Melo and Gerhard Weikum. 2010. MENTA: Inducing Multilingual Taxonomies from Wikipedia. In Proc. of CIKM, pages 1099–1108, New York, NY, USA.
Katrin Erk and Diana McCarthy. 2009. Graded word sense assignment. In Proc. of EMNLP, pages 440–449, Stroudsburg, PA, USA.
Katrin Erk. 2007. A Simple, Similarity-based Model for Selectional Preferences. In Proc. of ACL, pages 216–223, Prague, Czech Republic.
Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. 2005. Unsupervised named-entity extraction from the web: an experimental study. Artificial Intelligence, 165(1):91–134.
Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying Relations for Open Information Extraction. In Proc. of EMNLP, pages 1535–1545, Edinburgh, UK.
Stefano Faralli and Roberto Navigli. 2013. A Java framework for multilingual definition and hypernym extraction. In Proc. of ACL, Comp. Volume, Sofia, Bulgaria.
Christiane Fellbaum, editor. 1998. WordNet: An Electronic Database. MIT Press, Cambridge, MA.
David A. Ferrucci, Eric W. Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya Kalyanpur, Adam Lally, J. William Murdock, Eric Nyberg, John M. Prager, Nico Schlaefer, and Christopher A. Welty. 2010. Building Watson: an overview of the DeepQA project. AI Magazine, 31(3):59–79.
Hagen Fürstenau and Mirella Lapata. 2012. Semi-supervised semantic role labeling via structural alignment. Computational Linguistics, 38(1):135–171.
Roxana Girju, Adriana Badulescu, and Dan Moldovan. 2003. Learning semantic constraints for the automatic discovery of part-whole relations. In Proc. of HLT-NAACL, pages 1–8, Edmonton, Canada.
Rebecca Green, Bonnie J. Dorr, and Philip Resnik. 2004. Inducing Frame Semantic Verb Classes from WordNet and LDOCE. In Proc. of ACL, pages 375–382, Barcelona, Spain.
Patrick Hanks. 2013. Lexical Analysis: Norms and Exploitations. University Press Group Limited.
Marti A. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. In Proc. of COLING, pages 539–545, Nantes, France.
Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, and Gerhard Weikum. 2013. Yago2: A spatially and temporally enhanced knowledge base from wikipedia. Artificial Intelligence, 194:28–61.
Eduard H. Hovy, Roberto Navigli, and Simone Paolo Ponzetto. 2013. Collaboratively built semi-structured content and artificial intelligence: The story so far. Artificial Intelligence, 194:2–27.
Ruihong Huang and Ellen Riloff. 2010. Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing. In Proc. of ACL, pages 275–285, Uppsala, Sweden.
Sean P. Igo and Ellen Riloff. 2009. Corpus-based semantic lexicon induction with Web-based corroboration. In Proc. of UMSLLS, pages 18–26, Stroudsburg, PA, USA.
Rubén Izquierdo, Armando Suárez, and German Rigau. 2009. An Empirical Study on Class-Based Word Sense Disambiguation. In Proc. of EACL, pages 389–397, Athens, Greece.
Hyeju Jang and Jack Mostow. 2012. Inferring selectional preferences from part-of-speech n-grams. In Proc. of EACL, pages 377–386, Stroudsburg, PA, USA.
Boris Katz, Jimmy J. Lin, Daniel Loreto, Wesley Hildebrandt, Matthew W. Bilotti, Sue Felshin, Aaron Fernandes, Gregory Marton, and Federico Mora. 2003. Integrating Web-based and Corpus-based Techniques for Question Answering. In Proc. of TREC, pages 426–435, Gaithersburg, Maryland.
Zornitsa Kozareva and Eduard Hovy. 2010. Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns. In Proc. of ACL, pages 1482–1491, Uppsala, Sweden.
Zornitsa Kozareva, Ellen Riloff, and Eduard H. Hovy. 2008. Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs. In Proc. ACL/HLT, pages 1048–1056, Columbus, Ohio.
Sebastian Krause, Hong Li, Hans Uszkoreit, and Feiyu Xu. 2012. Large-scale learning of relation-extraction rules with distant supervision from the web. In Proc. of ISWC 2012, Part I, pages 263–278, Boston, MA.
Hang Li and Naoki Abe. 1998. Generalizing case frames using a thesaurus and the MDL principle. Computational Linguistics, 24(2):217–244.
Rada Mihalcea and Dan Moldovan. eXtended WordNet: Progress report. In Proceedings of the NAACL01 Workshop on WordNet and Other Lexical Resources, Pittsburgh, Penn.
Thahir Mohamed, Estevam Hruschka, and Tom Mitchell. 2011. Discovering Relations between Noun Categories. In Proc. of EMNLP, pages 1447–1455, Edinburgh, Scotland, UK.
Andrea Moro and Roberto Navigli. 2012. WiSeNet: Building a Wikipedia-based semantic network with ontologized relations. In Proc. of CIKM, pages 1672–1676, Maui, HI, USA.
Andrea Moro and Roberto Navigli. 2013. Integrating Syntactic and Semantic Analysis into the Open Information Extraction Paradigm. In Proc. of IJCAI, Beijing, China.
Vivi Nastase and Michael Strube. 2013. Transforming wikipedia into a large scale multilingual concept network. Artificial Intelligence, 194:62–85.
Roberto Navigli and Simone Paolo Ponzetto. 2012. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193:217–250.
Roberto Navigli and Paola Velardi. 2010. Learning Word-Class Lattices for Definition and Hypernym Extraction. In Proc. of ACL, pages 1318–1327, Uppsala, Sweden.
Roberto Navigli. 2009. Word Sense Disambiguation: A survey. ACM Computing Surveys, 41(2):1–69.
Patrick Pantel, Rahul Bhagat, Timothy Chklovski, and Eduard Hovy. 2007. ISP: learning inferential selectional preferences. In Proc. of NAACL, pages 564–571, Rochester, NY.
Marius Pasca. 2004. Acquisition of categorized named entities for web search. In Proc. of CIKM, pages 137–145, New York, NY, USA.
Marco Pennacchiotti and Patrick Pantel. 2006. Ontologizing semantic relations. In Proc. of COLING, pages 793–800, Sydney, Australia.
Simone Paolo Ponzetto and Michael Strube. 2011. Taxonomy induction based on a collaboratively built knowledge repository. Artificial Intelligence, 175(9-10):1737–1756.
Philip Resnik. 1996. Selectional constraints: An information-theoretic model and its computational realization. Cognition, 61(1):127–159.
Alan Ritter, Mausam, and Oren Etzioni. 2010. A latent dirichlet allocation method for selectional preferences. In Proc. of ACL, pages 424–434, Uppsala, Sweden. ACL.
Mats Rooth, Stefan Riezler, Detlef Prescher, Glenn Carroll, and Franz Beil. 1999. Inducing a semantically annotated lexicon via EM-based clustering. In Proc. of ACL, pages 104–111, Stroudsburg, PA, USA.
Diarmuid Ó Séaghdha. 2010. Latent variable models of selectional preference. In Proc. of ACL, pages 435–444, Uppsala, Sweden.
Rion Snow, Daniel Jurafsky, and Andrew Y. Ng. 2004. Learning Syntactic Patterns for Automatic Hypernym Discovery. In NIPS, pages 1297–1304, Cambridge, Mass.
Asher Stern and Ido Dagan. 2012. Biutee: A modular open-source system for recognizing textual entailment. In Proc. of ACL 2012, System Demonstrations, pages 73–78, Jeju Island, Korea.
Mihai Surdeanu, Sanda Harabagiu, John Williams, and Paul Aarseth. 2003. Using predicate-argument structures for information extraction. In Proc. ACL, pages 8–15, Stroudsburg, PA, USA.
M. Thelen and E. Riloff. 2002. A Bootstrapping Method for Learning Semantic Lexicons using Extraction Pattern Contexts. In Proc. of EMNLP, pages 214–221, Salt Lake City, UT, USA.
Paola Velardi, Stefano Faralli, and Roberto Navigli. 2013. OntoLearn Reloaded: A graph-based algorithm for taxonomy induction. Computational Linguistics, 39(3).
Yorick Wilks. 1975. A preferential, pattern-seeking, semantics for natural language inference. Artificial Intelligence, 6(1):53–74.
Fei Wu and Daniel S. Weld. 2010. Open Information Extraction Using Wikipedia. In Proc. of ACL, pages 118–127, Uppsala, Sweden.
Akane Yakushiji, Yusuke Miyao, Tomoko Ohta, Yuka Tateisi, and Jun’ichi Tsujii. 2006. Automatic construction of predicate-argument structure patterns for biomedical information extraction. In Proc. of EMNLP, pages 284–292, Stroudsburg, PA, USA.
David Yarowsky. 1995. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. In Proc. of ACL, pages 189–196, Cambridge, MA, USA.
Alexander Yates and Oren Etzioni. 2009. Unsupervised methods for determining object and relation synonyms on the web. Journal of Artificial Intelligence Research, 34(1):255.
Alexander Yates, Michael Cafarella, Michele Banko, Oren Etzioni, Matthew Broadhead, and Stephen Soderland. 2007. TextRunner: open information extraction on the web. In Proc. of NAACL-Demonstrations, pages 25–26, Stroudsburg, PA, USA.