
44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network


Source: pdf

Authors: Roberto Navigli; Simone Paolo Ponzetto

Abstract: In this paper we present BabelNet, a very large, wide-coverage multilingual semantic network. The resource is automatically constructed by means of a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition, Machine Translation is applied to enrich the resource with lexical information for all languages. We conduct experiments on new and existing gold-standard datasets to show the high quality and coverage of the resource.


reference text

Jordi Atserias, Luis Villarejo, German Rigau, Eneko Agirre, John Carroll, Bernardo Magnini, and Piek Vossen. 2004. The MEANING multilingual central repository. In Proc. of GWC-04, pages 80–210.
Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A nucleus for a web of open data. In Proceedings of 6th International Semantic Web Conference joint with 2nd Asian Semantic Web Conference (ISWC+ASWC 2007), pages 722–735.
Sagot Benoît and Darja Fišer. 2008. Building a free French WordNet from multilingual resources. In Proceedings of the Ontolex 2008 Workshop.
William Black, Sabri Elkateb, Horacio Rodriguez, Musa Alkhalifa, Piek Vossen, and Adam Pease. 2006. Introducing the Arabic WordNet project. In Proc. of GWC-06, pages 295–299.
Razvan Bunescu and Marius Paşca. 2006. Using encyclopedic knowledge for named entity disambiguation. In Proc. of EACL-06, pages 9–16.
Chris Callison-Burch. 2009. Fast, cheap, and creative: Evaluating translation quality using Amazon's Mechanical Turk. In Proc. of EMNLP-09, pages 286–295.
Jean Carletta. 1996. Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2):249–254.
Montse Cuadros and German Rigau. 2006. Quality assessment of large scale knowledge resources. In Proc. of EMNLP-06, pages 534–541.
Gerard de Melo and Gerhard Weikum. 2009. Towards a universal wordnet by learning from combined evidence. In Proc. of CIKM-09, pages 513–522.
Oren Etzioni, Kobi Reiter, Stephen Soderland, and Marcus Sammer. 2007. Lexical translation with application to image search on the Web. In Proceedings of Machine Translation Summit XI.
Christiane Fellbaum, editor. 1998. WordNet: An Electronic Database. MIT Press, Cambridge, MA.
Pascale Fung. 1995. A pattern matching method for finding noun and proper noun translations from noisy parallel corpora. In Proc. of ACL-95, pages 236–243.
Evgeniy Gabrilovich and Shaul Markovitch. 2006. Overcoming the brittleness bottleneck using Wikipedia: Enhancing text categorization with encyclopedic knowledge. In Proc. of AAAI-06, pages 1301–1306.
William A. Gale and Kenneth W. Church. 1993. A program for aligning sentences in bilingual corpora. Computational Linguistics, 19(1):75–102.
Jim Giles. 2005. Internet encyclopedias go head to head. Nature, 438:900–901.
Aria Haghighi, Percy Liang, Taylor Berg-Kirkpatrick, and Dan Klein. 2008. Learning bilingual lexicons from monolingual corpora. In Proc. of ACL-08, pages 771–779.
Sanda M. Harabagiu, Dan Moldovan, Marius Paşca, Rada Mihalcea, Mihai Surdeanu, Razvan Bunescu, Roxana Girju, Vasile Rus, and Paul Morarescu. 2000. FALCON: Boosting knowledge for answer engines. In Proc. of TREC-9, pages 479–488.
Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Comp. Vol. to Proc. of ACL-07, pages 177–180.
Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In Proceedings of Machine Translation Summit X.
Els Lefever and Veronique Hoste. 2009. SemEval-2010 Task 3: Cross-lingual Word Sense Disambiguation. In Proc. of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009), pages 82–87, Boulder, Colorado.
Lothar Lemnitzer and Claudia Kunze. 2002. GermaNet representation, visualization, application. In Proc. of LREC '02, pages 1485–1491.
Alessandro Lenci, Nuria Bel, Federica Busa, Nicoletta Calzolari, Elisabetta Gola, Monica Monachini, Antoine Ogonowski, Ivonne Peters, Wim Peters, Nilda Ruimy, Marta Villegas, and Antonio Zampolli. 2000. SIMPLE: A general framework for the development of multilingual lexicons. International Journal of Lexicography, 13(4):249–263.
Mausam, Stephen Soderland, Oren Etzioni, Daniel Weld, Michael Skinner, and Jeff Bilmes. 2009. Compiling a massive, multilingual dictionary via probabilistic inference. In Proc. of ACL-IJCNLP-09, pages 262–270.
Olena Medelyan, David Milne, Catherine Legg, and Ian H. Witten. 2009. Mining meaning from Wikipedia. Int. J. Hum.-Comput. Stud., 67(9):716–754.
George A. Miller, Claudia Leacock, Randee Tengi, and Ross Bunker. 1993. A semantic concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology, pages 303–308, Plainsboro, N.J.
Vivi Nastase. 2008. Topic-driven multi-document summarization with encyclopedic knowledge and activation spreading. In Proc. of EMNLP-08, pages 763–772.
Roberto Navigli and Mirella Lapata. 2010. An experimental study on graph connectivity for unsupervised Word Sense Disambiguation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(4):678–692.
Roberto Navigli. 2009a. Using cycles and quasi-cycles to disambiguate dictionary glosses. In Proc. of EACL-09, pages 594–602.
Roberto Navigli. 2009b. Word Sense Disambiguation: A survey. ACM Computing Surveys, 41(2):1–69.
Hwee Tou Ng and Hian Beng Lee. 1996. Integrating multiple knowledge sources to disambiguate word senses: An exemplar-based approach. In Proc. of ACL-96, pages 40–47.
Emanuele Pianta, Luisa Bentivogli, and Christian Girardi. 2002. MultiWordNet: Developing an aligned multilingual database. In Proc. of GWC-02, pages 21–25.
Simone Paolo Ponzetto and Roberto Navigli. 2009. Large-scale taxonomy mapping for restructuring and integrating Wikipedia. In Proc. of IJCAI-09, pages 2083–2088.
Simone Paolo Ponzetto and Roberto Navigli. 2010. Knowledge-rich Word Sense Disambiguation rivaling supervised systems. In Proc. of ACL-10.
Simone Paolo Ponzetto and Michael Strube. 2007. Deriving a large scale taxonomy from Wikipedia. In Proc. of AAAI-07, pages 1440–1445.
Nils Reiter, Matthias Hartung, and Anette Frank. 2008. A resource-poor approach for linking ontology classes to Wikipedia articles. In Johan Bos and Rodolfo Delmonte, editors, Semantics in Text Processing, volume 1 of Research in Computational Semantics, pages 381–387. College Publications, London, England.
Maria Ruiz-Casado, Enrique Alfonseca, and Pablo Castells. 2005. Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets. In Advances in Web Intelligence, volume 3528 of Lecture Notes in Computer Science. Springer Verlag.
Marcus Sammer and Stephen Soderland. 2007. Building a sense-distinguished multilingual lexicon from monolingual corpora and bilingual lexicons. In Proceedings of Machine Translation Summit XI.
Rion Snow, Dan Jurafsky, and Andrew Ng. 2006. Semantic taxonomy induction from heterogeneous evidence. In Proc. of COLING-ACL-06, pages 801–808.
Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2008. YAGO: A large ontology from Wikipedia and WordNet. Journal of Web Semantics, 6(3):203–217.
Dan Tufiş, Dan Cristea, and Sofia Stamou. 2004. BalkaNet: Aims, methods, results and perspectives. A general overview. Romanian Journal on Science and Technology of Information, 7(1-2):9–43.
Luis von Ahn. 2006. Games with a purpose. IEEE Computer, 39(6):92–94.
Piek Vossen, editor. 1998. EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Kluwer, Dordrecht, The Netherlands.
Fei Wu and Daniel Weld. 2007. Automatically semantifying Wikipedia. In Proc. of CIKM-07, pages 41–50.
David Yarowsky and Radu Florian. 2002. Evaluating sense disambiguation across diverse parameter spaces. Natural Language Engineering, 9(4):293–310.
Toshio Yokoi. 1995. The EDR electronic dictionary. Communications of the ACM, 38(11):42–44.