
356 acl-2013-Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia


Source: pdf

Author: Zhigang Wang ; Zhixing Li ; Juanzi Li ; Jie Tang ; Jeff Z. Pan

Abstract: Wikipedia infoboxes are a valuable source of structured knowledge for global knowledge sharing. However, infobox information is very incomplete and imbalanced among the Wikipedias in different languages. It is a promising but challenging problem to utilize the rich structured knowledge from a source language Wikipedia to help complete the missing infoboxes for a target language. In this paper, we formulate the problem of cross-lingual knowledge extraction from multilingual Wikipedia sources, and present a novel framework, called WikiCiKE, to solve this problem. An instance-based transfer learning method is utilized to overcome the problems of topic drift and translation errors. Our experimental results demonstrate that WikiCiKE outperforms the monolingual knowledge extraction method and the translation-based method.


reference text

S. Fissaha Adafre and M. de Rijke. 2006. Finding Similar Sentences across Multiple Languages in Wikipedia. EACL 2006 Workshop on New Text: Wikis and Blogs and Other Dynamic Text Sources.
Sisay Fissaha Adafre and Maarten de Rijke. 2005. Discovering Missing Links in Wikipedia. Proceedings of the 3rd International Workshop on Link Discovery.
Eytan Adar, Michael Skinner and Daniel S. Weld. 2009. Information Arbitrage across Multi-lingual Wikipedia. WSDM'09.
David Aumueller, Hong Hai Do, Sabine Massmann and Erhard Rahm. 2005. Schema and ontology matching with COMA++. SIGMOD Conference'05.
Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak and Sebastian Hellmann. 2009. DBpedia - A Crystallization Point for the Web of Data. J. Web Sem.
Christian Bizer, Tom Heath, Kingsley Idehen and Tim Berners-Lee. 2008. Linked data on the web (LDOW2008). WWW'08.
Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge and Jamie Taylor. 2008. Freebase: a Collaboratively Created Graph Database for Structuring Human Knowledge. SIGMOD'08.
Gosse Bouma, Geert Kloosterman, Jori Mur, Gertjan Van Noord, Lonneke Van Der Plas and Jörg Tiedemann. 2008. Question Answering with Joost at CLEF 2007. Working Notes for the CLEF 2008 Workshop.
Gosse Bouma, Sergio Duarte and Zahurul Islam. 2009. Cross-lingual Alignment and Completion of Wikipedia Templates. CLIAWS3'09.
Wenyuan Dai, Qiang Yang, Gui-Rong Xue and Yong Yu. 2007. Boosting for Transfer Learning. ICML'07.
Thomas G. Dietterich. 1998. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Comput.
Sergio Ferrández, Antonio Toral, Óscar Ferrández, Antonio Ferrández and Rafael Muñoz. 2009. Exploiting Wikipedia and EuroWordNet to Solve Cross-Lingual Question Answering. Inf. Sci.
Aidan Finn and Nicholas Kushmerick. 2004. Multi-level Boundary Classification for Information Extraction. ECML.
Achille Fokoue, Felipe Meneguzzi, Murat Sensoy and Jeff Z. Pan. 2012. Querying Linked Ontological Data through Distributed Summarization. Proc. of the 26th AAAI Conference on Artificial Intelligence (AAAI2012).
Yoav Freund and Robert E. Schapire. 1997. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci.
Norman Heino and Jeff Z. Pan. 2012. RDFS Reasoning on Massively Parallel Hardware. Proc. of the 11th International Semantic Web Conference (ISWC2012).
Aidan Hogan, Jeff Z. Pan, Axel Polleres and Yuan Ren. 2011. Scalable OWL 2 Reasoning for Linked Data. Reasoning Web. Semantic Technologies for the Web of Data.
Andreas Hotho, Robert Jäschke, Christoph Schmitz and Gerd Stumme. 2006. Information Retrieval in Folksonomies: Search and Ranking. ESWC'06.
John D. Lafferty, Andrew McCallum and Fernando C. N. Pereira. 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML'01.
Alberto Lavelli, Mary Elaine Califf, Fabio Ciravegna, Dayne Freitag, Claudio Giuliano, Nicholas Kushmerick, Lorenza Romano and Neil Ireson. 2008. Evaluation of Machine Learning-based Information Extraction Algorithms: Criticisms and Recommendations. Language Resources and Evaluation.
Juanzi Li, Jie Tang, Yi Li and Qiong Luo. 2009. RiMOM: A Dynamic Multistrategy Ontology Alignment Framework. IEEE Trans. Knowl. Data Eng.
Xiao Ling, Gui-Rong Xue, Wenyuan Dai, Yun Jiang, Qiang Yang and Yong Yu. 2008. Can Chinese Web Pages be Classified with English Data Source? WWW'08.
Sheila A. McIlraith, Tran Cao Son and Honglei Zeng. 2001. Semantic Web Services. IEEE Intelligent Systems.
Thanh Hoang Nguyen, Viviane Moreira, Huong Nguyen, Hoa Nguyen and Juliana Freire. 2011. Multilingual Schema Matching for Wikipedia Infoboxes. CoRR.
Jeff Z. Pan and Edward Thomas. 2007. Approximating OWL-DL Ontologies. 22nd AAAI Conference on Artificial Intelligence (AAAI-07).
Jeff Z. Pan and Ian Horrocks. 2007. RDFS(FA): Connecting RDF(S) and OWL DL. IEEE Transactions on Knowledge and Data Engineering. 19(2): 192-206.
Jeff Z. Pan and Ian Horrocks. 2006. OWL-Eu: Adding Customised Datatypes into OWL. Journal of Web Semantics.
Sinno Jialin Pan and Qiang Yang. 2010. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng.
Nick Roussopoulos, Stephen Kelley and Frédéric Vincent. 1995. Nearest Neighbor Queries. SIGMOD Conference'95.
Murat Sensoy, Achille Fokoue, Jeff Z. Pan, Timothy Norman, Yuqing Tang, Nir Oren and Katia Sycara. 2013. Reasoning about Uncertain Information and Conflict Resolution through Trust Revision. Proc. of the 12th International Conference on Autonomous Agents and Multiagent Systems (AAMAS2013).
Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum. 2007. Yago: a Core of Semantic Knowledge. WWW'07.
Max Völkel, Markus Krötzsch, Denny Vrandečić, Heiko Haller and Rudi Studer. 2006. Semantic Wikipedia. WWW'06.
Zhichun Wang, Juanzi Li, Zhigang Wang and Jie Tang. 2012. Cross-lingual Knowledge Linking across Wiki Knowledge Bases. 21st International World Wide Web Conference.
Daniel S. Weld, Fei Wu, Eytan Adar, Saleema Amershi, James Fogarty, Raphael Hoffmann, Kayur Patel and Michael Skinner. 2008. Intelligence in Wikipedia. AAAI'08.
Fei Wu and Daniel S. Weld. 2007. Autonomously Semantifying Wikipedia. CIKM'07.
Fei Wu and Daniel S. Weld. 2010. Open Information Extraction Using Wikipedia. ACL'10.
Fei Wu, Raphael Hoffmann and Daniel S. Weld. 2008. Information Extraction from Wikipedia: Moving down the Long Tail. KDD'08.
Wentao Wu, Hongsong Li, Haixun Wang and Kenny Qili Zhu. 2012. Probase: a Probabilistic Taxonomy for Text Understanding. SIGMOD Conference'12.
Alexander Yates, Michael Cafarella, Michele Banko, Oren Etzioni, Matthew Broadhead and Stephen Soderland. 2007. TextRunner: Open Information Extraction on the Web. NAACL-Demonstrations'07.
Xinfeng Zhang, Xiaozhao Xu, Yiheng Cai and Yaowei Liu. 2009. A Weighted Hyper-Sphere SVM. ICNC(3)'09.