acl acl2011 acl2011-337 acl2011-337-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Oliver Ferschke; Torsten Zesch; Iryna Gurevych
Abstract: We present an open-source toolkit that makes it possible (i) to reconstruct past states of Wikipedia and (ii) to efficiently access the edit history of Wikipedia articles. Reconstructing past states of Wikipedia is a prerequisite for reproducing previous experimental work based on Wikipedia. Beyond that, the edit history of Wikipedia articles has been shown to be a valuable knowledge source for NLP, but access is severely impeded by the lack of efficient tools for managing the huge amount of provided data. By using a dedicated storage format, our toolkit reduces the data volume to less than 2% of the original size while providing an easy-to-use interface to the revision data. The language-independent design makes it possible to process any language represented in Wikipedia. We expect this work to consolidate NLP research using Wikipedia in general, and to foster research making use of the knowledge encoded in Wikipedia’s edit history.
Si-Chi Chin, W. Nick Street, Padmini Srinivasan, and David Eichmann. 2010. Detecting Wikipedia vandalism with active learning and statistical language models. In Proceedings of the 4th Workshop on Information Credibility, WICOW ’10, pages 3–10.
Kenneth W. Church and Robert L. Mercer. 1993. Introduction to the special issue on computational linguistics using large corpora. Computational Linguistics, 19(1):1–24.
Olena Medelyan, David Milne, Catherine Legg, and Ian H. Witten. 2009. Mining meaning from Wikipedia. Int. J. Hum.-Comput. Stud., 67:716–754, September.
D. Milne and I. H. Witten. 2009. An open-source toolkit for mining Wikipedia. In Proc. New Zealand Computer Science Research Student Conf., volume 9.
Rani Nelken and Elif Yamangil. 2008. Mining Wikipedia’s article revision history for training computational linguistics algorithms. In Proceedings of the AAAI Workshop on Wikipedia and Artificial Intelligence: An Evolving Synergy (WikiAI), WikiAI08.
Elif Yamangil and Rani Nelken. 2008. Mining Wikipedia revision histories for improving sentence compression. In Proceedings of ACL-08: HLT, Short Papers, pages 137–140, Columbus, Ohio, June. Association for Computational Linguistics.
Mark Yatskar, Bo Pang, Cristian Danescu-Niculescu-Mizil, and Lillian Lee. 2010. For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT ’10, pages 365–368.
Fabio Massimo Zanzotto and Marco Pennacchiotti. 2010. Expanding textual entailment corpora from Wikipedia using co-training. In Proceedings of the COLING-Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources.
Honglei Zeng, Maher Alhossaini, Li Ding, Richard Fikes, and Deborah L. McGuinness. 2006. Computing trust from revision history. In Proceedings of the 2006 International Conference on Privacy, Security and Trust.
Torsten Zesch and Iryna Gurevych. 2010. The more the better? Assessing the influence of Wikipedia’s growth on semantic relatedness measures. In Proceedings of the Conference on Language Resources and Evaluation (LREC), Valletta, Malta.
Torsten Zesch, Christof Mueller, and Iryna Gurevych. 2008. Extracting lexical semantic knowledge from Wikipedia and Wiktionary. In Proceedings of the Conference on Language Resources and Evaluation (LREC).