
285 acl-2011-Simple supervised document geolocation with geodesic grids


Source: pdf

Author: Benjamin Wing ; Jason Baldridge

Abstract: We investigate automatic geolocation (i.e. identification of the location, expressed as latitude/longitude coordinates) of documents. Geolocation can be an effective means of summarizing large document collections and it is an important component of geographic information retrieval. We describe several simple supervised methods for document geolocation using only the document’s raw text as evidence. All of our methods predict locations in the context of geodesic grids of varying degrees of resolution. We evaluate the methods on geotagged Wikipedia articles and Twitter feeds. For Wikipedia, our best method obtains a median prediction error of just 11.8 kilometers. Twitter geolocation is more challenging: we obtain a median error of 479 km, an improvement on previous results for the dataset.
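The two quantities the abstract relies on, a grid cell for a coordinate and the median distance between predicted and true locations, can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a regular grid whose cells span a fixed number of degrees, a prediction reported as the cell's center point, and a haversine great-circle distance on a sphere of radius 6371 km. All function names and the `degrees_per_cell` parameter are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def cell_of(lat, lon, degrees_per_cell=1.0):
    """Map a latitude/longitude pair to the (row, col) index of a grid cell.

    Assumes a regular grid whose origin is at (-90, -180).
    """
    row = math.floor((lat + 90.0) / degrees_per_cell)
    col = math.floor((lon + 180.0) / degrees_per_cell)
    return row, col


def cell_center(cell, degrees_per_cell=1.0):
    """Return the center coordinate of a cell, used here as the predicted location."""
    row, col = cell
    lat = -90.0 + (row + 0.5) * degrees_per_cell
    lon = -180.0 + (col + 0.5) * degrees_per_cell
    return lat, lon


def great_circle_km(a, b):
    """Haversine distance in kilometers between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def median_error_km(predicted, gold):
    """Median great-circle distance between paired predicted and gold coordinates."""
    return median(great_circle_km(p, g) for p, g in zip(predicted, gold))
```

With a coarser `degrees_per_cell`, each cell pools more training text but the cell center can lie farther from the true location, which is the resolution trade-off the abstract refers to.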


reference text

Geoffrey Andogah. 2010. Geographically Constrained Information Retrieval. Ph.D. thesis, University of Groningen, Groningen, Netherlands, May.

Lars Backstrom, Eric Sun, and Cameron Marlow. 2010. Find me if you can: improving geographical prediction with social and spatial proximity. In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 61–70, New York, NY, USA. ACM.

Kino Coursey and Rada Mihalcea. 2009. Topic identification using wikipedia graph centrality. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, NAACL ’09, pages 117–120, Morristown, NJ, USA. Association for Computational Linguistics.

Kino Coursey, Rada Mihalcea, and William Moen. 2009. Using encyclopedic knowledge for automatic topic identification. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning, CoNLL ’09, pages 210–218, Morristown, NJ, USA. Association for Computational Linguistics.

Silviu Cucerzan. 2007. Large-scale named entity disambiguation based on Wikipedia data. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 708–716, Prague, Czech Republic, June. Association for Computational Linguistics.

Junyan Ding, Luis Gravano, and Narayanan Shivakumar. 2000. Computing geographical scopes of web resources. In Proceedings of the 26th International Conference on Very Large Data Bases, VLDB ’00, pages 545–556, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

G. Dutton. 1996. Encoding and handling geospatial data with hierarchical triangular meshes. In M.J. Kraak and M. Molenaar, editors, Advances in GIS Research II, pages 505–518, London. Taylor and Francis.

Jacob Eisenstein, Brendan O’Connor, Noah A. Smith, and Eric P. Xing. 2010. A latent variable model for geographic lexical variation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1277–1287, Cambridge, MA, October. Association for Computational Linguistics.

Qiang Hao, Rui Cai, Changhu Wang, Rong Xiao, JiangMing Yang, Yanwei Pang, and Lei Zhang. 2010. Equip tourists with knowledge mined from travelogues. In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 401–410, New York, NY, USA. ACM.

Jochen L. Leidner. 2008. Toponym Resolution in Text: Annotation, Evaluation and Applications of Spatial Grounding of Place Names. Dissertation.Com, January.

M. D. Lieberman and J. Lin. 2009. You are where you edit: Locating Wikipedia users through edit histories. In ICWSM’09: Proceedings of the 3rd International AAAI Conference on Weblogs and Social Media, pages 106–113, San Jose, CA, May.

Bruno Martins. 2009. Geographically Aware Web Text Mining. Ph.D. thesis, University of Lisbon.

Rada Mihalcea and Andras Csomai. 2007. Wikify!: linking documents to encyclopedic knowledge. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, CIKM ’07, pages 233–242, New York, NY, USA. ACM.

Rada Mihalcea. 2007. Using Wikipedia for Automatic Word Sense Disambiguation. In North American Chapter of the Association for Computational Linguistics (NAACL 2007).

Simon Overell. 2009. Geographic Information Retrieval: Classification, Disambiguation and Modelling. Ph.D. thesis, Imperial College London.

Jay M. Ponte and W. Bruce Croft. 1998. A language modeling approach to information retrieval. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’98, pages 275–281, New York, NY, USA. ACM.

Erik Rauch, Michael Bukatin, and Kenneth Baker. 2003. A confidence-based framework for disambiguating geographic terms. In Proceedings of the HLT-NAACL 2003 workshop on Analysis of geographic references - Volume 1, HLT-NAACL-GEOREF ’03, pages 50–54, Stroudsburg, PA, USA. Association for Computational Linguistics.

Bjorn Sandvik. 2008. Using KML for thematic mapping. Master’s thesis, The University of Edinburgh.

Pavel Serdyukov, Vanessa Murdock, and Roelof van Zwol. 2009. Placing flickr photos on a map. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’09, pages 484–491, New York, NY, USA. ACM.

David A. Smith and Gregory Crane. 2001. Disambiguating geographic names in a historical digital library. In Proceedings of the 5th European Conference on Research and Advanced Technology for Digital Libraries, ECDL ’01, pages 127–136, London, UK. Springer-Verlag.

B. E. Teitler, M. D. Lieberman, D. Panozzo, J. Sankaranarayanan, H. Samet, and J. Sperling. 2008. NewsStand: A new view on news. In GIS’08: Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 144–153, Irvine, CA, November.

Chengxiang Zhai and John Lafferty. 2001. Model-based feedback in the language modeling approach to information retrieval. In Proceedings of the tenth international conference on Information and knowledge management, CIKM ’01, pages 403–410, New York, NY, USA. ACM.