
340 acl-2013-Text-Driven Toponym Resolution using Indirect Supervision


Source: pdf

Author: Michael Speriosu ; Jason Baldridge

Abstract: Toponym resolvers identify the specific locations referred to by ambiguous placenames in text. Most resolvers are based on heuristics using spatial relationships between multiple toponyms in a document, or metadata such as population. This paper shows that text-driven disambiguation for toponyms is far more effective. We exploit document-level geotags to indirectly generate training instances for text classifiers for toponym resolution, and show that textual cues can be straightforwardly integrated with other commonly used ones. Results are given for both 19th century texts pertaining to the American Civil War and 20th century newswire articles.
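The indirect supervision idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy gazetteer, the nearest-candidate labeling rule, the whitespace tokenizer, and the context window size are all assumptions made for illustration. Each toponym found in a geotagged document is labeled with the gazetteer candidate closest to the document's geotag, and the surrounding words become features for a text classifier.

```python
import math
from collections import Counter

# Hypothetical toy gazetteer: toponym -> candidate locations (label, lat, lon).
GAZETTEER = {
    "springfield": [
        ("Springfield, IL", 39.80, -89.64),
        ("Springfield, MA", 42.10, -72.59),
    ],
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def make_instances(text, doc_lat, doc_lon, window=3):
    """Turn one geotagged document into (toponym, context features, label) triples.

    Assumed labeling rule: the candidate nearest the document geotag is
    treated as the correct referent, so no toponym-level annotation is needed.
    """
    tokens = text.lower().split()
    instances = []
    for i, tok in enumerate(tokens):
        candidates = GAZETTEER.get(tok)
        if not candidates:
            continue  # not a known toponym
        nearest = min(
            candidates,
            key=lambda c: haversine_km(doc_lat, doc_lon, c[1], c[2]),
        )
        # Bag of words from up to `window` tokens on each side of the toponym.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        instances.append((tok, Counter(context), nearest[0]))
    return instances


# A document geotagged near Boston labels "springfield" as the Massachusetts one.
instances = make_instances(
    "the regiment marched from springfield toward boston harbor", 42.36, -71.06
)
```

The resulting triples could be fed to any standard text classifier, one per ambiguous toponym; how the classifier is trained and combined with spatial or population cues is left open here.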


reference text

B. Adams and G. McKenzie. Inferring thematic places from spatially referenced natural language descriptions. Crowdsourcing Geographic Knowledge, pages 201–221, 2013.
E. Amitay, N. Har’El, R. Sivan, and A. Soffer. Web-a-Where: geotagging web content. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, pages 273–280, 2004.
D. Buscaldi and P. Rosso. A conceptual density-based approach for the disambiguation of toponyms. International Journal of Geographical Information Science, 22(3):301–313, 2008.
P. Clough. Extracting metadata for spatially-aware information retrieval on the internet. In Proceedings of the 2005 workshop on Geographic information retrieval, pages 25–30. ACM, 2005.
G. Crane. The Perseus Digital Library, 2000. URL http://www.perseus.tufts.edu.
J. Ding, L. Gravano, and N. Shivakumar. Computing geographical scopes of web resources. In Proceedings of the 26th International Conference on Very Large Data Bases, pages 545–556, 2000.
J. Eisenstein, B. O’Connor, N. Smith, and E. Xing. A latent variable model for geographic lexical variation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1277–1287, 2010.
J. Eisenstein, A. Ahmed, and E. Xing. Sparse additive generative models of text. In Proceedings of the 28th International Conference on Machine Learning, pages 1041–1048, 2011.
J. Gelernter and N. Mushegian. Geo-parsing messages from microtext. Transactions in GIS, 15(6):753–773, 2011.
C. Grover, R. Tobin, K. Byrne, M. Woollard, J. Reid, S. Dunn, and J. Ball. Use of the Edinburgh geoparser for georeferencing digitized historical collections. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 368(1925):3875–3889, 2010.
J. Guldi. The spatial turn. Spatial Humanities: a Project of the Institute for Enabling, 2009.
Q. Hao, R. Cai, C. Wang, R. Xiao, J. Yang, Y. Pang, and L. Zhang. Equip tourists with knowledge mined from travelogues. In Proceedings of the 19th international conference on World wide web, pages 401–410, 2010.
B. Hecht, S. Carton, M. Quaderi, J. Schöning, M. Raubal, D. Gergle, and D. Downey. Explanatory semantic relatedness and explicit spatialization for exploratory search. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, pages 415–424. ACM, 2012.
L. Hill. Georeferencing: The Geographic Associations of Information. MIT Press, 2006.
J. Hoffart, M. Yosef, I. Bordino, H. Fürstenau, M. Pinkal, M. Spaniol, B. Taneva, S. Thater, and G. Weikum. Robust disambiguation of named entities in text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 782–792. Association for Computational Linguistics, 2011.
L. Hollenstein and R. Purves. Exploring place through user-generated content: Using Flickr tags to describe city cores. Journal of Spatial Information Science, (1):21–48, 2012.
S. Intagorn and K. Lerman. A probabilistic approach to mining geospatial knowledge from social annotations. In Conference on Information and Knowledge Management (CIKM), 2012.
C. Jones, R. Purves, P. Clough, and H. Joho. Modelling vague places with knowledge from the web. International Journal of Geographical Information Science, 2008.
S. Kulkarni, A. Singh, G. Ramakrishnan, and S. Chakrabarti. Collective annotation of Wikipedia entities in web text. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 457–466. ACM, 2009.
S. Ladra, M. Luaces, O. Pedreira, and D. Seco. A toponym resolution service following the OGC WPS standard. In Web and Wireless Geographical Information Systems, volume 5373, pages 75–85. 2008.
J. Leidner. Toponym resolution in text: Annotation, Evaluation and Applications of Spatial Grounding of Place Names. Universal Press, Boca Raton, FL, USA, 2008.
H. Li, R. Srihari, C. Niu, and W. Li. InfoXtract location normalization: a hybrid approach to geographic references in information extraction. In Proceedings of the HLT-NAACL 2003 workshop on Analysis of geographic references - Volume 1, pages 39–44, 2003.
Y. Li. Probabilistic toponym resolution and geographic indexing and querying. Master’s thesis, The University of Melbourne, Melbourne, Australia, 2007.
V. Loureiro, I. Anastácio, and B. Martins. Learning to resolve geographical and temporal references in text. In Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 349–352, 2011.
M. Louwerse and N. Benesh. Representing spatial structure through maps and language: Lord of the Rings encodes the spatial structure of Middle Earth. Cognitive Science, 36(8):1556–1569, 2012.
I. Mani, C. Doran, D. Harris, J. Hitzeman, R. Quimby, J. Richer, B. Wellner, S. Mardis, and S. Clancy. SpatialML: annotation scheme, resources, and evaluation. Language Resources and Evaluation, 44(3):263–280, 2010.
S. Overell. Geographic Information Retrieval: Classification, Disambiguation and Modelling. PhD thesis, Imperial College London, 2009.
S. Overell and S. Rüger. Using co-occurrence models for placename disambiguation. International Journal of Geographical Information Science, 22:265–287, 2008.
Y. Pang, Q. Hao, Y. Yuan, T. Hu, R. Cai, and L. Zhang. Summarizing tourist destinations by mining user-generated travelogues and photos. Computer Vision and Image Understanding, 115(3):352–363, 2011.
V. Petras. Statistical analysis of geographic and language clues in the MARC record. Technical report, The University of California at Berkeley, 2004.
T. Qin, R. Xiao, L. Fang, X. Xie, and L. Zhang. An efficient location extraction algorithm by leveraging web contextual information. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 53–60. ACM, 2010.
E. Rauch, M. Bukatin, and K. Baker. A confidence-based framework for disambiguating geographic terms. In Proceedings of the HLT-NAACL 2003 workshop on Analysis of geographic references - Volume 1, pages 50–54, 2003.
K. Roberts, C. Bejan, and S. Harabagiu. Toponym disambiguation using events. In Proceedings of the 23rd International Florida Artificial Intelligence Research Society Conference, pages 271–276, 2010.
S. Roller, M. Speriosu, S. Rallapalli, B. Wing, and J. Baldridge. Supervised text-based geolocation using language models on an adaptive grid. In Proceedings of EMNLP 2012, 2012.
J. Sankaranarayanan, H. Samet, B. Teitler, M. Lieberman, and J. Sperling. TwitterStand: news in tweets. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 42–51, 2009.
W. Scheidel, E. Meeks, and J. Weiland. ORBIS: The Stanford geospatial network model of the Roman world. 2012.
A. Skupin and A. Esperbé. An alternative map of the United States based on an n-dimensional model of geographic space. Journal of Visual Languages & Computing, 22(4):290–304, 2011.
D. Smith and G. Crane. Disambiguating geographic names in a historical digital library. In Proceedings of the 5th European Conference on Research and Advanced Technology for Digital Libraries, pages 127–136, 2001.
D. Smith and G. Mann. Bootstrapping toponym classifiers. In Proceedings of the HLT-NAACL 2003 workshop on Analysis of geographic references - Volume 1, pages 45–49, 2003.
B. Teitler, M. Lieberman, D. Panozzo, J. Sankaranarayanan, H. Samet, and J. Sperling. NewsStand: a new view on news. In Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems, page 18. ACM, 2008.
R. Volz, J. Kleb, and W. Mueller. Towards ontology-based disambiguation of geographical identifiers. In Proceedings of the 16th International Conference on World Wide Web, 2007.
B. Wing and J. Baldridge. Simple supervised document geolocation with geodesic grids. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 955–964, 2011.
Q. Zhang, P. Jin, S. Lin, and L. Yue. Extracting focused locations for web pages. In Web-Age Information Management, volume 7142, pages 76–89. 2012.
W. Zong, D. Wu, A. Sun, E. Lim, and D. Goh. On assigning place names to geography related web pages. In Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries, pages 354–362, 2005.