99 cvpr-2013-Cross-View Image Geolocalization

Source: pdf

Author: Tsung-Yi Lin, Serge Belongie, James Hays

Abstract: The recent availability of large amounts of geotagged imagery has inspired a number of data-driven solutions to the image geolocalization problem. Existing approaches predict the location of a query image by matching it to a database of georeferenced photographs. While there are many geotagged images available on photo sharing and street view sites, most are clustered around landmarks and urban areas. The vast majority of the Earth's land area has no ground-level reference photos available, which limits the applicability of all existing image geolocalization methods. On the other hand, there is no shortage of visual and geographic data that densely covers the Earth (we examine overhead imagery and land cover survey data), but the relationship between this data and ground-level query photographs is complex. In this paper, we introduce a cross-view feature translation approach to greatly extend the reach of image geolocalization methods. We can often localize a query even if it has no corresponding ground-level images in the database. A key idea is to learn the relationship between ground-level appearance and overhead appearance and land cover attributes from sparsely available geotagged ground-level images. We perform experiments over a 1600 km² region containing a variety of scenes and land cover types. For each query, our algorithm produces a probability density over the region of interest.
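The abstract does not spell out how the translation is computed, so the following is only a minimal sketch of one way such a pipeline could look, assuming a nearest-neighbor translation: the overhead features of the geotagged training photos most similar to the query are averaged, and every map tile is then scored against that prediction. All names here (`translate_to_overhead`, `location_density`, `bandwidth`, the toy feature vectors and tile ids) are hypothetical and are not taken from the paper.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def translate_to_overhead(query_ground, train_pairs, k=2):
    """Predict an overhead feature for a ground-level query (assumed kNN scheme).

    train_pairs is a list of (ground_feature, overhead_feature) tuples from
    geotagged training photos. The overhead features of the k training photos
    whose ground features are closest to the query are averaged.
    """
    nearest = sorted(train_pairs, key=lambda p: sq_dist(query_ground, p[0]))[:k]
    dim = len(nearest[0][1])
    return [sum(p[1][i] for p in nearest) / len(nearest) for i in range(dim)]


def location_density(query_ground, train_pairs, tiles, k=2, bandwidth=1.0):
    """Return one normalized score per map tile; the scores sum to 1.

    tiles maps a tile id to that tile's overhead feature vector. A Gaussian
    kernel on feature distance is an illustrative choice, not the paper's.
    """
    predicted = translate_to_overhead(query_ground, train_pairs, k)
    weights = {
        tile_id: math.exp(-sq_dist(predicted, feat) / (2 * bandwidth ** 2))
        for tile_id, feat in tiles.items()
    }
    total = sum(weights.values())
    return {tile_id: w / total for tile_id, w in weights.items()}


# Toy data: two similar training photos over one kind of terrain, one over another.
train_pairs = [
    ([0.0, 0.0], [1.0, 0.0]),
    ([0.1, 0.0], [0.9, 0.1]),
    ([1.0, 1.0], [0.0, 1.0]),
]
tiles = {"tile_a": [1.0, 0.0], "tile_b": [0.0, 1.0], "tile_c": [0.5, 0.5]}

density = location_density([0.05, 0.0], train_pairs, tiles)
best_tile = max(density, key=density.get)
print(best_tile, density)
```

Note that the query itself never needs a matching ground-level photo at its true location; it only needs training photos that look similar somewhere in the region, which is the property the abstract emphasizes.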

reference text

[1] S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and R. Szeliski. Building Rome in a day. In ICCV, 2009. 2

[2] G. Baatz, O. Saurer, K. Köser, and M. Pollefeys. Large scale visual geo-localization of images in mountainous terrain. In ECCV, 2012. 2, 7

[3] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2011. 5

[4] D. Chen, G. Baatz, K. Köser, S. Tsai, R. Vedantham, T. Pylvanainen, K. Roimela, X. Chen, J. Bach, M. Pollefeys, B. Girod, and R. Grzeszczuk. City-scale landmark identification on mobile devices. In CVPR, 2011. 2

[5] D. J. Crandall, L. Backstrom, D. Huttenlocher, and J. Kleinberg. Mapping the world's photos. In WWW, 2009. 2

[6] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005. 3

[7] D. R. Hardoon, S. R. Szedmak, and J. R. Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural Comput., 16(12):2639–2664, Dec. 2004. 2

[8] J. Hays. Large Scale Scene Matching for Graphics and Vision. PhD thesis, Carnegie Mellon University, 2009. 2

[9] J. Hays and A. Efros. IM2GPS: estimating geographic information from a single image. In CVPR, 2008. 1, 2, 3

[10] A. Irschara, C. Zach, J.-M. Frahm, and H. Bischof. From structure-from-motion point clouds to fast location recognition. In CVPR, 2009. 2

[11] N. Jacobs, S. Satkin, N. Roman, R. Speyer, and R. Pless. Geolocating static cameras. In ICCV, Oct. 2007. 2

[12] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006. 3

[13] X. Li, C. Wu, C. Zach, S. Lazebnik, and J. Frahm. Modeling and recognition of landmark image collections using iconic scene graphs. In ECCV, 2008. 2

[14] Y. Li, N. Snavely, D. Huttenlocher, and P. Fua. Worldwide pose estimation using 3D point clouds. In ECCV, 2012. 2

[15] Y. Li, N. Snavely, and D. P. Huttenlocher. Location recognition using prioritized feature matching. In ECCV, 2010. 2

[16] J. Liu, M. Shah, B. Kuipers, and S. Savarese. Cross-view action recognition via view knowledge transfer. In CVPR, 2011. 2

[17] A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV, 2001. 3

[18] V. Ordonez, G. Kulkarni, and T. L. Berg. Im2text: Describing images using 1 million captioned photographs. In NIPS, 2011. 2

[19] N. Rasiwasia, J. Costa Pereira, E. Coviello, G. Doyle, G. Lanckriet, R. Levy, and N. Vasconcelos. A new approach to cross-modal multimedia retrieval. In ACM International Conference on Multimedia, 2010. 2

[20] A. Roshan Zamir and M. Shah. Accurate image localization based on Google maps street view. In ECCV, 2010. 2

Figure 9: Left: input ground-level image (shown above) and corresponding satellite image and pie chart of attribute distribution (shown below, but not known at query time). Middle: similar (in green) and dissimilar (in red) ground-level and satellite image pairs used for training the SVM in our discriminative translation approach. Right: geolocation match score shown as a heat map. The ground truth location is marked with a black circle.

[21] T. Sattler, B. Leibe, and L. Kobbelt. Fast image-based localization using direct 2D-to-3D matching. In ICCV, 2011. 2

[22] G. Schindler, M. Brown, and R. Szeliski. City-scale location recognition. In CVPR, 2007. 2

[23] A. Sharma, A. Kumar, H. Daumé III, and D. W. Jacobs. Generalized multiview analysis: A discriminative latent space. In CVPR, 2012. 2

[24] E. Shechtman and M. Irani. Matching local self-similarities across images and videos. In CVPR, 2007. 3

[25] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In CVPR, 2010. 3

[26] H. Zhang, A. C. Berg, M. Maire, and J. Malik. SVM-KNN: Discriminative nearest neighbor classification for visual category recognition. In CVPR, 2006. 2

[27] Y. Zheng, M. Zhao, Y. Song, H. Adam, U. Buddemeier, A. Bissacco, F. Brucher, T. Chua, H. Neven, and J. Yagnik. Tour the world: building a web-scale landmark recognition engine. In CVPR, 2009. 2