Discovering Details and Scene Structure with Hierarchical Iconoid Shift (ICCV 2013)

Author: Tobias Weyand, Bastian Leibe

Abstract: Current landmark recognition engines are typically aimed at recognizing building-scale landmarks, but miss interesting details like portals, statues or windows. This is because they use a flat clustering that summarizes all photos of a building facade in one cluster. We propose Hierarchical Iconoid Shift (HIS), a novel landmark clustering algorithm capable of discovering such details. Instead of just a collection of clusters, the output of HIS is a set of dendrograms describing the detail hierarchy of a landmark. HIS is based on the novel Hierarchical Medoid Shift (HMS) clustering algorithm, which performs a continuous mode search over the complete scale space. HMS is completely parameter-free, has the same complexity as Medoid Shift and is easy to parallelize. We evaluate HIS on 800k images of 34 landmarks and show that it can extract an often surprising amount of detail and structure that can be applied, e.g., to provide a mobile user with more detailed information on a landmark or even to extend the landmark’s Wikipedia article.

reference text

[1] Y. Avrithis, Y. Kalantidis, G. Tolias, and E. Spyrou. Retrieving Landmark and Non-Landmark Images from Community Photo Collections. In ACM Multimedia, 2010.

[2] F. Chin and D. Houck. Algorithms for Updating Minimal Spanning Trees. J. Computer and System Sciences, 16(3):333–344, 1978.

[3] D. Comaniciu and P. Meer. Mean Shift: A Robust Approach Toward Feature Space Analysis. PAMI, 24(5):603–619, 2002.

[4] D. Comaniciu, V. Ramesh, and P. Meer. The Variable Bandwidth Mean Shift and Data-Driven Scale Selection. In ICCV, 2001.

[5] D. Crandall, L. Backstrom, D. Huttenlocher, and J. Kleinberg. Mapping the World’s Photos. In WWW, 2009.

[6] D. DeMenthon. Spatio-Temporal Segmentation of Video by Hierarchical Mean Shift Analysis. In SMVP, 2002.

[7] B. Epshtein, E. Ofek, Y. Wexler, and P. Zhang. Hierarchical Photo Organization using Geo-Relevance. In ACM GIS, 2007.

[8] S. Gammeter, T. Quack, D. Tingdahl, and L. Van Gool. Size Does Matter: Improving Object Recognition and 3D Reconstruction with Cross-Media Analysis of Image Clusters. In ECCV, 2010.

[9] S. Gammeter, T. Quack, and L. Van Gool. I Know What You Did Last Summer: Object-Level Auto-Annotation of Holiday Snaps. In ICCV, 2009.

[10] A. Ladikos, E. Boyer, N. Navab, and S. Ilic. Region Graphs for Organizing Image Collections. In RMLE, 2010.

[11] Y. Leung, J.-S. Zhang, and Z.-B. Xu. Clustering by Scale-Space Filtering. PAMI, 22(12):1396–1410, 2000.

[12] X. Li, C. Wu, C. Zach, S. Lazebnik, and J.-M. Frahm. Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs. In ECCV, 2008.

[13] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object Retrieval with Large Vocabularies and Fast Spatial Matching. In CVPR, 2007.

[14] J. Philbin, J. Sivic, and A. Zisserman. Geometric Latent Dirichlet Allocation on a Matching Graph for Large-scale Image Datasets. IJCV, 95:138–153, 2011.

[15] J. Philbin and A. Zisserman. Object Mining using a Matching Graph on Very Large Image Collections. In ICVGIP, 2008.

[16] T. Quack, B. Leibe, and L. Van Gool. World-Scale Mining of Objects and Events from Community Photo Collections. In CIVR, 2008.

[17] B. C. Russell, R. Martin-Brualla, D. J. Butler, S. M. Seitz, and L. Zettlemoyer. 3D Wikipedia: Using Online Text to Automatically Label and Navigate Reconstructed Geometry. ACM TOG, 32(6), November 2013.

[18] T. Sattler, B. Leibe, and L. Kobbelt. SCRAMSAC: Improving RANSAC’s Efficiency with a Spatial Consistency Filter. In ICCV, 2009.

[19] Y. Sheikh, E. Khan, and T. Kanade. Mode-Seeking by Medoidshifts. In CVPR, 2007.

[20] I. Simon and S. Seitz. Scene Segmentation Using the Wisdom of Crowds. In ECCV, 2008.

[21] I. Simon, N. Snavely, and S. Seitz. Scene Summarization for Online Image Collections. In ICCV, 2007.

[22] J. Sivic and A. Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. In ICCV, 2003.

[23] N. Snavely, S. Seitz, and R. Szeliski. Photo Tourism: Exploring Photo Collections in 3D. In SIGGRAPH, 2006.

[24] P. Vatturi and W.-K. Wong. Category Detection using Hierarchical Mean Shift. In KDD, 2009.

[25] T. Weyand and B. Leibe. Discovering Favorite Views of Popular Places with Iconoid Shift. In ICCV, 2011.

[26] A. P. Witkin. Scale-Space Filtering: A New Approach to Multiscale Description. In ICASSP, 1984.

[27] Y.-T. Zheng, M. Zhao, Y. Song, H. Adam, U. Buddemeier, A. Bissacco, F. Brucher, T.-S. Chua, and H. Neven. Tour the world: Building a Web-Scale Landmark Recognition Engine. In CVPR, 2009.