185 nips-2012-Learning about Canonical Views from Internet Image Collections


Author: Elad Mezuman, Yair Weiss

Abstract: Although human object recognition is supposedly robust to viewpoint, much research on human perception indicates that there is a preferred or “canonical” view of objects. This phenomenon was discovered more than 30 years ago but the canonical view of only a small number of categories has been validated experimentally. Moreover, the explanation for why humans prefer the canonical view over other views remains elusive. In this paper we ask: Can we use Internet image collections to learn more about canonical views? We start by manually finding the most common view in the results returned by Internet search engines when queried with the objects used in psychophysical experiments. Our results clearly show that the most likely view in the search engine corresponds to the same view preferred by human subjects in experiments. We also present a simple method to find the most likely view in an image collection and apply it to hundreds of categories. Using the new data we have collected we present strong evidence against the two most prominent formal theories of canonical views and provide novel constraints for new theories.


reference text

[1] V. Blanz, M.J. Tarr, H.H. Bülthoff, and T. Vetter. What object attributes determine canonical views? Perception, 28:575–600, 1999.

[2] S. Palmer, E. Rosch, and P. Chase. Canonical perspective and the perception of objects. Attention and performance IX, pages 135–151, 1981.

[3] S.E. Palmer. Vision science: Photons to phenomenology, volume 2. MIT Press, Cambridge, MA, 1999.

[4] H.H. Bülthoff and S. Edelman. Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proceedings of the National Academy of Sciences of the United States of America, 89(1):60, 1992.

[5] K.A. Ehinger and A. Oliva. Canonical views of scenes depend on the shape of the space. CogSci, 2011.

[6] A. Torralba. Lecture notes on explicit and implicit 3d object models. http://people.csail.mit.edu/torralba/courses/6.870/slides/lecture4.ppt.

[7] D. Weinshall and M. Werman. On View Likelihood and Stability. IEEE Trans. Pattern Anal. Mach. Intell.

[8] W.T. Freeman. The generic viewpoint assumption in a framework for visual perception. Nature, 368(6471).

[9] P.M. Hall and M.J. Owen. Simple canonical views. In The British Machine Vision Conf. (BMVC05), volume 1, pages 7–16, 2005.

[10] I. Simon, N. Snavely, and S.M. Seitz. Scene summarization for online image collections. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on.

[11] T.L. Berg and A.C. Berg. Finding iconic images. In CVPR Workshops 2009.

[12] R. Raguram and S. Lazebnik. Computing iconic summaries of general visual concepts. In Computer Vision and Pattern Recognition Workshops, 2008. CVPRW’08. IEEE Computer Society Conference on, pages 1–8. IEEE, 2008.

[13] T. Denton, M.F. Demirci, J. Abrahamson, A. Shokoufandeh, and S. Dickinson. Selecting canonical views for view-based 3-D object recognition. In ICPR 2004.

[14] T. Deselaers and V. Ferrari. Visual and semantic similarity in imagenet. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 1777–1784. IEEE, 2011.

[15] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3):145–175, 2001.

[16] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, 2009.

[17] Y. Jing, S. Baluja, and H. Rowley. Canonical image selection from the web. In Proceedings of the 6th ACM international conference on Image and video retrieval, pages 280–287. ACM, 2007.

[18] T. Weyand and B. Leibe. Discovering favorite views of popular places with iconoid shift. In International Conference on Computer Vision (ICCV), 2011 IEEE Conference on. IEEE, 2011.

[19] E. Murphy-Chutorian and M.M. Trivedi. Head pose estimation in computer vision: A survey. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(4):607–626, 2009.

[20] M. Douze, H. Jégou, H. Sandhawalia, L. Amsaleg, and C. Schmid. Evaluation of gist descriptors for web-scale image search. In Proceeding of the ACM International Conference on Image and Video Retrieval, page 19. ACM, 2009.

[21] J. Xiao, J. Hays, K.A. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In CVPR 2010.

[22] N. Snavely, S.M. Seitz, and R. Szeliski. Photo tourism: exploring photo collections in 3d. In ACM Transactions on Graphics (TOG), volume 25, pages 835–846. ACM, 2006.

[23] E. Rosch, C.B. Mervis, W.D. Gray, D.M. Johnson, and P. Boyes-Braem. Basic objects in natural categories. Cognitive psychology, 8(3):382–439, 1976.