102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding

Source: pdf

Author: David F. Fouhey, Abhinav Gupta, Martial Hebert

Abstract: What primitives should we use to infer the rich 3D world behind an image? We argue that these primitives should be both visually discriminative and geometrically informative and we present a technique for discovering such primitives. We demonstrate the utility of our primitives by using them to infer 3D surface normals given a single image. Our technique substantially outperforms the state-of-the-art and shows improved cross-dataset performance.

reference text

[1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk. SLIC superpixels compared to state-of-the-art superpixel methods. TPAMI, 34(11):2274–2281, 2012. 5

[2] I. Biederman. Recognition-by-components: A theory of human image understanding. Psychological Review, 94:115–147, 1987. 2

Figure 8. Cross-dataset results on B3DO (top), SUNS (bottom). More results appear in the supplementary material.

Table 3. Cross-dataset results. We train the method on one fold of the NYU dataset [28] and test it on the B3DO dataset [16].

Method          Mean   Median  RMSE   11.25°  22.5°  30°
With Manhattan World Constraints
Lee et al.      41.9   28.4    56.6   32.7    45.7   50.8
Hedau et al.    43.5   30.0    58.1   32.8    45.0   50.0
3D Primitives   38.0   24.5    51.2   33.6    48.5   54.5
Without Manhattan World Constraints
Hoiem et al.    42.1   37.4    49.7    8.2    25.5   38.1
Singh et al.    36.7   34.2    42.3    9.9    29.4   42.9
Saxena et al.   45.6   41.2    53.5    8.4    25.5   36.6
RF + SIFT       36.8   34.3    42.6   10.2    29.5   42.8
SVR + SIFT      37.0   34.0    42.6    9.5    29.1   42.9
3D Primitives   34.7   30.7    41.1   14.3    35.9   49.0

[3] T. Binford. Visual perception by computer. In IEEE Conference on Systems and Controls, 1971. 2

[4] L. Bourdev and J. Malik. Poselets: Body part detectors trained using 3D human pose annotations. In ICCV, 2009. 2

[5] R. Brooks, R. Greiner, and T. Binford. The ACRONYM model-based vision system. In IJCAI, 1979. 2

[6] M. Clowes. On seeing things. Artificial Intelligence, 2:79–116, 1971. 1

[7] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005. 3, 4

[8] C. Doersch, S. Singh, A. Gupta, J. Sivic, and A. A. Efros. What makes Paris look like Paris? ACM Transactions on Graphics (SIGGRAPH), 31(4), 2012. 2, 4

[9] B. Epshtein and S. Ullman. Semantic hierarchies for recognizing objects and parts. In CVPR, 2007. 2

[10] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. TPAMI, 32(9), 2010. 2

[11] A. Gupta, A. Efros, and M. Hebert. Blocks world revisited: Image understanding using qualitative geometry and mechanics. In ECCV, 2010. 2

[12] T. Hassner and R. Basri. Example based 3D reconstruction from single 2D images. In CVPR Workshop: Beyond Patches, 2006. 2

[13] V. Hedau, D. Hoiem, and D. Forsyth. Recovering the spatial layout of cluttered rooms. In ICCV, 2009. 2, 4, 5, 7

[14] D. Hoiem, A. A. Efros, and M. Hebert. Recovering surface layout from an image. In IJCV, 2007. 2, 4, 5

[15] D. Huffman. Impossible objects as nonsense sentences. Machine Intelligence, 8:475–492, 1971. 1

[16] A. Janoch, S. Karayev, Y. Jia, J. Barron, M. Fritz, K. Saenko, and T. Darrell. A category-level 3-D object dataset: Putting the Kinect to work. In Workshop on Consumer Depth Cameras in Computer Vision (with ICCV), 2011. 7, 8

[17] K. Karsch, C. Liu, and S. B. Kang. Depth extraction from video using non-parametric sampling. In ECCV, 2012. 2, 5

[18] D. C. Lee, A. Gupta, M. Hebert, and T. Kanade. Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. In NIPS, 2010. 2

[19] D. C. Lee, M. Hebert, and T. Kanade. Geometric reasoning for single image structure recovery. In CVPR, 2009. 2, 5

[20] D. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV, 60(2):91–110, 2004. 5

[21] X. Ren, L. Bo, and D. Fox. RGB-(D) scene labeling: Features and algorithms. In CVPR, 2012. 2

[22] S. Satkin, J. Lin, and M. Hebert. Data-driven scene understanding from 3D models. In BMVC, 2012. 2, 7

[23] A. Saxena, M. Sun, and A. Y. Ng. Make3D: Learning 3D scene structure from a single still image. TPAMI, 2008. 2, 5

[24] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. IJCV, 47(1):7–42, 2002. 5

[25] W. J. Scheirer, N. Kumar, P. N. Belhumeur, and T. E. Boult. Multi-attribute spaces: Calibration for attribute fusion and similarity search. In CVPR, 2012. 4

[26] A. G. Schwing and R. Urtasun. Efficient exact inference for 3D indoor scene understanding. In ECCV, 2012. 2

[27] A. Shrivastava and A. Gupta. Building parts-based object detectors via 3D geometry. In ICCV, 2013. 2

[28] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from RGBD images. In ECCV, 2012. 4, 8

[29] S. Singh, A. Gupta, and A. A. Efros. Unsupervised discovery of mid-level discriminative patches. In ECCV, 2012. 2, 4, 5

[30] K. Sugihara. Machine Interpretation of Line Drawings. MIT Press, 1986. 1

[31] Y. Xiang and S. Savarese. Estimating the aspect layout of object categories. In CVPR, 2012. 2

[32] J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In CVPR, 2010. 7

[33] J. Xiao, B. Russell, and A. Torralba. Localizing 3D cuboids in single-view images. In NIPS, 2012. 2

[34] J. Ye, Z. Zhao, and M. Wu. Discriminative k-means for clustering. In NIPS, 2007. 4

[35] S. X. Yu, H. Zhang, and J. Malik. Inferring spatial layout from a single image via depth-ordered grouping. In Workshop on Perceptual Organization, 2008. 2