
201 nips-2012-Localizing 3D cuboids in single-view images


Source: pdf

Author: Jianxiong Xiao, Bryan Russell, Antonio Torralba

Abstract: In this paper we seek to detect rectangular cuboids and localize their corners in uncalibrated single-view images depicting everyday scenes. In contrast to recent approaches that rely on detecting vanishing points of the scene and grouping line segments to form cuboids, we build a discriminative parts-based detector that models the appearance of the cuboid corners and internal edges while enforcing consistency to a 3D cuboid model. Our model copes with different 3D viewpoints and aspect ratios and is able to detect cuboids across many different object categories. We introduce a database of images with cuboid annotations that spans a variety of indoor and outdoor scenes and show qualitative and quantitative results on our collected database. Our model out-performs baseline detectors that use 2D constraints alone on the task of localizing cuboid corners.


reference text

[1] I. Biederman. Recognition by components: a theory of human image interpretation. Psychological Review, 94:115–147, 1987.

[2] J. E. Bresenham. Algorithm for computer control of a digital plotter. IBM Systems Journal, 4(1):25–30, 1965.

[3] J. F. Canny. A computational approach to edge detection. IEEE PAMI, 8(6):679–698, 1986.

[4] N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. In CVPR, 2005.

[5] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.

[6] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The Pascal visual object classes (VOC) challenge. IJCV, 88(2):303–338, 2010.

[7] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. IEEE PAMI, 32(9), 2010.

[8] P. Felzenszwalb and D. Huttenlocher. Pictorial structures for object recognition. IJCV, 61(1), 2005.

[9] A. Gupta, S. Satkin, A. A. Efros, and M. Hebert. From 3D scene geometry to human workspace. In CVPR, 2011.

[10] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, ISBN: 0521540518, second edition, 2004.

[11] V. Hedau, D. Hoiem, and D. Forsyth. Thinking inside the box: Using appearance models and context based on room geometry. In ECCV, 2010.

[12] V. Hedau, D. Hoiem, and D. Forsyth. Recovering free space of indoor scenes from a single image. In CVPR, 2012.

[13] D. Hoiem, A. Efros, and M. Hebert. Geometric context from a single image. In ICCV, 2005.

[14] http://sketchup.google.com, 2012.

[15] K. Ikeuchi and T. Suehiro. Toward an assembly plan from observation: Task recognition with polyhedral objects. In Robotics and Automation, 1994.

[16] T. Joachims, T. Finley, and C.-N. J. Yu. Cutting-plane training of structural SVMs. Machine Learning, 77(1), 2009.

[17] D. C. Lee, A. Gupta, M. Hebert, and T. Kanade. Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. In NIPS, 2010.

[18] J. L. Mundy. Object recognition in the geometric era: A retrospective. In Toward Category-Level Object Recognition, volume 4170 of Lecture Notes in Computer Science, pages 3–29. Springer, 2006.

[19] L. D. Pero, J. C. Bowdish, D. Fried, B. D. Kermgard, E. L. Hartley, and K. Barnard. Bayesian geometric modelling of indoor scenes. In CVPR, 2012.

[20] L. Roberts. Machine perception of 3-D solids. PhD thesis, 1965.

[21] H. Wang, S. Gould, and D. Koller. Discriminative learning with latent variables for cluttered indoor scene understanding. In ECCV, 2010.

[22] J. Xiao, T. Fang, P. Tan, P. Zhao, E. Ofek, and L. Quan. Image-based facade modeling. In SIGGRAPH Asia, 2008.

[23] J. Xiao, T. Fang, P. Zhao, M. Lhuillier, and L. Quan. Image-based street-side city modeling. In SIGGRAPH Asia, 2009.

[24] J. Xiao and Y. Furukawa. Reconstructing the world’s museums. In ECCV, 2012.

[25] J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In CVPR, 2010.

[26] J. Xiao, B. C. Russell, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. Basic level scene understanding: From labels to structure and beyond. In SIGGRAPH Asia, 2012.

[27] Y. Yang and D. Ramanan. Articulated pose estimation using flexible mixtures of parts. In CVPR, 2011.

[28] S. Yu, H. Zhang, and J. Malik. Inferring spatial layout from a single image via depth-ordered grouping. In IEEE Workshop on Perceptual Organization in Computer Vision, 2008.

[29] Y. Zhao and S.-C. Zhu. Image parsing with stochastic scene grammar. In NIPS, 2011.