Understanding Indoor Scenes Using 3D Geometric Phrases (CVPR 2013, paper 446)

Source: pdf

Author: Wongun Choi, Yu-Wei Chao, Caroline Pantofaru, Silvio Savarese

Abstract: Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification. We present a hierarchical scene model for learning and reasoning about complex indoor scenes which is computationally tractable, can be learned from a reasonable amount of training data, and avoids oversimplification. At the core of this approach is the 3D Geometric Phrase Model which captures the semantic and geometric relationships between objects which frequently co-occur in the same 3D spatial configuration. Experiments show that this model effectively explains scene semantics, geometry and object groupings from a single image, while also improving individual object detections.

reference text

[1] S. Y. Bao and S. Savarese. Semantic structure from motion. In CVPR, 2011. 2

[2] S. Y. Bao, M. Sun, and S. Savarese. Toward coherent object detection and scene layout understanding. In CVPR, 2010. 1, 2

[3] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. 6

[4] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005. 1, 2, 4

[5] C. Desai, D. Ramanan, and C. C. Fowlkes. Discriminative models for multi-class object layout. IJCV, 2011. 2, 3, 4, 5, 6, 7

[6] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The Pascal Visual Object Classes (VOC) challenge. IJCV, 2010. 6, 7

[7] L. Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. In CVPR, pages 524–531, 2005. 1, 2

[8] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. PAMI, 32(9), Sept. 2010. 1, 2, 3, 5, 6, 7

[9] D. F. Fouhey, V. Delaitre, A. Gupta, A. A. Efros, I. Laptev, and J. Sivic. People watching: Human actions as a cue for single-view geometry. In ECCV, 2012. 2

[10] A. Geiger, C. Wojek, and R. Urtasun. Joint 3d estimation of objects and scene layout. In NIPS, 2011. 2

[11] A. Gupta, A. Efros, and M. Hebert. Blocks world revisited: Image understanding using qualitative geometry and mechanics. In ECCV, 2010. 2

[12] V. Hedau, D. Hoiem, and D. Forsyth. Recovering the spatial layout of cluttered rooms. In ICCV, 2009. 1, 2, 3, 4, 5, 6, 7, 8

[13] V. Hedau, D. Hoiem, and D. Forsyth. Thinking inside the box: Using appearance models and context based on room geometry. In ECCV, 2010. 2, 4

[14] J. C. Lagarias, J. A. Reeds, M. H. Wright, and P. E. Wright. Convergence properties of the Nelder–Mead simplex method in low dimensions. SIAM J. on Optimization, 1998. 4

[15] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006. 1, 2, 3, 6

[16] D. Lee, A. Gupta, M. Hebert, and T. Kanade. Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. In NIPS, 2010. 1, 2, 4

[17] B. Leibe, A. Leonardis, and B. Schiele. Combined object categorization and segmentation with an implicit shape model. In Statistical Learning in Computer Vision, ECCV, 2004. 1

[18] C. Li, D. Parikh, and T. Chen. Automatic discovery of groups of objects for scene understanding. In CVPR, 2012. 2, 4

[19] L.-J. Li, H. Su, E. P. Xing, and L. Fei-Fei. Object bank: A high-level image representation for scene classification & semantic feature sparsification. In NIPS, December 2010. 2, 6, 7

[20] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, Nov. 2004. 2

[21] M. Pandey and S. Lazebnik. Scene recognition and weakly supervised object localization with deformable part-based models. In ICCV, 2011. 1, 2, 6, 7

[22] L. D. Pero, J. Bowdish, D. Fried, B. Kermgard, E. L. Hartley, and K. Barnard. Bayesian geometric modeling of indoor scenes. In CVPR, 2012. 2, 4

[23] A. Quattoni and A. Torralba. Recognizing indoor scenes. In CVPR, 2009. 1, 2, 6

[24] A. Sadeghi and A. Farhadi. Recognition using visual phrases. In CVPR, 2011. 2, 4

[25] S. Satkin, J. Lin, and M. Hebert. Data-driven scene understanding from 3d models. In BMVC, 2012. 2

[26] A. G. Schwing and R. Urtasun. Efficient exact inference for 3d indoor scene understanding. In ECCV, 2012. 2

[27] H. Wang, S. Gould, and D. Koller. Discriminative learning with latent variables for cluttered indoor scene understanding. In ECCV, 2010. 1, 2, 4

[28] Y. Wang and G. Mori. Hidden part models for human action recognition: Probabilistic versus max margin. PAMI, 2011. 5

[29] Y. Xiang and S. Savarese. Estimating the aspect layout of object categories. In CVPR, 2012. 1

[30] Y. Zhao and S.-C. Zhu. Image parsing via stochastic scene grammar. In NIPS, 2011. 2, 3