
381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models

Source: pdf

Author: Yibiao Zhao, Song-Chun Zhu

Abstract: Indoor functional objects exhibit large view and appearance variations and are therefore difficult to recognize with the traditional appearance-based classification paradigm. In this paper, we present an algorithm to parse indoor images based on two observations: i) functionality is the most essential property defining an indoor object, e.g. “a chair to sit on”; ii) the geometry (3D shape) of an object is designed to serve its function. We formulate the nature of the object function into a stochastic grammar model. This model characterizes a joint distribution over the function-geometry-appearance (FGA) hierarchy. The hierarchical structure includes a scene category, functional groups, functional objects, functional parts and 3D geometric shapes. We use a simulated annealing MCMC algorithm to find the maximum a posteriori (MAP) solution, i.e. a parse tree. We design four data-driven steps to accelerate the search in the FGA space: i) group the line segments into 3D primitive shapes, ii) assign functional labels to these 3D primitive shapes, iii) fill in missing objects/parts according to the functional labels, and iv) synthesize 2D segmentation maps and verify the current parse tree by the Metropolis-Hastings acceptance probability. The experimental results on several challenging indoor datasets demonstrate that the proposed approach not only significantly widens the scope of indoor scene parsing algorithms from segmentation and 3D recovery to functional object recognition, but also yields improved overall performance.
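The search strategy named in the abstract, simulated annealing with a Metropolis-Hastings acceptance test, can be illustrated on a toy problem. The sketch below is not the authors' implementation: the label set, the `TARGET` assignment, the Hamming-distance `energy`, the single-label `propose` move and the geometric cooling schedule are all assumptions chosen for illustration. Energy stands in for a negative log posterior, so the lowest-energy state plays the role of the MAP solution.

```python
import math
import random

# Hypothetical toy setup (not from the paper): each of 8 detected shapes
# gets one functional label, and TARGET plays the role of the best parse.
LABELS = ("bed", "chair", "table")
TARGET = ("bed", "bed", "chair", "table", "chair", "table", "bed", "chair")


def energy(state):
    """Toy energy: number of shapes whose label disagrees with TARGET."""
    return sum(1 for got, want in zip(state, TARGET) if got != want)


def propose(state, rng):
    """Symmetric proposal: relabel one randomly chosen shape."""
    i = rng.randrange(len(state))
    new_state = list(state)
    new_state[i] = rng.choice(LABELS)
    return tuple(new_state)


def acceptance_probability(e_old, e_new, temperature, log_q_ratio=0.0):
    """Metropolis-Hastings acceptance for a target proportional to exp(-E/T).

    log_q_ratio is log q(old | new) - log q(new | old); it is 0 for a
    symmetric proposal such as `propose` above.
    """
    log_ratio = -(e_new - e_old) / temperature + log_q_ratio
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def simulated_annealing(initial, n_steps=5000, t_start=1.0, t_end=0.01, seed=0):
    """Anneal from `initial` and return the lowest-energy state visited."""
    rng = random.Random(seed)
    state, e_state = initial, energy(initial)
    best, e_best = state, e_state
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        candidate = propose(state, rng)
        e_cand = energy(candidate)
        if rng.random() < acceptance_probability(e_state, e_cand, temperature):
            state, e_state = candidate, e_cand
            if e_state < e_best:
                best, e_best = state, e_state
    return best, e_best


best, e_best = simulated_annealing(initial=("bed",) * len(TARGET))
print(best, e_best)
```

Working with the log of the acceptance ratio avoids overflow at low temperatures. In the paper's setting the proposals would come from the four data-driven steps rather than from a uniform relabeling move, and such proposals are generally not symmetric, which is why the sketch exposes a `log_q_ratio` term.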

reference text

[1] E. Bar-aviv and E. Rivlin. Functional 3d object classification using simulation of embodied agent. In BMVC, 2006.

[2] A. Criminisi, I. Reid, and A. Zisserman. Single view metrology. Int. J. Computer Vision (IJCV), 40(2): 123–148, Nov. 2000.

[3] L. Del Pero, J. Bowdish, D. Fried, B. Kermgard, E. Hartley, and K. Barnard. Bayesian geometric modeling of indoor scenes. In CVPR, pages 2719–2726, 2012.

[4] L. Del Pero, J. Guan, E. Brau, J. Schlecht, and K. Barnard. Sampling bedrooms. In CVPR, pages 2009–2016, 2011.

[5] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. http://www.pascalnetwork.org/challenges/VOC/voc2012/workshop/index.html.

[Figure caption, truncated in source: "... prediction), planar objects (blue rectangles), background layout (red box). The parse tree is shown to the right of each image."]

[6] P. F. Felzenszwalb, R. B. Girshick, and D. McAllester. Cascade object detection with deformable part models. In CVPR, 2010.

[7] J. J. Gibson. The Theory of Affordances. Lawrence Erlbaum, 1977.

[8] H. Grabner, J. Gall, and L. V. Gool. What makes a chair a chair? In CVPR, 2011.

[9] A. Gupta, S. Satkin, A. A. Efros, and M. Hebert. From 3d scene geometry to human workspace. In CVPR, pages 1961–1968, Washington, DC, USA, 2011. IEEE Computer Society.

[10] F. Han and S. C. Zhu. Bottom-up/top-down image parsing with attribute grammar. PAMI, 2009.

[11] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, ISBN: 0521540518, second edition, 2004.

[12] V. Hedau, D. Hoiem, and D. Forsyth. Recovering the spatial layout of cluttered rooms. In ICCV, 2009.

[13] V. Hedau, D. Hoiem, and D. Forsyth. Thinking inside the box: Using appearance models and context based on room geometry. In ECCV, 2010.

[14] M. Hejrati and D. Ramanan. Analyzing 3d objects in cluttered images. In P. Bartlett, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 602–610. 2012.

[15] W. Hu. Learning 3d object templates by hierarchical quantization of geometry and appearance spaces. In CVPR, pages 2336–2343, 2012.

[16] D. Lee, A. Gupta, M. Hebert, and T. Kanade. Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. In Advances in Neural Information Processing Systems. Cambridge: MIT Press, pages 609–616, 2010.

[17] D. Lee, M. Hebert, and T. Kanade. Geometric reasoning for single image structure recovery. In CVPR, 2009.

[18] B. Pepik, P. Gehler, M. Stark, and B. Schiele. 3d2pm - 3d deformable part models. In ECCV, Firenze, Italy, 2012.

[19] S. Satkin, J. Lin, and M. Hebert. Data-driven scene understanding from 3d models. In BMVC, September 2012.

[20] R. von Gioi, J. Jakubowicz, J. M. Morel, and G. Randall. Lsd: A fast line segment detector with a false detection control. TPAMI, 32(4):722–732, 2010.

[21] H. Wang, S. Gould, and D. Koller. Discriminative learning with latent variables for cluttered indoor scene understanding. In ECCV, pages 497–510, 2010.

[22] Y. Xiang and S. Savarese. Estimating the aspect layout of object categories. In CVPR, 2012.

[23] J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba. Sun database: Large-scale scene recognition from abbey to zoo. In CVPR, pages 3485–3492, 2010.

[24] J. Xiao, B. Russell, and A. Torralba. Localizing 3d cuboids in single-view images. In P. Bartlett, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, NIPS, pages 755–763. 2012.

[25] Y.-T. Yeh, L. Yang, M. Watson, N. D. Goodman, and P. Hanrahan. Synthesizing open worlds with constraints using locally annealed reversible jump mcmc. ACM Trans. Graph., 31(4):56:1–56:11, July 2012.

[26] Y. Zhao and S. C. Zhu. Image parsing via stochastic scene grammar. In NIPS, 2011.