Venue: CVPR 2013
Author: Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena, Tsuhan Chen
Abstract: 3D volumetric reasoning is important for truly understanding a scene. Humans are able both to segment each object in an image and to perceive a rich 3D interpretation of the scene, e.g., the space an object occupies, which objects support other objects, and which objects would, if moved, cause other objects to fall. We propose a new approach for parsing RGB-D images using 3D block units for volumetric reasoning. The algorithm fits image segments with 3D blocks and iteratively evaluates the scene based on block interaction properties. We produce a 3D representation of the scene by jointly optimizing over segmentations, block fitting, supporting relations, and object stability. Our algorithm incorporates the intuition that a good 3D representation of the scene is one that fits the data well and forms a stable, self-supporting (i.e., non-toppling) arrangement of objects. We experiment on several datasets, including controlled and real indoor scenes. Results show that our stability-reasoning framework improves RGB-D segmentation and volumetric scene representation.
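Two ingredients named in the abstract, fitting a 3D block to a segment and checking whether the resulting arrangement is stable, can be illustrated with a deliberately simplified sketch. This is not the authors' method. It assumes axis-aligned blocks (taken as the min/max corners of a segment's 3D points), coordinates in meters with gravity along -z, uniform density so that a block's center of mass is its geometric center, and a single supporting block per object. Under those assumptions, a block resting on a support is statically stable when its center of mass projects inside the contact region between the two footprints. The names `Block`, `fit_block`, `contact_region`, and `is_stable_on`, the 2 cm contact tolerance, and the example scene are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np


@dataclass
class Block:
    """Hypothetical axis-aligned 3D block; meters, z points up (assumption)."""
    lo: np.ndarray  # (3,) minimum corner (x, y, z)
    hi: np.ndarray  # (3,) maximum corner (x, y, z)

    @property
    def center(self) -> np.ndarray:
        # Uniform density assumed, so the center of mass is the box center.
        return (self.lo + self.hi) / 2.0


def fit_block(points: np.ndarray) -> Block:
    """Fit an axis-aligned block to an (N, 3) array of one segment's 3D points."""
    pts = np.asarray(points, dtype=float)
    return Block(pts.min(axis=0), pts.max(axis=0))


def contact_region(top: Block, bottom: Block,
                   tol: float = 0.02) -> Optional[Tuple[np.ndarray, np.ndarray]]:
    """Return the (x, y) rectangle where `top` rests on `bottom`, or None."""
    # The top block's lower face must lie on the support's upper face (within tol).
    if abs(top.lo[2] - bottom.hi[2]) > tol:
        return None
    lo = np.maximum(top.lo[:2], bottom.lo[:2])
    hi = np.minimum(top.hi[:2], bottom.hi[:2])
    if np.any(lo >= hi):
        return None  # the two footprints do not overlap
    return lo, hi


def is_stable_on(top: Block, bottom: Block, tol: float = 0.02) -> bool:
    """Single-support static check: center of mass must project into the contact region."""
    region = contact_region(top, bottom, tol)
    if region is None:
        return False
    lo, hi = region
    c = top.center[:2]
    return bool(np.all(c >= lo) and np.all(c <= hi))


# Example scene (made-up numbers): a table with two blocks resting on its top face.
table = Block(np.array([0.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.7]))
cup = fit_block(np.array([[0.2, 0.2, 0.7], [0.5, 0.5, 0.9], [0.3, 0.4, 0.8]]))
overhang = Block(np.array([0.8, 0.2, 0.7]), np.array([1.6, 0.5, 0.9]))

print(is_stable_on(cup, table))       # True: center (0.35, 0.35) lies inside the contact
print(is_stable_on(overhang, table))  # False: center x = 1.2 lies past the table edge
```

The cup-sized block rests entirely within the table's top face and passes, while the block whose center hangs past the table edge fails. A fuller treatment would need oriented blocks, multiple supports, and the joint optimization over segmentation and support relations that the abstract describes.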
[1] D. Baraff. Physically based modeling: Rigid body simulation. Technical report, Pixar Animation Studios, 2001.
[2] M. Bleyer, C. Rhemann, and C. Rother. Extracting 3D scene-consistent object proposals and depth from stereo images. In ECCV, 2012.
[3] C. Chang, B. Gorissen, and S. Melchior. Fast oriented bounding box optimization on the rotation group SO(3, ℝ). ACM Transactions on Graphics, 30(5), 2011.
[4] J. Chang and J. W. Fisher. Efficient MCMC sampling with implicit shape representations. In CVPR, 2011.
[5] D. Hoiem, A. A. Efros, and M. Hebert. Recovering surface layout from an image. IJCV, 75(1):151–172, 2007.
[6] A. Flint, D. W. Murray, and I. Reid. Manhattan scene understanding using monocular, stereo, and 3D features. In ICCV, 2011.
[7] S. Gottschalk. Separating axis theorem. Technical report, 1996.
[8] H. Grabner, J. Gall, and L. Van Gool. What makes a chair a chair? In CVPR, 2011.
[9] A. Gupta, A. A. Efros, and M. Hebert. Blocks world revisited: Image understanding using qualitative geometry and mechanics. In ECCV, 2010.
[10] V. Hedau, D. Hoiem, and D. A. Forsyth. Recovering free space of indoor scenes from a single image. In CVPR, 2012.
[11] D. Hoiem, A. N. Stein, A. A. Efros, and M. Hebert. Recovering occlusion boundaries from a single image. In ICCV, 2007.
[12] A. Janoch, S. Karayev, Y. Jia, J. T. Barron, M. Fritz, K. Saenko, and T. Darrell. A category-level 3-D object dataset: Putting the Kinect to work. In ICCV Workshops, 2011.
[13] Y. Jiang, H. Koppula, and A. Saxena. Hallucinated humans as the hidden context for labeling 3D scenes. In CVPR, 2013.
[14] Y. Jiang, M. Lim, C. Zheng, and A. Saxena. Learning to place new objects in a scene. IJRR, 31(9), 2012.
[15] H. Koppula, A. Anand, T. Joachims, and A. Saxena. Semantic labeling of 3D point clouds for indoor scenes. In NIPS, 2011.
[16] H. Koppula, R. Gupta, and A. Saxena. Learning human activities and object affordances from RGB-D videos. IJRR, 2013.
[17] K. Lai, L. Bo, X. Ren, and D. Fox. A large-scale hierarchical multi-view RGB-D object dataset. In ICRA, 2011.
[18] D. C. Lee, A. Gupta, M. Hebert, and T. Kanade. Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. In NIPS, 2010.
[19] D. Ly, A. Saxena, and H. Lipson. Co-evolutionary predictors for kinematic pose inference from RGB-D images. In GECCO, 2012.
[20] M. McCloskey. Intuitive physics. Scientific American, 248(4):114–122, 1983.
[21] A. Saxena, S. H. Chung, and A. Y. Ng. Learning depth from single monocular images. In NIPS, 2005.
[22] A. Saxena, M. Sun, and A. Y. Ng. Make3D: Learning 3D scene structure from a single still image. PAMI, 31(5), 2009.
[23] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from RGBD images. In ECCV, 2012.
[24] J. Xiao, B. C. Russell, and A. Torralba. Localizing 3D cuboids in single-view images. In NIPS, 2012.
[25] Y. Zheng, X. Chen, M. Cheng, K. Zhou, S. Hu, and N. J. Mitra. Interactive images: Cuboid proxies for smart image manipulation. ACM Transactions on Graphics, 31(4):99, 2012.