cvpr cvpr2013 cvpr2013-445 cvpr2013-445-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Luca Del Pero, Joshua Bowdish, Bonnie Kermgard, Emily Hartley, Kobus Barnard
Abstract: We develop a comprehensive Bayesian generative model for understanding indoor scenes. While it is common in this domain to approximate objects with 3D bounding boxes, we propose using strong representations with finer granularity. For example, we model a chair as a set of four legs, a seat and a backrest. We find that modeling detailed geometry improves recognition and reconstruction, and enables more refined use of appearance for scene understanding. We demonstrate this with a new likelihood function that rewards 3D object hypotheses whose 2D projection is more uniform in color distribution. Such a measure would be confused by background pixels if we used a bounding box to represent a concave object like a chair. Complex objects are modeled using a set of re-usable 3D parts, and we show that this representation captures much of the variation among object instances with relatively few parameters. We also designed specific data-driven inference mechanisms for each part that are shared by all objects containing that part, which helps make inference transparent to the modeler. Further, we show how to exploit contextual relationships to detect more objects, by, for example, proposing chairs around and underneath tables. We present results showing the benefits of each of these innovations. The performance of our approach often exceeds that of state-of-the-art methods on the two tasks of room layout estimation and object recognition, as evaluated on two benchmark data sets used in this domain.
... work. 1) Detailed geometric models, such as tables with legs and top (bottom left), provide better reconstructions than plain boxes (top right), when supported by image features such as geometric context [5] (top middle), or an approach to using color introduced here. 2) Non-convex models allow for complex configurations, such as a chair under a table (bottom middle). 3) 3D contextual relationships, such as chairs being around a table, allow identifying objects supported by little image evidence, like the chair behind the table (bottom right). Best viewed in color.
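The abstract's point about color uniformity can be illustrated with a small toy example. The sketch below is not the paper's likelihood function: it assumes colors are already quantized into integer labels and uses negative Shannon entropy as a stand-in uniformity measure. The names `color_entropy`, `uniformity_score`, `part_mask` and `box_mask` are hypothetical. It only shows why the projection of a part-based chair hypothesis can score better than its bounding box, which also covers background pixels.

```python
import math
from collections import Counter


def color_entropy(image, pixels):
    """Shannon entropy (in bits) of the quantized colors at the given pixels."""
    counts = Counter(image[r][c] for r, c in pixels)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def uniformity_score(image, pixels):
    """Illustrative score: 0 for a single color, more negative as colors mix."""
    return -color_entropy(image, pixels)


# Toy 5x5 image of quantized color labels (assumption: 1 = chair color,
# 0 and 2 = two different background colors seen through the chair).
image = [
    [1, 0, 0, 2, 2],  # backrest, background
    [1, 0, 0, 2, 2],  # backrest, background
    [1, 1, 1, 1, 1],  # seat
    [1, 2, 2, 0, 1],  # legs with background between them
    [1, 2, 2, 0, 1],  # legs with background between them
]

# Pixels covered by the projected parts of a chair hypothesis.
backrest = [(0, 0), (1, 0)]
seat = [(2, c) for c in range(5)]
legs = [(3, 0), (4, 0), (3, 4), (4, 4)]
part_mask = backrest + seat + legs

# Pixels covered by the projected bounding box of the same hypothesis.
box_mask = [(r, c) for r in range(5) for c in range(5)]

if __name__ == "__main__":
    print("part-based score:", uniformity_score(image, part_mask))
    print("bounding-box score:", uniformity_score(image, box_mask))
```

In this toy setup the part-based mask covers only chair-colored pixels, so its score is the maximum possible, while the bounding box mixes in two background colors and scores lower. A real system would need a color model and projection step that this sketch leaves out.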
Figure 8. Scene reconstructions (top rows) and failures (bottom row). As shown from left to right in the bottom row, typical failures are due to: 1) confusion between object categories (two couches confused for cabinets), 2) hallucinating objects (the table “latched” to the texture of the wall and the shelf), 3) a poor camera estimate from which the algorithm could not recover, 4) poor fits due to errors in the feature detection process, which mostly occur in blurry images. Best viewed in color.
[1] A. Gupta, S. Satkin, A. A. Efros, and M. Hebert. From 3d scene geometry to human workspace. In CVPR, 2011.
[2] V. Hedau, D. Hoiem, and D. Forsyth. Recovering the spatial layout of cluttered rooms. In ICCV, 2009.
[3] V. Hedau, D. Hoiem, and D. Forsyth. Thinking inside the box: Using appearance models and context based on room geometry. In ECCV, 2010.
[4] V. Hedau, D. Hoiem, and D. Forsyth. Recovering free space of indoor scenes from a single image. In CVPR, 2012.
[5] D. Hoiem, A. Efros, and M. Hebert. Geometric context from a single image. In ICCV, 2005.
[6] K. Karsch, V. Hedau, D. Forsyth, and D. Hoiem. Rendering synthetic objects into legacy photographs. In SIGGRAPH Asia, 2011.
[7] D. Lee, A. Gupta, M. Hebert, and T. Kanade. Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. In NIPS, 2010.
[8] J. Liebelt and C. Schmid. Multi-view object class detection with a 3d geometric model. In CVPR, 2010.
[9] R. M. Neal. Probabilistic inference using markov chain monte carlo methods. Technical report, 1993.
[10] L. D. Pero, J. Bowdish, D. Fried, B. Kermgard, E. Hartley, and K. Barnard. Bayesian geometric modeling of indoor scenes. In CVPR, 2012.
[11] L. D. Pero, J. Guan, E. Brau, J. Schlecht, and K. Barnard. Sampling bedrooms. In CVPR, 2011.
[12] S. Satkin, J. Lin, and M. Hebert. Data-driven scene understanding from 3d models. In BMVC, 2012.
[13] J. Schlecht and K. Barnard. Learning models of object structure. In NIPS, 2009.
[14] A. Schwing, T. Hazan, M. Pollefeys, and U. R. Efficient structure prediction with latent variables for general graphics models. In CVPR, 2012.
[15] Z. Tu and S.-C. Zhu. Image segmentation by data-driven markov chain monte-carlo. IEEE Trans. Patt. Analy. Mach. Intell., 24(5):657–673, 2002.
[16] Y. Xiang and S. Savarese. Estimating the aspect layout of object categories. In CVPR, pages 3410–3417, 2012.
[17] S. X. Yu, H. Zhang, and J. Malik. Inferring spatial layout from a single image via depth-ordered grouping. In POCV, 2008.