ICCV 2013, paper iccv2013-387 (reference list)
Author: Andrew Owens, Jianxiong Xiao, Antonio Torralba, William Freeman
Abstract: We present a data-driven method for building dense 3D reconstructions using a combination of recognition and multi-view cues. Our approach is based on the idea that there are image patches that are so distinctive that we can accurately estimate their latent 3D shapes solely using recognition. We call these patches shape anchors, and we use them as the basis of a multi-view reconstruction system that transfers dense, complex geometry between scenes. We “anchor” our 3D interpretation from these patches, using them to predict geometry for parts of the scene that are relatively ambiguous. The resulting algorithm produces dense reconstructions from stereo point clouds that are sparse and noisy, and we demonstrate it on a challenging dataset of real-world, indoor scenes.
[1] Y. Bao, M. Chandraker, Y. Lin, and S. Savarese. Dense object reconstruction with semantic priors. In CVPR, 2013. 2
[2] P. Besl and N. McKay. A method for registration of 3-d shapes. Trans. PAMI, 1992. 3
[3] A. Blake, A. Zisserman, and G. Knowles. Surface descriptions from stereo and shading. Image and Vision Computing, 1985. 2
[4] L. Bourdev and J. Malik. Poselets: Body part detectors trained using 3d human pose annotations. In ICCV, 2009. 2, 4
[5] L. Breiman. Random forests. Mach. learning, 45(1):5–32, 2001. 4
[6] T. Dean, M. Ruzon, M. Segal, J. Shlens, S. Vijayanarasimhan, and J. Yagnik. Fast, accurate detection of 100,000 object classes on a single machine. In CVPR, 2013. 4
[7] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 1981. 5
[8] A. Flint, D. Murray, and I. Reid. Manhattan scene understanding using monocular, stereo, and 3d features. In ICCV, 2011. 2
[9] Y. Furukawa, B. Curless, S. Seitz, and R. Szeliski. Manhattan-world stereo. In CVPR, 2009. 2
[10] Y. Furukawa, B. Curless, S. Seitz, and R. Szeliski. Reconstructing building interiors from images. In ICCV, 2009. 2
[11] Y. Furukawa and J. Ponce. Accurate, dense, and robust multiview stereopsis. Trans. PAMI, 2010. 1, 6
[12] M. Gharbi, T. Malisiewicz, S. Paris, and F. Durand. A gaussian approximation of feature space for fast image similarity. Technical Report 2012-032, MIT CSAIL, 2012. 4
[13] C. Hane, C. Zach, A. Cohen, R. Angst, and M. Pollefeys. Joint 3d scene reconstruction and class segmentation. In CVPR, 2013. 2
[14] B. Hariharan, J. Malik, and D. Ramanan. Discriminative decorrelation for clustering and classification. ECCV, 2012. 4
[15] D. Hoiem, A. Efros, and M. Hebert. Recovering surface layout from an image. IJCV, 2007. 2
[16] K. Karsch, C. Liu, and S. B. Kang. Depth extraction from video using non-parametric sampling. In ECCV, 2012. 1, 2
[17] M. Kazhdan, M. Bolitho, and H. Hoppe. Poisson surface reconstruction. In Eurographics, 2006. 2
[18] Y. Li, J. Sun, C.-K. Tang, and H.-Y. Shum. Lazy snapping. ACM ToG (SIGGRAPH), 2004. 5
[19] C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. Freeman. Sift flow: Dense correspondence across different scenes. In ECCV, 2008. 2
[20] A. Saxena, M. Sun, and A. Ng. 3-d reconstruction from sparse views using monocular vision. In ICCV workshop on Virtual Representations and Modeling of Large-scale Environments (VRML), 2007. 2
[21] A. Saxena, M. Sun, and A. Ng. Make3d: Learning 3d scene structure from a single still image. PAMI, 2009. 1, 2
Figure 8. 3D reconstruction results for four scenes, chosen from the test set for their large number of shape anchor predictions. We show the PMVS point cloud and two views of our dense reconstruction combined with the PMVS points (our final output). For each scene, we show four shape anchor transfers, selected by hand from among the top-ten highest scoring ones (that survive occlusion testing); we show one erroneous shape anchor per scene in the last row. We mark significant errors, two per scene, with a red circle. We encourage readers to view our video fly-throughs, since it is difficult to perceive reconstruction errors in a static image.
Panel labels: Frame, Anchor, Match, PMVS, Ours+PMVS, Image from database.
Figure 9. Training with similar scenes. When our training set contains sequences from the same apartment complex, the result is a particularly dense reconstruction.
[22] S. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski. A comparison and evaluation of multi-view stereo reconstruction algorithms. In CVPR, 2006. 2, 3, 6
[23] A. Shrivastava, T. Malisiewicz, A. Gupta, and A. Efros. Data-driven visual similarity for cross-domain image matching. ACM ToG (SIGGRAPH Asia), 2011. 3
[24] S. Singh, A. Gupta, and A. Efros. Unsupervised discovery of midlevel discriminative patches. In ECCV, 2012. 2
[25] N. Snavely, S. Seitz, and R. Szeliski. Photo tourism: exploring photo collections in 3d. In ACM ToG (SIGGRAPH), 2006. 6
[26] C. Strecha, W. von Hansen, L. Van Gool, P. Fua, and U. Thoennessen. On benchmarking camera calibration and multi-view stereo for high resolution imagery. In CVPR, 2008. 2
[27] A. Torralba and W. Freeman. Properties and applications of shape recipes. In CVPR, 2003. 2
[28] J. Xiao, A. Owens, and A. Torralba. SUN3D: A database of big spaces reconstructed using SfM and object labels. In ICCV, 2013. 1, 5