cvpr cvpr2013 cvpr2013-110 cvpr2013-110-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sid Yingze Bao, Manmohan Chandraker, Yuanqing Lin, Silvio Savarese
Abstract: We present a dense reconstruction approach that overcomes the drawbacks of traditional multiview stereo by incorporating semantic information in the form of learned category-level shape priors and object detection. Given training data comprised of 3D scans and images of objects from various viewpoints, we learn a prior comprised of a mean shape and a set of weighted anchor points. The former captures the commonality of shapes across the category, while the latter encodes similarities between instances in the form of appearance and spatial consistency. We propose robust algorithms to match anchor points across instances that enable learning a mean shape for the category, even with large shape variations across instances. We model the shape of an object instance as a warped version of the category mean, along with instance-specific details. Given multiple images of an unseen instance, we collate information from 2D object detectors to align the structure from motion point cloud with the mean shape, which is subsequently warped and refined to approach the actual shape. Extensive experiments demonstrate that our model is general enough to learn semantic priors for different object categories, yet powerful enough to reconstruct individual shapes with large variations. Qualitative and quantitative evaluations show that our framework can produce more accurate reconstructions than alternative state-of-the-art multiview stereo systems.
[1] S. Bao and S. Savarese. Semantic structure from motion. In CVPR, pages 2025–2032, 2011.
[2] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. PAMI, 24(4):509–522, 2002.
[3] A. Berg, T. Berg, and J. Malik. Shape matching and object recognition using low distortion correspondences. In CVPR, volume 1, pages 26–33, 2005.
[4] V. Blanz and T. Vetter. A morphable model for the synthesis of 3D faces. In SIGGRAPH, pages 187–194, 1999.
[5] F. L. Bookstein. Principal warps: Thin-plate splines and the decomposition of deformations. PAMI, 11(6):567–585, 1989.
[6] H. Chui and A. Rangarajan. A new point matching algorithm for non-rigid registration. CVIU, 89(2–3):114–141, 2003.
[7] P. Cignoni, C. Rocchini, and R. Scopigno. Metro: Measuring error on simplified surfaces. CGF, 17:167–174, 1998.
[8] T. Cootes, C. Taylor, D. Cooper, and J. Graham. Active shape models: Their training and application. CVIU, 61(1):38–59, 1995.
[9] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, pages 886–893, 2005.
[10] I. Dryden and K. Mardia. Statistical Shape Analysis. John Wiley and Sons, 1998.
[11] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. PAMI, 32(9):1627–1645, 2010.
[12] V. Ferrari, F. Jurie, and C. Schmid. Accurate object detection with deformable shape models learnt from images. In CVPR, pages 1–8, 2007.
[13] Y. Furukawa, B. Curless, S. Seitz, and R. Szeliski. Manhattan-world stereo. In CVPR, pages 1422–1429, 2009.
[14] Y. Furukawa, B. Curless, S. Seitz, and R. Szeliski. Towards internet-scale multi-view stereo. In CVPR, pages 1434–1441, 2010.
[15] Y. Furukawa and J. Ponce. Accurate, dense and robust multiview stereopsis. PAMI, 32(8):1362–1376, 2010.
[16] D. Gallup, J.-M. Frahm, and M. Pollefeys. Piecewise planar and non-planar stereo for urban scene reconstruction. In CVPR, pages 1418–1425, 2010.
[17] M. Goesele, J. Ackermann, S. Fuhrmann, R. Klowsky, F. Langguth, P. Mücke, and M. Ritz. Scene reconstruction from community photo collections. IEEE Computer, 43:48–53, 2010.
[18] C. Hernández and G. Vogiatzis. Shape from photographs: A multi-view stereo pipeline. In Computer Vision, volume 285 of Studies in Comp. Intell., pages 281–311. Springer, 2010.
[19] T. Jiang, F. Jurie, and C. Schmid. Learning shape prior models for object matching. In CVPR, pages 848–855, 2009.
[20] M. Kazhdan, M. Bolitho, and H. Hoppe. Poisson surface reconstruction. In SGP, pages 61–70, 2006.
Figure 11. Examples of reconstructed objects: (a) Sample Image, (b) MVS Patches [15], (c) MVS + PSR [20], (d) Our Method, (e) Ground Truth. Notice the lack of texture and presence of specularities in sample images (a). MVS reconstruction from 48 images using the method of [14] produces clearly visible holes and extremely noisy reconstructed patches (b). Poisson surface reconstruction fails to produce a reasonable mesh under such scenarios (c). Our semantic framework, on the other hand, yields a high quality reconstruction (d), which closely resembles the ground truth (e), both visually and quantitatively. The results are obtained by using 48 images for cars and fruits, and 5 images for keyboards.
[21] V. Kolmogorov and R. Zabih. Multi-camera scene reconstruction via graph cuts. In ECCV, pages 82–96, 2002.
[22] B. Leibe, A. Leonardis, and B. Schiele. An implicit shape model for combined object categorization and segmentation. In Toward Category-Level Object Recognition, volume 4170 of LNCS, pages 508–524. Springer, 2006.
[23] M. J. Leotta and J. L. Mundy. Predicting high resolution image edges with a generic, adaptive, 3D vehicle model. In CVPR, pages 1311–1318, 2009.
[24] B. Munsell, P. Dalal, and S. Wang. Evaluating shape correspondence for statistical shape analysis: A benchmark study. PAMI, 30(11):2023–2039, 2008.
[25] M. Pauly, N. J. Mitra, J. Giesen, M. Gross, and L. J. Guibas. Example-based 3D scan completion. In SGP, pages 23–32, 2005.
[26] K. Rohr, H. S. Stiehl, R. Sprengel, W. Beil, T. M. Buzug, J. Weese, and M. H. Kuhn. Point-based elastic registration of medical image data using approximating thin-plate splines. In Int. Conf. on Vis. in Biomed. Comp., pages 297–306, 1996.
[27] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski. A comparison and evaluation of multi-view stereo reconstruction algorithms. In CVPR, pages 519–528, 2006.
[28] M. Sun, G. Bradski, B.-X. Xu, and S. Savarese. Depth-encoded hough voting for coherent object detection, pose estimation, and shape recovery. In ECCV, pages 658–671, 2010.
[29] A. Thomas, V. Ferrari, B. Leibe, T. Tuytelaars, and L. Van Gool. Depth-from-recognition: Inferring meta-data by cognitive feedback. In ICCV, pages 1–8, 2007.
[30] G. Vogiatzis, C. Hernandez, P. Torr, and R. Cipolla. Multiview stereo via volumetric graph-cuts and occlusion robust photoconsistency. PAMI, 29(12):2241–2246, 2007.
[31] C. Wu, S. Agarwal, B. Curless, and S. Seitz. Schematic surface reconstruction. In CVPR, pages 1498–1505, 2012.
[32] Y. Xiang and S. Savarese. Estimating the aspect layout of object categories. In CVPR, pages 3410–3417, 2012.