Authors: Sid Yingze Bao, Manmohan Chandraker, Yuanqing Lin, Silvio Savarese

Abstract: We present a dense reconstruction approach that overcomes the drawbacks of traditional multiview stereo by incorporating semantic information in the form of learned category-level shape priors and object detection. Given training data comprised of 3D scans and images of objects from various viewpoints, we learn a prior comprised of a mean shape and a set of weighted anchor points. The former captures the commonality of shapes across the category, while the latter encodes similarities between instances in the form of appearance and spatial consistency. We propose robust algorithms to match anchor points across instances that enable learning a mean shape for the category, even with large shape variations across instances. We model the shape of an object instance as a warped version of the category mean, along with instance-specific details. Given multiple images of an unseen instance, we collate information from 2D object detectors to align the structure from motion point cloud with the mean shape, which is subsequently warped and refined to approach the actual shape. Extensive experiments demonstrate that our model is general enough to learn semantic priors for different object categories, yet powerful enough to reconstruct individual shapes with large variations. Qualitative and quantitative evaluations show that our framework can produce more accurate reconstructions than alternative state-of-the-art multiview stereo systems.

1 Given training data comprised of 3D scans and images of objects from various viewpoints, we learn a prior comprised of a mean shape and a set of weighted anchor points. [sent-2, score-0.969]

2 We propose robust algorithms to match anchor points across instances that enable learning a mean shape for the category, even with large shape variations across instances. [sent-4, score-1.256]

3 We model the shape of an object instance as a warped version of the category mean, along with instance-specific details. [sent-5, score-0.317]

4 Given multiple images of an unseen instance, we collate information from 2D object detectors to align the structure from motion point cloud with the mean shape, which is subsequently warped and refined to approach the actual shape. [sent-6, score-0.301]

5 We propose a framework for semantic dense reconstruction that learns a category-level shape prior, which is used with weighted warping and refinement mechanisms to reconstruct regularized, high-quality 3D shapes. [sent-29, score-0.641]

6 This paper presents a framework for dense 3D reconstruction that overcomes the drawbacks of traditional MVS by leveraging semantic information in the form of object detection and shape priors learned from a database of training images and 3D shapes. [sent-31, score-0.592]

7 We postulate in Section 3 that while object instances within a category might have very different shapes and appearances, they share certain similarities at a semantic level. [sent-40, score-0.312]

8 We model semantic similarity as a shape prior, which consists of a set of automatically learned anchor points across several instances, along with a learned mean shape that captures the shared commonality of the entire category. [sent-42, score-1.379]

9 In the learning phase (Section 4), the anchor points encode attributes such as frequency, appearance and location similarity of features across instances. [sent-44, score-0.798]

10 Based on matched anchor points, the shape prior for a category is determined by a series of weighted thin-plate spline (TPS) warps over the scans of training objects. [sent-46, score-1.094]

11 Our reconstruction phase (Section 5) starts with a point cloud obtained by applying a structure-from-motion (SFM) or MVS system to images of an unseen instance (with a shape different from training objects). [sent-47, score-0.434]

12 This guides the process of matching anchor points (shown by green stars in the right panel of Figure 2) between the learned prior and the test object’s SFM point cloud, followed by a warping of the prior shape in order to closely resemble the true shape. [sent-49, score-1.326]

13 Finer details not captured by the shape prior may be recovered by a refinement step, using guidance from SFM or MVS output. [sent-50, score-0.295]

14 The refinement combines confidence scores from anchor points and photoconsistency in order to produce a regularized, high quality output shape. [sent-51, score-0.909]

15 This paper provides a framework to augment traditional multiview stereo (MVS) reconstruction methods with semantic information. [sent-56, score-0.318]

16 A set of example shapes is used by active shape models (ASM) to encode patterns of variability, thereby ensuring a fitted shape consistent with deformations observed in training [8]. [sent-63, score-0.389]

17 Subsequent works on statistical shape analysis [10] allow nonrigid TPS warps between shapes [5], but often require landmark identification and initial rigid alignment based on point distributions, which is not feasible for general scenes [24]. [sent-66, score-0.393]

18 We use semantic information, namely object detection for localization and anchor point matching, to overcome those drawbacks. [sent-67, score-0.841]

19 Learned anchor points yield confidence scores, which guide our deformation process through a weighted TPS [26]. [sent-68, score-0.797]

20 Morphable models in 3D demonstrate realistic shape recovery, but are limited to categories like faces with low shape variation that can be accurately modeled with a linear PCA basis [4]. [sent-69, score-0.32]

21 By exploiting semantics in the form of object detection and anchor point matching, we handle both greater shape variation and noisy, incomplete, image-based MVS inputs. [sent-72, score-0.929]

22 Determining correspondence across instances with varying shape is a key step in shape matching. [sent-74, score-0.418]

23 The demands on correspondences for 3D reconstruction are far higher than for 2D shape matching: competing factors like high localization accuracy, stringent outlier rejection and good density are all crucial to obtaining a high quality dense reconstruction. [sent-79, score-0.384]

24 However, the complexity of 3D shapes and the accuracy demands of 3D reconstruction necessitate far greater control over the deformation process, so we consider it advantageous to compute priors in the mesh space. [sent-86, score-0.324]

25 Our Model We assume that for each object category, there exists a prior that consists of a 3D mean shape S∗ that captures the commonality of shapes across all instances and a set of anchor points A that captures similarities between subsets of instances. [sent-88, score-1.321]

26 The shape of any particular object Si is a transformation of S∗, plus specific details Δi not shared by other instances: Si = T({S∗, A}, θi) + Δi, (1) where T is a warping (transformation) function and θi is the warping parameter that is unique to each object instance. [sent-89, score-0.506]

27 We leverage certain reliable features associated with the shape prior, which we call anchor points. [sent-93, score-0.848]

28 Anchor points form the backbone of our framework, since they are representative of object shape and the relative importance of different object structures. [sent-94, score-0.322]

29 Anchor points with high weights, ω, are considered stable in terms of location and appearance, and thus, more representative of object shape across instances. [sent-95, score-0.311]

30 1, we detail the mechanism of learning anchor points from training data. [sent-98, score-0.768]

31 In particular, prior work on shape matching [2, 19] has demonstrated inspiring results using regularized thin-plate spline (TPS) transformations [5] to capture deformations. [sent-101, score-0.315]

32 i}, i = 1, · · · , n, be two sets of anchor points for object instances O and O′. [sent-103, score-0.809]

33 Semantic information of this nature is determined automatically in our framework by the anchor point learning mechanism. [sent-123, score-0.728]

34 To incorporate semantic information from anchor points, in the form of a weight matrix $W = \mathrm{diag}(\omega_1, \cdots, \omega_n)$, we use an extension of TPS [26]: $(K + n\lambda W^{-1})\beta + \Phi\alpha = x'$ [sent-124, score-0.76]
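The weighted TPS system quoted above can be illustrated with a small NumPy sketch. This is only a sketch under stated assumptions, not the authors' implementation: the kernel is taken to be U(r) = -r (a common choice for 3D thin-plate splines), the usual TPS side condition Φᵀβ = 0 is assumed, and the function names `fit_weighted_tps` and `apply_tps` are hypothetical.

```python
import numpy as np

def fit_weighted_tps(src, dst, weights, lam=1e-3):
    """Solve (K + n*lam*W^-1) beta + Phi alpha = dst, with Phi^T beta = 0.

    src, dst: (n, 3) matched anchor locations; weights: (n,) positive anchor weights.
    Assumption: kernel U(r) = -r and the standard TPS side condition.
    """
    n, d = src.shape
    K = -np.linalg.norm(src[:, None, :] - src[None, :, :], axis=2)
    Phi = np.hstack([np.ones((n, 1)), src])           # affine basis [1, x, y, z]
    W_inv = np.diag(1.0 / np.asarray(weights, dtype=float))
    A = np.zeros((n + d + 1, n + d + 1))
    A[:n, :n] = K + n * lam * W_inv                   # low weight => looser fit at that anchor
    A[:n, n:] = Phi
    A[n:, :n] = Phi.T
    rhs = np.zeros((n + d + 1, d))
    rhs[:n] = dst
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n:]                           # beta (n, 3), alpha (4, 3)

def apply_tps(points, src, beta, alpha):
    """Warp arbitrary 3D points with the fitted spline."""
    K = -np.linalg.norm(points[:, None, :] - src[None, :, :], axis=2)
    Phi = np.hstack([np.ones((len(points), 1)), points])
    return K @ beta + Phi @ alpha
```

In this formulation a high weight ω shrinks the regularization term for that anchor, so the warp follows reliable anchors closely and relaxes around unreliable ones. When the matched anchors differ only by an affine map, the non-affine coefficients β vanish.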

35 Details specific to each object that are not captured in the shape prior are recovered by a refinement step. [sent-128, score-0.336]

36 This refinement is used in both mean shape learning and during reconstruction of a particular test object. [sent-129, score-0.414]

37 To refine a shape Si (a mesh) towards shape Sj , we compute displacements for vertices in Si. [sent-130, score-0.346]

38 The vertices of the refined shape are obtained as pik + dik and it inherits the connectivity of Si. [sent-149, score-0.415]
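The update "vertex plus displacement, same connectivity" can be sketched as follows. The displacement rule here is a deliberately simple stand-in (move each vertex towards its nearest target point), not the energy minimization the paper refers to; `refine_towards` and its `step` parameter are hypothetical names.

```python
import numpy as np

def refine_towards(vertices, faces, target_points, step=1.0):
    """Return refined vertices p_k + d_k and the unchanged face list.

    Assumption: d_k points from each vertex to its nearest target point,
    scaled by `step`. The paper computes d_k differently.
    """
    # Brute-force nearest neighbour search; adequate for small inputs.
    diffs = target_points[None, :, :] - vertices[:, None, :]      # (V, T, 3)
    nearest = np.argmin(np.linalg.norm(diffs, axis=2), axis=1)    # (V,)
    displacements = step * diffs[np.arange(len(vertices)), nearest]
    return vertices + displacements, faces                        # connectivity inherited
```

Because only vertex positions change, the same routine applies whether the target is the vertex set of another mesh or a point cloud.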

39 This is because the above mechanism can be used, with minor changes, for both mean shape learning with the shape Sj being a mesh and for reconstruction with Sj being the oriented point cloud output of MVS, as elaborated in Sections 4. [sent-151, score-0.644]

40 Learning Reconstruction Priors For each object category, we use a set of object instances {On} to learn a mean shape S∗ and a set of anchor points A. [sent-155, score-1.118]

41 They also serve as the initialization for the anchor point learning, as described in the following. [sent-160, score-0.728]

42 Learning Anchor Points An anchor point, A = {Γ, χ, ω}, consists of a feature vector Γ that describes appearance, the 3D location χ with respect to the mean shape and a scalar weight ω. [sent-163, score-0.888]

43 For cars, most anchor points are located around wheels and body corners since those parts are shared across instances. [sent-167, score-0.798]

44 For fruits, anchor points are distributed around the stem and bottom. [sent-168, score-0.768]

45 We also show image patches associated with the features of a few example anchor points. [sent-170, score-0.688]

46 For an anchor point A, if V are the indices of objects across which the corresponding SFM points are matched and Ωi are the indices of images of Oi where A is visible, the corresponding feature vector is: $\Gamma = \{\{f^i_{k_i}\}_{k_i \in \Omega_i}\}_{i \in V}$. [sent-172, score-0.901]

47 Then, the location for the anchor point is χj=|V1|? [sent-175, score-0.728]

48 The weight ω reflects “importance” of an anchor point. [sent-177, score-0.688]

49 We consider an anchor point important if it appears across many instances, with low position and appearance variance. [sent-178, score-0.758]
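To make the notion of importance concrete, the sketch below computes an anchor location as the mean of its matched 3D positions and a weight that grows with the number of instances and shrinks with position and appearance variance. The exact weight formula is not given in the text above, so the exponential form and the names `pos_scale` and `feat_scale` are purely illustrative assumptions.

```python
import numpy as np

def anchor_location_and_weight(positions, features, pos_scale=1.0, feat_scale=1.0):
    """positions: (|V|, 3) matched 3D points; features: (|V|, D) descriptors.

    Hypothetical weight: count * exp(-pos_var / pos_scale) * exp(-feat_var / feat_scale).
    """
    positions = np.asarray(positions, dtype=float)
    features = np.asarray(features, dtype=float)
    location = positions.mean(axis=0)                      # mean over matched instances
    pos_var = np.mean(np.sum((positions - location) ** 2, axis=1))
    feat_var = np.mean(np.sum((features - features.mean(axis=0)) ** 2, axis=1))
    weight = len(positions) * np.exp(-pos_var / pos_scale) * np.exp(-feat_var / feat_scale)
    return location, weight
```

Any monotone alternative would serve the same purpose; the point is only that frequent, tightly clustered, visually consistent anchors receive larger ω.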

50 In contrast to applications like shape matching, the quality of dense reconstruction is greatly affected by the order and extent of deformations. [sent-189, score-0.326]

51 Thus, the learned anchor point weights ω are crucial to the success of dense reconstruction. [sent-190, score-0.833]

52 The key precursor to learning anchor points is matching 3D points across instances, which is far from trivial. [sent-193, score-0.91]

53 Such points usually dominate an SFM point cloud, but do not generalize across instances [Algorithm 1: Learning anchor points. Set parameters δf, δp.] [sent-195, score-0.986]

54 Update. end while. Output: denser anchor point set A. [sent-213, score-0.728]

55 Learned shape prior and anchor points for keyboard category. [sent-215, score-1.018]

56 since they do not correspond to the object shape and thus may not be anchor point candidates. [sent-220, score-0.769]

57 Moreover, the density of anchor points cannot be too low, since they guide the deformation process that computes the mean shape and fits it to the 3D point cloud. [sent-221, score-1.063]

58 To ensure the robustness of anchor point matching and good density, we propose an iterative algorithm, detailed in Algorithm 1. [sent-222, score-0.76]

59 The distribution and weights of the learned anchor points are visualized in Figures 3 and 4. [sent-223, score-0.837]

60 Mean Shape Construction The learned anchor points are used to compute a mean shape for an object category. [sent-226, score-1.052]

61 Recall that we have a mapping from the set of anchor points to each instance in the training set. [sent-227, score-0.797]

62 Thus, we can warp successive shapes closer to a mean shape using the anchor points. [sent-228, score-1.041]

63 The mean shape is constructed by combining these aligned and warped shapes of different instances. [sent-229, score-0.351]

64 In our experiments, we use the weighted number of commonly matched anchor points as the similarity cue. [sent-232, score-0.831]
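One way to read the similarity cue is sketched below: for each pair of training instances, sum the weights of the anchor points matched in both. The data layout (a dict of anchor-id sets per instance) and the function names are assumptions made for illustration.

```python
def weighted_common_anchor_similarity(matched, weights):
    """matched: {instance name: set of anchor ids}; weights: {anchor id: omega}.

    Returns {(a, b): similarity} for every unordered pair of instances.
    """
    names = sorted(matched)
    sim = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            common = matched[a] & matched[b]              # anchors matched in both
            sim[(a, b)] = sum(weights[k] for k in common)
    return sim

def most_similar_pair(sim):
    """Pair to merge first when combining shapes in order of similarity."""
    return max(sim, key=sim.get)
```

Merging the most similar pair first keeps each warp small, which matches the stated concern that the order and extent of deformations affect reconstruction quality.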

65 We combine the warped shapes $T(S^i_{scan})$ following the order of merging successive branches, to eventually obtain a single shape S∗, which represents the commonality of all training instances. [sent-238, score-0.348]

66 The mean shape learning procedure is shown for a subset of the car dataset in Fig. [sent-240, score-0.304]

67 Note that S∗ is computed by using the warped training examples, where the warping maps the 3D locations of learned anchor points. [sent-242, score-0.907]

68 Thus, the prior shape is always aligned with the anchor points. [sent-243, score-0.925]

69 Two shapes aligned by anchor points are eventually combined into a single one using displacement vectors computed by minimizing (5). [sent-245, score-0.863]

70 Semantic Reconstruction with Shape Priors Given a number of images of an object O, we can reconstruct its 3D shape by warping the learned prior shape S∗ based on the estimated θ and by recovering Δ in (1) subsequently. [sent-249, score-0.614]

71 The reconstruction consists of three steps: matching anchor points, warping by anchor points, and refinement. [sent-250, score-1.658]

72 Accurately recovering warp parameters θ requires accurate matches between anchor points in S∗ and SFM points in Ssfm. [sent-251, score-0.942]

73 Initial Alignment It is conventional in shape modeling literature to compute shape alignments using Procrustes analysis or ICP [8]. [sent-255, score-0.32]

74 Matching anchor points from learned model (left) to new object (right). [sent-266, score-0.809]

75 Since we also know those for the shape prior, we can use a rigid transformation to coarsely align the prior shape and its anchor points to fit the SFM point cloud of the object. [sent-276, score-1.303]
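The coarse alignment can be sketched as applying a scale, a rotation and a translation to the prior shape and its anchor points. Here the three quantities are assumed to be already estimated (for example from detections); how they are estimated is not shown, and `yaw_rotation` and `align_prior` are hypothetical helper names. With a scale factor included, the map is strictly a similarity transform.

```python
import numpy as np

def yaw_rotation(angle_rad):
    """Rotation about the z axis; assumes pose is a single yaw angle."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def align_prior(points, scale, rotation, translation):
    """Map (N, 3) prior vertices or anchor locations into the SFM frame."""
    return scale * points @ rotation.T + translation
```

The same call is applied to both the mean shape vertices and the anchor locations so that they stay consistent after alignment.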

76 The initial alignment for a car reconstruction is shown in Figure 6. [sent-277, score-0.321]

77 Reconstruction Given a set of images I of an object with unknown shape S, we use standard SFM to recover the 3D point cloud Ssfm. [sent-281, score-0.316]

78 Our goal is to use the mean shape S∗ to produce a dense reconstruction that closely resembles S. [sent-282, score-0.366]

79 Since the initial alignment uses the object’s location, pose and scale, anchor points are likely to be aligned to 3D locations in the vicinity of their true matches. [sent-284, score-0.881]

80 Thus, the burden of identifying the point in Ssfm that corresponds to an anchor point in S∗ is reduced to a local search. [sent-285, score-0.768]

81 We use HOG features to match anchor points to SFM points. [sent-286, score-0.768]
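A local search of this kind might look like the sketch below: each anchor only considers SFM points within a radius of its aligned location and accepts the candidate whose descriptor is closest, provided it is close enough. The descriptors are plain vectors standing in for HOG features, and `radius` and `max_feat_dist` are hypothetical thresholds, not the parameters of the paper's Algorithm 2.

```python
import numpy as np

def match_anchors(anchor_xyz, anchor_feat, sfm_xyz, sfm_feat, radius, max_feat_dist):
    """Return {anchor index: SFM point index} for anchors with an acceptable match."""
    matches = {}
    for a in range(len(anchor_xyz)):
        # Spatial gate: only SFM points near the aligned anchor are candidates.
        near = np.flatnonzero(np.linalg.norm(sfm_xyz - anchor_xyz[a], axis=1) <= radius)
        if near.size == 0:
            continue
        # Appearance gate: pick the candidate with the most similar descriptor.
        feat_dist = np.linalg.norm(sfm_feat[near] - anchor_feat[a], axis=1)
        best = int(np.argmin(feat_dist))
        if feat_dist[best] <= max_feat_dist:
            matches[a] = int(near[best])
    return matches
```

Anchors without a sufficiently similar nearby point are simply left unmatched, which is one simple way to reject outliers before estimating the warp.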

82 Examples of robust anchor point matches from our algorithm are shown in Figure 7. [sent-288, score-0.762]

83 Algorithm 2: Matching anchor points. Set parameters δ1, δ2, η. [sent-289, score-0.768]

84 Warping of the shape prior with the learned anchor points matched to SFM points using Algorithm 2. [sent-340, score-1.165]

85 Note that while the shape prior represents the commonality of all instances, anchor point-based warping recovers coarse aspects of instance-specific shape, such as the back geometry of Car 2. [sent-341, score-1.082]

86 Assume S∗ is the shape prior after the initial alignment of Section 5. [sent-343, score-0.298]

87 We use the above matches between anchor points in S∗ and SFM points in Ssfm to estimate parameters θ for the weighted TPS warping (4) and obtain S? [sent-345, score-1.002]

88 Note that this warping not only reduces the alignment error from the initial detection-based alignment, but also deforms the prior to fit the actual shape of the object. [sent-347, score-0.418]

89 This refined shape is the final output of our dense reconstruction framework. [sent-361, score-0.326]

90 The efficacy of using anchor points and their learned weights is demonstrated in Table 2. [sent-386, score-0.837]

91 Using anchor points can greatly reduce the reconstruction error compared to only using object detection for alignment. [sent-387, score-0.939]

92 Learning anchor point weights further enhances the reconstruction accuracy. [sent-388, score-0.884]

93 RGD: Rigidly align mean shape to test object using matched anchor points. [sent-399, score-1.041]

94 WP: Align and warp mean shape using matched anchor points (without refinement). [sent-400, score-1.091]

95 RGD: Rigidly align the mean shape to a test object by using matched anchor points. [sent-416, score-1.041]

96 WP: Align and warp the mean shape by using matched anchor points (Section 5. [sent-417, score-1.091]

97 In contrast, we successfully learn meaningful semantic priors across shape variations and use them in our reconstruction, to produce the much higher quality reconstructions in (d), that closely resemble the ground truth (e). [sent-427, score-0.366]

98 Discussion and Future Work We have presented a comprehensive framework for dense object reconstruction that uses data-driven semantic priors to recover shape in situations unfavorable to traditional MVS. [sent-429, score-0.522]

99 Our learned priors, combined with robust anchor point matching and refinement mechanisms, are shown to produce visually high quality and quantitatively accurate results. [sent-430, score-0.887]

100 Evaluating shape correspondence for statistical shape analysis: A benchmark study. [sent-599, score-0.32]

