Authors: Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena, Tsuhan Chen

Abstract: 3D volumetric reasoning is important for truly understanding a scene. Humans are able to both segment each object in an image, and perceive a rich 3D interpretation of the scene, e.g., the space an object occupies, which objects support other objects, and which objects would, if moved, cause other objects to fall. We propose a new approach for parsing RGB-D images using 3D block units for volumetric reasoning. The algorithm fits image segments with 3D blocks, and iteratively evaluates the scene based on block interaction properties. We produce a 3D representation of the scene based on jointly optimizing over segmentations, block fitting, supporting relations, and object stability. Our algorithm incorporates the intuition that a good 3D representation of the scene is the one that fits the data well, and is a stable, self-supporting (i.e., one that does not topple) arrangement of objects. We experiment on several datasets including controlled and real indoor scenarios. Results show that our stability-reasoning framework improves RGB-D segmentation and scene volumetric representation.

1 We produce a 3D representation of the scene based on jointly optimizing over segmentations, block fitting, supporting relations, and object stability. [sent-13, score-0.736]

2 Our algorithm inputs RGB-D data, performs 3D box fitting of proposed object segments, and extracts box representation features for scene reasoning, such as box intersection and stability inference. [sent-25, score-1.759]

3 (c) A 3D bounding box is fit to the 3D point clouds of each segment, and several features are extracted for reasoning about stability. [sent-33, score-0.742]

4 (d) The segmentation is updated based on the stability analysis and it produces a better segmentation and a stable box representation. [sent-35, score-0.91]

5 Reasoning about stability brings physics into our model, and encourages more plausible segmentations and block arrangements (see Fig. [sent-40, score-0.596]

6 This evaluation of the box representation allows us to refine the segmentation based on these box properties through a learning process. [sent-48, score-0.81]

7 We experiment on several datasets, from a synthetic block dataset to the NYU dataset of room scenes, and a new Supporting Object Dataset (SOD) with various configurations and supporting relations. [sent-49, score-0.691]

8 Furthermore, the algorithm provides a 3D volumetric model of the scene, and high-level information related to stability and support. [sent-51, score-0.47]

9 Novel features based on box representation and stability reasoning. [sent-55, score-0.771]

10 A new supporting objects dataset including human segmentation and support information. [sent-59, score-0.767]

11 Segments in outdoor scenes are represented by one of eight predefined box types that represent a box viewed from various positions. [sent-67, score-0.781]

12 In this work, we use RGB-D data and fit boxes with depth information for volumetric and stability reasoning. [sent-74, score-0.755]

13 In this way, segmentation and supporting inference are transformed into a classification problem in a 2. [sent-81, score-0.658]

14 However, in this paper, we perform a more general analysis of the 3D objects in the scene through box fitting and stability reasoning. [sent-90, score-0.968]

15 However, reasoning about support and stability are two different things. [sent-95, score-0.697]

16 We use stability reasoning to verify whether a given volumetric representation of a scene could actually support itself without toppling, and adjust the segmentation accordingly. [sent-97, score-0.89]

17 We use a simple model for evaluating the stability of our block arrangements, although more complicated physics-based simulators [1] could be employed. [sent-98, score-0.515]

18 Our approach for stability evaluation is based on a simple Newtonian model: the center of gravity of each adjacent object subset must project within its region of support. [sent-103, score-0.565]

19 (a) A bounding box fit based on minimum volume may not be a good representation for RGB-D images, where only partially observed 3D data is available. [sent-107, score-0.611]

20 (b) A better fit box will not only enclose a small volume, but also have many points near the box surface. [sent-108, score-0.861]

21 First, we fit a 3D bounding box to each segment in the 3D pointcloud. [sent-114, score-0.583]

22 Next, we compute features between boxes and propose supporting relations, perform stability reasoning, and adjust the box orientation based on the supporting surfaces. [sent-115, score-2.181]

23 Single box fitting: RGB-D data is observed from only one viewpoint, and fitting 3D bounding boxes with minimum volumes [3] may fail. [sent-120, score-0.97]

24 A minimum volume box covers all the data points but might not give the correct orientation of the object, and fails to represent the object well. [sent-123, score-0.595]

25 A well-fit box should have many 3D points near box surfaces, as shown in Fig. [sent-124, score-0.79]

26 Minimum surface distance: The orientation of a 3D bounding box is determined by two perpendicular normal vectors (the third normal is perpendicular to these two vectors). [sent-131, score-0.699]

27 The idea is to find the two principal orientations of the 3D bounding box so that the 3D points are as close as possible to the box surfaces. [sent-132, score-0.902]

28 The minimum volume is determined by finding the extent of the 3D points given the box orientation. [sent-146, score-0.505]
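As a hedged illustration of the two fitting criteria discussed here, the sketch below evaluates one candidate orientation: it projects the points onto three orthonormal box axes, takes the extent along each axis to obtain the box volume, and averages each point's distance to its nearest box face. The name `box_stats`, the input layout, and the use of the nearest face (rather than, say, only visible faces) are assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    """Plain 3D dot product."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def box_stats(points: Sequence[Vec3], axes: Sequence[Vec3]) -> Tuple[float, float]:
    """Return (volume, mean distance to nearest face) for one orientation.

    `axes` must hold three orthonormal vectors (assumed, not checked).
    """
    # Coordinates of every point in the box frame.
    coords = [[dot(p, ax) for ax in axes] for p in points]
    lo = [min(c[k] for c in coords) for k in range(3)]
    hi = [max(c[k] for c in coords) for k in range(3)]
    # The tightest box for this orientation is given by the extents.
    volume = (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[2] - lo[2])
    # Distance of each point to the closest of the six faces.
    total = 0.0
    for c in coords:
        total += min(min(c[k] - lo[k], hi[k] - c[k]) for k in range(3))
    return volume, total / len(coords)
```

Calling `box_stats` for several candidate orientations and keeping the one with the smallest mean distance would mimic a minimum-surface-distance search; how candidates are generated is left open here.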

29 Note that depth points are usually noisy: if a segment mistakenly includes a few points from other segments in front of or behind it, the box volume can increase greatly. [sent-147, score-0.604]

30 ral times and the best fitting box (smallest distance ? [sent-154, score-0.501]

31 Visibility: We identify which box surfaces are visible to the camera. [sent-159, score-0.48]

32 If the objects in the scene are mostly convex, then most 3D points should belong to the visible box surfaces instead of hidden faces. [sent-160, score-0.588]

33 We define the positive normal direction of a surface as the normal pointing away from the box center; a surface is then visible if the camera center lies on its positive side. [sent-164, score-0.695]
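The visibility rule above can be sketched in a few lines. This is a minimal illustration that assumes a box is described by a center, three orthonormal axes, and half-extents along those axes; the function name `visible_faces` and the `(axis index, sign)` face encoding are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def visible_faces(center: Vec3, axes: Sequence[Vec3],
                  half_extents: Sequence[float], camera: Vec3) -> List[Tuple[int, int]]:
    """Return faces as (axis index, sign) whose outward normal faces the camera."""
    visible = []
    for k in range(3):
        for sign in (1, -1):
            # Outward normal of this face (points away from the box center).
            normal = (sign * axes[k][0], sign * axes[k][1], sign * axes[k][2])
            face_center = tuple(center[i] + normal[i] * half_extents[k] for i in range(3))
            to_camera = tuple(camera[i] - face_center[i] for i in range(3))
            # Visible when the camera lies on the positive side of the face plane.
            if dot(normal, to_camera) > 0:
                visible.append((k, sign))
    return visible
```

For a box directly in front of the camera only the face turned toward the camera is reported; moving the box sideways exposes a second face.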

34 Given the camera position and a proposed bounding box, we determine the visible surfaces of the box, shown as solid black lines parallel to the box surfaces. [sent-167, score-0.574]

35 (b) With a better box fit, most of the points lie on the visible surfaces of the two boxes. [sent-169, score-0.514]

36 , the two books in the image, then the new box fit to the segment is likely to intersect with neighboring boxes, e. [sent-174, score-0.611]

37 Pairwise box interaction: We examine two pairwise relations between nearby boxes, namely box intersection and box support. [sent-179, score-1.35]

38 Ideally, a box fit to an object should contain the object’s depth points, and not intrude into neighboring boxes. [sent-183, score-0.562]

39 If a proposed merging of two segments produces a box that intersects with many other boxes, it is likely an incorrect merge. [sent-184, score-0.573]

40 We explicitly compute the box intersection, and the minimum separation distance between box pairs and direction. [sent-187, score-0.82]

41 Extending this algorithm to 3D bounding boxes is straightforward: since three surface orientations of a box are orthogonal to one another, we examine a plane parallel [sent-193, score-0.853]
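One way to picture an intersection and separation test on oriented boxes is a separating-axis check restricted to the six face normals of the two boxes. The sketch below is an assumption-laden illustration, not the paper's exact procedure: the function names are hypothetical, and testing only face normals is a simplification that can report an overlap for some edge-to-edge configurations that a full 15-axis test would separate.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def projected_radius(axes: Sequence[Vec3], half: Sequence[float], direction: Vec3) -> float:
    """Half-width of an oriented box when projected onto a unit direction."""
    return sum(half[k] * abs(dot(axes[k], direction)) for k in range(3))


def face_axis_separation(c1: Vec3, axes1: Sequence[Vec3], h1: Sequence[float],
                         c2: Vec3, axes2: Sequence[Vec3], h2: Sequence[float]
                         ) -> Tuple[float, Optional[Vec3]]:
    """Return (gap, axis): the largest gap over the six face normals.

    gap > 0  -> boxes are separated by at least `gap` along `axis`.
    gap <= 0 -> no face normal separates them; -gap is a penetration depth.
    """
    d = (c2[0] - c1[0], c2[1] - c1[1], c2[2] - c1[2])
    best_gap, best_axis = float("-inf"), None
    for axis in list(axes1) + list(axes2):
        gap = (abs(dot(d, axis))
               - projected_radius(axes1, h1, axis)
               - projected_radius(axes2, h2, axis))
        if gap > best_gap:
            best_gap, best_axis = gap, axis
    return best_gap, best_axis
```

The returned gap and axis could serve as separation-distance and separation-direction features for a pair of neighboring boxes.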

42 θsep is used when determining the pairwise supporting relations between boxes. [sent-201, score-0.789]

43 To classify supporting relations, we detect the ground and compute the ground orientation following [23]. [sent-208, score-0.696]

44 (a) to (c): three different supporting relations: (a) surface on-top support (black arrow); (b) partial on-top support (red arrow); (c) side support (blue arrow). [sent-215, score-1.07]

45 Different supporting relations give different supporting areas, plotted as red dashed circles. [sent-216, score-1.362]

46 (d) to (e): stability reasoning: (d) considering only the top two boxes, the center of gravity (black dashed line) intersects the supporting area (red dashed circle), and the configuration appears (locally) stable. [sent-217, score-1.228]

47 (e) When proceeding further down, the new center of gravity does not intersect the supporting area, and the configuration is found to be unstable. [sent-218, score-0.742]

48 (f) to (g): supporting area with multiple supports: (f) one object can be supported by multiple other objects. [sent-219, score-0.623]

49 (g) The supporting area projected on the ground is the convex hull of all the supporting areas. [sent-220, score-1.269]

50 Reasoning about stability requires that we compute centers of mass for object volumes, and determine areas of support (i. [sent-226, score-0.583]

51 We use an object’s supporting relation to find the supporting area projected on the ground, and different supporting relations provide different supporting areas. [sent-230, score-2.542]

52 For “surface on-top” support, we project the vertices of the two 3D bounding boxes to the ground, compute the convex hull for each projection, and use their intersection area on the ground plane as the supporting area. [sent-231, score-1.273]

53 For “partial on-top” and “side” support, we assume there is only one edge touching between two boxes, and project this touching edge on the ground plane as the supporting area. [sent-232, score-0.77]
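For the "surface on-top" case, the projection, convex hull, and intersection steps can be sketched as follows. The sketch assumes the ground is the plane z = 0 (so projecting simply drops z), uses the monotone-chain hull and Sutherland-Hodgman clipping as stand-in techniques, and assumes both footprints are non-degenerate polygons; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

P2 = Tuple[float, float]


def cross(o: P2, a: P2, b: P2) -> float:
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[P2]) -> List[P2]:
    """Monotone-chain convex hull, counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[P2] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[P2] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def _cut(a: P2, b: P2, p: P2, q: P2) -> P2:
    """Point where segment pq crosses the line through a and b."""
    dp, dq = cross(a, b, p), cross(a, b, q)
    t = dp / (dp - dq)
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def clip(subject: List[P2], clipper: List[P2]) -> List[P2]:
    """Intersect two convex CCW polygons (Sutherland-Hodgman)."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        for j in range(len(inp)):
            p, q = inp[j - 1], inp[j]
            p_in, q_in = cross(a, b, p) >= 0, cross(a, b, q) >= 0
            if q_in:
                if not p_in:
                    output.append(_cut(a, b, p, q))
                output.append(q)
            elif p_in:
                output.append(_cut(a, b, p, q))
        if not output:
            break
    return output


def polygon_area(poly: Sequence[P2]) -> float:
    """Shoelace formula (absolute area)."""
    s = sum(poly[i - 1][0] * poly[i][1] - poly[i][0] * poly[i - 1][1] for i in range(len(poly)))
    return abs(s) / 2.0


def surface_on_top_support(top_vertices, bottom_vertices) -> List[P2]:
    """Supporting area: intersection of the two ground-plane footprints."""
    top = convex_hull([(x, y) for x, y, _ in top_vertices])
    bottom = convex_hull([(x, y) for x, y, _ in bottom_vertices])
    return clip(top, bottom)
```

For the "partial on-top" and "side" cases the supporting area degenerates to a projected edge, which this sketch does not cover.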

54 Examples of the supporting areas are shown as red dashed circles in Fig. [sent-233, score-0.616]

55 Global stability: Box stability is a global property; boxes can appear to be fully supported locally, but still be in a globally unstable configuration. [sent-236, score-1.005]

56 We perform a top-down stability reasoning by iteratively examining the current gravity center and supporting areas. [sent-239, score-1.315]

57 We begin with the top box by finding the box center of mass, and check whether its gravity projection intersects the supporting area. [sent-243, score-1.484]

58 If so, we mark the current box stable, and proceed to another box beneath for reasoning. [sent-244, score-0.788]

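59 Following the constant density assumption, the center of mass Pc = [x, y, z] for a set of boxes is calculated as the volume-weighted average of the individual box centers, with Vi the volume of box i: $P_c = \left(\sum_i V_i P_{c,i}\right) / \sum_i V_i$, where $P_{c,i}$ denotes the center of box $i$. [sent-245, score-0.68]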
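Under the constant-density assumption, mass is proportional to volume, so the joint center of mass of a set of boxes is the volume-weighted mean of the individual box centers. A minimal sketch, with a hypothetical function name and a `(center, volume)` input layout chosen for illustration:

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def center_of_mass(boxes: Sequence[Tuple[Vec3, float]]) -> Vec3:
    """Volume-weighted mean of box centers; `boxes` holds (center, volume) pairs."""
    total_volume = sum(volume for _, volume in boxes)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return tuple(
        sum(volume * center[k] for center, volume in boxes) / total_volume
        for k in range(3)
    )
```

For example, a unit-volume box at the origin and a box of volume 3 centered at x = 2 give a joint center at x = 1.5.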

60 If we find that the current supporting area does not support the center of mass, we label the current box unstable, shown in Fig. [sent-250, score-1.113]

61 For the set of boxes with multiple supports, we compute the convex hull of the multi-supporting areas as the combined supporting area, shown in Fig. [sent-252, score-0.823]
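The top-down procedure can be sketched for a simple stack of boxes. The sketch assumes gravity points along the negative z axis, constant density, and that each level comes with a convex, counter-clockwise supporting polygon on the ground plane (for multiple supports this would be the convex hull of the individual areas). The data layout and the name `check_stack` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

P2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def inside_convex(point: P2, polygon: Sequence[P2]) -> bool:
    """True if `point` lies inside or on a convex counter-clockwise polygon."""
    n = len(polygon)
    for i in range(n):
        a, b = polygon[i], polygon[(i + 1) % n]
        turn = (b[0] - a[0]) * (point[1] - a[1]) - (b[1] - a[1]) * (point[0] - a[0])
        if turn < 0:
            return False
    return True


def check_stack(stack: Sequence[Tuple[Vec3, float, List[P2]]]) -> Tuple[bool, Optional[int]]:
    """Walk a stack from top to bottom.

    Each entry is (box center, box volume, supporting polygon beneath that box).
    Returns (True, None) if every level is stable, else (False, index of the
    first level whose accumulated center of mass leaves its supporting area).
    """
    total_volume = 0.0
    weighted = [0.0, 0.0]
    for index, (center, volume, support) in enumerate(stack):
        # Accumulate the center of mass of all boxes examined so far.
        total_volume += volume
        weighted[0] += volume * center[0]
        weighted[1] += volume * center[1]
        projected = (weighted[0] / total_volume, weighted[1] / total_volume)
        if not inside_convex(projected, support):
            return False, index
    return True, None
```

A stack in which each box overhangs the one below can pass the check for the top box alone and still fail once the box beneath is included, which is the locally stable but globally unstable situation described above.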

62 We trim these unnecessary supporting relations by examining the support relations in the order: surface on-top, partial on-top and side support. [sent-256, score-1.264]

63 Box fitting: Stability reasoning and supporting relations are used to refine the orientation of a box. [sent-260, score-1.007]

64 If the box is fully supported through a “surface on-top” relation, then we refit the 3D bounding box of the object on top, confining the rotation of the first principal surface S1 to be the same as the supporting surface. [sent-261, score-1.548]

65 We repeat the supporting relation inference and stability reasoning with the re-fitted boxes. [sent-262, score-1.253]

66 This improves the box representation and support interpretation of the scene. [sent-263, score-0.541]

67 We extract a set of features x based on the box fitting, pairwise box relation, and the global stability, shown in Table 1. [sent-267, score-0.799]

68 For example, for a merge move, we record the minimum surface distances of two neighboring boxes before merging (2 dimensions, noted as B), and the minimum surface distance of the box after merging (1 dimension, noted as A), as well [sent-268, score-1.114]

69 During testing, we greedily merge the neighboring segments based on the output prediction of the regression f, fit a new bounding box for the segment, perform stability reasoning, and re-extract the features for regression. [sent-274, score-0.981]
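The greedy test-time loop can be sketched as follows. This is an illustrative skeleton only: the `score` callable stands in for the learned regression f together with the box refitting, stability reasoning, and feature re-extraction (it is re-evaluated on the merged segments at every pass), and the stopping `threshold` is an assumption that does not come from the paper.

```python
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple


def greedy_merge(segments: Dict[Hashable, Set],
                 adjacency: Iterable[Tuple[Hashable, Hashable]],
                 score: Callable[[Set, Set], float],
                 threshold: float = 0.5) -> Dict[Hashable, Set]:
    """Repeatedly merge the best-scoring neighboring pair above `threshold`."""
    segs = {key: set(value) for key, value in segments.items()}
    adj = {frozenset(edge) for edge in adjacency}
    while True:
        best = None
        # Sorted iteration keeps tie-breaking deterministic.
        for edge in sorted(adj, key=sorted):
            a, b = sorted(edge)
            s = score(segs[a], segs[b])  # hypothetical stand-in for f
            if s > threshold and (best is None or s > best[0]):
                best = (s, a, b)
        if best is None:
            return segs
        _, a, b = best
        # Merge b into a and redirect b's neighbors to a.
        segs[a] |= segs.pop(b)
        rewired = set()
        for edge in adj:
            new_edge = frozenset(a if x == b else x for x in edge)
            if len(new_edge) == 2:
                rewired.add(new_edge)
        adj = rewired
```

In a full pipeline the score of a candidate merge would depend on the box fit to the merged segment and on its stability, which is why the score is recomputed after each merge rather than cached.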

70 Splitting and Merging with MCMC: In this section, we improve our model (Stability) from Section 7 by introducing an energy function with unary and pairwise terms based on the volumetric boxes, their support relations, and stability (MCMC). [sent-277, score-0.643]

71 where φ(si) is a regression score of a segment si describing the quality of the segment when compared with the ground truth, and it is learned using single box features and its stability. [sent-286, score-0.508]

72 ψ(si, sj) is a regression score of two neighboring boxes learned using pairwise box features and their support relations. [sent-287, score-0.753]

73 We start with an initial segmentation, and move to a new set of segmentations by either: (a) merging two neighboring segments into one; or (b) splitting one segment into two smaller segments based on the boundary beliefs from [11]. [sent-295, score-0.497]

74 Experiments: We experiment on three datasets, namely a block dataset, a supporting object dataset, and a dataset of indoor scenes [23]. [sent-299, score-0.786]

75 The following algorithms are compared: Min-vol: the baseline algorithm from [3] of fitting minimum volume bounding boxes. [sent-309, score-0.663]

76 Min-surf: the proposed box fitting algorithm of finding the minimum surface distance. [sent-310, score-0.686]

77 Supp-surf: use our proposed algorithm Min-surf to find the initial boxes, and adjust the orientation of the box based on the supporting relations and stability. [sent-311, score-1.209]

78 We compare the orientation of the bounding box from each algorithm to the ground-truth, and calculate the average angle difference. [sent-312, score-0.508]
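A hedged sketch of such an orientation metric follows. It assumes each box orientation is summarized by one normal vector and that a normal and its negation describe the same surface, so the angle is taken up to sign; how axes are matched between predicted and ground-truth boxes is not specified in this excerpt, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def mean_angle_error_deg(predicted: Sequence[Vec3], ground_truth: Sequence[Vec3]) -> float:
    """Average angle in degrees between paired normals, ignoring their sign."""
    total = 0.0
    for p, g in zip(predicted, ground_truth):
        inner = p[0] * g[0] + p[1] * g[1] + p[2] * g[2]
        norms = math.sqrt((p[0] ** 2 + p[1] ** 2 + p[2] ** 2)
                          * (g[0] ** 2 + g[1] ** 2 + g[2] ** 2))
        # Clamp to guard against rounding slightly above 1.
        cosine = min(1.0, abs(inner) / norms)
        total += math.degrees(math.acos(cosine))
    return total / len(predicted)
```

With this convention a prediction that is flipped by 180 degrees counts as a perfect match, and a perpendicular prediction counts as a 90 degree error.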

79 1 shows that our proposed minimum surface distance provides a better box fitting compared to the minimum volume criterion, reducing the errors in angle to 40%. [sent-314, score-0.779]

80 With stability reasoning, the fitting decreases error by another 15%. [sent-315, score-0.516]

81 We compare with the ground truth supporting relations, [sent-317, score-0.604]

82 Three different types of the supporting relations are colored in black (surface-top), red (partial-top), and blue (side). [sent-334, score-0.746]

83 and count an object as correct if all its supporting objects are predicted. [sent-336, score-0.638]
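The per-object counting rule can be sketched as below. The excerpt does not say whether extra predicted supporters make an object incorrect, so the sketch exposes both readings through a hypothetical `exact` flag; the function name and the dictionary layout are assumptions as well.

```python
from typing import Dict, Hashable, Set


def support_accuracy(ground_truth: Dict[Hashable, Set],
                     predicted: Dict[Hashable, Set],
                     exact: bool = False) -> float:
    """Fraction of objects whose supporting objects are predicted correctly.

    exact=False: correct if every true supporter is among the predictions.
    exact=True:  correct only if the predicted set equals the true set.
    """
    correct = 0
    for obj, supporters in ground_truth.items():
        guess = predicted.get(obj, set())
        if exact:
            correct += guess == supporters
        else:
            correct += supporters <= guess
    return correct / len(ground_truth)
```

Under the lenient reading a baseline that predicts every touching neighbor as a supporter is never penalized for false supports, so the strict reading may be the more informative one when comparing against such a baseline.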

84 We compare our proposed algorithm (stability) that reasons about the stability of each block and deletes the false supporting relations with the baseline (neighbor) that assumes one block is supported by its neighbors, i. [sent-337, score-1.331]

85 However, our proposed stability reasoning improves the supporting relation accuracy by an absolute 10%, achieving over 90% of accuracy. [sent-342, score-1.244]

86 Exemplar images of the predicted supporting relations are shown in Fig. [sent-343, score-0.746]

87 For each object, we manually label the segment and the other objects supporting it. [sent-349, score-0.674]

88 First, we measure the prediction of the supporting relations with the ground truth segmentation. [sent-352, score-0.777]

89 The results of the baseline (neighbor) and our stability reasoning (stability) are shown in Table. [sent-353, score-0.986]

90 In this dataset with irregularly shaped objects and complicated support configurations, using the touching neighbors to infer supporting [sent-355, score-0.792]

91 We qualitatively show our box fitting algorithm (left) on daily objects with ground-truth image segmentation and the supporting relation prediction after stability reasoning (right). [sent-362, score-1.312]

92 12 presents the exemplar results of our box fitting and support prediction from the supporting object dataset. [sent-368, score-1.207]

93 Then we add our features using the single and pairwise box relations (S/P), and our full feature set with stability reasoning (stability) with the model proposed in Section 7. [sent-372, score-1.187]

94 Reasoning about each object as a box gives around 4% boost in segmentation accuracy, and adding the stability features further improves the performance by 2%. [sent-376, score-0.902]

95 Segmentation and box fitting results of our proposed algorithm on the testing images. [sent-381, score-0.449]

96 We qualitatively present the box fitting and supporting inference result with ground-truth segmentation in Fig. [sent-390, score-1.159]

97 We begin with box fitting on partially observed 3D point clouds, and then introduce pairwise box interaction features. [sent-394, score-0.922]

98 We explore global stability reasoning on proposed box representations of a scene. [sent-395, score-0.971]

99 Stability reasoning allows us to improve reasoning about supporting relations (by requiring enough support to provide stability for each object) and improve box orientation (by knowing when objects are fully or partially supported from below). [sent-397, score-2.118]

100 Experiments show that our proposed algorithm works in synthetic scenarios as well as real world scenes, and leads to improvements in box fitting, support detection, and segmentation. [sent-398, score-0.509]

