cvpr cvpr2013 cvpr2013-1 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena, Tsuhan Chen
Abstract: 3D volumetric reasoning is important for truly understanding a scene. Humans are able to both segment each object in an image, and perceive a rich 3D interpretation of the scene, e.g., the space an object occupies, which objects support other objects, and which objects would, if moved, cause other objects to fall. We propose a new approach for parsing RGB-D images using 3D block units for volumetric reasoning. The algorithm fits image segments with 3D blocks, and iteratively evaluates the scene based on block interaction properties. We produce a 3D representation of the scene based on jointly optimizing over segmentations, block fitting, supporting relations, and object stability. Our algorithm incorporates the intuition that a good 3D representation of the scene is the one that fits the data well, and is a stable, self-supporting (i.e., one that does not topple) arrangement of objects. We experiment on several datasets including controlled and real indoor scenarios. Results show that our stability-reasoning framework improves RGB-D segmentation and scene volumetric representation.
Reference: text
sentIndex sentText sentNum sentScore
1 We produce a 3D representation of the scene based on jointly optimizing over segmentations, block fitting, supporting relations, and object stability. [sent-13, score-0.736]
2 Our algorithm inputs RGB-D data, performs 3D box fitting of proposed object segments, and extracts box representation features for scene reasoning, such as box intersection and stability inference. [sent-25, score-1.759]
3 (c) A 3D bounding box is fit to the 3D point clouds of each segment, and several features are extracted for reasoning about stability. [sent-33, score-0.742]
4 (d) The segmentation is updated based on the stability analysis and it produces a better segmentation and a stable box representation. [sent-35, score-0.91]
5 Reasoning about stability brings physics into our model, and encourages more plausible segmentations and block arrangements (see Fig. [sent-40, score-0.596]
6 This evaluation of the box representation allows us to refine the segmentation based on these box properties through a learning process. [sent-48, score-0.81]
7 We experiment on several datasets, from a synthetic block dataset to the NYU dataset of room scenes, and a new Supporting Object Dataset (SOD) with various configurations and supporting relations. [sent-49, score-0.691]
8 Furthermore, the algorithm provides a 3D volumetric model of the scene, and high-level information related to stability and support. [sent-51, score-0.47]
9 Novel features based on box representation and stability reasoning. [sent-55, score-0.771]
10 A new supporting objects dataset including human segmentation and support information. [sent-59, score-0.767]
11 Segments in outdoor scenes are represented by one of eight predefined box types that represent a box viewed from various positions. [sent-67, score-0.781]
12 In this work, we use RGB-D data and fit boxes with depth information for volumetric and stability reasoning. [sent-74, score-0.755]
13 In this way, segmentation and supporting inference are transformed into a classification problem in a 2. [sent-81, score-0.658]
14 However, in this paper, we perform a more general analysis of the 3D objects in the scene through box fitting and stability reasoning. [sent-90, score-0.968]
15 However, reasoning about support and stability are two different things. [sent-95, score-0.697]
16 We use stability reasoning to verify whether a given volumetric representation of a scene could actually support itself without toppling, and adjust the segmentation accordingly. [sent-97, score-0.89]
17 We use a simple model for evaluating the stability of our block arrangements, although more complicated physics-based simulators [1] could be employed. [sent-98, score-0.515]
18 Our approach for stability evaluation is based on a simple Newtonian model: the center of gravity of each adjacent object subset must project within its region of support. [sent-103, score-0.565]
19 (a) A bounding box fit based on minimum volume may not be a good representation for RGB-D images, where only partially observed 3D data is available. [sent-107, score-0.611]
20 (b) A better fit box will not only enclose a small volume, but also have many points near the box surface. [sent-108, score-0.861]
21 First, we fit a 3D bounding box to each segment in the 3D pointcloud. [sent-114, score-0.583]
22 Next, we compute features between boxes and propose supporting relations, perform stability reasoning, and adjust the box orientation based on the supporting surfaces. [sent-115, score-2.181]
23 Single box fitting RGB-D data is observed from only one viewpoint, and fitting 3D bounding boxes with minimum volumes [3] may fail. [sent-120, score-0.97]
24 A minimum volume box covers all the data points but might not give the correct orientation of the object, and fails to represent the object well. [sent-123, score-0.595]
25 A well-fit box should have many 3D points near box surfaces, as shown in Fig. [sent-124, score-0.79]
26 Minimum surface distance The orientation of a 3D bounding box is determined by two perpendicular normal vectors (the third normal is perpendicular to these two vectors). [sent-131, score-0.699]
27 The idea is to find the two principal orientations of the 3D bounding box so that the 3D points are as close as possible to the box surfaces. [sent-132, score-0.902]
28 The minimum volume is determined by finding the extent of the 3D points given the box orientation. [sent-146, score-0.505]
29 Note that there are usually noisy depth points: If a segment mistakenly includes a few points from other segments before or behind, it can lead to a large increase of the box volume. [sent-147, score-0.604]
30 ral times and the best fitting box (smallest distance ? [sent-154, score-0.501]
31 Visibility We identify which box surfaces are visible to the camera. [sent-159, score-0.48]
32 If the objects in the scene are mostly convex, then most 3D points should belong to the visible box surfaces instead of hidden faces. [sent-160, score-0.588]
33 We define the positive normal direction of a surface as the normal pointing away from the box center, and then a surface is visible if the camera center lies at its positive direction. [sent-164, score-0.695]
34 Given the camera position and a proposed bounding box, we determine the visible surfaces of the box, shown as a solid parallel black line to the box surface. [sent-167, score-0.574]
35 (b) With a better box fit, most of the points lie on the visible surfaces of the two boxes. [sent-169, score-0.514]
36 , the two books in the image, then the new box fit to the segment is likely to intersect with neighboring boxes, e. [sent-174, score-0.611]
37 Pairwise box interaction We examine the two pairwise relations between nearby boxes: box intersection, and box support. [sent-179, score-1.35]
38 Ideally, a box fit to an object should contain the object’s depth points, and not intrude into neighboring boxes. [sent-183, score-0.562]
39 If a proposed merging of two segments produces a box that intersects with many other boxes, it is likely an incorrect merge. [sent-184, score-0.573]
40 We explicitly compute the box intersection, and the minimum separation distance between box pairs and direction. [sent-187, score-0.82]
41 Extending this algorithm to 3D bounding boxes is straightforward: since three surface orientations of a box are orthogonal to one another, we examine a plane parallel [sent-193, score-0.853]
42 θsep is used when determining the pairwise supporting relations between boxes. [sent-201, score-0.789]
43 To classify supporting relations, we detect the ground and compute the ground orientation following [23]. [sent-208, score-0.696]
44 (a) to (c): three different supporting relations: (a) surface on-top support (black arrow); (b) partial on-top support (red arrow); (c) side support (blue arrow). [sent-215, score-1.07]
45 Different supporting relations give different supporting areas, plotted as red dashed circles. [sent-216, score-1.362]
46 (d) to (e): stability reasoning: (d) considering only the top two boxes, the center of gravity (black dashed line) intersects the supporting area (red dashed circle), and the configuration appears (locally) stable. [sent-217, score-1.228]
47 (e) When proceeding further down, the new center of gravity does not intersect the supporting area, and the configuration is found to be unstable. [sent-218, score-0.742]
48 (f) to (g) supporting area with multi-support: (f) one object can be supported by multiple other objects. [sent-219, score-0.623]
49 (g) The supporting area projected on the ground is the convex hull of all the supporting areas. [sent-220, score-1.269]
50 Reasoning about stability requires that we compute centers of mass for object volumes, and determine areas of support (i. [sent-226, score-0.583]
51 We use an object’s supporting relation to find the supporting area projected on the ground, and different supporting relations provide different supporting areas. [sent-230, score-2.542]
52 For “surface on-top” support, we project the vertices of the two 3D bounding boxes to the ground, compute the convex hull for each projection, and use their intersection area on the ground plane as the supporting area. [sent-231, score-1.273]
53 For “partial on-top” and “side” support, we assume there is only one edge touching between two boxes, and project this touching edge on the ground plane as the supporting area. [sent-232, score-0.77]
54 Examples of the supporting areas are shown as red dashed circles in Fig. [sent-233, score-0.616]
55 Global stability Box stability is a global property: boxes can appear to be fully supported locally, but still be in a globally unstable configuration. [sent-236, score-1.005]
56 We perform a top-down stability reasoning by iteratively examining the current gravity center and supporting areas. [sent-239, score-1.315]
57 We begin with the top box by finding the box center of mass, and check whether its gravity projection intersects the supporting area. [sent-243, score-1.484]
58 If so, we mark the current box stable, and proceed to another box beneath for reasoning. [sent-244, score-0.788]
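59 Following the constant density assumption, the center of mass Pc = [x, y, z] for a set of boxes is calculated as the volume-weighted average of the box centers, where Vi is the volume of box i and its center is written here as P_{c,i}: $P_c = \frac{\sum_i V_i P_{c,i}}{\sum_i V_i}$ [sent-245, score-0.68]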
60 If we find that the current supporting area does not support the center of mass, we label the current box unstable, shown in Fig. [sent-250, score-1.113]
61 For the set of boxes with multiple supports, we compute the convex hull of the multi-supporting areas as the combined supporting area, shown in Fig. [sent-252, score-0.823]
62 We trim these unnecessary supporting relations by examining the support relations in the order: surface on-top, partial on-top and side support. [sent-256, score-1.264]
63 Box fitting: Stability reasoning and supporting relations are used to refine the orientation of a box. [sent-260, score-1.007]
64 If the box is fully supported through a “surface on-top” relation, then we refit the 3D bounding box of the object on top, confining the rotation of the first principal surface S1 to be the same as the supporting surface. [sent-261, score-1.548]
65 We repeat the supporting relation inference and stability reasoning with the re-fitted boxes. [sent-262, score-1.253]
66 This improves the box representation and support interpretation of the scene. [sent-263, score-0.541]
67 We extract a set of features x based on the box fitting, pairwise box relation, and the global stability, shown in Table 1. [sent-267, score-0.799]
68 For example, for a merge move, we record the minimum surface distances of two neighboring boxes before merging (2 dimensions, noted as B), and the minimum surface distance of the box after merging (1 dimension, noted as A), as well [sent-268, score-1.114]
69 During testing, we greedily merge the neighboring segments based on the output prediction of the regression f, fit a new bounding box for the segment, perform stability reasoning, and re-extract the features for regression. [sent-274, score-0.981]
70 Splitting and Merging with MCMC In this Section, we improve our model (Stability) from Section 7 by introducing an energy function with unary and pairwise terms based on the volumetric boxes, their support relations, and stability (MCMC). [sent-277, score-0.643]
71 i,j where φ(si) is a regression score of a segment si describing the quality of the segment when compared with the groundtruth, and it is learned using single box features and its stability. [sent-286, score-0.508]
72 ψ(si, sj) is a regression score of two neighboring boxes learned using pairwise box features and their support relations. [sent-287, score-0.753]
73 We start with an initial segmentation, and move to a new set of segmentations by either: (a) merging two neighboring segments into one; or (b) splitting one segment into two smaller segments based on the boundary beliefs from [11]. [sent-295, score-0.497]
74 Experiments We experiment on three datasets: a block dataset, a supporting object dataset, and a dataset of indoor scenes [23]. [sent-299, score-0.786]
75 The following algorithms are compared: Min-vol: the baseline algorithm from [3] of fitting a minimum volume bounding box. [sent-309, score-0.663]
76 Min-surf: the proposed box fitting algorithm of finding the minimum surface distance. [sent-310, score-0.686]
77 Supp-surf: use our proposed algorithm Min-surf to find the initial boxes, and adjust the orientation of the box based on the supporting relations and stability. [sent-311, score-1.209]
78 We compare the orientation of the bounding box from each algorithm to the ground-truth, and calculate the average angle difference. [sent-312, score-0.508]
79 1 shows that our proposed minimum surface distance provides a better box fit than the minimum volume criterion, reducing the errors in angle to 40%. [sent-314, score-0.779]
80 With stability reasoning, the fitting decreases error by another 15%. [sent-315, score-0.516]
81 We compare with the ground truth supporting relations, [sent-317, score-0.604]
82 Three different types of the supporting relations are colored in black (surface-top), red (partial-top), and blue (side). [sent-334, score-0.746]
83 and count an object as correct if all its supporting objects are predicted. [sent-336, score-0.638]
84 We compare our proposed algorithm (stability) that reasons about the stability of each block and deletes the false supporting relations with the baseline (neighbor) that assumes one block is supported by its neighbors, i. [sent-337, score-1.331]
85 However, our proposed stability reasoning improves the supporting relation accuracy by an absolute 10%, achieving over 90% of accuracy. [sent-342, score-1.244]
86 Exemplar images of the predicted supporting relations are shown in Fig. [sent-343, score-0.746]
87 For each object, we manually label the segment and the other objects supporting it. [sent-349, score-0.674]
88 First, we measure the prediction of the supporting relations with the ground truth segmentation. [sent-352, score-0.777]
89 The results of using the baseline (neighbor) and our stability reasoning (stability) are shown in Table. [sent-353, score-0.986]
90 In this dataset with irregular shaped objects and complicated support configurations, using the touching neighbors to infer supporting [sent-355, score-0.792]
91 We qualitatively show our box fitting algorithm (left) on daily objects with ground-truth image segmentation and the supporting relation prediction after stability reasoning (right). [sent-362, score-1.312]
92 12 presents the exemplar results of our box fitting and support prediction from the supporting object dataset. [sent-368, score-1.207]
93 Then we add our features using the single and pairwise box relations (S/P), and our full feature set with stability reasoning (stability) with the model proposed in Section 7. [sent-372, score-1.187]
94 Reasoning about each object as a box gives around 4% boost in segmentation accuracy, and adding the stability features further improves the performance by 2%. [sent-376, score-0.902]
95 Segmentation and box fitting results of our proposed algorithm on the testing images. [sent-381, score-0.449]
96 We qualitatively present the box fitting and supporting inference result with ground-truth segmentation in Fig. [sent-390, score-1.159]
97 We begin with box fitting on partially observed 3D point clouds, and then introduce pairwise box interaction features. [sent-394, score-0.922]
98 We explore global stability reasoning on proposed box representations of a scene. [sent-395, score-0.971]
99 Stability reasoning allows us to improve reasoning about supporting relations (by requiring enough support to provide stability for each object) and improve box orientation (by knowing when objects are fully or partially supported from below). [sent-397, score-2.118]
100 Experiments show that our proposed algorithm works in synthetic scenarios as well as real world scenes, and leads to improvements in box fitting, support detection, and segmentation. [sent-398, score-0.509]
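The stability test described in sentences 18, 49, 57 and 59 above (volume-weighted center of mass, projected onto the ground and checked against the supporting area) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names (center_of_mass, convex_hull, is_stable) are hypothetical, gravity is assumed to act along the negative z axis so that projecting to the ground means dropping z, and the supporting area is assumed to be given as a set of 2D ground-plane points whose convex hull has non-zero area. Edge-only supports ("partial on-top" and "side") would need separate handling.

```python
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def center_of_mass(centers: Sequence[Point3], volumes: Sequence[float]) -> Point3:
    """Volume-weighted average of box centers (constant density assumption)."""
    total = sum(volumes)
    return tuple(
        sum(v * c[k] for c, v in zip(centers, volumes)) / total for k in range(3)
    )


def cross(o: Point2, a: Point2, b: Point2) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point2]) -> List[Point2]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point2] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point2] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_convex(point: Point2, hull: Sequence[Point2]) -> bool:
    """True if point is inside or on the boundary of a counter-clockwise hull."""
    n = len(hull)
    if n < 3:
        # Assumption of this sketch: degenerate (edge or point) supports
        # are treated as not supporting.
        return False
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))


def is_stable(
    centers: Sequence[Point3],
    volumes: Sequence[float],
    support_points: Sequence[Point2],
) -> bool:
    """Check whether the projected center of mass lies in the supporting area.

    With several supports, pass all of their ground-plane points together so
    that the combined area is the convex hull of the individual areas.
    """
    com = center_of_mass(centers, volumes)
    return inside_convex((com[0], com[1]), convex_hull(support_points))
```

In a top-down pass, one would call is_stable first for the top box alone and then for growing sets of boxes (top box plus the one beneath it, and so on), each time with the supporting area of the lowest box in the set.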
wordName wordTfidf (topN-words)
[('supporting', 0.573), ('stability', 0.393), ('box', 0.378), ('reasoning', 0.2), ('boxes', 0.179), ('relations', 0.173), ('fitting', 0.123), ('surface', 0.121), ('support', 0.104), ('block', 0.096), ('segments', 0.092), ('gravity', 0.084), ('segmentations', 0.077), ('volumetric', 0.077), ('mcmc', 0.072), ('fit', 0.071), ('bounding', 0.069), ('merging', 0.069), ('segment', 0.065), ('surfaces', 0.064), ('minimum', 0.064), ('indoor', 0.063), ('orientation', 0.061), ('separating', 0.059), ('mass', 0.057), ('relation', 0.056), ('segmentation', 0.054), ('touching', 0.053), ('neighboring', 0.049), ('blocks', 0.049), ('intersect', 0.048), ('axis', 0.048), ('saxena', 0.047), ('pairwise', 0.043), ('dashed', 0.043), ('corne', 0.043), ('orientations', 0.043), ('intersection', 0.042), ('dmin', 0.041), ('hull', 0.041), ('unstable', 0.04), ('visible', 0.038), ('koppula', 0.038), ('scene', 0.038), ('plane', 0.038), ('interpretation', 0.037), ('center', 0.037), ('arrow', 0.037), ('side', 0.036), ('objects', 0.036), ('depth', 0.035), ('perpendicular', 0.035), ('nyu', 0.035), ('supports', 0.035), ('points', 0.034), ('sep', 0.034), ('bso', 0.034), ('intersects', 0.034), ('chair', 0.034), ('volumes', 0.034), ('beneath', 0.032), ('stable', 0.031), ('ground', 0.031), ('inference', 0.031), ('arrangements', 0.03), ('convex', 0.03), ('move', 0.03), ('object', 0.029), ('volume', 0.029), ('vertexes', 0.028), ('partial', 0.028), ('examining', 0.028), ('unnecessary', 0.028), ('world', 0.027), ('tilted', 0.027), ('sod', 0.026), ('gives', 0.026), ('complicated', 0.026), ('wall', 0.026), ('energy', 0.026), ('physical', 0.026), ('gupta', 0.026), ('scenes', 0.025), ('bleyer', 0.025), ('ijrr', 0.025), ('visibility', 0.025), ('parallel', 0.025), ('clouds', 0.024), ('occupies', 0.024), ('adjust', 0.024), ('zheng', 0.023), ('cornell', 0.023), ('cuboids', 0.023), ('manhattan', 0.023), ('splitting', 0.023), ('silberman', 0.022), ('project', 0.022), ('configurations', 0.022), ('improves', 0.022), ('area', 
0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 1 cvpr-2013-3D-Based Reasoning with Blocks, Support, and Stability
2 0.29516888 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
Author: Bo Zheng, Yibiao Zhao, Joey C. Yu, Katsushi Ikeuchi, Song-Chun Zhu
Abstract: In this paper, we present an approach for scene understanding by reasoning physical stability of objects from point cloud. We utilize a simple observation that, by human design, objects in static scenes should be stable with respect to gravity. This assumption is applicable to all scene categories and poses useful constraints for the plausible interpretations (parses) in scene understanding. Our method consists of two major steps: 1) geometric reasoning: recovering solid 3D volumetric primitives from defective point cloud; and 2) physical reasoning: grouping the unstable primitives to physically stable objects by optimizing the stability and the scene prior. We propose to use a novel disconnectivity graph (DG) to represent the energy landscape and use a Swendsen-Wang Cut (MCMC) method for optimization. In experiments, we demonstrate that the algorithm achieves substantially better performance for i) object segmentation, ii) 3D volumetric recovery of the scene, and iii) better parsing result for scene understanding in comparison to state-of-the-art methods in both public dataset and our own new dataset.
3 0.18413046 364 cvpr-2013-Robust Object Co-detection
Author: Xin Guo, Dong Liu, Brendan Jou, Mojun Zhu, Anni Cai, Shih-Fu Chang
Abstract: Object co-detection aims at simultaneous detection of objects of the same category from a pool of related images by exploiting consistent visual patterns present in candidate objects in the images. The related image set may contain a mixture of annotated objects and candidate objects generated by automatic detectors. Co-detection differs from the conventional object detection paradigm in which detection over each test image is determined one-by-one independently without taking advantage of common patterns in the data pool. In this paper, we propose a novel, robust approach to dramatically enhance co-detection by extracting a shared low-rank representation of the object instances in multiple feature spaces. The idea is analogous to that of the well-known Robust PCA [28], but has not been explored in object co-detection so far. The representation is based on a linear reconstruction over the entire data set and the low-rank approach enables effective removal of noisy and outlier samples. The extracted low-rank representation can be used to detect the target objects by spectral clustering. Extensive experiments over diverse benchmark datasets demonstrate consistent and significant performance gains of the proposed method over the state-of-the-art object co-detection method and the generic object detection methods without co-detection formulations.
4 0.18377168 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
Author: Sanja Fidler, Roozbeh Mottaghi, Alan Yuille, Raquel Urtasun
Abstract: In this paper we are interested in how semantic segmentation can help object detection. Towards this goal, we propose a novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking of those regions. Our approach allows every detection hypothesis to select a segment (including void), and scores each box in the image using both the traditional HOG filters as well as a set of novel segmentation features. Thus our model “blends” between the detector and segmentation models. Since our features can be computed very efficiently given the segments, we maintain the same complexity as the original DPM [14]. We demonstrate the effectiveness of our approach in PASCAL VOC 2010, and show that when employing only a root filter our approach outperforms Dalal & Triggs detector [12] on all classes, achieving 13% higher average AP. When employing the parts, we outperform the original DPM [14] in 19 out of 20 classes, achieving an improvement of 8% AP. Furthermore, we outperform the previous state-of-the-art on VOC’10 test by 4%.
5 0.17641062 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
Author: Luca Del_Pero, Joshua Bowdish, Bonnie Kermgard, Emily Hartley, Kobus Barnard
Abstract: We develop a comprehensive Bayesian generative model for understanding indoor scenes. While it is common in this domain to approximate objects with 3D bounding boxes, we propose using strong representations with finer granularity. For example, we model a chair as a set of four legs, a seat and a backrest. We find that modeling detailed geometry improves recognition and reconstruction, and enables more refined use of appearance for scene understanding. We demonstrate this with a new likelihood function that rewards 3D object hypotheses whose 2D projection is more uniform in color distribution. Such a measure would be confused by background pixels if we used a bounding box to represent a concave object like a chair. Complex objects are modeled using a set of reusable 3D parts, and we show that this representation captures much of the variation among object instances with relatively few parameters. We also designed specific data-driven inference mechanisms for each part that are shared by all objects containing that part, which helps make inference transparent to the modeler. Further, we show how to exploit contextual relationships to detect more objects, by, for example, proposing chairs around and underneath tables. We present results showing the benefits of each of these innovations. The performance of our approach often exceeds that of state-of-the-art methods on the two tasks of room layout estimation and object recognition, as evaluated on two benchmark data sets used in this domain. Figure caption: 1) Detailed geometric models, such as tables with legs and top (bottom left), provide better reconstructions than plain boxes (top right), when supported by image features such as geometric context [5] (top middle), or an approach to using color introduced here. 2) Non-convex models allow for complex configurations, such as a chair under a table (bottom middle). 3) 3D contextual relationships, such as chairs being around a table, allow identifying objects supported by little image evidence, like the chair behind the table (bottom right). Best viewed in color.
6 0.13193977 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
7 0.11155099 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
8 0.10705139 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
9 0.10419695 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
10 0.096599311 273 cvpr-2013-Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection
11 0.096042499 111 cvpr-2013-Dense Reconstruction Using 3D Object Shape Priors
12 0.095572203 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
13 0.094554655 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
14 0.092443675 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning
15 0.089809433 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
16 0.087049767 230 cvpr-2013-Joint 3D Scene Reconstruction and Class Segmentation
17 0.084883377 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning
18 0.084019274 197 cvpr-2013-Hallucinated Humans as the Hidden Context for Labeling 3D Scenes
19 0.082259268 425 cvpr-2013-Tensor-Based High-Order Semantic Relation Transfer for Semantic Scene Segmentation
20 0.08036954 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings
topicId topicWeight
[(0, 0.184), (1, 0.06), (2, 0.053), (3, -0.027), (4, 0.079), (5, -0.038), (6, 0.031), (7, 0.124), (8, -0.011), (9, 0.013), (10, 0.001), (11, -0.098), (12, -0.015), (13, 0.01), (14, -0.011), (15, -0.049), (16, 0.093), (17, 0.122), (18, -0.104), (19, 0.075), (20, -0.056), (21, 0.006), (22, 0.195), (23, -0.0), (24, 0.051), (25, 0.021), (26, 0.014), (27, -0.082), (28, -0.045), (29, -0.098), (30, -0.047), (31, 0.041), (32, 0.019), (33, 0.032), (34, -0.02), (35, -0.004), (36, 0.088), (37, -0.024), (38, 0.0), (39, -0.041), (40, 0.02), (41, 0.002), (42, 0.092), (43, 0.161), (44, 0.017), (45, 0.041), (46, 0.049), (47, 0.042), (48, 0.034), (49, 0.01)]
simIndex simValue paperId paperTitle
same-paper 1 0.97329915 1 cvpr-2013-3D-Based Reasoning with Blocks, Support, and Stability
2 0.77548248 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
3 0.76894087 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
Author: Bo Zheng, Yibiao Zhao, Joey C. Yu, Katsushi Ikeuchi, Song-Chun Zhu
Abstract: In this paper, we present an approach for scene understanding by reasoning about the physical stability of objects from point clouds. We utilize a simple observation that, by human design, objects in static scenes should be stable with respect to gravity. This assumption is applicable to all scene categories and poses useful constraints for the plausible interpretations (parses) in scene understanding. Our method consists of two major steps: 1) geometric reasoning: recovering solid 3D volumetric primitives from a defective point cloud; and 2) physical reasoning: grouping the unstable primitives into physically stable objects by optimizing the stability and the scene prior. We propose to use a novel disconnectivity graph (DG) to represent the energy landscape and use a Swendsen-Wang Cut (MCMC) method for optimization. In experiments, we demonstrate that the algorithm achieves substantially better performance for i) object segmentation, ii) 3D volumetric recovery of the scene, and iii) parsing results for scene understanding, in comparison to state-of-the-art methods on both a public dataset and our own new dataset.
4 0.66285437 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
Author: Byung-soo Kim, Shili Xu, Silvio Savarese
Abstract: In this paper we focus on the problem of detecting objects in 3D from RGB-D images. We propose a novel framework that explores the compatibility between segmentation hypotheses of the object in the image and the corresponding 3D map. Our framework allows us to discover the optimal location of the object using a generalization of the structural latent SVM formulation in 3D as well as the definition of a new loss function defined over the 3D space in training. We evaluate our method using two existing RGB-D datasets. Extensive quantitative and qualitative experimental results show that our proposed approach outperforms state-of-the-art methods as well as a number of baseline approaches for both 3D and 2D object recognition tasks.
5 0.65273613 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
Author: Sanja Fidler, Roozbeh Mottaghi, Alan Yuille, Raquel Urtasun
Abstract: In this paper we are interested in how semantic segmentation can help object detection. Towards this goal, we propose a novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking of those regions. Our approach allows every detection hypothesis to select a segment (including void), and scores each box in the image using both the traditional HOG filters as well as a set of novel segmentation features. Thus our model “blends” between the detector and segmentation models. Since our features can be computed very efficiently given the segments, we maintain the same complexity as the original DPM [14]. We demonstrate the effectiveness of our approach in PASCAL VOC 2010, and show that when employing only a root filter our approach outperforms the Dalal & Triggs detector [12] on all classes, achieving 13% higher average AP. When employing the parts, we outperform the original DPM [14] in 19 out of 20 classes, achieving an improvement of 8% AP. Furthermore, we outperform the previous state-of-the-art on VOC’10 test by 4%.
6 0.6510874 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
7 0.63142699 364 cvpr-2013-Robust Object Co-detection
8 0.61836106 16 cvpr-2013-A Linear Approach to Matching Cuboids in RGBD Images
9 0.61791974 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models
10 0.60526633 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning
11 0.59328824 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings
12 0.57587481 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
13 0.56262726 278 cvpr-2013-Manhattan Junction Catalogue for Spatial Reasoning of Indoor Scenes
14 0.54972851 197 cvpr-2013-Hallucinated Humans as the Hidden Context for Labeling 3D Scenes
15 0.54308158 416 cvpr-2013-Studying Relationships between Human Gaze, Description, and Computer Vision
16 0.53862059 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation
17 0.53160238 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
18 0.52608496 230 cvpr-2013-Joint 3D Scene Reconstruction and Class Segmentation
19 0.51292884 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes
20 0.49884185 145 cvpr-2013-Efficient Object Detection and Segmentation for Fine-Grained Recognition
topicId topicWeight
[(10, 0.105), (16, 0.018), (26, 0.039), (33, 0.21), (39, 0.012), (67, 0.029), (69, 0.396), (87, 0.093)]
simIndex simValue paperId paperTitle
same-paper 1 0.8877511 1 cvpr-2013-3D-Based Reasoning with Blocks, Support, and Stability
Author: Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena, Tsuhan Chen
Abstract: 3D volumetric reasoning is important for truly understanding a scene. Humans are able to both segment each object in an image, and perceive a rich 3D interpretation of the scene, e.g., the space an object occupies, which objects support other objects, and which objects would, if moved, cause other objects to fall. We propose a new approach for parsing RGB-D images using 3D block units for volumetric reasoning. The algorithm fits image segments with 3D blocks, and iteratively evaluates the scene based on block interaction properties. We produce a 3D representation of the scene based on jointly optimizing over segmentations, block fitting, supporting relations, and object stability. Our algorithm incorporates the intuition that a good 3D representation of the scene is the one that fits the data well, and is a stable, self-supporting (i.e., one that does not topple) arrangement of objects. We experiment on several datasets including controlled and real indoor scenarios. Results show that our stability-reasoning framework improves RGB-D segmentation and scene volumetric representation.
2 0.85848671 172 cvpr-2013-Finding Group Interactions in Social Clutter
Author: Ruonan Li, Parker Porfilio, Todd Zickler
Abstract: We consider the problem of finding distinctive social interactions involving groups of agents embedded in larger social gatherings. Given a pre-defined gallery of short exemplar interaction videos, and a long input video of a large gathering (with approximately-tracked agents), we identify within the gathering small sub-groups of agents exhibiting social interactions that resemble those in the exemplars. The participants of each detected group interaction are localized in space; the extent of their interaction is localized in time; and when the gallery of exemplars is annotated with group-interaction categories, each detected interaction is classified into one of the pre-defined categories. Our approach represents group behaviors by dichotomous collections of descriptors for (a) individual actions, and (b) pairwise interactions; and it includes efficient algorithms for optimally distinguishing participants from by-standers in every temporal unit and for temporally localizing the extent of the group interaction. Most importantly, the method is generic and can be applied whenever numerous interacting agents can be approximately tracked over time. We evaluate the approach using three different video collections, two that involve humans and one that involves mice.
3 0.84679872 114 cvpr-2013-Depth Acquisition from Density Modulated Binary Patterns
Author: Zhe Yang, Zhiwei Xiong, Yueyi Zhang, Jiao Wang, Feng Wu
Abstract: This paper proposes novel density modulated binary patterns for depth acquisition. Similar to Kinect, the illumination patterns do not need a projector for generation and can be emitted by infrared lasers and diffraction gratings. Our key idea is to use the density of light spots in the patterns to carry phase information. Two technical problems are addressed here. First, we propose an algorithm to design the patterns to carry more phase information without compromising the depth reconstruction from a single captured image as with Kinect. Second, since the carried phase is not strictly sinusoidal, the depth reconstructed from the phase contains a systematic error. We further propose a pixel-based phase matching algorithm to reduce the error. Experimental results show that the depth quality can be greatly improved using the phase carried by the density of light spots. Furthermore, our scheme can achieve 20 fps depth reconstruction with GPU assistance.
4 0.83744133 135 cvpr-2013-Discriminative Subspace Clustering
Author: Vasileios Zografos, Liam Ellis, Rudolf Mester
Abstract: We present a novel method for clustering data drawn from a union of arbitrary dimensional subspaces, called Discriminative Subspace Clustering (DiSC). DiSC solves the subspace clustering problem by using a quadratic classifier trained from unlabeled data (clustering by classification). We generate labels by exploiting the locality of points from the same subspace and a basic affinity criterion. A number of classifiers are then diversely trained from different partitions of the data, and their results are combined together in an ensemble, in order to obtain the final clustering result. We have tested our method with 4 challenging datasets and compared against 8 state-of-the-art methods from literature. Our results show that DiSC is a very strong performer in both accuracy and robustness, and also of low computational complexity.
5 0.83191061 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation
Author: Fuxin Li, Joao Carreira, Guy Lebanon, Cristian Sminchisescu
Abstract: In this paper we present an inference procedure for the semantic segmentation of images. Different from many CRF approaches that rely on dependencies modeled with unary and pairwise pixel or superpixel potentials, our method is entirely based on estimates of the overlap between each of a set of mid-level object segmentation proposals and the objects present in the image. We define continuous latent variables on superpixels obtained by multiple intersections of segments, then output the optimal segments from the inferred superpixel statistics. The algorithm is capable of recombining and refining initial mid-level proposals, as well as handling multiple interacting objects, even from the same class, all in a consistent joint inference framework by maximizing the composite likelihood of the underlying statistical model using an EM algorithm. In the PASCAL VOC segmentation challenge, the proposed approach obtains high accuracy and successfully handles images of complex object interactions.
6 0.81685472 231 cvpr-2013-Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment
7 0.80589366 392 cvpr-2013-Separable Dictionary Learning
9 0.7315405 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment
10 0.70385379 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
11 0.66895485 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
12 0.66773313 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
13 0.662637 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models
14 0.6592291 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
15 0.65822947 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
16 0.65700728 282 cvpr-2013-Measuring Crowd Collectiveness
17 0.6526655 132 cvpr-2013-Discriminative Re-ranking of Diverse Segmentations
18 0.65022779 402 cvpr-2013-Social Role Discovery in Human Events
19 0.64433366 364 cvpr-2013-Robust Object Co-detection
20 0.64432335 19 cvpr-2013-A Minimum Error Vanishing Point Detection Approach for Uncalibrated Monocular Images of Man-Made Environments