cvpr cvpr2013 cvpr2013-61 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Bo Zheng, Yibiao Zhao, Joey C. Yu, Katsushi Ikeuchi, Song-Chun Zhu
Abstract: In this paper, we present an approach for scene understanding by reasoning about the physical stability of objects from point clouds. We utilize a simple observation that, by human design, objects in static scenes should be stable with respect to gravity. This assumption is applicable to all scene categories and poses useful constraints on the plausible interpretations (parses) in scene understanding. Our method consists of two major steps: 1) geometric reasoning: recovering solid 3D volumetric primitives from a defective point cloud; and 2) physical reasoning: grouping the unstable primitives into physically stable objects by optimizing the stability and the scene prior. We propose a novel disconnectivity graph (DG) to represent the energy landscape and a Swendsen-Wang Cut (MCMC) method for optimization. In experiments, we demonstrate that the algorithm achieves substantially better performance for i) object segmentation, ii) 3D volumetric recovery of the scene, and iii) scene parsing, in comparison to state-of-the-art methods on both a public dataset and our own new dataset.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper, we present an approach for scene understanding by reasoning about the physical stability of objects from point clouds. [sent-8, score-0.821]
2 Our method consists of two major steps: 1) geometric reasoning: recovering solid 3D volumetric primitives from a defective point cloud; and 2) physical reasoning: grouping the unstable primitives into physically stable objects by optimizing the stability and the scene prior. [sent-11, score-1.727]
3 We propose to use a novel disconnectivity graph (DG) to represent the energy landscape and use a Swendsen-Wang Cut (MCMC) method for optimization. [sent-12, score-0.447]
4 Such representations lack important physical information, such as the 3D volume of objects, supporting relations, stability, and affordances, which are critical for robotics applications: grasping, manipulation, and navigation. [sent-18, score-0.302]
5 In this paper, we present an approach for reasoning about the physical stability of 3D volumetric objects reconstructed from either a depth image captured by a range camera or a large-scale point cloud scene reconstructed by SLAM techniques. [sent-20, score-1.259]
6 1) Geometric reasoning: recovering solid 3D volumetric primitives from defective point cloud. [sent-32, score-0.618]
7 (d) shows the 3D primitives in rectangular or cylindrical shapes. [sent-39, score-0.244]
8 2) Physical reasoning: grouping the primitives into physically stable objects by optimizing the stability and the scene prior. [sent-40, score-0.704]
9 We build a contact graph for the neighborhood relations of the primitives as shown in Fig. [sent-41, score-0.556]
10 For example, the lamp on the desk is originally divided into 3 primitives that would fall under gravity (see the result simulated using a physics engine), and becomes stable when they are grouped into one object: the lamp. [sent-44, score-0.772]
11 To achieve the physical reasoning goal, we make the following novel contributions in comparison to the most recent work dealing with physical space reasoning [8, 16]. [sent-46, score-0.930]
12 • We define the physical stability function explicitly by studying the minimum energy (physical work) needed to change the pose and position of a primitive (or object). [sent-47, score-0.737]
13 (a) 3D scene reconstructed by SLAM technique, (b) point cloud as Input. [sent-117, score-0.27]
14 In geometric reasoning, (c) a portion is shown segmented by a split-and-merge approach, with missing voxels, and (d) solid primitives by volumetric completion. [sent-118, score-0.588]
15 In physical reasoning, (e) the contact graph is labeled through stability optimization. [sent-119, score-0.698]
16 • We introduce the disconnectivity graph (DG) from physics (spin-glass models) to represent the energy landscapes. [sent-123, score-0.473]
17 Our approach for geometry reasoning is related to a set of segmentation methods (e. [sent-132, score-0.251]
18 Most of the existing methods are focused on classifying point clouds for object category recognition, not for 3D volumetric completion. [sent-135, score-0.376]
19 [1] extracts 3D geometric primitives (planes or cylinders) from 3D mesh. [sent-137, score-0.295]
20 In comparison, our method is more faithful to the original geometric shape of objects in the point cloud data. [sent-138, score-0.294]
21 [7] also performed volumetric reasoning with the Manhattan-world assumption on the problem of multi-view stereo. [sent-143, score-0.466]
22 In comparison, our volumetric reasoning is based on complex point cloud data and provides more accurate 3D physical properties, e. [sent-144, score-0.932]
23 The vision community has studied physical properties based on a single image for the “block world” over the past three decades [3, 8, 9, 21, 15, 14]. [sent-150, score-0.255]
24 [3] studied human sensitivity of objects that violate certain physical relations. [sent-154, score-0.287]
25 Our goal of inferring physical relations is most closely related to Gupta et al. [sent-155, score-0.316]
26 [8] who infer volumetric shapes, occlusion, and support relations in outdoor scenes inspired by physical reasoning from a 2D image, and Silberman et al. [sent-156, score-0.782]
27 [10] showed that knowledge of Newtonian principles and probabilistic representations are generally applied for human physical reasoning, and the intuitive physics model is an important perspective for human-level complex scene understanding. [sent-164, score-0.47]
28 However, to the best of our knowledge, there is little work that mathematically defines intuitive physics models for real scene understanding. [sent-165, score-0.215]
29 Four types of voxels are estimated: invisible voxels (light green), empty voxels (white), surface voxels (red and blue dots), and the voxels filled in the invisible space (colored square in light red or blue). [sent-170, score-1.769]
30 Geometric reasoning Given a point cloud of a scene, the goal of geometric reasoning is to infer the object primitives (e. [sent-173, score-0.958]
31 1(d)), such that each primitive can have physical properties (e. [sent-176, score-0.385]
32 We infer the object primitives in two major steps: 1) point cloud segmentation and 2) volumetric completion. [sent-180, score-0.528]
33 Segmentation with implicit algebraic models We first adopt implicit algebraic models (IAMs) [4] to separate point cloud into several simple surfaces. [sent-183, score-0.345]
34 We adopt a split-and-merge strategy: 1) splitting the point cloud into simple and smooth regions by IAM fitting, and then 2) merging the regions which are “convexly” connected to each other. [sent-184, score-0.290]
35 (a), suppose the 2D point cloud is first split into three line segments with first-order IAM fitting: f1, f2 and f3, and then f2 and f3 are merged together, since they are “convexly” connected. [sent-187, score-0.211]
36 For splitting the point cloud into pieces, we adopt a region-growing scheme [18]. [sent-197, score-0.286]
37 For any line segment L whose two ends are in two connected regions with IAM fits fi and fj respectively, if the points on this line, {∀pl | pl ∈ L}, satisfy fi(pl) < 0 and fj(pl) < 0, then we say regions i and j are convexly connected. [sent-209, score-0.285]
38 2 (a), we first randomly sample several line points (in dark dotted lines) between connected regions, and then check whether they satisfy the convexly-connected relationship defined above. [sent-211, score-0.229]
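As a concrete reading of this test, the sketch below samples interior points on line segments bridging two regions and checks the sign of both IAM fits; the `pts_i`/`pts_j` interfaces, the sample counts, and the use of callables for the fitted IAMs are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def convexly_connected(f_i, f_j, pts_i, pts_j, n_samples=10, n_line_pts=5, rng=None):
    """Sketch of the convexity test: regions i and j count as convexly
    connected when sampled interior points of segments bridging them lie
    inside BOTH fitted surfaces, i.e. f_i(p) < 0 and f_j(p) < 0."""
    rng = rng or np.random.default_rng(0)
    for _ in range(n_samples):
        a = pts_i[rng.integers(len(pts_i))]     # endpoint in region i
        b = pts_j[rng.integers(len(pts_j))]     # endpoint in region j
        for t in np.linspace(0.0, 1.0, n_line_pts + 2)[1:-1]:
            p = (1 - t) * a + t * b             # interior point of segment L
            if f_i(p) >= 0 or f_j(p) >= 0:      # point sticks outside a surface
                return False
    return True
```

A merge decision would then simply group two adjacent segments whenever this test passes on their boundary samples.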
39 Volumetric space completion To obtain the physical properties for each object primitive (e. [sent-217, score-0.457]
40 Thus, we complete each surface segment into a volumetric (voxel-based) primitive under three assumptions: a) Occlusion assumption: voxels occluded by the observed point cloud could be parts of objects. [sent-221, score-0.992]
41 Voxel generation and gravity direction We first generate voxels for each segment obtained by the above point cloud segmentation by 1) detecting the Manhattan axes [7], 2) constructing voxels from the point cloud along the Manhattan axes by an octree construction method [19], and 3) detecting the gravity direction. [sent-226, score-1.384]
42 However this invisible space is very helpful for completing the missing voxels from occlusion. [sent-230, score-0.434]
43 Inspired by Furukawa’s method in [7], the Manhattan space is carved by the point cloud into three spaces (as shown in Figure 2(b)): Object surface S (colored-dots voxels), Invisible space U (light green voxels) and Visible space E (white voxels). [sent-231, score-0.268]
44 We complete an object primitive from each labeled surface segment. [sent-233, score-0.219]
45 Supposing each convex surface segment is the visible part of a primitive, we complete the invisible part by filling voxels in the visual hull occluded by the surface, under two assumptions: 1) since light travels in straight lines, the completed voxels are behind the point cloud, as shown in Fig. [sent-234, score-0.925]
46 Therefore our algorithm can be simply described as: Loop: for each invisible voxel vi ∈ U, i= 1, 2, . [sent-237, score-0.256]
47 along the six directions of the Manhattan axes, to collect the six nearest surface voxels {vj ∈ S} (j ≤ 6). [sent-243, score-0.328]
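A minimal sketch of this filling loop follows, assuming a dense occupancy grid with integer labels for invisible, surface, and empty space; the label encoding, the marching length, and the "supported on at least two axes" threshold are assumptions for illustration, not the authors' rule.

```python
import numpy as np

# Assumed label encoding for the carved Manhattan space:
U, S, E = 0, 1, 2  # invisible, surface, empty/visible

DIRS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def complete_volume(grid, max_steps=32):
    """For each invisible voxel, march along the six Manhattan directions;
    count a 'hit' when a surface voxel is reached before empty space or the
    grid border. Fill the voxel if enough directions are supported."""
    filled = np.zeros(grid.shape, dtype=bool)
    for x, y, z in np.argwhere(grid == U):
        hits = 0
        for dx, dy, dz in DIRS:
            cx, cy, cz = x, y, z
            for _ in range(max_steps):
                cx, cy, cz = cx + dx, cy + dy, cz + dz
                if not (0 <= cx < grid.shape[0] and 0 <= cy < grid.shape[1]
                        and 0 <= cz < grid.shape[2]):
                    break
                if grid[cx, cy, cz] == S:
                    hits += 1
                    break
                if grid[cx, cy, cz] == E:
                    break
        if hits >= 2:  # assumed threshold: surface support on >= 2 directions
            filled[x, y, z] = True
    return filled
```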
48 Energy landscapes A 3D object (or primitive) has a potential energy defined by gravity and its state (pose and center) supported by neighboring object in 3D space. [sent-248, score-0.471]
49 The object is said to be in equilibrium when its current state is a local minimum (stable) or local maximum (unstable) of this potential function (See Fig 4 for illustration). [sent-249, score-0.235]
50 (e.g., a natural disturbance), and then the object moves to a new equilibrium and releases energy. [sent-252, score-0.221]
51 3, the chair in (a) is in a stable equilibrium, and its pose is changed with external work to raise its center of mass. [sent-256, score-0.305]
52 We define the energy change needed for the state change x0 → x1 by Er(x0 → x1) = (Rc − t1) · mg, [sent-257, score-0.216]
53 (c) The landscape of potential energy is calculated by Eq. [sent-266, score-0.247]
54 (3) over two rotation angles, where x0 is a local minimum and x1 is a saddle point, passing which the chair falls into a deeper energy basin (blue). [sent-267, score-0.331]
55 where R is the rotation matrix; c is the center of mass; g = (0, 0, 1)T is the gravity direction; and t1 is the lowest contact point on the support region (its legs). [sent-268, score-0.365]
56 We visualize the energy landscape on the sphere (φ, θ) : S2 → R in Fig. [sent-269, score-0.247]
57 Such energy can be computed for any rigid object by bounding it with a convex hull. [sent-275, score-0.224]
58 Imagine a cup on a desk at a stable equilibrium state x0; one can push it to the edge of the table. [sent-278, score-0.487]
59 Then it falls to the ground and releases energy to reach a deeper minimum state x1. [sent-279, score-0.258]
60 The energy change needed to move the cup is Et(x0 → x1) = (c − t) · mg − f, (4) where t ∈ R3 is the translation parameter (the shortest distance to the edge), and f is the work spent overcoming friction. [sent-280, score-0.270]
61 Therefore the energy landscape can be viewed as a map from 3D space R3 → R. [sent-287, score-0.247]
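Read literally, Eqs. (3) and (4) can be transcribed as below; this is a sketch under the symbol definitions given in the surrounding sentences (R, c, t1, g, m, and a scalar friction work f), not the authors' code.

```python
import numpy as np

g = np.array([0.0, 0.0, 1.0])  # gravity direction, as in Eq. (3)

def rotation_energy(R, c, t1, m):
    """E_r(x0 -> x1) = (R c - t1) . m g: work to raise the center of mass c
    about the lowest contact point t1 under rotation R (Eq. (3))."""
    return float((R @ c - t1) @ (m * g))

def translation_energy(c, t, m, friction_work):
    """E_t(x0 -> x1) = (c - t) . m g - f: energy released by a fall of the
    center of mass, minus the friction work f spent pushing (Eq. (4))."""
    return float((c - t) @ (m * g)) - friction_work
```

Evaluating `rotation_energy` over a grid of rotations is what produces the spherical energy landscape visualized in the figure.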
62 In both cases, we observe that object stability is only a local property of the energy landscape. [sent-288, score-0.224]
63 Disconnectivity graph representation The energy map is continuously defined over the object position and pose. [sent-292, score-0.25]
64 For our purpose, we are only interested in how deep its energy basin is at current state (according to the current interpretation of the scene). [sent-293, score-0.251]
65 Therefore, we represent the energy landscape by a so-called disconnectivity graph (DG), which has been used in studying spin-glass models in physics [20]. [sent-294, score-0.560]
66 Figure 4 labels: local minimum, energy barrier, current state; (a) energy function, (b) disconnectivity graph; stable equilibrium vs. unstable equilibrium. [sent-320, score-0.802]
67 (a) Energy landscape and (b) its corresponding disconnectivity graph. [sent-321, score-0.258]
68 For the cup example, its energy barrier is the work needed (to overcome friction) to push it to the edge. [sent-334, score-0.372]
69 The stability S(a, x0, W) of an object a at state x0, in the presence of a disturbance work W, is the maximum energy that it can release when it moves out of its energy basin under the work W. [sent-350, score-0.926]
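A minimal sketch of this definition over a discretized disconnectivity graph: among neighbouring basins whose barrier can be climbed with work at most W, take the maximum energy release. The `(barrier, minimum)` pair representation of DG branches is an assumed interface, not the paper's data structure.

```python
def stability(current_energy, basins, W):
    """S(a, x0, W) sketch: `basins` is a list of (barrier_height,
    minimum_energy) pairs read off a disconnectivity graph. Return the
    maximum energy the object can release through a barrier <= W
    (0.0 if no escape is possible)."""
    release = 0.0
    for barrier, minimum in basins:
        if barrier <= W:
            release = max(release, current_energy - minimum)
    return release
```

Under this reading, a large release reachable through a small barrier marks an unstable interpretation, which the grouping step then tries to remove.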
70 We minimize over x the energy barrier and search over the easiest disturbance direction. [sent-373, score-0.257]
71 Contact graph and group labelling The contact graph is an adjacency graph G = <V, E>, where V = {v1, v2, . [sent-383, score-0.402]
72 , vk} is the set of nodes representing the 3D primitives, and E is the set of edges denoting the contact relations between the primitives. [sent-386, score-0.23]
73 These primitives are fixed into a single rigid object, denoted by Oi, and the stability is re-calculated according to Oi. [sent-393, score-0.436]
74 f3 can be calculated as the ratio between the support plane and the contact area of each pair of primitives {vj, vk ∈ Oi}, where one of them is supported by the other. [sent-400, score-0.437]
75 Inference of Maximum stability As the label of primitives are coupled with each other, we adopt the graph partition algorithm Swendsen-Wang Cut (SWC) [2] for efficient MCMC inference. [sent-410, score-0.494]
76 Here we adopt a feature using the ratio between the contact area (plane) and object planes: F = #CA / max(#Ai, #Aj), where CA is the contact area, and Ai and Aj are the areas of vi and vj on the same plane as CA. [sent-416, score-0.506]
77 5 illustrates the process of labeling a number of primitives of a table into a single object. [sent-432, score-0.244]
78 SWC starts with an initial graph in (a), and some of the sampling proposals are accepted with the probability (9) shown in (b) and (c), resulting in the energy v. [sent-433, score-0.249]
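One way to turn the contact-area feature F into an SWC edge-on probability is sketched below; the exponential form and the temperature-like parameter `lam` are hypothetical choices for illustration, not taken from the paper.

```python
import math

def edge_on_probability(area_contact, area_i, area_j, lam=1.0):
    """Assumed edge weight for Swendsen-Wang clustering: map the feature
    F = #CA / max(#A_i, #A_j) into an 'on' probability in [0, 1).
    A larger shared contact plane makes two primitives more likely to be
    grouped into the same rigid object."""
    F = area_contact / max(area_i, area_j)
    return 1.0 - math.exp(-lam * F)
```

During sampling, each contact edge would be switched on with this probability, and the resulting connected components proposed as merged objects.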
79 Segmentation accuracy comparison of three methods: the region-growing method [18], our geometric reasoning, and our physical reasoning, measured by one “Cut Discrepancy” and three “Hamming Distance” metrics. [sent-437, score-0.763]
80 2) On the other hand, when we increase the disturbance W in (5), the chair is fixed to the floor. [sent-439, score-0.23]
81 Experimental results We quantitatively evaluate our method in terms of 1) single depth image segmentation, 2) volumetric completion evaluation, 3) physical inference accuracy evaluation, and 4) intuitive physical reality (via videos in the supplementary material). [sent-441, score-0.893]
82 All these evaluations are based on three datasets: i) the NYU depth dataset V2 [16], including 1449 RGBD images with manually labeled ground truth, and ii) a set of synthesized depth maps and volumetric images simulated from CAD scene data. [sent-442, score-0.403]
83 7, our segmentation by physical reasoning has a lower error rate than the other two: region-growing segmentation [18] and our geometric reasoning. [sent-448, score-0.635]
84 6 shows some examples comparing the point cloud segmentation results of [18] and ours. [sent-450, score-0.252]
85 However, it is worth noticing that, beyond the segmentation task, our method can provide richer information such as volumetric structure, physical relations, and stability. [sent-451, score-0.587]
86 For evaluating the accuracy of volumetric completion, we densely sample point clouds from a set of CAD data including 3 indoor scenes. [sent-453, score-0.344]
87 We simulate the volumetric data (as ground truth) and depth images from a certain view (as test images). [sent-454, score-0.3]
88 We calculate the precision and recall, which evaluate the voxel overlap. (Figure caption: stable volumetric objects by physical reasoning.) [sent-455, score-0.695]
89 Comparison of three methods: 1) the voxel-based representation generated by the octree algorithm [19], 2) voxels in the surface and invisible space (sec. [sent-466, score-0.491]
90 Accuracy is measured as the number of contact-graph nodes whose labels are correctly inferred, divided by the total number of labeled nodes. [sent-476, score-0.288]
91 between the ground truth and the volumetric completion of testing data. [sent-477, score-0.296]
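The voxel-overlap precision and recall described above can be computed as follows; the boolean-grid interface is an assumption for illustration.

```python
import numpy as np

def voxel_precision_recall(pred, gt):
    """Precision/recall of a predicted occupancy grid against ground truth,
    counting overlapping occupied voxels (boolean arrays of equal shape).
    Precision = TP / |pred occupied|, recall = TP / |gt occupied|."""
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)   # guard against empty predictions
    recall = tp / max(gt.sum(), 1)        # guard against empty ground truth
    return float(precision), float(recall)
```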
92 Because the physical relations are defined in terms of our contact graph, we map the ground-truth labels to the nodes of contact graphs obtained by geometric reasoning. [sent-481, score-0.79]
93 Then we evaluate our physical reasoning against two baselines: a discriminative method using 3D feature priors similar to the one in [16], and a greedy inference method such as the matching pursuit algorithm for physical inference. [sent-482, score-0.720]
94 The unstable state is identified as one that tends to release much potential energy (falling from the sofa) while absorbing little possible energy (e. [sent-490, score-0.536]
95 Case II: In Figure 8 (d), the “air pump” stands unstably on the floor but is an independent object, because although its stability is very low, the penalty designed in Eq. [sent-493, score-0.228]
96 Case IV: Figure 8 (i) voxels under the “chair” are completed with respect to stability. [sent-499, score-0.271]
97 Conclusion We presented a novel approach for scene understanding by reasoning about stability and safety using intuitive mechanics, with the novel representations of the disconnectivity graph and the disturbance field. [sent-504, score-0.906]
98 (j): hidden voxels under chair compared to (h). [sent-510, score-0.333]
99 Perceived object stability is affected by the internal representation of gravity. [sent-552, score-0.224]
100 Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. Advances in Neural Information Processing Systems. [sent-611, score-0.498]
wordName wordTfidf (topN-words)
[('voxels', 0.271), ('volumetric', 0.256), ('physical', 0.255), ('primitives', 0.244), ('reasoning', 0.21), ('contact', 0.193), ('stability', 0.192), ('cloud', 0.172), ('disturbance', 0.168), ('invisible', 0.163), ('energy', 0.16), ('convexly', 0.147), ('equilibrium', 0.147), ('disconnectivity', 0.142), ('iam', 0.14), ('gravity', 0.133), ('primitive', 0.13), ('oi', 0.119), ('swc', 0.118), ('physics', 0.113), ('manhattan', 0.099), ('unstable', 0.099), ('barrier', 0.097), ('stable', 0.096), ('dg', 0.089), ('landscape', 0.087), ('vj', 0.083), ('cup', 0.078), ('octree', 0.078), ('desk', 0.073), ('chair', 0.062), ('relations', 0.061), ('release', 0.061), ('scene', 0.059), ('friction', 0.058), ('landscapes', 0.058), ('graph', 0.058), ('surface', 0.057), ('voxel', 0.056), ('state', 0.056), ('pl', 0.054), ('kinect', 0.051), ('geometric', 0.051), ('clouds', 0.049), ('blane', 0.047), ('hamrick', 0.047), ('pump', 0.047), ('mass', 0.047), ('supporting', 0.047), ('fitting', 0.046), ('grouping', 0.046), ('depth', 0.044), ('intuitive', 0.043), ('furukawa', 0.042), ('defective', 0.042), ('absorption', 0.042), ('attene', 0.042), ('biederman', 0.042), ('iams', 0.042), ('releases', 0.042), ('freedom', 0.042), ('connected', 0.041), ('segmentation', 0.041), ('completion', 0.04), ('point', 0.039), ('slam', 0.039), ('zheng', 0.038), ('splitting', 0.038), ('nodes', 0.037), ('solid', 0.037), ('growing', 0.037), ('push', 0.037), ('vi', 0.037), ('newtonian', 0.037), ('penalty', 0.036), ('physically', 0.035), ('fall', 0.035), ('basin', 0.035), ('noticing', 0.035), ('segment', 0.035), ('algebraic', 0.035), ('labelling', 0.035), ('parsing', 0.035), ('iii', 0.035), ('understanding', 0.034), ('zhao', 0.034), ('parses', 0.034), ('object', 0.032), ('mg', 0.032), ('grouped', 0.032), ('occluded', 0.032), ('implicit', 0.032), ('objects', 0.032), ('gupta', 0.032), ('fj', 0.031), ('accepted', 0.031), ('cad', 0.031), ('light', 0.031), ('fitted', 0.031), ('ii', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
Author: Bo Zheng, Yibiao Zhao, Joey C. Yu, Katsushi Ikeuchi, Song-Chun Zhu
Abstract: In this paper, we present an approach for scene understanding by reasoning about the physical stability of objects from point clouds. We utilize a simple observation that, by human design, objects in static scenes should be stable with respect to gravity. This assumption is applicable to all scene categories and poses useful constraints on the plausible interpretations (parses) in scene understanding. Our method consists of two major steps: 1) geometric reasoning: recovering solid 3D volumetric primitives from a defective point cloud; and 2) physical reasoning: grouping the unstable primitives into physically stable objects by optimizing the stability and the scene prior. We propose a novel disconnectivity graph (DG) to represent the energy landscape and a Swendsen-Wang Cut (MCMC) method for optimization. In experiments, we demonstrate that the algorithm achieves substantially better performance for i) object segmentation, ii) 3D volumetric recovery of the scene, and iii) scene parsing, in comparison to state-of-the-art methods on both a public dataset and our own new dataset.
2 0.29516888 1 cvpr-2013-3D-Based Reasoning with Blocks, Support, and Stability
Author: Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena, Tsuhan Chen
Abstract: 3D volumetric reasoning is important for truly understanding a scene. Humans are able to both segment each object in an image, and perceive a rich 3D interpretation of the scene, e.g., the space an object occupies, which objects support other objects, and which objects would, if moved, cause other objects to fall. We propose a new approach for parsing RGB-D images using 3D block units for volumetric reasoning. The algorithm fits image segments with 3D blocks, and iteratively evaluates the scene based on block interaction properties. We produce a 3D representation of the scene based on jointly optimizing over segmentations, block fitting, supporting relations, and object stability. Our algorithm incorporates the intuition that a good 3D representation of the scene is the one that fits the data well, and is a stable, self-supporting (i.e., one that does not topple) arrangement of objects. We experiment on several datasets including controlled and real indoor scenarios. Results show that our stability-reasoning framework improves RGB-D segmentation and scene volumetric representation.
3 0.16213487 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
Author: Nikolaos Kyriazis, Antonis Argyros
Abstract: In several hand-object(s) interaction scenarios, the change in the objects ’ state is a direct consequence of the hand’s motion. This has a straightforward representation in Newtonian dynamics. We present the first approach that exploits this observation to perform model-based 3D tracking of a table-top scene comprising passive objects and an active hand. Our forward modelling of 3D hand-object(s) interaction regards both the appearance and the physical state of the scene and is parameterized over the hand motion (26 DoFs) between two successive instants in time. We demonstrate that our approach manages to track the 3D pose of all objects and the 3D pose and articulation of the hand by only searching for the parameters of the hand motion. In the proposed framework, covert scene state is inferred by connecting it to the overt state, through the incorporation of physics. Thus, our tracking approach treats a variety of challenging observability issues in a principled manner, without the need to resort to heuristics.
4 0.16136509 230 cvpr-2013-Joint 3D Scene Reconstruction and Class Segmentation
Author: Christian Häne, Christopher Zach, Andrea Cohen, Roland Angst, Marc Pollefeys
Abstract: Both image segmentation and dense 3D modeling from images represent an intrinsically ill-posed problem. Strong regularizers are therefore required to constrain the solutions from being ’too noisy’. Unfortunately, these priors generally yield overly smooth reconstructions and/or segmentations in certain regions whereas they fail in other areas to constrain the solution sufficiently. In this paper we argue that image segmentation and dense 3D reconstruction contribute valuable information to each other’s task. As a consequence, we propose a rigorous mathematical framework to formulate and solve a joint segmentation and dense reconstruction problem. Image segmentations provide geometric cues about which surface orientations are more likely to appear at a certain location in space whereas a dense 3D reconstruction yields a suitable regularization for the segmentation problem by lifting the labeling from 2D images to 3D space. We show how appearance-based cues and 3D surface orientation priors can be learned from training data and subsequently used for class-specific regularization. Experimental results on several real data sets highlight the advantages of our joint formulation.
5 0.16024335 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
Author: Jeremie Papon, Alexey Abramov, Markus Schoeler, Florentin Wörgötter
Abstract: Unsupervised over-segmentation of an image into regions of perceptually similar pixels, known as superpixels, is a widely used preprocessing step in segmentation algorithms. Superpixel methods reduce the number of regions that must be considered later by more computationally expensive algorithms, with a minimal loss of information. Nevertheless, as some information is inevitably lost, it is vital that superpixels not cross object boundaries, as such errors will propagate through later steps. Existing methods make use of projected color or depth information, but do not consider three dimensional geometric relationships between observed data points which can be used to prevent superpixels from crossing regions of empty space. We propose a novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three dimensional, rather than projective, space. Enforcing the constraint that segmented regions must have spatial connectivity prevents label flow across semantic object boundaries which might otherwise be violated. Additionally, as the algorithm works directly in 3D space, observations from several calibrated RGB+D cameras can be segmented jointly. Experiments on a large data set of human annotated RGB+D images demonstrate a significant reduction in occurrence of clusters crossing object boundaries, while maintaining speeds comparable to state-of-the-art 2D methods.
6 0.14435883 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
7 0.14048651 111 cvpr-2013-Dense Reconstruction Using 3D Object Shape Priors
8 0.1303045 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
9 0.11971512 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models
10 0.10883401 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
11 0.10867614 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes
12 0.10300234 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
13 0.095036298 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
15 0.089002676 289 cvpr-2013-Monocular Template-Based 3D Reconstruction of Extensible Surfaces with Local Linear Elasticity
16 0.084217191 342 cvpr-2013-Prostate Segmentation in CT Images via Spatial-Constrained Transductive Lasso
17 0.081883773 80 cvpr-2013-Category Modeling from Just a Single Labeling: Use Depth Information to Guide the Learning of 2D Models
18 0.080243923 165 cvpr-2013-Fast Energy Minimization Using Learned State Filters
19 0.078331485 357 cvpr-2013-Revisiting Depth Layers from Occlusions
20 0.075451851 117 cvpr-2013-Detecting Changes in 3D Structure of a Scene from Multi-view Images Captured by a Vehicle-Mounted Camera
topicId topicWeight
[(0, 0.198), (1, 0.112), (2, 0.042), (3, -0.016), (4, 0.06), (5, -0.047), (6, -0.004), (7, 0.112), (8, -0.034), (9, 0.012), (10, 0.054), (11, -0.034), (12, -0.057), (13, 0.048), (14, -0.021), (15, -0.046), (16, 0.05), (17, 0.157), (18, -0.056), (19, 0.072), (20, -0.056), (21, 0.039), (22, 0.033), (23, 0.041), (24, -0.016), (25, 0.025), (26, 0.08), (27, -0.12), (28, -0.063), (29, -0.01), (30, -0.098), (31, 0.063), (32, 0.033), (33, 0.001), (34, -0.029), (35, -0.007), (36, 0.115), (37, 0.002), (38, 0.009), (39, -0.104), (40, 0.008), (41, -0.027), (42, 0.053), (43, 0.042), (44, 0.104), (45, 0.057), (46, 0.005), (47, 0.008), (48, 0.01), (49, -0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.94350415 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
Author: Bo Zheng, Yibiao Zhao, Joey C. Yu, Katsushi Ikeuchi, Song-Chun Zhu
Abstract: In this paper, we present an approach for scene understanding by reasoning about the physical stability of objects from point clouds. We utilize a simple observation that, by human design, objects in static scenes should be stable with respect to gravity. This assumption is applicable to all scene categories and poses useful constraints on the plausible interpretations (parses) in scene understanding. Our method consists of two major steps: 1) geometric reasoning: recovering solid 3D volumetric primitives from a defective point cloud; and 2) physical reasoning: grouping the unstable primitives into physically stable objects by optimizing the stability and the scene prior. We propose a novel disconnectivity graph (DG) to represent the energy landscape and a Swendsen-Wang Cut (MCMC) method for optimization. In experiments, we demonstrate that the algorithm achieves substantially better performance for i) object segmentation, ii) 3D volumetric recovery of the scene, and iii) scene parsing, in comparison to state-of-the-art methods on both a public dataset and our own new dataset.
2 0.7315805 1 cvpr-2013-3D-Based Reasoning with Blocks, Support, and Stability
Author: Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena, Tsuhan Chen
Abstract: 3D volumetric reasoning is important for truly understanding a scene. Humans are able to both segment each object in an image, and perceive a rich 3D interpretation of the scene, e.g., the space an object occupies, which objects support other objects, and which objects would, if moved, cause other objects to fall. We propose a new approach for parsing RGB-D images using 3D block units for volumetric reasoning. The algorithm fits image segments with 3D blocks, and iteratively evaluates the scene based on block interaction properties. We produce a 3D representation of the scene based on jointly optimizing over segmentations, block fitting, supporting relations, and object stability. Our algorithm incorporates the intuition that a good 3D representation of the scene is the one that fits the data well, and is a stable, self-supporting (i.e., one that does not topple) arrangement of objects. We experiment on several datasets including controlled and real indoor scenarios. Results show that our stability-reasoning framework improves RGB-D segmentation and scene volumetric representation.
3 0.72801185 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models
Author: Yibiao Zhao, Song-Chun Zhu
Abstract: Indoor functional objects exhibit large view and appearance variations, and thus are difficult to recognize under the traditional appearance-based classification paradigm. In this paper, we present an algorithm to parse indoor images based on two observations: i) functionality is the most essential property to define an indoor object, e.g. “a chair to sit on”; ii) the geometry (3D shape) of an object is designed to serve its function. We formulate the nature of the object function into a stochastic grammar model. This model characterizes a joint distribution over the function-geometry-appearance (FGA) hierarchy. The hierarchical structure includes a scene category, functional groups, functional objects, functional parts and 3D geometric shapes. We use a simulated annealing MCMC algorithm to find the maximum a posteriori (MAP) solution, i.e. a parse tree. We design four data-driven steps to accelerate the search in the FGA space: i) group the line segments into 3D primitive shapes, ii) assign functional labels to these 3D primitive shapes, iii) fill in missing objects/parts according to the functional labels, and iv) synthesize 2D segmentation maps and verify the current parse tree by the Metropolis-Hastings acceptance probability. The experimental results on several challenging indoor datasets demonstrate that the proposed approach not only significantly widens the scope of indoor scene parsing algorithms from segmentation and 3D recovery to functional object recognition, but also yields improved overall performance.
4 0.68807477 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
Author: Wongun Choi, Yu-Wei Chao, Caroline Pantofaru, Silvio Savarese
Abstract: Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification. We present a hierarchical scene model for learning and reasoning about complex indoor scenes which is computationally tractable, can be learned from a reasonable amount of training data, and avoids oversimplification. At the core of this approach is the 3D Geometric Phrase Model which captures the semantic and geometric relationships between objects which frequently co-occur in the same 3D spatial configuration. Experiments show that this model effectively explains scene semantics, geometry and object groupings from a single image, while also improving individual object detections.
5 0.66676491 16 cvpr-2013-A Linear Approach to Matching Cuboids in RGBD Images
Author: Hao Jiang, Jianxiong Xiao
Abstract: We propose a novel linear method to match cuboids in indoor scenes using RGBD images from Kinect. Beyond depth maps, these cuboids reveal important structures of a scene. Instead of directly fitting cuboids to 3D data, we first construct cuboid candidates using superpixel pairs on an RGBD image, and then we optimize the configuration of the cuboids to satisfy the global structure constraints. The optimal configuration has low local matching costs, small object intersection and occlusion, and the cuboids tend to project to a large region in the image; the number of cuboids is optimized simultaneously. We formulate the multiple cuboid matching problem as a mixed integer linear program and solve the optimization efficiently with a branch and bound method. The optimization guarantees a globally optimal solution. Our experiments on the Kinect RGBD images of a variety of indoor scenes show that our proposed method is efficient, accurate and robust against object appearance variations, occlusions and strong clutter.
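The branch-and-bound idea in the abstract above can be shown on a drastically reduced version of the problem: pick a subset of candidate cuboids maximizing total matching score, where heavily intersecting pairs may not both be chosen. This is a sketch only; the scores, conflict pairs, and the simple sum-of-remaining upper bound are invented, not the paper's MILP formulation.

```python
# Sketch: branch and bound for a tiny 0/1 selection problem — maximize
# the total score of a conflict-free subset of cuboid candidates.
def branch_and_bound(scores, conflicts):
    n = len(scores)
    conflict = [set() for _ in range(n)]
    for a, b in conflicts:
        conflict[a].add(b)
        conflict[b].add(a)
    best = [0.0, frozenset()]  # best value found so far, and its subset

    def recurse(i, chosen, value):
        # bound: even taking every remaining candidate cannot beat best
        if value + sum(scores[i:]) <= best[0]:
            return
        if i == n:
            best[0], best[1] = value, frozenset(chosen)
            return
        if not (conflict[i] & chosen):       # branch 1: take candidate i
            chosen.add(i)
            recurse(i + 1, chosen, value + scores[i])
            chosen.remove(i)
        recurse(i + 1, chosen, value)        # branch 2: skip candidate i

    recurse(0, set(), 0.0)
    return best[0], sorted(best[1])

scores = [3.0, 2.0, 2.5, 1.0]
conflicts = [(0, 1), (2, 3)]   # cuboids 0/1 overlap, as do 2/3
print(branch_and_bound(scores, conflicts))  # (5.5, [0, 2])
```

A real MILP solver would use a linear-programming relaxation as the bound; the pruning structure is the same.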
6 0.65477473 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
7 0.61079043 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes
8 0.60380709 230 cvpr-2013-Joint 3D Scene Reconstruction and Class Segmentation
9 0.58517742 278 cvpr-2013-Manhattan Junction Catalogue for Spatial Reasoning of Indoor Scenes
10 0.58003449 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
12 0.5510155 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
13 0.5451681 197 cvpr-2013-Hallucinated Humans as the Hidden Context for Labeling 3D Scenes
14 0.53877985 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
15 0.535438 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
16 0.52488118 289 cvpr-2013-Monocular Template-Based 3D Reconstruction of Extensible Surfaces with Local Linear Elasticity
17 0.51887763 52 cvpr-2013-Axially Symmetric 3D Pots Configuration System Using Axis of Symmetry and Break Curve
18 0.51612133 440 cvpr-2013-Tracking People and Their Objects
19 0.51241332 111 cvpr-2013-Dense Reconstruction Using 3D Object Shape Priors
20 0.50816619 354 cvpr-2013-Relative Volume Constraints for Single View 3D Reconstruction
topicId topicWeight
[(10, 0.115), (16, 0.044), (26, 0.052), (28, 0.015), (33, 0.208), (39, 0.031), (47, 0.01), (67, 0.044), (69, 0.122), (74, 0.167), (87, 0.108)]
simIndex simValue paperId paperTitle
same-paper 1 0.86919326 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
Author: Bo Zheng, Yibiao Zhao, Joey C. Yu, Katsushi Ikeuchi, Song-Chun Zhu
Abstract: In this paper, we present an approach to scene understanding by reasoning about the physical stability of objects from point clouds. We utilize a simple observation that, by human design, objects in static scenes should be stable with respect to gravity. This assumption is applicable to all scene categories and poses useful constraints on the plausible interpretations (parses) in scene understanding. Our method consists of two major steps: 1) geometric reasoning: recovering solid 3D volumetric primitives from a defective point cloud; and 2) physical reasoning: grouping the unstable primitives into physically stable objects by optimizing the stability and the scene prior. We propose a novel disconnectivity graph (DG) to represent the energy landscape and a Swendsen-Wang Cut (MCMC) method for optimization. In experiments, we demonstrate that the algorithm achieves substantially better performance for i) object segmentation, ii) 3D volumetric recovery of the scene, and iii) parsing for scene understanding, in comparison to state-of-the-art methods on both a public dataset and our own new dataset.
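The physical-reasoning step above — grouping unstable primitives into stable objects — can be caricatured in 1D. Each primitive has a mass, a center-of-mass position, and a ground-support interval; a group is penalized by how far its combined center of mass falls outside the span of its supports. Greedy pairwise merging below is a cheap stand-in for the paper's Swendsen-Wang Cut optimization, and all numbers are invented.

```python
# Sketch: merge primitives whenever the merged group is more stable
# (lower instability energy) than the two groups kept separate.
def group_energy(group):
    """Distance of the group's center of mass from its support span
    (the span between leftmost and rightmost support is used as a
    crude support region)."""
    mass = sum(m for m, _, _ in group)
    com = sum(m * x for m, x, _ in group) / mass
    lo = min(s[0] for _, _, s in group)
    hi = max(s[1] for _, _, s in group)
    return max(0.0, lo - com, com - hi)  # 0.0 when com lies inside [lo, hi]

def greedy_group(primitives):
    groups = [[p] for p in primitives]   # start: every primitive alone
    improved = True
    while improved:
        improved = False
        for i in range(len(groups) - 1):
            merged = groups[i] + groups[i + 1]
            if group_energy(merged) < group_energy(groups[i]) + group_energy(groups[i + 1]):
                groups[i:i + 2] = [merged]
                improved = True
                break
    return groups

# A lamp arm leaning past its base: the arm alone is unstable, but
# arm + base together are stable; the table is a separate object.
base = (5.0, 0.0, (-1.0, 1.0))   # (mass, com_x, support interval)
arm = (1.0, 2.0, (-1.0, 1.0))    # center of mass past the support edge
table = (10.0, 6.0, (5.0, 7.0))
groups = greedy_group([base, arm, table])
print([len(g) for g in groups])  # [2, 1]: arm merged with base; table alone
```

The paper instead explores the energy landscape stochastically with cluster (Swendsen-Wang) moves, which can escape the local optima this greedy scheme gets stuck in.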
2 0.83379096 373 cvpr-2013-SWIGS: A Swift Guided Sampling Method
Author: Victor Fragoso, Matthew Turk
Abstract: We present SWIGS, a Swift and efficient Guided Sampling method for robust model estimation from image feature correspondences. Our method leverages the accuracy of our new confidence measure (MR-Rayleigh), which assigns a correctness-confidence to a putative correspondence in an online fashion. MR-Rayleigh is inspired by Meta-Recognition (MR), an algorithm that aims to predict when a classifier’s outcome is correct. We demonstrate that by using a Rayleigh distribution, the prediction accuracy of MR can be improved considerably. Our experiments show that MR-Rayleigh tends to predict better than the often-used Lowe’s ratio, Brown’s ratio, and the standard MR under a range of imaging conditions. Furthermore, our homography estimation experiment demonstrates that SWIGS performs similarly or better than other guided sampling methods while requiring fewer iterations, leading to fast and accurate model estimates.
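One plausible reading of the Rayleigh-based confidence idea above, sketched with invented numbers and not the paper's exact formulation: fit a Rayleigh distribution to the descriptor distances of the nearest non-matches (Rayleigh MLE: sigma² = Σx²/2n), then score the putative best match by the survival function exp(-d²/2σ²) — the probability a non-match would score worse.

```python
# Sketch: a Rayleigh model of non-match distances turned into a
# correctness-confidence for the best putative match.
import math

def rayleigh_sigma(samples):
    """Maximum-likelihood Rayleigh scale from positive samples."""
    return math.sqrt(sum(x * x for x in samples) / (2 * len(samples)))

def confidence(best_distance, nonmatch_distances):
    """Survival function of the non-match model at the best distance:
    the probability that a non-match would have a larger distance.
    Higher means the best match stands out more from the non-matches."""
    sigma = rayleigh_sigma(nonmatch_distances)
    return math.exp(-best_distance ** 2 / (2 * sigma ** 2))

nonmatches = [0.8, 0.9, 1.0, 1.1, 1.2]  # distances to nearest non-matches
good = confidence(0.2, nonmatches)       # clearly better than non-matches
ambiguous = confidence(0.85, nonmatches) # inside the non-match range
print(good > ambiguous)                  # True
```

A guided sampler like SWIGS could then draw correspondences with probability proportional to such confidences instead of sampling uniformly.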
3 0.82722646 172 cvpr-2013-Finding Group Interactions in Social Clutter
Author: Ruonan Li, Parker Porfilio, Todd Zickler
Abstract: We consider the problem of finding distinctive social interactions involving groups of agents embedded in larger social gatherings. Given a pre-defined gallery of short exemplar interaction videos, and a long input video of a large gathering (with approximately-tracked agents), we identify within the gathering small sub-groups of agents exhibiting social interactions that resemble those in the exemplars. The participants of each detected group interaction are localized in space; the extent of their interaction is localized in time; and when the gallery of exemplars is annotated with group-interaction categories, each detected interaction is classified into one of the pre-defined categories. Our approach represents group behaviors by dichotomous collections of descriptors for (a) individual actions, and (b) pairwise interactions; and it includes efficient algorithms for optimally distinguishing participants from bystanders in every temporal unit and for temporally localizing the extent of the group interaction. Most importantly, the method is generic and can be applied whenever numerous interacting agents can be approximately tracked over time. We evaluate the approach using three different video collections, two that involve humans and one that involves mice.
Author: Pradipto Das, Chenliang Xu, Richard F. Doell, Jason J. Corso
Abstract: The problem of describing images through natural language has gained importance in the computer vision community. Solutions to image description have either focused on a top-down approach of generating language through combinations of object detections and language models, or on bottom-up propagation of keyword tags from training images to test images through probabilistic or nearest neighbor techniques. In contrast, describing videos with natural language is a less studied problem. In this paper, we combine ideas from the bottom-up and top-down approaches to image description and propose a method for video description that captures the most relevant contents of a video in a natural language description. We propose a hybrid system consisting of a low level multimodal latent topic model for initial keyword annotation, a middle level of concept detectors and a high level module to produce final lingual descriptions. We compare the results of our system to human descriptions in both short and long forms on two datasets, and demonstrate that the final system output has greater agreement with the human descriptions than any single level.
5 0.82505703 1 cvpr-2013-3D-Based Reasoning with Blocks, Support, and Stability
Author: Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena, Tsuhan Chen
Abstract: 3D volumetric reasoning is important for truly understanding a scene. Humans are able to both segment each object in an image, and perceive a rich 3D interpretation of the scene, e.g., the space an object occupies, which objects support other objects, and which objects would, if moved, cause other objects to fall. We propose a new approach for parsing RGB-D images using 3D block units for volumetric reasoning. The algorithm fits image segments with 3D blocks, and iteratively evaluates the scene based on block interaction properties. We produce a 3D representation of the scene based on jointly optimizing over segmentations, block fitting, supporting relations, and object stability. Our algorithm incorporates the intuition that a good 3D representation of the scene is the one that fits the data well, and is a stable, self-supporting (i.e., one that does not topple) arrangement of objects. We experiment on several datasets including controlled and real indoor scenarios. Results show that our stability-reasoning framework improves RGB-D segmentation and scene volumetric representation.
6 0.820948 231 cvpr-2013-Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment
7 0.82006717 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation
8 0.81075525 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment
9 0.80903006 114 cvpr-2013-Depth Acquisition from Density Modulated Binary Patterns
10 0.80896926 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
11 0.80850297 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models
12 0.80729312 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
13 0.80460101 19 cvpr-2013-A Minimum Error Vanishing Point Detection Approach for Uncalibrated Monocular Images of Man-Made Environments
14 0.80355632 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
15 0.79992676 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
16 0.79733706 279 cvpr-2013-Manhattan Scene Understanding via XSlit Imaging
17 0.79733223 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
18 0.79699391 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
19 0.79612541 298 cvpr-2013-Multi-scale Curve Detection on Surfaces
20 0.79411197 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases