cvpr cvpr2013 cvpr2013-16 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hao Jiang, Jianxiong Xiao
Abstract: We propose a novel linear method to match cuboids in indoor scenes using RGBD images from Kinect. Beyond depth maps, these cuboids reveal important structures of a scene. Instead of directly fitting cuboids to 3D data, we first construct cuboid candidates using superpixel pairs on a RGBD image, and then we optimize the configuration of the cuboids to satisfy the global structure constraints. The optimal configuration has low local matching costs, small object intersection and occlusion, and the cuboids tend to project to a large region in the image; the number of cuboids is optimized simultaneously. We formulate the multiple cuboid matching problem as a mixed integer linear program and solve the optimization efficiently with a branch and bound method. The optimization guarantees the global optimal solution. Our experiments on the Kinect RGBD images of a variety of indoor scenes show that our proposed method is efficient, accurate and robust against object appearance variations, occlusions and strong clutter.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We propose a novel linear method to match cuboids in indoor scenes using RGBD images from Kinect. [sent-3, score-0.65]
2 Beyond depth maps, these cuboids reveal important structures of a scene. [sent-4, score-0.631]
3 Instead of directly fitting cuboids to 3D data, we first construct cuboid candidates using superpixel pairs on a RGBD image, and then we optimize the configuration of the cuboids to satisfy the global structure constraints. [sent-5, score-2.075]
4 The optimal configuration has low local matching costs, small object intersection and occlusion, and the cuboids tend to project to a large region in the image; the number of cuboids is optimized simultaneously. [sent-6, score-1.287]
5 We formulate the multiple cuboid matching problem as a mixed integer linear program and solve the optimization efficiently with a branch and bound method. [sent-7, score-0.838]
6 Our experiments on the Kinect RGBD images of a variety of indoor scenes show that our proposed method is efficient, accurate and robust against object appearance variations, occlusions and strong clutter. [sent-9, score-0.065]
7 Given a color image and depth map, we match cuboid-shaped objects in the scene. [sent-28, score-0.079]
8 (a) and (b): The color image and aligned depth map from Kinect. [sent-29, score-0.071]
9 (c): The cuboids detected by the proposed method and projected onto the color image. [sent-30, score-0.604]
10 (d): The cuboids in the scene viewed from another perspective. [sent-31, score-0.557]
11 In this paper, we design an efficient algorithm to match cuboid structures in an indoor scene using the RGBD images, as illustrated in Fig. 1. [sent-33, score-0.828]
12 Detecting cuboids from RGBD images is challenging due to heavy object occlusion, missing data and strong clutter. [sent-39, score-0.557]
13 Even though matching planes, spheres, cylinders and cones in point clouds has been intensively studied [4], there have been few methods that are able to match multiple cuboids simultaneously in 3D data. [sent-41, score-0.698]
14 Local approaches have been proposed for fitting cuboids to point clouds. [sent-44, score-0.557]
15 Due to high complexity, this method has been used to find cuboids in simple scenes with clutter removed. [sent-47, score-0.588]
16 In contrast to these local methods, our proposed method is able to work on cluttered scenes, does not need initialization and guarantees a globally optimal result. [sent-48, score-0.082]
17 In [6], a method is proposed to reliably extract cuboids in 2D images of indoor scenes. [sent-51, score-0.591]
18 This method assumes that all the cuboids are aligned to three dominant orientations. [sent-52, score-0.577]
19 Recently, a method [15] was proposed to detect cuboids with flexible poses in 2D images. [sent-55, score-0.576]
20 Finding 3D cuboids in 2D images requires different domain knowledge to achieve reliable results. [sent-56, score-0.58]
21 In contrast, our method directly works on RGBD images and there is no restriction on the cuboid configuration: cuboids may have arbitrary pose and they can interact in complex ways. [sent-57, score-1.343]
22 By using a branch and bound global optimization, our method is able to give more reliable results than 2D approaches. [sent-58, score-0.068]
23 In [13] and [14], points in color point clouds are classified into a small number of categories. [sent-60, score-0.066]
24 In this paper, instead of trying to segment a 3D scene into regions, we match cuboids to the scene. [sent-66, score-0.585]
25 Our method constructs reliable cuboid candidates by using pairs of planar patches and globally optimizes the cuboid configuration in a novel linear framework. [sent-67, score-1.747]
26 Finding cuboids in cluttered RGBD images is still unsolved. [sent-68, score-0.584]
27 No previous method is able to globally optimize the cuboid configuration when there is no restriction on the poses and interactions among objects. [sent-69, score-0.87]
28 The proposed method first partitions the 3D point cloud into groups of piecewise linear patches using the graph method [11]. [sent-71, score-0.072]
29 These patches are then used to generate a set of cuboid candidates, each of which has a cost. [sent-72, score-0.77]
30 We globally optimize the selection of the cuboids so that they have small total cost and satisfy the global constraints. [sent-73, score-0.623]
31 The optimal cuboid configuration has small intersection, and we prefer the cuboids to cover a large image area. [sent-74, score-0.849]
32 At the same time, we make sure the cuboids satisfy the occlusion conditions. [sent-75, score-0.6]
33 Our contribution is a novel linear approach that efficiently optimizes multiple cuboid matching in RGBD images. [sent-77, score-0.795]
34 Overview: We optimize the matching of multiple cuboids in a RGBD image from Kinect. [sent-83, score-0.614]
35 Our goal is to find a set of cuboids that match the RGBD image and at the same time satisfy the spatial interaction constraint. [sent-84, score-0.609]
36 We construct a set of cuboid candidates and select the optimal subset. [sent-85, score-0.856]
37 We formulate cuboid matching as the following optimization problem: min_x {U(x) + λP(x) + μN(x) - γA(x) + ξO(x)} (1), subject to hard constraints described below. [sent-87, score-0.777]
38 By minimizing the objective, we prefer to find the multiple cuboid matching that has low local matching cost, small object intersection and occlusion, and covers a large area in the image with a small number of cuboids. [sent-95, score-0.926]
39 Besides the soft constraints specified by the objective function, we further enforce that the optimal cuboid configuration x satisfies hard constraints on cuboid intersection and occlusion. [sent-96, score-1.647]
40 Cuboid candidates: We first construct a set of cuboid candidates using pairs of superpixels in the RGBD image. [sent-101, score-1.005]
41 Finding high quality cuboid candidates is critical; we propose a new method as follows. [sent-102, score-0.824]
42 Partition 3D points into groups: We first use the graph method in [11] to find superpixels on the RGBD image. [sent-103, score-0.093]
43 Row 1: The normal image with the three channels containing the x, y and z components; the superpixels obtained using both the color and normal images; and the superpixels obtained using the normal image only. [sent-230, score-0.232]
44 Row 2: Left shows the cuboids constructed using neighboring planar patches and projected on the image; right shows the top 200 cuboid candidates. [sent-232, score-1.41]
45 Row 3: 3D view of the cuboids in the color image in row 2. [sent-233, score-0.577]
46 The red and blue dots are the points from the neighboring surface patches. [sent-234, score-0.103]
47 Row 4: The normalized poses of these cuboids with three edges parallel to the x, y and z axes. [sent-235, score-0.594]
48 With both color and surface normal images, we partition the depth map into roughly piecewise planar patches. [sent-236, score-0.173]
49 As shown in Fig. 2, we also use the superpixels from the normal image itself; this helps find textured planar patches. [sent-238, score-0.125]
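A minimal sketch of this partition step, under two stated assumptions: skimage's Felzenszwalb-Huttenlocher segmentation stands in for the graph method [11], and the normal computation is a crude approximation that ignores camera intrinsics; the `scale` parameter is a hypothetical placeholder.

```python
# A minimal sketch of the superpixel step. felzenszwalb() stands in for the
# graph method [11]; scale=100.0 is a hypothetical parameter.
import numpy as np
from skimage.segmentation import felzenszwalb

def normal_image(depth):
    """Crude per-pixel surface normals from a depth map (ignores intrinsics).
    Returns an HxWx3 image whose channels are the x, y, z components."""
    dzdx = np.gradient(depth, axis=1)
    dzdy = np.gradient(depth, axis=0)
    n = np.dstack([-dzdx, -dzdy, np.ones_like(depth)])
    return n / (np.linalg.norm(n, axis=2, keepdims=True) + 1e-9)

def rgbd_superpixels(color, depth, scale=100.0):
    normals = normal_image(depth)
    # Superpixels from stacked color + normal channels, and from the normal
    # image alone (the latter helps find textured planar patches).
    sp_both = felzenszwalb(np.dstack([color / 255.0, normals]), scale=scale)
    sp_normal = felzenszwalb(normals, scale=scale)
    return sp_both, sp_normal
```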
50 Constructing cuboids: We use each pair of neighboring 3D surface patches to construct a cuboid candidate. [sent-239, score-0.861]
51 We define that two surface patches are neighbors if their corresponding superpixels in the color or normal image have a distance less than a small threshold. [sent-240, score-0.194]
52 The distance of two superpixels is defined as the shortest distance between their boundaries. [sent-243, score-0.064]
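This neighbor test can be sketched directly from the definition; the pixel threshold below is a hypothetical value, since the text only says "a small threshold".

```python
# Two superpixels are neighbors if the shortest distance between their
# boundaries is below a threshold; thresh=3.0 pixels is a placeholder.
import numpy as np
from scipy import ndimage

def boundary(labels, i):
    m = labels == i
    return m & ~ndimage.binary_erosion(m)        # pixels on the region border

def are_neighbors(labels, i, j, thresh=3.0):
    # Distance from every pixel to the boundary of region i, sampled on the
    # boundary of region j; the minimum is the boundary-to-boundary distance.
    d = ndimage.distance_transform_edt(~boundary(labels, i))
    return d[boundary(labels, j)].min() <= thresh
```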
53 We are ready to construct cuboid candidates from pairs of patches. [sent-246, score-0.856]
54 We select one of the two neighboring patches and rotate the 3D points of both patches so that the normal vector of the chosen patch is aligned with the z axis. [sent-247, score-0.165]
55 We then rotate the 3D points again so that the projected normal vector of the second 3D patch on the xy plane is aligned with the y axis. [sent-248, score-0.178]
56 We then find two rectangles parallel to the xy and xz planes to fit the points on the two 3D patches. [sent-249, score-0.102]
57 For the first rectangle, the z coordinate is the mean z of the points in its patch; for the second, the y coordinate is the mean y of its points. [sent-251, score-0.083]
58 The cuboid is the smallest one that encloses both of the rectangles, as shown in Fig. 2. [sent-252, score-0.782]
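The construction above can be sketched as follows, with two stated simplifications: the rectangle fitting at the mean z and mean y is replaced by the joint bounding box of the rotated points, and the translation part of the pose transform is omitted.

```python
# Sketch of cuboid construction from two neighboring patches: pts1, pts2 are
# Nx3 point arrays and n1, n2 their unit normals (names hypothetical).
import numpy as np

def rotation_aligning(a, b):
    """Rotation sending unit vector a onto unit vector b (Rodrigues form)."""
    v, c = np.cross(a, b), float(np.dot(a, b))
    if np.allclose(v, 0):
        return np.eye(3)                # parallel; anti-parallel case omitted
    vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)

def candidate_cuboid(pts1, n1, pts2, n2):
    # 1) Align the first patch's normal with the z axis.
    R1 = rotation_aligning(n1, np.array([0.0, 0.0, 1.0]))
    # 2) Rotate about z so the xy-projection of the second normal hits +y.
    m = R1 @ n2
    a = np.arctan2(m[0], m[1])
    Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
    R = Rz @ R1
    # 3) Smallest axis-aligned box (in the normalized pose) enclosing both
    #    patches; the rectangle-fitting refinement is skipped here.
    p = np.vstack([pts1, pts2]) @ R.T
    return p.min(axis=0), p.max(axis=0), R   # bounds + rotation (inverse R.T)
```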
59 Fig. 2, rows 2-4, illustrate some of the cuboids reconstructed from neighboring superpixels. [sent-256, score-0.586]
60 We change the red and blue channels of the color image to show the neighboring superpixels. [sent-257, score-0.065]
61 Fig. 2, row 2, shows that the 3D cuboid estimation is accurate. [sent-259, score-0.739]
62 Each candidate cuboid is represented by the lower and upper bounds of x, y and z coordinates in the normalized pose and a matrix T that transforms the cuboid back to the original pose. [sent-260, score-1.534]
63 Such a representation facilitates the computation of cuboid space occupancy and intersection. [sent-262, score-0.759]
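A minimal version of this representation, together with the occupancy test it enables; the 4x4 transform T is assumed to map the normalized pose back to the scene frame.

```python
# Bounds in the normalized pose plus a 4x4 transform T back to the original
# pose; occupancy reduces to interval checks in the normalized frame.
import numpy as np

class Cuboid:
    def __init__(self, lo, hi, T):
        self.lo, self.hi = np.asarray(lo), np.asarray(hi)
        self.T = np.asarray(T)                 # normalized pose -> scene frame
        self.T_inv = np.linalg.inv(self.T)

    def contains(self, pts):
        """Boolean mask over Nx3 scene-frame points: inside the cuboid?"""
        q = (np.c_[pts, np.ones(len(pts))] @ self.T_inv.T)[:, :3]
        return np.all((q >= self.lo) & (q <= self.hi), axis=1)

    def volume(self):
        return float(np.prod(self.hi - self.lo))
```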
64 Local matching costs: The quality of the matching of a cuboid to the 3D data is determined by three factors. [sent-263, score-0.815]
65 The first factor is the coverage area of the points in the two contact cuboid faces. [sent-264, score-0.818]
66 To simplify the computation, the coverage area of the surface points on a cuboid face is determined by the tightest bounding box, as shown in the figure. [sent-265, score-0.913]
67 We compute the ratio r of the bounding box area to the area of the corresponding cuboid face. [sent-267, score-0.849]
68 The smaller of the two ratios is used to quantify the cuboid local matching. [sent-268, score-0.739]
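In code, this first factor is straightforward; the face coordinates and dimensions are assumed to be given in the cuboid's normalized frame.

```python
# r = min over the two contact faces of (tightest bounding-box area of the
# face's surface points) / (face area); larger r means better support.
import numpy as np

def cover_ratio(face_pts_2d, face_w, face_h):
    lo, hi = face_pts_2d.min(axis=0), face_pts_2d.max(axis=0)
    return float(np.prod(hi - lo)) / (face_w * face_h)

def matching_ratio(pts_a, wh_a, pts_b, wh_b):
    return min(cover_ratio(pts_a, *wh_a), cover_ratio(pts_b, *wh_b))
```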
69 We require that cuboids should mostly be behind the 3D scene surface. [sent-271, score-0.577]
70 To measure the solidness of cuboids, we discretize the space in front of and behind the scene surface into voxels, as shown in the figure. [sent-272, score-0.145]
71 The points behind the scene surface have z coordinates less than the surface z coordinates. [sent-276, score-0.139]
72 To compute the solidness of cuboid i, we transform the points using the cuboid matrix Fi defined before to bring the cuboid to the normalized position. [sent-277, score-2.391]
73 We do not need to transform all the points but only the points inside the bounding box of the cuboid in the original pose; other points are irrelevant. [sent-278, score-0.86]
74 The solidness is approximated by ns/na, where ns is the number of solid space points in the cuboid and na is the number of all the transformed points falling in the cuboid. [sent-279, score-0.962]
75 We keep only the cuboid candidates whose solidness is greater than 0.5. [sent-280, score-0.991]
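A rough sketch of the solidness test; `project_to_pixel` is a hypothetical camera-projection helper returning integer pixel coordinates, and the depth comparison uses the camera convention (larger z is farther), whereas the paper's normalized frame flips the sign.

```python
# Solidness ~ ns / na: of the points falling inside the cuboid, the fraction
# lying behind the observed scene surface. Candidates below 0.5 are discarded.
import numpy as np

def solidness(cuboid, space_pts, depth_map, project_to_pixel):
    pts = space_pts[cuboid.contains(space_pts)]    # na points inside the box
    if len(pts) == 0:
        return 0.0
    u, v = project_to_pixel(pts)                   # hypothetical helper
    behind = pts[:, 2] >= depth_map[v, u]          # ns: behind the surface
    return float(behind.mean())
```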
76 The third factor is the cuboid boundary matching cost. [sent-282, score-0.793]
77 When we project each cuboid candidate to the target image, the candidate’s projection silhouette should match the image edges. [sent-283, score-0.838]
78 We keep only the cuboid candidates whose average silhouette-to-edge distance is less than 10 pixels. [sent-285, score-0.868]
79 We choose the top M cuboids ranked by the surface matching ratio r, with the solidness and average boundary error in specific ranges (e.g., solidness above 0.5 and average boundary error below 10 pixels). [sent-286, score-0.837]
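The boundary cost can be sketched with a distance transform of an edge map; Canny is a stand-in for whichever edge detector the paper uses, and `silhouette_px` (projected silhouette pixels as (x, y) rows) is a hypothetical input.

```python
# Average distance from the projected silhouette pixels to the nearest image
# edge; candidates above 10 pixels are discarded.
import numpy as np
from scipy import ndimage
from skimage.color import rgb2gray
from skimage.feature import canny

def silhouette_edge_distance(color, silhouette_px):
    edges = canny(rgb2gray(color))                  # boolean edge map
    dist = ndimage.distance_transform_edt(~edges)   # distance to nearest edge
    return float(dist[silhouette_px[:, 1], silhouette_px[:, 0]].mean())
```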
80 (a): We compute the solidness of a cuboid by discretizing the space in front of and behind the scene surface into voxels. [sent-309, score-0.949]
81 (b): We encourage cuboids to cover a large area of superpixels on an image. [sent-310, score-0.641]
82 (a): The cuboid matching ratio r is min(A/B, C/D), where A, B, C, D are the areas of the rectangular regions. [sent-331, score-0.813]
83 (b) and (c): The projection of cuboids and their depth order determine the occlusion. [sent-332, score-0.608]
84 Since their matching costs are nonnegative, we would obtain a trivial all-zero solution if we simply minimize the unary term. [sent-342, score-0.108]
85 However, in practice, the number of cuboids is usually unknown. [sent-344, score-0.557]
86 Another method is to train an SVM cuboid classifier based on the features, so that the classification scores take positive and negative values. [sent-345, score-0.739]
87 Our experiment shows that the cuboid classifier has quite a low classification rate and the top 200 candidates are almost always classified into the same category. [sent-346, score-0.824]
88 We need to incorporate more global constraints and estimate the number of the cuboid objects and their poses at the same time. [sent-347, score-0.758]
89 1 Unary term: We define a binary variable xi to indicate whether cuboid candidate i is selected. [sent-352, score-0.771]
90 The unary term U is the overall local cuboid matching cost, U = Σi ci xi. [sent-355, score-0.777]
91 Here ci is the cost of choosing cuboid candidate i; ci is defined in section 2. [sent-357, score-0.787]
92 There is a guarantee that all the cuboid candidates are at least 50% solid and have projection silhouettes with an average distance of less than 10 pixels to the image edges. [sent-360, score-0.886]
93 2 Volume exclusion: Since the cuboids are solid, they tend to occupy nonoverlapping space in the scene. [sent-365, score-0.766]
94 However, completely prohibiting the cuboid intersection is too strong a condition due to the unavoidable errors in candidate pose estimation. [sent-366, score-0.886]
95 We set a tolerance value t for the cuboid intersection; in this paper t = 0.1, [sent-367, score-0.812]
96 which means cuboids may have up to 10% intersection. [sent-368, score-0.557]
97 Here the intersection ratio of two cuboids is defined as the ratio of the volume intersection to the volume of the smaller cuboid. [sent-369, score-0.807]
98 If one cuboid contains the other, the ratio is 1. [sent-370, score-0.775]
99 The intersection ratio between cuboids i and j is denoted ei,j, which is the larger one of the two possible intersection ratios. [sent-372, score-0.921]
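With the cuboid representation sketched earlier, ei,j can be estimated by Monte Carlo sampling; this is a simplification, since the exact computation alluded to in the text uses the normalized-pose representation directly.

```python
# Estimate the intersection ratio e_ij: sample points uniformly inside one
# cuboid and count the fraction landing inside the other. The larger of the
# two directions equals V(a intersect b) / min(V(a), V(b)).
import numpy as np

def overlap_fraction(a, b, n=2000, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.uniform(a.lo, a.hi, size=(n, 3))       # in a's normalized frame
    h = a.T @ np.c_[u, np.ones(n)].T               # back to the scene frame
    return float(b.contains(h[:3].T).mean())

def intersection_ratio(a, b):
    return max(overlap_fraction(a, b), overlap_fraction(b, a))
```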
100 If two cuboid candidates have an intersection ratio less than t, the overlap is penalized by the soft term P in the objective function; pairs whose intersection exceeds t are excluded by a hard constraint. [sent-374, score-0.916]
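To make objective (1) concrete, the following hedged sketch assembles the unary, count, coverage and soft intersection terms as a MILP, using PuLP as a generic front end; the paper solves its program with a dedicated branch and bound and also includes an occlusion term O, omitted here. All weights and inputs (c, e, area, lam, mu, gamma) are hypothetical placeholders.

```python
# Hedged MILP sketch of objective (1), without the occlusion term O.
# x_i selects candidate i; y_ij linearizes the pairwise product x_i * x_j.
import pulp

def match_cuboids(c, e, area, t=0.1, lam=1.0, mu=1.0, gamma=1.0):
    n = len(c)
    prob = pulp.LpProblem("cuboid_matching", pulp.LpMinimize)
    x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(n)]
    soft = []
    for i in range(n):
        for j in range(i + 1, n):
            if e[i][j] > t:
                prob += x[i] + x[j] <= 1      # hard volume exclusion
            elif e[i][j] > 0:
                y = pulp.LpVariable(f"y{i}_{j}", cat="Binary")
                # Forced to 1 when both are selected; driven to 0 otherwise
                # by the minimization, so y acts as x_i * x_j.
                prob += y >= x[i] + x[j] - 1
                soft.append(e[i][j] * y)
    prob += (pulp.lpSum(c[i] * x[i] for i in range(n))      # U: local cost
             + lam * pulp.lpSum(soft)                       # P: intersection
             + mu * pulp.lpSum(x)                           # N: cuboid count
             - gamma * pulp.lpSum(area[i] * x[i]            # A: image coverage
                                  for i in range(n)))
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [i for i in range(n) if x[i].value() > 0.5]
```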
wordName wordTfidf (topN-words)
[('cuboid', 0.739), ('cuboids', 0.557), ('rgbd', 0.19), ('solidness', 0.145), ('candidates', 0.085), ('intersection', 0.073), ('superpixels', 0.064), ('configuration', 0.047), ('surface', 0.045), ('quantifies', 0.041), ('matching', 0.038), ('ratio', 0.036), ('normal', 0.034), ('indoor', 0.034), ('costs', 0.034), ('candidate', 0.032), ('patches', 0.031), ('scenes', 0.031), ('depth', 0.031), ('coverage', 0.03), ('points', 0.029), ('branch', 0.029), ('neighboring', 0.029), ('intensively', 0.028), ('match', 0.028), ('rectangles', 0.027), ('cluttered', 0.027), ('plane', 0.027), ('planar', 0.027), ('kinect', 0.027), ('projected', 0.027), ('exclusion', 0.027), ('structures', 0.027), ('reasoning', 0.026), ('cloud', 0.025), ('satisfy', 0.024), ('pose', 0.024), ('reliable', 0.023), ('globally', 0.023), ('restriction', 0.023), ('nonnegative', 0.022), ('silhouettes', 0.022), ('rotate', 0.022), ('keep', 0.022), ('unary', 0.02), ('projection', 0.02), ('behind', 0.02), ('area', 0.02), ('color', 0.02), ('facilitates', 0.02), ('aligned', 0.02), ('solid', 0.02), ('poses', 0.019), ('optimize', 0.019), ('silhouette', 0.019), ('xy', 0.019), ('occlusion', 0.019), ('rgb', 0.019), ('soft', 0.019), ('cixi', 0.018), ('jianxiong', 0.018), ('unavoidable', 0.018), ('xyz', 0.018), ('optimizes', 0.018), ('prefer', 0.018), ('construct', 0.017), ('clouds', 0.017), ('boston', 0.017), ('guarantees', 0.017), ('bounding', 0.017), ('box', 0.017), ('volumetric', 0.017), ('xiao', 0.017), ('trivial', 0.016), ('mixed', 0.016), ('reveal', 0.016), ('bound', 0.016), ('ci', 0.016), ('bounded', 0.016), ('encloses', 0.016), ('noteworthy', 0.016), ('tightest', 0.016), ('piecewise', 0.016), ('finding', 0.016), ('channels', 0.016), ('boundary', 0.016), ('volume', 0.016), ('satisfies', 0.015), ('enclosed', 0.015), ('hao', 0.015), ('ofinterest', 0.015), ('sai', 0.015), ('unc', 0.015), ('za', 0.015), ('pairs', 0.015), ('studied', 0.015), ('superpixel', 0.015), ('cylinders', 0.015), ('market', 0.015), ('optimal', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 16 cvpr-2013-A Linear Approach to Matching Cuboids in RGBD Images
Author: Hao Jiang, Jianxiong Xiao
Abstract: We propose a novel linear method to match cuboids in indoor scenes using RGBD images from Kinect. Beyond depth maps, these cuboids reveal important structures of a scene. Instead of directly fitting cuboids to 3D data, we first construct cuboid candidates using superpixel pairs on a RGBD image, and then we optimize the configuration of the cuboids to satisfy the global structure constraints. The optimal configuration has low local matching costs, small object intersection and occlusion, and the cuboids tend to project to a large region in the image; the number of cuboids is optimized simultaneously. We formulate the multiple cuboid matching problem as a mixed integer linear program and solve the optimization efficiently with a branch and bound method. The optimization guarantees the global optimal solution. Our experiments on the Kinect RGBD images of a variety of indoor scenes show that our proposed method is efficient, accurate and robust against object appearance variations, occlusions and strong clutter.
2 0.42094848 407 cvpr-2013-Spatio-temporal Depth Cuboid Similarity Feature for Activity Recognition Using Depth Camera
Author: Lu Xia, J.K. Aggarwal
Abstract: Local spatio-temporal interest points (STIPs) and the resulting features from RGB videos have been proven successful at activity recognition that can handle cluttered backgrounds and partial occlusions. In this paper, we propose its counterpart in depth video and show its efficacy on activity recognition. We present a filtering method to extract STIPs from depth videos (called DSTIP) that effectively suppresses the noisy measurements. Further, we build a novel depth cuboid similarity feature (DCSF) to describe the local 3D depth cuboid around the DSTIPs with an adaptable supporting size. We test this feature on activity recognition application using the public MSRAction3D, MSRDailyActivity3D datasets and our own dataset. Experimental evaluation shows that the proposed approach outperforms state-of-the-art activity recognition algorithms on depth videos, and the framework is more widely applicable than existing approaches. We also give detailed comparisons with other features and analysis of choice of parameters as a guidance for applications.
3 0.13400568 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
Author: Wongun Choi, Yu-Wei Chao, Caroline Pantofaru, Silvio Savarese
Abstract: Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification. We present a hierarchical scene model for learning and reasoning about complex indoor scenes which is computationally tractable, can be learned from a reasonable amount of training data, and avoids oversimplification. At the core of this approach is the 3D Geometric Phrase Model which captures the semantic and geometric relationships between objects which frequently co-occur in the same 3D spatial configuration. Experiments show that this model effectively explains scene semantics, geometry and object groupings from a single image, while also improving individual object detections.
4 0.10235519 80 cvpr-2013-Category Modeling from Just a Single Labeling: Use Depth Information to Guide the Learning of 2D Models
Author: Quanshi Zhang, Xuan Song, Xiaowei Shao, Ryosuke Shibasaki, Huijing Zhao
Abstract: An object model base that covers a large number of object categories is of great value for many computer vision tasks. As artifacts are usually designed to have various textures, their structure is the primary distinguishing feature between different categories. Thus, how to encode this structural information and how to start the model learning with a minimum of human labeling become two key challenges for the construction of the model base. We design a graphical model that uses object edges to represent object structures, and this paper aims to incrementally learn this category model from one labeled object and a number of casually captured scenes. However, the incremental model learning may be biased due to the limited human labeling. Therefore, we propose a new strategy that uses the depth information in RGBD images to guide the model learning for object detection in ordinary RGB images. In experiments, the proposed method achieves superior performance as good as the supervised methods that require the labeling of all target objects.
5 0.097786941 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models
Author: Yibiao Zhao, Song-Chun Zhu
Abstract: Indoor functional objects exhibit large view and appearance variations, thus are difficult to be recognized by the traditional appearance-based classification paradigm. In this paper, we present an algorithm to parse indoor images based on two observations: i) The functionality is the most essential property to define an indoor object, e.g. "a chair to sit on"; ii) The geometry (3D shape) of an object is designed to serve its function. We formulate the nature of the object function into a stochastic grammar model. This model characterizes a joint distribution over the function-geometry-appearance (FGA) hierarchy. The hierarchical structure includes a scene category, functional groups, functional objects, functional parts and 3D geometric shapes. We use a simulated annealing MCMC algorithm to find the maximum a posteriori (MAP) solution, i.e. a parse tree. We design four data-driven steps to accelerate the search in the FGA space: i) group the line segments into 3D primitive shapes, ii) assign functional labels to these 3D primitive shapes, iii) fill in missing objects/parts according to the functional labels, and iv) synthesize 2D segmentation maps and verify the current parse tree by the Metropolis-Hastings acceptance probability. The experimental results on several challenging indoor datasets demonstrate the proposed approach not only significantly widens the scope of indoor scene parsing algorithms from the segmentation and the 3D recovery to the functional object recognition, but also yields improved overall performance.
6 0.079305336 1 cvpr-2013-3D-Based Reasoning with Blocks, Support, and Stability
7 0.062845498 378 cvpr-2013-Sampling Strategies for Real-Time Action Recognition
8 0.06204728 3 cvpr-2013-3D R Transform on Spatio-temporal Interest Points for Action Recognition
9 0.061249964 278 cvpr-2013-Manhattan Junction Catalogue for Spatial Reasoning of Indoor Scenes
10 0.054927245 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
11 0.054741155 355 cvpr-2013-Representing Videos Using Mid-level Discriminative Patches
12 0.054237746 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
13 0.051192775 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
14 0.049569055 149 cvpr-2013-Evaluation of Color STIPs for Human Action Recognition
15 0.047633924 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
16 0.047177549 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
17 0.043654572 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes
18 0.043206047 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
19 0.042994339 227 cvpr-2013-Intrinsic Scene Properties from a Single RGB-D Image
20 0.042214788 394 cvpr-2013-Shading-Based Shape Refinement of RGB-D Images
topicId topicWeight
[(0, 0.105), (1, 0.045), (2, 0.025), (3, -0.028), (4, -0.013), (5, -0.025), (6, -0.011), (7, 0.071), (8, -0.023), (9, -0.025), (10, 0.006), (11, -0.054), (12, -0.002), (13, 0.05), (14, -0.008), (15, -0.056), (16, -0.032), (17, 0.066), (18, -0.019), (19, 0.049), (20, 0.018), (21, 0.048), (22, 0.018), (23, 0.002), (24, 0.052), (25, 0.018), (26, -0.017), (27, -0.053), (28, -0.053), (29, -0.004), (30, -0.075), (31, 0.1), (32, 0.107), (33, -0.033), (34, -0.039), (35, -0.029), (36, -0.006), (37, 0.127), (38, 0.039), (39, -0.076), (40, 0.035), (41, -0.03), (42, 0.118), (43, 0.06), (44, -0.024), (45, 0.027), (46, 0.083), (47, 0.052), (48, -0.038), (49, 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.91401619 16 cvpr-2013-A Linear Approach to Matching Cuboids in RGBD Images
Author: Hao Jiang, Jianxiong Xiao
Abstract: We propose a novel linear method to match cuboids in indoor scenes using RGBD images from Kinect. Beyond depth maps, these cuboids reveal important structures of a scene. Instead of directly fitting cuboids to 3D data, we first construct cuboid candidates using superpixel pairs on a RGBD image, and then we optimize the configuration of the cuboids to satisfy the global structure constraints. The optimal configuration has low local matching costs, small object intersection and occlusion, and the cuboids tend to project to a large region in the image; the number of cuboids is optimized simultaneously. We formulate the multiple cuboid matching problem as a mixed integer linear program and solve the optimization efficiently with a branch and bound method. The optimization guarantees the global optimal solution. Our experiments on the Kinect RGBD images of a variety of indoor scenes show that our proposed method is efficient, accurate and robust against object appearance variations, occlusions and strong clutter.
2 0.60026026 407 cvpr-2013-Spatio-temporal Depth Cuboid Similarity Feature for Activity Recognition Using Depth Camera
Author: Lu Xia, J.K. Aggarwal
Abstract: Local spatio-temporal interest points (STIPs) and the resulting features from RGB videos have been proven successful at activity recognition that can handle cluttered backgrounds and partial occlusions. In this paper, we propose its counterpart in depth video and show its efficacy on activity recognition. We present a filtering method to extract STIPs from depth videos (called DSTIP) that effectively suppresses the noisy measurements. Further, we build a novel depth cuboid similarity feature (DCSF) to describe the local 3D depth cuboid around the DSTIPs with an adaptable supporting size. We test this feature on activity recognition application using the public MSRAction3D, MSRDailyActivity3D datasets and our own dataset. Experimental evaluation shows that the proposed approach outperforms state-of-the-art activity recognition algorithms on depth videos, and the framework is more widely applicable than existing approaches. We also give detailed comparisons with other features and analysis of choice of parameters as a guidance for applications.
3 0.58847737 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models
Author: Yibiao Zhao, Song-Chun Zhu
Abstract: Indoor functional objects exhibit large view and appearance variations, thus are difficult to be recognized by the traditional appearance-based classification paradigm. In this paper, we present an algorithm to parse indoor images based on two observations: i) The functionality is the most essential property to define an indoor object, e.g. "a chair to sit on"; ii) The geometry (3D shape) of an object is designed to serve its function. We formulate the nature of the object function into a stochastic grammar model. This model characterizes a joint distribution over the function-geometry-appearance (FGA) hierarchy. The hierarchical structure includes a scene category, functional groups, functional objects, functional parts and 3D geometric shapes. We use a simulated annealing MCMC algorithm to find the maximum a posteriori (MAP) solution, i.e. a parse tree. We design four data-driven steps to accelerate the search in the FGA space: i) group the line segments into 3D primitive shapes, ii) assign functional labels to these 3D primitive shapes, iii) fill in missing objects/parts according to the functional labels, and iv) synthesize 2D segmentation maps and verify the current parse tree by the Metropolis-Hastings acceptance probability. The experimental results on several challenging indoor datasets demonstrate the proposed approach not only significantly widens the scope of indoor scene parsing algorithms from the segmentation and the 3D recovery to the functional object recognition, but also yields improved overall performance.
4 0.56354958 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
Author: Bo Zheng, Yibiao Zhao, Joey C. Yu, Katsushi Ikeuchi, Song-Chun Zhu
Abstract: In this paper, we present an approach for scene understanding by reasoning physical stability of objects from point cloud. We utilize a simple observation that, by human design, objects in static scenes should be stable with respect to gravity. This assumption is applicable to all scene categories and poses useful constraints for the plausible interpretations (parses) in scene understanding. Our method consists of two major steps: 1) geometric reasoning: recovering solid 3D volumetric primitives from defective point cloud; and 2) physical reasoning: grouping the unstable primitives to physically stable objects by optimizing the stability and the scene prior. We propose to use a novel disconnectivity graph (DG) to represent the energy landscape and use a Swendsen-Wang Cut (MCMC) method for optimization. In experiments, we demonstrate that the algorithm achieves substantially better performance for i) object segmentation, ii) 3D volumetric recovery of the scene, and iii) better parsing result for scene understanding in comparison to state-of-the-art methods in both public dataset and our own new dataset.
5 0.51758087 1 cvpr-2013-3D-Based Reasoning with Blocks, Support, and Stability
Author: Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena, Tsuhan Chen
Abstract: 3D volumetric reasoning is important for truly understanding a scene. Humans are able to both segment each object in an image, and perceive a rich 3D interpretation of the scene, e.g., the space an object occupies, which objects support other objects, and which objects would, if moved, cause other objects to fall. We propose a new approach for parsing RGB-D images using 3D block units for volumetric reasoning. The algorithm fits image segments with 3D blocks, and iteratively evaluates the scene based on block interaction properties. We produce a 3D representation of the scene based on jointly optimizing over segmentations, block fitting, supporting relations, and object stability. Our algorithm incorporates the intuition that a good 3D representation of the scene is the one that fits the data well, and is a stable, self-supporting (i.e., one that does not topple) arrangement of objects. We experiment on several datasets including controlled and real indoor scenarios. Results show that our stability-reasoning framework improves RGB-D segmentation and scene volumetric representation.
6 0.50932705 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
7 0.50639236 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
8 0.47336867 278 cvpr-2013-Manhattan Junction Catalogue for Spatial Reasoning of Indoor Scenes
9 0.42989808 114 cvpr-2013-Depth Acquisition from Density Modulated Binary Patterns
10 0.4266687 115 cvpr-2013-Depth Super Resolution by Rigid Body Self-Similarity in 3D
11 0.42394289 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
12 0.42037389 196 cvpr-2013-HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences
13 0.41918802 245 cvpr-2013-Layer Depth Denoising and Completion for Structured-Light RGB-D Cameras
14 0.41691518 117 cvpr-2013-Detecting Changes in 3D Structure of a Scene from Multi-view Images Captured by a Vehicle-Mounted Camera
15 0.39811125 52 cvpr-2013-Axially Symmetric 3D Pots Configuration System Using Axis of Symmetry and Break Curve
16 0.39601561 127 cvpr-2013-Discovering the Structure of a Planar Mirror System from Multiple Observations of a Single Point
17 0.3911919 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
18 0.3905687 197 cvpr-2013-Hallucinated Humans as the Hidden Context for Labeling 3D Scenes
19 0.38910824 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
20 0.37889841 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning
topicId topicWeight
[(10, 0.091), (16, 0.025), (26, 0.036), (33, 0.242), (61, 0.219), (67, 0.054), (69, 0.087), (87, 0.091)]
simIndex simValue paperId paperTitle
1 0.89059681 47 cvpr-2013-As-Projective-As-Possible Image Stitching with Moving DLT
Author: Julio Zaragoza, Tat-Jun Chin, Michael S. Brown, David Suter
Abstract: We investigate projective estimation under model inadequacies, i.e., when the underpinning assumptions of the projective model are not fully satisfied by the data. We focus on the task of image stitching which is customarily solved by estimating a projective warp — a model that is justified when the scene is planar or when the views differ purely by rotation. Such conditions are easily violated in practice, and this yields stitching results with ghosting artefacts that necessitate the usage of deghosting algorithms. To this end we propose as-projective-as-possible warps, i.e., warps that aim to be globally projective, yet allow local non-projective deviations to account for violations to the assumed imaging conditions. Based on a novel estimation technique called Moving Direct Linear Transformation (Moving DLT), our method seamlessly bridges image regions that are inconsistent with the projective model. The result is highly accurate image stitching, with significantly reduced ghosting effects, thus lowering the dependency on post hoc deghosting.
same-paper 2 0.85143065 16 cvpr-2013-A Linear Approach to Matching Cuboids in RGBD Images
Author: Hao Jiang, Jianxiong Xiao
Abstract: We propose a novel linear method to match cuboids in indoor scenes using RGBD images from Kinect. Beyond depth maps, these cuboids reveal important structures of a scene. Instead of directly fitting cuboids to 3D data, we first construct cuboid candidates using superpixel pairs on a RGBD image, and then we optimize the configuration of the cuboids to satisfy the global structure constraints. The optimal configuration has low local matching costs, small object intersection and occlusion, and the cuboids tend to project to a large region in the image; the number of cuboids is optimized simultaneously. We formulate the multiple cuboid matching problem as a mixed integer linear program and solve the optimization efficiently with a branch and bound method. The optimization guarantees the global optimal solution. Our experiments on the Kinect RGBD images of a variety of indoor scenes show that our proposed method is efficient, accurate and robust against object appearance variations, occlusions and strong clutter.
3 0.83015835 72 cvpr-2013-Boundary Detection Benchmarking: Beyond F-Measures
Author: Xiaodi Hou, Alan Yuille, Christof Koch
Abstract: For an ill-posed problem like boundary detection, human labeled datasets play a critical role. Compared with the active research on finding a better boundary detector to refresh the performance record, there is surprisingly little discussion on the boundary detection benchmark itself. The goal of this paper is to identify the potential pitfalls of today’s most popular boundary benchmark, BSDS 300. In the paper, we first introduce a psychophysical experiment to show that many of the “weak” boundary labels are unreliable and may contaminate the benchmark. Then we analyze the computation of f-measure and point out that the current benchmarking protocol encourages an algorithm to bias towards those problematic “weak” boundary labels. With this evidence, we focus on a new problem of detecting strong boundaries as one alternative. Finally, we assess the performances of 9 major algorithms on different ways of utilizing the dataset, suggesting new directions for improvements.
4 0.80500829 304 cvpr-2013-Multipath Sparse Coding Using Hierarchical Matching Pursuit
Author: Liefeng Bo, Xiaofeng Ren, Dieter Fox
Abstract: Complex real-world signals, such as images, contain discriminative structures that differ in many aspects including scale, invariance, and data channel. While progress in deep learning shows the importance of learning features through multiple layers, it is equally important to learn features through multiple paths. We propose Multipath Hierarchical Matching Pursuit (M-HMP), a novel feature learning architecture that combines a collection of hierarchical sparse features for image classification to capture multiple aspects of discriminative structures. Our building blocks are MI-KSVD, a codebook learning algorithm that balances the reconstruction error and the mutual incoherence of the codebook, and batch orthogonal matching pursuit (OMP); we apply them recursively at varying layers and scales. The result is a highly discriminative image representation that leads to large improvements to the state-of-the-art on many standard benchmarks, e.g., Caltech-101, Caltech-256, MIT-Scenes, Oxford-IIIT Pet and Caltech-UCSD Bird-200.
5 0.80499816 291 cvpr-2013-Motionlets: Mid-level 3D Parts for Human Motion Recognition
Author: LiMin Wang, Yu Qiao, Xiaoou Tang
Abstract: This paper proposes motionlet, a mid-level and spatiotemporal part, for human motion recognition. Motionlet can be seen as a tight cluster in motion and appearance space, corresponding to the moving process of different body parts. We postulate three key properties of motionlet for action recognition: high motion saliency, multiple scale representation, and representative-discriminative ability. Towards this goal, we develop a data-driven approach to learn motionlets from training videos. First, we extract 3D regions with high motion saliency. Then we cluster these regions and preserve the centers as candidate templates for motionlet. Finally, we examine the representative and discriminative power of the candidates, and introduce a greedy method to select effective candidates. With motionlets, we present a mid-level representation for video, called motionlet activation vector. We conduct experiments on three datasets, KTH, HMDB51, and UCF50. The results show that the proposed methods significantly outperform state-of-the-art methods.
6 0.80288672 100 cvpr-2013-Crossing the Line: Crowd Counting by Integer Programming with Local Features
7 0.79037118 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
9 0.78777325 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
10 0.78393584 231 cvpr-2013-Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment
11 0.78297287 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment
12 0.78276932 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
13 0.78225863 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
14 0.78092963 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
15 0.77936727 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
16 0.77936167 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
17 0.77916431 19 cvpr-2013-A Minimum Error Vanishing Point Detection Approach for Uncalibrated Monocular Images of Man-Made Environments
18 0.77904528 172 cvpr-2013-Finding Group Interactions in Social Clutter
19 0.77849531 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
20 0.77793759 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation