cvpr cvpr2013 cvpr2013-329 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Saurabh Gupta, Pablo Arbeláez, Jitendra Malik
Abstract: We address the problems of contour detection, bottom-up grouping and semantic segmentation using RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset [27]. We propose algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information. We show that our system can label each contour with its type (depth, normal or albedo). We also propose a generic method for long-range amodal completion of surfaces and show its effectiveness in grouping. We then turn to the problem of semantic segmentation and propose a simple approach that classifies superpixels into the 40 dominant object categories in NYUD2. We use both generic and class-specific features to encode the appearance and geometry of objects. We also show how our approach can be used for scene classification, and how this contextual information in turn improves object recognition. In all of these tasks, we report significant improvements over the state-of-the-art.
Reference: text
sentIndex sentText sentNum sentScore
1 We address the problems of contour detection, bottom-up grouping and semantic segmentation using RGB-D data. [sent-3, score-0.563]
2 We propose algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information. [sent-5, score-0.3]
3 We show that our system can label each contour with its type (depth, normal or albedo). [sent-6, score-0.329]
4 We also propose a generic method for long-range amodal completion of surfaces and show its effectiveness in grouping. [sent-7, score-0.657]
5 We then turn to the problem of semantic segmentation and propose a simple approach that classifies superpixels into the 40 dominant object categories in NYUD2. [sent-8, score-0.473]
6 Introduction In this paper, we study the problem of image understanding in indoor scenes from color and depth data. [sent-13, score-0.29]
7 The output of our approach is shown in Figure 1: given a single RGB-D image, our system produces contour detection and bottom-up segmentation, grouping by amodal completion, and semantic labeling of objects and scene surfaces. [sent-14, score-0.973]
8 Thus, [10] recovers walk- Figure 1: Output of our system: From a single color and depth image, we produce bottom-up segmentation (top-right), long range completion (bottom-left), semantic segmentation (bottom-middle) and contour classification (bottom-right). [sent-19, score-0.715]
9 They also look at the task of bottom-up RGB-D segmentation and semantic scene labeling. [sent-28, score-0.387]
10 They modify the algorithm of [12] to use depth for bottom-up segmentation and then look at the task of semantic segmentation using context features derived from inferring support relationships in the scene. [sent-29, score-0.52]
11 [23] use features based on kernel descriptors on superpixels and its ancestors from a region hierarchy followed by a Markov Random Field (MRF) based context modeling. [sent-31, score-0.358]
12 [15] also study the problem of indoor scene parsing with RGB-D data in the context of mobile robotics, where multiple views of the scene are acquired with a Kinect sensor and subsequently merged into a full 3D reconstruction. [sent-33, score-0.262]
13 We visit the segmentation problem afresh from ground-up and develop a gPb [2] like machinery which combines depth information naturally, giving us significantly better bottom-up segmentation when compared to earlier works. [sent-36, score-0.48]
14 We also look at the interesting problem of amodal completion [14] and obtain long range grouping giving us much better bottom-up region proposals. [sent-37, score-0.656]
15 Finally, we are also able to label each edge as being a depth edge, a normal edge, or neither. [sent-38, score-0.191]
16 We build on this motivation and propose new features to represent bottom-up region proposals (which in our case are non-overlapping superpixels and their amodal completion), and use randomized decision tree forest [3, 5] and SVM classifiers. [sent-41, score-0.757]
17 In addition to semantic segmentation, we look at the problem of RGB-D scene classification and show that knowledge of the scene category helps improve accuracy of semantic segmentation. [sent-42, score-0.462]
18 2 we propose an algorithm for estimating the gravity direction from a depth map. [sent-45, score-0.433]
19 4 we describe our semantic segmentation approach, and then apply our machinery to scene classification in Sect. [sent-49, score-0.4]
20 Extracting a Geocentric Coordinate Frame We note that the direction of gravity imposes a lot of structure on how the real world looks (the floor and other supporting surfaces are always horizontal, the walls are always vertical). [sent-52, score-0.436]
21 On the other hand the role of the gravity vector ... [Figure 2: caption garbled in extraction] [sent-58, score-0.249]
22 Since we have depth data available, we propose a simple yet robust algorithm to estimate the direction of gravity. [sent-60, score-0.184]
23 Intuitively, the algorithm tries to find the direction which is the most aligned to or most orthogonal to locally estimated surface normal directions at as many points as possible. [sent-61, score-0.204]
24 The algorithm starts with an estimate of the gravity vector and iteratively refines the estimate via the following 2 steps. [sent-62, score-0.249]
25 Using the current estimate of the gravity direction gi−1, make hard-assignments of local surface normals to aligned set N∥ [sent-64, score-0.455]
26 and orthogonal set N⊥ (based on a threshold d on the angle made by the local surface normal with gi−1). [sent-65, score-0.206]
27 Typically, N∥ would contain normals from points on the floor and table-tops and N⊥ would contain normals from points on the walls. [sent-72, score-0.208]
28 Solve for a new estimate of the gravity vector gi which is as aligned to normals in the aligned set and as orthogonal to the normals in the orthogonal set as possible. [sent-74, score-0.604]
29 sin²(θ(n, g)). Our initial estimate for the gravity vector g0 is the Y-axis, and we run 5 iterations with d = 45° followed by 5 iterations with d = 15°. [sent-83, score-0.249]
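The two-step refinement above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code. It assumes the objective (of which only the sin² term survives in the extracted text) is to minimize, over unit vectors g, the sum of cos²(θ(n, g)) over the orthogonal set plus the sum of sin²(θ(n, g)) over the aligned set. Under that assumption the minimizer is the eigenvector with the smallest eigenvalue of N⊥ᵀN⊥ − N∥ᵀN∥. The function name `estimate_gravity` and its arguments are hypothetical.

```python
import numpy as np

def estimate_gravity(normals, g0=(0.0, 1.0, 0.0), schedule=((45.0, 5), (15.0, 5))):
    """Refine a unit gravity direction from unit surface normals of shape (N, 3).

    schedule holds (threshold d in degrees, number of iterations) pairs,
    mirroring the 5 iterations at 45 degrees and 5 at 15 degrees in the text.
    """
    normals = np.asarray(normals, dtype=float)
    g = np.array(g0, dtype=float)
    g /= np.linalg.norm(g)
    for d_deg, n_iters in schedule:
        d = np.deg2rad(d_deg)
        for _ in range(n_iters):
            # Angle to the current estimate, ignoring the sign of the normal.
            theta = np.arccos(np.clip(np.abs(normals @ g), 0.0, 1.0))
            aligned = normals[theta < d]                  # e.g. floor, table-tops
            orthogonal = normals[theta > np.pi / 2 - d]   # e.g. walls
            if len(aligned) == 0 and len(orthogonal) == 0:
                break
            # Assumed objective: sum_orth (n.g)^2 + sum_aligned (1 - (n.g)^2),
            # a quadratic form in g minimized by the smallest eigenvector.
            m = orthogonal.T @ orthogonal - aligned.T @ aligned
            _, eigvecs = np.linalg.eigh(m)   # eigenvalues in ascending order
            g_new = eigvecs[:, 0]
            # Eigenvectors have arbitrary sign; keep the one near the old estimate.
            g = g_new if g_new @ g >= 0 else -g_new
    return g
```

Starting from the Y-axis, a scene whose floor normals are tilted by a few degrees should converge to the tilted direction, since the hard assignment at d = 45° already places floor normals in the aligned set and wall normals in the orthogonal set.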
30 To benchmark the accuracy of our gravity direction, we use the metric of [27]. [sent-84, score-0.249]
31 We rotate the point cloud to align the Y-axis with the estimated gravity direction and look at the angle the floor makes with the Y-axis. [sent-85, score-0.482]
32 Note that our gravity estimate is within 5◦ of the actual direction for 90% of the images, and works as well as the method of [27], while being significantly simpler. [sent-87, score-0.302]
33 Perceptual Organization One of our main goals is to perform perceptual organization on RGB-D images. [sent-89, score-0.163]
34 We would like an algorithm that detects contours and produces a hierarchy of bottom-up segmentations from which we can extract superpixels at any granularity. [sent-90, score-0.356]
35 We would also like a generic machinery that can be trained to detect object boundaries, but that can also be used to detect different types of geometric contours by leveraging the depth information. [sent-91, score-0.378]
36 In order to design such a depth-aware perceptual organization system, we build on the architecture of the gPb-ucm algorithm [2], which is a widely used software for monocular image segmentation. [sent-92, score-0.303]
37 Geometric Contour Cues In addition to color data, we have, at each image pixel, an estimation of its 3D location in the scene and of its surface normal orientation. [sent-95, score-0.184]
38 In order to estimate the local geometric contour cues, we consider a disk centered at each image location. [sent-99, score-0.364]
39 We split the disk into two halves at a pre-defined orientation and compare the information in the two disk-halves, as suggested originally in [22] for contour detection in monocular images. [sent-100, score-0.444]
40 Then, for DG we calculate the distance between the two planes at the disk center and for NG+ and NG− we calculate the angle between the normals of the planes. [sent-104, score-0.23]
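As a rough illustration of the half-disk comparison, the sketch below fits a least-squares plane to the 3D points in each half of a disk and returns a depth-gap value and the angle between the two normals. Several things here are assumptions rather than details from the text: the helper names `fit_plane` and `half_disk_cues`, the use of SVD for plane fitting, and the definition of the gap as the difference of the signed distances from the disk center to the two planes. The split of the normal angle into convex (NG+) and concave (NG−) responses is not modeled.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through 3D points: returns (unit normal, centroid)."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    # The normal is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(points - centroid)
    return vt[-1], centroid

def half_disk_cues(points_a, points_b, center):
    """Return (depth gap at the center, angle in degrees between plane normals)."""
    n_a, c_a = fit_plane(points_a)
    n_b, c_b = fit_plane(points_b)
    if n_a @ n_b < 0:          # orient both normals consistently
        n_b = -n_b
    angle = np.degrees(np.arccos(np.clip(n_a @ n_b, -1.0, 1.0)))
    center = np.asarray(center, dtype=float)
    # Hypothetical gap measure: difference of signed center-to-plane distances.
    gap = abs(n_a @ (center - c_a) - n_b @ (center - c_b))
    return gap, angle
```

Two parallel half-disks at different depths give a large gap and a near-zero angle, while a floor-wall junction gives an angle near 90 degrees.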
41 Contour Detection and Segmentation We formulate contour detection as a binary pixel classification problem where the goal is to separate contour from non-contour pixels, an approach commonly adopted in the literature [22, 12, 2]. [sent-107, score-0.472]
42 Contour Locations We first consider the average of all local contour cues in each orientation and form a combined gradient by taking the maximum response across orientations. [sent-109, score-0.419]
43 We then compute the watershed transform of the combined gradient and declare all pixels on the watershed lines as possible contour locations. [sent-110, score-0.442]
44 Since the combined gradient is constructed with contours from all the cues, the watershed over-segmentation guarantees full recall for the contour locations. [sent-111, score-0.383]
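The construction of the combined gradient (average over cues in each orientation, then maximum over orientations) is simple enough to write down directly. The sketch below assumes the local cues are stacked in an array of shape (n_cues, n_orientations, H, W). That layout and the name `combined_gradient` are illustrative assumptions, and the subsequent watershed step is left out.

```python
import numpy as np

def combined_gradient(cues):
    """cues: array of shape (n_cues, n_orientations, H, W) of local contour responses."""
    cues = np.asarray(cues, dtype=float)
    per_orientation = cues.mean(axis=0)   # average of all cues in each orientation
    return per_orientation.max(axis=0)    # maximum response across orientations
```

The result is a single (H, W) map on which a watershed transform could then be run to propose candidate contour locations.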
45 We first identify the ground-truth contours in a given orientation, and then declare as positives the candidate contour pixels in the same orientation within a distance tolerance. [sent-114, score-0.431]
46 Features For each orientation, we consider as features our geometric cues DG, NG+ and NG− at 4 scales, and the monocular cues from gPb: BG, CG and TG at their 3 default scales. [sent-116, score-0.256]
47 We also consider three additional cues: the depth of the pixel, a spectral gradient [2] obtained by globalizing the combined local gradient via spectral graph partitioning, and the length of the oriented contour. [sent-117, score-0.239]
48 Oriented Contour Detectors We use as classifiers support vector machines (SVMs) with additive kernels [21], which allow learning nonlinear decision boundaries with an efficiency close to linear SVMs, and use their probabilistic output as the strength of our oriented contour detectors. [sent-118, score-0.437]
49 Hierarchical Segmentation Finally, we use the generic machinery of [2] to construct a hierarchy of segmentations, by merging regions of the initial over-segmentation based on the average strength of our oriented contour detectors. [sent-119, score-0.523]
50 Amodal Completion The hierarchical segmentation obtained thus far only groups regions which are continuous in 2D image space. [sent-122, score-0.16]
51 Common examples are floors, table tops and counter tops, which often get fragmented into small superpixels because of objects resting on them. [sent-124, score-0.283]
52 Estimate low dimensional parametric geometric models for individual superpixels obtained from the hierarchical segmentation. [sent-128, score-0.292]
53 Greedily merge superpixels into bigger more complete regions based on the agreement among the parametric geometric fits, and re-estimate the geometric model. [sent-130, score-0.313]
54 In the context of indoor scenes we use planes as our low dimensional geometric primitive. [sent-131, score-0.216]
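A much simplified sketch of the estimate-then-merge idea with planes as the primitive is given below. It is not the authors' procedure: the agreement test used here (the RMS residual of a single plane refitted to the union of two point sets), the threshold `max_rms`, and the function names are all assumptions made for illustration.

```python
import numpy as np

def plane_rms(points):
    """RMS distance of 3D points (N x 3, N >= 3) to their least-squares plane."""
    centered = points - points.mean(axis=0)
    # The smallest singular value squared equals the sum of squared residuals.
    s = np.linalg.svd(centered, compute_uv=False)
    return s[-1] / np.sqrt(len(points))

def greedy_plane_merge(superpixels, max_rms=0.02):
    """Greedily merge point sets whose union is still well explained by one plane.

    superpixels: list of (N_i x 3) point arrays. Returns lists of merged indices.
    """
    groups = [np.asarray(p, dtype=float) for p in superpixels]
    members = [[i] for i in range(len(groups))]
    while True:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                rms = plane_rms(np.vstack([groups[i], groups[j]]))
                if rms <= max_rms and (best is None or rms < best[0]):
                    best = (rms, i, j)
        if best is None:
            return members
        _, i, j = best
        # Merging the points means the plane is re-estimated on the next pass.
        groups[i] = np.vstack([groups[i], groups[j]])
        members[i] += members[j]
        del groups[j], members[j]
```

Two floor patches separated by an object would be merged into one region under this test, while a floor patch and a wall patch would stay separate.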
55 Results We train and test our oriented contour detectors using the instance level boundary annotations of the NYUD2 as the ground-truth labels. [sent-136, score-0.325]
56 However, a drawback of choosing one single level of superpixels in later applications is that it inevitably leads to over- or under-segmentation. [sent-160, score-0.199]
57 Table 2 compares in detail this design choice against our amodal completion approach. [sent-161, score-0.571]
58 A first observation is that our base superpixels are finer than the NYUD2 ones: we obtain a larger number and our ground truth covering is lower (from 0. [sent-162, score-0.199]
59 62: no single level in the full hierarchy would produce better regions than our amodally completed superpixels. [sent-170, score-0.163]
60 Our use of our depth-aware contour cues DG, NG+, and NG− is further justified because it allows us to also infer the type for each boundary, whether it is a depth edge, concave edge, convex edge or an albedo edge. [sent-171, score-0.289]
61 Table 1: Segmentation benchmarks for hierarchical segmentation on NYUD2. [sent-177, score-0.205]
62 Segmentation benchmarks for superpixels on [sent-184, score-0.415]
63 Semantic Segmentation We now turn to the problem of semantic segmentation on NYUD2. [sent-188, score-0.224]
64 These include scene structure categories like walls, floors, ceiling, windows, doors; furniture items like beds, chairs, tables, sofa; and objects like lamps, bags, towels, boxes. [sent-191, score-0.198]
65 We leverage the reorganization machinery developed in Section 3 and approach the semantic segmentation task by predicting labels for each superpixel. [sent-193, score-0.374]
66 We define features based on the geocentric pose, shape, size and appearance of the superpixel and its amodal completion. [sent-194, score-0.833]
67 Features As noted above, we define features for each superpixel based on the properties of both the superpixel and its amodal completion. [sent-199, score-0.789]
68 As we describe below, our features capture affordances via absolute sizes and heights which are more meaningful when calculated for the amodal completion rather than just over the superpixel. [sent-200, score-0.624]
69 Note that we describe the features below in context of superpixels but we actually calculate them for both the superpixel and its amodal completion. [sent-201, score-0.833]
70 1 Generic Features Geocentric Pose: These features capture the pose (orientation and height) of the superpixel relative to the gravity direction. [sent-204, score-0.554]
71 This includes the size of the 3D bounding rectangle, the surface area - total area, vertical area, horizontal area facing up, horizontal area facing down, if the superpixel is clipped by the image and what fraction of the convex hull is occluded. [sent-207, score-0.298]
72 Shape Features: These include - planarity of the superpixel (estimated by the error in the plane fitting), average strength of local geometric gradients inside the region, on the boundary of the region and outside the region, average orientation of patches in the regions around the superpixel. [sent-208, score-0.309]
73 These features are relatively crude and can be replaced by richer features such as spin images [13] or 3D shape contexts [7]. [sent-209, score-0.164]
74 In total, these add up to 101 features each for the superpixel and its amodal completion. [sent-210, score-0.634]
75 2 Category Specific Features In addition to features above, we train one-versus-rest SVM classifiers based on appearance and shape of the superpixel, and use the SVM scores for each category as features along with the other features mentioned above. [sent-213, score-0.326]
76 To train these SVMs, we use (1) histograms of vector quantized color SIFT [29] as the appearance features, and (2) histograms of geocentric textons (vector quantized words in the joint 2-dimensional space of height from the ground and local angle with the gravity direction) as shape features. [sent-214, score-0.73]
77 This makes up for 40 features each for the superpixel and its amodal completion. [sent-215, score-0.634]
78 To prevent over-fitting because of retraining on the same set, we train our category specific SVMs only on half of the train set. [sent-221, score-0.167]
79 As baselines, we use [27]-Structure Classifier, where we retrain their structure classifiers for the 40 class task, and [23], where we again retrained their model for this task on this dataset using code available on their website2. [sent-225, score-0.185]
80 We observe that we are able to do well on scene surfaces (walls, floors, ceilings, cabinets, counters), and most furniture items (bed, chairs, sofa). [sent-226, score-0.196]
81 We do poorly on small objects, due to limited training data and weak shape features (our features are designed to describe big scene level surfaces and objects). [sent-227, score-0.229]
82 Ablation Studies In order to gain insights into how much each type of feature contributes towards the semantic segmentation task, we conduct an ablation study by removing parts from the final system. [sent-231, score-0.294]
83 Randomized decision forests (RF) work slightly better than SVMs when using only generic or category specific features, but SVMs are able to more effectively combine information when using both these sets of features. [sent-233, score-0.203]
84 Using features from amodal completion also provides some improvement. [sent-234, score-0.624]
85 [27]-SP: we also retrain our system on the superpixels from [27] and obtain better performance than [27] (36. [sent-235, score-0.283]
86 Performance on NYUD2 4 category task We compare our performance with existing results on the super-ordinate category task as defined in [27] in Table 5. [sent-238, score-0.244]
87 To generate predictions for the super-ordinate categories, we simply retrain our classifiers to predict the 4 super-ordinate category labels. [sent-239, score-0.212]
88 As before we report the pixel-wise Jaccard index for (Footnote 2: We run their code on NYUD2 with our bottom-up segmentation hierarchy using the same classifier hyper-parameters as specified in their code.) [sent-240, score-0.302]
89 We report the pixel-wise Jaccard index for 4 categories, and 3 aggregate metrics: avg - average Jaccard index, fwavacc - pixel-frequency weighted average Jaccard index, and pixacc - pixel accuracy. [sent-252, score-0.256]
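The three aggregate metrics can be computed from a pixel-level confusion matrix, as in the sketch below. The function name `segmentation_metrics` and the convention of skipping classes that appear in neither the ground truth nor the prediction when averaging are assumptions, not details taken from the paper.

```python
import numpy as np

def segmentation_metrics(gt, pred, num_classes):
    """Return (per-class Jaccard, avg, fwavacc, pixacc) for integer label arrays."""
    gt = np.asarray(gt).ravel()
    pred = np.asarray(pred).ravel()
    # conf[i, j] = number of pixels with ground truth i predicted as j.
    conf = np.bincount(gt * num_classes + pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(conf).astype(float)
    gt_count = conf.sum(axis=1).astype(float)
    pred_count = conf.sum(axis=0).astype(float)
    union = gt_count + pred_count - tp
    valid = union > 0
    jaccard = np.zeros(num_classes)
    jaccard[valid] = tp[valid] / union[valid]
    avg = jaccard[valid].mean()                       # average Jaccard index
    fwavacc = (gt_count / conf.sum() * jaccard).sum() # weighted by pixel frequency
    pixacc = tp.sum() / conf.sum()                    # pixel accuracy
    return jaccard, avg, fwavacc, pixacc
```

For a toy case with ground truth `[0, 0, 0, 1]` and prediction `[0, 0, 1, 1]`, the per-class Jaccard values are 2/3 and 1/2, so the frequency-weighted average (0.625) differs from the plain average (about 0.583), which shows why both are reported.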
90 Scene Classification We address the task of indoor scene classification based on the idea that a scene can be recognized by identifying the objects in it. [sent-268, score-0.311]
91 Thus, we use our predicted semantic segmentation maps as features for this task. [sent-269, score-0.277]
92 Our(RF), Our(SVM) are versions of our system using randomized decision forests, and additive kernel support vector machines respectively (Section 4. [sent-300, score-0.179]
93 Our(RF+Scene) and Our(SVM+Scene) are versions of our system which use the inferred scene category as additional context features (Section 5). [sent-302, score-0.234]
94 segmentation, long range completion, semantic segmentation and contour classification (into depth discontinuities (red), normal discontinuities (green) and convex normal discontinuities (blue)). [sent-303, score-0.831]
95 We report the diagonal of the confusion matrix for each of the scene class and use the mean of the diagonal, and overall accuracy as aggregate measures of performance. [sent-309, score-0.182]
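The two aggregate measures for scene classification follow directly from the confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes, and that the reported diagonal is row-normalized (per-class accuracy). Both conventions, and the name `scene_metrics`, are assumptions.

```python
def scene_metrics(confusion):
    """confusion[i][j]: number of images of true class i predicted as class j.

    Returns (per-class diagonal, mean diagonal, overall accuracy).
    """
    # Row-normalized diagonal: fraction of each true class labeled correctly.
    per_class = [row[i] / sum(row) for i, row in enumerate(confusion) if sum(row) > 0]
    mean_diagonal = sum(per_class) / len(per_class)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(len(confusion))) / total
    return per_class, mean_diagonal, accuracy
```

With an imbalanced matrix such as `[[9, 1], [1, 1]]`, the mean diagonal is 0.7 while the overall accuracy is about 0.83, so the mean diagonal is the measure that is less dominated by frequent scene classes.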
96 We compare against an appearance-only baseline based on SPM on vector quantized color SIFT descriptors [29], a geometry-only baseline based on SPM on geocentric textons (introduced in Sect. [sent-312, score-0.341]
97 Accuracy: 46, 52, 55, 58; Mean Diagonal: 34, 37, 38, 47. Table 7: Performance on the scene classification task: We report the diagonal entry of the confusion matrix for each category; and the mean diagonal of the confusion matrix and the overall accuracy as aggregate metrics. [sent-326, score-0.182]
98 We then do an additional experiment of using the predicted scene information as additional features (‘scene context’) for the superpixels and obtain a performance of 45. [sent-332, score-0.327]
99 Conclusion We have developed a set of algorithmic tools for perceptual organization and recognition in indoor scenes from RGB-D data. [sent-336, score-0.322]
100 Our system produces contour detection and hierarchical segmentation, grouping by amodal completion, and semantic labeling of objects and scene surfaces. [sent-337, score-0.952]
wordName wordTfidf (topN-words)
[('amodal', 0.426), ('gravity', 0.249), ('contour', 0.236), ('geocentric', 0.199), ('superpixels', 0.199), ('gpb', 0.16), ('superpixel', 0.155), ('jaccard', 0.146), ('completion', 0.145), ('ucm', 0.14), ('depth', 0.131), ('segmentation', 0.124), ('ng', 0.121), ('indoor', 0.112), ('spm', 0.109), ('hierarchy', 0.106), ('normals', 0.104), ('textons', 0.104), ('machinery', 0.101), ('semantic', 0.1), ('orientation', 0.097), ('organization', 0.086), ('perceptual', 0.077), ('dg', 0.077), ('scene', 0.075), ('category', 0.073), ('furniture', 0.073), ('disk', 0.071), ('ablation', 0.07), ('aggregate', 0.07), ('additive', 0.067), ('floors', 0.066), ('gi', 0.063), ('watershed', 0.063), ('normal', 0.06), ('spin', 0.058), ('svms', 0.057), ('geometric', 0.057), ('amodally', 0.057), ('bestc', 0.057), ('bottomup', 0.057), ('fwavacc', 0.057), ('pixacc', 0.057), ('height', 0.055), ('angle', 0.055), ('features', 0.053), ('direction', 0.053), ('cues', 0.053), ('contours', 0.051), ('retrain', 0.051), ('tops', 0.05), ('categories', 0.05), ('surface', 0.049), ('task', 0.049), ('rf', 0.048), ('surfaces', 0.048), ('gupta', 0.047), ('scenes', 0.047), ('classifiers', 0.047), ('forests', 0.047), ('train', 0.047), ('declare', 0.047), ('horizontal', 0.047), ('walls', 0.046), ('cloud', 0.046), ('grouping', 0.046), ('berkeley', 0.045), ('benchmarks', 0.045), ('decision', 0.045), ('ods', 0.042), ('orthogonal', 0.042), ('oriented', 0.042), ('tpami', 0.041), ('predictions', 0.041), ('monocular', 0.04), ('floor', 0.04), ('concave', 0.04), ('discontinuities', 0.04), ('look', 0.039), ('hoiem', 0.039), ('svm', 0.039), ('generic', 0.038), ('efros', 0.038), ('quantized', 0.038), ('koppula', 0.038), ('retrained', 0.038), ('report', 0.037), ('kinect', 0.037), ('hierarchical', 0.036), ('room', 0.036), ('baselines', 0.036), ('candidates', 0.036), ('index', 0.035), ('sift', 0.034), ('fragmented', 0.034), ('randomized', 0.034), ('system', 0.033), ('gradient', 0.033), ('ceiling', 0.033), ('silberman', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
Author: Saurabh Gupta, Pablo Arbeláez, Jitendra Malik
Abstract: We address the problems of contour detection, bottom-up grouping and semantic segmentation using RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset [27]. We propose algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information. We show that our system can label each contour with its type (depth, normal or albedo). We also propose a generic method for long-range amodal completion of surfaces and show its effectiveness in grouping. We then turn to the problem of semantic segmentation and propose a simple approach that classifies superpixels into the 40 dominant object categories in NYUD2. We use both generic and class-specific features to encode the appearance and geometry of objects. We also show how our approach can be used for scene classification, and how this contextual information in turn improves object recognition. In all of these tasks, we report significant improvements over the state-of-the-art.
Author: Jia Xu, Maxwell D. Collins, Vikas Singh
Abstract: We study the problem of interactive segmentation and contour completion for multiple objects. The form of constraints our model incorporates are those coming from user scribbles (interior or exterior constraints) as well as information regarding the topology of the 2-D space after partitioning (number of closed contours desired). We discuss how concepts from discrete calculus and a simple identity using the Euler characteristic of a planar graph can be utilized to derive a practical algorithm for this problem. We also present specialized branch and bound methods for the case of single contour completion under such constraints. On an extensive dataset of ∼1000 images, our experiments suggest that a small amount of side knowledge can give strong improvements over fully unsupervised contour completion methods. We show that by interpreting user indications topologically, user effort is substantially reduced.
3 0.1969028 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
Author: Yan Wang, Rongrong Ji, Shih-Fu Chang
Abstract: Recent years have witnessed a growing interest in understanding the semantics of point clouds in a wide variety of applications. However, point cloud labeling remains an open problem, due to the difficulty in acquiring sufficient 3D point labels towards training effective classifiers. In this paper, we overcome this challenge by utilizing the existing massive 2D semantic labeled datasets from decade-long community efforts, such as ImageNet and LabelMe, and a novel “cross-domain” label propagation approach. Our proposed method consists of two major novel components, Exemplar SVM based label propagation, which effectively addresses the cross-domain issue, and a graphical model based contextual refinement incorporating 3D constraints. Most importantly, the entire process does not require any training data from the target scenes, also with good scalability towards large scale applications. We evaluate our approach on the well-known Cornell Point Cloud Dataset, achieving much greater efficiency and comparable accuracy even without any 3D training data. Our approach shows further major gains in accuracy when the training data from the target scenes is used, outperforming state-of-the-art approaches with far better efficiency.
4 0.18201074 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context
Author: Gautam Singh, Jana Kosecka
Abstract: This paper presents a nonparametric approach to semantic parsing using small patches and simple gradient, color and location features. We learn the relevance of individual feature channels at test time using a locally adaptive distance metric. To further improve the accuracy of the nonparametric approach, we examine the importance of the retrieval set used to compute the nearest neighbours using a novel semantic descriptor to retrieve better candidates. The approach is validated by experiments on several datasets used for semantic parsing demonstrating the superiority of the method compared to the state of art approaches.
5 0.16602471 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
Author: David Weiss, Ben Taskar
Abstract: We propose SCALPEL, a flexible method for object segmentation that integrates rich region-merging cues with mid- and high-level information about object layout, class, and scale into the segmentation process. Unlike competing approaches, SCALPEL uses a cascade of bottom-up segmentation models that is capable of learning to ignore boundaries early on, yet use them as a stopping criterion once the object has been mostly segmented. Furthermore, we show how such cascades can be learned efficiently. When paired with a novel method that generates better localized shape priors than our competitors, our method leads to a concise, accurate set of segmentation proposals; these proposals are more accurate on the PASCAL VOC2010 dataset than state-of-the-art methods that use re-ranking to filter much larger bags of proposals. The code for our algorithm is available online.
6 0.16138151 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
7 0.15986128 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
8 0.15974769 245 cvpr-2013-Layer Depth Denoising and Completion for Structured-Light RGB-D Cameras
9 0.15942562 357 cvpr-2013-Revisiting Depth Layers from Occlusions
10 0.15888199 187 cvpr-2013-Geometric Context from Videos
11 0.15797523 437 cvpr-2013-Towards Fast and Accurate Segmentation
12 0.15191293 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
13 0.15099713 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
14 0.14936337 29 cvpr-2013-A Video Representation Using Temporal Superpixels
15 0.14486684 468 cvpr-2013-Winding Number for Region-Boundary Consistent Salient Contour Extraction
16 0.14435883 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
17 0.14344741 394 cvpr-2013-Shading-Based Shape Refinement of RGB-D Images
18 0.14282514 227 cvpr-2013-Intrinsic Scene Properties from a Single RGB-D Image
19 0.14179091 33 cvpr-2013-Active Contours with Group Similarity
20 0.13685614 460 cvpr-2013-Weakly-Supervised Dual Clustering for Image Semantic Segmentation
topicId topicWeight
[(0, 0.275), (1, 0.088), (2, 0.074), (3, -0.004), (4, 0.121), (5, -0.018), (6, -0.018), (7, 0.211), (8, -0.103), (9, -0.013), (10, 0.12), (11, -0.145), (12, 0.012), (13, 0.117), (14, 0.027), (15, -0.031), (16, 0.029), (17, -0.023), (18, -0.188), (19, 0.106), (20, 0.05), (21, 0.06), (22, -0.047), (23, -0.014), (24, -0.07), (25, 0.159), (26, 0.024), (27, -0.045), (28, 0.035), (29, 0.029), (30, 0.004), (31, 0.039), (32, -0.085), (33, 0.024), (34, 0.026), (35, -0.024), (36, -0.014), (37, 0.116), (38, 0.012), (39, 0.055), (40, 0.032), (41, 0.049), (42, 0.02), (43, 0.046), (44, 0.091), (45, 0.062), (46, 0.102), (47, -0.026), (48, 0.028), (49, 0.103)]
simIndex simValue paperId paperTitle
same-paper 1 0.94328594 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
Author: Saurabh Gupta, Pablo Arbeláez, Jitendra Malik
Abstract: We address the problems of contour detection, bottom-up grouping and semantic segmentation using RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset [27]. We propose algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information. We show that our system can label each contour with its type (depth, normal or albedo). We also propose a generic method for long-range amodal completion of surfaces and show its effectiveness in grouping. We then turn to the problem of semantic segmentation and propose a simple approach that classifies superpixels into the 40 dominant object categories in NYUD2. We use both generic and class-specific features to encode the appearance and geometry of objects. We also show how our approach can be used for scene classification, and how this contextual information in turn improves object recognition. In all of these tasks, we report significant improvements over the state-of-the-art.
2 0.77443373 212 cvpr-2013-Image Segmentation by Cascaded Region Agglomeration
Author: Zhile Ren, Gregory Shakhnarovich
Abstract: We propose a hierarchical segmentation algorithm that starts with a very fine oversegmentation and gradually merges regions using a cascade of boundary classifiers. This approach allows the weights of region and boundary features to adapt to the segmentation scale at which they are applied. The stages of the cascade are trained sequentially, with asymmetric loss to maximize boundary recall. On six segmentation data sets, our algorithm achieves best performance under most region-quality measures, and does it with fewer segments than the prior work. Our algorithm is also highly competitive in a dense oversegmentation (superpixel) regime under boundary-based measures.
3 0.72913444 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
Author: Jeremie Papon, Alexey Abramov, Markus Schoeler, Florentin Wörgötter
Abstract: Unsupervised over-segmentation of an image into regions of perceptually similar pixels, known as superpixels, is a widely used preprocessing step in segmentation algorithms. Superpixel methods reduce the number of regions that must be considered later by more computationally expensive algorithms, with a minimal loss of information. Nevertheless, as some information is inevitably lost, it is vital that superpixels not cross object boundaries, as such errors will propagate through later steps. Existing methods make use of projected color or depth information, but do not consider three dimensional geometric relationships between observed data points which can be used to prevent superpixels from crossing regions of empty space. We propose a novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three dimensional, rather than projective, space. Enforcing the constraint that segmented regions must have spatial connectivity prevents label flow across semantic object boundaries which might otherwise be violated. Additionally, as the algorithm works directly in 3D space, observations from several calibrated RGB+D cameras can be segmented jointly. Experiments on a large data set of human annotated RGB+D images demonstrate a significant reduction in occurrence of clusters crossing object boundaries, while maintaining speeds comparable to state-of-the-art 2D methods.
4 0.70087469 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
Author: Luming Zhang, Mingli Song, Zicheng Liu, Xiao Liu, Jiajun Bu, Chun Chen
Abstract: Weakly supervised image segmentation is a challenging problem in the computer vision field. In this paper, we present a new weakly supervised image segmentation algorithm by learning the distribution of spatially structured superpixel sets from image-level labels. Specifically, we first extract graphlets from each image, where a graphlet is a small-sized graph consisting of superpixels as its nodes and it encapsulates the spatial structure of those superpixels. Then, a manifold embedding algorithm is proposed to transform graphlets of different sizes into equal-length feature vectors. Thereafter, we use GMM to learn the distribution of the post-embedding graphlets. Finally, we propose a novel image segmentation algorithm, called graphlet cut, that leverages the learned graphlet distribution in measuring the homogeneity of a set of spatially structured superpixels. Experimental results show that the proposed approach outperforms state-of-the-art weakly supervised image segmentation methods, and its performance is comparable to those of the fully supervised segmentation models.
5 0.68440455 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
Author: Yan Wang, Rongrong Ji, Shih-Fu Chang
Abstract: Recent years have witnessed a growing interest in understanding the semantics of point clouds in a wide variety of applications. However, point cloud labeling remains an open problem, due to the difficulty in acquiring sufficient 3D point labels towards training effective classifiers. In this paper, we overcome this challenge by utilizing the existing massive 2D semantic labeled datasets from decade-long community efforts, such as ImageNet and LabelMe, and a novel “cross-domain” label propagation approach. Our proposed method consists of two major novel components, Exemplar SVM based label propagation, which effectively addresses the cross-domain issue, and a graphical model based contextual refinement incorporating 3D constraints. Most importantly, the entire process does not require any training data from the target scenes, also with good scalability towards large scale applications. We evaluate our approach on the well-known Cornell Point Cloud Dataset, achieving much greater efficiency and comparable accuracy even without any 3D training data. Our approach shows further major gains in accuracy when the training data from the target scenes is used, outperforming state-of-the-art approaches with far better efficiency.
6 0.67527157 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
7 0.67138094 72 cvpr-2013-Boundary Detection Benchmarking: Beyond F-Measures
9 0.65908808 468 cvpr-2013-Winding Number for Region-Boundary Consistent Salient Contour Extraction
10 0.63307482 281 cvpr-2013-Measures and Meta-Measures for the Supervised Evaluation of Image Segmentation
11 0.63018399 460 cvpr-2013-Weakly-Supervised Dual Clustering for Image Semantic Segmentation
12 0.62587684 140 cvpr-2013-Efficient Color Boundary Detection with Color-Opponent Mechanisms
13 0.61664885 366 cvpr-2013-Robust Region Grouping via Internal Patch Statistics
14 0.61203241 29 cvpr-2013-A Video Representation Using Temporal Superpixels
15 0.61121607 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation
16 0.6030001 26 cvpr-2013-A Statistical Model for Recreational Trails in Aerial Images
17 0.60265785 437 cvpr-2013-Towards Fast and Accurate Segmentation
18 0.60029382 406 cvpr-2013-Spatial Inference Machines
19 0.58565921 401 cvpr-2013-Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection
20 0.58162951 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
topicId topicWeight
[(10, 0.125), (16, 0.021), (26, 0.055), (28, 0.014), (30, 0.151), (33, 0.249), (39, 0.013), (55, 0.013), (67, 0.054), (69, 0.092), (80, 0.016), (87, 0.123)]
simIndex simValue paperId paperTitle
1 0.88949531 128 cvpr-2013-Discrete MRF Inference of Marginal Densities for Non-uniformly Discretized Variable Space
Author: Masaki Saito, Takayuki Okatani, Koichiro Deguchi
Abstract: This paper is concerned with the inference of marginal densities based on MRF models. The optimization algorithms for continuous variables are only applicable to a limited number of problems, whereas those for discrete variables are versatile. Thus, it is quite common to convert the continuous variables into discrete ones for the problems that ideally should be solved in the continuous domain, such as stereo matching and optical flow estimation. In this paper, we show a novel formulation for this continuous-discrete conversion. The key idea is to estimate the marginal densities in the continuous domain by approximating them with mixtures of rectangular densities. Based on this formulation, we derive a mean field (MF) algorithm and a belief propagation (BP) algorithm. These algorithms can correctly handle the case where the variable space is discretized in a non-uniform manner. By intentionally using such a non-uniform discretization, a better balance between computational efficiency and accuracy of marginal density estimates could be achieved. We present a method for actually doing this, which dynamically discretizes the variable space in a coarse-to-fine manner in the course of the computation. Experimental results show the effectiveness of our approach.
same-paper 2 0.88437295 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
Author: Saurabh Gupta, Pablo Arbeláez, Jitendra Malik
Abstract: We address the problems of contour detection, bottom-up grouping and semantic segmentation using RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset [27]. We propose algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information. We show that our system can label each contour with its type (depth, normal or albedo). We also propose a generic method for long-range amodal completion of surfaces and show its effectiveness in grouping. We then turn to the problem of semantic segmentation and propose a simple approach that classifies superpixels into the 40 dominant object categories in NYUD2. We use both generic and class-specific features to encode the appearance and geometry of objects. We also show how our approach can be used for scene classification, and how this contextual information in turn improves object recognition. In all of these tasks, we report significant improvements over the state-of-the-art.
3 0.88132012 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
Author: Horst Possegger, Sabine Sternig, Thomas Mauthner, Peter M. Roth, Horst Bischof
Abstract: Combining foreground images from multiple views by projecting them onto a common ground-plane has been recently applied within many multi-object tracking approaches. These planar projections introduce severe artifacts and constrain most approaches to objects moving on a common 2D ground-plane. To overcome these limitations, we introduce the concept of an occupancy volume exploiting the full geometry and the objects' center of mass and develop an efficient algorithm for 3D object tracking. Individual objects are tracked using the local mass density scores within a particle filter based approach, constrained by a Voronoi partitioning between nearby trackers. Our method benefits from the geometric knowledge given by the occupancy volume to robustly extract features and train classifiers on-demand, when volumetric information becomes unreliable. We evaluate our approach on several challenging real-world scenarios including the public APIDIS dataset. Experimental evaluations demonstrate significant improvements compared to state-of-the-art methods, while achieving real-time performance.
4 0.88112485 468 cvpr-2013-Winding Number for Region-Boundary Consistent Salient Contour Extraction
Author: Yansheng Ming, Hongdong Li, Xuming He
Abstract: This paper aims to extract salient closed contours from an image. For this vision task, both region segmentation cues (e.g. color/texture homogeneity) and boundary detection cues (e.g. local contrast, edge continuity and contour closure) play important and complementary roles. In this paper we show how to combine both cues in a unified framework. The main focus is given to how to maintain the consistency (compatibility) between the region cues and the boundary cues. To this end, we introduce the use of the winding number, a well-known concept in topology, as a powerful mathematical device. By this device, the region-boundary consistency is represented as a set of simple linear relationships. Our method is applied to the figure-ground segmentation problem. The experiments show clearly improved results.
5 0.87933266 69 cvpr-2013-Boosting Binary Keypoint Descriptors
Author: Tomasz Trzcinski, Mario Christoudias, Pascal Fua, Vincent Lepetit
Abstract: Binary keypoint descriptors provide an efficient alternative to their floating-point competitors as they enable faster processing while requiring less memory. In this paper, we propose a novel framework to learn an extremely compact binary descriptor we call BinBoost that is very robust to illumination and viewpoint changes. Each bit of our descriptor is computed with a boosted binary hash function, and we show how to efficiently optimize the different hash functions so that they complement each other, which is key to compactness and robustness. The hash functions rely on weak learners that are applied directly to the image patches, which frees us from any intermediate representation and lets us automatically learn the image gradient pooling configuration of the final descriptor. Our resulting descriptor significantly outperforms the state-of-the-art binary descriptors and performs similarly to the best floating-point descriptors at a fraction of the matching time and memory footprint.
6 0.87928897 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
7 0.87559503 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
9 0.87336189 314 cvpr-2013-Online Object Tracking: A Benchmark
10 0.87167025 298 cvpr-2013-Multi-scale Curve Detection on Surfaces
11 0.8685649 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
12 0.86844689 222 cvpr-2013-Incorporating User Interaction and Topological Constraints within Contour Completion via Discrete Calculus
13 0.86832887 71 cvpr-2013-Boundary Cues for 3D Object Shape Recovery
14 0.86801469 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning
15 0.86796921 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
16 0.86542749 155 cvpr-2013-Exploiting the Power of Stereo Confidences
17 0.86426777 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
18 0.86383891 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path
19 0.86383283 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
20 0.86328554 279 cvpr-2013-Manhattan Scene Understanding via XSlit Imaging