cvpr cvpr2013 cvpr2013-458 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jeremie Papon, Alexey Abramov, Markus Schoeler, Florentin Wörgötter
Abstract: Unsupervised over-segmentation of an image into regions of perceptually similar pixels, known as superpixels, is a widely used preprocessing step in segmentation algorithms. Superpixel methods reduce the number of regions that must be considered later by more computationally expensive algorithms, with a minimal loss of information. Nevertheless, as some information is inevitably lost, it is vital that superpixels not cross object boundaries, as such errors will propagate through later steps. Existing methods make use of projected color or depth information, but do not consider three dimensional geometric relationships between observed data points which can be used to prevent superpixels from crossing regions of empty space. We propose a novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three dimensional, rather than projective, space. Enforcing the constraint that segmented regions must have spatial connectivity prevents label flow across semantic object boundaries which might otherwise be violated. Additionally, as the algorithm works directly in 3D space, observations from several calibrated RGB+D cameras can be segmented jointly. Experiments on a large data set of human annotated RGB+D images demonstrate a significant reduction in occurrence of clusters crossing object boundaries, while maintaining speeds comparable to state-of-the-art 2D methods.
Reference: text
sentIndex sentText sentNum sentScore
1 Nevertheless, as some information is inevitably lost, it is vital that superpixels not cross object boundaries, as such errors will propagate through later steps. [sent-5, score-0.349]
2 Existing methods make use of projected color or depth information, but do not consider three dimensional geometric relationships between observed data points which can be used to prevent superpixels from crossing regions of empty space. [sent-6, score-0.533]
3 We propose a novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three dimensional, rather than projective, space. [sent-7, score-0.221]
4 Enforcing the constraint that segmented regions must have spatial connectivity prevents label flow across semantic object boundaries which might otherwise be violated. [sent-8, score-0.273]
5 Introduction Segmentation algorithms aim to group pixels in images into perceptually meaningful regions which conform to object boundaries. [sent-12, score-0.116]
6 While this scheme has been successfully used in many state-of-the-art algorithms [4, 15], it suffers from one significant disadvantage: mistakes in the over-segmentation which creates the superpixels generally cannot be recovered from and will propagate to later steps in the vision pipeline. [sent-19, score-0.32]
7 Due to their strong impact on the quality of the eventual segmentation [5], it is important that superpixels have certain characteristics. [sent-20, score-0.362]
8 Of these, not violating object boundaries is the most vital, as failing to do so will decrease the accuracy of classifiers used later, since they will be forced to consider pixels which belong to more than one class. [sent-21, score-0.115]
9 Another useful quality is regular distribution over the area being segmented, as this will produce a simpler graph for later steps. [sent-23, score-0.106]
10 This is accomplished using a seeding methodology based in 3D space and a flow-constrained local iterative clustering which uses color and geometric features. [sent-25, score-0.246]
11 In Section 3 we present the 3D supervoxel segmentation algorithm. [sent-29, score-0.375]
12 In Section 5 we use standard quantitative measures on results from a large RGB+D semantic segmentation dataset to demonstrate that our algorithm conforms to real object boundaries better than other state-of-the-art methods. [sent-31, score-0.157]
13 Graph-based superpixel methods, similar to graph-based full segmentation methods, consider each pixel as a node in a graph, with edges connecting to neighboring pixels. [sent-38, score-0.217]
14 Edge weights are used to characterize similarity between pixels, and superpixel labels are solved for by minimizing a cost function over the graph. [sent-39, score-0.115]
15 The method of [8] produces superpixels which conform to a regular lattice structure by seeking optimal paths horizontally and vertically across a boundary image. [sent-41, score-0.488]
16 While this method does have the advantage of producing superpixels in a regular grid, it sacrifices boundary adherence to do so, and furthermore, is heavily dependent on the quality of the precomputed boundary image. [sent-43, score-0.458]
17 TurboPixels uses a geometric flow-based algorithm based on level sets, and enforces a compactness constraint to ensure that superpixels have regular shape. [sent-45, score-0.397]
18 Recently, a significantly faster class of superpixel methods has emerged: Simple Linear Iterative Clustering (SLIC) [1]. [sent-51, score-0.115]
19 This is an iterative gradient ascent algorithm which uses a local k-means clustering approach to efficiently find superpixels, clustering pixels in the five dimensional space of color and pixel location. [sent-52, score-0.206]
20 Depth-Adaptive Superpixels[14] recently extended this idea to use depth images, expanding the clustering space with the added dimensions of depth and point normal angles. [sent-53, score-0.237]
21 DASP nonetheless remains in the class of 2.5D methods, as it does not explicitly consider 3D connectivity or geometric flow. [sent-55, score-0.114]
22 Geometrically Constrained Supervoxels In this Section we present Voxel Cloud Connectivity Segmentation (VCCS), a new method for generating superpixels and supervoxels from 3D point cloud data. [sent-61, score-0.823]
23 The supervoxels produced by VCCS adhere to object boundaries better than state-of-the-art methods while the method remains efficient enough to use in online applications. [sent-62, score-0.398]
24 The seeding of supervoxel clusters is done by partitioning 3D space, rather than the projected image plane. [sent-64, score-0.495]
25 This ensures that supervoxels are evenly distributed according to the geometry of the scene. [sent-65, score-0.294]
26 The iterative clustering algorithm enforces strict spatial connectivity of occupied voxels when considering points for clusters. [sent-67, score-0.441]
27 This means that supervoxels strictly cannot flow across boundaries which are disjoint in 3D space, even though they are connected in the projected plane. [sent-68, score-0.456]
28 In Section 3.1 we shall describe how neighbor voxels are calculated efficiently; the subsections that follow then describe seeding, the feature space, and the flow-constrained clustering. [sent-70, score-0.259]
29 Unless otherwise noted, all processing is performed in the 3D point-cloud space constructed from one or more RGB+D cameras (or any other source of point-cloud data). [sent-74, score-0.297]
30 Furthermore, because we work exclusively in a voxel-cloud space (rather than the continuous point-cloud space), we shall adopt the following notation to refer to the voxel at index i within voxel-cloud V of voxel resolution r: Vr(i). [sent-75, score-0.555]
31 Adjacency Graph Adjacency is a key element of the proposed method, as it ensures that supervoxels do not flow across object boundaries which are disconnected in space. [sent-82, score-0.399]
32 There are three definitions of adjacency in a voxelized 3D space: 6-, 18-, or 26-adjacent. [sent-83, score-0.27]
33 As a preliminary step, we construct the adjacency graph for the voxel-cloud. [sent-86, score-0.174]
34 This can be done efficiently by searching the voxel kd-tree, as, for a given voxel, the centers of all 26-adjacent voxels are contained within √3 ∗ Rvoxel. [sent-87, score-0.427]
35 Rvoxel specifies the voxel resolution which will be used for the segmentation (for clarity, we shall simply refer to discrete elements at this resolution as voxels). [sent-88, score-0.403]
36 The adjacency graph thus constructed is used extensively throughout the rest of the algorithm. [sent-89, score-0.174]
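The adjacency-graph construction lends itself to a short sketch. The following is a minimal illustration (not the authors' PCL implementation); the function and variable names, the SciPy kd-tree, and the small numerical margin on the search radius are assumptions of this sketch.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_adjacency(voxel_centers, r_voxel):
    """Map each occupied voxel to its 26-adjacent occupied neighbors.

    voxel_centers: (N, 3) array of occupied-voxel centers on a grid of
    resolution r_voxel. Returns a dict {voxel index: [neighbor indices]}.
    """
    tree = cKDTree(voxel_centers)
    # Centers of all 26-adjacent voxels lie within sqrt(3) * r_voxel of a
    # given center; the 1% margin only guards against floating-point error.
    radius = np.sqrt(3.0) * r_voxel * 1.01
    adjacency = {}
    for i, center in enumerate(voxel_centers):
        adjacency[i] = [j for j in tree.query_ball_point(center, radius) if j != i]
    return adjacency
```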
37 In order to do this, we first divide the space into a voxelized grid with a chosen resolution Rseed, which is significantly larger than Rvoxel. [sent-93, score-0.19]
38 The effect of increasing the seed resolution Rseed can be seen in Figure 2. [sent-94, score-0.135]
39 Initial candidates for seeding are chosen by selecting the voxel in the cloud nearest to the center of each occupied seeding voxel. [sent-95, score-0.821]
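A minimal sketch of this candidate selection, under the assumption that the seed grid is axis-aligned with the origin; the helper names and the tie-breaking of the nearest-voxel query are illustrative only.

```python
import numpy as np
from scipy.spatial import cKDTree

def select_seed_candidates(voxel_centers, r_seed):
    """Pick one candidate voxel per occupied cell of the coarse seed grid."""
    tree = cKDTree(voxel_centers)
    cells = np.floor(voxel_centers / r_seed).astype(int)
    candidates = set()
    for cell in np.unique(cells, axis=0):
        cell_center = (cell + 0.5) * r_seed
        # The voxel in the cloud nearest to the center of this occupied
        # seeding cell becomes the initial seed candidate.
        _, nearest = tree.query(cell_center)
        candidates.add(int(nearest))
    return sorted(candidates)
```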
40 Once we have candidates for seeding, we must filter out seeds caused by noise in the depth image. [sent-96, score-0.152]
41 This means that we must remove seeds which are points isolated in space (which are likely due to noise), while leaving those which exist on surfaces. [sent-97, score-0.119]
42 To do this, we establish a small search radius Rsearch around each seed, and delete seeds which do not have at least as many voxels as would be occupied by a planar surface intersecting with half of the search volume (this is shown by the green plane in Figure 1). [sent-98, score-0.448]
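A sketch of this noise filter follows; the exact occupancy threshold is not spelled out here, so the planar-disc estimate of 0.5·π·(Rsearch/Rvoxel)² used below is an assumption.

```python
import numpy as np
from scipy.spatial import cKDTree

def filter_seed_candidates(candidates, voxel_centers, r_search, r_voxel):
    """Keep candidates whose neighborhood looks like a surface rather than noise."""
    tree = cKDTree(voxel_centers)
    # Approximate number of voxels a plane intersecting half of the search
    # volume would occupy (illustrative estimate of the stated criterion).
    min_occupied = 0.5 * np.pi * (r_search / r_voxel) ** 2
    kept = []
    for s in candidates:
        occupied = len(tree.query_ball_point(voxel_centers[s], r_search))
        if occupied >= min_occupied:
            kept.append(s)
    return kept
```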
43 Once filtered, we shift the remaining seeds to the connected voxel within the search volume which has the smallest gradient (see Figure 1). [sent-99, score-0.354]
44 Rseed determines the distance between supervoxels, while Rvoxel determines the resolution to which the cloud is quantized. [sent-101, score-0.266]
45 Rsearch is used to determine if there are a sufficient number of occupied voxels to necessitate a seed. [sent-102, score-0.277]
46 The gradient is computed as the sum of distances in CIELab space from neighboring voxels, normalized by the number of connected adjacent voxels Nadj. [sent-107, score-0.27]
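One possible reading of this relocation step in code; the gradient below is the summed CIELab distance to connected neighbors divided by Nadj, and the candidate set over which the seed is shifted is assumed to be its connected voxels within the search volume.

```python
import numpy as np

def cielab_gradient(i, lab, adjacency):
    """Sum of CIELab distances to connected neighbors, divided by N_adj."""
    neighbors = adjacency.get(i, [])
    if not neighbors:
        return np.inf          # isolated voxel: never preferred as a seed
    dists = np.linalg.norm(lab[neighbors] - lab[i], axis=1)
    return dists.sum() / len(neighbors)

def relocate_seed(candidate_voxels, lab, adjacency):
    """Shift the seed to the connected voxel with the smallest gradient."""
    grads = [cielab_gradient(i, lab, adjacency) for i in candidate_voxels]
    return candidate_voxels[int(np.argmin(grads))]
```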
47 Once the seed voxels have been selected, we initialize the supervoxel feature vector by finding the center (in feature space) of the seed voxel and connected neighbors within 2 voxels. [sent-109, score-0.918]
48 Features and Distance Measure VCCS supervoxels are clusters in a 39 dimensional space, given as F = [x, y, z, L, a, b, FPFH1, . . . , FPFH33]. [sent-112, score-0.32]
49 To calculate distances in this space, we must first normalize the spatial component, as distances, and thus their relative importance, will vary depending on the seed resolution Rseed. [sent-121, score-0.134]
50 Image segmented using VCCS with seed resolutions of 0. [sent-124, score-0.11]
51 In practice we keep the spatial distance constant relative to the other two so that supervoxels occupy a relatively spherical space, but this is not strictly necessary. [sent-136, score-0.294]
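A hedged sketch of such a distance in the 39-dimensional space: the feature layout [x, y, z, L, a, b, FPFH1..33] follows the text, but the weights and the exact normalization of the spatial term by Rseed are assumptions of this sketch, not the published constants.

```python
import numpy as np

def supervoxel_distance(f, center, r_seed, w_spatial=1.0, w_color=1.0, w_fpfh=1.0):
    """Distance between a 39-D voxel feature f and a supervoxel center."""
    d_spatial = np.linalg.norm(f[0:3] - center[0:3]) / r_seed   # scale by seed resolution
    d_color = np.linalg.norm(f[3:6] - center[3:6])              # CIELab difference
    d_fpfh = np.linalg.norm(f[6:39] - center[6:39])             # FPFH histogram difference
    return np.sqrt(w_spatial * d_spatial ** 2
                   + w_color * d_color ** 2
                   + w_fpfh * d_fpfh ** 2)
```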
52 Flow Constrained Clustering Assigning voxels to supervoxels is done iteratively, using a local k-means clustering related to [1, 14], with the significant difference that we consider connectivity and flow when assigning pixels to a cluster. [sent-140, score-0.68]
53 The general process is as follows: beginning at the voxel nearest the cluster center, we flow outward to adjacent voxels and compute the distance from each of these to the supervoxel center using Equation 4. [sent-141, score-0.906]
54 If the distance is the smallest this voxel has seen, its label is set, and using the adjacency graph, we add its neighbors which are further from the center to our search queue for this label. [sent-142, score-0.428]
55 We proceed iteratively outwards until we have reached the edge of the search volume for each supervoxel (or have no more neighbors to check). [sent-144, score-0.447]
56 This amounts to a breadth-first search of the adjacency graph, where we check the same level for all supervoxels before we proceed down the graphs in depth. [sent-145, score-0.54]
57 Importantly, we avoid edges to adjacent voxels which we have already checked this iteration. [sent-146, score-0.273]
58 The search concludes for a supervoxel when we have reached all the leaf nodes of its adjacency graph or none of the nodes searched in the current level were set to its label. [sent-147, score-0.584]
59 Supervoxel labels will tend to be continuous in 3D space, since labels flow outward from the center of each supervoxel, expanding in space at the same rate. [sent-150, score-0.171]
60 Once the search of all supervoxel adjacency graphs has concluded, we update the centers of each supervoxel cluster by taking the mean of all its constituents. [sent-151, score-0.853]
61 For this work we found that the supervoxels were stable within a few iterations, and so have simply used five iterations for all presented results. [sent-153, score-0.294]
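A simplified sketch of one labeling pass of this flow-constrained clustering: every supervoxel expands breadth-first over the adjacency graph, one level for all supervoxels before descending, and a voxel keeps the label of the closest center it has seen. The plain Euclidean feature distance, the absence of an explicit search-volume limit, and all names below are simplifications and assumptions of this sketch.

```python
import numpy as np
from collections import deque

def feature_distance(f, center):
    # Plain Euclidean distance in feature space; a normalized measure such
    # as the one sketched above would be used in practice.
    return float(np.linalg.norm(f - center))

def flow_constrained_pass(features, adjacency, centers, seeds):
    """One label-assignment pass: breadth-first expansion from every seed."""
    n_voxels = len(features)
    labels = np.full(n_voxels, -1, dtype=int)
    best_dist = np.full(n_voxels, np.inf)
    frontiers = [deque([s]) for s in seeds]        # one search queue per supervoxel
    while any(frontiers):
        for k, frontier in enumerate(frontiers):   # same level for all supervoxels
            next_frontier = deque()
            while frontier:
                v = frontier.popleft()
                d = feature_distance(features[v], centers[k])
                if d < best_dist[v]:
                    best_dist[v] = d
                    labels[v] = k
                    # Labels flow outward only through voxels just claimed,
                    # so they cannot leak across regions disjoint in 3D space.
                    next_frontier.extend(adjacency[v])
            frontiers[k] = next_frontier
    return labels

def update_centers(features, labels, n_clusters):
    """Recompute each supervoxel center as the mean of its constituents."""
    dim = features.shape[1]
    return np.vstack([features[labels == k].mean(axis=0) if np.any(labels == k)
                      else np.zeros(dim) for k in range(n_clusters)])
```

In practice a few such passes, each followed by an update of the centers, are alternated; as noted above, five iterations were sufficient for the reported results.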
62 Three Dimensional Voxel Segments The proposed method works directly on voxelized point clouds, which has advantages over existing methods which operate in the projected image plane. [sent-155, score-0.214]
63 The most important of these is the ability to segment clouds coming from many sensor observations - either using multiple cameras [3] or accumulated clouds from one [6]. [sent-156, score-0.204]
64 Computationally, this is advantageous, as the speed of our method is dependent on the number of occupied voxels in the scene2, and not the number of observed pixels. [sent-157, score-0.277]
65 As observations will have significant overlap, this means that it is cheaper to segment the overall voxel cloud than the individual 2D observations. [sent-158, score-0.488]
66 For instance, the scene in Figure 5 comes from 180 Kinect observations (640x480), and yet the final voxel cloud (with Rvoxel = 0. [sent-159, score-0.488]
67 Additionally, while VCCS will become more accurate as cloud information is filled in by additional observations, 2D methods must necessarily segment them independently and therefore cannot make use of the added information. [sent-161, score-0.262]
68 We should note that while the initial voxelization of the cloud does take more time with a larger cloud, it remains insignificant overall.
69 Dotted edges in the adjacency graph are not searched, as the nodes have already been added to the search queue. [sent-197, score-0.278]
70 Even if the individual observations were segmented with a 2D method such as [14], it is not clear how one would combine the multiple segmented 2D images, as superpixels from sequential observations will have no relation to each other and will have conflicting partitionings of space in the merged cloud. [sent-199, score-0.386]
71 Experimental Evaluation In order to evaluate the quality of supervoxels generated by VCCS, we performed a quantitative comparison with three state-of-the-art superpixel methods using publicly available source code. [sent-201, score-0.409]
72 We selected the two 2D techniques with the highest published performance from a recent review [1]: a graph-based method, GCb10 [13], and a gradient ascent local clustering method, SLIC [1]. [sent-202, score-0.131]
73 Dataset For testing, we used the recently created NYU Depth Dataset V2 semantic segmentation dataset of Silberman et al. [sent-207, score-0.123]
74 Returning to the Projected Plane RGB+D sensors produce what is known as an organized point cloud: a cloud where every point corresponds to a pixel in the original RGB and depth images. [sent-214, score-0.299]
75 The resulting supervoxels do not cover these holes as shown in the bottom left, since the cloud has no points in them. [sent-229, score-0.568]
76 Once the cloud is voxelized, it necessarily loses this correspondence, and becomes an unstructured cloud which no longer has any direct relationship back to the 2D projected plane. [sent-232, score-0.529]
77 As such, in order to compare results with existing 2D methods we were forced to devise a scheme to apply supervoxel labels to the original image. [sent-233, score-0.36]
78 To do this, we take every point in the original organized cloud and search for the nearest voxel in the voxelized representation. [sent-234, score-0.625]
79 Unfortunately, since there are blank areas in the original depth image due to such factors as reflective surfaces, noise, and limited sensor range, this leaves us with some blank areas in the output labeled images. [sent-235, score-0.123]
80 This is not a significant drawback, as the purpose of the algorithm is to form supervoxels in 3D space, not superpixels in the projected plane, and this hole-filling is only needed for comparison purposes. [sent-237, score-0.644]
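A minimal sketch of this back-projection, assuming the organized cloud is stored as an (H·W, 3) array with NaNs at pixels that have no valid depth; the hole filling mentioned above is not shown.

```python
import numpy as np
from scipy.spatial import cKDTree

def project_labels_to_image(points, voxel_centers, voxel_labels, height, width):
    """Give every valid pixel the label of its nearest voxel; -1 elsewhere."""
    labels = np.full(height * width, -1, dtype=int)
    valid = np.all(np.isfinite(points), axis=1)      # drop pixels with NaN depth
    tree = cKDTree(voxel_centers)
    _, nearest = tree.query(points[valid])           # nearest voxel per valid point
    labels[valid] = voxel_labels[nearest]
    return labels.reshape(height, width)
```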
81 Over-segmentation of a cloud from the RGB-D scenes dataset[6]. [sent-239, score-0.236]
82 The cloud is created by aligning 180 Kinect frames, examples of which are seen on the left side. [sent-240, score-0.305]
83 The resulting cloud has over 100k points with Rvoxel = 0. [sent-241, score-0.236]
84 This hole-filling actually makes our results worse, since it does not consider depth, and therefore tends to bleed over some object boundaries that were correctly maintained in the supervoxel representation. [sent-268, score-0.367]
85 Evaluation Metrics The most important property for superpixels is the ability to adhere to, and not cross, object boundaries. [sent-272, score-0.336]
86 To measure this quantitatively, we have used two standard metrics for boundary adherence: boundary recall and under-segmentation error [7, 13]. [sent-273, score-0.158]
87 Boundary recall measures what fraction of the ground truth edges fall within two pixels of a superpixel boundary. [sent-274, score-0.186]
88 High boundary recall indicates that the superpixels properly follow the edges of objects in the ground truth labeling. [sent-275, score-0.424]
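A sketch of boundary recall as described, with a two-pixel tolerance; both boundary maps are assumed to be boolean images of the same size.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def boundary_recall(gt_boundary, sp_boundary, tolerance=2):
    """Fraction of ground-truth boundary pixels lying within `tolerance`
    pixels of some superpixel boundary pixel."""
    near_sp = binary_dilation(sp_boundary, iterations=tolerance)
    hits = np.logical_and(gt_boundary, near_sp).sum()
    total = gt_boundary.sum()
    return hits / total if total > 0 else 1.0
```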
89 As can be seen, VCCS and SLIC have the best boundary recall performance, giving similar results as the number of superpixels in the segmentation varies. [sent-277, score-0.46]
90 Given a set of M ground truth segments g1, . . . , gM, and the set of superpixels from an over-segmentation, s1, . . . , sK, the under-segmentation error is defined as [sent-282, score-0.293]
91 Euseg = (1/N) [ Σ_{i=1..M} ( Σ_{sj : sj ∩ gi ≠ ∅} |sj| ) − N ], (5) where {sj : sj ∩ gi ≠ ∅} is the set of superpixels required to cover a ground truth label gi, and N is the number of labeled ground truth pixels. [sent-287, score-0.385]
92 A lower value means that fewer superpixels violated ground truth borders by crossing over them. [sent-288, score-0.356]
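A direct sketch of Equation 5; pixels carrying an "unlabeled" ground-truth value (assumed here to be 0) are excluded from N, following the text.

```python
import numpy as np

def undersegmentation_error(gt, sp, unlabeled=0):
    """Under-segmentation error of superpixel map `sp` against ground truth `gt`."""
    labeled = gt != unlabeled
    n = labeled.sum()
    covering = 0
    for g in np.unique(gt[labeled]):
        # Superpixels needed to cover ground-truth region g ...
        overlapping = np.unique(sp[gt == g])
        # ... each contribute their full size |s_j| to the sum.
        covering += sum(int((sp == s).sum()) for s in overlapping)
    return (covering - n) / n
```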
93 Figure 7 compares the four algorithms, giving under-segmentation error for increasing superpixel counts. [sent-289, score-0.142]
94 Time Performance As superpixels are used as a preprocessing step to reduce the complexity of segmentation, they should be computationally efficient so that they do not negatively impact overall performance. [sent-293, score-0.293]
95 To quantify segmentation speed, we measured the time required for the methods on images of increasing size (for the 2D methods) and increasing number of voxels (for VCCS). [sent-294, score-0.329]
96 VCCS shows performance competitive with SLIC and DASP (the two fastest superpixel methods in the literature) for voxel clouds of sizes which are typical for Kinect data at Rvoxel = 0. [sent-297, score-0.445]
97 In contrast to existing approaches, it works on a voxelized cloud, using spatial connectivity and geometric features to help superpixels conform better to object boundaries. [sent-302, score-0.654]
98 This is fortunate, as we consider under-segmentation error to be the more important of the two measures, since boundary recall does not penalize for crossing ground truth boundaries, meaning that even with a high boundary recall score, superpixels might perform poorly in actual segmentation. [sent-304, score-0.552]
99 We have also presented timing results which show that VCCS has run time comparable to the fastest existing methods, and is fast enough for use as a pre-processing step in online semantic segmentation applications such as robotics. [sent-305, score-0.161]
100 The variation seen in VCCS run-time is due to dependence on other factors, such as Rseed and overall amount of connectivity in the adjacency graphs. [sent-332, score-0.223]
wordName wordTfidf (topN-words)
[('vccs', 0.439), ('supervoxel', 0.306), ('supervoxels', 0.294), ('superpixels', 0.293), ('cloud', 0.236), ('voxel', 0.221), ('voxels', 0.206), ('rvoxel', 0.146), ('adjacency', 0.14), ('seeding', 0.132), ('voxelized', 0.13), ('slic', 0.125), ('dasp', 0.122), ('superpixel', 0.115), ('fpfh', 0.1), ('rseed', 0.098), ('conform', 0.09), ('connectivity', 0.083), ('cielab', 0.08), ('seed', 0.078), ('rgb', 0.073), ('turbopixels', 0.072), ('clouds', 0.071), ('occupied', 0.071), ('segmentation', 0.069), ('depth', 0.063), ('crossing', 0.063), ('seeds', 0.063), ('boundaries', 0.061), ('boundary', 0.06), ('projected', 0.057), ('rusu', 0.054), ('clustering', 0.053), ('shall', 0.053), ('conference', 0.052), ('papon', 0.049), ('rsearch', 0.049), ('voxelization', 0.049), ('weikersdorfer', 0.049), ('sj', 0.046), ('regular', 0.045), ('flow', 0.044), ('ascent', 0.044), ('adhere', 0.043), ('kinect', 0.042), ('additionally', 0.042), ('supe', 0.04), ('outward', 0.04), ('abramov', 0.04), ('pcl', 0.04), ('outwards', 0.04), ('recall', 0.038), ('fastest', 0.038), ('search', 0.038), ('holes', 0.038), ('graphs', 0.037), ('icra', 0.035), ('levinshtein', 0.035), ('graph', 0.034), ('adjacent', 0.034), ('international', 0.033), ('edges', 0.033), ('nodes', 0.033), ('automation', 0.032), ('segmented', 0.032), ('volume', 0.032), ('pages', 0.032), ('geometric', 0.031), ('cameras', 0.031), ('clarity', 0.031), ('observations', 0.031), ('proceed', 0.031), ('returning', 0.03), ('blank', 0.03), ('space', 0.03), ('resolution', 0.03), ('nyu', 0.03), ('vital', 0.029), ('center', 0.029), ('silberman', 0.028), ('enforces', 0.028), ('expanding', 0.028), ('robotics', 0.027), ('forced', 0.027), ('later', 0.027), ('gi', 0.027), ('semantic', 0.027), ('increasing', 0.027), ('moore', 0.027), ('programme', 0.027), ('existing', 0.027), ('created', 0.027), ('perceptually', 0.026), ('achanta', 0.026), ('dimensional', 0.026), ('cluster', 0.026), ('notes', 0.026), ('must', 0.026), ('speeds', 0.025), ('fill', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
Author: Jeremie Papon, Alexey Abramov, Markus Schoeler, Florentin Wörgötter
Abstract: Unsupervised over-segmentation of an image into regions of perceptually similar pixels, known as superpixels, is a widely used preprocessing step in segmentation algorithms. Superpixel methods reduce the number of regions that must be considered later by more computationally expensive algorithms, with a minimal loss of information. Nevertheless, as some information is inevitably lost, it is vital that superpixels not cross object boundaries, as such errors will propagate through later steps. Existing methods make use of projected color or depth information, but do not consider three dimensional geometric relationships between observed data points which can be used to prevent superpixels from crossing regions of empty space. We propose a novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three dimensional, rather than projective, space. Enforcing the constraint that segmented regions must have spatial connectivity prevents label flow across semantic object boundaries which might otherwise be violated. Additionally, as the algorithm works directly in 3D space, observations from several calibrated RGB+D cameras can be segmented jointly. Experiments on a large data set of human annotated RGB+D images demonstrate a significant reduction in occurrence of clusters crossing object boundaries, while maintaining speeds comparable to state-of-the-art 2D methods.
2 0.23965287 29 cvpr-2013-A Video Representation Using Temporal Superpixels
Author: Jason Chang, Donglai Wei, John W. Fisher_III
Abstract: We develop a generative probabilistic model for temporally consistent superpixels in video sequences. In contrast to supervoxel methods, object parts in different frames are tracked by the same temporal superpixel. We explicitly model flow between frames with a bilateral Gaussian process and use this information to propagate superpixels in an online fashion. We consider four novel metrics to quantify performance of a temporal superpixel representation and demonstrate superior performance when compared to supervoxel methods.
3 0.23229855 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
Author: Yan Wang, Rongrong Ji, Shih-Fu Chang
Abstract: Recent years have witnessed a growing interest in understanding the semantics of point clouds in a wide variety of applications. However, point cloud labeling remains an open problem, due to the difficulty in acquiring sufficient 3D point labels towards training effective classifiers. In this paper, we overcome this challenge by utilizing the existing massive 2D semantic labeled datasets from decadelong community efforts, such as ImageNet and LabelMe, and a novel “cross-domain ” label propagation approach. Our proposed method consists of two major novel components, Exemplar SVM based label propagation, which effectively addresses the cross-domain issue, and a graphical model based contextual refinement incorporating 3D constraints. Most importantly, the entire process does not require any training data from the target scenes, also with good scalability towards large scale applications. We evaluate our approach on the well-known Cornell Point Cloud Dataset, achieving much greater efficiency and comparable accuracy even without any 3D training data. Our approach shows further major gains in accuracy when the training data from the target scenes is used, outperforming state-ofthe-art approaches with far better efficiency.
4 0.18270859 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
Author: Guang Shu, Afshin Dehghan, Mubarak Shah
Abstract: We propose an approach to improve the detection performance of a generic detector when it is applied to a particular video. The performance of offline-trained objects detectors are usually degraded in unconstrained video environments due to variant illuminations, backgrounds and camera viewpoints. Moreover, most object detectors are trained using Haar-like features or gradient features but ignore video specificfeatures like consistent colorpatterns. In our approach, we apply a Superpixel-based Bag-of-Words (BoW) model to iteratively refine the output of a generic detector. Compared to other related work, our method builds a video-specific detector using superpixels, hence it can handle the problem of appearance variation. Most importantly, using Conditional Random Field (CRF) along with our super pixel-based BoW model, we develop and algorithm to segment the object from the background . Therefore our method generates an output of the exact object regions instead of the bounding boxes generated by most detectors. In general, our method takes detection bounding boxes of a generic detector as input and generates the detection output with higher average precision and precise object regions. The experiments on four recent datasets demonstrate the effectiveness of our approach and significantly improves the state-of-art detector by 5-16% in average precision.
5 0.16024335 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
Author: Bo Zheng, Yibiao Zhao, Joey C. Yu, Katsushi Ikeuchi, Song-Chun Zhu
Abstract: In this paper, we present an approach for scene understanding by reasoning physical stability of objects from point cloud. We utilize a simple observation that, by human design, objects in static scenes should be stable with respect to gravity. This assumption is applicable to all scene categories and poses useful constraints for the plausible interpretations (parses) in scene understanding. Our method consists of two major steps: 1) geometric reasoning: recovering solid 3D volumetric primitives from defective point cloud; and 2) physical reasoning: grouping the unstable primitives to physically stable objects by optimizing the stability and the scene prior. We propose to use a novel disconnectivity graph (DG) to represent the energy landscape and use a Swendsen-Wang Cut (MCMC) method for optimization. In experiments, we demonstrate that the algorithm achieves substantially better performance for i) object segmentation, ii) 3D volumetric recovery of the scene, and iii) better parsing result for scene understanding in comparison to state-of-the-art methods in both public dataset and our own new dataset.
6 0.15191293 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
7 0.14176233 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
8 0.13662337 460 cvpr-2013-Weakly-Supervised Dual Clustering for Image Semantic Segmentation
9 0.13014619 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context
10 0.12739101 212 cvpr-2013-Image Segmentation by Cascaded Region Agglomeration
11 0.11035211 357 cvpr-2013-Revisiting Depth Layers from Occlusions
13 0.10278228 366 cvpr-2013-Robust Region Grouping via Internal Patch Statistics
14 0.1018225 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation
15 0.097806796 406 cvpr-2013-Spatial Inference Machines
16 0.096610822 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
17 0.089658767 230 cvpr-2013-Joint 3D Scene Reconstruction and Class Segmentation
18 0.086875044 394 cvpr-2013-Shading-Based Shape Refinement of RGB-D Images
19 0.086828753 222 cvpr-2013-Incorporating User Interaction and Topological Constraints within Contour Completion via Discrete Calculus
20 0.086559951 326 cvpr-2013-Patch Match Filter: Efficient Edge-Aware Filtering Meets Randomized Search for Fast Correspondence Field Estimation
topicId topicWeight
[(0, 0.184), (1, 0.071), (2, 0.058), (3, 0.007), (4, 0.096), (5, -0.011), (6, 0.018), (7, 0.09), (8, -0.125), (9, 0.02), (10, 0.176), (11, -0.089), (12, 0.015), (13, 0.095), (14, -0.01), (15, -0.035), (16, 0.015), (17, -0.059), (18, -0.107), (19, 0.119), (20, 0.068), (21, 0.026), (22, -0.125), (23, 0.023), (24, -0.079), (25, 0.051), (26, -0.072), (27, -0.081), (28, 0.015), (29, 0.072), (30, 0.024), (31, 0.055), (32, 0.066), (33, -0.074), (34, 0.058), (35, -0.034), (36, -0.006), (37, 0.016), (38, 0.043), (39, -0.068), (40, 0.044), (41, -0.116), (42, -0.044), (43, -0.006), (44, 0.053), (45, -0.024), (46, -0.021), (47, -0.046), (48, -0.003), (49, -0.077)]
simIndex simValue paperId paperTitle
same-paper 1 0.91466606 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
Author: Jeremie Papon, Alexey Abramov, Markus Schoeler, Florentin Wörgötter
Abstract: Unsupervised over-segmentation of an image into regions of perceptually similar pixels, known as superpixels, is a widely used preprocessing step in segmentation algorithms. Superpixel methods reduce the number of regions that must be considered later by more computationally expensive algorithms, with a minimal loss of information. Nevertheless, as some information is inevitably lost, it is vital that superpixels not cross object boundaries, as such errors will propagate through later steps. Existing methods make use of projected color or depth information, but do not consider three dimensional geometric relationships between observed data points which can be used to prevent superpixels from crossing regions of empty space. We propose a novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three dimensional, rather than projective, space. Enforcing the constraint that segmented regions must have spatial connectivity prevents label flow across semantic object boundaries which might otherwise be violated. Additionally, as the algorithm works directly in 3D space, observations from several calibrated RGB+D cameras can be segmented jointly. Experiments on a large data set of human annotated RGB+D images demonstrate a significant reduction in occurrence of clusters crossing object boundaries, while maintaining speeds comparable to state-of-the-art 2D methods.
2 0.81850952 29 cvpr-2013-A Video Representation Using Temporal Superpixels
Author: Jason Chang, Donglai Wei, John W. Fisher_III
Abstract: We develop a generative probabilistic model for temporally consistent superpixels in video sequences. In contrast to supervoxel methods, object parts in different frames are tracked by the same temporal superpixel. We explicitly model flow between frames with a bilateral Gaussian process and use this information to propagate superpixels in an online fashion. We consider four novel metrics to quantify performance of a temporal superpixel representation and demonstrate superior performance when compared to supervoxel methods.
3 0.78592372 26 cvpr-2013-A Statistical Model for Recreational Trails in Aerial Images
Author: Andrew Predoehl, Scott Morris, Kobus Barnard
Abstract: unkown-abstract
4 0.75582337 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
Author: Yan Wang, Rongrong Ji, Shih-Fu Chang
Abstract: Recent years have witnessed a growing interest in understanding the semantics of point clouds in a wide variety of applications. However, point cloud labeling remains an open problem, due to the difficulty in acquiring sufficient 3D point labels towards training effective classifiers. In this paper, we overcome this challenge by utilizing the existing massive 2D semantic labeled datasets from decadelong community efforts, such as ImageNet and LabelMe, and a novel “cross-domain ” label propagation approach. Our proposed method consists of two major novel components, Exemplar SVM based label propagation, which effectively addresses the cross-domain issue, and a graphical model based contextual refinement incorporating 3D constraints. Most importantly, the entire process does not require any training data from the target scenes, also with good scalability towards large scale applications. We evaluate our approach on the well-known Cornell Point Cloud Dataset, achieving much greater efficiency and comparable accuracy even without any 3D training data. Our approach shows further major gains in accuracy when the training data from the target scenes is used, outperforming state-ofthe-art approaches with far better efficiency.
5 0.73690629 366 cvpr-2013-Robust Region Grouping via Internal Patch Statistics
Author: Xiaobai Liu, Liang Lin, Alan L. Yuille
Abstract: In this work, we present an efficient multi-scale low-rank representation for image segmentation. Our method begins with partitioning the input images into a set of superpixels, followed by seeking the optimal superpixel-pair affinity matrix, both of which are performed at multiple scales of the input images. Since low-level superpixel features are usually corrupted by image noises, we propose to infer the low-rank refined affinity matrix. The inference is guided by two observations on natural images. First, looking into a single image, local small-size image patterns tend to recur frequently within the same semantic region, but may not appear in semantically different regions. We call this internal image statistics as replication prior, and quantitatively justify it on real image databases. Second, the affinity matrices at different scales should be consistently solved, which leads to the cross-scale consistency constraint. We formulate these two purposes with one unified formulation and develop an efficient optimization procedure. Our experiments demonstrate the presented method can substantially improve segmentation accuracy.
6 0.73422599 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
7 0.70981872 460 cvpr-2013-Weakly-Supervised Dual Clustering for Image Semantic Segmentation
8 0.68536884 280 cvpr-2013-Maximum Cohesive Grid of Superpixels for Fast Object Localization
9 0.67719811 212 cvpr-2013-Image Segmentation by Cascaded Region Agglomeration
10 0.60575318 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
11 0.55165875 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation
12 0.54161149 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
13 0.53677797 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
14 0.52114707 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context
15 0.47277421 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
16 0.46696964 357 cvpr-2013-Revisiting Depth Layers from Occlusions
17 0.46678627 13 cvpr-2013-A Higher-Order CRF Model for Road Network Extraction
18 0.46396187 406 cvpr-2013-Spatial Inference Machines
19 0.45493722 222 cvpr-2013-Incorporating User Interaction and Topological Constraints within Contour Completion via Discrete Calculus
20 0.45300272 193 cvpr-2013-Graph Transduction Learning with Connectivity Constraints with Application to Multiple Foreground Cosegmentation
topicId topicWeight
[(10, 0.538), (16, 0.016), (26, 0.033), (33, 0.215), (67, 0.036), (69, 0.042), (87, 0.048)]
simIndex simValue paperId paperTitle
1 0.93798488 295 cvpr-2013-Multi-image Blind Deblurring Using a Coupled Adaptive Sparse Prior
Author: Haichao Zhang, David Wipf, Yanning Zhang
Abstract: This paper presents a robust algorithm for estimating a single latent sharp image given multiple blurry and/or noisy observations. The underlying multi-image blind deconvolution problem is solved by linking all of the observations together via a Bayesian-inspired penalty function which couples the unknown latent image, blur kernels, and noise levels together in a unique way. This coupled penalty function enjoys a number of desirable properties, including a mechanism whereby the relative-concavity or shape is adapted as a function of the intrinsic quality of each blurry observation. In this way, higher quality observations may automatically contribute more to the final estimate than heavily degraded ones. The resulting algorithm, which requires no essential tuning parameters, can recover a high quality image from a set of observations containing potentially both blurry and noisy examples, without knowing a priorithe degradation type of each observation. Experimental results on both synthetic and real-world test images clearly demonstrate the efficacy of the proposed method.
2 0.92565972 307 cvpr-2013-Non-uniform Motion Deblurring for Bilayer Scenes
Author: Chandramouli Paramanand, Ambasamudram N. Rajagopalan
Abstract: We address the problem of estimating the latent image of a static bilayer scene (consisting of a foreground and a background at different depths) from motion blurred observations captured with a handheld camera. The camera motion is considered to be composed of in-plane rotations and translations. Since the blur at an image location depends both on camera motion and depth, deblurring becomes a difficult task. We initially propose a method to estimate the transformation spread function (TSF) corresponding to one of the depth layers. The estimated TSF (which reveals the camera motion during exposure) is used to segment the scene into the foreground and background layers and determine the relative depth value. The deblurred image of the scene is finally estimated within a regularization framework by accounting for blur variations due to camera motion as well as depth.
3 0.92156583 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
Author: M. Zeeshan Zia, Michael Stark, Konrad Schindler
Abstract: Despite the success of current state-of-the-art object class detectors, severe occlusion remains a major challenge. This is particularly true for more geometrically expressive 3D object class representations. While these representations have attracted renewed interest for precise object pose estimation, the focus has mostly been on rather clean datasets, where occlusion is not an issue. In this paper, we tackle the challenge of modeling occlusion in the context of a 3D geometric object class model that is capable of fine-grained, part-level 3D object reconstruction. Following the intuition that 3D modeling should facilitate occlusion reasoning, we design an explicit representation of likely geometric occlusion patterns. Robustness is achieved by pooling image evidence from of a set of fixed part detectors as well as a non-parametric representation of part configurations in the spirit of poselets. We confirm the potential of our method on cars in a newly collected data set of inner-city street scenes with varying levels of occlusion, and demonstrate superior performance in occlusion estimation and part localization, compared to baselines that are unaware of occlusions.
4 0.91583729 76 cvpr-2013-Can a Fully Unconstrained Imaging Model Be Applied Effectively to Central Cameras?
Author: Filippo Bergamasco, Andrea Albarelli, Emanuele Rodolà, Andrea Torsello
Abstract: Traditional camera models are often the result of a compromise between the ability to account for non-linearities in the image formation model and the need for a feasible number of degrees of freedom in the estimation process. These considerations led to the definition of several ad hoc models that best adapt to different imaging devices, ranging from pinhole cameras with no radial distortion to the more complex catadioptric or polydioptric optics. In this paper we dai s .unive . it ence points in the scene with their projections on the image plane [5]. Unfortunately, no real camera behaves exactly like an ideal pinhole. In fact, in most cases, at least the distortion effects introduced by the lens should be accounted for [19]. Any pinhole-based model, regardless of its level of sophistication, is geometrically unable to properly describe cameras exhibiting a frustum angle that is near or above 180 degrees. For wide-angle cameras, several different para- metric models have been proposed. Some of them try to modify the captured image in order to follow the original propose the use of an unconstrained model even in standard central camera settings dominated by the pinhole model, and introduce a novel calibration approach that can deal effectively with the huge number of free parameters associated with it, resulting in a higher precision calibration than what is possible with the standard pinhole model with correction for radial distortion. This effectively extends the use of general models to settings that traditionally have been ruled by parametric approaches out of practical considerations. The benefit of such an unconstrained model to quasipinhole central cameras is supported by an extensive experimental validation.
5 0.91099912 90 cvpr-2013-Computing Diffeomorphic Paths for Large Motion Interpolation
Author: Dohyung Seo, Jeffrey Ho, Baba C. Vemuri
Abstract: In this paper, we introduce a novel framework for computing a path of diffeomorphisms between a pair of input diffeomorphisms. Direct computation of a geodesic path on the space of diffeomorphisms Diff(Ω) is difficult, and it can be attributed mainly to the infinite dimensionality of Diff(Ω). Our proposed framework, to some degree, bypasses this difficulty using the quotient map of Diff(Ω) to the quotient space Diff(M)/Diff(M)μ obtained by quotienting out the subgroup of volume-preserving diffeomorphisms Diff(M)μ. This quotient space was recently identified as the unit sphere in a Hilbert space in mathematics literature, a space with well-known geometric properties. Our framework leverages this recent result by computing the diffeomorphic path in two stages. First, we project the given diffeomorphism pair onto this sphere and then compute the geodesic path between these projected points. Sec- ond, we lift the geodesic on the sphere back to the space of diffeomerphisms, by solving a quadratic programming problem with bilinear constraints using the augmented Lagrangian technique with penalty terms. In this way, we can estimate the path of diffeomorphisms, first, staying in the space of diffeomorphisms, and second, preserving shapes/volumes in the deformed images along the path as much as possible. We have applied our framework to interpolate intermediate frames of frame-sub-sampled video sequences. In the reported experiments, our approach compares favorably with the popular Large Deformation Diffeomorphic Metric Mapping framework (LDDMM).
6 0.89563417 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking
7 0.8781848 3 cvpr-2013-3D R Transform on Spatio-temporal Interest Points for Action Recognition
8 0.87728822 186 cvpr-2013-GeoF: Geodesic Forests for Learning Coupled Predictors
9 0.84236646 198 cvpr-2013-Handling Noise in Single Image Deblurring Using Directional Filters
same-paper 10 0.84181768 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
12 0.78683257 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning
14 0.76634645 131 cvpr-2013-Discriminative Non-blind Deblurring
15 0.76444519 314 cvpr-2013-Online Object Tracking: A Benchmark
16 0.74912256 414 cvpr-2013-Structure Preserving Object Tracking
17 0.74664462 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
18 0.73661536 360 cvpr-2013-Robust Estimation of Nonrigid Transformation for Point Set Registration
19 0.73487973 400 cvpr-2013-Single Image Calibration of Multi-axial Imaging Systems