cvpr cvpr2013 cvpr2013-242 knowledge-graph by maker-knowledge-mining

242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds


Source: pdf

Author: Yan Wang, Rongrong Ji, Shih-Fu Chang

Abstract: Recent years have witnessed a growing interest in understanding the semantics of point clouds in a wide variety of applications. However, point cloud labeling remains an open problem, due to the difficulty in acquiring sufficient 3D point labels towards training effective classifiers. In this paper, we overcome this challenge by utilizing the existing massive 2D semantic labeled datasets from decade-long community efforts, such as ImageNet and LabelMe, and a novel “cross-domain” label propagation approach. Our proposed method consists of two major novel components, Exemplar SVM based label propagation, which effectively addresses the cross-domain issue, and a graphical model based contextual refinement incorporating 3D constraints. Most importantly, the entire process does not require any training data from the target scenes, also with good scalability towards large scale applications. We evaluate our approach on the well-known Cornell Point Cloud Dataset, achieving much greater efficiency and comparable accuracy even without any 3D training data. Our approach shows further major gains in accuracy when the training data from the target scenes is used, outperforming state-of-the-art approaches with far better efficiency.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Recent years have witnessed a growing interest in understanding the semantics of point clouds in a wide variety of applications. [sent-3, score-0.175]

2 However, point cloud labeling remains an open problem, due to the difficulty in acquiring sufficient 3D point labels towards training effective classifiers. [sent-4, score-0.906]

3 In this paper, we overcome this challenge by utilizing the existing massive 2D semantic labeled datasets from decade-long community efforts, such as ImageNet and LabelMe, and a novel “cross-domain” label propagation approach. [sent-5, score-0.377]

4 Our proposed method consists of two major novel components, Exemplar SVM based label propagation, which effectively addresses the cross-domain issue, and a graphical model based contextual refinement incorporating 3D constraints. [sent-6, score-0.361]

5 Such massive point cloud data has shown great potential for solving several fundamental problems in computer vision and robotics, for example, route planning and face analysis. [sent-12, score-0.495]

6 While the existing work mainly focuses on building better point clouds [1] [2] [3][4], point-wise semantic labeling remains an open problem. [sent-13, score-0.488]

7 The framework of search based label propagation from ImageNet to point clouds. [sent-16, score-0.359]

8 While important, labeling 3D point clouds is by no means an easy task. [sent-20, score-0.645]

9 Following 2D semantic labeling, the state-of-the-art solutions [5][6][7][8][9][10] train pointwise label classifiers based on visual and 3D geometric features, and optionally refine them with spatial contexts. [sent-21, score-0.181]

10 Another fundamental challenge comes from the lack of sufficient point cloud labels for training, which, in turn, has been shown to be a key factor in successful 2D image labeling [11] [12][13]. [sent-24, score-0.781]

11 This factor, of which the computer vision community is highly aware, has led to a decade-long effort in building large-scale labeling datasets, showing large benefits for 2D image segmentation, labeling, classification and object detection [11][12][13][14]. [sent-25, score-0.243]

12 However, only limited efforts have been devoted to point cloud labeling benchmarks. [sent-26, score-0.618]

13 To the best of the authors’ knowledge, the existing labeled point cloud or RGB-D datasets [15][16] are far from comparable to the 2D ones, in terms of either scale or coverage. [sent-27, score-0.375]

14 This causes even state-of-the-art point cloud labeling algorithms to only touch data from well-controlled environments, with similar training and testing conditions [5][6][7]. [sent-28, score-0.669]

15 Manual point cloud labeling is certainly one solution to the lack of sufficient training data. [sent-30, score-0.732]

16 However, it requires intensive human labor, especially considering the difficulty in labeling 3D points. [sent-31, score-0.304]

17 Even given sufficient point cloud labels, the effective 3D feature design still remains open. [sent-32, score-0.401]

18 But turning to the 2D side, with such massive pixelwise image labels at hand, is it possible to “propagate” or “transfer” such labels from images to point clouds? [sent-33, score-0.391]

19 This approach, if possible, solves the training data insufficiency while not requiring intensive point cloud labeling, and also gets around the open problem of designing effective 3D features and geometric representations. [sent-34, score-0.48]

20 To achieve this goal, we propose to exploit the reference images required for point cloud constructions as a “bridge”. [sent-35, score-0.502]

21 From a label propagation perspective, if we can link query regions to correctly labeled dataset regions in a graph formulation, this point cloud or reference image labeling problem can be solved by propagating labels from labeled nodes to unlabeled nodes along the edges. [sent-36, score-1.171]

22 This idea is also inspired by the recent endeavors in search based mask transfer learning, which has shown great potential to deal with the “cross-domain” issue in both object detection and image segmentation [14][17][18]. [sent-37, score-0.27]

23 Furthermore, such search based propagation is naturally parallelizable, with high scalability towards big data. [sent-39, score-0.203]

24 We design two key operations to propagate external image labels to point clouds, namely “Search based Superpixel Labeling” and “3D Contextual Refinement”, as outlined in Figure 1. [sent-43, score-0.291]

25 More specifically, we first train linear Support Vector Machines (SVMs) for individual “exemplar” superpixels in the external image collection, use them to retrieve the robust k Nearest Neighbors (kNN) for each superpixel from the reference images, and then collect their labels for future fusion. [sent-48, score-0.894]

26 Note this is comparably efficient to naive kNN search by exploiting the high independence and efficiency of the linear SVMs. [sent-49, score-0.181]

27 3D Contextual Refinement: We then aggregate superpixel label candidates to jointly infer the point cloud labels. [sent-50, score-0.772]

28 Similar to the existing works in image labeling, we exploit the intra-image spatial consistency to boost the labeling accuracy. [sent-51, score-0.243]

29 In addition, and more importantly, 3D contexts are further modeled to capture the inter-image superpixel consistency. [sent-52, score-0.399]

30 Both contexts are integrated into a graphical model to seek a joint optimum among the superpixel outputs with Loopy Belief Propagation. [sent-53, score-0.56]

31 The rest of this paper is organized as follows: Section 2 introduces our search based superpixel labeling. [sent-54, score-0.369]

32 We denote a point cloud as a set of 3D points P = {p_i}, each of which is described by its 3D coordinates and RGB colors {x_i, y_i, z_i, R_i, G_i, B_i}. [sent-59, score-0.375]

33 We also have an external superpixel labeling pool consisting of superpixels with ground truth labels S = {S_i, l_i}_{i=1}^N. [sent-61, score-1.091]
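
To make this notation concrete, here is a minimal sketch of the two containers involved, using plain NumPy arrays; the class and field names are illustrative assumptions rather than anything from the paper.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class PointCloud:
    """P = {p_i}: one row per 3D point."""
    xyz: np.ndarray   # shape (n_points, 3): coordinates x_i, y_i, z_i
    rgb: np.ndarray   # shape (n_points, 3): colors R_i, G_i, B_i

@dataclass
class LabelingPool:
    """S = {(S_i, l_i)}, i = 1..N: external superpixels with ground-truth labels."""
    features: np.ndarray  # shape (N, d): one descriptor per superpixel S_i
    labels: np.ndarray    # shape (N,): semantic label l_i from the label set L
```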

34 Our goal is to assign each p_i a semantic label l(p_i) from an exclusive label set L, as propagated from the labeling pool S. [sent-62, score-0.478]

35 Note that we do not leverage the randomly sampled rectangles used in recent works on search based segmentation [14] [21] or object detection [18]; instead we work with superpixels, to ensure label consistency among the pixels in each region, as widely assumed for superpixels [19]. [sent-65, score-0.424]

36 For every superpixel Sq to be labeled in the reference images, we aim to find the most “similar” superpixels in S, whose labels will then be propagated and fused into Sq. [sent-66, score-0.692]

37 For each result, we show not only the superpixel but also its surroundings for clarity. [sent-74, score-0.311]

38 As pointed out in [18], nearest neighbor search with Euclidean distance cannot capture the intrinsic visual similarity between superpixels, while on the other hand training a label classifier is too sensitive to the training data, with a large generalization error when used for propagation. [sent-79, score-0.283]

39 For every superpixel extracted from the labeling pool Si ∈ S, we train a linear SVM to identify its visually similar superpixels. [sent-81, score-0.669]

40 To further guarantee the matching robustness, every superpixel Si is translated and rotated to expand to more positive examples for training. [sent-89, score-0.894]

41 However, even with this C setting, if a superpixel other than but very similar to the exemplar appears in the negative set, it will significantly degrade the performance with an ill-trained SVM, an issue not studied in object detection [18]. [sent-92, score-0.538]

42 Different from regular Exemplar SVMs, which only find nearly identical instances, we set a small C in the training process for better generality, allowing examples not exactly the same as the exemplar to also have positive scores. [sent-96, score-0.236]

43 To address this problem, given that the decision boundary is determined only by the “hard” examples (the support vectors), we introduce hard negative mining to constrain the decision boundary. [sent-98, score-0.192]
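
As a concrete illustration of this training loop, the sketch below fits one Exemplar SVM with scikit-learn's LinearSVC and grows its negative cache by hard negative mining. The C value, cache size, and round count are placeholders (the experiments later use 10,000 negatives and five mining rounds), and descriptor extraction is assumed to happen elsewhere.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_esvm(positives, negative_pool, C=0.01, rounds=5, cache_size=1000, seed=0):
    """Train one Exemplar SVM with iterative hard negative mining.

    positives:     (n_pos, d) descriptors of the exemplar and its jittered copies
    negative_pool: (n_neg, d) descriptors of superpixels other than the exemplar
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(negative_pool), size=min(cache_size, len(negative_pool)), replace=False)
    neg_cache = negative_pool[idx]

    clf = LinearSVC(C=C)  # a small C softens the margin so near-duplicates of the exemplar still score positively
    for _ in range(rounds):
        X = np.vstack([positives, neg_cache])
        y = np.concatenate([np.ones(len(positives)), -np.ones(len(neg_cache))])
        clf.fit(X, y)
        # hard negative mining: keep the negatives the current model scores highest
        scores = clf.decision_function(negative_pool)
        hard = negative_pool[np.argsort(scores)[-cache_size:]]
        neg_cache = np.vstack([neg_cache, hard])
    return clf
```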

44 We can see that although naive kNN usually outputs visually similar instances, it is not robust against false positives, while ESVM does better in label robustness, with more sensitivity to label differences. [sent-105, score-0.243]

45 To label the superpixel Sq in the reference images, we find the superpixels with the k strongest responses from their ESVMs as its k nearest neighbors in S. [sent-106, score-0.857]
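
Because the ESVMs are independent linear models, this retrieval step reduces to a single matrix product over all exemplars, which is also what makes it comparably efficient to naive kNN and easy to parallelize. A minimal sketch, with the array shapes as assumptions:

```python
import numpy as np

def knn_by_esvm(query_feat, esvm_weights, esvm_biases, pool_labels, k=5):
    """Return the k pool superpixels whose exemplar SVMs respond most strongly to the query.

    query_feat:   (d,) descriptor of the query superpixel S_q
    esvm_weights: (N, d) stacked ESVM weight vectors, one row per exemplar
    esvm_biases:  (N,)   corresponding bias terms
    pool_labels:  (N,)   ground-truth label l_i of each exemplar
    """
    scores = esvm_weights @ query_feat + esvm_biases  # scores all N exemplars at once
    top = np.argsort(scores)[::-1][:k]
    return [(int(i), pool_labels[i], float(scores[i])) for i in top]
```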

46 (b) shows the abstract representation of the graphical model, where N(Sj) denotes the spatially adjacent superpixels of Sj in the reference images. [sent-118, score-0.48]

47 More specifically, given a held-out superpixel validation set with ground truth labels S_H = {S_q}_{q=1}^H, we first apply the ESVMs to get prediction scores {s_q}_{q=1}^H, and then collect the ESVMs with positive scores for reranking, only some of which have the same label as Sq. [sent-123, score-0.487]

48 Here we aim to learn a function F_i for each ESVM in S, making superpixels with the same label as Sq have larger scores than the others, formulated as a structured learning-to-rank problem [23]. [sent-124, score-0.281]
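
The paper learns these reranking functions with structured learning-to-rank [23]. As a much simpler stand-in that conveys the intent, the sketch below fits a Platt-style logistic calibration per ESVM on the held-out set, so that scores from different exemplars become comparable before fusion; this is an illustrative simplification, not the paper's formulation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def calibrate_esvm_scores(val_scores, same_label):
    """Fit a monotone calibration for one ESVM on the held-out set S_H.

    val_scores: (H,) raw ESVM scores s_q on the validation superpixels
    same_label: (H,) bool, True where the validation superpixel carries the exemplar's label
                (both values must occur for the fit to be well defined)
    """
    lr = LogisticRegression()
    lr.fit(np.asarray(val_scores, dtype=float).reshape(-1, 1), np.asarray(same_label).astype(int))
    # returns a function mapping raw scores to calibrated [0, 1] confidences
    return lambda s: lr.predict_proba(np.asarray(s, dtype=float).reshape(-1, 1))[:, 1]
```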

49 3D Contextual Refinement: Given the k nearest neighbors of each superpixel in the reference images, the next step is to label the point cloud {l(p_i) | p_i ∈ P} by backprojecting and fusing their labels. [sent-137, score-1.012]
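
A sketch of the backprojection step: each 3D point is projected into a reference image with a standard pinhole model, and the label votes attached to the superpixel it lands in are accumulated for that point. The camera parameterization and vote format are assumptions made for illustration; the paper does not spell out this interface.

```python
import numpy as np

def backproject_votes(points_xyz, K, R, t, superpixel_map, superpixel_votes):
    """Collect per-point label votes from one reference image.

    points_xyz:       (n, 3) 3D point coordinates
    K, R, t:          intrinsics (3, 3), rotation (3, 3), translation (3,)
    superpixel_map:   (H, W) integer id of the superpixel covering each pixel
    superpixel_votes: dict superpixel id -> {label: weight} from the kNN step
    """
    cam = R @ points_xyz.T + t[:, None]          # camera coordinates, shape (3, n)
    uvw = K @ cam
    u = np.round(uvw[0] / uvw[2]).astype(int)
    v = np.round(uvw[1] / uvw[2]).astype(int)
    H, W = superpixel_map.shape
    visible = (uvw[2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    votes = [dict() for _ in range(len(points_xyz))]
    for i in np.flatnonzero(visible):
        for label, w in superpixel_votes.get(int(superpixel_map[v[i], u[i]]), {}).items():
            votes[i][label] = votes[i].get(label, 0.0) + w
    return votes
```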

50 Note different from traditional contextual refinement approaches [5][6][25], our approach does not require any labeled 3D training data. [sent-140, score-0.227]

51 First the point cloud is oversegmented based on the smoothness and continuity [5], producing a 3D segment set {Di} as shown in Figure 3 (a). [sent-142, score-0.375]

52 If its projected region shares enough portion with some superpixel Si from this reference image, we connect an edge between Si and the corresponding 3D segment, shown as red links in Figure 3 (a). [sent-145, score-0.438]

53 Spatially adjacent superpixels within one reference image are also connected, as shown as yellow links in Figure 3 (a). [sent-146, score-0.381]

54 Then, an undirected graph G = {V, E} is built with the 3D segments and 2D superpixels as nodes V = {D_i} ∪ {S_q}_{q=1}^M, and the connections mentioned above as edges E. [sent-147, score-0.254]
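
The graph construction follows directly from this description. A minimal sketch using networkx; the overlap threshold is a placeholder, since the text only requires that a projected segment share "enough portion" with a superpixel.

```python
import networkx as nx

def build_graph(segment_ids, superpixel_ids, projection_overlaps, adjacency, overlap_thresh=0.5):
    """Build G = {V, E} with 3D segments {D_i} and 2D superpixels {S_q} as nodes.

    projection_overlaps: iterable of (segment_id, superpixel_id, overlap_ratio) obtained
                         by projecting each D_i into the reference images
    adjacency:           iterable of (superpixel_id, superpixel_id) spatial neighbors
                         within one reference image
    """
    G = nx.Graph()
    G.add_nodes_from((("D", d) for d in segment_ids), kind="segment")
    G.add_nodes_from((("S", s) for s in superpixel_ids), kind="superpixel")
    # segment <-> superpixel edges (red links in Figure 3 (a))
    for d, s, ratio in projection_overlaps:
        if ratio >= overlap_thresh:
            G.add_edge(("D", d), ("S", s), kind="projection")
    # superpixel <-> superpixel edges within one image (yellow links)
    for s1, s2 in adjacency:
        G.add_edge(("S", s1), ("S", s2), kind="adjacency")
    return G
```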

55 For every node v, we adopt its label l(v) as a variable of the graphical model, and define the potential function to enforce both intra-image and inter-image consistency, as detailed later. [sent-149, score-0.229]

56 In the end, the semantic labels L for the 3D segments {D_i} are inferred by minimizing the potential function of the graphical model: argmin_{L ∈ L^n} Σ_{v ∈ V} ψ(l(v)) + λ Σ_{(u,v) ∈ E} ψ_s(l(u), l(v)). [sent-150, score-0.322]

57 in which L is the label set, n is the number of nodes in the graphical model, and λ is a constant that weights the different potential components. [sent-157, score-0.229]
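
Given unary (data) and pairwise (smoothing) potentials, the inference step can be sketched as below. The paper minimizes this objective with Loopy Belief Propagation; the sketch instead uses greedy iterated conditional modes (ICM) over the same energy, purely because it is shorter to write down, so it illustrates the objective rather than the exact solver.

```python
def minimize_energy(G, unary, pairwise, label_set, lam=1.0, n_iters=10):
    """Approximately minimize sum_v psi(l(v)) + lam * sum_(u,v) psi_s(l(u), l(v)).

    G:         graph over 3D segments and 2D superpixels (see build_graph above)
    unary:     dict node -> {label: data cost}
    pairwise:  function (label_u, label_v) -> smoothing cost
    label_set: the label set L
    """
    # initialize each node with its cheapest data term
    assign = {v: min(label_set, key=lambda l: unary[v].get(l, 0.0)) for v in G.nodes}
    for _ in range(n_iters):
        changed = False
        for v in G.nodes:
            def local_cost(l):
                return unary[v].get(l, 0.0) + lam * sum(
                    pairwise(l, assign[u]) for u in G.neighbors(v))
            best = min(label_set, key=local_cost)
            if best != assign[v]:
                assign[v], changed = best, True
        if not changed:
            break
    return assign
```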

58 The assigned labels of 2D superpixels are encouraged to be the same as their results from label propagation. [sent-159, score-0.477]

59 To encode the intra-image consistency in the reference images, neighboring superpixels are encouraged to take consistent labels (cf. Algorithm 2: Search based Label Propagation). [sent-168, score-0.381]

60 To make the 3D labeling results consistent among reference images, we further define the inter-image smoothing term ψ_{s,3D}. [sent-181, score-0.396]

61 If 3D point cloud training data with ground truth label is also available, we can further integrate stronger context into our graphical model. [sent-205, score-0.644]

62 To build the superpixel labeling pool, we collect superpixels from ImageNet [12], which provides object detection ground truth as bounding boxes. [sent-216, score-0.842]

63 We first over-segment the image using Mean Shift [20], and then select the superpixels sharing enough area with the bounding boxes of some object of interest (e. [sent-217, score-0.254]

64 These superpixels are then added into the superpixel labeling pool with their corresponding labels. [sent-220, score-0.923]
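
A sketch of this pool construction: over-segment each ImageNet image, then keep every superpixel that lies mostly inside a labeled bounding box. The paper uses Mean Shift [20] for the over-segmentation; the sketch substitutes scikit-image's Felzenszwalb segmentation purely for brevity, and the overlap threshold is a placeholder.

```python
import numpy as np
from skimage.segmentation import felzenszwalb  # stand-in for the Mean Shift segmentation [20]

def pool_superpixels(image, bboxes_with_labels, min_overlap=0.8):
    """Return (mask, label) pairs for superpixels that share enough area with a bounding box.

    image:              (H, W, 3) RGB image from ImageNet
    bboxes_with_labels: list of ((x0, y0, x1, y1), label) detection ground truth
    """
    seg = felzenszwalb(image, scale=100)          # (H, W) map of superpixel ids
    pool = []
    for sp_id in np.unique(seg):
        mask = seg == sp_id
        area = mask.sum()
        for (x0, y0, x1, y1), label in bboxes_with_labels:
            if mask[y0:y1, x0:x1].sum() / area >= min_overlap:
                pool.append((mask, label))
                break
    return pool
```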

65 To evaluate our algorithm, Cornell's indoor dataset [16] is adopted, which contains 24 office scenes and 28 home scenes, constructed from a Kinect sensor and RGBDSLAM (http://openslam. [sent-221, score-0.191]

66 Each scene consists of 3D points with 3D coordinates, RGB values, semantic labels, and reference images used for the RGBDSLAM construction. [sent-223, score-0.197]

67 We merge overly specific labels that do not occur in the labeling pool into more general ones (e. [sent-225, score-0.467]

68 We illustrate the rationality of our approach by visualizing the superpixel labeling pool (blue dots) and the superpixels from the reference images (red dots) in a 2D space mapped from the feature space with Principal Component Analysis, as Figure 4 shows. [sent-230, score-1.095]

69 On the other hand, the coverage is still not perfect, calling for techniques more advanced than naive kNN search, such as ESVM. [sent-232, score-0.186]

70 We use average classification accuracy; different from [5], these labels are not used in our training process at all. [sent-234, score-0.16]

71 (Table: visual features used for label propagation, with their dimensions.) That is, the percentage of correctly classified points among all the point clouds serves as our protocol to evaluate both the 2D superpixel labeling and the 3D point labeling. [sent-236, score-0.951]
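
A small sketch of this protocol. It averages per-cloud accuracy, which is one plausible reading of "the average percentage of correctly classified points among all the point clouds"; pooling all points before averaging would be the other.

```python
import numpy as np

def average_point_accuracy(predictions, ground_truths):
    """predictions, ground_truths: lists of per-point label arrays, one pair per point cloud."""
    per_cloud = [np.mean(np.asarray(p) == np.asarray(g))
                 for p, g in zip(predictions, ground_truths)]
    return 100.0 * float(np.mean(per_cloud))
```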

72 We compare our approach to the following baselines: (1) Naive kNN search based propagation; (2) ESVM based propagation; (3) ESVM based propagation with contextual refinement; (4) The state-of-the-art work by Anand et al. [sent-237, score-0.293]

73 Baselines (1) to (3) use our superpixel label propagation pool for ESVM training, and the entire Cornell Point Cloud Dataset for testing. [sent-242, score-0.657]

74 And the training examples are generated from the original superpixel with five levels of translation and rotation. [sent-249, score-0.362]
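
One way to realize these "five levels of translation and rotation" is to jitter the exemplar patch before feature extraction, as sketched below with scipy.ndimage. The shift and angle magnitudes are placeholders, since only the number of levels is stated.

```python
import numpy as np
from scipy.ndimage import rotate, shift

def jitter_positives(patch, n_levels=5, max_shift=4.0, max_angle=10.0):
    """Expand one exemplar superpixel patch into translated and rotated positive examples."""
    positives = [patch]
    extra_axes = (0,) * (patch.ndim - 2)  # leave a trailing color channel untouched, if present
    for level in range(1, n_levels + 1):
        d = level * max_shift / n_levels
        ang = level * max_angle / n_levels
        positives.append(shift(patch, (d, d) + extra_axes, order=1))
        positives.append(shift(patch, (-d, -d) + extra_axes, order=1))
        positives.append(rotate(patch, ang, reshape=False, order=1))
        positives.append(rotate(patch, -ang, reshape=False, order=1))
    return positives
```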

75 For every superpixel, we collect 10,000 negative examples and do five rounds of hard negative mining. [sent-250, score-0.16]

76 In terms of the superpixel labeling pool, we collect about 28K superpixels for each type of scene (office or home). [sent-252, score-0.875]

77 Figure 6 shows the average accuracy of 3D labeling in both office and home scenes, with comparison among different baselines. [sent-254, score-0.394]

78 We can see ESVM, even without contextual refinement, performs well and outperforms naive kNN with a large gap. [sent-255, score-0.182]

79 That is, ESVM with contextual refinement trained with labeled reference images in the Cornell Point Cloud Dataset. [sent-277, score-0.303]

80 And Figure 7 shows the confusion matrices of office scenes and home scenes. [sent-283, score-0.158]

81 This indicates the limit of pure visual approaches for 3D point cloud labeling. [sent-285, score-0.375]

82 Semantic labeling of 2D images is a long-standing problem in computer vision. [sent-291, score-0.243]

83 Example results of point cloud labeling on Cornell dataset [16]. [sent-293, score-0.618]

84 To demonstrate labeling results with more details, reference images from multiple views are provided. [sent-294, score-0.37]

85 The rows are: reference images, ground truth, and the labeling results from naive kNN, ESVM, ESVM with refinement, and our oracle performance. [sent-295, score-0.426]

86 Some of the recent works in 3D semantic labeling also follow this scheme, either under a structured SVM framework [5] [6] or using CRFs [9]. [sent-299, score-0.34]

87 Although good performance is reported, such approaches, whether 2D or 3D, require the training and testing data to come from similar collection settings, which prevents their practical application to 3D point clouds, where large-scale training data is unavailable and hard to label. [sent-301, score-0.214]

88 We address this problem by seeking help from existing massive 2D datasets, with a novel labeling approach inspired by mask transfer. [sent-302, score-0.382]

89 Another branch of labeling work comes from the rising endeavors in transfer learning, i. [sent-304, score-0.373]

90 , to intelligently obtain certain knowledge from different yet related sources with metadata propagation [30], showing promising performance in various tasks such as scene understanding [31], segmentation [14], and 3D object detection [32]. [sent-306, score-0.181]

91 In 3D semantic labeling, there is also work adopting online synthesized data for label transfer [8][10]. [sent-307, score-0.208]

92 Its principle lies in identifying the nearest neighbors in the reference data collection, followed by transferring the corresponding metadata from the neighbors to the query target. [sent-308, score-0.284]

93 However, traditional search based mask transfer is typically deployed between datasets within the same domain (e. [sent-309, score-0.145]

94 We address this with robust search using Exemplar SVMs and incorporating 3D context to ensure a robust fusion from 2D superpixels to point clouds. [sent-312, score-0.443]

95 Our approach, as detailed in Section 2, handles it with a jointly optimized reranking step using structured prediction [23]. [sent-320, score-0.208]

96 Conclusion How to deal with the semantic labeling problem on the rapidly growing point cloud data is an emerging challenge with a wide variety of practical applications. [sent-322, score-0.688]

97 In this work, we propose a novel 2D-to-3D search based label propagation approach to address this issue. [sent-324, score-0.317]

98 More specifically, we use an Exemplar SVM based scheme to transfer the massive 2D image labels from ImageNet to point clouds, with a structured SVM based reranking function design. [sent-325, score-0.477]

99 Our second contribution is proposing a graphical model to integrate both the intra-image and inter-image spatial context in and among reference images to fuse individual superpixel labels onto 3D points. [sent-326, score-0.705]

100 Experiments over popular datasets validate our advantages, with comparable accuracy and superior efficiency to the direct and fully supervised 3D point labeling state of the art, even without any point cloud labeling ground truth. [sent-327, score-0.962]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('esvm', 0.312), ('superpixel', 0.311), ('cloud', 0.305), ('superpixels', 0.254), ('labeling', 0.243), ('sq', 0.22), ('exemplar', 0.185), ('knn', 0.173), ('propagation', 0.145), ('reranking', 0.143), ('cornell', 0.135), ('reference', 0.127), ('sj', 0.125), ('pool', 0.115), ('imagenet', 0.111), ('labels', 0.109), ('clouds', 0.105), ('graphical', 0.099), ('si', 0.098), ('naive', 0.092), ('contextual', 0.09), ('contexts', 0.088), ('refinement', 0.086), ('label', 0.086), ('massive', 0.076), ('ir', 0.074), ('point', 0.07), ('semantic', 0.07), ('svms', 0.068), ('home', 0.066), ('svm', 0.064), ('esvms', 0.062), ('floor', 0.059), ('external', 0.059), ('office', 0.059), ('search', 0.058), ('oracle', 0.056), ('labelme', 0.054), ('bi', 0.053), ('transfer', 0.052), ('anand', 0.052), ('training', 0.051), ('endeavors', 0.05), ('rgbdslam', 0.05), ('smj', 0.05), ('sqi', 0.05), ('wall', 0.046), ('rationality', 0.045), ('printer', 0.045), ('potential', 0.044), ('negative', 0.042), ('neighbors', 0.042), ('hard', 0.042), ('prediction', 0.038), ('joachims', 0.038), ('nearest', 0.037), ('qh', 0.037), ('loopy', 0.037), ('certainly', 0.037), ('coverage', 0.036), ('outputs', 0.036), ('metadata', 0.036), ('fulton', 0.036), ('psi', 0.036), ('mask', 0.035), ('positives', 0.035), ('wi', 0.035), ('collect', 0.034), ('pi', 0.034), ('scenes', 0.033), ('ijrr', 0.033), ('koppula', 0.033), ('munoz', 0.033), ('indoor', 0.033), ('context', 0.033), ('bagnell', 0.032), ('difficulty', 0.032), ('issue', 0.031), ('yj', 0.031), ('efficiency', 0.031), ('alexe', 0.029), ('sh', 0.029), ('false', 0.029), ('intensive', 0.029), ('mining', 0.028), ('columbia', 0.028), ('deselaers', 0.028), ('comes', 0.028), ('address', 0.028), ('encouraged', 0.028), ('generality', 0.027), ('propagate', 0.027), ('easy', 0.027), ('structured', 0.027), ('turning', 0.027), ('decision', 0.026), ('among', 0.026), ('sufficient', 0.026), ('operations', 0.026), ('geometric', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999994 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds

Author: Yan Wang, Rongrong Ji, Shih-Fu Chang

Abstract: Recent years have witnessed a growing interest in understanding the semantics of point clouds in a wide variety of applications. However, point cloud labeling remains an open problem, due to the difficulty in acquiring sufficient 3D point labels towards training effective classifiers. In this paper, we overcome this challenge by utilizing the existing massive 2D semantic labeled datasets from decade-long community efforts, such as ImageNet and LabelMe, and a novel “cross-domain” label propagation approach. Our proposed method consists of two major novel components, Exemplar SVM based label propagation, which effectively addresses the cross-domain issue, and a graphical model based contextual refinement incorporating 3D constraints. Most importantly, the entire process does not require any training data from the target scenes, also with good scalability towards large scale applications. We evaluate our approach on the well-known Cornell Point Cloud Dataset, achieving much greater efficiency and comparable accuracy even without any 3D training data. Our approach shows further major gains in accuracy when the training data from the target scenes is used, outperforming state-of-the-art approaches with far better efficiency.

2 0.23229855 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds

Author: Jeremie Papon, Alexey Abramov, Markus Schoeler, Florentin Wörgötter

Abstract: Unsupervised over-segmentation of an image into regions of perceptually similar pixels, known as superpixels, is a widely used preprocessing step in segmentation algorithms. Superpixel methods reduce the number of regions that must be considered later by more computationally expensive algorithms, with a minimal loss of information. Nevertheless, as some information is inevitably lost, it is vital that superpixels not cross object boundaries, as such errors will propagate through later steps. Existing methods make use of projected color or depth information, but do not consider three dimensional geometric relationships between observed data points which can be used to prevent superpixels from crossing regions of empty space. We propose a novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three dimensional, rather than projective, space. Enforcing the constraint that segmented regions must have spatial connectivity prevents label flow across semantic object boundaries which might otherwise be violated. Additionally, as the algorithm works directly in 3D space, observations from several calibrated RGB+D cameras can be segmented jointly. Experiments on a large data set of human annotated RGB+D images demonstrate a significant reduction in occurrence of clusters crossing object boundaries, while maintaining speeds comparable to state-of-the-art 2D methods.

3 0.21699645 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels

Author: Guang Shu, Afshin Dehghan, Mubarak Shah

Abstract: We propose an approach to improve the detection performance of a generic detector when it is applied to a particular video. The performance of offline-trained object detectors is usually degraded in unconstrained video environments due to variant illuminations, backgrounds and camera viewpoints. Moreover, most object detectors are trained using Haar-like features or gradient features but ignore video-specific features like consistent color patterns. In our approach, we apply a Superpixel-based Bag-of-Words (BoW) model to iteratively refine the output of a generic detector. Compared to other related work, our method builds a video-specific detector using superpixels, hence it can handle the problem of appearance variation. Most importantly, using Conditional Random Field (CRF) along with our superpixel-based BoW model, we develop an algorithm to segment the object from the background. Therefore our method generates an output of the exact object regions instead of the bounding boxes generated by most detectors. In general, our method takes detection bounding boxes of a generic detector as input and generates the detection output with higher average precision and precise object regions. The experiments on four recent datasets demonstrate the effectiveness of our approach and significantly improve the state-of-the-art detector by 5-16% in average precision.

4 0.21698162 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context

Author: Gautam Singh, Jana Kosecka

Abstract: This paper presents a nonparametric approach to semantic parsing using small patches and simple gradient, color and location features. We learn the relevance of individual feature channels at test time using a locally adaptive distance metric. To further improve the accuracy of the nonparametric approach, we examine the importance of the retrieval set used to compute the nearest neighbours using a novel semantic descriptor to retrieve better candidates. The approach is validated by experiments on several datasets used for semantic parsing demonstrating the superiority of the method compared to the state of art approaches.

5 0.21199845 29 cvpr-2013-A Video Representation Using Temporal Superpixels

Author: Jason Chang, Donglai Wei, John W. Fisher_III

Abstract: We develop a generative probabilistic model for temporally consistent superpixels in video sequences. In contrast to supervoxel methods, object parts in different frames are tracked by the same temporal superpixel. We explicitly model flow between frames with a bilateral Gaussian process and use this information to propagate superpixels in an online fashion. We consider four novel metrics to quantify performance of a temporal superpixel representation and demonstrate superior performance when compared to supervoxel methods.

6 0.1969028 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images

7 0.19270375 406 cvpr-2013-Spatial Inference Machines

8 0.18868333 460 cvpr-2013-Weakly-Supervised Dual Clustering for Image Semantic Segmentation

9 0.17704704 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning

10 0.15143287 366 cvpr-2013-Robust Region Grouping via Internal Patch Statistics

11 0.14980298 357 cvpr-2013-Revisiting Depth Layers from Occlusions

12 0.14926861 425 cvpr-2013-Tensor-Based High-Order Semantic Relation Transfer for Semantic Scene Segmentation

13 0.14385107 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation

14 0.1403586 50 cvpr-2013-Augmenting CRFs with Boltzmann Machine Shape Priors for Image Labeling

15 0.13684262 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors

16 0.12476533 152 cvpr-2013-Exemplar-Based Face Parsing

17 0.12089274 189 cvpr-2013-Graph-Based Discriminative Learning for Location Recognition

18 0.1196192 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation

19 0.11508133 212 cvpr-2013-Image Segmentation by Cascaded Region Agglomeration

20 0.11226916 326 cvpr-2013-Patch Match Filter: Efficient Edge-Aware Filtering Meets Randomized Search for Fast Correspondence Field Estimation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.244), (1, -0.017), (2, 0.043), (3, -0.018), (4, 0.175), (5, 0.024), (6, 0.005), (7, 0.113), (8, -0.113), (9, -0.005), (10, 0.183), (11, -0.102), (12, 0.062), (13, 0.115), (14, -0.067), (15, -0.074), (16, 0.08), (17, -0.071), (18, -0.14), (19, 0.089), (20, 0.078), (21, -0.056), (22, -0.083), (23, 0.047), (24, -0.111), (25, 0.027), (26, -0.101), (27, -0.055), (28, 0.044), (29, 0.07), (30, -0.03), (31, -0.065), (32, 0.002), (33, -0.058), (34, -0.015), (35, -0.084), (36, -0.078), (37, -0.052), (38, 0.055), (39, -0.093), (40, 0.051), (41, -0.111), (42, -0.037), (43, 0.017), (44, 0.129), (45, -0.028), (46, 0.044), (47, -0.033), (48, -0.054), (49, 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94779807 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds

Author: Yan Wang, Rongrong Ji, Shih-Fu Chang

Abstract: Recent years have witnessed a growing interest in understanding the semantics of point clouds in a wide variety of applications. However, point cloud labeling remains an open problem, due to the difficulty in acquiring sufficient 3D point labels towards training effective classifiers. In this paper, we overcome this challenge by utilizing the existing massive 2D semantic labeled datasets from decade-long community efforts, such as ImageNet and LabelMe, and a novel “cross-domain” label propagation approach. Our proposed method consists of two major novel components, Exemplar SVM based label propagation, which effectively addresses the cross-domain issue, and a graphical model based contextual refinement incorporating 3D constraints. Most importantly, the entire process does not require any training data from the target scenes, also with good scalability towards large scale applications. We evaluate our approach on the well-known Cornell Point Cloud Dataset, achieving much greater efficiency and comparable accuracy even without any 3D training data. Our approach shows further major gains in accuracy when the training data from the target scenes is used, outperforming state-of-the-art approaches with far better efficiency.

2 0.77074206 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds

Author: Jeremie Papon, Alexey Abramov, Markus Schoeler, Florentin Wörgötter

Abstract: Unsupervised over-segmentation of an image into regions of perceptually similar pixels, known as superpixels, is a widely used preprocessing step in segmentation algorithms. Superpixel methods reduce the number of regions that must be considered later by more computationally expensive algorithms, with a minimal loss of information. Nevertheless, as some information is inevitably lost, it is vital that superpixels not cross object boundaries, as such errors will propagate through later steps. Existing methods make use of projected color or depth information, but do not consider three dimensional geometric relationships between observed data points which can be used to prevent superpixels from crossing regions of empty space. We propose a novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three dimensional, rather than projective, space. Enforcing the constraint that segmented regions must have spatial connectivity prevents label flow across semantic object boundaries which might otherwise be violated. Additionally, as the algorithm works directly in 3D space, observations from several calibrated RGB+D cameras can be segmented jointly. Experiments on a large data set of human annotated RGB+D images demonstrate a significant reduction in occurrence of clusters crossing object boundaries, while maintaining speeds comparable to state-of-the-art 2D methods.

3 0.76007879 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation

Author: Luming Zhang, Mingli Song, Zicheng Liu, Xiao Liu, Jiajun Bu, Chun Chen

Abstract: Weakly supervised image segmentation is a challenging problem in computer vision field. In this paper, we present a new weakly supervised image segmentation algorithm by learning the distribution of spatially structured superpixel sets from image-level labels. Specifically, we first extract graphlets from each image where a graphlet is a small-sized graph consisting of superpixels as its nodes and it encapsulates the spatial structure of those superpixels. Then, a manifold embedding algorithm is proposed to transform graphlets of different sizes into equal-length feature vectors. Thereafter, we use GMM to learn the distribution of the post-embedding graphlets. Finally, we propose a novel image segmentation algorithm, called graphlet cut, that leverages the learned graphlet distribution in measuring the homogeneity of a set of spatially structured superpixels. Experimental results show that the proposed approach outperforms state-of-the-art weakly supervised image segmentation methods, and its performance is comparable to those of the fully supervised segmentation models.

4 0.75877696 29 cvpr-2013-A Video Representation Using Temporal Superpixels

Author: Jason Chang, Donglai Wei, John W. Fisher_III

Abstract: We develop a generative probabilistic model for temporally consistent superpixels in video sequences. In contrast to supervoxel methods, object parts in different frames are tracked by the same temporal superpixel. We explicitly model flow between frames with a bilateral Gaussian process and use this information to propagate superpixels in an online fashion. We consider four novel metrics to quantify performance of a temporal superpixel representation and demonstrate superior performance when compared to supervoxel methods.

5 0.75867993 460 cvpr-2013-Weakly-Supervised Dual Clustering for Image Semantic Segmentation

Author: Yang Liu, Jing Liu, Zechao Li, Jinhui Tang, Hanqing Lu

Abstract: In this paper, we propose a novel Weakly-Supervised Dual Clustering (WSDC) approach for image semantic segmentation with image-level labels, i.e., collaboratively performing image segmentation and tag alignment with those regions. The proposed approach is motivated by the observation that superpixels belonging to an object class usually exist across multiple images and hence can be gathered via the idea of clustering. In WSDC, spectral clustering is adopted to cluster the superpixels obtained from a set of over-segmented images. At the same time, a linear transformation between features and labels as a kind of discriminative clustering is learned to select the discriminative features among different classes. Both clustering outputs should be as consistent as possible. Besides, weakly-supervised constraints from image-level labels are imposed to restrict the labeling of superpixels. Finally, the non-convex and non-smooth objective function is efficiently optimized using an iterative CCCP procedure. Extensive experiments conducted on MSRC and LabelMe datasets demonstrate the encouraging performance of our method in comparison with the state of the art.

6 0.72959137 26 cvpr-2013-A Statistical Model for Recreational Trails in Aerial Images

7 0.72935206 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context

8 0.72888726 366 cvpr-2013-Robust Region Grouping via Internal Patch Statistics

9 0.64391637 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation

10 0.64245987 280 cvpr-2013-Maximum Cohesive Grid of Superpixels for Fast Object Localization

11 0.63880318 406 cvpr-2013-Spatial Inference Machines

12 0.63441902 212 cvpr-2013-Image Segmentation by Cascaded Region Agglomeration

13 0.6269207 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images

14 0.6241942 425 cvpr-2013-Tensor-Based High-Order Semantic Relation Transfer for Semantic Scene Segmentation

15 0.59155375 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning

16 0.57913822 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels

17 0.52934527 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors

18 0.50900984 13 cvpr-2013-A Higher-Order CRF Model for Road Network Extraction

19 0.48508146 50 cvpr-2013-Augmenting CRFs with Boltzmann Machine Shape Priors for Image Labeling

20 0.48196638 262 cvpr-2013-Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.133), (12, 0.017), (16, 0.036), (22, 0.013), (26, 0.057), (28, 0.011), (33, 0.261), (67, 0.072), (69, 0.059), (77, 0.02), (82, 0.097), (87, 0.118)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94144911 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds

Author: Yan Wang, Rongrong Ji, Shih-Fu Chang

Abstract: Recent years have witnessed a growing interest in understanding the semantics of point clouds in a wide variety of applications. However, point cloud labeling remains an open problem, due to the difficulty in acquiring sufficient 3D point labels towards training effective classifiers. In this paper, we overcome this challenge by utilizing the existing massive 2D semantic labeled datasets from decade-long community efforts, such as ImageNet and LabelMe, and a novel “cross-domain” label propagation approach. Our proposed method consists of two major novel components, Exemplar SVM based label propagation, which effectively addresses the cross-domain issue, and a graphical model based contextual refinement incorporating 3D constraints. Most importantly, the entire process does not require any training data from the target scenes, also with good scalability towards large scale applications. We evaluate our approach on the well-known Cornell Point Cloud Dataset, achieving much greater efficiency and comparable accuracy even without any 3D training data. Our approach shows further major gains in accuracy when the training data from the target scenes is used, outperforming state-of-the-art approaches with far better efficiency.

2 0.93855697 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities

Author: Horst Possegger, Sabine Sternig, Thomas Mauthner, Peter M. Roth, Horst Bischof

Abstract: Combining foreground images from multiple views by projecting them onto a common ground-plane has been recently applied within many multi-object tracking approaches. These planar projections introduce severe artifacts and constrain most approaches to objects moving on a common 2D ground-plane. To overcome these limitations, we introduce the concept of an occupancy volume exploiting the full geometry and the objects’ center of mass and develop an efficient algorithm for 3D object tracking. Individual objects are tracked using the local mass density scores within a particle filter based approach, constrained by a Voronoi partitioning between nearby trackers. Our method benefits from the geometric knowledge given by the occupancy volume to robustly extract features and train classifiers on-demand, when volumetric information becomes unreliable. We evaluate our approach on several challenging real-world scenarios including the public APIDIS dataset. Experimental evaluations demonstrate significant improvements compared to state-of-the-art methods, while achieving real-time performance.

3 0.93712246 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem

Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors’ ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best existing systems, outperforming other HOG-based detectors on the more deformable categories.

4 0.93029004 327 cvpr-2013-Pattern-Driven Colorization of 3D Surfaces

Author: George Leifman, Ayellet Tal

Abstract: Colorization refers to the process of adding color to black & white images or videos. This paper extends the term to handle surfaces in three dimensions. This is important for applications in which the colors of an object need to be restored and no relevant image exists for texturing it. We focus on surfaces with patterns and propose a novel algorithm for adding colors to these surfaces. The user needs only to scribble a few color strokes on one instance of each pattern, and the system proceeds to automatically colorize the whole surface. For this scheme to work, we address not only the problem of colorization, but also the problem of pattern detection on surfaces.

5 0.92841655 19 cvpr-2013-A Minimum Error Vanishing Point Detection Approach for Uncalibrated Monocular Images of Man-Made Environments

Author: Yiliang Xu, Sangmin Oh, Anthony Hoogs

Abstract: We present a novel vanishing point detection algorithm for uncalibrated monocular images of man-made environments. We advance the state-of-the-art by a new model of measurement error in the line segment extraction and minimizing its impact on the vanishing point estimation. Our contribution is twofold: 1) Beyond existing hand-crafted models, we formally derive a novel consistency measure, which captures the stochastic nature of the correlation between line segments and vanishing points due to the measurement error, and use this new consistency measure to improve the line segment clustering. 2) We propose a novel minimum error vanishing point estimation approach by optimally weighing the contribution of each line segment pair in the cluster towards the vanishing point estimation. Unlike existing works, our algorithm provides an optimal solution that minimizes the uncertainty of the vanishing point in terms of the trace of its covariance, in a closed-form. We test our algorithm and compare it with the state-of-the-art on two public datasets: York Urban Dataset and Eurasian Cities Dataset. The experiments show that our approach outperforms the state-of-the-art.

6 0.92794424 71 cvpr-2013-Boundary Cues for 3D Object Shape Recovery

7 0.92731762 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path

8 0.927037 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics

9 0.92661154 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis

10 0.92570776 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

11 0.92480427 400 cvpr-2013-Single Image Calibration of Multi-axial Imaging Systems

12 0.92461532 143 cvpr-2013-Efficient Large-Scale Structured Learning

13 0.92448974 298 cvpr-2013-Multi-scale Curve Detection on Surfaces

14 0.92370892 222 cvpr-2013-Incorporating User Interaction and Topological Constraints within Contour Completion via Discrete Calculus

15 0.92339122 414 cvpr-2013-Structure Preserving Object Tracking

16 0.92309058 227 cvpr-2013-Intrinsic Scene Properties from a Single RGB-D Image

17 0.92287701 155 cvpr-2013-Exploiting the Power of Stereo Confidences

18 0.92272282 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection

19 0.92252851 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

20 0.9222613 74 cvpr-2013-CLAM: Coupled Localization and Mapping with Efficient Outlier Handling