cvpr cvpr2013 cvpr2013-29 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jason Chang, Donglai Wei, John W. Fisher_III
Abstract: We develop a generative probabilistic model for temporally consistent superpixels in video sequences. In contrast to supervoxel methods, object parts in different frames are tracked by the same temporal superpixel. We explicitly model flow between frames with a bilateral Gaussian process and use this information to propagate superpixels in an online fashion. We consider four novel metrics to quantify performance of a temporal superpixel representation and demonstrate superior performance when compared to supervoxel methods.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We develop a generative probabilistic model for temporally consistent superpixels in video sequences. [sent-8, score-0.39]
2 In contrast to supervoxel methods, object parts in different frames are tracked by the same temporal superpixel. [sent-9, score-0.254]
3 We explicitly model flow between frames with a bilateral Gaussian process and use this information to propagate superpixels in an online fashion. [sent-10, score-0.487]
4 We consider four novel metrics to quantify performance of a temporal superpixel representation and demonstrate superior performance when compared to supervoxel methods. [sent-11, score-0.652]
5 Introduction Since their inception in the work of Ren and Malik [18], superpixels have become an important preprocessing step in many vision systems (e.g. [sent-13, score-0.296]
6 For example, one can reduce the hundreds of thousands of pixels to hundreds (or thousands) of superpixels while still maintaining very accurate boundaries of objects. [sent-17, score-0.326]
7 Many video analysis applications begin by inferring temporal correspondences across frames. [sent-18, score-0.129]
8 [13]) or over the dense pixels with optical flow (e.g. [sent-21, score-0.131]
9 For example, structure from motion typically tracks features to solve the correspondence of points within two frames, and video segmentation algorithms (e.g. [sent-24, score-0.161]
10 [8]) often use optical flow to relate segments between two frames. [sent-26, score-0.131]
11 More recently, [5] and [19] use optical flow to obtain robust long-term point trajectories. [sent-27, score-0.131]
12 Furthermore, [16] develops a method to extend the sparse segmentation in a single frame to a dense segmentation with the help of superpixels. [sent-29, score-0.145]
13 Inspired by these previous methods, our primary focus is to develop a representation for videos that parallels the superpixel representation in images. [sent-30, score-0.463]
14 We call these new elementary components, temporal superpixels (TSPs). [sent-31, score-0.368]
15 In the seminal work of [18], Ren and Malik define a superpixel as a set of pixels that are “local, coherent, and which preserve most of the structure necessary for segmentation”. [sent-37, score-0.358]
16 Consequently, intra-frame TSPs should represent a superpixel segmentation of the frame. [sent-39, score-0.411]
17 We believe that the TSP representation bridges the gap between superpixels and videos. [sent-41, score-0.313]
18 For example, the work of [8] connects nodes in a 3D graph along flow vectors to produce oversegmentations of the video. [sent-52, score-0.159]
19 As we discuss in Section 2, however, these over- Figure 2: Example hierarchy of oversegmentations, ranging from object segmentation (left) to superpixels (right). [sent-53, score-0.349]
20 [23]) run off-the-shelf superpixel algorithms independently on frames of a video, producing superpixels that are unrelated across time. [sent-62, score-0.717]
21 [7, 15, 18, 26]) formulate the superpixel problem on an affinity graph and solve the spectral clustering problem with graph cuts. [sent-68, score-0.378]
22 As we shall see, the proposed generative model of TSPs in videos reduces to a generative superpixel model in a single frame. [sent-72, score-0.488]
23 In contrast to supervoxel methods ([25] being a notable exception) the proposed TSP approach infers superpixels using only past and current frames and thus scales linearly with video length. [sent-74, score-0.517]
24 We consider novel metrics that evaluate desired traits ofTSPs, and quantitatively show that our method outperforms the supervoxel methods presented in [24]. [sent-75, score-0.245]
25 For example, an image can be approximated by setting each superpixel to a constant color. [sent-80, score-0.358]
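For illustration, a minimal sketch of this constant-color approximation, assuming an RGB `image` array and a per-pixel `labels` map of superpixel ids (hypothetical names, not tied to the authors' code):

```python
import numpy as np

def constant_color_approximation(image, labels):
    """Approximate an image by painting each superpixel its mean color."""
    approx = np.zeros_like(image, dtype=np.float64)
    for k in np.unique(labels):
        mask = labels == k
        approx[mask] = image[mask].mean(axis=0)  # mean over that superpixel's pixels
    return approx
```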
26 For example, in videos, one may want to approximate the flow between frames with a constant translation for each superpixel, as is done in [27]. [sent-85, score-0.128]
27 A superpixel segmentation is an oversegmentation that preserves the salient features of a pixel-based representation. [sent-90, score-0.45]
28 Describing the motion of each small superpixel as a translation may be a suitable approximation, but doing so for the entire background may introduce much larger errors. [sent-93, score-0.382]
29 Given the vast literature on superpixel methods, we focus on those that most closely relate to the proposed method. [sent-97, score-0.377]
30 The recent work of [1] presents an extremely fast superpixel algorithm called Simple Linear Iterative Clustering (SLIC). [sent-98, score-0.358]
31 As shown by the authors, SLIC rivals other state-of-the-art superpixel techniques in preserving boundaries on the Berkeley Segmentation Dataset [14] while achieving orders-of-magnitude speed gains. [sent-99, score-0.388]
32 of iterations; and (2) eliminate single-pixel superpixels and enforce that each superpixel is a single 4-connected region. [sent-105, score-0.654]
33 Using these concepts, we restrict the distribution over superpixel labels such that each unique label must be a single 4-connected region. [sent-116, score-0.407]
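A minimal sketch of the kind of connectivity test this constraint implies, checking whether relabeling a pixel would disconnect its superpixel; this brute-force flood fill illustrates the idea, whereas practical implementations typically use a faster local 3x3 test:

```python
import numpy as np
from collections import deque

def stays_connected(labels, i, j):
    """Check that removing pixel (i, j) from its superpixel leaves the
    remaining pixels of that label a single 4-connected region."""
    k = labels[i, j]
    rest = [tuple(p) for p in np.argwhere(labels == k) if tuple(p) != (i, j)]
    if not rest:
        return False  # would delete the superpixel entirely
    rest_set = set(rest)
    # BFS flood fill from an arbitrary remaining pixel
    queue, seen = deque([rest[0]]), {rest[0]}
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (ny, nx) in rest_set and (ny, nx) not in seen:
                seen.add((ny, nx))
                queue.append((ny, nx))
    return len(seen) == len(rest)  # True iff one 4-connected component remains
```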
34 As we show in Section 4, new superpixels will have to be created to explain new objects and disocclusions. [sent-120, score-0.296]
35 If new superpixels can be created, the optimal configuration in the absence of such a penalty is the set of single-pixel superpixels, each with a mean centered at the data of that pixel. [sent-121, score-0.341]
36 (4), where =C denotes equality up to an additive constant, =T assumes that the following configuration of z is a valid topology, some constants have been combined to form α, and Ik denotes the set of pixels in superpixel k. [sent-152, score-0.377]
37 We denote the relevant superpixel statistics tk,d = Σi∈Ik xi,d. [sent-154, score-0.358]
38 (6) We define the observation log likelihood for superpixel k as Ln(xIk,d). [sent-162, score-0.446]
39 We now describe the three types of proposed label moves to change z: local moves, merge moves, and split moves. [sent-170, score-0.192]
40 If any of the proposed moves increases the log likelihood, we make the move to the better configuration. [sent-171, score-0.204]
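Schematically, inference is greedy hill climbing on the log posterior; a sketch under assumed interfaces `propose_*` and `log_posterior` (hypothetical stand-ins for the paper's moves and objective):

```python
def hill_climb(z, params, moves, log_posterior, max_iters=100):
    """Greedy coordinate ascent: apply a move only if it raises the objective."""
    best = log_posterior(z, params)
    for _ in range(max_iters):
        improved = False
        for propose in moves:  # e.g. [propose_local, propose_merge, propose_split]
            z_new, params_new = propose(z, params)
            score = log_posterior(z_new, params_new)
            if score > best:
                z, params, best = z_new, params_new, score
                improved = True
        if not improved:
            break  # local optimum reached
    return z, params
```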
41 Because of the topology constraints, only pixels bordering another superpixel can change. [sent-175, score-0.421]
42 Merge Moves: combine two superpixels by observing that merging two neighboring 4-connected regions still results in a 4-connected region. [sent-177, score-0.296]
43 A random superpixel is chosen, and the largest likelihood for merging with any of its neighboring superpixels is found. [sent-178, score-0.761]
44 A split is constructed by running k-means on a random superpixel, followed by enforcing connectivity as in SLIC. [sent-181, score-0.396]
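A minimal sketch of such a split proposal, assuming per-pixel `features` rows (e.g. color and location) and omitting the connectivity-enforcement step:

```python
import numpy as np
from sklearn.cluster import KMeans

def propose_split(labels, features, k, new_label):
    """Split superpixel k in two by running 2-means on its pixel features."""
    idx = np.flatnonzero(labels.ravel() == k)
    if len(idx) < 2:
        return labels  # nothing to split
    assignment = KMeans(n_clusters=2, n_init=3).fit_predict(features[idx])
    out = labels.copy().ravel()
    out[idx[assignment == 1]] = new_label
    # a full implementation would now enforce 4-connectivity, as in SLIC
    return out.reshape(labels.shape)
```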
45 As superpixels get larger, they will also have to be more irregularly shaped to capture boundaries. [sent-191, score-0.296]
46 2, are related to the size of the resulting superpixels and are learned based on natural image statistics. [sent-195, score-0.296]
47 The resulting mean superpixel area (N/K) is shown in Figure 5. [sent-198, score-0.358]
48 Here, K refers to the number of superpixels produced by the algorithm, and M is the number of desired superpixels. [sent-199, score-0.336]
49 (10) The variances can then be automatically set based on the desired number of superpixels and some arbitrary α. [sent-203, score-0.336]
50 Example superpixel segmentation results are shown in Figure 6. [sent-204, score-0.411]
51 Temporal Consistency In this section, we extend the superpixel model to develop a temporal superpixel representation. [sent-208, score-0.805]
52 The mean locations of the superpixels evolve in a more complex fashion. [sent-214, score-0.33]
53 Because objects move somewhat smoothly, we make the assumption that superpixels that are close in location and that look alike should move similarly. [sent-215, score-0.386]
54 One reason is that the prior on flow fields must be able to accommodate both smoothness within objects and discontinuities across objects. [sent-225, score-0.157]
55 For example, using an L2 penalty on neighbor differences in the Horn-Schunck optical flow formulation [10] is related to a GP with a precision that has a 4-connected neighbor sparsity pattern. [sent-226, score-0.131]
56 From left to right: superpixel segmentation; GP with 4-connected neighbor precision; GP with location kernel; GP with bilateral kernel; mapping between flow vectors and colors. [sent-228, score-0.524]
57 We call this covariance kernel the bilateral kernel, as it is similar to the bilateral filter [21]. [sent-231, score-0.162]
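A minimal sketch of such a bilateral kernel between superpixels, assuming per-superpixel mean locations `loc` and mean colors `col`, with hypothetical bandwidths `sigma_x` and `sigma_c`:

```python
import numpy as np

def bilateral_kernel(loc, col, sigma_x=10.0, sigma_c=20.0):
    """K[i, j] decays with both spatial and color distance, so nearby,
    similar-looking superpixels are encouraged to move alike."""
    d_loc = ((loc[:, None, :] - loc[None, :, :]) ** 2).sum(-1)
    d_col = ((col[:, None, :] - col[None, :, :]) ** 2).sum(-1)
    return np.exp(-d_loc / (2 * sigma_x**2) - d_col / (2 * sigma_c**2))
```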
58 While the bilateral GP is able to model flow discontinuities, the prior does not fit the movement of a deformable object composed of a single color. [sent-233, score-0.146]
59 New, Old, and Dead Superpixels Due to camera motion, occlusions, and disocclusions, we must also allow old superpixels to disappear and new superpixels to emerge. [sent-243, score-0.409]
60 We define a dead superpixel as one that existed in the previous frame but no longer exists in the current frame. [sent-244, score-0.581]
61 In the first frame, each TSP was treated as a new superpixel with the corresponding log likelihood of Equation 9. [sent-246, score-0.446]
62 In subsequent frames, the label distribution is updated to p(z) ∝ α̂^Kn β̂^Ko 1I{T(z)}, (15) where β̂ is the geometric distribution parameter for old TSPs. [sent-247, score-0.164]
63 Consequently, we try to separate the size of a superpixel from the tradeoff between using an old or new superpixel. [sent-251, score-0.471]
64 However, initializing with optical flow using [12] significantly improves results. [sent-259, score-0.15]
65 [17]) on the old TSPs, o = {1, …, Ko}: ft = Σ(Σ + δ²I)^-1 … [sent-263, score-0.151]
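Assuming the standard closed form for noisy GP regression, the posterior mean flow can be sketched as follows, with `K` a (e.g. bilateral) kernel over the Ko old TSPs and `v` their observed displacements; the exact quantities regressed in the paper may differ:

```python
import numpy as np

def gp_flow(K, v, delta=1.0):
    """Posterior mean of GP regression with observation noise delta:
    f = K (K + delta^2 I)^(-1) v, solved jointly for x and y displacements."""
    A = K + (delta ** 2) * np.eye(K.shape[0])
    return K @ np.linalg.solve(A, v)  # v: (Ko, 2) array of observed flows
```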
66 (17) As before, we propose different label moves with corresponding optimal parameters, and accept the move if it increases the likelihood. [sent-268, score-0.192]
67 In the case of old TSPs, the prior on the parameters is no longer uniform, and the form of the optimal values changes. [sent-269, score-0.158]
68 (19) Using the optimal mean, the observation log likelihood for old TSPs becomes (up to a constant) … [sent-278, score-0.227]
69 The split move is slightly modified to accommodate the difference between new and old TSPs. [sent-288, score-0.216]
70 For fewer than 1000 superpixels per frame, inference takes a few seconds for the first frame and tens of seconds for subsequent frames. [sent-289, score-0.385]
71 We define K as the set of labels that we can split into: K = {k ; Nk = 0, sk = o} ∪ knew, (22) consisting of all the dead TSPs and a possible new one. [sent-295, score-0.131]
72 In particular, if the support of a superpixel is not represented outside of the image domain, the empirical data statistics become biased. [sent-309, score-0.358]
73 Consider Figure 9, which illustrates the tracking of two superpixels that are moving to the right at the same speed. [sent-311, score-0.296]
74 Because the green superpixel is moving out of the image, the empirical location mean does not move correctly, causing errors in the flow estimates and the optimal parameter estimates. [sent-312, score-0.522]
75 Consequently, we represent the full support of any superpixel that contains a pixel in the image domain. [sent-313, score-0.375]
76 Experiments In this section, we compare our temporal superpixel method to the supervoxel methods described in [24] and [25]. [sent-316, score-0.567]
77 Parameter values for all videos and experiments were fixed, except for M, which specifies the desired number of superpixels per frame. [sent-318, score-0.39]
78 We consider a set of additional metrics aimed to capture the following aspects of a good model: object segmentation consistency, 2D boundary accuracy, intra-frame spatial locality, inter-frame temporal extent, and inter-frame label consistency. [sent-326, score-0.292]
79 While [24] introduces the 3D boundary recall (BR3), we find that BR3 conflates 2D boundary recall with object segmentation consistency. [sent-335, score-0.187]
80 The typical boundary recall metric finds the percent of ground truth boundaries that are also declared superpixel boundaries. [sent-337, score-0.519]
81 The superpixel boundaries are often dilated to tolerate localization error, but this makes the metric depend on the amount of dilation. [sent-339, score-0.388]
82 We introduce the 2D boundary recall distance (BRD) metric, related to the metric of [6], as the average distance from points on the ground truth boundary to the nearest declared superpixel boundary, averaged across frames. [sent-340, score-0.264]
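A minimal sketch of a BRD-style computation for one frame, assuming boolean boundary masks and SciPy's Euclidean distance transform (per-frame values would then be averaged over the video):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def boundary_recall_distance(gt_boundary, sp_boundary):
    """Average distance (in pixels) from each ground-truth boundary pixel
    to the nearest declared superpixel boundary pixel, for one frame."""
    dist_to_sp = distance_transform_edt(~sp_boundary)  # 0 on superpixel boundaries
    return dist_to_sp[gt_boundary].mean()
```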
83 As superpixels get larger, they lose their representative power. [sent-344, score-0.315]
84 Consequently, assuming a perfect ACC, UE, and BRD, assigning equally-sized superpixels corresponds to the best representation. [sent-345, score-0.296]
85 We therefore introduce the size variation (SZV) metric which considers the average standard deviation of superpixel sizes across all frames. [sent-346, score-0.396]
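SZV can be sketched directly from per-frame label maps (hypothetical interface):

```python
import numpy as np

def size_variation(labels_per_frame):
    """Average, over frames, of the standard deviation of superpixel sizes."""
    stds = []
    for labels in labels_per_frame:
        _, counts = np.unique(labels, return_counts=True)
        stds.append(counts.std())
    return float(np.mean(stds))
```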
86 The authors of [25] introduce the mean duration time (MDT) metric which computes the average number of frames a supervoxel exists in. [sent-349, score-0.202]
87 We therefore introduce the temporal extent (TEX) metric which normalizes the MDT by the total number of frames in the video. [sent-351, score-0.154]
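TEX is then a one-line normalization of MDT; a sketch assuming a list of per-TSP lifespans in frames:

```python
def temporal_extent(frame_counts, num_frames):
    """TEX: mean duration (frames a TSP exists) normalized by video length,
    so 1.0 means every TSP spans the whole video."""
    return sum(frame_counts) / len(frame_counts) / num_frames
```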
88 Finally, we introduce the label consistency (LC) metric, which measures how well superpixels track parts of objects. [sent-353, score-0.355]
89 Labels are propagated via the ground truth flow and compared to the superpixel labels at frame t. [sent-355, score-0.415]
90 LC counts the average number of pixels that agree between the inferred superpixels and the ones propagated via the flow. [sent-356, score-0.314]
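A sketch of an LC-style check for one frame pair, assuming labels propagated by the ground truth flow and a `valid` mask for pixels where the flow is defined (counts would be averaged across frames):

```python
import numpy as np

def label_consistency(labels_t, labels_propagated, valid):
    """Number of valid pixels whose inferred label at frame t agrees with
    the label carried forward by the ground-truth flow."""
    return int(((labels_t == labels_propagated) & valid).sum())
```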
91 Algorithm Comparison Using these metrics, we compare our TSP algorithm to the top two supervoxel methods of [24] (GBH [8] and SWA [20]) and the streaming version of GBH developed in [25]. [sent-359, score-0.194]
92 We use the GBH implementation provided by [24] which does not use optical flow since the original algorithm does not produce superpixel segmentations (as shown in Figure 3). [sent-360, score-0.489]
93 In contrast, our algorithm only required changing M, the desired number of superpixels per frame. [sent-365, score-0.336]
94 Therefore, unlike [24] which plots the metrics against the number of supervoxels, we plot the metrics against the average number of superpixels per frame. [sent-367, score-0.432]
95 For the LC, we use the videos from [2] and [12] since ground truth flow is needed. [sent-370, score-0.155]
96 The BRD value for TSPs indicates that the average distance between a ground truth boundary and a superpixel boundary is approximately 1 pixel. [sent-372, score-0.474]
97 After obtaining the full superpixel segmentation, we manually color a subset of superpixels that existed in the first frame and visualize their extent in time by looking at subsequent frames. [sent-376, score-0.802]
98 TSP tracks superpixels correctly through all frames, while GBH and SWA lose the tracks as time progresses and exhibit drifting effects (shown in blue). [sent-377, score-0.343]
99 Conclusion In this paper, we have presented a low-level video representation called the temporal superpixel. [sent-380, score-0.128]
100 We have shown quantitatively that TSP representations outperform supervoxel methods. [sent-389, score-0.137]
wordName wordTfidf (topN-words)
[('tsps', 0.651), ('superpixel', 0.358), ('tsp', 0.305), ('superpixels', 0.296), ('gbh', 0.137), ('supervoxel', 0.137), ('old', 0.113), ('gp', 0.102), ('moves', 0.1), ('slic', 0.096), ('dead', 0.093), ('flow', 0.083), ('oversegmentations', 0.076), ('swa', 0.076), ('existed', 0.072), ('temporal', 0.072), ('metrics', 0.068), ('bilateral', 0.063), ('topology', 0.063), ('streaming', 0.057), ('brd', 0.056), ('videos', 0.054), ('supervoxels', 0.053), ('segmentation', 0.053), ('mdt', 0.049), ('boundary', 0.049), ('optical', 0.048), ('acc', 0.047), ('frames', 0.045), ('likelihood', 0.045), ('log', 0.043), ('sai', 0.041), ('desired', 0.04), ('csail', 0.04), ('video', 0.039), ('frame', 0.039), ('oversegmentation', 0.039), ('generative', 0.038), ('ft', 0.038), ('split', 0.038), ('kn', 0.038), ('donglai', 0.037), ('ue', 0.036), ('move', 0.035), ('evolve', 0.034), ('ko', 0.032), ('label', 0.031), ('accommodate', 0.03), ('boundaries', 0.03), ('inference', 0.03), ('ren', 0.028), ('track', 0.028), ('declared', 0.026), ('optimal', 0.026), ('graphical', 0.026), ('lc', 0.026), ('discontinuities', 0.026), ('motion', 0.024), ('supplement', 0.024), ('aid', 0.024), ('ik', 0.023), ('merge', 0.023), ('nk', 0.023), ('ka', 0.023), ('mit', 0.023), ('medical', 0.023), ('volumetric', 0.022), ('berkeley', 0.022), ('switch', 0.022), ('consequently', 0.022), ('expressed', 0.02), ('subsequent', 0.02), ('clustering', 0.02), ('metric', 0.02), ('zi', 0.02), ('location', 0.02), ('closely', 0.019), ('lose', 0.019), ('kernel', 0.019), ('initializing', 0.019), ('coherent', 0.019), ('longer', 0.019), ('aimed', 0.019), ('configuration', 0.019), ('joint', 0.019), ('conditioned', 0.018), ('recall', 0.018), ('labels', 0.018), ('propagated', 0.018), ('across', 0.018), ('truth', 0.018), ('voxel', 0.018), ('pixel', 0.017), ('representation', 0.017), ('commonly', 0.017), ('logp', 0.017), ('extent', 0.017), ('covariance', 0.017), ('develop', 0.017), ('brox', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 29 cvpr-2013-A Video Representation Using Temporal Superpixels
Author: Jason Chang, Donglai Wei, John W. Fisher_III
Abstract: We develop a generative probabilistic model for temporally consistent superpixels in video sequences. In contrast to supervoxel methods, object parts in different frames are tracked by the same temporal superpixel. We explicitly model flow between frames with a bilateral Gaussian process and use this information to propagate superpixels in an online fashion. We consider four novel metrics to quantify performance of a temporal superpixel representation and demonstrate superior performance when compared to supervoxel methods.
2 0.23965287 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
Author: Jeremie Papon, Alexey Abramov, Markus Schoeler, Florentin Wörgötter
Abstract: Unsupervised over-segmentation of an image into regions of perceptually similar pixels, known as superpixels, is a widely used preprocessing step in segmentation algorithms. Superpixel methods reduce the number of regions that must be considered later by more computationally expensive algorithms, with a minimal loss of information. Nevertheless, as some information is inevitably lost, it is vital that superpixels not cross object boundaries, as such errors will propagate through later steps. Existing methods make use of projected color or depth information, but do not consider three dimensional geometric relationships between observed data points which can be used to prevent superpixels from crossing regions of empty space. We propose a novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three dimensional, rather than projective, space. Enforcing the constraint that segmented regions must have spatial connectivity prevents label flow across semantic object boundaries which might otherwise be violated. Additionally, as the algorithm works directly in 3D space, observations from several calibrated RGB+D cameras can be segmented jointly. Experiments on a large data set of human annotated RGB+D images demonstrate a significant reduction in occurrence of clusters crossing object boundaries, while maintaining speeds comparable to state-of-the-art 2D methods.
3 0.21682647 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
Author: Guang Shu, Afshin Dehghan, Mubarak Shah
Abstract: We propose an approach to improve the detection performance of a generic detector when it is applied to a particular video. The performance of offline-trained objects detectors are usually degraded in unconstrained video environments due to variant illuminations, backgrounds and camera viewpoints. Moreover, most object detectors are trained using Haar-like features or gradient features but ignore video specificfeatures like consistent colorpatterns. In our approach, we apply a Superpixel-based Bag-of-Words (BoW) model to iteratively refine the output of a generic detector. Compared to other related work, our method builds a video-specific detector using superpixels, hence it can handle the problem of appearance variation. Most importantly, using Conditional Random Field (CRF) along with our super pixel-based BoW model, we develop and algorithm to segment the object from the background . Therefore our method generates an output of the exact object regions instead of the bounding boxes generated by most detectors. In general, our method takes detection bounding boxes of a generic detector as input and generates the detection output with higher average precision and precise object regions. The experiments on four recent datasets demonstrate the effectiveness of our approach and significantly improves the state-of-art detector by 5-16% in average precision.
4 0.21199845 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
Author: Yan Wang, Rongrong Ji, Shih-Fu Chang
Abstract: Recent years have witnessed a growing interest in understanding the semantics of point clouds in a wide variety of applications. However, point cloud labeling remains an open problem, due to the difficulty in acquiring sufficient 3D point labels towards training effective classifiers. In this paper, we overcome this challenge by utilizing the existing massive 2D semantic labeled datasets from decadelong community efforts, such as ImageNet and LabelMe, and a novel “cross-domain ” label propagation approach. Our proposed method consists of two major novel components, Exemplar SVM based label propagation, which effectively addresses the cross-domain issue, and a graphical model based contextual refinement incorporating 3D constraints. Most importantly, the entire process does not require any training data from the target scenes, also with good scalability towards large scale applications. We evaluate our approach on the well-known Cornell Point Cloud Dataset, achieving much greater efficiency and comparable accuracy even without any 3D training data. Our approach shows further major gains in accuracy when the training data from the target scenes is used, outperforming state-ofthe-art approaches with far better efficiency.
5 0.17543866 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context
Author: Gautam Singh, Jana Kosecka
Abstract: This paper presents a nonparametric approach to semantic parsing using small patches and simple gradient, color and location features. We learn the relevance of individual feature channels at test time using a locally adaptive distance metric. To further improve the accuracy of the nonparametric approach, we examine the importance of the retrieval set used to compute the nearest neighbours using a novel semantic descriptor to retrieve better candidates. The approach is validated by experiments on several datasets used for semantic parsing demonstrating the superiority of the method compared to the state of art approaches.
6 0.17413695 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
7 0.16462983 366 cvpr-2013-Robust Region Grouping via Internal Patch Statistics
8 0.15951969 460 cvpr-2013-Weakly-Supervised Dual Clustering for Image Semantic Segmentation
9 0.15867981 357 cvpr-2013-Revisiting Depth Layers from Occlusions
10 0.1532677 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation
11 0.14936337 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
12 0.13875434 212 cvpr-2013-Image Segmentation by Cascaded Region Agglomeration
13 0.12432957 294 cvpr-2013-Multi-class Video Co-segmentation with a Generative Multi-video Model
14 0.11561123 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
15 0.10585759 13 cvpr-2013-A Higher-Order CRF Model for Road Network Extraction
16 0.10469939 362 cvpr-2013-Robust Monocular Epipolar Flow Estimation
17 0.1038411 187 cvpr-2013-Geometric Context from Videos
19 0.10017315 50 cvpr-2013-Augmenting CRFs with Boltzmann Machine Shape Priors for Image Labeling
20 0.089954421 406 cvpr-2013-Spatial Inference Machines
topicId topicWeight
[(0, 0.175), (1, 0.025), (2, 0.052), (3, -0.041), (4, 0.066), (5, 0.007), (6, 0.063), (7, 0.035), (8, -0.135), (9, 0.063), (10, 0.239), (11, -0.08), (12, 0.042), (13, 0.097), (14, 0.013), (15, 0.032), (16, 0.076), (17, -0.133), (18, -0.131), (19, 0.101), (20, 0.074), (21, -0.011), (22, -0.118), (23, -0.012), (24, -0.152), (25, 0.027), (26, -0.14), (27, -0.081), (28, 0.031), (29, 0.089), (30, 0.086), (31, -0.03), (32, 0.077), (33, -0.058), (34, 0.025), (35, -0.066), (36, -0.053), (37, 0.029), (38, 0.094), (39, -0.041), (40, 0.059), (41, -0.062), (42, -0.058), (43, -0.038), (44, -0.024), (45, 0.028), (46, 0.005), (47, -0.068), (48, 0.015), (49, -0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.93301064 29 cvpr-2013-A Video Representation Using Temporal Superpixels
Author: Jason Chang, Donglai Wei, John W. Fisher_III
Abstract: We develop a generative probabilistic model for temporally consistent superpixels in video sequences. In contrast to supervoxel methods, object parts in different frames are tracked by the same temporal superpixel. We explicitly model flow between frames with a bilateral Gaussian process and use this information to propagate superpixels in an online fashion. We consider four novel metrics to quantify performance of a temporal superpixel representation and demonstrate superior performance when compared to supervoxel methods.
2 0.75697958 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
Author: Jeremie Papon, Alexey Abramov, Markus Schoeler, Florentin Wörgötter
Abstract: Unsupervised over-segmentation of an image into regions of perceptually similar pixels, known as superpixels, is a widely used preprocessing step in segmentation algorithms. Superpixel methods reduce the number of regions that must be considered later by more computationally expensive algorithms, with a minimal loss of information. Nevertheless, as some information is inevitably lost, it is vital that superpixels not cross object boundaries, as such errors will propagate through later steps. Existing methods make use of projected color or depth information, but do not consider three dimensional geometric relationships between observed data points which can be used to prevent superpixels from crossing regions of empty space. We propose a novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three dimensional, rather than projective, space. Enforcing the constraint that segmented regions must have spatial connectivity prevents label flow across semantic object boundaries which might otherwise be violated. Additionally, as the algorithm works directly in 3D space, observations from several calibrated RGB+D cameras can be segmented jointly. Experiments on a large data set of human annotated RGB+D images demonstrate a significant reduction in occurrence of clusters crossing object boundaries, while maintaining speeds comparable to state-of-the-art 2D methods.
3 0.74569112 26 cvpr-2013-A Statistical Model for Recreational Trails in Aerial Images
Author: Andrew Predoehl, Scott Morris, Kobus Barnard
Abstract: unkown-abstract
4 0.72123092 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
Author: Luming Zhang, Mingli Song, Zicheng Liu, Xiao Liu, Jiajun Bu, Chun Chen
Abstract: Weakly supervised image segmentation is a challenging problem in computer vision field. In this paper, we present a new weakly supervised image segmentation algorithm by learning the distribution of spatially structured superpixel sets from image-level labels. Specifically, we first extract graphlets from each image where a graphlet is a smallsized graph consisting of superpixels as its nodes and it encapsulates the spatial structure of those superpixels. Then, a manifold embedding algorithm is proposed to transform graphlets of different sizes into equal-length feature vectors. Thereafter, we use GMM to learn the distribution of the post-embedding graphlets. Finally, we propose a novel image segmentation algorithm, called graphlet cut, that leverages the learned graphlet distribution in measuring the homogeneity of a set of spatially structured superpixels. Experimental results show that the proposed approach outperforms state-of-the-art weakly supervised image segmentation methods, and its performance is comparable to those of the fully supervised segmentation models.
5 0.70736516 366 cvpr-2013-Robust Region Grouping via Internal Patch Statistics
Author: Xiaobai Liu, Liang Lin, Alan L. Yuille
Abstract: In this work, we present an efficient multi-scale low-rank representation for image segmentation. Our method begins with partitioning the input images into a set of superpixels, followed by seeking the optimal superpixel-pair affinity matrix, both of which are performed at multiple scales of the input images. Since low-level superpixel features are usually corrupted by image noises, we propose to infer the low-rank refined affinity matrix. The inference is guided by two observations on natural images. First, looking into a single image, local small-size image patterns tend to recur frequently within the same semantic region, but may not appear in semantically different regions. We call this internal image statistics as replication prior, and quantitatively justify it on real image databases. Second, the affinity matrices at different scales should be consistently solved, which leads to the cross-scale consistency constraint. We formulate these two purposes with one unified formulation and develop an efficient optimization procedure. Our experiments demonstrate the presented method can substantially improve segmentation accuracy.
6 0.68319225 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
7 0.67488682 212 cvpr-2013-Image Segmentation by Cascaded Region Agglomeration
8 0.6481269 460 cvpr-2013-Weakly-Supervised Dual Clustering for Image Semantic Segmentation
9 0.60912097 280 cvpr-2013-Maximum Cohesive Grid of Superpixels for Fast Object Localization
10 0.54230976 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
11 0.53200734 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation
12 0.5153321 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
13 0.51457608 13 cvpr-2013-A Higher-Order CRF Model for Road Network Extraction
14 0.49837267 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context
15 0.49662077 294 cvpr-2013-Multi-class Video Co-segmentation with a Generative Multi-video Model
16 0.48580703 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
18 0.44273674 357 cvpr-2013-Revisiting Depth Layers from Occlusions
19 0.41250652 455 cvpr-2013-Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions
20 0.40732551 406 cvpr-2013-Spatial Inference Machines
topicId topicWeight
[(10, 0.173), (16, 0.014), (22, 0.011), (26, 0.045), (33, 0.293), (39, 0.016), (67, 0.05), (69, 0.034), (87, 0.078), (92, 0.164)]
simIndex simValue paperId paperTitle
1 0.95114458 115 cvpr-2013-Depth Super Resolution by Rigid Body Self-Similarity in 3D
Author: unkown-author
Abstract: We tackle the problem of jointly increasing the spatial resolution and apparent measurement accuracy of an input low-resolution, noisy, and perhaps heavily quantized depth map. In stark contrast to earlier work, we make no use of ancillary data like a color image at the target resolution, multiple aligned depth maps, or a database of highresolution depth exemplars. Instead, we proceed by identifying and merging patch correspondences within the input depth map itself, exploiting patchwise scene self-similarity across depth such as repetition of geometric primitives or object symmetry. While the notion of ‘single-image ’ super resolution has successfully been applied in the context of color and intensity images, we are to our knowledge the first to present a tailored analogue for depth images. Rather than reason in terms of patches of 2D pixels as others have before us, our key contribution is to proceed by reasoning in terms of patches of 3D points, with matched patch pairs related by a respective 6 DoF rigid body motion in 3D. In support of obtaining a dense correspondence field in reasonable time, we introduce a new 3D variant of Patch- Match. A third contribution is a simple, yet effective patch upscaling and merging technique, which predicts sharp object boundaries at the target resolution. We show that our results are highly competitive with those of alternative techniques leveraging even a color image at the target resolution or a database of high-resolution depth exemplars.
2 0.94682926 393 cvpr-2013-Separating Signal from Noise Using Patch Recurrence across Scales
Author: Maria Zontak, Inbar Mosseri, Michal Irani
Abstract: Recurrence of small clean image patches across different scales of a natural image has been successfully used for solving ill-posed problems in clean images (e.g., superresolution from a single image). In this paper we show how this multi-scale property can be extended to solve ill-posed problems under noisy conditions, such as image denoising. While clean patches are obscured by severe noise in the original scale of a noisy image, noise levels drop dramatically at coarser image scales. This allows for the unknown hidden clean patches to “naturally emerge ” in some coarser scale of the noisy image. We further show that patch recurrence across scales is strengthened when using directional pyramids (that blur and subsample only in one direction). Our statistical experiments show that for almost any noisy image patch (more than 99%), there exists a “good” clean version of itself at the same relative image coordinates in some coarser scale of the image.This is a strong phenomenon of noise-contaminated natural images, which can serve as a strong prior for separating the signal from the noise. Finally, incorporating this multi-scale prior into a simple denoising algorithm yields state-of-the-art denois- ing results.
same-paper 3 0.91690367 29 cvpr-2013-A Video Representation Using Temporal Superpixels
Author: Jason Chang, Donglai Wei, John W. Fisher_III
Abstract: We develop a generative probabilistic model for temporally consistent superpixels in video sequences. In contrast to supervoxel methods, object parts in different frames are tracked by the same temporal superpixel. We explicitly model flow between frames with a bilateral Gaussian process and use this information to propagate superpixels in an online fashion. We consider four novel metrics to quantify performance of a temporal superpixel representation and demonstrate superior performance when compared to supervoxel methods.
4 0.90001345 414 cvpr-2013-Structure Preserving Object Tracking
Author: Lu Zhang, Laurens van_der_Maaten
Abstract: Model-free trackers can track arbitrary objects based on a single (bounding-box) annotation of the object. Whilst the performance of model-free trackers has recently improved significantly, simultaneously tracking multiple objects with similar appearance remains very hard. In this paper, we propose a new multi-object model-free tracker (based on tracking-by-detection) that resolves this problem by incorporating spatial constraints between the objects. The spatial constraints are learned along with the object detectors using an online structured SVM algorithm. The experimental evaluation ofour structure-preserving object tracker (SPOT) reveals significant performance improvements in multi-object tracking. We also show that SPOT can improve the performance of single-object trackers by simultaneously tracking different parts of the object.
5 0.89933401 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
Author: Junseok Kwon, Kyoung Mu Lee
Abstract: We propose a novel tracking algorithm that robustly tracks the target by finding the state which minimizes uncertainty of the likelihood at current state. The uncertainty of the likelihood is estimated by obtaining the gap between the lower and upper bounds of the likelihood. By minimizing the gap between the two bounds, our method finds the confident and reliable state of the target. In the paper, the state that gives the Minimum Uncertainty Gap (MUG) between likelihood bounds is shown to be more reliable than the state which gives the maximum likelihood only, especially when there are severe illumination changes, occlusions, and pose variations. A rigorous derivation of the lower and upper bounds of the likelihood for the visual tracking problem is provided to address this issue. Additionally, an efficient inference algorithm using Interacting Markov Chain Monte Carlo is presented to find the best state that maximizes the average of the lower and upper bounds of the likelihood and minimizes the gap between two bounds simultaneously. Experimental results demonstrate that our method successfully tracks the target in realistic videos and outperforms conventional tracking methods.
6 0.89871174 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
7 0.89749038 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
8 0.897003 131 cvpr-2013-Discriminative Non-blind Deblurring
9 0.8965323 314 cvpr-2013-Online Object Tracking: A Benchmark
10 0.89626098 143 cvpr-2013-Efficient Large-Scale Structured Learning
11 0.8961395 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
12 0.89569491 360 cvpr-2013-Robust Estimation of Nonrigid Transformation for Point Set Registration
13 0.89464992 267 cvpr-2013-Least Soft-Threshold Squares Tracking
14 0.89448005 406 cvpr-2013-Spatial Inference Machines
15 0.89437091 325 cvpr-2013-Part Discovery from Partial Correspondence
16 0.89415866 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning
17 0.89409447 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
18 0.89350462 193 cvpr-2013-Graph Transduction Learning with Connectivity Constraints with Application to Multiple Foreground Cosegmentation
19 0.89256096 462 cvpr-2013-Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines
20 0.89219069 121 cvpr-2013-Detection- and Trajectory-Level Exclusion in Multiple Object Tracking