iccv iccv2013 iccv2013-379 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
Abstract: Numerous existing object segmentation frameworks commonly utilize the object bounding box as a prior. In this paper, we address semantic segmentation assuming that object bounding boxes are provided by object detectors, but no training data with annotated segments are available. Based on a set of segment hypotheses, we introduce a simple voting scheme to estimate shape guidance for each bounding box. The derived shape guidance is used in the subsequent graph-cut-based figure-ground segmentation. The final segmentation result is obtained by merging the segmentation results in the bounding boxes. We conduct an extensive analysis of the effect of object bounding box accuracy. Comprehensive experiments on both the challenging PASCAL VOC object segmentation dataset and GrabCut50 image segmentation dataset show that the proposed approach achieves competitive results compared to previous detection or bounding box prior based methods, as well as other state-of-the-art semantic segmentation methods.
Reference: text
sentIndex sentText sentNum sentScore
1 s g e Abstract Numerous existing object segmentation frameworks commonly utilize the object bounding box as a prior. [sent-4, score-1.021]
2 In this paper, we address semantic segmentation assuming that object bounding boxes are provided by object detectors, but no training data with annotated segments are available. [sent-5, score-1.357]
3 Based on a set of segment hypotheses, we introduce a simple voting scheme to estimate shape guidance for each bounding box. [sent-6, score-1.049]
4 The derived shape guidance is used in the subsequent graph-cut-based figure-ground segmentation. [sent-7, score-0.49]
5 The final segmentation result is obtained by merging the segmentation results in the bounding boxes. [sent-8, score-0.893]
6 We conduct an extensive analysis of the effect of object bounding box accuracy. [sent-9, score-0.663]
7 Introduction Object classification, detection and segmentation are the core and strongly correlated sub-tasks [21, 28, 5] of object recognition, each yielding different levels of understanding. [sent-12, score-0.425]
8 The classification tells what objects the image contains, detection further solves the problem of where the objects are in the image, while segmentation aims to assign a class label to each pixel. [sent-13, score-0.487]
9 [17] proposed a figure-ground segmentation framework, in which the training masks are transferred to object windows on the test image based on visual similarity. [sent-30, score-0.438]
10 In [15], a similar idea is proposed and a class-independent shape prior is introduced to transfer object shapes from an exemplar database to the test image. [sent-32, score-0.285]
11 Generally, bottom-up methods without modelling objects globally tend to generate visually consistent segmentation instead of semantically meaningful ones. [sent-34, score-0.306]
12 [1] proposed regionbased object detectors that integrate top-down poselet detector and global appearance cues. [sent-41, score-0.283]
13 This method [1] produces class-specific scores for the regions and aggregates multiple overlapping candidates through pixel classification. Then, a voting-based scheme is applied to estimate object shape guidance. [sent-42, score-0.464]
14 By making use of the shape guidance, a graph-cut-based figureground segmentation provides a mask for each bounding box. [sent-43, score-0.881]
15 The main challenge is to obtain object shape templates, especially for objects with relatively large intra-class appearance and pose variations. [sent-46, score-0.298]
16 In this paper, we propose an efficient, learning-free design for semantic segmentation when the object bounding boxes are available (see Fig. [sent-52, score-1.033]
17 However, the object bounding boxes can be obtained in a much easier way, either through user interaction or from an object detector, which also provides the class label as additional information. [sent-56, score-0.958]
18 Here, we propose an approach based on detected bounding boxes, where no additional segment annotation from the training set or user interaction is required. [sent-57, score-0.622]
19 Based on a set of segment hypotheses, we introduce a simple voting scheme to estimate the shape guidance. [sent-60, score-0.326]
20 The derived shape guidance is used in the subsequent graph-cutbased formulation to provide the figure-ground segmentation. [sent-61, score-0.49]
21 • Comprehensive experiments on the most challenging object segmentation datasets [12, 22] demonstrate that the performance of the proposed method is competitive or even superior to the state-of-the-art methods. [sent-62, score-0.34]
22 We also conduct an analysis of the effect of the bounding box accuracy. [sent-63, score-0.575]
23 Related Work Numerous semantic segmentation methods utilize the object bounding box as a prior. [sent-65, score-0.964]
24 The bounding boxes are provided by either user interaction or object detectors. [sent-66, score-0.817]
25 These methods tend to exploit the provided bounding box merely to exclude its exterior from segmentation. [sent-67, score-0.584]
26 In fact, the shape of a detected object is represented in terms of a layered, perpixel segmentation. [sent-70, score-0.278]
27 [11] proposed and evaluated several color models based on learned graph-cut segmentations to help re-localize objects in the initial bounding boxes predicted from deformable parts model (DPM) [13]. [sent-72, score-0.668]
28 The objects are detected in the image; then, for each detected bounding box, the objects from the same category along with their object masks are selected from the training set and transferred to a latent mask within the given bounding box. [sent-75, score-0.785]
29 GrabCut combines hard segmentation by iterative graph-cut optimization with border matting to deal with blurred and mixed pixels on object boundaries. [sent-80, score-0.34]
30 In [22] a method is introduced which further exploits the bounding box to impose a powerful topological prior. [sent-81, score-0.545]
31 In [9], an adaptive figure-ground classification algorithm is presented to automatically extract a foreground region using a user provided bounding box. [sent-85, score-0.517]
32 Finally, the best segmentation is automatically selected with a voting or weighted combination scheme. [sent-88, score-0.332]
33 For a given test image, first the object bounding boxes with detection scores are predicted by object detectors. [sent-91, score-0.955]
34 The detection scores are normalized and some bounding boxes with low scores are removed (see Section 3. [sent-92, score-0.859]
35 A large pool of segment hypotheses is generated by purely applying the CPMC method [8] (without using any learning process), in order to estimate the object shape guidance in a given bounding box. [sent-94, score-1.236]
36 The shape guidance is then obtained by a simple but effective voting scheme (see Section 3. [sent-95, score-0.57]
37 The derived object shape guidance is integrated into a graph-cutbased optimization for each bounding box (see Section 3. [sent-97, score-1.123]
38 The obtained segmentation results corresponding to different bounding boxes are merged and further refined through some post-processing techniques including morphological operations, e. [sent-99, score-0.981]
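The merging and post-processing step described above can be sketched as follows. This is a hypothetical minimal implementation, not the paper's actual code: it assumes per-box binary masks already projected into image coordinates, resolves overlaps by detection score, and uses border-flood hole filling as one concrete stand-in for the unspecified morphological operations (`merge_and_clean` and `fill_holes` are illustrative names).

```python
import numpy as np
from collections import deque

def fill_holes(binary):
    """Fill interior holes of a binary mask by flood-filling the background
    from the image border; any background pixel not reached is a hole."""
    h, w = binary.shape
    outside = np.zeros((h, w), dtype=bool)
    q = deque((r, c) for r in range(h) for c in range(w)
              if (r in (0, h - 1) or c in (0, w - 1)) and not binary[r, c])
    for r, c in q:
        outside[r, c] = True
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not binary[nr, nc] and not outside[nr, nc]:
                outside[nr, nc] = True
                q.append((nr, nc))
    return binary | ~outside

def merge_and_clean(masks, labels, scores, shape):
    """Merge per-box figure-ground masks into one label map: overlapping
    pixels go to the detection with the higher score, then holes are
    filled per class as a simple morphological post-processing step."""
    out = np.zeros(shape, dtype=int)
    conf = np.full(shape, -np.inf)
    for mask, label, score in zip(masks, labels, scores):
        sel = (mask > 0) & (score > conf)
        out[sel] = label
        conf[sel] = score
    for label in set(labels):
        out[fill_holes(out == label) & (out == 0)] = label
    return out
```

Because each pixel keeps the highest-scoring label seen so far, the merge result is independent of the order in which the boxes are processed.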
39 Bounding Box Score Normalization In order to obtain the bounding boxes, we apply the state-of-the-art object detectors provided by the authors of [10, 28]. [sent-107, score-0.594]
40 For a given test image, class-specific object detectors provide a set of bounding boxes with class labels and detection scores. [sent-108, score-0.907]
41 By applying the different detection scores as threshold values over the objects in the validation set, one can estimate the precision over score (PoS) function for a given class. [sent-117, score-0.313]
42 By substituting the actual detection scores into PoS functions, one can transform and compare the scores provided by detectors from different classes. [sent-119, score-0.362]
43 In our experiments, the bounding boxes that have lower detection scores than a threshold value (τ) are removed. [sent-126, score-0.779]
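The precision-over-score (PoS) normalization and thresholding described in the surrounding sentences can be sketched as below. The paper does not spell out the exact estimation procedure, so this is an assumption-laden sketch: precision at a score s is taken as the fraction of correct validation detections among those scoring at least s, and `precision_over_score` / `normalize_and_filter` are illustrative names.

```python
import numpy as np

def precision_over_score(scores, is_true_pos, query_scores):
    """Estimate one class's PoS function on validation detections and
    evaluate it at the given query scores.

    scores      : detection scores on the validation set
    is_true_pos : 1 if the corresponding detection is correct, else 0
    """
    scores = np.asarray(scores, dtype=float)
    is_true_pos = np.asarray(is_true_pos, dtype=float)
    out = []
    for s in np.atleast_1d(query_scores):
        keep = scores >= s                      # detections surviving threshold s
        out.append(is_true_pos[keep].mean() if keep.any() else 0.0)
    return np.array(out)

def normalize_and_filter(boxes, scores, labels, pos_funcs, tau):
    """Map each raw detection score through its class's PoS function so
    scores from different detectors become comparable, then drop boxes
    whose normalized score falls below the threshold tau."""
    norm = np.array([pos_funcs[c](s) for c, s in zip(labels, scores)])
    keep = norm >= tau
    return [b for b, k in zip(boxes, keep) if k], norm[keep]
```

After this mapping, a normalized score of 0.8 means roughly "80% of validation detections at this raw score or above were correct", regardless of which class-specific detector produced it.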
44 Object Shape Guidance Estimation After obtaining object bounding boxes, a figure-ground segmentation is performed for each bounding box. [sent-130, score-1.118]
45 As figure-ground segmentation methods [17, 15] can benefit significantly from the shape guidance, we introduce a simple yet effective idea to obtain the shape guidance. [sent-131, score-0.564]
46 For this purpose, a set of object segments, serving as various hypotheses for the object shape, is generated for the given test image. [sent-132, score-0.508]
47 The object shape is then estimated based on a simple voting scheme. [sent-133, score-0.324]
48 The segment hypotheses are generated by solving a sequence ofCPMC problems [8] without any prior knowledge about the properties of individual object classes. [sent-134, score-0.398]
49 Some exemplar images (top) and the estimated object shape guidance with shape confidence (bottom). [sent-136, score-0.843]
50 The information about the object localization is provided by the bounding box, and hence we can crop the segments. [sent-140, score-0.516]
51 Therefore, we omit those segments smaller than γ1 = 20% or larger than γ2 = 80% of the bounding box area. [sent-142, score-0.698]
52 Those regions sharing more overlapping segments and thus higher scores, have higher confidence to be the part of the object shape. [sent-153, score-0.346]
53 The generated segments partially cover the object, nevertheless, some segment among S1, . [sent-154, score-0.282]
54 The final object shape guidance is achieved by restricting the domain of M̄(p) to the "best" segment S_{i∗}, more precisely M(p) = M̄(p) · 1_{S_{i∗}}(p). [sent-166, score-0.578]
55 This approach provides the shape guidance as well as the shape confidence score for each pixel. [sent-167, score-0.737]
56 Some examples of the estimated shape guidance are shown in Fig. [sent-168, score-0.49]
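The voting idea in the preceding sentences can be illustrated as follows. This is a minimal, hypothetical sketch: the paper's exact vote weighting, confidence normalization, and best-segment criterion are not fully specified in these excerpts, so here each pixel's score is the fraction of hypotheses covering it, and the "best" segment is the one most consistent with that vote map (`estimate_shape_guidance` is an illustrative name).

```python
import numpy as np

def estimate_shape_guidance(segment_masks):
    """Vote over cropped segment hypotheses: build a soft per-pixel vote
    map (the shape confidence), then restrict it to the single hypothesis
    that agrees best with the votes, yielding the shape guidance M."""
    stack = np.stack([m.astype(float) for m in segment_masks])
    vote = stack.mean(axis=0)                  # fraction of hypotheses covering each pixel
    # score each hypothesis by the average vote over its own support
    agreements = [vote[m > 0].mean() if (m > 0).any() else 0.0
                  for m in segment_masks]
    best = segment_masks[int(np.argmax(agreements))]
    return vote * (best > 0)                   # vote map restricted to the best segment
```

The returned map is zero outside the selected segment and carries a per-pixel confidence inside it, matching the two roles (guidance and confidence) described in the text.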
57 Vf and Vb are estimated based on the ratio of their overlap with the estimated shape guidance M = {p ∈ R2 | M(p) > 0}, obtained in Section 3. [sent-210, score-0.566]
58 The shape term S(xi = 1) for the super-pixel is simply calculated as the average value of M over the overlapping area with the given super-pixel. [sent-219, score-0.288]
59 Note that this shape term immediately incorporates the spatial difference between the super-pixels and the shape guidance M. [sent-221, score-0.646]
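The shape term defined above reduces to a masked average; a minimal sketch (with the illustrative name `shape_term`) looks like this:

```python
import numpy as np

def shape_term(M, superpixel_mask):
    """Shape term S(x_i = 1) for one super-pixel: the average value of the
    shape-guidance map M over the pixels belonging to that super-pixel."""
    vals = M[superpixel_mask > 0]
    return float(vals.mean()) if vals.size else 0.0
```

A super-pixel lying entirely inside a high-confidence region of M gets a term near 1, while one far from the guidance gets a term near 0, which is the spatial coupling the text refers to.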
60 Merging and Post-processing After obtaining figure-ground segmentations for the bounding boxes, the results are projected back to the image and merged. [sent-240, score-0.389]
61 Most of the experiments were run on the PASCAL VOC 2011, 2012 object segmentation datasets [12] consisting of 20 object classes Figure 4. [sent-250, score-0.467]
62 In our experiments, we generated on average 547 segment hypotheses for each image by following [8]. [sent-256, score-0.269]
63 Note that ground truth bounding box information is also available for these images. [sent-266, score-0.545]
64 We evaluated the quality of the segmentation results provided by the shape guidance M that is merged directly without running graph-cut optimization, referred to as GT-S. [sent-267, score-0.86]
65 (2)) when the shape guidance model is omitted (α = 1). [sent-269, score-0.49]
66 The significant improvement from the GT-GC to GT-S-GC validates the effectiveness of the shape guidance in the proposed segmentation framework. [sent-279, score-0.742]
67 We have assumed that the bounding boxes provided by the object detectors are accurate enough, which is sometimes not the case. [sent-280, score-0.819]
68 Here, we also analyze the effect of the bounding box accuracy. [sent-281, score-0.545]
69 We evaluated the proposed method with different settings (GT-S, GT-GC and GT-S-GC) on various sets of bounding boxes with different accuracies. [sent-282, score-0.614]
70 Since the ground truth is given, we can generate new bounding boxes for each object in the validation dataset by modifying the corner points of the bounding boxes. [sent-333, score-0.437]
72 Thus, we randomly modified the ground truth bounding boxes based on uniform distribution to achieve 5%, 10%, . [sent-334, score-0.614]
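One way to realize such a controlled degradation of the ground-truth boxes is sketched below. The paper's exact protocol for hitting specific accuracy levels is not given in these excerpts, so this is an assumption: each corner is jittered uniformly by up to a fraction `rel` of the box size, and IoU measures the resulting accuracy (`iou` and `perturb_box` are illustrative names).

```python
import random

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def perturb_box(box, rel, rng=random):
    """Uniformly jitter each corner by up to `rel` of the box width/height,
    mimicking the controlled degradation of ground-truth boxes."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    j = lambda s: rng.uniform(-rel, rel) * s
    nx1, nx2 = x1 + j(w), x2 + j(w)
    ny1, ny2 = y1 + j(h), y2 + j(h)
    return (min(nx1, nx2), min(ny1, ny2), max(nx1, nx2), max(ny1, ny2))
```

With `rel = 0.1`, the worst case (both sides shrunk by 10% in each dimension) still leaves an IoU of about 0.64 with the original box, so small `rel` values yield mildly degraded boxes.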
73 4, more accurate bounding boxes lead to better segmentation performance, since they provide not only more accurate localization but also more accurately cropped segments to estimate the shape guidance. [sent-341, score-0.923]
74 Furthermore, the shape guidance term provides an important top-down prior that improves the final results. [sent-342, score-0.896]
75 The PoS functions for different object classes were estimated on the detection validation dataset, which is also available in [12]. [sent-346, score-0.26]
76 The threshold value for the bounding boxes τ is set to 0. [sent-347, score-0.614]
77 For some images, however, the current state-of-the-art object detectors in [10, 28] (referred to as DET1) cannot provide bounding boxes with a higher score than τ, leading to mis-detection of the objects. [sent-378, score-0.798]
78 DET2 predicts some bounding boxes based on segmentation results obtained from [3] only for the images without bounding box prediction from DET1, otherwise, the bounding boxes from DET1 are considered. [sent-384, score-2.025]
79 DET3 directly obtains bounding boxes from the segmentation results of NUS-DET-SPR-GC-SP, which is our method submitted to the VOC 2012 challenge. [sent-385, score-0.866]
80 Note that only the estimated bounding boxes are used in our solution, which contain much less information than the segmentation results, hence the improvement of 0. [sent-388, score-0.866]
81 Although DET2 and DET3 implicitly use ground truth segments, which seems to contradict our claim that no annotated segments are needed, we aim to further validate that better detection leads to better segmentation (see Section 4. [sent-391, score-0.728]
82 However, there are some failure cases, mainly due to mis-detection and inaccurate bounding box prediction (Figure 7). [sent-397, score-0.592]
83 The second one is due to wrong bounding box prediction, since the cloth is labelled as person and the parrot (bird) is mis-detected. [sent-402, score-0.579]
84 The third one is due to inaccurate bounding box prediction (i. [sent-403, score-0.592]
85 GrabCut-50 dataset We also compare the proposed method to the related segmentation frameworks guided by bounding box prior [24, 22, 9]. [sent-408, score-0.886]
86 For this sake, these experiments were run on the GrabCut-50 [22] dataset consisting of 50 images with ground truth bounding boxes. [sent-409, score-0.389]
87 The error rate is computed as the percentage of mislabeled pixels inside the bounding box. [sent-411, score-0.389]
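The GrabCut-50 error measure above is easy to state in code; this sketch (with the illustrative name `grabcut_error_rate`) assumes label maps indexed as `[row, col]` and a box given as `(x1, y1, x2, y2)`.

```python
import numpy as np

def grabcut_error_rate(pred, gt, box):
    """Percentage of mislabeled pixels inside the bounding box,
    as used for evaluation on the GrabCut-50 dataset."""
    x1, y1, x2, y2 = box
    p = pred[y1:y2, x1:x2]
    g = gt[y1:y2, x1:x2]
    return 100.0 * np.mean(p != g)
```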
88 In these experiments, we generated the segment hypotheses for the whole image instead of the object bounding boxes. [sent-412, score-0.746]
89 GrabCutPinpoint uses an iterative solution and relies on the assumption that the bounding box is tight, which is not always true. [sent-421, score-0.545]
90 Comparison with bounding box prior based algorithms on GrabCut-50 dataset. [sent-428, score-0.586]
91 Some segmentation results, overlaid on the images with blue color and white boundary, on the GrabCut-50 dataset [22] obtained by the proposed method. [sent-444, score-0.301]
92 This also concludes that better bounding box prior significantly improves the final segmentation results. [sent-447, score-0.838]
93 Conclusions In this paper, we proposed a detection-based, learning-free approach for semantic segmentation that does not require any annotated segments from the training set. [sent-453, score-0.613]
94 Furthermore, a simple voting scheme based on a generated pool of segment hypotheses is proposed to obtain the shape guidance. [sent-454, score-0.365]
95 Some general observations from the results are that the proposed method performs nearly perfectly in those cases with single object, while for images with multiple objects or interacting objects, the performance depends on the accuracy of the bounding box. [sent-457, score-0.509]
96 Therefore, one of the main limitations of this approach is that the object detector inherently affects the segmentation performance. [sent-458, score-0.39]
97 With better object detectors, such as ones that can handle partial objects and occlusions well, a large improvement in object segmentation performance could be expected. [sent-460, score-0.482]
98 In addition, better ways to obtain the shape guidance and handle multiple interacting segments are also worth exploring to further refine the existing detection-based segmentation methods. [sent-461, score-0.961]
99 Object segmentation by alignment of poselet activations to image contours. [sent-500, score-0.289]
100 Shape prior segmentation of multiple objects with graph cuts. [sent-631, score-0.347]
wordName wordTfidf (topN-words)
[('bounding', 0.389), ('guidance', 0.334), ('segmentation', 0.252), ('voc', 0.237), ('boxes', 0.225), ('box', 0.156), ('shape', 0.156), ('segments', 0.153), ('hypotheses', 0.14), ('pos', 0.137), ('xi', 0.11), ('vf', 0.091), ('xia', 0.091), ('cpmc', 0.091), ('vb', 0.091), ('segment', 0.09), ('object', 0.088), ('detection', 0.085), ('aez', 0.082), ('arbel', 0.08), ('scores', 0.08), ('voting', 0.08), ('semantic', 0.079), ('ladick', 0.079), ('detectors', 0.078), ('carreira', 0.078), ('morphological', 0.075), ('gpb', 0.068), ('masks', 0.067), ('vij', 0.066), ('hole', 0.066), ('interacting', 0.066), ('grabcut', 0.065), ('csaba', 0.064), ('domokos', 0.064), ('uttel', 0.064), ('xemplar', 0.064), ('xj', 0.063), ('iou', 0.062), ('overlapping', 0.06), ('ui', 0.06), ('singapore', 0.058), ('rc', 0.055), ('layered', 0.055), ('objects', 0.054), ('detector', 0.05), ('harmony', 0.05), ('labellings', 0.05), ('cheong', 0.05), ('figureground', 0.05), ('foreground', 0.05), ('overlaid', 0.049), ('frameworks', 0.048), ('validation', 0.048), ('boix', 0.047), ('winner', 0.047), ('inaccurate', 0.047), ('comprehensive', 0.046), ('score', 0.046), ('background', 0.045), ('confidence', 0.045), ('annotated', 0.044), ('obj', 0.044), ('class', 0.042), ('calculated', 0.041), ('prior', 0.041), ('claim', 0.041), ('svr', 0.04), ('pb', 0.04), ('merged', 0.04), ('pascal', 0.04), ('rse', 0.039), ('generated', 0.039), ('classes', 0.039), ('ri', 0.039), ('user', 0.039), ('provided', 0.039), ('russell', 0.038), ('nthde', 0.038), ('seeds', 0.038), ('competing', 0.038), ('filling', 0.037), ('interaction', 0.037), ('kohli', 0.037), ('poselet', 0.037), ('numerous', 0.036), ('called', 0.035), ('poselets', 0.035), ('mask', 0.034), ('wrong', 0.034), ('pooling', 0.034), ('detected', 0.034), ('annotation', 0.033), ('transferred', 0.031), ('dai', 0.031), ('july', 0.031), ('term', 0.031), ('rare', 0.03), ('conduct', 0.03), ('integrate', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000011 379 iccv-2013-Semantic Segmentation without Annotating Segments
Author: Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
Abstract: Numerous existing object segmentation frameworks commonly utilize the object bounding box as a prior. In this paper, we address semantic segmentation assuming that object bounding boxes are provided by object detectors, but no training data with annotated segments are available. Based on a set of segment hypotheses, we introduce a simple voting scheme to estimate shape guidance for each bounding box. The derived shape guidance is used in the subsequent graph-cut-based figure-ground segmentation. The final segmentation result is obtained by merging the segmentation results in the bounding boxes. We conduct an extensive analysis of the effect of object bounding box accuracy. Comprehensive experiments on both the challenging PASCAL VOC object segmentation dataset and GrabCut50 image segmentation dataset show that the proposed approach achieves competitive results compared to previous detection or bounding box prior based methods, as well as other state-of-the-art semantic segmentation methods.
2 0.24972679 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
Author: Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
Abstract: We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the PASCAL VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.
3 0.2334819 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
Author: Shuran Song, Jianxiong Xiao
Abstract: Despite significant progress, tracking is still considered to be a very challenging task. Recently, the increasing popularity of depth sensors has made it possible to obtain reliable depth easily. This may be a game changer for tracking, since depth can be used to prevent model drift and handle occlusion. We also observe that current tracking algorithms are mostly evaluated on a very small number of videos collectedandannotated by different groups. The lack of a reasonable size and consistently constructed benchmark has prevented a persuasive comparison among different algorithms. In this paper, we construct a unified benchmark dataset of 100 RGBD videos with high diversity, propose different kinds of RGBD tracking algorithms using 2D or 3D model, and present a quantitative comparison of various algorithms with RGB or RGBD input. We aim to lay the foundation for further research in both RGB and RGBD tracking, and our benchmark is available at http://tracking.cs.princeton.edu.
4 0.21966587 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
Author: Suyog Dutt Jain, Kristen Grauman
Abstract: The mode of manual annotation used in an interactive segmentation algorithm affects both its accuracy and easeof-use. For example, bounding boxes are fast to supply, yet may be too coarse to get good results on difficult images; freehand outlines are slower to supply and more specific, yet they may be overkill for simple images. Whereas existing methods assume a fixed form of input no matter the image, we propose to predict the tradeoff between accuracy and effort. Our approach learns whether a graph cuts segmentation will succeed if initialized with a given annotation mode, based on the image ’s visual separability and foreground uncertainty. Using these predictions, we optimize the mode of input requested on new images a user wants segmented. Whether given a single image that should be segmented as quickly as possible, or a batch of images that must be segmented within a specified time budget, we show how to select the easiest modality that will be sufficiently strong to yield high quality segmentations. Extensive results with real users and three datasets demonstrate the impact.
5 0.20026901 186 iccv-2013-GrabCut in One Cut
Author: Meng Tang, Lena Gorelick, Olga Veksler, Yuri Boykov
Abstract: Among image segmentation algorithms there are two major groups: (a) methods assuming known appearance models and (b) methods estimating appearance models jointly with segmentation. Typically, the first group optimizes appearance log-likelihoods in combination with some spacial regularization. This problem is relatively simple and many methods guarantee globally optimal results. The second group treats model parameters as additional variables transforming simple segmentation energies into highorder NP-hard functionals (Zhu-Yuille, Chan-Vese, GrabCut, etc). It is known that such methods indirectly minimize the appearance overlap between the segments. We propose a new energy term explicitly measuring L1 distance between the object and background appearance models that can be globally maximized in one graph cut. We show that in many applications our simple term makes NP-hard segmentation functionals unnecessary. Our one cut algorithm effectively replaces approximate iterative optimization techniques based on block coordinate descent.
6 0.19808699 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
7 0.19501379 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
8 0.19000311 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
9 0.18677337 150 iccv-2013-Exemplar Cut
10 0.16656275 104 iccv-2013-Decomposing Bag of Words Histograms
11 0.16503623 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
12 0.15251671 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
13 0.14859478 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
14 0.14800777 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
15 0.14592196 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
16 0.14155956 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
17 0.13474718 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
18 0.13416742 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
19 0.13111763 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
20 0.12464149 198 iccv-2013-Hierarchical Part Matching for Fine-Grained Visual Categorization
topicId topicWeight
[(0, 0.302), (1, -0.017), (2, 0.095), (3, -0.017), (4, 0.165), (5, -0.021), (6, -0.147), (7, 0.142), (8, -0.053), (9, -0.12), (10, 0.072), (11, 0.126), (12, -0.019), (13, -0.093), (14, -0.097), (15, -0.043), (16, 0.012), (17, 0.015), (18, -0.076), (19, -0.116), (20, 0.038), (21, 0.023), (22, -0.079), (23, 0.046), (24, 0.023), (25, 0.116), (26, -0.04), (27, -0.048), (28, -0.018), (29, -0.037), (30, -0.06), (31, -0.014), (32, -0.046), (33, -0.031), (34, -0.087), (35, 0.119), (36, 0.034), (37, 0.064), (38, 0.039), (39, 0.08), (40, 0.042), (41, 0.084), (42, 0.012), (43, -0.001), (44, 0.112), (45, 0.102), (46, 0.12), (47, -0.031), (48, -0.059), (49, -0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.9799099 379 iccv-2013-Semantic Segmentation without Annotating Segments
Author: Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
Abstract: Numerous existing object segmentation frameworks commonly utilize the object bounding box as a prior. In this paper, we address semantic segmentation assuming that object bounding boxes are provided by object detectors, but no training data with annotated segments are available. Based on a set of segment hypotheses, we introduce a simple voting scheme to estimate shape guidance for each bounding box. The derived shape guidance is used in the subsequent graph-cut-based figure-ground segmentation. The final segmentation result is obtained by merging the segmentation results in the bounding boxes. We conduct an extensive analysis of the effect of object bounding box accuracy. Comprehensive experiments on both the challenging PASCAL VOC object segmentation dataset and GrabCut50 image segmentation dataset show that the proposed approach achieves competitive results compared to previous detection or bounding box prior based methods, as well as other state-of-the-art semantic segmentation methods.
2 0.83278322 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
Author: Suyog Dutt Jain, Kristen Grauman
Abstract: The mode of manual annotation used in an interactive segmentation algorithm affects both its accuracy and easeof-use. For example, bounding boxes are fast to supply, yet may be too coarse to get good results on difficult images; freehand outlines are slower to supply and more specific, yet they may be overkill for simple images. Whereas existing methods assume a fixed form of input no matter the image, we propose to predict the tradeoff between accuracy and effort. Our approach learns whether a graph cuts segmentation will succeed if initialized with a given annotation mode, based on the image ’s visual separability and foreground uncertainty. Using these predictions, we optimize the mode of input requested on new images a user wants segmented. Whether given a single image that should be segmented as quickly as possible, or a batch of images that must be segmented within a specified time budget, we show how to select the easiest modality that will be sufficiently strong to yield high quality segmentations. Extensive results with real users and three datasets demonstrate the impact.
3 0.81821316 186 iccv-2013-GrabCut in One Cut
Author: Meng Tang, Lena Gorelick, Olga Veksler, Yuri Boykov
Abstract: Among image segmentation algorithms there are two major groups: (a) methods assuming known appearance models and (b) methods estimating appearance models jointly with segmentation. Typically, the first group optimizes appearance log-likelihoods in combination with some spacial regularization. This problem is relatively simple and many methods guarantee globally optimal results. The second group treats model parameters as additional variables transforming simple segmentation energies into highorder NP-hard functionals (Zhu-Yuille, Chan-Vese, GrabCut, etc). It is known that such methods indirectly minimize the appearance overlap between the segments. We propose a new energy term explicitly measuring L1 distance between the object and background appearance models that can be globally maximized in one graph cut. We show that in many applications our simple term makes NP-hard segmentation functionals unnecessary. Our one cut algorithm effectively replaces approximate iterative optimization techniques based on block coordinate descent.
4 0.78735435 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
Author: Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
Abstract: We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the PASCAL VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.
5 0.78088999 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
Author: Yuning Chai, Victor Lempitsky, Andrew Zisserman
Abstract: We propose a new method for the task of fine-grained visual categorization. The method builds a model of the baselevel category that can be fitted to images, producing highquality foreground segmentation and mid-level part localizations. The model can be learnt from the typical datasets available for fine-grained categorization, where the only annotation provided is a loose bounding box around the instance (e.g. bird) in each image. Both segmentation and part localizations are then used to encode the image content into a highly-discriminative visual signature. The model is symbiotic in that part discovery/localization is helped by segmentation and, conversely, the segmentation is helped by the detection (e.g. part layout). Our model builds on top of the part-based object category detector of Felzenszwalb et al., and also on the powerful GrabCut segmentation algorithm of Rother et al., and adds a simple spatial saliency coupling between them. In our evaluation, the model improves the categorization accuracy over the state-of-the-art. It also improves over what can be achieved with an analogous system that runs segmentation and part-localization independently.
6 0.78070301 150 iccv-2013-Exemplar Cut
7 0.76204026 63 iccv-2013-Bounded Labeling Function for Global Segmentation of Multi-part Objects with Geometric Constraints
8 0.72326034 330 iccv-2013-Proportion Priors for Image Sequence Segmentation
9 0.71451664 349 iccv-2013-Regionlets for Generic Object Detection
10 0.70084274 104 iccv-2013-Decomposing Bag of Words Histograms
11 0.69825298 33 iccv-2013-A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis
12 0.67930478 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?
13 0.66865134 8 iccv-2013-A Deformable Mixture Parsing Model with Parselets
14 0.66671181 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
15 0.66311276 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
16 0.63672078 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
17 0.63621479 169 iccv-2013-Fine-Grained Categorization by Alignments
18 0.63434047 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
19 0.63380921 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
20 0.62790358 74 iccv-2013-Co-segmentation by Composition
topicId topicWeight
[(2, 0.071), (7, 0.021), (9, 0.073), (13, 0.022), (26, 0.121), (31, 0.055), (34, 0.012), (35, 0.021), (40, 0.017), (42, 0.116), (64, 0.097), (73, 0.062), (89, 0.179), (98, 0.016)]
simIndex simValue paperId paperTitle
1 0.93473238 349 iccv-2013-Regionlets for Generic Object Detection
Author: Xiaoyu Wang, Ming Yang, Shenghuo Zhu, Yuanqing Lin
Abstract: Generic object detection is confronted by dealing with different degrees of variations in distinct object classes with tractable computations, which demands descriptive and flexible object representations that are also efficient to evaluate for many locations. In view of this, we propose to model an object class by a cascaded boosting classifier which integrates various types of features from competing local regions, named regionlets. A regionlet is a base feature extraction region defined proportionally to a detection window at an arbitrary resolution (i.e. size and aspect ratio). These regionlets are organized in small groups with stable relative positions to delineate fine-grained spatial layouts inside objects. Their features are aggregated to a one-dimensional feature within one group so as to tolerate deformations. Then we evaluate the object bounding box proposals from selective search using segmentation cues, limiting the evaluation locations to thousands. Our approach significantly outperforms the state-of-the-art on popular multi-class detection benchmark datasets with a single method, without any contexts. It achieves a detection mean average precision of 41.7% on the PASCAL VOC 2007 dataset and 39.7% on the VOC 2010 for 20 object categories. It achieves 14.7% mean average precision on the ImageNet dataset for 200 object categories, outperforming the latest deformable part-based model (DPM) by 4.7%.
same-paper 2 0.9303025 379 iccv-2013-Semantic Segmentation without Annotating Segments
Author: Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
Abstract: Numerous existing object segmentation frameworks commonly utilize the object bounding box as a prior. In this paper, we address semantic segmentation assuming that object bounding boxes are provided by object detectors, but no training data with annotated segments are available. Based on a set of segment hypotheses, we introduce a simple voting scheme to estimate shape guidance for each bounding box. The derived shape guidance is used in the subsequent graph-cut-based figure-ground segmentation. The final segmentation result is obtained by merging the segmentation results in the bounding boxes. We conduct an extensive analysis of the effect of object bounding box accuracy. Comprehensive experiments on both the challenging PASCAL VOC object segmentation dataset and GrabCut50 image segmentation dataset show that the proposed approach achieves competitive results compared to previous detection or bounding box prior based methods, as well as other state-of-the-art semantic segmentation methods.
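The "simple voting scheme" above aggregates a set of segment hypotheses into a shape guidance map for each bounding box. A minimal sketch of pixel-wise voting is given below; the function name, the optional per-hypothesis weights, and the normalization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def shape_guidance(hypotheses, scores=None):
    """Pixel-wise voting over binary segment hypotheses inside one bounding box.

    hypotheses : (K, H, W) array of binary masks cropped to the box.
    scores     : optional (K,) hypothesis weights; uniform if None.
    Returns an (H, W) map in [0, 1] usable as a figure-ground prior.
    """
    hyp = np.asarray(hypotheses, dtype=float)
    if scores is None:
        scores = np.ones(hyp.shape[0])
    scores = np.asarray(scores, dtype=float)
    votes = np.tensordot(scores, hyp, axes=1)  # weighted sum of the masks
    return votes / scores.sum()                # normalize votes to [0, 1]
```

In the pipeline described above, such a map would feed the unary terms of the subsequent graph-cut-based figure-ground segmentation.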
3 0.93014967 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
Author: Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
Abstract: Recently, sparse representation has been introduced for robust object tracking. By representing the object sparsely, i.e., using only a few templates via ℓ1-norm minimization, these so-called ℓ1-trackers exhibit promising tracking results. In this work, we address the object template building and updating problem in these ℓ1-tracking approaches, which has not been fully studied. We propose to perform template updating, in a new perspective, as an online incremental dictionary learning problem, which is efficiently solved through an online optimization procedure. To guarantee the robustness and adaptability of the tracking algorithm, we also propose to build a multi-lifespan dictionary model. By building target dictionaries of different lifespans, effective object observations can be obtained to deal with the well-known drifting problem in tracking and thus improve the tracking accuracy. We derive effective observation models both generatively and discriminatively based on the online multi-lifespan dictionary learning model and deploy them to the Bayesian sequential estimation framework to perform tracking. The proposed approach has been extensively evaluated on ten challenging video sequences. Experimental results demonstrate the effectiveness of the online learned templates, as well as the state-of-the-art tracking performance of the proposed approach.
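The ℓ1-norm minimization mentioned above represents a tracking candidate sparsely over a small template dictionary. A minimal sketch of one standard solver for this objective, iterative shrinkage-thresholding (ISTA), is shown below; the function name, parameters, and toy data are assumptions for illustration, and the paper's own optimization procedure may differ.

```python
import numpy as np

def ista_l1(D, y, lam=0.1, step=None, iters=200):
    """Sparse-code observation y over template dictionary D via ISTA:
    minimize 0.5 * ||y - D x||^2 + lam * ||x||_1.
    """
    if step is None:
        step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = D.T @ (D @ x - y)                # gradient of the quadratic data term
        z = x - step * grad                     # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return x
```

With an orthonormal dictionary, the solver reduces to soft-thresholding the correlations, which makes its behavior easy to check on toy inputs.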
4 0.92971373 414 iccv-2013-Temporally Consistent Superpixels
Author: Matthias Reso, Jörn Jachalsky, Bodo Rosenhahn, Jörn Ostermann
Abstract: Superpixel algorithms represent a very useful and increasingly popular preprocessing step for a wide range of computer vision applications, as they offer the potential to boost efficiency and effectiveness. In this regard, this paper presents a highly competitive approach for temporally consistent superpixels for video content. The approach is based on energy-minimizing clustering utilizing a novel hybrid clustering strategy for a multi-dimensional feature space working in a global color subspace and local spatial subspaces. Moreover, a new contour-evolution-based strategy is introduced to ensure spatial coherency of the generated superpixels. For a thorough evaluation the proposed approach is compared to state-of-the-art supervoxel algorithms using established benchmarks and shows superior performance.
5 0.92774755 150 iccv-2013-Exemplar Cut
Author: Jimei Yang, Yi-Hsuan Tsai, Ming-Hsuan Yang
Abstract: We present a hybrid parametric and nonparametric algorithm, exemplar cut, for generating class-specific object segmentation hypotheses. For the parametric part, we train a pylon model on a hierarchical region tree as the energy function for segmentation. For the nonparametric part, we match the input image with each exemplar by using regions to obtain a score which augments the energy function from the pylon model. Our method thus generates a set of highly plausible segmentation hypotheses by solving a series of exemplar-augmented graph cuts. Experimental results on the Graz and PASCAL datasets show that the proposed algorithm achieves favorable segmentation performance against the state-of-the-art methods in terms of visual quality and accuracy.
6 0.92460835 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
7 0.92241043 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
8 0.92072237 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking
9 0.91957009 338 iccv-2013-Randomized Ensemble Tracking
10 0.91880208 427 iccv-2013-Transfer Feature Learning with Joint Distribution Adaptation
11 0.91843712 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
12 0.91827691 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
13 0.91776317 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
14 0.91747051 330 iccv-2013-Proportion Priors for Image Sequence Segmentation
15 0.91700172 180 iccv-2013-From Where and How to What We See
16 0.91541588 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests
17 0.91257071 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
18 0.91116995 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps
19 0.91028702 86 iccv-2013-Concurrent Action Detection with Structural Prediction
20 0.91011453 270 iccv-2013-Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking