iccv iccv2013 iccv2013-326 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Suyog Dutt Jain, Kristen Grauman
Abstract: The mode of manual annotation used in an interactive segmentation algorithm affects both its accuracy and ease-of-use. For example, bounding boxes are fast to supply, yet may be too coarse to get good results on difficult images; freehand outlines are slower to supply and more specific, yet they may be overkill for simple images. Whereas existing methods assume a fixed form of input no matter the image, we propose to predict the tradeoff between accuracy and effort. Our approach learns whether a graph cuts segmentation will succeed if initialized with a given annotation mode, based on the image's visual separability and foreground uncertainty. Using these predictions, we optimize the mode of input requested on new images a user wants segmented. Whether given a single image that should be segmented as quickly as possible, or a batch of images that must be segmented within a specified time budget, we show how to select the easiest modality that will be sufficiently strong to yield high quality segmentations. Extensive results with real users and three datasets demonstrate the impact.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: The mode of manual annotation used in an interactive segmentation algorithm affects both its accuracy and ease-of-use. [sent-3, score-0.618]
2 Our approach learns whether a graph cuts segmentation will succeed if initialized with a given annotation mode, based on the image ’s visual separability and foreground uncertainty. [sent-6, score-0.962]
3 Using these predictions, we optimize the mode of input requested on new images a user wants segmented. [sent-7, score-0.352]
4 Whether given a single image that should be segmented as quickly as possible, or a batch of images that must be segmented within a specified time budget, we show how to select the easiest modality that will be sufficiently strong to yield high quality segmentations. [sent-8, score-0.534]
5 Visual search systems need foreground segmentation to properly isolate a user’s query. [sent-12, score-0.347]
6 Research on interactive segmentation considers how a human can work in concert with a segmentation algorithm. [sent-17, score-0.462]
7 Figure 1: Interactive segmentation results (shown in red) for three images using various annotation strengths (marked in green). [sent-19, score-0.373]
8 Our method predicts the easiest input modality that will be sufficiently strong to successfully segment a given image. [sent-21, score-0.544]
9 Existing methods assume a fixed form of input (e.g., a bounding box or a scribble), and so they focus on how to use that input most effectively. [sent-32, score-0.339]
10 However, simply fixing the input modality leads to a suboptimal tradeoff in human and machine effort. [sent-33, score-0.369]
11 For example, Figure 1 shows (a) three images, (b) their ground truth foreground, and their interactive segmentation results (shown in red) using either (c) a bounding box or (d) a freehand outline as input (marked in green). [sent-37, score-0.687]
12 The flower (top row) is very distinct from its background and has a compact shape; a bounding box on that image would provide a tight foreground prior, and hence a very accurate segmentation with very quick user input. [sent-38, score-0.856]
13 In contrast, the cross image (middle row) has a plain background but a complex shape, making a bounding box insufficient as a prior; the more elaborate freehand “sloppy contour” is necessary to account for its intricate shape. [sent-39, score-0.416]
14 Meanwhile, the bird (bottom row) looks similar to the background, causing both the bounding box and sloppy contour to fail. [sent-40, score-0.818]
15 In that case, a manually drawn tight polygon may be the best solution. [sent-41, score-0.331]
16 , with scribbles); this is especially true for visual search on a mobile device, where a user has a query image in hand and would like to quickly identify the foreground and ship it to a server. [sent-47, score-0.404]
17 We propose to learn the image properties that indicate how successful a given form of user input will be, once handed to an interactive segmentation algorithm. [sent-48, score-0.528]
18 First, we develop features capturing the degree of separability between foreground and background regions, as well as the uncertainty of a graph cuts-based optimal label assignment. [sent-50, score-0.564]
19 Having predicted the relative success of each modality, we can explicitly reason about the tradeoff in user effort and segmentation quality. [sent-54, score-0.467]
20 In the first, we take a single image as input, and ask the human user to provide the easiest (fastest) form of input that the system expects to be sufficiently strong to do the job. [sent-56, score-0.386]
21 In the second, we take a batch of images as input together with a budget of time that the user is willing to spend guiding the system. [sent-57, score-0.669]
22 Overall, the results clearly establish the value in reasoning about sufficient annotation strength in interactive segmentation. [sent-63, score-0.425]
23 Related Work Early interactive segmentation methods include active contours [8] and intelligent scissors [16], where a user draws loose contours that the system snaps to a nearby object. [sent-65, score-0.805]
24 Alternatively, a user can indicate some foreground pixels—often with a bounding box or mouse scribble— and then use graph cuts to optimize pixel label assignments based on a foreground likelihood and local smoothness prior [2, 19]. [sent-66, score-1.113]
25 We show how to tailor the user’s input modality to achieve best graph cut segmentation results with minimal effort. [sent-69, score-0.564]
26 In interactive co-segmentation, the system guides a user to scribble on certain areas of certain images to reduce foreground uncertainty [1, 28]. [sent-76, score-0.727]
27 However, whereas prior work predicts which images should be annotated (and possibly where) to minimize uncertainty, we predict what strength of annotation will be sufficient for interactive segmentation to succeed. [sent-78, score-0.702]
28 Approach: First we define the annotation modes and interactive segmentation model our method targets (Sec. [sent-86, score-0.561]
29 Then, we define features indicative of image difficulty and learn how they relate to segmentation quality for each annotation mode (Sec. [sent-89, score-0.56]
30 Given a novel image, we forecast the relative success of each modality (Sec. [sent-92, score-0.34]
31 Finally, we propose a more involved optimization strategy for the case where a batch of images must be segmented in a given time budget (Sec. [sent-96, score-0.413]
32 Interactive segmentation model: In interactive segmentation, the user indicates the foreground with some mode of input. [sent-101, score-0.785]
33 Our goal is to predict the input modality that will be sufficiently strong to yield an accurate segmentation. [sent-102, score-0.447]
34 Our approach chooses from three annotation modalities, as depicted in Figure 2: (1) Bounding box: The annotator provides a tight bounding box around the foreground objects. [sent-103, score-1.045]
35 (3) Tight polygon: The annotator draws a tight polygon along the foreground boundaries. [sent-110, score-0.682]
36 We equate a tight polygon with perfect segmentation accuracy. [sent-111, score-0.467]
37 Our method extends naturally to handle other modalities where a user specifies foreground pixels (e. [sent-114, score-0.465]
38 No matter the annotation mode, we use the pixels inside and outside the user-marked boundary to initialize the foreground and background models, respectively. [sent-117, score-0.513]
39 Then we apply standard graph-cut based interactive segmentation [2, 19] with the mixture models as likelihood functions. [sent-119, score-0.991]
[Figure 2: Possible modes of annotation: (a) Bounding box, (b) Sloppy contour, (c) Tight polygon.]
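To make the segmentation engine concrete, the following is a minimal sketch of the step just described: foreground/background color models are initialized from the pixels inside and outside a user-marked region, and a graph cut is solved. It uses OpenCV's GrabCut implementation as a stand-in for the solver of [2, 19]; the function name and mask convention are ours, not the paper's.

```python
# Illustrative sketch (not the authors' exact solver): graph-cuts segmentation
# with color mixture models initialized from a user-marked region, using
# OpenCV's GrabCut as the engine.
import numpy as np
import cv2

def segment_from_annotation(image_bgr, init_mask):
    """image_bgr: HxWx3 uint8 image; init_mask: HxW bool, True inside the
    user-marked region (bounding box interior or dilated sloppy contour)."""
    # Pixels inside the annotation start as "probably foreground", the rest as
    # "probably background"; GrabCut fits GMMs to each and solves a graph cut.
    gc_mask = np.where(init_mask, cv2.GC_PR_FGD, cv2.GC_PR_BGD).astype(np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, gc_mask, None, bgd_model, fgd_model,
                5, cv2.GC_INIT_WITH_MASK)   # 5 refinement iterations
    # Foreground = definite or probable foreground labels after optimization.
    return np.isin(gc_mask, (cv2.GC_FGD, cv2.GC_PR_FGD))
```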
40 Learning segmentation difficulty per modality: Having defined the annotation choices and the basic engine for segmentations, we can now explain our algorithm’s training phase. [sent-132, score-0.816]
41 The main idea is to train a discriminative classifier that takes an image as input, and predicts whether a given annotation modality will be successful once passed to the interactive graph cuts solver above. [sent-133, score-0.975]
42 In other words, one classifier will decide if an image looks “easy” or “difficult” to segment with a bounding box, another classifier will decide if it looks “easy” or “difficult” with a sloppy contour. [sent-134, score-0.655]
43 For the bounding box case, we simply generate the bounding box that tightly fits the true foreground area. [sent-137, score-0.811]
44 For the sloppy contour case, we dilate the true mask by 20 pixels to simulate a coarse human-drawn boundary. [sent-138, score-0.486]
45 1) for each one in turn, we obtain two estimated foreground masks per training image: fgbox and fgcon. [sent-140, score-0.344]
46 Let O¯box and O¯con denote the median overlap scores obtained with each simulated modality on the training images. (Footnote 1: In a user study, we find these simulated masks are a good proxy; they overlap actual hand-drawn contours by 84%.) [sent-143, score-0.358]
47 The ground truth label on an image is positive (“easy”, “successful”) for an annotation modality x if O > O¯x. [sent-145, score-0.566]
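A hedged sketch of this training phase follows. It simulates the two annotations from ground truth (a tight box, and the true mask dilated to mimic a sloppy contour), scores the resulting graph-cut masks by overlap, labels an image "easy" for a modality when its overlap exceeds that modality's median, and fits one probabilistic classifier per modality. The helpers extract_features and run_graph_cuts are placeholders for the paper's feature set and segmentation engine, and the classifier choice is an assumption.

```python
# Hedged sketch of the training phase; feature extraction and the segmentation
# engine are stand-ins for the components described in the text.
import numpy as np
from scipy.ndimage import binary_dilation
from sklearn.svm import SVC

def overlap(pred, gt):
    """Intersection-over-union between two boolean masks."""
    return np.logical_and(pred, gt).sum() / float(np.logical_or(pred, gt).sum())

def simulate_annotation(gt_mask, modality):
    if modality == "box":                                # tight box around the true foreground
        ys, xs = np.nonzero(gt_mask)
        box = np.zeros_like(gt_mask)
        box[ys.min():ys.max() + 1, xs.min():xs.max() + 1] = True
        return box
    return binary_dilation(gt_mask, iterations=20)       # coarse "sloppy" contour

def train_difficulty_classifiers(images, gt_masks, extract_features, run_graph_cuts):
    classifiers = {}
    for modality in ("box", "contour"):
        feats, scores = [], []
        for img, gt in zip(images, gt_masks):
            init = simulate_annotation(gt, modality)
            pred = run_graph_cuts(img, init)              # e.g. the sketch above
            feats.append(extract_features(img, init))     # separability/uncertainty cues
            scores.append(overlap(pred, gt))
        labels = np.array(scores) > np.median(scores)     # "easy" if above the median overlap
        clf = SVC(kernel="rbf", probability=True)
        classifiers[modality] = clf.fit(np.vstack(feats), labels)
    return classifiers
```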
48 Graph cut segmentation performance is directly related to the degree of separation between the foreground and background regions. [sent-148, score-0.412]
49 Furthermore, the notion of separability is tied to the form of user input. [sent-150, score-0.342]
50 For example, a bounding box input can fail even for an object that is very distinct from its background if it contains many background pixels. [sent-151, score-0.409]
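As an illustration of the idea (not the paper's exact descriptor), one simple separability cue compares color histograms of the pixels inside and outside the hypothesized annotation region; when the two distributions are far apart, graph cuts initialized from that annotation is more likely to succeed, and the score naturally depends on which modality produced the region.

```python
# Minimal sketch of one fg-bg separability cue tied to the annotation region.
# The paper uses a richer feature set; this only illustrates the idea.
import numpy as np

def color_hist(pixels, bins=8):
    """Joint RGB histogram of an Nx3 uint8 pixel array, L1-normalized."""
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist.ravel() / max(hist.sum(), 1.0)

def separability_score(image, init_mask):
    fg = color_hist(image[init_mask].reshape(-1, 3))
    bg = color_hist(image[~init_mask].reshape(-1, 3))
    # Chi-squared distance: large values suggest the annotation separates the
    # regions well; small values suggest this modality may not be enough.
    return 0.5 * np.sum((fg - bg) ** 2 / (fg + bg + 1e-10))
```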
51 Predicting difficulty on novel images: Given a novel image, we predict which of the annotation modes will be successful. [sent-179, score-0.403]
52 Finally, we automatically generate a bounding box and sloppy contour (by dilation), and run graph cuts to get the estimated masks for either modality. [sent-188, score-1.017]
53 While often an image has a primary foreground object of interest, our method (like any graph cuts formulation) can accommodate foregrounds consisting of multiple disconnected regions. [sent-190, score-0.41]
54 The foreground estimate in a test image need only give a rough placement of where the user might put the bounding box or sloppy contour. [sent-191, score-1.059]
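A sketch of how such candidate initializations could be generated automatically is below. It assumes a saliency map from any off-the-shelf method, thresholds it to get a rough foreground estimate (Otsu's threshold is a common alternative to the simple mean used here), and derives the box and dilated-contour initializations from that estimate. The specifics are ours, not the paper's.

```python
# Sketch of generating candidate initializations on a novel image from a
# rough, automatically estimated foreground. `saliency_map` is assumed to be
# a float HxW array from any off-the-shelf saliency method and to contain
# at least some above-average pixels.
import numpy as np
from scipy.ndimage import binary_dilation

def candidate_initializations(saliency_map, dilate_iters=20):
    # Simple global threshold as the rough foreground estimate; only an
    # approximate placement is needed, not an accurate segmentation.
    rough_fg = saliency_map > saliency_map.mean()
    ys, xs = np.nonzero(rough_fg)
    box_init = np.zeros_like(rough_fg)
    box_init[ys.min():ys.max() + 1, xs.min():xs.max() + 1] = True
    contour_init = binary_dilation(rough_fg, iterations=dilate_iters)
    return {"box": box_init, "contour": contour_init}
```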
55 Always requesting tight polygons is sure to yield accurate results, but will waste human effort when the image content is “easy”. [sent-197, score-0.437]
56 Similarly, always requesting a bounding box is sure to be fast, but will produce lousy results when the image is too “hard”. [sent-198, score-0.331]
57 That is, we show the annotator a bounding box tool if the bounding box classifier predicts “easy”. [sent-200, score-0.86]
58 If not, we show the sloppy contour tool if its classifier predicts “easy”; otherwise we fall back to the tight polygon tool. [sent-201, score-0.641]
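The cascade just described fits in a few lines; here `classifiers` holds the per-modality success classifiers from training and `features` is the image's difficulty feature vector. This is only a sketch of the decision rule, and the 0.5 threshold is an assumption.

```python
# Sketch of the cascaded modality selection: request the cheapest input that
# is predicted to be sufficiently strong for this image.
def select_modality(classifiers, features, threshold=0.5):
    # Column 1 is the "easy" class, assuming classes are ordered [False, True].
    if classifiers["box"].predict_proba([features])[0, 1] >= threshold:
        return "bounding box"
    if classifiers["contour"].predict_proba([features])[0, 1] >= threshold:
        return "sloppy contour"
    return "tight polygon"   # fall back to the strongest (slowest) input
```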
59 Annotation choices under budget constraints: In an alternative usage scenario, our system accepts a batch of images and a budget of annotation time as input. [sent-205, score-0.993]
60 Our objective is to select the optimal annotation tool for each image that will maximize total predicted accuracy, subject to the constraint that annotation cost must not exceed the budget. [sent-206, score-0.52]
61 For a high budget, a good choice may be tight polygons on all of the hardest images, and sloppy contours on the rest. [sent-211, score-0.694]
62 Let p_k^b and p_k^c denote the probability of successful interactive segmentation for image k with a bounding box or sloppy contour, as predicted by our model. [sent-216, score-0.951]
63 That is, c_k^b = 7 means it will take 7 sec to draw a bounding box on image k. [sent-228, score-0.38]
64 The objective says we want to choose the modality per image that will maximize the predicted accuracy. [sent-239, score-0.367]
65 The first constraint enforces the budget, the second ensures we choose only one modality per image, and the third restricts the indicator entries to be binary. [sent-240, score-0.329]
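Written out, the selection problem described in the two sentences above is a small integer program. The following is a plausible reconstruction from the surrounding description (the notation is ours); per the text, a tight polygon is treated as always succeeding:

```latex
\max_{x}\; \sum_{k} \left( p_k^b\, x_k^b + p_k^c\, x_k^c + 1 \cdot x_k^p \right)
\quad \text{s.t.} \quad
\sum_{k} \left( c_k^b\, x_k^b + c_k^c\, x_k^c + c_k^p\, x_k^p \right) \le B, \qquad
x_k^b + x_k^c + x_k^p = 1 \;\; \forall k, \qquad
x_k^b, x_k^c, x_k^p \in \{0, 1\}.
```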
66 While our approach supports image-specific annotation costs ck, we find the biggest factor in cost is which annotation type is used. [sent-243, score-0.474]
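With only three choices per image and costs measured in whole seconds, this multiple-choice knapsack can be solved exactly by dynamic programming. Below is a minimal sketch under those assumptions (fixed per-modality costs, polygon success probability of 1); it is one possible solver, not necessarily the one used in the paper.

```python
# Minimal DP sketch for the budgeted selection: a multiple-choice knapsack over
# images, one modality per image. Costs are whole seconds; a tight polygon is
# assumed to always succeed. Assumes budget >= n * min(costs) so that every
# image can be assigned some modality.
def budgeted_selection(probs, budget, costs=(7, 20, 54)):
    """probs: list of (p_box, p_contour) per image; returns a modality index
    (0 = box, 1 = contour, 2 = tight polygon) for each image."""
    n = len(probs)
    NEG = float("-inf")
    # value[k][b] = best total predicted success using images 0..k-1 at cost b
    value = [[NEG] * (budget + 1) for _ in range(n + 1)]
    choice = [[None] * (budget + 1) for _ in range(n + 1)]
    value[0][0] = 0.0
    for k, (pb, pc) in enumerate(probs):
        gains = (pb, pc, 1.0)                      # box, contour, tight polygon
        for b in range(budget + 1):
            if value[k][b] == NEG:
                continue
            for m, (gain, cost) in enumerate(zip(gains, costs)):
                nb = b + cost
                if nb <= budget and value[k][b] + gain > value[k + 1][nb]:
                    value[k + 1][nb] = value[k][b] + gain
                    choice[k + 1][nb] = (b, m)
    # Trace back from the best reachable total cost.
    best_b = max(range(budget + 1), key=lambda b: value[n][b])
    plan, b = [], best_b
    for k in range(n, 0, -1):
        prev_b, m = choice[k][b]
        plan.append(m)
        b = prev_b
    return list(reversed(plan))
```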
67 - Global Features: We train two SVMs (one for bounding box, one for contours) to predict if an image is easy based on a 12-bin color histogram, color variance, and the separability score from [17]. [sent-258, score-0.481]
68 - Random: randomly assigns a confidence value to each modality in the budgeted annotation results. [sent-293, score-0.579]
69 Predicting difficulty per modality: First we see how well all methods predict the success of each annotation modality. [sent-299, score-0.744]
70 On IIS, we are again better for bounding boxes, but Global Features is competitive on sloppy contours. [sent-309, score-0.495]
71 Whereas the Global Features and Effort Prediction [22] methods learn from the holistic image content, our method specifically learns how fg-bg separability influences graph cuts segmentation. [sent-315, score-0.318]
72 For the leftmost block of images, our method predicts a bounding box or contour would be sufficient. [sent-319, score-0.537]
73 These images usually have uniform backgrounds, and distinct, compact foreground regions, which are easy to tightly capture with a box (e. [sent-320, score-0.439]
74 For the center block, our method predicts a bounding box would fail, but a sloppy contour would be sufficient. [sent-323, score-0.863]
75 These images usually have objects with complex shapes, for which even a tight box can overlap many background pixels (e. [sent-324, score-0.429]
76 For the rightmost block, our method predicts that neither a box nor a contour is sufficient. [sent-327, score-0.368]
77 For example, the skaters in the left block are close together and seem easy to annotate with a box, while the skaters in the right block are far apart and tight polygons are needed to extract their limbs. [sent-334, score-0.533]
78 Annotation choices to meet a budget: Next we evaluate our idea for optimizing requests to meet a budget. [sent-342, score-0.479]
79 We apply our method and the baselines to estimate the probability that each modality will succeed on each image. [sent-343, score-0.36]
80 For the cost of each modality in c, we use the average time required by the 101 users in our user study: 7 sec for bounding box, 20 sec for sloppy contour, 54 sec for tight polygon. [sent-347, score-1.353]
81 If the solution says to get a box or contour on an image, we apply graph cuts with the selected modality (Sec. [sent-348, score-0.798]
82 The budget values range from the minimum possible (bounding boxes for all images) to the maximum possible (tight polygons for all images). [sent-354, score-0.438]
83 Our method consistently selects the modalities that best use annotation resources: at almost every budget point, we achieve the highest accuracy. [sent-355, score-0.615]
84 25 hours more annotation effort than we do to obtain 90% average overlap. [sent-358, score-0.335]
85 We find as the budget increases, the bounding box requests decrease. [sent-360, score-0.677]
86 The number of sloppy contour requests increases at first, then starts decreasing after a certain budget, making way for more images to be annotated with a tight polygon. [sent-361, score-0.738]
87 Rather than ask an annotator to give tight polygons on each training image—the default choice for strongly supervised recognition systems—we apply our cascaded modality selection. [sent-376, score-0.683]
88 Our approach substantially reduces the total annotation time required, yet preserves accuracy by requesting the modality predicted to be sufficiently strong. [sent-381, score-0.581]
89 We present users with the necessary tools to do each modality (see Supp. [sent-385, score-0.359]
90 We collect responses from 5 users for each annotation mode per image, then record the median time spent. [sent-388, score-0.476]
91 We see the most variance among the sloppy contour inputs, since some users are more “sloppy” than others. [sent-391, score-0.545]
92 Still, as expected, sloppy contours typically only improve interactive segmentation results (85. [sent-392, score-0.712]
93 Figure 6 (left) shows the budgeted annotation results with real user data. [sent-395, score-0.472]
94 The plot is like Figure 5, only here 1) we feed the real users’ boxes/contours to the graph cuts engine, rather than simulate it from ground truth masks, and 2) we incur the users’ per-image annotation times at test time (on x-axis). [sent-396, score-0.406]
95 This result confirms that even though the ultimate annotation time may vary not only per modality, but also per image, using a fixed cost per modality during prediction is sufficient to get good savings. [sent-398, score-0.624]
96 Overall, this large-scale user study is promising evidence that by reasoning about the expected success of different annotation modalities, we can use valuable annotator effort much more efficiently. [sent-399, score-0.673]
98 Figure 6: Left: Annotation choices under a budget with real user data. [sent-416, score-0.55]
99 Right: Example user annotations for bounding box (top), sloppy contour (middle), and tight polygon (bottom). [sent-417, score-1.31]
wordName wordTfidf (topN-words)
[('sloppy', 0.355), ('budget', 0.317), ('modality', 0.3), ('annotation', 0.237), ('foreground', 0.211), ('user', 0.193), ('tight', 0.192), ('interactive', 0.16), ('box', 0.16), ('separability', 0.149), ('ifg', 0.148), ('bounding', 0.14), ('polygon', 0.139), ('segmentation', 0.136), ('contour', 0.131), ('cuts', 0.11), ('ibg', 0.105), ('annotator', 0.105), ('effort', 0.098), ('polygons', 0.086), ('mode', 0.085), ('uncertainty', 0.081), ('msrc', 0.081), ('predicts', 0.077), ('difficulty', 0.074), ('icoseg', 0.072), ('easy', 0.068), ('record', 0.066), ('predict', 0.064), ('otsu', 0.063), ('yp', 0.063), ('masks', 0.062), ('contours', 0.061), ('modalities', 0.061), ('succeed', 0.06), ('requests', 0.06), ('graph', 0.059), ('users', 0.059), ('xkb', 0.056), ('active', 0.056), ('scribble', 0.054), ('batch', 0.054), ('freehand', 0.052), ('easiest', 0.052), ('tool', 0.046), ('annotate', 0.045), ('sufficiently', 0.044), ('iis', 0.044), ('superpixel', 0.042), ('budgeted', 0.042), ('ckb', 0.042), ('fgbox', 0.042), ('scissors', 0.042), ('skaters', 0.042), ('suyog', 0.042), ('xkc', 0.042), ('xkp', 0.042), ('segmented', 0.042), ('overlap', 0.042), ('choices', 0.04), ('success', 0.04), ('input', 0.039), ('says', 0.038), ('sec', 0.038), ('willing', 0.037), ('ppk', 0.037), ('researcher', 0.037), ('vijayanarasimhan', 0.036), ('boxes', 0.035), ('background', 0.035), ('requested', 0.035), ('yq', 0.035), ('draws', 0.035), ('fastest', 0.033), ('predicting', 0.033), ('salient', 0.033), ('scribbles', 0.033), ('snaps', 0.033), ('looks', 0.032), ('classifier', 0.032), ('rother', 0.032), ('segment', 0.032), ('saliency', 0.032), ('requesting', 0.031), ('meet', 0.031), ('cut', 0.03), ('human', 0.03), ('foregrounds', 0.03), ('color', 0.03), ('boundary', 0.03), ('label', 0.029), ('block', 0.029), ('per', 0.029), ('spend', 0.029), ('intricate', 0.029), ('modes', 0.028), ('predictions', 0.028), ('indicative', 0.028), ('system', 0.028), ('strength', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999928 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
Author: Suyog Dutt Jain, Kristen Grauman
Abstract: The mode of manual annotation used in an interactive segmentation algorithm affects both its accuracy and ease-of-use. For example, bounding boxes are fast to supply, yet may be too coarse to get good results on difficult images; freehand outlines are slower to supply and more specific, yet they may be overkill for simple images. Whereas existing methods assume a fixed form of input no matter the image, we propose to predict the tradeoff between accuracy and effort. Our approach learns whether a graph cuts segmentation will succeed if initialized with a given annotation mode, based on the image's visual separability and foreground uncertainty. Using these predictions, we optimize the mode of input requested on new images a user wants segmented. Whether given a single image that should be segmented as quickly as possible, or a batch of images that must be segmented within a specified time budget, we show how to select the easiest modality that will be sufficiently strong to yield high quality segmentations. Extensive results with real users and three datasets demonstrate the impact.
2 0.21966587 379 iccv-2013-Semantic Segmentation without Annotating Segments
Author: Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
Abstract: Numerous existing object segmentation frameworks commonly utilize the object bounding box as a prior. In this paper, we address semantic segmentation assuming that object bounding boxes are provided by object detectors, but no training data with annotated segments are available. Based on a set of segment hypotheses, we introduce a simple voting scheme to estimate shape guidance for each bounding box. The derived shape guidance is used in the subsequent graph-cut-based figure-ground segmentation. The final segmentation result is obtained by merging the segmentation results in the bounding boxes. We conduct an extensive analysis of the effect of object bounding box accuracy. Comprehensive experiments on both the challenging PASCAL VOC object segmentation dataset and GrabCut50 image segmentation dataset show that the proposed approach achieves competitive results compared to previous detection or bounding box prior based methods, as well as other state-of-the-art semantic segmentation methods.
3 0.15468848 186 iccv-2013-GrabCut in One Cut
Author: Meng Tang, Lena Gorelick, Olga Veksler, Yuri Boykov
Abstract: Among image segmentation algorithms there are two major groups: (a) methods assuming known appearance models and (b) methods estimating appearance models jointly with segmentation. Typically, the first group optimizes appearance log-likelihoods in combination with some spacial regularization. This problem is relatively simple and many methods guarantee globally optimal results. The second group treats model parameters as additional variables transforming simple segmentation energies into highorder NP-hard functionals (Zhu-Yuille, Chan-Vese, GrabCut, etc). It is known that such methods indirectly minimize the appearance overlap between the segments. We propose a new energy term explicitly measuring L1 distance between the object and background appearance models that can be globally maximized in one graph cut. We show that in many applications our simple term makes NP-hard segmentation functionals unnecessary. Our one cut algorithm effectively replaces approximate iterative optimization techniques based on block coordinate descent.
4 0.15457143 194 iccv-2013-Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model
Author: Xiao Cai, Feiping Nie, Weidong Cai, Heng Huang
Abstract: Automatic image categorization has become increasingly important with the development of Internet and the growth in the size of image databases. Although the image categorization can be formulated as a typical multiclass classification problem, two major challenges have been raised by the real-world images. On one hand, though using more labeled training data may improve the prediction performance, obtaining the image labels is a time consuming as well as biased process. On the other hand, more and more visual descriptors have been proposed to describe objects and scenes appearing in images and different features describe different aspects of the visual characteristics. Therefore, how to integrate heterogeneous visual features to do the semi-supervised learning is crucial for categorizing large-scale image data. In this paper, we propose a novel approach to integrate heterogeneous features by performing multi-modal semi-supervised classification on unlabeled as well as unsegmented images. Considering each type of feature as one modality, taking advantage of the large amoun- t of unlabeled data information, our new adaptive multimodal semi-supervised classification (AMMSS) algorithm learns a commonly shared class indicator matrix and the weights for different modalities (image features) simultaneously.
5 0.15340793 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels
Author: Jianxiong Xiao, Andrew Owens, Antonio Torralba
Abstract: Existing scene understanding datasets contain only a limited set of views of a place, and they lack representations of complete 3D spaces. In this paper, we introduce SUN3D, a large-scale RGB-D video database with camera pose and object labels, capturing the full 3D extent of many places. The tasks that go into constructing such a dataset are difficult in isolation hand-labeling videos is painstaking, and structure from motion (SfM) is unreliable for large spaces. But if we combine them together, we make the dataset construction task much easier. First, we introduce an intuitive labeling tool that uses a partial reconstruction to propagate labels from one frame to another. Then we use the object labels to fix errors in the reconstruction. For this, we introduce a generalization of bundle adjustment that incorporates object-to-object correspondences. This algorithm works by constraining points for the same object from different frames to lie inside a fixed-size bounding box, parameterized by its rotation and translation. The SUN3D database, the source code for the generalized bundle adjustment, and the web-based 3D annotation tool are all avail– able at http://sun3d.cs.princeton.edu.
6 0.1516905 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
7 0.1449655 282 iccv-2013-Multi-view Object Segmentation in Space and Time
8 0.13918176 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
9 0.13326211 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
10 0.13240866 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
11 0.13143578 213 iccv-2013-Implied Feedback: Learning Nuances of User Behavior in Image Search
12 0.12549019 54 iccv-2013-Attribute Pivots for Guiding Relevance Feedback in Image Search
13 0.12316068 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos
14 0.11862689 71 iccv-2013-Category-Independent Object-Level Saliency Detection
15 0.11627318 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
16 0.105441 52 iccv-2013-Attribute Adaptation for Personalized Image Search
17 0.10284156 74 iccv-2013-Co-segmentation by Composition
18 0.10244347 208 iccv-2013-Image Co-segmentation via Consistent Functional Maps
19 0.10126581 6 iccv-2013-A Convex Optimization Framework for Active Learning
20 0.10097344 448 iccv-2013-Weakly Supervised Learning of Image Partitioning Using Decision Trees with Structured Split Criteria
topicId topicWeight
[(0, 0.22), (1, -0.012), (2, 0.09), (3, -0.055), (4, 0.113), (5, 0.014), (6, -0.119), (7, 0.066), (8, 0.037), (9, -0.048), (10, 0.038), (11, 0.109), (12, -0.009), (13, -0.049), (14, -0.048), (15, -0.029), (16, -0.02), (17, -0.053), (18, -0.119), (19, -0.074), (20, 0.035), (21, -0.045), (22, -0.164), (23, 0.029), (24, 0.015), (25, 0.137), (26, 0.02), (27, 0.045), (28, 0.068), (29, -0.071), (30, -0.045), (31, 0.025), (32, 0.044), (33, 0.002), (34, -0.076), (35, 0.122), (36, -0.011), (37, 0.113), (38, 0.003), (39, 0.097), (40, 0.003), (41, -0.005), (42, -0.007), (43, -0.007), (44, 0.065), (45, -0.065), (46, 0.038), (47, -0.042), (48, -0.049), (49, 0.087)]
simIndex simValue paperId paperTitle
same-paper 1 0.96735173 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
Author: Suyog Dutt Jain, Kristen Grauman
Abstract: The mode of manual annotation used in an interactive segmentation algorithm affects both its accuracy and ease-of-use. For example, bounding boxes are fast to supply, yet may be too coarse to get good results on difficult images; freehand outlines are slower to supply and more specific, yet they may be overkill for simple images. Whereas existing methods assume a fixed form of input no matter the image, we propose to predict the tradeoff between accuracy and effort. Our approach learns whether a graph cuts segmentation will succeed if initialized with a given annotation mode, based on the image's visual separability and foreground uncertainty. Using these predictions, we optimize the mode of input requested on new images a user wants segmented. Whether given a single image that should be segmented as quickly as possible, or a batch of images that must be segmented within a specified time budget, we show how to select the easiest modality that will be sufficiently strong to yield high quality segmentations. Extensive results with real users and three datasets demonstrate the impact.
2 0.76831609 379 iccv-2013-Semantic Segmentation without Annotating Segments
Author: Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
Abstract: Numerous existing object segmentation frameworks commonly utilize the object bounding box as a prior. In this paper, we address semantic segmentation assuming that object bounding boxes are provided by object detectors, but no training data with annotated segments are available. Based on a set of segment hypotheses, we introduce a simple voting scheme to estimate shape guidance for each bounding box. The derived shape guidance is used in the subsequent graph-cut-based figure-ground segmentation. The final segmentation result is obtained by merging the segmentation results in the bounding boxes. We conduct an extensive analysis of the effect of object bounding box accuracy. Comprehensive experiments on both the challenging PASCAL VOC object segmentation dataset and GrabCut50 image segmentation dataset show that the proposed approach achieves competitive results compared to previous detection or bounding box prior based methods, as well as other state-of-the-art semantic segmentation methods.
3 0.73288167 186 iccv-2013-GrabCut in One Cut
Author: Meng Tang, Lena Gorelick, Olga Veksler, Yuri Boykov
Abstract: Among image segmentation algorithms there are two major groups: (a) methods assuming known appearance models and (b) methods estimating appearance models jointly with segmentation. Typically, the first group optimizes appearance log-likelihoods in combination with some spacial regularization. This problem is relatively simple and many methods guarantee globally optimal results. The second group treats model parameters as additional variables transforming simple segmentation energies into highorder NP-hard functionals (Zhu-Yuille, Chan-Vese, GrabCut, etc). It is known that such methods indirectly minimize the appearance overlap between the segments. We propose a new energy term explicitly measuring L1 distance between the object and background appearance models that can be globally maximized in one graph cut. We show that in many applications our simple term makes NP-hard segmentation functionals unnecessary. Our one cut algorithm effectively replaces approximate iterative optimization techniques based on block coordinate descent.
4 0.68347758 63 iccv-2013-Bounded Labeling Function for Global Segmentation of Multi-part Objects with Geometric Constraints
Author: Masoud S. Nosrati, Shawn Andrews, Ghassan Hamarneh
Abstract: The inclusion of shape and appearance priors has proven useful for obtaining more accurate and plausible segmentations, especially for complex objects with multiple parts. In this paper, we augment the popular Mumford-Shah model to incorporate two important geometrical constraints, termed containment and detachment, between different regions with a specified minimum distance between their boundaries. Our method is able to handle multiple instances of multi-part objects defined by these geometrical constraints using a single labeling function while maintaining global optimality. We demonstrate the utility and advantages of these two constraints and show that the proposed convex continuous method is superior to other state-of-the-art methods, including its discrete counterpart, in terms of memory usage, and metrication errors. [Figure 1: The inside vs. outside ambiguity in (a) is resolved by our containment constraint in (b).]
5 0.68336409 150 iccv-2013-Exemplar Cut
Author: Jimei Yang, Yi-Hsuan Tsai, Ming-Hsuan Yang
Abstract: We present a hybrid parametric and nonparametric algorithm, exemplar cut, for generating class-specific object segmentation hypotheses. For the parametric part, we train a pylon model on a hierarchical region tree as the energy function for segmentation. For the nonparametric part, we match the input image with each exemplar by using regions to obtain a score which augments the energy function from the pylon model. Our method thus generates a set of highly plausible segmentation hypotheses by solving a series of exemplar augmented graph cuts. Experimental results on the Graz and PASCAL datasets show that the proposed algorithm achieves favorable segmentation performance against the state-of-the-art methods in terms of visual quality and accuracy.
6 0.64919388 282 iccv-2013-Multi-view Object Segmentation in Space and Time
7 0.648018 208 iccv-2013-Image Co-segmentation via Consistent Functional Maps
8 0.6399399 330 iccv-2013-Proportion Priors for Image Sequence Segmentation
9 0.63504374 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
10 0.61434573 33 iccv-2013-A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis
11 0.58110148 74 iccv-2013-Co-segmentation by Composition
12 0.57350111 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels
13 0.57239866 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
14 0.56155074 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
15 0.55461466 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation
16 0.54840809 383 iccv-2013-Semi-supervised Learning for Large Scale Image Cosegmentation
17 0.53350997 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
18 0.52706283 104 iccv-2013-Decomposing Bag of Words Histograms
19 0.51023716 213 iccv-2013-Implied Feedback: Learning Nuances of User Behavior in Image Search
20 0.50584066 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?
topicId topicWeight
[(2, 0.1), (12, 0.05), (26, 0.162), (31, 0.035), (34, 0.015), (42, 0.118), (64, 0.063), (70, 0.146), (73, 0.028), (78, 0.015), (89, 0.145), (98, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.8841942 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
Author: Suyog Dutt Jain, Kristen Grauman
Abstract: The mode of manual annotation used in an interactive segmentation algorithm affects both its accuracy and ease-of-use. For example, bounding boxes are fast to supply, yet may be too coarse to get good results on difficult images; freehand outlines are slower to supply and more specific, yet they may be overkill for simple images. Whereas existing methods assume a fixed form of input no matter the image, we propose to predict the tradeoff between accuracy and effort. Our approach learns whether a graph cuts segmentation will succeed if initialized with a given annotation mode, based on the image's visual separability and foreground uncertainty. Using these predictions, we optimize the mode of input requested on new images a user wants segmented. Whether given a single image that should be segmented as quickly as possible, or a batch of images that must be segmented within a specified time budget, we show how to select the easiest modality that will be sufficiently strong to yield high quality segmentations. Extensive results with real users and three datasets demonstrate the impact.
2 0.8573786 325 iccv-2013-Predicting Primary Gaze Behavior Using Social Saliency Fields
Author: Hyun Soo Park, Eakta Jain, Yaser Sheikh
Abstract: We present a method to predict primary gaze behavior in a social scene. Inspired by the study of electric fields, we posit “social charges ”—latent quantities that drive the primary gaze behavior of members of a social group. These charges induce a gradient field that defines the relationship between the social charges and the primary gaze direction of members in the scene. This field model is used to predict primary gaze behavior at any location or time in the scene. We present an algorithm to estimate the time-varying behavior of these charges from the primary gaze behavior of measured observers in the scene. We validate the model by evaluating its predictive precision via cross-validation in a variety of social scenes.
3 0.83890229 125 iccv-2013-Drosophila Embryo Stage Annotation Using Label Propagation
Author: Tomáš Kazmar, Evgeny Z. Kvon, Alexander Stark, Christoph H. Lampert
Abstract: In this work we propose a system for automatic classification of Drosophila embryos into developmental stages. While the system is designed to solve an actual problem in biological research, we believe that the principle underlying it is interesting not only for biologists, but also for researchers in computer vision. The main idea is to combine two orthogonal sources of information: one is a classifier trained on strongly invariant features, which makes it applicable to images of very different conditions, but also leads to rather noisy predictions. The other is a label propagation step based on a more powerful similarity measure that however is only consistent within specific subsets of the data at a time. In our biological setup, the information sources are the shape and the staining patterns of embryo images. We show experimentally that while neither of the methods can be used by itself to achieve satisfactory results, their combination achieves prediction quality comparable to human per- formance.
4 0.83333939 198 iccv-2013-Hierarchical Part Matching for Fine-Grained Visual Categorization
Author: Lingxi Xie, Qi Tian, Richang Hong, Shuicheng Yan, Bo Zhang
Abstract: As a special topic in computer vision, , fine-grained visual categorization (FGVC) has been attracting growing attention these years. Different with traditional image classification tasks in which objects have large inter-class variation, the visual concepts in the fine-grained datasets, such as hundreds of bird species, often have very similar semantics. Due to the large inter-class similarity, it is very difficult to classify the objects without locating really discriminative features, therefore it becomes more important for the algorithm to make full use of the part information in order to train a robust model. In this paper, we propose a powerful flowchart named Hierarchical Part Matching (HPM) to cope with finegrained classification tasks. We extend the Bag-of-Features (BoF) model by introducing several novel modules to integrate into image representation, including foreground inference and segmentation, Hierarchical Structure Learn- ing (HSL), and Geometric Phrase Pooling (GPP). We verify in experiments that our algorithm achieves the state-ofthe-art classification accuracy in the Caltech-UCSD-Birds200-2011 dataset by making full use of the ground-truth part annotations.
5 0.83112341 150 iccv-2013-Exemplar Cut
Author: Jimei Yang, Yi-Hsuan Tsai, Ming-Hsuan Yang
Abstract: We present a hybrid parametric and nonparametric algorithm, exemplar cut, for generating class-specific object segmentation hypotheses. For the parametric part, we train a pylon model on a hierarchical region tree as the energy function for segmentation. For the nonparametric part, we match the input image with each exemplar by using regions to obtain a score which augments the energy function from the pylon model. Our method thus generates a set of highly plausible segmentation hypotheses by solving a series of exemplar augmented graph cuts. Experimental results on the Graz and PASCAL datasets show that the proposed algorithm achieves favorable segmentation performance against the state-of-the-art methods in terms of visual quality and accuracy.
6 0.83040339 414 iccv-2013-Temporally Consistent Superpixels
7 0.82915121 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
8 0.82671475 295 iccv-2013-On One-Shot Similarity Kernels: Explicit Feature Maps and Properties
9 0.82514918 395 iccv-2013-Slice Sampling Particle Belief Propagation
10 0.82445681 180 iccv-2013-From Where and How to What We See
11 0.82349288 8 iccv-2013-A Deformable Mixture Parsing Model with Parselets
12 0.82178193 282 iccv-2013-Multi-view Object Segmentation in Space and Time
13 0.82054234 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions
14 0.81830764 241 iccv-2013-Learning Near-Optimal Cost-Sensitive Decision Policy for Object Detection
15 0.81825423 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
16 0.81479347 51 iccv-2013-Anchored Neighborhood Regression for Fast Example-Based Super-Resolution
17 0.81435192 245 iccv-2013-Learning a Dictionary of Shape Epitomes with Applications to Image Labeling
18 0.81198341 102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding
19 0.81175017 383 iccv-2013-Semi-supervised Learning for Large Scale Image Cosegmentation
20 0.81149632 161 iccv-2013-Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration