iccv iccv2013 iccv2013-414 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matthias Reso, Jörn Jachalsky, Bodo Rosenhahn, Jörn Ostermann
Abstract: Superpixel algorithms represent a very useful and increasingly popular preprocessing step for a wide range of computer vision applications, as they offer the potential to boost efficiency and effectiveness. In this regard, this paper presents a highly competitive approach for temporally consistent superpixels for video content. The approach is based on energy-minimizing clustering utilizing a novel hybrid clustering strategy for a multi-dimensional feature space working in a global color subspace and local spatial subspaces. Moreover, a new contour-evolution-based strategy is introduced to ensure spatial coherency of the generated superpixels. For a thorough evaluation the proposed approach is compared to state-of-the-art supervoxel algorithms using established benchmarks and shows a superior performance.
Reference: text
sentIndex sentText sentNum sentScore
1 In this regard, this paper presents a highly competitive approach for temporally consistent superpixels for video content. [sent-4, score-0.31]
2 The approach is based on energy-minimizing clustering utilizing a novel hybrid clustering strategy for a multi-dimensional feature space working in a global color subspace and local spatial subspaces. [sent-5, score-0.408]
3 Introduction The idea to utilize superpixels as primitives for image analysis and processing was introduced by Ren and Malik in [14]. [sent-9, score-0.45]
4 In the following years, several authors proposed different approaches to generate superpixels with special properties from still images [12, 23, 9, 1, 19, 13]. [sent-10, score-0.425]
5 There is a wide variety of applications utilizing superpixels, including tracking [20], image parsing [16], depth map enhancement [24], 3D geometry reconstruction [6] and video segmentation [18]. [sent-14, score-0.547]
6 Especially for video applications, the usage of superpixels instead of raw pixel data is beneficial, as otherwise a vast amount of data has to be handled. [sent-15, score-0.465]
7 Mid row: Subset of superpixels manually selected in frame 15 and shown as color-coded labels. [sent-21, score-0.543]
8 The superpixels in the frames 22 and 30 are generated with our approach and are displayed using the same label colors to indicate temporal consistency. [sent-22, score-0.674]
9 When such still-image approaches are applied to video sequences, this leads to volatile and flickering superpixel contours even if there are only slight changes between consecutive frames. [sent-24, score-0.465]
10 Moreover, by design they omit the temporal connection between superpixels in successive images. [sent-25, score-0.494]
11 Hence, in this work we propose a new approach to generate superpixels that ensures temporal consistency and provides a consistent labeling. [sent-28, score-0.596]
12 Subsequently, in Section 3, we briefly explain the generation of superpixels using energy-minimizing clustering that is extended in Section 4, where we present our approach for temporally consistent superpixels. [sent-32, score-0.81]
13 Related Work In [19, 5, 8, 1] the superpixel idea is extended from the still image to the video domain starting to take the issue of temporal consistency into focus. [sent-35, score-0.384]
14 One proposal was to generate so called supervoxels by grouping adjacent voxels in the video volume, which are similar e. [sent-36, score-0.436]
15 These supervoxels connect coherent image regions or segments over multiple frames. [sent-39, score-0.398]
16 The relation between supervoxels and temporally consistent superpixels can be described in the following way: Temporally consistent superpixels can be stacked up to build supervoxels. [sent-40, score-1.557]
17 Similarly, a superpixel representation with temporal consistency can be obtained by slicing a supervoxel representation at frame instances. [sent-41, score-0.606]
18 Moreover, [21] presents an overview of available supervoxel methods and proposed corresponding benchmark metrics that are extensions of the established superpixel metrics. [sent-47, score-0.443]
19 The SLIC supervoxel approach [1] as well as the approach presented in [19] enforce a rather short temporal duration of the generated supervoxels, either implicitly or explicitly. [sent-48, score-0.325]
20 Therefore, the superpixels are temporally consistent but only over a short range of frames. [sent-49, score-0.695]
21 This reduces to some extent the noisy flickering of the superpixels from one frame to the next. [sent-51, score-0.607]
22 Still the superpixels are only generated on a per frame basis and there is no explicit strategy to handle disocclusions and new objects entering the scene. [sent-52, score-0.582]
23 Superpixels based on Energy-minimizing Clustering As our approach for temporally consistent superpixels is based on energy-minimizing clustering (c. [sent-54, score-0.783]
24 This assignment finally determines the over-segmentation and thus the superpixel generation. [sent-59, score-0.301]
25 In order to find an optimal solution for this assignment problem, an energy function Etotal is defined, which sums up the energy E(n, k) that is needed to assign a data point n ∈ N to a cluster k ∈ K: Etotal = Σ_{n∈N} E(n, k), with k the cluster to which n is assigned. (1) [sent-60, score-0.403]
26 Likewise Es (n, k) is proportional to the Euclidean distance of the spatial position of n and the spatial position of the center of cluster k. [sent-63, score-0.424]
27 The initial spatial position of the cluster centers is grid-like, including a perturbation of the spatial centers towards the lowest gradient in a 3×3 neighborhood (see [9, 1]). [sent-72, score-0.572]
28 As the spatial extent of the superpixels is known to be limited a priori, it is sufficient in the assignment-step to search for pixels only in a limited search window around each cluster center. [sent-78, score-0.832]
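The assignment step with a limited search window can be sketched as follows. This is only an illustration under stated assumptions: the weighting by alpha follows the form of energy term (3), both energies are taken as plain Euclidean distances (the source only says "proportional to"), and the square window of half-size `radius` as well as all function names are hypothetical.

```python
import math

def energy(color, pos, center_color, center_pos, alpha):
    # Assumption: unnormalized Euclidean distances for both terms.
    e_color = math.dist(color, center_color)
    e_space = math.dist(pos, center_pos)
    return (1.0 - alpha) * e_color + alpha * e_space

def assign(pixels, centers, alpha, radius):
    """pixels and centers are lists of (color, (x, y)) tuples.

    Returns, per pixel, the index of the lowest-energy center whose
    search window contains the pixel, or None if no window contains it.
    """
    labels = []
    for color, pos in pixels:
        best_k, best_e = None, math.inf
        for k, (c_color, c_pos) in enumerate(centers):
            # Hypothetical square search window around the center.
            if max(abs(pos[0] - c_pos[0]), abs(pos[1] - c_pos[1])) > radius:
                continue
            e = energy(color, pos, c_color, c_pos, alpha)
            if e < best_e:
                best_k, best_e = k, e
        labels.append(best_k)
    return labels
```

Restricting the inner loop to nearby centers is what keeps the cost per pixel roughly constant instead of growing with the number of clusters.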
29 As a consequence, each temporally consistent superpixel has a single color center for all frames and a separate spatial center for each frame. [sent-85, score-0.87]
30 The motivation for this approach is the observation that the color of matching image regions occupied by a temporally consistent superpixel over multiple frames does not change rapidly in most cases. [sent-87, score-0.72]
31 Therefore, the mean colors of the associated superpixels are, in a first approximation, almost constant over multiple frames. [sent-88, score-0.425]
32 gradual changes of illumination or color over time, we introduce a sliding window approach. [sent-92, score-0.413]
33 For this, a window comprising W consecutive frames is shifted along the video volume frame by frame. [sent-93, score-0.507]
34 This sliding window contains P so called past frames and F so called future frames and one current frame with W = F+P+1. [sent-94, score-0.835]
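The window layout can be made concrete with a small sketch. The function name and the string labels are hypothetical; only the relation W = F + P + 1 and the frame range from t−P to t+F come from the text.

```python
def window_roles(t, past, future):
    """Map each frame index in the sliding window to its role.

    The window covers frames t - past .. t + future, i.e.
    W = future + past + 1 frames in total.
    """
    roles = {}
    for tau in range(t - past, t + future + 1):
        if tau < t:
            roles[tau] = "past"
        elif tau == t:
            roles[tau] = "current"
        else:
            roles[tau] = "future"
    return roles
```

For example, `window_roles(10, 2, 3)` covers frames 8 to 13, which is W = 3 + 2 + 1 = 6 frames.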
35 In this example, the frame t is the current frame and it is in the center of the sliding window. [sent-96, score-0.482]
36 The segmentation of the past frames is immutable and thus will not be altered anymore but it influences the superpixel generation in the current frame and future frames. [sent-98, score-0.746]
37 Bottom row: Frames in sliding window (non-transparent) are divided into three groups. [sent-101, score-0.319]
38 future frames is still mutable and thus can change during the optimization. [sent-103, score-0.33]
39 The future frames help to adapt to changes in the scene, whereas the past frames are conservative and try to preserve the superpixel color clustering found. [sent-104, score-0.825]
40 If more past than future frames are used, the update of the color centers is more conservative. [sent-105, score-0.436]
41 Hybrid Clustering Approach The energy function (1) and the energy term (2) as well as the iterative optimization algorithm explained in Section 3 have to be extended to the general idea of global color and local spatial centers. [sent-109, score-0.355]
42 First, we extend the energy term (2) with the frame index τ as the energy Es is now proportional to the distance to the spatial centers in the local frame: E(n, k, τ) = (1−α)Ec(n, k) + αEs(n, k, τ). (3) [sent-110, score-0.542]
43 Second, we need to sum over all the frames in the sliding window to calculate the total energy with regard to the current frame t: Etotal = Σ_{τ=t−P}^{t+F} Σ_{n∈N} E(n, k, τ), with k the cluster to which n is assigned. (4) [sent-111, score-0.678]
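A minimal sketch of the hybrid energy, assuming plain Euclidean distances for Ec and Es (the source only states proportionality) and hypothetical data structures: one color center per cluster, one spatial center per (cluster, frame) pair.

```python
import math

def hybrid_energy(color, pos, color_center, spatial_center, alpha):
    # Energy term (3): global color term plus frame-local spatial term.
    return ((1.0 - alpha) * math.dist(color, color_center)
            + alpha * math.dist(pos, spatial_center))

def total_energy(window, labels, color_centers, spatial_centers, alpha):
    """Sum energy term (3) over all pixels of all frames in the window.

    window:          {frame: [(color, pos), ...]}
    labels:          {frame: [cluster index per pixel]}
    color_centers:   {cluster: color}            (global, shared by frames)
    spatial_centers: {(cluster, frame): pos}     (local, one per frame)
    """
    total = 0.0
    for tau, pixels in window.items():
        for (color, pos), k in zip(pixels, labels[tau]):
            total += hybrid_energy(color, pos, color_centers[k],
                                   spatial_centers[(k, tau)], alpha)
    return total
```

Keying the spatial centers by (cluster, frame) while keying the color centers by cluster alone mirrors the global-color, local-space split described above.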
44 After each shift of the sliding window, a number of I iterations of the hybrid clustering algorithm is performed. [sent-121, score-0.389]
45 The color-difference-related energy Ec is proportional to the Euclidean distance to the global color center and the spatial-distance-related energy Es is proportional to the Euclidean distance to the local spatial center on frame level. [sent-125, score-0.667]
46 In the update-step, for each cluster a new global color center is calculated using the accumulated color information of those pixels in all frames in the sliding window, which are assigned to this cluster. [sent-126, score-0.786]
47 The spatial centers are updated locally per frame using only the image coordinates of the pixels that are assigned to this cluster in the corresponding frame. [sent-127, score-0.57]
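The update step can be sketched as below. Taking the arithmetic mean as the new center is an assumption (the text speaks of "accumulated" color information), and the flat sample format is hypothetical.

```python
from collections import defaultdict

def mean(vectors):
    # Component-wise arithmetic mean of equally sized tuples.
    return tuple(sum(c) / len(vectors) for c in zip(*vectors))

def update_centers(samples):
    """samples: iterable of (frame, cluster, color, position).

    Returns (color_centers, spatial_centers):
      color_centers[cluster]            pooled over all frames in the window
      spatial_centers[(cluster, frame)] computed per frame only
    """
    colors = defaultdict(list)
    positions = defaultdict(list)
    for frame, k, color, pos in samples:
        colors[k].append(color)             # global accumulation
        positions[(k, frame)].append(pos)   # frame-local accumulation
    return ({k: mean(v) for k, v in colors.items()},
            {key: mean(v) for key, v in positions.items()})
```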
48 In addition, in [15] it was stated that the post-processing method proposed in [1] assigns the isolated superpixel fragments to arbitrary neighboring segments without considering any similarity measure between the isolated fragments and the neighboring segments. [sent-135, score-0.438]
49 In our approach, the contour evolution step is applied for those frames transitioning from the current to the first past position in the sliding window. [sent-140, score-0.382]
50 The contours of the red and yellow cluster can evolve into the unassigned region (Best viewed in color). [sent-145, score-0.302]
51 Thereby, we determine for each cluster the largest spatially coherent part and set the unconnected fragments of the cluster to unassigned and mark them as mutable. [sent-150, score-0.351]
52 The contours of those clusters adjacent to a region marked as mutable can evolve into this region during the contour evolution iterations. [sent-152, score-0.51]
53 In each iteration of the contour evolution the cluster assignment for those pixels at a boundary within a region marked as mutable can be changed. [sent-155, score-0.636]
54 Then, it is assigned to the cluster of one of its adjacent pixels, which minimizes the energy term (3). [sent-157, score-0.345]
55 In addition, an assignment of a pixel is changed to the cluster of one of its adjacent pixels if the energy term (3) is smaller for this cluster than for the one it was previously assigned to. [sent-158, score-0.603]
56 The iterations are stopped if all pixels in the marked regions are assigned to a cluster and no further changes at the boundaries occur. [sent-159, score-0.368]
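The contour evolution iterations described above might look roughly like the following sketch. Several details are assumptions not stated in the source: a 4-neighborhood, synchronous updates per iteration, and an `energy(y, x, k)` callback standing in for energy term (3) with fixed centers.

```python
def evolve_contours(labels, mutable, energy):
    """labels:  2D list of cluster ids, None meaning unassigned.
    mutable: set of (y, x) pixels whose assignment may change.
    energy:  callable (y, x, k) -> float (hypothetical stand-in for (3)).
    """
    h, w = len(labels), len(labels[0])
    changed = True
    while changed:
        updates = {}
        for y, x in mutable:
            current = labels[y][x]
            # Clusters of the assigned 4-neighbors (assumed neighborhood).
            neighbors = {
                labels[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] is not None
            }
            candidates = neighbors - {current}
            if not candidates:
                continue  # not at a boundary
            best = min(candidates, key=lambda k: energy(y, x, k))
            # Unassigned pixels always take the best neighbor cluster;
            # assigned pixels switch only if the energy strictly drops.
            if current is None or energy(y, x, best) < energy(y, x, current):
                updates[(y, x)] = best
        for (y, x), k in updates.items():
            labels[y][x] = k
        changed = bool(updates)
    return labels
```

Because an assigned pixel only switches when its energy strictly decreases, the loop cannot cycle and stops once no boundary pixel changes.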
57 Initialization As the position of matching image regions and thus the superpixel position can differ in consecutive frames, a concurrent initialization of all frames in the sliding window is not practicable. [sent-163, score-0.811]
58 Therefore, we propose a successive filling of the sliding window according to the following scheme. [sent-164, score-0.319]
59 This frame is positioned at index t+F in the sliding window. [sent-167, score-0.322]
60 Then, the sliding window is shifted, whereby a new frame enters the window at position t+F and the old frame is moved to t+F−1. [sent-171, score-0.7]
61 This procedure is repeated until the sliding window is completely filled. [sent-177, score-0.343]
62 Then the generation of temporally consistent superpixel can further proceed. [sent-178, score-0.542]
63 Thereby, the sliding window is repeatedly shifted as described above until the video sequence is completely processed. [sent-179, score-0.43]
64 The superpixel segmentations of frame t−1 of the sliding window are stored; this frame is the first past frame and thus immutable. [sent-180, score-0.756]
65 Structural Changes in the Video Volume In general, the generated superpixels should capture the temporal consistency inherent in the video volume as completely as possible. [sent-183, score-0.627]
66 But the continuous adaptation of the superpixels to the video content can lead to steadily growing or shrinking superpixels that tend to violate the constraint of a rather homogeneous size. [sent-184, score-0.919]
67 This effect can be observed in Figure 4 that depicts the temporally consistent label maps of two segmented frames from the soccer sequence that were generated without utilizing any method to ensure a homogeneous size of the superpixels over time. [sent-185, score-1.001]
68 One can see that the superpixels in the right image are squeezed together on the left side of the soccer player while they are huge on the right side. [sent-186, score-0.49]
69 A trivial solution to minimize this effect is to enforce a rather short temporal duration of the generated superpixels (see Section 2). [sent-188, score-0.606]
70 ∀k ∈ K, τ ∈ [t−P;t+F] (5) : Amin < A (k, τ) < Amax , where A (k, τ) is the number of pixels assigned to cluster k in frame τ. [sent-191, score-0.389]
71 We implemented this constrained energy minimization in a first simple but effective approach in our sliding window framework. [sent-192, score-0.419]
72 To meet the constraints the number of pixels assigned to a cluster is traced in two consecutive future frames. [sent-193, score-0.383]
73 If the predicted number of pixels assigned to a cluster is greater than Amax in frame τ = t+F +2 (outside the sliding window) the cluster is split in two. [sent-195, score-0.74]
74 Thereby, each spatial center of the cluster is replaced by two new spatial [sent-196, score-0.321]
75 Figure 4. Label maps of the frames 1 and 60 of the soccer sequence segmented with temporal consistency but without a method to cope with structural changes in the video volume. [sent-197, score-0.375]
76 centers in all future frames and its color center is duplicated. [sent-198, score-0.404]
77 The new spatial centers are shifted in opposite directions towards the biggest eigenvector of the spatial distribution of the cluster similar to the superpixel splitting in [23]. [sent-199, score-0.711]
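One way to picture the split is the sketch below. It takes "biggest eigenvector" to mean the eigenvector of the 2D position covariance with the largest eigenvalue, computed in closed form. The shift distance `offset` is a hypothetical parameter; the source does not state how far the new centers are moved.

```python
import math

def split_center(positions, offset):
    """positions: list of (x, y) pixels of one cluster in one frame.

    Returns two new spatial centers, shifted in opposite directions
    from the mean along the principal axis of the pixel distribution.
    """
    n = len(positions)
    mx = sum(x for x, _ in positions) / n
    my = sum(y for _, y in positions) / n
    sxx = sum((x - mx) ** 2 for x, _ in positions) / n
    syy = sum((y - my) ** 2 for _, y in positions) / n
    sxy = sum((x - mx) * (y - my) for x, y in positions) / n
    # Orientation of the principal axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    return ((mx + offset * dx, my + offset * dy),
            (mx - offset * dx, my - offset * dy))
```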
78 In case that –based on this prediction– the number of pixels assigned to a cluster would be lower than Amin in frame τ = t+F + 2, the cluster is terminated by removing its spatial centers from the future frames. [sent-200, score-0.759]
79 If this is not the case the initial number of superpixels is restored by splitting or terminating the biggest or smallest clusters, respectively. [sent-202, score-0.45]
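The split/terminate decision can be sketched as follows. How the pixel count is predicted for frame t+F+2 is not given in the extracted text; linear extrapolation from two consecutive future frames is assumed here purely for illustration, and the function name and return values are hypothetical.

```python
def size_decision(area_prev, area_last, a_min, a_max, steps_ahead=2):
    """Decide what to do with a cluster based on its predicted size.

    area_prev, area_last: pixel counts A(k, tau) in two consecutive
    future frames. Assumption: the count continues to change linearly.
    """
    predicted = area_last + steps_ahead * (area_last - area_prev)
    if predicted > a_max:
        return "split"
    if predicted < a_min:
        return "terminate"
    return "keep"
```

For instance, a cluster growing from 100 to 150 pixels is predicted at 250 pixels two frames later, which exceeds an upper bound of 200 and triggers a split.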
80 Experimental Setup and Performance Metrics We implemented our approach for temporally consistent superpixels (TCS) in MATLAB and compared it with state of the art methods for spatio-temporal over-segmentation. [sent-208, score-0.72]
81 We compared our approach (TCS) against two state of the art supervoxel methods: the SLIC approach for supervoxels (SLIC) [1] and the streaming hierarchical video segmentation (sGBH) [22]. [sent-212, score-0.641]
82 sGBH was selected as a top-performing candidate for the class of streaming-capable supervoxel approaches and SLIC was selected as a top-performing candidate for the class of clustering-based supervoxel approaches. [sent-213, score-0.517]
83 All diagrams are plotted over the number of supervoxels (Best viewed in color). [sent-224, score-0.432]
84 As described in Section 2, temporally consistent superpixels can be stacked to obtain supervoxels. [sent-226, score-0.721]
85 To evaluate the performance of our method we used the following performance metrics for supervoxels and superpixels. [sent-229, score-0.365]
86 Mean Duration measures the duration of the generated supervoxels or temporally consistent superpixels in terms of number of frames. [sent-231, score-1.146]
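As a rough illustration of this metric, the sketch below counts, for every label, the number of frames in which it occurs and averages over all labels. It assumes per-frame 2D label maps with labels that are consistent across frames; the function name is hypothetical and the benchmark's reference implementation may differ in detail.

```python
def mean_duration(label_maps):
    """label_maps: list of 2D label maps, one per frame."""
    frames_per_label = {}
    for labels in label_maps:
        # Each label counts once per frame in which it appears.
        for k in {value for row in labels for value in row}:
            frames_per_label[k] = frames_per_label.get(k, 0) + 1
    return sum(frames_per_label.values()) / len(frames_per_label)
```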
87 Benchmark Results The Figures 5 and 6 show the results for the performance metrics over the number of supervoxels as common parameter for the three compared approaches and the two benchmark data sets Chen (see Figure 5) and SegTrack (see Figure 6). [sent-245, score-0.365]
88 It should be added that the number of past frames in the sliding window has a negligible effect on the mean duration while the undersegmentation error, up to some extent, decreases with an increasing number of past frames. [sent-247, score-0.789]
89 1that the past frames preserve the color of superpixels and thus prevent them from e. [sent-249, score-0.704]
90 All diagrams are plotted over the number of supervoxels (Best viewed in color). [sent-264, score-0.432]
91 The label maps show that TCS and SLIC produce more compact superpixels than sGBH. [sent-268, score-0.468]
92 sGBH produces less compact superpixels, which, by intuition, makes it easier to capture fine-grained details compared to the more compact superpixels of SLIC and TCS. [sent-269, score-0.539]
93 The visual impression gained from Figure 7 is confirmed by the variance of area, the iso-perimetric quotient and the superpixel compactness that are depicted in Table 1. [sent-271, score-0.456]
94 For each approach a level of detail was selected that generates a comparable number of superpixels or sliced supervoxels with a mean area of approximately 100 pixels. [sent-272, score-0.764]
95 Variance of area (VoA), average iso-perimetric quotient Q and superpixel compactness calculated for the entire data set of Chen for an approximately similar level of detail (100 pixels per superpixel). [sent-277, score-0.408]
96 the lowest variance of area while the iso-perimetric quotient and the superpixel compactness are comparable to SLIC. [sent-278, score-0.464]
97 This indicates that the superpixels generated by TCS and SLIC are more homogeneous in size and more compact in shape than those of sGBH. [sent-279, score-0.536]
98 compact superpixels tend to have a lower average number of neighbors which eases the evaluation of neighborhood relations, and further calculations, e. [sent-287, score-0.468]
99 Complexity Considerations In [1], the SLIC superpixel approach for still images is approximated to have a complexity of O(|N|), where |N| is the number of pixels per image. [sent-292, score-0.3]
100 Using this approximation, our approach for temporally consistent superpixels has a complexity of O(|N|WV), where W is the sliding window size in frames and V is the number of frames in the video sequence. [sent-293, score-0.899]
wordName wordTfidf (topN-words)
[('superpixels', 0.425), ('supervoxels', 0.339), ('tcs', 0.294), ('superpixel', 0.245), ('sliding', 0.204), ('temporally', 0.198), ('sgbh', 0.171), ('slic', 0.153), ('cluster', 0.147), ('mutable', 0.147), ('supervoxel', 0.144), ('frames', 0.141), ('frame', 0.118), ('centers', 0.115), ('window', 0.115), ('undersegmentation', 0.108), ('energy', 0.1), ('clustering', 0.088), ('evolution', 0.087), ('etotal', 0.087), ('segtrack', 0.087), ('compactness', 0.082), ('quotient', 0.081), ('contour', 0.08), ('past', 0.074), ('amax', 0.073), ('duration', 0.073), ('consistent', 0.072), ('hybrid', 0.07), ('temporal', 0.069), ('assigned', 0.069), ('spatial', 0.066), ('diagrams', 0.065), ('soccer', 0.065), ('color', 0.064), ('unassigned', 0.06), ('clusters', 0.06), ('coherent', 0.059), ('assignment', 0.056), ('pixels', 0.055), ('amin', 0.054), ('segmentation', 0.05), ('figures', 0.05), ('colordifference', 0.049), ('immutable', 0.049), ('rpnedeiax', 0.049), ('topperforming', 0.049), ('voa', 0.049), ('shifted', 0.047), ('consecutive', 0.046), ('ec', 0.045), ('compact', 0.043), ('streaming', 0.043), ('fragments', 0.043), ('proportional', 0.043), ('center', 0.042), ('future', 0.042), ('spatially', 0.042), ('flickering', 0.04), ('gbh', 0.04), ('video', 0.04), ('marked', 0.04), ('generated', 0.039), ('contours', 0.036), ('es', 0.034), ('lowest', 0.033), ('segments', 0.032), ('utilizing', 0.032), ('reproduced', 0.031), ('evolve', 0.031), ('thereby', 0.031), ('changes', 0.03), ('position', 0.03), ('consistency', 0.03), ('adjacent', 0.029), ('optical', 0.029), ('homogeneous', 0.029), ('stated', 0.029), ('viewed', 0.028), ('established', 0.028), ('voxels', 0.028), ('iterations', 0.027), ('generation', 0.027), ('metrics', 0.026), ('stacked', 0.026), ('biggest', 0.025), ('art', 0.025), ('depicted', 0.025), ('growth', 0.025), ('primitives', 0.025), ('explained', 0.025), ('boundary', 0.024), ('completely', 0.024), ('extent', 0.024), ('euclidean', 0.024), ('chen', 0.024), ('meet', 0.024), 
('variance', 0.023), ('isolated', 0.023), ('germany', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999917 414 iccv-2013-Temporally Consistent Superpixels
Author: Matthias Reso, Jörn Jachalsky, Bodo Rosenhahn, Jörn Ostermann
Abstract: Superpixel algorithms represent a very useful and increasingly popular preprocessing step for a wide range of computer vision applications, as they offer the potential to boost efficiency and effectiveness. In this regard, this paper presents a highly competitive approach for temporally consistent superpixels for video content. The approach is based on energy-minimizing clustering utilizing a novel hybrid clustering strategy for a multi-dimensional feature space working in a global color subspace and local spatial subspaces. Moreover, a new contour-evolution-based strategy is introduced to ensure spatial coherency of the generated superpixels. For a thorough evaluation the proposed approach is compared to state-of-the-art supervoxel algorithms using established benchmarks and shows a superior performance.
2 0.32415453 76 iccv-2013-Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees
Author: Aastha Jain, Shuanak Chatterjee, René Vidal
Abstract: We propose an exact, general and efficient coarse-to-fine energy minimization strategy for semantic video segmentation. Our strategy is based on a hierarchical abstraction of the supervoxel graph that allows us to minimize an energy defined at the finest level of the hierarchy by minimizing a series of simpler energies defined over coarser graphs. The strategy is exact, i.e., it produces the same solution as minimizing over the finest graph. It is general, i.e., it can be used to minimize any energy function (e.g., unary, pairwise, and higher-order terms) with any existing energy minimization algorithm (e.g., graph cuts and belief propagation). It also gives significant speedups in inference for several datasets with varying degrees of spatio-temporal continuity. We also discuss the strengths and weaknesses of our strategy relative to existing hierarchical approaches, and the kinds of image and video data that provide the best speedups.
3 0.32365906 299 iccv-2013-Online Video SEEDS for Temporal Window Objectness
Author: Michael Van_Den_Bergh, Gemma Roig, Xavier Boix, Santiago Manen, Luc Van_Gool
Abstract: Superpixel and objectness algorithms are broadly used as a pre-processing step to generate support regions and to speed-up further computations. Recently, many algorithms have been extended to video in order to exploit the temporal consistency between frames. However, most methods are computationally too expensive for real-time applications. We introduce an online, real-time video superpixel algorithm based on the recently proposed SEEDS superpixels. A new capability is incorporated which delivers multiple diverse samples (hypotheses) of superpixels in the same image or video sequence. The multiple samples are shown to provide a strong cue to efficiently measure the objectness of image windows, and we introduce the novel concept of objectness in temporal windows. Experiments show that the video superpixels achieve comparable performance to state-of-the-art offline methods while running at 30 fps on a single 2.8 GHz i7 CPU. State-of-the-art performance on objectness is also demonstrated, yet orders of magnitude faster and extended to temporal windows in video.
4 0.28875056 282 iccv-2013-Multi-view Object Segmentation in Space and Time
Author: Abdelaziz Djelouah, Jean-Sébastien Franco, Edmond Boyer, François Le_Clerc, Patrick Pérez
Abstract: In this paper, we address the problem of object segmentation in multiple views or videos when two or more viewpoints of the same scene are available. We propose a new approach that propagates segmentation coherence information in both space and time, hence allowing evidences in one image to be shared over the complete set. To this aim the segmentation is cast as a single efficient labeling problem over space and time with graph cuts. In contrast to most existing multi-view segmentation methods that rely on some form of dense reconstruction, ours only requires a sparse 3D sampling to propagate information between viewpoints. The approach is thoroughly evaluated on standard multiview datasets, as well as on videos. With static views, results compete with state of the art methods but they are achieved with significantly fewer viewpoints. With multiple videos, we report results that demonstrate the benefit of segmentation propagation through temporal cues.
5 0.2277464 172 iccv-2013-Flattening Supervoxel Hierarchies by the Uniform Entropy Slice
Author: Chenliang Xu, Spencer Whitt, Jason J. Corso
Abstract: Supervoxel hierarchies provide a rich multiscale decomposition of a given video suitable for subsequent processing in video analysis. The hierarchies are typically computed by an unsupervised process that is susceptible to undersegmentation at coarse levels and over-segmentation at fine levels, which make it a challenge to adopt the hierarchies for later use. In this paper, we propose the first method to overcome this limitation and flatten the hierarchy into a single segmentation. Our method, called the uniform entropy slice, seeks a selection of supervoxels that balances the relative level of information in the selected supervoxels based on some post hoc feature criterion such as objectness. For example, with this criterion, in regions nearby objects, our method prefers finer supervoxels to capture the local details, but in regions away from any objects we prefer coarser supervoxels. We formulate the uniform entropy slice as a binary quadratic program and implement four different feature criteria, both unsupervised and supervised, to drive the flattening. Although we apply it only to supervoxel hierarchies in this paper, our method is generally applicable to segmentation tree hierarchies. Our experiments demonstrate both strong qualitative performance and superior quantitative performance to state of the art baselines on benchmark internet videos.
6 0.20916297 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
7 0.15342925 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
8 0.1373105 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
9 0.13443638 383 iccv-2013-Semi-supervised Learning for Large Scale Image Cosegmentation
10 0.12981583 110 iccv-2013-Detecting Curved Symmetric Parts Using a Deformable Disc Model
11 0.12833156 144 iccv-2013-Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors
12 0.11523504 33 iccv-2013-A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis
13 0.10882149 317 iccv-2013-Piecewise Rigid Scene Flow
14 0.10351512 82 iccv-2013-Compensating for Motion during Direct-Global Separation
15 0.094679549 319 iccv-2013-Point-Based 3D Reconstruction of Thin Objects
16 0.082934886 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
17 0.07564301 127 iccv-2013-Dynamic Pooling for Complex Event Recognition
18 0.073113017 143 iccv-2013-Estimating Human Pose with Flowing Puppets
19 0.072927289 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
20 0.071508124 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
topicId topicWeight
[(0, 0.188), (1, -0.058), (2, 0.089), (3, 0.065), (4, 0.034), (5, 0.079), (6, -0.09), (7, 0.093), (8, 0.027), (9, -0.098), (10, -0.006), (11, 0.159), (12, 0.067), (13, 0.064), (14, -0.133), (15, -0.031), (16, -0.092), (17, -0.096), (18, -0.14), (19, -0.051), (20, 0.1), (21, -0.151), (22, -0.044), (23, -0.047), (24, -0.169), (25, -0.09), (26, -0.078), (27, 0.011), (28, -0.138), (29, 0.035), (30, 0.061), (31, -0.001), (32, -0.076), (33, 0.054), (34, 0.173), (35, -0.168), (36, 0.26), (37, -0.145), (38, -0.075), (39, 0.009), (40, 0.053), (41, -0.121), (42, -0.079), (43, -0.071), (44, -0.052), (45, 0.021), (46, 0.009), (47, 0.071), (48, 0.045), (49, 0.042)]
simIndex simValue paperId paperTitle
same-paper 1 0.94807452 414 iccv-2013-Temporally Consistent Superpixels
Author: Matthias Reso, Jörn Jachalsky, Bodo Rosenhahn, Jörn Ostermann
Abstract: Superpixel algorithms represent a very useful and increasingly popular preprocessing step for a wide range of computer vision applications, as they offer the potential to boost efficiency and effectiveness. In this regard, this paper presents a highly competitive approach for temporally consistent superpixels for video content. The approach is based on energy-minimizing clustering utilizing a novel hybrid clustering strategy for a multi-dimensional feature space working in a global color subspace and local spatial subspaces. Moreover, a new contour-evolution-based strategy is introduced to ensure spatial coherency of the generated superpixels. For a thorough evaluation the proposed approach is compared to state-of-the-art supervoxel algorithms using established benchmarks and shows a superior performance.
2 0.86817044 299 iccv-2013-Online Video SEEDS for Temporal Window Objectness
Author: Michael Van_Den_Bergh, Gemma Roig, Xavier Boix, Santiago Manen, Luc Van_Gool
Abstract: Superpixel and objectness algorithms are broadly used as a pre-processing step to generate support regions and to speed-up further computations. Recently, many algorithms have been extended to video in order to exploit the temporal consistency between frames. However, most methods are computationally too expensive for real-time applications. We introduce an online, real-time video superpixel algorithm based on the recently proposed SEEDS superpixels. A new capability is incorporated which delivers multiple diverse samples (hypotheses) of superpixels in the same image or video sequence. The multiple samples are shown to provide a strong cue to efficiently measure the objectness of image windows, and we introduce the novel concept of objectness in temporal windows. Experiments show that the video superpixels achieve comparable performance to state-of-the-art offline methods while running at 30 fps on a single 2.8 GHz i7 CPU. State-of-the-art performance on objectness is also demonstrated, yet orders of magnitude faster and extended to temporal windows in video.
3 0.73852426 76 iccv-2013-Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees
Author: Aastha Jain, Shuanak Chatterjee, René Vidal
Abstract: We propose an exact, general and efficient coarse-to-fine energy minimization strategy for semantic video segmentation. Our strategy is based on a hierarchical abstraction of the supervoxel graph that allows us to minimize an energy defined at the finest level of the hierarchy by minimizing a series of simpler energies defined over coarser graphs. The strategy is exact, i.e., it produces the same solution as minimizing over the finest graph. It is general, i.e., it can be used to minimize any energy function (e.g., unary, pairwise, and higher-order terms) with any existing energy minimization algorithm (e.g., graph cuts and belief propagation). It also gives significant speedups in inference for several datasets with varying degrees of spatio-temporal continuity. We also discuss the strengths and weaknesses of our strategy relative to existing hierarchical approaches, and the kinds of image and video data that provide the best speedups.
4 0.69565475 172 iccv-2013-Flattening Supervoxel Hierarchies by the Uniform Entropy Slice
Author: Chenliang Xu, Spencer Whitt, Jason J. Corso
Abstract: Supervoxel hierarchies provide a rich multiscale decomposition of a given video suitable for subsequent processing in video analysis. The hierarchies are typically computed by an unsupervised process that is susceptible to under-segmentation at coarse levels and over-segmentation at fine levels, which makes it a challenge to adopt the hierarchies for later use. In this paper, we propose the first method to overcome this limitation and flatten the hierarchy into a single segmentation. Our method, called the uniform entropy slice, seeks a selection of supervoxels that balances the relative level of information in the selected supervoxels based on some post hoc feature criterion such as objectness. For example, with this criterion, in regions near objects, our method prefers finer supervoxels to capture the local details, but in regions away from any objects we prefer coarser supervoxels. We formulate the uniform entropy slice as a binary quadratic program and implement four different feature criteria, both unsupervised and supervised, to drive the flattening. Although we apply it only to supervoxel hierarchies in this paper, our method is generally applicable to segmentation tree hierarchies. Our experiments demonstrate both strong qualitative performance and superior quantitative performance to state-of-the-art baselines on benchmark internet videos.
5 0.64783746 282 iccv-2013-Multi-view Object Segmentation in Space and Time
Author: Abdelaziz Djelouah, Jean-Sébastien Franco, Edmond Boyer, François Le Clerc, Patrick Pérez
Abstract: In this paper, we address the problem of object segmentation in multiple views or videos when two or more viewpoints of the same scene are available. We propose a new approach that propagates segmentation coherence information in both space and time, hence allowing evidence in one image to be shared over the complete set. To this end, the segmentation is cast as a single efficient labeling problem over space and time with graph cuts. In contrast to most existing multi-view segmentation methods that rely on some form of dense reconstruction, ours only requires a sparse 3D sampling to propagate information between viewpoints. The approach is thoroughly evaluated on standard multiview datasets, as well as on videos. With static views, results compete with state-of-the-art methods but they are achieved with significantly fewer viewpoints. With multiple videos, we report results that demonstrate the benefit of segmentation propagation through temporal cues.
6 0.54026282 33 iccv-2013-A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis
7 0.52448684 383 iccv-2013-Semi-supervised Learning for Large Scale Image Cosegmentation
8 0.50314569 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
9 0.48418969 110 iccv-2013-Detecting Curved Symmetric Parts Using a Deformable Disc Model
10 0.41959244 275 iccv-2013-Motion-Aware KNN Laplacian for Video Matting
11 0.37373561 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
12 0.35122544 145 iccv-2013-Estimating the Material Properties of Fabric from Video
13 0.33382976 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
14 0.3222779 397 iccv-2013-Space-Time Tradeoffs in Photo Sequencing
15 0.31888455 128 iccv-2013-Dynamic Probabilistic Volumetric Models
16 0.3092815 329 iccv-2013-Progressive Multigrid Eigensolvers for Multiscale Spectral Segmentation
17 0.30453646 416 iccv-2013-The Interestingness of Images
18 0.30148822 144 iccv-2013-Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors
19 0.29353517 82 iccv-2013-Compensating for Motion during Direct-Global Separation
20 0.28616002 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation
topicId topicWeight
[(2, 0.073), (12, 0.011), (13, 0.013), (26, 0.177), (31, 0.039), (40, 0.023), (42, 0.103), (48, 0.01), (64, 0.074), (73, 0.028), (76, 0.186), (78, 0.011), (89, 0.144), (98, 0.011)]
simIndex simValue paperId paperTitle
1 0.85141224 164 iccv-2013-Fibonacci Exposure Bracketing for High Dynamic Range Imaging
Author: Mohit Gupta, Daisuke Iso, Shree K. Nayar
Abstract: Exposure bracketing for high dynamic range (HDR) imaging involves capturing several images of the scene at different exposures. If either the camera or the scene moves during capture, the captured images must be registered. Large exposure differences between bracketed images lead to inaccurate registration, resulting in artifacts such as ghosting (multiple copies of scene objects) and blur. We present two techniques, one for image capture (Fibonacci exposure bracketing) and one for image registration (generalized registration), to prevent such motion-related artifacts. Fibonacci bracketing involves capturing a sequence of images such that each exposure time is the sum of the previous N (N > 1) exposures. Generalized registration involves estimating motion between sums of contiguous sets of frames, instead of between individual frames. Together, the two techniques ensure that motion is always estimated between frames of the same total exposure time. This results in HDR images and videos which have both a large dynamic range and minimal motion-related artifacts. We show, by results for several real-world indoor and outdoor scenes, that the proposed approach significantly outperforms several existing bracketing schemes.
same-paper 2 0.8416416 414 iccv-2013-Temporally Consistent Superpixels
Author: Matthias Reso, Jörn Jachalsky, Bodo Rosenhahn, Jörn Ostermann
Abstract: Superpixel algorithms represent a very useful and increasingly popular preprocessing step for a wide range of computer vision applications, as they offer the potential to boost efficiency and effectiveness. In this regard, this paper presents a highly competitive approach for temporally consistent superpixels for video content. The approach is based on energy-minimizing clustering utilizing a novel hybrid clustering strategy for a multi-dimensional feature space working in a global color subspace and local spatial subspaces. Moreover, a new contour-evolution-based strategy is introduced to ensure spatial coherency of the generated superpixels. For a thorough evaluation, the proposed approach is compared to state-of-the-art supervoxel algorithms using established benchmarks and shows superior performance.
3 0.80024505 395 iccv-2013-Slice Sampling Particle Belief Propagation
Author: Oliver Müller, Michael Ying Yang, Bodo Rosenhahn
Abstract: Inference in continuous label Markov random fields is a challenging task. We use particle belief propagation (PBP) for solving the inference problem in continuous label space. Sampling particles from the belief distribution is typically done by using Metropolis-Hastings (MH) Markov chain Monte Carlo (MCMC) methods, which involve sampling from a proposal distribution. This proposal distribution has to be carefully designed depending on the particular model and input data to achieve fast convergence. We propose to avoid dependence on a proposal distribution by introducing a slice-sampling-based PBP algorithm. The proposed approach shows superior convergence performance on an image denoising toy example. Our findings are validated on a challenging relational 2D feature tracking application.
4 0.79997492 125 iccv-2013-Drosophila Embryo Stage Annotation Using Label Propagation
Author: Tomáš Kazmar, Evgeny Z. Kvon, Alexander Stark, Christoph H. Lampert
Abstract: In this work we propose a system for automatic classification of Drosophila embryos into developmental stages. While the system is designed to solve an actual problem in biological research, we believe that the principle underlying it is interesting not only for biologists, but also for researchers in computer vision. The main idea is to combine two orthogonal sources of information: one is a classifier trained on strongly invariant features, which makes it applicable to images of very different conditions, but also leads to rather noisy predictions. The other is a label propagation step based on a more powerful similarity measure that, however, is only consistent within specific subsets of the data at a time. In our biological setup, the information sources are the shape and the staining patterns of embryo images. We show experimentally that while neither of the methods can be used by itself to achieve satisfactory results, their combination achieves prediction quality comparable to human performance.
5 0.79464936 198 iccv-2013-Hierarchical Part Matching for Fine-Grained Visual Categorization
Author: Lingxi Xie, Qi Tian, Richang Hong, Shuicheng Yan, Bo Zhang
Abstract: As a special topic in computer vision, fine-grained visual categorization (FGVC) has been attracting growing attention in recent years. Unlike traditional image classification tasks in which objects have large inter-class variation, the visual concepts in fine-grained datasets, such as hundreds of bird species, often have very similar semantics. Due to the large inter-class similarity, it is very difficult to classify the objects without locating truly discriminative features; it therefore becomes more important for the algorithm to make full use of the part information in order to train a robust model. In this paper, we propose a powerful flowchart named Hierarchical Part Matching (HPM) to cope with fine-grained classification tasks. We extend the Bag-of-Features (BoF) model by introducing several novel modules to integrate into image representation, including foreground inference and segmentation, Hierarchical Structure Learning (HSL), and Geometric Phrase Pooling (GPP). We verify in experiments that our algorithm achieves state-of-the-art classification accuracy on the Caltech-UCSD-Birds-200-2011 dataset by making full use of the ground-truth part annotations.
6 0.79214615 282 iccv-2013-Multi-view Object Segmentation in Space and Time
7 0.789343 295 iccv-2013-On One-Shot Similarity Kernels: Explicit Feature Maps and Properties
8 0.78609961 51 iccv-2013-Anchored Neighborhood Regression for Fast Example-Based Super-Resolution
9 0.78390408 8 iccv-2013-A Deformable Mixture Parsing Model with Parselets
10 0.78264666 260 iccv-2013-Manipulation Pattern Discovery: A Nonparametric Bayesian Approach
11 0.77797788 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
12 0.77794135 405 iccv-2013-Structured Light in Sunlight
13 0.7767275 102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding
14 0.77605569 221 iccv-2013-Joint Inverted Indexing
15 0.77019721 150 iccv-2013-Exemplar Cut
16 0.76740587 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions
17 0.76375973 348 iccv-2013-Refractive Structure-from-Motion on Underwater Images
18 0.76229864 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
19 0.75671506 330 iccv-2013-Proportion Priors for Image Sequence Segmentation
20 0.75328469 118 iccv-2013-Discovering Object Functionality