iccv iccv2013 iccv2013-76 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Aastha Jain, Shuanak Chatterjee, René Vidal
Abstract: We propose an exact, general and efficient coarse-to-fine energy minimization strategy for semantic video segmentation. Our strategy is based on a hierarchical abstraction of the supervoxel graph that allows us to minimize an energy defined at the finest level of the hierarchy by minimizing a series of simpler energies defined over coarser graphs. The strategy is exact, i.e., it produces the same solution as minimizing over the finest graph. It is general, i.e., it can be used to minimize any energy function (e.g., unary, pairwise, and higher-order terms) with any existing energy minimization algorithm (e.g., graph cuts and belief propagation). It also gives significant speedups in inference for several datasets with varying degrees of spatio-temporal continuity. We also discuss the strengths and weaknesses of our strategy relative to existing hierarchical approaches, and the kinds of image and video data that provide the best speedups.
Reference: text
sentIndex sentText sentNum sentScore
1 Our strategy is based on a hierarchical abstraction of the supervoxel graph that allows us to minimize an energy defined at the finest level of the hierarchy by minimizing a series of simpler energies defined over coarser graphs. [sent-7, score-1.395]
2 In this paper, we propose an exact, general and efficient coarse-to-fine energy minimization strategy for image and video segmentation, which can be used to speed up any approximate energy minimization approach (e. [sent-37, score-0.409]
3 Therefore, contiguous supervoxels (both in space and in time) are very likely to have the same label. [sent-41, score-0.516]
4 For instance, if we consider supervoxels of size $k \times k$ superpixels spanning $k$ frames, then the number of possible segmentations for the example above reduces to …, which is significant even for $k = 2$. [sent-43, score-0.558]
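The exact expression for the reduced count did not survive extraction. As a rough illustration only, assume every coarse supervoxel groups exactly $k^3$ fine supervoxels ($k \times k$ superpixels over $k$ frames) and that the fine count $N$ is divisible by $k^3$; then the number of labelings drops from $|L|^N$ to $|L|^{N/k^3}$. The sketch works in base-10 logarithms so the numbers stay printable; the function names and example sizes are hypothetical.

```python
import math


def log10_num_labelings(num_nodes: int, num_labels: int) -> float:
    """log10 of |L|**N, the number of labelings of N nodes with |L| labels."""
    return num_nodes * math.log10(num_labels)


def log10_reduction(num_fine: int, num_labels: int, k: int) -> float:
    """log10 of the ratio between the fine and coarse labeling counts.

    Assumes each coarse supervoxel contains exactly k**3 fine nodes
    (k x k superpixels over k frames) and that num_fine is divisible by k**3.
    """
    num_coarse = num_fine // k**3
    return (log10_num_labelings(num_fine, num_labels)
            - log10_num_labelings(num_coarse, num_labels))


# Hypothetical sizes: 8,000 fine nodes, 10 labels, k = 2 (8 fine nodes per coarse node).
print(log10_num_labelings(8000, 10))   # 8000.0, i.e. 10**8000 labelings
print(log10_num_labelings(1000, 10))   # 1000.0, i.e. 10**1000 labelings
print(log10_reduction(8000, 10, 2))    # 7000.0
```

Even for $k = 2$ the exponent shrinks by a factor of 8, which is why the coarse problems are so much smaller.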
5 To capture the spatial and temporal continuity of a video, we define a hierarchical abstraction of the supervoxel graph such that most supervoxels at a coarse level correspond to a single label. [sent-44, score-1.503]
6 We use a hierarchical graph-based supervoxel segmentation method (see [31] for an overview) to identify the supervoxels (at various scales) that are likely to have the same label. [sent-46, score-1.277]
7 Such methods create a supervoxel tree with the biggest (coarsest) supervoxels at the highest level. [sent-47, score-1.201]
8 The top row shows the various abstraction levels in the supervoxel tree. [sent-53, score-0.777]
9 The second row shows the portion of the supervoxel tree explored by our coarse-to-fine scheme to find the optimal labeling of segments. [sent-54, score-0.801]
10 Given this hierarchy, we construct a series of energy functions for different levels of abstraction and propose a coarse-to-fine inference scheme that minimizes these energies to find an optimal segmentation at the finest level of the hierarchy. [sent-57, score-0.662]
11 To define the different energy functions, we first augment the set of labels with an auxiliary label called mixed, which accounts for the fact that coarse supervoxels may contain finer supervoxels with more than one pure label. [sent-58, score-1.291]
12 We then define the unary, pairwise and higher-order costs of the energy at any level of the hierarchy as lower bounds for the costs at the finest level. [sent-59, score-0.678]
13 Our coarse-to-fine inference scheme starts by performing inference at the coarsest level of the supervoxel hierarchy using any inference method (e. [sent-61, score-1.213]
14 If the solution at the current level of refinement is such that no supervoxel is assigned the mixed label, then an optimal solution at the finest level has been found by performing inference over a very coarse graph. [sent-64, score-1.283]
15 Otherwise, the mixed supervoxels are refined into their constituent (finer) supervoxels, and a new inference problem is solved over both coarse and fine supervoxels. [sent-65, score-0.814]
16 In general, it is very hard to know if the proposed scheme is more efficient than direct inference over the finest layer. [sent-67, score-0.337]
17 Clearly if the hierarchy of supervoxels is poorly constructed so that many refinement cycles are needed, our method could be less efficient because it solves too many small inference problems. [sent-68, score-0.734]
18 One line of work in hierarchical video segmentation is a bottom-up approach based on merging supervoxels using similarity metrics based on variation of intensity inside a supervoxel [13, 18]. [sent-74, score-1.364]
19 Nonetheless, the supervoxel tree obtained by these approaches can be used as the abstraction hierarchy in our framework. [sent-76, score-0.911]
20 Another line of work defines a hierarchical cost function over supervoxels at all levels. [sent-77, score-0.604]
21 Specifically, the works of [22, 25] solve a multilayer optimization problem, while we optimize a cost function defined at the finest layer only. [sent-83, score-0.259]
22 To do this more efficiently, we use the supervoxel tree to iteratively refine the parts of the video that could have more than one label. [sent-84, score-0.773]
23 However, unlike our method, the abstraction used is image-agnostic and the messages at a coarse level are only used to initialize messages at the finer level and not to prevent expanding all nodes. [sent-89, score-0.309]
24 Also, we can use any of these algorithms [21, 14, 10, 4] to solve the energy minimization problem in each iteration (at the current level of refinement of the graph) and hence they can complement our hierarchical inference algorithm. [sent-93, score-0.366]
25 We would like to emphasize that further advances in supervoxel tree creation and energy minimization (both being integral components of our approach) will further increase the speedup of our hierarchical algorithm. [sent-95, score-0.99]
26 For the sake of concreteness, we will describe our formulation using a RF whose nodes are the supervoxels in a video. [sent-98, score-0.537]
27 , L}, which represents the category label at supervoxel $v_i \in V$. [sent-105, score-0.74]
28 , $e_{ij} \in E$ if supervoxels $i$ and $j$ share a common boundary. [sent-110, score-0.492]
29 The labeling of all the nodes in clique $c$ is denoted by a vector $x_c \in L^{|c|}$, while the labeling of all the supervoxels in a video is denoted by $x \in L^{|V|}$. [sent-113, score-0.742]
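30 $E_V(x, V) = \sum_{v_i \in V} \psi^U_i(x_i, V) + \sum_{e_{ij} \in E} \psi^P_{ij}(x_i, x_j, V) + \sum_{c} \psi^H_c(x_c, V)$ (1). The unary potential, $\psi^U_i(x_i, V)$, captures the cost of assigning the label $x_i \in L$ to the supervoxel $v_i$ in video $V$. [sent-123, score-0.955]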
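To make the three kinds of terms concrete, here is a minimal sketch that evaluates an energy with unary, pairwise and higher-order terms for a given labeling. The representation (a list of per-node cost lists, a dict of edge functions, a list of (clique, function) pairs) and the toy Potts and clique costs are illustrative assumptions, not the paper's potentials.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Unary = List[List[float]]                                       # unary[i][l]
Pairwise = Dict[Tuple[int, int], Callable[[int, int], float]]   # (i, j) -> cost(x_i, x_j)
HigherOrder = Sequence[Tuple[Tuple[int, ...], Callable[[Tuple[int, ...]], float]]]


def energy(x: Sequence[int], unary: Unary, pairwise: Pairwise,
           higher_order: HigherOrder = ()) -> float:
    """Sum of the unary, pairwise and higher-order costs of labeling x."""
    e = sum(unary[i][xi] for i, xi in enumerate(x))
    e += sum(f(x[i], x[j]) for (i, j), f in pairwise.items())
    e += sum(f(tuple(x[i] for i in clique)) for clique, f in higher_order)
    return e


def potts(w: float) -> Callable[[int, int], float]:
    """Toy pairwise term: cost w when the two labels differ, 0 otherwise."""
    return lambda a, b: w if a != b else 0.0


# Toy chain of 3 supervoxels with 2 labels and one clique over all of them.
unary = [[0.0, 2.0], [1.0, 0.0], [0.0, 3.0]]
pairwise = {(0, 1): potts(1.0), (1, 2): potts(1.0)}
consistency = [((0, 1, 2), lambda xc: 5.0 if len(set(xc)) > 1 else 0.0)]

print(energy((0, 0, 0), unary, pairwise))               # 1.0
print(energy((0, 1, 0), unary, pairwise))               # 2.0
print(energy((0, 1, 0), unary, pairwise, consistency))  # 7.0
```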
31 Unary potentials are usually obtained by training a classifier for every class on appropriate supervoxel descriptors extracted from the videos in the training data. [sent-124, score-0.773]
32 The higher-order potential $\psi^H_c(x_c, V)$ for video $V$ captures the cost of assigning the labeling $x_c$ to the supervoxels inside clique $c$, and can be used to measure the consistency of the labels of all supervoxels inside $c$. [sent-127, score-1.274]
33 Therefore, energy minimization is usually done using approximate inference methods such as graph cuts, belief propagation, and their extensions to higher-order potentials. [sent-133, score-0.315]
34 For instance, in the case of a video with around 100 frames, each one having a resolution of 960 × 720 pixels, the number of supervoxels could easily be on the order of 100,000. [sent-135, score-0.556]
35 Our approach exploits the fact that labels are coherent both in space and in time, hence we expect many large, contiguous patches of supervoxels to have the same category label. [sent-139, score-0.546]
36 Supervoxel hierarchy. The first step in our approach is the construction of a hierarchical supervoxel tree [13, 18]. [sent-142, score-0.884]
37 , level m) contains the biggest supervoxels and the finest level (i. [sent-145, score-0.825]
38 A supervoxel at site $i$ and level $j$ is denoted by $v_i^j$ and its label by $x_i^j$. [sent-148, score-0.889]
39 The set of all supervoxels at level j is denoted by Vj and its labeling by xj . [sent-149, score-0.605]
40 The refinement of a supervoxel $v_i^j$ ($j \ge 2$) is the set of supervoxels at the next finer level ($j - 1$) that occupy the same set of voxels in the video as $v_i^j$. [sent-150, score-1.286]
41 For $k < j$, we also let $R(i, j, k) \subset V_k$ denote the set of supervoxels obtained by refining $v_i^j$ $(j - k)$ times. [sent-152, score-0.492]
42 The reverse function $\mathrm{Parent} : V_j \to V_{j+1}$ maps a supervoxel to its parent supervoxel at the next coarser level. [sent-153, score-1.357]
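The refinement map $R(i, j, k)$ and the $\mathrm{Parent}$ function fit in a very small data structure. The sketch below is a hypothetical representation (the class name `SupervoxelTree` and the `children` dictionary keyed by `(site, level)` are assumptions), with level 1 as the finest level and supervoxel sites numbered independently within each level.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (site i, level j)


@dataclass
class SupervoxelTree:
    # children[(i, j)] lists the sites at level j - 1 that refine v_i^j (j >= 2).
    children: Dict[Node, List[int]] = field(default_factory=dict)

    def refine(self, i: int, j: int, k: int) -> List[Node]:
        """R(i, j, k): supervoxels at level k < j obtained by refining v_i^j (j - k) times."""
        frontier = [(i, j)]
        for _ in range(j - k):
            frontier = [(c, level - 1)
                        for (site, level) in frontier
                        for c in self.children[(site, level)]]
        return frontier

    def parent(self, i: int, j: int) -> Node:
        """Parent: V_j -> V_{j+1}; unique because the hierarchy is a tree."""
        for (site, level), kids in self.children.items():
            if level == j + 1 and i in kids:
                return (site, level)
        raise KeyError(f"v_{i}^{j} has no parent (is it at the coarsest level?)")


# Toy 3-level tree: one coarsest supervoxel, two at level 2, five at level 1.
tree = SupervoxelTree({(0, 3): [0, 1], (0, 2): [0, 1], (1, 2): [2, 3, 4]})
print(tree.refine(0, 3, 2))  # [(0, 2), (1, 2)]
print(tree.refine(0, 3, 1))  # [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1)]
print(tree.parent(3, 1))     # (1, 2)
```

The linear scan in `parent` keeps the sketch short; a larger tree would store an explicit parent map alongside `children`.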
43 In this paper, we will only consider hierarchical supervoxel trees, and hence each supervoxel has a unique parent. [sent-154, score-1.382]
44 The supervoxel hierarchy can be obtained by running any of the existing hierarchical video segmentation algorithms, such as those in [18, 31]. [sent-155, score-0.96]
45 By varying this parameter, we can get the desired supervoxel hierarchy. [sent-157, score-0.659]
46 Coarse-to-fine inference scheme. Given a hierarchical supervoxel tree, we propose a coarse-to-fine algorithm for efficient inference. [sent-160, score-0.849]
47 The likely scenario is when all the (contiguous) supervoxels at level $j - 1$ that constitute a supervoxel at level $j$ get the same label from the set $L$. [sent-162, score-1.301]
48 The unlikely scenario is when a supervoxel at level $j$ has constituents with more than one label from $L$. [sent-163, score-0.659]
49 To represent the latter scenario, we introduce a new label $L + 1$ to denote the case where a supervoxel $v_i^j$ ($j \ge 2$) has constituents with more than one label. [sent-165, score-0.819]
50 Of course, only supervoxels that can be further refined can have the mixed label, i. [sent-167, score-0.605]
51 We start by finding a labeling for the coarsest supervoxels in $V_m$ from the augmented label set $L \cup \{L + 1\}$. [sent-177, score-0.687]
52 This labeling is found by minimizing $E_{V_m}(x, V)$ using some inference algorithm $A$, which can be graph cuts, belief propagation or some linear program, depending on the form of the energy function being optimized. [sent-178, score-0.342]
53 All current supervoxels (in $V_m$) that receive the label $L + 1$ are replaced in the current optimization problem (at level $m$) by their constituent supervoxels from the next finer level ($m - 1$). [sent-179, score-1.272]
54 This refinement is always possible, since a supervoxel can only receive the mixed label if it can be further refined. [sent-180, score-0.833]
55 For example, we can have a pair of neighboring supervoxels $v_{i_1}^{j_1}$ and $v_{i_2}^{j_2}$ with $j_1 \neq j_2$. [sent-184, score-0.492]
56 As before, we can obtain a labeling for the supervoxels in $V_{curr}$ by minimizing $E_{V_{curr}}(x, V)$ using algorithm $A$. [sent-188, score-0.563]
57 We can then refine a supervoxel $v_i^j$ that receives the mixed [sent-189, score-0.659]
58 label $L + 1$ by replacing it with its constituent supervoxels in $R(i, j, j - 1)$. [sent-190, score-0.548]
59 We repeat this process iteratively, until all supervoxels receive pure labels. [sent-191, score-0.552]
60 Since every supervoxel eventually refines to its finest constituents, which in turn can only take pure labels, this process is guaranteed to terminate. [sent-192, score-0.908]
61 Also, at any point in the algorithm, there exists exactly one ancestor of every finest-level supervoxel $v_i^1$ in the current set of supervoxels. [sent-193, score-0.931]
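One refinement step, together with a check of the invariant that every finest-level supervoxel has exactly one ancestor in the current set, can be sketched as follows. The `(site, level)` node encoding, the `children` dictionary and the function names are assumptions for illustration; pure labels are encoded as `0 … L-1` and the mixed label $L + 1$ as the integer `L`. The labeling passed in would come from whatever inference algorithm $A$ is used.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Node = Tuple[int, int]  # (site, level); level 1 is the finest level


def refine_mixed(current: Sequence[Node], labels: Sequence[int], mixed: int,
                 children: Dict[Node, List[int]]) -> List[Node]:
    """Replace every current supervoxel labeled `mixed` by its children one level down."""
    refined: List[Node] = []
    for (site, level), label in zip(current, labels):
        if label == mixed:
            refined.extend((c, level - 1) for c in children[(site, level)])
        else:
            refined.append((site, level))
    return refined


def finest_descendants(node: Node, children: Dict[Node, List[int]]) -> List[int]:
    site, level = node
    if level == 1:
        return [site]
    return [f for c in children[node] for f in finest_descendants((c, level - 1), children)]


def has_unique_ancestors(current: Sequence[Node], children: Dict[Node, List[int]],
                         num_finest: int) -> bool:
    """True if each finest-level supervoxel lies under exactly one current supervoxel."""
    counts = Counter(f for node in current for f in finest_descendants(node, children))
    return len(counts) == num_finest and all(counts[f] == 1 for f in range(num_finest))


children = {(0, 3): [0, 1], (0, 2): [0, 1], (1, 2): [2, 3, 4]}
MIXED = 2  # two pure labels {0, 1}, so the auxiliary label is encoded as 2
step1 = refine_mixed([(0, 3)], [MIXED], MIXED, children)
step2 = refine_mixed(step1, [0, MIXED], MIXED, children)
print(step2, has_unique_ancestors(step2, children, num_finest=5))
# [(0, 2), (2, 1), (3, 1), (4, 1)] True
```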
62 Exactness of the coarse-to-fine solution. To make our coarse-to-fine inference scheme converge to the same labeling as that obtained by running $A$ on the finest level of the supervoxel hierarchy (e. [sent-197, score-1.159]
63 , a flat graph cuts algorithm), the potentials of the energy at a coarse level, $E_{V_{curr}}$, need to be chosen in a specific manner. [sent-199, score-0.264]
64 Since our goal is to minimize the energy function $E$, the admissible heuristics for the unary, pairwise and higher-order potentials of $E_{V_{curr}}$ need to be chosen as lower bounds for the values of the corresponding potentials of $E$. [sent-201, score-0.426]
65 Specifically, let x denote any label assignment for the finest level supervoxels and let x∗ denote the optimal labeling. [sent-202, score-0.823]
66 These inequalities ensure that when we terminate upon finding a pure labeling for the current set of supervoxels (at various levels), all other possible assignments have a higher or equal cost (since their lower bound cost is worse than the current optimal cost of the pure labeling). [sent-205, score-0.804]
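One way to spell out this argument, as a sketch under stated assumptions rather than a restatement of the paper's own inequalities: let $\pi(x)$ map a fine labeling $x$ to the labeling of the current supervoxels that gives a supervoxel the label $l$ when all of its level-1 constituents take $l$, and the mixed label $L + 1$ otherwise, and let $\mathrm{expand}(\hat{x})$ copy each pure label of $\hat{x}$ down to the level-1 constituents. Assume (a) $E_{V_{curr}}(\pi(x), V) \le E(x, V)$ for every fine labeling $x$, (b) equality $E_{V_{curr}}(\hat{x}, V) = E(\mathrm{expand}(\hat{x}), V)$ whenever $\hat{x}$ contains no mixed label, and (c) the current problem is minimized exactly.

```latex
\begin{aligned}
&\hat{x}^* \in \arg\min_{\hat{x}} E_{V_{curr}}(\hat{x}, V), \qquad \hat{x}^* \text{ pure}, \qquad x \text{ any fine labeling:} \\
E(x, V) &\ge E_{V_{curr}}(\pi(x), V)              && \text{by (a)} \\
        &\ge E_{V_{curr}}(\hat{x}^*, V)           && \text{by optimality of } \hat{x}^* \\
        &=   E(\mathrm{expand}(\hat{x}^*), V)     && \text{by (b)}
\end{aligned}
```

Hence $\mathrm{expand}(\hat{x}^*)$ attains the minimum of $E$ over all fine labelings. For the unary and pairwise constructions given in this section, (a) and (b) hold if, for instance, every pairwise term is non-negative and vanishes when both labels agree (as in a Potts model); that condition is an assumption of this sketch. With an approximate $A$, the chain only compares against what $A$ returns, not against the global minimum.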
67 3, in order for Algorithm 1 to converge to the same solution as that obtained by running $A$ on the finest level, the potentials associated with the nodes at the coarse levels should be lower bounds on the cost associated with the patches of fine nodes constituting these coarse nodes. [sent-215, score-0.585]
68 For the sake of simplicity, we discuss the construction of the lower bounds for an energy function consisting of unary and pairwise terms only. [sent-217, score-0.262]
69 We define the unary cost $\psi^U_{(i,j)}(x_i^j)$ of assigning a pure label $l \in L$ to a coarse supervoxel $v_i^j$ at level $j$ as the sum of the unary costs of assigning label $l$ to all the nodes at level 1 that constitute $v_i^j$, i.e., $\psi^U_{(i,j)}(l) = \sum_{v_k^1 \in R(i, j, 1)} \psi^U_{(k,1)}(l)$. [sent-220, score-1.416]
70 We define the unary cost of assigning a mixed label to a coarse supervoxel as the minimum cost associated with the RF defined by the constituent supervoxels at level 1, subject to the constraint that not all the constituent supervoxels get the same label. [sent-228, score-2.234]
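71 We define the pairwise potential of coarse supervoxels $v_{i_1}^{j_1}$ and $v_{i_2}^{j_2}$, $\psi^P_{(i_1,j_1)(i_2,j_2)}(x_{i_1}^{j_1}, x_{i_2}^{j_2})$, as $\psi^P_{(i_1,j_1)(i_2,j_2)}(l_1, l_2) = \sum_{v_{k_1}^1 \in R(i_1, j_1, 1)} \sum_{v_{k_2}^1 \in R(i_2, j_2, 1)} [e_{k_1 k_2} \in E]\, \psi^P_{(k_1,1)(k_2,1)}(l_1, l_2)$ for $l_1, l_2 \in L$, and $0$ if $l_1 = L + 1$ or $l_2 = L + 1$. [sent-234, score-0.609]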
72 Therefore, the cost of the edge between two coarse supervoxels $v_{i_1}^{j_1}$ and $v_{i_2}^{j_2}$ is the sum of the costs of the edges connecting the constituent supervoxels of $v_{i_1}^{j_1}$ and $v_{i_2}^{j_2}$ at level 1. [sent-237, score-1.258]
73 In the case where one of the supervoxels gets the mixed label, the potential associated to the edge is set to zero. [sent-238, score-0.604]
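Putting the pieces together, the following is a toy, brute-force sketch of the coarse-to-fine loop with the lower-bound potentials described above, restricted to unary terms and Potts pairwise terms. Exhaustive search stands in for the inference algorithm $A$, so it only scales to a handful of nodes; the `(site, level)` encoding, the `children` and `edges` dictionaries, and all function names are assumptions made for illustration. On the toy chain at the end it returns the same minimum energy as flat exhaustive search over the finest level.

```python
import itertools
import math
from functools import lru_cache


def coarse_to_fine(children, roots, unary, edges, num_labels):
    """children[(site, level)]: child sites at level - 1 (level 1 = finest nodes).
    unary[f][l]: finest-level costs; edges[(u, v)]: Potts weight between finest nodes.
    Returns (finest labeling as a dict, its energy)."""
    mixed = num_labels  # 0-indexed encoding of the auxiliary label L + 1

    @lru_cache(maxsize=None)
    def leaves(node):
        site, level = node
        if level == 1:
            return (site,)
        return tuple(f for c in children[node] for f in leaves((c, level - 1)))

    def fine_energy(x):
        """Unary plus Potts cost of a partial labeling x (dict); only edges inside x count."""
        e = sum(unary[f][l] for f, l in x.items())
        return e + sum(w for (u, v), w in edges.items()
                       if u in x and v in x and x[u] != x[v])

    @lru_cache(maxsize=None)
    def node_unary(node, label):
        fs = leaves(node)
        if label != mixed:  # pure: sum of the constituents' unary costs
            return sum(unary[f][label] for f in fs)
        # mixed: cheapest labeling of the constituents that is not constant
        return min((fine_energy(dict(zip(fs, ls)))
                    for ls in itertools.product(range(num_labels), repeat=len(fs))
                    if len(set(ls)) > 1), default=math.inf)

    def node_pairwise(a, la, b, lb):
        if mixed in (la, lb) or la == lb:  # zero if either side is mixed
            return 0.0
        fa, fb = set(leaves(a)), set(leaves(b))
        return sum(w for (u, v), w in edges.items()
                   if (u in fa and v in fb) or (u in fb and v in fa))

    def solve(nodes):
        """Stand-in for algorithm A: exhaustive search over L + 1 labels per node."""
        def cost(x):
            return (sum(node_unary(n, l) for n, l in zip(nodes, x))
                    + sum(node_pairwise(nodes[p], x[p], nodes[q], x[q])
                          for p, q in itertools.combinations(range(len(nodes)), 2)))
        return min(itertools.product(range(num_labels + 1), repeat=len(nodes)), key=cost)

    current = list(roots)
    while True:
        x = solve(current)
        if mixed not in x:  # all labels pure: expand to the finest level and stop
            labeling = {f: l for n, l in zip(current, x) for f in leaves(n)}
            return labeling, fine_energy(labeling)
        current = [child for n, l in zip(current, x)
                   for child in ([(c, n[1] - 1) for c in children[n]] if l == mixed else [n])]


def flat_minimum(unary, edges, num_labels):
    """Exhaustive minimization directly over the finest level, for comparison."""
    n = len(unary)
    return min(sum(unary[f][x[f]] for f in range(n))
               + sum(w for (u, v), w in edges.items() if x[u] != x[v])
               for x in itertools.product(range(num_labels), repeat=n))


# Toy chain 0-1-2-3 with two labels; nodes {0, 1} prefer label 0, nodes {2, 3} prefer label 1.
children = {(0, 3): [0, 1], (0, 2): [0, 1], (1, 2): [2, 3]}
unary = [[0, 4], [0, 4], [4, 0], [4, 0]]
edges = {(0, 1): 1, (1, 2): 1, (2, 3): 1}
labeling, e = coarse_to_fine(children, [(0, 3)], unary, edges, num_labels=2)
print(labeling, e)                    # {0: 0, 1: 0, 2: 1, 3: 1} 1
print(flat_minimum(unary, edges, 2))  # 1
```

Here the root first receives the mixed label, is refined once, and the two level-2 supervoxels then take pure labels, so the finest level is never solved directly.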
74 Practical considerations. As discussed earlier, in general it is very hard to know if the proposed scheme is more efficient than direct inference over the finest layer. [sent-242, score-0.337]
75 Consider two scenarios: all the lower bounds on potentials in scenario 1 are tighter than the corresponding bounds in scenario 2, for the same supervoxel hierarchy. [sent-245, score-0.951]
76 In most cases, only a small number of nodes in the supervoxel tree are used in the entire inference procedure. [sent-252, score-0.84]
77 Thus, we can save computation time by not computing the entire supervoxel tree upfront, and only refining the supervoxels with the mixed label when needed. [sent-253, score-1.353]
78 This on-demand refinement scheme, however, can be more expensive if we end up expanding most of the nodes in the supervoxel tree. [sent-254, score-0.749]
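A minimal sketch of the on-demand idea, assuming the hierarchical segmentation can be asked for the children of one supervoxel at a time: memoize a hypothetical `compute_children` callback so a node is split only the first time it receives the mixed label, and record which nodes were actually expanded. The `halve` callback is a stand-in that splits index ranges in half; a real system would call the supervoxel segmentation instead.

```python
from functools import lru_cache
from typing import Callable, Tuple

Node = Tuple[int, int]  # (start, stop) of a range of finest-level supervoxels


def make_lazy_children(compute_children: Callable[[Node], Tuple[Node, ...]]):
    """Wrap an expensive child-generation routine so each node is expanded at most once."""
    expanded = []

    @lru_cache(maxsize=None)
    def children(node: Node) -> Tuple[Node, ...]:
        expanded.append(node)  # runs only on a cache miss
        return compute_children(node)

    return children, expanded


def halve(node: Node) -> Tuple[Node, ...]:
    """Stand-in for the segmentation routine: split a range of finest nodes in two."""
    start, stop = node
    mid = (start + stop) // 2
    return ((start, mid), (mid, stop))


children, expanded = make_lazy_children(halve)
root = (0, 1024)              # 1,024 finest-level supervoxels, hypothetically
left, right = children(root)  # the root received the mixed label
children(right)               # only the right half was mixed
children(root)                # repeated request: served from the cache
print(len(expanded))          # 2 nodes expanded, instead of 1,023 for a full binary tree
```

The trade-off noted above shows up directly: if most nodes end up mixed, every node is expanded anyway and the per-call overhead is paid on top.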
79 Such a hierarchical scheme would have a mixed label at every label level. [sent-260, score-0.315]
80 For more details on how to manage a label hierarchy simultaneously with a supervoxel hierarchy, we refer the reader to [8], where such a scenario is considered. [sent-261, score-0.857]
81 Moreover, since our goal is to find an optimal labeling for the finest layer of the hierarchy only, the experiments are not designed to find the optimal labeling at every layer of the hierarchy. [sent-265, score-0.426]
82 We simply use the more abstract layers and the associated lower bound costs to find the optimal labeling at the finest layer. [sent-266, score-0.326]
83 While the proposed coarse-to-fine scheme can be exponentially faster than flat optimization in the best case, it can also be much slower when all the supervoxels need to be refined down to their finest level. [sent-267, score-0.843]
84 The unary potential $\psi^U_i(x, V)$ is defined as the cost of assigning a class label $x$ to supervoxel $v_i$. [sent-291, score-0.888]
85 This cost is obtained as the score of an SVM classifier applied to the descriptor di of supervoxel vi. [sent-292, score-0.707]
86 This classifier is trained on the supervoxel descriptors for each class. [sent-293, score-0.659]
87 The supervoxel descriptor needs to be chosen such that it captures the discriminative characteristics of the supervoxels (both appearance and motion attributes) across various classes. [sent-295, score-1.151]
88 For our experiments, we used 5 levels (including the coarsest and finest levels). [sent-301, score-0.322]
89 The number of supervoxels at the two extreme levels and time taken in minutes (using the on-demand generation scheme) is detailed in Table 2. [sent-302, score-0.519]
90 … = x_j) (4), where $l_{ij}$ is the area of the common boundary between the supervoxels $v_i$ and $v_j$, and $I_i$ denotes the average intensity of supervoxel $v_i$. [sent-305, score-1.209]
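Equation (4) itself did not survive extraction; only its dependence on the shared boundary area $l_{ij}$ and the mean intensities $I_i$, $I_j$ is recoverable. As a hedged sketch, a common contrast-sensitive Potts form built from these ingredients is $\psi^P_{ij}(x_i, x_j) = l_{ij} \exp(-\beta (I_i - I_j)^2)\,[x_i \neq x_j]$; the exponential, the parameter $\beta$ and its value are assumptions here, not the paper's definition.

```python
import math


def contrast_potts(x_i: int, x_j: int, l_ij: float, I_i: float, I_j: float,
                   beta: float = 0.01) -> float:
    """Assumed contrast-sensitive Potts term: zero for equal labels, otherwise a cost
    that grows with the shared boundary area and shrinks with the intensity contrast."""
    if x_i == x_j:
        return 0.0
    return l_ij * math.exp(-beta * (I_i - I_j) ** 2)


# Hypothetical neighbors sharing a boundary of area 40 (arbitrary units).
print(contrast_potts(0, 0, 40.0, 100.0, 110.0))  # 0.0: same label, no cost
print(contrast_potts(0, 1, 40.0, 100.0, 100.0))  # 40.0: cutting a flat region costs the full area
print(contrast_potts(0, 1, 40.0, 100.0, 110.0))  # about 14.7: cheaper cut across an intensity edge
```

A term of this shape is non-negative and vanishes when the labels agree, which is the kind of pairwise cost for which the coarse lower bounds described in this section are exact on pure labelings.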
91 We used the LIBSVX [2] library’s implementation of a graph-based hierarchical method [18] to generate the supervoxel trees. [sent-310, score-0.723]
92 Notice that these speedups do not account for the supervoxel tree creation time, which is only required by our algorithm. [sent-317, score-0.737]
93 Thus, our approach provides significant speed up, even including the time to construct the supervoxel tree. [sent-320, score-0.659]
94 Besides computation time, it is also informative to look at the explored portions of the supervoxel tree to get a better understanding of where the computational savings come from. [sent-325, score-0.737]
95 The blacked out superpixels at each level are the ones which never received the mixed label and hence were never refined. [sent-329, score-0.288]
96 We do not show any segmentation results in the paper since the quality of the segmentation is the same as what would be obtained by using α-expansion or belief propagation with the chosen energy function. [sent-332, score-0.352]
97 … do not include supervoxel tree computation time. [sent-348, score-0.709]
98 A flat problem formulation works with the latter large set, while we use an abstraction scheme (namely supervoxel trees) to identify the former smaller set and work on the smaller problem. [sent-361, score-0.841]
99 Percentage of correctly classified supervoxels after every iteration of the coarse-to-fine belief propagation algorithm. [sent-363, score-0.636]
100 It is also exact since it uses admissible heuristic costs for the coarser supervoxel potentials. [sent-365, score-0.827]
wordName wordTfidf (topN-words)
[('supervoxel', 0.659), ('supervoxels', 0.492), ('finest', 0.211), ('suny', 0.127), ('hierarchy', 0.111), ('speedup', 0.107), ('mixed', 0.093), ('camvid', 0.093), ('abstraction', 0.091), ('inference', 0.086), ('coarsest', 0.084), ('energy', 0.084), ('belief', 0.081), ('potentials', 0.075), ('evcurr', 0.075), ('vcurr', 0.075), ('vij', 0.071), ('unary', 0.07), ('coarse', 0.067), ('video', 0.064), ('hierarchical', 0.064), ('propagation', 0.063), ('admissible', 0.063), ('segmentation', 0.062), ('level', 0.061), ('label', 0.059), ('constituent', 0.056), ('bounds', 0.056), ('labeling', 0.052), ('flat', 0.051), ('tree', 0.05), ('cost', 0.048), ('rf', 0.047), ('refinement', 0.045), ('nodes', 0.045), ('superpixels', 0.045), ('xvcurr', 0.045), ('costs', 0.042), ('cuts', 0.042), ('labelings', 0.041), ('scheme', 0.04), ('spikes', 0.04), ('xij', 0.039), ('coarser', 0.039), ('videos', 0.039), ('pure', 0.038), ('graph', 0.038), ('xc', 0.037), ('vj', 0.036), ('assigning', 0.033), ('continuity', 0.031), ('pairwise', 0.031), ('labels', 0.03), ('blacked', 0.03), ('constituents', 0.03), ('evm', 0.03), ('ipj', 0.03), ('conference', 0.029), ('finer', 0.029), ('exponentially', 0.029), ('tighter', 0.028), ('speedups', 0.028), ('portions', 0.028), ('scenario', 0.028), ('levels', 0.027), ('exactness', 0.026), ('xcf', 0.026), ('minimization', 0.026), ('cliques', 0.025), ('chatterjee', 0.024), ('pylon', 0.024), ('python', 0.024), ('exact', 0.024), ('temporally', 0.024), ('contiguous', 0.024), ('bottomup', 0.023), ('integer', 0.023), ('semantic', 0.023), ('receive', 0.022), ('vi', 0.022), ('iu', 0.022), ('heuristics', 0.021), ('lower', 0.021), ('segmentations', 0.021), ('refinements', 0.021), ('ieee', 0.021), ('ice', 0.02), ('eed', 0.02), ('refined', 0.02), ('higherorder', 0.019), ('viterbi', 0.019), ('minimizing', 0.019), ('potential', 0.019), ('assignments', 0.019), ('vm', 0.019), ('strategy', 0.018), ('termination', 0.018), ('superpixel', 0.018), ('frames', 0.018), ('ladicky', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999899 76 iccv-2013-Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees
2 0.56362724 172 iccv-2013-Flattening Supervoxel Hierarchies by the Uniform Entropy Slice
Author: Chenliang Xu, Spencer Whitt, Jason J. Corso
Abstract: Supervoxel hierarchies provide a rich multiscale decomposition of a given video suitable for subsequent processing in video analysis. The hierarchies are typically computed by an unsupervised process that is susceptible to undersegmentation at coarse levels and over-segmentation at fine levels, which make it a challenge to adopt the hierarchies for later use. In this paper, we propose the first method to overcome this limitation and flatten the hierarchy into a single segmentation. Our method, called the uniform entropy slice, seeks a selection of supervoxels that balances the relative level of information in the selected supervoxels based on some post hoc feature criterion such as objectness. For example, with this criterion, in regions nearby objects, our method prefers finer supervoxels to capture the local details, but in regions away from any objects we prefer coarser supervoxels. We formulate the uniform entropy slice as a binary quadratic program and implement four different feature criteria, both unsupervised and supervised, to drive the flattening. Although we apply it only to supervoxel hierarchies in this paper, our method is generally applicable to segmentation tree hierarchies. Our experiments demonstrate both strong qualitative performance and superior quantitative performance to state of the art baselines on benchmark internet videos.
3 0.32415453 414 iccv-2013-Temporally Consistent Superpixels
Author: Matthias Reso, Jörn Jachalsky, Bodo Rosenhahn, Jörn Ostermann
Abstract: Superpixel algorithms represent a very useful and increasingly popular preprocessing step for a wide range of computer vision applications, as they offer the potential to boost efficiency and effectiveness. In this regard, this paper presents a highly competitive approach for temporally consistent superpixels for video content. The approach is based on energy-minimizing clustering utilizing a novel hybrid clustering strategy for a multi-dimensional feature space working in a global color subspace and local spatial subspaces. Moreover, a new contour evolution based strategy is introduced to ensure spatial coherency of the generated superpixels. For a thorough evaluation the proposed approach is compared to state of the art supervoxel algorithms using established benchmarks and shows a superior performance.
4 0.14323251 33 iccv-2013-A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis
Author: Fabio Galasso, Naveen Shankar Nagaraja, Tatiana Jiménez Cárdenas, Thomas Brox, Bernt Schiele
Abstract: Video segmentation research is currently limited by the lack of a benchmark dataset that covers the large variety of subproblems appearing in video segmentation and that is large enough to avoid overfitting. Consequently, there is little analysis of video segmentation which generalizes across subtasks, and it is not yet clear which and how video segmentation should leverage the information from the still-frames, as previously studied in image segmentation, alongside video specific information, such as temporal volume, motion and occlusion. In this work we provide such an analysis based on annotations of a large video dataset, where each video is manually segmented by multiple persons. Moreover, we introduce a new volume-based metric that includes the important aspect of temporal consistency, that can deal with segmentation hierarchies, and that reflects the tradeoff between over-segmentation and segmentation accuracy.
5 0.11276389 42 iccv-2013-Active MAP Inference in CRFs for Efficient Semantic Segmentation
Author: Gemma Roig, Xavier Boix, Roderick De_Nijs, Sebastian Ramos, Koljia Kuhnlenz, Luc Van_Gool
Abstract: Most MAP inference algorithms for CRFs optimize an energy function knowing all the potentials. In this paper, we focus on CRFs where the computational cost of instantiating the potentials is orders of magnitude higher than MAP inference. This is often the case in semantic image segmentation, where most potentials are instantiated by slow classifiers fed with costly features. We introduce Active MAP inference 1) to on-the-fly select a subset of potentials to be instantiated in the energy function, leaving the rest of the parameters of the potentials unknown, and 2) to estimate the MAP labeling from such incomplete energy function. Results for semantic segmentation benchmarks, namely PASCAL VOC 2010 [5] and MSRC-21 [19], show that Active MAP inference achieves similar levels of accuracy but with major efficiency gains.
6 0.082487449 282 iccv-2013-Multi-view Object Segmentation in Space and Time
7 0.082113408 165 iccv-2013-Find the Best Path: An Efficient and Accurate Classifier for Image Hierarchies
8 0.079371221 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
9 0.077713311 132 iccv-2013-Efficient 3D Scene Labeling Using Fields of Trees
10 0.077436052 150 iccv-2013-Exemplar Cut
11 0.073541693 299 iccv-2013-Online Video SEEDS for Temporal Window Objectness
12 0.073142901 144 iccv-2013-Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors
13 0.069415972 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
14 0.068008624 448 iccv-2013-Weakly Supervised Learning of Image Partitioning Using Decision Trees with Structured Split Criteria
15 0.067125887 309 iccv-2013-Partial Enumeration and Curvature Regularization
16 0.065315217 81 iccv-2013-Combining the Right Features for Complex Event Recognition
17 0.063742042 447 iccv-2013-Volumetric Semantic Segmentation Using Pyramid Context Features
18 0.061164733 432 iccv-2013-Uncertainty-Driven Efficiently-Sampled Sparse Graphical Models for Concurrent Tumor Segmentation and Atlas Registration
19 0.058222748 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
20 0.058152612 379 iccv-2013-Semantic Segmentation without Annotating Segments
topicId topicWeight
[(0, 0.144), (1, -0.016), (2, 0.03), (3, 0.027), (4, 0.055), (5, 0.068), (6, -0.099), (7, 0.079), (8, 0.04), (9, -0.134), (10, -0.047), (11, 0.105), (12, 0.04), (13, 0.077), (14, -0.026), (15, 0.074), (16, -0.097), (17, -0.11), (18, -0.126), (19, -0.012), (20, 0.022), (21, -0.13), (22, -0.049), (23, 0.006), (24, -0.181), (25, -0.139), (26, -0.013), (27, -0.025), (28, -0.082), (29, -0.003), (30, 0.145), (31, -0.12), (32, -0.101), (33, -0.065), (34, 0.208), (35, -0.263), (36, 0.155), (37, -0.135), (38, -0.016), (39, 0.117), (40, -0.006), (41, -0.152), (42, -0.012), (43, -0.171), (44, 0.077), (45, 0.107), (46, -0.134), (47, 0.011), (48, -0.155), (49, 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.92057037 76 iccv-2013-Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees
2 0.8742435 172 iccv-2013-Flattening Supervoxel Hierarchies by the Uniform Entropy Slice
3 0.64673012 414 iccv-2013-Temporally Consistent Superpixels
4 0.51067102 299 iccv-2013-Online Video SEEDS for Temporal Window Objectness
Author: Michael Van den Bergh, Gemma Roig, Xavier Boix, Santiago Manen, Luc Van Gool
Abstract: Superpixel and objectness algorithms are broadly used as a pre-processing step to generate support regions and to speed up further computations. Recently, many algorithms have been extended to video in order to exploit the temporal consistency between frames. However, most methods are computationally too expensive for real-time applications. We introduce an online, real-time video superpixel algorithm based on the recently proposed SEEDS superpixels. A new capability is incorporated which delivers multiple diverse samples (hypotheses) of superpixels in the same image or video sequence. The multiple samples are shown to provide a strong cue to efficiently measure the objectness of image windows, and we introduce the novel concept of objectness in temporal windows. Experiments show that the video superpixels achieve comparable performance to state-of-the-art offline methods while running at 30 fps on a single 2.8 GHz i7 CPU. State-of-the-art performance on objectness is also demonstrated, yet orders of magnitude faster and extended to temporal windows in video.
5 0.44745857 33 iccv-2013-A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis
Author: Fabio Galasso, Naveen Shankar Nagaraja, Tatiana Jiménez Cárdenas, Thomas Brox, Bernt Schiele
Abstract: Video segmentation research is currently limited by the lack of a benchmark dataset that covers the large variety of subproblems appearing in video segmentation and that is large enough to avoid overfitting. Consequently, there is little analysis of video segmentation which generalizes across subtasks, and it is not yet clear whether and how video segmentation should leverage the information from still frames, as previously studied in image segmentation, alongside video-specific information, such as temporal volume, motion and occlusion. In this work we provide such an analysis based on annotations of a large video dataset, where each video is manually segmented by multiple persons. Moreover, we introduce a new volume-based metric that includes the important aspect of temporal consistency, that can deal with segmentation hierarchies, and that reflects the tradeoff between over-segmentation and segmentation accuracy.
6 0.37707877 42 iccv-2013-Active MAP Inference in CRFs for Efficient Semantic Segmentation
7 0.3603237 282 iccv-2013-Multi-view Object Segmentation in Space and Time
8 0.34016344 447 iccv-2013-Volumetric Semantic Segmentation Using Pyramid Context Features
10 0.31247503 395 iccv-2013-Slice Sampling Particle Belief Propagation
11 0.29769695 145 iccv-2013-Estimating the Material Properties of Fabric from Video
13 0.29326844 165 iccv-2013-Find the Best Path: An Efficient and Accurate Classifier for Image Hierarchies
14 0.29228565 329 iccv-2013-Progressive Multigrid Eigensolvers for Multiscale Spectral Segmentation
15 0.28929615 324 iccv-2013-Potts Model, Parametric Maxflow and K-Submodular Functions
16 0.2856217 274 iccv-2013-Monte Carlo Tree Search for Scheduling Activity Recognition
17 0.28295937 110 iccv-2013-Detecting Curved Symmetric Parts Using a Deformable Disc Model
18 0.27368051 132 iccv-2013-Efficient 3D Scene Labeling Using Fields of Trees
19 0.26874429 150 iccv-2013-Exemplar Cut
20 0.24652548 148 iccv-2013-Example-Based Facade Texture Synthesis
topicId topicWeight
[(2, 0.057), (7, 0.012), (12, 0.018), (26, 0.119), (31, 0.054), (35, 0.014), (40, 0.014), (42, 0.08), (48, 0.011), (64, 0.037), (73, 0.022), (86, 0.226), (89, 0.19), (95, 0.021)]
simIndex simValue paperId paperTitle
Author: Sergio Guadarrama, Niveda Krishnamoorthy, Girish Malkarnenkar, Subhashini Venugopalan, Raymond Mooney, Trevor Darrell, Kate Saenko
Abstract: Despite a recent push towards large-scale object recognition, activity recognition remains limited to narrow domains and small vocabularies of actions. In this paper, we tackle the challenge of recognizing and describing activities “in-the-wild”. We present a solution that takes a short video clip and outputs a brief sentence that sums up the main activity in the video, such as the actor, the action and its object. Unlike previous work, our approach works on out-of-domain actions: it does not require training videos of the exact activity. If it cannot find an accurate prediction for a pre-trained model, it finds a less specific answer that is also plausible from a pragmatic standpoint. We use semantic hierarchies learned from the data to help choose an appropriate level of generalization, and priors learned from web-scale natural language corpora to penalize unlikely combinations of actors/actions/objects; we also use a web-scale language model to “fill in” novel verbs, i.e., when the verb does not appear in the training set. We evaluate our method on a large YouTube corpus and demonstrate that it is able to generate short sentence descriptions of video clips better than baseline approaches.
same-paper 2 0.82835478 76 iccv-2013-Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees
Author: Aastha Jain, Shuanak Chatterjee, René Vidal
Abstract: We propose an exact, general and efficient coarse-to-fine energy minimization strategy for semantic video segmentation. Our strategy is based on a hierarchical abstraction of the supervoxel graph that allows us to minimize an energy defined at the finest level of the hierarchy by minimizing a series of simpler energies defined over coarser graphs. The strategy is exact, i.e., it produces the same solution as minimizing over the finest graph. It is general, i.e., it can be used to minimize any energy function (e.g., unary, pairwise, and higher-order terms) with any existing energy minimization algorithm (e.g., graph cuts and belief propagation). It also gives significant speedups in inference for several datasets with varying degrees of spatio-temporal continuity. We also discuss the strengths and weaknesses of our strategy relative to existing hierarchical approaches, and the kinds of image and video data that provide the best speedups.
3 0.74666828 172 iccv-2013-Flattening Supervoxel Hierarchies by the Uniform Entropy Slice
Author: Chenliang Xu, Spencer Whitt, Jason J. Corso
Abstract: Supervoxel hierarchies provide a rich multiscale decomposition of a given video suitable for subsequent processing in video analysis. The hierarchies are typically computed by an unsupervised process that is susceptible to under-segmentation at coarse levels and over-segmentation at fine levels, which makes it a challenge to adopt the hierarchies for later use. In this paper, we propose the first method to overcome this limitation and flatten the hierarchy into a single segmentation. Our method, called the uniform entropy slice, seeks a selection of supervoxels that balances the relative level of information in the selected supervoxels based on some post hoc feature criterion such as objectness. For example, with this criterion, in regions near objects, our method prefers finer supervoxels to capture the local details, but in regions away from any objects we prefer coarser supervoxels. We formulate the uniform entropy slice as a binary quadratic program and implement four different feature criteria, both unsupervised and supervised, to drive the flattening. Although we apply it only to supervoxel hierarchies in this paper, our method is generally applicable to segmentation tree hierarchies. Our experiments demonstrate both strong qualitative performance and superior quantitative performance to state-of-the-art baselines on benchmark internet videos.
4 0.73262429 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions
Author: Chih-Yuan Yang, Ming-Hsuan Yang
Abstract: The goal of single-image super-resolution is to generate a high-quality high-resolution image based on a given low-resolution input. It is an ill-posed problem which requires exemplars or priors to better reconstruct the missing high-resolution image details. In this paper, we propose to split the feature space into numerous subspaces and collect exemplars to learn priors for each subspace, thereby creating effective mapping functions. The use of split input space facilitates both feasibility of using simple functions for super-resolution, and efficiency of generating high-resolution results. High-quality high-resolution images are reconstructed based on the effective learned priors. Experimental results demonstrate that the proposed algorithm performs efficiently and effectively over state-of-the-art methods.
5 0.73203772 414 iccv-2013-Temporally Consistent Superpixels
Author: Matthias Reso, Jörn Jachalsky, Bodo Rosenhahn, Jörn Ostermann
Abstract: Superpixel algorithms represent a very useful and increasingly popular preprocessing step for a wide range of computer vision applications, as they offer the potential to boost efficiency and effectiveness. In this regard, this paper presents a highly competitive approach for temporally consistent superpixels for video content. The approach is based on energy-minimizing clustering utilizing a novel hybrid clustering strategy for a multi-dimensional feature space working in a global color subspace and local spatial subspaces. Moreover, a new contour-evolution-based strategy is introduced to ensure spatial coherency of the generated superpixels. For a thorough evaluation, the proposed approach is compared to state-of-the-art supervoxel algorithms using established benchmarks and shows superior performance.
6 0.73161519 102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding
7 0.73062527 8 iccv-2013-A Deformable Mixture Parsing Model with Parselets
8 0.73033023 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
9 0.7286098 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
10 0.72660649 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
11 0.72544068 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
12 0.72530693 423 iccv-2013-Towards Motion Aware Light Field Video for Dynamic Scenes
13 0.72520792 18 iccv-2013-A Joint Intensity and Depth Co-sparse Analysis Model for Depth Map Super-resolution
14 0.72482646 150 iccv-2013-Exemplar Cut
15 0.72456586 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
16 0.72448993 309 iccv-2013-Partial Enumeration and Curvature Regularization
17 0.72373283 258 iccv-2013-Low-Rank Sparse Coding for Image Classification
18 0.72271442 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
19 0.7225281 21 iccv-2013-A Method of Perceptual-Based Shape Decomposition
20 0.72216195 209 iccv-2013-Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation