iccv iccv2013 iccv2013-442 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, James M. Rehg
Abstract: We propose an unsupervised video segmentation approach by simultaneously tracking multiple holistic figure-ground segments. Segment tracks are initialized from a pool of segment proposals generated from a figure-ground segmentation algorithm. Then, online non-local appearance models are trained incrementally for each track using a multi-output regularized least squares formulation. By using the same set of training examples for all segment tracks, a computational trick allows us to track hundreds of segment tracks efficiently, as well as perform optimal online updates in closed form. In addition, a new composite statistical inference approach is proposed for refining the obtained segment tracks, which breaks down the initial segment proposals and recombines them into better ones by utilizing high-order statistic estimates from the appearance model and enforcing temporal consistency. For evaluating the algorithm, a dataset, SegTrack v2, is collected with about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches on the dataset, showing its efficiency and robustness to challenges in different video sequences.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We propose an unsupervised video segmentation approach by simultaneously tracking multiple holistic figure-ground segments. [sent-4, score-0.434]
2 Segment tracks are initialized from a pool of segment proposals generated from a figure-ground segmentation algorithm. [sent-5, score-1.095]
3 By using the same set of training examples for all segment tracks, a computational trick allows us to track hundreds of segment tracks efficiently, as well as perform optimal online updates in closed-form. [sent-7, score-1.5]
4 In addition, a new composite statistical inference approach is proposed for refining the obtained segment tracks, which breaks down the initial segment proposals and recombines them into better ones by utilizing high-order statistic estimates from the appearance model and enforcing temporal consistency. [sent-8, score-1.498]
5 Recent advances show that one can create a pool of several hundred overlapping figure-ground segment proposals so that, for most objects in the scene, at least one segment in the pool covers 70-80% of it. [sent-13, score-1.234]
6 A video sequence offers rich motion and depth cues, from which we can hope to automatically suppress temporally inconsistent segments and obtain good object proposals with fewer segments in the pool. [sent-19, score-0.785]
7 We propose to solve unsupervised video segmentation by simultaneously tracking all segments from segment pools generated by figure-ground segmentation at each frame. [sent-25, score-1.294]
8 Initially, one segment track is created from each unsupervised segment generated in the first frame. [sent-26, score-1.225]
9 Then for each segment track, a persistent global appearance model is incrementally trained: at each frame, predictions from the models are used to find the segment that best matches each track, and those matching segments are then used to update the track models. [sent-27, score-1.627]
10 We propose to train the appearance model for each track using all segments in the pools from all previous frames. [sent-33, score-0.582]
11 At each frame, the target output is the overlap between a segment and the matching segment of the track at that frame. [sent-34, score-1.302]
12 Different part segments have different target outputs based on their spatial overlap with the matching segment. [sent-35, score-0.44]
13 Segments that do not overlap the matching segment would have a target of 0. [sent-36, score-0.639]
14 We assume that all tracks start from the first frame, so that the training examples are the same for all the segment tracks. [sent-39, score-0.818]
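As an illustration of how such targets could be formed, here is a minimal sketch assuming segments are binary pixel masks and "overlap" means intersection-over-union; the function name and mask representation are illustrative, not taken from the paper:

```python
import numpy as np

def overlap_targets(segment_masks, matched_mask):
    """Regression targets for one track at one frame: the IoU of every
    segment in the pool with the track's matching segment.  Segments
    that do not overlap the match get a target of 0."""
    targets = np.empty(len(segment_masks))
    for i, mask in enumerate(segment_masks):
        inter = np.logical_and(mask, matched_mask).sum()
        union = np.logical_or(mask, matched_mask).sum()
        targets[i] = inter / union if union > 0 else 0.0
    return targets
```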
15 Then it turns out that most of the costly operations for solving the least squares problem need to be done only once, regardless of the number of segment tracks. [sent-40, score-0.549]
16 This formulation allows us to track hundreds of segments with fairly low complexities in both time and space. [sent-41, score-0.513]
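A minimal sketch of why shared training examples make this cheap, assuming standard multi-output ridge regression (the paper's exact regularization details may differ, and the function name is illustrative): the d x d system depends only on the shared features X, so its Cholesky factorization is computed once and reused for every track's target column.

```python
import numpy as np

def fit_all_tracks(X, V, lam=1.0):
    """X: N x d features shared by all tracks; V: N x n overlap targets,
    one column per segment track.  The costly factorization does not
    depend on the number of tracks n."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)   # d x d, independent of n
    L = np.linalg.cholesky(A)       # costly step, done once for all tracks
    B = X.T @ V                     # d x n right-hand sides
    return np.linalg.solve(L.T, np.linalg.solve(L, B))  # W, d x n
```

Predictions for new segments are then simply X_new @ W, giving one overlap estimate per track.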
17 Our matching framework assumes that at least one good segment is present in most frames for each object. [sent-42, score-0.591]
18 Moreover, we aggressively prune the segment tracks: at each frame, if multiple segment tracks match to the same segment, then only the track with the highest score is retained. [sent-43, score-1.451]
19 Remarkably, our long-term appearance models are robust enough that, under such strong assumptions and aggressive pruning, we are still able to cover most objects in the testing videos, while reducing the average number of tracks from about 1,200 initial segments per frame to 60. [sent-44, score-0.697]
20 Given the fully learnt appearance models, we adopt a recent composite statistical inference (CSI) approach [22] to refine the segments in the previous frames. [sent-45, score-0.596]
21 CSI breaks segment proposals into superpixels and recombines the superpixels by optimizing the likelihood of predictions on the segment proposals given by the appearance model. [sent-47, score-1.525]
22 This framework reflects our attempt to test the validity of using and tracking holistic figure-ground segment proposals for video segmentation. [sent-50, score-0.816]
23 To extend it into a practical tracking algorithm, we would need to lift the assumption that all segment tracks start from the first frame. [sent-51, score-0.942]
24 One can use the proposed framework for multiple intervals of several seconds and regard the generated segment tracks as tracklets. [sent-53, score-0.788]
25 Local tracking methods usually track non-overlapping feature points/superpixels, and hence differ from our approach, which tracks overlapping holistic segments. [sent-59, score-0.629]
26 [25] proposes a maximal weighted clique framework to optimally link segments in each frame; its mutual exclusion constraint allows only one segment to be selected per frame, so segments that partially match the segment tracks are not utilized. [sent-65, score-1.863]
27 Our segment tracking scheme, however, uses segment proposals, which are better boundary-aligned than bounding boxes. [sent-69, score-1.207]
28 Besides the video segmentation work that utilizes segment tracking [29, 34, 6], a great deal of research has focused on segment tracking with active contours [13, 28, 5], which requires a user-drawn region in the first frame. [sent-77, score-1.413]
29 Our segment tracking does not require user initialization. [sent-78, score-0.615]
30 • Compute appearance features for each segment in all frames. [sent-83, score-0.568]
31 • Initialize a segment track for each segment in the first frame. [sent-84, score-1.154]
32 • Simultaneously learn the appearance models for all segment tracks by multi-output regression. [sent-85, score-0.599]
33 [Pipeline figure: segment pools are generated by multiple figure-ground segmentations; incremental regression matches one segment per track at each frame; composite statistical inference refines the tracks.] [sent-88, score-0.651]
34 Each segment from the first frame spawns a segment track, and the appearance models of all the tracks are learnt incrementally and simultaneously. [sent-95, score-1.531]
35 At each frame, a segment that best matches each appearance model is found, and then all the segments are added to the training, with the target outputs decided by the overlap with the matching segments (middle). [sent-96, score-1.3]
36 Finally, in order to refine the segments, the learned models are tested on all segments across all frames, then relevant regions for each segment track are broken into superpixels and an optimal configuration of the superpixels is found through a composite statistical inference (right). [sent-97, score-1.303]
37 • Match segments in the next frame to existing segment tracks with a greedy algorithm. [sent-98, score-0.941]
38 • For long enough segment tracks, perform composite statistical inference. [sent-104, score-0.594]
39 Our ability to simultaneously track hundreds of segments comes from adopting the regression-to-overlap framework and casting the problem as multi-output regularized least squares regression. [sent-106, score-0.602]
40 Importantly, with the overlap as targets, different segment tracks can now train on the same set of training examples. [sent-110, score-0.867]
41 Consequently, adding more segment tracks adds very little to the training/testing time, unless the number of tracks exceeds the feature dimension. [sent-113, score-1.085]
42 By storing and updating only these two matrices, we can learn the optimal appearance models of all the segment tracks simultaneously. [sent-114, score-0.865]
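A sketch of the online bookkeeping, assuming the two stored matrices are the regularized Gram matrix XᵀX and the feature-target product XᵀV, as in standard ridge regression; the paper's exact update may differ in detail, and the class name is illustrative:

```python
import numpy as np

class IncrementalTrackModels:
    """Closed-form online updates for all tracks at once, assuming all
    tracks start from the first frame so every frame's segments serve
    as shared training examples."""
    def __init__(self, d, n_tracks, lam=1.0):
        self.XtX = lam * np.eye(d)           # regularized Gram matrix
        self.XtV = np.zeros((d, n_tracks))   # feature-target product

    def update(self, Xt, Vt):
        # Xt: n_t x d features of all segments in the new frame.
        # Vt: n_t x n_tracks overlaps with each track's matched segment.
        self.XtX += Xt.T @ Xt
        self.XtV += Xt.T @ Vt

    def solve(self):
        # Optimal W_t for all tracks, d x n_tracks.
        return np.linalg.solve(self.XtX, self.XtV)
```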
43 Figure-Ground Segmentation with Spatial-Temporal Boundaries: A pool of figure-ground segments is generated for each frame by a parametric min-cut [17] figure-ground segmentation algorithm such as [11, 14]. [sent-117, score-0.575]
44 The segments are invariant to internal edges, since their sizes are controlled by λ and pairwise losses are only counted at the boundaries (where xi ≠ xj). [sent-125, score-0.403]
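For reference, parametric min-cut energies of the kind used by [17, 11] generally take the form below, where λ biases the size of the foreground and the pairwise cost is charged only where labels disagree; the unary term U is a generic placeholder, since its exact definition is not reproduced in this text:

```latex
E_{\lambda}(x) \;=\; \sum_{i} \big( U(x_i) + \lambda\,[x_i = 1] \big)
\;+\; \sum_{(i,j)\in\mathcal{N}} E(x_i, x_j)\,[x_i \neq x_j]
```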
45 Compounded with a grid-based enumeration of foreground seed pixels, such a figure-ground segmentation approach can generate several hundred segments per image that cover full objects and parts within a consistent framework. [sent-126, score-0.517]
46 In order to create more diversity, the resulting boundaries are fed to the segmentation algorithm as E(xi, xj) in three different ways: image boundaries only, flow boundaries only, and a 50%-50% linear combination of image and flow boundaries (Fig. [sent-128, score-0.502]
47 The resulting segment pool contains all the segments generated from the three boundary types. [sent-130, score-0.867]
48 However, motion boundaries are sometimes unreliable; we therefore enumerate segments generated from different types of boundaries (image, flow, and image+flow) in the pool. [sent-133, score-0.442]
49 Suppose one segment overlaps 50% with another segment; then they likely share about 50% of their SIFT feature points. [sent-146, score-0.53]
50 For segment Ati, denote its feature vector after d-dimensional RF mapping as Xti, and denote Xt = [Xt1; . . . ; Xtnt] as the matrix stacking the feature vectors of all nt segments in frame t. [sent-157, score-0.491]
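Assuming the "RF mapping" is a random Fourier feature map in the style of Rahimi and Recht (the kernel choice below, a Gaussian RBF, is an illustrative assumption not confirmed by this text), a minimal sketch is:

```python
import numpy as np

def rf_map(features, d=1000, gamma=0.5, seed=0):
    """Map raw features (N x D) to d-dimensional random Fourier
    features approximating the kernel exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    D = features.shape[1]
    Omega = rng.normal(scale=np.sqrt(2.0 * gamma), size=(D, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=d)
    return np.sqrt(2.0 / d) * np.cos(features @ Omega + b)
```

After this mapping, linear regression on the mapped features approximates kernel regression while keeping the shared-factorization trick above intact.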
51 Suppose there are n segment tracks, each represented at frame t by a matching segment Atj, j = 1, . . . , n. [sent-164, score-1.131]
52 An nt × n overlap matrix Vt is computed between all segments in the frame and the matching segments. [sent-168, score-0.463]
53 Wt is now the learned model for all the segment tracks. [sent-187, score-0.491]
54 Given a new segment At+1,i in frame t + 1, Xt+1,iWt predicts its overlap with hypothetical ground truth segments corresponding to the objects represented by all the segment tracks. [sent-188, score-1.488]
55 Greedy Matching To match segments and eliminate redundant segment tracks rapidly, a greedy matching algorithm is proposed to extend segment tracks to new frames. [sent-195, score-1.967]
56 Suppose we have segment tracks represented by the weight matrix Wt and need to find matching segments for all tracks in frame It+1. [sent-196, score-1.526]
57 where Vˆt+1 = [Vˆ1, . . . , Vˆn], and Vˆj is the prediction vector for segment track Tj on all the segments in frame t + 1. [sent-201, score-1.059]
58 For each segment track Tj, we first threshold with crude motion cues (e. [sent-202, score-0.699]
59 Among all segments that satisfy the motion threshold, we find k = arg maxi Vˆji, so that Atk is the segment with the best predicted overlap sjk = Vˆjk for the track Tj. [sent-205, score-1.106]
60 If the same segment Atk is matched to multiple tracks, then only the track with the highest score, j* = arg maxj sjk, is retained (Fig. [sent-206, score-0.699]
61 This simple greedy procedure serves as a non-maximum suppression (NMS) process to reduce the number of segment tracks. [sent-208, score-0.545]
62 Importantly, greedy matching retains an order of magnitude fewer segment tracks than Hungarian because of the NMS effect. [sent-211, score-0.887]
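A minimal sketch of this greedy matching with NMS, assuming predicted overlaps are stored as a segments-by-tracks matrix; the motion mask, threshold value, and names are illustrative:

```python
import numpy as np

def greedy_match(V_hat, motion_ok, min_score=0.2):
    """V_hat[i, j]: predicted overlap of segment i with track j.
    motion_ok[i, j]: True if segment i passes track j's crude motion
    threshold.  Returns {segment index: surviving track index}."""
    scores = np.where(motion_ok, V_hat, -np.inf)
    best_seg = scores.argmax(axis=0)     # best segment k for each track j
    best_score = scores.max(axis=0)      # s_jk for each track
    kept = {}
    for j in np.argsort(-best_score):    # visit tracks by descending score
        if best_score[j] < min_score:
            continue                     # this track extends no further
        kept.setdefault(best_seg[j], j)  # NMS: highest-scoring track wins
    return kept
```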
63 At each frame, NMS is performed among the tracks that match to the same segment and the low-scoring tracks (red) are stopped. [sent-225, score-1.085]
64 After matching, each surviving segment track is updated with a new segment at time t + 1. [sent-226, score-1.18]
65 Then, all the segments at time t + 1 are added to the training set, with the target outputs computed as the overlaps between each segment and the matching segment of each segment track. [sent-227, score-1.873]
66 We start a new track for each segment that has not been matched to any track. [sent-229, score-0.693]
67 In the experiments we assume all the objects are present from frame 1; therefore we only start segment tracks in the first 5 frames, to strike a balance between speed and robustness to missing segmentations in the first few frames. [sent-230, score-1.038]
68 Refinement using Composite Statistical Inference: It would be too optimistic to assume that a perfect segment is always present in the initial segment pool. [sent-232, score-0.982]
69 Therefore, in this section we propose an approach to refine the segments in each frame given the learned segment tracks. [sent-234, score-0.887]
70 Composite statistical inference (CSI) [22] is a recent approach designed to perform inference using predictions on segment statistics. [sent-236, score-0.693]
71 In CSI, superpixels are obtained from multiple intersections of the candidate segments, defined as the coarsest superpixel partition of the image such that each superpixel either lies completely inside a segment or completely outside it. [sent-239, score-0.8]
72 Then, real-valued superpixel statistics are defined so that the segment statistics are computable from them. [sent-240, score-0.604]
73 This means that there exists a formula to compute the segment statistics given the superpixel statistics. [sent-241, score-0.604]
74 With these links, one can maximize the composite likelihood of the noisy predictions on segment statistics to recover the unknown superpixel statistics. [sent-242, score-0.74]
75 While [22] deals with semantic segmentation, this paper extends the CSI approach to segment tracking. [sent-244, score-0.491]
76 Second, we introduce temporal consistency terms to connect superpixels in adjacent frames, which leads to segment tracks that deform more smoothly over time. [sent-249, score-0.978]
77 t = 1, . . . , T, with Ft being the ground truth segment at frame t. [sent-253, score-0.595]
78 Now suppose the final appearance model for one segment track is WT; we use it to predict Vˆti = XtiWT for all segments Ati in all frames t = 1, . . . , T. [sent-267, score-1.136]
79 Putting everything together, we solve the joint optimization problem on the entire segment track: minimize over θ the negative composite log-likelihood of the predicted segment statistics, together with the temporal consistency terms. [sent-310, score-0.491]
80 The segment with the best predicted overlap in each frame is used as a natural initialization for θ. [sent-348, score-0.674]
81 After obtaining θ, we adopt the following procedure from [22] in order to output the optimal segment for each frame given θ: • Sort all θj in descending order. [sent-349, score-0.595]
82 • From the start of the sorted list, include superpixels into the final segment one by one, and compute the overlap V of the current segment from θ using formula (5). [sent-351, score-1.204]
83 • Stop when V > θj/(1 − θj), and output the segment containing superpixels 1 to j − 1 in the sorted list. [sent-352, score-0.574]
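A sketch of this decoding step under the reconstruction above; overlap_fn stands in for the paper's formula (5), which is not reproduced in this text, so both it and the names here are illustrative:

```python
import numpy as np

def decode_segment(theta, overlap_fn):
    """Greedily grow the output segment: visit superpixels by descending
    theta and stop once the current overlap V exceeds theta_j / (1 - theta_j)."""
    order = np.argsort(-theta)
    chosen = []
    for j in order:
        V = overlap_fn(chosen, theta)    # overlap of current segment, via formula (5)
        if V > theta[j] / (1.0 - theta[j] + 1e-12):
            break                        # output the superpixels chosen so far
        chosen.append(j)
    return chosen
```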
84 The CPMC algorithm [11] is used to compute the segment proposals. [sent-380, score-0.491]
85 In the result tables, SPT refers to the online segment tracking algorithm presented in Section 3 without refinement. [sent-390, score-0.615]
86 SPT+CSI refers to the results obtained by CSI refinement of the SPT segment tracks. [sent-391, score-0.529]
87 Among all segment tracks returned by a video segmentation algorithm, we report the performance on the best track w. [sent-394, score-1.024]
88 The main competitors are the key segments approach by Lee and Grauman [20], which uses multiple segment proposals followed by a spatial-temporal graph-cut, and Grundmann et al. [sent-400, score-0.884]
89 creates a hierarchy of segment tracks, and we report the score of the best segment track among all levels. [sent-404, score-1.154]
90 In addition, we adapt a recent tracking-by-detection approach [26] to our segment tracking problem (represented as Pairwise ([26]) in Table 2), in order to make a comparison between our long-term appearance models and tracking based on pairwise appearance similarities. [sent-405, score-0.893]
91 In our adaptation we set the unary term to 0 (since we do not have detectors) and use the similarity computed by the exponential χ2 kernel on our feature descriptors as the pairwise terms connecting segments in adjacent frames. [sent-407, score-0.413]
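For reference, the exponential χ2 kernel on nonnegative (e.g. histogram) descriptors is standard; a minimal sketch, with the bandwidth gamma as an assumed free parameter:

```python
import numpy as np

def exp_chi2(f1, f2, gamma=1.0):
    """Exponential chi-square similarity between two nonnegative
    feature vectors: exp(-gamma * chi2(f1, f2))."""
    chi2 = 0.5 * np.sum((f1 - f2) ** 2 / (f1 + f2 + 1e-12))
    return np.exp(-gamma * chi2)
```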
92 CPMC Best represents the average score of the best CPMC segment in each frame. [sent-412, score-0.491]
93 SPT is able to reduce the over 1,000 segments per frame from CPMC down to about 60 segment tracks while still capturing most of the objects. [sent-416, score-1.184]
94 Overlap of the best segment from each algorithm on SegTrack v2. [sent-419, score-0.491]
95 Conclusion: In this paper we present a new unsupervised video segmentation approach by tracking a pool of holistic figure-ground segments in each frame, generated by a multiple figure-ground segmentation algorithm. [sent-428, score-0.869]
96 Long-term appearance models are learnt using a regression-to-overlap framework on many segment tracks initialized from all the segment proposals in the pool. [sent-429, score-1.529]
97 By using the same training examples for many segment tracks, we are able to track hundreds of segment tracks efficiently. [sent-430, score-0.799]
98 The segment generation and feature computation steps are still very slow at the moment, which we aim to improve in future work. [sent-431, score-0.491]
99 Note that in frames other than the first one, training and testing time scales linearly with the number of frames a segment track can start in, but overlap computation takes less time after pruning the targets. [sent-433, score-0.71]
100 In addition, an algorithm based on composite statistical inference is proposed to refine the segment tracks, using the learnt appearance models as high-order potentials; it is shown to be efficient while improving the appearance and temporal consistency in many sequences. [sent-436, score-1.222]
wordName wordTfidf (topN-words)
[('segment', 0.491), ('stk', 0.322), ('tracks', 0.297), ('segments', 0.292), ('csi', 0.249), ('segtrack', 0.185), ('spt', 0.181), ('track', 0.172), ('tracking', 0.124), ('segmentation', 0.119), ('superpixel', 0.113), ('frame', 0.104), ('composite', 0.103), ('proposals', 0.101), ('cpmc', 0.099), ('superpixels', 0.083), ('overlap', 0.079), ('appearance', 0.077), ('penguin', 0.074), ('video', 0.064), ('frog', 0.062), ('bmx', 0.06), ('paradi', 0.06), ('pool', 0.06), ('ati', 0.058), ('boundaries', 0.057), ('frames', 0.055), ('xi', 0.054), ('greedy', 0.054), ('rf', 0.054), ('worm', 0.053), ('temporal', 0.053), ('suppose', 0.049), ('hundreds', 0.049), ('flow', 0.049), ('carreira', 0.049), ('figureground', 0.047), ('nt', 0.047), ('sequences', 0.046), ('xt', 0.046), ('matching', 0.045), ('inference', 0.045), ('learnt', 0.045), ('ht', 0.045), ('unsupervised', 0.044), ('wt', 0.042), ('pools', 0.041), ('dri', 0.041), ('cheet', 0.04), ('hummingbi', 0.04), ('recombines', 0.04), ('wkbjst', 0.04), ('wkfjst', 0.04), ('xj', 0.04), ('overlaps', 0.039), ('refinement', 0.038), ('tk', 0.038), ('exponential', 0.038), ('statistic', 0.038), ('nms', 0.037), ('motion', 0.036), ('monkeydog', 0.036), ('rift', 0.036), ('sjk', 0.036), ('holistic', 0.036), ('statistical', 0.034), ('grundmann', 0.034), ('predictions', 0.033), ('lebanon', 0.033), ('ft', 0.032), ('car', 0.032), ('ct', 0.031), ('regression', 0.031), ('squares', 0.031), ('regularized', 0.031), ('drift', 0.031), ('objects', 0.031), ('adjacent', 0.03), ('saliency', 0.03), ('start', 0.03), ('cholesky', 0.03), ('hungarian', 0.03), ('strike', 0.03), ('besides', 0.029), ('optical', 0.029), ('bi', 0.028), ('adoption', 0.027), ('initialized', 0.027), ('costly', 0.027), ('links', 0.027), ('incrementally', 0.026), ('kernel', 0.026), ('foreground', 0.026), ('updated', 0.026), ('breaks', 0.025), ('deform', 0.024), ('target', 0.024), ('boundary', 0.024), ('imperfect', 0.024), ('metric', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000008 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
Author: Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, James M. Rehg
Abstract: We propose an unsupervised video segmentation approach by simultaneously tracking multiple holistic figure-ground segments. Segment tracks are initialized from a pool of segment proposals generated from a figure-ground segmentation algorithm. Then, online non-local appearance models are trained incrementally for each track using a multi-output regularized least squares formulation. By using the same set of training examples for all segment tracks, a computational trick allows us to track hundreds of segment tracks efficiently, as well as perform optimal online updates in closed form. In addition, a new composite statistical inference approach is proposed for refining the obtained segment tracks, which breaks down the initial segment proposals and recombines them into better ones by utilizing high-order statistic estimates from the appearance model and enforcing temporal consistency. For evaluating the algorithm, a dataset, SegTrack v2, is collected with about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches on the dataset, showing its efficiency and robustness to challenges in different video sequences.
2 0.34027028 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
Author: Shugao Ma, Jianming Zhang, Nazli Ikizler-Cinbis, Stan Sclaroff
Abstract: We propose Hierarchical Space-Time Segments as a new representation for action recognition and localization. This representation has a two-level hierarchy. The first level comprises the root space-time segments that may contain a human body. The second level comprises multi-grained space-time segments that contain parts of the root. We present an unsupervised method to generate this representation from video, which extracts both static and non-static relevant space-time segments, and also preserves their hierarchical and temporal relationships. Using simple linear SVM on the resultant bag of hierarchical space-time segments representation, we attain better than, or comparable to, state-of-the-art action recognition performance on two challenging benchmark datasets and at the same time produce good action localization results.
3 0.3008337 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
Author: Anestis Papazoglou, Vittorio Ferrari
Abstract: We present a technique for separating foreground objects from the background in a video. Our method is fast, fully automatic, and makes minimal assumptions about the video. This enables handling essentially unconstrained settings, including rapidly moving background, arbitrary object motion and appearance, and non-rigid deformations and articulations. In experiments on two datasets containing over 1400 video shots, our method outperforms a state-of-the-art background subtraction technique [4] as well as methods based on clustering point tracks [6, 18, 19]. Moreover, it performs comparably to recent video object segmentation methods based on object proposals [14, 16, 27], while being orders of magnitude faster.
4 0.226467 317 iccv-2013-Piecewise Rigid Scene Flow
Author: Christoph Vogel, Konrad Schindler, Stefan Roth
Abstract: Estimating dense 3D scene flow from stereo sequences remains a challenging task, despite much progress in both classical disparity and 2D optical flow estimation. To overcome the limitations of existing techniques, we introduce a novel model that represents the dynamic 3D scene by a collection of planar, rigidly moving, local segments. Scene flow estimation then amounts to jointly estimating the pixel-to-segment assignment, and the 3D position, normal vector, and rigid motion parameters of a plane for each segment. The proposed energy combines an occlusion-sensitive data term with appropriate shape, motion, and segmentation regularizers. Optimization proceeds in two stages: Starting from an initial superpixelization, we estimate the shape and motion parameters of all segments by assigning a proposal from a set of moving planes. Then the pixel-to-segment assignment is updated, while holding the shape and motion parameters of the moving planes fixed. We demonstrate the benefits of our model on different real-world image sets, including the challenging KITTI benchmark. We achieve leading performance levels, exceeding competing 3D scene flow methods, and even yielding better 2D motion estimates than all tested dedicated optical flow techniques.
5 0.19808699 379 iccv-2013-Semantic Segmentation without Annotating Segments
Author: Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
Abstract: Numerous existing object segmentation frameworks commonly utilize the object bounding box as a prior. In this paper, we address semantic segmentation assuming that object bounding boxes are provided by object detectors, but no training data with annotated segments are available. Based on a set of segment hypotheses, we introduce a simple voting scheme to estimate shape guidance for each bounding box. The derived shape guidance is used in the subsequent graph-cut-based figure-ground segmentation. The final segmentation result is obtained by merging the segmentation results in the bounding boxes. We conduct an extensive analysis of the effect of object bounding box accuracy. Comprehensive experiments on both the challenging PASCAL VOC object segmentation dataset and GrabCut50 image segmentation dataset show that the proposed approach achieves competitive results compared to previous detection or bounding box prior based methods, as well as other state-of-the-art semantic segmentation methods.
6 0.18404968 57 iccv-2013-BOLD Features to Detect Texture-less Objects
7 0.18239543 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
8 0.17076124 289 iccv-2013-Network Principles for SfM: Disambiguating Repeated Structures with Local Context
9 0.16332778 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
10 0.16327663 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
11 0.15982096 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
12 0.15736738 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
13 0.15342925 414 iccv-2013-Temporally Consistent Superpixels
14 0.15038384 282 iccv-2013-Multi-view Object Segmentation in Space and Time
15 0.14596026 150 iccv-2013-Exemplar Cut
16 0.14135443 127 iccv-2013-Dynamic Pooling for Complex Event Recognition
17 0.12872888 299 iccv-2013-Online Video SEEDS for Temporal Window Objectness
18 0.12682535 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition
19 0.12641621 74 iccv-2013-Co-segmentation by Composition
20 0.12514891 12 iccv-2013-A General Dense Image Matching Framework Combining Direct and Feature-Based Costs
topicId topicWeight
[(0, 0.261), (1, -0.062), (2, 0.133), (3, 0.127), (4, 0.085), (5, 0.028), (6, -0.12), (7, 0.157), (8, 0.006), (9, -0.011), (10, -0.043), (11, 0.079), (12, 0.132), (13, 0.058), (14, -0.079), (15, -0.035), (16, -0.043), (17, -0.02), (18, -0.083), (19, -0.098), (20, 0.087), (21, -0.101), (22, -0.037), (23, -0.092), (24, -0.013), (25, -0.044), (26, -0.029), (27, -0.022), (28, -0.148), (29, 0.014), (30, -0.067), (31, 0.003), (32, -0.128), (33, -0.078), (34, -0.158), (35, 0.22), (36, 0.074), (37, -0.008), (38, 0.085), (39, -0.113), (40, 0.088), (41, 0.154), (42, -0.01), (43, -0.038), (44, -0.058), (45, 0.119), (46, 0.008), (47, 0.05), (48, -0.077), (49, 0.115)]
simIndex simValue paperId paperTitle
same-paper 1 0.97704244 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
Author: Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, James M. Rehg
Abstract: We propose an unsupervised video segmentation approach by simultaneously tracking multiple holistic figure-ground segments. Segment tracks are initialized from a pool of segment proposals generated from a figure-ground segmentation algorithm. Then, online non-local appearance models are trained incrementally for each track using a multi-output regularized least squares formulation. By using the same set of training examples for all segment tracks, a computational trick allows us to track hundreds of segment tracks efficiently, as well as perform optimal online updates in closed form. In addition, a new composite statistical inference approach is proposed for refining the obtained segment tracks, which breaks down the initial segment proposals and recombines them into better ones by utilizing high-order statistic estimates from the appearance model and enforcing temporal consistency. For evaluating the algorithm, a dataset, SegTrack v2, is collected with about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches on the dataset, showing its efficiency and robustness to challenges in different video sequences.
2 0.73601514 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition
Author: Shahriar Shariat, Vladimir Pavlovic
Abstract: The problem of human activity recognition is a central problem in many real-world applications. In this paper we propose a fast and effective segmental alignment-based method that is able to classify activities and interactions in complex environments. We empirically show that such a model is able to recover the alignment that leads to improved similarity measures within sequence classes and hence raises the classification performance. We also apply a bounding technique on the histogram distances to reduce the computation of the otherwise exhaustive search.
3 0.71610039 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
Author: Shugao Ma, Jianming Zhang, Nazli Ikizler-Cinbis, Stan Sclaroff
Abstract: We propose Hierarchical Space-Time Segments as a new representation for action recognition and localization. This representation has a two-level hierarchy. The first level comprises the root space-time segments that may contain a human body. The second level comprises multi-grained space-time segments that contain parts of the root. We present an unsupervised method to generate this representation from video, which extracts both static and non-static relevant space-time segments, and also preserves their hierarchical and temporal relationships. Using simple linear SVM on the resultant bag of hierarchical space-time segments representation, we attain better than, or comparable to, state-of-the-art action recognition performance on two challenging benchmark datasets and at the same time produce good action localization results.
4 0.70804518 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
Author: Anestis Papazoglou, Vittorio Ferrari
Abstract: We present a technique for separating foreground objects from the background in a video. Our method is fast, fully automatic, and makes minimal assumptions about the video. This enables handling essentially unconstrained settings, including rapidly moving background, arbitrary object motion and appearance, and non-rigid deformations and articulations. In experiments on two datasets containing over 1400 video shots, our method outperforms a state-of-the-art background subtraction technique [4] as well as methods based on clustering point tracks [6, 18, 19]. Moreover, it performs comparably to recent video object segmentation methods based on object proposals [14, 16, 27], while being orders of magnitude faster.
5 0.67231011 57 iccv-2013-BOLD Features to Detect Texture-less Objects
Author: Federico Tombari, Alessandro Franchi, Luigi Di_Stefano
Abstract: Object detection in images withstanding significant clutter and occlusion is still a challenging task whenever the object surface is characterized by poor informative content. We propose to tackle this problem by a compact and distinctive representation of groups of neighboring line segments aggregated over limited spatial supports and invariant to rotation, translation and scale changes. Peculiarly, our proposal allows for leveraging on the inherent strengths of descriptor-based approaches, i.e. robustness to occlusion and clutter and scalability with respect to the size of the model library, also when dealing with scarcely textured objects.
6 0.6585331 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
7 0.6391533 412 iccv-2013-Synergistic Clustering of Image and Segment Descriptors for Unsupervised Scene Understanding
8 0.57942462 375 iccv-2013-Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers
9 0.56870872 33 iccv-2013-A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis
10 0.56129688 74 iccv-2013-Co-segmentation by Composition
11 0.551278 379 iccv-2013-Semantic Segmentation without Annotating Segments
12 0.54238361 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
13 0.53493726 170 iccv-2013-Fingerspelling Recognition with Semi-Markov Conditional Random Fields
14 0.52741998 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
15 0.52552253 150 iccv-2013-Exemplar Cut
16 0.51089716 317 iccv-2013-Piecewise Rigid Scene Flow
17 0.49964148 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation
18 0.49914408 186 iccv-2013-GrabCut in One Cut
19 0.4894973 414 iccv-2013-Temporally Consistent Superpixels
20 0.48783746 330 iccv-2013-Proportion Priors for Image Sequence Segmentation
topicId topicWeight
[(2, 0.056), (7, 0.016), (12, 0.013), (26, 0.106), (31, 0.045), (34, 0.024), (40, 0.03), (42, 0.079), (48, 0.011), (57, 0.129), (64, 0.167), (73, 0.033), (89, 0.171), (98, 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.89771086 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
Author: Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, James M. Rehg
Abstract: We propose an unsupervised video segmentation approach by simultaneously tracking multiple holistic figure-ground segments. Segment tracks are initialized from a pool of segment proposals generated from a figure-ground segmentation algorithm. Then, online non-local appearance models are trained incrementally for each track using a multi-output regularized least squares formulation. By using the same set of training examples for all segment tracks, a computational trick allows us to track hundreds of segment tracks efficiently, as well as perform optimal online updates in closed form. In addition, a new composite statistical inference approach is proposed for refining the obtained segment tracks, which breaks down the initial segment proposals and recombines them into better ones by utilizing high-order statistic estimates from the appearance model and enforcing temporal consistency. For evaluating the algorithm, a dataset, SegTrack v2, is collected with about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches on the dataset, showing its efficiency and robustness to challenges in different video sequences.
2 0.87716877 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
Author: Shugao Ma, Jianming Zhang, Nazli Ikizler-Cinbis, Stan Sclaroff
Abstract: We propose Hierarchical Space-Time Segments as a new representation for action recognition and localization. This representation has a two-level hierarchy. The first level comprises the root space-time segments that may contain a human body. The second level comprises multi-grained space-time segments that contain parts of the root. We present an unsupervised method to generate this representation from video, which extracts both static and non-static relevant space-time segments, and also preserves their hierarchical and temporal relationships. Using simple linear SVM on the resultant bag of hierarchical space-time segments representation, we attain better than, or comparable to, state-of-the-art action recognition performance on two challenging benchmark datasets and at the same time produce good action localization results.
3 0.87684751 215 iccv-2013-Incorporating Cloud Distribution in Sky Representation
Author: Kuan-Chuan Peng, Tsuhan Chen
Abstract: Most sky models only describe the cloudiness ofthe overall sky by a single category or parameter such as sky index, which does not account for the distribution of the clouds across the sky. To capture variable cloudiness, we extend the concept of sky index to a random field indicating the level of cloudiness of each sky pixel in our proposed sky representation based on the Igawa sky model. We formulate the problem of solving the sky index of every sky pixel as a labeling problem, where an approximate solution can be efficiently found. Experimental results show that our proposed sky model has better expressiveness, stability with respect to variation in camera parameters, and geo-location estimation in outdoor images compared to the uniform sky index model. Potential applications of our proposed sky model include sky image rendering, where sky images can be generated with an arbitrary cloud distribution at any time and any location, previously impossible with traditional sky models.
4 0.87350476 88 iccv-2013-Constant Time Weighted Median Filtering for Stereo Matching and Beyond
Author: Ziyang Ma, Kaiming He, Yichen Wei, Jian Sun, Enhua Wu
Abstract: Despite the continuous advances in local stereo matching for years, most efforts are on developing robust cost computation and aggregation methods. Little attention has been seriously paid to the disparity refinement. In this work, we study weighted median filtering for disparity refinement. We discover that with this refinement, even the simple box filter aggregation achieves comparable accuracy with various sophisticated aggregation methods (with the same refinement). This is due to the nice weighted median filtering properties of removing outlier error while respecting edges/structures. This reveals that the previously overlooked refinement can be at least as crucial as aggregation. We also develop the first constant time algorithmfor the previously time-consuming weighted median filter. This makes the simple combination “box aggregation + weighted median ” an attractive solution in practice for both speed and accuracy. As a byproduct, the fast weighted median filtering unleashes its potential in other applications that were hampered by high complexities. We show its superiority in various applications such as depth upsampling, clip-art JPEG artifact removal, and image stylization.
5 0.87029409 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
Author: Siyu Tang, Mykhaylo Andriluka, Anton Milan, Konrad Schindler, Stefan Roth, Bernt Schiele
Abstract: People tracking in crowded real-world scenes is challenging due to frequent and long-term occlusions. Recent tracking methods obtain the image evidence from object (people) detectors, but typically use off-the-shelf detectors and treat them as black box components. In this paper we argue that for best performance one should explicitly train people detectors on failure cases of the overall tracker instead. To that end, we first propose a novel joint people detector that combines a state-of-the-art single person detector with a detector for pairs of people, which explicitly exploits common patterns of person-person occlusions across multiple viewpoints that are a frequent failure case for tracking in crowded scenes. To explicitly address remaining failure modes of the tracker we explore two methods. First, we analyze typical failures of trackers and train a detector explicitly on these cases. And second, we train the detector with the people tracker in the loop, focusing on the most common tracker failures. We show that our joint multi-person detector significantly improves both de- tection accuracy as well as tracker performance, improving the state-of-the-art on standard benchmarks.
6 0.86941117 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking
7 0.86851698 166 iccv-2013-Finding Actors and Actions in Movies
8 0.8675943 441 iccv-2013-Video Motion for Every Visible Point
9 0.85568821 380 iccv-2013-Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes
10 0.85314041 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation
11 0.84949911 86 iccv-2013-Concurrent Action Detection with Structural Prediction
12 0.84911299 99 iccv-2013-Cross-View Action Recognition over Heterogeneous Feature Spaces
13 0.84469932 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
14 0.83984077 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
15 0.83768845 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
16 0.83673441 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
17 0.82589549 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
18 0.82587206 338 iccv-2013-Randomized Ensemble Tracking
19 0.82271671 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
20 0.82033288 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition