iccv iccv2013 iccv2013-78 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Manjunath Narayana, Allen Hanson, Erik Learned-Miller
Abstract: In moving camera videos, motion segmentation is commonly performed using the image plane motion of pixels, or optical flow. However, objects that are at different depths from the camera can exhibit different optical flows even if they share the same real-world motion. This can cause a depth-dependent segmentation of the scene. Our goal is to develop a segmentation algorithm that clusters pixels that have similar real-world motion irrespective of their depth in the scene. Our solution uses optical flow orientations instead of the complete vectors and exploits the well-known property that under camera translation, optical flow orientations are independent of object depth. We introduce a probabilistic model that automatically estimates the number of observed independent motions and results in a labeling that is consistent with real-world motion in the scene. The result of our system is that static objects are correctly identified as one segment, even if they are at different depths. Color features and information from previous frames in the video sequence are used to correct occasional errors due to the orientation-based segmentation. We present results on more than thirty videos from different benchmarks. The system is particularly robust on complex background scenes containing objects at significantly different depths.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract. In moving camera videos, motion segmentation is commonly performed using the image plane motion of pixels, or optical flow. [sent-7, score-1.006]
2 However, objects that are at different depths from the camera can exhibit different optical flows even if they share the same real-world motion. [sent-8, score-0.436]
3 Our goal is to develop a segmentation algorithm that clusters pixels that have similar real-world motion irrespective of their depth in the scene. [sent-10, score-0.472]
4 Our solution uses optical flow orientations instead of the complete vectors and exploits the well-known property that under camera translation, optical flow orientations are independent of object depth. [sent-11, score-1.262]
5 We introduce a probabilistic model that automatically estimates the number of observed independent motions and results in a labeling that is consistent with real-world motion in the scene. [sent-12, score-0.371]
6 Introduction. Motion segmentation in stationary camera videos is relatively straightforward, and a pixelwise background model can accurately classify pixels as background or foreground. [sent-18, score-0.931]
7 While background segmentation for stationary cameras can be estimated accurately, separating the nonmoving objects from moving ones when the camera is moving is significantly more challenging. [sent-20, score-0.72]
8 Since the camera’s motion causes most image pixels to move, pixelwise models alone are insufficient. Figure 1. [sent-21, score-0.373]
9 (b) Visualization of the ground truth optical flow vectors (using code from [19]). [sent-25, score-0.437]
10 The optical flow vectors and magnitudes on the trees depend on the distance of the trees from the camera. [sent-28, score-0.494]
11 The orientations are not depth-dependent and can much more reliably predict that all the trees are part of the coherently moving background entity. [sent-29, score-0.483]
12 A common theme in moving camera motion segmentation is to use image plane motion (optical flow) or trajectories as a surrogate for real-world object motion. [sent-31, score-0.85]
13 Image plane motion can be used directly as a cue for clustering [17, 15, 1, 9, 5, 13] or to compensate for the camera motion so that the pixelwise model from the previous frame can be adjusted in order to remain accurate [7, 6, 14]. [sent-32, score-0.734]
14 The major drawback of using optical flow is that an object’s projected motion on the image plane depends on the object’s distance from the camera. [sent-33, score-0.647]
15 Objects that have the same real-world motion can have different optical flows depending on their depth. [sent-34, score-0.473]
16 This can cause a clustering algorithm to label two background objects at different depths as two separate objects although they both have zero motion in the real world. [sent-35, score-0.564]
17 For example, in Figure 1, the optical flow vectors separate the forest background into many smaller tree segments. [sent-37, score-0.675]
18 Post-processing is required to merge smaller segments into one background cluster. [sent-38, score-0.299]
19 If the number of distinct background layers is known, mixture modeling of the background motion is another solution. [sent-40, score-0.659]
20 Our goal is to segment the scene into coherent regions based on the real-world motion of the objects in it. [sent-42, score-0.321]
21 This can be challenging since the information about 3-D motion in the scene is only available in the form of the optical flow field. [sent-43, score-0.647]
22 Our solution is based on the well-known property that for translational camera motion, while optical flow magnitudes and vectors depend on the depth of the object in the scene, the orientations of the optical flow vectors do not. [sent-44, score-1.261]
23 Figure 1 is an example that shows that the optical flow orientations are reliable indicators of real-world motion, much more so than the flow vectors or magnitudes. [sent-45, score-0.796]
24 Assuming only translational motions in the scene, given the motion parameters of the objects and knowledge about which pixels belong to each object, it is straightforward to predict the orientations at each pixel exactly. [sent-46, score-0.644]
25 Our problem is the converse: Given the observed optical flow orientations at each pixel, estimate the motion parameters and pixel labels. [sent-48, score-0.926]
26 Since multiple motions (one camera motion and possibly other independent object motions) are possible, we use a mixture model to determine which motions are present in the scene and which pixels belong to each motion. [sent-50, score-0.59]
27 A similar system involving optical flow magnitudes is much more complicated because in addition to estimating the motion parameters, it would be required to determine the object depth at each pixel. [sent-52, score-0.788]
28 Our system is capable of segmenting background objects at different depths into one segment and identifying the various regions that correspond to coherently moving foreground segments. [sent-56, score-0.695]
29 Although the optical flow orientations are effective in many scenarios, they are not always reliable. [sent-57, score-0.582]
30 Also, a foreground object that moves in a direction consistent with the flow orientations due to the camera’s motion will go undetected until it changes its motion direction. [sent-59, score-0.946]
31 Earlier approaches to motion segmentation with a moving camera relied on motion compensation [7, 6, 14] after estimating the camera’s motion as a 2-D affine transformation or a homography. [sent-61, score-0.993]
32 More recent techniques have performed segmentation by clustering the trajectory information from multiple frames [15, 1, 5, 13]. [sent-63, score-0.356]
33 [15] use a factorization method to find the bases for the background trajectories and label outlier trajectories as foreground. [sent-65, score-0.316]
34 Brox and Malik [1] segment trajectories by computing the pairwise distances between all trajectories and finding a low-dimensional embedding using spectral clustering. [sent-67, score-0.288]
35 Elqursh and Elgammal [5] proposed an online extension of spectral clustering by considering trajectories from 5 frames at a time. [sent-70, score-0.282]
36 Because they rely on distance between optical flow vectors, these spectral methods are not guaranteed to group all the background pixels into one cluster. [sent-71, score-0.759]
37 To obtain the complete background as one segment, a post-processing merging step is required where segments with similar motions are merged [1, 13]. [sent-72, score-0.354]
38 Any trajectory that is not well explained by the mixture of Gaussians model is assumed to be a foreground trajectory. [sent-75, score-0.313]
39 They use a Bayesian filtering framework that combines block-based color appearance models with separate motion models for the background and foreground to estimate the labels at each pixel. [sent-79, score-0.622]
40 In comparison to the above methods, we use motion information only from two frames at a time and do not require the use of trajectory information from multiple frames. [sent-83, score-0.346]
41 These tracking systems require an initial human-labeled foreground object while our goal is to build a foreground-background segmentation algorithm without any human intervention. [sent-92, score-0.368]
42 [10] detect object-like segments called key-segments in the image, hypothesize which segments are more likely to be foreground objects, and finally use a spatio-temporal graph to perform segmentation. [sent-94, score-0.291]
43 Earlier background segmentation methods report results only on 3 or 4 out of 26 videos from the Hopkins segmentation data set [1]. [sent-97, score-0.616]
44 In addition to all 26 videos from this set, we also include results from the SegTrack motion segmentation data set [2]. [sent-98, score-0.483]
45 Although good segmentation results are achieved on these data sets, these videos have few cases of depth disparity in the background. [sent-99, score-0.351]
46 To the best of our knowledge, this is the first work to report moving background segmentation results on such a large number of videos spanning different scenarios. [sent-101, score-0.559]
47 Segmentation using optical flow orientations. Given a camera’s translation t = (tx, ty, tz), the resulting optical flow components in the x and y image dimensions are vx = (tz x − tx f)/Z and vy = (tz y − ty f)/Z, where f is the focal length and Z is the depth of the point (Figure 2). [sent-105, score-0.968]
48 A mixture model for segmentation based on optical flow orientations. [sent-110, score-0.683]
49 The optical flow orientations, F(t, x, y) = arctan(tz y − ty f, tz x − tx f),  (2) are thus independent of the depth Z of the points. [sent-114, score-0.602]
50 Figure 2 shows the optical flow orientations for a few example translations; the orientation values lie in [−π, π]. [sent-116, score-0.582]
51 We call the 2-D matrix of optical flow orientations at each pixel the flow orientation field (FOF). [sent-119, score-0.946]
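A minimal numerical sketch of Equation 2 may help make the depth independence concrete. The snippet below (Python/NumPy; the focal length f, the image size, and a principal point at the image center are illustrative assumptions, not values from the paper) computes the predicted FOF for a given translation t; note that the depth Z never enters the computation.

```python
import numpy as np

def flow_orientation_field(t, width, height, f=500.0):
    """Predicted flow orientation field (FOF) for a pure camera translation.

    t : (tx, ty, tz) translation; only its direction matters for orientations.
    f : assumed focal length in pixels (hypothetical value).
    Returns a (height, width) array of angles in [-pi, pi].
    """
    tx, ty, tz = t
    # Pixel coordinates relative to the principal point (assumed at the center).
    xs = np.arange(width) - width / 2.0
    ys = np.arange(height) - height / 2.0
    x, y = np.meshgrid(xs, ys)
    # Equation 2: the depth Z scales the flow magnitude only, so it cancels
    # out of the orientation.
    return np.arctan2(tz * y - ty * f, tz * x - tx * f)

# Example: a mostly lateral translation with a small forward component.
fof = flow_orientation_field((1.0, 0.0, 0.2), width=640, height=480)
```

For tz = 0 the field is constant over the whole image; a nonzero tz bends the orientations around the focus of expansion.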
52 In the probabilistic model given in Figure 3, the orientation values returned by an optical flow estimation algorithm [19] are the observed variables and the labels for each pixel are latent. [sent-120, score-0.643]
53 At pixel number i, whose location is given by xi = (xi, yi), we have an observed optical flow orientation ai and a label li that represents which segment the pixel belongs to. [sent-121, score-0.873]
54 Each segment k is associated with a motion parameter tuple Φk = (txk, tyk, tzk) representing the translation along the x, y, and z directions respectively. [sent-122, score-0.382]
55 For a given motion parameter tuple t, denote the resulting flow orientation field at pixel location x to be F(t, x), which is computed using Equation 2. [sent-127, score-0.663]
56 The last equation means that given the label li = k for a pixel at location xi and motion parameter Φk = φk, the observed orientation ai is a Gaussian random variable whose mean is F(φk , xi). [sent-134, score-0.495]
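As a rough sketch of the per-pixel computation this observation model implies (Python/NumPy; the angular wrap-around handling and the noise standard deviation sigma are assumptions, since the summary does not give them), the responsibility of each motion component k at a pixel can be evaluated as:

```python
import numpy as np

def angular_diff(a, b):
    """Smallest signed difference between two angles, in (-pi, pi]."""
    return (a - b + np.pi) % (2 * np.pi) - np.pi

def label_posteriors(observed, predicted_fofs, priors, sigma=0.3):
    """Per-pixel posteriors over motion components.

    observed       : (H, W) observed flow orientations a_i.
    predicted_fofs : list of K arrays F(phi_k, x), one per motion parameter tuple.
    priors         : length-K mixing proportions.
    sigma          : assumed std. dev. of the orientation noise, in radians.
    """
    logs = [np.log(p) - 0.5 * (angular_diff(observed, fof) / sigma) ** 2
            for p, fof in zip(priors, predicted_fofs)]
    logs = np.stack(logs)                       # (K, H, W)
    logs -= logs.max(axis=0, keepdims=True)     # numerical stability
    post = np.exp(logs)
    return post / post.sum(axis=0, keepdims=True)
```

The hard labeling is the argmax over components; the paper's actual inference additionally estimates the number of components and the mixing proportions, which this sketch takes as given.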
57 Gradient descent for largest component. Improvements can be made to the results by finding a better fit for the largest segment’s motion than is provided by the relatively coarse initial sampling of library motion parameters. [sent-165, score-0.543]
58 With the motion parameters corresponding to the largest segment as the starting point, gradient descent is used to find the motion parameters that result in an FOF with minimum average L1 distance to the observed orientations. [sent-167, score-0.589]
59 The resulting minimum motion parameter tuple is added as an additional motion parameter to the set of library motions. [sent-169, score-0.512]
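A sketch of this refinement step, reusing the flow_orientation_field and angular_diff helpers from the earlier snippets; the finite-difference gradient and the step size are placeholders, since the summary does not specify the exact descent scheme:

```python
import numpy as np

def refine_motion(t0, observed, mask, width, height, steps=200, lr=1e-2, eps=1e-4):
    """Refine a motion tuple by descending the mean L1 angular error.

    t0       : coarse (tx, ty, tz) of the largest segment, from the motion library.
    observed : (H, W) observed orientations; mask selects the segment's pixels.
    """
    def cost(t):
        pred = flow_orientation_field(t, width, height)
        return np.abs(angular_diff(observed, pred))[mask].mean()

    t = np.asarray(t0, dtype=float)
    for _ in range(steps):
        # Numerical gradient by central differences over (tx, ty, tz).
        grad = np.array([(cost(t + eps * np.eye(3)[i]) - cost(t - eps * np.eye(3)[i]))
                         / (2 * eps) for i in range(3)])
        t -= lr * grad
    return t  # added back to the library as an extra motion parameter tuple
```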
60 This process helps in the proper segmentation of observed background orientation patterns that are not well explained by any of the initial set of motions. [sent-170, score-0.505]
61 Handling pixels with near-zero motion. One of the implications of using the orientations is that the orientation is not defined for pixels that do not move. [sent-173, score-0.545]
62 To account for this possibility, pixels that have optical flow component magnitudes less than a threshold Tf (typically 0. [sent-175, score-0.553]
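In code, the check might look like the following (the threshold value and what is done with the flagged pixels are assumptions; the summary truncates the actual threshold):

```python
import numpy as np

def near_zero_motion_mask(vx, vy, tf=0.5):
    """Flag pixels whose flow components are both below the threshold Tf.

    The orientation is undefined (or extremely noisy) at such pixels, so they
    are excluded from the orientation likelihood and handled separately.
    tf=0.5 is a hypothetical value standing in for the truncated number.
    """
    return (np.abs(vx) < tf) & (np.abs(vy) < tf)
```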
63 Segmentation comparisons. The proposed FOF segmentation is compared to existing motion segmentation methods. [sent-179, score-0.532]
64 Spectral clustering of trajectory information [1, 5, 13] has been shown to be useful for motion segmentation. [sent-180, score-0.33]
65 Further, their method uses a merging step that joins segments that have similar motion parameters. [sent-184, score-0.313]
66 Note that FOF segmentation uses only flow information from two consecutive frames and performs no post-processing to merge segments. [sent-185, score-0.505]
67 FOF segmentation, despite only using information from two frames and no merging procedure, successfully segments the background in most examples. [sent-187, score-0.36]
68 Here spectral clustering with a subsequent merge step fails and the background is over-segmented depending on depth. [sent-189, score-0.377]
69 In order to classify each pixel as background or foreground, the component with the largest number of pixels is considered as the background component. [sent-193, score-0.541]
70 In addition to using the FOF-based segmentation, we maintain a color appearance model for the background and foreground at each pixel [15]. [sent-194, score-0.49]
71 A history of pixel data samples from the previous frames is maintained, and after classification of pixels in each new frame, new data samples are added to the history. [sent-195, score-0.314]
72 To account for motion, the maintained history at each pixel is motion compensated and moved to a new location as predicted by the optical flow in the current frame. [sent-196, score-0.867]
73 Kernel density estimation (KDE) is used with the data samples to obtain the color likelihoods for the background and foreground processes. [sent-197, score-0.464]
74 Let bx^(t−1) be the observed background color at pixel location x in the previous frame. [sent-202, score-0.419]
75 Using a Gaussian kernel with covariance ΣCB in the color dimensions, our KDE background likelihood for the color vector c in video frame t is given by Pxt(c|bg; ΣCB, ΣSB) = (1/Z) ΣΔ …
76 The covariance matrix ΣCB controls the amount of variation allowed in the color values of the background pixels. [sent-210, score-0.283]
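A simplified sketch of such a pixelwise KDE color likelihood (Python/NumPy; the spatial kernel over ΣSB and the exact normalization are truncated in this summary, so only the color kernel is shown, and the isotropic covariance value is an assumption):

```python
import numpy as np

def kde_color_likelihood(c, history_colors, sigma_cb=15.0):
    """KDE likelihood of color c under a pixel's background history samples.

    c              : (3,) color at the pixel in the current frame.
    history_colors : (N, 3) motion-compensated background samples for the pixel.
    sigma_cb       : assumed isotropic std. dev. of the color kernel (Sigma_CB).
    """
    d2 = ((history_colors - c) ** 2).sum(axis=1)
    kernels = np.exp(-0.5 * d2 / sigma_cb ** 2) / (2 * np.pi * sigma_cb ** 2) ** 1.5
    return kernels.mean()
```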
77 Mixing a uniform distribution component. In cases when the background has been occluded in all the previous T frames, there are no reliable history pixels for the background. [sent-227, score-0.339]
78 To allow the system to recover from such a situation, a uniform color distribution is mixed into the color likelihood: P̂xt(c|bg) = γx^bg Pxt(c|bg; ΣB) + (1 − γx^bg) U,  (8) where U is a uniform distribution over all possible color values. [sent-228, score-0.309]
79 Despite the use of a post-processing merge step in the implementation, in many images, spectral clustering is not certain about some background keypoints (white squares) and in cases with large depth disparity, the background is broken into smaller sections. [sent-233, score-0.636]
80 The implication of this mixture proportion is that if the history pixels are highly confident background pixels, then no uniform distribution is added to the likelihood. [sent-238, score-0.424]
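Equation 8 in code form (the count-based definition of the mixing weight γ below is an assumption; the summary only says it reflects how confident the background history is):

```python
def mixed_background_likelihood(p_kde, n_reliable, n_total, u=1.0 / 256 ** 3):
    """Eq. 8: mix a uniform color distribution into the KDE likelihood.

    p_kde      : KDE background likelihood P_x^t(c | bg) for the pixel.
    n_reliable : number of confident background samples in the pixel's history.
    n_total    : history length T.
    u          : uniform density over the color space (assumed 8-bit RGB).
    """
    gamma = n_reliable / float(n_total)   # assumed form of the mixing weight
    return gamma * p_kde + (1.0 - gamma) * u
```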
81 The background posterior probability at each pixel in the previous frame is motion-compensated according to optical flow and used as the pixelwise background prior for the current frame. [sent-245, score-1.107]
82 The posterior probability of background in the current frame can now be computed by combining the color likelihoods, the segmentation label likelihoods from the graphical model, and the prior: Pxt(bg|c, lx) = …  (9)
83 The use of color likelihoods and prior information helps to recover from errors in the FOF-based segmentation, as we explain in the results. [sent-250, score-0.346]
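A sketch of the per-pixel fusion behind Equation 9 (the independent product of color likelihood, FOF label likelihood, and motion-compensated prior is an assumed factorization, since the equation itself is truncated in this summary):

```python
def background_posterior(p_color_bg, p_color_fg, p_fof_bg, p_fof_fg, prior_bg):
    """Combine color likelihoods, FOF label likelihoods, and the prior (Eq. 9).

    All arguments are per-pixel probabilities; prior_bg is the
    motion-compensated background posterior from the previous frame.
    """
    bg = p_color_bg * p_fof_bg * prior_bg
    fg = p_color_fg * p_fof_fg * (1.0 - prior_bg)
    return bg / (bg + fg + 1e-12)
```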
84 In addition to these benchmarks, we also present results on a new set of videos that include several with complex background phenomena to highlight the strengths of the system. [sent-253, score-0.294]
85 The first benchmark is a motion segmentation data set [1], derived from the Hopkins data set [20], which consists of 26 moving camera videos. [sent-254, score-0.573]
86 The third data set,1 which we produced ourselves, is a challenging one with complex backgrounds including trees in a forest and large occluding objects in front of the moving foreground object. [sent-257, score-0.365]
87 This data set is extremely challenging for traditional motion segmentation algorithms. [sent-258, score-0.371]
88 We present results of FOF segmentation as well as segmentation that combines FOF with color appearance and prior models. [sent-260, score-0.421]
89 Among the SegTrack data set, three videos (marked with *) have multiple moving objects, but the ground truth intended for tracking analysis marks only one primary object as the foreground, causing our system to appear less accurate. [sent-268, score-0.298]
90 In videos where there is rotation in many frames (forest, drive, store), FOF segmentation is less accurate. [sent-271, score-0.348]
91 Despite these challenges in the complex background videos, our system performs segmentation with reasonable accuracy across all three data sets. [sent-274, score-0.385]
92 However, segmentation of each frame is performed by considering trajectory information from the current frame as well as four future frames. [sent-280, score-0.328]
93 FOF segmentation is a frame-to-frame segmentation method and hence solves a different problem, with the aim of achieving real-time processing of frames. [sent-281, score-0.322]
94 Discussion. We have presented a system for motion segmentation using optical flow orientations. [sent-291, score-0.85]
95 The use of optical flow orientations avoids the over-segmentation of the scene into depth-dependent entities. [sent-292, score-0.582]
96 The columns are the original image, the observed FOF, FOF segmentation results, and results from combining FOF with color and prior models, respectively. [sent-308, score-0.316]
97 When the observed FOF cannot distinguish between the foreground and the background, FOF segmentation is not accurate. [sent-310, score-0.384]
98 Occasionally, the largest detected segment is the foreground object, which gets labeled as background (row 3 in (c)). [sent-314, score-0.462]
99 A naturalistic open source movie for optical flow evaluation. In ECCV. [sent-324, score-0.437]
100 A benchmark for the comparison of 3-d motion segmentation algorithms. [sent-439, score-0.371]
wordName wordTfidf (topN-words)
[('fof', 0.463), ('pxt', 0.235), ('optical', 0.223), ('flow', 0.214), ('motion', 0.21), ('background', 0.182), ('foreground', 0.167), ('segmentation', 0.161), ('orientations', 0.145), ('btx', 0.13), ('bg', 0.127), ('videos', 0.112), ('tz', 0.111), ('sb', 0.108), ('elqursh', 0.107), ('fofs', 0.104), ('moving', 0.104), ('pixelwise', 0.104), ('camera', 0.098), ('elgammal', 0.096), ('cb', 0.088), ('kwak', 0.086), ('mixture', 0.085), ('spectral', 0.081), ('hopkins', 0.081), ('bsum', 0.078), ('fsum', 0.078), ('narayana', 0.078), ('pixel', 0.078), ('frames', 0.075), ('segment', 0.073), ('orientation', 0.072), ('motions', 0.069), ('bj', 0.068), ('trajectories', 0.067), ('color', 0.063), ('segments', 0.062), ('ty', 0.062), ('tx', 0.061), ('trajectory', 0.061), ('clustering', 0.059), ('gibbs', 0.059), ('pixels', 0.059), ('history', 0.059), ('magnitudes', 0.057), ('forest', 0.056), ('observed', 0.056), ('ochs', 0.055), ('segtrack', 0.055), ('merge', 0.055), ('frame', 0.053), ('bxg', 0.052), ('bxt', 0.052), ('coherently', 0.052), ('complexbackground', 0.052), ('hanson', 0.052), ('rousso', 0.052), ('kde', 0.052), ('likelihoods', 0.052), ('translation', 0.05), ('tuple', 0.049), ('kbg', 0.046), ('translational', 0.045), ('library', 0.043), ('brox', 0.043), ('arctan', 0.043), ('chockalingam', 0.043), ('fj', 0.043), ('maintained', 0.043), ('system', 0.042), ('depth', 0.042), ('dirichlet', 0.042), ('auxiliary', 0.041), ('merging', 0.041), ('sheikh', 0.041), ('location', 0.04), ('tracking', 0.04), ('flows', 0.04), ('largest', 0.04), ('ai', 0.039), ('uniform', 0.039), ('covariance', 0.038), ('objects', 0.038), ('depths', 0.037), ('vy', 0.037), ('concentration', 0.037), ('occasional', 0.037), ('labeling', 0.036), ('prior', 0.036), ('tracked', 0.036), ('disparity', 0.036), ('vx', 0.036), ('posterior', 0.035), ('keypoints', 0.035), ('prone', 0.035), ('segmented', 0.034), ('gaussians', 0.034), ('helps', 0.034), ('stationary', 0.033), ('dir', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999839 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
Author: Manjunath Narayana, Allen Hanson, Erik Learned-Miller
Abstract: In moving camera videos, motion segmentation is commonly performed using the image plane motion of pixels, or optical flow. However, objects that are at different depths from the camera can exhibit different optical flows even if they share the same real-world motion. This can cause a depth-dependent segmentation of the scene. Our goal is to develop a segmentation algorithm that clusters pixels that have similar real-world motion irrespective of their depth in the scene. Our solution uses optical flow orientations instead of the complete vectors and exploits the well-known property that under camera translation, optical flow orientations are independent of object depth. We introduce a probabilistic model that automatically estimates the number of observed independent motions and results in a labeling that is consistent with real-world motion in the scene. The result of our system is that static objects are correctly identified as one segment, even if they are at different depths. Color features and information from previous frames in the video sequence are used to correct occasional errors due to the orientation-based segmentation. We present results on more than thirty videos from different benchmarks. The system is particularly robust on complex background scenes containing objects at significantly different depths.
2 0.29511309 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
Author: Anestis Papazoglou, Vittorio Ferrari
Abstract: We present a technique for separating foreground objects from the background in a video. Our method is fast, fully automatic, and makes minimal assumptions about the video. This enables handling essentially unconstrained settings, including rapidly moving background, arbitrary object motion and appearance, and non-rigid deformations and articulations. In experiments on two datasets containing over 1400 video shots, our method outperforms a state-of-the-art background subtraction technique [4] as well as methods based on clustering point tracks [6, 18, 19]. Moreover, it performs comparably to recent video object segmentation methods based on object proposals [14, 16, 27], while being orders of magnitude faster.
3 0.28223455 317 iccv-2013-Piecewise Rigid Scene Flow
Author: Christoph Vogel, Konrad Schindler, Stefan Roth
Abstract: Estimating dense 3D scene flow from stereo sequences remains a challenging task, despite much progress in both classical disparity and 2D optical flow estimation. To overcome the limitations of existing techniques, we introduce a novel model that represents the dynamic 3D scene by a collection of planar, rigidly moving, local segments. Scene flow estimation then amounts to jointly estimating the pixel-to-segment assignment, and the 3D position, normal vector, and rigid motion parameters of a plane for each segment. The proposed energy combines an occlusion-sensitive data term with appropriate shape, motion, and segmentation regularizers. Optimization proceeds in two stages: Starting from an initial superpixelization, we estimate the shape and motion parameters of all segments by assigning a proposal from a set of moving planes. Then the pixel-to-segment assignment is updated, while holding the shape and motion parameters of the moving planes fixed. We demonstrate the benefits of our model on different real-world image sets, including the challenging KITTI benchmark. We achieve leading performance levels, exceeding competing 3D scene flow methods, and even yielding better 2D motion estimates than all tested dedicated optical flow techniques.
4 0.23160045 297 iccv-2013-Online Motion Segmentation Using Dynamic Label Propagation
Author: Ali Elqursh, Ahmed Elgammal
Abstract: The vast majority of work on motion segmentation adopts the affine camera model due to its simplicity. Under the affine model, the motion segmentation problem becomes that of subspace separation. Due to this assumption, such methods are mainly offline and exhibit poor performance when the assumption is not satisfied. This is made evident in state-of-the-art methods that relax this assumption by using piecewise affine spaces and spectral clustering techniques to achieve better results. In this paper, we formulate the problem of motion segmentation as that of manifold separation. We then show how label propagation can be used in an online framework to achieve manifold separation. The performance of our framework is evaluated on a benchmark dataset and achieves competitive performance while being online.
5 0.23120105 39 iccv-2013-Action Recognition with Improved Trajectories
Author: Heng Wang, Cordelia Schmid
Abstract: Recently dense trajectories were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets. This paper improves their performance by taking into account camera motion to correct them. To estimate camera motion, we match feature points between frames using SURF descriptors and dense optical flow, which are shown to be complementary. These matches are then used to robustly estimate a homography with RANSAC. Human motion is in general different from camera motion and generates inconsistent matches. To improve the estimation, a human detector is employed to remove these matches. Given the estimated camera motion, we remove trajectories consistent with it. We also use this estimation to cancel out camera motion from the optical flow. This significantly improves motion-based descriptors, such as HOF and MBH. Experimental results on four challenging action datasets (i.e., Hollywood2, HMDB51, Olympic Sports and UCF50) significantly outperform the current state of the art.
6 0.20848542 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
7 0.19790481 105 iccv-2013-DeepFlow: Large Displacement Optical Flow with Deep Matching
8 0.19057643 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
9 0.17866465 300 iccv-2013-Optical Flow via Locally Adaptive Fusion of Complementary Data Costs
10 0.17604415 12 iccv-2013-A General Dense Image Matching Framework Combining Direct and Feature-Based Costs
11 0.16609253 263 iccv-2013-Measuring Flow Complexity in Videos
12 0.16332778 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
13 0.15986212 256 iccv-2013-Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation
14 0.15756957 439 iccv-2013-Video Co-segmentation for Meaningful Action Extraction
15 0.14847548 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
16 0.14829822 143 iccv-2013-Estimating Human Pose with Flowing Puppets
17 0.14480186 82 iccv-2013-Compensating for Motion during Direct-Global Separation
18 0.14138426 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
19 0.13989253 314 iccv-2013-Perspective Motion Segmentation via Collaborative Clustering
20 0.13873091 282 iccv-2013-Multi-view Object Segmentation in Space and Time
topicId topicWeight
[(0, 0.277), (1, -0.162), (2, 0.078), (3, 0.195), (4, 0.019), (5, 0.089), (6, -0.085), (7, 0.132), (8, 0.134), (9, 0.072), (10, 0.028), (11, 0.086), (12, 0.183), (13, -0.055), (14, -0.102), (15, -0.016), (16, -0.088), (17, 0.002), (18, 0.037), (19, 0.02), (20, 0.066), (21, -0.057), (22, 0.067), (23, 0.069), (24, 0.002), (25, 0.013), (26, 0.103), (27, 0.026), (28, 0.029), (29, -0.012), (30, -0.024), (31, 0.04), (32, -0.062), (33, -0.031), (34, -0.161), (35, 0.055), (36, -0.025), (37, 0.089), (38, -0.073), (39, 0.015), (40, -0.033), (41, 0.005), (42, -0.002), (43, -0.026), (44, 0.015), (45, -0.007), (46, -0.029), (47, -0.028), (48, 0.056), (49, -0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.98110956 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
Author: Manjunath Narayana, Allen Hanson, Erik Learned-Miller
Abstract: In moving camera videos, motion segmentation is commonly performed using the image plane motion of pixels, or optical flow. However, objects that are at different depths from the camera can exhibit different optical flows even if they share the same real-world motion. This can cause a depth-dependent segmentation of the scene. Our goal is to develop a segmentation algorithm that clusters pixels that have similar real-world motion irrespective of their depth in the scene. Our solution uses optical flow orientations instead of the complete vectors and exploits the well-known property that under camera translation, optical flow orientations are independent of object depth. We introduce a probabilistic model that automatically estimates the number of observed independent motions and results in a labeling that is consistent with real-world motion in the scene. The result of our system is that static objects are correctly identified as one segment, even if they are at different depths. Color features and information from previous frames in the video sequence are used to correct occasional errors due to the orientation-based segmentation. We present results on more than thirty videos from different benchmarks. The system is particularly robust on complex background scenes containing objects at significantly different depths.
2 0.85538489 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
Author: Anestis Papazoglou, Vittorio Ferrari
Abstract: We present a technique for separating foreground objects from the background in a video. Our method is fast, fully automatic, and makes minimal assumptions about the video. This enables handling essentially unconstrained settings, including rapidly moving background, arbitrary object motion and appearance, and non-rigid deformations and articulations. In experiments on two datasets containing over 1400 video shots, our method outperforms a state-of-the-art background subtraction technique [4] as well as methods based on clustering point tracks [6, 18, 19]. Moreover, it performs comparably to recent video object segmentation methods based on object proposals [14, 16, 27], while being orders of magnitude faster.
3 0.82566249 317 iccv-2013-Piecewise Rigid Scene Flow
Author: Christoph Vogel, Konrad Schindler, Stefan Roth
Abstract: Estimating dense 3D scene flow from stereo sequences remains a challenging task, despite much progress in both classical disparity and 2D optical flow estimation. To overcome the limitations of existing techniques, we introduce a novel model that represents the dynamic 3D scene by a collection of planar, rigidly moving, local segments. Scene flow estimation then amounts to jointly estimating the pixel-to-segment assignment, and the 3D position, normal vector, and rigid motion parameters of a plane for each segment. The proposed energy combines an occlusion-sensitive data term with appropriate shape, motion, and segmentation regularizers. Optimization proceeds in two stages: Starting from an initial superpixelization, we estimate the shape and motion parameters of all segments by assigning a proposal from a set of moving planes. Then the pixel-to-segment assignment is updated, while holding the shape and motion parameters of the moving planes fixed. We demonstrate the benefits of our model on different real-world image sets, including the challenging KITTI benchmark. We achieve leading performance levels, exceeding competing 3D scene flow methods, and even yielding better 2D motion estimates than all tested dedicated optical flow techniques.
4 0.79939574 263 iccv-2013-Measuring Flow Complexity in Videos
Author: Saad Ali
Abstract: In this paper a notion of flow complexity that measures the amount of interaction among objects is introduced and an approach to compute it directly from a video sequence is proposed. The approach employs particle trajectories as the input representation of motion and maps it into a ‘braid’ based representation. The mapping is based on the observation that 2D trajectories of particles take the form of a braid in space-time due to the intermingling among particles over time. As a result of this mapping, the problem of estimating the flow complexity from particle trajectories becomes the problem of estimating braid complexity, which in turn can be computed by measuring the topological entropy of a braid. For this purpose recently developed mathematical tools from braid theory are employed which allow rapid computation of topological entropy of braids. The approach is evaluated on a dataset consisting of open source videos depicting variations in terms of types of moving objects, scene layout, camera view angle, motion patterns, and object densities. The results show that the proposed approach is able to quantify the complexity of the flow, and at the same time provides useful insights about the sources of the complexity.
5 0.71731907 39 iccv-2013-Action Recognition with Improved Trajectories
Author: Heng Wang, Cordelia Schmid
Abstract: Recently dense trajectories were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets. This paper improves their performance by taking into account camera motion to correct them. To estimate camera motion, we match feature points between frames using SURF descriptors and dense optical flow, which are shown to be complementary. These matches are then used to robustly estimate a homography with RANSAC. Human motion is in general different from camera motion and generates inconsistent matches. To improve the estimation, a human detector is employed to remove these matches. Given the estimated camera motion, we remove trajectories consistent with it. We also use this estimation to cancel out camera motion from the optical flow. This significantly improves motion-based descriptors, such as HOF and MBH. Experimental results on four challenging action datasets (i.e., Hollywood2, HMDB51, Olympic Sports and UCF50) significantly outperform the current state of the art.
6 0.69842875 301 iccv-2013-Optimal Orthogonal Basis and Image Assimilation: Motion Modeling
7 0.69337499 297 iccv-2013-Online Motion Segmentation Using Dynamic Label Propagation
8 0.69222397 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
9 0.67187059 275 iccv-2013-Motion-Aware KNN Laplacian for Video Matting
10 0.66986012 300 iccv-2013-Optical Flow via Locally Adaptive Fusion of Complementary Data Costs
11 0.65584534 33 iccv-2013-A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis
12 0.65558934 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
13 0.65151852 430 iccv-2013-Two-Point Gait: Decoupling Gait from Body Shape
14 0.64619666 145 iccv-2013-Estimating the Material Properties of Fabric from Video
15 0.63745481 12 iccv-2013-A General Dense Image Matching Framework Combining Direct and Feature-Based Costs
16 0.63591814 82 iccv-2013-Compensating for Motion during Direct-Global Separation
17 0.61333793 105 iccv-2013-DeepFlow: Large Displacement Optical Flow with Deep Matching
18 0.60715461 256 iccv-2013-Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation
19 0.59185547 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
20 0.59121096 143 iccv-2013-Estimating Human Pose with Flowing Puppets
topicId topicWeight
[(2, 0.062), (6, 0.017), (7, 0.017), (26, 0.123), (31, 0.059), (34, 0.012), (40, 0.026), (42, 0.087), (61, 0.177), (64, 0.053), (73, 0.062), (89, 0.218), (98, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.87113607 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
Author: Manjunath Narayana, Allen Hanson, Erik Learned-Miller
Abstract: In moving camera videos, motion segmentation is commonly performed using the image plane motion of pixels, or optical flow. However, objects that are at different depths from the camera can exhibit different optical flows even if they share the same real-world motion. This can cause a depth-dependent segmentation of the scene. Our goal is to develop a segmentation algorithm that clusters pixels that have similar real-world motion irrespective of their depth in the scene. Our solution uses optical flow orientations instead of the complete vectors and exploits the well-known property that under camera translation, optical flow orientations are independent of object depth. We introduce a probabilistic model that automatically estimates the number of observed independent motions and results in a labeling that is consistent with real-world motion in the scene. The result of our system is that static objects are correctly identified as one segment, even if they are at different depths. Color features and information from previous frames in the video sequence are used to correct occasional errors due to the orientation-based segmentation. We present results on more than thirty videos from different benchmarks. The system is particularly robust on complex background scenes containing objects at significantly different depths.
2 0.86361057 194 iccv-2013-Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model
Author: Xiao Cai, Feiping Nie, Weidong Cai, Heng Huang
Abstract: Automatic image categorization has become increasingly important with the development of the Internet and the growth in the size of image databases. Although image categorization can be formulated as a typical multiclass classification problem, two major challenges have been raised by real-world images. On one hand, though using more labeled training data may improve the prediction performance, obtaining the image labels is a time-consuming as well as biased process. On the other hand, more and more visual descriptors have been proposed to describe objects and scenes appearing in images, and different features describe different aspects of the visual characteristics. Therefore, how to integrate heterogeneous visual features to do the semi-supervised learning is crucial for categorizing large-scale image data. In this paper, we propose a novel approach to integrate heterogeneous features by performing multi-modal semi-supervised classification on unlabeled as well as unsegmented images. Considering each type of feature as one modality, and taking advantage of the large amount of unlabeled data, our new adaptive multimodal semi-supervised classification (AMMSS) algorithm learns a commonly shared class indicator matrix and the weights for different modalities (image features) simultaneously.
3 0.85872382 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation
Author: Zhiyuan Shi, Timothy M. Hospedales, Tao Xiang
Abstract: We address the problem of localisation of objects as bounding boxes in images with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. We propose a novel framework based on Bayesian joint topic modelling. Our framework has three distinctive advantages over previous works: (1) All object classes and image backgrounds are modelled jointly together in a single generative model so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) The Bayesian formulation of the model enables easy integration of prior knowledge about object appearance to compensate for limited supervision. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Extensive experiments on the challenging VOC dataset demonstrate that our approach outperforms the state-of-the-art competitors.
4 0.85069019 131 iccv-2013-EVSAC: Accelerating Hypotheses Generation by Modeling Matching Scores with Extreme Value Theory
Author: Victor Fragoso, Pradeep Sen, Sergio Rodriguez, Matthew Turk
Abstract: Algorithms based on RANSAC that estimate models using feature correspondences between images can slow down tremendously when the percentage of correct correspondences (inliers) is small. In this paper, we present a probabilistic parametric model that allows us to assign confidence values for each matching correspondence and therefore accelerates the generation of hypothesis models for RANSAC under these conditions. Our framework leverages Extreme Value Theory to accurately model the statistics of matching scores produced by a nearest-neighbor feature matcher. Using a new algorithm based on this model, we are able to estimate accurate hypotheses with RANSAC at low inlier ratios significantly faster than previous state-of-the-art approaches, while still performing comparably when the number of inliers is large. We present results of homography and fundamental matrix estimation experiments for both SIFT and SURF matches that demonstrate that our method leads to accurate and fast model estimations.
5 0.83759707 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
Author: Yuandong Tian, Srinivasa G. Narasimhan
Abstract: Real-world surfaces such as clothing, water and the human body deform in complex ways. The image distortions observed are high-dimensional and non-linear, making it hard to estimate these deformations accurately. The recent data-driven descent approach [17] applies Nearest Neighbor estimators iteratively on a particular distribution of training samples to obtain a globally optimal and dense deformation field between a template and a distorted image. In this work, we develop a hierarchical structure for the Nearest Neighbor estimators, each of which can have only a local image support. We demonstrate in both theory and practice that this algorithm has several advantages over the non-hierarchical version: it guarantees global optimality with significantly fewer training samples, is several orders of magnitude faster, provides a metric to decide whether a given image is “hard” (or “easy”), requiring more (or less) samples, and can handle more complex scenes that include both global motion and local deformation. The proposed algorithm successfully tracks a broad range of non-rigid scenes including water, clothing, and medical images, and compares favorably against several other deformation estimation and tracking approaches that do not provide optimality guarantees.
6 0.83123338 414 iccv-2013-Temporally Consistent Superpixels
7 0.82942128 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
8 0.82818413 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions
9 0.82745522 423 iccv-2013-Towards Motion Aware Light Field Video for Dynamic Scenes
10 0.82720786 150 iccv-2013-Exemplar Cut
11 0.82670724 60 iccv-2013-Bayesian Robust Matrix Factorization for Image and Video Processing
12 0.8264215 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
13 0.82577384 209 iccv-2013-Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation
14 0.82562411 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
15 0.82538229 309 iccv-2013-Partial Enumeration and Curvature Regularization
16 0.82511044 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
17 0.82488316 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
19 0.82406348 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
20 0.82372892 396 iccv-2013-Space-Time Robust Representation for Action Recognition