cvpr cvpr2013 cvpr2013-290 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Gim Hee Lee, Friedrich Fraundorfer, Marc Pollefeys
Abstract: In this paper, we present a visual ego-motion estimation algorithm for a self-driving car equipped with a close-to-market multi-camera system. By modeling the multi-camera system as a generalized camera and applying the non-holonomic motion constraint of a car, we show that this leads to a novel 2-point minimal solution for the generalized essential matrix where the full relative motion including metric scale can be obtained. We provide the analytical solutions for the general case with at least one inter-camera correspondence and a special case with only intra-camera correspondences. We show that up to 6 solutions exist for both cases. We identify a degeneracy when the car undergoes straight motion in the special case with only intra-camera correspondences, where the scale becomes unobservable, and provide a practical alternative solution. Our formulation can be efficiently implemented within RANSAC for robust estimation. We verify the validity of our assumptions on the motion model by comparing our results on a large real-world dataset, collected by a car equipped with 4 cameras with minimal overlapping fields-of-view, against the GPS/INS ground truth.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper, we present a visual ego-motion estimation algorithm for a self-driving car equipped with a close-to-market multi-camera system. [sent-3, score-0.328]
2 By modeling the multicamera system as a generalized camera and applying the non-holonomic motion constraint of a car, we show that this leads to a novel 2-point minimal solution for the generalized essential matrix where the full relative motion including metric scale can be obtained. [sent-4, score-1.436]
3 We identify a degeneracy when the car undergoes straight motion in the special case with only intra-camera correspondences, where the scale becomes unobservable, and provide a practical alternative solution. [sent-7, score-1.045]
4 We verify the validity of our assumptions on the motion model by comparing our results on a large real-world dataset, collected by a car equipped with 4 cameras with minimal overlapping fields-of-view, against the GPS/INS ground truth. [sent-9, score-0.713]
5 In contrast, cameras currently play only a minor role in self-driving cars but are already commonly found in commercial off-the-shelf (COTS) cars for driving and/or parking assistance. [sent-12, score-0.342]
6 An example is the Nissan Qashqai Around View Monitor, where four cameras are mounted on the car to provide a full omnidirectional view around the car. [sent-13, score-0.549]
7 Although such a camera system is currently used only for driving and/or parking assistance, it offers huge potential to become the main sensor for self-driving cars without the need for major modifications. [sent-14, score-0.316]
8 A multi-camera setup can be modeled as a generalized camera as described by Pless [12]. [sent-19, score-0.4]
9 In this work, he derived the generalized epipolar constraint (GEC) and it was shown in [8] that the full relative motion can be obtained with metric scale. [sent-20, score-0.621]
10 We make use of the fact that a generalized camera rigidly fixed onto a car has to adhere to the Ackermann steering principle [15] to simplify the GEC and design an efficient and robust algorithm for visual ego-motion estimation. [sent-21, score-0.744]
11 We show that two point correspondences are sufficient to estimate the generalized essential matrix with metric scale by using the Ackermann motion model (i. [sent-22, score-0.881]
12 e. circular motion on a plane), where there are 2 free parameters - scale and yaw angle - for the relative motion between 2 consecutive frames. [sent-24, score-0.913]
13 Up to 6 solutions exist for the relative motion in both cases. [sent-26, score-0.303]
14 The small number of necessary point correspondences and solutions makes it suitable for robust estimation with RANSAC and real-time operations. [sent-27, score-0.346]
15 We use the special case with only intra-camera correspondences as the default case for our implementation, since there are always more intra-camera correspondences than inter-camera correspondences. [sent-29, score-0.535]
16 We propose a practical method to retrieve the scale when the car undergoes straight motion, from one additional inter-camera correspondence and the known yaw angle, which essentially reverts the special case to the general case. [sent-30, score-1.109]
17 The relative motions can be concatenated together to get the full trajectory of the car. [sent-31, score-0.322]
18 Finally, we relax the Ackermann motion constraint by doing a full 6 degrees of freedom (DOF) pose-graph [11] optimization for loop-closure followed by bundle adjustment for all the poses and 3D points. [sent-33, score-0.352]
19 We verify our approach by comparing our results on a large real-world dataset, collected from a generalized camera setup that consists of 4 cameras mounted on a car (see Figure 1) looking front, rear, left and right with minimal overlapping fields-of-view, against the GPS/INS ground truth. [sent-34, score-0.989]
20 Our main contributions can be summarized as follows: • A practical ego-motion estimation algorithm with metric scale from a new formulation of the generalized essential matrix using the GEC and the Ackermann motion model. [sent-35, score-0.663]
21 • An investigation and a practical solution to the degenerate case of straight motion with only intra-camera correspondences. [sent-37, score-0.313]
22 Related Works Our work builds on previous works on generalized cameras, and it is also related to other works on multi-camera systems that do not use the generalized camera formulation. [sent-39, score-0.763]
23 The idea of a generalized camera system where a single epipolar constraint is used to describe the relative motion of a set of cameras mounted rigidly on a single body over two different frames was first proposed by Pless [12]. [sent-40, score-1.014]
24 The main difference between a generalized camera system and a single pinhole camera is the absence of a single center of projection. [sent-41, score-0.576]
25 Pless derived a generalized essential matrix, which is a 6×6 matrix with 18 unknowns, from the GEC. [sent-42, score-0.323]
26 He suggested a linear 17-point algorithm to solve for the generalized essential matrix. [sent-43, score-0.323]
27 Li et al. [8] extended the work on the GEC by identifying the degenerate cases for the locally-central generalized camera setup. [sent-47, score-0.496]
28 The locally-central generalized camera refers to a configuration where the cameras provide only intra-camera correspondences. [sent-48, score-0.539]
29 In contrast to the general case, where the generalized camera uses inter-camera correspondences, Li et al. [sent-49, score-0.4]
30 A minimal solution for the generalized essential matrix, suitable for RANSAC hypothesis generation, was proposed by Stewénius et al. [sent-56, score-0.439]
31 The derived minimal solution uses 6 point correspondences to solve the GEC problem. [sent-58, score-0.338]
32 For the first time, this allows efficient motion estimation of a generalized camera system with a robust estimator like RANSAC on a large real-world dataset. [sent-65, score-0.606]
33 Scaramuzza et al. developed a method where the motion and metric scale can be computed with a 2-point algorithm from a single monocular camera. [sent-72, score-0.301]
34 Similar to our algorithm for the generalized camera with only intra-camera correspondences, the scale becomes unobservable for straight motion. [sent-74, score-0.645]
35 In contrast, we propose a practical method to retrieve the metric scale when the car is moving straight with our formulation using the generalized camera setup. [sent-75, score-0.937]
36 Clipp et al. estimated the relative motion without the scale using the 5-point algorithm [10] in one camera, while the metric scale is retrieved from an additional point from another camera. [sent-78, score-0.504]
37 Kazik et al. estimated the relative motions of two cameras with non-overlapping fields of view by using the 5-point algorithm [10]. [sent-82, score-0.303]
38 In this section, we briefly describe the concept of the generalized camera model, which is essential for understanding the remainder of the paper. [sent-94, score-0.499]
39 Figure 2 shows an illustration of a generalized camera setup on a car. [sent-96, score-0.4]
40 The generalized camera has a reference frame denoted by V . [sent-98, score-0.451]
41 The dependency on a single camera projection center to describe a 3D point $X_j$ is removed by using the 6-vector Plücker line $l_{ij} = [u_{ij}^T \;\; (t_{C_i} \times u_{ij})^T]^T$ (1), which describes the light ray that connects $x_{ij}$ and $X_j$. [sent-101, score-0.49]
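To make the construction of Equation 1 concrete, here is a minimal Python sketch (not from the paper; the helper name plucker_line is ours) that builds the 6-vector Plücker line from a camera center and a ray direction:

```python
import numpy as np

def plucker_line(t_ci, u_ij):
    """Plucker line of Eq. (1): l_ij = [u_ij^T, (t_Ci x u_ij)^T]^T.

    t_ci : 3-vector, center of camera C_i in the generalized-camera frame V
           (translation part of the extrinsics T_Ci).
    u_ij : 3-vector, direction of the ray through image point x_ij.
    """
    u = np.asarray(u_ij, dtype=float)
    u = u / np.linalg.norm(u)                        # direction part, unit length
    m = np.cross(np.asarray(t_ci, dtype=float), u)   # moment part t_Ci x u_ij
    return np.concatenate([u, m])
```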
42 This changes the point correspondences from 2 image coordinates to 2 intersecting rays and, as shown in [12], the epipolar constraint now becomes $l_{ij,k+1}^T \, E_{GC} \, l_{ij,k} = 0$ (2). [sent-103, score-0.306]
43 $E_{GC}$ is the generalized essential matrix from the GEC. [sent-114, score-0.323]
44 R is the rotation matrix between the generalized camera reference frames at k and k + 1. [sent-115, score-0.441]
45 It is important to note that t is determined only up to scale from the conventional essential matrix for a single camera, but the metric scale can be fully determined using the generalized camera, as shown in [8]. [sent-119, score-0.891]
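As a rough illustration of Equation 2, the following sketch builds the 6×6 generalized essential matrix and evaluates the algebraic GEC residual. The block layout shown is the one commonly attributed to Pless [12]; sign and ordering conventions differ between papers, so treat it as an assumption:

```python
import numpy as np

def skew(v):
    """3x3 cross-product matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def generalized_essential(R, t):
    """E_GC = [[ [t]_x R, R ], [ R, 0 ]] (assumed block convention)."""
    E = skew(t) @ R
    return np.block([[E, R], [R, np.zeros((3, 3))]])

def gec_residual(l_k1, l_k, R, t):
    """Algebraic residual l_{k+1}^T E_GC l_k; zero for an exact correspondence."""
    return float(l_k1 @ generalized_essential(R, t) @ l_k)
```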
46 System overview for motion estimation of the generalized camera on a car. [sent-186, score-0.606]
47 Figure 3 shows the overview of the pipeline for the estimation of the relative motion between two consecutive car frames using the generalized camera. [sent-187, score-0.877]
48 Our 2-Point RANSAC algorithm always computes the relative motion based on the intra-camera correspondences. [sent-190, score-0.296]
49 In the case where the relative yaw angle θ is found to be near zero from the 2-Point RANSAC, the inter-camera point correspondences are extracted and used to compute the scale ρ from a 1-point exhaustive search. [sent-191, score-0.768]
50 Point Correspondences Our motion estimation algorithm relies on two sets of point correspondences - intra-camera and inter-camera. [sent-196, score-0.463]
51 Specifically, intra-camera point correspondences refer to correspondences which are seen by the same camera over
52 two consecutive frames, and inter-camera refers to correspondences which are seen by different cameras over two consecutive frames, as illustrated in Figure 4. [sent-235, score-0.579]
53 In principle, our generalized camera configuration on the car always allows inter-camera point correspondences. [sent-237, score-0.73]
54 This is because part of the scene from the front camera will be seen by the left and right cameras at the next frame. [sent-238, score-0.362]
55 Similarly, part of the scenes from the left and right cameras will always be seen by the rear camera in the next frame. [sent-239, score-0.447]
56 Hence, we chose to use intra-camera correspondences as the default case for our implementation and rely on one additional inter-camera correspondence to retrieve the scale in the degenerate case when the car is moving straight. [sent-241, score-0.914]
57 Figure 5 shows the illustration of a generalized camera on a car that undergoes the Ackermann motion over 2 consecutive frames k and k + 1. [sent-247, score-1.002]
58 Specifically, the car undergoes a circular motion about the Instantaneous Center of Rotation (ICR) with the Ackermann model. [sent-248, score-0.54]
59 The radius of the circular motion goes to infinity when the car is moving straight. [sent-249, score-0.506]
60 The main objective of motion estimation is to compute the relative motion R and t between Vk and Vk+1 . [sent-250, score-0.459]
61 Following the derivation in [13], using $V_k$ as the reference frame, it can be observed from the diagram that $$R = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad t = \rho \begin{bmatrix} \cos\varphi_v \\ \sin\varphi_v \\ 0 \end{bmatrix} \quad (3)$$ where θ is the relative yaw angle and ρ is the scale of the relative translation. [sent-251, score-0.55]
62 We immediately see that the relative motion between frames $V_k$ and $V_{k+1}$ depends on only 2 parameters - the scale ρ and the yaw angle θ. [sent-254, score-0.682]
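A minimal sketch of this two-parameter Ackermann motion follows, assuming (as in the circular-motion derivation of [13]) that the translation direction bisects the yaw, i.e. $\varphi_v = \theta/2$; the function name is ours:

```python
import numpy as np

def ackermann_motion(theta, rho):
    """Relative motion (R, t) of Eq. (3) from yaw angle theta and scale rho."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    phi_v = theta / 2.0            # circular-arc assumption: phi_v = theta / 2
    t = rho * np.array([np.cos(phi_v), np.sin(phi_v), 0.0])
    return R, t
```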
63 We need 2 Plücker line correspondences to solve for the 2 unknowns ρ and θ in Equation 5. [sent-291, score-0.431]
64 $\sin^2\frac{\theta}{2} + \cos^2\frac{\theta}{2} = \alpha^2 + \beta^2 = 1$ (9). Using the Sylvester resultant [3] method to eliminate $\alpha = \cos\frac{\theta}{2}$ from the two polynomials in Equations 8 and 9, we get a degree-6 polynomial equation in terms of $\beta = \sin\frac{\theta}{2}$, which can be further reduced to a cubic polynomial by substituting $\gamma = \beta^2$. [sent-294, score-0.391]
65 Finally, the yaw angle θ of the relative motion can be computed from $\beta = \sin\frac{\theta}{2}$, and the scale ρ can be computed from Equation 7. [sent-305, score-0.689]
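The back-substitution step can be sketched as follows. The cubic coefficients c3..c0 and the scale expression of Equation 7 are taken as given (their closed forms are in the paper and are not reproduced here), so scale_from_alpha_beta is a hypothetical placeholder:

```python
import numpy as np

def solve_yaw_and_scale(c3, c2, c1, c0, scale_from_alpha_beta):
    """Recover candidate (theta, rho) pairs from the cubic in gamma = beta^2."""
    candidates = []
    for gamma in np.roots([c3, c2, c1, c0]):
        if abs(gamma.imag) > 1e-9 or not 0.0 <= gamma.real <= 1.0:
            continue                       # beta = sin(theta/2) must be real
        for beta in (np.sqrt(gamma.real), -np.sqrt(gamma.real)):
            alpha = np.sqrt(max(1.0 - beta**2, 0.0))   # assumes |theta| < pi
            theta = 2.0 * np.arctan2(beta, alpha)
            candidates.append((theta, scale_from_alpha_beta(alpha, beta)))
    return candidates
```

Note that 3 real roots times 2 signs of β gives up to 6 candidates, consistent with the solution count stated in the paper.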
66 Therefore, we adopted the intra-camera point correspondence strategy in our implementation. [sent-309, score-0.335]
67 Degenerate Case: Metric Scale Computation The metric scale of the relative motion cannot be uniquely determined from the GEC when the car is moving straight, i.e. θ = 0. [sent-312, score-0.851]
68 This suggests that we can make use of one additional inter-camera point correspondence to find the metric scale when (ρ = 1, θ = 0) turns out to be the solution with the most inliers in the pure intra-camera correspondence case. [sent-326, score-0.572]
69 In practice, this can be done effectively by doing an exhaustive search through all inter-camera point correspondences for inliers. [sent-327, score-0.304]
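Under the straight-motion assumption (R = I, t = ρe for a known heading direction e), the GEC of Equation 2 becomes linear in ρ for each Plücker correspondence l = [u; m], so every inter-camera correspondence yields one scale hypothesis. The sketch below is our reading of the 1-point exhaustive search, not the paper's code; it reuses skew and gec_residual from the earlier sketch and assumes e = [1, 0, 0]^T:

```python
import numpy as np

E_HAT = skew(np.array([1.0, 0.0, 0.0]))   # assumed heading direction for theta = 0

def scale_hypothesis(l_k1, l_k):
    """With R = I and t = rho*e, the GEC reads
    rho * u'^T [e]_x u + u'^T m + m'^T u = 0, which is linear in rho."""
    u1, m1 = l_k1[:3], l_k1[3:]
    u0, m0 = l_k[:3], l_k[3:]
    a = u1 @ E_HAT @ u0
    b = u1 @ m0 + m1 @ u0
    return -b / a if abs(a) > 1e-12 else None

def scale_by_exhaustive_search(inter_lines, threshold):
    """1-point exhaustive search: hypothesize rho from every inter-camera
    correspondence and keep the hypothesis with the most inliers."""
    best_rho, best_inliers = None, -1
    for l_k1, l_k in inter_lines:
        rho = scale_hypothesis(l_k1, l_k)
        if rho is None or rho <= 0.0:
            continue
        R, t = np.eye(3), rho * np.array([1.0, 0.0, 0.0])
        inliers = sum(abs(gec_residual(a, b, R, t)) < threshold
                      for a, b in inter_lines)
        if inliers > best_inliers:
            best_rho, best_inliers = rho, inliers
    return best_rho
```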
70 The essential matrix of each individual camera can be computed from the hypotheses of the relative motion R and t of the car reference frame V and the extrinsics $T_{C_i}$ of the camera. [sent-332, score-0.923]
71 The number of iterations m needed in RANSAC is given by $m = \ln(1-p)/\ln(1-\upsilon^n)$, where n is the number of correspondences needed to form the hypothesis, p is the probability that all selected features are inliers, and υ is the probability that any selected correspondence is an inlier. [sent-333, score-0.368]
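For concreteness, the iteration count can be evaluated as below (a minimal sketch; the sample numbers assume p = 0.99 and a 50% inlier ratio, which are illustrative values, not the paper's):

```python
import math

def ransac_iterations(n, p, upsilon):
    """m = ln(1 - p) / ln(1 - upsilon^n)."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - upsilon ** n))

# With p = 0.99 and upsilon = 0.5: a 2-point solver needs 17 iterations,
# a 6-point solver needs 293, and a 17-point solver roughly 6e5, which is
# why the small sample size matters for robust real-time operation.
print(ransac_iterations(2, 0.99, 0.5))   # 17
```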
72 $(x_{ij}, x_{i'j})$ are the point correspondences from cameras $C_i$ and $C_{i'}$. [sent-348, score-0.433]
73 The set C gives the intra- and inter-camera indices for all the point correspondences over frames k and k+1. [sent-352, score-0.308]
74 R and t are the relative motion of the car defined by Equation 3, which are functions of the parameters ρ and θ we are optimizing over. [sent-362, score-0.505]
75 Kalman Filtering We implement a Kalman filter with a constant-velocity prior for both the scale ρ and the yaw angle θ to smooth out noisy estimates from our algorithm. [sent-366, score-0.417]
76 We know that the control inputs for a car involve independent steering and linear speed. [sent-367, score-0.292]
77 This means that two independent 1D Kalman filters can be applied to smooth out the estimates of the scale ρ and the yaw angle θ, respectively. [sent-368, score-0.378]
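A minimal sketch of one such 1D constant-velocity Kalman filter follows; the state is [value, rate], and the noise parameters q and r are illustrative assumptions, not values from the paper. One instance smooths ρ and a second, independent instance smooths θ:

```python
import numpy as np

class ConstantVelocityKF1D:
    """1D Kalman filter with a constant-velocity prior."""
    def __init__(self, q=1e-3, r=1e-2, dt=1.0):
        self.x = np.zeros(2)                        # state [value, rate]
        self.P = np.eye(2)                          # state covariance
        self.F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity model
        self.H = np.array([[1.0, 0.0]])             # we only measure the value
        self.Q = q * np.eye(2)                      # process noise (assumed)
        self.R = np.array([[r]])                    # measurement noise (assumed)

    def step(self, z):
        # predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # update with measurement z
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + (K @ y).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.x[0]                            # smoothed value

kf_rho, kf_theta = ConstantVelocityKF1D(), ConstantVelocityKF1D()
```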
78 Figure 6 shows examples of the relative motions of our car from the 2-point algorithm (blue line) and the smoothed estimates from the Kalman filters (green line). [sent-369, score-0.416]
79 Four cameras with fisheye lenses are mounted on the car, looking front, rear, left and right. [sent-373, score-0.607]
80 Example of scales ρ (a) and yaw angles θ (b) between consecutive frames after Kalman filtering. [sent-377, score-0.433]
81 The intrinsics of the cameras are calibrated with [9] and the extrinsics are provided by the car manufacturer. [sent-382, score-0.532]
82 The dataset was collected while driving the car in a loop that is about 600m around a car park. [sent-383, score-0.541]
83 Scales ρ (a) and yaw angles θ (b) between all consecutive frames from our motion estimation algorithm compared with GPS/INS ground truth. [sent-394, score-0.641]
84 Figures 8(a) and 8(b) plot all the 2499 relative motions - scales and yaw angles - estimated with our 2-point algorithm. [sent-395, score-0.488]
85 Figure 8(b) shows that the car is moving straight for most of the trajectory except for three major turns at around frames 900, 1300 and 2200. [sent-397, score-0.569]
86 These are the segments of the trajectory where both the scales and yaw angles are estimated solely with intra-camera point correspondences. [sent-398, score-0.474]
87 9% of the trajectory when the car is moving straight. [sent-400, score-0.407]
88 Figure 8(b) shows that the estimates for the yaw angles follow the GPS/INS ground truth closely even without Kalman filtering. [sent-406, score-0.326]
89 The relative motions estimated with our algorithm are concatenated together to form the full trajectory of the car. [sent-408, score-0.322]
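Concatenation amounts to composing the per-frame relative poses; a short sketch follows, assuming each (R_k, t_k) expresses frame V_k+1 in frame V_k (the composition convention is our assumption, not spelled out in the text):

```python
import numpy as np

def concatenate_motions(relative_motions):
    """Chain relative motions (R_k, t_k) into global car positions."""
    R_g, t_g = np.eye(3), np.zeros(3)
    trajectory = [t_g.copy()]
    for R_k, t_k in relative_motions:
        t_g = t_g + R_g @ t_k      # move by t_k expressed in the world frame
        R_g = R_g @ R_k            # accumulate rotation
        trajectory.append(t_g.copy())
    return np.array(trajectory)    # (F+1) x 3 positions for plotting
```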
90 The blue line on Figure 9 shows the trajectory recovered from the relative motions estimated with Kalman filtering. [sent-409, score-0.332]
91 Finally, we do a full bundle adjustment over the whole trajectory and all the reconstructed 3D points. [sent-412, score-0.3]
92 Note that we relaxed the Ackermann constraint and perform the full 6-DOF optimization over the car poses in the loop-closure and bundle adjustment. [sent-413, score-0.387]
93 Images from the cameras come at 12 fps and this means that a real-time implementation on the car is possible if we skip every other frame. [sent-419, score-0.391]
94 Conclusion In this paper, we demonstrated visual ego-motion estimation for a car equipped with a multi-camera system with minimal overlapping fields-of-view. [sent-421, score-0.409]
95 The camera system was modeled as a generalized camera and we showed that the generalized essential matrix simplifies significantly when constraining the motion to the Ackermann motion model (i. [sent-422, score-1.282]
96 We derived an analytical 2-point minimal solution for the general case with at least one inter-camera correspondence and a special case with only intra-camera correspondences. [sent-425, score-0.366]
97 We showed that up to 6 solutions exist for the relative motion in both cases. [sent-426, score-0.352]
98 Top view of trajectory and 3D map points after pose-graph loop-closure and full bundle adjustment compared with GPS/INS ground truth. [sent-430, score-0.337]
99 A linear approach to motion estimation using generalized camera models. [sent-493, score-0.606]
100 Absolute scale in structure from motion from a single vehicle mounted camera by exploiting nonholonomic constraints. [sent-525, score-0.505]
wordName wordTfidf (topN-words)
[('gec', 0.311), ('car', 0.252), ('yaw', 0.24), ('generalized', 0.224), ('correspondences', 0.222), ('kalman', 0.207), ('tc', 0.192), ('ackermann', 0.192), ('camera', 0.176), ('motion', 0.167), ('ucker', 0.156), ('cameras', 0.139), ('ransac', 0.138), ('trajectory', 0.115), ('straight', 0.111), ('tci', 0.107), ('scaramuzza', 0.104), ('vk', 0.102), ('essential', 0.099), ('degenerated', 0.096), ('extrinsics', 0.092), ('bundle', 0.092), ('rear', 0.089), ('relative', 0.086), ('scale', 0.082), ('minimal', 0.081), ('pless', 0.08), ('mounted', 0.08), ('correspondence', 0.079), ('equation', 0.078), ('intracamera', 0.078), ('rcti', 0.078), ('motions', 0.078), ('undergoes', 0.074), ('pl', 0.07), ('xij', 0.07), ('fraundorfer', 0.069), ('egc', 0.069), ('consecutive', 0.068), ('inliers', 0.067), ('polynomial', 0.065), ('cars', 0.063), ('uniquely', 0.061), ('june', 0.058), ('sin', 0.058), ('nius', 0.057), ('angle', 0.056), ('odometry', 0.055), ('stew', 0.055), ('cubic', 0.054), ('line', 0.053), ('metric', 0.052), ('cots', 0.052), ('kazik', 0.052), ('posegraph', 0.052), ('tcwu', 0.052), ('unobservable', 0.052), ('uwu', 0.052), ('rigidly', 0.052), ('frame', 0.051), ('solutions', 0.05), ('adjustment', 0.05), ('epipolar', 0.049), ('refinement', 0.049), ('showed', 0.049), ('angles', 0.049), ('intrinsics', 0.049), ('special', 0.048), ('front', 0.047), ('exhaustive', 0.047), ('circular', 0.047), ('sampson', 0.046), ('rci', 0.046), ('analytical', 0.045), ('ux', 0.044), ('full', 0.043), ('always', 0.043), ('frames', 0.041), ('moving', 0.04), ('parking', 0.04), ('steering', 0.04), ('clipp', 0.04), ('siegwart', 0.04), ('estimation', 0.039), ('equipped', 0.037), ('surf', 0.037), ('cancel', 0.037), ('uij', 0.037), ('degeneracy', 0.037), ('driving', 0.037), ('ground', 0.037), ('cos', 0.036), ('omnidirectional', 0.035), ('polynomials', 0.035), ('roots', 0.035), ('scales', 0.035), ('pi', 0.035), ('solution', 0.035), ('point', 0.035), ('putting', 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 290 cvpr-2013-Motion Estimation for Self-Driving Cars with a Generalized Camera
Author: Gim Hee Lee, Friedrich Fraundorfer, Marc Pollefeys
Abstract: In this paper, we present a visual ego-motion estimation algorithm for a self-driving car equipped with a close-to-market multi-camera system. By modeling the multi-camera system as a generalized camera and applying the non-holonomic motion constraint of a car, we show that this leads to a novel 2-point minimal solution for the generalized essential matrix where the full relative motion including metric scale can be obtained. We provide the analytical solutions for the general case with at least one inter-camera correspondence and a special case with only intra-camera correspondences. We show that up to 6 solutions exist for both cases. We identify a degeneracy when the car undergoes straight motion in the special case with only intra-camera correspondences, where the scale becomes unobservable, and provide a practical alternative solution. Our formulation can be efficiently implemented within RANSAC for robust estimation. We verify the validity of our assumptions on the motion model by comparing our results on a large real-world dataset, collected by a car equipped with 4 cameras with minimal overlapping fields-of-view, against the GPS/INS ground truth.
2 0.19996966 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition
Author: Qiang Hao, Rui Cai, Zhiwei Li, Lei Zhang, Yanwei Pang, Feng Wu, Yong Rui
Abstract: 3D model-based object recognition has been a noticeable research trend in recent years. Common methods find 2D-to-3D correspondences and make recognition decisions by pose estimation, whose efficiency usually suffers from noisy correspondences caused by the increasing number of target objects. To overcome this scalability bottleneck, we propose an efficient 2D-to-3D correspondence filtering approach, which combines a light-weight neighborhoodbased step with a finer-grained pairwise step to remove spurious correspondences based on 2D/3D geometric cues. On a dataset of 300 3D objects, our solution achieves ∼10 times speed improvement over the baseline, with a comparable recognition accuracy. A parallel implementation on a quad-core CPU can run at ∼3fps for 1280× 720 images.
3 0.15276864 170 cvpr-2013-Fast Rigid Motion Segmentation via Incrementally-Complex Local Models
Author: Fernando Flores-Mangas, Allan D. Jepson
Abstract: The problem of rigid motion segmentation of trajectory data under orthography has been long solved for nondegenerate motions in the absence of noise. But because real trajectory data often incorporates noise, outliers, motion degeneracies and motion dependencies, recently proposed motion segmentation methods resort to non-trivial representations to achieve state of the art segmentation accuracies, at the expense of a large computational cost. This paper proposes a method that dramatically reduces this cost (by two or three orders of magnitude) with minimal accuracy loss (from 98.8% achieved by the state of the art, to 96.2% achieved by our method on the standard Hopkins 155 dataset). Computational efficiency comes from the use of a simple but powerful representation of motion that explicitly incorporates mechanisms to deal with noise, outliers and motion degeneracies. Subsets of motion models with the best balance between prediction accuracy and model complexity are chosen from a pool of candidates, which are then used for segmentation.
This representation is then fed into a clustering algorithm to obtain a segmentation result. A prime example of this type of method is Sparse Subspace Clustering (SSC) [3] in which each trajectory is represented as a sparse linear combination of a few other basis trajectories. The assumption is that the basis trajectories must belong to the same rigid motion as the reconstructed trajectory (or else, the reconstruction would be impossible). When the assumption is true, the sparse mixing coefficients can be interpreted as the connectivity weights of a graph (or a similarity matrix), which is then (spectral) clustered to obtain a segmentation result. At the time of publication, SSC produced segmentation results three times more accurate than the best predecessor. The practical downside, however, is the inherently large computational cost of finding the optimal sparse representation, which is at least cubic on the number of trajectories. The work of [14] also falls within the class of subspace separation algorithms. Their approach is based on clustering the principal angles (CPA) of the local subspaces associated to each trajectory and its nearest neighbors. The clustering re-weights a traditional metric of subspace affinity between principal angles. Re-weighted affinities are then used for segmentation. The approach produces segmentation results with accuracies similar to those of SSC, but the computational cost is close to 10 times bigger than SSC’s. In this work we argue that competitive segmentation results are possible using a simple but powerful representation of motion that explicitly incorporates mechanisms to deal with noise, outliers and motion degeneracies. The proposed method is approximately 2 or 3 orders of magnitude faster than [3] and [14] respectively, currently considered the state of the art. 1.1. Affine Motion Projective Geometry is often used to model the image motion of trajectories from rigid objects between pairs of frames. However, alternative geometric relationships that facilitate parameter computation have also been proven useful for this purpose. For instance, in perspective projection, general image motion from rigid objects can be modeled via the composition of two elements: a 2D homography, and parallax residual displacements [5]. The homography describes the motion of an arbitrary plane, and the parallax residuals account for relative depths, that are unaccounted for by the planar surface model. Under orthography, in contrast, image motion of rigid objects can be modeled via the composition of a 2D affine transformation plus epipolar residual displacements. The 2D affine transformation models the motion of an arbitrary plane, and the epipolar residuals account for relative depths. Crucially, these two components can be computed separately and incrementally, which enables an explicit mechanism to deal with motion degeneracy. In the context of 3D motion, a motion is degenerate when the trajectories originate from a planar (or linear) object, or when neither the camera nor the imaged object exercise all of their degrees of freedom, such as when the object only translates, or when the camera only rotates. These are common situations in real world video sequences. The incremental nature of the decompositions described above, facilitate the transition between degenerate motions and nondegenerate ones. Planar Model Under orthography, the projection of trajectories from a planar surface can be modeled with the affine transformation: ⎣⎡xy1c ⎦⎤=?0D? 
1t?⎡⎣yx1w ⎦⎤= A2wD→c⎣⎡yx1w ⎦⎤, (1) where D ∈ R2×2 is an invertible matrix, and t ∈ R2 is a threarnesl Datio ∈n v Rector. Trajectory icboloer mdiantartixe,s (axnwd , tyw ∈) are in the plane’s reference frame (modulo a 2D affine transformation) and (xc, yc) are image coordinates. Now, let W ∈ R2F×P be matrix of trajectory data that conNtaoiwns, tlehet x a n∈d y image coordinates of P feature points tracked through F frames, as in TocmputehW pa=r⎢m⎣ ⎡⎢ etyx e1F r.,s 1 ofA· . ·2.Dfyx r1Fo m., P tra⎦⎥ ⎤je.ctorydat,(l2e)t C = [c1, c2 , c3] ∈ R2f×3 be three columns (three full trajectories) from W, and let = be the x and y coordinates of the i-th control trajectory at frame f. Then the transformation between points from an arbitrary source frame s to a target frame f can be written as: cif [ci2f−1, ci2f]? ?c1f1 c12f c1f3?= A2sD→f?c11s and A2s→Df c12s c13s?, (3) ?−1. (4) can be simply computed as: A2sD→f= ? c11f c12f c13f ? ? c11s c12s c13s The inverse in the right-hand side matrix of Eq. 4 exists so long as the points cis are not collinear. For simplicity we refer to as and consequently A2sD is the identity matrix. A2s→Df A2fD 222222556088 3D Model In order to upgrade a planar (degenerate) model into a full 3D one, relative depth must be accounted using the epipolar residual displacements. This means extending Eq. 1 with a direction vector, scaled by the corresponding relative depth of each point, as in: ⎣⎡xy1c ⎦⎤=?0?D? t1? ⎡⎣xy1w ⎦⎤+ δzw⎣⎡a 0213 ⎦⎤. The depth δzw is relative to the arbitrary plane tion is modeled by A2D; a point that lies on would have δzw = 0. We call the orthographic the plane plus parallax decomposition, the 2D Epipolar (2DAPE) decomposition. Eq. 5 is equivalent to (5) whose mosuch plane version of Affine Plus wher⎣⎡ ixyt1c is⎤⎦cl=ear⎡⎣tha 120t hea p201a2rma2e10t3erst1o2f⎦⎤A⎣⎢⎡3Dδyxd1zwefin⎦⎥⎤ean(o6r)thographically projected 3D affine transformation. Deter- mining the motion and structure parameters of a 3D model from point correspondences can be done using the classical matrix factorization approach [10], but besides being sensitive to noise and outliers, the common scenario where the solution becomes degenerate makes the approach difficult to use in real-world applications. Appropriately accommodating and dealing with the degenerate cases is one of the key features of our work. 2. Overview of the Method The proposed motion segmentation algorithm has three stages. First, a pool of M motion model hypotheses M = s{tMag1e , . . . , rMst,M a} p oiso generated using a omdeetlh hoydp tohthate csoesm Mbine =s a MRandom Sampling naenrda eCdon usseinngsu as m(ReAthNodS tAhaCt) [o4m] bteinche-s nique with the 2DAPE decomposition. The goal is to generate one motion model for each of the N independent, rigidly-moving objects in the scene; N is assumed to be known a priori. The method instantiates many more models than those expected necessary (M ? N) in an attempt ienlscr tehaasne tthhoes elik eexlpiheocotedd o nfe generating Mco ?rrec Nt m) iond aenl proposals for all N motions. A good model accurately describes a large subset of coherently moving trajectories with the smallest number of parameters (§3). Ialnl ethste n suemcobnedr stage, msubetseertss o§f3 )m.otion models from M are ncom thebi sneecdo ntod explain ualbl sthetes trajectories mino tdheel sequence. The problem is framed as an objective function that must be minimized. 
The objective function is the negative loglikelihood over prediction accuracy, regularized by model complexity (number of model parameters) and modeling overlap (trajectories explained by multiple models). Notice that after this stage, the segmentation that results from the optimal model combination could be reported as a segmentation result (§5). ioTnhe r tshuilrtd ( stage incorporates the results from a set of model combinations that are closest to the optimal. Segmentation results are aggregated into an affinity matrix, which is then passed to a spectral clustering algorithm to produce the final segmentation result. This refinement stage generally results in improved accuracy and reduced segmentation variability (§6). 3. Motion Model Instantiation Each model M ∈ M is instantiated independently using RacAhN mSAodCel. MThis ∈ c Mhoi cies niss manotitiavteatded in bdeecpaeunsdee otlfy th usismethod’s well-known computational efficiency and robustness to outliers, but also because of its ability to incorporate spatially local constraints and (as explained below) because most of the computations necessary to evaluate a planar model can be reused to estimate the likelihoods of a potentially necessary 3D model, yielding significant computational savings. The input to our model instantiation algorithm is a spatially-local, randomly drawn subset of trajectory data Wˆ[2F×I] ⊆ W[2F×P] (§3.1). In turn, at each RANSAC trial, the algorithm draw(§s3 uniformly d,i asttr eibaucthed R, A rNanSdoAmC subsets of three control trajectories (C[2F×3] ⊂ Wˆ[2F×I]). Each set of control trajectories is used to estim⊂ate the family of 2D affine transformations {A1, . . . , AF} between the iblyase o ffr 2aDm aef ainnde aralln sotfoherrm fartaimoness { iAn the sequence, wtwheicehn are then used to determine a complete set of model parameters M = {B, σ, C, ω}. The matrix B ∈ {0, 1}[F×I] indicates Mwhe =the {rB t,hσe ,iC-th, trajectory asthroixu Bld ∈b e predicted by model M at frame f (inlier, bif = 1) or not (outlier, bif = 0), σ = {σ1 , . . . , σF} are estimates of the magnitude of the σnois =e {foσr each fram}e a, aen eds ω ∈at {s2 oDf, t3hDe} m isa tnhietu edsetim ofa ttehde nmooidseel f type. hTh fera goal aisn dto ω ωfin ∈d {t2heD c,3oDntr}ol is points tainmda ttehed associated parameters that minimize the objective function O(Wˆ,M) =f?∈Fi?∈IbifLω? wˆif| Af,σf?+ Ψ(ω) + Γ(B) across (7) wˆfi a number of RANSAC trials, where = = are the coordinates of the i-th trajectory from the support subset at frame f. The negative log-likelihood term Lω (·) penalizes reconstruction error, while Ψ(·) and Γ(·) are regularizers. Tcohen tthrureceti otenr mer-s are ,d wefhinileed Ψ Ψ b(e·l)ow an. Knowing that 2D and 3D affine models have 6 and 8 degrees of freedom respectively, Ψ(ω) regularizes over model complexity using: (xif, yif) ( wˆ 2if−1, wˆ i2f) Wˆ Ψ(ω) =?86((FF − − 1 1)), i f ωω== 32DD. 222222556199 (8) Γ(B) strongly penalizes models that describe too few trajectories: Γ(B) =?0∞,, oifth?erwI?iseFbif< Fλi (9) The control set C whose M minimizes Eq. 7 across a number of RANSAC trials becomes part of the pool of candidates M. 2D likelihoods. For the planar case (ω = 2D) the negative log-likelihood term is evaluated with: L2D( wˆif| Af,σf) = −log?2π|Σ1|21exp?−21rif?Σ−1rif??, (10) which is a zero-mean 2D Normal distribution evaluated at the residuals The spherical covariance matrix is Σ = rif. rif (σf)2I. 
The residuals are determined by the differences between the predictions made by a hypothesized model Af, and the observations at each frame ?r?1f?=? w˜1?f?− Af? w˜1?s?. (11) 3D likelihoods. The negative log-likelihood term for the 3D case is based on the the 2DAPE decomposition. The 2D affinities Af and residuals rf are reused, but to account for the effect of relative depth, an epipolar line segment ef is robustly fit to the residual data at each frame (please see supplementary material for details on the segment fitting algorithm). The 2DAPE does not constrain relative depths to remain constant across frames, but only requires trajectories to be close to the epipolar line. So, if the unitary vector ef indicates the orthogonal direction to ef, then the negativ⊥e log-likelihood term for the 3D case is estimated with: L3D( wˆfi| Af,σf) = −2log⎜⎝⎛√21πσfexp⎪⎨⎪⎧−?r2if(?σfe)f⊥2?2⎪⎬⎪⎫⎞⎟⎠, ⎠(12,) which is also a zero-mean 2D Norma⎩l distribution ⎭computed as the product of two identical, separable, singlevariate, normal distributions, evaluated at the distance from the residual to the epipolar line. The first one corresponds to the actual deviation in the direction of ef , which is analyti- cally computed using rif?ef. The seco⊥nd one corresponds to an estimate of the deviat⊥ion in the perpendicular direction (ef), which cannot be determined using the 2DAPE decomposition model, but can be approximated to be equal to rif ? ef, which is a plausible estimate under the isotropic noise as⊥sumption. Note that Eq. 7 does not evaluate the quality of a model using the number of inliers, as it is typical for RANSAC. Instead, we found that better motion models resulted from Algorithm 1: Motion model instantiation × Algorithm 1: Motion model instantiation Input:b Traasejec frtoamrye d bata W[2F×P], number of RANSAC trials K, arbitrary Output: Parameters of the motion model M = {B , σn , ω} // determine the training set c ← rand( 1, P) ; r ← rand(rmin , rmax ) // random center and radius I] ← t ra j e ct oriesWithinDis k (W, r,c) // support subset X ← homoCoords(Wˆb) // points at base frame for K RANSAC trials do Wˆ[2F Wˆ return M = {B} optimizing over the accuracy ofthe model predictions for an (estimated) inlier subset, which also means that the effect of outliers is explicitly uncounted. Figure 1.b shows an example of class-labeled trajectory data, 1.c shows a typical spatially-local support subset. Figures 1.d and 1.e show a model’s control points and its corresponding (class-labeled) residuals, respectively. A pseudocode description of the motion instantiation algorithm is provided in Algorithm 1. Details on how to determine Wˆ, as well as B, σ, and ω follow. 3.1. Local Coherence The subset of trajectories Wˆ given to RANSAC to generate a model M is constrained to a spatially local region. The probability ofchoosing an uncontaminated set of 3 control trajectories, necessary to compute a 2D affine model, from a dataset with a ratio r of inliers, after k trials is: p = 1 − (1 − r3)k. This means that the number of trials pne =ede 1d −to (fi1n d− a subset of 3 inliers with probability p is k =lloogg((11 − − r p3)). (13) A common assumption is that trajectories from the same underlying motion are locally coherent. Hence, a compact region is likely to increase r, exponentially reducing 222222666200 Figure 2: Predictions (red) from a 2D affine model with standard Gaussian noise (green) on one of the control points (black). Noiseless model predictions in blue. All four scenarios have identical noise. 
The magnitude of the extrapolation error changes with the distance between the control points. k, and with it, RANSAC’s computation time by a proportional amount. The trade-off that results from drawing model control points from a small region, however, is extrapolation error. A motion model is extrapolated when utilized to make predictions for trajectories outside the region defined by the control points. The magnitude of modeling error depends on the magnitude of the noise affecting the control points, and although hard to characterize in general, extrapolation error can be expected to grow with the distance from the prediction to the control points, and inversely with the distance between the control points themselves. Figure 2 shows a series of synthetic scenarios where one of the control points is affected by zero mean Gaussian noise of small magnitude. Identical noise is added to the same trajectory in all four scenarios. The figure illustrates the relation between the distance between the control points and the magnitude of the extrapolation errors. Our goal is to maximize the region size while limiting the number of outliers. Without any prior knowledge regarding the scale of the objects in the scene, determining a fixed size for the support region is unlikely to work in general. Instead, the issue is avoided by randomly sampling disk-shaped regions of varying sizes and locations to construct a diverse set of support subsets. Each support subset is then determined by Wˆ = {wi | (xbi − ox)2 + (ybi − oy)2 < r2}, (14) where (ox , oy) are the coordinates of the center of a disk of radius r. To promote uniform image coverage, the disk is centered at a randomly chosen trajectory (ox , oy) = (xbi, yib) with uniformly distributed i ∼ U(1, P) and base frame b) w∼i h U u(1n,i fFor)m. yTo d asltrloibwu efodr idi ∼ffer Ue(n1t, region ds bizaesse, tfhraem read bius ∼ r is( ,cFho)s.en T ofro amllo a u fnoirfo dirmffe rdeinsttr riebugtiioonn r ∼s, tUh(erm raidni,u ursm rax i)s. Ihfo tsheenre f are mI a trajectories swtritihbiunt othne support region, then ∈ R2F×I. It is worth noting that the construction of theW support region does not incorporate any knowledge about the motion of objects in the scene, and in consequence will likely contain trajectories that originate from more than one independently moving object (Figure 3). Wˆ Wˆ Figure 3: Two randomly drawn local support sets. Left: A mixed set with some trajectories from the blue and green classes. Right: Another mixed set with all of the trajectories in the red class and some from the blue class. 4. Characterizing the Residual Distribution At each RANSAC iteration, residuals rf are computed using the 2D affine model Af that results from the constraints provided by the control trajectories C. Characterizing the distribution of rf has three initial goals. The first one is to determine 2D model inliers b2fD (§4.1), the second one is to compute estimates of the magnitude ,o tfh thee s ncooinsed at every frame σ2fD (§4.2), and the third one is to determine whether the residual( §d4i.s2t)r,ib auntidon th originates efr iosm to a planar or a 3D object (§4.3). If the object is suspected 3D, then two more goals n (§e4ed.3 )to. bIfe t achieved. sT shues pfiercstt one Dis, t hoe nde ttweromine 3D model inliers b3fD (§4.4), and the second one is to estimate the magnitude of the noise of a 3D model (§4.5). (σf3D) to reflect the use 4.1. 
2D Inlier Detection Suppose the matrix Wˆ contains trajectories Wˆ1 ∈ R2F×I and Wˆ2 ∈ R2F×J from two independently moving objects, and ∈tha Rt these trajectories are contaminated with zero-mean Gaussian noise of spherical covariance η ∼ N(0, (σf)2I): Wˆ = ?Wˆ1|Wˆ2? + η. (15) A1f Now, assume we know the true affine transformations and that describe the motion of trajectories for the subsets Wˆ1 and Wˆ2, respectively. If is used to compute predictions for all of Wˆ (at frame f), the expected value (denoted by ?·?) of the magnitude of the residuals (rf from Eq. 11) for trajectories aing nWiˆtud1 will be in the order of the magnitude of the underlying noise ?|rif |? = σf for each i∈ {1, . . . , I}. But in this scenario, trajectories in Wˆ2 ewaicl h b ie ∈ predicted using tth ien wrong emnaodrioel,, resulting isn i nr esid?uals? wit?h magnitudes de?termined by the motion differential A2f ???rif??? A1f ???(A1f − A2f) wˆib???. If we = can assume that the motion ?d??riff???er =en???t(iAal is bigger tha???n. tIhfe w deis cpalnac aesmsuemnte d thuea t toh eno miseo:t ???(A1f − A2f)wib??? 222222666311 > σf, (16) then the model inliers can be determined by thresholding | with the magnitude of the noise, scaled by a constant |(τr =| w wλitσhσtf h):e |rif bif=?10,, |orthife|r ≤wi τse. (17) But because σf is generally unknown, the threshold (τ) is estimated from the residual data. To do so, let be the vector of residual magnitudes where rˆi ≤ ˆ ri+1 . Now, let r˜ = median ( rˆi+1 −ˆ r i). The threshold i≤s trˆ h en defined as r τ = min{ rˆi | (ri+1 − ri) > λr˜ r}, (18) which corresponds to the smallest residual magnitude before a salient magnitude gap. Our experiments showed this test to be efficient and effective. Figure 1.e shows classlabeled residuals. Notice the presence of a (low density) gap between the residuals from the trajectories explained by the correct model (in red, close to the origin), and the rest. 4.2. Magnitude of the Noise, 2D Model r2fD Let contain only the residuals of the inlier trajectories (those where = 1), and let USV? be the singular value decomposition of the covariance matrix bif ofˆ r2fD: USV?= svd??1bpf( rˆ2fD)? rˆ2fD?.(19) Then the magnitude of the n?oise corresponds to the largest singular value σ2 = s1, because if the underlying geometry is in fact planar, then the only unaccounted displacements captured by the residuals are due to noise. Model capacity can also be determined from S, as explained next. 4.3. Model Capacity The ratio of largest over smallest singular values (s1/s2) determines when upgrading to a 3D model is beneficial. When the underlying geometry is actually non-planar, the residuals from a planar model should distribute along a line (the epipolar line), reflecting that their relative depth is being unaccounted for. This produces a covariance matrix with a large ratio s1/s2 ? 1. If on the other hand, if s1/s2 ≈ 1, then there is no in 1d.ic Iafti oonn tohfe unexplained relative depth, tihn wnh thicehr case, fitting a olinne o tfo u spherically distributed residuals will only increase the model complexity without explaining the residual variance much better. A small spherical residual covariance strongly suggests a planar underlying geometry. 4.4. 3D Inlier Detection When the residual distribution is elongated (s1/s2 ? 1), a line segment is robustly fit to the (potentially con?tam 1i)-, nated) set of residuals. The segment must go through the origin and its parameters are computed using a Hough transform. 
Further details about this algorithm can be found in the supplementary material. Inlier detection The resulting line segment is used to determine 3D model inliers. Trajectory ibecomes an inlier at frame f if it satisfies two conditions. First, the projection of rif onto the line must lie within the segment limits (β ≤ ≤ γ). Second, the normalized distance to the rif?ef (ef?rif line must be below a threshold ≤ σ2λd). Notice that the threshold depends on the smalle≤st singular value from Eq. 19 to (roughly) account for the presence of noise in the direction perpendicular to the epipolar (ef). 4.5. Magnitude of the Noise, 3D Model let rˆf3D Similarly to the 2D case, contain the residual data from the corresponding 3D inlier trajectories. An estimate for the magnitude of the noise that reflects the use of a 3D model can be obtained from the singular value decomposition of the covariance matrix of r3fD (as in Eq. 19). In this case, the largest singular value s1 captures the spread of residuals along the epipolar line, so its magnitude is mainly related to the magnitude of the displacements due to relative depth. However, s2 captures deviations from the epipolar line, which in a rigid 3D object can only be attributed to noise, making σ2 = s2 a reasonable estimate for its magnitude. Optimal model parameters When both 2D and 3D models are instantiated, the one with the smallest penalized negative log-likelihood (7) becomes the winning model for the current RANSAC run. The same penalized negative loglikelihood metric is used to determine the better model from across all RANSAC iterations. The winning model is added to the pool M, and the process is repeated M times, forming hthee p pool MM, a=n d{ tMhe1 , . . . , MssM is} r.e 5. Optimal Model Subset The next step is to find the model combination M? ⊂ M thhea tn mxta xstiempiz iess t prediction accuracy finora othne Mwhol⊂e trajectory mdaaxtaim Wize,s w phreiledi minimizing cmyod foerl complexity and modelling overlap. For this purpose, let Mj = {Mj,1 , . . . , Mj,N} be the j-th m thoisdel p ucorpmosbein,at lieotn, M Mand let {Mj} be the set o}f baell MheC jN-th = m N!(MM−!N)!) caotimonb,in aantdio lnest of N-sized models than can be draNw!(nM fr−oNm) M). The model soefle Nct-sioinze problem sis t hthanen c faonr bmeu dlartaewdn as M?= ar{gMmj}inOS(Mj), (20) 222222666422 where the objective is ?N ?P OS(Mj) = ??πp,nE (wp,Mj,n) ?n=1p?P=1 + λΦ?Φ(wp,Mj,n) + λΨ?Ψ(Mj,n). ?N (21) i?= ?1 n?= ?1 The first term accounts for prediction accuracy, the other two are regularization terms. Details follow. Prediction Accuracy In order to determine how well a model M predicts an arbitrary trajectory w, the affine transformations estimated by RANSAC could be re-used. However, the inherent noise in the control points, and the potentially short distance between them, often render this approach impractical, particularly when w is spatially distant from the control points (see §3. 1). Instead, model parametferorsm are computed owinittsh a efeac §to3r.1iz)a.ti Ionnst e baadse,d m [o1d0e]l mpaertahmode.Given the inlier labeling B in M, let WB be the subset of trajectories where bif = 1for at least half of the frames. The orthonormal basis S of a ω = 2D (or 3D) motion model can be determined by the 2 (or 3) left singular vectors of WB. Using S as the model’s motion matrices, prediction accuracy can be computed using: E(w, M) = ??SS?w − w??2 , (22) which is the sum of squared?? Euclidean d??eviations from the predictions (SS?w), to the observed data (w). 
Our experiments indicated that, although sensitive to outliers, these model predictions are much more robust to noise. Ownership variables Π ∈ {0, 1}[P×N] indicate whether trajectory p i ps explained by t {he0 ,n1-}th model (πp,n = 1) or not (πp,n = 0), and are determined by maximum prediction accuracy (i.e. minimum Euclidean deviation): πp,n=⎨⎧01,, oift hMerjw,nis=e. aMrg∈mMinjE(wp,M) (23) Regularization terms The second term from Eq. 21 penalizes situations where multiple models explain a trajectory (w) with relatively small residuals. For brevity, let M) = exp{−E(w, M)}, then: Eˆ(w, Φ(w,Mj) = −logMMm∈?∈aMMxjE ˆ ( w , MM)). (24) The third term regularizes over the number of model parameters, and is evaluated using Eq. 8. The constants λΦ and λΨ modulate the effect of the corresponding regularizer. Table 1: Accuracy and run-time for the H155 dataset. Naive RANSAC included as a baseline with overall accuracy and total computation time estimated using data from [12]. SOCARAPSulgrCAoNs[S r[31itA]4h]CmAverage89 A689 c. 71c 7695u racy[%]Compu1 t4a 237t1i506o70 n0 time[s] 6. Refinement The optimal model subset M? yields ownership variableTsh Πe o? wtimhicahl can already tb e M interpreted as a segmentation result. However, we found that segmentation accuracy can be improved by incorporating the labellings Πt from the top T subsets {Mt? | 1 ≤ t ≤ T} closest to optimal. Multiple labellings are incorporated oinsetos an affinity matrix F, where the fi,j entry indicates the frequency with which trajectory i is given the same label as trajectory j across all T labelli?ngs, weighted b?y the relative objective function O˜t = exp ?−OOSS((WW||MMt??))? for such a labelling: fi,j=?tT=11O˜tt?T=1?πit,:πjt,?:?O˜t (25) Note that the inne?r product between the label vectors (πi,:πj?,:) is equal to one only when the labels are the same. A spectral clustering method is applied on F to produce the method’s final segmentation result. 7. Experiments Evaluation was made through three experimental setups. Hopkins 155 The Hopkins 155 (H155) dataset has been the standard evaluation metric for the problem of motion segmentation of trajectory data since 2007. It consists of checkerboard, traffic and articulated sequences with either 2 or 3 motions. Data was automatically tracked, but tracking errors were manually corrected; further details are available in [12]. The use of a standard dataset enables direct comparison of accuracy and run-time performance. Table 1 shows the relevant figures for the two most competitive algorithms that we are aware of. The data indicates that our algorithm has run-times that are close to 2 or 3 orders of magnitude faster than the state of the art methods, with minimal accuracy loss. Computation times are measured in the same (or very similar) hardware architectures. Like in CPA, our implementation uses a single set of parameters for all the experiments, but as others had pointed out [14], it remains unclear whether the same is true for the results reported in the original SSC paper. 222222666533 Figure 4: Accuracy error-bars across artificial H155 datasets with controlled levels of Gaussian noise. Artificial Noise The second experimental setup complements an unexplored dimension in the H155 dataset: noise. The goal is to determine the effects of noise of different magnitudes towards the segmentation accuracy of our method, in comparison with the state of the art. 
We noted that H155 contains structured long-tailed noise, but for the purpose of this experiment we required a noise-free dataset as a baseline. To generate such a dataset, ground-truth labels were used to compute a rank 3 reconstruction of (mean-subtracted) trajectories for each segment. Then, multiple versions of H155 were computed by contaminating the noise-free dataset with Gaussian noise of magnitudes σn ∈ {0.01, 0.25, 0.5, 1, 2, 4, 8}. Our method, as well as SSC a∈nd { 0C.0PA1, were run on t2h,e4s,e8 }n.oi Ose-ucro mnterothlloedd, datasets; results are shown in Figure 4. The error bars on SSC and Ours indicate one standard deviation, computed over 20 runs. The plot for CPA is generated with only one run for each dataset (running time: 11.95 days). The graph indicates that our method only compromises accuracy for large levels of noise, while still being around 2 or 3 orders of magnitude faster than the most competitive algorithms. KLT Tracking The last experimental setup evaluates the applicability of the algorithm in real world conditions using raw tracks from an off-the-shelf implementation [1] of the Kanade-Lucas-Tomasi algorithm. Several sequences were tracked and the resulting trajectories classified by our method. Figure 5 shows qualitatively good motion segmentation results for four sequences. Challenges include very small relative motions, tracking noise, and a large presence of outliers. 8. Conclusions We introduced a computationally efficient motion segmentation algorithm for trajectory data. Efficiency comes from the use of a simple but powerful representation of motion that explicitly incorporates mechanisms to deal with noise, outliers and motion degeneracies. Run-time comparisons indicate that our method is 2 or 3 orders of magnitude faster than the state of the art, with only a small loss in accuracy. The robustness of our method to Gaussian noise tracks from four Formula 1 sequences. Italian Grand Prix ?c2012 Formula 1. In this figure, all trajectories are given a m?co2ti0o1n2 l Faoberml, including hoiust fliigeurrse. of different magnitudes was found competitive with state of the art, while retaining the inherent computational efficiency. The method was also found to be useful for motion segmentation of real-world, raw trajectory data. References [1] ht tp : / /www . ce s . c l emn s on . edu / ˜stb / k lt . 8 [2] J. P. Costeira and T. Kanade. A Multibody Factorization Method for Independently Moving Objects. IJCV, 1998. 1 [3] E. Elhamifar and R. Vidal. Sparse subspace clustering. In Proc. CVPR, 2009. 2, 7 [4] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 1981. 3 [5] M. Irani and P. Anandan. Parallax geometry of pairs of points for 3d scene analysis. Proc. ECCV, 1996. 2 [6] K. Kanatani. Motion segmentation by subspace separation: Model selection and reliability evaluation. International Journal Image Graphics, 2002. 1 [7] H. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. Readings in Computer Vision: Issues, Problems, Principles, and Paradigms, MA Fischler and O. Firschein, eds, 1987. 1 [8] K. Schindler, D. Suter, , and H. Wang. A model-selection framework for multibody structure-and-motion of image sequences. Proc. IJCV, 79(2): 159–177, 2008. 1 [9] C. Tomasi and T. Kanade. Shape and motion without depth. Proc. [10] [11] [12] [13] [14] [15] ICCV, 1990. 1 C. Tomasi and T. Kanade. 
KLT Tracking The last experimental setup evaluates the applicability of the algorithm in real-world conditions using raw tracks from an off-the-shelf implementation [1] of the Kanade-Lucas-Tomasi algorithm. Several sequences were tracked and the resulting trajectories were classified by our method. Figure 5 shows qualitatively good motion segmentation results for four sequences. Challenges include very small relative motions, tracking noise, and a large presence of outliers.

8. Conclusions

We introduced a computationally efficient motion segmentation algorithm for trajectory data. Efficiency comes from the use of a simple but powerful representation of motion that explicitly incorporates mechanisms to deal with noise, outliers and motion degeneracies. Run-time comparisons indicate that our method is 2 or 3 orders of magnitude faster than the state of the art, with only a small loss in accuracy. The robustness of our method to Gaussian noise of different magnitudes was found competitive with the state of the art, while retaining the inherent computational efficiency. The method was also found to be useful for motion segmentation of real-world, raw trajectory data.

Figure 5: Motion segmentation results on raw KLT tracks from four Formula 1 sequences (Italian Grand Prix, ©2012 Formula 1). In this figure, all trajectories are given a motion label, including outliers.

References

[1] http://www.ces.clemson.edu/~stb/klt
[2] J. P. Costeira and T. Kanade. A Multibody Factorization Method for Independently Moving Objects. IJCV, 1998.
[3] E. Elhamifar and R. Vidal. Sparse subspace clustering. In Proc. CVPR, 2009.
[4] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 1981.
[5] M. Irani and P. Anandan. Parallax geometry of pairs of points for 3D scene analysis. In Proc. ECCV, 1996.
[6] K. Kanatani. Motion segmentation by subspace separation: Model selection and reliability evaluation. International Journal of Image and Graphics, 2002.
[7] H. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. In Readings in Computer Vision: Issues, Problems, Principles, and Paradigms, M. A. Fischler and O. Firschein, eds., 1987.
[8] K. Schindler, D. Suter, and H. Wang. A model-selection framework for multibody structure-and-motion of image sequences. IJCV, 79(2):159–177, 2008.
[9] C. Tomasi and T. Kanade. Shape and motion without depth. In Proc. ICCV, 1990.
[10] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: a factorization method. IJCV, 1992.
[11] P. Torr. Geometric motion segmentation and model selection. Phil. Trans. of the Royal Soc. of London, Series A: Mathematical, Physical and Engineering Sciences, 1998.
[12] R. Tron and R. Vidal. A Benchmark for the Comparison of 3-D Motion Segmentation Algorithms. In Proc. CVPR, 2007.
[13] J. Yan and M. Pollefeys. A factorization-based approach for articulated nonrigid shape, motion and kinematic chain recovery from video. PAMI, 2008.
[14] L. Zappella, E. Provenzi, X. Lladó, and J. Salvi. Adaptive motion segmentation algorithm based on the principal angles configuration. In Proc. ACCV, 2011.
[15] L. Zelnik-Manor and M. Irani. Degeneracies, dependencies and their implications in multi-body and multi-sequence factorizations. In Proc. CVPR, 2003.
4 0.14713818 124 cvpr-2013-Determining Motion Directly from Normal Flows Upon the Use of a Spherical Eye Platform
Author: Tak-Wai Hui, Ronald Chung
Abstract: We address the problem of recovering camera motion from video data directly from normal flows, without requiring the establishment of feature correspondences or the computation of optical flow. We have designed an imaging system with a wide field of view, formed by fixating a number of cameras together into an approximate spherical eye. With a substantially widened visual field, we discover that the directions of the translation and rotation components of the motion can be estimated separately and particularly efficiently. In addition, the inherent ambiguities between translation and rotation disappear. The magnitude of rotation is recovered subsequently. Experimental results on synthetic and real image data are provided. The results show that not only is the accuracy of motion estimation comparable to that of state-of-the-art methods that require explicit feature correspondences or optical flows, but computation is also faster.
5 0.14621732 244 cvpr-2013-Large Displacement Optical Flow from Nearest Neighbor Fields
Author: Zhuoyuan Chen, Hailin Jin, Zhe Lin, Scott Cohen, Ying Wu
Abstract: We present an optical flow algorithm for large displacement motions. Most existing optical flow methods use the standard coarse-to-fine framework to deal with large displacement motions, which has intrinsic limitations. Instead, we formulate the motion estimation problem as a motion segmentation problem. We use approximate nearest neighbor fields to compute an initial motion field and use a robust algorithm to compute a set of similarity transformations as the motion candidates for segmentation. To account for deviations from similarity transformations, we add local deformations in the segmentation process. We also observe that small objects can be better recovered using translations as the motion candidates. We fuse the motion results obtained under similarity transformations and under translations together before a final refinement. Experimental validation shows that our method can successfully handle large displacement motions. Although we particularly focus on large displacement motions in this work, we make no sacrifice in terms of overall performance. In particular, our method ranks at the top of the Middlebury benchmark.
6 0.12500456 380 cvpr-2013-Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images
7 0.1217423 46 cvpr-2013-Articulated and Restricted Motion Subspaces and Their Signatures
8 0.11554746 360 cvpr-2013-Robust Estimation of Nonrigid Transformation for Point Set Registration
9 0.10991271 368 cvpr-2013-Rolling Shutter Camera Calibration
10 0.10931887 317 cvpr-2013-Optimal Geometric Fitting under the Truncated L2-Norm
11 0.10844426 361 cvpr-2013-Robust Feature Matching with Alternate Hough and Inverted Hough Transforms
12 0.10816668 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
13 0.10804983 23 cvpr-2013-A Practical Rank-Constrained Eight-Point Algorithm for Fundamental Matrix Estimation
14 0.10754417 231 cvpr-2013-Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment
15 0.10603382 203 cvpr-2013-Hierarchical Video Representation with Trajectory Binary Partition Tree
16 0.10506988 423 cvpr-2013-Template-Based Isometric Deformable 3D Reconstruction with Sampling-Based Focal Length Self-Calibration
17 0.10226554 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images
18 0.10166506 76 cvpr-2013-Can a Fully Unconstrained Imaging Model Be Applied Effectively to Central Cameras?
19 0.099987224 59 cvpr-2013-Better Exploiting Motion for Better Action Recognition
20 0.096293256 113 cvpr-2013-Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video
topicId topicWeight
[(0, 0.192), (1, 0.159), (2, 0.001), (3, -0.012), (4, -0.026), (5, -0.043), (6, 0.014), (7, -0.086), (8, 0.007), (9, 0.061), (10, 0.026), (11, 0.162), (12, 0.097), (13, -0.063), (14, 0.013), (15, -0.067), (16, 0.034), (17, 0.09), (18, -0.04), (19, 0.004), (20, 0.058), (21, -0.083), (22, -0.032), (23, 0.012), (24, 0.111), (25, -0.074), (26, -0.096), (27, 0.034), (28, 0.01), (29, 0.008), (30, -0.062), (31, -0.027), (32, -0.048), (33, -0.028), (34, 0.073), (35, 0.046), (36, -0.025), (37, -0.002), (38, 0.013), (39, 0.038), (40, 0.019), (41, -0.002), (42, -0.019), (43, -0.07), (44, -0.015), (45, 0.025), (46, 0.111), (47, -0.04), (48, 0.051), (49, -0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.96630526 290 cvpr-2013-Motion Estimation for Self-Driving Cars with a Generalized Camera
Author: Gim Hee Lee, Friedrich Faundorfer, Marc Pollefeys
Abstract: In this paper, we present a visual ego-motion estimation algorithm for a self-driving car equipped with a closeto-market multi-camera system. By modeling the multicamera system as a generalized camera and applying the non-holonomic motion constraint of a car, we show that this leads to a novel 2-point minimal solution for the generalized essential matrix where the full relative motion including metric scale can be obtained. We provide the analytical solutions for the general case with at least one inter-camera correspondence and a special case with only intra-camera correspondences. We show that up to a maximum of 6 solutions exist for both cases. We identify the existence of degeneracy when the car undergoes straight motion in the special case with only intra-camera correspondences where the scale becomes unobservable and provide a practical alternative solution. Our formulation can be efficiently implemented within RANSAC for robust estimation. We verify the validity of our assumptions on the motion model by comparing our results on a large real-world dataset collected by a car equipped with 4 cameras with minimal overlapping field-of-views against the GPS/INS ground truth.
2 0.74275786 74 cvpr-2013-CLAM: Coupled Localization and Mapping with Efficient Outlier Handling
Author: Jonathan Balzer, Stefano Soatto
Abstract: We describe a method to efficiently generate a model (map) of small-scale objects from video. The map encodes sparse geometry as well as coarse photometry, and could be used to initialize dense reconstruction schemes as well as to support recognition and localization of three-dimensional objects. Self-occlusions and the predominance of outliers present a challenge to existing online Structure From Motion and Simultaneous Localization and Mapping systems. We propose a unified inference criterion that encompasses map building and localization (object detection) relative to the map in a coupled fashion. We establish correspondence in a computationally efficient way without resorting to combinatorial matching or random-sampling techniques. Instead, we use a simpler M-estimator that exploits putative correspondence from tracking after photometric and topological validation. We have collected a new dataset to benchmark model building in the small scale, which we test our algorithm on in comparison to others. Although our system is significantly leaner than previous ones, it compares favorably to the state of the art in terms of accuracy and robustness.
3 0.71590269 368 cvpr-2013-Rolling Shutter Camera Calibration
Author: Luc Oth, Paul Furgale, Laurent Kneip, Roland Siegwart
Abstract: Rolling Shutter (RS) cameras are used across a wide range of consumer electronic devices, from smart-phones to high-end cameras. It is well known that if an RS camera is used while the camera or the scene is moving, significant image distortions are introduced. The quality, and even the success, of structure from motion on rolling shutter images depends on the usual intrinsic parameters, such as focal length and distortion coefficients, as well as on accurate modelling of the shutter timing. The current state-of-the-art technique for calibrating the shutter timings requires specialised hardware. We present a new method that only requires video of a known calibration pattern. Experimental results on over 60 real datasets show that our method is more accurate than the current state of the art.
4 0.70303124 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition
Author: Qiang Hao, Rui Cai, Zhiwei Li, Lei Zhang, Yanwei Pang, Feng Wu, Yong Rui
Abstract: 3D model-based object recognition has been a noticeable research trend in recent years. Common methods find 2D-to-3D correspondences and make recognition decisions by pose estimation, whose efficiency usually suffers from noisy correspondences caused by the increasing number of target objects. To overcome this scalability bottleneck, we propose an efficient 2D-to-3D correspondence filtering approach, which combines a light-weight neighborhood-based step with a finer-grained pairwise step to remove spurious correspondences based on 2D/3D geometric cues. On a dataset of 300 3D objects, our solution achieves a ∼10× speed improvement over the baseline, with comparable recognition accuracy. A parallel implementation on a quad-core CPU can run at ∼3 fps for 1280×720 images.
5 0.69831526 283 cvpr-2013-Megastereo: Constructing High-Resolution Stereo Panoramas
Author: Christian Richardt, Yael Pritch, Henning Zimmer, Alexander Sorkine-Hornung
Abstract: We present a solution for generating high-quality stereo panoramas at megapixel resolutions. While previous approaches introduced the basic principles, we show that those techniques do not generalise well to today’s high image resolutions and lead to disturbing visual artefacts. As our first contribution, we describe the necessary correction steps and a compact representation for the input images in order to achieve a highly accurate approximation to the required ray space. Our second contribution is a flow-based upsampling of the available input rays which effectively resolves known aliasing issues like stitching artefacts. The required rays are generated on the fly to perfectly match the desired output resolution, even for small numbers of input images. In addition, the upsampling is real-time and enables direct interactive control over the desired stereoscopic depth effect. In combination, our contributions allow the generation of stereoscopic panoramas at high output resolutions that are virtually free of artefacts such as seams, stereo discontinuities, vertical parallax and other mono-/stereoscopic shape distortions. Our process is robust, and other types of multiperspective panoramas, such as linear panoramas, can also benefit from our contributions. We show various comparisons and high-resolution results.
6 0.69802755 170 cvpr-2013-Fast Rigid Motion Segmentation via Incrementally-Complex Local Models
7 0.6863746 124 cvpr-2013-Determining Motion Directly from Normal Flows Upon the Use of a Spherical Eye Platform
8 0.67646164 47 cvpr-2013-As-Projective-As-Possible Image Stitching with Moving DLT
9 0.6656729 428 cvpr-2013-The Episolar Constraint: Monocular Shape from Shadow Correspondence
10 0.610385 23 cvpr-2013-A Practical Rank-Constrained Eight-Point Algorithm for Fundamental Matrix Estimation
11 0.61005551 176 cvpr-2013-Five Shades of Grey for Fast and Reliable Camera Pose Estimation
12 0.60786581 361 cvpr-2013-Robust Feature Matching with Alternate Hough and Inverted Hough Transforms
13 0.60654896 317 cvpr-2013-Optimal Geometric Fitting under the Truncated L2-Norm
14 0.59802794 400 cvpr-2013-Single Image Calibration of Multi-axial Imaging Systems
15 0.58994281 84 cvpr-2013-Cloud Motion as a Calibration Cue
16 0.58667469 46 cvpr-2013-Articulated and Restricted Motion Subspaces and Their Signatures
17 0.58036268 360 cvpr-2013-Robust Estimation of Nonrigid Transformation for Point Set Registration
18 0.56768179 373 cvpr-2013-SWIGS: A Swift Guided Sampling Method
19 0.56208277 333 cvpr-2013-Plane-Based Content Preserving Warps for Video Stabilization
20 0.55614519 113 cvpr-2013-Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video
topicId topicWeight
[(10, 0.117), (16, 0.059), (26, 0.07), (28, 0.016), (33, 0.248), (47, 0.225), (67, 0.029), (69, 0.042), (87, 0.113)]
simIndex simValue paperId paperTitle
1 0.88589209 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
Author: Nikolaos Kyriazis, Antonis Argyros
Abstract: In several hand-object(s) interaction scenarios, the change in the objects ’ state is a direct consequence of the hand’s motion. This has a straightforward representation in Newtonian dynamics. We present the first approach that exploits this observation to perform model-based 3D tracking of a table-top scene comprising passive objects and an active hand. Our forward modelling of 3D hand-object(s) interaction regards both the appearance and the physical state of the scene and is parameterized over the hand motion (26 DoFs) between two successive instants in time. We demonstrate that our approach manages to track the 3D pose of all objects and the 3D pose and articulation of the hand by only searching for the parameters of the hand motion. In the proposed framework, covert scene state is inferred by connecting it to the overt state, through the incorporation of physics. Thus, our tracking approach treats a variety of challenging observability issues in a principled manner, without the need to resort to heuristics.
same-paper 2 0.85971558 290 cvpr-2013-Motion Estimation for Self-Driving Cars with a Generalized Camera
Author: Gim Hee Lee, Friedrich Faundorfer, Marc Pollefeys
Abstract: In this paper, we present a visual ego-motion estimation algorithm for a self-driving car equipped with a closeto-market multi-camera system. By modeling the multicamera system as a generalized camera and applying the non-holonomic motion constraint of a car, we show that this leads to a novel 2-point minimal solution for the generalized essential matrix where the full relative motion including metric scale can be obtained. We provide the analytical solutions for the general case with at least one inter-camera correspondence and a special case with only intra-camera correspondences. We show that up to a maximum of 6 solutions exist for both cases. We identify the existence of degeneracy when the car undergoes straight motion in the special case with only intra-camera correspondences where the scale becomes unobservable and provide a practical alternative solution. Our formulation can be efficiently implemented within RANSAC for robust estimation. We verify the validity of our assumptions on the motion model by comparing our results on a large real-world dataset collected by a car equipped with 4 cameras with minimal overlapping field-of-views against the GPS/INS ground truth.
Author: Jörg Hendrik Kappes, Markus Speth, Gerhard Reinelt, Christoph Schnörr
Abstract: Discrete graphical models (also known as discrete Markov random fields) are a major conceptual tool to model the structure of optimization problems in computer vision. While in the last decade research has focused on fast approximative methods, algorithms that provide globally optimal solutions have come more into the research focus in the last years. However, large scale computer vision problems seemed to be out of reach for such methods. In this paper we introduce a promising way to bridge this gap based on partial optimality and structural properties of the underlying problem factorization. Combining these preprocessing steps, we are able to solve grids of size 2048 2048 in less than 90 seconds. On the hitherto unsolva2b04le8 C×h2i0ne4s8e character dataset of Nowozin et al. we obtain provably optimal results in 56% of the instances and achieve competitive runtimes on other recent benchmark problems. While in the present work only generalized Potts models are considered, an extension to general graphical models seems to be feasible.
4 0.81739235 132 cvpr-2013-Discriminative Re-ranking of Diverse Segmentations
Author: Payman Yadollahpour, Dhruv Batra, Gregory Shakhnarovich
Abstract: This paper introduces a two-stage approach to semantic image segmentation. In the first stage a probabilistic model generates a set of diverse plausible segmentations. In the second stage, a discriminatively trained re-ranking model selects the best segmentation from this set. The re-ranking stage can use much more complex features than what could be tractably used in the probabilistic model, allowing a better exploration of the solution space than possible by simply producing the most probable solution from the probabilistic model. While our proposed approach already achieves state-of-the-art results (48.1%) on the challenging VOC 2012 dataset, our machine and human analyses suggest that even larger gains are possible with such an approach.
5 0.79165709 443 cvpr-2013-Uncalibrated Photometric Stereo for Unknown Isotropic Reflectances
Author: Feng Lu, Yasuyuki Matsushita, Imari Sato, Takahiro Okabe, Yoichi Sato
Abstract: We propose an uncalibrated photometric stereo method that works with general and unknown isotropic reflectances. Our method uses a pixel intensity profile, which is a sequence of radiance intensities recorded at a pixel across multi-illuminance images. We show that for general isotropic materials, the geodesic distance between intensity profiles is linearly related to the angular difference of their surface normals, and that the intensity distribution of an intensity profile conveys information about the reflectance properties, when the intensity profile is obtained under uniformly distributed directional lightings. Based on these observations, we show that surface normals can be estimated up to a convex/concave ambiguity. A solution method based on matrix decomposition with missing data is developed for a reliable estimation. Quantitative and qualitative evaluations of our method are performed using both synthetic and real-world scenes.
6 0.78999376 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
8 0.78507406 155 cvpr-2013-Exploiting the Power of Stereo Confidences
9 0.78470635 71 cvpr-2013-Boundary Cues for 3D Object Shape Recovery
10 0.78448075 303 cvpr-2013-Multi-view Photometric Stereo with Spatially Varying Isotropic Materials
11 0.78431815 286 cvpr-2013-Mirror Surface Reconstruction from a Single Image
12 0.78402919 349 cvpr-2013-Reconstructing Gas Flows Using Light-Path Approximation
13 0.78382564 147 cvpr-2013-Ensemble Learning for Confidence Measures in Stereo Vision
14 0.78363091 394 cvpr-2013-Shading-Based Shape Refinement of RGB-D Images
15 0.78323233 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
16 0.78319699 298 cvpr-2013-Multi-scale Curve Detection on Surfaces
17 0.78298718 188 cvpr-2013-Globally Consistent Multi-label Assignment on the Ray Space of 4D Light Fields
18 0.78219545 227 cvpr-2013-Intrinsic Scene Properties from a Single RGB-D Image
19 0.78175825 222 cvpr-2013-Incorporating User Interaction and Topological Constraints within Contour Completion via Discrete Calculus
20 0.78169638 431 cvpr-2013-The Variational Structure of Disparity and Regularization of 4D Light Fields