cvpr cvpr2013 cvpr2013-317 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Erik Ask, Olof Enqvist, Fredrik Kahl
Abstract: This paper is concerned with model fitting in the presence of noise and outliers. Previously it has been shown that the number of outliers can be minimized with polynomial complexity in the number of measurements. This paper improves on these results in two ways. First, it is shown that for a large class of problems, the statistically more desirable truncated L2-norm can be optimized with the same complexity. Then, with the same methodology, it is shown how to transform multi-model fitting into a purely combinatorial problem—with worst-case complexity that is polynomial in the number of measurements, though exponential in the number of models. We apply our framework to a series of hard registration and stitching problems demonstrating that the approach is not only of theoretical interest. It gives a practical method for simultaneously dealing with measurement noise and large amounts of outliers for fitting problems with low-dimensional models.
Reference: text
sentIndex sentText sentNum sentScore
1 Previously it has been shown that the number of outliers can be minimized with polynomial complexity in the number of measurements. [sent-2, score-0.191]
2 First, it is shown that for a large class of problems, the statistically more desirable truncated L2-norm can be optimized with the same complexity. [sent-4, score-0.29]
3 Then, with the same methodology, it is shown how to transform multi-model fitting into a purely combinatorial problem—with worst-case complexity that is polynomial in the number of measurements, though exponential in the number of models. [sent-5, score-0.232]
4 We apply our framework to a series of hard registration and stitching problems demonstrating that the approach is not only of theoretical interest. [sent-6, score-0.419]
5 It gives a practical method for simultaneously dealing with measurement noise and large amounts of outliers for fitting problems with low-dimensional models. [sent-7, score-0.283]
6 Introduction Even though it is widely accepted that the truncated L2-norm is a good way to model noise and outliers, its use has been hindered by the difficulty in solving the corresponding optimization problem. [sent-9, score-0.29]
7 Here the objective is to register slices of prostate tissue stained in different colours. [sent-14, score-0.296]
8 (Left) Images of two different stainings from a prostate biopsy with 791 hypothetical correspondences (cyan) obtained from SIFT. [sent-18, score-0.215]
9 (Right) Same images, but now with 21 inlier correspondences (green) of the optimal truncated L2-fit. [sent-19, score-0.672]
10 In [14], it was shown that the problem of maximizing the number of inliers can be solved in polynomial time in the number of measurements. [sent-37, score-0.222]
11 This result was recently generalized in [7] by improving the complexity bound and weakening the conditions on the class of residual functions that can be handled. [sent-38, score-0.224]
12 Still, only the 0−1 loss function (or, more precisely, a piecewise constant 0−1 loss function) can be handled in [7], and not the statistically motivated truncated L2-norm. [sent-40, score-0.29]
13 The sequential (or greedy) approach removes all correspondences that are deemed inliers for the most dominant model, and then the process is repeated. [sent-42, score-0.328]
14 Experimental results and comparisons to (exhaustive) RANSAC are given for challenging registration and stitching problems. [sent-48, score-0.419]
15 We show that maximizing the number of inliers can be done in polynomial time given a fixed number of models and model parameters. [sent-50, score-0.222]
16 The proof is based on similar ideas and concepts as in the truncated L2-case. [sent-52, score-0.323]
17 As noted above, most approaches for robust estimation in vision count the number of inliers to assess the quality of a solution, where inliers are, by definition, residual errors less than some prescribed threshold ε. [sent-55, score-0.553]
18 Not only is the inlier noise assumed to be smaller than ε, but also the distribution of the inlier errors is not modelled. [sent-59, score-0.237]
19 The assumption is that inlier residuals have a clock-shaped error distribution similar to the Gaussian distribution, whereas outlier residuals have approximately uniformly distributed errors. [sent-61, score-0.753]
20 A robust loss function (solid black) as suggested in [4], and the truncated L2-error (red) which can be optimized using the proposed framework; in this case truncated at ε. [sent-65, score-0.659]
21 We will work with a truncated quadratic loss function. [sent-68, score-0.402]
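The loss in question simply caps each squared residual at ε²; a minimal sketch (the function name is ours):

```python
import numpy as np

def truncated_l2_loss(residuals, eps):
    """Sum of squared residuals, with each term capped at eps^2 so that
    every outlier (r >= eps) pays the same fixed penalty."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(np.minimum(r ** 2, eps ** 2)))
```

An inlier thus contributes its squared error, while arbitrarily gross outliers cannot dominate the objective.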
22 Let D be a d-dimensional differentiable manifold, embedded in Rm (m ≥ d) by a set of equality constraints hj (θ) = 0. [sent-75, score-0.308]
23 Given a set of residual functions ri : D → R, i = 1, . . . , n. [sent-76, score-0.242]
24 To make this more concrete, we look at the registration problem. [sent-86, score-0.196]
25 The residuals are simply the Euclidean distances ri(θ) = ||R(θ)xi + t(θ) − yi||. [sent-93, score-0.555]
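For the 2D rigid case these residuals are straightforward to evaluate; a sketch assuming the parameterization θ = (angle, tx, ty):

```python
import numpy as np

def registration_residuals(theta, X, Y):
    """Euclidean residuals r_i(theta) = ||R(theta) x_i + t(theta) - y_i||
    for a 2D rigid transform theta = (angle, tx, ty); X, Y are (n, 2)
    arrays of corresponding points."""
    a, tx, ty = theta
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    return np.linalg.norm(X @ R.T + np.array([tx, ty]) - Y, axis=1)
```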
26 Given two sets of residual functions I and O, let D(I, O) denote the set of θ ∈ D such that ri(θ) ≤ ε for all ri ∈ I and ri(θ) ≥ ε for all ri ∈ O. [sent-118, score-0.632]
27 Given two sets of residual functions I and O and a differentiable function f : D → R, minimize f(θ) over θ ∈ D(I, O). [sent-124, score-0.203]
28 (9) The constraints on θ can also be written as gi(θ) = ri(θ) − ε ≤ 0 for all i ∈ I. [sent-125, score-0.642]
29 gi(θ) = ε − ri(θ) ≤ 0 for all i ∈ O, and hj(θ) = 0 for all j. [sent-126, score-0.42]
30 The size of the problem is defined to be the number of residuals |I ∪ O|. [sent-137, score-0.211]
31 We require that D is a d-dimensional differentiable manifold embedded in Rm using constraints hj (θ) = 0. [sent-148, score-0.305]
32 We also require that the hj are continuously differentiable and that the gi ’s are differentiable. [sent-149, score-0.415]
33 In order to find one point in the green region (left), it is enough to consider a subproblem with only d = 2 residuals (right). [sent-155, score-0.211]
34 Fitting Under Truncated L2-norm For any feasible model point θ ∈ D, the set of residuals will be partitioned into two sets: an inlier set I for which ri(θ) ≤ ε [sent-164, score-0.453]
35 for all ri ∈ I and the complement set of outliers O. [sent-165, score-0.33]
36 The basic idea is to compute all such partitions (the main result is that there are only O(n^d) partitions) and then solve a standard least-squares problem for each inlier set. [sent-167, score-0.305]
37 If Condition 2 is satisfied and the residual functions are continuous, then there exists a minimizer to Problem 1. [sent-169, score-0.265]
38 Let O∗ denote the set of residuals that are not in I∗ . [sent-186, score-0.211]
39 Lemma 1 shows that if we somehow knew the correct set I∗, we could find an optimal solution in the truncated L2 sense by solving a standard least-squares problem. [sent-215, score-0.396]
40 The key is then to find all candidates for the correct inlier set, and the following theorem shows how this can be achieved. [sent-217, score-0.312]
41 Let I be a non-empty subset of the residuals {ri}ni=1 and let O be the remaining ones such that D(I, O) ≠ ∅. [sent-220, score-0.453]
42 λ0∇f(θ) + Σi μi∇gi(θ) + Σj λj∇hj(θ) = 0, (21) where μi ≥ 0 and gi(θ) = ri(θ) − ε for i ∈ I. [sent-267, score-0.424]
43 As the hj's are an embedding of a d-dimensional manifold in Rm, the set {∇hj} will span an (m − d)-dimensional subspace perpendicular to the manifold tangent space at θ. [sent-272, score-0.248]
44 This theorem tells us that if we can compute all FJ-points for all instances of Problem 3 of size |I ∪ O| ≤ d, we can also find all possible partitions of inlier and outlier sets. [sent-294, score-0.181]
45 For each FJ-point we simply check which residuals are < ε. [sent-295, score-0.211]
46 Generically, there will be at most d such residuals and hence we have 2^d possible inlier sets to examine. [sent-300, score-0.516]
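In code, the candidate-set enumeration at a single FJ-point might look like this (a sketch; the tolerance `tol` for deciding which residuals lie on the threshold is our own addition):

```python
import itertools
import numpy as np

def candidate_inlier_sets(residuals, eps, tol=1e-9):
    """Residuals clearly below eps are forced inliers, those clearly above
    are forced outliers; each residual on the threshold (generically at
    most d of them) may go either way, giving up to 2^d candidate sets."""
    r = np.asarray(residuals, dtype=float)
    fixed = set(np.flatnonzero(r < eps - tol))
    boundary = list(np.flatnonzero(np.abs(r - eps) <= tol))
    for choice in itertools.product([False, True], repeat=len(boundary)):
        yield sorted(fixed | {i for i, keep in zip(boundary, choice) if keep})
```

Each candidate set can then be refit by least squares and scored with the truncated L2 objective.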
47 Multiple Models The previous results also hold in the multi-model case, as fitting k d-dimensional models can be viewed as fitting one kd-dimensional model. [sent-328, score-0.258]
48 Let I1∗ and O1∗ be the set of inliers and outliers, respectively, to θ1∗ . [sent-343, score-0.222]
49 Note that a residual can be an inlier to more than one model, but this will not matter. [sent-344, score-0.346]
50 Each hypothesis (or FJ-point) can be represented by its set of inliers Ii, and we want to find a maximum k-cover: max over |C| = k of |⋃Ii∈C Ii|. [sent-359, score-0.222]
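The paper solves this cover problem exactly (via CPLEX, as described later); as a rough, non-optimal stand-in, the classical greedy approximation to maximum k-cover looks like:

```python
def greedy_max_k_cover(inlier_sets, k):
    """Greedily pick k hypotheses, each time choosing the inlier set that
    covers the most still-uncovered residuals. This is only a
    (1 - 1/e)-approximation, not the exact ILP used in the paper."""
    covered, chosen = set(), []
    for _ in range(k):
        gains = [len(set(s) - covered) for s in inlier_sets]
        best = max(range(len(inlier_sets)), key=gains.__getitem__)
        if gains[best] == 0:
            break
        chosen.append(best)
        covered |= set(inlier_sets[best])
    return chosen, covered
```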
51 Applications We will look at three applications in more detail: Image registration, stitching and multi-model registration. [sent-381, score-0.223]
52 Image Registration We return to the image registration problem introduced in Section 2. [sent-392, score-0.196]
53 We write the constraints gi as gi(R, t) = (Rxi + t − yi)T(Rxi + t − yi) − ε2 [sent-396, score-0.218]
54 (26) We will need different polynomial solvers depending on the number of active constraints. [sent-398, score-0.191]
55 This amounts to 4 quadratic polynomial equations in 4 unknowns. [sent-400, score-0.217]
56 For problems of size < 3 we also need to consider the constraint posed on the gradients of gi, hj and f in (11). [sent-404, score-0.178]
57 This does not alter the form of gi(R(q)) ≥ 0 but introduces an equality constraint h(q) = ||q||^2 − 1 = 0. [sent-433, score-0.221]
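For reference, the standard quaternion parameterization (Hamilton convention; this helper is ours, not the solver from [1]) makes the role of the constraint h(q) explicit:

```python
import numpy as np

def rotation_from_quaternion(q):
    """Rotation matrix R(q) for q = (w, x, y, z); the equality constraint
    h(q) = ||q||^2 - 1 = 0 restricts q to the unit sphere (enforced here
    by normalization)."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
```

Since R(q) = R(−q), the representation has the two-fold symmetry the polynomial solver exploits.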
58 Using methods presented in [1] a polynomial solver exploiting the symmetry of the quaternion representation was constructed. [sent-436, score-0.202]
59 For problems where only two inequality constraints are active we can no longer formulate a solvable system. [sent-437, score-0.183]
60 Just as for the case of registration we use the implied linear dependence and det(∇f ∇g1 ∇g2 ∇h) = 0 to obtain this equation. [sent-439, score-0.196]
61 Each solution (or FJ-point) provides us with an inlier set. [sent-447, score-0.273]
62 Picking the maximum k-cover among all possible inlier sets is then performed using CPLEX with the provided MATLAB interface. [sent-448, score-0.276]
63 This section will present such a method for fast outlier rejection which works both for registration and stitching. [sent-451, score-0.401]
64 Note that only correspondences that can be shown not to be part of an optimal inlier set will be discarded. [sent-452, score-0.382]
65 Rather than working directly with the truncated L2-norm, we will bound the number of ε-inliers. [sent-458, score-0.29]
66 Suppose that for a set of corresponding points there exists a transformation T (for registration or stitching) such that all residuals are less than ε. [sent-467, score-0.452]
67 Then there exists a transformation such that residual ri = 0 and all other residuals rj are less than 2ε. [sent-470, score-0.603]
68 Then a rotation Rα about xi × yi will map xi exactly to yi, and ||Rα|| ≤ ε. [sent-486, score-0.315]
69 A bound on the number of 2ε-inliers given that ri = 0 is also a bound for the number of ε-inliers. [sent-494, score-0.317]
70 For both registration and stitching, getting a bound is fairly easy as this constraint fixes the transformation up to a one-dimensional rotation using correspondence (xi, yi). [sent-497, score-0.308]
71 To get a bound on the number of inliers we need to find a point that lies in as many of these intervals as possible. [sent-500, score-0.297]
72 This can be done in O(n log n) by sorting the interval endpoints, which is significantly cheaper than the optimal algorithm. [sent-504, score-0.312]
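The interval-stabbing step described above can be implemented as a standard endpoint sweep (a sketch; closed intervals assumed):

```python
def max_overlap(intervals):
    """Largest number of closed intervals [lo, hi] sharing a common point,
    found by sweeping sorted endpoints in O(n log n). Start events sort
    before end events at equal coordinates, so touching intervals count."""
    events = []
    for lo, hi in intervals:
        events.append((lo, 0))  # interval opens
        events.append((hi, 1))  # interval closes
    events.sort()
    best = current = 0
    for _, kind in events:
        if kind == 0:
            current += 1
            best = max(best, current)
        else:
            current -= 1
    return best
```

The returned count is exactly the bound on the number of inliers used for rejection.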
73 For registration and stitching, the fast outlier rejection method of Section 6 is applied as a preprocessing step to Algorithm 1. [sent-509, score-0.401]
74 For multi-model registration, we use Algorithm 2 with CPLEX for max k-cover, but before running the polynomial solver we do a simple test verifying that pairwise distances are consistent. [sent-512, score-0.182]
75 Registration For the registration experiments the problem of matching two differently stained tissue slices of a prostate biopsy is addressed. [sent-516, score-0.587]
76 Compared with standard RANSAC, which optimizes the size of the inlier set (see Figure 4). [sent-521, score-0.237]
77 (a) Image pairs of prostate tissue in two different stainings. [sent-522, score-0.248]
78 (c) Image pair of a lung tissue biopsy which has three components. [sent-525, score-0.246]
81 Figure 5: Histogram of performance differences between our optimal method and standard RANSAC (left) and RANSAC which evaluates the truncated L2-error for each sample (right); top row: registration, bottom row: stitching. [sent-533, score-0.36]
82 Compared with RANSAC that uses the same truncated quadratic loss, 42 pairs gave no difference. [sent-537, score-0.37]
83 It is clear from the histograms that the optimal method significantly outperforms the best obtainable result from a standard inlier-optimizing RANSAC approach. [sent-542, score-0.307]
84 Even though the truncated L2-RANSAC performs considerably better, it is still not optimal in half of the cases. [sent-543, score-0.36]
85 Note that RANSAC hardly ever finds the optimal solution even though the sampling of minimal sets is exhaustive. [sent-548, score-0.182]
86 Similarly with the truncated L2-RANSAC, there were 38 pairs with no difference, while for 15 pairs the optimal method obtained improved results; see the lower right of Figure 5. [sent-558, score-0.454]
87 The relative performance between our optimal method and RANSAC follows the same pattern as for the registration experiment. [sent-559, score-0.266]
88 There is a significant difference compared to standard RANSAC while for the truncated L2-RANSAC the performance difference is smaller. [sent-560, score-0.29]
89 These findings are consistent with [12] where a truncated quadratic loss is also found to perform better than simply counting inliers in the RANSAC-loop. [sent-561, score-0.624]
90 Multiple Model Registration For the experiments on simultaneous multiple model registration we emulate structures as the one displayed in Figure 4(c). [sent-564, score-0.196]
91 For the cases with two models we used 20 inliers per model, and in total 40 outliers. [sent-567, score-0.222]
92 For the three model case we used models of different dominance and set the inlier sizes to 10, 20 and 30. [sent-568, score-0.237]
93 (Left) Runtime as a function of the number of correspondences for the fast rejection method. [sent-572, score-0.186]
94 Average running time for the 2-model examples was 18 s and 90 s for fitting 3 models. [sent-577, score-0.166]
95 Runtimes and Time Complexity By randomly selecting subproblems of different sizes from the registration data, we examined the running times of the fast outlier rejection and Algorithm 1 as a function of problem size. [sent-580, score-0.438]
96 Most of the outliers are removed by the outlier rejection step, and the runtime of the second step depends mainly on the number of inliers. [sent-582, score-0.344]
97 Conclusions We have shown how to minimize the truncated quadratic loss function, which accurately models the noise for both inliers and outliers. [sent-589, score-0.624]
98 Our experiments demonstrate that this yields a practical approach when combined with a fast outlier rejection step. [sent-590, score-0.241]
99 On the other hand, in such a case, it is not crucial to find all inliers in order to get an accurate estimate. [sent-592, score-0.222]
100 A polynomial-time bound for matching and registration with outliers. [sent-697, score-0.271]
simIndex simValue paperId paperTitle
2 0.25154945 360 cvpr-2013-Robust Estimation of Nonrigid Transformation for Point Set Registration
Author: Jiayi Ma, Ji Zhao, Jinwen Tian, Zhuowen Tu, Alan L. Yuille
Abstract: We present a new point matching algorithm for robust nonrigid registration. The method iteratively recovers the point correspondence and estimates the transformation between two point sets. In the first step of the iteration, feature descriptors such as shape context are used to establish rough correspondence. In the second step, we estimate the transformation using a robust estimator called L2E. This is the main novelty of our approach and it enables us to deal with the noise and outliers which arise in the correspondence step. The transformation is specified in a functional space, more specifically a reproducing kernel Hilbert space. We apply our method to nonrigid sparse image feature correspondence on 2D images and 3D surfaces. Our results quantitatively show that our approach outperforms state-ofthe-art methods, particularly when there are a large number of outliers. Moreover, our method of robustly estimating transformations from correspondences is general and has many other applications.
3 0.19734208 170 cvpr-2013-Fast Rigid Motion Segmentation via Incrementally-Complex Local Models
Author: Fernando Flores-Mangas, Allan D. Jepson
Abstract: The problem of rigid motion segmentation of trajectory data under orthography has been long solved for nondegenerate motions in the absence of noise. But because real trajectory data often incorporates noise, outliers, motion degeneracies and motion dependencies, recently proposed motion segmentation methods resort to non-trivial representations to achieve state of the art segmentation accuracies, at the expense of a large computational cost. This paper proposes a method that dramatically reduces this cost (by two or three orders of magnitude) with minimal accuracy loss (from 98.8% achieved by the state of the art, to 96.2% achieved by our method on the standard Hopkins 155 dataset). Computational efficiency comes from the use of a simple but powerful representation of motion that explicitly incorporates mechanisms to deal with noise, outliers and motion degeneracies. Subsets of motion models with the best balance between prediction accuracy and model complexity are chosen from a pool of candidates, which are then used for segmentation.
Motion Segmentation methods are widely diverse, but most capture only a small subset of constraints or algebraic properties from those that govern the image formation process of moving objects and their corresponding trajectories, such as the rank limit theorem [9, 10], the linear independence constraint (between trajectories from independent motions) [2, 13], the epipolar constraint [7], and the reduced rank property [11, 15, 13]. Model-selection based (a)Orignalvideoframes(b)Clas -label dtrajectories (c)Modelsup ort egion(d)Modelinlersandcontrolpoints (e)Modelresiduals (f) Segmenta ion result Figure 1: Model instantiation and segmentation. a) fth original frame, Italian Grand Prix ?c 2012 Formula 1. b) Classilanbaell feradm, trajectory Gdaratan Wd P r(rixed ?,c green, bolrumeu alnad 1 .bbl a)c Ck correspond to chassis, helmet, background and outlier classes respectively). c) Spatially-local support subset for a candidate motion in blue. d) Candidate motion model inliers in red, control points from Eq. 3) in white. e) Residuals from Eq. 11) color-coded with label data, the radial coordinate is logarithmic. f) Segmentation result. Wˆf (rif (cif methods [11, 6, 8] balance model complexity with modeling accuracy and have been successful at incorporating more of these aspects into a single formulation. For instance, in [8] most model parameters are estimated automatically from the data, including the number of independent motions and their complexity, as well as the segmentation labels (including outliers). However, because of the large number of necessary motion hypotheses that need to be instantiated, as well as the varying and potentially very large number of 222222555977 model parameters that must be estimated, the flexibility offered by this method comes at a large computational cost. Current state of the art methods follow the trend of using sparse low-dimensional subspaces to represent trajectory data. 
This representation is then fed into a clustering algorithm to obtain a segmentation result. A prime example of this type of method is Sparse Subspace Clustering (SSC) [3] in which each trajectory is represented as a sparse linear combination of a few other basis trajectories. The assumption is that the basis trajectories must belong to the same rigid motion as the reconstructed trajectory (or else, the reconstruction would be impossible). When the assumption is true, the sparse mixing coefficients can be interpreted as the connectivity weights of a graph (or a similarity matrix), which is then (spectral) clustered to obtain a segmentation result. At the time of publication, SSC produced segmentation results three times more accurate than the best predecessor. The practical downside, however, is the inherently large computational cost of finding the optimal sparse representation, which is at least cubic on the number of trajectories. The work of [14] also falls within the class of subspace separation algorithms. Their approach is based on clustering the principal angles (CPA) of the local subspaces associated to each trajectory and its nearest neighbors. The clustering re-weights a traditional metric of subspace affinity between principal angles. Re-weighted affinities are then used for segmentation. The approach produces segmentation results with accuracies similar to those of SSC, but the computational cost is close to 10 times bigger than SSC’s. In this work we argue that competitive segmentation results are possible using a simple but powerful representation of motion that explicitly incorporates mechanisms to deal with noise, outliers and motion degeneracies. The proposed method is approximately 2 or 3 orders of magnitude faster than [3] and [14] respectively, currently considered the state of the art. 1.1. Affine Motion Projective Geometry is often used to model the image motion of trajectories from rigid objects between pairs of frames. 
However, alternative geometric relationships that facilitate parameter computation have also been proven useful for this purpose. For instance, in perspective projection, general image motion from rigid objects can be modeled via the composition of two elements: a 2D homography, and parallax residual displacements [5]. The homography describes the motion of an arbitrary plane, and the parallax residuals account for relative depths, that are unaccounted for by the planar surface model. Under orthography, in contrast, image motion of rigid objects can be modeled via the composition of a 2D affine transformation plus epipolar residual displacements. The 2D affine transformation models the motion of an arbitrary plane, and the epipolar residuals account for relative depths. Crucially, these two components can be computed separately and incrementally, which enables an explicit mechanism to deal with motion degeneracy. In the context of 3D motion, a motion is degenerate when the trajectories originate from a planar (or linear) object, or when neither the camera nor the imaged object exercise all of their degrees of freedom, such as when the object only translates, or when the camera only rotates. These are common situations in real world video sequences. The incremental nature of the decompositions described above, facilitate the transition between degenerate motions and nondegenerate ones. Planar Model Under orthography, the projection of trajectories from a planar surface can be modeled with the affine transformation: ⎣⎡xy1c ⎦⎤=?0D? 1t?⎡⎣yx1w ⎦⎤= A2wD→c⎣⎡yx1w ⎦⎤, (1) where D ∈ R2×2 is an invertible matrix, and t ∈ R2 is a threarnesl Datio ∈n v Rector. Trajectory icboloer mdiantartixe,s (axnwd , tyw ∈) are in the plane’s reference frame (modulo a 2D affine transformation) and (xc, yc) are image coordinates. 
Now, let W ∈ R2F×P be matrix of trajectory data that conNtaoiwns, tlehet x a n∈d y image coordinates of P feature points tracked through F frames, as in TocmputehW pa=r⎢m⎣ ⎡⎢ etyx e1F r.,s 1 ofA· . ·2.Dfyx r1Fo m., P tra⎦⎥ ⎤je.ctorydat,(l2e)t C = [c1, c2 , c3] ∈ R2f×3 be three columns (three full trajectories) from W, and let = be the x and y coordinates of the i-th control trajectory at frame f. Then the transformation between points from an arbitrary source frame s to a target frame f can be written as: cif [ci2f−1, ci2f]? ?c1f1 c12f c1f3?= A2sD→f?c11s and A2s→Df c12s c13s?, (3) ?−1. (4) can be simply computed as: A2sD→f= ? c11f c12f c13f ? ? c11s c12s c13s The inverse in the right-hand side matrix of Eq. 4 exists so long as the points cis are not collinear. For simplicity we refer to as and consequently A2sD is the identity matrix. A2s→Df A2fD 222222556088 3D Model In order to upgrade a planar (degenerate) model into a full 3D one, relative depth must be accounted using the epipolar residual displacements. This means extending Eq. 1 with a direction vector, scaled by the corresponding relative depth of each point, as in: ⎣⎡xy1c ⎦⎤=?0?D? t1? ⎡⎣xy1w ⎦⎤+ δzw⎣⎡a 0213 ⎦⎤. The depth δzw is relative to the arbitrary plane tion is modeled by A2D; a point that lies on would have δzw = 0. We call the orthographic the plane plus parallax decomposition, the 2D Epipolar (2DAPE) decomposition. Eq. 5 is equivalent to (5) whose mosuch plane version of Affine Plus wher⎣⎡ ixyt1c is⎤⎦cl=ear⎡⎣tha 120t hea p201a2rma2e10t3erst1o2f⎦⎤A⎣⎢⎡3Dδyxd1zwefin⎦⎥⎤ean(o6r)thographically projected 3D affine transformation. Deter- mining the motion and structure parameters of a 3D model from point correspondences can be done using the classical matrix factorization approach [10], but besides being sensitive to noise and outliers, the common scenario where the solution becomes degenerate makes the approach difficult to use in real-world applications. 
Appropriately accommodating and dealing with the degenerate cases is one of the key features of our work. 2. Overview of the Method The proposed motion segmentation algorithm has three stages. First, a pool of M motion model hypotheses M = s{tMag1e , . . . , rMst,M a} p oiso generated using a omdeetlh hoydp tohthate csoesm Mbine =s a MRandom Sampling naenrda eCdon usseinngsu as m(ReAthNodS tAhaCt) [o4m] bteinche-s nique with the 2DAPE decomposition. The goal is to generate one motion model for each of the N independent, rigidly-moving objects in the scene; N is assumed to be known a priori. The method instantiates many more models than those expected necessary (M ? N) in an attempt ienlscr tehaasne tthhoes elik eexlpiheocotedd o nfe generating Mco ?rrec Nt m) iond aenl proposals for all N motions. A good model accurately describes a large subset of coherently moving trajectories with the smallest number of parameters (§3). Ialnl ethste n suemcobnedr stage, msubetseertss o§f3 )m.otion models from M are ncom thebi sneecdo ntod explain ualbl sthetes trajectories mino tdheel sequence. The problem is framed as an objective function that must be minimized. The objective function is the negative loglikelihood over prediction accuracy, regularized by model complexity (number of model parameters) and modeling overlap (trajectories explained by multiple models). Notice that after this stage, the segmentation that results from the optimal model combination could be reported as a segmentation result (§5). ioTnhe r tshuilrtd ( stage incorporates the results from a set of model combinations that are closest to the optimal. Segmentation results are aggregated into an affinity matrix, which is then passed to a spectral clustering algorithm to produce the final segmentation result. This refinement stage generally results in improved accuracy and reduced segmentation variability (§6). 3. Motion Model Instantiation Each model M ∈ M is instantiated independently using RacAhN mSAodCel. 
MThis ∈ c Mhoi cies niss manotitiavteatded in bdeecpaeunsdee otlfy th usismethod’s well-known computational efficiency and robustness to outliers, but also because of its ability to incorporate spatially local constraints and (as explained below) because most of the computations necessary to evaluate a planar model can be reused to estimate the likelihoods of a potentially necessary 3D model, yielding significant computational savings. The input to our model instantiation algorithm is a spatially-local, randomly drawn subset of trajectory data Wˆ[2F×I] ⊆ W[2F×P] (§3.1). In turn, at each RANSAC trial, the algorithm draw(§s3 uniformly d,i asttr eibaucthed R, A rNanSdoAmC subsets of three control trajectories (C[2F×3] ⊂ Wˆ[2F×I]). Each set of control trajectories is used to estim⊂ate the family of 2D affine transformations {A1, . . . , AF} between the iblyase o ffr 2aDm aef ainnde aralln sotfoherrm fartaimoness { iAn the sequence, wtwheicehn are then used to determine a complete set of model parameters M = {B, σ, C, ω}. The matrix B ∈ {0, 1}[F×I] indicates Mwhe =the {rB t,hσe ,iC-th, trajectory asthroixu Bld ∈b e predicted by model M at frame f (inlier, bif = 1) or not (outlier, bif = 0), σ = {σ1 , . . . , σF} are estimates of the magnitude of the σnois =e {foσr each fram}e a, aen eds ω ∈at {s2 oDf, t3hDe} m isa tnhietu edsetim ofa ttehde nmooidseel f type. hTh fera goal aisn dto ω ωfin ∈d {t2heD c,3oDntr}ol is points tainmda ttehed associated parameters that minimize the objective function O(Wˆ,M) =f?∈Fi?∈IbifLω? wˆif| Af,σf?+ Ψ(ω) + Γ(B) across (7) wˆfi a number of RANSAC trials, where = = are the coordinates of the i-th trajectory from the support subset at frame f. The negative log-likelihood term Lω (·) penalizes reconstruction error, while Ψ(·) and Γ(·) are regularizers. Tcohen tthrureceti otenr mer-s are ,d wefhinileed Ψ Ψ b(e·l)ow an. 
Knowing that 2D and 3D affine models have 6 and 8 degrees of freedom respectively, Ψ(ω) regularizes over model complexity using:

Ψ(ω) = 6(F − 1) if ω = 2D,  8(F − 1) if ω = 3D.    (8)

Γ(B) strongly penalizes models that describe too few trajectories:

Γ(B) = 0 if Σ_{i∈I} Σ_{f∈F} b_if ≥ F λ_i, and ∞ otherwise.    (9)

The control set C whose M minimizes Eq. 7 across a number of RANSAC trials becomes part of the pool of candidates M.

2D likelihoods. For the planar case (ω = 2D) the negative log-likelihood term is evaluated with:

L_2D(ŵ_if | A_f, σ_f) = −log( 1/(2π|Σ|^(1/2)) · exp(−½ r_if^T Σ^(−1) r_if) ),    (10)

which is a zero-mean 2D Normal distribution evaluated at the residuals r_if. The spherical covariance matrix is Σ = (σ_f)² I. The residuals are determined by the differences between the predictions made by a hypothesized model A_f and the observations at each frame:

r_if = w̃_if − A_f w̃_ib.    (11)

3D likelihoods. The negative log-likelihood term for the 3D case is based on the 2DAPE decomposition. The 2D affinities A_f and residuals r_f are reused, but to account for the effect of relative depth, an epipolar line segment e_f is robustly fit to the residual data at each frame (please see the supplementary material for details on the segment fitting algorithm). The 2DAPE does not constrain relative depths to remain constant across frames, but only requires trajectories to be close to the epipolar line. So, if the unitary vector e_f^⊥ indicates the direction orthogonal to e_f, then the negative log-likelihood term for the 3D case is estimated with:

L_3D(ŵ_if | A_f, σ_f) = −2 log( 1/(√(2π) σ_f) · exp( −(r_if^T e_f^⊥)² / (2 (σ_f)²) ) ),    (12)

which is also a zero-mean 2D Normal distribution, computed as the product of two identical, separable, single-variate normal distributions, evaluated at the distance from the residual to the epipolar line. The first one corresponds to the actual deviation in the direction of e_f^⊥, which is analytically computed using r_if^T e_f^⊥.
The second one corresponds to an estimate of the deviation in the perpendicular direction (along e_f), which cannot be determined using the 2DAPE decomposition model, but can be approximated to be equal to r_if^T e_f^⊥, which is a plausible estimate under the isotropic noise assumption.

Note that Eq. 7 does not evaluate the quality of a model using the number of inliers, as is typical for RANSAC. Instead, we found that better motion models resulted from optimizing over the accuracy of the model predictions for an (estimated) inlier subset, which also means that the effect of outliers is explicitly discounted. Figure 1.b shows an example of class-labeled trajectory data, and 1.c shows a typical spatially-local support subset. Figures 1.d and 1.e show a model's control points and its corresponding (class-labeled) residuals, respectively. A pseudocode description of the motion instantiation algorithm is provided in Algorithm 1. Details on how to determine Ŵ, as well as B, σ, and ω, follow.

Algorithm 1: Motion model instantiation
  Input: Trajectory data W[2F×P], number of RANSAC trials K, arbitrary base frame b
  Output: Parameters of the motion model M = {B, σ, ω}
  // determine the training set
  c ← rand(1, P); r ← rand(rmin, rmax)          // random center and radius
  Ŵ[2F×I] ← trajectoriesWithinDisk(W, r, c)     // support subset
  X ← homoCoords(Ŵb)                            // points at base frame
  for K RANSAC trials do
    ...
  return M = {B, σ, ω}

3.1. Local Coherence

The subset of trajectories Ŵ given to RANSAC to generate a model M is constrained to a spatially local region. The probability of choosing an uncontaminated set of 3 control trajectories, necessary to compute a 2D affine model, from a dataset with a ratio r of inliers, after k trials is p = 1 − (1 − r³)^k. This means that the number of trials needed to find a subset of 3 inliers with probability p is

k = log(1 − p) / log(1 − r³).
(13)

A common assumption is that trajectories from the same underlying motion are locally coherent. Hence, a compact region is likely to increase r, exponentially reducing k, and with it, RANSAC's computation time by a proportional amount.

Figure 2: Predictions (red) from a 2D affine model with standard Gaussian noise (green) on one of the control points (black). Noiseless model predictions in blue. All four scenarios have identical noise. The magnitude of the extrapolation error changes with the distance between the control points.

The trade-off that results from drawing model control points from a small region, however, is extrapolation error. A motion model is extrapolated when utilized to make predictions for trajectories outside the region defined by the control points. The magnitude of the modeling error depends on the magnitude of the noise affecting the control points, and although hard to characterize in general, extrapolation error can be expected to grow with the distance from the prediction to the control points, and inversely with the distance between the control points themselves. Figure 2 shows a series of synthetic scenarios where one of the control points is affected by zero-mean Gaussian noise of small magnitude. Identical noise is added to the same trajectory in all four scenarios. The figure illustrates the relation between the distance between the control points and the magnitude of the extrapolation errors.

Our goal is to maximize the region size while limiting the number of outliers. Without any prior knowledge regarding the scale of the objects in the scene, determining a fixed size for the support region is unlikely to work in general. Instead, the issue is avoided by randomly sampling disk-shaped regions of varying sizes and locations to construct a diverse set of support subsets.
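The exponential effect of the inlier ratio r on the trial count k (Eq. 13), which is what makes compact local sampling pay off, is easy to verify numerically. A small sketch, not part of the paper's code:

```python
import math

def ransac_trials(p, r, s=3):
    """Trials k needed to draw at least one all-inlier sample of size s
    with probability p, given inlier ratio r (Eq. 13 uses s = 3)."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - r ** s))

# Doubling the inlier ratio cuts the required trial count dramatically.
for r in (0.2, 0.4, 0.8):
    print(r, ransac_trials(p=0.99, r=r))
```

With p = 0.99, raising r from 0.2 to 0.8 reduces k from several hundred trials to a handful, which is precisely the motivation for drawing control points from spatially compact regions.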
Each support subset is then determined by

Ŵ = { w_i | (x_ib − o_x)² + (y_ib − o_y)² < r² },    (14)

where (o_x, o_y) are the coordinates of the center of a disk of radius r. To promote uniform image coverage, the disk is centered at a randomly chosen trajectory, (o_x, o_y) = (x_ib, y_ib), with uniformly distributed i ∼ U(1, P) and base frame b ∼ U(1, F). To allow for different region sizes, the radius r is chosen from a uniform distribution r ∼ U(r_min, r_max). If there are I trajectories within the support region, then Ŵ ∈ R^{2F×I}. It is worth noting that the construction of the support region does not incorporate any knowledge about the motion of objects in the scene, and in consequence it will likely contain trajectories that originate from more than one independently moving object (Figure 3).

Figure 3: Two randomly drawn local support sets. Left: a mixed set with some trajectories from the blue and green classes. Right: another mixed set with all of the trajectories in the red class and some from the blue class.

4. Characterizing the Residual Distribution

At each RANSAC iteration, residuals r_f are computed using the 2D affine model A_f that results from the constraints provided by the control trajectories C. Characterizing the distribution of r_f has three initial goals. The first is to determine the 2D model inliers b_f^2D (§4.1), the second is to compute estimates of the magnitude of the noise at every frame, σ_f^2D (§4.2), and the third is to determine whether the residual distribution originates from a planar or a 3D object (§4.3). If the object is suspected to be 3D, then two more goals need to be achieved. The first is to determine the 3D model inliers b_f^3D (§4.4), and the second is to estimate the magnitude of the noise (σ_f^3D) to reflect the use of a 3D model (§4.5).
4.1. 2D Inlier Detection

Suppose the matrix Ŵ contains trajectories Ŵ1 ∈ R^{2F×I} and Ŵ2 ∈ R^{2F×J} from two independently moving objects, and that these trajectories are contaminated with zero-mean Gaussian noise of spherical covariance, η ∼ N(0, (σ_f)² I):

Ŵ = [ Ŵ1 | Ŵ2 ] + η.    (15)

Now, assume we know the true affine transformations A_f^1 and A_f^2 that describe the motion of trajectories for the subsets Ŵ1 and Ŵ2, respectively. If A_f^1 is used to compute predictions for all of Ŵ (at frame f), the expected value (denoted by ⟨·⟩) of the magnitude of the residuals (r_if from Eq. 11) for trajectories in Ŵ1 will be in the order of the magnitude of the underlying noise, ⟨|r_if|⟩ = σ_f for each i ∈ {1, . . . , I}. But in this scenario, trajectories in Ŵ2 will be predicted using the wrong model, resulting in residuals with magnitudes determined by the motion differential, ⟨|r_if|⟩ = ⟨|(A_f^1 − A_f^2) ŵ_ib|⟩. If we can assume that the motion differential is bigger than the displacement due to noise:

|(A_f^1 − A_f^2) ŵ_ib| > σ_f,    (16)

then the model inliers can be determined by thresholding |r_if| with the magnitude of the noise, scaled by a constant (τ = λ_σ σ_f):

b_if = 1 if |r_if| ≤ τ, and 0 otherwise.    (17)

But because σ_f is generally unknown, the threshold τ is estimated from the residual data. To do so, let r̂ be the vector of sorted residual magnitudes, where r̂_i ≤ r̂_{i+1}. Now, let r̃ = median_i( r̂_{i+1} − r̂_i ). The threshold is then defined as

τ = min{ r̂_i | (r̂_{i+1} − r̂_i) > λ_r r̃ },    (18)

which corresponds to the smallest residual magnitude before a salient magnitude gap. Our experiments showed this test to be efficient and effective. Figure 1.e shows class-labeled residuals. Notice the presence of a (low density) gap between the residuals from the trajectories explained by the correct model (in red, close to the origin), and the rest.
4.2. Magnitude of the Noise, 2D Model

Let r̂_f^2D contain only the residuals of the inlier trajectories (those where b_if = 1), and let USV^T be the singular value decomposition of the covariance matrix of r̂_f^2D:

USV^T = svd( (1/|b_f|) (r̂_f^2D)^T (r̂_f^2D) ),    (19)

where |b_f| is the number of inliers at frame f. Then the magnitude of the noise corresponds to the largest singular value, σ² = s1, because if the underlying geometry is in fact planar, then the only unaccounted displacements captured by the residuals are due to noise. Model capacity can also be determined from S, as explained next.

4.3. Model Capacity

The ratio of largest over smallest singular values (s1/s2) determines when upgrading to a 3D model is beneficial. When the underlying geometry is actually non-planar, the residuals from a planar model should distribute along a line (the epipolar line), reflecting that their relative depth is being unaccounted for. This produces a covariance matrix with a large ratio s1/s2 ≫ 1. If, on the other hand, s1/s2 ≈ 1, then there is no indication of unexplained relative depth, in which case fitting a line to spherically distributed residuals will only increase the model complexity without explaining the residual variance much better. A small spherical residual covariance strongly suggests a planar underlying geometry.

4.4. 3D Inlier Detection

When the residual distribution is elongated (s1/s2 ≫ 1), a line segment is robustly fit to the (potentially contaminated) set of residuals. The segment must go through the origin and its parameters are computed using a Hough transform. Further details about this algorithm can be found in the supplementary material.

Inlier detection. The resulting line segment is used to determine 3D model inliers. Trajectory i becomes an inlier at frame f if it satisfies two conditions. First, the projection of r_if onto the line must lie within the segment limits (β ≤ e_f^T r_if ≤ γ). Second, the normalized distance to the line must be below a threshold (r_if^T e_f^⊥ ≤ σ2 λ_d).
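The covariance-SVD noise estimate of Eq. 19 and the s1/s2 elongation test of §4.3 can be sketched with numpy. This is a toy illustration, not the paper's code; the synthetic residual sets and the line direction [2, 1] are made-up inputs:

```python
import numpy as np

def noise_and_capacity(residuals):
    """residuals: (n, 2) array of inlier residuals at one frame.
    Returns (sigma_sq, ratio): the largest singular value of the residual
    covariance (noise estimate, cf. Eq. 19) and the s1/s2 ratio (cf. §4.3)."""
    cov = residuals.T @ residuals / len(residuals)
    s = np.linalg.svd(cov, compute_uv=False)   # singular values, descending
    return s[0], s[0] / s[1]

rng = np.random.default_rng(0)
iso = rng.normal(scale=0.1, size=(500, 2))                # planar-like: isotropic
line = np.outer(rng.normal(size=500), [2.0, 1.0]) + iso   # 3D-like: elongated
print(noise_and_capacity(iso)[1], noise_and_capacity(line)[1])
```

For the isotropic set the ratio stays close to 1 (no depth to explain), while the elongated set produces a ratio orders of magnitude larger, triggering the 3D upgrade.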
Notice that the threshold depends on the smallest singular value from Eq. 19 to (roughly) account for the presence of noise in the direction perpendicular to the epipolar line (e_f).

4.5. Magnitude of the Noise, 3D Model

Similarly to the 2D case, let r̂_f^3D contain the residual data from the corresponding 3D inlier trajectories. An estimate for the magnitude of the noise that reflects the use of a 3D model can be obtained from the singular value decomposition of the covariance matrix of r̂_f^3D (as in Eq. 19). In this case, the largest singular value s1 captures the spread of residuals along the epipolar line, so its magnitude is mainly related to the magnitude of the displacements due to relative depth. However, s2 captures deviations from the epipolar line, which in a rigid 3D object can only be attributed to noise, making σ² = s2 a reasonable estimate for its magnitude.

Optimal model parameters. When both 2D and 3D models are instantiated, the one with the smallest penalized negative log-likelihood (7) becomes the winning model for the current RANSAC run. The same penalized negative log-likelihood metric is used to determine the better model from across all RANSAC iterations. The winning model is added to the pool M, and the process is repeated M times, forming the pool M = {M1, . . . , MM}.

5. Optimal Model Subset

The next step is to find the model combination M* ⊂ M that maximizes prediction accuracy for the whole trajectory data W, while minimizing model complexity and modeling overlap. For this purpose, let Mj = {Mj,1, . . . , Mj,N} be the j-th model combination, and let {Mj} be the set of all C(M, N) = M!/(N!(M − N)!) combinations of N-sized models that can be drawn from M.
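The combinatorial search space just described — every one of the C(M, N) ways of picking N models from the pool — can be enumerated directly for small M. A sketch only: the pool entries and the scoring callable below are placeholders standing in for the actual objective O_S of Eq. 21:

```python
from itertools import combinations
from math import comb

def best_model_subset(pool, N, objective):
    """Exhaustively score every N-sized combination drawn from the pool
    and return the minimizer of the supplied objective (cf. Eq. 20)."""
    return min(combinations(pool, N), key=objective)

pool = ['M1', 'M2', 'M3', 'M4', 'M5']
# Placeholder objective: prefer the lexicographically smallest pair.
score = lambda combo: sorted(combo)
print(comb(len(pool), 2))             # number of candidate combinations
print(best_model_subset(pool, 2, score))
```

Since M ≫ N by construction, this exhaustive enumeration is only viable because N (the number of motions) is small; the count grows as M choose N.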
The model selection problem is then formulated as

M* = argmin_{Mj} O_S(Mj),    (20)

where the objective is

O_S(Mj) = Σ_{n=1}^{N} Σ_{p=1}^{P} π_{p,n} E(w_p, Mj,n) + λ_Φ Σ_{p=1}^{P} Φ(w_p, Mj) + λ_Ψ Σ_{n=1}^{N} Ψ(Mj,n).    (21)

The first term accounts for prediction accuracy; the other two are regularization terms. Details follow.

Prediction Accuracy. In order to determine how well a model M predicts an arbitrary trajectory w, the affine transformations estimated by RANSAC could be re-used. However, the inherent noise in the control points, and the potentially short distance between them, often render this approach impractical, particularly when w is spatially distant from the control points (see §3.1). Instead, model parameters are computed with a factorization-based [10] method. Given the inlier labeling B in M, let W_B be the subset of trajectories where b_if = 1 for at least half of the frames. The orthonormal basis S of a ω = 2D (or 3D) motion model can be determined by the 2 (or 3) left singular vectors of W_B. Using S as the model's motion matrices, prediction accuracy can be computed using:

E(w, M) = || S S^T w − w ||²,    (22)

which is the sum of squared Euclidean deviations from the predictions (S S^T w) to the observed data (w). Our experiments indicated that, although sensitive to outliers, these model predictions are much more robust to noise.

Ownership variables Π ∈ {0, 1}^{P×N} indicate whether trajectory p is explained by the n-th model (π_{p,n} = 1) or not (π_{p,n} = 0), and are determined by maximum prediction accuracy (i.e. minimum Euclidean deviation):

π_{p,n} = 1 if Mj,n = argmin_{M ∈ Mj} E(w_p, M), and 0 otherwise.    (23)

Regularization terms. The second term of Eq. 21 penalizes situations where multiple models explain a trajectory w with relatively small residuals. For brevity, let Ê(w, M) = exp{−E(w, M)}; then:

Φ(w, Mj) = −log( max_{M ∈ Mj} Ê(w, M) / Σ_{M ∈ Mj} Ê(w, M) ).
(24)

The third term regularizes over the number of model parameters, and is evaluated using Eq. 8. The constants λ_Φ and λ_Ψ modulate the effect of the corresponding regularizers.

6. Refinement

The optimal model subset M* yields ownership variables Π*, which can already be interpreted as a segmentation result. However, we found that segmentation accuracy can be improved by incorporating the labellings Π_t from the top T subsets {M*_t | 1 ≤ t ≤ T} closest to the optimal. Multiple labellings are incorporated into an affinity matrix F, where the entry f_{i,j} indicates the frequency with which trajectory i is given the same label as trajectory j across all T labellings, weighted by the relative objective function Õ_t = exp(−O_S(W | M*_t)) for each labelling:

f_{i,j} = ( Σ_{t=1}^{T} (π_{i,:}^t · π_{j,:}^t) Õ_t ) / ( Σ_{t=1}^{T} Õ_t ).    (25)

Note that the inner product between the label vectors (π_{i,:} · π_{j,:}) is equal to one only when the labels are the same. A spectral clustering method is applied to F to produce the method's final segmentation result.

7. Experiments

Evaluation was made through three experimental setups.

Hopkins 155. The Hopkins 155 (H155) dataset has been the standard evaluation benchmark for the problem of motion segmentation of trajectory data since 2007. It consists of checkerboard, traffic and articulated sequences with either 2 or 3 motions. Data was automatically tracked, but tracking errors were manually corrected; further details are available in [12]. The use of a standard dataset enables direct comparison of accuracy and run-time performance.

Table 1: Accuracy and run-time for the H155 dataset, comparing RANSAC, SSC [3], CPA [14] and our method in terms of average accuracy [%] and computation time [s]. Naive RANSAC is included as a baseline, with overall accuracy and total computation time estimated using data from [12]. (The numeric entries were lost in extraction.)

Table 1 shows the relevant figures for the two most competitive algorithms that we are aware of.
The data indicates that our algorithm has run-times that are 2 to 3 orders of magnitude faster than those of the state-of-the-art methods, with minimal accuracy loss. Computation times are measured on the same (or very similar) hardware architectures. As with CPA, our implementation uses a single set of parameters for all the experiments, but as others have pointed out [14], it remains unclear whether the same is true for the results reported in the original SSC paper.

Figure 4: Accuracy error-bars across artificial H155 datasets with controlled levels of Gaussian noise.

Artificial Noise. The second experimental setup complements an unexplored dimension of the H155 dataset: noise. The goal is to determine the effect of noise of different magnitudes on the segmentation accuracy of our method, in comparison with the state of the art. We noted that H155 contains structured, long-tailed noise, but for the purpose of this experiment we required a noise-free dataset as a baseline. To generate such a dataset, ground-truth labels were used to compute a rank-3 reconstruction of the (mean-subtracted) trajectories of each segment. Then, multiple versions of H155 were computed by contaminating the noise-free dataset with Gaussian noise of magnitudes σ_n ∈ {0.01, 0.25, 0.5, 1, 2, 4, 8}. Our method, as well as SSC and CPA, were run on these noise-controlled datasets; results are shown in Figure 4. The error bars on SSC and Ours indicate one standard deviation, computed over 20 runs. The plot for CPA is generated with only one run for each dataset (running time: 11.95 days). The graph indicates that our method only compromises accuracy for large levels of noise, while still being around 2 to 3 orders of magnitude faster than the most competitive algorithms.
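Returning to the refinement stage of §6, the affinity aggregation of Eq. 25 reduces to a weighted co-association matrix over the top-T labellings. A numpy sketch under that reading; the tiny labellings and objective values are made-up inputs, and the one-hot inner product of Eq. 25 is implemented as a same-label indicator:

```python
import numpy as np

def affinity(labellings, objectives):
    """labellings: (T, P) integer label matrix, one row per labelling;
    objectives: length-T values of O_S for each labelling.
    Builds F of Eq. 25: co-label frequency weighted by O~_t = exp(-O_S)."""
    labellings = np.asarray(labellings)
    w = np.exp(-np.asarray(objectives, dtype=float))        # O~_t
    same = (labellings[:, :, None] == labellings[:, None, :])  # (T, P, P)
    return np.tensordot(w, same, axes=1) / w.sum()

F = affinity([[0, 0, 1], [0, 1, 1]], objectives=[1.0, 2.0])
print(np.round(F, 3))
```

The resulting matrix is symmetric with a unit diagonal, and labellings with smaller objective values contribute proportionally more to each pairwise affinity, which is the matrix then handed to spectral clustering.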
KLT Tracking. The last experimental setup evaluates the applicability of the algorithm in real-world conditions, using raw tracks from an off-the-shelf implementation [1] of the Kanade-Lucas-Tomasi algorithm. Several sequences were tracked and the resulting trajectories classified by our method. Figure 5 shows qualitatively good motion segmentation results for four sequences. Challenges include very small relative motions, tracking noise, and a large presence of outliers.

8. Conclusions

We introduced a computationally efficient motion segmentation algorithm for trajectory data. Efficiency comes from the use of a simple but powerful representation of motion that explicitly incorporates mechanisms to deal with noise, outliers and motion degeneracies. Run-time comparisons indicate that our method is 2 to 3 orders of magnitude faster than the state of the art, with only a small loss in accuracy. The robustness of our method to Gaussian noise of different magnitudes was found competitive with the state of the art, while retaining the inherent computational efficiency. The method was also found to be useful for motion segmentation of real-world, raw trajectory data.

Figure 5: Motion segmentation results on KLT tracks from four Formula 1 sequences (Italian Grand Prix ©2012 Formula 1). In this figure, all trajectories are given a motion label, including outliers.

References

[1] http://www.ces.clemson.edu/~stb/klt. 8
[2] J. P. Costeira and T. Kanade. A Multibody Factorization Method for Independently Moving Objects. IJCV, 1998. 1
[3] E. Elhamifar and R. Vidal. Sparse subspace clustering. In Proc. CVPR, 2009. 2, 7
[4] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 1981. 3
[5] M. Irani and P. Anandan. Parallax geometry of pairs of points for 3D scene analysis. Proc. ECCV, 1996. 2
[6] K. Kanatani.
Motion segmentation by subspace separation: Model selection and reliability evaluation. International Journal of Image and Graphics, 2002. 1
[7] H. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. In Readings in Computer Vision: Issues, Problems, Principles, and Paradigms, M. A. Fischler and O. Firschein, eds., 1987. 1
[8] K. Schindler, D. Suter, and H. Wang. A model-selection framework for multibody structure-and-motion of image sequences. IJCV, 79(2):159–177, 2008. 1
[9] C. Tomasi and T. Kanade. Shape and motion without depth. Proc. ICCV, 1990. 1
[10] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: a factorization method. IJCV, 1992. 1, 3, 7
[11] P. Torr. Geometric motion segmentation and model selection. Phil. Trans. of the Royal Soc. of London, Series A: Mathematical, Physical and Engineering Sciences, 1998. 1
[12] R. Tron and R. Vidal. A Benchmark for the Comparison of 3-D Motion Segmentation Algorithms. In Proc. CVPR, 2007. 7
[13] J. Yan and M. Pollefeys. A factorization-based approach for articulated nonrigid shape, motion and kinematic chain recovery from video. PAMI, 2008. 1
[14] L. Zappella, E. Provenzi, X. Lladó, and J. Salvi. Adaptive motion segmentation algorithm based on the principal angles configuration. Proc. ACCV, 2011. 2, 7
[15] L. Zelnik-Manor and M. Irani. Degeneracies, dependencies and their implications in multi-body and multi-sequence factorizations. Proc. CVPR, 2003. 1
4 0.16225018 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition
Author: Qiang Hao, Rui Cai, Zhiwei Li, Lei Zhang, Yanwei Pang, Feng Wu, Yong Rui
Abstract: 3D model-based object recognition has been a noticeable research trend in recent years. Common methods find 2D-to-3D correspondences and make recognition decisions by pose estimation, whose efficiency usually suffers from noisy correspondences caused by the increasing number of target objects. To overcome this scalability bottleneck, we propose an efficient 2D-to-3D correspondence filtering approach, which combines a light-weight neighborhoodbased step with a finer-grained pairwise step to remove spurious correspondences based on 2D/3D geometric cues. On a dataset of 300 3D objects, our solution achieves ∼10 times speed improvement over the baseline, with a comparable recognition accuracy. A parallel implementation on a quad-core CPU can run at ∼3fps for 1280× 720 images.
5 0.15915136 20 cvpr-2013-A New Model and Simple Algorithms for Multi-label Mumford-Shah Problems
Author: Byung-Woo Hong, Zhaojin Lu, Ganesh Sundaramoorthi
Abstract: In this work, we address the multi-label Mumford-Shah problem, i.e., the problem of jointly estimating a partitioning of the domain of the image, and functions defined within regions of the partition. We create algorithms that are efficient, robust to undesirable local minima, and easy to implement. Our algorithms are formulated by slightly modifying the underlying statistical model from which the multi-label Mumford-Shah functional is derived. The advantage of this statistical model is that the underlying variables, the labels and the functions, are less coupled than in the original formulation, and the labels can be computed from the functions with more global updates. The resulting algorithms can be tuned to the desired level of locality of the solution: from fully global updates to more local updates. We demonstrate our algorithm on two applications: joint multi-label segmentation and denoising, and joint multi-label motion segmentation and flow estimation. We compare to the state-of-the-art in multi-label Mumford-Shah problems and show that we achieve more promising results.
6 0.11229822 47 cvpr-2013-As-Projective-As-Possible Image Stitching with Moving DLT
7 0.11123781 139 cvpr-2013-Efficient 3D Endfiring TRUS Prostate Segmentation with Globally Optimized Rotational Symmetry
8 0.10931887 290 cvpr-2013-Motion Estimation for Self-Driving Cars with a Generalized Camera
9 0.10840023 424 cvpr-2013-Templateless Quasi-rigid Shape Modeling with Implicit Loop-Closure
10 0.10340255 380 cvpr-2013-Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images
11 0.10202605 264 cvpr-2013-Learning to Detect Partially Overlapping Instances
12 0.10046759 283 cvpr-2013-Megastereo: Constructing High-Resolution Stereo Panoramas
13 0.096883431 23 cvpr-2013-A Practical Rank-Constrained Eight-Point Algorithm for Fundamental Matrix Estimation
14 0.093196429 31 cvpr-2013-Accurate and Robust Registration of Nonrigid Surface Using Hierarchical Statistical Shape Model
15 0.090951815 342 cvpr-2013-Prostate Segmentation in CT Images via Spatial-Constrained Transductive Lasso
16 0.089548372 97 cvpr-2013-Correspondence-Less Non-rigid Registration of Triangular Surface Meshes
17 0.089500397 95 cvpr-2013-Continuous Inference in Graphical Models with Polynomial Energies
18 0.088506885 194 cvpr-2013-Groupwise Registration via Graph Shrinkage on the Image Manifold
19 0.087586775 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
20 0.082748622 8 cvpr-2013-A Fast Approximate AIB Algorithm for Distributional Word Clustering
simIndex simValue paperId paperTitle
same-paper 1 0.95271516 317 cvpr-2013-Optimal Geometric Fitting under the Truncated L2-Norm
Author: Erik Ask, Olof Enqvist, Fredrik Kahl
Abstract: This paper is concerned with model fitting in the presence of noise and outliers. Previously it has been shown that the number of outliers can be minimized with polynomial complexity in the number of measurements. This paper improves on these results in two ways. First, it is shown that for a large class of problems, the statistically more desirable truncated L2-norm can be optimized with the same complexity. Then, with the same methodology, it is shown how to transform multi-model fitting into a purely combinatorial problem—with worst-case complexity that is polynomial in the number of measurements, though exponential in the number of models. We apply our framework to a series of hard registration and stitching problems demonstrating that the approach is not only of theoretical interest. It gives a practical method for simultaneously dealing with measurement noise and large amounts of outliers for fitting problems with low-dimensional models.
2 0.76455539 360 cvpr-2013-Robust Estimation of Nonrigid Transformation for Point Set Registration
Author: Jiayi Ma, Ji Zhao, Jinwen Tian, Zhuowen Tu, Alan L. Yuille
Abstract: We present a new point matching algorithm for robust nonrigid registration. The method iteratively recovers the point correspondence and estimates the transformation between two point sets. In the first step of the iteration, feature descriptors such as shape context are used to establish rough correspondence. In the second step, we estimate the transformation using a robust estimator called L2E. This is the main novelty of our approach and it enables us to deal with the noise and outliers which arise in the correspondence step. The transformation is specified in a functional space, more specifically a reproducing kernel Hilbert space. We apply our method to nonrigid sparse image feature correspondence on 2D images and 3D surfaces. Our results quantitatively show that our approach outperforms state-of-the-art methods, particularly when there are a large number of outliers. Moreover, our method of robustly estimating transformations from correspondences is general and has many other applications.
3 0.69745976 20 cvpr-2013-A New Model and Simple Algorithms for Multi-label Mumford-Shah Problems
Author: Byung-Woo Hong, Zhaojin Lu, Ganesh Sundaramoorthi
Abstract: In this work, we address the multi-label Mumford-Shah problem, i.e., the problem of jointly estimating a partitioning of the domain of the image, and functions defined within regions of the partition. We create algorithms that are efficient, robust to undesirable local minima, and easy to implement. Our algorithms are formulated by slightly modifying the underlying statistical model from which the multi-label Mumford-Shah functional is derived. The advantage of this statistical model is that the underlying variables, the labels and the functions, are less coupled than in the original formulation, and the labels can be computed from the functions with more global updates. The resulting algorithms can be tuned to the desired level of locality of the solution: from fully global updates to more local updates. We demonstrate our algorithm on two applications: joint multi-label segmentation and denoising, and joint multi-label motion segmentation and flow estimation. We compare to the state-of-the-art in multi-label Mumford-Shah problems and show that we achieve more promising results.
4 0.66063744 23 cvpr-2013-A Practical Rank-Constrained Eight-Point Algorithm for Fundamental Matrix Estimation
Author: Yinqiang Zheng, Shigeki Sugimoto, Masatoshi Okutomi
Abstract: Due to its simplicity, the eight-point algorithm has been widely used in fundamental matrix estimation. Unfortunately, the rank-2 constraint of a fundamental matrix is enforced via a posterior rank correction step, thus leading to non-optimal solutions to the original problem. To address this drawback, existing algorithms need to solve either a very high order polynomial or a sequence of convex relaxation problems, both of which are computationally ineffective and numerically unstable. In this work, we present a new rank-2 constrained eight-point algorithm, which directly incorporates the rank-2 constraint in the minimization process. To avoid singularities, we propose to solve seven subproblems and retrieve their globally optimal solutions by using tailored polynomial system solvers. Our proposed method is noniterative, computationally efficient and numerically stable. Experiment results have verified its superiority over existing algebraic error based algorithms in terms of accuracy, as well as its advantages when used to initialize geometric error based algorithms.
5 0.65554023 47 cvpr-2013-As-Projective-As-Possible Image Stitching with Moving DLT
Author: Julio Zaragoza, Tat-Jun Chin, Michael S. Brown, David Suter
Abstract: We investigate projective estimation under model inadequacies, i.e., when the underpinning assumptions of the projective model are not fully satisfied by the data. We focus on the task of image stitching which is customarily solved by estimating a projective warp — a model that is justified when the scene is planar or when the views differ purely by rotation. Such conditions are easily violated in practice, and this yields stitching results with ghosting artefacts that necessitate the usage of deghosting algorithms. To this end we propose as-projective-as-possible warps, i.e., warps that aim to be globally projective, yet allow local non-projective deviations to account for violations to the assumed imaging conditions. Based on a novel estimation technique called Moving Direct Linear Transformation (Moving DLT), our method seamlessly bridges image regions that are inconsistent with the projective model. The result is highly accurate image stitching, with significantly reduced ghosting effects, thus lowering the dependency on post hoc deghosting.
6 0.60147595 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition
7 0.59882909 162 cvpr-2013-FasT-Match: Fast Affine Template Matching
8 0.5826205 361 cvpr-2013-Robust Feature Matching with Alternate Hough and Inverted Hough Transforms
9 0.57556999 31 cvpr-2013-Accurate and Robust Registration of Nonrigid Surface Using Hierarchical Statistical Shape Model
10 0.54366374 290 cvpr-2013-Motion Estimation for Self-Driving Cars with a Generalized Camera
11 0.54348093 429 cvpr-2013-The Generalized Laplacian Distance and Its Applications for Visual Matching
12 0.52945 51 cvpr-2013-Auxiliary Cuts for General Classes of Higher Order Functionals
13 0.52336109 139 cvpr-2013-Efficient 3D Endfiring TRUS Prostate Segmentation with Globally Optimized Rotational Symmetry
14 0.5159865 194 cvpr-2013-Groupwise Registration via Graph Shrinkage on the Image Manifold
15 0.51299524 448 cvpr-2013-Universality of the Local Marginal Polytope
16 0.50992697 128 cvpr-2013-Discrete MRF Inference of Marginal Densities for Non-uniformly Discretized Variable Space
17 0.5095098 8 cvpr-2013-A Fast Approximate AIB Algorithm for Distributional Word Clustering
18 0.50643647 342 cvpr-2013-Prostate Segmentation in CT Images via Spatial-Constrained Transductive Lasso
19 0.49605888 424 cvpr-2013-Templateless Quasi-rigid Shape Modeling with Implicit Loop-Closure
20 0.48510748 373 cvpr-2013-SWIGS: A Swift Guided Sampling Method
topicId topicWeight
[(10, 0.126), (16, 0.065), (26, 0.093), (33, 0.226), (65, 0.015), (67, 0.042), (69, 0.044), (76, 0.234), (87, 0.088)]
simIndex simValue paperId paperTitle
same-paper 1 0.82754725 317 cvpr-2013-Optimal Geometric Fitting under the Truncated L2-Norm
Author: Erik Ask, Olof Enqvist, Fredrik Kahl
Abstract: This paper is concerned with model fitting in the presence of noise and outliers. Previously it has been shown that the number of outliers can be minimized with polynomial complexity in the number of measurements. This paper improves on these results in two ways. First, it is shown that for a large class of problems, the statistically more desirable truncated L2-norm can be optimized with the same complexity. Then, with the same methodology, it is shown how to transform multi-model fitting into a purely combinatorial problem, with worst-case complexity that is polynomial in the number of measurements, though exponential in the number of models. We apply our framework to a series of hard registration and stitching problems demonstrating that the approach is not only of theoretical interest. It gives a practical method for simultaneously dealing with measurement noise and large amounts of outliers for fitting problems with low-dimensional models.
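The truncated L2 objective caps each squared residual at tau^2, so a gross outlier pays a bounded penalty. The combinatorial angle is that the optimum can be searched over models generated by small measurement subsets. A toy version for 2D line fitting that enumerates lines through point pairs (the paper's candidate set and complexity analysis are more refined):

```python
import numpy as np
from itertools import combinations

def truncated_l2_cost(params, pts, tau):
    """Truncated L2 cost of line y = a*x + b over points pts (N, 2)."""
    a, b = params
    r2 = (pts[:, 1] - (a * pts[:, 0] + b)) ** 2
    return float(np.sum(np.minimum(r2, tau ** 2)))

def fit_line_truncated_l2(pts, tau):
    """Exhaustive search over lines through point pairs: O(N^2) candidates,
    O(N) cost evaluation each. Inliers pay r^2, outliers a flat tau^2."""
    best, best_cost = None, np.inf
    for i, j in combinations(range(len(pts)), 2):
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if x1 == x2:
            continue  # skip vertical pairs in this slope/intercept toy model
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        c = truncated_l2_cost((a, b), pts, tau)
        if c < best_cost:
            best, best_cost = (a, b), c
    return best, best_cost
```

On data with exact inliers and a few gross outliers, the inlier line wins because each outlier contributes only tau^2 to its cost.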
2 0.80092943 426 cvpr-2013-Tensor-Based Human Body Modeling
Author: Yinpeng Chen, Zicheng Liu, Zhengyou Zhang
Abstract: In this paper, we present a novel approach to model the 3D human body with variations in both shape and pose, by exploring a tensor decomposition technique. 3D human body modeling is important for 3D reconstruction and animation of realistic human bodies, which can be widely used in telepresence and video game applications. It is challenging due to a wide range of shape variations over different people and poses. The existing SCAPE model [4] is popular in computer vision for modeling the 3D human body. However, it considers shape and pose deformations separately, which is not accurate since pose deformation is person-dependent. Our tensor-based model addresses this issue by jointly modeling shape and pose deformations. Experimental results demonstrate that our tensor-based model outperforms the SCAPE model quite significantly. We also apply our model to capture human bodies using Microsoft Kinect sensors with excellent results.
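The abstract does not specify the decomposition, so as a generic stand-in here is a higher-order SVD (Tucker-style) sketch: factor matrices from each mode unfolding plus a core tensor, the usual starting point for jointly factoring shape and pose axes.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move the given axis first, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_multiply(T, M, mode):
    """Mode-n product T x_n M: contract M's columns with axis `mode` of T."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1),
                       0, mode)

def hosvd(T):
    """Higher-order SVD: orthogonal factor per mode, plus the core tensor.

    A generic Tucker-style decomposition, used here as an illustrative
    stand-in -- the paper's actual body model is more specialized.
    """
    U = [np.linalg.svd(unfold(T, m))[0] for m in range(T.ndim)]
    core = T
    for m, Um in enumerate(U):
        core = mode_multiply(core, Um.T, m)
    return core, U
```

Because each factor is orthogonal, multiplying the core back by the factors reconstructs the original tensor exactly.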
3 0.79603195 341 cvpr-2013-Procrustean Normal Distribution for Non-rigid Structure from Motion
Author: Minsik Lee, Jungchan Cho, Chong-Ho Choi, Songhwai Oh
Abstract: Non-rigid structure from motion is a fundamental problem in computer vision, which is yet to be solved satisfactorily. The main difficulty of the problem lies in choosing the right constraints for the solution. In this paper, we propose new constraints that are more effective for non-rigid shape recovery. Unlike the other proposals which have mainly focused on restricting the deformation space using rank constraints, our proposal constrains the motion parameters so that the 3D shapes are most closely aligned to each other, which makes the rank constraints unnecessary. Based on these constraints, we define a new class of probability distributions called the Procrustean normal distribution and propose a new NRSfM algorithm, EM-PND. The experimental results show that the proposed method outperforms the existing methods, and it works well even if there is no temporal dependence between the observed samples.
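The alignment notion underlying the Procrustean normal distribution is orthogonal Procrustes: find the rotation that best aligns two centered shapes. A minimal 3D sketch (the paper embeds this alignment inside an EM framework, which is not reproduced here):

```python
import numpy as np

def procrustes_align(A, B):
    """Rotation R minimizing sum_i ||R a_i - b_i||^2 after centering.

    A, B: (N, 3) corresponding 3D shapes. Translation is removed by
    centering; the optimal rotation comes from the SVD of the cross-
    covariance, with a sign fix to keep det(R) = +1 (a proper rotation).
    """
    Ac = A - A.mean(axis=0)
    Bc = B - B.mean(axis=0)
    U, _, Vt = np.linalg.svd(Bc.T @ Ac)
    d = np.sign(np.linalg.det(U @ Vt))
    return U @ np.diag([1.0, 1.0, d]) @ Vt
```

Given two copies of a shape related by a rigid motion, this recovers the rotation exactly; in the NRSfM setting the same criterion removes the rigid component so only genuine deformation remains.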
4 0.79089922 434 cvpr-2013-Topical Video Object Discovery from Key Frames by Modeling Word Co-occurrence Prior
Author: Gangqiang Zhao, Junsong Yuan, Gang Hua
Abstract: A topical video object refers to an object that is frequently highlighted in a video. It could be, e.g., the product logo and the leading actor/actress in a TV commercial. We propose a topic model that incorporates a word co-occurrence prior for efficient discovery of topical video objects from a set of key frames. Previous work using topic models, such as Latent Dirichlet Allocation (LDA), for video object discovery often takes a bag-of-visual-words representation, which ignores important co-occurrence information among the local features. We show that such data-driven, bottom-up co-occurrence information can conveniently be incorporated in LDA with a Gaussian Markov prior, which combines top-down probabilistic topic modeling with bottom-up priors in a unified model. Our experiments on challenging videos demonstrate that the proposed approach can discover different types of topical objects despite variations in scale, view-point, color and lighting changes, or even partial occlusions. The efficacy of the co-occurrence prior is clearly demonstrated when comparing with topic models without such priors.
5 0.77584445 201 cvpr-2013-Heterogeneous Visual Features Fusion via Sparse Multimodal Machine
Author: Hua Wang, Feiping Nie, Heng Huang, Chris Ding
Abstract: To better understand, search, and classify image and video information, many visual feature descriptors have been proposed to describe elementary visual characteristics, such as shape, color, and texture. How to integrate these heterogeneous visual features and identify the important ones for specific vision tasks has become an increasingly critical problem. In this paper, we propose a novel Sparse Multimodal Learning (SMML) approach to integrate such heterogeneous features by using joint structured sparsity regularizations to learn feature importance for the vision tasks from both group-wise and individual points of view. A new optimization algorithm is also introduced to solve the non-smooth objective with rigorously proved global convergence. We applied our SMML method to five widely used object categorization and scene understanding image data sets for both single-label and multi-label image classification tasks. For each data set we integrate six different types of popularly used image features. Compared to existing scene and object categorization methods using either single modality or multi-modalities of features, our approach always achieves better performance.
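The group-wise structured sparsity referred to here is typically an l2,1-type norm, which sums the l2 norms of coefficient groups and therefore drives whole groups (e.g. one feature modality) to zero jointly. A minimal illustrative helper, with rows taken as the groups (an assumption; the paper combines group-wise and individual terms):

```python
import numpy as np

def l21_norm(W):
    """l2,1 norm of a coefficient matrix W: sum of row-wise l2 norms.

    Penalizing this norm encourages entire rows (feature groups) of W
    to vanish together, unlike the plain l1 norm, which zeroes entries
    independently.
    """
    return float(np.sum(np.linalg.norm(W, axis=1)))
```

For example, a matrix with one zero row contributes nothing for that group, which is exactly the modality-selection behavior the regularizer is chosen for.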
6 0.76931727 332 cvpr-2013-Pixel-Level Hand Detection in Ego-centric Videos
7 0.75525689 311 cvpr-2013-Occlusion Patterns for Object Class Detection
8 0.7531783 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
9 0.75190973 440 cvpr-2013-Tracking People and Their Objects
10 0.75013548 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
11 0.74830574 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
12 0.74827892 443 cvpr-2013-Uncalibrated Photometric Stereo for Unknown Isotropic Reflectances
13 0.74806941 152 cvpr-2013-Exemplar-Based Face Parsing
14 0.74740118 286 cvpr-2013-Mirror Surface Reconstruction from a Single Image
15 0.74667251 413 cvpr-2013-Story-Driven Summarization for Egocentric Video
16 0.74664205 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
17 0.74652612 280 cvpr-2013-Maximum Cohesive Grid of Superpixels for Fast Object Localization
18 0.74612284 19 cvpr-2013-A Minimum Error Vanishing Point Detection Approach for Uncalibrated Monocular Images of Man-Made Environments
19 0.74603611 360 cvpr-2013-Robust Estimation of Nonrigid Transformation for Point Set Registration
20 0.74576336 349 cvpr-2013-Reconstructing Gas Flows Using Light-Path Approximation