cvpr cvpr2013 cvpr2013-46 knowledge-graph by maker-knowledge-mining

46 cvpr-2013-Articulated and Restricted Motion Subspaces and Their Signatures


Source: pdf

Author: Bastien Jacquet, Roland Angst, Marc Pollefeys

Abstract: Articulated objects represent an important class of objects in our everyday environment. Automatic detection of the type of articulated or otherwise restricted motion and extraction of the corresponding motion parameters are therefore of high value, e.g. in order to augment an otherwise static 3D reconstruction with dynamic semantics, such as rotation axes and allowable translation directions for certain rigid parts or objects. Hence, in this paper, a novel theory to analyse relative transformations between two motion-restricted parts will be presented. The analysis is based on linear subspaces spanned by relative transformations. Moreover, a signature for relative transformations will be introduced which uniquely specifies the type of restricted motion encoded in these relative transformations. This theoretic framework enables the derivation of novel algebraic constraints, such as low-rank constraints for subsequent rotations around two fixed axes for example. Lastly, given the type of restricted motion as predicted by the signature, the paper shows how to extract all the motion parameters with matrix manipulations from linear algebra. Our theory is verified on several real data sets, such as a rotating blackboard or a wheel rolling on the floor amongst others.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Automatic detection of the type of articulated or otherwise restricted motion and extraction of the corresponding motion parameters are therefore of high value, e. [sent-2, score-1.008]

2 in order to augment an otherwise static 3D reconstruction with dynamic semantics, such as rotation axes and allowable translation directions for certain rigid parts or objects. [sent-4, score-0.709]

3 Moreover, a signature for relative transformations will be introduced which uniquely specifies the type of restricted motion encoded in these relative transformations. [sent-7, score-1.138]

4 This theoretic framework enables the derivation of novel algebraic constraints, such as low-rank constraints for subsequent rotations around two fixed axes for example. [sent-8, score-0.5]

5 Lastly, given the type of restricted motion as predicted by the signature, the paper shows how to extract all the motion parameters with matrix manipulations from linear algebra. [sent-9, score-0.758]

6 However, the observation of articulated or otherwise restricted motions between objects can provide valuable information about the dynamic relationship between these objects and parts and ultimately also about semantic classes of objects. [sent-16, score-0.812]

7 We therefore focus on such motions, and the primary goal is to automatically detect the type of articulated or restricted motion class between two parts or objects and extract all relevant parameters of these restricted motions. [sent-19, score-1.267]

8 See Fig. 1 for an example application where the motion parameters of a wheel rolling on the street have been extracted automatically. [sent-21, score-0.419]

9 In order to achieve this, we propose to analyse the relative rigid transformations between two parts. [sent-22, score-0.454]

10 The paper will explain that the relative transformations can be arranged in a single motion matrix which encodes all the information for determining which type of restricted motion has been observed and for computing all the relevant parameters of that motion. [sent-25, score-1.046]

11 Contribution: The main contribution of the present paper is the introduction of a so-called signature for relative transformations between two parts. [sent-26, score-0.539]

12 This signature is a function of the motion matrix which is entirely determined by considering observed transformations as data samples from a linear subspace. [sent-27, score-0.817]

13 The signature uniquely describes the type of restricted motion, such as a planar motion where an object translates on a plane and rotates around rotation axes which are orthogonal to this plane. [sent-28, score-1.35]

14 Different signatures basically represent an extensive catalogue of restricted motions in the sense that they enumerate various types of motions together with algebraic constraints which have to be met by a certain type of restriction. [sent-29, score-0.821]

15 Besides motivating and deriving properties of this signature, we will also show how all the relevant parameters of a restricted motion can be extracted by simply solving linear systems of equations. [sent-30, score-0.465]

16 In addition to subsuming well-known cases of articulated motions (such as rotations around a fixed point), our derivations will lead to a unified framework which also covers novel types of restricted motions, e. [sent-31, score-1.008]

17 articulated motions around two non-intersecting fixed rotation axes or around a translating rotation axis with fixed orientation can be treated in the very same way. [sent-33, score-1.471]

18 As we will see in the experiments, these novel types of restricted motions are practically highly relevant. [sent-34, score-0.437]

19 Being a unique feature of our analysis, our method can in particular detect a sequence of two subsequent rotations and untangle these so that the motion of a potential intermediate part can be hypothesized accurately. [sent-38, score-0.366]

20 Related Work We will mainly discuss related work about articulated motions and subspace representations for SfM, since our method relies on those techniques. [sent-40, score-0.666]

21 The analysis of articulated motions has been a topic of active research for many years. [sent-42, score-0.515]

22 [14] measured relative transformations between articulated parts with a magnetic motion capture system. [sent-45, score-0.847]

23 Assuming two parts rotate around a common fixed joint, this joint represents a fixed point under the relative transformations and can be computed with linear methods. [sent-46, score-0.524]

24 Framing the recovery of an articulated motion also as a non-linear and non-convex optimization problem, Ross et al. [sent-53, score-0.542]

25 However, more complex restricted motions such as combinations of rotational and translational joints are not addressed. [sent-57, score-0.546]

26 In those approaches, each articulated part has an associated 4D subspace which is given by the span of the trajectories of tracked feature points [18] on that articulated part. [sent-60, score-0.92]

27 As shown in [19, 21], the underlying reason for intersecting trajectory subspaces is again the fixed point assumption due to a joint at a fixed location relative to the two articulated parts. [sent-61, score-0.62]

28 Note that the very same subspace intersection constraints can also be used for motion segmentation purposes [20]. [sent-62, score-0.373]

29 While this already leads to powerful constraints for simple articulated motions around a single fixed joint, the following sections will present an algebraically motivated formulation for the analysis of relative transformations between parts. [sent-63, score-0.91]

30 The vectorized representation of relative transformations we will introduce is also related to recent work about rigid factorization-based SfM [2, 3]. [sent-65, score-0.424]

31 Specifically, the rotation matrix around an axis a by an angle α can be represented in the following way: Ra,α = cos α I3 + (1 − cos α) aaᵀ + sin α [a]× (1), where [a]× denotes the cross-product matrix, i.e. [sent-93, score-0.449]
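As a quick sanity check of Eq. (1), the following sketch (numpy, with hypothetical helper names; not the authors' code) builds Ra,α from an axis and an angle and verifies that the axis is a fixed point of the rotation:

```python
import numpy as np

def cross_matrix(a):
    """Cross-product matrix [a]_x such that cross_matrix(a) @ v == np.cross(a, v)."""
    return np.array([[0, -a[2], a[1]],
                     [a[2], 0, -a[0]],
                     [-a[1], a[0], 0]])

def rotation(a, alpha):
    """Rodrigues' formula, Eq. (1): R = cos(α) I3 + (1 - cos(α)) a aᵀ + sin(α) [a]x."""
    a = np.asarray(a, dtype=float)
    a = a / np.linalg.norm(a)  # the axis must be a unit vector
    return (np.cos(alpha) * np.eye(3)
            + (1 - np.cos(alpha)) * np.outer(a, a)
            + np.sin(alpha) * cross_matrix(a))

a = np.array([0.0, 0.0, 1.0])
R = rotation(a, 0.7)
assert np.allclose(R @ a, a)            # the axis is a fixed point of R
assert np.allclose(R.T @ R, np.eye(3))  # R is a valid rotation matrix
```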

32 This highly non-linear special Euclidean group can be embedded in a higher-dimensional linear space and we are going to see that this renders the analysis of restricted motions particularly simple. [sent-98, score-0.437]

33 Written as rows (vec(Rf)ᵀ, tfᵀ, 1), F general rigid motions span a 12D affine subspace, embedded in R^F, which is spanned by the columns of the matrix M = [⇓f (vec(Rf)ᵀ, tfᵀ, 1)]. [sent-101, score-0.554]
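As a concrete illustration (a minimal numpy sketch with hypothetical names, assuming the column-major vec convention), the motion matrix can be assembled by stacking one such row per frame:

```python
import numpy as np

def motion_matrix(Rs, ts):
    """Stack F relative transformations (R_f, t_f) into the F x 13 matrix
    M = [ vec(R_f)^T  t_f^T  1 ], one row per frame (column-major vec)."""
    rows = [np.concatenate([R.flatten(order='F'), t, [1.0]])
            for R, t in zip(Rs, ts)]
    return np.vstack(rows)
```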

34 the recovery of these rigid transformations based on factorization of feature trajectory data. [sent-107, score-0.414]

35 In contrast, the present work analyses restricted motions and assumes that the relative transformations Tf at each frame f ∈ [F] between two parts are given as input. [sent-108, score-0.554]

36 The goal is then to extract all the aspects of the restricted motion by analyzing the subspace structure of the motion matrix. [sent-109, score-0.416]

37 These aspects include the determination of the type of articulated or restricted motion and all its parameters, e. [sent-111, score-0.786]

38 the orientation and location of rotation axes amongst others. [sent-113, score-0.4]

39 Motion Signatures The important observation is that restricted motions generally do not entirely span the aforementioned 12D space. [sent-117, score-0.662]

40 Indeed, in the following we are going to show that each type of restricted motion will yield a specific low-dimensional subspace structure which makes it possible to distinguish between different motions by just considering the matrix M. [sent-118, score-0.909]

41 More specifically, we propose a tuple of integers called signature in order to capture the low-dimensional subspace structure of a restricted motion. [sent-119, score-0.645]

42 This signature is defined as a function of the motion matrix M in the following way: sig(M) = (rank(M:,1:9), rank(M) − rank(M:,1:9)). [sent-120, score-0.61]
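In code, the signature reduces to two numerical rank computations; the tolerance below is an assumption that would have to be tuned to the noise level of the estimated transformations:

```python
import numpy as np

def numerical_rank(A, tol=1e-6):
    """Rank from singular values, measured relative to the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

def signature(M, tol=1e-6):
    """sig(M) = (rank(M[:, :9]), rank(M) - rank(M[:, :9]))."""
    r = numerical_rank(M[:, :9], tol)
    return r, numerical_rank(M, tol) - r
```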

43 The 1st entry of the signature sig(M) = (r, d) entirely determines the number of fixed rotation axes involved in the restricted motion, whereas the second entry specifies the dimensionality d of the subspace in which the object translates over time. [sent-125, score-0.789]

44 Tab. 1 summarizes the results of the analysis based on our unified framework for articulated and restricted motions, whereas Tab. [sent-129, score-0.73]

45 Moreover, once the signature is computed and the type of restricted motion is thereby determined, all aspects of this motion can be directly extracted by carefully analyzing the nullspace structure of M (see Sec. [sent-131, score-1.15]

46 The translation subspace is spanned by the columns of the translational part M:,10:12, whereas the rotation subspace is spanned by the rotational part of the motion matrix M:,1:9. [sent-134, score-1.007]

47 Note that the second entry of the signature encodes the dimensionality d of the part of the translation subspace which is not yet contained in the rotation subspace. [sent-135, score-0.872]

48 It is not obvious why the translation subspace is entirely contained in the rotation subspace in the absence of any dynamic translation, i.e. when d = 0. [sent-136, score-0.563]

49 Non-Translating Joints: In the following, the signature for rotations around one, two, or three fixed axes will be explained under the assumption that the joint is not translating, i.e. d = 0. [sent-150, score-0.738]

50 Specifically, we will derive the values of the two entries of the signature for each type of restricted motion. [sent-153, score-0.522]

51 Firstly, the dimensionality of the rotation subspace span(M:,1:9) needs to be derived. [sent-155, score-0.452]

52 [Tab. 1 residue: columns "Type of Articulation / Restricted Motion", "Signature", and "Formula for relative transformation Tf" with T ∈ R3×d; a surviving fragment states that the case discussed here corresponds to signature (2,2).] [sent-156, score-0.37]

53 with a rotation axis orthogonal to the translation direction, i. [sent-160, score-0.478]

54 Secondly, we need to show that the translation subspace span(M:,10:12) is entirely contained inside the rotation subspace, since a signature of the form (r, 0) encodes exactly this property. [sent-163, score-0.422]

55 What remains to be shown is that the translation subspace is entirely contained in the rotation subspace. [sent-172, score-0.525]

56 It is well known that the location of a non-translating joint is unaffected by a rotation around this joint, and hence t represents a fixed point. [sent-173, score-0.452]

57 Rotations Around Two Axes: The sequential application of rotations around two fixed axes, a at location ta and b at location tb, which are not necessarily located at the same point in space, looks like [Rf tf; 0ᵀ 1] = [Rb,βf tb − Rb,βf tb; 0ᵀ 1] [Ra,αf ta − Ra,αf ta; 0ᵀ 1]. [sent-180, score-0.726]

58 The translations are equal to tf = tb + Rb,βf (−tb + ta − Ra,αf ta), and therefore M:,10:12 = [⇓f −vec(Rb,βf − I3)ᵀ [tb ⊗ I3] − vec(Rb,βf (Ra,αf − I3))ᵀ [ta ⊗ I3]]. [sent-184, score-0.363]
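The closed form for tf can be checked numerically; the sketch below (self-contained, using explicit rotations about coordinate axes as stand-ins for general axes a and b) composes the two transformations and compares the resulting translation against the formula above:

```python
import numpy as np

def homog(R, t):
    """4x4 homogeneous transformation from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def rot_z(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])

def rot_y(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, 0, s], [0, 1.0, 0], [-s, 0, c]])

ta, tb = np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 2.0])
Ra, Rb = rot_z(0.3), rot_y(0.9)   # rotations around axes a = e_z, b = e_y

# Rotation around a at location ta, followed by rotation around b at tb.
Tf = homog(Rb, tb - Rb @ tb) @ homog(Ra, ta - Ra @ ta)
tf = tb + Rb @ (-tb + ta - Ra @ ta)   # closed form from the text above
assert np.allclose(Tf[:3, 3], tf)
```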

59 any rotation can be handled using the vectorization identity vec(AYB) = [Bᵀ ⊗ A] vec(Y). [sent-199, score-0.39]
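This identity (for the column-major vectorization) is easy to verify numerically:

```python
import numpy as np

A, Y, B = np.random.randn(3, 3), np.random.randn(3, 3), np.random.randn(3, 3)
lhs = (A @ Y @ B).flatten(order='F')          # vec(AYB), column-major
rhs = np.kron(B.T, A) @ Y.flatten(order='F')  # [B^T (x) A] vec(Y)
assert np.allclose(lhs, rhs)
```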

60 The camera observing the dynamic object can itself act as one of the two moving parts defining the relative motion between the two parts. [sent-207, score-0.418]

61 Translating Joints: Modeling also a time-varying translational component t̂f in the restricted motion between two parts leads to slightly more complex formulas. [sent-215, score-0.55]

62 For example, a rotation around one axis which at the same time translates is modeled as [Rf tf; 0ᵀ 1] = [I3 t̂f; 0ᵀ 1] [Ra,αf t − Ra,αf t; 0ᵀ 1]. [sent-216, score-0.442]

63 Hence, the second entry of the signature equals d because the dynamic translations are restricted to a d-dimensional subspace. [sent-249, score-0.704]

64 Extraction of Parameters: Having shown that the signature provides a unique pattern for varying types of restricted motion, an appropriate method to extract the motion parameters can be chosen according to the signature encoded in the motion matrix M. [sent-259, score-1.204]

65 Rotation Axes and Angles If the signature tells us that the motion is around one fixed axis (i. [sent-263, score-0.759]

66 the first entry of the signature equals 2), the linear system M:,1:9 (a ⊗ I3) = 0F×3 needs to be solved for the axis a. [sent-265, score-0.475]
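Since M:,1:9 (a ⊗ I3) = 0 encodes that a is a common fixed direction of all relative rotations, an equivalent way to solve this step (a hedged sketch, not necessarily the paper's exact linear system) is to stack the constraints (Rf − I3) a = 0 and take the smallest right singular vector:

```python
import numpy as np

def fixed_axis(Rs):
    """Least-squares estimate of the axis a with (R_f - I3) a = 0 for all f."""
    A = np.vstack([R - np.eye(3) for R in Rs])  # stacked (3F) x 3 system
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]  # right singular vector of the smallest singular value
```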

67 The approach for rotations around two fixed axes is based on the fact that bᵀRb,βf Ra,αf a = bᵀa, which is constant throughout time. [sent-267, score-0.383]

68 the first entry equals 8), the one-dimensional nullspace n = a ⊗ b ∈ R9 of M:,1:9 is computed first; this nullspace is then reshaped into a 3-by-3 matrix N, and lastly a rank-1 decomposition of this reshaped matrix reveals the two axes: N = baᵀ. [sent-271, score-0.709]
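A sketch of this recipe, assuming the column-major vec convention used when building M (the reshape order must match that convention):

```python
import numpy as np

def two_axes(M19):
    """Recover the two fixed axes from the 1D nullspace of M[:, :9]:
    reshape the nullspace vector n into 3x3 and factor N = b a^T (rank 1)."""
    _, _, Vt = np.linalg.svd(M19)
    N = Vt[-1].reshape(3, 3, order='F')  # nullspace vector n, reshaped
    U, s, Wt = np.linalg.svd(N)          # rank-1 factor: N ~ s[0] * u1 w1^T
    b, a = U[:, 0], Wt[0]
    return a / np.linalg.norm(a), b / np.linalg.norm(b)
```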

69 Then, using MN = 0, the columns of N restricted to the last three entries encode a basis for the orthogonal complement [T]⊥. [sent-289, score-0.398]

70 For example, the translations observed for rotations around two axes equal tf = tb + Rb,βf (−tb + ta − Ra,αf ta) + T t̃f, which are linear in the unknowns ta, tb, and t̃f. [sent-306, score-0.558]

71 Actually, since ta and tb are only defined up to translations along a and b, respectively, the locations of the axes can be parametrized by ta = [a]⊥ … [sent-308, score-0.591]

72 Specifically, the location ta of an axis a is not uniquely defined if aᵀT = 0 and d = 2, since in that case shifting the axis inside the plane ta + T can be compensated by the time-varying t̃f. [sent-313, score-0.389]

73 Since our method is entirely based on relative transformations between two parts, let us quickly explain how we extracted these transformations from pure image data. [sent-318, score-0.648]

74 Feature points are extracted and matched across different frames, and a robust RANSAC [9] stage extracts rigid transformations from two- and three-view relations. [sent-323, score-0.377]

75 In case of a static camera observing an object which undergoes a restricted motion (experiments in Sec. [sent-324, score-0.48]

76 Sequential 3-point RANSAC is then used to extract all rigid transformations. Figure 2: Our method accurately recovers the rotation axis a (in red) of a laptop opening and closing its screen from pure imagery data taken by a moving camera. [sent-333, score-0.615]

77 This articulated motion has signature (2, 0) and is thus equivalent to the motion of a door, for example. [sent-334, score-0.988]

78 Lastly, the relative motion between two parts can be recovered from the two corresponding groups by expressing their motions w. [sent-338, score-0.619]

79 This basically factors out the motion of the camera and only the motions of the parts remain. [sent-342, score-0.488]

80 Since this motion corresponds to a rotation around one single axis with fixed location, the signature equals (2, 0). [sent-347, score-0.905]

81 Despite a moving camera, we are able to accurately compute the orientation a and location t of the rotation axis based on our method (see Fig. [sent-349, score-0.423]

82 The extracted articulated motion parameters can be used to generate novel, unobserved configurations, as demonstrated in the supplemental material [11]. [sent-351, score-0.575]

83 These features undergo a complex motion since the blackboard stands on wheels and can therefore be rotated and translated according to a planar motion. [sent-356, score-0.425]

84 On top of that, the black writing area can be rotated around a horizontal axis, leading to a restricted motion with signature (8, 2) which previous approaches could not handle. [sent-357, score-0.93]

85 Our method can successfully extract the two rotation axes a and b, and the two-dimensional span of the translation directions T ∈ R3×2. [sent-358, score-0.585]

86 Due to one part of the motion being a planar motion, the location tb of the axis b is not defined. [sent-359, score-0.635]

87 The two extracted rotation axes are shown in red and the translation directions T (parallel to the floor) in green. [sent-364, score-0.476]

88 Note that the location ta of the axis a is well-defined (and recovered), whereas the location tb of axis b is not defined due to the planar motion. [sent-365, score-0.709]

89 Having untangled the two rotations, we can compute the intermediate motion of a putative part as would be observed without the rotation Ra,αf, i. [sent-383, score-0.386]

90 This recovered intermediate motion together with silhouette images obtained with background subtraction permits for example the computation of a visual hull of the blackboard stand, as shown in Fig. [sent-386, score-0.41]

91 Rotation Around a Translating Axis This experiment is based on the motion of the front wheel of a car rolling on a straight line on the street. [sent-393, score-0.391]

92 While this motion has similarity to a hinge joint, the one-dimensional dynamic translation leads to a signature (2, 1) which makes this particular restricted motion a hard instance. [sent-395, score-1.037]

93 Having recovered the motion parameters, including the time-varying rotation angles αf and translations T t̃f, we can check for a linear relation t̃f = αf R between αf and t̃f to recover the radius R of the wheel and its contact line with the street. [sent-399, score-0.476]
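Fitting the radius is then a one-parameter least-squares problem; the arrays below are hypothetical placeholders for the recovered per-frame angles and translation magnitudes:

```python
import numpy as np

alpha = np.array([0.00, 0.12, 0.25, 0.37])    # recovered rotation angles (rad)
t_tilde = np.array([0.00, 0.04, 0.08, 0.12])  # recovered translations along T

# Least-squares fit of t_tilde = alpha * R for the wheel radius R.
R_wheel = np.dot(alpha, t_tilde) / np.dot(alpha, alpha)
print(f"estimated wheel radius: {R_wheel:.3f}")
```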

94 The vectorized relative transformations gave rise to a motion matrix. [sent-408, score-0.51]

95 A signature can be computed from the restricted motion subspace induced by this motion matrix which exactly specifies the type of restricted motion. [sent-409, score-1.375]

96 Together with a careful analysis of the nullspace-structure of the motion matrix, this leads to a general framework for articulated and restricted motions between two parts. [sent-410, score-0.952]

97 The framework has been successfully applied to several challenging data sets showcasing how existing and novel restricted motion types can be handled in the same way. [sent-411, score-0.437]

98 Furthermore, we are investigating robust model selection / rank detection for the singular values of the motion matrix, since SfM can return erroneous relative transformations, especially for nearly degenerate or ill-conditioned motion sequences. [sent-415, score-0.562]

99 Articulated and restricted motion subspaces and their signatures : Supplemental material. [sent-485, score-0.571]

100 A factorization-based approach for articulated nonrigid shape, motion and kinematic chain recovery from video. [sent-563, score-0.542]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('articulated', 0.293), ('signature', 0.251), ('motion', 0.222), ('motions', 0.222), ('restricted', 0.215), ('transformations', 0.213), ('axes', 0.195), ('axis', 0.179), ('tf', 0.165), ('rotation', 0.164), ('nullspace', 0.156), ('subspace', 0.151), ('rotations', 0.144), ('span', 0.137), ('rigid', 0.136), ('blackboard', 0.132), ('translations', 0.122), ('tb', 0.122), ('sfm', 0.116), ('wheel', 0.114), ('vec', 0.113), ('rf', 0.111), ('rb', 0.101), ('ra', 0.093), ('translation', 0.089), ('entirely', 0.088), ('subspaces', 0.082), ('ta', 0.076), ('relative', 0.075), ('planar', 0.071), ('ft', 0.071), ('translational', 0.069), ('angst', 0.068), ('brien', 0.066), ('around', 0.063), ('spanned', 0.059), ('fta', 0.058), ('recovered', 0.056), ('type', 0.056), ('rolling', 0.055), ('algebraic', 0.054), ('stand', 0.052), ('signatures', 0.052), ('cos', 0.051), ('sin', 0.049), ('orthogonal', 0.046), ('tracked', 0.046), ('equals', 0.045), ('pink', 0.045), ('btrb', 0.044), ('esidg', 0.044), ('smpa', 0.044), ('fixed', 0.044), ('parts', 0.044), ('static', 0.043), ('matrix', 0.043), ('joint', 0.041), ('location', 0.041), ('rotational', 0.04), ('translating', 0.04), ('bta', 0.039), ('moving', 0.039), ('factorization', 0.038), ('dynamic', 0.038), ('lastly', 0.038), ('tr', 0.037), ('translates', 0.036), ('ayb', 0.036), ('anrd', 0.036), ('dtim', 0.036), ('sig', 0.036), ('laptop', 0.036), ('angles', 0.034), ('fayad', 0.034), ('paladini', 0.034), ('contained', 0.033), ('entry', 0.033), ('cta', 0.032), ('supplemental', 0.032), ('es', 0.031), ('io', 0.031), ('pure', 0.031), ('unaffected', 0.031), ('fra', 0.031), ('ration', 0.031), ('skeletons', 0.031), ('uniquely', 0.031), ('opening', 0.03), ('analyse', 0.03), ('rank', 0.029), ('extracted', 0.028), ('integers', 0.028), ('tthe', 0.028), ('tio', 0.028), ('clouds', 0.027), ('derivations', 0.027), ('hence', 0.027), ('recovery', 0.027), ('actually', 0.027), ('tt', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999893 46 cvpr-2013-Articulated and Restricted Motion Subspaces and Their Signatures

Author: Bastien Jacquet, Roland Angst, Marc Pollefeys

Abstract: Articulated objects represent an important class of objects in our everyday environment. Automatic detection of the type of articulated or otherwise restricted motion and extraction of the corresponding motion parameters are therefore of high value, e.g. in order to augment an otherwise static 3D reconstruction with dynamic semantics, such as rotation axes and allowable translation directions for certain rigid parts or objects. Hence, in this paper, a novel theory to analyse relative transformations between two motion-restricted parts will be presented. The analysis is based on linear subspaces spanned by relative transformations. Moreover, a signature for relative transformations will be introduced which uniquely specifies the type of restricted motion encoded in these relative transformations. This theoretic framework enables the derivation of novel algebraic constraints, such as low-rank constraints for subsequent rotations around two fixed axes for example. Lastly, given the type of restricted motion as predicted by the signature, the paper shows how to extract all the motion parameters with matrix manipulations from linear algebra. Our theory is verified on several real data sets, such as a rotating blackboard or a wheel rolling on the floor amongst others.

2 0.21398036 244 cvpr-2013-Large Displacement Optical Flow from Nearest Neighbor Fields

Author: Zhuoyuan Chen, Hailin Jin, Zhe Lin, Scott Cohen, Ying Wu

Abstract: We present an optical flow algorithm for large displacement motions. Most existing optical flow methods use the standard coarse-to-fine framework to deal with large displacement motions which has intrinsic limitations. Instead, we formulate the motion estimation problem as a motion segmentation problem. We use approximate nearest neighbor fields to compute an initial motion field and use a robust algorithm to compute a set of similarity transformations as the motion candidates for segmentation. To account for deviations from similarity transformations, we add local deformations in the segmentation process. We also observe that small objects can be better recovered using translations as the motion candidates. We fuse the motion results obtained under similarity transformations and under translations together before a final refinement. Experimental validation shows that our method can successfully handle large displacement motions. Although we particularly focus on large displacement motions in this work, we make no sacrifice in terms of overall performance. In particular, our method ranks at the top of the Middlebury benchmark.

3 0.17352794 170 cvpr-2013-Fast Rigid Motion Segmentation via Incrementally-Complex Local Models

Author: Fernando Flores-Mangas, Allan D. Jepson

Abstract: The problem of rigid motion segmentation of trajectory data under orthography has been long solved for nondegenerate motions in the absence of noise. But because real trajectory data often incorporates noise, outliers, motion degeneracies and motion dependencies, recently proposed motion segmentation methods resort to non-trivial representations to achieve state of the art segmentation accuracies, at the expense of a large computational cost. This paper proposes a method that dramatically reduces this cost (by two or three orders of magnitude) with minimal accuracy loss (from 98.8% achieved by the state of the art, to 96.2% achieved by our method on the standard Hopkins 155 dataset). Computational efficiency comes from the use of a simple but powerful representation of motion that explicitly incorporates mechanisms to deal with noise, outliers and motion degeneracies. Subsets of motion models with the best balance between prediction accuracy and model complexity are chosen from a pool of candidates, which are then used for segmentation.

4 0.16720313 113 cvpr-2013-Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video

Author: Ravi Garg, Anastasios Roussos, Lourdes Agapito

Abstract: This paper offers the first variational approach to the problem of dense 3D reconstruction of non-rigid surfaces from a monocular video sequence. We formulate nonrigid structure from motion (NRSfM) as a global variational energy minimization problem to estimate dense low-rank smooth 3D shapes for every frame along with the camera motion matrices, given dense 2D correspondences. Unlike traditional factorization based approaches to NRSfM, which model the low-rank non-rigid shape using a fixed number of basis shapes and corresponding coefficients, we minimize the rank of the matrix of time-varying shapes directly via trace norm minimization. In conjunction with this low-rank constraint, we use an edge preserving total-variation regularization term to obtain spatially smooth shapes for every frame. Thanks to proximal splitting techniques the optimization problem can be decomposed into many point-wise sub-problems and simple linear systems which can be easily solved on GPU hardware. We show results on real sequences of different objects (face, torso, beating heart) where, despite challenges in tracking, illumination changes and occlusions, our method reconstructs highly deforming smooth surfaces densely and accurately directly from video, without the need for any prior models or shape templates.
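The trace-norm (nuclear-norm) minimization described in this abstract is typically handled, inside a proximal splitting scheme, by soft-thresholding the singular values of the shape matrix. Below is a minimal NumPy sketch of that proximal step only; it is not the authors' implementation, and the threshold tau and the toy matrices are illustrative.

```python
import numpy as np

def svt(X, tau):
    """Proximal operator of tau * ||X||_* : soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s = np.maximum(s - tau, 0.0)              # shrink; small singular values vanish
    return (U * s) @ Vt

# toy usage: denoise a noisy rank-5 matrix of stacked time-varying shapes
S = np.random.randn(30, 5) @ np.random.randn(5, 100)
S_hat = svt(S + 0.1 * np.random.randn(30, 100), tau=2.0)
print(np.linalg.matrix_rank(S_hat))           # ~5: noise-only directions are shrunk away
```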

5 0.1561358 334 cvpr-2013-Pose from Flow and Flow from Pose

Author: Katerina Fragkiadaki, Han Hu, Jianbo Shi

Abstract: Human pose detectors, although successful in localising faces and torsos of people, often fail with lower arms. Motion estimation is often inaccurate under fast movements of body parts. We build a segmentation-detection algorithm that mediates the information between body-part recognition and multi-frame motion grouping to improve both pose detection and tracking. Motion of body parts, though not accurate, is often sufficient to segment them from their backgrounds. Such segmentations are crucial for extracting hard-to-detect body parts out of their interior body clutter. By matching these segments to exemplars we obtain pose-labeled body segments. The pose-labeled segments and corresponding articulated joints are used to improve the motion flow fields by proposing kinematically constrained affine displacements on body parts. The pose-based articulated motion model is shown to handle large limb rotations and displacements. Our algorithm can detect people under rare poses, frequently missed by pose detectors, showing the benefits of jointly reasoning about pose, segmentation and motion in videos.

6 0.15220334 341 cvpr-2013-Procrustean Normal Distribution for Non-rigid Structure from Motion

7 0.14933443 124 cvpr-2013-Determining Motion Directly from Normal Flows Upon the Use of a Spherical Eye Platform

8 0.14178434 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction

9 0.12852713 306 cvpr-2013-Non-rigid Structure from Motion with Diffusion Maps Prior

10 0.1217423 290 cvpr-2013-Motion Estimation for Self-Driving Cars with a Generalized Camera

11 0.11517198 135 cvpr-2013-Discriminative Subspace Clustering

12 0.11514013 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation

13 0.11296041 109 cvpr-2013-Dense Non-rigid Point-Matching Using Random Projections

14 0.10715814 59 cvpr-2013-Better Exploiting Motion for Better Action Recognition

15 0.10692875 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues

16 0.10370193 215 cvpr-2013-Improved Image Set Classification via Joint Sparse Approximated Nearest Subspaces

17 0.10309478 424 cvpr-2013-Templateless Quasi-rigid Shape Modeling with Implicit Loop-Closure

18 0.10096467 162 cvpr-2013-FasT-Match: Fast Affine Template Matching

19 0.098549232 433 cvpr-2013-Top-Down Segmentation of Non-rigid Visual Objects Using Derivative-Based Search on Sparse Manifolds

20 0.097867072 465 cvpr-2013-What Object Motion Reveals about Shape with Unknown BRDF and Lighting


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.202), (1, 0.112), (2, -0.024), (3, -0.019), (4, -0.041), (5, -0.027), (6, 0.028), (7, -0.115), (8, -0.011), (9, -0.008), (10, 0.041), (11, 0.181), (12, -0.05), (13, -0.117), (14, 0.107), (15, 0.021), (16, 0.007), (17, -0.001), (18, -0.172), (19, 0.0), (20, 0.067), (21, -0.094), (22, 0.037), (23, -0.054), (24, 0.085), (25, 0.01), (26, -0.026), (27, -0.046), (28, 0.041), (29, 0.036), (30, -0.08), (31, 0.084), (32, -0.015), (33, -0.046), (34, -0.005), (35, -0.014), (36, -0.105), (37, 0.03), (38, -0.066), (39, 0.009), (40, 0.026), (41, 0.108), (42, -0.035), (43, 0.042), (44, 0.043), (45, -0.007), (46, 0.016), (47, 0.028), (48, -0.157), (49, -0.01)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97485912 46 cvpr-2013-Articulated and Restricted Motion Subspaces and Their Signatures

Author: Bastien Jacquet, Roland Angst, Marc Pollefeys

Abstract: Articulated objects represent an important class ofobjects in our everyday environment. Automatic detection of the type of articulated or otherwise restricted motion and extraction of the corresponding motion parameters are therefore of high value, e.g. in order to augment an otherwise static 3D reconstruction with dynamic semantics, such as rotation axes and allowable translation directions for certain rigid parts or objects. Hence, in this paper, a novel theory to analyse relative transformations between two motion-restricted parts will be presented. The analysis is based on linear subspaces spanned by relative transformations. Moreover, a signature for relative transformations will be introduced which uniquely specifies the type of restricted motion encoded in these relative transformations. This theoretic framework enables the derivation of novel algebraic constraints, such as low-rank constraints for subsequent rotations around two fixed axes for example. Lastly, given the type of restricted motion as predicted by the signature, the paper shows how to extract all the motion parameters with matrix manipulations from linear algebra. Our theory is verified on several real data sets, such as a rotating blackboard or a wheel rolling on the floor amongst others.

2 0.69208151 113 cvpr-2013-Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video

Author: Ravi Garg, Anastasios Roussos, Lourdes Agapito

Abstract: This paper offers the first variational approach to the problem of dense 3D reconstruction of non-rigid surfaces from a monocular video sequence. We formulate nonrigid structure from motion (NRSfM) as a global variational energy minimization problem to estimate dense low-rank smooth 3D shapes for every frame along with the camera motion matrices, given dense 2D correspondences. Unlike traditional factorization based approaches to NRSfM, which model the low-rank non-rigid shape using a fixed number of basis shapes and corresponding coefficients, we minimize the rank of the matrix of time-varying shapes directly via trace norm minimization. In conjunction with this low-rank constraint, we use an edge preserving total-variation regularization term to obtain spatially smooth shapes for every frame. Thanks to proximal splitting techniques the optimization problem can be decomposed into many point-wise sub-problems and simple linear systems which can be easily solved on GPU hardware. We show results on real sequences of different objects (face, torso, beating heart) where, despite challenges in tracking, illumination changes and occlusions, our method reconstructs highly deforming smooth surfaces densely and accurately directly from video, without the need for any prior models or shape templates.

3 0.68882346 170 cvpr-2013-Fast Rigid Motion Segmentation via Incrementally-Complex Local Models

Author: Fernando Flores-Mangas, Allan D. Jepson

Abstract: The problem of rigid motion segmentation of trajectory data under orthography has been long solved for nondegenerate motions in the absence of noise. But because real trajectory data often incorporates noise, outliers, motion degeneracies and motion dependencies, recently proposed motion segmentation methods resort to non-trivial representations to achieve state of the art segmentation accuracies, at the expense of a large computational cost. This paper proposes a method that dramatically reduces this cost (by two or three orders of magnitude) with minimal accuracy loss (from 98.8% achieved by the state of the art, to 96.2% achieved by our method on the standard Hopkins 155 dataset). Computational efficiency comes from the use of a simple but powerful representation of motion that explicitly incorporates mechanisms to deal with noise, outliers and motion degeneracies. Subsets of motion models with the best balance between prediction accuracy and model complexity are chosen from a pool of candidates, which are then used for segmentation.

1. Rigid Motion Segmentation

Rigid motion segmentation (MS) consists of separating regions, features, or trajectories from a video sequence into spatio-temporally coherent subsets that correspond to independent, rigidly-moving objects in the scene (Figure 1.b or 1.f). The problem currently receives renewed attention, partly because of the extensive amount of video sources and applications that benefit from MS to perform higher level computer vision tasks, but also because the state of the art is reaching functional maturity. Motion segmentation methods are widely diverse, but most capture only a small subset of constraints or algebraic properties from those that govern the image formation process of moving objects and their corresponding trajectories, such as the rank limit theorem [9, 10], the linear independence constraint (between trajectories from independent motions) [2, 13], the epipolar constraint [7], and the reduced rank property [11, 15, 13].

Figure 1: Model instantiation and segmentation. a) f-th original frame, Italian Grand Prix (c) 2012 Formula 1. b) Class-labeled trajectories (red, green, blue and black correspond to chassis, helmet, background and outlier classes respectively). c) Spatially-local support subset for a candidate motion in blue. d) Candidate motion model inliers in red, control points (from Eq. 3) in white. e) Model residuals (from Eq. 11) color-coded with label data; the radial coordinate is logarithmic. f) Segmentation result.

Model-selection based methods [11, 6, 8] balance model complexity with modeling accuracy and have been successful at incorporating more of these aspects into a single formulation. For instance, in [8] most model parameters are estimated automatically from the data, including the number of independent motions and their complexity, as well as the segmentation labels (including outliers). However, because of the large number of necessary motion hypotheses that need to be instantiated, as well as the varying and potentially very large number of model parameters that must be estimated, the flexibility offered by this method comes at a large computational cost. Current state of the art methods follow the trend of using sparse low-dimensional subspaces to represent trajectory data.
This representation is then fed into a clustering algorithm to obtain a segmentation result. A prime example of this type of method is Sparse Subspace Clustering (SSC) [3], in which each trajectory is represented as a sparse linear combination of a few other basis trajectories. The assumption is that the basis trajectories must belong to the same rigid motion as the reconstructed trajectory (or else, the reconstruction would be impossible). When the assumption is true, the sparse mixing coefficients can be interpreted as the connectivity weights of a graph (or a similarity matrix), which is then (spectrally) clustered to obtain a segmentation result. At the time of publication, SSC produced segmentation results three times more accurate than the best predecessor. The practical downside, however, is the inherently large computational cost of finding the optimal sparse representation, which is at least cubic on the number of trajectories.

The work of [14] also falls within the class of subspace separation algorithms. Their approach is based on clustering the principal angles (CPA) of the local subspaces associated to each trajectory and its nearest neighbors. The clustering re-weights a traditional metric of subspace affinity between principal angles. Re-weighted affinities are then used for segmentation. The approach produces segmentation results with accuracies similar to those of SSC, but the computational cost is close to 10 times bigger than SSC's.

In this work we argue that competitive segmentation results are possible using a simple but powerful representation of motion that explicitly incorporates mechanisms to deal with noise, outliers and motion degeneracies. The proposed method is approximately 2 or 3 orders of magnitude faster than [3] and [14] respectively, currently considered the state of the art.

1.1. Affine Motion

Projective geometry is often used to model the image motion of trajectories from rigid objects between pairs of frames. However, alternative geometric relationships that facilitate parameter computation have also been proven useful for this purpose. For instance, in perspective projection, general image motion from rigid objects can be modeled via the composition of two elements: a 2D homography, and parallax residual displacements [5]. The homography describes the motion of an arbitrary plane, and the parallax residuals account for relative depths that are unaccounted for by the planar surface model.

Under orthography, in contrast, image motion of rigid objects can be modeled via the composition of a 2D affine transformation plus epipolar residual displacements. The 2D affine transformation models the motion of an arbitrary plane, and the epipolar residuals account for relative depths. Crucially, these two components can be computed separately and incrementally, which enables an explicit mechanism to deal with motion degeneracy. In the context of 3D motion, a motion is degenerate when the trajectories originate from a planar (or linear) object, or when neither the camera nor the imaged object exercise all of their degrees of freedom, such as when the object only translates, or when the camera only rotates. These are common situations in real world video sequences. The incremental nature of the decompositions described above facilitates the transition between degenerate motions and nondegenerate ones.

Planar Model. Under orthography, the projection of trajectories from a planar surface can be modeled with the affine transformation:

  \begin{bmatrix} x_c \\ y_c \\ 1 \end{bmatrix} = \begin{bmatrix} D & t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ 1 \end{bmatrix} = A_{2D}^{w \to c} \begin{bmatrix} x_w \\ y_w \\ 1 \end{bmatrix}   (1)

where D ∈ R^{2×2} is an invertible matrix and t ∈ R^2 is a translation vector. Trajectory coordinates (x_w, y_w) are in the plane's reference frame (modulo a 2D affine transformation) and (x_c, y_c) are image coordinates.

Now, let W ∈ R^{2F×P} be a matrix of trajectory data that contains the x and y image coordinates of P feature points tracked through F frames, as in

  W = \begin{bmatrix} x_1^1 & \cdots & x_P^1 \\ y_1^1 & \cdots & y_P^1 \\ \vdots & & \vdots \\ x_1^F & \cdots & x_P^F \\ y_1^F & \cdots & y_P^F \end{bmatrix}   (2)

To compute the parameters of A_{2D} from trajectory data, let C = [c_1, c_2, c_3] ∈ R^{2F×3} be three columns (three full trajectories) from W, and let c_i^f = [c_i^{2f-1}, c_i^{2f}]^T be the x and y coordinates of the i-th control trajectory at frame f. Then the transformation between points from an arbitrary source frame s to a target frame f can be written as:

  \begin{bmatrix} c_1^f & c_2^f & c_3^f \\ 1 & 1 & 1 \end{bmatrix} = A_{2D}^{s \to f} \begin{bmatrix} c_1^s & c_2^s & c_3^s \\ 1 & 1 & 1 \end{bmatrix}   (3)

and A_{2D}^{s \to f} can be simply computed as:

  A_{2D}^{s \to f} = \begin{bmatrix} c_1^f & c_2^f & c_3^f \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} c_1^s & c_2^s & c_3^s \\ 1 & 1 & 1 \end{bmatrix}^{-1}   (4)

The inverse on the right-hand side of Eq. 4 exists so long as the points c_i^s are not collinear. For simplicity we refer to A_{2D}^{s \to f} as A_{2D}^f, and consequently A_{2D}^s is the identity matrix.
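To make Eqs. 3-4 concrete: the per-frame affine transform follows from a single 3x3 inverse of the homogeneous control points. A minimal NumPy sketch under the stated non-collinearity assumption; the function name is ours, not the paper's.

```python
import numpy as np

def affine_from_controls(C_s, C_f):
    """2D affine transform A (3x3) mapping three control points at frame s to
    frame f, per Eq. 4: A = [c^f | 1][c^s | 1]^{-1} in homogeneous coordinates.
    C_s, C_f: arrays of shape (2, 3) holding (x, y) of the 3 control points."""
    H_s = np.vstack([C_s, np.ones((1, 3))])   # columns are homogeneous points
    H_f = np.vstack([C_f, np.ones((1, 3))])
    return H_f @ np.linalg.inv(H_s)           # fails if the points are collinear

# toy usage with a known affine map
A_true = np.array([[1.1, 0.1, 2.0],
                   [-0.2, 0.9, -1.0],
                   [0.0, 0.0, 1.0]])
C_s = np.array([[0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
C_f = (A_true @ np.vstack([C_s, np.ones((1, 3))]))[:2]
assert np.allclose(affine_from_controls(C_s, C_f), A_true)
```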
The objective function is the negative loglikelihood over prediction accuracy, regularized by model complexity (number of model parameters) and modeling overlap (trajectories explained by multiple models). Notice that after this stage, the segmentation that results from the optimal model combination could be reported as a segmentation result (§5). ioTnhe r tshuilrtd ( stage incorporates the results from a set of model combinations that are closest to the optimal. Segmentation results are aggregated into an affinity matrix, which is then passed to a spectral clustering algorithm to produce the final segmentation result. This refinement stage generally results in improved accuracy and reduced segmentation variability (§6). 3. Motion Model Instantiation Each model M ∈ M is instantiated independently using RacAhN mSAodCel. MThis ∈ c Mhoi cies niss manotitiavteatded in bdeecpaeunsdee otlfy th usismethod’s well-known computational efficiency and robustness to outliers, but also because of its ability to incorporate spatially local constraints and (as explained below) because most of the computations necessary to evaluate a planar model can be reused to estimate the likelihoods of a potentially necessary 3D model, yielding significant computational savings. The input to our model instantiation algorithm is a spatially-local, randomly drawn subset of trajectory data Wˆ[2F×I] ⊆ W[2F×P] (§3.1). In turn, at each RANSAC trial, the algorithm draw(§s3 uniformly d,i asttr eibaucthed R, A rNanSdoAmC subsets of three control trajectories (C[2F×3] ⊂ Wˆ[2F×I]). Each set of control trajectories is used to estim⊂ate the family of 2D affine transformations {A1, . . . , AF} between the iblyase o ffr 2aDm aef ainnde aralln sotfoherrm fartaimoness { iAn the sequence, wtwheicehn are then used to determine a complete set of model parameters M = {B, σ, C, ω}. The matrix B ∈ {0, 1}[F×I] indicates Mwhe =the {rB t,hσe ,iC-th, trajectory asthroixu Bld ∈b e predicted by model M at frame f (inlier, bif = 1) or not (outlier, bif = 0), σ = {σ1 , . . . , σF} are estimates of the magnitude of the σnois =e {foσr each fram}e a, aen eds ω ∈at {s2 oDf, t3hDe} m isa tnhietu edsetim ofa ttehde nmooidseel f type. hTh fera goal aisn dto ω ωfin ∈d {t2heD c,3oDntr}ol is points tainmda ttehed associated parameters that minimize the objective function O(Wˆ,M) =f?∈Fi?∈IbifLω? wˆif| Af,σf?+ Ψ(ω) + Γ(B) across (7) wˆfi a number of RANSAC trials, where = = are the coordinates of the i-th trajectory from the support subset at frame f. The negative log-likelihood term Lω (·) penalizes reconstruction error, while Ψ(·) and Γ(·) are regularizers. Tcohen tthrureceti otenr mer-s are ,d wefhinileed Ψ Ψ b(e·l)ow an. Knowing that 2D and 3D affine models have 6 and 8 degrees of freedom respectively, Ψ(ω) regularizes over model complexity using: (xif, yif) ( wˆ 2if−1, wˆ i2f) Wˆ Ψ(ω) =?86((FF − − 1 1)), i f ωω== 32DD. 222222556199 (8) Γ(B) strongly penalizes models that describe too few trajectories: Γ(B) =?0∞,, oifth?erwI?iseFbif< Fλi (9) The control set C whose M minimizes Eq. 7 across a number of RANSAC trials becomes part of the pool of candidates M. 2D likelihoods. For the planar case (ω = 2D) the negative log-likelihood term is evaluated with: L2D( wˆif| Af,σf) = −log?2π|Σ1|21exp?−21rif?Σ−1rif??, (10) which is a zero-mean 2D Normal distribution evaluated at the residuals The spherical covariance matrix is Σ = rif. rif (σf)2I. 
The residuals are determined by the differences between the predictions made by a hypothesized model Af, and the observations at each frame ?r?1f?=? w˜1?f?− Af? w˜1?s?. (11) 3D likelihoods. The negative log-likelihood term for the 3D case is based on the the 2DAPE decomposition. The 2D affinities Af and residuals rf are reused, but to account for the effect of relative depth, an epipolar line segment ef is robustly fit to the residual data at each frame (please see supplementary material for details on the segment fitting algorithm). The 2DAPE does not constrain relative depths to remain constant across frames, but only requires trajectories to be close to the epipolar line. So, if the unitary vector ef indicates the orthogonal direction to ef, then the negativ⊥e log-likelihood term for the 3D case is estimated with: L3D( wˆfi| Af,σf) = −2log⎜⎝⎛√21πσfexp⎪⎨⎪⎧−?r2if(?σfe)f⊥2?2⎪⎬⎪⎫⎞⎟⎠, ⎠(12,) which is also a zero-mean 2D Norma⎩l distribution ⎭computed as the product of two identical, separable, singlevariate, normal distributions, evaluated at the distance from the residual to the epipolar line. The first one corresponds to the actual deviation in the direction of ef , which is analyti- cally computed using rif?ef. The seco⊥nd one corresponds to an estimate of the deviat⊥ion in the perpendicular direction (ef), which cannot be determined using the 2DAPE decomposition model, but can be approximated to be equal to rif ? ef, which is a plausible estimate under the isotropic noise as⊥sumption. Note that Eq. 7 does not evaluate the quality of a model using the number of inliers, as it is typical for RANSAC. Instead, we found that better motion models resulted from Algorithm 1: Motion model instantiation × Algorithm 1: Motion model instantiation Input:b Traasejec frtoamrye d bata W[2F×P], number of RANSAC trials K, arbitrary Output: Parameters of the motion model M = {B , σn , ω} // determine the training set c ← rand( 1, P) ; r ← rand(rmin , rmax ) // random center and radius I] ← t ra j e ct oriesWithinDis k (W, r,c) // support subset X ← homoCoords(Wˆb) // points at base frame for K RANSAC trials do Wˆ[2F Wˆ return M = {B} optimizing over the accuracy ofthe model predictions for an (estimated) inlier subset, which also means that the effect of outliers is explicitly uncounted. Figure 1.b shows an example of class-labeled trajectory data, 1.c shows a typical spatially-local support subset. Figures 1.d and 1.e show a model’s control points and its corresponding (class-labeled) residuals, respectively. A pseudocode description of the motion instantiation algorithm is provided in Algorithm 1. Details on how to determine Wˆ, as well as B, σ, and ω follow. 3.1. Local Coherence The subset of trajectories Wˆ given to RANSAC to generate a model M is constrained to a spatially local region. The probability ofchoosing an uncontaminated set of 3 control trajectories, necessary to compute a 2D affine model, from a dataset with a ratio r of inliers, after k trials is: p = 1 − (1 − r3)k. This means that the number of trials pne =ede 1d −to (fi1n d− a subset of 3 inliers with probability p is k =lloogg((11 − − r p3)). (13) A common assumption is that trajectories from the same underlying motion are locally coherent. Hence, a compact region is likely to increase r, exponentially reducing 222222666200 Figure 2: Predictions (red) from a 2D affine model with standard Gaussian noise (green) on one of the control points (black). Noiseless model predictions in blue. All four scenarios have identical noise. 
The magnitude of the extrapolation error changes with the distance between the control points. k, and with it, RANSAC’s computation time by a proportional amount. The trade-off that results from drawing model control points from a small region, however, is extrapolation error. A motion model is extrapolated when utilized to make predictions for trajectories outside the region defined by the control points. The magnitude of modeling error depends on the magnitude of the noise affecting the control points, and although hard to characterize in general, extrapolation error can be expected to grow with the distance from the prediction to the control points, and inversely with the distance between the control points themselves. Figure 2 shows a series of synthetic scenarios where one of the control points is affected by zero mean Gaussian noise of small magnitude. Identical noise is added to the same trajectory in all four scenarios. The figure illustrates the relation between the distance between the control points and the magnitude of the extrapolation errors. Our goal is to maximize the region size while limiting the number of outliers. Without any prior knowledge regarding the scale of the objects in the scene, determining a fixed size for the support region is unlikely to work in general. Instead, the issue is avoided by randomly sampling disk-shaped regions of varying sizes and locations to construct a diverse set of support subsets. Each support subset is then determined by Wˆ = {wi | (xbi − ox)2 + (ybi − oy)2 < r2}, (14) where (ox , oy) are the coordinates of the center of a disk of radius r. To promote uniform image coverage, the disk is centered at a randomly chosen trajectory (ox , oy) = (xbi, yib) with uniformly distributed i ∼ U(1, P) and base frame b) w∼i h U u(1n,i fFor)m. yTo d asltrloibwu efodr idi ∼ffer Ue(n1t, region ds bizaesse, tfhraem read bius ∼ r is( ,cFho)s.en T ofro amllo a u fnoirfo dirmffe rdeinsttr riebugtiioonn r ∼s, tUh(erm raidni,u ursm rax i)s. Ihfo tsheenre f are mI a trajectories swtritihbiunt othne support region, then ∈ R2F×I. It is worth noting that the construction of theW support region does not incorporate any knowledge about the motion of objects in the scene, and in consequence will likely contain trajectories that originate from more than one independently moving object (Figure 3). Wˆ Wˆ Figure 3: Two randomly drawn local support sets. Left: A mixed set with some trajectories from the blue and green classes. Right: Another mixed set with all of the trajectories in the red class and some from the blue class. 4. Characterizing the Residual Distribution At each RANSAC iteration, residuals rf are computed using the 2D affine model Af that results from the constraints provided by the control trajectories C. Characterizing the distribution of rf has three initial goals. The first one is to determine 2D model inliers b2fD (§4.1), the second one is to compute estimates of the magnitude ,o tfh thee s ncooinsed at every frame σ2fD (§4.2), and the third one is to determine whether the residual( §d4i.s2t)r,ib auntidon th originates efr iosm to a planar or a 3D object (§4.3). If the object is suspected 3D, then two more goals n (§e4ed.3 )to. bIfe t achieved. sT shues pfiercstt one Dis, t hoe nde ttweromine 3D model inliers b3fD (§4.4), and the second one is to estimate the magnitude of the noise of a 3D model (§4.5). (σf3D) to reflect the use 4.1. 
4.1. 2D Inlier Detection

Suppose the matrix Ŵ contains trajectories Ŵ_1 ∈ R^{2F×I} and Ŵ_2 ∈ R^{2F×J} from two independently moving objects, and that these trajectories are contaminated with zero-mean Gaussian noise of spherical covariance, η ∼ N(0, (σ_f)^2 I):

  \hat{W} = [\, \hat{W}_1 \mid \hat{W}_2 \,] + \eta   (15)

Now, assume we know the true affine transformations A_f^1 and A_f^2 that describe the motion of trajectories for the subsets Ŵ_1 and Ŵ_2, respectively. If A_f^1 is used to compute predictions for all of Ŵ (at frame f), the expected value (denoted by ⟨·⟩) of the magnitude of the residuals (r_i^f from Eq. 11) for trajectories in Ŵ_1 will be in the order of the magnitude of the underlying noise, ⟨|r_i^f|⟩ = σ_f for each i ∈ {1, ..., I}. But in this scenario, trajectories in Ŵ_2 will be predicted using the wrong model, resulting in residuals with magnitudes determined by the motion differential, ⟨|r_i^f|⟩ = ⟨|(A_f^1 − A_f^2) w_i^b|⟩. If we can assume that the motion differential is bigger than the displacement due to noise,

  \| (A_f^1 - A_f^2) \, w_i^b \| > \sigma_f   (16)

then the model inliers can be determined by thresholding |r_i^f| with the magnitude of the noise, scaled by a constant (τ = λ_σ σ_f):

  b_i^f = \begin{cases} 1, & \text{if } |r_i^f| \le \tau \\ 0, & \text{otherwise} \end{cases}   (17)

But because σ_f is generally unknown, the threshold τ is estimated from the residual data. To do so, let (r̂_1, ..., r̂_I) be the vector of residual magnitudes sorted so that r̂_i ≤ r̂_{i+1}, and let r̃ = median_i(r̂_{i+1} − r̂_i). The threshold is then defined as

  \tau = \min\{\, \hat{r}_i \mid (\hat{r}_{i+1} - \hat{r}_i) > \lambda_r \tilde{r} \,\}   (18)

which corresponds to the smallest residual magnitude before a salient magnitude gap. Our experiments showed this test to be efficient and effective. Figure 1.e shows class-labeled residuals. Notice the presence of a (low density) gap between the residuals from the trajectories explained by the correct model (in red, close to the origin) and the rest.

4.2. Magnitude of the Noise, 2D Model

Let r̂_f^{2D} contain only the residuals of the inlier trajectories (those where b_i^f = 1), and let USV^⊤ be the singular value decomposition of the covariance matrix of r̂_f^{2D}:

  U S V^\top = \mathrm{svd}\!\left( \frac{1}{\sum_p b_p^f} \, \hat{r}_f^{2D} (\hat{r}_f^{2D})^\top \right)   (19)

Then the magnitude of the noise corresponds to the largest singular value, σ^2 = s_1, because if the underlying geometry is in fact planar, then the only unaccounted displacements captured by the residuals are due to noise. Model capacity can also be determined from S, as explained next.

4.3. Model Capacity

The ratio of largest over smallest singular values (s_1/s_2) determines when upgrading to a 3D model is beneficial. When the underlying geometry is actually non-planar, the residuals from a planar model should distribute along a line (the epipolar line), reflecting that their relative depth is being unaccounted for. This produces a covariance matrix with a large ratio s_1/s_2 >> 1. If, on the other hand, s_1/s_2 ≈ 1, then there is no indication of unexplained relative depth, in which case fitting a line to spherically distributed residuals will only increase the model complexity without explaining the residual variance much better. A small spherical residual covariance strongly suggests a planar underlying geometry.

4.4. 3D Inlier Detection

When the residual distribution is elongated (s_1/s_2 >> 1), a line segment is robustly fit to the (potentially contaminated) set of residuals. The segment must go through the origin and its parameters are computed using a Hough transform. Further details about this algorithm can be found in the supplementary material.
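The residual tests of §4.1-§4.3 are compact enough to sketch directly: a gap test on sorted residual magnitudes (Eqs. 17-18) and an SVD of the inlier residual covariance (Eq. 19). Constants and names below are illustrative.

```python
import numpy as np

def inliers_from_gap(res, lam=3.0):
    """2D-model inliers via the first salient gap in sorted residual magnitudes
    (Eqs. 17-18). res: |r_i^f| for one frame; lam plays the role of lambda_r."""
    r = np.sort(res)
    gaps = np.diff(r)
    salient = np.where(gaps > lam * np.median(gaps))[0]
    tau = r[salient[0]] if salient.size else r[-1]
    return res <= tau

def noise_and_capacity(res_2d):
    """sigma^2 = s1 and the s1/s2 capacity ratio (Eq. 19, §4.3).
    res_2d: (n_inliers, 2) array of inlier residual vectors."""
    cov = res_2d.T @ res_2d / len(res_2d)
    s = np.linalg.svd(cov, compute_uv=False)
    return s[0], s[0] / s[1]                  # large ratio suggests a non-planar object

# toy usage: 40 inliers near the origin, 10 far outliers
rng = np.random.default_rng(0)
res = np.concatenate([np.abs(rng.normal(0, 0.5, 40)), rng.uniform(5, 10, 10)])
print(inliers_from_gap(res).sum())            # expect ~40
```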
Inlier detection. The resulting line segment is used to determine 3D model inliers. Trajectory i becomes an inlier at frame f if it satisfies two conditions: first, the projection of r_i^f onto the line must lie within the segment limits (β ≤ e_f^⊤ r_i^f ≤ γ); second, the normalized distance to the line must be below a threshold ((r_i^f)^⊤ e_f^⊥ ≤ σ_2 λ_d). Notice that the threshold depends on the smallest singular value from Eq. 19 to (roughly) account for the presence of noise in the direction perpendicular to the epipolar line e_f.

4.5. Magnitude of the Noise, 3D Model

Similarly to the 2D case, let r̂_f^{3D} contain the residual data from the corresponding 3D inlier trajectories. An estimate for the magnitude of the noise that reflects the use of a 3D model can be obtained from the singular value decomposition of the covariance matrix of r̂_f^{3D} (as in Eq. 19). In this case, the largest singular value s_1 captures the spread of residuals along the epipolar line, so its magnitude is mainly related to the magnitude of the displacements due to relative depth. However, s_2 captures deviations from the epipolar line, which in a rigid 3D object can only be attributed to noise, making σ^2 = s_2 a reasonable estimate for its magnitude.

Optimal model parameters. When both 2D and 3D models are instantiated, the one with the smallest penalized negative log-likelihood (Eq. 7) becomes the winning model for the current RANSAC run. The same penalized negative log-likelihood metric is used to determine the better model from across all RANSAC iterations. The winning model is added to the pool M, and the process is repeated M times, forming the pool M = {M_1, ..., M_M}.

5. Optimal Model Subset

The next step is to find the model combination M* ⊂ M that maximizes prediction accuracy for the whole trajectory data W, while minimizing model complexity and modelling overlap. For this purpose, let M_j = {M_{j,1}, ..., M_{j,N}} be the j-th model combination, and let {M_j} be the set of all C(M, N) = M!/(N!(M−N)!) combinations of N-sized model subsets that can be drawn from M. The model selection problem is then formulated as

  M^\star = \arg\min_{\{M_j\}} O_S(M_j)   (20)

where the objective is

  O_S(M_j) = \sum_{n=1}^{N} \sum_{p=1}^{P} \pi_{p,n} \, E(w_p, M_{j,n}) + \lambda_\Phi \sum_{p=1}^{P} \sum_{n=1}^{N} \Phi(w_p, M_{j,n}) + \lambda_\Psi \sum_{n=1}^{N} \Psi(M_{j,n})   (21)

The first term accounts for prediction accuracy; the other two are regularization terms. Details follow.

Prediction Accuracy. In order to determine how well a model M predicts an arbitrary trajectory w, the affine transformations estimated by RANSAC could be re-used. However, the inherent noise in the control points, and the potentially short distance between them, often render this approach impractical, particularly when w is spatially distant from the control points (see §3.1). Instead, model parameters are computed with a factorization based method [10]. Given the inlier labeling B in M, let W_B be the subset of trajectories where b_i^f = 1 for at least half of the frames. The orthonormal basis S of an ω = 2D (or 3D) motion model can be determined by the 2 (or 3) left singular vectors of W_B. Using S as the model's motion matrices, prediction accuracy can be computed using:

  E(w, M) = \| S S^\top w - w \|^2   (22)

which is the sum of squared Euclidean deviations from the predictions (SS^⊤ w) to the observed data (w).
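Eq. 22 is just the residual of projecting a trajectory onto the model's motion subspace. A minimal sketch; following the text, the basis size is 2 for a 2D model and 3 for 3D.

```python
import numpy as np

def prediction_error(w, W_inliers, omega):
    """E(w, M) = ||S S^T w - w||^2 (Eq. 22); S holds the first `omega` left
    singular vectors of the model's inlier trajectory matrix W_B."""
    U, _, _ = np.linalg.svd(W_inliers, full_matrices=False)
    S = U[:, :omega]
    resid = S @ (S.T @ w) - w
    return float(resid @ resid)   # small: w fits this motion; large: another motion
```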
Our experiments indicated that, although sensitive to outliers, these model predictions are much more robust to noise.

Ownership variables. Ownership variables Π ∈ {0, 1}^{[P×N]} indicate whether trajectory p is explained by the n-th model (π_{p,n} = 1) or not (π_{p,n} = 0), and are determined by maximum prediction accuracy (i.e. minimum Euclidean deviation):

  \pi_{p,n} = \begin{cases} 1, & \text{if } M_{j,n} = \arg\min_{M \in M_j} E(w_p, M) \\ 0, & \text{otherwise} \end{cases}   (23)

Regularization terms. The second term from Eq. 21 penalizes situations where multiple models explain a trajectory (w) with relatively small residuals. For brevity, let Ê(w, M) = exp{−E(w, M)}; then:

  \Phi(w, M_j) = -\log \frac{\max_{M \in M_j} \hat{E}(w, M)}{\sum_{M_m \in M_j} \hat{E}(w, M_m)}   (24)

The third term regularizes over the number of model parameters, and is evaluated using Eq. 8. The constants λ_Φ and λ_Ψ modulate the effect of the corresponding regularizer.

Table 1: Accuracy and run-time for the H155 dataset, listing average accuracy [%] and total computation time [s] for SSC [3], CPA [14], naive RANSAC and our algorithm. Naive RANSAC is included as a baseline, with overall accuracy and total computation time estimated using data from [12].

6. Refinement

The optimal model subset M* yields ownership variables Π* which can already be interpreted as a segmentation result. However, we found that segmentation accuracy can be improved by incorporating the labellings Π_t from the top T subsets {M_t* | 1 ≤ t ≤ T} closest to optimal. Multiple labellings are incorporated into an affinity matrix F, where the f_{i,j} entry indicates the frequency with which trajectory i is given the same label as trajectory j across all T labellings, weighted by the relative objective function Õ_t = exp{−O_S(W | M_t*)} for such a labelling:

  f_{i,j} = \frac{ \sum_{t=1}^{T} \big( \pi_{i,:}^t (\pi_{j,:}^t)^\top \big) \, \tilde{O}_t }{ \sum_{t=1}^{T} \tilde{O}_t }   (25)

Note that the inner product between the label vectors, π_{i,:} (π_{j,:})^⊤, is equal to one only when the labels are the same. A spectral clustering method is applied on F to produce the method's final segmentation result (a short code sketch of this construction is given after the experiments overview below).

7. Experiments

Evaluation was made through three experimental setups.

Hopkins 155. The Hopkins 155 (H155) dataset has been the standard evaluation benchmark for the problem of motion segmentation of trajectory data since 2007. It consists of checkerboard, traffic and articulated sequences with either 2 or 3 motions. Data was automatically tracked, but tracking errors were manually corrected; further details are available in [12]. The use of a standard dataset enables direct comparison of accuracy and run-time performance. Table 1 shows the relevant figures for the two most competitive algorithms that we are aware of. The data indicates that our algorithm has run-times that are close to 2 or 3 orders of magnitude faster than the state of the art methods, with minimal accuracy loss. Computation times are measured in the same (or very similar) hardware architectures. Like in CPA, our implementation uses a single set of parameters for all the experiments, but as others have pointed out [14], it remains unclear whether the same is true for the results reported in the original SSC paper.

Figure 4: Accuracy error-bars across artificial H155 datasets with controlled levels of Gaussian noise.

Artificial Noise. The second experimental setup complements an unexplored dimension in the H155 dataset: noise. The goal is to determine the effects of noise of different magnitudes on the segmentation accuracy of our method, in comparison with the state of the art.
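Returning briefly to the refinement stage of §6 (referenced above): Eq. 25 amounts to a weighted count of label agreements, after which any spectral clustering of F completes the segmentation. A minimal sketch; scikit-learn's precomputed-affinity clustering is one possible final step, not necessarily the authors' choice.

```python
import numpy as np

def affinity_from_labellings(labellings, objectives):
    """Eq. 25: affinity F from the top-T labellings, weighted by exp(-O_S).
    labellings: list of (P,) integer label arrays; objectives: list of O_S values."""
    obj = np.asarray(objectives, dtype=float)
    w = np.exp(-(obj - obj.min()))                 # stabilized weights, proportional to O~_t
    w /= w.sum()
    P = labellings[0].shape[0]
    F = np.zeros((P, P))
    for lab, wt in zip(labellings, w):
        F += wt * (lab[:, None] == lab[None, :])   # inner product of one-hot label rows
    return F

# final segmentation for N motions, e.g.:
# from sklearn.cluster import SpectralClustering
# labels = SpectralClustering(n_clusters=N, affinity='precomputed').fit_predict(F)
```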
We noted that H155 contains structured long-tailed noise, but for the purpose of this experiment we required a noise-free dataset as a baseline. To generate such a dataset, ground-truth labels were used to compute a rank 3 reconstruction of the (mean-subtracted) trajectories for each segment. Then, multiple versions of H155 were computed by contaminating the noise-free dataset with Gaussian noise of magnitudes σ_n ∈ {0.01, 0.25, 0.5, 1, 2, 4, 8}. Our method, as well as SSC and CPA, were run on these noise-controlled datasets; results are shown in Figure 4. The error bars on SSC and Ours indicate one standard deviation, computed over 20 runs. The plot for CPA is generated with only one run for each dataset (running time: 11.95 days). The graph indicates that our method only compromises accuracy for large levels of noise, while still being around 2 or 3 orders of magnitude faster than the most competitive algorithms.

KLT Tracking. The last experimental setup evaluates the applicability of the algorithm in real world conditions, using raw tracks from an off-the-shelf implementation [1] of the Kanade-Lucas-Tomasi algorithm. Several sequences were tracked and the resulting trajectories classified by our method. Figure 5 shows qualitatively good motion segmentation results for four sequences. Challenges include very small relative motions, tracking noise, and a large presence of outliers.

Figure 5: Motion segmentation results on KLT tracks from four Formula 1 sequences, Italian Grand Prix (c) 2012 Formula 1. In this figure, all trajectories are given a motion label, including outliers.

8. Conclusions

We introduced a computationally efficient motion segmentation algorithm for trajectory data. Efficiency comes from the use of a simple but powerful representation of motion that explicitly incorporates mechanisms to deal with noise, outliers and motion degeneracies. Run-time comparisons indicate that our method is 2 or 3 orders of magnitude faster than the state of the art, with only a small loss in accuracy. The robustness of our method to Gaussian noise of different magnitudes was found competitive with the state of the art, while retaining the inherent computational efficiency. The method was also found to be useful for motion segmentation of real-world, raw trajectory data.

References
[1] http://www.ces.clemson.edu/~stb/klt
[2] J. P. Costeira and T. Kanade. A multibody factorization method for independently moving objects. IJCV, 1998.
[3] E. Elhamifar and R. Vidal. Sparse subspace clustering. In Proc. CVPR, 2009.
[4] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 1981.
[5] M. Irani and P. Anandan. Parallax geometry of pairs of points for 3D scene analysis. In Proc. ECCV, 1996.
[6] K. Kanatani. Motion segmentation by subspace separation: model selection and reliability evaluation. International Journal of Image and Graphics, 2002.
[7] H. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. In Readings in Computer Vision: Issues, Problems, Principles, and Paradigms, M. A. Fischler and O. Firschein, eds., 1987.
[8] K. Schindler, D. Suter, and H. Wang. A model-selection framework for multibody structure-and-motion of image sequences. IJCV, 79(2):159-177, 2008.
[9] C. Tomasi and T. Kanade. Shape and motion without depth. In Proc. ICCV, 1990.
[10] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: a factorization method. IJCV, 1992.
[11] P. Torr. Geometric motion segmentation and model selection. Phil. Trans. of the Royal Soc. of London, Series A: Mathematical, Physical and Engineering Sciences, 1998.
[12] R. Tron and R. Vidal. A benchmark for the comparison of 3-D motion segmentation algorithms. In Proc. CVPR, 2007.
[13] J. Yan and M. Pollefeys. A factorization-based approach for articulated nonrigid shape, motion and kinematic chain recovery from video. PAMI, 2008.
[14] L. Zappella, E. Provenzi, X. Lladó, and J. Salvi. Adaptive motion segmentation algorithm based on the principal angles configuration. In Proc. ACCV, 2011.
[15] L. Zelnik-Manor and M. Irani. Degeneracies, dependencies and their implications in multi-body and multi-sequence factorizations. In Proc. CVPR, 2003.

4 0.65155065 124 cvpr-2013-Determining Motion Directly from Normal Flows Upon the Use of a Spherical Eye Platform

Author: Tak-Wai Hui, Ronald Chung

Abstract: We address the problem of recovering camera motion from video data, which does not require the establishment of feature correspondences or computation of optical flows but works from normal flows directly. We have designed an imaging system that has a wide field of view by fixating a number of cameras together to form an approximate spherical eye. With a substantially widened visual field, we discover that estimating the directions of the translation and rotation components of the motion separately is possible and particularly efficient. In addition, the inherent ambiguities between translation and rotation also disappear. The magnitude of rotation is recovered subsequently. Experimental results on synthetic and real image data are provided. The results show that not only is the accuracy of motion estimation comparable to that of state-of-the-art methods that require explicit feature correspondences or optical flows, but the computation is also faster.

5 0.64251894 341 cvpr-2013-Procrustean Normal Distribution for Non-rigid Structure from Motion

Author: Minsik Lee, Jungchan Cho, Chong-Ho Choi, Songhwai Oh

Abstract: Non-rigid structure from motion is a fundamental problem in computer vision, which is yet to be solved satisfactorily. The main difficulty of the problem lies in choosing the right constraints for the solution. In this paper, we propose new constraints that are more effective for non-rigid shape recovery. Unlike the other proposals, which have mainly focused on restricting the deformation space using rank constraints, our proposal constrains the motion parameters so that the 3D shapes are most closely aligned to each other, which makes the rank constraints unnecessary. Based on these constraints, we define a new class of probability distributions called the Procrustean normal distribution and propose a new NRSfM algorithm, EM-PND. The experimental results show that the proposed method outperforms the existing methods, and it works well even if there is no temporal dependence between the observed samples.
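The alignment constraint at the core of this abstract is the classical Procrustes problem: find the rigid transform that brings the per-frame 3D shapes closest to each other. A minimal NumPy sketch of that alignment step (the Kabsch solution), not the paper's full EM-PND algorithm.

```python
import numpy as np

def procrustes_align(X, Y):
    """Rotation R and translation t that best align 3D point set X to Y in the
    least-squares sense. X, Y: (3, n) arrays of corresponding points."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    U, _, Vt = np.linalg.svd(Yc @ Xc.T)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])  # guard against reflections
    R = U @ D @ Vt
    t = Y.mean(axis=1) - R @ X.mean(axis=1)
    return R, t
```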

6 0.6259948 290 cvpr-2013-Motion Estimation for Self-Driving Cars with a Generalized Camera

7 0.59949684 244 cvpr-2013-Large Displacement Optical Flow from Nearest Neighbor Fields

8 0.5942573 306 cvpr-2013-Non-rigid Structure from Motion with Diffusion Maps Prior

9 0.5797832 88 cvpr-2013-Compressible Motion Fields

10 0.57599789 118 cvpr-2013-Detecting Pulse from Head Motions in Video

11 0.56635243 109 cvpr-2013-Dense Non-rigid Point-Matching Using Random Projections

12 0.55459172 368 cvpr-2013-Rolling Shutter Camera Calibration

13 0.54089159 333 cvpr-2013-Plane-Based Content Preserving Warps for Video Stabilization

14 0.53641808 47 cvpr-2013-As-Projective-As-Possible Image Stitching with Moving DLT

15 0.53604591 283 cvpr-2013-Megastereo: Constructing High-Resolution Stereo Panoramas

16 0.52921391 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction

17 0.51134294 334 cvpr-2013-Pose from Flow and Flow from Pose

18 0.50467306 358 cvpr-2013-Robust Canonical Time Warping for the Alignment of Grossly Corrupted Sequences

19 0.4994998 59 cvpr-2013-Better Exploiting Motion for Better Action Recognition

20 0.49800643 203 cvpr-2013-Hierarchical Video Representation with Trajectory Binary Partition Tree


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.1), (16, 0.027), (26, 0.03), (28, 0.015), (29, 0.01), (33, 0.463), (57, 0.017), (65, 0.011), (67, 0.054), (69, 0.056), (76, 0.011), (80, 0.016), (87, 0.065)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.99605024 355 cvpr-2013-Representing Videos Using Mid-level Discriminative Patches

Author: Arpit Jain, Abhinav Gupta, Mikel Rodriguez, Larry S. Davis

Abstract: How should a video be represented? We propose a new representation for videos based on mid-level discriminative spatio-temporal patches. These spatio-temporal patches might correspond to a primitive human action, a semantic object, or perhaps a random but informative spatiotemporal patch in the video. What defines these spatiotemporal patches is their discriminative and representative properties. We automatically mine these patches from hundreds of training videos and experimentally demonstrate that these patches establish correspondence across videos and align the videos for label transfer techniques. Furthermore, these patches can be used as a discriminative vocabulary for action classification where they demonstrate stateof-the-art performance on UCF50 and Olympics datasets.

2 0.99591142 250 cvpr-2013-Learning Cross-Domain Information Transfer for Location Recognition and Clustering

Author: Raghuraman Gopalan

Abstract: Estimating geographic location from images is a challenging problem that is receiving recent attention. In contrast to many existing methods that primarily model discriminative information corresponding to different locations, we propose joint learning of information that images across locations share and vary upon. Starting with generative and discriminative subspaces pertaining to domains, which are obtained by a hierarchical grouping of images from adjacent locations, we present a top-down approach that first models cross-domain information transfer by utilizing the geometry ofthese subspaces, and then encodes the model results onto individual images to infer their location. We report competitive results for location recognition and clustering on two public datasets, im2GPS and San Francisco, and empirically validate the utility of various design choices involved in the approach.

same-paper 3 0.9957394 46 cvpr-2013-Articulated and Restricted Motion Subspaces and Their Signatures

Author: Bastien Jacquet, Roland Angst, Marc Pollefeys

Abstract: Articulated objects represent an important class ofobjects in our everyday environment. Automatic detection of the type of articulated or otherwise restricted motion and extraction of the corresponding motion parameters are therefore of high value, e.g. in order to augment an otherwise static 3D reconstruction with dynamic semantics, such as rotation axes and allowable translation directions for certain rigid parts or objects. Hence, in this paper, a novel theory to analyse relative transformations between two motion-restricted parts will be presented. The analysis is based on linear subspaces spanned by relative transformations. Moreover, a signature for relative transformations will be introduced which uniquely specifies the type of restricted motion encoded in these relative transformations. This theoretic framework enables the derivation of novel algebraic constraints, such as low-rank constraints for subsequent rotations around two fixed axes for example. Lastly, given the type of restricted motion as predicted by the signature, the paper shows how to extract all the motion parameters with matrix manipulations from linear algebra. Our theory is verified on several real data sets, such as a rotating blackboard or a wheel rolling on the floor amongst others.

4 0.9956612 296 cvpr-2013-Multi-level Discriminative Dictionary Learning towards Hierarchical Visual Categorization

Author: Li Shen, Shuhui Wang, Gang Sun, Shuqiang Jiang, Qingming Huang

Abstract: For the task of visual categorization, the learning model is expected to be endowed with discriminative visual feature representation and flexibilities in processing many categories. Many existing approaches are designed based on a flat category structure, or rely on a set of pre-computed visual features, hence may not be appreciated for dealing with large numbers of categories. In this paper, we propose a novel dictionary learning method by taking advantage of hierarchical category correlation. For each internode of the hierarchical category structure, a discriminative dictionary and a set of classification models are learnt for visual categorization, and the dictionaries in different layers are learnt to exploit the discriminative visual properties of different granularity. Moreover, the dictionaries in lower levels also inherit the dictionary of ancestor nodes, so that categories in lower levels are described with multi-scale visual information using our dictionary learning approach. Experiments on ImageNet object data subset and SUN397 scene dataset demonstrate that our approach achieves promising performance on data with large numbers of classes compared with some state-of-the-art methods, and is more efficient in processing large numbers of categories.

5 0.99531054 304 cvpr-2013-Multipath Sparse Coding Using Hierarchical Matching Pursuit

Author: Liefeng Bo, Xiaofeng Ren, Dieter Fox

Abstract: Complex real-world signals, such as images, contain discriminative structures that differ in many aspects including scale, invariance, and data channel. While progress in deep learning shows the importance of learning features through multiple layers, it is equally important to learn features through multiple paths. We propose Multipath Hierarchical Matching Pursuit (M-HMP), a novel feature learning architecture that combines a collection of hierarchical sparse features for image classification to capture multiple aspects of discriminative structures. Our building blocks are MI-KSVD, a codebook learning algorithm that balances the reconstruction error and the mutual incoherence of the codebook, and batch orthogonal matching pursuit (OMP); we apply them recursively at varying layers and scales. The result is a highly discriminative image representation that leads to large improvements to the state-of-the-art on many standard benchmarks, e.g., Caltech-101, Caltech-256, MITScenes, Oxford-IIIT Pet and Caltech-UCSD Bird-200.
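The (batch) orthogonal matching pursuit used as the encoder in this pipeline greedily picks dictionary atoms and refits by least squares. A compact, non-batched NumPy sketch for a single signal; the random dictionary is purely illustrative.

```python
import numpy as np

def omp(D, x, k):
    """k-sparse code of signal x over dictionary D (unit-norm columns)."""
    residual, support, coef = x.astype(float).copy(), [], None
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))   # best-matching atom
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None) # refit on support
        residual = x - D[:, support] @ coef
    code = np.zeros(D.shape[1])
    code[support] = coef
    return code

# toy usage: recover a 3-sparse code
rng = np.random.default_rng(0)
D = rng.normal(size=(64, 256)); D /= np.linalg.norm(D, axis=0)
x = D[:, [5, 50, 200]] @ np.array([1.0, -2.0, 0.5])
print(np.nonzero(omp(D, x, 3))[0])            # -> [5, 50, 200]
```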

6 0.99494261 203 cvpr-2013-Hierarchical Video Representation with Trajectory Binary Partition Tree

7 0.99487621 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context

8 0.99468863 421 cvpr-2013-Supervised Kernel Descriptors for Visual Recognition

9 0.99432969 53 cvpr-2013-BFO Meets HOG: Feature Extraction Based on Histograms of Oriented p.d.f. Gradients for Image Classification

10 0.9939391 268 cvpr-2013-Leveraging Structure from Motion to Learn Discriminative Codebooks for Scalable Landmark Classification

11 0.99387056 189 cvpr-2013-Graph-Based Discriminative Learning for Location Recognition

12 0.99385273 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes

13 0.9938274 287 cvpr-2013-Modeling Actions through State Changes

14 0.99365819 343 cvpr-2013-Query Adaptive Similarity for Large Scale Object Retrieval

15 0.99345136 340 cvpr-2013-Probabilistic Label Trees for Efficient Large Scale Image Classification

16 0.99339658 244 cvpr-2013-Large Displacement Optical Flow from Nearest Neighbor Fields

17 0.99316746 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning

18 0.99314106 315 cvpr-2013-Online Robust Dictionary Learning

19 0.99305928 255 cvpr-2013-Learning Separable Filters

20 0.99286431 378 cvpr-2013-Sampling Strategies for Real-Time Action Recognition