cvpr cvpr2013 cvpr2013-379 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xi Peng, Lei Zhang, Zhang Yi
Abstract: In this paper, we address two problems in the Sparse Subspace Clustering algorithm (SSC), i.e., the scalability issue and the out-of-sample problem. SSC constructs a sparse similarity graph for spectral clustering using ℓ1-minimization based coefficients, and has achieved state-of-the-art results for image clustering and motion segmentation. However, the time complexity of SSC is proportional to the cube of the problem size, which makes it inefficient to apply SSC in large-scale settings. Moreover, SSC cannot handle out-of-sample data that are not used to construct the similarity graph. For each new datum, SSC needs to recalculate the cluster membership of the whole data set, which makes SSC uncompetitive for fast online clustering. To address these problems, this paper proposes an out-of-sample extension of SSC, named Scalable Sparse Subspace Clustering (SSSC), which makes SSC feasible for clustering large-scale data sets. SSSC adopts a "sampling, clustering, coding, and classifying" strategy. Extensive experimental results on several popular data sets demonstrate the effectiveness and efficiency of our method compared with state-of-the-art algorithms.
Reference: text
sentIndex sentText sentNum sentScore
1 SSC constructs a sparse similarity graph for spectral clustering by using ℓ [sent-8, score-0.727]
2 1-minimization based coefficients, and has achieved state-of-the-art results for image clustering and motion segmentation. [sent-9, score-0.279]
3 However, the time complexity of SSC is proportional to the cube of the problem size, which makes it inefficient to apply SSC in large-scale settings. [sent-10, score-0.065]
4 Moreover, SSC cannot handle out-of-sample data that are not used to construct the similarity graph. [sent-11, score-0.162]
5 For each new datum, SSC needs to recalculate the cluster membership of the whole data set, which makes SSC uncompetitive for fast online clustering. [sent-12, score-0.263]
6 To address these problems, this paper proposes an out-of-sample extension of SSC, named Scalable Sparse Subspace Clustering (SSSC), which makes SSC feasible for clustering large-scale data sets. [sent-13, score-0.278]
7 The solution of SSSC adopts a "sampling, clustering, coding, and classifying" strategy. [sent-14, score-0.057]
8 Extensive experimental results on several popular data sets demonstrate the effectiveness and efficiency of our method compared with state-of-the-art algorithms. [sent-15, score-0.04]
9 Introduction Clustering is one of the fundamental and important topics in the pattern recognition and computer vision communities, which aims to group similar patterns into the same cluster by maximizing the inter-cluster dissimilarity and the intra-cluster similarity. [sent-17, score-0.093]
10 Over the past two decades, a number of clustering approaches such as k-means clustering have been extensively studied. [sent-18, score-0.558]
11 Recently, clustering of non-linearly separable data has become a hot topic. [sent-19, score-0.384]
12 To address this problem, numerous methods have been proposed, for example, kernel-based clustering [10], algebraic methods [20], iterative methods [27], statistical methods [19], and spectral clustering [21]. [sent-20, score-0.773]
13 In this paper, we mainly focus on scalable spectral clustering algorithms. [sent-21, score-0.622]
14 Spectral clustering belongs to the family of subspace [sent-22, score-0.453]
15 clustering [26] which aims at finding a low-dimensional subspace for each group of points. [sent-24, score-0.453]
16 The assumption of spectral clustering is that the high-dimensional data actually lie on a low-dimensional manifold. [sent-25, score-0.534]
17 It has achieved impressive results in various applications [2, 5]. [sent-26, score-0.033]
18 The basic idea of spectral clustering is to find a cluster membership of the data points by using the spectrum of the affinity matrix that depicts the similarity among data points; thus the construction of the similarity graph lies at its heart. [sent-27, score-1.064]
19 In a similarity graph, each vertex denotes a data point and the connection weight between two points represents their similarity. [sent-28, score-0.19]
20 Generally, there are two ways to build a similarity graph. [sent-29, score-0.085]
21 However, points that are close in terms of pairwise distance may not belong to the same subspace. [sent-33, score-0.106]
22 As a result, constructing the similarity graph from reconstruction (representation) coefficients has become more and more popular, since it measures similarity based on the data distribution. [sent-34, score-0.422]
23 Several algorithms have been proposed, for example, Locally linear manifold clustering (LLMC) [13], SMCE [8], ? [sent-36, score-0.279]
24 Recently, Elhamifar and Vidal [7, 9] constructed a sparse similarity graph by using ℓ [sent-38, score-0.2]
25 1-minimization based coefficients for spectral clustering, named Sparse Subspace Clustering (SSC). [sent-39, score-0.309]
26 It automatically selects the nearby points for each datum by utilizing the principle of sparsity, without a fixed global parameter to determine the size of the neighborhood. [sent-40, score-0.171]
27 However, SSC requires solving n optimization problems over n data points and calculating the eigenvectors of the graph Laplacian matrix. [sent-41, score-0.199]
28 Its computational complexity is more than O(n^3) even though the fast ℓ [sent-42, score-0.038]
29 1-solver is used, which means that any medium-sized data set will bring up scalability issues with SSC. [sent-43, score-0.139]
30 Moreover, SSC is an offline algorithm which cannot handle data not used to construct the similarity graph (out-of-sample data). [sent-44, score-0.2]
31 For each new datum, SSC has to perform the algorithm over the whole data set, so it is not suitable for fast online clustering. [sent-45, score-0.04]
32 In this paper, we propose a simple but effective out-of-sample extension of SSC, named Scalable Sparse Subspace Clustering (SSSC), which resolves the scalability issue in SSC as a kind of out-of-sample problem. [sent-46, score-0.22]
33 The proposed method adopts a "sampling, clustering, coding, and classifying" strategy, as shown in Figure 1. [sent-47, score-0.057]
34 Our motivation derives from the sparsity assumption that each data point can be represented as a linear combination of a few basis vectors. [sent-48, score-0.071]
35 It implies that the union of the linear subspaces spanned by the in-sample data could equal or approximate that spanned by the original data. [sent-49, score-0.184]
36 In other words, one could use a small number of data points (in-sample data) to represent the original data set without loss of information. [sent-50, score-0.105]
37 Consequently, in theory, solving the scalability issue need not come at the cost of clustering quality, which is an interesting conclusion. [sent-51, score-0.416]
38 This property may be unique to representation-coefficient-based spectral clustering, and, as far as we know, it has not been exploited to develop a scalable method. [sent-52, score-0.415]
39 The rest of the paper is organized as follows: Section 2 provides a brief review of SSC and some popular methods for large scale spectral clustering. [sent-53, score-0.242]
40 Related Works Except where otherwise specified, lower-case bold letters represent column vectors and upper-case bold ones represent matrices. [sent-58, score-0.062]
41 AT denotes the transpose of the matrix A, whose pseudo-inverse is A−1, and I is reserved for the identity matrix. [sent-59, score-0.082]
42 Moreover, we summarize some notations used throughout the paper in Table 1. [sent-60, score-0.029]
43 Sparse Subspace Clustering Recently, some researchers have explored utilizing the inherent sparsity of sparse representation to construct a similarity graph for dimension reduction [25], image analysis [5], and so on. [sent-65, score-0.299]
44 In these works, Elhamifar and Vidal [7, 9] proposed the SSC algorithm for subspace segmentation with well-founded recovery theory for independent subspaces and disjoint subspaces. [sent-66, score-0.22]
45 SSC calculates the similarity among data points by solving the following optimization problem: min ‖ci‖₁ [sent-67, score-0.19]
46 s.t. ‖yi − Yi ci‖₂ < δ, (1) where ci ∈ Rn is the sparse representation of the data point yi ∈ Rm over the dictionary Yi (i.e., Y with the column yi removed), and δ ≥ 0. [sent-73, score-0.216]
47 The solution of (1) can be achieved by using convex optimization methods; we refer to [31], which provides an extensive survey. [sent-81, score-0.084]
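As an illustration of the coding step in (1), the following Python sketch solves the Lagrangian (Lasso) form of the problem for every data point. It is not the Homotopy solver used by the authors; scikit-learn's Lasso is substituted as a generic ℓ1 solver, and the value of `alpha` is a placeholder playing the role of δ.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_codes(Y, alpha=0.01):
    """Sketch of problem (1): code each column of Y over all remaining columns.

    Y     : (m, n) data matrix, one data point per column.
    alpha : sparsity weight in the Lagrangian form (stand-in for delta).
    Returns an (n, n) coefficient matrix C with a zero diagonal.
    """
    m, n = Y.shape
    C = np.zeros((n, n))
    for i in range(n):
        idx = np.delete(np.arange(n), i)        # dictionary Y_i: all points except y_i
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        lasso.fit(Y[:, idx], Y[:, i])
        C[idx, i] = lasso.coef_
    return C
```

This per-point loop corresponds to the n independent ℓ1 problems mentioned below; SSSC later restricts it to the p in-sample points only.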
48 After getting the coefficients of all data points, SSC performs spectral clustering [21] over the sparse coefficients as described in Algorithm 1. [sent-82, score-0.718]
49 It is easy to see that the computational complexity of SSC is very high. [sent-83, score-0.038]
50 For example, SSC needs O(tn^2m^2 + tmn^3) to construct the similarity graph even though it adopts the Homotopy optimizer [23] to get the sparsest solution, where the Homotopy optimizer is one of the fastest ℓ1 solvers. [sent-84, score-0.425]
51 In addition, it will take O(n^3) to calculate the eigenvectors of the Laplacian matrix L. [sent-86, score-0.114]
52 Considering that L is a sparse matrix, the time complexity of this step can be reduced to O(mn + mn^2) when the Lanczos eigensolver is used. [sent-87, score-0.188]
53 Input: A set of data points Y ∈ Rm×n, and the number of desired clusters k. [sent-90, score-0.169]
54 Obtain the normalized eigenvector matrix V, which consists of the first k eigenvectors of L corresponding to its k smallest eigenvalues. [sent-100, score-0.082]
55 5: Get the segmentation of the data by performing k-means on the rows of V. [sent-101, score-0.11]
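A minimal sketch of Algorithm 1 over a given coefficient matrix C (for example the output of the `sparse_codes` sketch above). The symmetrization W = |C| + |C|ᵀ, the normalized Laplacian, and the row-normalization of the eigenvector matrix follow the usual SSC recipe; for large sparse graphs the dense `eigh` call would be replaced by a Lanczos solver such as `scipy.sparse.linalg.eigsh`.

```python
import numpy as np
from sklearn.cluster import KMeans

def ssc_spectral_clustering(C, k):
    """Sketch of Algorithm 1: spectral clustering over sparse coefficients."""
    W = np.abs(C) + np.abs(C).T                             # symmetric similarity graph
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt    # normalized Laplacian
    # Eigenvectors for the k smallest eigenvalues (eigh returns ascending eigenvalues).
    _, evecs = np.linalg.eigh(L)
    V = evecs[:, :k]
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)   # row-normalize
    return KMeans(n_clusters=k, n_init=10).fit_predict(V)
```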
56 Large Scale Spectral Clustering Recently, some works have been devoted to solving the scalability issue in spectral clustering. [sent-105, score-0.352]
57 One natural option is to reduce the time cost of the eigen-decomposition of the Laplacian matrix. [sent-106, score-0.06]
58 [11] proposed using the Nyström method to avoid computing the whole similarity matrix. [sent-108, score-0.085]
59 Another option is to reduce the data size by performing sampling techniques or replacing the original data set with a small number of points. [sent-111, score-0.229]
60 [30] provided a framework for fast approximate spectral clustering by selecting some representative points based on k-means or random projection trees. [sent-113, score-0.559]
61 It first chooses p representative points (landmarks) from the data using k-means clustering or random sampling; then it constructs a Laplacian matrix L = A^T A, where the element aij of A ∈ Rp×n is the similarity between the original data point and the landmark, based on pairwise distance. [sent-115, score-0.651]
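For comparison, a rough sketch of the landmark-based construction described here. The Gaussian-kernel similarity, its bandwidth, and the column normalization are illustrative assumptions rather than the exact formulation of [4]; the spectral embedding would then be derived from L = A^T A as stated above.

```python
import numpy as np
from sklearn.cluster import KMeans

def landmark_similarity_matrix(Y, p, sigma=1.0):
    """Build the (p, n) matrix A of landmark-to-point similarities (sketch).

    Y : (m, n) data, one point per column;  p : number of landmarks.
    """
    landmarks = KMeans(n_clusters=p, n_init=10).fit(Y.T).cluster_centers_   # (p, m)
    d2 = ((landmarks[:, None, :] - Y.T[None, :, :]) ** 2).sum(axis=-1)      # (p, n)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    A /= A.sum(axis=0, keepdims=True)        # column-normalize (a common choice)
    return A                                 # the embedding is then built from A.T @ A
```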
62 [22] proposed Spectral Embedded Clustering (SEC) which groups out-of-sample data by using subspace learning methods. [sent-120, score-0.214]
63 The second option, which selects some key data points to represent the others, has become more and more popular owing to its effectiveness and efficiency. [sent-121, score-0.232]
64 However, all these approaches focus on developing large-scale methods for pairwise-similarity-based spectral clustering, and do not explore the intrinsic characteristics of the data structure. [sent-122, score-0.687]
65 In this paper, we will fill this gap by making SSC feasible for grouping out-of-sample data. [sent-123, score-0.082]
66 As far as we know, it is the first work to address the scalability issue of non-pairwise similarity based spectral clustering. [sent-124, score-0.437]
67 We make SSC feasible for clustering large-scale data sets in a "sampling, clustering, coding, and classifying" manner. [sent-127, score-0.195]
68 The first two steps choose a small number of data points as in-sample data and perform SSC over them. [sent-128, score-0.145]
69 The third and fourth steps encode out-of-sample data as a linear combination of in-sample data and assign the non-sampled data to the cluster which produces the minimal residual over the chosen data, respectively. [sent-129, score-0.282]
70 Algorithm Description and Discussion The basic idea of our approach is: for a set of data points Y drawn from the linear subspaces {Si}ik=1, each subspace SYi disr spanned by a lcionlelaerc tsiuobns opafc cdeasta { points Bi ? [sent-132, score-0.471]
71 da {tay points yi ∈ Si and yi ∈/ Bi can be represented as a linear combina∈tio Sn of Bi. [sent-135, score-0.201]
72 It implies that scalable spectral clustering could be resolved in two steps. [sent-136, score-0.622]
73 The first step is performing spectral clustering over a small number of data points (in-sample data) which could exactly or approximately represent the original data space. [sent-137, score-0.702]
74 In other words, out-of-sample data has no influence on the segmentation result. [sent-138, score-0.04]
75 It is an interesting and challenging problem to select some key points as in-sample data. [sent-142, score-0.065]
76 Benefiting from the characteristics of large-scale data sets, we adopt the uniform random sampling technique, which has been proved competitive in practice [4, 6, 30]. [sent-143, score-0.155]
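A minimal sketch of this sampling step, assuming data points are stored as the columns of Y:

```python
import numpy as np

def split_in_sample(Y, p, seed=0):
    """Uniformly sample p in-sample columns of Y; the rest become out-of-sample data."""
    rng = np.random.default_rng(seed)
    n = Y.shape[1]
    in_idx = rng.choice(n, size=p, replace=False)
    out_idx = np.setdiff1d(np.arange(n), in_idx)
    return Y[:, in_idx], Y[:, out_idx], in_idx, out_idx
```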
77 After getting the cluster assignment of the in-sample data, it is natural to obtain the cluster membership of the out-of-sample data by performing classification over it. [sent-144, score-0.39]
78 Based on the above analysis, we can see that the clustering error of SSSC mainly comes from the grouping error of the out-of-sample data. [sent-145, score-0.326]
79 Therefore, it is key to design an effective approach for grouping the non-sampled data. [sent-146, score-0.047]
80 The simplest method is to directly assign out-of-sample data to the nearest cluster in terms of Euclidean distance or other pairwise distances. [sent-147, score-0.174]
81 However, most high-dimensional data do not lie in a Euclidean space, so Euclidean distance is not a good metric to measure the adjacency relationship among data points. [sent-148, score-0.08]
82 On the other hand, an important task of subspace clustering is to find a low-dimensional representation for each data point. [sent-149, score-0.524]
83 We adopted sparse representation based classification (SRC) [29], which could satisfy these requirements. [sent-151, score-0.077]
84 The first step of SRC is coding the testing sample y over the training data D: min ‖c‖₁ s.t. ‖y − Dc‖₂ ≤ ε. [sent-152, score-0.088]
85 Although SRC has achieved impressive results in pattern recognition, a recent work [32] empirically showed that non-sparse linear representation could achieve a competitive recognition rate with less time cost. [sent-163, score-0.099]
86 Therefore, to reduce the computational cost of grouping out-of-sample data, the main bottleneck in large-scale clustering, we adopt a linear coding scheme rather than a sparse one, by solving min_c ‖y − Dc‖₂² + λ‖c‖₂². (5) [sent-164, score-0.227]
87 The solution of (5) is named Collaborative Representation by Zhang et al. [sent-169, score-0.057]
88 Then, the classification results are achieved by calculating the regularized residuals ri(y) over all classes as in (6), [sent-171, score-0.038]
89 and assigning y to the class which produces the minimal ri(y) using (4). [sent-176, score-0.065]
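The out-of-sample step described by (5), (6) and (4) can be sketched for a single query y as follows: ridge (collaborative) coding over the in-sample data X, class-wise residuals, and assignment to the class with the smallest residual. Since the exact form of (6) is not reproduced above, the CRC-style normalization of the residual by the coefficient norm is an assumption.

```python
import numpy as np

def assign_out_of_sample(y, X, labels, lam=1e-6):
    """Sketch of (5), (6) and (4) for one out-of-sample point.

    y      : (m,) query point.
    X      : (m, p) in-sample data (columns) with cluster labels `labels` of shape (p,).
    lam    : ridge parameter (the paper uses 1e-6).
    """
    p = X.shape[1]
    # Closed-form solution of (5): c = (X^T X + lam * I)^{-1} X^T y.
    c = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    best, best_r = None, np.inf
    for j in np.unique(labels):
        cj = np.where(labels == j, c, 0.0)               # keep only class-j coefficients
        r = np.linalg.norm(y - X @ cj) / (np.linalg.norm(cj) + 1e-12)   # assumed form of (6)
        if r < best_r:
            best, best_r = j, r
    return best
```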
90 Complexity Analysis Suppose p samples are randomly selected from n data points with dimensionality m; then we need O(t1p^2m^2 + t1mp^3 + p^2 + t2pk^2) to get the cluster membership of the in-sample data when the Homotopy optimizer [23] is used to solve the ℓ [sent-180, score-0.45]
91 1-minimization problem and the Lanczos eigensolver is used to compute the eigenvectors of the Laplacian matrix L, where k is the number of desired clusters, and t1 and t2 are the numbers of iterations of the Homotopy optimizer and k-means clustering, respectively. [sent-181, score-0.283]
92 To group the out-of-sample data points, our approach needs to compute the inverse of the matrix XX^T to get the linear representation of X̄ ∈ Rm×(n−p). [sent-182, score-0.126]
93 Therefore, the time complexity is O(pm^2 + p^3 + (n − p)p^2). [sent-183, score-0.038]
94 Input: A set of data points Y ∈ Rm×n, the number of desired clusters k, the number of in-sample points p, and the ridge regression parameter λ = 1e−6. [sent-185, score-0.169]
95 1: Select p data points from Y using random sampling or other methods. [sent-187, score-0.118]
96 3: Calculate the linear representation of the out-of-sample data X̄ over X by C̄∗ = (X^T X + λI)^{-1} X^T X̄. (7) [sent-194, score-0.071]
97 4: Calculate the normalized residuals rj(x̄i) of each x̄i ∈ X̄ over all classes via (8). [sent-195, score-0.038]
98 5: Assign x̄i to the class which produces the minimal residual, identity(x̄i) = argmin_j {rj(x̄i)}. (9) [sent-200, score-0.069]
99 Output: The cluster membership of X and X̄. [sent-201, score-0.188]
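Putting Algorithm 2 together, a minimal end-to-end sketch of the "sampling, clustering, coding, and classifying" pipeline. It reuses the illustrative helpers sketched earlier (`split_in_sample`, `sparse_codes`, `ssc_spectral_clustering`, `assign_out_of_sample`), which are assumptions of this document rather than the authors' implementation; in particular, the batch form (7) is replaced by per-point calls for readability.

```python
import numpy as np

def sssc(Y, k, p, alpha=0.01, lam=1e-6, seed=0):
    """End-to-end SSSC sketch (Algorithm 2): sampling, clustering, coding, classifying.

    Y : (m, n) data matrix;  k : desired number of clusters;  p : number of in-sample points.
    """
    labels = np.empty(Y.shape[1], dtype=int)
    # Steps 1-2: sample the in-sample points and run SSC (Algorithm 1) on them.
    X, X_out, in_idx, out_idx = split_in_sample(Y, p, seed)
    in_labels = ssc_spectral_clustering(sparse_codes(X, alpha=alpha), k)
    labels[in_idx] = in_labels
    # Steps 3-5: code each out-of-sample point over X and assign it by minimal residual.
    for j, x in zip(out_idx, X_out.T):
        labels[j] = assign_out_of_sample(x, X, in_labels, lam=lam)
    return labels
```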
100 Putting everything together, the computational complexity of SSSC is O(t1mp^3 + t2pk^2 + np^2), owing to k, m < p ≪ n. [sent-202, score-0.1]
wordName wordTfidf (topN-words)
[('ssc', 0.676), ('sssc', 0.292), ('clustering', 0.279), ('spectral', 0.215), ('subspace', 0.174), ('scalable', 0.128), ('homotopy', 0.113), ('scalability', 0.099), ('membership', 0.095), ('cluster', 0.093), ('optimizer', 0.091), ('src', 0.091), ('similarity', 0.085), ('laplacian', 0.078), ('sparse', 0.077), ('datum', 0.075), ('eigensolver', 0.073), ('sir', 0.073), ('udte', 0.073), ('yi', 0.068), ('points', 0.065), ('became', 0.065), ('lanczos', 0.065), ('owing', 0.062), ('option', 0.06), ('named', 0.057), ('adopts', 0.057), ('aed', 0.057), ('eigenvectors', 0.056), ('vidal', 0.054), ('xtx', 0.054), ('sampling', 0.053), ('spanned', 0.049), ('coding', 0.048), ('grouping', 0.047), ('subspaces', 0.046), ('elhamifar', 0.046), ('rm', 0.044), ('residual', 0.041), ('coefficient', 0.041), ('pairwise', 0.041), ('data', 0.04), ('aij', 0.039), ('graph', 0.038), ('residuals', 0.038), ('issue', 0.038), ('complexity', 0.038), ('classifying', 0.037), ('coefficients', 0.037), ('ri', 0.037), ('construct', 0.037), ('performing', 0.036), ('competitive', 0.035), ('rj', 0.035), ('feasible', 0.035), ('ak', 0.034), ('kmeans', 0.034), ('constructs', 0.033), ('impressive', 0.033), ('getting', 0.033), ('aonindt', 0.032), ('chengdu', 0.032), ('disr', 0.032), ('smce', 0.032), ('aai', 0.032), ('lsc', 0.032), ('argimin', 0.032), ('combina', 0.032), ('lec', 0.032), ('osfte', 0.032), ('landmarks', 0.032), ('representation', 0.031), ('sparsity', 0.031), ('rn', 0.031), ('bold', 0.031), ('euclidean', 0.03), ('hdes', 0.03), ('osft', 0.03), ('matrix', 0.029), ('identity', 0.029), ('calculate', 0.029), ('notations', 0.029), ('minimal', 0.028), ('lrr', 0.028), ('mcin', 0.028), ('daunting', 0.028), ('ivse', 0.028), ('dc', 0.028), ('scale', 0.027), ('argjmin', 0.027), ('reserved', 0.027), ('mall', 0.027), ('xxt', 0.027), ('nystr', 0.027), ('get', 0.026), ('extension', 0.026), ('mne', 0.026), ('tay', 0.026), ('kv', 0.026), ('transpose', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 379 cvpr-2013-Scalable Sparse Subspace Clustering
Author: Xi Peng, Lei Zhang, Zhang Yi
Abstract: In this paper, we address two problems in the Sparse Subspace Clustering algorithm (SSC), i.e., the scalability issue and the out-of-sample problem. SSC constructs a sparse similarity graph for spectral clustering using ℓ1-minimization based coefficients, and has achieved state-of-the-art results for image clustering and motion segmentation. However, the time complexity of SSC is proportional to the cube of the problem size, which makes it inefficient to apply SSC in large-scale settings. Moreover, SSC cannot handle out-of-sample data that are not used to construct the similarity graph. For each new datum, SSC needs to recalculate the cluster membership of the whole data set, which makes SSC uncompetitive for fast online clustering. To address these problems, this paper proposes an out-of-sample extension of SSC, named Scalable Sparse Subspace Clustering (SSSC), which makes SSC feasible for clustering large-scale data sets. SSSC adopts a "sampling, clustering, coding, and classifying" strategy. Extensive experimental results on several popular data sets demonstrate the effectiveness and efficiency of our method compared with state-of-the-art algorithms.
2 0.25666735 135 cvpr-2013-Discriminative Subspace Clustering
Author: Vasileios Zografos, Liam Ellis, Rudolf Mester
Abstract: We present a novel method for clustering data drawn from a union of arbitrary dimensional subspaces, called Discriminative Subspace Clustering (DiSC). DiSC solves the subspace clustering problem by using a quadratic classifier trained from unlabeled data (clustering by classification). We generate labels by exploiting the locality of points from the same subspace and a basic affinity criterion. A number of classifiers are then diversely trained from different partitions of the data, and their results are combined together in an ensemble, in order to obtain the final clustering result. We have tested our method with 4 challenging datasets and compared against 8 state-of-the-art methods from literature. Our results show that DiSC is a very strong performer in both accuracy and robustness, and also of low computational complexity.
3 0.24083541 405 cvpr-2013-Sparse Subspace Denoising for Image Manifolds
Author: Bo Wang, Zhuowen Tu
Abstract: With the increasing availability of high dimensional data and demand in sophisticated data analysis algorithms, manifold learning becomes a critical technique to perform dimensionality reduction, unraveling the intrinsic data structure. The real-world data however often come with noises and outliers; seldom, all the data live in a single linear subspace. Inspired by the recent advances in sparse subspace learning and diffusion-based approaches, we propose a new manifold denoising algorithm in which data neighborhoods are adaptively inferred via sparse subspace reconstruction; we then derive a new formulation to perform denoising to the original data. Experiments carried out on both toy and real applications demonstrate the effectiveness of our method; it is insensitive to parameter tuning and we show significant improvement over the competing algorithms.
4 0.18568176 93 cvpr-2013-Constraints as Features
Author: Shmuel Asafi, Daniel Cohen-Or
Abstract: In this paper, we introduce a new approach to constrained clustering which treats the constraints as features. Our method augments the original feature space with additional dimensions, each of which derived from a given Cannot-link constraints. The specified Cannot-link pair gets extreme coordinates values, and the rest of the points get coordinate values that express their spatial influence from the specified constrained pair. After augmenting all the new features, a standard unconstrained clustering algorithm can be performed, like k-means or spectral clustering. We demonstrate the efficacy of our method for active semi-supervised learning applied to image segmentation and compare it to alternative methods. We also evaluate the performance of our method on the four most commonly evaluated datasets from the UCI machine learning repository.
5 0.15129767 170 cvpr-2013-Fast Rigid Motion Segmentation via Incrementally-Complex Local Models
Author: Fernando Flores-Mangas, Allan D. Jepson
Abstract: The problem of rigid motion segmentation of trajectory data under orthography has been long solved for nondegenerate motions in the absence of noise. But because real trajectory data often incorporates noise, outliers, motion degeneracies and motion dependencies, recently proposed motion segmentation methods resort to non-trivial representations to achieve state of the art segmentation accuracies, at the expense of a large computational cost. This paperproposes a method that dramatically reduces this cost (by two or three orders of magnitude) with minimal accuracy loss (from 98.8% achieved by the state of the art, to 96.2% achieved by our method on the standardHopkins 155 dataset). Computational efficiency comes from the use of a simple but powerful representation of motion that explicitly incorporates mechanisms to deal with noise, outliers and motion degeneracies. Subsets of motion models with the best balance between prediction accuracy and model complexity are chosen from a pool of candidates, which are then used for segmentation. 1. Rigid Motion Segmentation Rigid motion segmentation (MS) consists on separating regions, features, or trajectories from a video sequence into spatio-temporally coherent subsets that correspond to independent, rigidly-moving objects in the scene (Figure 1.b or 1.f). The problem currently receives renewed attention, partly because of the extensive amount of video sources and applications that benefit from MS to perform higher level computer vision tasks, but also because the state of the art is reaching functional maturity. Motion Segmentation methods are widely diverse, but most capture only a small subset of constraints or algebraic properties from those that govern the image formation process of moving objects and their corresponding trajectories, such as the rank limit theorem [9, 10], the linear independence constraint (between trajectories from independent motions) [2, 13], the epipolar constraint [7], and the reduced rank property [11, 15, 13]. Model-selection based (a)Orignalvideoframes(b)Clas -label dtrajectories (c)Modelsup ort egion(d)Modelinlersandcontrolpoints (e)Modelresiduals (f) Segmenta ion result Figure 1: Model instantiation and segmentation. a) fth original frame, Italian Grand Prix ?c 2012 Formula 1. b) Classilanbaell feradm, trajectory Gdaratan Wd P r(rixed ?,c green, bolrumeu alnad 1 .bbl a)c Ck correspond to chassis, helmet, background and outlier classes respectively). c) Spatially-local support subset for a candidate motion in blue. d) Candidate motion model inliers in red, control points from Eq. 3) in white. e) Residuals from Eq. 11) color-coded with label data, the radial coordinate is logarithmic. f) Segmentation result. Wˆf (rif (cif methods [11, 6, 8] balance model complexity with modeling accuracy and have been successful at incorporating more of these aspects into a single formulation. For instance, in [8] most model parameters are estimated automatically from the data, including the number of independent motions and their complexity, as well as the segmentation labels (including outliers). However, because of the large number of necessary motion hypotheses that need to be instantiated, as well as the varying and potentially very large number of 222222555977 model parameters that must be estimated, the flexibility offered by this method comes at a large computational cost. Current state of the art methods follow the trend of using sparse low-dimensional subspaces to represent trajectory data. 
This representation is then fed into a clustering algorithm to obtain a segmentation result. A prime example of this type of method is Sparse Subspace Clustering (SSC) [3] in which each trajectory is represented as a sparse linear combination of a few other basis trajectories. The assumption is that the basis trajectories must belong to the same rigid motion as the reconstructed trajectory (or else, the reconstruction would be impossible). When the assumption is true, the sparse mixing coefficients can be interpreted as the connectivity weights of a graph (or a similarity matrix), which is then (spectral) clustered to obtain a segmentation result. At the time of publication, SSC produced segmentation results three times more accurate than the best predecessor. The practical downside, however, is the inherently large computational cost of finding the optimal sparse representation, which is at least cubic on the number of trajectories. The work of [14] also falls within the class of subspace separation algorithms. Their approach is based on clustering the principal angles (CPA) of the local subspaces associated to each trajectory and its nearest neighbors. The clustering re-weights a traditional metric of subspace affinity between principal angles. Re-weighted affinities are then used for segmentation. The approach produces segmentation results with accuracies similar to those of SSC, but the computational cost is close to 10 times bigger than SSC’s. In this work we argue that competitive segmentation results are possible using a simple but powerful representation of motion that explicitly incorporates mechanisms to deal with noise, outliers and motion degeneracies. The proposed method is approximately 2 or 3 orders of magnitude faster than [3] and [14] respectively, currently considered the state of the art. 1.1. Affine Motion Projective Geometry is often used to model the image motion of trajectories from rigid objects between pairs of frames. However, alternative geometric relationships that facilitate parameter computation have also been proven useful for this purpose. For instance, in perspective projection, general image motion from rigid objects can be modeled via the composition of two elements: a 2D homography, and parallax residual displacements [5]. The homography describes the motion of an arbitrary plane, and the parallax residuals account for relative depths, that are unaccounted for by the planar surface model. Under orthography, in contrast, image motion of rigid objects can be modeled via the composition of a 2D affine transformation plus epipolar residual displacements. The 2D affine transformation models the motion of an arbitrary plane, and the epipolar residuals account for relative depths. Crucially, these two components can be computed separately and incrementally, which enables an explicit mechanism to deal with motion degeneracy. In the context of 3D motion, a motion is degenerate when the trajectories originate from a planar (or linear) object, or when neither the camera nor the imaged object exercise all of their degrees of freedom, such as when the object only translates, or when the camera only rotates. These are common situations in real world video sequences. The incremental nature of the decompositions described above, facilitate the transition between degenerate motions and nondegenerate ones. Planar Model Under orthography, the projection of trajectories from a planar surface can be modeled with the affine transformation: ⎣⎡xy1c ⎦⎤=?0D? 
1t?⎡⎣yx1w ⎦⎤= A2wD→c⎣⎡yx1w ⎦⎤, (1) where D ∈ R2×2 is an invertible matrix, and t ∈ R2 is a threarnesl Datio ∈n v Rector. Trajectory icboloer mdiantartixe,s (axnwd , tyw ∈) are in the plane’s reference frame (modulo a 2D affine transformation) and (xc, yc) are image coordinates. Now, let W ∈ R2F×P be matrix of trajectory data that conNtaoiwns, tlehet x a n∈d y image coordinates of P feature points tracked through F frames, as in TocmputehW pa=r⎢m⎣ ⎡⎢ etyx e1F r.,s 1 ofA· . ·2.Dfyx r1Fo m., P tra⎦⎥ ⎤je.ctorydat,(l2e)t C = [c1, c2 , c3] ∈ R2f×3 be three columns (three full trajectories) from W, and let = be the x and y coordinates of the i-th control trajectory at frame f. Then the transformation between points from an arbitrary source frame s to a target frame f can be written as: cif [ci2f−1, ci2f]? ?c1f1 c12f c1f3?= A2sD→f?c11s and A2s→Df c12s c13s?, (3) ?−1. (4) can be simply computed as: A2sD→f= ? c11f c12f c13f ? ? c11s c12s c13s The inverse in the right-hand side matrix of Eq. 4 exists so long as the points cis are not collinear. For simplicity we refer to as and consequently A2sD is the identity matrix. A2s→Df A2fD 222222556088 3D Model In order to upgrade a planar (degenerate) model into a full 3D one, relative depth must be accounted using the epipolar residual displacements. This means extending Eq. 1 with a direction vector, scaled by the corresponding relative depth of each point, as in: ⎣⎡xy1c ⎦⎤=?0?D? t1? ⎡⎣xy1w ⎦⎤+ δzw⎣⎡a 0213 ⎦⎤. The depth δzw is relative to the arbitrary plane tion is modeled by A2D; a point that lies on would have δzw = 0. We call the orthographic the plane plus parallax decomposition, the 2D Epipolar (2DAPE) decomposition. Eq. 5 is equivalent to (5) whose mosuch plane version of Affine Plus wher⎣⎡ ixyt1c is⎤⎦cl=ear⎡⎣tha 120t hea p201a2rma2e10t3erst1o2f⎦⎤A⎣⎢⎡3Dδyxd1zwefin⎦⎥⎤ean(o6r)thographically projected 3D affine transformation. Deter- mining the motion and structure parameters of a 3D model from point correspondences can be done using the classical matrix factorization approach [10], but besides being sensitive to noise and outliers, the common scenario where the solution becomes degenerate makes the approach difficult to use in real-world applications. Appropriately accommodating and dealing with the degenerate cases is one of the key features of our work. 2. Overview of the Method The proposed motion segmentation algorithm has three stages. First, a pool of M motion model hypotheses M = s{tMag1e , . . . , rMst,M a} p oiso generated using a omdeetlh hoydp tohthate csoesm Mbine =s a MRandom Sampling naenrda eCdon usseinngsu as m(ReAthNodS tAhaCt) [o4m] bteinche-s nique with the 2DAPE decomposition. The goal is to generate one motion model for each of the N independent, rigidly-moving objects in the scene; N is assumed to be known a priori. The method instantiates many more models than those expected necessary (M ? N) in an attempt ienlscr tehaasne tthhoes elik eexlpiheocotedd o nfe generating Mco ?rrec Nt m) iond aenl proposals for all N motions. A good model accurately describes a large subset of coherently moving trajectories with the smallest number of parameters (§3). Ialnl ethste n suemcobnedr stage, msubetseertss o§f3 )m.otion models from M are ncom thebi sneecdo ntod explain ualbl sthetes trajectories mino tdheel sequence. The problem is framed as an objective function that must be minimized. 
The objective function is the negative loglikelihood over prediction accuracy, regularized by model complexity (number of model parameters) and modeling overlap (trajectories explained by multiple models). Notice that after this stage, the segmentation that results from the optimal model combination could be reported as a segmentation result (§5). ioTnhe r tshuilrtd ( stage incorporates the results from a set of model combinations that are closest to the optimal. Segmentation results are aggregated into an affinity matrix, which is then passed to a spectral clustering algorithm to produce the final segmentation result. This refinement stage generally results in improved accuracy and reduced segmentation variability (§6). 3. Motion Model Instantiation Each model M ∈ M is instantiated independently using RacAhN mSAodCel. MThis ∈ c Mhoi cies niss manotitiavteatded in bdeecpaeunsdee otlfy th usismethod’s well-known computational efficiency and robustness to outliers, but also because of its ability to incorporate spatially local constraints and (as explained below) because most of the computations necessary to evaluate a planar model can be reused to estimate the likelihoods of a potentially necessary 3D model, yielding significant computational savings. The input to our model instantiation algorithm is a spatially-local, randomly drawn subset of trajectory data Wˆ[2F×I] ⊆ W[2F×P] (§3.1). In turn, at each RANSAC trial, the algorithm draw(§s3 uniformly d,i asttr eibaucthed R, A rNanSdoAmC subsets of three control trajectories (C[2F×3] ⊂ Wˆ[2F×I]). Each set of control trajectories is used to estim⊂ate the family of 2D affine transformations {A1, . . . , AF} between the iblyase o ffr 2aDm aef ainnde aralln sotfoherrm fartaimoness { iAn the sequence, wtwheicehn are then used to determine a complete set of model parameters M = {B, σ, C, ω}. The matrix B ∈ {0, 1}[F×I] indicates Mwhe =the {rB t,hσe ,iC-th, trajectory asthroixu Bld ∈b e predicted by model M at frame f (inlier, bif = 1) or not (outlier, bif = 0), σ = {σ1 , . . . , σF} are estimates of the magnitude of the σnois =e {foσr each fram}e a, aen eds ω ∈at {s2 oDf, t3hDe} m isa tnhietu edsetim ofa ttehde nmooidseel f type. hTh fera goal aisn dto ω ωfin ∈d {t2heD c,3oDntr}ol is points tainmda ttehed associated parameters that minimize the objective function O(Wˆ,M) =f?∈Fi?∈IbifLω? wˆif| Af,σf?+ Ψ(ω) + Γ(B) across (7) wˆfi a number of RANSAC trials, where = = are the coordinates of the i-th trajectory from the support subset at frame f. The negative log-likelihood term Lω (·) penalizes reconstruction error, while Ψ(·) and Γ(·) are regularizers. Tcohen tthrureceti otenr mer-s are ,d wefhinileed Ψ Ψ b(e·l)ow an. Knowing that 2D and 3D affine models have 6 and 8 degrees of freedom respectively, Ψ(ω) regularizes over model complexity using: (xif, yif) ( wˆ 2if−1, wˆ i2f) Wˆ Ψ(ω) =?86((FF − − 1 1)), i f ωω== 32DD. 222222556199 (8) Γ(B) strongly penalizes models that describe too few trajectories: Γ(B) =?0∞,, oifth?erwI?iseFbif< Fλi (9) The control set C whose M minimizes Eq. 7 across a number of RANSAC trials becomes part of the pool of candidates M. 2D likelihoods. For the planar case (ω = 2D) the negative log-likelihood term is evaluated with: L2D( wˆif| Af,σf) = −log?2π|Σ1|21exp?−21rif?Σ−1rif??, (10) which is a zero-mean 2D Normal distribution evaluated at the residuals The spherical covariance matrix is Σ = rif. rif (σf)2I. 
The residuals are determined by the differences between the predictions made by a hypothesized model Af, and the observations at each frame ?r?1f?=? w˜1?f?− Af? w˜1?s?. (11) 3D likelihoods. The negative log-likelihood term for the 3D case is based on the the 2DAPE decomposition. The 2D affinities Af and residuals rf are reused, but to account for the effect of relative depth, an epipolar line segment ef is robustly fit to the residual data at each frame (please see supplementary material for details on the segment fitting algorithm). The 2DAPE does not constrain relative depths to remain constant across frames, but only requires trajectories to be close to the epipolar line. So, if the unitary vector ef indicates the orthogonal direction to ef, then the negativ⊥e log-likelihood term for the 3D case is estimated with: L3D( wˆfi| Af,σf) = −2log⎜⎝⎛√21πσfexp⎪⎨⎪⎧−?r2if(?σfe)f⊥2?2⎪⎬⎪⎫⎞⎟⎠, ⎠(12,) which is also a zero-mean 2D Norma⎩l distribution ⎭computed as the product of two identical, separable, singlevariate, normal distributions, evaluated at the distance from the residual to the epipolar line. The first one corresponds to the actual deviation in the direction of ef , which is analyti- cally computed using rif?ef. The seco⊥nd one corresponds to an estimate of the deviat⊥ion in the perpendicular direction (ef), which cannot be determined using the 2DAPE decomposition model, but can be approximated to be equal to rif ? ef, which is a plausible estimate under the isotropic noise as⊥sumption. Note that Eq. 7 does not evaluate the quality of a model using the number of inliers, as it is typical for RANSAC. Instead, we found that better motion models resulted from Algorithm 1: Motion model instantiation × Algorithm 1: Motion model instantiation Input:b Traasejec frtoamrye d bata W[2F×P], number of RANSAC trials K, arbitrary Output: Parameters of the motion model M = {B , σn , ω} // determine the training set c ← rand( 1, P) ; r ← rand(rmin , rmax ) // random center and radius I] ← t ra j e ct oriesWithinDis k (W, r,c) // support subset X ← homoCoords(Wˆb) // points at base frame for K RANSAC trials do Wˆ[2F Wˆ return M = {B} optimizing over the accuracy ofthe model predictions for an (estimated) inlier subset, which also means that the effect of outliers is explicitly uncounted. Figure 1.b shows an example of class-labeled trajectory data, 1.c shows a typical spatially-local support subset. Figures 1.d and 1.e show a model’s control points and its corresponding (class-labeled) residuals, respectively. A pseudocode description of the motion instantiation algorithm is provided in Algorithm 1. Details on how to determine Wˆ, as well as B, σ, and ω follow. 3.1. Local Coherence The subset of trajectories Wˆ given to RANSAC to generate a model M is constrained to a spatially local region. The probability ofchoosing an uncontaminated set of 3 control trajectories, necessary to compute a 2D affine model, from a dataset with a ratio r of inliers, after k trials is: p = 1 − (1 − r3)k. This means that the number of trials pne =ede 1d −to (fi1n d− a subset of 3 inliers with probability p is k =lloogg((11 − − r p3)). (13) A common assumption is that trajectories from the same underlying motion are locally coherent. Hence, a compact region is likely to increase r, exponentially reducing 222222666200 Figure 2: Predictions (red) from a 2D affine model with standard Gaussian noise (green) on one of the control points (black). Noiseless model predictions in blue. All four scenarios have identical noise. 
The magnitude of the extrapolation error changes with the distance between the control points. k, and with it, RANSAC’s computation time by a proportional amount. The trade-off that results from drawing model control points from a small region, however, is extrapolation error. A motion model is extrapolated when utilized to make predictions for trajectories outside the region defined by the control points. The magnitude of modeling error depends on the magnitude of the noise affecting the control points, and although hard to characterize in general, extrapolation error can be expected to grow with the distance from the prediction to the control points, and inversely with the distance between the control points themselves. Figure 2 shows a series of synthetic scenarios where one of the control points is affected by zero mean Gaussian noise of small magnitude. Identical noise is added to the same trajectory in all four scenarios. The figure illustrates the relation between the distance between the control points and the magnitude of the extrapolation errors. Our goal is to maximize the region size while limiting the number of outliers. Without any prior knowledge regarding the scale of the objects in the scene, determining a fixed size for the support region is unlikely to work in general. Instead, the issue is avoided by randomly sampling disk-shaped regions of varying sizes and locations to construct a diverse set of support subsets. Each support subset is then determined by Wˆ = {wi | (xbi − ox)2 + (ybi − oy)2 < r2}, (14) where (ox , oy) are the coordinates of the center of a disk of radius r. To promote uniform image coverage, the disk is centered at a randomly chosen trajectory (ox , oy) = (xbi, yib) with uniformly distributed i ∼ U(1, P) and base frame b) w∼i h U u(1n,i fFor)m. yTo d asltrloibwu efodr idi ∼ffer Ue(n1t, region ds bizaesse, tfhraem read bius ∼ r is( ,cFho)s.en T ofro amllo a u fnoirfo dirmffe rdeinsttr riebugtiioonn r ∼s, tUh(erm raidni,u ursm rax i)s. Ihfo tsheenre f are mI a trajectories swtritihbiunt othne support region, then ∈ R2F×I. It is worth noting that the construction of theW support region does not incorporate any knowledge about the motion of objects in the scene, and in consequence will likely contain trajectories that originate from more than one independently moving object (Figure 3). Wˆ Wˆ Figure 3: Two randomly drawn local support sets. Left: A mixed set with some trajectories from the blue and green classes. Right: Another mixed set with all of the trajectories in the red class and some from the blue class. 4. Characterizing the Residual Distribution At each RANSAC iteration, residuals rf are computed using the 2D affine model Af that results from the constraints provided by the control trajectories C. Characterizing the distribution of rf has three initial goals. The first one is to determine 2D model inliers b2fD (§4.1), the second one is to compute estimates of the magnitude ,o tfh thee s ncooinsed at every frame σ2fD (§4.2), and the third one is to determine whether the residual( §d4i.s2t)r,ib auntidon th originates efr iosm to a planar or a 3D object (§4.3). If the object is suspected 3D, then two more goals n (§e4ed.3 )to. bIfe t achieved. sT shues pfiercstt one Dis, t hoe nde ttweromine 3D model inliers b3fD (§4.4), and the second one is to estimate the magnitude of the noise of a 3D model (§4.5). (σf3D) to reflect the use 4.1. 
2D Inlier Detection Suppose the matrix Wˆ contains trajectories Wˆ1 ∈ R2F×I and Wˆ2 ∈ R2F×J from two independently moving objects, and ∈tha Rt these trajectories are contaminated with zero-mean Gaussian noise of spherical covariance η ∼ N(0, (σf)2I): Wˆ = ?Wˆ1|Wˆ2? + η. (15) A1f Now, assume we know the true affine transformations and that describe the motion of trajectories for the subsets Wˆ1 and Wˆ2, respectively. If is used to compute predictions for all of Wˆ (at frame f), the expected value (denoted by ?·?) of the magnitude of the residuals (rf from Eq. 11) for trajectories aing nWiˆtud1 will be in the order of the magnitude of the underlying noise ?|rif |? = σf for each i∈ {1, . . . , I}. But in this scenario, trajectories in Wˆ2 ewaicl h b ie ∈ predicted using tth ien wrong emnaodrioel,, resulting isn i nr esid?uals? wit?h magnitudes de?termined by the motion differential A2f ???rif??? A1f ???(A1f − A2f) wˆib???. If we = can assume that the motion ?d??riff???er =en???t(iAal is bigger tha???n. tIhfe w deis cpalnac aesmsuemnte d thuea t toh eno miseo:t ???(A1f − A2f)wib??? 222222666311 > σf, (16) then the model inliers can be determined by thresholding | with the magnitude of the noise, scaled by a constant |(τr =| w wλitσhσtf h):e |rif bif=?10,, |orthife|r ≤wi τse. (17) But because σf is generally unknown, the threshold (τ) is estimated from the residual data. To do so, let be the vector of residual magnitudes where rˆi ≤ ˆ ri+1 . Now, let r˜ = median ( rˆi+1 −ˆ r i). The threshold i≤s trˆ h en defined as r τ = min{ rˆi | (ri+1 − ri) > λr˜ r}, (18) which corresponds to the smallest residual magnitude before a salient magnitude gap. Our experiments showed this test to be efficient and effective. Figure 1.e shows classlabeled residuals. Notice the presence of a (low density) gap between the residuals from the trajectories explained by the correct model (in red, close to the origin), and the rest. 4.2. Magnitude of the Noise, 2D Model r2fD Let contain only the residuals of the inlier trajectories (those where = 1), and let USV? be the singular value decomposition of the covariance matrix bif ofˆ r2fD: USV?= svd??1bpf( rˆ2fD)? rˆ2fD?.(19) Then the magnitude of the n?oise corresponds to the largest singular value σ2 = s1, because if the underlying geometry is in fact planar, then the only unaccounted displacements captured by the residuals are due to noise. Model capacity can also be determined from S, as explained next. 4.3. Model Capacity The ratio of largest over smallest singular values (s1/s2) determines when upgrading to a 3D model is beneficial. When the underlying geometry is actually non-planar, the residuals from a planar model should distribute along a line (the epipolar line), reflecting that their relative depth is being unaccounted for. This produces a covariance matrix with a large ratio s1/s2 ? 1. If on the other hand, if s1/s2 ≈ 1, then there is no in 1d.ic Iafti oonn tohfe unexplained relative depth, tihn wnh thicehr case, fitting a olinne o tfo u spherically distributed residuals will only increase the model complexity without explaining the residual variance much better. A small spherical residual covariance strongly suggests a planar underlying geometry. 4.4. 3D Inlier Detection When the residual distribution is elongated (s1/s2 ? 1), a line segment is robustly fit to the (potentially con?tam 1i)-, nated) set of residuals. The segment must go through the origin and its parameters are computed using a Hough transform. 
Further details about this algorithm can be found in the supplementary material. Inlier detection The resulting line segment is used to determine 3D model inliers. Trajectory ibecomes an inlier at frame f if it satisfies two conditions. First, the projection of rif onto the line must lie within the segment limits (β ≤ ≤ γ). Second, the normalized distance to the rif?ef (ef?rif line must be below a threshold ≤ σ2λd). Notice that the threshold depends on the smalle≤st singular value from Eq. 19 to (roughly) account for the presence of noise in the direction perpendicular to the epipolar (ef). 4.5. Magnitude of the Noise, 3D Model let rˆf3D Similarly to the 2D case, contain the residual data from the corresponding 3D inlier trajectories. An estimate for the magnitude of the noise that reflects the use of a 3D model can be obtained from the singular value decomposition of the covariance matrix of r3fD (as in Eq. 19). In this case, the largest singular value s1 captures the spread of residuals along the epipolar line, so its magnitude is mainly related to the magnitude of the displacements due to relative depth. However, s2 captures deviations from the epipolar line, which in a rigid 3D object can only be attributed to noise, making σ2 = s2 a reasonable estimate for its magnitude. Optimal model parameters When both 2D and 3D models are instantiated, the one with the smallest penalized negative log-likelihood (7) becomes the winning model for the current RANSAC run. The same penalized negative loglikelihood metric is used to determine the better model from across all RANSAC iterations. The winning model is added to the pool M, and the process is repeated M times, forming hthee p pool MM, a=n d{ tMhe1 , . . . , MssM is} r.e 5. Optimal Model Subset The next step is to find the model combination M? ⊂ M thhea tn mxta xstiempiz iess t prediction accuracy finora othne Mwhol⊂e trajectory mdaaxtaim Wize,s w phreiledi minimizing cmyod foerl complexity and modelling overlap. For this purpose, let Mj = {Mj,1 , . . . , Mj,N} be the j-th m thoisdel p ucorpmosbein,at lieotn, M Mand let {Mj} be the set o}f baell MheC jN-th = m N!(MM−!N)!) caotimonb,in aantdio lnest of N-sized models than can be draNw!(nM fr−oNm) M). The model soefle Nct-sioinze problem sis t hthanen c faonr bmeu dlartaewdn as M?= ar{gMmj}inOS(Mj), (20) 222222666422 where the objective is ?N ?P OS(Mj) = ??πp,nE (wp,Mj,n) ?n=1p?P=1 + λΦ?Φ(wp,Mj,n) + λΨ?Ψ(Mj,n). ?N (21) i?= ?1 n?= ?1 The first term accounts for prediction accuracy, the other two are regularization terms. Details follow. Prediction Accuracy In order to determine how well a model M predicts an arbitrary trajectory w, the affine transformations estimated by RANSAC could be re-used. However, the inherent noise in the control points, and the potentially short distance between them, often render this approach impractical, particularly when w is spatially distant from the control points (see §3. 1). Instead, model parametferorsm are computed owinittsh a efeac §to3r.1iz)a.ti Ionnst e baadse,d m [o1d0e]l mpaertahmode.Given the inlier labeling B in M, let WB be the subset of trajectories where bif = 1for at least half of the frames. The orthonormal basis S of a ω = 2D (or 3D) motion model can be determined by the 2 (or 3) left singular vectors of WB. Using S as the model’s motion matrices, prediction accuracy can be computed using: E(w, M) = ??SS?w − w??2 , (22) which is the sum of squared?? Euclidean d??eviations from the predictions (SS?w), to the observed data (w). 
Our experiments indicated that, although sensitive to outliers, these model predictions are much more robust to noise. Ownership variables Π ∈ {0, 1}[P×N] indicate whether trajectory p i ps explained by t {he0 ,n1-}th model (πp,n = 1) or not (πp,n = 0), and are determined by maximum prediction accuracy (i.e. minimum Euclidean deviation): πp,n=⎨⎧01,, oift hMerjw,nis=e. aMrg∈mMinjE(wp,M) (23) Regularization terms The second term from Eq. 21 penalizes situations where multiple models explain a trajectory (w) with relatively small residuals. For brevity, let M) = exp{−E(w, M)}, then: Eˆ(w, Φ(w,Mj) = −logMMm∈?∈aMMxjE ˆ ( w , MM)). (24) The third term regularizes over the number of model parameters, and is evaluated using Eq. 8. The constants λΦ and λΨ modulate the effect of the corresponding regularizer. Table 1: Accuracy and run-time for the H155 dataset. Naive RANSAC included as a baseline with overall accuracy and total computation time estimated using data from [12]. SOCARAPSulgrCAoNs[S r[31itA]4h]CmAverage89 A689 c. 71c 7695u racy[%]Compu1 t4a 237t1i506o70 n0 time[s] 6. Refinement The optimal model subset M? yields ownership variableTsh Πe o? wtimhicahl can already tb e M interpreted as a segmentation result. However, we found that segmentation accuracy can be improved by incorporating the labellings Πt from the top T subsets {Mt? | 1 ≤ t ≤ T} closest to optimal. Multiple labellings are incorporated oinsetos an affinity matrix F, where the fi,j entry indicates the frequency with which trajectory i is given the same label as trajectory j across all T labelli?ngs, weighted b?y the relative objective function O˜t = exp ?−OOSS((WW||MMt??))? for such a labelling: fi,j=?tT=11O˜tt?T=1?πit,:πjt,?:?O˜t (25) Note that the inne?r product between the label vectors (πi,:πj?,:) is equal to one only when the labels are the same. A spectral clustering method is applied on F to produce the method’s final segmentation result. 7. Experiments Evaluation was made through three experimental setups. Hopkins 155 The Hopkins 155 (H155) dataset has been the standard evaluation metric for the problem of motion segmentation of trajectory data since 2007. It consists of checkerboard, traffic and articulated sequences with either 2 or 3 motions. Data was automatically tracked, but tracking errors were manually corrected; further details are available in [12]. The use of a standard dataset enables direct comparison of accuracy and run-time performance. Table 1 shows the relevant figures for the two most competitive algorithms that we are aware of. The data indicates that our algorithm has run-times that are close to 2 or 3 orders of magnitude faster than the state of the art methods, with minimal accuracy loss. Computation times are measured in the same (or very similar) hardware architectures. Like in CPA, our implementation uses a single set of parameters for all the experiments, but as others had pointed out [14], it remains unclear whether the same is true for the results reported in the original SSC paper. 222222666533 Figure 4: Accuracy error-bars across artificial H155 datasets with controlled levels of Gaussian noise. Artificial Noise The second experimental setup complements an unexplored dimension in the H155 dataset: noise. The goal is to determine the effects of noise of different magnitudes towards the segmentation accuracy of our method, in comparison with the state of the art. 
We noted that H155 contains structured long-tailed noise, but for the purpose of this experiment we required a noise-free dataset as a baseline. To generate such a dataset, ground-truth labels were used to compute a rank 3 reconstruction of (mean-subtracted) trajectories for each segment. Then, multiple versions of H155 were computed by contaminating the noise-free dataset with Gaussian noise of magnitudes σn ∈ {0.01, 0.25, 0.5, 1, 2, 4, 8}. Our method, as well as SSC a∈nd { 0C.0PA1, were run on t2h,e4s,e8 }n.oi Ose-ucro mnterothlloedd, datasets; results are shown in Figure 4. The error bars on SSC and Ours indicate one standard deviation, computed over 20 runs. The plot for CPA is generated with only one run for each dataset (running time: 11.95 days). The graph indicates that our method only compromises accuracy for large levels of noise, while still being around 2 or 3 orders of magnitude faster than the most competitive algorithms. KLT Tracking The last experimental setup evaluates the applicability of the algorithm in real world conditions using raw tracks from an off-the-shelf implementation [1] of the Kanade-Lucas-Tomasi algorithm. Several sequences were tracked and the resulting trajectories classified by our method. Figure 5 shows qualitatively good motion segmentation results for four sequences. Challenges include very small relative motions, tracking noise, and a large presence of outliers. 8. Conclusions We introduced a computationally efficient motion segmentation algorithm for trajectory data. Efficiency comes from the use of a simple but powerful representation of motion that explicitly incorporates mechanisms to deal with noise, outliers and motion degeneracies. Run-time comparisons indicate that our method is 2 or 3 orders of magnitude faster than the state of the art, with only a small loss in accuracy. The robustness of our method to Gaussian noise tracks from four Formula 1 sequences. Italian Grand Prix ?c2012 Formula 1. In this figure, all trajectories are given a m?co2ti0o1n2 l Faoberml, including hoiust fliigeurrse. of different magnitudes was found competitive with state of the art, while retaining the inherent computational efficiency. The method was also found to be useful for motion segmentation of real-world, raw trajectory data. References [1] ht tp : / /www . ce s . c l emn s on . edu / ˜stb / k lt . 8 [2] J. P. Costeira and T. Kanade. A Multibody Factorization Method for Independently Moving Objects. IJCV, 1998. 1 [3] E. Elhamifar and R. Vidal. Sparse subspace clustering. In Proc. CVPR, 2009. 2, 7 [4] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 1981. 3 [5] M. Irani and P. Anandan. Parallax geometry of pairs of points for 3d scene analysis. Proc. ECCV, 1996. 2 [6] K. Kanatani. Motion segmentation by subspace separation: Model selection and reliability evaluation. International Journal Image Graphics, 2002. 1 [7] H. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. Readings in Computer Vision: Issues, Problems, Principles, and Paradigms, MA Fischler and O. Firschein, eds, 1987. 1 [8] K. Schindler, D. Suter, , and H. Wang. A model-selection framework for multibody structure-and-motion of image sequences. Proc. IJCV, 79(2): 159–177, 2008. 1 [9] C. Tomasi and T. Kanade. Shape and motion without depth. Proc. [10] [11] [12] [13] [14] [15] ICCV, 1990. 1 C. Tomasi and T. Kanade. 
[10] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: a factorization method. IJCV, 1992. 1, 3, 7
[11] P. Torr. Geometric motion segmentation and model selection. Phil. Trans. of the Royal Society of London, Series A: Mathematical, Physical and Engineering Sciences, 1998. 1
[12] R. Tron and R. Vidal. A Benchmark for the Comparison of 3-D Motion Segmentation Algorithms. In Proc. CVPR, 2007. 7
[13] J. Yan and M. Pollefeys. A factorization-based approach for articulated nonrigid shape, motion and kinematic chain recovery from video. PAMI, 2008. 1
[14] L. Zappella, E. Provenzi, X. Lladó, and J. Salvi. Adaptive motion segmentation algorithm based on the principal angles configuration. Proc. ACCV, 2011. 2, 7
[15] L. Zelnik-Manor and M. Irani. Degeneracies, dependencies and their implications in multi-body and multi-sequence factorizations. Proc. CVPR, 2003. 1
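As referenced in the Artificial Noise paragraph above, the following is a minimal sketch of how the noise-controlled versions of H155 could be generated: a rank-3 reconstruction of the mean-subtracted trajectories of each ground-truth segment serves as the noise-free baseline, which is then contaminated with zero-mean Gaussian noise at the listed magnitudes. The function and variable names (make_noise_controlled, W, labels) are illustrative assumptions; the paper does not prescribe an implementation, and adding the segment mean back after projection is our own choice.

```python
import numpy as np

SIGMAS = [0.01, 0.25, 0.5, 1, 2, 4, 8]  # noise magnitudes used in the experiment

def rank3_reconstruction(W_seg):
    """Best rank-3 approximation of the mean-subtracted trajectories of one
    segment (cf. the factorization model of [10])."""
    mean = W_seg.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W_seg - mean, full_matrices=False)
    return (U[:, :3] * s[:3]) @ Vt[:3, :] + mean

def make_noise_controlled(W, labels, sigma, rng=None):
    """One noise-controlled dataset from the trajectory matrix W (2F x P)
    and per-trajectory ground-truth labels (length P)."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    clean = np.empty_like(W, dtype=float)
    for lab in np.unique(labels):
        idx = labels == lab
        clean[:, idx] = rank3_reconstruction(W[:, idx])
    # Contaminate the noise-free baseline with i.i.d. Gaussian noise.
    return clean + rng.normal(scale=sigma, size=W.shape)

# Example: one contaminated copy of the dataset per noise magnitude.
# datasets = {s: make_noise_controlled(W, labels, s) for s in SIGMAS}
```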
6 0.12770084 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos
7 0.12172864 109 cvpr-2013-Dense Non-rigid Point-Matching Using Random Projections
8 0.11524765 215 cvpr-2013-Improved Image Set Classification via Joint Sparse Approximated Nearest Subspaces
9 0.10034758 253 cvpr-2013-Learning Multiple Non-linear Sub-spaces Using K-RBMs
10 0.097377196 250 cvpr-2013-Learning Cross-Domain Information Transfer for Location Recognition and Clustering
11 0.094464369 210 cvpr-2013-Illumination Estimation Based on Bilayer Sparse Coding
12 0.086449794 134 cvpr-2013-Discriminative Sub-categorization
13 0.085127048 64 cvpr-2013-Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification
14 0.080365404 460 cvpr-2013-Weakly-Supervised Dual Clustering for Image Semantic Segmentation
15 0.079091303 191 cvpr-2013-Graph-Laplacian PCA: Closed-Form Solution and Robustness
16 0.075714305 220 cvpr-2013-In Defense of Sparsity Based Face Recognition
17 0.07470905 437 cvpr-2013-Towards Fast and Accurate Segmentation
18 0.071844734 178 cvpr-2013-From Local Similarity to Global Coding: An Application to Image Classification
19 0.066452108 69 cvpr-2013-Boosting Binary Keypoint Descriptors
20 0.065278888 234 cvpr-2013-Joint Spectral Correspondence for Disparate Image Matching
topicId topicWeight
[(0, 0.142), (1, -0.022), (2, -0.065), (3, 0.079), (4, 0.035), (5, -0.029), (6, -0.026), (7, -0.123), (8, -0.069), (9, -0.083), (10, 0.045), (11, -0.01), (12, -0.089), (13, -0.079), (14, -0.038), (15, 0.031), (16, -0.089), (17, -0.042), (18, -0.083), (19, -0.012), (20, 0.141), (21, 0.016), (22, 0.045), (23, -0.033), (24, 0.033), (25, -0.076), (26, -0.018), (27, -0.117), (28, 0.044), (29, -0.067), (30, 0.058), (31, 0.23), (32, 0.087), (33, -0.122), (34, -0.041), (35, 0.118), (36, -0.077), (37, 0.017), (38, -0.026), (39, -0.03), (40, -0.151), (41, -0.027), (42, -0.045), (43, 0.046), (44, 0.049), (45, -0.022), (46, 0.01), (47, 0.043), (48, 0.054), (49, -0.069)]
simIndex simValue paperId paperTitle
same-paper 1 0.97506601 379 cvpr-2013-Scalable Sparse Subspace Clustering
Author: Xi Peng, Lei Zhang, Zhang Yi
Abstract: In this paper, we address two problems in Sparse Subspace Clustering algorithm (SSC), i.e., scalability issue and out-of-sample problem. SSC constructs a sparse similarity graph for spectral clustering by using ?1-minimization based coefficients, has achieved state-of-the-art results for image clustering and motion segmentation. However, the time complexity of SSC is proportion to the cubic of problem size such that it is inefficient to apply SSC into large scale setting. Moreover, SSC does not handle with out-ofsample data that are not used to construct the similarity graph. For each new datum, SSC needs recalculating the cluster membership of the whole data set, which makes SSC is not competitive in fast online clustering. To address the problems, this paper proposes out-of-sample extension of SSC, named as Scalable Sparse Subspace Clustering (SSSC), which makes SSC feasible to cluster large scale data sets. The solution of SSSC adopts a ”sampling, clustering, coding, and classifying” strategy. Extensive experimental results on several popular data sets demonstrate the effectiveness and efficiency of our method comparing with the state-of-the-art algorithms.
2 0.92740232 135 cvpr-2013-Discriminative Subspace Clustering
Author: Vasileios Zografos, Liam Ellis, Rudolf Mester
Abstract: We present a novel method for clustering data drawn from a union of arbitrary dimensional subspaces, called Discriminative Subspace Clustering (DiSC). DiSC solves the subspace clustering problem by using a quadratic classifier trained from unlabeled data (clustering by classification). We generate labels by exploiting the locality of points from the same subspace and a basic affinity criterion. A number of classifiers are then diversely trained from different partitions of the data, and their results are combined together in an ensemble, in order to obtain the final clustering result. We have tested our method with 4 challenging datasets and compared against 8 state-of-the-art methods from the literature. Our results show that DiSC is a very strong performer in both accuracy and robustness, and is also of low computational complexity.
3 0.79343545 191 cvpr-2013-Graph-Laplacian PCA: Closed-Form Solution and Robustness
Author: Bo Jiang, Chris Ding, Bio Luo, Jin Tang
Abstract: Principal Component Analysis (PCA) is widely used to learn a low-dimensional representation. In many applications, both vector data X and graph data W are available. Laplacian embedding is widely used for embedding graph data. We propose a graph-Laplacian PCA (gLPCA) to learn a low dimensional representation of X that incorporates graph structures encoded in W. This model has several advantages: (1) It is a data representation model. (2) It has a compact closed-form solution and can be efficiently computed. (3) It is capable of removing corruptions. Extensive experiments on 8 datasets show promising results on image reconstruction and significant improvement on clustering and classification.
4 0.79158407 405 cvpr-2013-Sparse Subspace Denoising for Image Manifolds
Author: Bo Wang, Zhuowen Tu
Abstract: With the increasing availability of high dimensional data and demand for sophisticated data analysis algorithms, manifold learning becomes a critical technique to perform dimensionality reduction, unraveling the intrinsic data structure. Real-world data, however, often come with noise and outliers; seldom do all the data live in a single linear subspace. Inspired by the recent advances in sparse subspace learning and diffusion-based approaches, we propose a new manifold denoising algorithm in which data neighborhoods are adaptively inferred via sparse subspace reconstruction; we then derive a new formulation to perform denoising on the original data. Experiments carried out on both toy and real applications demonstrate the effectiveness of our method; it is insensitive to parameter tuning and we show significant improvement over the competing algorithms.
5 0.77208918 93 cvpr-2013-Constraints as Features
Author: Shmuel Asafi, Daniel Cohen-Or
Abstract: In this paper, we introduce a new approach to constrained clustering which treats the constraints as features. Our method augments the original feature space with additional dimensions, each of which is derived from a given Cannot-link constraint. The specified Cannot-link pair gets extreme coordinate values, and the rest of the points get coordinate values that express their spatial influence from the specified constrained pair. After augmenting all the new features, a standard unconstrained clustering algorithm can be performed, like k-means or spectral clustering. We demonstrate the efficacy of our method for active semi-supervised learning applied to image segmentation and compare it to alternative methods. We also evaluate the performance of our method on the four most commonly evaluated datasets from the UCI machine learning repository.
6 0.74927819 109 cvpr-2013-Dense Non-rigid Point-Matching Using Random Projections
7 0.73461175 215 cvpr-2013-Improved Image Set Classification via Joint Sparse Approximated Nearest Subspaces
8 0.58065677 250 cvpr-2013-Learning Cross-Domain Information Transfer for Location Recognition and Clustering
9 0.55311656 9 cvpr-2013-A Fast Semidefinite Approach to Solving Binary Quadratic Problems
10 0.53947723 253 cvpr-2013-Learning Multiple Non-linear Sub-spaces Using K-RBMs
11 0.52484787 134 cvpr-2013-Discriminative Sub-categorization
13 0.46406433 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos
14 0.46242857 429 cvpr-2013-The Generalized Laplacian Distance and Its Applications for Visual Matching
15 0.44584787 259 cvpr-2013-Learning a Manifold as an Atlas
16 0.43277287 8 cvpr-2013-A Fast Approximate AIB Algorithm for Distributional Word Clustering
17 0.4290368 64 cvpr-2013-Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification
18 0.41847429 460 cvpr-2013-Weakly-Supervised Dual Clustering for Image Semantic Segmentation
19 0.40031913 442 cvpr-2013-Transfer Sparse Coding for Robust Image Representation
20 0.38472167 366 cvpr-2013-Robust Region Grouping via Internal Patch Statistics
topicId topicWeight
[(10, 0.088), (16, 0.036), (17, 0.196), (19, 0.012), (26, 0.049), (33, 0.332), (67, 0.057), (69, 0.074), (87, 0.063)]
simIndex simValue paperId paperTitle
1 0.90672475 313 cvpr-2013-Online Dominant and Anomalous Behavior Detection in Videos
Author: Mehrsan Javan Roshtkhari, Martin D. Levine
Abstract: We present a novel approach for video parsing and simultaneous online learning of dominant and anomalous behaviors in surveillance videos. Dominant behaviors are those occurring frequently in videos and hence, usually do not attract much attention. They can be characterized by different complexities in space and time, ranging from a scene background to human activities. In contrast, an anomalous behavior is defined as having a low likelihood of occurrence. We do not employ any models of the entities in the scene in order to detect these two kinds of behaviors. In this paper, video events are learnt at each pixel without supervision using densely constructed spatio-temporal video volumes. Furthermore, the volumes are organized into large contextual graphs. These compositions are employed to construct a hierarchical codebook model for the dominant behaviors. By decomposing spatio-temporal contextual information into unique spatial and temporal contexts, the proposed framework learns the models of the dominant spatial and temporal events. Thus, it is ultimately capable of simultaneously modeling high-level behaviors as well as low-level spatial, temporal and spatio-temporal pixel level changes.
same-paper 2 0.89284337 379 cvpr-2013-Scalable Sparse Subspace Clustering
Author: Xi Peng, Lei Zhang, Zhang Yi
Abstract: In this paper, we address two problems in Sparse Subspace Clustering algorithm (SSC), i.e., scalability issue and out-of-sample problem. SSC constructs a sparse similarity graph for spectral clustering by using ?1-minimization based coefficients, has achieved state-of-the-art results for image clustering and motion segmentation. However, the time complexity of SSC is proportion to the cubic of problem size such that it is inefficient to apply SSC into large scale setting. Moreover, SSC does not handle with out-ofsample data that are not used to construct the similarity graph. For each new datum, SSC needs recalculating the cluster membership of the whole data set, which makes SSC is not competitive in fast online clustering. To address the problems, this paper proposes out-of-sample extension of SSC, named as Scalable Sparse Subspace Clustering (SSSC), which makes SSC feasible to cluster large scale data sets. The solution of SSSC adopts a ”sampling, clustering, coding, and classifying” strategy. Extensive experimental results on several popular data sets demonstrate the effectiveness and efficiency of our method comparing with the state-of-the-art algorithms.
3 0.87366688 250 cvpr-2013-Learning Cross-Domain Information Transfer for Location Recognition and Clustering
Author: Raghuraman Gopalan
Abstract: Estimating geographic location from images is a challenging problem that is receiving recent attention. In contrast to many existing methods that primarily model discriminative information corresponding to different locations, we propose joint learning of information that images across locations share and vary upon. Starting with generative and discriminative subspaces pertaining to domains, which are obtained by a hierarchical grouping of images from adjacent locations, we present a top-down approach that first models cross-domain information transfer by utilizing the geometry of these subspaces, and then encodes the model results onto individual images to infer their location. We report competitive results for location recognition and clustering on two public datasets, im2GPS and San Francisco, and empirically validate the utility of various design choices involved in the approach.
4 0.87335134 66 cvpr-2013-Block and Group Regularized Sparse Modeling for Dictionary Learning
Author: Yu-Tseh Chi, Mohsen Ali, Ajit Rajwade, Jeffrey Ho
Abstract: This paper proposes a dictionary learning framework that combines the proposed block/group (BGSC) or reconstructed block/group (R-BGSC) sparse coding schemes with the novel Intra-block Coherence Suppression Dictionary Learning (ICS-DL) algorithm. An important and distinguishing feature of the proposed framework is that all dictionary blocks are trained simultaneously with respect to each data group while the intra-block coherence is explicitly minimized as an important objective. We provide both empirical evidence and heuristic support for this feature, which can be considered a direct consequence of incorporating both the group structure for the input data and the block structure for the dictionary in the learning process. The optimization problems for both the dictionary learning and sparse coding can be solved efficiently using block-gradient descent, and the details of the optimization algorithms are presented. We evaluate the proposed methods using well-known datasets, and favorable comparisons with state-of-the-art dictionary learning methods demonstrate the viability and validity of the proposed framework.
5 0.87290674 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
Author: Wongun Choi, Yu-Wei Chao, Caroline Pantofaru, Silvio Savarese
Abstract: Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification. We present a hierarchical scene model for learning and reasoning about complex indoor scenes which is computationally tractable, can be learned from a reasonable amount of training data, and avoids oversimplification. At the core of this approach is the 3D Geometric Phrase Model which captures the semantic and geometric relationships between objects which frequently co-occur in the same 3D spatial configuration. Experiments show that this model effectively explains scene semantics, geometry and object groupings from a single image, while also improving individual object detections.
6 0.87268835 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
7 0.87201256 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories
8 0.87141138 355 cvpr-2013-Representing Videos Using Mid-level Discriminative Patches
9 0.87108564 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
10 0.87105739 405 cvpr-2013-Sparse Subspace Denoising for Image Manifolds
11 0.87095666 417 cvpr-2013-Subcategory-Aware Object Classification
12 0.87090182 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
13 0.87081724 109 cvpr-2013-Dense Non-rigid Point-Matching Using Random Projections
14 0.87077445 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes
15 0.87071657 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
16 0.87058604 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes
17 0.8705681 304 cvpr-2013-Multipath Sparse Coding Using Hierarchical Matching Pursuit
18 0.87043232 416 cvpr-2013-Studying Relationships between Human Gaze, Description, and Computer Vision
19 0.87001598 425 cvpr-2013-Tensor-Based High-Order Semantic Relation Transfer for Semantic Scene Segmentation
20 0.86998725 132 cvpr-2013-Discriminative Re-ranking of Diverse Segmentations