cvpr cvpr2013 cvpr2013-118 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Guha Balakrishnan, Fredo Durand, John Guttag
Abstract: We extract heart rate and beat lengths from videos by measuring subtle head motion caused by the Newtonian reaction to the influx of blood at each beat. Our method tracks features on the head and performs principal component analysis (PCA) to decompose their trajectories into a set of component motions. It then chooses the component that best corresponds to heartbeats based on its temporal frequency spectrum. Finally, we analyze the motion projected to this component and identify peaks of the trajectories, which correspond to heartbeats. When evaluated on 18 subjects, our approach reported heart rates nearly identical to an electrocardiogram device. Additionally we were able to capture clinically relevant information about heart rate variability.
Reference: text
sentIndex sentText sentNum sentScore
1 Detecting Pulse from Head Motions in Video Guha Balakrishnan, Fredo Durand, John Guttag MIT CSAIL {balakg , fredo , Abstract We extract heart rate and beat lengths from videos by measuring subtle head motion caused by the Newtonian reaction to the influx of blood at each beat. [sent-1, score-1.079]
2 Our method tracks features on the head and performs principal component analysis (PCA) to decompose their trajectories into a set of component motions. [sent-2, score-0.441]
3 When evaluated on 18 subjects, our approach reported heart rates nearly identical to an electrocardiogram device. [sent-5, score-0.267]
4 Additionally we were able to capture clinically relevant information about heart rate variability. [sent-6, score-0.339]
5 In this paper, we exploit subtle head oscillations that accompany the cardiac cycle to extract information about cardiac activity from videos. [sent-12, score-0.826]
6 In addition to providing an unobtrusive way of measuring heart rate, the method can be used to extract other clinically useful information about cardiac activity, such as the subtle changes in the length of heartbeats that are associated with the health of the autonomic nervous system. [sent-13, score-0.631]
7 The cyclical movement of blood from the heart to the head via the abdominal aorta and the carotid arteries (Fig. [sent-14, score-0.798]
carotid arteries on either side of the head [11]. [sent-18, score-0.353]
9 We extract an average pulse rate from this signal by examining its frequency spectrum and obtain precise beat locations with a simple peak detection algorithm. [sent-20, score-1.191]
10 Our method is complementary to the extraction of pulse rate from video via analysis of the subtle color changes in the skin caused by blood circulation [14, 18]. [sent-21, score-1.06]
11 They then either use these signals directly for analysis [18] or perform ICA to extract a single pulse wave [14]. [sent-23, score-0.733]
12 They find the frequency of maximal power in the frequency spectrum to provide a pulse estimate. [sent-24, score-0.948]
13 Philips also produced a commercial app that detects pulse from color changes in real-time [13]. [sent-25, score-0.699]
14 There have also been studies on non-invasive pulse estimation using modalities other than video such as thermal imagery [6] and photoplethysmography (measurement of the variations in transmitted or reflected light in the skin) [21]. [sent-28, score-0.749]
15 The analysis of body motion in videos has been used in different medical contexts, such as the measurement of respiration rate from chest movement [17, 13], or the monitoring of sleep apnea by recognizing abnormal respiration patterns [20]. [sent-29, score-0.568]
16 The movements involved in these approaches tend to be larger in amplitude than the involuntary head movements due to the pulse. [sent-31, score-0.542]
17 The idea of exploiting Newton’s Third Law to measure cardiac activity dates back to at least the 1930’s, when the ballistocardiogram (BCG) was invented [15]. [sent-34, score-0.325]
18 The subject was placed on a low-friction platform, and the displacement of the platform due to cardiac activity was measured. [sent-35, score-0.305]
19 Ballistocardiographic head movement of the sort studied here has generally gained less attention. [sent-38, score-0.379]
20 [7] proposed exploiting head motion measured by accelerometers for heart rate monitoring as a proxy for traditional BCG. [sent-41, score-0.667]
21 In this paper we first describe a novel technique that extracts a pulse rate and series of beat sequences from video recordings of head motion. [sent-42, score-1.242]
22 We then evaluate our system’s heart rate and beat location measurements on subjects against an electrocardiogram. [sent-43, score-0.519]
23 Results show that our method extracts accurate heart rates and can capture clinically relevant variability in cardiac activity. [sent-44, score-0.564]
24 Head Movement The head movements related to cardiac activity are small and mixed in with a variety of other involuntary head movements. [sent-48, score-0.945]
25 This structure allows the head unconstrained movement in most axes. [sent-50, score-0.379]
26 There are several sources of involuntary head movement that complicate the isolation of movements attributable to pulsatile activity. [sent-51, score-0.534]
27 One is the pendular oscillatory motion that keeps the head in dynamic equilibrium. [sent-52, score-0.342]
28 As in [7], we found that the vertical direction is the best axis to measure the movement of the upright head caused by the pulse, because the horizontal axis tends to capture most of the dynamic equilibrium swaying. [sent-54, score-1.084]
29 A second source of involuntary head movement is the bobbing caused by respiration. [sent-55, score-0.537]
30 The net acceleration of involuntary vertical head movement has been measured to be around 10 mG (≈ . [sent-57, score-0.477]
31 Using these numbers we can calculate a rough estimate of the displacement of head movement to be 21 · 0. [sent-60, score-0.379]
32 the head system, it does provide an indication of how small the movement is. [sent-64, score-0.379]
33 Beat-to-beat Variability Pulse rate captures the average heart rate over a period of time (e. [sent-67, score-0.377]
34 The most established of these measures is heart rate variability (HRV). [sent-72, score-0.325]
35 It provides an indication of the degree to which the sympathetic and parasympathetic nervous systems are modulating cardiac activity. [sent-74, score-0.263]
36 To measure HRV, the interarrival times of beats must be accurately measured, which can be done by locating the "R" peaks of successive beats in an electrocardiogram (ECG). [sent-75, score-0.278]
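The paper computes HRV from these beat interarrival times but gives no implementation. The Python sketch below is a minimal, assumed illustration of two standard time-domain HRV statistics (SDNN and RMSSD) computed from a list of beat times; the function name and the choice of statistics are ours, not the paper's.

```python
import numpy as np

def hrv_time_domain(beat_times_s):
    """Simple time-domain HRV statistics from beat arrival times (in seconds).

    Returns the mean interbeat interval, SDNN (standard deviation of the
    intervals), and RMSSD (root mean square of successive differences),
    all in milliseconds.
    """
    ibi_ms = np.diff(np.asarray(beat_times_s, dtype=float)) * 1000.0
    sdnn = ibi_ms.std(ddof=1)
    rmssd = np.sqrt(np.mean(np.diff(ibi_ms) ** 2))
    return ibi_ms.mean(), sdnn, rmssd
```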
37 Method Our method takes an input video of a person’s head and returns a pulse rate as well as a series of beat locations which can be used for the analysis of beat-to-beat variability. [sent-79, score-1.22]
38 We first extract the motion of the head using feature tracking. [sent-80, score-0.364]
39 We then isolate the motion corresponding to the pulse and project it onto a 1D signal that allows us to extract individual beat boundaries from the peaks of the trajectory. [sent-81, score-1.11]
40 We project the trajectories of feature points onto this component and extract the beat locations as local extrema. [sent-83, score-0.272]
41 We start by locating the head region and modeling head motion using trajectories of tracked feature points. [sent-87, score-0.679]
42 The trajectories have extraneous motions at frequencies outside the range of possible pulse rates, and so we temporally filter them. [sent-89, score-0.845]
43 Figure 2: Overview of our pulse estimation approach. [sent-104, score-0.673]
44 (a) A region is selected within the head and feature points are tracked for all frames of the video. [sent-105, score-0.264]
45 PCA to decompose the trajectories into a set of independent source signals that describe the main elements of the head motion. [sent-112, score-0.427]
46 Region Selection and Tracking We find a region of interest containing the head and track feature points within the region. [sent-118, score-0.264]
47 We measure the movement of the head throughout the video by selecting and tracking feature points within the region. [sent-125, score-0.409]
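The paper does not publish tracking code; the sketch below is an assumed reconstruction using OpenCV's goodFeaturesToTrack and pyramidal Lucas-Kanade tracking, keeping only the vertical coordinate of each point as suggested above. The parameter values are illustrative, and a full implementation would also discard points whose tracking status fails.

```python
import cv2
import numpy as np

def track_vertical_trajectories(frames, roi):
    """Track feature points inside a head region across grayscale frames.

    frames : list of grayscale uint8 images
    roi    : (x, y, w, h) rectangle containing the head region
    Returns y_n(t) as an array of shape (num_points, num_frames).
    """
    x, y, w, h = roi
    mask = np.zeros_like(frames[0])
    mask[y:y + h, x:x + w] = 255
    pts = cv2.goodFeaturesToTrack(frames[0], maxCorners=1000,
                                  qualityLevel=0.01, minDistance=5, mask=mask)
    traj = [pts[:, 0, 1].copy()]            # vertical coordinates in the first frame
    prev = frames[0]
    for cur in frames[1:]:
        pts, status, _ = cv2.calcOpticalFlowPyrLK(prev, cur, pts, None)
        traj.append(pts[:, 0, 1].copy())    # a full system would drop points with status == 0
        prev = cur
    return np.stack(traj, axis=1)
```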
48 Since a modern ECG device operates around 250 Hz to capture heart rate variability and our videos were only shot at 30 Hz, we apply a cubic spline interpolation to increase the sampling rate of each yn (t) to 250 Hz. [sent-131, score-0.493]
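As a minimal sketch of this upsampling step, SciPy's CubicSpline can interpolate each trajectory from the 30 Hz frame rate to 250 Hz; the helper name and the use of SciPy are assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def upsample_trajectory(y, fs_in=30.0, fs_out=250.0):
    """Cubic-spline interpolate one trajectory y_n(t) from fs_in Hz up to fs_out Hz."""
    t_in = np.arange(len(y)) / fs_in
    t_out = np.arange(0.0, t_in[-1], 1.0 / fs_out)
    return t_out, CubicSpline(t_in, y)(t_out)
```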
49 Temporal Filtering Not all frequencies of the trajectories are required or useful for pulse detection. [sent-136, score-0.774]
50 A normal adult’s resting pulse rate falls within [0. [sent-137, score-0.767]
51 This is because low-frequency movements like respiration and changes in posture have high amplitude and dominate the trajectories of the feature points. [sent-141, score-0.288]
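A minimal sketch of such a temporal filter is a zero-phase Butterworth bandpass applied to each trajectory; the cutoff frequencies and filter order below are illustrative choices bracketing plausible pulse rates, not necessarily the exact values used in the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_trajectories(Y, fs, low_hz=0.75, high_hz=5.0, order=5):
    """Zero-phase Butterworth bandpass applied to each trajectory (rows of Y).

    Y  : array of shape (num_points, num_samples)
    fs : sampling rate in Hz (e.g. 250 Hz after interpolation)
    """
    nyq = 0.5 * fs
    b, a = butter(order, [low_hz / nyq, high_hz / nyq], btype="band")
    return filtfilt(b, a, Y, axis=1)   # filter along the time axis
```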
52 PCA Decomposition The underlying source signal of interest is the movement of the head caused by the cardiovascular pulse. [sent-155, score-0.534]
53 The feature point trajectories are a mixture of this movement as well as other motions caused by sources like respiration, vestibular activity and changes in facial expression. [sent-156, score-0.334]
54 To do this we consider the multidimensional position of the head at each frame as a separate data point and use PCA to find a set of main dimensions along which the position varies. [sent-158, score-0.264]
55 We then select a dimension on which to project the position time-series to obtain the pulse signal. [sent-159, score-0.673]
56 Formally, given N feature points, we represent the N-dimensional position of the head at frame t as mt = [y1(t), y2(t), · · · , yN(t)]. [sent-160, score-0.294]
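A straightforward way to realize this decomposition is an eigen-decomposition of the covariance of the frame-wise position vectors m_t; the sketch below is a generic PCA and stands in for whatever implementation the authors actually used.

```python
import numpy as np

def pca_decompose(Y):
    """PCA on the head-position vectors m_t = [y_1(t), ..., y_N(t)].

    Y : filtered trajectories of shape (N points, T frames)
    Returns the principal directions (columns, sorted by explained variance)
    and the projected 1-D source signals s_i(t) of shape (N, T).
    """
    M = Y.T - Y.T.mean(axis=0)           # rows are the centered position vectors m_t
    cov = np.cov(M, rowvar=False)        # N x N covariance over frames
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]    # largest variance first
    eigvecs = eigvecs[:, order]
    scores = M @ eigvecs                 # s_i(t) = projection of m_t onto phi_i
    return eigvecs, scores.T
```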
57 Signal Selection The question remains of which eigenvector to use for pulse signal extraction. [sent-181, score-0.76]
58 Although φ1 explains most of the variance, s1 may not be the clearest pulse signal for analysis. [sent-183, score-0.778]
59 We label the maximal frequency of the chosen signal fpulse and approximate the pulse rate as 60 · fpulse beats per minute. [sent-192, score-1.059]
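The paper selects the component by examining the temporal frequency spectra of the projected signals. The sketch below uses a simplified stand-in criterion (the fraction of spectral power at the dominant non-DC frequency) rather than the paper's exact measure, and reports 60 · fpulse as the pulse rate.

```python
import numpy as np

def select_pulse_component(scores, fs, n_candidates=5):
    """Choose the PCA source signal s_i(t) that best resembles a pulse.

    Among the first few components, pick the one whose power spectrum is most
    concentrated at its dominant (non-DC) frequency, and report 60 * f_pulse
    as the pulse rate in beats per minute.
    """
    best_idx, best_frac, best_bpm = -1, -1.0, 0.0
    for i in range(min(n_candidates, scores.shape[0])):
        s = scores[i] - scores[i].mean()
        power = np.abs(np.fft.rfft(s)) ** 2
        freqs = np.fft.rfftfreq(s.size, d=1.0 / fs)
        power[0] = 0.0                       # ignore the DC bin
        k = int(np.argmax(power))
        frac = power[k] / power.sum()
        if frac > best_frac:
            best_idx, best_frac, best_bpm = i, frac, 60.0 * freqs[k]
    return best_idx, best_bpm
```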
60 Peak Detection Average pulse rate alone is not sufficient to fully evaluate the cardiovascular system. [sent-195, score-0.799]
61 The peaks are close to 1/fpulse seconds apart with some variability due to the natural variability of heartbeats, variations of the head motion, and noise. [sent-198, score-0.433]
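The paper describes this step only as a simple peak detection; the sketch below uses SciPy's find_peaks with a minimum spacing slightly smaller than the expected beat period 1/fpulse. The 0.6 spacing factor is an illustrative assumption, not a value from the paper.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_beats(pulse_signal, fs, f_pulse):
    """Locate beat peaks in the projected pulse signal.

    Peaks should lie roughly 1/f_pulse seconds apart, so a minimum peak
    spacing somewhat smaller than that period is enforced to suppress
    spurious local maxima.
    """
    min_dist = max(int(0.6 * fs / f_pulse), 1)
    peaks, _ = find_peaks(pulse_signal, distance=min_dist)
    return peaks / float(fs)               # beat locations in seconds
```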
62 Visible Face We extracted pulse signals from 18 subjects with a frontal view of the face (as in Fig. [sent-209, score-0.788]
63 We calculate our program’s average pulse rate using the frequency of maximal power for the selected PCA component. [sent-213, score-0.921]
64 Similarly, we compute the true pulse rate by finding the main frequency of the ECG spectrum. [sent-214, score-0.859]
65 We also evaluate the ability of our signal to capture subtle heart rate variability. [sent-220, score-0.4]
66 We account for these cases by only considering intervals with length within 25% of the average detected pulse period. [sent-225, score-0.673]
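A direct reading of this rule is the small filter below, which keeps only interbeat intervals within 25% of the detected pulse period; the helper is ours, written to match the sentence above.

```python
import numpy as np

def filter_beat_intervals(beat_times_s, f_pulse, tol=0.25):
    """Keep only interbeat intervals within `tol` of the detected pulse period."""
    intervals = np.diff(np.asarray(beat_times_s, dtype=float))
    period = 1.0 / f_pulse
    return intervals[np.abs(intervals - period) <= tol * period]
```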
67 Table 1: Average pulse rate and number of peaks detected from ECG and by our method. [sent-233, score-0.852]
68 1 Motion Amplitude Pulse motion constitutes only a part of total involuntary head movement. [sent-401, score-0.44]
69 For each subject we calculated the mean RMS amplitude of the trajectories before and after filtering to a passband within 5% of the pulse frequency. [sent-490, score-0.917]
70 The mean RMS amplitude after filtering to the pulse frequency was 0. [sent-495, score-0.861]
71 Thus the pulse motion had roughly 40% the RMS amplitude of other head motions within the [0. [sent-498, score-1.128]
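The RMS comparison can be reproduced with a small helper like the one below (a hypothetical function, included only to make the amplitude measure concrete): compute it once on trajectories filtered to the narrow band around the pulse frequency and once on the full pulse-band trajectories, then take the ratio.

```python
import numpy as np

def mean_rms_amplitude(Y):
    """Mean RMS amplitude across feature-point trajectories (rows of Y)."""
    centered = Y - Y.mean(axis=1, keepdims=True)
    return np.sqrt((centered ** 2).mean(axis=1)).mean()
```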
72 2 Comparison to Color-Based Detection and Noise Analysis We compare the robustness of our method to a color-based pulse detection system [14] in the presence of noise. [sent-502, score-0.673]
73 The source with the largest peak in the power spectrum is then chosen as the pulse signal. [sent-504, score-0.819]
74 For each subject we found σmotion, the maximum noise standard deviation before our method first produced an average pulse rate outside 5% of the true rate. [sent-506, score-0.81]
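A sketch of that evaluation loop is shown below; the sweep range, step size, and use of additive Gaussian noise on the trajectories are assumptions made for illustration, and estimate_bpm stands for whichever end-to-end estimator (motion- or color-based) is being stressed.

```python
import numpy as np

def max_tolerable_noise(estimate_bpm, Y, true_bpm,
                        sigmas=np.arange(0.0, 5.1, 0.1), tol=0.05, seed=0):
    """Largest noise std for which the estimated rate stays within 5% of truth.

    estimate_bpm : callable mapping noisy trajectories to a pulse rate in bpm
    Y            : clean feature-point trajectories, shape (num_points, num_samples)
    """
    rng = np.random.default_rng(seed)
    sigma_max = 0.0
    for sigma in sigmas:
        noisy = Y + rng.normal(0.0, sigma, size=Y.shape)
        if abs(estimate_bpm(noisy) - true_bpm) > tol * true_bpm:
            break
        sigma_max = sigma
    return sigma_max
```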
75 Note that color failed to produce a correct pulse rate for one of the subjects. [sent-511, score-0.836]
76 This indicates that our method performs best for subjects with a large ballistocardiac motion relative to any other periodic head movement. [sent-524, score-0.45]
77 However, when simplifying the method to extracting a signal from the G channel alone, we found that noise performance is indeed strongly related to the ratio of power at the pulse frequency to the next largest power in the spectrum. [sent-527, score-0.888]
78 Figure 6: Log plot comparing σmotion (the max noise standard deviation before our method produced an incorrect pulse rate) and β (ratio of the total energy of feature points at the pulse frequency to the maximal energy at any other frequency) for each subject. [sent-536, score-1.47]
79 We also obtained a video of the baby’s actual pulse rate from a hospital-grade monitor measuring the perfusion of blood to its skin. [sent-541, score-0.902]
80 Our algorithm extracts a clean pulse signal that matches the pulse rate reported by the monitor. [sent-542, score-1.525]
81 Figure 7: Reference frames from two videos of the back of the head and one of a face covered with a mask. [sent-543, score-0.306]
82 Discussion Our results show it is possible to consistently obtain accurate pulse rate measurements from head motion. [sent-545, score-1.031]
83 The actual heart rate is about 152 bpm (top right). [sent-549, score-0.315]
84 Typically heart rate variability (HRV) is used to dichotomize patients into high and low risk groups, so the precise shape of the distribution is not relevant. [sent-554, score-0.397]
85 Second, extra variability might be introduced during the pulse transit time from the abdominal aorta to the head. [sent-561, score-0.775]
86 In particular, arterial compliance and head mechanics could affect our results. [sent-562, score-0.288]
87 This is complicated because, as our results show, even other involuntary head movements are quite large in relation to pulse motion. [sent-568, score-1.092]
88 In this work we considered the frequency and variability of the pulse signal. [sent-570, score-0.765]
89 However, head movement can offer other information about the cardiac cycle. [sent-571, score-0.603]
90 If head displacement is proportional to the force of blood being pumped by the heart, it may serve as a useful metric to estimate blood stroke volume and cardiac output. [sent-572, score-0.65]
91 Another future direction is to better assess the strengths and weaknesses of the color and motion pulse estimation methods. [sent-575, score-0.777]
92 Our method takes video as input and uses feature tracking to extract heart rate and beat measurements from the subtle head motion caused by the Newtonian reaction to the pumping of blood at each heartbeat. [sent-584, score-1.035]
93 A combination of frequency filtering and PCA allows us to identify the component of motion corresponding to the pulse and we then extract peaks of the trajectory to identify individual beats. [sent-585, score-1.02]
94 Contact-free measurement of cardiac pulse based on the analysis of thermal imagery. [sent-610, score-0.95]
95 A continuous, wearable, and wireless heart monitor using head ballistocardiogram (bcg) and head electrocardiogram (ecg). [sent-614, score-0.804]
96 A new method for unconstrained pulse arrival time measurement on a chair. [sent-623, score-0.702]
97 Human head anatomy with external and internal carotid arteries. [sent-637, score-0.327]
98 Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. [sent-652, score-0.955]
99 Studies on the estimation of cardiac output in man, and of abnormalities in cardiac function, from the heart's recoil and the blood's impacts; the ballistocardiogram. [sent-656, score-0.448]
100 Computationally generated cardiac biomarkers for risk stratification after acute coronary syndrome. [sent-660, score-0.278]
wordName wordTfidf (topN-words)
[('pulse', 0.673), ('ecg', 0.299), ('head', 0.264), ('cardiac', 0.224), ('heart', 0.189), ('beat', 0.159), ('hrv', 0.126), ('movement', 0.115), ('hz', 0.108), ('involuntary', 0.098), ('rate', 0.094), ('frequency', 0.092), ('peaks', 0.085), ('blood', 0.081), ('motion', 0.078), ('subjects', 0.077), ('trajectories', 0.073), ('beats', 0.073), ('skin', 0.07), ('respiration', 0.07), ('amplitude', 0.066), ('signal', 0.063), ('ballistocardiogram', 0.063), ('carotid', 0.063), ('peak', 0.059), ('pca', 0.058), ('movements', 0.057), ('clinically', 0.056), ('subtle', 0.054), ('bcg', 0.047), ('electrocardiogram', 0.047), ('heartbeats', 0.047), ('motions', 0.047), ('ica', 0.045), ('rms', 0.044), ('subject', 0.043), ('monitoring', 0.042), ('videos', 0.042), ('variability', 0.042), ('clearest', 0.042), ('patients', 0.042), ('clinical', 0.04), ('component', 0.04), ('nervous', 0.039), ('activity', 0.038), ('signals', 0.038), ('eng', 0.035), ('distributions', 0.033), ('caused', 0.032), ('maximal', 0.032), ('yn', 0.032), ('abdominal', 0.032), ('biol', 0.032), ('bpm', 0.032), ('cardiovascular', 0.032), ('epilepsy', 0.032), ('fpulse', 0.032), ('fredo', 0.032), ('heightwise', 0.032), ('palsy', 0.032), ('passband', 0.032), ('reaction', 0.032), ('periodic', 0.031), ('rates', 0.031), ('isolate', 0.03), ('wearable', 0.03), ('eigenvectors', 0.03), ('power', 0.03), ('risk', 0.03), ('filtering', 0.03), ('mt', 0.03), ('video', 0.03), ('measurement', 0.029), ('facial', 0.029), ('spectrum', 0.029), ('med', 0.029), ('aorta', 0.028), ('sleep', 0.028), ('source', 0.028), ('frequencies', 0.028), ('opencv', 0.027), ('philips', 0.026), ('physiological', 0.026), ('arteries', 0.026), ('color', 0.026), ('newtonian', 0.024), ('extraneous', 0.024), ('acute', 0.024), ('thermal', 0.024), ('gait', 0.024), ('decompose', 0.024), ('affect', 0.024), ('eigenvector', 0.024), ('monitor', 0.024), ('cerebral', 0.023), ('talking', 0.022), ('posture', 0.022), ('studies', 0.022), ('extract', 0.022), ('extracts', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 118 cvpr-2013-Detecting Pulse from Head Motions in Video
Author: Guha Balakrishnan, Fredo Durand, John Guttag
Abstract: We extract heart rate and beat lengths from videos by measuring subtle head motion caused by the Newtonian reaction to the influx of blood at each beat. Our method tracks features on the head and performs principal component analysis (PCA) to decompose their trajectories into a set of component motions. It then chooses the component that best corresponds to heartbeats based on its temporal frequency spectrum. Finally, we analyze the motion projected to this component and identify peaks of the trajectories, which correspond to heartbeats. When evaluated on 18 subjects, our approach reported heart rates nearly identical to an electrocardiogram device. Additionally we were able to capture clinically relevant information about heart rate variability.
2 0.079998642 214 cvpr-2013-Image Understanding from Experts' Eyes by Modeling Perceptual Skill of Diagnostic Reasoning Processes
Author: Rui Li, Pengcheng Shi, Anne R. Haake
Abstract: Eliciting and representing experts' remarkable perceptual capability of locating, identifying and categorizing objects in images specific to their domains of expertise will benefit image understanding in terms of transferring human domain knowledge and perceptual expertise into image-based computational procedures. In this paper, we present a hierarchical probabilistic framework to summarize the stereotypical and idiosyncratic eye movement patterns shared within 11 board-certified dermatologists while they are examining and diagnosing medical images. Each inferred eye movement pattern characterizes the similar temporal and spatial properties of its corresponding segments of the experts' eye movement sequences. We further discover a subset of distinctive eye movement patterns which are commonly exhibited across multiple images. Based on the combinations of the exhibitions of these eye movement patterns, we are able to categorize the images from the perspective of experts' viewing strategies. In each category, images share similar lesion distributions and configurations. The performance of our approach shows that modeling physicians' diagnostic viewing behaviors informs about medical images' understanding to correct diagnosis.
3 0.07006155 170 cvpr-2013-Fast Rigid Motion Segmentation via Incrementally-Complex Local Models
Author: Fernando Flores-Mangas, Allan D. Jepson
Abstract: The problem of rigid motion segmentation of trajectory data under orthography has been long solved for nondegenerate motions in the absence of noise. But because real trajectory data often incorporates noise, outliers, motion degeneracies and motion dependencies, recently proposed motion segmentation methods resort to non-trivial representations to achieve state of the art segmentation accuracies, at the expense of a large computational cost. This paper proposes a method that dramatically reduces this cost (by two or three orders of magnitude) with minimal accuracy loss (from 98.8% achieved by the state of the art, to 96.2% achieved by our method on the standard Hopkins 155 dataset). Computational efficiency comes from the use of a simple but powerful representation of motion that explicitly incorporates mechanisms to deal with noise, outliers and motion degeneracies. Subsets of motion models with the best balance between prediction accuracy and model complexity are chosen from a pool of candidates, which are then used for segmentation.
4 0.068363875 33 cvpr-2013-Active Contours with Group Similarity
Author: Xiaowei Zhou, Xiaojie Huang, James S. Duncan, Weichuan Yu
Abstract: Active contours are widely used in image segmentation. To cope with missing or misleading features in images, researchers have introduced various ways to model the prior of shapes and use the prior to constrain active contours. However, the shape prior is usually learnt from a large set of annotated data, which is not always accessible in practice. Moreover, it is often doubted that the existing shapes in the training set will be sufficient to model the new instance in the testing image. In this paper, we propose to use the group similarity of object shapes in multiple images as a prior to aid segmentation, which can be interpreted as an unsupervised approach of shape prior modeling. We show that the rank of the matrix consisting of multiple shapes is a good measure of the group similarity of the shapes, and the nuclear norm minimization is a simple and effective way to impose the proposed constraint on existing active contour models. Moreover, we develop a fast algorithm to solve the proposed model by using the accelerated proximal method. Experiments using echocardiographic image sequences acquired from acute canine experiments demonstrate that the proposed method can consistently improve the performance of active contour models and increase the robustness against image defects such as missing boundaries.
5 0.067583896 175 cvpr-2013-First-Person Activity Recognition: What Are They Doing to Me?
Author: Michael S. Ryoo, Larry Matthies
Abstract: This paper discusses the problem of recognizing interaction-level human activities from a first-person viewpoint. The goal is to enable an observer (e.g., a robot or a wearable camera) to understand ‘what activity others are performing to it’ from continuous video inputs. These include friendly interactions such as ‘a person hugging the observer’ as well as hostile interactions like ‘punching the observer’ or ‘throwing objects to the observer’, whose videos involve a large amount of camera ego-motion caused by physical interactions. The paper investigates multichannel kernels to integrate global and local motion information, and presents a new activity learning/recognition methodology that explicitly considers temporal structures displayed in first-person activity videos. In our experiments, we not only show classification results with segmented videos, but also confirm that our new approach is able to detect activities from continuous videos reliably.
6 0.064370483 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
7 0.061817795 244 cvpr-2013-Large Displacement Optical Flow from Nearest Neighbor Fields
8 0.061553247 203 cvpr-2013-Hierarchical Video Representation with Trajectory Binary Partition Tree
9 0.061510023 233 cvpr-2013-Joint Sparsity-Based Representation and Analysis of Unconstrained Activities
10 0.059910927 332 cvpr-2013-Pixel-Level Hand Detection in Ego-centric Videos
11 0.058586452 113 cvpr-2013-Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video
12 0.056539077 59 cvpr-2013-Better Exploiting Motion for Better Action Recognition
13 0.055899955 299 cvpr-2013-Multi-source Multi-scale Counting in Extremely Dense Crowd Images
14 0.04882665 272 cvpr-2013-Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery
15 0.048242908 441 cvpr-2013-Tracking Sports Players with Context-Conditioned Motion Models
16 0.047377579 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
17 0.047160003 378 cvpr-2013-Sampling Strategies for Real-Time Action Recognition
18 0.045830745 430 cvpr-2013-The SVM-Minus Similarity Score for Video Face Recognition
19 0.04580066 77 cvpr-2013-Capturing Complex Spatio-temporal Relations among Facial Muscles for Facial Expression Recognition
20 0.045517448 40 cvpr-2013-An Approach to Pose-Based Action Recognition
topicId topicWeight
[(0, 0.115), (1, 0.017), (2, -0.011), (3, -0.033), (4, -0.046), (5, -0.008), (6, 0.001), (7, -0.054), (8, 0.029), (9, 0.007), (10, 0.044), (11, 0.005), (12, 0.016), (13, -0.023), (14, 0.056), (15, 0.04), (16, 0.026), (17, 0.012), (18, -0.001), (19, -0.018), (20, -0.017), (21, 0.013), (22, -0.037), (23, -0.014), (24, -0.006), (25, 0.024), (26, 0.026), (27, -0.036), (28, -0.011), (29, -0.01), (30, -0.019), (31, -0.003), (32, -0.036), (33, 0.002), (34, -0.003), (35, -0.015), (36, -0.005), (37, 0.033), (38, -0.055), (39, 0.012), (40, -0.05), (41, 0.041), (42, -0.052), (43, 0.096), (44, -0.051), (45, -0.006), (46, 0.022), (47, -0.01), (48, -0.025), (49, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.90850896 118 cvpr-2013-Detecting Pulse from Head Motions in Video
Author: Guha Balakrishnan, Fredo Durand, John Guttag
Abstract: We extract heart rate and beat lengths from videos by measuring subtle head motion caused by the Newtonian reaction to the influx of blood at each beat. Our method tracks features on the head and performs principal component analysis (PCA) to decompose their trajectories into a set of component motions. It then chooses the component that best corresponds to heartbeats based on its temporal frequency spectrum. Finally, we analyze the motion projected to this component and identify peaks of the trajectories, which correspond to heartbeats. When evaluated on 18 subjects, our approach reported heart rates nearly identical to an electrocardiogram device. Additionally we were able to capture clinically relevant information about heart rate variability.
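The pipeline outlined in this abstract (track features, PCA-decompose their trajectories, pick the component with the cleanest pulse-band spectrum, then detect peaks) can be sketched roughly as below. The band limits, the spectral-purity criterion, the number of components examined and the peak-spacing rule are illustrative assumptions, not the authors' exact settings.

```python
import numpy as np
from scipy.signal import find_peaks

def pulse_from_trajectories(Y, fs, band=(0.75, 5.0)):
    """Y: (T, N) vertical positions of N tracked head features over T frames,
    sampled at fs Hz. Returns (heart_rate_bpm, beat_frame_indices)."""
    Yc = Y - Y.mean(axis=0)                         # mean-subtract each trajectory
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    components = Yc @ Vt.T                          # PCA component time series

    freqs = np.fft.rfftfreq(len(Yc), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])

    best, best_purity, best_f = None, -1.0, None
    for c in components.T[:5]:                      # inspect the first few components
        power = np.abs(np.fft.rfft(c)) ** 2
        k = int(np.argmax(power * in_band))         # dominant in-band frequency bin
        purity = power[k] / power[in_band].sum()    # how concentrated the spectrum is
        if purity > best_purity:
            best, best_purity, best_f = c, purity, freqs[k]

    min_spacing = max(1, int(0.5 * fs / best_f))    # peaks at least half a beat apart
    peaks, _ = find_peaks(best, distance=min_spacing)
    return 60.0 * best_f, peaks
```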
2 0.70035982 214 cvpr-2013-Image Understanding from Experts' Eyes by Modeling Perceptual Skill of Diagnostic Reasoning Processes
Author: Rui Li, Pengcheng Shi, Anne R. Haake
Abstract: Eliciting and representing experts' remarkable perceptual capability of locating, identifying and categorizing objects in images specific to their domains of expertise will benefit image understanding in terms of transferring human domain knowledge and perceptual expertise into image-based computational procedures. In this paper, we present a hierarchical probabilistic framework to summarize the stereotypical and idiosyncratic eye movement patterns shared within 11 board-certified dermatologists while they are examining and diagnosing medical images. Each inferred eye movement pattern characterizes the similar temporal and spatial properties of its corresponding segments of the experts' eye movement sequences. We further discover a subset of distinctive eye movement patterns which are commonly exhibited across multiple images. Based on the combinations in which these eye movement patterns are exhibited, we are able to categorize the images from the perspective of experts' viewing strategies. In each category, images share similar lesion distributions and configurations. Our results show that modeling physicians' diagnostic viewing behaviors informs the understanding of medical images toward correct diagnosis.
3 0.63825381 159 cvpr-2013-Expressive Visual Text-to-Speech Using Active Appearance Models
Author: Robert Anderson, Björn Stenger, Vincent Wan, Roberto Cipolla
Abstract: This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a ‘talking head’, given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies, comparing the output of different systems.
4 0.62062812 137 cvpr-2013-Dynamic Scene Classification: Learning Motion Descriptors with Slow Features Analysis
Author: Christian Thériault, Nicolas Thome, Matthieu Cord
Abstract: In this paper, we address the challenging problem of categorizing video sequences composed of dynamic natural scenes. Contrary to previous methods that rely on handcrafted descriptors, we propose here to represent videos using unsupervised learning of motion features. Our method encompasses three main contributions: 1) Based on the Slow Feature Analysis principle, we introduce a learned local motion descriptor which represents the principal and most stable motion components of training videos. 2) We integrate our local motion feature into a global coding/pooling architecture in order to provide an effective signature for each video sequence. 3) We report state-of-the-art classification performance on two challenging natural scene data sets. In particular, an outstanding improvement of 11% in classification score is reached on a data set introduced in 2012.
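The Slow Feature Analysis principle invoked above can be made concrete with the standard linear SFA solver: the slowest-varying projections of a signal are the generalized eigenvectors of the temporal-difference covariance against the signal covariance. The sketch below shows only this generic step, with dimensions and names of our own choosing; the authors' learned local descriptor and coding/pooling architecture are not reproduced.

```python
import numpy as np
from scipy.linalg import eigh

def linear_sfa(X, n_features=8):
    """Linear Slow Feature Analysis.

    X: (T, D) time series of local measurements. Returns a (D, n_features)
    projection whose outputs vary as slowly as possible over time.
    """
    Xc = X - X.mean(axis=0)
    dX = np.diff(Xc, axis=0)             # temporal derivatives
    A = dX.T @ dX / (len(dX) - 1)        # covariance of the derivatives
    B = Xc.T @ Xc / (len(Xc) - 1)        # covariance of the signal
    _, W = eigh(A, B)                    # eigenvalues ascending: first columns are slowest
    return W[:, :n_features]

# Toy usage: 500 frames of 32-D measurements.
X = np.cumsum(np.random.randn(500, 32), axis=0)
slow_features = (X - X.mean(axis=0)) @ linear_sfa(X)
```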
5 0.58609545 203 cvpr-2013-Hierarchical Video Representation with Trajectory Binary Partition Tree
Author: Guillem Palou, Philippe Salembier
Abstract: As an early stage of video processing, we introduce an iterative trajectory merging algorithm that produces a region-based and hierarchical representation of the video sequence, called the Trajectory Binary Partition Tree (BPT). From this representation, many analysis and graph cut techniques can be used to extract partitions or objects that are useful in the context of specific applications. In order to define trajectories and to create a precise merging algorithm, color and motion cues have to be used. Both types of information are very useful for characterizing objects but behave quite differently in the spatial and temporal dimensions. On the one hand, scenes and objects are rich in their spatial color distributions, but these distributions are rather stable over time. Object motion, on the other hand, presents simple structures and low spatial variability but may change from frame to frame. The proposed algorithm takes into account this key difference and relies on different models and associated metrics to deal with color and motion information. We show that the proposed algorithm outperforms existing hierarchical video segmentation algorithms and provides more stable and precise regions.
6 0.57975066 282 cvpr-2013-Measuring Crowd Collectiveness
7 0.57968432 358 cvpr-2013-Robust Canonical Time Warping for the Alignment of Grossly Corrupted Sequences
8 0.57038486 37 cvpr-2013-Adherent Raindrop Detection and Removal in Video
9 0.56855476 353 cvpr-2013-Relative Hidden Markov Models for Evaluating Motion Skill
10 0.56657356 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
11 0.56092185 454 cvpr-2013-Video Enhancement of People Wearing Polarized Glasses: Darkening Reversal and Reflection Reduction
12 0.54632729 313 cvpr-2013-Online Dominant and Anomalous Behavior Detection in Videos
13 0.5440833 103 cvpr-2013-Decoding Children's Social Behavior
14 0.54078424 46 cvpr-2013-Articulated and Restricted Motion Subspaces and Their Signatures
15 0.5389666 88 cvpr-2013-Compressible Motion Fields
16 0.5340457 333 cvpr-2013-Plane-Based Content Preserving Warps for Video Stabilization
17 0.52885878 77 cvpr-2013-Capturing Complex Spatio-temporal Relations among Facial Muscles for Facial Expression Recognition
18 0.51125592 100 cvpr-2013-Crossing the Line: Crowd Counting by Integer Programming with Local Features
19 0.50959677 170 cvpr-2013-Fast Rigid Motion Segmentation via Incrementally-Complex Local Models
20 0.486931 59 cvpr-2013-Better Exploiting Motion for Better Action Recognition
topicId topicWeight
[(10, 0.088), (16, 0.406), (26, 0.042), (33, 0.194), (39, 0.012), (67, 0.078), (69, 0.027), (87, 0.039)]
simIndex simValue paperId paperTitle
1 0.90619183 410 cvpr-2013-Specular Reflection Separation Using Dark Channel Prior
Author: Hyeongwoo Kim, Hailin Jin, Sunil Hadap, Inso Kweon
Abstract: We present a novel method to separate specular reflection from a single image. Separating an image into diffuse and specular components is an ill-posed problem due to lack of observations. Existing methods rely on a specular-free image to detect and estimate specularity, which however may confuse diffuse pixels with the same hue but a different saturation value as specular pixels. Our method is based on a novel observation that for most natural images the dark channel can provide an approximate specular-free image. We also propose a maximum a posteriori formulation which robustly recovers the specular reflection and chromaticity despite the hue-saturation ambiguity. We demonstrate the effectiveness of the proposed algorithm on real and synthetic examples. Experimental results show that our method significantly outperforms the state-of-the-art methods in separating specular reflection.
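The key observation in this abstract is that the dark channel of a natural image gives an approximate specular-free image. A minimal sketch of that intuition follows; the function names are ours, and the paper's MAP estimation of specular reflection and chromaticity is not reproduced. Under roughly white illumination the specular component adds about equally to all color channels, so subtracting the per-pixel channel minimum suppresses it.

```python
import numpy as np

def dark_channel(image):
    """Per-pixel minimum over the color channels of an (H, W, 3) RGB image in [0, 1]."""
    return image.min(axis=2)

def rough_specular_free(image):
    """Crude specular-free approximation: subtract the dark channel from every channel.

    For most natural (chromatic, diffuse) surfaces the dark channel is close to
    zero, so what remains in it is largely the achromatic specular highlight.
    """
    return image - dark_channel(image)[..., None]
```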
same-paper 2 0.8023802 118 cvpr-2013-Detecting Pulse from Head Motions in Video
Author: Guha Balakrishnan, Fredo Durand, John Guttag
Abstract: We extract heart rate and beat lengths from videos by measuring subtle head motion caused by the Newtonian reaction to the influx of blood at each beat. Our method tracks features on the head and performs principal component analysis (PCA) to decompose their trajectories into a set of component motions. It then chooses the component that best corresponds to heartbeats based on its temporal frequency spectrum. Finally, we analyze the motion projected to this component and identify peaks of the trajectories, which correspond to heartbeats. When evaluated on 18 subjects, our approach reported heart rates nearly identical to an electrocardiogram device. Additionally we were able to capture clinically relevant information about heart rate variability.
3 0.76522011 27 cvpr-2013-A Theory of Refractive Photo-Light-Path Triangulation
Author: Visesh Chari, Peter Sturm
Abstract: 3D reconstruction of transparent refractive objects like a plastic bottle is challenging: they lack appearance related visual cues and merely reflect and refract light from the surrounding environment. Amongst several approaches to reconstruct such objects, the seminal work of Light-Path triangulation [17] is highly popular because of its general applicability and analysis of minimal scenarios. A light-path is defined as the piece-wise linear path taken by a ray of light as it passes from source, through the object and into the camera. Transparent refractive objects not only affect the geometric configuration of light-paths but also their radiometric properties. In this paper, we describe a method that combines both geometric and radiometric information to do reconstruction. We show two major consequences of the addition of radiometric cues to the light-path setup. Firstly, we extend the case of scenarios in which reconstruction is plausible while reducing the minimal requirements for a unique reconstruction. This happens as a consequence of the fact that radiometric cues add an additional known variable to the already existing system of equations. Secondly, we present a simple algorithm for reconstruction, owing to the nature of the radiometric cue. We present several synthetic experiments to validate our theories, and show high quality reconstructions in challenging scenarios.
4 0.74717593 224 cvpr-2013-Information Consensus for Distributed Multi-target Tracking
Author: Ahmed T. Kamal, Jay A. Farrell, Amit K. Roy-Chowdhury
Abstract: Due to their high fault-tolerance, ease of installation and scalability to large networks, distributed algorithms have recently gained immense popularity in the sensor networks community, especially in computer vision. Multitarget tracking in a camera network is one of the fundamental problems in this domain. Distributed estimation algorithms work by exchanging information between sensors that are communication neighbors. Since most cameras are directional sensors, it is often the case that neighboring sensors may not be sensing the same target. Such sensors that do not have information about a target are termed as “naive ” with respect to that target. In this paper, we propose consensus-based distributed multi-target tracking algorithms in a camera network that are designed to address this issue of naivety. The estimation errors in tracking and data association, as well as the effect of naivety, are jointly addressed leading to the development of an informationweighted consensus algorithm, which we term as the Multitarget Information Consensus (MTIC) algorithm. The incorporation of the probabilistic data association mecha- nism makes the MTIC algorithm very robust to false measurements/clutter. Experimental analysis is provided to support the theoretical results.
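The method above builds on consensus-style distributed estimation, in which each camera repeatedly exchanges state with its communication neighbors. The sketch below shows only the generic average-consensus iteration that such methods extend; the paper's information weighting, handling of naive nodes and probabilistic data association are not reproduced, and the step size and toy network are our assumptions.

```python
import numpy as np

def average_consensus(states, neighbors, epsilon=0.2, iterations=50):
    """Plain average consensus over a sensor network.

    states: (N, d) array, one state estimate per node.
    neighbors: dict mapping node index -> list of neighbor indices.
    Update: x_i <- x_i + epsilon * sum_j (x_j - x_i); with a small enough
    epsilon every node converges to the network-wide average.
    """
    x = states.astype(float).copy()
    for _ in range(iterations):
        x_new = x.copy()
        for i, nbrs in neighbors.items():
            x_new[i] = x[i] + epsilon * sum(x[j] - x[i] for j in nbrs)
        x = x_new
    return x

# Four cameras in a ring, each holding a noisy 2-D estimate of the same target.
states = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [1.1, 2.0]])
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(average_consensus(states, ring))
```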
5 0.7420966 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes
Author: Junjie Yan, Xucong Zhang, Zhen Lei, Shengcai Liao, Stan Z. Li
Abstract: The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques [14, 23]. In this paper, we take pedestrian detection in different resolutions as different but related problems, and propose a Multi-Task model to jointly consider their commonness and differences. The model contains resolution aware transformations to map pedestrians in different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate descent procedure to learn the resolution aware transformations and deformable part model (DPM) based detector iteratively. In traffic scenes, there are many false positives located around vehicles, therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when the vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms previous state-of-the-art (71%).
6 0.73407251 271 cvpr-2013-Locally Aligned Feature Transforms across Views
7 0.72518331 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition
8 0.69017923 403 cvpr-2013-Sparse Output Coding for Large-Scale Visual Recognition
10 0.61633694 349 cvpr-2013-Reconstructing Gas Flows Using Light-Path Approximation
11 0.60778326 454 cvpr-2013-Video Enhancement of People Wearing Polarized Glasses: Darkening Reversal and Reflection Reduction
12 0.60564232 361 cvpr-2013-Robust Feature Matching with Alternate Hough and Inverted Hough Transforms
13 0.5972048 443 cvpr-2013-Uncalibrated Photometric Stereo for Unknown Isotropic Reflectances
14 0.59520566 130 cvpr-2013-Discriminative Color Descriptors
15 0.59170526 115 cvpr-2013-Depth Super Resolution by Rigid Body Self-Similarity in 3D
16 0.58570009 352 cvpr-2013-Recovering Stereo Pairs from Anaglyphs
17 0.58321393 269 cvpr-2013-Light Field Distortion Feature for Transparent Object Recognition
18 0.58196789 149 cvpr-2013-Evaluation of Color STIPs for Human Action Recognition
19 0.57790279 384 cvpr-2013-Segment-Tree Based Cost Aggregation for Stereo Matching
20 0.57605898 37 cvpr-2013-Adherent Raindrop Detection and Removal in Video