iccv iccv2013 iccv2013-58 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ernesto Brau, Jinyan Guan, Kyle Simek, Luca Del Pero, Colin Reimer Dawson, Kobus Barnard
Abstract: We develop a Bayesian modeling approach for tracking people in 3D from monocular video with unknown cameras. Modeling in 3D provides natural explanations for occlusions and smoothness discontinuities that result from projection, and allows priors on velocity and smoothness to be grounded in physical quantities: meters and seconds vs. pixels and frames. We pose the problem in the context of data association, in which observations are assigned to tracks. A correct application of Bayesian inference to multitarget tracking must address the fact that the model’s dimension changes as tracks are added or removed, and thus, posterior densities of different hypotheses are not comparable. We address this by marginalizing out the trajectory parameters so the resulting posterior over data associations has constant dimension. This is made tractable by using (a) Gaussian process priors for smooth trajectories and (b) approximately Gaussian likelihood functions. Our approach provides a principled method for incorporating multiple sources of evidence; we present results using both optical flow and object detector outputs. Results are comparable to recent work on 3D tracking and, unlike others, our method requires no pre-calibrated cameras.
Reference: text
sentIndex sentText sentNum sentScore
1 Bayesian 3D tracking from monocular video. Ernesto Brau† [sent-1, score-0.235]
2 †Computer Science, University of Arizona. Abstract. Jinyan Guan† [sent-6, score-0.255]
3 Colin Reimer Dawson‡ [sent-10, score-0.255]
4 ∗School of Informatics, University of Edinburgh for tracking an unknown and changing number of people in a scene using video taken from a single, fixed viewpoint. [sent-14, score-0.286]
5 We develop a Bayesian modeling approach for tracking people in 3D from monocular video with unknown cameras. [sent-15, score-0.311]
6 A correct application of Bayesian inference to multitarget tracking must address the fact that the model’s dimension changes as tracks are added or removed, and thus, posterior densities of different hypotheses are not comparable. [sent-19, score-0.525]
7 We address this by marginalizing out the trajectory parameters so the resulting posterior over data associations has constant dimension. [sent-20, score-0.275]
8 Our approach provides a principled method for incorporating multiple sources of evidence; we present results using both optical flow and object detector outputs. [sent-22, score-0.219]
9 We infer camera parameters and people’s sizes as part of the tracking process. [sent-31, score-0.29]
10 Given a model hypothesis, we project each person cylinder into each frame using the current camera, computing their visibility as a consequence of any existing occlusion. [sent-42, score-0.232]
11 We then evaluate the hypothesis using evidence from the output of person detectors and optical flow. [sent-43, score-0.222]
12 , [32, 23, 1]) and classical approaches like tracking as following evidence locally in time as is common in filtering methods (e. [sent-46, score-0.246]
13 To track multiple people in videos we infer an association between persons and detections, collaterally determining a likely set of 3D trajectories for the people in the scene. [sent-51, score-0.849]
14 During inference we also sample the global parameters for the video, which include the camera and the false detection rate; we consider the latter to be a function of the scene background. [sent-54, score-0.211]
15 Our data association approach extends that of Oh et al. [sent-56, score-0.41]
16 [8] who used Gaussian processes for trajectory smoothness while searching over associations by sampling. [sent-59, score-0.278]
17 Others [40, 7] use a similar data association model, but propose an effective non-sampling approach for inference. [sent-60, score-0.41]
18 All these efforts focus on association of points alone; neither appearance nor geometry is considered. [sent-61, score-0.41]
19 In particular, Isard and MacCormick [21] use a 3D cylinder model for multi-person tracking using a single, known camera. [sent-65, score-0.269]
20 Similarly, there is other work in tracking objects on the 3D ground plane [16, 13, 28] without considering data association. [sent-67, score-0.209]
21 Other approaches estimate data association as well as model parameters [39, 19, 10]. [sent-68, score-0.41]
22 However, we model data association explicitly in a generative way, as opposed to estimating it as a by-product of inference. [sent-69, score-0.446]
23 Andriyenko and Schindler [3] pose data association as an integer linear program. [sent-71, score-0.41]
24 [5] attempt to solve both data association and trajectory estimation problems using similar modeling ideas as in their previous work. [sent-74, score-0.515]
25 In contrast to our work, they simultaneously optimize both association and trajectory energy functions, which results in a space of varying dimensionality. [sent-75, score-0.515]
26 Model, priors, and likelihood. In the data-association treatment of the multi-target tracking problem [30, 8], an unknown number of objects (targets) move in a volume, producing observations (detections) at discrete times. [sent-78, score-0.24]
27 The objective is to determine the association, ω, which specifies which detections were produced by which target, as well as which were generated spuriously. [sent-79, score-0.219]
28 Here, the targets are the people moving around the ground plane, and the observations (B) are detection boxes obtained by running a person detector [14] on each frame of a video. [sent-80, score-0.411]
29 The prior over associations contains priors over quantities like the number of tracks and the number of detections per track. [sent-82, score-0.489]
30 In our model, each person in the scene has a 3D configuration zr, which is composed of their trajectory (a sequence of points on the ground plane) and their size, which consists of height, width, and girth. [sent-84, score-0.204]
31 We also model evidence from optical flow features [26], I. [sent-85, score-0.26]
32 Using all this, we can compute the likelihood function of an association by integrating? [sent-86, score-0.478]
33 Association. Formally, an association ω = {τr ⊂ B} (r = 0, . . . , m) is a partition of the set of detections B, where τ1 , . [sent-96, score-0.41]
34 The association entity is based on well-known work by Oh et al. [sent-101, score-0.41]
35 [31], but we extend that work by (1) allowing tracks to produce multiple measurements at any given frame and (2) employing a prior on associations which allows parameters governing track dynamics and detector behavior to adapt to the environment of a particular video. [sent-102, score-0.57]
36 We assume an association is the result of the following generative process. [sent-103, score-0.446]
37 Σ (r = 1, . . . , m) art as. (a) Graphical model for an association. (b) Graphical model after association. Figure 1. [sent-116, score-0.82]
38 e, and l are the number of tracks created at each frame and their lengths; n and A are the detections from noise and tracks, respectively; ω is the resulting association. [sent-120, score-0.473]
39 (b) Graphical model of the joint distribution, omitting details about the association prior. [sent-122, score-0.41]
40 , τm}) and γ = (κ, λN) are parameters for the association prior; xr denote trajectories, and dr are the dimensions of objects; C denotes the camera; B is the detection data and I the image optical flow data. [sent-126, score-0.385]
41 Noise detections and noise optical flow vectors are omitted. [sent-128, score-0.405]
42 θ, the number of true detections at frame t, and Nt = nt + at as the total number of detections at t. [sent-129, score-0.512]
43 Finally, a fully-specified assignment in frame t is a permutation of its Nt detections, with the first nt associated to noise, the next a1t associated to the first track in the frame, etc. [sent-130, score-0.322]
44 The number of detections per target per frame, as well as the number of noisy detections, are also Poisson distributed, with parameters λA and λN, respectively. [sent-136, score-0.219]
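The Poisson counts described here can be illustrated with a minimal sketch. It assumes the per-frame counts are drawn independently; the function names `sample_poisson` and `sample_frame_counts` and the `num_tracks` argument are hypothetical and not taken from the paper.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson variate with Knuth's multiplication method (small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_frame_counts(num_tracks, lam_a, lam_n, rng):
    """Draw detection counts for one frame.

    num_tracks: number of tracks alive in the frame (hypothetical input).
    lam_a: rate of detections per target per frame (lambda_A in the text).
    lam_n: rate of noise detections per frame (lambda_N in the text).
    """
    per_track = [sample_poisson(lam_a, rng) for _ in range(num_tracks)]
    noise = sample_poisson(lam_n, rng)
    total = noise + sum(per_track)  # N_t = n_t + a_t
    return per_track, noise, total


rng = random.Random(0)
per_track, noise, total = sample_frame_counts(3, lam_a=1.0, lam_n=0.5, rng=rng)
```

The total returned is the quantity the text calls Nt, the sum of noise detections and true detections in the frame.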
45 (a) An association with two tracks that span a video of five frames. [sent-139, score-0.59]
46 The red boxes make up τ1 and the blue boxes are τ2, while the black boxes are part of the set of false alarms τ0. [sent-140, score-0.551]
47 (b) The corresponding 3D scene with two trajectories z1 and z2, whose colors correspond to the tracks in (a). [sent-141, score-0.285]
48 Although τ1 has no detections at time t + 3, z1 still exists there with position x14. [sent-142, score-0.219]
49 total tracks m, entrances e, exits d, true detections a, noisy detections n, and track lengths l, as well as the number of ways to permute track labels within frames, and detections within tracks and frames. [sent-143, score-1.391]
50 Scene and Camera. Each track τr ∈ ω has a corresponding trajectory on the ground plane. [sent-158, score-0.292]
51 The trajectory corresponding to track τr is xr = (xr1, . [sent-159, score-0.257]
52 The length lr of trajectory xr is determined by the first and last detections of track τr. [sent-163, score-0.734]
53 Note that, while τr contains no elements for frames where the person was not detected, xr j is specified for every j between the track’s initial and final frame. [sent-164, score-0.213]
54 We will denote the 3D configuration of track τr by zr = (xr, dr). [sent-166, score-0.245]
55 Specifically, trajectory xr is the curve generated by a sample from a GP with inputs Sr = {1, . [sent-168, score-0.257]
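As an illustration of drawing a smooth ground-plane trajectory from a GP over frame indices, here is a minimal sketch. The squared-exponential kernel, the independent treatment of the two ground-plane axes, the jitter term, and all function and parameter names are assumptions made for the example; the extracted text does not state the kernel the authors use.

```python
import math
import random


def se_kernel(inputs, scale, signal_var, jitter=1e-6):
    """Squared-exponential covariance matrix over frame indices (assumed kernel)."""
    n = len(inputs)
    return [[signal_var * math.exp(-0.5 * ((inputs[i] - inputs[j]) / scale) ** 2)
             + (jitter if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(matrix):
    """Return lower-triangular L such that L L^T equals the input matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_trajectory(length, scale, signal_var, rng):
    """Sample `length` ground-plane points with GP inputs 1, ..., length."""
    inputs = list(range(1, length + 1))
    lower = cholesky(se_kernel(inputs, scale, signal_var))
    coords = []
    for _ in range(2):  # one independent draw per ground-plane axis
        noise = [rng.gauss(0.0, 1.0) for _ in inputs]
        coords.append([sum(lower[i][k] * noise[k] for k in range(i + 1))
                       for i in range(length)])
    return list(zip(coords[0], coords[1]))


trajectory = sample_trajectory(10, scale=3.0, signal_var=1.0, rng=random.Random(1))
```

A larger `scale` gives smoother paths, which is one way a smoothness prior can be expressed in frames (or, after rescaling, in seconds).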
56 Assuming independence between parameters, the camera prior is p(C) = p(η | μη, ση) p(ψ | μψ, σψ) p(f | μf, σf), where C = (η, ψ, f). [sent-192, score-0.194]
57 We convert a 3D scene to a 2D representation by transforming every cylinder at every frame into a 2D box in the image via the camera. [sent-194, score-0.314]
58 Given a trajectory element xrj, we take uniformly-spaced (3D) points on the rims of the cylinder, project them onto the image plane using the camera C and find the minimum bounding box hrj around the resulting 2D points. [sent-195, score-0.595]
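The projection step can be sketched as follows. This is an illustration only: it assumes a circular cylinder, a pinhole camera at a given height above the ground plane that is pitched about its horizontal axis, and image coordinates centered on the principal point with y pointing down. The names `project_point`, `cylinder_box`, `cam_height`, `pitch`, and `focal` are hypothetical, and visibility under occlusion is not handled.

```python
import math


def project_point(point, cam_height, pitch, focal):
    """Project a world point (x, y up, z forward) into image coordinates."""
    x, y, z = point
    dy = y - cam_height
    # Depth along the optical axis of a camera pitched downward by `pitch`.
    depth = -dy * math.sin(pitch) + z * math.cos(pitch)
    if depth <= 0:
        raise ValueError("point is behind the camera")
    cam_y = dy * math.cos(pitch) + z * math.sin(pitch)
    return focal * x / depth, -focal * cam_y / depth


def cylinder_box(center, height, radius, cam_height, pitch, focal, samples=16):
    """Bounding box (left, top, right, bottom) of the projected cylinder rims."""
    cx, cz = center
    us, vs = [], []
    for rim_y in (0.0, height):  # bottom rim on the ground, then top rim
        for k in range(samples):
            angle = 2.0 * math.pi * k / samples
            point = (cx + radius * math.cos(angle), rim_y,
                     cz + radius * math.sin(angle))
            u, v = project_point(point, cam_height, pitch, focal)
            us.append(u)
            vs.append(v)
    return min(us), min(vs), max(us), max(vs)


left, top, right, bottom = cylinder_box(
    (0.0, 10.0), height=1.8, radius=0.3, cam_height=2.0, pitch=0.0, focal=500.0)
```

Taking the minimum and maximum of the projected rim points gives the minimum bounding box around them, which plays the role of the model box hrj in the text.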
59 hrj that is not occluded from the camera, as follows. [sent-198, score-0.263]
60 First, we run various person detectors on the video frames to get bounding boxes Bt = {bt1, . [sent-210, score-0.208]
61 That is, for any assigned data box btj ∈ τr, r ≠ 0, and the corresponding model box hrt = C(xrt, dr) (for simplicity, assume track τr starts at t = 1), we have [sent-234, score-0.28]
62 zr in frame j gets projected via the camera onto the image plane, and model box hrj is computed around it. [sent-236, score-0.585]
63 Bottom-left: The likelihood for the x component of hrj (blue) given one of its corresponding data boxes b ∈ B (dark red), i. [sent-237, score-0.478]
64 hrj; that is, b^x_tj − h^x_rt ∼ Laplace(μx, σx) (see Figure 3, bottom-left), which implies that b^x_tj | h^x_rt ∼ Laplace(h^x_rt + μx, σx), h^top_rt, h^bot_rt. [sent-247, score-0.316]
65 , p(b^x_tj) = 1/wI and p(b^top_tj) = 1/hI for all false alarms btj ∈ τ0, where wI and hI are the width and height of the image. [sent-250, score-0.225]
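A minimal sketch of the two densities discussed here: a Laplace-distributed offset between a data box component and its model box component, and a uniform density over the image for false alarms. It assumes the second Laplace parameter is the usual scale, and it covers only the x and top components that the text lists for false alarms; the function names are hypothetical.

```python
import math


def laplace_logpdf(value, loc, scale):
    """Log density of a Laplace distribution with location `loc`."""
    return -math.log(2.0 * scale) - abs(value - loc) / scale


def box_component_loglik(data_value, model_value, mu, sigma):
    """log p(b | h) when the offset b - h is Laplace(mu, sigma)."""
    return laplace_logpdf(data_value, model_value + mu, sigma)


def false_alarm_loglik(image_width, image_height):
    """Uniform log density over the x position and top edge of a noise box."""
    return -math.log(image_width) - math.log(image_height)


assigned = box_component_loglik(105.0, 100.0, mu=5.0, sigma=2.0)
noise = false_alarm_loglik(640.0, 480.0)
```

Shifting the location by the model value is what turns the statement about the offset into a conditional density for the data box given the model box.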
66 Combining all these factors, and considering conditional independence, we get a box likelihood p(B | z, ω, C) given by ? [sent-251, score-0.214]
67 B\τ0 where h(b) is the model box of the cylinder for the target and frame corresponding to box b, and φB = (μx, σx, μtop, σtop, μbot, σbot). [sent-255, score-0.381]
68 Let IB be the set of boxes of all sizes and locations that fit within the image, and v¯t (b) be the average of the optical flow vectors from frame t contained in box b. [sent-258, score-0.512]
69 hr,t+1, and let urt = (u^x_rt, u^y_rt) be the difference of their centers (called the model direction) and v = (vx, vy) ∈ It be the average flow vector that corresponds to the box of location and size equal to hrt. [sent-264, score-0.204]
70 parsity of the trajectory boxes and dividing by the constant ? [sent-278, score-0.252]
71 (5) Finally, since detection boxes and optical flow are conditionally independent, we have that p(D | z, ω, C) = p(B | z, ω, C) p(I | z, ω, C). [sent-282, score-0.333]
72 Inference. We wish to find the MAP estimate of ω as a good solution to the data association problem. [sent-302, score-0.41]
73 In addition, we need to infer the camera parameters C, and the association prior parameters γ = (κ, θ, λN), which we consider functions of the video. [sent-303, score-0.567]
74 To search the space of associations and associated parameters we use Markov chain Monte Carlo (MCMC) sampling techniques. [sent-306, score-0.199]
75 The blue and red boxes belong to tracks τ1 and τ2, respectively, and the black boxes are part of the false alarms τ0. [sent-310, score-0.584]
76 If none of the detections from time t is assigned, we stop growing τm? [sent-348, score-0.219]
77 The new association is then set to. [Figure caption] Blue boxes represent the last detections of the track; the red line is fit to their bottoms and extrapolates the ideal position of the new boxes, represented by the center of the concentric circles. [sent-350, score-0.776]
78 The black boxes are then appended to the track based on their distance from the ideal point (e. [sent-351, score-0.334]
79 In both, the resulting association is ω′. [sent-371, score-0.41]
80 We replace the standard MCMCDA merge and split moves with alternatives that exploit the fact that we allow tracks to contain multiple detections from a single frame. [sent-375, score-0.399]
81 ) proportional to the probability of birthing track τr? [sent-379, score-0.187]
82 We then choo∪se τ a pair based on those probabilities, and the resulting track becomes τr = τr? [sent-383, score-0.187]
83 To split track τr, we first choose two frames t and t′ [sent-392, score-0.187]
84 First select tracks r1 and r2 uniformly, and choose one detection from each track (with indices j and k) such that their locations are within a distance v times their temporal offset. [sent-408, score-0.367]
85 Then, the detections after j in track r1 and those before k in track r2 are swapped. [sent-409, score-0.593]
86 We use HMC to sample from the camera posterior p(C | γ, ω, B, I) ∝ p(B, I | C, ω) p(C), as this has proved effective in the task of camera estimation under a similar parametrization [12]. [sent-432, score-0.25]
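For readers unfamiliar with HMC (Hamiltonian Monte Carlo), the following is a generic single-step sketch with a leapfrog integrator and a Metropolis acceptance test. It is not the authors' sampler: the target here is a standard normal used as a stand-in for the camera posterior, and the step size, number of leapfrog steps, and all names are assumptions chosen for the example.

```python
import math
import random


def hmc_step(position, log_prob, grad_log_prob, step_size, num_steps, rng):
    """One HMC transition; returns (new_position, accepted)."""
    momentum = [rng.gauss(0.0, 1.0) for _ in position]
    start_energy = -log_prob(position) + 0.5 * sum(p * p for p in momentum)

    new_pos = list(position)
    grad = grad_log_prob(new_pos)
    # Leapfrog: half momentum step, alternating full steps, final half step.
    new_mom = [p + 0.5 * step_size * g for p, g in zip(momentum, grad)]
    for step in range(num_steps):
        new_pos = [q + step_size * p for q, p in zip(new_pos, new_mom)]
        grad = grad_log_prob(new_pos)
        scale = step_size if step < num_steps - 1 else 0.5 * step_size
        new_mom = [p + scale * g for p, g in zip(new_mom, grad)]

    end_energy = -log_prob(new_pos) + 0.5 * sum(p * p for p in new_mom)
    # Accept with probability min(1, exp(start_energy - end_energy)).
    if math.log(rng.random() + 1e-300) < start_energy - end_energy:
        return new_pos, True
    return list(position), False


def log_prob(q):
    """Stand-in target: standard normal log density up to a constant."""
    return -0.5 * sum(v * v for v in q)


def grad_log_prob(q):
    return [-v for v in q]


rng = random.Random(0)
state, samples, accepted = [3.0], [], 0
for _ in range(2000):
    state, ok = hmc_step(state, log_prob, grad_log_prob, 0.2, 8, rng)
    samples.append(state[0])
    accepted += ok
```

In a camera-estimation setting, `log_prob` would be replaced by the log of the unnormalized posterior and `grad_log_prob` by its gradient with respect to the camera parameters.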
87 For image data, we precomputed the dense optical flow of each frame using existing software [26]. [sent-440, score-0.26]
88 To calibrate relevant parameters of the generative model, we match each detection box to the ground truth box with which it has maximum overlapping area, provided it is greater than 50%, otherwise it is counted as a false detection. [sent-446, score-0.295]
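The matching rule can be sketched as below. The text does not say how the 50% is normalized, so this example picks the ground-truth box with the largest intersection area and then applies the threshold to intersection-over-union; that normalization, the box layout, and the function names are assumptions.

```python
def intersection_area(a, b):
    """Intersection area of two boxes given as (left, top, right, bottom)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def box_area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def match_detections(detections, ground_truth, threshold=0.5):
    """Return a ground-truth index per detection, or None for a false detection."""
    matches = []
    for det in detections:
        best_index, best_overlap = None, 0.0
        for index, gt in enumerate(ground_truth):
            overlap = intersection_area(det, gt)
            if overlap > best_overlap:
                best_index, best_overlap = index, overlap
        if best_index is not None:
            union = box_area(det) + box_area(ground_truth[best_index]) - best_overlap
            if best_overlap / union <= threshold:  # assumed IoU normalization
                best_index = None
        matches.append(best_index)
    return matches


ground_truth = [(0, 0, 10, 10), (20, 0, 30, 10)]
detections = [(1, 0, 11, 10), (50, 50, 60, 60), (20, 0, 30, 4)]
matches = match_detections(detections, ground_truth)
```

Detections mapped to `None` would be counted as false detections when calibrating the noise rate, and the remaining ones contribute to the per-target detection rate.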
89 For the former, we simply average the number of detections associated to each ground truth box; we estimate the latter using a maximum likelihood approach 1http://www. [sent-448, score-0.287]
90 The sampler is initialized with an empty association (ω = {}), and a camera C which is fit to the data B under the box likelihood (eq. [sent-453, score-0.495]
91 We use the CLEAR metrics [38], which consist of two measurements: multiple object tracking accuracy (MOTA) and multiple object tracking precision (MOTP). [sent-460, score-0.344]
92 MOTA is a measure of false positives, missed targets and track switches, and ranges from −∞ to 1, with 1 being a perfect score. [sent-461, score-0.289]
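The range quoted for MOTA follows from its standard CLEAR definition, one minus the ratio of total errors to total ground-truth objects, with all counts summed over frames. A minimal sketch, with a hypothetical function name and argument names:

```python
def mota(misses, false_positives, switches, num_ground_truth):
    """Multiple object tracking accuracy from counts summed over all frames."""
    if num_ground_truth <= 0:
        raise ValueError("need at least one ground-truth object")
    errors = misses + false_positives + switches
    return 1.0 - errors / num_ground_truth


perfect = mota(0, 0, 0, 10)
typical = mota(2, 1, 1, 10)
poor = mota(15, 10, 5, 10)
```

Because false positives are not bounded by the number of ground-truth objects, the error ratio can exceed one, which is why the score has no lower bound.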
93 Not surprisingly, the performance took the greatest blow when the tracker ignored optical flow features. [sent-470, score-0.247]
94 Discussion We presented a tracker which incorporates representations for data association and 3D scene in a principled way. [sent-520, score-0.509]
95 A generative statistical model for tracking multiple smooth trajectories. [sent-583, score-0.208]
96 Simultaneous 3d object tracking and camera parameter estimation by bayesian methods and transdimensional mcmc sampling. [sent-723, score-0.367]
97 Bayesian formulation of data association and markov chain monte carlo data association. [sent-732, score-0.704]
98 Markov chain Monte Carlo data association for general multiple target tracking problems. [sent-738, score-0.648]
99 Efficient track linking methods for track graphs using network-flow and set-cover techniques. [sent-802, score-0.374]
100 Coupling detection and data association for multiple object tracking. [sent-809, score-0.41]
wordName wordTfidf (topN-words)
[('association', 0.41), ('hrj', 0.263), ('detections', 0.219), ('track', 0.187), ('tracks', 0.18), ('tracking', 0.172), ('xr', 0.152), ('boxes', 0.147), ('laplace', 0.139), ('andriyenko', 0.122), ('monte', 0.109), ('trajectory', 0.105), ('box', 0.105), ('ona', 0.102), ('flow', 0.099), ('cylinder', 0.097), ('brau', 0.093), ('associations', 0.09), ('optical', 0.087), ('camera', 0.085), ('posterior', 0.08), ('carlo', 0.079), ('btxj', 0.079), ('emai', 0.079), ('hmc', 0.079), ('hrxt', 0.079), ('people', 0.076), ('ari', 0.074), ('tud', 0.074), ('frame', 0.074), ('evidence', 0.074), ('lr', 0.071), ('independence', 0.07), ('birth', 0.07), ('btj', 0.07), ('vague', 0.07), ('likelihood', 0.068), ('trajectories', 0.067), ('chain', 0.066), ('pets', 0.065), ('oh', 0.064), ('monocular', 0.063), ('bayesian', 0.062), ('tracker', 0.061), ('alarms', 0.061), ('nt', 0.061), ('person', 0.061), ('pages', 0.058), ('zr', 0.058), ('multitarget', 0.054), ('vx', 0.054), ('targets', 0.053), ('bttojp', 0.053), ('choo', 0.053), ('gilks', 0.053), ('hrxj', 0.053), ('kobus', 0.053), ('mcmcda', 0.053), ('methodmotamotpmtml', 0.053), ('pois', 0.053), ('urxt', 0.053), ('likelihoods', 0.052), ('priors', 0.051), ('dz', 0.05), ('graphical', 0.05), ('schindler', 0.05), ('false', 0.049), ('integral', 0.048), ('mcmc', 0.048), ('dr', 0.047), ('richardson', 0.047), ('stadtmitte', 0.047), ('smoothness', 0.046), ('uniformly', 0.046), ('height', 0.045), ('motp', 0.043), ('sampling', 0.043), ('conditional', 0.041), ('guan', 0.041), ('markov', 0.04), ('prior', 0.039), ('inference', 0.039), ('pero', 0.039), ('arizona', 0.039), ('mota', 0.039), ('scene', 0.038), ('isard', 0.037), ('processes', 0.037), ('vy', 0.037), ('plane', 0.037), ('grow', 0.037), ('generative', 0.036), ('proposal', 0.035), ('del', 0.035), ('berlin', 0.034), ('campus', 0.034), ('infer', 0.033), ('sources', 0.033), ('barnard', 0.032), ('thick', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
2 0.44184953 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
Author: Aleksandr V. Segal, Ian Reid
Abstract: We propose a novel parametrization of the data association problem for multi-target tracking. In our formulation, the number of targets is implicitly inferred together with the data association, effectively solving data association and model selection as a single inference problem. The novel formulation allows us to interpret data association and tracking as a single Switching Linear Dynamical System (SLDS). We compute an approximate posterior solution to this problem using a dynamic programming/message passing technique. This inference-based approach allows us to incorporate richer probabilistic models into the tracking system. In particular, we incorporate inference over inliers/outliers and track termination times into the system. We evaluate our approach on publicly available datasets and demonstrate results competitive with, and in some cases exceeding the state of the art.
3 0.20039405 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
Author: Siyu Tang, Mykhaylo Andriluka, Anton Milan, Konrad Schindler, Stefan Roth, Bernt Schiele
Abstract: People tracking in crowded real-world scenes is challenging due to frequent and long-term occlusions. Recent tracking methods obtain the image evidence from object (people) detectors, but typically use off-the-shelf detectors and treat them as black box components. In this paper we argue that for best performance one should explicitly train people detectors on failure cases of the overall tracker instead. To that end, we first propose a novel joint people detector that combines a state-of-the-art single person detector with a detector for pairs of people, which explicitly exploits common patterns of person-person occlusions across multiple viewpoints that are a frequent failure case for tracking in crowded scenes. To explicitly address remaining failure modes of the tracker we explore two methods. First, we analyze typical failures of trackers and train a detector explicitly on these cases. And second, we train the detector with the people tracker in the loop, focusing on the most common tracker failures. We show that our joint multi-person detector significantly improves both detection accuracy as well as tracker performance, improving the state-of-the-art on standard benchmarks.
4 0.18087563 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
Author: Shuran Song, Jianxiong Xiao
Abstract: Despite significant progress, tracking is still considered to be a very challenging task. Recently, the increasing popularity of depth sensors has made it possible to obtain reliable depth easily. This may be a game changer for tracking, since depth can be used to prevent model drift and handle occlusion. We also observe that current tracking algorithms are mostly evaluated on a very small number of videos collected and annotated by different groups. The lack of a reasonable size and consistently constructed benchmark has prevented a persuasive comparison among different algorithms. In this paper, we construct a unified benchmark dataset of 100 RGBD videos with high diversity, propose different kinds of RGBD tracking algorithms using 2D or 3D model, and present a quantitative comparison of various algorithms with RGB or RGBD input. We aim to lay the foundation for further research in both RGB and RGBD tracking, and our benchmark is available at http://tracking.cs.princeton.edu.
5 0.17091504 120 iccv-2013-Discriminative Label Propagation for Multi-object Tracking with Sporadic Appearance Features
Author: K.C. Amit Kumar, Christophe De_Vleeschouwer
Abstract: Given a set of plausible detections, detected at each time instant independently, we investigate how to associate them across time. This is done by propagating labels on a set of graphs that capture how the spatio-temporal and the appearance cues promote the assignment of identical or distinct labels to a pair of nodes. The graph construction is driven by the locally linear embedding (LLE) of either the spatio-temporal or the appearance features associated to the detections. Interestingly, the neighborhood of a node in each appearance graph is defined to include all nodes for which the appearance feature is available (except the ones that coexist at the same time). This allows to connect the nodes that share the same appearance even if they are temporally distant, which gives our framework the uncommon ability to exploit the appearance features that are available only sporadically along the sequence of detections. Once the graphs have been defined, the multi-object tracking is formulated as the problem of finding a label assignment that is consistent with the constraints captured by each of the graphs. This results into a difference of convex program that can be efficiently solved. Experiments are performed on a basketball and several well-known pedestrian datasets in order to validate the effectiveness of the proposed solution.
6 0.15982096 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
7 0.1536568 87 iccv-2013-Conservation Tracking
8 0.15193519 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
9 0.14901215 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
10 0.14079638 39 iccv-2013-Action Recognition with Improved Trajectories
11 0.14013487 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
12 0.13077052 289 iccv-2013-Network Principles for SfM: Disambiguating Repeated Structures with Local Context
13 0.13026261 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation
14 0.1302601 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
15 0.1300163 216 iccv-2013-Inferring "Dark Matter" and "Dark Energy" from Videos
16 0.12365983 317 iccv-2013-Piecewise Rigid Scene Flow
17 0.12172788 439 iccv-2013-Video Co-segmentation for Meaningful Action Extraction
18 0.1156773 143 iccv-2013-Estimating Human Pose with Flowing Puppets
19 0.11194871 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
20 0.11183956 433 iccv-2013-Understanding High-Level Semantics by Modeling Traffic Patterns
topicId topicWeight
[(0, 0.263), (1, -0.109), (2, 0.045), (3, 0.119), (4, 0.061), (5, -0.036), (6, -0.092), (7, 0.147), (8, -0.018), (9, 0.17), (10, -0.029), (11, -0.137), (12, 0.086), (13, 0.06), (14, 0.028), (15, 0.007), (16, 0.001), (17, 0.077), (18, -0.027), (19, -0.023), (20, -0.14), (21, -0.074), (22, 0.092), (23, -0.016), (24, 0.055), (25, 0.033), (26, 0.08), (27, -0.195), (28, -0.067), (29, -0.022), (30, -0.048), (31, -0.009), (32, 0.046), (33, -0.065), (34, -0.008), (35, 0.091), (36, 0.109), (37, -0.016), (38, 0.044), (39, 0.016), (40, 0.107), (41, 0.017), (42, 0.003), (43, -0.068), (44, -0.173), (45, -0.064), (46, 0.001), (47, -0.008), (48, -0.017), (49, -0.077)]
simIndex simValue paperId paperTitle
same-paper 1 0.9608618 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
2 0.88579822 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
Author: Aleksandr V. Segal, Ian Reid
Abstract: We propose a novel parametrization of the data association problem for multi-target tracking. In our formulation, the number of targets is implicitly inferred together with the data association, effectively solving data association and model selection as a single inference problem. The novel formulation allows us to interpret data association and tracking as a single Switching Linear Dynamical System (SLDS). We compute an approximate posterior solution to this problem using a dynamic programming/message passing technique. This inference-based approach allows us to incorporate richer probabilistic models into the tracking system. In particular, we incorporate inference over inliers/outliers and track termination times into the system. We evaluate our approach on publicly available datasets and demonstrate results competitive with, and in some cases exceeding the state of the art.
3 0.76924294 87 iccv-2013-Conservation Tracking
Author: Martin Schiegg, Philipp Hanslovsky, Bernhard X. Kausler, Lars Hufnagel, Fred A. Hamprecht
Abstract: The quality of any tracking-by-assignment hinges on the accuracy of the foregoing target detection / segmentation step. In many kinds of images, errors in this first stage are unavoidable. These errors then propagate to, and corrupt, the tracking result. Our main contribution is the first probabilistic graphical model that can explicitly account for over- and undersegmentation errors even when the number of tracking targets is unknown and when they may divide, as in cell cultures. The tracking model we present implements global consistency constraints for the number of targets comprised by each detection and is solved to global optimality on reasonably large 2D+t and 3D+t datasets. In addition, we empirically demonstrate the effectiveness of a postprocessing that allows to establish target identity even across occlusion / undersegmentation. The usefulness and efficiency of this new tracking method is demonstrated on three different and challenging 2D+t and 3D+t datasets from developmental biology.
4 0.70552999 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation
Author: Seunghoon Hong, Suha Kwak, Bohyung Han
Abstract: We propose a novel offline tracking algorithm based on model-averaged posterior estimation through patch matching across frames. Contrary to existing online and offline tracking methods, our algorithm is not based on temporally ordered estimates of target state but attempts to select easy-to-track frames first out of the remaining ones without exploiting temporal coherency of the target. The posterior of the selected frame is estimated by propagating densities from the already tracked frames in a recursive manner. The density propagation across frames is implemented by an efficient patch matching technique, which is useful for our algorithm since it does not require a motion smoothness assumption. Also, we present a hierarchical approach, where a small set of key frames are tracked first and non-key frames are handled by local key frames. Our tracking algorithm is conceptually well-suited for sequences with abrupt motion, shot changes, and occlusion. We compare our tracking algorithm with existing techniques in real videos with such challenges and illustrate its superior performance qualitatively and quantitatively.
5 0.68141198 418 iccv-2013-The Way They Move: Tracking Multiple Targets with Similar Appearance
Author: Caglayan Dicle, Octavia I. Camps, Mario Sznaier
Abstract: We introduce a computationally efficient algorithm for multi-object tracking by detection that addresses four main challenges: appearance similarity among targets, missing data due to targets being out of the field of view or occluded behind other objects, crossing trajectories, and camera motion. The proposed method uses motion dynamics as a cue to distinguish targets with similar appearance, minimize target mis-identification and recover missing data. Computational efficiency is achieved by using a Generalized Linear Assignment (GLA) coupled with efficient procedures to recover missing data and estimate the complexity of the underlying dynamics. The proposed approach works with tracklets of arbitrary length and does not assume a dynamical model a priori, yet it captures the overall motion dynamics of the targets. Experiments using challenging videos show that this framework can handle complex target motions, non-stationary cameras and long occlusions, in scenarios where appearance cues are unavailable or poor.
6 0.65191162 433 iccv-2013-Understanding High-Level Semantics by Modeling Traffic Patterns
7 0.6337778 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
8 0.62207073 393 iccv-2013-Simultaneous Clustering and Tracklet Linking for Multi-face Tracking in Videos
9 0.59652483 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
10 0.57887346 395 iccv-2013-Slice Sampling Particle Belief Propagation
11 0.56542271 289 iccv-2013-Network Principles for SfM: Disambiguating Repeated Structures with Local Context
12 0.56372708 120 iccv-2013-Discriminative Label Propagation for Multi-object Tracking with Sporadic Appearance Features
13 0.56049871 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
14 0.55183816 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
15 0.54674071 200 iccv-2013-Higher Order Matching for Consistent Multiple Target Tracking
16 0.53692657 128 iccv-2013-Dynamic Probabilistic Volumetric Models
17 0.53291428 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
18 0.52571493 216 iccv-2013-Inferring "Dark Matter" and "Dark Energy" from Videos
19 0.52093792 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
20 0.51615584 168 iccv-2013-Finding the Best from the Second Bests - Inhibiting Subjective Bias in Evaluation of Visual Tracking Algorithms
topicId topicWeight
[(2, 0.035), (7, 0.019), (26, 0.063), (31, 0.03), (34, 0.025), (40, 0.013), (42, 0.078), (64, 0.095), (73, 0.379), (89, 0.166), (95, 0.015), (98, 0.01)]
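The topic weights above form a sparse vector over topic IDs, and listings like the one that follows are typically ranked by some similarity between such vectors. The source does not state which measure produced the simValue column, so the following is only a minimal sketch that assumes cosine similarity. The helper names and the second paper's weights are hypothetical and used purely for illustration.

```python
import ast
import math


def parse_topic_weights(text):
    """Parse a '[(topicId, weight), ...]' line into a {topicId: weight} dict."""
    return {int(topic): float(weight) for topic, weight in ast.literal_eval(text)}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[topic] for topic, weight in a.items() if topic in b)
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Topic weights copied from the list above for this paper (iccv2013-58).
this_paper = parse_topic_weights(
    "[(2, 0.035), (7, 0.019), (26, 0.063), (31, 0.03), (34, 0.025), (40, 0.013), "
    "(42, 0.078), (64, 0.095), (73, 0.379), (89, 0.166), (95, 0.015), (98, 0.01)]"
)

# Hypothetical topic vector for another paper; these numbers are made up.
other_paper = {12: 0.05, 42: 0.10, 73: 0.30, 89: 0.20}

print(cosine_similarity(this_paper, this_paper))
print(cosine_similarity(this_paper, other_paper))
```

Under plain cosine similarity a vector compared with itself scores 1, whereas the listing below gives the same paper a value of 0.83, so the computation actually used must differ from this sketch in some way.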
simIndex simValue paperId paperTitle
1 0.87492585 394 iccv-2013-Single-Patch Low-Rank Prior for Non-pointwise Impulse Noise Removal
Author: Ruixuan Wang, Emanuele Trucco
Abstract: This paper introduces a ‘low-rank prior’ for small oriented noise-free image patches: considering an oriented patch as a matrix, a low-rank matrix approximation is enough to preserve the texture details in the properly oriented patch. Based on this prior, we propose a single-patch method within a generalized joint low-rank and sparse matrix recovery framework to simultaneously detect and remove non-pointwise random-valued impulse noise (e.g., very small blobs). A weighting matrix is incorporated in the framework to encode an initial estimate of the spatial noise distribution. An accelerated proximal gradient method is adapted to estimate the optimal noise-free image patches. Experiments show the effectiveness of our framework in removing non-pointwise random-valued impulse noise.
2 0.86401582 98 iccv-2013-Cross-Field Joint Image Restoration via Scale Map
Author: Qiong Yan, Xiaoyong Shen, Li Xu, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, Jiaya Jia
Abstract: Color, infrared, and flash images captured in different fields can be employed to effectively eliminate noise and other visual artifacts. We propose a two-image restoration framework considering input images in different fields, for example, one noisy color image and one dark-flashed near-infrared image. The major issue in such a framework is to handle structure divergence and find commonly usable edges and smooth transition for visually compelling image reconstruction. We introduce a scale map as a competent representation to explicitly model derivative-level confidence and propose new functions and a numerical solver to effectively infer it following new structural observations. Our method is general and shows a principled way for cross-field restoration.
same-paper 3 0.83235943 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
Author: Ernesto Brau, Jinyan Guan, Kyle Simek, Luca Del Pero, Colin Reimer Dawson, Kobus Barnard
Abstract: We develop a Bayesian modeling approach for tracking people in 3D from monocular video with unknown cameras. The goal is to track an unknown and changing number of people in a scene using video taken from a single, fixed viewpoint. Modeling in 3D provides natural explanations for occlusions and smoothness discontinuities that result from projection, and allows priors on velocity and smoothness to be grounded in physical quantities: meters and seconds vs. pixels and frames. We pose the problem in the context of data association, in which observations are assigned to tracks. A correct application of Bayesian inference to multitarget tracking must address the fact that the model’s dimension changes as tracks are added or removed, and thus, posterior densities of different hypotheses are not comparable. We address this by marginalizing out the trajectory parameters so the resulting posterior over data associations has constant dimension. This is made tractable by using (a) Gaussian process priors for smooth trajectories and (b) approximately Gaussian likelihood functions. Our approach provides a principled method for incorporating multiple sources of evidence; we present results using both optical flow and object detector outputs. Results are comparable to recent work on 3D tracking and, unlike others, our method requires no pre-calibrated cameras.
4 0.79025847 283 iccv-2013-Multiple Non-rigid Surface Detection and Registration
Author: Yi Wu, Yoshihisa Ijiri, Ming-Hsuan Yang
Abstract: Detecting and registering nonrigid surfaces are two important research problems for computer vision. Much work has been done with the assumption that there exists only one instance in the image. In this work, we propose an algorithm that detects and registers multiple nonrigid instances of given objects in a cluttered image. Specifically, after we use low level feature points to obtain the initial matches between templates and the input image, a novel high-order affinity graph is constructed to model the consistency of local topology. A hierarchical clustering approach is then used to locate the nonrigid surfaces. To remove the outliers in the cluster, we propose a deterministic annealing approach based on the Thin Plate Spline (TPS) model. The proposed method achieves high accuracy even when the number of outliers is nineteen times larger than the inliers. As the matches may appear sparsely in each instance, we propose a TPS based match growing approach to propagate the matches. Finally, an approach that fuses feature and appearance information is proposed to register each nonrigid surface. Extensive experiments and evaluations demonstrate that the proposed algorithm achieves promising results in detecting and registering multiple non-rigid surfaces in a cluttered scene.
5 0.77321994 12 iccv-2013-A General Dense Image Matching Framework Combining Direct and Feature-Based Costs
Author: Jim Braux-Zin, Romain Dupont, Adrien Bartoli
Abstract: Dense motion field estimation (typically optical flow, stereo disparity and surface registration) is a key computer vision problem. Many solutions have been proposed to compute small or large displacements, narrow or wide baseline stereo disparity, but a unified methodology is still lacking. We here introduce a general framework that robustly combines direct and feature-based matching. The feature-based cost is built around a novel robust distance function that handles keypoints and “weak” features such as segments. It allows us to use putative feature matches which may contain mismatches to guide dense motion estimation out of local minima. Our framework uses a robust direct data term (AD-Census). It is implemented with a powerful second order Total Generalized Variation regularization with external and self-occlusion reasoning. Our framework achieves state of the art performance in several cases (standard optical flow benchmarks, wide-baseline stereo and non-rigid surface registration). Our framework has a modular design that customizes to specific application needs.
6 0.67616522 399 iccv-2013-Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing
7 0.66173989 223 iccv-2013-Joint Noise Level Estimation from Personal Photo Collections
8 0.65035188 60 iccv-2013-Bayesian Robust Matrix Factorization for Image and Video Processing
9 0.64747399 23 iccv-2013-A New Image Quality Metric for Image Auto-denoising
10 0.63570005 358 iccv-2013-Robust Non-parametric Data Fitting for Correspondence Modeling
11 0.62765241 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
12 0.6092397 27 iccv-2013-A Robust Analytical Solution to Isometric Shape-from-Template with Focal Length Calibration
13 0.60630113 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
14 0.6055519 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
15 0.60432512 300 iccv-2013-Optical Flow via Locally Adaptive Fusion of Complementary Data Costs
16 0.60258698 270 iccv-2013-Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking
17 0.6024211 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
18 0.59935308 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
19 0.59818661 151 iccv-2013-Exploiting Reflection Change for Automatic Reflection Removal
20 0.59605753 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning