cvpr cvpr2013 cvpr2013-386 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: James Steven Supančič III, Deva Ramanan
Abstract: We address the problem of long-term object tracking, where the object may become occluded or leave-the-view. In this setting, we show that an accurate appearance model is considerably more effective than a strong motion model. We develop simple but effective algorithms that alternate between tracking and learning a good appearance model given a track. We show that it is crucial to learn from the “right” frames, and use the formalism of self-paced curriculum learning to automatically select such frames. We leverage techniques from object detection for learning accurate appearance-based templates, demonstrating the importance of using a large negative training set (typically not used for tracking). We describe both an offline algorithm (that processes frames in batch) and a linear-time online (i.e. causal) algorithm that approaches real-time performance. Our models significantly outperform prior art, reducing the average error on benchmark videos by a factor of 4.
Reference: text
sentIndex sentText sentNum sentScore
1 Self-paced learning for long-term tracking James Steven Supančič III, Deva Ramanan, Dept. [sent-1, score-0.415]
2 We develop simple but effective algorithms that alternate between tracking and learning a good appearance model given a track. [sent-7, score-0.57]
3 We show that it is crucial to learn from the “right” frames, and use the formalism of self-paced curriculum learning to automatically select such frames. [sent-8, score-0.526]
4 We describe both an offline algorithm (that processes frames in batch) and a linear-time online (i.e. causal) algorithm that approaches real-time performance. [sent-10, score-0.539]
5 Introduction Object tracking is a fundamental task in video processing. [sent-15, score-0.336]
6 Following much past work, we consider the scenario where one must track an unknown object, given a known bounding box in a single frame. [sent-16, score-0.266]
7 Our main contribution is an algorithm that minimizes drift by carefully choosing the frames from which to learn, using the framework of self-paced learning [2, 3]. [sent-23, score-0.395]
8 The second observation is the importance of detection: the tracker selects frames from which to extract additional training data as it progresses (shown in red). [sent-24, score-0.31]
9 We use such frames to define both positive examples and a very large set of negative examples (all windows that do not overlap each positive). [sent-25, score-0.416]
10 We show that it is crucial to revisit old frames when adding training data; in terms of self-paced learning, a concept (frame) that initially looks hard may become easy in hindsight. [sent-27, score-0.618]
11 Self-paced learning: Curriculum learning is an approach inspired by the teaching of students, where easy concepts (say, a model learned from un-occluded frames) are taught before complex ones (a model learned from frames with partial occlusions) [2]. [sent-33, score-0.512]
12 One natural application of such a strategy would be to label frames as easy or complex as they are encountered by an online tracker. [sent-35, score-0.501]
13 We show that it is crucial to revisit old frames when learning. [sent-37, score-0.449]
14 In terms of self-paced learning, a student might initially think a concept is hard; however, once that student learns other concepts, it may become easy in retrospect. [sent-38, score-0.303]
15 Transductive learning: A unique aspect of our learning problem is that it is transductive rather than inductive [5]: when tracking a face, the learned model need not generalize to all faces, but only separate the particular face and background in the video. [sent-39, score-0.683]
16 Following [5], we use a transductive strategy for selecting frames; rather than choosing a frame that scores well under the current model (as most prior work does), we choose a frame that, when selected for learning, produces a model that well separates the object from the background. [sent-41, score-0.621]
17 Part of our contribution is a baseline detector that tracks by detection without any online learning or temporal reasoning; the detector is learned from the first labeled frame. [sent-44, score-0.532]
18 We believe this disparity exists because detection is an undervalued aspect of tracking; invariant gradient descriptors and large-scale negative training sets appear crucial for building good object detectors [8], but are insufficiently used in tracking. [sent-46, score-0.385]
19 Computation: Our detection-based approach learns appearance models from large training sets with hundreds of thousands of negative examples. [sent-48, score-0.243]
20 Our self-paced learning scheme requires learning putative appearance models for each candidate image. [sent-49, score-0.241]
21 To address these computational burdens, we make use of dual coordinate descent SVM solvers that can be “warm-started” from previous solutions. [sent-50, score-0.3]
22 Appearance models: Because object appearance is likely to change over time, many trackers update appearance models through color histogram tracking [10] and online adaptation [11]. [sent-57, score-0.749]
23 Our work is similar to these latter approaches, though we focus on the problem of carefully choosing a subset of frames from which to learn a classifier. [sent-60, score-0.307]
24 Moreover, our experiments suggest that multiple hypotheses may not even be necessary given a good appearance model; in such cases, tracking by detection is a simple and effective inference strategy. [sent-64, score-0.496]
25 Semi-supervised [6, 7, 25] trackers tend to proceed in a greedy online fashion, not revisiting past decisions. [sent-66, score-0.45]
26 Approach Our tracker operates by iterating over three stages. [sent-71, score-0.249]
27 Third, it selects a subset of frames from which to re-learn a detector for the next iteration. [sent-74, score-0.329]
28 We use dynamic programming to maintain multiple track hypotheses over time. [sent-98, score-0.356]
29 Note that the most-likely track at frame T (in green) can be revised to an alternate track hypothesis at a later frame S (in blue). [sent-100, score-0.841]
30 For each frame ti in Λ, we extract a single positive example at bounding box bi and extract a large set of negative examples at all other non-overlapping bounding boxes. [sent-113, score-0.416]
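To make this step concrete, here is a minimal Python sketch (not the authors' code): `extract_hog` and `candidate_boxes` are assumed helpers standing in for the paper's HOG features and sliding-window grid, and the 0.5 overlap threshold for calling a window "negative" is an illustrative choice.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def build_training_set(frames, selected, extract_hog, candidate_boxes,
                       neg_overlap=0.5):
    """For each (frame index t_i, box b_i) pair in the selected set, take
    one positive at b_i and negatives at every non-overlapping window."""
    pos, neg = [], []
    for t, b in selected:
        pos.append(extract_hog(frames[t], b))
        for c in candidate_boxes(frames[t]):
            if iou(c, b) < neg_overlap:      # window misses the object
                neg.append(extract_hog(frames[t], c))
    return np.array(pos), np.array(neg)
```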
31 The track cost combines a pairwise motion prior with a local appearance term: cost(y_{1:N}) = \sum_{t=2}^{N} \pi(y_t, y_{t-1}) - \sum_{t=1}^{N} w \cdot \phi(t, y_t) (2). Local cost: the second term defines the local cost of placing the object at location y_t in frame x_t as the negative SVM score of w. [sent-127, score-0.429]
32 Given an initial detector w, we consider different methods for selecting frames in our SELECT stage. [sent-140, score-0.379]
33 We deem a selected frame a true positive if the estimated track location correctly overlaps the ground truth. [sent-141, score-0.432]
34 Frames selected based on the SVM objective value in (3) greatly outperform frames selected on SVM response. [sent-142, score-0.264]
35 We compute the best track by solving the shortest-path problem using dynamic programming [28]. [sent-145, score-0.31]
36 We also experimented with an uninformative prior π(yt, yt−1); in this case, the best track is given by independently selecting the highest-scoring location in each frame (“tracking by detection”). [sent-146, score-0.516]
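A minimal Viterbi-style sketch of this computation (illustrative, not the authors' implementation): `scores[t, y]` stands in for the SVM response w · φ(t, y) over a discretized location grid, and `pairwise_cost` for the motion prior π in (2). With a constant pairwise cost the recursion collapses to a per-frame argmax, i.e. tracking by detection.

```python
import numpy as np

def best_track(scores, pairwise_cost):
    """scores: (T, Y) SVM responses; pairwise_cost: (Y, Y) motion prior.
    Returns the location index per frame minimizing the cost in (2)."""
    T, Y = scores.shape
    cost = -scores[0].astype(float)            # local cost = negative SVM score
    back = np.zeros((T, Y), dtype=int)
    for t in range(1, T):
        trans = cost[:, None] + pairwise_cost  # trans[y_prev, y]
        back[t] = trans.argmin(axis=0)         # best predecessor for each y
        cost = trans.min(axis=0) - scores[t]
    track = [int(cost.argmin())]
    for t in range(T - 1, 0, -1):              # backtrack the shortest path
        track.append(int(back[t, track[-1]]))
    return track[::-1]
```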
37 Selecting good frames Our tracker operates by sequentially re-learning a model from previously-tracked frames. [sent-149, score-0.513]
38 To avoid template drift, we find it crucial to select “good” frames from which to learn. [sent-150, score-0.501]
39 Given the set Λ of frames used for learning and an estimated track y_{1:N}, we estimate a new set of good frames: Λ ← Λ ∪ (t, y_t) where t = \arg\min_{t \notin \Lambda} \mathrm{OBJ}(\Lambda \cup (t, y_t)) (3) [sent-151, score-0.584]
40 where we define t \notin Λ to refer to frames that are not in any frame-location pair in Λ. [sent-155, score-0.264]
41 We generalize the above function to return a K-element set by independently finding the (K − |Λ|) frames with the smallest increase in the SVM objective, written as SELECT_K(Λ, y_{1:N}). [sent-157, score-0.264]
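A minimal sketch of this SELECT step (an illustration, not the authors' code): `objective_after_adding` is an assumed helper that returns the SVM objective OBJ after (warm-started) retraining with the candidate frame-location pair included.

```python
def select_frames(selected, track, K, objective_after_adding):
    """Grow `selected` (a list of (frame, location) pairs) to K elements by
    independently picking the frames whose inclusion increases the SVM
    objective the least; this is the transductive, self-paced criterion (3)."""
    used = {t for t, _ in selected}
    candidates = [t for t in range(len(track)) if t not in used]
    # Evaluate each candidate independently under the current model.
    candidates.sort(key=lambda t: objective_after_adding(selected, (t, track[t])))
    easiest = candidates[:K - len(selected)]
    return selected + [(t, track[t]) for t in easiest]
```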
42 Our approach directly follows from strategies for data selection in self-paced learning [3] and label assignment in transductive learning [5]. [sent-159, score-0.343]
43 A more standard approach may be to simply select the frame with the strongest model response w · φ(t, yt). [sent-160, score-0.247]
44 To build intuition as to why, consider tracking a face that rotates from frontal to profile. [sent-162, score-0.384]
45 Algorithm We now describe an online and an offline algorithm that make use of the previously-defined stages. [sent-166, score-0.275]
46 Online (causal) tracker Our online algorithm is outlined in Algorithm 1. [sent-169, score-0.346]
47 Intuitively, at each frame we re-estimate the best track from the first to the current frame using the current model. [sent-170, score-0.38]
48 We then select half of the observed frames to learn/update an appearance model, which is used for the next frame. [sent-171, score-0.418]
49 A crucial aspect of our algorithm is that it can select frames that were previously rejected (unlike, for example, [6]). [sent-172, score-0.491]
50 However, every power-of-two frames, it learns a new model while revisiting past frames to correct mistakes made under previous models. [sent-174, score-0.437]
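Putting the stages together, a minimal sketch of the online loop (the learn/best-track/select names follow the stage names in the text, but the signatures here are illustrative assumptions):

```python
def online_tracker(frames, init_box, learn, best_track, select):
    """learn(selected) -> model; best_track(model, frames) -> locations;
    select(selected, track, K) -> K frame-location pairs to learn from."""
    selected = [(0, init_box)]                 # the single labeled frame
    model = learn(selected)
    track = [init_box]
    for t in range(1, len(frames)):
        # Re-estimate the best track from the first to the current frame
        # (the paper batches this re-estimation too, for efficiency).
        track = best_track(model, frames[:t + 1])
        # Every power-of-two frames, revisit the past: re-select half of
        # the observed frames (possibly reviving previously rejected
        # ones) and re-learn the appearance model.
        if (t & (t + 1)) == 0:                 # t + 1 is a power of two
            selected = select(selected, track, (t + 1) // 2)
            model = learn(selected)
    return track
```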
51 Efficiency: A naive implementation of an online algorithm would be very slow because it involves solving a shortest-path problem and learning an SVM at every timestep. [sent-176, score-0.261]
52 Moreover, the SELECT function requires learning O(t) SVMs at each iteration (in order to evaluate OBJ for each possible frame to add). [sent-177, score-0.255]
53 First, we only apply these expensive operations on batches of frames that double in size (Line 8). [sent-180, score-0.264]
54 We do this by initializing the dual coordinate descent solvers of [4] with previously-computed dual variables. [sent-182, score-0.419]
55 It operates similarly to our online algorithm, but it has access to the entire set of frames in a video. [sent-198, score-0.494]
56 We iterate between tracking (over the whole video) and learning (from a select subset of frames) for a fixed number of iterations K. [sent-199, score-0.486]
57 As in curriculum learning, we found it useful to learn from the easy cases first. [sent-200, score-0.278]
58 We exponentially grow the number of selected frames such that at the last iteration, 50% of all frames are selected. [sent-201, score-0.528]
59 During each iteration, it updates the track, selects the r “easiest” new frames of training examples, and re-trains using these examples. [sent-206, score-0.31]
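The offline variant admits an equally small sketch (same assumed helpers as the online sketch above; the exact growth schedule for r is an assumption, chosen only to end at 50% of all frames):

```python
def offline_tracker(frames, init_box, learn, best_track, select, K=5):
    """K iterations of track / select / re-learn over the whole video."""
    selected = [(0, init_box)]
    model = learn(selected)
    N = len(frames)
    for k in range(1, K + 1):
        track = best_track(model, frames)      # track the entire video
        r = max(1, int((N // 2) ** (k / K)))   # exponentially growing budget
        selected = select(selected, track, r)  # keep the r "easiest" frames
        model = learn(selected)
    return best_track(model, frames)
```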
60 Thus, assuming all videos are of some fixed resolution, the complexity of our tracking algorithm is O(N) given a fixed model. [sent-211, score-0.433]
61 Repeatedly learning SVMs: Each call to LEARN requires training a single SVM, and each call to SELECT requires training O(t) SVMs, needed to evaluate the SVM objective OBJ for each possible frame to select. [sent-212, score-0.268]
62 We now show how to use the dual coordinate descent method of [4] to “warm-start” SVM training using previous solutions. [sent-214, score-0.26]
63 Warm-start: Given a model w and its dual variables αi and support vectors xi, we can quickly learn a new model and estimate the increase in OBJ due to adding an additional frame t. [sent-226, score-0.338]
64 We perform one pass of coordinate descent on examples from this new frame as follows: we run the model w on frame t and cache examples with a non-zero gradient in the dual objective (4). [sent-227, score-0.711]
65 In practice, we find that our dual QP converges after a small fixed number of coordinate descent passes over the cache, making the overall training time dominated by the single convolution. [sent-232, score-0.26]
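A minimal sketch of the warm start (a generic dual coordinate descent update in the spirit of [4], not the authors' solver): the old dual variables α and model w are reused unchanged, the new frame contributes examples with non-zero dual gradient, and a few passes over the cache suffice.

```python
import numpy as np

def dcd_pass(X, y, alpha, w, C=1.0):
    """One dual coordinate descent pass over the cached examples.
    X: (n, d) features; y: (n,) labels in {-1, +1}; alpha: (n,) duals;
    w: (d,) primal model kept equal to sum_i alpha_i y_i x_i."""
    for i in np.random.permutation(len(y)):
        g = y[i] * w.dot(X[i]) - 1.0           # dual gradient at example i
        if alpha[i] == 0.0 and g >= 0.0:
            continue                            # stays a non-support vector
        new = min(max(alpha[i] - g / X[i].dot(X[i]), 0.0), C)
        w += (new - alpha[i]) * y[i] * X[i]
        alpha[i] = new
    return alpha, w

def warm_start_add_frame(cache, alpha, w, X_new, y_new, C=1.0, passes=3):
    """Append the new frame's examples with alpha = 0, keep the previous
    duals, and run a fixed small number of passes to re-converge."""
    X, y = cache
    X = np.vstack([X, X_new])
    y = np.concatenate([y, y_new])
    alpha = np.concatenate([alpha, np.zeros(len(y_new))])
    for _ in range(passes):
        alpha, w = dcd_pass(X, y, alpha, w, C)
    return (X, y), alpha, w
```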
66 Results Benchmark evaluation: We define a test suite of videos and ground truth labelings by merging the test videos of [7, 6]. [sent-234, score-0.242]
67 Furthermore, pixel displacement is undefined for frames where the object is occluded or leaves the camera view (common in long-term tracking). [sent-239, score-0.358]
68 On two videos, MILTrack loses track by frame 500 and so we only report accuracy over those initial 500 frames. [sent-248, score-0.38]
69 Our online algorithm slightly underperforms our offline variant, with an average F1 of 91%. [sent-250, score-0.275]
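For reference, a minimal sketch of an F1-style per-frame evaluation that stays defined when the object is absent (occluded or out of view); `None` marks absence, `iou` is the helper from the first sketch, and the 0.5 overlap threshold is a conventional assumption rather than a value quoted from the paper.

```python
def f1_score(pred, gt, thresh=0.5):
    """pred, gt: per-frame boxes, or None when the object is absent."""
    tp = fp = fn = 0
    for p, g in zip(pred, gt):
        if p is None and g is None:
            continue                  # correctly reports absence
        elif p is None:
            fn += 1                   # missed the object
        elif g is None:
            fp += 1                   # fired while the object was absent
        elif iou(p, g) >= thresh:
            tp += 1
        else:
            fp += 1                   # wrong location counts against both
            fn += 1
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```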
70 We point out those observations that seem inconsistent with the accepted wisdom in the tracking literature. [sent-262, score-0.336]
71 Our large negative training sets and retrospective learning greatly reduce the probability of false positives. [sent-272, score-0.335]
72 As we increase the number of latently labeled training frames (from 1 to 50%), performance generally increases. [sent-275, score-0.31]
73 We see this observation as emphasizing an under-appreciated connection between tracking and detection; it is well-known in the object detection community that large training sets of negatives are crucial for good performance [8]. [sent-279, score-0.63]
74 A single-hypothesis tracker that greedily enforces the dynamic model in (2) given the best location in the previous frame improves performance to 76% (D3). [sent-283, score-0.44]
75 This suggests that multiple-hypothesis tracking may not be crucial for good performance. [sent-284, score-0.565]
76 Learning: Learning is the most crucial aspect of our system, improving performance from 76% (D5) to 91% (C6) for our online algorithm (and even more for our offline variant). [sent-287, score-0.338]
77 We construct a restricted version of our online algorithm that does not require revisiting previous frames. [sent-288, score-0.245]
78 This suggests that it is vital to edit previous tracks to produce better examples for retrospective learning. [sent-291, score-0.363]
79 Secondly, naively SELECTing all previously-seen frames for learning also significantly decreases performance to 84% (D9). [sent-292, score-0.343]
80 This suggests that selecting a good subset of reliable frames is also important. [sent-293, score-0.388]
81 † indicates a tracker was evaluated only on the initial 500 frames before it lost track. [sent-311, score-0.428]
82 Our trackers place special emphasis on long-term tracking and can thus recover from such failures. [sent-312, score-0.479]
83 Conclusion We have described a simple but effective system for tracking based on the selection of trustworthy frames for learning appearance. [sent-355, score-0.679]
84 We find the task of learning good appearance models to be crucial, as compared to, say, maintaining multiple hypotheses for tracking. [sent-357, score-0.245]
85 To learn good appearance models, we find it important to use large sets of negative training examples, and to retrospectively edit and select previous frames for learning. [sent-358, score-0.756]
86 To do so in a principled and efficient manner, we use the formalism of self-paced learning and online solvers for SVMs. [sent-359, score-0.391]
87 Our tracker handles long videos with periods of occlusion/absence and large scale changes. [sent-360, score-0.261]
88 Belongie, “Robust object tracking with online multiple instance learning,” IEEE PAMI, vol. [sent-402, score-0.518]
89 El-Maraghi, “Robust online appearance models for visual tracking,” IEEE PAMI, vol. [sent-425, score-0.265]
90 Kriegman, “Visual tracking and recognition using probabilistic appearance manifolds,” CVIU, vol. [sent-440, score-0.419]
91 Ling, “Robust visual tracking and vehicle classification via sparse representation,” IEEE PAMI, vol. [sent-446, score-0.336]
92 Kulikowsk, “Robust tracking using local sparse appearance model and k-selection,” in CVPR, IEEE, 2011. [sent-454, score-0.419]
93 Leordeanu, “Online selection of discriminative tracking features,” IEEE PAMI, vol. [sent-458, score-0.336]
94 Torr, “Struck: Structured output tracking with kernels,” in ICCV, 2011. [sent-480, score-0.336]
95 Lee, “Robust visual tracking using autoregressive hidden markov model,” in CVPR, pp. [sent-498, score-0.336]
96 Tang, “Robust tracking via weakly supervised ranking svm,” in CVPR, pp. [sent-502, score-0.336]
97 Huang, “Color tracking by transductive learning,” in CVPR, vol. [sent-506, score-0.521]
98 Fitzgibbon, “Interactive feature tracking using kd trees and dynamic programming,” in CVPR, vol. [sent-518, score-0.384]
99 Shimshoni, “Robust fragmentsbased tracking using the integral histogram,” in CVPR, vol. [sent-528, score-0.336]
100 Tomasi, “Efficient visual object tracking with online nearest neighbor classifier,” ACCV, pp. [sent-541, score-0.518]
wordName wordTfidf (topN-words)
[('tracking', 0.336), ('frames', 0.264), ('track', 0.204), ('transductive', 0.185), ('online', 0.182), ('curriculum', 0.18), ('frame', 0.176), ('tracker', 0.164), ('obj', 0.148), ('retrospective', 0.144), ('trackers', 0.143), ('dual', 0.119), ('crucial', 0.109), ('mcde', 0.108), ('svm', 0.104), ('diagnostic', 0.099), ('videos', 0.097), ('panda', 0.096), ('displacement', 0.094), ('offline', 0.093), ('yt', 0.091), ('bi', 0.088), ('solvers', 0.086), ('appearance', 0.083), ('learning', 0.079), ('tld', 0.077), ('edit', 0.074), ('svms', 0.073), ('miltrack', 0.072), ('retrospectively', 0.072), ('bischof', 0.072), ('select', 0.071), ('negative', 0.066), ('student', 0.066), ('detector', 0.065), ('tracks', 0.065), ('revisiting', 0.063), ('negatives', 0.062), ('past', 0.062), ('transduction', 0.059), ('resistant', 0.059), ('cache', 0.059), ('grabner', 0.058), ('programming', 0.058), ('template', 0.057), ('prost', 0.056), ('easy', 0.055), ('qp', 0.055), ('descent', 0.054), ('pami', 0.053), ('location', 0.052), ('drift', 0.052), ('causal', 0.051), ('selecting', 0.05), ('dynamic', 0.048), ('learns', 0.048), ('operates', 0.048), ('frontal', 0.048), ('suite', 0.048), ('kalal', 0.048), ('ieee', 0.047), ('aspect', 0.047), ('kwon', 0.046), ('hypothesis', 0.046), ('training', 0.046), ('convolution', 0.046), ('uci', 0.044), ('formalism', 0.044), ('score', 0.044), ('examples', 0.043), ('ti', 0.043), ('motion', 0.043), ('learn', 0.043), ('concepts', 0.042), ('benchmark', 0.041), ('coordinate', 0.041), ('avidan', 0.041), ('saffari', 0.041), ('believe', 0.04), ('revisit', 0.04), ('thresholded', 0.04), ('detection', 0.04), ('leistner', 0.038), ('mikolajczyk', 0.038), ('matas', 0.038), ('suggests', 0.037), ('iterating', 0.037), ('boosting', 0.037), ('good', 0.037), ('old', 0.036), ('learned', 0.036), ('alternate', 0.035), ('initially', 0.035), ('transitions', 0.034), ('batch', 0.034), ('hog', 0.034), ('prior', 0.034), ('become', 0.033), ('bottleneck', 0.033), ('particle', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking
Author: James Steven Supančič III, Deva Ramanan
Abstract: We address the problem of long-term object tracking, where the object may become occluded or leave-the-view. In this setting, we show that an accurate appearance model is considerably more effective than a strong motion model. We develop simple but effective algorithms that alternate between tracking and learning a good appearance model given a track. We show that it is crucial to learn from the “right” frames, and use the formalism of self-paced curriculum learning to automatically select such frames. We leverage techniques from object detection for learning accurate appearance-based templates, demonstrating the importance of using a large negative training set (typically not used for tracking). We describe both an offline algorithm (that processes frames in batch) and a linear-time online (i.e. causal) algorithm that approaches real-time performance. Our models significantly outperform prior art, reducing the average error on benchmark videos by a factor of 4.
2 0.36811602 314 cvpr-2013-Online Object Tracking: A Benchmark
Author: Yi Wu, Jongwoo Lim, Ming-Hsuan Yang
Abstract: Object tracking is one of the most important components in numerous applications of computer vision. While much progress has been made in recent years with efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art. After briefly reviewing recent advances of online object tracking, we carry out large scale experiments with various evaluation criteria to understand how these algorithms perform. The test image sequences are annotated with different attributes for performance evaluation and analysis. By analyzing quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.
3 0.35359335 414 cvpr-2013-Structure Preserving Object Tracking
Author: Lu Zhang, Laurens van_der_Maaten
Abstract: Model-free trackers can track arbitrary objects based on a single (bounding-box) annotation of the object. Whilst the performance of model-free trackers has recently improved significantly, simultaneously tracking multiple objects with similar appearance remains very hard. In this paper, we propose a new multi-object model-free tracker (based on tracking-by-detection) that resolves this problem by incorporating spatial constraints between the objects. The spatial constraints are learned along with the object detectors using an online structured SVM algorithm. The experimental evaluation ofour structure-preserving object tracker (SPOT) reveals significant performance improvements in multi-object tracking. We also show that SPOT can improve the performance of single-object trackers by simultaneously tracking different parts of the object.
4 0.31239283 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning
Author: Rui Yao, Qinfeng Shi, Chunhua Shen, Yanning Zhang, Anton van_den_Hengel
Abstract: Despite many advances made in the area, deformable targets and partial occlusions continue to represent key problems in visual tracking. Structured learning has shown good results when applied to tracking whole targets, but applying this approach to a part-based target model is complicated by the need to model the relationships between parts, and to avoid lengthy initialisation processes. We thus propose a method which models the unknown parts using latent variables. In doing so we extend the online algorithm pegasos to the structured prediction case (i.e., predicting the location of the bounding boxes) with latent part variables. To better estimate the parts, and to avoid over-fitting caused by the extra model complexity/capacity introduced by theparts, wepropose a two-stage trainingprocess, based on the primal rather than the dual form. We then show that the method outperforms the state-of-the-art (linear and non-linear kernel) trackers.
5 0.28654435 457 cvpr-2013-Visual Tracking via Locality Sensitive Histograms
Author: Shengfeng He, Qingxiong Yang, Rynson W.H. Lau, Jiang Wang, Ming-Hsuan Yang
Abstract: This paper presents a novel locality sensitive histogram algorithm for visual tracking. Unlike the conventional image histogram that counts the frequency of occurrences of each intensity value by adding ones to the corresponding bin, a locality sensitive histogram is computed at each pixel location and a floating-point value is added to the corresponding bin for each occurrence of an intensity value. The floating-point value declines exponentially with respect to the distance to the pixel location where the histogram is computed; thus every pixel is considered but those that are far away can be neglected due to the very small weights assigned. An efficient algorithm is proposed that enables the locality sensitive histograms to be computed in time linear in the image size and the number of bins. A robust tracking framework based on the locality sensitive histograms is proposed, which consists of two main components: a new feature for tracking that is robust to illumination changes and a novel multi-region tracking algorithm that runs in real time even with hundreds of regions. Extensive experiments demonstrate that the proposed tracking framework outperforms the state-of-the-art methods in challenging scenarios, especially when the illumination changes dramatically.
6 0.20297819 249 cvpr-2013-Learning Compact Binary Codes for Visual Tracking
8 0.19379491 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking
9 0.18977576 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video
10 0.18538584 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
11 0.18493204 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
12 0.18394245 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
13 0.17843144 300 cvpr-2013-Multi-target Tracking by Lagrangian Relaxation to Min-cost Network Flow
14 0.17272736 440 cvpr-2013-Tracking People and Their Objects
15 0.16620338 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
16 0.16278298 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
17 0.15941773 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
18 0.15922172 267 cvpr-2013-Least Soft-Threshold Squares Tracking
19 0.15595782 121 cvpr-2013-Detection- and Trajectory-Level Exclusion in Multiple Object Tracking
20 0.15578055 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video
topicId topicWeight
[(0, 0.305), (1, -0.037), (2, -0.03), (3, -0.143), (4, 0.022), (5, -0.048), (6, 0.234), (7, -0.206), (8, 0.136), (9, 0.244), (10, -0.098), (11, -0.15), (12, -0.065), (13, 0.106), (14, -0.055), (15, -0.046), (16, 0.024), (17, -0.063), (18, 0.046), (19, -0.004), (20, 0.031), (21, -0.006), (22, 0.054), (23, -0.056), (24, -0.063), (25, 0.028), (26, -0.06), (27, 0.085), (28, 0.061), (29, 0.061), (30, 0.02), (31, 0.023), (32, 0.025), (33, 0.037), (34, 0.001), (35, 0.057), (36, 0.011), (37, -0.03), (38, -0.047), (39, -0.007), (40, 0.013), (41, -0.0), (42, -0.05), (43, 0.0), (44, 0.025), (45, 0.034), (46, -0.026), (47, -0.024), (48, 0.058), (49, 0.009)]
simIndex simValue paperId paperTitle
same-paper 1 0.96797895 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking
Author: James Steven Supančič III, Deva Ramanan
Abstract: We address the problem of long-term object tracking, where the object may become occluded or leave-the-view. In this setting, we show that an accurate appearance model is considerably more effective than a strong motion model. We develop simple but effective algorithms that alternate between tracking and learning a good appearance model given a track. We show that it is crucial to learn from the “right” frames, and use the formalism of self-paced curriculum learning to automatically select such frames. We leverage techniques from object detection for learning accurate appearance-based templates, demonstrating the importance of using a large negative training set (typically not used for tracking). We describe both an offline algorithm (that processes frames in batch) and a linear-time online (i.e. causal) algorithm that approaches real-time performance. Our models significantly outperform prior art, reducing the average error on benchmark videos by a factor of 4.
2 0.93485576 314 cvpr-2013-Online Object Tracking: A Benchmark
Author: Yi Wu, Jongwoo Lim, Ming-Hsuan Yang
Abstract: Object tracking is one of the most important components in numerous applications of computer vision. While much progress has been made in recent years with efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art. After briefly reviewing recent advances of online object tracking, we carry out large scale experiments with various evaluation criteria to understand how these algorithms perform. The test image sequences are annotated with different attributes for performance evaluation and analysis. By analyzing quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.
3 0.90247267 414 cvpr-2013-Structure Preserving Object Tracking
Author: Lu Zhang, Laurens van_der_Maaten
Abstract: Model-free trackers can track arbitrary objects based on a single (bounding-box) annotation of the object. Whilst the performance of model-free trackers has recently improved significantly, simultaneously tracking multiple objects with similar appearance remains very hard. In this paper, we propose a new multi-object model-free tracker (based on tracking-by-detection) that resolves this problem by incorporating spatial constraints between the objects. The spatial constraints are learned along with the object detectors using an online structured SVM algorithm. The experimental evaluation ofour structure-preserving object tracker (SPOT) reveals significant performance improvements in multi-object tracking. We also show that SPOT can improve the performance of single-object trackers by simultaneously tracking different parts of the object.
4 0.87277466 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning
Author: Rui Yao, Qinfeng Shi, Chunhua Shen, Yanning Zhang, Anton van_den_Hengel
Abstract: Despite many advances made in the area, deformable targets and partial occlusions continue to represent key problems in visual tracking. Structured learning has shown good results when applied to tracking whole targets, but applying this approach to a part-based target model is complicated by the need to model the relationships between parts, and to avoid lengthy initialisation processes. We thus propose a method which models the unknown parts using latent variables. In doing so we extend the online algorithm pegasos to the structured prediction case (i.e., predicting the location of the bounding boxes) with latent part variables. To better estimate the parts, and to avoid over-fitting caused by the extra model complexity/capacity introduced by theparts, wepropose a two-stage trainingprocess, based on the primal rather than the dual form. We then show that the method outperforms the state-of-the-art (linear and non-linear kernel) trackers.
5 0.86134535 457 cvpr-2013-Visual Tracking via Locality Sensitive Histograms
Author: Shengfeng He, Qingxiong Yang, Rynson W.H. Lau, Jiang Wang, Ming-Hsuan Yang
Abstract: This paper presents a novel locality sensitive histogram algorithm for visual tracking. Unlike the conventional image histogram that counts the frequency of occurrences of each intensity value by adding ones to the corresponding bin, a locality sensitive histogram is computed at each pixel location and a floating-point value is added to the corresponding bin for each occurrence of an intensity value. The floating-point value declines exponentially with respect to the distance to the pixel location where the histogram is computed; thus every pixel is considered but those that are far away can be neglected due to the very small weights assigned. An efficient algorithm is proposed that enables the locality sensitive histograms to be computed in time linear in the image size and the number of bins. A robust tracking framework based on the locality sensitive histograms is proposed, which consists of two main components: a new feature for tracking that is robust to illumination changes and a novel multi-region tracking algorithm that runs in real time even with hundreds of regions. Extensive experiments demonstrate that the proposed tracking framework outperforms the state-of-the-art methods in challenging scenarios, especially when the illumination changes dramatically.
6 0.82291585 267 cvpr-2013-Least Soft-Threshold Squares Tracking
7 0.80442548 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
8 0.7822026 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
9 0.7796104 249 cvpr-2013-Learning Compact Binary Codes for Visual Tracking
10 0.76043206 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking
12 0.65443099 440 cvpr-2013-Tracking People and Their Objects
13 0.61201292 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
14 0.61172134 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video
15 0.60429317 121 cvpr-2013-Detection- and Trajectory-Level Exclusion in Multiple Object Tracking
16 0.58391106 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
17 0.58310437 224 cvpr-2013-Information Consensus for Distributed Multi-target Tracking
18 0.57468241 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
19 0.56187785 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
20 0.53536439 143 cvpr-2013-Efficient Large-Scale Structured Learning
topicId topicWeight
[(10, 0.548), (16, 0.012), (26, 0.024), (33, 0.21), (67, 0.07), (69, 0.022), (87, 0.056)]
simIndex simValue paperId paperTitle
1 0.93176043 295 cvpr-2013-Multi-image Blind Deblurring Using a Coupled Adaptive Sparse Prior
Author: Haichao Zhang, David Wipf, Yanning Zhang
Abstract: This paper presents a robust algorithm for estimating a single latent sharp image given multiple blurry and/or noisy observations. The underlying multi-image blind deconvolution problem is solved by linking all of the observations together via a Bayesian-inspired penalty function which couples the unknown latent image, blur kernels, and noise levels together in a unique way. This coupled penalty function enjoys a number of desirable properties, including a mechanism whereby the relative concavity or shape is adapted as a function of the intrinsic quality of each blurry observation. In this way, higher quality observations may automatically contribute more to the final estimate than heavily degraded ones. The resulting algorithm, which requires no essential tuning parameters, can recover a high quality image from a set of observations containing potentially both blurry and noisy examples, without knowing a priori the degradation type of each observation. Experimental results on both synthetic and real-world test images clearly demonstrate the efficacy of the proposed method.
2 0.91931325 307 cvpr-2013-Non-uniform Motion Deblurring for Bilayer Scenes
Author: Chandramouli Paramanand, Ambasamudram N. Rajagopalan
Abstract: We address the problem of estimating the latent image of a static bilayer scene (consisting of a foreground and a background at different depths) from motion blurred observations captured with a handheld camera. The camera motion is considered to be composed of in-plane rotations and translations. Since the blur at an image location depends both on camera motion and depth, deblurring becomes a difficult task. We initially propose a method to estimate the transformation spread function (TSF) corresponding to one of the depth layers. The estimated TSF (which reveals the camera motion during exposure) is used to segment the scene into the foreground and background layers and determine the relative depth value. The deblurred image of the scene is finally estimated within a regularization framework by accounting for blur variations due to camera motion as well as depth.
3 0.9173491 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
Author: M. Zeeshan Zia, Michael Stark, Konrad Schindler
Abstract: Despite the success of current state-of-the-art object class detectors, severe occlusion remains a major challenge. This is particularly true for more geometrically expressive 3D object class representations. While these representations have attracted renewed interest for precise object pose estimation, the focus has mostly been on rather clean datasets, where occlusion is not an issue. In this paper, we tackle the challenge of modeling occlusion in the context of a 3D geometric object class model that is capable of fine-grained, part-level 3D object reconstruction. Following the intuition that 3D modeling should facilitate occlusion reasoning, we design an explicit representation of likely geometric occlusion patterns. Robustness is achieved by pooling image evidence from of a set of fixed part detectors as well as a non-parametric representation of part configurations in the spirit of poselets. We confirm the potential of our method on cars in a newly collected data set of inner-city street scenes with varying levels of occlusion, and demonstrate superior performance in occlusion estimation and part localization, compared to baselines that are unaware of occlusions.
4 0.90974635 76 cvpr-2013-Can a Fully Unconstrained Imaging Model Be Applied Effectively to Central Cameras?
Author: Filippo Bergamasco, Andrea Albarelli, Emanuele Rodolà, Andrea Torsello
Abstract: Traditional camera models are often the result of a compromise between the ability to account for non-linearities in the image formation model and the need for a feasible number of degrees of freedom in the estimation process. These considerations led to the definition of several ad hoc models that best adapt to different imaging devices, ranging from pinhole cameras with no radial distortion to the more complex catadioptric or polydioptric optics. Unfortunately, no real camera behaves exactly like an ideal pinhole; in most cases, at least the distortion effects introduced by the lens should be accounted for [19], and any pinhole-based model, regardless of its level of sophistication, is geometrically unable to properly describe cameras exhibiting a frustum angle that is near or above 180 degrees. In this paper we propose the use of an unconstrained model even in standard central camera settings dominated by the pinhole model, and introduce a novel calibration approach that can deal effectively with the huge number of free parameters associated with it, resulting in a higher precision calibration than what is possible with the standard pinhole model with correction for radial distortion. This effectively extends the use of general models to settings that traditionally have been ruled by parametric approaches out of practical considerations. The benefit of such an unconstrained model to quasi-pinhole central cameras is supported by an extensive experimental validation.
5 0.90467268 90 cvpr-2013-Computing Diffeomorphic Paths for Large Motion Interpolation
Author: Dohyung Seo, Jeffrey Ho, Baba C. Vemuri
Abstract: In this paper, we introduce a novel framework for computing a path of diffeomorphisms between a pair of input diffeomorphisms. Direct computation of a geodesic path on the space of diffeomorphisms Diff(Ω) is difficult, and it can be attributed mainly to the infinite dimensionality of Diff(Ω). Our proposed framework, to some degree, bypasses this difficulty using the quotient map of Diff(Ω) to the quotient space Diff(M)/Diff(M)μ obtained by quotienting out the subgroup of volume-preserving diffeomorphisms Diff(M)μ. This quotient space was recently identified as the unit sphere in a Hilbert space in mathematics literature, a space with well-known geometric properties. Our framework leverages this recent result by computing the diffeomorphic path in two stages. First, we project the given diffeomorphism pair onto this sphere and then compute the geodesic path between these projected points. Second, we lift the geodesic on the sphere back to the space of diffeomorphisms, by solving a quadratic programming problem with bilinear constraints using the augmented Lagrangian technique with penalty terms. In this way, we can estimate the path of diffeomorphisms, first, staying in the space of diffeomorphisms, and second, preserving shapes/volumes in the deformed images along the path as much as possible. We have applied our framework to interpolate intermediate frames of frame-sub-sampled video sequences. In the reported experiments, our approach compares favorably with the popular Large Deformation Diffeomorphic Metric Mapping framework (LDDMM).
same-paper 6 0.89281923 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking
7 0.8718226 186 cvpr-2013-GeoF: Geodesic Forests for Learning Coupled Predictors
8 0.87104344 3 cvpr-2013-3D R Transform on Spatio-temporal Interest Points for Action Recognition
9 0.83450246 458 cvpr-2013-Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds
10 0.83311254 198 cvpr-2013-Handling Noise in Single Image Deblurring Using Directional Filters
11 0.81283295 462 cvpr-2013-Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines
12 0.78302407 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning
13 0.77201444 193 cvpr-2013-Graph Transduction Learning with Connectivity Constraints with Application to Multiple Foreground Cosegmentation
14 0.76356679 314 cvpr-2013-Online Object Tracking: A Benchmark
15 0.75860143 131 cvpr-2013-Discriminative Non-blind Deblurring
16 0.74639326 414 cvpr-2013-Structure Preserving Object Tracking
17 0.74106085 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
18 0.73182845 400 cvpr-2013-Single Image Calibration of Multi-axial Imaging Systems
19 0.72693408 360 cvpr-2013-Robust Estimation of Nonrigid Transformation for Point Set Registration