Author: Yi Wu, Jongwoo Lim, Ming-Hsuan Yang
Abstract: Object tracking is one of the most important components in numerous applications of computer vision. While much progress has been made in recent years with efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art. After briefly reviewing recent advances of online object tracking, we carry out large scale experiments with various evaluation criteria to understand how these algorithms perform. The test image sequences are annotated with different attributes for performance evaluation and analysis. By analyzing quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.
1 Abstract Object tracking is one of the most important components in numerous applications of computer vision. [sent-5, score-0.439]
2 By analyzing quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field. [sent-9, score-0.438]
3 Introduction Object tracking is one of the most important components in a wide range of applications in computer vision, such as surveillance, human computer interaction, and medical imaging [60, 12]. [sent-11, score-0.397]
4 , position and size) of a target object in a frame of a video, the goal of tracking is to estimate the states of the target in the subsequent frames. [sent-14, score-0.627]
5 Although object tracking has been studied for several decades, and much progress has been made in recent years [28, 16, 47, 5, 40, 26, 19], it remains a very challenging problem. [sent-15, score-0.424]
6 Numerous factors affect the performance of a tracking algorithm, such as illumination variation, occlusion, and background clutter, and no single tracking approach can successfully handle all scenarios. [sent-16, score-0.821]
7 Therefore, it is crucial to evaluate the performance of state-of-the-art trackers to demonstrate their strengths and weaknesses and to help identify future research directions in this field for designing more robust algorithms. [sent-17, score-0.51]
8 There exist several datasets for visual tracking in the surveillance scenarios, such as the VIVID [13], CAVIAR [21], and PETS databases. [sent-19, score-0.464]
9 Although some tracking datasets [47, 5, 33] for generic scenes are annotated with bounding boxes, most of them are not. [sent-21, score-0.524]
10 For sequences without labeled ground truth, it is difficult to evaluate tracking algorithms as the reported results are based on inconsistently annotated object locations. [sent-22, score-0.537]
11 Recently, more tracking source codes have been made publicly available, e. [sent-23, score-0.397]
12 However, the input and output formats of most trackers are different and thus it is inconvenient for large scale performance evaluation. [sent-26, score-0.526]
13 In this work, we build a code library that includes most publicly available trackers and a test dataset with ground-truth annotations to facilitate the evaluation task. [sent-27, score-0.594]
14 Additionally, each sequence in the dataset is annotated with attributes that often affect tracking performance, such as occlusion, fast motion, and illumination variation. [sent-28, score-0.554]
15 One common issue in assessing tracking algorithms is that the results are reported based on just a few sequences with different initial conditions or parameters. [sent-29, score-0.479]
16 We use the precision plots based on location error metric and the success plots based on the overlap metric, to analyze the performance of each algorithm. [sent-34, score-0.562]
17 We build a tracking dataset with 50 fully annotated sequences to facilitate tracking evaluation. [sent-36, score-0.969]
18 We integrate most publicly available trackers in our code library with uniform input and output formats to facilitate large scale performance evaluation. [sent-38, score-0.635]
19 The initial bounding boxes for tracking are sampled spatially and temporally to evaluate the robustness and characteristics of trackers. [sent-41, score-0.548]
20 This work mainly focuses on the online tracking of a single target. [sent-43, score-0.397]
21 The code library, annotated dataset and all the tracking results are available on the website http://visualtracking. [sent-44, score-0.493]
22 Related Work In this section, we review recent algorithms for object tracking in terms of several main modules: target representation scheme, search mechanism, and model update. [sent-47, score-0.494]
23 In addition, some methods have been proposed that build on combining several trackers or mining context information. [sent-48, score-0.438]
24 Object representation is one of the major components in any visual tracker and numerous schemes have been presented [35]. [sent-50, score-0.332]
25 Since the pioneering work of Lucas and Kanade [37, 8], holistic templates (raw intensity values) have been widely used for tracking [25, 39, 2]. [sent-51, score-0.489]
26 Subsequently, subspace-based tracking approaches [11, 47] have been proposed to better account for appearance changes. [sent-52, score-0.438]
27 Furthermore, Mei and Ling [40] proposed a tracking approach based on sparse representation to handle the corrupted appearance and recently it has been further improved [41, 57, 64, 10, 55, 42]. [sent-53, score-0.474]
28 In addition to template, many other visual features have been adopted in tracking algorithms, such as color histograms [16], histograms of oriented gradients (HOG) [17, 52], covariance region descriptor [53, 46, 56] and Haar-like features [54, 22]. [sent-54, score-0.494]
29 Recently, the discriminative model has been widely adopted in tracking [15, 4], where a binary classifier is learned online to discriminate the target from the background. [sent-55, score-0.638]
30 Numerous learning methods have been adapted to the tracking problem, such as SVM [3], structured output SVM [26], ranking SVM [7], boosting [4, 22], semiboosting [23] and multi-instance boosting [5]. [sent-56, score-0.502]
31 To make trackers more robust to pose variation and partial occlusion, an object can be represented by parts where each one is represented by descriptors or histograms. [sent-57, score-0.479]
32 When the tracking problem is posed within an optimization framework, assuming the objective function is differentiable with respect to the motion parameters, gradient descent methods can be used to locate the target efficiently [37, 16, 20, 49]. [sent-63, score-0.494]
33 However, these objective functions are (Footnote 1: Here, the word "online" means that during tracking only the information of the previous few frames is used for inference at any time instance.) [sent-64, score-0.539]
34 [39] address the template update problem for the Lucas-Kanade algorithm [37] where the template is updated with the combination of the fixed reference template extracted from the first frame and the result from the most recent frame. [sent-71, score-0.29]
35 Effective update algorithms have also been proposed via online mixture model [29], online boosting [22], and incremental subspace update [47]. [sent-72, score-0.329]
36 To improve the tracking performance, some tracker fusion methods have been proposed recently. [sent-79, score-0.66]
37 [48] proposed an approach that combines static, moderately adaptive and highly adaptive trackers to account for appearance changes. [sent-81, score-0.545]
38 Even multiple trackers [34] or multiple feature sets [61] are maintained and selected in a Bayesian framework to better account for appearance changes. [sent-82, score-0.479]
39 Evaluated Algorithms and Datasets For fair evaluation, we test the tracking algorithms whose original source or binary codes are publicly available as all implementations inevitably involve technical details and specific parameter settings. [sent-84, score-0.427]
40 Table 1 shows the list of the evaluated tracking algorithms. [sent-85, score-0.397]
41 We also evaluate the trackers in the VIVID testbed [13] including the mean shift (MS-V), template matching (TM-V), ratio shift (RS-V) and peak difference (PD-V) methods. [sent-86, score-0.543]
42 There exist some datasets for the tracking in the surveillance scenario, such as the VIVID [13] and CAVIAR [21] datasets. [sent-88, score-0.437]
43 To facilitate fair performance evaluation, we have collected and annotated the most commonly used tracking sequences. [sent-97, score-0.52]
44 Figure 1 shows the first frame of each sequence where the target object is initialized with a bounding box. [sent-98, score-0.278]
45 Evaluating trackers is difficult because many factors can affect the tracking performance. [sent-100, score-0.835]
46 In addition, we evaluate the robustness of tracking algorithms in two aspects. [sent-115, score-0.45]
47 One widely used evaluation metric on tracking precision is the center location error, which is defined as the average Euclidean distance between the center locations of the tracked targets and the manually labeled ground truths. [sent-117, score-0.53]
48 However, when the tracker loses the target, the output location can be random and the average error value may not measure the tracking performance correctly [6]. [sent-119, score-0.66]
49 Recently the precision plot [6, 27] has been adopted to measure the overall tracking performance. [sent-120, score-0.529]
50 As the representative precision score for each tracker we use the score for the threshold = 20 pixels [6]. [sent-122, score-0.35]
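The center location error and the precision score at the 20-pixel threshold can be illustrated with a minimal sketch. It assumes boxes are stored as `(x, y, width, height)` tuples and that the precision score is the fraction of frames whose center error is within the threshold; the function names (`center_errors`, `precision_at`, `precision_plot`) are hypothetical and are not taken from the authors' code library.

```python
import math


def center(box):
    """Return the center (cx, cy) of a box given as (x, y, width, height)."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def center_errors(tracked, truth):
    """Per-frame Euclidean distance between tracked and ground-truth centers."""
    errors = []
    for t_box, g_box in zip(tracked, truth):
        tx, ty = center(t_box)
        gx, gy = center(g_box)
        errors.append(math.hypot(tx - gx, ty - gy))
    return errors


def precision_at(errors, threshold=20.0):
    """Fraction of frames whose center error is within `threshold` pixels.

    Assumption: "within" is read as <= threshold.
    """
    if not errors:
        return 0.0
    return sum(e <= threshold for e in errors) / len(errors)


def precision_plot(errors, max_threshold=50):
    """Precision at integer thresholds 0..max_threshold (the range is an assumption)."""
    return [precision_at(errors, t) for t in range(max_threshold + 1)]
```

Unlike the average center error, the thresholded score is not dominated by frames where the tracker has lost the target and its output location is essentially arbitrary.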
51 The sequences are ordered based on our ranking results (See supplementary material): the ones on the top left are more difficult for tracking than the ones on the bottom right. [sent-128, score-0.556]
52 5) for tracker evaluation may not be fair or representative. [sent-136, score-0.34]
53 Instead we use the area under curve (AUC) of each success plot to rank the tracking algorithms. [sent-137, score-0.536]
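A success plot and its AUC can be sketched as follows. This is an illustration under stated assumptions rather than the benchmark's implementation: the overlap score is taken to be intersection over union of `(x, y, width, height)` boxes, a frame counts as successful when its overlap is strictly greater than the threshold, thresholds are sampled evenly on [0, 1], and the area is computed with the trapezoidal rule. All function names are hypothetical.

```python
def overlap(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union


def success_curve(overlaps, num_thresholds=21):
    """Success rate (fraction of frames with overlap > t) for t evenly spaced in [0, 1]."""
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    if not overlaps:
        return thresholds, [0.0] * num_thresholds
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return thresholds, rates


def area_under_curve(thresholds, rates):
    """Trapezoidal area under the success curve."""
    area = 0.0
    for i in range(1, len(thresholds)):
        width = thresholds[i] - thresholds[i - 1]
        area += width * (rates[i] + rates[i - 1]) / 2.0
    return area
```

Ranking by the area summarizes the whole curve, so a tracker is not favored or penalized by the choice of one particular overlap threshold.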
54 The conventional way to evaluate trackers is to run them throughout a test sequence with initialization from the ground truth position in the first frame and report the average precision or success rate. [sent-139, score-0.745]
55 However, a tracker may be sensitive to the initialization, and its performance with different initialization at a different start frame may become much worse or better. [sent-141, score-0.377]
56 The proposed test scenarios occur frequently in real-world applications, as a tracker is often initialized by an object detector, which is likely to introduce initialization errors in terms of position and scale. [sent-148, score-0.379]
57 In addition, an object detector may be used to re-initialize a tracker at different time instances. [sent-149, score-0.263]
58 By investigating a tracker's characteristics in the robustness evaluation, a more thorough understanding and analysis of the tracking algorithm can be carried out. [sent-150, score-0.45]
59 Given one initial frame together with the ground-truth bounding box of target, one tracker is initialized and runs to the end of the sequence, i. [sent-152, score-0.455]
60 The tracker is evaluated on each segment, and the overall statistics are tallied. [sent-155, score-0.263]
61 Table 1 lists the average FPS of each tracker in OPE running on a PC with Intel i7 3770 CPU (3. [sent-167, score-0.263]
62 For SRE, each tracker is evaluated 12 times on each sequence, where more than 350,000 bounding box results are generated. [sent-171, score-0.381]
63 For TRE, each sequence is partitioned into 20 segments and thus each tracker is run on around 310,000 frames. [sent-172, score-0.301]
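The temporal partitioning can be sketched as below. The text states only that each sequence is partitioned into 20 segments and that a tracker initialized at a given frame runs to the end of the sequence; the evenly spaced start frames and the name `tre_runs` are assumptions made for illustration.

```python
def tre_runs(num_frames, num_segments=20):
    """Return (start_frame, end_frame) pairs, 0-based and inclusive.

    Each run starts at the first frame of one segment and continues to the
    last frame of the sequence. Evenly spaced starts are an assumption.
    """
    if num_frames <= 0:
        return []
    num_segments = min(num_segments, num_frames)
    starts = [(i * num_frames) // num_segments for i in range(num_segments)]
    return [(start, num_frames - 1) for start in starts]


def frames_processed(runs):
    """Total number of frames a tracker processes over all runs."""
    return sum(end - start + 1 for start, end in runs)
```

For a 100-frame sequence this gives 20 runs starting at frames 0, 5, ..., 95 and 1,050 processed frames in total, which shows why the per-tracker frame count in TRE is much larger than in a single pass over the dataset.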
64 Overall Performance The overall performance for all the trackers is summarized by the success and precision plots; only the top 10 trackers are presented for clarity and complete plots are in the supplementary material (best viewed on high-resolution display). [sent-177, score-1.009]
65 For success plots, we use AUC scores to summarize and rank the trackers, while for precision plots we use the results at error threshold of 20 for ranking. [sent-179, score-0.356]
66 In the precision plots, the rankings of some trackers are slightly different from the rankings in the success plots in that they are based on different metrics which measure different characteristics of trackers. [sent-180, score-0.847]
67 Because the AUC score of success plot measures the overall performance which is more accurate than the score at one threshold of the plot, in the following we mainly analyze the rankings based on success plots but use the precision plots as auxiliary. [sent-181, score-0.744]
68 As the trackers tend to perform well in shorter sequences, the average of all the results in TRE tends to be higher. [sent-183, score-0.438]
69 The initialization errors tend to cause trackers to update with imprecise appearance information, thereby causing gradual drifts. [sent-185, score-0.601]
70 In the success plots, the top ranked tracker SCM in OPE outperforms Struck by 2. [sent-186, score-0.364]
71 The success plots of Struck in TRE and SRE show that the success rate of Struck is higher than SCM and ASLA when the overlap threshold is small, but less than SCM and ASLA when the overlap threshold is large. [sent-192, score-0.502]
72 These trackers perform well in SRE and TRE, which suggests sparse representations are effective models to account for appearance change (e. [sent-195, score-0.515]
73 The AUC score of ASLA decreases less than that of the other top 5 trackers from OPE to SRE, and the ranking of ASLA also increases. [sent-200, score-0.477]
74 The VTD and VTS methods adopt mixture models to improve the tracking performance. [sent-203, score-0.397]
75 Attribute-based Performance Analysis By annotating the attributes of each sequence, we construct subsets with different dominant attributes which facilitates analyzing the performance of trackers for each challenging factor. [sent-208, score-0.586]
76 Due to space limitations, we only illustrate and analyze the success plots and precision plots of SRE for attributes OCC, SV, and FM as shown in Figure 4, and more results are presented in the supplementary material. [sent-209, score-0.628]
77 When an object moves fast, dense sampling based trackers (e. [sent-210, score-0.438]
78 However, the stochastic search based trackers with high overall performance (e. [sent-214, score-0.438]
79 These trackers can be further improved with dynamic models with more effective particle filters. [sent-218, score-0.493]
80 The results show that trackers with affine motion models (e. [sent-222, score-0.438]
81 Initialization with Different Scale It has been known that trackers are often sensitive to initialization variations. [sent-227, score-0.516]
82 Figure 5 and Figure 6 show the summarized tracking performance with initialization at different scales. [sent-228, score-0.475]
83 When computing the overlap score, we rescale the tracking results so that the performance summary could be comparable with the original scale, i. [sent-229, score-0.43]
84 Figure 6 illustrates the average performance of all trackers for each scale which shows the performance often decreases significantly when the scale factor is large (e. [sent-232, score-0.516]
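The scale perturbation of the initial box and the rescaling of results before the overlap is computed can be sketched as follows. The source sentence describing the rescaling is truncated, so scaling about the box center and undoing the initialization factor by its inverse are assumptions; `scale_box` and `rescale_result` are hypothetical names.

```python
def scale_box(box, factor):
    """Scale a box (x, y, width, height) about its center by `factor`."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    new_w, new_h = w * factor, h * factor
    return (cx - new_w / 2.0, cy - new_h / 2.0, new_w, new_h)


def rescale_result(box, init_factor):
    """Map a tracked box back to the original scale (assumed: inverse factor)."""
    return scale_box(box, 1.0 / init_factor)
```

A tracker initialized with `scale_box(ground_truth, 1.2)` would have each output passed through `rescale_result(output, 1.2)` before comparison with the ground truth, so that a tracker which faithfully keeps the enlarged box is not penalized for the enlargement itself.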
85 This indicates these trackers are more sensitive to background clutters. [sent-240, score-0.465]
86 Some trackers perform better when the scale factor is smaller, such as L1APG, MTT, LOT and CPF. [sent-241, score-0.477]
87 On the other hand, some trackers perform well or even better when the initial bounding box is enlarged, such as Struck, OAB, SemiT, and BSBT. [sent-243, score-0.556]
88 Concluding Remarks In this paper, we carry out large scale experiments to evaluate the performance of recent online tracking algorithms. [sent-247, score-0.54]
89 Based on our evaluation results and observations, we highlight some tracking components which are essential for improving tracking performance. [sent-248, score-0.841]
90 , Struck), or serving as the tracking context explicitly (e. [sent-252, score-0.397]
91 Second, local models are important for tracking as shown in the performance improvement of local sparse representation (e. [sent-255, score-0.433]
92 However, most of our evaluated trackers do not focus on this component. [sent-262, score-0.438]
93 Good location prediction based on the dynamic model could reduce the search range and thus improve the tracking efficiency and robustness. [sent-263, score-0.397]
94 The evaluation results show that significant progress in the field of object tracking has been made in the last decade. [sent-265, score-0.471]
95 We propose and demonstrate evaluation metrics for in-depth analysis of tracking algorithms from several perspectives. [sent-266, score-0.444]
96 This large scale performance evaluation facilitates better understanding of the state-of-the-art online object tracking approaches, and provides a platform for gauging new algorithms. [sent-267, score-0.587]
97 Only the top 10 trackers are presented for clarity and complete plots are in the supplementary material (best viewed on high-resolution display). [sent-296, score-0.686]
98 figure, the top 10 trackers are presented for clarity and complete plots are in the supplementary material. [sent-297, score-0.686]
99 Performance summary for the trackers initialized with different sizes of bounding box. [sent-299, score-0.545]
100 AVG (the last one) illustrates the average performance over all trackers for each scale. [sent-300, score-0.438]
