iccv iccv2013 iccv2013-424 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shuran Song, Jianxiong Xiao
Abstract: Despite significant progress, tracking is still considered to be a very challenging task. Recently, the increasing popularity of depth sensors has made it possible to obtain reliable depth easily. This may be a game changer for tracking, since depth can be used to prevent model drift and handle occlusion. We also observe that current tracking algorithms are mostly evaluated on a very small number of videos collected and annotated by different groups. The lack of a reasonable size and consistently constructed benchmark has prevented a persuasive comparison among different algorithms. In this paper, we construct a unified benchmark dataset of 100 RGBD videos with high diversity, propose different kinds of RGBD tracking algorithms using 2D or 3D models, and present a quantitative comparison of various algorithms with RGB or RGBD input. We aim to lay the foundation for further research in both RGB and RGBD tracking, and our benchmark is available at http://tracking.cs.princeton.edu.
Reference: text
sentIndex sentText sentNum sentScore
1 Recently, the increasing popularity of depth sensors has made it possible to obtain reliable depth easily. [sent-2, score-0.422]
2 We also observe that current tracking algorithms are mostly evaluated on a very small number of videos collected and annotated by different groups. [sent-4, score-0.369]
3 In this paper, we construct a unified benchmark dataset of 100 RGBD videos with high diversity, propose different kinds of RGBD tracking algorithms using 2D or 3D models, and present a quantitative comparison of various algorithms with RGB or RGBD input. [sent-6, score-0.608]
4 Besides, occlusion of target objects occurs quite often in real world scenarios. [sent-14, score-0.473]
5 The lack of a reasonable size and consistently constructed benchmark for tracking has been preventing persuasive comparisons. [sent-29, score-0.376]
6 Meanwhile, the great popularity of affordable depth sensors, such as the Microsoft Kinect, Asus Xtion, and PrimeSense, makes depth acquisition very easy. [sent-30, score-0.422]
7 Reliable depth maps can provide valuable additional information to significantly improve tracking results with robust occlusion and model drift handling. [sent-31, score-0.724]
8 Will the availability of depth significantly change the design of the standard tracking pipeline? [sent-33, score-0.463]
9 To establish a unified benchmark, we construct an RGBD dataset of 100 videos, named the Princeton Tracking Benchmark (PTB), which includes deformable objects and various occlusion conditions; the accompanying attribute table distinguishes a stationary vs. moving Kinect, no occlusion vs. the target being occluded in some frames, and target speed. [sent-37, score-0.373]
10 To build a set of diverse baseline algorithms, we design several tracking algorithms incorporating depth information to reduce model drift, and propose a simple scheme for occlusion handling. [sent-48, score-0.761]
11 To carefully design various kinds of baseline algorithms, including a traditional 2D image-patch-based tracker, a new 3D point-cloud-based tracker, a low-level flow-based tracker, and trivial algorithms that do not even use the video. [sent-52, score-0.392]
12 To address target appearance and motion changes, [16] uses a visual tracking decomposition scheme to integrate multiple observation and motion trackers, and [18] presents an incremental subspace learning algorithm. [sent-62, score-0.472]
13 with only people moving, which is obviously not enough to evaluate tracking algorithms for general objects. [sent-83, score-0.329]
14 [26] evaluate 29 2D tracking algorithms on 50 RGB videos combined from different sources, captured and annotated in different settings. [sent-85, score-0.369]
15 In contrast, our benchmark evaluates both RGB and RGBD tracking algorithms, together with our proposed 3D tracking algorithms, on 100 RGBD videos consistently captured and annotated by us. [sent-86, score-0.647]
16 Furthermore, to separate the effect of various assumptions, we calculate upper bounds and lower bounds for the algorithms, and categorize error into three different types to analyze occlusion handling. [sent-87, score-0.328]
17 [13, 29] contain 3D bounding box annotations for cuboid-like objects in RGBD images. [sent-94, score-0.537]
18 Dataset construction: To construct a unified benchmark dataset for different kinds of tracking algorithms, we recorded 100 video clips with both RGB and depth data, and manually annotated ground truth bounding boxes. [sent-103, score-0.999]
19 Annotation: We manually annotate the ground truth (the target location) of the dataset by drawing a bounding box on each frame as follows: A minimum bounding box covering the target is initialized on the first frame. [sent-110, score-1.729]
20 In the next frame, if the target moves or its shape changes, the bounding box will be adjusted accordingly; otherwise, it remains the same. [sent-111, score-0.793]
21 When occlusion occurs, the ground truth is defined as the minimum bounding box covering only the visible portion of the target. [sent-114, score-0.85]
22 When the target is completely occluded there will be no bounding box for this frame. [sent-115, score-0.8]
23 The 2D confidence map shows the combined confidence from the detector and the optical flow tracker. [sent-122, score-0.358]
24 The 1D depth distribution is a Gaussian estimated from the target depth histogram. [sent-123, score-0.673]
25 In the output, the target location (the green bounding boxes) is the position of the highest confidence. [sent-125, score-0.544]
26 The occluder (the blue bounding box) is recognized from its depth value. [sent-126, score-0.486]
27 how long the target is occluded, whether the target moves or undergoes appearance change during occlusion, and similarity between the occluder and target. [sent-135, score-0.591]
28 Bounding box distribution over all sequences: Figure 3 shows the location and size distribution of ground truth boxes across all sequences. [sent-136, score-0.56]
29 The location distribution is computed as a normalized histogram in 640 × 480 image space, where the value at each pixel represents the possibility for a bounding box to cover that pixel. [sent-137, score-0.657]
30 The box size distribution is also computed over all sequences; it shows that our dataset covers long and short, wide and narrow objects. [sent-138, score-0.322]
31 Bounding box variation over time: Apart from overall statistics, we also provide average bounding box statistics within a single video sequence. [sent-139, score-0.832]
32 For each sequence, we compute histograms of the area and aspect ratio of the ground truth bounding boxes relative to the box in the first frame, as well as the box center distance between consecutive frames. [sent-140, score-0.855]
33 The resultant average histogram is shown in Figure 4, which illustrates how much bounding boxes may deform or shift in a single video. [sent-142, score-0.431]
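The per-sequence statistics just described (relative area, relative aspect ratio, and consecutive-frame center distance of the ground-truth boxes) are straightforward to compute; the sketch below is a minimal Python illustration, not the benchmark's code. The (x, y, w, h) box format, the use of None for fully occluded frames, and the histogram binning are assumptions.

```python
import numpy as np

def box_variation_stats(gt_boxes):
    """gt_boxes: list of (x, y, w, h) tuples or None for fully occluded
    frames; the first frame is assumed to be annotated.  Returns relative
    areas and aspect ratios w.r.t. the first box, plus consecutive-frame
    center distances."""
    first = next(b for b in gt_boxes if b is not None)
    x0, y0, w0, h0 = first
    rel_area, rel_aspect, center_dist = [], [], []
    prev_center = None
    for b in gt_boxes:
        if b is None:                 # no ground truth: skip, break the chain
            prev_center = None
            continue
        x, y, w, h = b
        rel_area.append((w * h) / float(w0 * h0))
        rel_aspect.append((w / float(h)) / (w0 / float(h0)))
        c = np.array([x + w / 2.0, y + h / 2.0])
        if prev_center is not None:
            center_dist.append(float(np.linalg.norm(c - prev_center)))
        prev_center = c
    return rel_area, rel_aspect, center_dist

# Histograms as in Figure 4 (bin counts are arbitrary here):
# np.histogram(rel_area, bins=20); np.histogram(center_dist, bins=20)
```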
34 The first one is center position error (CPE), which is the Euclidean distance between centers of output bounding boxes and the ground truth. [sent-146, score-0.494]
35 It indicates how close the tracking results are to the ground truth in each frame. [sent-149, score-0.323]
36 However, the overall performance of trackers cannot be measured by averaging this distance, especially when trackers are misled by background clutter and produce faraway outliers. [sent-150, score-0.395]
37 Besides, this distance is undefined when trackers fail to output a bounding box or there is no ground truth bounding box (the target is totally occluded). [sent-151, score-1.636]
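A minimal sketch of the center position error (CPE), returning None in the undefined cases described above (no tracker output or no ground-truth box); the (x, y, w, h) box convention and function names are illustrative assumptions.

```python
import numpy as np

def center(box):
    """Center of an (x, y, w, h) bounding box."""
    x, y, w, h = box
    return np.array([x + w / 2.0, y + h / 2.0])

def center_position_error(output_box, gt_box):
    """Euclidean distance between box centers; None when the CPE is
    undefined (no output, or the target is totally occluded)."""
    if output_box is None or gt_box is None:
        return None
    return float(np.linalg.norm(center(output_box) - center(gt_box)))

# Per-frame CPE for a short track; undefined frames stay None.
outputs = [(10, 10, 40, 60), None, (12, 11, 40, 60)]
gts     = [(11, 12, 38, 58), (50, 50, 38, 58), None]
print([center_position_error(o, g) for o, g in zip(outputs, gts)])
# -> [1.0, None, None]
```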
38 ui = 1 if ri > rt and 0 otherwise, (2) where ui is an indicator denoting whether the output bounding box of the i-th frame is acceptable, ri is the overlap ratio of that frame's output with the ground truth, N is the number of frames, and rt is the minimum overlap ratio deciding whether an output is correct. [sent-156, score-0.847]
39 Since some trackers may produce outputs that have small overlap ratio over all frames while others give large overlap on some frames and fail completely on the rest, rt must be treated as a variable to conduct a fair comparison. [sent-157, score-0.53]
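A sketch of the success-rate measure of Eq. (2), sweeping the threshold rt as argued above. An intersection-over-union overlap is assumed for ri, and counting a frame as correct when both the output and the ground truth are missing is an interpretation, not a rule quoted from the paper.

```python
import numpy as np

def overlap_ratio(box_a, box_b):
    """Assumed intersection-over-union of two (x, y, w, h) boxes; missing
    boxes give ratio 1 only if both are missing (target occluded, no output)."""
    if box_a is None or box_b is None:
        return 1.0 if box_a is None and box_b is None else 0.0
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_rate(outputs, gts, r_t):
    """Fraction of frames with u_i = 1, i.e. overlap ratio r_i > r_t."""
    return float(np.mean([overlap_ratio(o, g) > r_t
                          for o, g in zip(outputs, gts)]))

def success_curve(outputs, gts, thresholds=np.linspace(0.0, 1.0, 21)):
    """Sweep r_t to compare trackers fairly, as the text suggests."""
    return [(float(t), success_rate(outputs, gts, t)) for t in thresholds]
```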
40 Type II error occurs when the target is invisible but the tracker still outputs a bounding box. [sent-159, score-0.535]
41 The red Gaussians denote the target model, and the green Gaussian denotes the occluder model. [sent-204, score-0.371]
42 Type III error occurs when the target is visible but the tracker fails to give any output. [sent-206, score-0.52]
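A small sketch labeling per-frame failures with the error types discussed here. Type II and Type III follow the definitions above; Type I is assumed to mean a poorly localized output on a visible target, which is an interpretation since the extracted text does not spell it out, and the 0.5 threshold is a placeholder.

```python
def error_type(output_box, gt_box, overlap, r_t=0.5):
    """Classify one frame's result into correct / Type I / II / III."""
    target_visible = gt_box is not None
    has_output = output_box is not None
    if not target_visible and has_output:
        return "Type II"    # target invisible, but a box is still reported
    if target_visible and not has_output:
        return "Type III"   # target visible, but no box is reported
    if target_visible and has_output and overlap <= r_t:
        return "Type I"     # reported box overlaps the ground truth too little
    return "correct"
```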
43 The first one adopts traditional 2D image patch tracking with additional depth features; the second one is based on 3D point clouds and outputs a 3D target bounding box in space, which is a more natural way of handling 3D data. [sent-209, score-1.527]
44 Detection based tracking: Tracking by detection is done by building a discriminative model of the tracking target, and using it to classify potential targets in subsequent frames. [sent-215, score-0.552]
45 The candidate with the highest confidence is regarded as the tracking result. [sent-216, score-0.341]
46 HOG for depth is obtained by treating depth data as a grayscale image. [sent-218, score-0.422]
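A sketch of HOG on the depth channel, treating depth as a grayscale image as stated above; scikit-image's hog is used as a stand-in for whatever HOG implementation the authors used, and the normalization range is an assumed sensor constant, not a value from the paper.

```python
import numpy as np
from skimage.feature import hog

def depth_hog(depth, max_depth_mm=10000.0):
    """HOG on a depth map: scale raw depth (e.g. Kinect millimetres) into
    [0, 1] and treat it as a grayscale image."""
    d = np.nan_to_num(depth.astype(np.float32))
    d = np.clip(d / max_depth_mm, 0.0, 1.0)
    return hog(d, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm='L2-Hys')
```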
47 Point cloud feature: The point cloud feature is designed to capture the color and shape of cells of 3D points. [sent-220, score-0.317]
48 In the first frame, the SVM is trained by using the user's input bounding box as the positive example and randomly picked bounding boxes that do not overlap with the target as negative examples. [sent-256, score-1.196]
49 Afterwards, the SVM is retrained during the non-occlusion state using the resulting bounding box and the positive support vectors from the previous frames with hard negative mining. [sent-259, score-0.574]
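A rough sketch of the training scheme described in the last two sentences: the user's box as the only positive in the first frame, random non-overlapping boxes as negatives, then retraining during the non-occlusion state with the new result as a positive and hard negatives mined from high-scoring, low-overlap candidates. scikit-learn's LinearSVC stands in for the paper's classifier, remembered positives stand in for the retained positive support vectors, and all thresholds are placeholders.

```python
import numpy as np
from sklearn.svm import LinearSVC

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    return inter / float(a[2] * a[3] + b[2] * b[3] - inter)

def train_initial_svm(feature_fn, frame, target_box, sample_boxes):
    """First-frame training: the user's box is the positive example and
    randomly sampled boxes that do not overlap the target are negatives."""
    pos = [feature_fn(frame, target_box)]
    neg = [feature_fn(frame, b) for b in sample_boxes
           if iou(b, target_box) == 0.0]
    X = np.vstack(pos + neg)
    y = np.array([1] * len(pos) + [0] * len(neg))
    clf = LinearSVC(C=1.0)                      # C is a placeholder setting
    clf.fit(X, y)
    return clf

def retrain(clf, feature_fn, frame, result_box, candidate_boxes,
            pos_memory, hard_thresh=0.0, neg_overlap=0.3):
    """Non-occlusion update: keep remembered positives (standing in for the
    retained positive support vectors), add the new result as a positive,
    and add high-scoring candidates with little overlap as hard negatives."""
    pos_memory.append(feature_fn(frame, result_box))
    hard_neg = []
    for b in candidate_boxes:
        f = feature_fn(frame, b)
        if clf.decision_function(f[None, :])[0] > hard_thresh and \
                iou(b, result_box) < neg_overlap:
            hard_neg.append(f)
    if hard_neg:
        X = np.vstack(pos_memory + hard_neg)
        y = np.array([1] * len(pos_memory) + [0] * len(hard_neg))
        clf.fit(X, y)
    return clf
```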
50 Point tracking. 2D optical flow tracking: The 2D optical flow tracker adopts large-displacement optical flow [5] on RGB data from consecutive frames, then generates the bounding box of the points validated by forward-backward checking. [sent-262, score-1.801]
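The flow tracker above uses large-displacement optical flow [5]; the sketch below substitutes OpenCV's pyramidal Lucas-Kanade purely to illustrate the forward-backward validation and the bounding box of the surviving points. Corner counts, the consistency threshold, and the integer (x, y, w, h) box convention are assumptions.

```python
import numpy as np
import cv2

def flow_track_box(prev_gray, next_gray, prev_box, fb_thresh=1.0):
    """Track points inside prev_box into the next frame, keep points that
    pass a forward-backward consistency check, and return the bounding box
    of the surviving points (or None if too few survive)."""
    x, y, w, h = prev_box                      # integer box assumed
    pts = cv2.goodFeaturesToTrack(prev_gray[y:y+h, x:x+w], maxCorners=200,
                                  qualityLevel=0.01, minDistance=3)
    if pts is None:
        return None
    pts = (pts + np.array([x, y], np.float32)).astype(np.float32)  # (N,1,2)
    fwd, st_f, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    bwd, st_b, _ = cv2.calcOpticalFlowPyrLK(next_gray, prev_gray, fwd, None)
    fb_err = np.linalg.norm(pts.reshape(-1, 2) - bwd.reshape(-1, 2), axis=1)
    ok = (st_f.ravel() == 1) & (st_b.ravel() == 1) & (fb_err < fb_thresh)
    if ok.sum() < 5:                           # too few validated points
        return None
    good = fwd.reshape(-1, 2)[ok]
    x0, y0 = good.min(axis=0)
    x1, y1 = good.max(axis=0)
    return (int(x0), int(y0), int(x1 - x0), int(y1 - y0))
```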
51 3D iterative closest point tracking: For 3D tracking, we adopt the Iterative Closest Point (ICP) algorithm [31], which iteratively computes a rigid transformation that minimizes the mean square error between two sets of points in real time. [sent-263, score-0.334]
52 The rigidity assumption holds in most cases, but it might fail when the target deforms and produces a large error E(R, t, s). [sent-264, score-0.318]
53 In this case, according to the small motion assumption, our tracker looks for the target near its previous position. [sent-265, score-0.41]
54 We treat the biggest connected component in the neighborhood of the previous position as the new target position, and return a 3D bounding box that encompasses it. [sent-266, score-0.757]
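A compact point-to-point ICP sketch in the spirit of the 3D tracker described above: nearest-neighbour correspondences followed by a closed-form (Kabsch/SVD) rigid fit, iterated until the mean squared residual stops improving. The paper's ICP [31] also estimates a scale s and falls back to the connected-component search described above when E(R, t, s) is large; both parts are omitted here, and the iteration count and tolerance are placeholders.

```python
import numpy as np
from scipy.spatial import cKDTree

def rigid_fit(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst
    (Kabsch algorithm); src and dst are (N, 3) arrays in correspondence."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(source, target, iters=20, tol=1e-5):
    """Point-to-point ICP: returns (R, t, mean squared residual)."""
    tree = cKDTree(target)
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    err = np.inf
    for _ in range(iters):
        dist, idx = tree.query(src)            # nearest-neighbour matches
        R, t = rigid_fit(src, target[idx])
        src = src @ R.T + t                    # apply the incremental transform
        R_total, t_total = R @ R_total, R @ t_total + t
        new_err = float(np.mean(dist ** 2))
        if abs(err - new_err) < tol:
            err = new_err
            break
        err = new_err
    return R_total, t_total, err
```

A large final residual would be the cue, per the text above, to fall back to the small-motion assumption and take the biggest connected component near the previous position instead.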
55 Integration of detection and point tracking: Both detection and point tracking are initialized by the input bounding box in the first frame and updated online. [sent-269, score-1.311]
56 In each frame, they run independently, and after their results are available, the confidence of the detection result is adjusted as c = cd + αr(t,d), where cd is the confidence of the detection, and r(t,d) is the overlap ratio between the detection's and the point tracker's resulting bounding boxes. [sent-270, score-0.734]
57 After thresholding on the adjusted confidence, the bounding box with the highest confidence, if it exists, is passed to occlusion checking and then output as the final result when occlusion is not detected. [sent-275, score-1.028]
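A small sketch of the fusion rule quoted above, c = cd + α·r(t,d), followed by thresholding and selection of the most confident box; α and the confidence threshold are placeholders, and r(t,d) is assumed to be an overlap ratio such as the one in the evaluation sketch earlier.

```python
def fuse_confidence(c_d, r_td, alpha=0.5):
    """Adjusted detection confidence c = c_d + alpha * r(t, d), where
    r(t, d) is the overlap ratio between the detection box and the point
    tracker's box.  alpha = 0.5 is a placeholder, not the paper's value."""
    return c_d + alpha * r_td

def select_output(candidates, conf_thresh=0.0):
    """candidates: list of (adjusted_confidence, box).  Return the box with
    the highest confidence above the threshold, or None; the selected box
    is then passed to occlusion checking before being output."""
    kept = [(c, b) for c, b in candidates if c > conf_thresh]
    return max(kept, key=lambda cb: cb[0])[1] if kept else None
```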
58 Occlusion handling: To handle occlusion, some traditional RGB trackers like [15] use forward-backward error to indicate tracking failure, and some like [1, 3] use a fragment-based model to reduce sensitivity to partial occlusion. [sent-279, score-0.508]
59 Here we propose a simple yet effective occlusion handling mechanism which actively detects occlusion and recovery. [sent-281, score-0.505]
60 Occlusion detection: Occlusion handling is based on 2D bounding boxes (3D bounding boxes are projected back to 2D space). [sent-282, score-0.876]
61 To detect the occlusion, we assume that the target is the closest object that dominates the bounding box when not occluded. [sent-283, score-0.788]
62 A new object in front of the target inside the bounding box indicates the beginning of occlusion state. [sent-284, score-1.0]
63 Therefore, the depth histogram inside the bounding box is expected to have a newly rising peak with a smaller depth value than the target, and/or a reduction in the size of the bins around the target depth, as illustrated in Figure 8. [sent-285, score-1.251]
64 In the i-th frame, the depth histogram hi of all pixels inside a bounding box can be approximated as a Gaussian distribution: hi ∼ N(μi, σi²). [sent-286, score-0.908]
65 The number of pixels in the bounding box that have a smaller depth value than the target depth is considered the area of the occluder. [sent-296, score-1.179]
66 The search also includes the neighborhood of the target bounding box, which has a size of 0. [sent-298, score-0.495]
67 The target depth value is updated online, so a target moving towards the camera will not be treated as an occlusion. [sent-300, score-0.651]
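A sketch of the depth-based occlusion test described in the preceding sentences: approximate the depth histogram inside the box by a Gaussian N(μ, σ²), count pixels clearly in front of the target as occluder area, and flag occlusion when that fraction becomes large. The one-sigma margin and the area-fraction threshold are assumptions, and zero depth readings are assumed to be invalid sensor values.

```python
import numpy as np

def target_depth_stats(depth, box):
    """Gaussian approximation (mu, sigma) of the depth values inside box."""
    x, y, w, h = box
    d = depth[y:y+h, x:x+w].astype(np.float32)
    d = d[d > 0]                      # drop invalid (zero) depth readings
    return float(d.mean()), float(d.std())

def detect_occlusion(depth, box, mu, sigma, area_frac=0.3):
    """Occlusion starts when pixels noticeably closer than the target
    (depth < mu - sigma, an assumed margin) cover a large enough fraction
    of the box (area_frac is an assumed threshold).  mu should be updated
    online so a target moving towards the camera is not flagged."""
    x, y, w, h = box
    d = depth[y:y+h, x:x+w].astype(np.float32)
    valid = d > 0
    occluder = valid & (d < mu - sigma)
    frac = occluder.sum() / max(valid.sum(), 1)
    return frac > area_frac, float(frac)
```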
68 The occluder model, i.e., its depth and color distribution, is initialized when entering the occlusion state, and its position is updated by an optical flow tracker. [sent-303, score-0.67]
69 With depth and color distributions of target and occluder, the local search is done by performing segmentation on RGB and depth data respectively and combining their results. [sent-361, score-0.679]
70 By examining the list of possible target candidates, the tracker interprets target recovery when at least one candidate's score evaluated by the SVM classifier is high, and its visible area is large enough compared to the target area before entering occlusion. [sent-363, score-0.943]
71 The occlusion subroutine ends if the target is recovered from occlusion. [sent-364, score-0.431]
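A sketch of the recovery test applied to the candidate list during the occlusion state: a candidate is accepted when its SVM score is high and its visible area is large enough relative to the target area before occlusion. The classifier and feature function are assumed to be the ones from the detection sketch above, and both thresholds are placeholders.

```python
def try_recover(candidates, clf, feature_fn, frame,
                pre_occlusion_area, score_thresh=0.5, area_ratio=0.5):
    """candidates: list of (box, visible_area) found by the local search.
    Return the highest-scoring box whose SVM score exceeds score_thresh and
    whose visible area is at least area_ratio of the pre-occlusion target
    area, or None if the target has not yet reappeared."""
    best = None
    for box, visible_area in candidates:
        score = clf.decision_function(feature_fn(frame, box)[None, :])[0]
        if score > score_thresh and \
                visible_area >= area_ratio * pre_occlusion_area:
            if best is None or score > best[0]:
                best = (score, box)
    return best[1] if best else None
```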
72 3D bounding boxes are projected back to 2D for evaluation. [sent-367, score-0.391]
73 To understand how much depth information improves the performance and evaluate the contribution of each building block, we tested nine variations of our proposed RGBD tracker listed in Table 1. [sent-368, score-0.401]
74 To understand the impact of model assumptions, we use the ground truth to design several performance upper bounds under different model assumptions, such as fixed box size, aspect ratio, or target being always visible (Table 2). [sent-370, score-0.659]
75 The effect of using depth data can be seen by comparing the tracker with depth input (RGBD+OF) and without (RGB+OF). [sent-376, score-0.612]
76 After enabling the occlusion handler, the tracker corresponds to the RGBDOcc+OF variation in Table 1. [sent-379, score-0.401]
77 RGBDOcc+OF Uses RGBD HOG detection and optical flow with occlusion handling. [sent-385, score-0.439]
78 PC(det+flow) Uses point cloud detection and 3D point tracking with occlusion handling. [sent-388, score-0.741]
79 Performance upper-bounds (GT: ground truth). GTfirstSize Uses the GT location and the first-frame box size. [sent-390, score-0.395]
80 GTfirstRatio Uses the GT location and the first-frame box aspect ratio. [sent-392, score-0.43]
81 GTbestRatio Uses the GT location and the fixed box aspect ratio that optimizes the success rate. [sent-393, score-0.402]
82 Performance lower-bound algorithms. IIDfirstBB Always outputs the first-frame bounding box for all frames. [sent-396, score-0.716]
83 IIDcenterBB Always outputs a box located at the center of the image, with the first-frame box size. [sent-397, score-0.654]
84 IIDrandSize Outputs bounding boxes with the first-frame box location and a random size based on dataset statistics. [sent-398, score-0.815]
85 The point cloud based tracker (PC) also achieves at least a 5. [sent-401, score-0.375]
86 For algorithms that assume fixed-size bounding boxes, GTfirstSize is the upper bound they should be compared with. [sent-410, score-0.324]
87 Our proposed baseline algorithms use very powerful but computationally expensive features, classifiers, and a state-of-the-art optical flow algorithm, while some other trackers mainly focus on real-time performance. [sent-413, score-0.44]
88 Discussion. Advantage from depth: From the evaluation results, trackers that utilize depth have advantages especially when the target rotates, deforms, or is under occlusion. [sent-418, score-0.875]
89 When the target is partially occluded (video “face”, Figure 11 Row 3), fragment-based trackers (e.g. [sent-421, score-0.436]
90 [3, 11]) can locate the target, but approaches that do not produce output with low confidence sometimes lose track of the target at this point. [sent-423, score-0.473]
91 However, from depth data, trackers are able to identify the occluder and raise the confidence in its neighboring 3D region, compensating for the confidence loss due to partial occlusion, and thus identify the target more accurately. [sent-424, score-0.933]
92 When the occluder gradually grows inside the target bounding box, if not excluded, it will finally dominate the bounding box (videos “sign” and “walking people”, Figure 11 Rows 2 and 4). [sent-425, score-1.246]
93 With a reliable occlusion detection mechanism, the occluder can be recognized and hence will not be output as the result or used to update models. [sent-428, score-0.443]
94 2D image patch and 3D point cloud: The results above show that, between the two methods that utilize depth data, the 2D image patch based tracker slightly outperforms the one based on the 3D point cloud. [sent-429, score-0.631]
95 Conclusions: We propose a unified tracking benchmark for both RGB and RGBD tracking, and present the evaluation of several trackers (the methods compared in Figure 11 include RGBDOcc+OF, RGB+OF, PC det+flow, TLD, CT, and MIL). [sent-432, score-0.408]
96 Output bounding boxes and their center position error (CPE). [sent-438, score-0.428]
97 The CPE is undefined when trackers fail to output a bounding box or there is no ground truth bounding box (the target is totally occluded). [sent-440, score-1.636]
98 We design a simple occlusion handling algorithm based on the depth map, and also evaluate several state-of-the-art RGB tracking algorithms. [sent-442, score-0.72]
99 The results demonstrate that by incorporating depth data, trackers can achieve better performance and handle occlusion much more reliably. [sent-443, score-0.595]
100 People tracking in rgb-d data with on-line boosted target models. [sent-552, score-0.472]
wordName wordTfidf (topN-words)
[('rgbd', 0.399), ('bounding', 0.275), ('box', 0.262), ('tracking', 0.252), ('target', 0.22), ('depth', 0.211), ('occlusion', 0.211), ('tracker', 0.19), ('rgb', 0.189), ('trackers', 0.173), ('occluder', 0.151), ('cloud', 0.14), ('boxes', 0.116), ('roigi', 0.098), ('roiti', 0.098), ('optical', 0.093), ('confidence', 0.089), ('flow', 0.087), ('cpe', 0.087), ('hog', 0.085), ('frame', 0.084), ('struck', 0.081), ('benchmark', 0.075), ('null', 0.074), ('ofuses', 0.073), ('ptb', 0.073), ('rgbdocc', 0.073), ('videos', 0.068), ('ihog', 0.065), ('tld', 0.063), ('gt', 0.059), ('rt', 0.056), ('ratio', 0.056), ('mil', 0.056), ('unified', 0.053), ('cell', 0.052), ('drift', 0.05), ('type', 0.049), ('location', 0.049), ('gtbestratio', 0.049), ('gtfirstsize', 0.049), ('iidrandlocoutputs', 0.049), ('misled', 0.049), ('pcdet', 0.049), ('persuasive', 0.049), ('algorithms', 0.049), ('princeton', 0.048), ('overlap', 0.048), ('detection', 0.048), ('names', 0.048), ('handling', 0.046), ('outputs', 0.046), ('point', 0.045), ('hi', 0.044), ('spinello', 0.043), ('occluded', 0.043), ('occurs', 0.042), ('compressive', 0.041), ('bounds', 0.04), ('histogram', 0.04), ('xiao', 0.038), ('truth', 0.038), ('afterwards', 0.038), ('ellipsoids', 0.038), ('baseline', 0.038), ('error', 0.037), ('frames', 0.037), ('color', 0.037), ('mechanism', 0.037), ('undefined', 0.036), ('adjusted', 0.036), ('aspect', 0.035), ('scharstein', 0.035), ('statistics', 0.033), ('kinds', 0.033), ('output', 0.033), ('iros', 0.033), ('ground', 0.033), ('online', 0.032), ('deforms', 0.032), ('vtd', 0.032), ('inside', 0.032), ('list', 0.031), ('rankings', 0.031), ('dominates', 0.031), ('entering', 0.031), ('annotate', 0.031), ('svm', 0.031), ('distribution', 0.031), ('visible', 0.031), ('ct', 0.031), ('adopts', 0.03), ('bias', 0.03), ('stereo', 0.029), ('kinect', 0.029), ('fail', 0.029), ('dataset', 0.029), ('rate', 0.028), ('evaluation', 0.028), ('people', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.000001 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
Author: Shuran Song, Jianxiong Xiao
Abstract: Despite significant progress, tracking is still considered to be a very challenging task. Recently, the increasing popularity of depth sensors has made it possible to obtain reliable depth easily. This may be a game changer for tracking, since depth can be used to prevent model drift and handle occlusion. We also observe that current tracking algorithms are mostly evaluated on a very small number of videos collected and annotated by different groups. The lack of a reasonable size and consistently constructed benchmark has prevented a persuasive comparison among different algorithms. In this paper, we construct a unified benchmark dataset of 100 RGBD videos with high diversity, propose different kinds of RGBD tracking algorithms using 2D or 3D models, and present a quantitative comparison of various algorithms with RGB or RGBD input. We aim to lay the foundation for further research in both RGB and RGBD tracking, and our benchmark is available at http://tracking.cs.princeton.edu.
2 0.25681496 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
Author: Siyu Tang, Mykhaylo Andriluka, Anton Milan, Konrad Schindler, Stefan Roth, Bernt Schiele
Abstract: People tracking in crowded real-world scenes is challenging due to frequent and long-term occlusions. Recent tracking methods obtain the image evidence from object (people) detectors, but typically use off-the-shelf detectors and treat them as black box components. In this paper we argue that for best performance one should explicitly train people detectors on failure cases of the overall tracker instead. To that end, we first propose a novel joint people detector that combines a state-of-the-art single person detector with a detector for pairs of people, which explicitly exploits common patterns of person-person occlusions across multiple viewpoints that are a frequent failure case for tracking in crowded scenes. To explicitly address remaining failure modes of the tracker we explore two methods. First, we analyze typical failures of trackers and train a detector explicitly on these cases. And second, we train the detector with the people tracker in the loop, focusing on the most common tracker failures. We show that our joint multi-person detector significantly improves both detection accuracy as well as tracker performance, improving the state-of-the-art on standard benchmarks.
Author: Yu Pang, Haibin Ling
Abstract: Evaluating visual tracking algorithms, or “trackers” for short, is of great importance in computer vision. However, it is hard to “fairly” compare trackers due to many parameters need to be tuned in the experimental configurations. On the other hand, when introducing a new tracker, a recent trend is to validate it by comparing it with several existing ones. Such an evaluation may have subjective biases towards the new tracker which typically performs the best. This is mainly due to the difficulty to optimally tune all its competitors and sometimes the selected testing sequences. By contrast, little subjective bias exists towards the “second best” ones in the contest. This observation inspires us with a novel perspective towards inhibiting subjective bias in evaluating trackers by analyzing the results between the second bests. In particular, we first collect all tracking papers published in major computer vision venues in recent years. From these papers, after filtering out potential biases in various aspects, we create a dataset containing many records of comparison results between various visual trackers. Using these records, we derive performance rankings of the involved trackers by four different methods. The first two methods model the dataset as a graph and then derive the rankings over the graph, one by a rank aggregation algorithm and the other by a PageRank-like solution. The other two methods take the records as generated from sports contests and adopt widely used Elo’s and Glicko’s rating systems to derive the rankings. The experimental results are presented and may serve as a reference for related research.
4 0.2334819 379 iccv-2013-Semantic Segmentation without Annotating Segments
Author: Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
Abstract: Numerous existing object segmentation frameworks commonly utilize the object bounding box as a prior. In this paper, we address semantic segmentation assuming that object bounding boxes are provided by object detectors, but no training data with annotated segments are available. Based on a set of segment hypotheses, we introduce a simple voting scheme to estimate shape guidance for each bounding box. The derived shape guidance is used in the subsequent graph-cut-based figure-ground segmentation. The final segmentation result is obtained by merging the segmentation results in the bounding boxes. We conduct an extensive analysis of the effect of object bounding box accuracy. Comprehensive experiments on both the challenging PASCAL VOC object segmentation dataset and GrabCut50 image segmentation dataset show that the proposed approach achieves competitive results compared to previous detection or bounding box prior based methods, as well as other state-of-the-art semantic segmentation methods.
5 0.22957781 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
Author: Stefan Duffner, Christophe Garcia
Abstract: In this paper, we present a novel algorithm for fast tracking of generic objects in videos. The algorithm uses two components: a detector that makes use of the generalised Hough transform with pixel-based descriptors, and a probabilistic segmentation method based on global models for foreground and background. These components are used for tracking in a combined way, and they adapt each other in a co-training manner. Through effective model adaptation and segmentation, the algorithm is able to track objects that undergo rigid and non-rigid deformations and considerable shape and appearance variations. The proposed tracking method has been thoroughly evaluated on challenging standard videos, and outperforms state-of-the-art tracking methods designed for the same task. Finally, the proposed models allow for an extremely efficient implementation, and thus tracking is very fast.
6 0.22724855 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking
7 0.22677101 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
8 0.21715611 338 iccv-2013-Randomized Ensemble Tracking
9 0.20929651 217 iccv-2013-Initialization-Insensitive Visual Tracking through Voting with Salient Local Features
10 0.20892398 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
11 0.20870884 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking
12 0.19722228 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels
13 0.18451858 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
14 0.18087563 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
15 0.17951934 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors
16 0.17224644 382 iccv-2013-Semi-dense Visual Odometry for a Monocular Camera
17 0.15869473 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
18 0.15868263 101 iccv-2013-DCSH - Matching Patches in RGBD Images
19 0.15816067 36 iccv-2013-Accurate and Robust 3D Facial Capture Using a Single RGBD Camera
20 0.14998682 317 iccv-2013-Piecewise Rigid Scene Flow
topicId topicWeight
[(0, 0.31), (1, -0.159), (2, 0.026), (3, 0.065), (4, 0.088), (5, -0.137), (6, -0.143), (7, 0.126), (8, -0.149), (9, 0.204), (10, -0.05), (11, -0.126), (12, 0.083), (13, 0.044), (14, 0.057), (15, -0.202), (16, 0.034), (17, -0.0), (18, -0.025), (19, -0.05), (20, -0.083), (21, 0.108), (22, -0.061), (23, -0.007), (24, 0.058), (25, 0.016), (26, -0.042), (27, 0.021), (28, -0.04), (29, -0.001), (30, -0.034), (31, -0.056), (32, -0.079), (33, 0.047), (34, -0.088), (35, -0.013), (36, 0.066), (37, 0.012), (38, -0.033), (39, 0.109), (40, -0.018), (41, -0.082), (42, 0.052), (43, 0.003), (44, 0.086), (45, 0.047), (46, 0.136), (47, 0.005), (48, -0.021), (49, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.98488909 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
Author: Shuran Song, Jianxiong Xiao
Abstract: Despite significant progress, tracking is still considered to be a very challenging task. Recently, the increasing popularity of depth sensors has made it possible to obtain reliable depth easily. This may be a game changer for tracking, since depth can be used to prevent model drift and handle occlusion. We also observe that current tracking algorithms are mostly evaluated on a very small number of videos collected and annotated by different groups. The lack of a reasonable size and consistently constructed benchmark has prevented a persuasive comparison among different algorithms. In this paper, we construct a unified benchmark dataset of 100 RGBD videos with high diversity, propose different kinds of RGBD tracking algorithms using 2D or 3D models, and present a quantitative comparison of various algorithms with RGB or RGBD input. We aim to lay the foundation for further research in both RGB and RGBD tracking, and our benchmark is available at http://tracking.cs.princeton.edu.
Author: Yu Pang, Haibin Ling
Abstract: Evaluating visual tracking algorithms, or “trackers” for short, is of great importance in computer vision. However, it is hard to “fairly” compare trackers due to many parameters need to be tuned in the experimental configurations. On the other hand, when introducing a new tracker, a recent trend is to validate it by comparing it with several existing ones. Such an evaluation may have subjective biases towards the new tracker which typically performs the best. This is mainly due to the difficulty to optimally tune all its competitors and sometimes the selected testing sequences. By contrast, little subjective bias exists towards the “second best” ones in the contest. This observation inspires us with a novel perspective towards inhibiting subjective bias in evaluating trackers by analyzing the results between the second bests. In particular, we first collect all tracking papers published in major computer vision venues in recent years. From these papers, after filtering out potential biases in various aspects, we create a dataset containing many records of comparison results between various visual trackers. Using these records, we derive performance rankings of the involved trackers by four different methods. The first two methods model the dataset as a graph and then derive the rankings over the graph, one by a rank aggregation algorithm and the other by a PageRank-like solution. The other two methods take the records as generated from sports contests and adopt widely used Elo’s and Glicko’s rating systems to derive the rankings. The experimental results are presented and may serve as a reference for related research.
3 0.71805531 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking
Author: Dapeng Chen, Zejian Yuan, Yang Wu, Geng Zhang, Nanning Zheng
Abstract: Representation is a fundamental problem in object tracking. Conventional methods track the target by describing its local or global appearance. In this paper we present that, besides the two paradigms, the composition of local region histograms can also provide diverse and important object cues. We use cells to extract local appearance, and construct complex cells to integrate the information from cells. With different spatial arrangements of cells, complex cells can explore various contextual information at multiple scales, which is important to improve the tracking performance. We also develop a novel template-matching algorithm for object tracking, where the template is composed of temporal varying cells and has two layers to capture the target and background appearance respectively. An adaptive weight is associated with each complex cell to cope with occlusion as well as appearance variation. A fusion weight is associated with each complex cell type to preserve the global distinctiveness. Our algorithm is evaluated on 25 challenging sequences, and the results not only confirm the contribution of each component in our tracking system, but also outperform other competing trackers.
4 0.71115249 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
Author: Zhibin Hong, Xue Mei, Danil Prokhorov, Dacheng Tao
Abstract: Combining multiple observation views has proven beneficial for tracking. In this paper, we cast tracking as a novel multi-task multi-view sparse learning problem and exploit the cues from multiple views including various types of visual features, such as intensity, color, and edge, where each feature observation can be sparsely represented by a linear combination of atoms from an adaptive feature dictionary. The proposed method is integrated in a particle filter framework where every view in each particle is regarded as an individual task. We jointly consider the underlying relationship between tasks across different views and different particles, and tackle it in a unified robust multi-task formulation. In addition, to capture the frequently emerging outlier tasks, we decompose the representation matrix to two collaborative components which enable a more robust and accurate approximation. We show that the proposed formulation can be efficiently solved using the Accelerated Proximal Gradient method with a small number of closed-form updates. The presented tracker is implemented using four types of features and is tested on numerous benchmark video sequences. Both the qualitative and quantitative results demonstrate the superior performance of the proposed approach compared to several state-of-the-art trackers.
5 0.70330459 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
Author: Stefan Duffner, Christophe Garcia
Abstract: In this paper, we present a novel algorithm for fast tracking of generic objects in videos. The algorithm uses two components: a detector that makes use of the generalised Hough transform with pixel-based descriptors, and a probabilistic segmentation method based on global models for foreground and background. These components are used for tracking in a combined way, and they adapt each other in a co-training manner. Through effective model adaptation and segmentation, the algorithm is able to track objects that undergo rigid and non-rigid deformations and considerable shape and appearance variations. The proposed tracking method has been thoroughly evaluated on challenging standard videos, and outperforms state-of-the-art tracking methods designed for the same task. Finally, the proposed models allow for an extremely efficient implementation, and thus tracking is very fast.
6 0.70054686 217 iccv-2013-Initialization-Insensitive Visual Tracking through Voting with Salient Local Features
7 0.69773716 338 iccv-2013-Randomized Ensemble Tracking
8 0.6949386 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
9 0.69472659 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation
10 0.65790147 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
11 0.65014231 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking
12 0.63643175 254 iccv-2013-Live Metric 3D Reconstruction on Mobile Phones
13 0.61427051 87 iccv-2013-Conservation Tracking
14 0.60724461 270 iccv-2013-Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking
15 0.60495204 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
16 0.59690648 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
17 0.58166939 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
18 0.57213986 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
19 0.57002807 395 iccv-2013-Slice Sampling Particle Belief Propagation
20 0.55839765 382 iccv-2013-Semi-dense Visual Odometry for a Monocular Camera
topicId topicWeight
[(2, 0.056), (7, 0.021), (12, 0.017), (26, 0.06), (31, 0.039), (35, 0.015), (40, 0.02), (42, 0.091), (45, 0.117), (64, 0.174), (73, 0.057), (78, 0.011), (89, 0.21), (97, 0.012), (98, 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.92225349 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
Author: Shuran Song, Jianxiong Xiao
Abstract: Despite significant progress, tracking is still considered to be a very challenging task. Recently, the increasing popularity of depth sensors has made it possible to obtain reliable depth easily. This may be a game changer for tracking, since depth can be used to prevent model drift and handle occlusion. We also observe that current tracking algorithms are mostly evaluated on a very small number of videos collected and annotated by different groups. The lack of a reasonable size and consistently constructed benchmark has prevented a persuasive comparison among different algorithms. In this paper, we construct a unified benchmark dataset of 100 RGBD videos with high diversity, propose different kinds of RGBD tracking algorithms using 2D or 3D models, and present a quantitative comparison of various algorithms with RGB or RGBD input. We aim to lay the foundation for further research in both RGB and RGBD tracking, and our benchmark is available at http://tracking.cs.princeton.edu.
2 0.91473544 215 iccv-2013-Incorporating Cloud Distribution in Sky Representation
Author: Kuan-Chuan Peng, Tsuhan Chen
Abstract: Most sky models only describe the cloudiness of the overall sky by a single category or parameter such as sky index, which does not account for the distribution of the clouds across the sky. To capture variable cloudiness, we extend the concept of sky index to a random field indicating the level of cloudiness of each sky pixel in our proposed sky representation based on the Igawa sky model. We formulate the problem of solving the sky index of every sky pixel as a labeling problem, where an approximate solution can be efficiently found. Experimental results show that our proposed sky model has better expressiveness, stability with respect to variation in camera parameters, and geo-location estimation in outdoor images compared to the uniform sky index model. Potential applications of our proposed sky model include sky image rendering, where sky images can be generated with an arbitrary cloud distribution at any time and any location, previously impossible with traditional sky models.
3 0.91443777 441 iccv-2013-Video Motion for Every Visible Point
Author: Susanna Ricco, Carlo Tomasi
Abstract: Dense motion of image points over many video frames can provide important information about the world. However, occlusions and drift make it impossible to compute long motion paths by merely concatenating optical flow vectors between consecutive frames. Instead, we solve for entire paths directly, and flag the frames in which each is visible. As in previous work, we anchor each path to a unique pixel which guarantees an even spatial distribution of paths. Unlike earlier methods, we allow paths to be anchored in any frame. By explicitly requiring that at least one visible path passes within a small neighborhood of every pixel, we guarantee complete coverage of all visible points in all frames. We achieve state-of-the-art results on real sequences including both rigid and non-rigid motions with significant occlusions.
4 0.9130981 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
Author: Shugao Ma, Jianming Zhang, Nazli Ikizler-Cinbis, Stan Sclaroff
Abstract: We propose Hierarchical Space-Time Segments as a new representation for action recognition and localization. This representation has a two-level hierarchy. The first level comprises the root space-time segments that may contain a human body. The second level comprises multi-grained space-time segments that contain parts of the root. We present an unsupervised method to generate this representation from video, which extracts both static and non-static relevant space-time segments, and also preserves their hierarchical and temporal relationships. Using simple linear SVM on the resultant bag of hierarchical space-time segments representation, we attain better than, or comparable to, state-of-the-art action recognition performance on two challenging benchmark datasets and at the same time produce good action localization results.
5 0.91228008 88 iccv-2013-Constant Time Weighted Median Filtering for Stereo Matching and Beyond
Author: Ziyang Ma, Kaiming He, Yichen Wei, Jian Sun, Enhua Wu
Abstract: Despite the continuous advances in local stereo matching for years, most efforts are on developing robust cost computation and aggregation methods. Little attention has been seriously paid to the disparity refinement. In this work, we study weighted median filtering for disparity refinement. We discover that with this refinement, even the simple box filter aggregation achieves comparable accuracy with various sophisticated aggregation methods (with the same refinement). This is due to the nice weighted median filtering properties of removing outlier error while respecting edges/structures. This reveals that the previously overlooked refinement can be at least as crucial as aggregation. We also develop the first constant time algorithm for the previously time-consuming weighted median filter. This makes the simple combination “box aggregation + weighted median” an attractive solution in practice for both speed and accuracy. As a byproduct, the fast weighted median filtering unleashes its potential in other applications that were hampered by high complexities. We show its superiority in various applications such as depth upsampling, clip-art JPEG artifact removal, and image stylization.
6 0.91192162 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
7 0.91080642 166 iccv-2013-Finding Actors and Actions in Movies
8 0.90365863 380 iccv-2013-Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes
9 0.90125918 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
10 0.89740288 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking
11 0.89573705 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
12 0.89217848 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation
13 0.89186209 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
14 0.88763106 86 iccv-2013-Concurrent Action Detection with Structural Prediction
15 0.88471925 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
16 0.8705796 338 iccv-2013-Randomized Ensemble Tracking
17 0.87057084 117 iccv-2013-Discovering Details and Scene Structure with Hierarchical Iconoid Shift
18 0.86929822 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
19 0.86735392 99 iccv-2013-Cross-View Action Recognition over Heterogeneous Feature Spaces
20 0.86482412 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking