iccv iccv2013 iccv2013-298 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Naiyan Wang, Jingdong Wang, Dit-Yan Yeung
Abstract: This paper studies the visual tracking problem in video sequences and presents a novel robust sparse tracker under the particle filter framework. In particular, we propose an online robust non-negative dictionary learning algorithm for updating the object templates so that each learned template can capture a distinctive aspect of the tracked object. Another appealing property of this approach is that it can automatically detect and reject the occlusion and cluttered background in a principled way. In addition, we propose a new particle representation formulation using the Huber loss function. The advantage is that it can yield robust estimation without using trivial templates adopted by previous sparse trackers, leading to faster computation. We also reveal the equivalence between this new formulation and the previous one which uses trivial templates. The proposed tracker is empirically compared with state-of-the-art trackers on some challenging video sequences. Both quantitative and qualitative comparisons show that our proposed tracker is superior and more stable.
Reference: text
sentIndex sentText sentNum sentScore
1 Online Robust Non-negative Dictionary Learning for Visual Tracking Naiyan Wang† Jingdong Wang‡ Dit-Yan Yeung† † Hong Kong University of Science and Technology ‡ Microsoft Research winsty@gmail.com [sent-1, score-0.031]
2 Abstract This paper studies the visual tracking problem in video sequences and presents a novel robust sparse tracker under the particle filter framework. [sent-5, score-1.336]
3 In particular, we propose an online robust non-negative dictionary learning algorithm for updating the object templates so that each learned template can capture a distinctive aspect of the tracked object. [sent-6, score-0.929]
4 Another appealing property of this approach is that it can automatically detect and reject the occlusion and cluttered background in a principled way. [sent-7, score-0.171]
5 In addition, we propose a new particle representation formulation using the Huber loss function. [sent-8, score-0.51]
6 The advantage is that it can yield robust estimation without using trivial templates adopted by previous sparse trackers, leading to faster computation. [sent-9, score-0.605]
7 We also reveal the equivalence between this new formulation and the previous one which uses trivial templates. [sent-10, score-0.121]
8 The proposed tracker is empirically compared with state-of-the-art trackers on some challenging video sequences. [sent-11, score-0.634]
9 Both quantitative and qualitative comparisons show that our proposed tracker is superior and more stable. [sent-12, score-0.333]
10 Introduction Visual tracking or object tracking in video sequences is a major topic in computer vision and related fields. [sent-14, score-0.595]
11 A typical setting of the problem is that an object identified, either manually or automatically, in the first frame of a video sequence is tracked in the subsequent frames by estimating its trajectory as it moves around. [sent-16, score-0.252]
12 While the problem is easy to state, it is often challenging to build a robust object tracker due to various factors which include noise, occlusion, fast and abrupt object motion, illumination changes, and variations in pose and scale. [sent-17, score-0.488]
13 The focus of this paper is on the widely-studied single object tracking problem. [sent-18, score-0.275]
14 A representative method is the ℓ1 tracker (L1T) [18], proposed for robust visual tracking under the particle filter framework [5] based on the sparse coding technique [22]. [sent-20, score-1.317]
15 Figure 1. Learned templates for two video sequences: davidin and bolt. [sent-23, score-0.451]
16 For each sequence, the learned templates cover different appearances of the tracked object in the video while rejecting the cluttered background. [sent-24, score-0.616]
17 L1T describes the tracking target using basis vectors which consist of object templates and trivial templates, and reconstructs each candidate (particle) by a sparse linear combination of them. [sent-25, score-0.861]
18 While object templates correspond to the normal appearance of objects, trivial templates are used to handle noise or occlusion. [sent-26, score-0.827]
19 Specifically, each trivial template has exactly one nonzero element, equal to one, at the position of a specific feature dimension. [sent-27, score-0.205]
20 It is shown in [18] that a good target candidate should involve fewer trivial templates while keeping the reconstruction error low. [sent-29, score-0.478]
21 In this paper, we present an online robust non-negative dictionary learning algorithm for updating the object templates. [sent-32, score-0.388]
22 The learned templates for two video sequences are shown in Fig. 1. [sent-33, score-0.393]
23 We devise a novel online projected gradient descent method to solve the dictionary learning problem. [sent-35, score-0.245]
24 In contrast to the ad hoc strategy of replacing the least-used template with the current tracking result, as in [18, 30], our algorithm blends past information and the current tracking result in a principled way. [sent-36, score-0.638]
25 It can automatically detect and reject the occlusion and cluttered background, yielding robust object templates. [sent-37, score-0.233]
26 Besides, we formulate the particle representation problem using the Huber loss function [10]. [sent-38, score-0.541]
27 This formulation can yield robust estimation without using trivial templates and thus lead to significant reduction of the computational cost. [sent-39, score-0.535]
28 Related Work Object tracking is an extensively studied research topic. [sent-42, score-0.221]
29 For a comprehensive treatment of this topic, we refer readers to the survey paper [26] and a recent benchmark [24]. [sent-43, score-0.14]
30 Here we only review some representative works, which fall into two approaches to building object trackers: generative and discriminative methods. [sent-44, score-0.166]
31 Generative trackers usually learn an appearance model to represent the object being tracked and make decisions based on the reconstruction error. [sent-45, score-0.456]
32 Incremental visual tracking (IVT) [20] is a recent method which learns the dynamic appearance of the tracked object via incremental principal component analysis (PCA). [sent-46, score-0.439]
33 Visual tracking decomposition (VTD) [15] decomposes the tracking problem into several basic motion and observation models and extends the conventional particle filter framework to allow different basic models to interact. [sent-47, score-1.04]
34 The method that is most closely related to our paper is L1T [18] which, as said above, assumes that the tracked object can be represented well by a sparse linear combination of object templates and trivial templates. [sent-48, score-0.809]
35 Its drawback of high computational cost has been alleviated by subsequent works [19, 4] that improve the tracking speed. [sent-49, score-0.258]
36 Zhang et al. found that considering the underlying relationships between sampled particles could greatly improve the tracking performance, and proposed the multi-task tracker (MTT) [30] and the low-rank sparse tracker (LRST) [29]. [sent-51, score-1.213]
37 Jia et al. proposed using alignment pooling in the sparse image representation to alleviate the drifting problem. [sent-53, score-0.129]
38 For a survey of sparse coding based trackers, we refer readers to [28]. [sent-54, score-0.21]
39 Unlike the generative approach, discriminative trackers formulate object tracking as a binary classification problem which considers the tracked object and the background as belonging to two different classes. [sent-55, score-0.841]
40 One example is the online AdaBoost (OAB) tracker [7], which selects discriminative features for tracking via online boosting. [sent-56, score-0.531]
41 The multiple instance learning (MIL) tracker [3] formulates object tracking as an online multiple instance learning problem in which training samples arrive in positive and negative bags. [sent-57, score-0.707]
42 The P-N tracker [12] utilizes structured unlabeled data and uses an online semi-supervised learning algorithm. [sent-58, score-0.468]
43 A subsequent method called Tracking-Learning-Detection (TLD) [13] augments it with a detection phase, which allows the tracker to recover even after it has failed for an extended period of time. [sent-59, score-0.404]
44 The compressive tracker (CT) [27] utilizes a random sparse compressive matrix to perform efficient dimensionality reduction on the integral image. [sent-62, score-0.545]
45 Generally speaking, when there is less variability in the tracked object, generative trackers tend to yield more accurate results than discriminative trackers because they typically use richer features. [sent-64, score-0.872]
46 However, in more complicated environments, discriminative trackers are often more robust because they use negative samples to avoid the drifting problem. [sent-65, score-0.998]
47 Besides object trackers, some other techniques related to our proposed method are (online) dictionary learning and (robust) non-negative matrix factorization (NMF). [sent-67, score-0.2]
48 Dictionary learning seeks to learn from data a dictionary which is an adaptive set of basis vectors or atoms, so that each data sample is represented by a sparse linear combination of the basis vectors. [sent-68, score-0.292]
49 Learned dictionaries have been shown to outperform pre-defined ones (e.g., based on Gabor filters or discrete cosine transform) for many vision applications such as denoising [1] and image classification [25]. [sent-71, score-0.059]
50 Most dictionary learning methods are based on K-SVD [1] or online dictionary learning [17]. [sent-72, score-0.391]
51 There are also some NMF variants, such as sparse NMF [9] and robust NMF [14, 6]. [sent-75, score-0.117]
52 Online learning of basis vectors under the robust setting has also attracted a lot of interest. [sent-76, score-0.085]
53 Background To facilitate the presentation of our model in the next section, we first briefly review in this section the particle filter approach for visual tracking and the ℓ1 tracker (L1T). [sent-80, score-0.852]
54 Particle Filters for Visual Tracking The particle filter approach [5], also known as a sequential Monte Carlo (SMC) method for importance sampling, is commonly used for visual tracking. [sent-84, score-0.598]
55 Like a Kalman filter, a particle filter sequentially estimates the latent state variables of a dynamical system based on a sequence of observations. [sent-85, score-0.696]
56 The main difference is that, unlike a Kalman filter, the latent state variables are not restricted to a Gaussian distribution, or indeed to any parametric form. [sent-86, score-0.098]
57 Let s_t and y_t denote the latent state and observation, respectively, at time t. [sent-87, score-0.185]
58 A particle filter approximates the true posterior state distribution p(s_t | y_{1:t}) by a set of samples {s_t^i}_{i=1}^n (a.k.a. particles) with corresponding weights {w_t^i}_{i=1}^n which sum to 1. [sent-88, score-0.655] [sent-91, score-0.08]
60 For the state transition probability q(s_{t+1} | s_{1:t}, y_{1:t}), it is often assumed to follow a first-order Markov process, so that it can be simplified to q(s_{t+1} | s_t). [sent-92, score-0.034]
61 In this case, the weights are updated as w_{t+1}^i = w_t^i * p(y_t | s_t^i). [sent-93, score-0.091]
62 In case the sum of weights of the particles before normalization is less than a prespecified threshold, resampling is needed by drawing n particles from the current particle set in proportion to their weights and then resetting their weights to 1/n. [sent-94, score-1.141]
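To make the propagate-reweight-resample cycle concrete, below is a minimal NumPy sketch of one particle filter step; the callables transition and likelihood and the resampling threshold are illustrative assumptions, not an API from the paper.

```python
import numpy as np

def particle_filter_step(particles, weights, y, transition, likelihood,
                         resample_thresh=0.5):
    # Propagate each particle through the first-order Markov transition
    # model: s_{t+1}^i ~ q(s_{t+1} | s_t^i).
    particles = transition(particles)

    # Re-weight by the observation likelihood: w_{t+1}^i = w_t^i * p(y | s^i).
    weights = weights * likelihood(y, particles)

    # If the unnormalized weight mass is below the threshold, resample n
    # particles in proportion to their weights and reset all weights to 1/n.
    n = len(particles)
    if weights.sum() < resample_thresh:
        idx = np.random.choice(n, size=n, p=weights / weights.sum())
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    else:
        weights = weights / weights.sum()

    # The tracking result at this time step is the particle with the
    # largest weight.
    return particles, weights, particles[np.argmax(weights)]
```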
63 In the context of object tracking, the state s_t is often characterized by six affine parameters which correspond to translation, scale, aspect ratio, rotation and skewness. [sent-95, score-0.111]
64 The tracking result at each time step is taken to be the particle with the largest weight. [sent-97, score-0.723]
65 A key issue in the particle filter approach is to formulate the observation likelihood p(y_t | s_t^i). [sent-98, score-0.629]
66 In general, it should reflect the similarity between a particle and the object templates while being robust against occlusion or appearance changes. [sent-99, score-1.002]
67 The particle filter framework is popularly used for visual tracking due partly to its simplicity and effectiveness. [sent-102, score-0.85]
68 First, as said before, this approach is more general than using Kalman filters because it is not restricted to the Gaussian distribution. [sent-103, score-0.112]
69 Also, the accuracy of the approximation generally increases with the number of particles used. [sent-104, score-0.226]
70 Moreover, instead of using point estimation which may lead to overfitting, the probability distribution of the latent state variables is approximated by a set of particles, making it possible for the tracker to recover from failure. [sent-105, score-0.431]
71 An excellent tutorial on using particle filters for visual tracking can be found in [2]. [sent-106, score-0.748]
72 ℓ1 Tracker (L1T) In each frame, L1T first generates candidate particles based on the particle filter framework. [sent-110, score-0.855]
73 Let Y ∈ R^{m×n} denote the n particles obtained from the particle filter framework, with each column of Y corresponding to one particle. [sent-111, score-0.268]
74 We further let U ∈ R^{m×r} denote the object templates, and let V ∈ R^{n×r} and V_T ∈ R^{n×m} denote the coefficients for the object templates and trivial templates, respectively. [sent-113, score-0.38] [sent-114, score-0.412]
76 For sparse coding of the particles, L1T solves the following optimization problem: min_{V, V_T} (1/2) ||Y - U V^T - I V_T^T||_F^2 + λ (||V||_1 + ||V_T||_1), s.t. V ≥ 0, where I is the identity matrix of size m × m which corresponds to the trivial templates. [sent-115, score-0.118] [sent-135, score-0.074]
78 The weight of each particle is set to be inversely proportional to the reconstruction error. [sent-145, score-0.468]
79 In each frame, the particle with the smallest reconstruction error (and hence largest weight) is chosen as the tracking result. [sent-147, score-0.72]
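As a concrete illustration of this coding step, the sketch below solves a simplified version of the L1T problem with scikit-learn's Lasso on the augmented dictionary [U, I]; it drops the non-negativity constraint on V, and the regularization weight lam is an illustrative value.

```python
import numpy as np
from sklearn.linear_model import Lasso

def l1t_code_particles(Y, U, lam=0.01):
    """Sparse-code each particle (column of Y) against the object templates U
    and the trivial templates I (simplified sketch: V is not constrained
    to be non-negative here)."""
    m, n = Y.shape
    r = U.shape[1]
    D = np.hstack([U, np.eye(m)])  # trivial templates form the identity I_m
    solver = Lasso(alpha=lam, fit_intercept=False, max_iter=2000)
    V, VT, errors = np.zeros((n, r)), np.zeros((n, m)), np.zeros(n)
    for j in range(n):
        coef = solver.fit(D, Y[:, j]).coef_
        V[j], VT[j] = coef[:r], coef[r:]
        # Reconstruction error w.r.t. the object templates only; the particle
        # weight is then set inversely proportional to this error.
        errors[j] = np.linalg.norm(Y[:, j] - U @ V[j])
    return V, VT, errors
```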
80 To reflect the appearance changes of an object, L1T takes an adaptive approach in updating the object templates. [sent-148, score-0.129]
81 It first maintains a weight for each template according to its usage in representing the tracking result. [sent-149, score-0.305]
82 When the current template set cannot represent the tracking result well, the template with the smallest weight will be replaced by the current tracking result. [sent-150, score-0.641]
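A minimal sketch of this heuristic update is given below; the error threshold and the weight-reset rule are illustrative assumptions rather than the exact scheme of [18].

```python
import numpy as np

def update_templates_l1t(U, template_weights, result, recon_err,
                         err_thresh=0.2):
    # If the current templates represent the tracking result poorly (large
    # reconstruction error), replace the template with the smallest weight
    # by the normalized tracking result.
    if recon_err > err_thresh:
        k = np.argmin(template_weights)
        U[:, k] = result / (np.linalg.norm(result) + 1e-12)
        template_weights[k] = np.median(template_weights)  # illustrative reset
    return U, template_weights
```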
83 Our Tracker We present our object tracker in this section. [sent-152, score-0.387]
84 Our tracker consists of two parts. The first part is robust sparse coding, which represents each particle using the dictionary templates by solving an optimization problem that involves the Huber loss function. [sent-154, score-1.099]
85 The second part is dictionary learning which updates the object templates over time. [sent-155, score-0.526]
86 Robust Particle Representation In terms of particle representation, we solve the following robust sparse coding problem based on the Huber loss: min_V f(V; U) = Σ_{i,j} ℓ_λ(y_ij - u_{i·}^T v_{j·}), s.t. V ≥ 0, [sent-158, score-0.633]
87 where y_ij is an element of Y, u_{i·} and v_{j·} are column vectors denoting the ith row of U and the jth row of V, respectively, and [sent-166, score-0.057]
88 ℓ_λ(·) denotes the Huber loss function [10] with parameter λ, which is defined as ℓ_λ(x) = (1/2) x^2 for |x| ≤ λ and ℓ_λ(x) = λ|x| - (1/2) λ^2 otherwise. [sent-167, score-0.126]
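A minimal sketch of solving this problem with projected gradient descent follows; the step size and iteration count are illustrative choices, and the paper's actual solver may differ in its details.

```python
import numpy as np

def huber_grad(r, lam):
    # Derivative of the Huber loss applied elementwise: r in the quadratic
    # region |r| <= lam, lam * sign(r) in the linear region.
    return np.where(np.abs(r) <= lam, r, lam * np.sign(r))

def robust_code(Y, U, lam=0.1, lr=1e-3, iters=200):
    """Approximately solve min_V sum_ij l_lam(y_ij - u_i.^T v_j.) s.t. V >= 0.
    Y: (m, n) particles as columns; U: (m, r) templates; returns V: (n, r)."""
    n, r = Y.shape[1], U.shape[1]
    V = np.zeros((n, r))
    for _ in range(iters):
        R = Y - U @ V.T                          # residuals, shape (m, n)
        G = huber_grad(R, lam)                   # bounded influence of outliers
        V = np.maximum(V + lr * (G.T @ U), 0.0)  # descent step, then project onto V >= 0
    return V
```

Because the Huber loss caps the influence of large residuals, occluded or corrupted pixels perturb the estimate far less than under a squared loss, which is what lets this formulation drop the trivial templates.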
wordName wordTfidf (topN-words)
[('particle', 0.468), ('tracker', 0.333), ('templates', 0.326), ('trackers', 0.271), ('particles', 0.226), ('tracking', 0.221), ('nmf', 0.175), ('dictionary', 0.146), ('huber', 0.138), ('tracked', 0.131), ('filter', 0.13), ('trivial', 0.121), ('online', 0.099), ('davidin', 0.095), ('kalman', 0.089), ('template', 0.084), ('generative', 0.079), ('sparse', 0.07), ('filters', 0.059), ('drifting', 0.059), ('state', 0.057), ('yij', 0.057), ('object', 0.054), ('compressive', 0.053), ('wit', 0.053), ('said', 0.053), ('adaboost', 0.049), ('sit', 0.049), ('survey', 0.048), ('coding', 0.048), ('robust', 0.047), ('vj', 0.046), ('st', 0.046), ('reject', 0.045), ('cluttered', 0.045), ('ui', 0.044), ('readers', 0.044), ('loss', 0.042), ('dyyeung', 0.042), ('naiyan', 0.042), ('vvt', 0.042), ('edse', 0.042), ('err', 0.042), ('fleosr', 0.042), ('ofy', 0.042), ('ptoa', 0.042), ('oist', 0.042), ('tehwe', 0.042), ('occlusion', 0.042), ('updating', 0.042), ('yt', 0.041), ('yield', 0.041), ('latent', 0.041), ('sty', 0.039), ('yfor', 0.039), ('bolt', 0.039), ('blends', 0.039), ('ttehse', 0.039), ('resetting', 0.039), ('wq', 0.039), ('principled', 0.039), ('basis', 0.038), ('weights', 0.038), ('subsequent', 0.037), ('hve', 0.037), ('prespecified', 0.037), ('jingdong', 0.037), ('sequences', 0.037), ('utilizes', 0.036), ('ntoo', 0.035), ('oab', 0.035), ('smc', 0.035), ('mvin', 0.035), ('mtt', 0.035), ('whei', 0.034), ('oaf', 0.034), ('augments', 0.034), ('hse', 0.034), ('hoc', 0.034), ('review', 0.033), ('reflect', 0.033), ('incremental', 0.033), ('faonrd', 0.032), ('ivt', 0.032), ('sand', 0.032), ('topic', 0.032), ('win', 0.031), ('struck', 0.031), ('vtd', 0.031), ('popularly', 0.031), ('smallest', 0.031), ('formulate', 0.031), ('candidate', 0.031), ('tld', 0.031), ('resampling', 0.031), ('video', 0.03), ('rm', 0.03), ('multitask', 0.03), ('haar', 0.03), ('rejecting', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking
Author: Naiyan Wang, Jingdong Wang, Dit-Yan Yeung
Abstract: This paper studies the visual tracking problem in video sequences and presents a novel robust sparse tracker under the particle filter framework. In particular, we propose an online robust non-negative dictionary learning algorithm for updating the object templates so that each learned template can capture a distinctive aspect of the tracked object. Another appealing property of this approach is that it can automatically detect and reject the occlusion and cluttered background in a principled way. In addition, we propose a new particle representation formulation using the Huber loss function. The advantage is that it can yield robust estimation without using trivial templates adopted by previous sparse trackers, leading to faster computation. We also reveal the equivalence between this new formulation and the previous one which uses trivial templates. The proposed tracker is empirically compared with state-of-the-art trackers on some challenging video sequences. Both quantitative and qualitative comparisons show that our proposed tracker is superior and more stable.
2 0.47438812 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
Author: Zhibin Hong, Xue Mei, Danil Prokhorov, Dacheng Tao
Abstract: Combining multiple observation views has proven beneficial for tracking. In this paper, we cast tracking as a novel multi-task multi-view sparse learning problem and exploit the cues from multiple views including various types of visual features, such as intensity, color, and edge, where each feature observation can be sparsely represented by a linear combination of atoms from an adaptive feature dictionary. The proposed method is integrated in a particle filter framework where every view in each particle is regarded as an individual task. We jointly consider the underlying relationship between tasks across different views and different particles, and tackle it in a unified robust multi-task formulation. In addition, to capture the frequently emerging outlier tasks, we decompose the representation matrix to two collaborative components which enable a more robust and accurate approximation. We show that theproposedformulation can be efficiently solved using the Accelerated Proximal Gradient method with a small number of closed-form updates. The presented tracker is implemented using four types of features and is tested on numerous benchmark video sequences. Both the qualitative and quantitative results demonstrate the superior performance of the proposed approach compared to several stateof-the-art trackers.
3 0.4074125 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
Author: Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
Abstract: Recently, sparse representation has been introduced for robust object tracking. By representing the object sparsely, i.e., using only a few templates via ?1-norm minimization, these so-called ?1-trackers exhibit promising tracking results. In this work, we address the object template building and updating problem in these ?1-tracking approaches, which has not been fully studied. We propose to perform template updating, in a new perspective, as an online incremental dictionary learning problem, which is efficiently solved through an online optimization procedure. To guarantee the robustness and adaptability of the tracking algorithm, we also propose to build a multi-lifespan dictionary model. By building target dictionaries of different lifespans, effective object observations can be obtained to deal with the well-known drifting problem in tracking and thus improve the tracking accuracy. We derive effective observa- tion models both generatively and discriminatively based on the online multi-lifespan dictionary learning model and deploy them to the Bayesian sequential estimation framework to perform tracking. The proposed approach has been extensively evaluated on ten challenging video sequences. Experimental results demonstrate the effectiveness of the online learned templates, as well as the state-of-the-art tracking performance of the proposed approach.
Author: Yu Pang, Haibin Ling
Abstract: Evaluating visual tracking algorithms, or “trackers ” for short, is of great importance in computer vision. However, it is hard to “fairly” compare trackers due to many parameters need to be tuned in the experimental configurations. On the other hand, when introducing a new tracker, a recent trend is to validate it by comparing it with several existing ones. Such an evaluation may have subjective biases towards the new tracker which typically performs the best. This is mainly due to the difficulty to optimally tune all its competitors and sometimes the selected testing sequences. By contrast, little subjective bias exists towards the “second best” ones1 in the contest. This observation inspires us with a novel perspective towards inhibiting subjective bias in evaluating trackers by analyzing the results between the second bests. In particular, we first collect all tracking papers published in major computer vision venues in recent years. From these papers, after filtering out potential biases in various aspects, we create a dataset containing many records of comparison results between various visual trackers. Using these records, we derive performance rank- ings of the involved trackers by four different methods. The first two methods model the dataset as a graph and then derive the rankings over the graph, one by a rank aggregation algorithm and the other by a PageRank-like solution. The other two methods take the records as generated from sports contests and adopt widely used Elo’s and Glicko ’s rating systems to derive the rankings. The experimental results are presented and may serve as a reference for related research.
5 0.23114681 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach
Author: Reyes Rios-Cabrera, Tinne Tuytelaars
Abstract: In this paper we propose a new method for detecting multiple specific 3D objects in real time. We start from the template-based approach based on the LINE2D/LINEMOD representation introduced recently by Hinterstoisser et al., yet extend it in two ways. First, we propose to learn the templates in a discriminative fashion. We show that this can be done online during the collection of the example images, in just a few milliseconds, and has a big impact on the accuracy of the detector. Second, we propose a scheme based on cascades that speeds up detection. Since detection of an object is fast, new objects can be added with very low cost, making our approach scale well. In our experiments, we easily handle 10-30 3D objects at frame rates above 10fps using a single CPU core. We outperform the state-of-the-art both in terms of speed as well as in terms of accuracy, as validated on 3 different datasets. This holds both when using monocular color images (with LINE2D) and when using RGBD images (with LINEMOD). Moreover, wepropose a challenging new dataset made of12 objects, for future competing methods on monocular color images.
6 0.22983341 395 iccv-2013-Slice Sampling Particle Belief Propagation
7 0.22724855 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
8 0.19190608 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
9 0.19177595 338 iccv-2013-Randomized Ensemble Tracking
10 0.18097439 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors
11 0.17606057 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
12 0.16977413 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
13 0.16516998 217 iccv-2013-Initialization-Insensitive Visual Tracking through Voting with Salient Local Features
14 0.16500925 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking
15 0.15169452 161 iccv-2013-Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration
16 0.15127707 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization
17 0.15093611 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
18 0.14895324 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
19 0.14149201 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps
20 0.14010495 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
topicId topicWeight
[(0, 0.23), (1, -0.015), (2, -0.013), (3, 0.046), (4, -0.14), (5, -0.198), (6, -0.245), (7, 0.19), (8, -0.173), (9, 0.244), (10, -0.092), (11, -0.206), (12, 0.074), (13, 0.124), (14, 0.01), (15, -0.029), (16, 0.135), (17, 0.105), (18, -0.06), (19, -0.12), (20, -0.032), (21, 0.051), (22, -0.083), (23, -0.063), (24, -0.034), (25, 0.037), (26, 0.05), (27, 0.068), (28, 0.015), (29, 0.077), (30, 0.05), (31, 0.01), (32, -0.01), (33, 0.012), (34, 0.006), (35, -0.124), (36, -0.127), (37, -0.039), (38, -0.05), (39, -0.124), (40, -0.077), (41, -0.057), (42, 0.029), (43, -0.053), (44, 0.08), (45, 0.017), (46, -0.088), (47, -0.005), (48, 0.011), (49, 0.055)]
simIndex simValue paperId paperTitle
same-paper 1 0.97218472 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking
Author: Naiyan Wang, Jingdong Wang, Dit-Yan Yeung
Abstract: This paper studies the visual tracking problem in video sequences and presents a novel robust sparse tracker under the particle filter framework. In particular, we propose an online robust non-negative dictionary learning algorithm for updating the object templates so that each learned template can capture a distinctive aspect of the tracked object. Another appealing property of this approach is that it can automatically detect and reject the occlusion and cluttered background in a principled way. In addition, we propose a new particle representation formulation using the Huber loss function. The advantage is that it can yield robust estimation without using trivial templates adopted by previous sparse trackers, leading to faster computation. We also reveal the equivalence between this new formulation and the previous one which uses trivial templates. The proposed tracker is empirically compared with state-of-the-art trackers on some challenging video sequences. Both quantitative and qualitative comparisons show that our proposed tracker is superior and more stable.
2 0.89177686 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
Author: Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
Abstract: Recently, sparse representation has been introduced for robust object tracking. By representing the object sparsely, i.e., using only a few templates via ?1-norm minimization, these so-called ?1-trackers exhibit promising tracking results. In this work, we address the object template building and updating problem in these ?1-tracking approaches, which has not been fully studied. We propose to perform template updating, in a new perspective, as an online incremental dictionary learning problem, which is efficiently solved through an online optimization procedure. To guarantee the robustness and adaptability of the tracking algorithm, we also propose to build a multi-lifespan dictionary model. By building target dictionaries of different lifespans, effective object observations can be obtained to deal with the well-known drifting problem in tracking and thus improve the tracking accuracy. We derive effective observa- tion models both generatively and discriminatively based on the online multi-lifespan dictionary learning model and deploy them to the Bayesian sequential estimation framework to perform tracking. The proposed approach has been extensively evaluated on ten challenging video sequences. Experimental results demonstrate the effectiveness of the online learned templates, as well as the state-of-the-art tracking performance of the proposed approach.
3 0.89168161 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
Author: Zhibin Hong, Xue Mei, Danil Prokhorov, Dacheng Tao
Abstract: Combining multiple observation views has proven beneficial for tracking. In this paper, we cast tracking as a novel multi-task multi-view sparse learning problem and exploit the cues from multiple views including various types of visual features, such as intensity, color, and edge, where each feature observation can be sparsely represented by a linear combination of atoms from an adaptive feature dictionary. The proposed method is integrated in a particle filter framework where every view in each particle is regarded as an individual task. We jointly consider the underlying relationship between tasks across different views and different particles, and tackle it in a unified robust multi-task formulation. In addition, to capture the frequently emerging outlier tasks, we decompose the representation matrix to two collaborative components which enable a more robust and accurate approximation. We show that theproposedformulation can be efficiently solved using the Accelerated Proximal Gradient method with a small number of closed-form updates. The presented tracker is implemented using four types of features and is tested on numerous benchmark video sequences. Both the qualitative and quantitative results demonstrate the superior performance of the proposed approach compared to several stateof-the-art trackers.
Author: Yu Pang, Haibin Ling
Abstract: Evaluating visual tracking algorithms, or “trackers ” for short, is of great importance in computer vision. However, it is hard to “fairly” compare trackers due to many parameters need to be tuned in the experimental configurations. On the other hand, when introducing a new tracker, a recent trend is to validate it by comparing it with several existing ones. Such an evaluation may have subjective biases towards the new tracker which typically performs the best. This is mainly due to the difficulty to optimally tune all its competitors and sometimes the selected testing sequences. By contrast, little subjective bias exists towards the “second best” ones1 in the contest. This observation inspires us with a novel perspective towards inhibiting subjective bias in evaluating trackers by analyzing the results between the second bests. In particular, we first collect all tracking papers published in major computer vision venues in recent years. From these papers, after filtering out potential biases in various aspects, we create a dataset containing many records of comparison results between various visual trackers. Using these records, we derive performance rank- ings of the involved trackers by four different methods. The first two methods model the dataset as a graph and then derive the rankings over the graph, one by a rank aggregation algorithm and the other by a PageRank-like solution. The other two methods take the records as generated from sports contests and adopt widely used Elo’s and Glicko ’s rating systems to derive the rankings. The experimental results are presented and may serve as a reference for related research.
5 0.7275849 395 iccv-2013-Slice Sampling Particle Belief Propagation
Author: Oliver Müller, Michael Ying Yang, Bodo Rosenhahn
Abstract: Inference in continuous label Markov random fields is a challenging task. We use particle belief propagation (PBP) for solving the inference problem in continuous label space. Sampling particles from the belief distribution is typically done by using Metropolis-Hastings (MH) Markov chain Monte Carlo (MCMC) methods which involves sampling from a proposal distribution. This proposal distribution has to be carefully designed depending on the particular model and input data to achieve fast convergence. We propose to avoid dependence on a proposal distribution by introducing a slice sampling based PBP algorithm. The proposed approach shows superior convergence performance on an image denoising toy example. Our findings are validated on a challenging relational 2D feature tracking application.
6 0.69967479 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking
7 0.6174686 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation
8 0.59338224 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
9 0.57786566 217 iccv-2013-Initialization-Insensitive Visual Tracking through Voting with Salient Local Features
10 0.55053055 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
11 0.54492146 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
12 0.53993928 338 iccv-2013-Randomized Ensemble Tracking
13 0.52424657 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach
14 0.50366205 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
15 0.45658416 87 iccv-2013-Conservation Tracking
16 0.43262127 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
17 0.42350838 270 iccv-2013-Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking
18 0.41806263 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
19 0.41228417 128 iccv-2013-Dynamic Probabilistic Volumetric Models
20 0.39877963 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
topicId topicWeight
[(2, 0.038), (7, 0.015), (26, 0.079), (31, 0.034), (42, 0.083), (64, 0.494), (73, 0.033), (89, 0.114), (97, 0.024), (98, 0.01)]
simIndex simValue paperId paperTitle
1 0.91952449 99 iccv-2013-Cross-View Action Recognition over Heterogeneous Feature Spaces
Author: Xinxiao Wu, Han Wang, Cuiwei Liu, Yunde Jia
Abstract: In cross-view action recognition, “what you saw” in one view is different from “what you recognize ” in another view. The data distribution even the feature space can change from one view to another due to the appearance and motion of actions drastically vary across different views. In this paper, we address the problem of transferring action models learned in one view (source view) to another different view (target view), where action instances from these two views are represented by heterogeneous features. A novel learning method, called Heterogeneous Transfer Discriminantanalysis of Canonical Correlations (HTDCC), is proposed to learn a discriminative common feature space for linking source and target views to transfer knowledge between them. Two projection matrices that respectively map data from source and target views into the common space are optimized via simultaneously minimizing the canonical correlations of inter-class samples and maximizing the intraclass canonical correlations. Our model is neither restricted to corresponding action instances in the two views nor restricted to the same type of feature, and can handle only a few or even no labeled samples available in the target view. To reduce the data distribution mismatch between the source and target views in the commonfeature space, a nonparametric criterion is included in the objective function. We additionally propose a joint weight learning method to fuse multiple source-view action classifiers for recognition in the target view. Different combination weights are assigned to different source views, with each weight presenting how contributive the corresponding source view is to the target view. The proposed method is evaluated on the IXMAS multi-view dataset and achieves promising results.
same-paper 2 0.89574981 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking
Author: Naiyan Wang, Jingdong Wang, Dit-Yan Yeung
Abstract: This paper studies the visual tracking problem in video sequences and presents a novel robust sparse tracker under the particle filter framework. In particular, we propose an online robust non-negative dictionary learning algorithm for updating the object templates so that each learned template can capture a distinctive aspect of the tracked object. Another appealing property of this approach is that it can automatically detect and reject the occlusion and cluttered background in a principled way. In addition, we propose a new particle representation formulation using the Huber loss function. The advantage is that it can yield robust estimation without using trivial templates adopted by previous sparse trackers, leading to faster computation. We also reveal the equivalence between this new formulation and the previous one which uses trivial templates. The proposed tracker is empirically compared with state-of-the-art trackers on some challenging video sequences. Both quantitative and qualitative comparisons show that our proposed tracker is superior and more stable.
3 0.87304878 88 iccv-2013-Constant Time Weighted Median Filtering for Stereo Matching and Beyond
Author: Ziyang Ma, Kaiming He, Yichen Wei, Jian Sun, Enhua Wu
Abstract: Despite the continuous advances in local stereo matching for years, most efforts are on developing robust cost computation and aggregation methods. Little attention has been seriously paid to the disparity refinement. In this work, we study weighted median filtering for disparity refinement. We discover that with this refinement, even the simple box filter aggregation achieves comparable accuracy with various sophisticated aggregation methods (with the same refinement). This is due to the nice weighted median filtering properties of removing outlier error while respecting edges/structures. This reveals that the previously overlooked refinement can be at least as crucial as aggregation. We also develop the first constant time algorithmfor the previously time-consuming weighted median filter. This makes the simple combination “box aggregation + weighted median ” an attractive solution in practice for both speed and accuracy. As a byproduct, the fast weighted median filtering unleashes its potential in other applications that were hampered by high complexities. We show its superiority in various applications such as depth upsampling, clip-art JPEG artifact removal, and image stylization.
4 0.85328078 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
Author: Siyu Tang, Mykhaylo Andriluka, Anton Milan, Konrad Schindler, Stefan Roth, Bernt Schiele
Abstract: People tracking in crowded real-world scenes is challenging due to frequent and long-term occlusions. Recent tracking methods obtain the image evidence from object (people) detectors, but typically use off-the-shelf detectors and treat them as black box components. In this paper we argue that for best performance one should explicitly train people detectors on failure cases of the overall tracker instead. To that end, we first propose a novel joint people detector that combines a state-of-the-art single person detector with a detector for pairs of people, which explicitly exploits common patterns of person-person occlusions across multiple viewpoints that are a frequent failure case for tracking in crowded scenes. To explicitly address remaining failure modes of the tracker we explore two methods. First, we analyze typical failures of trackers and train a detector explicitly on these cases. And second, we train the detector with the people tracker in the loop, focusing on the most common tracker failures. We show that our joint multi-person detector significantly improves both de- tection accuracy as well as tracker performance, improving the state-of-the-art on standard benchmarks.
5 0.82106549 166 iccv-2013-Finding Actors and Actions in Movies
Author: P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid, J. Sivic
Abstract: We address the problem of learning a joint model of actors and actions in movies using weak supervision provided by scripts. Specifically, we extract actor/action pairs from the script and use them as constraints in a discriminative clustering framework. The corresponding optimization problem is formulated as a quadratic program under linear constraints. People in video are represented by automatically extracted and tracked faces together with corresponding motion features. First, we apply the proposed framework to the task of learning names of characters in the movie and demonstrate significant improvements over previous methods used for this task. Second, we explore the joint actor/action constraint and show its advantage for weakly supervised action learning. We validate our method in the challenging setting of localizing and recognizing characters and their actions in feature length movies Casablanca and American Beauty.
6 0.81034613 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
7 0.78509569 441 iccv-2013-Video Motion for Every Visible Point
8 0.7694841 215 iccv-2013-Incorporating Cloud Distribution in Sky Representation
9 0.71906698 380 iccv-2013-Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes
10 0.70866138 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation
11 0.67691368 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
12 0.65454036 86 iccv-2013-Concurrent Action Detection with Structural Prediction
13 0.64041823 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
14 0.63779515 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
15 0.62677079 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
16 0.62309182 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
17 0.59800994 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
18 0.57065988 338 iccv-2013-Randomized Ensemble Tracking
19 0.56139928 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
20 0.55999458 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects