iccv iccv2013 iccv2013-425 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zhibin Hong, Xue Mei, Danil Prokhorov, Dacheng Tao
Abstract: Combining multiple observation views has proven beneficial for tracking. In this paper, we cast tracking as a novel multi-task multi-view sparse learning problem and exploit the cues from multiple views including various types of visual features, such as intensity, color, and edge, where each feature observation can be sparsely represented by a linear combination of atoms from an adaptive feature dictionary. The proposed method is integrated in a particle filter framework where every view in each particle is regarded as an individual task. We jointly consider the underlying relationship between tasks across different views and different particles, and tackle it in a unified robust multi-task formulation. In addition, to capture the frequently emerging outlier tasks, we decompose the representation matrix into two collaborative components which enable a more robust and accurate approximation. We show that the proposed formulation can be efficiently solved using the Accelerated Proximal Gradient method with a small number of closed-form updates. The presented tracker is implemented using four types of features and is tested on numerous benchmark video sequences. Both the qualitative and quantitative results demonstrate the superior performance of the proposed approach compared to several state-of-the-art trackers.
Reference: text
sentIndex sentText sentNum sentScore
1 The proposed method is integrated in a particle filter framework where every view in each particle is regarded as an individual task. [sent-12, score-0.53]
2 We jointly consider the underlying relationship between tasks across different views and different particles, and tackle it in a unified robust multi-task formulation. [sent-13, score-0.233]
3 In addition, to capture the frequently emerging outlier tasks, we decompose the representation matrix to two collaborative components which enable a more robust and accurate approximation. [sent-14, score-0.379]
4 The presented tracker is implemented using four types of features and is tested on numerous benchmark video sequences. [sent-16, score-0.278]
5 Introduction Tracking problems can involve data that is represented by multiple views of various types of visual features, including intensity [28], color [4], edge [14], wavelet [12], and texture. [sent-19, score-0.324]
6 Exploiting these multiple sources of information can significantly improve tracking performance as a result of their complementary characteristics [2][14][7][18]. [sent-20, score-0.27]
7 Sparse representation has recently been introduced for tracking [19], in which a tracking candidate is sparsely represented as a linear combination of target templates and trivial templates. [sent-24, score-0.718]
8 In particle filter-based tracking methods, particles around the current state of the target are randomly sampled according to a zero-mean Gaussian distribution. [sent-25, score-0.922]
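A minimal sketch of this sampling step is given below; the state parameterization, particle count, and noise variances are illustrative assumptions, not the authors' exact settings:

```python
import numpy as np

def sample_particles(state, n=400, sigma=(4.0, 4.0, 0.02, 0.02), rng=None):
    """Perturb the current target state [x, y, scale_x, scale_y] with
    zero-mean Gaussian noise to generate n candidate particles."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=(n, len(state)))  # zero-mean Gaussian perturbation
    return np.asarray(state) + noise

# Example: 400 particles around the current target state
particles = sample_particles([120.0, 80.0, 1.0, 1.0], n=400)
```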
9 In [35], learning the representation of each particle is viewed as an individual task and a multi-task learning with joint sparsity for all particles is employed. [sent-28, score-0.69]
10 However, they assume that all tasks share a common set of features, which generally does not hold in visual tracking applications, since outlier tasks often exist. [sent-29, score-0.586]
11 For example, a small number of particles sampled far away from the majority of particles may have little overlap with other particles and will be considered as outliers. [sent-30, score-0.652]
12 The intensity appearance model with ℓ1 minimization is very robust to partial occlusion, noise, and other tracking challenges [19]. [sent-32, score-0.413]
13 To overcome the above problems, we propose to employ other visual features such as color, edge, and texture to complement intensity in the appearance representation, and to combine a multi-view representation with a robust multitask learning [9] to solve the visual tracking problem (Figure 1). [sent-34, score-0.629]
14 Within the proposed scheme, the sparse representation for each view is learned as a linear combination of atoms from an adaptive feature dictionary, i.e., [sent-35, score-0.278]
15 each view has its own sparse representation instead of sharing an identical one, which enables the tracker to capture different statistics carried by different views. [sent-37, score-0.469]
16 To exploit the interdependencies shared between different views and particles, we impose the ℓ1,2-norm group-sparsity regularization on the representation matrix to learn the multi-view sparse representation jointly in a multi-task manner. [sent-38, score-0.406]
17 To handle the outlier particles from particle sampling, we decompose the sparse representation into two collaborative parts, thereby enabling them to learn representative coefficients and detect outlier tasks simultaneously. [sent-39, score-1.304]
18 Related Work An extensive review on tracking and multi-view learning is beyond the scope of this paper. [sent-44, score-0.258]
19 Numerous existing trackers only use a single feature and solve tracking in various ways. [sent-47, score-0.389]
20 [5] introduce a spatial kernel to regularize the color histogram-based feature representation of the target, which enables tracking to be reformulated as a gradient-based optimization problem solved by mean-shift. [sent-49, score-0.311]
21 present a tracking method that incrementally learns a low-dimensional subspace representation based on intensity features. [sent-53, score-0.383]
22 [13] propose a new tracking paradigm that combines the classical Lucas-Kanade method-based tracker with an online learned random-forest based detector using pixelwise comparison features. [sent-55, score-0.474]
23 The learned detector is notable for enabling reacquisition following tracking failures. [sent-56, score-0.225]
24 The above trackers nevertheless tend to be vulnerable in particular scenarios due to the limitations of the adopted features. [sent-57, score-0.198]
25 Various methods aim to overcome this problem by taking advantage of multiple types of features to enable a more robust tracker [21][14] [32]. [sent-58, score-0.318]
26 propose a probabilistic framework allowing the integration of multiple features for tracking by considering cue dependencies. [sent-60, score-0.225]
27 Sparse representation was recently introduced for tracking in [19], which casts tracking as a sparse representation problem within a particle filter framework [11]; this formulation was later exploited in [15][16][20]. [sent-63, score-0.884]
28 In [35], a multi-task learning [3] approach is applied to tracking by learning a joint sparse representation of all the particles in a particle filter framework. [sent-64, score-0.981]
29 Compared to the original L1 tracker [19] that pursues the sparse representation independently, Multi-Task Tracking (MTT) achieves more robust performance by exploiting the interdependency between particles. [sent-65, score-0.439]
30 Motivated by the above advances, in this paper, we propose a Multi-Task Multi-View Tracking (MTMVT) method based on joint sparse representation to exploit the related information shared between particles and views in order to obtain improved performance. [sent-67, score-0.627]
31 Multi-task Multi-view Sparse Tracker The L1 tracker [19] tackles tracking as finding a sparse representation in the template subspace. [sent-69, score-0.695]
32 The representation is then used in a particle filter framework for visual tracking. [sent-70, score-0.315]
33 However, appearance representation based only on intensity is prone to failure in difficult scenarios such as tracking non-rigid objects. [sent-71, score-0.427]
34 Employing multiple types of features has proven to be beneficial for tracking because the ensemble of multiple views provides a comprehensive representation of the target appearance undergoing various changes such as illumination and deformation. [sent-72, score-0.724]
35 Inspired by previous works [33][35], the dependencies of these views as well as the intrinsic relationship of sampled particles should be jointly considered. [sent-74, score-0.504]
36 In this section, we propose to employ other visual features such as color, edge, and texture to complement intensity in the target appearance representation, and to combine a multi-view representation with a robust multi-task learning [9] to solve the visual tracking problem. [sent-75, score-0.687]
37 The tracking problem can then be formulated as the estimation of the state probability 푝(푦푡 ∣ 푥1:푡), where 푥1:푡 = {푥1, . . . , 푥푡} denotes the observations up to time 푡. [sent-77, score-0.225]
38 Sparse Representation-based Tracker In [19], the sparse representation of the intensity feature 푥 is formulated as the minimum-error reconstruction through a regularized ℓ1 minimization problem with nonnegativity constraints: min_푤 ∥푀푤 − 푥∥²₂ + 휆∥푤∥₁, s.t. 푤 ≽ 0, (1) [sent-85, score-0.287]
39 where 푀 = [퐷, 퐼, −퐼] is an over-complete dictionary composed of the target template set 퐷 and the positive and negative trivial template sets 퐼 and −퐼. [sent-87, score-0.314]
40 Each column in 퐷 is a target template generated by reshaping the pixels of a candidate region into a column vector, and each column in the trivial template sets is a unit vector that has only one nonzero element. [sent-88, score-0.459]
41 푤 = [푎⊤, 푒+⊤, 푒−⊤]⊤ is composed of target coefficients 푎 and positive and negative trivial coefficients 푒+ and 푒−, respectively. [sent-89, score-0.376]
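The following is a hedged sketch of one way to solve problem (1); since 푤 is nonnegative, the ℓ1 term reduces to a linear term and a simple projected-gradient loop suffices. This is an illustrative solver, not the solver used in [19], and the dictionary sizes and 휆 below are made-up values:

```python
import numpy as np

def solve_l1_nonneg(M, x, lam=0.01, n_iter=200, step=None):
    """Projected-gradient sketch for min_w ||M w - x||_2^2 + lam * ||w||_1, w >= 0.
    With w nonnegative, ||w||_1 equals sum(w), so its gradient is the constant lam."""
    w = np.zeros(M.shape[1])
    if step is None:
        step = 1.0 / (2.0 * np.linalg.norm(M, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2.0 * M.T @ (M @ w - x) + lam
        w = np.maximum(w - step * grad, 0.0)             # gradient step + projection onto w >= 0
    return w

# Over-complete dictionary M = [D, I, -I] from target and trivial templates (toy sizes)
d_pix, n_templates = 64, 10
rng = np.random.default_rng(0)
D = rng.random((d_pix, n_templates))
M = np.hstack([D, np.eye(d_pix), -np.eye(d_pix)])
x = rng.random(d_pix)
w = solve_l1_nonneg(M, x)
a = w[:n_templates]                                      # target coefficients
e_pos, e_neg = w[n_templates:n_templates + d_pix], w[n_templates + d_pix:]  # trivial coefficients
```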
42 Robust Multi-task Multi-view Sparse Learning We consider 푛 particle samples, each of which has 퐾 different views (e.g., intensity, color, edge, and texture). [sent-93, score-0.358]
43 Note that this figure demonstrates a case that includes four particles, where the second particle is an outlier whose coefficients in 푄 comprise large values. [sent-104, score-0.53]
44 For 푘 = 1, . . . , 퐾, denote 푋푘 ∈ ℝ푑푘×푛 as the feature matrix, which is a stack of 푛 columns of normalized particle image feature vectors of dimension 푑푘, where 푑푘 is the dimension of the 푘th view. [sent-108, score-0.23]
45 We denote 퐷푘 ∈ ℝ푑푘×푁 as the target dictionary in which each column is a target template from the 푘th view, where 푁 is the number of target templates. [sent-109, score-0.56]
46 The target dictionary is combined with trivial templates 퐼푑푘 to construct the complete dictionary 푀푘 = [퐷푘, 퐼푑푘]. [sent-110, score-0.348]
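A small sketch of how the per-view inputs could be assembled; the container names `particle_feats` and `target_templates` are hypothetical:

```python
import numpy as np

def build_view_matrices(particle_feats, target_templates):
    """Stack per-view particle features into X^k (d_k x n) and build the complete
    dictionary M^k = [D^k, I_{d_k}] for each of the K views.
    particle_feats[k] is a list of n feature vectors for view k;
    target_templates[k] is the (d_k x N) target template matrix D^k."""
    M, X = [], []
    for feats_k, D_k in zip(particle_feats, target_templates):
        X_k = np.column_stack([f / (np.linalg.norm(f) + 1e-12) for f in feats_k])  # unit-norm columns
        M_k = np.hstack([D_k, np.eye(D_k.shape[0])])                               # [D^k, I_{d_k}]
        X.append(X_k)
        M.append(M_k)
    return M, X
```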
47 Based on the fact that most of the particles are relevant while outliers often exist, we introduce a robust multitask learning scheme [9] to capture the underlying relationships shared by all tasks. [sent-111, score-0.533]
48 Given the multi-view observations {푋1, . . . , 푋퐾} with 푛 particles, we learn the latent representations {푊1, . . . , 푊퐾}. [sent-115, score-0.31]
49 This allows different views of particles to have different learned representations, and therefore exploits the independency of each view and captures the different statistical properties. [sent-120, score-0.38]
50 The same columns from each view in the dictionary should be activated to represent the particle in a joint sparse manner, since the corresponding columns represent the same sample of the object. [sent-122, score-0.543]
51 Therefore, the corresponding decomposed weight matrices 푃푘s and 푄푘s from all the views can be stacked horizontally to form two bigger matrices 푃 and 푄, respectively. [sent-123, score-0.192]
52 The group lasso penalty ℓ1,2 is applied to the row groups of the first component 푃 to capture the features shared among all tasks over all views, where we define ∥푃∥1,2 = Σ푖 (Σ푗 푃²푖,푗)^{1/2}, and 푃푖,푗 denotes the entry in the 푖th row and 푗th column of the matrix 푃. [sent-125, score-0.18]
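For concreteness, the two ℓ1,2 penalties can be computed as follows (a direct transcription of the definitions above):

```python
import numpy as np

def l12_row_norm(P):
    """||P||_{1,2}: sum over rows of the l2 norm of each row (group lasso on rows)."""
    return np.sum(np.sqrt(np.sum(P ** 2, axis=1)))

def l12_col_norm(Q):
    """||Q^T||_{1,2}: the same penalty applied to the columns of Q."""
    return l12_row_norm(Q.T)
```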
53 The same group lasso penalty is imposed on the column groups of the second component 푄 to identify the outlier tasks simultaneously. [sent-126, score-0.341]
54 The multi-view sparse representations for all particles can be obtained from the following problem: min_{푊,푃,푄} (1/2) Σ_{푘=1}^{퐾} ∥푀푘푊푘 − 푋푘∥²퐹 + 휆1∥푃∥1,2 + 휆2∥푄⊤∥1,2, (3) where 푊푘 = 푃푘 + 푄푘, 푃 = [푃1, . . . , 푃퐾], and 푄 = [푄1, . . . , 푄퐾]. [sent-127, score-0.406]
55 The coefficients associated with the zero columns will be zeros based on the sparsity constraints from ℓ1 regularization and do not impact the minimization function in terms of the solution. [sent-137, score-0.239]
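Below is a hedged sketch of an accelerated proximal gradient (APG) loop for problem (3), with closed-form group soft-thresholding updates for 푃 (row groups) and 푄 (column groups). It assumes, for simplicity, that every view's dictionary has the same number of atoms so the horizontal stacking described above applies directly; it is not the authors' exact implementation, and the regularization weights and iteration count are illustrative:

```python
import numpy as np

def row_group_shrink(A, tau):
    """Proximal operator of tau * ||A||_{1,2}: shrink each row by its l2 norm."""
    norms = np.sqrt(np.sum(A ** 2, axis=1, keepdims=True))
    return np.maximum(0.0, 1.0 - tau / (norms + 1e-12)) * A

def mtmv_apg(M, X, lam1=0.1, lam2=0.1, n_iter=100):
    """APG sketch for problem (3). M and X are lists of K per-view dictionaries M^k and
    feature matrices X^k, all M^k assumed to have the same number of atoms m.
    Returns the lists {P^k} and {Q^k}, with W^k = P^k + Q^k."""
    K, n, m = len(M), X[0].shape[1], M[0].shape[1]
    P = [np.zeros((m, n)) for _ in range(K)]
    Q = [np.zeros((m, n)) for _ in range(K)]
    Pm, Qm = [p.copy() for p in P], [q.copy() for q in Q]   # extrapolation points
    t_prev = 1.0
    step = 1.0 / (2.0 * max(np.linalg.norm(Mk, 2) ** 2 for Mk in M))  # 1 / Lipschitz bound
    for _ in range(n_iter):
        # gradient of (1/2) sum_k ||M^k (P^k + Q^k) - X^k||_F^2 (identical for P^k and Q^k)
        G = [M[k].T @ (M[k] @ (Pm[k] + Qm[k]) - X[k]) for k in range(K)]
        Pg = np.hstack([Pm[k] - step * G[k] for k in range(K)])
        Qg = np.hstack([Qm[k] - step * G[k] for k in range(K)])
        # closed-form proximal updates: row groups of stacked P, column groups of stacked Q
        P_new = np.hsplit(row_group_shrink(Pg, step * lam1), K)
        Q_new = np.hsplit(row_group_shrink(Qg.T, step * lam2).T, K)
        # Nesterov extrapolation
        t = (1.0 + np.sqrt(1.0 + 4.0 * t_prev ** 2)) / 2.0
        beta = (t_prev - 1.0) / t
        Pm = [P_new[k] + beta * (P_new[k] - P[k]) for k in range(K)]
        Qm = [Q_new[k] + beta * (Q_new[k] - Q[k]) for k in range(K)]
        P, Q, t_prev = P_new, Q_new, t
    return P, Q
```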
56 For a more intuitive view of the proposed formulation, we visualize an empirical example of the learned sparse coefficients in Figure 4, where 푊 = [퐴⊤, 퐸⊤]⊤ consists of target coefficients 퐴 and trivial coefficients 퐸 respectively. [sent-146, score-0.642]
57 In reference to the tracking result, the observation likelihood of the tracking candidate 푖 is defined as 푝푖 = (1/Γ) exp{−훼 Σ_{푘=1}^{퐾} ∥퐷푘퐴푖푘 − 푋푖푘∥²}, (4) where 퐴푖푘 are the coefficients of the 푖th candidate corresponding to the target templates of the 푘th view. [sent-147, score-0.749]
58 The tracking result is the particle that has the maximum observation likelihood. [sent-148, score-0.491]
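A short sketch of the likelihood computation in Eq. (4) and the selection of the tracking result; the value of 훼 here is an illustrative assumption:

```python
import numpy as np

def observation_likelihoods(D, A, X, alpha=4.0):
    """Observation likelihood of every candidate per Eq. (4):
    p_i proportional to exp(-alpha * sum_k ||D^k A_i^k - X_i^k||^2).
    D, A, X are lists over the K views; A[k] holds only the target-template
    coefficients of view k (the first N rows of W^k)."""
    err = np.zeros(X[0].shape[1])
    for D_k, A_k, X_k in zip(D, A, X):
        err += np.sum((D_k @ A_k - X_k) ** 2, axis=0)   # squared residual per particle
    p = np.exp(-alpha * err)
    return p / (p.sum() + 1e-12)                        # normalization plays the role of Gamma

# The tracking result is the particle with the maximum observation likelihood:
# best = int(np.argmax(observation_likelihoods(D, A, X)))
```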
59 To handle appearance variations, the target dictionary 퐷 is progressively updated similar to [19], and the templates are weighted in the course of tracking. [sent-149, score-0.274]
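One plausible update scheme in the spirit of [19] for a single view, shown only as a hedged sketch; the similarity threshold and re-weighting rule are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np

def update_dictionary(D, weights, result_feat, sim_threshold=0.6):
    """Weighted template update sketch: if the new tracking result is dissimilar to
    every current template, it replaces the least-weighted one; template weights
    (a NumPy array) are then re-weighted by similarity.  Columns of D are unit-norm."""
    result_feat = result_feat / (np.linalg.norm(result_feat) + 1e-12)
    sims = D.T @ result_feat                      # cosine similarity to each template
    if sims.max() < sim_threshold:
        j = int(np.argmin(weights))               # least important template
        D[:, j] = result_feat
        weights[j] = np.median(weights)
    weights = weights * np.exp(sims - sims.max()) # illustrative importance decay
    return D, weights / weights.sum()
```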
60 Outlier Rejection Although a majority of particles will share the same dictionary basis, some outlier tasks may exist. [sent-152, score-0.642]
61 These are the particles sampled far away from the target that have little overlap with other particles. [sent-153, score-0.467]
62 The proposed MTMVT in (3) is capable of capturing the outlier tasks by introducing the coefficient matrix 푄. [sent-154, score-0.265]
63 In particular, if the sum of the ℓ1 norms of the coefficients of the 푖th particle is larger than an adaptive threshold 훾, i.e., Σ_{푘=1}^{퐾} ∥푄푖푘∥1 > 훾, (5) where 푄푖푘 is the 푖th column of 푄푘, then the particle will be identified as an outlier. [sent-155, score-0.642]
64 Figure 3. The green bounding boxes denote the outlier particles and the red bounding box denotes the tracked target. [sent-157, score-0.51]
65 The observation likelihood of an identified outlier is set to zero, and thus the outliers will be ignored in the particle resampling process. [sent-160, score-0.342]
66 By denoting the number of detected outlier tasks as 푛표, the threshold 훾 is updated as follows: 훾new = 훾old · 휅 if 푛표 > 푁표, 훾new = 훾old/휅 if 푛표 = 0, and 훾new = 훾old if 0 < 푛표 ≤ 푁표, (6) where 휅 is a scaling factor, and 푁표 is a predefined threshold for the number of outliers. [sent-162, score-0.265]
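A compact sketch of the outlier test (5) and the adaptive threshold update (6), following the reconstruction above; the values of 휅 and 푁표 are illustrative:

```python
import numpy as np

def reject_outliers(Q, p, gamma, kappa=1.4, N_o=6):
    """Outlier rejection sketch for Eqs. (5)-(6).  Q is the list of per-view matrices Q^k,
    p the particle likelihoods, gamma the current adaptive threshold."""
    scores = sum(np.sum(np.abs(Q_k), axis=0) for Q_k in Q)   # sum_k ||Q_i^k||_1 per particle
    outliers = scores > gamma
    p = np.where(outliers, 0.0, p)        # outliers are ignored in the resampling step
    n_o = int(outliers.sum())
    if n_o > N_o:                         # too many detections: raise the threshold
        gamma *= kappa
    elif n_o == 0:                        # none detected: lower it
        gamma /= kappa
    return p, gamma, outliers
```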
67 The seventh template in the dictionary is the most representative and results in brighter values in the seventh row of 푃 across all views, while brighter values in 푄 indicate the presence of outliers. [sent-175, score-0.34]
68 The proposed tracker is compared with five other popular trackers: L1 Tracker (L1T) [19], Multi-Task Tracking (MTT) [35], tracking with Multiple Instance Learning (MIL) [1], Incremental Learning for Visual Tracking (IVT) [26], and Visual Tracking Decomposition (VTD) [14]. [sent-221, score-0.225]
69 It should be noted that VTD is a multiview tracker which employs hue, saturation, intensity, and edge templates as its features. [sent-222, score-0.351]
70 Unit-norm normalization is applied to the extracted feature vector of each particle view, as done in [19]. [sent-231, score-0.3]
71 5, the number of particles 푛 = 400 (the same for L1T and MTT), the number of template samples 푁 = 10. [sent-233, score-0.381]
72 The intensity template is set to one third the size of the initial target (half the size for targets whose shorter side is less than 20), while the color histograms, HOG, and LBP are extracted in a larger region that doubles the size of the intensity template. [sent-234, score-0.436]
73 For the Animal sequence, only MIL and MTMVT succeed in tracking the target over the whole sequence, while MTT is able to track most of the frames. [sent-239, score-0.411]
74 IVT gradually drifts from the target after the second frame and totally loses the target in the seventh frame. [sent-240, score-0.398]
75 However, MTT is not as robust as MTMVT since MTMVT takes advantage of the complementary features and is capable of detecting outlier tasks. [sent-243, score-0.285]
76 By contrast, VTD and MIL lose the target and L1T tends to include much of the background area into the bounding box when undergoing significant illumination changes. [sent-245, score-0.242]
77 The experimental results show that MTMVT is able to handle the scale changes, pose changes, fast motion, occlusion, appearance variation, and angle variation problems encountered in face tracking tasks. [sent-247, score-0.269]
78 The task is to track his face under significant illumination changes and appearance variations. [sent-249, score-0.194]
79 Our tracker is more robust to illumination changes as a result of employing rich feature types. [sent-250, score-0.378]
80 In Sylv and Tiger1 sequences, the tasks are to track moving dolls in indoor scenes. [sent-255, score-0.193]
81 Almost all the trackers compared can track the doll in the earlier part of the Sylv sequence. [sent-256, score-0.225]
82 The Tiger1 sequence is much harder due to the significant appearance changes, occlusion, and distracting background, so all trackers except MTMVT eventually lock onto the background. [sent-258, score-0.283]
83 Our tracker faithfully tracks the tiger, and obtains the best performance. [sent-259, score-0.278]
84 In the DH sequence, L1T and IVT lose the target because of the distracting background and fast motion. [sent-262, score-0.203]
85 MIL loses the target when the illumination changes suddenly. [sent-263, score-0.263]
86 VTD succeeds in Bolt and Gym because of the benefit of multiple types of features but drifts away from the target at the end of the Skating1 sequence. [sent-267, score-0.198]
87 However, only MTMVT successfully tracks all these targets in our experiments, which indicates the proposed tracker is not as sensitive to shape deformation as previous single view trackers, due to the effective use of the complementary features. [sent-268, score-0.436]
88 As shown in Figure 6, the error plots of our tracker are generally lower than those of other trackers. [sent-293, score-0.249]
89 This implies that our tracker outperforms other trackers on the test sequences. [sent-294, score-0.413]
90 This shows that our tracker achieves the best average performance over all tested sequences. [sent-296, score-0.249]
91 Outlier Handling Performance: To illustrate the improvement from the proposed outlier handling method, which includes the introduction of the auxiliary matrix 푄 and the outlier rejection scheme presented in Section 3. [sent-297, score-0.472]
92 We implement a Multi-Task Tracker with Outlier handling (MTT+O) using the robust multi-task sparse representation presented in Section 3. [sent-299, score-0.165]
93 Conclusion In this paper, we have presented a robust multi-task multi-view joint sparse learning method for particle filterbased tracking. [sent-306, score-0.399]
94 By appropriately introducing the ℓ1,2 norm regularization, the method not only exploits the underlying relationship shared by different views and different particles, but also captures the frequently emerging outlier tasks which have been ignored by previous works. [sent-307, score-0.468]
95 Location Error (in pixels) plot of each tracker on eight test sequences for quantitative comparison. [sent-315, score-0.295]
96 Probabilistic color and adaptive multi-feature tracking with dynamically switched priority between cues. [sent-331, score-0.286]
97 Robust tracking using local sparse appearance model and k-selection. [sent-419, score-0.365]
98 Robust visual tracking and vehicle classification via sparse representation. [sent-436, score-0.352]
99 Efficient minimum error bounded particle resampling l1 tracker with occlusion detection. [sent-444, score-0.582]
100 Visual tracking via adaptive tracker selection with multiple features. [sent-520, score-0.503]
wordName wordTfidf (topN-words)
[('mtmvt', 0.428), ('particles', 0.31), ('mtt', 0.256), ('tracker', 0.249), ('particle', 0.23), ('tracking', 0.225), ('outlier', 0.2), ('trackers', 0.164), ('ivt', 0.146), ('vtd', 0.141), ('views', 0.128), ('target', 0.125), ('gym', 0.11), ('mil', 0.106), ('intensity', 0.104), ('coefficients', 0.1), ('apg', 0.098), ('sparse', 0.096), ('bolt', 0.088), ('ivlm', 0.08), ('template', 0.071), ('view', 0.07), ('resampling', 0.068), ('dictionary', 0.067), ('multitask', 0.067), ('tasks', 0.065), ('sylv', 0.062), ('track', 0.061), ('shaking', 0.057), ('seventh', 0.055), ('representation', 0.054), ('tpami', 0.054), ('mei', 0.054), ('kitesurf', 0.053), ('zhibin', 0.053), ('accelerated', 0.053), ('proximal', 0.051), ('trivial', 0.051), ('collaborative', 0.049), ('loses', 0.049), ('illumination', 0.048), ('dacheng', 0.047), ('column', 0.047), ('sequences', 0.046), ('brighter', 0.046), ('complementary', 0.045), ('drifts', 0.044), ('outliers', 0.044), ('appearance', 0.044), ('rejection', 0.043), ('targets', 0.043), ('lbp', 0.042), ('changes', 0.041), ('dh', 0.041), ('columns', 0.04), ('robust', 0.04), ('distracting', 0.039), ('toyota', 0.039), ('lose', 0.039), ('shared', 0.039), ('animal', 0.038), ('david', 0.038), ('templates', 0.038), ('composite', 0.037), ('indoor', 0.037), ('sequence', 0.036), ('frequently', 0.036), ('observation', 0.036), ('tao', 0.036), ('controlling', 0.036), ('occlusion', 0.035), ('aggregation', 0.035), ('regularization', 0.035), ('vulnerable', 0.034), ('kalal', 0.034), ('zero', 0.034), ('dependencies', 0.034), ('learning', 0.033), ('regularized', 0.033), ('color', 0.032), ('sampled', 0.032), ('matrices', 0.032), ('multiview', 0.031), ('icdm', 0.031), ('tip', 0.031), ('visual', 0.031), ('sparsity', 0.03), ('undergoing', 0.03), ('compressive', 0.03), ('comaniciu', 0.03), ('moving', 0.03), ('adaptive', 0.029), ('xue', 0.029), ('kwon', 0.029), ('atoms', 0.029), ('lasso', 0.029), ('ross', 0.029), ('handling', 0.029), ('tracks', 0.029), ('types', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
Author: Zhibin Hong, Xue Mei, Danil Prokhorov, Dacheng Tao
Abstract: Combining multiple observation views has proven beneficial for tracking. In this paper, we cast tracking as a novel multi-task multi-view sparse learning problem and exploit the cues from multiple views including various types of visual features, such as intensity, color, and edge, where each feature observation can be sparsely represented by a linear combination of atoms from an adaptive feature dictionary. The proposed method is integrated in a particle filter framework where every view in each particle is regarded as an individual task. We jointly consider the underlying relationship between tasks across different views and different particles, and tackle it in a unified robust multi-task formulation. In addition, to capture the frequently emerging outlier tasks, we decompose the representation matrix into two collaborative components which enable a more robust and accurate approximation. We show that the proposed formulation can be efficiently solved using the Accelerated Proximal Gradient method with a small number of closed-form updates. The presented tracker is implemented using four types of features and is tested on numerous benchmark video sequences. Both the qualitative and quantitative results demonstrate the superior performance of the proposed approach compared to several state-of-the-art trackers.
2 0.47438812 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking
Author: Naiyan Wang, Jingdong Wang, Dit-Yan Yeung
Abstract: This paper studies the visual tracking problem in video sequences and presents a novel robust sparse tracker under the particle filter framework. In particular, we propose an online robust non-negative dictionary learning algorithm for updating the object templates so that each learned template can capture a distinctive aspect of the tracked object. Another appealing property of this approach is that it can automatically detect and reject the occlusion and cluttered background in a principled way. In addition, we propose a new particle representation formulation using the Huber loss function. The advantage is that it can yield robust estimation without using trivial templates adopted by previous sparse trackers, leading to faster computation. We also reveal the equivalence between this new formulation and the previous one which uses trivial templates. The proposed tracker is empirically compared with state-of-the-art trackers on some challenging video sequences. Both quantitative and qualitative comparisons show that our proposed tracker is superior and more stable.
3 0.32164782 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
Author: Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
Abstract: Recently, sparse representation has been introduced for robust object tracking. By representing the object sparsely, i.e., using only a few templates via ℓ1-norm minimization, these so-called ℓ1-trackers exhibit promising tracking results. In this work, we address the object template building and updating problem in these ℓ1-tracking approaches, which has not been fully studied. We propose to perform template updating, in a new perspective, as an online incremental dictionary learning problem, which is efficiently solved through an online optimization procedure. To guarantee the robustness and adaptability of the tracking algorithm, we also propose to build a multi-lifespan dictionary model. By building target dictionaries of different lifespans, effective object observations can be obtained to deal with the well-known drifting problem in tracking and thus improve the tracking accuracy. We derive effective observation models both generatively and discriminatively based on the online multi-lifespan dictionary learning model and deploy them to the Bayesian sequential estimation framework to perform tracking. The proposed approach has been extensively evaluated on ten challenging video sequences. Experimental results demonstrate the effectiveness of the online learned templates, as well as the state-of-the-art tracking performance of the proposed approach.
Author: Yu Pang, Haibin Ling
Abstract: Evaluating visual tracking algorithms, or “trackers” for short, is of great importance in computer vision. However, it is hard to “fairly” compare trackers because many parameters need to be tuned in the experimental configurations. On the other hand, when introducing a new tracker, a recent trend is to validate it by comparing it with several existing ones. Such an evaluation may have subjective biases towards the new tracker, which typically performs the best. This is mainly due to the difficulty of optimally tuning all its competitors and sometimes the selected testing sequences. By contrast, little subjective bias exists towards the “second best” ones in the contest. This observation inspires us with a novel perspective towards inhibiting subjective bias in evaluating trackers by analyzing the results between the second bests. In particular, we first collect all tracking papers published in major computer vision venues in recent years. From these papers, after filtering out potential biases in various aspects, we create a dataset containing many records of comparison results between various visual trackers. Using these records, we derive performance rankings of the involved trackers by four different methods. The first two methods model the dataset as a graph and then derive the rankings over the graph, one by a rank aggregation algorithm and the other by a PageRank-like solution. The other two methods take the records as generated from sports contests and adopt the widely used Elo’s and Glicko’s rating systems to derive the rankings. The experimental results are presented and may serve as a reference for related research.
5 0.22677101 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
Author: Shuran Song, Jianxiong Xiao
Abstract: Despite significant progress, tracking is still considered to be a very challenging task. Recently, the increasing popularity of depth sensors has made it possible to obtain reliable depth easily. This may be a game changer for tracking, since depth can be used to prevent model drift and handle occlusion. We also observe that current tracking algorithms are mostly evaluated on a very small number of videos collected and annotated by different groups. The lack of a reasonably sized and consistently constructed benchmark has prevented a persuasive comparison among different algorithms. In this paper, we construct a unified benchmark dataset of 100 RGBD videos with high diversity, propose different kinds of RGBD tracking algorithms using 2D or 3D models, and present a quantitative comparison of various algorithms with RGB or RGBD input. We aim to lay the foundation for further research in both RGB and RGBD tracking, and our benchmark is available at http://tracking.cs.princeton.edu.
6 0.19185145 395 iccv-2013-Slice Sampling Particle Belief Propagation
7 0.1863369 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
8 0.17311195 338 iccv-2013-Randomized Ensemble Tracking
9 0.16422206 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking
10 0.15278006 217 iccv-2013-Initialization-Insensitive Visual Tracking through Voting with Salient Local Features
11 0.14967906 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
12 0.14906579 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
13 0.14830257 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
14 0.14732841 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
15 0.13421887 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors
16 0.12868318 295 iccv-2013-On One-Shot Similarity Kernels: Explicit Feature Maps and Properties
17 0.12705332 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization
18 0.11370222 244 iccv-2013-Learning View-Invariant Sparse Representations for Cross-View Action Recognition
19 0.11194871 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
20 0.10952643 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation
topicId topicWeight
[(0, 0.243), (1, -0.034), (2, -0.025), (3, 0.042), (4, -0.117), (5, -0.155), (6, -0.179), (7, 0.184), (8, -0.119), (9, 0.237), (10, -0.081), (11, -0.198), (12, 0.062), (13, 0.113), (14, 0.046), (15, -0.053), (16, 0.116), (17, 0.064), (18, -0.057), (19, -0.103), (20, -0.017), (21, 0.039), (22, -0.078), (23, -0.086), (24, -0.022), (25, 0.025), (26, 0.024), (27, 0.043), (28, -0.001), (29, 0.041), (30, 0.069), (31, -0.011), (32, 0.007), (33, 0.002), (34, 0.009), (35, -0.056), (36, -0.062), (37, -0.033), (38, -0.021), (39, -0.04), (40, -0.037), (41, -0.089), (42, 0.017), (43, -0.038), (44, 0.081), (45, 0.028), (46, -0.065), (47, 0.003), (48, 0.0), (49, 0.052)]
simIndex simValue paperId paperTitle
1 0.96227777 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking
Author: Naiyan Wang, Jingdong Wang, Dit-Yan Yeung
Abstract: This paper studies the visual tracking problem in video sequences and presents a novel robust sparse tracker under the particle filter framework. In particular, we propose an online robust non-negative dictionary learning algorithm for updating the object templates so that each learned template can capture a distinctive aspect of the tracked object. Another appealing property of this approach is that it can automatically detect and reject the occlusion and cluttered background in a principled way. In addition, we propose a new particle representation formulation using the Huber loss function. The advantage is that it can yield robust estimation without using trivial templates adopted by previous sparse trackers, leading to faster computation. We also reveal the equivalence between this new formulation and the previous one which uses trivial templates. The proposed tracker is empirically compared with state-of-the-art trackers on some challenging video sequences. Both quantitative and qualitative comparisons show that our proposed tracker is superior and more stable.
same-paper 2 0.9565931 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
Author: Zhibin Hong, Xue Mei, Danil Prokhorov, Dacheng Tao
Abstract: Combining multiple observation views has proven beneficial for tracking. In this paper, we cast tracking as a novel multi-task multi-view sparse learning problem and exploit the cues from multiple views including various types of visual features, such as intensity, color, and edge, where each feature observation can be sparsely represented by a linear combination of atoms from an adaptive feature dictionary. The proposed method is integrated in a particle filter framework where every view in each particle is regarded as an individual task. We jointly consider the underlying relationship between tasks across different views and different particles, and tackle it in a unified robust multi-task formulation. In addition, to capture the frequently emerging outlier tasks, we decompose the representation matrix into two collaborative components which enable a more robust and accurate approximation. We show that the proposed formulation can be efficiently solved using the Accelerated Proximal Gradient method with a small number of closed-form updates. The presented tracker is implemented using four types of features and is tested on numerous benchmark video sequences. Both the qualitative and quantitative results demonstrate the superior performance of the proposed approach compared to several state-of-the-art trackers.
Author: Yu Pang, Haibin Ling
Abstract: Evaluating visual tracking algorithms, or “trackers” for short, is of great importance in computer vision. However, it is hard to “fairly” compare trackers because many parameters need to be tuned in the experimental configurations. On the other hand, when introducing a new tracker, a recent trend is to validate it by comparing it with several existing ones. Such an evaluation may have subjective biases towards the new tracker, which typically performs the best. This is mainly due to the difficulty of optimally tuning all its competitors and sometimes the selected testing sequences. By contrast, little subjective bias exists towards the “second best” ones in the contest. This observation inspires us with a novel perspective towards inhibiting subjective bias in evaluating trackers by analyzing the results between the second bests. In particular, we first collect all tracking papers published in major computer vision venues in recent years. From these papers, after filtering out potential biases in various aspects, we create a dataset containing many records of comparison results between various visual trackers. Using these records, we derive performance rankings of the involved trackers by four different methods. The first two methods model the dataset as a graph and then derive the rankings over the graph, one by a rank aggregation algorithm and the other by a PageRank-like solution. The other two methods take the records as generated from sports contests and adopt the widely used Elo’s and Glicko’s rating systems to derive the rankings. The experimental results are presented and may serve as a reference for related research.
4 0.87612927 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
Author: Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
Abstract: Recently, sparse representation has been introduced for robust object tracking. By representing the object sparsely, i.e., using only a few templates via ℓ1-norm minimization, these so-called ℓ1-trackers exhibit promising tracking results. In this work, we address the object template building and updating problem in these ℓ1-tracking approaches, which has not been fully studied. We propose to perform template updating, in a new perspective, as an online incremental dictionary learning problem, which is efficiently solved through an online optimization procedure. To guarantee the robustness and adaptability of the tracking algorithm, we also propose to build a multi-lifespan dictionary model. By building target dictionaries of different lifespans, effective object observations can be obtained to deal with the well-known drifting problem in tracking and thus improve the tracking accuracy. We derive effective observation models both generatively and discriminatively based on the online multi-lifespan dictionary learning model and deploy them to the Bayesian sequential estimation framework to perform tracking. The proposed approach has been extensively evaluated on ten challenging video sequences. Experimental results demonstrate the effectiveness of the online learned templates, as well as the state-of-the-art tracking performance of the proposed approach.
5 0.78356224 395 iccv-2013-Slice Sampling Particle Belief Propagation
Author: Oliver Müller, Michael Ying Yang, Bodo Rosenhahn
Abstract: Inference in continuous label Markov random fields is a challenging task. We use particle belief propagation (PBP) for solving the inference problem in continuous label space. Sampling particles from the belief distribution is typically done by using Metropolis-Hastings (MH) Markov chain Monte Carlo (MCMC) methods, which involve sampling from a proposal distribution. This proposal distribution has to be carefully designed depending on the particular model and input data to achieve fast convergence. We propose to avoid dependence on a proposal distribution by introducing a slice sampling based PBP algorithm. The proposed approach shows superior convergence performance on an image denoising toy example. Our findings are validated on a challenging relational 2D feature tracking application.
6 0.75621939 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking
7 0.71350557 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation
8 0.65938532 217 iccv-2013-Initialization-Insensitive Visual Tracking through Voting with Salient Local Features
9 0.64702207 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
10 0.63829577 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
11 0.59927934 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
12 0.59819752 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
13 0.58671135 338 iccv-2013-Randomized Ensemble Tracking
14 0.53757691 87 iccv-2013-Conservation Tracking
15 0.51483011 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
16 0.50366354 128 iccv-2013-Dynamic Probabilistic Volumetric Models
17 0.48936892 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
18 0.48208177 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
19 0.47224498 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach
20 0.47218594 270 iccv-2013-Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking
topicId topicWeight
[(2, 0.055), (7, 0.024), (26, 0.089), (31, 0.034), (35, 0.012), (40, 0.023), (42, 0.1), (64, 0.129), (73, 0.062), (89, 0.151), (97, 0.216), (98, 0.025)]
simIndex simValue paperId paperTitle
1 0.87204206 373 iccv-2013-Saliency and Human Fixations: State-of-the-Art and Study of Comparison Metrics
Author: Nicolas Riche, Matthieu Duvinage, Matei Mancas, Bernard Gosselin, Thierry Dutoit
Abstract: Visual saliency has been an increasingly active research area in the last ten years, with dozens of saliency models recently published. Nowadays, one of the big challenges in the field is to find a way to fairly evaluate all of these models. In this paper, on human eye fixations, we compare the ranking of 12 state-of-the-art saliency models using 12 similarity metrics. The comparison is done on Jian Li’s database containing several hundreds of natural images. Based on the Kendall concordance coefficient, it is shown that some of the metrics are strongly correlated, leading to a redundancy in the performance metrics reported in the available benchmarks. On the other hand, other metrics provide a more diverse picture of models’ overall performance. As a recommendation, three similarity metrics should be used to obtain a complete point of view of saliency model performance.
same-paper 2 0.83096474 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
Author: Zhibin Hong, Xue Mei, Danil Prokhorov, Dacheng Tao
Abstract: Combining multiple observation views has proven beneficial for tracking. In this paper, we cast tracking as a novel multi-task multi-view sparse learning problem and exploit the cues from multiple views including various types of visual features, such as intensity, color, and edge, where each feature observation can be sparsely represented by a linear combination of atoms from an adaptive feature dictionary. The proposed method is integrated in a particle filter framework where every view in each particle is regarded as an individual task. We jointly consider the underlying relationship between tasks across different views and different particles, and tackle it in a unified robust multi-task formulation. In addition, to capture the frequently emerging outlier tasks, we decompose the representation matrix into two collaborative components which enable a more robust and accurate approximation. We show that the proposed formulation can be efficiently solved using the Accelerated Proximal Gradient method with a small number of closed-form updates. The presented tracker is implemented using four types of features and is tested on numerous benchmark video sequences. Both the qualitative and quantitative results demonstrate the superior performance of the proposed approach compared to several state-of-the-art trackers.
3 0.82669312 347 iccv-2013-Recursive Estimation of the Stein Center of SPD Matrices and Its Applications
Author: Hesamoddin Salehian, Guang Cheng, Baba C. Vemuri, Jeffrey Ho
Abstract: Symmetric positive-definite (SPD) matrices are ubiquitous in Computer Vision, Machine Learning and Medical Image Analysis. Finding the center/average of a population of such matrices is a common theme in many algorithms such as clustering, segmentation, principal geodesic analysis, etc. The center of a population of such matrices can be defined using a variety of distance/divergence measures as the minimizer of the sum of squared distances/divergences from the unknown center to the members of the population. It is well known that the computation of the Karcher mean for the space of SPD matrices, which is a negatively curved Riemannian manifold, is computationally expensive. Recently, the LogDet divergence-based center was shown to be a computationally attractive alternative. However, the LogDet-based mean of more than two matrices cannot be computed in closed form, which makes it computationally less attractive for large populations. In this paper we present a novel recursive estimator for the center based on the Stein distance, which is the square root of the LogDet divergence, that is significantly faster than the batch mode computation of this center. The key theoretical contribution is a closed-form solution for the weighted Stein center of two SPD matrices, which is used in the recursive computation of the Stein center for a population of SPD matrices. Additionally, we show experimental evidence of the convergence of our recursive Stein center estimator to the batch mode Stein center. We present applications of our recursive estimator to K-means clustering and image indexing depicting significant time gains over corresponding algorithms that use the batch mode computations. For the latter application, we develop novel hashing functions using the Stein distance and apply it to publicly available data sets, and experimental results have shown favorable comparisons to other competing methods.
4 0.80526686 412 iccv-2013-Synergistic Clustering of Image and Segment Descriptors for Unsupervised Scene Understanding
Author: Daniel M. Steinberg, Oscar Pizarro, Stefan B. Williams
Abstract: With the advent of cheap, high fidelity, digital imaging systems, the quantity and rate of generation of visual data can dramatically outpace a human's ability to label or annotate it. In these situations there is scope for the use of unsupervised approaches that can model these datasets and automatically summarise their content. To this end, we present a totally unsupervised, and annotation-less, model for scene understanding. This model can simultaneously cluster whole-image and segment descriptors, thereby forming an unsupervised model of scenes and objects. We show that this model outperforms other unsupervised models that can only cluster one source of information (image or segment) at once. We are able to compare unsupervised and supervised techniques using standard measures derived from confusion matrices and contingency tables. This shows that our unsupervised model is competitive with current supervised and weakly-supervised models for scene understanding on standard datasets. We also demonstrate our model operating on a dataset with more than 100,000 images collected by an autonomous underwater vehicle.
5 0.80008352 227 iccv-2013-Large-Scale Image Annotation by Efficient and Robust Kernel Metric Learning
Author: Zheyun Feng, Rong Jin, Anil Jain
Abstract: One of the key challenges in search-based image annotation models is to define an appropriate similarity measure between images. Many kernel distance metric learning (KML) algorithms have been developed in order to capture the nonlinear relationships between visual features and semantics of the images. One fundamental limitation in applying KML to image annotation is that it requires converting image annotations into binary constraints, leading to a significant information loss. In addition, most KML algorithms suffer from high computational cost due to the requirement that the learned matrix has to be positive semi-definite (PSD). In this paper, we propose a robust kernel metric learning (RKML) algorithm based on the regression technique that is able to directly utilize image annotations. The proposed method is also computationally more efficient because the PSD property is automatically ensured by regression. We provide the theoretical guarantee for the proposed algorithm, and verify its efficiency and effectiveness for image annotation by comparing it to state-of-the-art approaches for both distance metric learning and image annotation.
6 0.7926777 372 iccv-2013-Saliency Detection via Dense and Sparse Reconstruction
7 0.77520066 20 iccv-2013-A Max-Margin Perspective on Sparse Representation-Based Classification
8 0.76183695 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
9 0.75764608 50 iccv-2013-Analysis of Scores, Datasets, and Models in Visual Saliency Prediction
10 0.74144751 338 iccv-2013-Randomized Ensemble Tracking
11 0.73858678 369 iccv-2013-Saliency Detection: A Boolean Map Approach
12 0.7384575 215 iccv-2013-Incorporating Cloud Distribution in Sky Representation
13 0.73598808 371 iccv-2013-Saliency Detection via Absorbing Markov Chain
14 0.73515916 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
15 0.73055792 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation
16 0.72864282 396 iccv-2013-Space-Time Robust Representation for Action Recognition
17 0.72800672 91 iccv-2013-Contextual Hypergraph Modeling for Salient Object Detection
18 0.72589397 71 iccv-2013-Category-Independent Object-Level Saliency Detection
19 0.72254968 380 iccv-2013-Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes
20 0.72153497 86 iccv-2013-Concurrent Action Detection with Structural Prediction