iccv iccv2013 iccv2013-320 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Daniel Wesierski, Patrick Horain
Abstract: Elongated objects have various shapes and can shift, rotate, change scale, and be rigid or deform by flexing, articulating, and vibrating, with examples as varied as a glass bottle, a robotic arm, a surgical suture, a finger pair, a tram, and a guitar string. This generally makes tracking of poses of elongated objects very challenging. We describe a unified, configurable framework for tracking the pose of elongated objects, which move in the image plane and extend over the image region. Our method strives for simplicity, versatility, and efficiency. The object is decomposed into a chained assembly of segments of multiple parts that are arranged under a hierarchy of tailored spatio-temporal constraints. In this hierarchy, segments can rescale independently while their elasticity is controlled with global orientations and local distances. While the trend in tracking is to design complex, structure-free algorithms that update object appearance on-line, we show that our tracker, with the novel but remarkably simple, structured organization of parts with constant appearance, reaches or improves state-of-the-art performance. Most importantly, our model can be easily configured to track the exact pose of arbitrary, elongated objects in the image plane. The tracker can run up to 100 fps on a desktop PC, yet the computation time scales linearly with the number of object parts. To our knowledge, this is the first approach to generic tracking of elongated objects.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Elongated objects have various shapes and can shift, rotate, change scale, and be rigid or deform by flexing, articulating, and vibrating, with examples as varied as a glass bottle, a robotic arm, a surgical suture, a finger pair, a tram, and a guitar string. [sent-4, score-0.667]
2 This generally makes tracking of poses of elongated objects very challenging. [sent-5, score-0.738]
3 We describe a unified, configurable framework for tracking the pose of elongated objects, which move in the image plane and extend over the image region. [sent-6, score-0.765]
4 The object is decomposed into a chained assembly of segments of multiple parts that are arranged under a hierarchy of tailored spatio-temporal constraints. [sent-8, score-0.706]
5 In this hierarchy, segments can rescale independently while their elasticity is controlled with global orientations and local distances. [sent-9, score-0.213]
6 While the trend in tracking is to design complex, structure-free algorithms that update object appearance on-line, we show that our tracker, with the novel but remarkably simple, structured organization of parts with constant appearance, reaches or improves state-of-the-art performance. [sent-10, score-0.482]
7 Most importantly, our model can be easily configured to track the exact pose of arbitrary, elongated objects in the image plane. [sent-11, score-0.754]
8 The tracker can run up to 100 fps on a desktop PC, yet the computation time scales linearly with the number of object parts. [sent-12, score-0.337]
9 To our knowledge, this is the first approach to generic tracking of elongated objects. [sent-13, score-0.721]
10 They can move fast under varying illumination and occlusions, in clutter, and deform in the camera projective space due to relaxed rigidity or change in viewpoint. [sent-16, score-0.312]
11 Yet, applications requiring pose tracking of elongated objects are various and span, e.g. [sent-17, score-0.248]
12 Hence, tracking elongated objects is a challenging but important task. [sent-20, score-0.738]
13 Our goal is to track with one algorithm poses of a plethora of elongated objects varying in shape, motion, and rigidity. [sent-24, score-0.688]
14 Our approach decomposes an elongated object into a chained assembly of segments of multiple parts that are arranged under a hierarchy of tailored spatio-temporal constraints leveraging local rigidity over object segments. [sent-25, score-1.433]
15 As a result, we efficiently track elongated objects that can shift, rotate, change scale, and be rigid or deform by flexing, articulating, and vibrating. [sent-26, score-0.885]
16 However, an algorithm that tracks precisely, robustly, and rapidly a plethora of elongated objects varying in shape, motion, and rigidity has not been proposed thus far. [sent-27, score-0.773]
17 (e.g., surface deformations of the human face [36], articulating tree-based human pose [28]). [sent-30, score-0.300]
18 In contrast, structure-free, generic approaches, which are initialized simply by a single bounding box, can localize arbitrary objects that are rigid [19, 34], deform less [5, 22, 33, 41], or more [6, 12, 23]. [sent-32, score-0.282]
19 They build object appearance on-line but strive to be robust against object deformations and thus neglect or filter out its pose. [sent-33, score-0.241]
20 Arguably, the single bounding-box annotation scenario currently limits their applicability to elongated objects that occupy rather expanded image regions. [sent-34, score-0.595]
21 In view of this, the paper addresses a new problem of developing a generic system for pose-based tracking of elongated objects, which we conformably define as chain-like image structures. [sent-35, score-0.721]
22 We position our approach between the structured and structure-free trackers by treating elongated objects as a structure of chained segments of parts with fixed appearance. [sent-36, score-1.133]
23 Notably, we introduce a generic, model-based tracker that admits a simple, one-shot configuration from annotated object parts in the first frame. [sent-38, score-0.409]
24 Apart from its computational efficiency, it also tracks objects robustly against partial occlusions and local appearance changes due to spatial support through part-based structure and re-detects them after full occlusions due to temporal support through fixed appearance. [sent-39, score-0.228]
25 We achieve this within a MAP-MRF setting of pictorial structures [10, 11] by developing a deformable model of chained parts that efficiently leverages object local rigidity over the spatio-temporal domain. [sent-42, score-0.711]
26 This means the pixels can evolve freely within object parts during tracking, so achieving robustness to rotation and to local deformations caused by moderate change in viewpoint. [sent-44, score-0.358]
27 We then maintain spatial appearance of the whole object by decomposing it into a chained assembly of segments of multiple parts that are arranged under a hierarchy of tailored spatiotemporal constraints. [sent-45, score-0.776]
28 We reference each segment of parts with an oriented polar coordinate system, effectively enforcing the spatial coherency of parts by promoting those part configurations that conform to the preferred relative angular deviations and distances over time. [sent-46, score-0.560]
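To make the oriented polar referencing concrete, here is a minimal sketch (illustrative Python under assumed names, not the authors' code) that expresses the displacement between two consecutive part centers as a distance and an angular deviation measured against the segment's global orientation:

```python
import math

def relative_polar(center_a, center_b, segment_orientation):
    """Distance and angular deviation from part center_a to part center_b,
    measured in the oriented polar coordinate system of the segment
    (the angle is taken relative to the segment's global orientation)."""
    dx = center_b[0] - center_a[0]
    dy = center_b[1] - center_a[1]
    distance = math.hypot(dx, dy)
    angle = math.atan2(dy, dx) - segment_orientation
    angle = (angle + math.pi) % (2 * math.pi) - math.pi  # wrap to (-pi, pi]
    return distance, angle
```

Deviations of `distance` and `angle` from their preferred values can then be penalized over time, which is the coherency-promoting behavior described above.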
29 Our other contribution is to devise the new task of generic tracking of elongated objects having arbitrary shapes and motions. [sent-48, score-0.781]
30 We also contribute by demonstrating that even though pictorial structures are usually considered slow [17], we integrate them into a hierarchical model that can register object pose up to speeds far exceeding real-time. [sent-49, score-0.294]
31 Related work We review related work on region and part-based trackers of object poses, and other chain-based assemblies representing elongated image structures. [sent-51, score-0.671]
32 The tracker determined object location in real-time by mean-shifting the kernel in the gradient-ascending direction of the differentiated objective function. [sent-53, score-0.259]
33 Also, they use a holistic appearance template that loses spatial information, reduces their robustness to occlusions [1], and renders them unable to track objects that deform heavily. [sent-57, score-0.392]
34 For instance, parts described by fixed, gray histograms voted for object location in [1]. [sent-62, score-0.226]
35 Kernels of parts were jointly mean-shifted in [9] to follow object deformations but required precomputing the subspace over their possible displacements on initial series of images to guide their joint convergence. [sent-64, score-0.275]
36 On the other hand, the prominent pictorial structures [10, 11] have been used extensively in object tracking by approximating complete graphs with star graphs [3, 31] and with other tree graph extensions [42, 44]. [sent-69, score-0.361]
37 2921 The graphs are trained off-line for specific objects, but can explain heavy foreshortening [35] and scale linearly with object parts. [sent-70, score-0.221]
38 We aim at an efficient and precise framework to track elongated objects that can vary in the number of parts by several orders of magnitude. [sent-71, score-0.806]
39 In our setting, the primary advantage over particle filter and other pictorial structure trackers is that our tracker can render the global solution without approximative inference or approximative object structure, and its joint inference scales linearly with the number of parts. [sent-72, score-0.682]
40 A chain-based pictorial structure thus appears natural to track elongated regions, and our approach generalizes to such structures of arbitrary rigidity in a computationally efficient manner. [sent-73, score-0.914]
41 Our work is also related in approach to [17, 40] that use a chained pictorial structure, and loosely related to [20] that iteratively infers on a dense graph by evaluating an ensemble of chains. [sent-78, score-0.279]
42 However, [40] tracks non-deformable objects that shift and rotate, [20] requires a large set of training examples, and [17, 20] track object keypoints by filtering out object pose. [sent-79, score-0.283]
43 Approach We develop a model-based approach that can track the motion of the pose of an arbitrary, elongated object in the image plane. [sent-85, score-0.687]
44 We first partition an elongated object $O^e$ into K segments $O^e = \{O_i\}_{i=1}^{K}$, as depicted in Fig. [sent-86, score-0.732]
45 Then, each segment is partitioned further into $k_i$ parts $O_i = \{p_{i,j}\}_{j=1}^{k_i}$, specified by square-like windows. [sent-88, score-0.335]
46 We link the parts with a chain graph $G^c = (V, E)$, where nodes in $V$ are associated with the parts and edges in $E$ are associated with the links between the consecutive parts in the chain. [sent-90, score-0.520]
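As a rough illustration of this decomposition (a hypothetical sketch; the class and function names are assumptions, not the paper's implementation), the chained assembly can be held as segments of ordered parts, with the chain edges enumerated over consecutive parts:

```python
from dataclasses import dataclass

@dataclass
class Part:
    x: float    # window center, column
    y: float    # window center, row
    size: int   # side length of the square-like window

@dataclass
class Segment:
    parts: list         # ordered parts p_{i,1}, ..., p_{i,k_i}
    orientation: float  # global segment orientation Theta_i, in radians

def chain_edges(segments):
    """Edges E of the chain graph G^c = (V, E): consecutive parts within
    each segment. Since the last part of a segment is shared with the next
    segment as a hinge, these within-segment edges already connect the
    whole chain."""
    for seg in segments:
        for a, b in zip(seg.parts, seg.parts[1:]):
            yield a, b
```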
47 Model hierarchy, with an example of a deformable, elongated object, decomposed into K=3 segments that are referenced with planar coordinate systems. [sent-96, score-0.72]
48 Two segments share a part, which is anchored at their hinge, denoting heavy deformation (e. [sent-98, score-0.238]
49 The orientation of the coordinate system of each segment is estimated based on the tracked locations of the centers of the parts. [sent-101, score-0.249]
50 the last part of each segment is the first part of the next segment in the chain, so denoting a hinge. [sent-127, score-0.224]
51 As we update the orientation of segments during tracking, orientation-variant features (e. [sent-135, score-0.317]
52 The elongated segments Oi extend over rigid or elastic current regions. [sent-138, score-0.749]
53 Pictorial structures either model whole segments and search exhaustively for their orientations [10, 31], or split segments further into parts and model their constraints locally [42]. [sent-139, score-0.590]
54 We also split segments into parts but model them hierarchically with spatio-temporal constraints, i.e. [sent-140, score-0.337]
55 with local distances between parts and global orientations over segments to control their linear and angular deformations, respectively. [sent-142, score-0.483]
56 Constraining each segment in a chain with a global orientation allows us to control its local rigidity without the need for higher order cliques in the graph, which is the key to fast inference. [sent-143, score-0.430]
57 In this way, such a general, inertial temporal prior regularizes the dynamics of an object by favoring shift motion that is common during tracking [43]. [sent-145, score-0.259]
58 The scale of the object affects the distances, so we obtain: $P(p_{i,j}^t, p_{i,j+1}^t) = P(l_{i,j}^t, l_{i,j+1}^t \mid s_{i,j}^t, s_{i,j+1}^t)\,P(s_{i,j}^t, s_{i,j+1}^t)$ (4) For simplicity, we model the joint scale prior $P(s_{i,j}^t, s_{i,j+1}^t)$ for each pair of parts in the chain as a uniform distribution. [sent-154, score-0.275]
59 The bending of all the parts in the segment is then controlled during tracking with the temporal term as: $P(p_{i,j}^t, p_{i,j+1}^t \mid O_i^{t-1}) = \mathcal{M}(\theta_{i,j;i,j+1}^t;\ \theta_{i,j;i,j+1} + \Theta_i^{t-1}, \kappa_i)$ (6) where $\mathcal{M}$ denotes the von Mises distribution and $\kappa_i$ denotes the angular stiffness. [sent-163, score-0.530]
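A minimal log-space sketch of evaluating the temporal bending term (6) could look as follows; the function names are assumptions, and `scipy.special.i0` (the modified Bessel function of order 0) supplies the von Mises normalizer:

```python
import math
from scipy.special import i0  # modified Bessel function of order 0

def log_von_mises(theta, mu, kappa):
    """Log-density of the von Mises distribution M(theta; mu, kappa)."""
    return kappa * math.cos(theta - mu) - math.log(2 * math.pi * i0(kappa))

def temporal_bending_logprob(theta_t, theta_pref, Theta_prev, kappa_i):
    """Sketch of term (6): the current inter-part angle theta_t is favored
    around the preferred angle theta_pref rotated by the segment
    orientation Theta_prev from the previous frame; kappa_i is the
    angular stiffness of segment i (large kappa_i = stiff segment)."""
    return log_von_mises(theta_t, theta_pref + Theta_prev, kappa_i)
```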
60 Therefore, our model favors those arrangements of parts of the segment that maintain the predefined geometrical configuration, presuming that the orientation $\Theta_i^{t-1}$ does not change much between successive frames. [sent-165, score-0.266]
61 Configuration: Our system admits a simple, intuitive procedure for configuring the pose of an elongated object $O^e$ in the initial frame $I^0$. [sent-167, score-0.682]
62 We: (1) split $O^e$ into $|V|$ parts $p_{i,j}^0$ by specifying their locations and sizes, (2) link neighbor parts with a chain $G^c$, and (3) specify K segments of parts with their corresponding orientations $\Theta_i^0$. [sent-168, score-0.806]
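A plausible one-shot configuration routine is sketched below, assuming the annotations arrive as (x, y, size) windows ordered along the object; note the paper has the user specify the initial orientations $\Theta_i^0$ directly, whereas this sketch estimates them from the end-part centers for convenience:

```python
import math

def configure(annotations, segment_sizes):
    """One-shot configuration from the first frame. 'annotations' is a
    list of (x, y, size) part windows ordered along the object, and
    'segment_sizes' gives k_i parts per segment. Consecutive segments
    share a hinge part, mirroring the chained hierarchy."""
    segments, start = [], 0
    for k in segment_sizes:
        parts = annotations[start:start + k]
        # Illustrative: Theta_i^0 from the first and last part centers.
        dx = parts[-1][0] - parts[0][0]
        dy = parts[-1][1] - parts[0][1]
        segments.append({"parts": parts, "orientation": math.atan2(dy, dx)})
        start += k - 1  # last part of this segment is the first of the next
    return segments
```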
63 Inference: We match our model (1) to each frame $I^t$ by inferring on its negative log-posterior $-\log P(O_e^t \mid I^t, O_e^{t-1})$ with dynamic programming to obtain the MAP configuration of the elongated object $O_e^{t,\mathrm{MAP}}$. [sent-170, score-0.637]
64 The inference is fast and its complexity scales linearly with the number of object parts $|V|$. [sent-171, score-0.274]
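The chain structure admits standard Viterbi-style dynamic programming. The brute-force sketch below (assumed interfaces, costs in negative log-space) runs one forward pass of messages and one backward trace, so the number of messages is linear in the number of parts; the paper's high frame rates would additionally rely on efficient message computation (e.g., distance transforms), which this naive version omits:

```python
import numpy as np

def chain_map(unary, pairwise):
    """MAP inference on a chain of parts. unary[j] is a 1-D array of costs
    over the candidate states of part j; pairwise[j] is a matrix of
    deformation costs between the candidates of parts j and j+1."""
    n = len(unary)
    cost = unary[0].copy()
    backptr = []
    for j in range(1, n):
        # Message from part j-1 to part j over all candidate pairs.
        total = cost[:, None] + pairwise[j - 1]   # |L_{j-1}| x |L_j|
        backptr.append(np.argmin(total, axis=0))
        cost = total.min(axis=0) + unary[j]
    # Backtrack the minimum-cost (MAP) configuration.
    states = [int(np.argmin(cost))]
    for bp in reversed(backptr):
        states.append(int(bp[states[-1]]))
    return states[::-1]  # best candidate index for each part, in order
```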
65 The scale $s^t$ of the object is computed as the average over the scales of all windows of parts and passed through the IIR filter as $s^t = (1 - r)\,s^{t-1} + r\,\hat{s}^t$ with the forgetting factor $r$. [sent-173, score-0.241]
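The scale smoothing is a one-line first-order IIR update; a sketch, with an illustrative default for the forgetting factor r (its actual value is not stated here):

```python
def update_scale(s_prev, s_measured, r=0.1):
    """IIR smoothing of the object scale, s^t = (1 - r) * s^{t-1} + r * s_hat,
    where s_measured is the average over the scales of all part windows in
    the current frame and r is the forgetting factor."""
    return (1.0 - r) * s_prev + r * s_measured
```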
66 Synthetic example of the i-th segment of a heavily deformable object, whose scale $s^t$ increases. [sent-194, score-0.217]
67 The corresponding locations of parts between frames, translated back to the origin of the 2D CS, allow for recovering the segment's rotation $R_i^t$ despite its incident deformation. [sent-196, score-0.220]
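One plausible realization of this rotation recovery is a 2-D Procrustes/Kabsch step over the corresponding part centers translated to their centroids; this is a hedged sketch of the orientation update, not necessarily the paper's exact procedure:

```python
import numpy as np

def segment_rotation(centers_prev, centers_curr):
    """Recover the in-plane rotation angle of a segment from corresponding
    part centers in consecutive frames (one (k, 2) array per frame),
    after translating both point sets to their centroids."""
    P = centers_prev - centers_prev.mean(axis=0)
    Q = centers_curr - centers_curr.mean(axis=0)
    H = P.T @ Q                                   # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    return np.arctan2(R[1, 0], R[0, 0])           # rotation angle in radians
```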
68 We show that our pose-configurable system can be used successfully to track elongated objects in the image plane, which can shift, rotate, change scale, be rigid and deform by flexing, articulating, and vibrating. [sent-200, score-0.885]
69 We also quantitatively evaluate our tracker on the PROST dataset [34] with challenges of fast viewpoint changes, motion blur, heavy scale and illumination changes, and frequent occlusions. [sent-201, score-0.355]
70 The tracker is compared against state-of-the-art trackers on PROST that learn their appearance on-line. [sent-202, score-0.373]
71 We demonstrate that our spatio-temporal model with remarkably simple, fixed appearance term leads to competitive or better tracking performance. [sent-203, score-0.252]
72 As the occlusion event is not modeled explicitly, we enforce constant appearance so that the tracker is robust against occlusions and thus can recover easily by redetecting the object. [sent-204, score-0.332]
73 The frame processing speed scales linearly with the number of parts but also depends on their window sizes (optionally, the latter could be factored out with [30]). [sent-232, score-0.284]
74 Qualitative evaluation: We demonstrate that our method applies to tracking elongated objects of various shapes, which are rigid or deform by flexing, articulating, and vibrating in the image plane. [sent-233, score-1.012]
75 In Liquor, the tracker is very successful despite multiple and heavy occlusions of the glass bottle and is not confused by another bottle, which is fairly similar in color. [sent-236, score-0.479]
76 In Robotic arm, the tracker follows the 2D pose of the articulating robotic manipulator composed of two segments. [sent-237, score-0.517]
77 In Surgical suture, the suture is a very long object, which is thin and deforms heavily and unsystematically. [sent-238, score-0.221]
78 By splitting the suture into piece-wise linear segments, our pose-configurable system can follow it very precisely. [sent-239, score-0.238]
79 Despite no constraints at the ends of the suture, the tracker stabilized both ends correctly, which is a challenging task [15]. [sent-240, score-0.213]
80 We posit this satisfactory behavior owes to the fact that, while some segments rotate, others only shift, and thus our hierarchical, spatio-temporal model renders the tracker stable. [sent-241, score-0.4]
81 In Toy tram, our model can explain the bending and scale change of the tram and is robust against moderate out-of-plane rotations affecting its appearance. [sent-242, score-0.263]
82 In Guitar string, the tracker is able to precisely register intricate deformations of the string with very little information available. [sent-243, score-0.399]
83 In this case though, the tracker ran with fixed scale to prevent the model from shrinking on the textureless, string region. [sent-245, score-0.405]
84 We can easily configure our region-based model to rigid objects with K=1 segment at initial orientation $\Theta_1^0$, and partition it evenly into k1 = 3 parts, i.e. [sent-249, score-0.364]
85 such that the parts span the segment with no (or very small) overlap (see, e.g. [sent-251, score-0.262]
86 To make the comparison fair, we fix the scale of our tracker and always output the same size of the ground truth bounding-box. [sent-260, score-0.268]
87 Note that in the first frame of each sequence, our tracker outputs the center location of the whole object, which is slightly misaligned (by several pixels) from the center of the ground truth bounding-box. (The last 3 video sequences were collected from YouTube.) [sent-261, score-0.315]
88 The left column shows initialized layouts of chained segments of evenly annotated parts. [sent-265, score-0.298]
89 (i) The glass bottle is configured with K = 1 segment of k1 = 3 parts. [sent-267, score-0.295]
90 (ii) Articulating robotic arm is split into K = 2 segments of k1 = 6 and k2 = 5 parts. [sent-268, score-0.324]
91 (iii) We split the surgical suture into K = 6 segments of ki = 11 parts. [sent-269, score-0.555]
92 (iv) The tram only bends so we configure it with K = 1 segment of k1 = 5 parts. [sent-270, score-0.292]
93 (v) One can expect the vibrating string to deform only slightly, so we configure it with K = 1 segment, as well. [sent-271, score-0.381]
94 Our tracker with constant appearance yields competitive performance with respect to TLD [19] and GD [22], while outperforming others, and processes videos at ∼ 100 fps. [sent-277, score-0.283]
95 Top: Since we integrate color histograms into our appearance term, the tracker struggles with heavy illumination changes, present in the Box sequence (e.g. [sent-296, score-0.442]
96 Bottom: Unlike snakes models, the tracker is confused on textureless regions and shrinks when it updates scale. [sent-299, score-0.324]
97 In Guitar string, it cannot discern between the correct and smaller scale of the parts of the guitar string (with the same configuration as in Fig. [sent-300, score-0.413]
98 Complementary to on-line appearance update algorithms, our future work will pursue the development of on-line reconfiguration mechanisms for updating object rigidity constraints over time. [sent-305, score-0.330]
99 Since the proposed generic tracker allows for attributing local rigidity constraints over the spatio-temporal space occupied by various elongated objects, it thus opens opportunities to investigate dynamic adaptation of rigidity constraints for more robust tracking. [sent-306, score-1.083]
100 Object tracking by asymmetric kernel mean shift with automatic scale and orientation selection. [sent-568, score-0.334]
wordName wordTfidf (topN-words)
[('elongated', 0.535), ('tracker', 0.213), ('suture', 0.19), ('articulating', 0.176), ('segments', 0.151), ('parts', 0.15), ('chained', 0.147), ('rigidity', 0.146), ('tracking', 0.143), ('pictorial', 0.132), ('flexing', 0.119), ('tram', 0.117), ('deform', 0.116), ('segment', 0.112), ('string', 0.107), ('surgical', 0.105), ('guitar', 0.101), ('oet', 0.098), ('vibrating', 0.095), ('trackers', 0.09), ('heavy', 0.087), ('angular', 0.084), ('rotate', 0.084), ('robotic', 0.083), ('snakes', 0.081), ('deformations', 0.079), ('international', 0.074), ('hierarchy', 0.073), ('ki', 0.073), ('bottle', 0.071), ('prost', 0.07), ('appearance', 0.07), ('chain', 0.07), ('shift', 0.07), ('conference', 0.067), ('orientation', 0.066), ('pages', 0.065), ('configure', 0.063), ('rigid', 0.063), ('orientations', 0.062), ('track', 0.061), ('objects', 0.06), ('pattern', 0.06), ('glass', 0.059), ('liquor', 0.059), ('lit', 0.056), ('frame', 0.056), ('scale', 0.055), ('arm', 0.054), ('configured', 0.053), ('change', 0.05), ('rit', 0.05), ('assembly', 0.05), ('deformable', 0.05), ('occlusions', 0.049), ('tailored', 0.049), ('oe', 0.048), ('horain', 0.048), ('poseconfigurable', 0.048), ('wesierski', 0.048), ('gd', 0.047), ('object', 0.046), ('pose', 0.045), ('scales', 0.045), ('generic', 0.043), ('configurable', 0.042), ('approximative', 0.042), ('struggles', 0.042), ('bending', 0.041), ('structures', 0.04), ('arranged', 0.04), ('remarkably', 0.039), ('particle', 0.039), ('locations', 0.037), ('versatility', 0.037), ('stiffness', 0.037), ('renders', 0.036), ('control', 0.036), ('split', 0.036), ('oi', 0.035), ('blob', 0.035), ('coordinate', 0.034), ('update', 0.034), ('rotation', 0.033), ('linearly', 0.033), ('vision', 0.033), ('plethora', 0.032), ('santner', 0.032), ('deforms', 0.031), ('godec', 0.031), ('demonstrating', 0.031), ('updated', 0.031), ('coherency', 0.03), ('rescaling', 0.03), ('textureless', 0.03), ('finger', 0.03), ('histogram', 0.03), ('histograms', 0.03), ('ran', 0.03), ('cs', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999917 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
Author: Daniel Wesierski, Patrick Horain
Abstract: Elongated objects have various shapes and can shift, rotate, change scale, and be rigid or deform by flexing, articulating, and vibrating, with examples as varied as a glass bottle, a robotic arm, a surgical suture, a finger pair, a tram, and a guitar string. This generally makes tracking of poses of elongated objects very challenging. We describe a unified, configurable framework for tracking the pose of elongated objects, which move in the image plane and extend over the image region. Our method strives for simplicity, versatility, and efficiency. The object is decomposed into a chained assembly of segments of multiple parts that are arranged under a hierarchy of tailored spatio-temporal constraints. In this hierarchy, segments can rescale independently while their elasticity is controlled with global orientations and local distances. While the trend in tracking is to design complex, structure-free algorithms that update object appearance on-line, we show that our tracker, with the novel but remarkably simple, structured organization of parts with constant appearance, reaches or improves state-of-the-art performance. Most importantly, our model can be easily configured to track the exact pose of arbitrary, elongated objects in the image plane. The tracker can run up to 100 fps on a desktop PC, yet the computation time scales linearly with the number of object parts. To our knowledge, this is the first approach to generic tracking of elongated objects.
2 0.18239543 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
Author: Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, James M. Rehg
Abstract: We propose an unsupervised video segmentation approach by simultaneously tracking multiple holistic figure-ground segments. Segment tracks are initialized from a pool of segment proposals generated from a figure-ground segmentation algorithm. Then, online non-local appearance models are trained incrementally for each track using a multi-output regularized least squares formulation. By using the same set of training examples for all segment tracks, a computational trick allows us to track hundreds of segment tracks efficiently, as well as perform optimal online updates in closed-form. Besides, a new composite statistical inference approach is proposed for refining the obtained segment tracks, which breaks down the initial segment proposals and recombines for better ones by utilizing high-order statistic estimates from the appearance model and enforcing temporal consistency. For evaluating the algorithm, a dataset, SegTrack v2, is collected with about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches in the dataset, showing its efficiency and robustness to challenges in different video sequences.
3 0.17606057 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking
Author: Naiyan Wang, Jingdong Wang, Dit-Yan Yeung
Abstract: This paper studies the visual tracking problem in video sequences and presents a novel robust sparse tracker under the particle filter framework. In particular, we propose an online robust non-negative dictionary learning algorithm for updating the object templates so that each learned template can capture a distinctive aspect of the tracked object. Another appealing property of this approach is that it can automatically detect and reject the occlusion and cluttered background in a principled way. In addition, we propose a new particle representation formulation using the Huber loss function. The advantage is that it can yield robust estimation without using trivial templates adopted by previous sparse trackers, leading to faster computation. We also reveal the equivalence between this new formulation and the previous one which uses trivial templates. The proposed tracker is empirically compared with state-of-the-art trackers on some challenging video sequences. Both quantitative and qualitative comparisons show that our proposed tracker is superior and more stable.
4 0.16856328 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
Author: Stefan Duffner, Christophe Garcia
Abstract: In this paper, we present a novel algorithm for fast tracking of generic objects in videos. The algorithm uses two components: a detector that makes use of the generalised Hough transform with pixel-based descriptors, and a probabilistic segmentation method based on global models for foreground and background. These components are used for tracking in a combined way, and they adapt each other in a co-training manner. Through effective model adaptation and segmentation, the algorithm is able to track objects that undergo rigid and non-rigid deformations and considerable shape and appearance variations. The proposed tracking method has been thoroughly evaluated on challenging standard videos, and outperforms state-of-the-art tracking methods designed for the same task. Finally, the proposed models allow for an extremely efficient implementation, and thus tracking is very fast.
5 0.15430838 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
Author: Shugao Ma, Jianming Zhang, Nazli Ikizler-Cinbis, Stan Sclaroff
Abstract: We propose Hierarchical Space-Time Segments as a new representation for action recognition and localization. This representation has a two-level hierarchy. The first level comprises the root space-time segments that may contain a human body. The second level comprises multi-grained space-time segments that contain parts of the root. We present an unsupervised method to generate this representation from video, which extracts both static and non-static relevant space-time segments, and also preserves their hierarchical and temporal relationships. Using simple linear SVM on the resultant bag of hierarchical space-time segments representation, we attain better than, or comparable to, state-of-the-art action recognition performance on two challenging benchmark datasets and at the same time produce good action localization results.
7 0.14967906 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
8 0.14956039 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
9 0.14423452 338 iccv-2013-Randomized Ensemble Tracking
10 0.13992234 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors
11 0.13184665 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
12 0.12545137 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation
13 0.12284242 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
14 0.11854736 57 iccv-2013-BOLD Features to Detect Texture-less Objects
15 0.11526615 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
16 0.11035921 429 iccv-2013-Tree Shape Priors with Connectivity Constraints Using Convex Relaxation on General Graphs
17 0.10873231 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
18 0.10540982 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
19 0.1049969 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
20 0.10046367 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
topicId topicWeight
[(0, 0.234), (1, -0.08), (2, 0.033), (3, 0.062), (4, 0.065), (5, -0.07), (6, -0.09), (7, 0.128), (8, -0.075), (9, 0.102), (10, -0.051), (11, -0.084), (12, 0.026), (13, 0.024), (14, -0.02), (15, 0.046), (16, 0.094), (17, 0.012), (18, -0.041), (19, -0.063), (20, 0.029), (21, 0.024), (22, -0.024), (23, -0.07), (24, -0.029), (25, -0.005), (26, -0.007), (27, 0.03), (28, -0.005), (29, -0.013), (30, -0.022), (31, -0.046), (32, -0.085), (33, -0.062), (34, -0.043), (35, 0.068), (36, -0.017), (37, -0.089), (38, 0.008), (39, -0.066), (40, 0.058), (41, 0.087), (42, 0.009), (43, -0.007), (44, 0.026), (45, 0.091), (46, -0.018), (47, 0.073), (48, -0.051), (49, 0.073)]
simIndex simValue paperId paperTitle
same-paper 1 0.95541543 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
Author: Daniel Wesierski, Patrick Horain
Abstract: Elongated objects have various shapes and can shift, rotate, change scale, and be rigid or deform by flexing, articulating, and vibrating, with examples as varied as a glass bottle, a robotic arm, a surgical suture, a finger pair, a tram, and a guitar string. This generally makes tracking of poses of elongated objects very challenging. We describe a unified, configurable framework for tracking the pose of elongated objects, which move in the image plane and extend over the image region. Our method strives for simplicity, versatility, and efficiency. The object is decomposed into a chained assembly of segments of multiple parts that are arranged under a hierarchy of tailored spatio-temporal constraints. In this hierarchy, segments can rescale independently while their elasticity is controlled with global orientations and local distances. While the trend in tracking is to design complex, structure-free algorithms that update object appearance on-line, we show that our tracker, with the novel but remarkably simple, structured organization of parts with constant appearance, reaches or improves state-of-the-art performance. Most importantly, our model can be easily configured to track the exact pose of arbitrary, elongated objects in the image plane. The tracker can run up to 100 fps on a desktop PC, yet the computation time scales linearly with the number of object parts. To our knowledge, this is the first approach to generic tracking of elongated objects.
2 0.73802906 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
Author: Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, James M. Rehg
Abstract: We propose an unsupervised video segmentation approach by simultaneously tracking multiple holistic figure-ground segments. Segment tracks are initialized from a pool of segment proposals generated from a figure-ground segmentation algorithm. Then, online non-local appearance models are trained incrementally for each track using a multi-output regularized least squares formulation. By using the same set of training examples for all segment tracks, a computational trick allows us to track hundreds of segment tracks efficiently, as well as perform optimal online updates in closed-form. Besides, a new composite statistical inference approach is proposed for refining the obtained segment tracks, which breaks down the initial segment proposals and recombines for better ones by utilizing high-order statistic estimates from the appearance model and enforcing temporal consistency. For evaluating the algorithm, a dataset, SegTrack v2, is collected with about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches in the dataset, showing its efficiency and robustness to challenges in different video sequences.
Author: Yu Pang, Haibin Ling
Abstract: Evaluating visual tracking algorithms, or “trackers” for short, is of great importance in computer vision. However, it is hard to “fairly” compare trackers because many parameters need to be tuned in the experimental configurations. On the other hand, when introducing a new tracker, a recent trend is to validate it by comparing it with several existing ones. Such an evaluation may have subjective biases towards the new tracker which typically performs the best. This is mainly due to the difficulty to optimally tune all its competitors and sometimes the selected testing sequences. By contrast, little subjective bias exists towards the “second best” ones in the contest. This observation inspires us with a novel perspective towards inhibiting subjective bias in evaluating trackers by analyzing the results between the second bests. In particular, we first collect all tracking papers published in major computer vision venues in recent years. From these papers, after filtering out potential biases in various aspects, we create a dataset containing many records of comparison results between various visual trackers. Using these records, we derive performance rankings of the involved trackers by four different methods. The first two methods model the dataset as a graph and then derive the rankings over the graph, one by a rank aggregation algorithm and the other by a PageRank-like solution. The other two methods take the records as generated from sports contests and adopt widely used Elo’s and Glicko’s rating systems to derive the rankings. The experimental results are presented and may serve as a reference for related research.
4 0.70747024 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation
Author: Seunghoon Hong, Suha Kwak, Bohyung Han
Abstract: We propose a novel offline tracking algorithm based on model-averaged posterior estimation through patch matching across frames. Contrary to existing online and offline tracking methods, our algorithm is not based on temporally-ordered estimates of target state but attempts to select easy-to-track frames first out of the remaining ones without exploiting temporal coherency of the target. The posterior of the selected frame is estimated by propagating densities from the already tracked frames in a recursive manner. The density propagation across frames is implemented by an efficient patch matching technique, which is useful for our algorithm since it does not require motion smoothness assumption. Also, we present a hierarchical approach, where a small set of key frames are tracked first and non-key frames are handled by local key frames. Our tracking algorithm is conceptually well-suited for the sequences with abrupt motion, shot changes, and occlusion. We compare our tracking algorithm with existing techniques in real videos with such challenges and illustrate its superior performance qualitatively and quantitatively.
5 0.69977331 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
Author: Stefan Duffner, Christophe Garcia
Abstract: In this paper, we present a novel algorithm for fast tracking of generic objects in videos. The algorithm uses two components: a detector that makes use of the generalised Hough transform with pixel-based descriptors, and a probabilistic segmentation method based on global models for foreground and background. These components are used for tracking in a combined way, and they adapt each other in a co-training manner. Through effective model adaptation and segmentation, the algorithm is able to track objects that undergo rigid and non-rigid deformations and considerable shape and appearance variations. The proposed tracking method has been thoroughly evaluated on challenging standard videos, and outperforms state-of-the-art tracking methods designed for the same task. Finally, the proposed models allow for an extremely efficient implementation, and thus tracking is very fast.
6 0.67292601 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
7 0.66969204 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking
8 0.65927005 395 iccv-2013-Slice Sampling Particle Belief Propagation
9 0.65726393 57 iccv-2013-BOLD Features to Detect Texture-less Objects
10 0.64501548 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition
11 0.63597381 87 iccv-2013-Conservation Tracking
12 0.62697548 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking
13 0.61597073 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
14 0.61335135 128 iccv-2013-Dynamic Probabilistic Volumetric Models
15 0.60766065 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
16 0.59615165 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
17 0.58173704 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
18 0.57668549 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
19 0.57267338 270 iccv-2013-Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking
20 0.57122475 217 iccv-2013-Initialization-Insensitive Visual Tracking through Voting with Salient Local Features
topicId topicWeight
[(2, 0.05), (6, 0.012), (7, 0.022), (26, 0.075), (31, 0.044), (35, 0.022), (40, 0.03), (42, 0.096), (48, 0.268), (64, 0.093), (73, 0.034), (89, 0.142)]
simIndex simValue paperId paperTitle
1 0.88993794 331 iccv-2013-Pyramid Coding for Functional Scene Element Recognition in Video Scenes
Author: Eran Swears, Anthony Hoogs, Kim Boyer
Abstract: Recognizing functional scene elements in video scenes based on the behaviors of moving objects that interact with them is an emerging problem of interest. Existing approaches have a limited ability to characterize elements such as cross-walks, intersections, and buildings that have low activity, are multi-modal, or have indirect evidence. Our approach recognizes the low activity and multi-modal elements (crosswalks/intersections) by introducing a hierarchy of descriptive clusters to form a pyramid of codebooks that is sparse in the number of clusters and dense in content. The incorporation of local behavioral context such as person-enter-building and vehicle-parking nearby enables the detection of elements that do not have direct motion-based evidence, e.g. buildings. These two contributions significantly improve scene element recognition when compared against three state-of-the-art approaches. Results are shown on typical ground level surveillance video and for the first time on the more complex Wide Area Motion Imagery.
2 0.87553394 311 iccv-2013-Pedestrian Parsing via Deep Decompositional Network
Author: Ping Luo, Xiaogang Wang, Xiaoou Tang
Abstract: We propose a new Deep Decompositional Network (DDN) for parsing pedestrian images into semantic regions, such as hair, head, body, arms, and legs, where the pedestrians can be heavily occluded. Unlike existing methods based on template matching or Bayesian inference, our approach directly maps low-level visual features to the label maps of body parts with DDN, which is able to accurately estimate complex pose variations with good robustness to occlusions and background clutters. DDN jointly estimates occluded regions and segments body parts by stacking three types of hidden layers: occlusion estimation layers, completion layers, and decomposition layers. The occlusion estimation layers estimate a binary mask, indicating which part of a pedestrian is invisible. The completion layers synthesize low-level features of the invisible part from the original features and the occlusion mask. The decomposition layers directly transform the synthesized visual features to label maps. We devise a new strategy to pre-train these hidden layers, and then fine-tune the entire network using stochastic gradient descent. Experimental results show that our approach achieves better segmentation accuracy than the state-of-the-art methods on pedestrian images with or without occlusions. Another important contribution of this paper is that it provides a large scale benchmark human parsing dataset that includes 3,673 annotated samples collected from 171 surveillance videos. It is 20 times larger than existing public datasets.
3 0.84730792 63 iccv-2013-Bounded Labeling Function for Global Segmentation of Multi-part Objects with Geometric Constraints
Author: Masoud S. Nosrati, Shawn Andrews, Ghassan Hamarneh
Abstract: The inclusion of shape and appearance priors has proven useful for obtaining more accurate and plausible segmentations, especially for complex objects with multiple parts. In this paper, we augment the popular Mumford-Shah model to incorporate two important geometrical constraints, termed containment and detachment, between different regions with a specified minimum distance between their boundaries. Our method is able to handle multiple instances of multi-part objects defined by these geometrical constraints using a single labeling function while maintaining global optimality. [Figure 1: The inside vs. outside ambiguity in (a) is resolved by our containment constraint in (b).] We demonstrate the utility and advantages of these two constraints and show that the proposed convex continuous method is superior to other state-of-the-art methods, including its discrete counterpart, in terms of memory usage, and metrication errors.
same-paper 4 0.82215756 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
Author: Daniel Wesierski, Patrick Horain
Abstract: Elongated objects have various shapes and can shift, rotate, change scale, and be rigid or deform by flexing, articulating, and vibrating, with examples as varied as a glass bottle, a robotic arm, a surgical suture, a finger pair, a tram, and a guitar string. This generally makes tracking of poses of elongated objects very challenging. We describe a unified, configurable framework for tracking the pose of elongated objects, which move in the image plane and extend over the image region. Our method strives for simplicity, versatility, and efficiency. The object is decomposed into a chained assembly of segments of multiple parts that are arranged under a hierarchy of tailored spatio-temporal constraints. In this hierarchy, segments can rescale independently while their elasticity is controlled with global orientations and local distances. While the trend in tracking is to design complex, structure-free algorithms that update object appearance on-line, we show that our tracker, with the novel but remarkably simple, structured organization of parts with constant appearance, reaches or improves state-of-the-art performance. Most importantly, our model can be easily configured to track the exact pose of arbitrary, elongated objects in the image plane. The tracker can run up to 100 fps on a desktop PC, yet the computation time scales linearly with the number of object parts. To our knowledge, this is the first approach to generic tracking of elongated objects.
5 0.78281033 354 iccv-2013-Robust Dictionary Learning by Error Source Decomposition
Author: Zhuoyuan Chen, Ying Wu
Abstract: Sparsity models have recently shown great promise in many vision tasks. Using a learned dictionary in sparsity models can in general outperform predefined bases in clean data. In practice, both training and testing data may be corrupted and contain noises and outliers. Although recent studies attempted to cope with corrupted data and achieved encouraging results in the testing phase, how to handle corruption in the training phase still remains a very difficult problem. In contrast to most existing methods that learn the dictionary from clean data, this paper is targeted at handling corruptions and outliers in training data for dictionary learning. We propose a general method to decompose the reconstructive residual into two components: a non-sparse component for small universal noises and a sparse component for large outliers, respectively. In addition, further analysis reveals the connection between our approach and the “partial” dictionary learning approach, updating only part of the prototypes (or informative codewords) with the remaining (or noisy codewords) fixed. Experiments on synthetic data as well as real applications have shown satisfactory performance of this new robust dictionary learning approach.
6 0.76648158 207 iccv-2013-Illuminant Chromaticity from Image Sequences
7 0.69342887 220 iccv-2013-Joint Deep Learning for Pedestrian Detection
8 0.66949844 279 iccv-2013-Multi-stage Contextual Deep Learning for Pedestrian Detection
9 0.6676631 7 iccv-2013-A Deep Sum-Product Architecture for Robust Facial Attributes Analysis
10 0.65495729 206 iccv-2013-Hybrid Deep Learning for Face Verification
11 0.64741635 106 iccv-2013-Deep Learning Identity-Preserving Face Space
12 0.63910186 215 iccv-2013-Incorporating Cloud Distribution in Sky Representation
13 0.63866937 208 iccv-2013-Image Co-segmentation via Consistent Functional Maps
14 0.63326633 270 iccv-2013-Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking
15 0.63088465 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
16 0.62871337 351 iccv-2013-Restoring an Image Taken through a Window Covered with Dirt or Rain
17 0.62752134 61 iccv-2013-Beyond Hard Negative Mining: Efficient Detector Learning via Block-Circulant Decomposition
18 0.6271472 5 iccv-2013-A Color Constancy Model with Double-Opponency Mechanisms
19 0.62631142 151 iccv-2013-Exploiting Reflection Change for Automatic Reflection Removal
20 0.62567699 312 iccv-2013-Perceptual Fidelity Aware Mean Squared Error