cvpr cvpr2013 cvpr2013-439 knowledge-graph by maker-knowledge-mining
Source: pdf
Abstract: The human body is structurally symmetric. Tracking by detection approaches for human pose suffer from double counting, where the same image evidence is used to explain two separate but symmetric parts, such as the left and right feet. Double counting, if left unaddressed can critically affect subsequent processes, such as action recognition, affordance estimation, and pose reconstruction. In this work, we present an occlusion aware algorithm for tracking human pose in an image sequence, that addresses the problem of double counting. Our key insight is that tracking human pose can be cast as a multi-target tracking problem where the ”targets ” are related by an underlying articulated structure. The human body is modeled as a combination of singleton parts (such as the head and neck) and symmetric pairs of parts (such as the shoulders, knees, and feet). Symmetric body parts are jointly tracked with mutual exclusion constraints to prevent double counting by reasoning about occlusion. We evaluate our algorithm on an outdoor dataset with natural background clutter, a standard indoor dataset (HumanEva-I), and compare against a state of the art pose estimation algorithm.
Reference: text
sentIndex sentText sentNum sentScore
1 Tracking by detection approaches for human pose suffer from double counting, where the same image evidence is used to explain two separate but symmetric parts, such as the left and right feet. [sent-4, score-0.828]
2 Double counting, if left unaddressed, can critically affect subsequent processes, such as action recognition, affordance estimation, and pose reconstruction. [sent-5, score-0.337]
3 In this work, we present an occlusion-aware algorithm for tracking human pose in an image sequence that addresses the problem of double counting. [sent-6, score-0.959]
4 Our key insight is that tracking human pose can be cast as a multi-target tracking problem where the "targets" are related by an underlying articulated structure. [sent-7, score-0.844]
5 The human body is modeled as a combination of singleton parts (such as the head and neck) and symmetric pairs of parts (such as the shoulders, knees, and feet). [sent-8, score-0.847]
6 Symmetric body parts are jointly tracked with mutual exclusion constraints to prevent double counting by reasoning about occlusion. [sent-9, score-1.19]
7 We evaluate our algorithm on an outdoor dataset with natural background clutter and a standard indoor dataset (HumanEva-I), and compare against a state-of-the-art pose estimation algorithm. [sent-10, score-0.271]
8 Representing occlusion is particularly important in estimating human motion because, as the human body is an articulated structure, different parts occlude each other frequently. [sent-13, score-0.712]
9 The human body is structurally symmetric and parts tend to be occluded by their symmetric counterparts, such as left knees by right knees (Figure 1). [sent-14, score-1.045]
10 During occlusions, the appearance symmetry of the human body can cause double counting: the same image evidence is used to explain the location of both symmetric parts. [sent-16, score-0.777]
11 If left unaddressed, double counting can critically affect subsequent processes, such as action recognition [37], affordance estimation [9], and pose reconstruction [24]. [sent-17, score-0.79]
12 Symmetric parts tend to cause double counting errors (a) in tree-structured models because they have similar appearance models as shown for a set of parts in (c). [sent-19, score-0.806]
13 Our method reasons about occlusions and tracks symmetric parts jointly, and thereby reduces double counting errors shown in (b). [sent-20, score-0.973]
14 Double counting occurs when symmetric part pairs have high detection scores at the same locations in the image (Figure 2). [sent-22, score-0.545]
15 This happens in two cases: (1) when image cues for one part of a symmetric pair dominate the other, and (2) in occlusion scenarios, in which the image only contains evidence for one part, such as profile views of a person. [sent-23, score-0.461]
16 Thus, dealing with double counting requires a representation for occlusion, as well as relationships between symmetric parts that enforce mutual exclusion. [sent-24, score-1.061]
17 It has been noted that even in the human visual system [28], temporal motion continuity aids occlusion reasoning. [sent-28, score-0.443]
18 The max marginals for symmetric parts (left and right knees) score highly on the same locations in the image because of the similar appearance of symmetric parts. [sent-31, score-0.727]
19 If a system cannot reason about occlusion temporally, motion consistency will force it to struggle to find image evidence to support a smooth path when occlusion occurs. [sent-33, score-0.398]
20 In this work, we argue that temporal reasoning about occlusion is essential to tracking human pose and handling double counting. [sent-35, score-1.073]
21 We divide the body into a set of singleton parts and pairs of symmetric parts. [sent-36, score-0.54]
22 Our key insight is that tracking human pose can be cast as a multi-target tracking problem where the "targets" are related by an underlying articulated structure. [sent-37, score-0.844]
23 Our contributions are: (1) an occlusion-aware model for tracking human pose that enforces both spatial and temporal consistency; (2) a method for jointly tracking symmetric parts that is inspired by optimal formulations for multi-target tracking. [sent-38, score-1.172]
24 We evaluate our method on an outdoor pose dataset and report results on two standard datasets. [sent-39, score-0.271]
25 We outperform a state-of-the-art baseline [23] and demonstrate a marked reduction in double counting errors. [sent-40, score-0.5]
26 Relevant Work There exists a large body of work that tackles the problem of human pose estimation. [sent-42, score-0.351]
27 Early methods [29, 22] used model-based representations to track human pose in video sequences; however, these methods usually required good initializations and strong dynamic priors such as [18]. [sent-43, score-0.4]
28 There is also a large body of work that looks at the problem of directly estimating 3D human pose from video sequences. [sent-45, score-0.351]
29 These methods, while attractive for reasoning about occlusion in 3D, tend to require strong priors due to the larger set of possible configurations in 3D and do not generalize to arbitrary actions easily. [sent-46, score-0.259]
30 However, they struggle on images where the subject is undergoing self-occlusion and suffer from double counting of image evidence. [sent-51, score-0.535]
31 These models augment the tree-structure to capture occlusion relationships between parts not connected in the tree. [sent-53, score-0.288]
32 For the single image case, the work by Jiang [15, 13] enforces exclusion constraints by decoding trellis graphs for each part with constraints between the graphs to enforce mutual exclusion. [sent-56, score-0.498]
33 In the video domain, [26] found a frame with an easily detectable canonical pose to build up an appearance model of the person that can be used to aid tracking in the rest of the frames. [sent-58, score-0.471]
34 [23] generated multiple diverse high-scoring pose proposals from a tree-structured model and used a chain CRF to track the pose through the sequence. [sent-63, score-0.613]
35 Recent approaches have also looked to track extremities of multiple interacting people using a branch and bound framework on AND-OR graphs [21] and quadratic binary programming [34]. [sent-64, score-0.239]
36 We cast the problem of tracking human pose as a multi-target tracking problem where the "targets" are related by an articulated skeleton. [sent-65, score-0.844]
37 Our formulation for the simultaneous tracking of symmetric parts draws inspiration from recent advances in the area of multiple target tracking. [sent-66, score-0.605]
38 Tracking Human Pose The (u, v) location of a part p in a frame at time instant f is denoted by x_p^f. We denote by x_p the collection of these locations across all frames, i.e. the track of part p. [sent-70, score-0.283]
39 We use a tree-structured deformable parts model in each frame to generate proposals for each part. [sent-75, score-0.313]
40 In the first iteration, we track the head node using an LP tracking formulation. [sent-76, score-0.426]
41 Proposals for the next symmetric pair in the tree are generated by conditioning each tree on the tracked locations computed in the previous iteration. [sent-77, score-0.446]
42 Symmetric parts are tracked simultaneously with mutual exclusion constraints. [sent-78, score-0.522]
43 The method proceeds by sequentially conditioning the tracking of parts on their parents until all the parts are tracked. [sent-79, score-0.591]
44 while parts remain do: in breadth-first fashion, select the next part(s) and compute max-marginals for the current part(s) conditioned on the tracked locations of parent parts. [sent-82, score-0.281]
45 if the selected part is symmetric then track the symmetric pair jointly using LP multi-target tracking. [sent-83, score-0.381]
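The control flow of this sequential procedure can be sketched in a few lines of Python. This is a hedged illustration only: the part names, the kinematic tree layout, and the three helper callables (max-marginal computation, singleton LP tracking, joint symmetric LP tracking) are hypothetical placeholders, not the authors' implementation.

```python
from collections import deque

def track_pose(frames, tree, symmetric_pairs,
               compute_max_marginals, track_singleton, track_symmetric_pair):
    """Sequential, breadth-first conditioning loop.

    frames: list of images; tree: dict part -> list of child parts;
    symmetric_pairs: list of (left, right) part-name tuples; the last three
    arguments are callables standing in for the paper's components.
    """
    tracks = {}                        # part -> list of (u, v) or None per frame
    queue, visited = deque(["head"]), set()
    while queue:
        part = queue.popleft()
        if part in visited:
            continue
        pair = next((p for p in symmetric_pairs if part in p), None)
        parts = list(pair) if pair else [part]
        visited.update(parts)
        # Max-marginal proposals, conditioned on already-tracked parent locations.
        proposals = {p: compute_max_marginals(frames, p, tracks) for p in parts}
        if pair:
            # Symmetric parts are tracked jointly, with mutual exclusion constraints.
            tracks[pair[0]], tracks[pair[1]] = track_symmetric_pair(
                proposals[pair[0]], proposals[pair[1]])
        else:
            tracks[part] = track_singleton(proposals[part])
        for p in parts:
            queue.extend(tree.get(p, []))   # breadth-first over the kinematic tree
    return tracks
```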
46 A symmetric part pair is a pair of parts (p, q) that share the same appearance. [sent-89, score-0.435]
47 The goal of human pose tracking is to estimate the location of each part of the person in every frame of the image sequence. [sent-90, score-0.683]
48 Given a set of proposals for the location of the head in each frame (Section 3. [sent-99, score-0.328]
49 Tracking a Singleton Part Given a set of proposals denoted by X_p^f for part p in the image at each frame f, we first augment the proposal sets with an occlusion state o_p^f for each frame. [sent-106, score-0.332]
50 We form tracklets p^f_{ijk} for each part by combining triplets (x_p^{f-1,i}, x_p^{f,j}, x_p^{f+1,k}), where x_p^{f,i} ∈ X_p^f is a proposal at location i in the image or an occlusion state o_p^f. [sent-107, score-0.546]
51 We denote by x^f_{p,ijk} the indicator variable associated with tracklet p^f_{ijk}; it takes values in {0, 1} corresponding to the tracklet being selected or not. [sent-108, score-0.302]
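A minimal sketch of this tracklet construction is shown below: each frame's proposal set is augmented with an occlusion state, and every triplet of indices into frames f-1, f, f+1 becomes a tracklet carrying one binary selection indicator. The data layout is an assumption for illustration, not the authors' code.

```python
from itertools import product

def build_tracklets(proposals):
    """proposals[f]: list of candidate (u, v) locations for one part in frame f.

    Each frame's set is augmented with an occlusion state (location None).
    A tracklet is the triplet of indices (i, j, k) into frames f-1, f, f+1;
    the 'selected' field plays the role of the binary indicator variable.
    """
    augmented = [list(p) + [None] for p in proposals]   # None = occlusion state o_p^f
    tracklets = []
    for f in range(1, len(augmented) - 1):
        sizes = (len(augmented[f - 1]), len(augmented[f]), len(augmented[f + 1]))
        for i, j, k in product(*(range(n) for n in sizes)):
            tracklets.append({"frame": f, "ijk": (i, j, k), "selected": 0})
    return augmented, tracklets

# Three frames with two detections each -> (2+1)^3 = 27 tracklets at frame 1.
aug, trks = build_tracklets([[(10, 20), (12, 22)],
                             [(11, 21), (40, 60)],
                             [(12, 22), (41, 61)]])
print(len(trks))   # 27
```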
52 1), we generate proposals and track the next set of nodes. Red dots denote detections for each of the parts separately in each frame. [sent-127, score-0.426]
53 The gray nodes denote occlusion nodes for each frame. [sent-128, score-0.265]
54 The dotted lines depict mutual exclusion constraints between certain sets of nodes. [sent-129, score-0.345]
55 The symmetric tracking problem is to find the best scoring path in each of these graphs subject to the mutual-exclusion constraints. [sent-130, score-0.552]
56 Next, for a symmetric pair of parts whose tracks are given by (x3, x4), we simultaneously estimate the optimal tracks (see Section 3. [sent-138, score-0.477]
57 Tracking is conditioned on the optimal parent track by fixing the location of the parent in each of the frames to the tracked locations and re-running dynamic programming inference in each of the trees in each frame (Section 3. [sent-143, score-0.694]
58 We proceed in this manner, by conditioning the tracking of the child nodes on the optimal tracks of their parents and by tracking symmetric parts using a joint formulation, until all the parts have been tracked. [sent-145, score-1.167]
59 Tracking a Pair of Symmetric Parts Our approach treats the problem of tracking symmetric pairs of parts as a multi-target tracking problem. [sent-148, score-0.829]
60 We enforce mutual exclusion constraints that prevent the symmetric parts from occupying the same location in the image. [sent-170, score-0.843]
61 In a typical self-occlusion scenario, the score of a particular location in the image will be high for both the symmetric parts. [sent-171, score-0.371]
62 In such a case the mutual-exclusion constraints enforce that only one part can occupy the location, while the symmetric counterpart is either pushed to an occlusion node or to another location in the image that is consistent with the constraints and has a high score. [sent-172, score-0.67]
63 We enforce these constraints by limiting the total flow at nodes in both networks that share the same location in the image. [sent-173, score-0.253]
64 This formulation corresponds to maximizing the flow through two separate networks that interact via the mutual exclusion constraints. [sent-174, score-0.335]
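A heavily simplified sketch of how the mutual exclusion constraints can enter a linear program is given below. It deliberately drops the temporal/flow structure and the real tracklet scores: it treats a single frame in which each of the two symmetric parts picks one candidate (the last candidate can stand for the occlusion state), and any shared image location may be used by at most one part. The candidate scores and the example are illustrative assumptions, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import linprog

def select_with_exclusion(scores_left, scores_right, shared):
    """Pick one candidate per symmetric part in a single frame.

    scores_*: candidate scores (the last entry may be the occlusion state);
    shared: list of (i, j) index pairs whose candidates occupy the same image
    location, giving the constraints x_left[i] + x_right[j] <= 1.
    """
    nl, nr = len(scores_left), len(scores_right)
    c = -np.concatenate([scores_left, scores_right])     # linprog minimizes
    A_eq = np.zeros((2, nl + nr))
    A_eq[0, :nl] = 1.0                                    # left picks exactly one
    A_eq[1, nl:] = 1.0                                    # right picks exactly one
    A_ub, b_ub = None, None
    if shared:
        A_ub = np.zeros((len(shared), nl + nr))
        for r, (i, j) in enumerate(shared):
            A_ub[r, i] = 1.0
            A_ub[r, nl + j] = 1.0                         # mutual exclusion row
        b_ub = np.ones(len(shared))
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=np.ones(2),
                  bounds=[(0, 1)] * (nl + nr))
    return int(np.argmax(res.x[:nl])), int(np.argmax(res.x[nl:]))

# Both parts score highest at candidate 0, but they share that location, so the
# weaker part is pushed to its next-best candidate instead of double counting.
print(select_with_exclusion(np.array([0.9, 0.2, 0.1]),
                            np.array([0.8, 0.5, 0.1]),
                            shared=[(0, 0)]))             # -> (0, 1)
```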
65 Occlusion Interpolation Once a solution is obtained, the location of the occluded part is estimated by interpolating between the image locations of the nodes preceding and following the occlusion using cubic B-spline interpolation. [sent-180, score-0.339]
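The interpolation step admits a direct sketch using SciPy's B-spline routines; the per-frame track layout (a location tuple per visible frame, None for frames assigned to the occlusion node) is an assumed representation, not the authors' data structure.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

def interpolate_occlusions(track):
    """track: list of (u, v) tuples, or None for occluded frames. Occluded
    frames are filled in by a cubic B-spline fitted to the visible frames."""
    frames = np.arange(len(track))
    visible = np.array([t is not None for t in track])
    pts = np.array([t for t in track if t is not None], dtype=float)
    k = min(3, len(pts) - 1)              # degrade gracefully with few visible frames
    spline = make_interp_spline(frames[visible], pts, k=k)
    return [t if t is not None else tuple(spline(f)) for f, t in enumerate(track)]

track = [(10.0, 20.0), (12.0, 21.0), None, None, (20.0, 25.0), (22.0, 26.0)]
print(interpolate_occlusions(track))      # frames 2 and 3 are interpolated
```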
66 Generating Part Proposals via Max-Marginals Human pose in a frame at each time instant is modeled with a tree-structured deformable part model as in recent work by [36]. [sent-183, score-0.35]
67 (b) Proposal sets are augmented by tracking each proposal forwards and backwards to ensure smooth tracks. [sent-192, score-0.378]
68 (c) Foreground likelihood used to score tracklets. (d) The detection likelihood for the head part. [sent-193, score-0.309]
69 To generate proposals for part locations in each frame, we compute the max-marginal of the above scoring function at each part. [sent-196, score-0.292]
70 The max-marginal for part i in frame f is given by: μ*(x_i^f = s) = max_{x^f : x_i^f = s} S(x^f), (8) which is the maximum of the scoring function with the part i clamped to location s. [sent-197, score-0.346]
71 We perform non-maxima suppression on the max-marginal score map for each part to generate a set of location proposals in each frame. [sent-201, score-0.296]
72 We expand the proposal set by tracking each proposal forwards and backwards using a Lucas-Kanade template tracker [6] to obtain extended proposal sets Xit. [sent-202, score-0.51]
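A sketch of this proposal-generation step is given below: greedy non-maximum suppression over a max-marginal score map, followed by Lucas-Kanade tracking (here via OpenCV's calcOpticalFlowPyrLK) to push proposals into the adjacent frame and enlarge the proposal set. The number of proposals, suppression radius, and 8-bit grayscale inputs are illustrative assumptions rather than values from the paper.

```python
import numpy as np
import cv2

def nms_proposals(score_map, num=5, radius=8):
    """Greedy NMS on a 2-D max-marginal score map; returns (u, v) = (col, row)."""
    score = score_map.astype(float).copy()
    proposals = []
    for _ in range(num):
        r, c = np.unravel_index(np.argmax(score), score.shape)
        if not np.isfinite(score[r, c]):
            break                                          # no peaks left
        proposals.append((int(c), int(r)))
        score[max(0, r - radius):r + radius + 1,
              max(0, c - radius):c + radius + 1] = -np.inf  # suppress the neighbourhood
    return proposals

def expand_with_lk(gray_prev, gray_next, proposals):
    """Track proposals into the adjacent frame with pyramidal Lucas-Kanade
    (expects 8-bit grayscale images)."""
    pts = np.array(proposals, dtype=np.float32).reshape(-1, 1, 2)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(gray_prev, gray_next, pts, None)
    return [tuple(map(float, p.ravel())) for p, ok in zip(nxt, status.ravel()) if ok]
```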
73 Once a parent part has been tracked, the max-marginals for the child nodes are recomputed by conditioning on the tracked locations of the parent nodes. [sent-204, score-0.47]
74 The conditioned max-marginals for part i in frame f with a set of parent nodes pa(i) with tracked locations x*_{pa(i)} can be written as: μ*(x_i^f = s) = max_{x^f : x_i^f = s, x^f_{pa(i)} = x*_{pa(i)}} S(x^f). [sent-205, score-0.485]
75 Scoring Part Tracklets Each tracklet is assigned a likelihood score u^f_{ijk} that consists of terms that measure the detection likelihood, the foreground likelihood, and a motion prior, combined with weights α. [sent-209, score-0.376]
76 We normalize the max-marginal score and obtain a likelihood of detection l_det(x_p^{f,i}) of part p at location i. [sent-213, score-0.229]
77 For occlusion nodes we assign a constant score for the occlusion node, l_det(o_p^f) ∝ p^o_det. [sent-216, score-0.387]
78 We denote the two motion vectors as v_ij = x_i^{f-1} − x_j^f and v_jk = x_j^f − x_k^{f+1}; our motion score is then a function of these two velocities. [sent-222, score-0.423]
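The exact functional forms of the tracklet score are not fully recoverable from the extracted text, so the sketch below uses plausible stand-ins: a softmax-style normalization of the max-marginal for the detection likelihood, a constant likelihood for the occlusion node, and a constant-velocity motion term built from the two velocity vectors v_ij and v_jk defined above. All constants and weights are assumptions.

```python
import numpy as np

P_OCC = 0.1   # assumed constant detection likelihood assigned to occlusion nodes

def detection_likelihood(score, frame_scores):
    """Normalize one proposal's max-marginal score against all proposals in its frame."""
    s = np.asarray(frame_scores, dtype=float)
    e = np.exp(s - s.max())                                 # softmax-style normalization
    return float(np.exp(score - s.max()) / e.sum())

def motion_score(x_prev, x_cur, x_next, sigma=10.0):
    """Prefer constant velocity across the tracklet triplet."""
    v_ij = np.subtract(x_prev, x_cur)                       # v_ij = x^{f-1}_i - x^f_j
    v_jk = np.subtract(x_cur, x_next)                       # v_jk = x^f_j - x^{f+1}_k
    return float(np.exp(-np.sum((v_ij - v_jk) ** 2) / (2.0 * sigma ** 2)))

def tracklet_score(l_det, l_fore, s_mot, a_det=1.0, a_fore=1.0, a_mot=1.0):
    """Weighted combination of the detection, foreground, and motion terms."""
    return a_det * l_det + a_fore * l_fore + a_mot * s_mot

# A straight-line triplet scores higher than a zig-zag one under the motion term.
print(motion_score((0, 0), (5, 0), (10, 0)), motion_score((0, 0), (5, 0), (0, 0)))
```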
79 Table 1: PCP scores and keypoint localization error for the six sequences of the outdoor pose dataset. [sent-270, score-0.397]
80 In order to test the tracking method, we model human pose with the state-of-the-art tree-structured CRF model of [36]. [sent-273, score-0.545]
81 We model the human body with 26 parts as in [36]: 2 singleton parts for the head and neck and a total of 12 symmetric pairs of parts for the shoulders, torso, legs, and upper arms. [sent-275, score-0.978]
82 As our baseline, we compare against the method of [23], which also uses a detector for pose in each frame [36] trained on the same training data. [sent-277, score-0.247]
83 The n-Best pose configurations are generated for each frame and tracking is performed by modeling pose tracking with a chain CRF and performing Viterbi-decoding-like inference. [sent-278, score-0.918]
84 HumanEva-I: We evaluate our method on a standardized dataset that comprises sequences of actors performing different actions in an indoor motion capture environment. [sent-283, score-0.238]
85 As our method (and most 2D pose estimation methods) cannot distinguish between left and right limbs, we report the score of the higher-scoring assignment. [sent-298, score-0.367]
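The left/right-agnostic evaluation can be sketched directly: PCP is computed once with the predicted left/right assignment and once with the symmetric parts swapped, and the higher of the two scores is reported. The 0.5 threshold is the usual PCP setting; the limb and keypoint layout here is a hypothetical example, not the dataset's actual part definition.

```python
import numpy as np

def limb_correct(pred_a, pred_b, gt_a, gt_b, alpha=0.5):
    """A limb counts as correct if both predicted endpoints lie within alpha
    times the ground-truth limb length of their respective ground-truth endpoints."""
    length = np.linalg.norm(np.subtract(gt_a, gt_b))
    return (np.linalg.norm(np.subtract(pred_a, gt_a)) <= alpha * length and
            np.linalg.norm(np.subtract(pred_b, gt_b)) <= alpha * length)

def pcp(pred, gt, limbs):
    """pred, gt: dict part -> (u, v); limbs: list of (part_a, part_b) pairs."""
    hits = [limb_correct(pred[a], pred[b], gt[a], gt[b]) for a, b in limbs]
    return float(np.mean(hits))

def pcp_side_agnostic(pred, gt, limbs, swaps):
    """swaps: dict mapping each left part to its right counterpart and back."""
    swapped = {swaps.get(k, k): v for k, v in pred.items()}
    return max(pcp(pred, gt, limbs), pcp(swapped, gt, limbs))
```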
86 Our method improves over the baseline on the outdoor pose dataset, as reported in Table 1. [sent-335, score-0.271]
87 The main improvements are in the tracking of the lower limbs, which are especially susceptible to double counting errors. [sent-336, score-0.77]
88 Our method reduces the double counting artifacts and enforces temporal smoothness for each part, resulting in smoother and more accurate tracks. [sent-337, score-0.617]
89 Double counting errors We observe a significant decrease in the number of double counting errors with our method compared to the baseline (Figure 9). [sent-341, score-0.79]
90 On the outdoor pose dataset we reduce the number of double counting errors substantially, by around 75%, while we observe a decrease of approximately 41% on the HumanEva-I sequences. [sent-342, score-0.815]
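One plausible way to count double counting events is sketched below; the exact criterion is not spelled out in the text recovered here, so this is an assumption: a frame is flagged when the predicted locations of a symmetric pair fall within a small pixel distance of each other, i.e. both parts are explained by the same image evidence. The threshold value is illustrative.

```python
import numpy as np

def count_double_counting(tracks, symmetric_pairs, thresh=5.0):
    """tracks: dict part -> list of (u, v) or None per frame;
    symmetric_pairs: list of (left, right) part-name tuples."""
    events = 0
    for left, right in symmetric_pairs:
        for pl, pr in zip(tracks[left], tracks[right]):
            if pl is not None and pr is not None and \
               np.linalg.norm(np.subtract(pl, pr)) < thresh:
                events += 1     # both parts collapsed onto the same image evidence
    return events
```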
91 We reduce double counting errors by reasoning about occlusion and enforcing mutual exclusion constraints. [sent-347, score-1.051]
92 Figure 9: Reduction in double counting, comparing nBest+DP with Symmetric Tracking. [sent-386, score-0.298]
93 We achieve a reduction in double counting errors on both our evaluation datasets due to better occlusion reasoning and mutual exclusion constraints. [sent-387, score-1.051]
94 Clustered pose and nonlinear appearance models for human pose estimation. [sent-475, score-0.466]
95 Beyond trees: Common-factor models for 2d human pose recovery. [sent-490, score-0.28]
96 An efficient branchand-bound algorithm for optimal human pose estimation. [sent-567, score-0.28]
97 Hand tracking by binary quadratic programming and its application to retail activity recognition. [sent-574, score-0.267]
98 Multiple tree models for occlusion and spatial constraints in human pose estimation. [sent-579, score-0.483]
99 We show frames of symmetric tracking of human pose in comparison to the baseline [ ] on the outdoor pose dataset. [sent-593, score-1.067]
100 Note that our method reduces double counting errors, especially in frames where the person is entering a profile view with mutual occlusion. [sent-594, score-0.713]
wordName wordTfidf (topN-words)
[('double', 0.298), ('symmetric', 0.25), ('tracking', 0.224), ('dpn', 0.214), ('counting', 0.202), ('pose', 0.186), ('exclusion', 0.172), ('occlusion', 0.157), ('pcp', 0.147), ('ecarktgni', 0.134), ('itcmrmys', 0.134), ('xifjk', 0.134), ('xjf', 0.134), ('parts', 0.131), ('mutual', 0.127), ('tracklet', 0.124), ('proposals', 0.121), ('track', 0.12), ('ldet', 0.119), ('articulated', 0.116), ('ifjk', 0.107), ('xpf', 0.107), ('knees', 0.104), ('kle', 0.104), ('human', 0.094), ('tracked', 0.092), ('singleton', 0.088), ('xif', 0.088), ('outdoor', 0.085), ('parent', 0.083), ('head', 0.082), ('trackings', 0.08), ('ymmetric', 0.08), ('continuity', 0.08), ('scoring', 0.078), ('body', 0.071), ('keypoint', 0.07), ('conditioned', 0.067), ('proposal', 0.066), ('conditioning', 0.065), ('location', 0.064), ('temporal', 0.063), ('tracklets', 0.062), ('pb', 0.061), ('frame', 0.061), ('affordance', 0.057), ('score', 0.057), ('sequences', 0.056), ('xp', 0.055), ('argmax', 0.055), ('likelihood', 0.054), ('part', 0.054), ('nodes', 0.054), ('ipxifjk', 0.054), ('ixfp', 0.054), ('ixpf', 0.054), ('jog', 0.054), ('kxpf', 0.054), ('lpxfj', 0.054), ('ptijk', 0.054), ('pxifjk', 0.054), ('smot', 0.054), ('uifjk', 0.054), ('xfj', 0.054), ('enforce', 0.053), ('reasoning', 0.051), ('actions', 0.051), ('lp', 0.05), ('crf', 0.05), ('qualitative', 0.049), ('motion', 0.049), ('instant', 0.049), ('tracks', 0.048), ('gibson', 0.047), ('unaddressed', 0.047), ('forwards', 0.047), ('yaser', 0.047), ('action', 0.047), ('limbs', 0.046), ('constraints', 0.046), ('actors', 0.045), ('errors', 0.044), ('programming', 0.043), ('frames', 0.042), ('treestructured', 0.041), ('backwards', 0.041), ('structurally', 0.041), ('uniqueness', 0.041), ('parents', 0.04), ('baseball', 0.04), ('locations', 0.039), ('people', 0.038), ('foreground', 0.038), ('branch', 0.038), ('performing', 0.037), ('velocity', 0.036), ('flow', 0.036), ('struggle', 0.035), ('iin', 0.035), ('targets', 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999917 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
Author: Varun Ramakrishna, Takeo Kanade, Yaser Sheikh
Abstract: The human body is structurally symmetric. Tracking by detection approaches for human pose suffer from double counting, where the same image evidence is used to explain two separate but symmetric parts, such as the left and right feet. Double counting, if left unaddressed can critically affect subsequent processes, such as action recognition, affordance estimation, and pose reconstruction. In this work, we present an occlusion aware algorithm for tracking human pose in an image sequence, that addresses the problem of double counting. Our key insight is that tracking human pose can be cast as a multi-target tracking problem where the ”targets ” are related by an underlying articulated structure. The human body is modeled as a combination of singleton parts (such as the head and neck) and symmetric pairs of parts (such as the shoulders, knees, and feet). Symmetric body parts are jointly tracked with mutual exclusion constraints to prevent double counting by reasoning about occlusion. We evaluate our algorithm on an outdoor dataset with natural background clutter, a standard indoor dataset (HumanEva-I), and compare against a state of the art pose estimation algorithm.
2 0.23122753 121 cvpr-2013-Detection- and Trajectory-Level Exclusion in Multiple Object Tracking
Author: Anton Milan, Konrad Schindler, Stefan Roth
Abstract: When tracking multiple targets in crowded scenarios, modeling mutual exclusion between distinct targets becomes important at two levels: (1) in data association, each target observation should support at most one trajectory and each trajectory should be assigned at most one observation per frame; (2) in trajectory estimation, two trajectories should remain spatially separated at all times to avoid collisions. Yet, existing trackers often sidestep these important constraints. We address this using a mixed discrete-continuous conditional random field (CRF) that explicitly models both types of constraints: Exclusion between conflicting observations with supermodular pairwise terms, and exclusion between trajectories by generalizing global label costs to suppress the co-occurrence of incompatible labels (trajectories). We develop an expansion move-based MAP estimation scheme that handles both non-submodular constraints and pairwise global label costs. Furthermore, we perform a statistical analysis of ground-truth trajectories to derive appropriate CRF potentials for modeling data fidelity, target dynamics, and inter-target occlusion.
3 0.2030894 334 cvpr-2013-Pose from Flow and Flow from Pose
Author: Katerina Fragkiadaki, Han Hu, Jianbo Shi
Abstract: Human pose detectors, although successful in localising faces and torsos of people, often fail with lower arms. Motion estimation is often inaccurate under fast movements of body parts. We build a segmentation-detection algorithm that mediates the information between body parts recognition, and multi-frame motion grouping to improve both pose detection and tracking. Motion of body parts, though not accurate, is often sufficient to segment them from their backgrounds. Such segmentations are crucialfor extracting hard to detect body parts out of their interior body clutter. By matching these segments to exemplars we obtain pose labeled body segments. The pose labeled segments and corresponding articulated joints are used to improve the motion flow fields by proposing kinematically constrained affine displacements on body parts. The pose-based articulated motion model is shown to handle large limb rotations and displacements. Our algorithm can detect people under rare poses, frequently missed by pose detectors, showing the benefits of jointly reasoning about pose, segmentation and motion in videos.
4 0.19232148 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
Author: Edgar Simo-Serra, Ariadna Quattoni, Carme Torras, Francesc Moreno-Noguer
Abstract: We introduce a novel approach to automatically recover 3D human pose from a single image. Most previous work follows a pipelined approach: initially, a set of 2D features such as edges, joints or silhouettes are detected in the image, and then these observations are used to infer the 3D pose. Solving these two problems separately may lead to erroneous 3D poses when the feature detector has performed poorly. In this paper, we address this issue by jointly solving both the 2D detection and the 3D inference problems. For this purpose, we propose a Bayesian framework that integrates a generative model based on latent variables and discriminative 2D part detectors based on HOGs, and perform inference using evolutionary algorithms. Real experimentation demonstrates competitive results, and the ability of our methodology to provide accurate 2D and 3D pose estimations even when the 2D detectors are inaccurate.
5 0.18493204 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking
Author: unkown-author
Abstract: We address the problem of long-term object tracking, where the object may become occluded or leave-the-view. In this setting, we show that an accurate appearance model is considerably more effective than a strong motion model. We develop simple but effective algorithms that alternate between tracking and learning a good appearance model given a track. We show that it is crucial to learn from the “right” frames, and use the formalism of self-paced curriculum learning to automatically select such frames. We leverage techniques from object detection for learning accurate appearance-based templates, demonstrating the importance of using a large negative training set (typically not used for tracking). We describe both an offline algorithm (that processes frames in batch) and a linear-time online (i.e. causal) algorithm that approaches real-time performance. Our models significantly outperform prior art, reducing the average error on benchmark videos by a factor of 4.
6 0.17447287 335 cvpr-2013-Poselet Conditioned Pictorial Structures
9 0.16831356 311 cvpr-2013-Occlusion Patterns for Object Class Detection
10 0.16402167 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation
11 0.16082163 40 cvpr-2013-An Approach to Pose-Based Action Recognition
12 0.15996225 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
13 0.15740891 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
14 0.15511294 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
15 0.15298086 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories
16 0.15015319 300 cvpr-2013-Multi-target Tracking by Lagrangian Relaxation to Min-cost Network Flow
17 0.14956018 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
18 0.14605141 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
19 0.14238791 457 cvpr-2013-Visual Tracking via Locality Sensitive Histograms
20 0.14195475 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images
topicId topicWeight
[(0, 0.259), (1, -0.009), (2, 0.019), (3, -0.208), (4, -0.045), (5, -0.032), (6, 0.21), (7, -0.015), (8, 0.12), (9, 0.053), (10, -0.085), (11, 0.126), (12, -0.119), (13, 0.063), (14, 0.019), (15, 0.072), (16, 0.033), (17, -0.02), (18, -0.031), (19, -0.061), (20, 0.024), (21, 0.081), (22, -0.045), (23, 0.016), (24, -0.031), (25, -0.013), (26, 0.031), (27, -0.082), (28, -0.024), (29, -0.012), (30, 0.078), (31, -0.033), (32, 0.028), (33, 0.028), (34, 0.036), (35, -0.026), (36, 0.047), (37, 0.003), (38, 0.115), (39, -0.042), (40, -0.039), (41, -0.008), (42, 0.014), (43, 0.019), (44, -0.055), (45, 0.019), (46, -0.024), (47, -0.085), (48, -0.011), (49, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.97213167 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
Author: Varun Ramakrishna, Takeo Kanade, Yaser Sheikh
Abstract: The human body is structurally symmetric. Tracking by detection approaches for human pose suffer from double counting, where the same image evidence is used to explain two separate but symmetric parts, such as the left and right feet. Double counting, if left unaddressed can critically affect subsequent processes, such as action recognition, affordance estimation, and pose reconstruction. In this work, we present an occlusion aware algorithm for tracking human pose in an image sequence, that addresses the problem of double counting. Our key insight is that tracking human pose can be cast as a multi-target tracking problem where the ”targets ” are related by an underlying articulated structure. The human body is modeled as a combination of singleton parts (such as the head and neck) and symmetric pairs of parts (such as the shoulders, knees, and feet). Symmetric body parts are jointly tracked with mutual exclusion constraints to prevent double counting by reasoning about occlusion. We evaluate our algorithm on an outdoor dataset with natural background clutter, a standard indoor dataset (HumanEva-I), and compare against a state of the art pose estimation algorithm.
2 0.76466876 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
Author: Edgar Simo-Serra, Ariadna Quattoni, Carme Torras, Francesc Moreno-Noguer
Abstract: We introduce a novel approach to automatically recover 3D human pose from a single image. Most previous work follows a pipelined approach: initially, a set of 2D features such as edges, joints or silhouettes are detected in the image, and then these observations are used to infer the 3D pose. Solving these two problems separately may lead to erroneous 3D poses when the feature detector has performed poorly. In this paper, we address this issue by jointly solving both the 2D detection and the 3D inference problems. For this purpose, we propose a Bayesian framework that integrates a generative model based on latent variables and discriminative 2D part detectors based on HOGs, and perform inference using evolutionary algorithms. Real experimentation demonstrates competitive results, and the ability of our methodology to provide accurate 2D and 3D pose estimations even when the 2D detectors are inaccurate.
3 0.69388592 335 cvpr-2013-Poselet Conditioned Pictorial Structures
Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
Abstract: In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. Modelling such higher order part dependencies seemingly comes at a cost of more expensive inference, which resulted in their limited use in state-of-the-art methods. In this paper we propose a model that incorporates higher order part dependencies while remaining efficient. We achieve this by defining a conditional model in which all body parts are connected a-priori, but which becomes a tractable tree-structured pictorial structures model once the image observations are available. In order to derive a set of conditioning variables we rely on the poselet-based features that have been shown to be effective for people detection but have so far found limited application for articulated human pose estimation. We demonstrate the effectiveness of our approach on three publicly available pose estimation benchmarks improving or being on-par with state of the art in each case.
4 0.68698424 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
Author: Magnus Burenius, Josephine Sullivan, Stefan Carlsson
Abstract: We consider the problem of automatically estimating the 3D pose of humans from images, taken from multiple calibrated views. We show that it is possible and tractable to extend the pictorial structures framework, popular for 2D pose estimation, to 3D. We discuss how to use this framework to impose view, skeleton, joint angle and intersection constraints in 3D. The 3D pictorial structures are evaluated on multiple view data from a professional football game. The evaluation is focused on computational tractability, but we also demonstrate how a simple 2D part detector can be plugged into the framework.
5 0.67118359 334 cvpr-2013-Pose from Flow and Flow from Pose
Author: Katerina Fragkiadaki, Han Hu, Jianbo Shi
Abstract: Human pose detectors, although successful in localising faces and torsos of people, often fail with lower arms. Motion estimation is often inaccurate under fast movements of body parts. We build a segmentation-detection algorithm that mediates the information between body parts recognition, and multi-frame motion grouping to improve both pose detection and tracking. Motion of body parts, though not accurate, is often sufficient to segment them from their backgrounds. Such segmentations are crucialfor extracting hard to detect body parts out of their interior body clutter. By matching these segments to exemplars we obtain pose labeled body segments. The pose labeled segments and corresponding articulated joints are used to improve the motion flow fields by proposing kinematically constrained affine displacements on body parts. The pose-based articulated motion model is shown to handle large limb rotations and displacements. Our algorithm can detect people under rare poses, frequently missed by pose detectors, showing the benefits of jointly reasoning about pose, segmentation and motion in videos.
6 0.66413325 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
7 0.65707564 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
8 0.65658242 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
9 0.65639442 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation
10 0.65215331 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
11 0.64972633 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
12 0.64518416 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation
13 0.64387512 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
14 0.63530695 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
15 0.62311065 120 cvpr-2013-Detecting and Naming Actors in Movies Using Generative Appearance Models
16 0.62262338 40 cvpr-2013-An Approach to Pose-Based Action Recognition
17 0.62159491 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking
19 0.61952287 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories
20 0.61856061 440 cvpr-2013-Tracking People and Their Objects
topicId topicWeight
[(10, 0.095), (16, 0.019), (26, 0.074), (33, 0.281), (34, 0.232), (67, 0.088), (69, 0.038), (80, 0.028), (87, 0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.86548084 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
Author: Varun Ramakrishna, Takeo Kanade, Yaser Sheikh
Abstract: The human body is structurally symmetric. Tracking by detection approaches for human pose suffer from double counting, where the same image evidence is used to explain two separate but symmetric parts, such as the left and right feet. Double counting, if left unaddressed can critically affect subsequent processes, such as action recognition, affordance estimation, and pose reconstruction. In this work, we present an occlusion aware algorithm for tracking human pose in an image sequence, that addresses the problem of double counting. Our key insight is that tracking human pose can be cast as a multi-target tracking problem where the ”targets ” are related by an underlying articulated structure. The human body is modeled as a combination of singleton parts (such as the head and neck) and symmetric pairs of parts (such as the shoulders, knees, and feet). Symmetric body parts are jointly tracked with mutual exclusion constraints to prevent double counting by reasoning about occlusion. We evaluate our algorithm on an outdoor dataset with natural background clutter, a standard indoor dataset (HumanEva-I), and compare against a state of the art pose estimation algorithm.
2 0.84936899 401 cvpr-2013-Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection
Author: Joseph J. Lim, C. Lawrence Zitnick, Piotr Dollár
Abstract: We propose a novel approach to both learning and detecting local contour-based representations for mid-level features. Our features, called sketch tokens, are learned using supervised mid-level information in the form of hand drawn contours in images. Patches of human generated contours are clustered to form sketch token classes and a random forest classifier is used for efficient detection in novel images. We demonstrate our approach on both top-down and bottom-up tasks. We show state-of-the-art results on the top-down task of contour detection while being over 200× faster than competing methods. We also achieve large improvements in detection accuracy for the bottom-up tasks of pedestrian and object detection as measured on INRIA [5] and PASCAL [10], respectively. These gains are due to the complementary information provided by sketch tokens to low-level features such as gradient histograms.
3 0.81215292 300 cvpr-2013-Multi-target Tracking by Lagrangian Relaxation to Min-cost Network Flow
Author: Asad A. Butt, Robert T. Collins
Abstract: We propose a method for global multi-target tracking that can incorporate higher-order track smoothness constraints such as constant velocity. Our problem formulation readily lends itself to path estimation in a trellis graph, but unlike previous methods, each node in our network represents a candidate pair of matching observations between consecutive frames. Extra constraints on binary flow variables in the graph result in a problem that can no longer be solved by min-cost network flow. We therefore propose an iterative solution method that relaxes these extra constraints using Lagrangian relaxation, resulting in a series of problems that ARE solvable by min-cost flow, and that progressively improve towards a high-quality solution to our original optimization problem. We present experimental results showing that our method outperforms the standard network-flow formulation as well as other recent algorithms that attempt to incorporate higher-order smoothness constraints.
4 0.80524278 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
Author: Jianguo Li, Yimin Zhang
Abstract: This paper presents a novel learning framework for training boosting cascade based object detector from large scale dataset. The framework is derived from the wellknown Viola-Jones (VJ) framework but distinguished by three key differences. First, the proposed framework adopts multi-dimensional SURF features instead of single dimensional Haar features to describe local patches. In this way, the number of used local patches can be reduced from hundreds of thousands to several hundreds. Second, it adopts logistic regression as weak classifier for each local patch instead of decision trees in the VJ framework. Third, we adopt AUC as a single criterion for the convergence test during cascade training rather than the two trade-off criteria (false-positive-rate and hit-rate) in the VJ framework. The benefit is that the false-positive-rate can be adaptive among different cascade stages, and thus yields much faster convergence speed of SURF cascade. Combining these points together, the proposed approach has three good properties. First, the boosting cascade can be trained very efficiently. Experiments show that the proposed approach can train object detectors from billions of negative samples within one hour even on personal computers. Second, the built detector is comparable to the stateof-the-art algorithm not only on the accuracy but also on the processing speed. Third, the built detector is small in model-size due to short cascade stages.
5 0.80474597 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
Author: Xiaohui Shen, Zhe Lin, Jonathan Brandt, Ying Wu
Abstract: Detecting faces in uncontrolled environments continues to be a challenge to traditional face detection methods[24] due to the large variation in facial appearances, as well as occlusion and clutter. In order to overcome these challenges, we present a novel and robust exemplarbased face detector that integrates image retrieval and discriminative learning. A large database of faces with bounding rectangles and facial landmark locations is collected, and simple discriminative classifiers are learned from each of them. A voting-based method is then proposed to let these classifiers cast votes on the test image through an efficient image retrieval technique. As a result, faces can be very efficiently detected by selecting the modes from the voting maps, without resorting to exhaustive sliding window-style scanning. Moreover, due to the exemplar-based framework, our approach can detect faces under challenging conditions without explicitly modeling their variations. Evaluation on two public benchmark datasets shows that our new face detection approach is accurate and efficient, and achieves the state-of-the-art performance. We further propose to use image retrieval for face validation (in order to remove false positives) and for face alignment/landmark localization. The same methodology can also be easily generalized to other facerelated tasks, such as attribute recognition, as well as general object detection.
6 0.80357218 311 cvpr-2013-Occlusion Patterns for Object Class Detection
8 0.80188811 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
9 0.8010838 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
10 0.79963481 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
11 0.79959875 202 cvpr-2013-Hierarchical Saliency Detection
12 0.79957283 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
13 0.79946691 416 cvpr-2013-Studying Relationships between Human Gaze, Description, and Computer Vision
14 0.79933685 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
15 0.7992416 273 cvpr-2013-Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection
16 0.79895824 168 cvpr-2013-Fast Object Detection with Entropy-Driven Evaluation
17 0.79892051 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
18 0.79874504 167 cvpr-2013-Fast Multiple-Part Based Object Detection Using KD-Ferns
19 0.79863232 334 cvpr-2013-Pose from Flow and Flow from Pose
20 0.79853427 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation