cvpr cvpr2013 cvpr2013-206 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matthias Dantone, Juergen Gall, Christian Leistner, Luc Van Gool
Abstract: In this work, we address the problem of estimating 2D human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have been shown to be very successful in solving this task. Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. In particular, we employ two-layered random forests as joint regressors. The first layer acts as a discriminative, independent body part classifier. The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This results in a pose estimation framework that already takes dependencies between body parts into account for joint localization and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. In the experiments, we demonstrate that our body parts dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods.
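The two-layer idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the per-pixel callables stand in for the first-layer random forests, a single intensity value stands in for the image features of a patch, and every name (first_layer, second_layer_features, the part labels) is hypothetical. The sketch shows only how first-layer confidences could become additional input features for a second-layer regressor.

```python
from typing import Callable, Dict, List

Grid = List[List[float]]  # one value per pixel, indexed [row][col]


def first_layer(image: Grid,
                part_classifiers: Dict[str, Callable[[float], float]]) -> Dict[str, Grid]:
    """Apply each independent part classifier densely.

    Returns one confidence map per body part. The classifiers here are
    plain callables (an assumption), not trained forests.
    """
    return {
        part: [[clf(value) for value in row] for row in image]
        for part, clf in part_classifiers.items()
    }


def second_layer_features(image: Grid, part_maps: Dict[str, Grid],
                          row: int, col: int) -> List[float]:
    """Build the second-layer input at one pixel.

    The image feature comes first, followed by the first-layer confidence
    of every body part at the same location.
    """
    features = [image[row][col]]
    for part in sorted(part_maps):  # fixed order keeps the feature layout stable
        features.append(part_maps[part][row][col])
    return features


# Toy example: a 2x2 "image" and two hypothetical part classifiers.
image = [[0.1, 0.9], [0.4, 0.6]]
classifiers = {
    "knee": lambda v: 1.0 if v > 0.5 else 0.0,
    "hip": lambda v: 1.0 - v,
}
maps = first_layer(image, classifiers)
features = second_layer_features(image, maps, 0, 1)
```

In this toy layout, a second-layer regressor for one joint would see the confidences of all parts at once, which is what lets it exploit co-occurrence, for example to tell a left knee from a right knee.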
Reference: text
sentIndex sentText sentNum sentScore
1 Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. [sent-10, score-1.034]
2 In particular, we employ two-layered random forests as joint regressors. [sent-11, score-0.426]
3 The first layer acts as a discriminative, independent body part classifier. [sent-12, score-0.587]
4 The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. [sent-13, score-0.348]
5 This results in a pose estimation framework that takes dependencies between body parts already for joint localization into account and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. [sent-14, score-1.145]
6 In the experiments, we demonstrate that our body parts dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods. [sent-15, score-1.357]
7 One of the most popular approaches in this area is the pictorial structure framework [13, 11], which models the spatial relations of rigid parts, usually with a tree model. [sent-18, score-0.647]
8 , by learning better appearance [24, 9, 1] or shape models [42] of the body parts. [sent-21, score-0.336]
9 In object detection, one of the best performing methods relies on so-called deformable part models [10], which use mixtures of star models over templates of parts. [sent-22, score-0.466]
10 Recently, [40] showed that mixtures of part templates can also be efficiently used in a tree model, leading to very powerful pose estimation models. [sent-23, score-0.736]
11 In particular, instead of modeling the transformations of a single body part template as in the classical pictorial structure model, the transformations of the structure (PS) model with independent part templates. [sent-24, score-1.11]
12 While the first layer consists of the same independent classifiers, the second layer regresses the locations of the joints in dependency of the independent part classifiers. [sent-34, score-0.621]
13 , nose (red), left hip joint (blue), and left knee (green), are more discriminative and resolve the ambiguities between the legs. [sent-37, score-0.507]
14 limbs are encoded by different deformable templates per body part. [sent-38, score-0.762]
15 While this approach outperforms classical pictorial structure models for human pose estimation, it has been shown in [41] that the used templates, which are scanning-window templates trained with linear SVMs on HOG features [7], are very sensitive to noise and limit the performance. [sent-39, score-0.989]
16 In this work, we thus address the problem of obtaining better part templates in the context of a pictorial structure framework. [sent-40, score-0.821]
17 Similar to [40], we do not model the limb transformations explicitly, but use discriminatively learned templates that allow the handling of limb pose variations implicitly. [sent-41, score-0.847]
18 However, contrary to [40], we do not use noise sensitive, scanning-window templates, but instead propose non-linear regressors for the joint locations. [sent-42, score-0.529]
19 As regressors, we rely on random forests that have been shown to be fast, robust, and accurate in the context of predicting body parts or joint locations from depth data [29, 15]. [sent-43, score-0.987]
20 To this end, we train joint regressors that use the output of independent body part templates as input and thus predict the location of a joint in dependency of the co-occurrence of other body parts. [sent-46, score-1.918]
21 In this way, joint regressors are already able to resolve some typical problems of tree models, such as the discrimination of left and right limbs. [sent-47, score-0.717]
22 In our experiments, we show that the proposed body parts dependent joint regressors achieve a much higher joint localization accuracy than independent part templates or joint regressors. [sent-48, score-2.074]
23 Integrated into a pictorial structure framework, the approach achieves a better joint localization accuracy than a state-of-the-art method [40] at comparable running time of a few seconds per image. [sent-49, score-0.643]
24 In this section, we review only the most related work with a focus on pose estimation within a pictorial structure framework. [sent-53, score-0.555]
25 While many approaches relied at the beginning on simple geometric primitives for the body parts and simple color models or background subtraction for the likelihoods, many improvements have been made to the part templates. [sent-55, score-0.539]
26 For instance, linear SVMs for learning discriminative part templates were introduced in [26]. [sent-56, score-0.471]
27 In [18], a cascade of body part detectors was proposed to obtain more discriminative templates. [sent-57, score-0.486]
28 Other approaches rely on several templates for a single body part [32, 40]. [sent-58, score-0.779]
29 Furthermore, human body models have been used to obtain better shapes of the body parts [42] or to synthesize training data [23]. [sent-59, score-0.881]
30 Another research direction has focused on introducing richer body models that overcome the limitation of tree structures. [sent-61, score-0.452]
31 For instance, a body part can be assigned with high confidence to two nodes of a tree in case of weak part templates or occlusions, e. [sent-62, score-1.001]
32 , the left and right body part are sometimes assigned to a single observation. [sent-64, score-0.417]
33 Besides independent part templates for body parts, hierarchies of part templates have also been proposed [33, 38, 35]. [sent-70, score-1.308]
34 [33] also introduces attributes of body parts allowing the sharing of part templates of similar shape. [sent-71, score-0.901]
35 The hierarchy proposed in [38] even discards the semantic meaning of body parts and relies on the concept of poselets [4]. [sent-72, score-0.502]
36 Our work is focused on improving the body part templates or the likelihoods for the joint positions within a pictorial structure model. [sent-73, score-1.406]
37 In contrast to previous works, which run each body part template independently and use a tree structure or loopy models for modeling the dependencies among body parts, we propose to take the dependencies between body parts already into account for predicting the joint locations. [sent-74, score-1.802]
38 In this way, the joint or part templates are already able to discriminate left and right limbs and compensate already for some limitations of tree models. [sent-75, score-0.881]
39 Since the templates are implemented by efficient randomized regression forests that directly predict the joint locations, our approach is comparable in running time to a state-of-the-art method [40], while providing a higher joint localization accuracy. [sent-76, score-1.096]
40 Random forests have been previously used for pose estimation from depth data [29, 15]. [sent-77, score-0.4]
41 Random forests have been also used to improve poselets for pose estimation from depth data [16] and for pedestrian detection [27]. [sent-79, score-0.444]
42 Pictorial Structure As a human body model, we use a classical pictorial structure framework [11]. [sent-83, score-0.802]
43 However, instead of using a limb representation for the body configuration, we use a joint representation J = {jk} where each joint jk = (xk) encodes the image location of a joint. [sent-84, score-1.048]
44 The root of the tree is defined by the nose, the only non-joint point in the body configuration. [sent-85, score-0.452]
45 (2) Assuming independent part templates for the likelihood, the posterior can be written as p(J|I) ∝ ? [sent-92, score-0.504]
46 The unary potentials φk (jk) are in many cases only approximations of the likelihoods p(I|jk) and correspond to part templates. [sent-97, score-0.396]
47 For instance, HOG features [7] and linear SVMs are used as part templates in [40]. [sent-98, score-0.443]
48 While we use Gaussian binary potentials and perform inference as in [10], our work focuses only on extracting more discriminative unary potentials φk (jk). [sent-99, score-0.476]
49 In particular, we address the weakness of independent part templates and propose non-linear, parts dependent joint regressors instead. [sent-100, score-1.26]
50 Joint Regressors A joint representation as in (1) has the advantage that limb transformations like foreshortening do not need to be explicitly modeled in the pictorial structure model, which reduces complexity and running time. [sent-102, score-0.767]
51 The independence assumption of common part templates is relaxed by training the regressors on image features and confidence maps of other body parts, i. [sent-103, score-1.149]
52 In this work, we use the term ‘joint’ for any landmark point like a skeleton joint or the nose, whereas ‘body parts’ are defined as regions around the joints as illustrated in Fig. [sent-106, score-0.436]
53 4, we discuss three variations, namely part templates using random forests, independent joint regressors, and parts dependent joint regressors. [sent-114, score-0.676]
54 Random Forests Random forests [5] or in general decision forests [6] have been used for many classification or regression tasks, for instance, labeling body parts in depth images [29], predicting the joint positions from depth data [15], or localizing facial feature points [8]. [sent-117, score-1.226]
55 For classifying body parts, the parameter space is the set of class labels or body parts. [sent-121, score-0.672]
56 Body Part Templates The body part templates are modeled as classical limb templates trained with a random forest. [sent-132, score-1.373]
57 We train a separate forest for each body part, where each forest is trained by body part patches sampled from a Gaussian distribution centered at the body part annotation and negative patches sampled uniformly from the background of the image. [sent-138, score-1.478]
58 Each patch P FPf is therefore augmented by a binary label c, which is k if it is sampled from body part lk. [sent-139, score-0.502]
59 We use the same number of body parts as joints, i. [sent-140, score-0.458]
60 The unary potentials for the body parts lk are obtained by densely extracting image patches from the test image and passing them through the trained trees. [sent-151, score-0.851]
61 (11) After computing the unary potentials for an image, the unary potentials for each joint are normalized to be within the range [0, 1]. [sent-169, score-0.771]
62 To resolve this issue, we propose a third potential that predicts the joint locations as in (11), but also takes neighboring part potentials into account: φk(jk, L) = p(jk | I, L) (12). However, incorporating a multi-dimensional neighborhood structure is usually computationally demanding. [sent-189, score-0.612]
63 The first layer only calculates independent part potentials φk (lk) (9). [sent-191, score-0.395]
64 The second layer also predicts unary potentials but also incorporates the potentials of the first layer and their locations as additional feature maps. [sent-192, score-0.679]
65 e, leaf probabilities p(c|L, LT) and p(v|L, LT) now depend on the probabilities of the body parts and we obtain φk(jk,L)=? [sent-203, score-0.548]
66 Comparison of the joint localization accuracy of the proposed unary potentials and comparison with a state-of-the-art method [40]. [sent-210, score-0.544]
67 While the body part classification (9) and the independent joint regression (11) perform similarly, they are drastically outperformed by the proposed body parts dependent joint regressors (13). [sent-211, score-1.86]
68 Since the body parts dependent joint regressors do not encode any explicit information of the human skeleton, using a pictorial structure model (PS), which models the kinematic chain, gives an additional performance boost. [sent-212, score-1.566]
69 The body parts dependent joint regression together with a pictorial structure model outperforms [40]. [sent-213, score-1.231]
70 In our experiments, we compare our method to three related methods, namely linear and non-linear SVMs for part templates [18] and flexible mixtures-of-parts [40]. [sent-219, score-0.472]
71 Since clothing imposes a particular challenge for pose estimation in general, which is not well reflected in current datasets for pose estimation from still images, we collected a new dataset. [sent-222, score-0.379]
72 Each image contains a person where the full body is visible and is annotated by 12 joints and a point for the head, namely the nose. [sent-228, score-0.588]
73 The accuracy plots for individual joints using body parts dependent joint regressors with a pictorial structure model. [sent-232, score-1.693]
74 In our experiments, we measure the joint localization error as a fraction of the upper body size. [sent-246, score-0.626]
75 PCP declares a limb as correctly detected if the errors of the predicted endpoints are within 50% of the limb length from the ground truth endpoints. [sent-251, score-0.34]
76 random forests for the body part templates, independent and parts dependent joint regression, we fixed some parameters intuitively. [sent-268, score-1.131]
77 We first evaluated the performance of the part templates FPf, (Section 4. [sent-274, score-0.443]
78 3), and the body parts dependent joint regressors (Section 4. [sent-276, score-1.092]
79 The proposed body parts dependent joint regressors clearly outperform the independent part templates and joint regressors. [sent-280, score-1.809]
80 Integrating them into a pictorial structure model (Section 3), which encodes the kinematic skeleton, improves the accuracy further. [sent-281, score-0.416]
81 We also evaluated the accuracy when the unary potentials for classification (9) and independent regression (11) are multiplied. [sent-284, score-0.417]
82 This shows that training the regressors depending on the body part templates (13) is essential for the performance gain. [sent-286, score-1.124]
83 [40] that uses a flexible mixture of templates modeled by linear SVMs. [sent-288, score-0.362]
84 A comparison of the approach [40] and the parts dependent joint regression is shown in Fig. [sent-290, score-0.517]
85 pictorial structure model with parts dependent joint regression outperforms [40]. [sent-295, score-0.895]
86 To this end, we added the neck and the top of the head as joints and converted our joint representation into a limb representation by using the joints as endpoints of the limbs. [sent-310, score-0.909]
87 The torso is obtained by the line between the average position of the two hip joints and the average position of the two shoulder joints. [sent-311, score-0.362]
88 The results of our method using body parts dependent joint regression with a pictorial structure are given in Table 2. [sent-312, score-1.231]
89 The comparison with a pictorial structure model that uses linear SVMs [18] or a cascade of non-linear SVMs [18] as part templates shows that our proposed unary potentials achieve a much higher accuracy. [sent-313, score-1.1]
90 The accuracy with respect to the normalized joint localization error for individual joints is plotted in Fig. [sent-314, score-0.513]
91 Our method outperforms all related methods using linear or non-linear SVMs for part templates within a pictorial structure framework. [sent-354, score-0.965]
92 Only [35] achieves a better performance, but this approach uses a more complex and more expensive model than pictorial structures with a tree structure. [sent-355, score-0.446]
93 Since the focus of this work is the improvement of the unary potentials in a pictorial structure framework, we used only a single tree model and have not performed clustering or used a more complex body model. [sent-361, score-1.109]
94 Conclusion In this paper, we have addressed robust human pose estimation from still images by proposing novel discriminative part template predictors within a pictorial structure framework. [sent-364, score-0.754]
95 Our joint location regressors consist of random forests that operate over two layers. [sent-365, score-0.742]
96 While the first layer acts as an independent body part classifier, the second one takes the predicted distributions of the first layer into account for estimating the joint locations, thus allowing the body parts to be put into relation. [sent-366, score-1.369]
97 In the experimental part, we have shown that our model yields more accurate human joint predictors than independent part templates and outperforms state-of-the-art methods that also use a tree structure for the human model. [sent-367, score-1.026]
98 Recovering human body configurations using pairwise constraints between parts. [sent-535, score-0.394]
99 Real-time human pose recognition in parts from single depth images. [sent-565, score-0.361]
100 Articulated part-based model for joint object detection and pose estimation. [sent-588, score-0.35]
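The evaluation quantities mentioned in the extracted sentences (joint localization error as a fraction of the upper body size, PCP with a 50% threshold, and the torso built from the hip and shoulder joints) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' evaluation code: all function names are hypothetical, the upper body size is passed in by the caller because its definition is not given here, and the PCP variant shown requires both endpoints to satisfy the threshold.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Limb = Tuple[Point, Point]


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two image points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def normalized_joint_error(pred: Point, truth: Point, upper_body_size: float) -> float:
    """Joint localization error as a fraction of the upper body size."""
    return distance(pred, truth) / upper_body_size


def limb_is_correct(pred: Limb, truth: Limb, ratio: float = 0.5) -> bool:
    """A limb counts as correct if both predicted endpoints lie within
    ratio * limb length of their ground truth endpoints (assumed variant)."""
    threshold = ratio * distance(truth[0], truth[1])
    return (distance(pred[0], truth[0]) <= threshold
            and distance(pred[1], truth[1]) <= threshold)


def pcp(pred_limbs: Sequence[Limb], true_limbs: Sequence[Limb], ratio: float = 0.5) -> float:
    """Fraction of limbs that are correctly detected."""
    if not true_limbs:
        raise ValueError("at least one ground truth limb is required")
    correct = sum(limb_is_correct(p, t, ratio) for p, t in zip(pred_limbs, true_limbs))
    return correct / len(true_limbs)


def midpoint(a: Point, b: Point) -> Point:
    """Average position of two points."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


def torso_limb(left_hip: Point, right_hip: Point,
               left_shoulder: Point, right_shoulder: Point) -> Limb:
    """Torso as the line between the shoulder midpoint and the hip midpoint."""
    return (midpoint(left_shoulder, right_shoulder), midpoint(left_hip, right_hip))
```

For a ground truth limb of length 10, a prediction whose endpoints are off by 1 and 2 pixels would count as correct under this variant, while one whose first endpoint is off by 6 pixels would not.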
wordName wordTfidf (topN-words)
[('templates', 0.362), ('body', 0.336), ('pictorial', 0.33), ('regressors', 0.316), ('joints', 0.223), ('joint', 0.213), ('forests', 0.179), ('potentials', 0.169), ('limb', 0.144), ('jk', 0.142), ('pose', 0.137), ('parts', 0.122), ('fpf', 0.121), ('tree', 0.116), ('unary', 0.11), ('hip', 0.107), ('dependent', 0.105), ('fashionpose', 0.096), ('layer', 0.084), ('part', 0.081), ('regression', 0.077), ('pcp', 0.074), ('forest', 0.072), ('lsp', 0.071), ('leeds', 0.068), ('pages', 0.062), ('independent', 0.061), ('patch', 0.06), ('ps', 0.059), ('human', 0.058), ('svms', 0.057), ('split', 0.055), ('lt', 0.054), ('localization', 0.052), ('head', 0.05), ('sports', 0.049), ('thresholds', 0.048), ('fashionista', 0.048), ('independentjoint', 0.048), ('tango', 0.048), ('structure', 0.048), ('goodness', 0.047), ('probabilities', 0.045), ('patches', 0.045), ('lk', 0.045), ('depth', 0.044), ('poselets', 0.044), ('articulated', 0.043), ('ambiguities', 0.042), ('limbs', 0.041), ('estimation', 0.04), ('knee', 0.04), ('nose', 0.039), ('kinematic', 0.038), ('resolve', 0.038), ('shotton', 0.037), ('likelihoods', 0.036), ('predicts', 0.036), ('loopy', 0.035), ('parsing', 0.035), ('already', 0.034), ('rescaled', 0.034), ('trees', 0.034), ('random', 0.034), ('predicting', 0.032), ('transformations', 0.032), ('shoulder', 0.032), ('ethz', 0.032), ('leaves', 0.032), ('fp', 0.031), ('skeleton', 0.031), ('relations', 0.031), ('template', 0.031), ('classical', 0.03), ('predictors', 0.029), ('neck', 0.029), ('clothes', 0.029), ('training', 0.029), ('kl', 0.029), ('dependencies', 0.029), ('namely', 0.029), ('hog', 0.029), ('sigal', 0.028), ('discriminative', 0.028), ('locations', 0.027), ('endpoints', 0.027), ('gall', 0.026), ('clothing', 0.025), ('hierarchies', 0.025), ('error', 0.025), ('confidence', 0.025), ('sampled', 0.025), ('acts', 0.025), ('trained', 0.024), ('tran', 0.024), ('annotate', 0.024), ('skin', 0.024), ('account', 0.024), ('tian', 0.024), ('deformable', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999875 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
2 0.37096164 335 cvpr-2013-Poselet Conditioned Pictorial Structures
Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
Abstract: In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. Modelling such higher order part dependencies seemingly comes at a cost of more expensive inference, which resulted in their limited use in state-of-the-art methods. In this paper we propose a model that incorporates higher order part dependencies while remaining efficient. We achieve this by defining a conditional model in which all body parts are connected a-priori, but which becomes a tractable tree-structured pictorial structures model once the image observations are available. In order to derive a set of conditioning variables we rely on the poselet-based features that have been shown to be effective for people detection but have so far found limited application for articulated human pose estimation. We demonstrate the effectiveness of our approach on three publicly available pose estimation benchmarks improving or being on-par with state of the art in each case.
3 0.35599008 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
Author: Ľ
Abstract: Our goal is to detect humans and estimate their 2D pose in single images. In particular, handling cases of partial visibility where some limbs may be occluded or one person is partially occluding another. Two standard, but disparate, approaches have developed in the field: the first is the part based approach for layout type problems, involving optimising an articulated pictorial structure; the second is the pixel based approach for image labelling involving optimising a random field graph defined on the image. Our novel contribution is a formulation for pose estimation which combines these two models in a principled way in one optimisation problem and thereby inherits the advantages of both of them. Inference on this joint model finds the set of instances of persons in an image, the location of their joints, and a pixel-wise body part labelling. We achieve near or state of the art results on standard human pose data sets, and demonstrate the correct estimation for cases of self-occlusion, person overlap and image truncation.
4 0.28669289 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
Author: Fang Wang, Yi Li
Abstract: Simple tree models for articulated objects prevails in the last decade. However, it is also believed that these simple tree models are not capable of capturing large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) are simple tree models sufficient? more specifically, 2) how to use tree models effectively in human pose estimation? and 3) how shall we use combined parts together with single parts efficiently? Assuming we have a set of single parts and combined parts, and the goal is to estimate a joint distribution of their locations. We surprisingly find that no latent variables are introduced in the Leeds Sport Dataset (LSP) during learning latent trees for deformable model, which aims at approximating the joint distributions of body part locations using minimal tree structure. This suggests one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build Visual Categories of the combined parts, and then perform inference on the learned latent tree. Our method outperformed the state of the art on the LSP, both in the scenarios when the training images are from the same dataset and from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.
5 0.28192478 40 cvpr-2013-An Approach to Pose-Based Action Recognition
Author: Chunyu Wang, Yizhou Wang, Alan L. Yuille
Abstract: We address action recognition in videos by modeling the spatial-temporal structures of human poses. We start by improving a state of the art method for estimating human joint locations from videos. More precisely, we obtain the K-best estimations output by the existing method and incorporate additional segmentation cues and temporal constraints to select the “best” one. Then we group the estimated joints into five body parts (e.g. the left arm) and apply data mining techniques to obtain a representation for the spatial-temporal structures of human actions. This representation captures the spatial configurations of body parts in one frame (by spatial-part-sets) as well as the body part movements (by temporal-part-sets) which are characteristic of human actions. It is interpretable, compact, and also robust to errors on joint estimations. Experimental results first show that our approach is able to localize body joints more accurately than existing methods. Next we show that it outperforms state of the art action recognizers on the UCF sport, the Keck Gesture and the MSR-Action3D datasets.
6 0.25814849 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation
7 0.25123078 334 cvpr-2013-Pose from Flow and Flow from Pose
8 0.2354957 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
9 0.22831483 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images
11 0.17249058 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
12 0.17022353 380 cvpr-2013-Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images
13 0.16702721 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
14 0.15559407 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
15 0.15356521 186 cvpr-2013-GeoF: Geodesic Forests for Learning Coupled Predictors
16 0.14956018 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
17 0.14280315 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
18 0.13881359 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories
19 0.13759448 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
20 0.11929518 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation
topicId topicWeight
[(0, 0.255), (1, -0.041), (2, 0.014), (3, -0.16), (4, 0.003), (5, 0.038), (6, 0.148), (7, 0.201), (8, 0.069), (9, -0.195), (10, -0.142), (11, 0.283), (12, -0.161), (13, 0.033), (14, -0.02), (15, 0.142), (16, -0.007), (17, -0.093), (18, -0.021), (19, -0.155), (20, -0.024), (21, 0.067), (22, -0.052), (23, -0.082), (24, -0.083), (25, 0.071), (26, -0.03), (27, 0.003), (28, 0.03), (29, -0.011), (30, 0.022), (31, -0.015), (32, -0.006), (33, -0.014), (34, 0.018), (35, -0.027), (36, -0.066), (37, 0.023), (38, 0.019), (39, 0.11), (40, 0.037), (41, -0.014), (42, 0.037), (43, 0.049), (44, 0.019), (45, 0.004), (46, -0.037), (47, 0.062), (48, -0.01), (49, -0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.97339052 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
2 0.92989063 335 cvpr-2013-Poselet Conditioned Pictorial Structures
Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
Abstract: In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. Modelling such higher order part dependencies seemingly comes at a cost of more expensive inference, which resulted in their limited use in state-of-the-art methods. In this paper we propose a model that incorporates higher order part dependencies while remaining efficient. We achieve this by defining a conditional model in which all body parts are connected a-priori, but which becomes a tractable tree-structured pictorial structures model once the image observations are available. In order to derive a set of conditioning variables we rely on the poselet-based features that have been shown to be effective for people detection but have so far found limited application for articulated human pose estimation. We demonstrate the effectiveness of our approach on three publicly available pose estimation benchmarks improving or being on-par with state of the art in each case.
3 0.87701511 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
Author: Fang Wang, Yi Li
Abstract: Simple tree models for articulated objects prevails in the last decade. However, it is also believed that these simple tree models are not capable of capturing large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) are simple tree models sufficient? more specifically, 2) how to use tree models effectively in human pose estimation? and 3) how shall we use combined parts together with single parts efficiently? Assuming we have a set of single parts and combined parts, and the goal is to estimate a joint distribution of their locations. We surprisingly find that no latent variables are introduced in the Leeds Sport Dataset (LSP) during learning latent trees for deformable model, which aims at approximating the joint distributions of body part locations using minimal tree structure. This suggests one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build Visual Categories of the combined parts, and then perform inference on the learned latent tree. Our method outperformed the state of the art on the LSP, both in the scenarios when the training images are from the same dataset and from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.
4 0.874623 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation
Author: Kota Hara, Rama Chellappa
Abstract: We present a hierarchical method for human pose estimation from a single still image. In our approach, a dependency graph representing relationships between reference points such as body joints is constructed and the positions of these reference points are sequentially estimated by a successive application of multidimensional output regressions along the dependency paths, starting from the root node. Each regressor takes image features computed from an image patch centered on the current node's position estimated by the previous regressor and is specialized for estimating its child nodes' positions. The use of the dependency graph allows us to decompose a complex pose estimation problem into a set of local pose estimation problems that are less complex. We design a dependency graph for two commonly used human pose estimation datasets, the Buffy Stickmen dataset and the ETHZ PASCAL Stickmen dataset, and demonstrate that our method achieves comparable accuracy to state-of-the-art results on both datasets with significantly lower computation time than existing methods. Furthermore, we propose an importance weighted boosted regression trees method for transductive learning settings and demonstrate the resulting improved performance for pose estimation tasks.
5 0.83552647 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
Author: Ľ
Abstract: Our goal is to detect humans and estimate their 2D pose in single images. In particular, handling cases of partial visibility where some limbs may be occluded or one person is partially occluding another. Two standard, but disparate, approaches have developed in the field: the first is the part based approach for layout type problems, involving optimising an articulated pictorial structure; the second is the pixel based approach for image labelling involving optimising a random field graph defined on the image. Our novel contribution is a formulation for pose estimation which combines these two models in a principled way in one optimisation problem and thereby inherits the advantages of both of them. Inference on this joint model finds the set of instances of persons in an image, the location of their joints, and a pixel-wise body part labelling. We achieve near or state of the art results on standard human pose data sets, and demonstrate the correct estimation for cases of self-occlusion, person overlap and image truncation.
6 0.83128333 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
7 0.81525606 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation
8 0.8137821 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
9 0.8067109 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
10 0.73518765 426 cvpr-2013-Tensor-Based Human Body Modeling
11 0.68983763 444 cvpr-2013-Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross-Modality Regression Forest
12 0.6675244 40 cvpr-2013-An Approach to Pose-Based Action Recognition
13 0.66274214 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
14 0.65992242 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories
15 0.65283459 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
16 0.61349398 334 cvpr-2013-Pose from Flow and Flow from Pose
17 0.59266186 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images
18 0.51240808 380 cvpr-2013-Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images
19 0.48856807 186 cvpr-2013-GeoF: Geodesic Forests for Learning Coupled Predictors
20 0.48370573 459 cvpr-2013-Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots
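The abstract of entry 4 in the list above (paper 89, Hara and Chellappa) describes estimating reference points sequentially along a dependency graph, starting from the root node, with each regressor predicting the positions of its child nodes. The traversal alone can be sketched in Python as follows. This is only an illustration under stated assumptions: the joint names, the `estimate_pose` and `fixed_offset` helpers, and the constant offsets are all hypothetical, and the stand-in regressors ignore the image entirely, whereas the abstract says the real ones use features from a patch centered on the current node.

```python
from typing import Callable, Dict, Tuple

Point = Tuple[float, float]
# Hypothetical regressor type: given a node's estimated position,
# return estimated positions for that node's children.
Regressor = Callable[[Point], Dict[str, Point]]


def estimate_pose(root: str, root_pos: Point,
                  regressors: Dict[str, Regressor]) -> Dict[str, Point]:
    """Estimate all positions by walking the dependency graph from the root."""
    pose = {root: root_pos}
    stack = [root]
    while stack:
        node = stack.pop()
        regressor = regressors.get(node)
        if regressor is None:  # leaf node: nothing further to estimate
            continue
        for child, pos in regressor(pose[node]).items():
            pose[child] = pos
            stack.append(child)
    return pose


def fixed_offset(offsets: Dict[str, Point]) -> Regressor:
    """Toy stand-in for a learned regressor (assumption: constant offsets)."""
    def regress(parent: Point) -> Dict[str, Point]:
        return {name: (parent[0] + dx, parent[1] + dy)
                for name, (dx, dy) in offsets.items()}
    return regress


# Hypothetical upper-body dependency graph rooted at the head.
toy_regressors = {
    "head": fixed_offset({"neck": (0.0, 10.0)}),
    "neck": fixed_offset({"l_shoulder": (-8.0, 2.0), "r_shoulder": (8.0, 2.0)}),
    "l_shoulder": fixed_offset({"l_elbow": (-2.0, 12.0)}),
    "r_shoulder": fixed_offset({"r_elbow": (2.0, 12.0)}),
}
pose = estimate_pose("head", (50.0, 20.0), toy_regressors)
```

Each child position depends only on its parent's estimate, which is what lets the full problem be broken into smaller local ones. The flip side is that an error at the root propagates to every descendant.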
topicId topicWeight
[(10, 0.125), (16, 0.017), (26, 0.044), (28, 0.098), (33, 0.281), (63, 0.022), (65, 0.011), (67, 0.11), (69, 0.033), (80, 0.068), (87, 0.101), (93, 0.011)]
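The topic list above is a sparse vector of (topicId, topicWeight) pairs. The Python sketch below parses that line into a dictionary and compares two such vectors. The page does not say how simValue is derived from these weights, so the cosine similarity here is an assumption used for illustration, and `parse_topics` and `cosine` are hypothetical helper names.

```python
import ast
import math
from typing import Dict


def parse_topics(line: str) -> Dict[int, float]:
    """Parse '[(10, 0.125), (16, 0.017), ...]' into {topic_id: weight}."""
    return {int(topic): float(weight) for topic, weight in ast.literal_eval(line)}


def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity of two sparse vectors (assumed measure, not confirmed by the page)."""
    dot = sum(weight * b.get(topic, 0.0) for topic, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


line = ("[(10, 0.125), (16, 0.017), (26, 0.044), (28, 0.098), (33, 0.281), "
        "(63, 0.022), (65, 0.011), (67, 0.11), (69, 0.033), (80, 0.068), "
        "(87, 0.101), (93, 0.011)]")
topics = parse_topics(line)

# Topic with the largest weight for this paper.
dominant = max(topics, key=topics.get)
```

A dictionary stores only the topics with non-zero weight, so two papers are compared over the topics they share without building a dense vector over every topic id.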
simIndex simValue paperId paperTitle
1 0.95522916 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
Author: Ishani Chakraborty, Hui Cheng, Omar Javed
Abstract: We present a unified framework for detecting and classifying people interactions in unconstrained user generated images. Unlike previous approaches that directly map people/face locations in 2D image space into features for classification, we first estimate camera viewpoint and people positions in 3D space and then extract spatial configuration features from explicit 3D people positions. This approach has several advantages. First, it can accurately estimate relative distances and orientations between people in 3D. Second, it encodes spatial arrangements of people into a richer set of shape descriptors than afforded in 2D. Our 3D shape descriptors are invariant to camera pose variations often seen in web images and videos. The proposed approach also estimates camera pose and uses it to capture the intent of the photo. To achieve accurate 3D people layout estimation, we develop an algorithm that robustly fuses semantic constraints about human interpositions into a linear camera model. This enables our model to handle large variations in people size, heights (e.g. age) and poses. An accurate 3D layout also allows us to construct features informed by Proxemics that improves our semantic classification. To characterize the human interaction space, we introduce visual proxemes; a set of prototypical patterns that represent commonly occurring social interactions in events. We train a discriminative classifier that classifies 3D arrangements of people into visual proxemes and quantitatively evaluate the performance on a large, challenging dataset.
same-paper 2 0.94299328 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
Author: Matthias Dantone, Juergen Gall, Christian Leistner, Luc Van_Gool
Abstract: In this work, we address the problem of estimating 2d human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have shown to be very successful in solving this task. Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. In particular, we employ two-layered random forests as joint regressors. The first layer acts as a discriminative, independent body part classifier. The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This results in a pose estimation framework that takes dependencies between body parts already for joint localization into account and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. In the experiments, we demonstrate that our body parts dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods.
3 0.93868583 261 cvpr-2013-Learning by Associating Ambiguously Labeled Images
Author: Zinan Zeng, Shijie Xiao, Kui Jia, Tsung-Han Chan, Shenghua Gao, Dong Xu, Yi Ma
Abstract: We study in this paper the problem of learning classifiers from ambiguously labeled images. For instance, in the collection of new images, each image contains some samples of interest (e.g., human faces), and its associated caption has labels with the true ones included, while the sample-label association is unknown. The task is to learn classifiers from these ambiguously labeled images and generalize to new images. An essential consideration here is how to make use of the information embedded in the relations between samples and labels, both within each image and across the image set. To this end, we propose a novel framework to address this problem. Our framework is motivated by the observation that samples from the same class repetitively appear in the collection of ambiguously labeled training images, while they are just ambiguously labeled in each image. If we can identify samples of the same class from each image and associate them across the image set, the matrix formed by the samples from the same class would be ideally low-rank. By leveraging such a low-rank assumption, we can simultaneously optimize a partial permutation matrix (PPM) for each image, which is formulated in order to exploit all information between samples and labels in a principled way. The obtained PPMs can be readily used to assign labels to samples in training images, and then a standard SVM classifier can be trained and used for unseen data. Experiments on benchmark datasets show the effectiveness of our proposed method.
4 0.93654251 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
Author: Brandon Rothrock, Seyoung Park, Song-Chun Zhu
Abstract: In this paper we present a compositional and-or graph grammar model for human pose estimation. Our model has three distinguishing features: (i) large appearance differences between people are handled compositionally by allowing parts or collections of parts to be substituted with alternative variants, (ii) each variant is a sub-model that can define its own articulated geometry and context-sensitive compatibility with neighboring part variants, and (iii) background region segmentation is incorporated into the part appearance models to better estimate the contrast of a part region from its surroundings, and improve resilience to background clutter. The resulting integrated framework is trained discriminatively in a max-margin framework using an efficient and exact inference algorithm. We present experimental evaluation of our model on two popular datasets, and show performance improvements over the state-of-art on both benchmarks.
5 0.93328285 328 cvpr-2013-Pedestrian Detection with Unsupervised Multi-stage Feature Learning
Author: Pierre Sermanet, Koray Kavukcuoglu, Soumith Chintala, Yann Lecun
Abstract: Pedestrian detection is a problem of considerable practical interest. Adding to the list of successful applications of deep learning methods to vision, we report state-of-the-art and competitive results on all major pedestrian datasets with a convolutional network model. The model uses a few new twists, such as multi-stage features, connections that skip layers to integrate global shape information with local distinctive motif information, and an unsupervised method based on convolutional sparse coding to pre-train the filters at each stage.
6 0.93177807 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
8 0.93021929 272 cvpr-2013-Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery
9 0.93017399 335 cvpr-2013-Poselet Conditioned Pictorial Structures
10 0.9291268 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
11 0.92900401 183 cvpr-2013-GRASP Recurring Patterns from a Single View
12 0.92679459 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation
13 0.9267633 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
14 0.92587042 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
15 0.92577034 192 cvpr-2013-Graph Matching with Anchor Nodes: A Learning Approach
16 0.92564023 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
17 0.92560005 428 cvpr-2013-The Episolar Constraint: Monocular Shape from Shadow Correspondence
18 0.92525691 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation
19 0.9251132 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
20 0.92472672 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection