cvpr cvpr2013 cvpr2013-207 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ľ
Abstract: Our goal is to detect humans and estimate their 2D pose in single images. In particular, we handle cases of partial visibility where some limbs may be occluded or one person is partially occluding another. Two standard, but disparate, approaches have developed in the field: the first is the part-based approach for layout-type problems, involving optimising an articulated pictorial structure; the second is the pixel-based approach for image labelling, involving optimising a random field graph defined on the image. Our novel contribution is a formulation for pose estimation which combines these two models in a principled way in one optimisation problem and thereby inherits the advantages of both of them. Inference on this joint model finds the set of instances of persons in an image, the location of their joints, and a pixel-wise body part labelling. We achieve state-of-the-art or near state-of-the-art results on standard human pose data sets, and demonstrate the correct estimation for cases of self-occlusion, person overlap and image truncation.
Reference: text
sentIndex sentText sentNum sentScore
1 Our goal is to detect humans and estimate their 2D pose in single images. [sent-11, score-0.179]
2 In particular, we handle cases of partial visibility where some limbs may be occluded or one person is partially occluding another. [sent-12, score-0.312]
3 Our novel contribution is a formulation for pose estimation which combines these two models in a principled way in one optimisation problem and thereby inherits the advantages of both of them. [sent-14, score-0.204]
4 Inference on this joint model finds the set of instances of persons in an image, the location of their joints, and a pixel-wise body part labelling. [sent-15, score-0.601]
5 We achieve near or state of the art results on standard human pose data sets, and demonstrate the correct estimation for cases of self-occlusion, person overlap and image truncation. [sent-16, score-0.277]
6 These proceed by searching for the most probable location of body parts, estimating a per pixel cost for each part, and combining the costs using dynamic programming over the tree structure graph. [sent-20, score-0.421]
7 Recent work has attempted to overcome these problems, for example by enforcing consistency of ensembles of parts [15, 23, 29] or eschewing the pictorial-structure formulation by directly learning poselets for human parts tightly clustered in both appearance and configuration spaces [3]. [sent-22, score-0.275]
8 In order to deal with (c), and also with self-occlusion, the work of [6] introduced a weak background model combined with a tight model of the human foreground. [sent-24, score-0.183]
9 The resulting method is one of the first to deal convincingly with the problem of self occlusion and clearly demonstrates the benefit of a background model. [sent-25, score-0.205]
10 Others have proposed modelling dependencies and relationships between multiple people [9], which addresses problem (a), and methods for efficiently sampling from pictorial structures [6, 11, 19]. [sent-26, score-0.265]
11 The outcome is the set of instances of persons in an image with the location of their joints, and the pixel-wise labelling (segmentation) of each of their body parts. [sent-28, score-0.466]
12 Our work is a continuation of the theme of combining segmentation and human pose estimation [5, 12, 28]. [sent-29, score-0.25]
13 This figure shows two candidates that end up in the final solution, their HOG masks, body-part masks, instance color models, the result of the texture component and final instance and body-part segmentation. [sent-33, score-0.568]
14 Candidates consist of 16 joints (some of them could be invisible) and the pixel-wise labelling of 11 body-part labels and background. [sent-34, score-0.551]
15 Our goal is three fold: to assign every pixel to a body part of an instance of a person or to the background; for each instance, to estimate the body layout in terms of joint positions and body parts and specify their visibility; and thirdly, to determine the number of instances. [sent-39, score-1.182]
16 The first goal is close to the traditional labelling problem of semantic segmentation, the second is close to the pose estimation which typically uses tree-structured pictorial structures. [sent-40, score-0.607]
17 In more detail, the method should predict a set of instances (subset of the set of candidates), consisting of their pose (joint positions and body parts). [sent-41, score-0.479]
18 Correspondingly, a pixel in the image x is labelled by the instance and the body part that overlaps it. [sent-42, score-0.478]
19 ) models the contribution of a pixel to an instance of an articulated model. [sent-48, score-0.214]
20 For a non-overlapping set of models this component corresponds to the negative sum of responses of a tree-structured mixture-of-parts pictorial structure model of [31]. [sent-49, score-0.275]
21 ) links the model with its body-part labelling: given a set of joint positions from an instance of the model M, the body part mask is a (soft) assignment of pixels to mattes of the body parts, e. [sent-51, score-0.864]
22 ) is a Gaussian mixture color model for the foreground and background built by thresholding the body part masks for the model. [sent-55, score-0.499]
23 ), is a semantic segmentation of the image into body parts and background. [sent-58, score-0.41]
24 It is computed independently of the instances and contributes information on the appearance and shape of body parts. [sent-59, score-0.352]
25 optimization labels the pixels (with the instances and parts) and takes account of the costs from the four terms. [sent-60, score-0.265]
26 The outputs of the components are illustrated in figure 1, where it can be seen how the texture component can contribute additional information over that provided by the instance segmentation from the color model. [sent-61, score-0.355]
27 The optimization proceeds by first proposing a number, N, of candidates for the instances using the mixture-of-parts model of [31]. [sent-62, score-0.336]
28 Some of these candidates will survive and appear in the final solution, and the ones that do will have led to the minimal energy when all the components and interactions between instances have been taken into account (during inference). [sent-63, score-0.409]
29 In the following section we describe the method for generating the candidate instances, and then give the details of the component computation and inference method in section 4. [sent-64, score-0.213]
30 Efficiently Generating Pose Candidates. We would like to find a set of candidates (local optima) that covers all the persons in the image and all their possible poses. [sent-66, score-0.255]
31 5 people present, to capture all possible poses a large number, N, of candidates is required (N = 200 in our experiments) and the running time of this method is too large. [sent-71, score-0.246]
32 In contrast we propose a method which takes only slightly more time to find a large number of candidates than to find just the best one per root node. [sent-72, score-0.284]
33 To increase the chance of capturing all instances we restrict ourselves to the search for the best solutions with at most K candidates (K = 8 here) with the same root node. [sent-75, score-0.456]
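The per-root cap described here can be sketched as a simple filter over scored candidates. This is only an illustration under stated assumptions: the function name `select_candidates`, the tuple layout `(score, root, candidate)` and the "higher score is better" convention are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def select_candidates(scored, n_total=200, k_per_root=8):
    """Keep the best n_total candidates, allowing at most k_per_root per root.

    scored: iterable of (score, root, candidate); higher score is better
    (an assumed convention for this sketch).
    """
    kept = []
    per_root = defaultdict(int)
    for score, root, cand in sorted(scored, key=lambda s: s[0], reverse=True):
        if per_root[root] >= k_per_root:
            continue  # this root node already contributed K candidates
        kept.append((score, root, cand))
        per_root[root] += 1
        if len(kept) == n_total:
            break
    return kept
```

The cap stops one strong detection from filling the whole candidate list with near-identical poses, which is the stated motivation for restricting the search to K candidates per root node.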
34 Starting from the leaf nodes going towards the root node for each location and type of the part the best locations and types of its children are estimated. [sent-77, score-0.331]
35 To find the best K candidates differing by at least one type of a child, we need to estimate for each location and type of the part its top K constellation (types and locations) of all its children. [sent-78, score-0.465]
36 In the first step we find the best location of a child for each type of a child, and take the top K solutions for this location and type. [sent-79, score-0.334]
37 This step is only approximate; these K solutions are only a good approximation of the real top K solutions, which can be obtained by merging all lists for each location given the type of a child. [sent-80, score-0.321]
38 In the second step the parent has to merge its response for each type with the top K solutions of each child. [sent-84, score-0.182]
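The merge of a parent's response with the top K solutions of a child can be illustrated with a generic k-best combination of two sorted score lists. This sketch uses a standard heap-based technique and is not claimed to be the paper's (approximate) merging procedure; `top_k_sums` and its arguments are hypothetical names.

```python
import heapq


def top_k_sums(a, b, k):
    """Return the k largest sums a[i] + b[j] as (sum, i, j) tuples.

    a and b must each be sorted in descending order.
    """
    if not a or not b:
        return []
    heap = [(-(a[0] + b[0]), 0, 0)]  # max-heap via negated sums
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        neg, i, j = heapq.heappop(heap)
        out.append((-neg, i, j))
        # Only the two neighbours of a popped pair can be the next best.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(a[ni] + b[nj]), ni, nj))
    return out
```

For a parent with several children, the same function could be folded over the children one at a time, keeping only K partial sums after each step, so the cost stays small compared with the search over locations.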
39 The final set of candidates is obtained by merging whole trees of solutions of suppressed root nodes. [sent-88, score-0.358]
40 In practice, for T = 5 and K = 8 (as used in the experiments), the brute-force search for the best location of a child of each type is much more expensive (especially with sub-cell accuracy) than the sorting and merging steps together, and the algorithm takes only 1. [sent-89, score-0.283]
41 To obtain also candidates with hidden parts, the set of types is altered with an additional hidden type, corresponding to the invisible joint whose children are also hidden. [sent-93, score-0.4]
42 Using this hidden type allows for candidates which have certain joints either outside of an image, occluded by another object or person, or self-occluded. [sent-95, score-0.642]
43 Implementation Details. In this section we describe how each component of the model energy (1) is computed, and then the inference method for the model. [sent-97, score-0.219]
44 Each candidate model m is defined by the locations and types of parts m = (P, t) and has its own associated HOG weight vector wm and bias bm. [sent-107, score-0.256]
45 We now define the pixel-wise cost in terms of (3) as $E_{HOG}(x^{inst}) = \sum_{j \in V} \psi_{HOG}(x_j^{inst}) + \sum_{m \in M} (-|c_m| b_m)\,\delta(m)$ (4), where V are the pixels, and δ(m) indicates the presence of a model m in the labelling (i. [sent-108, score-0.261]
46 ... the HOG cell in pixels and $\psi_{HOG}(x_j^{inst}) = \begin{cases} -w_m^{c_m(j)} \cdot h(c_m(j)) & \text{if } x_j^{inst} = m,\ m \in M \\ 0 & \text{otherwise} \end{cases}$ (5), where $c_m(j)$ is the corresponding cell of a model m for a pixel j. [sent-116, score-0.245]
47 the bias bm (note, the bias is negative in general as it has to prevent false negative human detections across the image). [sent-122, score-0.254]
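A minimal sketch of how the pixel-wise HOG cost of equations (4) and (5) could be evaluated: each pixel labelled with instance m contributes the negated response of the HOG cell it falls in, and each instance present in the labelling contributes its bias once, scaled by the cell size. The data layout (`cell_of`, `cell_score`) and the reading of |c| as the number of pixels per HOG cell are assumptions made for this illustration.

```python
def hog_energy(labels, cell_of, cell_score, bias, cell_size):
    """Evaluate a pixel-wise HOG energy for one labelling.

    labels[j]        instance label of pixel j, or None for background
    cell_of[m][j]    index of the HOG cell of model m that covers pixel j
    cell_score[m][c] precomputed response w_m^c . h(c) of cell c (assumed given)
    bias[m]          bias b_m of model m
    cell_size        pixels per HOG cell, |c| (assumed interpretation)
    """
    energy = 0.0
    for j, m in enumerate(labels):
        if m is not None:
            energy -= cell_score[m][cell_of[m][j]]  # psi_HOG for pixel j
    for m in set(labels) - {None}:
        energy -= cell_size * bias[m]  # bias term, paid once per present model
    return energy
```

With a negative bias the second term is positive, so an instance only lowers the energy if its visible pixels provide enough evidence, which matches the thresholding role of the bias described in the text.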
48 The evidence should not be taken into account from occluded parts of the object, and thus a significantly occluded object would be unable to provide sufficient evidence, larger than the threshold. [sent-124, score-0.3]
49 The Body part mask component. Given the location of the joints, it is possible to predict the body part segmentation to a good approximation. [sent-128, score-0.738]
50 We achieve this by learning a classifier to predict whether each pixel belongs to each body part given the location of all joints. [sent-129, score-0.442]
51 The intuitive way to incorporate this potential in the random field framework would be to add it as a simple unary potential, assigning a cost $H_m^{B}(j) - H_m^{x_j^{part}}(j)$ if the pixel j takes a model label m and the body-part label $x_j^{part}$. [sent-132, score-0.184]
52 Suppose we use only the HOG and body part mask components. [sent-134, score-0.453]
53 In other words, if the labelling agrees with the body-part mask prediction, the energy for each candidate should be 0. [sent-136, score-0.618]
54 Thus, we need to balance the bias of all foreground pixels, and the unbiased potential takes the form $E_{MASK}(M, x) = \sum_{j \in V} \left(H_m^{B}(j) - H_m^{x_j^{part}}(j)\right) + \sum_{m \in M} C(m)\,\delta(m)$ (6), where C(m) is defined as $C(m) = \sum_{j \in V} \max_{p \in \{B\} \cup L_{part}} \left(H_m^{p}(j) - H_m^{B}(j)\right)$ (7). [sent-137, score-0.201]
55 If the final labelling agrees with the most probable body part mask, then it sums up to zero; if some pixels do not agree, they are penalized based on the difference of body-part likelihoods for the estimated and present label. [sent-138, score-0.672]
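The balancing property stated here (zero energy when the labelling agrees with the most probable mask, a positive penalty otherwise) can be checked with a small single-instance sketch. The per-pixel dictionaries of likelihoods and the label names are hypothetical; the constant is computed as the per-pixel maximum of the part likelihood minus the background likelihood, following the reading of equation (7) in which C(m) sums over the same pixels as the unary term.

```python
def mask_energy(H, labels, background="B"):
    """Unbiased body-part mask energy for one instance.

    H[j]      dict mapping each label (parts and background) to its
              likelihood at pixel j (assumed to be given by the classifier)
    labels[j] label assigned to pixel j
    """
    # Unary part: cost of choosing labels[j] relative to background.
    unary = sum(h[background] - h[x] for h, x in zip(H, labels))
    # Balancing constant: best achievable gain over background per pixel.
    const = sum(max(v - h[background] for v in h.values()) for h in H)
    return unary + const
```

For example, with two pixels whose most probable labels are "torso" and background, labelling them exactly so gives an energy of 0, while relabelling the first pixel as "arm" gives the likelihood gap between "torso" and "arm" at that pixel.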
56 All distances are relative to the size of the object determined by the longest limb (all limbs are about the same size). [sent-141, score-0.183]
57 Because the joints (and limbs) may be occluded, we double the number of decision stumps, using both d(i) ≥ θ and d(i) < θ as the weak features, where both conditions are by definition not satisfied for an occluded part or limb. [sent-142, score-0.435]
58 is there a shoulder further than θ from this point, and is there a shoulder closer than θ from this point, so that the algorithm can distinguish between the cases, when the shoulder is visible and when it is not. [sent-145, score-0.198]
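The doubled decision stumps can be written as a pair of boolean features per joint distance. In this sketch an occluded joint is represented by `None` (an assumption made for illustration), and both features are then false by definition, so a classifier can tell a visible far joint, a visible near joint and a hidden joint apart.

```python
def stump_features(distance, theta):
    """Return the pair (d >= theta, d < theta) for one joint distance.

    distance is None when the joint (or limb) is occluded; in that case
    neither condition is satisfied.
    """
    if distance is None:
        return (False, False)
    return (distance >= theta, distance < theta)
```

A single stump could not express the third case: with only d ≥ θ, an occluded shoulder and a nearby shoulder would both yield false.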
59 The Color component. The color component ensures that solutions in which the color models of the foreground and the background differ are preferred. [sent-149, score-0.379]
60 It is self-trained for each instance using a Gaussian mixture model [21] initialised using the mask estimated as in section 4. [sent-150, score-0.255]
61 The Texture component. The texture component consists of potentials used for the semantic segmentation problem: the multi-feature TextonBoost [17, 25], the body-part super-pixel terms as in [17], and a pair-wise term that is the usual contrast-dependent Potts model. [sent-156, score-0.435]
62 Even though the performance of this component is not very high on its own, it can reliably distinguish between torso and arms and resolve several ambiguities of the mixture-of-parts model. [sent-157, score-0.231]
63 It consists of potentials used for the semantic segmentation problem: the multi-feature TextonBoost [17, 25], the body-part super-pixel terms as in [17], and a pair-wise term that is the usual contrast-dependent Potts model. [sent-158, score-0.237]
64 Inference. We wish to minimize the energy (1) in order to determine the set of instances M with their layout (joints, parts), as well as a pixel-wise labelling of the image according to whether the pixels belong to a part (e. [sent-161, score-0.665]
65 The optimization cannot be carried out directly, and we proceed in two stages: first, finding the number and joint position of the instances; and second, with this restricted set of label possibilities, determining the best pixel labelling (i. [sent-164, score-0.436]
66 In the first stage we have N human pose candidates (obtained as described in section 3). [sent-167, score-0.383]
67 For each candidate we compute the potentials EHOG, EMASK and ECOL. [sent-168, score-0.196]
68 The potential ETEX is independent of the candidates and is evaluated once for the entire image. [sent-169, score-0.198]
69 We start by labelling everything as background and iteratively add the next best candidate. [sent-170, score-0.349]
70 The quality of each candidate is determined by calculating the energy after α-expansion [4] over the 11 body-part labels and background (i. [sent-171, score-0.228]
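The first inference stage can be sketched as a greedy loop: start from an all-background solution and repeatedly add the candidate that gives the lowest energy. The `energy` callback stands in for the evaluation by α-expansion and is hypothetical, and the stopping rule (stop once no candidate lowers the energy) is an assumption, since the extracted text does not state it.

```python
def greedy_select(candidates, energy):
    """Greedily pick candidates that lower the total energy.

    candidates: iterable of hashable candidate ids
    energy:     function mapping a frozenset of selected candidates to the
                energy of the best labelling using them (assumed callback)
    """
    selected = frozenset()  # empty set = everything labelled background
    best = energy(selected)
    remaining = set(candidates)
    while remaining:
        trial_energy, cand = min(
            ((energy(selected | {c}), c) for c in remaining),
            key=lambda t: t[0],
        )
        if trial_energy >= best:
            break  # assumed stopping rule: no remaining candidate helps
        selected, best = selected | {cand}, trial_energy
        remaining.discard(cand)
    return selected, best
```

Because the energy of a set is re-evaluated with every trial addition, interactions such as two candidates competing for the same pixels are reflected in which candidates survive.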
71 In the second stage the optimization is over all the selected candidates to refine the solution. [sent-175, score-0.198]
72 We use the standard Buffy data set of [12] consisting of 748 images from episodes s5e2 –s5e6, with episode s5e3 used for training, episode s5e4 for validation and episodes s5e2, s5e5 and s5e6 for testing. [sent-184, score-0.214]
73 Largely occluded persons are marked as hard and ignored for the evaluation. [sent-188, score-0.187]
74 Each individual component of the model (the HOG, texture, color and body part mask potentials) is trained separately using this annotation. [sent-191, score-0.59]
75 The HOG component is trained using the approach of [31], the texture component using the learning methods described in [17], the color model using a mixture of 10 Gaussians as in [21], and the body part mask using the method described in section 4. [sent-192, score-0.762]
76 The top 200 candidates are used in the experiments, 8 at most with the same scale and the same root node. [sent-194, score-0.246]
77 The Image Parse [31] data set consists of 305 images, each containing only one person and a labelling of 14 joints. [sent-198, score-0.353]
78 The data set does not contain pixel-wise labellings, so the texture and foreground mask potentials trained on the Buffy data set are also used for this data set. [sent-200, score-0.432]
79 Since most pixels are background, the average of the pixel-wise recall over classes is more affected by mislabelling a body part pixel as background than the other – Figure 2 (panels A to E). [sent-206, score-0.474]
80 In cases where the instances are highly occluded (such as C) or difficult to distinguish, the joints are not labelled, and the body-part pixels are labelled as hard (black) and ignored for training and evaluation. [sent-209, score-0.662]
81 Some of joints are labelled as half-visible (sometimes because they are too close to the boundary) and ignored for evaluation too. [sent-210, score-0.397]
82 For the pictorial structure measures, the comparison to the state-of-the-art methods for the Buffy data set in the loose-PCP measure is given in table 1, and for the Image Parse data set for the strict- and loose-PCP measures in set (implementations of [20] and [31] respectively). [sent-231, score-0.215]
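The extracted text does not define the strict and loose PCP measures. The sketch below follows the definitions commonly used in the pose estimation literature, which is an assumption here: a limb counts as correct under loose PCP if the average endpoint error is at most half the ground-truth limb length, and under strict PCP if both endpoint errors are.

```python
import math


def pcp_correct(pred, gt, strict=False, thresh=0.5):
    """Decide whether one predicted limb counts as correct.

    pred, gt: pairs of (x, y) endpoints, e.g. ((x1, y1), (x2, y2)).
    strict:   require both endpoints within thresh * limb length;
              otherwise only the average endpoint error must be.
    """
    length = math.dist(gt[0], gt[1])
    d0 = math.dist(pred[0], gt[0])
    d1 = math.dist(pred[1], gt[1])
    if strict:
        return max(d0, d1) <= thresh * length
    return (d0 + d1) / 2 <= thresh * length
```

A limb with one perfectly placed endpoint and one badly placed endpoint can pass the loose test and fail the strict one, which is why the two measures are usually reported separately.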
83 The incorporation of all components leads to a significant improvement on the Buffy data set, however, the method did not improve on the Image Parse data set, probably because the texture and mask potentials were trained on a different data set with different distribution of poses. [sent-236, score-0.428]
84 Texture potentials are good at distinguishing between limbs and torso, and thus help to resolve ambiguities in estimation of joints and their visibility. [sent-241, score-0.514]
85 Conclusion. In this paper we have shown that, given appropriate training, it is possible to achieve Kinect-style body part labelling and layout in color images (despite not having depth information). [sent-244, score-0.66]
86 Furthermore, we have for the first time covered the case of multiple, possibly interacting, human instances in quite varied and unconstrained poses. [sent-245, score-0.196]
87 The formulation of a joint model covering foreground and background has effectively dealt with all of the problems we listed in the introduction for pictorial-structures, e. [sent-246, score-0.255]
88 The incorporation of pose into the random field framework leads to a significant improvement of the performance. [sent-257, score-0.178]
89 The weights are optimised for the intersection over union measure, which is more suitable for this data set because of a significant imbalance of the body part classes (dominated by background). [sent-258, score-0.363]
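To illustrate why intersection over union suits imbalanced body-part classes better than recall, the following sketch computes both per class on flat label lists. The function names and the toy labels are hypothetical.

```python
def per_class_iou(pred, gt, classes):
    """Intersection over union per class for two equal-length label lists."""
    ious = {}
    for c in classes:
        inter = sum(p == c and g == c for p, g in zip(pred, gt))
        union = sum(p == c or g == c for p, g in zip(pred, gt))
        if union:  # skip classes absent from both prediction and ground truth
            ious[c] = inter / union
    return ious


def per_class_recall(pred, gt, classes):
    """Fraction of ground-truth pixels of each class that were recovered."""
    recalls = {}
    for c in classes:
        total = sum(g == c for g in gt)
        if total:
            recalls[c] = sum(p == c and g == c for p, g in zip(pred, gt)) / total
    return recalls
```

On eight background pixels and two arm pixels, a prediction that labels four pixels as arm still reaches full arm recall, but its arm IoU drops to 0.5, so over-labelling a small class is penalized only by the IoU measure.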
90 Surprisingly, the incorporation of texture potentials improved the intersection over union measure also for the lower legs, even though there is insufficient training data to learn them well. [sent-259, score-0.338]
91 The improvement is mainly because the texture potentials give a much better definition of the instance boundary. [sent-260, score-0.282]
92 Pictorial structures revisited: People detection and articulated pose estimation. [sent-265, score-0.199]
93 Poselets: Body part detectors trained using 3d human pose annotations. [sent-275, score-0.249]
94 Posecut: Simultaneous segmentation and 3d pose estimation of humans using dynamic graph-cuts. [sent-289, score-0.192]
95 Upper body detection and tracking in extended signing sequences. [sent-297, score-0.214]
96 Clustered pose and nonlinear appearance models for human pose estimation. [sent-346, score-0.312]
97 Real-time human pose recognition in parts from single depth images. [sent-418, score-0.273]
98 Multi-level inference by relaxed dual decomposition for human pose segmentation. [sent-442, score-0.232]
99 The first three columns are the pictorial structure pose, instance segmentation and body-part segmentation obtained using our method; the last three columns are the corresponding ground truth. [sent-465, score-0.386]
100 Furthermore, there are examples of partial occlusion by another person (A right, C left), a background object (C left) and self-occlusion (B right, C right). [sent-468, score-0.218]
wordName wordTfidf (topN-words)
[('joints', 0.29), ('labelling', 0.261), ('buffy', 0.22), ('body', 0.214), ('candidates', 0.198), ('pictorial', 0.176), ('mask', 0.175), ('xijnst', 0.166), ('instances', 0.138), ('potentials', 0.129), ('pose', 0.127), ('hog', 0.118), ('ehog', 0.118), ('pjm', 0.109), ('col', 0.103), ('emask', 0.1), ('hbm', 0.1), ('xinst', 0.1), ('component', 0.099), ('limbs', 0.095), ('person', 0.092), ('parts', 0.088), ('background', 0.088), ('cm', 0.088), ('limb', 0.088), ('layout', 0.083), ('occluded', 0.081), ('instance', 0.08), ('lists', 0.079), ('child', 0.075), ('parse', 0.073), ('energy', 0.073), ('type', 0.073), ('texture', 0.073), ('articulated', 0.072), ('solutions', 0.072), ('bm', 0.072), ('joint', 0.071), ('failing', 0.071), ('arms', 0.069), ('candidate', 0.067), ('ecol', 0.067), ('lpart', 0.067), ('posecut', 0.067), ('xeirjnwsitse', 0.067), ('xjinst', 0.067), ('shoulder', 0.066), ('segmentation', 0.065), ('part', 0.064), ('torso', 0.063), ('bias', 0.062), ('pixel', 0.062), ('human', 0.058), ('labelled', 0.058), ('location', 0.057), ('persons', 0.057), ('foreground', 0.055), ('episode', 0.055), ('textonboost', 0.054), ('uk', 0.052), ('episodes', 0.052), ('incorporation', 0.051), ('children', 0.05), ('evidence', 0.05), ('oitfh', 0.049), ('ignored', 0.049), ('union', 0.049), ('root', 0.048), ('people', 0.048), ('optimising', 0.047), ('tex', 0.047), ('inference', 0.047), ('pixels', 0.046), ('classifier', 0.045), ('probable', 0.045), ('visibility', 0.044), ('costs', 0.043), ('semantic', 0.043), ('labellings', 0.043), ('label', 0.042), ('agrees', 0.042), ('self', 0.042), ('invisible', 0.042), ('modelling', 0.041), ('formulation', 0.041), ('eichner', 0.041), ('masks', 0.04), ('merging', 0.04), ('types', 0.039), ('wc', 0.039), ('measures', 0.039), ('color', 0.038), ('takes', 0.038), ('occlusion', 0.038), ('response', 0.037), ('torr', 0.037), ('parsing', 0.037), ('deal', 0.037), ('intersection', 0.036), ('optimisation', 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
Author: Ľ
2 0.35599008 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
Author: Matthias Dantone, Juergen Gall, Christian Leistner, Luc Van_Gool
Abstract: In this work, we address the problem of estimating 2d human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have shown to be very successful in solving this task. Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. In particular, we employ two-layered random forests as joint regressors. The first layer acts as a discriminative, independent body part classifier. The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This results in a pose estimation framework that takes dependencies between body parts already for joint localization into account and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. In the experiments, we demonstrate that our body parts dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods.
3 0.24525426 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation
Author: Kota Hara, Rama Chellappa
Abstract: We present a hierarchical method for human pose estimation from a single still image. In our approach, a dependency graph representing relationships between reference points such as body joints is constructed and the positions of these reference points are sequentially estimated by a successive application of multidimensional output regressions along the dependency paths, starting from the root node. Each regressor takes image features computed from an image patch centered on the current node's position estimated by the previous regressor and is specialized for estimating its child nodes' positions. The use of the dependency graph allows us to decompose a complex pose estimation problem into a set of local pose estimation problems that are less complex. We design a dependency graph for two commonly used human pose estimation datasets, the Buffy Stickmen dataset and the ETHZ PASCAL Stickmen dataset, and demonstrate that our method achieves comparable accuracy to state-of-the-art results on both datasets with significantly lower computation time than existing methods. Furthermore, we propose an importance weighted boosted regression trees method for transductive learning settings and demonstrate the resulting improved performance for pose estimation tasks.
4 0.24484105 335 cvpr-2013-Poselet Conditioned Pictorial Structures
Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
Abstract: In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. Modelling such higher order part dependencies seemingly comes at a cost of more expensive inference, which resulted in their limited use in state-of-the-art methods. In this paper we propose a model that incorporates higher order part dependencies while remaining efficient. We achieve this by defining a conditional model in which all body parts are connected a-priori, but which becomes a tractable tree-structured pictorial structures model once the image observations are available. In order to derive a set of conditioning variables we rely on the poselet-based features that have been shown to be effective for people detection but have so far found limited application for articulated human pose estimation. We demonstrate the effectiveness of our approach on three publicly available pose estimation benchmarks improving or being on-par with state of the art in each case.
5 0.23434255 334 cvpr-2013-Pose from Flow and Flow from Pose
Author: Katerina Fragkiadaki, Han Hu, Jianbo Shi
Abstract: Human pose detectors, although successful in localising faces and torsos of people, often fail with lower arms. Motion estimation is often inaccurate under fast movements of body parts. We build a segmentation-detection algorithm that mediates the information between body parts recognition, and multi-frame motion grouping to improve both pose detection and tracking. Motion of body parts, though not accurate, is often sufficient to segment them from their backgrounds. Such segmentations are crucial for extracting hard to detect body parts out of their interior body clutter. By matching these segments to exemplars we obtain pose labeled body segments. The pose labeled segments and corresponding articulated joints are used to improve the motion flow fields by proposing kinematically constrained affine displacements on body parts. The pose-based articulated motion model is shown to handle large limb rotations and displacements. Our algorithm can detect people under rare poses, frequently missed by pose detectors, showing the benefits of jointly reasoning about pose, segmentation and motion in videos.
6 0.22283721 40 cvpr-2013-An Approach to Pose-Based Action Recognition
7 0.20049016 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
8 0.19995502 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
9 0.179855 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context
10 0.15881588 444 cvpr-2013-Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross-Modality Regression Forest
11 0.15854962 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
12 0.15764707 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
13 0.15740891 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
14 0.15293488 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes
15 0.14109114 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
16 0.13861229 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
17 0.13573098 165 cvpr-2013-Fast Energy Minimization Using Learned State Filters
18 0.13311875 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
19 0.13239501 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
20 0.13022837 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
topicId topicWeight
[(0, 0.286), (1, -0.032), (2, 0.056), (3, -0.143), (4, 0.077), (5, 0.025), (6, 0.154), (7, 0.189), (8, 0.026), (9, -0.131), (10, -0.064), (11, 0.207), (12, -0.104), (13, 0.022), (14, -0.009), (15, 0.102), (16, 0.045), (17, -0.054), (18, -0.063), (19, -0.13), (20, -0.033), (21, 0.084), (22, -0.063), (23, -0.023), (24, -0.053), (25, -0.057), (26, 0.032), (27, 0.036), (28, -0.048), (29, -0.056), (30, 0.08), (31, -0.002), (32, -0.021), (33, -0.019), (34, -0.042), (35, -0.06), (36, -0.043), (37, 0.055), (38, -0.03), (39, 0.042), (40, -0.051), (41, 0.002), (42, 0.074), (43, 0.014), (44, 0.013), (45, -0.038), (46, -0.013), (47, 0.038), (48, -0.005), (49, 0.045)]
simIndex simValue paperId paperTitle
same-paper 1 0.96415579 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
Author: Ľ
2 0.91815907 335 cvpr-2013-Poselet Conditioned Pictorial Structures
Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
Abstract: In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. Modelling such higher order part dependencies seemingly comes at a cost of more expensive inference, which resulted in their limited use in state-of-the-art methods. In this paper we propose a model that incorporates higher order part dependencies while remaining efficient. We achieve this by defining a conditional model in which all body parts are connected a-priori, but which becomes a tractable tree-structured pictorial structures model once the image observations are available. In order to derive a set of conditioning variables we rely on the poselet-based features that have been shown to be effective for people detection but have so far found limited application for articulated human pose estimation. We demonstrate the effectiveness of our approach on three publicly available pose estimation benchmarks improving or being on-par with state of the art in each case.
3 0.8773551 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
Author: Matthias Dantone, Juergen Gall, Christian Leistner, Luc Van Gool
Abstract: In this work, we address the problem of estimating 2d human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have shown to be very successful in solving this task. Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. In particular, we employ two-layered random forests as joint regressors. The first layer acts as a discriminative, independent body part classifier. The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This results in a pose estimation framework that takes dependencies between body parts already for joint localization into account and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. In the experiments, we demonstrate that our body parts dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods.
4 0.82268161 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
Author: Georgia Gkioxari, Pablo Arbeláez, Lubomir Bourdev, Jitendra Malik
Abstract: We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image. For this purpose, we build on the notion of poselets [4] and train highly discriminative classifiers to differentiate among arm configurations, which we call armlets. We propose a rich representation which, in addition to standard HOG features, integrates the information of strong contours, skin color and contextual cues in a principled manner. Unlike existing methods, we evaluate our approach on a large subset of images from the PASCAL VOC detection dataset, where critical visual phenomena, such as occlusion, truncation, multiple instances and clutter are the norm. Our approach outperforms Yang and Ramanan [26], the state-of-the-art technique, with an improvement from 29.0% to 37.5% PCP accuracy on the arm keypoint prediction task, on this new pose estimation dataset.
5 0.820499 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation
Author: Ben Sapp, Ben Taskar
Abstract: We propose a multimodal, decomposable model for articulated human pose estimation in monocular images. A typical approach to this problem is to use a linear structured model, which struggles to capture the wide range of appearance present in realistic, unconstrained images. In this paper, we instead propose a model of human pose that explicitly captures a variety of pose modes. Unlike other multimodal models, our approach includes both global and local pose cues and uses a convex objective and joint training for mode selection and pose estimation. We also employ a cascaded mode selection step which controls the trade-off between speed and accuracy, yielding a 5x speedup in inference and learning. Our model outperforms state-of-the-art approaches across the accuracy-speed trade-off curve for several pose datasets. This includes our newly-collected dataset of people in movies, FLIC, which contains an order of magnitude more labeled data for training and testing than existing datasets. The new dataset and code are available online.
6 0.80835956 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation
7 0.79728991 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
8 0.79552674 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
9 0.78522795 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
10 0.7325213 426 cvpr-2013-Tensor-Based Human Body Modeling
11 0.71738356 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
12 0.68961596 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
13 0.6550625 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories
14 0.63100743 334 cvpr-2013-Pose from Flow and Flow from Pose
15 0.61896235 40 cvpr-2013-An Approach to Pose-Based Action Recognition
17 0.58604616 120 cvpr-2013-Detecting and Naming Actors in Movies Using Generative Appearance Models
18 0.58015627 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
19 0.57585561 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images
20 0.56690806 197 cvpr-2013-Hallucinated Humans as the Hidden Context for Labeling 3D Scenes
topicId topicWeight
[(10, 0.111), (16, 0.012), (26, 0.056), (28, 0.018), (33, 0.288), (67, 0.141), (69, 0.05), (80, 0.029), (87, 0.074), (95, 0.049), (97, 0.067)]
simIndex simValue paperId paperTitle
1 0.9599365 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
Author: Magnus Burenius, Josephine Sullivan, Stefan Carlsson
Abstract: We consider the problem of automatically estimating the 3D pose of humans from images, taken from multiple calibrated views. We show that it is possible and tractable to extend the pictorial structures framework, popular for 2D pose estimation, to 3D. We discuss how to use this framework to impose view, skeleton, joint angle and intersection constraints in 3D. The 3D pictorial structures are evaluated on multiple view data from a professional football game. The evaluation is focused on computational tractability, but we also demonstrate how a simple 2D part detector can be plugged into the framework.
2 0.95840377 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
Author: Jianguo Li, Yimin Zhang
Abstract: This paper presents a novel learning framework for training boosting cascade based object detector from large scale dataset. The framework is derived from the well-known Viola-Jones (VJ) framework but distinguished by three key differences. First, the proposed framework adopts multi-dimensional SURF features instead of single dimensional Haar features to describe local patches. In this way, the number of used local patches can be reduced from hundreds of thousands to several hundreds. Second, it adopts logistic regression as weak classifier for each local patch instead of decision trees in the VJ framework. Third, we adopt AUC as a single criterion for the convergence test during cascade training rather than the two trade-off criteria (false-positive-rate and hit-rate) in the VJ framework. The benefit is that the false-positive-rate can be adaptive among different cascade stages, and thus yields much faster convergence speed of SURF cascade. Combining these points together, the proposed approach has three good properties. First, the boosting cascade can be trained very efficiently. Experiments show that the proposed approach can train object detectors from billions of negative samples within one hour even on personal computers. Second, the built detector is comparable to the state-of-the-art algorithm not only on the accuracy but also on the processing speed. Third, the built detector is small in model-size due to short cascade stages.
3 0.95780545 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
Author: Luming Zhang, Mingli Song, Zicheng Liu, Xiao Liu, Jiajun Bu, Chun Chen
Abstract: Weakly supervised image segmentation is a challenging problem in computer vision field. In this paper, we present a new weakly supervised image segmentation algorithm by learning the distribution of spatially structured superpixel sets from image-level labels. Specifically, we first extract graphlets from each image where a graphlet is a small-sized graph consisting of superpixels as its nodes and it encapsulates the spatial structure of those superpixels. Then, a manifold embedding algorithm is proposed to transform graphlets of different sizes into equal-length feature vectors. Thereafter, we use GMM to learn the distribution of the post-embedding graphlets. Finally, we propose a novel image segmentation algorithm, called graphlet cut, that leverages the learned graphlet distribution in measuring the homogeneity of a set of spatially structured superpixels. Experimental results show that the proposed approach outperforms state-of-the-art weakly supervised image segmentation methods, and its performance is comparable to those of the fully supervised segmentation models.
4 0.95769477 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
Author: Xiaohui Shen, Zhe Lin, Jonathan Brandt, Ying Wu
Abstract: Detecting faces in uncontrolled environments continues to be a challenge to traditional face detection methods [24] due to the large variation in facial appearances, as well as occlusion and clutter. In order to overcome these challenges, we present a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning. A large database of faces with bounding rectangles and facial landmark locations is collected, and simple discriminative classifiers are learned from each of them. A voting-based method is then proposed to let these classifiers cast votes on the test image through an efficient image retrieval technique. As a result, faces can be very efficiently detected by selecting the modes from the voting maps, without resorting to exhaustive sliding window-style scanning. Moreover, due to the exemplar-based framework, our approach can detect faces under challenging conditions without explicitly modeling their variations. Evaluation on two public benchmark datasets shows that our new face detection approach is accurate and efficient, and achieves the state-of-the-art performance. We further propose to use image retrieval for face validation (in order to remove false positives) and for face alignment/landmark localization. The same methodology can also be easily generalized to other face-related tasks, such as attribute recognition, as well as general object detection.
5 0.95399535 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
Author: Georgia Gkioxari, Pablo Arbeláez, Lubomir Bourdev, Jitendra Malik
Abstract: We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image. For this purpose, we build on the notion of poselets [4] and train highly discriminative classifiers to differentiate among arm configurations, which we call armlets. We propose a rich representation which, in addition to standard HOG features, integrates the information of strong contours, skin color and contextual cues in a principled manner. Unlike existing methods, we evaluate our approach on a large subset of images from the PASCAL VOC detection dataset, where critical visual phenomena, such as occlusion, truncation, multiple instances and clutter are the norm. Our approach outperforms Yang and Ramanan [26], the state-of-the-art technique, with an improvement from 29.0% to 37.5% PCP accuracy on the arm keypoint prediction task, on this new pose estimation dataset.
6 0.95325452 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
7 0.95097744 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
8 0.95030642 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification
9 0.94814032 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
10 0.94760072 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search
11 0.94731587 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors
12 0.94707298 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking
13 0.94668931 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
14 0.9463619 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
same-paper 15 0.94618338 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
16 0.94440335 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video
17 0.94189596 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
18 0.94187373 438 cvpr-2013-Towards Pose Robust Face Recognition
19 0.94093323 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation
20 0.94085681 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation