335 cvpr-2013-Poselet Conditioned Pictorial Structures


Source: pdf

Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele

Abstract: In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. Modelling such higher order part dependencies seemingly comes at a cost of more expensive inference, which resulted in their limited use in state-of-the-art methods. In this paper we propose a model that incorporates higher order part dependencies while remaining efficient. We achieve this by defining a conditional model in which all body parts are connected a-priori, but which becomes a tractable tree-structured pictorial structures model once the image observations are available. In order to derive a set of conditioning variables we rely on the poselet-based features that have been shown to be effective for people detection but have so far found limited application for articulated human pose estimation. We demonstrate the effectiveness of our approach on three publicly available pose estimation benchmarks improving or being on-par with state of the art in each case.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Poselet Conditioned Pictorial Structures. Leonid Pishchulin, Mykhaylo Andriluka, Max Planck Institute for Informatics, Saarbrücken, Germany. Abstract: In this paper we consider the challenging problem of articulated human pose estimation in still images. [sent-1, score-0.377]

2 We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. [sent-2, score-0.579]

3 We achieve this by defining a conditional model in which all body parts are connected a-priori, but which becomes a tractable tree-structured pictorial structures model once the image observations are available. [sent-5, score-0.643]

4 In order to derive a set of conditioning variables we rely on the poselet-based features that have been shown to be effective for people detection but have so far found limited application for articulated human pose estimation. [sent-6, score-0.491]

5 Introduction In this paper we consider the challenging task of articulated human pose estimation in monocular images. [sent-9, score-0.377]

6 State-of-the-art approaches in this area [2, 15, 26] are based on the pictorial structures model (PS) and are composed of unary terms modelling body part appearance and pairwise terms between adjacent body parts and/or joints capturing their preferred spatial arrangement. [sent-10, score-1.352]

7 While this approach leads to tree-based models and thus efficient and exact inference, it fails to capture important dependencies between non-adjacent body parts. [sent-11, score-0.486]

8 That modelling such dependencies is important for effective pose estimation can be seen e. [sent-12, score-0.467]

9 1: activities of people like playing soccer, tennis or volleyball result in strong dependencies between many if not all body parts; this cannot be modelled with the above approach. [sent-15, score-0.6]

10 The first simply uses a mixture of tree models thus learning separate pairwise terms for different global body configurations e. [sent-17, score-0.489]

11 The second approach is to add more pairwise terms including non-adjacent body parts leading to a loopy part graph that requires approximate inference [2, 23, 21, 25]. [sent-20, score-0.633]

12 A key challenge in designing models for pose estimation is thus to encode the higher-order part dependencies while still allowing efficient inference. [sent-21, score-0.458]

13 In this paper we propose a novel model that incorporates higher order information between body parts by defining a conditional model in which all parts are a-priori connected, but which becomes a tractable PS model once the mid-level features are observed. [sent-22, score-0.591]

14 This allows us to effectively model dependencies between non-adjacent parts while still allowing for exact and efficient inference in a tree-based model. [sent-23, score-0.38]

15 In order to satisfy these requirements we rely on the non-parametric poselet representation introduced in [4]. [sent-27, score-0.388]

16 Note that for the task of people detection the best performing approaches are those which rely on a representation that jointly models appearance of multiple body parts [4, 10]. [sent-28, score-0.454]

17 Yet these models have not been shown to lead to state-of-the-art performance in human pose estimation, likely because they rely on a pose representation that is not fine-grained enough to enable localisation of all body joints. [sent-29, score-0.754]

18 Most recent methods for human pose estimation are based on the pictorial structures (PS) model [12, 11] that represents the body configuration as a collection of rigid parts and a set of pairwise part connections. [sent-31, score-1.06]

19 (a) shows the top scoring poselet detections with the corresponding poselet cluster medoids (b). [sent-36, score-0.795]

20 All poselet detections contribute to a prediction of the deformable pairwise terms, the outcome of which is shown in (c). [sent-38, score-0.611]

21 tions [21, 25] none of these models consider interactions between body parts that go beyond simple pairwise relationships. [sent-41, score-0.489]

22 Our model is related to recent work aiming to increase the flexibility of the PS approach by jointly training a mixture of tree-structured PS an exponentially large collection of PS models with a selection function that chooses a suitable model based on the observed poselet features. [sent-45, score-0.443]

23 Similar to these models, our approach allows efficient inference at test time, yet we are also able to incorporate dependencies between parts that go beyond pairwise interactions. [sent-46, score-0.469]

24 In particular, our model can be seen as Our approach is related to holistic pose estimation approaches [1, 24, 18, 13] that aim to directly predict positions of body joints from image features without relying on an intermediate part-based representation. [sent-49, score-0.595]

25 However, in our work we perform classification on the level of each body joint which allows the set of pose classes to be exponentially large. [sent-53, score-0.427]

26 Similar to their work, we define a PS model where unary and pairwise terms are image conditioned. [sent-55, score-0.409]

27 We phrase the PS model as a conditional random field (CRF), modelling the conditional probability of a body pose configuration given image evidence. [sent-64, score-0.504]
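
The practical benefit of a tree-structured model is that the best configuration can be found exactly. The following is a minimal sketch of max-sum dynamic programming on a tree, assuming each part takes one of a small number of discrete states and that log-potentials are supplied as arrays. The names `map_inference_tree`, `unary`, `pairwise` and `parent` are illustrative and do not come from the paper.

```python
import numpy as np

def map_inference_tree(unary, pairwise, parent):
    """Exact MAP inference by max-sum dynamic programming on a tree.

    unary[m]    : (S,) array of log-potentials of part m over its states.
    pairwise[m] : (S_parent, S_child) array of log-potentials for the edge
                  between part m and parent[m]; unused (None) for the root.
    parent[m]   : index of the parent part, -1 for the root (part 0).
                  Assumption: parts are ordered so that parent[m] < m.
    """
    n = len(unary)
    # acc[m] accumulates the unary term plus the best scores of all subtrees.
    acc = [np.asarray(u, dtype=float).copy() for u in unary]
    best_child_state = [None] * n
    # Leaves-to-root pass: children always have larger indices than parents.
    for m in range(n - 1, 0, -1):
        scores = pairwise[m] + acc[m][None, :]  # rows: parent state
        best_child_state[m] = scores.argmax(axis=1)
        acc[parent[m]] += scores.max(axis=1)
    # Root-to-leaves pass: read off the maximising states.
    states = [0] * n
    states[0] = int(acc[0].argmax())
    for m in range(1, n):
        states[m] = int(best_child_state[m][states[parent[m]]])
    return states, float(acc[0].max())
```

The cost is linear in the number of parts and quadratic in the number of states per part, whereas a graph with loops would in general need approximate inference.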

28 For convenience we distinguish between parameters for unary βu and pairwise βp factors. [sent-72, score-0.354]

29 Unary potentials We use the following unary potential functions Eu(lm; D, βu) = log φu(lm; D), ∀m = 1, . [sent-82, score-0.341]

30 We learn unary and pairwise terms in a piecewise strategy, unary potentials using AdaBoost and the pairwise terms using a Maximum-Likelihood estimate. [sent-107, score-0.846]

31 Poselet Conditioned Pictorial Structures Our approach is based on the following idea: we use a mid-level representation that captures possible anatomical configurations of a human pose to predict an image-specific pictorial structures (PS) model that in turn is applied to the image. [sent-109, score-0.616]

32 Poselets go beyond standard pairwise part-part configurations and capture the configuration of multiple body parts jointly. [sent-111, score-0.57]

33 On the input images we compute poselet responses that capture different portions of the person’s body configuration. [sent-115, score-0.637]

34 This information is then used to augment both unary and pairwise terms of the PS model. [sent-119, score-0.381]

35 For comparison we show the deformation model of [3] (a generic pose prior being the same for all images) along with the corresponding pose estimate in the last two columns. [sent-124, score-0.394]

36 The idea of having multiple deformation models is similar to the idea of encoding body pose configurations through different mixture components as in [26]. [sent-125, score-0.549]

37 Poselet Representation The goal of the mid-level representation is to capture common dependencies of multiple body parts. [sent-136, score-0.46]

38 We implemented the following strategy to train a set of poselet detectors and compute a feature based on their responses. [sent-137, score-0.359]

39 For a reference body part, we cluster the relative positions of a subset of related body parts. [sent-138, score-0.539]

40 For example, when picking the ‘neck’ part we cluster relative offsets of all upper body parts using Euclidean distance and K-means. [sent-139, score-0.513]
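
The clustering step described above can be sketched as follows, assuming joint annotations are available as an array of shape (images, parts, 2). This is a plain NumPy K-means with Euclidean distance; the function names and the initialisation scheme are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def relative_offsets(joints, ref):
    """Offsets of every part from the reference part, flattened per image.

    joints : (N, P, 2) array of part positions; ref : reference part index.
    """
    rel = joints - joints[:, ref:ref + 1, :]
    return rel.reshape(len(joints), -1)

def kmeans(points, k, n_iter=50, seed=0):
    """Basic K-means with Euclidean distance (illustrative, not optimised)."""
    rng = np.random.default_rng(seed)
    # Initialise centres with k distinct training examples.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster is empty
                centers[j] = members.mean(axis=0)
    # Final assignment against the final centres.
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return centers, dist.argmin(axis=1)
```

Each resulting cluster groups training images whose selected parts are arranged similarly around the reference part, which is what a single poselet detector is then trained on.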

41 Together with every poselet p we store its mean offset from the torso annotation μp. [sent-142, score-0.64]

42 A separate detector is trained for every poselet cluster using all training images that fall within this cluster. [sent-145, score-0.41]

43 Given a torso prediction and the relative offset μp of the poselet p, we compute the maximum poselet response in a small region (footnote 1) around the predicted torso position plus μp. [sent-148, score-1.083]

44 This corresponds to a max-pooling step in a local region for every poselet p. [sent-149, score-0.359]
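
A minimal sketch of this max-pooling step is given below. It assumes each poselet detector yields a dense 2-D score map indexed as `[y, x]`, and uses a window of 20 × 20 pixels as stated in the source's footnote. The function names and the handling of windows that fall outside the image are assumptions.

```python
import numpy as np

def pooled_poselet_response(score_map, torso_xy, mu_p, half_size=10):
    """Maximum detector score in a window centred at torso_xy + mu_p.

    score_map : (H, W) array of detector scores, indexed [y, x].
    half_size : half the window side; 10 gives a 20 x 20 pixel window.
    """
    h, w = score_map.shape
    cx = int(round(torso_xy[0] + mu_p[0]))
    cy = int(round(torso_xy[1] + mu_p[1]))
    # Clip the window to the image bounds.
    x0, x1 = max(cx - half_size, 0), min(cx + half_size, w)
    y0, y1 = max(cy - half_size, 0), min(cy + half_size, h)
    if x0 >= x1 or y0 >= y1:
        return -np.inf  # assumption: no response if the window is off-image
    return float(score_map[y0:y1, x0:x1].max())

def poselet_feature(score_maps, torso_xy, offsets, half_size=10):
    """Stack the pooled responses of all poselets into one feature vector f."""
    return np.array([
        pooled_poselet_response(s, torso_xy, mu, half_size)
        for s, mu in zip(score_maps, offsets)
    ])
```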

45 Similar to [25], we define 11 body part configurations, namely full body, upper body with arms, torso and head, right arm and torso, left arm and torso, right arm alone, left arm alone, torso with legs, legs, right leg alone, and left leg alone. [sent-154, score-1.63]

46 For each of these configurations we cluster the data as described above and learn poselet detectors. [sent-155, score-0.457]

47 Poselet Dependent Unary Terms We first use the poselet features to obtain a location and rotation prediction for each body part separately. [sent-162, score-0.773]

48 During training, for part m, we cluster the relative distance between the torso and the part into k = 1, . [sent-164, score-0.382]

49 For each cluster k we compute its mean offset from the torso μk and the variance of the differences Σk. [sent-168, score-0.332]

50 This now forms a classification problem, from the poselet response f into the set of K clusters. [sent-169, score-0.359]

51 We chose a sparse method since we expect a different set of poselets to be predictive for different body parts. [sent-171, score-0.396]

52 During test time we apply the learned classifier to predict from f the mean μk, and variance Σk that are subsequently used as a Gaussian unary potential for the part. [sent-172, score-0.319]
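
The position part of this pipeline can be sketched as below: per-cluster Gaussians are fitted to torso-to-part offsets at training time, and at test time the predicted cluster's mean and covariance define a Gaussian log-potential. The linear `predict_cluster` function is a hypothetical stand-in for the learned sparse classifier, whose exact form is not given in the extracted sentences, and the small ridge added to the covariance is likewise an assumption.

```python
import numpy as np

def fit_cluster_gaussians(offsets, labels, k):
    """Mean and covariance of torso-to-part offsets for each of k clusters."""
    dim = offsets.shape[1]
    means, covs = [], []
    for j in range(k):
        pts = offsets[labels == j]
        means.append(pts.mean(axis=0))
        # Ridge term keeps the covariance invertible for tight clusters.
        covs.append(np.cov(pts, rowvar=False) + 1e-6 * np.eye(dim))
    return np.array(means), np.array(covs)

def predict_cluster(f, W, b):
    """Hypothetical linear classifier mapping poselet responses f to a cluster."""
    return int(np.argmax(W @ f + b))

def gaussian_log_potential(part_xy, torso_xy, mean, cov):
    """Log-density of the part offset from the torso under N(mean, cov)."""
    d = np.asarray(part_xy, float) - np.asarray(torso_xy, float) - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return float(-0.5 * (maha + logdet + len(d) * np.log(2.0 * np.pi)))
```

The potential is largest where the part sits at the predicted mean offset from the torso and falls off according to the cluster's spread.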

53 We proceed analogously for rotation, that is we learn a classifier that predicts the absolute rotation of the body part based on poselet responses. [sent-173, score-0.689]

54 Both unary parts together form a Gaussian potential Eu,poselet, and the complete set of unary terms of our model then reads Eu(lm; D) = Eu,boost(lm; D) + wpEu,poselet(lm; D), (7) where Eu,boost is the original term given by Eq. [sent-174, score-0.654]
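
Eq. (7) is a weighted sum of two score terms, which the following sketch illustrates on small arrays of candidate states. The array representation and the function name `combined_unary` are assumptions; only the form of the sum and the weight wp come from the text.

```python
import numpy as np

def combined_unary(e_boost, e_poselet, w_p):
    """Eq. (7): E_u = E_u,boost + w_p * E_u,poselet, evaluated per state."""
    e_boost = np.asarray(e_boost, dtype=float)
    e_poselet = np.asarray(e_poselet, dtype=float)
    if e_boost.shape != e_poselet.shape:
        raise ValueError("both terms must be evaluated on the same states")
    return e_boost + w_p * e_poselet
```

With `w_p = 0` the model falls back to the appearance term alone; larger values let the poselet-based prediction dominate the choice of part location.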

55 For each pair of parts ln, lm we cluster their rela... (Footnote 1: The size of the region is set to 20 × 20 pixels in our experiments.) [sent-179, score-0.35]

56 (8) We wrote β(f) to make explicit its dependency on the poselet responses and that this parameter is being predicted. [sent-187, score-0.359]

57 Results In this section we evaluate the proposed poselet-conditioned PS model on three well-known pose estimation benchmarks. [sent-189, score-0.312]

58 7 and the number K of unary and pairwise clusters via grid search. [sent-197, score-0.379]

59 It can be seen that adding poselet dependent terms improves the performance w. [sent-211, score-0.414]

60 Correct predictions of unary rotation components improve the localisation of lower arms and legs most. [sent-216, score-0.482]

61 This is explained by the fact that the rotation of these body parts is far less constrained compared to the rest of the limbs. [sent-217, score-0.392]

62 Constraining part rotations to small ranges around the correct rotations reduces the uncertainty and steers the pose estimation towards the correct body pose. [sent-218, score-0.642]

63 Similar effects can be seen when constraining positions of the unary potentials and learning the pairwise parameters from correct components, as this further constrains the predicted pose. [sent-219, score-0.468]

64 The results show that using the parameters from correctly predicted components dramatically improves the localisation of all body parts in each particular setting. [sent-220, score-0.453]

65 Note that even the model with oracle component prediction does not achieve values close to 100% because of test examples with extremely foreshortened or occluded body parts. [sent-222, score-0.396]

66 As each potential includes a classifier that maps poselet features to one of the components, we also evaluate the performance of these classifiers. [sent-226, score-0.405]

67 It can be seen that using PS + torso prediction improves the results compared to PS alone (56. [sent-229, score-0.366]

68 Interestingly, when predicting the unary position parameters even despite the somewhat low component prediction accuracy of 43. [sent-233, score-0.327]

69 We also analyse how prediction of pairwise parameters affects pose estimation. [sent-241, score-0.41]

70 The prediction scores of pairwise components are generally lower than the absolute unary ones. [sent-242, score-0.485]

71 A possible explanation is that the classification problem becomes harder because several rather different poselets might still correspond to the same relative angle between the two body parts. [sent-243, score-0.396]

72 However, the final pose estima... [table header: Setting | Torso | Upper leg | Lower leg | Upper arm | Forearm | Head | Total] 80. [sent-244, score-0.528]

73 , [3] + predict unary rotation (ur) + predict unary position (up) + predict pairwise (p/wise) + ur + up + p/wise Table 1. [sent-280, score-0.859]

74 The improvement is also pronounced for the lower legs which profit a lot from the improved upper legs localisation and for the upper arms (both +2. [sent-303, score-0.471]

75 This result is very interesting since the method of [26] is a mixture of parts model that is quite different from ours, as it uses multiple unary templates for every part and image-independent pairwise potentials that do not allow to model long range part dependencies. [sent-305, score-0.732]

76 In contrast, our model uses generic templates for each part, but incorporates a wide range of part unary terms by conditioning on poselet-representation. [sent-306, score-0.423]

77 , [3] + torso prediction + predict unary position (up) + predict unary rotation (ur) + ur + up + predict pairwise (p/wise) + up + ur + p/wise 43. [sent-312, score-1.256]

78 Accuracy of predicting a correct component for each unary and pairwise potential and corresponding pose estimation results (PCP) on the “Leeds Sport Poses” (LSP) dataset. [sent-323, score-0.698]

79 [table header: Method | Torso | Upper leg | Lower leg | Upper arm | Forearm | Head | Total] ours 87. [sent-324, score-0.486]

80 Our method is able to exploit long-range dependencies between parts across a variety of activities such as tennis serve (columns 1 and 2), climbing (column 3) and running (column 4). [sent-358, score-0.352]

81 The failure cases often correspond to images of people in poses that are underrepresented in the training set, and for which the prediction of unary and pairwise components is not accurate enough. [sent-361, score-0.619]

82 Modelling long-range part dependencies by our method results in better performance on highly articulated people. [sent-369, score-0.308]

83 Note that the value of wp increased with respect to the LSP dataset, which results in a stronger influence of the poselet features on the final solution. [sent-375, score-0.359]

84 Our approach favourably compares to [26], outperforming it on all body parts apart from the lower arms. [sent-383, score-0.346]

85 Our method is slightly better than the multi-layer [table header: Method | Torso | Upper leg | Lower leg | Upper arm | Forearm | Head | Total] ours, ours + [16] 92. [sent-385, score-0.486]

86 Their approach aims to capture non-tree dependencies between the parts by decomposing the model into multiple layers and performing dual decomposition to cope with cycles in the part graph. [sent-440, score-0.414]

87 In contrast to their method, which incorporates multiple layers directly into the inference procedure making it infeasible without relaxations, our method implicitly models long-range dependencies between the parts and allows exact and efficient inference. [sent-441, score-0.382]

88 Our approach performs slightly worse compared to our approach [16], where we extended the tree-structured pictorial structures model with additional repulsive factors between non-adjacent parts and a stronger torso detector. [sent-442, score-0.628]

89 We cluster the data into 20 clusters, again preserving only those containing at least 10 examples and learn poselet detectors on both UIUC+LSP data. [sent-454, score-0.41]

90 This finding is consistent across all three datasets: performance always improved when using poselet conditioned features. [sent-458, score-0.407]

91 This method is based on hierarchical poselets which intend to capture the non-tree dependencies between the parts via multiple layers. [sent-460, score-0.47]

92 Method Torso Upper Lower Upper Fore Head Total arm arm arm arm ours 91. [sent-462, score-0.564]

93 However, tree-structured models fail to capture important dependencies between non-connected body parts leading to estimation failures. [sent-489, score-0.615]

94 This work proposes to capture such dependencies using poselets that serve as a mid-level representation that jointly encodes articulation of several body parts. [sent-490, score-0.639]

95 We show how an existing PS model for human pose estimation can be improved using a poselet representation. [sent-491, score-0.678]

96 Experimental results show that a better prediction of human body layout using poselets improves body part estimation. [sent-493, score-0.819]

97 One of the important limitations of our current mid-level representation is its dependence on the torso detector, which could be a bottleneck in cases when the torso is obstructed by other body parts or scene objects. [sent-498, score-0.848]

98 In the future we also plan to address the most problematic cases of the current approach, namely (self-)occlusion of body parts and foreshortening. [sent-500, score-0.346]

99 Pictorial structures revisited: People detection and articulated pose estimation. [sent-517, score-0.322]

100 Clustered pose and nonlinear appearance models for human pose estimation. [sent-587, score-0.421]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('poselet', 0.359), ('ps', 0.343), ('torso', 0.251), ('pcp', 0.249), ('body', 0.244), ('lsp', 0.23), ('unary', 0.211), ('lm', 0.197), ('pose', 0.183), ('dependencies', 0.182), ('pictorial', 0.159), ('poselets', 0.152), ('pairwise', 0.143), ('arm', 0.141), ('parts', 0.102), ('leg', 0.102), ('andriluka', 0.089), ('articulated', 0.086), ('prediction', 0.084), ('potentials', 0.084), ('legs', 0.081), ('people', 0.079), ('upper', 0.076), ('leeds', 0.068), ('uiuc', 0.066), ('ur', 0.062), ('predict', 0.062), ('localisation', 0.06), ('conditioning', 0.059), ('ip', 0.058), ('sport', 0.056), ('human', 0.055), ('poses', 0.055), ('ln', 0.054), ('wp', 0.054), ('estimation', 0.053), ('structures', 0.053), ('cluster', 0.051), ('modelling', 0.049), ('poseletconditioned', 0.048), ('tmn', 0.048), ('conditioned', 0.048), ('configurations', 0.047), ('components', 0.047), ('rotation', 0.046), ('potential', 0.046), ('slda', 0.043), ('inference', 0.042), ('fore', 0.041), ('head', 0.041), ('oracle', 0.04), ('part', 0.04), ('adaboost', 0.04), ('sports', 0.039), ('arms', 0.037), ('pishchulin', 0.037), ('colour', 0.036), ('activities', 0.036), ('repulsive', 0.035), ('loopy', 0.035), ('capture', 0.034), ('reuse', 0.033), ('predicting', 0.032), ('parse', 0.032), ('tennis', 0.032), ('rotations', 0.031), ('eu', 0.031), ('pronounced', 0.031), ('alone', 0.031), ('incorporates', 0.03), ('offset', 0.03), ('correct', 0.03), ('rely', 0.029), ('planck', 0.029), ('reads', 0.029), ('anatomical', 0.029), ('improvement', 0.029), ('tractable', 0.029), ('competing', 0.029), ('dependent', 0.028), ('cycles', 0.028), ('ep', 0.028), ('mixture', 0.028), ('model', 0.028), ('templates', 0.028), ('terms', 0.027), ('playing', 0.027), ('articulation', 0.027), ('relational', 0.027), ('interestingly', 0.027), ('parsing', 0.026), ('scoring', 0.026), ('exact', 0.026), ('sapp', 0.026), ('clusters', 0.025), ('outcome', 0.025), ('joints', 0.025), ('composite', 0.025), ('germany', 0.024), ('tran', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 335 cvpr-2013-Poselet Conditioned Pictorial Structures

2 0.37096164 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors

Author: Matthias Dantone, Juergen Gall, Christian Leistner, Luc Van Gool

Abstract: In this work, we address the problem of estimating 2d human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have shown to be very successful in solving this task. Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. In particular, we employ two-layered random forests as joint regressors. The first layer acts as a discriminative, independent body part classifier. The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This results in a pose estimation framework that takes dependencies between body parts already for joint localization into account and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. In the experiments, we demonstrate that our body parts dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods.

3 0.34273091 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation

Author: Fang Wang, Yi Li

Abstract: Simple tree models for articulated objects prevails in the last decade. However, it is also believed that these simple tree models are not capable of capturing large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) are simple tree models sufficient? more specifically, 2) how to use tree models effectively in human pose estimation? and 3) how shall we use combined parts together with single parts efficiently? Assuming we have a set of single parts and combined parts, and the goal is to estimate a joint distribution of their locations. We surprisingly find that no latent variables are introduced in the Leeds Sport Dataset (LSP) during learning latent trees for deformable model, which aims at approximating the joint distributions of body part locations using minimal tree structure. This suggests one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build Visual Categories of the combined parts, and then perform inference on the learned latent tree. Our method outperformed the state of the art on the LSP, both in the scenarios when the training images are from the same dataset and from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.

4 0.26026529 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers

Author: Georgia Gkioxari, Pablo Arbeláez, Lubomir Bourdev, Jitendra Malik

Abstract: We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image. For this purpose, we build on the notion of poselets [4] and train highly discriminative classifiers to differentiate among arm configurations, which we call armlets. We propose a rich representation which, in addition to standard HOG features, integrates the information of strong contours, skin color and contextual cues in a principled manner. Unlike existing methods, we evaluate our approach on a large subset of images from the PASCAL VOC detection dataset, where critical visual phenomena, such as occlusion, truncation, multiple instances and clutter are the norm. Our approach outperforms Yang and Ramanan [26], the state-of-the-art technique, with an improvement from 29.0% to 37.5% PCP accuracy on the arm keypoint prediction task, on this new pose estimation dataset.

5 0.25219697 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation

Author: Kota Hara, Rama Chellappa

Abstract: We present a hierarchical method for human pose estimation from a single still image. In our approach, a dependency graph representing relationships between reference points such as body joints is constructed and the positions of these reference points are sequentially estimated by a successive application of multidimensional output regressions along the dependency paths, starting from the root node. Each regressor takes image features computed from an image patch centered on the current node's position estimated by the previous regressor and is specialized for estimating its child nodes' positions. The use of the dependency graph allows us to decompose a complex pose estimation problem into a set of local pose estimation problems that are less complex. We design a dependency graph for two commonly used human pose estimation datasets, the Buffy Stickmen dataset and the ETHZ PASCAL Stickmen dataset, and demonstrate that our method achieves comparable accuracy to state-of-the-art results on both datasets with significantly lower computation time than existing methods. Furthermore, we propose an importance weighted boosted regression trees method for transductive learning settings and demonstrate the resulting improved performance for pose estimation tasks.

6 0.24890405 334 cvpr-2013-Pose from Flow and Flow from Pose

7 0.24484105 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation

8 0.2090296 40 cvpr-2013-An Approach to Pose-Based Action Recognition

9 0.20614763 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image

10 0.17679653 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

11 0.17447287 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts

12 0.16903083 444 cvpr-2013-Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross-Modality Regression Forest

13 0.16873182 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation

14 0.15753531 459 cvpr-2013-Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots

15 0.15075725 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

16 0.14128372 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images

17 0.1339224 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation

18 0.133443 165 cvpr-2013-Fast Energy Minimization Using Learned State Filters

19 0.12722066 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories

20 0.12558471 156 cvpr-2013-Exploring Compositional High Order Pattern Potentials for Structured Output Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.229), (1, -0.04), (2, 0.015), (3, -0.175), (4, 0.022), (5, 0.034), (6, 0.159), (7, 0.172), (8, 0.062), (9, -0.175), (10, -0.118), (11, 0.281), (12, -0.163), (13, 0.001), (14, -0.032), (15, 0.148), (16, 0.064), (17, -0.069), (18, -0.067), (19, -0.185), (20, -0.046), (21, 0.039), (22, -0.034), (23, -0.039), (24, -0.064), (25, 0.021), (26, -0.024), (27, 0.004), (28, -0.0), (29, -0.062), (30, 0.092), (31, -0.004), (32, 0.014), (33, -0.026), (34, 0.007), (35, -0.009), (36, -0.026), (37, 0.085), (38, 0.003), (39, 0.033), (40, 0.03), (41, -0.037), (42, 0.097), (43, 0.021), (44, -0.005), (45, -0.052), (46, 0.021), (47, 0.038), (48, -0.055), (49, -0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97134054 335 cvpr-2013-Poselet Conditioned Pictorial Structures

2 0.91393703 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors

Author: Matthias Dantone, Juergen Gall, Christian Leistner, Luc Van Gool

Abstract: In this work, we address the problem of estimating 2d human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have shown to be very successful in solving this task. Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. In particular, we employ two-layered random forests as joint regressors. The first layer acts as a discriminative, independent body part classifier. The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This results in a pose estimation framework that takes dependencies between body parts already for joint localization into account and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. In the experiments, we demonstrate that our body parts dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods.

3 0.85325658 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation

Author: Magnus Burenius, Josephine Sullivan, Stefan Carlsson

Abstract: We consider the problem of automatically estimating the 3D pose of humans from images, taken from multiple calibrated views. We show that it is possible and tractable to extend the pictorial structures framework, popular for 2D pose estimation, to 3D. We discuss how to use this framework to impose view, skeleton, joint angle and intersection constraints in 3D. The 3D pictorial structures are evaluated on multiple view data from a professional football game. The evaluation is focused on computational tractability, but we also demonstrate how a simple 2D part detector can be plugged into the framework.

4 0.85028356 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation

Author: Fang Wang, Yi Li

Abstract: Simple tree models for articulated objects have prevailed in the last decade. However, it is also believed that these simple tree models are not capable of capturing large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) are simple tree models sufficient? More specifically, 2) how to use tree models effectively in human pose estimation? and 3) how shall we use combined parts together with single parts efficiently? We assume a set of single parts and combined parts, where the goal is to estimate a joint distribution of their locations. We surprisingly find that no latent variables are introduced in the Leeds Sport Dataset (LSP) when learning latent trees for the deformable model, which aims at approximating the joint distributions of body part locations using a minimal tree structure. This suggests one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build Visual Categories of the combined parts, and then perform inference on the learned latent tree. Our method outperformed the state of the art on the LSP, both in the scenarios when the training images are from the same dataset and from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.

5 0.84830308 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation

Author: Ľ

Abstract: Our goal is to detect humans and estimate their 2D pose in single images. In particular, we aim to handle cases of partial visibility, where some limbs may be occluded or one person partially occludes another. Two standard, but disparate, approaches have developed in the field: the first is the part based approach for layout type problems, involving optimising an articulated pictorial structure; the second is the pixel based approach for image labelling involving optimising a random field graph defined on the image. Our novel contribution is a formulation for pose estimation which combines these two models in a principled way in one optimisation problem and thereby inherits the advantages of both of them. Inference on this joint model finds the set of instances of persons in an image, the location of their joints, and a pixel-wise body part labelling. We achieve near or state of the art results on standard human pose data sets, and demonstrate the correct estimation for cases of self-occlusion, person overlap and image truncation.

6 0.84331095 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers

7 0.81663185 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation

8 0.81300718 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image

9 0.80750972 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation

10 0.73817825 426 cvpr-2013-Tensor-Based Human Body Modeling

11 0.68515044 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts

12 0.65361816 334 cvpr-2013-Pose from Flow and Flow from Pose

13 0.65050668 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

14 0.64718962 40 cvpr-2013-An Approach to Pose-Based Action Recognition

15 0.64549148 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories

16 0.63313979 444 cvpr-2013-Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross-Modality Regression Forest

17 0.5569663 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images

18 0.47261566 459 cvpr-2013-Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots

19 0.44232431 120 cvpr-2013-Detecting and Naming Actors in Movies Using Generative Appearance Models

20 0.44120672 197 cvpr-2013-Hallucinated Humans as the Hidden Context for Labeling 3D Scenes


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.115), (16, 0.015), (26, 0.049), (33, 0.292), (67, 0.123), (69, 0.027), (80, 0.248), (87, 0.054)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90510297 272 cvpr-2013-Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery

Author: Rikke Gade, Anders Jørgensen, Thomas B. Moeslund

Abstract: This paper presents a robust occupancy analysis system for thermal imaging. Reliable detection of people is very hard in crowded scenes, due to occlusions and segmentation problems. We therefore propose a framework that optimises the occupancy analysis over long periods by including information on the transition in occupancy, when people enter or leave the monitored area. In stable periods, with no activity close to the borders, people are detected and counted, which contributes to a weighted histogram. When activity close to the border is detected, local tracking is applied in order to identify a crossing. After a full sequence, the number of people during all periods is estimated using a probabilistic graph search optimisation. The system is tested on a total of 51,000 frames, captured in sports arenas. The mean error for a 30-minute period containing 3-13 people is 4.44%, which is half of the error percentage obtained by detection only, and better than the results of comparable work. The framework is also tested on a publicly available dataset from an outdoor scene, which proves the generality of the method.

2 0.8821339 183 cvpr-2013-GRASP Recurring Patterns from a Single View

Author: Jingchen Liu, Yanxi Liu

Abstract: We propose a novel unsupervised method for discovering recurring patterns from a single view. A key contribution of our approach is the formulation and validation of a joint assignment optimization problem where multiple visual words and object instances of a potential recurring pattern are considered simultaneously. The optimization is achieved by a greedy randomized adaptive search procedure (GRASP) with moves specifically designed for fast convergence. We have quantified systematically the performance of our approach under stressed conditions of the input (missing features, geometric distortions). We demonstrate that our proposed algorithm outperforms state of the art methods for recurring pattern discovery on a diverse set of 400+ real world and synthesized test images.

same-paper 3 0.88159108 335 cvpr-2013-Poselet Conditioned Pictorial Structures

Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele

Abstract: In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. Modelling such higher order part dependencies seemingly comes at a cost of more expensive inference, which resulted in their limited use in state-of-the-art methods. In this paper we propose a model that incorporates higher order part dependencies while remaining efficient. We achieve this by defining a conditional model in which all body parts are connected a-priori, but which becomes a tractable tree-structured pictorial structures model once the image observations are available. In order to derive a set of conditioning variables we rely on the poselet-based features that have been shown to be effective for people detection but have so far found limited application for articulated human pose estimation. We demonstrate the effectiveness of our approach on three publicly available pose estimation benchmarks improving or being on-par with state of the art in each case.

4 0.85832614 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images

Author: Gaurav Sharma, Frédéric Jurie, Cordelia Schmid

Abstract: We propose a new model for recognizing human attributes (e.g. wearing a suit, sitting, short hair) and actions (e.g. running, riding a horse) in still images. The proposed model relies on a collection of part templates which are learnt discriminatively to explain specific scale-space locations in the images (in human centric coordinates). It avoids the limitations of highly structured models, which consist of a few (i.e. a mixture of) ‘average’ templates. To learn our model, we propose an algorithm which automatically mines out parts and learns corresponding discriminative templates with their respective locations from a large number of candidate parts. We validate the method on recent challenging datasets: (i) Willow 7 actions [7], (ii) 27 Human Attributes (HAT) [25], and (iii) Stanford 40 actions [37]. We obtain convincing qualitative and state-of-the-art quantitative results on the three datasets.

5 0.85519266 273 cvpr-2013-Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection

Author: Parthipan Siva, Chris Russell, Tao Xiang, Lourdes Agapito

Abstract: We propose a principled probabilistic formulation of object saliency as a sampling problem. This novel formulation allows us to learn, from a large corpus of unlabelled images, which patches of an image are of the greatest interest and most likely to correspond to an object. We then sample the object saliency map to propose object locations. We show that using only a single object location proposal per image, we are able to correctly select an object in over 42% of the images in the PASCAL VOC 2007 dataset, substantially outperforming existing approaches. Furthermore, we show that our object proposal can be used as a simple unsupervised approach to the weakly supervised annotation problem. Our simple unsupervised approach to annotating objects of interest in images achieves a higher annotation accuracy than most weakly supervised approaches.

6 0.8440423 210 cvpr-2013-Illumination Estimation Based on Bilayer Sparse Coding

7 0.83060592 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video

8 0.82826984 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors

9 0.82497191 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers

10 0.82072896 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation

11 0.81713933 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation

12 0.81571919 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

13 0.81480545 334 cvpr-2013-Pose from Flow and Flow from Pose

14 0.81323045 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors

15 0.81228161 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation

16 0.80889392 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation

17 0.8087694 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation

18 0.80755258 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection

19 0.80751508 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts

20 0.80644649 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval