
45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers


Source: pdf

Author: Georgia Gkioxari, Pablo Arbeláez, Lubomir Bourdev, Jitendra Malik

Abstract: We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image. For this purpose, we build on the notion of poselets [4] and train highly discriminative classifiers to differentiate among arm configurations, which we call armlets. We propose a rich representation which, in addition to standard HOG features, integrates the information of strong contours, skin color and contextual cues in a principled manner. Unlike existing methods, we evaluate our approach on a large subset of images from the PASCAL VOC detection dataset, where critical visual phenomena, such as occlusion, truncation, multiple instances and clutter are the norm. Our approach outperforms Yang and Ramanan [26], the state-of-the-art technique, with an improvement from 29.0% to 37.5% PCP accuracy on the arm keypoint prediction task, on this new pose estimation dataset.

Reference: text


Summary: the most important sentences generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 [...] solely on strong contours and edges will fail to detect the upper and lower parts of the arms. [sent-4, score-0.188]

2 Abstract: We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image. [sent-5, score-0.267]

3 For this purpose, we build on the notion of poselets [4] and train highly discriminative classifiers to differentiate among arm configurations, which we call armlets. [sent-6, score-0.552]

4 5% PCP accuracy on the arm keypoint prediction task, on this new pose estimation dataset. [sent-11, score-0.657]

5 We might aim to do so by fitting a stick figure model in one of its numerous manifestations [1, 7, 9, 19, 20] by detecting rectangles in the upper and lower parts of the arms. [sent-14, score-0.187]

6 It seems clear enough that how we detect the arm configurations of these people is from the position of the head and the hands. [sent-16, score-0.542]

7 Mori and Malik [16] matched whole body shapes of figures to exemplars using shape contexts and then transferred keypoint locations from exemplars to the test image. [sent-19, score-0.176]

8 [22] recovered the articulated pose of the human upper body by defining parameter-sensitive hash functions to retrieve similar examples from the training set. [sent-21, score-0.237]

9 During training, we partition the space of keypoints and train models for arm configurations, or armlets, using linear SVMs. [sent-28, score-0.501]

10 Given a test image, we apply the trained models and use the mean predictions of the highest scoring activation to estimate the location of the joints. [sent-29, score-0.427]

11 To train the armlets we extract features that could capture the necessary cues for accurate arm keypoint predictions. [sent-30, score-0.806]

12 Given the trained armlets and an image, we consider the highest scoring armlet activation assigned to each instance. [sent-43, score-1.18]

13 From the predictions computed during training, we estimate the locations of the arm keypoints. [sent-44, score-0.493]

14 To further refine an armlet’s joint location prediction we train shoulder, elbow and wrist detectors that are able to localize the joints more concretely, conditioned on an armlet activation. [sent-51, score-0.923]

15 Related Work: The direction of representing the human body pose using stick figures was initially explored by Nevatia and Binford [18], where the body parts were modelled using generalized cylinders. [sent-61, score-0.281]

16 Far right: Our context feature; at each cell we have a poselet activation feature vector. [sent-67, score-0.231]

17 For each poselet type, we put the maximum of the scores of all poselet activations of that type whose center falls in the cell, or zero if no activations are present. [sent-68, score-0.471]

18 [21] allow for richer appearance models, including contour and segmentation cues, by learning a cascade of pictorial structures of increasing pose resolution, which progressively filter the pose state space. [sent-84, score-0.195]

19 Our work is close to [25], since we also partition the keypoint configuration space to extract parts that are subsequently trained and used for recognition. [sent-97, score-0.304]

20 Datasets: Commonly used datasets for human pose estimation from 2D images, the Parse dataset [19], the Buffy dataset [11] and the PASCAL stickmen dataset [7], suffer from two significant problems: size and limitation of annotations. [sent-101, score-0.195]

21 The other fundamental problem with these datasets is that the joints are annotated in the image coordinate system, meaning that a joint is labeled as ‘left’ if it is leftmost in the image and not if it is the left joint of the person in question. [sent-104, score-0.175]

22 We regard our dataset as complementary to the other “big” dataset for human pose estimation, the Leeds Sports Pose dataset [13, 14]. [sent-114, score-0.168]

23 Training armlets: In this section, we describe the procedure for selecting and training highly discriminative poselets to differentiate arm configurations. [sent-123, score-0.788]

24 Partitioning of the Configuration Space: We create lists of positive examples by partitioning the arm configuration space. [sent-129, score-0.561]

25 The space consists of the keypoint configuration of one arm, as well as the position of the opposite shoulder. [sent-130, score-0.219]

26 This configuration space captures both the arm configuration as well as the 3D orientation of the torso. [sent-131, score-0.627]

27 For example, an arm stretched downwards can be described by the location of the arm keypoints and the relative location of the opposite shoulder captures whether the person is front or back facing. [sent-132, score-1.067]

28 By defining a distance function d(p, q) for p, q in the configuration space, we can quantitatively measure the similarity of two arm configurations. [sent-133, score-0.513]

29 Examples of our different arm configurations resulting from the partitioning of the right arm configuration space. [sent-177, score-0.513]

30 We collect patches of arm configurations by partitioning the configuration space as described above. [sent-182, score-0.633]

31 We center the patch of a configuration p around the location of the elbow and scale it by 2σp, where σp is defined in Eq. [sent-183, score-0.232]

32 We sort the examples in each armlet according to the distance function in Eq. [sent-185, score-0.663]

33 We obtain 25 different arm configurations for each arm after partitioning the corresponding configuration space. [sent-187, score-0.982]

34 Figure 4 shows four right arm configurations out of the 25. [sent-189, score-0.469]

35 Strong responses of a skin detector indicate where the head and the hands of a person in an image are likely to be located and thus eliminate the large number of possible arm configurations. [sent-207, score-0.677]

36 We generated our training data from skin patches using the H3D dataset [4]. [sent-209, score-0.202]

37 The location of the head, the torso, their orientation and scale is significant in detecting an arm configuration. [sent-215, score-0.492]

38 For example, it is much easier to detect where the right arm is if we know where the head is, where the torso is and whether the person is facing front or back. [sent-216, score-0.562]

39 To encode that information, we use generic poselets [4], trained for the purpose of person detection. [sent-217, score-0.182]

40 For each 8 × 8 pixel cell, we define an N-dimensional activation vector that contains in its i-th entry the score of the i-th detection-specific poselet, if the center of the activation is located within radius r (r = 8) from the center of the cell, and 0 otherwise. [sent-220, score-0.31]

41 We show the local gradient and the gPb contours used to construct the HOG features, the output of the skin detector and detection-specific poselet activations used to encode context. [sent-225, score-0.388]
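
The context feature described above can be sketched as follows. This is a minimal illustration only, not the authors' code: the function name `context_features`, the tuple layout of an activation, and the choice to keep the maximum score when several activations of one type fall within the radius are assumptions, and scores are assumed to be non-negative.

```python
import math


def context_features(activations, num_types, width, height, cell=8, radius=8.0):
    """Build one N-dimensional poselet score vector per image cell.

    activations: iterable of (poselet_type, score, cx, cy), where (cx, cy)
    is the activation center in pixels (hypothetical layout).
    Returns grid[row][col] -> list of num_types scores.
    """
    cols = math.ceil(width / cell)
    rows = math.ceil(height / cell)
    grid = [[[0.0] * num_types for _ in range(cols)] for _ in range(rows)]
    for row in range(rows):
        for col in range(cols):
            # Center of the current cell in pixel coordinates.
            ccx = (col + 0.5) * cell
            ccy = (row + 0.5) * cell
            for ptype, score, cx, cy in activations:
                if math.hypot(cx - ccx, cy - ccy) <= radius:
                    # Assumption: keep the strongest activation per type.
                    grid[row][col][ptype] = max(grid[row][col][ptype], score)
    return grid
```

Entries stay at 0 for poselet types with no activation near the cell, which matches the "0 otherwise" rule in the sentence above.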

42 Since we want our armlets to be discriminative and fine-grained, we use negative images coming from people but with different arm configurations. [sent-231, score-0.708]

43 An instance with keypoint configuration q is considered as a negative example for an armlet α with a seed patch of configuration center_α, if $d(\mathrm{center}_\alpha, q) > 2 \cdot \max_{i \in \{1, \dots, N_\alpha\}} d(\mathrm{center}_\alpha, p_i)$, [sent-232, score-1.001]

44 where $p_i$ is the i-th member of armlet α consisting of $N_\alpha$ members. [sent-235, score-0.639]
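
The negative-selection rule can be illustrated with a short sketch. The distance function d(p, q) is not reproduced in this summary, so `config_distance` below is a hypothetical Euclidean stand-in, and all names are illustrative.

```python
import math


def config_distance(p, q):
    """Hypothetical stand-in for d(p, q): Euclidean distance between two
    flattened keypoint configurations of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def is_negative(center, members, q, dist=config_distance):
    """Return True if q lies farther from the seed configuration than
    twice the distance of the farthest member of the armlet."""
    radius = max(dist(center, p) for p in members)
    return dist(center, q) > 2.0 * radius
```

With this rule, configurations that are merely near the boundary of the cluster are used neither as positives nor as negatives.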

45 For each armlet α, we model the distribution of the location of each joint J by fitting a gaussian. [sent-236, score-0.724]

46 The distribution of the location x for joint J conditioned on an activation αi of armlet α is given by $P_m(x \mid \alpha_i) = \mathcal{N}(x;\, \mu_\alpha^{(J)}, \Sigma_\alpha^{(J)})$ (4) [sent-237, score-0.959]

47 where $\mu_\alpha^{(J)}$ is the mean location of J and $\Sigma_\alpha^{(J)}$ is the covariance matrix, conditioned on activation αi. [sent-239, score-0.28]
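
Fitting the per-joint Gaussian amounts to estimating a 2D mean and covariance from the joint locations of an armlet's positive examples. The sketch below assumes locations are already expressed in the coordinate frame of the armlet patch and uses the unbiased (n - 1) estimator; both choices are assumptions, and `fit_joint_gaussian` is an illustrative name.

```python
def fit_joint_gaussian(points):
    """Estimate (mean, covariance) of 2D joint locations.

    points: list of (x, y) positions, one per positive example of an armlet.
    Returns ((mx, my), ((sxx, sxy), (sxy, syy))).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two examples to estimate a covariance")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Unbiased sample covariance (assumption: the source does not specify).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))
```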

48 Keypoint prediction at test time: To get the armlet activations for an input image, we apply the trained model at multiple scales and keep the activations with non-negative scores. [sent-243, score-1.046]

49 For the task of keypoint prediction, we cluster the activations to the instances in the image. [sent-244, score-0.322]

50 We associate an activation to the instance with the biggest overlap with the predicted torso bounds and if that is greater than 0. [sent-246, score-0.324]

51 Subsequently, for every instance in the image we consider the activation with the highest score assigned to that instance and use its mean prediction for the location of the arm keypoints. [sent-248, score-0.778]

52 In other words, if βi∗ is the activation with the highest score, which is of armlet type β, then the final prediction for joint J is given by $\mu_\beta^{(J)}$. [sent-249, score-0.968]
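
The test-time procedure (keep non-negative activations, assign each to the best-overlapping instance, then take the top-scoring activation per instance) can be sketched as below. The overlap measure (intersection over union), the dictionary layout, and the `min_overlap` parameter are assumptions; the actual threshold is truncated in this summary and is deliberately left as an argument.

```python
def box_overlap(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def predict_keypoints(activations, instances, mean_joints, min_overlap):
    """Pick, for every person instance, the mean joint prediction of its
    highest scoring armlet activation.

    activations: list of dicts with keys 'armlet', 'score', 'torso'
                 ('torso' is the torso box predicted by the activation).
    instances:   list of torso boxes, one per person.
    mean_joints: mapping armlet id -> mean joint locations of that armlet.
    """
    if not instances:
        return {}
    best = {}
    for act in activations:
        if act["score"] < 0:
            continue  # keep only non-negative scores
        overlaps = [box_overlap(act["torso"], box) for box in instances]
        idx = max(range(len(instances)), key=overlaps.__getitem__)
        if overlaps[idx] <= min_overlap:
            continue  # not close enough to any instance
        if idx not in best or act["score"] > best[idx]["score"]:
            best[idx] = act
    return {idx: mean_joints[act["armlet"]] for idx, act in best.items()}
```

In a full system the mean joint locations would be mapped from the armlet's reference frame into the image using the position and scale of the activation; that step is omitted here.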

53 Results using armlets: In this section we report the performance of armlets and compare it with Yang and Ramanan [26]. [sent-251, score-0.458]

54 It is clear that our approach of using gPb contours, skin and detection-specific poselets for context leads to a significant improvement over the standard HOG. [sent-264, score-0.226]

55 Augmented Armlets: The armlets described above are trained to discriminate among different arm configurations. [sent-312, score-0.685]

56 To capture the appearance of smaller areas around the joints we train three different poselets to detect the shoulder, the elbow and the wrist, specific to each of the 50 arm configurations. [sent-313, score-0.637]

57 Training augmented armlets: Each armlet can be considered as a root filter and the shoulderlet, the elbowlet and the wristlet are connected to [...] Figure 5. [sent-317, score-1.067]

58 HOG templates for the shoulderlet, the elbowlet and the wristlet for armlet 3 superimposed on a positive example of that armlet. [sent-318, score-0.811]

59 the armlet activation is observed, in contrast to [10] where the location of the parts is treated as a latent variable. [sent-323, score-0.916]

60 We extract rectangular patches from the positive examples of each armlet type. [sent-324, score-0.712]

61 The patches are centered at the keypoint of interest and at double the scale of the original positive example. [sent-325, score-0.178]

62 Since the patches come from similar arm configurations, defined by the armlet type, they are aligned, allowing for the use of rigid features such as HOG. [sent-326, score-1.087]

63 We use instance-specific skin color models, which are GMMs with 5 components fitted on the LAB pixel values corresponding to the predicted face region of each instance as dictated by the detection-specific poselet activations. [sent-331, score-0.259]

64 Subsequently, we learn a linear SVM using as negatives 64 × 64 sized patches coming from the positive armlet examples but not centered close to the joint of interest. [sent-332, score-0.774]

65 After the first round of training, we re-estimate the positive patches by running the detector in a small neighborhood around the original keypoint location. [sent-333, score-0.178]

66 This allows for some small variations in the alignment of the examples coming from the armlet clustering and results in a better alignment of the actual parts to be trained. [sent-334, score-0.715]

67 Figure 5 shows an example of an armlet along with the HOG templates for the shoulderlet, the elbowlet and the wristlet. [sent-337, score-0.721]

68 Augmented armlet activations: We can use the activations of the shoulderlet, the elbowlet and the wristlet to rescore the original armlet activations. [sent-340, score-1.771]

69 Strong part activations might indicate that the right armlet has indeed fired while weak part activations indicate a false positive activation. [sent-341, score-0.981]

70 Recall that for each armlet α, we computed the mean relative location and the standard deviation of the three arm keypoints from the positive examples of that armlet. [sent-342, score-1.208]

71 Given an activation of that particular armlet, these locations give a rough estimate of where the joints might be located within the bounding box of the activation. [sent-343, score-0.274]

72 We define an area of interest for each keypoint centered at the mean location and extending twice the empirical standard deviation. [sent-344, score-0.174]

73 For each part, we detect its activations within the area of interest and record the highest scoring activation. [sent-345, score-0.236]
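
The search for the best part activation inside the area of interest can be sketched as follows. The axis-aligned box of plus or minus two standard deviations per axis is an assumed reading of "extending twice the empirical standard deviation", and the tuple layout is hypothetical.

```python
def best_part_activation(part_activations, mean, std, extent=2.0):
    """Return the highest scoring part activation inside the area of interest.

    part_activations: list of (score, x, y) for one part (e.g. the elbowlet).
    mean, std: (x, y) mean location and per-axis standard deviation of the
               joint for the armlet, in the same coordinates as the activations.
    Returns the best (score, x, y), or None if nothing falls inside.
    """
    inside = [
        a for a in part_activations
        if abs(a[1] - mean[0]) <= extent * std[0]
        and abs(a[2] - mean[1]) <= extent * std[1]
    ]
    return max(inside, key=lambda a: a[0]) if inside else None
```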

74 In other words, each armlet activation αi is now described by its original detection score sαi as well as the three maximum part activation scores , J = 1, 2, 3 corresponding to the three arm keypoints. [sent-346, score-1.466]

75 Let us call vαi the part activation vector which contains those four scores of the armlet activation αi. [sent-347, score-1.043]

76 For each armlet α, we can train a linear SVM wα with positives Pα = {vαi | i ∈ TP(α)}, where TP(α) is the set of true positive activations of armlet α, and negatives Nα = {vαi | i ∈ FP(α)}, where FP(α) is the set of false positive activations of armlet α. [sent-348, score-0.712]

77 [...] activation αi has subsequently an activation score σ? [sent-351, score-0.224]
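
A small sketch of the rescoring data flow: building the four-dimensional part activation vector, splitting vectors into positives and negatives, and applying a linear scorer. The SVM training itself is omitted; the weights passed to `rescore` are placeholders, and the handling of a missing part activation (`missing=0.0`) is an assumption not stated in the source.

```python
def part_activation_vector(armlet_score, part_scores, missing=0.0):
    """Concatenate the armlet score with the maximum shoulderlet, elbowlet
    and wristlet scores (None marks a part with no activation)."""
    if len(part_scores) != 3:
        raise ValueError("expected scores for shoulder, elbow and wrist")
    return [armlet_score] + [missing if s is None else s for s in part_scores]


def split_training_vectors(vectors, is_true_positive):
    """Split activation vectors into SVM positives (true positive
    activations) and negatives (false positive activations)."""
    positives = [v for v, tp in zip(vectors, is_true_positive) if tp]
    negatives = [v for v, tp in zip(vectors, is_true_positive) if not tp]
    return positives, negatives


def rescore(vector, weights, bias=0.0):
    """Linear score w . v + b; weights would come from the trained SVM."""
    return sum(w * v for w, v in zip(weights, vector)) + bias
```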

78 Using the augmented armlets for keypoint predictions: The activations of the shoulderlets, the elbowlets and the wristlets can also be used for improved predictions of the location of the corresponding joints. [sent-358, score-0.753]

79 Assume α is an armlet and αi is the i-th activation of that armlet in an image I. [sent-359, score-1.48]

80 The score of a part activation at location x of the trained model for joint J, after fitting a logistic on the SVM scores, can be interpreted as the confidence of the part model at that location. [sent-362, score-0.32]

81 The predicted location of part J conditioned on the activation αi is given by s(αJi) x∗(J) = argmax_x Pm(x | αi) · P? [sent-367, score-0.301]

82 We can use the shoulderlets, elbowlets and wristlets trained for each armlet to rescore the activations on the test set as well as make keypoint predictions, as described in Section 6. [sent-371, score-1.135]
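
The equation above is truncated in this summary, so the following is only a sketch under stated assumptions: the second factor is taken to be the logistic-calibrated part score mentioned in the preceding sentence, the logistic parameters `a` and `b` are hypothetical, and the maximization runs over a finite list of candidate locations.

```python
import math


def gaussian_density(x, mean, cov):
    """Density of a 2D Gaussian with covariance ((a, b), (c, d))."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form with the inverse of the 2x2 covariance matrix.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def logistic(score, a=1.0, b=0.0):
    """Map an SVM score to a confidence in (0, 1); a and b are assumed
    to come from a fit on held-out scores."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def predict_joint(candidates, mean, cov, a=1.0, b=0.0):
    """candidates: list of ((x, y), svm_score) part activations.
    Returns the location maximizing Gaussian prior times part confidence."""
    best = max(
        candidates,
        key=lambda c: gaussian_density(c[0], mean, cov) * logistic(c[1], a, b),
    )
    return best[0]
```

The product trades off where the armlet expects the joint against where the part detector fires most confidently, so a strong detection far from the expected location is still suppressed.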

83 The first column shows the performance after picking the highest scoring armlet activation, as described in Section 5. [sent-373, score-0.739]

84 The second column shows the performance after picking the maximum scoring activation of the augmented armlets to make predictions for the joints using the mean relative locations. [sent-395, score-0.673]

85 The third column shows the performance after using the highest scoring activation of the augmented armlets to make a prediction using the posterior probability (Eq 6). [sent-396, score-0.615]

86 Figures 7 and 8 show examples of correct right and left arm keypoint predictions, respectively. [sent-400, score-0.576]

87 The red stick corresponds to the upper arm and the blue to the lower arm of that instance. [sent-402, score-1.003]

88 Figure 9 shows incorrect keypoint predictions for the right and left arm corresponding to the instance highlighted in green. [sent-404, score-0.674]

89 Discussion: We propose a straightforward yet effective framework for training arm-specific poselets for the task of joint position estimation and we show experimentally that it gives superior results on a challenging dataset. [sent-406, score-0.599]

90 8% for Lower Arms [...] corresponds to the upper arm and blue to the lower arm of the person highlighted in green. [sent-409, score-1.019]

91 (best viewed in color) [...] corresponds to the upper arm and blue to the lower arm of the person highlighted in green. [sent-410, score-1.019]

92 Examples of incorrect keypoint predictions for the right arm (top) and left arm (bottom) . [sent-412, score-1.045]

93 Red corresponds to the upper arm and blue to the lower arm of the person highlighted in green. [sent-413, score-1.019]

94 Thus, we can compute the PCP accuracy per armlet type. Figure 10. [sent-416, score-0.664]

95 PCP localization accuracy per armlet type for upper arm (top left) and for lower arm (top right), where red indicates the performance of our approach while blue the performance by Yang and Ramanan [26]. [sent-417, score-1.609]

96 The number of training examples per armlet type is shown in the bottom. [sent-418, score-0.72]

97 Figure 10 shows PCP accuracy per armlet type on the test set for the upper arm (top left) and for the lower arm (top right), as well as the number of training examples per armlet type (bottom). [sent-421, score-2.329]

98 These plots show that our approach dominates Y&R's for most armlet types on the test set, and also reveal that both methods are strongly correlated with the amount of training data. [sent-422, score-0.671]

99 In particular, the Pearson’s correlation coefficient between the number of training examples and the PCP accuracy for the upper arm is 0. [sent-423, score-0.546]
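
For reference, Pearson's correlation coefficient between two equally long lists (for example, training examples per armlet type and PCP accuracy per armlet type) can be computed as in the sketch below. The function name is illustrative, and no values from the paper are reproduced here.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)
```

A value near 1 would indicate that armlet types with more training examples tend to be localized more accurately.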

100 Clustered pose and nonlinear appearance models for human pose estimation. [sent-522, score-0.171]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('armlet', 0.639), ('arm', 0.423), ('armlets', 0.229), ('activation', 0.202), ('pcp', 0.16), ('activations', 0.159), ('keypoint', 0.129), ('skin', 0.122), ('lscg', 0.116), ('poselets', 0.104), ('configuration', 0.09), ('elbowlet', 0.082), ('ramanan', 0.077), ('lsp', 0.073), ('pascal', 0.072), ('pose', 0.072), ('predictions', 0.07), ('psm', 0.067), ('upper', 0.067), ('lowerarm', 0.066), ('shoulderlet', 0.066), ('wristlet', 0.066), ('gpb', 0.062), ('contours', 0.059), ('stick', 0.058), ('torso', 0.055), ('hog', 0.054), ('keypoints', 0.053), ('arms', 0.051), ('augmented', 0.051), ('pictorial', 0.051), ('joints', 0.05), ('partioning', 0.049), ('voc', 0.049), ('scoring', 0.048), ('poselet', 0.048), ('body', 0.047), ('nms', 0.046), ('configurations', 0.046), ('location', 0.045), ('person', 0.045), ('joint', 0.04), ('head', 0.039), ('mori', 0.038), ('xip', 0.036), ('yip', 0.036), ('articulated', 0.035), ('elbow', 0.035), ('instances', 0.034), ('people', 0.034), ('trained', 0.033), ('prediction', 0.033), ('athletic', 0.033), ('backfacing', 0.033), ('frontfacing', 0.033), ('shoulderlets', 0.033), ('upperarm', 0.033), ('conditioned', 0.033), ('shoulder', 0.033), ('training', 0.032), ('lower', 0.032), ('center', 0.032), ('parts', 0.03), ('patch', 0.03), ('highlighted', 0.029), ('cell', 0.029), ('highest', 0.029), ('yang', 0.027), ('human', 0.027), ('ablation', 0.027), ('rescore', 0.027), ('stickmen', 0.027), ('lj', 0.026), ('hands', 0.026), ('type', 0.025), ('train', 0.025), ('patches', 0.025), ('examples', 0.024), ('orientation', 0.024), ('positive', 0.024), ('column', 0.023), ('nevatia', 0.023), ('wrist', 0.023), ('instance', 0.023), ('dataset', 0.023), ('bounds', 0.023), ('non', 0.023), ('subsequently', 0.022), ('coming', 0.022), ('pixel', 0.022), ('located', 0.022), ('parse', 0.022), ('buffy', 0.022), ('shakhnarovich', 0.022), ('johnson', 0.021), ('predicted', 0.021), ('gradients', 0.02), ('sports', 0.02), ('eichner', 0.02), ('berkeley', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers

Author: Georgia Gkioxari, Pablo Arbeláez, Lubomir Bourdev, Jitendra Malik

Abstract: We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image. For this purpose, we build on the notion of poselets [4] and train highly discriminative classifiers to differentiate among arm configurations, which we call armlets. We propose a rich representation which, in addition to standard HOG features, integrates the information of strong contours, skin color and contextual cues in a principled manner. Unlike existing methods, we evaluate our approach on a large subset of images from the PASCAL VOC detection dataset, where critical visual phenomena, such as occlusion, truncation, multiple instances and clutter are the norm. Our approach outperforms Yang and Ramanan [26], the state-of-the-art technique, with an improvement from 29.0% to 37.5% PCP accuracy on the arm keypoint prediction task, on this new pose estimation dataset.

2 0.26026529 335 cvpr-2013-Poselet Conditioned Pictorial Structures

Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele

Abstract: In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. Modelling such higher order part dependencies seemingly comes at a cost of more expensive inference, which resulted in their limited use in state-of-the-art methods. In this paper we propose a model that incorporates higher order part dependencies while remaining efficient. We achieve this by defining a conditional model in which all body parts are connected a-priori, but which becomes a tractable tree-structured pictorial structures model once the image observations are available. In order to derive a set of conditioning variables we rely on the poselet-based features that have been shown to be effective for people detection but have so far found limited application for articulated human pose estimation. We demonstrate the effectiveness of our approach on three publicly available pose estimation benchmarks improving or being on-par with state of the art in each case.

3 0.21145597 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation

Author: Fang Wang, Yi Li

Abstract: Simple tree models for articulated objects prevails in the last decade. However, it is also believed that these simple tree models are not capable of capturing large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) are simple tree models sufficient? more specifically, 2) how to use tree models effectively in human pose estimation? and 3) how shall we use combined parts together with single parts efficiently? Assuming we have a set of single parts and combined parts, and the goal is to estimate a joint distribution of their locations. We surprisingly find that no latent variables are introduced in the Leeds Sport Dataset (LSP) during learning latent trees for deformable model, which aims at approximating the joint distributions of body part locations using minimal tree structure. This suggests one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build Visual Categories of the combined parts, and then perform inference on the learned latent tree. Our method outperformed the state of the art on the LSP, both in the scenarios when the training images are from the same dataset and from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.

4 0.16347811 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation

Author: Kota Hara, Rama Chellappa

Abstract: We present a hierarchical method for human pose estimation from a single still image. In our approach, a dependency graph representing relationships between reference points such as body joints is constructed and the positions of these reference points are sequentially estimated by a successive application of multidimensional output regressions along the dependency paths, starting from the root node. Each regressor takes image features computed from an image patch centered on the current node's position estimated by the previous regressor and is specialized for estimating its child nodes' positions. The use of the dependency graph allows us to decompose a complex pose estimation problem into a set of local pose estimation problems that are less complex. We design a dependency graph for two commonly used human pose estimation datasets, the Buffy Stickmen dataset and the ETHZ PASCAL Stickmen dataset, and demonstrate that our method achieves comparable accuracy to state-of-the-art results on both datasets with significantly lower computation time than existing methods. Furthermore, we propose an importance weighted boosted regression trees method for transductive learning settings and demonstrate the resulting improved performance for pose estimation tasks.

5 0.15590103 334 cvpr-2013-Pose from Flow and Flow from Pose

Author: Katerina Fragkiadaki, Han Hu, Jianbo Shi

Abstract: Human pose detectors, although successful in localising faces and torsos of people, often fail with lower arms. Motion estimation is often inaccurate under fast movements of body parts. We build a segmentation-detection algorithm that mediates the information between body parts recognition, and multi-frame motion grouping to improve both pose detection and tracking. Motion of body parts, though not accurate, is often sufficient to segment them from their backgrounds. Such segmentations are crucial for extracting hard to detect body parts out of their interior body clutter. By matching these segments to exemplars we obtain pose labeled body segments. The pose labeled segments and corresponding articulated joints are used to improve the motion flow fields by proposing kinematically constrained affine displacements on body parts. The pose-based articulated motion model is shown to handle large limb rotations and displacements. Our algorithm can detect people under rare poses, frequently missed by pose detectors, showing the benefits of jointly reasoning about pose, segmentation and motion in videos.

6 0.14280315 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors

7 0.13861229 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation

8 0.12756844 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

9 0.12635748 40 cvpr-2013-An Approach to Pose-Based Action Recognition

10 0.12047967 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image

11 0.10277237 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts

12 0.091763712 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images

13 0.087837785 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation

14 0.08661326 459 cvpr-2013-Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots

15 0.086496316 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

16 0.083100684 332 cvpr-2013-Pixel-Level Hand Detection in Ego-centric Videos

17 0.07699652 325 cvpr-2013-Part Discovery from Partial Correspondence

18 0.076073401 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations

19 0.07490097 444 cvpr-2013-Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross-Modality Regression Forest

20 0.070417643 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.159), (1, -0.044), (2, 0.013), (3, -0.104), (4, 0.019), (5, 0.02), (6, 0.095), (7, 0.101), (8, 0.044), (9, -0.118), (10, -0.104), (11, 0.123), (12, -0.065), (13, -0.018), (14, 0.016), (15, 0.053), (16, 0.032), (17, -0.08), (18, -0.055), (19, -0.085), (20, -0.013), (21, 0.061), (22, -0.014), (23, -0.06), (24, -0.051), (25, 0.05), (26, -0.01), (27, -0.04), (28, 0.035), (29, -0.003), (30, 0.079), (31, -0.006), (32, -0.003), (33, 0.023), (34, 0.049), (35, -0.038), (36, 0.006), (37, 0.068), (38, -0.035), (39, -0.01), (40, 0.047), (41, -0.066), (42, 0.078), (43, 0.033), (44, -0.024), (45, -0.041), (46, 0.063), (47, -0.012), (48, -0.045), (49, 0.004)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93440288 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers

Author: Georgia Gkioxari, Pablo Arbeláez, Lubomir Bourdev, Jitendra Malik

Abstract: We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image. For this purpose, we build on the notion of poselets [4] and train highly discriminative classifiers to differentiate among arm configurations, which we call armlets. We propose a rich representation which, in addition to standard HOG features, integrates the information of strong contours, skin color and contextual cues in a principled manner. Unlike existing methods, we evaluate our approach on a large subset of images from the PASCAL VOC detection dataset, where critical visual phenomena, such as occlusion, truncation, multiple instances and clutter are the norm. Our approach outperforms Yang and Ramanan [26], the state-of-the-art technique, with an improvement from 29.0% to 37.5% PCP accuracy on the arm keypoint prediction task, on this new pose estimation dataset.

2 0.88711971 335 cvpr-2013-Poselet Conditioned Pictorial Structures

Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele

Abstract: In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. Modelling such higher order part dependencies seemingly comes at a cost of more expensive inference, which resulted in their limited use in state-of-the-art methods. In this paper we propose a model that incorporates higher order part dependencies while remaining efficient. We achieve this by defining a conditional model in which all body parts are connected a-priori, but which becomes a tractable tree-structured pictorial structures model once the image observations are available. In order to derive a set of conditioning variables we rely on the poselet-based features that have been shown to be effective for people detection but have so far found limited application for articulated human pose estimation. We demonstrate the effectiveness of our approach on three publicly available pose estimation benchmarks improving or being on-par with state of the art in each case.

3 0.85955304 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors

Author: Matthias Dantone, Juergen Gall, Christian Leistner, Luc Van Gool

Abstract: In this work, we address the problem of estimating 2D human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have been shown to be very successful in solving this task. Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. In particular, we employ two-layered random forests as joint regressors. The first layer acts as a discriminative, independent body part classifier. The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This results in a pose estimation framework that already takes dependencies between body parts into account for joint localization and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. In the experiments, we demonstrate that our body parts dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods.

4 0.83690238 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation

Author: Fang Wang, Yi Li

Abstract: Simple tree models for articulated objects have prevailed over the last decade. However, it is also believed that these simple tree models are not capable of capturing large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) are simple tree models sufficient? More specifically, 2) how can tree models be used effectively in human pose estimation? and 3) how should combined parts be used together with single parts efficiently? We assume that we have a set of single parts and combined parts, and that the goal is to estimate a joint distribution of their locations. We surprisingly find that no latent variables are introduced on the Leeds Sport Dataset (LSP) when learning latent trees for the deformable model, which aims at approximating the joint distributions of body part locations using a minimal tree structure. This suggests one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build Visual Categories of the combined parts, and then perform inference on the learned latent tree. Our method outperformed the state of the art on the LSP, both when the training images are from the same dataset and when they are from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.

5 0.80891788 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation

Author: Ben Sapp, Ben Taskar

Abstract: We propose a multimodal, decomposable model for articulated human pose estimation in monocular images. A typical approach to this problem is to use a linear structured model, which struggles to capture the wide range of appearance present in realistic, unconstrained images. In this paper, we instead propose a model of human pose that explicitly captures a variety of pose modes. Unlike other multimodal models, our approach includes both global and local pose cues and uses a convex objective and joint training for mode selection and pose estimation. We also employ a cascaded mode selection step which controls the trade-off between speed and accuracy, yielding a 5x speedup in inference and learning. Our model outperforms state-of-the-art approaches across the accuracy-speed trade-off curve for several pose datasets. This includes our newly-collected dataset of people in movies, FLIC, which contains an order of magnitude more labeled data for training and testing than existing datasets. The new dataset and code are available online.

6 0.80169278 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation

7 0.7946378 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation

8 0.79124528 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image

9 0.78894037 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation

10 0.6864894 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

11 0.67108345 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts

12 0.66636217 426 cvpr-2013-Tensor-Based Human Body Modeling

13 0.65323597 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories

14 0.65059859 334 cvpr-2013-Pose from Flow and Flow from Pose

15 0.63123637 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images

16 0.62665427 40 cvpr-2013-An Approach to Pose-Based Action Recognition

17 0.5781495 444 cvpr-2013-Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross-Modality Regression Forest

18 0.54294932 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

19 0.51400977 120 cvpr-2013-Detecting and Naming Actors in Movies Using Generative Appearance Models

20 0.50407922 325 cvpr-2013-Part Discovery from Partial Correspondence


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.082), (16, 0.017), (26, 0.031), (30, 0.013), (33, 0.208), (67, 0.41), (69, 0.036), (80, 0.055), (87, 0.055)]
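
The topic vector above is stored sparsely, as (topicId, topicWeight) pairs. The listing does not say how simValue is derived from such vectors, so the following is only a minimal sketch of one common choice, cosine similarity, written in Python. The function name sparse_cosine and the other_paper vector are hypothetical and exist only for illustration; the weights in other_paper are made up.

```python
import math

# Sparse LDA topic vector for this paper, copied from the listing above.
this_paper = [(10, 0.082), (16, 0.017), (26, 0.031), (30, 0.013), (33, 0.208),
              (67, 0.41), (69, 0.036), (80, 0.055), (87, 0.055)]


def sparse_cosine(a, b):
    """Cosine similarity between two sparse (topicId, weight) lists.

    Assumption: cosine similarity is used here as an example measure;
    the source does not state which measure produced simValue.
    """
    da, db = dict(a), dict(b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * db[t] for t, w in da.items() if t in db)
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical topic vector for another paper (made-up weights).
other_paper = [(33, 0.30), (67, 0.25), (80, 0.10)]

print(sparse_cosine(this_paper, this_paper))
print(sparse_cosine(this_paper, other_paper))
```

Under plain cosine similarity a vector compared with itself scores 1, whereas the list below reports 0.8449111 for the same-paper entry, so the measure behind simValue evidently differs from this sketch in some way.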

similar papers list:

simIndex simValue paperId paperTitle

1 0.93034226 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video

Author: Pramod Sharma, Ram Nevatia

Abstract: In this work, we present a novel and efficient detector adaptation method which improves the performance of an offline trained classifier (baseline classifier) by adapting it to new test datasets. We address two critical aspects of adaptation methods: generalizability and computational efficiency. We propose an adaptation method which can be applied to various baseline classifiers and is also computationally efficient. For a given test video, we collect online samples in an unsupervised manner and train a random fern adaptive classifier. The adaptive classifier improves the precision of the baseline classifier by validating the detection responses obtained from the baseline classifier as correct detections or false alarms. Experiments demonstrate the generalizability, computational efficiency and effectiveness of our method, as we compare it with state-of-the-art approaches for the problem of human detection and show good performance with high computational efficiency on two different baseline classifiers.

2 0.92384124 103 cvpr-2013-Decoding Children's Social Behavior

Author: James M. Rehg, Gregory D. Abowd, Agata Rozga, Mario Romero, Mark A. Clements, Stan Sclaroff, Irfan Essa, Opal Y. Ousley, Yin Li, Chanho Kim, Hrishikesh Rao, Jonathan C. Kim, Liliana Lo Presti, Jianming Zhang, Denis Lantsman, Jonathan Bidwell, Zhefan Ye

Abstract: We introduce a new problem domain for activity recognition: the analysis of children's social and communicative behaviors based on video and audio data. We specifically target interactions between children aged 1–2 years and an adult. Such interactions arise naturally in the diagnosis and treatment of developmental disorders such as autism. We introduce a new publicly-available dataset containing over 160 sessions of a 3–5 minute child-adult interaction. In each session, the adult examiner followed a semistructured play interaction protocol which was designed to elicit a broad range of social behaviors. We identify the key technical challenges in analyzing these behaviors, and describe methods for decoding the interactions. We present experimental results that demonstrate the potential of the dataset to drive interesting research questions, and show preliminary results for multi-modal activity recognition.

3 0.89481044 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection

Author: Wanli Ouyang, Xiaogang Wang

Abstract: In this paper, we address the challenging problem of detecting pedestrians who appear in groups and have interaction. A new approach is proposed for single-pedestrian detection aided by multi-pedestrian detection. A mixture model of multi-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby multiple pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection. It can integrate with any single-pedestrian detector without significantly increasing the computation load. 15 state-of-the-art single-pedestrian detection approaches are investigated on three widely used public datasets: Caltech, TUD-Brussels and ETH. Experimental results show that our framework significantly improves all these approaches. The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset.

same-paper 4 0.8449111 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers

Author: Georgia Gkioxari, Pablo Arbeláez, Lubomir Bourdev, Jitendra Malik

Abstract: We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image. For this purpose, we build on the notion of poselets [4] and train highly discriminative classifiers to differentiate among arm configurations, which we call armlets. We propose a rich representation which, in addition to standard HOG features, integrates the information of strong contours, skin color and contextual cues in a principled manner. Unlike existing methods, we evaluate our approach on a large subset of images from the PASCAL VOC detection dataset, where critical visual phenomena, such as occlusion, truncation, multiple instances and clutter are the norm. Our approach outperforms Yang and Ramanan [26], the state-of-the-art technique, with an improvement from 29.0% to 37.5% PCP accuracy on the arm keypoint prediction task, on this new pose estimation dataset.

5 0.84434563 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification

Author: Enrique G. Ortiz, Alan Wright, Mubarak Shah

Abstract: This paper presents an end-to-end video face recognition system, addressing the difficult problem of identifying a video face track using a large dictionary of still face images of a few hundred people, while rejecting unknown individuals. A straightforward application of the popular ℓ1-minimization for face recognition on a frame-by-frame basis is prohibitively expensive, so we propose a novel algorithm, Mean Sequence SRC (MSSRC), that performs video face recognition using a joint optimization leveraging all of the available video data and the knowledge that the face track frames belong to the same individual. By adding a strict temporal constraint to the ℓ1-minimization that forces individual frames in a face track to all reconstruct a single identity, we show the optimization reduces to a single minimization over the mean of the face track. We also introduce a new Movie Trailer Face Dataset collected from 101 movie trailers on YouTube. Finally, we show that our method matches or outperforms the state-of-the-art on three existing datasets (YouTube Celebrities, YouTube Faces, and Buffy) and our unconstrained Movie Trailer Face Dataset. More importantly, our method excels at rejecting unknown identities by at least 8% in average precision.

6 0.83077526 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search

7 0.8263377 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation

8 0.81688344 246 cvpr-2013-Learning Binary Codes for High-Dimensional Data Using Bilinear Projections

9 0.80481148 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking

10 0.78815538 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues

11 0.76083285 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation

12 0.75560898 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection

13 0.74864441 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection

14 0.73225188 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes

15 0.73139942 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval

16 0.72151685 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence

17 0.69887918 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation

18 0.69641382 438 cvpr-2013-Towards Pose Robust Face Recognition

19 0.69352371 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors

20 0.69184375 338 cvpr-2013-Probabilistic Elastic Matching for Pose Variant Face Verification