cvpr cvpr2013 cvpr2013-60 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Fang Wang, Yi Li
Abstract: Simple tree models for articulated objects have prevailed over the last decade. However, it is also believed that these simple tree models are not capable of capturing large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) are simple tree models sufficient? More specifically, 2) how to use tree models effectively in human pose estimation? and 3) how shall we use combined parts together with single parts efficiently? Assume we have a set of single parts and combined parts, and the goal is to estimate a joint distribution of their locations. We surprisingly find that no latent variables are introduced on the Leeds Sport Dataset (LSP) when learning latent trees for the deformable model, which aims at approximating the joint distributions of body part locations using a minimal tree structure. This suggests one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build visual categories of the combined parts, and then perform inference on the learned latent tree. Our method outperformed the state of the art on the LSP, both when the training images come from the same dataset and when they come from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.
Reference: text
sentIndex sentText sentNum sentScore
1 Affiliations: (1) Nanjing University of Science and Technology, Nanjing, China, 210094; (2) National ICT Australia (NICTA), Canberra, Australia, 2601. Abstract: Simple tree models for articulated objects have prevailed over the last decade. [sent-7, score-0.515]
2 However, it is also believed that these simple tree models are not capable of capturing large variations in many scenarios, such as human pose estimation. [sent-8, score-0.702]
3 This paper attempts to address three questions: 1) are simple tree models sufficient? [sent-9, score-0.388]
4 more specifically, 2) how to use tree models effectively in human pose estimation? [sent-10, score-0.661]
5 and 3) how shall we use combined parts together with single parts efficiently? [sent-11, score-0.795]
6 Assume we have a set of single parts and combined parts, and the goal is to estimate a joint distribution of their locations. [sent-12, score-0.577]
7 We surprisingly find that no latent variables are introduced on the Leeds Sport Dataset (LSP) when learning latent trees for the deformable model, which aims at approximating the joint distributions of body part locations using a minimal tree structure. [sent-13, score-1.28]
8 This suggests one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. [sent-14, score-0.99]
9 As such, one only needs to build Visual Categories of the combined parts, and then perform inference on the learned latent tree. [sent-15, score-0.565]
10 Introduction: Tree models are very efficient in a number of computer vision tasks such as human pose estimation and other articulated body modeling. [sent-19, score-0.599]
11 Also because of these unique advantages, it is not uncommon to speculate that tree models may not effectively handle computer vision problems in real world applications. [sent-21, score-0.445]
12 As a consequence, latent variables [1] and loopy graphical models [2] were proposed in the past few years for human pose estimation as a remedy for the problems caused by those “oversimplified” tree models. [sent-22, score-1.144]
13 Particularly, it is believed that loopy graphical models are necessary when combined parts (or “poselet”) are used to handle large variance in appearance. [sent-24, score-0.702]
14 In this paper, we argue that the simple tree model is still a very powerful representation, and combined parts and single parts can be used together without sacrificing the benefits brought by exact inference. [sent-26, score-1.137]
15 All the questions about tree models arise when we start to use the skeleton as the tree structure. [sent-28, score-0.841]
16 Further, this limits our choices of representation and complicates the graphical model when combined parts are introduced. [sent-30, score-0.568]
17 Our goal is to learn a tree model directly from observed variables. [sent-31, score-0.342]
18 These observations can be body part locations, as used in many recent pose estimation papers. [sent-32, score-0.378]
19 At the same time, this allows us to introduce more variables such as combined parts, as long as they can be observed and the state space is the same as that of single body parts. [sent-33, score-0.447]
20 Recent advancements in learning graphical models enable us to learn latent trees from these observations. [sent-35, score-0.424]
21 The latent tree models suggest that we could approximate the joint distribution of the observations by a tree model, and latent variables are introduced only when necessary. [sent-36, score-1.382]
22 We start our journey by exploring the property of the latent tree models. [sent-37, score-0.603]
23 It is not surprising that the resulting latent tree has a structure similar to the human skeleton. [sent-38, score-0.735]
24 This human pose estimation dataset is challenging both in the pose variations and its size. [sent-40, score-0.552]
25 Therefore, this implies we can directly use a tree model with mixed types of variables for human pose estimation to approximate the true distribution. [sent-41, score-0.794]
26 The types for single parts a) and b) are defined by their relative positions to their neighbors. [sent-44, score-0.337]
27 For combined parts c)-e), the types are defined by their visual categories. [sent-45, score-0.569]
28 Unlike other state-of-the-art methods in human pose estimation that use only a small number of categories, we find that a larger number of visual categories facilitates pose estimation. [sent-56, score-0.672]
29 The inference of our model is very efficient due to the tree structure, and results suggest that our method outperforms the state of the art on the LSP dataset. [sent-58, score-0.511]
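The efficiency claim above rests on a standard fact: exact MAP inference on a tree reduces to dynamic programming, with cost linear in the number of nodes and quadratic in the number of states per node. The following is a minimal sketch of max-sum message passing under stated assumptions, not the authors' implementation. The function name `map_inference`, the toy scores, and the absolute-difference deformation penalty are all hypothetical.

```python
def map_inference(children, root, unary, pairwise):
    """Exact MAP assignment on a rooted tree via max-sum dynamic programming.

    children: dict mapping a node to the list of its child nodes
    unary:    dict mapping a node to a list of per-state scores
    pairwise: function (parent_state, child_state) -> compatibility score
    """
    best = {}    # best[v][s]: best score of the subtree under v when v is in state s
    choice = {}  # choice[c][s]: best state of child c when its parent is in state s

    def collect(v):
        scores = list(unary[v])
        for c in children.get(v, []):
            collect(c)
            choice[c] = []
            for s in range(len(scores)):
                cand = [best[c][t] + pairwise(s, t) for t in range(len(best[c]))]
                t_star = max(range(len(cand)), key=cand.__getitem__)
                choice[c].append(t_star)
                scores[s] += cand[t_star]
        best[v] = scores

    # Upward pass: leaves to root.
    collect(root)

    # Downward pass: read back the maximizing states from root to leaves.
    root_state = max(range(len(best[root])), key=best[root].__getitem__)
    states = {root: root_state}
    stack = [root]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            states[c] = choice[c][states[v]]
            stack.append(c)
    return best[root][root_state], states


# Toy tree (made-up numbers): node 0 is the root, nodes 1 and 2 hang off it,
# and node 3 hangs off node 2. Every node has three candidate states.
children = {0: [1, 2], 2: [3]}
unary = {0: [0.1, 0.5, 0.2], 1: [0.9, 0.1, 0.0],
         2: [0.0, 0.2, 0.8], 3: [0.3, 0.3, 0.9]}
penalty = lambda s, t: -0.25 * abs(s - t)  # hypothetical deformation cost
score, states = map_inference(children, 0, unary, penalty)
```

In a pose model the states would be candidate (location, type) pairs and the unary scores would come from part filters. Here they are plain indices so the recursion stays visible.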
30 Our contributions include: • We propose to learn tree models for articulated pose estimation problems. [sent-62, score-0.696]
31 • Our method effectively exploits the interactions between combined parts and single parts. [sent-63, score-0.299]
32 • Our method outperforms the state of the art in human and animal pose estimation. [sent-64, score-0.359]
33 Related work: Human pose estimation. Human pose estimation has been formulated as a part-based inference problem. [sent-66, score-0.573]
34 Appearance models and deformable models that describe relations between parts were proposed in the past decade. [sent-67, score-0.336]
35 [9] proposed the idea of poselets as the building blocks for human recognition, which refers to combined parts that are distinctive in training images. [sent-71, score-0.619]
36 [4] proposed a flexible mixtures-of-parts model for articulated pose estimation. [sent-76, score-0.308]
37 Instead of modeling both the location and the orientation of each body part as a rigid part, they used a model that contains only non-oriented parts with co-occurrence constraints. [sent-77, score-0.404]
38 It is widely hypothesized that graphical models that go beyond pairwise links lead to better performance in pose estimation. [sent-78, score-0.299]
39 Several new approaches also use latent nodes [1] or hierarchical graph models [10]. [sent-80, score-0.358]
40 In this paper, we examine the above concepts, and suggest that these important components in articulated body detection and pose estimation can be integrated in an efficient framework. [sent-81, score-0.499]
41 Therefore, we exploit a newly developed technique in learning latent tree models. [sent-82, score-0.603]
42 Latent tree models: The latent tree models [11] aim at finding tree approximations of the joint distribution of observable variables. [sent-83, score-1.429]
43 In a Chow-Liu tree [12], all nodes in the tree must be observable. [sent-85, score-0.996]
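For context on the Chow-Liu baseline: the best tree-structured approximation over fully observed variables is the maximum-weight spanning tree of the graph whose edge weights are pairwise mutual informations. The sketch below illustrates this with Kruskal's algorithm. The function name `chow_liu_edges` and the mutual-information values are made up for illustration and do not come from the paper.

```python
def chow_liu_edges(n, mutual_info):
    """Return the edges of a maximum-weight spanning tree over n variables.

    mutual_info: dict mapping a pair (i, j) with i < j to the pairwise
    mutual information between variables i and j.
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    # Kruskal: visit candidate edges from the highest weight to the lowest.
    for (i, j), _ in sorted(mutual_info.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return sorted(edges)


# Hypothetical mutual-information values for four observed variables.
mi = {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.5,
      (2, 3): 0.7, (1, 3): 0.2, (0, 3): 0.1}
tree = chow_liu_edges(4, mi)
```

Latent tree methods such as those in [11] differ from this baseline in that they may add hidden nodes, which a spanning tree over the observed variables cannot do.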
44 Their methods automatically build tree structures from observations, using information distances as the guideline for merging nodes and introducing latent variables. [sent-88, score-0.738]
45 This theoretical approach is very useful for human pose estimation, because we can learn a structure directly from our observations without making many assumptions of the physical constraints, while the performance is still guaranteed in terms of approximating the joint distribution. [sent-89, score-0.407]
46 From left to right: the results for the CLGrouping tree and CL-Neighbor Joining [11] using single parts, and CLGrouping on single and combined parts together. [sent-92, score-0.602]
47 Latent tree models for pose estimation: First, we provide a brief introduction to the latent tree models, and show the results on modeling the body joints in the LSP using latent tree models. [sent-95, score-2.01]
48 We then present the learning of visual category for combined parts, which is necessary in our model for reducing complexity. [sent-96, score-0.322]
49 Brief introduction to latent tree models: The goal of a latent tree model is to recover a tree-structured graphical model that best approximates the distribution of a set of observations. [sent-100, score-1.358]
50 Recursive Grouping and CLGrouping were proposed in [11] to create latent tree models without any redundant hidden nodes. [sent-101, score-0.649]
51 CLGrouping is its extension that can build up latent tree structures for large diameter graphs more efficiently with a pre-processing step. [sent-105, score-0.634]
52 In this grouping method, the latent tree is built recursively by identifying sibling groups using information distances. [sent-106, score-0.674]
53 The information distance is defined as d_ij = −log(ρ_ij) (2). Then, the recursive grouping method builds up the latent tree by testing relationships among each triplet i, j, k ∈ V. [sent-109, score-0.806]
54 In this way, a latent tree is recursively built. [sent-116, score-0.633]
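The triplet test can be illustrated on a small additive tree metric. The sketch below follows one reading of the recursive grouping conditions in [11], so treat the exact conditions as an assumption. With Φ_ijk = d_ik − d_jk, a pair (i, j) is parent and leaf child when Φ_ijk = ±d_ij for every other node k, and a pair of leaf siblings when Φ_ijk is constant over k and strictly between −d_ij and d_ij. The absolute value inside the logarithm, the function names, and the toy distances are illustrative choices and are not taken from the source.

```python
import math


def information_distance(rho):
    """d_ij = -log|rho_ij| for a pairwise correlation coefficient rho_ij.

    The absolute value is an assumption added here so that negative
    correlations are handled; the source writes -log(rho_ij).
    """
    return -math.log(abs(rho))


def relation(d, i, j, nodes, tol=1e-9):
    """Classify the pair (i, j) from phi_k = d[i][k] - d[j][k] over all other k.

    Needs at least one node besides i and j.
    """
    phis = [d[i][k] - d[j][k] for k in nodes if k not in (i, j)]
    if all(abs(p - d[i][j]) < tol for p in phis):
        return "i is a leaf child of j"
    if all(abs(p + d[i][j]) < tol for p in phis):
        return "j is a leaf child of i"
    if all(abs(p - phis[0]) < tol for p in phis) and abs(phis[0]) < d[i][j] - tol:
        return "siblings"
    return "neither"


# Toy tree: observed node 1 is a leaf child of observed node 0; node 0 also
# connects to a hidden node h, whose other neighbours are observed leaves 2
# and 3. Edge lengths: (0,1)=1.0, (0,h)=0.5, (h,2)=0.7, (h,3)=0.3.
# Pairwise distances are sums of edge lengths along the connecting path.
d = [
    [0.0, 1.0, 1.2, 0.8],
    [1.0, 0.0, 2.2, 1.8],
    [1.2, 2.2, 0.0, 1.0],
    [0.8, 1.8, 1.0, 0.0],
]
nodes = range(4)
r_10 = relation(d, 1, 0, nodes)  # node 1 hangs directly off node 0
r_23 = relation(d, 2, 3, nodes)  # nodes 2 and 3 share the hidden parent h
r_02 = relation(d, 0, 2, nodes)  # no direct family relation
```

In this toy case the sibling pair (2, 3) is exactly the situation in which a recursive procedure would introduce a hidden parent, while the pair (1, 0) is explained without any latent node.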
55 Latent trees for human pose: Our goal is to use single parts and combined parts in the inference model. [sent-120, score-1.158]
56 Given image I, we define P parts as p_i = (loc_i, t_i), i ∈ [1, . [sent-121, score-0.303]
57 . . , P], [...] visual category label for combined parts (Sec 3. [sent-127, score-0.555]
58 3), or represents different morphologies of parts for single parts as suggested in [4]. [sent-128, score-0.567]
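The part definition p_i = (loc_i, t_i) can be written down as a small record type. This is only a sketch of one possible representation. The class name `Part`, its field names, the `combined` flag, and the example coordinates are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Part:
    name: str                 # hypothetical label, e.g. "head"
    loc: Tuple[float, float]  # loc_i: image location as (x, y)
    t: int                    # t_i: morphology index (single part) or
                              # visual category label (combined part)
    combined: bool = False    # True for a combined part


# Single and combined parts share the same (location, type) layout, so one
# tree model can hold both kinds of node.
pose = [
    Part("head", (120.0, 40.0), 2),
    Part("left_leg", (110.0, 210.0), 5, combined=True),
]
```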
59 Two possible part combinations in our case are: • Connected parts: A combined part may have a physical connection in the human body. [sent-129, score-0.359]
60 • Physically separated parts: The combined parts can be used for encoding semantic relations among single parts. [sent-133, score-0.527]
61 Therefore, one may combine these two physically separated parts as one element. [sent-136, score-0.3]
62 In our following experiment, we defined 14 single parts and 10 combined parts (Fig. [sent-137, score-0.795]
63 We tested two scenarios in our experiment: • Single parts only: In this experiment, we used only single parts for the latent tree models. [sent-141, score-0.567]
64 Fig. 2 shows two results using the CLGrouping tree and CL-Neighbor Joining [11]. [sent-143, score-0.342]
65 It is not very surprising that the structure is similar to the human body, but please note that no latent node is introduced by the CLGrouping method on such a complicated and challenging dataset. [sent-144, score-0.422]
66 Because no redundant latent nodes are used in the latent tree model, this means the joint distributions of all body joints can be approximated by a simple tree structure. [sent-145, score-1.515]
67 Therefore, introducing combined parts is a solution in many algorithms. [sent-147, score-0.496]
68 We used both single and combined parts in the latent tree models (Fig. [sent-148, score-1.176]
69 This means we can approximate the joint distribution by combined parts and single parts in a tree structure. [sent-151, score-1.187]
70 This finding makes our latent tree model different from [1], because all our nodes are observable. [sent-155, score-0.654]
71 Learning visual categories of combined parts: Combined parts are more discriminative than single parts. [sent-159, score-0.898]
72 We learned visual categories of each combined part directly from image space. [sent-163, score-0.375]
73 For each part, we build a latent SVM ([14]) model for learning visual categories. [sent-172, score-0.327]
74 Given N instances of a combined part, we learn K categories of this part, and generate the label set T = {t_1, t_2, ..., t_N}, t_i ∈ [1, K]. [sent-175, score-0.326]
75 The visual categories of combined parts characterize the appearance models in a way that they can be regarded as “templates”. [sent-183, score-0.685]
76 We show the results of HOG filters for different parts as well as different visual categories for two parts in Fig. [sent-185, score-0.675]
77 Our model: Given a training set, we manually define the parts of interest, and learn a latent tree model for these parts. [sent-189, score-0.902]
78 The following notations are consistent with those in [4], while our types for combined parts have different meanings. [sent-190, score-0.534]
79 Learning model parameters: Denote the model parameters as β, which consist of HOG filters for single parts and deformable models. [sent-214, score-0.403]
80 Our single parts are the same as those 14 joints used in [4]. [sent-230, score-0.381]
81 Our combined parts are defined as the limbs in [15]. [sent-231, score-0.533]
82 In all experiments, we first extract bounding boxes for all parts in the training sets. [sent-232, score-0.299]
83 For each combined part, we extract HOG features on a grid with 4 × 4 pixels from image patches, and learn visual categories using latent SVM. [sent-233, score-0.296]
84 Side by side comparison of [4] (left) and our pose estimation results (right) in the Leeds Sport dataset (LSP). [sent-241, score-0.279]
85 Experiment setting: We used 8-15 visual categories for combined parts. [sent-252, score-0.331]
86 This is possibly because we effectively exploit the connections between single and combined parts, as well as the benefit from exact inference. [sent-296, score-0.297]
87 One may speculate that our combined parts may be overfitted to a dataset, because they capture the distinctive features as HOG templates during visual category learning. [sent-300, score-0.647]
88 We trained our model on all the 305 images in the PARSE dataset [6], and then used the models to estimate human pose on the LSP dataset. [sent-302, score-0.356]
89 Experiment setting: In this experiment we used 3 combined parts, “head”, “left leg”, and “right leg”. [sent-331, score-0.287]
90 We used 6 visual categories for each combined part. [sent-332, score-0.331]
91 For the Yang & Ramanan [4] model, we used the 9 keypoints in a natural skeleton structure to build the tree model. [sent-335, score-0.48]
92 This experiment demonstrates that our method serves as a very good tool for modeling parts in other articulated objects such as animals. [sent-350, score-0.454]
93 Conclusion: This paper addressed three questions in human pose estimation using deformable models. [sent-352, score-0.451]
94 Latent tree models are learned to approximate the joint distributions of body part locations, and single and combined parts are used together for effective inference. [sent-353, score-1.135]
95 Empirical results suggest that our approach outperforms the state of the art in human pose and animal pose estimation. [sent-354, score-0.637]
96 Narasimhan, “Exploring the spatial hierarchy of mixture models for human pose estimation,” in ECCV (5), 2012, pp. [sent-358, score-0.319]
97 Everingham, “Clustered pose and nonlinear appearance models for human pose estimation,” in Proceedings of the British Machine Vision Conference, 2010, doi: 10. [sent-368, score-0.54]
98 Hebert, “How important are deformable parts in the deformable parts model? [sent-454, score-0.672]
99 Taskar, “Cascaded models for articulated pose estimation,” in ECCV (2), 2010. [sent-461, score-0.354]
100 Everingham, “Learning effective human pose estimation from inaccurate annotation,” in CVPR, 2011, pp. [sent-469, score-0.334]
wordName wordTfidf (topN-words)
[('lsp', 0.402), ('tree', 0.342), ('parts', 0.268), ('latent', 0.261), ('leg', 0.238), ('combined', 0.228), ('clgrouping', 0.192), ('pose', 0.181), ('arm', 0.159), ('articulated', 0.127), ('loci', 0.114), ('ijk', 0.099), ('body', 0.092), ('human', 0.092), ('hog', 0.091), ('leeds', 0.091), ('joints', 0.082), ('ramanan', 0.081), ('sport', 0.075), ('graphical', 0.072), ('parse', 0.071), ('deformable', 0.068), ('categories', 0.068), ('voc', 0.064), ('skeleton', 0.062), ('estimation', 0.061), ('category', 0.059), ('experiment', 0.059), ('animal', 0.059), ('relabeled', 0.057), ('speculate', 0.057), ('fitted', 0.056), ('johnson', 0.055), ('everingham', 0.054), ('dij', 0.054), ('state', 0.054), ('guideline', 0.053), ('nodes', 0.051), ('dog', 0.05), ('joint', 0.05), ('questions', 0.049), ('loopy', 0.047), ('tian', 0.047), ('recursive', 0.046), ('models', 0.046), ('elbow', 0.045), ('skeletons', 0.045), ('nanjing', 0.045), ('inference', 0.045), ('trees', 0.045), ('keypoints', 0.045), ('part', 0.044), ('left', 0.044), ('head', 0.044), ('tii', 0.044), ('upper', 0.043), ('physical', 0.043), ('nicta', 0.042), ('variables', 0.042), ('believed', 0.041), ('fang', 0.041), ('grouping', 0.041), ('template', 0.041), ('approximating', 0.041), ('appearance', 0.04), ('surprising', 0.04), ('joining', 0.039), ('outperformed', 0.039), ('right', 0.038), ('mixed', 0.038), ('suggest', 0.038), ('types', 0.038), ('connections', 0.038), ('cross', 0.037), ('magenta', 0.037), ('dataset', 0.037), ('limbs', 0.037), ('fore', 0.037), ('bourdev', 0.036), ('filters', 0.036), ('visual', 0.035), ('australia', 0.035), ('pi', 0.035), ('conference', 0.034), ('distributions', 0.034), ('suggests', 0.033), ('art', 0.032), ('yang', 0.032), ('physically', 0.032), ('accuracies', 0.032), ('lower', 0.031), ('poselet', 0.031), ('build', 0.031), ('single', 0.031), ('training', 0.031), ('testing', 0.031), ('torso', 0.031), ('ti', 0.03), ('recursively', 0.03), ('please', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999869 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
2 0.34273091 335 cvpr-2013-Poselet Conditioned Pictorial Structures
Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
Abstract: In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. Modelling such higher order part dependencies seemingly comes at a cost of more expensive inference, which resulted in their limited use in state-of-the-art methods. In this paper we propose a model that incorporates higher order part dependencies while remaining efficient. We achieve this by defining a conditional model in which all body parts are connected a-priori, but which becomes a tractable tree-structured pictorial structures model once the image observations are available. In order to derive a set of conditioning variables we rely on the poselet-based features that have been shown to be effective for people detection but have so far found limited application for articulated human pose estimation. We demonstrate the effectiveness of our approach on three publicly available pose estimation benchmarks improving or being on-par with state of the art in each case.
3 0.28669289 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
Author: Matthias Dantone, Juergen Gall, Christian Leistner, Luc Van Gool
Abstract: In this work, we address the problem of estimating 2d human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have shown to be very successful in solving this task. Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. In particular, we employ two-layered random forests as joint regressors. The first layer acts as a discriminative, independent body part classifier. The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This results in a pose estimation framework that takes dependencies between body parts already for joint localization into account and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. In the experiments, we demonstrate that our body parts dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods.
4 0.24802776 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
Author: Edgar Simo-Serra, Ariadna Quattoni, Carme Torras, Francesc Moreno-Noguer
Abstract: We introduce a novel approach to automatically recover 3D human pose from a single image. Most previous work follows a pipelined approach: initially, a set of 2D features such as edges, joints or silhouettes are detected in the image, and then these observations are used to infer the 3D pose. Solving these two problems separately may lead to erroneous 3D poses when the feature detector has performed poorly. In this paper, we address this issue by jointly solving both the 2D detection and the 3D inference problems. For this purpose, we propose a Bayesian framework that integrates a generative model based on latent variables and discriminative 2D part detectors based on HOGs, and perform inference using evolutionary algorithms. Real experimentation demonstrates competitive results, and the ability of our methodology to provide accurate 2D and 3D pose estimations even when the 2D detectors are inaccurate.
5 0.21145597 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
Author: Georgia Gkioxari, Pablo Arbeláez, Lubomir Bourdev, Jitendra Malik
Abstract: We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image. For this purpose, we build on the notion of poselets [4] and train highly discriminative classifiers to differentiate among arm configurations, which we call armlets. We propose a rich representation which, in addition to standard HOG features, integrates the information of strong contours, skin color and contextual cues in a principled manner. Unlike existing methods, we evaluate our approach on a large subset of images from the PASCAL VOC detection dataset, where critical visual phenomena, such as occlusion, truncation, multiple instances and clutter are the norm. Our approach outperforms Yang and Ramanan [26], the state-of-the-art technique, with an improvement from 29.0% to 37.5% PCP accuracy on the arm keypoint prediction task, on this new pose estimation dataset.
6 0.20478144 340 cvpr-2013-Probabilistic Label Trees for Efficient Large Scale Image Classification
7 0.19995502 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
8 0.19859903 334 cvpr-2013-Pose from Flow and Flow from Pose
9 0.19270211 40 cvpr-2013-An Approach to Pose-Based Action Recognition
10 0.1906939 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
11 0.17956804 325 cvpr-2013-Part Discovery from Partial Correspondence
12 0.17717046 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
13 0.16665576 444 cvpr-2013-Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross-Modality Regression Forest
14 0.16369601 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
15 0.15996225 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
16 0.15391512 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation
17 0.14932442 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images
18 0.14867966 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection
19 0.14740208 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification
20 0.13690424 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories
topicId topicWeight
[(0, 0.252), (1, -0.066), (2, 0.01), (3, -0.139), (4, 0.042), (5, 0.045), (6, 0.152), (7, 0.173), (8, 0.049), (9, -0.141), (10, -0.169), (11, 0.242), (12, -0.151), (13, -0.023), (14, -0.013), (15, 0.081), (16, 0.041), (17, -0.072), (18, -0.035), (19, -0.132), (20, -0.051), (21, 0.06), (22, 0.031), (23, -0.061), (24, -0.033), (25, 0.103), (26, 0.022), (27, -0.023), (28, 0.024), (29, 0.063), (30, 0.011), (31, 0.032), (32, 0.041), (33, -0.032), (34, 0.058), (35, -0.02), (36, -0.034), (37, 0.029), (38, -0.005), (39, -0.016), (40, 0.046), (41, -0.013), (42, -0.036), (43, 0.035), (44, -0.021), (45, -0.043), (46, -0.015), (47, 0.008), (48, -0.048), (49, -0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.97567636 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
2 0.92560059 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
Author: Matthias Dantone, Juergen Gall, Christian Leistner, Luc Van Gool
Abstract: In this work, we address the problem of estimating 2d human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have shown to be very successful in solving this task. Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. In particular, we employ two-layered random forests as joint regressors. The first layer acts as a discriminative, independent body part classifier. The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This results in a pose estimation framework that takes dependencies between body parts already for joint localization into account and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. In the experiments, we demonstrate that our body parts dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods.
3 0.90170926 335 cvpr-2013-Poselet Conditioned Pictorial Structures
Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
Abstract: In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. Modelling such higher order part dependencies seemingly comes at a cost of more expensive inference, which resulted in their limited use in state-of-the-art methods. In this paper we propose a model that incorporates higher order part dependencies while remaining efficient. We achieve this by defining a conditional model in which all body parts are connected a-priori, but which becomes a tractable tree-structured pictorial structures model once the image observations are available. In order to derive a set of conditioning variables we rely on the poselet-based features that have been shown to be effective for people detection but have so far found limited application for articulated human pose estimation. We demonstrate the effectiveness of our approach on three publicly available pose estimation benchmarks improving or being on-par with state of the art in each case.
4 0.86279064 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
Author: Magnus Burenius, Josephine Sullivan, Stefan Carlsson
Abstract: We consider the problem of automatically estimating the 3D pose of humans from images, taken from multiple calibrated views. We show that it is possible and tractable to extend the pictorial structures framework, popular for 2D pose estimation, to 3D. We discuss how to use this framework to impose view, skeleton, joint angle and intersection constraints in 3D. The 3D pictorial structures are evaluated on multiple view data from a professional football game. The evaluation is focused on computational tractability, but we also demonstrate how a simple 2D part detector can be plugged into the framework.
5 0.8598851 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
Author: Georgia Gkioxari, Pablo Arbeláez, Lubomir Bourdev, Jitendra Malik
Abstract: We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image. For this purpose, we build on the notion of poselets [4] and train highly discriminative classifiers to differentiate among arm configurations, which we call armlets. We propose a rich representation which, in addition to standard HOG features, integrates the information of strong contours, skin color and contextual cues in a principled manner. Unlike existing methods, we evaluate our approach on a large subset of images from the PASCAL VOC detection dataset, where critical visual phenomena, such as occlusion, truncation, multiple instances and clutter are the norm. Our approach outperforms Yang and Ramanan [26], the state-of-the-art technique, with an improvement from 29.0% to 37.5% PCP accuracy on the arm keypoint prediction task, on this new pose estimation dataset.
6 0.84907174 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation
7 0.82076913 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
8 0.80397719 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation
9 0.80104983 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
10 0.78775078 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
11 0.69977117 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection
12 0.69704264 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
13 0.67231572 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images
14 0.65465194 426 cvpr-2013-Tensor-Based Human Body Modeling
15 0.61529303 40 cvpr-2013-An Approach to Pose-Based Action Recognition
16 0.61315376 325 cvpr-2013-Part Discovery from Partial Correspondence
17 0.6105932 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories
18 0.60305297 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
19 0.6021595 334 cvpr-2013-Pose from Flow and Flow from Pose
topicId topicWeight
[(10, 0.137), (16, 0.025), (26, 0.037), (28, 0.013), (33, 0.261), (63, 0.014), (67, 0.143), (69, 0.051), (71, 0.145), (80, 0.043), (87, 0.066)]
simIndex simValue paperId paperTitle
1 0.9124763 385 cvpr-2013-Selective Transfer Machine for Personalized Facial Action Unit Detection
Author: Wen-Sheng Chu, Fernando De La Torre, Jeffery F. Cohn
Abstract: Automatic facial action unit (AFA) detection from video is a long-standing problem in facial expression analysis. Most approaches emphasize choices of features and classifiers. They neglect individual differences in target persons. People vary markedly in facial morphology (e.g., heavy versus delicate brows, smooth versus deeply etched wrinkles) and behavior. Individual differences can dramatically influence how well generic classifiers generalize to previously unseen persons. While a possible solution would be to train person-specific classifiers, that often is neither feasible nor theoretically compelling. The alternative that we propose is to personalize a generic classifier in an unsupervised manner (no additional labels for the test subjects are required). We introduce a transductive learning method, which we refer to as Selective Transfer Machine (STM), to personalize a generic classifier by attenuating person-specific biases. STM achieves this effect by simultaneously learning a classifier and re-weighting the training samples that are most relevant to the test subject. To evaluate the effectiveness of STM, we compared STM to generic classifiers and to cross-domain learning methods in three major databases: CK+ [20], GEMEP-FERA [32] and RU-FACS [2]. STM outperformed generic classifiers in all.
same-paper 2 0.90673149 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
Author: Fang Wang, Yi Li
Abstract: Simple tree models for articulated objects have prevailed in the last decade. However, it is also believed that these simple tree models are not capable of capturing large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) are simple tree models sufficient? More specifically, 2) how to use tree models effectively in human pose estimation? and 3) how shall we use combined parts together with single parts efficiently? We assume we have a set of single parts and combined parts, and the goal is to estimate a joint distribution of their locations. We surprisingly find that no latent variables are introduced in the Leeds Sport Dataset (LSP) when learning latent trees for the deformable model, which aims at approximating the joint distributions of body part locations using a minimal tree structure. This suggests one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build Visual Categories of the combined parts, and then perform inference on the learned latent tree. Our method outperformed the state of the art on the LSP, both in the scenarios when the training images are from the same dataset and from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.
3 0.90039206 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
Author: Magnus Burenius, Josephine Sullivan, Stefan Carlsson
Abstract: We consider the problem of automatically estimating the 3D pose of humans from images, taken from multiple calibrated views. We show that it is possible and tractable to extend the pictorial structures framework, popular for 2D pose estimation, to 3D. We discuss how to use this framework to impose view, skeleton, joint angle and intersection constraints in 3D. The 3D pictorial structures are evaluated on multiple view data from a professional football game. The evaluation is focused on computational tractability, but we also demonstrate how a simple 2D part detector can be plugged into the framework.
4 0.89581972 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
Author: Luming Zhang, Mingli Song, Zicheng Liu, Xiao Liu, Jiajun Bu, Chun Chen
Abstract: Weakly supervised image segmentation is a challenging problem in the computer vision field. In this paper, we present a new weakly supervised image segmentation algorithm by learning the distribution of spatially structured superpixel sets from image-level labels. Specifically, we first extract graphlets from each image, where a graphlet is a small-sized graph consisting of superpixels as its nodes and it encapsulates the spatial structure of those superpixels. Then, a manifold embedding algorithm is proposed to transform graphlets of different sizes into equal-length feature vectors. Thereafter, we use GMM to learn the distribution of the post-embedding graphlets. Finally, we propose a novel image segmentation algorithm, called graphlet cut, that leverages the learned graphlet distribution in measuring the homogeneity of a set of spatially structured superpixels. Experimental results show that the proposed approach outperforms state-of-the-art weakly supervised image segmentation methods, and its performance is comparable to those of the fully supervised segmentation models.
5 0.89252758 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
Author: Georgia Gkioxari, Pablo Arbeláez, Lubomir Bourdev, Jitendra Malik
Abstract: We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image. For this purpose, we build on the notion of poselets [4] and train highly discriminative classifiers to differentiate among arm configurations, which we call armlets. We propose a rich representation which, in addition to standard HOG features, integrates the information of strong contours, skin color and contextual cues in a principled manner. Unlike existing methods, we evaluate our approach on a large subset of images from the PASCAL VOC detection dataset, where critical visual phenomena, such as occlusion, truncation, multiple instances and clutter are the norm. Our approach outperforms Yang and Ramanan [26], the state-of-the-art technique, with an improvement from 29.0% to 37.5% PCP accuracy on the arm keypoint prediction task, on this new pose estimation dataset.
6 0.89247465 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
7 0.8893218 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
8 0.88860017 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
9 0.88847971 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
10 0.88842869 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
11 0.8879928 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification
12 0.88761365 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
13 0.88318282 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
14 0.88309562 414 cvpr-2013-Structure Preserving Object Tracking
15 0.88219613 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search
16 0.87971824 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
17 0.87959677 314 cvpr-2013-Online Object Tracking: A Benchmark
18 0.87922204 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking
19 0.87905884 325 cvpr-2013-Part Discovery from Partial Correspondence