iccv iccv2013 iccv2013-403 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
Abstract: Typical approaches to articulated pose estimation combine spatial modelling of the human body with appearance modelling of body parts. This paper aims to push the state-of-the-art in articulated pose estimation in two ways. First we explore various types of appearance representations aiming to substantially improve the body part hypotheses. And second, we draw on and combine several recently proposed powerful ideas such as more flexible spatial models as well as image-conditioned spatial models. In a series of experiments we draw several important conclusions: (1) we show that the proposed appearance representations are complementary; (2) we demonstrate that even a basic tree-structured spatial human body model achieves state-of-the-art performance when augmented with the proper appearance representation; and (3) we show that the combination of the best performing appearance model with a flexible image-conditioned spatial model achieves the best result, significantly improving over the state of the art, on the “Leeds Sports Poses” and “Parse” benchmarks.
Reference: text
sentIndex sentText sentNum sentScore
1 Introduction Most recent approaches to human pose estimation rely on the pictorial structures model representing the human body as a collection of rigid parts and a set of pairwise part dependencies. [sent-7, score-1.187]
2 While effective detectors have been proposed for specific body parts with characteristic appearance such as heads and hands [20, 15], detectors for other body parts are typically weak. [sent-10, score-1.606]
3 Obtaining strong detectors for all body parts is challenging for a number of reasons. [sent-11, score-0.711]
4 The appearance of body parts changes significantly due to clothing, foreshortening and occlusion by other body parts. [sent-12, score-1.122]
5 In addition, the spatial extent of the majority of the body parts is rather small, and when taken independently each of the parts lacks characteristic appearance features. [sent-13, score-0.928]
6 Example pose estimation results and corresponding part marginal maps obtained by (a) our full model combining local appearance and mid-level representation, (b) our best local appearance model and (c) results by Yang & Ramanan [34]. [sent-15, score-0.984]
7 We argue that in order to obtain effective part detectors it is necessary to leverage both the pose specific appearance of body parts, and the joint appearance of part constellations. [sent-17, score-1.333]
8 Pose specific person and body part detectors have appeared in various forms in the literature. [sent-18, score-0.655]
9 For example, people tracking approaches [24, 14] rely on specialized detectors tailored to specific people poses that are easy to detect. [sent-19, score-0.555]
10 Local [34] and global [17] mixture models that capture pose specific appearance of individual body parts and joints have been shown to be effective for pose estimation. [sent-21, score-1.141]
11 This paper builds on findings from the literature and follows two complementary routes to a more powerful pose model: improving the appearance representation and increasing the expressiveness of the joint body part model (see Fig. [sent-25, score-1.01]
12 Specifically, we consider local appearance representations based on rotation invariant or rotation specific appearance templates, mixtures of such local templates, and specialized models tailored to the appearance of [sent-27, score-1.534]
13 (Figure 2) We extend the basic PS model [2] (a) to a more flexible structure with stronger local appearance representations, including single-component part detectors (b) and mixtures of part detectors (c). [sent-29, score-1.266]
14 Then we combine the local appearance model with a mid-level representation based on semi-global poselets, which capture configurations of multiple parts (d). [sent-30, score-0.601]
15 salient body parts such as the head and torso, and semi-global representations based on poselet features (Sec. [sent-33, score-0.981]
16 The second main contribution of the paper is to combine the improved appearance model with more expressive body representations. [sent-35, score-0.628]
17 The performance of the best appearance model for individual body parts is surprisingly high and can even compete with some approaches using weaker appearance terms but a full spatial model (Tab. [sent-42, score-1.134]
18 When augmented with the best appearance model, the basic tree-structured pictorial structures model performs better than state-of-the-art models [9, 34] (Tab. [sent-44, score-0.635]
19 We show that strong appearance representations operating at different levels of granularity (mixtures of local templates vs. [sent-46, score-0.529]
20 Finally, we report the best results to date on the “Parse” and “Leeds Sports Poses” benchmarks, obtained by combining the best appearance model with the recently proposed image-conditioned pictorial structures spatial model of [21] (Tabs. [sent-48, score-0.675]
21 Various appearance representations have been considered in the past within the pictorial structures framework. [sent-51, score-0.572]
22 These appearance models were extended by either including new types of features, or by generalising to mixtures of appearance templates. [sent-53, score-0.615]
23 Various local appearance models have been proposed, including stretchable models representing local appearance of body joints [34, 31, 26] and cardboard models modelling appearance of body parts as rigid templates [2, 25, 16]. [sent-57, score-1.994]
24 Recently several works have been looking into semi-global representations based on multiple parts or poselets [15, 33] and global representations for entire bodies in various configurations [16, 22]. [sent-58, score-0.444]
25 Specialized models for detecting particular body parts, such as the hands, head, or entire upper body, also improve pose estimation results [4, 26, 12]. [sent-59, score-1.375]
26 Many methods use only one type of appearance and focus on other aspects such as efficient search [28, 25], or novel body models [34, 21] (discussed in Sec. [sent-65, score-0.623]
27 In this work we build on strong part detectors and demonstrate that even a basic tree-structured spatial human body model achieves state-of-the-art performance when augmented with the proper appearance representation. [sent-67, score-1.091]
28 When combining strong appearance models with a flexible image-conditioned spatial model, we outperform all current methods by a large margin. [sent-68, score-0.458]
29 Model Formulation The pictorial structures model represents the human body as a collection of rigid parts L = {l1, . . . , lN}. [sent-75, score-0.799]
30 Denoting the image observations by D, the energy of the body part configuration L defined by the pictorial structures model is given by E(L, D) = Σ_n E_n(l_n, D) + Σ_{n∼m} E_{nm}(l_n, l_m). [sent-80, score-0.699]
31 The pairwise relationships between body parts are denoted by n ∼ m. [sent-87, score-0.557]
32 This model is composed of N = 10 body parts: head, torso, and left and right upper arms, forearms, upper legs and lower legs. [sent-91, score-0.677]
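A minimal sketch of how this energy could be evaluated for a candidate configuration, assuming parts parameterised by (x, y, theta), a placeholder unary detector score, and Gaussian pairwise terms over relative offsets; the part names, tree edges, and function signatures below are illustrative, not the authors' implementation.

```python
# Minimal sketch of the pictorial structures energy
#   E(L, D) = sum_n E_n(l_n, D) + sum_{n~m} E_nm(l_n, l_m)
# over the N = 10 parts listed in the text. The part states (x, y, theta),
# the tree topology, and the placeholder unary term are assumptions.
import numpy as np

PARTS = ["torso", "head",
         "l_upper_arm", "l_forearm", "r_upper_arm", "r_forearm",
         "l_upper_leg", "l_lower_leg", "r_upper_leg", "r_lower_leg"]

# Kinematic tree edges n ~ m, rooted at the torso (assumed topology).
EDGES = [("torso", "head"),
         ("torso", "l_upper_arm"), ("l_upper_arm", "l_forearm"),
         ("torso", "r_upper_arm"), ("r_upper_arm", "r_forearm"),
         ("torso", "l_upper_leg"), ("l_upper_leg", "l_lower_leg"),
         ("torso", "r_upper_leg"), ("r_upper_leg", "r_lower_leg")]

def unary_energy(part, state, image):
    """Negative detector score for placing `part` at state = (x, y, theta).
    Placeholder returning 0.0; any of the detectors discussed in the text
    (boosted, mixture, or specialized) could be plugged in here."""
    return 0.0

def pairwise_energy(parent_state, child_state, mean, cov_inv):
    """Gaussian energy on the offset of a child part relative to its parent."""
    d = np.asarray(child_state) - np.asarray(parent_state) - mean
    return 0.5 * float(d @ cov_inv @ d)

def ps_energy(config, image, priors):
    """Total PS energy of a configuration: unary terms plus tree-structured
    Gaussian pairwise terms. `priors` maps each edge to (mean, cov_inv)."""
    e = sum(unary_energy(p, config[p], image) for p in PARTS)
    for parent, child in EDGES:
        mean, cov_inv = priors[(parent, child)]
        e += pairwise_energy(config[parent], config[child], mean, cov_inv)
    return e
```

In the actual model the Gaussian terms are typically defined in a transformed (joint) coordinate frame rather than directly on image coordinates; the point of the sketch is only the additive structure over unary and pairwise n ∼ m terms.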
33 Better Appearance Representations We now turn our attention to improving the appearance representations for body parts. [sent-109, score-0.738]
34 These factors use boosted part detectors over shape context features, one detector per body part. [sent-113, score-0.725]
35 This appearance representation is made independent of the part rotation by normalising the training examples with respect to part rotation prior to learning. [sent-114, score-0.604]
36 Body Part Detectors The rotation independent representation from [1] is based on a simplifying assumption, namely that the appearance of model parts does not change with part rotation. [sent-119, score-0.694]
37 For example, the upper arms raised above the head and those held in front of the torso look quite different because of the overlap with other parts and the change in the contours of the shoulders. [sent-121, score-0.891]
38 We augment PS with two types of such local representations: 1) a rotation dependent detector tailored to the absolute orientation of the part (rot-dep mix) and 2) a rotation invariant representation tailored to a particular body pose (pose-dep mix). [sent-123, score-1.2]
39 As these models do capture rotation dependent appearance changes, we refer to this variant as rot-dep mix. [sent-131, score-0.488]
40 Rotation of the body parts is related to the orientation of the entire body, not necessarily to their absolute rotation in the image plane. [sent-135, score-0.568]
41 We model this using a part detector that depends on the body pose. [sent-136, score-0.575]
42 For this we normalise the part to a common rotation but rotate the entire body along with it. [sent-137, score-0.576]
43 We also include a simpler baseline, a single-component model trained from rotation-normalised body parts and then evaluated over all rotations. [sent-146, score-0.56]
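To illustrate how these local variants could be scored at test time, the sketch below takes the maximum over linear template components: for rot-dep mix each component corresponds to a discretised absolute part orientation, for pose-dep mix to a cluster of rotation-normalised body poses, and the single-component baselines are the one-component special case. The feature extractor and component parameterisation are assumptions, not the paper's implementation.

```python
import numpy as np

def score_part_hypothesis(patch_features, components):
    """Score one part hypothesis with a mixture of linear templates.

    patch_features: 1-D descriptor of the image patch under the hypothesis
        (e.g. a dense gradient histogram; the exact features are assumed here).
    components: list of (weights, bias) pairs. For rot-dep mix, one component
        per discretised absolute orientation; for pose-dep mix, one per cluster
        of rotation-normalised body poses; a single pair gives the rot-inv /
        rot-dep single baselines.
    Returns the best component score and the index of the winning component.
    """
    scores = [float(w @ patch_features) + b for (w, b) in components]
    best = int(np.argmax(scores))
    return scores[best], best
```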
44 Head and Torso Detectors (spec-head, spec-torso) We consider two types of specialized part detectors proposed in the literature. [sent-150, score-0.436]
45 These are the torso detector from [22] and the head detector from [18]. [sent-151, score-0.784]
46 The main rationale behind using such specialized detectors is that body parts such as head and torso have rather specific appearance that calls for specialized part models. [sent-152, score-1.921]
47 Specifically, the torso detector of [22] is directly adapted from the articulated person detector based on a DPM. [sent-153, score-0.674]
48 A torso prediction is obtained by regression using the positions of the latent DPM parts as features. [sent-154, score-0.537]
49 This specialized torso detector benefits from evidence from the entire person and captures the pose. [sent-155, score-0.639]
50 This is in contrast to the previous local torso model as it is not bound to evidence within the torso bounding box only. [sent-156, score-0.786]
51 We refer to the specialized torso detector as spec-torso. [sent-157, score-0.638]
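A hedged sketch of that regression step: the latent part placements of a DPM person detection [22] are concatenated into a feature vector and mapped to a torso target by a linear regressor. The scikit-learn ridge regression, the normalisation by the root box, and the (x, y, theta) target layout are illustrative assumptions; the text only states that latent DPM part positions serve as regression features.

```python
import numpy as np
from sklearn.linear_model import Ridge

def dpm_features(detection):
    """Concatenate latent DPM part centres, normalised by the root box,
    into one feature vector (layout assumed for illustration)."""
    x0, y0, w, h = detection["root_box"]          # (x, y, width, height)
    feats = []
    for px, py in detection["part_positions"]:    # latent part centres
        feats.extend([(px - x0) / w, (py - y0) / h])
    return np.array(feats)

def train_torso_regressor(detections, torso_targets, alpha=1.0):
    """Fit a multi-output ridge regressor from DPM part features to a torso
    target, e.g. (x, y, theta) per training example."""
    X = np.stack([dpm_features(d) for d in detections])
    Y = np.asarray(torso_targets)
    return Ridge(alpha=alpha).fit(X, Y)

def predict_torso(regressor, detection):
    return regressor.predict(dpm_features(detection)[None, :])[0]
```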
52 The head detector of [18] uses the observation that the main source of variability for the head is due to the viewpoint of the head w. [sent-158, score-0.801]
53 Note that this particular set of components is not available to the local head detectors, which are grouped either by in-plane rotation or by the pose of the surrounding parts. [sent-165, score-0.728]
54 More Flexible Models Besides improving the pure appearance representations, several works have suggested altering the model representation to make it more flexible. [sent-187, score-0.578]
55 Body Joints (PS-flex) The original PS model represents body parts as variables, which in turn makes appearance changes such as foreshortening very drastic. [sent-191, score-0.808]
56 Follow-up work has suggested building appearance representations for more local parts while allowing more flexibility in their composition [26, 34]. [sent-192, score-0.479]
57 The additional pairwise terms between joint parts and body parts are modelled as a Gaussian factor w. [sent-197, score-0.704]
58 Since some body and joint parts, such as the lower arm and wrist, are restricted to have the same absolute rotation, we constrain their rotation and scale to be identical. [sent-201, score-0.865]
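The sketch below illustrates the kind of extra terms such a flexible model introduces: joint variables (e.g. a wrist) tied to their adjoining body part by a Gaussian offset factor, plus a hard constraint equating their rotation and scale. The state layout, the part/joint names, and the infinite-energy handling of violated constraints are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def joint_offset_energy(part_state, joint_state, mean, cov_inv):
    """Gaussian factor on the position offset between a body part and an
    attached joint. States are assumed to be (x, y, theta, scale)."""
    d = np.asarray(joint_state[:2]) - np.asarray(part_state[:2]) - mean
    return 0.5 * float(d @ cov_inv @ d)

def rotation_scale_tied(part_state, joint_state, tol=1e-6):
    """Hard constraint: e.g. the wrist shares rotation and scale with the
    lower arm it is attached to."""
    return (abs(part_state[2] - joint_state[2]) < tol
            and abs(part_state[3] - joint_state[3]) < tol)

def flex_terms(config, attachments, priors):
    """Sum the joint-to-part factors; configurations violating a tied
    rotation/scale constraint get infinite energy."""
    e = 0.0
    for part, joint in attachments:               # e.g. ("l_forearm", "l_wrist")
        if not rotation_scale_tied(config[part], config[joint]):
            return float("inf")
        mean, cov_inv = priors[(part, joint)]
        e += joint_offset_energy(config[part], config[joint], mean, cov_inv)
    return e
```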
59 The basic PS model has the limitation that the spatial distribution of the body parts is modelled as a Gaussian and cannot properly represent the multi-modality of human poses. [sent-206, score-0.659]
60 3 are designed to capture pose dependent appearance of individual parts and pairs of adjacent parts. [sent-221, score-0.569]
61 In order to capture the appearance of the person at a higher level of granularity, we extend our model with a mid-level poselet-based representation and use the poselet features described above to obtain rotation and position predictions for each body part separately. [sent-222, score-1.356]
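One way such an image-conditioned prediction could be realised is sketched below: the vector of poselet activation scores for an image is mapped, per body part, to a distribution over discretised part rotations (and analogously over positions), which can then serve as an image-conditioned term in the spatial model. The logistic-regression classifier and the rotation binning are assumptions standing in for the actual predictor used by the mid-level model of [21].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_rotation_predictors(poselet_activations, rotation_bins_per_part):
    """poselet_activations: (num_images, num_poselets) matrix of poselet scores.
    rotation_bins_per_part: dict part -> (num_images,) array of discretised
    ground-truth part rotations (the binning granularity is an assumption).
    Returns one classifier per body part."""
    predictors = {}
    for part, labels in rotation_bins_per_part.items():
        clf = LogisticRegression(max_iter=1000)
        predictors[part] = clf.fit(poselet_activations, labels)
    return predictors

def rotation_prior(predictors, part, activations):
    """Per-part distribution over rotation bins for one image, usable as an
    image-conditioned unary/pairwise term in the spatial model."""
    return predictors[part].predict_proba(activations[None, :])[0]
```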
62 Flexible Model We start with a comparison of models using body part appearance alone (PS) with the flexible model PS-flex that includes both joint and body part appearance. [sent-242, score-1.304]
63 When we remove the body parts for the arms and legs and use only body joints (joints only), the performance drops. [sent-248, score-1.107]
64 Performance of rotation dependent (rot-dep single) and rotation invariant (rot-inv single) single component detectors is reported in Tab. [sent-252, score-0.535]
65 The majority of the poses in the dataset are upright, thus much of the head appearance change is captured [table fragment: Setting | Torso | Upper leg | Lower leg | Upper arm | Forearm | Head | Total; first row PS [2]] [sent-257, score-1.187]
66 Rotation dependent mixture of detectors (rot-dep mix) accounts for the characteristic appearance changes of body parts under rotation. [sent-285, score-0.976]
67 While the former detectors are (in)variant to local rotations, they do not take the pose-specific appearance into account. [sent-289, score-0.476]
68 In summary, the best local mixture appearance representation improves over the best single-component detector by 2. [sent-293, score-0.507]
69 This indicates that mixtures better handle the highly multi-modal local appearance of body parts. [sent-295, score-0.737]
70 We discussed the possibility of designing specialized body part detectors in Section 3. [sent-297, score-0.785]
71 We add those detectors to the pose-dep mix model, also including a Gaussian term on the torso location estimated via Maximum Likelihood on the training annotations. [sent-299, score-0.641]
72 Both the specialized torso and head detectors improve torso and head localization and, via the connected model, also improve the performance of other body parts. [sent-302, score-1.814]
73 Even though the better torso prediction improves head localization (+0. [sent-303, score-0.671]
74 3%), a specialized head detector still improves the performance (+1. [sent-304, score-0.539]
75 Since the parts are connected to the head via the torso, the influence of the spec-head detector on other body parts is found to be smaller. [sent-306, score-0.976]
76 In summary, specialized detectors improve estimation results for all body parts, and give a +0. [sent-307, score-0.729]
77 Now we combine the best-performing local appearance representation with the mid-level representation of [21]. [sent-311, score-0.441]
78 [table fragment: Setting | Torso | Upper leg | Lower leg | Upper arm | Forearm | Head | Total; rows: PS-flex + rot-dep single + rot-inv single] [sent-315, score-0.67]
79 [table fragment: Setting | Torso | Upper leg | Lower leg | Upper arm | Forearm | Head | Total; rows: local appearance + mid-level rot + pos + p/wise] [sent-366, score-1.064]
80 Overall, adding mid-level representations to the best performing local appearance model improves the results by 2. [sent-406, score-0.535]
81 The mid-level representation based on semi-global poselets models long-range part dependencies, while the local appearance model concentrates on local changes in the appearance of body parts. [sent-409, score-1.216]
82 To do so we remove all connections between the parts and evaluate part detectors only. [sent-412, score-0.424]
83 Local mixtures of part detectors allow modelling the pose-dependent appearance of limbs, while strong specific head and torso detectors push [table fragment: Setting | Torso | Upper leg | Lower leg | Upper arm | Forearm | Head | Total; first row PS-flex] [sent-417, score-2.124]
84 the performance of both most salient body parts (67. [sent-440, score-0.496]
85 Thus, the upper and lower arms, which are difficult to detect with local detectors, profit a lot from semi-global poselets (+28. [sent-448, score-0.422]
86 Interestingly, our full model including local appearance and mid-level representations outperforms not only the baseline PS [2] (69. [sent-459, score-0.47]
87 9%), who use similar mid-level representations but have a more simplistic local appearance model based on [2]. [sent-466, score-0.431]
88 We found this result interesting, as it clearly shows how much performance gain can be achieved by improving local part appearance while preserving the mid-level representation. [sent-474, score-0.423]
89 Interestingly, our local appearance model combined with basic Gaussian pairwise terms already outperforms their method (66. [sent-476, score-0.429]
90 This demonstrates the strengths of the proposed local appearance model based on mixtures of pose-dependent detectors and specific torso and head detectors. [sent-480, score-1.009]
91 This demonstrates the strength of combining local appearance modelling with flexible mid-level representations. [sent-491, score-0.473]
92 [table fragment: Setting | Torso | Upper leg | Lower leg | Upper arm | Forearm | Head | Total; rows: Our local appearance, Our full model, Andriluka et al.] [sent-492, score-1.035]
93 3(a)), as it captures the entire pose of the body and models other part dependencies. [sent-548, score-0.598]
94 Typical failure cases of our model include large variations in scale between body parts (Fig. [sent-551, score-0.531]
95 The improvement is achieved for all body parts apart from the head and lower legs. [sent-567, score-0.763]
96 Qualitative results: estimated poses and corresponding part marginal maps obtained by (a) our full model combining local appearance and flexible mid-level representation, (b) our local appearance model and (c) results by Yang & Ramanan [34]. [sent-577, score-0.973]
97 the evidence from a people detector into the PS framework to improve torso localisation. [sent-578, score-0.5]
98 Conclusion In this paper we investigated the use of 1) stronger appearance models and 2) more flexible spatial models. [sent-584, score-0.428]
99 The second route explored in this paper is more flexible spatial body models with image-conditioned terms based on mid-level representations, implemented as poselets. [sent-648, score-0.63]
100 Clustered pose and nonlinear appearance models for human pose estimation. [sent-760, score-0.573]
wordName wordTfidf (topN-words)
[('torso', 0.352), ('body', 0.349), ('appearance', 0.244), ('head', 0.234), ('lsp', 0.227), ('detectors', 0.185), ('leg', 0.176), ('pictorial', 0.172), ('arm', 0.159), ('specialized', 0.159), ('parts', 0.147), ('poselet', 0.146), ('rotation', 0.135), ('ps', 0.127), ('pose', 0.127), ('joints', 0.117), ('flexible', 0.113), ('representations', 0.105), ('mix', 0.104), ('pishchulin', 0.103), ('detector', 0.099), ('mixtures', 0.097), ('conditioned', 0.097), ('articulated', 0.095), ('part', 0.092), ('upper', 0.091), ('poselets', 0.087), ('legs', 0.078), ('parse', 0.075), ('andriluka', 0.07), ('arms', 0.067), ('granularity', 0.061), ('pairwise', 0.061), ('forearms', 0.057), ('leeds', 0.054), ('rot', 0.054), ('sports', 0.053), ('dependent', 0.051), ('structures', 0.051), ('dpm', 0.05), ('people', 0.049), ('pos', 0.049), ('ln', 0.049), ('pcp', 0.048), ('tailored', 0.047), ('improves', 0.047), ('local', 0.047), ('unary', 0.047), ('human', 0.045), ('basic', 0.042), ('templates', 0.042), ('absolute', 0.042), ('rotations', 0.042), ('findings', 0.042), ('modelling', 0.041), ('representation', 0.041), ('spatial', 0.041), ('midlevel', 0.04), ('improving', 0.04), ('complementary', 0.04), ('full', 0.039), ('dependencies', 0.039), ('kinematic', 0.039), ('poses', 0.039), ('marginal', 0.038), ('prediction', 0.038), ('articulation', 0.037), ('slda', 0.036), ('profit', 0.036), ('spec', 0.036), ('estimation', 0.036), ('eu', 0.036), ('model', 0.035), ('parsing', 0.034), ('lower', 0.033), ('treestructured', 0.033), ('stretchable', 0.033), ('wrist', 0.033), ('foreshortening', 0.033), ('dk', 0.033), ('tran', 0.032), ('sk', 0.031), ('kde', 0.031), ('sapp', 0.031), ('strengths', 0.031), ('strong', 0.03), ('models', 0.03), ('orientation', 0.03), ('adding', 0.029), ('person', 0.029), ('component', 0.029), ('planck', 0.029), ('offset', 0.029), ('augmented', 0.028), ('performing', 0.028), ('demonstrates', 0.028), ('refer', 0.028), ('cope', 0.027), ('rely', 0.027), ('benchmarks', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation
Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
Abstract: Typical approaches to articulated pose estimation combine spatial modelling of the human body with appearance modelling of body parts. This paper aims to push the state-of-the-art in articulated pose estimation in two ways. First we explore various types of appearance representations aiming to substantially improve the body part hypotheses. And second, we draw on and combine several recently proposed powerful ideas such as more flexible spatial models as well as image-conditioned spatial models. In a series of experiments we draw several important conclusions: (1) we show that the proposed appearance representations are complementary; (2) we demonstrate that even a basic tree-structured spatial human body model achieves state-of-the-art performance when augmented with the proper appearance representation; and (3) we show that the combination of the best performing appearance model with a flexible image-conditioned spatial model achieves the best result, significantly improving over the state of the art, on the “Leeds Sports Poses” and “Parse” benchmarks.
2 0.27152097 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
Author: Jungseock Joo, Shuo Wang, Song-Chun Zhu
Abstract: We present a part-based approach to the problem of human attribute recognition from a single image of a human body. To recognize the attributes of human from the body parts, it is important to reliably detect the parts. This is a challenging task due to the geometric variation such as articulation and view-point changes as well as the appearance variation of the parts arisen from versatile clothing types. The prior works have primarily focused on handling geometric variation by relying on pre-trained part detectors or pose estimators, which require manual part annotation, but the appearance variation has been relatively neglected in these works. This paper explores the importance of the appearance variation, which is directly related to the main task, attribute recognition. To this end, we propose to learn a rich appearance part dictionary of human with significantly less supervision by decomposing image lattice into overlapping windows at multiscale and iteratively refining local appearance templates. We also present quantitative results in which our proposed method outperforms the existing approaches.
3 0.24317393 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
Author: Taegyu Lim, Seunghoon Hong, Bohyung Han, Joon Hee Han
Abstract: We propose an on-line algorithm to extract a human by foreground/background segmentation and estimate pose of the human from the videos captured by moving cameras. We claim that a virtuous cycle can be created by appropriate interactions between the two modules to solve individual problems. This joint estimation problem is divided into two subproblems, foreground/background segmentation and pose tracking, which alternate iteratively for optimization; segmentation step generates foreground mask for human pose tracking, and human pose tracking step provides foreground response map for segmentation. The final solution is obtained when the iterative procedure converges. We evaluate our algorithm quantitatively and qualitatively in real videos involving various challenges, and present its outstanding performance compared to the state-of-the-art techniques for segmentation and pose estimation.
Author: Yan Yan, Elisa Ricci, Ramanathan Subramanian, Oswald Lanz, Nicu Sebe
Abstract: We propose a novel Multi-Task Learning framework (FEGA-MTL) for classifying the head pose of a person who moves freely in an environment monitored by multiple, large field-of-view surveillance cameras. As the target (person) moves, distortions in facial appearance owing to camera perspective and scale severely impede performance of traditional head pose classification methods. FEGA-MTL operates on a dense uniform spatial grid and learns appearance relationships across partitions as well as partition-specific appearance variations for a given head pose to build region-specific classifiers. Guided by two graphs which a-priori model appearance similarity among (i) grid partitions based on camera geometry and (ii) head pose classes, the learner efficiently clusters appearancewise related grid partitions to derive the optimal partitioning. For pose classification, upon determining the target’s position using a person tracker, the appropriate regionspecific classifier is invoked. Experiments confirm that FEGA-MTL achieves state-of-the-art classification with few training data.
5 0.21642333 143 iccv-2013-Estimating Human Pose with Flowing Puppets
Author: Silvia Zuffi, Javier Romero, Cordelia Schmid, Michael J. Black
Abstract: We address the problem of upper-body human pose estimation in uncontrolled monocular video sequences, without manual initialization. Most current methods focus on isolated video frames and often fail to correctly localize arms and hands. Inferring pose over a video sequence is advantageous because poses of people in adjacent frames exhibit properties of smooth variation due to the nature of human and camera motion. To exploit this, previous methods have used prior knowledge about distinctive actions or generic temporal priors combined with static image likelihoods to track people in motion. Here we take a different approach based on a simple observation: Information about how a person moves from frame to frame is present in the optical flow field. We develop an approach for tracking articulated motions that “links” articulated shape models of people in adjacent frames through the dense optical flow. Key to this approach is a 2D shape model of the body that we use to compute how the body moves over time. The resulting “flowing puppets” provide a way of integrating image evidence across frames to improve pose inference. We apply our method on a challenging dataset of TV video sequences and show state-of-the-art performance.
6 0.20583212 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?
7 0.2018705 205 iccv-2013-Human Re-identification by Matching Compositional Template with Cluster Sampling
8 0.19641687 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors
9 0.18996106 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
10 0.18368159 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
11 0.18190356 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
12 0.18170567 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion
13 0.17511089 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
14 0.158163 322 iccv-2013-Pose Estimation and Segmentation of People in 3D Movies
15 0.14738964 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose
16 0.13621932 344 iccv-2013-Recognising Human-Object Interaction via Exemplar Based Modelling
17 0.13033962 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
18 0.12545137 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
19 0.12298644 133 iccv-2013-Efficient Hand Pose Estimation from a Single Depth Image
20 0.11271337 118 iccv-2013-Discovering Object Functionality
topicId topicWeight
[(0, 0.241), (1, -0.006), (2, 0.026), (3, 0.026), (4, 0.154), (5, -0.161), (6, -0.022), (7, 0.079), (8, -0.104), (9, 0.103), (10, 0.05), (11, 0.031), (12, -0.191), (13, -0.178), (14, -0.086), (15, 0.18), (16, 0.041), (17, -0.059), (18, 0.083), (19, 0.064), (20, 0.16), (21, 0.062), (22, 0.123), (23, -0.052), (24, 0.021), (25, -0.016), (26, 0.01), (27, -0.055), (28, 0.023), (29, -0.05), (30, 0.009), (31, -0.01), (32, 0.068), (33, -0.026), (34, 0.118), (35, -0.037), (36, 0.02), (37, -0.023), (38, -0.033), (39, -0.026), (40, 0.054), (41, 0.051), (42, -0.008), (43, 0.013), (44, 0.027), (45, -0.004), (46, -0.089), (47, 0.063), (48, -0.022), (49, -0.037)]
simIndex simValue paperId paperTitle
same-paper 1 0.98394483 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation
Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
Abstract: Typical approaches to articulated pose estimation combine spatial modelling of the human body with appearance modelling of body parts. This paper aims to push the state-of-the-art in articulated pose estimation in two ways. First we explore various types of appearance representations aiming to substantially improve the body part hypotheses. And second, we draw on and combine several recently proposed powerful ideas such as more flexible spatial models as well as image-conditioned spatial models. In a series of experiments we draw several important conclusions: (1) we show that the proposed appearance representations are complementary; (2) we demonstrate that even a basic tree-structured spatial human body model achieves state-of-the-art performance when augmented with the proper appearance representation; and (3) we show that the combination of the best performing appearance model with a flexible image-conditioned spatial model achieves the best result, significantly improving over the state of the art, on the “Leeds Sports Poses” and “Parse” benchmarks.
2 0.85757113 118 iccv-2013-Discovering Object Functionality
Author: Bangpeng Yao, Jiayuan Ma, Li Fei-Fei
Abstract: Object functionality refers to the quality of an object that allows humans to perform some specific actions. It has been shown in psychology that functionality (affordance) is at least as essential as appearance in object recognition by humans. In computer vision, most previous work on functionality either assumes exactly one functionality for each object, or requires detailed annotation of human poses and objects. In this paper, we propose a weakly supervised approach to discover all possible object functionalities. Each object functionality is represented by a specific type of human-object interaction. Our method takes any possible human-object interaction into consideration, and evaluates image similarity in 3D rather than 2D in order to cluster human-object interactions more coherently. Experimental results on a dataset of people interacting with musical instruments show the effectiveness of our approach.
3 0.85427916 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
Author: Jiongxin Liu, Peter N. Belhumeur
Abstract: In this paper, we propose a novel approach for bird part localization, targeting fine-grained categories with wide variations in appearance due to different poses (including aspect and orientation) and subcategories. As it is challenging to represent such variations across a large set of diverse samples with tractable parametric models, we turn to individual exemplars. Specifically, we extend the exemplar-based models in [4] by enforcing pose and subcategory consistency at the parts. During training, we build pose-specific detectors scoring part poses across subcategories, and subcategory-specific detectors scoring part appearance across poses. At the testing stage, likely exemplars are matched to the image, suggesting part locations whose pose and subcategory consistency are well-supported by the image cues. From these hypotheses, part configuration can be predicted with very high accuracy. Experimental results demonstrate significant performance gains from our method on an extensive dataset: CUB-200-2011 [30], for both localization and classification tasks.
4 0.84648168 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?
Author: Elisabeta Marinoiu, Dragos Papava, Cristian Sminchisescu
Abstract: Human motion analysis in images and video is a central computer vision problem. Yet, there are no studies that reveal how humans perceive other people in images and how accurate they are. In this paper we aim to unveil some of the processing–as well as the levels of accuracy–involved in the 3D perception of people from images by assessing the human performance. Our contributions are: (1) the construction of an experimental apparatus that relates perception and measurement, in particular the visual and kinematic performance with respect to 3D ground truth when the human subject is presented an image of a person in a given pose; (2) the creation of a dataset containing images, articulated 2D and 3D pose ground truth, as well as synchronized eye movement recordings of human subjects, shown a variety of human body configurations, both easy and difficult, as well as their ‘re-enacted’ 3D poses; (3) quantitative analysis revealing the human performance in 3D pose reenactment tasks, the degree of stability in the visual fixation patterns of human subjects, and the way it correlates with different poses. We also discuss the implications of our findings for the construction of visual human sensing systems.
5 0.80667669 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion
Author: Ibrahim Radwan, Abhinav Dhall, Roland Goecke
Abstract: In this paper, an automatic approach for 3D pose reconstruction from a single image is proposed. The presence of human body articulation, hallucinated parts and cluttered background leads to ambiguity during the pose inference, which makes the problem non-trivial. Researchers have explored various methods based on motion and shading in order to reduce the ambiguity and reconstruct the 3D pose. The key idea of our algorithm is to impose both kinematic and orientation constraints. The former is imposed by projecting a 3D model onto the input image and pruning the parts, which are incompatible with the anthropomorphism. The latter is applied by creating synthetic views via regressing the input view to multiple oriented views. After applying the constraints, the 3D model is projected onto the initial and synthetic views, which further reduces the ambiguity. Finally, we borrow the direction of the unambiguous parts from the synthetic views to the initial one, which results in the 3D pose. Quantitative experiments are performed on the HumanEva-I dataset and qualitatively on unconstrained images from the Image Parse dataset. The results show the robustness of the proposed approach to accurately reconstruct the 3D pose from a single image.
6 0.78626835 46 iccv-2013-Allocentric Pose Estimation
7 0.75731397 344 iccv-2013-Recognising Human-Object Interaction via Exemplar Based Modelling
8 0.75725269 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
9 0.74288559 143 iccv-2013-Estimating Human Pose with Flowing Puppets
10 0.7373957 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose
12 0.70465392 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
13 0.70014143 205 iccv-2013-Human Re-identification by Matching Compositional Template with Cluster Sampling
14 0.69237256 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation
15 0.65962225 218 iccv-2013-Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data
16 0.6573177 322 iccv-2013-Pose Estimation and Segmentation of People in 3D Movies
17 0.64878064 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors
18 0.62616909 8 iccv-2013-A Deformable Mixture Parsing Model with Parselets
19 0.61912227 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
20 0.60680652 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests
topicId topicWeight
[(2, 0.076), (7, 0.011), (12, 0.013), (26, 0.08), (31, 0.062), (35, 0.273), (42, 0.095), (64, 0.071), (73, 0.028), (89, 0.181)]
simIndex simValue paperId paperTitle
1 0.89001226 90 iccv-2013-Content-Aware Rotation
Author: Kaiming He, Huiwen Chang, Jian Sun
Abstract: We present an image editing tool called Content-Aware Rotation. Casually shot photos can appear tilted, and are often corrected by rotation and cropping. This trivial solution may remove desired content and hurt image integrity. Instead of doing rigid rotation, we propose a warping method that creates the perception of rotation and avoids cropping. Human vision studies suggest that the perception of rotation is mainly due to horizontal/vertical lines. We design an optimization-based method that preserves the rotation of horizontal/vertical lines, maintains the completeness of the image content, and reduces the warping distortion. An efficient algorithm is developed to address the challenging optimization. We demonstrate our content-aware rotation method on a variety of practical cases.
2 0.831285 119 iccv-2013-Discriminant Tracking Using Tensor Representation with Semi-supervised Improvement
Author: Jin Gao, Junliang Xing, Weiming Hu, Steve Maybank
Abstract: Visual tracking has witnessed growing methods in object representation, which is crucial to robust tracking. The dominant mechanism in object representation is using image features encoded in a vector as observations to perform tracking, without considering that an image is intrinsically a matrix, or a 2nd-order tensor. Thus approaches following this mechanism inevitably lose a lot of useful information, and therefore cannot fully exploit the spatial correlations within the 2D image ensembles. In this paper, we address an image as a 2nd-order tensor in its original form, and find a discriminative linear embedding space approximation to the original nonlinear submanifold embedded in the tensor space based on the graph embedding framework. We specially design two graphs for characterizing the intrinsic local geometrical structure of the tensor space, so as to retain more discriminant information when reducing the dimension along certain tensor dimensions. However, spatial correlations within a tensor are not limited to the elements along these dimensions. This means that some part of the discriminant information may not be encoded in the embedding space. We introduce a novel technique called semi-supervised improvement to iteratively adjust the embedding space to compensate for the loss of discriminant information, hence improving the performance of our tracker. Experimental results on challenging videos demonstrate the effectiveness and robustness of the proposed tracker.
3 0.82039553 104 iccv-2013-Decomposing Bag of Words Histograms
Author: Ankit Gandhi, Karteek Alahari, C.V. Jawahar
Abstract: We aim to decompose a global histogram representation of an image into histograms of its associated objects and regions. This task is formulated as an optimization problem, given a set of linear classifiers, which can effectively discriminate the object categories present in the image. Our decomposition bypasses harder problems associated with accurately localizing and segmenting objects. We evaluate our method on a wide variety of composite histograms, and also compare it with MRF-based solutions. In addition to merely measuring the accuracy of decomposition, we also show the utility of the estimated object and background histograms for the task of image classification on the PASCAL VOC 2007 dataset.
same-paper 4 0.80137801 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation
Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
Abstract: Typical approaches to articulated pose estimation combine spatial modelling of the human body with appearance modelling of body parts. This paper aims to push the state-of-the-art in articulated pose estimation in two ways. First we explore various types of appearance representations aiming to substantially improve the body part hypotheses. And second, we draw on and combine several recently proposed powerful ideas such as more flexible spatial models as well as image-conditioned spatial models. In a series of experiments we draw several important conclusions: (1) we show that the proposed appearance representations are complementary; (2) we demonstrate that even a basic tree-structured spatial human body model achieves state-of-the-art performance when augmented with the proper appearance representation; and (3) we show that the combination of the best performing appearance model with a flexible image-conditioned spatial model achieves the best result, significantly improving over the state of the art, on the “Leeds Sports Poses” and “Parse” benchmarks.
5 0.72544754 36 iccv-2013-Accurate and Robust 3D Facial Capture Using a Single RGBD Camera
Author: Yen-Lin Chen, Hsiang-Tao Wu, Fuhao Shi, Xin Tong, Jinxiang Chai
Abstract: This paper presents an automatic and robust approach that accurately captures high-quality 3D facial performances using a single RGBD camera. The key of our approach is to combine the power of automatic facial feature detection and image-based 3D nonrigid registration techniques for 3D facial reconstruction. In particular, we develop a robust and accurate image-based nonrigid registration algorithm that incrementally deforms a 3D template mesh model to best match observed depth image data and important facial features detected from single RGBD images. The whole process is fully automatic and robust because it is based on single frame facial registration framework. The system is flexible because it does not require any strong 3D facial priors such as blendshape models. We demonstrate the power of our approach by capturing a wide range of 3D facial expressions using a single RGBD camera and achieve state-of-the-art accuracy by comparing against alternative methods.
6 0.71505272 21 iccv-2013-A Method of Perceptual-Based Shape Decomposition
7 0.70836139 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?
8 0.69428265 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
9 0.68573093 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
10 0.68416876 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
11 0.68402493 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests
12 0.68188179 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
13 0.6816892 379 iccv-2013-Semantic Segmentation without Annotating Segments
14 0.68098247 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose
15 0.68012297 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
16 0.67784119 143 iccv-2013-Estimating Human Pose with Flowing Puppets
17 0.67561829 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
18 0.67503577 375 iccv-2013-Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers
19 0.67299354 205 iccv-2013-Human Re-identification by Matching Compositional Template with Cluster Sampling
20 0.67262518 171 iccv-2013-Fix Structured Learning of 2013 ICCV paper k2opt.pdf