iccv iccv2013 iccv2013-273 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ibrahim Radwan, Abhinav Dhall, Roland Goecke
Abstract: In this paper, an automatic approach for 3D pose reconstruction from a single image is proposed. The presence of human body articulation, hallucinated parts and cluttered background leads to ambiguity during the pose inference, which makes the problem non-trivial. Researchers have explored various methods based on motion and shading in order to reduce the ambiguity and reconstruct the 3D pose. The key idea of our algorithm is to impose both kinematic and orientation constraints. The former is imposed by projecting a 3D model onto the input image and pruning the parts, which are incompatible with the anthropomorphism. The latter is applied by creating synthetic views via regressing the input view to multiple oriented views. After applying the constraints, the 3D model is projected onto the initial and synthetic views, which further reduces the ambiguity. Finally, we borrow the direction of the unambiguous parts from the synthetic views to the initial one, which results in the 3D pose. Quantitative experiments are performed on the HumanEva-I dataset and qualitatively on unconstrained images from the Image Parse dataset. The results show the robustness of the proposed approach to accurately reconstruct the 3D pose from a single image.
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper, an automatic approach for 3D pose reconstruction from a single image is proposed. [sent-4, score-0.368]
2 The presence of human body articulation, hallucinated parts and cluttered background leads to ambiguity during the pose inference, which makes the problem non-trivial. [sent-5, score-0.858]
3 Researchers have explored various methods based on motion and shading in order to reduce the ambiguity and reconstruct the 3D pose. [sent-6, score-0.281]
4 The latter is applied by creating synthetic views via regressing the input view to multiple oriented views. [sent-9, score-0.653]
5 After applying the constraints, the 3D model is projected onto the initial and synthetic views, which further reduces the ambiguity. [sent-10, score-0.346]
6 Finally, we borrow the direction of the unambiguous parts from the synthetic views to the initial one, which results in the 3D pose. [sent-11, score-0.772]
7 The results show the robustness of the proposed approach to accurately reconstruct the 3D pose from a single image. [sent-13, score-0.31]
8 Recent work in 3D pose reconstruction from 2D images can be categorised into (1) data-driven and (2) structure from motion based techniques. [sent-17, score-0.383]
9 In contrast, structure from motion methods extract the 3D points from the corresponding 2D points in different images for the same subject [18, 19] through estimating the camera parameters, bone lengths and parts directions. [sent-24, score-0.526]
10 Due to its limitations in the presence of self-occlusion, we add an inference step handling self-occlusion, improving the initial input to the 3D pose estimation. [sent-29, score-0.528]
11 Subsequently, we project a 3D model onto the 2D joints, which results in a very ambiguous 3D pose. [sent-30, score-0.261]
12 To solve for any remaining ambiguity, we use the Twin-GP regression method [5] to predict novel views from the initial one and project the 3D model onto the initial and synthetic views to estimate the relative depth of the parts. [sent-32, score-1.091]
13 Finally, to solve the problem of the part directions, we ‘borrow’ the unambiguous parts of the synthetic views to correct ambiguous parts of the initial view. [sent-33, score-0.962]
14 The key contributions of this paper are: • A framework for automatic 3D human pose reconstruction from a single 2D image, evaluated on difficult human pose scenarios. [sent-34, score-0.604]
15 • A self-occlusion reasoning method to improve the initialisation step and to increase the accuracy of the state-of-the-art 2D pose estimation, evaluated on a publicly available dataset. [sent-35, score-0.317]
16 • A method to automatically solve for the ambiguity of the parts’ direction instead of having to rely on user input as in [18]. [sent-36, score-0.245]
17 Background While there is a plethora of literature on 3D human pose reconstruction from 2D images, we focus our attention on research to predict the 3D pose using data-driven or structure from motion approaches. [sent-38, score-0.715]
18 Next, multiple synthetic views are generated from the initial view. [sent-40, score-0.47]
19 Then, structure from motion is used to enforce kinematic constraints and reduce the ambiguity. [sent-41, score-0.241]
20 Finally, orientation constraints are enforced from the synthetic views onto the initial input in order to generate the 3D pose. [sent-42, score-0.72]
21 Generally, the steps are: (1) extract features from a 2D image and then (2) infer the 3D pose by using the predefined predictors. [sent-44, score-0.272]
22 [1, 3] used silhouettes as an image descriptor followed by the relevance of a sparse regression method to map the extracted silhouettes to 3D pose and applied it to human tracking [2]. [sent-47, score-0.602]
23 multilevel block of SIFT feature descriptor) and predict the 3D pose in a Bayesian framework. [sent-51, score-0.272]
24 [5] proposed a twin Gaussian process regression method to estimate the 3D pose from Histogram of Oriented Gradients (HOG) and HMAX feature descriptors. [sent-54, score-0.535]
25 In this paper, we propose to reconstruct the 3D pose of a human body in images / frames in an uncontrolled environment. [sent-57, score-0.444]
26 The 3D pose is estimated from the 2D correspondences through a set of images / frames via applying a factorisation method, which was first introduced in [17] for reconstructing the 3D pose of a rigid structure. [sent-69, score-0.805]
27 [19], the 3D pose was recovered for articulated objects from multiple images of the same subject in different poses by imposing constraints on the rigid and non-rigid structure to reduce the ambiguity. [sent-72, score-0.548]
28 They combine the rigid and non-rigid structure in a non-linear optimisation framework to estimate the camera parameters and bone lengths. [sent-73, score-0.384]
29 Further, for finding a solution to the direction of hallucinated and hidden parts, they require manual input from the user. [sent-77, score-0.265]
30 We provide a solution to decode the direction of the ambiguous parts automatically. [sent-78, score-0.332]
31 Estimating 3D pose from 2D images has been investigated in other recent works, e. [sent-80, score-0.272]
32 [4, 9], which enforce a temporal consistency to reduce the ambiguity, while we estimate the 3D pose from only a single image. [sent-82, score-0.309]
33 Predicting the 3D pose from point correspondences in a single image has been earlier investigated in [16]. [sent-83, score-0.33]
34 [15] utilised a similar initialisation step (starting from noisy 2D points), followed by a different inference scheme. [sent-85, score-0.435]
35 They used covariance matrix adaptation (CMA) to sample the 3D pose space, while our proposed method enforces both kinematic and orientation constraints. [sent-86, score-0.462]
36 1, our proposed algorithm can be outlined in three subsequent stages: (1) Initialisation, (2) inferring synthetic views and (3) estimating 3D pose. [sent-90, score-0.419]
37 In the initialisation step, we therefore pursued a small and efficient trick to overcome the problem of self-occlusion (see Section 3. [sent-93, score-0.279]
38 Projecting the 3D model onto the initial view will result in ambiguous poses. [sent-95, score-0.465]
39 We explicitly impose geometric and kinematic constraints to reduce the ambiguity of the 3D pose via pruning those parts that are incompatible with anthropomorphism. [sent-96, score-0.798]
40 However, utilising these constraints only is not sufficient to completely solve the ambiguous parts, especially the direction of the limbs (towards or away from the camera). [sent-97, score-0.312]
41 This allows solving the problem of the remaining ambiguous poses not only for simple lab-controlled cases (e. [sent-100, score-0.271]
42 Initialisation Given the importance of the initialisation step, we first propose a novel way of dealing with self-occlusion to improve the results of the final pose estimation. [sent-105, score-0.509]
43 Mixture of Pictorial Structures: Yang and Ramanan [20] perform human pose estimation by representing the human body parts as a mixture of pictorial structure (MoPS) where the nodes are the parts in different orientations. [sent-106, score-0.778]
44 Following the pattern of the notations in [20], the score of a specific pose configuration is: S(I,p,t) = S(t)+? [sent-107, score-0.272]
45 To remove the ambiguity for the depth of different parts, we propose to infer multiple synthetic views from the initial one, which enables us to impose new constraints about the space of orientation for each bone, reducing the ambiguity of the 3D poses. [sent-158, score-0.981]
46 Based on the extracted 3D joints for each frame, we measured the heading angle of the human pose and then rotated that 3D pose to extract its 3D points in the 360 polar angles. [sent-163, score-0.794]
47 Projecting the landmarks onto the 2D plane with different orientations led to the 2D points of all joints in all polar angles. [sent-164, score-0.273]
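The view-synthesis step described in the two sentences above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the vertical axis is y, that the heading has already been aligned to 0 degrees, and that an orthographic projection is sufficient. The function names are hypothetical.

```python
import math

def rotate_about_vertical(joints, angle_deg):
    """Rotate 3D joints (x, y, z) about the vertical y-axis (assumed up-axis)."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [(cos_a * x + sin_a * z, y, -sin_a * x + cos_a * z)
            for x, y, z in joints]

def project_orthographic(joints):
    """Drop the depth coordinate to obtain 2D points (assumed camera model)."""
    return [(x, y) for x, y, _ in joints]

def synthesise_views(joints, step_deg=1):
    """Return {angle: 2D joints} for every polar angle in [0, 360)."""
    return {angle: project_orthographic(rotate_about_vertical(joints, angle))
            for angle in range(0, 360, step_deg)}
```

With step_deg=1 this yields one 2D skeleton per polar angle, which is the kind of paired data a view-to-view regressor would be trained on.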
48 Normalised Skeleton: The usage of the world coordinates in regression often results in bad predictions due to the large variance in the translation and scaling of the different human skeletons pursuing different actions. [sent-165, score-0.257]
49 The 2D input skeleton is a tree with the Hip point as the root; joints represent the nodes, and each edge between a parent and its child node represents a bone. [sent-167, score-0.274]
50 the parent p and child c pair of nodes, and $\theta^i_{p,c} = \tan^{-1}\frac{x_{p_y} - x_{c_y}}{x_{p_x} - x_{c_x}}$ is the orientation of the bone relative to the horizontal axis. [sent-175, score-0.288]
51 Thirdly, scaling the bone lengths of each skeleton li w. [sent-176, score-0.355]
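A rough sketch of the skeleton normalisation idea follows. The source text is truncated here, so several details are assumptions: the toy tree in PARENT is hypothetical, bone lengths are scaled so that they sum to 1, and math.atan2 is used instead of a plain inverse tangent so that the quadrant of each bone is preserved.

```python
import math

# Hypothetical toy tree (child -> parent); the root is "hip".
PARENT = {"spine": "hip", "head": "spine", "knee": "hip"}

def bone_params(joints, parent=PARENT):
    """Return {child: (orientation, length)} for every parent-child bone."""
    params = {}
    for child, par in parent.items():
        dx = joints[child][0] - joints[par][0]
        dy = joints[child][1] - joints[par][1]
        # Orientation relative to the horizontal axis, and bone length.
        params[child] = (math.atan2(dy, dx), math.hypot(dx, dy))
    return params

def normalise(joints, parent=PARENT, root="hip"):
    """Rebuild the skeleton with the root at the origin and unit total bone length."""
    params = bone_params(joints, parent)
    total = sum(length for _, length in params.values())
    out = {root: (0.0, 0.0)}
    pending = dict(parent)
    while pending:
        for child, par in list(pending.items()):
            if par in out:  # place a child once its parent has been placed
                theta, length = params[child]
                px, py = out[par]
                out[child] = (px + length / total * math.cos(theta),
                              py + length / total * math.sin(theta))
                del pending[child]
    return out
```

Under these assumptions the output does not change when the input skeleton is translated or uniformly scaled, which is the property the normalisation is meant to provide for regression.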
52 Subsequently, in this section, we will construct a specific model to regress from view i to view j. [sent-184, score-0.279]
53 Recently, Twin-GPR has been used instead of classic regression methods, such as Gaussian process regression and ridge regression, in structured prediction of the 3D pose from image observations. [sent-190, score-0.592]
54 Following [5], we build regression models to generate novel views from the input one. [sent-192, score-0.43]
55 , zjn) are the normalised instances for two consecutive views i and j (i. [sent-196, score-0.309]
56 [10] proposed an interesting regression method, which gradually reaches the ground truth in a cascaded fashion. [sent-213, score-0.284]
57 Inspired by [10], we pose the problem of learning view-specific regression models as a cascaded Twin-GPR problem. [sent-217, score-0.556]
58 Let Reg(θi, zi) be a function based on Twin-GPR, which maps zi → zj, where zi is the normalised vector of an input pose, zj is the vector of the novel view and θi is the view of zi. [sent-218, score-0.741]
59 Algorithm 1, which is computed N times, outlines the steps for generating novel views from the input one. [sent-222, score-0.27]
60 3 Initial View Estimation To initialise the cascaded regression process (Alg. [sent-225, score-0.33]
61 Knowing the initial view of the human pose significantly reduces the ambiguity of the 3D pose reconstruction [4]. [sent-227, score-1.04]
62 The data, which have been used to learn the regression models, have also been utilised to train the GMM. Algorithm 1: Cascaded Twin-GPR based synthetic view generation. Require: Input pose zi, view θi, step size δ. [sent-230, score-0.95]
63 Iterations N = (θj − θi)/δ; for view i ∈ N do: Regression: zj = Reg(θi, zi); Update θi = θi + δ; Update zi = zj; end for. [sent-231, score-0.365]
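The control flow of Algorithm 1 can be sketched as below. Only the loop structure is taken from the text; the regressor is passed in as a callable named reg, and any function used for it in an example is a stand-in, not Twin-GPR. The name cascaded_views is hypothetical.

```python
def cascaded_views(z_i, theta_i, theta_j, delta, reg):
    """Step from view theta_i towards theta_j in increments of delta.

    reg(theta, z) is assumed to map a normalised pose z seen from view
    theta to the pose seen from view theta + delta.
    """
    n_iter = int(round((theta_j - theta_i) / delta))  # N = (theta_j - theta_i) / delta
    z, theta = z_i, theta_i
    trajectory = []
    for _ in range(n_iter):
        z = reg(theta, z)       # Regression: z_j = Reg(theta_i, z_i)
        theta = theta + delta   # Update the view
        trajectory.append((theta, z))  # z now plays the role of z_i
    return trajectory
```

Each intermediate prediction is kept in the returned list, so the same call yields every synthetic view between the initial and the target orientation.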
64 Given the input image, in the inference, the orientation of the initial view is determined by the class with the maximum likelihood. [sent-234, score-0.324]
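As a simplified illustration of picking the view class with the maximum likelihood, the sketch below models each view class with a single diagonal Gaussian. This is an assumption made for brevity: the paper trains a GMM, and the features, class labels and parameters here are hypothetical.

```python
import math

def log_gaussian(x, mean, var):
    """Log-density of a diagonal Gaussian evaluated at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def most_likely_view(x, classes):
    """classes maps a view label to (mean, variance) vectors; return the best label."""
    return max(classes, key=lambda label: log_gaussian(x, *classes[label]))
```

Working in the log domain avoids numerical underflow when the feature vector has many dimensions.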
65 1 Propagating Ambiguous 3D Poses To estimate the 3D pose, we start with the 2D joints of the initial view and elevate to 3D pose. [sent-239, score-0.376]
66 The 3D pose is parametrised as a vector $v = [v_1^T, \cdots, v_n^T]$ of n 3D points corresponding to 2D input points $u = [u_1^T, \cdots, u_n^T]$. [sent-240, score-0.335]
67 The 3D pose retrieval can be seen as a solution of a linear system, if multiple input images are available. [sent-241, score-0.335]
68 The kinematic constraints have been enforced via learning the upper and lower bounds of bone angles from the training data as in [19]. [sent-248, score-0.411]
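One way to picture the bound-based kinematic constraint is the sketch below. It assumes each bone angle is a scalar and that the bounds are simply the minimum and maximum seen in training; the procedure in [19] may differ, and all names are hypothetical.

```python
def learn_bounds(training_angles):
    """training_angles: {bone: [angle, ...]} -> {bone: (lower, upper)}."""
    return {bone: (min(vals), max(vals)) for bone, vals in training_angles.items()}

def is_plausible(pose_angles, bounds):
    """True if every bone angle of the pose lies within its learned bounds."""
    return all(bounds[bone][0] <= angle <= bounds[bone][1]
               for bone, angle in pose_angles.items())

def prune(candidates, bounds):
    """Keep only the candidate poses that satisfy the kinematic bounds."""
    return [pose for pose in candidates if is_plausible(pose, bounds)]
```

Candidates that bend a joint beyond anything observed in the training data are discarded, which removes anatomically implausible interpretations before the orientation constraints are applied.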
69 As mentioned before, structure from motion based methods reconstruct the 3D pose via estimating the camera scale, bone length and depth by projecting the 3D model onto the 2D point correspondences in different images. [sent-254, score-0.861]
70 Firstly, we remove the ambiguity of the depth for different parts with the help of the synthetic views. [sent-256, score-0.508]
71 Given point correspondences for the input and synthetic views, our aim is to estimate the bone lengths and depths of different parts. [sent-257, score-0.609]
72 The regression step to create multiple synthetic views can result in different bone scales. [sent-258, score-0.77]
73 To overcome this problem and given that we work with just one image (showing one human body), we can safely constrain the problem by fixing the corresponding bone lengths in all views to be the same as in the initial input image. [sent-259, score-0.7]
74 Valmadre and Lucey [18] compute the magnitude of the depth of each part via a factorisation method starting from a weak perspective projection between the 2D correspondences of different images and then deriving the required parameters by minimising the reconstruction error. [sent-261, score-0.393]
75 Inspired by [18], we utilise the same factorisation approach on the correspondences from the initial view and some of the synthetic views inferring the relative depth of each part. [sent-262, score-0.88]
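To show why a fixed bone length constrains the depth magnitude but not its sign, the sketch below uses the simple foreshortening relation under scaled orthographic projection, l^2 = (d/s)^2 + dz^2. This is a swapped-in, simpler relation for illustration and not the factorisation of [18]; the scale parameter and function names are assumptions.

```python
import math

def relative_depth_magnitude(p2d, c2d, bone_length, scale=1.0):
    """|dz| between parent and child joints, given the known 3D bone length."""
    du = (c2d[0] - p2d[0]) / scale
    dv = (c2d[1] - p2d[1]) / scale
    squared = bone_length ** 2 - (du ** 2 + dv ** 2)
    # Noisy 2D points can make the bone look longer than it is; clamp at zero.
    return math.sqrt(max(squared, 0.0))

def candidate_depths(p2d, c2d, bone_length, scale=1.0):
    """Both depth hypotheses: the child is further from or closer to the camera."""
    m = relative_depth_magnitude(p2d, c2d, bone_length, scale)
    return (m, -m)
```

Both returned values explain the 2D observation equally well, which is the direction ambiguity that the synthetic views are used to resolve.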
76 The approach of Valmadre and Lucey [18] failed to solve the ambiguity for many poses with hallucinated parts and, hence, the user was asked to manually determine the direction (i. [sent-264, score-0.588]
77 Then, we determine the remaining ambiguous parts G = (g1, . [sent-269, score-0.289]
78 We repeat the previous two steps on all of the synthetic views, where we project the 3D model onto each synthetic view, which results in a 3D model for each view with some parts being ambiguous and others not. [sent-273, score-0.829]
79 We search over all unambiguous parts in the 3D poses, obtained from the synthetic views, which are corresponding to the ambiguous parts G. [sent-274, score-0.664]
80 Then, we iteratively borrow the direction to the 3D pose of the input image until all ambiguities are removed. [sent-276, score-0.472]
81 That is why we add one view at a time and stop when all ambiguous parts are removed. [sent-278, score-0.402]
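The view-by-view borrowing strategy can be sketched as follows. The sketch assumes that each part's direction is encoded as +1, -1 or None (ambiguous) and that the signs obtained from a synthetic view have already been expressed in the frame of the initial view; both the encoding and the function name are hypothetical.

```python
def borrow_directions(initial_signs, view_signs):
    """Resolve ambiguous part directions by adding one synthetic view at a time.

    initial_signs: {part: +1, -1 or None} for the initial view.
    view_signs: list of such dicts, one per synthetic view, in the order tried.
    Returns the resolved signs and the number of synthetic views consulted.
    """
    resolved = dict(initial_signs)
    views_used = 0
    for signs in view_signs:
        ambiguous = [part for part, s in resolved.items() if s is None]
        if not ambiguous:
            break  # stop as soon as every part has a direction
        views_used += 1
        for part in ambiguous:
            if signs.get(part) is not None:
                resolved[part] = signs[part]  # borrow the unambiguous direction
    return resolved, views_used
```

Parts that are already unambiguous in the initial view are never overwritten, and later views are consulted only while some part is still unresolved.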
82 The big advantage of using structure from motion after regressing multiple views is to prune the noisy predictions introduced by the regression process and to improve the result of the final 3D pose. [sent-280, score-0.559]
83 Experiments We evaluate the performance of our method in recovering the 3D pose from a single image in different experiments in both quantitative and qualitative ways. [sent-282, score-0.356]
84 Data All data used in training both the cascaded Twin-GPR and the GMM estimating the view of the input pose are collected from the CMU Mocap dataset. [sent-285, score-0.612]
85 In contrast, our method estimates 3D pose from a single image. [sent-303, score-0.272]
86 In the initialisation step, we propose a solution to the problem of overlapping and missing parts due to self-occlusion by breaking the springs between non-adjacent nodes. [sent-330, score-0.428]
87 Note that most of the errors are due to the offset in the 2D points resulting from the output of the initialisation step. [sent-334, score-0.237]
88 the computational time, estimating the 3D pose takes around 1 min for each input image including the time required to get the initial 2D view. [sent-338, score-0.466]
89 For both techniques, the initialisation is performed via manually annotated 2D points. [sent-346, score-0.237]
90 In our method, the algorithm succeeds in the vast majority of cases to remove this type of ambiguity by sharing the sign of the unambiguous parts in the various synthetic views. [sent-352, score-0.557]
91 Specifically, the motivation behind this comparison is to show the advantage of employing structure from motion after regressing multiple views from the initial one. [sent-355, score-0.457]
92 Noise that results from the regression predictions is filtered out afterwards in the factorisation, which reduces the ambiguity in the final stage. [sent-356, score-0.342]
93 It is visually evident that handling self-occlusion improves the initialisation accuracy and stops the error from being propagated to the synthesised views and then to the final 3D pose. [sent-367, score-0.501]
94 (b) Results of recovering the 3D pose for the input image by Valmadre et al. [sent-369, score-0.382]
95 Conclusions We propose a 3D pose reconstruction algorithm from a single 2D image. [sent-374, score-0.322]
96 In the initialisation step, we utilise a well-known 2D part detector to produce the 2D joints. [sent-375, score-0.274]
97 To enforce more constraints, we generate synthetic views by regressing the initial view to multiple oriented views. [sent-377, score-0.681]
98 The ambiguity is reduced by imposing kinematic and orientation constraints on the 3D ambiguous pose resulting from the projection of a 3D model onto the initial pose. [sent-378, score-1.077]
99 Future work in ... Figure 4: Visual comparison of the final 3D pose estimate (a) without and (b) with self-occlusion handling. [sent-381, score-0.309]
100 In (b), the initialisation is accurate, leading to an accurate 3D pose estimate. [sent-383, score-0.509]
wordName wordTfidf (topN-words)
[('pose', 0.272), ('initialisation', 0.237), ('bone', 0.231), ('valmadre', 0.209), ('views', 0.207), ('ambiguity', 0.182), ('ambiguous', 0.178), ('synthetic', 0.172), ('regression', 0.16), ('factorisation', 0.159), ('hallucinated', 0.159), ('mops', 0.149), ('joints', 0.135), ('kinematic', 0.133), ('zj', 0.126), ('cascaded', 0.124), ('utilised', 0.12), ('view', 0.113), ('zi', 0.112), ('parts', 0.111), ('normalised', 0.102), ('regressing', 0.098), ('poses', 0.093), ('unambiguous', 0.092), ('articulated', 0.092), ('initial', 0.091), ('onto', 0.083), ('cmu', 0.081), ('selfocclusion', 0.08), ('skeleton', 0.076), ('body', 0.074), ('humaneva', 0.074), ('initialised', 0.074), ('normalisation', 0.074), ('mocap', 0.073), ('reg', 0.069), ('agarwal', 0.067), ('twin', 0.066), ('parse', 0.064), ('input', 0.063), ('motion', 0.061), ('human', 0.06), ('generalisation', 0.06), ('ibrahim', 0.06), ('radwan', 0.06), ('monocular', 0.059), ('correspondences', 0.058), ('orientation', 0.057), ('handling', 0.057), ('borrow', 0.056), ('pictorial', 0.055), ('silhouettes', 0.055), ('polar', 0.055), ('spring', 0.053), ('cma', 0.053), ('chip', 0.053), ('incompatible', 0.053), ('regress', 0.053), ('reconstruction', 0.05), ('jogging', 0.049), ('minimising', 0.049), ('lengths', 0.048), ('recovering', 0.047), ('constraints', 0.047), ('pages', 0.047), ('gmm', 0.046), ('abhinav', 0.046), ('initialise', 0.046), ('inference', 0.045), ('reasoning', 0.045), ('canberra', 0.044), ('utilising', 0.044), ('rigid', 0.044), ('bo', 0.044), ('depth', 0.043), ('direction', 0.043), ('pursued', 0.042), ('mixtures', 0.041), ('estimating', 0.04), ('occlusion', 0.04), ('parentheses', 0.039), ('ambiguities', 0.038), ('divergence', 0.038), ('reconstruct', 0.038), ('skeletons', 0.037), ('optimisation', 0.037), ('sigal', 0.037), ('utilise', 0.037), ('qualitative', 0.037), ('estimate', 0.037), ('occluded', 0.035), ('camera', 0.035), ('mixture', 0.035), ('pi', 0.034), ('projection', 0.034), ('projecting', 0.034), ('noisy', 
0.033), ('uncalibrated', 0.033), ('lucey', 0.033), ('wei', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion
2 0.2184934 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?
Author: Elisabeta Marinoiu, Dragos Papava, Cristian Sminchisescu
Abstract: Human motion analysis in images and video is a central computer vision problem. Yet, there are no studies that reveal how humans perceive other people in images and how accurate they are. In this paper we aim to unveil some of the processing–as well as the levels of accuracy–involved in the 3D perception of people from images by assessing the human performance. Our contributions are: (1) the construction of an experimental apparatus that relates perception and measurement, in particular the visual and kinematic performance with respect to 3D ground truth when the human subject is presented an image of a person in a given pose; (2) the creation of a dataset containing images, articulated 2D and 3D pose ground truth, as well as synchronized eye movement recordings of human subjects, shown a variety of human body configurations, both easy and difficult, as well as their ‘re-enacted’ 3D poses; (3) quantitative analysis revealing the human performance in 3D pose reenactment tasks, the degree of stability in the visual fixation patterns of human subjects, and the way it correlates with different poses. We also discuss the implications of our findings for the construction of visual human sensing systems.
3 0.20282786 133 iccv-2013-Efficient Hand Pose Estimation from a Single Depth Image
Author: Chi Xu, Li Cheng
Abstract: We tackle the practical problem of hand pose estimation from a single noisy depth image. A dedicated three-step pipeline is proposed: Initial estimation step provides an initial estimation of the hand in-plane orientation and 3D location; Candidate generation step produces a set of 3D pose candidates from the Hough voting space with the help of the rotational invariant depth features; Verification step delivers the final 3D hand pose as the solution to an optimization problem. We analyze the depth noises, and suggest tips to minimize their negative impacts on the overall performance. Our approach is able to work with Kinect-type noisy depth images, and reliably produces pose estimations of general motions efficiently (12 frames per second). Extensive experiments are conducted to qualitatively and quantitatively evaluate the performance with respect to the state-of-the-art methods that have access to additional RGB images. Our approach is shown to deliver on par or even better results.
4 0.20106901 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose
Author: Andreas M. Lehrmann, Peter V. Gehler, Sebastian Nowozin
Abstract: Having a sensible prior of human pose is a vital ingredient for many computer vision applications, including tracking and pose estimation. While the application of global non-parametric approaches and parametric models has led to some success, finding the right balance in terms of flexibility and tractability, as well as estimating model parameters from data has turned out to be challenging. In this work, we introduce a sparse Bayesian network model of human pose that is non-parametric with respect to the estimation of both its graph structure and its local distributions. We describe an efficient sampling scheme for our model and show its tractability for the computation of exact log-likelihoods. We empirically validate our approach on the Human 3.6M dataset and demonstrate superior performance to global models and parametric networks. We further illustrate our model’s ability to represent and compose poses not present in the training set (compositionality) and describe a speed-accuracy trade-off that allows realtime scoring of poses.
Author: Danhang Tang, Tsz-Ho Yu, Tae-Kyun Kim
Abstract: This paper presents the first semi-supervised transductive algorithm for real-time articulated hand pose estimation. Noisy data and occlusions are the major challenges of articulated hand pose estimation. In addition, the discrepancies among realistic and synthetic pose data undermine the performances of existing approaches that use synthetic data extensively in training. We therefore propose the Semi-supervised Transductive Regression (STR) forest which learns the relationship between a small, sparsely labelled realistic dataset and a large synthetic dataset. We also design a novel data-driven, pseudo-kinematic technique to refine noisy or occluded joints. Our contributions include: (i) capturing the benefits of both realistic and synthetic data via transductive learning; (ii) showing accuracies can be improved by considering unlabelled data; and (iii) introducing a pseudo-kinematic technique to refine articulations efficiently. Experimental results show not only the promising performance of our method with respect to noise and occlusions, but also its superiority over state-of-the-arts in accuracy, robustness and speed.
6 0.18403222 218 iccv-2013-Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data
7 0.18170567 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation
8 0.1760156 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors
9 0.17408872 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
10 0.15060186 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
11 0.1452717 143 iccv-2013-Estimating Human Pose with Flowing Puppets
12 0.13804214 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
13 0.13639098 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
14 0.13560943 322 iccv-2013-Pose Estimation and Segmentation of People in 3D Movies
15 0.13363114 46 iccv-2013-Allocentric Pose Estimation
16 0.13163964 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation
17 0.12868868 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
18 0.12597442 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
20 0.12264047 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
topicId topicWeight
[(0, 0.272), (1, -0.087), (2, -0.024), (3, 0.054), (4, 0.071), (5, -0.114), (6, 0.045), (7, -0.001), (8, -0.063), (9, 0.127), (10, 0.041), (11, -0.001), (12, -0.19), (13, -0.104), (14, -0.017), (15, 0.156), (16, 0.023), (17, -0.119), (18, 0.044), (19, 0.101), (20, 0.112), (21, -0.014), (22, 0.074), (23, 0.006), (24, 0.01), (25, -0.048), (26, 0.032), (27, 0.025), (28, 0.006), (29, -0.063), (30, 0.04), (31, 0.074), (32, 0.03), (33, -0.009), (34, 0.011), (35, 0.015), (36, 0.036), (37, 0.038), (38, -0.018), (39, 0.002), (40, -0.001), (41, -0.033), (42, -0.082), (43, 0.013), (44, 0.021), (45, 0.033), (46, 0.002), (47, -0.026), (48, -0.008), (49, 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.98005247 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion
2 0.89673489 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?
Author: Elisabeta Marinoiu, Dragos Papava, Cristian Sminchisescu
Abstract: Human motion analysis in images and video is a central computer vision problem. Yet, there are no studies that reveal how humans perceive other people in images and how accurate they are. In this paper we aim to unveil some of the processing–as well as the levels of accuracy–involved in the 3D perception of people from images by assessing the human performance. Our contributions are: (1) the construction of an experimental apparatus that relates perception and measurement, in particular the visual and kinematic performance with respect to 3D ground truth when the human subject is presented an image of a person in a given pose; (2) the creation of a dataset containing images, articulated 2D and 3D pose ground truth, as well as synchronized eye movement recordings of human subjects, shown a variety of human body configurations, both easy and difficult, as well as their ‘re-enacted’ 3D poses; (3) quantitative analysis revealing the human performance in 3D pose reenactment tasks, the degree of stability in the visual fixation patterns of human subjects, and the way it correlates with different poses. We also discuss the implications of our find- ings for the construction of visual human sensing systems.
3 0.88480061 218 iccv-2013-Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data
Author: Srinath Sridhar, Antti Oulasvirta, Christian Theobalt
Abstract: Tracking the articulated 3D motion of the hand has important applications, for example, in human–computer interaction and teleoperation. We present a novel method that can capture a broad range of articulated hand motions at interactive rates. Our hybrid approach combines, in a voting scheme, a discriminative, part-based pose retrieval method with a generative pose estimation method based on local optimization. Color information from a multiview RGB camera setup along with a person-specific hand model are used by the generative method to find the pose that best explains the observed images. In parallel, our discriminative pose estimation method uses fingertips detected on depth data to estimate a complete or partial pose of the hand by adopting a part-based pose retrieval strategy. This part-based strategy helps reduce the search space drastically in comparison to a global pose retrieval strategy. Quantitative results show that our method achieves state-of-the-art accuracy on challenging sequences and a near-realtime performance of 10 fps on a desktop computer.
4 0.86444449 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose
Author: Andreas M. Lehrmann, Peter V. Gehler, Sebastian Nowozin
Abstract: Having a sensible prior of human pose is a vital ingredient for many computer vision applications, including tracking and pose estimation. While the application of global non-parametric approaches and parametric models has led to some success, finding the right balance in terms of flexibility and tractability, as well as estimating model parameters from data has turned out to be challenging. In this work, we introduce a sparse Bayesian network model of human pose that is non-parametric with respect to the estimation of both its graph structure and its local distributions. We describe an efficient sampling scheme for our model and show its tractability for the computation of exact log-likelihoods. We empirically validate our approach on the Human3.6M dataset and demonstrate superior performance to global models and parametric networks. We further illustrate our model’s ability to represent and compose poses not present in the training set (compositionality) and describe a speed-accuracy trade-off that allows realtime scoring of poses.
5 0.86379039 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests
Author: Danhang Tang, Tsz-Ho Yu, Tae-Kyun Kim
Abstract: This paper presents the first semi-supervised transductive algorithm for real-time articulated hand pose estimation. Noisy data and occlusions are the major challenges of articulated hand pose estimation. In addition, the discrepancies among realistic and synthetic pose data undermine the performances of existing approaches that use synthetic data extensively in training. We therefore propose the Semi-supervised Transductive Regression (STR) forest which learns the relationship between a small, sparsely labelled realistic dataset and a large synthetic dataset. We also design a novel data-driven, pseudo-kinematic technique to refine noisy or occluded joints. Our contributions include: (i) capturing the benefits of both realistic and synthetic data via transductive learning; (ii) showing accuracies can be improved by considering unlabelled data; and (iii) introducing a pseudo-kinematic technique to refine articulations efficiently. Experimental results show not only the promising performance of our method with respect to noise and occlusions, but also its superiority over state-of-the-art methods in accuracy, robustness and speed.
7 0.81936026 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors
8 0.80150568 118 iccv-2013-Discovering Object Functionality
9 0.80054402 46 iccv-2013-Allocentric Pose Estimation
10 0.79869479 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation
11 0.78448427 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation
12 0.77581817 143 iccv-2013-Estimating Human Pose with Flowing Puppets
13 0.75787991 133 iccv-2013-Efficient Hand Pose Estimation from a Single Depth Image
14 0.74103147 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
15 0.6888501 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
16 0.68697745 344 iccv-2013-Recognising Human-Object Interaction via Exemplar Based Modelling
17 0.66956466 130 iccv-2013-Dynamic Structured Model Selection
18 0.66516763 322 iccv-2013-Pose Estimation and Segmentation of People in 3D Movies
19 0.64232308 306 iccv-2013-Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items
20 0.63415927 278 iccv-2013-Multi-scale Topological Features for Hand Posture Representation and Analysis
topicId topicWeight
[(2, 0.061), (7, 0.017), (12, 0.016), (13, 0.275), (26, 0.083), (31, 0.056), (35, 0.016), (42, 0.111), (64, 0.053), (73, 0.042), (89, 0.193)]
simIndex simValue paperId paperTitle
1 0.89835054 429 iccv-2013-Tree Shape Priors with Connectivity Constraints Using Convex Relaxation on General Graphs
Author: Jan Stühmer, Peter Schröder, Daniel Cremers
Abstract: We propose a novel method to include a connectivity prior into image segmentation that is based on a binary labeling of a directed graph, in this case a geodesic shortest path tree. Specifically, we make two contributions: First, we construct a geodesic shortest path tree with a distance measure that is related to the image data and the bending energy of each path in the tree. Second, we include a connectivity prior in our segmentation model that allows segmenting not only a single elongated structure but a whole connected branching tree. Because both our segmentation model and the connectivity constraint are convex, a globally optimal solution can be found. To this end, we generalize a recent primal-dual algorithm for continuous convex optimization to an arbitrary graph structure. To validate our method we present results on data from medical imaging in angiography and retinal blood vessel segmentation.
Author: Yannis Avrithis
Abstract: Inspired by the close relation between nearest neighbor search and clustering in high-dimensional spaces as well as the success of one helping to solve the other, we introduce a new paradigm where both problems are solved simultaneously. Our solution is recursive, not in the size of input data but in the number of dimensions. One result is a clustering algorithm that is tuned to small codebooks but does not need all data in memory at the same time and is practically constant in the data size. As a by-product, a tree structure performs either exact or approximate quantization on trained centroids, the latter being not very precise but extremely fast. A lesser contribution is a new indexing scheme for image retrieval that exploits multiple small codebooks to provide an arbitrarily fine partition of the descriptor space. Large-scale experiments on public datasets exhibit state-of-the-art performance and remarkable generalization.
3 0.8302778 145 iccv-2013-Estimating the Material Properties of Fabric from Video
Author: Katherine L. Bouman, Bei Xiao, Peter Battaglia, William T. Freeman
Abstract: Passively estimating the intrinsic material properties of deformable objects moving in a natural environment is essential for scene understanding. We present a framework to automatically analyze videos of fabrics moving under various unknown wind forces, and recover two key material properties of the fabric: stiffness and area weight. We extend features previously developed to compactly represent static image textures to describe video textures, such as fabric motion. A discriminatively trained regression model is then used to predict the physical properties of fabric from these features. The success of our model is demonstrated on a new, publicly available database of fabric videos with corresponding measured ground truth material properties. We show that our predictions are well correlated with ground truth measurements of stiffness and density for the fabrics. Our contributions include: (a) a database that can be used for training and testing algorithms for passively predicting fabric properties from video, (b) an algorithm for predicting the material properties of fabric from a video, and (c) a perceptual study of humans’ ability to estimate the material properties of fabric from videos and images.
4 0.79854029 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
Author: Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
Abstract: We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the PASCAL VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.
same-paper 5 0.79727435 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion
Author: Ibrahim Radwan, Abhinav Dhall, Roland Goecke
Abstract: In this paper, an automatic approach for 3D pose reconstruction from a single image is proposed. The presence of human body articulation, hallucinated parts and cluttered background leads to ambiguity during the pose inference, which makes the problem non-trivial. Researchers have explored various methods based on motion and shading in order to reduce the ambiguity and reconstruct the 3D pose. The key idea of our algorithm is to impose both kinematic and orientation constraints. The former is imposed by projecting a 3D model onto the input image and pruning the parts, which are incompatible with the anthropomorphism. The latter is applied by creating synthetic views via regressing the input view to multiple oriented views. After applying the constraints, the 3D model is projected onto the initial and synthetic views, which further reduces the ambiguity. Finally, we borrow the direction of the unambiguous parts from the synthetic views to the initial one, which results in the 3D pose. Quantitative experiments are performed on the HumanEva-I dataset and qualitatively on unconstrained images from the Image Parse dataset. The results show the robustness of the proposed approach to accurately reconstruct the 3D pose from a single image.
6 0.712044 389 iccv-2013-Shortest Paths with Curvature and Torsion
7 0.71203673 349 iccv-2013-Regionlets for Generic Object Detection
8 0.69543183 287 iccv-2013-Neighbor-to-Neighbor Search for Fast Coding of Feature Vectors
9 0.69435543 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
10 0.69251418 414 iccv-2013-Temporally Consistent Superpixels
11 0.69246244 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
13 0.68977606 285 iccv-2013-NEIL: Extracting Visual Knowledge from Web Data
14 0.68899202 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?
15 0.68816078 82 iccv-2013-Compensating for Motion during Direct-Global Separation
16 0.68725038 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition
17 0.68717682 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?
18 0.68620479 77 iccv-2013-Codemaps - Segment, Classify and Search Objects Locally
20 0.68159443 255 iccv-2013-Local Signal Equalization for Correspondence Matching