iccv iccv2013 iccv2013-24 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andreas M. Lehrmann, Peter V. Gehler, Sebastian Nowozin
Abstract: Having a sensible prior of human pose is a vital ingredient for many computer vision applications, including tracking and pose estimation. While the application of global non-parametric approaches and parametric models has led to some success, finding the right balance in terms of flexibility and tractability, as well as estimating model parameters from data has turned out to be challenging. In this work, we introduce a sparse Bayesian network model of human pose that is non-parametric with respect to the estimation of both its graph structure and its local distributions. We describe an efficient sampling scheme for our model and show its tractability for the computation of exact log-likelihoods. We empirically validate our approach on the Human 3.6M dataset and demonstrate superior performance to global models and parametric networks. We further illustrate our model’s ability to represent and compose poses not present in the training set (compositionality) and describe a speed-accuracy trade-off that allows realtime scoring of poses.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Having a sensible prior of human pose is a vital ingredient for many computer vision applications, including tracking and pose estimation. [sent-6, score-0.607]
2 In this work, we introduce a sparse Bayesian network model of human pose that is non-parametric with respect to the estimation of both its graph structure and its local distributions. [sent-8, score-0.748]
3 Introduction Reasoning about human pose is a key ingredient in recent successful applications of computer vision systems [20]. [sent-14, score-0.33]
4 Accurately capturing the variability of human pose is challenging because there is both a variation between different persons as well as a combinatorial number of possible poses a single person can assume. [sent-15, score-0.384]
5 In this paper we propose a pose prior, a generative probabilistic model of static human pose. [sent-16, score-0.288]
6 A good pose prior must generalize to unseen poses and persons. [sent-18, score-0.37]
7 In order to generalize the prior must be compositional: it must represent the variations of parts that frequently occur together and produce a pose by combining these parts. [sent-24, score-0.242]
8 We achieve compositionality by factorizing the pose representation into a Bayesian network [13]. [sent-25, score-0.674]
9 The sparse hierarchical structure of the network enables efficient computation of likelihoods and exact sampling. [sent-26, score-0.418]
10 To apply a Bayesian network on human pose data we need to specify the network structure and conditional probability distributions along the network and it is here that we make two novel technical contributions. [sent-27, score-1.378]
11 First, we enhance the representative power of Bayesian networks by proposing non-parametric Bayesian networks in which the conditional distributions are represented by conditional kernel density estimates. [sent-28, score-0.61]
12 Second, we use structure learning to obtain the network structure by finding parts of the pose that strongly depend on each other, leveraging non-parametric mutual information estimators on continuous joint data. [sent-29, score-0.765]
13 Related Work Pose priors are most often used within pose estimation systems and therefore some of the related works we discuss below incorporate a likelihood term that is computed from an observed image. [sent-38, score-0.25]
14 A natural idea to build a pose prior is to use the tree structure of the human skeleton as a starting point. [sent-40, score-0.595]
15 Models that follow the skeletal structure are called kinematic chain models [2] and they allow us to incorporate prior beliefs about joint angles. [sent-41, score-0.632]
16 In [17] the authors used a multivariate Normal distribution along the kinematic chain and estimate the parameters from motion capture data. [sent-42, score-0.51]
17 The different choices of possible parametrizations in terms of joint angles or relative world coordinates in a kinematic tree model give rise to qualitatively different behaviours [10]. [sent-43, score-0.569]
18 Despite this flexibility a kinematic tree model has clear limitations, as sharply argued in [15]; it is unable to express the coordination of different limbs and fails to represent global balance and gravity constraints. [sent-44, score-0.744]
19 We will demonstrate that we can avoid these limitations by using a tree model that does not correspond to the kinematic chain but instead is chosen to optimally approximate the true distribution of poses. [sent-45, score-0.725]
20 The resulting tree no longer corresponds to a skeleton (Figure 1c and 3b) but retains all computational advantages of a tree-structured model. [sent-46, score-0.23]
21 Previous works have attempted to overcome the limitations of the kinematic tree model in different ways. [sent-47, score-0.524]
22 In [3] the authors have used a global kernel density model on human pose. [sent-48, score-0.292]
23 This model is global and does not reflect the combinatorial nature of human pose hence it is suitable only for modeling specific poses. [sent-49, score-0.339]
24 Another approach proposed in [21] has been to add further interactions to the kinematic tree so that limb-limb coordination and penetration constraints are modelled. [sent-50, score-0.569]
25 Another popular way to improve over the kinematic tree model is to add latent variables to the model. [sent-53, score-0.61]
26 In [15] the authors augment the kinematic tree model by a few latent variables that are identified by factor analysis. [sent-54, score-0.61]
27 The Gaussian Process latent variable model (GPLVM) [16] has been applied as a pose model [6]. [sent-55, score-0.342]
28 In the GPLVM model a low-dimensional latent space is transformed to pose space by means of a Gaussian Process regression function. [sent-56, score-0.297]
29 The Laplacian Eigenmap latent variable model (LELVM) [18] improves on the GPLVM by modeling the manifold of poses using a graph Laplacian and by providing tractable posterior inference in the latent space. [sent-58, score-0.398]
30 An interesting recent model based on a large number of latent binary variables is the implicit mixture of conditional restricted Boltzmann machines (imCRBM) [23]; both estimation and inference are again approximate. [sent-59, score-0.237]
31 In fact, each training pose is represented as one latent vector and they are not combined in an intelligent way. [sent-61, score-0.344]
32 Non-parametric Bayesian Networks In this section we introduce our non-parametric Bayesian network model of human pose and show its tractability. [sent-63, score-0.578]
33 We represent a human body pose by a d-dimensional vector whose components correspond either to angular or xyz coordinates of n joints. [sent-64, score-0.367]
34 Each pose thus decomposes on the joint level, x = [x1, . [sent-65, score-0.256]
35 ,n defines a high-dimensional pose distribution q(X) whose samples we denote by ? [sent-72, score-0.249]
36 A Bayesian network over X is a pair (p, G) where the distribution p factorizes over the directed acyclic graph G, ? [sent-84, score-0.375]
37 The specification of a Bayesian network hence consists of two parts: The definition of a graph str? [sent-96, score-0.375]
38 Learning the Graph Structure The graph structure of a Bayesian network models the local and global (in)dependencies of a distribution. [sent-107, score-0.472]
39 In case of the human body, an obvious structure is the kinematic chain, i. [sent-109, score-0.44]
40 , a tree-structured network with one parent per variable that follows the adjacency of joints in the body (Figure 1a). [sent-111, score-0.573]
41 Given a fully connected graph G˜ over X with edge weights wjk set equal to the mutual information MI(Xj, Xk) between Xj and Xk, the solution to (2) can be shown to be the maximum spanning tree of G˜ (with edges directed outwards in a consistent way) [1]. [sent-137, score-0.447]
42 In contrast to the kinematic chains, a Chow-Liu tree is thus guaranteed to model those pairs of joints that exhibit a high flow of information, independent of their adjacency in the human body. [sent-138, score-0.445]
43 Using the entropy estimate we consistent as N → ∞ in [ 1 We omit arrows from our network visualizations and implicitly assume the orientations to be directed away from the hip node. [sent-156, score-0.406]
44 The computed mutual information is visualized in Figure 1b and we can now solve for the Chow-Liu tree [4] by finding the maximum spanning tree [5] to obtain our final result G, shown in Figure 1c. [sent-158, score-0.475]
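The maximum-spanning-tree step described in sentences 41 to 44 can be sketched as follows. This is a minimal Python illustration, not the authors' Matlab implementation: the function name `chow_liu_tree`, the choice of Prim's algorithm, and the assumption that a symmetric matrix `mi` of pairwise mutual information estimates has already been computed are all introduced here for illustration.

```python
import numpy as np

def chow_liu_tree(mi, root=0):
    """Return directed edges (parent, child) of a maximum spanning tree.

    mi   : symmetric (n, n) array of pairwise mutual information estimates
    root : index of the node that all edges are directed away from
    """
    n = mi.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[root] = True
    best_w = mi[root].astype(float).copy()  # best weight linking each node to the tree
    best_p = np.full(n, root)               # tree node that achieves that weight
    edges = []
    for _ in range(n - 1):
        # Prim's algorithm: attach the outside node with the largest link weight.
        cand = np.where(in_tree, -np.inf, best_w)
        j = int(np.argmax(cand))
        edges.append((int(best_p[j]), j))
        in_tree[j] = True
        # The new tree node may offer a better link to the remaining nodes.
        better = (~in_tree) & (mi[j] > best_w)
        best_w[better] = mi[j][better]
        best_p[better] = j
    return edges
```

Growing the tree from the root yields the edge orientation for free, which matches the convention of directing edges away from a chosen root node (the hip node in the footnote of sentence 43).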
45 Learning the Local Models Once the network structure is fixed, we need to learn the local conditional distributions p (Xj | pa (Xj)) from training data. [sent-161, score-0.557]
46 Our approach will be to compute a conditional kernel density estimate (CKDE) in which we can condition on given values Y = y as needed. [sent-166, score-0.276]
47 In summary, we can compute the CKDE density p(x|y) efficiently and at the same asymptotic complexity as the joint KDE density p(x, y). [sent-201, score-0.261]
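A conditional kernel density estimate of the kind described in sentences 46 and 47 can be sketched as a ratio of two kernel density estimates, p(x|y) = p(x, y) / p(y). The Python sketch below assumes an isotropic Gaussian kernel with a single shared bandwidth `h`; the kernel, the bandwidth choice, and the function names `log_gauss_kde` and `log_ckde` are assumptions made for illustration, since the extracted text does not specify them.

```python
import numpy as np

def log_gauss_kde(query, data, h):
    """Log of an isotropic Gaussian KDE at a single query point.

    query : (d,) point, data : (N, d) training points, h : bandwidth
    """
    N, d = data.shape
    sq = np.sum((data - query) ** 2, axis=1)
    log_k = -0.5 * sq / h**2 - 0.5 * d * np.log(2 * np.pi * h**2)
    m = log_k.max()  # log-sum-exp for numerical stability
    return m + np.log(np.exp(log_k - m).sum()) - np.log(N)

def log_ckde(x, y, X, Y, h):
    """log p(x | y) = log p(x, y) - log p(y), both terms estimated by KDE."""
    joint = log_gauss_kde(np.concatenate([x, y]), np.hstack([X, Y]), h)
    marginal = log_gauss_kde(y, Y, h)
    return joint - marginal
```

Both terms require one pass over the N training points, so in this sketch the conditional density costs about as much as the joint density, in line with the complexity statement in sentence 47.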
48 Log-likelihoods and Sampling There are two important operations to perform in applications of our model as a pose prior: computing the likelihood of a given pose and sampling a pose from the prior. [sent-204, score-0.69]
49 Given a Chow-Liu/CKDE network with n variables, the log-likelihood log p(x) of a new observation x ∈ Rd is ? [sent-207, score-0.32]
50 This allows a detailed analysis of a pose not possible in global methods. [sent-218, score-0.262]
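The equation elided in sentence 49 cannot be recovered from the extracted text. For orientation only, the standard Bayesian network factorization gives the per-joint decomposition below. The second equality assumes that each conditional is evaluated as a ratio of a joint and a marginal density estimate, which would be consistent with the roughly two local evaluations per joint mentioned in sentence 90. For the root joint, which has no parent, the term reduces to its marginal log-density.

```latex
\log p(x) \;=\; \sum_{j=1}^{n} \log p\bigl(x_j \mid x_{\mathrm{pa}(j)}\bigr)
          \;=\; \sum_{j=1}^{n} \Bigl[ \log p\bigl(x_j, x_{\mathrm{pa}(j)}\bigr) - \log p\bigl(x_{\mathrm{pa}(j)}\bigr) \Bigr]
```

Because the sum runs over joints, each summand can be inspected separately, which is what makes the per-joint analysis of a pose possible.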
51 Thanks to the closed-form solution for a conditional Gaussian, we can employ standard ancestral sampling [13], i. [sent-220, score-0.255]
52 e, we find a topological ordering τ for the network structure and draw samples from p(Xτ(j) |Xpa(τ(j)) ), for j = 1, . [sent-221, score-0.374]
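Ancestral sampling as described in sentences 51 and 52 can be sketched as follows. The sketch assumes a tree-structured network given as (parent, child) edges and isotropic Gaussian kernels with bandwidth `h`, in which case sampling a child given its parent means picking a training pose with probability proportional to the kernel weight at the parent value and adding kernel noise to that pose's child joint. The function names, the array layout, and the isotropic kernel are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def topological_order(edges, root):
    """Order nodes so that every parent precedes its children (tree rooted at `root`)."""
    children = {}
    for p, c in edges:
        children.setdefault(p, []).append(c)
    order, queue = [], [root]
    while queue:
        node = queue.pop(0)
        order.append(node)
        queue.extend(children.get(node, []))
    return order

def ancestral_sample(data, edges, root, h, rng):
    """Draw one pose from a tree-structured network with isotropic-kernel CKDEs.

    data : (N, n, k) array, N training poses with n joints of k coordinates each
    """
    N, n, k = data.shape
    parent = {c: p for p, c in edges}
    sample = np.zeros((n, k))
    for j in topological_order(edges, root):
        if j == root:
            w = np.full(N, 1.0 / N)  # the root is drawn from its marginal KDE
        else:
            # Kernel weights of the training poses at the sampled parent value.
            sq = np.sum((data[:, parent[j]] - sample[parent[j]]) ** 2, axis=1)
            w = np.exp(-0.5 * (sq - sq.min()) / h**2)
            w /= w.sum()
        i = rng.choice(N, p=w)
        sample[j] = data[i, j] + h * rng.standard_normal(k)
    return sample
```

Since each joint may pick a different training pose, the sampler can combine parts of different training poses, which is the mechanism behind the compositionality discussed later in the text.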
53 The H36M skeleton includes some spurious joints that we delete, which results in the same 20 joints present in the Kinect skeleton [20]. [sent-260, score-0.378]
54 Pose Model We start by learning a pose model on the H36M training set according to the techniques introduced in section 2. [sent-264, score-0.258]
55 The resulting network structure is displayed in Figure 1c and it is worth noting some of its properties: 1. [sent-265, score-0.336]
56 Note that this does not apply to the kinematic chain. [sent-267, score-0.353]
57 The uninformative pairs of nodes present in the kinematic chain (red edges in Figure 1a) are circumvented in the Chow-Liu tree, thus guaranteeing, from an information-theoretic point of view, optimal conditional distributions under the given constraint of a sparse structure. [sent-269, score-0.768]
58 Subgraphs containing joints with high entropies (Figure 1d), such as the arms and legs, largely follow the kinematic chain. [sent-271, score-0.524]
59 This confirms the intuitive belief that joints with high uncertainty should be conditioned on nearby joints, as they provide the maximum information about a joint's position in this case. [sent-272, score-0.366]
60 Using our Matlab implementation Figure 2: We show samples drawn from our non-parametric pose prior to give are untouched and were generated from a single Chow-Liu/CKDE model. [sent-276, score-0.28]
61 As described in section 2, our model consists of two components: estimation of the graph structure (non-parametric Chow-Liu tree) and estimation of the local distributions (conditional kernel density estimation). [sent-286, score-0.435]
62 The options for the conditional distributions are our CKDE approach and a Gaussian linear (GL) network [13]. [sent-289, score-0.464]
63 In the higher-order kinematic chain each joint is additionally conditioned on its parents’ parents. [sent-292, score-0.63]
64 We use parametric MI-estimation for the parametric GL network and our distance-based non-parametric MIestimation for the non-parametric CKDE network. [sent-294, score-0.466]
65 The network approaches are complemented by a comparison to the global GPLVM [16], where we employ the popular FITC approximation [22] together with subsampling to achieve tractability. [sent-295, score-0.374]
66 We use a reference implementation. Table 1: Expected log-likelihoods of GL- and CKDE networks for different graph structures and a comparison to global methods. [sent-296, score-0.216]
67 89 Independent Gaussian linear Kinematic chain (order 1) network Kinematic chain (order 2) Chow-Liu tree −352. [sent-309, score-0.775]
68 03 Independent CKDE network Kinematic chain (order 1) Kinematic chain (order 2) −322. [sent-325, score-0.29]
69 Let us now turn to the network approaches and analyze their graph structures. [sent-339, score-0.375]
70 Not surprisingly, a network modeling the joints independently performs worst, with test ELLs of −346 (GL) and −322 (CKDE). [sent-340, score-0.42]
71 Lawrence / fgplvm/ Figure 3: In (a), we show samples from the “wave” training set (left, 2 pose classes) and samples drawn from the learned model (right, 4 pose classes). [sent-347, score-0.545]
72 Higher-order kinematic chains improve on the results by an [sent-352, score-0.353]
73 The direct comparison of CKDE- to GL networks is unambiguous: CKDE networks perform consistently better, independent of the graph structure. [sent-360, score-0.279]
74 On the other hand, parametric networks based on the kinematic chain are too flexible in the sense that they allow arbitrary combinations of the position of different limbs. [sent-365, score-0.678]
75 At the same time, Gaussian linear networks are not flexible enough in the sense that their local distributions cannot cope with multimodality, which is essential when modeling human pose. [sent-371, score-0.219]
76 Ideally, we would like to have flexibility and compositionality only where it is adequate and needed. [sent-372, score-0.226]
77 We then learn a pose model according to section 2, draw 5000 samples from it and cluster them into 4 clusters using k-means. [sent-374, score-0.42]
78 Consequently, the joint positions of the latter are all modeled conditional on the corresponding joint positions of the former. [sent-381, score-0.27]
79 The samples generated by this model (Figure 3a (right)) fall into 4 distinct pose classes. [sent-382, score-0.249]
80 Two of the four clusters (coloured in purple and red) correspond to poses also present in the training set. [sent-383, score-0.276]
81 The other two clusters (coloured in blue and green) represent newly learned poses that do not appear in the training data: a neutral pose (both hands lowered) and a pose with both hands raised. [sent-384, score-0.698]
82 Real-time Scoring Time is a critical factor in applications such as tracking or pose estimation. [sent-402, score-0.246]
83 At training time, we cluster all training points into clusters C1, . [sent-411, score-0.265]
84 At test time, we partition the clusters into a set of core clusters Ce and a set of approximate clusters Ca based on the following scheme: Given a test pose x ∈ Rd, we use the kd-tree to determine the clusters whose centers lie closest to x. [sent-415, score-0.77]
85 We then evaluate all training points within the core clusters exactly. [sent-418, score-0.256]
86 As the number of core clusters approaches the total number of clusters (or as the number of total clusters approaches the total number of training points), our approximate method converges to the exact log-likelihood. [sent-439, score-0.648]
87 Since the contribution of a training point to the loglikelihood decreases exponentially with its distance from the test point, a few core clusters should suffice to achieve a high level of accuracy. [sent-440, score-0.256]
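The core/approximate cluster scheme in sentences 83 to 87 can be sketched as follows. The extracted text does not say how the approximate clusters are evaluated, so this sketch makes a loudly labeled assumption: each non-core cluster is replaced by its center, weighted by the number of points it contains. It also finds the nearest clusters by brute force rather than with a kd-tree, and uses an isotropic Gaussian kernel with bandwidth `h`. The function name `log_kde_approx` and all parameter names are hypothetical.

```python
import numpy as np

def log_kde_approx(x, data, labels, centers, h, n_core):
    """Approximate log of an isotropic Gaussian KDE at x.

    data    : (N, d) training points
    labels  : (N,) integer cluster index of each training point
    centers : (K, d) cluster centers
    n_core  : number of nearest clusters that are evaluated exactly
    """
    N, d = data.shape
    dist_c = np.sum((centers - x) ** 2, axis=1)
    core = np.argsort(dist_c)[:n_core]  # brute force stand-in for a kd-tree query
    is_core = np.zeros(len(centers), dtype=bool)
    is_core[core] = True

    # Exact kernel exponents for all points inside the core clusters.
    pts = data[is_core[labels]]
    expo = [-0.5 * np.sum((pts - x) ** 2, axis=1) / h**2]
    # Assumption: every other cluster acts like `count` copies of its center.
    counts = np.bincount(labels, minlength=len(centers))
    rest = ~is_core & (counts > 0)
    expo.append(-0.5 * dist_c[rest] / h**2 + np.log(counts[rest]))

    expo = np.concatenate(expo)
    m = expo.max()  # log-sum-exp for numerical stability
    log_norm = -0.5 * d * np.log(2 * np.pi * h**2) - np.log(N)
    return m + np.log(np.exp(expo - m).sum()) + log_norm
```

When `n_core` equals the number of clusters, no cluster is approximated and the function returns the exact log-density, mirroring the convergence statement in sentence 86.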
88 Figure 4a shows the results in terms of accuracy and speed for a local log-likelihood: If an absolute error of 10−2 nats is acceptable, we need as few as 4 core clusters and the runtime is 1. [sent-444, score-0.336]
89 Adding more core clusters further decreases the error, while the runtime increases sublinearly. [sent-448, score-0.259]
90 As the evaluation of a log-likelihood for a Bayesian network in our case requires computation of 2n = 40 local log-likelihoods (see equation (7)), we achieve a total speed of approx. [sent-449, score-0.29]
91 Conclusion We have introduced a fully non-parametric Bayesian network model of human pose. [sent-454, score-0.367]
92 In order to learn the network structure, we have used a continuous variant of the Chow-Liu tree, in which we have obtained the required estimates of mutual information by means of a non-parametric entropy estimator. [sent-455, score-0.464]
93 The comparison of different graph structures has shown that our non-parametric approach to structure learning outperforms the widely used kinematic chain and also a higher-order variant thereof by a significant margin. [sent-459, score-0.641]
94 We expect widespread applicability in domains such as tracking, pose estimation and pose denoising. [sent-462, score-0.461]
95 Efficient kernel density estimation using the fast Gauss transform with applications to color modeling and tracking. [sent-505, score-0.203]
96 Beyond trees: Common-factor models for 2D human pose recovery. [sent-562, score-0.288]
97 Real-time human pose recognition in parts from single depth images. [sent-595, score-0.288]
98 Loose-limbed people: Estimating 3D human pose and motion using non-parametric belief propagation. [sent-604, score-0.319]
99 Dynamical binary latent variable models for 3D human pose tracking. [sent-619, score-0.419]
100 Modeling mutual context of object and human pose in human-object interaction activities. [sent-644, score-0.384]
wordName wordTfidf (topN-words)
[('kinematic', 0.353), ('network', 0.29), ('ckde', 0.288), ('gplvm', 0.259), ('pose', 0.211), ('compositionality', 0.173), ('tree', 0.171), ('chain', 0.157), ('clusters', 0.133), ('joints', 0.13), ('bayesian', 0.122), ('gl', 0.114), ('conditional', 0.112), ('density', 0.108), ('xj', 0.097), ('mutual', 0.096), ('poses', 0.096), ('parametric', 0.088), ('ancestral', 0.086), ('xpa', 0.086), ('latent', 0.086), ('graph', 0.085), ('exact', 0.082), ('networks', 0.08), ('entropy', 0.078), ('human', 0.077), ('nats', 0.077), ('conditionally', 0.076), ('core', 0.076), ('conditioned', 0.075), ('kinect', 0.068), ('bb', 0.067), ('gaussian', 0.066), ('distributions', 0.062), ('yy', 0.06), ('skeleton', 0.059), ('chowliu', 0.058), ('fitc', 0.058), ('lehrmann', 0.058), ('lelvm', 0.058), ('tuebingen', 0.058), ('wjk', 0.058), ('sampling', 0.057), ('kernel', 0.056), ('flexibility', 0.053), ('global', 0.051), ('runtime', 0.05), ('training', 0.047), ('circumvented', 0.047), ('coloured', 0.047), ('xyz', 0.047), ('structure', 0.046), ('nonparametric', 0.045), ('scoring', 0.045), ('variable', 0.045), ('joint', 0.045), ('coordination', 0.045), ('hips', 0.045), ('tue', 0.045), ('approximate', 0.044), ('parent', 0.043), ('gauss', 0.042), ('ingredient', 0.042), ('tractability', 0.042), ('arms', 0.041), ('ell', 0.041), ('oyf', 0.041), ('freely', 0.04), ('ce', 0.04), ('mi', 0.04), ('estimation', 0.039), ('cluster', 0.038), ('samples', 0.038), ('wave', 0.038), ('gravity', 0.038), ('hip', 0.038), ('kde', 0.038), ('waving', 0.038), ('spanning', 0.037), ('feet', 0.037), ('uninformative', 0.037), ('sigal', 0.036), ('tracking', 0.035), ('independent', 0.034), ('positions', 0.034), ('adjacency', 0.033), ('mpi', 0.033), ('actors', 0.033), ('leg', 0.033), ('laplacian', 0.033), ('subsampling', 0.033), ('fleet', 0.033), ('limbs', 0.033), ('unseen', 0.032), ('body', 0.032), ('estimators', 0.031), ('hypotheses', 0.031), ('belief', 0.031), ('prior', 0.031), ('logp', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000008 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose
Author: Andreas M. Lehrmann, Peter V. Gehler, Sebastian Nowozin
2 0.20106901 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion
Author: Ibrahim Radwan, Abhinav Dhall, Roland Goecke
Abstract: In this paper, an automatic approach for 3D pose reconstruction from a single image is proposed. The presence of human body articulation, hallucinated parts and cluttered background leads to ambiguity during the pose inference, which makes the problem non-trivial. Researchers have explored various methods based on motion and shading in order to reduce the ambiguity and reconstruct the 3D pose. The key idea of our algorithm is to impose both kinematic and orientation constraints. The former is imposed by projecting a 3D model onto the input image and pruning the parts, which are incompatible with the anthropomorphism. The latter is applied by creating synthetic views via regressing the input view to multiple oriented views. After applying the constraints, the 3D model is projected onto the initial and synthetic views, which further reduces the ambiguity. Finally, we borrow the direction of the unambiguous parts from the synthetic views to the initial one, which results in the 3D pose. Quantitative experiments are performed on the HumanEva-I dataset and qualitatively on unconstrained images from the Image Parse dataset. The results show the robustness of the proposed approach to accurately reconstruct the 3D pose from a single image.
3 0.19623801 133 iccv-2013-Efficient Hand Pose Estimation from a Single Depth Image
Author: Chi Xu, Li Cheng
Abstract: We tackle the practical problem of hand pose estimation from a single noisy depth image. A dedicated three-step pipeline is proposed: Initial estimation step provides an initial estimation of the hand in-plane orientation and 3D location; Candidate generation step produces a set of 3D pose candidates from the Hough voting space with the help of the rotational invariant depth features; Verification step delivers the final 3D hand pose as the solution to an optimization problem. We analyze the depth noises, and suggest tips to minimize their negative impacts on the overall performance. Our approach is able to work with Kinect-type noisy depth images, and reliably produces pose estimations of general motions efficiently (12 frames per second). Extensive experiments are conducted to qualitatively and quantitatively evaluate the performance with respect to the state-of-the-art methods that have access to additional RGB images. Our approach is shown to deliver on par or even better results.
4 0.17997633 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?
Author: Elisabeta Marinoiu, Dragos Papava, Cristian Sminchisescu
Abstract: Human motion analysis in images and video is a central computer vision problem. Yet, there are no studies that reveal how humans perceive other people in images and how accurate they are. In this paper we aim to unveil some of the processing–as well as the levels of accuracy–involved in the 3D perception of people from images by assessing the human performance. Our contributions are: (1) the construction of an experimental apparatus that relates perception and measurement, in particular the visual and kinematic performance with respect to 3D ground truth when the human subject is presented an image of a person in a given pose; (2) the creation of a dataset containing images, articulated 2D and 3D pose ground truth, as well as synchronized eye movement recordings of human subjects, shown a variety of human body configurations, both easy and difficult, as well as their ‘re-enacted’ 3D poses; (3) quantitative analysis revealing the human performance in 3D pose reenactment tasks, the degree of stability in the visual fixation patterns of human subjects, and the way it correlates with different poses. We also discuss the implications of our findings for the construction of visual human sensing systems.
5 0.16835667 218 iccv-2013-Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data
Author: Srinath Sridhar, Antti Oulasvirta, Christian Theobalt
Abstract: Tracking the articulated 3D motion of the hand has important applications, for example, in human–computer interaction and teleoperation. We present a novel method that can capture a broad range of articulated hand motions at interactive rates. Our hybrid approach combines, in a voting scheme, a discriminative, part-based pose retrieval method with a generative pose estimation method based on local optimization. Color information from a multiview RGB camera setup along with a person-specific hand model are used by the generative method to find the pose that best explains the observed images. In parallel, our discriminative pose estimation method uses fingertips detected on depth data to estimate a complete or partial pose of the hand by adopting a part-based pose retrieval strategy. This part-based strategy helps reduce the search space drastically in comparison to a global pose retrieval strategy. Quantitative results show that our method achieves state-of-the-art accuracy on challenging sequences and a near-realtime performance of 10 fps on a desktop computer.
6 0.15931271 153 iccv-2013-Face Recognition Using Face Patch Networks
7 0.15031643 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests
8 0.14935492 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
9 0.14799161 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
10 0.14738964 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation
11 0.14618655 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors
12 0.13610734 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
13 0.12921683 351 iccv-2013-Restoring an Image Taken through a Window Covered with Dirt or Rain
14 0.10970455 46 iccv-2013-Allocentric Pose Estimation
15 0.10736759 214 iccv-2013-Improving Graph Matching via Density Maximization
16 0.10507853 448 iccv-2013-Weakly Supervised Learning of Image Partitioning Using Decision Trees with Structured Split Criteria
17 0.10483655 143 iccv-2013-Estimating Human Pose with Flowing Puppets
18 0.10196447 322 iccv-2013-Pose Estimation and Segmentation of People in 3D Movies
20 0.097434118 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
topicId topicWeight
[(0, 0.24), (1, -0.017), (2, -0.014), (3, 0.019), (4, 0.073), (5, -0.06), (6, 0.02), (7, 0.047), (8, -0.045), (9, 0.054), (10, -0.039), (11, -0.079), (12, -0.15), (13, -0.049), (14, 0.009), (15, 0.212), (16, -0.006), (17, -0.184), (18, 0.027), (19, 0.114), (20, 0.054), (21, -0.002), (22, 0.142), (23, -0.029), (24, -0.005), (25, -0.049), (26, 0.049), (27, 0.011), (28, 0.058), (29, 0.019), (30, 0.081), (31, 0.085), (32, 0.039), (33, -0.023), (34, -0.011), (35, 0.035), (36, 0.007), (37, -0.045), (38, 0.037), (39, -0.008), (40, -0.059), (41, 0.024), (42, -0.032), (43, -0.074), (44, -0.012), (45, 0.028), (46, 0.003), (47, 0.034), (48, -0.008), (49, 0.05)]
simIndex simValue paperId paperTitle
same-paper 1 0.9685241 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose
Author: Andreas M. Lehrmann, Peter V. Gehler, Sebastian Nowozin
Author: Danhang Tang, Tsz-Ho Yu, Tae-Kyun Kim
Abstract: This paper presents the first semi-supervised transductive algorithm for real-time articulated hand pose estimation. Noisy data and occlusions are the major challenges of articulated hand pose estimation. In addition, the discrepancies among realistic and synthetic pose data undermine the performances of existing approaches that use synthetic data extensively in training. We therefore propose the Semi-supervised Transductive Regression (STR) forest which learns the relationship between a small, sparsely labelled realistic dataset and a large synthetic dataset. We also design a novel data-driven, pseudo-kinematic technique to refine noisy or occluded joints. Our contributions include: (i) capturing the benefits of both realistic and synthetic data via transductive learning; (ii) showing accuracies can be improved by considering unlabelled data; and (iii) introducing a pseudo-kinematic technique to refine articulations efficiently. Experimental results show not only the promising performance of our method with respect to noise and occlusions, but also its superiority over state-of-the-art methods in accuracy, robustness and speed.
3 0.79860377 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion
Author: Ibrahim Radwan, Abhinav Dhall, Roland Goecke
Abstract: In this paper, an automatic approach for 3D pose reconstruction from a single image is proposed. The presence of human body articulation, hallucinated parts and cluttered background leads to ambiguity during the pose inference, which makes the problem non-trivial. Researchers have explored various methods based on motion and shading in order to reduce the ambiguity and reconstruct the 3D pose. The key idea of our algorithm is to impose both kinematic and orientation constraints. The former is imposed by projecting a 3D model onto the input image and pruning the parts, which are incompatible with the anthropomorphism. The latter is applied by creating synthetic views via regressing the input view to multiple oriented views. After applying the constraints, the 3D model is projected onto the initial and synthetic views, which further reduces the ambiguity. Finally, we borrow the direction of the unambiguous parts from the synthetic views to the initial one, which results in the 3D pose. Quantitative experiments are performed on the HumanEva-I dataset and qualitatively on unconstrained images from the Image Parse dataset. The results show the robustness of the proposed approach to accurately reconstruct the 3D pose from a single image.
4 0.78457433 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?
Author: Elisabeta Marinoiu, Dragos Papava, Cristian Sminchisescu
Abstract: Human motion analysis in images and video is a central computer vision problem. Yet, there are no studies that reveal how humans perceive other people in images and how accurate they are. In this paper we aim to unveil some of the processing–as well as the levels of accuracy–involved in the 3D perception of people from images by assessing the human performance. Our contributions are: (1) the construction of an experimental apparatus that relates perception and measurement, in particular the visual and kinematic performance with respect to 3D ground truth when the human subject is presented an image of a person in a given pose; (2) the creation of a dataset containing images, articulated 2D and 3D pose ground truth, as well as synchronized eye movement recordings of human subjects, shown a variety of human body configurations, both easy and difficult, as well as their ‘re-enacted’ 3D poses; (3) quantitative analysis revealing the human performance in 3D pose reenactment tasks, the degree of stability in the visual fixation patterns of human subjects, and the way it correlates with different poses. We also discuss the implications of our findings for the construction of visual human sensing systems.
5 0.75538522 218 iccv-2013-Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data
Author: Srinath Sridhar, Antti Oulasvirta, Christian Theobalt
Abstract: Tracking the articulated 3D motion of the hand has important applications, for example, in human–computer interaction and teleoperation. We present a novel method that can capture a broad range of articulated hand motions at interactive rates. Our hybrid approach combines, in a voting scheme, a discriminative, part-based pose retrieval method with a generative pose estimation method based on local optimization. Color information from a multiview RGB camera setup along with a person-specific hand model are used by the generative method to find the pose that best explains the observed images. In parallel, our discriminative pose estimation method uses fingertips detected on depth data to estimate a complete or partial pose of the hand by adopting a part-based pose retrieval strategy. This part-based strategy helps reduce the search space drastically in comparison to a global pose retrieval strategy. Quantitative results show that our method achieves state-of-the-art accuracy on challenging sequences and a near-realtime performance of 10 fps on a desktop computer.
7 0.6751616 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation
8 0.67356873 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors
9 0.67122048 46 iccv-2013-Allocentric Pose Estimation
10 0.65481901 143 iccv-2013-Estimating Human Pose with Flowing Puppets
11 0.64819402 130 iccv-2013-Dynamic Structured Model Selection
12 0.64703161 118 iccv-2013-Discovering Object Functionality
13 0.64659458 133 iccv-2013-Efficient Hand Pose Estimation from a Single Depth Image
14 0.62046999 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
15 0.61643529 8 iccv-2013-A Deformable Mixture Parsing Model with Parselets
16 0.61202544 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
17 0.6028353 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation
18 0.58239681 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
19 0.55729014 47 iccv-2013-Alternating Regression Forests for Object Detection and Pose Estimation
20 0.5418179 306 iccv-2013-Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items
topicId topicWeight
[(2, 0.094), (7, 0.018), (12, 0.023), (26, 0.071), (31, 0.068), (35, 0.029), (40, 0.012), (42, 0.112), (48, 0.011), (64, 0.046), (73, 0.036), (84, 0.01), (88, 0.183), (89, 0.178), (95, 0.018)]
simIndex simValue paperId paperTitle
1 0.88421392 393 iccv-2013-Simultaneous Clustering and Tracklet Linking for Multi-face Tracking in Videos
Author: Baoyuan Wu, Siwei Lyu, Bao-Gang Hu, Qiang Ji
Abstract: We describe a novel method that simultaneously clusters and associates short sequences of detected faces (termed as face tracklets) in videos. The rationale of our method is that face tracklet clustering and linking are related problems that can benefit from the solutions of each other. Our method is based on a hidden Markov random field model that represents the joint dependencies of cluster labels and tracklet linking associations. We provide an efficient algorithm based on constrained clustering and optimal matching for the simultaneous inference of cluster labels and tracklet associations. We demonstrate significant improvements on the state-of-the-art results in face tracking and clustering performances on several video datasets.
same-paper 2 0.84271836 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose
Author: Andreas M. Lehrmann, Peter V. Gehler, Sebastian Nowozin
Abstract: Having a sensible prior of human pose is a vital ingredient for many computer vision applications, including tracking and pose estimation. While the application of global non-parametric approaches and parametric models has led to some success, finding the right balance in terms of flexibility and tractability, as well as estimating model parameters from data has turned out to be challenging. In this work, we introduce a sparse Bayesian network model of human pose that is non-parametric with respect to the estimation of both its graph structure and its local distributions. We describe an efficient sampling scheme for our model and show its tractability for the computation of exact log-likelihoods. We empirically validate our approach on the Human 3.6M dataset and demonstrate superior performance to global models and parametric networks. We further illustrate our model’s ability to represent and compose poses not present in the training set (compositionality) and describe a speed-accuracy trade-off that allows realtime scoring of poses.
3 0.82521492 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
Author: Dahua Lin, Sanja Fidler, Raquel Urtasun
Abstract: In this paper, we tackle the problem of indoor scene understanding using RGBD data. Towards this goal, we propose a holistic approach that exploits 2D segmentation, 3D geometry, as well as contextual relations between scenes and objects. Specifically, we extend the CPMC [3] framework to 3D in order to generate candidate cuboids, and develop a conditional random field to integrate information from different sources to classify the cuboids. With this formulation, scene classification and 3D object recognition are coupled and can be jointly solved through probabilistic inference. We test the effectiveness of our approach on the challenging NYU v2 dataset. The experimental results demonstrate that through effective evidence integration and holistic reasoning, our approach achieves substantial improvement over the state-of-the-art.
4 0.80096662 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
Author: Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook
Abstract: Detecting visually salient regions in images is one of the fundamental problems in computer vision. We propose a novel method to decompose an image into large scale perceptually homogeneous elements for efficient salient region detection, using a soft image abstraction representation. By considering both appearance similarity and spatial distribution of image pixels, the proposed representation abstracts out unnecessary image details, allowing the assignment of comparable saliency values across similar regions, and producing perceptually accurate salient region detection. We evaluate our salient region detection approach on the largest publicly available dataset with pixel accurate annotations. The experimental results show that the proposed method outperforms 18 alternate methods, reducing the mean absolute error by 25.2% compared to the previous best result, while being computationally more efficient.
5 0.79938853 180 iccv-2013-From Where and How to What We See
Author: S. Karthikeyan, Vignesh Jagadeesh, Renuka Shenoy, Miguel Eckstein, B.S. Manjunath
Abstract: Eye movement studies have confirmed that overt attention is highly biased towards faces and text regions in images. In this paper we explore a novel problem of predicting face and text regions in images using eye tracking data from multiple subjects. The problem is challenging as we aim to predict the semantics (face/text/background) only from eye tracking data without utilizing any image information. The proposed algorithm spatially clusters eye tracking data obtained in an image into different coherent groups and subsequently models the likelihood of the clusters containing faces and text using a fully connected Markov Random Field (MRF). Given the eye tracking data from a test image, it predicts potential face/head (humans, dogs and cats) and text locations reliably. Furthermore, the approach can be used to select regions of interest for further analysis by object detectors for faces and text. The hybrid eye position/object detector approach achieves better detection performance and reduced computation time compared to using only the object detection algorithm. We also present a new eye tracking dataset on 300 images selected from ICDAR, Street-view, Flickr and Oxford-IIIT Pet Dataset from 15 subjects.
6 0.79394257 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization
7 0.79383278 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
8 0.79350752 349 iccv-2013-Regionlets for Generic Object Detection
9 0.79289436 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection
10 0.79278648 445 iccv-2013-Visual Reranking through Weakly Supervised Multi-graph Learning
11 0.7925154 315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions
12 0.79196697 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps
13 0.79176843 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
14 0.79147041 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
15 0.79145443 126 iccv-2013-Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification
16 0.79116648 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
17 0.7903477 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation
19 0.78948617 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests
20 0.78933245 448 iccv-2013-Weakly Supervised Learning of Image Partitioning Using Decision Trees with Structured Split Criteria