cvpr cvpr2013 cvpr2013-2 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Magnus Burenius, Josephine Sullivan, Stefan Carlsson
Abstract: We consider the problem of automatically estimating the 3D pose of humans from images, taken from multiple calibrated views. We show that it is possible and tractable to extend the pictorial structures framework, popular for 2D pose estimation, to 3D. We discuss how to use this framework to impose view, skeleton, joint angle and intersection constraints in 3D. The 3D pictorial structures are evaluated on multiple view data from a professional football game. The evaluation is focused on computational tractability, but we also demonstrate how a simple 2D part detector can be plugged into the framework.
Reference: text
sentIndex sentText sentNum sentScore
1 We show that it is possible and tractable to extend the pictorial structures framework, popular for 2D pose estimation, to 3D. [sent-2, score-0.521]
2 We discuss how to use this framework to impose view, skeleton, joint angle and intersection constraints in 3D. [sent-3, score-0.232]
3 The 3D pictorial structures are evaluated on multiple view data from a professional football game. [sent-4, score-0.513]
4 Introduction Human pose estimation is an important problem in computer vision [11]. [sent-7, score-0.192]
5 It comes in many different flavors depending on the final goal and the assumptions made: • Estimate pose in 2D or 3D. [sent-8, score-0.159]
6 • Estimate pose from a single time frame or a sequence. [sent-9, score-0.159]
7 • Estimate pose from a single camera view or from multiple views. [sent-10, score-0.269]
8 In this paper we focus on human pose estimation in 3D, at a single time frame, using multiple views, imposing a weak pose prior. [sent-12, score-0.408]
9 We explore how pictorial structures can be used to solve this problem. [sent-13, score-0.269]
10 From a wider perspective, pictorial structures are interesting since they might provide a unifying framework for general pose estimation and object detection in both 2D and 3D. [sent-14, score-0.461]
11 Pictorial structures simplify the inference over the high-dimensional space of human poses, by modeling the dependencies between body parts as a tree structure, as opposed to a general graph. [sent-16, score-0.19]
12 Using dynamic programming over the tree graph, a global optimum of a cost function is computed. [sent-18, score-0.147]
13 This is the pose that best fits the images from a set of calibrated cameras. [sent-19, score-0.186]
14 the state-of-the-art for single view human 2D pose estimation [9, 8, 16, 1]. [sent-20, score-0.302]
15 The pictorial structures framework also works well for general 2D object detection. [sent-21, score-0.269]
16 Recently this type of model has also been extended to 3D pose estimation of general objects [12], where in this case pose corresponds to the single overall rotation of the object relative to the camera. [sent-23, score-0.446]
17 However, pictorial structures have not been used as much for 3D pose estimation of humans, or articulated objects in general. [sent-24, score-0.503]
18 [2] do multiple view 3D pose estimation, by first inferring the 2D pose in each view. [sent-26, score-0.428]
19 They argue that while efficient 2D pose estimation relies on a discretization, this is not practical in 3D. [sent-33, score-0.221]
20 This has two disadvantages compared to the discretized pictorial structures, commonly used in 2D. [sent-35, score-0.226]
21 2D rotations are simply described by a single angle, which can be used to create a grid of evenly spread rotations, such that two discrete rotations can be composed into another discrete rotation. [sent-60, score-0.754]
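A minimal sketch of why the 2D case is easy, assuming a hypothetical grid of K equally spaced angles: rotation k has angle 2πk/K, and composing two grid rotations is just index addition modulo K, so the result always lands on another grid point. The names `K`, `angle_of` and `compose` are illustrative, not taken from the paper.

```python
import math

K = 36  # hypothetical number of grid angles (10 degree spacing)


def angle_of(k: int) -> float:
    """Angle in radians of the k-th grid rotation."""
    return 2.0 * math.pi * (k % K) / K


def compose(i: int, j: int) -> int:
    """Index of the composition of grid rotations i and j.

    2D rotations commute and their angles add, so the composition of two
    grid rotations lands exactly on grid rotation (i + j) mod K.
    """
    return (i + j) % K


def rot2d(theta: float) -> list:
    """2x2 rotation matrix for angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul2(a: list, b: list) -> list:
    """Product of two 2x2 matrices."""
    return [[sum(a[r][t] * b[t][c] for t in range(2)) for c in range(2)] for r in range(2)]


# 300 degrees followed by 100 degrees equals 40 degrees, i.e. grid index 4.
print(compose(30, 10))  # 4
```

No comparable index arithmetic exists for a grid on SO(3), which is the difficulty described next.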
22 The space of 3D rotations is more complicated and has no gold-standard parametrization. [sent-61, score-0.234]
23 It is not obvious how to create a discrete set of 3D rotations that are evenly spread and can be easily composed. [sent-62, score-0.395]
24 Furthermore, the space of translations and rotations in 2D together form a 3D space, whereas the space of translations and rotations in 3D together form a 6D space. [sent-65, score-0.704]
25 A discretization of 3D poses would therefore require considerably more points. [sent-66, score-0.171]
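To put "considerably more points" in numbers, here is a back-of-the-envelope sketch that assumes, purely for illustration, the same number k of samples along every dimension. The paper does not discretize rotations this way; the point is only how the count scales with dimension.

```python
def grid_size(samples_per_dim: int, dims: int) -> int:
    """Number of points in a regular grid with equal resolution along each dimension."""
    return samples_per_dim ** dims


k = 20  # hypothetical number of samples per dimension
pose_2d = grid_size(k, 3)  # x, y and one angle
pose_3d = grid_size(k, 6)  # 3 translation + 3 rotation parameters
print(pose_2d, pose_3d, pose_3d // pose_2d)  # 8000 64000000 8000
```

At equal resolution, the 6D space needs k³ times as many points as the 3D one.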
26 We aim to show that discrete pictorial structures in 3D are practical and tractable. [sent-69, score-0.375]
27 1 we discuss weak pose priors that lead to tractable inference. [sent-73, score-0.252]
28 In the experiments (section 3) we evaluate our model on multiple view data from a professional football game. [sent-79, score-0.244]
29 Model In this section we initially present a general overview of our model and framework that is consistent with pictorial structures in 2D. [sent-82, score-0.269]
30 The state Xn = (Tn, Rn) of each part n is defined by its global translation Tn and global rotation Rn in 3D. [sent-85, score-0.33]
31 Outcomes of these random variables are denoted by xn = (tn, rn) and assumed to be elements of the discrete set ΩX = ΩT × ΩR. [sent-87, score-0.369]
32 2 we discuss how the spaces of translations and rotations in 3D are discretized to give ΩT ⊂ ℝ³ and ΩR ⊂ SO(3). [sent-89, score-0.264]
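33 We assume the parts are connected in a tree graph and the state of part n only depends on the state of its parent pa(n): $P_{X_n|X}(x_n \mid x) = P_{X_n|X_{\mathrm{pa}(n)}}(x_n \mid x_{\mathrm{pa}(n)})$. The joint distribution of all parts then factorizes as: $P_X(x) = P_{X_1}(x_1) \prod_{n=2}^{N} P_{X_n|X_{\mathrm{pa}(n)}}(x_n \mid x_{\mathrm{pa}(n)})$ [sent-94, score-0.476]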
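A tree factorization can be evaluated directly. The log-prior of a full configuration is the root term plus one parent-conditioned term per non-root part, $\ln P_X(x) = \ln P_{X_1}(x_1) + \sum_{n=2}^{N} \ln P_{X_n|X_{\mathrm{pa}(n)}}(x_n \mid x_{\mathrm{pa}(n)})$. The minimal sketch below uses 0-based part indices, so part 0 plays the role of the root (part 1 in the text). The callables `log_root` and `log_cond` and the toy numbers are hypothetical stand-ins for the model's priors.

```python
import itertools
import math
from typing import Callable, Sequence


def log_joint_prior(x: Sequence[int], parent: Sequence[int],
                    log_root: Callable[[int], float],
                    log_cond: Callable[[int, int, int], float]) -> float:
    """ln P_X(x) for a tree-structured model.

    parent[n] is pa(n) for n >= 1 (parent[0] is ignored; part 0 is the root).
    log_cond(n, x_n, x_parent) returns ln P_{X_n | X_pa(n)}(x_n | x_parent).
    """
    total = log_root(x[0])
    for n in range(1, len(x)):
        total += log_cond(n, x[n], x[parent[n]])
    return total


# Toy chain 0 -> 1 -> 2 with binary states (hypothetical numbers).
parent = [0, 0, 1]


def log_root(s: int) -> float:
    return math.log(0.5)  # uniform root prior over two states


def log_cond(n: int, s: int, s_parent: int) -> float:
    return math.log(0.9 if s == s_parent else 0.1)  # child copies parent w.p. 0.9


# A proper tree factorization sums to one over all configurations.
total = sum(math.exp(log_joint_prior(x, parent, log_root, log_cond))
            for x in itertools.product(range(2), repeat=3))
print(round(total, 12))  # 1.0
```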
34 Let $I_n = (I_{n1}, \dots, I_{nC})$ be a random variable representing the image evidence from C views of part n. [sent-100, score-0.14]
35 We assume the evidence from different views is independent, and therefore the likelihood of part n in state $x_n$ generating the image evidence $i_n$ can be written in terms of the likelihood functions for each view: $P_{I_n|X_n}(i_n \mid x_n) = \prod_{c=1}^{C} P_{I_{nc}|X_n}(i_{nc} \mid x_n)$ [sent-101, score-0.518]
36 This likelihood provides an image matching score, or goodness-of-fit, to all camera views for a part given its state, thereby imposing view constraints. [sent-103, score-0.272]
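Under the view-independence assumption, the log of the part likelihood is a sum of per-view log-likelihoods. The max-product algorithm uses this log-likelihood as the initial message $m_n(x_n)$. A minimal sketch follows. `view_likelihoods` is a hypothetical list of per-camera functions, each returning $P_{I_{nc}|X_n}(i_{nc} \mid x_n)$ for a given state.

```python
import math
from typing import Callable, Sequence


def part_log_likelihood(view_likelihoods: Sequence[Callable[[object], float]],
                        state: object) -> float:
    """ln P_{I_n|X_n}(i_n | x_n) = sum over views c of ln P_{I_nc|X_n}(i_nc | x_n).

    Summing logs instead of multiplying probabilities avoids underflow when
    many views or many small probabilities are combined.
    """
    return sum(math.log(p_view(state)) for p_view in view_likelihoods)


# Two hypothetical cameras that score the same state 0.5 and 0.2.
views = [lambda s: 0.5, lambda s: 0.2]
print(part_log_likelihood(views, state=None))  # ln(0.1), about -2.302585
```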
37 If $I = (I_1, \dots, I_N)$ is the image evidence of all parts and we assume $I_n$ is conditionally independent of all other variables given $X_n$, the full joint distribution over all random variables factorizes as: $P_{X,I}(x, i) = P_X(x) \prod_{n=1}^{N} P_{I_n|X_n}(i_n \mid x_n)$ [sent-107, score-0.191]
38 We want to find the most probable state x∗ of the parts given measurements of their images i. [sent-110, score-0.148]
39 This corresponds to solving the discrete optimization problem: $x^* = \arg\max_{x} P_{X|I}(x \mid i) = \arg\max_{x} P_{X,I}(x, i)$ (5). Since the objective function is factorized over a tree graph, the global maximum can be found using the max-product algorithm [3]. [sent-111, score-0.153]
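40 Algorithm 1 Max-product for our model: $m_n(x_n) := \ln P_{I_n|X_n}(i_n \mid x_n)\ \forall n$; for n := N to 2: p := pa(n); for $x_p \in \Omega_X$: $\tilde{m} := \max_{x_n} \left[ \ln P_{X_n|X_{\mathrm{pa}(n)}}(x_n \mid x_p) + m_n(x_n) \right]$ (6); [sent-118, score-0.214]
41 $m_p(x_p) := m_p(x_p) + \tilde{m}$; end; end; $x_1^* := \arg\max_{x_1} m_1(x_1)$; for n := 2 to N: p := pa(n); $x_n^* := \arg\max_{x_n} \left[ \ln P_{X_n|X_{\mathrm{pa}(n)}}(x_n \mid x_p^*) + m_n(x_n) \right]$; end. [sent-121, score-0.232]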
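The two passes of Algorithm 1 can be sketched in plain Python over tabulated log-probabilities. The sketch rests on three illustrative assumptions. States are integer indices into $\Omega_X$, so the tables stand in for whatever representation the model really uses. Parts are numbered 0..N-1, with part 0 as the root and `parent[n] < n`. A uniform root prior is dropped because it only adds a constant.

The inner `max` corresponds to the costly maximization (6). It scans all child states for every parent state, so it costs $O(|\Omega_X|^2)$ per part. A brute-force search is included only to check the result on a toy tree.

```python
import itertools
from typing import List, Sequence

Table = List[List[float]]


def max_product(parent: Sequence[int], unary: Table, pairwise: List[Table]) -> List[int]:
    """MAP configuration of a tree model by max-product in the log domain.

    unary[n][x]         = ln P_{I_n|X_n}(i_n | x)       (initial message m_n)
    pairwise[n][xp][x]  = ln P_{X_n|X_pa(n)}(x | xp)    (pairwise[0] unused)
    """
    n_parts = len(unary)
    m = [row[:] for row in unary]
    # Upward pass (n = N..2 in the text): children have larger indices,
    # so m[n] already contains its whole subtree when it is used here.
    for n in range(n_parts - 1, 0, -1):
        p = parent[n]
        for xp in range(len(m[p])):
            m_tilde = max(pairwise[n][xp][x] + m[n][x] for x in range(len(m[n])))  # (6)
            m[p][xp] += m_tilde
    # Downward pass: root first, then each part given its parent's choice.
    best = [0] * n_parts
    best[0] = max(range(len(m[0])), key=lambda x: m[0][x])
    for n in range(1, n_parts):
        xp = best[parent[n]]
        best[n] = max(range(len(m[n])), key=lambda x: pairwise[n][xp][x] + m[n][x])
    return best


def tree_score(x: Sequence[int], parent: Sequence[int], unary: Table,
               pairwise: List[Table]) -> float:
    """Log objective (up to a constant) of a full configuration x."""
    s = sum(unary[n][x[n]] for n in range(len(x)))
    return s + sum(pairwise[n][x[parent[n]]][x[n]] for n in range(1, len(x)))


def brute_force_map(parent: Sequence[int], unary: Table, pairwise: List[Table]) -> List[int]:
    """Exhaustive search over all configurations; only feasible for toy sizes."""
    n_states = len(unary[0])
    configs = itertools.product(range(n_states), repeat=len(unary))
    return list(max(configs, key=lambda x: tree_score(x, parent, unary, pairwise)))
```

The exhaustive search grows as $|\Omega_X|^N$, while the two passes grow linearly in N. This linear growth is what makes the tree assumption attractive.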
42 1 we show how to reduce the complexity to $O(|\Omega_T||\Omega_R|)$ and $O(|\Omega_T||\Omega_R|^2)$ by choosing a pose prior $P_{X_n|X_{\mathrm{pa}(n)}}$ that exploits the fact that we are modeling a 3D human skeleton. [sent-127, score-0.195]
43 We assume the global translation and rotation of the root are uniformly distributed, and view the conditional probability PXn|Xpa(n) as a prior on the pose of part n given the pose of its parent. [sent-132, score-0.716]
44 The simplest corresponds to modeling the skeleton as a chain of limbs of fixed length and is expressed with: $P_{\Delta T_n}(\Delta t_n) = \begin{cases} 1 & \text{if } \Delta t_n \in M_n \\ 0 & \text{otherwise} \end{cases}$ (14) [sent-138, score-0.34]
45 Another possibility is to use a loose chain model as in standard 2D pictorial structures: $P_{\Delta T_n}(\Delta t_n) \propto \mathcal{N}(\Delta t_n \mid (0, 0, 0)^T, \sigma_n^2 I_3)$ (15). The local translations are then described by a discretized normal distribution with zero mean and isotropic covariance. [sent-140, score-0.379]
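Both translation priors are simple to tabulate. The minimal sketch below makes two assumptions that the text does not fix. First, local translations $\Delta t_n$ are already quantized to integer grid offsets, so the set $M_n$ of (14) can be stored as a Python set of tuples and tested exactly. Second, the discretized normal of (15) is normalized over a hypothetical finite list of candidate offsets. What $M_n$ actually contains (for instance, a single offset in the parent's frame) is left to the caller.

```python
import math
from typing import Iterable, List, Sequence, Set, Tuple

Offset = Tuple[int, int, int]


def fixed_length_prior(dt: Sequence[int], allowed: Set[Offset]) -> float:
    """Indicator prior in the style of (14): 1 if dt lies in M_n, else 0."""
    return 1.0 if tuple(dt) in allowed else 0.0


def loose_chain_log_prior(dt: Sequence[float], sigma: float) -> float:
    """Unnormalized ln N(dt | 0, sigma^2 I_3); the constant does not affect argmax."""
    return -sum(c * c for c in dt) / (2.0 * sigma ** 2)


def discretized_gaussian(offsets: Iterable[Sequence[float]], sigma: float) -> List[float]:
    """Loose-chain prior (15) normalized over a finite set of candidate offsets."""
    weights = [math.exp(loose_chain_log_prior(d, sigma)) for d in offsets]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical neighbourhood: all integer offsets in {-1, 0, 1}^3.
cube = [(i, j, k) for i in (-1, 0, 1) for j in (-1, 0, 1) for k in (-1, 0, 1)]
probs = discretized_gaussian(cube, sigma=1.0)
print(cube[probs.index(max(probs))])  # (0, 0, 0): the zero offset is most likely
```

The indicator prior forbids any deviation from $M_n$, while the Gaussian only penalizes it. This is the difference between a rigid and a loose chain.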
46 Rotation Prior The distribution PΔRn describes the possible rotations of the joint connecting two parts. [sent-144, score-0.274]
47 One could of course learn an arbitrary distribution for PΔRn from training data; however, we do not pursue this alternative in this work, as we want to impose as few priors as possible on the expected pose of the subject. [sent-148, score-0.199]
48 However, each pose prior we suggested allows a speed-up of the costly innermost loop maximization (6). [sent-150, score-0.266]
49 Thus when looking for the optimal state xn = (tn, rn), we know the translation tn and only need to search over all rotations rn. [sent-152, score-0.979]
50 This is the worst-case complexity of any combination of the suggested translation and rotation priors. [sent-156, score-0.176]
51 Discrete Search Grid Using dynamic programming to search for the optimal pose requires a discretization of the state space. [sent-159, score-0.381]
52 Secondly, if we add translations or compose rotations, it should be easy to find the resulting discrete point. [sent-162, score-0.195]
53 The discrete set of translations ΩT is created as a grid covering this cube (fig. [sent-168, score-0.273]
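A regular grid meets the second requirement for translations almost for free. The grid point nearest to any position, such as a grid point plus a local translation, is found by rounding. In the minimal sketch below, the cube corner `origin`, the side length `side` and the number of points per axis `steps` are hypothetical parameters. Positions outside the cube are clamped to its boundary.

```python
from typing import Sequence, Tuple

Index = Tuple[int, int, int]


class TranslationGrid:
    """Regular grid of 3D translations covering an axis-aligned cube."""

    def __init__(self, origin: Sequence[float], side: float, steps: int) -> None:
        self.origin = tuple(origin)
        self.steps = steps
        self.spacing = side / (steps - 1)

    def __len__(self) -> int:
        return self.steps ** 3

    def point(self, idx: Index) -> Tuple[float, ...]:
        """3D position of the grid point with integer index idx."""
        return tuple(o + self.spacing * i for o, i in zip(self.origin, idx))

    def nearest(self, p: Sequence[float]) -> Index:
        """Index of the grid point closest to p, in O(1) time."""
        idx = []
        for o, c in zip(self.origin, p):
            i = round((c - o) / self.spacing)
            idx.append(min(max(i, 0), self.steps - 1))  # clamp to the cube
        return tuple(idx)


grid = TranslationGrid(origin=(0.0, 0.0, 0.0), side=2.0, steps=21)  # 0.1 spacing
start = grid.point((3, 4, 5))
moved = [c + d for c, d in zip(start, (0.22, 0.0, -0.31))]  # add a local translation
print(len(grid), grid.nearest(moved))  # 9261 (5, 4, 2)
```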
54 Rotation Discretization We use best-candidate sampling [13] to generate a discrete set of rotations ΩR that are evenly spread. [sent-170, score-0.179]
55 First, a large set of candidates is generated by sampling rotations uniformly. [sent-171, score-0.234]
56 For this process we use the unit quaternion representation of rotations [6]. [sent-173, score-0.265]
57 Finally, we convert the rotations from unit quaternions to rotation matrices. [sent-180, score-0.401]
58 The discrete set of rotations ΩR now fulfills our first requirement of being evenly spread (fig. [sent-181, score-0.208]
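The following is a sketch of one way the sampling steps above could be realized; it is not the authors' code. Candidates are drawn uniformly on SO(3) by normalizing 4D Gaussian vectors, which gives unit quaternions uniform on the 3-sphere. The sampler then greedily keeps the candidate furthest from everything selected so far.

This is the fixed-pool, farthest-point flavour of best-candidate sampling. The distance is the angle of the relative rotation, computed with $|\langle q_1, q_2\rangle|$ so that $q$ and $-q$ count as the same rotation. Function names and the sizes `k` and `num_candidates` are hypothetical.

```python
import math
import random
from typing import List, Sequence

Quat = List[float]  # (w, x, y, z) with unit norm


def random_unit_quaternion(rng: random.Random) -> Quat:
    """Uniformly distributed rotation: a normalized 4D Gaussian sample."""
    while True:
        q = [rng.gauss(0.0, 1.0) for _ in range(4)]
        norm = math.sqrt(sum(c * c for c in q))
        if norm > 1e-12:
            return [c / norm for c in q]


def rotation_distance(q1: Sequence[float], q2: Sequence[float]) -> float:
    """Angle in radians of the rotation taking q1 to q2 (q and -q are equal)."""
    d = min(1.0, abs(sum(a * b for a, b in zip(q1, q2))))
    return 2.0 * math.acos(d)


def best_candidate_rotations(k: int, num_candidates: int, seed: int = 0) -> List[Quat]:
    """Greedily pick k evenly spread rotations from a uniform candidate pool."""
    rng = random.Random(seed)
    pool = [random_unit_quaternion(rng) for _ in range(num_candidates)]
    chosen = [pool.pop()]
    # nearest[i]: distance from pool[i] to its closest already-chosen rotation.
    nearest = [rotation_distance(q, chosen[0]) for q in pool]
    while len(chosen) < k and pool:
        i = max(range(len(pool)), key=nearest.__getitem__)
        q_new = pool.pop(i)
        nearest.pop(i)
        chosen.append(q_new)
        nearest = [min(d, rotation_distance(q, q_new)) for q, d in zip(pool, nearest)]
    return chosen


def quaternion_to_matrix(q: Sequence[float]) -> List[List[float]]:
    """3x3 rotation matrix of the unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [[1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
            [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]]


rotations = [quaternion_to_matrix(q) for q in best_candidate_rotations(k=32, num_candidates=2000)]
```

Selection works on quaternions because their distance is cheap to compute. Conversion to matrices happens only at the end, mirroring the order of the steps in the text.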
59 The discrete set of translations ΩT is generated as a grid covering a bounding cube. [sent-184, score-0.243]
60 The discrete set of rotations ΩR is generated by sampling unit quaternions, i. [sent-186, score-0.311]
61 Ideally, we would like the composition of two rotations in ΩR to also be in ΩR. [sent-190, score-0.234]
62 If we have |ΩR| rotations, we precompute a |ΩR| × |ΩR| table where the element at row i and column j is the index of the rotation in ΩR that is closest to the composition of the rotations with indices i and j. [sent-196, score-0.234]
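Below is a sketch of such a precomputed composition table. It assumes rotations are stored as unit quaternions $(w, x, y, z)$ and that "closest" is measured by the relative rotation angle, i.e. by the largest $|\langle q, p\rangle|$. Building the table scans all $|\Omega_R|$ candidates for each of the $|\Omega_R|^2$ pairs, so it costs $O(|\Omega_R|^3)$ once. After that, every composition during inference is a single lookup. The names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z), unit norm


def quat_mul(a: Sequence[float], b: Sequence[float]) -> Quat:
    """Hamilton product a * b, i.e. the rotation b followed by a."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2)


def closest_index(q: Sequence[float], rotations: Sequence[Sequence[float]]) -> int:
    """Index of the rotation in `rotations` with the smallest angle to q."""
    return max(range(len(rotations)),
               key=lambda k: abs(sum(a * b for a, b in zip(q, rotations[k]))))


def composition_table(rotations: Sequence[Sequence[float]]) -> List[List[int]]:
    """table[i][j] = index of the element closest to the composition r_i * r_j."""
    return [[closest_index(quat_mul(qi, qj), rotations) for qj in rotations]
            for qi in rotations]


# Check on a set that happens to be closed: rotations about z by 0, 90, 180, 270 degrees.
quarter_turns = [(math.cos(k * math.pi / 4), 0.0, 0.0, math.sin(k * math.pi / 4)) for k in range(4)]
table = composition_table(quarter_turns)
print(table[1])  # [1, 2, 3, 0]
```

For a general, non-closed ΩR, the table returns the nearest representable rotation, so each lookup introduces a small approximation error.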
63 In section 3 we explore how many grid points are tractable. [sent-198, score-0.141]
64 On the one hand, it allows us to find the global optimum in a tractable way. [sent-202, score-0.164]
65 Researchers have addressed this problem, but frequently it involves dropping the tree assumption and using a global objective function which couples all parts [2, 14, 1]. [sent-210, score-0.144]
66 Whereas the parts should be allowed to intersect in 2D, they should never be allowed to intersect in 3D. [sent-213, score-0.162]
67 To allow simple and fast intersection tests we model the parts as capsules, i. [sent-229, score-0.138]
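A capsule is the set of points within radius $r$ of a line segment. Two capsules therefore intersect exactly when the closest distance between their axis segments is smaller than the sum of their radii. The minimal sketch below implements that test. The segment-segment distance follows the standard clamped closest-point computation (as in Ericson's Real-Time Collision Detection). The capsule parameters in the usage lines are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> Tuple[float, ...]:
    return tuple(x - y for x, y in zip(a, b))


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def _clamp01(v: float) -> float:
    return min(max(v, 0.0), 1.0)


def segment_distance(p1: Vec, q1: Vec, p2: Vec, q2: Vec, eps: float = 1e-12) -> float:
    """Smallest distance between segments [p1, q1] and [p2, q2]."""
    d1, d2, r = _sub(q1, p1), _sub(q2, p2), _sub(p1, p2)
    a, e, f = _dot(d1, d1), _dot(d2, d2), _dot(d2, r)
    if a <= eps and e <= eps:          # both segments are points
        s = t = 0.0
    elif a <= eps:                     # first segment is a point
        s, t = 0.0, _clamp01(f / e)
    else:
        c = _dot(d1, r)
        if e <= eps:                   # second segment is a point
            s, t = _clamp01(-c / a), 0.0
        else:
            b = _dot(d1, d2)
            denom = a * e - b * b      # zero when the segments are parallel
            s = _clamp01((b * f - c * e) / denom) if denom > eps else 0.0
            t = (b * s + f) / e
            if t < 0.0:                # re-clamp t and recompute s
                s, t = _clamp01(-c / a), 0.0
            elif t > 1.0:
                s, t = _clamp01((b - c) / a), 1.0
    c1 = [p + s * d for p, d in zip(p1, d1)]
    c2 = [p + t * d for p, d in zip(p2, d2)]
    return math.dist(c1, c2)


def capsules_intersect(seg1: Tuple[Vec, Vec], r1: float,
                       seg2: Tuple[Vec, Vec], r2: float) -> bool:
    """True if two capsules (segment plus radius) overlap."""
    return segment_distance(*seg1, *seg2) < r1 + r2


print(segment_distance((-1, 0, 0), (1, 0, 0), (0, -1, 1), (0, 1, 1)))  # 1.0
print(capsules_intersect(((-1, 0, 0), (1, 0, 0)), 0.6, ((0, -1, 1), (0, 1, 1)), 0.6))  # True
```

The test needs only a handful of dot products per pair of parts. This is why capsules are convenient when many candidate poses must be checked.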
68 This gives a new pose whose parts do not intersect and its associated score. [sent-233, score-0.274]
69 We annotated the 2D pose in each view for 214 consecutive frames. [sent-239, score-0.269]
70 Using these 2D measurements the cameras were synchronized and calibrated and the pose was reconstructed in 3D, using affine factorization [4]. [sent-240, score-0.188]
71 Our primary questions are: • Are pictorial structures in 3D a practical solution? [sent-242, score-0.298]
72 • What level of discretization is necessary to represent human poses in 3D? [sent-243, score-0.171]
73 To answer these questions, we first investigate what levels of discretization are tractable in terms of memory consumption. [sent-245, score-0.181]
74 In table 1 we list the memory requirements for this array for different translation ΩT and rotation ΩR discretizations. [sent-281, score-0.21]
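The memory needed grows with the product of the two grid sizes. The back-of-the-envelope sketch below rests on hypothetical assumptions: one stored entry per part and per state in $\Omega_X = \Omega_T \times \Omega_R$, 4-byte entries, and illustrative grid sizes. The numbers are not the values reported in table 1.

```python
def array_bytes(num_parts: int, num_translations: int, num_rotations: int,
                bytes_per_entry: int = 4) -> int:
    """Bytes for an array with one entry per part and per state (t, r)."""
    return num_parts * num_translations * num_rotations * bytes_per_entry


# Hypothetical sizes: 10 parts, a 32^3 translation grid, 1024 rotations.
size = array_bytes(10, 32 ** 3, 1024)
print(size, size / 2 ** 30)  # 1342177280 1.25 (GiB)
```

Doubling the grid resolution along each translation axis multiplies the translation count, and hence the memory, by 8.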
75 Finally, we explore what level of discretization is necessary to obtain an acceptable estimate of the 3D pose. [sent-293, score-0.144]
76 This avoids conflating inaccuracies in the measurement process with the coarseness of the grid discretization, when analyzing the cause of errors in the final 3D pose estimate. [sent-295, score-0.207]
77 The synthetic scores are computed from 2D pose annotations. [sent-296, score-0.198]
78 Let the annotated start and end points of part n in view c be denoted by s(icn) and e(icn). [sent-298, score-0.214]
79 If the part is in state xn the projected start and end points are denoted scn (xn) and ecn (xn). [sent-299, score-0.521]
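The projected endpoints $s_{cn}(x_n)$ and $e_{cn}(x_n)$ can be computed with a pinhole camera matrix. The minimal sketch below rests on stated assumptions. Each camera c is a 3×4 projection matrix `P`. A part in state $x_n = (t_n, r_n)$ starts at $t_n$ and extends a hypothetical `length` along its local z-axis. The squared endpoint distance is only an illustrative mismatch measure, since the text does not give the exact form of the synthetic score here.

```python
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]


def project(P: Sequence[Sequence[float]], X: Sequence[float]) -> Point2:
    """Pinhole projection of 3D point X with a 3x4 camera matrix P."""
    h = [sum(P[row][col] * v for col, v in enumerate((*X, 1.0))) for row in range(3)]
    return (h[0] / h[2], h[1] / h[2])


def part_endpoints(t: Sequence[float], R: Sequence[Sequence[float]],
                   length: float) -> Tuple[List[float], List[float]]:
    """3D start and end of a part, assuming it extends along its local z-axis."""
    end = [t[i] + length * R[i][2] for i in range(3)]
    return list(t), end


def endpoint_error(P, t, R, length, s_annot: Point2, e_annot: Point2) -> float:
    """Illustrative score: squared image distance to the annotated endpoints."""
    start3d, end3d = part_endpoints(t, R, length)
    s_proj, e_proj = project(P, start3d), project(P, end3d)
    return sum((a - b) ** 2 for a, b in zip(s_proj + e_proj, s_annot + e_annot))


P = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]  # canonical camera
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(endpoint_error(P, (1.0, 2.0, 4.0), identity, 4.0, (0.25, 0.5), (0.125, 0.25)))  # 0.0
```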
80 The estimated 3D pose (red) is the pose closest to the ground truth pose (blue) among those that can be represented with the given discretization. [sent-307, score-0.477]
81 We believe this extra level of detail is needed since the hard joint angle constraints remove some of the local rotations of each part. [sent-313, score-0.356]
82 More specifically, they remove some of the rotations that approximately rotate the part around its own axis but result in slightly different end positions. [sent-314, score-0.338]
83 Automatic Part Detection These experiments test automatic pose estimation using algorithm 2. [sent-318, score-0.192]
84 Each 3D rotation then corresponds to a 2D rotation and change of aspect ratio of this rectangle. [sent-321, score-0.19]
85 A quantitative summary of the results of our pose estimation on real images from 20 different frames. [sent-323, score-0.192]
86 2 (in blue) are used to measure performance of pose estimation using 1, 2 or 3 cameras. [sent-326, score-0.192]
87 Using 2D pose annotations, we train a binary logistic regression classifier [3] for each part, which allows a probabilistic interpretation of its output. [sent-331, score-0.159]
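A binary logistic regression classifier outputs $\sigma(w^\top f + b)$, which can be read as the probability that a candidate rectangle contains the part. The minimal sketch below uses plain batch gradient descent. The feature vectors, learning rate and number of epochs are hypothetical, and the text does not specify which features or optimizer were used.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit weights w and bias b by minimizing the mean negative log-likelihood."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def part_probability(w: Sequence[float], b: float, features: Sequence[float]) -> float:
    """Classifier output read as P(part present | features)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)


# Toy 1D features: negatives to the left, positives to the right.
w, b = train_logistic([[-2.0], [-1.0], [1.0], [2.0]], [0, 0, 1, 1])
print(part_probability(w, b, [2.0]) > 0.8, part_probability(w, b, [-2.0]) < 0.2)  # True True
```

Because the output lies in (0, 1), its logarithm can enter a sum of per-view log-likelihoods directly. Treating a classifier score as a likelihood in this way is itself a modeling approximation.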
88 Each 3D position and rotation of the part corresponds to a 2D position, rotation and aspect ratio of the rectangle in each view. [sent-335, score-0.266]
89 (19) the ground truth 3D coordinates of the start and end point of part n and sn and en the algorithm’s estimate. [sent-344, score-0.136]
90 More importantly, the table and figure show that, given a 2D part detector, the 3D pictorial structures framework can improve the accuracy of the estimation by imposing view, skeleton and intersection constraints in 3D. [sent-353, score-0.81]
91 Conclusions and Future Work We have described and implemented a framework for 3D pictorial structures that can be used for multiple view articulated pose estimation. [sent-355, score-0.58]
92 Thanks to the discretization of the search space a globally optimal pose can be computed. [sent-356, score-0.303]
93 The first algorithm (2) imposes view and skeleton constraints. [sent-358, score-0.454]
94 Finding an efficient way of computing max-convolutions over discrete subsets of SO(3) would speed up the second algorithm, which imposes joint angle constraints. [sent-363, score-0.254]
95 In our implementation we compute the image evidence of the individual parts using 2D part detectors that are rather basic and not that accurate. [sent-366, score-0.152]
96 Better performance can be expected if this framework-independent component is instead based on a state-of-the-art 2D pose estimator. [sent-367, score-0.159]
97 Now that the tractability of the framework has been shown, we plan to refine this appearance component and thoroughly compare the performance with alternative 3D pose estimators. [sent-368, score-0.239]
98 Multiple view 3D pose estimation imposing different types of constraints. [sent-397, score-0.359]
99 In the first column only view constraints are imposed. [sent-398, score-0.142]
100 Loose-limbed people: Estimating 3D human pose and motion using non-parametric belief propagation. [sent-471, score-0.159]
wordName wordTfidf (topN-words)
[('xpa', 0.35), ('tn', 0.321), ('skeleton', 0.3), ('xn', 0.292), ('rotations', 0.234), ('pictorial', 0.196), ('rn', 0.19), ('pxn', 0.187), ('pose', 0.159), ('discretization', 0.144), ('tp', 0.13), ('rp', 0.121), ('translations', 0.118), ('view', 0.11), ('rotation', 0.095), ('tractable', 0.093), ('icn', 0.093), ('mn', 0.092), ('xp', 0.087), ('football', 0.086), ('pin', 0.083), ('translation', 0.081), ('tractability', 0.08), ('discrete', 0.077), ('rpa', 0.077), ('leg', 0.074), ('mb', 0.074), ('mp', 0.074), ('structures', 0.073), ('quaternions', 0.072), ('lnpxn', 0.07), ('prn', 0.07), ('intersection', 0.07), ('parts', 0.068), ('dn', 0.061), ('pa', 0.058), ('imposing', 0.057), ('views', 0.056), ('evenly', 0.055), ('end', 0.055), ('discretizations', 0.054), ('parent', 0.052), ('state', 0.051), ('angle', 0.05), ('tree', 0.049), ('part', 0.049), ('grid', 0.048), ('argmax', 0.048), ('factorizes', 0.048), ('professional', 0.048), ('intersect', 0.047), ('burenius', 0.047), ('ofrotations', 0.047), ('pelvis', 0.047), ('picn', 0.047), ('rpdn', 0.047), ('rtprn', 0.047), ('legs', 0.045), ('optimum', 0.044), ('imposes', 0.044), ('pcp', 0.043), ('articulated', 0.042), ('furthest', 0.041), ('innermost', 0.041), ('sigal', 0.041), ('gb', 0.041), ('limbs', 0.04), ('joint', 0.04), ('impose', 0.04), ('scores', 0.039), ('bergtholdt', 0.038), ('ptn', 0.038), ('scn', 0.038), ('intersecting', 0.037), ('arms', 0.037), ('prior', 0.036), ('ecn', 0.036), ('ioft', 0.035), ('ln', 0.035), ('evidence', 0.035), ('memory', 0.034), ('estimation', 0.033), ('sn', 0.032), ('constraints', 0.032), ('precompute', 0.032), ('sullivan', 0.032), ('arm', 0.032), ('child', 0.032), ('upper', 0.032), ('quaternion', 0.031), ('speed', 0.03), ('discretized', 0.03), ('cube', 0.03), ('spread', 0.029), ('measurements', 0.029), ('practical', 0.029), ('poses', 0.027), ('rectangle', 0.027), ('global', 0.027), ('dynamic', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
2 0.16873182 335 cvpr-2013-Poselet Conditioned Pictorial Structures
Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
Abstract: In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. Modelling such higher order part dependencies seemingly comes at a cost of more expensive inference, which resulted in their limited use in state-of-the-art methods. In this paper we propose a model that incorporates higher order part dependencies while remaining efficient. We achieve this by defining a conditional model in which all body parts are connected a-priori, but which becomes a tractable tree-structured pictorial structures model once the image observations are available. In order to derive a set of conditioning variables we rely on the poselet-based features that have been shown to be effective for people detection but have so far found limited application for articulated human pose estimation. We demonstrate the effectiveness of our approach on three publicly available pose estimation benchmarks improving or being on-par with state of the art in each case.
3 0.16702721 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
Author: Matthias Dantone, Juergen Gall, Christian Leistner, Luc Van_Gool
Abstract: In this work, we address the problem of estimating 2d human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have shown to be very successful in solving this task. Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. In particular, we employ two-layered random forests as joint regressors. The first layer acts as a discriminative, independent body part classifier. The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This results in a pose estimation framework that takes dependencies between body parts already for joint localization into account and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. In the experiments, we demonstrate that our body parts dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods.
4 0.16369601 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
Author: Fang Wang, Yi Li
Abstract: Simple tree models for articulated objects prevails in the last decade. However, it is also believed that these simple tree models are not capable of capturing large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) are simple tree models sufficient? more specifically, 2) how to use tree models effectively in human pose estimation? and 3) how shall we use combined parts together with single parts efficiently? Assuming we have a set of single parts and combined parts, and the goal is to estimate a joint distribution of their locations. We surprisingly find that no latent variables are introduced in the Leeds Sport Dataset (LSP) during learning latent trees for deformable model, which aims at approximating the joint distributions of body part locations using minimal tree structure. This suggests one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build Visual Categories of the combined parts, and then perform inference on the learned latent tree. Our method outperformed the state of the art on the LSP, both in the scenarios when the training images are from the same dataset and from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.
5 0.16255352 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
Author: Edgar Simo-Serra, Ariadna Quattoni, Carme Torras, Francesc Moreno-Noguer
Abstract: We introduce a novel approach to automatically recover 3D human pose from a single image. Most previous work follows a pipelined approach: initially, a set of 2D features such as edges, joints or silhouettes are detected in the image, and then these observations are used to infer the 3D pose. Solving these two problems separately may lead to erroneous 3D poses when the feature detector has performed poorly. In this paper, we address this issue by jointly solving both the 2D detection and the 3D inference problems. For this purpose, we propose a Bayesian framework that integrates a generative model based on latent variables and discriminative 2D part detectors based on HOGs, and perform inference using evolutionary algorithms. Real experimentation demonstrates competitive results, and the ability of our methodology to provide accurate 2D and 3D pose estimations even when the 2D detectors are inaccurate.
6 0.13325894 165 cvpr-2013-Fast Energy Minimization Using Learned State Filters
7 0.13022837 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
9 0.11864926 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation
10 0.11691063 334 cvpr-2013-Pose from Flow and Flow from Pose
11 0.11514013 46 cvpr-2013-Articulated and Restricted Motion Subspaces and Their Signatures
12 0.11235346 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
13 0.11161141 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories
14 0.1028254 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
15 0.089187719 40 cvpr-2013-An Approach to Pose-Based Action Recognition
16 0.081332169 196 cvpr-2013-HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences
17 0.078346811 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
18 0.077799156 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation
19 0.075179726 341 cvpr-2013-Procrustean Normal Distribution for Non-rigid Structure from Motion
20 0.074864864 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
topicId topicWeight
[(0, 0.179), (1, 0.032), (2, -0.006), (3, -0.066), (4, 0.007), (5, -0.007), (6, 0.075), (7, 0.043), (8, 0.03), (9, -0.08), (10, -0.085), (11, 0.192), (12, -0.09), (13, -0.027), (14, -0.021), (15, 0.04), (16, 0.036), (17, -0.027), (18, -0.019), (19, -0.064), (20, -0.02), (21, 0.018), (22, -0.045), (23, -0.028), (24, 0.008), (25, 0.03), (26, 0.009), (27, -0.037), (28, -0.021), (29, 0.013), (30, 0.045), (31, 0.047), (32, 0.022), (33, -0.058), (34, -0.022), (35, -0.011), (36, -0.044), (37, 0.053), (38, 0.0), (39, -0.009), (40, 0.055), (41, 0.003), (42, 0.051), (43, 0.037), (44, -0.004), (45, -0.032), (46, 0.016), (47, -0.0), (48, -0.071), (49, -0.064)]
simIndex simValue paperId paperTitle
same-paper 1 0.95881295 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
2 0.8690604 335 cvpr-2013-Poselet Conditioned Pictorial Structures
3 0.82051259 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
4 0.81909841 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
5 0.81901199 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
6 0.77650267 89 cvpr-2013-Computationally Efficient Regression on a Dependency Graph for Human Pose Estimation
7 0.76771986 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
8 0.75973868 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
9 0.75205714 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation
10 0.71705586 426 cvpr-2013-Tensor-Based Human Body Modeling
11 0.68576109 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
12 0.67053443 439 cvpr-2013-Tracking Human Pose by Tracking Symmetric Parts
13 0.63108796 334 cvpr-2013-Pose from Flow and Flow from Pose
14 0.60988355 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories
15 0.60169357 40 cvpr-2013-An Approach to Pose-Based Action Recognition
16 0.58131576 444 cvpr-2013-Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross-Modality Regression Forest
17 0.53465998 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images
18 0.51294488 400 cvpr-2013-Single Image Calibration of Multi-axial Imaging Systems
19 0.50583494 46 cvpr-2013-Articulated and Restricted Motion Subspaces and Their Signatures
20 0.49005979 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection
topicId topicWeight
[(10, 0.107), (16, 0.013), (26, 0.035), (33, 0.243), (67, 0.382), (69, 0.026), (80, 0.022), (87, 0.073)]
simIndex simValue paperId paperTitle
1 0.96103013 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video
Author: Pramod Sharma, Ram Nevatia
Abstract: In this work, we present a novel and efficient detector adaptation method which improves the performance of an offline trained classifier (baseline classifier) by adapting it to new test datasets. We address two critical aspects of adaptation methods: generalizability and computational efficiency. We propose an adaptation method which can be applied to various baseline classifiers and is also computationally efficient. For a given test video, we collect online samples in an unsupervised manner and train a random-fern adaptive classifier. The adaptive classifier improves the precision of the baseline classifier by validating the detection responses obtained from the baseline classifier as correct detections or false alarms. Experiments demonstrate the generalizability, computational efficiency and effectiveness of our method, as we compare our method with state-of-the-art approaches for the problem of human detection and show good performance with high computational efficiency on two different baseline classifiers.
2 0.95758706 103 cvpr-2013-Decoding Children's Social Behavior
Author: James M. Rehg, Gregory D. Abowd, Agata Rozga, Mario Romero, Mark A. Clements, Stan Sclaroff, Irfan Essa, Opal Y. Ousley, Yin Li, Chanho Kim, Hrishikesh Rao, Jonathan C. Kim, Liliana Lo Presti, Jianming Zhang, Denis Lantsman, Jonathan Bidwell, Zhefan Ye
Abstract: We introduce a new problem domain for activity recognition: the analysis of children's social and communicative behaviors based on video and audio data. We specifically target interactions between children aged 1–2 years and an adult. Such interactions arise naturally in the diagnosis and treatment of developmental disorders such as autism. We introduce a new publicly available dataset containing over 160 sessions of a 3–5 minute child-adult interaction. In each session, the adult examiner followed a semi-structured play interaction protocol which was designed to elicit a broad range of social behaviors. We identify the key technical challenges in analyzing these behaviors, and describe methods for decoding the interactions. We present experimental results that demonstrate the potential of the dataset to drive interesting research questions, and show preliminary results for multi-modal activity recognition.
3 0.93626374 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
Author: Wanli Ouyang, Xiaogang Wang
Abstract: In this paper, we address the challenging problem of detecting pedestrians who appear in groups and have interaction. A new approach is proposed for single-pedestrian detection aided by multi-pedestrian detection. A mixture model of multi-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby multiple pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection. It can integrate with any single-pedestrian detector without significantly increasing the computation load. 15 state-of-the-art single-pedestrian detection approaches are investigated on three widely used public datasets: Caltech, TUD-Brussels and ETH. Experimental results show that our framework significantly improves all these approaches. The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset.
4 0.89723784 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification
Author: Enrique G. Ortiz, Alan Wright, Mubarak Shah
Abstract: This paper presents an end-to-end video face recognition system, addressing the difficult problem of identifying a video face track using a large dictionary of still face images of a few hundred people, while rejecting unknown individuals. A straightforward application of the popular ℓ1-minimization for face recognition on a frame-by-frame basis is prohibitively expensive, so we propose a novel algorithm, Mean Sequence SRC (MSSRC), that performs video face recognition using a joint optimization leveraging all of the available video data and the knowledge that the face track frames belong to the same individual. By adding a strict temporal constraint to the ℓ1-minimization that forces individual frames in a face track to all reconstruct a single identity, we show the optimization reduces to a single minimization over the mean of the face track. We also introduce a new Movie Trailer Face Dataset collected from 101 movie trailers on YouTube. Finally, we show that our method matches or outperforms the state of the art on three existing datasets (YouTube Celebrities, YouTube Faces, and Buffy) and our unconstrained Movie Trailer Face Dataset. More importantly, our method excels at rejecting unknown identities by at least 8% in average precision.
5 0.89530903 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
Author: Georgia Gkioxari, Pablo Arbeláez, Lubomir Bourdev, Jitendra Malik
Abstract: We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image. For this purpose, we build on the notion of poselets [4] and train highly discriminative classifiers to differentiate among arm configurations, which we call armlets. We propose a rich representation which, in addition to standard HOG features, integrates the information of strong contours, skin color and contextual cues in a principled manner. Unlike existing methods, we evaluate our approach on a large subset of images from the PASCAL VOC detection dataset, where critical visual phenomena, such as occlusion, truncation, multiple instances and clutter, are the norm. Our approach outperforms Yang and Ramanan [26], the state-of-the-art technique, with an improvement from 29.0% to 37.5% PCP accuracy on the arm keypoint prediction task, on this new pose estimation dataset.
same-paper 6 0.8850801 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
7 0.88409925 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search
8 0.87046874 246 cvpr-2013-Learning Binary Codes for High-Dimensional Data Using Bilinear Projections
9 0.86277312 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking
10 0.84988666 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues
11 0.82992929 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
12 0.81733024 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
13 0.81657904 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
14 0.80081534 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
15 0.79546314 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes
16 0.79368478 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
17 0.77186823 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
18 0.76849967 438 cvpr-2013-Towards Pose Robust Face Recognition
19 0.76610202 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors
20 0.76473868 338 cvpr-2013-Probabilistic Elastic Matching for Pose Variant Face Verification