iccv iccv2013 iccv2013-143 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Silvia Zuffi, Javier Romero, Cordelia Schmid, Michael J. Black
Abstract: We address the problem of upper-body human pose estimation in uncontrolled monocular video sequences, without manual initialization. Most current methods focus on isolated video frames and often fail to correctly localize arms and hands. Inferring pose over a video sequence is advantageous because poses of people in adjacent frames exhibit properties of smooth variation due to the nature of human and camera motion. To exploit this, previous methods have used prior knowledge about distinctive actions or generic temporal priors combined with static image likelihoods to track people in motion. Here we take a different approach based on a simple observation: Information about how a person moves from frame to frame is present in the optical flow field. We develop an approach for tracking articulated motions that “links” articulated shape models of people in adjacent frames through the dense optical flow. Key to this approach is a 2D shape model of the body that we use to compute how the body moves over time. The resulting “flowing puppets” provide a way of integrating image evidence across frames to improve pose inference. We apply our method on a challenging dataset of TV video sequences and show state-of-the-art performance.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We address the problem of upper-body human pose estimation in uncontrolled monocular video sequences, without manual initialization. [sent-5, score-0.355]
2 Inferring pose over a video sequence is advantageous because poses of people in adjacent frames exhibit properties of smooth variation due to the nature of human and camera motion. [sent-7, score-0.508]
3 Here we take a different approach based on a simple observation: Information about how a person moves from frame to frame is present in the optical flow field. [sent-9, score-0.698]
4 We develop an approach for tracking articulated motions that “links” articulated shape models of people in adjacent frames through the dense optical flow. [sent-10, score-0.628]
5 Key to this approach is a 2D shape model of the body that we use to compute how the body moves over time. [sent-11, score-0.455]
6 The resulting “flowing puppets” provide a way of integrating image evidence across frames to improve pose inference. [sent-12, score-0.591]
7 Introduction. We address the problem of estimating the 2D pose of a person in a monocular video sequence captured under uncontrolled conditions, without manual initialization. [sent-15, score-0.337]
8 In a single frame, pose estimation is challenging and current methods tend to do poorly at estimating the pose of the limbs. [sent-16, score-0.41]
9 Previous approaches use image evidence in individual frames and then try to infer a coherent sequence of poses by imposing priors that encode smooth motion over time. [sent-19, score-0.285]
10 Such approaches can work well for tracking where an initial pose is given but, as we describe below, are difficult to use for the general pose inference problem. [sent-20, score-0.434]
11 Instead, we exploit optical flow in three ways: 1) to incorporate image evidence from adjacent frames, 2) to propagate information over time, and 3) to provide richer cues for pose estimation. [sent-23, score-0.799]
12 Our approach is enabled by recent advances in methods for dense optical flow computation and by a recently introduced 2D model of articulated human body shape, the Deformable Structures model (DS) [24]. [sent-24, score-0.751]
13 The availability of accurate estimates of dense optical flow allows us to consider the flow as an observation, while the articulated model of 2D body shape provides a tool for modeling the regions of motion of a moving person. [sent-25, score-1.04]
14 The question is: How can optical flow be incorporated to make the pose inference problem simpler and more accurate? [sent-26, score-0.601]
15 Consider the problem of estimating body pose in Fig. 1. [sent-27, score-0.383]
16 Assume we have a hypothesis for the body at frame t (Fig. 1). [sent-29, score-0.335]
17 In any given frame, the image evidence may be ambiguous and we would like to combine evidence from multiple frames to more robustly infer pose. [sent-31, score-0.27]
18 Due to the complexity of human pose, we perform inference using a distribution of “particles” at each frame, where each particle represents the pose of the body. [sent-32, score-0.311]
19 If we are lucky and have particles at frame t and t+1 that are both correct, then the poses in each frame explain the image evidence and the change in pose between frames is consistent with the flow. [sent-33, score-0.881]
20 Estimating the pose of the body in two frames simultaneously effectively doubles the size of the state space which, for articulated body models, is already high. [sent-36, score-0.782]
21 Alternatively, if we independently estimate the pose in both frames then, given the high-dimensional space and a small set of particles, we will have to be extremely lucky to have two poses that are consistent with the image evidence in both frames and the optical flow. [sent-37, score-0.707]
22 Our first solution is to estimate the pose of the body only at one frame (keeping the dimensionality under control) and to use the optical flow to check how good this solution is in neighboring frames. [sent-39, score-0.941]
23 We refer to the body model as a “puppet” because it can be “puppeteered” by the optical flow. [sent-40, score-0.357]
24 Given a pose at frame t we use the computed dense optical flow (Fig. 1(b)). [sent-41, score-0.741]
25 We use this flow to predict how the puppet should move into the next frame, forwards and backwards in time. [sent-42, score-0.603]
26 The resulting puppet flow (Fig. 1(c)), estimated from the dense optical flow, provides the prediction of the puppet in the next frame. [sent-44, score-0.722]
27 The advantage is that inference takes place for a single puppet at a time but we are able to incorporate information from multiple frames. [sent-47, score-0.56]
28 We describe upper-body pose estimation, but the method should be applicable to full-body pose as well. [sent-49, score-0.804]
29 This model captures the rough shape of a person and how the shapes of the body parts deform with pose. [sent-50, score-0.424]
30 We initialize particles on each frame of the video sequence using a state-of-the-art single-frame pose estimation method [23]. [sent-55, score-0.571]
31 We take the most likely particles in a given frame and use the puppet flow to predict their poses in adjacent frames. [sent-56, score-1.138]
32 We generate additional pose proposals that incorporate information about the possible location of hands based on image and flow evidence; this is our third use of flow. [sent-59, score-0.494]
33 VideoPose2.0 is a complex and challenging benchmark for pose estimation methods in video sequences; it includes very difficult sequences where the appearance of the people can be easily confounded with the background. [sent-63, score-0.301]
34 In summary, our work proposes a new way of integrating information over time to do human pose estimation in video. [sent-67, score-0.263]
35 The key idea is to use the optical flow field to define “puppets” that “flow” from one time to the next, allowing us to integrate image evidence from multiple frames in a principled and effective way and to propagate good solutions in time. [sent-68, score-0.616]
36 A good pose is one that is good in multiple frames and agrees with the optical flow. [sent-69, score-0.45]
37 There is a similarly large literature on 2D human pose estimation in static images. [sent-76, score-0.302]
38 FMP is one of the most widely adopted methods for human pose estimation, due to its computational efficiency and ability to detect people at different scales. [sent-80, score-0.28]
39 Surprisingly little work has addressed the combination of monocular pose estimation with tracking in uncontrolled environments. [sent-82, score-0.329]
40 Based on this detection, they build a person-specific appearance model and perform independent pose estimation on each frame using this appearance model. [sent-86, score-0.362]
41 [20] exploit optical flow information to locate foreground contours. [sent-98, score-0.417]
42 The idea of using flow discontinuities as a cue for pose estimation dates at least to [21] on 3D body pose estimation in monocular video. [sent-100, score-0.914]
43 [10] exploit optical flow for segmenting body parts and propagating segmentations over time. [sent-102, score-0.663]
44 The representation of the body for monocular articulated motion parsing has not received much attention but, we argue, is critically important. [sent-104, score-0.405]
45 The pose of the body can then be represented either explicitly by a kinematic tree [4, 15] or by a probabilistic collection of parts [1, 8]. [sent-108, score-0.432]
46 In contrast to polygonal parts, the Contour People [11] and Deformable Structures [24] models are derived from a realistic 3D model of body shape and better capture the 2D shape of the person including perspective effects, foreshortening, and non-rigid deformations with pose. [sent-109, score-0.382]
47 Note that in [6] body pose is displayed in a way that looks like a DS model but is not. [sent-111, score-0.383]
48 They take the rectangular body parts of a standard PS model and smooth them with the probability distribution for the part. [sent-112, score-0.308]
49 [12] propose a 2D model of clothed body shape but the model is not articulated. [sent-118, score-0.258]
50 To make sense of the optical flow in the scene, this is necessary and is supported by our model (though crudely here). [sent-122, score-0.388]
51 As we will see, “flowing” the puppet in time requires that we associate observed optical flow with body parts. [sent-124, score-0.553]
52 We introduce prior knowledge on camera location and pose in TV shows by redefining the mean DS shape as the average shape in the Buffy training set [9] annotated with the DS model. [sent-130, score-0.308]
53 DS is a gender-specific part-based probabilistic model, where contour points of body parts are represented in local coordinate systems by linear models of the shape coefficients zi. [sent-134, score-0.333]
54 The correlation between the shape coefficients zi and the body pose parameters captures how shape varies with pose and is modeled with pairwise multivariate Gaussian distributions over the relative pose and shape coefficients of connected body parts. [sent-138, score-1.157]
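This pairwise Gaussian coupling means the expected shape given pose is a standard Gaussian conditioning. Below is a minimal Python sketch, assuming a joint Gaussian over stacked [relative pose; shape coefficients] for a pair of connected parts; the stacking and variable names are our assumptions, not the paper's exact parameterization.

```python
import numpy as np

def expected_shape_given_pose(mu, Sigma, pose):
    """Conditional mean E[z | pose] under a joint Gaussian over [pose; z].

    mu    : (d + k,) joint mean, first d entries for the relative pose
    Sigma : (d + k, d + k) joint covariance, partitioned the same way
    pose  : (d,) observed relative pose parameters
    """
    d = pose.shape[0]
    mu_p, mu_z = mu[:d], mu[d:]
    S_pp = Sigma[:d, :d]          # pose/pose block
    S_zp = Sigma[d:, :d]          # shape/pose cross-covariance
    # Standard Gaussian conditioning: mu_z + S_zp S_pp^{-1} (pose - mu_p)
    return mu_z + S_zp @ np.linalg.solve(S_pp, pose - mu_p)
```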
55 Let xt be a vector of DS model variables and the scale at time t (i.e., xt = [lt, st]). [sent-145, score-0.318]
56 Let It be the image frame at time t, and Ut,t+1 the dense optical flow between images It and It+1. [sent-147, score-0.848]
57 We define the posterior distribution over the DS model variables and scale for each frame in the sequence of N frames as:
$$p(X \mid I, U, \pi_{DS}) \;\propto\; \prod_{t=1}^{N} p(I_t \mid x_t) \;\prod_{t=1}^{N-1} p(I_{t+1} \mid \hat{x}_{t+1})\, p(\hat{x}_{t+1} \mid x_t, U_{t,t+1}) \;\prod_{t=1}^{N} p(l_t \mid \pi_{DS})\, p(s_t \mid \pi_s) \qquad (3)$$
where X = [x1, ..., xN]. [sent-148, score-0.292]
58 Here p(lt|πDS) is given by Eq. (2), p(st|πs) is a prior on scale, p(It|xt) is the static image likelihood for the frame at time t, and p(It+1|x̂t+1) is the static image likelihood for the frame at t+1, evaluated for x̂t+1, which is the “flowing puppet” of xt given the flow Ut,t+1 (see below). [sent-158, score-1.057]
59 Here our likelihood uses flowing puppets in the forward direction, but our formulation is general and can be extended to consider flowing puppets generated with backward flow and for more than one time step. [sent-159, score-1.386]
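To make Eq. (3) concrete, here is a minimal sketch of the per-frame terms for the forward direction; flow_puppet, static_nll, and prior_nll are hypothetical placeholders standing in for the flowing-puppet prediction, the static image likelihood, and the pose/scale priors.

```python
def frame_neg_log_posterior(x_t, I_t, I_next, U_fwd,
                            flow_puppet, static_nll, prior_nll):
    """Negative log of the frame-t factors in Eq. (3) (sketch).

    All callables are assumptions for illustration:
      flow_puppet(x, U) -> flowing puppet x_hat for the next frame
      static_nll(I, x)  -> -log p(I | x)
      prior_nll(x)      -> -log p(l | pi_DS) - log p(s | pi_s)
    """
    x_hat = flow_puppet(x_t, U_fwd)      # deterministic (delta) transition
    energy = static_nll(I_t, x_t)        # -log p(I_t | x_t)
    energy += static_nll(I_next, x_hat)  # -log p(I_{t+1} | x_hat_{t+1})
    energy += prior_nll(x_t)             # DS pose prior and scale prior
    return energy
```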
60 Flowing puppets. Given a DS puppet defined by the variables xt, and given the dense flow Ut,t+1, the corresponding flowing puppet for frame t+1 is generated by propagating xt to x̂t+1 through the flow. [sent-162, score-2.226]
61 The conditional probability distribution p(x̂t+1|xt, Ut,t+1) expresses the noisy generative process for the flowing puppet x̂t+1. [sent-163, score-0.817]
62 Given the visibility mask for each body part, we consider the corresponding pixels in the optical flow map Ut,t+1. [sent-167, score-0.585]
63 Figure 1(c) shows a puppet, xt, overlaid on the forward and backward optical flow fields. [sent-169, score-0.553]
64 We fit an affine motion model to the optical flow vectors within each body part. [sent-172, score-0.624]
65 The resulting puppet flow field is illustrated in Fig. 1(c). [sent-173, score-0.761]
66 This is our estimate for how the puppet should move from frame to frame. [sent-174, score-0.671]
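The per-part affine fit lends itself to a closed-form least-squares solve. The sketch below assumes a dense flow array U of shape (H, W, 2) and a boolean visibility mask per part; it illustrates the technique, not the authors' implementation.

```python
import numpy as np

def fit_part_affine(U, mask):
    """Least-squares affine motion model for one body part.

    U    : (H, W, 2) dense flow field (u, v) between frames t and t+1
    mask : (H, W) boolean visibility mask of the part

    Returns A (2x2) and b (2,) such that flow(x, y) ~= A @ [x, y] + b.
    """
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=1).astype(float)  # (N, 3)
    uv = U[ys, xs]                                                    # (N, 2)
    # Solve pts @ P = uv in the least-squares sense; P is (3, 2).
    P, *_ = np.linalg.lstsq(pts, uv, rcond=None)
    A, b = P[:2].T, P[2]
    return A, b

def flow_contour(points, A, b):
    """Propagate the 2D contour points of a part by the fitted affine flow."""
    return points + points @ A.T + b  # new position = old + predicted displacement
```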
67 Our current process of generating the flowing puppet does not include a noise model; thus the probability distribution p(x̂t+1|xt, Ut,t+1) is simply a delta function centered on the predicted puppet. [sent-177, score-0.817]
68 The DS model we use is learned from a 3D model that does not include hand pose variations; consequently, our 2D model does not have separate hand parts with their own articulation parameters. [sent-181, score-0.343]
69 The hand likelihood ph(It |xt) is based on a hand probability map generated by a hand detector using optical flow. [sent-191, score-0.479]
70 HOG descriptors are steered to the contour orientation and computed at contour points (blue), inside (red), and outside the contour (green) in a 3-level pyramid. [sent-194, score-0.295]
71 Image (left), optical flow (center), and hand probability map defined from running a flow-based hand detector on the flow (right). [sent-198, score-0.79]
72 From these joints, we can easily compute the DS puppet parameters, li = (ci, θi , zi), where the shape coefficients zi represent the expected shape for each part. [sent-208, score-0.693]
73 The state space for a puppet in a frame is then xt = [yt, st], where st is the puppet scale. [sent-209, score-1.497]
74 Ec(x̂t+1), Es(x̂t+1) and Eh(x̂t+1) are the negative log-likelihoods of the puppet in frame t propagated to the frame t+1 through the dense optical flow Ut,t+1. [sent-213, score-1.288]
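Read literally, this says the flowed-frame image likelihood decomposes additively in the log domain. A hedged reconstruction (taking the subscripts to denote contour, skin-color, and hand terms is our assumption, not stated here):

```latex
% Hedged reading of the flowed-frame likelihood, up to an additive constant;
% the decomposition into E_c, E_s, E_h mirrors the terms named in the text.
-\log p(I_{t+1} \mid \hat{x}_{t+1}) \;=\; E_c(\hat{x}_{t+1}) + E_s(\hat{x}_{t+1}) + E_h(\hat{x}_{t+1})
```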
75 Perturbing the vertices can produce implausible puppets, so we first convert the pose into a joint-angle representation, do this perturbation in joint-angle space, convert back to joint positions, and then to the expected DS model to obtain contours and regions. [sent-219, score-0.492]
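A minimal sketch of this round trip, where to_angles and to_joints are hypothetical kinematic conversion routines standing in for the paper's joint-angle representation:

```python
import numpy as np

def perturb_in_angle_space(joints, to_angles, to_joints, sigma, rng):
    """Perturb a particle's pose in joint-angle space (sketch).

    joints    : (J, 2) 2D joint positions of one particle
    to_angles : hypothetical routine, joints -> (root, angles, lengths)
    to_joints : hypothetical inverse routine
    sigma     : std. dev. of Gaussian noise on each angle (radians)
    rng       : e.g. np.random.default_rng()
    """
    root, angles, lengths = to_angles(joints)              # to angle representation
    noisy = angles + rng.normal(0.0, sigma, angles.shape)  # perturb where plausible
    return to_joints(root, noisy, lengths)                 # back to joint positions
```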
76 We start by initializing a set of P particles on each frame. [sent-226, score-0.3]
77 Then the video sequence is scanned forward and backward to propagate the best M particles from a frame to the next using the flow. [sent-228, score-0.739]
78 Each frame in the sequence is then optimized in turn using PSO, starting from the first frame and proceeding forward through all the frames, then backwards. [sent-230, score-0.327]
79 After optimizing pose in each frame, the best M particles are propagated to the neighbors, forward and backward, using the flow. [sent-232, score-0.668]
80 After propagation, each frame has P+2M particles, but only the best P particles are retained for the frame. [sent-234, score-0.3]
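A compact sketch of this schedule, assuming lower energies are better and that per-frame PSO refinement (optimize), scoring (score), and the flowing-puppet predictors are available as callables:

```python
def forward_backward_sweep(frames, particles, M, P,
                           optimize, score, flow_fwd, flow_bwd):
    """Optimize each frame in turn and propagate good particles (sketch).

    particles[t] : list of (puppet, energy) pairs, assumed pre-initialized
                   (e.g. from the single-frame FMP detections)
    optimize     : hypothetical per-frame PSO refinement
    score        : hypothetical puppet energy in a frame (lower is better)
    flow_fwd / flow_bwd : flowing-puppet predictors via U_{t,t+1} / U_{t,t-1}
    """
    N = len(frames)
    for t in list(range(N)) + list(range(N - 1, -1, -1)):  # forward, then backward
        particles[t] = optimize(frames[t], particles[t])   # PSO on frame t
        best = sorted(particles[t], key=lambda p: p[1])[:M]
        for x, _ in best:                                  # flow the M best to neighbors
            if t + 1 < N:
                x_f = flow_fwd(x, t)
                particles[t + 1].append((x_f, score(frames[t + 1], x_f)))
            if t > 0:
                x_b = flow_bwd(x, t)
                particles[t - 1].append((x_b, score(frames[t - 1], x_b)))
        for nb in (t - 1, t + 1):                          # retain only the best P
            if 0 <= nb < N:
                particles[nb] = sorted(particles[nb], key=lambda p: p[1])[:P]
    return particles
```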
81 Particles are initialized on each frame (first row), then the M best are propagated through the flow forward and backward (second row). [sent-239, score-0.521]
82 To further help the optimizer, we generate additional initial puppets by relocating the wrists of the FMP solution to likely hand locations. [sent-248, score-0.327]
83 We also exploited the hand detector trained on optical flow described earlier. [sent-251, score-0.478]
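A hedged sketch of these extra proposals, assuming a (H, W) hand probability map and known wrist joint indices; a real implementation would replace the raw top-k pixels with non-maximum-suppressed detections.

```python
import numpy as np

def wrist_relocation_proposals(joints, hand_prob, wrist_ids, k=2):
    """Snap wrists of an initial pose to likely hand locations (sketch).

    joints    : (J, 2) initial joint estimate as (x, y), e.g. from FMP
    hand_prob : (H, W) hand probability map (e.g. from the flow-based detector)
    wrist_ids : indices of the wrist joints (assumed layout)
    """
    top = np.argsort(hand_prob.ravel())[::-1][:k]  # k highest-probability pixels
    cand = np.stack(np.unravel_index(top, hand_prob.shape), axis=1)[:, ::-1]  # (x, y)
    proposals = []
    for wid in wrist_ids:
        for c in cand:
            prop = joints.copy()
            prop[wid] = c          # move one wrist to a candidate hand location
            proposals.append(prop)
    return proposals
```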
84 It contains frames at the original size, and frames that have been cropped and rescaled so that the person is in the middle of the frame, to meet the needs of the pose estimation method of [20]. [sent-259, score-0.604]
85 The dense flow is computed with the method of [22] in both the forward and backward time directions. [sent-262, score-0.38]
86 Second, to show the benefit of our optimization strategy, we show results obtained without exploiting the dense flow for propagation and likelihood (FP, -flow). [sent-270, score-0.383]
87 Figure 8 shows several examples of correctly predicted body pose, with the DS puppet overlaid on the image and on the optical flow. [sent-275, score-0.932]
88 Conclusions. Given recent improvements in the accuracy of optical flow estimation, we argue that it is a useful source of information for human pose estimation in video. [sent-278, score-0.651]
89 Here we use flow in a novel way to make predictions about the pose of the body in neighboring frames. [sent-279, score-0.643]
90 Given a representation of body shape, we use the optical flow forwards and backwards in time from a given frame to predict how the body should move, creating what we call a flowing puppet. [sent-280, score-1.244]
91 If the body pose is correctly estimated in the current frame, and the flow is accurate, then our method should accurately predict the pose in neighboring frames. [sent-281, score-0.829]
92 We also use our flowing puppets to propagate good candidate poses during optimization and to hypothesize putative hand locations. [sent-283, score-0.601]
93 The approach improves accuracy and robustness relative to a baseline method that does not use puppet flow. [sent-284, score-0.533]
94 If the pose in one frame is ambiguous, it may not be in neighboring frames. [sent-285, score-0.324]
95 Below each image is the estimated forward flow field color coded as in [2] with the puppet overlaid in black. [sent-297, score-0.863]
96 This represents a novel approach to temporal estimation of body pose in video. [sent-300, score-0.452]
97 Finally, flowing puppets could be used to build a temporally consistent appearance model across several frames, which could provide stronger image evidence. [sent-316, score-0.472]
98 Pictorial structures revisited: People detection and articulated pose estimation. [sent-324, score-0.318]
99 2D articulated human pose estimation and retrieval in (almost) unconstrained still images. [sent-368, score-0.361]
100 Markerless human articulated tracking using hierarchical particle swarm optimisation. [sent-416, score-0.274]
wordName wordTfidf (topN-words)
[('puppet', 0.533), ('xt', 0.293), ('flowing', 0.254), ('ds', 0.253), ('flow', 0.228), ('puppets', 0.218), ('body', 0.197), ('pose', 0.186), ('particles', 0.162), ('optical', 0.16), ('frame', 0.138), ('fmp', 0.123), ('frames', 0.104), ('articulated', 0.098), ('likelihood', 0.091), ('contour', 0.087), ('pictorial', 0.085), ('evidence', 0.083), ('sapp', 0.081), ('hands', 0.08), ('tx', 0.067), ('backward', 0.063), ('freifeld', 0.061), ('stickman', 0.061), ('shape', 0.061), ('pso', 0.06), ('forward', 0.06), ('particle', 0.059), ('people', 0.055), ('hand', 0.054), ('ps', 0.051), ('skin', 0.051), ('arms', 0.049), ('parts', 0.049), ('wrist', 0.043), ('swarm', 0.043), ('adjacent', 0.043), ('overlaid', 0.042), ('ivekovi', 0.041), ('zuffi', 0.041), ('monocular', 0.041), ('propagate', 0.041), ('static', 0.039), ('human', 0.039), ('motion', 0.039), ('ty', 0.039), ('buffy', 0.038), ('zi', 0.038), ('estimation', 0.038), ('yp', 0.036), ('forwards', 0.036), ('lucky', 0.036), ('detector', 0.036), ('yt', 0.035), ('tracking', 0.035), ('propagation', 0.035), ('person', 0.034), ('structures', 0.034), ('ph', 0.034), ('poses', 0.034), ('steered', 0.034), ('backwards', 0.034), ('trucco', 0.034), ('ramanan', 0.033), ('convert', 0.033), ('propagated', 0.032), ('rectangular', 0.032), ('neighboring', 0.032), ('guan', 0.032), ('buehler', 0.032), ('temporal', 0.031), ('joints', 0.031), ('likelihoods', 0.03), ('probability', 0.03), ('perturbing', 0.03), ('parsing', 0.03), ('exploit', 0.029), ('eh', 0.029), ('polygonal', 0.029), ('wrists', 0.029), ('dense', 0.029), ('uncontrolled', 0.029), ('mixtures', 0.028), ('arm', 0.028), ('fragkiadaki', 0.028), ('eichner', 0.028), ('limb', 0.027), ('inference', 0.027), ('fp', 0.027), ('ferrari', 0.027), ('bounds', 0.025), ('scale', 0.025), ('sequence', 0.025), ('forsyth', 0.024), ('tv', 0.023), ('pc', 0.023), ('ec', 0.023), ('captures', 0.022), ('video', 0.022), ('contours', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 143 iccv-2013-Estimating Human Pose with Flowing Puppets
Author: Silvia Zuffi, Javier Romero, Cordelia Schmid, Michael J. Black
Abstract: We address the problem of upper-body human pose estimation in uncontrolled monocular video sequences, without manual initialization. Most current methods focus on isolated video frames and often fail to correctly localize arms and hands. Inferring pose over a video sequence is advantageous because poses of people in adjacent frames exhibit properties of smooth variation due to the nature of human and camera motion. To exploit this, previous methods have used prior knowledge about distinctive actions or generic temporal priors combined with static image likelihoods to track people in motion. Here we take a different approach based on a simple observation: Information about how a person moves from frame to frame is present in the optical flow field. We develop an approach for tracking articulated motions that “links” articulated shape models of people in adjacent frames through the dense optical flow. Key to this approach is a 2D shape model of the body that we use to compute how the body moves over time. The resulting “flowing puppets” provide a way of integrating image evidence across frames to improve pose inference. We apply our method on a challenging dataset of TV video sequences and show state-of-the-art performance.
2 0.26789236 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
Author: Taegyu Lim, Seunghoon Hong, Bohyung Han, Joon Hee Han
Abstract: We propose an on-line algorithm to extract a human by foreground/background segmentation and estimate pose of the human from the videos captured by moving cameras. We claim that a virtuous cycle can be created by appropriate interactions between the two modules to solve individual problems. This joint estimation problem is divided into two subproblems, foreground/background segmentation and pose tracking, which alternate iteratively for optimization; segmentation step generates foreground mask for human pose tracking, and human pose tracking step provides foreground response map for segmentation. The final solution is obtained when the iterative procedure converges. We evaluate our algorithm quantitatively and qualitatively in real videos involving various challenges, and present its outstanding performance compared to the state-of-the-art techniques for segmentation and pose estimation.
3 0.21642333 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation
Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
Abstract: Typical approaches to articulated pose estimation combine spatial modelling of the human body with appearance modelling of body parts. This paper aims to push the state-of-the-art in articulated pose estimation in two ways. First we explore various types of appearance representations aiming to substantially improve the bodypart hypotheses. And second, we draw on and combine several recently proposed powerful ideas such as more flexible spatial models as well as image-conditioned spatial models. In a series of experiments we draw several important conclusions: (1) we show that the proposed appearance representations are complementary; (2) we demonstrate that even a basic tree-structure spatial human body model achieves state-of-the-art performance when augmented with the proper appearance representation; and (3) we show that the combination of the best performing appearance model with a flexible image-conditioned spatial model achieves the best result, significantly improving over the state of the art, on the “Leeds Sports Poses” and “Parse” benchmarks.
4 0.18175651 317 iccv-2013-Piecewise Rigid Scene Flow
Author: Christoph Vogel, Konrad Schindler, Stefan Roth
Abstract: Estimating dense 3D scene flow from stereo sequences remains a challenging task, despite much progress in both classical disparity and 2D optical flow estimation. To overcome the limitations of existing techniques, we introduce a novel model that represents the dynamic 3D scene by a collection of planar, rigidly moving, local segments. Scene flow estimation then amounts to jointly estimating the pixel-to-segment assignment, and the 3D position, normal vector, and rigid motion parameters of a plane for each segment. The proposed energy combines an occlusion-sensitive data term with appropriate shape, motion, and segmentation regularizers. Optimization proceeds in two stages: Starting from an initial superpixelization, we estimate the shape and motion parameters of all segments by assigning a proposal from a set of moving planes. Then the pixel-to-segment assignment is updated, while holding the shape and motion parameters of the moving planes fixed. We demonstrate the benefits of our model on different real-world image sets, including the challenging KITTI benchmark. We achieve leading performance levels, exceeding competing 3D scene flow methods, and even yielding better 2D motion estimates than all tested dedicated optical flow techniques.
5 0.16488336 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
Author: Ryan Tokola, Wongun Choi, Silvio Savarese
Abstract: We present an approach to multi-target tracking that has expressive potential beyond the capabilities of chain-shaped hidden Markov models, yet has significantly reduced complexity. Our framework, which we call tracking-by-selection, is similar to tracking-by-detection in that it separates the tasks of detection and tracking, but it shifts temporal reasoning from the tracking stage to the detection stage. The core feature of tracking-by-selection is that it reasons about path hypotheses that traverse the entire video instead of a chain of single-frame object hypotheses. A traditional chain-shaped tracking-by-detection model is only able to promote consistency between one frame and the next. In tracking-by-selection, path hypotheses exist across time, and encouraging long-term temporal consistency is as simple as rewarding path hypotheses with consistent image features. One additional advantage of tracking-by-selection is that it results in a dramatically simplified model that can be solved exactly. We adapt an existing tracking-by-detection model to the tracking-by-selection framework, and show improved performance on a challenging dataset (introduced in [18]).
6 0.15557131 105 iccv-2013-DeepFlow: Large Displacement Optical Flow with Deep Matching
7 0.14829822 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
8 0.14686652 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors
9 0.1461481 322 iccv-2013-Pose Estimation and Segmentation of People in 3D Movies
10 0.1452717 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion
11 0.14347368 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?
12 0.1422928 300 iccv-2013-Optical Flow via Locally Adaptive Fusion of Complementary Data Costs
13 0.13608265 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
14 0.1360594 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
15 0.13431178 263 iccv-2013-Measuring Flow Complexity in Videos
16 0.13365111 218 iccv-2013-Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data
17 0.13156667 12 iccv-2013-A General Dense Image Matching Framework Combining Direct and Feature-Based Costs
18 0.12526691 39 iccv-2013-Action Recognition with Improved Trajectories
19 0.11587331 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
20 0.1156773 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
topicId topicWeight
[(0, 0.228), (1, -0.106), (2, 0.028), (3, 0.133), (4, 0.086), (5, -0.078), (6, -0.029), (7, 0.092), (8, -0.04), (9, 0.168), (10, -0.034), (11, 0.009), (12, 0.042), (13, -0.069), (14, -0.076), (15, 0.1), (16, -0.071), (17, -0.091), (18, 0.091), (19, 0.058), (20, 0.199), (21, -0.035), (22, 0.134), (23, -0.08), (24, 0.009), (25, -0.067), (26, 0.119), (27, 0.006), (28, 0.047), (29, -0.015), (30, -0.025), (31, 0.005), (32, 0.041), (33, -0.045), (34, 0.009), (35, -0.003), (36, -0.012), (37, -0.012), (38, -0.013), (39, -0.021), (40, -0.002), (41, -0.048), (42, 0.01), (43, 0.059), (44, 0.038), (45, -0.058), (46, -0.029), (47, -0.002), (48, 0.016), (49, -0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.97537529 143 iccv-2013-Estimating Human Pose with Flowing Puppets
Author: Silvia Zuffi, Javier Romero, Cordelia Schmid, Michael J. Black
Abstract: We address the problem of upper-body human pose estimation in uncontrolled monocular video sequences, without manual initialization. Most current methods focus on isolated video frames and often fail to correctly localize arms and hands. Inferring pose over a video sequence is advantageous because poses of people in adjacent frames exhibit properties of smooth variation due to the nature of human and camera motion. To exploit this, previous methods have used prior knowledge about distinctive actions or generic temporal priors combined with static image likelihoods to track people in motion. Here we take a different approach based on a simple observation: Information about how a person moves from frame to frame is present in the optical flow field. We develop an approach for tracking articulated motions that “links” articulated shape models of people in adjacent frames through the dense optical flow. Key to this approach is a 2D shape model of the body that we use to compute how the body moves over time. The resulting “flowing puppets” provide a way of integrating image evidence across frames to improve pose inference. We apply our method on a challenging dataset of TV video sequences and show state-of-the-art performance.
2 0.85469359 430 iccv-2013-Two-Point Gait: Decoupling Gait from Body Shape
Author: Stephen Lombardi, Ko Nishino, Yasushi Makihara, Yasushi Yagi
Abstract: Human gait modeling (e.g., for person identification) largely relies on image-based representations that muddle gait with body shape. Silhouettes, for instance, inherently entangle body shape and gait. For gait analysis and recognition, decoupling these two factors is desirable. Most important, once decoupled, they can be combined for the task at hand, but not if left entangled in the first place. In this paper, we introduce Two-Point Gait, a gait representation that encodes the limb motions regardless of the body shape. Two-Point Gait is directly computed on the image sequence based on the two point statistics of optical flow fields. We demonstrate its use for exploring the space of human gait and gait recognition under large clothing variation. The results show that we can achieve state-of-the-art person recognition accuracy on a challenging dataset.
3 0.75777328 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
Author: Taegyu Lim, Seunghoon Hong, Bohyung Han, Joon Hee Han
Abstract: We propose an on-line algorithm to extract a human by foreground/background segmentation and estimate pose of the human from the videos captured by moving cameras. We claim that a virtuous cycle can be created by appropriate interactions between the two modules to solve individual problems. This joint estimation problem is divided into two subproblems, foreground/background segmentation and pose tracking, which alternate iteratively for optimization; segmentation step generates foreground mask for human pose tracking, and human pose tracking step provides foreground response map for segmentation. The final solution is obtained when the iterative procedure converges. We evaluate our algorithm quantitatively and qualitatively in real videos involving various challenges, and present its outstanding performance compared to the state-of-the-art techniques for segmentation and pose estimation.
4 0.71918815 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?
Author: Elisabeta Marinoiu, Dragos Papava, Cristian Sminchisescu
Abstract: Human motion analysis in images and video is a central computer vision problem. Yet, there are no studies that reveal how humans perceive other people in images and how accurate they are. In this paper we aim to unveil some of the processing–as well as the levels of accuracy–involved in the 3D perception of people from images by assessing the human performance. Our contributions are: (1) the construction of an experimental apparatus that relates perception and measurement, in particular the visual and kinematic performance with respect to 3D ground truth when the human subject is presented an image of a person in a given pose; (2) the creation of a dataset containing images, articulated 2D and 3D pose ground truth, as well as synchronized eye movement recordings of human subjects, shown a variety of human body configurations, both easy and difficult, as well as their ‘re-enacted’ 3D poses; (3) quantitative analysis revealing the human performance in 3D pose reenactment tasks, the degree of stability in the visual fixation patterns of human subjects, and the way it correlates with different poses. We also discuss the implications of our findings for the construction of visual human sensing systems.
5 0.69354755 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion
Author: Ibrahim Radwan, Abhinav Dhall, Roland Goecke
Abstract: In this paper, an automatic approach for 3D pose reconstruction from a single image is proposed. The presence of human body articulation, hallucinated parts and cluttered background leads to ambiguity during the pose inference, which makes the problem non-trivial. Researchers have explored various methods based on motion and shading in order to reduce the ambiguity and reconstruct the 3D pose. The key idea of our algorithm is to impose both kinematic and orientation constraints. The former is imposed by projecting a 3D model onto the input image and pruning the parts, which are incompatible with the anthropomorphism. The latter is applied by creating synthetic views via regressing the input view to multiple oriented views. After applying the constraints, the 3D model is projected onto the initial and synthetic views, which further reduces the ambiguity. Finally, we borrow the direction of the unambiguous parts from the synthetic views to the initial one, which results in the 3D pose. Quantitative experiments are performed on the HumanEva-I dataset and qualitatively on unconstrained images from the Image Parse dataset. The results show the robustness of the proposed approach to accurately reconstruct the 3D pose from a single image.
6 0.69275028 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
7 0.67596287 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation
8 0.67022502 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors
9 0.659989 218 iccv-2013-Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data
10 0.64276004 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
11 0.63515407 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose
12 0.6214962 300 iccv-2013-Optical Flow via Locally Adaptive Fusion of Complementary Data Costs
13 0.62092811 130 iccv-2013-Dynamic Structured Model Selection
14 0.60302192 317 iccv-2013-Piecewise Rigid Scene Flow
16 0.6005891 322 iccv-2013-Pose Estimation and Segmentation of People in 3D Movies
17 0.59146982 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
18 0.58764684 118 iccv-2013-Discovering Object Functionality
19 0.58350319 46 iccv-2013-Allocentric Pose Estimation
20 0.56025577 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests
topicId topicWeight
[(2, 0.09), (7, 0.012), (12, 0.011), (21, 0.167), (26, 0.088), (31, 0.039), (35, 0.049), (42, 0.092), (64, 0.085), (73, 0.04), (89, 0.214)]
simIndex simValue paperId paperTitle
1 0.91188562 5 iccv-2013-A Color Constancy Model with Double-Opponency Mechanisms
Author: Shaobing Gao, Kaifu Yang, Chaoyi Li, Yongjie Li
Abstract: The double-opponent color-sensitive cells in the primary visual cortex (V1) of the human visual system (HVS) have long been recognized as the physiological basis of color constancy. We introduce a new color constancy model by imitating the functional properties of the HVS from the retina to the double-opponent cells in V1. The idea behind the model originates from the observation that the color distribution of the responses of double-opponent cells to the input color-biased images coincides well with the light source direction. Then the true illuminant color of a scene is easily estimated by searching for the maxima of the separate RGB channels of the responses of double-opponent cells in the RGB space. Our systematical experimental evaluations on two commonly used image datasets show that the proposed model can produce competitive results in comparison to the complex state-of-the-art approaches, but with a simple implementation and without the need for training.
2 0.90567553 313 iccv-2013-Person Re-identification by Salience Matching
Author: Rui Zhao, Wanli Ouyang, Xiaogang Wang
Abstract: Human salience is distinctive and reliable information in matching pedestrians across disjoint camera views. In this paper, we exploit the pairwise salience distribution relationship between pedestrian images, and solve the person re-identification problem by proposing a salience matching strategy. To handle the misalignment problem in pedestrian images, patch matching is adopted and patch salience is estimated. Matching patches with inconsistent salience brings penalty. Images of the same person are recognized by minimizing the salience matching cost. Furthermore, our salience matching is tightly integrated with patch matching in a unified structural RankSVM learning framework. The effectiveness of our approach is validated on the VIPeR dataset and the CUHK Campus dataset. It outperforms the state-of-the-art methods on both datasets.
same-paper 3 0.87996089 143 iccv-2013-Estimating Human Pose with Flowing Puppets
Author: Silvia Zuffi, Javier Romero, Cordelia Schmid, Michael J. Black
Abstract: We address the problem of upper-body human pose estimation in uncontrolled monocular video sequences, without manual initialization. Most current methods focus on isolated video frames and often fail to correctly localize arms and hands. Inferring pose over a video sequence is advantageous because poses of people in adjacent frames exhibit properties of smooth variation due to the nature of human and camera motion. To exploit this, previous methods have used prior knowledge about distinctive actions or generic temporal priors combined with static image likelihoods to track people in motion. Here we take a different approach based on a simple observation: Information about how a person moves from frame to frame is present in the optical flow field. We develop an approach for tracking articulated motions that “links” articulated shape models of people in adjacent frames through the dense optical flow. Key to this approach is a 2D shape model of the body that we use to compute how the body moves over time. The resulting “flowing puppets” provide a way of integrating image evidence across frames to improve pose inference. We apply our method on a challenging dataset of TV video sequences and show state-of-the-art performance.
4 0.87867916 91 iccv-2013-Contextual Hypergraph Modeling for Salient Object Detection
Author: Xi Li, Yao Li, Chunhua Shen, Anthony Dick, Anton Van_Den_Hengel
Abstract: Salient object detection aims to locate objects that capture human attention within images. Previous approaches often pose this as a problem of image contrast analysis. In this work, we model an image as a hypergraph that utilizes a set of hyperedges to capture the contextual properties of image pixels or regions. As a result, the problem of salient object detection becomes one of finding salient vertices and hyperedges in the hypergraph. The main advantage of hypergraph modeling is that it takes into account each pixel’s (or region’s) affinity with its neighborhood as well as its separation from image background. Furthermore, we propose an alternative approach based on center-versus-surround contextual contrast analysis, which performs salient object detection by optimizing a cost-sensitive support vector machine (SVM) objective function. Experimental results on four challenging datasets demonstrate the effectiveness of the proposed approaches against the state-of-the-art approaches to salient object detection.
5 0.85941303 338 iccv-2013-Randomized Ensemble Tracking
Author: Qinxun Bai, Zheng Wu, Stan Sclaroff, Margrit Betke, Camille Monnier
Abstract: We propose a randomized ensemble algorithm to model the time-varying appearance of an object for visual tracking. In contrast with previous online methods for updating classifier ensembles in tracking-by-detection, the weight vector that combines weak classifiers is treated as a random variable and the posterior distribution for the weight vector is estimated in a Bayesian manner. In essence, the weight vector is treated as a distribution that reflects the confidence among the weak classifiers used to construct and adapt the classifier ensemble. The resulting formulation models the time-varying discriminative ability among weak classifiers so that the ensembled strong classifier can adapt to the varying appearance, backgrounds, and occlusions. The formulation is tested in a tracking-by-detection implementation. Experiments on 28 challenging benchmark videos demonstrate that the proposed method can achieve results comparable to and often better than those of state-of-the-art approaches.
6 0.85351127 50 iccv-2013-Analysis of Scores, Datasets, and Models in Visual Saliency Prediction
7 0.84416521 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
8 0.83999324 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
9 0.83997577 379 iccv-2013-Semantic Segmentation without Annotating Segments
10 0.83975875 104 iccv-2013-Decomposing Bag of Words Histograms
11 0.83783948 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
12 0.83731258 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
13 0.83698797 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation
14 0.83676189 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
15 0.83670264 396 iccv-2013-Space-Time Robust Representation for Action Recognition
16 0.83662677 127 iccv-2013-Dynamic Pooling for Complex Event Recognition
17 0.83650398 265 iccv-2013-Mining Motion Atoms and Phrases for Complex Action Recognition
18 0.83648682 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
19 0.83647299 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
20 0.83619946 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests