iccv iccv2013 iccv2013-341 knowledge-graph by maker-knowledge-mining

341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors


Source: pdf

Author: Thomas Helten, Meinard Müller, Hans-Peter Seidel, Christian Theobalt

Abstract: In recent years, the availability of inexpensive depth cameras, such as the Microsoft Kinect, has boosted the research in monocular full body skeletal pose tracking. Unfortunately, existing trackers often fail to capture poses where a single camera provides insufficient data, such as non-frontal poses, and all other poses with body part occlusions. In this paper, we present a novel sensor fusion approach for real-time full body tracking that succeeds in such difficult situations. It takes inspiration from previous tracking solutions, and combines a generative tracker and a discriminative tracker retrieving closest poses in a database. In contrast to previous work, both trackers employ data from a low number of inexpensive body-worn inertial sensors. These sensors provide reliable and complementary information when the monocular depth information alone is not sufficient. We also contribute by new algorithmic solutions to best fuse depth and inertial data in both trackers. One is a new visibility model to determine global body pose, occlusions and usable depth correspondences and to decide what data modality to use for discriminative tracking. We also contribute with a new inertial-basedpose retrieval, and an adapted late fusion step to calculate the final body pose.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract In recent years, the availability of inexpensive depth cameras, such as the Microsoft Kinect, has boosted the research in monocular full body skeletal pose tracking. [sent-3, score-0.805]

2 Unfortunately, existing trackers often fail to capture poses where a single camera provides insufficient data, such as non-frontal poses, and all other poses with body part occlusions. [sent-4, score-0.758]

3 In this paper, we present a novel sensor fusion approach for real-time full body tracking that succeeds in such difficult situations. [sent-5, score-0.553]

4 It takes inspiration from previous tracking solutions, and combines a generative tracker and a discriminative tracker retrieving closest poses in a database. [sent-6, score-0.971]

5 In contrast to previous work, both trackers employ data from a low number of inexpensive body-worn inertial sensors. [sent-7, score-0.62]

6 We also contribute by new algorithmic solutions to best fuse depth and inertial data in both trackers. [sent-9, score-0.656]

7 One is a new visibility model to determine global body pose, occlusions and usable depth correspondences and to decide what data modality to use for discriminative tracking. [sent-10, score-0.747]

8 2.5D depth images have triggered extensive research in monocular human pose tracking. [sent-14, score-0.43]

9 However, noise in the depth data and the ambiguous representation of human poses in depth images are still a challenge and often lead to tracking errors, even if all body parts are actually exposed to the camera. [sent-19, score-0.898]

10 In addition, if large parts of the body are occluded from view, tracking of the full pose is not possible. [sent-20, score-0.589]

11 In this paper, we show that fusing a depth tracker with an additional sensor modality, which provides information complementary to the 2.5D depth data, lets tracking succeed in such difficult situations. [sent-23, score-0.673]

12 In particular, we use the orientation data obtained from a sparse set of inexpensive inertial measurement devices fixed to the arms, legs, the trunk and the head of the tracked person. [sent-25, score-0.731]

13 We include this additional information as stabilizing evidence in a hybrid tracker that combines generative and discriminative pose computation. [sent-26, score-0.723]

14 Our method is the first to adaptively fuse inertial and depth information in a combined generative and discriminative monocular pose estimation framework. [sent-29, score-1.035]

15 To enable this, we contribute with a novel visibility model for determining which parts of the body are visible to the depth camera. [sent-30, score-0.662]

16 This model tells us which data modality is reliable and can be used to infer the pose, and enables us to more robustly infer global body orientation even in challenging poses, see Sect. [sent-31, score-0.422]

17 Our second contribution is a generative tracker that fuses optical and inertial cues depending on the measured visibility, optimizing the pose parameters of the body model to best explain the observed depth image. [sent-33, score-0.931]

18 We evaluate our proposed tracker on an extensive dataset including calibrated depth images, inertial sensor data, as well as ground-truth data obtained with a traditional marker-based mocap system, see Sect. [sent-42, score-1.153]

19 Many monocular tracking algorithms use this depth data for human pose estimation. [sent-50, score-0.488]

20 A discriminative strategy based on body part detectors that also estimated body part orientations on depth images was presented in [9]. [sent-52, score-0.834]

21 The approach [13] uses regression forests based on depth features to estimate the joint positions of the tracked person without needing a kinematic skeleton model of the person. [sent-54, score-0.457]

22 Finally, also using depth features and regression forests, [16] generate correspondences between body parts and a pose- and size-parametrized human model that is optimized in real-time using a one-shot optimization approach. [sent-56, score-0.748]

23 While showing good results on a single-frame basis, these approaches cannot deduce the true poses of body parts that are invisible to the camera. [sent-57, score-0.48]

24 By using kinematic body models with simple shape primitives, the pose of an actor can be found using a generative strategy. [sent-58, score-0.689]

25 The body model is fitted to depth data or to a combination of depth and image features [5, 8]. [sent-59, score-0.694]

26 With all these depth-based methods, real-time pose estimation is still a challenge, tracking may drift, and with the exception of [19] the employed shape models are rather coarse, which impairs pose estimation accuracy. [sent-66, score-0.416]

27 Soon after that, [3] showed a hybrid approach specialized to reconstructing human 3D pose from a depth image, using the body part detectors proposed by [9] as a regularizing component. [sent-69, score-0.752]

28 However, none of these hybrid approaches is able to give a meaningful pose hypothesis for non-visible body parts in case of occlusions. [sent-76, score-0.588]

29 Methods that reconstruct motions based on inertial sensors only have also been proposed. [sent-77, score-0.639]

30 One approach combining 3D inertial information and multi-view markerless motion capture was presented in [10]. [sent-83, score-0.481]

31 Here, the orientation data of five inertial sensors was used as an additional energy term to stabilize the local pose optimization. [sent-84, score-0.851]

32 Another example is [23], where information from densely placed inertial sensors is fused with global position estimates from a laser-range-scanner-equipped robot accompanying the tracked person. [sent-85, score-0.707]

33 [1, 17] can track human skeletons in real-time from a single depth camera, as long as the body is mostly front-facing. [sent-89, score-0.485]

34 Our new hybrid depth-based tracker succeeds in such cases by incorporating additional inertial sensor data for tracking stabilization. [sent-93, score-1.057]

35 While our concepts are in general applicable to a wide range of generative, discriminative, and hybrid approaches, we modify the hybrid depth-based tracker by Baak et al. [sent-94, score-0.632]

36 This tracker uses discriminative features detected in the depth data, so-called geodesic extrema EI, to query a database containing pre-recorded full body poses. [sent-96, score-1.205]

37 These poses are then used to initialize a generative tracker that optimizes skeletal pose parameters X of a mesh-based human body model MX ⊆ R3 to best explain the 3D point cloud MI ⊆ R3 of the observed depth image I. [sent-97, score-1.308]

38 In a late fusion step, the tracker decides between two pose hypotheses: one obtained using the database pose as initialization and one obtained using the previously tracked pose as initialization. [sent-98, score-1.025]
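
The decision can be made concrete with a short sketch. This is not the authors' code: `skin` and `correspond` are hypothetical stand-ins for the skinned body model and the correspondence search, and the sketch scores only the depth fit, whereas the paper's adapted late fusion step also takes the inertial data into account.

```python
import numpy as np

def late_fusion(hypotheses, skin, correspond, target_pts):
    """Pick the pose hypothesis that best explains the observed depth points.

    hypotheses: candidate pose parameter vectors, e.g. the database-initialized
                and the previous-frame-initialized optimization results
    skin:       maps pose parameters to the mesh vertices M_X, shape (N, 3)
    correspond: maps (mesh vertices, depth points) to the (K, 3) model points
                matched to the K observed depth points
    target_pts: (K, 3) points sampled from the observed depth image
    """
    def alignment_error(x):
        model_pts = correspond(skin(x), target_pts)
        return float(np.mean(np.linalg.norm(model_pts - target_pts, axis=1)))

    return min(hypotheses, key=alignment_error)
```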

39 Baak et al.'s approach makes two assumptions: the person to be tracked is facing the depth camera and all body parts are visible to the depth camera, which means it fails in the difficult poses mentioned earlier (see Fig. [sent-100, score-1.116]

40 In our new hybrid approach, we overcome these limitations by modifying every step in the original algorithm to benefit from depth and inertial data together. [sent-102, score-0.744]

41 In particular, we introduce a visibility model to decide which data modality is best used in each pose estimation step, and develop a discriminative tracker combining both data. [sent-103, score-0.627]

42 We also empower generative tracking to use both data sources for reliable pose inference, and develop a new late fusion step using both modalities. [sent-104, score-0.443]

43 Body Model Similar to [1], we use a body model comprising a surface mesh MX of 6,449 vertices, whose deformation is controlled by an embedded kinematic skeleton of 62 joints and 42 degrees of freedom via surface skinning. [sent-105, score-0.495]
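
To illustrate how such a skinned model deforms, here is a minimal linear blend skinning sketch; the paper only states that surface skinning is used, so the blending scheme and all names below are assumptions.

```python
import numpy as np

def linear_blend_skinning(rest_verts, weights, bone_transforms):
    """Deform the rest-pose mesh by blending per-joint transforms.

    rest_verts:      (N, 3) rest-pose surface mesh vertices
    weights:         (N, J) skinning weights, each row summing to 1
    bone_transforms: (J, 4, 4) transforms mapping rest pose to current pose
                     (already pre-multiplied by the inverse rest transforms)
    """
    n = len(rest_verts)
    homo = np.hstack([rest_verts, np.ones((n, 1))])             # (N, 4)
    per_bone = np.einsum('jab,nb->nja', bone_transforms, homo)  # (N, J, 4)
    blended = np.einsum('nj,nja->na', weights, per_bone)        # (N, 4)
    return blended[:, :3]
```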

44 As additional sensors, we use inertial measurement units (IMUs), which are able to determine their orientation with respect to a global coordinate system, irrespective of visibility from a camera. [sent-115, score-0.692]

45 The sensor sroot gives us information about the global body orientation, while the sensors on arms and feet give cues about the configuration of the extremities. [sent-119, score-0.699]

46 Finally, the head sensor is important to resolve some of the ambiguities in sparse inertial features. [sent-120, score-0.647]

47 For ease of explanation, we introduce the concept of a virtual sensor which provides a simulated orientation reading of an IMU for a given pose X of our kinematic skeleton. [sent-124, score-0.542]
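
The virtual sensor amounts to forward kinematics: the simulated IMU orientation is the composition of the joint rotations along the chain from the root to the bone the sensor is attached to. A minimal sketch, with all names (`qmul`, `chain_to_bone`, `q_mount`) being illustrative rather than the authors' API:

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def virtual_sensor_reading(pose_joint_quats, chain_to_bone, q_mount):
    """Simulated IMU orientation for a given pose X of the kinematic skeleton.

    pose_joint_quats: dict joint name -> local rotation quaternion in pose X
    chain_to_bone:    joint names from the root to the bone the sensor sits on
    q_mount:          fixed mounting rotation of the sensor on that bone
    """
    q = np.array([1.0, 0.0, 0.0, 0.0])  # identity quaternion
    for joint in chain_to_bone:         # accumulate rotations root -> bone
        q = qmul(q, pose_joint_quats[joint])
    return qmul(q, q_mount)             # global orientation of the virtual IMU
```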

48 Furthermore, the transformation between the virtual sensor’s coordinate system and the depth camera’s global coordinate system can be calculated. [sent-125, score-0.422]

49 qS,root denotes the measured orientation of the real sensor attached to the trunk, while qX,root represents the readings of the virtual sensor for a given pose X. [sent-128, score-0.682]

50 Visibility Model Our visibility model enables us to reliably detect global body pose and the visibility of body parts in the depth camera. [sent-133, score-1.216]

51 This information is then used to establish reliable correspondences between the depth image and body model during generative tracking, even under occlusion. [sent-134, score-0.638]

52 Furthermore, it enables us to decide whether inertial or optical data are more reliable for pose retrieval. [sent-135, score-0.698]

53 In [1], the authors use plane fitting to a heuristically chosen subset of depth data to compute body orientation and translation of the depth centroid. [sent-137, score-0.762]

54 Their approach fails if the person is not roughly facing the camera or body parts are occluding the torso. [sent-138, score-0.434]

55 Inertial sensors are able to measure their orientation in space independently of occlusions and of missing data in the depth channel. [sent-139, score-0.434]

56 However, inertial sensors measure their orientation with respect to some global sensor coordinate system that in general is not identical to the camera’s global coordinate system, see also Fig. [sent-142, score-0.99]
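
One common way to estimate such a fixed frame offset is from a single calibration pose in which both the measured and the virtual root reading are known; the paper's exact calibration procedure may differ. A sketch, reusing `qmul` from the virtual-sensor example:

```python
import numpy as np

def qconj(q):
    """Conjugate of a unit quaternion, i.e. its inverse rotation."""
    w, x, y, z = q
    return np.array([w, -x, -y, -z])

def sensor_to_camera_offset(q_virtual_root, q_measured_root):
    """Fixed rotation mapping the IMUs' global frame into the camera frame.

    Estimated from one calibration pose in which both the virtual root
    reading (camera frame) and the real root reading (sensor frame) are
    known; qmul is the Hamilton product from the sketch above.
    """
    return qmul(q_virtual_root, qconj(q_measured_root))

# Every subsequent measurement can then be expressed in the camera frame:
# q_in_camera = qmul(q_offset, q_measured)
```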

57 To infer body part visibility, we compute all vertices CX ⊆ MX of the body mesh that the depth camera sees in pose X. [sent-156, score-1.055]
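
A minimal sketch of such a visibility test via a z-buffer comparison, assuming the mesh has already been rendered into a depth image with known pinhole intrinsics; the 2 cm tolerance and all names are illustrative, not taken from the paper:

```python
import numpy as np

def visible_vertices(verts, fx, fy, cx, cy, depth_render, eps=0.02):
    """Approximate the visible vertex set C_X of the mesh M_X in pose X.

    verts:          (N, 3) mesh vertices in camera coordinates (meters)
    fx, fy, cx, cy: pinhole intrinsics of the depth camera
    depth_render:   depth image rendered from the mesh in pose X (meters)
    A vertex is visible if it projects into the image and its depth agrees
    with the rendered z-buffer up to eps.
    """
    h, w = depth_render.shape
    z = verts[:, 2]
    valid = z > 1e-6                                  # in front of the camera
    u = np.full(len(verts), -1, dtype=int)
    v = np.full(len(verts), -1, dtype=int)
    u[valid] = np.round(fx * verts[valid, 0] / z[valid] + cx).astype(int)
    v[valid] = np.round(fy * verts[valid, 1] / z[valid] + cy).astype(int)
    inside = valid & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    vis = np.zeros(len(verts), dtype=bool)
    i = np.where(inside)[0]
    vis[i] = np.abs(depth_render[v[i], u[i]] - z[i]) < eps
    return vis
```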

58 Note that the accuracy of Bvis depends on MX resembling the actual pose assumed by the person in the depth image as closely as possible, which is not known before pose estimation. [sent-172, score-0.605]

59 For this reason, we choose the pose X = XDB obtained by the discriminative tracker, which yields better results than using the pose X(t − 1) from the previous step (see Sect. [sent-173, score-0.7]

60 In the rendering process, a virtual depth image IX is also created, from which we calculate the first M = 50 geodesic extrema in the same way as for the real depth image I, see [1]. [sent-177, score-0.811]
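
Geodesic extrema can be computed by farthest-point sampling on a graph over the foreground depth pixels, in the spirit of the scheme of [1]; the sketch below assumes the neighborhood graph is given and uses SciPy's Dijkstra implementation:

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import dijkstra

def geodesic_extrema(points, edges, weights, src, m=5):
    """First m geodesic extrema of a foreground depth point cloud.

    points:  (N, 3) 3D points of the foreground depth pixels
    edges:   (E, 2) index pairs connecting neighboring pixels whose 3D
             distance lies below a threshold, so paths follow the body
    weights: (E,) Euclidean lengths of those edges
    src:     start index, e.g. the pixel closest to the body centroid
    Repeatedly picks the point with the largest geodesic distance to all
    previously selected sources (farthest-point sampling).
    """
    n = len(points)
    graph = coo_matrix((weights, (edges[:, 0], edges[:, 1])), shape=(n, n))
    sources, extrema = [src], []
    for _ in range(m):
        dist = dijkstra(graph, directed=False, indices=sources)
        dist = dist.min(axis=0)              # distance to the nearest source
        dist[~np.isfinite(dist)] = -1.0      # skip disconnected points
        e = int(np.argmax(dist))
        extrema.append(e)
        sources.append(e)
    return points[extrema]
```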

61 Generative Pose Estimation Similar to [1], generative tracking optimizes skeletal pose parameters by minimizing the distance between corresponding points on the model and in the depth data. [sent-180, score-0.604]

62 Obviously, this leads to wrong correspondences if the person strikes a pose in which large parts of the body are occluded. [sent-184, score-0.577]

63 In contrast to prior work, it also considers which parts of the body are visible and can actually contribute to a good alignment with the depth image. [sent-186, score-0.561]
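
Schematically, the visibility-aware pose optimization is a least-squares fit over the pose parameters restricted to visible correspondences. The real system uses its own real-time optimizer, so the following is only a generic stand-in with hypothetical names:

```python
import numpy as np
from scipy.optimize import least_squares

def fit_pose(x0, skin, visible_idx, target_pts):
    """Refine pose parameters X so the visible model vertices meet their
    corresponding depth points.

    x0:          initial pose parameters (e.g. from the database pose X_DB)
    skin:        function mapping pose parameters to all mesh vertices (N, 3)
    visible_idx: indices of visible model vertices with a correspondence,
                 e.g. taken from the visibility model B_vis
    target_pts:  (K, 3) corresponding 3D points from the depth image
    """
    def residuals(x):
        return (skin(x)[visible_idx] - target_pts).ravel()

    return least_squares(residuals, x0, method="lm").x
```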

64 (b) Body part directions used as inertial features for indexing the database. [sent-191, score-0.447]

65 (c) Two poses that cannot be distinguished using inertial features. [sent-192, score-0.606]

66 Discriminative Pose Estimation In hybrid tracking, discriminative tracking complements generative tracking by continuous re-initialization of the pose optimization when generative tracking converges to an erroneous pose optimum (see also Sect. [sent-211, score-0.892]

67 We present a new discriminative pose estimation approach that retrieves poses from a database of 50,000 poses obtained from motion sequences recorded using a marker-based mocap system. [sent-213, score-0.755]
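
Retrieval from a database of this size is typically served by a spatial index over precomputed feature vectors; the paper does not specify its indexing structure, so the kd-tree below is just one plausible realization:

```python
import numpy as np
from scipy.spatial import cKDTree

class PoseDatabase:
    """Nearest-neighbor lookup over precomputed pose features."""

    def __init__(self, db_feats, db_poses):
        # db_feats: (50000, D) feature vectors (optical or inertial),
        # db_poses: (50000, P) corresponding full-body pose parameters.
        self.tree = cKDTree(db_feats)
        self.poses = db_poses

    def query(self, feat, k=1):
        """Return the k database poses whose features are closest to feat."""
        _, idx = self.tree.query(feat, k=k)
        return self.poses[np.atleast_1d(idx)]
```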

68 It adaptively relies on optical features or new inertial features for pose look-up, depending on the visibility and thus the reliability of each sensor type. [sent-214, score-1.016]

69 [1] use geodesic extrema computed on the depth map as an index. [sent-218, score-0.543]

70 In their original work, they expect that the first five geodesic extrema E5I from the depth image I are roughly co-located with the positions of the body extrema (head, hands and feet). [sent-219, score-1.092]

71 The features are made invariant to global body orientation, which reduces the database size. [sent-224, score-0.417]

72 Our method thus fares better even in poses where all geodesic extrema are found, but the pose is lateral to the camera. [sent-229, score-0.672]

73 In poses where not all body extrema are visible, or where they are too close to the torso, the geodesic extrema become unreliable for database lookup. [sent-231, score-1.048]

74 Similar to the optical features based on geodesic extrema, these normalized orientations q̂b(t) := q̄root(t) ◦ qb(t), b ∈ B = {larm, rarm, lleg, rleg, head} (the bar denoting quaternion conjugation), are invariant to the tracked person's global orientation but capture the relative orientation of various parts of the person's body. [sent-234, score-0.484]

75 The normalized directions d̂b(t) := q̂b(t)[db] are then stacked to serve as the inertial feature-based query to the database. [sent-240, score-0.447]
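
A sketch of assembling such an inertial query vector, reusing the quaternion helpers from the earlier sketches; the conjugation of the root orientation is my reading of how the normalization achieves invariance, and all variable names are illustrative:

```python
import numpy as np

def qrot(q, v):
    """Rotate a 3-vector v by the unit quaternion q."""
    qv = np.array([0.0, v[0], v[1], v[2]])
    return qmul(qmul(q, qv), qconj(q))[1:]

def inertial_feature(q_root, q_body, d_body):
    """Stack orientation-normalized body-part directions into one query.

    q_root: measured trunk (root) orientation
    q_body: dict b -> measured orientation of sensor b,
            b in {larm, rarm, lleg, rleg, head}
    d_body: dict b -> reference direction d_b of body part b
    Normalizing by the root factors out the person's global heading, so
    the same body configuration always yields the same feature.
    """
    feat = []
    for b in sorted(q_body):                    # fixed stacking order
        q_hat = qmul(qconj(q_root), q_body[b])  # orientation relative to trunk
        feat.append(qrot(q_hat, d_body[b]))
    return np.concatenate(feat)
```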

76 At first sight, it may seem that inertial features alone are sufficient to look up poses from the database, because they are unaffected by visibility issues. [sent-243, score-0.707]

77 However, with our sparse set of six IMUs, the inertial data alone are often not discriminative enough to exactly characterize body poses. [sent-244, score-0.767]

78 Some very different poses may induce the same inertial readings, and are thus ambiguous, see also Fig. [sent-245, score-0.606]

79 Optical geodesic extrema features are very accurate and discriminative of a pose, given that they are reliably found, which is not the case for all extrema in difficult non-frontal, starkly occluded poses, see Fig. [sent-248, score-0.68]

80 Therefore, we introduce two measures to assess the reliability of optical features for retrieval, and use the inertial features only as a fall-back modality for retrieval in case the optical features cannot be trusted. [sent-250, score-0.742]
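
The fall-back logic reduces to a simple gate; the two reliability tests and the thresholds below are illustrative placeholders for the measures described here and in the following sentences:

```python
def choose_retrieval_features(opt_feat, inert_feat, n_extrema_found,
                              orient_gap_deg, min_extrema=5, max_gap_deg=30.0):
    """Fall back to inertial features when the optical ones look unreliable.

    n_extrema_found: number of body extrema confidently found in the depth map
    orient_gap_deg:  angle between the optically and inertially estimated
                     global body orientations; a large gap flags bad optical data
    The thresholds are illustrative, not taken from the paper.
    """
    optical_ok = n_extrema_found >= min_extrema and orient_gap_deg < max_gap_deg
    return opt_feat if optical_ok else inert_feat
```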

81 A second reliability measure is the difference between the purely optical computation of the global body pose, similar to Baak et al. [sent-270, score-0.607]

82 This way, even if body parts were occluded or unreliably captured by the camera, we obtain a final result that is based on actual sensor measurements, and not only hypothesized from some form of prior. [sent-294, score-0.518]

83 However, the data neither contain a pose-parameterized model of the recorded person nor inertial sensor data. [sent-320, score-0.885]

84 Note that many manual preprocessing steps are involved in making our tracker run on this dataset, and each step introduces errors that are not part of the evaluation of the other tested trackers (we copied their error bars from the respective papers). [sent-325, score-0.444]

85 We then tracked the dataset with our tracker, using the provided depth frames as well as the virtual sensor readings, and computed the error metric as described in [3], see Fig. [sent-326, score-0.85]

86 Here, we averaged the results of sequences 0–23, which contain relatively easy-to-track motions and on which our tracker performs comparably to previous approaches, with little difference across sequences (see the additional material for the full table). [sent-333, score-0.429]

87 However, our tracker shows its true advantage on the sequences with more challenging motion, 24–27, of which only 24 shows notable non-frontal poses and periods where parts of the body are completely invisible. [sent-335, score-0.667]

88 Evaluation Dataset For more reliable testing of our tracker’s performance, we recorded a new dataset containing a substantial fraction of challenging non-frontal poses and stark occlusions of body parts. [sent-342, score-0.517]

89 For all sequences we computed ground truth pose parameters and joint positions using the recorded marker positions and the same kinematic skeleton that we use in our tracker. [sent-352, score-0.536]
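
Given ground-truth joint positions, a common error metric is the mean Euclidean joint distance; a minimal sketch (the exact metric follows [3] and may differ):

```python
import numpy as np

def mean_joint_error(pred_joints, gt_joints):
    """Average Euclidean distance between predicted and ground-truth 3D
    joint positions over all frames and joints.

    pred_joints, gt_joints: arrays of shape (frames, joints, 3), in meters
    """
    return float(np.mean(np.linalg.norm(pred_joints - gt_joints, axis=-1)))
```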

90 We also quantitatively evaluate our tracker with only optical retrieval (oDB) and only inertial retrieval (iDB). [sent-359, score-0.849]

91 Sequence D3 contains squats, for which inertial feature lookup is ambiguous. [sent-384, score-0.511]

92 Conclusions We presented a hybrid method to track human full body poses from a single depth camera and additional inertial sensors. [sent-388, score-1.225]

93 Our algorithm runs in real-time and, in contrast to previous methods, captures the true body configuration even in difficult non-frontal poses and poses with partial and substantial visual occlusions. [sent-389, score-0.594]

94 At the core of the algorithm are new solutions for depth and inertial data fusion in a combined generative and discriminative tracker. [sent-390, score-0.867]

95 (blue), and our tracker with only optical DB lookup (oDB, light blue), only inertial DB lookup (iDB, orange), and the proposed combined DB lookup (hDB, yellow). [sent-393, score-1.009]

96 A data-driven approach for real-time full body pose reconstruction from a depth camera. [sent-408, score-0.664]

97 Fusion of 2D and 3D sensor data for articulated body tracking. [sent-436, score-0.442]

98 Realtime human motion control with a small number of inertial sensors. [sent-451, score-0.481]

99 Realtime identification and localization of body parts from depth images. [sent-463, score-0.53]

100 Real-time human pose recognition in parts from a single depth image. [sent-493, score-0.433]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('inertial', 0.447), ('baak', 0.301), ('tracker', 0.298), ('body', 0.276), ('extrema', 0.235), ('depth', 0.209), ('pose', 0.179), ('sensor', 0.166), ('poses', 0.159), ('sensors', 0.157), ('imus', 0.125), ('trackers', 0.118), ('generative', 0.114), ('mx', 0.101), ('visibility', 0.101), ('geodesic', 0.099), ('ganapathi', 0.095), ('hybrid', 0.088), ('tracked', 0.074), ('optical', 0.072), ('joints', 0.072), ('bvis', 0.071), ('hdb', 0.071), ('idb', 0.071), ('sroot', 0.071), ('kinematic', 0.07), ('sdk', 0.069), ('orientation', 0.068), ('kinect', 0.064), ('lookup', 0.064), ('seidel', 0.061), ('virtual', 0.059), ('tracking', 0.058), ('inexpensive', 0.055), ('recorded', 0.055), ('helten', 0.054), ('qroot', 0.054), ('fusion', 0.053), ('trunk', 0.053), ('reliability', 0.051), ('actor', 0.05), ('modality', 0.049), ('sequences', 0.048), ('plagemann', 0.047), ('odb', 0.047), ('xdb', 0.047), ('coordinate', 0.047), ('camera', 0.046), ('skeleton', 0.045), ('parts', 0.045), ('discriminative', 0.044), ('database', 0.044), ('skeletal', 0.044), ('readings', 0.044), ('quaternions', 0.044), ('monocular', 0.042), ('cx', 0.041), ('ye', 0.04), ('correspondences', 0.039), ('late', 0.039), ('positions', 0.038), ('person', 0.038), ('mi', 0.037), ('vertices', 0.037), ('dmi', 0.036), ('dmx', 0.036), ('mbx', 0.036), ('meinard', 0.036), ('mtx', 0.036), ('shead', 0.036), ('starkly', 0.036), ('xact', 0.036), ('xsens', 0.036), ('motions', 0.035), ('marker', 0.035), ('realtime', 0.034), ('ball', 0.034), ('motion', 0.034), ('head', 0.034), ('uller', 0.033), ('mocap', 0.033), ('db', 0.032), ('quantitatively', 0.032), ('mesh', 0.032), ('cgf', 0.032), ('occluded', 0.031), ('visible', 0.031), ('transformation', 0.031), ('hasler', 0.029), ('forearms', 0.029), ('orientations', 0.029), ('cloud', 0.029), ('global', 0.029), ('facing', 0.029), ('errors', 0.028), ('joint', 0.028), ('imu', 0.028), ('stark', 0.027), ('shotton', 0.027), ('qualitatively', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999946 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors

Author: Thomas Helten, Meinard Müller, Hans-Peter Seidel, Christian Theobalt

Abstract: In recent years, the availability of inexpensive depth cameras, such as the Microsoft Kinect, has boosted the research in monocular full body skeletal pose tracking. Unfortunately, existing trackers often fail to capture poses where a single camera provides insufficient data, such as non-frontal poses, and all other poses with body part occlusions. In this paper, we present a novel sensor fusion approach for real-time full body tracking that succeeds in such difficult situations. It takes inspiration from previous tracking solutions, and combines a generative tracker and a discriminative tracker retrieving closest poses in a database. In contrast to previous work, both trackers employ data from a low number of inexpensive body-worn inertial sensors. These sensors provide reliable and complementary information when the monocular depth information alone is not sufficient. We also contribute by new algorithmic solutions to best fuse depth and inertial data in both trackers. One is a new visibility model to determine global body pose, occlusions and usable depth correspondences and to decide what data modality to use for discriminative tracking. We also contribute with a new inertial-basedpose retrieval, and an adapted late fusion step to calculate the final body pose.

2 0.38214794 254 iccv-2013-Live Metric 3D Reconstruction on Mobile Phones

Author: Petri Tanskanen, Kalin Kolev, Lorenz Meier, Federico Camposeco, Olivier Saurer, Marc Pollefeys

Abstract: unknown-abstract

3 0.24132149 218 iccv-2013-Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data

Author: Srinath Sridhar, Antti Oulasvirta, Christian Theobalt

Abstract: Tracking the articulated 3D motion of the hand has important applications, for example, in human–computer interaction and teleoperation. We present a novel method that can capture a broad range of articulated hand motions at interactive rates. Our hybrid approach combines, in a voting scheme, a discriminative, part-based pose retrieval method with a generative pose estimation method based on local optimization. Color information from a multiview RGB camera setup along with a person-specific hand model are used by the generative method to find the pose that best explains the observed images. In parallel, our discriminative pose estimation method uses fingertips detected on depth data to estimate a complete or partial pose of the hand by adopting a part-based pose retrieval strategy. This part-based strategy helps reduce the search space drastically in comparison to a global pose retrieval strategy. Quantitative results show that our method achieves state-of-the-art accuracy on challenging sequences and a near-realtime performance of 10 fps on a desktop computer.

4 0.20477462 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?

Author: Elisabeta Marinoiu, Dragos Papava, Cristian Sminchisescu

Abstract: Human motion analysis in images and video is a central computer vision problem. Yet, there are no studies that reveal how humans perceive other people in images and how accurate they are. In this paper we aim to unveil some of the processing–as well as the levels of accuracy–involved in the 3D perception of people from images by assessing the human performance. Our contributions are: (1) the construction of an experimental apparatus that relates perception and measurement, in particular the visual and kinematic performance with respect to 3D ground truth when the human subject is presented an image of a person in a given pose; (2) the creation of a dataset containing images, articulated 2D and 3D pose ground truth, as well as synchronized eye movement recordings of human subjects, shown a variety of human body configurations, both easy and difficult, as well as their ‘re-enacted’ 3D poses; (3) quantitative analysis revealing the human performance in 3D pose reenactment tasks, the degree of stability in the visual fixation patterns of human subjects, and the way it correlates with different poses. We also discuss the implications of our find- ings for the construction of visual human sensing systems.

5 0.1994618 133 iccv-2013-Efficient Hand Pose Estimation from a Single Depth Image

Author: Chi Xu, Li Cheng

Abstract: We tackle the practical problem of hand pose estimation from a single noisy depth image. A dedicated three-step pipeline is proposed: Initial estimation step provides an initial estimation of the hand in-plane orientation and 3D location; Candidate generation step produces a set of 3D pose candidate from the Hough voting space with the help of the rotational invariant depth features; Verification step delivers the final 3D hand pose as the solution to an optimization problem. We analyze the depth noises, and suggest tips to minimize their negative impacts on the overall performance. Our approach is able to work with Kinecttype noisy depth images, and reliably produces pose estimations of general motions efficiently (12 frames per second). Extensive experiments are conducted to qualitatively and quantitatively evaluate the performance with respect to the state-of-the-art methods that have access to additional RGB images. Our approach is shown to deliver on par or even better results.

6 0.19641687 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation

7 0.18310337 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data

8 0.18097439 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking

9 0.17951934 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines

10 0.1760156 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion

11 0.16817485 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos

12 0.15061276 382 iccv-2013-Semi-dense Visual Odometry for a Monocular Camera

13 0.14686652 143 iccv-2013-Estimating Human Pose with Flowing Puppets

14 0.14618655 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose

15 0.14386266 209 iccv-2013-Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation

16 0.14356619 168 iccv-2013-Finding the Best from the Second Bests - Inhibiting Subjective Bias in Evaluation of Visual Tracking Algorithms

17 0.13992234 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects

18 0.13824183 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests

19 0.13421887 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation

20 0.13149048 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.222), (1, -0.168), (2, -0.014), (3, 0.098), (4, 0.06), (5, -0.158), (6, -0.009), (7, -0.003), (8, -0.151), (9, 0.272), (10, -0.004), (11, -0.098), (12, -0.125), (13, -0.034), (14, -0.036), (15, 0.064), (16, 0.035), (17, -0.194), (18, -0.002), (19, 0.072), (20, 0.064), (21, 0.083), (22, 0.039), (23, -0.016), (24, -0.008), (25, -0.024), (26, 0.005), (27, 0.111), (28, 0.031), (29, 0.068), (30, -0.006), (31, -0.039), (32, -0.053), (33, 0.036), (34, -0.048), (35, -0.066), (36, 0.039), (37, -0.001), (38, -0.01), (39, 0.024), (40, 0.028), (41, -0.056), (42, 0.047), (43, 0.065), (44, 0.006), (45, 0.019), (46, 0.052), (47, 0.009), (48, -0.024), (49, 0.111)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97050685 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors

Author: Thomas Helten, Meinard Müller, Hans-Peter Seidel, Christian Theobalt

Abstract: In recent years, the availability of inexpensive depth cameras, such as the Microsoft Kinect, has boosted the research in monocular full body skeletal pose tracking. Unfortunately, existing trackers often fail to capture poses where a single camera provides insufficient data, such as non-frontal poses, and all other poses with body part occlusions. In this paper, we present a novel sensor fusion approach for real-time full body tracking that succeeds in such difficult situations. It takes inspiration from previous tracking solutions, and combines a generative tracker and a discriminative tracker retrieving closest poses in a database. In contrast to previous work, both trackers employ data from a low number of inexpensive body-worn inertial sensors. These sensors provide reliable and complementary information when the monocular depth information alone is not sufficient. We also contribute by new algorithmic solutions to best fuse depth and inertial data in both trackers. One is a new visibility model to determine global body pose, occlusions and usable depth correspondences and to decide what data modality to use for discriminative tracking. We also contribute with a new inertial-basedpose retrieval, and an adapted late fusion step to calculate the final body pose.

2 0.89817554 218 iccv-2013-Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data

Author: Srinath Sridhar, Antti Oulasvirta, Christian Theobalt

Abstract: Tracking the articulated 3D motion of the hand has important applications, for example, in human–computer interaction and teleoperation. We present a novel method that can capture a broad range of articulated hand motions at interactive rates. Our hybrid approach combines, in a voting scheme, a discriminative, part-based pose retrieval method with a generative pose estimation method based on local optimization. Color information from a multiview RGB camera setup along with a person-specific hand model are used by the generative method to find the pose that best explains the observed images. In parallel, our discriminative pose estimation method uses fingertips detected on depth data to estimate a complete or partial pose of the hand by adopting a part-based pose retrieval strategy. This part-based strategy helps reduce the search space drastically in comparison to a global pose retrieval strategy. Quantitative results show that our method achieves state-of-the-art accuracy on challenging sequences and a near-realtime performance of 10 fps on a desktop computer.

3 0.81108391 133 iccv-2013-Efficient Hand Pose Estimation from a Single Depth Image

Author: Chi Xu, Li Cheng

Abstract: We tackle the practical problem of hand pose estimation from a single noisy depth image. A dedicated three-step pipeline is proposed: Initial estimation step provides an initial estimation of the hand in-plane orientation and 3D location; Candidate generation step produces a set of 3D pose candidate from the Hough voting space with the help of the rotational invariant depth features; Verification step delivers the final 3D hand pose as the solution to an optimization problem. We analyze the depth noises, and suggest tips to minimize their negative impacts on the overall performance. Our approach is able to work with Kinecttype noisy depth images, and reliably produces pose estimations of general motions efficiently (12 frames per second). Extensive experiments are conducted to qualitatively and quantitatively evaluate the performance with respect to the state-of-the-art methods that have access to additional RGB images. Our approach is shown to deliver on par or even better results.

4 0.7458359 254 iccv-2013-Live Metric 3D Reconstruction on Mobile Phones

Author: Petri Tanskanen, Kalin Kolev, Lorenz Meier, Federico Camposeco, Olivier Saurer, Marc Pollefeys

Abstract: unknown-abstract

5 0.7243284 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data

Author: Carl Yuheng Ren, Victor Prisacariu, David Murray, Ian Reid

Abstract: We introduce a probabilistic framework for simultaneous tracking and reconstruction of 3D rigid objects using an RGB-D camera. The tracking problem is handled using a bag-of-pixels representation and a back-projection scheme. Surface and background appearance models are learned online, leading to robust tracking in the presence of heavy occlusion and outliers. In both our tracking and reconstruction modules, the 3D object is implicitly embedded using a 3D level-set function. The framework is initialized with a simple shape primitive model (e.g. a sphere or a cube), and the real 3D object shape is tracked and reconstructed online. Unlike existing depth-based 3D reconstruction works, which either rely on calibrated/fixed camera set up or use the observed world map to track the depth camera, our framework can simultaneously track and reconstruct small moving objects. We use both qualitative and quantitative results to demonstrate the superior performance of both tracking and reconstruction of our method.

6 0.71257818 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?

7 0.70765412 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion

8 0.6680181 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests

9 0.64982456 143 iccv-2013-Estimating Human Pose with Flowing Puppets

10 0.60674381 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose

11 0.60497344 291 iccv-2013-No Matter Where You Are: Flexible Graph-Guided Multi-task Learning for Multi-view Head Pose Classification under Target Motion

12 0.59523278 278 iccv-2013-Multi-scale Topological Features for Hand Posture Representation and Analysis

13 0.5950256 382 iccv-2013-Semi-dense Visual Odometry for a Monocular Camera

14 0.58284426 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines

15 0.57183957 46 iccv-2013-Allocentric Pose Estimation

16 0.55764139 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos

17 0.55446601 209 iccv-2013-Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation

18 0.55156314 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation

19 0.53731924 118 iccv-2013-Discovering Object Functionality

20 0.5359329 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.041), (7, 0.022), (12, 0.017), (13, 0.024), (26, 0.059), (31, 0.036), (35, 0.032), (40, 0.024), (42, 0.101), (64, 0.111), (73, 0.035), (89, 0.185), (90, 0.182), (95, 0.01)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.86095881 15 iccv-2013-A Generalized Low-Rank Appearance Model for Spatio-temporally Correlated Rain Streaks

Author: Yi-Lei Chen, Chiou-Ting Hsu

Abstract: In this paper, we propose a novel low-rank appearance model for removing rain streaks. Different from previous work, our method needs neither rain pixel detection nor time-consuming dictionary learning stage. Instead, as rain streaks usually reveal similar and repeated patterns on imaging scene, we propose and generalize a low-rank model from matrix to tensor structure in order to capture the spatio-temporally correlated rain streaks. With the appearance model, we thus remove rain streaks from image/video (and also other high-order image structure) in a unified way. Our experimental results demonstrate competitive (or even better) visual quality and efficient run-time in comparison with state of the art.

same-paper 2 0.85255694 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors

Author: Thomas Helten, Meinard Müller, Hans-Peter Seidel, Christian Theobalt

Abstract: In recent years, the availability of inexpensive depth cameras, such as the Microsoft Kinect, has boosted the research in monocular full body skeletal pose tracking. Unfortunately, existing trackers often fail to capture poses where a single camera provides insufficient data, such as non-frontal poses, and all other poses with body part occlusions. In this paper, we present a novel sensor fusion approach for real-time full body tracking that succeeds in such difficult situations. It takes inspiration from previous tracking solutions, and combines a generative tracker and a discriminative tracker retrieving closest poses in a database. In contrast to previous work, both trackers employ data from a low number of inexpensive body-worn inertial sensors. These sensors provide reliable and complementary information when the monocular depth information alone is not sufficient. We also contribute by new algorithmic solutions to best fuse depth and inertial data in both trackers. One is a new visibility model to determine global body pose, occlusions and usable depth correspondences and to decide what data modality to use for discriminative tracking. We also contribute with a new inertial-basedpose retrieval, and an adapted late fusion step to calculate the final body pose.

3 0.80136812 140 iccv-2013-Elastic Net Constraints for Shape Matching

Author: Emanuele Rodolà, Andrea Torsello, Tatsuya Harada, Yasuo Kuniyoshi, Daniel Cremers

Abstract: We consider a parametrized relaxation of the widely adopted quadratic assignment problem (QAP) formulation for minimum distortion correspondence between deformable shapes. In order to control the accuracy/sparsity trade-off we introduce a weighting parameter on the combination of two existing relaxations, namely spectral and game-theoretic. This leads to the introduction of the elastic net penalty function into shape matching problems. In combination with an efficient algorithm to project onto the elastic net ball, we obtain an approach for deformable shape matching with controllable sparsity. Experiments on a standard benchmark confirm the effectiveness of the approach.

4 0.7962923 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments

Author: Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, James M. Rehg

Abstract: We propose an unsupervised video segmentation approach by simultaneously tracking multiple holistic figureground segments. Segment tracks are initialized from a pool of segment proposals generated from a figure-ground segmentation algorithm. Then, online non-local appearance models are trained incrementally for each track using a multi-output regularized least squares formulation. By using the same set of training examples for all segment tracks, a computational trick allows us to track hundreds of segment tracks efficiently, as well as perform optimal online updates in closed-form. Besides, a new composite statistical inference approach is proposed for refining the obtained segment tracks, which breaks down the initial segment proposals and recombines for better ones by utilizing highorder statistic estimates from the appearance model and enforcing temporal consistency. For evaluating the algorithm, a dataset, SegTrack v2, is collected with about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches in the dataset, show- ing its efficiency and robustness to challenges in different video sequences.

5 0.79176366 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines

Author: Shuran Song, Jianxiong Xiao

Abstract: Despite significant progress, tracking is still considered to be a very challenging task. Recently, the increasing popularity of depth sensors has made it possible to obtain reliable depth easily. This may be a game changer for tracking, since depth can be used to prevent model drift and handle occlusion. We also observe that current tracking algorithms are mostly evaluated on a very small number of videos collectedandannotated by different groups. The lack of a reasonable size and consistently constructed benchmark has prevented a persuasive comparison among different algorithms. In this paper, we construct a unified benchmark dataset of 100 RGBD videos with high diversity, propose different kinds of RGBD tracking algorithms using 2D or 3D model, and present a quantitative comparison of various algorithms with RGB or RGBD input. We aim to lay the foundation for further research in both RGB and RGBD tracking, and our benchmark is available at http://tracking.cs.princeton.edu.

6 0.79149115 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning

7 0.79141122 215 iccv-2013-Incorporating Cloud Distribution in Sky Representation

8 0.79028457 380 iccv-2013-Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes

9 0.78997087 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition

10 0.78983355 338 iccv-2013-Randomized Ensemble Tracking

11 0.78740406 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation

12 0.78392744 86 iccv-2013-Concurrent Action Detection with Structural Prediction

13 0.78303409 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses

14 0.78231138 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos

15 0.78087395 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection

16 0.77994508 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization

17 0.7787568 441 iccv-2013-Video Motion for Every Visible Point

18 0.77787399 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary

19 0.77707505 146 iccv-2013-Event Detection in Complex Scenes Using Interval Temporal Constraints

20 0.77664149 379 iccv-2013-Semantic Segmentation without Annotating Segments