iccv iccv2013 iccv2013-340 knowledge-graph by maker-knowledge-mining

340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests


Source: pdf

Author: Danhang Tang, Tsz-Ho Yu, Tae-Kyun Kim

Abstract: This paper presents the first semi-supervised transductive algorithm for real-time articulated hand pose estimation. Noisy data and occlusions are the major challenges of articulated hand pose estimation. In addition, the discrepancies among realistic and synthetic pose data undermine the performances of existing approaches that use synthetic data extensively in training. We therefore propose the Semi-supervised Transductive Regression (STR) forest which learns the relationship between a small, sparsely labelled realistic dataset and a large synthetic dataset. We also design a novel data-driven, pseudo-kinematic technique to refine noisy or occluded joints. Our contributions include: (i) capturing the benefits of both realistic and synthetic data via transductive learning; (ii) showing accuracies can be improved by considering unlabelled data; and (iii) introducing a pseudo-kinematic technique to refine articulations efficiently. Experimental results show not only the promising performance of our method with respect to noise and occlusions, but also its superiority over state-of-the-art methods in accuracy, robustness and speed.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 This paper presents the first semi-supervised transductive algorithm for real-time articulated hand pose estimation. [sent-9, score-0.748]

2 Noisy data and occlusions are the major challenges of articulated hand pose estimation. [sent-10, score-0.475]

3 In addition, the discrepancies among realistic and synthetic pose data undermine the performances of existing approaches that use synthetic data extensively in training. [sent-11, score-0.807]

4 We therefore propose the Semi-supervised Transductive Regression (STR) forest which learns the relationship between a small, sparsely labelled realistic dataset and a large synthetic dataset. [sent-12, score-0.811]

5 Our contributions include: (i) capturing the benefits of both realistic and synthetic data via transductive learning; (ii) showing accuracies can be improved by considering unlabelled data; and (iii) introducing a pseudo-kinematic technique to refine articulations efficiently. [sent-14, score-0.93]

6 Introduction: Articulated hand pose estimation shares many similarities with the popular 3-D body pose estimation. [sent-17, score-0.641]

7 While the latest depth sensor technology has enabled body pose estimation in real-time [2, 24, 12, 26], hand pose estimation still requires improvement. [sent-19, score-0.721]

8 Despite their similarities, proven approaches in body pose estimation cannot be repurposed directly to hand articulations, due to the unique challenges of the task: (1) Occlusions and viewpoint changes. [sent-20, score-0.54]

9 Different from body poses which are usually upright and [Figure 1: (a) RGB, (b) Labels, (c) Synthetic, (d) Realistic. The ring finger is missing due to occlusions in (d), and the little finger is wider than the synthetic image in (c).] [sent-23, score-0.709]

10 frontal [9], different viewpoints can render different depth images despite the same hand articulation. [sent-24, score-0.189]

11 Body poses usually occupy larger and relatively static regions in the depth images. [sent-26, score-0.157]

12 1, missing parts and quantisation errors are common in hand pose data, especially at small, partially occluded parts such as fingertips. [sent-29, score-0.523]

13 Consequently, a large discrepancy is observed between synthetic and realistic data. [sent-31, score-0.363]

14 Moreover, manually labelled realistic data are extremely costly to obtain. [sent-32, score-0.42]

15 Existing state-of-the-art methods resort to synthetic data [16], or model-based optimisation [8, 15]. [sent-33, score-0.215]

16 Besides, the noisy realistic data make joint detection difficult, whereas in synthetic data joint boundaries are always clean and accurate. [sent-35, score-0.733]

17 This process is known as transductive transfer learning [21]: A transductive model learns from a source domain, e. [sent-37, score-0.58]

18 synthetic data; on the other hand, it applies knowledge transform to a different but related target domain, e. [sent-39, score-0.226]

19 As a result, it benefits from the characteristics of both domains: the STR forest not only captures a wide range of poses from synthetic data, but also achieves promising accuracy in challenging environments by learning from realistic data. [sent-42, score-0.677]

20 In addition, we design an efficient pseudo-kinematic joint refinement algorithm to handle occluded and noisy articulations. [sent-43, score-0.388]

21 The STR forest is also semi-supervised, learning the noisy appearances of realistic data from both labelled and unlabelled datapoints. [sent-44, score-0.864]

22 Moreover, generic pose estimation is facilitated by a wide range of poses from synthetic data, using a data-driven pose refinement scheme. [sent-45, score-0.793]

23 As far as we are aware, the proposed method is the first semi-supervised and transductive articulated hand pose estimation framework. [sent-46, score-0.746]

24 The main contributions of our work are threefold: (1) Realistic-Synthetic fusion: Considering the issue of noisy inputs, we propose the first transductive learning algorithm for 3-D hand pose estimation that captures the characteristics of both realistic and synthetic data. [sent-47, score-1.045]

25 (2) Semi-supervised learning: The proposed learning algorithm utilises both labelled and unlabelled data, improving estimation accuracy while keeping a low labelling cost. [sent-48, score-0.418]

26 (3) Data-driven pseudo-kinematics: The limitations of the traditional Hough forest [11] against occlusions are alleviated by learning a novel data-driven pseudo-kinematic algorithm. [sent-49, score-0.289]

27 Related Work. Hand pose estimation: Earlier approaches for articulated hand pose estimation are diverse, such as coloured markers [6], probabilistic line matching [1], multi-camera networks [13] and Bayesian filters with Chamfer matching [25]. [sent-51, score-0.677]

28 We refer the reader to [10] for a detailed survey of earlier hand pose estimation algorithms. [sent-52, score-0.344]

29 [14] address strong occlusions using local trackers at separate hand segments. [sent-61, score-0.189]

30 [20] estimate hand poses in realtime from RGB-D images using particle swarm optimisation. [sent-65, score-0.214]

31 Model-based approaches inherently handle joint articulations and viewpoint changes. [sent-66, score-0.363]

32 However, their performance depends on the previous pose estimates, so output poses may drift away from the ground truth as errors accumulate over time. [sent-67, score-0.335]

33 Discriminative approaches learn a mapping from visual features to the target parameter space, such as joint labels [24] or joint coordinates [12]. [sent-68, score-0.366]

34 Instead of using a predefined visual model, discriminative methods learn a pose estimator from a labelled training dataset. [sent-69, score-0.432]

35 Although discriminative methods have proved successful in real-time body pose estimation from depth sensors [24, 12, 2, 26], they are less common than model-based approaches with respect to hand pose estimation. [sent-70, score-0.674]

36 Recent discriminative algorithms for hand pose estimation include approximate nearest neighbour search [23, 27] and hierarchical random forests [16]. [sent-71, score-0.503]

37 A large labelled dataset is necessary to model a wide range of poses. [sent-73, score-0.19]

38 It is also costly to label sufficient realistic data for training. [sent-74, score-0.23]

39 As a result, existing approaches resort to synthetic data by means of computer graphics [23, 16], which suffers from the realistic-synthetic discrepancies. [sent-75, score-0.215]

40 Kinematics: Inverse kinematics is a standard technique in model-based and tracking approaches for both body [28, 22] and hand pose estimation [8, 15, 25]. [sent-77, score-0.395]

41 [12] estimate body poses using a simple range heuristic, yet it is inapplicable to hand pose due to self-occlusions. [sent-80, score-0.478]

42 [27] detect joints using a coloured glove and match them against the ground-truth database. [sent-82, score-0.193]

43 Transfer Learning: Transductive transfer learning is often employed when training data of the target domain are too costly to obtain. [sent-83, score-0.175]

44 It has seen various successful applications [21], yet it has not been applied to articulated pose estimation. [sent-84, score-0.286]

45 [5] to the proposed STR forest, where the training algorithm preserves the associations between cross-domain data pairs. [sent-86, score-0.14]

46 Semi-supervised and Regression Forest: Various semi-supervised forest learning algorithms have been proposed. [sent-87, score-0.223]

47 [19] sample unlabelled datapoints to improve Gaussian processes for body pose estimation. [sent-89, score-0.606]

48 [7] measure data compactness to relate labelled and unlabelled datapoints. [sent-91, score-0.432]

49 [18, 17] design a margin metric to evaluate with unlabelled data. [sent-93, score-0.181]

50 On the other hand, regression forests are widely adopted in body pose estimation, e. [sent-94, score-0.584]

51 The STR forest adaptively combines the aforementioned semi-supervised and regression forest learning techniques in a single framework. [sent-97, score-0.543]

52 For each viewpoint, training data are collected from a partially labelled target domain (realistic depth images) and a fully labelled source domain (synthetic depth images). [sent-101, score-0.684]

53 These domains are explicitly related by establishing associations from the labelled target datapoints to their corresponding source datapoints, as shown in the figure. [sent-102, score-0.499]

54 Firstly, transductive realistic-synthetic associations are preserved, such that the matched data are passed down to the same node. [sent-105, score-0.394]

55 Secondly, the distributions of labelled and unlabelled realistic data are modelled jointly in the proposed STR forest using unsupervised learning. [sent-106, score-0.813]

56 Thirdly, viewpoint changes are handled alongside hand poses using an adaptive hierarchical classification scheme. [sent-107, score-0.32]

57 Finally, we also propose a data-driven, kinematic-based pose refinement scheme. [sent-108, score-0.299]

58 Training datasets: The training dataset D = {Rl, Ru, S} consists of both realistic data R and synthetic data S. [sent-111, score-0.218]

59 All datapoints in S are labelled with ground truths. [sent-114, score-0.161]

60 Each datapoint in D is an image patch sampled randomly from foreground pixels in the training images. [sent-117, score-0.233]

61 The patch size is 64×64; the number of datapoints roughly equals 5% of the foreground pixels in the depth images. [sent-119, score-0.227]

62 Every datapoint in Rl or S is assigned a tuple of labels (a, p, v). [sent-120, score-0.129]

63 The viewpoint of a patch is represented by the roll, pitch and yaw angles, which are quantised into 3, 5 and 9 steps respectively. [sent-121, score-0.133]
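The quantisation of roll, pitch and yaw into 3, 5 and 9 steps can be illustrated with a minimal Python sketch. Only the step counts come from the text; the angle ranges in `RANGES`, the function names `quantise` and `viewpoint_label`, and the way the three bin indices are packed into a single label are hypothetical choices made for illustration.

```python
import math

# Step counts are taken from the text; the angle ranges are hypothetical.
STEPS = {"roll": 3, "pitch": 5, "yaw": 9}
RANGES = {"roll": (-45.0, 45.0), "pitch": (-90.0, 90.0), "yaw": (-180.0, 180.0)}


def quantise(angle, lo, hi, steps):
    """Map an angle in [lo, hi] to a bin index in 0..steps-1 (clamped)."""
    idx = math.floor((angle - lo) / (hi - lo) * steps)
    return min(max(idx, 0), steps - 1)


def viewpoint_label(roll, pitch, yaw):
    """Pack the three bin indices into one label in 0..(3*5*9 - 1)."""
    r = quantise(roll, *RANGES["roll"], STEPS["roll"])
    p = quantise(pitch, *RANGES["pitch"], STEPS["pitch"])
    y = quantise(yaw, *RANGES["yaw"], STEPS["yaw"])
    return (r * STEPS["pitch"] + p) * STEPS["yaw"] + y
```

With these step counts there are at most 3 × 5 × 9 = 135 distinct viewpoint labels.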

64 Furthermore, every labelled datapoint contains 16 vote vectors v ∈ R^{3×16} from the patch’s centroid to the 3-D locations of all 16 joints as in [11]. [sent-127, score-0.321]

65 Realistic-synthetic associations are established through matching datapoints in Rl and S, according to their 3-D joint locations. [sent-128, score-0.426]
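One plausible way to match labelled realistic datapoints to synthetic ones by their 3-D joint locations is a nearest-neighbour search, sketched below in Python. The matching criterion (squared Euclidean distance over all joints), the array layout and the name `associate` are assumptions; the text only says that associations follow the 3-D joint locations.

```python
import numpy as np


def associate(real_joints, synth_joints):
    """For each labelled realistic datapoint, return the index of the synthetic
    datapoint with the closest joint locations (assumed matching criterion).

    real_joints:  array of shape (N, 16, 3)
    synth_joints: array of shape (M, 16, 3)
    """
    real = np.asarray(real_joints, dtype=float).reshape(len(real_joints), -1)
    synth = np.asarray(synth_joints, dtype=float).reshape(len(synth_joints), -1)
    # Pairwise squared distances via broadcasting: result has shape (N, M).
    d2 = ((real[:, None, :] - synth[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

The brute-force distance matrix is fine for a small labelled set; a spatial index would be the natural replacement for large synthetic sets.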

66 STR Forest Building upon the hybrid regression forest by Yu et al. [sent-132, score-0.32]

67 [29], the STR forest performs classification, clustering and regression on both domains in one pose estimator, instead of performing each task in separate forests. [sent-133, score-0.494]

68 We grow Nt decision trees by recursively splitting and passing the current training data to two child nodes. [sent-134, score-0.152]

69 $Q_{apv} = \alpha Q_a + (1-\alpha)\beta Q_p + (1-\alpha)(1-\beta) Q_v$, $Q_{tss} = Q_t^{\omega} Q_u$ (2) where Qapv is a combined quality function for learning classification-regression decision trees, and Qtss enables transductive and semi-supervised learning. [sent-140, score-0.352]
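The way the weights blend the individual terms can be sketched in a few lines of Python. This assumes the two quality functions have the form Qapv = αQa + (1−α)βQp + (1−α)(1−β)Qv and Qtss = Qt^ω · Qu; the function names are hypothetical, and the individual terms are passed in as plain numbers.

```python
def q_apv(q_a, q_p, q_v, alpha, beta):
    """Combined classification-regression quality (assumed form of Eq. 2).

    alpha = 1 selects pure viewpoint classification (Qa); alpha = 0, beta = 1
    selects joint classification (Qp); alpha = 0, beta = 0 selects regression (Qv).
    """
    return alpha * q_a + (1 - alpha) * beta * q_p + (1 - alpha) * (1 - beta) * q_v


def q_tss(q_t, q_u, omega):
    """Transductive / semi-supervised quality (assumed form): Qt^omega * Qu."""
    return (q_t ** omega) * q_u
```

Setting the weights to 0 or 1 recovers each single-objective tree, which is what makes a gradual switch from classification to regression possible.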

70 Patch classification term Qp: Similar to Qa, it is the information gain of the joint labels p in L. [sent-145, score-0.161]
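For readers unfamiliar with information gain as a split criterion, the following is a minimal Python sketch using Shannon entropy. It is a generic textbook formulation, not necessarily the exact definition used in the paper; the names `entropy` and `information_gain` are hypothetical.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of discrete labels."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(labels, go_left):
    """Entropy reduction obtained by splitting `labels` with a boolean mask."""
    labels = np.asarray(labels)
    go_left = np.asarray(go_left, dtype=bool)
    n = labels.size
    if n == 0:
        return 0.0
    left, right = labels[go_left], labels[~go_left]
    return (entropy(labels)
            - (left.size / n) * entropy(left)
            - (right.size / n) * entropy(right))
```

A split that separates two balanced classes perfectly yields a gain of one bit, whereas a split that leaves both children equally mixed yields zero.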

71 Thus, Qa and Qp optimise the decision trees by classifying viewpoints and joint labels. [sent-147, score-0.161]

72 Regression term Qv: This term learns the regression aspect of the decision trees by measuring the compactness of vote vectors. [sent-148, score-0.374]

73 Given the set of vote vectors J(L) in L, regression term [sent-149, score-0.197]

74 [… = trace(var(·))] Qv increases with compactness in vote space and converges to 1 when all votes in a node are identical. [sent-153, score-0.161]
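A compactness score with the stated behaviour (larger for tighter votes, equal to 1 when all votes are identical) can be sketched as follows. The exact formula is not recoverable from this summary, so the form 1 / (1 + weighted spread of the two child nodes), with spread measured as the trace of the vote covariance, is an assumption, as are the names `vote_spread` and `q_v`.

```python
import numpy as np


def vote_spread(votes):
    """Trace of the covariance of the flattened vote vectors (0 if identical)."""
    if len(votes) < 2:
        return 0.0
    flat = np.asarray(votes, dtype=float).reshape(len(votes), -1)
    # The trace of the covariance equals the sum of per-dimension variances.
    return float(np.var(flat, axis=0).sum())


def q_v(votes_left, votes_right):
    """Assumed regression quality: 1 / (1 + size-weighted child spread)."""
    n_l, n_r = len(votes_left), len(votes_right)
    n = n_l + n_r
    if n == 0:
        return 1.0
    spread = (n_l / n) * vote_spread(votes_left) + (n_r / n) * vote_spread(votes_right)
    return 1.0 / (1.0 + spread)
```

Any monotonically decreasing function of the spread that equals 1 at zero spread would satisfy the same description; the reciprocal form is simply the most compact choice.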

75 Assuming appearances and poses are correlated under the same viewpoint, Qu evaluates the appearance similarities of all realistic patches R within a node: Qu=? [sent-157, score-0.346]

76 (4) Since the realistic dataset is sparsely labelled, i. [sent-160, score-0.216]

77 Adaptive switching {α, β, ω}: A decision tree mainly performs classification at the top levels; its training objective is switched adaptively to regression at the bottom levels (Fig. [sent-169, score-0.195]

78 Δa(L) and Δp(L) denote the margin measures of viewpoint labels a and joint labels p in L. [sent-172, score-0.267]

79 They measure the purity of a node with respect to viewpoint and patch label. [sent-173, score-0.206]

80 Data-driven Kinematic Joint Refinement: Since the proposed STR forest considers joints as independent detection targets, it lacks structural information to recover poorly detected joints when they are occluded or missing from the depth image. [sent-181, score-0.596]

81 Without having an explicit hand model as in most model-based tracking methods, we designed a data-driven, kinematic-based method to refine joint locations from the STR forest. [sent-182, score-0.327]

82 A large hand pose database K is generated, such that |K| ≫ [sent-183, score-0.297]

83 |S|, in order to obtain the maximum pose coverage. [sent-184, score-0.174]

84 The pose database K is generated using the same hand [sent-186, score-0.174]

85 model as in the synthetic dataset S, but K contains only the joint coordinates. [sent-187, score-0.343]

86 G contains viewpoint-specific distributions of joint locations. [sent-190, score-0.161]

87 Data: A joint dataset K ⊂ R^{3×16} that contains synthetic joint locations, where |K| ≫ [sent-193, score-0.504]

88 Similar to other decision forests, each patch passes down the STR forest to obtain the viewpoint $\hat{a}$ and vote vectors $\hat{v}$. [sent-214, score-0.559]

89 The patch votes for all 16 joint locations according to $\hat{v}$. [sent-215, score-0.372]
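The voting step can be pictured as adding each patch's predicted offsets to its centroid. The Python sketch below assumes an (N, 16, 3) layout for the vote vectors (the text writes them as 3×16) and uses the hypothetical name `cast_votes`.

```python
import numpy as np


def cast_votes(centroids, vote_vectors):
    """Turn per-patch offsets into absolute 3-D votes for every joint.

    centroids:    (N, 3) patch centroids
    vote_vectors: (N, 16, 3) offsets from each centroid to the 16 joints
    returns:      (N, 16, 3); votes[:, j] are all votes received by joint j
    """
    centroids = np.asarray(centroids, dtype=float)
    vote_vectors = np.asarray(vote_vectors, dtype=float)
    return centroids[:, None, :] + vote_vectors
```

Slicing the result along the joint axis gives the per-joint vote sets that a later aggregation step would consume.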

90 The objective of kinematic joint refinement is to compute the final joint locations Y = {y1 . [sent-217, score-0.592]

91 […]tions of vote vectors are evaluated as stated below: The set of votes received by the j-th joint is fitted with a 2-part GMM $\hat{G}_j = \{\hat{\mu}_{j1}, \hat{\rho}_{j1}, \hat{\mu}_{j2}, \hat{\rho}_{j2}\}$, where $\hat{\mu}$, $\hat{\rho}$ denote the mean, variance and weight of the Gaussian components respectively. [sent-225, score-0.293]

92 The j-th joint is of high-confidence when the Euclidean distance between μˆj1 and μˆ j2 is smaller than a threshold tq. [sent-230, score-0.161]

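93 $y_j = \begin{cases} \hat{\mu}_{j1} & \text{if } \|\hat{\mu}_{j1} - \hat{\mu}_{j2}\|_2^2 < t_q \text{ and } \hat{\rho}_{j1} \ge \hat{\rho}_{j2} \\ \hat{\mu}_{j2} & \text{if } \|\hat{\mu}_{j1} - \hat{\mu}_{j2}\|_2^2 < t_q \text{ and } \hat{\rho}_{j1} < \hat{\rho}_{j2} \end{cases}$ (7) Subsequently, final locations of all high-confidence joints are determined. [sent-233, score-0.135]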
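The high-confidence test can be sketched in Python as follows. The sketch assumes the two GMM component means and weights have already been fitted, compares the squared distance between the means against the threshold, and returns `None` for low-confidence joints; the name `resolve_joint` and the `None` convention are hypothetical.

```python
import numpy as np


def resolve_joint(mu1, rho1, mu2, rho2, t_q):
    """Return the final location of a high-confidence joint, else None.

    mu1, mu2:   means of the two Gaussian components (3-vectors)
    rho1, rho2: weights of the two components
    t_q:        threshold on the squared distance between the means (assumed)
    """
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    if np.sum((mu1 - mu2) ** 2) < t_q:
        # The two modes agree: keep the mean of the heavier component.
        return mu1 if rho1 >= rho2 else mu2
    # The modes disagree: the joint is left for the refinement step.
    return None
```

Joints for which this returns `None` are the ones that a refinement stage would have to recover from the remaining, confident joints.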

94 The joint refinement process is performed on the other low-confidence joints. [sent-234, score-0.286]

95 The nearest neighbour of the set of high-confidence joints is searched from its corresponding joint means {μ 1aˆ . [sent-235, score-0.368]

96 Only the high-confidence joint locations are used in the above nearest neighbour matching; the low-confidence joint locations are masked out. [sent-239, score-0.362]
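A masked nearest-neighbour search of this kind can be sketched in Python as below. For simplicity a plain array of candidate poses stands in for the learned joint means, which is an assumption; the name `masked_nearest` is hypothetical.

```python
import numpy as np


def masked_nearest(candidates, joints, confident):
    """Index of the candidate pose closest to `joints`, using only the
    high-confidence joints; low-confidence joints are masked out.

    candidates: (M, 16, 3) candidate poses (stand-in for the learned means)
    joints:     (16, 3) current joint estimates
    confident:  (16,) boolean mask, True for high-confidence joints
    """
    candidates = np.asarray(candidates, dtype=float)
    joints = np.asarray(joints, dtype=float)
    confident = np.asarray(confident, dtype=bool)
    diff = candidates[:, confident, :] - joints[confident]
    d2 = (diff ** 2).sum(axis=(1, 2))
    return int(d2.argmin())
```

Because the masked joints do not contribute to the distance, a badly detected joint cannot pull the match towards a wrong pose.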

97 The final output of a low-confidence joint yl is computed by merging the Gaussians in Equation 9. [sent-241, score-0.161]

98 The index proximal joint is occluded by the middle finger as seen in the RGB image; the 2-part GMM $\hat{G}_j$ is represented by the red crosses (mean) and ellipses (variance). [sent-252, score-0.355]

99 Figure 3: The proposed joint refinement algorithm (panels: RGB, Labels, Joint Refinement). [sent-257, score-0.286]

100 Each finger was […] Algorithm 2: Pose Refinement. Data: Vote vectors obtained from passing down the testing image to the STR forest. [sent-262, score-0.14]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('str', 0.293), ('transductive', 0.29), ('forest', 0.223), ('labelled', 0.19), ('lrc', 0.183), ('synthetic', 0.182), ('realistic', 0.181), ('unlabelled', 0.181), ('pose', 0.174), ('datapoints', 0.161), ('joint', 0.161), ('finger', 0.14), ('datapoint', 0.129), ('refinement', 0.125), ('hand', 0.123), ('articulated', 0.112), ('rl', 0.112), ('viewpoint', 0.106), ('associations', 0.104), ('kinematic', 0.102), ('vote', 0.1), ('qu', 0.1), ('imperial', 0.1), ('regression', 0.097), ('qv', 0.097), ('qt', 0.096), ('articulations', 0.096), ('joints', 0.092), ('poses', 0.091), ('body', 0.09), ('ru', 0.081), ('neighbour', 0.08), ('london', 0.076), ('yj', 0.073), ('qapv', 0.073), ('qtss', 0.073), ('patch', 0.068), ('votes', 0.068), ('occlusions', 0.066), ('depth', 0.066), ('qa', 0.065), ('quantised', 0.065), ('qp', 0.062), ('decision', 0.062), ('compactness', 0.061), ('gmm', 0.057), ('nn', 0.054), ('occluded', 0.054), ('trees', 0.054), ('discrepancies', 0.05), ('costly', 0.049), ('uk', 0.049), ('gaussian', 0.048), ('noisy', 0.048), ('estimation', 0.047), ('domain', 0.046), ('llc', 0.046), ('lis', 0.044), ('kinematics', 0.044), ('target', 0.044), ('forests', 0.044), ('locations', 0.043), ('appearances', 0.041), ('modelled', 0.038), ('performances', 0.038), ('training', 0.036), ('refining', 0.035), ('sparsely', 0.035), ('college', 0.035), ('nearest', 0.035), ('similarities', 0.033), ('resort', 0.033), ('rgb', 0.033), ('groundtruth', 0.032), ('estimator', 0.032), ('aantodr', 0.032), ('fsoert', 0.032), ('skp', 0.032), ('thtoers', 0.032), ('iitn', 0.032), ('sioe', 0.032), ('tvhieew', 0.032), ('tinhet', 0.032), ('laeb', 0.032), ('tfe', 0.032), ('rribu', 0.032), ('hamer', 0.032), ('peols', 0.032), ('tsio', 0.032), ('artefacts', 0.032), ('foreach', 0.032), ('bise', 0.032), ('realised', 0.032), ('quantisation', 0.032), ('dvo', 0.032), ('rns', 0.032), ('ntgh', 0.032), ('rine', 0.032), ('adnadta', 0.032), ('alev', 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests

Author: Danhang Tang, Tsz-Ho Yu, Tae-Kyun Kim

Abstract: This paper presents the first semi-supervised transductive algorithm for real-time articulated hand pose estimation. Noisy data and occlusions are the major challenges of articulated hand pose estimation. In addition, the discrepancies among realistic and synthetic pose data undermine the performances of existing approaches that use synthetic data extensively in training. We therefore propose the Semi-supervised Transductive Regression (STR) forest which learns the relationship between a small, sparsely labelled realistic dataset and a large synthetic dataset. We also design a novel data-driven, pseudo-kinematic technique to refine noisy or occluded joints. Our contributions include: (i) capturing the benefits of both realistic and synthetic data via transductive learning; (ii) showing accuracies can be improved by considering unlabelled data; and (iii) introducing a pseudo-kinematic technique to refine articulations efficiently. Experimental results show not only the promising performance of our method with respect to noise and occlusions, but also its superiority over state-of-the-art methods in accuracy, robustness and speed.

2 0.28386751 133 iccv-2013-Efficient Hand Pose Estimation from a Single Depth Image

Author: Chi Xu, Li Cheng

Abstract: We tackle the practical problem of hand pose estimation from a single noisy depth image. A dedicated three-step pipeline is proposed: Initial estimation step provides an initial estimation of the hand in-plane orientation and 3D location; Candidate generation step produces a set of 3D pose candidates from the Hough voting space with the help of the rotational invariant depth features; Verification step delivers the final 3D hand pose as the solution to an optimization problem. We analyze the depth noises, and suggest tips to minimize their negative impacts on the overall performance. Our approach is able to work with Kinect-type noisy depth images, and reliably produces pose estimations of general motions efficiently (12 frames per second). Extensive experiments are conducted to qualitatively and quantitatively evaluate the performance with respect to the state-of-the-art methods that have access to additional RGB images. Our approach is shown to deliver on par or even better results.

3 0.21566197 218 iccv-2013-Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data

Author: Srinath Sridhar, Antti Oulasvirta, Christian Theobalt

Abstract: Tracking the articulated 3D motion of the hand has important applications, for example, in human–computer interaction and teleoperation. We present a novel method that can capture a broad range of articulated hand motions at interactive rates. Our hybrid approach combines, in a voting scheme, a discriminative, part-based pose retrieval method with a generative pose estimation method based on local optimization. Color information from a multiview RGB camera setup along with a person-specific hand model are used by the generative method to find the pose that best explains the observed images. In parallel, our discriminative pose estimation method uses fingertips detected on depth data to estimate a complete or partial pose of the hand by adopting a part-based pose retrieval strategy. This part-based strategy helps reduce the search space drastically in comparison to a global pose retrieval strategy. Quantitative results show that our method achieves state-of-the-art accuracy on challenging sequences and a near-realtime performance of 10 fps on a desktop computer.

4 0.1866819 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion

Author: Ibrahim Radwan, Abhinav Dhall, Roland Goecke

Abstract: In this paper, an automatic approach for 3D pose reconstruction from a single image is proposed. The presence of human body articulation, hallucinated parts and cluttered background leads to ambiguity during the pose inference, which makes the problem non-trivial. Researchers have explored various methods based on motion and shading in order to reduce the ambiguity and reconstruct the 3D pose. The key idea of our algorithm is to impose both kinematic and orientation constraints. The former is imposed by projecting a 3D model onto the input image and pruning the parts, which are incompatible with the anthropomorphism. The latter is applied by creating synthetic views via regressing the input view to multiple oriented views. After applying the constraints, the 3D model is projected onto the initial and synthetic views, which further reduces the ambiguity. Finally, we borrow the direction of the unambiguous parts from the synthetic views to the initial one, which results in the 3D pose. Quantitative experiments are performed on the HumanEva-I dataset and qualitatively on unconstrained images from the Image Parse dataset. The results show the robustness of the proposed approach to accurately reconstruct the 3D pose from a single image.

5 0.15031643 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose

Author: Andreas M. Lehrmann, Peter V. Gehler, Sebastian Nowozin

Abstract: Having a sensible prior of human pose is a vital ingredient for many computer vision applications, including tracking and pose estimation. While the application of global non-parametric approaches and parametric models has led to some success, finding the right balance in terms of flexibility and tractability, as well as estimating model parameters from data has turned out to be challenging. In this work, we introduce a sparse Bayesian network model of human pose that is non-parametric with respect to the estimation of both its graph structure and its local distributions. We describe an efficient sampling scheme for our model and show its tractability for the computation of exact log-likelihoods. We empirically validate our approach on the Human 3.6M dataset and demonstrate superior performance to global models and parametric networks. We further illustrate our model’s ability to represent and compose poses not present in the training set (compositionality) and describe a speed-accuracy trade-off that allows realtime scoring of poses.

6 0.14095207 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?

7 0.14037825 178 iccv-2013-From Semi-supervised to Transfer Counting of Crowds

8 0.13824183 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors

9 0.12095845 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos

10 0.11656833 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding

11 0.11637604 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation

12 0.11625835 404 iccv-2013-Structured Forests for Fast Edge Detection

13 0.11547909 47 iccv-2013-Alternating Regression Forests for Object Detection and Pose Estimation

14 0.11090717 437 iccv-2013-Unsupervised Random Forest Manifold Alignment for Lipreading

15 0.11069997 352 iccv-2013-Revisiting Example Dependent Cost-Sensitive Learning with Decision Trees

16 0.11004596 336 iccv-2013-Random Forests of Local Experts for Pedestrian Detection

17 0.10903456 391 iccv-2013-Sieving Regression Forest Votes for Facial Feature Detection in the Wild

18 0.10762535 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation

19 0.10556214 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection

20 0.10034823 322 iccv-2013-Pose Estimation and Segmentation of People in 3D Movies


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.211), (1, -0.053), (2, -0.031), (3, 0.007), (4, 0.057), (5, -0.09), (6, 0.018), (7, 0.023), (8, -0.071), (9, 0.093), (10, -0.008), (11, -0.058), (12, -0.131), (13, -0.071), (14, 0.062), (15, 0.07), (16, -0.054), (17, -0.214), (18, 0.007), (19, 0.146), (20, 0.079), (21, 0.014), (22, 0.078), (23, 0.061), (24, -0.001), (25, -0.039), (26, 0.025), (27, 0.111), (28, 0.023), (29, -0.092), (30, 0.041), (31, 0.115), (32, -0.045), (33, 0.107), (34, -0.046), (35, 0.012), (36, 0.019), (37, -0.007), (38, 0.025), (39, -0.014), (40, -0.02), (41, 0.051), (42, -0.129), (43, -0.047), (44, -0.053), (45, 0.028), (46, 0.01), (47, -0.075), (48, 0.067), (49, -0.038)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96713781 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests

Author: Danhang Tang, Tsz-Ho Yu, Tae-Kyun Kim

Abstract: This paper presents the first semi-supervised transductive algorithm for real-time articulated hand pose estimation. Noisy data and occlusions are the major challenges of articulated hand pose estimation. In addition, the discrepancies among realistic and synthetic pose data undermine the performances of existing approaches that use synthetic data extensively in training. We therefore propose the Semi-supervised Transductive Regression (STR) forest which learns the relationship between a small, sparsely labelled realistic dataset and a large synthetic dataset. We also design a novel data-driven, pseudo-kinematic technique to refine noisy or occluded joints. Our contributions include: (i) capturing the benefits of both realistic and synthetic data via transductive learning; (ii) showing accuracies can be improved by considering unlabelled data; and (iii) introducing a pseudo-kinematic technique to refine articulations efficiently. Experimental results show not only the promising performance of our method with respect to noise and occlusions, but also its superiority over state-of-the-art methods in accuracy, robustness and speed.

2 0.80353355 133 iccv-2013-Efficient Hand Pose Estimation from a Single Depth Image

Author: Chi Xu, Li Cheng

Abstract: We tackle the practical problem of hand pose estimation from a single noisy depth image. A dedicated three-step pipeline is proposed: Initial estimation step provides an initial estimation of the hand in-plane orientation and 3D location; Candidate generation step produces a set of 3D pose candidates from the Hough voting space with the help of the rotational invariant depth features; Verification step delivers the final 3D hand pose as the solution to an optimization problem. We analyze the depth noises, and suggest tips to minimize their negative impacts on the overall performance. Our approach is able to work with Kinect-type noisy depth images, and reliably produces pose estimations of general motions efficiently (12 frames per second). Extensive experiments are conducted to qualitatively and quantitatively evaluate the performance with respect to the state-of-the-art methods that have access to additional RGB images. Our approach is shown to deliver on par or even better results.

3 0.77971089 47 iccv-2013-Alternating Regression Forests for Object Detection and Pose Estimation

Author: Samuel Schulter, Christian Leistner, Paul Wohlhart, Peter M. Roth, Horst Bischof

Abstract: We present Alternating Regression Forests (ARFs), a novel regression algorithm that learns a Random Forest by optimizing a global loss function over all trees. This interrelates the information of single trees during the training phase and results in more accurate predictions. ARFs can minimize any differentiable regression loss without sacrificing the appealing properties of Random Forests, like low computational complexity during both, training and testing. Inspired by recent developments for classification [19], we derive a new algorithm capable of dealing with different regression loss functions, discuss its properties and investigate the relations to other methods like Boosted Trees. We evaluate ARFs on standard machine learning benchmarks, where we observe better generalization power compared to both standard Random Forests and Boosted Trees. Moreover, we apply the proposed regressor to two computer vision applications: object detection and head pose estimation from depth images. ARFs outperform the Random Forest baselines in both tasks, illustrating the importance of optimizing a common loss function for all trees.

4 0.7707727 218 iccv-2013-Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data

Author: Srinath Sridhar, Antti Oulasvirta, Christian Theobalt

Abstract: Tracking the articulated 3D motion of the hand has important applications, for example, in human–computer interaction and teleoperation. We present a novel method that can capture a broad range of articulated hand motions at interactive rates. Our hybrid approach combines, in a voting scheme, a discriminative, part-based pose retrieval method with a generative pose estimation method based on local optimization. Color information from a multiview RGB camera setup along with a person-specific hand model are used by the generative method to find the pose that best explains the observed images. In parallel, our discriminative pose estimation method uses fingertips detected on depth data to estimate a complete or partial pose of the hand by adopting a part-based pose retrieval strategy. This part-based strategy helps reduce the search space drastically in comparison to a global pose retrieval strategy. Quantitative results show that our method achieves state-of-the-art accuracy on challenging sequences and a near-realtime performance of 10 fps on a desktop computer.

5 0.73550385 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose

Author: Andreas M. Lehrmann, Peter V. Gehler, Sebastian Nowozin

Abstract: Having a sensible prior of human pose is a vital ingredient for many computer vision applications, including tracking and pose estimation. While the application of global non-parametric approaches and parametric models has led to some success, finding the right balance in terms of flexibility and tractability, as well as estimating model parameters from data has turned out to be challenging. In this work, we introduce a sparse Bayesian network model of human pose that is non-parametric with respect to the estimation of both its graph structure and its local distributions. We describe an efficient sampling scheme for our model and show its tractability for the computation of exact log-likelihoods. We empirically validate our approach on the Human 3.6M dataset and demonstrate superior performance to global models and parametric networks. We further illustrate our model’s ability to represent and compose poses not present in the training set (compositionality) and describe a speed-accuracy trade-off that allows realtime scoring of poses.

6 0.73392302 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion

7 0.66689277 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?

8 0.66038358 341 iccv-2013-Real-Time Body Tracking with One Depth Camera and Inertial Sensors

9 0.60045332 291 iccv-2013-No Matter Where You Are: Flexible Graph-Guided Multi-task Learning for Multi-view Head Pose Classification under Target Motion

10 0.58538699 278 iccv-2013-Multi-scale Topological Features for Hand Posture Representation and Analysis

11 0.57139832 46 iccv-2013-Allocentric Pose Estimation

12 0.56953579 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation

13 0.56713933 437 iccv-2013-Unsupervised Random Forest Manifold Alignment for Lipreading

14 0.55448866 130 iccv-2013-Dynamic Structured Model Selection

15 0.55201536 118 iccv-2013-Discovering Object Functionality

16 0.55169296 404 iccv-2013-Structured Forests for Fast Edge Detection

17 0.54770082 143 iccv-2013-Estimating Human Pose with Flowing Puppets

18 0.5453403 336 iccv-2013-Random Forests of Local Experts for Pedestrian Detection

19 0.53596991 352 iccv-2013-Revisiting Example Dependent Cost-Sensitive Learning with Decision Trees

20 0.51767468 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.078), (7, 0.05), (12, 0.01), (22, 0.202), (26, 0.074), (31, 0.055), (35, 0.02), (40, 0.036), (42, 0.083), (48, 0.011), (64, 0.069), (73, 0.035), (84, 0.023), (89, 0.165), (98, 0.011)]
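
The topic list above is a sparse vector of (topicId, topicWeight) pairs. The page does not say how the simValue scores in the list below are derived from such vectors, so the following is only a minimal sketch under the assumption that two papers are compared by the cosine similarity of their topic vectors. The function name `cosine_similarity` and the second vector `other_topics` are hypothetical and used purely for illustration.

```python
import math

# Sparse LDA topic vector copied from the listing: (topicId, topicWeight) pairs.
paper_topics = [
    (2, 0.078), (7, 0.05), (12, 0.01), (22, 0.202), (26, 0.074),
    (31, 0.055), (35, 0.02), (40, 0.036), (42, 0.083), (48, 0.011),
    (64, 0.069), (73, 0.035), (84, 0.023), (89, 0.165), (98, 0.011),
]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse (topicId, weight) lists.

    Assumption: cosine similarity is the comparison measure; the source
    page does not confirm this.
    """
    weights_a, weights_b = dict(a), dict(b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * weights_b.get(t, 0.0) for t, w in weights_a.items())
    norm_a = math.sqrt(sum(w * w for w in weights_a.values()))
    norm_b = math.sqrt(sum(w * w for w in weights_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An empty vector is treated as similar to nothing.
    return dot / (norm_a * norm_b)


# Hypothetical topic vector for another paper (made-up values, illustration only).
other_topics = [(22, 0.3), (89, 0.1), (50, 0.2)]

print(cosine_similarity(paper_topics, other_topics))
```

Storing each vector as a dictionary keyed by topicId keeps the comparison proportional to the number of non-zero topics rather than the total number of topics in the model.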

similar papers list:

simIndex simValue paperId paperTitle

1 0.86240435 170 iccv-2013-Fingerspelling Recognition with Semi-Markov Conditional Random Fields

Author: Taehwan Kim, Greg Shakhnarovich, Karen Livescu

Abstract: Recognition of gesture sequences is in general a very difficult problem, but in certain domains the difficulty may be mitigated by exploiting the domain’s “grammar”. One such grammatically constrained gesture sequence domain is sign language. In this paper we investigate the case of fingerspelling recognition, which can be very challenging due to the quick, small motions of the fingers. Most prior work on this task has assumed a closed vocabulary of fingerspelled words; here we study the more natural open-vocabulary case, where the only domain knowledge is the possible fingerspelled letters and statistics of their sequences. We develop a semi-Markov conditional model approach, where feature functions are defined over segments of video and their corresponding letter labels. We use classifiers of letters and linguistic handshape features, along with expected motion profiles, to define segmental feature functions. This approach improves letter error rate (Levenshtein distance between hypothesized and correct letter sequences) from 16.3% using a hidden Markov model baseline to 11.6% using the proposed semi-Markov model.

same-paper 2 0.83711195 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests

Author: Danhang Tang, Tsz-Ho Yu, Tae-Kyun Kim

Abstract: This paper presents the first semi-supervised transductive algorithm for real-time articulated hand pose estimation. Noisy data and occlusions are the major challenges of articulated hand pose estimation. In addition, the discrepancies among realistic and synthetic pose data undermine the performances of existing approaches that use synthetic data extensively in training. We therefore propose the Semi-supervised Transductive Regression (STR) forest which learns the relationship between a small, sparsely labelled realistic dataset and a large synthetic dataset. We also design a novel data-driven, pseudo-kinematic technique to refine noisy or occluded joints. Our contributions include: (i) capturing the benefits of both realistic and synthetic data via transductive learning; (ii) showing accuracies can be improved by considering unlabelled data; and (iii) introducing a pseudo-kinematic technique to refine articulations efficiently. Experimental results show not only the promising performance of our method with respect to noise and occlusions, but also its superiority over state-of-the-art methods in accuracy, robustness and speed.

3 0.82092136 49 iccv-2013-An Enhanced Structure-from-Motion Paradigm Based on the Absolute Dual Quadric and Images of Circular Points

Author: Lilian Calvet, Pierre Gurdjos

Abstract: This work aims at introducing a new unified Structure-from-Motion (SfM) paradigm in which images of circular point-pairs can be combined with images of natural points. An imaged circular point-pair encodes the 2D Euclidean structure of a world plane and can easily be derived from the image of a planar shape, especially those including circles. A classical SfM method generally runs two steps: first a projective factorization of all matched image points (into projective cameras and points) and second a camera self-calibration that updates the obtained world from projective to Euclidean. This work shows how to introduce images of circular points in these two SfM steps while its key contribution is to provide the theoretical foundations for combining “classical” linear self-calibration constraints with additional ones derived from such images. We show that the two proposed SfM steps clearly contribute to better results than the classical approach. We validate our contributions on synthetic and real images.

4 0.75906515 75 iccv-2013-CoDeL: A Human Co-detection and Labeling Framework

Author: Jianping Shi, Renjie Liao, Jiaya Jia

Abstract: We propose a co-detection and labeling (CoDeL) framework to identify persons that contain self-consistent appearance in multiple images. Our CoDeL model builds upon the deformable part-based model to detect human hypotheses and exploits cross-image correspondence via a matching classifier. Relying on a Gaussian process, this matching classifier models the similarity of two hypotheses and efficiently captures the relative importance contributed by various visual features, reducing the adverse effect of scattered occlusion. Further, the detector and matching classifier together make our model fit into a semi-supervised co-training framework, which can get enhanced results with a small amount of labeled training data. Our CoDeL model achieves decent performance on existing and new benchmark datasets.

5 0.7513963 274 iccv-2013-Monte Carlo Tree Search for Scheduling Activity Recognition

Author: Mohamed R. Amer, Sinisa Todorovic, Alan Fern, Song-Chun Zhu

Abstract: This paper presents an efficient approach to video parsing. Our videos show a number of co-occurring individual and group activities. To address challenges of the domain, we use an expressive spatiotemporal AND-OR graph (ST-AOG) that jointly models activity parts, their spatiotemporal relations, and context, as well as enables multitarget tracking. The standard ST-AOG inference is prohibitively expensive in our setting, since it would require running a multitude of detectors, and tracking their detections in a long video footage. This problem is addressed by formulating a cost-sensitive inference of ST-AOG as Monte Carlo Tree Search (MCTS). For querying an activity in the video, MCTS optimally schedules a sequence of detectors and trackers to be run, and where they should be applied in the space-time volume. Evaluation on the benchmark datasets demonstrates that MCTS enables two-magnitude speed-ups without compromising accuracy relative to the standard cost-insensitive inference.

6 0.73550624 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences

7 0.73529649 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection

8 0.73407567 445 iccv-2013-Visual Reranking through Weakly Supervised Multi-graph Learning

9 0.73075438 415 iccv-2013-Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors

10 0.73033488 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction

11 0.72959518 291 iccv-2013-No Matter Where You Are: Flexible Graph-Guided Multi-task Learning for Multi-view Head Pose Classification under Target Motion

12 0.72927892 180 iccv-2013-From Where and How to What We See

13 0.72901583 338 iccv-2013-Randomized Ensemble Tracking

14 0.72755021 160 iccv-2013-Fast Object Segmentation in Unconstrained Video

15 0.72709322 168 iccv-2013-Finding the Best from the Second Bests - Inhibiting Subjective Bias in Evaluation of Visual Tracking Algorithms

16 0.72665465 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation

17 0.72662902 426 iccv-2013-Training Deformable Part Models with Decorrelated Features

18 0.72625077 260 iccv-2013-Manipulation Pattern Discovery: A Nonparametric Bayesian Approach

19 0.7252841 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow

20 0.72468746 315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions