cvpr cvpr2013 cvpr2013-359 knowledge-graph by maker-knowledge-mining

359 cvpr-2013-Robust Discriminative Response Map Fitting with Constrained Local Models


Source: pdf

Author: Akshay Asthana, Stefanos Zafeiriou, Shiyang Cheng, Maja Pantic

Abstract: We present a novel discriminative regression based approach for the Constrained Local Models (CLMs) framework, referred to as the Discriminative Response Map Fitting (DRMF) method, which shows impressive performance in the generic face fitting scenario. The motivation behind this approach is that, unlike the holistic texture based features used in the discriminative AAM approaches, the response map can be represented by a small set of parameters and these parameters can be very efficiently used for reconstructing unseen response maps. Furthermore, we show that by adopting very simple off-the-shelf regression techniques, it is possible to learn robust functions from response maps to the shape parameter updates. The experiments, conducted on the Multi-PIE, XM2VTS and LFPW databases, show that the proposed DRMF method outperforms state-of-the-art algorithms for the task of generic face fitting. Moreover, the DRMF method is computationally very efficient and is real-time capable. The current MATLAB implementation takes 1 second per image. To facilitate future comparisons, we release the MATLAB code1 and the pretrained models for research purposes.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We present a novel discriminative regression based approach for the Constrained Local Models (CLMs) framework, referred to as the Discriminative Response Map Fitting (DRMF) method, which shows impressive performance in the generic face fitting scenario. [sent-7, score-0.582]

2 The motivation behind this approach is that, unlike the holistic texture based features used in the discriminative AAM approaches, the response map can be represented by a small set of parameters and these parameters can be very efficiently used for reconstructing unseen response maps. [sent-8, score-0.838]

3 Furthermore, we show that by adopting very simple off-the-shelf regression techniques, it is possible to learn robust functions from response maps to the shape parameter updates. [sent-9, score-0.449]

4 The work in [4] proposed several generative AAM fitting methods, some capable of real-time face tracking [17], making AAM one of the most commonly used face tracking methods. [sent-18, score-0.532]

5 As an alternative, several discriminative fitting methods for AAM were proposed [16, 20, 21, 22] that utilized the available training data for learning the fitting update model and showed robustness against poor initialization. [sent-20, score-0.884]

6 However, the overall performance of these discriminative fitting methods has been shown to deteriorate significantly for cross-database experiments [22]. [sent-21, score-0.434]

7 The authors of [23] proposed a fitting method, known as the Regularized Landmark Mean-Shift (RLMS), which outperformed AAM in terms of landmark localization accuracy and is considered to be among the state-of-the-art methods for the generic face fitting scenario. [sent-24, score-0.912]

8 However, the discriminative regression-based fitting approaches have not received much attention in the CLM framework, and hence, are the main focus of our work. [sent-25, score-0.406]

9 As our main contribution, we propose a novel Discriminative Response Map Fitting (DRMF) method for the CLM framework that outperforms both the RLMS fitting method [23] and the tree-based method [26]. [sent-26, score-0.351]

10 Moreover, we show that the robust HOG feature [12] based patch experts can significantly boost the fitting performance and robustness of the CLM framework. [sent-27, score-0.503]

11 We show that the multi-view HOG-CLM framework, which uses the RLMS fitting method [23], also outperforms the recently proposed tree-based method [26]. [sent-28, score-0.351]

12 For controlled settings, we conduct identity, pose, illumination and expression invariant experiments on MultiPIE [14] and XM2VTS [19] databases. [sent-30, score-0.146]

13 The Problem: The aim of a facial deformable model is to infer from an image the facial shape (2D or 3D, sparse [9, 5] or dense [7]), controlled by a set of parameters. [sent-36, score-0.37]

14 (a) Holistic Models that use the holistic texture-based facial representations; and (b) Part Based Models that use the local image patches around the landmark points. [sent-42, score-0.318]

15 Holistic Models: Holistic models employ a shape model, typically learned by annotating n fiducial points xj = [xj, yj]T, j = 1, ..., n, and then concatenating them into a vector s = [x1, y1, ..., xn, yn]T. [sent-47, score-0.164]

16 A statistical shape model S can be learned from a set of training points by applying PCA. [sent-51, score-0.154]
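As a concrete illustration of the two sentences above, the following Python sketch (an assumption-laden example, not the authors' released MATLAB code) learns a mean shape and PCA basis from pre-aligned training shapes; the array layout and the 95% variance cut-off are assumptions of the example.

```python
# Illustrative sketch (not the authors' released MATLAB code): learning the
# statistical shape model S by PCA from pre-aligned training shapes.
# `shapes` is assumed to be an (N, 2n) array of concatenated landmark
# coordinates [x1, y1, ..., xn, yn]; the 95% variance cut-off is an assumption.
import numpy as np

def learn_shape_model(shapes, variance_kept=0.95):
    """Return the mean shape s0 and the PCA basis Phi."""
    s0 = shapes.mean(axis=0)                          # mean shape
    centered = shapes - s0
    _, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = singular_values ** 2 / (len(shapes) - 1)
    cumulative = np.cumsum(eigvals) / eigvals.sum()   # explained-variance curve
    k = int(np.searchsorted(cumulative, variance_kept)) + 1
    Phi = Vt[:k].T                                    # (2n, k) modes of variation
    return s0, Phi

# Any shape instance is then synthesized as s = s0 + Phi @ q for a parameter q.
```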

17 Another common characteristic of holistic models is the motion model, which is defined using a warping function W(x; s). [sent-52, score-0.17]

18 The holistic models can be further divided according to the way the fitting strategy is designed. [sent-56, score-0.14]

19 In generative holistic models [4, 17], a texture model is also defined besides the shape and motion models. [sent-57, score-0.195]

20 The fitting is performed by an analysis-by-synthesis loop, where, based on the current parameters of the model, an image is rendered. [sent-58, score-0.382]

21 In probabilistic terms, these models attempt to update the required parameters by maximizing the probability of the test sample being constructed by the model. [sent-60, score-0.137]

22 Drawbacks of Holistic Models: (1) For the case of the generative holistic models, defining a linear statistical model for the texture that explains the variations due to changes in identity, expressions, pose and illumination is not an easy task. [sent-62, score-0.31]

23 Part Based Models: The main advantages of the part-based models are that (1) partial occlusions can be handled more easily, since we are interested only in facial parts, and (2) the incorporation of a 3D facial shape is now straightforward, since there is no image warping function to be estimated. [sent-68, score-0.401]

24 In general, in part-based representations the model setup is M = {S, D} where D is a set of detectors of the various facial parts (each part corresponds to a fiducial point of the shape model S). [sent-69, score-0.254]

25 The 3D shape model of CLMs can be described as: s(p) = sR(s0 + Φsq) + t, (1) where R (computed via pitch rx, yaw ry and roll rz), s and t = [tx; ty; 0] control the rigid 3D rotation, scale and translation, respectively, while q controls the non-rigid variations of the shape. [sent-71, score-0.203]

26 Therefore, the parameters of the shape model are p = [s, rx, ry, rz, tx, ty, q]. [sent-72, score-0.169]
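For illustration, a minimal Python sketch of Eqn. (1) might look as follows; the pitch-yaw-roll Euler convention and the interleaved storage of the 3D landmarks are assumptions of this sketch rather than details confirmed by the text.

```python
# Minimal sketch of Eqn. (1), assuming s0 is the (3n,) mean 3D shape stored as
# [x1, y1, z1, ...], Phi the (3n, m) non-rigid basis, and a pitch-yaw-roll
# Euler convention; the exact rotation parameterization is an assumption here.
import numpy as np

def rotation_matrix(rx, ry, rz):
    """3D rotation built from pitch rx, yaw ry and roll rz (radians)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def clm_shape(p, s0, Phi):
    """Evaluate s(p) = s R (s0 + Phi q) + t for p = [s, rx, ry, rz, tx, ty, q]."""
    scale, rx, ry, rz, tx, ty = p[:6]
    q = p[6:]
    pts = (s0 + Phi @ q).reshape(-1, 3)    # non-rigidly deformed 3D landmarks
    R = rotation_matrix(rx, ry, rz)
    t = np.array([tx, ty, 0.0])
    return scale * pts @ R.T + t           # rigid scale, rotation and translation
```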

27 Furthermore, D is a set of linear classifiers for the detection of the n parts of the face and is represented as D = {wi, bi}, i = 1, ..., n, where wi, bi define the linear detector for the i-th part of the face. [sent-73, score-0.156]

28 In ASM and CLMs, the objective is to create a shape model from the parameters p such that the positions of the created model on the image correspond to well-aligned parts. [sent-79, score-0.135]

29 Instead of maximizing the posterior p(s(p) | {li = 1}, i = 1, ..., n, I), we propose to follow a discriminative regression framework for estimating the model parameters p. [sent-103, score-0.14]

30 That is, we propose to find a mapping from the response estimate of shape perturbations to shape parameter updates. [sent-104, score-0.434]

31 In particular, let us assume that in the training set we introduce a perturbation Δp and, around each point of the perturbed shape, we have response estimates in a w × w window centered around the perturbed point, Ai(Δp) = [p(li = 1 | x + xi(Δp))]. [sent-105, score-0.546]

32 Then, from the response maps around the perturbed shape {Ai(Δp)}, i = 1, ..., n, we want to learn a function f such that f(A1(Δp), ..., An(Δp)) = Δp. [sent-106, score-0.433]
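To make this concrete, here is a hedged Python sketch of learning such a function f with a simple off-the-shelf regressor (ridge regression); extract_features, the perturbation range and the sampling scheme are hypothetical stand-ins for the components described in the following sentences.

```python
# Illustrative sketch (not the authors' released MATLAB code) of learning a
# function f from response-map features around perturbed shapes to the
# parameter update Delta_p with a simple off-the-shelf regressor (ridge
# regression). `extract_features` is a hypothetical helper returning the
# concatenated response-map representation for an image and a parameter
# vector; the perturbation range and sample count are assumptions.
import numpy as np

def build_regressor(images, gt_params, extract_features,
                    num_perturbations=10, perturb_range=0.1, ridge=1e-3):
    rng = np.random.default_rng(0)
    X, Y = [], []
    for image, p_gt in zip(images, gt_params):
        for _ in range(num_perturbations):
            # Sample a perturbation uniformly within a pre-defined range.
            delta = rng.uniform(-perturb_range, perturb_range, size=p_gt.shape)
            X.append(extract_features(image, p_gt + delta))
            Y.append(-delta)          # target update Delta_p = p_gt - p_perturbed
    X, Y = np.asarray(X), np.asarray(Y)
    # Ridge-regularized linear regression: f(x) = x @ W.
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)
    return lambda features: features @ W
```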

33 In the first step, the goal is to train a dictionary for the response map approximation that can be used for extracting the relevant feature for learning the fitting update model. [sent-110, score-0.727]

34 The second step involves iteratively learning the fitting update model which is achieved by a modified boosting procedure. [sent-111, score-0.428]

35 The goal here is to learn a set of weak learners that model the obvious non-linear relationship between the joint low-dimensional projection of the response maps from all landmark points and the iterative 3D shape model parameter update (Δp). [sent-112, score-0.615]

36 Training Response Patch Model: Before proceeding to the learning step, the goal is to build a dictionary of response maps that can be used for representing any instance of an unseen response map. [sent-115, score-0.592]

37 Now, given the dictionary Zi, the set of weights for a response map window Ai for the point i can be found by: hi^o = arg min_{hi} ||Zi hi − vec(Ai)||2, s.t. hi ≥ 0 (5). [sent-123, score-0.299]
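A possible reading of this projection step in Python, using SciPy's non-negative least-squares solver, is sketched below; the non-negativity constraint is assumed from the NMF setting mentioned two sentences later.

```python
# Sketch of the projection onto the learned dictionary, assuming Zi is a
# non-negative (w*w, k) dictionary for landmark i and Ai its w-by-w response
# window; SciPy's non-negative least-squares solver does the work.
import numpy as np
from scipy.optimize import nnls

def project_response(Zi, Ai):
    """Non-negative weights hi that best reconstruct vec(Ai) from Zi."""
    hi, _residual = nnls(Zi, np.asarray(Ai).ravel())
    return hi
```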

38 Then, instead of finding a regression function from the perturbed responses {Ai(Δp)}, i = 1, ..., n, we aim at finding a function from the low-dimensional weight vectors {hi(Δp)}, i = 1, ..., n, to the update of parameters Δp. [sent-125, score-0.265]

39 For practical reasons and to avoid solving the optimization problem (5) for each part in the fitting procedure, instead of NMF we have also applied PCA on {Ai (Δpj)}jN=1 . [sent-126, score-0.351]

40 An illustrative example of how effectively a response map can be reconstructed by a small number of PCA components (capturing 85% of the variation) is shown in Figure 1. [sent-128, score-0.256]

41 We refer to this dictionary as the Response Patch Model, represented by: {M, V} : M = {mi}, i = 1, ..., n, and V = {Vi}, i = 1, ..., n (6), where mi and Vi are the mean vector and PCA bases, respectively, obtained for each of the n landmark points. [sent-129, score-0.131]
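Given the response patch model {M, V}, projecting a response window and reconstructing it (as illustrated in Figure 1) reduces to a few lines; the sketch below is illustrative Python, not the released MATLAB implementation.

```python
# Sketch of using the response patch model {M, V}: project a response window
# Ai of landmark i onto its PCA bases and reconstruct it, as Figure 1 does.
import numpy as np

def project_and_reconstruct(mi, Vi, Ai):
    """mi: (w*w,) mean vector, Vi: (w*w, k) PCA bases, Ai: (w, w) window."""
    hi = Vi.T @ (Ai.ravel() - mi)          # low-dimensional weights h_i
    approx = mi + Vi @ hi                  # reconstructed response map
    return hi, approx.reshape(Ai.shape)
```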

42 Training Parameter Update Model: Given a set of N training images I and the corresponding shapes S, the goal is to iteratively model the relationship between the joint low-dimensional projection of the response patches, obtained from the response patch model {M, V}, and the parameter update (Δp). [sent-132, score-0.688]

43 For this, we propose to use a modified boosting procedure in that we uniformly sample the 3D shape model parameter space within a pre-defined range around the ground truth parameters pg (See Eqn. [sent-133, score-0.192]

44 1), and iteratively model the relationship between the joint low-dimensional projection of the response patches at the current sampled shape (represented by the t-th sampled shape parameter pt) and the parameter update Δp (Δp = pg − pt). [sent-134, score-0.537]

45 Overview of the response patch model: (a) Original HOG based response patches. [sent-136, score-0.53]

46 (b) Reconstructed response patches using the response patch model that captured 85% variation. [sent-137, score-0.53]

47 Let T be the number of shape parameter sets sampled from the shapes in S, such that the initial sampled shape parameter set is represented by P(1) = {pj(1)}, j = 1, ..., T, and ψ(1) = {Δpj(1)}, j = 1, ..., T (7), where the '(1)' in the superscript represents the initial set (first iteration). [sent-138, score-0.239]

48 Next, extract the response patches for the shape represented by each of the sampled shape parameters in P(1) and compute the low-dimensional projection using the response patch model {M , V}. [sent-139, score-0.769]

49 Now, with the training set T(1) = {χ(1), ψ(1)}, we learn the fitting parameter update function for the first iteration. [sent-144, score-0.507]

50 That is, we learn a weak learner F(1): ψ(1) ← χ(1) (9). We then propagate all the samples from T(1) through F(1) to generate Tnew(1) and eliminate the converged samples in Tnew(1) to generate T(2) for the second iteration. [sent-146, score-0.223]

51 Here, convergence means that the shape root mean square error (RMSE) between the predicted shape and the ground truth shape is less than a threshold (for example, set to 2 for the experiments in this paper). [sent-147, score-0.312]
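The convergence test can be written as a small helper; the exact per-point form of the RMSE is an assumption of this sketch.

```python
# Small helper for the convergence test; the per-point Euclidean form of the
# shape RMSE and the 2-pixel default threshold follow the description above.
import numpy as np

def has_converged(predicted_shape, gt_shape, threshold=2.0):
    """Shapes are (n, 2) arrays of landmark coordinates."""
    rmse = np.sqrt(np.mean(np.sum((predicted_shape - gt_shape) ** 2, axis=1)))
    return rmse < threshold
```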

52 The model uses the 6 global shape parameters and the top 10 non-rigid shape parameters. [sent-152, score-0.239]

53 We propagate this new sample set through F(1) and eliminate the converged samples to generate an additional replacement training set Trep(2) for the second iteration. [sent-157, score-0.239]

54 The training set for the second iteration is then updated: T(2) ← {T(2), Trep(2)} (10), and the fitting parameter update function for the second iteration, i.e. the weak learner F(2), is learnt. [sent-158, score-0.536]

55 Firstly, it plays an important role in ensuring that the progressive fitting parameter update functions are trained on the tougher samples that have not converged in the previous iterations. [sent-162, score-0.528]

56 The above training procedure is repeated iteratively until all the training samples have converged or the maximum number of desired training iterations (η) has been reached. [sent-164, score-0.281]

57 The resulting fitting parameter update model U is a set of weak learners: U = {F(1), ..., F(η)}. [sent-165, score-0.457]

58 (Algorithm excerpt) 3: Generate the training set for the first iteration. 4: for i = 1 → η do 5: Compute the weak learner F(i). [sent-176, score-0.144]
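A hedged Python sketch of this training loop, tying the algorithm excerpt above to the replacement-set mechanism described earlier, is given below; all helper functions are hypothetical stand-ins for the components defined in the text.

```python
# Hedged sketch of the iterative training loop: fit one weak learner per
# iteration, drop converged samples and top the pool up with freshly sampled,
# propagated perturbations. `sample_perturbations`, `extract_features`,
# `fit_weak_learner` and `converged` are hypothetical helpers (the last one
# standing in for the shape-RMSE test described above).
def train_update_model(train_set, eta, sample_perturbations,
                       extract_features, fit_weak_learner, converged):
    samples = sample_perturbations(train_set)      # list of (image, p, p_gt)
    update_model = []                              # U = {F(1), ..., F(eta)}
    for _ in range(eta):
        if not samples:
            break                                  # everything has converged
        X = [extract_features(img, p) for img, p, _ in samples]
        Y = [p_gt - p for _, p, p_gt in samples]   # Delta_p targets
        F = fit_weak_learner(X, Y)
        update_model.append(F)
        # Propagate every sample through the newly trained weak learner.
        samples = [(img, p + F(extract_features(img, p)), p_gt)
                   for img, p, p_gt in samples]
        samples = [s for s in samples if not converged(s[1], s[2])]
        # Replacement set: new perturbations propagated through all learners
        # trained so far, keeping only the ones that have not converged.
        extra = sample_perturbations(train_set)
        for F_prev in update_model:
            extra = [(img, p + F_prev(extract_features(img, p)), p_gt)
                     for img, p, p_gt in extra]
        samples += [s for s in extra if not converged(s[1], s[2])]
    return update_model
```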

59 Fitting Procedure: Given the test image Itest, the fitting parameter update model U is used to compute the additive parameter update Δp iteratively. [sent-186, score-0.351]

60 The goodness of the fit is judged by a fitting score that is computed for each iteration of the update by simply adding the responses (i.e. [sent-187, score-0.919]

61 the probability values) at the landmark locations estimated by the current shape estimate of that iteration. [sent-189, score-0.192]

62 The final fitting shape is the shape with the highest fitting score. [sent-190, score-0.91]
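A matching Python sketch of this fitting loop, with hypothetical helpers for the shape model and the patch-expert responses, is given below.

```python
# Sketch of the fitting loop with the fitting score. `shape_from_params` and
# `response_at` are hypothetical helpers (the 2D projection of the shape model
# in Eqn. (1), and the patch-expert response at a pixel, respectively).
import numpy as np

def fit(image, p_init, update_model, extract_features,
        shape_from_params, response_at):
    p = np.asarray(p_init, dtype=float).copy()
    best_shape, best_score = None, -np.inf
    for F in update_model:                       # U = {F(1), ..., F(eta)}
        p = p + F(extract_features(image, p))    # additive parameter update
        shape = shape_from_params(p)             # (n, 2) landmark positions
        # Fitting score: sum of the responses at the estimated landmarks.
        score = sum(response_at(image, x, y) for x, y in shape)
        if score > best_score:
            best_shape, best_score = shape, score
    return best_shape                            # shape with the highest score
```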

63 Experiments: We conducted generic face fitting experiments on the Multi-PIE [14], XM2VTS [19] and the LFPW [6] databases. [sent-192, score-0.473]

64 The Multi-PIE database is the most commonly used database for generic face fitting and is the best for comparison with previous approaches. [sent-193, score-0.593]

65 The XM2VTS database focuses mainly on the variations in identity and is a challenging database in a generic face fitting scenario because of the large variations in facial shape and appearance due to facial hair, glasses, ethnicity and other subtle variations. [sent-195, score-1.111]

66 Unlike the Multi-PIE and the XM2VTS, the LFPW database is a completely wild database, i.e. [sent-196, score-0.147]

67 consists of images captured under uncontrolled natural settings, and is an extremely challenging database for the generic face fitting experiment. [sent-198, score-0.611]

68 Another consistent aspect for all the following experiments is the initialization of the fitting procedure. [sent-204, score-0.351]

69 However, this face detector often fails on the LFPW dataset and for several images with varying illumination and pose in the Multi-PIE and XM2VTS databases. [sent-206, score-0.159]

70 Therefore, for the images on which the face detector failed, we used the bounding box provided by our own trained tree-based model p204 (described in the following section) and perturbed this bounding box by 10 pixels for translation, 5◦ for rotation and 0. [sent-207, score-0.147]

71 We then initialized the mean face at the centre of this perturbed bounding box. [sent-209, score-0.147]

72 We believe this is due to the use of a tree-based shape model that allows non-face-like structures to occur, making it hard to accurately fit the model, especially for the case of facial expressions. [sent-213, score-0.223]

73 The XM2VTS experiment, performed in an out-of-database scenario, highlights the ability of the DRMF method to handle unseen variations and other challenging variations like facial hair, glasses and ethnicity. [sent-214, score-0.329]

74 the response maps extracted from an unseen image can be very faithfully represented by a small set of parameters and are suited for the discriminative fitting frameworks, unlike the holistic texture based features. [sent-219, score-0.983]

75 Moreover, the fitting procedure of the DRMF method is highly efficient and is real-time capable. [sent-220, score-0.382]

76 The training set consisted of roughly 8300 images which included the subjects 001-170 at poses 051, 050, 140, 041 and 130 with all six expressions at frontal illumination and one other randomly selected illumination condition. [sent-230, score-0.298]

77 The multi-view CLM trained using the HOG feature based patch experts and the RLMS fitting method is referred to as HOG-RLMS-Multiview. [sent-232, score-0.503]

78 Similarly, the multi-view CLM trained using the HOG feature based patch experts and the DRMF fitting method (Section 3) is referred to as HOG-DRMF-Multiview. [sent-233, score-0.503]

79 Moreover, we also trained RAWRLMS-Multiview which refers to the multi-view CLM using the RAW pixel based patch experts and the RLMS fitting method. [sent-234, score-0.503]

80 This helps in showing the performance gained by using the HOG feature based patch experts instead of the RAW pixel based patch experts. [sent-235, score-0.23]

81 For the tree-based method [26], we trained the tree-based model p204, which shares the patch templates across the neighboring viewpoints and is equivalent to the multi-view CLM methods, using exactly the same training data for a fair comparison with the CLM based approaches. [sent-236, score-0.128]

82 Basically, training an independent tree-based model amounts to training separate models for each variation present in the dataset. [sent-238, score-0.129]

83 According to preliminary calculations, such a model would require over a month of training time and nearly 90 seconds per image of fitting time. [sent-244, score-0.401]

84 The test set consisted of roughly 7100 images which included the subjects 171-346 at poses 051, 050, 140, 041 and 130 with all six expressions at frontal illumination and one other randomly selected illumination condition. [sent-245, score-0.248]

85 We also see a substantial gain in performance by using the HOG feature based patch experts (HOG-RLMS-Multiview) instead of the RAW pixel based patch experts (RAWRLMS-Multiview). [sent-247, score-0.152]

86 The qualitative analysis of the results suggests that the tree-based methods [26], although suited for the task of face detection and rough pose estimation, are not well suited for the task of landmark localization. [sent-249, score-0.266]

87 We believe this is due to the use of a tree-based shape model that allows non-face-like structures to occur frequently, especially for the case of facial expressions. [sent-250, score-0.223]

88 the models used for fitting are trained entirely on the Multi-PIE database. [sent-257, score-0.38]

89 We used the HOG-DRMF-Multiview, HOG-RLMS-Multiview and the tree-based model p204, used for generating results in Figure 2, to perform the fitting on the XM2VTS database. [sent-258, score-0.351]

90 The results show that not only does DRMF outperform other state-of-the-art approaches in an out-of-database experiment, but it also handles the challenging variations in facial shape and appearance present in the XM2VTS database due to facial hair, glasses and ethnicity. [sent-264, score-0.521]

91 the response maps extracted from an unseen image can be very faithfully represented by a small set of parameters and are suited for the discriminative fitting frameworks, unlike the holistic texture based features. [sent-267, score-0.983]

92 LFPW Experiments: To further test the ability of the DRMF method to handle unseen variations, we conduct experiments using a database that presents the challenge of uncontrolled natural settings. [sent-270, score-0.24]

93 All of these images were captured in the wild and contain large variations in pose, illumination, expression and occlusion. [sent-272, score-0.175]

94 We used the HOG-DRMF-Multiview, HOG-RLMSMultiview and the tree-based model p204 trained only on the Multi-PIE database (used previously for generating results in Figure 2) to perform fitting on the LFPW test set. [sent-277, score-0.411]

95 We then augmented the Multi-PIE training set with the LFPW training set and re-trained the CLM and treebased models. [sent-278, score-0.136]

96 These wild models were then used to perform fitting on the LFPW test set and the results are reported in Figure 4. [sent-280, score-0.467]

97 Firstly, this result clearly shows that the proposed response map based discriminative fitting methodology can handle faces in the wild, and it further emphasises the suitability of the parameterized response map models for the discriminative fitting frameworks. [sent-284, score-1.518]

98 This shows the advantage of the proposed response map based discriminative fitting approach, which uses the available training data in a more useful way by learning the fitting update model, as compared to RLMS, which relies entirely on Gauss-Newton optimization based methodologies. [sent-290, score-1.14]

99 We conduct detailed experiments in a generic face fitting scenario on databases with images captured under both controlled (Multi-PIE and XM2VTS) and uncontrolled natural settings (LFPW database). [sent-295, score-0.647]

100 The results show that the proposed DRMF method outperforms the state-of-the- art RLMS fitting method [23] and the recently proposed tree-based method [26] consistently across all databases. [sent-296, score-0.351]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('drmf', 0.542), ('fitting', 0.351), ('lfpw', 0.289), ('clms', 0.239), ('rlms', 0.239), ('response', 0.226), ('clm', 0.214), ('facial', 0.119), ('holistic', 0.111), ('shape', 0.104), ('aam', 0.104), ('landmark', 0.088), ('wild', 0.087), ('patch', 0.078), ('face', 0.078), ('uncontrolled', 0.078), ('update', 0.077), ('experts', 0.074), ('perturbed', 0.069), ('converged', 0.068), ('argmaxp', 0.065), ('nmf', 0.065), ('ai', 0.064), ('unseen', 0.063), ('database', 0.06), ('expressions', 0.057), ('variations', 0.056), ('discriminative', 0.055), ('regression', 0.054), ('rmse', 0.053), ('pretrained', 0.05), ('training', 0.05), ('illumination', 0.047), ('release', 0.044), ('generic', 0.044), ('akshay', 0.043), ('markup', 0.043), ('shiyang', 0.043), ('zihi', 0.043), ('yaw', 0.043), ('matlab', 0.043), ('dictionary', 0.043), ('faithfully', 0.042), ('hog', 0.041), ('conduct', 0.039), ('asms', 0.038), ('urls', 0.038), ('frontal', 0.038), ('jt', 0.037), ('texture', 0.037), ('rms', 0.036), ('yi', 0.036), ('learner', 0.036), ('treebased', 0.036), ('multipie', 0.036), ('li', 0.035), ('glasses', 0.035), ('hi', 0.035), ('identity', 0.035), ('pose', 0.034), ('pj', 0.034), ('replacement', 0.034), ('rz', 0.034), ('iew', 0.034), ('responses', 0.034), ('maps', 0.034), ('xeon', 0.033), ('vec', 0.033), ('hair', 0.033), ('suited', 0.033), ('expression', 0.032), ('samples', 0.032), ('procedure', 0.031), ('pca', 0.031), ('parameters', 0.031), ('fiducial', 0.031), ('meanshift', 0.031), ('map', 0.03), ('frameworks', 0.03), ('warping', 0.03), ('poses', 0.03), ('models', 0.029), ('iteration', 0.029), ('scenario', 0.029), ('consisted', 0.029), ('weak', 0.029), ('experiment', 0.028), ('controlled', 0.028), ('xi', 0.028), ('behind', 0.028), ('deteriorate', 0.028), ('motivations', 0.028), ('asm', 0.027), ('ghz', 0.027), ('learners', 0.026), ('tn', 0.026), ('eliminate', 0.026), ('pg', 0.026), ('generative', 0.025), ('moreover', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 359 cvpr-2013-Robust Discriminative Response Map Fitting with Constrained Local Models

Author: Akshay Asthana, Stefanos Zafeiriou, Shiyang Cheng, Maja Pantic

Abstract: We present a novel discriminative regression based approach for the Constrained Local Models (CLMs) framework, referred to as the Discriminative Response Map Fitting (DRMF) method, which shows impressive performance in the generic face fitting scenario. The motivation behind this approach is that, unlike the holistic texture based features used in the discriminative AAM approaches, the response map can be represented by a small set of parameters and these parameters can be very efficiently used for reconstructing unseen response maps. Furthermore, we show that by adopting very simple off-the-shelf regression techniques, it is possible to learn robust functions from response maps to the shape parameter updates. The experiments, conducted on the Multi-PIE, XM2VTS and LFPW databases, show that the proposed DRMF method outperforms state-of-the-art algorithms for the task of generic face fitting. Moreover, the DRMF method is computationally very efficient and is real-time capable. The current MATLAB implementation takes 1 second per image. To facilitate future comparisons, we release the MATLAB code1 and the pretrained models for research purposes.

2 0.16638087 161 cvpr-2013-Facial Feature Tracking Under Varying Facial Expressions and Face Poses Based on Restricted Boltzmann Machines

Author: Yue Wu, Zuoguan Wang, Qiang Ji

Abstract: Facial feature tracking is an active area in computer vision due to its relevance to many applications. It is a nontrivial task, since faces may have varying facial expressions, poses or occlusions. In this paper, we address this problem by proposing a face shape prior model that is constructed based on the Restricted Boltzmann Machines (RBM) and their variants. Specifically, we first construct a model based on Deep Belief Networks to capture the face shape variations due to varying facial expressions for near-frontal view. To handle pose variations, the frontal face shape prior model is incorporated into a 3-way RBM model that could capture the relationship between frontal face shapes and non-frontal face shapes. Finally, we introduce methods to systematically combine the face shape prior models with image measurements of facial feature points. Experiments on benchmark databases show that with the proposed method, facial feature points can be tracked robustly and accurately even if faces have significant facial expressions and poses.

3 0.13129419 321 cvpr-2013-PDM-ENLOR: Learning Ensemble of Local PDM-Based Regressions

Author: Yen H. Le, Uday Kurkure, Ioannis A. Kakadiaris

Abstract: Statistical shape models, such as Active Shape Models (ASMs), suffer from their inability to represent a large range of variations of a complex shape and to account for the large errors in detection of model points. We propose a novel method (dubbed PDM-ENLOR) that overcomes these limitations by locating each shape model point individually using an ensemble of local regression models and appearance cues from selected model points. Our method first detects a set of reference points which were selected based on their saliency during training. For each model point, an ensemble of regressors is built. From the locations of the detected reference points, each regressor infers a candidate location for that model point using local geometric constraints, encoded by a point distribution model (PDM). The final location of that point is determined as a weighted linear combination, whose coefficients are learnt from the training data, of candidates proposed from its ensemble's component regressors. We use different subsets of reference points as explanatory variables for the component regressors to provide varying degrees of locality for the models in each ensemble. This helps our ensemble model to capture a larger range of shape variations as compared to a single PDM. We demonstrate the advantages of our method on the challenging problem of segmenting gene expression images of mouse brain.

4 0.11693371 438 cvpr-2013-Towards Pose Robust Face Recognition

Author: Dong Yi, Zhen Lei, Stan Z. Li

Abstract: Most existing pose robust methods are too computational complex to meet practical applications and their performance under unconstrained environments are rarely evaluated. In this paper, we propose a novel method for pose robust face recognition towards practical applications, which is fast, pose robust and can work well under unconstrained environments. Firstly, a 3D deformable model is built and a fast 3D model fitting algorithm is proposed to estimate the pose of face image. Secondly, a group of Gabor filters are transformed according to the pose and shape of face image for feature extraction. Finally, PCA is applied on the pose adaptive Gabor features to remove the redundances and Cosine metric is used to evaluate the similarity. The proposed method has three advantages: (1) The pose correction is applied in the filter space rather than image space, which makes our method less affected by the precision of the 3D model; (2) By combining the holistic pose transformation and local Gabor filtering, the final feature is robust to pose and other negative factors in face recognition; (3) The 3D structure and facial symmetry are successfully used to deal with self-occlusion. Extensive experiments on FERET and PIE show the proposed method outperforms state-of-the-art methods significantly, meanwhile, the method works well on LFW.

5 0.11639147 420 cvpr-2013-Supervised Descent Method and Its Applications to Face Alignment

Author: Xuehan Xiong, Fernando De_la_Torre

Abstract: Many computer vision problems (e.g., camera calibration, image alignment, structure from motion) are solved through a nonlinear optimization method. It is generally accepted that 2nd order descent methods are the most robust, fast and reliable approaches for nonlinear optimization of a general smooth function. However, in the context of computer vision, 2nd order descent methods have two main drawbacks: (1) The function might not be analytically differentiable and numerical approximations are impractical. (2) The Hessian might be large and not positive definite. To address these issues, this paper proposes a Supervised Descent Method (SDM) for minimizing a Non-linear Least Squares (NLS) function. During training, the SDM learns a sequence of descent directions that minimizes the mean of NLS functions sampled at different points. In testing, SDM minimizes the NLS objective using the learned descent directions without computing the Jacobian nor the Hessian. We illustrate the benefits of our approach in synthetic and real examples, and show how SDM achieves state-of-the-art performance in the problem of facial feature detection. The code is available at www.humansensing.cs.cmu.edu/intraface.

6 0.11599302 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval

7 0.10612326 77 cvpr-2013-Capturing Complex Spatio-temporal Relations among Facial Muscles for Facial Expression Recognition

8 0.10186164 159 cvpr-2013-Expressive Visual Text-to-Speech Using Active Appearance Models

9 0.10113211 23 cvpr-2013-A Practical Rank-Constrained Eight-Point Algorithm for Fundamental Matrix Estimation

10 0.092896499 399 cvpr-2013-Single-Sample Face Recognition with Image Corruption and Misalignment via Sparse Illumination Transfer

11 0.087912552 64 cvpr-2013-Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification

12 0.081849359 96 cvpr-2013-Correlation Filters for Object Alignment

13 0.080040224 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

14 0.078877933 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories

15 0.078776762 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos

16 0.077579521 152 cvpr-2013-Exemplar-Based Face Parsing

17 0.076101109 415 cvpr-2013-Structured Face Hallucination

18 0.074742809 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image

19 0.074292257 315 cvpr-2013-Online Robust Dictionary Learning

20 0.071223684 182 cvpr-2013-Fusing Robust Face Region Descriptors via Multiple Metric Learning for Face Recognition in the Wild


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.171), (1, -0.037), (2, -0.066), (3, 0.035), (4, 0.029), (5, -0.019), (6, 0.009), (7, -0.025), (8, 0.136), (9, -0.117), (10, 0.015), (11, 0.012), (12, -0.0), (13, 0.033), (14, 0.038), (15, 0.003), (16, 0.008), (17, 0.03), (18, 0.042), (19, 0.05), (20, -0.005), (21, 0.004), (22, 0.013), (23, -0.023), (24, 0.038), (25, 0.051), (26, -0.064), (27, -0.042), (28, 0.05), (29, -0.003), (30, -0.008), (31, -0.085), (32, -0.072), (33, -0.029), (34, -0.074), (35, -0.046), (36, -0.01), (37, -0.046), (38, -0.066), (39, 0.014), (40, 0.043), (41, 0.012), (42, 0.0), (43, 0.058), (44, -0.009), (45, 0.045), (46, -0.028), (47, 0.021), (48, 0.039), (49, -0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9239074 359 cvpr-2013-Robust Discriminative Response Map Fitting with Constrained Local Models

Author: Akshay Asthana, Stefanos Zafeiriou, Shiyang Cheng, Maja Pantic

Abstract: We present a novel discriminative regression based approach for the Constrained Local Models (CLMs) framework, referred to as the Discriminative Response Map Fitting (DRMF) method, which shows impressive performance in the generic face fitting scenario. The motivation behind this approach is that, unlike the holistic texture based features used in the discriminative AAM approaches, the response map can be represented by a small set of parameters and these parameters can be very efficiently used for reconstructing unseen response maps. Furthermore, we show that by adopting very simple off-the-shelf regression techniques, it is possible to learn robust functions from response maps to the shape parameter updates. The experiments, conducted on the Multi-PIE, XM2VTS and LFPW databases, show that the proposed DRMF method outperforms state-of-the-art algorithms for the task of generic face fitting. Moreover, the DRMF method is computationally very efficient and is real-time capable. The current MATLAB implementation takes 1 second per image. To facilitate future comparisons, we release the MATLAB code1 and the pretrained models for research purposes.

2 0.81049162 420 cvpr-2013-Supervised Descent Method and Its Applications to Face Alignment

Author: Xuehan Xiong, Fernando De_la_Torre

Abstract: Many computer vision problems (e.g., camera calibration, image alignment, structure from motion) are solved through a nonlinear optimization method. It is generally accepted that 2nd order descent methods are the most robust, fast and reliable approaches for nonlinear optimization of a general smooth function. However, in the context of computer vision, 2nd order descent methods have two main drawbacks: (1) The function might not be analytically differentiable and numerical approximations are impractical. (2) The Hessian might be large and not positive definite. To address these issues, this paper proposes a Supervised Descent Method (SDM) for minimizing a Non-linear Least Squares (NLS) function. During training, the SDM learns a sequence of descent directions that minimizes the mean of NLS functions sampled at different points. In testing, SDM minimizes the NLS objective using the learned descent directions without computing the Jacobian nor the Hessian. We illustrate the benefits of our approach in synthetic and real examples, and show how SDM achieves state-of-the-art performance in the problem of facial feature detection. The code is available at www.humansensing.cs.cmu.edu/intraface.

3 0.78589177 161 cvpr-2013-Facial Feature Tracking Under Varying Facial Expressions and Face Poses Based on Restricted Boltzmann Machines

Author: Yue Wu, Zuoguan Wang, Qiang Ji

Abstract: Facial feature tracking is an active area in computer vision due to its relevance to many applications. It is a nontrivial task, since faces may have varying facial expressions, poses or occlusions. In this paper, we address this problem by proposing a face shape prior model that is constructed based on the Restricted Boltzmann Machines (RBM) and their variants. Specifically, we first construct a model based on Deep Belief Networks to capture the face shape variations due to varying facial expressions for near-frontal view. To handle pose variations, the frontal face shape prior model is incorporated into a 3-way RBM model that could capture the relationship between frontal face shapes and non-frontal face shapes. Finally, we introduce methods to systematically combine the face shape prior models with image measurements of facial feature points. Experiments on benchmark databases show that with the proposed method, facial feature points can be tracked robustly and accurately even if faces have significant facial expressions and poses.

4 0.76937759 415 cvpr-2013-Structured Face Hallucination

Author: Chih-Yuan Yang, Sifei Liu, Ming-Hsuan Yang

Abstract: The goal of face hallucination is to generate highresolution images with fidelity from low-resolution ones. In contrast to existing methods based on patch similarity or holistic constraints in the image space, we propose to exploit local image structures for face hallucination. Each face image is represented in terms of facial components, contours and smooth regions. The image structure is maintained via matching gradients in the reconstructed highresolution output. For facial components, we align input images to generate accurate exemplars and transfer the high-frequency details for preserving structural consistency. For contours, we learn statistical priors to generate salient structures in the high-resolution images. A patch matching method is utilized on the smooth regions where the image gradients are preserved. Experimental results demonstrate that the proposed algorithm generates hallucinated face images with favorable quality and adaptability.

5 0.75451386 159 cvpr-2013-Expressive Visual Text-to-Speech Using Active Appearance Models

Author: Robert Anderson, Björn Stenger, Vincent Wan, Roberto Cipolla

Abstract: This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a ‘talking head’, given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies, comparing the output of different systems.

6 0.69555157 385 cvpr-2013-Selective Transfer Machine for Personalized Facial Action Unit Detection

7 0.65359187 96 cvpr-2013-Correlation Filters for Object Alignment

8 0.62929857 77 cvpr-2013-Capturing Complex Spatio-temporal Relations among Facial Muscles for Facial Expression Recognition

9 0.62831342 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection

10 0.61299711 321 cvpr-2013-PDM-ENLOR: Learning Ensemble of Local PDM-Based Regressions

11 0.59833527 399 cvpr-2013-Single-Sample Face Recognition with Image Corruption and Misalignment via Sparse Illumination Transfer

12 0.5949083 438 cvpr-2013-Towards Pose Robust Face Recognition

13 0.58266926 358 cvpr-2013-Robust Canonical Time Warping for the Alignment of Grossly Corrupted Sequences

14 0.5780859 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval

15 0.57672244 152 cvpr-2013-Exemplar-Based Face Parsing

16 0.56610602 463 cvpr-2013-What's in a Name? First Names as Facial Attributes

17 0.56014812 64 cvpr-2013-Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification

18 0.55237603 166 cvpr-2013-Fast Image Super-Resolution Based on In-Place Example Regression

19 0.54790825 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection

20 0.54584509 308 cvpr-2013-Nonlinearly Constrained MRFs: Exploring the Intrinsic Dimensions of Higher-Order Cliques


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.091), (16, 0.024), (26, 0.05), (28, 0.01), (33, 0.261), (39, 0.226), (55, 0.013), (67, 0.106), (69, 0.061), (87, 0.066)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.91801661 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models

Author: Yibiao Zhao, Song-Chun Zhu

Abstract: Indoor functional objects exhibit large view and appearance variations, thus are difficult to be recognized by the traditional appearance-based classification paradigm. In this paper, we present an algorithm to parse indoor images based on two observations: i) The functionality is the most essential property to define an indoor object, e.g. “a chair to sit on”; ii) The geometry (3D shape) of an object is designed to serve its function. We formulate the nature of the object function into a stochastic grammar model. This model characterizes a joint distribution over the function-geometry-appearance (FGA) hierarchy. The hierarchical structure includes a scene category, functional groups, functional objects, functional parts and 3D geometric shapes. We use a simulated annealing MCMC algorithm to find the maximum a posteriori (MAP) solution, i.e. a parse tree. We design four data-driven steps to accelerate the search in the FGA space: i) group the line segments into 3D primitive shapes, ii) assign functional labels to these 3D primitive shapes, iii) fill in missing objects/parts according to the functional labels, and iv) synthesize 2D segmentation maps and verify the current parse tree by the Metropolis-Hastings acceptance probability. The experimental results on several challenging indoor datasets demonstrate that the proposed approach not only significantly widens the scope of indoor scene parsing algorithms from the segmentation and the 3D recovery to the functional object recognition, but also yields improved overall performance.

2 0.90210253 240 cvpr-2013-Keypoints from Symmetries by Wave Propagation

Author: Samuele Salti, Alessandro Lanza, Luigi Di_Stefano

Abstract: The paper conjectures and demonstrates that repeatable keypoints based on salient symmetries at different scales can be detected by a novel analysis grounded on the wave equation rather than the heat equation underlying traditional Gaussian scale–space theory. While the image structures found by most state-of-the-art detectors, such as blobs and corners, occur typically on planar highly textured surfaces, salient symmetries are widespread in diverse kinds of images, including those related to untextured objects, which are hardly dealt with by current feature-based recognition pipelines. We provide experimental results on standard datasets and also contribute with a new dataset focused on untextured objects. Based on the positive experimental results, we hope to foster further research on the promising topic of scale invariant analysis through the wave equation.

3 0.89397985 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection

Author: Xi Song, Tianfu Wu, Yunde Jia, Song-Chun Zhu

Abstract: This paper presents a method of learning reconfigurable And-Or Tree (AOT) models discriminatively from weakly annotated data for object detection. To explore the appearance and geometry space of latent structures effectively, we first quantize the image lattice using an overcomplete set of shape primitives, and then organize them into a directed acyclic And-Or Graph (AOG) by exploiting their compositional relations. We allow overlaps between child nodes when combining them into a parent node, which is equivalent to introducing an appearance Or-node implicitly for the overlapped portion. The learning of an AOT model consists of three components: (i) Unsupervised sub-category learning (i.e., branches of an object Or-node) with the latent structures in AOG being integrated out. (ii) Weakly-supervised part configuration learning (i.e., seeking the globally optimal parse trees in AOG for each sub-category). To search the globally optimal parse tree in AOG efficiently, we propose a dynamic programming (DP) algorithm. (iii) Joint appearance and structural parameters training under latent structural SVM framework. In experiments, our method is tested on the PASCAL VOC 2007 and 2010 detection benchmarks of 20 object classes and outperforms comparable state-of-the-art methods.

4 0.85826188 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses

Author: Byung-soo Kim, Shili Xu, Silvio Savarese

Abstract: In this paper we focus on the problem of detecting objects in 3D from RGB-D images. We propose a novel framework that explores the compatibility between segmentation hypotheses of the object in the image and the corresponding 3D map. Our framework allows to discover the optimal location of the object using a generalization of the structural latent SVM formulation in 3D as well as the definition of a new loss function defined over the 3D space in training. We evaluate our method using two existing RGB-D datasets. Extensive quantitative and qualitative experimental results show that our proposed approach outperforms state-of-the-art methods as well as a number of baseline approaches for both 3D and 2D object recognition tasks.

same-paper 5 0.84179294 359 cvpr-2013-Robust Discriminative Response Map Fitting with Constrained Local Models

Author: Akshay Asthana, Stefanos Zafeiriou, Shiyang Cheng, Maja Pantic

Abstract: We present a novel discriminative regression based approach for the Constrained Local Models (CLMs) framework, referred to as the Discriminative Response Map Fitting (DRMF) method, which shows impressive performance in the generic face fitting scenario. The motivation behind this approach is that, unlike the holistic texture based features used in the discriminative AAM approaches, the response map can be represented by a small set of parameters and these parameters can be very efficiently used for reconstructing unseen response maps. Furthermore, we show that by adopting very simple off-the-shelf regression techniques, it is possible to learn robust functions from response maps to the shape parameter updates. The experiments, conducted on the Multi-PIE, XM2VTS and LFPW databases, show that the proposed DRMF method outperforms state-of-the-art algorithms for the task of generic face fitting. Moreover, the DRMF method is computationally very efficient and is real-time capable. The current MATLAB implementation takes 1 second per image. To facilitate future comparisons, we release the MATLAB code1 and the pretrained models for research purposes.

6 0.83492231 399 cvpr-2013-Single-Sample Face Recognition with Image Corruption and Misalignment via Sparse Illumination Transfer

7 0.81791526 397 cvpr-2013-Simultaneous Super-Resolution of Depth and Images Using a Single Camera

8 0.81458408 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

9 0.81415719 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

10 0.80247664 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval

11 0.80016649 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection

12 0.79922646 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment

13 0.79781467 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics

14 0.79608566 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation

15 0.79525566 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video

16 0.79506528 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

17 0.79454988 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors

18 0.79401451 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence

19 0.79390597 416 cvpr-2013-Studying Relationships between Human Gaze, Description, and Computer Vision

20 0.79202688 220 cvpr-2013-In Defense of Sparsity Based Face Recognition