cvpr cvpr2013 cvpr2013-420 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xuehan Xiong, Fernando De_la_Torre
Abstract: Many computer vision problems (e.g., camera calibration, image alignment, structure from motion) are solved through a nonlinear optimization method. It is generally accepted that 2nd order descent methods are the most robust, fast and reliable approaches for nonlinear optimization of a general smooth function. However, in the context of computer vision, 2nd order descent methods have two main drawbacks: (1) The function might not be analytically differentiable and numerical approximations are impractical. (2) The Hessian might be large and not positive definite. To address these issues, this paper proposes a Supervised Descent Method (SDM) for minimizing a Non-linear Least Squares (NLS) function. During training, the SDM learns a sequence of descent directions that minimizes the mean of NLS functions sampled at different points. In testing, SDM minimizes the NLS objective using the learned descent directions without computing the Jacobian or the Hessian. We illustrate the benefits of our approach in synthetic and real examples, and show how SDM achieves state-of-the-art performance in the problem of facial feature detection. The code is available at www.humansensing.cs.cmu.edu/intraface.
Reference: text
sentIndex sentText sentNum sentScore
1 It is generally accepted that 2nd order descent methods are the most robust, fast and reliable approaches for nonlinear optimization of a general smooth function. [sent-6, score-0.171]
2 However, in the context of computer vision, 2nd order descent methods have two main drawbacks: (1) The function might not be analytically differentiable and numerical approximations are impractical. [sent-7, score-0.216]
3 During training, the SDM learns a sequence of descent directions that minimizes the mean of NLS functions sampled at different points. [sent-10, score-0.268]
4 In testing, SDM minimizes the NLS objective using the learned descent directions without computing the Jacobian or the Hessian. [sent-11, score-0.214]
5 We illustrate the benefits of our approach in synthetic and real examples, and show how SDM achieves state-of-the-art performance in the problem of facial feature detection. [sent-12, score-0.132]
6 SDM learns from training data a set of generic descent directions {Rk}. [sent-36, score-0.257]
7 Newton's method creates a sequence of updates xk+1 = xk − H−1(xk)Jf(xk), (1) where H(xk) ∈ Rp×p is the Hessian matrix evaluated at xk. [sent-49, score-0.163]
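A minimal sketch of the generic Newton iteration in Eq. 1 is given below; the toy objective, gradient, and Hessian are illustrative assumptions, not part of the paper.

```python
import numpy as np

def newton_minimize(grad, hess, x0, n_iters=10):
    """Generic Newton iteration: x_{k+1} = x_k - H(x_k)^{-1} J_f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        # Solve H dx = J_f rather than explicitly inverting the Hessian.
        x = x - np.linalg.solve(hess(x), grad(x))
    return x

# Toy objective f(x) = ||x||^2 + sin(x[0]) (hypothetical, for illustration only).
grad = lambda x: 2 * x + np.array([np.cos(x[0]), 0.0])
hess = lambda x: 2 * np.eye(2) + np.diag([-np.sin(x[0]), 0.0])
print(newton_minimize(grad, hess, [1.0, -2.0]))
```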
8 However, when applying Newton's method to computer vision problems, three main problems arise: (1) The Hessian is positive definite at the local minimum, but it might not be positive definite elsewhere; therefore, the Newton steps might not be taken in the descent direction. [sent-57, score-0.177]
9 For instance, consider the case of image alignment using SIFT [21] features, where the SIFT can be seen as a non-differentiable image operator. [sent-60, score-0.101]
10 In order to address previous limitations, this paper proposes a Supervised Descent Method (SDM) that learns the descent directions in a supervised manner. [sent-64, score-0.233]
11 The training data consists of a set of functions {f(x, yi)} sampled at different locations yi (i. [sent-79, score-0.078]
12 Using this training data, SDM learns a series of parameter updates which incrementally minimizes the mean of all NLS functions in training. [sent-82, score-0.079]
13 In testing, given an unseen y, an update is generated by projecting y-specific components onto the learned generic directions Rk. [sent-87, score-0.139]
14 We illustrate the benefits of SDM on analytic functions, and in the problem of facial feature detection and tracking. [sent-88, score-0.205]
15 We show how SDM improves state-of-the-art performance for facial feature detection in two “face in the wild” databases [26, 4] and demonstrate extremely good performance tracking faces in the YouTube celebrity database [20]. [sent-89, score-0.24]
16 Previous work. This section reviews previous work on face alignment. [sent-91, score-0.101]
17 Parameterized Appearance Models (PAMs), such as Active Appearance Models [11, 14, 2], Morphable Models [6, 19], Eigentracking [5], and template tracking [22, 30] build an object appearance and shape representation by computing Principal Component Analysis (PCA) on a set of manually labeled data. [sent-92, score-0.128]
18 Fig. 2a illustrates an image labeled with p landmarks (p = 66 in this case). [sent-94, score-0.155]
19 Given an image d, PAMs alignment algorithms optimize Eq. [sent-138, score-0.101]
20 [11] proposed to fit AAMs by learning a linear regression between the increment of motion parameters Δp and the appearance differences Δd. [sent-146, score-0.175]
21 The linear regressor is a numerical approximation of the Jacobian [11]. [sent-147, score-0.1]
22 Gradient Boosting, first introduced by Friedman [16], has become one of the most popular regressors in face alignment because of its efficiency and the ability to model nonlinearities. [sent-149, score-0.247]
23 [29] showed that using boosted regression for AAM discriminative fitting significantly improved over the original linear formulation. [sent-151, score-0.125]
24 b) Mean landmarks, x0, initialized using the face detector. [sent-155, score-0.101]
25 A new weak regressor is learned at each iteration, and the features are re-computed at the latest estimate of the landmark locations. [sent-156, score-0.136]
26 Beyond the gradient boosting, Rivera and Martinez [24] explored kernel regression to map from image features directly to landmark location achieving surprising results for low-resolution images. [sent-157, score-0.171]
27 [12] investigated Random Forest regressors in the context of face alignment. [sent-159, score-0.146]
28 [25] proposed to learn a regression model in the continuous domain to efficiently and uniformly sample the motion space. [sent-161, score-0.109]
29 [32] learned a set of independent linear predictors for different local motions, and then a subset of them is chosen during tracking. [sent-163, score-0.079]
30 Part-based deformable models perform alignment by maximizing the posterior likelihood of part locations given an image. [sent-164, score-0.123]
31 Zhu and Ramanan [31] assumed that the face shape is a tree structure (for fast inference), and used a part-based model for face detection, pose estimation, and facial feature detection. [sent-176, score-0.387]
32 Supervised Descent Method (SDM). This section describes the SDM in the context of face alignment, and unifies discriminative methods with PAMs. [sent-178, score-0.125]
33 During training, we will assume that the correct p landmarks (in our case 66) are known, and we will refer to them as x∗ (see Fig. [sent-190, score-0.109]
34 Also, to reproduce the testing scenario, we ran the face detector on the training images to provide an initial configuration of the landmarks (x0), which corresponds to an average shape (see Fig. [sent-192, score-0.294]
35 In this setting, face alignment can be framed as minimizing the following function over Δx: f(x0 + Δx) = ||h(d(x0 + Δx)) − φ∗||², where φ∗ = h(d(x∗)) denotes the features extracted at the correct landmarks. [sent-194, score-0.226]
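A minimal sketch of this NLS objective follows, with a placeholder feature operator standing in for the SIFT-based h(d(x)); the toy feature function is an assumption made purely for illustration.

```python
import numpy as np

def alignment_objective(delta_x, x0, image, extract_features, phi_star):
    """f(x0 + dx) = || h(d(x0 + dx)) - phi_* ||^2 over the landmark update dx."""
    phi = extract_features(image, x0 + delta_x)
    r = phi - phi_star
    return float(r @ r)

# Hypothetical stand-in for h(d(x)): pixel intensity sampled at each landmark.
def toy_features(image, landmarks):
    h, w = image.shape
    pts = np.clip(np.rint(landmarks).astype(int), 0, [w - 1, h - 1])
    return image[pts[:, 1], pts[:, 0]].astype(float)
```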
36 3 we do not learn any model of shape or appearance beforehand from training data. [sent-202, score-0.087]
37 For the shape, our model will be a non-parametric one, and we will optimize the landmark locations x ∈ R2p×1 directly. [sent-207, score-0.091]
38 Recall that in traditional PAMs, the non-rigid motion is modeled as a linear combination of shape bases learned by computing PCA on a training set. [sent-209, score-0.142]
39 Second, we use SIFT features extracted from patches around the landmarks to achieve a robust representation against illumination. [sent-213, score-0.109]
40 The goal of SDM is to learn a series of descent directions and re-scaling factors (done by the Hessian in the case of Newton’s method) such that it produces a sequence of updates (xk+1 = xk + Δxk ) starting from x0 that converges to x∗ in the training data. [sent-219, score-0.398]
41 The computation of this descent direction requires the function h to be twice differentiable, or else expensive numerical approximations of the Jacobian and Hessian. [sent-233, score-0.245]
42 In our supervised setting, we will directly estimate R0 from training data by learning a linear regression between Δx∗ = x∗ − x0 and Δφ0. [sent-234, score-0.161]
43 To use the descent direction during testing, we will not use the information of φ∗ for training. [sent-239, score-0.119]
44 To deal with non-quadratic functions, the SDM will generate a sequence of descent directions. [sent-245, score-0.143]
45 For a particular image, the Newton method generates a sequence of updates along the image-specific gradient directions, xk = xk−1 − 2H−1Jh⊤(φk−1 − φ∗). [sent-246, score-0.222]
46 In contrast, SDM will learn a sequence of generic descent directions {Rk} and bias terms {bk}, xk = xk−1 + Rk−1φk−1 + bk−1, (8) such that the succession of xk converges to x∗ for all images in the training set. [sent-249, score-0.396]
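A sketch of the test-time cascade in Eq. 8, assuming the landmarks are stored as a flattened 2p-vector and extract_features is the same kind of placeholder feature operator as in the earlier sketch:

```python
import numpy as np

def sdm_predict(image, x0, R_list, b_list, extract_features):
    """Apply the learned generic descent directions: x_k = x_{k-1} + R_{k-1} phi_{k-1} + b_{k-1}."""
    x = np.array(x0, dtype=float)
    for R, b in zip(R_list, b_list):
        phi = extract_features(image, x)  # re-extract features at the current landmark estimate
        x = x + R @ phi + b               # one supervised descent step; no Jacobian or Hessian needed
    return x
```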
47 Learning for SDM. This section illustrates how to learn Rk, bk from training data. [sent-252, score-0.116]
48 Assume that we are given a set of face images {di} and their corresponding labeled landmarks. [sent-253, score-0.101]
49 For the initial landmarks xi0, R0 and b0 are obtained by minimizing the expected loss between the predicted and the optimal landmark displacement under many possible initializations. [sent-256, score-0.123]
50 We assume that xi0 is sampled from a Normal distribution whose parameters capture the variance of a face detector. [sent-262, score-0.101]
51 The subsequent Rk , bk can be learned as follows. [sent-270, score-0.087]
52 More explicitly, after Rk−1, bk−1 are learned, we update the current landmark estimates xik using Eq. [sent-273, score-0.192]
53 We generate a new set of training data by computing the new optimal parameter update Δxik = xi∗ − xik and the new feature vector, φik = h(di(xik)). [sent-275, score-0.09]
54 Rk and bk can be learned by training a new linear regressor on the new training set, minimizing argmin over Rk, bk of Σi ||Δxi∗ − Rkφik − bk||² (Eq. 11). [sent-276, score-0.141]
55 The error monotonically decreases as a function of the number of regressors added. [sent-281, score-0.075]
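One way to implement this stage-wise learning is ordinary (ridge-regularized) least squares; the regularization term and the data layout below are assumptions added here, not details given by the sentences above.

```python
import numpy as np

def learn_stage(Phi, dX, ridge=1e-3):
    """Solve min_{R,b} sum_i ||dX_i - R Phi_i - b||^2 (one stage of Eq. 11).

    Phi: (n, m) features at the current landmark estimates.
    dX:  (n, 2p) target updates x* - x_k. Returns R (2p, m) and b (2p,).
    """
    n, m = Phi.shape
    A = np.hstack([Phi, np.ones((n, 1))])                        # append a bias column
    W = np.linalg.solve(A.T @ A + ridge * np.eye(m + 1), A.T @ dX)
    return W[:-1].T, W[-1]

def train_sdm(images, x_star, x_init, extract_features, n_stages=4):
    """Learn {R_k, b_k}; after each stage the landmark estimates are updated
    and the features are re-extracted, as described in the text."""
    X = [np.array(x, dtype=float) for x in x_init]
    R_list, b_list = [], []
    for _ in range(n_stages):
        Phi = np.stack([extract_features(im, x) for im, x in zip(images, X)])
        dX = np.stack([xs - x for xs, x in zip(x_star, X)])
        R, b = learn_stage(Phi, dX)
        R_list.append(R)
        b_list.append(b)
        X = [x + R @ phi + b for x, phi in zip(X, Phi)]          # Eq. 8 update on the training set
    return R_list, b_list
```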
56 Recent work on boosted regression [27, 29, 15, 10] learns a set of weak regressors to model the relation between φ and Δx. [sent-286, score-0.102]
57 SDM is developed to solve general NLS problems, while boosted regression is a greedy method to approximate the function mapping from φ to Δx. [sent-287, score-0.101]
58 In the original gradient boosting formulation [16], feature vectors are fixed throughout the optimization, while [15, 10] re-sample the features at the updated landmarks for training different weak regressors. [sent-288, score-0.2]
59 Although they have shown improvements using those re-sampled features, feature re-generation in regression is not well understood and invalidates some properties of gradient boosting. [sent-289, score-0.102]
60 In SDM, the linear regressor and feature re-generation arise naturally in our derivation from Newton's method. [sent-290, score-0.084]
61 Eq. 7 illustrates that a Newton update can be expressed as a linear combination of the feature differences between the one extracted at the current landmark locations and the template. [sent-292, score-0.193]
62 In previous work, it was unclear what the alignment error function is for discriminative methods. [sent-293, score-0.155]
63 In the second experiment, we tested the performance of the SDM in the problem of facial feature detection in two standard databases. [sent-311, score-0.132]
64 Finally, in the third experiment we illustrate how the method can be applied to facial feature tracking. [sent-312, score-0.132]
65 SDM on analytic scalar functions. This experiment compares the performance, in speed and accuracy, of SDM against Newton's method on four analytic functions. [sent-315, score-0.171]
66 Figure 3: Normalized error versus iterations on four analytic functions (see Table 1) using the Newton method and SDM. [sent-345, score-0.128]
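A toy 1-D reproduction of this kind of experiment can be written in a few lines; the choice of h(x) = exp(x), the sampling range, and the stage count are assumptions made only to illustrate the training/testing protocol described above.

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.exp                               # one analytic function, chosen here as an example
x_star = rng.uniform(-1.0, 1.0, 500)     # known minimizers of f(x) = (h(x) - y)^2
y = h(x_star)                            # sampled targets y_i = h(x*_i)
x = np.zeros_like(x_star)                # shared initialization x0 = 0

stages = []
for _ in range(5):                       # learn a short cascade of (r_k, b_k)
    phi = h(x) - y                       # the residual plays the role of the feature
    r, b = np.polyfit(phi, x_star - x, 1)
    stages.append((r, b))
    x = x + r * phi + b                  # supervised descent update

# Apply the learned generic directions to unseen targets.
xs_test = rng.uniform(-1.0, 1.0, 100)
xt, yt = np.zeros(100), h(xs_test)
for r, b in stages:
    xt = xt + r * (h(xt) - yt) + b
print("mean abs error:", float(np.mean(np.abs(xt - xs_test))))
```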
67 Not surprisingly, when the Newton method converges it provides more accurate estimation than SDM, because SDM uses a generic descent direction. [sent-347, score-0.187]
68 , h is a linear function of x), SDM will converge in one iteration, because the average gradient evaluated at different locations will be the same for linear functions. [sent-350, score-0.107]
69 Facial feature detection. This section reports experiments on facial feature detection in two “face in the wild” datasets, and compares SDM with state-of-the-art methods. [sent-354, score-0.132]
70 The two face databases are the LFPW dataset [4] and the LFW-A&C dataset [26]. [sent-355, score-0.101]
71 First the face is detected using the OpenCV face detector [7]. [sent-357, score-0.202]
72 The evaluation is performed on the images in which a face can be detected. [sent-358, score-0.101]
73 The initial shape estimate is given by centering the mean face at the normalized square. [sent-362, score-0.133]
74 The translational and scaling differences between the initial and true landmark locations are also computed, and their means and variances are used for generating Monte Carlo samples in Eq. [sent-363, score-0.134]
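A sketch of how such Monte Carlo initializations might be drawn is given below, assuming the translation/scale statistics have already been measured on the training set; the argument names are hypothetical.

```python
import numpy as np

def sample_initializations(mean_shape, t_mean, t_std, s_mean, s_std, n_samples=10, rng=None):
    """Perturb the mean shape with Normal translation/scale noise that mimics
    the variability of a face detector. mean_shape is a (p, 2) array of landmarks
    already centered in the detected face box."""
    rng = rng if rng is not None else np.random.default_rng()
    centroid = mean_shape.mean(axis=0)
    samples = []
    for _ in range(n_samples):
        s = rng.normal(s_mean, s_std)                    # scale perturbation
        t = rng.normal(t_mean, t_std, size=2)            # translation perturbation (pixels)
        samples.append((mean_shape - centroid) * s + centroid + t)
    return samples
```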
75 LFPW dataset contains images downloaded from the web that exhibit large variations in pose, illumination, and facial expression. [sent-369, score-0.132]
76 Note that SDM is different from the AAM trained in a discriminative manner with linear regression [11], because we do not learn any shape or appearance model (it is non-parametric). [sent-380, score-0.173]
77 The recently proposed method in [10] is based on boosted regression with pose-indexed features. [sent-385, score-0.101]
78 Most errors are caused by the gradient features' inability to distinguish between similar facial parts and occluding objects (e. [sent-394, score-0.16]
79 Each image is annotated with the same 66 landmarks shown in Fig. [sent-398, score-0.109]
80 Root mean squared error (RMSE) is used to measure the alignment accuracy. [sent-404, score-0.131]
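For reference, a minimal version of this metric; the averaging convention over landmarks is an assumption here.

```python
import numpy as np

def rmse(pred, gt):
    """Root mean squared landmark error between two (p, 2) landmark arrays."""
    return float(np.sqrt(np.mean(np.sum((pred - gt) ** 2, axis=1))))
```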
81 PRA reported a median alignment error of 2. [sent-406, score-0.03]
82 Facial feature tracking. This section tests the use of SDM for facial feature tracking. [sent-415, score-0.181]
83 We trained our model with 66 landmarks on MPIE [17] and LFW-A&C; datasets. [sent-431, score-0.109]
84 It indicates that in two consecutive frames the probability of a tracked face shifting more than 20 pixels or scaling more than 10% is less than 5%. [sent-434, score-0.101]
85 The dataset is labeled with the same 66 landmarks as our trained model, except the 17 jaw points that are defined slightly differently (see Fig. [sent-438, score-0.132]
86 To make sense of the numerical results, in the same figure we also show one tracking result overlaid with the ground truth; in this example it gives an RMS error of 5. [sent-444, score-0.118]
87 Also, the person-specific AAM gives unreliable results when the subject’s face is partially occluded while SDM still provides a robust estimation (See Fig. [sent-447, score-0.101]
88 It was released as a dataset for face tracking and recognition so no labeled facial landmarks are given. [sent-452, score-0.414]
89 From the videos, we can observe that SDM can reliably track facial landmarks with large pose variation (±45◦ yaw, ±90◦ roll, and ±30◦ pitch), occlusion, and illumination changes. [sent-455, score-0.311]
90 SDM learns generic descent directions in a supervised manner, and is able to overcome many drawbacks of second-order optimization schemes, such as non-differentiability and expensive computation of the Jacobians and Hessians. [sent-460, score-0.235]
91 We have illustrated the benefits of our approach in the minimization of analytic functions, and in the problem of facial feature detection and tracking. [sent-462, score-0.205]
92 We have shown how SDM outperforms state-of-the-art approaches in facial feature detection and tracking in challenging databases. [sent-463, score-0.181]
93 Beyond the SDM, an important contribution of this work in the context of algorithms for image alignment is to propose the error function of Eq. [sent-464, score-0.131]
94 Existing discriminative methods for facial alignment pose the problem as a regression one, but lack a well-defined alignment error function. [sent-466, score-0.483]
95 Eq. 3 allows us to establish a direct connection with existing PAMs for face alignment, and to apply existing algorithms for minimizing it, such as Gauss-Newton (or the supervised version proposed in this paper). [sent-468, score-0.162]
96 Boix for the implementation of the linear and kernel regression method in the fall of 2008. [sent-473, score-0.093]
97 Damped Newton algorithms for matrix factorization with missing data. [sent-537, score-0.276]
98 Robust and accurate shape model fitting using random forest regression voting. [sent-566, score-0.106]
99 Face detection, pose estimation, and landmark localization in the wild. [sent-674, score-0.09]
100 The first two rows show faces with strong changes in pose and illumination, and faces partially occluded. [sent-683, score-0.081]
wordName wordTfidf (topN-words)
[('sdm', 0.778), ('newton', 0.276), ('nls', 0.168), ('facial', 0.132), ('descent', 0.119), ('hessian', 0.117), ('landmarks', 0.109), ('xk', 0.107), ('alignment', 0.101), ('lfpw', 0.101), ('face', 0.101), ('saragih', 0.078), ('pams', 0.076), ('jacobian', 0.074), ('regression', 0.074), ('jf', 0.073), ('analytic', 0.073), ('rk', 0.069), ('landmark', 0.069), ('bk', 0.062), ('aams', 0.059), ('ced', 0.054), ('aam', 0.052), ('tracking', 0.049), ('torre', 0.049), ('directions', 0.047), ('celebrities', 0.047), ('xik', 0.046), ('regressors', 0.045), ('morphable', 0.044), ('rms', 0.042), ('regressor', 0.042), ('cootes', 0.041), ('numerical', 0.039), ('converges', 0.038), ('rivera', 0.038), ('txep', 0.038), ('zimmermann', 0.038), ('pca', 0.038), ('update', 0.037), ('supervised', 0.037), ('youtube', 0.035), ('motion', 0.035), ('nonlinear', 0.033), ('updates', 0.032), ('boosting', 0.032), ('shape', 0.032), ('training', 0.031), ('pra', 0.031), ('imagespecific', 0.031), ('approximations', 0.031), ('error', 0.03), ('sift', 0.03), ('faces', 0.03), ('learns', 0.03), ('generic', 0.03), ('la', 0.03), ('twice', 0.029), ('tresadern', 0.029), ('celebrity', 0.029), ('eigentracking', 0.029), ('definite', 0.029), ('di', 0.028), ('gradient', 0.028), ('differentiable', 0.027), ('erf', 0.027), ('boosted', 0.027), ('stays', 0.026), ('wild', 0.025), ('functions', 0.025), ('learned', 0.025), ('ua', 0.024), ('discriminative', 0.024), ('minimizing', 0.024), ('appearance', 0.024), ('sequence', 0.024), ('labeled', 0.023), ('minimizes', 0.023), ('illustrates', 0.023), ('jd', 0.023), ('xto', 0.023), ('differences', 0.023), ('derivation', 0.023), ('xi', 0.022), ('opencv', 0.022), ('locations', 0.022), ('pose', 0.021), ('wx', 0.021), ('testing', 0.021), ('matthews', 0.02), ('sequences', 0.02), ('translational', 0.02), ('carlo', 0.019), ('converge', 0.019), ('observe', 0.019), ('optimization', 0.019), ('linear', 0.019), ('doll', 0.019), ('ks', 0.019), ('constrained', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 420 cvpr-2013-Supervised Descent Method and Its Applications to Face Alignment
Author: Xuehan Xiong, Fernando De_la_Torre
Abstract: Many computer vision problems (e.g., camera calibration, image alignment, structure from motion) are solved through a nonlinear optimization method. It is generally accepted that 2nd order descent methods are the most robust, fast and reliable approaches for nonlinear optimization of a general smooth function. However, in the context of computer vision, 2nd order descent methods have two main drawbacks: (1) The function might not be analytically differentiable and numerical approximations are impractical. (2) The Hessian might be large and not positive definite. To address these issues, this paper proposes a Supervised Descent Method (SDM) for minimizing a Non-linear Least Squares (NLS) function. During training, the SDM learns a sequence of descent directions that minimizes the mean of NLS functions sampled at different points. In testing, SDM minimizes the NLS objective using the learned descent directions without computing the Jacobian or the Hessian. We illustrate the benefits of our approach in synthetic and real examples, and show how SDM achieves state-of-the-art performance in the problem of facial feature detection. The code is available at www.humansensing.cs.cmu.edu/intraface.
Author: Yue Wu, Zuoguan Wang, Qiang Ji
Abstract: Facial feature tracking is an active area in computer vision due to its relevance to many applications. It is a nontrivial task, since faces may have varying facial expressions, poses or occlusions. In this paper, we address this problem by proposing a face shape prior model that is constructed based on the Restricted Boltzmann Machines (RBM) and their variants. Specifically, we first construct a model based on Deep Belief Networks to capture the face shape variations due to varying facial expressions for near-frontal view. To handle pose variations, the frontal face shape prior model is incorporated into a 3-way RBM model that could capture the relationship between frontal face shapes and non-frontal face shapes. Finally, we introduce methods to systematically combine the face shape prior models with image measurements of facial feature points. Experiments on benchmark databases show that with the proposed method, facial feature points can be tracked robustly and accurately even if faces have significant facial expressions and poses.
Author: Dong Chen, Xudong Cao, Fang Wen, Jian Sun
Abstract: Making a high-dimensional (e.g., 100K-dim) feature for face recognition seems not a good idea because it will bring difficulties on consequent training, computation, and storage. This prevents further exploration of the use of a high-dimensional feature. In this paper, we study the performance of a high-dimensional feature. We first empirically show that high dimensionality is critical to high performance. A 100K-dim feature, based on a single-type Local Binary Pattern (LBP) descriptor, can achieve significant improvements over both its low-dimensional version and the state-of-the-art. We also make the high-dimensional feature practical. With our proposed sparse projection method, named rotated sparse regression, both computation and model storage can be reduced by over 100 times without sacrificing accuracy quality.
4 0.12209017 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
Author: Xiaohui Shen, Zhe Lin, Jonathan Brandt, Ying Wu
Abstract: Detecting faces in uncontrolled environments continues to be a challenge to traditional face detection methods [24] due to the large variation in facial appearances, as well as occlusion and clutter. In order to overcome these challenges, we present a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning. A large database of faces with bounding rectangles and facial landmark locations is collected, and simple discriminative classifiers are learned from each of them. A voting-based method is then proposed to let these classifiers cast votes on the test image through an efficient image retrieval technique. As a result, faces can be very efficiently detected by selecting the modes from the voting maps, without resorting to exhaustive sliding window-style scanning. Moreover, due to the exemplar-based framework, our approach can detect faces under challenging conditions without explicitly modeling their variations. Evaluation on two public benchmark datasets shows that our new face detection approach is accurate and efficient, and achieves the state-of-the-art performance. We further propose to use image retrieval for face validation (in order to remove false positives) and for face alignment/landmark localization. The same methodology can also be easily generalized to other face-related tasks, such as attribute recognition, as well as general object detection.
5 0.11639147 359 cvpr-2013-Robust Discriminative Response Map Fitting with Constrained Local Models
Author: Akshay Asthana, Stefanos Zafeiriou, Shiyang Cheng, Maja Pantic
Abstract: We present a novel discriminative regression based approach for the Constrained Local Models (CLMs) framework, referred to as the Discriminative Response Map Fitting (DRMF) method, which shows impressive performance in the generic face fitting scenario. The motivation behind this approach is that, unlike the holistic texture based features used in the discriminative AAM approaches, the response map can be represented by a small set of parameters and these parameters can be very efficiently used for reconstructing unseen response maps. Furthermore, we show that by adopting very simple off-the-shelf regression techniques, it is possible to learn robust functions from response maps to the shape parameters updates. The experiments, conducted on Multi-PIE, XM2VTS and LFPW database, show that the proposed DRMF method outperforms state-of-the-art algorithms for the task of generic face fitting. Moreover, the DRMF method is computationally very efficient and is real-time capable. The current MATLAB implementation takes 1 second per image. To facilitate future comparisons, we release the MATLAB code and the pretrained models for research purposes.
6 0.1064721 438 cvpr-2013-Towards Pose Robust Face Recognition
7 0.099888384 399 cvpr-2013-Single-Sample Face Recognition with Image Corruption and Misalignment via Sparse Illumination Transfer
8 0.096972041 96 cvpr-2013-Correlation Filters for Object Alignment
9 0.096430384 77 cvpr-2013-Capturing Complex Spatio-temporal Relations among Facial Muscles for Facial Expression Recognition
10 0.095059365 152 cvpr-2013-Exemplar-Based Face Parsing
11 0.090854891 338 cvpr-2013-Probabilistic Elastic Matching for Pose Variant Face Verification
12 0.089229636 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification
13 0.087328285 159 cvpr-2013-Expressive Visual Text-to-Speech Using Active Appearance Models
14 0.082641035 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos
15 0.079090931 182 cvpr-2013-Fusing Robust Face Region Descriptors via Multiple Metric Learning for Face Recognition in the Wild
16 0.073417962 415 cvpr-2013-Structured Face Hallucination
17 0.072467536 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
18 0.072182678 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking
19 0.069615424 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking
20 0.068566389 453 cvpr-2013-Video Editing with Temporal, Spatial and Appearance Consistency
topicId topicWeight
[(0, 0.154), (1, -0.012), (2, -0.053), (3, 0.006), (4, 0.029), (5, -0.01), (6, 0.01), (7, -0.102), (8, 0.162), (9, -0.09), (10, 0.042), (11, 0.002), (12, 0.013), (13, 0.039), (14, 0.031), (15, 0.016), (16, 0.001), (17, 0.007), (18, 0.033), (19, 0.064), (20, -0.015), (21, -0.006), (22, 0.005), (23, 0.001), (24, 0.032), (25, 0.079), (26, -0.032), (27, 0.007), (28, 0.076), (29, 0.04), (30, -0.033), (31, -0.036), (32, -0.074), (33, -0.019), (34, -0.073), (35, -0.0), (36, -0.011), (37, -0.044), (38, 0.013), (39, -0.009), (40, 0.035), (41, 0.0), (42, -0.029), (43, 0.017), (44, -0.032), (45, 0.02), (46, -0.025), (47, -0.005), (48, 0.009), (49, -0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.91605461 420 cvpr-2013-Supervised Descent Method and Its Applications to Face Alignment
Author: Xuehan Xiong, Fernando De_la_Torre
Abstract: Many computer vision problems (e.g., camera calibration, image alignment, structure from motion) are solved through a nonlinear optimization method. It is generally accepted that 2nd order descent methods are the most robust, fast and reliable approaches for nonlinear optimization of a general smooth function. However, in the context of computer vision, 2nd order descent methods have two main drawbacks: (1) The function might not be analytically differentiable and numerical approximations are impractical. (2) The Hessian might be large and not positive definite. To address these issues, this paper proposes a Supervised Descent Method (SDM) for minimizing a Non-linear Least Squares (NLS) function. During training, the SDM learns a sequence of descent directions that minimizes the mean of NLS functions sampled at different points. In testing, SDM minimizes the NLS objective using the learned descent directions without computing the Jacobian or the Hessian. We illustrate the benefits of our approach in synthetic and real examples, and show how SDM achieves state-of-the-art performance in the problem of facial feature detection. The code is available at www.humansensing.cs.cmu.edu/intraface.
Author: Yue Wu, Zuoguan Wang, Qiang Ji
Abstract: Facial feature tracking is an active area in computer vision due to its relevance to many applications. It is a nontrivial task, since faces may have varying facial expressions, poses or occlusions. In this paper, we address this problem by proposing a face shape prior model that is constructed based on the Restricted Boltzmann Machines (RBM) and their variants. Specifically, we first construct a model based on Deep Belief Networks to capture the face shape variations due to varying facial expressions for near-frontal view. To handle pose variations, the frontal face shape prior model is incorporated into a 3-way RBM model that could capture the relationship between frontal face shapes and non-frontal face shapes. Finally, we introduce methods to systematically combine the face shape prior models with image measurements of facial feature points. Experiments on benchmark databases show that with the proposed method, facial feature points can be tracked robustly and accurately even if faces have significant facial expressions and poses.
3 0.81143385 359 cvpr-2013-Robust Discriminative Response Map Fitting with Constrained Local Models
Author: Akshay Asthana, Stefanos Zafeiriou, Shiyang Cheng, Maja Pantic
Abstract: We present a novel discriminative regression based approach for the Constrained Local Models (CLMs) framework, referred to as the Discriminative Response Map Fitting (DRMF) method, which shows impressive performance in the generic face fitting scenario. The motivation behind this approach is that, unlike the holistic texture based features used in the discriminative AAM approaches, the response map can be represented by a small set of parameters and these parameters can be very efficiently used for reconstructing unseen response maps. Furthermore, we show that by adopting very simple off-the-shelf regression techniques, it is possible to learn robust functions from response maps to the shape parameters updates. The experiments, conducted on Multi-PIE, XM2VTS and LFPW database, show that the proposed DRMF method outperforms state-of-the-art algorithms for the task of generic face fitting. Moreover, the DRMF method is computationally very efficient and is real-time capable. The current MATLAB implementation takes 1 second per image. To facilitate future comparisons, we release the MATLAB code and the pretrained models for research purposes.
4 0.78359437 159 cvpr-2013-Expressive Visual Text-to-Speech Using Active Appearance Models
Author: Robert Anderson, Björn Stenger, Vincent Wan, Roberto Cipolla
Abstract: This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a ‘talking head’, given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies, comparing the output of different systems.
5 0.7772966 415 cvpr-2013-Structured Face Hallucination
Author: Chih-Yuan Yang, Sifei Liu, Ming-Hsuan Yang
Abstract: The goal of face hallucination is to generate high-resolution images with fidelity from low-resolution ones. In contrast to existing methods based on patch similarity or holistic constraints in the image space, we propose to exploit local image structures for face hallucination. Each face image is represented in terms of facial components, contours and smooth regions. The image structure is maintained via matching gradients in the reconstructed high-resolution output. For facial components, we align input images to generate accurate exemplars and transfer the high-frequency details for preserving structural consistency. For contours, we learn statistical priors to generate salient structures in the high-resolution images. A patch matching method is utilized on the smooth regions where the image gradients are preserved. Experimental results demonstrate that the proposed algorithm generates hallucinated face images with favorable quality and adaptability.
6 0.70287013 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
7 0.69031143 438 cvpr-2013-Towards Pose Robust Face Recognition
8 0.68555474 385 cvpr-2013-Selective Transfer Machine for Personalized Facial Action Unit Detection
9 0.66709733 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
10 0.66160637 152 cvpr-2013-Exemplar-Based Face Parsing
11 0.65277356 96 cvpr-2013-Correlation Filters for Object Alignment
12 0.62784231 64 cvpr-2013-Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification
13 0.62015408 77 cvpr-2013-Capturing Complex Spatio-temporal Relations among Facial Muscles for Facial Expression Recognition
14 0.61652178 182 cvpr-2013-Fusing Robust Face Region Descriptors via Multiple Metric Learning for Face Recognition in the Wild
15 0.61220092 463 cvpr-2013-What's in a Name? First Names as Facial Attributes
16 0.59305012 399 cvpr-2013-Single-Sample Face Recognition with Image Corruption and Misalignment via Sparse Illumination Transfer
17 0.59260118 358 cvpr-2013-Robust Canonical Time Warping for the Alignment of Grossly Corrupted Sequences
18 0.58550143 338 cvpr-2013-Probabilistic Elastic Matching for Pose Variant Face Verification
19 0.57580751 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification
20 0.54558671 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
topicId topicWeight
[(10, 0.111), (16, 0.018), (26, 0.097), (33, 0.249), (39, 0.024), (55, 0.194), (67, 0.064), (69, 0.044), (87, 0.082)]
simIndex simValue paperId paperTitle
1 0.90231198 159 cvpr-2013-Expressive Visual Text-to-Speech Using Active Appearance Models
Author: Robert Anderson, Björn Stenger, Vincent Wan, Roberto Cipolla
Abstract: This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a ‘talking head’, given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies, comparing the output of different systems.
2 0.88965929 26 cvpr-2013-A Statistical Model for Recreational Trails in Aerial Images
Author: Andrew Predoehl, Scott Morris, Kobus Barnard
Abstract: unkown-abstract
3 0.88055617 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
Author: Luca Del_Pero, Joshua Bowdish, Bonnie Kermgard, Emily Hartley, Kobus Barnard
Abstract: We develop a comprehensive Bayesian generative model for understanding indoor scenes. While it is common in this domain to approximate objects with 3D bounding boxes, we propose using strong representations with finer granularity. For example, we model a chair as a set of four legs, a seat and a backrest. We find that modeling detailed geometry improves recognition and reconstruction, and enables more refined use of appearance for scene understanding. We demonstrate this with a new likelihood function that rewards 3D object hypotheses whose 2D projection is more uniform in color distribution. Such a measure would be confused by background pixels if we used a bounding box to represent a concave object like a chair. Complex objects are modeled using a set of re-usable 3D parts, and we show that this representation captures much of the variation among object instances with relatively few parameters. We also designed specific data-driven inference mechanisms for each part that are shared by all objects containing that part, which helps make inference transparent to the modeler. Further, we show how to exploit contextual relationships to detect more objects, by, for example, proposing chairs around and underneath tables. We present results showing the benefits of each of these innovations. The performance of our approach often exceeds that of state-of-the-art methods on the two tasks of room layout estimation and object recognition, as evaluated on two benchmark data sets used in this domain. 1) Detailed geometric models, such as tables with legs and top (bottom left), provide better reconstructions than plain boxes (top right), when supported by image features such as geometric context [5] (top middle), or an approach to using color introduced here. 2) Non-convex models allow for complex configurations, such as a chair under a table (bottom middle). 3) 3D contextual relationships, such as chairs being around a table, allow identifying objects supported by little image evidence, like the chair behind the table (bottom right). Best viewed in color.
same-paper 4 0.85607404 420 cvpr-2013-Supervised Descent Method and Its Applications to Face Alignment
Author: Xuehan Xiong, Fernando De_la_Torre
Abstract: Many computer vision problems (e.g., camera calibration, image alignment, structure from motion) are solved through a nonlinear optimization method. It is generally accepted that 2nd order descent methods are the most robust, fast and reliable approaches for nonlinear optimization of a general smooth function. However, in the context of computer vision, 2nd order descent methods have two main drawbacks: (1) The function might not be analytically differentiable and numerical approximations are impractical. (2) The Hessian might be large and not positive definite. To address these issues, this paper proposes a Supervised Descent Method (SDM) for minimizing a Non-linear Least Squares (NLS) function. During training, the SDM learns a sequence of descent directions that minimizes the mean of NLS functions sampled at different points. In testing, SDM minimizes the NLS objective using the learned descent directions without computing the Jacobian or the Hessian. We illustrate the benefits of our approach in synthetic and real examples, and show how SDM achieves state-of-the-art performance in the problem of facial feature detection. The code is available at www.humansensing.cs.cmu.edu/intraface.
5 0.83763021 311 cvpr-2013-Occlusion Patterns for Object Class Detection
Author: Bojan Pepikj, Michael Stark, Peter Gehler, Bernt Schiele
Abstract: Despite the success of recent object class recognition systems, the long-standing problem of partial occlusion remains a major challenge, and a principled solution is yet to be found. In this paper we leave the beaten path of methods that treat occlusion as just another source of noise; instead, we include the occluder itself into the modelling, by mining distinctive, reoccurring occlusion patterns from annotated training data. These patterns are then used as training data for dedicated detectors of varying sophistication. In particular, we evaluate and compare models that range from standard object class detectors to hierarchical, part-based representations of occluder/occludee pairs. In an extensive evaluation we derive insights that can aid further developments in tackling the occlusion challenge.
6 0.83203578 440 cvpr-2013-Tracking People and Their Objects
7 0.82986516 152 cvpr-2013-Exemplar-Based Face Parsing
8 0.82551879 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
9 0.82253253 280 cvpr-2013-Maximum Cohesive Grid of Superpixels for Fast Object Localization
10 0.82134563 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
11 0.82086068 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
12 0.82058549 381 cvpr-2013-Scene Parsing by Integrating Function, Geometry and Appearance Models
13 0.81945395 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
14 0.81938773 414 cvpr-2013-Structure Preserving Object Tracking
15 0.81909627 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
16 0.81864876 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
17 0.81796068 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
18 0.81768692 325 cvpr-2013-Part Discovery from Partial Correspondence
19 0.81630605 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
20 0.81568176 19 cvpr-2013-A Minimum Error Vanishing Point Detection Approach for Uncalibrated Monocular Images of Man-Made Environments