iccv iccv2013 iccv2013-106 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Abstract: Face recognition with large pose and illumination variations is a challenging problem in computer vision. This paper addresses this challenge by proposing a new learning-based face representation: the face identity-preserving (FIP) features. Unlike conventional face descriptors, the FIP features can significantly reduce intra-identity variances, while maintaining discriminativeness between identities. Moreover, the FIP features extracted from an image under any pose and illumination can be used to reconstruct its face image in the canonical view. This property makes it possible to improve the performance of traditional descriptors, such as LBP [2] and Gabor [31], which can be extracted from our reconstructed images in the canonical view to eliminate variations. In order to learn the FIP features, we carefully design a deep network that combines the feature extraction layers and the reconstruction layer. The former encodes a face image into the FIP features, while the latter transforms them to an image in the canonical view. Extensive experiments on the large MultiPIE face database [7] demonstrate that it significantly outperforms the state-of-the-art face recognition methods.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Face recognition with large pose and illumination variations is a challenging problem in computer vision. [sent-6, score-0.253]
2 This paper addresses this challenge by proposing a new learning-based face representation: the face identity-preserving (FIP) features. [sent-7, score-0.536]
3 Unlike conventional face descriptors, the FIP features can significantly reduce intra-identity variances, while maintaining discriminativeness between identities. [sent-8, score-0.346]
4 Moreover, the FIP features extracted from an image under any pose and illumination can be used to reconstruct its face image in the canonical view. [sent-9, score-0.611]
5 This property makes it possible to improve the performance of traditional descriptors, such as LBP [2] and Gabor [31], which can be extracted from our reconstructed images in the canonical view to eliminate variations. [sent-10, score-0.301]
6 In order to learn the FIP features, we carefully design a deep network that combines the feature extraction layers and the reconstruction layer. [sent-11, score-0.469]
7 The former encodes a face image into the FIP features, while the latter transforms them to an image in the canonical view. [sent-12, score-0.359]
8 Extensive experiments on the large MultiPIE face database [7] demonstrate that it significantly outperforms the state-of-the-art face recognition methods. [sent-13, score-0.541]
9 Introduction In many practical applications, pose and illumination changes become the bottleneck for face recognition [36]. [sent-15, score-0.475]
10 (b) shows some images of two identities, including the original image (left) and the reconstructed image in the canonical view (right) from the FIP features. [sent-33, score-0.28]
11 The reconstructed images remove the pose and illumination variations and retain the intrinsic face structures of the identities. [sent-34, score-0.684]
12 [17] represented a test face as a linear combination of training images, and utilized the linear regression coefficients as features for face recognition. [sent-39, score-0.538]
13 3D-based methods usually capture 3D face data or estimate 3D models from 2D input, and try to match them to a 2D probe face image. [sent-40, score-0.603]
14 Such methods make it possible to synthesize any view of the probe face, which makes them generally more robust to pose variation. [sent-41, score-0.222]
15 [18] first generated a virtual view for the probe face by using a set of 3D displacement fields sampled from a 3D face database, and then matched the synthesized face with the gallery faces. [sent-43, score-0.948]
16 make assumptions about how illumination affects the face images, and use these assumptions to model and remove the illumination effect. [sent-51, score-0.461]
17 With this augmented gallery, they adopted sparse coding to perform face recognition. [sent-54, score-0.275]
18 In this paper, unlike previous works that either build physical models or make statistical assumptions, we propose a novel face representation, the face identity-preserving (FIP) features, which are directly extracted from face images with arbitrary poses and illuminations. [sent-59, score-0.885]
19 This new representation can significantly remove pose and illumination variations, while maintaining the discriminativeness across identities, as shown in Fig. [sent-60, score-0.249]
20 Unlike LBP [2], Gabor [31], and LE [4], which cannot recover the original images, the FIP features can reconstruct face images in the frontal pose and with neutral illumination (we call it the canonical view) of the same identity, as shown in Fig. [sent-64, score-0.735]
21 With this attractive property, the conventional descriptors and learning algorithms can utilize our reconstructed face images in the canonical view as input so as to eliminate the negative effects from poses and illuminations. [sent-66, score-0.69]
22 Specifically, we present a new deep network to learn the FIP features. [sent-67, score-0.266]
23 It utilizes face images with arbitrary pose and illumination variations of an identity as input, and reconstructs a face in the canonical view of the same identity as the target (see Fig. [sent-68, score-0.99]
24 First, input images are encoded through feature extraction layers, which have three locally connected layers and two pooling layers stacked alternately. [sent-70, score-0.398]
25 Each layer captures face features at a different scale. [sent-71, score-0.371]
26 As shown in Fig. 3, the first locally connected layer outputs 32 feature maps. [sent-73, score-0.204]
27 Each map has a large number of high responses outside the face region, which mainly capture pose information, and some high responses inside the face region, which capture face structures (red indicates large response and blue indicates no response). [sent-74, score-0.928]
28 On the output feature maps of the second locally connected layer, high responses outside the face region have been significantly reduced, which indicates that it discards most pose variations while retaining the face structures. [sent-75, score-0.805]
29 The third locally connected layer outputs the FIP features, which are sparse and identity-preserving. [sent-76, score-0.204]
30 Second, the FIP features recover the face image in the canonical view using a fully-connected reconstruction layer. [sent-77, score-0.474]
31 We then update all the parameters by back-propagating the summed squared reconstruction error between the reconstructed image and the ground truth. [sent-81, score-0.197]
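To make the pipeline described above concrete, the following is a minimal PyTorch sketch, not the authors' implementation. Only the overall structure follows the text: three locally connected layers stacked alternately with two pooling layers, a bias-free fully connected reconstruction layer, and a summed squared reconstruction error trained by back-propagation. The 96×96 input size, 5×5 kernels, 2×2 max pooling, the use of sigmoid for the activation σ, and the SGD settings are all illustrative assumptions.

```python
# Minimal sketch of the FIP pipeline (illustrative sizes, not the paper's exact config).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocallyConnected2d(nn.Module):
    """Convolution-like layer WITHOUT weight sharing: every output
    spatial location owns its own filter."""
    def __init__(self, in_ch, out_ch, in_size, kernel):
        super().__init__()
        self.out_size = in_size - kernel + 1            # stride 1, no padding
        n_loc = self.out_size * self.out_size
        # one filter per spatial location: (out_ch, locations, in_ch*k*k)
        self.weight = nn.Parameter(0.01 * torch.randn(out_ch, n_loc, in_ch * kernel * kernel))
        self.kernel = kernel

    def forward(self, x):
        patches = F.unfold(x, self.kernel)              # (B, in_ch*k*k, locations)
        # per-location response: out[b,o,l] = sum_p W[o,l,p] * patches[b,p,l]
        y = torch.einsum('olp,bpl->bol', self.weight, patches)
        return y.view(x.size(0), -1, self.out_size, self.out_size)

class FIPNet(nn.Module):
    """Three locally connected layers alternating with two pooling layers
    (feature extraction), then a fully connected reconstruction layer."""
    def __init__(self):
        super().__init__()
        self.lc1 = LocallyConnected2d(1, 32, 96, 5)     # 96 -> 92; 32 maps as in Fig. 3
        self.lc2 = LocallyConnected2d(32, 32, 46, 5)    # after 2x2 pool: 46 -> 42
        self.lc3 = LocallyConnected2d(32, 32, 21, 5)    # after 2x2 pool: 21 -> 17
        self.w4 = nn.Linear(32 * 17 * 17, 96 * 96, bias=False)  # y = sigma(W4 x3)

    def forward(self, x0):
        h = F.max_pool2d(torch.sigmoid(self.lc1(x0)), 2)
        h = F.max_pool2d(torch.sigmoid(self.lc2(h)), 2)
        x3 = torch.sigmoid(self.lc3(h))                 # the FIP features
        y = torch.sigmoid(self.w4(x3.flatten(1)))       # canonical-view image
        return x3, y

net = FIPNet()
opt = torch.optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)  # assumed optimizer
x0 = torch.rand(4, 1, 96, 96)       # toy batch of arbitrary-pose inputs
y_gt = torch.rand(4, 96 * 96)       # canonical-view ground truth, flattened

x3, y = net(x0)
loss = ((y - y_gt) ** 2).sum()      # summed squared reconstruction error
opt.zero_grad(); loss.backward(); opt.step()
```

Training against a full canonical-view image, rather than a class label, is what provides the stronger supervision discussed in the next paragraphs.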
32 Existing deep learning methods for face recognition generally fall into two categories: (1) unsupervised feature learning with deep models, followed by discriminative methods (e.g., SVM) for classification [21, 10, 15]; and (2) directly using class labels as supervision of deep models [6, 24]. [sent-82, score-0.719] [sent-84, score-0.204]
34 In the first category, features related to identity, poses, and lightings are coupled when learned by deep models. [sent-85, score-0.247]
35 In the second category, a '0/1' class label provides much weaker supervision, compared with ours, which uses a face image (with thousands of pixels) in the canonical view as supervision. [sent-88, score-0.392]
36 We require the deep model to fully reconstruct the face in the canonical view rather than simply predicting class labels, and this strong regularization is more effective at avoiding overfitting. [sent-89, score-0.632]
37 This design is suitable for face recognition, where a canonical view exists. [sent-90, score-0.392]
38 Different from convolutional neural networks, whose filters share weights, our filters are localized and do not share weights, since we assume different face regions should employ different features. [sent-91, score-0.409]
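As a back-of-the-envelope illustration of this design choice, the sketch below compares parameter counts for a shared-weight convolution and a locally connected layer of the same geometry; the sizes are assumptions, not taken from the paper.

```python
# Shared-weight convolution vs. locally connected layer: parameter counts.
# Sizes (1 input map, 32 output maps, 5x5 kernels, 92x92 output grid) are
# illustrative assumptions.
in_ch, out_ch, k = 1, 32, 5
out_h = out_w = 92                  # e.g., a 96x96 input, 5x5 kernel, stride 1

conv_params = out_ch * in_ch * k * k          # one filter bank, shared everywhere
local_params = conv_params * out_h * out_w    # a separate filter per location

print(conv_params)   # 800
print(local_params)  # 6771200 -- hence the "millions of parameters" noted later
```

The price of dropping weight sharing is exactly this blow-up in parameters, which is why the training discussion below emphasizes the difficulty of estimating the weight matrices.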
39 (1) We propose a new deep network that combines the feature extraction layers and the reconstruction layer. [sent-93, score-0.469]
40 These features can eliminate the pose and illumination variations. [sent-95, score-0.217]
41 It combines the feature extraction layers and reconstruction layer. [sent-101, score-0.203]
42 The feature extraction layers include three locally connected layers and two pooling layers. [sent-102, score-0.376]
43 FIP features can be used to recover the face image y in the canonical view. [sent-105, score-0.404]
44 (2) Unlike conventional face descriptors, the FIP features can be used to reconstruct a face image in the canonical view. [sent-109, score-0.703]
45 We also demonstrate significant improvements of the existing methods when they are applied to our reconstructed face images. [sent-110, score-0.378]
46 (3) Unlike existing works that need to know the pose of a probe face in order to build pose-specific models, our method can extract the FIP features without any information on pose and illumination. [sent-111, score-0.419]
47 Related Work This section reviews related works on learning-based face descriptors and deep models for feature learning. [sent-114, score-0.508]
48 [4] devised an unsupervised feature learning method (LE) with random-projection trees and PCA trees, and adopted PCA to gain a compact face descriptor. [sent-117, score-0.275]
49 [35] extended [4] by introducing an inter-modality encoding method, which can match face images across two modalities. [sent-119, score-0.278]
50 Our FIP features are learned with a multi-layer deep model in a supervised manner, and have more discriminative and representative power than the above works. [sent-127, score-0.249]
51 The deep models learn representations by stacking many hidden layers, which are trained layer-wise in an unsupervised manner. [sent-131, score-0.204]
52 For example, the deep belief networks [9] (DBN) and deep Boltzmann machine [22] (DBM) stack many layers of restricted Boltzmann machines (RBM) and can extract different levels of features. [sent-132, score-0.576]
53 [24] proposed a hybrid Convolutional Neural Network-Restricted Boltzmann Machine (CNN-RBM) model to learn relational features for comparing face similarity. [sent-137, score-0.282]
54 Unlike DBN and DBM, which employ fully connected layers, our deep network combines both locally and fully connected layers, which enables it to extract both local and global information. [sent-138, score-0.427]
55 The locally connected architecture of our deep network is similar to CRBM [10], but we learn the network with a supervised scheme and the FIP features are required to recover the frontal face image. [sent-139, score-0.821]
56 Therefore, this method is more robust to pose and illumination variations, as shown in Fig. [sent-140, score-0.19]
57 The input is a face image x0 under an arbitrary pose and illumination, and the output is a frontal face image y under neutral illumination. [sent-145, score-0.785]
58 The feature extraction layers have three locally connected layers and two pooling layers, which encode x0 into FIP features x3. [sent-147, score-0.402]
59 Note that in the conventional deep model [9], there is a bias term b, so that the output is σ(Wx + b). [sent-190, score-0.23]
60 Finally, the reconstruction layer transforms the FIP features x3 to the frontal face image y through a weight matrix W4 ∈ R^{n0×n2}: y = σ(W4x3). [sent-202, score-0.467]
61 Training Training our deep network requires estimating all the weight matrices {Wi} as introduced above, which is challenging because of the millions of parameters. [sent-204, score-0.306]
62 In Eq. 8, X1 = {x1^i}_{i=1}^m is a set of outputs of the first locally connected layer before pooling, and P is also a fixed binary matrix, which sums together the corresponding pixels and rescales the results to the same size as Y. [sent-237, score-0.214]
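One plausible reading of the fixed binary matrix P described above is a sparse 0/1 operator that adds up the corresponding pixel across the m flattened feature maps; a minimal sketch under that assumption follows (toy sizes, with the subsequent rescaling to the size of Y omitted).

```python
# Hedged sketch of a fixed binary "sum corresponding pixels" matrix P.
import numpy as np

def make_sum_matrix(n_maps, map_pixels):
    """P has shape (map_pixels, n_maps*map_pixels); row i picks pixel i
    out of every flattened map, so P @ X1 sums the maps pixel-wise."""
    P = np.zeros((map_pixels, n_maps * map_pixels))
    idx = np.arange(map_pixels)
    for j in range(n_maps):
        P[idx, j * map_pixels + idx] = 1.0
    return P

P = make_sum_matrix(n_maps=3, map_pixels=4)   # toy: three flattened 2x2 maps
X1 = np.arange(12.0)                          # maps [0..3], [4..7], [8..11]
print(P @ X1)                                 # [12. 15. 18. 21.]
```

Because P is fixed rather than learned, it contributes no parameters and acts as a constant linear map during back-propagation.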
63 In our deep network, there are three different expressions of ei. [sent-265, score-0.204]
64 2 demonstrates that classical face recognition methods can be significantly improved when applied to our reconstructed face images in the canonical view. [sent-284, score-0.788]
65 To extensively evaluate our method under different poses and illuminations, we select the MultiPIE face database [7], which contains 754,204 images of 337 identities. [sent-286, score-0.356]
66 Like the previous methods [3, 18, 17], we evaluate our algorithm on a subset of the MultiPIE database, where each identity has images from all four sessions under seven poses with yaw angles from −45◦ to +45◦, and 20 illuminations marked as ID 00-19 in MultiPIE. [sent-289, score-0.216]
67 This is to evaluate the robustness when both pose and illumination variations are present. [sent-308, score-0.224]
68 We use the images under the six poses ranging from −45◦ to +45◦ except 0◦, and the 19 illuminations marked as ID 00-19 except 07, as input to train our deep network. [sent-315, score-0.254]
69 In the test stage, in order to better demonstrate the proposed methods, we directly adopt the FIP features and the reconstructed images (denoted as RL) as features for face recognition. [sent-423, score-0.426]
70 In Table 1, LGBP is a 2D-based method, while VAAM, FA-EGFC, and SA-EGFC used 3D face models. [sent-428, score-0.256]
71 Note that LGBP, VAAM, and SA-EGFC need to know the pose of a probe, which means that they build different models to account for different poses specifically. [sent-430, score-0.204]
72 We do not need to know the pose of the probe, since our deep network can extract FIP features and reconstruct the face image in the canonical view given a probe under any pose and any illumination. [sent-431, score-1.035]
73 It is interesting to note that RL even outperforms all the 3D-based models, which verifies that our reconstructed face images in the canonical view are of high quality and robust to pose changes. [sent-436, score-0.634]
74 Fig. 4 shows several reconstructed images, indicating that RL can effectively remove the variations of poses and illuminations, while still retaining the intrinsic shapes and structures of the identities. [sent-438, score-0.29]
75 Second, FIP features are better than the two learning-based descriptors and the other three methods except SA-EGFC, which used the 3D model and required the pose of the probe. [sent-520, score-0.202]
76 Setting-II covers only pose variations and Setting-III covers both pose and illumination variations. [sent-529, score-0.322]
77 Note that the poses of probes in [17] are assumed to be given, which means they trained a different model for each pose separately. [sent-531, score-0.212]
78 [17] did not report detailed recognition rates when the poses of the probes are unknown, except for describing a 20-30% decline of the overall recognition rate. [sent-532, score-0.252]
79 For Setting-III, RL+LDA is compared with [17] on images with both pose and illumination variations. [sent-533, score-0.212]
80 First, we show the advantage of our reconstructed images in the canonical view over the original images. [sent-542, score-0.28]
81 This comparison is shown in Fig. 5(a), where the results on the original images and the reconstructed images are illustrated as solid bars (front) and hollow bars (back). [sent-548, score-0.257]
82 They can achieve relatively high performance on different poses, because our reconstruction layer can successfully recover the frontal face image. [sent-550, score-0.443]
83 We then adopt the χ2 distance, PCA, LDA, and PCA+LDA for face recognition. [sent-575, score-0.256]
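For concreteness, a hedged sketch of the χ2 distance with a 1-nearest-neighbor gallery match is given below; the variable names are illustrative, and the PCA/LDA projections mentioned above would be fitted on the features beforehand (e.g., with scikit-learn) rather than shown here.

```python
# Chi-squared distance (common for histogram-type features such as LBP)
# plus a 1-nearest-neighbor match against a gallery. Illustrative only.
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    # sum_i (h1_i - h2_i)^2 / (h1_i + h2_i); eps guards empty bins
    return np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def identify(probe_feat, gallery_feats, gallery_ids):
    dists = [chi2_distance(probe_feat, g) for g in gallery_feats]
    return gallery_ids[int(np.argmin(dists))]

# toy usage: two gallery identities, one probe
gallery = [np.array([0.5, 0.3, 0.2]), np.array([0.1, 0.1, 0.8])]
ids = ['A', 'B']
print(identify(np.array([0.45, 0.35, 0.2]), gallery, ids))   # 'A'
```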
84 The conventional face recognition methods can be improved when they are applied to our reconstructed images. [sent-589, score-0.433]
85 The results of three descriptors (pixel intensity, Gabor, and LBP) and four face recognition methods (χ2 distance, PCA, LDA, and PCA+LDA) are compared. [sent-590, score-0.314]
86 The hollow bars are the performance of these methods applied to our reconstructed images, while the solid bars are on the original images. [sent-592, score-0.213]
87 Conclusion We have proposed identity-preserving features for face recognition. [sent-594, score-0.282]
88 The FIP features are not only robust to pose and illumination variations, but can also be used to reconstruct face images in the canonical view. [sent-595, score-0.633]
89 FIP is learned using a deep model that contains feature extraction layers and a reconstruction layer. [sent-596, score-0.39]
90 We show that FIP features outperform the state-of-the-art face recognition methods. [sent-597, score-0.311]
91 We have also improved classical face recognition methods by applying them to our reconstructed face images. [sent-598, score-0.663]
92 We clearly see that our method can remove the effects of both poses and illuminations, and retain the intrinsic face shapes and structures of the identity. [sent-614, score-0.39]
93 Fully automatic pose-invariant face recognition via 3d pose normalization. [sent-624, score-0.383]
94 Wide-baseline stereo for face recognition with large pose variation. [sent-638, score-0.383]
95 Learning hierarchical representations for face verification with convolutional deep belief networks. [sent-674, score-0.557]
96 Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. [sent-712, score-0.253]
97 Discriminant image filter learning for face recognition with local binary pattern like representation. [sent-721, score-0.285]
98 Morphable displacement field based image matching for face recognition across pose. [sent-736, score-0.285]
99 Toward a practical face recognition system: Robust alignment and illumination by sparse representation. [sent-788, score-0.396]
100 Local Gabor binary pattern histogram sequence (LGBPHS): A novel non-statistical model for face representation and recognition. [sent-842, score-0.347]
wordName wordTfidf (topN-words)
[('fip', 0.737), ('face', 0.256), ('lda', 0.211), ('rl', 0.209), ('deep', 0.204), ('crbm', 0.131), ('reconstructed', 0.122), ('lbp', 0.11), ('canonical', 0.103), ('layers', 0.102), ('pose', 0.098), ('illumination', 0.092), ('probe', 0.091), ('gabor', 0.091), ('layer', 0.089), ('poses', 0.078), ('convolutional', 0.073), ('illuminations', 0.068), ('lgbp', 0.066), ('vaam', 0.066), ('pca', 0.064), ('identities', 0.062), ('network', 0.062), ('boltzmann', 0.06), ('multipie', 0.058), ('gallery', 0.056), ('rates', 0.055), ('saegfc', 0.049), ('identity', 0.048), ('locally', 0.048), ('connected', 0.048), ('wki', 0.044), ('frontal', 0.042), ('neutral', 0.041), ('cuhk', 0.041), ('le', 0.04), ('discriminativeness', 0.038), ('reconstruction', 0.037), ('avg', 0.036), ('probes', 0.036), ('reconstruct', 0.036), ('architecture', 0.035), ('variations', 0.034), ('view', 0.033), ('eonic', 0.033), ('gtoear', 0.033), ('bars', 0.031), ('recognition', 0.029), ('asthana', 0.029), ('dbm', 0.029), ('hollow', 0.029), ('descriptors', 0.029), ('neural', 0.029), ('pooling', 0.029), ('extraction', 0.028), ('know', 0.028), ('intelligence', 0.028), ('transactions', 0.027), ('sc', 0.026), ('features', 0.026), ('conventional', 0.026), ('filters', 0.026), ('networks', 0.025), ('momentum', 0.025), ('except', 0.025), ('rectified', 0.025), ('kong', 0.025), ('learningbased', 0.024), ('wagner', 0.024), ('belief', 0.024), ('indicates', 0.023), ('matrices', 0.023), ('retain', 0.023), ('images', 0.022), ('session', 0.022), ('sessions', 0.022), ('dbn', 0.022), ('remove', 0.021), ('chinese', 0.021), ('eliminate', 0.021), ('li', 0.02), ('summed', 0.02), ('ans', 0.019), ('supervised', 0.019), ('recover', 0.019), ('sparse', 0.019), ('retains', 0.019), ('feature', 0.019), ('international', 0.018), ('update', 0.018), ('unlike', 0.017), ('wx', 0.017), ('weight', 0.017), ('machine', 0.017), ('averaged', 0.017), ('sketch', 0.017), ('coupled', 0.017), ('discriminant', 0.017), ('combines', 0.017), ('structures', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 106 iccv-2013-Deep Learning Identity-Preserving Face Space
Author: Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Abstract: Face recognition with large pose and illumination variations is a challenging problem in computer vision. This paper addresses this challenge by proposing a new learning-based face representation: the face identity-preserving (FIP) features. Unlike conventional face descriptors, the FIP features can significantly reduce intra-identity variances, while maintaining discriminativeness between identities. Moreover, the FIP features extracted from an image under any pose and illumination can be used to reconstruct its face image in the canonical view. This property makes it possible to improve the performance of traditional descriptors, such as LBP [2] and Gabor [31], which can be extracted from our reconstructed images in the canonical view to eliminate variations. In order to learn the FIP features, we carefully design a deep network that combines the feature extraction layers and the reconstruction layer. The former encodes a face image into the FIP features, while the latter transforms them to an image in the canonical view. Extensive experiments on the large MultiPIE face database [7] demonstrate that it significantly outperforms the state-of-the-art face recognition methods.
2 0.2066817 335 iccv-2013-Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition
Author: Yizhe Zhang, Ming Shao, Edward K. Wong, Yun Fu
Abstract: One of the most challenging tasks in face recognition is to identify people with varied poses. Namely, the test faces have significantly different poses compared with the registered faces. In this paper, we propose a high-level feature learning scheme to extract pose-invariant identity features for face recognition. First, we build a single-hidden-layer neural network with a sparse constraint, to extract pose-invariant features in a supervised fashion. Second, we further enhance the discriminative capability of the proposed feature by using multiple random faces as the target values for multiple encoders. By enforcing the target values to be unique for input faces over different poses, the learned high-level feature that is represented by the neurons in the hidden layer is pose free and only relevant to the identity information. Finally, we conduct face identification on CMU MultiPIE, and verification on Labeled Faces in the Wild (LFW) databases, where identification rank-1 accuracy and face verification accuracy with ROC curve are reported. These experiments demonstrate that our model is superior to other state-of-the-art approaches on handling pose variations.
3 0.17474712 206 iccv-2013-Hybrid Deep Learning for Face Verification
Author: Yi Sun, Xiaogang Wang, Xiaoou Tang
Abstract: This paper proposes a hybrid convolutional network (ConvNet)-Restricted Boltzmann Machine (RBM) model for face verification in wild conditions. A key contribution of this work is to directly learn relational visual features, which indicate identity similarities, from raw pixels of face pairs with a hybrid deep network. The deep ConvNets in our model mimic the primary visual cortex to jointly extract local relational visual features from two face images compared with the learned filter pairs. These relational features are further processed through multiple layers to extract high-level and global features. Multiple groups of ConvNets are constructed in order to achieve robustness and characterize face similarities from different aspects. The top-layer RBM performs inference from complementary high-level features extracted from different ConvNet groups with a two-level average pooling hierarchy. The entire hybrid deep network is jointly fine-tuned to optimize for the task of face verification. Our model achieves competitive face verification performance on the LFW dataset.
4 0.16773969 279 iccv-2013-Multi-stage Contextual Deep Learning for Pedestrian Detection
Author: Xingyu Zeng, Wanli Ouyang, Xiaogang Wang
Abstract: Cascaded classifiers have been widely used in pedestrian detection and achieved great success. These classifiers are trained sequentially without joint optimization. In this paper, we propose a new deep model that can jointly train multi-stage classifiers through several stages of backpropagation. It keeps the score map output by a classifier within a local region and uses it as contextual information to support the decision at the next stage. Through a specific design of the training strategy, this deep architecture is able to simulate the cascaded classifiers by mining hard samples to train the network stage-by-stage. Each classifier handles samples at a different difficulty level. Unsupervised pre-training and specifically designed stage-wise supervised training are used to regularize the optimization problem. Both theoretical analysis and experimental results show that the training strategy helps to avoid overfitting. Experimental results on three datasets (Caltech, ETH and TUD-Brussels) show that our approach outperforms the state-of-the-art approaches.
5 0.16367085 356 iccv-2013-Robust Feature Set Matching for Partial Face Recognition
Author: Renliang Weng, Jiwen Lu, Junlin Hu, Gao Yang, Yap-Peng Tan
Abstract: Over the past two decades, a number of face recognition methods have been proposed in the literature. Most of them use holistic face images to recognize people. However, human faces are easily occluded by other objects in many real-world scenarios and we have to recognize the person of interest from his/her partial faces. In this paper, we propose a new partial face recognition approach by using feature set matching, which is able to align partial face patches to holistic gallery faces automatically and is robust to occlusions and illumination changes. Given each gallery image and probe face patch, we first detect keypoints and extract their local features. Then, we propose a Metric Learned Extended Robust Point Matching (MLERPM) method to discriminatively match local feature sets of a pair of gallery and probe samples. Lastly, the similarity of two faces is converted as the distance between two feature sets. Experimental results on three public face databases are presented to show the effectiveness of the proposed approach.
6 0.13509218 311 iccv-2013-Pedestrian Parsing via Deep Decompositional Network
7 0.13450108 157 iccv-2013-Fast Face Detector Training Using Tailored Views
8 0.13334158 220 iccv-2013-Joint Deep Learning for Pedestrian Detection
9 0.12045366 392 iccv-2013-Similarity Metric Learning for Face Recognition
10 0.11870322 153 iccv-2013-Face Recognition Using Face Patch Networks
11 0.11155423 195 iccv-2013-Hidden Factor Analysis for Age Invariant Face Recognition
12 0.11067541 158 iccv-2013-Fast High Dimensional Vector Multiplication Face Recognition
13 0.10840603 97 iccv-2013-Coupling Alignments with Recognition for Still-to-Video Face Recognition
14 0.10216814 398 iccv-2013-Sparse Variation Dictionary Learning for Face Recognition with a Single Training Sample per Person
15 0.10215209 7 iccv-2013-A Deep Sum-Product Architecture for Robust Facial Attributes Analysis
16 0.1007385 444 iccv-2013-Viewing Real-World Faces in 3D
17 0.092935324 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
18 0.091943927 261 iccv-2013-Markov Network-Based Unified Classifier for Face Identification
19 0.087918125 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
20 0.086896859 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
topicId topicWeight
[(0, 0.158), (1, 0.023), (2, -0.084), (3, -0.089), (4, -0.036), (5, -0.118), (6, 0.185), (7, 0.075), (8, -0.008), (9, -0.009), (10, -0.003), (11, 0.042), (12, 0.044), (13, -0.044), (14, -0.036), (15, 0.03), (16, -0.043), (17, -0.006), (18, 0.051), (19, 0.142), (20, -0.039), (21, -0.102), (22, 0.052), (23, -0.121), (24, -0.078), (25, 0.07), (26, -0.071), (27, 0.115), (28, -0.027), (29, 0.184), (30, 0.033), (31, -0.018), (32, 0.065), (33, -0.083), (34, -0.05), (35, 0.006), (36, -0.091), (37, 0.01), (38, 0.029), (39, 0.045), (40, -0.031), (41, -0.051), (42, -0.009), (43, -0.044), (44, -0.033), (45, 0.049), (46, -0.008), (47, 0.043), (48, -0.03), (49, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.93704379 106 iccv-2013-Deep Learning Identity-Preserving Face Space
Author: Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Abstract: Face recognition with large pose and illumination variations is a challenging problem in computer vision. This paper addresses this challenge by proposing a new learning-based face representation: the face identity-preserving (FIP) features. Unlike conventional face descriptors, the FIP features can significantly reduce intra-identity variances, while maintaining discriminativeness between identities. Moreover, the FIP features extracted from an image under any pose and illumination can be used to reconstruct its face image in the canonical view. This property makes it possible to improve the performance of traditional descriptors, such as LBP [2] and Gabor [31], which can be extracted from our reconstructed images in the canonical view to eliminate variations. In order to learn the FIP features, we carefully design a deep network that combines the feature extraction layers and the reconstruction layer. The former encodes a face image into the FIP features, while the latter transforms them to an image in the canonical view. Extensive experiments on the large MultiPIE face database [7] demonstrate that it significantly outperforms the state-of-the-art face recognition methods.
2 0.88959134 206 iccv-2013-Hybrid Deep Learning for Face Verification
Author: Yi Sun, Xiaogang Wang, Xiaoou Tang
Abstract: This paper proposes a hybrid convolutional network (ConvNet)-Restricted Boltzmann Machine (RBM) model for face verification in wild conditions. A key contribution of this work is to directly learn relational visual features, which indicate identity similarities, from raw pixels of face pairs with a hybrid deep network. The deep ConvNets in our model mimic the primary visual cortex to jointly extract local relational visual features from two face images compared with the learned filter pairs. These relational features are further processed through multiple layers to extract high-level and global features. Multiple groups of ConvNets are constructed in order to achieve robustness and characterize face similarities from different aspects. The top-layer RBM performs inference from complementary high-level features extracted from different ConvNet groups with a two-level average pooling hierarchy. The entire hybrid deep network is jointly fine-tuned to optimize for the task of face verification. Our model achieves competitive face verification performance on the LFW dataset.
3 0.78024238 335 iccv-2013-Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition
Author: Yizhe Zhang, Ming Shao, Edward K. Wong, Yun Fu
Abstract: One of the most challenging tasks in face recognition is to identify people with varied poses. Namely, the test faces have significantly different poses compared with the registered faces. In this paper, we propose a high-level feature learning scheme to extract pose-invariant identity features for face recognition. First, we build a single-hidden-layer neural network with a sparse constraint, to extract pose-invariant features in a supervised fashion. Second, we further enhance the discriminative capability of the proposed feature by using multiple random faces as the target values for multiple encoders. By enforcing the target values to be unique for input faces over different poses, the learned high-level feature that is represented by the neurons in the hidden layer is pose free and only relevant to the identity information. Finally, we conduct face identification on CMU MultiPIE, and verification on Labeled Faces in the Wild (LFW) databases, where identification rank-1 accuracy and face verification accuracy with ROC curve are reported. These experiments demonstrate that our model is superior to other state-of-the-art approaches on handling pose variations.
4 0.70996934 154 iccv-2013-Face Recognition via Archetype Hull Ranking
Author: Yuanjun Xiong, Wei Liu, Deli Zhao, Xiaoou Tang
Abstract: The archetype hull model is playing an important role in large-scale data analytics and mining, but rarely applied to vision problems. In this paper, we migrate such a geometric model to address face recognition and verification together through proposing a unified archetype hull ranking framework. Upon a scalable graph characterized by a compact set of archetype exemplars whose convex hull encompasses most of the training images, the proposed framework explicitly captures the relevance between any query and the stored archetypes, yielding a rank vector over the archetype hull. The archetype hull ranking is then executed on every block of face images to generate a blockwise similarity measure that is achieved by comparing two different rank vectors with respect to the same archetype hull. After integrating blockwise similarity measurements with learned importance weights, we accomplish a sensible face similarity measure which can support robust and effective face recognition and verification. We evaluate the face similarity measure in terms of experiments performed on three benchmark face databases Multi-PIE, Pubfig83, and LFW, demonstrating its performance superior to the state-of-the-art.
5 0.68284774 195 iccv-2013-Hidden Factor Analysis for Age Invariant Face Recognition
Author: Dihong Gong, Zhifeng Li, Dahua Lin, Jianzhuang Liu, Xiaoou Tang
Abstract: Age invariant face recognition has received increasing attention due to its great potential in real world applications. In spite of the great progress in face recognition techniques, reliably recognizing faces across ages remains a difficult task. The facial appearance of a person changes substantially over time, resulting in significant intra-class variations. Hence, the key to tackle this problem is to separate the variation caused by aging from the person-specific features that are stable. Specifically, we propose a new method, called Hidden Factor Analysis (HFA). This method captures the intuition above through a probabilistic model with two latent factors: an identity factor that is age-invariant and an age factor affected by the aging process. Then, the observed appearance can be modeled as a combination of the components generated based on these factors. We also develop a learning algorithm that jointly estimates the latent factors and the model parameters using an EM procedure. Extensive experiments on two well-known public domain face aging datasets: MORPH (the largest public face aging database) and FGNET, clearly show that the proposed method achieves notable improvement over state-of-the-art algorithms.
6 0.67600864 311 iccv-2013-Pedestrian Parsing via Deep Decompositional Network
7 0.67073411 356 iccv-2013-Robust Feature Set Matching for Partial Face Recognition
8 0.65485263 97 iccv-2013-Coupling Alignments with Recognition for Still-to-Video Face Recognition
9 0.63931155 158 iccv-2013-Fast High Dimensional Vector Multiplication Face Recognition
10 0.62444228 153 iccv-2013-Face Recognition Using Face Patch Networks
11 0.62383181 261 iccv-2013-Markov Network-Based Unified Classifier for Face Identification
12 0.59834206 157 iccv-2013-Fast Face Detector Training Using Tailored Views
13 0.57675809 398 iccv-2013-Sparse Variation Dictionary Learning for Face Recognition with a Single Training Sample per Person
14 0.56480598 392 iccv-2013-Similarity Metric Learning for Face Recognition
15 0.56054139 279 iccv-2013-Multi-stage Contextual Deep Learning for Pedestrian Detection
16 0.55836374 220 iccv-2013-Joint Deep Learning for Pedestrian Detection
17 0.526443 272 iccv-2013-Modifying the Memorability of Face Photographs
18 0.52100933 84 iccv-2013-Complex 3D General Object Reconstruction from Line Drawings
19 0.51097524 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
20 0.48394492 351 iccv-2013-Restoring an Image Taken through a Window Covered with Dirt or Rain
topicId topicWeight
[(2, 0.058), (4, 0.057), (7, 0.021), (12, 0.021), (26, 0.077), (31, 0.051), (34, 0.02), (39, 0.138), (42, 0.149), (48, 0.067), (62, 0.011), (64, 0.047), (73, 0.02), (78, 0.022), (89, 0.14)]
simIndex simValue paperId paperTitle
same-paper 1 0.85943925 106 iccv-2013-Deep Learning Identity-Preserving Face Space
Author: Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Abstract: Face recognition with large pose and illumination variations is a challenging problem in computer vision. This paper addresses this challenge by proposing a new learning-based face representation: the face identity-preserving (FIP) features. Unlike conventional face descriptors, the FIP features can significantly reduce intra-identity variances, while maintaining discriminativeness between identities. Moreover, the FIP features extracted from an image under any pose and illumination can be used to reconstruct its face image in the canonical view. This property makes it possible to improve the performance of traditional descriptors, such as LBP [2] and Gabor [31], which can be extracted from our reconstructed images in the canonical view to eliminate variations. In order to learn the FIP features, we carefully design a deep network that combines the feature extraction layers and the reconstruction layer. The former encodes a face image into the FIP features, while the latter transforms them to an image in the canonical view. Extensive experiments on the large MultiPIE face database [7] demonstrate that it significantly outperforms the state-of-the-art face recognition methods.
2 0.82927358 354 iccv-2013-Robust Dictionary Learning by Error Source Decomposition
Author: Zhuoyuan Chen, Ying Wu
Abstract: Sparsity models have recently shown great promise in many vision tasks. Using a learned dictionary in sparsity models can in general outperform predefined bases in clean data. In practice, both training and testing data may be corrupted and contain noises and outliers. Although recent studies attempted to cope with corrupted data and achieved encouraging results in testing phase, how to handle corruption in training phase still remains a very difficult problem. In contrast to most existing methods that learn the dictionary from clean data, this paper is targeted at handling corruptions and outliers in training data for dictionary learning. We propose a general method to decompose the reconstructive residual into two components: a non-sparse component for small universal noises and a sparse component for large outliers, respectively. In addition, further analysis reveals the connection between our approach and the “partial” dictionary learning approach, updating only part of the prototypes (or informative codewords) with remaining (or noisy codewords) fixed. Experiments on synthetic data as well as real applications have shown satisfactory performance of this new robust dictionary learning approach.
3 0.82021487 195 iccv-2013-Hidden Factor Analysis for Age Invariant Face Recognition
Author: Dihong Gong, Zhifeng Li, Dahua Lin, Jianzhuang Liu, Xiaoou Tang
Abstract: Age invariant face recognition has received increasing attention due to its great potential in real world applications. In spite of the great progress in face recognition techniques, reliably recognizing faces across ages remains a difficult task. The facial appearance of a person changes substantially over time, resulting in significant intra-class variations. Hence, the key to tackle this problem is to separate the variation caused by aging from the person-specific features that are stable. Specifically, we propose a new method, called Hidden Factor Analysis (HFA). This method captures the intuition above through a probabilistic model with two latent factors: an identity factor that is age-invariant and an age factor affected by the aging process. Then, the observed appearance can be modeled as a combination of the components generated based on these factors. We also develop a learning algorithm that jointly estimates the latent factors and the model parameters using an EM procedure. Extensive experiments on two well-known public domain face aging datasets: MORPH (the largest public face aging database) and FGNET, clearly show that the proposed method achieves notable improvement over state-of-the-art algorithms.
4 0.81199682 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
Author: Daniel Wesierski, Patrick Horain
Abstract: Elongated objects have various shapes and can shift, rotate, change scale, and be rigid or deform by flexing, articulating, and vibrating, with examples as varied as a glass bottle, a robotic arm, a surgical suture, a finger pair, a tram, and a guitar string. This generally makes tracking of poses of elongated objects very challenging. We describe a unified, configurable framework for tracking the pose of elongated objects, which move in the image plane and extend over the image region. Our method strives for simplicity, versatility, and efficiency. The object is decomposed into a chained assembly of segments of multiple parts that are arranged under a hierarchy of tailored spatio-temporal constraints. In this hierarchy, segments can rescale independently while their elasticity is controlled with global orientations and local distances. While the trend in tracking is to design complex, structure-free algorithms that update object appearance online, we show that our tracker, with the novel but remarkably simple, structured organization of parts with constant appearance, reaches or improves state-of-the-art performance. Most importantly, our model can be easily configured to track exact pose of arbitrary, elongated objects in the image plane. The tracker can run up to 100 fps on a desktop PC, yet the computation time scales linearly with the number of object parts. To our knowledge, this is the first approach to generic tracking of elongated objects.
5 0.81168914 63 iccv-2013-Bounded Labeling Function for Global Segmentation of Multi-part Objects with Geometric Constraints
Author: Masoud S. Nosrati, Shawn Andrews, Ghassan Hamarneh
Abstract: The inclusion of shape and appearance priors has proven useful for obtaining more accurate and plausible segmentations, especially for complex objects with multiple parts. In this paper, we augment the popular Mumford-Shah model to incorporate two important geometrical constraints, termed containment and detachment, between different regions with a specified minimum distance between their boundaries. Our method is able to handle multiple instances of multi-part objects defined by these geometrical constraints using a single labeling function while maintaining global optimality. [Figure 1: The inside vs. outside ambiguity in (a) is resolved by our containment constraint in (b).] We demonstrate the utility and advantages of these two constraints and show that the proposed convex continuous method is superior to other state-of-the-art methods, including its discrete counterpart, in terms of memory usage, and metrication errors.
6 0.80793858 161 iccv-2013-Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration
7 0.80765146 398 iccv-2013-Sparse Variation Dictionary Learning for Face Recognition with a Single Training Sample per Person
8 0.80620897 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
9 0.805305 206 iccv-2013-Hybrid Deep Learning for Face Verification
10 0.80425107 259 iccv-2013-Manifold Based Face Synthesis from Sparse Samples
11 0.80229294 80 iccv-2013-Collaborative Active Learning of a Kernel Machine Ensemble for Recognition
12 0.80193913 54 iccv-2013-Attribute Pivots for Guiding Relevance Feedback in Image Search
13 0.8018595 277 iccv-2013-Multi-channel Correlation Filters
14 0.80182409 44 iccv-2013-Adapting Classification Cascades to New Domains
15 0.80141777 180 iccv-2013-From Where and How to What We See
16 0.80056179 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
17 0.80049545 208 iccv-2013-Image Co-segmentation via Consistent Functional Maps
18 0.79960525 158 iccv-2013-Fast High Dimensional Vector Multiplication Face Recognition
19 0.79954827 52 iccv-2013-Attribute Adaptation for Personalized Image Search
20 0.79882574 392 iccv-2013-Similarity Metric Learning for Face Recognition