iccv iccv2013 iccv2013-206 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yi Sun, Xiaogang Wang, Xiaoou Tang
Abstract: This paper proposes a hybrid convolutional network (ConvNet)-Restricted Boltzmann Machine (RBM) model for face verification in wild conditions. A key contribution of this work is to directly learn relational visual features, which indicate identity similarities, from raw pixels of face pairs with a hybrid deep network. The deep ConvNets in our model mimic the primary visual cortex to jointly extract local relational visual features from two face images compared with the learned filter pairs. These relational features are further processed through multiple layers to extract high-level and global features. Multiple groups of ConvNets are constructed in order to achieve robustness and characterize face similarities from different aspects. The top-layer RBM performs inference from complementary high-level features extracted from different ConvNet groups with a two-level average pooling hierarchy. The entire hybrid deep network is jointly fine-tuned to optimize for the task of face verification. Our model achieves competitive face verification performance on the LFW dataset.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper proposes a hybrid convolutional network (ConvNet)-Restricted Boltzmann Machine (RBM) model for face verification in wild conditions. [sent-5, score-0.671]
2 A key contribution of this work is to directly learn relational visual features, which indicate identity similarities, from raw pixels of face pairs with a hybrid deep network. [sent-6, score-0.711]
3 The deep ConvNets in our model mimic the primary visual cortex to jointly extract local relational visual features from two face images compared with the learned filter pairs. [sent-7, score-0.577]
4 These relational features are further processed through multiple layers to extract high-level and global features. [sent-8, score-0.243]
5 Multiple groups of ConvNets are constructed in order to achieve robustness and characterize face similarities from different aspects. [sent-9, score-0.295]
6 The top-layer RBM performs inference from complementary high-level features extracted from different ConvNet groups with a two-level average pooling hierarchy. [sent-10, score-0.168]
7 The entire hybrid deep network is jointly fine-tuned to optimize for the task of face verification. [sent-11, score-0.643]
8 Our model achieves competitive face verification performance on the LFW dataset. [sent-12, score-0.282]
9 This paper addresses the key challenge of computing the similarity of two face images given their large intrapersonal variations in poses, illuminations, expressions, ages, makeups, and occlusions. [sent-15, score-0.207]
10 We focus on the task of face verification, which aims to determine whether two face images belong to the same identity. [sent-17, score-0.431]
11 are not directly related to face identity [5, 13]. [sent-28, score-0.264]
12 At the recognition stage, classifiers such as SVM are used to classify two face images as having the same identity or not [5, 24, 13], or other models are employed to compute the similarities of two face images [10, 22, 12, 6, 7, 25]. [sent-32, score-0.509]
13 To handle large-scale data with complex distributions, a large number of over-complete features may need to be extracted from the face [12, 7, 25]. [sent-35, score-0.263]
14 All of the issues discussed above motivate us to learn a hybrid deep network to compute face similarities. [sent-39, score-0.608]
15 (1) It directly learns visual features from raw pixels under the supervision of face identities. [sent-42, score-0.267]
16 Instead of extracting features from each face image separately, the model jointly extracts relational visual features from two face images in comparison. [sent-43, score-0.619]
17 The extracted features are effective for computing the identity similarities of face images. [sent-45, score-0.358]
18 (3) The deep and wide architecture of our hybrid network can handle large-scale face data with complex distributions. [sent-47, score-0.631]
19 The deep ConvNets in our network have four convolutional layers (followed by max-pooling) and two fully-connected layers. [sent-48, score-0.541]
20 In addition, multiple groups of ConvNets are constructed to achieve good robustness and characterize face similarities from different aspects. [sent-49, score-0.295]
21 The parameters of the entire pipeline (weights and biases in all the layers) are jointly optimized for the target of face verification. [sent-52, score-0.282]
22 Related work All existing methods for face verification start by extracting features from two faces in comparison separately. [sent-54, score-0.376]
23 Some methods generated midlevel features [24, 13] with variants of convolutional deep belief networks (CDBN) [19] or ConvNets [18]. [sent-56, score-0.433]
24 Thus variations other than identity are encoded in the features, such as poses, illumination, and expressions, which constitute the main impediment to face recognition. [sent-58, score-0.264]
25 Many face recognition models are shallow structures and need high-dimensional over-complete feature representations to learn the complex mappings from pairs of noisy features to face similarities [12, 7, 25]; otherwise, the models may suffer from inferior performance. [sent-59, score-0.557]
26 [6, 7] factorized the face images as identity variations plus variations within the same identity, and assumed each factor as a Gaussian distribution for closed form solutions. [sent-63, score-0.264]
27 All these methods extract features from a single face separately, and the comparison of two face images is deferred to the later recognition stage. [sent-69, score-0.466]
28 There are a few methods that also used deep models for face verification [8, 24, 13], but extracted features independently from each face. [sent-72, score-0.517]
29 In [34], face images under various poses and lighting conditions were transformed to a canonical view with a convolutional neural network. [sent-74, score-0.392]
30 In contrast, we deal with face pairs directly by extracting relational visual features from the two compared faces. [sent-76, score-0.37]
31 The top layer RBM in our model is similar to that of the deep belief net (DBN) proposed by Hinton and Osindero [11]. [sent-77, score-0.295]
32 Averaging the results of multiple ConvNets has been shown to be an effective way of improving performance [9, 15], while we will show that our hybrid structure is significantly better than the simple averaging scheme. [sent-79, score-0.211]
33 Moreover, unlike most existing face recognition pipelines, in which each stage is optimized independently, our hybrid ConvNet-RBM model is jointly optimized after pre-training each part separately, which further enhances its performance. [sent-80, score-0.462]
34 The map numbers and dimensions of the input layer and all the convolutional and max-pooling layers are illustrated as the length, width, and height of cuboids. [sent-88, score-0.387]
35 The 3D convolution kernel sizes of the convolutional layers and the pooling region sizes of the max-pooling layers are shown as the small cuboids and squares inside the large cuboids of maps respectively. [sent-89, score-0.559]
36 Figure 2 is an overview of our hybrid ConvNet-RBM model, which is a cascade of deep ConvNet groups, two levels of average pooling, and Classification RBM. [sent-92, score-0.317]
37 The lower part of our hybrid model contains 12 groups, each of which contains five ConvNets. [sent-93, score-0.168]
38 Each ConvNet takes a pair of aligned face regions as input. [sent-95, score-0.272]
39 Its four convolutional layers (followed by max-pooling) extract the relational features hierarchically. [sent-96, score-0.41]
40 Finally, the extracted features pass a fully connected layer and are fully connected to a single neuron in layer L0 (shown in Figure 2), which indicates whether the two regions belong to the same person. [sent-97, score-0.34]
41 The input region pairs for ConvNets in different groups differ in terms of region ranges and color channels (shown in Figure 4) to make their predictions complementary. [sent-98, score-0.237]
42 When the size of the input regions changes in different groups, the map sizes in the following layers of the ConvNets will change accordingly. [sent-99, score-0.183]
43 Although ConvNets in the same group take the same kind of region pair as input, they are different in that they are trained with different bootstraps of the training data (Section 4. [sent-100, score-0.188]
44 Each input region pair generates eight modes by exchanging the two regions and horizontally flipping each region (shown in Figure 5). [sent-102, score-0.182]
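The eight input modes can be enumerated mechanically. The sketch below is a minimal illustration, not the authors' code: it assumes the two regions are flipped independently (2 orderings × 2 × 2 flip states = 8), and the names `hflip` and `eight_modes` are hypothetical. Regions are plain nested lists so the example has no dependencies.

```python
from itertools import product


def hflip(region):
    """Mirror a 2-D region (a list of rows) left to right."""
    return [row[::-1] for row in region]


def eight_modes(region_a, region_b):
    """Return the 8 input modes of a region pair.

    Assumption: 2 orderings of the pair, times an independent
    horizontal flip for each of the two regions.
    """
    modes = []
    for first, second in ((region_a, region_b), (region_b, region_a)):
        for flip_first, flip_second in product((False, True), repeat=2):
            modes.append((
                hflip(first) if flip_first else first,
                hflip(second) if flip_second else second,
            ))
    return modes


# Toy 2x2 "regions" standing in for aligned face crops.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
modes = eight_modes(a, b)
```

Feeding all eight modes to one ConvNet and averaging its outputs makes the prediction insensitive to the order and mirroring of the pair.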
45 When the eight modes (shown as M1-M8 in Figure 2) are input to the same Figure 4: Twelve face regions used in our network. [sent-103, score-0.335]
46 P5 - P12 are local regions covering different face parts, of size 31 × 47. [sent-106, score-0.254]
47 The group prediction is given by two levels of average pooling of ConvNet predictions. [sent-115, score-0.18]
48 eight predictions of the same ConvNet from eight different input modes. [sent-117, score-0.165]
49 Layer L2 (with 12 neurons) is formed by averaging the five neurons in L1 associated with the same group. [sent-118, score-0.197]
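The two pooling levels reduce to two nested averages. The following sketch assumes the raw ConvNet outputs are arranged as `preds[group][convnet][mode]` (12 groups × 5 ConvNets × 8 modes); that layout and the function name `group_predictions` are illustrative choices, not taken from the paper.

```python
from statistics import fmean


def group_predictions(preds):
    """Two-level average pooling of ConvNet outputs.

    preds[g][c][m] is the output of ConvNet c in group g for input mode m.
    Level 1 averages over the input modes of each ConvNet;
    level 2 averages the ConvNets belonging to the same group.
    """
    level1 = [[fmean(modes) for modes in group] for group in preds]
    level2 = [fmean(group) for group in level1]
    return level1, level2


# Toy input: 12 groups x 5 ConvNets x 8 modes, constant within each group.
preds = [[[0.5 + 0.01 * g for _ in range(8)] for _ in range(5)]
         for g in range(12)]
l1, l2 = group_predictions(preds)
```

With this layout, `l2` holds one value per group, matching the 12 neurons of layer L2.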
50 The large number of deep ConvNets means that our model has a high capacity. [sent-123, score-0.179]
51 Deep ConvNets A pair of gray regions forms two input maps of a ConvNet (Figure 5), while a pair of color regions forms six input maps, replacing each gray map with three maps from RGB channels. [sent-132, score-0.264]
52 The input regions are stacked into multiple maps instead of being concatenated to form one map, which enables the ConvNet to model the relations between the two regions from the first convolutional stage. [sent-133, score-0.343]
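As a rough illustration of stacking versus concatenating, the sketch below turns a region pair into a list of input maps. The pixel representation (numbers for gray, `(r, g, b)` tuples for color) and the helper names `to_maps` and `stack_pair` are assumptions made for this example only.

```python
def to_maps(region):
    """Split a region into channel maps.

    A gray region is a list of rows of numbers (1 map); a color region
    is a list of rows of (r, g, b) tuples (3 maps).
    """
    if isinstance(region[0][0], tuple):
        n_channels = len(region[0][0])
        return [[[px[c] for px in row] for row in region]
                for c in range(n_channels)]
    return [region]


def stack_pair(region_a, region_b):
    """Stack the maps of both regions as separate input maps."""
    return to_maps(region_a) + to_maps(region_b)


gray_a, gray_b = [[1, 2]], [[3, 4]]
color_a = [[(1, 2, 3), (4, 5, 6)]]
color_b = [[(7, 8, 9), (0, 1, 2)]]

gray_input = stack_pair(gray_a, gray_b)     # 2 input maps
color_input = stack_pair(color_a, color_b)  # 6 input maps
```

Because every first-layer filter sees all stacked maps at the same spatial location, it can compare the two regions pixel by pixel, which a side-by-side concatenation into one map would not allow.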
53 Our deep ConvNets contain four convolutional layers (followed by max-pooling). [sent-134, score-0.457]
54 The operation in each convolutional layer can be expressed as yjr= max? [sent-135, score-0.251]
55 Moreover, weights of neurons (including convolution kernels and biases) in the same map in higher convolutional layers are locally shared. [sent-141, score-0.472]
56 Since faces are structured objects, locally sharing weights in higher layers allows the network to learn different high-level features at different locations. [sent-143, score-0.387]
57 Since each stage extracts features from all the maps in the previous stage, relations between the two face regions are modeled; see Figure 6 for examples. [sent-148, score-0.408]
58 As the network goes deeper, more global and higher-level relations between the two regions are modeled. [sent-149, score-0.169]
59 These high-level relational features make it possible for the top layer neurons in ConvNets to predict the high-level concept of whether the two input regions come from the same person. [sent-150, score-0.38]
60 The upper and lower filters in each pair convolve with the two face regions in comparison, respectively, and the results are added. [sent-161, score-0.3]
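To make the filter-pair idea concrete, here is a small dependency-free sketch: each filter of a pair is cross-correlated with its own region and the two responses are summed. The rectification `max(0, ·)` and the bias term are assumptions of this sketch, and the function names are hypothetical.

```python
def correlate_valid(image, kernel):
    """Plain 2-D 'valid' cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def filter_pair_response(region_a, region_b, kernel_a, kernel_b, bias=0.0):
    """Sum the responses of a filter pair, then rectify (assumed ReLU)."""
    resp_a = correlate_valid(region_a, kernel_a)
    resp_b = correlate_valid(region_b, kernel_b)
    return [[max(0.0, x + y + bias) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(resp_a, resp_b)]


region = [[1, 2, 4], [0, 3, 1]]
blank = [[0, 0, 0], [0, 0, 0]]
k_upper = [[1, -1]]   # horizontal difference
k_lower = [[-1, 1]]   # the opposite difference

same = filter_pair_response(region, region, k_upper, k_lower)
diff = filter_pair_response(region, blank, k_upper, k_lower)
```

With opposite-signed filters the summed response vanishes for identical regions and becomes non-zero when they differ, which is one simple way a filter pair can encode a relation between the two inputs.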
61 For filter pairs in which one filter varies greatly while the other remains near uniform (column 1, 2), features are extracted from the two input regions separately. [sent-162, score-0.249]
62 For those pairs in which both filters vary greatly, some kind of relations between the two input regions are extracted. [sent-163, score-0.188]
63 This makes sense since face similarities should be invariant with the order of the two face regions in comparison. [sent-166, score-0.499]
64 Let $\{I_n^k\}_{k=1}^{K}$ be the $K$ possible input modes formed by a pair of face regions of group $n$. [sent-194, score-0.176]
65 Given the $N$ group predictions $\{x_n\}_{n=1}^{N}$, the final prediction by RBM is $\max_{c \in \{1,2\}} \{p(y_c \mid x)\}$, where $p(y_c \mid x)$ is defined in Eq. [sent-198, score-0.194]
66 Experiments We evaluate our algorithm on LFW [14], which has been used extensively to evaluate algorithms of face verification in the wild. [sent-212, score-0.282]
67 The 600 testing pairs in each fold are predefined by LFW and fixed, whereas training pairs can be generated using the identity information in the other nine folds and the number is not limited. [sent-225, score-0.25]
68 It contains 87,628 face images of 5,436 celebrities from the web, and was assembled by first collecting the celebrity names that do not exist in LFW to avoid any overlap, then searching for the face images for each name on the web. [sent-231, score-0.441]
69 For both settings, we randomly choose 80% of the people from the training data to train the deep ConvNets, and use the remaining 20% to train the top-layer RBM and fine-tune the entire model. [sent-234, score-0.269]
70 The positive training pairs are randomly formed such that on average each face image appears in k = 6 (3) positive pairs for LFW (CelebFaces) dataset, unless a person does not have enough training images. [sent-235, score-0.403]
71 In this way, we generate approximately 40,000 (240,000) training pairs for the ConvNets and 8,000 (50,000) training pairs for the RBM and fine-tuning for the LFW (CelebFaces) training dataset. [sent-238, score-0.244]
72 The average accuracy is defined as the percentage of correctly classified face pairs. [sent-245, score-0.207]
73 We assign each face pair to the class with higher probabilities without further learning a threshold for the final classification. [sent-246, score-0.225]
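The decision rule and the accuracy metric described above amount to a few lines. This sketch assumes the model outputs a single probability `p_same` that a pair shares an identity, so the other class has probability `1 - p_same`; the function names are illustrative.

```python
def classify(p_same):
    """Pick the class with the higher probability (no tuned threshold).

    With two classes, p_same > 1 - p_same is the same as p_same > 0.5.
    """
    return p_same > 0.5


def accuracy(p_same_list, labels):
    """Fraction of pairs whose predicted class matches the label."""
    correct = sum(classify(p) == y for p, y in zip(p_same_list, labels))
    return correct / len(labels)


probs = [0.9, 0.2, 0.6, 0.4]
labels = [True, False, False, False]
acc = accuracy(probs, labels)
```

Here three of the four toy pairs are classified correctly; the third pair (0.6 for a different-identity pair) is the only error.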
74 Our ConvNets locally share weights in the last two convolutional layers. [sent-250, score-0.236]
75 In the second last convolutional layer, maps are evenly divided into 2 × 2 regions, and weights are shared among neurons in each region. [sent-251, score-0.319]
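Local weight sharing can be pictured as a convolution whose kernel depends on which cell of a grid the output neuron falls in. The sketch below is a simplified single-map illustration under that assumption (the grid is laid over the output map); `locally_shared_correlate` is a hypothetical name, and real layers would also sum over input maps and add biases.

```python
def locally_shared_correlate(image, kernels, grid=2):
    """'Valid' cross-correlation with one kernel per grid cell.

    The output map is split into grid x grid cells; kernels[r][c] is
    used for every output neuron in cell (r, c). All kernels must have
    the same size.
    """
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Select the kernel of the cell containing output (i, j).
            k = kernels[i * grid // out_h][j * grid // out_w]
            row.append(sum(image[i + di][j + dj] * k[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out


ones = [[1] * 4 for _ in range(4)]
# Four 1x1 kernels, one per cell, chosen so the cells are easy to see.
cell_kernels = [[[[1]], [[2]]],
                [[[3]], [[4]]]]
out = locally_shared_correlate(ones, cell_kernels)
```

Setting `grid=1` with a single kernel recovers ordinary global weight sharing, which makes the two schemes easy to compare side by side.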
76 Figure 7: Average training set failure rates with respect to the number of training epochs for ConvNets in group P1 with the local (S1) or global (S2) weight-sharing schemes for the LFW and CelebFaces training settings. [sent-256, score-0.23]
77 (referred to as S1) with the conventional ConvNets (referred to as S2), where weights in all the convolutional layers are globally shared, on both training errors and test accuracies. [sent-263, score-0.365]
78 The ConvNet group predictions are derived from two levels of average pooling as described in Section 3. [sent-267, score-0.224]
79 The increase of the RBM prediction accuracies would indicate that complementary information Figure 8: ConvNet prediction accuracies for each group averaged over the 10-fold LFW training settings. [sent-276, score-0.254]
80 Since different groups observe different kinds of regions, each group may be good at judging particular kinds of face pairs differently. [sent-287, score-0.393]
81 Continuing to average group predictions may smooth out the patterns in different group predictions. [sent-288, score-0.248]
82 Moreover, we find that the performance can be further enhanced by averaging five different hybrid ConvNet-RBM models. [sent-291, score-0.241]
83 This is achieved by first training five RBMs (each with a different set of randomly generated training data) with the weights of the ConvNets pre-trained and fixed, and then fine-tuning each whole ConvNet-RBM network separately. [sent-292, score-0.269]
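Model averaging over the five hybrid networks can be sketched as a per-pair mean. The code assumes each model outputs a probability that a pair shares an identity and that these probabilities are what gets averaged; the text does not spell this out, so treat both the quantity being averaged and the name `model_average` as assumptions.

```python
from statistics import fmean


def model_average(model_probs):
    """Average per-pair predictions across models.

    model_probs[m][i] is model m's P(same identity) for face pair i.
    """
    return [fmean(per_pair) for per_pair in zip(*model_probs)]


# Five toy models scoring three face pairs.
model_probs = [
    [0.9, 0.4, 0.1],
    [0.8, 0.6, 0.2],
    [0.7, 0.5, 0.1],
    [0.9, 0.3, 0.3],
    [0.7, 0.7, 0.3],
]
avg = model_average(model_probs)
```

The averaged scores can then be thresholded at 0.5 in the same way as a single model's output.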
84 Interestingly, though directly averaging the 12 group predictions (group averaging) is suboptimal, it [sent-294, score-0.235]
85 We achieved our best results with the averaging of five hybrid ConvNet-RBM model predictions (model averaging). [sent-298, score-0.317]
86 Conclusion This paper has proposed a new hybrid ConvNet-RBM model for face verification. [sent-316, score-0.345]
87 The model learns directly and jointly extracts relational visual features from face pairs under the supervision of face identities. [sent-317, score-0.663]
88 Figure 10: ROC comparison of our hybrid ConvNet-RBM model and the state-of-the-art methods under the LFW unrestricted protocol. [sent-323, score-0.199]
89 Feature extraction and recognition stages are unified under a single deep network architecture and all the components are jointly optimized for the target of face verification. [sent-324, score-0.574]
90 Face description with local binary patterns: Application to face recognition. [sent-334, score-0.207]
91 Figure 11: ROC comparison of our hybrid ConvNet-RBM model and the state-of-the-art methods relying on outside training data. [sent-335, score-0.211]
92 Tom-vs-Pete classifiers and identity-preserving alignment for face verification. [sent-344, score-0.207]
93 Poof: Part-based one-vs-one features for fine-grained categorization, face verification, and attribute estimation. [sent-350, score-0.24]
94 Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification. [sent-375, score-0.207]
95 Learning a similarity metric discriminatively, with application to face verification. [sent-382, score-0.207]
96 Large scale strongly supervised ensemble metric learning, with applications to face verification and retrieval. [sent-410, score-0.282]
97 Learning hierarchical representations for face verification with convolutional deep belief networks. [sent-417, score-0.493]
98 Labeled faces in the wild: A database for studying face recognition in unconstrained environments. [sent-426, score-0.268]
99 Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. [sent-466, score-0.233]
100 Beyond simple features: A large-scale feature search approach to unconstrained face recognition. [sent-499, score-0.207]
wordName wordTfidf (topN-words)
[('convnets', 0.622), ('rbm', 0.327), ('convnet', 0.315), ('lfw', 0.23), ('face', 0.207), ('celebfaces', 0.195), ('deep', 0.179), ('convolutional', 0.167), ('hybrid', 0.138), ('layers', 0.111), ('neurons', 0.094), ('group', 0.086), ('layer', 0.084), ('network', 0.084), ('relational', 0.08), ('predictions', 0.076), ('verification', 0.075), ('averaging', 0.073), ('pooling', 0.062), ('faces', 0.061), ('unrestricted', 0.061), ('identity', 0.057), ('neuron', 0.052), ('pairs', 0.05), ('groups', 0.05), ('training', 0.048), ('regions', 0.047), ('stage', 0.04), ('weights', 0.039), ('yt', 0.038), ('similarities', 0.038), ('relations', 0.038), ('jointly', 0.035), ('outputs', 0.034), ('features', 0.033), ('prediction', 0.032), ('belief', 0.032), ('roc', 0.032), ('eight', 0.032), ('wdref', 0.031), ('xln', 0.031), ('extraction', 0.031), ('convolution', 0.031), ('validation', 0.031), ('locally', 0.03), ('five', 0.03), ('simonyan', 0.029), ('sharing', 0.029), ('accuracies', 0.028), ('lbp', 0.028), ('exclusive', 0.028), ('yc', 0.028), ('filters', 0.028), ('celebrity', 0.027), ('rbms', 0.027), ('supervision', 0.027), ('kong', 0.027), ('cuhk', 0.026), ('abilities', 0.026), ('folds', 0.026), ('berg', 0.026), ('input', 0.025), ('stages', 0.025), ('outside', 0.025), ('extracts', 0.024), ('pubfig', 0.024), ('filter', 0.024), ('modes', 0.024), ('separately', 0.024), ('gray', 0.023), ('greatly', 0.023), ('extracted', 0.023), ('architecture', 0.023), ('lost', 0.022), ('chinese', 0.022), ('networks', 0.022), ('beside', 0.022), ('boltzmann', 0.022), ('shallow', 0.022), ('optimized', 0.021), ('mn', 0.021), ('people', 0.021), ('generalization', 0.02), ('whole', 0.02), ('cuboids', 0.02), ('settings', 0.02), ('cao', 0.02), ('hinton', 0.02), ('extract', 0.019), ('mouth', 0.019), ('fold', 0.019), ('maps', 0.019), ('logp', 0.019), ('biases', 0.019), ('huang', 0.018), ('trained', 0.018), ('pair', 0.018), ('neural', 0.018), ('region', 0.018), ('whether', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 206 iccv-2013-Hybrid Deep Learning for Face Verification
Author: Yi Sun, Xiaogang Wang, Xiaoou Tang
Abstract: This paper proposes a hybrid convolutional network (ConvNet)-Restricted Boltzmann Machine (RBM) model for face verification in wild conditions. A key contribution of this work is to directly learn relational visual features, which indicate identity similarities, from raw pixels of face pairs with a hybrid deep network. The deep ConvNets in our model mimic the primary visual cortex to jointly extract local relational visual features from two face images compared with the learned filter pairs. These relational features are further processed through multiple layers to extract high-level and global features. Multiple groups of ConvNets are constructed in order to achieve robustness and characterize face similarities from different aspects. The top-layer RBM performs inference from complementary high-level features extracted from different ConvNet groups with a two-level average pooling hierarchy. The entire hybrid deep network is jointly fine-tuned to optimize for the task of face verification. Our model achieves competitive face verification performance on the LFW dataset.
2 0.19563575 335 iccv-2013-Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition
Author: Yizhe Zhang, Ming Shao, Edward K. Wong, Yun Fu
Abstract: One of the most challenging tasks in face recognition is to identify people with varied poses. Namely, the test faces have significantly different poses compared with the registered faces. In this paper, we propose a high-level feature learning scheme to extract pose-invariant identity feature for face recognition. First, we build a single-hidden-layer neural network with sparse constraint, to extract pose-invariant feature in a supervised fashion. Second, we further enhance the discriminative capability of the proposed feature by using multiple random faces as the target values for multiple encoders. By enforcing the target values to be unique for input faces over different poses, the learned high-level feature that is represented by the neurons in the hidden layer is pose free and only relevant to the identity information. Finally, we conduct face identification on CMU MultiPIE, and verification on Labeled Faces in the Wild (LFW) databases, where identification rank-1 accuracy and face verification accuracy with ROC curve are reported. These experiments demonstrate that our model is superior to other state-of-the-art approaches on handling pose variations.
3 0.18087679 279 iccv-2013-Multi-stage Contextual Deep Learning for Pedestrian Detection
Author: Xingyu Zeng, Wanli Ouyang, Xiaogang Wang
Abstract: Cascaded classifiers have been widely used in pedestrian detection and achieved great success. These classifiers are trained sequentially without joint optimization. In this paper, we propose a new deep model that can jointly train multi-stage classifiers through several stages of backpropagation. It keeps the score map output by a classifier within a local region and uses it as contextual information to support the decision at the next stage. Through a specific design of the training strategy, this deep architecture is able to simulate the cascaded classifiers by mining hard samples to train the network stage-by-stage. Each classifier handles samples at a different difficulty level. Unsupervised pre-training and specifically designed stage-wise supervised training are used to regularize the optimization problem. Both theoretical analysis and experimental results show that the training strategy helps to avoid overfitting. Experimental results on three datasets (Caltech, ETH and TUD-Brussels) show that our approach outperforms the state-of-the-art approaches.
4 0.1807085 392 iccv-2013-Similarity Metric Learning for Face Recognition
Author: Qiong Cao, Yiming Ying, Peng Li
Abstract: Recently, considerable effort has been devoted to the problem of unconstrained face verification, where the task is to predict whether pairs of images are from the same person or not. This problem is challenging and difficult due to the large variations in face images. In this paper, we develop a novel regularization framework to learn similarity metrics for unconstrained face verification. We formulate its objective function by incorporating the robustness to the large intra-personal variations and the discriminative power of novel similarity metrics. In addition, our formulation is a convex optimization problem which guarantees the existence of its global solution. Experiments show that our proposed method achieves the state-of-the-art results on the challenging Labeled Faces in the Wild (LFW) database [10].
5 0.17474712 106 iccv-2013-Deep Learning Identity-Preserving Face Space
Author: Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Abstract: Face recognition with large pose and illumination variations is a challenging problem in computer vision. This paper addresses this challenge by proposing a new learning-based face representation: the face identity-preserving (FIP) features. Unlike conventional face descriptors, the FIP features can significantly reduce intra-identity variances, while maintaining discriminativeness between identities. Moreover, the FIP features extracted from an image under any pose and illumination can be used to reconstruct its face image in the canonical view. This property makes it possible to improve the performance of traditional descriptors, such as LBP [2] and Gabor [31], which can be extracted from our reconstructed images in the canonical view to eliminate variations. In order to learn the FIP features, we carefully design a deep network that combines the feature extraction layers and the reconstruction layer. The former encodes a face image into the FIP features, while the latter transforms them to an image in the canonical view. Extensive experiments on the large MultiPIE face database [7] demonstrate that it significantly outperforms the state-of-the-art face recognition methods.
6 0.14999044 220 iccv-2013-Joint Deep Learning for Pedestrian Detection
7 0.13461831 311 iccv-2013-Pedestrian Parsing via Deep Decompositional Network
8 0.13210121 157 iccv-2013-Fast Face Detector Training Using Tailored Views
9 0.12953272 153 iccv-2013-Face Recognition Using Face Patch Networks
10 0.12930734 26 iccv-2013-A Practical Transfer Learning Algorithm for Face Verification
11 0.11669069 7 iccv-2013-A Deep Sum-Product Architecture for Robust Facial Attributes Analysis
12 0.10369006 356 iccv-2013-Robust Feature Set Matching for Partial Face Recognition
13 0.10008313 158 iccv-2013-Fast High Dimensional Vector Multiplication Face Recognition
14 0.094135188 70 iccv-2013-Cascaded Shape Space Pruning for Robust Facial Landmark Detection
15 0.093588345 69 iccv-2013-Capturing Global Semantic Relationships for Facial Action Unit Recognition
16 0.092659444 391 iccv-2013-Sieving Regression Forest Votes for Facial Feature Detection in the Wild
17 0.089778826 321 iccv-2013-Pose-Free Facial Landmark Fitting via Optimized Part Mixtures and Cascaded Deformable Shape Model
18 0.089246154 444 iccv-2013-Viewing Real-World Faces in 3D
19 0.085449234 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
20 0.085237674 351 iccv-2013-Restoring an Image Taken through a Window Covered with Dirt or Rain
topicId topicWeight
[(0, 0.152), (1, 0.056), (2, -0.066), (3, -0.108), (4, 0.001), (5, -0.097), (6, 0.175), (7, 0.093), (8, 0.011), (9, -0.038), (10, -0.026), (11, 0.022), (12, 0.076), (13, -0.032), (14, -0.021), (15, 0.005), (16, -0.03), (17, 0.025), (18, 0.028), (19, 0.128), (20, -0.059), (21, -0.072), (22, 0.047), (23, -0.091), (24, -0.109), (25, 0.027), (26, -0.03), (27, 0.047), (28, -0.059), (29, 0.234), (30, -0.02), (31, 0.011), (32, 0.081), (33, -0.052), (34, -0.044), (35, -0.028), (36, -0.083), (37, 0.022), (38, 0.035), (39, 0.01), (40, -0.045), (41, -0.03), (42, 0.007), (43, -0.051), (44, -0.026), (45, 0.039), (46, 0.019), (47, 0.018), (48, -0.071), (49, 0.103)]
simIndex simValue paperId paperTitle
same-paper 1 0.93191838 206 iccv-2013-Hybrid Deep Learning for Face Verification
Author: Yi Sun, Xiaogang Wang, Xiaoou Tang
Abstract: This paper proposes a hybrid convolutional network (ConvNet)-Restricted Boltzmann Machine (RBM) model for face verification in wild conditions. A key contribution of this work is to directly learn relational visual features, which indicate identity similarities, from raw pixels of face pairs with a hybrid deep network. The deep ConvNets in our model mimic the primary visual cortex to jointly extract local relational visual features from two face images compared with the learned filter pairs. These relational features are further processed through multiple layers to extract high-level and global features. Multiple groups of ConvNets are constructed in order to achieve robustness and characterize face similarities from different aspects. The top-layer RBM performs inference from complementary high-level features extracted from different ConvNet groups with a two-level average pooling hierarchy. The entire hybrid deep network is jointly fine-tuned to optimize for the task of face verification. Our model achieves competitive face verification performance on the LFW dataset.
2 0.85181314 106 iccv-2013-Deep Learning Identity-Preserving Face Space
Author: Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Abstract: Face recognition with large pose and illumination variations is a challenging problem in computer vision. This paper addresses this challenge by proposing a new learning-based face representation: the face identity-preserving (FIP) features. Unlike conventional face descriptors, the FIP features can significantly reduce intra-identity variances, while maintaining discriminativeness between identities. Moreover, the FIP features extracted from an image under any pose and illumination can be used to reconstruct its face image in the canonical view. This property makes it possible to improve the performance of traditional descriptors, such as LBP [2] and Gabor [31], which can be extracted from our reconstructed images in the canonical view to eliminate variations. In order to learn the FIP features, we carefully design a deep network that combines the feature extraction layers and the reconstruction layer. The former encodes a face image into the FIP features, while the latter transforms them to an image in the canonical view. Extensive experiments on the large MultiPIE face database [7] demonstrate that it significantly outperforms the state-of-the-art face recognition methods.
3 0.71078932 335 iccv-2013-Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition
Author: Yizhe Zhang, Ming Shao, Edward K. Wong, Yun Fu
Abstract: One of the most challenging tasks in face recognition is to identify people with varied poses. Namely, the test faces have significantly different poses compared with the registered faces. In this paper, we propose a high-level feature learning scheme to extract pose-invariant identity feature for face recognition. First, we build a single-hidden-layer neural network with sparse constraint, to extract pose-invariant feature in a supervised fashion. Second, we further enhance the discriminative capability of the proposed feature by using multiple random faces as the target values for multiple encoders. By enforcing the target values to be unique for input faces over different poses, the learned high-level feature that is represented by the neurons in the hidden layer is pose free and only relevant to the identity information. Finally, we conduct face identification on CMU MultiPIE, and verification on Labeled Faces in the Wild (LFW) databases, where identification rank-1 accuracy and face verification accuracy with ROC curve are reported. These experiments demonstrate that our model is superior to other state-of-the-art approaches on handling pose variations.
4 0.69178987 311 iccv-2013-Pedestrian Parsing via Deep Decompositional Network
Author: Ping Luo, Xiaogang Wang, Xiaoou Tang
Abstract: We propose a new Deep Decompositional Network (DDN) for parsing pedestrian images into semantic regions, such as hair, head, body, arms, and legs, where the pedestrians can be heavily occluded. Unlike existing methods based on template matching or Bayesian inference, our approach directly maps low-level visual features to the label maps of body parts with DDN, which is able to accurately estimate complex pose variations with good robustness to occlusions and background clutters. DDN jointly estimates occluded regions and segments body parts by stacking three types of hidden layers: occlusion estimation layers, completion layers, and decomposition layers. The occlusion estimation layers estimate a binary mask, indicating which part of a pedestrian is invisible. The completion layers synthesize low-level features of the invisible part from the original features and the occlusion mask. The decomposition layers directly transform the synthesized visual features to label maps. We devise a new strategy to pre-train these hidden layers, and then fine-tune the entire network using the stochastic gradient descent. Experimental results show that our approach achieves better segmentation accuracy than the state-of-the-art methods on pedestrian images with or without occlusions. Another important contribution of this paper is that it provides a large scale benchmark human parsing dataset that includes 3,673 annotated samples collected from 171 surveillance videos. It is 20 times larger than existing public datasets.
5 0.67114204 154 iccv-2013-Face Recognition via Archetype Hull Ranking
Author: Yuanjun Xiong, Wei Liu, Deli Zhao, Xiaoou Tang
Abstract: The archetype hull model is playing an important role in large-scale data analytics and mining, but rarely applied to vision problems. In this paper, we migrate such a geometric model to address face recognition and verification together through proposing a unified archetype hull ranking framework. Upon a scalable graph characterized by a compact set of archetype exemplars whose convex hull encompasses most of the training images, the proposed framework explicitly captures the relevance between any query and the stored archetypes, yielding a rank vector over the archetype hull. The archetype hull ranking is then executed on every block of face images to generate a blockwise similarity measure that is achieved by comparing two different rank vectors with respect to the same archetype hull. After integrating blockwise similarity measurements with learned importance weights, we accomplish a sensible face similarity measure which can support robust and effective face recognition and verification. We evaluate the face similarity measure in terms of experiments performed on three benchmark face databases Multi-PIE, Pubfig83, and LFW, demonstrating its performance superior to the state-of-the-arts.
6 0.65660793 279 iccv-2013-Multi-stage Contextual Deep Learning for Pedestrian Detection
7 0.65007371 195 iccv-2013-Hidden Factor Analysis for Age Invariant Face Recognition
8 0.63859427 220 iccv-2013-Joint Deep Learning for Pedestrian Detection
9 0.62880468 158 iccv-2013-Fast High Dimensional Vector Multiplication Face Recognition
10 0.61285537 153 iccv-2013-Face Recognition Using Face Patch Networks
11 0.59052122 157 iccv-2013-Fast Face Detector Training Using Tailored Views
12 0.56864667 392 iccv-2013-Similarity Metric Learning for Face Recognition
13 0.5616498 261 iccv-2013-Markov Network-Based Unified Classifier for Face Identification
14 0.54673612 272 iccv-2013-Modifying the Memorability of Face Photographs
15 0.54284775 7 iccv-2013-A Deep Sum-Product Architecture for Robust Facial Attributes Analysis
16 0.53014326 351 iccv-2013-Restoring an Image Taken through a Window Covered with Dirt or Rain
17 0.52688849 356 iccv-2013-Robust Feature Set Matching for Partial Face Recognition
18 0.51005405 97 iccv-2013-Coupling Alignments with Recognition for Still-to-Video Face Recognition
19 0.50433606 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
20 0.50430626 26 iccv-2013-A Practical Transfer Learning Algorithm for Face Verification
topicId topicWeight
[(2, 0.064), (4, 0.019), (7, 0.017), (26, 0.096), (31, 0.041), (34, 0.011), (42, 0.154), (48, 0.055), (64, 0.043), (73, 0.032), (78, 0.021), (79, 0.168), (89, 0.143)]
simIndex simValue paperId paperTitle
same-paper 1 0.8572275 206 iccv-2013-Hybrid Deep Learning for Face Verification
Author: Yi Sun, Xiaogang Wang, Xiaoou Tang
Abstract: This paper proposes a hybrid convolutional network (ConvNet)-Restricted Boltzmann Machine (RBM) model for face verification in wild conditions. A key contribution of this work is to directly learn relational visual features, which indicate identity similarities, from raw pixels of face pairs with a hybrid deep network. The deep ConvNets in our model mimic the primary visual cortex to jointly extract local relational visual features from two face images compared with the learned filter pairs. These relational features are further processed through multiple layers to extract high-level and global features. Multiple groups of ConvNets are constructed in order to achieve robustness and characterize face similarities from different aspects. The top-layer RBM performs inference from complementary high-level features extracted from different ConvNet groups with a two-level average pooling hierarchy. The entire hybrid deep network is jointly fine-tuned to optimize for the task of face verification. Our model achieves competitive face verification performance on the LFW dataset.
2 0.85672057 301 iccv-2013-Optimal Orthogonal Basis and Image Assimilation: Motion Modeling
Author: Etienne Huot, Giuseppe Papari, Isabelle Herlin
Abstract: This paper describes modeling and numerical computation of orthogonal bases, which are used to describe images and motion fields. Motion estimation from image data is then studied on subspaces spanned by these bases. A reduced model is obtained as the Galerkin projection on these subspaces of a physical model, based on Euler and optical flow equations. A data assimilation method is studied, which assimilates coefficients of image data in the reduced model in order to estimate motion coefficients. The approach is first quantified on synthetic data: it demonstrates the interest of model reduction as a compromise between results quality and computational cost. Results obtained on real data are then displayed so as to illustrate the method.
3 0.81005585 161 iccv-2013-Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration
Author: Chenglong Bao, Jian-Feng Cai, Hui Ji
Abstract: In recent years, how to learn a dictionary from input images for sparse modelling has been a very active topic in image processing and recognition. Most existing dictionary learning methods consider an over-complete dictionary, e.g. the K-SVD method. Often they require solving some minimization problem that is very challenging in terms of computational feasibility and efficiency. However, if the correlations among dictionary atoms are not well constrained, the redundancy of the dictionary does not necessarily improve the performance of sparse coding. This paper proposes a fast orthogonal dictionary learning method for sparse image representation. With comparable performance on several image restoration tasks, the proposed method is much more computationally efficient than the over-complete dictionary based learning methods.
4 0.80964291 354 iccv-2013-Robust Dictionary Learning by Error Source Decomposition
Author: Zhuoyuan Chen, Ying Wu
Abstract: Sparsity models have recently shown great promise in many vision tasks. Using a learned dictionary in sparsity models can in general outperform predefined bases in clean data. In practice, both training and testing data may be corrupted and contain noises and outliers. Although recent studies attempted to cope with corrupted data and achieved encouraging results in testing phase, how to handle corruption in training phase still remains a very difficult problem. In contrast to most existing methods that learn the dictionary from clean data, this paper is targeted at handling corruptions and outliers in training data for dictionary learning. We propose a general method to decompose the reconstructive residual into two components: a non-sparse component for small universal noises and a sparse component for large outliers, respectively. In addition, further analysis reveals the connection between our approach and the “partial” dictionary learning approach, updating only part of the prototypes (or informative codewords) with remaining (or noisy codewords) fixed. Experiments on synthetic data as well as real applications have shown satisfactory performance of this new robust dictionary learning approach.
5 0.79955769 106 iccv-2013-Deep Learning Identity-Preserving Face Space
Author: Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Abstract: Face recognition with large pose and illumination variations is a challenging problem in computer vision. This paper addresses this challenge by proposing a new learning-based face representation: the face identity-preserving (FIP) features. Unlike conventional face descriptors, the FIP features can significantly reduce intra-identity variances, while maintaining discriminativeness between identities. Moreover, the FIP features extracted from an image under any pose and illumination can be used to reconstruct its face image in the canonical view. This property makes it possible to improve the performance of traditional descriptors, such as LBP [2] and Gabor [31], which can be extracted from our reconstructed images in the canonical view to eliminate variations. In order to learn the FIP features, we carefully design a deep network that combines the feature extraction layers and the reconstruction layer. The former encodes a face image into the FIP features, while the latter transforms them to an image in the canonical view. Extensive experiments on the large MultiPIE face database [7] demonstrate that it significantly outperforms the state-of-the-art face recognition methods.
6 0.79927123 259 iccv-2013-Manifold Based Face Synthesis from Sparse Samples
7 0.79726601 330 iccv-2013-Proportion Priors for Image Sequence Segmentation
8 0.79592329 398 iccv-2013-Sparse Variation Dictionary Learning for Face Recognition with a Single Training Sample per Person
9 0.79450363 44 iccv-2013-Adapting Classification Cascades to New Domains
10 0.79436201 63 iccv-2013-Bounded Labeling Function for Global Segmentation of Multi-part Objects with Geometric Constraints
11 0.79359871 80 iccv-2013-Collaborative Active Learning of a Kernel Machine Ensemble for Recognition
12 0.79304838 277 iccv-2013-Multi-channel Correlation Filters
13 0.79288232 208 iccv-2013-Image Co-segmentation via Consistent Functional Maps
14 0.79109395 150 iccv-2013-Exemplar Cut
15 0.7906599 427 iccv-2013-Transfer Feature Learning with Joint Distribution Adaptation
16 0.79032105 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization
17 0.79016268 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
18 0.78994501 45 iccv-2013-Affine-Constrained Group Sparse Coding and Its Application to Image-Based Classifications
19 0.78950316 54 iccv-2013-Attribute Pivots for Guiding Relevance Feedback in Image Search
20 0.78855854 180 iccv-2013-From Where and How to What We See