cvpr cvpr2013 cvpr2013-152 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Brandon M. Smith, Li Zhang, Jonathan Brandt, Zhe Lin, Jianchao Yang
Abstract: In this work, we propose an exemplar-based face image segmentation algorithm. We take inspiration from previous works on image parsing for general scenes. Our approach assumes a database of exemplar face images, each of which is associated with a hand-labeled segmentation map. Given a test image, our algorithm first selects a subset of exemplar images from the database. Our algorithm then computes a nonrigid warp for each exemplar image to align it with the test image. Finally, we propagate labels from the exemplar images to the test image in a pixel-wise manner, using trained weights to modulate and combine label maps from different exemplars. We evaluate our method on two challenging datasets and compare with two face parsing algorithms and a general scene parsing algorithm. We also compare our segmentation results with contour-based face alignment results; that is, we first run the alignment algorithms to extract contour points and then derive segments from the contours. Our algorithm compares favorably with all previous works on all datasets evaluated.
Reference: text
sentIndex sentText sentNum sentScore
1 edu/~lizhang/projects/face-parsing/ Abstract In this work, we propose an exemplar-based face image segmentation algorithm. [sent-5, score-0.341]
2 Our approach assumes a database of exemplar face images, each of which is associated with a hand-labeled segmentation map. [sent-7, score-0.768]
3 Given a test image, our algorithm first selects a subset of exemplar images from the database. Our algorithm then computes a nonrigid warp for each exemplar image to align it with the test image. [sent-8, score-1.076]
4 Finally, we propagate labels from the exemplar images to the test image in a pixel-wise manner, using trained weights to modulate and combine label maps from different exemplars. [sent-9, score-0.715]
5 We evaluate our method on two challenging datasets and compare with two face parsing algorithms and a general scene parsing algorithm. [sent-10, score-0.624]
6 We also compare our segmentation results with contour-based face alignment results; that is, we first run the alignment algorithms to extract contour points and then derive segments from the contours. [sent-11, score-0.664]
7 Introduction In face image analysis, one common task is to parse an input face image into facial parts, e. [sent-14, score-0.677]
8 Most previous methods accomplish this task by marking a few landmarks [1, 22] or a few contours [4, 18] on the input face image. [sent-17, score-0.425]
9 In this paper, we seek to mark each pixel on the face with its semantic part label; that is, our algorithm parses a face image into its constituent facial parts. [sent-18, score-0.724]
10 • Other than eye corners and mouth corners, most landmarks are not well-defined. [sent-20, score-0.459]
11 For example, it is unclear how many landmarks should be defined on the chinline, or how noses should be represented: should there be a line segment along the nose ridge, or a contour around the nostrils? [sent-21, score-0.327]
12 Our exemplar-based algorithm parses a face image into its constituent facial parts using a soft segmentation. [sent-25, score-0.519]
13 • Contour-based representations are not general enough to model several facial parts useful for robust face analysis. [sent-26, score-0.426]
14 For example, teeth are important cues for analyzing open-mouth expressions; ears are important cues for analyzing profile faces; strands of hair are often confused by algorithms as occluders. [sent-27, score-0.36]
15 For example, the precise location of the tip of an eyebrow or the contour of a nose ridge are difficult to determine, even for human labelers. [sent-30, score-0.42]
16 Such uncertainty leads to errors in human labeled face data that are used in both the training and evaluation of algorithms. [sent-31, score-0.328]
17 Segment-based representations alleviate the aforementioned limitations: segments can represent any facial part, be they hair or teeth, and soft segmentation can model uncertain transitions between parts. [sent-32, score-0.535]
18 Although semantic segmentation for general scenes has received tremendous attention in recent years [7, 8, 12, 19], there has been relatively little attention given specifically to face part segmentation, with the exception of [15, 21]. [sent-33, score-0.341]
19 Since facial parts have special geometric configurations compared to general indoor and outdoor scenes, we propose an exemplar-based face image segmentation algorithm, taking inspiration from previous work in image parsing for general scenes. [sent-34, score-0.65]
20 Specifically, our approach assumes a database of face images, each of which is associated with a hand-labeled segmentation map and a set of sparse keypoint descriptors. [sent-35, score-0.466]
21 [1] to select m top exemplar images from the database as input. [sent-38, score-0.453]
22 Our algorithm then computes a nonrigid warp for each top exemplar; each nonrigid warp aligns the exemplar image to the test image by matching the set of sparse precomputed exemplar keypoints to the test image. [sent-39, score-1.4]
23 Finally, we propagate labels from the exemplar images to the test image in a pixelwise manner, using trained weights that modulate and combine label maps differently for each part type. [sent-40, score-0.715]
24 We evaluate our method on two challenging datasets [6, 9] and compare with two face parsing algorithms [15, 21] and a general scene parsing algorithm [12]. [sent-41, score-0.624]
25 We also compare our segmentation results with contour-based face alignment results: that is, we first run the alignment algorithms [4, 18, 22] to extract contour points and then derive segments from the contours. [sent-42, score-0.664]
26 , skin and eyebrow regions, we recover label probabilities at each pixel. [sent-51, score-0.542]
27 A learning algorithm for finding optimal parameters for calibrating exemplar label types. [sent-54, score-0.546]
28 To correct these biases, we train a set of label weights that adjust the relative importance of each label type. [sent-58, score-0.37]
29 Hair mattes are also included for future work in hair segmentation. [sent-63, score-0.338]
30 Second, because they produce only a binary classification, the component-specific segmentors do not generalize well to more complicated label interactions, such as those that exist between the inner mouth region, the lips, and the skin around the lips, for example. [sent-75, score-0.696]
31 Warrell and Prince argued that the scene parsing approach is advantageous because it is general enough to handle unconstrained face images, where the shape and appearance of features vary widely and relatively rare semantic label classes exist, such as moustaches and hats. [sent-78, score-0.613]
32 As part of their contribution, they introduced priors to loosely model the topological structure of face images (so that mouth labels do not appear in the forehead, for example). [sent-79, score-0.603]
33 However, the labels they generate are often coarse and inaccurate, especially for small face components like eyes and eyebrows. [sent-80, score-0.375]
34 We show in this work that our approach produces accurate, fine-scale label estimates in unconstrained face images. [sent-81, score-0.444]
35 This savings allows us to use a large set of top exemplar images for label transfer (in [12] they use m ≤ 9 top exemplar images; we use m = 100), which is important in our approach for two reasons. [sent-92, score-0.986]
36 To this end, we propose a training algorithm for estimating a set of weights that convert label maps from exemplars to label probabilities on the test image. [sent-96, score-0.622]
37 We remark that soft segmentation is useful for future work on hair segmentation, among other applications. [sent-97, score-0.367]
38 Each pi encodes label uncertainty at the pixel level, which reflects the natural indistinctness of some facial features (e. [sent-108, score-0.346]
39 Each exemplar Mj has four parts: an image, a label map, a very sparse set of facial landmark points, and a sparse set of SIFT [14] keypoint descriptors. [sent-118, score-0.65]
40 We use 12 landmark points: 2 mouth corners, 4 eye corners, 2 points on the eyebrows (each centered on the top edge), 2 points on the mouth (one on the top edge of the upper lip and one on the bottom edge of the bottom lip), 1 point between the nostrils, and 1 chin point. [sent-124, score-0.966]
41 Runtime Pre-processing Given a test image, we first use a face detector (i. [sent-130, score-0.333]
42 The test image is then rescaled so that the face has an IOD of approximately 55 pixels, which is the size of the exemplar faces. [sent-133, score-0.721]
43 To search for a subset of m top exemplar faces in the database, we use Belhumeur et al. [sent-135, score-0.482]
44 The output of the pre-processing is a set of m exemplars, each of which is associated with a similarity transformation that aligns the exemplar to the face in the test image. [sent-137, score-0.721]
45 Step 1: Nonrigid exemplar alignment For each keypoint in each of the top m exemplars, search within a small window in the test image to find the best match; record the matching score and the location offset of the best match for each keypoint. [sent-138, score-0.685]
46 Warp the label map of each top exemplar nonrigidly using a displacement field interpolated from the location offsets. [sent-139, score-0.616]
47 Step 2: Exemplar label map aggregation Aggregate warped label maps using weights derived from the keypoint matching scores in Step 1. [sent-140, score-0.485]
48 The weights are spatially varying among exemplar pixel locations and favor exemplar pixels near keypoints that are matched well with the test image. [sent-141, score-0.969]
49 Step 3: Pixel-wise label selection Produce a label probability vector at each pixel by first attenuating each channel in the aggregated label map and then normalizing it. [sent-142, score-0.565]
50 Step 1: Nonrigid Exemplar Alignment Due to local deformation, a similarity transformation is not sufficient to align an exemplar with the testing image. [sent-147, score-0.388]
51 The goal of Step 1 is to refine the registration using a nonrigid warp between each top exemplar label map and the test image. [sent-148, score-0.825]
52 Therefore, for efficiency reasons, we instead rely on about 150 SIFT keypoints to compute the nonrigid warp between each exemplar and the test image. [sent-152, score-0.733]
53 Our algorithm computes a nonrigid warp for each exemplar label map by interpolating the displacements {Δx_f}, f = 1, ..., F, where F is the number of SIFT keypoints in the exemplar. [sent-158, score-0.844]
54 Step 2: Exemplar Aggregation For each exemplar label map, we interpolate the matching scores r(Δxf) in Eq. [sent-162, score-0.546]
55 Now, each nonrigidly warped exemplar label map is associated with a per-pixel matching score map. [sent-165, score-0.619]
56 Near smaller regions, like the eyes, eyebrows, and lips, we observe that, if the aggregated label probabilities are incorrect, they tend to be incorrect in the direction of the larger surrounding regions, namely the face skin and background. [sent-172, score-0.626]
57 Assuming noise prevents perfect correspondences, “skin” label correspondences will occur more frequently than “eyebrow” label correspondence simply because there are many more skin labels than eyebrow labels. [sent-174, score-0.729]
58 A common symptom of this label bias is that estimated eyebrow regions (and other small regions of the face) tend to be too small. [sent-175, score-0.356]
59 We compensate for this bias by re-weighting each component of the aggregated label probability vector, and then renormalizing each pixel’s label probability vector afterward. [sent-176, score-0.398]
60 Given a tuning set with ground truth label probabilities, we find label component weights α = [α1 , α2 , . [sent-177, score-0.439]
61 After the label component weights have been found, we adjust each label probability vector. [sent-223, score-0.397]
62 Results and Discussion We have evaluated our method on two different datasets, and we show that it clearly improves upon a recent general scene parsing approach and existing face parsing approaches. [sent-227, score-0.624]
63 Additionally, we adapt a recent landmark localization method and two face alignment algorithms to produce segmentation results, and show that our method is more accurate. [sent-228, score-0.484]
64 Following their procedure to evaluate accuracy, we generated ground truth by annotating each face with contour points around each segment. [sent-235, score-0.431]
65 Our second (primary) dataset is Helen [9], which is composed of 2330 face images with densely-sampled, manually annotated contours around the eyes, eyebrows, nose, outer lips, inner lips, and jawline. [sent-236, score-0.388]
66 Our exemplar set was used for all experiments, including experiments on LFW images. [sent-239, score-0.388]
67 Our Helen tuning and test sets were formed by taking the first 330 images in the dataset; they include no subjects from the exemplar set. [sent-240, score-0.475]
68 For face skin, we used the jawline contour as the lower boundary; for the upper boundary, we separated the forehead from the hair by manually annotating forehead and hair scribbles and running an automatic matting algorithm [11] on each image. [sent-276, score-1.107]
69 Although we do not focus on hair segmentation in this work, we also recovered “ground truth” hair regions using this approach. [sent-277, score-0.587]
70 The hair mattes from [11] are usually accurate, but mistakes are inevitable. [sent-278, score-0.338]
71 Therefore, to ensure fair accuracy measurements, we manually annotated the face skin in all test images. [sent-279, score-0.453]
72 4 (we matched the LFW segment representation by grouping the Helen mouth components, and treating face skin as background). [sent-289, score-0.72]
73 [18], and Gu & Kanade [4] are face alignment methods. [sent-313, score-0.358]
74 (4) finds label weights that maximize the recall rates of eye, eyebrow, nose, and mouth pixels, which are relatively few and sensitive to errors, by sacrificing the recall rate of background pixels, which are numerous and insensitive to errors. [sent-322, score-0.521]
75 with several other face parsing and alignment methods [4, 18, 21, 22] in Table 1. [sent-329, score-0.527]
76 The inside of the mouth is not given as ground truth for the LFW images, and so we show only the entire mouth segment. [sent-333, score-0.583]
77 case, the “overall” measure is computed over eye, eyebrow, nose, inner mouth, upper lip, and lower lip segments; face skin is excluded in the overall measure, as it cannot be computed for Zhu & Ramanan, Saragih et al. [sent-339, score-0.591]
78 The difference is minimal and is primarily due to our algorithm incorrectly “hallucinating” skin in hair regions, while Liu et al. [sent-343, score-0.413]
79 Regardless, we see that our approach improves upon the segments generated by recent face alignment algorithms. [sent-357, score-0.421]
80 By comparing the fifth and sixth rows of Table 2, we observe that the local search and nonrigid exemplar alignment from Step 1 of our algorithm modestly improve the quantitative accuracy of our results. [sent-371, score-0.56]
81 We see a noticeable improvement from row six to row seven in Table 2, especially in the inner mouth region, due to the label weights. [sent-375, score-0.466]
82 In our view, the mouth is the most challenging region of the face to segment. [sent-376, score-0.563]
83 The shape and appearance of lips vary widely between subjects, mouths deform significantly, and the overall appearance of the mouth region changes depending on whether the inside of the mouth is visible or not. [sent-377, score-0.651]
84 Unusual mouth expressions, like the one shown in the rightmost column of Figure 4, are not represented well in the exemplar images, which results in poor label transfer from the top exemplars to the test image. [sent-378, score-1.067]
85 Our improvement over other algorithms demonstrates the advantages of using segments to parse face parts. [sent-380, score-0.349]
86 For example, the inside of the mouth is not well modeled using a classical contour based representation. [sent-381, score-0.393]
87 However, we can recover contours by treating contour points in the exemplars in almost the same way that we treat segment labels. [sent-411, score-0.427]
88 That is, in Step 1, we warp the contour point from each exemplar in the same way that we warp the exemplar label maps. [sent-412, score-1.262]
89 Specifically, each contour point is found by computing the weighted average location of the warped exemplar contour points; each weight j is given by the match scores in Rj closest to the contour point. [sent-415, score-0.765]
90 Hair Segmentation Several approaches for hair segmentation start by estimating a set of hair / not hair seed pixels in the image, and then refine the hair region using a matting algorithm ([16] is one example). [sent-417, score-1.16]
91 We can also generate seeds by counting the votes from hair / not hair labels from the top exemplars, and thresholding the counts. [sent-418, score-0.643]
92 Figure 6 shows seeds generated using this approach, and hair mattes computed from these seeds using [11]. [sent-419, score-0.428]
93 Face Image Reconstruction and Synthesis Exemplar-based face image reconstruction/synthesis is applicable for various face image editing tasks, including grayscale image colorization [10] and automatic face image retouching [5]. [sent-420, score-0.89]
94 We can create a synthetic version of the input face by propagating color and intensity information from the exemplar images to the input image; this can be easily accomplished by replacing the label vectors with the color (or intensity) channels of the exemplar images. [sent-421, score-1.22]
95 We can recover accurate hair mattes in many cases (first two columns), but the procedure often fails on difficult cases (third column). [sent-427, score-0.37]
96 We can synthesize the input face by replacing the exemplar label vectors with the color channels from the exemplar images. [sent-430, score-0.546]
97 Conclusion and Future Work In this paper, we have proposed an automatic face parsing technique that recovers a soft segment-based representation of the face, which naturally encodes the segment class uncertainty in the image. [sent-434, score-0.58]
98 Second, we proposed a learning algorithm for finding optimal label calibration weights, which remove biases between label types. [sent-436, score-0.35]
99 Third, we offer a new face segmentation dataset built as an extension of the recent Helen face dataset [9], which offers ground truth pixel-wise labels for face parts in high quality images. [sent-437, score-1.017]
100 Labeled faces in the wild: A database for studying face recognition in unconstrained environments. [sent-491, score-0.366]
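Steps 2 and 3 in the sentences above (score-weighted aggregation of warped exemplar label maps, followed by per-label re-weighting and renormalization) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the flat per-pixel list layout, the plain weighted sum, and the uniform fallback for all-zero pixels are all hypothetical, and the exact weighting equations are not reproduced in this summary.

```python
from typing import List

Vector = List[float]


def aggregate_labels(label_maps: List[List[Vector]],
                     score_maps: List[List[float]]) -> List[Vector]:
    """Sum warped exemplar label vectors per pixel, weighted by match score.

    label_maps[j][i] is the K-channel label vector of exemplar j at pixel i.
    score_maps[j][i] is exemplar j's interpolated match score at pixel i.
    (Assumption: a plain weighted sum stands in for the paper's weighting.)
    """
    num_pixels = len(label_maps[0])
    num_labels = len(label_maps[0][0])
    agg = [[0.0] * num_labels for _ in range(num_pixels)]
    for labels, scores in zip(label_maps, score_maps):
        for i in range(num_pixels):
            for k in range(num_labels):
                agg[i][k] += scores[i] * labels[i][k]
    return agg


def reweight_and_normalize(agg: List[Vector], alpha: Vector) -> List[Vector]:
    """Scale each label channel by alpha[k], then renormalize every pixel."""
    probs = []
    for vec in agg:
        scaled = [a * v for a, v in zip(alpha, vec)]
        total = sum(scaled)
        if total > 0:
            probs.append([s / total for s in scaled])
        else:
            # Assumption: fall back to a uniform vector when no exemplar votes.
            probs.append([1.0 / len(vec)] * len(vec))
    return probs


if __name__ == "__main__":
    # One pixel, two labels (skin, eyebrow), two exemplars that disagree.
    label_maps = [[[1.0, 0.0]], [[0.0, 1.0]]]
    score_maps = [[0.6], [0.4]]
    agg = aggregate_labels(label_maps, score_maps)
    # Boosting the eyebrow channel counteracts the bias toward large regions.
    print(reweight_and_normalize(agg, alpha=[1.0, 3.0]))
```

In the toy example, skin wins before re-weighting (0.6 versus 0.4); after the eyebrow channel is boosted by a factor of 3, eyebrow becomes the more probable label, which mirrors the bias correction for small regions described in sentences 56 to 59.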
wordName wordTfidf (topN-words)
[('exemplar', 0.388), ('helen', 0.307), ('face', 0.286), ('mouth', 0.277), ('hair', 0.266), ('eyebrow', 0.198), ('exemplars', 0.171), ('parsing', 0.169), ('label', 0.158), ('skin', 0.12), ('lfw', 0.118), ('contour', 0.116), ('segmentors', 0.11), ('warp', 0.106), ('nose', 0.106), ('facial', 0.105), ('nonrigid', 0.1), ('lip', 0.097), ('lips', 0.097), ('keypoints', 0.092), ('eyebrows', 0.09), ('keypoint', 0.086), ('xf', 0.078), ('runtime', 0.077), ('saragih', 0.075), ('alignment', 0.072), ('eye', 0.072), ('mattes', 0.072), ('contours', 0.071), ('landmark', 0.071), ('landmarks', 0.068), ('confusion', 0.067), ('ck', 0.065), ('warrell', 0.065), ('segments', 0.063), ('teeth', 0.062), ('iod', 0.058), ('sift', 0.057), ('segmentation', 0.055), ('weights', 0.054), ('belhumeur', 0.053), ('forehead', 0.051), ('liu', 0.051), ('favorably', 0.05), ('eyes', 0.049), ('luo', 0.047), ('test', 0.047), ('parses', 0.047), ('prince', 0.047), ('soft', 0.046), ('seeds', 0.045), ('gu', 0.044), ('gdesc', 0.044), ('gspatial', 0.044), ('hallucinates', 0.044), ('labelfaces', 0.044), ('nonrigidly', 0.044), ('nostrils', 0.044), ('kanade', 0.043), ('corners', 0.042), ('uncertainty', 0.042), ('pi', 0.041), ('matting', 0.041), ('faces', 0.041), ('tuning', 0.04), ('labels', 0.04), ('database', 0.039), ('exemplifies', 0.039), ('zhu', 0.039), ('window', 0.039), ('segment', 0.037), ('attenuating', 0.036), ('aggregate', 0.036), ('parts', 0.035), ('probabilities', 0.034), ('biases', 0.034), ('colorization', 0.032), ('ears', 0.032), ('nonparametric', 0.032), ('recover', 0.032), ('maximize', 0.032), ('inner', 0.031), ('warping', 0.031), ('compares', 0.03), ('upper', 0.03), ('occur', 0.029), ('segmentations', 0.029), ('truth', 0.029), ('warped', 0.029), ('modulate', 0.028), ('brandt', 0.028), ('aggregated', 0.028), ('offset', 0.027), ('bel', 0.027), ('et', 0.027), ('probability', 0.027), ('top', 0.026), ('adobe', 0.026), ('correspondence', 0.026), ('judge', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999934 152 cvpr-2013-Exemplar-Based Face Parsing
Author: Brandon M. Smith, Li Zhang, Jonathan Brandt, Zhe Lin, Jianchao Yang
Abstract: In this work, we propose an exemplar-based face image segmentation algorithm. We take inspiration from previous works on image parsing for general scenes. Our approach assumes a database of exemplar face images, each of which is associated with a hand-labeled segmentation map. Given a test image, our algorithm first selects a subset of exemplar images from the database. Our algorithm then computes a nonrigid warp for each exemplar image to align it with the test image. Finally, we propagate labels from the exemplar images to the test image in a pixel-wise manner, using trained weights to modulate and combine label maps from different exemplars. We evaluate our method on two challenging datasets and compare with two face parsing algorithms and a general scene parsing algorithm. We also compare our segmentation results with contour-based face alignment results; that is, we first run the alignment algorithms to extract contour points and then derive segments from the contours. Our algorithm compares favorably with all previous works on all datasets evaluated.
2 0.41417533 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
Author: Xiaohui Shen, Zhe Lin, Jonathan Brandt, Ying Wu
Abstract: Detecting faces in uncontrolled environments continues to be a challenge to traditional face detection methods [24] due to the large variation in facial appearances, as well as occlusion and clutter. In order to overcome these challenges, we present a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning. A large database of faces with bounding rectangles and facial landmark locations is collected, and simple discriminative classifiers are learned from each of them. A voting-based method is then proposed to let these classifiers cast votes on the test image through an efficient image retrieval technique. As a result, faces can be very efficiently detected by selecting the modes from the voting maps, without resorting to exhaustive sliding window-style scanning. Moreover, due to the exemplar-based framework, our approach can detect faces under challenging conditions without explicitly modeling their variations. Evaluation on two public benchmark datasets shows that our new face detection approach is accurate and efficient, and achieves the state-of-the-art performance. We further propose to use image retrieval for face validation (in order to remove false positives) and for face alignment/landmark localization. The same methodology can also be easily generalized to other face-related tasks, such as attribute recognition, as well as general object detection.
3 0.24165717 415 cvpr-2013-Structured Face Hallucination
Author: Chih-Yuan Yang, Sifei Liu, Ming-Hsuan Yang
Abstract: The goal of face hallucination is to generate high-resolution images with fidelity from low-resolution ones. In contrast to existing methods based on patch similarity or holistic constraints in the image space, we propose to exploit local image structures for face hallucination. Each face image is represented in terms of facial components, contours and smooth regions. The image structure is maintained via matching gradients in the reconstructed high-resolution output. For facial components, we align input images to generate accurate exemplars and transfer the high-frequency details for preserving structural consistency. For contours, we learn statistical priors to generate salient structures in the high-resolution images. A patch matching method is utilized on the smooth regions where the image gradients are preserved. Experimental results demonstrate that the proposed algorithm generates hallucinated face images with favorable quality and adaptability.
4 0.2235458 50 cvpr-2013-Augmenting CRFs with Boltzmann Machine Shape Priors for Image Labeling
Author: Andrew Kae, Kihyuk Sohn, Honglak Lee, Erik Learned-Miller
Abstract: Conditional random fields (CRFs) provide powerful tools for building models to label image segments. They are particularly well-suited to modeling local interactions among adjacent regions (e.g., superpixels). However, CRFs are limited in dealing with complex, global (long-range) interactions between regions. Complementary to this, restricted Boltzmann machines (RBMs) can be used to model global shapes produced by segmentation models. In this work, we present a new model that uses the combined power of these two network types to build a state-of-the-art labeler. Although the CRF is a good baseline labeler, we show how an RBM can be added to the architecture to provide a global shape bias that complements the local modeling provided by the CRF. We demonstrate its labeling performance for the parts of complex face images from the Labeled Faces in the Wild data set. This hybrid model produces results that are both quantitatively and qualitatively better than the CRF alone. In addition to high-quality labeling results, we demonstrate that the hidden units in the RBM portion of our model can be interpreted as face attributes that have been learned without any attribute-level supervision.
5 0.19637004 467 cvpr-2013-Wide-Baseline Hair Capture Using Strand-Based Refinement
Author: Linjie Luo, Cha Zhang, Zhengyou Zhang, Szymon Rusinkiewicz
Abstract: We propose a novel algorithm to reconstruct the 3D geometry of human hairs in wide-baseline setups using strand-based refinement. The hair strands are first extracted in each 2D view, and projected onto the 3D visual hull for initialization. The 3D positions of these strands are then refined by optimizing an objective function that takes into account cross-view hair orientation consistency, the visual hull constraint and smoothness constraints defined at the strand, wisp and global levels. Based on the refined strands, the algorithm can reconstruct an approximate hair surface: experiments with synthetic hair models achieve an accuracy of ∼3mm. We also show real-world examples to demonstrate the capability to capture full-head hair styles as well as hair in motion with as few as 8 cameras.
6 0.19100611 438 cvpr-2013-Towards Pose Robust Face Recognition
7 0.18977971 64 cvpr-2013-Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification
8 0.18870531 338 cvpr-2013-Probabilistic Elastic Matching for Pose Variant Face Verification
10 0.18120486 172 cvpr-2013-Finding Group Interactions in Social Clutter
11 0.17102529 182 cvpr-2013-Fusing Robust Face Region Descriptors via Multiple Metric Learning for Face Recognition in the Wild
12 0.15527481 399 cvpr-2013-Single-Sample Face Recognition with Image Corruption and Misalignment via Sparse Illumination Transfer
13 0.15248023 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos
14 0.15046297 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification
15 0.14425994 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors
16 0.14225021 325 cvpr-2013-Part Discovery from Partial Correspondence
17 0.14161825 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
18 0.14141086 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
19 0.13230874 107 cvpr-2013-Deformable Spatial Pyramid Matching for Fast Dense Correspondences
20 0.12759896 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes
topicId topicWeight
[(0, 0.227), (1, -0.055), (2, -0.032), (3, 0.004), (4, 0.131), (5, -0.005), (6, -0.036), (7, -0.064), (8, 0.259), (9, -0.227), (10, 0.185), (11, -0.02), (12, 0.14), (13, 0.137), (14, 0.054), (15, 0.002), (16, 0.086), (17, -0.058), (18, 0.013), (19, 0.094), (20, -0.027), (21, 0.0), (22, 0.067), (23, 0.021), (24, 0.079), (25, 0.021), (26, 0.137), (27, 0.002), (28, 0.006), (29, 0.092), (30, -0.023), (31, -0.009), (32, -0.015), (33, 0.074), (34, 0.018), (35, -0.109), (36, 0.013), (37, -0.084), (38, -0.021), (39, 0.023), (40, 0.066), (41, 0.002), (42, -0.036), (43, -0.014), (44, 0.087), (45, -0.121), (46, 0.007), (47, -0.068), (48, -0.086), (49, 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.95325238 152 cvpr-2013-Exemplar-Based Face Parsing
Author: Brandon M. Smith, Li Zhang, Jonathan Brandt, Zhe Lin, Jianchao Yang
Abstract: In this work, we propose an exemplar-based face image segmentation algorithm. We take inspiration from previous works on image parsing for general scenes. Our approach assumes a database of exemplar face images, each of which is associated with a hand-labeled segmentation map. Given a test image, our algorithm first selects a subset of exemplar images from the database. Our algorithm then computes a nonrigid warp for each exemplar image to align it with the test image. Finally, we propagate labels from the exemplar images to the test image in a pixel-wise manner, using trained weights to modulate and combine label maps from different exemplars. We evaluate our method on two challenging datasets and compare with two face parsing algorithms and a general scene parsing algorithm. We also compare our segmentation results with contour-based face alignment results; that is, we first run the alignment algorithms to extract contour points and then derive segments from the contours. Our algorithm compares favorably with all previous works on all datasets evaluated.
2 0.83290893 415 cvpr-2013-Structured Face Hallucination
Author: Chih-Yuan Yang, Sifei Liu, Ming-Hsuan Yang
Abstract: The goal of face hallucination is to generate high-resolution images with fidelity from low-resolution ones. In contrast to existing methods based on patch similarity or holistic constraints in the image space, we propose to exploit local image structures for face hallucination. Each face image is represented in terms of facial components, contours and smooth regions. The image structure is maintained via matching gradients in the reconstructed high-resolution output. For facial components, we align input images to generate accurate exemplars and transfer the high-frequency details for preserving structural consistency. For contours, we learn statistical priors to generate salient structures in the high-resolution images. A patch matching method is utilized on the smooth regions where the image gradients are preserved. Experimental results demonstrate that the proposed algorithm generates hallucinated face images with favorable quality and adaptability.
3 0.814852 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
Author: Xiaohui Shen, Zhe Lin, Jonathan Brandt, Ying Wu
Abstract: Detecting faces in uncontrolled environments continues to be a challenge to traditional face detection methods [24] due to the large variation in facial appearances, as well as occlusion and clutter. In order to overcome these challenges, we present a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning. A large database of faces with bounding rectangles and facial landmark locations is collected, and simple discriminative classifiers are learned from each of them. A voting-based method is then proposed to let these classifiers cast votes on the test image through an efficient image retrieval technique. As a result, faces can be very efficiently detected by selecting the modes from the voting maps, without resorting to exhaustive sliding window-style scanning. Moreover, due to the exemplar-based framework, our approach can detect faces under challenging conditions without explicitly modeling their variations. Evaluation on two public benchmark datasets shows that our new face detection approach is accurate and efficient, and achieves state-of-the-art performance. We further propose to use image retrieval for face validation (in order to remove false positives) and for face alignment/landmark localization. The same methodology can also be easily generalized to other face-related tasks, such as attribute recognition, as well as general object detection.
4 0.66471237 338 cvpr-2013-Probabilistic Elastic Matching for Pose Variant Face Verification
Author: Haoxiang Li, Gang Hua, Zhe Lin, Jonathan Brandt, Jianchao Yang
Abstract: Pose variation remains a major challenge for real-world face recognition. We approach this problem through a probabilistic elastic matching method. We take a part-based representation by extracting local features (e.g., LBP or SIFT) from densely sampled multi-scale image patches. By augmenting each feature with its location, a Gaussian mixture model (GMM) is trained to capture the spatial-appearance distribution of all face images in the training corpus. Each mixture component of the GMM is confined to be a spherical Gaussian to balance the influence of the appearance and the location terms. Each Gaussian component builds correspondence of a pair of features to be matched between two faces/face tracks. For face verification, we train an SVM on the vector concatenating the difference vectors of all the feature pairs to decide if a pair of faces/face tracks is matched or not. We further propose a joint Bayesian adaptation algorithm to adapt the universally trained GMM to better model the pose variations between the target pair of faces/face tracks, which consistently improves face verification accuracy. Our experiments show that our method outperforms the state-of-the-art in the most restricted protocol on Labeled Faces in the Wild (LFW) and the YouTube video face database by a significant margin.
5 0.65767235 182 cvpr-2013-Fusing Robust Face Region Descriptors via Multiple Metric Learning for Face Recognition in the Wild
Author: Zhen Cui, Wen Li, Dong Xu, Shiguang Shan, Xilin Chen
Abstract: In many real-world face recognition scenarios, face images can hardly be aligned accurately due to complex appearance variations or low-quality images. To address this issue, we propose a new approach to extract robust face region descriptors. Specifically, we divide each image (resp. video) into several spatial blocks (resp. spatial-temporal volumes) and then represent each block (resp. volume) by sum-pooling the nonnegative sparse codes of position-free patches sampled within the block (resp. volume). Whitened Principal Component Analysis (WPCA) is further utilized to reduce the feature dimension, which leads to our Spatial Face Region Descriptor (SFRD) (resp. Spatial-Temporal Face Region Descriptor, STFRD) for images (resp. videos). Moreover, we develop a new distance metric learning method for face verification, called Pairwise-constrained Multiple Metric Learning (PMML), to effectively integrate the face region descriptors of all blocks (resp. volumes) from an image (resp. a video). Our work achieves state-of-the-art performance on two real-world datasets, LFW and YouTube Faces (YTF), according to the restricted protocol.
7 0.64779544 438 cvpr-2013-Towards Pose Robust Face Recognition
8 0.63700622 399 cvpr-2013-Single-Sample Face Recognition with Image Corruption and Misalignment via Sparse Illumination Transfer
9 0.63598233 420 cvpr-2013-Supervised Descent Method and Its Applications to Face Alignment
10 0.63296032 463 cvpr-2013-What's in a Name? First Names as Facial Attributes
11 0.62257195 50 cvpr-2013-Augmenting CRFs with Boltzmann Machine Shape Priors for Image Labeling
12 0.60566634 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification
13 0.56382805 159 cvpr-2013-Expressive Visual Text-to-Speech Using Active Appearance Models
15 0.55292892 359 cvpr-2013-Robust Discriminative Response Map Fitting with Constrained Local Models
16 0.53132564 467 cvpr-2013-Wide-Baseline Hair Capture Using Strand-Based Refinement
17 0.52638328 64 cvpr-2013-Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification
18 0.52375144 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos
19 0.51060939 389 cvpr-2013-Semi-supervised Learning with Constraints for Person Identification in Multimedia Data
20 0.50470835 96 cvpr-2013-Correlation Filters for Object Alignment
topicId topicWeight
[(10, 0.102), (16, 0.023), (19, 0.012), (26, 0.386), (33, 0.197), (67, 0.086), (69, 0.031), (87, 0.069)]
simIndex simValue paperId paperTitle
Author: Adrien Bartoli, Toby Collins
Abstract: It has been shown that a surface deforming isometrically can be reconstructed from a single image and a template 3D shape. Methods from the literature solve this problem efficiently. However, they all assume that the camera model is calibrated, which drastically limits their applicability. We propose (i) a general variational framework that applies to (calibrated and uncalibrated) general camera models and (ii) self-calibrating 3D reconstruction algorithms for the weak-perspective and full-perspective camera models. In the former case, our algorithm returns the normal field and camera's scale factor. In the latter case, our algorithm returns the normal field, depth and camera's focal length. Our algorithms are the first to achieve deformable 3D reconstruction including camera self-calibration. They apply to much more general setups than existing methods. Experimental results on simulated and real data show that our algorithms give results with the same level of accuracy as existing methods (which use the true focal length) on perspective images, and correctly find the normal field on affine images for which the existing methods fail.
2 0.84114206 280 cvpr-2013-Maximum Cohesive Grid of Superpixels for Fast Object Localization
Author: Liang Li, Wei Feng, Liang Wan, Jiawan Zhang
Abstract: This paper addresses the challenging problem of regularizing arbitrary superpixels into an optimal grid structure, which may significantly extend current low-level vision algorithms by allowing them to use superpixels (SPs) as conveniently as pixels. For this purpose, we aim at constructing a maximum cohesive SP-grid, which is composed of real nodes, i.e., SPs, and dummy nodes that are meaningless in the image and serve only to hold positions in the grid. For a given formation of image SPs and a proper number of dummy nodes, we first dynamically align them into a grid based on the centroid localities of SPs. We then define the SP-grid coherence as the sum of edge weights, with SP locality and appearance encoded, along all direct paths connecting any two nearest neighboring real nodes in the grid. We finally maximize the SP-grid coherence via cascade dynamic programming. Our approach can take the regional objectness as an optional constraint to produce more semantically reliable SP-grids. Experiments on object localization show that our approach outperforms state-of-the-art methods in terms of both detection accuracy and speed. We also find that with the same searching strategy and features, object localization at SP-level is about 100-500 times faster than pixel-level, with usually better detection accuracy.
3 0.81743199 281 cvpr-2013-Measures and Meta-Measures for the Supervised Evaluation of Image Segmentation
Author: Jordi Pont-Tuset, Ferran Marques
Abstract: This paper tackles the supervised evaluation of image segmentation algorithms. First, it surveys and structures the measures used to compare the segmentation results with a ground truth database; and proposes a new measure: the precision-recall for objects and parts. To compare the goodness of these measures, it defines three quantitative meta-measures involving six state of the art segmentation methods. The meta-measures consist in assuming some plausible hypotheses about the results and assessing how well each measure reflects these hypotheses. As a conclusion, this paper proposes the precision-recall curves for boundaries and for objects-and-parts as the tool of choice for the supervised evaluation of image segmentation. We make the datasets and code of all the measures publicly available.
4 0.80894107 440 cvpr-2013-Tracking People and Their Objects
Author: Tobias Baumgartner, Dennis Mitzel, Bastian Leibe
Abstract: Current pedestrian tracking approaches ignore important aspects of human behavior. Humans are not moving independently, but they closely interact with their environment, which includes not only other persons, but also different scene objects. Typical everyday scenarios include people moving in groups, pushing child strollers, or pulling luggage. In this paper, we propose a probabilistic approach for classifying such person-object interactions, associating objects to persons, and predicting how the interaction will most likely continue. Our approach relies on stereo depth information in order to track all scene objects in 3D, while simultaneously building up their 3D shape models. These models and their relative spatial arrangement are then fed into a probabilistic graphical model which jointly infers pairwise interactions and object classes. The inferred interactions can then be used to support tracking by recovering lost object tracks. We evaluate our approach on a novel dataset containing more than 15,000 frames of person-object interactions in 325 video sequences and demonstrate good performance in challenging real-world scenarios.
same-paper 5 0.78785646 152 cvpr-2013-Exemplar-Based Face Parsing
Author: Brandon M. Smith, Li Zhang, Jonathan Brandt, Zhe Lin, Jianchao Yang
Abstract: In this work, we propose an exemplar-based face image segmentation algorithm. We take inspiration from previous works on image parsing for general scenes. Our approach assumes a database of exemplar face images, each of which is associated with a hand-labeled segmentation map. Given a test image, our algorithm first selects a subset of exemplar images from the database. Our algorithm then computes a nonrigid warp for each exemplar image to align it with the test image. Finally, we propagate labels from the exemplar images to the test image in a pixel-wise manner, using trained weights to modulate and combine label maps from different exemplars. We evaluate our method on two challenging datasets and compare with two face parsing algorithms and a general scene parsing algorithm. We also compare our segmentation results with contour-based face alignment results; that is, we first run the alignment algorithms to extract contour points and then derive segments from the contours. Our algorithm compares favorably with all previous works on all datasets evaluated.
6 0.76317763 311 cvpr-2013-Occlusion Patterns for Object Class Detection
7 0.70175344 353 cvpr-2013-Relative Hidden Markov Models for Evaluating Motion Skill
8 0.69914353 88 cvpr-2013-Compressible Motion Fields
9 0.66646826 21 cvpr-2013-A New Perspective on Uncalibrated Photometric Stereo
10 0.64845079 465 cvpr-2013-What Object Motion Reveals about Shape with Unknown BRDF and Lighting
11 0.6481365 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
12 0.64006048 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
13 0.63926834 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
14 0.63197112 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
15 0.62802953 96 cvpr-2013-Correlation Filters for Object Alignment
16 0.62789047 424 cvpr-2013-Templateless Quasi-rigid Shape Modeling with Implicit Loop-Closure
17 0.62554336 208 cvpr-2013-Hyperbolic Harmonic Mapping for Constrained Brain Surface Registration
18 0.62423462 414 cvpr-2013-Structure Preserving Object Tracking
19 0.62417978 317 cvpr-2013-Optimal Geometric Fitting under the Truncated L2-Norm
20 0.62330931 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
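The Exemplar-Based Face Parsing abstract (paper 152) describes propagating labels from warped exemplars to the test image pixel by pixel, with weights that modulate and combine the exemplar label maps. The abstract does not say how the weights are trained or exactly how the maps are combined, so the following is only a minimal sketch that assumes simple weighted voting. The function name `propagate_labels` and its argument layout are hypothetical and are not taken from the paper.

```python
def propagate_labels(warped_labels, weights, num_classes):
    """Combine warped exemplar label maps by per-pixel weighted voting.

    warped_labels: list of K label maps, each a list of rows of int class ids,
                   assumed to be already warped into the test image frame.
    weights:       list of K maps with the same shape, holding a nonnegative
                   weight for each exemplar at each pixel (assumed given).
    Returns one label map (list of rows) for the test image.
    """
    num_exemplars = len(warped_labels)
    height = len(warped_labels[0])
    width = len(warped_labels[0][0])
    label_map = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Accumulate the weight each exemplar casts for its own label.
            scores = [0.0] * num_classes
            for k in range(num_exemplars):
                scores[warped_labels[k][y][x]] += weights[k][y][x]
            # Pick the class with the largest accumulated weight.
            label_map[y][x] = max(range(num_classes), key=scores.__getitem__)
    return label_map


# Two exemplars voting on a 1x2 image with two classes.
labels = [[[0, 1]], [[1, 1]]]
vote_weights = [[[0.7, 0.2]], [[0.3, 0.8]]]
result = propagate_labels(labels, vote_weights, num_classes=2)
```

In this toy input the first pixel receives weight 0.7 for class 0 and 0.3 for class 1, and both exemplars agree on class 1 at the second pixel. Exemplar selection and the nonrigid warp are outside the scope of the sketch.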