cvpr cvpr2013 cvpr2013-160 knowledge-graph by maker-knowledge-mining

160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification


Source: pdf

Author: Enrique G. Ortiz, Alan Wright, Mubarak Shah

Abstract: This paper presents an end-to-end video face recognition system, addressing the difficult problem of identifying a video face track using a large dictionary of still face images of a few hundred people, while rejecting unknown individuals. A straightforward application of the popular ℓ1-minimization for face recognition on a frame-by-frame basis is prohibitively expensive, so we propose a novel algorithm, Mean Sequence SRC (MSSRC), that performs video face recognition using a joint optimization leveraging all of the available video data and the knowledge that the face track frames belong to the same individual. By adding a strict temporal constraint to the ℓ1-minimization that forces individual frames in a face track to all reconstruct a single identity, we show the optimization reduces to a single minimization over the mean of the face track. We also introduce a new Movie Trailer Face Dataset collected from 101 movie trailers on YouTube. Finally, we show that our method matches or outperforms the state-of-the-art on three existing datasets (YouTube Celebrities, YouTube Faces, and Buffy) and our unconstrained Movie Trailer Face Dataset. More importantly, our method excels at rejecting unknown identities by at least 8% in average precision.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 This paper presents an end-to-end video face recognition system, addressing the difficult problem of identifying a video face track using a large dictionary of still face images of a few hundred people, while rejecting unknown individuals. [sent-8, score-1.958]

2 By adding a strict temporal constraint to the ℓ1-minimization that forces individual frames in a face track to all reconstruct a single identity, we show the optimization reduces to a single minimization over the mean of the face track. [sent-12, score-1.125]

3 We also introduce a new Movie Trailer Face Dataset collected from 101 movie trailers on YouTube. [sent-13, score-0.427]

4 More importantly, our method excels at rejecting unknown identities by at least 8% in average precision. [sent-15, score-0.23]

5 As video search sites like YouTube have grown, video content-based search has become increasingly necessary. [sent-20, score-0.243]

6 For example, a capable retrieval system should return all … [sent-21, score-1.389]

7 The main drawback is the availability of annotated video face tracks. [sent-24, score-0.535]

8 This avenue is one little exploited by video face recognition. [sent-26, score-0.535]

9 Existing video face recognition methods tend to perform classification on a frame-by-frame basis and later combine those predictions using an appropriate metric. [sent-29, score-0.591]

10 In contrast, we propose a novel method, Mean Sequence Sparse Representation-based Classification (MSSRC), that performs a joint optimization over all faces in the track at once. [sent-32, score-0.324]

11 Finally, we perform face recognition using our novel algorithm MSSRC with an input face track and dictionary of still images. [sent-35, score-1.173]

12 … ℓ1-minimization over the mean face track, thus reducing many classification problems to a single one, with inherent computational and practical benefits. [sent-37, score-0.426]

13 Our proposed method aims to perform video face recognition across domains, leveraging thousands of labeled still images gathered from the Internet, specifically the PubFig and LFW datasets, to perform face recognition on real-world, unconstrained videos. [sent-38, score-1.132]

14 To do this we collected 101 movie trailers from YouTube and automatically extracted and tracked faces in the video to create a dataset for video face recognition (http : / /vfr . [sent-39, score-1.259]

15 we identify famous actors appearing in movie trailers while rejecting background faces that represent unknown extras. [sent-44, score-0.712]

16 We show our method outperforms existing methods in precision and recall, exhibiting the ability to better reject unknown or uncertain identities. [sent-45, score-0.19]

17 The contributions of this paper are summarized as follows: (1) We develop a fully automatic end-to-end system for video face recognition, which includes face tracking and recognition leveraging information from both still images for the known dictionary and video for recognition. [sent-46, score-1.27]

18 (2) We propose a novel algorithm, MSSRC, that performs video face recognition using an optimization leveraging all of the available video data. [sent-47, score-0.714]

19 The rest of this paper is organized as follows: Section 2 discusses the related work on video face recognition. [sent-49, score-0.535]

20 Then Section 3 describes our entire framework for video face recognition from tracking to recognition. [sent-50, score-0.638]

21 Related Work: For a complete survey of video-based face recognition, refer to [18]; here we focus on an overview of the most related methods. [sent-55, score-0.46]

22 Current video face recognition techniques fall into one of three categories: key-frame based, temporal model based, and image-set matching based. [sent-56, score-0.595]

23 Key-frame based methods generally perform a prediction on the identity of each key-frame in a face track followed by a probabilistic fusion or majority voting to select the best match. [sent-57, score-0.719]

24 They learn a model over this dictionary by learning key faces via clustering. [sent-61, score-0.159]

25 These cluster centers are compared to test frames using a nearest-neighbor search followed by majority, probabilistic voting to make a final prediction. [sent-62, score-0.113]

26 Temporal model based methods learn the temporal, facial dynamics of the face throughout a video. [sent-64, score-0.426]

27 Image-set matching based methods allow the modeling of a face track as an image-set. [sent-72, score-0.652]

28 Many methods, like [24], compute a mutual subspace distance, where each face track is modeled in its own subspace and a distance is computed between each pair of subspaces. [sent-73, score-0.652]

29 They are effective with clean data, but these methods are very sensitive to the variations inherent in video face tracks. [sent-74, score-0.535]

30 Other methods take a more statistical approach, like [5], which used Logistic Discriminant-based Metric Learning (LDML) to learn a relationship between images in face tracks, where the inter-class distances are maximized. [sent-75, score-0.426]

31 LDML is very computationally expensive and focuses more on learning relationships within the data, whereas we directly relate the test track to the training data. [sent-76, score-0.276]

32 Another [3] used a small user-selected sample of characters in the given movie to do a pixel-wise Euclidean distance to handle occlusion. [sent-81, score-0.313]

33 While character recognition is suitable for a long-running series, the use of clothing and other contextual clues is not helpful in the task of identifying actors between movies, TV shows, or non-related video clips. [sent-83, score-0.212]

34 The key concept is enforcing sparsity, since a test face can be reconstructed best from a small subset of the large dictionary, i. [sent-87, score-0.452]

35 ℓ1-minimization is known to be computationally expensive, thus we propose a constrained optimization with the knowledge that the images within a face track are of the same person. [sent-92, score-0.652]

36 Video Face Recognition Pipeline: In this section, we describe our end-to-end video face recognition system. [sent-96, score-0.569]

37 First, we detail our algorithm for face tracking based on face detections from video. [sent-97, score-0.95]

38 Finally, we derive our optimization for video face recognition that classifies a video face track based on a dictionary of still images. [sent-99, score-1.391]

39 Face Tracking: Our method performs the difficult task of face tracking based on face detections extracted using the high-performance SHORE face detection system [15] and generates a face track based on two metrics. [sent-102, score-2.028]

40 To associate a new detection to an existing track, our first metric determines the ratio of the maximum sized bounding box encompassing both face detections to the size of the larger bounding box of the two detections. [sent-103, score-0.477]

41 The second tracking metric takes into account the appearance information via a local color histogram of the face. [sent-107, score-0.114]

42 We compute the distance as a ratio of the histogram intersection of the RGB histograms with 30 bins per channel of the last face of a track and the current detection to the total summation of the histogram bins: d_appearance = … [sent-108, score-0.698]

43 We compare each new face detection to existing tracks; if the location and appearance metrics are similar, the face is added to the track, otherwise a new track is created. [sent-111, score-1.1]
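
The two tracking metrics described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `(x1, y1, x2, y2)` box layout, and the thresholds `max_ratio` and `min_score` are all hypothetical, and because the appearance equation is truncated in this summary, the normalization (intersection divided by the summed bin counts of both histograms) is an assumption.

```python
def enclosing_ratio(box_a, box_b):
    """Area of the smallest box enclosing both detections, divided by the
    area of the larger detection. Boxes are (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    enclosing = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    larger = max((ax2 - ax1) * (ay2 - ay1), (bx2 - bx1) * (by2 - by1))
    return enclosing / larger


def rgb_histogram(pixels, bins=30):
    """Concatenated per-channel histogram of (r, g, b) pixels with values 0..255."""
    hist = [0] * (3 * bins)
    for rgb in pixels:
        for channel, value in enumerate(rgb):
            hist[channel * bins + value * bins // 256] += 1
    return hist


def appearance_score(hist_a, hist_b):
    """Histogram intersection over the total bin mass (assumed normalization).
    Expects non-empty histograms."""
    intersection = sum(min(a, b) for a, b in zip(hist_a, hist_b))
    return intersection / (sum(hist_a) + sum(hist_b))


def belongs_to_track(last_box, last_hist, box, hist, max_ratio=1.5, min_score=0.3):
    """Hypothetical association rule: both metrics must look similar."""
    return (enclosing_ratio(last_box, box) <= max_ratio
            and appearance_score(last_hist, hist) >= min_score)
```

With this normalization, two identical histograms score 0.5, so any threshold would need to be tuned on that scale; a detection that fails the rule would start a new track.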

44 Input: Training gallery A, test face track Y = [y1, y2 , . [sent-118, score-0.702]

45 Gabor wavelets were extracted with one scale λ = 4 at four orientations θ = {0°, 45°, 90°, 135°} with a tight face crop at a resolution of 25×30 pixels. [sent-146, score-0.452]

46 The leading principle of our method is that all of the images y from the face track Y = [y1, y2, . [sent-161, score-0.652]

47 Because all images in a face track belong to the same person, one would expect a high degree of correlation amongst the sparse coefficient vectors xj∀j ∈ [1. [sent-165, score-0.738]

48 In fact, with sufficient similarity between the faces in a track, one might expect nearly the same coefficient vector to be recovered for each frame. [sent-170, score-0.15]

49 This conclusion, that enforcing a single, consistent coefficient vector x across all images in a face track Y is equivalent to a single ℓ [sent-232, score-0.704]

50 1-minimization over the average of all the frames in the face track, is key to keeping our approach robust yet fast. [sent-233, score-0.473]

51 ℓ1-minimization on the mean of the face track, which is not only a significant speed-up, but also theoretically sound. [sent-236, score-0.426]
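
One way to see why a shared coefficient vector collapses the problem onto the track mean is the least-squares identity Σ_m ‖y_m − Ax‖² = M‖ȳ − Ax‖² + Σ_m ‖y_m − ȳ‖², whose last term does not depend on x. The sketch below checks that identity numerically on random data. It illustrates the reduction under the assumption of a squared-error data term; it is not taken from the paper, and the helper names are hypothetical.

```python
import math
import random


def residual_sq(A, y, x):
    """Squared reconstruction error ||y - A x||^2, with A given as a list of rows."""
    return sum((y[i] - sum(a * b for a, b in zip(A[i], x))) ** 2 for i in range(len(A)))


def track_cost(A, frames, x):
    """Sum of per-frame squared errors when every frame shares the same x."""
    return sum(residual_sq(A, y, x) for y in frames)


def mean_frame(frames):
    return [sum(f[i] for f in frames) / len(frames) for i in range(len(frames[0]))]


def mean_cost(A, frames, x):
    """M times the squared error of the mean frame."""
    return len(frames) * residual_sq(A, mean_frame(frames), x)


def scatter(frames):
    """Spread of the frames around their mean; independent of x."""
    y_bar = mean_frame(frames)
    return sum((f[i] - y_bar[i]) ** 2 for f in frames for i in range(len(y_bar)))


random.seed(0)
d, n, M = 6, 4, 5  # feature dimension, dictionary size, track length
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(d)]
frames = [[random.gauss(0, 1) for _ in range(d)] for _ in range(M)]
for _ in range(3):
    x = [random.gauss(0, 1) for _ in range(n)]
    # The two costs differ only by a constant, so they share their minimizer.
    assert math.isclose(track_cost(A, frames, x), mean_cost(A, frames, x) + scatter(frames))
```

Because the two costs differ by a constant, adding the same sparsity penalty on x to both leaves the minimizer unchanged (up to rescaling the penalty weight by M), which is why one solve on the mean can replace one solve per frame.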

52 Finally, we classify the average test track $\bar{y}$ by determining the class of training samples that best reconstructs the face from the recovered coefficients: $I(\bar{y}) = \min_j r_j(\bar{y}) = \min_j \|\bar{y} - A_j x_j\|_2$ (8) [sent-238, score-0.678]

53 where the label $I(\bar{y})$ of the test face track is the minimal residual or reconstruction error $r_j(\bar{y})$ and $x_j$ are the recovered coefficients from the global solution $\tilde{x}$ … [sent-240, score-0.701]

54 … ∈ [0, 1], (9) ranging from 0 (the test face is represented equally by all classes) to 1 (the test face is fully represented by one class). [sent-246, score-0.904]
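
The classification and confidence steps can be sketched end to end as below. Several pieces are assumptions rather than the paper's method: the solver is plain ISTA on a Lasso objective (swapped in for whatever ℓ1 solver the authors used), the weight `lam` is arbitrary, and since equation (9) is garbled in this summary, the confidence uses the standard Sparsity Concentration Index from the SRC literature, SCI(x) = (C · max_j ‖x_j‖₁ / ‖x‖₁ − 1) / (C − 1), which matches the described range of 0 to 1.

```python
def soft_threshold(v, t):
    """Shrink each entry toward zero by t (proximal operator of the l1 norm)."""
    return [(abs(a) - t) * (1 if a > 0 else -1) if abs(a) > t else 0.0 for a in v]


def ista_l1(A, y, lam=0.01, n_iter=500):
    """Approximately minimise 0.5*||y - A x||_2^2 + lam*||x||_1 (A is a list of rows)."""
    d, n = len(A), len(A[0])
    # 1 / ||A||_F^2 is a safe, conservative step size because ||A||_2 <= ||A||_F.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(n_iter):
        r = [sum(A[i][k] * x[k] for k in range(n)) - y[i] for i in range(d)]
        grad = [sum(A[i][k] * r[i] for i in range(d)) for k in range(n)]
        x = soft_threshold([x[k] - step * grad[k] for k in range(n)], step * lam)
    return x


def classify_track(A, labels, frames, lam=0.01):
    """Return (predicted label, SCI) for a face track.

    A: d x n dictionary as a list of rows; labels[k] is the identity of column k;
    frames: list of M feature vectors of length d from one track.
    """
    d, n, M = len(A), len(A[0]), len(frames)
    y_bar = [sum(f[i] for f in frames) / M for i in range(d)]  # mean of the track
    x = ista_l1(A, y_bar, lam)                                 # one solve per track
    classes = sorted(set(labels))
    residual, l1_mass = {}, {}
    for c in classes:
        # Keep only the coefficients that belong to class c.
        xc = [x[k] if labels[k] == c else 0.0 for k in range(n)]
        recon = [sum(A[i][k] * xc[k] for k in range(n)) for i in range(d)]
        residual[c] = sum((y_bar[i] - recon[i]) ** 2 for i in range(d)) ** 0.5
        l1_mass[c] = sum(abs(v) for v in xc)
    total, C = sum(l1_mass.values()), len(classes)
    if total > 0 and C > 1:
        sci = (C * max(l1_mass.values()) / total - 1) / (C - 1)
    else:
        sci = 0.0
    return min(classes, key=residual.get), sci
```

A track whose SCI falls below a chosen threshold would be rejected as an unknown identity; the threshold itself is a tuning choice that this sketch does not fix.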

55 The YouTube Celebrities Dataset [14] has unconstrained videos from YouTube, however they are very low quality and only contain 3 unique videos per person, which they segment. [sent-249, score-0.159]

56 The YouTube Faces Dataset [22] and Buffy Dataset [5] also exhibit more challenging scenarios than traditional video face recognition datasets, however YouTube Faces is geared towards face verification, same/ [sent-250, score-0.995]

57 Figure 3. The distribution of face tracks across the identities in PubFig+10. [sent-251, score-0.598]

58 not same, and Buffy only contains 8 actors; thus, both are ill-suited for the large-scale face identification of our proposed video retrieval framework. [sent-253, score-0.6]

59 We built our Movie Trailer Face Dataset using 101 movie trailers from YouTube from the 2010 release year that contained celebrities present in the supplemented PubFig+10 dataset. [sent-254, score-0.611]

60 These videos were then processed to generate face tracks using the method described above. [sent-255, score-0.568]

61 The resulting dataset contains 4,485 face tracks, 65% consisting of unknown identities (not present in PubFig+10) and 35% known. [sent-256, score-0.578]

62 3 with the number of face tracks per celebrity in the movie trailers ranging from 5 to 60 labeled samples. [sent-258, score-0.981]

63 The fact that half of the public figures do not appear in any of the movie trailers presents an interesting test scenario in which the algorithm must be able to distinguish the subject of interest from within a large pool of potential identities. [sent-259, score-0.453]

64 Then, we evaluate our video face recognition method on three existing datasets: YouTube Faces, YouTube Celebrities, and Buffy. [sent-262, score-0.569]

65 Tracking Results: To analyze the quality of our automatically generated face tracks, we ground-truthed five movie trailers from the dataset: ‘The Killer Inside’, ‘My Name is Khan’, ‘Biutiful’, ‘Eat, Pray, Love’, and ‘The Dry Land’. [sent-266, score-0.853]

66 Results for top performing video face verification algorithm MBGS and our competitive method MSSRC. [sent-306, score-0.613]

67 Although our goal is not to solve the tracking problem, in Table 1 we show our results compared to a standard face tracking method. [sent-308, score-0.564]

68 The first column shows a KLT-based method [8], where the face detections are associated based on a ratio of overlapping tracked features, and the second shows our method. [sent-309, score-0.483]

69 YouTube Faces Dataset: Although face identification is the focus of our paper, we evaluated our method on the YouTube Faces Dataset [22] for face verification (same/not same), to show that our method can also work in this context. [sent-315, score-0.971]

70 To the best of our knowledge, there is only one paper [9], that has done face verification using SRC, however it was not in the context of video face recognition, but that of still images from LFW. [sent-316, score-1.015]

71 YouTube Celebrities Dataset: The YouTube Celebrities Dataset [14] consists of 47 celebrities (actors and politicians) in 1,910 video clips downloaded from YouTube and manually segmented to the portions where the celebrity of interest appears. [sent-336, score-0.335]

72 There are approximately 41 clips per person segmented from 3 unique videos per actor. [sent-337, score-0.126]

73 Buffy Dataset: The Buffy Dataset consists of 639 manually annotated face tracks extracted from episodes 9, 21, and 45 from different seasons of the TV series “Buffy the Vampire Slayer”. [sent-344, score-0.522]

74 Movie Trailer Face Dataset: In this section, we present results on our unconstrained Movie Trailer Face Dataset, which allows us to test larger-scale face identification, as well as each algorithm's ability to reject unknown identities. [sent-357, score-0.626]

75 In our test scenario, we chose the Public Figures (PF) [16] dataset as our training gallery, supplemented by images collected of 10 actors and actresses from web searches for additional coverage of face tracks extracted from movie trailers. [sent-358, score-0.958]

76 The distribution of face tracks across all of the identities in the PubFig+10 dataset is shown in Fig. [sent-360, score-0.626]

77 In total, PubFig+10 consists of 34,522 images and our Movie Trailer Face Dataset has 4,485 face tracks, which we use to conduct experiments on several algorithms. [sent-362, score-0.426]

78 For the experiments with NN, LDML, SVM, L2, and SRC, we test each individual frame of the face track and predict the track's final identity via probabilistic voting; its confidence is an average over the predicted distances or decision values. [sent-366, score-0.778]

79 The confidence values are used to reject predictions to evaluate the precision and recall of the system. [sent-367, score-0.186]
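
The rejection-based evaluation can be sketched as follows. The exact protocol is not spelled out in this summary, so the definitions here are assumptions: a prediction is accepted when its confidence reaches the threshold, precision is the fraction of accepted predictions that are correct, recall is the fraction of known-identity tracks that are accepted and correct, and unknown tracks are marked with a hypothetical `unknown` label.

```python
def precision_recall(predicted, actual, confidence, threshold, unknown=None):
    """Precision and recall when predictions below `threshold` are rejected.

    predicted / actual: per-track labels; tracks of unknown people carry `unknown`
    as their actual label, so accepting them always counts against precision.
    """
    accepted = [c >= threshold for c in confidence]
    correct = sum(
        1 for p, a, ok in zip(predicted, actual, accepted)
        if ok and a != unknown and p == a
    )
    n_accepted = sum(accepted)
    n_known = sum(1 for a in actual if a != unknown)
    precision = correct / n_accepted if n_accepted else 1.0
    recall = correct / n_known if n_known else 0.0
    return precision, recall


def pr_curve(predicted, actual, confidence, unknown=None):
    """Sweep the threshold over every observed confidence value."""
    return [
        precision_recall(predicted, actual, confidence, t, unknown)
        for t in sorted(set(confidence), reverse=True)
    ]
```

Sweeping the threshold from high to low traces a precision-recall curve, from which a summary such as average precision could then be computed.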

80 4, the SRC based methods reject unknown identities better than the others. [sent-380, score-0.183]

81 Instead of computing SRC on each frame, which takes approximately 45 minutes per track, we reduce a face track to a single feature vector for ℓ [sent-382, score-0.652]

82 To answer this question we select the first m frames for each track and test the two best performing methods from the previous experiments: MSSRC and SVM. [sent-393, score-0.323]

83 5 shows that at just after 20 frames performance plateaus, which is close to the average track length of 22 frames. [sent-395, score-0.273]

84 Most importantly, the results show that using multiple frames is beneficial since moving from using 1 frame to 20 frames results in a 5. [sent-396, score-0.127]

85 03% increase in average precision and recall at 90% precision respectively for MSSRC. [sent-398, score-0.162]

86 We see that performance levels out at about 20 frames (close to the average track length). [sent-401, score-0.273]

87 Conclusions and Future Work: In this paper we have presented a fully automatic end-to-end system for video face recognition, which includes face tracking and identification leveraging information from both still images for the known dictionary and video for recognition. [sent-406, score-1.301]

88 We propose a novel algorithm Mean Sequence SRC, MSSRC, that performs a joint optimization using all of the available image data to perform video face recognition. [sent-407, score-0.535]

89 We finally showed that our method outperforms the state-of-the-art on real-world, unconstrained videos in our new Movie Trailer Face Dataset. [sent-408, score-0.113]

90 Furthermore, we showed our method especially excels at rejecting unknown identities, outperforming the next best method in terms of average precision by 8%. [sent-409, score-0.287]

91 Video face recognition presents a very compelling area of research with difficulties unseen in still-image recognition. [sent-410, score-0.46]

92 Face description with local binary patterns: Application to face recognition. [sent-419, score-0.426]

93 Enhancing face recognition from video sequences using robust statistics. [sent-427, score-0.569]

94 Unsupervised metric learning for face identification in TV video. [sent-434, score-0.513]

95 From still image to video-based face recognition: an experimental analysis. [sent-465, score-0.426]

96 Labeled faces in the wild: A database for studying face recognition in unconstrained environments. [sent-480, score-0.625]

97 Face detection and tracking in video sequences using the modified census transformation. [sent-503, score-0.178]

98 Describable visual attributes for face verification and image search. [sent-510, score-0.48]

99 Gabor feature based classification using the enhanced fisher linear discriminant model for face recognition. [sent-515, score-0.426]

100 Effective unconstrained face recognition by combining multiple descriptors and learned background statistics. [sent-546, score-0.527]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mssrc', 0.451), ('face', 0.426), ('movie', 0.281), ('trailer', 0.267), ('track', 0.226), ('youtube', 0.209), ('src', 0.18), ('celebrities', 0.152), ('trailers', 0.146), ('ax', 0.137), ('buffy', 0.122), ('ym', 0.113), ('video', 0.109), ('pubfig', 0.101), ('faces', 0.098), ('tracks', 0.096), ('ldml', 0.091), ('identities', 0.076), ('rejecting', 0.07), ('actors', 0.069), ('tracking', 0.069), ('unconstrained', 0.067), ('identification', 0.065), ('dictionary', 0.061), ('reject', 0.059), ('mota', 0.058), ('precision', 0.057), ('verification', 0.054), ('coefficient', 0.052), ('mxin', 0.051), ('recall', 0.048), ('gabor', 0.048), ('unknown', 0.048), ('frames', 0.047), ('videos', 0.046), ('mbgs', 0.045), ('methodaccuracy', 0.044), ('hadid', 0.044), ('clips', 0.042), ('ajxj', 0.041), ('argmxinm', 0.041), ('biutiful', 0.041), ('discriminantbased', 0.041), ('dry', 0.041), ('lmdl', 0.041), ('pray', 0.041), ('voting', 0.04), ('person', 0.038), ('shore', 0.036), ('excels', 0.036), ('knock', 0.036), ('love', 0.036), ('leveraging', 0.036), ('amongst', 0.034), ('recognition', 0.034), ('sci', 0.034), ('frame', 0.033), ('maxj', 0.032), ('supplemented', 0.032), ('celebrity', 0.032), ('characters', 0.032), ('motp', 0.03), ('tv', 0.03), ('detections', 0.029), ('tpami', 0.029), ('wright', 0.029), ('tracked', 0.028), ('probabilistically', 0.028), ('dataset', 0.028), ('identity', 0.027), ('lbp', 0.027), ('temporal', 0.026), ('wavelets', 0.026), ('uncertain', 0.026), ('eat', 0.026), ('sequence', 0.026), ('test', 0.026), ('nn', 0.026), ('tracker', 0.025), ('imposing', 0.025), ('sites', 0.025), ('metrics', 0.025), ('expensive', 0.024), ('lfw', 0.024), ('gallery', 0.024), ('svm', 0.024), ('arg', 0.024), ('performing', 0.024), ('shah', 0.023), ('hmm', 0.023), ('sparsity', 0.023), ('residual', 0.023), ('movies', 0.023), ('histogram', 0.023), ('cast', 0.023), ('land', 0.023), ('hundred', 0.023), ('metric', 0.022), ('mm', 0.022), ('predictions', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.000001 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification

2 0.312677 338 cvpr-2013-Probabilistic Elastic Matching for Pose Variant Face Verification

Author: Haoxiang Li, Gang Hua, Zhe Lin, Jonathan Brandt, Jianchao Yang

Abstract: Pose variation remains a major challenge for real-world face recognition. We approach this problem through a probabilistic elastic matching method. We take a part-based representation by extracting local features (e.g., LBP or SIFT) from densely sampled multi-scale image patches. By augmenting each feature with its location, a Gaussian mixture model (GMM) is trained to capture the spatial-appearance distribution of all face images in the training corpus. Each mixture component of the GMM is confined to be a spherical Gaussian to balance the influence of the appearance and the location terms. Each Gaussian component builds correspondence of a pair of features to be matched between two faces/face tracks. For face verification, we train an SVM on the vector concatenating the difference vectors of all the feature pairs to decide if a pair of faces/face tracks is matched or not. We further propose a joint Bayesian adaptation algorithm to adapt the universally trained GMM to better model the pose variations between the target pair of faces/face tracks, which consistently improves face verification accuracy. Our experiments show that our method outperforms the state-of-the-art in the most restricted protocol on Labeled Faces in the Wild (LFW) and the YouTube video face database by a significant margin.

3 0.28115627 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval

Author: Xiaohui Shen, Zhe Lin, Jonathan Brandt, Ying Wu

Abstract: Detecting faces in uncontrolled environments continues to be a challenge to traditional face detection methods [24] due to the large variation in facial appearances, as well as occlusion and clutter. In order to overcome these challenges, we present a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning. A large database of faces with bounding rectangles and facial landmark locations is collected, and simple discriminative classifiers are learned from each of them. A voting-based method is then proposed to let these classifiers cast votes on the test image through an efficient image retrieval technique. As a result, faces can be very efficiently detected by selecting the modes from the voting maps, without resorting to exhaustive sliding window-style scanning. Moreover, due to the exemplar-based framework, our approach can detect faces under challenging conditions without explicitly modeling their variations. Evaluation on two public benchmark datasets shows that our new face detection approach is accurate and efficient, and achieves state-of-the-art performance. We further propose to use image retrieval for face validation (in order to remove false positives) and for face alignment/landmark localization. The same methodology can also be easily generalized to other face-related tasks, such as attribute recognition, as well as general object detection.

4 0.24260911 389 cvpr-2013-Semi-supervised Learning with Constraints for Person Identification in Multimedia Data

Author: Martin Bäuml, Makarand Tapaswi, Rainer Stiefelhagen

Abstract: We address the problem of person identification in TV series. We propose a unified learning framework for multiclass classification which incorporates labeled and unlabeled data, and constraints between pairs of features in the training. We apply the framework to train multinomial logistic regression classifiers for multi-class face recognition. The method is completely automatic, as the labeled data is obtained by tagging speaking faces using subtitles and fan transcripts of the videos. We demonstrate our approach on six episodes each of two diverse TV series and achieve state-of-the-art performance.

5 0.23948054 182 cvpr-2013-Fusing Robust Face Region Descriptors via Multiple Metric Learning for Face Recognition in the Wild

Author: Zhen Cui, Wen Li, Dong Xu, Shiguang Shan, Xilin Chen

Abstract: In many real-world face recognition scenarios, face images can hardly be aligned accurately due to complex appearance variations or low-quality images. To address this issue, we propose a new approach to extract robust face region descriptors. Specifically, we divide each image (resp. video) into several spatial blocks (resp. spatial-temporal volumes) and then represent each block (resp. volume) by sum-pooling the nonnegative sparse codes of position-free patches sampled within the block (resp. volume). Whitened Principal Component Analysis (WPCA) is further utilized to reduce the feature dimension, which leads to our Spatial Face Region Descriptor (SFRD) (resp. Spatial-Temporal Face Region Descriptor, STFRD) for images (resp. videos). Moreover, we develop a new distance method for face verification metric learning called Pairwise-constrained Multiple Metric Learning (PMML) to effectively integrate the face region descriptors of all blocks (resp. volumes) from an image (resp. a video). Our work achieves state-of-the-art performance on two real-world datasets, LFW and YouTube Faces (YTF), according to the restricted protocol.

6 0.23545307 438 cvpr-2013-Towards Pose Robust Face Recognition

7 0.23276114 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos

8 0.22902383 399 cvpr-2013-Single-Sample Face Recognition with Image Corruption and Misalignment via Sparse Illumination Transfer

9 0.19230494 220 cvpr-2013-In Defense of Sparsity Based Face Recognition

10 0.17438352 430 cvpr-2013-The SVM-Minus Similarity Score for Video Face Recognition

11 0.15139182 64 cvpr-2013-Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification

12 0.15046297 152 cvpr-2013-Exemplar-Based Face Parsing

13 0.14313687 199 cvpr-2013-Harry Potter's Marauder's Map: Localizing and Tracking Multiple Persons-of-Interest by Nonnegative Discretization

14 0.14294367 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking

15 0.1363034 161 cvpr-2013-Facial Feature Tracking Under Varying Facial Expressions and Face Poses Based on Restricted Boltzmann Machines

16 0.12466412 257 cvpr-2013-Learning Structured Low-Rank Representations for Image Classification

17 0.1150215 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories

18 0.1115109 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image

19 0.10349794 414 cvpr-2013-Structure Preserving Object Tracking

20 0.096666984 233 cvpr-2013-Joint Sparsity-Based Representation and Analysis of Unconstrained Activities


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.206), (1, -0.116), (2, -0.126), (3, 0.011), (4, 0.0), (5, -0.052), (6, 0.016), (7, -0.17), (8, 0.317), (9, -0.114), (10, 0.122), (11, -0.083), (12, 0.099), (13, 0.141), (14, -0.006), (15, 0.02), (16, 0.051), (17, -0.023), (18, -0.075), (19, -0.01), (20, -0.076), (21, 0.059), (22, -0.011), (23, 0.082), (24, 0.024), (25, -0.038), (26, -0.05), (27, 0.076), (28, -0.045), (29, -0.06), (30, 0.014), (31, 0.086), (32, 0.119), (33, 0.016), (34, 0.072), (35, 0.047), (36, -0.026), (37, 0.038), (38, 0.019), (39, -0.026), (40, -0.021), (41, 0.024), (42, 0.003), (43, -0.131), (44, -0.041), (45, 0.047), (46, -0.051), (47, 0.05), (48, 0.013), (49, 0.048)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98213148 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification

2 0.90814203 338 cvpr-2013-Probabilistic Elastic Matching for Pose Variant Face Verification

3 0.87394994 182 cvpr-2013-Fusing Robust Face Region Descriptors via Multiple Metric Learning for Face Recognition in the Wild

Author: Zhen Cui, Wen Li, Dong Xu, Shiguang Shan, Xilin Chen

Abstract: In many real-world face recognition scenarios, face images can hardly be aligned accurately due to complex appearance variations or low-quality images. To address this issue, we propose a new approach to extract robust face region descriptors. Specifically, we divide each image (resp. video) into several spatial blocks (resp. spatial-temporal volumes) and then represent each block (resp. volume) by sum-pooling the nonnegative sparse codes of position-free patches sampled within the block (resp. volume). Whitened Principal Component Analysis (WPCA) is further utilized to reduce the feature dimension, which leads to our Spatial Face Region Descriptor (SFRD) (resp. Spatial-Temporal Face Region Descriptor, STFRD) for images (resp. videos). Moreover, we develop a new distance method for face verification metric learning called Pairwise-constrained Multiple Metric Learning (PMML) to effectively integrate the face region descriptors of all blocks (resp. volumes) from an image (resp. a video). Our work achieves state-of-the-art performance on two real-world datasets, LFW and YouTube Faces (YTF), according to the restricted protocol.

4 0.81739748 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval

Author: Xiaohui Shen, Zhe Lin, Jonathan Brandt, Ying Wu

Abstract: Detecting faces in uncontrolled environments continues to be a challenge to traditional face detection methods [24] due to the large variation in facial appearances, as well as occlusion and clutter. In order to overcome these challenges, we present a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning. A large database of faces with bounding rectangles and facial landmark locations is collected, and simple discriminative classifiers are learned from each of them. A voting-based method is then proposed to let these classifiers cast votes on the test image through an efficient image retrieval technique. As a result, faces can be very efficiently detected by selecting the modes from the voting maps, without resorting to exhaustive sliding window-style scanning. Moreover, due to the exemplar-based framework, our approach can detect faces under challenging conditions without explicitly modeling their variations. Evaluation on two public benchmark datasets shows that our new face detection approach is accurate and efficient, and achieves the state-of-the-art performance. We further propose to use image retrieval for face validation (in order to remove false positives) and for face alignment/landmark localization. The same methodology can also be easily generalized to other face-related tasks, such as attribute recognition, as well as general object detection.

5 0.8152625 438 cvpr-2013-Towards Pose Robust Face Recognition

Author: Dong Yi, Zhen Lei, Stan Z. Li

Abstract: Most existing pose robust methods are too computationally complex for practical applications, and their performance under unconstrained environments is rarely evaluated. In this paper, we propose a novel method for pose robust face recognition towards practical applications, which is fast, pose robust and can work well under unconstrained environments. Firstly, a 3D deformable model is built and a fast 3D model fitting algorithm is proposed to estimate the pose of the face image. Secondly, a group of Gabor filters is transformed according to the pose and shape of the face image for feature extraction. Finally, PCA is applied on the pose adaptive Gabor features to remove the redundancies, and the cosine metric is used to evaluate the similarity. The proposed method has three advantages: (1) The pose correction is applied in the filter space rather than image space, which makes our method less affected by the precision of the 3D model; (2) By combining the holistic pose transformation and local Gabor filtering, the final feature is robust to pose and other negative factors in face recognition; (3) The 3D structure and facial symmetry are successfully used to deal with self-occlusion. Extensive experiments on FERET and PIE show the proposed method outperforms state-of-the-art methods significantly; meanwhile, the method works well on LFW.

6 0.81301045 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos

7 0.79925019 389 cvpr-2013-Semi-supervised Learning with Constraints for Person Identification in Multimedia Data

8 0.7709924 399 cvpr-2013-Single-Sample Face Recognition with Image Corruption and Misalignment via Sparse Illumination Transfer

9 0.73248702 430 cvpr-2013-The SVM-Minus Similarity Score for Video Face Recognition

10 0.71890199 220 cvpr-2013-In Defense of Sparsity Based Face Recognition

11 0.65436637 152 cvpr-2013-Exemplar-Based Face Parsing

12 0.63700378 64 cvpr-2013-Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification

13 0.61219198 463 cvpr-2013-What's in a Name? First Names as Facial Attributes

14 0.60406572 252 cvpr-2013-Learning Locally-Adaptive Decision Functions for Person Verification

15 0.59759724 420 cvpr-2013-Supervised Descent Method and Its Applications to Face Alignment

16 0.56957465 199 cvpr-2013-Harry Potter's Marauder's Map: Localizing and Tracking Multiple Persons-of-Interest by Nonnegative Discretization

17 0.56826174 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image

18 0.56608081 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection

19 0.52466285 415 cvpr-2013-Structured Face Hallucination

20 0.52124512 161 cvpr-2013-Facial Feature Tracking Under Varying Facial Expressions and Face Poses Based on Restricted Boltzmann Machines


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.096), (16, 0.017), (26, 0.046), (33, 0.198), (67, 0.441), (69, 0.033), (87, 0.066)]
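
The page does not say how the simValue scores in the list below are derived from these topic weights. As a rough illustration only, the sketch here compares two sparse topic vectors with cosine similarity, which is an assumed metric and not the site's documented method. The `other_paper` weights are hypothetical, invented for the example. Note that the listing's own same-paper entry scores below 1, so the site evidently uses some other computation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse topic vectors given as (topicId, weight) pairs."""
    wa, wb = dict(a), dict(b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * wb.get(t, 0.0) for t, w in wa.items())
    na = math.sqrt(sum(w * w for w in wa.values()))
    nb = math.sqrt(sum(w * w for w in wb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Topic weights listed above for this paper.
this_paper = [(10, 0.096), (16, 0.017), (26, 0.046), (33, 0.198),
              (67, 0.441), (69, 0.033), (87, 0.066)]
# Hypothetical weights for another paper (not taken from the page).
other_paper = [(10, 0.120), (33, 0.150), (67, 0.400), (90, 0.200)]

score = cosine_similarity(this_paper, other_paper)
self_score = cosine_similarity(this_paper, this_paper)
print(f"similarity to hypothetical paper: {score:.4f}")
print(f"similarity to itself: {self_score:.4f}")
```

Representing each vector as a dictionary keyed by topicId keeps the comparison cheap when only a handful of topics carry weight.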

similar papers list:

simIndex simValue paperId paperTitle

1 0.91889608 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video

Author: Pramod Sharma, Ram Nevatia

Abstract: In this work, we present a novel and efficient detector adaptation method which improves the performance of an offline trained classifier (baseline classifier) by adapting it to new test datasets. We address two critical aspects of adaptation methods: generalizability and computational efficiency. We propose an adaptation method which can be applied to various baseline classifiers and is also computationally efficient. For a given test video, we collect online samples in an unsupervised manner and train a random fern adaptive classifier. The adaptive classifier improves precision of the baseline classifier by validating the obtained detection responses from the baseline classifier as correct detections or false alarms. Experiments demonstrate generalizability, computational efficiency and effectiveness of our method, as we compare our method with state-of-the-art approaches for the problem of human detection and show good performance with high computational efficiency on two different baseline classifiers.

2 0.91233939 103 cvpr-2013-Decoding Children's Social Behavior

Author: James M. Rehg, Gregory D. Abowd, Agata Rozga, Mario Romero, Mark A. Clements, Stan Sclaroff, Irfan Essa, Opal Y. Ousley, Yin Li, Chanho Kim, Hrishikesh Rao, Jonathan C. Kim, Liliana Lo Presti, Jianming Zhang, Denis Lantsman, Jonathan Bidwell, Zhefan Ye

Abstract: We introduce a new problem domain for activity recognition: the analysis of children's social and communicative behaviors based on video and audio data. We specifically target interactions between children aged 1–2 years and an adult. Such interactions arise naturally in the diagnosis and treatment of developmental disorders such as autism. We introduce a new publicly-available dataset containing over 160 sessions of a 3–5 minute child-adult interaction. In each session, the adult examiner followed a semi-structured play interaction protocol which was designed to elicit a broad range of social behaviors. We identify the key technical challenges in analyzing these behaviors, and describe methods for decoding the interactions. We present experimental results that demonstrate the potential of the dataset to drive interesting research questions, and show preliminary results for multi-modal activity recognition.

3 0.8804599 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection

Author: Wanli Ouyang, Xiaogang Wang

Abstract: In this paper, we address the challenging problem of detecting pedestrians who appear in groups and interact. A new approach is proposed for single-pedestrian detection aided by multi-pedestrian detection. A mixture model of multi-pedestrian detectors is designed to capture the unique visual cues which are formed by nearby multiple pedestrians but cannot be captured by single-pedestrian detectors. A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection. It can integrate with any single-pedestrian detector without significantly increasing the computation load. 15 state-of-the-art single-pedestrian detection approaches are investigated on three widely used public datasets: Caltech, TUD-Brussels and ETH. Experimental results show that our framework significantly improves all these approaches. The average improvement is 9% on the Caltech-Test dataset, 11% on the TUD-Brussels dataset and 17% on the ETH dataset in terms of average miss rate. The lowest average miss rate is reduced from 48% to 43% on the Caltech-Test dataset, from 55% to 50% on the TUD-Brussels dataset and from 51% to 41% on the ETH dataset.

same-paper 4 0.82733428 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification

Author: Enrique G. Ortiz, Alan Wright, Mubarak Shah

Abstract: This paper presents an end-to-end video face recognition system, addressing the difficult problem of identifying a video face track using a large dictionary of still face images of a few hundred people, while rejecting unknown individuals. A straightforward application of the popular ℓ1-minimization for face recognition on a frame-by-frame basis is prohibitively expensive, so we propose a novel algorithm, Mean Sequence SRC (MSSRC), that performs video face recognition using a joint optimization leveraging all of the available video data and the knowledge that the face track frames belong to the same individual. By adding a strict temporal constraint to the ℓ1-minimization that forces individual frames in a face track to all reconstruct a single identity, we show the optimization reduces to a single minimization over the mean of the face track. We also introduce a new Movie Trailer Face Dataset collected from 101 movie trailers on YouTube. Finally, we show that our method matches or outperforms the state-of-the-art on three existing datasets (YouTube Celebrities, YouTube Faces, and Buffy) and our unconstrained Movie Trailer Face Dataset. More importantly, our method excels at rejecting unknown identities by at least 8% in average precision.
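
To make the idea of classifying a whole face track from its mean frame concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the solver (plain ISTA), the residual-based decision rule, the function names, the value of `lam`, and the tiny three-dimensional "features" are all assumptions chosen for illustration. The abstract only states that the optimization reduces to one ℓ1-minimization over the track mean.

```python
import math

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal operator of t * l1-norm)."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def lasso_ista(A, y, lam=0.01, iters=2000):
    """Approximately minimize 0.5*||y - A x||^2 + lam*||x||_1 with ISTA.

    A is a list of d rows, each of length n (one column per dictionary image).
    """
    d, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the Lipschitz constant, so this step is safe.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(d)]
        g = [sum(A[i][j] * r[i] for i in range(d)) for j in range(n)]
        x = soft_threshold([x[j] - step * g[j] for j in range(n)], step * lam)
    return x

def mssrc_classify(A, labels, track, lam=0.01):
    """Classify a face track via one sparse coding of its mean frame (assumed rule)."""
    d, n = len(A), len(labels)
    mean = [sum(frame[i] for frame in track) / len(track) for i in range(d)]
    x = lasso_ista(A, mean, lam)
    residuals = {}
    for c in set(labels):
        # Reconstruct the mean using only the coefficients that belong to class c.
        recon = [sum(A[i][j] * x[j] for j in range(n) if labels[j] == c)
                 for i in range(d)]
        residuals[c] = math.sqrt(sum((mean[i] - recon[i]) ** 2 for i in range(d)))
    return min(residuals, key=residuals.get), residuals

# Toy dictionary: two still images per (hypothetical) identity, as feature vectors.
atoms = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
labels = ["alice", "alice", "bob", "bob"]
A = [list(row) for row in zip(*atoms)]  # transpose so that atoms become columns

# Toy face track: three noisy frames of the first identity.
track = [[0.95, 0.05, 0.0], [1.0, 0.0, 0.05], [0.9, 0.1, 0.0]]
pred, residuals = mssrc_classify(A, labels, track)
print(pred, residuals)
```

The cost of the sparse coding step here does not grow with the number of frames, since the track is averaged once before the solver runs. Rejection of unknown identities, which the abstract highlights, is not covered by this sketch.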

5 0.82198715 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers

Author: Georgia Gkioxari, Pablo Arbeláez, Lubomir Bourdev, Jitendra Malik

Abstract: We propose a novel approach for human pose estimation in real-world cluttered scenes, and focus on the challenging problem of predicting the pose of both arms for each person in the image. For this purpose, we build on the notion of poselets [4] and train highly discriminative classifiers to differentiate among arm configurations, which we call armlets. We propose a rich representation which, in addition to standard HOG features, integrates the information of strong contours, skin color and contextual cues in a principled manner. Unlike existing methods, we evaluate our approach on a large subset of images from the PASCAL VOC detection dataset, where critical visual phenomena, such as occlusion, truncation, multiple instances and clutter are the norm. Our approach outperforms Yang and Ramanan [26], the state-of-the-art technique, with an improvement from 29.0% to 37.5% PCP accuracy on the arm keypoint prediction task, on this new pose estimation dataset.

6 0.81162012 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search

7 0.80864626 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation

8 0.79580116 246 cvpr-2013-Learning Binary Codes for High-Dimensional Data Using Bilinear Projections

9 0.78418946 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking

10 0.76770043 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues

11 0.74164873 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation

12 0.73746467 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection

13 0.72936785 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection

14 0.71354461 363 cvpr-2013-Robust Multi-resolution Pedestrian Detection in Traffic Scenes

15 0.71010125 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval

16 0.70000142 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence

17 0.67265499 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation

18 0.67160553 438 cvpr-2013-Towards Pose Robust Face Recognition

19 0.66766137 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors

20 0.66739482 338 cvpr-2013-Probabilistic Elastic Matching for Pose Variant Face Verification