cvpr cvpr2013 cvpr2013-92 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Baoyuan Wu, Yifan Zhang, Bao-Gang Hu, Qiang Ji
Abstract: In this paper, we focus on face clustering in videos. Given the detected faces from real-world videos, we partition all faces into K disjoint clusters. Different from clustering on a collection of facial images, the faces from videos are organized as face tracks and the frame index of each face is also provided. As a result, many pairwise constraints between faces can be easily obtained from the temporal and spatial knowledge of the face tracks. These constraints can be effectively incorporated into a generative clustering model based on the Hidden Markov Random Fields (HMRFs). Within the HMRF model, the pairwise constraints are augmented by label-level and constraint-level local smoothness to guide the clustering process. The parameters for both the unary and the pairwise potential functions are learned by the simulated field algorithm, and the weights of constraints can be easily adjusted. We further introduce an efficient clustering framework specially for face clustering in videos, considering that faces in adjacent frames of the same face track are very similar. The framework is applicable to other clustering algorithms to significantly reduce the computational cost. Experiments on two face data sets from real-world videos demonstrate the significantly improved performance of our algorithm over state-of-the-art algorithms.
Reference: text
sentIndex sentText sentNum sentScore
1 Constrained Clustering and Its Application to Face Clustering in Videos Baoyuan Wu ∗1,2, Yifan Zhang1, Bao-Gang Hu1, and Qiang Ji2 1NLPR, CASIA, Beijing 100190, China 2Rensselaer Polytechnic Institute, Troy, NY 12180, USA Abstract In this paper, we focus on face clustering in videos. [sent-1, score-0.66]
2 Given the detected faces from real-world videos, we partition all faces into K disjoint clusters. [sent-2, score-0.683]
3 Different from clustering on a collection of facial images, the faces from videos are organized as face tracks and the frame index of each face is also provided. [sent-3, score-1.683]
4 As a result, many pairwise constraints between faces can be easily obtained from the temporal and spatial knowledge of the face tracks. [sent-4, score-0.962]
5 These constraints can be effectively incorporated into a generative clustering model based on the Hidden Markov Random Fields (HMRFs). [sent-5, score-0.516]
6 Within the HMRF model, the pairwise constraints are augmented by label-level and constraint-level local smoothness to guide the clustering process. [sent-6, score-0.782]
7 The parameters for both the unary and the pairwise potential functions are learned by the simulated field algorithm, and the weights of constraints can be easily adjusted. [sent-7, score-0.639]
8 We further introduce an efficient clustering framework specially for face clustering in videos, considering that faces in adjacent frames of the same face track are very similar. [sent-8, score-1.718]
9 The framework is applicable to other clustering algorithms to significantly reduce the computational cost. [sent-9, score-0.306]
10 Experiments on two face data sets from real-world videos demonstrate the significantly improved performance of our algorithm over state-of-the-art algorithms. [sent-10, score-0.5]
11 Introduction. We are interested in clustering faces into groups corresponding to individuals appearing in real-world videos. [sent-12, score-0.595]
12 A successful solution of this problem can be applied to many fields, including automatically determining the cast of a feature-length film, content-based video retrieval, rapid browsing and organization of video collections, automatic collection of large-scale face data sets, etc. [sent-13, score-0.425]
13 In real-world videos, lighting conditions, facial expressions and head pose may drastically change the appearance of faces. [sent-15, score-0.12]
14 Hence, the uncontrolled imaging conditions in real-world videos raise many difficulties for face clustering. [sent-23, score-0.566]
15 Face clustering is the task of grouping faces by visual similarity. [sent-25, score-0.595]
16 It is closely related to face recognition but has several different aspects. [sent-26, score-0.354]
17 For face recognition, it is assumed that the number of persons and a training facial image data set are known beforehand. [sent-27, score-0.567]
18 The data set consists of certain labeled facial images, which is used for training a classifier. [sent-28, score-0.12]
19 When testing, each querying facial image can be classified by the trained classifier and the best matching person ID is returned. [sent-29, score-0.189]
20 Hence, face recognition can be considered as a supervised classification problem. [sent-30, score-0.354]
21 In contrast, since no labelled faces are provided, face clustering is considered as an unsupervised problem. [sent-31, score-0.996]
22 Although a large body of work has been conducted on face recognition, face clustering is a rather novel topic with few publications in the literature. [sent-32, score-1.051]
23 Those works can be grouped into two categories: purely data-driven methods and clustering with prior knowledge. [sent-33, score-0.37]
24 Most data-driven methods are fully unsupervised, and focus on obtaining a good distance measure or mapping raw data to a new space for better representing the structure of the inter-personal dissimilarities from the unlabeled faces [10, 11, 19, 12, 1]. [sent-34, score-0.108]
25 Fitzgibbon and Zisserman [10] proposed an affine invariant distance measure to achieve robustness to face pose changes. [sent-35, score-0.354]
26 To the best of our knowledge, this work is the first attempt for face clustering in movies. [sent-36, score-0.66]
27 Then they [11] extended their work to a Joint Manifold Distance (JMD), where each subspace represents a set of facial images of the same person detected in consecutive video frames. [sent-37, score-0.157]
28 [19] proposed a Manifold-Manifold Distance (MMD), in which a nonlinear manifold is divided into several local linear subspaces. [sent-39, score-0.068]
29 Arandjelovic and Cipolla [1] clustered faces over face appearance manifolds in an anisotropic manifold space which exploits the coherence of dissimilarities between manifolds. [sent-43, score-0.783]
30 In addition to fully unsupervised methods, another kind of data-driven clustering method tries to utilize some partial supervision to help clustering. [sent-44, score-0.353]
31 Prince and Elder [18] combined clustering with a Bayesian approach to count the number of different people that appear in a collection of face images. [sent-45, score-0.696]
32 The parameters of a generative model describing the face manifold are learned from the training data. [sent-46, score-0.506]
33 Du and Chellappa [8] presented an on-line context-aided face association method, which uses Conditional Random Fields (CRFs) to combine multiple contextual features. [sent-48, score-0.354]
34 The aforementioned methods are all purely data-driven, which only exploit information contained in the data. [sent-51, score-0.064]
35 Considering the huge uncertainty in real-world videos, purely data-driven methods are expected to be unstable. [sent-53, score-0.064]
36 Instead, prior knowledge could be exploited to guide the clustering in order to achieve robustness and increase the generalization ability of the methods. [sent-54, score-0.476]
37 [4] considered using extra information to enhance the face clustering, where the faces are collected from web news pages. [sent-56, score-0.742]
38 A set of names automatically captured from associated news captions are employed to supervise the clustering. [sent-57, score-0.1]
39 However, such text-based labels are not always available for faces in videos. [sent-58, score-0.337]
40 Thus, we can easily obtain plenty of must-link and cannot-link constraints without much extra cost. [sent-60, score-0.227]
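One way such constraints might be derived from face tracks can be sketched as follows. The rule used here (faces within one track are must-linked; faces from two tracks that share at least one frame are cannot-linked) is an assumption made for illustration, and `constraints_from_tracks` together with its input layout are hypothetical names, not the authors' code.

```python
from itertools import combinations


def constraints_from_tracks(tracks):
    """Derive pairwise constraints from face tracks.

    tracks: dict mapping track_id -> list of (face_id, frame_index).
    Returns (must_link, cannot_link) as sets of (i, j) pairs with i < j.
    ASSUMPTION: same track => must-link; tracks that co-occur in at
    least one frame => cannot-link (two people visible at once).
    """
    must_link, cannot_link = set(), set()

    # Faces inside one track are assumed to show the same person.
    for faces in tracks.values():
        ids = sorted(face_id for face_id, _ in faces)
        must_link.update(combinations(ids, 2))

    # Two tracks sharing a frame are assumed to show different people.
    for (_, faces_a), (_, faces_b) in combinations(tracks.items(), 2):
        frames_a = {frame for _, frame in faces_a}
        frames_b = {frame for _, frame in faces_b}
        if frames_a & frames_b:
            for a, _ in faces_a:
                for b, _ in faces_b:
                    cannot_link.add((min(a, b), max(a, b)))

    return must_link, cannot_link
```

Because every face in a track is linked to every other face in that track, the number of must-link pairs grows quadratically with track length, which is one reason long tracks yield many constraints at little cost.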
41 In [21], constraints are exploited to modify the distance matrix and to guide the clustering to satisfy such constraints. [sent-62, score-0.588]
42 The latest work on face clustering with constraints is presented in [7], called unsupervised logistic discriminative metric learning (ULDML). [sent-65, score-0.819]
43 A metric is learned such that must-linked faces are close, while cannot-linked faces are far from each other. [sent-66, score-0.614]
44 In the literature of constrained clustering, many methods have been proposed to exploit pairwise constraints to guide the clustering, such as COP-KMEANS [22], constrained EM [20], HMRF-KMeans [2] and PPC [17]. [sent-67, score-0.565]
45 COP-KMEANS embeds constraints in a hard manner, while the other three adopt soft constraints. [sent-68, score-0.265]
46 However, the weights of these soft constraints are totally user-defined. [sent-69, score-0.195]
47 In contrast, the weights of constraints can be easily adjusted through learning in our algorithm. [sent-70, score-0.207]
48 In this paper we propose a probabilistic constrained clustering model based on Hidden Markov Random Fields. [sent-72, score-0.392]
49 Together with the pairwise constraints, the local smoothness assumptions are also incorporated into the model to achieve robust performance. [sent-73, score-0.338]
50 The proposed model is optimized by the simulated field algorithm [6]. [sent-75, score-0.07]
51 As a result, the parameters are learned effectively, and the weights of pairwise constraints can be easily adjusted. [sent-76, score-0.405]
52 Note that there are many issues in the whole process of face clustering in videos, such as face detection, face tracking, face features and determining the number of persons, etc. [sent-77, score-1.722]
53 However, in this paper we only focus on showing how and to what extent the pairwise constraints can help the face clustering in videos. [sent-78, score-0.934]
54 Our task can be briefly described as follows: given the number of persons K, the detected faces and their tracks, and features of each face, we partition all faces into K disjoint clusters. [sent-79, score-0.776]
55 (1) We propose a probabilistic constrained clustering model based on HMRFs, in which the pairwise constraints, label-level and constraint-level local smoothness assumptions are incorporated together to guide the clustering process. [sent-81, score-1.155]
56 (2) The model parameters are effectively learned through the simulated field algorithm, and the weights of constraints can be easily adjusted. [sent-82, score-0.313]
57 In contrast, in many existing constrained clustering methods [2, 17], no practical suggestions are provided to determine the weights of constraints. [sent-83, score-0.246]
58 (3) We present an efficient clustering framework specially for face clustering in videos, considering that faces in adjacent frames of the same face track are very similar. [sent-84, score-1.718]
59 Any clustering algorithm can be directly used in this framework to significantly reduce the computational cost. [sent-85, score-0.306]
60 A constrained clustering based on Hidden Markov Random Fields [sent-88, score-0.392]
61 Problem formulation. Given an unlabeled data set $X = \{x_1, x_2, \ldots, x_n \mid x_i \in \mathbb{R}^d\}$, [sent-90, score-0.036]
62 our goal is to partition $X$ into $K$ (predefined) disjoint clusters. [sent-93, score-0.107]
63 A pairwise constraint set $C = \{C_{ml}, C_{cl}\}$ is also provided to guide the clustering, such that the clustering result should satisfy the constraints. [sent-101, score-0.671]
64 The must-link constraint set $C_{ml} = \{(x_i, x_j)\}$ indicates that $x_i$ and $x_j$ should be in the same cluster. [sent-102, score-0.33]
65 The cannot-link constraint set $C_{cl} = \{(x_i, x_j)\}$ requires that $x_i$ and $x_j$ be partitioned into different clusters. [sent-103, score-0.33]
66 An $n \times n$ symmetric matrix $W = \{w_{ij}\}$ is adopted to represent all the pairwise constraints: if $(x_i, x_j) \in C_{ml}$, then $w_{ij} = 1$; if $(x_i, x_j) \in C_{cl}$, then $w_{ij} = -1$; if $(x_i, x_j) \notin C$, then $w_{ij} = 0$. [sent-104, score-0.506]
67 As we will show later, through constraint propagation, $-1 \le w_{ij} \le 1$: $w_{ij} > 0$ means must-link, $w_{ij} < 0$ means cannot-link; $|w_{ij}|$ denotes the confidence of the constraint. [sent-105, score-0.62]
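The initial constraint matrix described above (before any constraint propagation) can be sketched in a few lines. This is a minimal illustration using plain nested lists; the function name `build_constraint_matrix` and the zero-based indexing of faces are assumptions made here, not part of the paper.

```python
def build_constraint_matrix(n, must_link, cannot_link):
    """Return the n x n symmetric constraint matrix W as nested lists.

    W[i][j] = +1 for a must-link pair, -1 for a cannot-link pair,
    and 0 when no constraint is given for the pair.
    """
    W = [[0.0] * n for _ in range(n)]
    for i, j in must_link:
        W[i][j] = W[j][i] = 1.0   # must-link: same cluster
    for i, j in cannot_link:
        W[i][j] = W[j][i] = -1.0  # cannot-link: different clusters
    return W
```

For example, `build_constraint_matrix(3, {(0, 1)}, {(1, 2)})` sets the (0, 1) entries to 1, the (1, 2) entries to -1, and leaves the unconstrained pair (0, 2) at 0.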
68 Hidden Markov Random Fields. Hidden Markov Random Fields (HMRFs) [13] are a commonly used generative model. [sent-110, score-0.048]
69 … are independent, i.e., $P(X \mid Y) = \prod_{i=1}^{n} p(x_i \mid y_i)$; (b) given the observed variables, … [sent-114, score-0.045]
70 Its general formulation is as follows [13]: $P(X, Y) = P(X \mid Y) P(Y) = \frac{1}{Z} \prod_{i=1}^{n} \psi_u(x_i, y_i) \prod_{j \in N_i} \psi_p(y_i, y_j)$, (1) [sent-120, score-0.035]
71 where $\psi_u$ denotes the unary potential function, $\psi_p$ denotes the pairwise potential function, $Z$ is the partition function, and $N_i$ represents the neighbourhood set of $x_i$. [sent-130, score-0.627]
72 Unary potential function $\psi_u$. $\psi_u(x_i, y_i)$ embeds the correlation between the observation $x_i$ and its latent label $y_i$. [sent-133, score-0.336]
73 There have been many different methods to design the unary potential. [sent-134, score-0.089]
74 For simplicity, we assume it to be a Gaussian distribution, as follows: $\psi_u(x_i, y_i) = \mathcal{N}(x_i \mid \mu_{y_i}, \Sigma_{y_i})$, (2) where $\mu_{y_i}$ is the mean vector of the $y_i$-th cluster and $\Sigma_{y_i}$ denotes its covariance matrix. [sent-135, score-0.084]
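To make the Gaussian unary potential concrete, the following sketch evaluates its logarithm for one observation and picks the cluster with the highest value. It is a simplified illustration only: it assumes a diagonal covariance (one variance per feature dimension) to avoid matrix inversion, whereas the text allows a general covariance matrix, and the function names are hypothetical.

```python
import math


def log_unary_potential(x, mu, var):
    """Log of a Gaussian density N(x | mu, diag(var)).

    x, mu, var are equal-length sequences of floats.
    ASSUMPTION: diagonal covariance, chosen here for simplicity.
    """
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xk - mk) ** 2 / v
        for xk, mk, v in zip(x, mu, var)
    )


def best_cluster(x, means, variances):
    """Index of the cluster whose unary potential is largest for x."""
    scores = [log_unary_potential(x, m, v) for m, v in zip(means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Using only the unary term in this way ignores the pairwise potential entirely, so it corresponds to an unconstrained Gaussian assignment step rather than the full model.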
75 Pairwise potential function $\psi_p$. $\psi_p(y_i, y_j)$ embeds the correlation between $y_i$ and $y_j$. [sent-143, score-0.696]
76 All correlations in $Y$ can be represented by a neighbourhood system, denoted as an $n \times n$ symmetric matrix $V = \{v_{ij}\}$. [sent-144, score-0.137]
77 $v_{ij} > 0$ means a positive correlation, i.e., [sent-145, score-0.299]
78 $y_i$ and $y_j$ should be the same; $v_{ij} < 0$ means a negative correlation, i.e., [sent-148, score-0.299]
79 $y_i$ and $y_j$ should be different; $v_{ij} = 0$ means no correlation. [sent-150, score-0.761]
80 Since V plays a key role in the pairwise potential, we present a detailed discussion about how to generate V in the next section. [sent-151, score-0.162]
81 Here we first show how to utilize a generated $V$ in the pairwise potential, as follows: $\psi_p(y_i, y_j) = \exp(-\beta \phi(y_i, y_j)) = \exp(\ldots)$ [sent-152, score-0.162]
82 … function, $\beta$ is a positive trade-off parameter between the unary and pairwise potentials, and it will be learned. [sent-158, score-0.251]
83 δ is defined as follows: if yi = yj, then δ(yi, yj) = 1, else δ(yi, yj) = 0. [sent-159, score-0.215]
84 Equation (3) has an intuitive interpretation in probability: when $v_{ij} > 0$, if $y_i \ne y_j$, then $\psi_p(y_i, y_j) = \exp(-\beta |v_{ij}|) < 1$. [sent-160, score-0.727]
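The behaviour of the pairwise potential can be explored numerically with a small sketch. The full definition of $\phi$ is not recoverable from the text above, so the form used below, $\phi(y_i, y_j) = (-1)^{\delta(y_i, y_j)} v_{ij}$, is an assumption chosen only because it reproduces the stated case ($v_{ij} > 0$ and $y_i \ne y_j$ gives $\exp(-\beta |v_{ij}|) < 1$); the function name is likewise hypothetical.

```python
import math


def pairwise_potential(y_i, y_j, v_ij, beta):
    """Pairwise potential exp(-beta * phi(y_i, y_j)).

    ASSUMPTION: phi = (-1)**delta * v_ij, with delta = 1 when the two
    labels agree and 0 otherwise. This form is illustrative and is not
    confirmed by the source text.
    """
    delta = 1 if y_i == y_j else 0
    phi = ((-1) ** delta) * v_ij
    return math.exp(-beta * phi)
```

Under this assumed form, labels that agree with the sign of $v_{ij}$ receive a potential above 1, labels that violate it receive a potential below 1, and $v_{ij} = 0$ always gives exactly 1, so an unconstrained pair has no influence.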
wordName wordTfidf (topN-words)
[('face', 0.354), ('clustering', 0.306), ('faces', 0.289), ('vij', 0.265), ('yj', 0.247), ('yi', 0.215), ('wij', 0.172), ('hmrfs', 0.163), ('pairwise', 0.162), ('videos', 0.146), ('embeds', 0.12), ('facial', 0.12), ('guide', 0.119), ('xj', 0.116), ('constraints', 0.112), ('ccl', 0.109), ('cml', 0.109), ('xi', 0.102), ('neighbourhood', 0.095), ('persons', 0.093), ('unary', 0.089), ('constrained', 0.086), ('smoothness', 0.083), ('mmd', 0.08), ('fields', 0.079), ('tracks', 0.078), ('hidden', 0.077), ('clusterings', 0.077), ('potential', 0.075), ('dissimilarities', 0.072), ('simulated', 0.07), ('manifold', 0.068), ('news', 0.065), ('markov', 0.064), ('purely', 0.064), ('specially', 0.06), ('partition', 0.059), ('matter', 0.052), ('exploited', 0.051), ('weights', 0.05), ('incorporated', 0.05), ('track', 0.049), ('bedded', 0.048), ('textbased', 0.048), ('baoyuan', 0.048), ('ibse', 0.048), ('troy', 0.048), ('txo', 0.048), ('ycl', 0.048), ('yifan', 0.048), ('yith', 0.048), ('generative', 0.048), ('unsupervised', 0.047), ('disjoint', 0.046), ('easily', 0.045), ('tahree', 0.045), ('tol', 0.045), ('ixt', 0.045), ('sanp', 0.045), ('unction', 0.045), ('assumptions', 0.043), ('correlations', 0.042), ('diff', 0.042), ('cipolla', 0.04), ('chow', 0.04), ('casia', 0.04), ('mbgs', 0.04), ('correlation', 0.039), ('elder', 0.038), ('polytechnic', 0.038), ('prince', 0.038), ('person', 0.037), ('publications', 0.037), ('unlabeled', 0.036), ('ore', 0.036), ('ato', 0.036), ('axn', 0.036), ('plenty', 0.036), ('arandjelovic', 0.036), ('qiang', 0.036), ('learned', 0.036), ('collection', 0.036), ('denotes', 0.036), ('follows', 0.035), ('vd', 0.035), ('captions', 0.035), ('tihse', 0.035), ('nts', 0.035), ('browsing', 0.035), ('film', 0.035), ('raise', 0.034), ('means', 0.034), ('extra', 0.034), ('exp', 0.033), ('nst', 0.033), ('owf', 0.033), ('suggestions', 0.033), ('soft', 0.033), ('uncontrolled', 0.032), ('querying', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos
2 0.29057002 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
Author: Xiaohui Shen, Zhe Lin, Jonathan Brandt, Ying Wu
Abstract: Detecting faces in uncontrolled environments continues to be a challenge to traditional face detection methods [24] due to the large variation in facial appearances, as well as occlusion and clutter. In order to overcome these challenges, we present a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning. A large database of faces with bounding rectangles and facial landmark locations is collected, and simple discriminative classifiers are learned from each of them. A voting-based method is then proposed to let these classifiers cast votes on the test image through an efficient image retrieval technique. As a result, faces can be very efficiently detected by selecting the modes from the voting maps, without resorting to exhaustive sliding window-style scanning. Moreover, due to the exemplar-based framework, our approach can detect faces under challenging conditions without explicitly modeling their variations. Evaluation on two public benchmark datasets shows that our new face detection approach is accurate and efficient, and achieves the state-of-the-art performance. We further propose to use image retrieval for face validation (in order to remove false positives) and for face alignment/landmark localization. The same methodology can also be easily generalized to other face-related tasks, such as attribute recognition, as well as general object detection.
3 0.2413931 338 cvpr-2013-Probabilistic Elastic Matching for Pose Variant Face Verification
Author: Haoxiang Li, Gang Hua, Zhe Lin, Jonathan Brandt, Jianchao Yang
Abstract: Pose variation remains a major challenge for real-world face recognition. We approach this problem through a probabilistic elastic matching method. We take a part based representation by extracting local features (e.g., LBP or SIFT) from densely sampled multi-scale image patches. By augmenting each feature with its location, a Gaussian mixture model (GMM) is trained to capture the spatial-appearance distribution of all face images in the training corpus. Each mixture component of the GMM is confined to be a spherical Gaussian to balance the influence of the appearance and the location terms. Each Gaussian component builds correspondence of a pair of features to be matched between two faces/face tracks. For face verification, we train an SVM on the vector concatenating the difference vectors of all the feature pairs to decide if a pair of faces/face tracks is matched or not. We further propose a joint Bayesian adaptation algorithm to adapt the universally trained GMM to better model the pose variations between the target pair of faces/face tracks, which consistently improves face verification accuracy. Our experiments show that our method outperforms the state-of-the-art in the most restricted protocol on Labeled Face in the Wild (LFW) and the YouTube video face database by a significant margin.
4 0.23276114 160 cvpr-2013-Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification
Author: Enrique G. Ortiz, Alan Wright, Mubarak Shah
Abstract: This paper presents an end-to-end video face recognition system, addressing the difficult problem of identifying a video face track using a large dictionary of still face images of a few hundred people, while rejecting unknown individuals. A straightforward application of the popular ℓ1-minimization for face recognition on a frame-by-frame basis is prohibitively expensive, so we propose a novel algorithm Mean Sequence SRC (MSSRC) that performs video face recognition using a joint optimization leveraging all of the available video data and the knowledge that the face track frames belong to the same individual. By adding a strict temporal constraint to the ℓ1-minimization that forces individual frames in a face track to all reconstruct a single identity, we show the optimization reduces to a single minimization over the mean of the face track. We also introduce a new Movie Trailer Face Dataset collected from 101 movie trailers on YouTube. Finally, we show that our method matches or outperforms the state-of-the-art on three existing datasets (YouTube Celebrities, YouTube Faces, and Buffy) and our unconstrained Movie Trailer Face Dataset. More importantly, our method excels at rejecting unknown identities by at least 8% in average precision.
5 0.20564258 438 cvpr-2013-Towards Pose Robust Face Recognition
Author: Dong Yi, Zhen Lei, Stan Z. Li
Abstract: Most existing pose robust methods are too computationally complex to meet practical applications and their performance under unconstrained environments is rarely evaluated. In this paper, we propose a novel method for pose robust face recognition towards practical applications, which is fast, pose robust and can work well under unconstrained environments. Firstly, a 3D deformable model is built and a fast 3D model fitting algorithm is proposed to estimate the pose of face image. Secondly, a group of Gabor filters are transformed according to the pose and shape of face image for feature extraction. Finally, PCA is applied on the pose adaptive Gabor features to remove the redundancies and Cosine metric is used to evaluate the similarity. The proposed method has three advantages: (1) The pose correction is applied in the filter space rather than image space, which makes our method less affected by the precision of the 3D model; (2) By combining the holistic pose transformation and local Gabor filtering, the final feature is robust to pose and other negative factors in face recognition; (3) The 3D structure and facial symmetry are successfully used to deal with self-occlusion. Extensive experiments on FERET and PIE show the proposed method outperforms state-of-the-art methods significantly, meanwhile, the method works well on LFW.
7 0.19973545 182 cvpr-2013-Fusing Robust Face Region Descriptors via Multiple Metric Learning for Face Recognition in the Wild
8 0.18688735 93 cvpr-2013-Constraints as Features
9 0.17572893 430 cvpr-2013-The SVM-Minus Similarity Score for Video Face Recognition
10 0.17531095 389 cvpr-2013-Semi-supervised Learning with Constraints for Person Identification in Multimedia Data
11 0.16912414 343 cvpr-2013-Query Adaptive Similarity for Large Scale Object Retrieval
12 0.159347 62 cvpr-2013-Bilinear Programming for Human Activity Recognition with Unknown MRF Graphs
13 0.15248023 152 cvpr-2013-Exemplar-Based Face Parsing
14 0.14653353 135 cvpr-2013-Discriminative Subspace Clustering
15 0.14154665 399 cvpr-2013-Single-Sample Face Recognition with Image Corruption and Misalignment via Sparse Illumination Transfer
16 0.13046795 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
17 0.12770084 379 cvpr-2013-Scalable Sparse Subspace Clustering
18 0.12315928 405 cvpr-2013-Sparse Subspace Denoising for Image Manifolds
19 0.12293229 180 cvpr-2013-Fully-Connected CRFs with Non-Parametric Pairwise Potential
20 0.12185876 64 cvpr-2013-Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification
topicId topicWeight
[(0, 0.236), (1, -0.081), (2, -0.096), (3, 0.002), (4, 0.074), (5, 0.011), (6, -0.036), (7, -0.177), (8, 0.216), (9, -0.207), (10, 0.203), (11, -0.054), (12, 0.006), (13, 0.09), (14, -0.083), (15, 0.079), (16, -0.007), (17, 0.024), (18, -0.05), (19, -0.074), (20, -0.034), (21, 0.005), (22, -0.017), (23, 0.081), (24, 0.066), (25, -0.09), (26, -0.044), (27, 0.036), (28, -0.002), (29, -0.035), (30, 0.04), (31, 0.073), (32, 0.124), (33, 0.035), (34, 0.031), (35, 0.051), (36, -0.02), (37, 0.011), (38, 0.041), (39, -0.081), (40, -0.029), (41, 0.025), (42, 0.025), (43, 0.042), (44, -0.019), (45, -0.023), (46, -0.026), (47, 0.06), (48, 0.095), (49, -0.082)]
simIndex simValue paperId paperTitle
same-paper 1 0.99045312 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos
2 0.81572032 389 cvpr-2013-Semi-supervised Learning with Constraints for Person Identification in Multimedia Data
Author: Martin Bäuml, Makarand Tapaswi, Rainer Stiefelhagen
Abstract: We address the problem of person identification in TV series. We propose a unified learning framework for multiclass classification which incorporates labeled and unlabeled data, and constraints between pairs of features in the training. We apply the framework to train multinomial logistic regression classifiers for multi-class face recognition. The method is completely automatic, as the labeled data is obtained by tagging speaking faces using subtitles and fan transcripts of the videos. We demonstrate our approach on six episodes each of two diverse TV series and achieve state-of-the-art performance.
Author: Enrique G. Ortiz, Alan Wright, Mubarak Shah
Abstract: This paper presents an end-to-end video face recognition system, addressing the difficult problem of identifying a video face track using a large dictionary of still face images of a few hundred people, while rejecting unknown individuals. A straightforward application of the popular ℓ1-minimization for face recognition on a frame-by-frame basis is prohibitively expensive, so we propose a novel algorithm Mean Sequence SRC (MSSRC) that performs video face recognition using a joint optimization leveraging all of the available video data and the knowledge that the face track frames belong to the same individual. By adding a strict temporal constraint to the ℓ1-minimization that forces individual frames in a face track to all reconstruct a single identity, we show the optimization reduces to a single minimization over the mean of the face track. We also introduce a new Movie Trailer Face Dataset collected from 101 movie trailers on YouTube. Finally, we show that our method matches or outperforms the state-of-the-art on three existing datasets (YouTube Celebrities, YouTube Faces, and Buffy) and our unconstrained Movie Trailer Face Dataset. More importantly, our method excels at rejecting unknown identities by at least 8% in average precision.
4 0.78495193 182 cvpr-2013-Fusing Robust Face Region Descriptors via Multiple Metric Learning for Face Recognition in the Wild
Author: Zhen Cui, Wen Li, Dong Xu, Shiguang Shan, Xilin Chen
Abstract: In many real-world face recognition scenarios, face images can hardly be aligned accurately due to complex appearance variations or low-quality images. To address this issue, we propose a new approach to extract robust face region descriptors. Specifically, we divide each image (resp. video) into several spatial blocks (resp. spatial-temporal volumes) and then represent each block (resp. volume) by sum-pooling the nonnegative sparse codes of position-free patches sampled within the block (resp. volume). Whitened Principal Component Analysis (WPCA) is further utilized to reduce the feature dimension, which leads to our Spatial Face Region Descriptor (SFRD) (resp. Spatial-Temporal Face Region Descriptor, STFRD) for images (resp. videos). Moreover, we develop a new distance method for face verification metric learning called Pairwise-constrained Multiple Metric Learning (PMML) to effectively integrate the face region descriptors of all blocks (resp. volumes) from an image (resp. a video). Our work achieves the state-of-the-art performance on two real-world datasets LFW and YouTube Faces (YTF) according to the restricted protocol.
5 0.78377414 338 cvpr-2013-Probabilistic Elastic Matching for Pose Variant Face Verification
Author: Haoxiang Li, Gang Hua, Zhe Lin, Jonathan Brandt, Jianchao Yang
Abstract: Pose variation remains a major challenge for real-world face recognition. We approach this problem through a probabilistic elastic matching method. We take a part based representation by extracting local features (e.g., LBP or SIFT) from densely sampled multi-scale image patches. By augmenting each feature with its location, a Gaussian mixture model (GMM) is trained to capture the spatial-appearance distribution of all face images in the training corpus. Each mixture component of the GMM is confined to be a spherical Gaussian to balance the influence of the appearance and the location terms. Each Gaussian component builds correspondence of a pair of features to be matched between two faces/face tracks. For face verification, we train an SVM on the vector concatenating the difference vectors of all the feature pairs to decide if a pair of faces/face tracks is matched or not. We further propose a joint Bayesian adaptation algorithm to adapt the universally trained GMM to better model the pose variations between the target pair of faces/face tracks, which consistently improves face verification accuracy. Our experiments show that our method outperforms the state-of-the-art in the most restricted protocol on Labeled Face in the Wild (LFW) and the YouTube video face database by a significant margin.
6 0.72998363 438 cvpr-2013-Towards Pose Robust Face Recognition
7 0.72302169 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
8 0.6858393 463 cvpr-2013-What's in a Name? First Names as Facial Attributes
9 0.68081689 430 cvpr-2013-The SVM-Minus Similarity Score for Video Face Recognition
10 0.63614887 152 cvpr-2013-Exemplar-Based Face Parsing
11 0.63566279 161 cvpr-2013-Facial Feature Tracking Under Varying Facial Expressions and Face Poses Based on Restricted Boltzmann Machines
12 0.63160419 399 cvpr-2013-Single-Sample Face Recognition with Image Corruption and Misalignment via Sparse Illumination Transfer
13 0.60100478 261 cvpr-2013-Learning by Associating Ambiguously Labeled Images
14 0.59261537 420 cvpr-2013-Supervised Descent Method and Its Applications to Face Alignment
15 0.58901596 64 cvpr-2013-Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification
16 0.56769013 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
17 0.56733865 93 cvpr-2013-Constraints as Features
18 0.56292665 220 cvpr-2013-In Defense of Sparsity Based Face Recognition
19 0.55577791 415 cvpr-2013-Structured Face Hallucination
20 0.55484366 252 cvpr-2013-Learning Locally-Adaptive Decision Functions for Person Verification
topicId topicWeight
[(10, 0.128), (16, 0.019), (26, 0.064), (28, 0.011), (33, 0.315), (59, 0.17), (67, 0.095), (69, 0.061), (87, 0.064)]
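The topic-weight line above is a sparse list of (topicId, topicWeight) pairs. As a minimal sketch, assuming the line is a valid Python literal (which it appears to be as printed), it can be loaded into a dictionary to find the dominant topic and the total weight carried by the listed topics. The variable names are illustrative and not part of the original page.

```python
import ast

# Topic-weight line copied verbatim from the listing above.
raw = ("[(10, 0.128), (16, 0.019), (26, 0.064), (28, 0.011), (33, 0.315), "
       "(59, 0.17), (67, 0.095), (69, 0.061), (87, 0.064)]")

# Parse the literal safely into {topicId: topicWeight}.
topic_weights = dict(ast.literal_eval(raw))

# Topic with the largest weight, and the weight mass covered by the listed topics.
dominant_topic = max(topic_weights, key=topic_weights.get)
listed_mass = sum(topic_weights.values())

print(dominant_topic, round(listed_mass, 3))
```

Here topic 33 carries the largest weight (0.315). The listed weights sum to less than 1, so the page presumably omits low-weight topics, though it does not say so.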
simIndex simValue paperId paperTitle
1 0.97490311 11 cvpr-2013-A Genetic Algorithm-Based Solver for Very Large Jigsaw Puzzles
Author: Dror Sholomon, Omid David, Nathan S. Netanyahu
Abstract: In this paper we propose the first effective automated, genetic algorithm (GA)-based jigsaw puzzle solver. We introduce a novel procedure of merging two "parent" solutions to an improved "child" solution by detecting, extracting, and combining correctly assembled puzzle segments. The solver proposed exhibits state-of-the-art performance solving previously attempted puzzles faster and far more accurately, and also puzzles of size never before attempted. Other contributions include the creation of a benchmark of large images, previously unavailable. We share the data sets and all of our results for future testing and comparative evaluation of jigsaw puzzle solvers.
2 0.9477374 276 cvpr-2013-MKPLS: Manifold Kernel Partial Least Squares for Lipreading and Speaker Identification
Author: Amr Bakry, Ahmed Elgammal
Abstract: Visual speech recognition is a challenging problem, due to confusion between visual speech features. The speaker identification problem is usually coupled with speech recognition. Moreover, speaker identification is important to several applications, such as automatic access control, biometrics, authentication, and personal privacy issues. In this paper, we propose a novel approach for lipreading and speaker identification. We propose a new approach for manifold parameterization in a low-dimensional latent space, where each manifold is represented as a point in that space. We initially parameterize each instance manifold using a nonlinear mapping from a unified manifold representation. We then factorize the parameter space using Kernel Partial Least Squares (KPLS) to achieve a low-dimensional manifold latent space. We use two-way projections to achieve two manifold latent spaces, one for the speech content and one for the speaker. We apply our approach on two public databases: AVLetters and OuluVS. We show the results for three different settings of lipreading: speaker independent, speaker dependent, and speaker semi-dependent. Our approach outperforms the baseline by at least 15% in the speaker semi-dependent setting, and is competitive in the other two settings.
3 0.93499154 316 cvpr-2013-Optical Flow Estimation Using Laplacian Mesh Energy
Author: Wenbin Li, Darren Cosker, Matthew Brown, Rui Tang
Abstract: In this paper we present a novel non-rigid optical flow algorithm for dense image correspondence and non-rigid registration. The algorithm uses a unique Laplacian Mesh Energy term to encourage local smoothness whilst simultaneously preserving non-rigid deformation. Laplacian deformation approaches have become popular in graphics research as they enable mesh deformations to preserve local surface shape. In this work we propose a novel Laplacian Mesh Energy formula to ensure such sensible local deformations between image pairs. We express this wholly within the optical flow optimization, and show its application in a novel coarse-to-fine pyramidal approach. Our algorithm achieves state-of-the-art performance in all trials on the Garg et al. dataset, and top-tier performance on the Middlebury evaluation.
4 0.93466467 112 cvpr-2013-Dense Segmentation-Aware Descriptors
Author: Eduard Trulls, Iasonas Kokkinos, Alberto Sanfeliu, Francesc Moreno-Noguer
Abstract: In this work we exploit segmentation to construct appearance descriptors that can robustly deal with occlusion and background changes. For this, we downplay measurements coming from areas that are unlikely to belong to the same region as the descriptor's center, as suggested by soft segmentation masks. Our treatment is applicable to any image point, i.e. dense, and its computational overhead is on the order of a few seconds. We integrate this idea with Dense SIFT, and also with Dense Scale and Rotation Invariant Descriptors (SID), delivering descriptors that are densely computable, invariant to scaling and rotation, and robust to background changes. We apply our approach to standard benchmarks on large displacement motion estimation using SIFT-flow and wide-baseline stereo, systematically demonstrating that the introduction of segmentation yields clear improvements.
same-paper 5 0.91980553 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos
Author: Baoyuan Wu, Yifan Zhang, Bao-Gang Hu, Qiang Ji
Abstract: In this paper, we focus on face clustering in videos. Given the detected faces from real-world videos, we partition all faces into K disjoint clusters. Different from clustering on a collection of facial images, the faces from videos are organized as face tracks and the frame index of each face is also provided. As a result, many pairwise constraints between faces can be easily obtained from the temporal and spatial knowledge of the face tracks. These constraints can be effectively incorporated into a generative clustering model based on the Hidden Markov Random Fields (HMRFs). Within the HMRF model, the pairwise constraints are augmented by label-level and constraint-level local smoothness to guide the clustering process. The parameters for both the unary and the pairwise potential functions are learned by the simulated field algorithm, and the weights of constraints can be easily adjusted. We further introduce an efficient clustering framework specifically for face clustering in videos, considering that faces in adjacent frames of the same face track are very similar. The framework is applicable to other clustering algorithms to significantly reduce the computational cost. Experiments on two face data sets from real-world videos demonstrate the significantly improved performance of our algorithm over state-of-the-art algorithms.
6 0.91685522 298 cvpr-2013-Multi-scale Curve Detection on Surfaces
7 0.91032833 456 cvpr-2013-Visual Place Recognition with Repetitive Structures
8 0.90118271 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
9 0.89847797 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
10 0.89805818 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
11 0.89797455 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
12 0.89751399 325 cvpr-2013-Part Discovery from Partial Correspondence
13 0.89695597 414 cvpr-2013-Structure Preserving Object Tracking
14 0.89611691 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection
15 0.89497626 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
16 0.89496982 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
17 0.89483851 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video
18 0.89444512 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
19 0.89443249 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
20 0.89425343 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
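The same-paper entry above (cvpr-2013-92) notes that pairwise constraints can be obtained from the temporal and spatial knowledge of face tracks. The abstract does not spell out the rule, so the sketch below illustrates one common reading and is not the authors' implementation: faces in the same track are assumed to be must-link, and faces from two tracks that share at least one frame index are assumed to be cannot-link. The function name, the input layout, and the toy data are all hypothetical.

```python
from itertools import combinations

def pairwise_constraints(tracks):
    """Derive constraints from face tracks (assumed rule, see lead-in).

    tracks: dict mapping track_id -> list of (face_id, frame_index).
    Returns (must_link, cannot_link) as lists of face_id pairs.
    """
    must_link, cannot_link = [], []

    # Assumption: all faces within one track show the same person.
    for faces in tracks.values():
        ids = [face_id for face_id, _ in faces]
        must_link.extend(combinations(ids, 2))

    # Assumption: two tracks that overlap in time show different people.
    for (_, faces_a), (_, faces_b) in combinations(tracks.items(), 2):
        frames_a = {frame for _, frame in faces_a}
        frames_b = {frame for _, frame in faces_b}
        if frames_a & frames_b:
            cannot_link.extend((a, b) for a, _ in faces_a for b, _ in faces_b)

    return must_link, cannot_link

# Toy example: t1 and t2 share frame 1; t3 overlaps with neither.
tracks = {
    "t1": [("a1", 0), ("a2", 1)],
    "t2": [("b1", 1), ("b2", 2)],
    "t3": [("c1", 5)],
}
must_link, cannot_link = pairwise_constraints(tracks)
```

In this toy input only tracks t1 and t2 co-occur, so cannot-link pairs are generated between their faces, while t3 contributes no constraints.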