cvpr cvpr2013 cvpr2013-385 knowledge-graph by maker-knowledge-mining

385 cvpr-2013-Selective Transfer Machine for Personalized Facial Action Unit Detection


Source: pdf

Author: Wen-Sheng Chu, Fernando De La Torre, Jeffrey F. Cohn

Abstract: Automatic facial action unit (AFA) detection from video is a long-standing problem in facial expression analysis. Most approaches emphasize choices of features and classifiers. They neglect individual differences in target persons. People vary markedly in facial morphology (e.g., heavy versus delicate brows, smooth versus deeply etched wrinkles) and behavior. Individual differences can dramatically influence how well generic classifiers generalize to previously unseen persons. While a possible solution would be to train person-specific classifiers, that often is neither feasible nor theoretically compelling. The alternative that we propose is to personalize a generic classifier in an unsupervised manner (no additional labels for the test subjects are required). We introduce a transductive learning method, which we refer to as Selective Transfer Machine (STM), to personalize a generic classifier by attenuating person-specific biases. STM achieves this effect by simultaneously learning a classifier and re-weighting the training samples that are most relevant to the test subject. To evaluate the effectiveness of STM, we compared STM to generic classifiers and to cross-domain learning methods on three major databases: CK+ [20], GEMEP-FERA [32] and RU-FACS [2]. STM outperformed generic classifiers in all.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Cohn†‡ (†Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213; ‡Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260). Abstract: Automatic facial action unit (AFA) detection from video is a long-standing problem in facial expression analysis. [sent-2, score-0.472]

2 Individual differences can dramatically influence how well generic classifiers generalize to previously unseen persons. [sent-8, score-0.114]

3 The alternative that we propose is to personalize a generic classifier in an unsupervised manner (no additional labels for the test subjects are required). [sent-10, score-0.25]

4 We introduce a transductive learning method, which we refer to as Selective Transfer Machine (STM), to personalize a generic classifier by attenuating person-specific biases. [sent-11, score-0.228]

5 STM achieves this effect by simultaneously learning a classifier and re-weighting the training samples that are most relevant to the test subject. [sent-12, score-0.156]

6 To evaluate the effectiveness of STM, we compared STM to generic classifiers and to cross-domain learning methods on three major databases: CK+ [20], GEMEP-FERA [32] and RU-FACS [2]. [sent-13, score-0.114]

7 FACS segments the visible effects of facial muscle activation into “action units” (AUs). [sent-19, score-0.175]

8 Each AU is related to one or more facial muscles. [sent-20, score-0.175]

9 FACS describes facial activity on the basis of 33 unique action units (AUs), as well as several categories of head and eye positions and other movements. [sent-21, score-0.225]

10 Automatic facial action unit detection (AFA) confronts a number of challenges. [Fig. 1(a): training subjects for AU12 (lip-corner raiser).] [sent-23, score-0.25]

11 Selective transfer machine, which personalizes the generic classifier, reliably separates AU12 for the unseen subject. [sent-25, score-0.128]

12 Then there is the challenge of automatically detecting facial actions that require significant training and expertise even for human coders, as has been recently reported in the first Facial Expression Recognition and Analysis Challenge [32]. [sent-31, score-0.222]

13 While improvements have been achieved, generalizability of classifiers to previously unseen persons remains a continuing challenge. [sent-34, score-0.08]

14 Fig. 1(a) illustrates an example of how a simple linear classifier can separate the positive samples of AU12 (obliquely raised lip corners, seen in smiling) from negative samples (i.e., frames without AU12). [sent-36, score-0.127]

15 However, when a classifier is learned using training data from all subjects (Fig. [sent-40, score-0.148]

16 1(b)) and tested on a subject excluded from the training set, it fails to generalize well. [sent-41, score-0.068]

17 When a classifier is trained on all available subjects, it is referred to as generic. [sent-42, score-0.067]

18 Our guiding hypothesis is that these factors lead generic classifiers to perform better or worse on some subjects than others. [sent-45, score-0.17]

19 To mitigate the person-specific biases, this paper explores the idea of personalizing a generic classifier. [sent-46, score-0.103]

20 Generic classifiers are personalized using no AU labels from test subjects. [sent-47, score-0.135]

21 STM personalizes the generic classifier in an unsupervised manner to compensate for person-specific biases, and greatly improves generalizability, see Fig. [sent-49, score-0.134]

22 We illustrate the benefits of our approach in the task of facial AU detection in three major datasets of posed and spontaneous facial expressions. [sent-51, score-0.416]

23 To the best of our knowledge, this is the first work to investigate personalizing a classifier for facial expression analysis. [sent-52, score-0.307]

24 Tracking non-rigid facial features has been a long-standing problem in computer vision. [sent-60, score-0.175]

25 Common to all of these approaches is the assumption that training and test data come from the same distribution. [sent-76, score-0.08]

26 It therefore seeks to personalize the classifier by automatically re-weighting training samples that are most relevant to each test subject. [sent-78, score-0.209]

27 Torralba and Efros [31] discovered significant biases in object categorization; as a remedy, they encouraged advances in domain adaptation to cope with dataset biases. [sent-83, score-0.111]

28 Aytar and Zisserman [1] proposed to transfer pre-learned models to regularize the training of a new object class. [sent-84, score-0.086]

29 They cannot be applied to new domains or subjects when one has no prior knowledge of them. [sent-88, score-0.077]

30 In contrast, our approach is fully unsupervised, uses no labeled instances, and hence is well suited to the problem of generalizing learning to new domains or, in our case, new subjects. [sent-89, score-0.077]

31 Close to our approach is a special case of unsupervised domain adaptation known as covariate shift [28], where training and test domains follow different input distributions but the conditional label distributions remain the same. [sent-90, score-0.265]
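
Stated compactly, the covariate-shift assumption and its classical density-ratio remedy are as follows (textbook formulation in standard notation, not copied from the paper):

```latex
% Covariate shift: marginal input distributions differ,
% conditional label distributions agree.
p_{tr}(x) \neq p_{te}(x), \qquad p_{tr}(y \mid x) = p_{te}(y \mid x)
% Classical remedy: weight each training sample by the density ratio.
s(x) = \frac{p_{te}(x)}{p_{tr}(x)}
```

KMM, discussed below, estimates these weights directly, without ever estimating the two densities explicitly.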

32 SVM-KNN [38] labels a single query using an SVM trained on its k nearest neighbors in the training data. [sent-96, score-0.069]

33 Unlike previous approaches, STM learns weights on individual training instances and hence makes better use of the data. [sent-98, score-0.094]

34 Considering distribution mismatch, Kernel Mean Matching (KMM) [16] directly infers the re-sampling weights by matching training and test distributions. [sent-99, score-0.109]

35 [36] estimated the relative importance weights and learned from weighted training samples for 3D human pose estimation. [sent-101, score-0.078]

36 By contrast, STM jointly optimizes the weights as well as the classifier parameters, and hence preserves the discriminant property of the new decision boundary. [sent-104, score-0.096]

37 Selective Transfer Machine (STM) This section describes the proposed STM approach for personalizing a generic classifier. [sent-108, score-0.103]

38 Problem formulation: The main idea behind STM is to give higher weights to the training samples that are closer to the test samples. [sent-112, score-0.111]

39 The classifiers trained on the re-weighted training samples will be more likely to fit the test subject. [sent-113, score-0.184]

40 Ωs (Xtr, Xte) measures the mismatch between the training and test distributions as a function of s. [sent-121, score-0.125]

41 The goal of STM is to jointly optimize the penalized SVM w as well as the selective coefficient s, such that the resulting personalized classifier can better remove person-specific biases. [sent-129, score-0.223]

42 Penalized SVM: The first term in STM, Rw(Dtr, s), is the empirical risk of a penalized SVM, where each training instance is weighted by its relevance to the test data. [sent-130, score-0.147]
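
Putting the two terms together, a plausible reconstruction of the STM objective and of this weighted risk is given below (notation inferred from the surrounding sentences; the exact loss ℓ and constants may differ from the paper's):

```latex
% Joint STM objective: weighted empirical risk + distribution mismatch.
\min_{\mathbf{w},\,\mathbf{s}} \; R_{\mathbf{w}}(\mathcal{D}^{tr}, \mathbf{s})
    \;+\; \lambda\, \Omega_{\mathbf{s}}(X^{tr}, X^{te})
% Penalized SVM risk, with each training instance weighted by s_i:
R_{\mathbf{w}}(\mathcal{D}^{tr}, \mathbf{s}) \;=\;
    \tfrac{1}{2}\|\mathbf{w}\|^{2}
    + C \sum_{i=1}^{n_{tr}} s_i\, \ell\!\big(y_i,\ \mathbf{w}^{\top}\phi(x_i^{tr})\big)
```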

43 Using the representer theorem [8], the penalized SVM can be kernelized via the feature map ϕ(·). [sent-138, score-0.067]

44 Domain mismatch: The second term in STM, Ωs (Xtr, Xte), is the domain mismatch, and its objective is to find a re-weighting function that minimizes the mismatch between training and test domains. [sent-147, score-0.151]

45 An intuitive way to reassign weights is to compute the ratio between training and test densities. [sent-152, score-0.109]

46 Here we adopt the Kernel Mean Matching (KMM) [16] method to reduce the difference between the means of the training and test distributions in the Reproducing Kernel Hilbert Space H. [sent-154, score-0.08]
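
The KMM step admits a compact quadratic program. Below is a minimal sketch assuming an RBF kernel and SciPy's SLSQP solver; the function names, bandwidth gamma, bound B, and slack eps are illustrative choices (standard KMM heuristics), not the authors' code:

```python
# Kernel Mean Matching (KMM) sketch: infer re-sampling weights s that match
# the training and test means in an RKHS (cf. [16]). Hyperparameters below
# are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def kmm_weights(X_tr, X_te, B=10.0, eps=None, gamma=1.0, extra_linear=0.0):
    """Solve  min_s 0.5 s'Ks - kappa's  s.t.  0 <= s_i <= B  and
    |sum(s) - n_tr| <= n_tr * eps, which matches the weighted training mean
    (1/n_tr) sum_i s_i phi(x_i^tr) to the test mean (1/n_te) sum_j phi(x_j^te).
    `extra_linear` lets a caller fold an extra linear term into the QP."""
    n_tr, n_te = len(X_tr), len(X_te)
    eps = (B / np.sqrt(n_tr)) if eps is None else eps   # common heuristic
    K = rbf_kernel(X_tr, X_tr, gamma)
    kappa = (n_tr / n_te) * rbf_kernel(X_tr, X_te, gamma).sum(axis=1) - extra_linear

    obj = lambda s: 0.5 * s @ K @ s - kappa @ s
    grad = lambda s: K @ s - kappa
    cons = (
        {'type': 'ineq', 'fun': lambda s: n_tr * eps - (s.sum() - n_tr)},
        {'type': 'ineq', 'fun': lambda s: n_tr * eps + (s.sum() - n_tr)},
    )
    res = minimize(obj, np.ones(n_tr), jac=grad, bounds=[(0.0, B)] * n_tr,
                   constraints=cons, method='SLSQP')
    return res.x
```

The box bound B keeps any single training sample from dominating, and eps keeps the weights close to a proper re-sampling of the training set.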

47 between training and each test sample, finding a suitable s? [sent-179, score-0.08]

48 Circles represent the training data and squares the test data. [sent-183, score-0.08]

49 As can be observed, KMM puts higher weights on the training samples closer to the test samples. [sent-185, score-0.14]

50 T-SVM [17] equally weights all the training data; by contrast, STM gives greater weight to training data that are more relevant to a given test subject. [sent-218, score-0.156]

51 On the other hand, STM is formulated as a biconvex problem and therefore ensures convergence. [sent-220, score-0.07]

52 KMM does re-weighting only once, while STM does so iteratively (Algorithm 1, Selective Transfer Machine. Input: Xtr, Xte, parameters C, λ. Output: classifier w and instance-wise weights s. Step 1: initialize the training loss ...)

53 DA-SVM [5], similar to T-SVM, learns a classifier without re-weighting the training data. [sent-228, score-0.092]

54 To solve (1), we adopt the Alternate Convex Search method [15], which alternates between solving two convex subproblems over the hyperplane w and the selective coefficient s. [sent-233, score-0.087]
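
A sketch of this alternation, assuming the hypothetical kmm_weights helper above and scikit-learn's weighted LinearSVC; the way the training loss is folded into the s-step is our reading of the coupling described here, not the paper's exact update:

```python
# Alternate Convex Search for STM (sketch): alternate a weighted-SVM step
# over w with a loss-aware KMM step over s. Coupling constants are assumptions.
import numpy as np
from sklearn.svm import LinearSVC

def stm_fit(X_tr, y_tr, X_te, C=1.0, lam=1.0, n_iters=12):
    # y_tr is assumed to be in {-1, +1}.
    s = np.ones(len(X_tr))                          # start from uniform weights
    for _ in range(n_iters):
        # Step 1 (convex in w): penalized SVM on re-weighted training data.
        clf = LinearSVC(C=C).fit(X_tr, y_tr, sample_weight=s)
        # Step 2 (convex in s): KMM with the current squared-hinge losses
        # folded into the linear term, so relevant samples the classifier
        # already handles well keep high weight.
        margins = y_tr * clf.decision_function(X_tr)
        loss = np.maximum(0.0, 1.0 - margins) ** 2
        s = kmm_weights(X_tr, X_te, extra_linear=(C / lam) * loss)
    return clf, s
```

With lam large the s-step reduces to plain KMM; making lam smaller down-weights training samples the current classifier gets wrong, matching the discriminant-preserving behavior described around sentences 57 and 58 below.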

55 Panels at iterations #1, 4, 8, 12, annotated with training/test accuracy (Tr% and Te%), show the hyperplanes at the corresponding iterations, where grey (shaded) dots denote training data and white (unshaded) dots denote test data; circle/square markers denote positive/negative classes respectively. [sent-267, score-0.08]

56 STM improves separation relative to generic SVM as early as the first iteration and converges close to the ideal hyperplane by the 12-th iteration. [sent-269, score-0.123]

57 Introducing the training loss helps preserve the discriminant property of the new decision boundary, and hence leads to a personalized classifier that is close to the ideal one. [sent-273, score-0.224]

58 On the other hand, STM simultaneously considers the training loss and the weightings, and thus encourages the training samples close to the test samples to be well classified. [sent-279, score-0.215]

59 Minimizing over w: In the case of the training loss ℓ² being quadratic, the gradient and Hessian of the penalized linear SVM in (2) can be written in closed form;

61 likewise for the expansion coefficients β of the penalized nonlinear SVM in (3).
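
For reference, a plausible reconstruction of these closed forms, following standard weighted squared-hinge primal SVM training (the paper's exact expressions in (2) and (3) may differ in constants and notation):

```latex
% Weighted squared-hinge primal objective; SV = { i : y_i w^T x_i < 1 }.
f(\mathbf{w}) = \tfrac{1}{2}\|\mathbf{w}\|^{2}
    + C \sum_i s_i \max\!\big(0,\ 1 - y_i\,\mathbf{w}^{\top}\mathbf{x}_i\big)^{2}
% Gradient and Hessian, both available in closed form:
\nabla f(\mathbf{w}) = \mathbf{w}
    - 2C \sum_{i \in SV} s_i\, y_i \big(1 - y_i\,\mathbf{w}^{\top}\mathbf{x}_i\big)\,\mathbf{x}_i
\mathbf{H} = \mathbf{I} + 2C \sum_{i \in SV} s_i\, \mathbf{x}_i \mathbf{x}_i^{\top}
% Newton step: w <- w - H^{-1} \nabla f(w); the kernelized case takes the
% analogous step on the expansion coefficients \beta.
```

Since H is positive definite, each w-step is an unconstrained Newton solve.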

62 Experiments: STM was compared for AU detection with a generic SVM and cross-domain learning approaches on three widely used databases that vary in duration, extent of out-of-plane head motion, and spontaneity of facial expression. [sent-303, score-0.257]

63 Image sequences average about 20 frames in length; they begin with neutral expression and proceed to a peak, which is AU-labelled. [sent-308, score-0.072]

64 To localize descriptors to specific face regions, descriptors were computed within 36×36-pixel regions at predetermined facial landmarks (9 for the upper face and 7 for the lower face). [sent-360, score-0.198]

65 Positive samples were frames in which a given AU was present, and negative samples in which it was not. [sent-363, score-0.087]

66 The other meaning, which we refer to as PS2 or quasiPS, is a classifier that has been tested on a subject that was included among others in a training set. [sent-374, score-0.113]

67 For instance, consider the case in which data from five subjects are randomly assigned to training and testing sets. [sent-375, score-0.103]

68 A PS2 classifier is trained and then tested on the test set. [sent-376, score-0.1]

69 It is not surprising that PS2-SVM performs better than PS1-SVM, since PS1-SVM was trained only on limited training data and thus suffers from overfitting. [sent-382, score-0.069]

70 As PS2-SVM was trained on all available subjects, it can be viewed as a generic classifier, as in most of the literature on AU detection. [sent-383, score-0.085]

71 As discussed in Sec. 1, generic classifiers could suffer from such biases and lead to suboptimal performance. [sent-385, score-0.155]

72 On the other hand, STM consistently outperforms both person-specific classifiers, since STM selects only the relevant training data and better fits the test distribution. [sent-386, score-0.131]

73 Each entry shows the portion of selected training samples w.r.t. each test subject. [sent-405, score-0.078]

74 Each row sums to 1 and each entry shows the portion of selected samples of training subjects with respect to each test subject. [sent-410, score-0.167]

75 As shown in Fig. 4(b), when STM converges, it selects mostly training data that belongs to the target subject (higher diagonal values). [sent-412, score-0.068]

76 Comparison with generic classifiers and domain adaptation approaches: This experiment compares the performance of STM against generic classifiers learned on the entire dataset, the covariate shift method KMM [16], a semi-supervised T-SVM [10], and the domain adaptation method DA-SVM [5]. [sent-415, score-0.462]

77 In this experiment, any sample of the test subjects is excluded from training. [sent-417, score-0.089]

78 Unlike STM, which uses a penalized SVM, T-SVM does not re-weight training instances and uses the losses of all training data equally. [sent-458, score-0.179]

79 DA-SVM extends T-SVM by progressively labelling test patterns and removing labelled training patterns. [sent-460, score-0.08]

80 Not surprisingly, DA-SVM shows better performance than KMM and T-SVM, because it used more relevant training samples and resulted in a better personalized classifier. [sent-461, score-0.129]

81 By contrast, STM is a biconvex formulation and is therefore guaranteed to converge to a critical point; it outperforms existing approaches. [sent-465, score-0.092]

82 Conclusions: This paper proposed a transductive method to personalize a generic classifier for facial Action Unit (AU) detection. [sent-491, score-0.381]

83 Our STM framework simultaneously learns the parameters of a classifier and the selective weights that minimize the mismatch between the training and the test distributions. [sent-492, score-0.259]

84 We show that STM reduces to a biconvex problem, and propose a simple alternating minimization approach to optimize it in the primal. [sent-493, score-0.093]

85 By attenuating the influence of inherent biases in morphology and behavior, we have shown that STM can achieve results that surpass non-personalized generic classifiers and approach the performance of classifiers that have been trained for individual persons (i.e., person-specific classifiers). [sent-494, score-0.25]

86 The results have clearly demonstrated that STM outperforms existing classifiers when using the same protocol for training and testing. [sent-497, score-0.098]

87 This leads to high values in the estimated weights for training instances that were not reliable. [sent-501, score-0.094]

88 Automatic recognition of facial actions in spontaneous expressions. [sent-519, score-0.218]

89 Learning partiallyobserved hidden conditional random fields for facial expression recognition. [sent-562, score-0.222]

90 Facial action coding system: A technique for the measurement of facial movement. [sent-608, score-0.225]

91 Biconvex sets and optimization with biconvex functions: a survey and extensions. [sent-614, score-0.07]

92 The extended cohn-kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. [sent-655, score-0.075]

93 A model of the perception of facial expressions of emotion by humans: Research overview and perspectives. [sent-660, score-0.195]

94 Kernel conditional ordinal random fields for temporal segmentation of facial action units. [sent-678, score-0.247]

95 Nonparametric discriminant HMM and application to facial expression recognition. [sent-689, score-0.244]

96 Direct importance estimation with model selection and its application to covariate shift adaptation. [sent-712, score-0.094]

97 Facial action unit recognition by exploiting their dynamic and semantic relationships. [sent-718, score-0.075]

98 Fully automatic recognition of the temporal phases of facial actions. [sent-739, score-0.175]

99 No bias left behind: Covariate shift adaptation for discriminative 3d pose estimation. [sent-763, score-0.079]

100 Dynamic cascades with bidirectional bootstrapping for action unit detection in spontaneous facial behavior. [sent-787, score-0.293]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('stm', 0.809), ('kmm', 0.357), ('facial', 0.175), ('au', 0.133), ('biconvex', 0.07), ('aus', 0.068), ('penalized', 0.067), ('generic', 0.063), ('auc', 0.062), ('svm', 0.061), ('selective', 0.06), ('dasvm', 0.059), ('covariate', 0.059), ('ck', 0.056), ('subjects', 0.056), ('personalize', 0.053), ('personalized', 0.051), ('classifiers', 0.051), ('action', 0.05), ('training', 0.047), ('expression', 0.047), ('classifier', 0.045), ('mismatch', 0.045), ('transductive', 0.045), ('ntr', 0.045), ('adaptation', 0.044), ('spontaneous', 0.043), ('biases', 0.041), ('interviews', 0.04), ('personalizing', 0.04), ('sugiyama', 0.04), ('xitr', 0.04), ('transfer', 0.039), ('avg', 0.037), ('shift', 0.035), ('xtr', 0.035), ('ols', 0.035), ('facs', 0.035), ('tsvm', 0.035), ('cohn', 0.035), ('lucey', 0.035), ('torre', 0.034), ('test', 0.033), ('ideal', 0.033), ('dtr', 0.033), ('samples', 0.031), ('xte', 0.031), ('densities', 0.03), ('generalizability', 0.029), ('weights', 0.029), ('hyperplane', 0.027), ('saragih', 0.027), ('ps', 0.027), ('afa', 0.026), ('personalizes', 0.026), ('rudovic', 0.026), ('whitehill', 0.026), ('wols', 0.026), ('domain', 0.026), ('loss', 0.026), ('frames', 0.025), ('unit', 0.025), ('health', 0.024), ('pittsburgh', 0.024), ('alternated', 0.023), ('dud', 0.023), ('chu', 0.023), ('borgwardt', 0.023), ('littlewort', 0.023), ('markedly', 0.023), ('posed', 0.023), ('face', 0.023), ('converge', 0.022), ('trained', 0.022), ('fasel', 0.022), ('yamada', 0.022), ('ordinal', 0.022), ('attenuating', 0.022), ('discriminant', 0.022), ('kernel', 0.021), ('domains', 0.021), ('subject', 0.021), ('la', 0.021), ('emotion', 0.02), ('gretton', 0.02), ('aytar', 0.02), ('aams', 0.02), ('qp', 0.02), ('unweighted', 0.02), ('imbalanced', 0.02), ('valstar', 0.02), ('lip', 0.02), ('afgr', 0.02), ('successive', 0.019), ('databases', 0.019), ('si', 0.019), ('eisr', 0.019), ('smile', 0.019), ('lp', 0.018), ('instances', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 385 cvpr-2013-Selective Transfer Machine for Personalized Facial Action Unit Detection

Author: Wen-Sheng Chu, Fernando De La Torre, Jeffrey F. Cohn

Abstract: Automatic facial action unit (AFA) detection from video is a long-standing problem in facial expression analysis. Most approaches emphasize choices of features and classifiers. They neglect individual differences in target persons. People vary markedly in facial morphology (e.g., heavy versus delicate brows, smooth versus deeply etched wrinkles) and behavior. Individual differences can dramatically influence how well generic classifiers generalize to previously unseen persons. While a possible solution would be to train person-specific classifiers, that often is neither feasible nor theoretically compelling. The alternative that we propose is to personalize a generic classifier in an unsupervised manner (no additional labels for the test subjects are required). We introduce a transductive learning method, which we refer to as Selective Transfer Machine (STM), to personalize a generic classifier by attenuating person-specific biases. STM achieves this effect by simultaneously learning a classifier and re-weighting the training samples that are most relevant to the test subject. To evaluate the effectiveness of STM, we compared STM to generic classifiers and to cross-domain learning methods on three major databases: CK+ [20], GEMEP-FERA [32] and RU-FACS [2]. STM outperformed generic classifiers in all.

2 0.16444393 161 cvpr-2013-Facial Feature Tracking Under Varying Facial Expressions and Face Poses Based on Restricted Boltzmann Machines

Author: Yue Wu, Zuoguan Wang, Qiang Ji

Abstract: Facial feature tracking is an active area in computer vision due to its relevance to many applications. It is a nontrivial task, since faces may have varying facial expressions, poses or occlusions. In this paper, we address this problem by proposing a face shape prior model that is constructed based on the Restricted Boltzmann Machines (RBM) and their variants. Specifically, we first construct a model based on Deep Belief Networks to capture the face shape variations due to varying facial expressions for near-frontal view. To handle pose variations, the frontal face shape prior model is incorporated into a 3-way RBM model that could capture the relationship between frontal face shapes and non-frontal face shapes. Finally, we introduce methods to systematically combine the face shape prior models with image measurements of facial feature points. Experiments on benchmark databases show that with the proposed method, facial feature points can be tracked robustly and accurately even if faces have significant facial expressions and poses.

3 0.14514218 77 cvpr-2013-Capturing Complex Spatio-temporal Relations among Facial Muscles for Facial Expression Recognition

Author: Ziheng Wang, Shangfei Wang, Qiang Ji

Abstract: Spatial-temporal relations among facial muscles carry crucial information about facial expressions yet have not been thoroughly exploited. One contributing factor for this is the limited ability of the current dynamic models in capturing complex spatial and temporal relations. Existing dynamic models can only capture simple local temporal relations among sequential events, or lack the ability for incorporating uncertainties. To overcome these limitations and take full advantage of the spatio-temporal information, we propose to model the facial expression as a complex activity that consists of temporally overlapping or sequential primitive facial events. We further propose the Interval Temporal Bayesian Network to capture these complex temporal relations among primitive facial events for facial expression modeling and recognition. Experimental results on benchmark databases demonstrate the feasibility of the proposed approach in recognizing facial expressions based purely on spatio-temporal relations among facial muscles, as well as its advantage over the existing methods.

4 0.074868508 387 cvpr-2013-Semi-supervised Domain Adaptation with Instance Constraints

Author: Jeff Donahue, Judy Hoffman, Erik Rodner, Kate Saenko, Trevor Darrell

Abstract: Most successful object classification and detection methods rely on classifiers trained on large labeled datasets. However, for domains where labels are limited, simply borrowing labeled data from existing datasets can hurt performance, a phenomenon known as “dataset bias.” We propose a general framework for adapting classifiers from “borrowed” data to the target domain using a combination of available labeled and unlabeled examples. Specifically, we show that imposing smoothness constraints on the classifier scores over the unlabeled data can lead to improved adaptation results. Such constraints are often available in the form of instance correspondences, e.g. when the same object or individual is observed simultaneously from multiple views, or tracked between video frames. In these cases, the object labels are unknown but can be constrained to be the same or similar. We propose techniques that build on existing domain adaptation methods by explicitly modeling these relationships, and demonstrate empirically that they improve recognition accuracy in two scenarios, multicategory image classification and object detection in video.

5 0.073883936 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection

Author: Yi Sun, Xiaogang Wang, Xiaoou Tang

Abstract: We propose a new approach for estimation of the positions of facial keypoints with three-level carefully designed convolutional networks. At each level, the outputs of multiple networks are fused for robust and accurate estimation. Thanks to the deep structures of convolutional networks, global high-level features are extracted over the whole face region at the initialization stage, which help to locate high accuracy keypoints. There are two folds of advantage for this. First, the texture context information over the entire face is utilized to locate each keypoint. Second, since the networks are trained to predict all the keypoints simultaneously, the geometric constraints among keypoints are implicitly encoded. The method therefore can avoid local minimum caused by ambiguity and data corruption in difficult image samples due to occlusions, large pose variations, and extreme lightings. The networks at the following two levels are trained to locally refine initial predictions and their inputs are limited to small regions around the initial predictions. Several network structures critical for accurate and robust facial point detection are investigated. Extensive experiments show that our approach outperforms state-of-the-art methods in both detection accuracy and reliability.

6 0.063829675 420 cvpr-2013-Supervised Descent Method and Its Applications to Face Alignment

7 0.06123713 150 cvpr-2013-Event Recognition in Videos by Learning from Heterogeneous Web Sources

8 0.059728771 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video

9 0.058929138 459 cvpr-2013-Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots

10 0.05637601 399 cvpr-2013-Single-Sample Face Recognition with Image Corruption and Misalignment via Sparse Illumination Transfer

11 0.056263693 152 cvpr-2013-Exemplar-Based Face Parsing

12 0.055539299 164 cvpr-2013-Fast Convolutional Sparse Coding

13 0.05498939 355 cvpr-2013-Representing Videos Using Mid-level Discriminative Patches

14 0.054525997 430 cvpr-2013-The SVM-Minus Similarity Score for Video Face Recognition

15 0.053130612 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval

16 0.052894834 179 cvpr-2013-From N to N+1: Multiclass Transfer Incremental Learning

17 0.051631756 386 cvpr-2013-Self-Paced Learning for Long-Term Tracking

18 0.050343972 287 cvpr-2013-Modeling Actions through State Changes

19 0.04960065 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos

20 0.04925175 438 cvpr-2013-Towards Pose Robust Face Recognition


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.12), (1, -0.047), (2, -0.041), (3, -0.019), (4, -0.009), (5, 0.009), (6, -0.021), (7, -0.04), (8, 0.081), (9, -0.058), (10, 0.037), (11, -0.03), (12, 0.007), (13, 0.011), (14, -0.034), (15, 0.021), (16, -0.004), (17, 0.002), (18, 0.038), (19, 0.025), (20, -0.032), (21, -0.057), (22, -0.034), (23, -0.016), (24, -0.011), (25, 0.064), (26, 0.026), (27, -0.048), (28, 0.033), (29, 0.03), (30, -0.049), (31, -0.078), (32, -0.097), (33, -0.036), (34, -0.088), (35, -0.024), (36, -0.021), (37, 0.008), (38, -0.026), (39, 0.004), (40, 0.005), (41, -0.034), (42, -0.013), (43, 0.122), (44, 0.006), (45, 0.024), (46, 0.093), (47, -0.02), (48, 0.114), (49, -0.081)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.86427778 385 cvpr-2013-Selective Transfer Machine for Personalized Facial Action Unit Detection


2 0.79042178 77 cvpr-2013-Capturing Complex Spatio-temporal Relations among Facial Muscles for Facial Expression Recognition

Author: Ziheng Wang, Shangfei Wang, Qiang Ji

Abstract: Spatial-temporal relations among facial muscles carry crucial information about facial expressions yet have not been thoroughly exploited. One contributing factor for this is the limited ability of the current dynamic models in capturing complex spatial and temporal relations. Existing dynamic models can only capture simple local temporal relations among sequential events, or lack the ability for incorporating uncertainties. To overcome these limitations and take full advantage of the spatio-temporal information, we propose to model the facial expression as a complex activity that consists of temporally overlapping or sequential primitive facial events. We further propose the Interval Temporal Bayesian Network to capture these complex temporal relations among primitive facial events for facial expression modeling and recognition. Experimental results on benchmark databases demonstrate the feasibility of the proposed approach in recognizing facial expressions based purely on spatio-temporal relations among facial muscles, as well as its advantage over the existing methods.

3 0.71874976 161 cvpr-2013-Facial Feature Tracking Under Varying Facial Expressions and Face Poses Based on Restricted Boltzmann Machines

Author: Yue Wu, Zuoguan Wang, Qiang Ji

Abstract: Facial feature tracking is an active area in computer vision due to its relevance to many applications. It is a nontrivial task, since faces may have varying facial expressions, poses or occlusions. In this paper, we address this problem by proposing a face shape prior model that is constructed based on the Restricted Boltzmann Machines (RBM) and their variants. Specifically, we first construct a model based on Deep Belief Networks to capture the face shape variations due to varying facial expressions for near-frontal view. To handle pose variations, the frontal face shape prior model is incorporated into a 3-way RBM model that could capture the relationship between frontal face shapes and non-frontal face shapes. Finally, we introduce methods to systematically combine the face shape prior models with image measurements of facial feature points. Experiments on benchmark databases show that with the proposed method, facial feature points can be tracked robustly and accurately even if faces have significant facial expressions and poses.

4 0.67172015 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection

Author: Yi Sun, Xiaogang Wang, Xiaoou Tang

Abstract: We propose a new approach for estimation of the positions of facial keypoints with three-level carefully designed convolutional networks. At each level, the outputs of multiple networks are fused for robust and accurate estimation. Thanks to the deep structures of convolutional networks, global high-level features are extracted over the whole face region at the initialization stage, which help to locate high accuracy keypoints. There are two folds of advantage for this. First, the texture context information over the entire face is utilized to locate each keypoint. Second, since the networks are trained to predict all the keypoints simultaneously, the geometric constraints among keypoints are implicitly encoded. The method therefore can avoid local minimum caused by ambiguity and data corruption in difficult image samples due to occlusions, large pose variations, and extreme lightings. The networks at the following two levels are trained to locally refine initial predictions and their inputs are limited to small regions around the initial predictions. Several network structures critical for accurate and robust facial point detection are investigated. Extensive experiments show that our approach outperforms state-of-the-art methods in both detection accuracy and reliability.

5 0.59475374 159 cvpr-2013-Expressive Visual Text-to-Speech Using Active Appearance Models

Author: Robert Anderson, Björn Stenger, Vincent Wan, Roberto Cipolla

Abstract: This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a ‘talking head’, given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies, comparing the output of different systems.

6 0.5608744 359 cvpr-2013-Robust Discriminative Response Map Fitting with Constrained Local Models

7 0.5477286 420 cvpr-2013-Supervised Descent Method and Its Applications to Face Alignment

8 0.49751896 415 cvpr-2013-Structured Face Hallucination

9 0.49305934 463 cvpr-2013-What's in a Name? First Names as Facial Attributes

10 0.42056492 179 cvpr-2013-From N to N+1: Multiclass Transfer Incremental Learning

11 0.4192389 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video

12 0.40703091 15 cvpr-2013-A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration

13 0.40270334 353 cvpr-2013-Relative Hidden Markov Models for Evaluating Motion Skill

14 0.39737281 168 cvpr-2013-Fast Object Detection with Entropy-Driven Evaluation

15 0.39216933 118 cvpr-2013-Detecting Pulse from Head Motions in Video

16 0.38217542 261 cvpr-2013-Learning by Associating Ambiguously Labeled Images

17 0.37943637 73 cvpr-2013-Bringing Semantics into Focus Using Visual Abstraction

18 0.37746137 103 cvpr-2013-Decoding Children's Social Behavior

19 0.3728449 346 cvpr-2013-Real-Time No-Reference Image Quality Assessment Based on Filter Learning

20 0.36178178 459 cvpr-2013-Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.093), (16, 0.036), (26, 0.066), (28, 0.014), (33, 0.216), (36, 0.01), (55, 0.012), (67, 0.068), (69, 0.037), (71, 0.26), (80, 0.011), (87, 0.065)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79278886 385 cvpr-2013-Selective Transfer Machine for Personalized Facial Action Unit Detection


2 0.75686675 448 cvpr-2013-Universality of the Local Marginal Polytope

Author: unknown-author

Abstract: We show that solving the LP relaxation of the MAP inference problem in graphical models (also known as the minsum problem, energy minimization, or weighted constraint satisfaction) is not easier than solving any LP. More precisely, any polytope is linear-time representable by a local marginal polytope and any LP can be reduced in linear time to a linear optimization (allowing infinite weights) over a local marginal polytope.

3 0.73486227 58 cvpr-2013-Beta Process Joint Dictionary Learning for Coupled Feature Spaces with Application to Single Image Super-Resolution

Author: Li He, Hairong Qi, Russell Zaretzki

Abstract: This paper addresses the problem of learning overcomplete dictionaries for the coupled feature spaces, where the learned dictionaries also reflect the relationship between the two spaces. A Bayesian method using a beta process prior is applied to learn the over-complete dictionaries. Compared to previous couple feature spaces dictionary learning algorithms, our algorithm not only provides dictionaries that customized to each feature space, but also adds more consistent and accurate mapping between the two feature spaces. This is due to the unique property of the beta process model that the sparse representation can be decomposed to values and dictionary atom indicators. The proposed algorithm is able to learn sparse representations that correspond to the same dictionary atoms with the same sparsity but different values in coupled feature spaces, thus bringing consistent and accurate mapping between coupled feature spaces. Another advantage of the proposed method is that the number of dictionary atoms and their relative importance may be inferred non-parametrically. We compare the proposed approach to several state-of-the-art dictionary learning methods by applying this method to single image super-resolution. The experimental results show that dictionaries learned by our method produce the best super-resolution results compared to other state-of-the-art methods.

4 0.72124195 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation

Author: Fang Wang, Yi Li

Abstract: Simple tree models for articulated objects prevails in the last decade. However, it is also believed that these simple tree models are not capable of capturing large variations in many scenarios, such as human pose estimation. This paper attempts to address three questions: 1) are simple tree models sufficient? more specifically, 2) how to use tree models effectively in human pose estimation? and 3) how shall we use combined parts together with single parts efficiently? Assuming we have a set of single parts and combined parts, and the goal is to estimate a joint distribution of their locations. We surprisingly find that no latent variables are introduced in the Leeds Sport Dataset (LSP) during learning latent trees for deformable model, which aims at approximating the joint distributions of body part locations using minimal tree structure. This suggests one can straightforwardly use a mixed representation of single and combined parts to approximate their joint distribution in a simple tree model. As such, one only needs to build Visual Categories of the combined parts, and then perform inference on the learned latent tree. Our method outperformed the state of the art on the LSP, both in the scenarios when the training images are from the same dataset and from the PARSE dataset. Experiments on animal images from the VOC challenge further support our findings.

5 0.71127003 20 cvpr-2013-A New Model and Simple Algorithms for Multi-label Mumford-Shah Problems

Author: Byung-Woo Hong, Zhaojin Lu, Ganesh Sundaramoorthi

Abstract: In this work, we address the multi-label Mumford-Shah problem, i.e., the problem of jointly estimating a partitioning of the domain of the image, and functions defined within regions of the partition. We create algorithms that are efficient, robust to undesirable local minima, and are easy-toimplement. Our algorithms are formulated by slightly modifying the underlying statistical model from which the multilabel Mumford-Shah functional is derived. The advantage of this statistical model is that the underlying variables: the labels and thefunctions are less coupled than in the original formulation, and the labels can be computed from the functions with more global updates. The resulting algorithms can be tuned to the desired level of locality of the solution: from fully global updates to more local updates. We demonstrate our algorithm on two applications: joint multi-label segmentation and denoising, and joint multi-label motion segmentation and flow estimation. We compare to the stateof-the-art in multi-label Mumford-Shah problems and show that we achieve more promising results.

6 0.70047927 311 cvpr-2013-Occlusion Patterns for Object Class Detection

7 0.70010841 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection

8 0.69922656 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

9 0.69871587 340 cvpr-2013-Probabilistic Label Trees for Efficient Large Scale Image Classification

10 0.69695866 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

11 0.6960783 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities

12 0.69597828 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image

13 0.69472212 325 cvpr-2013-Part Discovery from Partial Correspondence

14 0.69432724 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval

15 0.69429374 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection

16 0.69424117 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

17 0.69396967 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors

18 0.69387174 414 cvpr-2013-Structure Preserving Object Tracking

19 0.69353247 152 cvpr-2013-Exemplar-Based Face Parsing

20 0.69309872 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models