iccv iccv2013 iccv2013-69 knowledge-graph by maker-knowledge-mining

69 iccv-2013-Capturing Global Semantic Relationships for Facial Action Unit Recognition


Source: pdf

Author: Ziheng Wang, Yongqiang Li, Shangfei Wang, Qiang Ji

Abstract: In this paper we tackle the problem of facial action unit (AU) recognition by exploiting the complex semantic relationships among AUs, which carry crucial top-down information yet have not been thoroughly exploited. Towards this goal, we build a hierarchical model that combines the bottom-level image features and the top-level AU relationships to jointly recognize AUs in a principled manner. The proposed model has two major advantages over existing methods. 1) Unlike methods that can only capture local pair-wise AU dependencies, our model is developed upon the restricted Boltzmann machine and therefore can exploit the global relationships among AUs. 2) Although AU relationships are influenced by many related factors such as facial expressions, these factors are generally ignored by the current methods. Our model, however, can successfully capture them to more accurately characterize the AU relationships. Efficient learning and inference algorithms of the proposed model are also developed. Experimental results on benchmark databases demonstrate the effectiveness of the proposed approach in modelling complex AU relationships as well as its superior AU recognition performance over existing approaches.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: In this paper we tackle the problem of facial action unit (AU) recognition by exploiting the complex semantic relationships among AUs, which carry crucial top-down information yet have not been thoroughly exploited. [sent-2, score-0.865]

2 Towards this goal, we build a hierarchical model that combines the bottom-level image features and the top-level AU relationships to jointly recognize AUs in a principled manner. [sent-3, score-0.184]

3 1) Unlike methods that can only capture local pair-wise AU dependencies, our model is developed upon the restricted Boltzmann machine and therefore can exploit the global relationships among AUs. [sent-5, score-0.288]

4 2) Although AU relationships are influenced by many related factors such as facial expressions, these factors are generally ignored by the current methods. [sent-6, score-0.561]

5 Experimental results on benchmark databases demonstrate the effectiveness of the proposed approach in modelling complex AU relationships as well as its superior AU recognition performance over existing approaches. [sent-9, score-0.176]

6 Introduction: The facial action units (AUs), as defined in the Facial Action Coding System (FACS) [4], refer to the local facial muscle actions. [sent-11, score-1.072]

7 The AUs occur in different combinations and can lead to a huge variety of complex facial behaviors. [sent-12, score-0.305]

8 Automatic analysis of these facial action units is of great importance in a wide range of fields such as behavioral understanding, video conferencing, affective computing, human-machine interfaces and so on. [sent-13, score-0.827]

9 Facial action units have been mainly treated as unrelated entities and recognized individually, based on either shape or appearance features. [sent-14, score-0.543]

10 Each bar shows the intensity of presence for the corresponding action unit. [sent-18, score-0.281]

11 On the one hand, facial action units generally do not occur alone and some combinations of action units are frequently observed. [sent-25, score-1.321]

12 On the other hand, some AUs must or must not be present at the same time due to the limitations of facial anatomy. [sent-29, score-0.282]

13 While the presence or absence of each action unit can be inferred from the low-level facial shape and appearance changes, it is also significantly influenced by the states of the related action units. [sent-32, score-0.935]

14 Therefore, an automatic facial action unit recognition system should not only rely on the low-level features, but also take advantage of the high-level semantic relationships among all the action units. [sent-33, score-1.094]

15 Despite their importance, AU relationships are difficult to capture and have not been fully exploited by existing works. [sent-34, score-0.194]

16 An advanced AU recognition algorithm should efficiently and effectively capture not only local but also global dependencies among AUs. [sent-39, score-0.176]

17 Second, the relationships among action units are influenced by many factors, including the expression, identity, age and gender of the subject. [sent-40, score-0.723]

18 As shown in Figure 1, these two action units are likely to be both absent during happiness, both present during surprise, and mutually exclusive during anger or fear. [sent-42, score-0.55]

19 Unlike a regular BN, the restricted Boltzmann machine (RBM) and its variants can model higher-order dependencies among random variables by introducing a layer of latent units. [sent-44, score-0.33]

20 Unlike [20], we introduce RBM to model the action units, thereby capturing not only local but also global AU dependencies. [sent-47, score-0.253]

21 To the best of our knowledge, this is the first time RBM is used to model the action units. [sent-48, score-0.229]

22 Related Works: A number of approaches have been proposed for facial action unit recognition and they generally involve two steps: bottom-level facial feature extraction and top-level AU classifier design. [sent-57, score-0.931]

23 Facial features: The facial features for AU recognition can be grouped into appearance features and geometric features. [sent-58, score-0.307]

24 The appearance features capture the local or global appearance changes of the facial components. [sent-59, score-0.325]

25 Geometric features represent the changes in the direction or magnitude of the skin surface and salient facial points caused by facial muscular activities. [sent-61, score-0.564]

26 These geometric changes can be measured via dense optical flows [10] or the displacement of facial feature points [23, 11], etc. [sent-62, score-0.282]

27 With the features, action units can be classified using two types of approaches. [sent-64, score-0.508]

28 The first type of approach treats action units as unrelated entities. [sent-65, score-0.508]

29 The action units are recognized independently and statically using different classification models such as Neural Network [10, 19], Support Vector Machine [11, 1], AdaBoost, sparse representation based model [12], rule based model [15], etc. [sent-66, score-0.543]

30 The common weakness of these methods is that they ignore the semantic relationships among the action units and therefore may generate inconsistent AU predictions. [sent-67, score-0.724]

31 [16] propose to capture the AU relationships with a set of explicit rules and the AUs are recognized using a fast direct chaining inference procedure. [sent-70, score-0.259]

32 [20] apply the Bayesian network (BN) to model the semantic relationships among the AUs. [sent-75, score-0.265]

33 Dynamic Bayesian network (DBN) is also proposed to further capture the temporal relationships among AUs in [21]. [sent-77, score-0.279]

34 Nevertheless, due to the Markov assumption, both BN and DBN are limited to capturing the local relationships between pairs of AUs such as co-occurrence, co-absence and mutual exclusion. [sent-78, score-0.194]

35 Another issue of BN-based models is that they fail to capture expression-dependent AU dependencies. [sent-80, score-0.172]

36 Furthermore, we can also incorporate related factors to improve the understanding of the relationships among action units. [sent-82, score-0.466]

37 Another related work worth mentioning is proposed in [18], where a deep belief network is used to model the facial expressions and the latent units are used to capture higher-level image representations. [sent-83, score-0.842]

38 However, our model is essentially different from theirs since we use the latent units to model the dependencies among the action units. [sent-84, score-0.704]

39 RBM is composed of a layer of visible variables v ∈ {0, 1}^n and a layer of hidden variables h ∈ {0, 1}^m. [sent-88, score-0.175]

40 A graphical depiction of RBM is shown in Figure 2, where each latent node is connected to each visible node. [sent-89, score-0.196]

41 Wij measures the compatibility between visible node vi and latent node hj. [sent-92, score-0.169]

42 Different from a Bayesian network, which factorizes the joint distribution into the product of local probabilities, the distribution of the visible units of RBM is calculated by marginalizing over all the hidden units with Equation 2, where Z(θ) is the partition function. [sent-109, score-0.654]

43 This makes it possible to capture global dependencies among the visible variables instead of only local relationships. [sent-110, score-0.222]
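
Equation 2 referenced above is not reproduced in this summary; for orientation, the standard RBM energy and the marginal over the visible layer take the form below, written with the v, h, W notation used above and with b and c as the (assumed) visible and hidden bias vectors, so this is the textbook formulation rather than a verbatim copy of the paper's equations.

```latex
\begin{align*}
E(\mathbf{v},\mathbf{h};\theta) &= -\sum_{i=1}^{n}\sum_{j=1}^{m} W_{ij}\, v_i h_j \;-\; \sum_{i=1}^{n} b_i v_i \;-\; \sum_{j=1}^{m} c_j h_j \\
P(\mathbf{v};\theta) &= \frac{1}{Z(\theta)} \sum_{\mathbf{h}\in\{0,1\}^m} \exp\big(-E(\mathbf{v},\mathbf{h};\theta)\big),
\qquad Z(\theta) = \sum_{\mathbf{v},\mathbf{h}} \exp\big(-E(\mathbf{v},\mathbf{h};\theta)\big)
\end{align*}
```

Summing out h couples all visible units through the latent layer, which is what gives the RBM its global, rather than merely pairwise, dependency structure.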

44 A Hierarchical Model for AU Recognition: RBM provides us with an effective tool to model high-order dependencies among the visible inputs. [sent-124, score-0.155]

45 The middle layer contains the binary visible units {a1, · · · , an}, representing the states of AU1 to AUn. [sent-128, score-0.378]

46 Right: the captured AU combination patterns of two latent units implied by their parameters. [sent-135, score-0.367]

47 Unlike a regular BN, RBM is able to capture higher-order dependencies among the visible variables by connecting all the visible units through the latent units. [sent-153, score-0.636]

48 Instantiating the visible units with the AU labels, we are able to capture the global relationships among the facial action units. [sent-154, score-1.067]

49 Following this path, the top two layers of the proposed model assume an RBM-like structure, with each latent unit connecting to all the action units to model their dependencies. [sent-155, score-0.733]

50 These two layers constitute the top part of the model and encode the high-level semantic relationships among AUs, enabling us to infer the presence or absence of each AU in the top-down direction. [sent-156, score-0.294]

51 To gain some insight as to how they are related, consider the mth latent unit hm. [sent-158, score-0.201]

52 Its compatibility with each action unit is measured by the pairwise energy E(hm, ai) = −Wim ai hm, i = 1, · · · , n. [sent-159, score-0.403]

53 As a whole, the vector [Wim], i = 1, · · · , n, captures a specific presence and absence pattern of all the action units. [sent-164, score-0.283]
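
To make the role of these weight vectors concrete, the sketch below scores whole AU configurations with the RBM free energy, which sums the contribution of every latent unit; the 4-AU/2-latent-unit sizes, the random weights, and the free_energy helper are hypothetical placeholders rather than parameters learned in the paper.

```python
import numpy as np

def free_energy(a, W, b, c):
    """RBM free energy of a binary AU configuration a (lower means more probable).

    F(a) = -b.a - sum_m log(1 + exp(sum_i W[i, m] * a[i] + c[m]))
    """
    hidden_input = a @ W + c                      # total input to each latent unit
    return -(b @ a) - np.sum(np.logaddexp(0.0, hidden_input))

# Hypothetical parameters: 4 AUs, 2 latent units.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(4, 2))            # AU-to-latent weights [Wim]
b = np.zeros(4)                                   # AU biases
c = np.zeros(2)                                   # latent-unit biases

# Score two candidate AU label vectors; the prior prefers the lower-energy pattern.
for labels in ([1, 1, 0, 0], [1, 0, 1, 0]):
    a = np.asarray(labels, dtype=float)
    print(labels, free_energy(a, W, b, c))
```

Because every AU enters the sum over latent units, the score of one AU's label depends on the labels of all the others, which is the global relationship referred to above.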

54 Importantly, the captured relationships among the action units are global. [sent-178, score-0.695]

55 A regular HCRF imposes a layer of latent units between the input and output to better model the intermediate feature structures. [sent-206, score-0.419]

56 In our model, however, the latent units are imposed in the top layer to capture the global relationships among the action units. [sent-207, score-0.878]

57 Inference: Given the query sample x during testing, we classify each action unit ai by maximizing its posterior probability given x with Equation 10. [sent-210, score-0.482]

58 Incorporating Related Factors: The relationships among the action units depend on factors such as the expression, age and gender of the subject. [sent-217, score-0.745]

59 Here we focus on the facial expressions, but the same approach is readily applicable to capture other related factors. [sent-221, score-0.325]

60 Denote the facial expression with a discrete variable l = [l1, · · · , lK], with lk = 1 representing the presence of the kth expression. [sent-222, score-0.479]

61 Inspired by the 3-way restricted Boltzmann machine [14], the basic idea is to modulate the connection between each pair of action unit and latent unit (ai , hj) with the expression variable l, as shown in Figure 4a. [sent-223, score-0.758]

62 Each clique in this case contains three variables (ai, hj, lk), and their energy is defined as E(ai, hj, lk) = −W1ijk ai hj lk. [sent-224, score-0.245]

63 We can see that the expression variable multiplicatively interacts with both the action unit variable and the latent variable to determine the energy of the model, and the parameters {W1ijk} now form a 3D tensor instead of a matrix as in the previous case. [sent-225, score-0.62]

64 The graphical depiction of a more complex example is shown in Figure 4b, which contains two latent units, two action units and two expressions. [sent-226, score-0.657]

65 The shorthand depiction of our proposed model to capture related factors is shown in Figure 4c. [sent-228, score-0.179]

66 ... Σk dk lk modeling the bias of the expression and ...
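
Collecting the terms mentioned in the surrounding sentences (the 3-way clique energy −W1ijk ai hj lk, the AU bias bi ai, the latent bias cj hj, and the expression bias dk lk), a plausible overall energy for the expression-augmented model is sketched below; this is a reconstruction from the summary, and the paper's exact equation may contain additional terms.

```latex
E(\mathbf{a},\mathbf{h},\mathbf{l};\theta)
  = -\sum_{i,j,k} W^{1}_{ijk}\, a_i\, h_j\, l_k
    \;-\; \sum_{i} b_i\, a_i
    \;-\; \sum_{j} c_j\, h_j
    \;-\; \sum_{k} d_k\, l_k
```

Fixing l to a particular expression k selects the slice W^1 of the tensor for that expression, so each expression effectively induces its own AU-latent weight matrix.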

67 Figure 4: (a) Every clique in the proposed model contains a latent unit, an action unit and an expression unit. [sent-268, score-0.559]

68 lk multiplicatively modulates the connection between hj and ai. [sent-269, score-0.204]

69 (b) An example with two action units, two latent units and two expressions. [sent-270, score-0.596]

70 ... Σi log P(ai, li | xi, θ) (Equation 12). Given a query sample x during testing, each action unit ai is also classified by maximizing its marginal posterior probability: ai∗ = arg maxai P(ai | x), which can still be efficiently calculated with the Gibbs sampling method. [sent-295, score-0.562]
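
The sentence above says the marginal posterior of each AU is estimated with Gibbs sampling; the following is a minimal sketch of how such a sampler could look, assuming the bottom-up measurements enter as a per-AU log-odds term and the top layers behave like a standard RBM. The conditional updates and the evidence form are simplifying assumptions, not the paper's exact update equations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_au_posterior(evidence, W, b, c, n_samples=2000, burn_in=500, seed=0):
    """Estimate P(a_i = 1 | x) by alternating Gibbs sampling over (a, h).

    evidence : assumed per-AU log-odds from the bottom-up classifiers
    W, b, c  : AU-to-latent weights, AU biases, latent biases of the top layers
    """
    rng = np.random.default_rng(seed)
    n_au, n_latent = W.shape
    a = (rng.random(n_au) < 0.5).astype(float)        # random initial AU labels
    counts = np.zeros(n_au)

    for t in range(n_samples):
        # Sample the latent units given the current AU labels.
        h = (rng.random(n_latent) < sigmoid(a @ W + c)).astype(float)
        # Sample the AU labels given the latent units and the bottom-up evidence.
        a = (rng.random(n_au) < sigmoid(W @ h + b + evidence)).astype(float)
        if t >= burn_in:
            counts += a

    return counts / (n_samples - burn_in)

# Hypothetical toy run with 4 AUs and 2 latent units.
rng = np.random.default_rng(1)
W, b, c = rng.normal(scale=0.5, size=(4, 2)), np.zeros(4), np.zeros(2)
posterior = gibbs_au_posterior(np.array([2.0, -1.0, 0.5, 0.0]), W, b, c)
print(posterior)    # an AU is declared present when its estimated posterior exceeds 0.5
```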

71 [20], which uses a Bayesian network (BN) to capture the AU relationships. [sent-300, score-0.243]

72 For efficiency, the raw features are first fed into a set of Support Vector Machines (SVM), each of which is trained independently to recognize one action unit. [sent-303, score-0.262]

73 To make a thorough comparison, we test their performances on both posed and non-posed facial behavior databases. [sent-305, score-0.319]

74 Performance for Posed Facial Actions: First we evaluate the proposed models against the baselines on the extended Cohn-Kanade database (CK+) [11, 8], which contains 593 posed facial activity videos from 210 adults. [sent-317, score-0.319]

75 The detailed results of different models for each action unit are illustrated in the left bar graph of Figure 5, and the average scores are shown in Table 1. [sent-331, score-0.371]

76 First, we can see that all the three models that capture AU relationships improve the performance of the baseline SVM (Table 1: F1-score of different models on CK+). [sent-332, score-0.194]

77 This improvement holds for almost all the action units in different degrees. [sent-336, score-0.176]

78 This demonstrates that the top-down information of AU relationships is especially useful when an AU is difficult to recognize from the measurements in the bottom-up direction. [sent-340, score-0.184]

79 Finally, by properly incorporating the expression information only during training, HRBM+ further improves the recognition performance of HRBM by more than 3%. [sent-350, score-0.154]

80 Similarly, while all the other three models improve the performance of the baseline SVM for all action units, the improvement achieved by HRBM is more than that of BN. [sent-367, score-0.229]

81 Figure 5: Comparison of different models for each action unit in terms of F1-score. [sent-441, score-0.342]
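
The comparison in Figure 5 and Table 1 is reported as per-AU F1-scores and their average; a minimal example of computing these from binary per-frame predictions is shown below, with made-up label arrays purely for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical ground truth and predictions for 3 AUs over 6 frames (rows = frames).
y_true = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 1], [1, 0, 1]])
y_pred = np.array([[1, 0, 1], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 0], [1, 0, 1]])

per_au_f1 = [f1_score(y_true[:, i], y_pred[:, i]) for i in range(y_true.shape[1])]
print(per_au_f1, float(np.mean(per_au_f1)))   # per-AU F1-scores and their average
```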

82 Comparison with Related Works The proposed model outperforms BN for recognizing action units in both posed and non-posed facial behaviors. [sent-449, score-0.827]

83 Several models were also used to capture AU relationships on the CK dataset, including the Bayesian network (BN) and the dynamic Bayesian network (DBN). [sent-452, score-0.292]

84 BN used in [20] captures the static relationships among AUs. [sent-453, score-0.187]

85 Since the expression is ambiguous between the neutral and peak frame, we did not implement HRBM+. [sent-473, score-0.208]

86 So far no work has been done to capture AU relationships on CK+ or SEMAINE. [sent-476, score-0.194]

87 As discussed in Section 4, each latent unit captures a specific AU presence or absence pattern that is implied by the parameters {Wij }. [sent-494, score-0.255]

88 Parameters corresponding to two latent units of HRBM learned on the CK+ dataset are graphically illustrated in Figure 6. [sent-497, score-0.395]

89 Unlike BN, which can only encode pairwise AU dependencies, the proposed model captures higher-order presence or absence patterns that involve all the action units. [sent-498, score-0.283]

90 For example in Figure 6, the first latent unit encodes the pattern that a person is likely to “lower brow”, “wrinkle nose”, “pull lip corner” and “drop jaw”, but is unlikely to “depress lip corner” and “raise chin”. [sent-499, score-0.277]

91 The model is further developed to capture related factors such as the facial expressions to achieve better characterization of the AU relationships. [sent-547, score-0.424]

92 Experimental results on both posed and non-posed facial action datasets demonstrated the power of the proposed model in capturing AU relationships as well as its advantage over existing methodologies for AU recognition. [sent-548, score-0.723]

93 Recognizing facial expression: machine learning and application to spontaneous behavior. [sent-558, score-0.305]

94 Recognizing facial actions using Gabor wavelets with neutral face average difference. [sent-563, score-0.348]

95 Detection, tracking, and classification of action units in facial expression. [sent-620, score-0.79]

96 The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. [sent-629, score-0.342]

97 Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences. [sent-663, score-0.62]

98 Facial action recognition for facial expression analysis from static face images. [sent-668, score-0.696]

99 Facial action unit recognition by exploiting their dynamic and semantic relationships. [sent-706, score-0.396]

100 Fully automatic recognition of the temporal phases of facial actions. [sent-723, score-0.307]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('au', 0.526), ('hrbm', 0.327), ('facial', 0.282), ('units', 0.279), ('action', 0.229), ('aus', 0.187), ('bn', 0.158), ('relationships', 0.151), ('rbm', 0.14), ('expression', 0.129), ('semaine', 0.121), ('ai', 0.113), ('unit', 0.113), ('ck', 0.111), ('hj', 0.097), ('latent', 0.088), ('raise', 0.086), ('pantic', 0.076), ('boltzmann', 0.073), ('dependencies', 0.072), ('wim', 0.069), ('gibbs', 0.068), ('dbn', 0.068), ('brow', 0.061), ('depiction', 0.061), ('facs', 0.061), ('valstar', 0.057), ('equation', 0.056), ('contrastive', 0.053), ('layer', 0.052), ('cjhj', 0.052), ('hrsvbmm', 0.052), ('cybernetics', 0.051), ('gesture', 0.051), ('factors', 0.05), ('expressions', 0.049), ('network', 0.049), ('visible', 0.047), ('stretch', 0.047), ('mouth', 0.046), ('lk', 0.045), ('peak', 0.044), ('capture', 0.043), ('anger', 0.042), ('hm', 0.042), ('cheek', 0.038), ('lip', 0.038), ('sampling', 0.038), ('posed', 0.037), ('affective', 0.037), ('tong', 0.036), ('among', 0.036), ('restricted', 0.035), ('recognized', 0.035), ('neutral', 0.035), ('biai', 0.034), ('dklk', 0.034), ('happiness', 0.034), ('hcrf', 0.034), ('multiplicatively', 0.034), ('wrinkle', 0.034), ('compatibility', 0.034), ('recognize', 0.033), ('bayesian', 0.032), ('kanade', 0.032), ('face', 0.031), ('absence', 0.031), ('surprise', 0.031), ('rules', 0.03), ('conference', 0.03), ('cohn', 0.029), ('auc', 0.029), ('semantic', 0.029), ('bar', 0.029), ('transactions', 0.028), ('graphically', 0.028), ('fera', 0.028), ('modulate', 0.028), ('influenced', 0.028), ('lbp', 0.027), ('ieee', 0.027), ('maximizing', 0.027), ('international', 0.027), ('wij', 0.027), ('deep', 0.027), ('energy', 0.027), ('privileged', 0.027), ('argm', 0.026), ('axl', 0.025), ('shorthand', 0.025), ('july', 0.025), ('belief', 0.025), ('pages', 0.025), ('recognition', 0.025), ('variables', 0.024), ('layers', 0.024), ('capturing', 0.024), ('machine', 0.023), ('presence', 0.023), ('occur', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999928 69 iccv-2013-Capturing Global Semantic Relationships for Facial Action Unit Recognition

Author: Ziheng Wang, Yongqiang Li, Shangfei Wang, Qiang Ji

Abstract: In this paper we tackle the problem of facial action unit (AU) recognition by exploiting the complex semantic relationships among AUs, which carry crucial top-down information yet have not been thoroughly exploited. Towards this goal, we build a hierarchical model that combines the bottom-level image features and the top-level AU relationships to jointly recognize AUs in a principled manner. The proposed model has two major advantages over existing methods. 1) Unlike methods that can only capture local pair-wise AU dependencies, our model is developed upon the restricted Boltzmann machine and therefore can exploit the global relationships among AUs. 2) Although AU relationships are influenced by many related factors such as facial expressions, these factors are generally ignored by the current methods. Our model, however, can successfully capture them to more accurately characterize the AU relationships. Efficient learning and inference algorithms of the proposed model are also developed. Experimental results on benchmark databases demonstrate the effectiveness of the proposed approach in modelling complex AU relationships as well as its superior AU recognition performance over existing approaches.

2 0.46981716 155 iccv-2013-Facial Action Unit Event Detection by Cascade of Tasks

Author: Xiaoyu Ding, Wen-Sheng Chu, Fernando De_La_Torre, Jeffery F. Cohn, Qiao Wang

Abstract: Automatic facial Action Unit (AU) detection from video is a long-standing problem in facial expression analysis. AU detection is typically posed as a classification problem between frames or segments of positive examples and negative ones, where existing work emphasizes the use of different features or classifiers. In this paper, we propose a method called Cascade of Tasks (CoT) that combines the use ofdifferent tasks (i.e., , frame, segment and transition)for AU event detection. We train CoT in a sequential manner embracing diversity, which ensures robustness and generalization to unseen data. In addition to conventional framebased metrics that evaluate frames independently, we propose a new event-based metric to evaluate detection performance at event-level. We show how the CoT method consistently outperforms state-of-the-art approaches in both frame-based and event-based metrics, across three public datasets that differ in complexity: CK+, FERA and RUFACS.

3 0.23442329 36 iccv-2013-Accurate and Robust 3D Facial Capture Using a Single RGBD Camera

Author: Yen-Lin Chen, Hsiang-Tao Wu, Fuhao Shi, Xin Tong, Jinxiang Chai

Abstract: This paper presents an automatic and robust approach that accurately captures high-quality 3D facial performances using a single RGBD camera. The key of our approach is to combine the power of automatic facial feature detection and image-based 3D nonrigid registration techniques for 3D facial reconstruction. In particular, we develop a robust and accurate image-based nonrigid registration algorithm that incrementally deforms a 3D template mesh model to best match observed depth image data and important facial features detected from single RGBD images. The whole process is fully automatic and robust because it is based on single frame facial registration framework. The system is flexible because it does not require any strong 3D facial priors such as blendshape models. We demonstrate the power of our approach by capturing a wide range of 3D facial expressions using a single RGBD camera and achieve state-of-the-art accuracy by comparing against alternative methods.

4 0.15892071 70 iccv-2013-Cascaded Shape Space Pruning for Robust Facial Landmark Detection

Author: Xiaowei Zhao, Shiguang Shan, Xiujuan Chai, Xilin Chen

Abstract: In this paper, we propose a novel cascaded face shape space pruning algorithm for robust facial landmark detection. Through progressively excluding the incorrect candidate shapes, our algorithm can accurately and efficiently achieve the globally optimal shape configuration. Specifically, individual landmark detectors are firstly applied to eliminate wrong candidates for each landmark. Then, the candidate shape space is further pruned by jointly removing incorrect shape configurations. To achieve this purpose, a discriminative structure classifier is designed to assess the candidate shape configurations. Based on the learned discriminative structure classifier, an efficient shape space pruning strategy is proposed to quickly reject most incorrect candidate shapes while preserve the true shape. The proposed algorithm is carefully evaluated on a large set of real world face images. In addition, comparison results on the publicly available BioID and LFW face databases demonstrate that our algorithm outperforms some state-of-the-art algorithms.

5 0.1518048 86 iccv-2013-Concurrent Action Detection with Structural Prediction

Author: Ping Wei, Nanning Zheng, Yibiao Zhao, Song-Chun Zhu

Abstract: Action recognition has often been posed as a classification problem, which assumes that a video sequence only have one action class label and different actions are independent. However, a single human body can perform multiple concurrent actions at the same time, and different actions interact with each other. This paper proposes a concurrent action detection model where the action detection is formulated as a structural prediction problem. In this model, an interval in a video sequence can be described by multiple action labels. An detected action interval is determined both by the unary local detector and the relations with other actions. We use a wavelet feature to represent the action sequence, and design a composite temporal logic descriptor to describe the action relations. The model parameters are trained by structural SVM learning. Given a long video sequence, a sequential decision window search algorithm is designed to detect the actions. Experiments on our new collected concurrent action dataset demonstrate the strength of our method.

6 0.15159923 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection

7 0.1443733 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition

8 0.14261535 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition

9 0.12971047 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments

10 0.12120172 219 iccv-2013-Internet Based Morphable Model

11 0.11746189 391 iccv-2013-Sieving Regression Forest Votes for Facial Feature Detection in the Wild

12 0.1171301 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos

13 0.11502822 157 iccv-2013-Fast Face Detector Training Using Tailored Views

14 0.11429634 440 iccv-2013-Video Event Understanding Using Natural Language Descriptions

15 0.11033466 439 iccv-2013-Video Co-segmentation for Meaningful Action Extraction

16 0.10997332 335 iccv-2013-Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition

17 0.1090169 116 iccv-2013-Directed Acyclic Graph Kernels for Action Recognition

18 0.10664729 251 iccv-2013-Like Father, Like Son: Facial Expression Dynamics for Kinship Verification

19 0.103945 243 iccv-2013-Learning Slow Features for Behaviour Analysis

20 0.10242824 175 iccv-2013-From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.185), (1, 0.135), (2, 0.004), (3, 0.092), (4, 0.004), (5, -0.097), (6, 0.248), (7, -0.007), (8, -0.005), (9, -0.012), (10, -0.063), (11, 0.085), (12, 0.055), (13, 0.026), (14, 0.05), (15, 0.021), (16, 0.041), (17, 0.004), (18, -0.052), (19, -0.07), (20, -0.005), (21, 0.069), (22, -0.046), (23, -0.019), (24, -0.102), (25, -0.095), (26, 0.047), (27, -0.132), (28, 0.019), (29, 0.107), (30, -0.02), (31, 0.132), (32, -0.027), (33, 0.036), (34, -0.073), (35, 0.041), (36, -0.052), (37, 0.078), (38, -0.016), (39, 0.103), (40, 0.079), (41, 0.043), (42, 0.075), (43, -0.156), (44, 0.008), (45, 0.037), (46, -0.085), (47, 0.065), (48, 0.059), (49, 0.044)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95752043 69 iccv-2013-Capturing Global Semantic Relationships for Facial Action Unit Recognition

Author: Ziheng Wang, Yongqiang Li, Shangfei Wang, Qiang Ji

Abstract: In this paper we tackle the problem of facial action unit (AU) recognition by exploiting the complex semantic relationships among AUs, which carry crucial top-down information yet have not been thoroughly exploited. Towards this goal, we build a hierarchical model that combines the bottom-level image features and the top-level AU relationships to jointly recognize AUs in a principled manner. The proposed model has two major advantages over existing methods. 1) Unlike methods that can only capture local pair-wise AU dependencies, our model is developed upon the restricted Boltzmann machine and therefore can exploit the global relationships among AUs. 2) Although AU relationships are influenced by many related factors such as facial expressions, these factors are generally ignored by the current methods. Our model, however, can successfully capture them to more accurately characterize the AU relationships. Efficient learning and inference algorithms of the proposed model are also developed. Experimental results on benchmark databases demonstrate the effectiveness of the proposed approach in modelling complex AU relationships as well as its superior AU recognition performance over existing approaches.

2 0.7776705 155 iccv-2013-Facial Action Unit Event Detection by Cascade of Tasks

Author: Xiaoyu Ding, Wen-Sheng Chu, Fernando De_La_Torre, Jeffery F. Cohn, Qiao Wang

Abstract: Automatic facial Action Unit (AU) detection from video is a long-standing problem in facial expression analysis. AU detection is typically posed as a classification problem between frames or segments of positive examples and negative ones, where existing work emphasizes the use of different features or classifiers. In this paper, we propose a method called Cascade of Tasks (CoT) that combines the use ofdifferent tasks (i.e., , frame, segment and transition)for AU event detection. We train CoT in a sequential manner embracing diversity, which ensures robustness and generalization to unseen data. In addition to conventional framebased metrics that evaluate frames independently, we propose a new event-based metric to evaluate detection performance at event-level. We show how the CoT method consistently outperforms state-of-the-art approaches in both frame-based and event-based metrics, across three public datasets that differ in complexity: CK+, FERA and RUFACS.

3 0.71312708 251 iccv-2013-Like Father, Like Son: Facial Expression Dynamics for Kinship Verification

Author: Hamdi Dibeklioglu, Albert Ali Salah, Theo Gevers

Abstract: Kinship verification from facial appearance is a difficult problem. This paper explores the possibility of employing facial expression dynamics in this problem. By using features that describe facial dynamics and spatio-temporal appearance over smile expressions, we show that it is possible to improve the state ofthe art in thisproblem, and verify that it is indeed possible to recognize kinship by resemblance of facial expressions. The proposed method is tested on different kin relationships. On the average, 72.89% verification accuracy is achieved on spontaneous smiles.

4 0.69363528 243 iccv-2013-Learning Slow Features for Behaviour Analysis

Author: Lazaros Zafeiriou, Mihalis A. Nicolaou, Stefanos Zafeiriou, Symeon Nikitidis, Maja Pantic

Abstract: A recently introduced latent feature learning technique for time varying dynamic phenomena analysis is the socalled Slow Feature Analysis (SFA). SFA is a deterministic component analysis technique for multi-dimensional sequences that by minimizing the variance of the first order time derivative approximation of the input signal finds uncorrelated projections that extract slowly-varying features ordered by their temporal consistency and constancy. In this paper, we propose a number of extensions in both the deterministic and the probabilistic SFA optimization frameworks. In particular, we derive a novel deterministic SFA algorithm that is able to identify linear projections that extract the common slowest varying features of two or more sequences. In addition, we propose an Expectation Maximization (EM) algorithm to perform inference in a probabilistic formulation of SFA and similarly extend it in order to handle two and more time varying data sequences. Moreover, we demonstrate that the probabilistic SFA (EMSFA) algorithm that discovers the common slowest varying latent space of multiple sequences can be combined with dynamic time warping techniques for robust sequence timealignment. The proposed SFA algorithms were applied for facial behavior analysis demonstrating their usefulness and appropriateness for this task.

5 0.64956927 36 iccv-2013-Accurate and Robust 3D Facial Capture Using a Single RGBD Camera

Author: Yen-Lin Chen, Hsiang-Tao Wu, Fuhao Shi, Xin Tong, Jinxiang Chai

Abstract: This paper presents an automatic and robust approach that accurately captures high-quality 3D facial performances using a single RGBD camera. The key of our approach is to combine the power of automatic facial feature detection and image-based 3D nonrigid registration techniques for 3D facial reconstruction. In particular, we develop a robust and accurate image-based nonrigid registration algorithm that incrementally deforms a 3D template mesh model to best match observed depth image data and important facial features detected from single RGBD images. The whole process is fully automatic and robust because it is based on single frame facial registration framework. The system is flexible because it does not require any strong 3D facial priors such as blendshape models. We demonstrate the power of our approach by capturing a wide range of 3D facial expressions using a single RGBD camera and achieve state-of-the-art accuracy by comparing against alternative methods.

6 0.56344944 391 iccv-2013-Sieving Regression Forest Votes for Facial Feature Detection in the Wild

7 0.53213686 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition

8 0.50860381 231 iccv-2013-Latent Multitask Learning for View-Invariant Action Recognition

9 0.50318581 70 iccv-2013-Cascaded Shape Space Pruning for Robust Facial Landmark Detection

10 0.49256068 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition

11 0.48257405 86 iccv-2013-Concurrent Action Detection with Structural Prediction

12 0.46754265 166 iccv-2013-Finding Actors and Actions in Movies

13 0.46287441 149 iccv-2013-Exemplar-Based Graph Matching for Robust Facial Landmark Localization

14 0.4623642 321 iccv-2013-Pose-Free Facial Landmark Fitting via Optimized Part Mixtures and Cascaded Deformable Shape Model

15 0.46127799 38 iccv-2013-Action Recognition with Actons

16 0.46011806 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments

17 0.45994017 265 iccv-2013-Mining Motion Atoms and Phrases for Complex Action Recognition

18 0.44176912 175 iccv-2013-From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding

19 0.43810913 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection

20 0.42965388 195 iccv-2013-Hidden Factor Analysis for Age Invariant Face Recognition


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.048), (7, 0.013), (12, 0.011), (26, 0.062), (31, 0.054), (42, 0.102), (48, 0.021), (50, 0.015), (64, 0.057), (73, 0.032), (78, 0.329), (89, 0.137)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.78416049 69 iccv-2013-Capturing Global Semantic Relationships for Facial Action Unit Recognition

Author: Ziheng Wang, Yongqiang Li, Shangfei Wang, Qiang Ji

Abstract: In this paper we tackle the problem of facial action unit (AU) recognition by exploiting the complex semantic relationships among AUs, which carry crucial top-down information yet have not been thoroughly exploited. Towards this goal, we build a hierarchical model that combines the bottom-level image features and the top-level AU relationships to jointly recognize AUs in a principled manner. The proposed model has two major advantages over existing methods. 1) Unlike methods that can only capture local pair-wise AU dependencies, our model is developed upon the restricted Boltzmann machine and therefore can exploit the global relationships among AUs. 2) Although AU relationships are influenced by many related factors such as facial expressions, these factors are generally ignored by the current methods. Our model, however, can successfully capture them to more accurately characterize the AU relationships. Efficient learning and inference algorithms of the proposed model are also developed. Experimental results on benchmark databases demonstrate the effectiveness of the proposed approach in modelling complex AU relationships as well as its superior AU recognition performance over existing approaches.

2 0.75180578 344 iccv-2013-Recognising Human-Object Interaction via Exemplar Based Modelling

Author: Jian-Fang Hu, Wei-Shi Zheng, Jianhuang Lai, Shaogang Gong, Tao Xiang

Abstract: Human action can be recognised from a single still image by modelling Human-object interaction (HOI), which infers the mutual spatial structure information between human and object as well as their appearance. Existing approaches rely heavily on accurate detection of human and object, and estimation of human pose. They are thus sensitive to large variations of human poses, occlusion and unsatisfactory detection of small size objects. To overcome this limitation, a novel exemplar based approach is proposed in this work. Our approach learns a set of spatial pose-object interaction exemplars, which are density functions describing how a person is interacting with a manipulated object for different activities spatially in a probabilistic way. A representation based on our HOI exemplar thus has great potential for being robust to the errors in human/object detection and pose estimation. A new framework consists of a proposed exemplar based HOI descriptor and an activity specific matching model that learns the parameters is formulated for robust human activity recog- nition. Experiments on two benchmark activity datasets demonstrate that the proposed approach obtains state-ofthe-art performance.

3 0.73608816 290 iccv-2013-New Graph Structured Sparsity Model for Multi-label Image Annotations

Author: Xiao Cai, Feiping Nie, Weidong Cai, Heng Huang

Abstract: In multi-label image annotations, because each image is associated to multiple categories, the semantic terms (label classes) are not mutually exclusive. Previous research showed that such label correlations can largely boost the annotation accuracy. However, all existing methods only directly apply the label correlation matrix to enhance the label inference and assignment without further learning the structural information among classes. In this paper, we model the label correlations using the relational graph, and propose a novel graph structured sparse learning model to incorporate the topological constraints of relation graph in multi-label classifications. As a result, our new method will capture and utilize the hidden class structures in relational graph to improve the annotation results. In proposed objective, a large number of structured sparsity-inducing norms are utilized, thus the optimization becomes difficult. To solve this problem, we derive an efficient optimization algorithm with proved convergence. We perform extensive experiments on six multi-label image annotation benchmark data sets. In all empirical results, our new method shows better annotation results than the state-of-the-art approaches.

4 0.73257905 252 iccv-2013-Line Assisted Light Field Triangulation and Stereo Matching

Author: Zhan Yu, Xinqing Guo, Haibing Lin, Andrew Lumsdaine, Jingyi Yu

Abstract: Light fields are image-based representations that use densely sampled rays as a scene description. In this paper, we explore geometric structures of 3D lines in ray space for improving light field triangulation and stereo matching. The triangulation problem aims to fill in the ray space with continuous and non-overlapping simplices anchored at sampled points (rays). Such a triangulation provides a piecewise-linear interpolant useful for light field superresolution. We show that the light field space is largely bilinear due to 3D line segments in the scene, and direct triangulation of these bilinear subspaces leads to large errors. We instead present a simple but effective algorithm to first map bilinear subspaces to line constraints and then apply Constrained Delaunay Triangulation (CDT). Based on our analysis, we further develop a novel line-assisted graphcut (LAGC) algorithm that effectively encodes 3D line constraints into light field stereo matching. Experiments on synthetic and real data show that both our triangulation and LAGC algorithms outperform state-of-the-art solutions in accuracy and visual quality.

5 0.67589211 276 iccv-2013-Multi-attributed Dictionary Learning for Sparse Coding

Author: Chen-Kuo Chiang, Te-Feng Su, Chih Yen, Shang-Hong Lai

Abstract: We present a multi-attributed dictionary learning algorithm for sparse coding. Considering training samples with multiple attributes, a new distance matrix is proposed by jointly incorporating data and attribute similarities. Then, an objective function is presented to learn categorydependent dictionaries that are compact (closeness of dictionary atoms based on data distance and attribute similarity), reconstructive (low reconstruction error with correct dictionary) and label-consistent (encouraging the labels of dictionary atoms to be similar). We have demonstrated our algorithm on action classification and face recognition tasks on several publicly available datasets. Experimental results with improved performance over previous dictionary learning methods are shown to validate the effectiveness of the proposed algorithm.

6 0.67381537 175 iccv-2013-From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding

7 0.62680781 155 iccv-2013-Facial Action Unit Event Detection by Cascade of Tasks

8 0.57992375 150 iccv-2013-Exemplar Cut

9 0.5649032 268 iccv-2013-Modeling 4D Human-Object Interactions for Event and Object Recognition

10 0.55908358 194 iccv-2013-Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model

11 0.55531698 319 iccv-2013-Point-Based 3D Reconstruction of Thin Objects

12 0.55154455 149 iccv-2013-Exemplar-Based Graph Matching for Robust Facial Landmark Localization

13 0.54778618 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection

14 0.54545343 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation

15 0.54480028 277 iccv-2013-Multi-channel Correlation Filters

16 0.54292154 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation

17 0.54211259 243 iccv-2013-Learning Slow Features for Behaviour Analysis

18 0.54121912 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization

19 0.53869331 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps

20 0.53815842 362 iccv-2013-Robust Tucker Tensor Decomposition for Effective Image Representation