iccv iccv2013 iccv2013-155 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xiaoyu Ding, Wen-Sheng Chu, Fernando De_La_Torre, Jeffery F. Cohn, Qiao Wang
Abstract: Automatic facial Action Unit (AU) detection from video is a long-standing problem in facial expression analysis. AU detection is typically posed as a classification problem between frames or segments of positive examples and negative ones, where existing work emphasizes the use of different features or classifiers. In this paper, we propose a method called Cascade of Tasks (CoT) that combines the use of different tasks (i.e., frame, segment and transition) for AU event detection. We train CoT in a sequential manner embracing diversity, which ensures robustness and generalization to unseen data. In addition to conventional frame-based metrics that evaluate frames independently, we propose a new event-based metric to evaluate detection performance at event-level. We show how the CoT method consistently outperforms state-of-the-art approaches in both frame-based and event-based metrics, across three public datasets that differ in complexity: CK+, FERA and RU-FACS.
Reference: text
sentIndex sentText sentNum sentScore
1 AU detection is typically posed as a classification problem between frames or segments of positive examples and negative ones, where existing work emphasizes the use of different features or classifiers. [sent-3, score-0.246]
2 In addition to conventional frame-based metrics that evaluate frames independently, we propose a new event-based metric to evaluate detection performance at event-level. [sent-8, score-0.189]
3 To analyze information afforded by facial expression, Ekman and Friesen proposed the Facial Action Coding System (FACS) [12]. [sent-13, score-0.228]
4 FACS describes facial activity in terms of anatomically based action units. [sent-14, score-0.296]
5 Action units can occur alone or in combinations to represent all possible facial expressions. [sent-15, score-0.228]
6 Action units (AUs) have a temporal envelope that minimally includes an onset (or start) and an offset (or stop) and may include changes in intensity. [sent-16, score-0.285]
7 Because of its descriptive power, FACS has become widely used to study facial expression [13]. [sent-18, score-0.294]
8 Detection of AU 12 (smile) from its onset to offset using our proposed CoT method. [sent-20, score-0.21]
9 Next, CoT uses the responses of the frame-level detector and segment-based features to detect a segment for AU 12 (Task 2). [sent-24, score-0.208]
10 Finally, CoT more precisely estimates the onset and offset frames by learning transition detectors (Task 3). [sent-25, score-0.436]
11 Frame-level detection independently evaluates each video frame for the occurrence of one or more AUs. [sent-31, score-0.146]
12 Transition detection seeks to detect the onset and offset of each segment, or event. [sent-33, score-0.33]
13 Examples of segment-level detection are [4, 26, 27, 30], and examples of transition detection are [11]. [sent-35, score-0.323]
14 They perceive AUs as events that have a beginning (onset), an ending (offset), and a certain duration. [sent-41, score-0.148]
15 Much effort in manual FACS coding consists of first perceiving an AU event and then identifying its precise onset and offset. [sent-42, score-0.271]
16 Often, segment detectors miss AUs in the vicinity of onsets and offsets where discriminability is low. [sent-44, score-0.275]
17 We seek to detect AU events including onsets and offsets with high fidelity to human perception. [sent-46, score-0.344]
18 CoT detects AU events including their onsets and offsets, by sequentially integrating the three AU detection tasks: frame-level detection, segment-level detection, and detection of onsets and offsets. [sent-48, score-0.605]
19 The results of this task tend to be noisy, or less reliable, because frame-level detection fails to exploit the temporal dependencies among proximal frames. [sent-52, score-0.196]
20 The second task combines the output of the frame-level detection with new segment-level features using a segment-based classifier (see Fig. [sent-53, score-0.187]
21 Observe that the segment-level detector gives a rough location of the AU event and reduces the frame-level false positives, but it is imprecise in the boundaries (i. [sent-55, score-0.27]
22 The third and final task refines the onset and offset locations. [sent-58, score-0.234]
23 2) CoT fully recovers AU events instead of isolated AU frames or incorrectly parsed segments. [sent-64, score-0.229]
24 To evaluate AU detection performance at event-level, we propose a new event-based metric, as opposed to conventional frame-based metrics that evaluate frames independently. [sent-65, score-0.189]
25 Frame-level detection is done by extracting geometric or appearance features to represent each frame, and then feeding the features into static classifiers (e. [sent-74, score-0.169]
26 Frame-level detectors are shown to be able to detect subtle AU events because of their sensitivity to each frame. [sent-82, score-0.24]
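As a rough illustration of such a frame-level detector, the sketch below trains a linear SVM on pre-extracted per-frame features and uses its signed decision values as frame scores; the use of scikit-learn and the variable names are assumptions for illustration, not the authors' implementation.

from sklearn.svm import LinearSVC

def train_frame_detector(X, y, C=1.0):
    # X: (n_frames, n_features) per-frame appearance/geometric features
    # y: (n_frames,) binary labels, 1 if the AU is present in that frame
    clf = LinearSVC(C=C)
    clf.fit(X, y)
    return clf

def frame_scores(clf, X):
    # Signed distance to the separating hyperplane; these per-frame scores
    # are what the later segment-level task consumes.
    return clf.decision_function(X)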
27 Segment-level approaches seek to incorporate temporal information by using either dynamic features or temporal classifiers. [sent-84, score-0.174]
28 Recent work on exploiting dynamic features includes bag of temporal words [27] and temporal extensions of LBP and LPQ [16, 36]. [sent-86, score-0.174]
29 However, these methods tend to favor segments with high AU intensity, leading to mis-detection on AU boundaries and partial detection around the AU apex. [sent-90, score-0.187]
30 An important yet relatively unexplored task is to detect only AU transition (onsets and offsets), which is arguably challenging due to subtle changes between AU and non-AU frames. [sent-93, score-0.207]
31 In previous approaches, accurate transition detection was achieved with the help of additional information, such as an AU apex location [11]. [sent-94, score-0.288]
32 Cascade of Tasks (CoT): This section introduces the proposed Cascade of Tasks (CoT) for detecting facial AU events. [sent-99, score-0.228]
33 These frame-level detectors offer reasonable predictions for frames with AU presence, but often are prone to noise due to the lack of temporal consistency. [sent-106, score-0.2]
34 Fig. 2(a) illustrates a frame-based detector on a video of 31 frames that contains the onset of an AU 12. [sent-108, score-0.265]
35 Observe that the frame-level detector correctly detects the frames where the AU is present (frames 12-31) but has many false positives. [sent-109, score-0.208]
36 Segment-level SVM: To eliminate isolated false detections while preserving the sensitivity of frame-level detectors, we will use the outputs of the frame-level detection in combination with new segment-based features. [sent-116, score-0.148]
37 Segment-level feature: We divide each segment evenly into three sub-segments, and compute for each sub-segment a temporal bag of words [27] with geometric features [38], as a complement to the appearance features used in the frame-level detector. [sent-117, score-0.187]
38 Introducing these geometric features promotes diversity among the tasks and hence produces more robust AU detection (as will be shown in Section 4). [sent-118, score-0.162]
39 The final segment-level representation is a concatenation of the histograms of temporal words and frame score statistics from the three sub-segments. [sent-121, score-0.164]
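A sketch of how such a segment-level descriptor could be assembled is given below; the particular score statistics (mean/max/min) and the histogram normalization are assumptions, since the text only states that histograms of temporal words and frame score statistics from the three sub-segments are concatenated.

import numpy as np

def segment_feature(word_ids, frame_scores, s, e, n_words):
    # word_ids: per-frame temporal-word indices from a pre-learned codebook
    # frame_scores: per-frame outputs of the frame-level detector
    # s, e: first and last frame of the candidate segment (at least 3 frames)
    parts = []
    bounds = np.linspace(s, e + 1, 4).astype(int)   # three equal sub-segments
    for a, b in zip(bounds[:-1], bounds[1:]):
        hist = np.bincount(word_ids[a:b], minlength=n_words).astype(float)
        hist /= max(hist.sum(), 1.0)                # bag of temporal words
        stats = [frame_scores[a:b].mean(),          # assumed score statistics
                 frame_scores[a:b].max(),
                 frame_scores[a:b].min()]
        parts.append(np.concatenate([hist, stats]))
    return np.concatenate(parts)                    # final segment representation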
40 Recall that in segment-level detection, the positive segments are the manually labelled AU events (of different length and intensity). [sent-134, score-0.214]
41 The negative segments are sampled at random locations and temporal scales, and typically outnumber positive segments. [sent-135, score-0.207]
42 For each segment S[sk, ek], we computed the confidence weight as the averaged absolute value of the frame-level detection scores, that is vk = (1/(ek − sk + 1)) Σ_{i=sk..ek} |f_frm(i)|, where f_frm(i) denotes the frame-level detector score at frame i. [sent-136, score-0.187]
43 With this definition of confidence weights, we give more importance to the segments that are more likely to contain many frames where the frame-level detector returns higher scores. [sent-140, score-0.18]
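In code, the confidence weight is simply the mean absolute frame score inside the segment; passing it as a per-sample weight to the segment SVM is one plausible way to realize the weighting described above (the exact weighting scheme is not restated here, so treat the usage line as an assumption).

import numpy as np

def confidence_weight(frame_scores, s, e):
    # v_k: averaged absolute frame-level score over segment [s, e]
    return np.abs(frame_scores[s:e + 1]).mean()

# e.g. weights = [confidence_weight(scores, s, e) for (s, e) in train_segments]
#      seg_clf = LinearSVC().fit(X_seg, y_seg, sample_weight=weights)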
44 Segment-level detectors achieve more robust decisions on contiguous frames, but often mis-detect subtle AU events due to insufficient positive events for training, especially around the onset and offset. [sent-143, score-0.541]
45 Example from the RU-FACS dataset [2]: (a) A video of subject 77, (b) Frame detection result shown as a thin orange line and ground truth (GT) as a thick gray line. [sent-153, score-0.165]
46 Using the transition score in (e) as a refinement, FST detector (? [sent-157, score-0.224]
47 In order to improve the detection around the onset/offset, we will add the transition detection task. [sent-162, score-0.323]
48 In this section, we propose a transition detection task to refine the boundaries of the previously detected segments. [sent-166, score-0.316]
49 We construct positive samples by extracting segment-level features in segments centered at the onsets and offsets. [sent-169, score-0.153]
50 We select a window of 6 frames before each onset/offset and 6 frames after, so our segments are 13 frames long. [sent-170, score-0.184]
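A sketch of how the transition training windows could be gathered is shown below; the frame indexing and the handling of windows near the video boundary are assumptions.

def transition_windows(transition_frames, n_frames, half=6):
    # 13-frame windows (6 frames before, the transition frame, 6 after)
    # centered at each labelled onset or offset; windows that fall outside
    # the video are skipped here.
    windows = []
    for t in transition_frames:
        s, e = t - half, t + half
        if s >= 0 and e < n_frames:
            windows.append((s, e))
    return windows

# Positive samples for the onset/offset detectors are then segment-level
# features extracted over these windows; negatives can be windows sampled
# away from any labelled transition (an assumed sampling strategy).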
51 Fig. 2(e) shows an example of onset detector scores (green dotted line) and offset detector scores (purple dotted line). [sent-173, score-0.416]
52 As shown in Fig. 2(e), transition detectors are prone to noise and contain many false positives. [sent-175, score-0.224]
53 We linearly combine the transition and segment detection scores. [sent-177, score-0.29]
54 Specifically, for any given segment S[s, e], we define the event score as f_event(S[s, e]) = α·f_seg(S[s, e]) + β·f_on(s) + (1 − α − β)·f_off(e). [sent-178, score-0.224]
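The fused event score can be written directly in code; the sketch below fills the score matrix over all candidate segments, and the values of alpha and beta are placeholders rather than the weights used in the paper.

import numpy as np

def event_score_matrix(seg_scores, on_scores, off_scores, alpha=0.4, beta=0.3):
    # seg_scores[s, e]: segment-level score of candidate segment [s, e]
    # on_scores[s], off_scores[e]: onset / offset detector scores per frame
    n = len(on_scores)
    F = np.full((n, n), -np.inf)
    for s in range(n):
        for e in range(s, n):
            F[s, e] = (alpha * seg_scores[s, e]
                       + beta * on_scores[s]
                       + (1.0 - alpha - beta) * off_scores[e])
    return F   # event score matrix over all candidate segments (cf. Fig. 2(d))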
55 Fig. 2(d) shows the event score matrix of all possible segments in the input video. [sent-184, score-0.226]
56 To detect multiple AU events in a given video, we apply Dynamic Programming (DP) [15] to the event score matrix. [sent-188, score-0.331]
57 Recall that the original DP solution [15] could return a long segment that merges multiple events into a single long event. [sent-189, score-0.212]
58 However, using the transition score provides more accurate information about where the true boundaries are, and CoT avoids this under-segmentation problem. [sent-190, score-0.193]
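A generic dynamic-programming sketch of this event parsing step is shown below: it selects a set of non-overlapping segments that maximizes the summed event score. This is an illustrative recursion, not necessarily the exact formulation of [15]; min_score and max_len are assumed pruning parameters.

def detect_events(F, min_score=0.0, max_len=200):
    # F[s, e]: event score of candidate segment [s, e] (e.g. from event_score_matrix)
    n = F.shape[0]
    best = [0.0] * (n + 1)       # best[t]: best total score using frames < t
    choice = [None] * (n + 1)    # segment ending at frame t-1, if one was chosen
    for t in range(1, n + 1):
        best[t], choice[t] = best[t - 1], None    # frame t-1 left as background
        e = t - 1
        for s in range(max(0, e - max_len + 1), e + 1):
            if F[s, e] > min_score and best[s] + F[s, e] > best[t]:
                best[t], choice[t] = best[s] + F[s, e], (s, e)
    events, t = [], n
    while t > 0:                  # backtrack the selected segments
        if choice[t] is None:
            t -= 1
        else:
            s, e = choice[t]
            events.append((s, e))
            t = s
    return events[::-1]           # detected AU events as (onset, offset) pairs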
59 Datasets: CK+ contains 593 posed facial expression sequences from 123 participants. [sent-196, score-0.294]
60 Sequences vary in duration between 4 and 71 frames and the temporal structure of facial movements is predetermined. [sent-197, score-0.362]
61 AUs occur during emotional speech, and hence the onset and offset of AU events are ambiguous, and AUs may have multiple apexes. [sent-203, score-0.381]
62 RU-FACS is more challenging than the other two datasets, and it consists of facial behavior recorded during interviews. [sent-206, score-0.228]
63 Face Registration: For the CK+ and RU-FACS datasets, person-specific Active Appearance Model [20] tracking of 66 facial landmarks was available. [sent-211, score-0.228]
64 All tracked facial feature points were registered to a reference face by a similarity transformation. [sent-213, score-0.278]
65 Face images are then warped based on registered facial features. [sent-214, score-0.228]
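A least-squares similarity (Procrustes) alignment of the 66 tracked landmarks to a reference shape could look like the sketch below; this is the standard formulation and only an approximation of the registration actually used.

import numpy as np

def similarity_align(landmarks, reference):
    # landmarks, reference: (66, 2) arrays of facial landmark coordinates.
    # Returns the landmarks mapped onto the reference by the best-fitting
    # similarity transform (rotation + uniform scale + translation).
    X = landmarks - landmarks.mean(axis=0)
    Y = reference - reference.mean(axis=0)
    U, S, Vt = np.linalg.svd(X.T @ Y)
    d = np.ones(2)
    if np.linalg.det(U @ Vt) < 0:     # avoid reflections
        d[-1] = -1.0
    R = U @ np.diag(d) @ Vt           # optimal rotation
    scale = (S * d).sum() / (X ** 2).sum()
    return scale * X @ R + reference.mean(axis=0)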
66 Note that JSC can be seen as segment detection in CoT without the input of the frame-level detector. [sent-228, score-0.203]
67 In F1-Frame, det 1 scores higher although it has multiple false positives and misses a whole event. [sent-244, score-0.171]
68 As an illustration, a synthetic detection example on 100 frames is shown in Fig. 3. [sent-253, score-0.156]
69 Note that det 1 misses one event and generates multiple false positives, while det 2 detects the correct number of events and roughly recovers their temporal locations. [sent-256, score-0.649]
70 In Fig. 3(a), there is an overlap between the ground truth event [a, c] and the detected event [b, d], therefore EA considers that the event is correctly detected (even if the overlap is minimal). [sent-266, score-0.524]
71 This is because, considering the thick line as ground truth, two events are correctly detected (assuming a minimal overlap). [sent-268, score-0.244]
72 Then, considering the thin line as ground truth, two events are correctly detected. [sent-269, score-0.2]
73 The EA is the ratio of events detected, considering each of the signals as ground truth, over the total number of events (in the two signals). [sent-270, score-0.212]
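Read literally, the EA described above can be computed as in the sketch below; events are assumed to be (start, end) frame pairs, and any single-frame overlap counts as a match.

def overlaps(a, b):
    # True if intervals a = (s1, e1) and b = (s2, e2) share at least one frame.
    return a[0] <= b[1] and b[0] <= a[1]

def event_agreement(gt_events, det_events):
    # Fraction of all events (in both signals) that are matched by at least
    # one overlapping event in the other signal.
    hit_gt = sum(any(overlaps(g, d) for d in det_events) for g in gt_events)
    hit_det = sum(any(overlaps(d, g) for g in gt_events) for d in det_events)
    total = len(gt_events) + len(det_events)
    return (hit_gt + hit_det) / total if total else 1.0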
74 F1-Event Curve: A major problem with using EA as a measure for AU detection is that a single frame of overlap between the detected AU event and the ground truth is considered an event agreement. [sent-274, score-0.36]
75 In Fig. 3, although det 2 gets a full score in EA, it is not a perfect detection, especially in transition regions. [sent-276, score-0.286]
76 F1-Event = 2·ER·EP / (ER + EP), where Event-based Recall (ER) is the ratio of correctly detected events over the true events, while Event-based Precision (EP) is the ratio of correctly detected events over the detected events. [sent-293, score-0.448]
77 Unlike EA, F1-Event considers that there is an event agreement if the overlap is above a certain threshold, which can be set depending on specific applications. [sent-294, score-0.206]
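A sketch of the F1-Event computation is given below; intersection-over-union is assumed as the overlap ratio, which may differ from the exact ratio used in the paper. Sweeping the threshold from 0 to 1 produces an F1-Event curve of the kind shown in Fig. 3(b).

def overlap_ratio(a, b):
    # Intersection-over-union of two frame intervals a = (s1, e1), b = (s2, e2).
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union

def f1_event(gt_events, det_events, thr=0.5):
    # ER: correctly detected events over true events; EP: over detected events.
    er_hits = sum(any(overlap_ratio(g, d) >= thr for d in det_events) for g in gt_events)
    ep_hits = sum(any(overlap_ratio(d, g) >= thr for g in gt_events) for d in det_events)
    er = er_hits / len(gt_events) if gt_events else 0.0
    ep = ep_hits / len(det_events) if det_events else 0.0
    return 2 * er * ep / (er + ep) if (er + ep) else 0.0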
78 In Fig. 3(b), F1-Event curves for det 1 and det 2 are shown. [sent-297, score-0.234]
79 This is because detected events of det 1 are shorter and once they are matched they tend to have a high overlap ratio. [sent-299, score-0.336]
80 We also reported intermediate results, F (frame detection result) and FS (frame and segment detection without transition), in order to analyze the contribution of each task. [sent-304, score-0.258]
81 This shows how frame detection helps in the segment detection stage. [sent-367, score-0.307]
82 Third, because EA does not consider the overlap ratio, the performance improvement obtained by using the transition task is not well reflected by the metric. [sent-368, score-0.188]
83 This explains why under EA the advantage of FST over FS is insignificant, and in some cases when transition detection is highly noisy, FS is even better. [sent-369, score-0.226]
84 Second, because most AU events in RU-FACS are complete, as opposed to the many incomplete events in FERA, RU-FACS contains more AU transitions. [sent-374, score-0.296]
85 Hence transition detection (only in FST) plays a more important role, which is revealed by the gap between the top two curves. [sent-375, score-0.226]
86 In some cases in FERA, false transition detection even results in worse FST results than FS. [sent-376, score-0.255]
87 This improvement is more obvious on RU-FACS where more complete AU events were present. [sent-409, score-0.148]
88 Conclusion: This paper proposes a novel approach to detect facial AU events from image sequences. [sent-411, score-0.399]
89 In a sequential manner, we use a cascade to combine three complementary detection tasks, as opposed to merely combining different features or classifiers aimed at a single task. [sent-412, score-0.195]
90 With simple algorithms in each task, our method outperforms state-of-the-art methods in three public datasets with diverse facial expression dynamics. [sent-414, score-0.294]
91 The idea of using a cascade to combine tasks for detection is general, and one direction for future work is to extend this to other temporal detection problems such as human activity detection in videos. [sent-416, score-0.457]
92 Learning partially-observed hidden conditional random fields for facial expression recognition. [sent-493, score-0.294]
93 Selective transfer machine for personalized facial action unit detection. [sent-511, score-0.33]
94 Observer-based measurement of facial expression with the Facial Action Coding System. [sent-525, score-0.294]
95 Investigating spontaneous facial action recognition through AAM representations of the face. [sent-606, score-0.342]
96 A model of the perception of facial expressions of emotion by humans: research overview and perspectives. [sent-611, score-0.258]
97 Nonparametric discriminant HMM and application to facial expression recognition. [sent-656, score-0.294]
98 A unified probabilistic framework for spontaneous facial action modeling and understanding. [sent-692, score-0.342]
99 Fully automatic recognition of the temporal phases of facial actions. [sent-707, score-0.303]
100 Dynamic cascades with bidirectional bootstrapping for action unit detection in spontaneous facial behavior. [sent-762, score-0.473]
wordName wordTfidf (topN-words)
[('au', 0.551), ('cot', 0.387), ('fera', 0.232), ('facial', 0.228), ('fst', 0.223), ('onset', 0.151), ('events', 0.148), ('jsc', 0.139), ('transition', 0.129), ('aus', 0.121), ('event', 0.12), ('det', 0.117), ('mkl', 0.114), ('facs', 0.111), ('onsets', 0.11), ('fs', 0.098), ('detection', 0.097), ('ck', 0.081), ('temporal', 0.075), ('aov', 0.07), ('action', 0.068), ('ea', 0.067), ('segments', 0.066), ('expression', 0.066), ('cybernetics', 0.065), ('segment', 0.064), ('cohn', 0.063), ('offsets', 0.063), ('health', 0.061), ('torre', 0.06), ('frames', 0.059), ('offset', 0.059), ('lf', 0.057), ('affective', 0.056), ('detector', 0.055), ('seg', 0.054), ('ef', 0.053), ('ambadar', 0.052), ('agreement', 0.051), ('cascade', 0.05), ('frame', 0.049), ('la', 0.049), ('ek', 0.048), ('spontaneous', 0.046), ('detects', 0.043), ('sk', 0.042), ('chew', 0.042), ('segmentlevel', 0.042), ('afgr', 0.042), ('framelevel', 0.042), ('smiles', 0.042), ('tasks', 0.041), ('score', 0.04), ('ekman', 0.039), ('detectors', 0.038), ('thick', 0.038), ('detected', 0.036), ('lucey', 0.035), ('overlap', 0.035), ('unit', 0.034), ('metrics', 0.033), ('svm', 0.032), ('gle', 0.032), ('eventbased', 0.031), ('ffirm', 0.031), ('fon', 0.031), ('segmentbased', 0.031), ('tariq', 0.031), ('valstar', 0.031), ('oxford', 0.031), ('subtle', 0.031), ('emotion', 0.03), ('thin', 0.03), ('pittsburgh', 0.029), ('false', 0.029), ('timing', 0.029), ('prone', 0.028), ('ofevents', 0.028), ('friesen', 0.028), ('fseg', 0.028), ('polite', 0.028), ('de', 0.028), ('man', 0.026), ('vk', 0.026), ('face', 0.026), ('apex', 0.026), ('scores', 0.025), ('contiguous', 0.025), ('boundaries', 0.024), ('fires', 0.024), ('bartlett', 0.024), ('classifiers', 0.024), ('features', 0.024), ('task', 0.024), ('emotional', 0.023), ('pantic', 0.023), ('dotted', 0.023), ('detect', 0.023), ('isolated', 0.022), ('correctly', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000012 155 iccv-2013-Facial Action Unit Event Detection by Cascade of Tasks
Author: Xiaoyu Ding, Wen-Sheng Chu, Fernando De_La_Torre, Jeffery F. Cohn, Qiao Wang
Abstract: Automatic facial Action Unit (AU) detection from video is a long-standing problem in facial expression analysis. AU detection is typically posed as a classification problem between frames or segments of positive examples and negative ones, where existing work emphasizes the use of different features or classifiers. In this paper, we propose a method called Cascade of Tasks (CoT) that combines the use of different tasks (i.e., frame, segment and transition) for AU event detection. We train CoT in a sequential manner embracing diversity, which ensures robustness and generalization to unseen data. In addition to conventional frame-based metrics that evaluate frames independently, we propose a new event-based metric to evaluate detection performance at event-level. We show how the CoT method consistently outperforms state-of-the-art approaches in both frame-based and event-based metrics, across three public datasets that differ in complexity: CK+, FERA and RU-FACS.
2 0.46981716 69 iccv-2013-Capturing Global Semantic Relationships for Facial Action Unit Recognition
Author: Ziheng Wang, Yongqiang Li, Shangfei Wang, Qiang Ji
Abstract: In this paper we tackle the problem of facial action unit (AU) recognition by exploiting the complex semantic relationships among AUs, which carry crucial top-down information yet have not been thoroughly exploited. Towards this goal, we build a hierarchical model that combines the bottom-level image features and the top-level AU relationships to jointly recognize AUs in a principled manner. The proposed model has two major advantages over existing methods. 1) Unlike methods that can only capture local pair-wise AU dependencies, our model is developed upon the restricted Boltzmann machine and therefore can exploit the global relationships among AUs. 2) Although AU relationships are influenced by many related factors such as facial expressions, these factors are generally ignored by the current methods. Our model, however, can successfully capture them to more accurately characterize the AU relationships. Efficient learning and inference algorithms of the proposed model are also developed. Experimental results on benchmark databases demonstrate the effectiveness of the proposed approach in modelling complex AU relationships as well as its superior AU recognition performance over existing approaches.
3 0.18906724 36 iccv-2013-Accurate and Robust 3D Facial Capture Using a Single RGBD Camera
Author: Yen-Lin Chen, Hsiang-Tao Wu, Fuhao Shi, Xin Tong, Jinxiang Chai
Abstract: This paper presents an automatic and robust approach that accurately captures high-quality 3D facial performances using a single RGBD camera. The key of our approach is to combine the power of automatic facial feature detection and image-based 3D nonrigid registration techniques for 3D facial reconstruction. In particular, we develop a robust and accurate image-based nonrigid registration algorithm that incrementally deforms a 3D template mesh model to best match observed depth image data and important facial features detected from single RGBD images. The whole process is fully automatic and robust because it is based on single frame facial registration framework. The system is flexible because it does not require any strong 3D facial priors such as blendshape models. We demonstrate the power of our approach by capturing a wide range of 3D facial expressions using a single RGBD camera and achieve state-of-the-art accuracy by comparing against alternative methods.
4 0.16189903 146 iccv-2013-Event Detection in Complex Scenes Using Interval Temporal Constraints
Author: Yifan Zhang, Qiang Ji, Hanqing Lu
Abstract: In complex scenes with multiple atomic events happening sequentially or in parallel, detecting each individual event separately may not always obtain robust and reliable result. It is essential to detect them in a holistic way which incorporates the causality and temporal dependency among them to compensate the limitation of current computer vision techniques. In this paper, we propose an interval temporal constrained dynamic Bayesian network to extend Allen's interval algebra network (IAN) [2] from a deterministic static model to a probabilistic dynamic system, which can not only capture the complex interval temporal relationships, but also model the evolution dynamics and handle the uncertainty from the noisy visual observation. In the model, the topology of the IAN on each time slice and the interlinks between the time slices are discovered by an advanced structure learning method. The duration of the event and the unsynchronized time lags between two correlated event intervals are captured by a duration model, so that we can better determine the temporal boundary of the event. Empirical results on two real world datasets show the power of the proposed interval temporal constrained model.
5 0.14346394 70 iccv-2013-Cascaded Shape Space Pruning for Robust Facial Landmark Detection
Author: Xiaowei Zhao, Shiguang Shan, Xiujuan Chai, Xilin Chen
Abstract: In this paper, we propose a novel cascaded face shape space pruning algorithm for robust facial landmark detection. Through progressively excluding the incorrect candidate shapes, our algorithm can accurately and efficiently achieve the globally optimal shape configuration. Specifically, individual landmark detectors are firstly applied to eliminate wrong candidates for each landmark. Then, the candidate shape space is further pruned by jointly removing incorrect shape configurations. To achieve this purpose, a discriminative structure classifier is designed to assess the candidate shape configurations. Based on the learned discriminative structure classifier, an efficient shape space pruning strategy is proposed to quickly reject most incorrect candidate shapes while preserve the true shape. The proposed algorithm is carefully evaluated on a large set of real world face images. In addition, comparison results on the publicly available BioID and LFW face databases demonstrate that our algorithm outperforms some state-of-the-art algorithms.
6 0.13494395 268 iccv-2013-Modeling 4D Human-Object Interactions for Event and Object Recognition
7 0.12365457 147 iccv-2013-Event Recognition in Photo Collections with a Stopwatch HMM
8 0.11873031 127 iccv-2013-Dynamic Pooling for Complex Event Recognition
9 0.11053441 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
10 0.1072019 157 iccv-2013-Fast Face Detector Training Using Tailored Views
11 0.10587005 85 iccv-2013-Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach
12 0.10358167 391 iccv-2013-Sieving Regression Forest Votes for Facial Feature Detection in the Wild
13 0.099388912 163 iccv-2013-Feature Weighting via Optimal Thresholding for Video Analysis
14 0.095420137 81 iccv-2013-Combining the Right Features for Complex Event Recognition
15 0.094682664 203 iccv-2013-How Related Exemplars Help Complex Event Detection in Web Videos?
16 0.092923716 440 iccv-2013-Video Event Understanding Using Natural Language Descriptions
17 0.090871684 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
18 0.090141498 251 iccv-2013-Like Father, Like Son: Facial Expression Dynamics for Kinship Verification
19 0.087576665 243 iccv-2013-Learning Slow Features for Behaviour Analysis
20 0.08683838 321 iccv-2013-Pose-Free Facial Landmark Fitting via Optimized Part Mixtures and Cascaded Deformable Shape Model
topicId topicWeight
[(0, 0.177), (1, 0.093), (2, 0.024), (3, 0.076), (4, 0.043), (5, -0.064), (6, 0.213), (7, 0.017), (8, -0.042), (9, -0.048), (10, -0.093), (11, 0.012), (12, 0.045), (13, 0.107), (14, -0.08), (15, -0.03), (16, 0.039), (17, 0.046), (18, -0.04), (19, -0.045), (20, 0.062), (21, 0.057), (22, -0.058), (23, 0.056), (24, -0.07), (25, -0.067), (26, -0.009), (27, -0.101), (28, 0.006), (29, 0.018), (30, -0.061), (31, 0.138), (32, -0.064), (33, 0.017), (34, -0.055), (35, 0.056), (36, -0.01), (37, 0.031), (38, -0.01), (39, 0.078), (40, 0.111), (41, 0.12), (42, 0.034), (43, -0.088), (44, 0.015), (45, 0.017), (46, -0.082), (47, 0.048), (48, 0.077), (49, 0.053)]
simIndex simValue paperId paperTitle
same-paper 1 0.94080818 155 iccv-2013-Facial Action Unit Event Detection by Cascade of Tasks
Author: Xiaoyu Ding, Wen-Sheng Chu, Fernando De_La_Torre, Jeffery F. Cohn, Qiao Wang
Abstract: Automatic facial Action Unit (AU) detection from video is a long-standing problem in facial expression analysis. AU detection is typically posed as a classification problem between frames or segments of positive examples and negative ones, where existing work emphasizes the use of different features or classifiers. In this paper, we propose a method called Cascade of Tasks (CoT) that combines the use of different tasks (i.e., frame, segment and transition) for AU event detection. We train CoT in a sequential manner embracing diversity, which ensures robustness and generalization to unseen data. In addition to conventional frame-based metrics that evaluate frames independently, we propose a new event-based metric to evaluate detection performance at event-level. We show how the CoT method consistently outperforms state-of-the-art approaches in both frame-based and event-based metrics, across three public datasets that differ in complexity: CK+, FERA and RU-FACS.
2 0.7949394 69 iccv-2013-Capturing Global Semantic Relationships for Facial Action Unit Recognition
Author: Ziheng Wang, Yongqiang Li, Shangfei Wang, Qiang Ji
Abstract: In this paper we tackle the problem of facial action unit (AU) recognition by exploiting the complex semantic relationships among AUs, which carry crucial top-down information yet have not been thoroughly exploited. Towards this goal, we build a hierarchical model that combines the bottom-level image features and the top-level AU relationships to jointly recognize AUs in a principled manner. The proposed model has two major advantages over existing methods. 1) Unlike methods that can only capture local pair-wise AU dependencies, our model is developed upon the restricted Boltzmann machine and therefore can exploit the global relationships among AUs. 2) Although AU relationships are influenced by many related factors such as facial expressions, these factors are generally ignored by the current methods. Our model, however, can successfully capture them to more accurately characterize the AU relationships. Efficient learning and inference algorithms of the proposed model are also developed. Experimental results on benchmark databases demonstrate the effectiveness of the proposed approach in modelling complex AU relationships as well as its superior AU recognition performance over existing approaches.
3 0.70731044 251 iccv-2013-Like Father, Like Son: Facial Expression Dynamics for Kinship Verification
Author: Hamdi Dibeklioglu, Albert Ali Salah, Theo Gevers
Abstract: Kinship verification from facial appearance is a difficult problem. This paper explores the possibility of employing facial expression dynamics in this problem. By using features that describe facial dynamics and spatio-temporal appearance over smile expressions, we show that it is possible to improve the state of the art in this problem, and verify that it is indeed possible to recognize kinship by resemblance of facial expressions. The proposed method is tested on different kin relationships. On the average, 72.89% verification accuracy is achieved on spontaneous smiles.
4 0.64292556 243 iccv-2013-Learning Slow Features for Behaviour Analysis
Author: Lazaros Zafeiriou, Mihalis A. Nicolaou, Stefanos Zafeiriou, Symeon Nikitidis, Maja Pantic
Abstract: A recently introduced latent feature learning technique for time varying dynamic phenomena analysis is the so-called Slow Feature Analysis (SFA). SFA is a deterministic component analysis technique for multi-dimensional sequences that by minimizing the variance of the first order time derivative approximation of the input signal finds uncorrelated projections that extract slowly-varying features ordered by their temporal consistency and constancy. In this paper, we propose a number of extensions in both the deterministic and the probabilistic SFA optimization frameworks. In particular, we derive a novel deterministic SFA algorithm that is able to identify linear projections that extract the common slowest varying features of two or more sequences. In addition, we propose an Expectation Maximization (EM) algorithm to perform inference in a probabilistic formulation of SFA and similarly extend it in order to handle two and more time varying data sequences. Moreover, we demonstrate that the probabilistic SFA (EMSFA) algorithm that discovers the common slowest varying latent space of multiple sequences can be combined with dynamic time warping techniques for robust sequence time-alignment. The proposed SFA algorithms were applied for facial behavior analysis demonstrating their usefulness and appropriateness for this task.
5 0.62706631 36 iccv-2013-Accurate and Robust 3D Facial Capture Using a Single RGBD Camera
Author: Yen-Lin Chen, Hsiang-Tao Wu, Fuhao Shi, Xin Tong, Jinxiang Chai
Abstract: This paper presents an automatic and robust approach that accurately captures high-quality 3D facial performances using a single RGBD camera. The key of our approach is to combine the power of automatic facial feature detection and image-based 3D nonrigid registration techniques for 3D facial reconstruction. In particular, we develop a robust and accurate image-based nonrigid registration algorithm that incrementally deforms a 3D template mesh model to best match observed depth image data and important facial features detected from single RGBD images. The whole process is fully automatic and robust because it is based on single frame facial registration framework. The system is flexible because it does not require any strong 3D facial priors such as blendshape models. We demonstrate the power of our approach by capturing a wide range of 3D facial expressions using a single RGBD camera and achieve state-of-the-art accuracy by comparing against alternative methods.
6 0.56996572 127 iccv-2013-Dynamic Pooling for Complex Event Recognition
7 0.56327999 391 iccv-2013-Sieving Regression Forest Votes for Facial Feature Detection in the Wild
8 0.54750657 146 iccv-2013-Event Detection in Complex Scenes Using Interval Temporal Constraints
9 0.52664411 70 iccv-2013-Cascaded Shape Space Pruning for Robust Facial Landmark Detection
10 0.49387717 268 iccv-2013-Modeling 4D Human-Object Interactions for Event and Object Recognition
11 0.49252141 147 iccv-2013-Event Recognition in Photo Collections with a Stopwatch HMM
12 0.48437971 149 iccv-2013-Exemplar-Based Graph Matching for Robust Facial Landmark Localization
13 0.46898088 85 iccv-2013-Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach
14 0.4503676 321 iccv-2013-Pose-Free Facial Landmark Fitting via Optimized Part Mixtures and Cascaded Deformable Shape Model
15 0.44870442 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
16 0.44564366 203 iccv-2013-How Related Exemplars Help Complex Event Detection in Web Videos?
17 0.41256696 277 iccv-2013-Multi-channel Correlation Filters
18 0.40737712 157 iccv-2013-Fast Face Detector Training Using Tailored Views
19 0.39679566 219 iccv-2013-Internet Based Morphable Model
20 0.39300951 397 iccv-2013-Space-Time Tradeoffs in Photo Sequencing
topicId topicWeight
[(2, 0.046), (4, 0.018), (7, 0.018), (12, 0.012), (26, 0.072), (31, 0.029), (35, 0.016), (42, 0.088), (48, 0.012), (50, 0.207), (64, 0.073), (73, 0.023), (78, 0.098), (89, 0.161), (95, 0.01), (98, 0.02)]
simIndex simValue paperId paperTitle
1 0.80661708 148 iccv-2013-Example-Based Facade Texture Synthesis
Author: Dengxin Dai, Hayko Riemenschneider, Gerhard Schmitt, Luc Van_Gool
Abstract: There is an increased interest in the efficient creation of city models, be it virtual or as-built. We present a method for synthesizing complex, photo-realistic facade images, from a single example. After parsing the example image into its semantic components, a tiling for it is generated. Novel tilings can then be created, yielding facade textures with different dimensions or with occluded parts inpainted. A genetic algorithm guides the novel facades as well as inpainted parts to be consistent with the example, both in terms of their overall structure and their detailed textures. Promising results for multiple standard datasets, in particular for the different building styles they contain, demonstrate the potential of the method.
same-paper 2 0.8025915 155 iccv-2013-Facial Action Unit Event Detection by Cascade of Tasks
Author: Xiaoyu Ding, Wen-Sheng Chu, Fernando De_La_Torre, Jeffery F. Cohn, Qiao Wang
Abstract: Automatic facial Action Unit (AU) detection from video is a long-standing problem in facial expression analysis. AU detection is typically posed as a classification problem between frames or segments of positive examples and negative ones, where existing work emphasizes the use of different features or classifiers. In this paper, we propose a method called Cascade of Tasks (CoT) that combines the use of different tasks (i.e., frame, segment and transition) for AU event detection. We train CoT in a sequential manner embracing diversity, which ensures robustness and generalization to unseen data. In addition to conventional frame-based metrics that evaluate frames independently, we propose a new event-based metric to evaluate detection performance at event-level. We show how the CoT method consistently outperforms state-of-the-art approaches in both frame-based and event-based metrics, across three public datasets that differ in complexity: CK+, FERA and RU-FACS.
3 0.7736218 423 iccv-2013-Towards Motion Aware Light Field Video for Dynamic Scenes
Author: Salil Tambe, Ashok Veeraraghavan, Amit Agrawal
Abstract: Current Light Field (LF) cameras offer fixed resolution in space, time and angle which is decided a-priori and is independent of the scene. These cameras either trade-off spatial resolution to capture single-shot LF [20, 27, 12] or tradeoff temporal resolution by assuming a static scene to capture high spatial resolution LF [18, 3]. Thus, capturing high spatial resolution LF video for dynamic scenes remains an open and challenging problem. We present the concept, design and implementation of a LF video camera that allows capturing high resolution LF video. The spatial, angular and temporal resolution are not fixed a-priori and we exploit the scene-specific redundancy in space, time and angle. Our reconstruction is motion-aware and offers a continuum of resolution tradeoff with increasing motion in the scene. The key idea is (a) to design efficient multiplexing matrices that allow resolution tradeoffs, (b) use dictionary learning and sparse representations for robust reconstruction, and (c) perform local motion-aware adaptive reconstruction. We perform extensive analysis and characterize the performance of our motion-aware reconstruction algorithm. We show realistic simulations using a graphics simulator as well as real results using a LCoS based programmable camera. We demonstrate novel results such as high resolution digital refocusing for dynamic moving objects.
4 0.74069774 69 iccv-2013-Capturing Global Semantic Relationships for Facial Action Unit Recognition
Author: Ziheng Wang, Yongqiang Li, Shangfei Wang, Qiang Ji
Abstract: In this paper we tackle the problem of facial action unit (AU) recognition by exploiting the complex semantic relationships among AUs, which carry crucial top-down information yet have not been thoroughly exploited. Towards this goal, we build a hierarchical model that combines the bottom-level image features and the top-level AU relationships to jointly recognize AUs in a principled manner. The proposed model has two major advantages over existing methods. 1) Unlike methods that can only capture local pair-wise AU dependencies, our model is developed upon the restricted Boltzmann machine and therefore can exploit the global relationships among AUs. 2) Although AU relationships are influenced by many related factors such as facial expressions, these factors are generally ignored by the current methods. Our model, however, can successfully capture them to more accurately characterize the AU relationships. Efficient learning and inference algorithms of the proposed model are also developed. Experimental results on benchmark databases demonstrate the effectiveness of the proposed approach in modelling complex AU relationships as well as its superior AU recognition performance over existing approaches.
5 0.73230469 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
Author: Feng Shi, Zhong Zhou, Jiangjian Xiao, Wei Wu
Abstract: Due to occlusions and objects' non-rigid deformation in the scene, the obtained motion trajectories from common trackers may contain a number of missing or mis-associated entries. To cluster such corrupted point based trajectories into multiple motions is still a hard problem. In this paper, we present an approach that exploits temporal and spatial characteristics from tracked points to facilitate segmentation of incomplete and corrupted trajectories, thereby obtaining highly robust results against severe data missing and noises. Our method first uses the Discrete Cosine Transform (DCT) bases as a temporal smoothness constraint on trajectory projection to ensure the validity of resulting components to repair pathological trajectories. Then, based on an observation that the trajectories of foreground and background in a scene may have different spatial distributions, we propose a two-stage clustering strategy that first performs foreground-background separation then segments remaining foreground trajectories. We show that, with this new clustering strategy, sequences with complex motions can be accurately segmented by even using a simple translational model. Finally, a series of experiments on the Hopkins 155 dataset and Berkeley motion segmentation dataset show the advantage of our method over other state-of-the-art motion segmentation algorithms in terms of both effectiveness and robustness.
6 0.72631097 252 iccv-2013-Line Assisted Light Field Triangulation and Stereo Matching
7 0.72508454 175 iccv-2013-From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding
8 0.72030222 344 iccv-2013-Recognising Human-Object Interaction via Exemplar Based Modelling
9 0.71914506 290 iccv-2013-New Graph Structured Sparsity Model for Multi-label Image Annotations
10 0.70750576 276 iccv-2013-Multi-attributed Dictionary Learning for Sparse Coding
11 0.7071954 149 iccv-2013-Exemplar-Based Graph Matching for Robust Facial Landmark Localization
12 0.70156378 165 iccv-2013-Find the Best Path: An Efficient and Accurate Classifier for Image Hierarchies
13 0.69692951 150 iccv-2013-Exemplar Cut
14 0.68803674 268 iccv-2013-Modeling 4D Human-Object Interactions for Event and Object Recognition
15 0.68427145 127 iccv-2013-Dynamic Pooling for Complex Event Recognition
16 0.68397593 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
17 0.68221402 146 iccv-2013-Event Detection in Complex Scenes Using Interval Temporal Constraints
18 0.68161851 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
19 0.68120873 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
20 0.68056095 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning