iccv iccv2013 iccv2013-67 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Fares Alnajar, Theo Gevers, Roberto Valenti, Sennay Ghebreab
Abstract: We present a novel method to auto-calibrate gaze estimators based on gaze patterns obtained from other viewers. Our method is based on the observation that the gaze patterns of humans are indicative of where a new viewer will look at [12]. When a new viewer is looking at a stimulus, we first estimate a topology of gaze points (initial gaze points). Next, these points are transformed so that they match the gaze patterns of other humans to find the correct gaze points. In a flexible uncalibrated setup with a web camera and no chin rest, the proposed method was tested on ten subjects and ten images. The method estimates the gaze points after looking at a stimulus for a few seconds with an average accuracy of 4.3◦. Although the reported performance is lower than what could be achieved with dedicated hardware or calibrated setup, the proposed method still provides a sufficient accuracy to trace the viewer attention. This is promising considering the fact that auto-calibration is done in a flexible setup , without the use of a chin rest, and based only on a few seconds of gaze initialization data. To the best of our knowledge, this is the first work to use human gaze patterns in order to auto-calibrate gaze estimators.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a novel method to auto-calibrate gaze estimators based on gaze patterns obtained from other viewers. [sent-6, score-1.963]
2 Our method is based on the observation that the gaze patterns of humans are indicative of where a new viewer will look at [12]. [sent-7, score-1.093]
3 When a new viewer is looking at a stimulus, we first estimate a topology of gaze points (initial gaze points). [sent-8, score-2.001]
4 Next, these points are transformed so that they match the gaze patterns of other humans to find the correct gaze points. [sent-9, score-2.056]
5 The method estimates the gaze points after looking at a stimulus for a few seconds with an average accuracy of 4.3◦. [sent-11, score-1.139]
6 This is promising considering the fact that auto-calibration is done in a flexible setup , without the use of a chin rest, and based only on a few seconds of gaze initialization data. [sent-14, score-1.026]
7 To the best of our knowledge, this is the first work to use human gaze patterns in order to auto-calibrate gaze estimators. [sent-15, score-1.954]
8 In general, gaze estimation methods fall into two categories: 1) appearance-based methods [5, 6, 7] and 2) 3D-eye-model-based methods [8, 9, 10, 14]. [sent-22, score-0.956]
9 The former extracts features from images of the eyes and maps them to points on the gaze plane (i.e., the display). [sent-23, score-1.04]
10 The intersection of the axis and the gaze plane determines the gaze point. [sent-27, score-1.892]
11 Regardless of the gaze estimation method, a calibration procedure is needed to set some parameters. [sent-28, score-0.994]
12 An extensive overview of the different approaches of gaze estimation can be found in [11]. [sent-30, score-0.956]
13 During calibration, users are usually asked to fixate their gaze on certain points while images of their eyes are captured. [sent-39, score-1.083]
14 In the case of, for example, tracing customers' attention in shops, estimating the gaze points or regions should be done passively. [sent-41, score-1.001]
15 However, in the case of passive gaze estimation, the calibration should be done completely automatically, without a calibration procedure enforced on the user. [sent-43, score-1.015]
16 [4, 3] treat saliency maps extracted from videos as probability distributions for gaze points. [sent-46, score-0.994]
17 Gaussian process regression is used to learn the mapping between the images of the eyes and the gaze points. [sent-47, score-0.991]
18 In this paper, we claim that the gaze patterns of several viewers provide important cues for the auto-calibration of new viewers. [sent-51, score-1.03]
19 This is based on the assumption that humans produce similar gaze patterns when they look at a stimulus. [sent-52, score-1.049]
20 To the best of our knowledge, our work is the first to use human gaze patterns in order to auto-calibrate gaze estimators. [sent-55, score-1.954]
21 We present a novel approach to auto-calibrate gaze estimators based on the similarity of human gaze patterns. [sent-56, score-1.911]
22 It would be difficult to indicate where the gaze points are on the gaze plane. [sent-59, score-1.928]
23 In a fully uncalibrated setting, when a new subject looks at a stimulus, initial gaze points are inferred. [sent-61, score-1.102]
24 Then, a transformation is computed to map the initial gaze points to match the gaze patterns of other users. [sent-62, score-2.064]
25 In this way, we use all the initial gaze points to match the human gaze patterns instead of using each gaze point at a time. [sent-63, score-3.003]
26 Consequently, the transformed points represent the auto-calibrated estimated gaze points. [sent-64, score-1.007]
27 Calibration-free gaze estimation using human gaze patterns We build upon the observation that gaze patterns of individuals are similar for a certain stimulus [12]. [sent-70, score-3.078]
28 Although there is no guarantee that people always look at the same regions, human gaze patterns provide important cues about the locations of the gaze points of a new observer. [sent-71, score-2.052]
29 The pipeline of the proposed method is as follows: when a new user is looking at a stimulus, the initial gaze points are computed first. [sent-72, score-1.075]
30 Bright spots indicate the saliency model predictions and the red dots refer to the human gaze points. [sent-76, score-0.991]
31 Then, a transformation is computed to map the initial gaze points to gaze patterns of other individuals. [sent-77, score-2.027]
32 Yet, for simplicity, we focus on translation and scaling, which are the most common transformations for gaze estimation. [sent-80, score-0.969]
33 Initial gaze points The final gaze points should eventually match the human gaze patterns. [sent-84, score-2.954]
34 However, we need to start from an initial estimation of the gaze points. [sent-85, score-0.991]
35 Hereafter, we present two methods to achieve this: estimation of initial gaze points from eye templates and estimation based on a 2D manifold. [sent-86, score-1.288]
36 1 Eye templates In this approach, the eye images of a person are captured (templates) while the person fixates points on a gaze plane. [sent-89, score-1.272]
37 When a new subject uses the gaze estimator, his or her eye images are compared with the already-collected eye templates. [sent-93, score-1.344]
38 This is different from the traditional calibration-based gaze estimator where the eye templates are captured and stored for each subject. [sent-94, score-1.199]
39 Template gaze patterns refer to the gaze points of other individuals for the same gaze plane (display). [sent-100, score-2.963]
40 When a new user looks at the stimulus, his or her initial gaze points are first estimated, which preserves the relative locations between the gaze points. [sent-101, score-2.02]
41 These points are transformed so that they match the template gaze patterns. [sent-102, score-1.112]
42 For a new user in a different unknown scene setup, the initial gaze points will be incorrect (without calibration). [sent-121, score-1.047]
43 However, the relative locations between the gaze points are preserved. [sent-122, score-1.012]
44 The projection of features of 9 eye images on a 2-D manifold (red, left) and the positions of the corresponding gaze points on the gaze plane (blue, right). [sent-124, score-2.171]
45 The 2D manifold is computed using 800 eye images corresponding to various locations on the gaze plane. [sent-125, score-1.191]
46 Figure 3 shows the projection of features of 9 eye images on a 2D manifold and their corresponding 9 gaze points on the gaze plane. [sent-131, score-2.157]
47 It can be derived that the feature projections preserve the relative locations of the corresponding gaze points. [sent-132, score-0.962]
48 However, the locations on the 2D manifold might be interchanged, transposed, or rotated when compared with the corresponding gaze points. [sent-134, score-1.003]
49 As this step is performed once offline, the projected locations are checked once and transformed to match the corresponding gaze point locations. [sent-137, score-1.068]
50 When a new user looks at a stimulus, the eye features are projected on the offline-learned 2D manifold and the projected values are treated as initial gaze points. [sent-139, score-1.263]
51 The previous two methods (eye templates and 2D manifold) provide a way to find the initial gaze points. [sent-140, score-1.016]
52 In the next section we explain how to map these points to match the template (human) gaze patterns. [sent-141, score-1.094]
53 [12] show that the fixation points of several humans correspond strongly with the gaze points of a new user. [sent-145, score-1.088]
54 To this end, we transform the initial (uncalibrated) gaze points so they match the template gaze patterns for a stimulus. [sent-147, score-2.132]
55 By applying the aforementioned transformation, we aim to transfer the gaze points to their correct positions without explicit calibration. [sent-148, score-0.989]
56 P = {p1, p2, . . . , pM} denotes the gaze patterns of M users (hereafter, we call them the template gaze patterns), where pu = {p1u, p2u, . . . [sent-154, score-1.952]
57 pSuu} consists of the gaze points of user u, and Su is different for each user. [sent-158, score-1.012]
58 The following two methods aim to transform the initial gaze points p so that they match the template gaze patterns P. [sent-164, score-1.108]
59 φ̄ is the computed mapping and p̄ = φ̄(p) represents the auto-calibrated gaze points. [sent-181, score-0.954]
60 Note that we try to match the initial gaze points with all the gaze patterns in P simultaneously. [sent-182, score-2.052]
61 Since our matching measure is biased to smaller scales of the initial gaze points, the minimum scale is set to the average scale of the gaze patterns. [sent-190, score-1.913]
62 To improve the search efficiency, we set the scale and the location of the initial gaze points to the average scale and location of the template gaze patterns. [sent-191, score-2.043]
63 2 Mixture model To find the best mapping, this method models the fixations of the template gaze patterns P by a Gaussian mixture and transforms the initial gaze points to maximize the probability density function of the transformed points. [sent-194, score-2.161]
64 Hence, we can use this data as template gaze patterns. [sent-213, score-1.019]
65 For obtaining the ground truth, the Tobii T60XL gaze estimator [16] is used. [sent-220, score-0.969]
66 The recording of each subject is saved and later analyzed to estimate the gaze points. [sent-235, score-0.979]
67 Results on artificially distorted data Our assumption is that a collection of gaze patterns of individuals can be used to automatically infer the calibration for the gaze estimation of a new user. [sent-246, score-2.058]
68 The distorted fixations are considered as a simulation of the initial (uncalibrated) gaze points. [sent-249, score-1.019]
69 2 are used to transform the distorted gaze points to their correct locations. [sent-255, score-1.016]
70 We discarded the images where the number of active subjects (10 or more fixations) was less than 6 to ensure sufficient gaze patterns. [sent-258, score-1.004]
71 The same procedure is applied on the ground truth gaze points obtained from our collected data. [sent-262, score-0.989]
72 The results show the validity of the proposed methods to bring the distorted (uncalibrated) gaze points closer to their correct locations for different sets of template gaze patterns. [sent-265, score-2.074]
73 Results on the real data The previous section shows how artificially distorted gaze points can be transformed to their correct locations with sufficient accuracy using the K-closest points. [sent-270, score-1.084]
74 In this section, we use the aforementioned collected data to automatically calibrate the gaze estimator and find the gaze points from the videos acquired from the web camera. [sent-271, score-1.99]
75 The distances between the initial gaze points are much larger than the actual corresponding gaze points. [sent-288, score-1.963]
76 Yet, this will not affect the results as the initial gaze points will be scaled while finding the mapping to match the initial gaze points with the template gaze patterns. [sent-289, score-3.107]
77 We select the template gaze patterns in two ways: First, we use the fixation points provided in the eye tracking dataset [12]. [sent-290, score-1.362]
78 In this case, for each subject, we consider the gaze points of the other subjects as template gaze patterns. [sent-292, score-2.056]
79 The K-closest points and mixture model fitting methods are applied to the initial gaze points. [sent-293, score-1.042]
80 The results show that the K-closest points method achieves higher accuracy than using the mixture model while 2D manifold outperforms eye templates for both template gaze pattern sets. [sent-295, score-1.369]
81 Regarding the template gaze patterns, the accuracies are similar for both sets with a slight improvement using the gaze patterns from [12] dataset. [sent-300, score-2.044]
82 The template gaze pattern sets were collected in two different experiments on two different groups of subjects. [sent-301, score-1.019]
83 This is interesting as it shows the general similarity of gaze patterns and hence suggests the validity of using them in auto-calibration regardless of the viewers. [sent-302, score-1.019]
84 The relatively lower accuracies for some subjects might be due either to errors in estimating the initial gaze points, i.e. [sent-304, score-1.044]
85 because of eye appearance variations with respect to the template subject's eye templates, which lead to incorrect initialization, or because of the gaze behavior of the subjects and its variation from the template gaze patterns. [sent-306, score-2.533]
86 [3] adopt an appearance-based gaze estimator and use visual saliency for auto-calibration. [sent-317, score-1.009]
87 Accuracies over different methods and template gaze pattern sets. [sent-332, score-1.019]
88 Accuracies of the gaze estimation auto-calibrated using K-closest points and 2D manifold. [sent-340, score-1.006]
89 This is especially important for tasks where gaze estimation is required with no active participation from the user and using off-the-shelf hardware. [sent-343, score-1.011]
90 There is a trend nowadays to use eye gaze estimation for electronic consumer relationship marketing, which aims to employ information technology to understand and fulfill consumers' needs. [sent-345, score-1.186]
91 Practical gaze estimators should be invariant to such head pose changes. [sent-351, score-0.972]
92 The method assumes that the template gaze patterns are already available, which might not always be the case. [sent-352, score-1.083]
93 Our future work is to make use of the initial gaze points of subsequent subjects to gradually auto-calibrate the gaze estimator and to combine the saliency information with the template gaze patterns. [sent-353, score-3.1]
94 Conclusion We presented a novel method to auto-calibrate gaze estimators in an uncalibrated setup. [sent-355, score-0.998]
95 Based on the observation that humans produce similar gaze patterns when looking at a stimulus, we use the gaze patterns of individuals to estimate the gaze points for new viewers without active calibration. [sent-356, score-3.106]
96 To estimate the gaze points, the viewer needs to look at an image for only 3 seconds without any explicit participation in the calibration. [sent-358, score-1.023]
97 To the best of our knowledge, this is the first work to use human gaze patterns in order to auto-calibrate gaze estimators. [sent-361, score-1.954]
98 Calibration-free gaze sensing using saliency maps. [sent-390, score-0.979]
99 General theory of remote gaze estimation using the pupil center and corneal reflections. [sent-431, score-0.992]
100 The red traces represent the estimated gaze points while the blue traces represent the ground truth obtained from the Tobii gaze estimator. [sent-494, score-1.958]
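The sketches below illustrate, under stated assumptions, the main computational steps summarized in the sentence list above; they are not the authors' implementation. The first covers the eye-template initialization (sentences 36-38): the new viewer's eye image is compared against templates captured while another person fixated known points on the gaze plane, and the gaze point of the best-matching template is returned. The sum-of-squared-differences comparison and all function names are assumptions, since the excerpt does not name the comparison measure.

```python
import numpy as np

def initial_points_from_templates(eye_image, template_images, template_points):
    """Return the gaze-plane point of the best-matching eye template.

    template_images were captured while a (different) person fixated the known
    template_points on the gaze plane. SSD is an assumed similarity measure."""
    eye = eye_image.astype(np.float64).ravel()
    costs = [np.sum((eye - t.astype(np.float64).ravel()) ** 2)
             for t in template_images]
    return template_points[int(np.argmin(costs))]
```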
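The 2D-manifold initialization (sentences 44-50) projects eye-appearance features onto an offline-learned two-dimensional embedding and treats the projected coordinates as initial gaze points that preserve the relative layout of the true gaze points. The sketch below uses scikit-learn's Isomap purely as a stand-in; the excerpt does not name the manifold-learning technique or the feature descriptor.

```python
import numpy as np
from sklearn.manifold import Isomap

def learn_eye_manifold(training_eye_features, n_neighbors=10):
    """Offline: embed eye-appearance feature vectors on a 2D manifold.
    Isomap and n_neighbors are assumptions, not taken from the paper."""
    return Isomap(n_neighbors=n_neighbors, n_components=2).fit(training_eye_features)

def initial_gaze_points_manifold(embedder, new_eye_features):
    """Online: project the new viewer's eye features; the 2D coordinates are
    treated as initial (uncalibrated) gaze points."""
    return embedder.transform(new_eye_features)
```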
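Sentences 54-62 describe searching for a translation and scaling of the initial gaze points so that they match the pooled template gaze patterns, with the search initialized at the average scale and location of the template patterns and the minimum scale bounded because the matching measure favors small scales. A minimal grid-search sketch with a K-closest-points cost is given below; the parameter ranges, the cost definition, and the isotropic scaling are illustrative assumptions.

```python
import numpy as np

def k_closest_cost(points, template_points, k=3):
    """Mean distance from each gaze point to its k closest template fixations
    (pooled over all template gaze patterns)."""
    d = np.linalg.norm(points[:, None, :] - template_points[None, :, :], axis=2)
    return np.sort(d, axis=1)[:, :k].mean()

def auto_calibrate_k_closest(initial, template_points,
                             rel_scales=np.linspace(1.0, 1.6, 13),
                             offsets=np.linspace(-100.0, 100.0, 21)):
    """Grid-search a scale and translation mapping the initial (uncalibrated)
    gaze points onto the template gaze patterns."""
    # Start from the average scale/location of the template patterns; the
    # minimum relative scale is 1.0 because the cost is biased to small scales.
    init = (initial - initial.mean(0)) / initial.std()
    base_scale, base_shift = template_points.std(), template_points.mean(0)
    best, best_cost = None, np.inf
    for s in rel_scales:
        for dx in offsets:
            for dy in offsets:
                cand = init * (base_scale * s) + base_shift + np.array([dx, dy])
                cost = k_closest_cost(cand, template_points)
                if cost < best_cost:
                    best, best_cost = cand, cost
    return best
```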
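Sentence 63 summarizes the alternative matching strategy: model the pooled template fixations with a Gaussian mixture and transform the initial gaze points to maximize their probability density. A sketch using scikit-learn's GaussianMixture follows; the number of components, the search ranges, and the use of the template patterns' mean and standard deviation as base shift and scale are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_template_mixture(template_points, n_components=5):
    """Offline: model the pooled template fixations with a Gaussian mixture.
    n_components is an assumed value."""
    return GaussianMixture(n_components=n_components,
                           covariance_type='full').fit(template_points)

def calibrate_with_mixture(initial, gmm, base_scale, base_shift,
                           rel_scales=np.linspace(1.0, 1.6, 13),
                           offsets=np.linspace(-100.0, 100.0, 21)):
    """Search a scale and translation maximizing the mean log-likelihood of the
    transformed initial gaze points under the template mixture.
    base_scale/base_shift: std and mean of the pooled template fixations."""
    centered = (initial - initial.mean(0)) / initial.std()
    best, best_score = None, -np.inf
    for s in rel_scales:
        for dx in offsets:
            for dy in offsets:
                cand = centered * (base_scale * s) + base_shift + np.array([dx, dy])
                score = gmm.score(cand)  # mean log-likelihood per sample
                if score > best_score:
                    best, best_score = cand, score
    return best
```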
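The accuracies quoted throughout (e.g., the 4.3◦ average against the Tobii ground truth) are visual angles between estimated and ground-truth gaze points. The helper below shows one common way to convert an on-screen pixel error into degrees; the screen resolution and viewing distance in the usage comment are illustrative values, not the paper's setup.

```python
import numpy as np

def angular_error_deg(est_px, gt_px, px_per_cm, viewing_distance_cm):
    """Visual angle (degrees) subtended by the on-screen error between an
    estimated and a ground-truth gaze point."""
    err_cm = np.linalg.norm(np.asarray(est_px, float) - np.asarray(gt_px, float)) / px_per_cm
    return np.degrees(np.arctan2(err_cm, viewing_distance_cm))

# Example (assumed geometry): angular_error_deg((512, 300), (560, 330),
#                                               px_per_cm=38.0,
#                                               viewing_distance_cm=60.0)
```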
wordName wordTfidf (topN-words)
[('gaze', 0.939), ('eye', 0.188), ('stimulus', 0.098), ('template', 0.08), ('patterns', 0.064), ('sugano', 0.052), ('points', 0.05), ('subjects', 0.048), ('tobii', 0.043), ('templates', 0.042), ('manifold', 0.041), ('saliency', 0.04), ('uncalibrated', 0.038), ('calibration', 0.038), ('eyes', 0.037), ('stimuli', 0.036), ('initial', 0.035), ('chin', 0.035), ('viewer', 0.031), ('estimator', 0.03), ('fixate', 0.03), ('typing', 0.029), ('subject', 0.029), ('fixation', 0.028), ('setup', 0.028), ('looking', 0.028), ('distorted', 0.027), ('viewers', 0.027), ('look', 0.025), ('match', 0.025), ('pupil', 0.024), ('judd', 0.024), ('user', 0.023), ('locations', 0.023), ('marketing', 0.023), ('accuracies', 0.022), ('hansen', 0.022), ('landscapes', 0.022), ('pj', 0.021), ('humans', 0.021), ('estimators', 0.021), ('equipment', 0.021), ('consumers', 0.019), ('translation', 0.019), ('mixture', 0.018), ('fixations', 0.018), ('ten', 0.018), ('transformed', 0.018), ('individuals', 0.018), ('asked', 0.017), ('valenti', 0.017), ('active', 0.017), ('web', 0.017), ('estimation', 0.017), ('advertisements', 0.016), ('etra', 0.016), ('fixating', 0.016), ('inch', 0.016), ('artificially', 0.016), ('validity', 0.016), ('dedicated', 0.016), ('mapping', 0.015), ('participation', 0.015), ('traces', 0.015), ('videos', 0.015), ('guestrin', 0.014), ('tries', 0.014), ('volume', 0.014), ('infrared', 0.014), ('plane', 0.014), ('topology', 0.014), ('display', 0.014), ('tracking', 0.013), ('seconds', 0.013), ('amsterdam', 0.013), ('projected', 0.013), ('indicative', 0.013), ('head', 0.012), ('human', 0.012), ('tracing', 0.012), ('transformation', 0.012), ('remote', 0.012), ('corners', 0.012), ('matsushita', 0.012), ('issue', 0.012), ('looks', 0.011), ('flexible', 0.011), ('rest', 0.011), ('street', 0.011), ('regarding', 0.011), ('recording', 0.011), ('accuracy', 0.011), ('transformations', 0.011), ('conference', 0.011), ('cropped', 0.011), ('principal', 0.011), ('mounted', 0.01), ('screen', 0.01), ('users', 0.01), ('camera', 0.01)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 67 iccv-2013-Calibration-Free Gaze Estimation Using Human Gaze Patterns
Author: Fares Alnajar, Theo Gevers, Roberto Valenti, Sennay Ghebreab
Abstract: We present a novel method to auto-calibrate gaze estimators based on gaze patterns obtained from other viewers. Our method is based on the observation that the gaze patterns of humans are indicative of where a new viewer will look at [12]. When a new viewer is looking at a stimulus, we first estimate a topology of gaze points (initial gaze points). Next, these points are transformed so that they match the gaze patterns of other humans to find the correct gaze points. In a flexible uncalibrated setup with a web camera and no chin rest, the proposed method was tested on ten subjects and ten images. The method estimates the gaze points after looking at a stimulus for a few seconds with an average accuracy of 4.3◦. Although the reported performance is lower than what could be achieved with dedicated hardware or calibrated setup, the proposed method still provides a sufficient accuracy to trace the viewer attention. This is promising considering the fact that auto-calibration is done in a flexible setup , without the use of a chin rest, and based only on a few seconds of gaze initialization data. To the best of our knowledge, this is the first work to use human gaze patterns in order to auto-calibrate gaze estimators.
2 0.81920391 247 iccv-2013-Learning to Predict Gaze in Egocentric Video
Author: Yin Li, Alireza Fathi, James M. Rehg
Abstract: We present a model for gaze prediction in egocentric video by leveraging the implicit cues that exist in camera wearer’s behaviors. Specifically, we compute the camera wearer’s head motion and hand location from the video and combine them to estimate where the eyes look. We further model the dynamic behavior of the gaze, in particular fixations, as latent variables to improve the gaze prediction. Our gaze prediction results outperform the state-of-the-art algorithms by a large margin on publicly available egocentric vision datasets. In addition, we demonstrate that we get a significant performance boost in recognizing daily actions and segmenting foreground objects by plugging in our gaze predictions into state-of-the-art methods.
3 0.56302458 381 iccv-2013-Semantically-Based Human Scanpath Estimation with HMMs
Author: Huiying Liu, Dong Xu, Qingming Huang, Wen Li, Min Xu, Stephen Lin
Abstract: We present a method for estimating human scanpaths, which are sequences of gaze shifts that follow visual attention over an image. In this work, scanpaths are modeled based on three principal factors that influence human attention, namely low-levelfeature saliency, spatialposition, and semantic content. Low-level feature saliency is formulated as transition probabilities between different image regions based on feature differences. The effect of spatial position on gaze shifts is modeled as a Levy flight with the shifts following a 2D Cauchy distribution. To account for semantic content, we propose to use a Hidden Markov Model (HMM) with a Bag-of-Visual-Words descriptor of image regions. An HMM is well-suited for this purpose in that 1) the hidden states, obtained by unsupervised learning, can represent latent semantic concepts, 2) the prior distribution of the hidden states describes visual attraction to the semantic concepts, and 3) the transition probabilities represent human gaze shift patterns. The proposed method is applied to task-driven viewing processes. Experiments and analysis performed on human eye gaze data verify the effectiveness of this method.
4 0.40701428 325 iccv-2013-Predicting Primary Gaze Behavior Using Social Saliency Fields
Author: Hyun Soo Park, Eakta Jain, Yaser Sheikh
Abstract: We present a method to predict primary gaze behavior in a social scene. Inspired by the study of electric fields, we posit “social charges ”—latent quantities that drive the primary gaze behavior of members of a social group. These charges induce a gradient field that defines the relationship between the social charges and the primary gaze direction of members in the scene. This field model is used to predict primary gaze behavior at any location or time in the scene. We present an algorithm to estimate the time-varying behavior of these charges from the primary gaze behavior of measured observers in the scene. We validate the model by evaluating its predictive precision via cross-validation in a variety of social scenes.
5 0.14692958 50 iccv-2013-Analysis of Scores, Datasets, and Models in Visual Saliency Prediction
Author: Ali Borji, Hamed R. Tavakoli, Dicky N. Sihite, Laurent Itti
Abstract: Significant recent progress has been made in developing high-quality saliency models. However, less effort has been undertaken on fair assessment of these models, over large standardized datasets and correctly addressing confounding factors. In this study, we pursue a critical and quantitative look at challenges (e.g., center-bias, map smoothing) in saliency modeling and the way they affect model accuracy. We quantitatively compare 32 state-of-the-art models (using the shuffled AUC score to discount center-bias) on 4 benchmark eye movement datasets, for prediction of human fixation locations and scanpath sequence. We also account for the role of map smoothing. We find that, although model rankings vary, some (e.g., AWS, LG, AIM, and HouNIPS) consistently outperform other models over all datasets. Some models work well for prediction of both fixation locations and scanpath sequence (e.g., Judd, GBVS). Our results show low prediction accuracy for models over emotional stimuli from the NUSEF dataset. Our last benchmark, for the first time, gauges the ability of models to decode the stimulus category from statistics of fixations, saccades, and model saliency values at fixated locations. In this test, ITTI and AIM models win over other models. Our benchmark provides a comprehensive high-level picture of the strengths and weaknesses of many popular models, and suggests future research directions in saliency modeling.
6 0.14393017 373 iccv-2013-Saliency and Human Fixations: State-of-the-Art and Study of Comparison Metrics
7 0.13240454 180 iccv-2013-From Where and How to What We See
8 0.10510326 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?
9 0.06141815 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach
10 0.055468041 369 iccv-2013-Saliency Detection: A Boolean Map Approach
11 0.049629994 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
12 0.043412201 370 iccv-2013-Saliency Detection in Large Point Sets
13 0.04290171 372 iccv-2013-Saliency Detection via Dense and Sparse Reconstruction
14 0.039393015 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
15 0.038343068 71 iccv-2013-Category-Independent Object-Level Saliency Detection
17 0.032853208 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
18 0.032664269 246 iccv-2013-Learning the Visual Interpretation of Sentences
19 0.032236133 205 iccv-2013-Human Re-identification by Matching Compositional Template with Cluster Sampling
20 0.032071967 91 iccv-2013-Contextual Hypergraph Modeling for Salient Object Detection
topicId topicWeight
[(0, 0.087), (1, -0.025), (2, 0.193), (3, -0.102), (4, -0.058), (5, -0.035), (6, 0.081), (7, -0.033), (8, 0.003), (9, 0.114), (10, 0.023), (11, -0.131), (12, -0.113), (13, 0.114), (14, -0.031), (15, 0.285), (16, -0.533), (17, 0.137), (18, 0.313), (19, -0.396), (20, -0.053), (21, 0.037), (22, -0.138), (23, 0.051), (24, -0.071), (25, 0.03), (26, -0.095), (27, 0.031), (28, -0.054), (29, 0.003), (30, -0.001), (31, 0.033), (32, -0.006), (33, 0.022), (34, -0.019), (35, 0.005), (36, 0.034), (37, 0.002), (38, -0.001), (39, 0.003), (40, -0.021), (41, -0.019), (42, 0.006), (43, 0.022), (44, 0.003), (45, 0.028), (46, 0.02), (47, -0.012), (48, 0.002), (49, 0.001)]
simIndex simValue paperId paperTitle
same-paper 1 0.98263985 67 iccv-2013-Calibration-Free Gaze Estimation Using Human Gaze Patterns
Author: Fares Alnajar, Theo Gevers, Roberto Valenti, Sennay Ghebreab
Abstract: We present a novel method to auto-calibrate gaze estimators based on gaze patterns obtained from other viewers. Our method is based on the observation that the gaze patterns of humans are indicative of where a new viewer will look at [12]. When a new viewer is looking at a stimulus, we first estimate a topology of gaze points (initial gaze points). Next, these points are transformed so that they match the gaze patterns of other humans to find the correct gaze points. In a flexible uncalibrated setup with a web camera and no chin rest, the proposed method was tested on ten subjects and ten images. The method estimates the gaze points after looking at a stimulus for a few seconds with an average accuracy of 4.3◦. Although the reported performance is lower than what could be achieved with dedicated hardware or calibrated setup, the proposed method still provides a sufficient accuracy to trace the viewer attention. This is promising considering the fact that auto-calibration is done in a flexible setup , without the use of a chin rest, and based only on a few seconds of gaze initialization data. To the best of our knowledge, this is the first work to use human gaze patterns in order to auto-calibrate gaze estimators.
2 0.95993853 247 iccv-2013-Learning to Predict Gaze in Egocentric Video
Author: Yin Li, Alireza Fathi, James M. Rehg
Abstract: We present a model for gaze prediction in egocentric video by leveraging the implicit cues that exist in camera wearer’s behaviors. Specifically, we compute the camera wearer’s head motion and hand location from the video and combine them to estimate where the eyes look. We further model the dynamic behavior of the gaze, in particular fixations, as latent variables to improve the gaze prediction. Our gaze prediction results outperform the state-of-the-art algorithms by a large margin on publicly available egocentric vision datasets. In addition, we demonstrate that we get a significant performance boost in recognizing daily actions and segmenting foreground objects by plugging in our gaze predictions into state-of-the-art methods.
3 0.89583713 325 iccv-2013-Predicting Primary Gaze Behavior Using Social Saliency Fields
Author: Hyun Soo Park, Eakta Jain, Yaser Sheikh
Abstract: We present a method to predict primary gaze behavior in a social scene. Inspired by the study of electric fields, we posit “social charges ”—latent quantities that drive the primary gaze behavior of members of a social group. These charges induce a gradient field that defines the relationship between the social charges and the primary gaze direction of members in the scene. This field model is used to predict primary gaze behavior at any location or time in the scene. We present an algorithm to estimate the time-varying behavior of these charges from the primary gaze behavior of measured observers in the scene. We validate the model by evaluating its predictive precision via cross-validation in a variety of social scenes.
4 0.77984625 381 iccv-2013-Semantically-Based Human Scanpath Estimation with HMMs
Author: Huiying Liu, Dong Xu, Qingming Huang, Wen Li, Min Xu, Stephen Lin
Abstract: We present a method for estimating human scanpaths, which are sequences of gaze shifts that follow visual attention over an image. In this work, scanpaths are modeled based on three principal factors that influence human attention, namely low-levelfeature saliency, spatialposition, and semantic content. Low-level feature saliency is formulated as transition probabilities between different image regions based on feature differences. The effect of spatial position on gaze shifts is modeled as a Levy flight with the shifts following a 2D Cauchy distribution. To account for semantic content, we propose to use a Hidden Markov Model (HMM) with a Bag-of-Visual-Words descriptor of image regions. An HMM is well-suited for this purpose in that 1) the hidden states, obtained by unsupervised learning, can represent latent semantic concepts, 2) the prior distribution of the hidden states describes visual attraction to the semantic concepts, and 3) the transition probabilities represent human gaze shift patterns. The proposed method is applied to task-driven viewing processes. Experiments and analysis performed on human eye gaze data verify the effectiveness of this method.
5 0.2597957 50 iccv-2013-Analysis of Scores, Datasets, and Models in Visual Saliency Prediction
Author: Ali Borji, Hamed R. Tavakoli, Dicky N. Sihite, Laurent Itti
Abstract: Significant recent progress has been made in developing high-quality saliency models. However, less effort has been undertaken on fair assessment of these models, over large standardized datasets and correctly addressing confounding factors. In this study, we pursue a critical and quantitative look at challenges (e.g., center-bias, map smoothing) in saliency modeling and the way they affect model accuracy. We quantitatively compare 32 state-of-the-art models (using the shuffled AUC score to discount center-bias) on 4 benchmark eye movement datasets, for prediction of human fixation locations and scanpath sequence. We also account for the role of map smoothing. We find that, although model rankings vary, some (e.g., AWS, LG, AIM, and HouNIPS) consistently outperform other models over all datasets. Some models work well for prediction of both fixation locations and scanpath sequence (e.g., Judd, GBVS). Our results show low prediction accuracy for models over emotional stimuli from the NUSEF dataset. Our last benchmark, for the first time, gauges the ability of models to decode the stimulus category from statistics of fixations, saccades, and model saliency values at fixated locations. In this test, ITTI and AIM models win over other models. Our benchmark provides a comprehensive high-level picture of the strengths and weaknesses of many popular models, and suggests future research directions in saliency modeling.
6 0.2388574 373 iccv-2013-Saliency and Human Fixations: State-of-the-Art and Study of Comparison Metrics
7 0.20935211 180 iccv-2013-From Where and How to What We See
8 0.17363337 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?
9 0.15159541 369 iccv-2013-Saliency Detection: A Boolean Map Approach
10 0.14889294 416 iccv-2013-The Interestingness of Images
11 0.14534362 267 iccv-2013-Model Recommendation with Virtual Probes for Egocentric Hand Detection
12 0.11476362 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences
13 0.1065663 246 iccv-2013-Learning the Visual Interpretation of Sentences
14 0.10478399 433 iccv-2013-Understanding High-Level Semantics by Modeling Traffic Patterns
15 0.10248478 446 iccv-2013-Visual Semantic Complex Network for Web Images
16 0.10214952 407 iccv-2013-Subpixel Scanning Invariant to Indirect Lighting Using Quadratic Code Length
17 0.10184869 301 iccv-2013-Optimal Orthogonal Basis and Image Assimilation: Motion Modeling
18 0.10068236 49 iccv-2013-An Enhanced Structure-from-Motion Paradigm Based on the Absolute Dual Quadric and Images of Circular Points
19 0.098874778 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach
20 0.098836452 278 iccv-2013-Multi-scale Topological Features for Hand Posture Representation and Analysis
topicId topicWeight
[(2, 0.057), (7, 0.021), (12, 0.018), (26, 0.072), (31, 0.037), (34, 0.015), (42, 0.124), (64, 0.04), (73, 0.04), (89, 0.146), (91, 0.23), (95, 0.013), (97, 0.015), (98, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.78034663 67 iccv-2013-Calibration-Free Gaze Estimation Using Human Gaze Patterns
Author: Fares Alnajar, Theo Gevers, Roberto Valenti, Sennay Ghebreab
Abstract: We present a novel method to auto-calibrate gaze estimators based on gaze patterns obtained from other viewers. Our method is based on the observation that the gaze patterns of humans are indicative of where a new viewer will look at [12]. When a new viewer is looking at a stimulus, we first estimate a topology of gaze points (initial gaze points). Next, these points are transformed so that they match the gaze patterns of other humans to find the correct gaze points. In a flexible uncalibrated setup with a web camera and no chin rest, the proposed method was tested on ten subjects and ten images. The method estimates the gaze points after looking at a stimulus for a few seconds with an average accuracy of 4.3◦. Although the reported performance is lower than what could be achieved with dedicated hardware or calibrated setup, the proposed method still provides a sufficient accuracy to trace the viewer attention. This is promising considering the fact that auto-calibration is done in a flexible setup , without the use of a chin rest, and based only on a few seconds of gaze initialization data. To the best of our knowledge, this is the first work to use human gaze patterns in order to auto-calibrate gaze estimators.
2 0.68470979 44 iccv-2013-Adapting Classification Cascades to New Domains
Author: Vidit Jain, Sachin Sudhakar Farfade
Abstract: Classification cascades have been very effective for object detection. Such a cascade fails to perform well in data domains with variations in appearances that may not be captured in the training examples. This limited generalization severely restricts the domains for which they can be used effectively. A common approach to address this limitation is to train a new cascade of classifiers from scratch for each of the new domains. Building separate detectors for each of the different domains requires huge annotation and computational effort, making it not scalable to a large number of data domains. Here we present an algorithm for quickly adapting a pre-trained cascade of classifiers using a small number oflabeledpositive instancesfrom a different yet similar data domain. In our experiments with images of human babies and human-like characters from movies, we demonstrate that the adapted cascade significantly outperforms both of the original cascade and the one trained from scratch using the given training examples. –
3 0.68414313 259 iccv-2013-Manifold Based Face Synthesis from Sparse Samples
Author: Hongteng Xu, Hongyuan Zha
Abstract: Data sparsity has been a thorny issuefor manifold-based image synthesis, and in this paper we address this critical problem by leveraging ideas from transfer learning. Specifically, we propose methods based on generating auxiliary data in the form of synthetic samples using transformations of the original sparse samples. To incorporate the auxiliary data, we propose a weighted data synthesis method, which adaptively selects from the generated samples for inclusion during the manifold learning process via a weighted iterative algorithm. To demonstrate the feasibility of the proposed method, we apply it to the problem of face image synthesis from sparse samples. Compared with existing methods, the proposed method shows encouraging results with good performance improvements.
4 0.6835435 330 iccv-2013-Proportion Priors for Image Sequence Segmentation
Author: Claudia Nieuwenhuis, Evgeny Strekalovskiy, Daniel Cremers
Abstract: We propose a convex multilabel framework for image sequence segmentation which allows to impose proportion priors on object parts in order to preserve their size ratios across multiple images. The key idea is that for strongly deformable objects such as a gymnast the size ratio of respective regions (head versus torso, legs versus full body, etc.) is typically preserved. We propose different ways to impose such priors in a Bayesian framework for image segmentation. We show that near-optimal solutions can be computed using convex relaxation techniques. Extensive qualitative and quantitative evaluations demonstrate that the proportion priors allow for highly accurate segmentations, avoiding seeping-out of regions and preserving semantically relevant small-scale structures such as hands or feet. They naturally apply to multiple object instances such as players in sports scenes, and they can relate different objects instead of object parts, e.g. organs in medical imaging. The algorithm is efficient and easily parallelized leading to proportion-consistent segmentations at runtimes around one second.
5 0.6833663 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
Author: Haoxiang Li, Gang Hua, Zhe Lin, Jonathan Brandt, Jianchao Yang
Abstract: We propose an unsupervised detector adaptation algorithm to adapt any offline trained face detector to a specific collection of images, and hence achieve better accuracy. The core of our detector adaptation algorithm is a probabilistic elastic part (PEP) model, which is offline trained with a set of face examples. It produces a statisticallyaligned part based face representation, namely the PEP representation. To adapt a general face detector to a collection of images, we compute the PEP representations of the candidate detections from the general face detector, and then train a discriminative classifier with the top positives and negatives. Then we re-rank all the candidate detections with this classifier. This way, a face detector tailored to the statistics of the specific image collection is adapted from the original detector. We present extensive results on three datasets with two state-of-the-art face detectors. The significant improvement of detection accuracy over these state- of-the-art face detectors strongly demonstrates the efficacy of the proposed face detector adaptation algorithm.
6 0.68301249 427 iccv-2013-Transfer Feature Learning with Joint Distribution Adaptation
7 0.68241191 180 iccv-2013-From Where and How to What We See
8 0.68200994 150 iccv-2013-Exemplar Cut
9 0.6817345 80 iccv-2013-Collaborative Active Learning of a Kernel Machine Ensemble for Recognition
10 0.68105972 45 iccv-2013-Affine-Constrained Group Sparse Coding and Its Application to Image-Based Classifications
11 0.68089849 26 iccv-2013-A Practical Transfer Learning Algorithm for Face Verification
12 0.68051153 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization
13 0.67998636 157 iccv-2013-Fast Face Detector Training Using Tailored Views
14 0.67995322 277 iccv-2013-Multi-channel Correlation Filters
15 0.6798746 349 iccv-2013-Regionlets for Generic Object Detection
16 0.67944831 52 iccv-2013-Attribute Adaptation for Personalized Image Search
17 0.6793586 338 iccv-2013-Randomized Ensemble Tracking
18 0.67929727 398 iccv-2013-Sparse Variation Dictionary Learning for Face Recognition with a Single Training Sample per Person
19 0.67919374 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation
20 0.67857826 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation