iccv iccv2013 iccv2013-381 knowledge-graph by maker-knowledge-mining

381 iccv-2013-Semantically-Based Human Scanpath Estimation with HMMs


Source: pdf

Author: Huiying Liu, Dong Xu, Qingming Huang, Wen Li, Min Xu, Stephen Lin

Abstract: We present a method for estimating human scanpaths, which are sequences of gaze shifts that follow visual attention over an image. In this work, scanpaths are modeled based on three principal factors that influence human attention, namely low-level feature saliency, spatial position, and semantic content. Low-level feature saliency is formulated as transition probabilities between different image regions based on feature differences. The effect of spatial position on gaze shifts is modeled as a Levy flight with the shifts following a 2D Cauchy distribution. To account for semantic content, we propose to use a Hidden Markov Model (HMM) with a Bag-of-Visual-Words descriptor of image regions. An HMM is well-suited for this purpose in that 1) the hidden states, obtained by unsupervised learning, can represent latent semantic concepts, 2) the prior distribution of the hidden states describes visual attraction to the semantic concepts, and 3) the transition probabilities represent human gaze shift patterns. The proposed method is applied to task-driven viewing processes. Experiments and analysis performed on human eye gaze data verify the effectiveness of this method.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In this work, scanpaths are modeled based on three principal factors that influence human attention, namely low-level feature saliency, spatial position, and semantic content. [sent-2, score-0.444]

2 Low-level feature saliency is formulated as transition probabilities between different image regions based on feature differences. [sent-3, score-0.499]

3 The effect of spatial position on gaze shifts is modeled as a Levy flight with the shifts following a 2D Cauchy distribution. [sent-4, score-0.981]

4 Experiments and analysis performed on human eye gaze data verify the effectiveness of this method. [sent-8, score-0.682]

5 The left and right images show human scanpath segments and corresponding estimates from our algorithm, respectively, where the correspondences are indicated by matching colors. [sent-19, score-0.388]

6 To this end, much work has focused on salient object detection [15][7] and gaze density estimation [8][29]. [sent-20, score-0.589]

7 On the other hand, only a few works have considered the process of gaze shifting and recovered a temporal ordering of attention points over an image [10][28][27]. [sent-21, score-0.7]

8 In this paper, we present a method to estimate gaze shifts and infer scanpaths such as shown in Fig. [sent-23, score-0.893]

9 Factors that influence gaze shift can be categorized into three types: low-level feature saliency, spatial position, and semantic content. [sent-25, score-0.74]

10 Low-level feature saliency has been the most widely adopted and investigated cue for visual attention, and is often modeled based on feature contrast. [sent-26, score-0.313]

11 In the estimation of gaze shifts, we utilize feature differences to calculate transition probabilities between different image regions, with more visually salient regions having greater attraction of gaze. [sent-27, score-0.866]

12 Spatial position has commonly been used to calculate transition probabilities in graph based and random walk based methods [28][8][7][32]. [sent-28, score-0.275]

13 In [3], it was empirically shown that gaze shifting is a Levy flight process, which is a particular type of random walk with a step length that follows a heavy-tailed distribution. [sent-29, score-0.762]

14 In our work, we incorporate Levy flight with a 2D Cauchy distribution to model the effect of spatial position on gaze shifts. [sent-30, score-0.724]

15 Spatial position as well as low-level feature saliency are stimulus-driven rather than interpretation-driven factors, and as such they both lie in the domain of bottom-up attention. [sent-31, score-0.293]

16 The third factor, semantic content, provides a top-down attention component that has received little consideration due to its complexity. [sent-32, score-0.193]

17 In [5], it was experimentally shown that discrete objects attract more attention and predict visual fixation much better than early saliency cues. [sent-40, score-0.421]

18 The experiments in [9] also demonstrate significant semantic guidance of eye movements for real-world scenes. [sent-41, score-0.249]

19 In spite of the empirical support for the importance of semantic content in guiding attention, it remains difficult to exploit because object segmentation and scene interpretation are still challenging problems. [sent-42, score-0.362]

20 A practical, unsupervised approach for extracting semantic concepts is through latent semantic analysis [6]. [sent-44, score-0.275]

21 Motivated by this approach, we attempt to infer latent semantic concepts and account for them in estimating gaze shifts. [sent-45, score-0.731]

22 In our work, latent semantic concepts which are difficult to discern are modeled by the hidden states, while the observations produced from these states are visible in the image and extracted as low-level descriptors. [sent-47, score-0.53]

23 Gaze shift patterns are modeled by transition probabilities between the states, and the hidden states are obtained through an unsupervised training process convenient for application. [sent-48, score-0.599]

24 The main technical contribution of our human scanpath estimation method is the incorporation of semantic content through an HMM formulation in which latent semantics are represented by the hidden states, and gaze shift patterns are modeled in the transition matrix. [sent-49, score-1.474]

25 To evaluate the similarity of estimated scanpaths to ground truth, we employ a method based on gene sequence alignment. [sent-51, score-0.259]

26 The results of our experiments on human gaze data provide strong support for this approach. [sent-52, score-0.591]

27 Related Work There exist numerous works on visual attention that estimate saliency or the gaze distribution over an image. [sent-54, score-0.955]

28 Relatively few techniques consider the dynamic process of gaze shifts and estimate scanpaths. [sent-55, score-0.682]

29 In this section, we first review existing saliency calculation methods since saliency reflects gaze allocation and represents part of the basis for gaze shifts. [sent-56, score-1.664]

30 We then review the methods for scanpath generation and other related works. [sent-57, score-0.316]

31 Saliency Calculation The family of contrast based methods occupies a major position in the field of saliency calculation, and is motivated by the biological aspect of attention. [sent-60, score-0.335]

32 This family includes Itti’s saliency method [10] and its descendants, including those based on graphs [8] and proto-objects [27]. [sent-62, score-0.322]

33 The second important family of saliency methods is based on information pursuit and explores the psychological aspect of attention. [sent-66, score-0.294]

34 Itti et al. fed a saliency map into a neural network and employed the Winner Take All (WTA) and Inhibition of Return strategies [10], while Walther and Koch identified proto-objects in the image and ranked them according to saliency value [27]. [sent-76, score-0.504]
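
As an illustration of this classic saliency-to-scanpath pipeline, the sketch below generates fixations from a saliency map via Winner Take All with Inhibition of Return. It is a minimal reimplementation of the general idea, not the authors' code; the fixation count and inhibition radius are assumed parameters.

```python
import numpy as np

def wta_scanpath(saliency, n_fixations=10, inhibition_radius=30):
    """Generate a scanpath from a saliency map via Winner-Take-All
    with Inhibition of Return: repeatedly pick the most salient
    location, then suppress a disk around it so gaze moves on."""
    s = saliency.astype(float).copy()
    h, w = s.shape
    ys, xs = np.mgrid[0:h, 0:w]
    path = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(s), s.shape)  # WTA step
        path.append((x, y))
        # Inhibition of Return: zero out a disk around the winner
        s[(ys - y) ** 2 + (xs - x) ** 2 <= inhibition_radius ** 2] = 0.0
    return path
```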

35 These two methods generate scanpaths according to saliency, but what motivates gaze shifts goes well beyond saliency alone. [sent-77, score-1.145]

36 Lee proposed that gaze shifting is due heavily to the radial decrease in resolution away from the fovea, and that gaze shifting aims to maximize the information gain [12]. [sent-78, score-1.216]

37 Renninger proposed that the purpose of gaze shifts is for information maximization [21], and further verified that gaze shifts aim to reduce local uncertainty [22]. [sent-79, score-1.396]

38 Based on the above ideas, Wang simulated human scanpaths by exploiting the properties of the human visual system, including the decrease of resolution away from the fovea, the storing and fading of working memory, and information maximization on the residual image [28]. [sent-80, score-0.339]

39 The primary difference of our work from the method in [28] is in the calculation of transition probabilities. [sent-81, score-0.183]

40 Other related works There are a few works related to our HMM-based scanpath generation in that they use hidden states to model the invisible factors affecting gaze shifts. [sent-85, score-1.213]

41 These include methods that model eye movements for camera control [23], and passive/active patterns [18] or brain states [1] to generate saliency maps. [sent-86, score-0.585]

42 In all of these works, the hidden states are manually defined. [sent-87, score-0.297]

43 In our method, we segment the image into regions and model gaze shifts in terms of transition probabilities from one region to another. [sent-101, score-0.617]

45 We assume gaze shifts to be a Markov process, meaning that the next gaze location depends only on the current one. [sent-104, score-1.261]

46 Low-level feature saliency The transition probabilities determined by low-level features are calculated through feature differences between image regions. [sent-112, score-0.492]

47 The low-level features used in this paper are the YUV color values and Gabor features at five scales and eight orientations, since measures of intensity, color, orientation, and texture have been widely adopted and shown to be effective for estimating saliency [10][8]. [sent-118, score-0.252]
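
A minimal sketch of such a low-level feature extractor, assuming OpenCV: mean YUV color plus mean responses of a Gabor bank with five scales and eight orientations. The kernel-size and bandwidth heuristics are our assumptions, not values from the paper.

```python
import cv2
import numpy as np

def gabor_bank(n_scales=5, n_orients=8, base_ksize=7):
    # Gabor kernels over five scales and eight orientations;
    # the size/bandwidth heuristics below are assumptions.
    kernels = []
    for s in range(n_scales):
        ksize = base_ksize + 4 * s
        sigma, lambd = ksize / 6.0, ksize / 2.0
        for o in range(n_orients):
            theta = o * np.pi / n_orients
            kernels.append(cv2.getGaborKernel((ksize, ksize), sigma,
                                              theta, lambd, 0.5, 0))
    return kernels

def region_features(img_bgr, mask, kernels):
    # Mean YUV color plus mean Gabor responses inside a region mask.
    yuv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YUV)
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    feats = [yuv[mask].mean(axis=0)]
    for k in kernels:
        feats.append([cv2.filter2D(gray, cv2.CV_32F, k)[mask].mean()])
    return np.concatenate(feats)
```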

48 The transition probability from region r to region s is calculated by normalizing the corresponding weight by the sum of outgoing weights from region r: p(y(s) | y(r)) = w(r, s) / Σ_s' w(r, s'). [sent-119, score-0.326]
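
A sketch of this normalization, with the feature distance between regions standing in for the paper's weight (the exact weight definition, its Eq. 4, is not reproduced in these excerpts):

```python
import numpy as np

def feature_transition_matrix(features):
    # features: (n_regions, dim) array of per-region descriptors.
    # Weight w(r, s) is taken here as the feature distance ||f_s - f_r||
    # (a stand-in for the paper's weight); each row is normalized by
    # the sum of outgoing weights so it forms p(y(s) | y(r)).
    f = np.asarray(features, dtype=float)
    w = np.linalg.norm(f[None, :, :] - f[:, None, :], axis=2)
    np.fill_diagonal(w, 0.0)                    # no self-transition weight
    return w / np.maximum(w.sum(axis=1, keepdims=True), 1e-12)
```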

49 This defines a graph with the regions as nodes and the transition probabilities as the weights of the edges. [sent-122, score-0.247]

50 Random walks on such graphs have been used to construct saliency maps [28][8]. [sent-123, score-0.313]

51 Semantic content We describe the influence of semantic content on gaze shifts using a hidden Markov model. [sent-126, score-1.076]

52 The states in an HMM are not directly visible but can be estimated from the visible output which is dependent on the state. [sent-129, score-0.208]

53 This property of the HMM makes it a suitable choice for modeling semantic content in scanpath estimation, as the hidden states can represent latent semantic concepts while the output corresponds to descriptors for the visible image. [sent-130, score-0.878]

54 1 HMM-based prediction of gaze shifts An HMM with M hidden states can be represented by three parameters, λ = (π, Θ, Φ). [sent-133, score-0.979]

55 Θ ∈ R^{M×M} is the transition matrix of the states, with entries θ_{i,j} representing the probability of transiting from state i to state j. [sent-135, score-0.314]

56 Given a sequence of gazed image regions {g_1, g_2, · · · , g_T}, we represent its BoVW representations as X = {x_1, x_2, · · · , x_T}, where each x_t denotes the BoVW representation of the t-th region and T is the sequence length. [sent-154, score-0.283]
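
A minimal sketch of producing such BoVW observations, assuming a k-means codebook over local descriptors pooled from training images; the descriptor choice and library are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(pooled_descriptors, k=30):
    # Learn k visual words by k-means over descriptors pooled from
    # training images; k lies in the paper's sampled range for K.
    return KMeans(n_clusters=k, n_init=10).fit(pooled_descriptors)

def bovw_histogram(region_descriptors, codebook):
    # x_t: normalized visual-word histogram for one gazed region.
    words = codebook.predict(region_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```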

57 π_i (i.e., the i-th entry of π) is the prior probability of the i-th hidden state. [sent-165, score-0.214]

58 Each observation sequence is a saccadic scanpath from one user on one image, and N denotes the number of training samples. [sent-190, score-0.404]

59 Moreover, given state zt = i, we can also estimate the probability of the partial sequence after time t (i.e., x_{t+1}, · · · , x_T). [sent-197, score-0.254]

60 The numbers below the columns are the prior probabilities of the hidden states. [sent-211, score-0.213]

61 Recall that Θ models the transition probability between any two states. [sent-215, score-0.181]

62 We first define ξ_{t,i,j} = p(z_t = i, z_{t+1} = j | X) as the probability of a sequence being in state i at time t and in state j at time t+1, which can be calculated from the forward and backward probabilities as ξ_{t,i,j} = α_{t,i} · θ_{i,j} · p(x_{t+1} | z_{t+1} = j) · β_{t+1,j} / p(X). [sent-216, score-0.214]

63 Similarly, the probability of being in state z_t = i can be calculated as η_{t,i} = Σ_j ξ_{t,i,j}. [sent-233, score-0.263]
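
These two quantities are the standard Baum-Welch E-step. Below is a minimal NumPy sketch, assuming the emission likelihoods B[t, i] = p(x_t | z_t = i) have already been evaluated; it omits the log/scaling tricks needed for long sequences.

```python
import numpy as np

def e_step(pi, Theta, B):
    """One Baum-Welch E-step for a single observation sequence.
    B[t, i] = p(x_t | z_t = i) are precomputed emission likelihoods.
    Returns eta[t, i] = p(z_t = i | X) and
    xi[t, i, j] = p(z_t = i, z_{t+1} = j | X)."""
    T, M = B.shape
    alpha = np.zeros((T, M))                 # forward probabilities
    beta = np.zeros((T, M))                  # backward probabilities
    alpha[0] = pi * B[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ Theta) * B[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = Theta @ (B[t + 1] * beta[t + 1])
    pX = alpha[-1].sum()                     # sequence likelihood p(X)
    # xi[t, i, j] = alpha[t, i] * theta_{i,j} * B[t+1, j] * beta[t+1, j] / p(X)
    xi = (alpha[:-1, :, None] * Theta[None, :, :]
          * (B[1:] * beta[1:])[:, None, :]) / pX
    eta = alpha * beta / pX                  # state posteriors
    return eta, xi
```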

64 Discussion To visualize the hidden states, for each user scanpath we estimate the state of each gazed region. [sent-249, score-0.656]

65 The visualization shows that each hidden state has a consistent visual pattern. [sent-258, score-0.215]

66 The parameters of the HMM have practical meaning in the context of scanpath estimation. [sent-261, score-0.338]

67 In Fig. 2, the prior probabilities of the states are given in the bottom row. [sent-264, score-0.228]

68 The transitions between states describe human gaze shift patterns. [sent-266, score-0.807]

69 It was found that human gaze tends to shift to similar concepts. [sent-268, score-0.651]

70 Spatial position As mentioned previously, gaze shifting has been shown to be a Levy flight, which is a random walk with steps in an isotropically random direction and a step length subject to a heavy-tailed distribution [3]. [sent-276, score-0.729]

71 Here, we use a 2D Cauchy distribution to model the gaze shift. [sent-277, score-0.583]

72 Let u_t = (u_t, v_t) be the position of the t-th gaze point. [sent-278, score-0.69]

73 The probability of transiting from u_t to position u = (u, v) is defined as p(u_{t+1} = u | u_t) = γ / (2π (‖u − u_t‖² + γ²)^{3/2}), where γ is the scale parameter of the Cauchy distribution. [sent-279, score-0.218]
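
A sketch of this density (an isotropic 2D Cauchy, i.e., a bivariate Student-t with one degree of freedom); gamma is the scale parameter:

```python
import numpy as np

def cauchy_transition(u_t, u, gamma=1.0):
    # p(u_{t+1} = u | u_t) under an isotropic 2D Cauchy: heavy tails
    # keep long Levy-flight jumps plausible, unlike a Gaussian.
    d2 = float(np.sum((np.asarray(u) - np.asarray(u_t)) ** 2))
    return gamma / (2.0 * np.pi * (d2 + gamma ** 2) ** 1.5)
```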

74 In several existing methods, a 2D Gaussian function is used to model the gaze shift [7][8][32]. [sent-292, score-0.617]

75 This is illustrated in Fig. 4 for human gaze data from the NUSEF dataset [25]. [sent-294, score-0.62]

76 Step length distribution for human gaze shifts, with fitting results by using a Cauchy distribution and a Gaussian distribution. [sent-295, score-0.672]

77 It can be seen that a Gaussian function is less suitable than a Cauchy distribution for modeling gaze shifts. [sent-296, score-0.583]
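
This comparison can be reproduced in a few lines with SciPy by fitting both families to the empirical step lengths and comparing log-likelihoods; this is our sketch of the diagnostic, not the paper's fitting code.

```python
import numpy as np
from scipy import stats

def fit_step_lengths(steps):
    # Fit the empirical gaze step-length distribution with a Cauchy
    # and a Gaussian; the heavy-tailed Cauchy should score a higher
    # log-likelihood on the long Levy-flight jumps.
    steps = np.asarray(steps, dtype=float)
    ll_cauchy = stats.cauchy.logpdf(steps, *stats.cauchy.fit(steps)).sum()
    ll_norm = stats.norm.logpdf(steps, *stats.norm.fit(steps)).sum()
    return ll_cauchy, ll_norm
```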

78 To achieve scale invariance with block based methods, low-level feature saliency needs to be calculated at multiple scales [10] [15]. [sent-303, score-0.283]

79 To determine the region that will be gazed next in a scanpath, we take the region with the highest probability computed from Eq. [sent-308, score-0.21]

80 The length of the scanpath is set to 20 for our method and the comparison techniques. [sent-311, score-0.345]
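
Putting the pieces together, scanpath generation reduces to a greedy rollout over the combined transition distribution. The sketch below assumes a fixed region-transition matrix P and a simple no-revisit rule (our assumption, to keep the greedy argmax from oscillating between two regions):

```python
import numpy as np

def estimate_scanpath(start_region, P, length=20):
    # Greedy rollout over a combined region-transition matrix P:
    # at each step pick the highest-probability unvisited region;
    # scanpath length is fixed to 20 as in the experiments.
    path = [start_region]
    for _ in range(length - 1):
        p = P[path[-1]].copy()
        p[path] = 0.0               # suppress revisits (assumption)
        path.append(int(np.argmax(p)))
    return path
```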

81 Both record human gaze in a free viewing setting. [sent-315, score-0.629]

82 On average, the scanpaths of about 25 users are recorded for each image. [sent-317, score-0.232]

83 The JUDD dataset consists of 1003 images with scanpaths of 15 subjects recorded by an eye tracking machine. [sent-318, score-0.302]

84 That is, multiple segments of a scanpath may be matched to the same segment of another one. [sent-325, score-0.376]

85 In our method, we use the settings w(x, x) = 1 for a match and w(−, x) = w(x, −) = gap, where gap can be set to 1/2, 1/3, or 1/4 to indicate a tolerance of gap length 1, 2, or 3 between the first and second matched elements. [sent-334, score-0.177]
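
A sketch of this alignment score in the Needleman-Wunsch style, with the match and gap values above; the mismatch score and sign conventions are assumptions, since they are not fully legible in the extraction.

```python
import numpy as np

def align_score(a, b, gap=0.5, match=1.0, mismatch=0.0):
    # Global alignment score between two scanpaths encoded as symbol
    # sequences (e.g., region or state labels), in the style of gene
    # sequence alignment; standard dynamic-programming recurrence.
    n, m = len(a), len(b)
    S = np.zeros((n + 1, m + 1))
    S[1:, 0] = gap * np.arange(1, n + 1)    # leading gaps in b
    S[0, 1:] = gap * np.arange(1, m + 1)    # leading gaps in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            hit = match if a[i - 1] == b[j - 1] else mismatch
            S[i, j] = max(S[i - 1, j - 1] + hit,   # match / mismatch
                          S[i - 1, j] + gap,       # gap in sequence b
                          S[i, j - 1] + gap)       # gap in sequence a
    return S[n, m]
```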

86 For each image, we have multiple ground truth scanpaths from different users, so we compare the estimated scanpath with all of them and report the average similarity. [sent-343, score-0.527]

87 The two main HMM settings are the number of visual words (K) and the number of hidden states (M), while the number of training samples (N) also impacts performance. [sent-354, score-0.32]

88 We sample the number of states as M = {2, 3, · · · , 10} and the number of visual words as K = {10, 20, 30, 40, 50}. [sent-357, score-0.184]

89 Gaze factors We compared the performance using each individual gaze factor (low-level feature saliency, semantic content with HMM, and spatial position with Levy flight) as well as the full gaze shift method with all three factors. [sent-372, score-1.424]
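
How the three factor-specific distributions are fused is not spelled out in these excerpts; below is a minimal sketch assuming a renormalized product of the per-factor transition probabilities.

```python
import numpy as np

def combined_transition(p_feature, p_semantic, p_spatial):
    # Combine the three gaze factors into one transition distribution
    # over candidate regions. A product combination (renormalized) is
    # assumed here; the paper's exact fusion rule is not shown above.
    p = np.asarray(p_feature) * np.asarray(p_semantic) * np.asarray(p_spatial)
    return p / max(p.sum(), 1e-12)
```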

90 This demonstrates the importance of modeling the gaze transitions. [sent-405, score-0.557]

91 Comparison with other methods To our knowledge, Itti's saliency based method (Itti) [10], Walther's proto-object based method (proto) [27], and Wang's scanpath simulation method (WW) [28] are the only existing techniques for estimating scanpaths. [sent-416, score-0.568]

92 In Fig. 7, from the results for proto, we can see that sorting regions according to saliency does not provide good estimates of scanpaths. [sent-429, score-0.29]

93 Fig. 8 shows scanpath results for two images in the NUSEF-portrait set. [sent-432, score-0.316]

94 Conclusion In this paper, we have proposed a human scanpath estimation method that employs an HMM to model the influence of semantic content, and uses Levy flight to account for spatial position. [sent-434, score-0.573]

95 Experiments on challenging datasets show our method to outperform existing scanpath estimation techniques. [sent-435, score-0.316]

96 Probabilistic multi-task learning for visual saliency estimation in video. [sent-517, score-0.28]

97 An eye fixation database for saliency detection in images. [sent-604, score-0.367]

98 What are we looking for: towards statistical modeling of saccadic eye movements and visual saliency. [sent-610, score-0.239]

99 Top-down visual saliency via joint crf and dictionary learning. [sent-636, score-0.28]

100 Learning a saliency map using fixated locations in natural scenes. [sent-652, score-0.252]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('gaze', 0.557), ('scanpath', 0.316), ('hmm', 0.296), ('saliency', 0.252), ('scanpaths', 0.211), ('states', 0.156), ('zt', 0.142), ('hidden', 0.141), ('transition', 0.137), ('nusef', 0.127), ('levy', 0.126), ('shifts', 0.125), ('semantic', 0.101), ('flight', 0.1), ('cauchy', 0.095), ('attention', 0.092), ('ut', 0.092), ('eye', 0.091), ('itti', 0.091), ('gazed', 0.09), ('judd', 0.074), ('gap', 0.074), ('xt', 0.073), ('probabilities', 0.072), ('content', 0.065), ('saccadic', 0.063), ('renninger', 0.061), ('walthera', 0.061), ('shift', 0.06), ('movements', 0.057), ('wk', 0.056), ('shifting', 0.051), ('proto', 0.05), ('gt', 0.049), ('calculation', 0.046), ('state', 0.046), ('bovw', 0.045), ('probability', 0.044), ('factors', 0.043), ('concepts', 0.043), ('family', 0.042), ('position', 0.041), ('fovea', 0.041), ('nusefportrait', 0.041), ('transiting', 0.041), ('verghese', 0.041), ('region', 0.038), ('viewing', 0.038), ('segments', 0.038), ('regions', 0.038), ('markov', 0.036), ('yt', 0.035), ('human', 0.034), ('walks', 0.033), ('modeled', 0.033), ('codebook', 0.033), ('maximization', 0.032), ('significance', 0.032), ('salient', 0.032), ('calculated', 0.031), ('examine', 0.031), ('latent', 0.03), ('molecular', 0.03), ('traced', 0.03), ('attraction', 0.03), ('interpretation', 0.03), ('length', 0.029), ('brain', 0.029), ('entry', 0.029), ('bi', 0.028), ('graphs', 0.028), ('visual', 0.028), ('singapore', 0.028), ('sihite', 0.027), ('xk', 0.027), ('similarity', 0.026), ('china', 0.026), ('distribution', 0.026), ('visible', 0.026), ('borji', 0.025), ('iat', 0.025), ('walk', 0.025), ('displayed', 0.025), ('hwang', 0.025), ('attract', 0.025), ('sydney', 0.025), ('user', 0.025), ('nn', 0.024), ('ww', 0.024), ('jm', 0.024), ('fixation', 0.024), ('entropy', 0.024), ('settings', 0.023), ('influences', 0.023), ('superscript', 0.022), ('meaning', 0.022), ('segment', 0.022), ('influence', 0.022), ('sequence', 0.022), ('users', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 381 iccv-2013-Semantically-Based Human Scanpath Estimation with HMMs

Author: Huiying Liu, Dong Xu, Qingming Huang, Wen Li, Min Xu, Stephen Lin

Abstract: We present a method for estimating human scanpaths, which are sequences of gaze shifts that follow visual attention over an image. In this work, scanpaths are modeled based on three principal factors that influence human attention, namely low-level feature saliency, spatial position, and semantic content. Low-level feature saliency is formulated as transition probabilities between different image regions based on feature differences. The effect of spatial position on gaze shifts is modeled as a Levy flight with the shifts following a 2D Cauchy distribution. To account for semantic content, we propose to use a Hidden Markov Model (HMM) with a Bag-of-Visual-Words descriptor of image regions. An HMM is well-suited for this purpose in that 1) the hidden states, obtained by unsupervised learning, can represent latent semantic concepts, 2) the prior distribution of the hidden states describes visual attraction to the semantic concepts, and 3) the transition probabilities represent human gaze shift patterns. The proposed method is applied to task-driven viewing processes. Experiments and analysis performed on human eye gaze data verify the effectiveness of this method.

2 0.56302458 67 iccv-2013-Calibration-Free Gaze Estimation Using Human Gaze Patterns

Author: Fares Alnajar, Theo Gevers, Roberto Valenti, Sennay Ghebreab

Abstract: We present a novel method to auto-calibrate gaze estimators based on gaze patterns obtained from other viewers. Our method is based on the observation that the gaze patterns of humans are indicative of where a new viewer will look at [12]. When a new viewer is looking at a stimulus, we first estimate a topology of gaze points (initial gaze points). Next, these points are transformed so that they match the gaze patterns of other humans to find the correct gaze points. In a flexible uncalibrated setup with a web camera and no chin rest, the proposed method was tested on ten subjects and ten images. The method estimates the gaze points after looking at a stimulus for a few seconds with an average accuracy of 4.3◦. Although the reported performance is lower than what could be achieved with dedicated hardware or calibrated setup, the proposed method still provides a sufficient accuracy to trace the viewer attention. This is promising considering the fact that auto-calibration is done in a flexible setup , without the use of a chin rest, and based only on a few seconds of gaze initialization data. To the best of our knowledge, this is the first work to use human gaze patterns in order to auto-calibrate gaze estimators.

3 0.52444357 247 iccv-2013-Learning to Predict Gaze in Egocentric Video

Author: Yin Li, Alireza Fathi, James M. Rehg

Abstract: We present a model for gaze prediction in egocentric video by leveraging the implicit cues that exist in camera wearer’s behaviors. Specifically, we compute the camera wearer’s head motion and hand location from the video and combine them to estimate where the eyes look. We further model the dynamic behavior of the gaze, in particular fixations, as latent variables to improve the gaze prediction. Our gaze prediction results outperform the state-of-the-art algorithms by a large margin on publicly available egocentric vision datasets. In addition, we demonstrate that we get a significant performance boost in recognizing daily actions and segmenting foreground objects by plugging in our gaze predictions into state-of-the-art methods.

4 0.34375906 50 iccv-2013-Analysis of Scores, Datasets, and Models in Visual Saliency Prediction

Author: Ali Borji, Hamed R. Tavakoli, Dicky N. Sihite, Laurent Itti

Abstract: Significant recent progress has been made in developing high-quality saliency models. However, less effort has been undertaken on fair assessment of these models, over large standardized datasets and correctly addressing confounding factors. In this study, we pursue a critical and quantitative look at challenges (e.g., center-bias, map smoothing) in saliency modeling and the way they affect model accuracy. We quantitatively compare 32 state-of-the-art models (using the shuffled AUC score to discount center-bias) on 4 benchmark eye movement datasets, for prediction of human fixation locations and scanpath sequence. We also account for the role of map smoothing. We find that, although model rankings vary, some (e.g., AWS, LG, AIM, and HouNIPS) consistently outperform other models over all datasets. Some models work well for prediction of both fixation locations and scanpath sequence (e.g., Judd, GBVS). Our results show low prediction accuracy for models over emotional stimuli from the NUSEF dataset. Our last benchmark, for the first time, gauges the ability of models to decode the stimulus category from statistics of fixations, saccades, and model saliency values at fixated locations. In this test, ITTI and AIM models win over other models. Our benchmark provides a comprehensive high-level picture of the strengths and weaknesses of many popular models, and suggests future research directions in saliency modeling.

5 0.28559077 325 iccv-2013-Predicting Primary Gaze Behavior Using Social Saliency Fields

Author: Hyun Soo Park, Eakta Jain, Yaser Sheikh

Abstract: We present a method to predict primary gaze behavior in a social scene. Inspired by the study of electric fields, we posit “social charges ”—latent quantities that drive the primary gaze behavior of members of a social group. These charges induce a gradient field that defines the relationship between the social charges and the primary gaze direction of members in the scene. This field model is used to predict primary gaze behavior at any location or time in the scene. We present an algorithm to estimate the time-varying behavior of these charges from the primary gaze behavior of measured observers in the scene. We validate the model by evaluating its predictive precision via cross-validation in a variety of social scenes.

6 0.25966397 373 iccv-2013-Saliency and Human Fixations: State-of-the-Art and Study of Comparison Metrics

7 0.22052106 71 iccv-2013-Category-Independent Object-Level Saliency Detection

8 0.22031458 372 iccv-2013-Saliency Detection via Dense and Sparse Reconstruction

9 0.18167971 91 iccv-2013-Contextual Hypergraph Modeling for Salient Object Detection

10 0.15797094 371 iccv-2013-Saliency Detection via Absorbing Markov Chain

11 0.15747246 396 iccv-2013-Space-Time Robust Representation for Action Recognition

12 0.14498135 369 iccv-2013-Saliency Detection: A Boolean Map Approach

13 0.13604961 374 iccv-2013-Salient Region Detection by UFO: Uniqueness, Focusness and Objectness

14 0.13340642 370 iccv-2013-Saliency Detection in Large Point Sets

15 0.12976927 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction

16 0.12434669 180 iccv-2013-From Where and How to What We See

17 0.11860055 217 iccv-2013-Initialization-Insensitive Visual Tracking through Voting with Salient Local Features

18 0.11443593 147 iccv-2013-Event Recognition in Photo Collections with a Stopwatch HMM

19 0.1111109 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences

20 0.097367145 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.171), (1, 0.003), (2, 0.358), (3, -0.183), (4, -0.09), (5, -0.013), (6, 0.08), (7, -0.04), (8, 0.009), (9, 0.064), (10, -0.007), (11, -0.127), (12, -0.085), (13, 0.12), (14, -0.043), (15, 0.207), (16, -0.393), (17, 0.086), (18, 0.226), (19, -0.259), (20, -0.036), (21, 0.011), (22, -0.072), (23, 0.02), (24, -0.044), (25, 0.001), (26, -0.046), (27, 0.002), (28, -0.01), (29, 0.045), (30, 0.002), (31, 0.028), (32, -0.002), (33, 0.012), (34, -0.034), (35, 0.007), (36, -0.0), (37, 0.006), (38, 0.033), (39, 0.011), (40, -0.009), (41, -0.003), (42, 0.007), (43, 0.028), (44, -0.011), (45, 0.011), (46, 0.011), (47, 0.027), (48, 0.003), (49, -0.008)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.95670289 325 iccv-2013-Predicting Primary Gaze Behavior Using Social Saliency Fields

Author: Hyun Soo Park, Eakta Jain, Yaser Sheikh

Abstract: We present a method to predict primary gaze behavior in a social scene. Inspired by the study of electric fields, we posit “social charges ”—latent quantities that drive the primary gaze behavior of members of a social group. These charges induce a gradient field that defines the relationship between the social charges and the primary gaze direction of members in the scene. This field model is used to predict primary gaze behavior at any location or time in the scene. We present an algorithm to estimate the time-varying behavior of these charges from the primary gaze behavior of measured observers in the scene. We validate the model by evaluating its predictive precision via cross-validation in a variety of social scenes.

2 0.93329436 247 iccv-2013-Learning to Predict Gaze in Egocentric Video

Author: Yin Li, Alireza Fathi, James M. Rehg

Abstract: We present a model for gaze prediction in egocentric video by leveraging the implicit cues that exist in camera wearer’s behaviors. Specifically, we compute the camera wearer’s head motion and hand location from the video and combine them to estimate where the eyes look. We further model the dynamic behavior of the gaze, in particular fixations, as latent variables to improve the gaze prediction. Our gaze prediction results outperform the state-of-the-art algorithms by a large margin on publicly available egocentric vision datasets. In addition, we demonstrate that we get a significant performance boost in recognizing daily actions and segmenting foreground objects by plugging in our gaze predictions into state-of-the-art methods.

same-paper 3 0.93273431 381 iccv-2013-Semantically-Based Human Scanpath Estimation with HMMs

Author: Huiying Liu, Dong Xu, Qingming Huang, Wen Li, Min Xu, Stephen Lin

Abstract: We present a method for estimating human scanpaths, which are sequences of gaze shifts that follow visual attention over an image. In this work, scanpaths are modeled based on three principal factors that influence human attention, namely low-levelfeature saliency, spatialposition, and semantic content. Low-level feature saliency is formulated as transition probabilities between different image regions based on feature differences. The effect of spatial position on gaze shifts is modeled as a Levy flight with the shifts following a 2D Cauchy distribution. To account for semantic content, we propose to use a Hidden Markov Model (HMM) with a Bag-of-Visual-Words descriptor of image regions. An HMM is well-suited for this purpose in that 1) the hidden states, obtained by unsupervised learning, can represent latent semantic concepts, 2) the prior distribution of the hidden states describes visual attraction to the semantic concepts, and 3) the transition probabilities represent human gaze shift patterns. The proposed method is applied to task-driven viewing processes. Experiments and analysis performed on human eye gaze data verify the effectiveness of this method.

4 0.92999887 67 iccv-2013-Calibration-Free Gaze Estimation Using Human Gaze Patterns

Author: Fares Alnajar, Theo Gevers, Roberto Valenti, Sennay Ghebreab

Abstract: We present a novel method to auto-calibrate gaze estimators based on gaze patterns obtained from other viewers. Our method is based on the observation that the gaze patterns of humans are indicative of where a new viewer will look at [12]. When a new viewer is looking at a stimulus, we first estimate a topology of gaze points (initial gaze points). Next, these points are transformed so that they match the gaze patterns of other humans to find the correct gaze points. In a flexible uncalibrated setup with a web camera and no chin rest, the proposed method was tested on ten subjects and ten images. The method estimates the gaze points after looking at a stimulus for a few seconds with an average accuracy of 4.3◦. Although the reported performance is lower than what could be achieved with dedicated hardware or calibrated setup, the proposed method still provides a sufficient accuracy to trace the viewer attention. This is promising considering the fact that auto-calibration is done in a flexible setup , without the use of a chin rest, and based only on a few seconds of gaze initialization data. To the best of our knowledge, this is the first work to use human gaze patterns in order to auto-calibrate gaze estimators.

5 0.57222068 50 iccv-2013-Analysis of Scores, Datasets, and Models in Visual Saliency Prediction

Author: Ali Borji, Hamed R. Tavakoli, Dicky N. Sihite, Laurent Itti

Abstract: Significant recent progress has been made in developing high-quality saliency models. However, less effort has been undertaken on fair assessment of these models, over large standardized datasets and correctly addressing confounding factors. In this study, we pursue a critical and quantitative look at challenges (e.g., center-bias, map smoothing) in saliency modeling and the way they affect model accuracy. We quantitatively compare 32 state-of-the-art models (using the shuffled AUC score to discount center-bias) on 4 benchmark eye movement datasets, for prediction of human fixation locations and scanpath sequence. We also account for the role of map smoothing. We find that, although model rankings vary, some (e.g., AWS, LG, AIM, and HouNIPS) consistently outperform other models over all datasets. Some models work well for prediction of both fixation locations and scanpath sequence (e.g., Judd, GBVS). Our results show low prediction accuracy for models over emotional stimuli from the NUSEF dataset. Our last benchmark, for the first time, gauges the ability of models to decode the stimulus category from statistics of fixations, saccades, and model saliency values at fixated locations. In this test, ITTI and AIM models win over other models. Our benchmark provides a comprehensive high-level picture of the strengths and weaknesses of many popular models, and suggests future research directions in saliency modeling.

6 0.54203027 373 iccv-2013-Saliency and Human Fixations: State-of-the-Art and Study of Comparison Metrics

7 0.47252923 369 iccv-2013-Saliency Detection: A Boolean Map Approach

8 0.36795202 370 iccv-2013-Saliency Detection in Large Point Sets

9 0.36685613 91 iccv-2013-Contextual Hypergraph Modeling for Salient Object Detection

10 0.36548296 71 iccv-2013-Category-Independent Object-Level Saliency Detection

11 0.36420569 372 iccv-2013-Saliency Detection via Dense and Sparse Reconstruction

12 0.36161169 371 iccv-2013-Saliency Detection via Absorbing Markov Chain

13 0.35372621 374 iccv-2013-Salient Region Detection by UFO: Uniqueness, Focusness and Objectness

14 0.34989691 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction

15 0.31527367 396 iccv-2013-Space-Time Robust Representation for Action Recognition

16 0.31108171 180 iccv-2013-From Where and How to What We See

17 0.26283747 217 iccv-2013-Initialization-Insensitive Visual Tracking through Voting with Salient Local Features

18 0.25859031 416 iccv-2013-The Interestingness of Images

19 0.23263502 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences

20 0.21573123 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.089), (7, 0.026), (12, 0.019), (26, 0.073), (31, 0.056), (34, 0.011), (40, 0.011), (42, 0.08), (48, 0.012), (64, 0.04), (73, 0.026), (84, 0.264), (89, 0.139), (95, 0.014), (97, 0.044)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.84840435 401 iccv-2013-Stacked Predictive Sparse Coding for Classification of Distinct Regions in Tumor Histopathology

Author: Hang Chang, Yin Zhou, Paul Spellman, Bahram Parvin

Abstract: Image-based classification ofhistology sections, in terms of distinct components (e.g., tumor, stroma, normal), provides a series of indices for tumor composition. Furthermore, aggregation of these indices, from each whole slide image (WSI) in a large cohort, can provide predictive models of the clinical outcome. However, performance of the existing techniques is hindered as a result of large technical variations and biological heterogeneities that are always present in a large cohort. We propose a system that automatically learns a series of basis functions for representing the underlying spatial distribution using stacked predictive sparse decomposition (PSD). The learned representation is then fed into the spatial pyramid matching framework (SPM) with a linear SVM classifier. The system has been evaluated for classification of (a) distinct histological components for two cohorts of tumor types, and (b) colony organization of normal and malignant cell lines in 3D cell culture models. Throughput has been increased through the utility of graphical processing unit (GPU), and evalu- ation indicates a superior performance results, compared with previous research.

same-paper 2 0.74755085 381 iccv-2013-Semantically-Based Human Scanpath Estimation with HMMs

Author: Huiying Liu, Dong Xu, Qingming Huang, Wen Li, Min Xu, Stephen Lin

Abstract: We present a method for estimating human scanpaths, which are sequences of gaze shifts that follow visual attention over an image. In this work, scanpaths are modeled based on three principal factors that influence human attention, namely low-levelfeature saliency, spatialposition, and semantic content. Low-level feature saliency is formulated as transition probabilities between different image regions based on feature differences. The effect of spatial position on gaze shifts is modeled as a Levy flight with the shifts following a 2D Cauchy distribution. To account for semantic content, we propose to use a Hidden Markov Model (HMM) with a Bag-of-Visual-Words descriptor of image regions. An HMM is well-suited for this purpose in that 1) the hidden states, obtained by unsupervised learning, can represent latent semantic concepts, 2) the prior distribution of the hidden states describes visual attraction to the semantic concepts, and 3) the transition probabilities represent human gaze shift patterns. The proposed method is applied to task-driven viewing processes. Experiments and analysis performed on human eye gaze data verify the effectiveness of this method.

3 0.7205596 241 iccv-2013-Learning Near-Optimal Cost-Sensitive Decision Policy for Object Detection

Author: Tianfu Wu, Song-Chun Zhu

Abstract: Many object detectors, such as AdaBoost, SVM and deformable part-based models (DPM), compute additive scoring functions at a large number of windows scanned over image pyramid, thus computational efficiency is an important consideration beside accuracy performance. In this paper, we present a framework of learning cost-sensitive decision policy which is a sequence of two-sided thresholds to execute early rejection or early acceptance based on the accumulative scores at each step. A decision policy is said to be optimal if it minimizes an empirical global risk function that sums over the loss of false negatives (FN) and false positives (FP), and the cost of computation. While the risk function is very complex due to high-order connections among the two-sided thresholds, we find its upper bound can be optimized by dynamic programming (DP) efficiently and thus say the learned policy is near-optimal. Given the loss of FN and FP and the cost in three numbers, our method can produce a policy on-the-fly for Adaboost, SVM and DPM. In experiments, we show that our decision policy outperforms state-of-the-art cascade methods significantly in terms of speed with similar accuracy performance.

4 0.68762779 60 iccv-2013-Bayesian Robust Matrix Factorization for Image and Video Processing

Author: Naiyan Wang, Dit-Yan Yeung

Abstract: Matrix factorization is a fundamental problem that is often encountered in many computer vision and machine learning tasks. In recent years, enhancing the robustness of matrix factorization methods has attracted much attention in the research community. To benefit from the strengths of full Bayesian treatment over point estimation, we propose here a full Bayesian approach to robust matrix factorization. For the generative process, the model parameters have conjugate priors and the likelihood (or noise model) takes the form of a Laplace mixture. For Bayesian inference, we devise an efficient sampling algorithm by exploiting a hierarchical view of the Laplace distribution. Besides the basic model, we also propose an extension which assumes that the outliers exhibit spatial or temporal proximity as encountered in many computer vision applications. The proposed methods give competitive experimental results when compared with several state-of-the-art methods on some benchmark image and video processing tasks.

5 0.67982608 168 iccv-2013-Finding the Best from the Second Bests - Inhibiting Subjective Bias in Evaluation of Visual Tracking Algorithms

Author: Yu Pang, Haibin Ling

Abstract: Evaluating visual tracking algorithms, or “trackers ” for short, is of great importance in computer vision. However, it is hard to “fairly” compare trackers due to many parameters need to be tuned in the experimental configurations. On the other hand, when introducing a new tracker, a recent trend is to validate it by comparing it with several existing ones. Such an evaluation may have subjective biases towards the new tracker which typically performs the best. This is mainly due to the difficulty to optimally tune all its competitors and sometimes the selected testing sequences. By contrast, little subjective bias exists towards the “second best” ones1 in the contest. This observation inspires us with a novel perspective towards inhibiting subjective bias in evaluating trackers by analyzing the results between the second bests. In particular, we first collect all tracking papers published in major computer vision venues in recent years. From these papers, after filtering out potential biases in various aspects, we create a dataset containing many records of comparison results between various visual trackers. Using these records, we derive performance rank- ings of the involved trackers by four different methods. The first two methods model the dataset as a graph and then derive the rankings over the graph, one by a rank aggregation algorithm and the other by a PageRank-like solution. The other two methods take the records as generated from sports contests and adopt widely used Elo’s and Glicko ’s rating systems to derive the rankings. The experimental results are presented and may serve as a reference for related research.

6 0.66825628 133 iccv-2013-Efficient Hand Pose Estimation from a Single Depth Image

7 0.63056421 219 iccv-2013-Internet Based Morphable Model

8 0.62446088 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences

9 0.62376159 50 iccv-2013-Analysis of Scores, Datasets, and Models in Visual Saliency Prediction

10 0.61479294 180 iccv-2013-From Where and How to What We See

11 0.61425704 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction

12 0.61368001 20 iccv-2013-A Max-Margin Perspective on Sparse Representation-Based Classification

13 0.61194146 412 iccv-2013-Synergistic Clustering of Image and Segment Descriptors for Unsupervised Scene Understanding

14 0.61134058 227 iccv-2013-Large-Scale Image Annotation by Efficient and Robust Kernel Metric Learning

15 0.60625768 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification

16 0.60609305 218 iccv-2013-Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data

17 0.6038031 347 iccv-2013-Recursive Estimation of the Stein Center of SPD Matrices and Its Applications

18 0.60256958 91 iccv-2013-Contextual Hypergraph Modeling for Salient Object Detection

19 0.60187382 71 iccv-2013-Category-Independent Object-Level Saliency Detection

20 0.60178125 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition