Author: Huiying Liu, Dong Xu, Qingming Huang, Wen Li, Min Xu, Stephen Lin
Abstract: We present a method for estimating human scanpaths, which are sequences of gaze shifts that follow visual attention over an image. In this work, scanpaths are modeled based on three principal factors that influence human attention, namely low-level feature saliency, spatial position, and semantic content. Low-level feature saliency is formulated as transition probabilities between different image regions based on feature differences. The effect of spatial position on gaze shifts is modeled as a Lévy flight with the shifts following a 2D Cauchy distribution. To account for semantic content, we propose to use a Hidden Markov Model (HMM) with a Bag-of-Visual-Words descriptor of image regions. An HMM is well-suited for this purpose in that 1) the hidden states, obtained by unsupervised learning, can represent latent semantic concepts, 2) the prior distribution of the hidden states describes visual attraction to the semantic concepts, and 3) the transition probabilities represent human gaze shift patterns. The proposed method is applied to task-driven viewing processes. Experiments and analysis performed on human eye gaze data verify the effectiveness of this method.
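To make the spatial-position component concrete, the sketch below (a toy illustration under stated assumptions, not the authors' implementation) samples gaze shifts from an isotropic 2D Cauchy distribution to form a Lévy-flight scanpath. The function names, the scale parameter gamma, and the image size are all illustrative choices.

```python
import numpy as np

def sample_cauchy_shift(gamma, rng):
    """Draw one gaze shift from an isotropic 2D Cauchy distribution.

    A bivariate Cauchy is a multivariate t-distribution with one degree
    of freedom, so it can be sampled exactly as a 2D Gaussian divided by
    the absolute value of an independent standard normal.
    """
    z = rng.normal(0.0, gamma, size=2)  # Gaussian numerator, scale gamma
    u = rng.normal(0.0, 1.0)            # standard normal denominator
    return z / abs(u)

def levy_flight_scanpath(start, n_fixations, image_size, gamma=30.0, seed=0):
    """Generate a toy scanpath whose shifts are 2D Cauchy (a Levy flight).

    The heavy Cauchy tails yield mostly short, local shifts punctuated by
    occasional long jumps, the gaze-shift pattern described in the abstract.
    """
    rng = np.random.default_rng(seed)
    h, w = image_size
    path = [np.asarray(start, dtype=float)]
    for _ in range(n_fixations - 1):
        nxt = path[-1] + sample_cauchy_shift(gamma, rng)
        nxt = np.clip(nxt, [0.0, 0.0], [w - 1.0, h - 1.0])  # stay on image
        path.append(nxt)
    return np.stack(path)  # (n_fixations, 2) array of (x, y) fixations

# Example: a 12-fixation scanpath starting at the center of a 640x480 image.
print(levy_flight_scanpath(start=(320, 240), n_fixations=12,
                           image_size=(480, 640)))
```

Clipping to the image boundary is a simplification; the full model described in the abstract also weights shifts by low-level feature saliency and by HMM transition probabilities over semantic states, which this sketch omits.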
[1] A. Borji, D. N. Sihite, and L. Itti. An object-based Bayesian framework for top-down visual attention. AAAI, 2012.
[2] A. Borji, D. N. Sihite, and L. Itti. Probabilistic learning of task-specific visual attention. CVPR, 2012.
[3] D. Brockmann and T. Geisel. Are human scanpaths Lévy flights? ICANN, 1999.
[4] N. Bruce and J. Tsotsos. Saliency based on information maximization. NIPS, 2006.
[5] W. Einhäuser, M. Spain, and P. Perona. Objects predict fixations better than early saliency. Journal of Vision, 8(14):18, 1-26, 2008.
[6] L. Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. CVPR, 2005.
[7] V. Gopalakrishnan, Y. Hu, and D. Rajan. Random walks on graphs for salient object detection in images. T-IP, 19(12):3232-3242, 2010.
[8] J. Harel and C. Koch. Graph-based visual saliency. NIPS, 2006.
[9] A. D. Hwang, H.-C. Wang, and M. Pomplun. Semantic guidance of eye movements in real-world scenes. Vision Research, 51:1192-1205, 2011.
[10] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. T-PAMI, 20(11):1254-1259, 1998.
[11] T. Judd, K. Ehinger, F. Durand, and A. Torralba. Learning to predict where humans look. ICCV, 2009.
[12] T. Lee. An information-theoretic framework for understanding saccadic eye movements. NIPS, 2000.
[13] J. Li, Y. Tian, T. Huang, and W. Gao. Probabilistic multi-task learning for visual saliency estimation in video. IJCV, 90:150-165, 2010.
[14] M.-Y. Liu, O. Tuzel, S. Ramalingam, and R. Chellappa. Entropy rate superpixel segmentation. CVPR, 2011.
[15] T. Liu, Z. Yuan, J. Sun, J. Wang, N. Zheng, X. Tang, and H.-Y. Shum. Learning to detect a salient object. T-PAMI, 33(2):353-367, 2011.
[16] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91-110, 2004.
[17] S. Lu and J. Lim. Saliency modeling from image histograms. ECCV, 2012.
[18] D. Pang, A. Kimura, T. Takeuchi, J. Yamato, and K. Kashino. A stochastic model of selective visual attention with a dynamic Bayesian network. ICME, 2008.
[19] R. J. Peters and L. Itti. Beyond bottom-up: Incorporating task-dependent influences into a computational model of spatial attention. CVPR, 2007.
[20] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286, 1989.
[21] L. W. Renninger, J. Coughlan, P. Verghese, and J. Malik. An information maximization model of eye movements. NIPS, 2004.
[22] L. W. Renninger, P. Verghese, and J. Coughlan. Where to look next? Eye movements reduce local uncertainty. Journal of Vision, 7(3):6, 1-17, 2007.
[23] R. D. Rimey and C. M. Brown. Controlling eye movements with hidden Markov models. IJCV, 7(1):47-65, 1991.
[24] T. F. Smith and M. S. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147:195-197, 1981.
[25] R. Subramanian, H. Katti, N. Sebe, M. Kankanhalli, and T.-S. Chua. An eye fixation database for saliency detection in images. ECCV, 2010.
[26] X. Sun, H. Yao, and R. Ji. What are we looking for: Towards statistical modeling of saccadic eye movements and visual saliency. CVPR, 2012.
[27] D. Walther and C. Koch. Modeling attention to salient proto-objects. Neural Networks, 19:1395-1407, 2006.
[28] W. Wang, C. Chen, Y. Wang, T. Jiang, F. Fang, and Y. Yao. Simulating human saccadic scanpaths on natural images. CVPR, 2011.
[29] W. Wang, Y. Wang, Q. Huang, and W. Gao. Measuring visual saliency by site entropy rate. CVPR, 2010.
[30] J. Yang and M. Yang. Top-down visual saliency via joint CRF and dictionary learning. CVPR, 2012.
[31] L. Zhang, M. H. Tong, T. K. Marks, H. Shan, and G. W. Cottrell. SUN: A Bayesian framework for salience using natural statistics. Journal of Vision, 8(7):32, 1-20, 2008.
[32] Q. Zhao and C. Koch. Learning a saliency map using fixated locations in natural scenes. Journal of Vision, 11(3):9, 1-15, 2011.