iccv iccv2013 iccv2013-247 iccv2013-247-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yin Li, Alireza Fathi, James M. Rehg
Abstract: We present a model for gaze prediction in egocentric video by leveraging the implicit cues that exist in the camera wearer’s behavior. Specifically, we compute the camera wearer’s head motion and hand location from the video and combine them to estimate where the eyes look. We further model the dynamic behavior of the gaze, in particular fixations, as latent variables to improve the gaze prediction. Our gaze prediction results outperform the state-of-the-art algorithms by a large margin on publicly available egocentric vision datasets. In addition, we demonstrate that we get a significant performance boost in recognizing daily actions and segmenting foreground objects by plugging our gaze predictions into state-of-the-art methods.
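To make the abstract's idea of combining head motion and hand location into a gaze estimate concrete, here is a minimal, hypothetical Python sketch. It is not the authors' model: the Gaussian cue maps, the weighted-sum combination, and every parameter value below are illustrative assumptions.

```python
# Hypothetical sketch: fuse a head-motion cue and a hand-location cue
# into a single gaze probability map. Not the authors' actual method.
import numpy as np


def gaussian_map(h, w, center_yx, sigma):
    """Return an (h, w) map with a normalized 2-D Gaussian bump at center_yx."""
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (ys - center_yx[0]) ** 2 + (xs - center_yx[1]) ** 2
    g = np.exp(-d2 / (2.0 * sigma ** 2))
    return g / g.sum()


def predict_gaze_map(frame_shape, head_motion_xy, hand_yx,
                     motion_gain=2.0, sigma=40.0, w_head=0.6, w_hand=0.4):
    """Assumed gaze prior: an image-center prior shifted by the (scaled)
    head motion, blended with a Gaussian around the detected hand location."""
    h, w = frame_shape
    # Head-motion cue: shift the center prior along the estimated motion.
    center = np.array([h / 2.0, w / 2.0])
    shifted = center + motion_gain * np.array([head_motion_xy[1],
                                               head_motion_xy[0]])
    head_cue = gaussian_map(h, w, shifted, sigma)
    # Hand cue: Gaussian around the hand location, uniform if no hand is visible.
    if hand_yx is not None:
        hand_cue = gaussian_map(h, w, hand_yx, sigma)
    else:
        hand_cue = np.full((h, w), 1.0 / (h * w))
    gaze = w_head * head_cue + w_hand * hand_cue
    return gaze / gaze.sum()


if __name__ == "__main__":
    gaze = predict_gaze_map((480, 640), head_motion_xy=(5.0, -2.0),
                            hand_yx=(300, 400))
    print("most likely gaze point (y, x):",
          np.unravel_index(np.argmax(gaze), gaze.shape))
```

In the paper itself these cues are learned from data and combined with a latent-variable model of fixations; the fixed weights here simply illustrate the cue-fusion idea.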
[1] S. Ba and J. Odobez. Multiperson visual focus of attention from head pose and meeting contextual cues. IEEE TPAMI, 33(1):101–116, 2011. 2
[2] A. Borji and L. Itti. State-of-the-art in visual attention modeling. IEEE TPAMI, 35(1):185–207, 2013. 1, 2
[3] A. Borji, D. N. Sihite, and L. Itti. Probabilistic learning of task-specific visual attention. In CVPR, pages 470–477, 2012. 1, 2
[4] T. Brox and J. Malik. Large displacement optical flow: descriptor matching in variational motion estimation. IEEE TPAMI, 33(3):500–513, 2011. 3
[5] J. Carreira and C. Sminchisescu. Constrained parametric min-cuts for automatic object segmentation. In CVPR, pages 3241–3248, 2010. 7
[6] A. Fathi, A. Farhadi, and J. Rehg. Understanding egocentric activities. In ICCV, pages 407–414, 2011. 2
[7] A. Fathi, Y. Li, and J. M. Rehg. Learning to recognize daily actions using gaze. In ECCV, pages 314–327, 2012. 1, 2, 3, 6, 7, 8
[8] J. Harel, C. Koch, and P. Perona. Graph-based visual saliency. In NIPS, pages 545–552, 2006. 2, 6
[9] X. Hou, J. Harel, and C. Koch. Image signature: Highlighting sparse salient regions. IEEE TPAMI, 34(1):194–201, 2012. 2
[10] X. Hou and L. Zhang. Dynamic visual attention: searching for coding length increments. In NIPS, pages 681–688, 2008. 2, 6
[11] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE TPAMI, 20(11):1254–1259, 1998. 1, 2, 6
[12] T. Judd, K. Ehinger, F. Durand, and A. Torralba. Learning to predict where humans look. In ICCV, 2009. 2, 3
[13] T. Kanade and M. Hebert. First-person vision. Proceedings of the IEEE, 100(8):2442–2453, 2012. 1
[14] M. F. Land. The coordination of rotations of the eyes, head and trunk in saccadic turns produced in natural situations. Experimental Brain Research, 159:151–160, 2004. 1, 2, 3
[15] M. F. Land and M. Hayhoe. In what ways do eye movements contribute to everyday activities? Vision Research, 41:3559–3565, 2001. 2, 3, 5, 7
[16] Y. J. Lee, J. Ghosh, and K. Grauman. Discovering important people and objects for egocentric video summarization. In CVPR, pages 1346–1353, 2012. 2
[17] A. K. Mishra, Y. Aloimonos, L. Cheong, and A. Kassim. Active visual segmentation. IEEE TPAMI, 34(2):639–653, 2012. 1, 7
[18] M. Nystrom and K. Holmqvist. An adaptive algorithm for fixation, saccade, and glissade detection in eyetracking data. Behavior Research Methods, 42(1):188–204, 2010. 4, 5
[19] J. Pelz, M. Hayhoe, and R. Loeber. The coordination of eye, head, and hand movements in a natural task. Experimental Brain Research, 139:266–277, 2001. 1, 2, 3
[20] H. Pirsiavash and D. Ramanan. Detecting activities of daily living in first-person camera views. In CVPR, pages 2847–2854, 2012. 2
[21] L. Ren and J. Crawford. Coordinate transformations for hand-guided saccades. Experimental Brain Research, 195:455–465, 2009. 3
[22] J. Shotton, J. Winn, C. Rother, and A. Criminisi. TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In ECCV, pages 1–15, 2006. 4
[23] E. H. Spriggs, F. De la Torre Frade, and M. Hebert. Temporal segmentation and activity classification from first-person sensing. In IEEE Workshop on Egocentric Vision, CVPR, 2009. 2
[24] A. Torralba, A. Oliva, M. S. Castelhano, and J. M. Henderson. Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychological Review, 113(4):766–786, 2006. 1, 2
[25] K. Yamada, Y. Sugano, T. Okabe, Y. Sato, A. Sugimoto, and K. Hiraki. Attention prediction in egocentric video using motion and visual saliency. In Advances in Image and Video Technology, volume 7087 of Lecture Notes in Computer Science, pages 277–288. 2012. 2
[26] C. Yu and D. Ballard. Understanding human behaviors based on eye-head-hand coordination. In Biologically Motivated Computer Vision, volume 2525 of Lecture Notes in Computer Science, pages 611–619. 2002. 2