cvpr2013-258-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dmitry Rudoy, Dan B. Goldman, Eli Shechtman, Lihi Zelnik-Manor
Abstract: In recent years, remarkable progress has been made in visual saliency modeling. Our interest is in video saliency. Since videos are fundamentally different from still images, they are viewed differently by human observers. For example, each video frame is observed for only a fraction of a second, while a still image can be viewed leisurely. Therefore, video saliency estimation methods should differ substantially from image saliency methods. In this paper we propose a novel method for video saliency estimation, inspired by the way people watch videos. We explicitly model the continuity of the video by predicting the saliency map of a given frame conditioned on the map from the previous frame. Furthermore, accuracy and computation speed are improved by restricting the salient locations to a carefully selected candidate set. We validate our method on two gaze-tracked video datasets and show that it outperforms the state of the art.
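To make the two ideas in the abstract concrete (conditioning each frame's saliency map on the previous frame's map, and restricting prediction to a sparse candidate set), the following is a minimal Python sketch. It is not the authors' implementation: the candidate scoring, the Gaussian rendering, and the function names (select_candidates, conditioned_saliency) are all illustrative assumptions. The paper learns the frame-to-frame transition from gaze-tracked data, whereas this sketch uses a fixed heuristic weighting.

    import numpy as np

    def select_candidates(prev_saliency, motion_mag, k=10):
        # Hypothetical candidate selection: take the k-1 strongest peaks of
        # a simple blend of the previous frame's saliency and the motion
        # magnitude, plus one center-prior candidate. (The paper's candidate
        # set is richer; this blend is only an assumption.)
        h, w = prev_saliency.shape
        score = 0.5 * prev_saliency + 0.5 * motion_mag
        top = np.argsort(score.ravel())[::-1][: k - 1]
        cands = [np.unravel_index(i, (h, w)) for i in top]
        cands.append((h // 2, w // 2))  # center-bias candidate
        return cands

    def conditioned_saliency(prev_saliency, motion_mag, sigma=20.0):
        # Predict the current frame's map conditioned on the previous one:
        # render a Gaussian blob at each candidate location, weighted by the
        # previous saliency there (a stand-in for a learned transition model).
        h, w = prev_saliency.shape
        ys, xs = np.mgrid[0:h, 0:w]
        out = np.zeros((h, w))
        for cy, cx in select_candidates(prev_saliency, motion_mag):
            weight = prev_saliency[cy, cx] + 1e-6
            out += weight * np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
        return out / out.max()  # normalize to [0, 1]

    # Toy usage: propagate a single-peak saliency map one frame forward.
    prev = np.zeros((120, 160)); prev[40, 60] = 1.0
    motion = 0.1 * np.random.rand(120, 160)
    cur = conditioned_saliency(prev, motion)

Restricting the prediction to a handful of candidate locations is what keeps the per-frame cost low: only a few blobs are scored and rendered per frame, instead of evaluating a dense per-pixel model.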
[1] B. Block. The visual story: seeing the structure of film, TV, and new media. Focal Press, 2001.
[2] A. Borji and L. Itti. State-of-the-art in visual attention modeling. PAMI, 2012.
[3] A. Borji, D. Sihite, and L. Itti. Quantitative analysis of human-model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing, 2012.
[4] L. Bourdev and J. Brandt. Robust object detection via soft cascade. In CVPR, pages 236–243, 2005.
[5] L. Bourdev and J. Malik. Poselets: Body part detectors trained using 3d human pose annotations. In ICCV, pages 1365–1372, 2009.
[6] G. Buswell. How people look at pictures: a study of the psychology and perception in art. 1935.
[7] Y. Cheng. Mean shift, mode seeking, and clustering. PAMI, 17(8):790–799, 1995.
[8] X. Cui, Q. Liu, and D. Metaxas. Temporal spectral residual: fast motion saliency detection. In Proceedings of the ACM International Conference on Multimedia, 2009.
[9] S. Goferman, L. Zelnik-Manor, and A. Tal. Context-aware saliency detection. PAMI, 34(10):1915–1926, 2012.
[10] R. Goldstein, R. Woods, and E. Peli. Where people look when watching movies: Do all viewers look at the same place? Computers in Biology and Medicine, 37(7):957–964, 2007.
Figure 7 (caption): Our saliency maps resemble the ground truth. Examples of saliency detection results using different methods (columns: Humans, Ours, Center, GBVS, PQFT, Hou) show that the saliency predicted by the proposed method better approximates the human gaze map.
[11] C. Guo, Q. Ma, and L. Zhang. Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform. In CVPR, pages 1–8, 2008.
[12] J. Harel, C. Koch, and P. Perona. Graph-based visual saliency. NIPS, 19:545, 2007.
[13] J. Henderson. Human gaze control during real-world scene perception. Trends in Cognitive Sciences, 7(11):498–504, 2003.
[14] X. Hou and L. Zhang. Dynamic visual attention: Searching for coding length increments. NIPS, 21:681–688, 2008.
[15] L. Itti. Automatic foveation for video compression using a neurobiological model of visual attention. IEEE Transactions on Image Processing, 13(10):1304–1318, 2004.
[16] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. PAMI, 20(11):1254–1259, 1998.
[17] G. Johansson. Visual perception of biological motion and a model for its analysis. Perceiving Events and Objects, 1973.
[18] T. Judd, K. Ehinger, F. Durand, and A. Torralba. Learning to predict where humans look. In ICCV, pages 2106–2113, 2009.
[19] W. Kim, C. Jung, and C. Kim. Spatiotemporal saliency detection and its applications in static and dynamic scenes. IEEE Transactions on Circuits and Systems for Video Technology, 21(4):446–456, 2011.
[20] C. Koch and S. Ullman. Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology, 4(4):219–227, 1985.
[21] A. Liaw and M. Wiener. Classification and regression by randomForest. R News, 2(3):18–22, 2002.
[22] C. Liu. Beyond pixels: exploring new representations and applications for motion analysis. PhD thesis, Massachusetts Institute of Technology, 2009.
[23] V. Mahadevan and N. Vasconcelos. Spatiotemporal saliency in dynamic scenes. PAMI, 32(1):171–177, 2010.
[24] P. Mital, T. Smith, R. Hill, and J. Henderson. Clustering of gaze during dynamic scene viewing is predicted by motion. Cognitive Computation, 3(1):5–24, 2011.
[25] B. Schauerte and R. Stiefelhagen. Predicting human gaze using quaternion DCT image signature saliency and face detection. In IEEE Workshop on Applications of Computer Vision (WACV), pages 137–144, 2012.
[26] H. Seo and P. Milanfar. Static and space-time visual saliency detection by self-resemblance. Journal of Vision, 9(7), 2009.
[27] T. Smith. Attentional theory of cinematic continuity. Projections: The Journal for Movies and Mind, 6(1):1–27, 2012.
[28] T. Smith and J. Henderson. Edit blindness: The relationship between attention and global change blindness in dynamic scenes. Journal of Eye Movement Research, 2(2):6, 2008.
[29] Tobii. Advertising research and eye tracking. http://www.tobii.com/eye-tracking-research/global/research/advertising-research/.
[30] A. Treisman and G. Gelade. A feature-integration theory of attention. Cognitive Psychology, 12(1):97–136, 1980.
[31] P. Tseng, R. Carmi, I. Cameron, D. Munoz, and L. Itti. Quantifying center bias of observers in free viewing of dynamic natural scenes. Journal of Vision, 9(7), 2009.
[32] S. Zanetti, L. Zelnik-Manor, and P. Perona. A walk through the web's video clips. In CVPRW, pages 1–8, 2008.