
202 nips-2007-The discriminant center-surround hypothesis for bottom-up saliency

Source: pdf

Author: Dashan Gao, Vijay Mahadevan, Nuno Vasconcelos

Abstract: The classical hypothesis, that bottom-up saliency is a center-surround process, is combined with a more recent hypothesis that all saliency decisions are optimal in a decision-theoretic sense. The combined hypothesis is denoted as discriminant center-surround saliency, and the corresponding optimal saliency architecture is derived. This architecture equates the saliency of each image location to the discriminant power of a set of features with respect to the classification problem that opposes stimuli at center and surround, at that location. It is shown that the resulting saliency detector makes accurate quantitative predictions for various aspects of the psychophysics of human saliency, including non-linear properties beyond the reach of previous saliency models. Furthermore, it is shown that discriminant center-surround saliency can be easily generalized to various stimulus modalities (such as color, orientation and motion), and provides optimal solutions for many other saliency problems of interest for computer vision. Optimal solutions, under this hypothesis, are derived for a number of the former (including static natural images, dense motion fields, and even dynamic textures), and applied to a number of the latter (the prediction of human eye fixations, motion-based saliency in the presence of ego-motion, and motion-based saliency in the presence of highly dynamic backgrounds). As a result, discriminant saliency is shown to predict eye fixations better than previous models, and to produce background subtraction algorithms that outperform the state of the art in computer vision.
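The abstract's central idea, saliency as the discriminant power of features for a center-versus-surround classification problem, can be illustrated with a minimal sketch. The abstract does not name the measure of discriminant power, so this sketch assumes it is the mutual information between each feature's response and the center/surround label, estimated from histograms. The function names, the bin count, and the [0, 1] feature range are hypothetical choices made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def mutual_information(center, surround, n_bins=8, lo=0.0, hi=1.0):
    """Estimate I(X; Y) in bits between a scalar feature X and a binary
    label Y (1 = center, 0 = surround) from two non-empty lists of
    feature responses. Histogram binning is an illustrative assumption."""

    def bin_index(x):
        # Clamp to [lo, hi], then map to one of n_bins equal-width bins.
        t = (min(max(x, lo), hi) - lo) / (hi - lo)
        return min(int(t * n_bins), n_bins - 1)

    # Joint counts over (bin, label).
    joint = Counter()
    for x in center:
        joint[(bin_index(x), 1)] += 1
    for x in surround:
        joint[(bin_index(x), 0)] += 1

    total = len(center) + len(surround)
    p_y = {1: len(center) / total, 0: len(surround) / total}

    # Marginal over bins, obtained by summing the joint over labels.
    p_x = Counter()
    for (b, _), n in joint.items():
        p_x[b] += n / total

    # I(X; Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    mi = 0.0
    for (b, y), n in joint.items():
        p_xy = n / total
        mi += p_xy * math.log2(p_xy / (p_x[b] * p_y[y]))
    return mi


def saliency(center_feats, surround_feats):
    """Hypothetical saliency score at one location: the sum over features
    of the mutual information between that feature and the label."""
    return sum(
        mutual_information(c, s) for c, s in zip(center_feats, surround_feats)
    )
```

Under these assumptions, a feature whose responses are distributed identically in center and surround contributes zero, while a feature that perfectly separates the two windows (with equally many samples in each) contributes one bit.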

reference text

[1] N. D. Bruce and J. K. Tsotsos. Saliency based on information maximization. In Proc. NIPS, 2005.

[2] R. Buccigrossi and E. Simoncelli. Image compression via joint statistical characterization in the wavelet domain. IEEE Transactions on Image Processing, 8:1688–1701, 1999.

[3] A. B. Chan and N. Vasconcelos. Modeling, clustering, and segmenting video with mixtures of dynamic textures. IEEE Trans. PAMI, In Press.

[4] M. N. Do and M. Vetterli. Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance. IEEE Trans. Image Processing, 11(2):146–158, 2002.

[5] G. Doretto, A. Chiuso, Y. N. Wu, and S. Soatto. Dynamic textures. Int. J. Comput. Vis., 51, 2003.

[6] D. Gao and N. Vasconcelos. Discriminant saliency for visual recognition from cluttered scenes. In Proc. NIPS, pages 481–488, 2004.

[7] D. Gao and N. Vasconcelos. Decision-theoretic saliency: computational principle, biological plausibility, and implications for neurophysiology and psychophysics. submitted to Neural Computation, 2007.

[8] J. Harel, C. Koch, and P. Perona. Graph-based visual saliency. In Proc. NIPS, 2006.

[9] J. Huang and D. Mumford. Statistics of Natural Images and Models. In Proc. IEEE Conf. CVPR, 1999.

[10] D. H. Hubel and T. N. Wiesel. Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat. J. Neurophysiol., 28:229–289, 1965.

Figure 8: Results on Boats: (a) original; (b) discriminant saliency with DT; and (c) GMM model of [16, 24].

Figure 9: Results on Surfer: (a) original; (b) discriminant saliency with DT; and (c) GMM model of [16, 24].

[11] L. Itti and C. Koch. A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40:1489–1506, 2000.

[12] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. PAMI, 20(11), 1998.

[13] S. G. Mallat. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. PAMI, 11(7):674–693, 1989.

[14] H. C. Nothdurft. The conspicuousness of orientation and motion contrast. Spat. Vis., 7, 1993.

[15] Y. Sheikh and M. Shah. Bayesian modeling of dynamic scenes for object detection. IEEE Trans. PAMI, 27(11):1778–1792, 2005.

[16] C. Stauffer and W. Grimson. Adaptive background mixture models for real-time tracking. In CVPR, pages 246–52, 1999.

[17] B. W. Tatler, R. J. Baddeley, and I. D. Gilchrist. Visual correlates of fixation selection: effects of scale and time. Vision Research, 45:643–659, 2005.

[18] A. Treisman and G. Gelade. A feature-integration theory of attention. Cognit. Psych., 12, 1980.

[19] A. Treisman and S. Gormican. Feature analysis in early vision: Evidence from search asymmetries. Psychological Review, 95:14–58, 1988.

[20] A. Tversky. Features of similarity. Psychol. Rev., 84, 1977.

[21] N. Vasconcelos. Scalable discriminant feature selection for image retrieval. In CVPR, 2004.

[22] D. Walther and C. Koch. Modeling attention to salient proto-objects. Neural Networks, 19, 2006.

[23] J. Zhong and S. Sclaroff. Segmenting foreground objects from a dynamic textured background via a robust Kalman filter. In ICCV, 2003.

[24] Z. Zivkovic. Improved adaptive Gaussian mixture model for background subtraction. In ICPR, 2004.