
54 nips-2001-Contextual Modulation of Target Saliency


Author: Antonio Torralba

Abstract: The most popular algorithms for object detection require the use of exhaustive spatial and scale search procedures. In such approaches, an object is defined by means of local features. In this paper we show that including contextual information in object detection procedures provides an efficient way of cutting down the need for exhaustive search. We present results with real images showing that the proposed scheme is able to accurately predict likely object classes, locations and sizes.

reference text

[1] Biederman, I., Mezzanotte, R.J., & Rabinowitz, J.C. (1982). Scene perception: detecting and judging objects undergoing relational violations. Cognitive Psychology, 14:143-177.

[Figure 6: Schema for object detection (e.g. cars) integrating local and global information.]

[2] Carson, C., Belongie, S., Greenspan, H., and Malik, J. (1997). Region-based image querying. Proc. IEEE W. on Content-Based Access of Image and Video Libraries, pp: 42-49.

[3] Gershenfeld, N. (1999). The nature of mathematical modeling. Cambridge University Press.

[4] Gorkani, M. M., Picard, R. W. (1994). Texture orientation for sorting photos 'at a glance'. Proc. Int. Conf. Pat. Rec., Jerusalem, Vol. I: 459-464.

[5] Heisele, B., Serre, T., Mukherjee, S., & Poggio, T. (2001). Feature Reduction and Hierarchy of Classifiers for Fast Object Detection in Video Images. In: Proceedings of 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society Press, Kauai, Hawaii.

[6] Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Analysis and Machine Intelligence, 20(11):1254.

[7] Moghaddam, B., & Pentland, A. (1997). Probabilistic Visual Learning for Object Representation. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(7):696-710.

[8] Oliva, A., & Torralba, A. (2001). Modeling the Shape of the Scene: A holistic representation of the spatial envelope. Int. Journal of Computer Vision, 42(3):145-175.

[9] Rao, R.P.N., Zelinsky, G.J., Hayhoe, M.M., & Ballard, D.H. (1996). Modeling saccadic targeting in visual search. NIPS 8. Cambridge, MA: MIT Press.

[10] Schiele, B., Crowley, J. L. (2000) Recognition without Correspondence using Multidimensional Receptive Field Histograms, Int. Journal of Computer Vision, Vol. 36(1):31-50.

[11] Strat, T. M., & Fischler, M. A. (1991). Context-based vision: recognizing objects using information from both 2-D and 3-D imagery. IEEE Trans. on Pattern Analysis and Machine Intelligence, 13(10):1050-1065.

[12] Szummer, M., & Picard, R. W. (1998). Indoor-outdoor image classification. In IEEE Intl. Workshop on Content-based Access of Image and Video Databases.

[13] Torralba, A., & Sinha, P. (2001). Statistical context priming for object detection. IEEE Proc. of Int. Conf. in Comp. Vision.

[14] Treisman, A., & Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, Vol. 12:97-136.

[15] Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In: Proceedings of 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), IEEE Computer Society Press, Kauai, Hawaii.

[16] Wolfe, J. M. (1994). Guided search 2.0: A revised model of visual search. Psychonomic Bulletin and Review, 1:202-228.