A Model for Learning the Semantics of Pictures (NIPS 2003, paper 12)

Source: pdf

Authors: Victor Lavrenko, R. Manmatha, Jiwoon Jeon

Abstract: We propose an approach to learning the semantics of images that allows us to automatically annotate an image with keywords and to retrieve images based on text queries. We do this using a formalism that models the generation of annotated images. We assume that every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of annotated images, we compute a joint probabilistic model of image features and words, which allows us to predict the probability of generating a word given the image regions. This model can be used to annotate images automatically and to retrieve images given a word as a query. Experiments show that our model significantly outperforms the best previously reported results on the tasks of automatic image annotation and retrieval.
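The abstract names a joint model of region features and words but does not spell out how it is estimated. As a hedged illustration only, the sketch below builds one in a nonparametric, relevance-model style (compare [5] and [7]): each training image J contributes a Gaussian kernel density over its region feature vectors and a smoothed word distribution, and P(w | regions) is obtained by summing over training images with a uniform prior P(J). The isotropic kernel, the bandwidth `beta`, the smoothing weight `mu`, and the helper names `annotate` and `retrieve` are all assumptions made for this example, not the authors' estimator.

```python
import math
from collections import Counter


def gaussian_kernel(x, center, beta):
    """Isotropic Gaussian density N(x; center, beta * I) (assumed kernel form)."""
    d = len(x)
    sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq / (2 * beta)) / ((2 * math.pi * beta) ** (d / 2))


def region_likelihood(region, train_regions, beta):
    """P(r | J): average of kernels centred on training image J's regions."""
    return sum(gaussian_kernel(region, g, beta) for g in train_regions) / len(train_regions)


def word_prob(word, train_words, vocab_counts, total_words, mu):
    """P(w | J): word counts in J, smoothed toward the collection frequency (assumed)."""
    counts = Counter(train_words)
    background = vocab_counts[word] / total_words
    return (counts[word] + mu * background) / (len(train_words) + mu)


def annotate(query_regions, training_set, beta=1.0, mu=1.0):
    """Hypothetical helper: return P(w | query regions) for every vocabulary word.

    training_set is a list of (regions, words) pairs; P(J) is uniform, so it
    cancels when the scores are normalised.
    """
    vocab_counts = Counter(w for _, words in training_set for w in words)
    total = sum(vocab_counts.values())
    # P(r_1, ..., r_n | J): regions treated as independent given J.
    image_weights = []
    for regions, _ in training_set:
        p = 1.0
        for r in query_regions:
            p *= region_likelihood(r, regions, beta)
        image_weights.append(p)
    # Joint score P(w, r_1, ..., r_n) up to the constant P(J).
    scores = {
        w: sum(pj * word_prob(w, words, vocab_counts, total, mu)
               for pj, (_, words) in zip(image_weights, training_set))
        for w in vocab_counts
    }
    norm = sum(scores.values())
    return {w: s / norm for w, s in scores.items()}


def retrieve(query_word, test_images, training_set, beta=1.0, mu=1.0):
    """Hypothetical helper: rank unannotated images by P(query_word | regions)."""
    scored = [(annotate(regions, training_set, beta, mu).get(query_word, 0.0), idx)
              for idx, regions in enumerate(test_images)]
    return [idx for _, idx in sorted(scored, reverse=True)]


if __name__ == "__main__":
    train = [([(0.0, 0.0), (0.2, 0.1)], ["sky", "water"]),
             ([(5.0, 5.0), (5.1, 4.9)], ["grass", "tiger"])]
    probs = annotate([(0.1, 0.0)], train)
    print(sorted(probs, key=probs.get, reverse=True)[:2])
    print(retrieve("tiger", [[(0.0, 0.1)], [(5.0, 5.0)]], train))
```

Annotation takes the top-ranked words from `annotate`, and retrieval reuses the same conditional probability to rank images for a one-word query, which mirrors the two uses the abstract describes. Region likelihoods are multiplied directly here for readability; with many regions or high-dimensional features, a log-space sum would avoid underflow.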

References

[1] K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. Blei, and M. I. Jordan. Matching words and pictures. Journal of Machine Learning Research, 3:1107–1135, 2003.

[2] D. Blei. Private communication, 2003.

[3] D. Blei and M. I. Jordan. Modeling annotated data. In Proceedings of the 26th Intl. ACM SIGIR Conf., pages 127–134, 2003.

[4] P. Duygulu, K. Barnard, N. de Freitas, and D. Forsyth. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In Seventh European Conf. on Computer Vision, pages 97–112, 2002.

[5] J. Jeon, V. Lavrenko, and R. Manmatha. Automatic image annotation and retrieval using cross-media relevance models. In Proceedings of the 26th Intl. ACM SIGIR Conf., pages 119–126, 2003.

[6] J. M. Ponte and W. B. Croft. A language modeling approach to information retrieval. In Proceedings of the 21st Intl. ACM SIGIR Conf., pages 275–281, 1998.

[7] V. Lavrenko and W. Croft. Relevance-based language models. In Proceedings of the 24th Intl. ACM SIGIR Conf., pages 120–127, 2001.

[8] V. Lavrenko, M. Choquette, and W. Croft. Cross-lingual relevance models. In Proceedings of the 25th Intl. ACM SIGIR Conf., pages 175–182, 2002.

[9] Y. Mori, H. Takahashi, and R. Oka. Image-to-word transformation based on dividing and vector quantizing images with words. In MISRM’99 First Intl. Workshop on Multimedia Intelligent Storage and Retrieval Management, 1999.

[10] H. Schneiderman and T. Kanade. A statistical method for 3D object detection applied to faces and cars. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 1746–1759, 2000.

[11] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.