
205 nips-2004-Who's In the Picture

Source: pdf

Author: Tamara L. Berg, Alexander C. Berg, Jaety Edwards, David A. Forsyth

Abstract: The context in which a name appears in a caption provides powerful cues as to who is depicted in the associated image. We obtain 44,773 face images, using a face detector, from approximately half a million captioned news images and automatically link names, obtained using a named entity recognizer, with these faces. A simple clustering method can produce fair results. We improve these results significantly by combining the clustering process with a model of the probability that an individual is depicted given its context. Once the labeling procedure is over, we have an accurately labeled set of faces, an appearance model for each individual depicted, and a natural language model that can produce accurate results on captions in isolation.

reference text

[1] K. Barnard, D.A. Forsyth, “Clustering Art,” Computer Vision and Pattern Recognition, 2001

[2] K. Barnard, P. Duygulu, N. de Freitas, D.A. Forsyth, D. Blei, and M.I. Jordan, “Matching Words and Pictures,” Journal of Machine Learning Research, Vol 3, pp 1107-1135, 2003.

[3] P. Belhumeur, J. Hespanha, D. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection,” Transactions on Pattern Analysis and Machine Intelligence, Special issue on face recognition, pp. 711-720, July 1997.

[4] A.C. Berg, J. Malik, “Geometric Blur for Template Matching,” Computer Vision and Pattern Recognition, Vol I, pp. 607-614, 2001.

[5] T.L. Berg, A.C. Berg, J. Edwards, M. Maire, R. White, E. Learned-Miller, D.A. Forsyth, “Names and Faces in the News,” Computer Vision and Pattern Recognition, 2004.

[6] V. Blanz, T. Vetter, “Face Recognition Based on Fitting a 3D Morphable Model,” Transactions on Pattern Analysis and Machine Intelligence Vol. 25 no.9, 2003.

[7] C. Carson, S. Belongie, H. Greenspan, J. Malik, “Blobworld – Image segmentation using expectation-maximization and its application to image querying,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(8), pp. 1026–1038, 2002.

[8] H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan, “GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications,” 40th Anniversary Meeting of the Association for Computational Linguistics, Philadelphia, July 2002.

[9] C. Fowlkes, S. Belongie, F. Chung, J. Malik, “Spectral Grouping Using The Nyström Method,” TPAMI, Vol. 26, No. 2, February 2004.

[10] R. Gross, J. Shi and J. Cohn, “Quo Vadis Face Recognition?,” Third Workshop on Empirical Evaluation Methods in Computer Vision, December, 2001.

[11] R. Gross, I. Matthews, and S. Baker, “Appearance-Based Face Recognition and LightFields,” Transactions on Pattern Analysis and Machine Intelligence, 2004.

[12] V. Lavrenko, R. Manmatha, J. Jeon, “A Model for Learning the Semantics of Pictures,” Neural Information Processing Systems, 2003

[13] J. Li and J. Z. Wang, “Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach,” Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 9, pp. 1075-1088, 2003

[14] K. Mikolajczyk, “Face detector,” Ph.D report, INRIA Rhône-Alpes

[15] J. Scheeres, “Airport face scanner failed”, Wired News, 2002. http://www.wired.com/news/privacy/0,1848,52563,00.html.

[16] B. Schölkopf, A. Smola, K.-R. Müller, “Nonlinear Component Analysis as a Kernel Eigenvalue Problem,” Neural Computation, Vol. 10, pp. 1299-1319, 1998.

[17] C. Williams, M. Seeger, “Using the Nyström Method to Speed up Kernel Machines,” Advances in Neural Information Processing Systems, Vol 13, pp. 682-688, 2001.