
259 nips-2009-Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation


Source: pdf

Author: Jie Luo, Barbara Caputo, Vittorio Ferrari

Abstract: Given a corpus of news items consisting of images accompanied by text captions, we want to find out “who’s doing what”, i.e. associate names and action verbs in the captions to the face and body pose of the persons in the images. We present a joint model for simultaneously solving the image-caption correspondences and learning visual appearance models for the face and pose classes occurring in the corpus. These models can then be used to recognize people and actions in novel images without captions. We demonstrate experimentally that our joint ‘face and pose’ model solves the correspondence problem better than earlier models covering only the face, and that it can perform recognition of new uncaptioned images.


reference text

[1] http://opennlp.sourceforge.net/.

[2] K. Barnard, P. Duygulu, D. Forsyth, N. de Freitas, D. Blei, and M. Jordan. Matching words and pictures. JMLR, 3:1107–1135, 2003.

[3] K. Barnard and Q. Fan. Reducing correspondence ambiguity in loosely labeled training data. In Proc. CVPR’07.

[4] S. Basu, M. Bilenko, A. Banerjee, and R. J. Mooney. Probabilistic semi-supervised clustering with constraints. In O. Chapelle, B. Schölkopf, and A. Zien, editors, Semi-Supervised Learning, pages 71–98. MIT Press, 2006.

[5] T. Berg, A. Berg, J. Edwards, and D. Forsyth. Names and faces in the news. In Proc. CVPR’04.

[6] T. Berg, A. Berg, J. Edwards, and D. Forsyth. Who’s in the picture? In Proc. NIPS’04.

[7] A. P. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39:1–38, 1977.

[8] K. Deschacht and M.-F. Moens. Semi-supervised semantic role labeling using the latent words language model. In Proc. EMNLP’09.

[9] I. Dhillon, Y. Guan, and B. Kulis. Kernel k-means: spectral clustering and normalized cuts. In Proc. KDD’04.

[10] P. Duygulu, K. Barnard, N. de Freitas, and D. Forsyth. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In Proc. ECCV’02.

[11] M. Eichner and V. Ferrari. Better appearance models for pictorial structures. In Proc. BMVC’09.

[12] M. Everingham, J. Sivic, and A. Zisserman. Hello! My name is... Buffy - automatic naming of characters in TV video. In Proc. BMVC’06.

[13] V. Ferrari, M. Marin, and A. Zisserman. Pose search: retrieving people using their pose. In Proc. CVPR’09.

[14] V. Ferrari, M. Marin, and A. Zisserman. Progressive search space reduction for human pose estimation. In Proc. CVPR’08.

[15] A. Frome, Y. Singer, and J. Malik. Image retrieval and classification using local distance functions. In Proc. NIPS’06.

[16] M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid. Automatic face naming with caption-based supervision. In Proc. CVPR’08.

[17] A. Gupta and L. Davis. Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers. In Proc. ECCV’08.

[18] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.

[19] J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proc. of 5th Berkeley Symposium on Mathematical Statistics and Probability, 1967.

[20] T. Malisiewicz and A. Efros. Recognition by association via learning per-exemplar distances. In Proc. CVPR’08.

[21] T. Mensink and J. Verbeek. Improving people search using query expansions: How friends help to find people. In Proc. ECCV’08.

[22] R. Neal and G. E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan, editor, Learning in Graphical Models, pages 355–368. Kluwer Academic Publishers, 1998.

[23] Y. Rodriguez. Face Detection and Verification using Local Binary Patterns. PhD thesis, École Polytechnique Fédérale de Lausanne, 2006.

[24] N. Shental, A. Bar-Hillel, T. Hertz, and D. Weinshall. Computing Gaussian mixture models with EM using equivalence constraints. In Proc. NIPS’03.

[25] K. Wagstaff, C. Cardie, S. Rogers, and S. Schroedl. Constrained k-means clustering with background knowledge. In Proc. ICML’01.