356 nips-2013-Zero-Shot Learning Through Cross-Modal Transfer

Source: pdf

Author: Richard Socher, Milind Ganjoo, Christopher D. Manning, Andrew Ng

Abstract: This work introduces a model that can recognize objects in images even if no training data is available for the object class. The only necessary knowledge about unseen visual categories comes from unsupervised text corpora. Unlike previous zero-shot learning models, which can only differentiate between unseen classes, our model can operate on a mixture of seen and unseen classes, simultaneously obtaining state of the art performance on classes with thousands of training images and reasonable performance on unseen classes. This is achieved by seeing the distributions of words in texts as a semantic space for understanding what objects look like. Our deep learning model does not require any manually defined semantic or visual features for either words or images. Images are mapped to be close to semantic word vectors corresponding to their classes, and the resulting image embeddings can be used to distinguish whether an image is of a seen or unseen class. We then use novelty detection methods to differentiate unseen classes from seen classes. We demonstrate two novelty detection strategies; the first gives high accuracy on unseen classes, while the second is conservative in its prediction of novelty and keeps the seen classes’ accuracy high.
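The pipeline described in the abstract (map images into a word-vector space, flag novelty, then pick the nearest class vector) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy 2-D word vectors, the class names, the hypothetical helpers `fit_linear_map`, `nearest`, and `predict`, and the distance threshold. A linear least-squares map stands in for the paper's deep model, and a plain distance threshold stands in for its two novelty detection strategies.

```python
import numpy as np

# Hypothetical toy word vectors; in the paper these are learned from
# unsupervised text corpora and are much higher-dimensional.
word_vecs = {
    "cat": np.array([1.0, 0.0]),
    "dog": np.array([0.0, 1.0]),
    "truck": np.array([-1.0, -1.0]),  # unseen class: no training images
}
seen, unseen = ["cat", "dog"], ["truck"]


def fit_linear_map(X, labels):
    """Least-squares W such that X @ W approximates each image's class word vector."""
    Y = np.stack([word_vecs[c] for c in labels])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def nearest(z, classes):
    """Return the class whose word vector is closest to z, and that distance."""
    dists = [float(np.linalg.norm(z - word_vecs[c])) for c in classes]
    i = int(np.argmin(dists))
    return classes[i], dists[i]


def predict(x, W, threshold=0.75):
    """Classify among seen classes unless the embedding looks novel.

    The threshold rule is an assumed stand-in for novelty detection:
    an embedding far from every seen class vector is treated as unseen.
    """
    z = x @ W
    label, dist = nearest(z, seen)
    if dist > threshold:
        label, _ = nearest(z, unseen)
    return label


# Toy image features for the seen classes only (assumed data).
X_train = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y_train = ["cat", "cat", "dog", "dog"]
W = fit_linear_map(X_train, y_train)
```

On this toy data, a call such as `predict(np.array([-1.0, -1.0]), W)` embeds the input far from both seen class vectors, so it is routed to the unseen classes. Raising `threshold` makes the sketch more conservative about declaring novelty, which mirrors the trade-off between seen-class and unseen-class accuracy that the abstract mentions.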

reference text

[1] M. Baroni and A. Lenci. Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721, 2010.

[2] E. Bart and S. Ullman. Cross-generalization: learning novel classes from a single example by feature replacement. In CVPR, 2005.

[3] Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. A neural probabilistic language model. J. Mach. Learn. Res., 3, March 2003.

[4] J. Blitzer, M. Dredze, and F. Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In ACL, 2007.

[5] E. Bruni, G. Boleda, M. Baroni, and N. Tran. Distributional semantics in technicolor. In ACL, 2012.

[6] A. Coates and A. Ng. The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization. In ICML, 2011.

[7] R. Collobert and J. Weston. A unified architecture for natural language processing: deep neural networks with multitask learning. In ICML, 2008.

[8] J. Curran. From Distributional to Semantic Similarity. PhD thesis, University of Edinburgh, 2004.

[9] K. Erk and S. Padó. A structured vector space model for word meaning in context. In EMNLP, 2008.

[10] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. Describing objects by their attributes. In CVPR, 2009.

[11] Y. Feng and M. Lapata. Visual information in semantic representation. In HLT-NAACL, 2010.

[12] M. Fink. Object classification from a single example utilizing class relevance pseudo-metrics. In NIPS, 2004.

[13] X. Glorot, A. Bordes, and Y. Bengio. Domain adaptation for Large-Scale sentiment classification: A deep learning approach. In ICML, 2011.

[14] D. Hoiem, A.A. Efros, and M. Hebert. Geometric context from a single image. In ICCV, 2005.

[15] E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng. Improving Word Representations via Global Context and Multiple Word Prototypes. In ACL, 2012.

[16] Yangqing Jia, Chang Huang, and T. Darrell. Beyond spatial pyramids: Receptive field learning for pooled image features. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 3370–3377, June 2012.

[17] H. Kriegel, P. Kröger, E. Schubert, and A. Zimek. LoOP: Local Outlier Probabilities. In Proceedings of the 18th ACM conference on Information and knowledge management, CIKM ’09, 2009.

[18] Alex Krizhevsky. Learning Multiple Layers of Features from Tiny Images. Master’s thesis, Computer Science Department, University of Toronto, 2009.

[19] L. Fei-Fei, R. Fergus, and P. Perona. One-shot learning of object categories. TPAMI, 28, 2006.

[20] B. M. Lake, R. Salakhutdinov, J. Gross, and J. B. Tenenbaum. One shot learning of simple visual concepts. In Proceedings of the 33rd Annual Conference of the Cognitive Science Society, 2011.

[21] C. H. Lampert, H. Nickisch, and S. Harmeling. Learning to Detect Unseen Object Classes by Between-Class Attribute Transfer. In CVPR, 2009.

[22] T. K. Landauer and S. T. Dumais. A solution to Plato’s problem: the Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104(2):211–240, 1997.

[23] C.W. Leong and R. Mihalcea. Going beyond text: A hybrid image-text approach for measuring word relatedness. In IJCNLP, 2011.

[24] D. Lin. Automatic retrieval and clustering of similar words. In Proceedings of COLING-ACL, pages 768–774, 1998.

[25] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A.Y. Ng. Multimodal deep learning. In ICML, 2011.

[26] S. Padó and M. Lapata. Dependency-based construction of semantic space models. Computational Linguistics, 33(2):161–199, 2007.

[27] M. Palatucci, D. Pomerleau, G. Hinton, and T. Mitchell. Zero-shot learning with semantic output codes. In NIPS, 2009.

[28] Guo-Jun Qi, C. Aggarwal, Y. Rui, Q. Tian, S. Chang, and T. Huang. Towards cross-category knowledge propagation for learning visual concepts. In CVPR, 2011.

[29] R. Salakhutdinov, J. Tenenbaum, and A. Torralba. Learning to learn with compound hierarchical-deep models. In NIPS, 2012.

[30] H. Schütze. Automatic word sense discrimination. Computational Linguistics, 24:97–124, 1998.

[31] R. Socher and L. Fei-Fei. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. In CVPR, 2010.

[32] P. D. Turney and P. Pantel. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141–188, 2010.

[33] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 2008.