acl acl2013 acl2013-249 acl2013-249-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Carina Silberer ; Vittorio Ferrari ; Mirella Lapata
Abstract: We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data compared to amodal models and word representations based on handcrafted norming data.
M. Andrews, G. Vigliocco, and D. Vinson. 2009. Integrating Experiential and Distributional Data to Learn Semantic Representations. Psychological Review, 116(3):463–498.
M. Baroni, B. Murphy, E. Barbu, and M. Poesio. 2010. Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science, 34(2):222–254.
L. W. Barsalou. 2008. Grounded Cognition. Annual Review of Psychology, 59:617–845.
D. M. Blei, A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, March.
M. Borga. 2001. Canonical Correlation: a Tutorial, January.
M. H. Bornstein, L. R. Cote, S. Maital, K. Painter, S.-Y. Park, L. Pascual, M. G. Pêcheux, J. Ruel, P. Venuti, and A. Vyt. 2004. Cross-linguistic Analysis of Vocabulary in Young Children: Spanish, Dutch, French, Hebrew, Italian, Korean, and American English. Child Development, 75(4):1115–1139.
B. Börschinger, B. K. Jones, and M. Johnson. 2011. Reducing Grounded Learning Tasks to Grammatical Inference. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1416–1425, Edinburgh, UK.
S.R.K. Branavan, H. Chen, L. S. Zettlemoyer, and R. Barzilay. 2009. Reinforcement Learning for Mapping Instructions to Actions. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 82–90, Suntec, Singapore.
E. Bruni, G. Tran, and M. Baroni. 2011. Distributional Semantics from Text and Images. In Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics, pages 22–32, Edinburgh, UK.
E. Bruni, G. Boleda, M. Baroni, and N. Tran. 2012a. Distributional Semantics in Technicolor. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 136–145, Jeju Island, Korea.
E. Bruni, J. Uijlings, M. Baroni, and N. Sebe. 2012b. Distributional semantics with eyes: Using image analysis to improve computational representations of word meaning. In Proceedings of the 20th ACM International Conference on Multimedia, pages 1219–1228, New York, NY.
C. Chai and C. Hung. 2008. Automatically Annotating Images with Keywords: A Review of Image Annotation Systems. Recent Patents on Computer Science, 1:55–68.
R. Datta, D. Joshi, J. Li, and J. Z. Wang. 2008. Image Retrieval: Ideas, Influences, and Trends of the New Age. ACM Computing Surveys, 40(2):1–60.
J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 248–255, Miami, Florida.
M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. 2008. The PASCAL Visual Object Classes Challenge 2008 (VOC2008) Results. http://www.pascalnetwork.org/challenges/VOC/voc2008/workshop.
R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin. 2008. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research, 9:1871–1874.
A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. 2009. Describing Objects by their Attributes. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1778–1785, Miami Beach, Florida.
L. Fei-Fei and P. Perona. 2005. A Bayesian Hierarchical Model for Learning Natural Scene Categories. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 524–531, San Diego, California.
C. Fellbaum, editor. 1998. WordNet: an Electronic Lexical Database. MIT Press.
Y. Feng and M. Lapata. 2008. Automatic image annotation using auxiliary text information. In Proceedings of ACL-08: HLT, pages 272–280, Columbus, Ohio.
Y. Feng and M. Lapata. 2010. Visual Information in Semantic Representation. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 91–99, Los Angeles, California. ACL.
V. Ferrari and A. Zisserman. 2007. Learning Visual Attributes. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 433–440. MIT Press, Cambridge, Massachusetts.
G. H. Golub, F. T. Luk, and M. L. Overton. 1981. A block Lanczos method for computing the singular values and corresponding singular vectors of a matrix. ACM Transactions on Mathematical Software, 7:149–169.
P. Gorniak and D. Roy. 2004. Grounded Semantic Composition for Visual Scenes. Journal of Artificial Intelligence Research, 21:429–470.
T. L. Griffiths, M. Steyvers, and J. B. Tenenbaum. 2007. Topics in Semantic Representation. Psychological Review, 114(2):211–244.
D. R. Hardoon, S. R. Szedmak, and J. R. Shawe-Taylor. 2004. Canonical Correlation Analysis: An Overview with Application to Learning Methods. Neural Computation, 16(12):2639–2664.
B. T. Johns and M. N. Jones. 2012. Perceptual Inference through Global Lexical Similarity. Topics in Cognitive Science, 4(1):103–120.
D. Joshi, J.Z. Wang, and J. Li. 2006. The Story Picturing Engine—A System for Automatic Text Illustration. ACM Transactions on Multimedia Computing, Communications, and Applications, 2(1):68–89.
R. J. Kate and R. J. Mooney. 2007. Learning Language Semantics from Ambiguous Supervision. In Proceedings of the 22nd Conference on Artificial Intelligence, pages 895–900, Vancouver, Canada.
N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar. 2009. Attribute and Simile Classifiers for Face Verification. In Proceedings of the IEEE 12th International Conference on Computer Vision, pages 365–372, Kyoto, Japan.
C. H. Lampert, H. Nickisch, and S. Harmeling. 2009. Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer. In Computer Vision and Pattern Recognition, pages 951–958, Miami Beach, Florida.
B. Landau, L. Smith, and S. Jones. 1998. Object Perception and Object Naming in Early Development. Trends in Cognitive Science, 27:19–24.
C. Leong and R. Mihalcea. 2011. Going Beyond Text: A Hybrid Image-Text Approach for Measuring Word Relatedness. In Proceedings of 5th International Joint Conference on Natural Language Processing, pages 1403–1407, Chiang Mai, Thailand.
J. Liu, B. Kuipers, and S. Savarese. 2011. Recognizing Human Actions by Attributes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3337–3344, Colorado Springs, Colorado.
D. G. Lowe. 1999. Object Recognition from Local Scale-invariant Features. In Proceedings of the International Conference on Computer Vision, pages 1150–1157, Corfu, Greece.
D. Lowe. 2004. Distinctive Image Features from Scale-invariant Keypoints. International Journal of Computer Vision, 60(2):91–110.
W. Lu, H. T. Ng, W.S. Lee, and L. S. Zettlemoyer. 2008. A Generative Model for Parsing Natural Language to Meaning Representations. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 783–792, Honolulu, Hawaii.
K. McRae, G. S. Cree, M. S. Seidenberg, and C. McNorgan. 2005. Semantic Feature Production Norms for a Large Set of Living and Nonliving Things. Behavior Research Methods, 37(4):547–559.
D. L. Nelson, C. L. McEvoy, and T. A. Schreiber. 1998. The University of South Florida Word Association, Rhyme, and Word Fragment Norms.
J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng. 2011. Multimodal deep learning. In Proceedings of the 28th International Conference on Machine Learning, pages 689–696, Bellevue, Washington.
A. Oliva and A. Torralba. 2007. The Role of Context in Object Recognition. Trends in Cognitive Sciences, 11(12):520–527.
D. N. Osherson, J. Stern, O. Wilkie, M. Stob, and E. E. Smith. 1991. Default Probability. Cognitive Science, 2(15):251–269.
G. Patterson and J. Hays. 2012. SUN Attribute Database: Discovering, Annotating and Recognizing Scene Attributes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2751–2758, Providence, Rhode Island.
T. Regier. 1996. The Human Semantic Potential. MIT Press, Cambridge, Massachusetts.
D. Roy and A. Pentland. 2002. Learning Words from Sights and Sounds: A Computational Model. Cognitive Science, 26(1):113–146.
C. Silberer and M. Lapata. 2012. Grounded Models of Semantic Representation. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1423–1433, Jeju Island, Korea.
J. M. Siskind. 2001. Grounding the Lexical Semantics of Verbs in Visual Perception using Force Dynamics and Event Logic. Journal of Artificial Intelligence Research, 15:31–90.
S. A. Sloman and L. J. Ripps. 1998. Similarity as an Explanatory Construct. Cognition, 65:87–101.
N. Srivastava and R. Salakhutdinov. 2012. Multimodal learning with deep Boltzmann machines. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems, pages 2231–2239, Lake Tahoe, Nevada.
M. Steyvers. 2010. Combining feature norms and text data with topic models. Acta Psychologica, 133(3):234–342.
S. Tellex, T. Kollar, S. Dickerson, M. R. Walter, A. Gopal Banerjee, S. Teller, and N. Roy. 2011. Understanding Natural Language Commands for Robotic Navigation and Manipulation. In Proceedings of the 25th National Conference on Artificial Intelligence, pages 1507–1514, San Francisco, California.
L. von Ahn and L. Dabbish. 2004. Labeling images with a computer game. In Proceedings of the Human Factors in Computing Systems Conference, pages 319–326, Vienna, Austria.
C. Yu and D. H. Ballard. 2007. A Unified Model of Early Word Learning Integrating Statistical and Social Cues. Neurocomputing, 70:2149–2165.
M. D. Zeigenfuse and M. D. Lee. 2010. Finding the Features that Represent Stimuli. Acta Psychologica, 133(3):283–295.
J. M. Zelle and R. J. Mooney. 1996. Learning to Parse Database Queries Using Inductive Logic Programming. In Proceedings of the 13th National Conference on Artificial Intelligence, pages 1050–1055, Portland, Oregon.
L. S. Zettlemoyer and M. Collins. 2005. Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, pages 658–666, Edinburgh, UK.