acl acl2012 acl2012-76 acl2012-76-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Elia Bruni ; Gemma Boleda ; Marco Baroni ; Nam Khanh Tran
Abstract: Our research aims at building computational models of word meaning that are perceptually grounded. Using computer vision techniques, we build visual and multimodal distributional models and compare them to standard textual models. Our results show that, while visual models built with state-of-the-art computer vision techniques perform worse than textual models in general tasks (accounting for semantic relatedness), they are as good as or better than textual models at capturing the meaning of words with visual correlates such as color terms, even in a nontrivial task that involves nonliteral uses of such words. Moreover, we show that visual and textual information tap into different aspects of meaning, and indeed combining them in multimodal models often improves performance.
Marco Baroni and Alessandro Lenci. 2008. Concepts and properties in word spaces. Italian Journal of Linguistics, 20(1):55–88.
Marco Baroni and Alessandro Lenci. 2010. Distributional Memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.
Shane Bergsma and Randy Goebel. 2011. Using visual information to predict lexical preference. In Proceedings of Recent Advances in Natural Language Processing, pages 399–405, Hissar.
Shane Bergsma and Benjamin Van Durme. 2011. Learning bilingual lexicons using the visual similarity of labeled web images. In Proc. IJCAI, pages 1764–1769, Barcelona, Spain, July.
Brent Berlin and Paul Kay. 1969. Basic Color Terms: Their Universality and Evolution. University of California Press, Berkeley, CA.
Anna Bosch, Andrew Zisserman, and Xavier Munoz. 2007. Image Classification using Random Forests and Ferns. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1–8.
Elia Bruni, Giang Binh Tran, and Marco Baroni. 2011. Distributional semantics from text and images. In Proceedings of the EMNLP GEMS Workshop, pages 22–32, Edinburgh.
John Canny. 1986. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell., 8(6):679–698.
Gabriella Csurka, Christopher Dance, Lixin Fan, Jutta Willamowski, and Cédric Bray. 2004. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, pages 1–22.
Stefan Evert. 2005. The Statistics of Word Cooccurrences. Dissertation, Stuttgart University.
Mark D. Fairchild. 2005. Status of CIE color appearance models.
A. Farhadi, M. Hejrati, M. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. 2010. Every picture tells a story: Generating sentences from images. In Proceedings of ECCV.
Yansong Feng and Mirella Lapata. 2010. Visual information in semantic representation. In Proceedings of HLT-NAACL, pages 91–99, Los Angeles, CA.
Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. 2002. Placing search in context: The concept revisited. ACM Transactions on Information Systems, 20(1):116–131.
Kristen Grauman and Trevor Darrell. 2005. The pyramid match kernel: Discriminative classification with sets of image features. In ICCV, pages 1458–1465.
G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. Berg, and T. Berg. 2011. Baby talk: Understanding and generating simple image descriptions. In Proceedings of CVPR.
Thomas Landauer and Susan Dumais. 1997. A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240.
Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. 2006. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2, CVPR 2006, pages 2169–2178, Washington, DC, USA. IEEE Computer Society.
Chee Wee Leong and Rada Mihalcea. 2011. Going beyond text: A hybrid image-text approach for measuring word relatedness. In Proceedings of IJCNLP, pages 1403–1407, Chiang Mai, Thailand.
Max Louwerse. 2011. Symbol interdependency in symbolic and embodied cognition. Topics in Cognitive Science, 3:273–302.
David Lowe. 1999. Object Recognition from Local Scale-Invariant Features. Computer Vision, IEEE International Conference on, 2:1150–1157, August.
David Lowe. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), November.
Kevin Lund and Curt Burgess. 1996. Producing high-dimensional semantic spaces from lexical cooccurrence. Behavior Research Methods, 28:203–208.
Saif Mohammad. 2011. Colourful language: Measuring word-colour associations. In Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics, pages 97–106, Portland, Oregon.
Raymond J. Mooney. 2008. Learning to connect language and perception.
David Nister and Henrik Stewenius. 2006. Scalable recognition with a vocabulary tree. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2, CVPR ’06, pages 2161–2168.
Aude Oliva and Antonio Torralba. 2001. Modeling the shape of the scene: A holistic representation of the spatial envelope. Int. J. Comput. Vision, 42:145–175.
Gözde Özbal, Carlo Strapparava, Rada Mihalcea, and Daniele Pighin. 2011. A comparison of unsupervised methods to associate colors with words. In Proceedings of ACII, pages 42–51, Memphis, TN.
Ekaterina Shutova. 2010. Models of metaphor in NLP. In Proceedings of ACL, pages 688–697, Uppsala, Sweden.
Josef Sivic and Andrew Zisserman. 2003. Video Google: A text retrieval approach to object matching in videos. In Proceedings of the International Conference on Computer Vision, volume 2, pages 1470–1477, October.
Richard Szeliski. 2010. Computer Vision: Algorithms and Applications. Springer-Verlag New York Inc.
Peter Turney and Patrick Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141–188.
Peter Turney, Yair Neuman, Dan Assaf, and Yohai Cohen. 2011. Literal and metaphorical sense identification through concrete and abstract context. In Proceedings of EMNLP, pages 680–690, Edinburgh, UK.
Andrea Vedaldi and Brian Fulkerson. 2008. VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/.
Luis von Ahn and Laura Dabbish. 2004. Labeling images with a computer game. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 319–326, Vienna, Austria.
Jun Yang, Yu-Gang Jiang, Alexander G. Hauptmann, and Chong-Wah Ngo. 2007. Evaluating bag-of-visual-words representations in scene classification. In Multimedia Information Retrieval, pages 197–206.
Song Chun Zhu, Cheng en Guo, Ying Nian Wu, and Yizhou Wang. 2002. What are textons? In Computer Vision - ECCV 2002, 7th European Conference on Computer Vision, Copenhagen, Denmark, May 28-31, 2002, Proceedings, Part IV, pages 793–807. Springer.