emnlp emnlp2013 emnlp2013-78 emnlp2013-78-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dieu-Thu Le ; Jasper Uijlings ; Raffaella Bernardi
Abstract: The problem of learning language models from large text corpora has been widely studied within the computational linguistic community. However, little is known about the performance of these language models when applied to the computer vision domain. In this work, we compare representative models: a window-based model, a topic model, a distributional memory and a commonsense knowledge database, ConceptNet, in two visual recognition scenarios: human action recognition and object prediction. We examine whether the knowledge extracted from texts through these models are compatible to the knowledge represented in images. We determine the usefulness of different language models in aiding the two visual recognition tasks. The study shows that the language models built from general text corpora can be used instead of expensive annotated images and even outperform the image model when testing on a big general dataset.