acl acl2012 acl2012-76 knowledge-graph by maker-knowledge-mining

76 acl-2012-Distributional Semantics in Technicolor


Source: pdf

Author: Elia Bruni ; Gemma Boleda ; Marco Baroni ; Nam Khanh Tran

Abstract: Our research aims at building computational models of word meaning that are perceptually grounded. Using computer vision techniques, we build visual and multimodal distributional models and compare them to standard textual models. Our results show that, while visual models with state-of-the-art computer vision techniques perform worse than textual models in general tasks (accounting for semantic relatedness), they are as good or better models of the meaning of words with visual correlates such as color terms, even in a nontrivial task that involves nonliteral uses of such words. Moreover, we show that visual and textual information are tapping on different aspects of meaning, and indeed combining them in multimodal models often improves performance.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Using computer vision techniques, we build visual and multimodal distributional models and compare them to standard textual models. [sent-6, score-1.217]

2 Moreover, we show that visual and textual information are tapping on different aspects of meaning, and indeed combining them in multimodal models often improves performance. [sent-8, score-0.876]

3 Recent developments in computer vision make it possible to computationally model one vital human perceptual channel: vision (Mooney, 2008). [sent-11, score-0.556]

4 A few studies have begun to use visual information extracted from images as part of distributional semantic models (Bergsma and Van Durme, 2011). [sent-12, score-0.847]

5 These preliminary studies all focus on how vision may help text-based models in general terms, by evaluating performance on, for instance, word similarity datasets such as WordSim353. [sent-16, score-0.333]

6 This paper contributes to connecting language and perception, focusing on how to exploit visual information to build better models of word meaning, in three ways: (1) We carry out a systematic comparison of models using textual, visual, and both types of information. [sent-17, score-0.581]

7 (2) We evaluate the models on general semantic relatedness tasks and on two specific tasks where visual information is highly relevant, as they focus on color terms. [sent-18, score-1.108]

8 (3) Unlike previous work, we study the impact of using different kinds of visual information for these semantic tasks. [sent-19, score-0.557]

9 Moreover, we show that visual and textual information are tapping on different aspects of meaning, such that they are complementary sources of information, and indeed combining them in multimodal models often improves performance. [sent-21, score-0.876]

10 We also show that “hybrid” models exploiting the patterns of co-occurrence of words as tags of the same images can be a powerful surrogate of visual information under certain circumstances. [sent-22, score-0.727]

11 1 Textual models For the current project, we constructed a set of textual distributional models that implement various standard ways to extract them from a corpus, chosen to be representative of the state of the art. [sent-30, score-0.321]

12 2 Visual models The visual models use information extracted from images instead of textual corpora. [sent-51, score-0.904]

13 We use image data where each image is associated with one or more words or tags (we use “tag” for each word associated to the image, and “label” for the set of tags of an image). [sent-52, score-0.472]

14 We build one vector with visual features for each tag in the dataset. [sent-56, score-0.509]

15 The visual features are extracted with the use of a standard bag-of-visual-words (BoVW) representation of images, inspired by NLP (Sivic and Zisserman, 2003; Csurka et al. [sent-57, score-0.536]

16 Unlike what happens in NLP, where words are (mostly) discrete and easy to identify, in vision the visual words need to be defined first. [sent-71, score-0.77]

17 From every image in a dataset, relevant areas are identified and a low-level feature vector (called a “descriptor”) is built to represent each area. [sent-74, score-0.236]

18 Each cluster is treated as a discrete visual word, and the clusters will be the vocabulary of visual words used to represent all the images in the collection. [sent-76, score-1.164]

19 The values of each dimension are obtained by summing the occurrences of the relevant visual word in all the images tagged with the word. [sent-79, score-0.655]

20 The process to extract visual words and use them to create image-based vectors to represent (real) words is illustrated in Figure 1, for a hypothetical example in which there is only one image in the collection labeled with the word horse. [sent-81, score-0.794]

21 Figure 1: Procedure to build a visual representation for a word, exemplified with SIFT features. [sent-83, score-0.536]
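As a concrete illustration of the pipeline described above and in Figure 1, here is a minimal Python sketch (not the authors' code; the inputs descriptors_per_image and tags_per_image are hypothetical): local descriptors are clustered with k-means to form the visual vocabulary, each image's descriptors are quantized into visual words, and the resulting histograms are summed into one vector per tag.

    import numpy as np
    from collections import defaultdict
    from sklearn.cluster import KMeans

    def build_tag_vectors(descriptors_per_image, tags_per_image, k=500, seed=0):
        # descriptors_per_image: list of (n_i, d) arrays of local descriptors (e.g. SIFT)
        # tags_per_image: list of tag lists, aligned with descriptors_per_image
        # 1. Visual vocabulary: cluster all descriptors into k visual words.
        vocab = KMeans(n_clusters=k, random_state=seed, n_init=10)
        vocab.fit(np.vstack(descriptors_per_image))
        # 2. One k-dimensional count vector per tag.
        tag_vectors = defaultdict(lambda: np.zeros(k))
        for desc, tags in zip(descriptors_per_image, tags_per_image):
            words = vocab.predict(desc)              # quantize each descriptor
            hist = np.bincount(words, minlength=k)   # BoVW histogram for this image
            for tag in tags:                         # sum into every tag of the image
                tag_vectors[tag] += hist
        return dict(tag_vectors)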

22 Second, LAB features (Fairchild, 2005), which encode only color information. [sent-86, score-0.385]

23 We also experimented with other visual features, such as those focusing on edges (Canny, 1986), texture (Zhu et al. [sent-87, score-0.509]

24 , 2002), and shapes (Oliva and Torralba, 2001), but they were not useful for the color tasks. [sent-88, score-0.385]

25 Moreover, we also experimented with different color scales, such as LUV, HSV and RGB, obtaining significantly worse performance than with LAB. [sent-89, score-0.415]

26 SIFT features are designed to be invariant to image scale and rotation, and have been shown to provide robust matching across affine distortion, noise, and changes in illumination. [sent-91, score-0.236]

27 The version of SIFT features that we use is sensitive to color (RGB scale; LUV, LAB and OPPONENT gave worse results). [sent-92, score-0.445]

28 We automatically identified keypoints for each image and extracted SIFT features on a regular grid defined around the keypoint with five pixels spacing, at four multiple scales (10, 15, 20, 25 pixel radii), zeroing the low contrast ones. [sent-93, score-0.236]

29 To obtain the visual word vocabulary, we cluster the SIFT feature vectors with the standard k-means clustering algorithm. [sent-94, score-0.558]

30 We varied the number k of visual words between 500 and 2,500 in steps of 500. [sent-95, score-0.509]

31 We use a spatial pyramid representation (Lazebnik et al., 2006), dividing the image into several (spatial) regions, representing each region in terms of BoVW, and then concatenating the vectors. [sent-97, score-0.268]

32 In our experiments, the spatial regions were obtained by dividing the image in 4 × 4, for a total of 16 regions (other values and a global representation did not perform as well). [sent-98, score-0.369]

33 Note that, following standard practice, descriptor clustering was performed ignoring the region partition, but the resulting visual words correspond to different dimensions in the concatenated BoVW vectors, depending on the region in which they occur. [sent-99, score-0.665]

34 Consequently, a vocabulary of k visual words results in BoVW vectors with k × 16 dimensions. [sent-100, score-0.558]
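A minimal sketch of this spatial binning, assuming the descriptors have already been quantized into visual word ids and that their pixel coordinates are available (word_ids and positions are hypothetical inputs):

    import numpy as np

    def spatial_bovw(word_ids, positions, img_w, img_h, k, grid=4):
        # One BoVW histogram per cell of a grid x grid partition, concatenated,
        # so the final vector has grid * grid * k dimensions (k * 16 for a 4 x 4 grid).
        vec = np.zeros(grid * grid * k)
        for w, (x, y) in zip(word_ids, positions):
            col = min(int(grid * x / img_w), grid - 1)
            row = min(int(grid * y / img_h), grid - 1)
            region = row * grid + col
            vec[region * k + w] += 1   # same vocabulary, region-specific dimensions
        return vec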

35 The LAB color space plots image data in 3 dimensions along 3 independent (orthogonal) axes, one for brightness (luminance) and two for color (chrominance). [sent-104, score-1.032]

36 Luminance corresponds closely to brightness as recorded by the brain-eye system; the chrominance (red-green and yellow-blue) axes mimic the oppositional color sensations the retina reports to the brain (Szeliski, 2010). [sent-105, score-0.471]

37 We varied the number k of visual words between 128 and 1,024 in steps of 128. [sent-108, score-0.509]
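As a rough sketch of how LAB descriptors could be obtained before quantization (the exact sampling scheme used by the authors is not given here, so the regular grid step below is an assumption):

    import numpy as np
    from skimage import io, color

    def lab_descriptors(image_path, step=5):
        # Convert the image to LAB and sample per-pixel (L, a, b) triples on a grid;
        # these 3-d descriptors can then be clustered into visual words like SIFT ones.
        rgb = io.imread(image_path)
        lab = color.rgb2lab(rgb)   # L = luminance; a/b = the two chrominance axes
        return lab[::step, ::step, :].reshape(-1, 3)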

38 3 Multimodal models To assemble the textual and visual representations in multimodal semantic spaces, we concatenate the two vectors after normalizing them. [sent-110, score-0.947]

39 (2011): Given a word that is present both in the textual model and in the visual model, we separately normalize the two vectors Ft and Fv and we combine them as follows: F = α Ft ⊕ (1 − α) Fv, where ⊕ is the vector concatenation operator. [sent-112, score-0.663]

40 The best α was 0.5 for most model combinations, suggesting that textual and visual information should have similar weight. [sent-115, score-0.614]
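A minimal sketch of this fusion scheme (the use of L2 normalization is an assumption; the text only states that the two vectors are normalized before weighting and concatenation):

    import numpy as np

    def fuse(Ft, Fv, alpha=0.5):
        # F = alpha * Ft  (+)  (1 - alpha) * Fv, with each modality normalized first.
        Ft = Ft / np.linalg.norm(Ft)
        Fv = Fv / np.linalg.norm(Fv)
        return np.concatenate([alpha * Ft, (1 - alpha) * Fv])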

41 Like textual models, these models are based on word co-occurrence; like visual models, they consider co-occurrence in images (image labels). [sent-119, score-0.832]

42 In one model (ESP-Win, analogous to window-based models), words tagging an image were represented in terms of co-occurrence with the other tags in the image label (Baroni and Lenci (2008) are a precedent for the use of ESP-Win). [sent-120, score-0.472]

43 The other model (ESP-Doc, analogous to document-based models) represented words in terms of their co-occurrence with images, using each image as a different dimension. [sent-122, score-0.263]
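The two hybrid spaces can be sketched as follows (illustrative only; labels is a hypothetical list holding one set of tags per image):

    from collections import defaultdict

    def hybrid_spaces(labels):
        # ESP-Win: a tag is represented by its co-occurrence with the other tags of the same label.
        # ESP-Doc: a tag is represented by the images it occurs in (one dimension per image).
        win = defaultdict(lambda: defaultdict(int))
        doc = defaultdict(lambda: defaultdict(int))
        for img_id, tags in enumerate(labels):
            for t in tags:
                doc[t][img_id] += 1
                for other in tags:
                    if other != t:
                        win[t][other] += 1
        return win, doc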

44 3 Textual and visual models as general semantic models We test the models just presented in two different ways: First, as general models of word meaning, testing their correlation to human judgements on word similarity and relatedness (this section). [sent-125, score-0.885]
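The evaluation against human judgements amounts to correlating model cosines with the gold ratings; a sketch under that reading (model, pairs, and gold are hypothetical placeholders):

    import numpy as np
    from scipy.stats import pearsonr

    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def eval_relatedness(model, pairs, gold):
        # model: dict word -> vector; pairs: list of (w1, w2); gold: aligned human ratings
        preds = [cos(model[w1], model[w2]) for w1, w2 in pairs]
        r, _ = pearsonr(preds, gold)   # Pearson, as reported in Table 1
        return r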

45 Second, as models of the meaning of color terms (sections 4 and 5). [sent-126, score-0.507]

46 MEN is a new evaluation benchmark with a better coverage of our multimodal semantic models. [sent-132, score-0.212]

47 We used the development set of MEN to test the effect of varying the number k of visual words in SIFT and LAB. [sent-140, score-0.509]

48 models built with these visual models and the best textual models (Window2 and Window20). [sent-150, score-0.83]

49 As expected, because they are more mature and capture a broader range of semantic information, textual models perform much better than purely visual models. [sent-152, score-0.734]

50 A first indication that visual information helps is the fact that, for MEN, multimodal models perform best. [sent-154, score-0.745]

51 Note that all models that are sensitive to visual information perform better for MEN than for WordSim, and the reverse is true for textual models. [sent-155, score-0.716]

52 Because of its design, word pairs in MEN can be expected to be more imageable than those in WordSim, so the visual information is more relevant for this dataset. [sent-156, score-0.509]

53 Surprisingly, hybrid models perform quite well: They are around 10 points worse than textual and multimodal models for WordSim, and only slightly worse than multimodal models for MEN. [sent-158, score-0.775]

54 Objects that do not have an obvious characteristic color (computer) and those with more than one characteristic color (zebra, bear) were eliminated. [sent-162, score-0.77]

55 Table 1 reports the performance of the textual, visual, multimodal, and hybrid models on the general semantic tasks (first two columns, section 3; Pearson ρ) and Experiments 1 (E1, section 4) and 2 (E2, section 5). [sent-166, score-0.213]

56 E1 reports the median rank of the correct color and the number of top matches (in parentheses), and E2 the average difference in normalized cosines between literal and nonliteral adjective-noun phrases, with the significance of a t-test (*** for p < 0.001). [sent-167, score-0.887]

57 For evaluation, we measured the cosine of each noun with the 11 basic color words in the space produced by each model, and recorded the rank of the correct color in the resulting ordered list. [sent-173, score-0.796]
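A sketch of this ranking procedure (the list of 11 basic color terms below follows the standard Berlin-and-Kay inventory and is an assumption about the exact set used; model is a hypothetical word-to-vector mapping):

    import numpy as np

    BASIC_COLORS = ["black", "blue", "brown", "green", "grey", "orange",
                    "pink", "purple", "red", "white", "yellow"]

    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def color_rank(model, noun, gold_color):
        # Rank all basic color terms by cosine with the noun; return the rank of
        # the correct color (1 = an exact match), as reported in column E1.
        ranked = sorted(BASIC_COLORS, key=lambda c: cos(model[noun], model[c]), reverse=True)
        return ranked.index(gold_color) + 1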

58 2 Results Column E1 in Table 1 reports the median rank for each model (the smaller the rank, the better the model), as well as the number of exact matches (that is, number of nouns for which the model ranks the correct color first). [sent-175, score-0.515]

59 Textual models fail this simple task, with median ranks around 3. [sent-177, score-0.225]

60 This is consistent with the findings in Baroni and Lenci (2008) that standard distributional models do not capture the association between concrete concepts and their typical attributes. [sent-178, score-0.217]

61 Visual models, as expected, are better at capturing the association between concepts and visual attributes. [sent-179, score-0.536]

62 In fact, all models that are sensitive to visual information achieve median rank 1. [sent-180, score-0.685]

63 Multimodal models do not increase performance with respect to visual models: For instance, both W2-LAB128 and W20-LAB128 have the same median rank and number of exact matches as LAB128 alone. [sent-181, score-0.655]

64 Textual information in this case is not complementary to visual information, but simply poorer. [sent-182, score-0.509]

65 For example, pigs are pink in LAB space but brown in SIFT space, perhaps because SIFT focused on the color of the typical environment of a pig. [sent-186, score-0.414]

66 We can thus confirm that, by limiting multimodal spaces to SIFT features, as has been done until now in the literature, we are missing important semantic information, such as the color information that we can mine with LAB. [sent-187, score-0.597]

67 5 Experiment 2 Experiment 2 requires more sophisticated information than Experiment 1, as it involves distinguishing between literal and nonliteral uses of color terms. [sent-190, score-0.768]

68 These were tagged by consensus by two human judges as literal (white towel, black feather) or nonliteral (white wine, white musician, green future). [sent-196, score-0.604]

69 Some phrases had both literal and nonliteral uses, such as blue book in “book that is blue” vs. [sent-197, score-0.452]

70 The dataset consists of 370 phrases, of which our models cover 342 (227 literal and 115 nonliteral). [sent-200, score-0.244]

71 The prediction is that, in good semantic models, literal uses will in general result in a higher similarity between the noun and color term vectors: A white towel is white, while wine or musicians are not white in the same manner. [sent-201, score-0.868]

72 We test this prediction by comparing the average cosine between the color term and the nouns across the literal and nonliteral pairs (similar results were obtained in an evaluation in terms of prediction accuracy of a simple classifier). [sent-202, score-0.824]
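A sketch of this comparison (phrases is a hypothetical list of (color_term, noun, label) triples, with label either 'literal' or 'nonliteral'):

    import numpy as np
    from scipy.stats import ttest_ind

    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def literal_vs_nonliteral(model, phrases):
        # Average adjective-noun cosine for literal vs. nonliteral phrases,
        # plus the p-value of a two-sample t-test on the two sets of cosines.
        lit = [cos(model[c], model[n]) for c, n, lab in phrases if lab == "literal"]
        non = [cos(model[c], model[n]) for c, n, lab in phrases if lab == "nonliteral"]
        _, p = ttest_ind(lit, non)
        return np.mean(lit) - np.mean(non), p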

73 This is particularly striking for the Document model, which performs quite well in general semantic tasks but badly in visual tasks. [sent-206, score-0.584]

74 Visual models are all able to discriminate between the two uses, suggesting that indeed visual information can capture nonliteral aspects of meaning. [sent-207, score-0.823]

75 One crucial question to ask, given the goals of our research, is whether textual and visual models are doing essentially the same job, only using different types of information. [sent-211, score-0.686]

76 Note that, in this case, multimodal models increase performance over the individual modalities, and are the best models for this task. [sent-212, score-0.308]

77 This suggests that the information used in the individual models is complementary, and indeed there is no correlation between the cosines obtained with the best textual and visual models (Pearson’s ρ = . [sent-213, score-0.803]

78 Both modalities can capture the differences for black and green, probably because nonliteral uses of these color terms also have clear textual correlates (more concretely, topical correlates, as they are related to race and ecology, respectively). [sent-217, score-0.846]

79 Significantly, however, vision can capture nonliteral uses of blue and red, while text can’t. [sent-218, score-0.572]

80 Note that these uses (blue note, shark, shield, red meat, district, face) do not have a clear topical correlate, and thus it makes sense that vision does a better job. [sent-219, score-0.3]

81 Yellow and brown are excluded because the dataset contains only one and two instances of nonliteral cases for these terms, respectively. [sent-221, score-0.302]

82 The hybrid model that performs best in the color tasks is ESP-Doc. [sent-224, score-0.478]

83 This model can only detect a relation between an adjective and a noun if they directly co-occur in the label of at least one image (a “document” in this setting). [sent-225, score-0.236]

84 6 Related work There is an increasing amount of work in computer vision that exploits text-derived information for image retrieval and annotation tasks (Farhadi et al. [sent-233, score-0.558]

85 Recently, NLPers have begun exploiting BoVW to enrich distributional models that represent word meaning with visual features automatically extracted from images (Feng and Lapata, 2010; Bruni et al. [sent-237, score-0.849]

86 Previous work in this area relied on SIFT features only, whereas we have enriched the visual representation of words with other kinds of features from computer vision, namely, color-related features (LAB). [sent-239, score-0.57]

87 Like us, Özbal and colleagues use both a textual model and a visual model (as well as Google adjective-noun co-occurrence counts) to find the typical color of an object. [sent-243, score-1.026]

88 However, their visual model works by analyzing pictures associated with an object, and determining the color of the object directly by image analysis. [sent-244, score-1.168]

89 [Figure 2 panel labels: Text: black, Vision: red, Text: blue, Text: green, Vision: white, Text: red, Text: white]

90 Figure 2: Discrimination of literal (L) vs.

91 nonliteral (N) uses by the best visual and textual models. [sent-277, score-0.856]

92 In contrast, we determine the color of an object by the nearness of the noun denoting the object to the color term. [sent-278, score-0.846]

93 In other words, we are trying to model the meaning of color terms and how they relate to other words, and not to directly extract the color of an object from pictures depicting them. [sent-279, score-0.858]

94 Turney and colleagues try, among other things, to distinguish literal and metaphorical usages of adjectives when combined with nouns, including the highly visual adjective dark (dark hair vs. [sent-283, score-0.779]

95 7 Conclusion We have presented evidence that distributional semantic models based on text, while providing a good general semantic representation of word meaning, can be outperformed by models using visual information for semantic aspects of words where vision is relevant. [sent-287, score-1.157]

96 More generally, this suggests that computer vision is mature enough to significantly contribute to perceptually grounded computational models of language. [sent-288, score-0.393]

97 We have also shown that different types of visual features (LAB, SIFT) are appropriate for different tasks. [sent-289, score-0.509]

98 Learning bilingual lexicons using the visual similarity of labeled web images. [sent-321, score-0.509]

99 The pyramid match kernel: Discriminative classification with sets of image features. [sent-378, score-0.236]

100 VLFeat: An open and portable library of computer vision algorithms. [sent-466, score-0.295]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('visual', 0.509), ('color', 0.385), ('vision', 0.261), ('nonliteral', 0.242), ('image', 0.236), ('sift', 0.191), ('multimodal', 0.164), ('men', 0.148), ('images', 0.146), ('literal', 0.141), ('bovw', 0.136), ('white', 0.114), ('textual', 0.105), ('descriptor', 0.092), ('lab', 0.084), ('baroni', 0.079), ('bruni', 0.076), ('wordsim', 0.076), ('distributional', 0.072), ('models', 0.072), ('blue', 0.069), ('hybrid', 0.066), ('green', 0.057), ('nouns', 0.056), ('adjectives', 0.054), ('turney', 0.05), ('meaning', 0.05), ('black', 0.05), ('vectors', 0.049), ('median', 0.048), ('lenci', 0.048), ('dm', 0.048), ('semantic', 0.048), ('spatial', 0.046), ('concrete', 0.046), ('cosines', 0.045), ('dark', 0.045), ('elia', 0.045), ('lmi', 0.045), ('bergsma', 0.04), ('relatedness', 0.04), ('zbal', 0.039), ('esp', 0.039), ('red', 0.039), ('object', 0.038), ('wine', 0.036), ('marco', 0.036), ('colors', 0.034), ('computer', 0.034), ('correlates', 0.034), ('scene', 0.032), ('region', 0.032), ('dataset', 0.031), ('worse', 0.03), ('regions', 0.03), ('leong', 0.03), ('axes', 0.03), ('chrominance', 0.03), ('csurka', 0.03), ('eccv', 0.03), ('gemma', 0.03), ('grauman', 0.03), ('iccv', 0.03), ('lazebnik', 0.03), ('luminance', 0.03), ('lund', 0.03), ('metaphorical', 0.03), ('nister', 0.03), ('race', 0.03), ('rgb', 0.03), ('shark', 0.03), ('sivic', 0.03), ('towel', 0.03), ('vedaldi', 0.03), ('vlfeat', 0.03), ('zisserman', 0.03), ('sensitive', 0.03), ('brown', 0.029), ('experiment', 0.029), ('concepts', 0.027), ('representation', 0.027), ('game', 0.027), ('cooccurrence', 0.027), ('tasks', 0.027), ('luv', 0.026), ('shane', 0.026), ('cvpr', 0.026), ('farhadi', 0.026), ('kulkarni', 0.026), ('lowe', 0.026), ('oliva', 0.026), ('yellow', 0.026), ('brightness', 0.026), ('finkelstein', 0.026), ('perceptually', 0.026), ('tapping', 0.026), ('tran', 0.026), ('ukwac', 0.026), ('wackypedia', 0.026), ('rank', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 76 acl-2012-Distributional Semantics in Technicolor

Author: Elia Bruni ; Gemma Boleda ; Marco Baroni ; Nam Khanh Tran

Abstract: Our research aims at building computational models of word meaning that are perceptually grounded. Using computer vision techniques, we build visual and multimodal distributional models and compare them to standard textual models. Our results show that, while visual models with state-of-the-art computer vision techniques perform worse than textual models in general tasks (accounting for semantic relatedness), they are as good or better models of the meaning of words with visual correlates such as color terms, even in a nontrivial task that involves nonliteral uses of such words. Moreover, we show that visual and textual information are tapping on different aspects of meaning, and indeed combining them in multimodal models often improves performance.

2 0.27917662 51 acl-2012-Collective Generation of Natural Image Descriptions

Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi

Abstract: We present a holistic data-driven approach to image description generation, exploiting the vast amount of (noisy) parallel image data and associated natural language descriptions available on the web. More specifically, given a query image, we retrieve existing human-composed phrases used to describe visually similar images, then selectively combine those phrases to generate a novel description for the query image. We cast the generation process as constraint optimization problems, collectively incorporating multiple interconnected aspects of language composition for content planning, surface realization and discourse structure. Evaluation by human annotators indicates that our final system generates more semantically correct and linguistically appealing descriptions than two nontrivial baselines.

3 0.0947593 2 acl-2012-A Broad-Coverage Normalization System for Social Media Language

Author: Fei Liu ; Fuliang Weng ; Xiao Jiang

Abstract: Social media language contains huge amount and wide variety of nonstandard tokens, created both intentionally and unintentionally by the users. It is of crucial importance to normalize the noisy nonstandard tokens before applying other NLP techniques. A major challenge facing this task is the system coverage, i.e., for any user-created nonstandard term, the system should be able to restore the correct word within its top n output candidates. In this paper, we propose a cognitivelydriven normalization system that integrates different human perspectives in normalizing the nonstandard tokens, including the enhanced letter transformation, visual priming, and string/phonetic similarity. The system was evaluated on both word- and messagelevel using four SMS and Twitter data sets. Results show that our system achieves over 90% word-coverage across all data sets (a . 10% absolute increase compared to state-ofthe-art); the broad word-coverage can also successfully translate into message-level performance gain, yielding 6% absolute increase compared to the best prior approach.

4 0.082376167 111 acl-2012-How Are Spelling Errors Generated and Corrected? A Study of Corrected and Uncorrected Spelling Errors Using Keystroke Logs

Author: Yukino Baba ; Hisami Suzuki

Abstract: This paper presents a comparative study of spelling errors that are corrected as you type, vs. those that remain uncorrected. First, we generate naturally occurring online error correction data by logging users’ keystrokes, and by automatically deriving pre- and postcorrection strings from them. We then perform an analysis of this data against the errors that remain in the final text as well as across languages. Our analysis shows a clear distinction between the types of errors that are generated and those that remain uncorrected, as well as across languages.

5 0.071250632 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

Author: Eric Huang ; Richard Socher ; Christopher Manning ; Andrew Ng

Abstract: Unsupervised word representations are very useful in NLP tasks both as inputs to learning algorithms and as extra word features in NLP systems. However, most of these models are built with only local context and one representation per word. This is problematic because words are often polysemous and global context can also provide useful information for learning word meanings. We present a new neural network architecture which 1) learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and 2) accounts for homonymy and polysemy by learning multiple embeddings per word. We introduce a new dataset with human judgments on pairs of words in sentential context, and evaluate our model on it, showing that our model outperforms competitive baselines and other neural language models. 1

6 0.061821096 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

7 0.060539745 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment

8 0.047459465 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

9 0.046589207 56 acl-2012-Computational Approaches to Sentence Completion

10 0.044359654 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study

11 0.042200744 186 acl-2012-Structuring E-Commerce Inventory

12 0.041946977 198 acl-2012-Topic Models, Latent Space Models, Sparse Coding, and All That: A Systematic Understanding of Probabilistic Semantic Extraction in Large Corpus

13 0.040154535 145 acl-2012-Modeling Sentences in the Latent Space

14 0.039621551 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

15 0.03853685 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations

16 0.038297348 115 acl-2012-Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level Sentiment Classification

17 0.037285157 7 acl-2012-A Computational Approach to the Automation of Creative Naming

18 0.03706903 70 acl-2012-Demonstration of IlluMe: Creating Ambient According to Instant Message Logs

19 0.036684662 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model

20 0.036240187 166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.134), (1, 0.053), (2, -0.013), (3, 0.023), (4, 0.001), (5, 0.107), (6, 0.001), (7, 0.052), (8, -0.052), (9, 0.036), (10, -0.051), (11, 0.034), (12, 0.029), (13, 0.075), (14, -0.062), (15, -0.041), (16, 0.043), (17, 0.038), (18, -0.012), (19, -0.055), (20, -0.013), (21, -0.092), (22, 0.019), (23, 0.097), (24, -0.043), (25, 0.203), (26, 0.125), (27, -0.105), (28, 0.068), (29, -0.169), (30, -0.117), (31, -0.285), (32, -0.174), (33, -0.041), (34, -0.136), (35, -0.175), (36, -0.152), (37, -0.257), (38, 0.042), (39, -0.036), (40, -0.101), (41, -0.031), (42, -0.107), (43, 0.025), (44, -0.138), (45, 0.016), (46, -0.099), (47, -0.006), (48, 0.122), (49, 0.088)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94494468 76 acl-2012-Distributional Semantics in Technicolor

Author: Elia Bruni ; Gemma Boleda ; Marco Baroni ; Nam Khanh Tran

Abstract: Our research aims at building computational models of word meaning that are perceptually grounded. Using computer vision techniques, we build visual and multimodal distributional models and compare them to standard textual models. Our results show that, while visual models with state-of-the-art computer vision techniques perform worse than textual models in general tasks (accounting for semantic relatedness), they are as good or better models of the meaning of words with visual correlates such as color terms, even in a nontrivial task that involves nonliteral uses of such words. Moreover, we show that visual and textual information are tapping on different aspects of meaning, and indeed combining them in multimodal models often improves performance.

2 0.87730879 51 acl-2012-Collective Generation of Natural Image Descriptions

Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi

Abstract: We present a holistic data-driven approach to image description generation, exploiting the vast amount of (noisy) parallel image data and associated natural language descriptions available on the web. More specifically, given a query image, we retrieve existing human-composed phrases used to describe visually similar images, then selectively combine those phrases to generate a novel description for the query image. We cast the generation process as constraint optimization problems, collectively incorporating multiple interconnected aspects of language composition for content planning, surface realization and discourse structure. Evaluation by human annotators indicates that our final system generates more semantically correct and linguistically appealing descriptions than two nontrivial baselines.

3 0.41918364 2 acl-2012-A Broad-Coverage Normalization System for Social Media Language

Author: Fei Liu ; Fuliang Weng ; Xiao Jiang

Abstract: Social media language contains huge amount and wide variety of nonstandard tokens, created both intentionally and unintentionally by the users. It is of crucial importance to normalize the noisy nonstandard tokens before applying other NLP techniques. A major challenge facing this task is the system coverage, i.e., for any user-created nonstandard term, the system should be able to restore the correct word within its top n output candidates. In this paper, we propose a cognitivelydriven normalization system that integrates different human perspectives in normalizing the nonstandard tokens, including the enhanced letter transformation, visual priming, and string/phonetic similarity. The system was evaluated on both word- and messagelevel using four SMS and Twitter data sets. Results show that our system achieves over 90% word-coverage across all data sets (a . 10% absolute increase compared to state-ofthe-art); the broad word-coverage can also successfully translate into message-level performance gain, yielding 6% absolute increase compared to the best prior approach.

4 0.3751865 111 acl-2012-How Are Spelling Errors Generated and Corrected? A Study of Corrected and Uncorrected Spelling Errors Using Keystroke Logs

Author: Yukino Baba ; Hisami Suzuki

Abstract: This paper presents a comparative study of spelling errors that are corrected as you type, vs. those that remain uncorrected. First, we generate naturally occurring online error correction data by logging users’ keystrokes, and by automatically deriving pre- and postcorrection strings from them. We then perform an analysis of this data against the errors that remain in the final text as well as across languages. Our analysis shows a clear distinction between the types of errors that are generated and those that remain uncorrected, as well as across languages.

5 0.30797562 166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions

Author: Inderjeet Mani ; James Pustejovsky

Abstract: unkown-abstract

6 0.29715112 129 acl-2012-Learning High-Level Planning from Text

7 0.29261792 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

8 0.28188437 186 acl-2012-Structuring E-Commerce Inventory

9 0.2520369 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information

10 0.24610901 195 acl-2012-The Creation of a Corpus of English Metalanguage

11 0.23088688 56 acl-2012-Computational Approaches to Sentence Completion

12 0.22296171 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study

13 0.21874616 112 acl-2012-Humor as Circuits in Semantic Networks

14 0.21804576 77 acl-2012-Ecological Evaluation of Persuasive Messages Using Google AdWords

15 0.21528266 89 acl-2012-Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation

16 0.21403667 70 acl-2012-Demonstration of IlluMe: Creating Ambient According to Instant Message Logs

17 0.20467845 190 acl-2012-Syntactic Stylometry for Deception Detection

18 0.20466577 178 acl-2012-Sentence Simplification by Monolingual Machine Translation

19 0.20427522 32 acl-2012-Automated Essay Scoring Based on Finite State Transducer: towards ASR Transcription of Oral English Speech

20 0.20258333 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.043), (26, 0.047), (28, 0.031), (30, 0.025), (37, 0.032), (39, 0.052), (53, 0.049), (56, 0.017), (59, 0.02), (67, 0.277), (74, 0.023), (82, 0.022), (84, 0.027), (85, 0.035), (90, 0.103), (92, 0.052), (94, 0.013), (99, 0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.69155818 76 acl-2012-Distributional Semantics in Technicolor

Author: Elia Bruni ; Gemma Boleda ; Marco Baroni ; Nam Khanh Tran

Abstract: Our research aims at building computational models of word meaning that are perceptually grounded. Using computer vision techniques, we build visual and multimodal distributional models and compare them to standard textual models. Our results show that, while visual models with state-of-the-art computer vision techniques perform worse than textual models in general tasks (accounting for semantic relatedness), they are as good or better models of the meaning of words with visual correlates such as color terms, even in a nontrivial task that involves nonliteral uses of such words. Moreover, we show that visual and textual information are tapping on different aspects of meaning, and indeed combining them in multimodal models often improves performance.

2 0.5015139 51 acl-2012-Collective Generation of Natural Image Descriptions

Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi

Abstract: We present a holistic data-driven approach to image description generation, exploiting the vast amount of (noisy) parallel image data and associated natural language descriptions available on the web. More specifically, given a query image, we retrieve existing human-composed phrases used to describe visually similar images, then selectively combine those phrases to generate a novel description for the query image. We cast the generation process as constraint optimization problems, collectively incorporating multiple interconnected aspects of language composition for content planning, surface realization and discourse structure. Evaluation by human annotators indicates that our final system generates more semantically correct and linguistically appealing descriptions than two nontrivial baselines.

3 0.48200202 108 acl-2012-Hierarchical Chunk-to-String Translation

Author: Yang Feng ; Dongdong Zhang ; Mu Li ; Qun Liu

Abstract: We present a hierarchical chunk-to-string translation model, which can be seen as a compromise between the hierarchical phrasebased model and the tree-to-string model, to combine the merits of the two models. With the help of shallow parsing, our model learns rules consisting of words and chunks and meanwhile introduce syntax cohesion. Under the weighed synchronous context-free grammar defined by these rules, our model searches for the best translation derivation and yields target translation simultaneously. Our experiments show that our model significantly outperforms the hierarchical phrasebased model and the tree-to-string model on English-Chinese Translation tasks.

4 0.47477511 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

Author: Gerard de Melo ; Gerhard Weikum

Abstract: We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.

5 0.47103292 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili ; Martha Palmer

Abstract: In this paper, we propose innovative representations for automatic classification of verbs according to mainstream linguistic theories, namely VerbNet and FrameNet. First, syntactic and semantic structures capturing essential lexical and syntactic properties of verbs are defined. Then, we design advanced similarity functions between such structures, i.e., semantic tree kernel functions, for exploiting distributional and grammatical information in Support Vector Machines. The extensive empirical analysis on VerbNet class and frame detection shows that our models capture mean- ingful syntactic/semantic structures, which allows for improving the state-of-the-art.

6 0.4705914 191 acl-2012-Temporally Anchored Relation Extraction

7 0.46888727 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

8 0.46828723 187 acl-2012-Subgroup Detection in Ideological Discussions

9 0.4680959 167 acl-2012-QuickView: NLP-based Tweet Search

10 0.46802711 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information

11 0.46705389 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models

12 0.46702746 99 acl-2012-Finding Salient Dates for Building Thematic Timelines

13 0.46690446 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence

14 0.46567357 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool

15 0.46490443 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

16 0.46483466 34 acl-2012-Automatically Learning Measures of Child Language Development

17 0.46462116 102 acl-2012-Genre Independent Subgroup Detection in Online Discussion Threads: A Study of Implicit Attitude using Textual Latent Semantics

18 0.46247527 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents

19 0.46191922 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle

20 0.46126872 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling