acl acl2013 acl2013-249 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Carina Silberer ; Vittorio Ferrari ; Mirella Lapata
Abstract: We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data compared to amodal models and word representations based on handcrafted norming data.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality which we represent by visual attributes. [sent-10, score-1.034]
2 We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. [sent-11, score-1.014]
3 , 2011), associating simplified language to perceptual data such as images or video (Siskind, 2001 ; Roy and Pentland, 2002; Gorniak and Roy, 2004; Yu and Ballard, 2007), and learning the meaning of words based on linguistic and perceptual input (Bruni et al. [sent-19, score-0.468]
4 Feature norms are obtained by asking native speakers to write down attributes they consider important in describing the meaning of a word. [sent-33, score-0.586]
5 The attributes represent perceived physical and functional properties associated with the referents of words. [sent-34, score-0.476]
6 The number and types of attributes generated can vary substantially as a function of the amount of time devoted to each concept. [sent-37, score-0.418]
7 It is not entirely clear how people generate attributes and whether all of these are important for representing concepts. [sent-38, score-0.418]
8 Another strand of research focuses exclusively on the visual modality, even though the grounding problem could involve auditory, motor, and haptic modalities as well. [sent-42, score-0.562]
9 Furthermore, since images are ubiquitous, visual data can be gathered far more easily than data for some of the other modalities. [sent-45, score-0.806]
10 Distributional models that integrate the visual modality have been learned from texts and images (Feng and Lapata, 2010; Bruni et al. [sent-46, score-0.848]
11 , by exploiting the fact that images in this database are hierarchically organized according to WordNet synsets (Leong and Mihalcea, 2011). [sent-50, score-0.404]
12 Our work also focuses on images as a way of physically grounding the meaning of words. [sent-52, score-0.461]
13 We, however, represent them by high-level visual attributes instead of low-level image features. [sent-53, score-0.981]
14 Furthermore, attributes allow us to generalize to unseen objects; it is possible to say something about them even though we cannot identify them (e. [sent-57, score-0.463]
15 We show that this attribute-centric approach to representing images is beneficial for distributional models of lexical meaning. [sent-60, score-0.448]
16 Our attributes are similar to those provided by participants in norming studies; importantly, however, they are learned from training data (a database of images and their visual attributes) and thus generalize to new images without additional human involvement. [sent-61, score-1.788]
17 In the following we describe our efforts to create a new large-scale dataset that consists of 688K images that match the same concrete concepts used in the feature norming study of McRae et al. [sent-62, score-0.617]
18 We derive a taxonomy of 412 visual attributes and explain how we learn attribute classifiers following recent work in computer vision (Lampert et al. [sent-64, score-1.258]
19 2 Related Work Grounding semantic representations with visual information is an instance of multimodal learning. [sent-68, score-0.527]
20 Special-purpose models that address the fusion of distributional meaning with visual information have also been proposed. [sent-72, score-0.544]
21 Feng and Lapata (2010) represent documents and images by a common multimodal vocabulary consisting of textual words and visual terms which they obtain by quantizing SIFT descriptors (Lowe, 2004). [sent-73, score-0.906]
22 (2012b) who obtain distinct representations for the textual and visual modalities. [sent-80, score-0.529]
23 Specifically, they extract a visual space from images contained in the ESP-Game data set (von Ahn and Dabbish, 2004) and a text-based semantic space from a large corpus collection totaling approximately two billion words. [sent-81, score-0.806]
24 The ability to describe images by their attributes makes it possible to generalize to new instances for which there are no training examples available. [sent-89, score-0.788]
25 Moreover, attributes can transcend category and task boundaries and thus provide a generic description of visual data. [sent-90, score-0.854]
26 Initial work (Ferrari and Zisserman, 2007) 573 focused on simple color and texture attributes (e. [sent-91, score-0.488]
27 , blue, stripes) and showed that these can be learned in a weakly supervised setting from images returned by a search engine when using the attribute as a query. [sent-93, score-0.615]
28 (2009) were among the first to use visual attributes in an object recognition task. [sent-95, score-0.919]
29 , hairy, four-legged) were used to identify familiar objects and to describe unfamiliar objects when new images and bounding box annotations were provided. [sent-100, score-0.442]
30 Their dataset contained over 30,000 images representing 50 animal concepts and used 85 attributes from the norming study of Osherson et al. [sent-103, score-1.035]
31 The use of visual attributes in models of distributional semantics is novel to our knowledge. [sent-108, score-0.932]
32 Firstly, they are cognitively plausible; humans employ visual attributes when describing the properties of concept classes. [sent-110, score-1.012]
33 Attributes crucially represent image properties; however, being words themselves, they can be easily integrated into any text-based distributional model, thus eschewing known difficulties with rendering images into word-like units. [sent-112, score-0.575]
34 A key prerequisite in describing images by their attributes is the availability of training data for learning attribute classifiers. [sent-113, score-1.033]
35 Moreover, we show that automatically computed attributes are comparable and in some cases superior to those provided by humans (e. [sent-125, score-0.443]
36 3 The Attribute Dataset Concepts and Images We created a dataset of images and their visual attributes for the nouns contained in McRae et al. [sent-128, score-1.224]
37 To avoid confusion, in the remainder of this paper we will use the term attribute to refer to properties of concepts and the term feature to refer to image features, such as color or edges. [sent-133, score-0.563]
38 We chose this database due to its high coverage and the high quality of its images (i. [sent-138, score-0.404]
39 ’s norms contain 541 concepts out of which 516 appear in ImageNet and are represented by 688K images overall. [sent-142, score-0.668]
40 The average number of images per concept is 1,310 with the most popular being closet (2,149 images) and the least popular prune (5 images). [sent-143, score-0.443]
41 The images depicting each concept were randomly partitioned into a training, development, and test set. [sent-148, score-0.443]
42 For most concepts the development set contained a maximum of 100 images and the test set a maximum of 200 images. [sent-149, score-0.504]
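To make the split concrete, here is a minimal sketch (not the authors' code; the caps and the helper name are illustrative) of randomly partitioning one concept's images into training, development, and test portions:

    import random

    def split_concept_images(image_ids, dev_cap=100, test_cap=200, seed=0):
        # Shuffle one concept's images and carve off capped dev/test portions;
        # everything left over goes into the training set.
        ids = list(image_ids)
        random.Random(seed).shuffle(ids)
        dev = ids[:dev_cap]
        test = ids[dev_cap:dev_cap + test_cap]
        train = ids[dev_cap + test_cap:]
        return train, dev, test

    # Example with a hypothetical concept that has 1,310 images.
    train, dev, test = split_concept_images(["img_%d" % i for i in range(1310)])

A real split would additionally guard against concepts with very few images, which this sketch does not do.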
43 Attribute Annotation Our aim was to develop a set of visual attributes that are both discriminating and cognitively plausible, i. [sent-153, score-0.885]
44 As a starting point, we thus used the visual attributes from McRae et al. [sent-156, score-0.854]
45 For example, is purple is a valid visual attribute for an eggplant, whereas a vegetable is not, since it cannot be visualized. [sent-161, score-0.681]
46 Collating all the visual attributes in the norms resulted in a total of 673 which we further modified and extended during the annotation process explained below. [sent-162, score-0.992]
47 If an attribute was generally true for the concept, but the images did not provide enough evidence, the attribute was nevertheless chosen and labeled with . [sent-169, score-0.86]
48 For example, has lights and has bumper are attributes of cars but are not included in the norms. [sent-172, score-0.418]
49 Two annotators (both co-authors of this paper) developed the set of attributes for each category. [sent-177, score-0.418]
50 (2009) in that we did not simply transfer the attributes from the norms to the concepts in question but refined and extended them according to the visual data. [sent-181, score-0.992]
51 Firstly, it makes sense to select attributes corroborated by the images. [sent-183, score-0.446]
52 Thirdly, during the annotation process, we normalized synonymous attributes (e. [sent-187, score-0.418]
53 Finally, our aim was to collect an exhaustive list of visual attributes for each concept which is consistent across all members of a category. [sent-193, score-0.927]
54 As a result, the attributes of a concept denote the set of properties humans consider most salient. [sent-197, score-0.545]
55 Examples of concepts and their attributes from our database are shown in Table 2. [sent-205, score-0.552]
56 The training set consisted of 91,980 images (with a maximum of 350 images per concept). [sent-211, score-0.74]
57 We only trained classifiers for attributes corroborated by the images and excluded those labeled with . [sent-225, score-0.869]
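As a rough illustration of this step, the sketch below trains one binary classifier per attribute on precomputed image features; the paper follows the setup of Lampert et al., whereas the learner, feature matrices, and hyperparameters used here are assumptions made purely for the example:

    import numpy as np
    from sklearn.svm import LinearSVC

    # X_train: (n_images, n_features) precomputed visual features (e.g., bag-of-SIFT/HOG histograms).
    # Y_train: (n_images, n_attributes) binary matrix; Y_train[i, a] = 1 if attribute a holds for image i.
    def train_attribute_classifiers(X_train, Y_train, C=1.0):
        classifiers = []
        for a in range(Y_train.shape[1]):
            clf = LinearSVC(C=C)  # one binary classifier per visual attribute
            clf.fit(X_train, Y_train[:, a])
            classifiers.append(clf)
        return classifiers

    def predict_attribute_scores(classifiers, X):
        # Returns an (n_images, n_attributes) matrix of raw decision scores.
        return np.column_stack([clf.decision_function(X) for clf in classifiers])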
58 Note that attributes are predicted on an image-by-image basis; our task, however, is to describe a concept w by its visual attributes. [sent-243, score-0.927]
59 Since concepts are represented by many images we must somehow aggregate their attributes into a single representation. [sent-244, score-0.948]
60 For each image iw ∈ Iw of concept w, we output an F-dimensional vector containing prediction scores score_a(iw) for attributes a = 1, ..., F. [sent-245, score-0.648]
61 The vector is normalized to obtain a probability distribution over attributes given w: p_w(a) = (∑_{iw ∈ Iw} score_a(iw)) / (∑_{a'=1}^{F} ∑_{iw ∈ Iw} score_{a'}(iw)), a = 1, ..., F (1) [sent-250, score-0.418]
62 Again, we measure the cosine similarity between a concept and all other concepts in the dataset when these are represented by their visual attribute vector pw. [sent-256, score-0.914]
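A small sketch of the aggregation in Equation (1) and of the cosine comparison, assuming nonnegative per-image attribute scores are already available (the variable names are illustrative):

    import numpy as np

    def concept_attribute_distribution(scores):
        # scores: (n_images_of_concept, F) nonnegative attribute prediction scores.
        # Sum over the concept's images and normalise to the distribution p_w of Eq. (1).
        summed = scores.sum(axis=0)
        return summed / summed.sum()

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Similarity between two concepts represented by their attribute vectors:
    # sim = cosine(concept_attribute_distribution(scores_w1),
    #              concept_attribute_distribution(scores_w2))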
63 We represent the visual modality by attribute vectors computed as shown in Equation (1). [sent-259, score-0.748]
64 , 2010) to obtain these attributes for the nouns in our dataset. [sent-262, score-0.418]
65 used in our study and how the textual and visual modalities were fused to create a joint representation. [sent-278, score-0.557]
66 Let P ∈ [0, 1]^{N×F} denote a visual and T a textual matrix, the former representing a probability distribution over visual attributes for each word. [sent-282, score-0.854]
67 A word’s meaning can then be represented by the concatenation of its normalized textual and visual vectors. [sent-283, score-0.587]
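A sketch of the concatenation model, assuming a textual vector t_w and a visual attribute vector p_w for the same word; L2 normalisation is used here as an assumption, since the excerpt does not spell out the normalisation:

    import numpy as np

    def concatenate_modalities(t_w, p_w):
        # Normalise each modality separately, then concatenate into one bimodal vector.
        t = t_w / np.linalg.norm(t_w)
        p = p_w / np.linalg.norm(p_w)
        return np.concatenate([t, p])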
68 (2004)) to learn a joint semantic representation from the textual and visual modalities. [sent-285, score-0.492]
69 The linguistic and visual views are the same as in the simple concatenation model just explained. [sent-288, score-0.475]
70 After applying CCA we obtain two matrices projected onto l basis vectors: T˜ ∈ R^{N×l}, resulting from the projection of the textual matrix T onto the new basis, and P˜ ∈ R^{N×l}, resulting from the projection of the corresponding visual attribute matrix. [sent-292, score-0.765]
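One way to obtain such projections is scikit-learn's CCA; whether this matches the (kernel) CCA variant of Hardoon et al. used in the paper is not guaranteed, so the following is only a sketch:

    from sklearn.cross_decomposition import CCA

    def cca_project(T, P, l=100):
        # T: (N, K) textual matrix, P: (N, F) visual attribute matrix.
        # Returns T_tilde and P_tilde, both (N, l), projected onto l shared basis vectors.
        cca = CCA(n_components=l)
        T_tilde, P_tilde = cca.fit_transform(T, P)
        return T_tilde, P_tilde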
71 , 2003) where words in documents and their associated attributes are treated as observed variables that are explained by a generative process. [sent-296, score-0.418]
72 For example, most of the probability mass of a component x would be reserved for the words shirt, coat, dress and the attributes has 1 piece, has seams, made of material and so on. [sent-311, score-0.418]
73 In our work, the training data is a corpus D of textual attributes (rather than documents). [sent-314, score-0.474]
74 For some of these concepts, our classifiers predict visual attributes. [sent-320, score-0.489]
75 In this case, the concepts are paired with one of their visual attributes. [sent-321, score-0.57]
76 We sample attributes for a concept w from their distribution given w (Eq. [sent-322, score-0.491]
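The sampling step can be sketched as follows, with p_w the attribute distribution of Equation (1) and attribute_names indexing its dimensions; the sample size and names are placeholders, not the paper's settings:

    import numpy as np

    def sample_visual_attributes(p_w, attribute_names, n_samples=10, seed=0):
        # Draw visual attributes for concept w in proportion to p_w, pairing the
        # concept with one sampled attribute per draw.
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(attribute_names), size=n_samples, p=p_w)
        return [attribute_names[i] for i in idx]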
77 ’s study but belonged to concepts covered by our attribute taxonomy (e. [sent-341, score-0.405]
78 Parameter Settings In order to integrate the visual attributes with the models described in Section 5 we must select the appropriate threshold value δ (see Eq. [sent-345, score-0.854]
79 We also experimented with thresholding the attribute prediction scores and with excluding attributes with low precision. [sent-348, score-0.663]
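A sketch of these two filtering heuristics; the threshold δ and the precision cut-off below are placeholder values, not the settings tuned on the development set:

    import numpy as np

    def filter_attribute_vector(p_w, attr_precision, delta=0.01, min_precision=0.3):
        # Zero out attributes whose probability falls below delta or whose classifier
        # precision (estimated on held-out data) is too low, then renormalise.
        filtered = np.where((p_w >= delta) & (attr_precision >= min_precision), p_w, 0.0)
        total = filtered.sum()
        return filtered / total if total > 0 else p_w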
80 We also discarded attributes co-occurring with fewer than two different words. [sent-358, score-0.418]
81 This evaluation metric assumes that there are many associates for a given cue, which unfortunately is not the case in our study, since it is restricted to the concepts represented in our attribute taxonomy. [sent-361, score-0.467]
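The evaluation itself can be sketched as computing model similarities for the available cue-associate pairs and correlating them with human association strengths; the use of Spearman's rho and the input format are illustrative assumptions:

    import numpy as np
    from scipy.stats import spearmanr

    def correlate_with_norms(pairs, human_strengths, vectors):
        # pairs: list of (cue, associate) word pairs; vectors: dict mapping a word to its
        # (bimodal) representation; human_strengths: association strength per pair.
        def cos(u, v):
            return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
        model_scores = [cos(vectors[c], vectors[a]) for c, a in pairs]
        rho, pval = spearmanr(model_scores, human_strengths)
        return rho, pval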
82 Results Our experiments were designed to answer four questions: (1) Do visual attributes improve the performance of distributional models? [sent-374, score-0.932]
83 , are some models better suited to the integration of visual information? [sent-377, score-0.436]
84 This indicates that our attribute classifiers generalize well beyond the concepts found in our database and can produce useful visual information even on unseen images. [sent-412, score-0.947]
85 Each concept was represented as a vector with dimensions corresponding to attributes generated by participants of the norming study. [sent-417, score-0.677]
86 Table 7 presents results for different model variants which we created by manipulating the number and type of attributes involved. [sent-420, score-0.418]
87 The first model uses the full set of attributes present in the norms (All Attributes). [sent-421, score-0.556]
88 The second model (Text Attributes) uses all attributes but those classified as visual (e. [sent-422, score-0.854]
89 The third model (Visual Attributes) considers solely visual attributes. [sent-425, score-0.436]
90 Taking visual attributes into account increases the fit with Nelson’s (1998) association norms, whereas visual and textual attributes on their own perform worse. [sent-427, score-1.764]
91 (1998) cue-associate pairs; models are based on gold human generated attributes (McRae et al. [sent-430, score-0.418]
92 Seen are concepts known to the attribute classifiers and covered by MixLDA (N = 85). [sent-438, score-0.432]
93 Unseen are concepts covered by LDA but unknown to the attribute classifiers (N = 388). [sent-439, score-0.432]
94 performance is comparable to the All Attributes model (see Table 5, second column), despite using automatic attributes (both textual and visual). [sent-442, score-0.474]
95 Furthermore, visual attributes obtained through our classifiers (see Table 5) achieve a marginally lower correlation coefficient against human generated ones (see Table 7). [sent-443, score-0.967]
96 Finally, to address our last question, we compared our approach against Feng and Lapata (2010) who represent visual information via quantized SIFT features. [sent-444, score-0.494]
97 The best performing model on the development set used 500 visual terms and 750 topics and the association measure proposed in Griffiths et al. [sent-450, score-0.436]
98 8 Conclusions In this paper we proposed the use of automatically computed visual attributes as a way of physically grounding word meaning. [sent-458, score-0.915]
99 Our results demonstrate that visual attributes improve the performance of distributional models across the board. [sent-459, score-0.932]
100 Finally, we have only scratched the surface in terms of possible models for integrating the textual and visual modality. [sent-464, score-0.492]
wordName wordTfidf (topN-words)
[('visual', 0.436), ('attributes', 0.418), ('images', 0.37), ('attribute', 0.245), ('mcrae', 0.196), ('cca', 0.196), ('norms', 0.138), ('concepts', 0.134), ('image', 0.127), ('nelson', 0.125), ('norming', 0.113), ('concat', 0.099), ('mixlda', 0.099), ('farhadi', 0.092), ('bruni', 0.087), ('eggplant', 0.085), ('lampert', 0.085), ('topicattr', 0.085), ('vision', 0.08), ('distributional', 0.078), ('concept', 0.073), ('modalities', 0.065), ('silberer', 0.062), ('grounding', 0.061), ('correlation', 0.06), ('quantized', 0.058), ('visattr', 0.056), ('textual', 0.056), ('imagenet', 0.054), ('multimodal', 0.054), ('classifiers', 0.053), ('xc', 0.052), ('andrews', 0.052), ('grounded', 0.05), ('strudel', 0.05), ('feng', 0.05), ('lapata', 0.048), ('participants', 0.047), ('descriptors', 0.046), ('unseen', 0.045), ('coefficients', 0.044), ('hardoon', 0.042), ('textattr', 0.042), ('texture', 0.042), ('modality', 0.042), ('animals', 0.041), ('object', 0.04), ('concatenation', 0.039), ('tellex', 0.037), ('ferrari', 0.037), ('miami', 0.037), ('representations', 0.037), ('objects', 0.036), ('scene', 0.036), ('pw', 0.035), ('vehicles', 0.035), ('database', 0.034), ('perceptual', 0.034), ('sift', 0.033), ('associates', 0.032), ('actions', 0.032), ('johns', 0.031), ('cognitive', 0.031), ('cognitively', 0.031), ('cue', 0.03), ('meaning', 0.03), ('iw', 0.03), ('properties', 0.029), ('physical', 0.029), ('canonical', 0.029), ('baroni', 0.029), ('lowe', 0.029), ('bornstein', 0.028), ('clothing', 0.028), ('corroborated', 0.028), ('cueassociate', 0.028), ('datta', 0.028), ('encyclopaedic', 0.028), ('everingham', 0.028), ('gorniak', 0.028), ('hog', 0.028), ('landau', 0.028), ('patterson', 0.028), ('shiny', 0.028), ('sloman', 0.028), ('stripes', 0.028), ('topicattrtextattr', 0.028), ('zeigenfuse', 0.028), ('basis', 0.028), ('color', 0.028), ('represented', 0.026), ('taxonomy', 0.026), ('jones', 0.026), ('recognition', 0.025), ('vectors', 0.025), ('griffiths', 0.025), ('humans', 0.025), ('oranges', 0.025), ('golub', 0.025), ('oliva', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999863 249 acl-2013-Models of Semantic Representation with Visual Attributes
Author: Carina Silberer ; Vittorio Ferrari ; Mirella Lapata
Abstract: We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data compared to amodal models and word representations based on handcrafted norming data.
2 0.4544512 384 acl-2013-Visual Features for Linguists: Basic image analysis techniques for multimodally-curious NLPers
Author: Elia Bruni ; Marco Baroni
Abstract: unkown-abstract
3 0.36673456 380 acl-2013-VSEM: An open library for visual semantics representation
Author: Elia Bruni ; Ulisse Bordignon ; Adam Liska ; Jasper Uijlings ; Irina Sergienya
Abstract: VSEM is an open library for visual semantics. Starting from a collection of tagged images, it is possible to automatically construct an image-based representation of concepts by using off-theshelf VSEM functionalities. VSEM is entirely written in MATLAB and its objectoriented design allows a large flexibility and reusability. The software is accompanied by a website with supporting documentation and examples.
4 0.27081841 167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus
Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi
Abstract: The ever growing amount of web images and their associated texts offers new opportunities for integrative models bridging natural language processing and computer vision. However, the potential benefits of such data are yet to be fully realized due to the complexity and noise in the alignment between image content and text. We address this challenge with contributions in two folds: first, we introduce the new task of image caption generalization, formulated as visually-guided sentence compression, and present an efficient algorithm based on dynamic beam search with dependency-based constraints. Second, we release a new large-scale corpus with 1 million image-caption pairs achieving tighter content alignment between images and text. Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new imagetext parallel corpus with respect to a concrete application of image caption transfer.
5 0.25792423 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users
Author: Shane Bergsma ; Benjamin Van Durme
Abstract: We describe a novel approach for automatically predicting the hidden demographic properties of social media users. Building on prior work in common-sense knowledge acquisition from third-person text, we first learn the distinguishing attributes of certain classes of people. For example, we learn that people in the Female class tend to have maiden names and engagement rings. We then show that this knowledge can be used in the analysis of first-person communication; knowledge of distinguishing attributes allows us to both classify users and to bootstrap new training examples. Our novel approach enables substantial improvements on the widelystudied task of user gender prediction, ob- taining a 20% relative error reduction over the current state-of-the-art.
6 0.19899978 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis
7 0.1428955 29 acl-2013-A Visual Analytics System for Cluster Exploration
8 0.11238911 31 acl-2013-A corpus-based evaluation method for Distributional Semantic Models
9 0.10333509 175 acl-2013-Grounded Language Learning from Video Described with Sentences
10 0.10321874 168 acl-2013-Generating Recommendation Dialogs by Extracting Information from User Reviews
11 0.098487191 12 acl-2013-A New Set of Norms for Semantic Relatedness Measures
12 0.072055504 356 acl-2013-Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia
13 0.067953177 370 acl-2013-Unsupervised Transcription of Historical Documents
14 0.067616172 90 acl-2013-Conditional Random Fields for Responsive Surface Realisation using Global Features
15 0.06329754 238 acl-2013-Measuring semantic content in distributional vectors
17 0.059301991 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
18 0.058911998 87 acl-2013-Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics
19 0.056974962 366 acl-2013-Understanding Verbs based on Overlapping Verbs Senses
20 0.055585779 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
topicId topicWeight
[(0, 0.161), (1, 0.093), (2, -0.007), (3, -0.107), (4, -0.083), (5, -0.169), (6, 0.103), (7, -0.044), (8, -0.064), (9, 0.189), (10, -0.375), (11, -0.315), (12, 0.084), (13, 0.247), (14, 0.173), (15, -0.004), (16, 0.063), (17, 0.036), (18, -0.06), (19, 0.038), (20, 0.087), (21, 0.084), (22, 0.026), (23, 0.0), (24, 0.025), (25, -0.004), (26, -0.002), (27, -0.03), (28, 0.001), (29, -0.019), (30, -0.023), (31, -0.006), (32, 0.059), (33, 0.074), (34, -0.032), (35, 0.0), (36, 0.022), (37, -0.016), (38, -0.006), (39, -0.031), (40, -0.011), (41, 0.045), (42, -0.013), (43, -0.011), (44, -0.017), (45, 0.004), (46, -0.009), (47, -0.02), (48, -0.02), (49, 0.013)]
simIndex simValue paperId paperTitle
1 0.97592753 384 acl-2013-Visual Features for Linguists: Basic image analysis techniques for multimodally-curious NLPers
Author: Elia Bruni ; Marco Baroni
Abstract: unkown-abstract
2 0.95687312 380 acl-2013-VSEM: An open library for visual semantics representation
Author: Elia Bruni ; Ulisse Bordignon ; Adam Liska ; Jasper Uijlings ; Irina Sergienya
Abstract: VSEM is an open library for visual semantics. Starting from a collection of tagged images, it is possible to automatically construct an image-based representation of concepts by using off-theshelf VSEM functionalities. VSEM is entirely written in MATLAB and its objectoriented design allows a large flexibility and reusability. The software is accompanied by a website with supporting documentation and examples.
same-paper 3 0.94521755 249 acl-2013-Models of Semantic Representation with Visual Attributes
Author: Carina Silberer ; Vittorio Ferrari ; Mirella Lapata
Abstract: We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data compared to amodal models and word representations based on handcrafted norming data.
4 0.8403849 167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus
Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi
Abstract: The ever growing amount of web images and their associated texts offers new opportunities for integrative models bridging natural language processing and computer vision. However, the potential benefits of such data are yet to be fully realized due to the complexity and noise in the alignment between image content and text. We address this challenge with contributions in two folds: first, we introduce the new task of image caption generalization, formulated as visually-guided sentence compression, and present an efficient algorithm based on dynamic beam search with dependency-based constraints. Second, we release a new large-scale corpus with 1 million image-caption pairs achieving tighter content alignment between images and text. Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new imagetext parallel corpus with respect to a concrete application of image caption transfer.
5 0.49550128 29 acl-2013-A Visual Analytics System for Cluster Exploration
Author: Andreas Lamprecht ; Annette Hautli ; Christian Rohrdantz ; Tina Bogel
Abstract: This paper offers a new way of representing the results of automatic clustering algorithms by employing a Visual Analytics system which maps members of a cluster and their distance to each other onto a two-dimensional space. A case study on Urdu complex predicates shows that the system allows for an appropriate investigation of linguistically motivated data.
6 0.4824985 175 acl-2013-Grounded Language Learning from Video Described with Sentences
7 0.47146207 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis
8 0.42791435 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users
9 0.41218209 321 acl-2013-Sign Language Lexical Recognition With Propositional Dynamic Logic
10 0.40973336 370 acl-2013-Unsupervised Transcription of Historical Documents
11 0.34707218 313 acl-2013-Semantic Parsing with Combinatory Categorial Grammars
12 0.31757128 356 acl-2013-Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia
13 0.31063056 371 acl-2013-Unsupervised joke generation from big data
14 0.30004945 381 acl-2013-Variable Bit Quantisation for LSH
15 0.2756373 31 acl-2013-A corpus-based evaluation method for Distributional Semantic Models
16 0.2728754 238 acl-2013-Measuring semantic content in distributional vectors
17 0.2690042 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
18 0.25777456 268 acl-2013-PATHS: A System for Accessing Cultural Heritage Collections
19 0.25774124 311 acl-2013-Semantic Neighborhoods as Hypergraphs
20 0.25646347 90 acl-2013-Conditional Random Fields for Responsive Surface Realisation using Global Features
topicId topicWeight
[(0, 0.069), (6, 0.027), (11, 0.043), (14, 0.015), (24, 0.054), (26, 0.033), (35, 0.081), (41, 0.189), (42, 0.042), (48, 0.063), (64, 0.016), (70, 0.162), (88, 0.025), (90, 0.019), (95, 0.05)]
simIndex simValue paperId paperTitle
1 0.85582858 168 acl-2013-Generating Recommendation Dialogs by Extracting Information from User Reviews
Author: Kevin Reschke ; Adam Vogel ; Dan Jurafsky
Abstract: Recommendation dialog systems help users navigate e-commerce listings by asking questions about users’ preferences toward relevant domain attributes. We present a framework for generating and ranking fine-grained, highly relevant questions from user-generated reviews. We demonstrate our approach on a new dataset just released by Yelp, and release a new sentiment lexicon with 1329 adjectives for the restaurant domain.
same-paper 2 0.82782859 249 acl-2013-Models of Semantic Representation with Visual Attributes
Author: Carina Silberer ; Vittorio Ferrari ; Mirella Lapata
Abstract: We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data compared to amodal models and word representations based on handcrafted norming data.
3 0.75582159 220 acl-2013-Learning Latent Personas of Film Characters
Author: David Bamman ; Brendan O'Connor ; Noah A. Smith
Abstract: We present two latent variable models for learning character types, or personas, in film, in which a persona is defined as a set of mixtures over latent lexical classes. These lexical classes capture the stereotypical actions of which a character is the agent and patient, as well as attributes by which they are described. As the first attempt to solve this problem explicitly, we also present a new dataset for the text-driven analysis of film, along with a benchmark testbed to help drive future work in this area.
4 0.74814487 348 acl-2013-The effect of non-tightness on Bayesian estimation of PCFGs
Author: Shay B. Cohen ; Mark Johnson
Abstract: Probabilistic context-free grammars have the unusual property of not always defining tight distributions (i.e., the sum of the “probabilities” of the trees the grammar generates can be less than one). This paper reviews how this non-tightness can arise and discusses its impact on Bayesian estimation of PCFGs. We begin by presenting the notion of “almost everywhere tight grammars” and show that linear CFGs follow it. We then propose three different ways of reinterpreting non-tight PCFGs to make them tight, show that the Bayesian estimators in Johnson et al. (2007) are correct under one of them, and provide MCMC samplers for the other two. We conclude with a discussion of the impact of tightness empirically.
5 0.74678165 218 acl-2013-Latent Semantic Tensor Indexing for Community-based Question Answering
Author: Xipeng Qiu ; Le Tian ; Xuanjing Huang
Abstract: Retrieving similar questions is very important in community-based question answering (CQA). In this paper, we propose a unified question retrieval model based on latent semantic indexing with tensor analysis, which can capture word associations among different parts of CQA triples simultaneously. Thus, our method can reduce the lexical chasm of question retrieval with the help of the information of question content and answer parts. The experimental result shows that our method outperforms the traditional methods.
6 0.74404311 296 acl-2013-Recognizing Identical Events with Graph Kernels
7 0.73149222 89 acl-2013-Computerized Analysis of a Verbal Fluency Test
8 0.7308622 356 acl-2013-Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia
10 0.71670753 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
11 0.69868517 380 acl-2013-VSEM: An open library for visual semantics representation
13 0.69401377 384 acl-2013-Visual Features for Linguists: Basic image analysis techniques for multimodally-curious NLPers
14 0.68545079 153 acl-2013-Extracting Events with Informal Temporal References in Personal Histories in Online Communities
15 0.67006636 167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus
16 0.6638447 224 acl-2013-Learning to Extract International Relations from Political Context
17 0.66315794 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering
18 0.66169077 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars
19 0.66007757 80 acl-2013-Chinese Parsing Exploiting Characters
20 0.65769368 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis