acl acl2013 acl2013-380 knowledge-graph by maker-knowledge-mining

380 acl-2013-VSEM: An open library for visual semantics representation


Source: pdf

Author: Elia Bruni ; Ulisse Bordignon ; Adam Liska ; Jasper Uijlings ; Irina Sergienya

Abstract: VSEM is an open library for visual semantics. Starting from a collection of tagged images, it is possible to automatically construct an image-based representation of concepts by using off-the-shelf VSEM functionalities. VSEM is entirely written in MATLAB and its object-oriented design allows for great flexibility and reusability. The software is accompanied by a website with supporting documentation and examples.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 VSEM: An open library for visual semantics representation Elia Bruni University of Trento e l . [sent-1, score-0.513]

2 it Abstract VSEM is an open library for visual semantics. [sent-11, score-0.433]

3 Starting from a collection of tagged images, it is possible to automatically construct an image-based representation of concepts by using off-the-shelf VSEM functionalities. [sent-12, score-0.13]

4 The software is accompanied by a website with supporting documentation and examples. [sent-14, score-0.045]

5 1 Introduction In recent years we have witnessed great progress in the area of automated image analysis. [sent-15, score-0.376]

6 Important advances, such as the introduction of local features for a robust description of the image content (see Mikolajczyk et al. [sent-16, score-0.376]

7 (2005) for a systematic review) and the bag-of-visual-words method (BoVW) for a standard representation across multiple images (Sivic and Zisserman, 2003), have contributed to making image analysis ubiquitous, with applications ranging from robotics to biology, from medicine to photography. [sent-17, score-0.579]

8 First, the introduction of very well-defined challenges, which have also been attracting a wide community of "outsiders" specializing in a variety of disciplines (e. [sent-19, score-0.059]

9 Second, the sharing of effective, well-documented implementations of cutting-edge image analysis algorithms, such as OpenCV. (The bag-of-visual-words model is a popular technique for image classification inspired by the traditional bag-of-words model in Information Retrieval.) [sent-22, score-0.807]

10 In particular, under the assumption that meaning can be captured by patterns of co-occurrences of words, distributional semantic models such as Latent Semantic Analysis (Landauer and Dumais, 1997) or Topic Models (Blei et al. [sent-29, score-0.052]

11 Nowadays, given the parallel success of the two disciplines, there is growing interest in making the visual and textual channels interact for mutual benefit. [sent-32, score-0.427]

12 If we look at the image analysis community, we discover a well-established tradition of studies that exploit both channels of information. [sent-33, score-0.407]

13 For example, there is a relatively large body of literature on enhancing the performance on visual tasks such as object recognition or image retrieval by replacing a purely image-based pipeline with hybrid methods augmented with textual information (Barnard et al. [sent-34, score-0.877]

14 Unfortunately, the same cannot be said of the exploitation of image analysis from within the text community. [sent-39, score-0.376]

15 Despite the huge potential that automatically induced visual features could represent as a new source of perceptually grounded ... [sent-40, score-0.396]

16 One possible reason for this delay with respect to the image analysis community might be the high entry barriers that NLP researchers adopting image analysis methods have to face. [sent-52, score-0.779]

17 Although many of the image analysis toolkits are open source and well documented, they mainly address users within the same community and therefore their use is not as intuitive for others. [sent-53, score-0.403]

18 The final goal of libraries such as VLFeat and OpenCV is the representation and classification of images. [sent-54, score-0.054]

19 Therefore, they naturally lack a series of complementary functionalities that are necessary to bring the visual representation to the level of semantic concepts. [sent-55, score-0.505]

20 To fill the gap we just described, we hereby present VSEM, a novel toolkit that allows the extraction of image-based representations of concepts in an easy fashion. [sent-56, score-0.103]

21 VSEM is equipped with state-of-the-art algorithms, from low-level feature detection and description up to the BoVW representation of images, together with a set of new routines necessary to move from an image-wise to a concept-wise representation of image content. [sent-57, score-0.484]

22 In a nutshell, VSEM extracts visual information in a way that resembles how it is done for automatic text analysis. [sent-58, score-0.428]

23 Thanks to BoVW, the image content is indeed discretized and visual units somehow comparable to words in text are produced (the visual words). [sent-59, score-1.168]

24 In this way, from a corpus of images annotated with a set of concepts, it is possible to derive semantic vectors of co-occurrence counts of concepts and visual words akin to the representations of words in terms of textual collocates in standard distributional semantics. [sent-60, score-0.777]

25 This is due to the fact that they lack perceptual information (Andrews et al. [sent-62, score-0.038]

26 We chose to call them concepts to account for both the theoretical and practical differences between a word and the perceptual information it brings along, which we define as its concept. [sent-66, score-0.114]

27 Importantly, the obtained visual semantic vectors can be easily combined with more traditional text-based vectors to arrive at a multimodal representation of meaning (see e. [sent-70, score-0.534]

28 VSEM functionalities concerning image analysis are based on VLFeat (Vedaldi and Fulkerson, 2010). [sent-76, score-0.431]

29 This guarantees that the image analysis underpinnings of the library are well maintained and state-of-the-art. [sent-77, score-0.413]

30 In Section 2 we introduce the procedure to obtain an image-based representation of a concept. [sent-79, score-0.054]

31 (2011) and Leong and Mihalcea (2011), it is possible to construct an image-based representation of a set of target concepts by starting from a collection of images depicting those concepts, encoding the image contents into low-level features (e. [sent-84, score-0.791]

32 (2012b), better representations can be extracted if the object depicting the concept is first localized in the image. [sent-88, score-0.227]

33 In more detail, the pipeline encapsulating the whole process mentioned above takes as input a collection of images together with their associated tags and, optionally, object location annotations. [sent-89, score-0.254]

34 Its output is a set of concept representation vectors for individual tags. [sent-90, score-0.18]

35 A brief description of the individual steps follows. Figure 1: An example of a visual vocabulary creation pipeline (feature extraction followed by clustering). [sent-92, score-0.396]

36 From a set of images, a larger set of features is extracted and clustered, forming the visual vocabulary. [sent-93, score-0.396]

37 Local features Local features are designed to find local image structures in a repeatable fashion and to represent them in robust ways that are invariant to typical image transformations, such as translation, rotation, scaling, and affine deformation. [sent-95, score-0.869]

38 The most popular local feature extraction method is the Scale Invariant Feature Transform (SIFT), introduced by Lowe (2004). [sent-97, score-0.048]
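
Since VSEM's image analysis layer builds on VLFeat, SIFT features of this kind can be extracted directly from MATLAB; the following is only an illustrative sketch, not VSEM's own wrapper code, and it assumes VLFeat's MATLAB toolbox is on the path and uses a placeholder image file:

    im = imread('cat.jpg');                 % placeholder image path
    im = single(rgb2gray(im));              % vl_sift expects a single-precision grayscale image
    [frames, descriptors] = vl_sift(im);    % frames: 4 x n (x, y, scale, orientation); descriptors: 128 x n SIFT vectors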

39 Visual vocabulary To obtain a BoVW representation of the image content, a large set of local features extracted from a large corpus of images are clustered. [sent-99, score-0.627]

40 In this way the local feature space is divided into informative regions (visual words) and the collection of the obtained visual words is called the visual vocabulary. [sent-100, score-0.84]
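
As a rough illustration of this step (not the VSEM API; the demo below obtains its vocabulary through KmeansVocabulary), the clustering can be sketched with VLFeat's k-means, where 'allDescriptors' stands for descriptors pooled from a training corpus and the vocabulary size of 1000 is an arbitrary choice:

    vocabSize      = 1000;                                  % arbitrary vocabulary size
    allDescriptors = single(allDescriptors);                % 128 x nFeatures pooled descriptors (assumed given)
    vocabulary     = vl_kmeans(allDescriptors, vocabSize);  % 128 x vocabSize centroids, one per visual word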

41 In the special case of Fisher encoding (see below), the clustering of the features is performed with a Gaussian mixture model (GMM), see Perronnin et al. [sent-102, score-0.085]

42 Encoding The encoding step maps the local features extracted from an image to the corresponding visual words of the previously created vocabulary. [sent-106, score-0.905]

43 The most common encoding strategy is called hard quantization, which assigns each feature to the nearest visual word’s centroid (in Euclidean distance). [sent-107, score-0.481]
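
A minimal sketch of hard quantization (illustrative only, not the library's implementation), reusing 'descriptors' and 'vocabulary' from the sketches above:

    vocabSize   = size(vocabulary, 2);
    nFeatures   = size(descriptors, 2);
    assignments = zeros(1, nFeatures);                          % visual word index of each feature
    for i = 1:nFeatures
        d = bsxfun(@minus, double(vocabulary), double(descriptors(:, i)));
        [~, assignments(i)] = min(sum(d .^ 2, 1));              % nearest centroid in Euclidean distance
    end
    imageHist = accumarray(assignments', 1, [vocabSize, 1]);    % visual word counts for the whole image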

44 Recently, more effective encoding methods have been introduced, among which the Fisher encoding (Perronnin et al. [sent-108, score-0.17]

45 Spatial binning A well-established way of introducing spatial information in BoVW is the use of spatial histograms (Lazebnik et al. [sent-112, score-0.351]

46 The main idea is to divide the image into several (spatial) regions, compute the encoding for each region and stack the resulting histograms. [sent-114, score-0.461]

47 This technique is referred to as spatial binning and it is implemented in VSEM. [sent-115, score-0.216]
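
A sketch of spatial binning on a 2x2 grid (the grid size here is an arbitrary illustrative choice), using the hard-quantization 'assignments' and the feature positions in 'frames' from the sketches above:

    [h, w]  = size(im);
    stacked = [];
    for row = 1:2
        for col = 1:2
            inBin = frames(1,:) > (col-1)*w/2 & frames(1,:) <= col*w/2 & ...
                    frames(2,:) > (row-1)*h/2 & frames(2,:) <= row*h/2;
            binHist = zeros(vocabSize, 1);
            for id = assignments(inBin)      % count the visual words falling in this quadrant
                binHist(id) = binHist(id) + 1;
            end
            stacked = [stacked; binHist];    %#ok<AGROW> final length: 4 * vocabSize
        end
    end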

48 Figure 2 exemplifies the BoVW pipeline for a single image, involving local feature extraction, encoding and spatial binning. [sent-116, score-0.308]

49 Figure 2: An example of a BoVW representation pipeline for an image (feature extraction, encoding, spatial binning). [sent-117, score-0.094]

50 Each feature extracted from the target image is assigned to the corresponding visual word(s). [sent-120, score-0.772]

51 Moreover, the input of spatial binning can be further refined by introducing localization. [sent-122, score-0.216]

52 Three different types of localization are typically used: global, object, and surrounding. [sent-123, score-0.097]

53 Global extracts visual information from the whole image and it is also the default option when the localization information is missing. [sent-124, score-0.901]

54 Object extracts visual information from the object location only and the surrounding extracts visual information from outside the object location. [sent-125, score-0.986]

55 Localization itself can be done either by humans (ground-truth annotation) or by existing localization methods (Uijlings et al. [sent-126, score-0.097]

56 For localization, VSEM uses annotated object locations (in the format of bounding boxes) of the target object. [sent-128, score-0.065]
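
As an illustration of how such annotations can be used (the box coordinates below are hypothetical; VSEM reads them from the dataset annotations), the local features can be split into the object and surrounding sets:

    bbox  = [30 40 200 180];                          % hypothetical [xmin ymin xmax ymax] bounding box
    x     = frames(1, :);  y = frames(2, :);
    inBox = x >= bbox(1) & x <= bbox(3) & y >= bbox(2) & y <= bbox(4);
    objectDescriptors      = descriptors(:, inBox);   % "object" setting: features inside the box
    surroundingDescriptors = descriptors(:, ~inBox);  % "surrounding" setting: features outside the box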

57 Aggregation Since each concept is represented by multiple images, an aggregation function for pooling the visual word occurrences across images has to be defined. [sent-129, score-0.679]

58 An example of the aggregation step is sketched in Figure 3: An example of a concept representation pipeline for cat. [sent-131, score-0.228]

59 First, several images depicting a cat are represented as vectors of visual word counts and, second, the vectors are aggregated into one single concept vector. [sent-132, score-0.799]
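
The aggregation step can be sketched as follows (illustrative code, not the VSEM internals), where 'bovw' is an nImages x vocabSize matrix of per-image visual word counts and 'tags' a cell array with one concept label per image, both hypothetical inputs:

    concepts     = unique(tags);
    conceptSpace = zeros(numel(concepts), size(bovw, 2));
    for c = 1:numel(concepts)
        imgs = strcmp(tags, concepts{c});               % all images tagged with this concept
        conceptSpace(c, :) = sum(bovw(imgs, :), 1);     % sum pooling into a single concept vector
    end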

60 Transformations Once the concept-representing visual vectors are built, two types of transformation can be performed over them to refine their raw visual word counts: association scores and dimensionality reduction. [sent-135, score-0.88]

61 So far, the vectors that we have obtained represent co-occurrence counts of visual words with concepts. [sent-136, score-0.473]

62 On the other hand, dimensionality reduction leads to matrices that are smaller and easier to work with. [sent-139, score-0.046]

63 Common dimensionality reduction methods are singular value decomposition (Manning et al. [sent-141, score-0.046]
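
Both transformations can be sketched in a few lines (illustrative only; in VSEM the reweighting is exposed through ConceptSpace methods such as reweight, used in the demo below): raw counts are reweighted with pointwise mutual information (clamped at zero here, a common choice) and then reduced with a truncated SVD, where the number of latent dimensions (50) is an arbitrary choice:

    total  = sum(conceptSpace(:));
    pRow   = sum(conceptSpace, 2) / total;      % P(concept)
    pCol   = sum(conceptSpace, 1) / total;      % P(visual word)
    pJoint = conceptSpace / total;              % P(concept, visual word)
    pmi    = log(pJoint ./ (pRow * pCol));      % pointwise mutual information
    pmi(~isfinite(pmi) | pmi < 0) = 0;          % keep only positive, well-defined scores
    k = min(50, min(size(pmi)));                % number of latent dimensions
    [U, S, ~] = svds(sparse(pmi), k);           % truncated singular value decomposition
    reduced   = U * S;                          % reduced concept representations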

64 3 Framework design VSEM offers a friendly implementation of the pipeline described in Section 2. [sent-144, score-0.04]

65 The framework is organized into five parts, which correspond to an equal number of MATLAB packages, and it is written in an object-oriented style to encourage reusability. [sent-145, score-0.046]

66 • datasets This package contains the code that manages the image data sets. [sent-147, score-0.446]

67 Therefore, to use a new image data set two solutions are possible: either write a new class which extends GenericDataset or use VsemDataset directly after having rearranged the new data as described in help VsemDataset. [sent-149, score-0.376]

68 • vision This package contains the code for extracting the bag-of-visual-words representation of images. [sent-150, score-0.167]

69 Nevertheless, if the user wants to add new functionalities such as new features or encodings, this is possible by simply extending the corresponding generic classes and the class VsemHistogramExtractor. [sent-152, score-0.229]

70 • concepts This is the package that deals with the construction of the image-based representation of concepts. [sent-153, score-0.2]

71 It applies the image analysis methods to obtain the BoVW representation of the image data and then aggregates visual word counts concept-wise. [sent-155, score-1.237]

72 The main class of this package is ConceptSpace, which takes care of storing concept names and vectors and provides managing and transformation utilities as its methods. [sent-156, score-0.188]

73 VSEM offers a benchmarking suite to assess the quality of the visual concept representations. [sent-157, score-0.48]

74 For example, it can be used to find the optimal parametrization of the visual pipeline. [sent-158, score-0.396]

75 • helpers This package contains supporting code: there is a general helpers module with functionalities shared across packages and several package-specific helpers. [sent-160, score-0.283]

76 Documentation All the MATLAB commands of VSEM are self-documented (e.g. [sent-165, score-0.055]

77 help vsem) and an HTML version of the MATLAB command documentation is available from the VSEM website. [sent-167, score-0.045]

78 The Pascal VOC demo The Pascal VOC demo provides a comprehensive example of the workings of VSEM. [sent-168, score-0.102]

79 Additional settings are available and documented for each function, class or package in the toolbox (see Documentation). [sent-171, score-0.125]

80 Running the demo file executes the following lines of code and returns as output ConceptSpace, which contains the visual concept representations for the Pascal data set. [sent-172, score-0.585]

81 % Create a MATLAB structure with the whole set of images in the Pascal dataset along with their annotations: dataset = datasets. [sent-173, score-0.753]

82 VsemDataset(configuration.imagesPath, 'annotationFolder', configuration. [sent-175, score-0.475]

83 annotationPath); % Initialize the class that handles the extraction of visual features. [sent-176, score-0.251]

84 featureExtractor = vision.features.PhowFeatureExtractor(); [sent-177, score-0.32]

85 % Create the visual vocabulary: vocabulary = KmeansVocabulary.trainVocabulary(dataset, featureExtractor); [sent-179, score-0.892]

86 % Calculate semantic vectors: conceptSpace = conceptExtractor. [sent-180, score-0.232]

87 extractConcepts(dataset, histogramExtractor); % Compute pointwise mutual information: conceptSpace = conceptSpace. [sent-181, score-0.469]

88 reweight(); % Conclude the demo, computing the correlation of similarity measures of the 190 possible pairs of concepts from the Pascal dataset against a gold standard: [correlationScore, pValue] = similarityBenchmark. [sent-182, score-0.393]

89 computeBenchmark(conceptSpace, similarityExtractor); 5 Conclusions We have introduced VSEM, an open library for visual semantics. [sent-183, score-0.711]

90 With VSEM it is possible to extract visual semantic information from tagged images and arrange such information into concept representations according to the tenets of distributional semantics, as applied to images instead of text. [sent-184, score-0.857]

91 To analyze images, it uses state-of-the-art techniques such as SIFT features and the bag-of-visual-words with spatial pyramid and Fisher encoding. [sent-185, score-0.161]

92 In the future, we would like to add automatic localization strategies, new aggregation functions and a completely new package for fusing image- and text-based representations. [sent-186, score-0.217]

93 Integrating experiential and distributional data to learn semantic representations. [sent-189, score-0.052]

94 Strudel: A distributional semantic model based on properties and types. [sent-201, score-0.052]

95 Distributional semantics with eyes: Using image analysis to improve computational representations of word meaning. [sent-228, score-0.429]

96 The devil is in the details: an evaluation of recent feature encoding methods. [sent-232, score-0.085]

97 A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. [sent-262, score-0.054]

98 Redundancy in perceptual and linguistic experience: Comparing feature-based and distributional models of semantic representation. [sent-306, score-0.09]

99 Video Google: A text retrieval approach to object matching in videos. [sent-310, score-0.065]

100 VLFeat: an open and portable library of computer vision algorithms. [sent-332, score-0.08]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('vsem', 0.504), ('visual', 0.396), ('image', 0.376), ('bovw', 0.21), ('bruni', 0.193), ('images', 0.149), ('ract', 0.148), ('spatial', 0.135), ('localization', 0.097), ('encoding', 0.085), ('vlfeat', 0.084), ('concept', 0.084), ('binning', 0.081), ('concepts', 0.076), ('dat', 0.075), ('uijlings', 0.074), ('package', 0.07), ('elia', 0.068), ('object', 0.065), ('baroni', 0.064), ('chatfield', 0.063), ('grauman', 0.063), ('perronnin', 0.063), ('vedaldi', 0.063), ('berg', 0.061), ('fisher', 0.061), ('ary', 0.059), ('unitn', 0.057), ('voc', 0.056), ('documented', 0.055), ('functionalities', 0.055), ('representation', 0.054), ('distributional', 0.052), ('demo', 0.051), ('annot', 0.051), ('matlab', 0.051), ('depicting', 0.051), ('cvpr', 0.051), ('aggregation', 0.05), ('leong', 0.048), ('sift', 0.048), ('trento', 0.048), ('local', 0.048), ('packages', 0.046), ('imi', 0.046), ('dimensionality', 0.046), ('documentation', 0.045), ('vision', 0.043), ('vectors', 0.042), ('barnard', 0.042), ('bordignon', 0.042), ('conceptspace', 0.042), ('eccv', 0.042), ('emdat', 0.042), ('featureext', 0.042), ('gmm', 0.042), ('helpers', 0.042), ('jasper', 0.042), ('leibe', 0.042), ('mikolajczyk', 0.042), ('ogramext', 0.042), ('opencv', 0.042), ('riordan', 0.042), ('sivic', 0.042), ('vsemdataset', 0.042), ('pipeline', 0.04), ('marco', 0.039), ('cal', 0.038), ('perceptual', 0.038), ('pascal', 0.038), ('lazebnik', 0.037), ('affine', 0.037), ('zisserman', 0.037), ('reat', 0.037), ('feng', 0.037), ('library', 0.037), ('counts', 0.035), ('farhadi', 0.034), ('invariant', 0.032), ('disciplines', 0.032), ('tamara', 0.032), ('extracts', 0.032), ('andrews', 0.031), ('channels', 0.031), ('quantization', 0.031), ('brian', 0.03), ('ext', 0.028), ('representations', 0.027), ('multimedia', 0.027), ('kulkarni', 0.027), ('file', 0.027), ('community', 0.027), ('comput', 0.027), ('st', 0.026), ('semantics', 0.026), ('pyramid', 0.026), ('approximating', 0.026), ('hinton', 0.025), ('ath', 0.024), ('ion', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 380 acl-2013-VSEM: An open library for visual semantics representation

Author: Elia Bruni ; Ulisse Bordignon ; Adam Liska ; Jasper Uijlings ; Irina Sergienya

Abstract: VSEM is an open library for visual semantics. Starting from a collection of tagged images, it is possible to automatically construct an image-based representation of concepts by using off-the-shelf VSEM functionalities. VSEM is entirely written in MATLAB and its object-oriented design allows for great flexibility and reusability. The software is accompanied by a website with supporting documentation and examples.

2 0.47988269 384 acl-2013-Visual Features for Linguists: Basic image analysis techniques for multimodally-curious NLPers

Author: Elia Bruni ; Marco Baroni

Abstract: unkown-abstract

3 0.36673456 249 acl-2013-Models of Semantic Representation with Visual Attributes

Author: Carina Silberer ; Vittorio Ferrari ; Mirella Lapata

Abstract: We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data compared to amodal models and word representations based on handcrafted norming data.

4 0.35118589 167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus

Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi

Abstract: The ever growing amount of web images and their associated texts offers new opportunities for integrative models bridging natural language processing and computer vision. However, the potential benefits of such data are yet to be fully realized due to the complexity and noise in the alignment between image content and text. We address this challenge with contributions in two folds: first, we introduce the new task of image caption generalization, formulated as visually-guided sentence compression, and present an efficient algorithm based on dynamic beam search with dependency-based constraints. Second, we release a new large-scale corpus with 1 million image-caption pairs achieving tighter content alignment between images and text. Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new imagetext parallel corpus with respect to a concrete application of image caption transfer.

5 0.14290448 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis

Author: Veronica Perez-Rosas ; Rada Mihalcea ; Louis-Philippe Morency

Abstract: During real-life interactions, people are naturally gesturing and modulating their voice to emphasize specific points or to express their emotions. With the recent growth of social websites such as YouTube, Facebook, and Amazon, video reviews are emerging as a new source of multimodal and natural opinions that has been left almost untapped by automatic opinion analysis techniques. This paper presents a method for multimodal sentiment classification, which can identify the sentiment expressed in utterance-level visual datastreams. Using a new multimodal dataset consisting of sentiment annotated utterances extracted from video reviews, we show that multimodal sentiment analysis can be effectively performed, and that the joint use of visual, acoustic, and linguistic modalities can lead to error rate reductions of up to 10.5% as compared to the best performing individual modality.

6 0.12481728 29 acl-2013-A Visual Analytics System for Cluster Exploration

7 0.085213855 175 acl-2013-Grounded Language Learning from Video Described with Sentences

8 0.072254956 370 acl-2013-Unsupervised Transcription of Historical Documents

9 0.063072622 32 acl-2013-A relatedness benchmark to test the role of determiners in compositional distributional semantics

10 0.06229746 238 acl-2013-Measuring semantic content in distributional vectors

11 0.057628315 103 acl-2013-DISSECT - DIStributional SEmantics Composition Toolkit

12 0.048209332 31 acl-2013-A corpus-based evaluation method for Distributional Semantic Models

13 0.047432091 126 acl-2013-Diverse Keyword Extraction from Conversations

14 0.046680965 87 acl-2013-Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics

15 0.046080902 217 acl-2013-Latent Semantic Matching: Application to Cross-language Text Categorization without Alignment Information

16 0.043515991 93 acl-2013-Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora

17 0.043024607 104 acl-2013-DKPro Similarity: An Open Source Framework for Text Similarity

18 0.042261366 347 acl-2013-The Role of Syntax in Vector Space Models of Compositional Semantics

19 0.041227352 219 acl-2013-Learning Entity Representation for Entity Disambiguation

20 0.038690604 22 acl-2013-A Structured Distributional Semantic Model for Event Co-reference


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.118), (1, 0.063), (2, 0.001), (3, -0.092), (4, -0.081), (5, -0.159), (6, 0.107), (7, -0.043), (8, -0.102), (9, 0.184), (10, -0.379), (11, -0.328), (12, 0.082), (13, 0.256), (14, 0.189), (15, 0.012), (16, 0.056), (17, 0.012), (18, -0.093), (19, 0.042), (20, 0.099), (21, 0.097), (22, -0.006), (23, 0.021), (24, -0.035), (25, -0.03), (26, -0.041), (27, -0.049), (28, 0.018), (29, -0.015), (30, 0.008), (31, -0.027), (32, 0.014), (33, 0.015), (34, -0.026), (35, 0.005), (36, 0.015), (37, -0.001), (38, 0.009), (39, -0.037), (40, 0.008), (41, -0.019), (42, -0.001), (43, -0.001), (44, 0.014), (45, -0.014), (46, -0.029), (47, 0.004), (48, 0.01), (49, 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.98236036 384 acl-2013-Visual Features for Linguists: Basic image analysis techniques for multimodally-curious NLPers

Author: Elia Bruni ; Marco Baroni

Abstract: unkown-abstract

same-paper 2 0.96022993 380 acl-2013-VSEM: An open library for visual semantics representation

Author: Elia Bruni ; Ulisse Bordignon ; Adam Liska ; Jasper Uijlings ; Irina Sergienya

Abstract: VSEM is an open library for visual semantics. Starting from a collection of tagged images, it is possible to automatically construct an image-based representation of concepts by using off-the-shelf VSEM functionalities. VSEM is entirely written in MATLAB and its object-oriented design allows for great flexibility and reusability. The software is accompanied by a website with supporting documentation and examples.

3 0.88908058 249 acl-2013-Models of Semantic Representation with Visual Attributes

Author: Carina Silberer ; Vittorio Ferrari ; Mirella Lapata

Abstract: We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data compared to amodal models and word representations based on handcrafted norming data.

4 0.85415173 167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus

Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi

Abstract: The ever growing amount of web images and their associated texts offers new opportunities for integrative models bridging natural language processing and computer vision. However, the potential benefits of such data are yet to be fully realized due to the complexity and noise in the alignment between image content and text. We address this challenge with contributions in two folds: first, we introduce the new task of image caption generalization, formulated as visually-guided sentence compression, and present an efficient algorithm based on dynamic beam search with dependency-based constraints. Second, we release a new large-scale corpus with 1 million image-caption pairs achieving tighter content alignment between images and text. Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new imagetext parallel corpus with respect to a concrete application of image caption transfer.

5 0.45669568 29 acl-2013-A Visual Analytics System for Cluster Exploration

Author: Andreas Lamprecht ; Annette Hautli ; Christian Rohrdantz ; Tina Bogel

Abstract: This paper offers a new way of representing the results of automatic clustering algorithms by employing a Visual Analytics system which maps members of a cluster and their distance to each other onto a two-dimensional space. A case study on Urdu complex predicates shows that the system allows for an appropriate investigation of linguistically motivated data.

6 0.42553994 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis

7 0.42375079 175 acl-2013-Grounded Language Learning from Video Described with Sentences

8 0.37278336 370 acl-2013-Unsupervised Transcription of Historical Documents

9 0.31061843 321 acl-2013-Sign Language Lexical Recognition With Propositional Dynamic Logic

10 0.29422745 313 acl-2013-Semantic Parsing with Combinatory Categorial Grammars

11 0.28619674 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users

12 0.23893984 381 acl-2013-Variable Bit Quantisation for LSH

13 0.22809681 126 acl-2013-Diverse Keyword Extraction from Conversations

14 0.22567298 279 acl-2013-PhonMatrix: Visualizing co-occurrence constraints of sounds

15 0.21307871 32 acl-2013-A relatedness benchmark to test the role of determiners in compositional distributional semantics

16 0.21125875 238 acl-2013-Measuring semantic content in distributional vectors

17 0.20984767 356 acl-2013-Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia

18 0.20834275 103 acl-2013-DISSECT - DIStributional SEmantics Composition Toolkit

19 0.20388095 293 acl-2013-Random Walk Factoid Annotation for Collective Discourse

20 0.20353799 31 acl-2013-A corpus-based evaluation method for Distributional Semantic Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.064), (6, 0.022), (11, 0.048), (14, 0.015), (24, 0.034), (26, 0.05), (35, 0.074), (42, 0.034), (47, 0.259), (48, 0.055), (64, 0.01), (70, 0.152), (88, 0.019), (90, 0.017), (95, 0.045)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80906236 380 acl-2013-VSEM: An open library for visual semantics representation

Author: Elia Bruni ; Ulisse Bordignon ; Adam Liska ; Jasper Uijlings ; Irina Sergienya

Abstract: VSEM is an open library for visual semantics. Starting from a collection of tagged images, it is possible to automatically construct an image-based representation of concepts by using off-the-shelf VSEM functionalities. VSEM is entirely written in MATLAB and its object-oriented design allows for great flexibility and reusability. The software is accompanied by a website with supporting documentation and examples.

2 0.63938266 288 acl-2013-Punctuation Prediction with Transition-based Parsing

Author: Dongdong Zhang ; Shuangzhi Wu ; Nan Yang ; Mu Li

Abstract: Punctuations are not available in automatic speech recognition outputs, which could create barriers to many subsequent text processing tasks. This paper proposes a novel method to predict punctuation symbols for the stream of words in transcribed speech texts. Our method jointly performs parsing and punctuation prediction by integrating a rich set of syntactic features when processing words from left to right. It can exploit a global view to capture long-range dependencies for punctuation prediction with linear complexity. The experimental results on the test data sets of IWSLT and TDT4 show that our method can achieve high-level performance in punctuation prediction over the stream of words in transcribed speech text. 1

3 0.62285924 348 acl-2013-The effect of non-tightness on Bayesian estimation of PCFGs

Author: Shay B. Cohen ; Mark Johnson

Abstract: Probabilistic context-free grammars have the unusual property of not always defining tight distributions (i.e., the sum of the “probabilities” of the trees the grammar generates can be less than one). This paper reviews how this non-tightness can arise and discusses its impact on Bayesian estimation of PCFGs. We begin by presenting the notion of “almost everywhere tight grammars” and show that linear CFGs follow it. We then propose three different ways of reinterpreting non-tight PCFGs to make them tight, show that the Bayesian estimators in Johnson et al. (2007) are correct under one of them, and provide MCMC samplers for the other two. We conclude with a discussion of the impact of tightness empirically.

4 0.62228566 220 acl-2013-Learning Latent Personas of Film Characters

Author: David Bamman ; Brendan O'Connor ; Noah A. Smith

Abstract: We present two latent variable models for learning character types, or personas, in film, in which a persona is defined as a set of mixtures over latent lexical classes. These lexical classes capture the stereotypical actions of which a character is the agent and patient, as well as attributes by which they are described. As the first attempt to solve this problem explicitly, we also present a new dataset for the text-driven analysis of film, along with a benchmark testbed to help drive future work in this area.

5 0.61681503 296 acl-2013-Recognizing Identical Events with Graph Kernels

Author: Goran Glavas ; Jan Snajder

Abstract: Identifying news stories that discuss the same real-world events is important for news tracking and retrieval. Most existing approaches rely on the traditional vector space model. We propose an approach for recognizing identical real-world events based on a structured, event-oriented document representation. We structure documents as graphs of event mentions and use graph kernels to measure the similarity between document pairs. Our experiments indicate that the proposed graph-based approach can outperform the traditional vector space model, and is especially suitable for distinguishing between topically similar, yet non-identical events.

6 0.61492831 218 acl-2013-Latent Semantic Tensor Indexing for Community-based Question Answering

7 0.6065641 356 acl-2013-Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia

8 0.60340393 89 acl-2013-Computerized Analysis of a Verbal Fluency Test

9 0.59534729 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

10 0.58704656 249 acl-2013-Models of Semantic Representation with Visual Attributes

11 0.57481378 384 acl-2013-Visual Features for Linguists: Basic image analysis techniques for multimodally-curious NLPers

12 0.5692578 329 acl-2013-Statistical Machine Translation Improves Question Retrieval in Community Question Answering via Matrix Factorization

13 0.567173 153 acl-2013-Extracting Events with Informal Temporal References in Personal Histories in Online Communities

14 0.55363631 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars

15 0.55207753 167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus

16 0.54912484 80 acl-2013-Chinese Parsing Exploiting Characters

17 0.54530811 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

18 0.54403341 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering

19 0.54266548 224 acl-2013-Learning to Extract International Relations from Political Context

20 0.54132593 275 acl-2013-Parsing with Compositional Vector Grammars