acl acl2013 acl2013-167 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi
Abstract: The ever-growing amount of web images and their associated texts offers new opportunities for integrative models bridging natural language processing and computer vision. However, the potential benefits of such data are yet to be fully realized due to the complexity and noise in the alignment between image content and text. We address this challenge with contributions that are twofold: first, we introduce the new task of image caption generalization, formulated as visually-guided sentence compression, and present an efficient algorithm based on dynamic beam search with dependency-based constraints. Second, we release a new large-scale corpus of 1 million image-caption pairs that achieves tighter content alignment between images and text. Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new image-text parallel corpus with respect to a concrete application of image caption transfer.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract The ever-growing amount of web images and their associated texts offers new opportunities for integrative models bridging natural language processing and computer vision. [sent-3, score-0.258]
2 However, the potential benefits of such data are yet to be fully realized due to the complexity and noise in the alignment between image content and text. [sent-4, score-0.573]
3 We address this challenge with contributions that are twofold: first, we introduce the new task of image caption generalization, formulated as visually-guided sentence compression, and present an efficient algorithm based on dynamic beam search with dependency-based constraints. [sent-5, score-0.883]
4 Second, we release a new large-scale corpus with 1 million image-caption pairs achieving tighter content alignment between images and text. [sent-6, score-0.351]
5 Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new image-text parallel corpus with respect to a concrete application of image caption transfer. [sent-7, score-1.412]
6 1 Introduction The vast number of online images with accompanying text raises hope for drawing synergistic connections between human language technologies and computer vision. [sent-8, score-0.288]
7 However, subtleties and complexity in the relationship between image content and text make exploiting paired visual-textual data an open and interesting problem. [sent-9, score-0.573]
8 "Sections of the bridge sitting in the Dyer Construction yard south of Cabelas Driver." (example caption from Figure 1) [sent-11, score-0.045]
9 Figure 1: Examples of captions that are not readily applicable to other visually similar images. [sent-13, score-0.505]
10 text from the retrieved samples to the query image (e. [sent-14, score-0.549]
11 Feng and Lapata (2010a), Feng and Lapata (2010b)) uses computer vision to bias summarization of text associated with images to produce descriptions. [sent-22, score-0.375]
12 All of these approaches rely on existing text that describes visual content, but many times existing image descriptions contain significant amounts of extraneous, non-visual, or otherwise non-desirable content. [sent-23, score-0.73]
13 The goal of this paper is to develop techniques to automatically clean up visually descriptive text to make it more directly usable for applications exploiting the connection between images and language. [sent-24, score-0.411]
14 As a concrete example, consider the first image in Figure 1. [sent-25, score-0.538]
15 This caption was written by the photo owner and therefore contains information related to the context of when and where the photo was taken. [sent-26, score-0.424]
16 Objects such as “lamp”, “door”, “camera” are not visually present in the photo. [sent-27, score-0.187]
17 The second image shows a similar but somewhat different issue. [sent-28, score-0.51]
18 Its caption describes visible objects such as "bridge" and "yard", but "Cabelas Driver" is overly specific and not visually detectable. [sent-29, score-0.57]
19 This phenomenon of an information gap between the visual content of the images and their corresponding narratives has been studied closely by Dodge et al. [sent-35, score-0.465]
20 The content misalignment between images and text limits the extent to which visual detectors can learn meaningful mappings between images and text. [sent-37, score-0.689]
21 To tackle this challenge, we introduce the new task of image caption generalization that rewrites captions to be more visually relevant and more readily applicable to other visually similar images. [sent-38, score-1.597]
22 Our end goal is to convert noisy image-text pairs in the wild (Ordonez et al., 2011) [sent-39, score-0.045]
23 into pairs with tighter content alignment, resulting in new simplified captions over 1 million images. [sent-40, score-0.445]
24 Evaluation results show both the intrinsic quality of the generalized captions and the extrinsic utility of the new image-text parallel corpus. [sent-41, score-0.495]
25 The new parallel corpus will be made publicly available. [sent-42, score-0.029]
26 Dependency-based constraints guide the generalized caption. (Footnote 1: Open-domain computer vision remains an open problem, and it would be difficult to reliably distinguish pictures with subtle visual differences, e.g., [sent-44, score-0.808]
27 pictures of "a water front house with a docked boat" from those of "a floating house pulled by a boat".) [sent-46, score-0.203]
28 Content Selection (Visual Estimates): The computer vision system used consists of 7404 visual classifiers for recognizing leaf-level WordNet synsets (Fellbaum, 1998). [sent-65, score-0.329]
29 Each classifier is trained using labeled images from the ImageNet dataset (Deng et al. [sent-66, score-0.224]
30 , 2009), an image database of over 14 million hand-labeled images organized according to the WordNet hierarchy. [sent-67, score-0.765]
31 Idf-driven scores to favor salient topics, as those are more likely to generalize across many different images. [sent-73, score-0.035]
32 Additionally, we assign a very low content selection score (−∞) to proper nouns and numbers, and a very high score (larger than the maximum idf or visual score) to the 2k most frequent words in our corpus. [sent-74, score-0.241]
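A minimal sketch of how such a per-word content-selection score could be computed is given below. The function and parameter names (visual_conf, idf, frequent_words, high_score) and the exact way visual confidence and idf are combined are assumptions for illustration, not the paper's implementation.

```python
import math

def content_selection_score(word, pos_tag, visual_conf, idf, frequent_words,
                            high_score=1e6):
    """Per-word content-selection score.

    Proper nouns and numbers get -inf (always dropped); the 2k most frequent
    corpus words get a score above any idf or visual score (always kept);
    otherwise visually verified, salient words are favored."""
    if pos_tag in {"NNP", "NNPS", "CD"}:       # proper nouns and numbers
        return -math.inf
    if word.lower() in frequent_words:         # top-2k most frequent words
        return high_score
    # Assumed combination: the paper does not spell out the exact mix of
    # visual confidence and idf, so we simply add the two scores here.
    return visual_conf.get(word.lower(), 0.0) + idf.get(word.lower(), 0.0)
```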
33 Local Linguistic Fluency: We model linguistic fluency with 3-gram conditional probabilities: ψ(xl(yi), xl(yi−1), xl(yi−2)) = p(xl(yi) | xl(yi−2), xl(yi−1)) (1). We experiment with two different n-gram statistics, one extracted from the Google Web 1T corpus (Brants and Franz, 2006) and the other from the image caption corpus. [sent-75, score-0.087]
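A sketch of the trigram fluency term in Eq. (1) is shown below. Here trigram_lm is assumed to be a lookup table of log-probabilities (e.g., estimated from Google Web 1T counts), and the sentence-start padding is an assumption about boundary handling.

```python
def fluency_logprob(kept_lemmas, i, trigram_lm):
    """Eq. (1): log p(x_l(y_i) | x_l(y_{i-2}), x_l(y_{i-1})) for the i-th
    word retained in the compression, conditioned on the two previously
    retained words (padded with <s> at the sentence start)."""
    padded = ["<s>", "<s>"] + kept_lemmas
    w_prev2, w_prev1, w_cur = padded[i], padded[i + 1], padded[i + 2]
    return trigram_lm.get((w_prev2, w_prev1, w_cur), float("-inf"))
```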
34 Dependency-driven Constraints: Table 1 defines the list of dependencies used as constraints, derived from the typed dependencies (de Marneffe and Manning, 2009; de Marneffe et al. [sent-78, score-0.257]
35 We determine the uni- or bi-directionality of these constraints by manually examining a few example sentences corresponding to each of these typed dependencies. [sent-83, score-0.124]
36 Note that some dependencies, such as det(←→), would hold regardless of the particular context. (Footnote 3: Code was provided by Deng et al.) [sent-84, score-0.049]
37 Those dependencies that we determine as largely context dependent are marked with * in Table 1. [sent-92, score-0.049]
38 One could consider enforcing all dependency constraints in Table 1 as hard constraints so that the compressed sentence must not violate any of those directed dependency constraints. [sent-93, score-0.306]
39 Doing so, however, would lead to overly conservative compression with the lowest compression ratio. [sent-94, score-0.307]
40 Therefore, we relax those that are largely context dependent as soft constraints (marked in Table 1 with *) by introducing a constant penalty term in the objective function. [sent-95, score-0.08]
41 Alternatively, the dependency-based constraints can be learned statistically from a training corpus of paired original and compressed sentences. [sent-96, score-0.193]
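The objective with hard and soft dependency constraints might look roughly like the sketch below: hard constraints reject a candidate outright, while soft (*-marked, context-dependent) constraints each subtract a constant penalty. The violation test (a retained dependent whose governor is dropped), the penalty value, and all names are simplifying assumptions; the paper's actual constraints are directional as defined in Table 1.

```python
def compression_score(words, keep, deps, hard_labels, soft_labels,
                      content_score, fluency_logprob, trigram_lm,
                      soft_penalty=1.0):
    """Score one candidate compression (keep[i] marks retained words)."""
    kept = [w for w, k in zip(words, keep) if k]
    score = sum(content_score(w) for w in kept)
    score += sum(fluency_logprob(kept, i, trigram_lm) for i in range(len(kept)))
    for head, label, child in deps:            # typed dependency over word indices
        violated = keep[child] and not keep[head]
        if violated and label in hard_labels:
            return float("-inf")               # hard constraint: reject candidate
        if violated and label in soft_labels:
            score -= soft_penalty              # soft constraint: constant penalty
    return score
```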
42 In our work, hard constraints are based only on typed dependencies, and we find that long range dependencies occur infrequently in actual image descriptions, as plotted in Figure 2. [sent-99, score-0.683]
43 With this insight, we opt for decoding based on dynamic programming with dynamically adjusted beam sizes; alternatively, one can find an approximate solution using Integer Linear Programming (e. [sent-100, score-0.03]
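A sketch of the left-to-right keep/drop decoding with a dynamically adjusted beam follows. How the 2^p bound from footnote 4 is turned into a beam width, and the partial-scoring interface, are assumptions; the actual decoder prunes with the full dependency-constrained objective.

```python
def beam_compress(words, partial_score, future_dependents, base_beam=10):
    """Decide keep/drop for each word left to right, keeping a beam of the
    best partial solutions; the beam width at step i grows with the number
    of words whose dependency constraints involve a word after position i."""
    beams = [[]]                                   # partial keep/drop masks
    for i in range(len(words)):
        candidates = [mask + [choice] for mask in beams for choice in (True, False)]
        candidates.sort(key=partial_score, reverse=True)
        width = base_beam * (2 ** future_dependents[i])
        beams = candidates[:width]
    return max(beams, key=partial_score)
```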
44 3 Evaluation: Since there is no existing benchmark data for image caption generalization, we empirically crowdsource evaluation using Amazon Mechanical Turk (AMT). [sent-104, score-0.334]
45 We compare the following options. (Footnote 4: The required beam size at each step depends on how many words have dependency constraints involving any word following the current one; the beam size is at most 2^p, where p is the maximum number of words dependent on any future words.) [sent-105, score-0.191]
46 • ORIG: original uncompressed captions
• HUMAN: compressed by humans (see § 3.2) [sent-129, score-0.398]
47 • SALIENCY: linguistic fluency + saliency-based content selection + dependency constraints
• VISUAL: linguistic fluency + visually-guided content selection + dependency constraints
• x W/O CONSTR: method x without dependency constraints
• NGRAM-ONLY: linguistic fluency only [sent-130, score-0.726]
48 3.1 Intrinsic Evaluation: Forced Choice. Turkers are provided with an image and two captions (produced by different methods) and are asked to select the better one, [sent-131, score-0.828]
49 i.e., the most relevant and plausible caption that contains the least extraneous information. [sent-133, score-0.394]
50 We observe that VISUAL (the full model with visually guided content selection) performs best, being selected over SALIENCY (content selection without visual information) in 72.48% of cases, [sent-135, score-0.428]
51 and even over the original image caption in about 81% of cases. [sent-136, score-0.844]
52 This forced-selection experiment between VISUAL and ORIG demonstrates the degree of noise prevalent in the image captions in the wild. [sent-138, score-0.828]
53 Of course, when compared against human-compressed captions, the automatic captions are preferred much less frequently, in only 19% of the cases. [sent-139, score-0.351]
54 In those 19% of cases where automatic captions are preferred over human-compressed ones, it is sometimes because humans did not fully remove information that is not visually present or verifiable, and other times because humans compressed too aggressively. [sent-140, score-0.587]
55 To verify the utility of dependency-based constraints, we also compare two variations of VISUAL, with and without dependency-based constraints. [sent-141, score-0.036]
56 As expected, the algorithm with constraints is preferred in the majority of cases. [sent-142, score-0.113]
57 3.2 Extrinsic Evaluation: Image-based Caption Retrieval. We evaluate the usefulness of our new image-text parallel corpus for automatic generation of image descriptions. [sent-144, score-0.577]
58 Here the task is to produce, for a query image, a relevant description, i. [sent-145, score-0.039]
59 (2011), we produce a caption for a query image by finding the top k most similar images within the 1M image-text corpus (Ordonez et al. [sent-149, score-0.549]
60 , 2011) and then transferring their captions to the query image. [sent-150, score-0.387]
61 Image similarity is computed using two global (whole-image) descriptors. [sent-152, score-0.51]
62 The first is the GIST feature (Oliva and Torralba, 2001), an image descriptor related to perceptual characteristics of scenes: naturalness, roughness, openness, etc. [sent-153, score-0.566]
63 The second descriptor is also a global image descriptor, computed by resizing the image into a “tiny image” (Torralba et al. [sent-154, score-1.076]
64 To find visually relevant images, we compute the similarity of the query image to images in the whole dataset using an unweighted sum of GIST similarity and tiny image similarity. [sent-156, score-0.781]
66 Figure 4: Good (left three, in blue) and bad examples (right three, in red) of generalized captions. [sent-161, score-0.935]
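A sketch of the caption-transfer step under the stated similarity, an unweighted sum of GIST and tiny-image similarity, is given below. Representing similarity as negative Euclidean distance and the numpy-based retrieval are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def transfer_captions(query_gist, query_tiny, corpus_gist, corpus_tiny,
                      corpus_captions, k=5):
    """Rank corpus images by gist_sim + tiny_sim (unweighted sum) and
    transfer the captions of the top-k most similar images to the query."""
    gist_sim = -np.linalg.norm(corpus_gist - query_gist, axis=1)
    tiny_sim = -np.linalg.norm(corpus_tiny - query_tiny, axis=1)
    combined = gist_sim + tiny_sim
    top_k = np.argsort(-combined)[:k]
    return [corpus_captions[int(i)] for i in top_k]
```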
67 Gold-standard (human-compressed) captions are obtained using AMT for 1K images. [sent-162, score-0.318]
68 Strict matching gives credit only to identical words between the gold-standard caption and the automatically produced caption. [sent-164, score-0.334]
69 The best-performing approach with semantic matching is VISUAL (with LM = Image corpus), improving BLEU, precision, and F-score substantially over those of ORIG, demonstrating the extrinsic utility of our newly generated image-text parallel corpus in comparison to the original database. [sent-167, score-0.107]
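Semantic (rather than strict) word matching with Wu-Palmer similarity (see footnote 5 below) could be approximated as in the sketch; the 0.9 threshold and the use of NLTK's WordNet interface are assumptions, not details from the paper.

```python
from nltk.corpus import wordnet as wn

def semantic_match(word_a, word_b, threshold=0.9):
    """Two words 'match' if they are identical or if the best Wu-Palmer
    similarity over their WordNet synset pairs reaches the threshold."""
    if word_a == word_b:
        return True
    syns_a, syns_b = wn.synsets(word_a), wn.synsets(word_b)
    if not syns_a or not syns_b:
        return False
    best = max((a.wup_similarity(b) or 0.0) for a in syns_a for b in syns_b)
    return best >= threshold
```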
70 4 Related Work: Several recent studies presented approaches to automatic caption generation for images (e. [sent-169, score-0.596]
71 The end goal of our work differs in that we aim to revise original image captions into [sent-177, score-0.828]
72 descriptions that are more general and align more closely to the visual image content. (Footnote 5: We take Wu-Palmer Similarity as the similarity measure (Wu and Palmer, 1994).) [sent-180, score-0.73]
73 , Turner and Charniak (2005), Filippova and Strube (2008)) in that there is no in-domain training corpus from which to learn generalization patterns directly. [sent-183, score-0.061]
74 Future work includes exploring more direct supervision from human-edited sample generalization (e. [sent-184, score-0.061]
75 5 Conclusion: We have introduced the task of image caption generalization as a means to reduce noise in the parallel corpus of images and text. [sent-194, score-0.061]
76 Intrinsic and extrinsic evaluations confirm that the captions in the resulting corpus align better with the image contents (they are often preferred over the original captions by people) and can be practically more useful with respect to a concrete application. [sent-195, score-1.207]
77 Generating typed dependency parses from phrase structure parses. [sent-230, score-0.077]
78 Hedging your bets: Optimizing accuracy-specificity trade-offs in large scale visual recognition. [sent-240, score-0.178]
79 Paraphrastic sentence compression with a character-based metric: Tightening without deletion. [sent-321, score-0.129]
80 Modeling the shape of the scene: a holistic representation of the spatial envelope. [sent-326, score-0.037]
81 80 million tiny images: a large dataset for non-parametric object and scene recognition. [sent-336, score-0.104]
wordName wordTfidf (topN-words)
[('image', 0.51), ('caption', 0.334), ('captions', 0.318), ('images', 0.224), ('visually', 0.187), ('visual', 0.178), ('berg', 0.166), ('ordonez', 0.161), ('xl', 0.129), ('compression', 0.129), ('tamara', 0.122), ('vision', 0.117), ('yi', 0.088), ('fluency', 0.087), ('constraints', 0.08), ('compressed', 0.08), ('kuznetsova', 0.074), ('yejin', 0.07), ('orig', 0.068), ('content', 0.063), ('generalization', 0.061), ('brook', 0.06), ('extraneous', 0.06), ('girish', 0.06), ('torralba', 0.06), ('vicente', 0.06), ('feng', 0.06), ('kulkarni', 0.059), ('house', 0.057), ('descriptor', 0.056), ('pulled', 0.056), ('stony', 0.056), ('boat', 0.056), ('farhadi', 0.056), ('lapata', 0.054), ('deng', 0.05), ('clarke', 0.049), ('overly', 0.049), ('dependencies', 0.049), ('cabelas', 0.045), ('dodge', 0.045), ('imagetext', 0.045), ('lamp', 0.045), ('ofglas', 0.045), ('photo', 0.045), ('yard', 0.045), ('tiny', 0.044), ('typed', 0.044), ('extrinsic', 0.042), ('descriptions', 0.042), ('mirella', 0.042), ('saliency', 0.04), ('cordeiro', 0.04), ('onybrook', 0.04), ('lazebnik', 0.04), ('oliva', 0.04), ('siming', 0.04), ('marneffe', 0.04), ('query', 0.039), ('beam', 0.039), ('generation', 0.038), ('intrinsic', 0.038), ('polina', 0.037), ('filippova', 0.037), ('yansong', 0.037), ('spatial', 0.037), ('utility', 0.036), ('turner', 0.035), ('pub', 0.035), ('imagenet', 0.035), ('driven', 0.035), ('alexander', 0.034), ('xi', 0.034), ('computer', 0.034), ('preferred', 0.033), ('dep', 0.033), ('tighter', 0.033), ('pictures', 0.033), ('door', 0.033), ('napoles', 0.033), ('dependency', 0.033), ('generalized', 0.032), ('choi', 0.032), ('million', 0.031), ('jia', 0.031), ('gist', 0.031), ('programming', 0.03), ('transferring', 0.03), ('accompanying', 0.03), ('daume', 0.03), ('association', 0.029), ('scene', 0.029), ('kai', 0.029), ('pattern', 0.029), ('parallel', 0.029), ('pyramid', 0.028), ('antonio', 0.028), ('det', 0.028), ('amt', 0.028), ('concrete', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999869 167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus
Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi
Abstract: The ever growing amount of web images and their associated texts offers new opportunities for integrative models bridging natural language processing and computer vision. However, the potential benefits of such data are yet to be fully realized due to the complexity and noise in the alignment between image content and text. We address this challenge with contributions in two folds: first, we introduce the new task of image caption generalization, formulated as visually-guided sentence compression, and present an efficient algorithm based on dynamic beam search with dependency-based constraints. Second, we release a new large-scale corpus with 1 million image-caption pairs achieving tighter content alignment between images and text. Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new imagetext parallel corpus with respect to a concrete application of image caption transfer.
2 0.36814904 384 acl-2013-Visual Features for Linguists: Basic image analysis techniques for multimodally-curious NLPers
Author: Elia Bruni ; Marco Baroni
Abstract: unkown-abstract
3 0.35118589 380 acl-2013-VSEM: An open library for visual semantics representation
Author: Elia Bruni ; Ulisse Bordignon ; Adam Liska ; Jasper Uijlings ; Irina Sergienya
Abstract: VSEM is an open library for visual semantics. Starting from a collection of tagged images, it is possible to automatically construct an image-based representation of concepts by using off-theshelf VSEM functionalities. VSEM is entirely written in MATLAB and its objectoriented design allows a large flexibility and reusability. The software is accompanied by a website with supporting documentation and examples.
4 0.27081841 249 acl-2013-Models of Semantic Representation with Visual Attributes
Author: Carina Silberer ; Vittorio Ferrari ; Mirella Lapata
Abstract: We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data compared to amodal models and word representations based on handcrafted norming data.
5 0.17745596 293 acl-2013-Random Walk Factoid Annotation for Collective Discourse
Author: Ben King ; Rahul Jha ; Dragomir Radev ; Robert Mankoff
Abstract: In this paper, we study the problem of automatically annotating the factoids present in collective discourse. Factoids are information units that are shared between instances of collective discourse and may have many different ways ofbeing realized in words. Our approach divides this problem into two steps, using a graph-based approach for each step: (1) factoid discovery, finding groups of words that correspond to the same factoid, and (2) factoid assignment, using these groups of words to mark collective discourse units that contain the respective factoids. We study this on two novel data sets: the New Yorker caption contest data set, and the crossword clues data set.
6 0.13170682 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
7 0.095716938 175 acl-2013-Grounded Language Learning from Video Described with Sentences
8 0.086436749 370 acl-2013-Unsupervised Transcription of Historical Documents
9 0.073752597 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
10 0.072709486 29 acl-2013-A Visual Analytics System for Cluster Exploration
11 0.070598632 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis
12 0.058466833 50 acl-2013-An improved MDL-based compression algorithm for unsupervised word segmentation
13 0.055970464 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization
14 0.053442344 93 acl-2013-Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora
15 0.05286311 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
16 0.052835915 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
17 0.052280847 91 acl-2013-Connotation Lexicon: A Dash of Sentiment Beneath the Surface Meaning
18 0.049916416 257 acl-2013-Natural Language Models for Predicting Programming Comments
19 0.048765063 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
20 0.048597798 126 acl-2013-Diverse Keyword Extraction from Conversations
topicId topicWeight
[(0, 0.147), (1, 0.023), (2, -0.013), (3, -0.062), (4, -0.054), (5, -0.108), (6, 0.12), (7, -0.048), (8, -0.128), (9, 0.119), (10, -0.316), (11, -0.266), (12, -0.024), (13, 0.232), (14, 0.163), (15, 0.038), (16, 0.126), (17, -0.013), (18, -0.079), (19, 0.055), (20, 0.085), (21, 0.105), (22, -0.016), (23, 0.025), (24, -0.029), (25, -0.044), (26, -0.058), (27, -0.006), (28, 0.038), (29, -0.022), (30, 0.015), (31, -0.046), (32, -0.035), (33, 0.047), (34, -0.002), (35, -0.058), (36, 0.023), (37, 0.011), (38, 0.037), (39, -0.058), (40, 0.029), (41, -0.009), (42, 0.063), (43, -0.017), (44, 0.064), (45, -0.046), (46, -0.037), (47, 0.003), (48, -0.01), (49, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.93541199 167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus
Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi
Abstract: The ever growing amount of web images and their associated texts offers new opportunities for integrative models bridging natural language processing and computer vision. However, the potential benefits of such data are yet to be fully realized due to the complexity and noise in the alignment between image content and text. We address this challenge with contributions in two folds: first, we introduce the new task of image caption generalization, formulated as visually-guided sentence compression, and present an efficient algorithm based on dynamic beam search with dependency-based constraints. Second, we release a new large-scale corpus with 1 million image-caption pairs achieving tighter content alignment between images and text. Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new imagetext parallel corpus with respect to a concrete application of image caption transfer.
2 0.9277674 384 acl-2013-Visual Features for Linguists: Basic image analysis techniques for multimodally-curious NLPers
Author: Elia Bruni ; Marco Baroni
Abstract: unkown-abstract
3 0.90297884 380 acl-2013-VSEM: An open library for visual semantics representation
Author: Elia Bruni ; Ulisse Bordignon ; Adam Liska ; Jasper Uijlings ; Irina Sergienya
Abstract: VSEM is an open library for visual semantics. Starting from a collection of tagged images, it is possible to automatically construct an image-based representation of concepts by using off-theshelf VSEM functionalities. VSEM is entirely written in MATLAB and its objectoriented design allows a large flexibility and reusability. The software is accompanied by a website with supporting documentation and examples.
4 0.8241834 249 acl-2013-Models of Semantic Representation with Visual Attributes
Author: Carina Silberer ; Vittorio Ferrari ; Mirella Lapata
Abstract: We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data compared to amodal models and word representations based on handcrafted norming data.
5 0.42802304 175 acl-2013-Grounded Language Learning from Video Described with Sentences
Author: Haonan Yu ; Jeffrey Mark Siskind
Abstract: We present a method that learns representations for word meanings from short video clips paired with sentences. Unlike prior work on learning language from symbolic input, our input consists of video of people interacting with multiple complex objects in outdoor environments. Unlike prior computer-vision approaches that learn from videos with verb labels or images with noun labels, our labels are sentences containing nouns, verbs, prepositions, adjectives, and adverbs. The correspondence between words and concepts in the video is learned in an unsupervised fashion, even when the video depicts si- multaneous events described by multiple sentences or when different aspects of a single event are described with multiple sentences. The learned word meanings can be subsequently used to automatically generate description of new video.
6 0.42075938 29 acl-2013-A Visual Analytics System for Cluster Exploration
7 0.40598404 370 acl-2013-Unsupervised Transcription of Historical Documents
8 0.39780793 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis
9 0.3622109 293 acl-2013-Random Walk Factoid Annotation for Collective Discourse
10 0.31098786 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
11 0.29784852 321 acl-2013-Sign Language Lexical Recognition With Propositional Dynamic Logic
12 0.28546202 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
13 0.28191644 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization
14 0.27457979 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users
15 0.27103305 311 acl-2013-Semantic Neighborhoods as Hypergraphs
16 0.27037713 126 acl-2013-Diverse Keyword Extraction from Conversations
17 0.26990831 313 acl-2013-Semantic Parsing with Combinatory Categorial Grammars
18 0.25468048 23 acl-2013-A System for Summarizing Scientific Topics Starting from Keywords
19 0.25081593 54 acl-2013-Are School-of-thought Words Characterizable?
20 0.24136524 381 acl-2013-Variable Bit Quantisation for LSH
topicId topicWeight
[(0, 0.039), (4, 0.012), (6, 0.029), (11, 0.055), (15, 0.01), (17, 0.174), (24, 0.032), (26, 0.048), (35, 0.094), (42, 0.044), (47, 0.015), (48, 0.04), (64, 0.013), (70, 0.122), (71, 0.011), (77, 0.018), (80, 0.017), (88, 0.023), (90, 0.03), (95, 0.056)]
simIndex simValue paperId paperTitle
same-paper 1 0.86059791 167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus
Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi
Abstract: The ever growing amount of web images and their associated texts offers new opportunities for integrative models bridging natural language processing and computer vision. However, the potential benefits of such data are yet to be fully realized due to the complexity and noise in the alignment between image content and text. We address this challenge with contributions in two folds: first, we introduce the new task of image caption generalization, formulated as visually-guided sentence compression, and present an efficient algorithm based on dynamic beam search with dependency-based constraints. Second, we release a new large-scale corpus with 1 million image-caption pairs achieving tighter content alignment between images and text. Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new imagetext parallel corpus with respect to a concrete application of image caption transfer.
2 0.80905688 192 acl-2013-Improved Lexical Acquisition through DPP-based Verb Clustering
Author: Roi Reichart ; Anna Korhonen
Abstract: Subcategorization frames (SCFs), selectional preferences (SPs) and verb classes capture related aspects of the predicateargument structure. We present the first unified framework for unsupervised learning of these three types of information. We show how to utilize Determinantal Point Processes (DPPs), elegant probabilistic models that are defined over the possible subsets of a given dataset and give higher probability mass to high quality and diverse subsets, for clustering. Our novel clustering algorithm constructs a joint SCF-DPP DPP kernel matrix and utilizes the efficient sampling algorithms of DPPs to cluster together verbs with similar SCFs and SPs. We evaluate the induced clusters in the context of the three tasks and show results that are superior to strong baselines for each 1.
3 0.72296774 220 acl-2013-Learning Latent Personas of Film Characters
Author: David Bamman ; Brendan O'Connor ; Noah A. Smith
Abstract: We present two latent variable models for learning character types, or personas, in film, in which a persona is defined as a set of mixtures over latent lexical classes. These lexical classes capture the stereotypical actions of which a character is the agent and patient, as well as attributes by which they are described. As the first attempt to solve this problem explicitly, we also present a new dataset for the text-driven analysis of film, along with a benchmark testbed to help drive future work in this area.
4 0.71028334 356 acl-2013-Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia
Author: Zhigang Wang ; Zhixing Li ; Juanzi Li ; Jie Tang ; Jeff Z. Pan
Abstract: Wikipedia infoboxes are a valuable source of structured knowledge for global knowledge sharing. However, infobox information is very incomplete and imbalanced among the Wikipedias in different languages. It is a promising but challenging problem to utilize the rich structured knowledge from a source language Wikipedia to help complete the missing infoboxes for a target language. In this paper, we formulate the problem of cross-lingual knowledge extraction from multilingual Wikipedia sources, and present a novel framework, called WikiCiKE, to solve this problem. An instancebased transfer learning method is utilized to overcome the problems of topic drift and translation errors. Our experimental results demonstrate that WikiCiKE outperforms the monolingual knowledge extraction method and the translation-based method.
5 0.7097562 218 acl-2013-Latent Semantic Tensor Indexing for Community-based Question Answering
Author: Xipeng Qiu ; Le Tian ; Xuanjing Huang
Abstract: Retrieving similar questions is very important in community-based question answering(CQA) . In this paper, we propose a unified question retrieval model based on latent semantic indexing with tensor analysis, which can capture word associations among different parts of CQA triples simultaneously. Thus, our method can reduce lexical chasm of question retrieval with the help of the information of question content and answer parts. The experimental result shows that our method outperforms the traditional methods.
6 0.70711052 296 acl-2013-Recognizing Identical Events with Graph Kernels
8 0.70319915 249 acl-2013-Models of Semantic Representation with Visual Attributes
9 0.7016331 348 acl-2013-The effect of non-tightness on Bayesian estimation of PCFGs
10 0.69705331 153 acl-2013-Extracting Events with Informal Temporal References in Personal Histories in Online Communities
11 0.69617957 380 acl-2013-VSEM: An open library for visual semantics representation
12 0.69199944 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
13 0.69016343 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
14 0.68328202 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
15 0.68147814 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering
16 0.6802063 224 acl-2013-Learning to Extract International Relations from Political Context
17 0.67935652 89 acl-2013-Computerized Analysis of a Verbal Fluency Test
18 0.67790544 80 acl-2013-Chinese Parsing Exploiting Characters
19 0.67553508 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars
20 0.67530525 168 acl-2013-Generating Recommendation Dialogs by Extracting Information from User Reviews