acl acl2013 acl2013-167 knowledge-graph by maker-knowledge-mining

167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus


Source: pdf

Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi

Abstract: The ever growing amount of web images and their associated texts offers new opportunities for integrative models bridging natural language processing and computer vision. However, the potential benefits of such data are yet to be fully realized due to the complexity and noise in the alignment between image content and text. We address this challenge with contributions in two folds: first, we introduce the new task of image caption generalization, formulated as visually-guided sentence compression, and present an efficient algorithm based on dynamic beam search with dependency-based constraints. Second, we release a new large-scale corpus with 1 million image-caption pairs achieving tighter content alignment between images and text. Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new image-text parallel corpus with respect to a concrete application of image caption transfer.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 The ever growing amount of web images and their associated texts offers new opportunities for integrative models bridging natural language processing and computer vision. [sent-3, score-0.258]

2 However, the potential benefits of such data are yet to be fully realized due to the complexity and noise in the alignment between image content and text. [sent-4, score-0.573]

3 We address this challenge with contributions in two folds: first, we introduce the new task of image caption generalization, formulated as visually-guided sentence compression, and present an efficient algorithm based on dynamic beam search with dependency-based constraints. [sent-5, score-0.883]

4 Second, we release a new large-scale corpus with 1 million image-caption pairs achieving tighter content alignment between images and text. [sent-6, score-0.351]

5 Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new image-text parallel corpus with respect to a concrete application of image caption transfer. [sent-7, score-1.412]

6 1 Introduction The vast number of online images with accompanying text raises hope for drawing synergistic connections between human language technologies and computer vision. [sent-8, score-0.288]

7 However, subtleties and complexity in the relationship between image content and text make exploiting paired visual-textual data an open and interesting problem. [sent-9, score-0.573]

8 “Sections of the bridge sitting in the Dyer Construction yard south of Cabelas Driver.” [sent-11, score-0.045]

9 Figure 1: Examples of captions that are not readily applicable to other visually similar images. [sent-13, score-0.505]

10 text from the retrieved samples to the query image (e. [sent-14, score-0.549]

11 Feng and Lapata (2010a), Feng and Lapata (2010b)) uses computer vision to bias summarization of text associated with images to produce descriptions. [sent-22, score-0.375]

12 All of these approaches rely on existing text that describes visual content, but many times existing image descriptions contain significant amounts of extraneous, non-visual, or otherwise non-desirable content. [sent-23, score-0.73]

13 The goal of this paper is to develop techniques to automatically clean up visually descriptive text to make it more directly usable for applications exploiting the connection between images and language. [sent-24, score-0.411]

14 As a concrete example, consider the first image in Figure 1. [sent-25, score-0.538]

15 This caption was written by the photo owner and therefore contains information related to the context of when and where the photo was taken. [sent-26, score-0.424]

16 Objects such as “lamp”, “door”, “camera” are not visually present in the photo. [sent-27, score-0.187]

17 The second image shows a similar but somewhat different issue. [sent-28, score-0.51]

18 Its caption describes visible objects such as “bridge” and “yard”, but “Cabelas Driver” is overly specific and not visually detectable. [sent-29, score-0.57]

19 This phenomenon of an information gap between the visual content of the images and their corresponding narratives has been studied closely by Dodge et al. [sent-35, score-0.465]

20 The content misalignment between images and text limits the extent to which visual detectors can learn meaningful mappings between images and text. [sent-37, score-0.689]

21 To tackle this challenge, we introduce the new task of image caption generalization that rewrites captions to be more visually relevant and more readily applicable to other visually similar images. [sent-38, score-1.597]

22 Our end goal is to convert noisy image-text pairs in the wild (Ordonez et al. [sent-39, score-0.045]

23 , 2011) into pairs with tighter content alignment, resulting in new simplified captions over 1 million images. [sent-40, score-0.445]

24 Evaluation results show both the intrinsic quality of the generalized captions and the extrinsic utility of the new image-text parallel corpus. [sent-41, score-0.495]

25 The new parallel corpus will be made publicly available. [sent-42, score-0.029]

26 Dependency-based constraints guide the generalized caption. Footnote 1: Open-domain computer vision remains an open problem, and it would be difficult to reliably distinguish pictures with subtle visual differences, e.g. [sent-44, score-0.808]

27 pictures of “a water front house with a docked boat” from those of “a floating house pulled by a boat”. [sent-46, score-0.203]

28 Content Selection. Visual Estimates: The computer vision system used consists of 7404 visual classifiers for recognizing leaf-level WordNet synsets (Fellbaum, 1998). [sent-65, score-0.329]

29 Each classifier is trained using labeled images from the ImageNet dataset (Deng et al. [sent-66, score-0.224]

30 , 2009), an image database of over 14 million hand-labeled images organized according to the WordNet hierarchy. [sent-67, score-0.765]

31 Idf-driven scores to favor salient topics, as those are more likely to generalize across many different images. [sent-73, score-0.035]

32 Additionally, we assign a very low content selection score (−∞) for proper nouns and numbers, and a very high score (larger than the maximum idf or visual score) for the 2k most frequent words in our corpus. [sent-74, score-0.241]
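The content-selection scoring just described combines visually-guided classifier confidences and idf-driven saliency with hard overrides for proper nouns/numbers and for the 2k most frequent words. Below is a minimal sketch of such a scorer; the helper names, the POS tags used to detect proper nouns and numbers, and the additive combination of the idf and visual terms are illustrative assumptions rather than the authors' exact formulation.

```python
import math

def content_score(token, pos_tag, idf, visual_conf, frequent_words,
                  high_score=1e6):
    """Assign a content-selection score to one token.

    idf:            dict mapping word -> idf value
    visual_conf:    dict mapping word -> visual classifier confidence
    frequent_words: set holding the ~2k most frequent corpus words
    """
    word = token.lower()
    # Proper nouns and numbers receive -inf so they are always dropped.
    if pos_tag in ("NNP", "NNPS", "CD"):
        return -math.inf
    # Very frequent words receive a score above any idf or visual score.
    if word in frequent_words:
        return high_score
    # Otherwise combine the idf-driven saliency score with the
    # visually-guided confidence (a simple sum here; a weighted
    # combination would be equally plausible).
    return idf.get(word, 0.0) + visual_conf.get(word, 0.0)
```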

33 Local Linguistic Fluency: We model linguistic fluency with 3-gram conditional probabilities: ψ(xl(yi−2), xl(yi−1), xl(yi)) = p(xl(yi) | xl(yi−2), xl(yi−1)) (1). We experiment with two different n-gram statistics, one extracted from the Google Web 1T corpus (Brants and Franz. [sent-75, score-0.087]
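Equation (1) is a plain trigram conditional probability over the word forms xl(yi). A minimal sketch of estimating and summing such probabilities from raw n-gram counts is shown below; the add-alpha smoothing and sentence-start padding are illustrative assumptions added only to keep the toy estimator well defined (the paper uses Google Web 1T or caption-corpus statistics, not this estimator).

```python
import math

def trigram_logprob(w1, w2, w3, trigram_counts, bigram_counts,
                    vocab_size, alpha=1.0):
    """log p(w3 | w1, w2) estimated from raw counts with add-alpha smoothing."""
    tri = trigram_counts.get((w1, w2, w3), 0)
    bi = bigram_counts.get((w1, w2), 0)
    return math.log((tri + alpha) / (bi + alpha * vocab_size))

def fluency_score(words, trigram_counts, bigram_counts, vocab_size):
    """Sum of trigram log-probabilities over a candidate compression,
    padding the left context with sentence-start symbols."""
    padded = ["<s>", "<s>"] + list(words)
    return sum(
        trigram_logprob(padded[i - 2], padded[i - 1], padded[i],
                        trigram_counts, bigram_counts, vocab_size)
        for i in range(2, len(padded))
    )
```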

34 Dependency-driven Constraints: Table 1 defines the list of dependencies used as constraints driven from the typed dependencies (de Marneffe and Manning, 2009; de Marneffe et al. [sent-78, score-0.257]

35 We determine the uni- or bi-directionality of these constraints by manually examining a few example sentences corresponding to each of these typed dependencies. [sent-83, score-0.124]

36 Note that some dependencies such as det(←→) would hold regardless of the particular context. Footnote 3: Code was provided by Deng et al. [sent-84, score-0.049]

37 Those dependencies that we determine as largely context dependent are marked with * in Table 1. [sent-92, score-0.049]

38 One could consider enforcing all dependency constraints in Table 1 as hard constraints so that the compressed sentence must not violate any of those directed dependency constraints. [sent-93, score-0.306]

39 Doing so, however, would lead to overly conservative compression with the lowest compression ratio. [sent-94, score-0.307]

40 Therefore, we relax those that are largely context dependent as soft constraints (marked in Table 1 with *) by introducing a constant penalty term in the objective function. [sent-95, score-0.08]
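A minimal sketch of how the hard and soft (starred) dependency constraints could enter the objective is given below: if a kept word's governor is deleted, a hard constraint rejects the candidate outright, while a soft constraint only subtracts a constant penalty. The directionality handling and the penalty value are illustrative assumptions, not the authors' exact settings.

```python
import math

def constraint_score(kept, dependencies, hard_relations, soft_relations,
                     penalty=2.0):
    """Score dependency-constraint violations for one compression.

    kept:          set of word indices retained in the compression
    dependencies:  list of (relation, governor_idx, dependent_idx) triples
    hard_relations / soft_relations: relation names treated as hard or
                   soft (starred in Table 1) constraints
    Returns -inf for a hard violation, otherwise the total soft penalty.
    """
    score = 0.0
    for rel, gov, dep in dependencies:
        # A directed constraint is violated when the dependent is kept
        # but its governor has been deleted.
        violated = dep in kept and gov not in kept
        if not violated:
            continue
        if rel in hard_relations:
            return -math.inf      # candidate compression is disallowed
        if rel in soft_relations:
            score -= penalty      # constant penalty term in the objective
    return score
```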

41 Alternatively, the dependency based constraints can be learned statistically from the training corpus of paired original and compressed sentences. [sent-96, score-0.193]

42 In our work, hard constraints are based only on typed dependencies, and we find that long range dependencies occur infrequently in actual image descriptions, as plotted in Figure 2. [sent-99, score-0.683]

43 With this insight, we opt for decoding based on dynamic programming with a dynamically adjusted beam size. Alternatively, one can find an approximate solution using Integer Linear Programming (e. [sent-100, score-0.03]
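A minimal sketch of this kind of beam-search decoding over keep/drop decisions is given below: partial hypotheses are scored with the content, fluency, and constraint terms and only the top-scoring hypotheses survive each step. The beam width, the score combination, and the helper functions are illustrative assumptions rather than the authors' exact algorithm.

```python
def compress_beam_search(words, score_word, fluency_step, constraint_ok,
                         beam_size=16):
    """Visually-guided sentence compression via beam search.

    words:         list of input tokens
    score_word:    content-selection score for keeping words[i]
    fluency_step:  log p(next | prev2, prev1) from the trigram LM
    constraint_ok: returns False if keeping words[i] would violate a
                   hard dependency constraint given the kept indices
    """
    # Each hypothesis: (score, kept_indices, last_two_kept_words)
    beam = [(0.0, [], ("<s>", "<s>"))]
    for i, w in enumerate(words):
        candidates = []
        for score, kept, (p2, p1) in beam:
            # Option 1: drop the word (LM context is unchanged).
            candidates.append((score, kept, (p2, p1)))
            # Option 2: keep the word, if the constraints allow it.
            if constraint_ok(i, kept):
                new_score = score + score_word(i) + fluency_step(p2, p1, w)
                candidates.append((new_score, kept + [i], (p1, w)))
        # Prune to the highest-scoring hypotheses.
        candidates.sort(key=lambda h: h[0], reverse=True)
        beam = candidates[:beam_size]
    best_score, best_kept, _ = beam[0]
    return [words[i] for i in best_kept]
```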

44 3 Evaluation. Since there is no existing benchmark data for image caption generalization, we evaluate empirically by crowdsourcing with Amazon Mechanical Turk (AMT). [sent-104, score-0.334]

45 We compare the following options. Footnote 4: The required beam size at each step depends on how many words have dependency constraints involving any word following the current one; the beam size is at most 2^p, where p is the maximum number of words dependent on any future words. [sent-105, score-0.191]

46 • ORIG: original uncompressed captions
• HUMAN: compressed by humans (see § 3.2) [sent-129, score-0.398]

47 • SALIENCY: linguistic fluency + saliency-based content selection + dependency constraints
• VISUAL: linguistic fluency + visually-guided content selection + dependency constraints
• x W/O CONSTR: method x without dependency constraints
• NGRAM-ONLY: linguistic fluency only [sent-130, score-0.726]

48 3.1 Intrinsic Evaluation: Forced Choice. Turkers are provided with an image and two captions (produced by different methods) and are asked to select a better one, i.e. [sent-131, score-0.828]

49 , the most relevant and plausible caption that contains the least extraneous information. [sent-133, score-0.394]

50 We observe that VISUAL (full model with visually guided content selection) performs the best, being selected over SALIENCY (content-selection without visual information) in 72. [sent-135, score-0.428]

51 48% cases, and even over the original image caption in 81. [sent-136, score-0.844]

52 This forced-selection experiment between VISUAL and ORIG demonstrates the degree of noise prevalent in the image captions in the wild. [sent-138, score-0.828]

53 Of course, if compared against human-compressed captions, the automatic captions are preferred much less frequently in 19% of the cases. [sent-139, score-0.351]

54 In those 19% of cases where automatic captions are preferred over human-compressed ones, it is sometimes because humans did not fully remove information that is not visually present or verifiable, and other times because humans compressed too aggressively. [sent-140, score-0.587]

55 To verify the utility of dependency-based constraints, we also compare two variations of VISUAL, with and without dependency-based constraints. [sent-141, score-0.036]

56 As expected, the algorithm with constraints is preferred in the majority of cases. [sent-142, score-0.113]

57 3.2 Extrinsic Evaluation: Image-based Caption Retrieval. We evaluate the usefulness of our new image-text parallel corpus for automatic generation of image descriptions. [sent-144, score-0.577]

58 Here the task is to produce, for a query image, a relevant description, i. [sent-145, score-0.039]

59 (2011), we produce a caption for a query image by finding the top k most similar images within the 1M image-text corpus (Ordonez et al. [sent-149, score-1.107]

60 , 2011) and then transferring their captions to the query image. [sent-150, score-0.387]
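A minimal sketch of this caption-transfer baseline is shown below: rank database images by similarity to the query and return the captions of the top-k neighbors. The similarity function is passed in (one option, the GIST plus tiny-image sum, is sketched after the descriptor discussion below); everything else here is an illustrative assumption.

```python
def transfer_captions(query_descriptor, database, similarity, k=5):
    """Return the captions of the k database images most similar to the query.

    database:   list of (descriptor, caption) pairs from the image-text corpus
    similarity: function(descriptor_a, descriptor_b) -> float,
                higher means more similar
    """
    ranked = sorted(
        database,
        key=lambda entry: similarity(query_descriptor, entry[0]),
        reverse=True,
    )
    return [caption for _, caption in ranked[:k]]
```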

61 Image similarity is computed using two global (whole) image descriptors. [sent-152, score-0.51]

62 The first is the GIST feature (Oliva and Torralba, 2001), an image descriptor related to perceptual characteristics of scenes: naturalness, roughness, openness, etc. [sent-153, score-0.566]

63 The second descriptor is also a global image descriptor, computed by resizing the image into a “tiny image” (Torralba et al. [sent-154, score-1.076]

64 To find visually relevant images, we compute the similarity of the query image to images in the whole dataset using an unweighted sum of GIST similarity and tiny image similarity. [sent-156, score-0.781]

66 Figure 4: Good (left three, in blue) and bad examples (right three, in red) of generalized captions. [sent-161, score-0.935]
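A minimal sketch of the combined similarity described above follows: a "tiny image" descriptor obtained by downsampling, plus an unweighted sum with a precomputed GIST similarity. Computing GIST itself is outside this sketch (it would require an external implementation); the resize size, normalization, and negative-distance similarity are illustrative assumptions.

```python
import numpy as np
from PIL import Image

def tiny_image_descriptor(path, size=32):
    """Resize to a small 'tiny image' and return a normalized feature vector."""
    img = Image.open(path).convert("RGB").resize((size, size))
    vec = np.asarray(img, dtype=np.float32).ravel()
    vec -= vec.mean()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def combined_similarity(gist_sim, tiny_a, tiny_b):
    """Unweighted sum of a precomputed GIST similarity and a tiny-image
    similarity (negative Euclidean distance here)."""
    tiny_sim = -float(np.linalg.norm(tiny_a - tiny_b))
    return gist_sim + tiny_sim
```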

67 Gold standard (human compressed) captions are obtained using AMT for 1K images. [sent-162, score-0.318]

68 Strict matching gives credit only to identical words between the gold-standard caption and the automatically produced caption. [sent-164, score-0.334]

69 The best-performing approach with semantic matching is VISUAL (with LM = Image corpus), improving BLEU, precision, and F-score substantially over those of ORIG, demonstrating the extrinsic utility of our newly generated image-text parallel corpus in comparison to the original database. [sent-167, score-0.107]
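Semantic matching, unlike strict matching, also gives credit to semantically close words; footnote 5 below notes that Wu-Palmer similarity is the measure used. A minimal sketch using NLTK's WordNet interface follows; the matching threshold and the per-word credit scheme are illustrative assumptions, not the paper's exact scoring.

```python
# Requires the WordNet corpus: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def wup_word_similarity(word_a, word_b):
    """Maximum Wu-Palmer similarity over the two words' synsets
    (0.0 if either word has no synset)."""
    best = 0.0
    for sa in wn.synsets(word_a):
        for sb in wn.synsets(word_b):
            sim = sa.wup_similarity(sb)
            if sim is not None and sim > best:
                best = sim
    return best

def semantic_match_credit(candidate_word, reference_words, threshold=0.9):
    """Give credit if the candidate word matches some reference word exactly
    or with Wu-Palmer similarity above the threshold."""
    if candidate_word in reference_words:
        return 1.0
    best = max(
        (wup_word_similarity(candidate_word, ref) for ref in reference_words),
        default=0.0,
    )
    return 1.0 if best >= threshold else 0.0
```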

70 4 Related Work. Several recent studies presented approaches to automatic caption generation for images (e. [sent-169, score-0.596]

71 The end goal of our work differs in that we aim to revise original image captions into (footnote 5: we take Wu-Palmer Similarity as the similarity measure (Wu and Palmer, 1994)) [sent-177, score-0.828]

72 descriptions that are more general and align more closely to the visual image content. [sent-180, score-0.73]

73 , Turner and Charniak (2005), Filippova and Strube (2008)) in that there is no in-domain training corpus to learn generalization patterns directly. [sent-183, score-0.061]

74 Future work includes exploring more direct supervision from human edited sample generalization (e. [sent-184, score-0.061]

75 5 Conclusion. We have introduced the task of image caption generalization as a means to reduce noise in the parallel corpus of images and text. [sent-194, score-1.158]

76 Intrinsic and extrinsic evaluations confirm that the captions in the resulting corpus align better with the image contents (are often preferred over the original captions by people), and can be practically more useful with respect to a concrete application. [sent-195, score-1.207]

77 Generating typed dependency parses from phrase structure parses. [sent-230, score-0.077]

78 Hedging your bets: Optimizing accuracy-specificity trade-offs in large scale visual recognition. [sent-240, score-0.178]

79 Paraphrastic sentence compression with a character-based metric: Tightening without deletion. [sent-321, score-0.129]

80 Modeling the shape of the scene: a holistic representation of the spatial envelope. [sent-326, score-0.037]

81 80 million tiny images: a large dataset for non-parametric object and scene recognition. [sent-336, score-0.104]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('image', 0.51), ('caption', 0.334), ('captions', 0.318), ('images', 0.224), ('visually', 0.187), ('visual', 0.178), ('berg', 0.166), ('ordonez', 0.161), ('xl', 0.129), ('compression', 0.129), ('tamara', 0.122), ('vision', 0.117), ('yi', 0.088), ('fluency', 0.087), ('constraints', 0.08), ('compressed', 0.08), ('kuznetsova', 0.074), ('yejin', 0.07), ('orig', 0.068), ('content', 0.063), ('generalization', 0.061), ('brook', 0.06), ('extraneous', 0.06), ('girish', 0.06), ('torralba', 0.06), ('vicente', 0.06), ('feng', 0.06), ('kulkarni', 0.059), ('house', 0.057), ('descriptor', 0.056), ('pulled', 0.056), ('stony', 0.056), ('boat', 0.056), ('farhadi', 0.056), ('lapata', 0.054), ('deng', 0.05), ('clarke', 0.049), ('overly', 0.049), ('dependencies', 0.049), ('cabelas', 0.045), ('dodge', 0.045), ('imagetext', 0.045), ('lamp', 0.045), ('ofglas', 0.045), ('photo', 0.045), ('yard', 0.045), ('tiny', 0.044), ('typed', 0.044), ('extrinsic', 0.042), ('descriptions', 0.042), ('mirella', 0.042), ('saliency', 0.04), ('cordeiro', 0.04), ('onybrook', 0.04), ('lazebnik', 0.04), ('oliva', 0.04), ('siming', 0.04), ('marneffe', 0.04), ('query', 0.039), ('beam', 0.039), ('generation', 0.038), ('intrinsic', 0.038), ('polina', 0.037), ('filippova', 0.037), ('yansong', 0.037), ('spatial', 0.037), ('utility', 0.036), ('turner', 0.035), ('pub', 0.035), ('imagenet', 0.035), ('driven', 0.035), ('alexander', 0.034), ('xi', 0.034), ('computer', 0.034), ('preferred', 0.033), ('dep', 0.033), ('tighter', 0.033), ('pictures', 0.033), ('door', 0.033), ('napoles', 0.033), ('dependency', 0.033), ('generalized', 0.032), ('choi', 0.032), ('million', 0.031), ('jia', 0.031), ('gist', 0.031), ('programming', 0.03), ('transferring', 0.03), ('accompanying', 0.03), ('daume', 0.03), ('association', 0.029), ('scene', 0.029), ('kai', 0.029), ('pattern', 0.029), ('parallel', 0.029), ('pyramid', 0.028), ('antonio', 0.028), ('det', 0.028), ('amt', 0.028), ('concrete', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999869 167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus

Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi

Abstract: The ever growing amount of web images and their associated texts offers new opportunities for integrative models bridging natural language processing and computer vision. However, the potential benefits of such data are yet to be fully realized due to the complexity and noise in the alignment between image content and text. We address this challenge with contributions in two folds: first, we introduce the new task of image caption generalization, formulated as visually-guided sentence compression, and present an efficient algorithm based on dynamic beam search with dependency-based constraints. Second, we release a new large-scale corpus with 1 million image-caption pairs achieving tighter content alignment between images and text. Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new imagetext parallel corpus with respect to a concrete application of image caption transfer.

2 0.36814904 384 acl-2013-Visual Features for Linguists: Basic image analysis techniques for multimodally-curious NLPers

Author: Elia Bruni ; Marco Baroni

Abstract: unkown-abstract

3 0.35118589 380 acl-2013-VSEM: An open library for visual semantics representation

Author: Elia Bruni ; Ulisse Bordignon ; Adam Liska ; Jasper Uijlings ; Irina Sergienya

Abstract: VSEM is an open library for visual semantics. Starting from a collection of tagged images, it is possible to automatically construct an image-based representation of concepts by using off-theshelf VSEM functionalities. VSEM is entirely written in MATLAB and its objectoriented design allows a large flexibility and reusability. The software is accompanied by a website with supporting documentation and examples.

4 0.27081841 249 acl-2013-Models of Semantic Representation with Visual Attributes

Author: Carina Silberer ; Vittorio Ferrari ; Mirella Lapata

Abstract: We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data compared to amodal models and word representations based on handcrafted norming data.

5 0.17745596 293 acl-2013-Random Walk Factoid Annotation for Collective Discourse

Author: Ben King ; Rahul Jha ; Dragomir Radev ; Robert Mankoff

Abstract: In this paper, we study the problem of automatically annotating the factoids present in collective discourse. Factoids are information units that are shared between instances of collective discourse and may have many different ways of being realized in words. Our approach divides this problem into two steps, using a graph-based approach for each step: (1) factoid discovery, finding groups of words that correspond to the same factoid, and (2) factoid assignment, using these groups of words to mark collective discourse units that contain the respective factoids. We study this on two novel data sets: the New Yorker caption contest data set, and the crossword clues data set.

6 0.13170682 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization

7 0.095716938 175 acl-2013-Grounded Language Learning from Video Described with Sentences

8 0.086436749 370 acl-2013-Unsupervised Transcription of Historical Documents

9 0.073752597 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning

10 0.072709486 29 acl-2013-A Visual Analytics System for Cluster Exploration

11 0.070598632 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis

12 0.058466833 50 acl-2013-An improved MDL-based compression algorithm for unsupervised word segmentation

13 0.055970464 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization

14 0.053442344 93 acl-2013-Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora

15 0.05286311 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models

16 0.052835915 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

17 0.052280847 91 acl-2013-Connotation Lexicon: A Dash of Sentiment Beneath the Surface Meaning

18 0.049916416 257 acl-2013-Natural Language Models for Predicting Programming Comments

19 0.048765063 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching

20 0.048597798 126 acl-2013-Diverse Keyword Extraction from Conversations


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.147), (1, 0.023), (2, -0.013), (3, -0.062), (4, -0.054), (5, -0.108), (6, 0.12), (7, -0.048), (8, -0.128), (9, 0.119), (10, -0.316), (11, -0.266), (12, -0.024), (13, 0.232), (14, 0.163), (15, 0.038), (16, 0.126), (17, -0.013), (18, -0.079), (19, 0.055), (20, 0.085), (21, 0.105), (22, -0.016), (23, 0.025), (24, -0.029), (25, -0.044), (26, -0.058), (27, -0.006), (28, 0.038), (29, -0.022), (30, 0.015), (31, -0.046), (32, -0.035), (33, 0.047), (34, -0.002), (35, -0.058), (36, 0.023), (37, 0.011), (38, 0.037), (39, -0.058), (40, 0.029), (41, -0.009), (42, 0.063), (43, -0.017), (44, 0.064), (45, -0.046), (46, -0.037), (47, 0.003), (48, -0.01), (49, 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93541199 167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus

Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi

Abstract: The ever growing amount of web images and their associated texts offers new opportunities for integrative models bridging natural language processing and computer vision. However, the potential benefits of such data are yet to be fully realized due to the complexity and noise in the alignment between image content and text. We address this challenge with contributions in two folds: first, we introduce the new task of image caption generalization, formulated as visually-guided sentence compression, and present an efficient algorithm based on dynamic beam search with dependency-based constraints. Second, we release a new large-scale corpus with 1 million image-caption pairs achieving tighter content alignment between images and text. Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new imagetext parallel corpus with respect to a concrete application of image caption transfer.

2 0.9277674 384 acl-2013-Visual Features for Linguists: Basic image analysis techniques for multimodally-curious NLPers

Author: Elia Bruni ; Marco Baroni

Abstract: unkown-abstract

3 0.90297884 380 acl-2013-VSEM: An open library for visual semantics representation

Author: Elia Bruni ; Ulisse Bordignon ; Adam Liska ; Jasper Uijlings ; Irina Sergienya

Abstract: VSEM is an open library for visual semantics. Starting from a collection of tagged images, it is possible to automatically construct an image-based representation of concepts by using off-theshelf VSEM functionalities. VSEM is entirely written in MATLAB and its objectoriented design allows a large flexibility and reusability. The software is accompanied by a website with supporting documentation and examples.

4 0.8241834 249 acl-2013-Models of Semantic Representation with Visual Attributes

Author: Carina Silberer ; Vittorio Ferrari ; Mirella Lapata

Abstract: We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data compared to amodal models and word representations based on handcrafted norming data.

5 0.42802304 175 acl-2013-Grounded Language Learning from Video Described with Sentences

Author: Haonan Yu ; Jeffrey Mark Siskind

Abstract: We present a method that learns representations for word meanings from short video clips paired with sentences. Unlike prior work on learning language from symbolic input, our input consists of video of people interacting with multiple complex objects in outdoor environments. Unlike prior computer-vision approaches that learn from videos with verb labels or images with noun labels, our labels are sentences containing nouns, verbs, prepositions, adjectives, and adverbs. The correspondence between words and concepts in the video is learned in an unsupervised fashion, even when the video depicts simultaneous events described by multiple sentences or when different aspects of a single event are described with multiple sentences. The learned word meanings can be subsequently used to automatically generate description of new video.

6 0.42075938 29 acl-2013-A Visual Analytics System for Cluster Exploration

7 0.40598404 370 acl-2013-Unsupervised Transcription of Historical Documents

8 0.39780793 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis

9 0.3622109 293 acl-2013-Random Walk Factoid Annotation for Collective Discourse

10 0.31098786 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization

11 0.29784852 321 acl-2013-Sign Language Lexical Recognition With Propositional Dynamic Logic

12 0.28546202 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning

13 0.28191644 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization

14 0.27457979 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users

15 0.27103305 311 acl-2013-Semantic Neighborhoods as Hypergraphs

16 0.27037713 126 acl-2013-Diverse Keyword Extraction from Conversations

17 0.26990831 313 acl-2013-Semantic Parsing with Combinatory Categorial Grammars

18 0.25468048 23 acl-2013-A System for Summarizing Scientific Topics Starting from Keywords

19 0.25081593 54 acl-2013-Are School-of-thought Words Characterizable?

20 0.24136524 381 acl-2013-Variable Bit Quantisation for LSH


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.039), (4, 0.012), (6, 0.029), (11, 0.055), (15, 0.01), (17, 0.174), (24, 0.032), (26, 0.048), (35, 0.094), (42, 0.044), (47, 0.015), (48, 0.04), (64, 0.013), (70, 0.122), (71, 0.011), (77, 0.018), (80, 0.017), (88, 0.023), (90, 0.03), (95, 0.056)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.86059791 167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus

Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi

Abstract: The ever growing amount of web images and their associated texts offers new opportunities for integrative models bridging natural language processing and computer vision. However, the potential benefits of such data are yet to be fully realized due to the complexity and noise in the alignment between image content and text. We address this challenge with contributions in two folds: first, we introduce the new task of image caption generalization, formulated as visually-guided sentence compression, and present an efficient algorithm based on dynamic beam search with dependency-based constraints. Second, we release a new large-scale corpus with 1 million image-caption pairs achieving tighter content alignment between images and text. Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new imagetext parallel corpus with respect to a concrete application of image caption transfer.

2 0.80905688 192 acl-2013-Improved Lexical Acquisition through DPP-based Verb Clustering

Author: Roi Reichart ; Anna Korhonen

Abstract: Subcategorization frames (SCFs), selectional preferences (SPs) and verb classes capture related aspects of the predicate-argument structure. We present the first unified framework for unsupervised learning of these three types of information. We show how to utilize Determinantal Point Processes (DPPs), elegant probabilistic models that are defined over the possible subsets of a given dataset and give higher probability mass to high quality and diverse subsets, for clustering. Our novel clustering algorithm constructs a joint SCF-DPP DPP kernel matrix and utilizes the efficient sampling algorithms of DPPs to cluster together verbs with similar SCFs and SPs. We evaluate the induced clusters in the context of the three tasks and show results that are superior to strong baselines for each.

3 0.72296774 220 acl-2013-Learning Latent Personas of Film Characters

Author: David Bamman ; Brendan O'Connor ; Noah A. Smith

Abstract: We present two latent variable models for learning character types, or personas, in film, in which a persona is defined as a set of mixtures over latent lexical classes. These lexical classes capture the stereotypical actions of which a character is the agent and patient, as well as attributes by which they are described. As the first attempt to solve this problem explicitly, we also present a new dataset for the text-driven analysis of film, along with a benchmark testbed to help drive future work in this area.

4 0.71028334 356 acl-2013-Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia

Author: Zhigang Wang ; Zhixing Li ; Juanzi Li ; Jie Tang ; Jeff Z. Pan

Abstract: Wikipedia infoboxes are a valuable source of structured knowledge for global knowledge sharing. However, infobox information is very incomplete and imbalanced among the Wikipedias in different languages. It is a promising but challenging problem to utilize the rich structured knowledge from a source language Wikipedia to help complete the missing infoboxes for a target language. In this paper, we formulate the problem of cross-lingual knowledge extraction from multilingual Wikipedia sources, and present a novel framework, called WikiCiKE, to solve this problem. An instancebased transfer learning method is utilized to overcome the problems of topic drift and translation errors. Our experimental results demonstrate that WikiCiKE outperforms the monolingual knowledge extraction method and the translation-based method.

5 0.7097562 218 acl-2013-Latent Semantic Tensor Indexing for Community-based Question Answering

Author: Xipeng Qiu ; Le Tian ; Xuanjing Huang

Abstract: Retrieving similar questions is very important in community-based question answering (CQA). In this paper, we propose a unified question retrieval model based on latent semantic indexing with tensor analysis, which can capture word associations among different parts of CQA triples simultaneously. Thus, our method can reduce lexical chasm of question retrieval with the help of the information of question content and answer parts. The experimental result shows that our method outperforms the traditional methods.

6 0.70711052 296 acl-2013-Recognizing Identical Events with Graph Kernels

7 0.70612824 329 acl-2013-Statistical Machine Translation Improves Question Retrieval in Community Question Answering via Matrix Factorization

8 0.70319915 249 acl-2013-Models of Semantic Representation with Visual Attributes

9 0.7016331 348 acl-2013-The effect of non-tightness on Bayesian estimation of PCFGs

10 0.69705331 153 acl-2013-Extracting Events with Informal Temporal References in Personal Histories in Online Communities

11 0.69617957 380 acl-2013-VSEM: An open library for visual semantics representation

12 0.69199944 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

13 0.69016343 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

14 0.68328202 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models

15 0.68147814 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering

16 0.6802063 224 acl-2013-Learning to Extract International Relations from Political Context

17 0.67935652 89 acl-2013-Computerized Analysis of a Verbal Fluency Test

18 0.67790544 80 acl-2013-Chinese Parsing Exploiting Characters

19 0.67553508 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars

20 0.67530525 168 acl-2013-Generating Recommendation Dialogs by Extracting Information from User Reviews