acl acl2012 acl2012-51 knowledge-graph by maker-knowledge-mining

51 acl-2012-Collective Generation of Natural Image Descriptions

Source: pdf

Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi

Abstract: We present a holistic data-driven approach to image description generation, exploiting the vast amount of (noisy) parallel image data and associated natural language descriptions available on the web. More specifically, given a query image, we retrieve existing human-composed phrases used to describe visually similar images, then selectively combine those phrases to generate a novel description for the query image. We cast the generation process as constraint optimization problems, collectively incorporating multiple interconnected aspects of language composition for content planning, surface realization and discourse structure. Evaluation by human annotators indicates that our final system generates more semantically correct and linguistically appealing descriptions than two nontrivial baselines.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We present a holistic data-driven approach to image description generation, exploiting the vast amount of (noisy) parallel image data and associated natural language descriptions available on the web. [sent-5, score-1.043]

2 More specifically, given a query image, we retrieve existing human-composed phrases used to describe visually similar images, then selectively combine those phrases to generate a novel description for the query image. [sent-6, score-0.666]

3 We cast the generation process as constraint optimization problems, collectively incorporating multiple interconnected aspects of language composition for content planning, surface realization and discourse structure. [sent-7, score-0.297]

4 Evaluation by human annotators indicates that our final system generates more semantically correct and linguistically appealing descriptions than two nontrivial baselines. [sent-8, score-0.249]

5 1 Introduction Automatically describing images in natural language is an intriguing, but complex AI task, requiring accurate computational visual recognition, comprehensive world knowledge, and natural language generation. [sent-9, score-0.397]

6 Some past research has simplified the general image description goal by assuming that relevant text for an image is provided (e. [sent-10, score-0.878]

7 This allows descriptions to be generated using effective summarization techniques with relatively surface level image understanding. [sent-13, score-0.636]

8 , news articles or encyclopedic text) is often only loosely related to an image’s specific content and many natural images do not come with associated text for summarization. [sent-16, score-0.407]

9 In contrast, other recent work has focused more on the visual recognition aspect by detecting content elements (e. [sent-17, score-0.228]

10 , scenes, objects, attributes, actions, etc) and then composing descriptions from scratch (e. [sent-19, score-0.165]

11 (2011)) , or by retrieving existing whole descriptions from visually similar images (e. [sent-25, score-0.547]

12 For the latter approaches, it is unrealistic to expect that there will always exist a single complete description for retrieval that is pertinent to a given query image. [sent-30, score-0.168]

13 For the former approaches, visual recognition first generates an intermediate representation of image content using a set of English words, then language generation constructs a full description by adding function words and optionally applying simple re-ordering. [sent-31, score-0.76]

14 Because the generation process sticks relatively closely to the recognized content, the resulting descriptions often lack the kind of coverage, creativity, and complexity typically found in human-written text. [sent-32, score-0.218]

15 We also lift the restriction of retrieving existing whole descriptions by gathering visually relevant phrases which we combine to produce novel and query-image specific descriptions. [sent-36, score-0.387]

16 By judiciously exploiting the correspondence between image content elements and phrases, it is possible to generate natural language descriptions that are substantially richer in content and more linguistically interesting than previous work. [sent-37, score-0.847]

17 , Roy (2002) , Dindo and Zambuto (2010) , Monner and Reggia (2011)) , as in our approach the meaning of a phrase in a description is implicitly grounded by the relevant content of the image. [sent-46, score-0.254]

18 Another important thrust of this work is collective image-level content-planning, integrating saliency, content relations, and discourse structure based on statistics drawn from a large image-text parallel corpus. [sent-47, score-0.226]

19 For example, for an image showing a flock of birds, generating a large number of sentences stating the relative position of each bird is probably not useful. [sent-52, score-0.399]

20 Content planning and phrase synthesis can be naturally viewed as constraint optimization problems. [sent-53, score-0.159]

21 Our ILP formulation encodes a rich set of linguistically motivated constraints and weights that incorporate multiple aspects of the generation process. [sent-57, score-0.2]

22 Empirical results demonstrate that our final system generates linguistically more appealing and semantically more correct descriptions than two nontrivial baselines. [sent-58, score-0.284]

23 For a query image, we first retrieve candidate descriptive phrases from a large image-caption database using measures of visual similarity (§2) . [sent-61, score-0.462]

24 We then compose a description from these candidates using ILP formulations for content planning (§4) and surface realization (§5). [sent-63, score-0.297]

25 2 Vision & Phrase Retrieval For a query image, we retrieve relevant candidate natural language phrases by visually comparing the query image to database images from the SBU Captioned Photo Collection (Ordonez et al. [sent-64, score-1.222]

26 Visual similarity for several kinds of image content is used to compare the query image to images from the database, including: 1) object detections for 89 common object categories (Felzenszwalb et al. [sent-66, score-1.684]

27 , 2010) , 2) scene classifications for 26 common scene categories (Xiao et al. [sent-67, score-0.272]

28 All content types are pre-computed on the million database photos, and caption parsing is performed using the Berkeley PCFG parser (Petrov et al. [sent-72, score-0.288]

29 Given a query image, we identify content elements present using the above classifiers and detectors and then retrieve phrases referring to those content elements from the database. [sent-74, score-0.582]

30 For example, if we detect a horse in a query image, then we retrieve phrases referring to visually similar horses in the database by comparing the color, texture (Leung and Malik, 1999) , or shape (Dalal and Triggs, 2005; Lowe, 2004) of the detected horse to detected horses in the database images. [sent-75, score-0.627]

31 We collect four types of phrases for each query image as follows: [1] NPs We retrieve noun phrases for each query object detection (e. [sent-76, score-1.085]

32 , “the brown cow” ) from database captions using visual similarity between object detections computed as an equally weighted linear combination of L2 distances on histograms of color, texton (Leung and Malik, 1999) , HoG (Dalal and Triggs, 2005) and SIFT (Lowe, 2004) features. [sent-78, score-0.698]
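
The NP retrieval measure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature names, the toy two-bin histograms, and the helper names `l2_distance`, `detection_distance`, and `retrieve_nearest` are all assumptions made for the example. Only the idea of an equally weighted combination of L2 distances over color, texton, HoG, and SIFT histograms comes from the text.

```python
import math

FEATURES = ("color", "texton", "hog", "sift")  # feature families named in the text


def l2_distance(h1, h2):
    """Euclidean (L2) distance between two equal-length histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def detection_distance(query, candidate, features=FEATURES):
    """Equally weighted combination of per-feature L2 distances (assumed: plain mean)."""
    return sum(l2_distance(query[f], candidate[f]) for f in features) / len(features)


def retrieve_nearest(query, database, top_k=1):
    """Return the phrases attached to the top_k visually closest database detections."""
    ranked = sorted(database, key=lambda item: detection_distance(query, item["features"]))
    return [item["phrase"] for item in ranked[:top_k]]


# Hypothetical toy data: two database detections with two-bin histograms.
query = {"color": [1.0, 0.0], "texton": [0.5, 0.5], "hog": [0.2, 0.8], "sift": [0.0, 1.0]}
database = [
    {"phrase": "the brown cow",
     "features": {"color": [0.9, 0.1], "texton": [0.5, 0.5], "hog": [0.2, 0.8], "sift": [0.1, 0.9]}},
    {"phrase": "a white cow",
     "features": {"color": [0.0, 1.0], "texton": [0.4, 0.6], "hog": [0.3, 0.7], "sift": [0.0, 1.0]}},
]
print(retrieve_nearest(query, database))
```

A smaller distance means a closer visual match, so the ranking is ascending. A real system would compare high-dimensional histograms and restrict the search to detections of the same object category.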

33 [2] VPs We retrieve verb phrases for each query object detection (e. [sent-79, score-0.436]

34 “boy running” ) from database captions using the same measure of visual similarity as for NPs, but restricting the search to only those database instances whose captions contain a verb phrase referring to the object category. [sent-81, score-1.003]

35 [3] Region/Stuff PPs We collect prepositional phrases for each query stuff detection (e. [sent-82, score-0.285]

36 “in the sky” , “on the road” ) by measuring visual similarity of appearance (color, texton, HoG) and geometric configuration (object-stuff relative location and distance) between query and database detections. [sent-84, score-0.274]

37 [4] Scene PPs We also collect prepositional phrases referring to general image scene context (e. [sent-85, score-0.73]

38 “at the market” , “on hot summer days” , “in Sweden” ) based on global scene similarity computed using L2 distance between scene classification score vectors (Xiao et al. [sent-87, score-0.272]

39 3 Overview of ILP Formulation For each image, we aim to generate multiple sentences, each sentence corresponding to a single distinct object detected in the given image. [sent-89, score-0.16]

40 Each sentence consists of the NP for the main object, and a subset of the corresponding VP, region/stuff PP, and scene PP retrieved in §2. [sent-90, score-0.136]

41 Selecting the set of objects to describe (one object per sentence) . [sent-92, score-0.272]

42 The goals are to (1) select a subset of the objects based on saliency and semantic compatibility, and (2) order the selected objects based on their content relations. [sent-112, score-0.385]

43 1 Variables and Objective Function The following set of indicator variables encodes the selection of objects and ordering: $y_{sk} = 1$ if object $s$ is selected for position $k$, and $y_{sk} = 0$ otherwise (1), where k = 1, . [sent-114, score-0.205]

44 2 Constraints Consistency Constraints: We enforce consistency between indicator variables for individual objects (Eq. [sent-128, score-0.21]

45 2) so that $y_{skt(k+1)} = 1$ iff $y_{sk} = 1$ and $y_{t(k+1)} = 1$: $\forall s, t, k$: $y_{skt(k+1)} \le y_{sk}$ (4), $y_{skt(k+1)} \le y_{t(k+1)}$ (5), $y_{skt(k+1)} + (1 - y_{sk}) + (1 - y_{t(k+1)}) \ge 1$ (6). For computational and implementation efficiency, however, we opt for the two-step approach. [sent-130, score-0.142]
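
Constraints (4) to (6) are the standard linear encoding of a logical AND between two binary variables. The short check below enumerates all binary assignments to confirm that the only feasible value of the pairwise variable is the product of the two unary ones. The function names are illustrative and not taken from the paper.

```python
from itertools import product


def satisfies_consistency(y_pair, y_first, y_second):
    """Check the three linear constraints for one (s, t, k) triple."""
    return (
        y_pair <= y_first                                   # Eq. 4
        and y_pair <= y_second                              # Eq. 5
        and y_pair + (1 - y_first) + (1 - y_second) >= 1    # Eq. 6
    )


def feasible_pair_values(y_first, y_second):
    """All binary values of the pairwise variable allowed by Eqs. 4-6."""
    return [v for v in (0, 1) if satisfies_consistency(v, y_first, y_second)]


for y_first, y_second in product((0, 1), repeat=2):
    print(y_first, y_second, feasible_pair_values(y_first, y_second))
```

Eqs. 4 and 5 force the pairwise variable to 0 whenever either unary variable is 0, and Eq. 6 forces it to 1 when both are 1, so no separate nonlinear product term is needed in the ILP.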

46 ,S − 1, $\sum_{s} y_{s(k+1)} \le \sum_{s} y_{sk}$ (8) Discourse constraints: To avoid spurious descriptions, we allow at most two objects of the same type, where $c_s$ is the type of object s: $\forall c \in \text{objTypes}: \sum_{\{s : c_s = c\}} \sum_{k=1}^{S} y_{sk} \le 2$ (9) [sent-134, score-0.272]

47 4.3 Weight Fs: Object Detection Confidence In order to quantify the confidence of the object detector for the object s, we define $0 \le F_s \le 1$ as the mean of the detector scores for that object type in the image. [sent-135, score-0.48]
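
As a small illustration of the detection-confidence weight and the discourse constraint (at most two selected objects per type), the sketch below uses hypothetical helper names and toy detector scores. It assumes detector scores are already normalized to the range 0 to 1, which the text implies but does not spell out.

```python
from collections import Counter, defaultdict


def detection_confidence(detections):
    """Map each object type to the mean of its detector scores in one image.

    detections: list of (object_type, score) pairs, scores assumed in [0, 1].
    """
    scores = defaultdict(list)
    for object_type, score in detections:
        scores[object_type].append(score)
    return {t: sum(v) / len(v) for t, v in scores.items()}


def satisfies_discourse_constraint(selected_types, max_per_type=2):
    """True if no object type is selected more than max_per_type times (cf. Eq. 9)."""
    return all(count <= max_per_type for count in Counter(selected_types).values())


# Hypothetical detections for one image.
detections = [("bird", 0.5), ("bird", 1.0), ("tree", 0.25)]
print(detection_confidence(detections))
print(satisfies_discourse_constraint(["bird", "bird", "tree"]))
print(satisfies_discourse_constraint(["bird", "bird", "bird"]))
```

In the ILP itself the second check is a linear inequality over the selection variables rather than a post-hoc filter. The function here only shows which selections that inequality admits.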

48 4 Weight Fst: Ordering and Compatibility The weight $0 \le F_{st} \le 1$ quantifies the compatibility of the object pairing (s, t). [sent-137, score-0.264]

49 This way, we create a competing tension between the single object selection scores and the pairwise compatibility scores, so that a variable number of objects can be selected. [sent-139, score-0.378]

50 We measure these biases by collecting statistics on ordering of object names from the 1 million image descriptions in the SBU Captioned Dataset (Ordonez et al. [sent-141, score-0.774]

51 For instance, $f_{ord}$(window, house) = 2895 and $f_{ord}$(house, window) = 1250, suggesting that people are more likely to mention a window before mentioning a house/building. [sent-144, score-0.154]
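
Ordering statistics of this kind can be gathered with a simple pass over captions. The sketch below is an assumption-laden illustration: it tokenizes on whitespace, matches object names as exact lowercase tokens, and counts each ordered pair once per caption based on first mentions. The paper does not describe its counting procedure at this level of detail, and `ordering_counts` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def ordering_counts(captions, object_names):
    """Count, for each ordered pair (a, b), captions where a is first mentioned before b."""
    vocabulary = set(object_names)
    counts = Counter()
    for caption in captions:
        first_position = {}
        for position, token in enumerate(caption.lower().split()):
            if token in vocabulary and token not in first_position:
                first_position[token] = position
        # Sort mentioned objects by first occurrence; combinations keeps that order.
        mentioned = sorted(first_position, key=first_position.get)
        for earlier, later in combinations(mentioned, 2):
            counts[(earlier, later)] += 1
    return counts


# Hypothetical toy captions.
captions = [
    "window of the house",
    "the window in my house",
    "house with a red window",
]
f_ord = ordering_counts(captions, ["window", "house"])
print(f_ord[("window", "house")], f_ord[("house", "window")])
```

Comparing the two directed counts for a pair gives the ordering preference the text describes, for example window before house.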

52 We use these ordering statistics to enhance content flow. [sent-145, score-0.169]

53 5 Surface Realization Recall that for each image, the computer vision system identifies phrases from descriptions of images that are similar in a variety of aspects. [sent-148, score-0.682]

54 select a subset and glue them together to compose a complete sentence that is linguistically plausible and semantically truthful to the content of the image. [sent-151, score-0.164]

55 1 Variables and Objective Function The following set of variables encodes the selection of phrases and their ordering in constructing S′ sentences. [sent-153, score-0.271]

56 $x_{sijk} = 1$ if phrase $i$ of type $j$ is selected for position $k$ in sentence $s$, and $x_{sijk} = 0$ otherwise, where k = 1, . . . , N encodes the ordering of the selected phrases, and j indexes one of the four phrase types (object-NPs, action-VPs, region-PPs, scene-PPs), i = 1, . [sent-155, score-0.162]

57 , M indexes one of the M candidate phrases of each phrase type, and s = 1, . [sent-158, score-0.217]

58 Finally, we define the objective function F as: $F = \sum_{sij} F_{sij} \cdot \sum_{k=1}^{N} x_{sijk} - \sum_{sijpq} F_{sijpq} \cdot \sum_{k=1}^{N-1} x_{sijkpq(k+1)}$ (12), where $F_{sij}$ weights individual phrase goodness and $F_{sijpq}$ weights adjacent phrase goodness. [sent-163, score-0.153]
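
To make the structure of objective (12) concrete, the sketch below evaluates it for one sentence and one fixed ordered selection of phrases: the sum of the individual phrase weights minus the sum of the pairwise weights over adjacent selected phrases. The phrase identifiers and all weight values are invented for illustration, and the function evaluates a given assignment only. It does not solve the ILP.

```python
def objective(sequence, unary, pairwise):
    """Evaluate Eq. 12 for a single sentence.

    sequence: ordered list of selected phrase ids (position k = index in list).
    unary:    phrase id -> individual phrase weight (F_sij).
    pairwise: (phrase id, phrase id) -> adjacent-phrase weight (F_sijpq);
              missing pairs are treated as 0 (an assumption of this sketch).
    """
    unary_term = sum(unary[p] for p in sequence)
    pairwise_term = sum(pairwise.get((a, b), 0.0) for a, b in zip(sequence, sequence[1:]))
    return unary_term - pairwise_term


# Hypothetical weights for three candidate phrases.
unary = {"the cow": 0.75, "eating grass": 0.5, "in the field": 0.25}
pairwise = {("the cow", "eating grass"): 0.25, ("eating grass", "in the field"): 0.125}

print(objective(["the cow", "eating grass", "in the field"], unary, pairwise))
print(objective(["the cow", "in the field"], unary, pairwise))
```

Because the pairwise term enters with a minus sign, the same selection of phrases can score differently depending on which phrases end up adjacent.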

59 We optionally prepend the first sentence in a generated description with a cognitive phrase. [sent-167, score-0.155]

60 3 We collect the 200 most frequent phrases of length 17 that start a caption from the SBU Captioned Photo Collection. [sent-168, score-0.254]

61 In HMM generated captions, underlined phrases show redundancy across different objects (due to lack of discourse constraints) , and phrases in boldface show awkward topic flow (due to lack of content planning) . [sent-170, score-0.575]

62 Via collective image-level content planning (see §4), some of these erroneous detections can be corrected, as shown in the ILP result. [sent-172, score-0.279]

63 These are generic constructs that are often used to start a description about an image, for instance, “This is an image of. [sent-174, score-0.479]

64 We treat these phrases as an additional type, but omit corresponding variables and constraints for brevity. [sent-178, score-0.243]

65 11) and the pairwise variables so that $x_{sijkpqm} = 1$ iff $x_{sijk} = 1$ and $x_{spqm} = 1$: $\forall i, j, k, p, q, m$: $x_{sijkpqm} \le x_{sijk}$ (13), $x_{sijkpqm} \le x_{spqm}$ (14), $x_{sijkpqm} + (1 - x_{sijk}) + (1 - x_{spqm}) \ge 1$ (15). Next we include constraints similar to Eq. [sent-181, score-0.62]

66 Finally, we add constraints to ensure at least two phrases are selected for each sentence, to promote informative descriptions. [sent-183, score-0.19]

67 4 Pairwise Phrase Cohesion In this section, we describe the pairwise phrase cohesion score Fsijpq defined for each xsijpq in [sent-189, score-0.26]

68 Figure 2: In some cases (16%), ILP generated captions were preferred over human written ones! [sent-196, score-0.283]

69 Via Fsijpq, we aim to quantify the degree of syntactic and semantic cohesion across two phrases xsij and xspq. [sent-199, score-0.36]

70 Note that we subtract this cohesion score from the objective function. [sent-200, score-0.204]

71 Let fΣ (hsij , hspq) be the sum frequency of all n-grams that start with hsij , end with hspq and contain a preposition prep(spq) of the phrase spq. [sent-206, score-0.161]
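
The sum frequency defined above can be sketched as a filter over an n-gram count table. The representation (n-grams as token tuples), the toy counts, and the reading that the preposition must occur strictly between the two head words are assumptions of this example. The paper draws its counts from the Google Web 1-T data, which is not reproduced here.

```python
def sum_frequency(ngram_counts, head_first, head_second, preposition):
    """Sum the counts of n-grams that start with head_first, end with head_second,
    and contain the preposition somewhere in between (assumed interpretation)."""
    total = 0
    for ngram, count in ngram_counts.items():
        if (
            len(ngram) >= 3
            and ngram[0] == head_first
            and ngram[-1] == head_second
            and preposition in ngram[1:-1]
        ):
            total += count
    return total


# Hypothetical n-gram counts keyed by token tuples.
ngram_counts = {
    ("cow", "in", "field"): 40,
    ("cow", "in", "the", "field"): 25,
    ("cow", "on", "field"): 3,
    ("dog", "in", "field"): 10,
}
print(sum_frequency(ngram_counts, "cow", "field", "in"))
```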

72 Then the 5 4We include the n-gram cohesion for the sentence boundaries as well, by approximating statistics for sentence boundaries with punctuation marks in the Google Web 1-T data. [sent-207, score-0.161]

73 6 Evaluation TestSet: Because computer vision is a challenging and unsolved problem, we restrict our query set to images where we have high confidence that visual recognition algorithms perform well. [sent-211, score-0.586]

74 We collect 1000 test images by running a large number (89) of object detectors on 20,000 images and selecting images that receive confident object detection scores, with some preference for images with multiple object detections to obtain good examples for testing discourse constraints. [sent-212, score-1.823]

75 (2011)), which takes as input the same set of candidate phrases described in §2, but for decoding, we fix the ordering of phrases as [NP – VP – Region PP – Scene PP] and find the best combination of phrases using the Viterbi algorithm. [sent-214, score-0.434]
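
With the slot order fixed, picking one phrase per slot is a chain-structured search that the Viterbi algorithm solves exactly. The sketch below is a generic Viterbi over slots, not the baseline's actual model: the scoring scheme (additive unary scores plus additive scores for adjacent phrases, higher is better), the candidate phrases, and every number are assumptions made for illustration.

```python
def viterbi_fixed_order(candidates, unary, pairwise):
    """Pick one phrase per slot, in the given slot order, maximizing the total score.

    candidates: list of non-empty phrase lists, one per slot (e.g. NP, VP, region PP).
    unary:      phrase -> score of choosing that phrase.
    pairwise:   (previous phrase, phrase) -> score for adjacency; missing pairs count as 0.
    Returns (best_score, best_phrase_sequence).
    """
    # best[phrase] = (score of the best partial path ending in phrase, that path)
    best = {p: (unary[p], [p]) for p in candidates[0]}
    for slot in candidates[1:]:
        new_best = {}
        for phrase in slot:
            score, path = max(
                (
                    (prev_score + pairwise.get((prev, phrase), 0.0), prev_path)
                    for prev, (prev_score, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[phrase] = (score + unary[phrase], path + [phrase])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


# Hypothetical candidates for three of the four slots.
candidates = [["the brown cow", "a cow"], ["eating grass", "standing"], ["in the field"]]
unary = {"the brown cow": 0.5, "a cow": 0.25, "eating grass": 0.5,
         "standing": 0.25, "in the field": 0.5}
pairwise = {("the brown cow", "eating grass"): 0.5,
            ("a cow", "standing"): 0.25,
            ("eating grass", "in the field"): 0.25}

print(viterbi_fixed_order(candidates, unary, pairwise))
```

The fixed order is what makes dynamic programming applicable. Once ordering and phrase omission become free choices constrained across sentences, the search is no longer a simple chain, which is where the ILP formulation comes in.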

76 Table 3: Human Evaluation (with images) phrase cohesion scores (§5. [sent-236, score-0.216]

77 , 2011) , that searches the large parallel corpus of images and captions, and transfers a caption from a visually similar database image to the query. [sent-239, score-0.95]

78 This again is a very strong baseline, as it exploits the vast amount of image-caption data, and produces a description high in linguistic quality (since the captions were written by human annotators). [sent-240, score-0.326]

79 , 2002) , despite its simplicity and limitations, has been one of the common choices for automatic evaluation of image descriptions (Farhadi et al. [sent-243, score-0.564]

80 In ranking evaluation, we ask raters to choose a better caption between two choices. [sent-273, score-0.157]

81 When images are shown, raters evaluate content relevance as well as linguistic quality of the captions. [sent-275, score-0.521]

82 We found that raters generally prefer ILP generated captions over HMM generated ones, twice as much (67. [sent-277, score-0.385]

83 However the difference is less pronounced when images are shown. [sent-282, score-0.288]

84 The first is that when images are shown, the Turkers do not try as hard to tell apart the subtle difference between the two imperfect captions. [sent-284, score-0.288]

85 The second is that the relative content relevance of ILP generated captions is negating the superiority in linguistic quality. [sent-285, score-0.451]

86 , 2011), despite the generated captions' tendency to be more prone to grammatical and cognitive errors than retrieved ones. [sent-289, score-0.321]

87 This indicates that the generated captions must have substantially better content relevance to the query image, supporting the direction of this research. [sent-290, score-0.539]

88 Finally, notice that as much as 16% of the time, ILP generated captions are preferred over the original human generated ones (examples in Figure 2) . [sent-291, score-0.32]

89 Human Evaluation II– Multi-Aspect Rating: Table 4 presents rating in the 1–5 scale (5: perfect, 4: almost perfect, 3: 70∼80% good, 2: 7We present two captions in a randomized order. [sent-292, score-0.289]

90 It turns out human raters are generally more critical against the relevance aspect, as can be seen in the ratings given to the original human generated captions. [sent-298, score-0.151]

91 Notice that HMM captions look robotic, containing spurious and redundant phrases due to lack of discourse constraints, and often discussing an awkward set of objects due to lack of image-level content planning. [sent-300, score-0.656]

92 Also notice how image-level content planning underpinned by language statistics helps correct some of the erroneous vision detections. [sent-301, score-0.324]

93 7 Related Work & Discussion Although not directly focused on image description generation, some previous work in the realm of summarization shares the similar problem of content planning and surface realization. [sent-303, score-0.737]

94 First, sentence compression is hardly the goal of image description generation, as human written descriptions are not necessarily succinct. [sent-308, score-0.644]

95 As a result, choosing an additional phrase in the image description is much riskier than it is in summarization. [sent-310, score-0.534]

96 Some recent research proposed very elegant approaches to summarization using ILP for collective content planning and/or surface realization (e. [sent-311, score-0.353]

97 To conclude, we have presented a collective approach to generating natural image descriptions. [sent-317, score-0.455]

98 Our approach is the first to systematically incorporate state of the art computer vision to retrieve visually relevant candidate phrases, then produce image descriptions that are substantially more complex and human-like than previous attempts. [sent-318, score-0.708]

99 9 On a related note, the notion of saliency also differs in that human written captions often digress on details that might be tangential to the visible content of the image. [sent-334, score-0.407]

100 Learning visually-grounded words and syntax for a scene description task. [sent-443, score-0.216]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('image', 0.399), ('ilp', 0.341), ('images', 0.288), ('captions', 0.246), ('descriptions', 0.165), ('cohesion', 0.161), ('object', 0.16), ('ordonez', 0.159), ('scene', 0.136), ('phrases', 0.128), ('content', 0.119), ('objects', 0.112), ('visual', 0.109), ('fsijpq', 0.106), ('planning', 0.104), ('vision', 0.101), ('berg', 0.098), ('hmm', 0.096), ('visually', 0.094), ('caption', 0.092), ('yskt', 0.088), ('query', 0.088), ('description', 0.08), ('ford', 0.077), ('kulkarni', 0.077), ('database', 0.077), ('captioned', 0.071), ('detections', 0.071), ('tamara', 0.071), ('xsij', 0.071), ('xsijk', 0.071), ('xsijkpqm', 0.071), ('ysk', 0.071), ('woodsend', 0.07), ('raters', 0.065), ('constraints', 0.062), ('compatibility', 0.062), ('xs', 0.061), ('retrieve', 0.06), ('collective', 0.056), ('phrase', 0.055), ('variables', 0.053), ('generation', 0.053), ('dalal', 0.053), ('girish', 0.053), ('hsij', 0.053), ('hspq', 0.053), ('msij', 0.053), ('npmi', 0.053), ('sbu', 0.053), ('xspqm', 0.053), ('xxsijk', 0.053), ('xysk', 0.053), ('discourse', 0.051), ('ordering', 0.05), ('relevance', 0.049), ('fst', 0.047), ('brook', 0.046), ('farhadi', 0.046), ('leung', 0.046), ('stony', 0.046), ('linguistically', 0.045), ('enforce', 0.045), ('pairwise', 0.044), ('objective', 0.043), ('rating', 0.043), ('quantifies', 0.042), ('saliency', 0.042), ('encodes', 0.04), ('nontrivial', 0.039), ('kx', 0.039), ('realization', 0.039), ('cognitive', 0.038), ('martins', 0.037), ('generated', 0.037), ('reading', 0.035), ('surface', 0.035), ('fs', 0.035), ('yejin', 0.035), ('cogn', 0.035), ('detectors', 0.035), ('dindo', 0.035), ('fsciojpq', 0.035), ('fsij', 0.035), ('hafiz', 0.035), ('hog', 0.035), ('horses', 0.035), ('lsijpq', 0.035), ('monner', 0.035), ('rect', 0.035), ('siming', 0.035), ('stuff', 0.035), ('texton', 0.035), ('triggs', 0.035), ('vicente', 0.035), ('xk', 0.035), ('collect', 0.034), ('xn', 0.034), ('indexes', 0.034), ('referring', 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 51 acl-2012-Collective Generation of Natural Image Descriptions

Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi

2 0.27917662 76 acl-2012-Distributional Semantics in Technicolor

Author: Elia Bruni ; Gemma Boleda ; Marco Baroni ; Nam Khanh Tran

Abstract: Our research aims at building computational models of word meaning that are perceptually grounded. Using computer vision techniques, we build visual and multimodal distributional models and compare them to standard textual models. Our results show that, while visual models with state-of-the-art computer vision techniques perform worse than textual models in general tasks (accounting for semantic relatedness), they are as good or better models of the meaning of words with visual correlates such as color terms, even in a nontrivial task that involves nonliteral uses of such words. Moreover, we show that visual and textual information are tapping on different aspects of meaning, and indeed combining them in multimodal models often improves performance.

3 0.12042358 89 acl-2012-Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation

Author: Qiuye Zhao ; Mitch Marcus

Abstract: We show for both English POS tagging and Chinese word segmentation that with proper representation, large number of deterministic constraints can be learned from training examples, and these are useful in constraining probabilistic inference. For tagging, learned constraints are directly used to constrain Viterbi decoding. For segmentation, character-based tagging constraints can be learned with the same templates. However, they are better applied to a word-based model, thus an integer linear programming (ILP) formulation is proposed. For both problems, the corresponding constrained solutions have advantages in both efficiency and accuracy. 1 Introduction In recent work, interesting results are reported for applications of integer linear programming (ILP) such as semantic role labeling (SRL) (Roth and Yih, 2005), dependency parsing (Martins et al., 2009) and so on. In an ILP formulation, ‘non-local’ deterministic constraints on output structures can be naturally incorporated, such as “a verb cannot take two subject arguments” for SRL, and the projectivity constraint for dependency parsing. In contrast to probabilistic constraints that are estimated from training examples, this type of constraint is usually hand-written reflecting one’s linguistic knowledge. Dynamic programming techniques based on Markov assumptions, such as Viterbi decoding, cannot handle those ‘non-local’ constraints as discussed above. However, it is possible to constrain Viterbi decoding by ‘local’ constraints, e.g. “assign label t to word w” for POS tagging. This type of constraint may come from human input solicited in interactive inference procedure (Kristjansson et al., 2004). In this work, we explore deterministic constraints for two fundamental NLP problems, English POS tagging and Chinese word segmentation. 
We show by experiments that, with proper representation, large number of deterministic constraints can be learned automatically from training data, which can then be used to constrain probabilistic inference. For POS tagging, the learned constraints are directly used to constrain Viterbi decoding. The corresponding constrained tagger is 10 times faster than searching in a raw space pruned with beam-width 5. Tagging accuracy is moderately improved as well. For Chinese word segmentation (CWS), which can be formulated as character tagging, analogous constraints can be learned with the same templates as English POS tagging. High-quality constraints can be learned with respect to a special tagset, however, with this tagset, the best segmentation accuracy is hard to achieve. Therefore, these character-based constraints are not directly used for determining predictions as in English POS tagging. We propose an ILP formulation of the CWS problem. By adopting this ILP formulation, segmentation F-measure is increased from 0.968 to 0.974, as compared to Viterbi decoding with the same feature set. Moreover, the learned constraints can be applied to reduce the number of possible words over a character sequence, i.e. to reduce the number of variables to set. This reduction of problem size immediately speeds up an ILP solver by more than 100 times. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 1054–1062, Jeju, Republic of Korea, 8-14 July 2012. © 2012 Association for Computational Linguistics 2 English POS tagging 2.1 Explore deterministic constraints Suppose that, following (Chomsky, 1970), we distinguish major lexical categories (Noun, Verb, Adjective and Preposition) by two binary features: +|− N and +|− V. Let (+N, −V) = Noun, (−N, +V) = Verb, (+N, +V) = Adjective, and (−N, −V) = Preposition. A word occurring in between a preceding word the and a following word of always bears the feature +N. 
On the other hand, consider the annotation guideline of English Treebank (Marcus et al., 1993) instead. Part-of-speech (POS) tags are used to categorize words, for example, the POS tag VBG tags verbal gerunds, NNS tags nominal plurals, DT tags determiners and so on. Following this POS representation, there are as many as 10 possible POS tags that may occur in between the–of, as estimated from the WSJ corpus of Penn Treebank. 2.1.1 Templates of deterministic constraints To explore determinacy in the distribution of POS tags in Penn Treebank, we need to consider that a POS tag marks the basic syntactic category of a word as well as its morphological inflection. A constraint that may determine the POS category should reflect both the context and the morphological feature of the corresponding word. The practical difficulty in representing such deterministic constraints is that we do not have a perfect mechanism to analyze morphological features of a word. Endings or prefixes of English words do not deterministically mark their morphological inflections. We propose to compute the morph feature of a word as the set of all of its possible tags, i.e. all tag types that are assigned to the word in training data. Furthermore, we approximate unknown words in testing data by rare words in training data. For a word that occurs less than 5 times in the training corpus, we compute its morph feature as its last two characters, which is also conjoined with binary features indicating whether the rare word contains digits, hyphens or upper-case characters respectively. See examples of morph features in Table 1. We consider bigram and trigram templates for generating potentially deterministic constraints. Let wi denote the ith word relative to the current word w0; and mi denote the morph feature of wi. A bigram constraint includes one contextual word (w−1 | w1) or the corresponding morph feature; and a trigram constraint includes both contextual words or their morph features. Each constraint is also conjoined with w0 or m0, as described in Table 2. 
Table 1: Morph features of frequent words and rare words as computed from the WSJ Corpus of Penn Treebank. (frequent) w0 = trades, m0 = {NNS, VBZ} (set of possible tags of the word); (rare) w0 = time-shares, m0 = {-es, HYPHEN} (the last two characters ...). Table 2: The templates for generating potentially deterministic constraints of English POS tagging. 2.1.2 Learning of deterministic constraints In the above section, we explore templates for potentially deterministic constraints that may determine POS category. With respect to a training corpus, if a constraint C relative to w0 ‘always’ assigns a certain POS category t∗ to w0 in its context, i.e. $\frac{count(C \wedge t_0 = t^*)}{count(C)} > thr$, and this constraint occurs more than a cutoff number, we consider it as a deterministic constraint. The threshold thr is a real number just under 1.0 and the cutoff number is empirically set to 5 in our experiments. 2.1.3 Decoding of deterministic constraints By the above definition, the constraint of w−1 = the, m0 = {NNS, VBZ} and w1 = of is deterministic. It determines the POS category of w0 to be NNS. There are at least two ways of decoding these constraints during POS tagging. Take the word trades for example, whose morph feature is {NNS, VBZ}. One alternative is that as long as trades occurs between the-of, it is tagged with NNS. The second alternative is that the tag decision is made only if all deterministic constraints relative to this occurrence of trades agree on the same tag. Both ways of decoding are purely rule-based and involve no probabilistic inference. In favor of a higher precision, we adopt the latter one in our experiments. 
Table 3: Comparison of raw input and constrained input. 2.2 Search in a constrained space Following most previous work, we consider POS tagging as a sequence classification problem and decompose the overall sequence score over the linear structure, i.e. $\hat{t} = \arg\max_{t \in tagGEN(w)} \sum_{i=1}^{n} score(t_i)$ where function tagGEN maps input sentence w = w1...wn to the set of all tag sequences that are of length n. If a POS tagger takes raw input only, i.e. for every word, the number of possible tags is a constant T, the space of tagGEN is as large as $T^n$. On the other hand, if we decode deterministic constraints first before a probabilistic search, i.e. for some words, the number of possible tags is reduced to 1, the search space is reduced to $T^m$, where m is the number of (unconstrained) words that are not subject to any deterministic constraints. Viterbi algorithm is widely used for tagging, and runs in $O(nT^2)$ when searching in an unconstrained space. On the other hand, consider searching in a constrained space. Suppose that among the m unconstrained words, m1 of them follow a word that has been tagged by deterministic constraints and m2 (= m − m1) of them follow another unconstrained word. Viterbi decoder runs in $O(m_1 T + m_2 T^2)$ while searching in such a constrained space. The example in Table 3 shows raw and constrained input with respect to a typical input sentence. Lookahead features The score of tag predictions are usually computed in a high-dimensional feature space. We adopt the basic feature set used in (Ratnaparkhi, 1996) and (Collins, 2002). Moreover, when deterministic constraints have applied to contextual words of w0, it is also possible to include some lookahead feature templates, such as: t0&t1, t0&t1&t2, and t−1&t0&t1 where ti represents the tag of the ith word relative to the current word w0. 
As discussed in (Shen et al., 2007), categorical information of neighbouring words on both sides of w0 helps resolve POS ambiguity of w0. In (Shen et al., 2007), lookahead features may be available for use during decoding since searching is bidirectional instead of left-to-right as in Viterbi decoding. In this work, deterministic constraints are decoded before the application of probabilistic models, therefore lookahead features are made available during Viterbi decoding.

3 Chinese Word Segmentation (CWS)

3.1 Word segmentation as character tagging

Considering the ambiguity problem that a Chinese character may appear in any relative position in a word and the out-of-vocabulary (OOV) problem that it is impossible to observe all words in training data, CWS is widely formulated as a character tagging problem (Xue, 2003). A character-based CWS decoder is to find the highest scoring tag sequence $\hat{t}$ over the input character sequence c, i.e.

$\hat{t} = \arg\max_{t \in \mathrm{tagGEN}(c)} \sum_{i=1}^{n} \mathrm{score}(t_i)$.

This is the same formulation as POS tagging. The Viterbi algorithm is also widely used for decoding. The tag of each character represents its relative position in a word. Two popular tagsets include 1) IB: where B tags the beginning of a word and I all other positions; and 2) BMES: where B, M and E represent the beginning, middle and end of a multi-character word respectively, and S tags a single-character word. For example, after decoding with BMES, 4 consecutive characters associated with the tag sequence BMME compose a word. However, after decoding with IB, characters associated with BIII may compose a word if the following tag is B or only form part of a word if the following tag is I. Even though character tagging accuracy is higher with tagset IB, tagset BMES is more popular in use since better performance on the original problem, CWS, can be achieved with this tagset.

Character-based feature templates

We adopt the 'non-lexical-target' feature templates in (Jiang et al., 2008a).
Let ci denote the ith character relative to the current character c0 and t0 denote the tag assigned to c0. The following templates are used: ci&t0 (i = -2...2), ci ci+1&t0 (i = -2...1) and c−1 c1&t0.

Character-based deterministic constraints

We can use the same templates as described in Table 2 to generate potentially deterministic constraints for CWS character tagging, except that there are no morph features computed for Chinese characters. As we will show with experimental results in Section 5.2, useful deterministic constraints for CWS can be learned with tagset IB but not with tagset BMES. It is interesting but not surprising to notice, again, that the determinacy of a problem is sensitive to its representation. Since it is hard to achieve the best segmentations with tagset IB, we propose an indirect way to use these constraints in the following section, instead of applying these constraints as straightforwardly as in English POS tagging.

3.2 Word-based word segmentation

A word-based CWS decoder finds the highest scoring segmentation sequence $\hat{w}$ that is composed from the input character sequence c, i.e.

$\hat{w} = \arg\max_{w \in \mathrm{segGEN}(c)} \sum_{i=1}^{|w|} \mathrm{score}(w_i)$,

where function segGEN maps character sequence c to the set of all possible segmentations of c. For example, w = (c1 ... cl1) ... (cn−lk+1 ... cn) represents a segmentation of k words, and the lengths of the first and last word are l1 and lk respectively. In early work, rule-based models find words one by one based on heuristics such as forward maximum match (Sproat et al., 1996). Exact search is possible with a Viterbi-style algorithm, but beam-search decoding is more popular, as used in (Zhang and Clark, 2007) and (Jiang et al., 2008a). We propose an Integer Linear Programming (ILP) formulation of word segmentation, which is naturally viewed as a word-based model for CWS. Character-based deterministic constraints, as discussed in Section 3.1, can be easily applied.
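The exact Viterbi-style search mentioned in Section 3.2 can be sketched as a dynamic program over segmentation points. This is a minimal sketch under assumptions: the name `best_segmentation`, the `max_len` cap on word length, and a `score` callback that depends only on the candidate word string are all illustrative choices, not details given in the text.

```python
def best_segmentation(chars, score, max_len=4):
    """Exact search over segGEN(chars) by dynamic programming.

    best[j] holds the best total score of any segmentation of chars[:j];
    back[j] holds the start index of the last word in that segmentation.
    """
    n = len(chars)
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            cand = best[i] + score(chars[i:j])
            if cand > best[j]:
                best[j], back[j] = cand, i
    # Follow the back-pointers from the end of the sequence.
    words, j = [], n
    while j > 0:
        words.append(chars[back[j]:j])
        j = back[j]
    return words[::-1]
```

Because the score decomposes over words, each prefix needs only its single best segmentation, which keeps the search polynomial in the sequence length.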
3.3 ILP formulation of CWS

Given a character sequence c = c1 ... cn, there are s (= n(n + 1)/2) possible words that are contiguous subsets of c, i.e. w1, ..., ws ⊆ c. Our goal is to find an optimal solution x = x1 ... xs that maximizes

$\sum_{i=1}^{s} \mathrm{score}(w_i) \cdot x_i$,

subject to

(1) $\sum_{i:\, c \in w_i} x_i = 1, \ \forall c \in \mathbf{c}$;
(2) $x_i \in \{0, 1\}, \ 1 \le i \le s$.

The boolean value of xi, as guaranteed by constraint (2), indicates whether wi is selected in the segmentation solution or not. Constraint (1) requires every character to be included in exactly one selected word, and thus guarantees a proper segmentation of the whole sequence. This resembles the ILP formulation of the set cover problem, though the first constraint is different. Take n = 2 for example, i.e. c = c1c2; the set of possible words is {c1, c2, c1c2}, i.e. s = |x| = 3. There are only two possible solutions subject to constraints (1) and (2): x = 110, giving an output set {c1, c2}, or x = 001, giving an output set {c1c2}.

The efficiency of solving this problem depends on the number of possible words (contiguous subsets) over a character sequence, i.e. the number of variables in x. So as to reduce |x|, we apply deterministic constraints predicting IB tags first, which are learned as described in Section 3.1. Possible words are generated with respect to the partially tagged character sequence. A character tagged with B always occurs at the beginning of a possible word. Table 4 illustrates the constrained and raw input with respect to a typical character sequence.

Table 4: Comparison of raw input and constrained input.

3.4 Character- and word-based features

As studied in previous work, word-based feature templates usually include the word itself, sub-words contained in the word, contextual characters/words and so on. It has been shown that combining the use of character- and word-based features helps improve performance.
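The ILP of Section 3.3 can be illustrated on tiny inputs by enumerating candidate words and checking constraint (1) directly. This is only a sketch under assumptions: a real system would pass the variables to an ILP solver, whereas the brute-force enumeration here is exponential and serves purely to make the constraints concrete. The function names, the `(start, end)` span encoding and the `tags` dictionary are hypothetical; the rule that an I-tagged character cannot begin a word is inferred from the IB tagset definition rather than stated in the text.

```python
from itertools import product


def candidate_words(n, tags=None):
    """List candidate words as (start, end) spans, shortest first.

    tags: optional {position: "B" or "I"} from deterministic constraints.
    """
    tags = tags or {}
    spans = []
    for i in range(n):
        if tags.get(i) == "I":  # assumption: an I character cannot start a word
            continue
        for j in range(i + 1, n + 1):
            # A B-tagged character may only occur at the beginning of a word.
            if any(tags.get(k) == "B" for k in range(i + 1, j)):
                break
            spans.append((i, j))
    spans.sort(key=lambda sp: (sp[1] - sp[0], sp[0]))
    return spans


def is_feasible(x, spans, n):
    """Constraint (1): every character lies in exactly one selected word."""
    cover = [0] * n
    for xi, (i, j) in zip(x, spans):
        if xi:
            for k in range(i, j):
                cover[k] += 1
    return all(c == 1 for c in cover)


def solve_by_enumeration(n, spans, score):
    """Maximize sum(score(w_i) * x_i) over all feasible 0/1 vectors x."""
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=len(spans)):  # constraint (2)
        if is_feasible(x, spans, n):
            val = sum(score(sp) * xi for xi, sp in zip(x, spans))
            if val > best_val:
                best_x, best_val = x, val
    return best_x
```

For n = 2 the spans come out in the order c1, c2, c1c2, so the two feasible vectors are (1, 1, 0) and (0, 0, 1), matching the example in the text; tagging the second character with B removes the span c1c2 and leaves two variables.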
However, in the character tagging formulation, word-based features are non-local. To incorporate these non-local features and make the search tractable, various efforts have been made. For example, Jiang et al. (2008a) combine different levels of knowledge in an outside linear model of a two-layer cascaded model; Jiang et al. (2008b) use the forest re-ranking technique (Huang, 2008); and in (Kruengkrai et al., 2009), only known words in vocabulary are included in the hybrid lattice consisting of both character- and word-level nodes. We propose to incorporate character-based features in word-based models. Consider a character-based feature function φ(c, t, c) that maps a character-tag pair to a high-dimensional feature space, with respect to an input character sequence c. For a possible word over c of length l, wi = ci0 ... ci0+l−1, tag each character cij in this word with a character-based tag tij. Character-based features of wi can be computed as {φ(cij, tij, c) | 0 ≤ j < l}. The first row of Table 5 illustrates character-b

4 0.088868968 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

Author: Jonathan Berant ; Ido Dagan ; Meni Adler ; Jacob Goldberger

Abstract: Learning entailment rules is fundamental in many semantic-inference applications and has been an active field of research in recent years. In this paper we address the problem of learning transitive graphs that describe entailment rules between predicates (termed entailment graphs). We first identify that entailment graphs exhibit a “tree-like” property and are very similar to a novel type of graph termed forest-reducible graph. We utilize this property to develop an iterative efficient approximation algorithm for learning the graph edges, where each iteration takes linear time. We compare our approximation algorithm to a recently-proposed state-of-the-art exact algorithm and show that it is more efficient and scalable both theoretically and empirically, while its output quality is close to that given by the optimal solution of the exact algorithm.

5 0.087451547 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction

Author: Yafang Wang ; Maximilian Dylla ; Marc Spaniol ; Gerhard Weikum

Abstract: The Web and digitized text sources contain a wealth of information about named entities such as politicians, actors, companies, or cultural landmarks. Extracting this information has enabled the automated construction of large knowledge bases, containing hundreds of millions of binary relationships or attribute values about these named entities. However, in reality most knowledge is transient, i.e. changes over time, requiring a temporal dimension in fact extraction. In this paper we develop a methodology that combines label propagation with constraint reasoning for temporal fact extraction. Label propagation aggressively gathers fact candidates, and an Integer Linear Program is used to clean out false hypotheses that violate temporal constraints. Our method is able to improve on recall while keeping up with precision, which we demonstrate by experiments with biography-style Wikipedia pages and a large corpus of news articles.

6 0.078083806 111 acl-2012-How Are Spelling Errors Generated and Corrected? A Study of Corrected and Uncorrected Spelling Errors Using Keystroke Logs

7 0.076107986 129 acl-2012-Learning High-Level Planning from Text

8 0.074483164 57 acl-2012-Concept-to-text Generation via Discriminative Reranking

9 0.059474364 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

10 0.055870451 186 acl-2012-Structuring E-Commerce Inventory

11 0.054284967 101 acl-2012-Fully Abstractive Approach to Guided Summarization

12 0.053887784 52 acl-2012-Combining Coherence Models and Machine Translation Evaluation Metrics for Summarization Evaluation

13 0.053568531 193 acl-2012-Text-level Discourse Parsing with Rich Linguistic Features

14 0.053365406 176 acl-2012-Sentence Compression with Semantic Role Constraints

15 0.052008472 32 acl-2012-Automated Essay Scoring Based on Finite State Transducer: towards ASR Transcription of Oral English Speech

16 0.052004497 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models

17 0.048910372 44 acl-2012-CSNIPER - Annotation-by-query for Non-canonical Constructions in Large Corpora

18 0.048559114 133 acl-2012-Learning to "Read Between the Lines" using Bayesian Logic Programs

19 0.047958206 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

20 0.04683537 178 acl-2012-Sentence Simplification by Monolingual Machine Translation

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.162), (1, 0.036), (2, -0.036), (3, 0.05), (4, 0.014), (5, 0.085), (6, -0.01), (7, 0.031), (8, -0.07), (9, 0.051), (10, -0.061), (11, 0.043), (12, -0.013), (13, 0.063), (14, -0.076), (15, -0.048), (16, 0.081), (17, 0.032), (18, 0.018), (19, -0.06), (20, 0.052), (21, -0.061), (22, 0.046), (23, 0.107), (24, -0.024), (25, 0.216), (26, 0.189), (27, -0.117), (28, -0.025), (29, -0.256), (30, -0.197), (31, -0.205), (32, -0.115), (33, -0.01), (34, -0.222), (35, -0.187), (36, -0.145), (37, -0.291), (38, 0.021), (39, -0.037), (40, -0.094), (41, 0.046), (42, -0.058), (43, 0.058), (44, -0.06), (45, -0.0), (46, -0.065), (47, -0.036), (48, 0.05), (49, 0.048)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96181023 51 acl-2012-Collective Generation of Natural Image Descriptions

Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi

2 0.86669868 76 acl-2012-Distributional Semantics in Technicolor

Author: Elia Bruni ; Gemma Boleda ; Marco Baroni ; Nam Khanh Tran

3 0.35548052 2 acl-2012-A Broad-Coverage Normalization System for Social Media Language

Author: Fei Liu ; Fuliang Weng ; Xiao Jiang

Abstract: Social media language contains a huge amount and wide variety of nonstandard tokens, created both intentionally and unintentionally by the users. It is of crucial importance to normalize the noisy nonstandard tokens before applying other NLP techniques. A major challenge facing this task is the system coverage, i.e., for any user-created nonstandard term, the system should be able to restore the correct word within its top n output candidates. In this paper, we propose a cognitively-driven normalization system that integrates different human perspectives in normalizing the nonstandard tokens, including the enhanced letter transformation, visual priming, and string/phonetic similarity. The system was evaluated on both word- and message-level using four SMS and Twitter data sets. Results show that our system achieves over 90% word-coverage across all data sets (a ~10% absolute increase compared to state-of-the-art); the broad word-coverage can also successfully translate into message-level performance gain, yielding 6% absolute increase compared to the best prior approach.

4 0.34263644 111 acl-2012-How Are Spelling Errors Generated and Corrected? A Study of Corrected and Uncorrected Spelling Errors Using Keystroke Logs

Author: Yukino Baba ; Hisami Suzuki

Abstract: This paper presents a comparative study of spelling errors that are corrected as you type, vs. those that remain uncorrected. First, we generate naturally occurring online error correction data by logging users’ keystrokes, and by automatically deriving pre- and postcorrection strings from them. We then perform an analysis of this data against the errors that remain in the final text as well as across languages. Our analysis shows a clear distinction between the types of errors that are generated and those that remain uncorrected, as well as across languages.

5 0.33910516 129 acl-2012-Learning High-Level Planning from Text

Author: S.R.K. Branavan ; Nate Kushman ; Tao Lei ; Regina Barzilay

Abstract: Comprehending action preconditions and effects is an essential step in modeling the dynamics of the world. In this paper, we express the semantics of precondition relations extracted from text in terms of planning operations. The challenge of modeling this connection is to ground language at the level of relations. This type of grounding enables us to create high-level plans based on language abstractions. Our model jointly learns to predict precondition relations from text and to perform high-level planning guided by those relations. We implement this idea in the reinforcement learning framework using feedback automatically obtained from plan execution attempts. When applied to a complex virtual world and text describing that world, our relation extraction technique performs on par with a supervised baseline, yielding an F-measure of 66% compared to the baseline's 65%. Additionally, we show that a high-level planner utilizing these extracted relations significantly outperforms a strong, text-unaware baseline, successfully completing 80% of planning tasks as compared to 69% for the baseline.

6 0.30585063 89 acl-2012-Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation

7 0.2950404 133 acl-2012-Learning to "Read Between the Lines" using Bayesian Logic Programs

8 0.28107214 186 acl-2012-Structuring E-Commerce Inventory

9 0.27896786 195 acl-2012-The Creation of a Corpus of English Metalanguage

10 0.25649911 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction

11 0.25263548 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information

12 0.24295656 57 acl-2012-Concept-to-text Generation via Discriminative Reranking

13 0.24269575 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

14 0.23992774 178 acl-2012-Sentence Simplification by Monolingual Machine Translation

15 0.23535772 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

16 0.21903481 190 acl-2012-Syntactic Stylometry for Deception Detection

17 0.21391833 166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions

18 0.21035238 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study

19 0.20598501 77 acl-2012-Ecological Evaluation of Persuasive Messages Using Google AdWords

20 0.20544192 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.026), (26, 0.038), (28, 0.03), (30, 0.028), (37, 0.028), (39, 0.069), (53, 0.358), (59, 0.018), (74, 0.033), (82, 0.034), (84, 0.015), (85, 0.033), (90, 0.089), (92, 0.036), (94, 0.032), (99, 0.046)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75566638 51 acl-2012-Collective Generation of Natural Image Descriptions

Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi

2 0.6136536 108 acl-2012-Hierarchical Chunk-to-String Translation

Author: Yang Feng ; Dongdong Zhang ; Mu Li ; Qun Liu

Abstract: We present a hierarchical chunk-to-string translation model, which can be seen as a compromise between the hierarchical phrase-based model and the tree-to-string model, to combine the merits of the two models. With the help of shallow parsing, our model learns rules consisting of words and chunks and meanwhile introduces syntax cohesion. Under the weighted synchronous context-free grammar defined by these rules, our model searches for the best translation derivation and yields target translation simultaneously. Our experiments show that our model significantly outperforms the hierarchical phrase-based model and the tree-to-string model on English-Chinese translation tasks.

3 0.39172363 76 acl-2012-Distributional Semantics in Technicolor

Author: Elia Bruni ; Gemma Boleda ; Marco Baroni ; Nam Khanh Tran

4 0.37330511 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

Author: Gerard de Melo ; Gerhard Weikum

Abstract: We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.

5 0.36919031 187 acl-2012-Subgroup Detection in Ideological Discussions

Author: Amjad Abu-Jbara ; Pradeep Dasigi ; Mona Diab ; Dragomir Radev

Abstract: The rapid and continuous growth of social networking sites has led to the emergence of many communities of communicating groups. Many of these groups discuss ideological and political topics. It is not uncommon that the participants in such discussions split into two or more subgroups. The members of each subgroup share the same opinion toward the discussion topic and are more likely to agree with members of the same subgroup and disagree with members from opposing subgroups. In this paper, we propose an unsupervised approach for automatically detecting discussant subgroups in online communities. We analyze the text exchanged between the participants of a discussion to identify the attitude they carry toward each other and towards the various aspects of the discussion topic. We use attitude predictions to construct an attitude vector for each discussant. We use clustering techniques to cluster these vectors and, hence, determine the subgroup membership of each participant. We compare our methods to text clustering and other baselines, and show that our method achieves promising results.

6 0.36568016 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle

7 0.3654235 191 acl-2012-Temporally Anchored Relation Extraction

8 0.36286983 102 acl-2012-Genre Independent Subgroup Detection in Online Discussion Threads: A Study of Implicit Attitude using Textual Latent Semantics

9 0.3623282 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

10 0.35956004 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models

11 0.35816282 99 acl-2012-Finding Salient Dates for Building Thematic Timelines

12 0.3578122 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling

13 0.35775909 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information

14 0.35764042 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

15 0.35706323 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities

16 0.35603303 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT

17 0.35566598 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

18 0.3555876 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool

19 0.35542899 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation

20 0.35523266 138 acl-2012-LetsMT!: Cloud-Based Platform for Do-It-Yourself Machine Translation