acl acl2013 acl2013-65 knowledge-graph by maker-knowledge-mining

65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation


Source: pdf

Author: Gozde Ozbal ; Daniele Pighin ; Carlo Strapparava

Abstract: We present BRAINSUP, an extensible framework for the generation of creative sentences in which users are able to force several words to appear in the sentences and to control the generation process across several semantic dimensions, namely emotions, colors, domain relatedness and phonetic properties. We evaluate its performance on a creative sentence generation task, showing its capability of generating well-formed, catchy and effective sentences that have all the good qualities of slogans produced by human copywriters.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We evaluate its performance on a creative sentence generation task, showing its capability of generating well-formed, catchy and effective sentences that have all the good qualities of slogans produced by human copywriters. [sent-8, score-0.925]

2 Moreover, making the slogan evoke “joy” or “satisfaction” could make the advertisement even more catchy for customers. [sent-16, score-0.385]

3 On the other hand, there are many examples of provocative slogans in which copywriters try to impress their readers by eliciting strong negative feelings, as in the case of anti-smoking campaigns. [sent-17, score-0.323]

4 (2012) just to name a few, to the best of our knowledge there is no attempt to develop a unified framework for the generation of creative sentences in which users can control all the variables involved in the creative process to achieve the desired effect. [sent-30, score-0.845]

5 In this paper, we advocate the use of syntactic information to generate creative utterances by describing a methodology that accounts for lexical and phonetic constraints and multiple semantic dimensions at the same time. [sent-31, score-0.633]

6 We present BRAINSUP, an extensible framework for creative sentence generation in which users can control all the parameters of the creative process, thus generating sentences that can be used for practical applications. [sent-32, score-0.899]

7 healthy day, juice, sunshine – With juice and cereal the normal day becomes a summer sunshine. [sent-38, score-0.23]

8 At the same time, they can require a sentence to include desired phonetic properties, such as rhymes, alliteration or plosives. [sent-49, score-0.316]

9 The combination of these features allows for the generation of potentially catchy and memorable sentences by establishing connections between linguistic, emotional (LaBar and Cabeza, 2006), echoic and visual (Borman et al. [sent-50, score-0.468]

10 Other creative dimensions can easily be plugged in, due to the inherently modular structure of the system. [sent-52, score-0.323]

11 BRAINSUP supports the creative process by greedily exploring a huge solution space to produce completely novel utterances responding to user requisites. [sent-53, score-0.374]

12 It exploits syntactic constraints to dramatically cut the size of the search space, thus making it possible to focus on the creative aspects of sentence generation. [sent-54, score-0.419]

13 (2 Related work) Research in creative language generation has bloomed in recent years. [sent-55, score-0.418]

14 The variation takes phonetic distance into account, together with semantic constraints such as semantic similarity, semantic domain opposition and affective polarity difference. [sent-61, score-0.273]

15 Poetry generation systems face similar challenges to BRAINSUP as they struggle to combine semantic, lexical and phonetic features in a unified framework. [sent-66, score-0.317]

16 (2010) describe a model for poetry generation in which users can control meter and rhyme scheme. [sent-68, score-0.385]

17 (2012) attempt to generate novel poems by replacing words in existing poetry with morphologically compatible words that are semantically related to a target domain. [sent-71, score-0.245]

18 Content control and the inclusion of phonetic features are left as future work and syntactic information is not taken into account. [sent-72, score-0.261]

19 The Electronic Text Composition project1 is a corpus-based approach to poetry generation which recursively combines automatically generated linguistic constituents into grammatical sentences. [sent-73, score-0.293]

20 (2012) propose another data-driven approach to poetry generation based on simile transformation. [sent-75, score-0.245]

21 Constraints about phonetic properties of the selected words or their frequencies can be enforced during retrieval. [sent-77, score-0.177]

22 Unlike these examples, BRAINSUP makes heavy use of syntactic information to enforce well-formed sentences and to constrain the search for a solution, and provides an extensible framework in which various forms of linguistic creativity can easily be incorporated. [sent-78, score-0.263]

23 Several slogan generators are available on the web2, but their capabilities are very limited as they can only replace single words or word sequences within existing slogans. [sent-79, score-0.204]

24 (3 Architecture of BRAINSUP) To effectively support the creative process with useful suggestions, we must be able to generate sentences conforming to the user needs. [sent-93, score-0.442]

25 For slogan generation, the target words could be the key features of a product, or target-defining keywords that copywriters want to explicitly mention. [sent-96, score-0.384]

26 The sentence generation process is based on morpho-syntactic patterns which we automatically discover from a corpus of dependency parsed sentences P. [sent-111, score-0.377]

27 The patterns represent the skeletons of well-formed sentences that we employ to generate creative sentences by only focusing on the lexical aspects of the process. [sent-113, score-0.383]

28 Candidate fillers for each empty position (slot) in the patterns are chosen according to the lexical and syntactic constraints enforced by the dependency relations in the patterns. [sent-114, score-0.426]

29 Algorithm 1 provides a high-level description of the creative sentence generation process. [sent-118, score-0.472]

30 A “*” represents an empty slot to be filled with a filler. [sent-121, score-0.277]

31 (3.1 Pattern selection) We generate creative sentences starting from morpho-syntactic patterns which have been automatically learned from a large corpus P. [sent-131, score-0.458]

32 The choice of the corpus from which the patterns are extracted constitutes the first element of the creative sentence generation process, as different choices will generate sentences with different styles. [sent-132, score-0.652]

33 For example, a corpus of slogans or punchlines can result in short, catchy and memorable sentences, whereas a corpus of simplified English would be a better choice to learn a second language or to address low reading level audiences. [sent-133, score-0.479]

34 After selecting the target corpus, we parse all the sentences with the Stanford Parser (Klein and Manning, 2003) and produce the patterns by stripping away all content words from the parses. [sent-137, score-0.189]
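
Sentence 34 is concrete enough to sketch in code. Below is a minimal Python illustration (not the authors' implementation) of stripping content words from a dependency parse to leave a morpho-syntactic pattern; the tuple representation and the content-POS set are assumptions made for the example.

```python
# Minimal sketch of pattern extraction, assuming a parsed sentence is a list
# of (word, pos, head_index, relation) tuples; this representation and the
# content-POS set are illustrative assumptions, not the authors' code.

CONTENT_POS = {"NN", "NNS", "NNP", "VB", "VBD", "VBG", "VBN", "VBZ", "JJ", "RB"}

def extract_pattern(parsed_sentence):
    """Replace content words with empty slots (None), keeping POS tags,
    dependency structure and function words."""
    return [{"pos": pos, "head": head, "rel": rel,
             "word": None if pos in CONTENT_POS else word}
            for word, pos, head, rel in parsed_sentence]

# Toy parse of "The fires lit a big smoke in the night" (head indices are
# positions in the list, -1 marks the root):
parsed = [("The", "DT", 1, "det"), ("fires", "NNS", 2, "nsubj"),
          ("lit", "VBD", -1, "root"), ("a", "DT", 5, "det"),
          ("big", "JJ", 5, "amod"), ("smoke", "NN", 2, "dobj"),
          ("in", "IN", 2, "prep"), ("the", "DT", 8, "det"),
          ("night", "NN", 6, "pobj")]
print([slot["word"] or "*" for slot in extract_pattern(parsed)])
# ['The', '*', '*', 'a', '*', '*', 'in', 'the', '*']
```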

35 This information is needed to select the patterns which are compatible with the target words t in the user specification U. [sent-143, score-0.266]

36 For example, this pattern is not compatible with t = [heading/VBG, edge/NN] as the pattern does not have an empty slot for a gerundive verb, while it satisfies t = [heading/NN, edge/NN] as it can accommodate the two singular nouns. [sent-144, score-0.378]

37 While retrieving patterns, we also need to ensure that a pattern is not completely filled just by adding the target words t, as under these conditions there would be no room to achieve any kind of creative effect. [sent-145, score-0.466]
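
Sentences 35-37 together specify two checks on a retrieved pattern: it must offer an empty slot for each target word's POS, and it must not be completely filled by the targets alone. A minimal sketch under an assumed (pos, word-or-None) pattern representation, not the paper's data structures:

```python
from collections import Counter

# A pattern as a list of (pos, word) pairs; None marks an empty slot.
pattern = [("DT", "The"), ("NNS", None), ("VBD", None), ("DT", "a"),
           ("JJ", None), ("NN", None), ("IN", "in"), ("DT", "the"), ("NN", None)]

def compatible(pattern, targets):
    """True iff every target POS fits a distinct empty slot and at least one
    empty slot is left over (room for a creative effect)."""
    empty = Counter(pos for pos, word in pattern if word is None)
    needed = Counter(pos for _word, pos in targets)
    if any(empty[pos] < n for pos, n in needed.items()):
        return False
    return sum(empty.values()) > sum(needed.values())

print(compatible(pattern, [("heading", "VBG"), ("edge", "NN")]))  # False: no VBG slot
print(compatible(pattern, [("heading", "NN"), ("edge", "NN")]))   # True: two NN slots
```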

38 To avoid always selecting the same patterns for the same kinds of inputs, we add a small random component (also controlled by Θ) to the pattern sorting algorithm, thus allowing for sentences to be generated also from non-top ranked patterns. [sent-152, score-0.232]
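
The random component described in sentence 38 can be sketched as score jitter applied before sorting; the noise magnitude stands in for the relevant component of Θ, and all names here are hypothetical:

```python
import random

def rank_patterns(scored_patterns, noise=0.05, seed=None):
    """Sort (pattern, score) pairs by score plus a small random perturbation,
    so near-top patterns occasionally outrank the top one."""
    rng = random.Random(seed)
    return sorted(scored_patterns,
                  key=lambda item: item[1] + rng.uniform(0.0, noise),
                  reverse=True)

print(rank_patterns([("pat-a", 0.91), ("pat-b", 0.90), ("pat-c", 0.55)], seed=7))
```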

39 , the space of all sentences that can be generated by respecting the syntactic constraints encoded by each pattern. [sent-156, score-0.194]

40 , part-of-speech tags. [Figure 2: A partially lexicalized sentence, "The fires X a * smoke in the *" (pattern DT NNS VBD DT JJ NN IN DT NN; relations det, nsubj, dobj, prep, det, amod, pobj, det), with a highlighted empty slot marked with X.] [sent-159, score-0.508]

41 The search advances towards a complete solution by selecting an empty slot to fill and trying to place candidate fillers in the selected position. [sent-162, score-0.398]
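
Sentences 39-41 outline the search strategy. The following beam-search sketch is an interpretation, not the paper's exact algorithm: it fills one empty slot per step, with `candidate_fillers` and `score` as placeholders for the dependency operators and the battery of scoring functions described below.

```python
import heapq

def generate(pattern, candidate_fillers, score, beam_width=5, top_n=3):
    """Fill empty slots one at a time, keeping the `beam_width` best partial
    sentences at each step; a sentence is a list of (pos, word-or-None)."""
    beam = [pattern]
    while any(word is None for _pos, word in beam[0]):
        expanded = []
        for sent in beam:
            slot = next(i for i, (_p, w) in enumerate(sent) if w is None)
            for filler in candidate_fillers(sent, slot):
                filled = list(sent)
                filled[slot] = (filled[slot][0], filler)
                expanded.append(filled)
        if not expanded:          # no filler fits: give up on this pattern
            break
        beam = heapq.nlargest(beam_width, expanded, key=score)
    return beam[:top_n]
```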

42 Each partially lexicalized solution is scored by a battery of scoring functions that compete to generate creative sentences respecting the user specification U, as explained in Section 3. [sent-163, score-0.564]

43 To limit the number of words that can occupy a given position in a sentence, we define a set of operators that return a list of candidate fillers for a slot solely based on syntactic clues. [sent-168, score-0.388]

44 While filling in a given slot X, the dependency operators can be combined to obtain a list of words which are likely to occupy that position given the syntactic constraints induced by the structure of the pattern. [sent-176, score-0.367]

45 If wi is the head of X, then a direct operator is used to retrieve a list of fillers that satisfy the ith constraint. [sent-180, score-0.217]

46 As an example, let us consider the partially completed sentence shown in Figure 2 having an empty slot marked with X. [sent-182, score-0.293]

47 In this case we can exploit τ⁻¹_{dobj}(smoke) to obtain a list of suitable fillers; more formally, we can define the set of candidate fillers for a slot X, C_X, as: C_X = τ_{r_{h_X,X}}(h_X) ∩ (⋂_{w_i ∈ M_X} τ⁻¹_{r_{w_i,X}}(w_i)), where r_{w_i,X} is the type of relation between w_i and X, M_X is the set of modifiers of X and h_X is the syntactic head of X. [sent-188, score-0.342]
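
Reading the formula with the operator convention implied by sentences 45-46 (a direct operator given a head, an inverse operator given a dependent), C_X is just an intersection of lookups. A toy Python sketch with hand-made co-occurrence tables; the data is illustrative, not real treebank statistics:

```python
# tau[rel][head] lists dependents observed for that head via rel; the inverse
# operator is derived by scanning the same table.
tau = {
    "nsubj": {"cause": {"fires", "storms"}, "produce": {"fires", "plants"}},
    "dobj":  {"cause": {"smoke", "damage"}, "produce": {"smoke", "goods"}},
}

def tau_direct(rel, head):
    """Words observed as dependents of `head` via `rel`."""
    return tau.get(rel, {}).get(head, set())

def tau_inverse(rel, dependent):
    """Words observed as heads of `dependent` via `rel`."""
    return {h for h, deps in tau.get(rel, {}).items() if dependent in deps}

# Slot X is the main verb of Figure 2: its modifiers are "fires" (nsubj) and
# "smoke" (dobj); if X also had a syntactic head h_X, we would further
# intersect with tau_direct(rel, h_X).
print(tau_inverse("nsubj", "fires") & tau_inverse("dobj", "smoke"))
# {'cause', 'produce'}
```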

48 (3.3 Filler selection and solution scoring) We have devised a set of feature functions that account for different aspects of the creative sentence generation process. [sent-194, score-0.556]

49 By changing the weight w of the feature functions in U, users can control the extent to which each creativity component will affect the sentence generation process, and tune the output of the system to better match their needs. [sent-195, score-0.455]

50 As explained in the remainder of this section, feature functions are responsible for ranking the candidate slot fillers to be used during sentence generation and for selecting the best solutions. (Footnote 4: an empty slot does not generate constraints for X.) [sent-196, score-0.904]

51 Algorithm 2, RankCandidates(U, f, c1, c2, s, X): c1 and c2 are two candidate fillers for the slot X in the sentence s = [s0, ...]. [sent-199, score-0.322]

52 To compare two candidates c1 and c2 for the slot X in the sentence s, we first generate two sentences sc1 and sc2 in which the empty slot X is occupied by c1 and c2, respectively. [sent-206, score-0.544]

53 This approach makes it possible to establish a strict order of precedence among feature functions and to select fillers that have the highest chance of maximizing user satisfaction. [sent-209, score-0.228]
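
Sentences 51-53 amount to a lexicographic comparison: fill the slot with each candidate and let the feature functions, visited in order of precedence, break the tie. A minimal sketch with illustrative names, not the paper's Algorithm 2 verbatim:

```python
def rank_candidates(feature_functions, sentence, slot, c1, c2):
    """Return the preferred candidate for `slot`; `feature_functions` are
    ordered by precedence, giving a strict lexicographic comparison."""
    def filled(candidate):
        s = list(sentence)
        s[slot] = (s[slot][0], candidate)
        return s
    s1, s2 = filled(c1), filled(c2)
    for f in feature_functions:
        diff = f(s1) - f(s2)
        if abs(diff) > 1e-9:
            return c1 if diff > 0 else c2
    return c1  # fully tied: keep the first candidate
```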

54 All the words in the sentence which have an association with the target color c give a positive contribution, while those that are associated with a different color c_i ≠ c contribute negatively. [sent-225, score-0.293]
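
A possible reading of that chromatic contribution in code, assuming a hypothetical word-to-colors lexicon (the real system would derive such associations from a color-association resource):

```python
def color_score(words, target_color, color_assoc):
    """Sketch of the chromatic scorer: words associated with the target color
    add +1, words associated only with other colors add -1, the rest are
    neutral; `color_assoc` is a hypothetical word-to-colors lexicon."""
    score = 0
    for w in words:
        colors = color_assoc.get(w, set())
        if target_color in colors:
            score += 1
        elif colors:
            score -= 1
    return score / len(words) if words else 0.0

assoc = {"sunshine": {"yellow"}, "juice": {"orange", "yellow"}, "smoke": {"grey"}}
print(color_score(["sunshine", "juice", "smoke", "day"], "yellow", assoc))  # 0.25
```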

55 All the phonetic features are based on the phonetic representations of English words in the Carnegie Mellon University Pronouncing Dictionary (Lenzo, 1998). [sent-239, score-0.396]

56 For the alliteration scorer, we store the phonetic representation of each word in s in a trie (i.e., a prefix tree). [sent-241, score-0.262]

57 More simply put, we count how many of the phonetic prefixes of the words in the sentence are repeated, and then we normalize this value by the total number of phonemes in s. [sent-246, score-0.231]

58 The rhyme feature works exactly in the same way, with the only difference that we invert the phonetic representation of each word before adding it to the trie. [sent-247, score-0.227]
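
Sentences 56-58 specify both phonetic scorers precisely: count phonetic prefixes shared by more than one word (a trie), normalize by the total number of phonemes, and obtain the rhyme score by reversing each phoneme sequence first. A compact sketch using prefix counts as an implicit trie; the transcriptions are hand-written stand-ins for CMU dictionary entries:

```python
from collections import defaultdict

def prefix_repetition_score(phoneme_seqs):
    """Count how many phonemes fall on trie paths shared by more than one
    word, normalized by the total number of phonemes: a cheap proxy for the
    alliteration scorer."""
    counts = defaultdict(int)
    for seq in phoneme_seqs:
        for i in range(1, len(seq) + 1):
            counts[tuple(seq[:i])] += 1        # implicit trie via prefix counts
    total = sum(len(s) for s in phoneme_seqs)
    repeated = sum(1 for s in phoneme_seqs for i in range(1, len(s) + 1)
                   if counts[tuple(s[:i])] > 1)
    return repeated / total if total else 0.0

# CMU-style phoneme transcriptions (hand-written here for illustration).
words = [["P", "IY", "T", "ER"], ["P", "IY", "P", "ER"], ["P", "IH", "K", "T"]]
print(prefix_repetition_score(words))                     # alliteration
print(prefix_repetition_score([w[::-1] for w in words]))  # rhyme: reversed
```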

59 Thus, we give higher scores to sentences in which several words share the same phonetic ending. [sent-248, score-0.236]

60 This is simply the likelihood of a sentence estimated by an n-gram language model, to enforce the generation of well-formed word sequences. [sent-259, score-0.239]
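
The language-model feature reduces to summing n-gram log-probabilities over a padded token sequence. A sketch in which `logprob` stands in for any trained model; the uniform toy model below is only a placeholder, not the n-gram model used in the paper:

```python
import math

def ngram_log_likelihood(tokens, logprob, n=3):
    """Score a word sequence with an n-gram language model; `logprob` is any
    function mapping (context, word) to a log-probability."""
    score = 0.0
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    for i in range(n - 1, len(padded)):
        score += logprob(tuple(padded[i - n + 1:i]), padded[i])
    return score

# Toy uniform model over a 1,000-word vocabulary:
print(ngram_log_likelihood("the fires cause a thick smoke".split(),
                           lambda ctx, w: math.log(1 / 1000)))
```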

61 4 Evaluation We evaluated our model on a creative sentence generation task. [sent-268, score-0.472]

62 The objective of the evaluation is twofold: we wanted to demonstrate 1) the effectiveness of our approach for creative sentence generation, in general, and 2) the potential of BRAINSUP to support the brainstorming process behind slogan generation. [sent-269, score-0.604]

63 Five experienced annotators were asked to rate 432 creative sentences according to the following criteria, namely: 1) Catchiness: is the sentence attractive, catchy or memorable? [sent-271, score-0.594]

64 [Ungrammatical/Slightly disfluent/Fluent]; 5) Success: could the sentence be a good slogan for the target domain? [sent-275, score-0.313]

65 In these last two cases, the annotators were instructed to select the middle option only in cases where the gap with a correct/successful sentence could be filled just by performing minor editing. [sent-277, score-0.2]

66 We started by collecting slogans from an online repository of slogans5. [sent-279, score-0.255]

67 Then, we randomly selected a subset of these slogans and for each of them we generated an input specification U for the system. [sent-280, score-0.341]

68 Two or three content words appearing in each slogan were randomly selected as the target words t. [sent-282, score-0.259]

69 We did so to simulate the brainstorming phase behind the slogan generation process, where copywriters start with a set of relevant keywords to come up with a catchy slogan. [sent-283, score-0.676]
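
Sentences 66-70 describe how the evaluation inputs were built; below is a small sketch of that sampling step, under an assumed (word, POS) representation of a slogan and a hypothetical specification format:

```python
import random

def make_specification(tagged_slogan, seed=None):
    """Build an input specification U from a real slogan: randomly pick two or
    three of its content words as targets, with a fixed positive emotion."""
    rng = random.Random(seed)
    content = [w for w, pos in tagged_slogan
               if pos.startswith(("NN", "VB", "JJ"))]
    k = min(len(content), rng.choice((2, 3)))
    return {"targets": rng.sample(content, k), "emotion": "positive"}

spec = make_specification([("drink", "VB"), ("more", "JJR"), ("juice", "NN"),
                           ("every", "DT"), ("day", "NN")], seed=1)
print(spec)
```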

70 In all cases, we set the target emotion to “positive” as we could not establish a generally valid criterion to associate a specific emotion with a product. [sent-284, score-0.203]

71 Concerning chromatic slanting, for target domains having a strong chromatic correlation we allowed the system to slant the generated sentences accordingly. [sent-285, score-0.4]

72 For each of the resulting 50 input configurations, we generated up to 10 creative sentences. [sent-290, score-0.326]

73 These settings allow us to enforce an order of precedence among the scorers during slot-filling, while giving them virtually equal relevance for solution ranking. [sent-336, score-0.207]

74 As discussed in Section 3, we use two different treebanks to learn the syntactic patterns (P) and the dependency operators (L). [sent-337, score-0.205]

75 The patterns are learned from a corpus of 16,000 proverbs (Mihalcea and Strapparava, 2006), which offers a good selection of short sentences with a good potential to be used for slogan generation. [sent-339, score-0.263]

76 This choice seemed to be a good compromise as, to the best of our knowledge, there is no published slogan dataset of adequate size. [sent-340, score-0.204]

77 Besides, using existing slogans might have legal implications that we might not be aware of. [sent-341, score-0.255]

78 Since the CMU Pronouncing Dictionary used by the phonetic scorers is based on the American pronunciation of words, we pre-processed the whole BNC by replacing all British-English words with their American-English counterparts. [sent-352, score-0.344]

79 For example, all five annotators (MC=5) agreed on the annotation of the catchiness of the slogans in 19. [sent-377, score-0.387]

80 The agreement on the relatedness of the slogans is especially high, with all 5 annotators taking the same decision in almost two cases out of three. [sent-382, score-0.417]

81 The generated slogans are found to be catchy in more than 2/3 of the cases. [sent-387, score-0.442]

82 15% of the cases the annotators have found that the generated slogans have the potential to be turned into successful ones only with minor editing. [sent-398, score-0.411]

83 Similar conclusions can be drawn concerning the correctness of the output, as in almost one third of the cases the slogans are judged correct. [sent-400, score-0.342]

84 The relatedness figure is especially high, as in almost 94% of the cases the majority of annotators found the slogans to be pertinent to the target domain. [sent-404, score-0.513]

85 This result is not surprising, as all the slogans are generated by considering keywords that already exist in real slogans for the same domain. [sent-405, score-0.615]

86 , to support creative sentence generation starting from a good set of relevant keywords. [sent-408, score-0.472]

87 58%) the majority of the annotators have labeled the slogans favorably across all 5 dimensions. [sent-411, score-0.36]

88 In other cases, such as “A sixth calorie may taste an own good” or “A same sunshine is fewer than a juice of day”, more sophisticated reasoning about syntactic and semantic relations in the output might be necessary in order to enforce the generation of sound and grammatical sentences. [sent-427, score-0.505]

89 healthy day, juice, sunshine – Drink juice of your sunshine, and your weight will choose day of you. [sent-430, score-0.23]

90 cigarette, mascara / doctors, smoke – Unscrupulous doctors smoke armored units. [sent-432, score-0.39]

91 , presence or absence of phonetic features or chromatic slanting) and the outcome of the annotation. [sent-442, score-0.264]

92 BRAINSUP makes heavy use of dependency-parsed data and statistics collected from dependency treebanks to ensure the grammaticality of the generated sentences, and to trim the search space while seeking the sentences that maximize user satisfaction. [sent-451, score-0.264]

93 The system has been designed as a supporting tool for a variety of real-world applications, from advertisement to entertainment and education, where at the very least it can be a valuable support for time-consuming and knowledge-intensive sentence generation needs. [sent-452, score-0.236]

94 To demonstrate this point, we carried out an evaluation on a creative sentence generation benchmark showing that BRAINSUP can effectively produce catchy, memorable and successful sentences that have the potential to inspire the work of copywriters. [sent-453, score-0.616]

95 To the best of our knowledge, this is the first systematic attempt to build an extensible framework that allows for multi-dimensional creativity while at the same time relying on syntactic constraints to enforce grammaticality. [sent-454, score-0.256]

96 Further tuning of BRAINSUP to build a dedicated system for slogan generation is also part of our future plans. [sent-463, score-0.344]

97 After these improvements, we would like to conduct a more focused evaluation on slogan generation involving human copywriters and domain experts in an interactive setting. [sent-464, score-0.456]

98 Automatic analysis of rhythmic poetry with applications to generation and translation. [sent-492, score-0.245]

99 A computational approach to the automation of creative naming. [sent-557, score-0.278]

100 Graphlaugh: a tool for the interactive generation of humorous puns. [sent-585, score-0.179]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('brainsup', 0.442), ('creative', 0.278), ('slogans', 0.255), ('slogan', 0.204), ('phonetic', 0.177), ('smoke', 0.153), ('slot', 0.146), ('generation', 0.14), ('catchy', 0.139), ('scorers', 0.125), ('creativity', 0.124), ('fillers', 0.122), ('poetry', 0.105), ('sunshine', 0.102), ('humor', 0.099), ('empty', 0.093), ('color', 0.092), ('juice', 0.09), ('alliteration', 0.085), ('chromatic', 0.085), ('memorable', 0.085), ('strapparava', 0.077), ('scorer', 0.077), ('patterns', 0.075), ('emotion', 0.074), ('brainstorming', 0.068), ('catchiness', 0.068), ('compatiblepatterns', 0.068), ('copywriters', 0.068), ('crunch', 0.068), ('slant', 0.068), ('tasting', 0.068), ('waking', 0.068), ('zbal', 0.068), ('annotators', 0.064), ('fires', 0.062), ('sentences', 0.059), ('user', 0.059), ('slots', 0.058), ('solutions', 0.058), ('keywords', 0.057), ('operator', 0.056), ('connotation', 0.055), ('target', 0.055), ('relatedness', 0.054), ('sentence', 0.054), ('constraints', 0.052), ('calorie', 0.051), ('colton', 0.051), ('guerini', 0.051), ('manurung', 0.051), ('plosives', 0.051), ('slanting', 0.051), ('pattern', 0.05), ('rhyme', 0.05), ('doctors', 0.05), ('dependency', 0.049), ('control', 0.049), ('generated', 0.048), ('functions', 0.047), ('operators', 0.046), ('drink', 0.046), ('generate', 0.046), ('carlo', 0.046), ('emotional', 0.045), ('dimensions', 0.045), ('greene', 0.045), ('lips', 0.045), ('skin', 0.045), ('zde', 0.045), ('enforce', 0.045), ('domain', 0.044), ('cases', 0.044), ('concerning', 0.043), ('advertisement', 0.042), ('cup', 0.042), ('drama', 0.042), ('pronouncing', 0.042), ('taste', 0.042), ('users', 0.041), ('dt', 0.041), ('majority', 0.041), ('wi', 0.039), ('humorous', 0.039), ('valitutti', 0.039), ('occupy', 0.039), ('cr', 0.039), ('compatible', 0.039), ('day', 0.038), ('filled', 0.038), ('specification', 0.038), ('solution', 0.037), ('mc', 0.036), ('base', 0.035), ('mohammad', 0.035), ('syntactic', 0.035), ('ofp', 0.034), ('vbd', 0.034), ('armored', 0.034), ('bite', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999946 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation

Author: Gozde Ozbal ; Daniele Pighin ; Carlo Strapparava

Abstract: We present BRAINSUP, an extensible framework for the generation of creative sentences in which users are able to force several words to appear in the sentences and to control the generation process across several semantic dimensions, namely emotions, colors, domain relatedness and phonetic properties. We evaluate its performance on a creative sentence generation task, showing its capability of generating well-formed, catchy and effective sentences that have all the good qualities of slogans produced by human copywriters.

2 0.099727884 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors

Author: Jackie Chi Kit Cheung ; Gerald Penn

Abstract: Generative probabilistic models have been used for content modelling and template induction, and are typically trained on small corpora in the target domain. In contrast, vector space models of distributional semantics are trained on large corpora, but are typically applied to domaingeneral lexical disambiguation tasks. We introduce Distributional Semantic Hidden Markov Models, a novel variant of a hidden Markov model that integrates these two approaches by incorporating contextualized distributional semantic vectors into a generative model as observed emissions. Experiments in slot induction show that our approach yields improvements in learning coherent entity clusters in a domain. In a subsequent extrinsic evaluation, we show that these improvements are also reflected in multi-document summarization.

3 0.090154961 1 acl-2013-"Let Everything Turn Well in Your Wife": Generation of Adult Humor Using Lexical Constraints

Author: Alessandro Valitutti ; Hannu Toivonen ; Antoine Doucet ; Jukka M. Toivanen

Abstract: We propose a method for automated generation of adult humor by lexical replacement and present empirical evaluation results of the obtained humor. We propose three types of lexical constraints as building blocks of humorous word substitution: constraints concerning the similarity of sounds or spellings of the original word and the substitute, a constraint requiring the substitute to be a taboo word, and constraints concerning the position and context of the replacement. Empirical evidence from extensive user studies indicates that these constraints can increase the effectiveness of humor generation significantly.

4 0.084284335 282 acl-2013-Predicting and Eliciting Addressee's Emotion in Online Dialogue

Author: Takayuki Hasegawa ; Nobuhiro Kaji ; Naoki Yoshinaga ; Masashi Toyoda

Abstract: While there have been many attempts to estimate the emotion of an addresser from her/his utterance, few studies have explored how her/his utterance affects the emotion of the addressee. This has motivated us to investigate two novel tasks: predicting the emotion of the addressee and generating a response that elicits a specific emotion in the addressee’s mind. We target Japanese Twitter posts as a source of dialogue data and automatically build training data for learning the predictors and generators. The feasibility of our approaches is assessed by using 1099 utterance-response pairs that are built by five human workers.

5 0.073181823 89 acl-2013-Computerized Analysis of a Verbal Fluency Test

Author: James O. Ryan ; Serguei Pakhomov ; Susan Marino ; Charles Bernick ; Sarah Banks

Abstract: We present a system for automated phonetic clustering analysis of cognitive tests of phonemic verbal fluency, on which one must name words starting with a specific letter (e.g., ‘F’) for one minute. Test responses are typically subjected to manual phonetic clustering analysis that is labor-intensive and subject to inter-rater variability. Our system provides an automated alternative. In a pilot study, we applied this system to tests of 55 novice and experienced professional fighters (boxers and mixed martial artists) and found that experienced fighters produced significantly longer chains of phonetically similar words, while no differences were found in the total number of words produced. These findings are preliminary, but strongly suggest that our system can be used to detect subtle signs of brain damage due to repetitive head trauma in individuals that are otherwise unimpaired.

6 0.070507176 178 acl-2013-HEADY: News headline abstraction through event pattern clustering

7 0.068117954 96 acl-2013-Creating Similarity: Lateral Thinking for Vertical Similarity Judgments

8 0.066459849 209 acl-2013-Joint Modeling of News Readerâ•Žs and Comment Writerâ•Žs Emotions

9 0.063275665 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

10 0.059779651 163 acl-2013-From Natural Language Specifications to Program Input Parsers

11 0.059515398 90 acl-2013-Conditional Random Fields for Responsive Surface Realisation using Global Features

12 0.058490008 91 acl-2013-Connotation Lexicon: A Dash of Sentiment Beneath the Surface Meaning

13 0.057802845 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models

14 0.05717963 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates

15 0.05687096 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules

16 0.056692455 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages

17 0.055928033 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users

18 0.055267572 160 acl-2013-Fine-grained Semantic Typing of Emerging Entities

19 0.054948546 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis

20 0.054813109 189 acl-2013-ImpAr: A Deterministic Algorithm for Implicit Semantic Role Labelling


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.181), (1, 0.037), (2, -0.024), (3, -0.049), (4, -0.056), (5, -0.019), (6, 0.028), (7, -0.02), (8, 0.003), (9, 0.009), (10, -0.062), (11, 0.013), (12, -0.052), (13, 0.001), (14, -0.028), (15, -0.031), (16, -0.002), (17, -0.019), (18, 0.06), (19, 0.0), (20, -0.051), (21, -0.058), (22, 0.043), (23, 0.004), (24, -0.008), (25, 0.105), (26, 0.063), (27, 0.022), (28, -0.023), (29, 0.071), (30, -0.017), (31, -0.03), (32, -0.014), (33, -0.005), (34, 0.061), (35, 0.04), (36, -0.014), (37, 0.013), (38, -0.027), (39, 0.007), (40, 0.037), (41, 0.049), (42, 0.024), (43, -0.04), (44, -0.042), (45, -0.07), (46, 0.029), (47, 0.043), (48, -0.041), (49, 0.12)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.87328446 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation

Author: Gozde Ozbal ; Daniele Pighin ; Carlo Strapparava

Abstract: We present BRAINSUP, an extensible framework for the generation of creative sentences in which users are able to force several words to appear in the sentences and to control the generation process across several semantic dimensions, namely emotions, colors, domain relatedness and phonetic properties. We evaluate its performance on a creative sentence generation task, showing its capability of generating well-formed, catchy and effective sentences that have all the good qualities of slogans produced by human copywriters.

2 0.68563378 89 acl-2013-Computerized Analysis of a Verbal Fluency Test

Author: James O. Ryan ; Serguei Pakhomov ; Susan Marino ; Charles Bernick ; Sarah Banks

Abstract: We present a system for automated phonetic clustering analysis of cognitive tests of phonemic verbal fluency, on which one must name words starting with a specific letter (e.g., ‘F’) for one minute. Test responses are typically subjected to manual phonetic clustering analysis that is labor-intensive and subject to inter-rater variability. Our system provides an automated alternative. In a pilot study, we applied this system to tests of 55 novice and experienced professional fighters (boxers and mixed martial artists) and found that experienced fighters produced significantly longer chains of phonetically similar words, while no differences were found in the total number of words produced. These findings are preliminary, but strongly suggest that our system can be used to detect subtle signs of brain damage due to repetitive head trauma in individuals that are otherwise unimpaired.

3 0.6848616 1 acl-2013-"Let Everything Turn Well in Your Wife": Generation of Adult Humor Using Lexical Constraints

Author: Alessandro Valitutti ; Hannu Toivonen ; Antoine Doucet ; Jukka M. Toivanen

Abstract: We propose a method for automated generation of adult humor by lexical replacement and present empirical evaluation results of the obtained humor. We propose three types of lexical constraints as building blocks of humorous word substitution: constraints concerning the similarity of sounds or spellings of the original word and the substitute, a constraint requiring the substitute to be a taboo word, and constraints concerning the position and context of the replacement. Empirical evidence from extensive user studies indicates that these constraints can increase the effectiveness of humor generation significantly.

4 0.65510231 278 acl-2013-Patient Experience in Online Support Forums: Modeling Interpersonal Interactions and Medication Use

Author: Annie Chen

Abstract: Though there has been substantial research concerning the extraction of information from clinical notes, to date there has been less work concerning the extraction of useful information from patient-generated content. Using a dataset comprised of online support group discussion content, this paper investigates two dimensions that may be important in the extraction of patient-generated experiences from text; significant individuals/groups and medication use. With regard to the former, the paper describes an approach involving the pairing of important figures (e.g. family, husbands, doctors, etc.) and affect, and suggests possible applications of such techniques to research concerning online social support, as well as integration into search interfaces for patients. Additionally, the paper demonstrates the extraction of side effects and sentiment at different phases in patient medication use, e.g. adoption, current use, discontinuation and switching, and demonstrates the utility of such an application for drug safety monitoring in online discussion forums.

5 0.65370595 21 acl-2013-A Statistical NLG Framework for Aggregated Planning and Realization

Author: Ravi Kondadadi ; Blake Howald ; Frank Schilder

Abstract: We present a hybrid natural language generation (NLG) system that consolidates macro and micro planning and surface realization tasks into one statistical learning process. Our novel approach is based on deriving a template bank automatically from a corpus of texts from a target domain. First, we identify domain specific entity tags and Discourse Representation Structures on a per sentence basis. Each sentence is then organized into semantically similar groups (representing a domain specific concept) by k-means clustering. After this semi-automatic processing (human review of cluster assignments), a number of corpus–level statistics are compiled and used as features by a ranking SVM to develop model weights from a training corpus. At generation time, a set of input data, the collection of semantically organized templates, and the model weights are used to select optimal templates. Our system is evaluated with automatic, non–expert crowdsourced and expert evaluation metrics. We also introduce a novel automatic metric syntactic variability that represents linguistic variation as a measure of unique template sequences across a collection of automatically generated documents. The metrics for generated weather and biography texts fall within acceptable ranges. In sum, we argue that our statistical approach to NLG reduces the need for complicated knowledge-based architectures and readily adapts to different domains with reduced development time. – – *∗Ravi Kondadadi is now affiliated with Nuance Communications, Inc.

6 0.6292963 279 acl-2013-PhonMatrix: Visualizing co-occurrence constraints of sounds

7 0.6265561 90 acl-2013-Conditional Random Fields for Responsive Surface Realisation using Global Features

8 0.62169987 337 acl-2013-Tag2Blog: Narrative Generation from Satellite Tag Data

9 0.60443258 371 acl-2013-Unsupervised joke generation from big data

10 0.5945031 375 acl-2013-Using Integer Linear Programming in Concept-to-Text Generation to Produce More Compact Texts

11 0.58377033 268 acl-2013-PATHS: A System for Accessing Cultural Heritage Collections

12 0.57264882 282 acl-2013-Predicting and Eliciting Addressee's Emotion in Online Dialogue

13 0.57074636 178 acl-2013-HEADY: News headline abstraction through event pattern clustering

14 0.56902993 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization

15 0.56329441 86 acl-2013-Combining Referring Expression Generation and Surface Realization: A Corpus-Based Investigation of Architectures

16 0.56155241 367 acl-2013-Universal Conceptual Cognitive Annotation (UCCA)

17 0.55141467 37 acl-2013-Adaptive Parser-Centric Text Normalization

18 0.54694426 161 acl-2013-Fluid Construction Grammar for Historical and Evolutionary Linguistics

19 0.53477013 322 acl-2013-Simple, readable sub-sentences

20 0.53341246 88 acl-2013-Computational considerations of comparisons and similes


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.051), (6, 0.031), (11, 0.057), (15, 0.016), (24, 0.054), (26, 0.035), (35, 0.097), (40, 0.012), (42, 0.043), (48, 0.045), (56, 0.293), (58, 0.011), (70, 0.048), (88, 0.036), (90, 0.023), (95, 0.059)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.89136183 108 acl-2013-Decipherment

Author: Kevin Knight

Abstract: The first natural language processing systems had a straightforward goal: decipher coded messages sent by the enemy. This tutorial explores connections between early decipherment research and today’s NLP work. We cover classic military and diplomatic ciphers, automatic decipherment algorithms, unsolved ciphers, language translation as decipherment, and analyzing ancient writing as decipherment. 1 Tutorial Overview The first natural language processing systems had a straightforward goal: decipher coded messages sent by the enemy. Sixty years later, we have many more applications, including web search, question answering, summarization, speech recognition, and language translation. This tutorial explores connections between early decipherment research and today’s NLP work. We find that many ideas from the earlier era have become core to the field, while others still remain to be picked up and developed. We first cover classic military and diplomatic cipher types, including complex substitution ciphers implemented in the first electro-mechanical encryption machines. We look at mathematical tools (language recognition, frequency counting, smoothing) developed to decrypt such ciphers on proto-computers. We show algorithms and extensive empirical results for solving different types of ciphers, and we show the role of algorithms in recent decipherments of historical documents. We then look at how foreign language can be viewed as a code for English, a concept developed by Alan Turing and Warren Weaver. We describe recently published work on building automatic translation systems from non-parallel data. We also demonstrate how some of the same algorithmic tools can be applied to natural language tasks like part-of-speech tagging and word alignment. Turning back to historical ciphers, we explore a number of unsolved ciphers, giving results of initial computer experiments on several of them. Finally, we look briefly at writing as a way to encipher phoneme sequences, covering ancient scripts and modern applications. 2 Outline 1. Classical military/diplomatic ciphers (15 minutes) • 60 cipher types (ACA) • Ciphers vs. codes • Enigma cipher: the mother of natural language processing – computer analysis of text – language recognition – Good-Turing smoothing 2. Foreign language as a code (10 minutes) • Alan Turing’s ”Thinking Machines” • Warren Weaver’s Memorandum 3. Automatic decipherment (55 minutes) • Cipher type detection • Substitution ciphers (simple, homophonic, polyalphabetic, etc.) – plaintext language recognition ∗ how much plaintext knowledge is needed – navigating a difficult search space ∗ index of coincidence, unicity distance, and other measures ∗ frequencies of letters and words ∗ pattern words and cribs ∗ EM, ILP, Bayesian models, sampling – recent decipherments ∗ Jefferson cipher, Copiale cipher, civil war ciphers, naval Enigma • Application to part-of-speech tagging, word alignment • Application to machine translation without parallel text • Parallel development of cryptography and translation • Recently released NSA internal newsletter (1974-1997) 4. *** Break *** (30 minutes) 5.
Unsolved ciphers (40 minutes) • Zodiac 340 (1969), including computational work • Voynich Manuscript (early 1400s), including computational work • Beale (1885) • Dorabella (1897) • Taman Shud (1948) • Kryptos (1990), including computational work • McCormick (1999) • Shoeboxes in attics: DuPonceau journal, Finnerana, SYP, Mopse, diptych 6. Writing as a code (20 minutes) • Does writing encode ideas, or does it encode phonemes? • Ancient script decipherment – Egyptian hieroglyphs – Linear B – Mayan glyphs – Ugaritic, including computational work – Chinese N ¨ushu, including computational work • Automatic phonetic decipherment • Application to transliteration 7. Undeciphered writing systems (15 minutes) • Indus Valley Script (3300BC) • Linear A (1900BC) • Phaistos disc (1700BC?) • Rongorongo (1800s?) 8. Conclusion and further questions (15 minutes) 3 About the Presenter Kevin Knight is a Senior Research Scientist and Fellow at the Information Sciences Institute of the University of Southern California (USC), and a Research Professor in USC’s Computer Science Department. He received a PhD in computer science from Carnegie Mellon University and a bachelor’s degree from Harvard University. Professor Knight’s research interests include natural language processing, machine translation, automata theory, and decipherment. In 2001, he co-founded Language Weaver, Inc., and in 2011, he served as President of the Association for Computational Linguistics. Dr. Knight has taught computer science courses at USC for more than fifteen years and co-authored the widely adopted textbook Artificial Intelligence.

2 0.8810029 190 acl-2013-Implicatures and Nested Beliefs in Approximate Decentralized-POMDPs

Author: Adam Vogel ; Christopher Potts ; Dan Jurafsky

Abstract: Conversational implicatures involve reasoning about multiply nested belief structures. This complexity poses significant challenges for computational models of conversation and cognition. We show that agents in the multi-agent DecentralizedPOMDP reach implicature-rich interpretations simply as a by-product of the way they reason about each other to maximize joint utility. Our simulations involve a reference game of the sort studied in psychology and linguistics as well as a dynamic, interactional scenario involving implemented artificial agents.

same-paper 3 0.7780419 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation

Author: Gozde Ozbal ; Daniele Pighin ; Carlo Strapparava

Abstract: We present BRAINSUP, an extensible framework for the generation of creative sentences in which users are able to force several words to appear in the sentences and to control the generation process across several semantic dimensions, namely emotions, colors, domain relatedness and phonetic properties. We evaluate its performance on a creative sentence generation task, showing its capability of generating well-formed, catchy and effective sentences that have all the good qualities of slogans produced by human copywriters.

4 0.74265021 178 acl-2013-HEADY: News headline abstraction through event pattern clustering

Author: Enrique Alfonseca ; Daniele Pighin ; Guillermo Garrido

Abstract: This paper presents HEADY: a novel, abstractive approach for headline generation from news collections. From a web-scale corpus of English news, we mine syntactic patterns that a Noisy-OR model generalizes into event descriptions. At inference time, we query the model with the patterns observed in an unseen news collection, identify the event that better captures the gist of the collection and retrieve the most appropriate pattern to generate a headline. HEADY improves over a state-of-theart open-domain title abstraction method, bridging half of the gap that separates it from extractive methods using humangenerated titles in manual evaluations, and performs comparably to human-generated headlines as evaluated with ROUGE.

5 0.72759885 258 acl-2013-Neighbors Help: Bilingual Unsupervised WSD Using Context

Author: Sudha Bhingardive ; Samiulla Shaikh ; Pushpak Bhattacharyya

Abstract: Word Sense Disambiguation (WSD) is one of the toughest problems in NLP, and in WSD, verb disambiguation has proved to be extremely difficult, because of the high degree of polysemy, too fine grained senses, absence of a deep verb hierarchy and low inter-annotator agreement in verb sense annotation. Unsupervised WSD has received widespread attention, but has performed poorly, especially on verbs. Recently an unsupervised bilingual EM-based algorithm has been proposed, which makes use only of the raw counts of the translations in comparable corpora (Marathi and Hindi). But the performance of this approach is poor on verbs with accuracy level at 25-38%. We suggest a modification to this formulation, using context and semantic relatedness of neighboring words. An improvement of 17%-35% in the accuracy of verb WSD is obtained compared to the existing EM based approach. On a general note, the work can be looked upon as contributing to the framework of unsupervised WSD through context aware expectation maximization.

6 0.50321627 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri

7 0.50288606 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering

8 0.50260168 172 acl-2013-Graph-based Local Coherence Modeling

9 0.50126433 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis

10 0.50068146 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension

11 0.50006801 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations

12 0.49980971 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors

13 0.49905488 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models

14 0.49878293 99 acl-2013-Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation

15 0.49859118 318 acl-2013-Sentiment Relevance

16 0.49839178 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction

17 0.49830174 158 acl-2013-Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval

18 0.49811921 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model

19 0.49804267 347 acl-2013-The Role of Syntax in Vector Space Models of Compositional Semantics

20 0.49717456 224 acl-2013-Learning to Extract International Relations from Political Context