acl acl2013 acl2013-371 knowledge-graph by maker-knowledge-mining

371 acl-2013-Unsupervised joke generation from big data


Source: pdf

Author: Sasa Petrovic ; David Matthews

Abstract: Humor generation is a very hard problem. It is difficult to say exactly what makes a joke funny, and solving this problem algorithmically is assumed to require deep semantic understanding, as well as cultural and other contextual cues. We depart from previous work that tries to model this knowledge using ad-hoc manually created databases and labeled training examples. Instead we present a model that uses large amounts of unannotated data to generate I like my X like I like my Y, Z jokes, where X, Y, and Z are variables to be filled in. This is, to the best of our knowledge, the first fully unsupervised humor generation system. Our model significantly outperforms a competitive baseline and generates funny jokes 16% of the time, compared to 33% for human-generated jokes.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Unsupervised joke generation from big data. Saša Petrović, School of Informatics, University of Edinburgh. [sent-1, score-0.367]

2 It is difficult to say exactly what makes a joke funny, and solving this problem algorithmically is assumed to require deep semantic understanding, as well as cultural and other contextual cues. [sent-5, score-0.413]

3 We depart from previous work that tries to model this knowledge using ad-hoc manually created databases and labeled training examples. [sent-6, score-0.025]

4 Instead we present a model that uses large amounts of unannotated data to generate I like my X like I like my Y, Z jokes, where X, Y, and Z are variables to be filled in. [sent-7, score-0.222]

5 This is, to the best of our knowledge, the first fully unsupervised humor generation system. [sent-8, score-0.22]

6 Our model significantly outperforms a competitive baseline and generates funny jokes 16% of the time, compared to 33% for human-generated jokes. [sent-9, score-1.123]

7 1 Introduction Generating jokes is typically considered to be a very hard natural language problem, as it implies a deep semantic and often cultural understanding of text. [sent-10, score-0.82]

8 We deal with generating a particular type of joke, I like my X like I like my Y, Z, where X and Y are nouns and Z is typically an attribute that describes X and Y. [sent-11, score-0.643]

9 An example of such a joke is I like my men like I like my tea, hot and British; these jokes are very popular online. [sent-12, score-1.188]

10 While this particular type of joke is not interesting from a purely generational point of view (the syntactic structure is fixed), the content selection problem is very challenging. [sent-13, score-0.351]

11 Thus, the main challenge in this work is to “fill in” the slots in the joke template in a way that the whole phrase is considered funny. [sent-15, score-0.357]

12 Unlike the previous work in humor generation, we do not rely on labeled training data or hand-coded rules, but instead on large quantities of unannotated data. [sent-19, score-0.254]

13 We present a machine learning model that expresses our assumptions about what makes these types of jokes funny and show that by using this fairly simple model and large quantities of data, we are able to generate jokes that are considered funny by human raters in 16% of cases. [sent-20, score-2.376]

14 The main contribution of this paper is, to the best of our knowledge, the first fully unsupervised joke generation system. [sent-21, score-0.367]

15 We rely only on large quantities of unlabeled data, suggesting that generating jokes does not always require deep semantic understanding, as usually thought. [sent-22, score-0.876]

16 2 Related Work Related work on computational humor can be divided into two classes: humor recognition and humor generation. [sent-23, score-0.561]

17 Humor recognition includes double entendre identification in the form of That’s what she said jokes (Kiddon and Brun, 2011), sarcastic sentence identification (Davidov et al. [sent-24, score-0.849]

18 , 2010), and one-liner joke recognition (Mihalcea and Strapparava, 2005). [sent-25, score-0.334]

19 Examples of work on humor generation include dirty joke telling robots (Sjöbergh and Araki, 2008), a generative model of two-liner jokes (Labutov and Lipson, 2012), and a model of punning riddles (Binsted and Ritchie, 1994). [sent-29, score-1.458]

20 ©2013 Association for Computational Linguistics, pages 228–232. Figure 1: Our model presented as a factor graph. [sent-32, score-0.064]

21 Binsted and Ritchie (1994) have a set of six hardcoded rules for generating puns. [sent-33, score-0.037]

22 3 Generating jokes We generate jokes of the form I like my X like I like my Y, Z, and we assume that X and Y are nouns, and that Z is an adjective. [sent-34, score-1.636]

23 A graphical representation of our model in the form of a factor graph is shown in Figure 1. [sent-37, score-0.064]

24 Variables, denoted by circles, and factors, denoted by squares, define potential functions involving the variables they are connected to. [sent-38, score-0.033]

25 Mathematically, this assumption is expressed as: φ(x, z) = p(x, z) = f(x, z) / Σ_{x,z} f(x, z), (1) where f(x, z)¹ is a function that measures the cooccurrence between x and z. [sent-40, score-0.048]
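
Assumption i) (Equation 1) reduces to normalizing co-occurrence counts. A minimal Python sketch; the toy counts below are invented purely for illustration, standing in for the Google 2-gram statistics the paper uses:

```python
from collections import Counter

# Toy co-occurrence counts f(x, z) between nouns and attributes.
# The paper estimates these from Google 2-grams; the numbers below
# are invented purely for illustration.
f = Counter({
    ("tea", "hot"): 50,
    ("men", "hot"): 30,
    ("coffee", "cold"): 5,
    ("war", "cold"): 40,
})

total = sum(f.values())  # normalizer: sum of f over all (x, z) pairs

def phi_noun_attr(x, z):
    """Equation (1): co-occurrence of x and z, normalized over all pairs."""
    return f[(x, z)] / total
```

By construction, phi_noun_attr sums to one over the observed pairs, so it behaves like the probability p(x, z) in Equation (1).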

26 Footnote 1: We use uppercase to denote random variables, and lowercase to denote random variables taking on a specific value. [sent-45, score-0.09]

27 Because this factor measures the similarity between nouns and attributes, we will also refer to it as noun-attribute similarity. [sent-46, score-0.097]

28 Assumption ii) says that jokes are funnier if the attribute used is less common. [sent-47, score-1.044]

29 For example, there are a few attributes that are very common and can be used to describe almost anything (e. [sent-48, score-0.036]

30 , new, free, good), but using them would probably lead to bad jokes. [sent-50, score-0.017]

31 We posit that the less common the attribute Z is, the more likely it is to lead to surprisal, which is known to contribute to the funniness of jokes. [sent-51, score-0.131]

32 We express this assumption in the factor φ1 (Z) : φ1(z) = 1/f(z) (2) where f(z) is the number of times attribute z appears in some external corpus. [sent-52, score-0.166]

33 We will refer to this factor as attribute surprisal. [sent-53, score-0.136]

34 Assumption iii) says that more ambiguous attributes lead to funnier jokes. [sent-54, score-0.242]

35 This is based on the observation that the humor often stems from the fact that the attribute is used in one sense when describing noun x, and in a different sense when describing noun y. [sent-55, score-0.346]

36 This assumption is expressed in φ2 (Z) as: φ2 (z) = 1/senses(z) (3) where senses (z) is the number of different senses that attribute z has. [sent-56, score-0.229]

37 Note that this does not exactly capture the fact that z should be used in different senses for the different nouns, but it is a reasonable first approximation. [sent-57, score-0.068]

38 Finally, assumption iv) says that dissimilar nouns lead to funnier jokes. [sent-59, score-0.339]

39 For example, if the two nouns are girls and boys, we could easily find many attributes that both nouns share. [sent-60, score-0.194]

40 However, since the two nouns are very similar, the effect of surprisal would diminish as the observer would expect us to find an attribute that can describe both nouns well. [sent-61, score-0.303]

41 We therefore use φ(X, Y ) to encourage dissimilarity between the two nouns: φ(x, y) = 1/sim(x, y) , (4) where sim is a similarity function that measures how similar nouns x and y are. [sent-62, score-0.16]

42 There are many similarity functions proposed in the literature, see e. [sent-64, score-0.018]

43 To obtain the joint probability for an (x, y, z) triple we simply multiply all the factors and normalize over all the triples. [sent-68, score-0.093]

44 4 Data For estimating f(x, y) and f(z), we use Google n-gram data (Michel et al. [sent-69, score-0.02]

45 We obtain senses(z) from Wordnet, which contains the number of senses for all common words. [sent-76, score-0.051]

46 It is important to emphasize here that, while we do use Wordnet in our work, our approach does not crucially rely on it, and we use it to obtain only very shallow information. [sent-77, score-0.019]

47 In particular, we use Wordnet to obtain i) POS tags for Google 2-grams, and ii) number of senses for adjectives. [sent-78, score-0.051]

48 The number of different word senses for adjectives is harder to obtain without Wordnet, but this is only one of the four factors in our model, and we do not depend crucially on it. [sent-80, score-0.144]

49 5 Experiments We evaluate our model in two stages. [sent-81, score-0.025]

50 Firstly, using automatic evaluation with a set of jokes collected from Twitter, and secondly, by comparing our approach to human-generated jokes. [sent-82, score-0.776]

51 While this is too expensive for estimating the true probability of any (x, y, z) triple, it is feasible if we fix one of the nouns, i.e. [sent-85, score-0.038]

52 However, generating Y and Z given X, such that the joke is funny, is still a formidable challenge that a lot of humans are not able to perform successfully (cf. [sent-92, score-0.423]

53 2 Automatic evaluation In the automatic evaluation we measure the effect of the different factors in the model, as laid out in Section 3. [sent-95, score-0.074]

54 , the log of the probability that our model assigns to a triple. [sent-100, score-0.025]

55 However, because we do not compute it on all the data, just on the data that contains the Xs from our development set, it is not exactly equal to the log-likelihood. [sent-101, score-0.017]

56 Our second metric computes the rank of the humangenerated jokes in the distribution of all possible jokes sorted decreasingly by their LOL-likelihood. [sent-103, score-1.563]

57 This Rank OF Likelihood (ROFL) is computed relative to the number of all possible jokes, and like LOL-likelihood is averaged over all the jokes in our development data. [sent-104, score-0.79]
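
ROFL as described above can be sketched as the relative rank of a human joke's model score among all candidate scores, sorted decreasingly (the scores used here are arbitrary illustrative numbers):

```python
def rofl(human_score, candidate_scores):
    """Relative rank of a human joke's model score among all candidates,
    sorted decreasingly (0.0 = the human joke is the model's top joke)."""
    ranked = sorted(candidate_scores, reverse=True)
    return ranked.index(human_score) / len(ranked)

# Illustrative scores only: the human joke (0.8) ranks second of four.
example = rofl(0.8, [0.9, 0.8, 0.3, 0.1])
```

As in the paper, this quantity would then be averaged over all jokes in the development data.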

58 One advantage of ROFL is that it is designed with the way we generate jokes in mind (cf. [sent-105, score-0.782]

59 3), and thus more directly measures the quality of generated jokes than LOL-likelihood. [sent-107, score-0.795]

60 For measuring LOL-likelihood and ROFL we use a set of 48 jokes randomly sampled from Twitter that fit the I like my X like I like my Y, Z pattern. [sent-108, score-0.854]

61 Table 1 shows the effect of the different factors on the two metrics. [sent-109, score-0.074]

62 We use a model with only noun-attribute similarity (factors φ(X, Z) and φ(Y, Z)) as the baseline. [sent-110, score-0.043]

63 We see that the single biggest improvement comes from the attribute surprisal factor, i. [sent-111, score-0.145]

64 The best combination of the factors, according to automatic metrics, is using all factors except for the noun similarity (Model 1), while using all the factors is the second best combination (Model 2). [sent-114, score-0.197]

65 3 Human evaluation The main evaluation of our model is in terms of human ratings, put simply: do humans find the jokes generated by our model funny? [sent-116, score-0.862]

66 Table 1 rows: Baseline; Baseline + φ(X, Y); Baseline + φ1(Z); Baseline + φ2(Z); Baseline + φ1(Z) + φ2(Z); All factors (Model 2). [sent-119, score-0.074]

67 We compare two versions of our model (one that uses all the factors (Model 2), and one that uses all factors except for the noun dissimilarity (Model 1)), a baseline model that uses only the noun-attribute similarity, and jokes generated by humans, collected from Twitter. [sent-131, score-1.048]

68 We sample a further 32 jokes from Twitter, making sure that there was no overlap with the development set. [sent-132, score-0.758]

69 To generate a joke for a particular x we keep the top n most probable jokes according to the model, renormalize their probabilities so they sum to one, and sample from this reduced distribution. [sent-133, score-1.116]

70 This allows our model to focus on the jokes that it considers “funny”. [sent-134, score-0.783]

71 In our experiments, we use n = 30, which ensures that we can still generate a variety of jokes for any given x. [sent-135, score-0.782]
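
The sampling procedure above (keep the top n jokes by model probability, renormalize, sample one) can be sketched as follows; the function name and data layout are hypothetical, not the paper's code:

```python
import random

def sample_joke(candidates, n=30, rng=random):
    """candidates: list of ((y, z), probability) pairs for a fixed x.
    Keep the n most probable, renormalize, and sample one joke."""
    top = sorted(candidates, key=lambda c: c[1], reverse=True)[:n]
    total = sum(p for _, p in top)
    r = rng.random() * total        # draw from the renormalized distribution
    for (y, z), p in top:
        r -= p
        if r <= 0:
            return (y, z)
    return top[-1][0]               # guard against floating-point leftovers
```

Truncating to the top n lets the model focus on the jokes it considers "funny" while still producing variety for a given x.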

72 In our experiments we showed five native English speakers the jokes from all the systems in a random, per rater, order. [sent-136, score-0.758]

73 The raters were asked to score each joke on a 3-point Likert scale: 1 (funny), 2 (somewhat funny), and 3 (not funny). [sent-137, score-0.414]

74 Naturally, the raters did not know which approach each joke was coming from. [sent-138, score-0.414]

75 Our model was used to sample Y and Z variables, given the same Xs used in the jokes collected from Twitter. [sent-139, score-0.801]

76 The second column shows the inter-rater agreement (Randolph, 2005), and we can see that it is generally good, but that it is lower on the set of human jokes. [sent-141, score-0.023]

77 We inspected the human-generated jokes with high disagreement and found that the disagreement may be partly explained by raters missing cultural references in the jokes (e. [sent-142, score-1.679]

78 We do not explicitly model cultural references, and are thus less likely to generate such jokes, leading to higher agreement. [sent-145, score-0.092]

79 The third column shows the mean joke score (lower is better), and we can see that human-generated jokes were rated the funniest, jokes from the baseline model the least funny, and that the model which uses all the factors (Model 2) outperforms the model that was best according to the automatic evaluation (Model 1). [sent-146, score-2.702]

80 Table 2: Comparison of different models on the task of generating Y and Z given X. [sent-158, score-0.037]

82 Finally, the last column shows the percentage of jokes the raters scored as funny (i. [sent-160, score-1.18]

83 , the number of funny scores divided by the total number of scores). [sent-162, score-0.319]

84 This is the metric that we are ultimately interested in: telling a joke that is somewhat funny is not useful, and we should only reward generating a joke that is found genuinely funny by humans. [sent-163, score-1.39]

85 The last column shows that humangenerated jokes are considered funnier than the machine-generated ones, but also that our model with all the factors does much better than the other two models. [sent-164, score-1.067]

86 Model 2 is significantly better than the baseline at p = 0. [sent-165, score-0.021]

87 05 using a sign test, and human jokes are significantly better than all three models at p = 0. [sent-166, score-0.758]

88 In the end, our best model generated jokes that were found funny by humans in 16% of cases, compared to 33% obtained by human-generated jokes. [sent-168, score-1.156]

89 Finally, we note that the funny jokes generated by our system are not simply repeats of the human jokes, but entirely new ones that we were not able to find anywhere online. [sent-169, score-1.096]

90 Examples of the funny jokes generated by Model 2 are shown in Table 3. [sent-170, score-1.096]

91 6 Conclusion We have presented a fully unsupervised humor generation system for generating jokes of the type I like my X like I like my Y, Z. Table 3 (example jokes generated by Model 2): I like my relationships like I like my source, open; I like my coffee like I like my war, cold; I like my boys like I like my sectors, bad. [sent-171, score-1.918]

92 I like my X like I like my Y, Z, where X, Y, and Z are slots to be filled in. [sent-172, score-0.141]

93 To the best of our knowledge, this is the first humor generation system that does not require any labeled data or hard-coded rules. [sent-173, score-0.22]

94 We express our assumptions about what makes a joke funny as a machine learning model and show that by estimating its parameters on large quantities of unlabeled data we can generate jokes that are found funny by humans. [sent-174, score-1.867]

95 While our experiments show that human-generated jokes are funnier more of the time, our model significantly improves upon a non-trivial baseline, and we believe that the fact that humans found jokes generated by our model funny 16% of the time is encouraging. [sent-175, score-2.096]

96 Acknowledgements The authors would like to thank the raters for their help and patience in labeling the (often not so funny) jokes. [sent-176, score-0.112]

97 We would also like to thank Micha Elsner for his helpful comments. [sent-177, score-0.032]

98 Semi-supervised recognition of sarcastic sentences in twitter and amazon. [sent-189, score-0.067]

99 Free-marginal multirater kappa (multirater free): An alternative to Fleiss' fixed-marginal multirater kappa. [sent-221, score-0.118]

100 A complete and modestly funny system for generating and performing japanese stand-up comedy. [sent-225, score-0.356]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('jokes', 0.758), ('joke', 0.334), ('funny', 0.319), ('humor', 0.187), ('funnier', 0.157), ('ilike', 0.118), ('attribute', 0.097), ('raters', 0.08), ('nouns', 0.079), ('factors', 0.074), ('rofl', 0.07), ('multirater', 0.059), ('kiddon', 0.052), ('senses', 0.051), ('surprisal', 0.048), ('labutov', 0.048), ('binsted', 0.045), ('quantities', 0.045), ('davidov', 0.043), ('cultural', 0.043), ('google', 0.04), ('entendre', 0.039), ('obergh', 0.039), ('factor', 0.039), ('generating', 0.037), ('attributes', 0.036), ('humans', 0.035), ('araki', 0.035), ('brun', 0.035), ('circuits', 0.035), ('twitter', 0.035), ('variables', 0.033), ('generation', 0.033), ('punning', 0.032), ('sarcastic', 0.032), ('lipson', 0.032), ('like', 0.032), ('says', 0.032), ('noun', 0.031), ('assumption', 0.03), ('ritchie', 0.03), ('boys', 0.03), ('telling', 0.03), ('humangenerated', 0.03), ('wordnet', 0.028), ('dissimilarity', 0.028), ('sj', 0.027), ('weeds', 0.027), ('model', 0.025), ('mihalcea', 0.024), ('dissimilar', 0.024), ('generate', 0.024), ('assumptions', 0.023), ('column', 0.023), ('strapparava', 0.023), ('slots', 0.023), ('filled', 0.022), ('unannotated', 0.022), ('xs', 0.021), ('baseline', 0.021), ('estimating', 0.02), ('disagreement', 0.02), ('double', 0.02), ('generated', 0.019), ('crucially', 0.019), ('triple', 0.019), ('pos', 0.019), ('deep', 0.019), ('similarity', 0.018), ('iv', 0.018), ('feasible', 0.018), ('collected', 0.018), ('measures', 0.018), ('clancy', 0.017), ('funniness', 0.017), ('norvig', 0.017), ('nowak', 0.017), ('pickett', 0.017), ('veres', 0.017), ('zx', 0.017), ('inhabitants', 0.017), ('dirty', 0.017), ('robots', 0.017), ('petrovi', 0.017), ('generational', 0.017), ('zz', 0.017), ('formidable', 0.017), ('kui', 0.017), ('genuinely', 0.017), ('matthews', 0.017), ('sectors', 0.017), ('justus', 0.017), ('rater', 0.017), ('decreasingly', 0.017), ('ofunlabeled', 0.017), ('sim', 0.017), ('exactly', 0.017), ('adjective', 0.017), ('lead', 0.017), ('aiden', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999905 371 acl-2013-Unsupervised joke generation from big data

Author: Sasa Petrovic ; David Matthews

Abstract: Humor generation is a very hard problem. It is difficult to say exactly what makes a joke funny, and solving this problem algorithmically is assumed to require deep semantic understanding, as well as cultural and other contextual cues. We depart from previous work that tries to model this knowledge using ad-hoc manually created databases and labeled training examples. Instead we present a model that uses large amounts of unannotated data to generate I like my X like I like my Y, Z jokes, where X, Y, and Z are variables to be filled in. This is, to the best of our knowledge, the first fully unsupervised humor generation system. Our model significantly outperforms a competitive baseline and generates funny jokes 16% of the time, compared to 33% for human-generated jokes.

2 0.18498932 1 acl-2013-"Let Everything Turn Well in Your Wife": Generation of Adult Humor Using Lexical Constraints

Author: Alessandro Valitutti ; Hannu Toivonen ; Antoine Doucet ; Jukka M. Toivanen

Abstract: We propose a method for automated generation of adult humor by lexical replacement and present empirical evaluation results of the obtained humor. We propose three types of lexical constraints as building blocks of humorous word substitution: constraints concerning the similarity of sounds or spellings of the original word and the substitute, a constraint requiring the substitute to be a taboo word, and constraints concerning the position and context of the replacement. Empirical evidence from extensive user studies indicates that these constraints can increase the effectiveness of humor generation significantly.

3 0.053396884 249 acl-2013-Models of Semantic Representation with Visual Attributes

Author: Carina Silberer ; Vittorio Ferrari ; Mirella Lapata

Abstract: We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data compared to amodal models and word representations based on handcrafted norming data.

4 0.051567577 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users

Author: Shane Bergsma ; Benjamin Van Durme

Abstract: We describe a novel approach for automatically predicting the hidden demographic properties of social media users. Building on prior work in common-sense knowledge acquisition from third-person text, we first learn the distinguishing attributes of certain classes of people. For example, we learn that people in the Female class tend to have maiden names and engagement rings. We then show that this knowledge can be used in the analysis of first-person communication; knowledge of distinguishing attributes allows us to both classify users and to bootstrap new training examples. Our novel approach enables substantial improvements on the widelystudied task of user gender prediction, ob- taining a 20% relative error reduction over the current state-of-the-art.

5 0.049871128 43 acl-2013-Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity

Author: Mohammad Taher Pilehvar ; David Jurgens ; Roberto Navigli

Abstract: Semantic similarity is an essential component of many Natural Language Processing applications. However, prior methods for computing semantic similarity often operate at different levels, e.g., single words or entire documents, which requires adapting the method for each data type. We present a unified approach to semantic similarity that operates at multiple levels, all the way from comparing word senses to comparing text documents. Our method leverages a common probabilistic representation over word senses in order to compare different types of linguistic data. This unified representation shows state-ofthe-art performance on three tasks: seman- tic textual similarity, word similarity, and word sense coarsening.

6 0.045674294 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation

7 0.037608374 390 acl-2013-Word surprisal predicts N400 amplitude during reading

8 0.037467193 293 acl-2013-Random Walk Factoid Annotation for Collective Discourse

9 0.030131774 116 acl-2013-Detecting Metaphor by Contextual Analogy

10 0.029511787 111 acl-2013-Density Maximization in Context-Sense Metric Space for All-words WSD

11 0.029485198 58 acl-2013-Automated Collocation Suggestion for Japanese Second Language Learners

12 0.028995067 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

13 0.028332531 96 acl-2013-Creating Similarity: Lateral Thinking for Vertical Similarity Judgments

14 0.027576555 162 acl-2013-FrameNet on the Way to Babel: Creating a Bilingual FrameNet Using Wiktionary as Interlingual Connection

15 0.026215944 238 acl-2013-Measuring semantic content in distributional vectors

16 0.026089991 316 acl-2013-SenseSpotting: Never let your parallel data tie you to an old domain

17 0.025590878 12 acl-2013-A New Set of Norms for Semantic Relatedness Measures

18 0.025219018 234 acl-2013-Linking and Extending an Open Multilingual Wordnet

19 0.025167031 282 acl-2013-Predicting and Eliciting Addressee's Emotion in Online Dialogue

20 0.02505493 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.078), (1, 0.03), (2, 0.004), (3, -0.033), (4, -0.007), (5, -0.029), (6, -0.001), (7, 0.013), (8, 0.007), (9, -0.016), (10, -0.046), (11, 0.006), (12, -0.015), (13, -0.02), (14, -0.005), (15, 0.004), (16, 0.034), (17, -0.002), (18, -0.004), (19, -0.018), (20, -0.007), (21, 0.008), (22, 0.039), (23, -0.021), (24, 0.041), (25, 0.046), (26, 0.016), (27, -0.027), (28, -0.01), (29, -0.011), (30, 0.001), (31, -0.012), (32, 0.026), (33, 0.005), (34, -0.022), (35, -0.006), (36, 0.027), (37, -0.028), (38, -0.024), (39, -0.039), (40, -0.023), (41, 0.021), (42, 0.003), (43, -0.064), (44, -0.028), (45, 0.061), (46, 0.008), (47, 0.034), (48, -0.097), (49, 0.08)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.85036027 371 acl-2013-Unsupervised joke generation from big data

Author: Sasa Petrovic ; David Matthews

Abstract: Humor generation is a very hard problem. It is difficult to say exactly what makes a joke funny, and solving this problem algorithmically is assumed to require deep semantic understanding, as well as cultural and other contextual cues. We depart from previous work that tries to model this knowledge using ad-hoc manually created databases and labeled training examples. Instead we present a model that uses large amounts of unannotated data to generate I like my X like I like my Y, Z jokes, where X, Y, and Z are variables to be filled in. This is, to the best of our knowledge, the first fully unsupervised humor generation system. Our model significantly outperforms a competitive baseline and generates funny jokes 16% of the time, compared to 33% for human-generated jokes.

2 0.73413754 1 acl-2013-"Let Everything Turn Well in Your Wife": Generation of Adult Humor Using Lexical Constraints

Author: Alessandro Valitutti ; Hannu Toivonen ; Antoine Doucet ; Jukka M. Toivanen

Abstract: We propose a method for automated generation of adult humor by lexical replacement and present empirical evaluation results of the obtained humor. We propose three types of lexical constraints as building blocks of humorous word substitution: constraints concerning the similarity of sounds or spellings of the original word and the substitute, a constraint requiring the substitute to be a taboo word, and constraints concerning the position and context of the replacement. Empirical evidence from extensive user studies indicates that these constraints can increase the effectiveness of humor generation significantly.

3 0.60003811 337 acl-2013-Tag2Blog: Narrative Generation from Satellite Tag Data

Author: Kapila Ponnamperuma ; Advaith Siddharthan ; Cheng Zeng ; Chris Mellish ; Rene van der Wal

Abstract: The aim of the Tag2Blog system is to bring satellite tagged wild animals “to life” through narratives that place their movements in an ecological context. Our motivation is to use such automatically generated texts to enhance public engagement with a specific species reintroduction programme, although the protocols developed here can be applied to any animal or other movement study that involves signal data from tags. We are working with one of the largest nature conservation charities in Europe in this regard, focusing on a single species, the red kite. We describe a system that interprets a sequence of locational fixes obtained from a satellite tagged individual, and constructs a story around its use of the landscape.

4 0.54735959 122 acl-2013-Discriminative Approach to Fill-in-the-Blank Quiz Generation for Language Learners

Author: Keisuke Sakaguchi ; Yuki Arase ; Mamoru Komachi

Abstract: We propose discriminative methods to generate semantic distractors of fill-in-theblank quiz for language learners using a large-scale language learners’ corpus. Unlike previous studies, the proposed methods aim at satisfying both reliability and validity of generated distractors; distractors should be exclusive against answers to avoid multiple answers in one quiz, and distractors should discriminate learners’ proficiency. Detailed user evaluation with 3 native and 23 non-native speakers of English shows that our methods achieve better reliability and validity than previous methods.

5 0.54728854 21 acl-2013-A Statistical NLG Framework for Aggregated Planning and Realization

Author: Ravi Kondadadi ; Blake Howald ; Frank Schilder

Abstract: We present a hybrid natural language generation (NLG) system that consolidates macro and micro planning and surface realization tasks into one statistical learning process. Our novel approach is based on deriving a template bank automatically from a corpus of texts from a target domain. First, we identify domain specific entity tags and Discourse Representation Structures on a per sentence basis. Each sentence is then organized into semantically similar groups (representing a domain specific concept) by k-means clustering. After this semi-automatic processing (human review of cluster assignments), a number of corpus–level statistics are compiled and used as features by a ranking SVM to develop model weights from a training corpus. At generation time, a set of input data, the collection of semantically organized templates, and the model weights are used to select optimal templates. Our system is evaluated with automatic, non–expert crowdsourced and expert evaluation metrics. We also introduce a novel automatic metric syntactic variability that represents linguistic variation as a measure of unique template sequences across a collection of automatically generated documents. The metrics for generated weather and biography texts fall within acceptable ranges. In sum, we argue that our statistical approach to NLG reduces the need for complicated knowledge-based architectures and readily adapts to different domains with reduced development time. – – *∗Ravi Kondadadi is now affiliated with Nuance Communications, Inc.

6 0.53749371 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation

7 0.50199282 375 acl-2013-Using Integer Linear Programming in Concept-to-Text Generation to Produce More Compact Texts

8 0.48713103 86 acl-2013-Combining Referring Expression Generation and Surface Realization: A Corpus-Based Investigation of Architectures

9 0.48680168 322 acl-2013-Simple, readable sub-sentences

10 0.47714549 3 acl-2013-A Comparison of Techniques to Automatically Identify Complex Words.

11 0.472242 37 acl-2013-Adaptive Parser-Centric Text Normalization

12 0.47158268 90 acl-2013-Conditional Random Fields for Responsive Surface Realisation using Global Features

13 0.47025371 62 acl-2013-Automatic Term Ambiguity Detection

14 0.46104026 366 acl-2013-Understanding Verbs based on Overlapping Verbs Senses

15 0.45694584 364 acl-2013-Typesetting for Improved Readability using Lexical and Syntactic Information

16 0.44687226 227 acl-2013-Learning to lemmatise Polish noun phrases

17 0.44525105 286 acl-2013-Psycholinguistically Motivated Computational Models on the Organization and Processing of Morphologically Complex Words

18 0.44402471 303 acl-2013-Robust multilingual statistical morphological generation models

19 0.43557069 8 acl-2013-A Learner Corpus-based Approach to Verb Suggestion for ESL

20 0.42511752 58 acl-2013-Automated Collocation Suggestion for Japanese Second Language Learners


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.051), (6, 0.024), (11, 0.059), (24, 0.034), (26, 0.033), (35, 0.091), (42, 0.042), (45, 0.347), (48, 0.046), (64, 0.011), (70, 0.038), (88, 0.041), (90, 0.028), (95, 0.046)]
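The similarity scores in the list below are presumably computed between sparse topic-weight vectors like the one above. A minimal sketch of cosine similarity over `{topicId: topicWeight}` maps (the second vector is invented for illustration):

```python
import math

def cosine_topics(a, b):
    """Cosine similarity between two sparse {topicId: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Topic distribution of this paper, as listed above.
paper = {0: 0.051, 6: 0.024, 11: 0.059, 24: 0.034, 26: 0.033,
         35: 0.091, 42: 0.042, 45: 0.347, 48: 0.046, 64: 0.011,
         70: 0.038, 88: 0.041, 90: 0.028, 95: 0.046}

# A hypothetical second paper, dominated by the same topic 45.
other = {45: 0.30, 35: 0.10, 11: 0.05}
sim = cosine_topics(paper, other)
```

Because both vectors put most of their mass on topic 45, the similarity is high, which is exactly the behavior the ranked list below relies on.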

similar papers list:

simIndex simValue paperId paperTitle

1 0.92394674 171 acl-2013-Grammatical Error Correction Using Integer Linear Programming

Author: Yuanbin Wu ; Hwee Tou Ng

Abstract: unknown-abstract

same-paper 2 0.7188592 371 acl-2013-Unsupervised joke generation from big data

Author: Sasa Petrovic ; David Matthews

Abstract: Humor generation is a very hard problem. It is difficult to say exactly what makes a joke funny, and solving this problem algorithmically is assumed to require deep semantic understanding, as well as cultural and other contextual cues. We depart from previous work that tries to model this knowledge using ad-hoc manually created databases and labeled training examples. Instead we present a model that uses large amounts of unannotated data to generate I like my X like I like my Y, Z jokes, where X, Y, and Z are variables to be filled in. This is, to the best of our knowledge, the first fully unsupervised humor generation system. Our model significantly outperforms a competitive baseline and generates funny jokes 16% of the time, compared to 33% for human-generated jokes.
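The core idea of the abstract, choosing an attribute Z that is strongly associated with both nouns X and Y, can be sketched as a toy scorer over co-occurrence counts. All counts here are invented, and the paper's actual model and ranking are more elaborate:

```python
# Toy (noun, attribute) co-occurrence counts; values are illustrative only.
cooc = {
    ("coffee", "hot"): 80, ("coffee", "strong"): 60,
    ("men", "hot"): 30, ("men", "strong"): 40,
    ("tea", "hot"): 70, ("tea", "green"): 50,
}

def joke_score(x, y, z):
    """Favor attributes z that co-occur often with both nouns x and y."""
    return cooc.get((x, z), 0) * cooc.get((y, z), 0)

def best_joke(nouns, attrs):
    """Fill the 'I like my X like I like my Y, Z' template greedily."""
    best = max(
        ((x, y, z) for x in nouns for y in nouns if x != y for z in attrs),
        key=lambda t: joke_score(*t),
    )
    return "I like my %s like I like my %s, %s" % best
```

With the toy counts above, the template filler prefers an attribute shared by both nouns, mirroring the unsupervised signal the model extracts from large unannotated corpora.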

3 0.63032466 102 acl-2013-DErivBase: Inducing and Evaluating a Derivational Morphology Resource for German

Author: Britta Zeller ; Jan Snajder ; Sebastian Pado

Abstract: Derivational models are still an under-researched area in computational morphology. Even for German, a rather resource-rich language, there is a lack of large-coverage derivational knowledge. This paper describes a rule-based framework for inducing derivational families (i.e., clusters of lemmas in derivational relationships) and its application to create a high-coverage German resource, DERIVBASE, mapping over 280k lemmas into more than 17k non-singleton clusters. We focus on the rule component and a qualitative and quantitative evaluation. Our approach achieves up to 93% precision and 71% recall. We attribute the high precision to the fact that our rules are based on information from grammar books.

4 0.60274798 103 acl-2013-DISSECT - DIStributional SEmantics Composition Toolkit

Author: Georgiana Dinu ; Nghia The Pham ; Marco Baroni

Abstract: We introduce DISSECT, a toolkit to build and explore computational models of word, phrase and sentence meaning based on the principles of distributional semantics. The toolkit focuses in particular on compositional meaning, and implements a number of composition methods that have been proposed in the literature. Furthermore, DISSECT can be useful to researchers and practitioners who need models of word meaning (without composition) as well, as it supports various methods to construct distributional semantic spaces, assessing similarity and even evaluating against benchmarks, that are independent of the composition infrastructure.
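Among the composition methods DISSECT implements, the simplest from the literature are additive and multiplicative composition. A minimal sketch on toy 3-dimensional word vectors (values illustrative; this is not DISSECT's API):

```python
def add_compose(u, v):
    """Additive composition: phrase vector = component-wise sum."""
    return [a + b for a, b in zip(u, v)]

def mult_compose(u, v):
    """Multiplicative composition: component-wise product."""
    return [a * b for a, b in zip(u, v)]

# Toy distributional vectors for two words.
red = [0.3, 0.9, 0.1]
car = [0.7, 0.2, 0.8]
red_car = add_compose(red, car)
```

Both methods produce a phrase vector in the same space as the word vectors, so the similarity and evaluation machinery mentioned in the abstract applies unchanged to composed phrases.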

5 0.42725977 1 acl-2013-"Let Everything Turn Well in Your Wife": Generation of Adult Humor Using Lexical Constraints

Author: Alessandro Valitutti ; Hannu Toivonen ; Antoine Doucet ; Jukka M. Toivanen

Abstract: We propose a method for automated generation of adult humor by lexical replacement and present empirical evaluation results of the obtained humor. We propose three types of lexical constraints as building blocks of humorous word substitution: constraints concerning the similarity of sounds or spellings of the original word and the substitute, a constraint requiring the substitute to be a taboo word, and constraints concerning the position and context of the replacement. Empirical evidence from extensive user studies indicates that these constraints can increase the effectiveness of humor generation significantly.
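The form-similarity and taboo constraints described in the abstract above can be sketched as a candidate filter: keep only substitutes that are taboo words and within a small edit distance of the original. The lexicon and thresholds here are toy assumptions, and the paper's full method also constrains position and context:

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def candidate_substitutes(word, lexicon, taboo, max_dist=1):
    """Apply the form-similarity and taboo constraints to a lexicon."""
    return [w for w in lexicon
            if w != word and w in taboo and edit_distance(word, w) <= max_dist]
```

For example, replacing a word with a taboo word one edit away preserves the sound/spelling similarity that the empirical results suggest makes the substitution effective.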

6 0.41165677 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction

7 0.40994054 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension

8 0.40984523 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri

9 0.40894166 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context

10 0.4088715 22 acl-2013-A Structured Distributional Semantic Model for Event Co-reference

11 0.40884763 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model

12 0.40860578 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation

13 0.40852392 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation

14 0.40850642 158 acl-2013-Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval

15 0.40820366 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models

16 0.40806028 172 acl-2013-Graph-based Local Coherence Modeling

17 0.40745401 347 acl-2013-The Role of Syntax in Vector Space Models of Compositional Semantics

18 0.40714535 341 acl-2013-Text Classification based on the Latent Topics of Important Sentences extracted by the PageRank Algorithm

19 0.40697968 275 acl-2013-Parsing with Compositional Vector Grammars

20 0.40691188 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis