acl acl2012 acl2012-77 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Marco Guerini ; Carlo Strapparava ; Oliviero Stock
Abstract: In recent years there has been a growing interest in crowdsourcing methodologies to be used in experimental research for NLP tasks. In particular, evaluation of systems and theories about persuasion is difficult to accommodate within existing frameworks. In this paper we present a new cheap and fast methodology that allows fast experiment building and evaluation with fully-automated analysis at a low cost. The central idea is exploiting existing commercial tools for advertising on the web, such as Google AdWords, to measure message impact in an ecological setting. The paper includes a description of the approach, tips for how to use AdWords for scientific research, and results of pilot experiments on the impact of affective text variations which confirm the effectiveness of the approach.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In recent years there has been a growing interest in crowdsourcing methodologies to be used in experimental research for NLP tasks. [sent-3, score-0.139]
2 In particular, evaluation of systems and theories about persuasion is difficult to accommodate within existing frameworks. [sent-4, score-0.126]
3 In this paper we present a new cheap and fast methodology that allows fast experiment building and evaluation with fully-automated analysis at a low cost. [sent-5, score-0.229]
4 The central idea is exploiting existing commercial tools for advertising on the web, such as Google AdWords, to measure message impact in an ecological setting. [sent-6, score-0.349]
5 The paper includes a description of the approach, tips for how to use AdWords for scientific research, and results of pilot experiments on the impact of affective text variations which confirm the effectiveness of the approach. [sent-7, score-0.419]
6 1 Introduction In recent years there has been a growing interest in finding new cheap and fast methodologies to be used in experimental research, for, but not limited to, NLP tasks. [sent-8, score-0.163]
7 In this paper we focus on evaluating systems and theories about persuasion; see for example (Fogg, 2009) or the survey on persuasive NL generation studies in (Guerini et al. [sent-13, score-0.252]
8 The impact of a message is of paramount importance in this context, for example how affective text variations can alter its persuasive impact. [sent-17, score-0.782]
9 This methodology allows us to crowdsource experiments with thousands of subjects for a few euros in a few hours, by tweaking and using existing commercial tools for advertising on the web. [sent-20, score-0.276]
10 It is worth noting that this work originated in the need to evaluate the impact of short persuasive messages, so as to assess the effectiveness of different linguistic choices. [sent-23, score-0.279]
11 Section 4 describes how AdWords features can be used for defining message persuasiveness metrics and what kind of stimulus characteristics can be evaluated. [sent-27, score-0.154]
12 2 Advantages of Ecological Approaches Evaluation of the effectiveness of persuasive systems is very expensive and time consuming, as the STOP experience showed (Reiter et al. [sent-31, score-0.224]
13 Existing methodologies for evaluating persuasion are usually split into two main sets, depending on the setup and domain: (i) long-term, in-the-field evaluation of behavioral change (as the STOP example mentioned before), and (ii) lab settings for evaluating short-term effects, as in (Andrews et al. [sent-33, score-0.196]
14 While in the first approach it is difficult to take into account the role of external events that can occur over long time spans, in the second there are still problems of recruiting subjects and of time consuming activities such as questionnaire gathering and processing. [sent-35, score-0.249]
15 The second point is especially awkward: subjects may actually be convinced by the message to which they are exposed, but if they feel they do not care, they may not "react" at all, which is the case in many artificial settings. [sent-37, score-0.244]
16 Time-consuming activities; subject recruitment; subject motivation; subtle effects measurements. [sent-42, score-0.176]
17 Still we believe that this poses other problems in assessing behavioral changes, and, more generally, persuasion effects. [sent-44, score-0.162]
18 Cover stories can be used to soften this awareness effect; nonetheless, the fact that subjects are being paid for performing the task renders the approach unfeasible for behavioral change studies. [sent-54, score-0.186]
19 Subject motivation: ads can be targeted exactly to those persons that are, in that precise moment throughout the world, most interested in the topic of the experiment, and so potentially more prone to react. [sent-69, score-0.51]
20 Subject unaware, unbiased: subjects are totally unaware of being tested; testing is performed during their "natural" activity on the web. [sent-71, score-0.153]
21 Subtle effects measurements: if there are not enough subjects, just wait for more ads to be displayed, or focus on a subset of even more interested people. [sent-73, score-0.544]
22 Note that similar ecological approaches are beginning to be investigated: for example in (Aral and Walker, 2010) an approach to assessing the social effects of content features on an on-line community is presented. [sent-74, score-0.132]
23 The central idea is to let advertisers display their messages only to relevant audiences. [sent-78, score-0.186]
24 This is done by means of keyword-based contextualization on the Google network, divided into: • Search network: includes Google search pages, search sites and properties that display search results pages (SERPs), such as Froogle and Earthlink. [sent-79, score-0.126]
25 When a user enters a query like “cruise” in the Google search network, Google displays a variety of relevant pages, along with ads that link to cruise trip businesses. [sent-81, score-0.588]
26 To be displayed, these ads must be associated with relevant keywords selected by the advertiser. [sent-82, score-0.543]
27 Every advertiser has an AdWords account that is structured like a pyramid: (i) account, (ii) campaign and (iii) ad group. [sent-83, score-0.219]
28 Each grouping gathers similar keywords together - for instance by a common theme - around an ad group. [sent-85, score-0.18]
29 For each ad group, the advertiser sets a cost-per-click (CPC) bid. [sent-86, score-0.16]
30 The CPC bid refers to the amount the advertiser is willing to pay for a click on his ad; the cost of the actual click, instead, is based on its quality score (a complex measure beyond the scope of the present paper). [sent-87, score-0.186]
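To make the account/campaign/ad group hierarchy concrete, here is a minimal illustrative sketch (our own data structure, not an AdWords API call); the campaign name and bid value are invented, while the keywords are those used later in Table 1.

```python
# Illustrative data structure only (not the AdWords API): an account contains
# campaigns, a campaign contains ad groups, and each ad group bundles a theme,
# its keywords, a CPC bid, and the competing ads.
account = {
    "campaigns": [
        {
            "name": "museum-guide-pilot",  # hypothetical campaign name
            "ad_groups": [
                {
                    "theme": "medieval art",
                    "keywords": ['"medieval art"', "pictures middle ages"],
                    "cpc_bid_eur": 0.05,   # assumed value, not from the paper
                    "ads": ["control variant", "experimental variant"],
                },
            ],
        },
    ],
}
```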
31 Impressions: the number of times an ad has been displayed in the Google Network. [sent-90, score-0.174]
32 Conversion rate equals the number of conversions divided by the number of ad clicks. [sent-92, score-0.146]
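A minimal sketch of these two metrics, using only the standard definitions (CTR as clicks over impressions, conversion rate as conversions over clicks); all counts are invented for illustration.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks divided by impressions."""
    return clicks / impressions if impressions else 0.0

def conversion_rate(conversions: int, clicks: int) -> float:
    """Conversion rate: conversions divided by ad clicks."""
    return conversions / clicks if clicks else 0.0

# Invented counts, for illustration only:
print(ctr(90, 18_000))          # 0.005 -> 0.5% CTR
print(conversion_rate(12, 90))  # 0.133... -> roughly 13.3%
```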
33 a keyword bid) had a statistically measurable impact on the campaign itself. [sent-101, score-0.22]
34 interested in testing how different messages impact (possibly different) audiences. [sent-104, score-0.227]
35 The ACE tool also introduces an option that was not possible before, that of real-time testing of statistical significance. [sent-106, score-0.132]
36 Another advantage is that the statistical knowledge to evaluate the experiment is no longer necessary: the researcher can focus only on setting up proper experimental designs. [sent-108, score-0.166]
37 The limit of the ACE tool is that it only allows A/B testing (single split with one control and one experimental condition) so for experiments with more than two conditions or for particular experimental settings that do not fit with ACE testing boundaries (e. [sent-109, score-0.366]
38 Finally it should be noted that even if ACE allows only A/B testing, it permits the decomposition of almost any variable affecting a campaign experiment into its basic dimensions, and then the segmentation of those dimensions according to control and experimental conditions. [sent-115, score-0.321]
39 As an example of this powerful option, consider Tables 3 and 6 where control and experimental conditions are compared against every single keyword and every search network/ad position used for the experiments. [sent-116, score-0.403]
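As a rough offline analogue of the significance testing that ACE provides, one could compare the click/no-click counts of the two conditions with a chi-square test; this is our own sketch (ACE's internal procedure is not described here) and all counts are invented.

```python
from scipy.stats import chi2_contingency

# Invented counts for a control and an experimental ad.
control_clicks, control_impressions = 120, 19_000
experim_clicks, experim_impressions = 165, 19_100

# 2x2 table: clicks vs. non-clicks per condition.
table = [
    [control_clicks, control_impressions - control_clicks],
    [experim_clicks, experim_impressions - experim_clicks],
]
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p_value:.4f}")  # small p suggests a real CTR difference
```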
40 First we create an ad Group with 2 competing messages (one message for each condition). [sent-118, score-0.34]
41 As soon as data begins to be collected we can monitor the two conditions according to: • Basic Metrics: the highest CTR measure indicates which message is the best performing. [sent-125, score-0.17]
42 Google Analytics Metrics: measures how much the messages kept subjects on the site and how many pages have been viewed. [sent-127, score-0.228]
43 Conversion Metrics: measures how much the messages converted subjects to the final goal. [sent-129, score-0.228]
44 The more relevant (from a persuasive point of view) the action the user performs, the higher the value we must assign to that action. [sent-132, score-0.259]
45 In our view combined measurements are better: for example, there could be cases of messages with a lower CTR but a higher conversion rate. [sent-133, score-0.169]
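One simple way to operationalize such a combined view (our own illustration, not a metric defined by the authors) is to multiply CTR, conversion rate, and the value assigned to the final action, giving an expected value per impression:

```python
def value_per_impression(impressions, clicks, conversions, action_value):
    """Expected value per ad impression: CTR * conversion rate * action value."""
    if impressions == 0 or clicks == 0:
        return 0.0
    return (clicks / impressions) * (conversions / clicks) * action_value

# Invented numbers: a lower-CTR message can still win on conversions.
print(value_per_impression(10_000, 80, 20, action_value=1.0))   # 0.002
print(value_per_impression(10_000, 120, 10, action_value=1.0))  # 0.001
```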
46 Furthermore, AdWords allows very complex targeting options that can help in many different evaluation scenarios: • Language (see how message impact can vary in different languages). [sent-134, score-0.209]
47 Location (see how message impact can vary in different cultures sharing the same language). [sent-135, score-0.209]
48 Keyword matching (see how message impact can vary with users having different interests). [sent-136, score-0.209]
49 Placements (see how message impact can vary among people having different values, e.g. [sent-137, score-0.209]
50 the same message displayed on Democrat or Republican web sites). [sent-139, score-0.184]
51 Demographics (see how message impact can vary according to user gender and age). [sent-140, score-0.244]
52 In particular we were interested in gaining insights about a system for affective variations of existing commentaries on medieval frescoes for a mobile museum guide that attracts the attention of visitors towards specific paintings (Guerini et al. [sent-143, score-0.714]
53 The tower contains a group of 11 frescoes named “Ciclo dei Mesi” (cycle of the months) that represent a unique example of non-religious medieval frescoes in Europe. [sent-154, score-0.401]
54 Another suggestion is to restrict the matching modality on Keywords in order to have more control over the situations in which ads are displayed and to avoid possible extraneous effects (the order of control for matching modality is: [exact match], “phrase match” and broad match). [sent-159, score-0.84]
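A simplified, approximate illustration of the three matching modalities just mentioned (real AdWords matching also handles plurals, misspellings, and close variants, so this is only a sketch):

```python
def triggers(keyword: str, query: str, match_type: str) -> bool:
    """Very rough approximation of AdWords keyword matching."""
    kw, q = keyword.lower().split(), query.lower().split()
    if match_type == "exact":   # [keyword]: the query must be identical
        return q == kw
    if match_type == "phrase":  # "keyword": the keyword appears as a contiguous phrase
        return any(q[i:i + len(kw)] == kw for i in range(len(q) - len(kw) + 1))
    if match_type == "broad":   # keyword: every keyword term appears somewhere
        return all(w in q for w in kw)
    raise ValueError(f"unknown match type: {match_type}")

print(triggers("medieval art", "medieval art", "exact"))                 # True
print(triggers("medieval art", "famous medieval art prints", "phrase"))  # True
print(triggers("medieval art", "art of the medieval period", "broad"))   # True
```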
55 Such optimization must be the same for control and experimental condition. [sent-164, score-0.15]
56 The design should include actions that are meaningful indicators of persuasive effect/success of the message. [sent-169, score-0.256]
57 Add negative matching Keywords: To add more control, if in doubt, put the words/expressions of the control and experimental conditions as negative keywords. [sent-174, score-0.198]
58 It is not strictly necessary since one can always control which queries triggered a click through the report menu. [sent-176, score-0.177]
59 An example: if the only difference between control and experimental condition is the use of the adjectives “gentle knights” vs. [sent-177, score-0.212]
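A small sketch of this precaution (our own illustration): collect the words that differ between the two conditions and add them as negative keywords, so that queries containing either variant never trigger the ads asymmetrically. The experimental adjective "valorous" is taken from the experiments in Section 6.

```python
control_phrase = "gentle knights"
experimental_phrase = "valorous knights"  # the affective variant used in Section 6

control_words = set(control_phrase.split())
experimental_words = set(experimental_phrase.split())

# Words that appear in one condition but not in the other become negative keywords.
negative_keywords = sorted(control_words ^ experimental_words)
print(negative_keywords)  # ['gentle', 'valorous']
```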
60 Frequency capping for the display network: if you are running ads on the display network, you can use the “frequency capping” option set to 1 to add more control to the experiment. [sent-180, score-0.812]
61 In this way it is ensured that ads are displayed only once per user on the display network. [sent-181, score-0.652]
62 This is best explained via an example: if, for whatever reason, one of the two ads gets repeatedly promoted to the premium position on the SERPs, then the CTR difference between ads would be strongly biased. [sent-184, score-1.083]
63 either both ads get an equal amount of premium position impressions, or both ads get no premium position at all). [sent-187, score-1.216]
64 Otherwise the difference in CTR is determined by the “premium position” rather than by the independent variable under investigation (presence/absence of particular affective terms in the text ad). [sent-188, score-0.21]
65 other + Experiment) and checking how many times each ad appeared in a given position on the SERPs, and seeing whether the ACE tool reports any statistical difference in the frequencies of ad positions. [sent-190, score-0.692]
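A hedged sketch of this position check: tabulate how often each ad was shown in each position and test whether the two distributions differ (our own offline version of what one would read off the ACE report; all counts are invented).

```python
from scipy.stats import chi2_contingency

positions = ["top (premium)", "side", "other"]
control_impressions      = [300, 5_200, 9_500]   # invented counts
experimental_impressions = [310, 5_150, 9_600]   # invented counts

chi2, p_value, dof, _ = chi2_contingency([control_impressions, experimental_impressions])
print(f"p={p_value:.3f}")
# A large p-value suggests comparable position exposure, so a CTR difference is
# less likely to be an artifact of premium placement.
```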
66 Extra experimental time: While planning an experiment, you should also take into account the ad reviewing time, which can take up to several days in worst-case scenarios. [sent-191, score-0.557]
67 Note that when ads are in eligible status, they begin to show on the Google Network, but they are not approved yet. [sent-192, score-0.507]
68 This means that the ads can only run on Google search pages and can only show for users who have turned off SafeSearch filtering, until they are approved. [sent-193, score-0.521]
69 6 Experiments We ran two pilot experiments to test how affective variations of existing texts alter their persuasive impact. [sent-200, score-0.55]
70 In particular we were interested in gaining initial insights about an intelligent system for affective variations of existing commentaries on medieval frescoes. [sent-201, score-0.392]
71 We focused on adjective variations, using a slightly biased adjective for the control conditions and a strongly biased variation for the experimental condition. [sent-202, score-0.285]
72 In these experiments we took it for granted that affective variations of a message work better than a neutral version (Van Der Sluis and Mellish, 2010), and we wanted to explore more finely grained tactics that involve the grade of the variation (i. [sent-203, score-0.479]
73 Note that this is a more difficult task than the one proposed in (Van Der Sluis and Mellish, 2010), where they were testing long messages with lots of variations and with polarized conditions, neutral vs. [sent-207, score-0.253]
74 Top-ranked adjectives were then manually ordered according to affective weight to choose the best one (we used a standard procedure with 3 annotators and a reconciliation phase for the final decision). [sent-214, score-0.21]
75 1 First Experiment The first experiment lasted 48 hours with a total of 38 thousand subjects and a cost of 30 euros (see Table 1 for the complete description of the experimental setup). [sent-216, score-0.4]
76 It was meant to test broadly how affective variations in the body of the ads performed. [sent-217, score-0.801]
77 The two variations contained a fragment of a commentary of the museum guide; the control condition contained "gentle knight" and "African lion", while in the experimental condition the affectively loaded variations were "valorous knight" and "indomitable lion" (see Figure 1 for the complete ads). [sent-218, score-0.716]
78 But segmenting the results according to the keyword that triggered the ads (see Table 3) we discovered that on the “medieval art” keyword, the control condition performed better than the experimental one. [sent-220, score-0.829]
79 Starting Date: 1/2/2012; Ending Date: 1/4/2012; Total Time: 48 hours; Total Cost: 30 euros; Subjects: 38,082; Network: Search and Display; Language: English; Locations: Australia, Canada, UK, US; Keywords: "medieval art", pictures middle ages. Table 1: First Experiment Setup.
80 As already discussed, user motivation is a key element for success in such fine-grained experiments: while less focused keywords did not yield any statistically significant differences, the most specialized keyword "medieval art" was the one that yielded results (i. [sent-238, score-0.209]
81 if we display messages like those in Figure 1, that are concerned with medieval art frescoes, only those users really interested in the topic show different reaction patterns to the affective variations, while those generically interested in medieval times behave similarly in the two conditions). [sent-240, score-1.002]
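The kind of per-keyword segmentation used here can be reproduced offline along the following lines (a sketch with invented counts; the keywords are those of Table 1):

```python
from scipy.stats import chi2_contingency

segments = {
    # keyword: (control clicks, control impressions, experimental clicks, experimental impressions)
    '"medieval art"':       (35, 2_000, 18, 2_050),    # invented counts
    "pictures middle ages": (60, 16_000, 64, 16_100),  # invented counts
}
for keyword, (cc, ci, ec, ei) in segments.items():
    _, p, _, _ = chi2_contingency([[cc, ci - cc], [ec, ei - ec]])
    print(f"{keyword}: control CTR={cc/ci:.2%}, experimental CTR={ec/ei:.2%}, p={p:.3f}")
```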
82 In the following experiment we tried to see whether such variations have different effects when modifying a different element in the text. [sent-241, score-0.262]
83 2 Second Experiment The second experiment lasted 48 hours with a total of one thousand subjects and a cost of 17 euros (see Table 4 for the description of the experimental setup). [sent-243, score-0.438]
84 It was meant to test broadly how affective variations introduced in the title of the text Ads performed. [sent-244, score-0.36]
85 The two variations were the same as in the first experiment for the control condition “gentle knight”, and for the experimental condition “valorous knight” (see Figure 2 for the complete ads). [sent-245, score-0.502]
86 But segmenting the results according to the search network that triggered the ads (see Table 6) we discovered that on the search partners at the “other” position, the control condition performed better than the experimental one. [sent-247, score-0.952]
87 Unlike the first experiment, in this case we segmented according to the ad position and search network typology since we were running our experiment only on one keyword in exact match. [sent-248, score-0.51]
88 5 euros; Subjects: 986; Network: Search; Language: English; Locations: Australia, Canada, UK, US; Keywords: [medieval knights]. Table 4: Second Experiment Setup. Figure 2: Ads used in the second experiment.
89 From this experiment we can confirm that at least under some circumstances a mild affective variation performs better than a strong variation. [sent-271, score-0.353]
90 This mild variation seems to work better when user attention is high (the difference emerged when ads are displayed in a non-prominent position). [sent-272, score-0.688]
91 Furthermore it seems that modifying the title of the ad rather than the content yields better results: 0. [sent-273, score-0.146]
92 As a side note, in this experiment we can see the problem of extraneous variables: according to AdWords' internal mechanisms, the experimental condition was displayed more often in the Google search network on the "other" position (277 vs. [sent-278, score-0.466]
93 ads should get an equal amount of impressions for each position). [sent-282, score-0.555]
94 Conclusions and future work AdWords gives us an appropriate context for evaluating persuasive messages. [sent-283, score-0.224]
95 The advantages are fast experiment building and evaluation, fully-automated analysis, and low cost. [sent-284, score-0.154]
96 The motivation for this work was exploration of the impact of short persuasive messages, so as to assess the effectiveness of different linguistic choices. [sent-288, score-0.279]
97 The experiments reported in this paper are illustrative examples of the method proposed and are concerned with the evaluation of the role of minimal affective variations of short expressions. [sent-289, score-0.326]
98 But there is enormous further potential in the proposed approach to ecological crowdsourcing for NLP: for instance, different rhetorical techniques can be checked in practice with large audiences and fast feedback. [sent-290, score-0.191]
99 The assessment of the effectiveness of a change in the title as opposed to the initial portion of the text body provides a useful indication: one can investigate if variations inside the given or the new part of an expression or in the topic vs. [sent-291, score-0.15]
100 Creating persuasive technologies: An eight-step design process. [sent-320, score-0.224]
wordName wordTfidf (topN-words)
[('ads', 0.475), ('adwords', 0.369), ('medieval', 0.241), ('persuasive', 0.224), ('affective', 0.21), ('guerini', 0.154), ('ace', 0.152), ('google', 0.137), ('message', 0.122), ('subjects', 0.122), ('variations', 0.116), ('ad', 0.112), ('experiment', 0.112), ('keyword', 0.106), ('messages', 0.106), ('ecological', 0.098), ('persuasion', 0.098), ('ctr', 0.096), ('control', 0.096), ('network', 0.081), ('euros', 0.08), ('frescoes', 0.08), ('impressions', 0.08), ('premium', 0.08), ('display', 0.08), ('extraneous', 0.077), ('keywords', 0.068), ('serps', 0.064), ('sluis', 0.064), ('behavioral', 0.064), ('displayed', 0.062), ('condition', 0.062), ('campaign', 0.059), ('subject', 0.057), ('mechanical', 0.057), ('partners', 0.056), ('impact', 0.055), ('experimental', 0.054), ('art', 0.054), ('position', 0.053), ('tool', 0.052), ('crowdsourcing', 0.051), ('site', 0.05), ('turk', 0.05), ('analytics', 0.049), ('option', 0.049), ('conditions', 0.048), ('advertiser', 0.048), ('bid', 0.048), ('cpc', 0.048), ('gentle', 0.048), ('knights', 0.048), ('mellish', 0.048), ('recruiting', 0.048), ('valorous', 0.048), ('consuming', 0.047), ('search', 0.046), ('click', 0.045), ('fast', 0.042), ('sommarive', 0.042), ('der', 0.041), ('commercial', 0.04), ('tips', 0.038), ('hours', 0.038), ('roi', 0.036), ('povo', 0.036), ('triggered', 0.036), ('controlled', 0.036), ('interested', 0.035), ('user', 0.035), ('effects', 0.034), ('advertising', 0.034), ('conversions', 0.034), ('pictures', 0.034), ('methodologies', 0.034), ('title', 0.034), ('cheap', 0.033), ('vary', 0.032), ('conversion', 0.032), ('strapparava', 0.032), ('activities', 0.032), ('capping', 0.032), ('cruise', 0.032), ('eagle', 0.032), ('eligible', 0.032), ('lasted', 0.032), ('persuasiveness', 0.032), ('recruitment', 0.032), ('seasonal', 0.032), ('vat', 0.032), ('visitors', 0.032), ('actions', 0.032), ('variation', 0.031), ('measurements', 0.031), ('testing', 0.031), ('knight', 0.03), ('stock', 0.03), ('planning', 0.028), ('biased', 0.028), ('theories', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999946 77 acl-2012-Ecological Evaluation of Persuasive Messages Using Google AdWords
Author: Marco Guerini ; Carlo Strapparava ; Oliviero Stock
Abstract: In recent years there has been a growing interest in crowdsourcing methodologies to be used in experimental research for NLP tasks. In particular, evaluation of systems and theories about persuasion is difficult to accommodate within existing frameworks. In this paper we present a new cheap and fast methodology that allows fast experiment building and evaluation with fully-automated analysis at a low cost. The central idea is exploiting existing commercial tools for advertising on the web, such as Google AdWords, to measure message impact in an ecological setting. The paper includes a description of the approach, tips for how to use AdWords for scientific research, and results of pilot experiments on the impact of affective text variations which confirm the effectiveness of the approach.
2 0.077821836 153 acl-2012-Named Entity Disambiguation in Streaming Data
Author: Alexandre Davis ; Adriano Veloso ; Altigran Soares ; Alberto Laender ; Wagner Meira Jr.
Abstract: The named entity disambiguation task is to resolve the many-to-many correspondence between ambiguous names and the unique realworld entity. This task can be modeled as a classification problem, provided that positive and negative examples are available for learning binary classifiers. High-quality senseannotated data, however, are hard to be obtained in streaming environments, since the training corpus would have to be constantly updated in order to accomodate the fresh data coming on the stream. On the other hand, few positive examples plus large amounts of unlabeled data may be easily acquired. Producing binary classifiers directly from this data, however, leads to poor disambiguation performance. Thus, we propose to enhance the quality of the classifiers using finer-grained variations of the well-known ExpectationMaximization (EM) algorithm. We conducted a systematic evaluation using Twitter streaming data and the results show that our classifiers are extremely effective, providing improvements ranging from 1% to 20%, when compared to the current state-of-the-art biased SVMs, being more than 120 times faster.
3 0.073797643 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System
Author: Wen-Tai Hsieh ; Chen-Ming Wu ; Tsun Ku ; Seng-cho T. Chou
Abstract: Social Event Radar is a new social networking-based service platform, that aim to alert as well as monitor any merchandise flaws, food-safety related issues, unexpected eruption of diseases or campaign issues towards to the Government, enterprises of any kind or election parties, through keyword expansion detection module, using bilingual sentiment opinion analysis tool kit to conclude the specific event social dashboard and deliver the outcome helping authorities to plan "risk control" strategy. With the rapid development of social network, people can now easily publish their opinions on the Internet. On the other hand, people can also obtain various opinions from others in a few seconds even though they do not know each other. A typical approach to obtain required information is to use a search engine with some relevant keywords. We thus take the social media and forum as our major data source and aim at collecting specific issues efficiently and effectively in this work.
4 0.058269478 160 acl-2012-Personalized Normalization for a Multilingual Chat System
Author: Ai Ti Aw ; Lian Hau Lee
Abstract: This paper describes the personalized normalization of a multilingual chat system that supports chatting in user defined short-forms or abbreviations. One of the major challenges for multilingual chat realized through machine translation technology is the normalization of non-standard, self-created short-forms in the chat message to standard words before translation. Due to the lack of training data and the variations of short-forms used among different social communities, it is hard to normalize and translate chat messages if user uses vocabularies outside the training data and create short-forms freely. We develop a personalized chat normalizer for English and integrate it with a multilingual chat system, allowing user to create and use personalized short-forms in multilingual chat. 1
5 0.053386636 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle
Author: Hao Wang ; Dogan Can ; Abe Kazemzadeh ; Francois Bar ; Shrikanth Narayanan
Abstract: This paper describes a system for real-time analysis of public sentiment toward presidential candidates in the 2012 U.S. election as expressed on Twitter, a microblogging service. Twitter has become a central site where people express their opinions and views on political parties and candidates. Emerging events or news are often followed almost instantly by a burst in Twitter volume, providing a unique opportunity to gauge the relation between expressed public sentiment and electoral events. In addition, sentiment analysis can help explore how these events affect public opinion. While traditional content analysis takes days or weeks to complete, the system demonstrated here analyzes sentiment in the entire Twitter traffic about the election, delivering results instantly and continuously. It offers the public, the media, politicians and scholars a new and timely perspective on the dynamics of the electoral process and public opinion. 1
6 0.047410905 65 acl-2012-Crowdsourcing Inference-Rule Evaluation
7 0.043533985 58 acl-2012-Coreference Semantics from Web Features
8 0.042026304 78 acl-2012-Efficient Search for Transformation-based Inference
9 0.040257309 59 acl-2012-Corpus-based Interpretation of Instructions in Virtual Environments
10 0.039565403 44 acl-2012-CSNIPER - Annotation-by-query for Non-canonical Constructions in Large Corpora
11 0.03943605 218 acl-2012-You Had Me at Hello: How Phrasing Affects Memorability
12 0.038914159 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents
13 0.03644849 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
14 0.035885293 216 acl-2012-Word Epoch Disambiguation: Finding How Words Change Over Time
15 0.03566644 142 acl-2012-Mining Entity Types from Query Logs via User Intent Modeling
16 0.035412811 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model
17 0.035054464 40 acl-2012-Big Data versus the Crowd: Looking for Relationships in All the Right Places
18 0.034880456 7 acl-2012-A Computational Approach to the Automation of Creative Naming
19 0.033747535 86 acl-2012-Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks
20 0.033148006 212 acl-2012-Using Search-Logs to Improve Query Tagging
topicId topicWeight
[(0, -0.119), (1, 0.053), (2, -0.005), (3, 0.023), (4, 0.007), (5, 0.055), (6, 0.035), (7, 0.041), (8, 0.009), (9, 0.042), (10, -0.021), (11, 0.055), (12, -0.027), (13, 0.026), (14, -0.004), (15, -0.026), (16, -0.024), (17, 0.027), (18, -0.044), (19, -0.039), (20, -0.029), (21, 0.03), (22, 0.059), (23, 0.037), (24, -0.013), (25, 0.028), (26, 0.084), (27, 0.06), (28, 0.078), (29, -0.044), (30, 0.015), (31, -0.084), (32, 0.055), (33, 0.105), (34, 0.078), (35, 0.024), (36, 0.015), (37, -0.015), (38, -0.078), (39, 0.041), (40, 0.099), (41, 0.002), (42, -0.018), (43, 0.008), (44, -0.071), (45, -0.024), (46, -0.04), (47, 0.084), (48, -0.053), (49, -0.077)]
simIndex simValue paperId paperTitle
same-paper 1 0.93201298 77 acl-2012-Ecological Evaluation of Persuasive Messages Using Google AdWords
Author: Marco Guerini ; Carlo Strapparava ; Oliviero Stock
Abstract: In recent years there has been a growing interest in crowdsourcing methodologies to be used in experimental research for NLP tasks. In particular, evaluation of systems and theories about persuasion is difficult to accommodate within existing frameworks. In this paper we present a new cheap and fast methodology that allows fast experiment building and evaluation with fully-automated analysis at a low cost. The central idea is exploiting existing commercial tools for advertising on the web, such as Google AdWords, to measure message impact in an ecological setting. The paper includes a description of the approach, tips for how to use AdWords for scientific research, and results of pilot experiments on the impact of affective text variations which confirm the effectiveness of the approach.
2 0.6669035 160 acl-2012-Personalized Normalization for a Multilingual Chat System
Author: Ai Ti Aw ; Lian Hau Lee
Abstract: This paper describes the personalized normalization of a multilingual chat system that supports chatting in user defined short-forms or abbreviations. One of the major challenges for multilingual chat realized through machine translation technology is the normalization of non-standard, self-created short-forms in the chat message to standard words before translation. Due to the lack of training data and the variations of short-forms used among different social communities, it is hard to normalize and translate chat messages if user uses vocabularies outside the training data and create short-forms freely. We develop a personalized chat normalizer for English and integrate it with a multilingual chat system, allowing user to create and use personalized short-forms in multilingual chat. 1
3 0.55063349 153 acl-2012-Named Entity Disambiguation in Streaming Data
Author: Alexandre Davis ; Adriano Veloso ; Altigran Soares ; Alberto Laender ; Wagner Meira Jr.
Abstract: The named entity disambiguation task is to resolve the many-to-many correspondence between ambiguous names and the unique realworld entity. This task can be modeled as a classification problem, provided that positive and negative examples are available for learning binary classifiers. High-quality senseannotated data, however, are hard to be obtained in streaming environments, since the training corpus would have to be constantly updated in order to accomodate the fresh data coming on the stream. On the other hand, few positive examples plus large amounts of unlabeled data may be easily acquired. Producing binary classifiers directly from this data, however, leads to poor disambiguation performance. Thus, we propose to enhance the quality of the classifiers using finer-grained variations of the well-known ExpectationMaximization (EM) algorithm. We conducted a systematic evaluation using Twitter streaming data and the results show that our classifiers are extremely effective, providing improvements ranging from 1% to 20%, when compared to the current state-of-the-art biased SVMs, being more than 120 times faster.
4 0.54790145 218 acl-2012-You Had Me at Hello: How Phrasing Affects Memorability
Author: Cristian Danescu-Niculescu-Mizil ; Justin Cheng ; Jon Kleinberg ; Lillian Lee
Abstract: Understanding the ways in which information achieves widespread public awareness is a research question of significant interest. We consider whether, and how, the way in which the information is phrased the choice of words and sentence structure — can affect this process. To this end, we develop an analysis framework and build a corpus of movie quotes, annotated with memorability information, in which we are able to control for both the speaker and the setting of the quotes. We find that there are significant differences between memorable and non-memorable quotes in several key dimensions, even after controlling for situational and contextual factors. One is lexical distinctiveness: in aggregate, memorable quotes use less common word choices, but at the same time are built upon a scaffolding of common syntactic patterns. Another is that memorable quotes tend to be more general in ways that make them easy to apply in new contexts — that is, more portable. — We also show how the concept of “memorable language” can be extended across domains. 1 Hello. My name is Inigo Montoya. Understanding what items will be retained in the public consciousness, and why, is a question of fundamental interest in many domains, including marketing, politics, entertainment, and social media; as we all know, many items barely register, whereas others catch on and take hold in many people’s minds. An active line of recent computational work has employed a variety of perspectives on this question. 892 Building on a foundation in the sociology of diffusion [27, 31], researchers have explored the ways in which network structure affects the way information spreads, with domains of interest including blogs [1, 11], email [37], on-line commerce [22], and social media [2, 28, 33, 38]. There has also been recent research addressing temporal aspects of how different media sources convey information [23, 30, 39] and ways in which people react differently to infor- mation on different topics [28, 36]. Beyond all these factors, however, one’s everyday experience with these domains suggests that the way in which a piece of information is expressed the choice of words, the way it is phrased might also have a fundamental effect on the extent to which it takes hold in people’s minds. Concepts that attain wide reach are often carried in messages such as political slogans, marketing phrases, or aphorisms whose language seems intuitively to be memorable, “catchy,” or otherwise compelling. Our first challenge in exploring this hypothesis is to develop a notion of “successful” language that is precise enough to allow for quantitative evaluation. We also face the challenge of devising an evaluation setting that separates the phrasing of a message from the conditions in which it was delivered highlycited quotes tend to have been delivered under compelling circumstances or fit an existing cultural, political, or social narrative, and potentially what appeals to us about the quote is really just its invocation of these extra-linguistic contexts. Is the form of the language adding an effect beyond or independent of these (obviously very crucial) factors? To — — — investigate the question, one needs a way of controlProce dJienjgus, R ofep thueb 5lic0t hof A Knonruea ,l M 8-e1e4ti Jnugly o f2 t0h1e2 A.s ?c so2c0ia1t2io Ans fso rc Ciatoiomnp fuotart Cio nmaplu Ltiantgiounisatlic Lsi,n pgaugiestsi8c 9s2–901, ling as much as possible for the role that the surrounding context of the language plays. 
— — The present work (i): Evaluating language-based memorability Defining what makes an utterance memorable is subtle, and scholars in several domains have written about this question. There is a rough consensus that an appropriate definition involves elements of both recognition people should be able to retain the quote and recognize it when they hear it invoked and production people should be motivated to refer to it in relevant situations [15]. One suggested reason for why some memes succeed is their ability to provoke emotions [16]. Alternatively, memorable quotes can be good for expressing the feelings, mood, or situation of an individual, a group, or a culture (the zeitgeist): “Certain quotes exquisitely capture the mood or feeling we wish to communicate to someone. We hear them ... and store them away for future use” [10]. None of these observations, however, serve as definitions, and indeed, we believe it desirable to — — — not pre-commit to an abstract definition, but rather to adopt an operational formulation based on external human judgments. In designing our study, we focus on a domain in which (i) there is rich use of language, some of which has achieved deep cultural penetration; (ii) there already exist a large number of external human judgments perhaps implicit, but in a form we can extract; and (iii) we can control for the setting in which the text was used. Specifically, we use the complete scripts of roughly 1000 movies, representing diverse genres, eras, and levels of popularity, and consider which lines are the most “memorable”. To acquire memorability labels, for each sentence in each script, we determine whether it has been listed as a “memorable quote” by users of the widely-known IMDb (the Internet Movie Database), and also estimate the number oftimes it appears on the Web. Both ofthese serve as memorability metrics for our purposes. When we evaluate properties of memorable quotes, we comparethemwithquotes thatarenotassessed as memorable, but were spoken by the same character, at approximately the same point in the same movie. This enables us to control in a fairly — fine-grained way for the confounding effects of context discussed above: we can observe differences 893 that persist even after taking into account both the speaker and the setting. In a pilot validation study, we find that human subjects are effective at recognizing the more IMDbmemorable of two quotes, even for movies they have not seen. This motivates a search for features intrinsic to the text of quotes that signal memorability. In fact, comments provided by the human subjects as part of the task suggested two basic forms that such textual signals could take: subjects felt that (i) memorable quotes often involve a distinctive turn of phrase; and (ii) memorable quotes tend to invoke general themes that aren’t tied to the specific setting they came from, and hence can be more easily invoked for future (out of context) uses. We test both of these principles in our analysis of the data. The present work (ii): What distinguishes memorable quotes Under the controlled-comparison setting sketched above, we find that memorable quotes exhibit significant differences from nonmemorable quotes in several fundamental respects, and these differences in the data reinforce the two main principles from the human pilot study. 
First, we show a concrete sense in which memorable quotes are indeed distinctive: with respect to lexical language models trained on the newswire portions of the Brown corpus [21], memorable quotes have significantly lower likelihood than their nonmemorable counterparts. Interestingly, this distinctiveness takes place at the level of words, but not at the level of other syntactic features: the part-ofspeech composition of memorable quotes is in fact more likely with respect to newswire. Thus, we can think of memorable quotes as consisting, in an aggregate sense, of unusual word choices built on a scaffolding of common part-of-speech patterns. We also identify a number of ways in which memorable quotes convey greater generality. In their patterns of verb tenses, personal pronouns, and determiners, memorable quotes are structured so as to be more “free-standing,” containing fewer markers that indicate references to nearby text. Memorable quotes differ in other interesting as- pects as well, such as sound distributions. Our analysis ofmemorable movie quotes suggests a framework by which the memorability of text in a range of different domains could be investigated. We provide evidence that such cross-domain properties may hold, guided by one of our motivating applications in marketing. In particular, we analyze a corpus of advertising slogans, and we show that these slogans have significantly greater likelihood at both the word level and the part-of-speech level with respect to a language model trained on memorable movie quotes, compared to a corresponding language model trained on non-memorable movie quotes. This suggests that some of the principles underlying memorable text have the potential to apply across different areas. Roadmap §2 lays the empirical foundations of our work: the design yasntdh ecerematpioirnic aofl our movie-quotes dataset, which we make publicly available (§2. 1), a pilot study cwhit hw ehu mmakaen subjects validating §I2M.1D),b abased memorability labels (§2.2), and further study bofa incorporating search-engine c2)o,u anntds (§2.3). §3 uddeytoafi lisn our analysis aenardc prediction experiments, using both movie-quotes data and, as an exploration of cross-domain applicability, slogans data. §4 surveys rcerloastse-dd owmoarkin across a variety goafn fsie dladtsa.. §5 briefly sruelmatmedar wizoesrk ka andcr ionsdsic aat veasr some ffuft uierled sd.ire §c5tio bnrsie. 2 I’m ready for my close-up. 2.1 Data To study the properties of memorable movie quotes, we need a source of movie lines and a designation of memorability. Following [8], we constructed a corpus consisting of all lines from roughly 1000 movies, varying in genre, era, and popularity; for each movie, we then extracted the list of quotes from IMDb’s Memorable Quotes page corresponding to the movie.1 A memorable quote in IMDb can appear either as an individual sentence spoken by one character, or as a multi-sentence line, or as a block of dialogue involving multiple characters. In the latter two cases, it can be hard to determine which particular portion is viewed as memorable (some involve a build-up to a punch line; others involve the follow-through after a well-phrased opening sentence), and so we focus in our comparisons on those memorable quotes that 1This extraction involved some edit-distance-based alignment, since the exact form of the line in the script can exhibit minor differences from the version typed into IMDb. 
rmotuqsfebmaNerolbm543281760 0 1234D5ecil678910 894 Figure 1: Location of memorable quotes in each decile of movie scripts (the first 10th, the second 10th, etc.), summed over all movies. The same qualitative results hold if we discard each movie’s very first and last line, which might have privileged status. appear as a single sentence rather than a multi-line block.2 We now formulate a task that we can use to evaluate the features of memorable quotes. Recall that our goal is to identify effects based in the language of the quotes themselves, beyond any factors arising from the speaker or context. Thus, for each (singlesentence) memorable quote M, we identify a nonmemorable quote that is as similar as possible to M in all characteristics but the choice of words. This means we want it to be spoken by the same character in the same movie. It also means that we want it to have the same length: controlling for length is important because we expect that on average, shorter quotes will be easier to remember than long quotes, and that wouldn’t be an interesting textual effect to report. Moreover, we also want to control for the fact that a quote’s position in a movie can affect memorability: certain scenes produce more memorable dialogue, and as Figure 1 demonstrates, in aggregate memorable quotes also occur disproportionately near the beginnings and especially the ends of movies. In summary, then, for each M, we pick a contrasting (single-sentence) quote N from the same movie that is as close in the script as possible to M (either before or after it), subject to the conditions that (i) M and N are uttered by the same speaker, (ii) M and N have the same number of words, and (iii) N does not occur in the IMDb list of memorable 2We also ran experiments relaxing the single-sentence assumption, which allows for stricter scene control and a larger dataset but complicates comparisons involving syntax. The non-syntax results were in line with those reported here. TaJSOMbtrclodekviTn1ra:eBTykhoPrwNenpmlxeasipFIHAeaithrclsfnitkaQeomuifltw’sdaveoitycmsnedoqatbuliocrkeytsl f.woEeimlanchguwspakyirdfsebavot;ilmsdfcoenti’dus.erx-citaINmSnrkeioamct:ohenwmardleytQ.howfeu t’yvrecp,o’gsmrtpuaosnmtyef o rtgnhqieuvrobt.pehasirtdeosfpykuern close together in the movie by the same while the other is not. (Contractions character, have the same length, and one is labeled memorable by the IMDb such as “it’s” count as two words.) quotes for the movie (either as a single line or as part of a larger block). Given such pairs, we formulate a pairwise comparison task: given M and N, determine which is the memorable quote. Psychological research on subjective evaluation [35], as well as initial experiments using ourselves as subjects, indicated that this pairwise set-up easier to work with than simply presenting a single sentence and asking whether it is memorable or not; the latter requires agreement on an “absolute” criterion for memorability that is very hard to impose consistently, whereas the former simply requires a judgment that one quote is more memorable than another. Our main dataset, available at http://www.cs. cornell.edu/∼cristian/memorability.html,3 thus consists of approximately 2200 such (M, N) pairs, separated by a median of 5 same-character lines in the script. The reader can get a sense for the nature of the data from the three examples in Table 1. 
We now discuss two further aspects to the formulation of the experiment: a preliminary pilot study involving human subjects, and the incorporation of search engine counts into the data. 2.2 Pilot study: Human performance As a preliminary consideration, we did a small pilot study to see if humans can distinguish memorable from non-memorable quotes, assuming our IMDBinduced labels as gold standard. Six subjects, all native speakers of English and none an author of this paper, were presented with 11 or 12 pairs of memorable vs. non-memorable quotes; again, we controlled for extra-textual effects by ensuring that in each pair the two quotes come from the same movie, are by the same character, have the same length, and 3Also available there: other examples and factoids. 895 Table 2: Human pilot study: number of matches to IMDb-induced annotation, ordered by decreasing match percentage. For the null hypothesis of random guessing, these results are statistically significant, p < 2−6 ≈ .016. appear as nearly as possible in the same scene.4 The order of quotes within pairs was randomized. Importantly, because we wanted to understand whether the language of the quotes by itself contains signals about memorability, we chose quotes from movies that the subjects said they had not seen. (This means that each subject saw a different set of quotes.) Moreover, the subjects were requested not to consult any external sources of information.5 The reader is welcome to try a demo version of the task at http: //www.cs.cornell.edu/∼cristian/memorability.html. Table 2 shows that all the subjects performed (sometimes much) better than chance, and against the null hypothesis that all subjects are guessing randomly, the results are statistically significant, p < 2−6 ≈ .016. These preliminary findings provide evidenc≈e f.0or1 t6h.e T validity eolifm our traysk fi:n despite trohev apparent difficulty of the job, even humans who haven’t seen the movie in question can recover our IMDb4In this pilot study, we allowed multi-sentence quotes. 5We did not use crowd-sourcing because we saw no way to ensure that this condition would be obeyed by arbitrary subjects. We do note, though, that after our research was completed and as of Apr. 26, 2012, ≈ 11,300 people completed the online test: average accuracy: 27,2 ≈%, 1 1m,3o0d0e npueompbleer c coomrrpelcett:e d9 t/1he2. induced labels with some reliability.6 2.3 Incorporating search engine counts Thus far we have discussed a dataset in which memorability is determined through an explicit labeling drawn from the IMDb. Given the “production” aspect of memorability discussed in § 1, we stihoonu”ld a saplesoc expect tmhaotr mabeimlityora dbislce quotes nw §il1l ,te wnde to appear more extensively on Web pages than nonmemorable quotes; note that incorporating this insight makes it possible to use the (implicit) judgments of a much larger number of people than are represented by the IMDb database. It therefore makes sense to try using search-engine result counts as a second indication of memorability. We experimented with several ways of constructing memorability information from search-engine counts, but this proved challenging. Searching for a quote as a stand-alone phrase runs into the problem that a number of quotes are also sentences that people use without the movie in mind, and so high counts for such quotes do not testify to the phrase’s status as a memorable quote from the movie. 
On the other hand, searching for the quote in a Boolean conjunction with the movie’s title discards most of these uses, but also eliminates a large fraction of the appearances on the Web that we want to find: precisely because memorable quotes tend to have widespread cultural usage, people generally don’t feel the need to include the movie’s title when invoking them. Finally, since we are dealing with roughly 1000 movies, the result counts vary over an enormous range, from recent blockbusters to movies with relatively small fan bases. In the end, we found that it was more effective to use the result counts in conjunction with the IMDb labels, so that the counts played the role of an additional filter rather than a free-standing numerical value. Thus, for each pair (M, N) produced using the IMDb methodology above, we searched for each of M and N as quoted expressions in a Boolean conjunction with the title of the movie. We then kept only those pairs for which M (i) produced more than five results in our (quoted, conjoined) search, and (ii) produced at least twice as many results as the cor6The average accuracy being below 100% reinforces that context is very important, too. 896 responding search for N. We created a version of this filtered dataset using each of Google and Bing, and all the main findings were consistent with the results on the IMDb-only dataset. Thus, in what follows, we will focus on the main IMDb-only dataset, discussing the relationship to the dataset filtered by search engine counts where relevant (in which case we will refer to the +Google dataset). 3 Never send a human to do a machine’s job. We now discuss experiments that investigate the hypotheses discussed in §1. In particular, we devise pmoetthheosdess t dhiastc can assess 1th.e Idnis ptianrcttiicvuelnaer,ss w aend d generality hypotheses and test whether there exists a notion of “memorable language” that operates across domains. In addition, we evaluate and compare the predictive power of these hypotheses. 3.1 Distinctiveness One of the hypotheses we examine is whether the use of language in memorable quotes is to some extent unusual. In order to quantify the level of distinctiveness of a quote, we take a language-model approach: we model “common language” using the newswire sections of the Brown corpus [21]7, and evaluate how distinctive a quote is by evaluating its likelihood with respect to this model the lower the likelihood, the more distinctive. In order to assess different levels of lexical and syntactic distinctiveness, we employ a total of six Laplacesmoothed8 language models: 1-gram, 2-gram, and — 3-gram word LMs and 1-gram, 2-gram and 3-gram LMs. We find strong evidence that from a lexical perspective, memorable quotes are more distinctive than their non-memorable counterparts. As indicated in Table 3, for each of our lexical “common language” models, in about 60% of the quote pairs, the memorable quote is more distinctive. Interestingly, the reverse is true when it comes to part-of-speech9 7Results were qualitatively similar if we used the fiction portions. The age of the Brown corpus makes it less likely to contain modern movie quotes. 8We employ Laplace (additive) smoothing with a smoothing parameter of 0.2. The language models’ vocabulary was that of the entire training corpus. 9Throughout we obtain part-of-speech tags by using the NLTK maximum entropy tagger with default parameters. 
in which the the memorable quote is more distinctive than the non-memorable one according to the respective “common language” model. Significance according to a two-tailed sign test is indicated using *-notation (∗∗∗=“p<.001”). syntax: memorable quotes appear to follow the syntactic patterns of “common language” as closely as or more closely than non-memorable quotes. Together, these results suggest that memorable quotes consist of unusual word sequences built on common syntactic scaffolding. 3.2 Generality Another of our hypotheses is that memorable quotes are easier to use outside the specific context in which they were uttered that is, more “portable” and therefore exhibit fewer terms that refer to those settings. We use the following syntactic properties as proxies for the generality of a quote: • Fewer 3rd-person pronouns, since these commonly r 3efer to a person or object that was introduced earlier in the discourse. Utterances that employ fewer such pronouns are easier to adapt to new contexts, and so will be considered more — — general. • More indefinite articles like a and an, since they are more likely ttioc lreesfer li ktoe general concepts than definite articles. Quotes with more indefinite articles will be considered more general. Fewer past tense verbs and more present tFeenwsee verbs, tseinncsee t vheer bfosrm aenrd are more likely to refer to specific previous events. Therefore utterances that employ fewer past tense verbs (and more present tense verbs) will be considered more general. Table 4 gives the results for each of these four metrics in each case, we show the percentage of • — 897 TalfmGebowsnre4pa:in3srGldet sypfne.msrate.lripnctysoe: purncsetaI56gM47e.326D9o710bf% -qo∗u n∗l tyepa+56iG892rs.o7i364ng% wl∗ eh∗i ch the memorable quote is more general than the non- memorable ones according to the respective metric. Pairs where the metric does not distinguish between the quotes are not considered. quote pairs for which the memorable quote scores better on the generality metric. Note that because the issue of generality is a complex one for which there is no straightforward single metric, our approach here is based on several proxies for generality, considered independently; yet, as the results show, all of these point in a consistent direction. It is an interesting open question to develop richer ways of assessing whether a quote has greater generality, in the sense that people intuitively attribute to memorable quotes. 3.3 “Memorable” language beyond movies One of the motivating questions in our analysis is whether there are general principles underlying “memorable language.” The results thus far suggest potential families of such principles. A further question in this direction is whether the notion of memorability can be extended across different domains, and for this we collected (and distribute on our website) 431 phrases that were explicitly designed to be memorable: advertising slogans (e.g., “Quality never goes out of style.”). The focus on slogans is also in keeping with one of the initial motivations in studying memorability, namely, marketing applications in other words, assessing whether a proposed slogan has features that are consistent with memorable text. The fact that it’s not clear how to construct a collection of “non-memorable” counterparts to slogans appears to pose a technical challenge. 
3.3 “Memorable” language beyond movies

One of the motivating questions in our analysis is whether there are general principles underlying “memorable language.” The results thus far suggest potential families of such principles. A further question in this direction is whether the notion of memorability can be extended across different domains, and for this we collected (and distribute on our website) 431 phrases that were explicitly designed to be memorable: advertising slogans (e.g., “Quality never goes out of style.”). The focus on slogans is also in keeping with one of the initial motivations in studying memorability, namely marketing applications; in other words, assessing whether a proposed slogan has features that are consistent with memorable text. It is not clear how to construct a collection of “non-memorable” counterparts to slogans, which poses a technical challenge. However, we can still use a language-modeling approach to assess whether the textual properties of the slogans are closer to the memorable movie quotes (as one would conjecture) or to the non-memorable movie quotes. Specifically, we train one language model on memorable quotes and another on non-memorable quotes, and compare how likely each slogan is to be produced according to these two models. As shown in the middle column of Table 5, we find that slogans are better predicted both lexically and syntactically by the former model. This result thus offers evidence for a concept of “memorable language” that can be applied beyond a single domain.

We also note that the higher likelihood of slogans under a “memorable language” model is not simply occurring for the trivial reason that this model predicts all other large bodies of text better. In particular, the newswire section of the Brown corpus is predicted better at the lexical level by the language model trained on non-memorable quotes. Finally, Table 6 shows that slogans employ general language, in the sense that for each of our generality metrics, we see a slogans / memorable-quotes / non-memorable-quotes spectrum.

Table 5: percentage of slogans that have higher likelihood under the memorable-language model than under the non-memorable one (for each of the six language models considered). Rightmost column: for reference, the percentage of newswire sentences that have higher likelihood under the memorable-language model than under the non-memorable one.

Table 6: generality of slogans compared to memorable and non-memorable quotes (percentages of 3rd-person pronouns and indefinite articles are relative to all tokens; percentages of past tense are relative to all past and present verbs).

3.4 Prediction task

We now show how the principles discussed above can provide features for a basic prediction task, corresponding to the task in our human pilot study: given a pair of quotes, identify the memorable one. Our first formulation of the prediction task uses a standard bag-of-words model10. If there were no information in the textual content of a quote to determine whether it were memorable, then an SVM employing bag-of-words features should perform no better than chance. Instead, though, it obtains 59.67% (10-fold cross-validation) accuracy, as shown in Table 7. We then develop models using features based on the measures formulated earlier in this section: generality measures (the four listed in Table 4); distinctiveness measures (likelihood according to 1-, 2-, and 3-gram “common language” models at the lexical and part-of-speech level for each quote in the pair, their differences, and pairwise comparisons between them); and similarity-to-slogans measures (likelihood according to 1-, 2-, and 3-gram slogan-language models at the lexical and part-of-speech level for each quote in the pair, their differences, and pairwise comparisons between them). Even a relatively small number of distinctiveness features, on their own, improves significantly over the much larger bag-of-words model. When we include additional features based on generality and language-model features measuring similarity to slogans, the performance improves further (last line of Table 7).
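A hedged sketch of the pairwise prediction setup: each (memorable, non-memorable) pair is turned into a difference of feature vectors and fed to a linear SVM. The difference-vector encoding, the scikit-learn classifier, and the toy quotes are illustrative assumptions; the paper's actual feature sets (Table 7) additionally include the distinctiveness, generality, and slogan-similarity measures described above.

```python
# Sketch: pairwise "which quote is memorable?" prediction with bag-of-words.
# The pair is encoded as the difference between the two quotes' feature vectors;
# this encoding and the toy data are assumptions, not the authors' exact setup.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# toy stand-ins for (memorable, non-memorable) quote pairs
pairs = [
    ("I'll be back.", "I will return later this afternoon."),
    ("May the Force be with you.", "Good luck out there on your mission."),
    ("You can't handle the truth!", "I don't think you want to hear this."),
    ("Houston, we have a problem.", "Something seems to have gone wrong."),
]

# min_df=1 for the toy data; the paper discards terms appearing fewer than 10 times
vectorizer = CountVectorizer(min_df=1)
vectorizer.fit([q for pair in pairs for q in pair])

X, y = [], []
for memorable, non_memorable in pairs:
    m = vectorizer.transform([memorable]).toarray()[0]
    n = vectorizer.transform([non_memorable]).toarray()[0]
    # one training instance per ordering, so the two classes stay balanced
    X.append(m - n); y.append(1)
    X.append(n - m); y.append(0)

clf = LinearSVC()
scores = cross_val_score(clf, np.array(X), np.array(y), cv=2)  # paper uses 10-fold
print(scores.mean())
```

With real data, the distinctiveness, generality, and slogan-likelihood scores for each quote (and their differences) would simply be appended to each instance's feature vector.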
Thus, the main conclusion from these prediction tasks is that abstracting notions such as distinctiveness and generality can produce relatively streamlined models that outperform much heavier-weight bag-of-words models, and can suggest steps toward approaching the performance of human judges who, very much unlike our system, have the full cultural context in which movies occur at their disposal.

3.5 Other characteristics

We also made some auxiliary observations that may be of interest. Specifically, we find differences in letter and sound distribution (e.g., memorable quotes, after curse-word removal, use significantly more “front sounds” (labials or front vowels such as those represented by the letter i) and significantly fewer “back sounds” such as the one represented by u),11 word complexity (e.g., memorable quotes use words with significantly more syllables) and phrase complexity (e.g., memorable quotes use fewer coordinating conjunctions). The latter two are in line with our distinctiveness hypothesis.

10 We discarded terms appearing fewer than 10 times.
11 These findings may relate to marketing research on sound symbolism [7, 19, 40].

Table 7: prediction accuracy (10-fold cross-validation) of SVM classifiers using the respective feature sets. Random baseline accuracy is 50%. Accuracies statistically significantly greater than bag-of-words according to a two-tailed t-test are indicated with *(p<.05) and **(p<.01).

4 A long time ago, in a galaxy far, far away

How an item’s linguistic form affects the reaction it generates has been studied in several contexts, including evaluations of product reviews [9], political speeches [12], on-line posts [13], scientific papers [14], and retweeting of Twitter posts [36]. We use a different set of features, abstracting the notions of distinctiveness and generality, in order to focus on these higher-level aspects of phrasing rather than on particular lower-level features. Related to our interest in distinctiveness, work in advertising research has studied the effect of syntactic complexity on recognition and recall of slogans [5, 6, 24]. There may also be connections to Von Restorff’s isolation effect (Hunt [17]), which asserts that when all but one item in a list are similar in some way, memory for the different item is enhanced. Related to our interest in generality, Knapp et al. [20] surveyed subjects regarding memorable messages or pieces of advice they had received, finding that the ability to be applied to multiple concrete situations was an important factor.

Memorability, although distinct from “memorizability”, relates to short- and long-term recall. Thorn and Page [34] survey sub-lexical, lexical, and semantic attributes affecting short-term memorability of lexical items. Studies of verbatim recall have also considered the task of distinguishing an exact quote from close paraphrases [3]. Investigations of long-term recall have included studies of culturally significant passages of text [29] and findings regarding the effect of rhetorical devices such as alliteration [4] and “rhythmic, poetic, and thematic constraints” [18, 26]. Finally, there are complex connections between humor and memory [32], which may lead to interactions with computational humor recognition [25].

5 I think this is the beginning of a beautiful friendship.

Motivated by the broad question of what kinds of information achieve widespread public awareness, we studied the effect of phrasing on a quote’s memorability.
A challenge is that quotes differ not only in how they are worded, but also in who said them and under what circumstances; to deal with this difficulty, we constructed a controlled corpus of movie quotes in which lines deemed memorable are paired with non-memorable lines spoken by the same character at approximately the same point in the same movie. After controlling for context and situation, memorable quotes were still found to exhibit, on average (there will always be individual exceptions), significant differences from non-memorable quotes in several important respects, including measures capturing distinctiveness and generality. Our experiments with slogans show how the principles we identify can extend to a different domain.

Future work may lead to applications in marketing, advertising and education [4]. Moreover, the subtle nature of memorability, and its connection to research in psychology, suggests a range of further research directions. We believe that the framework developed here can serve as the basis for further computational studies of the process by which information takes hold in the public consciousness, and the role that language effects play in this process.

My mother thanks you. My father thanks you. My sister thanks you. And I thank you: Rebecca Hwa, Evie Kleinberg, Diana Minculescu, Alex Niculescu-Mizil, Jennifer Smith, Benjamin Zimmer, and the anonymous reviewers for helpful discussions and comments; our annotators Steven An, Lars Backstrom, Eric Baumer, Jeff Chadwick, Evie Kleinberg, and Myle Ott; and the makers of Cepacol, Robitussin, and Sudafed, whose products got us through the submission deadline. This paper is based upon work supported in part by NSF grants IIS-0910664, IIS-1016099, Google, and Yahoo!

References

[1] Eytan Adar, Li Zhang, Lada A. Adamic, and Rajan M. Lukose. Implicit structure and the dynamics of blogspace. In Workshop on the Weblogging Ecosystem, 2004.
[2] Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. Group formation in large social networks: Membership, growth, and evolution. In Proceedings of KDD, 2006.
[3] Elizabeth Bates, Walter Kintsch, Charles R. Fletcher, and Vittoria Giuliani. The role of pronominalization and ellipsis in texts: Some memory experiments. Journal of Experimental Psychology: Human Learning and Memory, 6(6):676–691, 1980.
[4] Frank Boers and Seth Lindstromberg. Finding ways to make phrase-learning feasible: The mnemonic effect of alliteration. System, 33(2):225–238, 2005.
[5] Samuel D. Bradley and Robert Meeds. Surface-structure transformations and advertising slogans: The case for moderate syntactic complexity. Psychology and Marketing, 19:595–619, 2002.
[6] Robert Chamblee, Robert Gilmore, Gloria Thomas, and Gary Soldow. When copy complexity can help ad readership. Journal of Advertising Research, 33(3):23–23, 1993.
[7] John Colapinto. Famous names. The New Yorker, pages 38–43, 2011.
[8] Cristian Danescu-Niculescu-Mizil and Lillian Lee. Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, 2011.
[9] Cristian Danescu-Niculescu-Mizil, Gueorgi Kossinets, Jon Kleinberg, and Lillian Lee. How opinions are received by online communities: A case study on Amazon.com helpfulness votes. In Proceedings of WWW, pages 141–150, 2009.
[10] Stuart Fischoff, Esmeralda Cardenas, Angela Hernandez, Korey Wyatt, Jared Young, and Rachel Gordon.
Popular movie quotes: Reflections of a people and a culture. In Annual Convention of the American Psychological Association, 2000.
[11] Daniel Gruhl, R. Guha, David Liben-Nowell, and Andrew Tomkins. Information diffusion through blogspace. In Proceedings of WWW, pages 491–501, 2004.
[12] Marco Guerini, Carlo Strapparava, and Oliviero Stock. Trusting politicians’ words (for persuasive NLP). In Proceedings of CICLing, pages 263–274, 2008.
[13] Marco Guerini, Carlo Strapparava, and Gözde Özbal. Exploring text virality in social networks. In Proceedings of ICWSM (poster), 2011.
[14] Marco Guerini, Alberto Pepe, and Bruno Lepri. Do linguistic style and readability of scientific abstracts affect their virality? In Proceedings of ICWSM, 2012.
[15] Richard Jackson Harris, Abigail J. Werth, Kyle E. Bures, and Chelsea M. Bartel. Social movie quoting: What, why, and how? Ciencias Psicologicas, 2(1):35–45, 2008.
[16] Chip Heath, Chris Bell, and Emily Steinberg. Emotional selection in memes: The case of urban legends. Journal of Personality, 81(6):1028–1041, 2001.
[17] R. Reed Hunt. The subtlety of distinctiveness: What von Restorff really did. Psychonomic Bulletin & Review, 2(1):105–112, 1995.
[18] Ira E. Hyman Jr. and David C. Rubin. Memorabeatlia: A naturalistic study of long-term memory. Memory & Cognition, 18(2):205–214, 1990.
[19] Richard R. Klink. Creating brand names with meaning: The use of sound symbolism. Marketing Letters, 11(1):5–20, 2000.
[20] Mark L. Knapp, Cynthia Stohl, and Kathleen K. Reardon. “Memorable” messages. Journal of Communication, 31(4):27–41, 1981.
[21] Henry Kučera and W. Nelson Francis. Computational analysis of present-day American English. Dartmouth Publishing Group, 1967.
[22] Jure Leskovec, Lada Adamic, and Bernardo Huberman. The dynamics of viral marketing. ACM Transactions on the Web, 1(1), May 2007.
[23] Jure Leskovec, Lars Backstrom, and Jon Kleinberg. Meme-tracking and the dynamics of the news cycle. In Proceedings of KDD, pages 497–506, 2009.
[24] Tina M. Lowrey. The relation between script complexity and commercial memorability. Journal of Advertising, 35(3):7–15, 2006.
[25] Rada Mihalcea and Carlo Strapparava. Learning to laugh (automatically): Computational models for humor recognition. Computational Intelligence, 22(2):126–142, 2006.
[26] Milman Parry and Adam Parry. The making of Homeric verse: The collected papers of Milman Parry. Clarendon Press, Oxford, 1971.
[27] Everett Rogers. Diffusion of Innovations. Free Press, fourth edition, 1995.
[28] Daniel M. Romero, Brendan Meeder, and Jon Kleinberg. Differences in the mechanics of information diffusion across topics: Idioms, political hashtags, and complex contagion on Twitter. In Proceedings of WWW, pages 695–704, 2011.
[29] David C. Rubin. Very long-term memory for prose and verse. Journal of Verbal Learning and Verbal Behavior, 16(5):611–621, 1977.
[30] Nathan Schneider, Rebecca Hwa, Philip Gianfortoni, Dipanjan Das, Michael Heilman, Alan W. Black, Frederick L. Crabbe, and Noah A. Smith. Visualizing topical quotations over time to understand news discourse. Technical Report CMU-LTI-01-103, CMU, 2010.
[31] David Strang and Sarah Soule. Diffusion in organizations and social movements: From hybrid corn to poison pills. Annual Review of Sociology, 24:265–290, 1998.
[32] Hannah Summerfelt, Louis Lippman, and Ira E. Hyman Jr. The effect of humor on memory: Constrained by the pun. The Journal of General Psychology, 137(4), 2010.
[33] Eric Sun, Itamar Rosenn, Cameron Marlow, and Thomas M. Lento. Gesundheit! Modeling contagion through Facebook News Feed. In Proceedings of ICWSM, 2009.
[34] Annabel Thorn and Mike Page. Interactions Between Short-Term and Long-Term Memory in the Verbal Domain. Psychology Press, 2009.
[35] Louis L. Thurstone. A law of comparative judgment. Psychological Review, 34(4):273–286, 1927.
[36] Oren Tsur and Ari Rappoport. What’s in a Hashtag? Content based prediction of the spread of ideas in microblogging communities. In Proceedings of WSDM, 2012.
[37] Fang Wu, Bernardo A. Huberman, Lada A. Adamic, and Joshua R. Tyler. Information flow in social groups. Physica A: Statistical and Theoretical Physics, 337(1-2):327–335, 2004.
[38] Shaomei Wu, Jake M. Hofman, Winter A. Mason, and Duncan J. Watts. Who says what to whom on Twitter. In Proceedings of WWW, 2011.
[39] Jaewon Yang and Jure Leskovec. Patterns of temporal variation in online media. In Proceedings of WSDM, 2011.
[40] Eric Yorkston and Geeta Menon. A sound idea: Phonetic effects of brand names on consumer judgments. Journal of Consumer Research, 31(1):43–51, 2004.
5 0.54274273 2 acl-2012-A Broad-Coverage Normalization System for Social Media Language
Author: Fei Liu ; Fuliang Weng ; Xiao Jiang
Abstract: Social media language contains huge amount and wide variety of nonstandard tokens, created both intentionally and unintentionally by the users. It is of crucial importance to normalize the noisy nonstandard tokens before applying other NLP techniques. A major challenge facing this task is the system coverage, i.e., for any user-created nonstandard term, the system should be able to restore the correct word within its top n output candidates. In this paper, we propose a cognitively-driven normalization system that integrates different human perspectives in normalizing the nonstandard tokens, including the enhanced letter transformation, visual priming, and string/phonetic similarity. The system was evaluated on both word- and message-level using four SMS and Twitter data sets. Results show that our system achieves over 90% word-coverage across all data sets (a 10% absolute increase compared to state-of-the-art); the broad word-coverage can also successfully translate into message-level performance gain, yielding 6% absolute increase compared to the best prior approach.
6 0.53035653 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool
7 0.49570808 7 acl-2012-A Computational Approach to the Automation of Creative Naming
8 0.49093473 44 acl-2012-CSNIPER - Annotation-by-query for Non-canonical Constructions in Large Corpora
9 0.48382992 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System
10 0.44893944 6 acl-2012-A Comprehensive Gold Standard for the Enron Organizational Hierarchy
11 0.43812075 70 acl-2012-Demonstration of IlluMe: Creating Ambient According to Instant Message Logs
12 0.41998032 164 acl-2012-Private Access to Phrase Tables for Statistical Machine Translation
13 0.41600046 186 acl-2012-Structuring E-Commerce Inventory
14 0.40952212 195 acl-2012-The Creation of a Corpus of English Metalanguage
15 0.40747696 68 acl-2012-Decoding Running Key Ciphers
16 0.39582008 142 acl-2012-Mining Entity Types from Query Logs via User Intent Modeling
17 0.38915181 39 acl-2012-Beefmoves: Dissemination, Diversity, and Dynamics of English Borrowings in a German Hip Hop Forum
18 0.3870261 189 acl-2012-Syntactic Annotations for the Google Books NGram Corpus
19 0.37779388 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
20 0.36166415 197 acl-2012-Tokenization: Returning to a Long Solved Problem A Survey, Contrastive Experiment, Recommendations, and Toolkit
topicId topicWeight
[(25, 0.027), (26, 0.038), (28, 0.048), (30, 0.043), (37, 0.032), (39, 0.07), (48, 0.013), (55, 0.267), (59, 0.017), (74, 0.036), (82, 0.015), (84, 0.036), (85, 0.039), (90, 0.097), (92, 0.054), (94, 0.012), (99, 0.071)]
simIndex simValue paperId paperTitle
same-paper 1 0.76671624 77 acl-2012-Ecological Evaluation of Persuasive Messages Using Google AdWords
2 0.7341308 207 acl-2012-Unsupervised Morphology Rivals Supervised Morphology for Arabic MT
Author: David Stallard ; Jacob Devlin ; Michael Kayser ; Yoong Keok Lee ; Regina Barzilay
Abstract: If unsupervised morphological analyzers could approach the effectiveness of supervised ones, they would be a very attractive choice for improving MT performance on low-resource inflected languages. In this paper, we compare performance gains for state-of-the-art supervised vs. unsupervised morphological analyzers, using a state-of-the-art Arabic-to-English MT system. We apply maximum marginal decoding to the unsupervised analyzer, and show that this yields the best published segmentation accuracy for Arabic, while also making segmentation output more stable. Our approach gives an 18% relative BLEU gain for Levantine dialectal Arabic. Furthermore, it gives higher gains for Modern Standard Arabic (MSA), as measured on NIST MT-08, than does MADA (Habash and Rambow, 2005), a leading supervised MSA segmenter.
Author: Patrick Simianer ; Stefan Riezler ; Chris Dyer
Abstract: With a few exceptions, discriminative training in statistical machine translation (SMT) has been content with tuning weights for large feature sets on small development data. Evidence from machine learning indicates that increasing the training sample size results in better prediction. The goal of this paper is to show that this common wisdom can also be brought to bear upon SMT. We deploy local features for SCFG-based SMT that can be read off from rules at runtime, and present a learning algorithm that applies ℓ1/ℓ2 regularization for joint feature selection over distributed stochastic learning processes. We present experiments on learning on 1.5 million training sentences, and show significant improvements over tuning discriminative models on small development sets.
4 0.52097982 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
Author: Gerard de Melo ; Gerhard Weikum
Abstract: We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.
5 0.50776392 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle
Author: Hao Wang ; Dogan Can ; Abe Kazemzadeh ; Francois Bar ; Shrikanth Narayanan
Abstract: This paper describes a system for real-time analysis of public sentiment toward presidential candidates in the 2012 U.S. election as expressed on Twitter, a microblogging service. Twitter has become a central site where people express their opinions and views on political parties and candidates. Emerging events or news are often followed almost instantly by a burst in Twitter volume, providing a unique opportunity to gauge the relation between expressed public sentiment and electoral events. In addition, sentiment analysis can help explore how these events affect public opinion. While traditional content analysis takes days or weeks to complete, the system demonstrated here analyzes sentiment in the entire Twitter traffic about the election, delivering results instantly and continuously. It offers the public, the media, politicians and scholars a new and timely perspective on the dynamics of the electoral process and public opinion.
6 0.50681847 191 acl-2012-Temporally Anchored Relation Extraction
7 0.50590986 29 acl-2012-Assessing the Effect of Inconsistent Assessors on Summarization Evaluation
8 0.50548112 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
9 0.50464022 139 acl-2012-MIX Is Not a Tree-Adjoining Language
10 0.50422555 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
11 0.50375593 40 acl-2012-Big Data versus the Crowd: Looking for Relationships in All the Right Places
12 0.50347203 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool
13 0.50302809 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling
14 0.50216329 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
15 0.50177521 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
16 0.50160927 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models
17 0.50091946 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model
18 0.50007141 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars
19 0.49981311 62 acl-2012-Cross-Lingual Mixture Model for Sentiment Classification
20 0.49973378 184 acl-2012-String Re-writing Kernel