acl acl2013 acl2013-30 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Cristian Danescu-Niculescu-Mizil ; Moritz Sudhof ; Dan Jurafsky ; Jure Leskovec ; Christopher Potts
Abstract: We propose a computational framework for identifying linguistic aspects of politeness. Our starting point is a new corpus of requests annotated for politeness, which we use to evaluate aspects of politeness theory and to uncover new interactions between politeness markers and context. These findings guide our construction of a classifier with domain-independent lexical and syntactic features operationalizing key components of politeness theory, such as indirection, deference, impersonalization and modality. Our classifier achieves close to human performance and is effective across domains. We use our framework to study the relationship between politeness and social power, showing that polite Wikipedia editors are more likely to achieve high status through elections, but, once elevated, they become less polite. We see a similar negative correlation between politeness and power on Stack Exchange, where users at the top of the reputation scale are less polite than those at the bottom. Finally, we apply our classifier to a preliminary analysis of politeness variation by gender and community.
Reference: text
sentIndex sentText sentNum sentScore
1 Our starting point is a new corpus of requests annotated for politeness, which we use to evaluate aspects of politeness theory and to uncover new interactions between politeness markers and context. [sent-4, score-2.088]
2 These findings guide our construction of a classifier with domain-independent lexical and syntactic features operationalizing key components of politeness theory, such as indirection, deference, impersonalization and modality. [sent-5, score-0.907]
3 We use our framework to study the relationship between politeness and social power, showing that polite Wikipedia editors are more likely to achieve high status through elections, but, once elevated, they become less polite. [sent-7, score-0.433]
4 We see a similar negative correlation between politeness and power on Stack Exchange, where users at the top of the reputation scale are less polite than those at the bottom. [sent-8, score-1.271]
5 Finally, we apply our classifier to a preliminary analysis of politeness variation by gender and community. [sent-9, score-0.895]
6 Natural languages provide numerous and diverse means for encoding politeness and, in conversation, we constantly make choices about where and how to use these devices. [sent-11, score-0.88]
7 Kaplan (1999) observes that “people desire to be paid respect” and identifies honorifics and other politeness markers, like please, … [sent-12, score-0.894]
8 In turn, politeness markers are intimately related to the power dynamics of social interactions and are often a decisive factor in whether those interactions go well or poorly (Gyasi Obeng, 1997; Chilton, 1990; Andersson and Pearson, 1999; Rogers and LeeWong, 2003; Holmes and Stubbe, 2005). [sent-14, score-1.079]
9 The present paper develops a computational framework for identifying and characterizing politeness marking in requests. [sent-15, score-0.892]
10 We focus on requests because they involve the speaker imposing on the addressee, making them ideal for exploring the social value of politeness strategies (Clark and Schunk, 1980; Francik and Clark, 1985). [sent-16, score-1.26]
11 Our investigation is guided by a new corpus of requests annotated for politeness. [sent-18, score-0.27]
12 The data come from two large online communities in which members frequently make requests of other members: Wikipedia, where the requests involve editing and other administrative functions, and Stack Exchange, where the requests center around a diverse range of topics (e. [sent-19, score-0.841]
13 The corpus confirms the broad outlines of linguistic theories of politeness pioneered by Brown and Levinson (1987), but it also reveals new interactions between politeness markings and the morphosyntactic context. [sent-22, score-1.804]
14 For example, the politeness of please depends on its syntactic position and the politeness markers it co-occurs with. [sent-23, score-1.839]
15 Using this corpus, we construct a politeness classifier with a wide range of domain-independent lexical, sentiment, and dependency features operationalizing key components of politeness theory. (Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9, 2013.) [sent-24, score-0.907]
16 The classifier achieves near human-level accuracy across domains, which highlights the consistent nature of politeness strategies and paves the way to using the classifier to study new data. [sent-27, score-0.933]
17 Politeness theory predicts a negative correlation between politeness and the power of the requester, where power is broadly construed to include social status, authority, and autonomy (Brown and Levinson, 1987). [sent-28, score-1.068]
18 The greater the speaker’s power relative to her addressee, the less polite her requests are expected to be: there is no need for her to incur the expense of paying respect, and failing to make such payments can invoke, and hence reinforce, her power. [sent-29, score-0.591]
19 We support this prediction by applying our politeness framework to Wikipedia and Stack Exchange, both of which provide independent measures of social status. [sent-30, score-0.951]
20 We show that polite Wikipedia editors are more likely to achieve high status through elections; however, once elected, they become less polite. [sent-31, score-0.329]
21 Similarly, on Stack Exchange, we find that users at the top of the reputation scale are less polite than those at the bottom. [sent-32, score-0.325]
22 Finally, we briefly address the question of how politeness norms vary across communities and social groups. [sent-33, score-0.982]
23 Our findings confirm established results about the relationship between politeness and gender, and they identify substantial variation in politeness across different programming language subcommunities on Stack Exchange. [sent-34, score-1.773]
24 Requests in online communities. We base our analysis on two online communities where requests have an important role: the Wikipedia community of editors and the Stack Exchange question-answer community. [sent-36, score-0.377]
25 On Stack Exchange, users often comment on existing posts requesting further information or proposing edits; these requests are generally directed to the authors of the original posts. [sent-41, score-0.29]
26 Both communities are not only rich in user-to-user requests, but these requests are also part of consequential conversations, not empty social banter; they solicit specific information or concrete actions, and they expect a response. [sent-42, score-0.372]
27 We therefore label a large portion of our request data (over 10,000 utterances) using Amazon Mechanical Turk (AMT), creating the largest corpus with politeness annotations (see Table 1 for details). [sent-44, score-0.934]
28 We choose to annotate requests containing exactly two sentences, where the second sentence is the actual request (and ends with a question mark). [sent-45, score-0.324]
29 Each annotator was instructed to read a batch of 13 requests and consider them as originating from a co-worker by email. [sent-47, score-0.27]
30 For each request, the annotator had to indicate how polite she perceived the request to be by using a slider with values ranging from “very impolite” to “very polite”. [sent-48, score-0.291]
31 Because politeness is highly subjective and annotators may have inconsistent scales, we applied the standard z-score normalization to each worker’s scores. [sent-55, score-0.88]
32 Finally, we define the politeness score (henceforth politeness) of a request as the average of the five scores assigned by the annotators. [sent-56, score-0.934]
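The normalization and averaging steps described in the two sentences above can be sketched as follows. This is a minimal illustration, not the authors' code: the `(worker_id, request_id, raw_score)` tuple layout, the function names, and the use of the population standard deviation are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_worker(ratings):
    """Z-score each rating against the mean and spread of its own worker.

    ratings: list of (worker_id, request_id, raw_score) tuples (hypothetical layout).
    """
    by_worker = defaultdict(list)
    for worker, _, score in ratings:
        by_worker[worker].append(score)
    # Assumption: population standard deviation; the source does not say which is used.
    stats = {w: (mean(s), pstdev(s)) for w, s in by_worker.items()}

    normalized = []
    for worker, request, score in ratings:
        mu, sigma = stats[worker]
        # A worker who gave identical scores everywhere carries no signal.
        z = (score - mu) / sigma if sigma > 0 else 0.0
        normalized.append((worker, request, z))
    return normalized


def politeness_scores(ratings):
    """Average the normalized scores of all annotators for each request."""
    per_request = defaultdict(list)
    for _, request, z in normalize_by_worker(ratings):
        per_request[request].append(z)
    return {request: mean(zs) for request, zs in per_request.items()}
```

Normalizing per worker before averaging means that an annotator who uses only a narrow part of the slider still contributes a comparable signal to one who uses the full range.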
33 Table 1: Summary of the request data and its politeness annotations. [sent-61, score-0.934]
domain  #requests  #annotated  #annotators
Wiki    35,661     4,353       219
SE      373,519    6,604       212
34 7 for both domains; positive values correspond to polite requests (i. [sent-65, score-0.524]
35 , requests with normalized annotations towards the “very polite” extreme) and negative values to impolite requests. [sent-67, score-0.345]
36 For reference, we compute the same quantities after randomizing the scores by sampling from the observed distribution of politeness scores. [sent-70, score-0.88]
37 Binary perception. Although we did not impose a discrete categorization of politeness, we acknowledge an implicit binary perception of the phenomenon: whenever an annotator moved a slider in one direction or the other, she made a binary politeness judgment. [sent-73, score-0.91]
38 Table 2: The percentage of requests for which all five annotators agree on binary politeness. [sent-76, score-0.283]
Quartile  1st  2nd  3rd  4th
Wiki      62%  8%   3%   51%
SE        37%  4%   6%   46%
39 The 4th quartile contains the requests with the top 25% politeness scores in the data. [sent-77, score-1.261]
40 … boundary between somewhat polite and somewhat impolite requests can be blurry. [sent-79, score-0.567]
41 To test this intuition, we break the set of annotated requests into four groups, each corresponding to a politeness score quartile. [sent-80, score-1.15]
42 For each quartile, we compute the percentage of requests for which all five annotators made the same binary politeness judgment. [sent-81, score-1.163]
43 This suggests that the politeness scores assigned to requests that are only somewhat polite or somewhat impolite are less reliable and less tied to an intuitive notion of binary politeness. [sent-83, score-1.447]
44 This discrepancy motivates our choice of classes in the prediction experiments (Section 4) and our use of the top politeness quartile (the 25% most polite requests) as a reference in our subsequent discussion. [sent-84, score-1.228]
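The quartile-wise agreement check described above can be sketched as below. The input layout (one list of normalized annotator scores per request), the function name, and the choice to treat a score of exactly zero as neither polite nor impolite are assumptions made for illustration.

```python
from statistics import mean


def unanimous_agreement_by_quartile(requests):
    """Return, for each politeness-score quartile (lowest first), the fraction
    of requests on which every annotator made the same binary judgment.

    requests: list of lists, each inner list holding the normalized scores
    that the annotators gave to one request (hypothetical layout).
    """
    ranked = sorted(requests, key=mean)  # order requests by politeness score
    n = len(ranked)
    fractions = []
    for q in range(4):
        chunk = ranked[q * n // 4:(q + 1) * n // 4]
        unanimous = sum(
            1 for scores in chunk
            # Assumption: the sign of a normalized score is the binary judgment.
            if all(s > 0 for s in scores) or all(s < 0 for s in scores)
        )
        fractions.append(unanimous / len(chunk) if chunk else 0.0)
    return fractions
```

On data like the paper's, the outer quartiles would be expected to show much higher agreement than the two middle ones, which is the pattern reported in Table 2.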
45 3 Politeness strategies. As we mentioned earlier, requests impose on the addressee, potentially placing her in social peril if she is unwilling or unable to comply. [sent-85, score-0.364]
46 These strategies are prominent in Table 3, which describes the core politeness markers we analyzed in our corpus of Wikipedia requests. [sent-87, score-0.942]
47 Requests exhibiting politeness markers are automatically extracted using regular expression matching on the dependency parse obtained by the Stanford Dependency Parser (de Marneffe et al. [sent-89, score-0.94]
48 For example, for the hedges marker (Table 3, line 19), we match all requests containing a nominal subject dependency edge pointing out from a hedge verb from the hedge list created by Hyland (2005). [sent-91, score-0.314]
49 For each politeness strategy, Table 3 shows the average politeness score of the respective requests (as described in Section 2; positive numbers indicate polite requests), and their top politeness quartile membership (i. [sent-92, score-3.275]
50 , what percentage fall within the top quartile of politeness scores). [sent-94, score-0.991]
51 As discussed at the end of Section 2, the top politeness quartile gives a more robust and more intuitive measure of politeness. [sent-95, score-0.991]
52 For reference, a random sample of requests will have a 0 politeness score and a 25% top quartile membership; in both cases, larger numbers indicate higher politeness. [sent-96, score-1.261]
53 The remainder of the cues in Table 3 are negative politeness strategies, serving the purpose of minimizing, at least in appearance, the imposition on the addressee. [sent-103, score-0.929]
54 Though indirectness is not invariably interpreted as politeness marking (Blum-Kulka, 2003), it is nonetheless a reliable marker of it, as our scores indicate. [sent-111, score-0.933]
55 Many of these features are correlated with each other, in keeping with the insight of Brown and Levinson (1987) that politeness markers are often combined to create a cumulative effect of increased politeness. [sent-125, score-0.919]
56 For example, sentence-medial please is polite (line 7), presumably because of its freedom to combine with other negative politeness strategies (Could you please . [sent-127, score-1.235]
57 In contrast, sentence-initial please is impolite (line 8), because it typically signals a more direct strategy (Please do this), which can make the politeness marker itself seem insincere. [sent-131, score-1.009]
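The distinction between sentence-initial and non-initial please can be illustrated with a small detector. Note that the source extracts markers by matching on a dependency parse; the sketch below deliberately swaps in a much cruder token-level check, and the function name and return labels are hypothetical.

```python
import re

# Crude word tokenizer; a simplification standing in for real parsing.
_WORD = re.compile(r"[A-Za-z']+")


def please_position(sentence):
    """Classify where 'please' occurs in a single sentence.

    Returns "initial" if the sentence starts with please, "non_initial" if
    please occurs later, and None if the word is absent.
    """
    tokens = [t.lower() for t in _WORD.findall(sentence)]
    if "please" not in tokens:
        return None
    return "initial" if tokens[0] == "please" else "non_initial"
```

For instance, `please_position("Could you please clarify?")` yields `"non_initial"`, the position the paper associates with higher politeness, while `please_position("Please do this.")` yields `"initial"`.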
58 4 Predicting politeness. We now show how our linguistic analysis can be used in a machine learning model for automatically classifying requests according to politeness. [sent-139, score-1.163]
59 Also, by providing automatic politeness judgments for large [sent-141, score-0.88]
60 Table 3: Positive (1–5) and negative (6–20) politeness strategies and their relation to human perception of politeness. [sent-235, score-0.933]
61 Throughout the paper: for politeness scores, statistical significance is calculated by comparing the set of requests exhibiting the strategy with the rest using a Mann-Whitney-Wilcoxon U test; for top quartile membership a binomial test is used. [sent-237, score-1.297]
62 amounts of new data on a scale unfeasible for human annotation, it can also enable a detailed analysis of the relation between politeness and social factors (Section 5). [sent-238, score-0.964]
63 We take the model (features and weights) trained on Wikipedia and use them to classify requests from Stack Exchange. [sent-242, score-0.27]
64 We consider two classes of requests: polite and impolite, defined as the top and, respectively, bottom quartile of requests when sorted by their politeness score (based on the binary notion of politeness discussed in Section 2). [sent-243, score-2.378]
65 The classes are therefore balanced, with each class consisting of 1,089 requests for the Wikipedia domain and 1,651 requests for the Stack Exchange domain. [sent-244, score-0.27]
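The construction of the two balanced classes can be sketched as follows. The dictionary input and the function name are hypothetical, and the floor division used to size each class is an assumption: the source does not state how quartile boundaries are rounded.

```python
def quartile_classes(scored):
    """Split requests into polite (top quartile) and impolite (bottom quartile).

    scored: dict mapping a request id to its politeness score (hypothetical layout).
    Returns (polite_ids, impolite_ids); the middle half is left unlabeled.
    """
    ranked = sorted(scored, key=scored.get)  # ascending politeness
    k = len(ranked) // 4                     # assumption: floor of one quarter
    polite = ranked[len(ranked) - k:]
    impolite = ranked[:k]
    return polite, impolite
```

Discarding the two middle quartiles follows the observation in Section 2 that annotators rarely agree on those requests, so both classes contain the same number of comparatively reliable examples.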
66 Finally, to obtain a reference point for the prediction task we also collect three new politeness annotations for each of the requests in our dataset using the same methodology described in Section 2. [sent-252, score-1.15]
67 We then calculate human performance on the task (Human) as the percentage of requests for which the average score from the additional annotations matches the binary politeness class of the original annotations (e. [sent-253, score-1.15]
68 In the next section we exploit this insight to automatically annotate a much larger set of requests (about 400,000) with politeness labels, enabling us to relate politeness to several social variables and outcomes. [sent-260, score-2.101]
69 For new requests, we use class probability estimates obtained by fitting a logistic regression model to the output of the SVM (Witten and Frank, 2005) as predicted politeness scores (with values between 0 and 1; henceforth politeness, by abuse of language). [sent-261, score-0.88]
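Mapping SVM outputs to probabilities, as described in the sentence above, can be sketched by fitting a one-dimensional logistic model to decision values. The source only says that a logistic regression is fitted to the SVM output; the plain gradient-descent fitting, the function names, and the hyperparameters below are assumptions chosen to keep the example dependency-free.

```python
import math


def fit_sigmoid(scores, labels, lr=0.1, epochs=2000):
    """Fit p(polite | s) = 1 / (1 + exp(-(a * s + b))) by gradient descent.

    scores: SVM decision values; labels: 1 for polite, 0 for impolite.
    Returns the fitted slope a and intercept b.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s  # derivative of log loss w.r.t. a
            grad_b += (p - y)      # derivative of log loss w.r.t. b
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def predict_proba(score, a, b):
    """Predicted politeness in (0, 1) for a new SVM decision value."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))
```

The resulting value lies strictly between 0 and 1, which matches the range the paper gives for predicted politeness scores on new requests.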
70 5 Relation to social factors. We now apply our framework to studying the relationship between politeness and social variables, focussing on social power dynamics. [sent-262, score-1.17]
71 Encouraged by the close-to-human performance of our in-domain classifiers, we use them to assign politeness labels to our full dataset and then compare these labels to independent measures of power and status in our data. [sent-263, score-0.981]
72 1 Relation to social outcome. Earlier, we characterized politeness markings as currency used to pay respect. [sent-282, score-0.963]
73 We now address this question by studying politeness in the context of the electoral system of the Wikipedia community of editors. [sent-285, score-0.897]
74 To see whether politeness correlates with eventual high status, we compare, in Table 5, the politeness levels of requests made by users who will eventually succeed in becoming administrators (Eventual status: Admins) with requests made by users who are not admins (Non-admins). [sent-293, score-2.463]
75 org/wiki/Wikipedia:Requests_for_adminship. We consider only requests made up to one month before the election, to avoid confusion with pre-election behavior. [sent-298, score-0.27]
76 Editors who will eventually become admins are more polite than non-admins (p<0. [sent-304, score-0.338]
77 Out of their requests, 30% are rated in the top politeness quartile (significantly more than the 25% of a random sample; p<0. [sent-307, score-0.991]
78 These users are also significantly less polite than their successful counterparts, indicating that politeness indeed correlates with a positive social outcome here. [sent-315, score-1.225]
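The comparison of a group's top-quartile membership against the 25% expected from a random sample can be illustrated with an exact binomial tail probability. The source states only that a binomial test is used; the one-sided form, the function name, and any counts passed to it are assumptions for illustration, not figures from the paper.

```python
from math import comb


def binomial_tail(k, n, p=0.25):
    """Exact probability of observing at least k top-quartile requests
    among n, if each request had probability p of landing there.

    Assumption: a one-sided test against the 25% chance rate.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

With hypothetical counts, `binomial_tail(30, 100)` gives the chance of seeing 30 or more top-quartile requests out of 100 under the null rate of 25%; a small value would suggest that the group is more polite than a random sample.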
79 2 Politeness and power. We expect a rise in status to correlate with a decline in politeness (as predicted by politeness theory, and discussed in Section 1). [sent-317, score-1.861]
80 Figure 3 shows that this boost is mirrored by a significant decrease in politeness (blue, diamond markers). [sent-321, score-0.88]
81 Losing an election has the opposite effect on politeness (red, circle markers), perhaps as a consequence of reinforced low status. [sent-322, score-0.88]
82 Following the elections, successful editors become less polite while unsuccessful editors become more polite. [sent-328, score-0.321]
83 Analysis conducted on 181k requests (106k for question-askers, 75k for answer-givers). [sent-337, score-0.27]
84 Again, we see a negative correlation between politeness and power, even after controlling for the role of the user making the requests (i. [sent-339, score-1.165]
85 Human validation. The above analyses are based on predicted politeness from our classifier. [sent-343, score-0.88]
86 This allows us to use the entire request data corpus … Since our data does not contain time stamps for reputation scores, we only consider requests that were issued in the six months prior to the available snapshot. [sent-344, score-0.376]
87 In order to validate this methodology, we turned again to human annotation: we collected additional politeness annotation for the types of requests involved in the newly designed experiments. [sent-356, score-1.15]
88 Table 8: Politeness of requests from different language communities on Stack Exchange. [sent-369, score-0.301]
89 In recent years, politeness has been studied in online settings. [sent-376, score-0.88]
90 Researchers have identified variation in politeness marking across different contexts and media types (Herring, 1994; Brennan and Ohaeri, 1999; Duthler, 2006) and between different social groups (Burke and Kraut, 2008a). [sent-377, score-0.963]
91 The present paper pursues similar goals using orders of magnitude more data, which facilitates a fuller survey of different politeness strategies. [sent-378, score-0.88]
92 The present paper complements this work by revealing the role of politeness in social outcomes and power relations. [sent-389, score-1.002]
93 7 Conclusion. We construct and release a large collection of politeness-annotated requests and use it to evaluate key aspects of politeness theory. [sent-390, score-1.15]
94 We build a politeness classifier that achieves near-human performance and use it to explore the relation between politeness and social factors such as power, status, gender, and community membership. [sent-391, score-1.876]
95 We hope the publicly available collection of annotated requests enables further study of politeness and its relation to social factors, as this paper has only begun to explore this area. [sent-392, score-1.221]
96 Mind your Ps and Qs: the impact of politeness and rudeness in online communities. [sent-443, score-0.88]
97 The role of linguistic indirectness and honorifics in achieving linguistic politeness in Korean requests. [sent-451, score-0.947]
98 How to make requests that overcome obstacles to compliance. [sent-489, score-0.27]
99 Reconceptualizing politeness to accommodate dynamic tensions in subordinate-to-superior reporting. [sent-584, score-0.88]
100 Being polite is a handicap: Towards a game theoretical analysis of polite linguistic behavior. [sent-595, score-0.487]
wordName wordTfidf (topN-words)
[('politeness', 0.88), ('requests', 0.27), ('polite', 0.237), ('quartile', 0.095), ('stack', 0.075), ('admins', 0.074), ('social', 0.071), ('impolite', 0.06), ('addressee', 0.057), ('exchange', 0.056), ('request', 0.054), ('reputation', 0.052), ('power', 0.051), ('status', 0.05), ('levinson', 0.05), ('please', 0.04), ('markers', 0.039), ('imposition', 0.034), ('wiki', 0.032), ('wikipedia', 0.031), ('communities', 0.031), ('editors', 0.028), ('indirectness', 0.027), ('burke', 0.025), ('brown', 0.025), ('strategies', 0.023), ('eventual', 0.022), ('pragmatics', 0.021), ('exhibiting', 0.021), ('andersson', 0.02), ('apologizing', 0.02), ('deference', 0.02), ('gyasi', 0.02), ('liteness', 0.02), ('lakoff', 0.02), ('users', 0.02), ('interactions', 0.019), ('elections', 0.018), ('jure', 0.018), ('kraut', 0.018), ('leskovec', 0.018), ('workplace', 0.018), ('indirect', 0.018), ('positive', 0.017), ('community', 0.017), ('incur', 0.017), ('posted', 0.016), ('speaker', 0.016), ('robin', 0.016), ('cscw', 0.016), ('gratitude', 0.016), ('paying', 0.016), ('top', 0.016), ('negative', 0.015), ('perception', 0.015), ('classifier', 0.015), ('line', 0.015), ('rogers', 0.015), ('burden', 0.015), ('hedges', 0.015), ('strategy', 0.015), ('marker', 0.014), ('se', 0.014), ('become', 0.014), ('failed', 0.014), ('universals', 0.014), ('administrators', 0.014), ('adminship', 0.014), ('bramsen', 0.014), ('counterfactual', 0.014), ('diehl', 0.014), ('francik', 0.014), ('herring', 0.014), ('honorifics', 0.014), ('huttenlocher', 0.014), ('obeng', 0.014), ('penelope', 0.014), ('rooy', 0.014), ('scholand', 0.014), ('shoshana', 0.014), ('stubbe', 0.014), ('wikipedi', 0.014), ('wikipedians', 0.014), ('eventually', 0.013), ('relationship', 0.013), ('linguistic', 0.013), ('bow', 0.013), ('women', 0.013), ('annotators', 0.013), ('factors', 0.013), ('marking', 0.012), ('fellowship', 0.012), ('brennan', 0.012), ('chilton', 0.012), ('enron', 0.012), ('grice', 0.012), ('markings', 0.012), ('moira', 0.012), 
('operationalizing', 0.012), ('prabhakaran', 0.012)]
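A word-weight list like the one above can be produced with a tf-idf computation. The page does not say which tf-idf variant or normalization it uses, so the smoothed formula, the function name, and the input layout below are assumptions; the sketch shows one common variant only.

```python
import math
from collections import Counter


def tfidf_top_words(doc_tokens, corpus, top_n=5):
    """Rank the words of one document by a smoothed tf-idf weight.

    doc_tokens: list of tokens in the document of interest.
    corpus: list of token sets, one per document (hypothetical layout).
    """
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    weights = {}
    for word, count in tf.items():
        df = sum(1 for doc in corpus if word in doc)  # document frequency
        # Assumption: relative term frequency times smoothed log inverse df.
        idf = math.log((1 + n_docs) / (1 + df))
        weights[word] = (count / len(doc_tokens)) * idf
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Words that occur in every document receive a weight of zero under this variant, which is why topic-specific terms such as politeness dominate a list of this kind.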
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 30 acl-2013-A computational approach to politeness with application to social factors
2 0.039751764 133 acl-2013-Efficient Implementation of Beam-Search Incremental Parsers
Author: Yoav Goldberg ; Kai Zhao ; Liang Huang
Abstract: Beam search incremental parsers are accurate, but not as fast as they could be. We demonstrate that, contrary to popular belief, most current implementations of beam parsers in fact run in O(n^2), rather than linear time, because each state transition is actually implemented as an O(n) operation. We present an improved implementation, based on Tree Structured Stack (TSS), in which a transition is performed in O(1), resulting in a real linear-time algorithm, which is verified empirically. We further improve parsing speed by sharing feature-extraction and dot-product across beam items. Practically, our methods combined offer a speedup of ∼2x over strong baselines on Penn Treebank sentences, and are orders of magnitude faster on much longer sentences.
3 0.032853682 282 acl-2013-Predicting and Eliciting Addressee's Emotion in Online Dialogue
Author: Takayuki Hasegawa ; Nobuhiro Kaji ; Naoki Yoshinaga ; Masashi Toyoda
Abstract: While there have been many attempts to estimate the emotion of an addresser from her/his utterance, few studies have explored how her/his utterance affects the emotion of the addressee. This has motivated us to investigate two novel tasks: predicting the emotion of the addressee and generating a response that elicits a specific emotion in the addressee’s mind. We target Japanese Twitter posts as a source of dialogue data and automatically build training data for learning the predictors and generators. The feasibility of our approaches is assessed by using 1099 utterance-response pairs that are built by five human workers.
4 0.031907499 95 acl-2013-Crawling microblogging services to gather language-classified URLs. Workflow and case study
Author: Adrien Barbaresi
Abstract: We present a way to extract links from messages published on microblogging platforms and we classify them according to the language and possible relevance of their target in order to build a text corpus. Three platforms are taken into consideration: FriendFeed, identi.ca and Reddit, as they account for a relative diversity of user profiles and more importantly user languages. In order to explore them, we introduce a traversal algorithm based on user pages. As we target lesser-known languages, we try to focus on non-English posts by filtering out English text. Using mature open-source software from the NLP research field, a spell checker (aspell) and a language identification system (langid.py), our case study and our benchmarks give an insight into the linguistic structure of the considered services.
5 0.030824926 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy
Author: Francesco Sartorio ; Giorgio Satta ; Joakim Nivre
Abstract: We present a novel transition-based, greedy dependency parser which implements a flexible mix of bottom-up and top-down strategies. The new strategy allows the parser to postpone difficult decisions until the relevant information becomes available. The novel parser has a ∼12% error reduction in unlabeled attachment score over an arc-eager parser, with a slow-down factor of 2.8.
6 0.030262971 298 acl-2013-Recognizing Rare Social Phenomena in Conversation: Empowerment Detection in Support Group Chatrooms
8 0.027212908 121 acl-2013-Discovering User Interactions in Ideological Discussions
9 0.026510056 115 acl-2013-Detecting Event-Related Links and Sentiments from Social Media Texts
10 0.026097238 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
11 0.025338247 301 acl-2013-Resolving Entity Morphs in Censored Data
12 0.025026949 41 acl-2013-Aggregated Word Pair Features for Implicit Discourse Relation Disambiguation
13 0.024270412 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users
14 0.024263332 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
15 0.023745101 184 acl-2013-Identification of Speakers in Novels
16 0.022597637 45 acl-2013-An Empirical Study on Uncertainty Identification in Social Media Context
17 0.022478284 42 acl-2013-Aid is Out There: Looking for Help from Tweets during a Large Scale Disaster
18 0.021929093 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions
19 0.021315776 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations
20 0.021259317 33 acl-2013-A user-centric model of voting intention from Social Media
topicId topicWeight
[(0, 0.058), (1, 0.037), (2, -0.007), (3, 0.005), (4, -0.005), (5, -0.013), (6, 0.026), (7, 0.005), (8, 0.031), (9, 0.001), (10, -0.019), (11, 0.023), (12, -0.028), (13, 0.007), (14, 0.007), (15, -0.012), (16, -0.024), (17, -0.009), (18, 0.011), (19, -0.02), (20, -0.025), (21, 0.017), (22, 0.019), (23, 0.027), (24, -0.009), (25, -0.012), (26, 0.003), (27, 0.007), (28, 0.013), (29, -0.006), (30, -0.009), (31, -0.014), (32, 0.022), (33, -0.003), (34, 0.036), (35, 0.004), (36, 0.039), (37, 0.028), (38, -0.024), (39, -0.02), (40, 0.014), (41, 0.013), (42, 0.035), (43, 0.009), (44, -0.002), (45, -0.012), (46, 0.037), (47, -0.044), (48, 0.026), (49, -0.056)]
simIndex simValue paperId paperTitle
same-paper 1 0.84099698 30 acl-2013-A computational approach to politeness with application to social factors
2 0.59305578 95 acl-2013-Crawling microblogging services to gather language-classified URLs. Workflow and case study
Author: Adrien Barbaresi
Abstract: We present a way to extract links from messages published on microblogging platforms and we classify them according to the language and possible relevance of their target in order to build a text corpus. Three platforms are taken into consideration: FriendFeed, identi.ca and Reddit, as they account for a relative diversity of user profiles and more importantly user languages. In order to explore them, we introduce a traversal algorithm based on user pages. As we target lesser-known languages, we try to focus on non-English posts by filtering out English text. Using mature open-source software from the NLP research field, a spell checker (aspell) and a language identification system (langid.py), our case study and our benchmarks give an insight into the linguistic structure of the considered services.
3 0.59286112 190 acl-2013-Implicatures and Nested Beliefs in Approximate Decentralized-POMDPs
Author: Adam Vogel ; Christopher Potts ; Dan Jurafsky
Abstract: Conversational implicatures involve reasoning about multiply nested belief structures. This complexity poses significant challenges for computational models of conversation and cognition. We show that agents in the multi-agent Decentralized-POMDP reach implicature-rich interpretations simply as a by-product of the way they reason about each other to maximize joint utility. Our simulations involve a reference game of the sort studied in psychology and linguistics as well as a dynamic, interactional scenario involving implemented artificial agents.
4 0.5903309 232 acl-2013-Linguistic Models for Analyzing and Detecting Biased Language
Author: Marta Recasens ; Cristian Danescu-Niculescu-Mizil ; Dan Jurafsky
Abstract: Unbiased language is a requirement for reference sources like encyclopedias and scientific texts. Bias is, nonetheless, ubiquitous, making it crucial to understand its nature and linguistic realization and hence detect bias automatically. To this end we analyze real instances of human edits designed to remove bias from Wikipedia articles. The analysis uncovers two classes of bias: framing bias, such as praising or perspective-specific words, which we link to the literature on subjectivity; and epistemological bias, related to whether propositions that are presupposed or entailed in the text are uncontroversially accepted as true. We identify common linguistic cues for these classes, including factive verbs, implicatives, hedges, and subjective intensifiers. These insights help us develop features for a model to solve a new prediction task of practical importance: given a biased sentence, identify the bias-inducing word. Our linguistically-informed model performs almost as well as humans tested on the same task.
5 0.55976588 184 acl-2013-Identification of Speakers in Novels
Author: Hua He ; Denilson Barbosa ; Grzegorz Kondrak
Abstract: Speaker identification is the task of attributing utterances to characters in a literary narrative. It is challenging to automate because the speakers of the majority of utterances are not explicitly identified in novels. In this paper, we present a supervised machine learning approach for the task that incorporates several novel features. The experimental results show that our method is more accurate and general than previous approaches to the problem.
6 0.54454416 278 acl-2013-Patient Experience in Online Support Forums: Modeling Interpersonal Interactions and Medication Use
7 0.51968479 151 acl-2013-Extra-Linguistic Constraints on Stance Recognition in Ideological Debates
8 0.50381982 287 acl-2013-Public Dialogue: Analysis of Tolerance in Online Discussions
10 0.47370362 121 acl-2013-Discovering User Interactions in Ideological Discussions
11 0.46067476 141 acl-2013-Evaluating a City Exploration Dialogue System with Integrated Question-Answering and Pedestrian Navigation
12 0.45889896 33 acl-2013-A user-centric model of voting intention from Social Media
13 0.45804232 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
14 0.44715476 203 acl-2013-Is word-to-phone mapping better than phone-phone mapping for handling English words?
15 0.43805927 321 acl-2013-Sign Language Lexical Recognition With Propositional Dynamic Logic
16 0.43257502 337 acl-2013-Tag2Blog: Narrative Generation from Satellite Tag Data
17 0.42574286 268 acl-2013-PATHS: A System for Accessing Cultural Heritage Collections
18 0.41887477 90 acl-2013-Conditional Random Fields for Responsive Surface Realisation using Global Features
19 0.41832981 49 acl-2013-An annotated corpus of quoted opinions in news articles
20 0.41189501 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users
topicId topicWeight
[(0, 0.048), (6, 0.028), (11, 0.048), (15, 0.035), (21, 0.01), (24, 0.041), (26, 0.054), (28, 0.016), (35, 0.046), (42, 0.037), (48, 0.028), (56, 0.013), (64, 0.015), (70, 0.039), (72, 0.302), (79, 0.012), (88, 0.042), (90, 0.021), (95, 0.042)]
simIndex simValue paperId paperTitle
same-paper 1 0.7849521 30 acl-2013-A computational approach to politeness with application to social factors
Author: Cristian Danescu-Niculescu-Mizil ; Moritz Sudhof ; Dan Jurafsky ; Jure Leskovec ; Christopher Potts
Abstract: We propose a computational framework for identifying linguistic aspects of politeness. Our starting point is a new corpus of requests annotated for politeness, which we use to evaluate aspects of politeness theory and to uncover new interactions between politeness markers and context. These findings guide our construction of a classifier with domain-independent lexical and syntactic features operationalizing key components of politeness theory, such as indirection, deference, impersonalization and modality. Our classifier achieves close to human performance and is effective across domains. We use our framework to study the relationship between politeness and social power, showing that polite Wikipedia editors are more likely to achieve high status through elections, but, once elevated, they become less polite. We see a similar negative correlation between politeness and power on Stack Exchange, where users at the top of the reputation scale are less polite than those at the bottom. Finally, we apply our classifier to a preliminary analysis of politeness variation by gender and community.
2 0.71714342 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization
Author: Hajime Morita ; Ryohei Sasano ; Hiroya Takamura ; Manabu Okumura
Abstract: This study proposes a text summarization model that simultaneously performs sentence extraction and compression. We translate the text summarization task into a problem of extracting a set of dependency subtrees in the document cluster. We also encode obligatory case constraints as must-link dependency constraints in order to guarantee the readability of the generated summary. In order to handle the subtree extraction problem, we investigate a new class of submodular maximization problem, and a new algorithm that has the approximation ratio $\frac{1}{2}(1 - e^{-1})$. Our experiments with the NTCIR-ACLIA test collections show that our approach outperforms a state-of-the-art algorithm.
3 0.68681931 165 acl-2013-General binarization for parsing and translation
Author: Matthias Buchse ; Alexander Koller ; Heiko Vogler
Abstract: Binarization of grammars is crucial for improving the complexity and performance of parsing and translation. We present a versatile binarization algorithm that can be tailored to a number of grammar formalisms by simply varying a formal parameter. We apply our algorithm to binarizing tree-to-string transducers used in syntax-based machine translation.
4 0.4786815 172 acl-2013-Graph-based Local Coherence Modeling
Author: Camille Guinaudeau ; Michael Strube
Abstract: We propose a computationally efficient graph-based approach for local coherence modeling. We evaluate our system on three tasks: sentence ordering, summary coherence rating and readability assessment. The performance is comparable to entity grid based approaches though these rely on a computationally expensive training phase and face data sparsity problems.
5 0.4091332 233 acl-2013-Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
Author: Weiwei Guo ; Hao Li ; Heng Ji ; Mona Diab
Abstract: Many current Natural Language Processing [NLP] techniques work well assuming a large context of text as input data. However, they become ineffective when applied to short texts such as Twitter feeds. To overcome the issue, we want to find a related newswire document to a given tweet to provide contextual support for NLP tasks. This requires robust modeling and understanding of the semantics of short texts. The contribution of the paper is two-fold: 1. we introduce the Linking-Tweets-to-News task as well as a dataset of linked tweet-news pairs, which can benefit many NLP applications; 2. in contrast to previous research which focuses on lexical features within the short texts (text-to-word information), we propose a graph based latent variable model that models the inter short text correlations (text-to-text information). This is motivated by the observation that a tweet usually only covers one aspect of an event. We show that using tweet specific feature (hashtag) and news specific feature (named entities) as well as temporal constraints, we are able to extract text-to-text correlations, and thus complete the semantic picture of a short text. Our experiments show significant improvement of our new model over baselines with three evaluation metrics in the new task.
6 0.40871468 318 acl-2013-Sentiment Relevance
7 0.40110272 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
9 0.39872095 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
10 0.39836812 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification
11 0.39694172 298 acl-2013-Recognizing Rare Social Phenomena in Conversation: Empowerment Detection in Support Group Chatrooms
12 0.39691558 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
13 0.39489079 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
14 0.39463758 250 acl-2013-Models of Translation Competitions
15 0.39436278 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
16 0.39414912 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation
17 0.39343515 225 acl-2013-Learning to Order Natural Language Texts
18 0.39321104 123 acl-2013-Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
19 0.39285755 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification
20 0.39259171 333 acl-2013-Summarization Through Submodularity and Dispersion