acl acl2013 acl2013-232 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Marta Recasens ; Cristian Danescu-Niculescu-Mizil ; Dan Jurafsky
Abstract: Unbiased language is a requirement for reference sources like encyclopedias and scientific texts. Bias is, nonetheless, ubiquitous, making it crucial to understand its nature and linguistic realization and hence detect bias automatically. To this end we analyze real instances of human edits designed to remove bias from Wikipedia articles. The analysis uncovers two classes of bias: framing bias, such as praising or perspective-specific words, which we link to the literature on subjectivity; and epistemological bias, related to whether propositions that are presupposed or entailed in the text are uncontroversially accepted as true. We identify common linguistic cues for these classes, including factive verbs, implicatives, hedges, and subjective intensifiers. These insights help us develop features for a model to solve a new prediction task of practical importance: given a biased sentence, identify the bias-inducing word. Our linguistically-informed model performs almost as well as humans tested on the same task.
Reference: text
sentIndex sentText sentNum sentScore
1 Unbiased language is a requirement for reference sources like encyclopedias and scientific texts. [sent-3, score-0.083]
2 Bias is, nonetheless, ubiquitous, making it crucial to understand its nature and linguistic realization and hence detect bias automatically. [sent-4, score-0.4]
3 To this end we analyze real instances of human edits designed to remove bias from Wikipedia articles. [sent-5, score-0.786]
4 The analysis uncovers two classes of bias: framing bias, such as praising or perspective-specific words, which we link to the literature on subjectivity; and epistemological bias, related to whether propositions that are presupposed or entailed in the text are uncontroversially accepted as true. [sent-6, score-0.633]
5 We identify common linguistic cues for these classes, including factive verbs, implicatives, hedges, and subjective intensifiers. [sent-7, score-0.276]
6 These insights help us develop features for a model to solve a new prediction task of practical importance: given a biased sentence, identify the bias-inducing word. [sent-9, score-0.202]
7 1 Introduction Writers and editors of reference works such as encyclopedias, textbooks, and scientific articles strive to keep their language unbiased. [sent-11, score-0.078]
8 For example, Wikipedia advocates a policy called neutral point of view (NPOV), according to which articles should represent “fairly, proportionately, and as far as possible without bias, all significant views that have been published by reliable sources” (Wikipedia, 2013b). [sent-12, score-0.082]
9 Wikipedia’s style guide asks editors to use nonjudgmental language, to indicate the relative prominence of opposing points of view, to avoid presenting uncontroversial . [sent-13, score-0.09]
10 Understanding the linguistic realization of bias is important for linguistic theory; automatically detecting these biases is equally significant for computational linguistics. [sent-16, score-0.454]
11 We propose to address both by using a powerful resource: edits in Wikipedia that are specifically designed to remove bias. [sent-17, score-0.422]
12 Since Wikipedia maintains a complete revision history, the edits associated with NPOV tags allow us to compare the text in its biased (before) and unbiased (after) form, helping us better understand the linguistic realization of bias. [sent-18, score-0.707]
13 Our work thus shares the intuition of prior NLP work applying Wikipedia’s revision history (Nelken and Yamangil, 2008; Yatskar et al. [sent-19, score-0.088]
14 The analysis of Wikipedia’s edits provides valuable linguistic insights into the nature of biased language. [sent-21, score-0.583]
15 The first, framing bias, is realized by subjective words or phrases linked with a particular point of view. [sent-23, score-0.302]
16 The second class, epistemological bias, is related to linguistic features that subtly (often via presupposition) focus on the believability of a proposition. [sent-25, score-0.304]
17 In (2), the assertive stated removes the bias introduced by claimed, which casts doubt on Kuypers’ statement. [sent-26, score-0.469]
18 Usually, smaller cottage-style houses have been demolished to make way for these McMansions. [sent-28, score-0.089]
19 Usually, smaller cottage-style houses have been demolished to make way for these homes. [sent-30, score-0.089]
20 Bias is linked to the lexical and grammatical cues identified by the literature on subjectivity (Wiebe et al. [sent-35, score-0.154]
21 , 2005; Turney, 2002), and especially stance [page 1650; Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9 2013]. [sent-38, score-0.082]
22 For instance, framing bias is realized when the writer of a text takes a particular position on a controversial topic and uses its metaphors and vocabulary. [sent-45, score-0.696]
23 Our linguistic analysis identifies common classes of these subtle bias cues, including factive verbs, implicatives and other entailments, hedges, and subjective intensifiers. [sent-47, score-0.653]
24 Using these cues could help automatically detect and correct instances of bias, by first finding biased phrases, then identifying the word that introduces the bias, and finally rewording to eliminate the bias. [sent-48, score-0.311]
25 In this paper we propose a solution for the second of these tasks, identifying the bias-inducing word in a biased phrase. [sent-49, score-0.202]
26 1 The NPOV Corpus from Wikipedia Given Wikipedia’s strict enforcement of an NPOV policy, we decided to build the NPOV corpus, containing Wikipedia edits that are specifically designed to remove bias. [sent-55, score-0.422]
27 Editors are encouraged to identify and rewrite biased passages to achieve a more neutral tone, and they can use several NPOV 1The data and bias lexicon we developed are available at http : / /www . [sent-56, score-0.616]
28 We constructed the NPOV corpus by retrieving all articles that were or had been in the NPOVdispute category3 together with their full revision history. [sent-67, score-0.125]
29 Following Wikipedia’s terminology, we call each version of a Wikipedia article a revision, and so an article can be viewed as a set of (chronologically ordered) revisions. [sent-70, score-0.122]
30 2 Extracting Edits Meant to Remove Bias Given all the revisions of a page, we extracted the changes between pairs of revisions with the word-mode diff function from the Diff Match and Patch library. [sent-72, score-0.243]
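A word-level diff between two revisions can be sketched as follows. This is only an illustration: it uses Python's standard `difflib` as a stand-in for the Diff Match and Patch library named above, and the helper name `word_edits` is hypothetical. The example sentences are the before and after forms quoted in this summary.

```python
import difflib


def word_edits(before: str, after: str):
    """Return (before_span, after_span) pairs for every changed word span.

    Assumption: whitespace tokenization and difflib stand in for the
    word-mode diff of the Diff Match and Patch library.
    """
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep replacements, insertions and deletions
            edits.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits


before = ("Usually, smaller cottage-style houses have been demolished "
          "to make way for these McMansions.")
after = ("Usually, smaller cottage-style houses have been demolished "
         "to make way for these homes.")
print(word_edits(before, after))
```

Because tokens are split on whitespace, punctuation stays attached to the neighbouring word; a real pipeline would probably tokenize more carefully.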
31 We refer to these changes between revisions as edits, e. [sent-73, score-0.1]
32 Our assumption was that among the edits happening in NPOV disputes, we would have a high density of edits intended to remove bias, which we call bias-driven edits, like (1) and (2) from Section 1. [sent-81, score-0.803]
33 But many other edits occur even in NPOV disputes, including edits to fix spelling or grammatical errors, simplify the language, make the meaning more precise, or even vandalism (Max 2{{POV}}, {{POV-check}}, {{POV-section}}, etc. [sent-82, score-0.762]
34 Adding these tags displays a template such as “The neutrality of this article is disputed. [sent-83, score-0.061]
35 We considered as bias-driven edits those that appeared in a revision whose comment mentioned (N)POV, e. [sent-95, score-0.469]
36 , Attempts at presenting some claims in more NPOV way; or merging in a passage from the researchers article after basic NPOVing. [sent-97, score-0.061]
37 We only kept edits whose before and after forms contained five or fewer words, and discarded those that only added a hyperlink or that involved a minimal change (character-based Levenshtein distance < 4). [sent-98, score-0.381]
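The filtering criteria above could be sketched roughly as below. The function names, the regular expression for spotting "(N)POV" in a revision comment, and the whitespace word count are all assumptions made for illustration; the hyperlink-only check is left out because the summary does not say how it was done.

```python
import re

# Assumption: a comment "mentions (N)POV" if it contains POV or NPOV,
# in any letter case, at the start of a word (so "NPOVing" also counts).
NPOV_COMMENT = re.compile(r"\bN?POV", re.IGNORECASE)


def levenshtein(s: str, t: str) -> int:
    """Character-based edit distance (insert, delete, substitute)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # substitution
        prev = curr
    return prev[-1]


def is_bias_driven(comment: str, before: str, after: str,
                   max_words: int = 5, min_distance: int = 4) -> bool:
    """Apply the comment, length and minimal-change filters to one edit."""
    if not NPOV_COMMENT.search(comment):
        return False
    if len(before.split()) > max_words or len(after.split()) > max_words:
        return False
    # Discard minimal changes (distance < 4), e.g. spelling fixes.
    return levenshtein(before, after) >= min_distance


print(is_bias_driven("Attempts at presenting some claims in more NPOV way",
                     "claimed", "stated"))
```

The thresholds (five words, distance 4) are the ones stated above; everything else is a guess at one reasonable implementation.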
38 The final number of bias-driven edits for each of the data sets is shown in the “Edits” column of Table 1. [sent-99, score-0.381]
39 3 Linguistic Analysis Style guides talk about biased language in a prescriptive manner, listing a few words that should be avoided because they are flattering, vague, or endorse a particular point of view (Wikipedia, 2013a). [sent-101, score-0.202]
40 Our focus is on analyzing actual biased text and bias-driven edits extracted from Wikipedia. [sent-102, score-0.583]
41 As we suggested above, this analysis uncovered two major classes of bias: epistemological bias and framing bias. [sent-103, score-0.804]
42 Table 2 shows the distribution (from a sample of 100 edits) of the different types and subtypes of bias presented in this section. [sent-104, score-0.364]
43 (A) Epistemological bias involves propositions that are either commonly agreed to be true or commonly agreed to be false and that are subtly presupposed, entailed, asserted or hedged in the text. [sent-105, score-0.475]
44 Factive verbs (Kiparsky and Kiparsky, 1970) presuppose the truth of their complement clause. [sent-107, score-0.199]
45 In (3-a) and (4-a), realize and reveal presuppose the truth of “the oppression of black people. [sent-108, score-0.231]
46 ”, whereas (3-b) and (4-b) present the two propositions as somebody’s stand or an experimental result. [sent-114, score-0.051]
47 He realized that the oppression of black people was more of a result of economic exploitation than anything innately racist. [sent-116, score-0.158]
48 His stand was that the oppression of black people was more of a result of economic exploitation than anything innately racist. [sent-118, score-0.122]
49 A. Epistemological bias: 43% (factive verbs 3, entailments 25, assertives 11, hedges 4). [sent-124, score-0.405]
50 B. Framing bias: 57% (intensifiers 19, one-sided terms 38). Table 2: Proportion of the different bias types. [sent-125, score-0.728]
51 Entailments are directional relations that hold whenever the truth of one word or phrase follows from another, e. [sent-127, score-0.109]
52 , murder entails kill because there cannot be murdering without killing (5). [sent-129, score-0.123]
53 However, murder entails killing in an unlawful, premeditated way. [sent-130, score-0.123]
54 This class includes implicative verbs (Karttunen, 1971), which imply the truth or untruth of their complement, depending on the polarity of the main predicate. [sent-131, score-0.15]
55 In (6-a), coerced into accepting entails accepting in an unwilling way. [sent-132, score-0.215]
56 After he murdered three policemen, the colony proclaimed Kelly a wanted outlaw. [sent-134, score-0.098]
57 After he killed three policemen, the colony proclaimed Kelly a wanted outlaw. [sent-136, score-0.098]
58 A computer engineer who was coerced into accepting a plea bargain. [sent-138, score-0.189]
59 The truth of the proposition is not presupposed, but its level of certainty depends on the asserting verb. [sent-143, score-0.207]
60 Whereas verbs of saying like say and state are usually neutral, point out and claim cast doubt on the certainty of the proposition. [sent-144, score-0.169]
61 The “no Boeing” theory is a controversial issue, even among conspiracists, many of whom have pointed out that it is disproved by . [sent-146, score-0.149]
62 The “no Boeing” theory is a controversial issue, even among conspiracists, many of whom have said that it is disproved by. [sent-150, score-0.149]
63 Cooper says that slavery was worse in South America and the US than Canada, but clearly states that it was a horrible and cruel practice. [sent-154, score-0.138]
64 Cooper says that slavery was worse in South America and the US than Canada, but points out that it was a horrible and cruel practice. [sent-156, score-0.138]
65 Hedges are used to reduce one’s commitment to the truth of a proposition, thus avoiding any bold predictions (9-b) or statements (9) a. [sent-158, score-0.109]
66 Eliminating the profit motive will decrease the rate of medical innovation. [sent-159, score-0.049]
67 Eliminating the profit motive may have a lower rate of medical innovation. [sent-162, score-0.049]
68 The lower cost of living in more rural areas means a possibly higher standard of living. [sent-164, score-0.082]
69 The lower cost of living in more rural areas means a higher standard of living. [sent-166, score-0.082]
70 Epistemological bias is bidirectional, that is, bias can occur because doubt is cast on a proposition commonly assumed to be true, or because a presupposition or implication is made about a proposition commonly assumed to be false. [sent-167, score-0.984]
71 If the truth of the proposition is uncontroversially accepted by the community (i. [sent-169, score-0.257]
72 In contrast, if only a specific viewpoint agrees with its truth, then using a factive is biased. [sent-173, score-0.146]
73 (B) Framing bias is usually more explicit than epistemological bias because it occurs when subjective or one-sided words are used, revealing the author’s stance in a particular debate (Entman, 2007). [sent-174, score-1.169]
74 Schnabel himself did the fantastic reproductions of Basquiat’s work. [sent-178, score-0.049]
75 Schnabel himself did the accurate reproductions of Basquiat’s work. [sent-180, score-0.049]
76 ) where the same event can be seen from two or more opposing perspectives, like the Israeli-Palestinian conflict (Lin et al. [sent-190, score-0.049]
77 Concerned Women for America’s major areas of political activity have consisted of opposition to gay causes, pro-life law. [sent-199, score-0.082]
78 Concerned Women for America’s major areas of political activity have consisted of opposition to gay causes, anti-abortion law. [sent-203, score-0.082]
79 Framing bias has been studied within the literature on stance recognition and arguing subjectivity. [sent-210, score-0.446]
80 Because this literature has focused on identifying which side an article takes on a two-sided debate such as the Israeli-Palestinian conflict (Lin et al. [sent-211, score-0.106]
81 , 2012; Somasundaran and Wiebe, 2010), or into one of two opposing views (Yano et al. [sent-215, score-0.049]
82 The features used by these models include subjectivity and sentiment lexicons, counts of unigrams and bigrams, distributional similarity, discourse relationships, and so on. [sent-218, score-0.133]
83 The datasets used by these studies come from genres that overtly take a specific stance (e. [sent-219, score-0.147]
84 For this reason, overtly biased opinion statements such as “I believe that. [sent-223, score-0.267]
85 The features used by the subjectivity literature help us detect framing bias, but we also need features that capture epistemological bias expressed through presuppositions and entailments. [sent-227, score-0.941]
86 3 Automatically Identifying Biased Language We now show how the bias cues identified in Section 2. [sent-228, score-0.424]
87 This is part of a potential three-step process for detecting and correcting biased language: (1) finding biased phrases, (2) identifying the word that introduces the bias, (3) rewording to eliminate the bias. [sent-233, score-0.453]
88 As we will see below, it can be hard even for humans to track down the sources of bias, because biases in reference works are often subtle and implicit. [sent-234, score-0.054]
89 An automatic bias detector that can highlight the bias-inducing word(s) and draw the editors’ attention to words that need to be modified could thus be important for improving reference works like Wikipedia or even in news reporting. [sent-235, score-0.364]
90 The ones targeting framing bias draw on previous work on sentiment and subjectivity detection (Wiebe et al. [sent-247, score-0.693]
91 Features to capture epistemological bias are based on the bias cues identified in Section 2. [sent-250, score-1.032]
92 Taking context into account is important given that biases can be context-dependent, especially epistemological bias since it depends on the truth of a proposition. [sent-262, score-0.771]
93 Features 9–10 use the list of hedges from Hyland (2005), features 11–14 use the factives and assertives from Hooper (1975), features 15–16 use the implicatives from Karttunen (1971), features 19–20 use the entailments from Berant et al. [sent-265, score-0.293]
94 (2012), features 21–25 employ the subjectivity lexicon from Riloff and Wiebe (2003), and features 26–29 use the sentiment lexicon—positive and negative words—from Liu et al. [sent-266, score-0.183]
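The paired feature numbers and the remark on context suggest, though this excerpt does not spell it out, that each lexicon yields one feature for the word itself and one for its neighbours. A minimal sketch under that assumption follows. The tiny word sets are placeholders drawn from examples in this summary, not the actual Hyland (2005) or Hooper (1975) lists, and the two-word window is also an assumption.

```python
# Placeholder lexicons for illustration only (NOT the published lists).
HEDGES = {"may", "possibly"}
FACTIVES = {"realize", "reveal"}
LEXICONS = {"hedge": HEDGES, "factive": FACTIVES}


def cue_features(tokens, index, lexicons, window=2):
    """Lexicon-membership features for tokens[index] and its context.

    For each lexicon, emit one boolean for the word itself and one for
    whether any word within `window` positions is in the lexicon.
    Matching is on lowercased surface forms; no lemmatization is done.
    """
    word = tokens[index].lower()
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    context = [tokens[k].lower() for k in range(lo, hi) if k != index]
    feats = {}
    for name, lexicon in lexicons.items():
        feats[name] = word in lexicon
        feats[name + "_in_context"] = any(t in lexicon for t in context)
    return feats


tokens = "Eliminating the profit motive may have a lower rate".split()
print(cue_features(tokens, 4, LEXICONS))  # features for "may"
print(cue_features(tokens, 5, LEXICONS))  # features for "have"
```

A fuller version would match lemmas rather than surface forms, so that "realized" hits the factive entry "realize".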
95 Of the 654 words included in this lexicon, 433 were unique to this lexicon (i. [sent-271, score-0.05]
96 , recorded in neither Riloff and Wiebe’s (2003) subjectivity lexicon nor Liu et al. [sent-273, score-0.144]
97 ’s (2005) sentiment lexicon) and represented many one-sided or controversial terms, e. [sent-274, score-0.139]
98 Finally, we also included a “collaborative feature” that, based on the previous revisions of the edit’s article, computes the ratio between the number of times that the word was NPOV-edited and its frequency of occurrence. [sent-277, score-0.1]
99 This feature was designed to capture framing bias specific to an article or topic. [sent-278, score-0.621]
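The collaborative feature is described as a ratio, which can be sketched as below. The function name and the input format (a list of words removed by earlier NPOV edits plus the texts of earlier revisions) are hypothetical, as is the choice to return 0.0 for a word that never occurs.

```python
from collections import Counter


def collaborative_ratio(word, npov_edited_words, revision_texts):
    """Times `word` was NPOV-edited divided by its frequency of occurrence.

    npov_edited_words: words changed by earlier NPOV edits of the article.
    revision_texts: texts of the article's previous revisions.
    Both inputs are assumed formats, compared case-insensitively.
    """
    target = word.lower()
    edited = Counter(w.lower() for w in npov_edited_words)
    occurrences = sum(text.lower().split().count(target)
                      for text in revision_texts)
    if occurrences == 0:
        return 0.0  # assumption: unseen words get a neutral score
    return edited[target] / occurrences


ratio = collaborative_ratio(
    "claimed",
    ["claimed", "fantastic", "claimed"],
    ["he claimed that", "she claimed it and claimed more", "he stated that"],
)
print(round(ratio, 3))
```

A word that is repeatedly edited out of one article gets a high ratio, which is how the feature can pick up bias specific to that article or topic.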
100 2 Baselines Previous work on subjectivity and stance recognition has been evaluated on the task of classifying documents as opinionated vs. [sent-280, score-0.176]
wordName wordTfidf (topN-words)
[('npov', 0.388), ('edits', 0.381), ('bias', 0.364), ('epistemological', 0.244), ('biased', 0.202), ('framing', 0.196), ('factive', 0.146), ('wikipedia', 0.132), ('truth', 0.109), ('controversial', 0.1), ('revisions', 0.1), ('subjectivity', 0.094), ('hedges', 0.089), ('revision', 0.088), ('entailments', 0.082), ('stance', 0.082), ('america', 0.078), ('implicatives', 0.073), ('kuypers', 0.073), ('meditation', 0.073), ('oppression', 0.073), ('subjective', 0.07), ('wiebe', 0.067), ('overtly', 0.065), ('proposition', 0.062), ('article', 0.061), ('cues', 0.06), ('subtly', 0.06), ('accepting', 0.06), ('presupposed', 0.056), ('doubt', 0.056), ('biases', 0.054), ('propositions', 0.051), ('lexicon', 0.05), ('opposing', 0.049), ('albums', 0.049), ('assertive', 0.049), ('assertives', 0.049), ('basquiat', 0.049), ('coerced', 0.049), ('colombian', 0.049), ('colony', 0.049), ('conrad', 0.049), ('conspiracists', 0.049), ('cruel', 0.049), ('demolished', 0.049), ('disproved', 0.049), ('hooper', 0.049), ('innately', 0.049), ('kiparsky', 0.049), ('mcmansion', 0.049), ('motive', 0.049), ('policemen', 0.049), ('presuppose', 0.049), ('proclaimed', 0.049), ('reproductions', 0.049), ('rewording', 0.049), ('schnabel', 0.049), ('shwekey', 0.049), ('slavery', 0.049), ('uncontroversially', 0.049), ('entails', 0.046), ('corenlp', 0.045), ('debate', 0.045), ('policy', 0.045), ('rural', 0.043), ('boeing', 0.043), ('presuppositions', 0.043), ('intensifiers', 0.043), ('yano', 0.043), ('gay', 0.043), ('encyclopedias', 0.043), ('diff', 0.043), ('pov', 0.043), ('remove', 0.041), ('editors', 0.041), ('verbs', 0.041), ('cri', 0.04), ('plea', 0.04), ('murder', 0.04), ('cooper', 0.04), ('presupposition', 0.04), ('houses', 0.04), ('horrible', 0.04), ('engineer', 0.04), ('disputes', 0.04), ('stanford', 0.04), ('sentiment', 0.039), ('areas', 0.039), ('killing', 0.037), ('sents', 0.037), ('articles', 0.037), ('accepted', 0.037), ('realized', 0.036), ('realization', 0.036), ('cast', 0.036), ('kelly', 0.036), 
('recasens', 0.036), ('israeli', 0.036), ('certainty', 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 232 acl-2013-Linguistic Models for Analyzing and Detecting Biased Language
2 0.10931554 346 acl-2013-The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
Author: Oliver Ferschke ; Iryna Gurevych ; Marc Rittberger
Abstract: With the increasing amount of user generated reference texts in the web, automatic quality assessment has become a key challenge. However, only a small amount of annotated data is available for training quality assessment systems. Wikipedia contains a large amount of texts annotated with cleanup templates which identify quality flaws. We show that the distribution of these labels is topically biased, since they cannot be applied freely to any arbitrary article. We argue that it is necessary to consider the topical restrictions of each label in order to avoid a sampling bias that results in a skewed classifier and overly optimistic evaluation results. We factor out the topic bias by extracting reliable training instances from the revision history which have a topic distribution similar to the labeled articles. This approach better reflects the situation a classifier would face in a real-life application.
3 0.098270498 318 acl-2013-Sentiment Relevance
Author: Christian Scheible ; Hinrich Schutze
Abstract: A number of different notions, including subjectivity, have been proposed for distinguishing parts of documents that convey sentiment from those that do not. We propose a new concept, sentiment relevance, to make this distinction and argue that it better reflects the requirements of sentiment analysis systems. We demonstrate experimentally that sentiment relevance and subjectivity are related, but different. Since no large amount of labeled training data for our new notion of sentiment relevance is available, we investigate two semi-supervised methods for creating sentiment relevance classifiers: a distant supervision approach that leverages structured information about the domain of the reviews; and transfer learning on feature representations based on lexical taxonomies that enables knowledge transfer. We show that both methods learn sentiment relevance classifiers that perform well.
4 0.096156046 151 acl-2013-Extra-Linguistic Constraints on Stance Recognition in Ideological Debates
Author: Kazi Saidul Hasan ; Vincent Ng
Abstract: Determining the stance expressed by an author from a post written for a two-sided debate in an online debate forum is a relatively new problem. We seek to improve Anand et al.’s (2011) approach to debate stance classification by modeling two types of soft extra-linguistic constraints on the stance labels of debate posts, user-interaction constraints and ideology constraints. Experimental results on four datasets demonstrate the effectiveness of these inter-post constraints in improving debate stance classification.
5 0.092811055 148 acl-2013-Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams
Author: Svitlana Volkova ; Theresa Wilson ; David Yarowsky
Abstract: We study subjective language in social media and create Twitter-specific lexicons via bootstrapping sentiment-bearing terms from multilingual Twitter streams. Starting with a domain-independent, high-precision sentiment lexicon and a large pool of unlabeled data, we bootstrap Twitter-specific sentiment lexicons, using a small amount of labeled data to guide the process. Our experiments on English, Spanish and Russian show that the resulting lexicons are effective for sentiment classification for many underexplored languages in social media.
6 0.076580532 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions
7 0.065697178 121 acl-2013-Discovering User Interactions in Ideological Discussions
8 0.063577801 49 acl-2013-An annotated corpus of quoted opinions in news articles
9 0.062013954 67 acl-2013-Bi-directional Inter-dependencies of Subjective Expressions and Targets and their Value for a Joint Model
10 0.056863867 52 acl-2013-Annotating named entities in clinical text by combining pre-annotation and active learning
11 0.055504851 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates
12 0.053808965 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations
13 0.052537911 267 acl-2013-PARMA: A Predicate Argument Aligner
14 0.052396532 99 acl-2013-Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation
15 0.047279503 224 acl-2013-Learning to Extract International Relations from Political Context
16 0.045401104 211 acl-2013-LABR: A Large Scale Arabic Book Reviews Dataset
17 0.044631548 63 acl-2013-Automatic detection of deception in child-produced speech using syntactic complexity features
18 0.043268766 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words
19 0.042891115 140 acl-2013-Evaluating Text Segmentation using Boundary Edit Distance
20 0.042710118 323 acl-2013-Simpler unsupervised POS tagging with bilingual projections
topicId topicWeight
[(0, 0.118), (1, 0.091), (2, -0.012), (3, 0.029), (4, -0.017), (5, -0.016), (6, -0.017), (7, 0.001), (8, 0.027), (9, 0.031), (10, -0.01), (11, 0.012), (12, -0.048), (13, -0.001), (14, -0.066), (15, -0.007), (16, -0.017), (17, 0.019), (18, 0.002), (19, -0.028), (20, 0.001), (21, 0.001), (22, -0.023), (23, 0.064), (24, 0.037), (25, 0.011), (26, -0.066), (27, -0.044), (28, -0.007), (29, 0.016), (30, -0.02), (31, -0.035), (32, 0.015), (33, -0.005), (34, -0.005), (35, 0.024), (36, 0.075), (37, 0.014), (38, -0.096), (39, -0.078), (40, 0.041), (41, 0.024), (42, 0.048), (43, -0.034), (44, 0.051), (45, -0.023), (46, -0.008), (47, -0.134), (48, 0.005), (49, -0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.89838701 232 acl-2013-Linguistic Models for Analyzing and Detecting Biased Language
2 0.80734533 151 acl-2013-Extra-Linguistic Constraints on Stance Recognition in Ideological Debates
Author: Kazi Saidul Hasan ; Vincent Ng
Abstract: Determining the stance expressed by an author from a post written for a two-sided debate in an online debate forum is a relatively new problem. We seek to improve Anand et al.’s (2011) approach to debate stance classification by modeling two types of soft extra-linguistic constraints on the stance labels of debate posts, user-interaction constraints and ideology constraints. Experimental results on four datasets demonstrate the effectiveness of these inter-post constraints in improving debate stance classification.
3 0.75801033 287 acl-2013-Public Dialogue: Analysis of Tolerance in Online Discussions
Author: Arjun Mukherjee ; Vivek Venkataraman ; Bing Liu ; Sharon Meraz
Abstract: Social media platforms have enabled people to freely express their views and discuss issues of interest with others. While it is important to discover the topics in discussions, it is equally useful to mine the nature of such discussions or debates and the behavior of the participants. There are many questions that can be asked. One key question is whether the participants give reasoned arguments with justifiable claims via constructive debates or exhibit dogmatism and egotistic clashes of ideologies. The central idea of this question is tolerance, which is a key concept in the field of communications. In this work, we perform a computational study of tolerance in the context of online discussions. We aim to identify tolerant vs. intolerant participants and investigate how disagreement affects tolerance in discussions in a quantitative framework. To the best of our knowledge, this is the first such study. Our experiments using real-life discussions demonstrate the effectiveness of the proposed technique and also provide some key insights into the psycholinguistic phenomenon of tolerance in online discussions.
4 0.71133912 121 acl-2013-Discovering User Interactions in Ideological Discussions
Author: Arjun Mukherjee ; Bing Liu
Abstract: Online discussion forums are a popular platform for people to voice their opinions on any subject matter and to discuss or debate any issue of interest. In forums where users discuss social, political, or religious issues, there are often heated debates among users or participants. Existing research has studied mining of user stances or camps on certain issues, opposing perspectives, and contention points. In this paper, we focus on identifying the nature of interactions among user pairs. The central questions are: How does each pair of users interact with each other? Does the pair of users mostly agree or disagree? What is the lexicon that people often use to express agreement and disagreement? We present a topic model based approach to answer these questions. Since agreement and disagreement expressions are usually multiword phrases, we propose to employ a ranking method to identify highly relevant phrases prior to topic modeling. After modeling, we use the modeling results to classify the nature of interaction of each user pair. Our evaluation results using real-life discussion/debate posts demonstrate the effectiveness of the proposed techniques.
5 0.68526983 30 acl-2013-A computational approach to politeness with application to social factors
Author: Cristian Danescu-Niculescu-Mizil ; Moritz Sudhof ; Dan Jurafsky ; Jure Leskovec ; Christopher Potts
Abstract: We propose a computational framework for identifying linguistic aspects of politeness. Our starting point is a new corpus of requests annotated for politeness, which we use to evaluate aspects of politeness theory and to uncover new interactions between politeness markers and context. These findings guide our construction of a classifier with domain-independent lexical and syntactic features operationalizing key components of politeness theory, such as indirection, deference, impersonalization and modality. Our classifier achieves close to human performance and is effective across domains. We use our framework to study the relationship between politeness and social power, showing that polite Wikipedia editors are more likely to achieve high status through elections, but, once elevated, they become less polite. We see a similar negative correlation between politeness and power on Stack Exchange, where users at the top of the reputation scale are less polite than those at the bottom. Finally, we apply our classifier to a preliminary analysis of politeness variation by gender and community.
6 0.62647539 49 acl-2013-An annotated corpus of quoted opinions in news articles
7 0.59618551 278 acl-2013-Patient Experience in Online Support Forums: Modeling Interpersonal Interactions and Medication Use
9 0.56020027 318 acl-2013-Sentiment Relevance
10 0.53547049 346 acl-2013-The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
11 0.53187811 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
12 0.53119534 52 acl-2013-Annotating named entities in clinical text by combining pre-annotation and active learning
13 0.51592529 67 acl-2013-Bi-directional Inter-dependencies of Subjective Expressions and Targets and their Value for a Joint Model
14 0.49834985 61 acl-2013-Automatic Interpretation of the English Possessive
15 0.47107556 95 acl-2013-Crawling microblogging services to gather language-classified URLs. Workflow and case study
16 0.46843088 117 acl-2013-Detecting Turnarounds in Sentiment Analysis: Thwarting
17 0.44643858 148 acl-2013-Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams
18 0.43882707 63 acl-2013-Automatic detection of deception in child-produced speech using syntactic complexity features
19 0.43529156 33 acl-2013-A user-centric model of voting intention from Social Media
20 0.4331871 79 acl-2013-Character-to-Character Sentiment Analysis in Shakespeare's Plays
topicId topicWeight
[(0, 0.028), (6, 0.024), (11, 0.042), (15, 0.492), (24, 0.044), (26, 0.05), (35, 0.063), (42, 0.03), (48, 0.026), (63, 0.021), (70, 0.025), (88, 0.02), (90, 0.013), (95, 0.049)]
simIndex simValue paperId paperTitle
same-paper 1 0.8944369 232 acl-2013-Linguistic Models for Analyzing and Detecting Biased Language
Author: Marta Recasens ; Cristian Danescu-Niculescu-Mizil ; Dan Jurafsky
Abstract: Unbiased language is a requirement for reference sources like encyclopedias and scientific texts. Bias is, nonetheless, ubiquitous, making it crucial to understand its nature and linguistic realization and hence detect bias automatically. To this end we analyze real instances of human edits designed to remove bias from Wikipedia articles. The analysis uncovers two classes of bias: framing bias, such as praising or perspective-specific words, which we link to the literature on subjectivity; and epistemological bias, related to whether propositions that are presupposed or entailed in the text are uncontroversially accepted as true. We identify common linguistic cues for these classes, including factive verbs, implicatives, hedges, and subjective intensifiers. These insights help us develop features for a model to solve a new prediction task of practical importance: given a biased sentence, identify the bias-inducing word. Our linguistically-informed model performs almost as well as humans tested on the same task.
2 0.73157126 199 acl-2013-Integrating Multiple Dependency Corpora for Inducing Wide-coverage Japanese CCG Resources
Author: Sumire Uematsu ; Takuya Matsuzaki ; Hiroki Hanaoka ; Yusuke Miyao ; Hideki Mima
Abstract: This paper describes a method of inducing wide-coverage CCG resources for Japanese. While deep parsers with corpus-induced grammars have been emerging for some languages, those for Japanese have not been widely studied, mainly because most Japanese syntactic resources are dependency-based. Our method first integrates multiple dependency-based corpora into phrase structure trees and then converts the trees into CCG derivations. The method is empirically evaluated in terms of the coverage of the obtained lexicon and the accuracy of parsing.
3 0.64069033 262 acl-2013-Offspring from Reproduction Problems: What Replication Failure Teaches Us
Author: Antske Fokkens ; Marieke van Erp ; Marten Postma ; Ted Pedersen ; Piek Vossen ; Nuno Freire
Abstract: Repeating experiments is an important instrument in the scientific toolbox to validate previous work and build upon existing work. We present two concrete use cases involving key techniques in the NLP domain for which we show that reproducing results is still difficult. We show that the deviation that can be found in reproduction efforts leads to questions about how our results should be interpreted. Moreover, investigating these deviations provides new insights and a deeper understanding of the examined techniques. We identify five aspects that can influence the outcomes of experiments that are typically not addressed in research papers. Our use cases show that these aspects may change the answer to research questions leading us to conclude that more care should be taken in interpreting our results and more research involving systematic testing of methods is required in our field.
4 0.57574344 304 acl-2013-SEMILAR: The Semantic Similarity Toolkit
Author: Vasile Rus ; Mihai Lintean ; Rajendra Banjade ; Nobal Niraula ; Dan Stefanescu
Abstract: We present in this paper SEMILAR, the SEMantic simILARity toolkit. SEMILAR implements a number of algorithms for assessing the semantic similarity between two texts. It is available as a Java library and as a Java standalone application offering GUI-based access to the implemented semantic similarity methods. Furthermore, it offers facilities for manual semantic similarity annotation by experts through its component SEMILAT (a SEMantic simILarity Annotation Tool).
5 0.54983836 233 acl-2013-Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
Author: Weiwei Guo ; Hao Li ; Heng Ji ; Mona Diab
Abstract: Many current Natural Language Processing [NLP] techniques work well assuming a large context of text as input data. However they become ineffective when applied to short texts such as Twitter feeds. To overcome the issue, we want to find a related newswire document to a given tweet to provide contextual support for NLP tasks. This requires robust modeling and understanding of the semantics of short texts. The contribution of the paper is two-fold: 1. we introduce the Linking-Tweets-to-News task as well as a dataset of linked tweet-news pairs, which can benefit many NLP applications; 2. in contrast to previous research which focuses on lexical features within the short texts (text-to-word information), we propose a graph based latent variable model that models the inter short text correlations (text-to-text information). This is motivated by the observation that a tweet usually only covers one aspect of an event. We show that using tweet specific feature (hashtag) and news specific feature (named entities) as well as temporal constraints, we are able to extract text-to-text correlations, and thus completes the semantic picture of a short text. Our experiments show significant improvement of our new model over baselines with three evaluation metrics in the new task.
6 0.54109883 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
7 0.35113505 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions
8 0.33994266 250 acl-2013-Models of Translation Competitions
9 0.33684918 318 acl-2013-Sentiment Relevance
10 0.33024043 346 acl-2013-The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
11 0.32480749 148 acl-2013-Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams
12 0.32284129 63 acl-2013-Automatic detection of deception in child-produced speech using syntactic complexity features
13 0.32153487 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations
14 0.32107934 163 acl-2013-From Natural Language Specifications to Program Input Parsers
15 0.31794268 99 acl-2013-Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation
16 0.31382075 59 acl-2013-Automated Pyramid Scoring of Summaries using Distributional Semantics
17 0.31266314 30 acl-2013-A computational approach to politeness with application to social factors
18 0.30661938 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
19 0.3065801 140 acl-2013-Evaluating Text Segmentation using Boundary Edit Distance
20 0.30576569 33 acl-2013-A user-centric model of voting intention from Social Media