287 acl-2013-Public Dialogue: Analysis of Tolerance in Online Discussions

Source: pdf

Author: Arjun Mukherjee ; Vivek Venkataraman ; Bing Liu ; Sharon Meraz

Abstract: Social media platforms have enabled people to freely express their views and discuss issues of interest with others. While it is important to discover the topics in discussions, it is equally useful to mine the nature of such discussions or debates and the behavior of the participants. There are many questions that can be asked. One key question is whether the participants give reasoned arguments with justifiable claims via constructive debates or exhibit dogmatism and egotistic clashes of ideologies. The central idea of this question is tolerance, which is a key concept in the field of communications. In this work, we perform a computational study of tolerance in the context of online discussions. We aim to identify tolerant vs. intolerant participants and investigate how disagreement affects tolerance in discussions in a quantitative framework. To the best of our knowledge, this is the first such study. Our experiments using real-life discussions demonstrate the effectiveness of the proposed technique and also provide some key insights into the psycholinguistic phenomenon of tolerance in online discussions.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 While it is important to discover the topics in discussions, it is equally useful to mine the nature of such discussions or debates and the behavior of the participants. [sent-4, score-0.291]

2 One key question is whether the participants give reasoned arguments with justifiable claims via constructive debates or exhibit dogmatism and egotistic clashes of ideologies. [sent-6, score-0.331]

3 In this work, we perform a computational study of tolerance in the context of online discussions. [sent-8, score-0.513]

4 We aim to identify tolerant vs. intolerant participants and investigate how disagreement affects tolerance in discussions in a quantitative framework. [sent-10, score-1.246]

5 Our experiments using real-life discussions demonstrate the effectiveness of the proposed technique and also provide some key insights into the psycholinguistic phenomenon of tolerance in online discussions. [sent-12, score-0.723]

6 In this work, we adopt this definition, and also employ the following characteristics of tolerance (also known as “code of conduct”) (Crocker, 2005; Gutmann and Thompson, 1996) to guide our work. [sent-24, score-0.453]

7 , using proper language irrespective of agreement or disagreement of views. [sent-30, score-0.335]

8 The issue of tolerance has been actively researched in the field of communications for the past two decades, and has been investigated in multiple dimensions. [sent-31, score-0.522]

9 However, existing studies are typically qualitative and focus on theorizing the socio-linguistic aspects of tolerance (more details in §2). [sent-32, score-0.453]

10 With the rapid growth of social media, the large volumes of online discussions/debates offer a golden opportunity to investigate people’s implicit psyche in discussions quantitatively based on the real-life data, i. [sent-33, score-0.321]

11 , their tolerance levels and their arguing nature, which are of fundamental interest to several fields, e. [sent-35, score-0.505]

12 Analyzing how disagreement affects tolerance and estimating the tipping point of such effects. [sent-44, score-1.054]

13 These allow us to generate a set of novel features from the estimated latent variables of DTM capable of capturing authors’ tolerance psyche during discussions. [sent-52, score-0.524]

14 The features are then used in learning to identify tolerant and intolerant authors. [sent-53, score-0.548]

15 The second task studies the interplay of tolerance and disagreement. [sent-55, score-0.484]

16 It is well-known that tolerance facilitates constructive disagreements, but sustained disagreements often result in a transition to destructive disagreement leading to polarization and intolerance (Dahlgren, 2005). [sent-56, score-1.024]

17 An interesting question is: What is the tipping point of disagreement to exhibit intolerance? [sent-57, score-0.636]

18 We take a Bayesian approach to seek an answer and discover issue-specific tipping points. [sent-58, score-0.27]

19 Finally, this work also produces an annotated corpus of tolerant and intolerant users in online discussions across two domains: politics and religion. [sent-60, score-0.883]

20 2 Related Work Although limited work has been done on analysis of tolerance in online discussions, there are several general research areas that are related to our work. [sent-62, score-0.513]

21 Tolerance has also been investigated in the domain of political communications with an emphasis on political sophistication (Gastil and Dillard, 1999), civic culture (Dahlgren, 2002), and democracy (Fishkin, 1991). [sent-71, score-0.341]

22 These existing works study tolerance from the qualitative perspective. [sent-72, score-0.453]

23 Tolerance in discussions refers to the reception of certain views and often indicated by agreement and disagreement expressions and other features (§5). [sent-81, score-0.511]

24 Online discussions or debates: Several works put authors in debate into support and oppose camps. [sent-82, score-0.296]

25 However, these works do not consider tolerance analysis in debate discussions, which is the focus of this work. [sent-89, score-0.566]

26 In a similar vein, several classification methods have been proposed to recognize opinion stances and speaker sides in online debates (Somasundaran and Wiebe, 2009; Thomas et al. [sent-90, score-0.252]

27 Other related works studying dialogue and discourse in discussions include authority recognition (Mayfield and Rosé, 2011), dialogue act segmentation and classification (Morbini and Sagae, 2011; Boyer et al. [sent-99, score-0.271]

28 But they are not designed to identify tolerance or to analyze tipping points of disagreements for intolerance in discussions which are the focus of this work. [sent-103, score-1.042]

29 The full data is used for modeling, but 436 and 501 authors from Politics and Religion domains were manually labeled as being tolerant or intolerant (Table 1(c)) respectively for classification experiments. [sent-112, score-0.635]

30 The judges are fluent in English and were briefed on the definition of tolerance (see § 1). [sent-114, score-0.497]

31 From each domain (Politics, Religion), we randomly sampled authors having not more than 60 posts in order to reduce the labeling burden as the judges need to read all posts and see all interactions of each author before providing a label. [sent-115, score-0.414]

32 In our labeling, we found that users strongly exhibit one dominant trait: tolerant or intolerant, as our data consists of topics like elections, immigration, theism, terrorism, and vegetarianism across politics and religion domains, [table residue: Domain, Posts, Authors columns; values not recoverable]. [sent-120, score-0.517]

33 This shows that tolerance as defined in § 1 is quite decisive and one can decide whether a debater is exhibiting tolerant vs. [sent-130, score-0.661]

34 4 Model We now present our generative model to capture the key aspects of discussions/debates and their intricate relationships, which enable us to (1) design sophisticated features for classification and (2) perform an in-depth analysis of the interplay of disagreement and tolerance. [sent-134, score-0.354]

35 DTM is a semi-supervised generative model motivated by the joint occurrence of various topics; and agreement and disagreement expressions (abbreviated AD-expressions hereon) in debate posts. [sent-136, score-0.503]

36 ting, we model topics and debate expression distributions specific to authors as this work is concerned with modeling authors’ (in)tolerance nature. [sent-157, score-0.228]

37 These will be used to produce a rich set of user behavioral features for characterizing tolerance in §5. [sent-636, score-0.559]

38 5 Feature Engineering We now propose features which will be used for model building to classify tolerant and intolerant authors in Table 1(c). [sent-638, score-0.579]

39 1 Language based Features of Tolerance Word and POS n-grams: As tolerance in communication is directly reflected in language usage, word n-grams are obvious features. [sent-641, score-0.512]

40 The rationale of using POS tag based features is that intolerant communications are often characterized by hate/egotistic speech which have pronounced use of specific part of speech (e. [sent-643, score-0.376]

41 As tolerance in discussions is characterized by reasoned expressions which often accompany sourcing (e. [sent-648, score-0.686]

42 2 Debate Expression Features AD-expressions: As we have seen in §4, DTM can discover specific agreement and disagreement expressions in debates. [sent-658, score-0.39]

43 3 User Behavioral Features Here we propose several features of user interaction which reflect the socio-psychological state of tolerance while participating in discussions. [sent-682, score-0.496]

44 We use the probability mass assigned to each arguing nature type as a user behavioral feature. [sent-694, score-0.205]

45 (3) Behavioral Response: As intolerant users are likely to attract more disagreement, it is naturally useful to estimate the response (agreeing vs. [sent-714, score-0.398]

46 (6) where the inner expectation is taken over all posts of ? [sent-853, score-0.178]

47 The aggressive posting behavior is weighted by author’s disagreeing nature because a person usually exhibits a dominating nature when he pushes hard to establish his ideology (which is often in disagreement with others) (Moxey and Sanford, 2000). [sent-860, score-0.476]

48 Topic Shifts: An interesting phenomenon of human (social) psyche is that when people are unable to logically argue their stances and feel they are losing the debate, they often try to belittle/deride others by pulling unrelated topics into discussion (Slavin and Kriegman, 1992). [sent-861, score-0.182]

49 Topic shifts thus have a relation with tolerance in deliberation. [sent-863, score-0.503]

50 Stromer-Galley (2005) reported that if the discussion is off topic, then tolerance or deliberation cannot meet its objective of deep consideration of an issue. [sent-864, score-0.524]

51 across various posts in a thread can serve as a good feature for measuring tolerance. [sent-866, score-0.234]

52 Finally, we note that by no means do we claim that the mere presence and a large value of any of the above features imply that a user is intolerant or tolerant. [sent-914, score-0.383]

53 They are indicators of the phenomenon of tolerance in discussions/debates. [sent-915, score-0.487]

54 In particular, we first consider the task of classifying whether an author is tolerant or intolerant in discussions. [sent-919, score-0.617]

55 We employ a linear kernel SVM (using the SVMLight system (Joachims, 1999)) and report 5-fold cross validation (CV) results on the task of predicting the socio-psychological nature of users’ communication: tolerant vs. [sent-923, score-0.255]

56 was then fitted (using the approach in (Hofmann, 1999)) to the test set users and their posts for generating the features of the test instances. [sent-927, score-0.193]

57 Using heuristic factor analyses (HFA) of reasoned and sourced expressions (Table 4) brings about 1% and 2% improvement in accuracy in politics and religion domains respectively. [sent-938, score-0.319]

58 3 produced from DTM progressively improve classification accuracies by 4% and 8% in politics domains and 5% and 6% in religion domains. [sent-942, score-0.235]

59 This shows that the debate expressions and user behaviors computed using the DTM model can capture various dimensions of (in)tolerance not captured by n-grams. [sent-946, score-0.211]

60 We now quantitatively study the effect of disagreement on tolerance. [sent-949, score-0.335]

61 We recall from § 1 that tolerance indicates constructive discussion and allows disagreement. [sent-950, score-0.498]

62 Some level of disagreement is often times an integral component of deliberation and tolerance (Cappella et al. [sent-951, score-0.819]

63 The distinction is that the former is aimed at arriving at a consensus or solution, while the latter leads to polarization and intolerance (Sunstein, 2002). [sent-954, score-0.189]

64 expected disagreement over all posts in each issue/thread, # Posts: the total number of posts, # Users: the total number of users/authors, % Intol: % of intolerant users in each thread, ? [sent-977, score-0.828]

65 : the estimated tipping point, and p-value: computed from two-tailed Fisher’s exact test. [sent-978, score-0.27]

66 greement often takes a transition towards destructive disagreement and is likely to lead to intolerance. [sent-979, score-0.295]

67 In such cases, the participants often stubbornly stick to an extreme attitude, which eventually results in intolerance and defeats the very purpose of deliberative discussion. [sent-981, score-0.306]

68 An intriguing research question is: What is the relationship between disagreement and intolerance? [sent-982, score-0.295]

69 To derive quantitative and definite conclusions, it is required to perform the following tasks: • For each issue, empirically investigate in expectation the tipping point of disagreement beyond which a user tends to be intolerant. [sent-986, score-0.687]

70 • Further, investigate the confidence on the estimated tipping point (i. [sent-987, score-0.306]

71 , what is the likelihood that the estimated tipping point is statistically significant instead of chance alone). [sent-989, score-0.306]

72 ) are the estimates of agreeing and disagreeing nature of an author and ? [sent-1004, score-0.284]

73 < 1 serves as a tipping point of disagreement beyond which intolerance is exhibited. [sent-1022, score-0.757]

74 indeed serves as the tipping point of disagreement to exhibit intolerance corresponds to rejecting the null hypothesis that the probabilities in (8) are equal. [sent-1073, score-0.792]

75 We employ a Fisher’s exact test to test significance and report confidence measures (using p-values) for the tipping point thresholds. [sent-1074, score-0.306]

76 is computed using the entropy method in (Fayyad and Irani, 1993) as follows: We first fit our previously learned model (using the data in Table 1 (a)) to the new threads in Table 6 and its users and posts to obtain the estimates of ? [sent-1077, score-0.243]

77 Then, for each user we have his predicted deliberative (social) psyche (Tolerant vs. [sent-1089, score-0.227]

78 Intolerant) and also his overall disagreeing nature exhibited in that thread (the posterior on ? [sent-1090, score-0.274]

79 For a thread, tolerant and intolerant users (data points) span the range [0, 1] attaining different values for ? [sent-1099, score-0.606]

80 Each candidate tipping point of disagreement, 0 ≤ ? [sent-1108, score-0.306]

81 ′ ≤ 1 results in a binary partition of the range with each partition containing some proportion of tolerant and intolerant users. [sent-1109, score-0.548]

82 We compute the entropy of the partition for every candidate tipping point in the range [0, 1]. [sent-1110, score-0.306]

83 Across all threads/issues, we find that the expected disagreement over all posts, ? [sent-1115, score-0.295]

84 5 showing that in discussions of the reported issues, disagreement predominates. [sent-1127, score-0.416]

85 The percentage of intolerant users increases with the expected overall disagreement in the issue except for the issue Obama euphoria. [sent-1142, score-0.759]

86 The estimated tipping point of disagreement to exhibit intolerance, ? [sent-1144, score-0.636]

87 This reflects that as overall disagreement in the issue increases, the tipping point of intolerance decreases, i. [sent-1156, score-0.79]

88 , due to high discussion heat, people are likely to turn intolerant even with relatively small amount of disagreement. [sent-1158, score-0.34]

89 As judging all users across all threads would require reading about 7000 posts, for confirmation, we randomly sampled 30 authors across various threads for labeling by our judges. [sent-1161, score-0.189]

90 Table 6 shows that for moderately heated issues (healthcare, Europe’s collapse), in expectation, author’s disagreement ? [sent-1164, score-0.375]

91 However, for sensitive issues, we find that the tipping point is much lower, abortion: 37%; socialism: 48%. [sent-1173, score-0.306]

92 ? = 66% overall disagreement, the percentage of intolerant users remains the lowest (30%) and the tipping point attains the highest value (? [sent-1186, score-0.704]

93 7 Conclusion This work performed a deep analysis of the socio-psychological and psycholinguistic phenomenon of tolerance in online discussions, which is an important concept in the field of communications. [sent-1193, score-0.602]

94 A novel framework is proposed, which is capable of characterizing and classifying tolerance in online discussions. [sent-1194, score-0.513]

95 Further, a novel technique was also proposed to quantitatively evaluate the interplay of tolerance and disagreement. [sent-1195, score-0.524]

96 Our empirical results using real-life online discussions render key insights into the psycholinguistic process of tolerance and dovetail with existing theories in psychology and communications. [sent-1196, score-0.689]

97 In our future work, we want to further this research and study the role of diversity of opinions in the context of tolerance and its relation to polarization. [sent-1198, score-0.491]

98 The power of negative thinking: Exploiting label disagreement in the min-cut classification framework. [sent-1223, score-0.323]

99 In search of the talkative public: Media, deliberative democracy and civic culture. [sent-1289, score-0.222]

100 Identifying agreement and disagreement in conversational speech: Use of Bayesian networks to model pragmatic dependencies. [sent-1319, score-0.335]
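
The tipping-point procedure summarized in sentences 75 to 82 above (pick the cut on an author's disagreement score whose binary partition has the lowest entropy, then check the cut with a two-tailed Fisher's exact test) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy data, and the use of midpoints between observed values as candidate cuts are all assumptions, and the stopping criterion of the Fayyad and Irani (1993) method is omitted.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def tipping_point(disagreement, intolerant):
    """Cut on the disagreement score that minimizes the size-weighted
    entropy of the two resulting partitions (hypothetical helper)."""
    pairs = list(zip(disagreement, intolerant))
    values = sorted(set(disagreement))
    best_cut, best_h = None, float("inf")
    # Assumption: candidate cuts are midpoints between distinct observed scores.
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        left = [y for x, y in pairs if x <= cut]
        right = [y for x, y in pairs if x > cut]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if h < best_h:
            best_cut, best_h = cut, h
    return best_cut

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Toy data (invented): per-author disagreement score and intolerant label.
disagreement = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
intolerant = [0, 0, 0, 0, 1, 1, 1, 1]

cut = tipping_point(disagreement, intolerant)
above = [y for x, y in zip(disagreement, intolerant) if x > cut]
below = [y for x, y in zip(disagreement, intolerant) if x <= cut]
p = fisher_exact_two_tailed(sum(above), len(above) - sum(above),
                            sum(below), len(below) - sum(below))
print(cut, p)
```

On this toy data the cut falls between 0.5 and 0.6 and the p-value is 2/70, about 0.029. Real threads would be noisier, and the paper's estimates come from the fitted model's posteriors rather than from hand-set scores.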

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tolerance', 0.453), ('intolerant', 0.34), ('disagreement', 0.295), ('tipping', 0.27), ('tolerant', 0.208), ('dtm', 0.201), ('intolerance', 0.156), ('posts', 0.135), ('discussions', 0.121), ('dahlgren', 0.113), ('deliberative', 0.113), ('debate', 0.113), ('mukherjee', 0.111), ('thread', 0.099), ('political', 0.098), ('politics', 0.096), ('disagreeing', 0.087), ('debates', 0.086), ('gastil', 0.085), ('religion', 0.083), ('agreeing', 0.081), ('deliberation', 0.071), ('democracy', 0.071), ('psyche', 0.071), ('author', 0.069), ('behavioral', 0.063), ('dialogue', 0.061), ('online', 0.06), ('communication', 0.059), ('users', 0.058), ('cappella', 0.057), ('moxey', 0.057), ('reasoned', 0.057), ('sanford', 0.057), ('psycholinguistic', 0.055), ('expressions', 0.055), ('pennebaker', 0.052), ('arguing', 0.052), ('adexpressions', 0.05), ('heated', 0.05), ('shifts', 0.05), ('threads', 0.05), ('nature', 0.047), ('expression', 0.047), ('liu', 0.045), ('constructive', 0.045), ('judges', 0.044), ('expectation', 0.043), ('user', 0.043), ('boyer', 0.043), ('dogmatism', 0.043), ('fishkin', 0.043), ('intol', 0.043), ('kriegman', 0.043), ('slavin', 0.043), ('disagreements', 0.042), ('obama', 0.042), ('draw', 0.042), ('posterior', 0.041), ('agreement', 0.04), ('quantitatively', 0.04), ('stances', 0.04), ('topic', 0.039), ('opinions', 0.038), ('opinion', 0.038), ('irani', 0.038), ('fayyad', 0.038), ('sunstein', 0.038), ('civic', 0.038), ('morbini', 0.038), ('topical', 0.037), ('chung', 0.037), ('participants', 0.037), ('topics', 0.037), ('point', 0.036), ('communications', 0.036), ('exhibit', 0.035), ('thinking', 0.034), ('phenomenon', 0.034), ('media', 0.033), ('issue', 0.033), ('polarization', 0.033), ('agrawal', 0.033), ('oppose', 0.031), ('interplay', 0.031), ('authors', 0.031), ('member', 0.031), ('public', 0.031), ('issues', 0.03), ('mayfield', 0.03), ('elections', 0.029), ('social', 0.029), ('domains', 0.028), ('classification', 0.028), ('clashes', 0.028), ('critchley', 0.028), 
('crocker', 0.028), ('dillard', 0.028), ('escobar', 0.028), ('fruchter', 0.028)]
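
The word weights above and the simValue scores in the list that follows are consistent with a tf-idf bag-of-words model compared by cosine similarity, although the page does not state the exact weighting variant. The sketch below is one common formulation (relative term frequency times log inverse document frequency, L2-normalized) under that assumption; the function names and the three toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a dict of L2-normalized tf-idf weights."""
    n = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(w for doc in docs for w in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({w: v / norm for w, v in vec.items()} if norm else vec)
    return vectors

def cosine(u, v):
    """Cosine similarity of two L2-normalized sparse vectors."""
    return sum(weight * v.get(w, 0.0) for w, weight in u.items())

# Toy corpus (invented) standing in for three paper abstracts.
docs = [
    "tolerance disagreement tipping point".split(),
    "disagreement agreement user interaction".split(),
    "verb clustering diathesis alternation".split(),
]
vecs = tfidf_vectors(docs)
top_words = sorted(vecs[0].items(), key=lambda kv: kv[1], reverse=True)
print(top_words)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

With normalized vectors, a document compared with itself scores 1 up to floating-point error, which would explain a same-paper value such as 1.0000019.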

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000019 287 acl-2013-Public Dialogue: Analysis of Tolerance in Online Discussions

Author: Arjun Mukherjee ; Vivek Venkataraman ; Bing Liu ; Sharon Meraz

2 0.42768732 121 acl-2013-Discovering User Interactions in Ideological Discussions

Author: Arjun Mukherjee ; Bing Liu

Abstract: Online discussion forums are a popular platform for people to voice their opinions on any subject matter and to discuss or debate any issue of interest. In forums where users discuss social, political, or religious issues, there are often heated debates among users or participants. Existing research has studied mining of user stances or camps on certain issues, opposing perspectives, and contention points. In this paper, we focus on identifying the nature of interactions among user pairs. The central questions are: How does each pair of users interact with each other? Does the pair of users mostly agree or disagree? What is the lexicon that people often use to express agreement and disagreement? We present a topic model based approach to answer these questions. Since agreement and disagreement expressions are usually multiword phrases, we propose to employ a ranking method to identify highly relevant phrases prior to topic modeling. After modeling, we use the modeling results to classify the nature of interaction of each user pair. Our evaluation results using real-life discussion/debate posts demonstrate the effectiveness of the proposed techniques.

3 0.13611557 151 acl-2013-Extra-Linguistic Constraints on Stance Recognition in Ideological Debates

Author: Kazi Saidul Hasan ; Vincent Ng

Abstract: Determining the stance expressed by an author from a post written for a two-sided debate in an online debate forum is a relatively new problem. We seek to improve Anand et al.’s (2011) approach to debate stance classification by modeling two types of soft extra-linguistic constraints on the stance labels of debate posts, user-interaction constraints and ideology constraints. Experimental results on four datasets demonstrate the effectiveness of these inter-post constraints in improving debate stance classification.

4 0.10785592 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions

Author: Amjad Abu-Jbara ; Ben King ; Mona Diab ; Dragomir Radev

Abstract: In this paper, we use Arabic natural language processing techniques to analyze Arabic debates. The goal is to identify how the participants in a discussion split into subgroups with contrasting opinions. The members of each subgroup share the same opinion with respect to the discussion topic and an opposing opinion to the members of other subgroups. We use opinion mining techniques to identify opinion expressions and determine their polarities and their targets. We use opinion predictions to represent the discussion in one of two formal representations: signed attitude network or a space of attitude vectors. We identify opinion subgroups by partitioning the signed network representation or by clustering the vector space representation. We evaluate the system using a data set of labeled discussions and show that it achieves good results.

5 0.077385597 49 acl-2013-An annotated corpus of quoted opinions in news articles

Author: Tim O'Keefe ; James R. Curran ; Peter Ashwell ; Irena Koprinska

Abstract: Quotes are used in news articles as evidence of a person’s opinion, and thus are a useful target for opinion mining. However, labelling each quote with a polarity score directed at a textually-anchored target can ignore the broader issue that the speaker is commenting on. We address this by instead labelling quotes as supporting or opposing a clear expression of a point of view on a topic, called a position statement. Using this we construct a corpus covering 7 topics with 2,228 quotes.

6 0.070624508 298 acl-2013-Recognizing Rare Social Phenomena in Conversation: Empowerment Detection in Support Group Chatrooms

7 0.068780944 224 acl-2013-Learning to Extract International Relations from Political Context

8 0.067822158 147 acl-2013-Exploiting Topic based Twitter Sentiment for Stock Prediction

9 0.066977955 33 acl-2013-A user-centric model of voting intention from Social Media

10 0.063729644 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model

11 0.063551322 207 acl-2013-Joint Inference for Fine-grained Opinion Extraction

12 0.060480889 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations

13 0.054425597 115 acl-2013-Detecting Event-Related Links and Sentiments from Social Media Texts

14 0.054053888 351 acl-2013-Topic Modeling Based Classification of Clinical Reports

15 0.053427808 67 acl-2013-Bi-directional Inter-dependencies of Subjective Expressions and Targets and their Value for a Joint Model

16 0.052069545 244 acl-2013-Mining Opinion Words and Opinion Targets in a Two-Stage Framework

17 0.050504915 336 acl-2013-Syntactic Patterns versus Word Alignment: Extracting Opinion Targets from Online Reviews

18 0.05019654 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users

19 0.048629284 114 acl-2013-Detecting Chronic Critics Based on Sentiment Polarity and User's Behavior in Social Media

20 0.048067451 55 acl-2013-Are Semantically Coherent Topic Models Useful for Ad Hoc Information Retrieval?

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.144), (1, 0.138), (2, -0.013), (3, 0.061), (4, 0.026), (5, 0.005), (6, 0.037), (7, -0.046), (8, -0.065), (9, -0.019), (10, -0.025), (11, 0.067), (12, 0.014), (13, 0.028), (14, -0.076), (15, -0.097), (16, -0.052), (17, 0.067), (18, 0.061), (19, -0.083), (20, -0.01), (21, -0.024), (22, -0.008), (23, -0.033), (24, 0.026), (25, 0.03), (26, -0.119), (27, -0.068), (28, 0.005), (29, -0.043), (30, 0.113), (31, -0.027), (32, 0.067), (33, 0.023), (34, -0.005), (35, 0.072), (36, 0.15), (37, -0.023), (38, -0.139), (39, -0.051), (40, 0.145), (41, 0.028), (42, 0.133), (43, -0.179), (44, 0.073), (45, 0.022), (46, -0.091), (47, -0.269), (48, 0.068), (49, -0.044)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94148546 287 acl-2013-Public Dialogue: Analysis of Tolerance in Online Discussions

Author: Arjun Mukherjee ; Vivek Venkataraman ; Bing Liu ; Sharon Meraz

2 0.86660421 151 acl-2013-Extra-Linguistic Constraints on Stance Recognition in Ideological Debates

Author: Kazi Saidul Hasan ; Vincent Ng

3 0.80056643 121 acl-2013-Discovering User Interactions in Ideological Discussions

Author: Arjun Mukherjee ; Bing Liu

4 0.61369687 232 acl-2013-Linguistic Models for Analyzing and Detecting Biased Language

Author: Marta Recasens ; Cristian Danescu-Niculescu-Mizil ; Dan Jurafsky

Abstract: Unbiased language is a requirement for reference sources like encyclopedias and scientific texts. Bias is, nonetheless, ubiquitous, making it crucial to understand its nature and linguistic realization and hence detect bias automatically. To this end we analyze real instances of human edits designed to remove bias from Wikipedia articles. The analysis uncovers two classes of bias: framing bias, such as praising or perspective-specific words, which we link to the literature on subjectivity; and epistemological bias, related to whether propositions that are presupposed or entailed in the text are uncontroversially accepted as true. We identify common linguistic cues for these classes, including factive verbs, implicatives, hedges, and subjective intensifiers. These insights help us develop features for a model to solve a new prediction task of practical importance: given a biased sentence, identify the bias-inducing word. Our linguistically-informed model performs almost as well as humans tested on the same task.

5 0.57201576 49 acl-2013-An annotated corpus of quoted opinions in news articles

Author: Tim O'Keefe ; James R. Curran ; Peter Ashwell ; Irena Koprinska

6 0.51123983 278 acl-2013-Patient Experience in Online Support Forums: Modeling Interpersonal Interactions and Medication Use

7 0.49336138 30 acl-2013-A computational approach to politeness with application to social factors

8 0.43106544 298 acl-2013-Recognizing Rare Social Phenomena in Conversation: Empowerment Detection in Support Group Chatrooms

9 0.42611435 33 acl-2013-A user-centric model of voting intention from Social Media

10 0.4220978 54 acl-2013-Are School-of-thought Words Characterizable?

11 0.41571203 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model

12 0.39906302 67 acl-2013-Bi-directional Inter-dependencies of Subjective Expressions and Targets and their Value for a Joint Model

13 0.36328328 147 acl-2013-Exploiting Topic based Twitter Sentiment for Stock Prediction

14 0.362872 184 acl-2013-Identification of Speakers in Novels

15 0.35473716 351 acl-2013-Topic Modeling Based Classification of Clinical Reports

16 0.33826524 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions

17 0.33615908 190 acl-2013-Implicatures and Nested Beliefs in Approximate Decentralized-POMDPs

18 0.33540428 95 acl-2013-Crawling microblogging services to gather language-classified URLs. Workflow and case study

19 0.33322936 224 acl-2013-Learning to Extract International Relations from Political Context

20 0.32916683 126 acl-2013-Diverse Keyword Extraction from Conversations

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.044), (4, 0.105), (6, 0.023), (11, 0.052), (15, 0.021), (18, 0.248), (24, 0.046), (26, 0.046), (28, 0.012), (35, 0.077), (42, 0.031), (48, 0.045), (63, 0.022), (70, 0.04), (88, 0.035), (90, 0.025), (95, 0.051)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.7911706 287 acl-2013-Public Dialogue: Analysis of Tolerance in Online Discussions

Author: Arjun Mukherjee ; Vivek Venkataraman ; Bing Liu ; Sharon Meraz

2 0.68532896 119 acl-2013-Diathesis alternation approximation for verb clustering

Author: Lin Sun ; Diana McCarthy ; Anna Korhonen

Abstract: Although diathesis alternations have been used as features for manual verb classification, and there is recent work on incorporating such features in computational models of human language acquisition, work on large scale verb classification has yet to examine the potential for using diathesis alternations as input features to the clustering process. This paper proposes a method for approximating diathesis alternation behaviour in corpus data and shows, using a state-of-the-art verb clustering system, that features based on alternation approximation outperform those based on independent subcategorization frames. Our alternation-based approach is particularly adept at leveraging information from less frequent data.

3 0.62914574 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis

Author: Shafiq Joty ; Giuseppe Carenini ; Raymond Ng ; Yashar Mehdad

Abstract: We propose a novel approach for developing a two-stage document-level discourse parser. Our parser builds a discourse tree by applying an optimal parsing algorithm to probabilities inferred from two Conditional Random Fields: one for intra-sentential parsing and the other for multi-sentential parsing. We present two approaches to combine these two stages of discourse parsing effectively. A set of empirical evaluations over two different datasets demonstrates that our discourse parser significantly outperforms the state-of-the-art, often by a wide margin.

4 0.56114864 315 acl-2013-Semi-Supervised Semantic Tagging of Conversational Understanding using Markov Topic Regression

Author: Asli Celikyilmaz ; Dilek Hakkani-Tur ; Gokhan Tur ; Ruhi Sarikaya

Abstract: Finding concepts in natural language utterances is a challenging task, especially given the scarcity of labeled data for learning semantic ambiguity. Furthermore, data mismatch issues, which arise when the expected test (target) data does not exactly match the training data, aggravate this scarcity problem. To deal with these issues, we describe an efficient semi-supervised learning (SSL) approach which has two components: (i) Markov Topic Regression is a new probabilistic model to cluster words into semantic tags (concepts). It can efficiently handle semantic ambiguity by extending standard topic models with two new features. First, it encodes word n-gram features from labeled source and unlabeled target data. Second, by going beyond a bag-of-words approach, it takes into account the inherent sequential nature of utterances to learn semantic classes based on context. (ii) Retrospective Learner is a new learning technique that adapts to the unlabeled target data. Our new SSL approach improves semantic tagging performance by 3% absolute over the baseline models, and also compares favorably on semi-supervised syntactic tagging.

5 0.55675018 273 acl-2013-Paraphrasing Adaptation for Web Search Ranking

Author: Chenguang Wang ; Nan Duan ; Ming Zhou ; Ming Zhang

Abstract: Mismatch between queries and documents is a key issue for the web search task. In order to narrow down such mismatch, in this paper, we present an in-depth investigation on adapting a paraphrasing technique to web search from three aspects: a search-oriented paraphrasing model; an NDCG-based parameter optimization algorithm; an enhanced ranking model leveraging augmented features computed on paraphrases of original queries. Experiments performed on the large scale query-document data set show that the search performance can be significantly improved, with +3.28% and +1.14% NDCG gains on dev and test sets respectively.

6 0.55180532 173 acl-2013-Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging

7 0.54652119 121 acl-2013-Discovering User Interactions in Ideological Discussions

8 0.54233849 294 acl-2013-Re-embedding words

9 0.5116455 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation

10 0.50560021 309 acl-2013-Scaling Semi-supervised Naive Bayes with Feature Marginals

11 0.48874059 318 acl-2013-Sentiment Relevance

12 0.48750207 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations

13 0.48599854 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words

14 0.48308524 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation

15 0.48285258 233 acl-2013-Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media

16 0.48229629 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri

17 0.48148948 341 acl-2013-Text Classification based on the Latent Topics of Important Sentences extracted by the PageRank Algorithm

18 0.4809365 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation

19 0.48054343 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering

20 0.47990274 219 acl-2013-Learning Entity Representation for Entity Disambiguation