121 acl-2013-Discovering User Interactions in Ideological Discussions

Source: pdf

Author: Arjun Mukherjee ; Bing Liu

Abstract: Online discussion forums are a popular platform for people to voice their opinions on any subject matter and to discuss or debate any issue of interest. In forums where users discuss social, political, or religious issues, there are often heated debates among users or participants. Existing research has studied mining of user stances or camps on certain issues, opposing perspectives, and contention points. In this paper, we focus on identifying the nature of interactions among user pairs. The central questions are: How does each pair of users interact with each other? Does the pair of users mostly agree or disagree? What is the lexicon that people often use to express agreement and disagreement? We present a topic model based approach to answer these questions. Since agreement and disagreement expressions are usually multiword phrases, we propose to employ a ranking method to identify highly relevant phrases prior to topic modeling. After modeling, we use the modeling results to classify the nature of interaction of each user pair. Our evaluation results using real-life discussion/debate posts demonstrate the effectiveness of the proposed techniques.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 In forums where users discuss social, political, or religious issues, there are often heated debates among users or participants. [sent-2, score-0.384]

2 Existing research has studied mining of user stances or camps on certain issues, opposing perspectives, and contention points. [sent-3, score-0.356]

3 In this paper, we focus on identifying the nature of interactions among user pairs. [sent-4, score-0.257]

4 Since agreement and disagreement expressions are usually multiword phrases, we propose to employ a ranking method to identify highly relevant phrases prior to topic modeling. [sent-9, score-1.001]

5 After modeling, we use the modeling results to classify the nature of interaction of each user pair. [sent-10, score-0.3]

6 There have been some related works that focus on discovering the general topics and ideological perspectives in online discussions (Ahmed and Xing, 2010), placing users in support/oppose camps (Agarwal et al. [sent-22, score-0.645]

7 , 2003), and classifying user stances (Somasundaran and Wiebe, 2009). [sent-23, score-0.23]

8 However, these works operate at a rather coarse level and have not considered more fine-grained characteristics of debates/discussions, where users interact by quoting/replying to each other to express agreement or disagreement and argue with one another. [sent-24, score-0.567]

9 The nature of interaction of each pair of users or participants who have engaged in the discussion of certain issues, i. [sent-26, score-0.396]

10 What language expressions are often used to express agreement (e. [sent-30, score-0.254]

11 , “I agree” and “you’re right”) and disagreement (e. [sent-32, score-0.312]

12 We note that although agreement and disagreement expressions are distinct from traditional sentiment expressions (words and phrases) such as good, excellent, bad, and horrible, agreement and disagreement clearly express a kind of sentiment as well. [sent-35, score-1.438]

13 They are usually emitted during interactive exchanges of arguments in ideological discussions. [sent-36, score-0.272]

14 We define the polarity of agreement expressions as positive and the polarity of disagreement expressions as negative. [sent-38, score-0.642]

15 We refer to agreement and disagreement expressions as AD-sentiment expressions, or AD-expressions for short. [sent-39, score-0.566]

16 AD-expressions are crucial for the analysis of interactive discussions and debates just as sentiment expressions are instrumental in sentiment analysis (Liu, 2012). [sent-40, score-0.558]

17 In our earlier work (Mukherjee and Liu, 2012a), we proposed three topic models to mine contention points, which also extract AD-expressions. [sent-44, score-0.205]

18 In this paper, we further improve the work by coupling an information retrieval method to rank good candidate phrases with topic modeling in order to discover more accurate AD-expressions. [sent-45, score-0.385]

19 Furthermore, we apply the resulting AD-expressions to the new task of classifying the arguing or interaction nature of each pair of users. [sent-46, score-0.496]

20 We employ a semi-supervised generative model called JTE-P to jointly model AD-expressions, pair interactions, and discussion topics simultaneously in a single framework. [sent-48, score-0.3]

21 For example, we can discover the most contentious pairs for each topic and ideological camps of participants, i. [sent-50, score-0.38]

22 As discussed earlier, agreement and disagreement are a special form of sentiments and are different from the sentiment studied in the mainstream research. [sent-60, score-0.702]

23 Traditional sentiment is mainly expressed with sentiment terms (e. [sent-61, score-0.387]

24 , great and bad), while agreement and disagreement are inferred by AD-expressions (e. [sent-63, score-0.49]

25 Topic models: Our work is also related to topic modeling and joint modeling of topics and other information as we jointly model several aspects of discussions/debates. [sent-67, score-0.353]

26 Yet other approaches extend topic models to produce author specific topics (Rosen-Zvi et al. [sent-73, score-0.31]

27 However, these models do not model debates and hence are unable to discover AD-expressions and interaction natures of author pairs. [sent-76, score-0.301]

28 Also related are topic models in sentiment analysis which are often referred to as Aspect and Sentiment models (ASMs). [sent-77, score-0.298]

29 , discovering positive and negative topic words and sentiments for each topic without separating topic and sentiment terms) (e. [sent-80, score-0.701]

30 , 2003), speaker utterances were classified into agreement, disagreement and backchannel classes. [sent-107, score-0.356]

31 , 2013), mining opposing perspectives (Lin and Hauptmann, 2006), linguistic accommodation (Mukherjee and Liu, 2012c), and contention point mining (Mukherjee and Liu, 2012a). [sent-115, score-0.285]

32 We propose a new method to improve the AD-expression mining and a new task of classifying pair interaction nature to determine whether each pair of users who have interacted based on replying relations mostly agree or disagree with each other. [sent-117, score-0.748]

33 JTE-P is a semi-supervised generative model motivated by the joint occurrence of expression types (agreement and disagreement), topics in discussion posts, and user pairwise interactions. [sent-119, score-0.391]

34 In a typical debate/discussion post, the user (author) mentions a few topics (using semantically related topical terms) and expresses some viewpoints with one or more AD-expression types (using agreement and disagreement expressions). [sent-121, score-0.856]

35 In our crawled dataset, 77% of all posts exhibit explicit quoting/reply-to relations excluding the first posts of threads which start the discussions and usually have nobody to quote/reply-to. [sent-126, score-0.394]

36 The discussion topics and AD-expressions emitted are thus caused by the author-pairs’ topical interests and their nature of interaction (agreeing vs. [sent-128, score-0.592]

37 com, we found that a pair of users typically exhibited a dominant arguing nature [Figure 1: JTE-P Model in plate notation.] [sent-131, score-0.465]

38 exhibits shared topics and arguing nature of various pairs, ? [sent-151, score-0.377]

39 More precisely, the pair specific topic and AD-expression … distributions ( ? [sent-159, score-0.229]

40 ) “shape” the topics and AD-expressions emitted in ? [sent-165, score-0.218]

41 as agreement and disagreement on topical viewpoints are directed towards certain target authors. [sent-166, score-0.674]

42 = 2 as in debates, we mostly find two expression types: agreement and disagreement (more details in §6. [sent-180, score-0.601]

43 Instead of using all n-grams, a relevance based ranking method is proposed to select a subset of highly relevant n-grams for model building (details in §4). [sent-186, score-0.311]

44 The idea is motivated by the observation that topical and AD-expression terms usually play different roles in a sentence. [sent-205, score-0.22]

45 (Section 4, Phrase Ranking based on Relevance) We now detail our method of pre-processing n-grams (phrases) based on relevance to select a subset of highly relevant n-grams for model building. [sent-607, score-0.26]

46 For each word, a topic is sampled first, then its status as a unigram or bigram is sampled, and finally the word is sampled from a topic-specific unigram or bigram distribution. [sent-617, score-0.257]

47 Yet another thread of research post-processes the discovered topical unigrams to form multiword phrases using likelihood scores (Blei and Lafferty, 2009). [sent-620, score-0.465]

48 Next, we rank the candidate phrases (n-grams) using our probabilistic ranking function. [sent-652, score-0.228]

49 The ranking function is grounded on the following hypothesis: a relevant phrase is one whose unigrams are closely related to (or appear with high probabilities in) the given AD-expression type, ? [sent-653, score-0.248]

50 , 2011) for deriving our relevance ranking function as follows: ? [sent-691, score-0.24]

51 ) , because there are many more irrelevant phrases than relevant ones, i. [sent-773, score-0.212]

52 (4) Thus, our ranking function actually computes the relevance score The last term, log ? [sent-822, score-0.302]

53 Precisely, we want to analyze the coverage of our proposed ranking based on relevance models. [sent-944, score-0.29]

54 the proposed relevance ranking, we want to get an estimate of how many relevant terms from a sample of the collection were covered. [sent-953, score-0.296]

55 Finally, a term was considered to be relevant if both judges marked it so. [sent-960, score-0.231]

56 We then computed the coverage to see how many of the relevant terms in the random sample were also present in top k phrases from the ranked candidate n-grams. [sent-961, score-0.375]

57 4 No agreement (κ < 0), slight agreement (0 < κ ≤ 0. [sent-964, score-0.356]

58 8 < κ ≤ coverage results below moderate agreement (0. [sent-969, score-0.228]

59 We find that by choosing the top k = 5000 candidate n-grams based on our proposed ranking, we obtain a coverage of 87% for agreement and 89. [sent-979, score-0.365]

60 Thus, we choose top 5000 candidate n-grams for each expression type and add them to the vocabulary beyond all unigrams. [sent-981, score-0.203]

61 However, for topics, selecting k based on coverage of each topic is more difficult because we induce 50 topics and it is also much more difficult to manually find relevant topical phrases in the sampled data as a topical phrase may belong to more than one topic. [sent-993, score-0.839]

62 We selected top 2000 ranked candidate phrases for each topic using ? [sent-994, score-0.318]

63 Note that phrases for topics are not as crucial as for AD-expressions because topics can more or less be defined by unigrams. [sent-1001, score-0.317]

64 (Section 5, Classifying Pair Interaction Nature) We now determine whether two users (also called a user pair) mostly agree or disagree with each other in their exchanges, i. [sent-1002, score-0.339]

65 However, the above works do not discover pair interactions (arguing nature) among debate authors. [sent-1032, score-0.318]

66 We first evaluate the discovered AD-expressions by comparing results with and without using the phrase ranking method in Section 4, and then evaluate the classification of interaction nature of pairs. [sent-1091, score-0.446]

67 For each post, we extracted the post id, author, domain, ids of all posts to which it replies/quotes, and the post content. [sent-1097, score-0.309]

68 The reduced dataset consists of 1095586 tokens (after n-gram preprocessing in §4), 40102 posts with an average of 27 posts or interactions per pair. [sent-1101, score-0.411]

69 3) appearing at least 10 times and labeled them as topical (361) or AD-expressions (139) and used the corresponding features of each term (in the context of posts where it occurs, §3) to train the Max-Ent model. [sent-1126, score-0.415]

70 , Web search results, aspect terms in topic models for sentiment analysis (Zhao et al. [sent-1148, score-0.421]

71 , the Dirichlet smoothing effect ensures that every term in the vocabulary has some nonzero mass to agreement or disagreement expression type. [sent-1159, score-0.659]

72 Table 5: Results using all tokens (without applying phrase relevance ranking) for P@50, 100, 150; 500 labeled examples were used for Max-Ent (ME) training. [sent-1180, score-0.247]

73 We also studied interrater agreement using two judges who independently labeled the top n terms as correct or incorrect. [sent-1188, score-0.465]

74 78 for all p@n computations implying substantial and good agreements as identifying whether a phrase implies agreement or disagreement or none is an easy task. [sent-1196, score-0.53]

75 P@n excluding ME labeled terms (Table 4, second column) are slightly lower than those using all terms but are still decent. [sent-1197, score-0.225]

76 Further to evaluate the sensitivity of performance on the amount of labeled terms for Max-Ent, we computed p@n across different sizes of labeled terms. [sent-1199, score-0.207]

77 Table 4 shows p@n for agreement and disagreement expressions across different sizes of labeled terms (L). [sent-1200, score-0.71]

78 The result in Table 4 uses relevance ranking (§4). [sent-1203, score-0.24]

79 Clearly, P@n is lower than in Table 4 (last row; with phrase relevance ranking) because without phrase relevance ranking (Table 5) many irrelevant terms can rank high due to co-occurrences which may not be semantically related. [sent-1206, score-0.605]

80 This shows that relevance ranking of phrases is beneficial. [sent-1207, score-0.321]

81 3 Pair Interaction Nature We now evaluate the overall interaction nature of each pair of users. [sent-1209, score-0.275]

82 The evaluation of this task requires human judges to read all the posts where the two users forming the pair have interacted. [sent-1210, score-0.418]

83 Two human judges were asked to independently read all the post interactions of 500 pairs and label each pair as overall “disagreeing” or overall “agreeing” or “none”. [sent-1213, score-0.364]

84 Pairs were finally labeled as agreeing or disagreeing if both judges deemed them so. [sent-1221, score-0.491]

85 This resulted in 320 disagreeing and 152 agreeing pairs. [sent-1222, score-0.326]

86 Of the remaining 28 pairs, 10 were marked “none” by both judges while 18 pairs had disagreement in labels. [sent-1223, score-0.414]

87 We only focus on the 472 agreeing and disagreeing pairs. [sent-1224, score-0.326]

88 As we have labeled data for 472 pairs, we can treat identifying pair arguing nature as a text classification problem where all interactions between a pair are merged in one document representing the pair along with the label given by judges: agreeing or disagreeing. [sent-1225, score-0.861]

89 ), we experiment with top 1000 and 2000 AD-expressions terms for both agreement and disagreement. [sent-1235, score-0.3]

90 comparison results using 5-fold Cross Validation (CV) with two classes: agreeing and disagreeing in Table 6. [sent-1237, score-0.326]

91 Predicting agreeing arguing nature is harder than that of disagreeing across all feature settings. [sent-1243, score-0.585]

92 yields the best performance showing that the discovered AD-expressions are of high quality and reflect the user pair arguing nature well. [sent-1248, score-0.526]

93 (Section 7, Conclusion) This paper studied the problem of modeling user pair interactions in online discussions with the purpose of discovering the interaction or arguing nature of each author pair and various AD-expressions emitted in debates. [sent-1257, score-1.096]

94 A novel technique was also proposed to rank n-gram phrases where relevance based ranking was used in conjunction with a semi-supervised generative model. [sent-1258, score-0.375]

95 Staying informed: supervised and semi-supervised multi-view topical analysis of ideological perspective. [sent-1282, score-0.251]

96 The power of negative thinking: Exploiting label disagreement in the min-cut classification framework. [sent-1298, score-0.312]

97 Identifying agreement and disagreement in conversational speech: Use of Bayesian networks to model pragmatic dependencies. [sent-1365, score-0.49]

98 Aspect and sentiment unification model for online review analysis. [sent-1403, score-0.228]

99 Topic sentiment mixture: modeling facets and opinions in weblogs. [sent-1453, score-0.252]

100 Predicting response to political blog posts with topic models. [sent-1563, score-0.374]
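The coverage, P@n, and inter-rater agreement figures mentioned in the sentences above can be illustrated with a small sketch. This is not the authors' evaluation code: the function names are hypothetical, the toy phrases are invented for illustration, and the κ statistic is assumed here to be Cohen's kappa for two judges.

```python
from collections import Counter


def coverage_at_k(ranked_phrases, relevant_terms, k):
    """Fraction of judged-relevant terms found among the top-k ranked phrases."""
    relevant = set(relevant_terms)
    if not relevant:
        return 0.0
    top_k = set(ranked_phrases[:k])
    return len(relevant & top_k) / len(relevant)


def precision_at_n(ranked_terms, correct_terms, n):
    """Fraction of the top-n ranked terms that the judges marked as correct."""
    top_n = ranked_terms[:n]
    if not top_n:
        return 0.0
    correct = set(correct_terms)
    return sum(term in correct for term in top_n) / len(top_n)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both judges.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the judges' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy data (hypothetical, for illustration only).
ranked = ["i agree", "you are right", "i disagree", "the", "of"]
judged_relevant = ["i agree", "i disagree", "well said"]
judge_a = ["agree", "agree", "disagree", "disagree"]
judge_b = ["agree", "agree", "disagree", "agree"]

print(coverage_at_k(ranked, judged_relevant, k=3))
print(precision_at_n(ranked, judged_relevant, n=2))
print(cohen_kappa(judge_a, judge_b))
```

Coverage is computed relative to the judged-relevant sample rather than to k, so it answers "how many known-good terms survived the cut", whereas P@n asks how clean the top of the ranking is.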

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('disagreement', 0.312), ('mukherjee', 0.208), ('agreeing', 0.186), ('agreement', 0.178), ('arguing', 0.167), ('asms', 0.16), ('posts', 0.155), ('sentiment', 0.153), ('topic', 0.145), ('relevance', 0.144), ('adexpressions', 0.141), ('disagreeing', 0.14), ('topical', 0.139), ('disagree', 0.122), ('discovered', 0.119), ('topics', 0.118), ('ideological', 0.112), ('stances', 0.112), ('expression', 0.111), ('judges', 0.102), ('interactions', 0.101), ('emitted', 0.1), ('interaction', 0.099), ('ranking', 0.096), ('debates', 0.092), ('nature', 0.092), ('liu', 0.087), ('discussions', 0.084), ('pair', 0.084), ('terms', 0.081), ('adexpression', 0.081), ('phrases', 0.081), ('forums', 0.078), ('users', 0.077), ('post', 0.077), ('expressions', 0.076), ('agree', 0.076), ('online', 0.075), ('political', 0.074), ('relevant', 0.071), ('debate', 0.07), ('draw', 0.067), ('perspectives', 0.065), ('user', 0.064), ('discover', 0.063), ('labeled', 0.063), ('log', 0.062), ('camps', 0.06), ('contention', 0.06), ('exchanges', 0.06), ('heated', 0.06), ('irrelevant', 0.06), ('mining', 0.06), ('sentiments', 0.059), ('term', 0.058), ('sampled', 0.056), ('posterior', 0.056), ('burfoot', 0.056), ('generative', 0.054), ('classifying', 0.054), ('discovering', 0.054), ('opinions', 0.054), ('wiebe', 0.053), ('candidate', 0.051), ('zhao', 0.05), ('hillard', 0.05), ('coverage', 0.05), ('forum', 0.048), ('author', 0.047), ('lafferty', 0.047), ('ig', 0.046), ('durational', 0.046), ('meraz', 0.046), ('modeling', 0.045), ('ngrams', 0.045), ('viewpoints', 0.045), ('plate', 0.045), ('speaker', 0.044), ('discussion', 0.044), ('bansal', 0.043), ('thread', 0.043), ('aspect', 0.042), ('blei', 0.042), ('multiword', 0.042), ('denotes', 0.042), ('somasundaran', 0.041), ('xing', 0.041), ('unigrams', 0.041), ('top', 0.041), ('phrase', 0.04), ('accommodation', 0.04), ('erosheva', 0.04), ('hurst', 0.04), ('hyunjung', 0.04), ('refute', 0.04), ('sunstein', 0.04), ('tomokiyo', 0.04), ('yano', 0.04), ('social', 0.04), 
('mccallum', 0.04)]
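The page does not say how these word weights or the paper-to-paper similarity values were computed. One plausible reading is L2-normalised tf-idf vectors compared by cosine similarity; the sketch below illustrates that reading only. The function names, the log base, and the toy documents are assumptions, not the site's actual pipeline.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised {word: weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        raw = {w: tf[w] * math.log2(n_docs / df[w]) for w in tf}
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        vectors.append({w: v / norm for w, v in raw.items()})
    return vectors


def top_n_words(vector, n):
    """Highest-weighted words, ties broken alphabetically."""
    return sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def cosine(u, v):
    """Dot product of two sparse vectors; equals cosine when both are unit length."""
    return sum(weight * v.get(word, 0.0) for word, weight in u.items())


# Toy corpus (hypothetical): three tiny "papers" as token lists.
papers = [
    ["agreement", "disagreement", "topic"],
    ["topic", "sentiment"],
    ["agreement", "disagreement", "debate"],
]
vecs = tfidf_vectors(papers)
print(top_n_words(vecs[1], 2))
print(cosine(vecs[0], vecs[2]), cosine(vecs[0], vecs[1]))
```

In the toy corpus the first and third papers share two words, so they come out more similar to each other than the first and second, which share only one.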

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 121 acl-2013-Discovering User Interactions in Ideological Discussions

Author: Arjun Mukherjee ; Bing Liu

2 0.42768732 287 acl-2013-Public Dialogue: Analysis of Tolerance in Online Discussions

Author: Arjun Mukherjee ; Vivek Venkataraman ; Bing Liu ; Sharon Meraz

Abstract: Social media platforms have enabled people to freely express their views and discuss issues of interest with others. While it is important to discover the topics in discussions, it is equally useful to mine the nature of such discussions or debates and the behavior of the participants. There are many questions that can be asked. One key question is whether the participants give reasoned arguments with justifiable claims via constructive debates or exhibit dogmatism and egotistic clashes of ideologies. The central idea of this question is tolerance, which is a key concept in the field of communications. In this work, we perform a computational study of tolerance in the context of online discussions. We aim to identify tolerant vs. intolerant participants and investigate how disagreement affects tolerance in discussions in a quantitative framework. To the best of our knowledge, this is the first such study. Our experiments using real-life discussions demonstrate the effectiveness of the proposed technique and also provide some key insights into the psycholinguistic phenomenon of tolerance in online discussions.

3 0.19043432 151 acl-2013-Extra-Linguistic Constraints on Stance Recognition in Ideological Debates

Author: Kazi Saidul Hasan ; Vincent Ng

Abstract: Determining the stance expressed by an author from a post written for a two-sided debate in an online debate forum is a relatively new problem. We seek to improve Anand et al.’s (2011) approach to debate stance classification by modeling two types of soft extra-linguistic constraints on the stance labels of debate posts, user-interaction constraints and ideology constraints. Experimental results on four datasets demonstrate the effectiveness of these inter-post constraints in improving debate stance classification.

4 0.18466194 147 acl-2013-Exploiting Topic based Twitter Sentiment for Stock Prediction

Author: Jianfeng Si ; Arjun Mukherjee ; Bing Liu ; Qing Li ; Huayi Li ; Xiaotie Deng

Abstract: This paper proposes a technique to leverage topic based sentiments from Twitter to help predict the stock market. We first utilize a continuous Dirichlet Process Mixture model to learn the daily topic set. Then, for each topic we derive its sentiment according to its opinion words distribution to build a sentiment time series. We then regress the stock index and the Twitter sentiment time series to predict the market. Experiments on real-life S&P100 Index show that our approach is effective and performs better than existing state-of-the-art non-topic based methods.

5 0.17326279 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations

Author: Angeliki Lazaridou ; Ivan Titov ; Caroline Sporleder

Abstract: We propose a joint model for unsupervised induction of sentiment, aspect and discourse information and show that by incorporating a notion of latent discourse relations in the model, we improve the prediction accuracy for aspect and sentiment polarity on the sub-sentential level. We deviate from the traditional view of discourse, as we induce types of discourse relations and associated discourse cues relevant to the considered opinion analysis task; consequently, the induced discourse relations play the role of opinion and aspect shifters. The quantitative analysis that we conducted indicated that the integration of a discourse model increased the prediction accuracy results with respect to the discourse-agnostic approach and the qualitative analysis suggests that the induced representations encode a meaningful discourse structure.

6 0.17145255 318 acl-2013-Sentiment Relevance

7 0.16934589 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions

8 0.1689444 55 acl-2013-Are Semantically Coherent Topic Models Useful for Ad Hoc Information Retrieval?

9 0.13998403 351 acl-2013-Topic Modeling Based Classification of Clinical Reports

10 0.13620017 49 acl-2013-An annotated corpus of quoted opinions in news articles

11 0.12977719 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words

12 0.12765442 115 acl-2013-Detecting Event-Related Links and Sentiments from Social Media Texts

13 0.12668902 74 acl-2013-Building Comparable Corpora Based on Bilingual LDA Model

14 0.11545119 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

15 0.10893682 148 acl-2013-Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams

16 0.1072759 168 acl-2013-Generating Recommendation Dialogs by Extracting Information from User Reviews

17 0.10399131 197 acl-2013-Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation

18 0.10298092 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis

19 0.10210578 67 acl-2013-Bi-directional Inter-dependencies of Subjective Expressions and Targets and their Value for a Joint Model

20 0.10012327 224 acl-2013-Learning to Extract International Relations from Political Context

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.271), (1, 0.252), (2, 0.025), (3, 0.113), (4, 0.045), (5, -0.027), (6, 0.087), (7, -0.069), (8, -0.107), (9, -0.018), (10, 0.056), (11, 0.082), (12, 0.038), (13, 0.053), (14, -0.059), (15, -0.091), (16, -0.067), (17, 0.08), (18, 0.017), (19, -0.058), (20, -0.039), (21, -0.012), (22, -0.024), (23, -0.008), (24, 0.047), (25, 0.029), (26, -0.134), (27, -0.082), (28, 0.008), (29, -0.044), (30, 0.091), (31, -0.008), (32, 0.06), (33, -0.032), (34, -0.005), (35, 0.049), (36, 0.17), (37, -0.029), (38, -0.15), (39, -0.095), (40, 0.125), (41, 0.022), (42, 0.118), (43, -0.189), (44, 0.066), (45, 0.028), (46, -0.073), (47, -0.244), (48, 0.027), (49, -0.051)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.95422208 287 acl-2013-Public Dialogue: Analysis of Tolerance in Online Discussions

Author: Arjun Mukherjee ; Vivek Venkataraman ; Bing Liu ; Sharon Meraz

same-paper 2 0.93938357 121 acl-2013-Discovering User Interactions in Ideological Discussions

Author: Arjun Mukherjee ; Bing Liu

3 0.86956364 151 acl-2013-Extra-Linguistic Constraints on Stance Recognition in Ideological Debates

Author: Kazi Saidul Hasan ; Vincent Ng

4 0.7281366 49 acl-2013-An annotated corpus of quoted opinions in news articles

Author: Tim O'Keefe ; James R. Curran ; Peter Ashwell ; Irena Koprinska

Abstract: Quotes are used in news articles as evidence of a person’s opinion, and thus are a useful target for opinion mining. However, labelling each quote with a polarity score directed at a textually-anchored target can ignore the broader issue that the speaker is commenting on. We address this by instead labelling quotes as supporting or opposing a clear expression of a point of view on a topic, called a position statement. Using this we construct a corpus covering 7 topics with 2,228 quotes.

5 0.71074665 232 acl-2013-Linguistic Models for Analyzing and Detecting Biased Language

Author: Marta Recasens ; Cristian Danescu-Niculescu-Mizil ; Dan Jurafsky

Abstract: Unbiased language is a requirement for reference sources like encyclopedias and scientific texts. Bias is, nonetheless, ubiquitous, making it crucial to understand its nature and linguistic realization and hence detect bias automatically. To this end we analyze real instances of human edits designed to remove bias from Wikipedia articles. The analysis uncovers two classes of bias: framing bias, such as praising or perspective-specific words, which we link to the literature on subjectivity; and epistemological bias, related to whether propositions that are presupposed or entailed in the text are uncontroversially accepted as true. We identify common linguistic cues for these classes, including factive verbs, implicatives, hedges, and subjective intensifiers. These insights help us develop features for a model to solve a new prediction task of practical importance: given a biased sentence, identify the bias-inducing word. Our linguistically-informed model performs almost as well as humans tested on the same task.

6 0.62151247 278 acl-2013-Patient Experience in Online Support Forums: Modeling Interpersonal Interactions and Medication Use

7 0.6149556 54 acl-2013-Are School-of-thought Words Characterizable?

8 0.59877479 147 acl-2013-Exploiting Topic based Twitter Sentiment for Stock Prediction

9 0.58430135 30 acl-2013-A computational approach to politeness with application to social factors

10 0.54670626 351 acl-2013-Topic Modeling Based Classification of Clinical Reports

11 0.5390169 67 acl-2013-Bi-directional Inter-dependencies of Subjective Expressions and Targets and their Value for a Joint Model

12 0.52401572 126 acl-2013-Diverse Keyword Extraction from Conversations

13 0.52275807 298 acl-2013-Recognizing Rare Social Phenomena in Conversation: Empowerment Detection in Support Group Chatrooms

14 0.49773353 318 acl-2013-Sentiment Relevance

15 0.49452078 33 acl-2013-A user-centric model of voting intention from Social Media

16 0.48310333 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions

17 0.4778502 346 acl-2013-The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia

18 0.46393883 257 acl-2013-Natural Language Models for Predicting Programming Comments

19 0.45968267 55 acl-2013-Are Semantically Coherent Topic Models Useful for Ad Hoc Information Retrieval?

20 0.45531714 191 acl-2013-Improved Bayesian Logistic Supervised Topic Models with Data Augmentation

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.057), (4, 0.217), (6, 0.028), (11, 0.073), (15, 0.014), (24, 0.081), (26, 0.065), (35, 0.131), (42, 0.046), (48, 0.043), (63, 0.02), (70, 0.039), (88, 0.032), (90, 0.03), (95, 0.059)]
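The LDA weights above are listed as sparse (topicId, topicWeight) pairs, with topics that carry little weight omitted. The sketch below shows one way such a list could be expanded and compared between papers. It assumes 100 topics (the highest id shown is 95) and cosine similarity; the page states neither, and the same-paper score below 1 in the list that follows suggests the site's own computation differs in some way. Function names are hypothetical.

```python
import math

NUM_TOPICS = 100  # assumption: the page only shows topic ids up to 95


def sparse_to_dense(pairs, num_topics=NUM_TOPICS):
    """Expand (topic_id, weight) pairs into a dense list, zero where omitted."""
    dense = [0.0] * num_topics
    for topic_id, weight in pairs:
        if not 0 <= topic_id < num_topics:
            raise ValueError(f"topic id {topic_id} outside 0..{num_topics - 1}")
        dense[topic_id] = weight
    return dense


def topic_cosine(u, v):
    """Cosine similarity of two dense topic vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Topic weights for this paper, copied from the list above.
lda_this_paper = [(0, 0.057), (4, 0.217), (6, 0.028), (11, 0.073), (15, 0.014),
                  (24, 0.081), (26, 0.065), (35, 0.131), (42, 0.046), (48, 0.043),
                  (63, 0.02), (70, 0.039), (88, 0.032), (90, 0.03), (95, 0.059)]

dense = sparse_to_dense(lda_this_paper)
print(max(range(NUM_TOPICS), key=lambda t: dense[t]))  # id of the dominant topic
```

The same two helpers would work on the LSI weights as well, since cosine similarity does not require the weights to be non-negative.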

similar papers list:

simIndex simValue paperId paperTitle

1 0.86454791 273 acl-2013-Paraphrasing Adaptation for Web Search Ranking

Author: Chenguang Wang ; Nan Duan ; Ming Zhou ; Ming Zhang

Abstract: Mismatch between queries and documents is a key issue for the web search task. In order to narrow down such mismatch, in this paper, we present an in-depth investigation on adapting a paraphrasing technique to web search from three aspects: a search-oriented paraphrasing model; an NDCG-based parameter optimization algorithm; an enhanced ranking model leveraging augmented features computed on paraphrases of original queries. Experiments performed on the large scale query-document data set show that the search performance can be significantly improved, with +3.28% and +1.14% NDCG gains on dev and test sets respectively.

2 0.85857469 315 acl-2013-Semi-Supervised Semantic Tagging of Conversational Understanding using Markov Topic Regression

Author: Asli Celikyilmaz ; Dilek Hakkani-Tur ; Gokhan Tur ; Ruhi Sarikaya

Abstract: Finding concepts in natural language utterances is a challenging task, especially given the scarcity of labeled data for learning semantic ambiguity. Furthermore, data mismatch issues, which arise when the expected test (target) data does not exactly match the training data, aggravate this scarcity problem. To deal with these issues, we describe an efficient semi-supervised learning (SSL) approach which has two components: (i) Markov Topic Regression is a new probabilistic model to cluster words into semantic tags (concepts). It can efficiently handle semantic ambiguity by extending standard topic models with two new features. First, it encodes word n-gram features from labeled source and unlabeled target data. Second, by going beyond a bag-of-words approach, it takes into account the inherent sequential nature of utterances to learn semantic classes based on context. (ii) Retrospective Learner is a new learning technique that adapts to the unlabeled target data. Our new SSL approach improves semantic tagging performance by 3% absolute over the baseline models, and also compares favorably on semi-supervised syntactic tagging.

same-paper 3 0.8389954 121 acl-2013-Discovering User Interactions in Ideological Discussions

Author: Arjun Mukherjee ; Bing Liu

4 0.81708193 173 acl-2013-Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging

Author: Xiaodong Zeng ; Derek F. Wong ; Lidia S. Chao ; Isabel Trancoso

Abstract: This paper introduces a graph-based semi-supervised joint model of Chinese word segmentation and part-of-speech tagging. The proposed approach is based on a graph-based label propagation technique. One constructs a nearest-neighbor similarity graph over all trigrams of labeled and unlabeled data for propagating syntactic information, i.e., label distributions. The derived label distributions are regarded as virtual evidences to regularize the learning of linear conditional random fields (CRFs) on unlabeled data. An inductive character-based joint model is obtained eventually. Empirical results on Chinese tree bank (CTB-7) and Microsoft Research corpora (MSR) reveal that the proposed model can yield better results than the supervised baselines and other competitive semi-supervised CRFs in this task.

5 0.80561632 294 acl-2013-Re-embedding words

Author: Igor Labutov ; Hod Lipson

Abstract: We present a fast method for re-purposing existing semantic word vectors to improve performance in a supervised task. Recently, with an increase in computing resources, it became possible to learn rich word embeddings from massive amounts of unlabeled data. However, some methods take days or weeks to learn good embeddings, and some are notoriously difficult to train. We propose a method that takes as input an existing embedding, some labeled data, and produces an embedding in the same space, but with a better predictive performance in the supervised task. We show improvement on the task of sentiment classification with respect to several baselines, and observe that the approach is most useful when the training set is sufficiently small.

6 0.76994759 287 acl-2013-Public Dialogue: Analysis of Tolerance in Online Discussions

7 0.73496556 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation

8 0.7218582 309 acl-2013-Scaling Semi-supervised Naive Bayes with Feature Marginals

9 0.69246805 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations

10 0.68655038 341 acl-2013-Text Classification based on the Latent Topics of Important Sentences extracted by the PageRank Algorithm

11 0.68091506 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data

12 0.67970204 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction

13 0.67885625 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri

14 0.67877865 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation

15 0.67855847 318 acl-2013-Sentiment Relevance

16 0.6769101 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context

17 0.67670143 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

18 0.67587256 219 acl-2013-Learning Entity Representation for Entity Disambiguation

19 0.67388898 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering

20 0.67365974 99 acl-2013-Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation