
187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions

Source: pdf

Author: Amjad Abu-Jbara ; Ben King ; Mona Diab ; Dragomir Radev

Abstract: In this paper, we use Arabic natural language processing techniques to analyze Arabic debates. The goal is to identify how the participants in a discussion split into subgroups with contrasting opinions. The members of each subgroup share the same opinion with respect to the discussion topic and an opposing opinion to the members of other subgroups. We use opinion mining techniques to identify opinion expressions and determine their polarities and their targets. We use opinion predictions to represent the discussion in one of two formal representations: signed attitude network or a space of attitude vectors. We identify opinion subgroups by partitioning the signed network representation or by clustering the vector space representation. We evaluate the system using a data set of labeled discussions and show that it achieves good results.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 The goal is to identify how the participants in a discussion split into subgroups with contrasting opinions. [sent-4, score-0.501]

2 The members of each subgroup share the same opinion with respect to the discussion topic and an opposing opinion to the members of other subgroups. [sent-5, score-1.134]

3 We use opinion mining techniques to identify opinion expressions and determine their polarities and their targets. [sent-6, score-0.944]

4 We use opinion predictions to represent the discussion in one of two formal representations: signed attitude network or a space of attitude vectors. [sent-7, score-1.064]

5 We identify opinion subgroups by partitioning the signed network representation or by clustering the vector space representation. [sent-8, score-1.289]

6 The recent political and civic movements in the Arab World resulted in a revolutionary growth in the number of Arabic users on social networking sites. [sent-13, score-0.156]

7 This growth in the presence of Arab users on social networks and all the interactions and discussions that happen among them led to a huge amount of opinion-rich Arabic text being available. [sent-17, score-0.171]

8 When a controversial topic is discussed, it is normal for the discussants to adopt different viewpoints towards it. [sent-19, score-0.284]

9 This usually causes rifts in discussion groups and leads to the split of the discussants into subgroups with contrasting opinions. [sent-20, score-0.609]

10 Our goal in this paper is to use natural language processing techniques to detect opinion subgroups in Arabic discussions. [sent-21, score-0.67]

11 Our approach starts by identifying opinionated (subjective) text and determining its polarity (positive, negative, or neutral). [sent-22, score-0.203]

12 Next, we determine the target of each opinion expression. [sent-23, score-0.442]

13 The target of opinion can be a named entity mentioned in the discussion or an aspect of the discussed topic. [sent-24, score-0.541]

14 We use the identified opinion-target relations to represent the discussion in one of two formal representations. [sent-25, score-0.148]

15 In the first representation, each discussant is represented by a vector that encodes all his or her opinion information towards the discussion topic. [sent-26, score-0.791]

16 In the second representation, each discussant is represented by a node in a signed graph. [sent-27, score-0.488]

17 A positive edge connects two discussants if they have similar opinion towards the topic, otherwise the sign of the edge is negative. [sent-28, score-0.691]

18 To identify opinion subgroups, we cluster the vector space (the first representation) or partition the signed network (the second representation). [sent-32, score-0.849]

19 The results show that clustering the vector space representation achieves better results than partitioning the signed network representation. [sent-35, score-0.567]

20 2 Previous Work Our work is related to a large body of research on opinion mining and sentiment analysis. [sent-36, score-0.606]

21 Pang & Lee (2008) and Liu & Zhang (2012) wrote two recent comprehensive surveys about sentiment analysis and opinion mining techniques and applications. [sent-37, score-0.644]

22 Previous work has proposed methods for identifying subjective text that expresses opinion and distinguishing it from objective text that presents factual information (Wiebe, 2000; Hatzivassiloglou and Wiebe, 2000a; Banea et al. [sent-38, score-0.535]

23 Previous work addressed the problem of identifying the polarity of subjective text (Hatzivassiloglou and Wiebe, 2000b; Hassan et al. [sent-41, score-0.223]

24 Many of the proposed methods for text polarity identification depend on the availability of polarity lexicons (i. [sent-44, score-0.196]

25 Other research efforts focused on identifying the holders and the targets of opinion (Zhai et al. [sent-49, score-0.57]

26 Opinion mining and sentiment analysis techniques have been used in various applications. [sent-52, score-0.196]

27 One example of such applications is identifying perspectives (Grefenstette et al. [sent-53, score-0.111]

28 (2003) proposed a method for extracting perspectives from political texts. [sent-59, score-0.102]

29 They used their method to estimate the policy positions of political parties in Britain and Ireland, on both economic and social policy dimensions. [sent-60, score-0.152]

30 Somasundaran and Wiebe (2009) present an unsupervised opinion analysis method for debate-side classification. [sent-61, score-0.41]

31 They mine the web to learn associations that are indicative of opinion stances in debates and combine this knowledge with discourse information. [sent-62, score-0.494]

32 They use a number of linguistic and structural features such as unigrams, bigrams, cue words, repeated punctuation, and opinion dependencies to build a stance classification model. [sent-65, score-0.46]

33 In previous work, we proposed a method that uses participant-to-participant and participant-to-topic attitudes to identify subgroups in ideological discussions using attitude vector space clustering (Abu-Jbara and Radev, 2012). [sent-66, score-0.702]

34 In this paper, we extend this method by adding latent similarity features to the attitude vectors and applying it to Arabic discussions. [sent-67, score-0.162]

35 In another previous work, our group proposed a supervised method for extracting signed social networks from text (Hassan et al. [sent-68, score-0.36]

36 The signed networks constructed using this method were based only on participant-to-participant attitudes that are expressed explicitly in discussions. [sent-70, score-0.363]

37 We used this method to extract signed networks from discussions and used a partitioning algorithm to detect opinion subgroups (Hassan et al. [sent-71, score-1.127]

38 In this paper, we extend this method by using participant-to-topic attitudes to construct the signed network. [sent-73, score-0.308]

39 Unfortunately, not much work has been done on Arabic sentiment analysis and opinion mining. [sent-74, score-0.567]

40 (2008) applies sentiment analysis techniques to identify and classify document-level opinions in text crawled from English and Arabic web forums. [sent-76, score-0.209]

41 (2011) proposed a method for identifying the polarity of non-English words using multilingual semantic graphs. [sent-78, score-0.163]

42 Abdul-Mageed and Diab (2011) annotated a corpus of Modern Standard Arabic (MSA) news text for subjectivity at the sentence level. [sent-80, score-0.126]

43 (2012a) developed SAMAR, a system for subjectivity and Sentiment Analysis for Arabic social media genres. [sent-83, score-0.188]

44 3 Approach In this section, we present our approach to detecting opinion subgroups in Arabic discussions. [sent-85, score-0.67]

45 The input to the pipeline is a discussion thread in Arabic language crawled from a discussion forum. [sent-87, score-0.283]

46 The output is the list of participants in the discussion and the subgroup membership of each discussant. [sent-88, score-0.26]

47 1 Preprocessing The input to this component is a discussion thread in HTML format. [sent-91, score-0.145]

48 We parse the HTML file to identify the posts, the discussants, and the thread structure. [sent-92, score-0.098]

49 We transform the Arabic content of the posts and the discussant names that are written in Arabic to the Buckwalter encoding (Buckwalter, 2004). [sent-93, score-0.345]

50 We identify the polarized words that appear in text by looking each word up in a lexicon of Arabic polarized words. [sent-97, score-0.164]

51 For example, a positive word that appears in a negated context should be treated as expressing negative opinion rather than positive. [sent-101, score-0.517]

52 To identify the polarity of a word given the sentence it appears in, we use SAMAR (Abdul-Mageed et al. [sent-102, score-0.15]

53 SAMAR labels a sentence that contains an opinion expression as positive, negative, or neutral taking into account the context of the opinion expression. [sent-104, score-0.82]

54 The reported accuracy of SAMAR on different data sets ranges between 84% and 95% for subjectivity classification and 65% and 81% for polarity classification. [sent-105, score-0.224]

55 3 Identifying Opinion Targets In this step, we determine the targets that the opinion is expressed towards. [sent-107, score-0.505]

56 To avoid the noise that may result from including all noun phrases, we limit what we consider an opinion target to those that appear in at least two posts written by two different participants. [sent-109, score-0.51]

57 Since the sentence may contain multiple possible targets for every opinion expression, we associate each opinion expression with the target that is closest to it in the sentence. [sent-110, score-0.947]

58 For each discussant, we keep track of the targets mentioned in his/her posts and the number of times each target was mentioned in a positive/negative context. [sent-111, score-0.194]

59 4 Latent Textual Similarity If two participants share the same opinion, they tend to focus on similar aspects of the discussion topic and emphasize similar points that support their opinion. [sent-113, score-0.175]

60 So, we represent all the text written in the discussion by each participant as a vector of 100 dimensions. [sent-117, score-0.208]

61 The vector of each participant contains the topic distribution of the participant, as produced by the LDA model. [sent-118, score-0.116]

62 5 Subgroup Detection At this point, we have for every discussant the targets towards which he/she expressed explicit opinion and a 100-dimensions vector representing the LDA distribution of the text written by him/her. [sent-120, score-0.82]

63 We use this information to represent the discussion in two representations. [sent-121, score-0.099]

64 In the first representation, each discussant is represented by a vector. [sent-122, score-0.245]

65 (b) and (c) are two posts expressing contrasting viewpoints with respect to the topic. [sent-129, score-0.169]

66 We also add to this vector the 100 topic entries from the LDA vector of that discussant. [sent-132, score-0.114]

67 So, if the number of targets identified in step 3 of the pipeline is t then the number of entries in the discussant vector is 3 ∗ t + 100. [sent-133, score-0.416]

68 To identify opinion subgroups, we cluster the vector space. [sent-134, score-0.499]

69 In this representation, each discussant is represented by a node in a graph. [sent-138, score-0.245]

70 Two discussants are connected by an edge if they both mention at least one common target in their posts. [sent-139, score-0.228]

71 If a discussant mentions a target multiple times in different contexts with different polarities, the majority polarity is assumed as the opinion of this discussant with respect to this target. [sent-140, score-1.03]

72 A positive sign is assigned to the edge connecting two discussants if the number of targets that they have similar opinion towards is greater than the targets that they have opposing opinion towards, otherwise a negative sign is assigned to the edge. [sent-141, score-1.431]

73 To identify subgroups, we use a signed network partitioning algorithm to partition the network. [sent-142, score-0.507]

74 , 2012b), we use the Doreian-Mrvar (1996) algorithm to partition the signed network. [sent-145, score-0.284]

75 The optimization criterion aims to have dense positive links within groups and dense negative links between groups. [sent-146, score-0.179]

76 4 Data We use data from an Arabic discussion forum called Naqeshny. [sent-154, score-0.099]

77 This means that the data set is self-labeled for subgroup membership. [sent-164, score-0.125]

78 The average number of posts per discussion is 19. [sent-167, score-0.166]

79 75 and the average number of participants per discussion is 13. [sent-168, score-0.135]

80 In one variation, we use the signed network partitioning approach to detect subgroups. [sent-174, score-0.414]

81 In the other variations, we use the vector space clustering approach. [sent-175, score-0.114]

82 We also run two experiments to evaluate the contribution of both opinion-target counts and latent similarity features on the clustering accuracy. [sent-177, score-0.165]

83 The results show that the clustering approach achieves better results than the signed network partitioning approach. [sent-185, score-0.491]

84 This can be explained by the fact that the vector representation is a richer representation and encodes all the discussants’ opinion information explicitly. [sent-186, score-0.525]

85 6 Conclusion In this paper, we presented a system for identifying opinion subgroups in Arabic online discussions. [sent-189, score-0.735]

86 The system uses opinion and text similarity features. [sent-190, score-0.41]

87 The first approach clusters a space of discussant opinion vectors. [sent-195, score-0.41]

88 The second approach partitions a signed network representation of the discussion. [sent-196, score-0.348]

89 Our experiments also showed that both opinion and similarity features are important. [sent-198, score-0.41]

90 All statements of fact, opinion or conclusions contained herein are those of the authors and should not be construed as representing the of? [sent-200, score-0.41]

91 Sentiment analysis in multiple languages: Feature selection for opinion classification in web forums. [sent-207, score-0.41]

92 Subjectivity and sentiment annotation of Modern Standard Arabic newswire. [sent-214, score-0.521]

93 Awatif: A multi-genre corpus for Modern Standard Arabic subjectivity and sentiment analysis. [sent-219, score-0.647]

94 SAMAR: A system for subjectivity and sentiment analysis of Arabic social media. [sent-228, score-0.676]

95 SAMAR: A system for subjectivity and sentiment analysis of Arabic social media. [sent-233, score-0.676]

96 A bootstrapping method for building subjectivity lexicons for languages with scarce resources. [sent-251, score-0.126]

97 Genre independent subgroup detection in online discussion threads: A study of implicit attitude using textual latent semantics. [sent-269, score-0.386]

98 Coupling niche browsers and affect analysis for an opinion mining application. [sent-294, score-0.449]

99 Signed attitude networks: Predicting positive and negative links using linguistic analysis. [sent-323, score-0.266]

100 Extracting policy positions from political texts using words as data. [sent-345, score-0.101]
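The two discussion representations described in the sentences above (per-discussant attitude vectors and a signed network over discussants) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the (positive, negative, neutral) layout of the three per-target entries is assumed, and so is the handling of ties. The source states only that each target contributes three entries, that the 100 LDA topic entries are appended, that the majority polarity stands for a discussant's opinion on a target, and that an edge is positive when agreements outnumber disagreements.

```python
def discussant_vector(target_counts, targets, topic_dist):
    """Build one discussant's attitude vector.

    target_counts maps a target to (positive, negative, neutral) mention
    counts; the neutral slot is an assumed reading of "3 entries per target".
    topic_dist is the discussant's LDA topic distribution.
    """
    vec = []
    for target in targets:
        # Targets the discussant never mentioned contribute zeros.
        vec.extend(target_counts.get(target, (0, 0, 0)))
    vec.extend(topic_dist)
    return vec


def majority_polarity(pos, neg):
    """Return +1 or -1 by majority of mentions, 0 on a tie (assumption)."""
    return (pos > neg) - (pos < neg)


def edge_sign(counts_a, counts_b):
    """Sign of the edge between two discussants, or None if no edge.

    An edge exists only when both discussants mention a common target.
    It is positive when they agree on more targets than they disagree on,
    and negative otherwise.
    """
    shared = set(counts_a) & set(counts_b)
    if not shared:
        return None
    agree = disagree = 0
    for target in shared:
        pol_a = majority_polarity(*counts_a[target][:2])
        pol_b = majority_polarity(*counts_b[target][:2])
        if pol_a == 0 or pol_b == 0:
            continue  # skipping tied targets is an assumption
        if pol_a == pol_b:
            agree += 1
        else:
            disagree += 1
    return 1 if agree > disagree else -1


# Toy data: counts are (positive, negative, neutral) mentions per target.
alice = {"x": (2, 0, 0), "y": (0, 1, 0)}
bob = {"x": (1, 0, 0), "y": (0, 3, 1)}
carol = {"x": (0, 2, 0)}

vector = discussant_vector(alice, ["x", "y"], [0.01] * 100)
sign_ab = edge_sign(alice, bob)
sign_ac = edge_sign(alice, carol)
```

With t = 2 targets the vector has 3 ∗ 2 + 100 entries, matching the count given in sentence 67. The vectors would then go to a clustering step and the signed edges to a partitioning step; neither is shown here.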

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('opinion', 0.41), ('arabic', 0.331), ('subgroups', 0.26), ('discussant', 0.245), ('signed', 0.243), ('discussants', 0.196), ('sentiment', 0.157), ('samar', 0.14), ('hassan', 0.13), ('subjectivity', 0.126), ('subgroup', 0.125), ('attitude', 0.123), ('janyce', 0.12), ('mona', 0.112), ('partitioning', 0.105), ('amjad', 0.1), ('discussion', 0.099), ('polarity', 0.098), ('targets', 0.095), ('arab', 0.094), ('muhammad', 0.086), ('dragomir', 0.085), ('wiebe', 0.079), ('clustering', 0.077), ('ahmed', 0.074), ('hatzivassiloglou', 0.07), ('posts', 0.067), ('network', 0.066), ('attitudes', 0.065), ('identifying', 0.065), ('social', 0.062), ('subjective', 0.06), ('diab', 0.059), ('korea', 0.058), ('umi', 0.056), ('polarized', 0.056), ('political', 0.056), ('negative', 0.056), ('networks', 0.055), ('discussions', 0.054), ('contrasting', 0.054), ('identify', 0.052), ('positive', 0.051), ('riloff', 0.05), ('opposing', 0.05), ('stance', 0.05), ('debates', 0.05), ('gradability', 0.049), ('hochbaum', 0.049), ('laver', 0.049), ('opiniontarget', 0.049), ('radev', 0.048), ('eecs', 0.048), ('viewpoints', 0.048), ('perspectives', 0.046), ('thread', 0.046), ('debate', 0.045), ('policy', 0.045), ('orientation', 0.045), ('lda', 0.044), ('jeju', 0.044), ('vasileios', 0.044), ('abbasi', 0.043), ('partition', 0.041), ('guo', 0.04), ('topic', 0.04), ('wassa', 0.04), ('opinionated', 0.04), ('representation', 0.039), ('pipeline', 0.039), ('latent', 0.039), ('mining', 0.039), ('participant', 0.039), ('fastest', 0.038), ('cent', 0.038), ('michigan', 0.037), ('vector', 0.037), ('zhai', 0.037), ('participants', 0.036), ('links', 0.036), ('dasigi', 0.036), ('banea', 0.036), ('buckwalter', 0.036), ('wilson', 0.035), ('sign', 0.034), ('bethard', 0.034), ('ideological', 0.034), ('stances', 0.034), ('anand', 0.034), ('bente', 0.034), ('maegaard', 0.034), ('grefenstette', 0.033), ('polarities', 0.033), ('choukri', 0.033), ('dempster', 0.033), ('written', 0.033), ('modern', 0.033), ('target', 0.032), 
('msa', 0.032), ('takamura', 0.032)]
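
The similarity values in the lists below are most likely a vector similarity over term weights like those above. The page does not say which measure is used, so the following is only a sketch that assumes cosine similarity over sparse tf-idf dictionaries; the function name and the second toy vector are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


# A few of the weights listed above, plus an invented comparison vector.
this_paper = {"opinion": 0.41, "arabic": 0.331, "subgroups": 0.26}
other_paper = {"opinion": 0.5, "targets": 0.3}

self_similarity = cosine_similarity(this_paper, this_paper)
cross_similarity = cosine_similarity(this_paper, other_paper)
```

Under this assumption a paper compared with itself scores 1 up to floating-point rounding, which would be consistent with the `same-paper` row below.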

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions

Author: Amjad Abu-Jbara ; Ben King ; Mona Diab ; Dragomir Radev

2 0.40791661 244 acl-2013-Mining Opinion Words and Opinion Targets in a Two-Stage Framework

Author: Liheng Xu ; Kang Liu ; Siwei Lai ; Yubo Chen ; Jun Zhao

Abstract: This paper proposes a novel two-stage method for mining opinion words and opinion targets. In the first stage, we propose a Sentiment Graph Walking algorithm, which naturally incorporates syntactic patterns in a Sentiment Graph to extract opinion word/target candidates. Then random walking is employed to estimate confidence of candidates, which improves extraction accuracy by considering confidence of patterns. In the second stage, we adopt a self-learning strategy to refine the results from the first stage, especially for filtering out high-frequency noise terms and capturing the long-tail terms, which are not investigated by previous methods. The experimental results on three real world datasets demonstrate the effectiveness of our approach compared with state-of-the-art unsupervised methods.

3 0.40225732 207 acl-2013-Joint Inference for Fine-grained Opinion Extraction

Author: Bishan Yang ; Claire Cardie

Abstract: This paper addresses the task of fine-grained opinion extraction: the identification of opinion-related entities (the opinion expressions, the opinion holders, and the targets of the opinions) and of the relations between opinion expressions and their targets and holders. Most existing approaches tackle the extraction of opinion entities and opinion relations in a pipelined manner, where the interdependencies among different extraction stages are not captured. We propose a joint inference model that leverages knowledge from predictors that optimize subtasks of opinion extraction, and seeks a globally optimal solution. Experimental results demonstrate that our joint inference approach significantly outperforms traditional pipeline methods and baselines that tackle subtasks in isolation for the problem of opinion extraction.

4 0.33543181 336 acl-2013-Syntactic Patterns versus Word Alignment: Extracting Opinion Targets from Online Reviews

Author: Kang Liu ; Liheng Xu ; Jun Zhao

Abstract: Mining opinion targets is a fundamental and important task for opinion mining from online reviews. To this end, there are usually two kinds of methods: syntax based and alignment based methods. Syntax based methods usually exploited syntactic patterns to extract opinion targets, which were however prone to suffer from parsing errors when dealing with online informal texts. In contrast, alignment based methods used word alignment model to fulfill this task, which could avoid parsing errors without using parsing. However, there is no research focusing on which kind of method is better when given a certain amount of reviews. To fill this gap, this paper empirically studies how the performance of these two kinds of methods varies when changing the size, domain and language of the corpus. We further combine syntactic patterns with alignment model by using a partially supervised framework and investigate whether this combination is useful or not. In our experiments, we verify that our combination is effective on the corpus with small and medium size.

5 0.26254562 211 acl-2013-LABR: A Large Scale Arabic Book Reviews Dataset

Author: Mohamed Aly ; Amir Atiya

Abstract: We introduce LABR, the largest sentiment analysis dataset to-date for the Arabic language. It consists of over 63,000 book reviews, each rated on a scale of 1 to 5 stars. We investigate the properties of the dataset, and present its statistics. We explore using the dataset for two tasks: sentiment polarity classification and rating classification. We provide standard splits of the dataset into training and testing, for both polarity and rating classification, in both balanced and unbalanced settings. We run baseline experiments on the dataset to establish a benchmark.

6 0.20376085 148 acl-2013-Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams

7 0.18985662 49 acl-2013-An annotated corpus of quoted opinions in news articles

8 0.1804245 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words

9 0.17137502 318 acl-2013-Sentiment Relevance

10 0.16934589 121 acl-2013-Discovering User Interactions in Ideological Discussions

11 0.15745081 256 acl-2013-Named Entity Recognition using Cross-lingual Resources: Arabic as an Example

12 0.15649478 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations

13 0.15063739 67 acl-2013-Bi-directional Inter-dependencies of Subjective Expressions and Targets and their Value for a Joint Model

14 0.13493295 151 acl-2013-Extra-Linguistic Constraints on Stance Recognition in Ideological Debates

15 0.13440658 147 acl-2013-Exploiting Topic based Twitter Sentiment for Stock Prediction

16 0.1321876 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis

17 0.12650478 115 acl-2013-Detecting Event-Related Links and Sentiments from Social Media Texts

18 0.11989687 345 acl-2013-The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis

19 0.11986456 214 acl-2013-Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation

20 0.11791617 114 acl-2013-Detecting Chronic Critics Based on Sentiment Polarity and User's Behavior in Social Media

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.207), (1, 0.325), (2, -0.016), (3, 0.331), (4, -0.119), (5, 0.097), (6, -0.24), (7, -0.089), (8, -0.12), (9, -0.081), (10, -0.112), (11, 0.06), (12, 0.028), (13, 0.024), (14, -0.046), (15, 0.022), (16, -0.054), (17, -0.012), (18, -0.021), (19, 0.044), (20, 0.003), (21, 0.02), (22, 0.02), (23, 0.021), (24, -0.0), (25, -0.069), (26, -0.017), (27, -0.042), (28, 0.042), (29, -0.145), (30, -0.012), (31, -0.021), (32, 0.02), (33, -0.08), (34, 0.039), (35, -0.037), (36, 0.053), (37, 0.038), (38, 0.018), (39, 0.082), (40, 0.035), (41, 0.029), (42, 0.028), (43, -0.064), (44, 0.004), (45, 0.01), (46, 0.018), (47, -0.026), (48, -0.003), (49, -0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96014631 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions

Author: Amjad Abu-Jbara ; Ben King ; Mona Diab ; Dragomir Radev

2 0.80536669 244 acl-2013-Mining Opinion Words and Opinion Targets in a Two-Stage Framework

Author: Liheng Xu ; Kang Liu ; Siwei Lai ; Yubo Chen ; Jun Zhao

3 0.77961838 49 acl-2013-An annotated corpus of quoted opinions in news articles

Author: Tim O'Keefe ; James R. Curran ; Peter Ashwell ; Irena Koprinska

Abstract: Quotes are used in news articles as evidence of a person’s opinion, and thus are a useful target for opinion mining. However, labelling each quote with a polarity score directed at a textually-anchored target can ignore the broader issue that the speaker is commenting on. We address this by instead labelling quotes as supporting or opposing a clear expression of a point of view on a topic, called a position statement. Using this we construct a corpus covering 7 topics with 2,228 quotes.

4 0.77039868 207 acl-2013-Joint Inference for Fine-grained Opinion Extraction

Author: Bishan Yang ; Claire Cardie

5 0.7500093 336 acl-2013-Syntactic Patterns versus Word Alignment: Extracting Opinion Targets from Online Reviews

Author: Kang Liu ; Liheng Xu ; Jun Zhao

6 0.65226686 67 acl-2013-Bi-directional Inter-dependencies of Subjective Expressions and Targets and their Value for a Joint Model

7 0.61519742 211 acl-2013-LABR: A Large Scale Arabic Book Reviews Dataset

8 0.60259467 117 acl-2013-Detecting Turnarounds in Sentiment Analysis: Thwarting

9 0.56526512 114 acl-2013-Detecting Chronic Critics Based on Sentiment Polarity and User's Behavior in Social Media

10 0.49407837 151 acl-2013-Extra-Linguistic Constraints on Stance Recognition in Ideological Debates

11 0.48696271 147 acl-2013-Exploiting Topic based Twitter Sentiment for Stock Prediction

12 0.48079532 318 acl-2013-Sentiment Relevance

13 0.46733218 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis

14 0.45519382 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words

15 0.4494825 121 acl-2013-Discovering User Interactions in Ideological Discussions

16 0.4474788 148 acl-2013-Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams

17 0.43592867 317 acl-2013-Sentence Level Dialect Identification in Arabic

18 0.42277128 79 acl-2013-Character-to-Character Sentiment Analysis in Shakespeare's Plays

19 0.42243043 287 acl-2013-Public Dialogue: Analysis of Tolerance in Online Discussions

20 0.39580429 232 acl-2013-Linguistic Models for Analyzing and Detecting Biased Language

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.056), (4, 0.018), (6, 0.018), (11, 0.04), (13, 0.018), (15, 0.052), (24, 0.068), (26, 0.071), (31, 0.012), (35, 0.069), (42, 0.041), (48, 0.048), (63, 0.029), (70, 0.041), (86, 0.167), (88, 0.051), (90, 0.018), (95, 0.102)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.85016096 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions

Author: Amjad Abu-Jbara ; Ben King ; Mona Diab ; Dragomir Radev

2 0.80714476 353 acl-2013-Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain

Author: Jackie Chi Kit Cheung ; Gerald Penn

Abstract: In automatic summarization, centrality is the notion that a summary should contain the core parts of the source text. Current systems use centrality, along with redundancy avoidance and some sentence compression, to produce mostly extractive summaries. In this paper, we investigate how summarization can advance past this paradigm towards robust abstraction by making greater use of the domain of the source text. We conduct a series of studies comparing human-written model summaries to system summaries at the semantic level of caseframes. We show that model summaries (1) are more abstractive and make use of more sentence aggregation, (2) do not contain as many topical caseframes as system summaries, and (3) cannot be reconstructed solely from the source text, but can be if texts from in-domain documents are added. These results suggest that substantial improvements are unlikely to result from better optimizing centrality-based criteria, but rather more domain knowledge is needed.

3 0.76927555 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation

Author: Xiaodong Zeng ; Derek F. Wong ; Lidia S. Chao ; Isabel Trancoso

Abstract: This paper presents a semi-supervised Chinese word segmentation (CWS) approach that co-regularizes character-based and word-based models. Similarly to multi-view learning, the “segmentation agreements” between the two different types of view are used to overcome the scarcity of the label information on unlabeled data. The proposed approach trains a character-based and word-based model on labeled data, respectively, as the initial models. Then, the two models are constantly updated using unlabeled examples, where the learning objective is maximizing their segmentation agreements. The agreements are regarded as a set of valuable constraints for regularizing the learning of both models on unlabeled data. The segmentation for an input sentence is decoded by using a joint scoring function combining the two induced models. The evaluation on the Chinese tree bank reveals that our model results in better gains over the state-of-the-art semi-supervised models reported in the literature.

4 0.73350048 233 acl-2013-Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media

Author: Weiwei Guo ; Hao Li ; Heng Ji ; Mona Diab

Abstract: Many current Natural Language Processing [NLP] techniques work well assuming a large context of text as input data. However they become ineffective when applied to short texts such as Twitter feeds. To overcome the issue, we want to find a related newswire document to a given tweet to provide contextual support for NLP tasks. This requires robust modeling and understanding of the semantics of short texts. The contribution of the paper is two-fold: 1. we introduce the Linking-Tweets-to-News task as well as a dataset of linked tweet-news pairs, which can benefit many NLP applications; 2. in contrast to previous research which focuses on lexical features within the short texts (text-to-word information), we propose a graph based latent variable model that models the inter short text correlations (text-to-text information). This is motivated by the observation that a tweet usually only covers one aspect of an event. We show that using tweet specific feature (hashtag) and news specific feature (named entities) as well as temporal constraints, we are able to extract text-to-text correlations, and thus complete the semantic picture of a short text. Our experiments show significant improvement of our new model over baselines with three evaluation metrics in the new task.

5 0.72806281 211 acl-2013-LABR: A Large Scale Arabic Book Reviews Dataset

Author: Mohamed Aly ; Amir Atiya

6 0.7237466 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations

7 0.72028506 117 acl-2013-Detecting Turnarounds in Sentiment Analysis: Thwarting

8 0.71662349 318 acl-2013-Sentiment Relevance

9 0.71552896 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering

10 0.7155171 267 acl-2013-PARMA: A Predicate Argument Aligner

11 0.71417165 316 acl-2013-SenseSpotting: Never let your parallel data tie you to an old domain

12 0.71277916 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification

13 0.71176815 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users

14 0.71051186 97 acl-2013-Cross-lingual Projections between Languages from Different Families

15 0.70930606 140 acl-2013-Evaluating Text Segmentation using Boundary Edit Distance

16 0.70804006 262 acl-2013-Offspring from Reproduction Problems: What Replication Failure Teaches Us

17 0.70671684 95 acl-2013-Crawling microblogging services to gather language-classified URLs. Workflow and case study

18 0.7052204 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification

19 0.70490563 294 acl-2013-Re-embedding words

20 0.70449322 207 acl-2013-Joint Inference for Fine-grained Opinion Extraction