emnlp emnlp2013 emnlp2013-47 knowledge-graph by maker-knowledge-mining

47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs


Source: pdf

Author: Xinjie Zhou ; Xiaojun Wan ; Jianguo Xiao

Abstract: Microblog messages pose severe challenges for current sentiment analysis techniques due to some inherent characteristics such as the length limit and informal writing style. In this paper, we study the problem of extracting opinion targets of Chinese microblog messages. Such fine-grained word-level task has not been well investigated in microblogs yet. We propose an unsupervised label propagation algorithm to address the problem. The opinion targets of all messages in a topic are collectively extracted based on the assumption that similar messages may focus on similar opinion targets. Topics in microblogs are identified by hashtags or using clustering algorithms. Experimental results on Chinese microblogs show the effectiveness of our framework and algorithms.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In this paper, we study the problem of extracting opinion targets of Chinese microblog messages. [sent-5, score-0.91]

2 The opinion targets of all messages in a topic are collectively extracted based on the assumption that similar messages may focus on similar opinion targets. [sent-8, score-1.809]

3 , 2011) and have been proved to be useful in many applications, such as opinion polling (Tang et al. [sent-24, score-0.59]

4 However, classifying microblog texts at the sentence level is often insufficient for applications because it does not identify the opinion targets. [sent-28, score-0.787]

5 In this paper, we will study the task of opinion target extraction for Chinese microblog messages. [sent-29, score-0.887]

6 Opinion target extraction aims to find the object towards which the opinion is expressed. [sent-30, score-0.784]

7 This task is mostly studied in customer review texts in which opinion targets are often referred to as features or aspects (Liu, 2012). [sent-33, score-0.869]

8 Most of the opinion target extraction approaches rely on dependency parsing (Zhuang et al. [sent-34, score-0.758]

9 However, such approaches are not suitable for microblogs because the natural language processing tools perform poorly on microblog texts due to their inherent characteristics. [sent-38, score-0.4]

10 Besides, microblog messages may express opinions in different ways and do not always contain opinion words, which lowers the performance of methods that rely on opinion words to find opinion targets. [sent-42, score-2.509]

11 In this study, we propose an unsupervised method to collectively extract the opinion targets from opinionated sentences in the same topic. [sent-43, score-1.038]

12 We first present a dynamic programming based segmentation algorithm for Chinese hashtag segmentation. [sent-47, score-0.394]

13 Afterwards, all the noun phrases in each sentence and the hashtag segments are extracted as opinion target candidates. [sent-49, score-1.096]

14 We propose an unsupervised label propagation algorithm to collectively rank the candidates of all sentences based on the assumption that similar sentences in a topic may share the same opinion targets. [sent-50, score-1.278]

15 Finally, for each sentence, the candidate which gets the highest score after unsupervised label propagation is selected as the opinion target. [sent-51, score-1.041]

16 Our contributions in this study are summarized as follows: 1) our method considers not only the explicit opinion targets within the sentence but also the implicit opinion targets in the hashtag or mentioned in the previous sentence. [sent-52, score-1.996]

17 3) We develop an unsupervised label propagation algorithm for collective opinion target extraction. [sent-55, score-1.099]

18 4) To the best of our knowledge, the task of opinion target extraction in microblogs has not been well studied yet. [sent-59, score-0.991]

19 It is more challenging than microblog sentiment classification and opinion target extraction in review texts. [sent-60, score-0.989]

20 It is noteworthy that topics aggregated by the same hashtag play an important role in Chinese microblog websites. [sent-69, score-0.466]

21 Analyzing the opinion targets of these topics can help to gain a deeper understanding of the public attitudes towards the entities involved in the hot topics. [sent-72, score-0.946]

22 In this study, we aim to collectively extract the opinion targets of messages with the same hashtag, i. [sent-74, score-0.999]

23 The opinion target of a sentence can be divided into two types: the explicit target, which appears in the sentence itself, as in “I love Obama”, and the implicit target, which does not appear in the sentence. [sent-77, score-0.561]

24 The example sentence in Table 1 directly comments on the target in the hashtag “#Property publicity of government officials#”. [sent-80, score-0.399]

25 Such implicit opinion targets are not considered in previous works and are more difficult to extract than explicit targets. [sent-81, score-0.937]

26 However, we believe that the contextual information will help to locate both kinds of opinion targets, because similar sentences in a topic may share the same opinion target, which makes collective extraction possible. [sent-82, score-1.548]

27 In the topic #官员财产公示# (#Property publicity of government officials#), the first sentence omits the opinion target. [sent-85, score-0.744]

28 If we find the correct opinion target for sentence 2, we can infer that sentence 1 may have an implicit opinion target similar to the opinion target in sentence 2. [sent-87, score-2.25]

29 The similarity between these two sentences may indicate that both sentences are expressing an opinion on “政府” (the government). [sent-89, score-0.652]

30 Based on the above observation, we can assume that similar sentences in a topic may have the same opinion targets. [sent-90, score-0.692]

31 Such an assumption can help to locate both explicit and implicit opinion targets. [sent-91, score-0.773]

32 Following this idea, we first extract all the noun phrases in each sentence as opinion target candidates after applying Chinese word segmentation and part-of-speech tagging. [sent-92, score-1.046]

33 Afterwards, an unsupervised label propagation algorithm is proposed to rank these candidates for all sentences in the topic. [sent-93, score-0.517]

34 For messages without hashtags, an alternative way is to generate pseudo topics by clustering microblog messages and then apply the proposed algorithm to each pseudo topic. [sent-95, score-0.789]

35 The segmentation errors especially on opinion target words will directly influence the results of part-of-speech tagging and candidate extraction. [sent-101, score-0.893]

36 However, some of the opinion target words in a topic are often included in the hashtag. [sent-102, score-0.763]

37 By finding the correct segments of a hashtag and adding them to the user dictionary of the Chinese word segmentation tool, we can remarkably improve the overall segmentation performance. [sent-103, score-0.519]

38 In the topic #90后打老人# (meaning “A young man hits an old man”), “90后” (literally “90 later”, meaning a young man born in the 90s) is an important word because it is the opinion target of many sentences. [sent-105, score-0.763]

39 As we only extract noun phrases as opinion target candidates, the wrong segmentation on “90后” makes it impossible to find the right opinion target. [sent-108, score-1.477]

40 Such an error may occur many times in sentences that mention the word “90后” and express opinion on it. [sent-109, score-0.621]

41 After segmenting the hashtag correctly into “90后/打/老人”, we can add the hashtag segments to the user dictionary of the segmentation tool to further segment the message texts of the topic. [sent-112, score-0.835]

42 The basic idea for our hashtag segmentation algorithm is to regard strings that appear frequently in a topic as words. [sent-113, score-0.465]

43 , 2012b), which calculates the SCP value of each string based on Microsoft Web N-Gram, our hashtag segmentation algorithm only uses the topic content and does not need any additional corpus. [sent-160, score-0.492]
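
The summary does not reproduce the paper's dynamic program or its stickiness/SCP-style score, so the following is only a minimal sketch of the idea described above: score candidate segments by how frequently they occur in the topic's own messages and pick the highest-scoring split by dynamic programming. The function names (`substring_counts`, `segment_hashtag`), the frequency-times-length score, and the `topic_messages` variable in the usage comment are illustrative assumptions, not the paper's exact formulation.

```python
from collections import Counter

def substring_counts(messages, max_len=4):
    """Count how often every substring (up to max_len characters) appears in the topic."""
    counts = Counter()
    for msg in messages:
        for i in range(len(msg)):
            for j in range(i + 1, min(i + max_len, len(msg)) + 1):
                counts[msg[i:j]] += 1
    return counts

def segment_hashtag(hashtag, counts, max_len=4):
    """Dynamic programming: choose the split whose segments are most frequent in the topic."""
    n = len(hashtag)
    best = [float("-inf")] * (n + 1)   # best[i] = best score of a segmentation of hashtag[:i]
    best[0] = 0.0
    back = [0] * (n + 1)               # back[i] = start index of the last segment ending at i
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            seg = hashtag[j:i]
            score = best[j] + counts.get(seg, 0) * len(seg)  # toy frequency-based score
            if score > best[i]:
                best[i], back[i] = score, j
    segments, i = [], n
    while i > 0:
        segments.append(hashtag[back[i]:i])
        i = back[i]
    return segments[::-1]

# e.g. segment_hashtag("90后打老人", substring_counts(topic_messages))
# could return ["90后", "打", "老人"] if "90后" is frequent in the topic's messages.
```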

44 After segmenting the hashtag, all the hashtag segments with length greater than one are added to the user dictionary of the Chinese word segmentation tool ICTCLAS to further segment the message texts of the topic. [sent-162, score-0.589]

45 For each sentence, all phrases that match the regular expression and meet the length restriction are extracted as explicit opinion target candidates. [sent-172, score-0.825]
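
The actual regular expression and length restriction are not given in this summary, so the sketch below is only an assumed stand-in: POS-tagged tokens are flattened into a tag string, a simple adjective*-noun+ pattern plays the role of the paper's rule, and the five-word cap is a made-up limit.

```python
import re

# Illustrative pattern over a flattened tag string such as "a/n/n/"; the paper's
# real regular expression and length restriction are assumptions here.
NP_PATTERN = re.compile(r"(?:a/)*(?:n/)+")

def extract_explicit_candidates(tagged_sentence, max_words=5):
    """tagged_sentence: list of (word, pos) pairs produced by an ICTCLAS-style tagger."""
    # Each token contributes exactly two characters to the tag string, e.g. "n/".
    tag_string = "".join(pos[0].lower() + "/" for _, pos in tagged_sentence)
    candidates = []
    for m in NP_PATTERN.finditer(tag_string):
        start, end = m.start() // 2, m.end() // 2     # map match back to token indices
        if 0 < end - start <= max_words:
            candidates.append("".join(word for word, _ in tagged_sentence[start:end]))
    return candidates

# e.g. extract_explicit_candidates([("政府", "n"), ("官员", "n"), ("应该", "v")]) -> ["政府官员"]
```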

46 The hashtag segments are regarded as implicit candidates for all sentences. [sent-173, score-0.546]

47 Besides, some opinionated sentences in microblogs do not contain any noun phrase, such as “无聊至!” (“So boring!”).

48 These sentences may express an opinion on an object that has been mentioned before. [sent-176, score-0.647]

49 Therefore, the explicit candidates of the previous sentence in the same message are also taken as the implicit candidates for such sentences. [sent-177, score-0.502]
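
Putting the pieces above together, a sentence's candidate set is built from its own noun phrases (explicit), the hashtag segments (implicit), and the previous sentence's explicit candidates (implicit). The helper below is a minimal sketch of that assembly; the function name and the choice to keep the explicit reading when a candidate appears in both roles are assumptions not stated in the summary.

```python
def build_candidate_set(explicit_nps, hashtag_segments, prev_explicit_nps):
    """Candidate set C_v for one sentence, as (candidate, is_explicit) pairs so the
    two initial weights (we for explicit, wi for implicit) can later be applied."""
    candidates = [(np_, True) for np_ in explicit_nps]
    candidates += [(seg, False) for seg in hashtag_segments if len(seg) > 1]
    candidates += [(np_, False) for np_ in prev_explicit_nps]
    # Keep the first occurrence of each candidate; explicit wins because it is added first.
    seen, unique = set(), []
    for cand, is_explicit in candidates:
        if cand not in seen:
            seen.add(cand)
            unique.append((cand, is_explicit))
    return unique
```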

50 Therefore, the most confident candidate of each sentence will be selected as the opinion target. [sent-183, score-0.709]

51 In this section, we introduce an unsupervised graph-based label propagation algorithm to collectively rank the candidates of all sentences in a topic. [sent-184, score-0.586]

52 Label propagation (Zhu and Ghahramani, 2002; Talukdar and Crammer, 2009) is a semisupervised algorithm which spreads label distributions from a small set of nodes seeded with some initial label information throughout the graph. [sent-185, score-0.478]

53 Finally, we select the candidate with the highest score in the label vector as the opinion target for each sentence. [sent-193, score-0.882]

54 The candidate set CT for the whole topic is the union of all Cv: CT = ∪v Cv (6). The total number of candidates in the topic is denoted by M = |CT|. [sent-203, score-0.392]

55 We set w = we if Lk is an explicit candidate (extracted noun phrase) of v and w = wi if Lk is an implicit candidate (hashtag segment or inherited from previous sentence) of v. [sent-207, score-0.489]

56 These values, which are initialized as zero, should always remain zero during the propagation algorithm because the corresponding label does not belong to the candidate set Cv of node v. [sent-209, score-0.455]

57 By multiplying the candidate similarity matrix S, we aim to propagate the score of the i-th candidate of node u not only to the i-th candidate of node v, but also to all the other candidates. [sent-218, score-0.407]
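
Equations 8-11 of the paper are not reproduced in this summary, so the sketch below only follows the generic scheme the sentences above describe: each sentence node keeps part of its injected initial distribution (pinj), absorbs scores from similar sentences through a sentence similarity matrix W, spreads them across similar candidates through the candidate similarity matrix S, and forces labels outside its own candidate set back to zero. The matrix names, the row normalisation, and the fixed iteration count are assumptions, not the paper's exact update rule.

```python
import numpy as np

def label_propagation(W, Y0, S, masks, p_inj=0.2, n_iter=30):
    """
    W     : (N, N) row-normalised sentence-sentence similarity matrix
    Y0    : (N, M) initial label vectors (we for explicit, wi for implicit candidates)
    S     : (M, M) candidate-candidate similarity matrix
    masks : (N, M) 0/1 matrix; masks[v, k] = 1 iff candidate k belongs to C_v
    Returns the index of the highest-scoring candidate for each sentence.
    """
    Y = Y0.copy()
    for _ in range(n_iter):
        neighbour = W @ Y @ S                 # spread scores across sentences and similar candidates
        Y = p_inj * Y0 + (1 - p_inj) * neighbour
        Y *= masks                            # candidates outside C_v must stay zero
        Y /= Y.sum(axis=1, keepdims=True) + 1e-12   # keep each label vector a distribution
    return Y.argmax(axis=1)
```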

58 There are three tasks in the evaluation: subjectivity classification, polarity classification and opinion target extraction. [sent-224, score-0.81]

59 In each topic, 100 messages are manually annotated with subjectivity, polarity and opinion targets. [sent-234, score-0.817]

60 Strict Evaluation: For a proposed opinion target, it is regarded as correct only if it covers the same span as the annotation result. [sent-240, score-0.674]
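
A small sketch of the strict criterion, assuming each target is represented as a dict carrying its character span and, for CMSAE, its polarity; the actual annotation format is not shown in this summary, so the field names are hypothetical.

```python
def strict_match(proposed, gold):
    """A proposed target counts only if its span (and, in CMSAE, its polarity) matches exactly."""
    return (proposed["span"] == gold["span"]
            and proposed.get("polarity") == gold.get("polarity"))

def strict_prf(proposed_targets, gold_targets):
    correct = sum(1 for p in proposed_targets
                  if any(strict_match(p, g) for g in gold_targets))
    precision = correct / len(proposed_targets) if proposed_targets else 0.0
    recall = correct / len(gold_targets) if gold_targets else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```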

61 Note that, in CMSAE, an opinion target should be proposed along with its polarity. [sent-241, score-0.692]

62 CMSAE Teams: Sixteen teams participated in the opinion target extraction task of CMSAE. [sent-249, score-0.833]

63 The most important component of their model is a topic-dependent opinion target lexicon which is called object sheet. [sent-254, score-0.718]

64 If a word or phrase in the object sheet appears in a sentence or a hashtag, it is extracted as the opinion target. [sent-255, score-0.675]

65 AssocMi: We implement the unsupervised method for opinion target extraction based on (Hu and Liu, 2004), which relies on association mining and a sentiment lexicon to extract frequent and infrequent product features. [sent-258, score-0.905]

66 The opinion targets are extracted only for opinionated sentences and should be proposed along with their polarity. [sent-267, score-0.924]

67 Then our unsupervised label propagation (ULP) method is used to extract the opinion targets for the proposed opinionated sentences. [sent-269, score-1.218]

68 It shows that opinion target extraction is a quite hard problem in Chinese microblogs. [sent-277, score-0.758]

69 Besides, we do not need any prior information about the topics, while Team-1 has to manually build an opinion target lexicon for each topic. [sent-280, score-0.783]

70 To compare with the other opinion target extraction methods, we only use gold-standard opinionated sentences for evaluation and do not classify the polarity of the opinion targets. [sent-281, score-1.457]

71 It achieves high results because it has already seen the opinion targets in the training set. [sent-287, score-0.781]

72 There are two major parameters in our algorithm: the initial weight w for both explicit and implicit candidates in Equation 8 and the injection probability pinj in Equation 11. [sent-318, score-0.533]

73 The initial weights of explicit and implicit candidates are set differently because the explicit candidates are more likely to be the opinion targets. [sent-319, score-0.982]

74 Figure 2(a) displays the opinion target extraction performance when wi varies from 0 to 1. [sent-324, score-0.814]

75 Due to limited space, we only list the strict F-measure of opinion target extraction evaluated on opinionated sentences (same experimental setup as Table 3). [sent-326, score-0.834]

76 Figure 2(b) shows the results of opinion target extraction with [sent-335, score-0.758]

77 respect to different values of the injection probability. [sent-338, score-0.825]

78 If the correct opinion target is not extracted as a candidate, the ranking step will be in vain. [sent-343, score-0.692]

79 As described in Section 3, we develop a hashtag segmentation algorithm and use a rule based method to extract noun phrases from each sentence. [sent-344, score-0.501]

80 Our method HS+Rule leverages the hashtag segments to enhance the segmentation result and extracts explicit candidates using a regular expression. [sent-350, score-0.599]

81 To demonstrate the effectiveness of our hashtag segmentation algorithm, the second comparison baseline Rule directly uses ICTCLAS to segment the whole topic content and labels each word with its part-of-speech tag. [sent-351, score-0.479]

82 The third column shows the number of correct opinion targets among them. [sent-355, score-0.781]

83 We can find that the two rule-based models both outperform Berkeley Parser, and our HS+Rule method finds 14% more correct opinion targets than Rule. [sent-356, score-0.781]

84 Therefore, the performance of label propagation will be improved when there are fewer candidates to rank. [sent-359, score-0.407]

85 It can be demonstrated by the F-measure of opinion target extraction in the fourth and fifth columns. [sent-360, score-0.758]

86 For messages without hashtags, we can first employ clustering algorithms to obtain pseudo topics (clusters) and then apply the topic-oriented algorithm for collective opinion target extraction. [sent-365, score-1.116]
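
The experiments mention an APCluster baseline, which suggests affinity propagation (Frey and Dueck); the sketch below uses scikit-learn's AffinityPropagation over character n-gram TF-IDF cosine similarities as a stand-in. The paper's actual clustering features and similarity measure are not given in this summary, so those choices are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import AffinityPropagation

def pseudo_topics(messages):
    """Group hashtag-less messages into pseudo topics; each cluster can then be fed
    to the same collective extraction algorithm as a real hashtag topic."""
    tfidf = TfidfVectorizer(analyzer="char", ngram_range=(1, 2))  # char n-grams suit Chinese text
    X = tfidf.fit_transform(messages)
    sim = cosine_similarity(X)
    clusterer = AffinityPropagation(affinity="precomputed", random_state=0)
    labels = clusterer.fit(sim).labels_
    clusters = {}
    for msg, label in zip(messages, labels):
        clusters.setdefault(label, []).append(msg)
    return list(clusters.values())
```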

87 After clustering microblogs, the opinion targets of messages in each cluster are collectively extracted by the proposed unsupervised label propagation algorithm. [sent-373, score-1.369]

88 From the results, we can see that clustering microblogs without hashtags is quite a difficult task, which only achieves an F-Measure of 0. [sent-375, score-0.43]

89 However, the corresponding opinion target extraction performance is still promising, which outperforms the AssocMi and CRF-C methods in Table 3. [sent-377, score-0.758]

90 With the help of hashtags, the clustering performance of APCluster+HS is largely improved and the opinion target extraction performance is also increased. [sent-378, score-0.803]

91 The above results reveal that our proposed unsupervised label propagation algorithm works well in pseudo topics and the performance can be increased with better clustering results. [sent-384, score-0.552]

92 Sentiment analysis, also known as opinion mining, is the field of studying and analyzing people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions (Liu, 2012). [sent-389, score-0.59]

93 Classification of opinion polarity is the most common task studied in microblogs. [sent-394, score-0.694]

94 It is mostly performed on product reviews where opinion targets are always described as product features or aspects. [sent-404, score-0.781]

95 The pioneering research on this task was conducted by Hu and Liu (2004), who propose a method that extracts frequent nouns and noun phrases as the opinion targets. [sent-405, score-0.671]

96 (2011) propose a double propagation method to extract opinion words and opinion targets simultaneously. [sent-408, score-1.459]

97 (2012) use the word translation model in a monolingual scenario to mine the associations between opinion targets and opinion words. [sent-410, score-1.371]

98 7 Conclusion and Future Work In this paper, we study the problem of opinion target extraction in Chinese microblogs which has not been well investigated yet. [sent-411, score-0.965]

99 We propose an unsupervised label propagation algorithm to collectively rank the opinion target candidates of all sentences in a topic. [sent-412, score-1.278]

100 Extracting opinion targets in a single- and cross-domain setting with conditional random fields. [sent-455, score-0.781]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('opinion', 0.59), ('hashtag', 0.246), ('microblogs', 0.207), ('targets', 0.191), ('propagation', 0.177), ('chinese', 0.156), ('pinj', 0.152), ('messages', 0.149), ('hashtags', 0.139), ('microblog', 0.129), ('candidates', 0.127), ('segmentation', 0.114), ('opinionated', 0.112), ('yv', 0.107), ('label', 0.103), ('target', 0.102), ('sentiment', 0.102), ('cmsae', 0.101), ('scp', 0.101), ('fv', 0.094), ('twitter', 0.091), ('topics', 0.091), ('candidate', 0.087), ('implicit', 0.078), ('explicit', 0.078), ('polarity', 0.078), ('teams', 0.075), ('topic', 0.071), ('collectively', 0.069), ('injection', 0.067), ('extraction', 0.066), ('soft', 0.064), ('message', 0.06), ('hs', 0.057), ('pseudo', 0.057), ('wi', 0.056), ('noun', 0.055), ('node', 0.054), ('apcluster', 0.051), ('assocmi', 0.051), ('pcont', 0.051), ('publicity', 0.051), ('stickiness', 0.051), ('wab', 0.051), ('regarded', 0.05), ('collective', 0.048), ('segment', 0.048), ('unsupervised', 0.045), ('clustering', 0.045), ('strict', 0.045), ('segments', 0.045), ('liu', 0.045), ('cv', 0.043), ('subjectivity', 0.04), ('speriosu', 0.04), ('attitudes', 0.04), ('tool', 0.04), ('jakob', 0.04), ('gets', 0.039), ('matrix', 0.038), ('texts', 0.036), ('denoted', 0.036), ('barbosa', 0.035), ('love', 0.035), ('span', 0.034), ('algorithm', 0.034), ('bollen', 0.034), ('circumstance', 0.034), ('frey', 0.034), ('officials', 0.034), ('pcontdv', 0.034), ('tencent', 0.034), ('vwuv', 0.034), ('lk', 0.034), ('hot', 0.034), ('researches', 0.032), ('gurevych', 0.032), ('sentence', 0.032), ('initial', 0.031), ('qiu', 0.031), ('xiaojun', 0.031), ('sentences', 0.031), ('nodes', 0.03), ('characters', 0.03), ('regular', 0.029), ('barackobama', 0.029), ('tumasjan', 0.029), ('besides', 0.029), ('tweets', 0.029), ('tools', 0.028), ('equation', 0.028), ('ct', 0.028), ('string', 0.027), ('locate', 0.027), ('sheet', 0.027), ('studied', 0.026), ('customer', 0.026), ('phrases', 0.026), ('object', 0.026), ('rule', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999887 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

Author: Xinjie Zhou ; Xiaojun Wan ; Jianguo Xiao

Abstract: Microblog messages pose severe challenges for current sentiment analysis techniques due to some inherent characteristics such as the length limit and informal writing style. In this paper, we study the problem of extracting opinion targets of Chinese microblog messages. Such fine-grained word-level task has not been well investigated in microblogs yet. We propose an unsupervised label propagation algorithm to address the problem. The opinion targets of all messages in a topic are collectively extracted based on the assumption that similar messages may focus on similar opinion targets. Topics in microblogs are identified by hashtags or using clustering algorithms. Experimental results on Chinese microblogs show the effectiveness of our framework and algorithms.

2 0.21300635 143 emnlp-2013-Open Domain Targeted Sentiment

Author: Margaret Mitchell ; Jacqui Aguilar ; Theresa Wilson ; Benjamin Van Durme

Abstract: We propose a novel approach to sentiment analysis for a low resource setting. The intuition behind this work is that sentiment expressed towards an entity, targeted sentiment, may be viewed as a span of sentiment expressed across the entity. This representation allows us to model sentiment detection as a sequence tagging problem, jointly discovering people and organizations along with whether there is sentiment directed towards them. We compare performance in both Spanish and English on microblog data, using only a sentiment lexicon as an external resource. By leveraging linguisticallyinformed features within conditional random fields (CRFs) trained to minimize empirical risk, our best models in Spanish significantly outperform a strong baseline, and reach around 90% accuracy on the combined task of named entity recognition and sentiment prediction. Our models in English, trained on a much smaller dataset, are not yet statistically significant against their baselines.

3 0.17736125 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?

Author: Shi Feng ; Le Zhang ; Binyang Li ; Daling Wang ; Ge Yu ; Kam-Fai Wong

Abstract: Extensive experiments have validated the effectiveness of the corpus-based method for classifying the word’s sentiment polarity. However, no work is done for comparing different corpora in the polarity classification task. Nowadays, Twitter has aggregated huge amount of data that are full of people’s sentiments. In this paper, we empirically evaluate the performance of different corpora in sentiment similarity measurement, which is the fundamental task for word polarity classification. Experiment results show that the Twitter data can achieve a much better performance than the Google, Web1T and Wikipedia based methods.

4 0.16528402 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

Author: Zhiyuan Chen ; Arjun Mukherjee ; Bing Liu ; Meichun Hsu ; Malu Castellanos ; Riddhiman Ghosh

Abstract: Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in applications. To tackle the issue, some knowledge-based topic models have been proposed, which allow the user to input some prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings, e.g., little work has been done to incorporate the cannot-link type of knowledge or to automatically adjust the number of topics based on domain knowledge. This paper proposes a more advanced topic model, called MC-LDA (LDA with m-set and c-set), to address these problems, which is based on an Extended generalized Pólya urn (E-GPU) model (which is also proposed in this paper). Experiments on real-life product reviews from a variety of domains show that MCLDA outperforms the existing state-of-the-art models markedly.

5 0.15379506 63 emnlp-2013-Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic

Author: Qi Zhang ; Jin Qian ; Huan Chen ; Jihua Kang ; Xuanjing Huang

Abstract: Explanatory sentences are employed to clarify reasons, details, facts, and so on. High quality online product reviews usually include not only positive or negative opinions, but also a variety of explanations of why these opinions were given. These explanations can help readers get easily comprehensible information of the discussed products and aspects. Moreover, explanatory relations can also benefit sentiment analysis applications. In this work, we focus on the task of identifying subjective text segments and extracting their corresponding explanations from product reviews in discourse level. We propose a novel joint extraction method using firstorder logic to model rich linguistic features and long distance constraints. Experimental results demonstrate the effectiveness of the proposed method.

6 0.13977914 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM

7 0.13951917 144 emnlp-2013-Opinion Mining in Newspaper Articles by Entropy-Based Word Connections

8 0.13043588 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

9 0.12891084 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation

10 0.11243428 121 emnlp-2013-Learning Topics and Positions from Debatepedia

11 0.11017855 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes

12 0.10766286 130 emnlp-2013-Microblog Entity Linking by Leveraging Extra Posts

13 0.10714956 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

14 0.10409791 151 emnlp-2013-Paraphrasing 4 Microblog Normalization

15 0.093045726 163 emnlp-2013-Sarcasm as Contrast between a Positive Sentiment and Negative Situation

16 0.088927314 74 emnlp-2013-Event-Based Time Label Propagation for Automatic Dating of News Articles

17 0.087203749 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter

18 0.07967253 124 emnlp-2013-Leveraging Lexical Cohesion and Disruption for Topic Segmentation

19 0.079591967 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

20 0.079061888 111 emnlp-2013-Joint Chinese Word Segmentation and POS Tagging on Heterogeneous Annotated Corpora with Multiple Task Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.251), (1, 0.094), (2, -0.192), (3, -0.178), (4, 0.084), (5, -0.142), (6, 0.064), (7, 0.077), (8, 0.009), (9, 0.165), (10, -0.014), (11, -0.085), (12, 0.013), (13, 0.082), (14, -0.122), (15, 0.105), (16, 0.068), (17, 0.083), (18, 0.038), (19, -0.028), (20, 0.04), (21, -0.129), (22, -0.186), (23, 0.017), (24, -0.062), (25, -0.06), (26, -0.049), (27, 0.076), (28, -0.005), (29, 0.001), (30, 0.089), (31, -0.04), (32, -0.038), (33, 0.028), (34, 0.104), (35, 0.1), (36, -0.015), (37, -0.052), (38, 0.09), (39, 0.031), (40, 0.028), (41, 0.048), (42, 0.059), (43, 0.025), (44, 0.006), (45, -0.065), (46, 0.029), (47, 0.028), (48, -0.101), (49, 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96705377 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

Author: Xinjie Zhou ; Xiaojun Wan ; Jianguo Xiao

Abstract: Microblog messages pose severe challenges for current sentiment analysis techniques due to some inherent characteristics such as the length limit and informal writing style. In this paper, we study the problem of extracting opinion targets of Chinese microblog messages. Such fine-grained word-level task has not been well investigated in microblogs yet. We propose an unsupervised label propagation algorithm to address the problem. The opinion targets of all messages in a topic are collectively extracted based on the assumption that similar messages may focus on similar opinion targets. Topics in microblogs are identified by hashtags or using clustering algorithms. Experimental results on Chinese microblogs show the effectiveness of our framework and algorithms.

2 0.67873037 144 emnlp-2013-Opinion Mining in Newspaper Articles by Entropy-Based Word Connections

Author: Thomas Scholz ; Stefan Conrad

Abstract: A very valuable piece of information in newspaper articles is the tonality of extracted statements. For the analysis of tonality of newspaper articles either a big human effort is needed, when it is carried out by media analysts, or an automated approach which has to be as accurate as possible for a Media Response Analysis (MRA). To this end, we will compare several state-of-the-art approaches for Opinion Mining in newspaper articles in this paper. Furthermore, we will introduce a new technique to extract entropy-based word connections which identifies the word combinations which create a tonality. In the evaluation, we use two different corpora consisting of news articles, by which we show that the new approach achieves better results than the four state-of-the-art methods.

3 0.63600612 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

Author: Zhiyuan Chen ; Arjun Mukherjee ; Bing Liu ; Meichun Hsu ; Malu Castellanos ; Riddhiman Ghosh

Abstract: Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in applications. To tackle the issue, some knowledge-based topic models have been proposed, which allow the user to input some prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings, e.g., little work has been done to incorporate the cannot-link type of knowledge or to automatically adjust the number of topics based on domain knowledge. This paper proposes a more advanced topic model, called MC-LDA (LDA with m-set and c-set), to address these problems, which is based on an Extended generalized Pólya urn (E-GPU) model (which is also proposed in this paper). Experiments on real-life product reviews from a variety of domains show that MCLDA outperforms the existing state-of-the-art models markedly.

4 0.6059413 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM

Author: Wei Wang ; Hua Xu ; Xiaoqiu Huang

Abstract: Implicit feature detection, also known as implicit feature identification, is an essential aspect of feature-specific opinion mining but previous works have often ignored it. We think, based on the explicit sentences, several Support Vector Machine (SVM) classifiers can be established to do this task. Nevertheless, we believe it is possible to do better by using a constrained topic model instead of traditional attribute selection methods. Experiments show that this method outperforms the traditional attribute selection methods by a large margin and the detection task can be completed better.

5 0.57453537 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?

Author: Shi Feng ; Le Zhang ; Binyang Li ; Daling Wang ; Ge Yu ; Kam-Fai Wong

Abstract: Extensive experiments have validated the effectiveness of the corpus-based method for classifying the word’s sentiment polarity. However, no work is done for comparing different corpora in the polarity classification task. Nowadays, Twitter has aggregated huge amount of data that are full of people’s sentiments. In this paper, we empirically evaluate the performance of different corpora in sentiment similarity measurement, which is the fundamental task for word polarity classification. Experiment results show that the Twitter data can achieve a much better performance than the Google, Web1T and Wikipedia based methods.

6 0.56971437 143 emnlp-2013-Open Domain Targeted Sentiment

7 0.56783187 63 emnlp-2013-Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic

8 0.51859212 170 emnlp-2013-Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet

9 0.50647283 121 emnlp-2013-Learning Topics and Positions from Debatepedia

10 0.49055552 111 emnlp-2013-Joint Chinese Word Segmentation and POS Tagging on Heterogeneous Annotated Corpora with Multiple Task Learning

11 0.4783228 163 emnlp-2013-Sarcasm as Contrast between a Positive Sentiment and Negative Situation

12 0.46239287 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation

13 0.44316456 131 emnlp-2013-Mining New Business Opportunities: Identifying Trend related Products by Leveraging Commercial Intents from Microblogs

14 0.4158721 21 emnlp-2013-An Empirical Study Of Semi-Supervised Chinese Word Segmentation Using Co-Training

15 0.39696243 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes

16 0.39195579 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

17 0.37584078 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

18 0.36840042 151 emnlp-2013-Paraphrasing 4 Microblog Normalization

19 0.35973698 124 emnlp-2013-Leveraging Lexical Cohesion and Disruption for Topic Segmentation

20 0.35916245 148 emnlp-2013-Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.024), (11, 0.242), (18, 0.041), (22, 0.062), (30, 0.105), (43, 0.013), (47, 0.013), (50, 0.023), (51, 0.158), (66, 0.067), (71, 0.073), (75, 0.032), (77, 0.017), (90, 0.013), (96, 0.038)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80726874 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

Author: Xinjie Zhou ; Xiaojun Wan ; Jianguo Xiao

Abstract: Microblog messages pose severe challenges for current sentiment analysis techniques due to some inherent characteristics such as the length limit and informal writing style. In this paper, we study the problem of extracting opinion targets of Chinese microblog messages. Such fine-grained word-level task has not been well investigated in microblogs yet. We propose an unsupervised label propagation algorithm to address the problem. The opinion targets of all messages in a topic are collectively extracted based on the assumption that similar messages may focus on similar opinion targets. Topics in microblogs are identified by hashtags or using clustering algorithms. Experimental results on Chinese microblogs show the effectiveness of our framework and algorithms.

2 0.68899369 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology

Author: Wolfgang Seeker ; Jonas Kuhn

Abstract: Morphology and syntax interact considerably in many languages and language processing should pay attention to these interdependencies. We analyze the effect of syntactic features when used in automatic morphology prediction on four typologically different languages. We show that predicting morphology for languages with highly ambiguous word forms profits from taking the syntactic context of words into account and results in state-ofthe-art models.

3 0.65919161 143 emnlp-2013-Open Domain Targeted Sentiment

Author: Margaret Mitchell ; Jacqui Aguilar ; Theresa Wilson ; Benjamin Van Durme

Abstract: We propose a novel approach to sentiment analysis for a low resource setting. The intuition behind this work is that sentiment expressed towards an entity, targeted sentiment, may be viewed as a span of sentiment expressed across the entity. This representation allows us to model sentiment detection as a sequence tagging problem, jointly discovering people and organizations along with whether there is sentiment directed towards them. We compare performance in both Spanish and English on microblog data, using only a sentiment lexicon as an external resource. By leveraging linguisticallyinformed features within conditional random fields (CRFs) trained to minimize empirical risk, our best models in Spanish significantly outperform a strong baseline, and reach around 90% accuracy on the combined task of named entity recognition and sentiment prediction. Our models in English, trained on a much smaller dataset, are not yet statistically significant against their baselines.

4 0.65356284 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks

Author: Zhongqing Wang ; Shoushan LI ; Fang Kong ; Guodong Zhou

Abstract: Personal profile information on social media like LinkedIn.com and Facebook.com is at the core of many interesting applications, such as talent recommendation and contextual advertising. However, personal profiles usually lack organization confronted with the large amount of available information. Therefore, it is always a challenge for people to find desired information from them. In this paper, we address the task of personal profile summarization by leveraging both personal profile textual information and social networks. Here, using social networks is motivated by the intuition that people with similar academic, business or social connections (e.g. co-major, co-university, and co-corporation) tend to have similar experience and summaries. To achieve the learning process, we propose a collective factor graph (CoFG) model to incorporate all these resources of knowledge to summarize personal profiles with local textual attribute functions and social connection factors. Extensive evaluation on a large-scale dataset from LinkedIn.com demonstrates the effectiveness of the proposed approach.

5 0.65353447 151 emnlp-2013-Paraphrasing 4 Microblog Normalization

Author: Wang Ling ; Chris Dyer ; Alan W Black ; Isabel Trancoso

Abstract: Compared to the edited genres that have played a central role in NLP research, microblog texts use a more informal register with nonstandard lexical items, abbreviations, and free orthographic variation. When confronted with such input, conventional text analysis tools often perform poorly. Normalization replacing orthographically or lexically idiosyncratic forms with more standard variants can improve performance. We propose a method for learning normalization rules from machine translations of a parallel corpus of microblog messages. To validate the utility of our approach, we evaluate extrinsically, showing that normalizing English tweets and then translating improves translation quality (compared to translating unnormalized text) using three standard web translation services as well as a phrase-based translation system trained — — on parallel microblog data.

6 0.65265906 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

7 0.65204608 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

8 0.6471948 123 emnlp-2013-Learning to Rank Lexical Substitutions

9 0.64577913 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification

10 0.64309734 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

11 0.64254105 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

12 0.64059341 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes

13 0.64033091 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

14 0.63920581 63 emnlp-2013-Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic

15 0.63793993 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM

16 0.6377272 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)

17 0.6375283 130 emnlp-2013-Microblog Entity Linking by Leveraging Extra Posts

18 0.63666874 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction

19 0.63546908 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

20 0.63535774 140 emnlp-2013-Of Words, Eyes and Brains: Correlating Image-Based Distributional Semantic Models with Neural Representations of Concepts