emnlp emnlp2013 emnlp2013-99 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Wei Wang ; Hua Xu ; Xiaoqiu Huang
Abstract: Implicit feature detection, also known as implicit feature identification, is an essential aspect of feature-specific opinion mining but previous works have often ignored it. We think, based on the explicit sentences, several Support Vector Machine (SVM) classifiers can be established to do this task. Nevertheless, we believe it is possible to do better by using a constrained topic model instead of traditional attribute selection methods. Experiments show that this method outperforms the traditional attribute selection methods by a large margin and the detection task can be completed better.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Implicit feature detection, also known as implicit feature identification, is an essential aspect of feature-specific opinion mining but previous works have often ignored it. [sent-5, score-0.557]
2 We think, based on the explicit sentences, several Support Vector Machine (SVM) classifiers can be established to do this task. [sent-6, score-0.312]
3 Nevertheless, we believe it is possible to do better by using a constrained topic model instead of traditional attribute selection methods. [sent-7, score-0.645]
4 Experiments show that this method outperforms the traditional attribute selection methods by a large margin and the detection task can be completed better. [sent-8, score-0.317]
5 1 Introduction Feature-specific opinion mining has been well defined by Ding and Liu (2008). [sent-9, score-0.069]
6 Example 1 is a cell phone review in which two features are mentioned. [sent-10, score-0.094]
7 Example 1 This cell phone is fashion in appearance, and it is also very cheap. [sent-11, score-0.094]
8 If a feature appears in a review directly, it is called an explicit feature. [sent-12, score-0.321]
9 If a feature is only implied, it is called an implicit feature. [sent-13, score-0.373]
10 In Example 1, appearance is an explicit feature while price is an implicit feature, which is implied by cheap. [sent-14, score-0.84]
11 Furthermore, an explicit sentence is defined as a sentence containing at least one explicit feature, and an implicit sentence is a sentence containing only implicit features. [sent-15, score-1.06]
12 Thus, the first sentence is an explicit sentence, while the second is an implicit one. [sent-16, score-0.53]
13 This paper proposes an approach for implicit feature detection based on SVM and Topic Model (TM). [sent-17, score-0.419]
14 The topic model, which incorporates constraints based on the pre-defined product features, is established to extract the training attributes for SVM. [sent-18, score-0.26]
15 In the end, several SVM classifiers are trained on the selected attributes and utilized to detect the implicit features. [sent-19, score-0.465]
16 2 Related Work The definition of implicit feature comes from Liu et al. [sent-20, score-0.373]
17 (2006) used Pointwise Mutual Information (PMI) based semantic association analysis to identify implicit features, but no quantitative experimental results were provided. [sent-23, score-0.291]
18 (2011) used co-occurrence association rule mining to identify implicit features. [sent-25, score-0.291]
19 However, they only dealt with opinion words and neglected the facts. [sent-26, score-0.102]
20 Since the inception of these works, many variations have been proposed. [sent-31, score-0.033]
21 For example, LDA has previously been used to construct attributes for classification; it often acts to reduce data dimension (Blei and Jordan, 2003; Fei-Fei and Perona, 2005; Quelhas et al. [sent-32, score-0.096]
22 Here, we modify LDA and adopt it to select the training attributes for SVM. [sent-34, score-0.096]
23 1 Introduction to LDA We briefly introduce LDA, following the notation of Griffiths (Griffiths and Steyvers, 2004). [sent-36, score-0.031]
24 For this scheme, the core process is the topic updating for each word in each document according to Equation 1. [sent-40, score-0.259]
25 P(z_i = j \mid z_{-i}, \mathbf{w}, \alpha, \beta) \propto \frac{n^{(w_i)}_{-i,j} + \beta}{\sum_{w'} n^{(w')}_{-i,j} + W\beta} \cdot \frac{n^{(d_i)}_{-i,j} + \alpha}{\sum_{j'} n^{(d_i)}_{-i,j'} + T\alpha} \quad (1), where z_i = j represents the assignment of the ith word in a document to topic j, and z_{-i} represents all the topic assignments excluding the ith word. [sent-41, score-0.467]
26 n^{(w')}_j is the number of instances of word w' assigned to topic j, and n^{(d_i)}_j is the number of words from document d_i assigned to topic j; the -i notation signifies that the counts are taken omitting the value of z_i. [sent-42, score-0.443]
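To make the topic-updating step concrete, the following is a minimal sketch of one collapsed Gibbs sampling draw following Equation 1. It is an illustration rather than the authors' code; the count arrays n_wt and n_dt, and the assumption that the current word's old assignment has already been subtracted from them, are ours.

```python
import numpy as np

def sample_topic(w, d, n_wt, n_dt, alpha, beta, rng=np.random.default_rng()):
    """Draw a new topic for word w in document d according to Equation 1.

    n_wt[w, j]: count of word w assigned to topic j (current position excluded).
    n_dt[d, j]: count of words in document d assigned to topic j (current position excluded).
    """
    W, T = n_wt.shape                      # vocabulary size, number of topics
    # (n_{-i,j}^{(w_i)} + beta) / (sum_{w'} n_{-i,j}^{(w')} + W*beta)
    word_term = (n_wt[w, :] + beta) / (n_wt.sum(axis=0) + W * beta)
    # (n_{-i,j}^{(d_i)} + alpha) / (sum_{j'} n_{-i,j'}^{(d_i)} + T*alpha)
    doc_term = (n_dt[d, :] + alpha) / (n_dt[d, :].sum() + T * alpha)
    p = word_term * doc_term
    return rng.choice(T, p=p / p.sum())    # normalise and sample a topic index
```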
27 When a specific product and its reviews are provided, the explicit sentences and corresponding features are extracted (Line 1) by word segmentation, part-of-speech (POS) tagging and synonym-based feature clustering. [sent-47, score-0.445]
28 Then the prior knowledge is drawn from the explicit sentences automatically and integrated into the constrained topic model (Line 3 - Line 5). [sent-48, score-0.76]
29 Finally, several SVM classifiers are generated and applied to detect implicit features (Line 7 - Line 12). [sent-50, score-0.369]
30 In our work, we use a constrained topic model to select attributes for each product feature. [sent-53, score-0.558]
31 Then two types of prior knowledge, which are derived from the pre-defined product features, are extracted automatically and incorporated: must-link/cannot-link and correlation prior knowledge. [sent-55, score-0.286]
32 1 Must-link and Cannot-link Must-link: It specifies that two data instances must be in the same cluster. [sent-58, score-0.036]
33 In order to mine these words, we compute the co-occurrence degree frequency*PMI(f, w), i.e., the co-occurrence frequency of f and w multiplied by the pointwise mutual information PMI(f, w) = \log \frac{p(f \& w)}{p(f)\, p(w)}, where p(·) is the probability of the subscripted occurrence in explicit sentences, f is the feature, w is the word, and f&w means the co-occurrence of f and w. [sent-60, score-0.239]
34 A higher value of frequency*PMI signifies that w often indicates f. [sent-61, score-0.033]
35 For a feature fi, the top five words and fi constitute must-links. [sent-62, score-0.175]
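As a rough sketch of this mining step, the snippet below scores candidate words by the product of their co-occurrence probability with a feature and their PMI, then keeps the top five per feature. The data layout (a list of (feature, word-list) pairs) and the exact weighting are assumptions, since the formula is only reconstructed from its description here.

```python
import math
from collections import Counter

def mine_must_links(explicit_sentences, features, top_k=5):
    """For each feature f, return the top_k words w ranked by p(f&w) * PMI(f, w),
    estimated over the explicit sentences; each (f, w) pair forms a must-link."""
    n = len(explicit_sentences)
    f_count, w_count, fw_count = Counter(), Counter(), Counter()
    for f, words in explicit_sentences:            # one (feature, tokenised sentence) pair
        f_count[f] += 1
        for w in set(words):
            w_count[w] += 1
            fw_count[(f, w)] += 1

    links = {}
    for f in features:
        scores = {}
        for (ff, w), c in fw_count.items():
            if ff != f:
                continue
            p_fw, p_f, p_w = c / n, f_count[f] / n, w_count[w] / n
            scores[w] = p_fw * math.log(p_fw / (p_f * p_w))   # frequency * PMI
        links[f] = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return links
```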
36 Cannot-link: It specifies that two data instances cannot be in the same cluster. [sent-64, score-0.036]
37 If a word and a feature never co-occur in our corpus, we assume them to form a cannot-link. [sent-65, score-0.082]
38 For example, the word lowcost has never co-occurred with the product feature screen, so they constitute a cannot-link in our corpus. [sent-66, score-0.241]
39 In this paper, the definitions of must-link and cannot-link are derived from Andrzejewski and Zhu (2009)'s work; all must-links and cannot-links are incorporated into our constrained topic model. [sent-67, score-0.414]
40 We multiply Equation 1 by an indicator function δ(wi, zj), which represents a hard constraint, to obtain the final probability for topic updating (see Equation 4). [sent-68, score-0.259]
41 P(z_i = j \mid z_{-i}, \mathbf{w}, \alpha, \beta) \propto \delta(w_i, z_j) \, \frac{n^{(w_i)}_{-i,j} + \beta}{\sum_{w'} n^{(w')}_{-i,j} + W\beta} \cdot \frac{n^{(d_i)}_{-i,j} + \alpha}{\sum_{j'} n^{(d_i)}_{-i,j'} + T\alpha} \quad (4) As illustrated by Equations 1 and 4, δ(wi, zj), which represents intervention or help from pre-existing knowledge of must-links and cannot-links, plays a key role in this study. [sent-69, score-0.033]
42 In the topic updating for each word in each document, we assume that the current word is wi and its linked feature topic set is Z(wi); then for the current topic zj, δ(wi, zj) is calculated as follows: 1. [sent-70, score-1.069]
43 If wi is constrained by must-links and the linked feature belongs to Z(wi), δ(wi, zj | zj ∈ Z(wi)) = 1 and δ(wi, zj | zj ∉ Z(wi)) = 0. [sent-71, score-1.385]
44 If wi is constrained by cannot-links and the linked feature belongs to Z(wi), δ(wi, zj | zj ∈ Z(wi)) = 0 and δ(wi, zj | zj ∉ Z(wi)) = 1. [sent-73, score-1.385]
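These two rules can be read directly as an indicator function. The sketch below is a minimal illustration; the dictionaries mapping a word to its must-linked or cannot-linked feature topics are assumed to have been built from the mined links.

```python
def delta(word, topic, must_link_topics, cannot_link_topics):
    """Hard-constraint indicator delta(w_i, z_j) used in Equation 4.

    must_link_topics[w]:   topics the must-linked word w is restricted to (its Z(w)).
    cannot_link_topics[w]: topics the cannot-linked word w is excluded from.
    Unconstrained words are left untouched.
    """
    if word in must_link_topics:       # must-link: only the linked feature topics are allowed
        return 1 if topic in must_link_topics[word] else 0
    if word in cannot_link_topics:     # cannot-link: the linked feature topics are forbidden
        return 0 if topic in cannot_link_topics[word] else 1
    return 1
```

In the sampler, the unnormalised probability from Equation 1 for topic zj is simply multiplied by this value before normalisation, which zeroes out the forbidden topics.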
45 2 Correlation Prior Knowledge In view of the explicit product feature of each topic, the association between a word and that feature should be taken into account in the topic-word distribution. [sent-81, score-0.491]
46 Therefore, Equation 2 is revised as the following: \varphi^{(w_i)}_j = \frac{(1 + C_{w_i,j}) \, n^{(w_i)}_j + \beta}{\sum_{w'} (1 + C_{w',j}) \, n^{(w')}_j + W\beta} \quad (5), where C_{w',j} reflects the correlation of w' with the topic j, which is centered on the product feature f_{z_j}. [sent-82, score-0.723]
47 The basic idea is to determine the association of w′ and fzj: if they have high relevance, Cw′,j should be set to a positive number. [sent-83, score-0.305]
48 Otherwise, if we can determine that w′ and fzj are irrelevant, Cw′,j should be set to a negative number. [sent-84, score-0.305]
49 Dependency relation judgement: If w′, as a parent node in the syntax tree, mainly co-occurs with fzj, Cw′,j will be set positive. [sent-87, score-0.305]
50 If w′ mainly co-occurs with several features including fzj , Cw′,j will be set negative. [sent-88, score-0.305]
51 PMI judgement: If w′ mainly co-occurs with fzj and PMI(w′, fzj ) is greater than the given value, Cw′,j will be set positive. [sent-91, score-0.61]
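Since Equation 5 only rescales the topic-word counts before normalisation, it can be sketched in a few lines. This is an illustration; the correlation matrix C is assumed to have been filled in by the dependency-relation and PMI judgements above (positive, negative, or zero entries).

```python
import numpy as np

def phi_with_correlation(n_wt, C, beta):
    """Topic-word distribution of Equation 5.

    n_wt[w, j]: count of word w assigned to topic j.
    C[w, j]:    correlation prior C_{w,j} for word w and topic j.
    """
    W = n_wt.shape[0]
    weighted = (1.0 + C) * n_wt            # (1 + C_{w,j}) * n_j^{(w)}
    return (weighted + beta) / (weighted.sum(axis=0) + W * beta)
```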
52 4 Attribute Selection Some words, such as "good", can modify several product features and should be removed. [sent-94, score-0.088]
53 In the result of a single run, if a word appears in topics which relate to different features, it is defined as a conflicting word. [sent-95, score-0.079]
54 If a term is thought to describe several features or indicate no features, it is defined as a noise word. [sent-96, score-0.047]
55 When each topic has been pre-allocated, we run the explicit topic model 100 times. [sent-97, score-0.651]
56 If a word turns into a conflicting word Tcw times (Tcw is set to 20), we assume that it is a noise word. [sent-98, score-0.095]
57 Then the noise word collection is obtained and applied to filter the explicit sentences. [sent-99, score-0.286]
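A sketch of this noise-word filter is given below. The callable run_topic_model and its return format ({feature: set of topic words} for one run of the explicit topic model) are assumptions introduced for the illustration.

```python
from collections import Counter

def find_noise_words(run_topic_model, n_runs=100, t_cw=20):
    """A word that becomes a conflicting word (appears under topics of different
    features within the same run) at least t_cw times over n_runs runs is treated
    as a noise word."""
    conflict_counts = Counter()
    for _ in range(n_runs):
        assignment = run_topic_model()            # {feature: set_of_topic_words}
        first_feature = {}                        # word -> first feature it appeared under
        conflicting = set()
        for feature, words in assignment.items():
            for w in words:
                if w in first_feature and first_feature[w] != feature:
                    conflicting.add(w)            # same run, different features
                first_feature.setdefault(w, feature)
        conflict_counts.update(conflicting)
    return {w for w, c in conflict_counts.items() if c >= t_cw}
```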
58 The most important part of filtering noise words is the correlation computation. [sent-102, score-0.089]
59 So the experiment can work well with only estimated parameters. [sent-103, score-0.03]
60 Next, by integrating pre-existing knowledge, the explicit topic model, which runs Titer times, serves as attribute selection for SVM. [sent-104, score-0.668]
61 In every result for each topic cluster, we remove the four least probable word groups and merge the results by the pre-defined product feature. [sent-105, score-0.294]
62 For a feature, if a word appears in its topic words more than Titer * tratio times, it is selected as one of the training attributes for the feature. [sent-106, score-0.358]
63 In the end, if an attribute is associated with different features, it is deleted. [sent-107, score-0.148]
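The thresholding and merging step can likewise be sketched as follows; the per-run pruning of the four least probable word groups is assumed to have been done before the runs are passed in, and the data structures are illustrative.

```python
from collections import Counter

def select_attributes(runs, t_iter, t_ratio):
    """runs: list of {feature: set_of_topic_words}, one dict per run of the
    constrained topic model (t_iter runs in total).  A word is kept as a training
    attribute for a feature if it appears for that feature more than t_iter * t_ratio
    times; attributes attached to several features are then deleted."""
    counts = {}                                   # feature -> Counter over words
    for run in runs:
        for feature, words in run.items():
            counts.setdefault(feature, Counter()).update(words)

    selected = {f: {w for w, c in cnt.items() if c > t_iter * t_ratio}
                for f, cnt in counts.items()}

    shared = Counter(w for attrs in selected.values() for w in attrs)
    return {f: {w for w in attrs if shared[w] == 1}   # drop cross-feature attributes
            for f, attrs in selected.items()}
```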
64 [Figure 1(a): SVM based on traditional attribute selection methods, plotted against the attribute factor number.] [sent-108, score-0.271]
65 [Figure 1(b): our constrained topic model with different tratio (Titer = 20); Figure 1(c): our constrained topic model with different Titer.] [sent-113, score-0.9]
66 5 Implicit Feature Detection via SVM After completing attribute selection, a vector space model (VSM) is applied to the selected attributes on the explicit sentences. [sent-115, score-0.483]
67 For each feature fi, an SVM classifier Ci is adopted. [sent-116, score-0.082]
68 In the training set, the positive cases are the explicit sentences of fi, and the negative cases are the other explicit sentences. [sent-117, score-0.514]
69 For a non-explicit sentence, if the classification result of Ci is positive, it is an implicit sentence which implies fi. [sent-118, score-0.291]
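A minimal sketch of this classification stage is shown below, using scikit-learn's LinearSVC as a stand-in for whatever SVM implementation was actually used; the data structures, the vectoriser settings, and the one-classifier-per-feature loop are assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

def detect_implicit_features(explicit_sents, non_explicit_sents, attributes):
    """explicit_sents:     list of (feature, sentence_text) pairs used for training.
    non_explicit_sents: list of sentence_text to classify.
    attributes:         {feature: set of selected attribute words}.
    Returns {sentence: set of implied features}."""
    implied = {s: set() for s in non_explicit_sents}
    for feature, attrs in attributes.items():
        # Vector space model restricted to the attributes selected for this feature.
        vec = CountVectorizer(vocabulary=sorted(attrs))
        X_train = vec.fit_transform([s for _, s in explicit_sents])
        y_train = [1 if f == feature else 0 for f, _ in explicit_sents]
        clf = LinearSVC().fit(X_train, y_train)
        X_test = vec.transform(non_explicit_sents)
        for sent, label in zip(non_explicit_sents, clf.predict(X_test)):
            if label == 1:                        # positive => implicit sentence implying this feature
                implied[sent].add(feature)
    return implied
```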
70 1 Data Sets Since there is no standard data set yet, we crawled the experiment data, which includes reviews about a cellphone, from a famous Chinese shopping website1. [sent-120, score-0.03]
71 The feature of each sentence was manually annotated by two research assistants. [sent-122, score-0.082]
72 A handful of sentences which were annotated inconsistently were deleted. [sent-123, score-0.069]
73 Other features were ignored because of their rare appearance. [sent-125, score-0.033]
74 Here are some explanations: (1) The sentences containing several explicit features were not added to the training set. [sent-126, score-0.275]
75 (2) A tiny number of sentences contain both explicit and implicit features, and they can only be regarded as explicit sentences. [sent-127, score-0.805]
76 (3) The training set contains 3140 explicit sentences; the test set contains 7043 non-explicit sentences, and more than 5500 sentences have no feature. [sent-128, score-0.311]
77 (4) According to the ratio among the explicit sentences (6:1:2:3:1:2), it is reasonable that the most suitable number of topics should be 14. [sent-129, score-0.27]
78 Table 1: Experiment data
Feature      Explicit  Implicit  Total
screen       1165      244       1409
quality      199       83        282
battery      456       205       661
price        627       561       1188
appearance   224       167       391
software     469       129       598
The ratio of the product feature screen is 6, so we can assign that feature to topics 0,1,2,3,4,5. [sent-132, score-0.713]
79 (5) Although the size of the dataset is limited, our proposed method is based on the constraint-based topic model, which has been widely used in different NLP fields. [sent-134, score-0.206]
80 Of course, more high-quality data will be collected for the experiments in the future. [sent-136, score-0.03]
81 2 Experimental Results Figure 1a depicts the performance of SVM with traditional attribute selection methods. [sent-138, score-0.271]
82 In our constrained topic model, we use different Titer and tratio. [sent-141, score-0.374]
83 We conducted experiments by incorporating different types of prior knowledge. [sent-142, score-0.078]
84 From Figures 1b and 1c, we conclude that: (1) All these methods perform much better than the traditional feature selection methods; the improvements are more than 6%. [sent-143, score-0.205]
85 (2) The reason for the small improvement from must-links is that the topic clusters have already obtained these linked words. [sent-144, score-0.269]
86 (3) Incorporating all the pre-existing knowledge performs best and shows a 3% improvement over using no prior knowledge. [sent-145, score-0.111]
87 (4) Different types of prior knowledge have different impacts on the stability of different parameters. [sent-146, score-0.111]
88 (5) As we expected, by combining all prior knowledge, the best performance can reach 77. [sent-147, score-0.03]
89 Furthermore, as tratio or Titer changes, our constrained topic model incorporating all prior knowledge remains very stable. [sent-149, score-0.637]
90 5 Conclusions In this paper, we adopt a constrained topic model incorporating prior knowledge to select attributes for SVM classifiers to detect implicit features. [sent-150, score-1.002]
91 Experiments show this method outperforms the traditional attribute selection methods and detects implicit features better. [sent-151, score-0.637]
92 Using pointwise mutual information to identify implicit features in customer reviews. [sent-235, score-0.327]
wordName wordTfidf (topN-words)
[('zj', 0.393), ('fzj', 0.305), ('implicit', 0.291), ('wi', 0.253), ('explicit', 0.239), ('titer', 0.229), ('topic', 0.206), ('cw', 0.186), ('constrained', 0.168), ('tratio', 0.152), ('attribute', 0.148), ('price', 0.121), ('pmi', 0.115), ('svm', 0.108), ('lda', 0.103), ('tcw', 0.099), ('attributes', 0.096), ('product', 0.088), ('feature', 0.082), ('prior', 0.078), ('quelhas', 0.076), ('tdi', 0.076), ('twpcselmyrndiagit', 0.076), ('griffiths', 0.075), ('selection', 0.075), ('cheap', 0.073), ('appearance', 0.07), ('opinion', 0.069), ('tsinghua', 0.066), ('ww', 0.065), ('blei', 0.063), ('linked', 0.063), ('screen', 0.061), ('judgement', 0.06), ('andrzejewski', 0.056), ('ding', 0.056), ('fi', 0.055), ('zi', 0.055), ('updating', 0.053), ('cell', 0.051), ('china', 0.051), ('conflicting', 0.048), ('ko', 0.048), ('traditional', 0.048), ('noise', 0.047), ('ci', 0.046), ('line', 0.046), ('detection', 0.046), ('hai', 0.045), ('phone', 0.043), ('equations', 0.043), ('gibbs', 0.043), ('equation', 0.042), ('correlation', 0.042), ('beijing', 0.042), ('detect', 0.041), ('steyvers', 0.041), ('incorporated', 0.04), ('wj', 0.039), ('vision', 0.038), ('constitute', 0.038), ('implied', 0.037), ('classifiers', 0.037), ('liu', 0.037), ('specifies', 0.036), ('pointwise', 0.036), ('sentences', 0.036), ('dirichlet', 0.036), ('gmai', 0.036), ('established', 0.036), ('belongs', 0.033), ('ignored', 0.033), ('tuytelaars', 0.033), ('lowcost', 0.033), ('telecommunications', 0.033), ('inconsistently', 0.033), ('seett', 0.033), ('signifies', 0.033), ('cellphone', 0.033), ('inception', 0.033), ('inghua', 0.033), ('jw', 0.033), ('neglected', 0.033), ('nnj', 0.033), ('orient', 0.033), ('tsd', 0.033), ('xiaowen', 0.033), ('knowledge', 0.033), ('national', 0.031), ('notation', 0.031), ('topics', 0.031), ('descriptors', 0.03), ('iccv', 0.03), ('battery', 0.03), ('hua', 0.03), ('combing', 0.03), ('dso', 0.03), ('peo', 0.03), ('experiment', 0.03), ('ji', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999928 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM
Author: Wei Wang ; Hua Xu ; Xiaoqiu Huang
Abstract: Implicit feature detection, also known as implicit feature identification, is an essential aspect of feature-specific opinion mining but previous works have often ignored it. We think, based on the explicit sentences, several Support Vector Machine (SVM) classifiers can be established to do this task. Nevertheless, we believe it is possible to do better by using a constrained topic model instead of traditional attribute selection methods. Experiments show that this method outperforms the traditional attribute selection methods by a large margin and the detection task can be completed better.
2 0.1540902 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction
Author: Zhiyuan Chen ; Arjun Mukherjee ; Bing Liu ; Meichun Hsu ; Malu Castellanos ; Riddhiman Ghosh
Abstract: Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in applications. To tackle the issue, some knowledge-based topic models have been proposed, which allow the user to input some prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings, e.g., little work has been done to incorporate the cannot-link type of knowledge or to automatically adjust the number of topics based on domain knowledge. This paper proposes a more advanced topic model, called MC-LDA (LDA with m-set and c-set), to address these problems, which is based on an Extended generalized Pólya urn (E-GPU) model (which is also proposed in this paper). Experiments on real-life product reviews from a variety of domains show that MCLDA outperforms the existing state-of-the-art models markedly.
3 0.13977914 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs
Author: Xinjie Zhou ; Xiaojun Wan ; Jianguo Xiao
Abstract: Microblog messages pose severe challenges for current sentiment analysis techniques due to some inherent characteristics such as the length limit and informal writing style. In this paper, we study the problem of extracting opinion targets of Chinese microblog messages. Such fine-grained word-level task has not been well investigated in microblogs yet. We propose an unsupervised label propagation algorithm to address the problem. The opinion targets of all messages in a topic are collectively extracted based on the assumption that similar messages may focus on similar opinion targets. Topics in microblogs are identified by hashtags or using clustering algorithms. Experimental results on Chinese microblogs show the effectiveness of our framework and algorithms.
4 0.12251944 148 emnlp-2013-Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching
Author: John Philip McCrae ; Philipp Cimiano ; Roman Klinger
Abstract: Cross-lingual topic modelling has applications in machine translation, word sense disambiguation and terminology alignment. Multilingual extensions of approaches based on latent (LSI), generative (LDA, PLSI) as well as explicit (ESA) topic modelling can induce an interlingual topic space allowing documents in different languages to be mapped into the same space and thus to be compared across languages. In this paper, we present a novel approach that combines latent and explicit topic modelling approaches in the sense that it builds on a set of explicitly defined topics, but then computes latent relations between these. Thus, the method combines the benefits of both explicit and latent topic modelling approaches. We show that on a crosslingual mate retrieval task, our model significantly outperforms LDA, LSI, and ESA, as well as a baseline that translates every word in a document into the target language.
5 0.10438029 100 emnlp-2013-Improvements to the Bayesian Topic N-Gram Models
Author: Hiroshi Noji ; Daichi Mochihashi ; Yusuke Miyao
Abstract: One of the language phenomena that n-gram language model fails to capture is the topic information of a given situation. We advance the previous study of the Bayesian topic language model by Wallach (2006) in two directions: one, investigating new priors to alleviate the sparseness problem caused by dividing all ngrams into exclusive topics, and two, developing a novel Gibbs sampler that enables moving multiple n-grams across different documents to another topic. Our blocked sampler can efficiently search for higher probability space even with higher order n-grams. In terms of modeling assumption, we found it is effective to assign a topic to only some parts of a document.
6 0.093975969 58 emnlp-2013-Dependency Language Models for Sentence Completion
7 0.090125494 152 emnlp-2013-Predicting the Presence of Discourse Connectives
8 0.085171297 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge
9 0.078145966 11 emnlp-2013-A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities
10 0.078057632 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?
11 0.071985051 121 emnlp-2013-Learning Topics and Positions from Debatepedia
12 0.071696848 36 emnlp-2013-Automatically Determining a Proper Length for Multi-Document Summarization: A Bayesian Nonparametric Approach
13 0.071428634 25 emnlp-2013-Appropriately Incorporating Statistical Significance in PMI
14 0.067277238 46 emnlp-2013-Classifying Message Board Posts with an Extracted Lexicon of Patient Attributes
15 0.065369084 29 emnlp-2013-Automatic Domain Partitioning for Multi-Domain Learning
16 0.063621625 124 emnlp-2013-Leveraging Lexical Cohesion and Disruption for Topic Segmentation
17 0.059446316 94 emnlp-2013-Identifying Manipulated Offerings on Review Portals
18 0.058571048 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation
19 0.057370357 133 emnlp-2013-Modeling Scientific Impact with Topical Influence Regression
20 0.057285972 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
topicId topicWeight
[(0, -0.188), (1, 0.062), (2, -0.107), (3, -0.007), (4, 0.02), (5, -0.013), (6, 0.079), (7, 0.045), (8, -0.03), (9, -0.016), (10, -0.078), (11, -0.207), (12, -0.12), (13, 0.093), (14, -0.024), (15, 0.115), (16, 0.093), (17, 0.014), (18, -0.041), (19, -0.058), (20, -0.014), (21, 0.058), (22, -0.133), (23, 0.089), (24, -0.095), (25, -0.074), (26, -0.044), (27, 0.104), (28, -0.014), (29, 0.035), (30, -0.01), (31, -0.117), (32, 0.085), (33, -0.02), (34, -0.032), (35, 0.001), (36, -0.003), (37, -0.237), (38, 0.183), (39, 0.109), (40, 0.027), (41, 0.04), (42, 0.111), (43, 0.098), (44, 0.031), (45, -0.085), (46, 0.083), (47, -0.017), (48, -0.01), (49, 0.111)]
simIndex simValue paperId paperTitle
same-paper 1 0.96500999 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM
Author: Wei Wang ; Hua Xu ; Xiaoqiu Huang
Abstract: Implicit feature detection, also known as implicit feature identification, is an essential aspect of feature-specific opinion mining but previous works have often ignored it. We think, based on the explicit sentences, several Support Vector Machine (SVM) classifiers can be established to do this task. Nevertheless, we believe it is possible to do better by using a constrained topic model instead of traditional attribute selection methods. Experiments show that this method outperforms the traditional attribute selection methods by a large margin and the detection task can be completed better.
2 0.74510682 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction
Author: Zhiyuan Chen ; Arjun Mukherjee ; Bing Liu ; Meichun Hsu ; Malu Castellanos ; Riddhiman Ghosh
Abstract: Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in applications. To tackle the issue, some knowledge-based topic models have been proposed, which allow the user to input some prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings, e.g., little work has been done to incorporate the cannot-link type of knowledge or to automatically adjust the number of topics based on domain knowledge. This paper proposes a more advanced topic model, called MC-LDA (LDA with m-set and c-set), to address these problems, which is based on an Extended generalized Pólya urn (E-GPU) model (which is also proposed in this paper). Experiments on real-life product reviews from a variety of domains show that MCLDA outperforms the existing state-of-the-art models markedly.
3 0.59676629 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs
Author: Xinjie Zhou ; Xiaojun Wan ; Jianguo Xiao
Abstract: Microblog messages pose severe challenges for current sentiment analysis techniques due to some inherent characteristics such as the length limit and informal writing style. In this paper, we study the problem of extracting opinion targets of Chinese microblog messages. Such fine-grained word-level task has not been well investigated in microblogs yet. We propose an unsupervised label propagation algorithm to address the problem. The opinion targets of all messages in a topic are collectively extracted based on the assumption that similar messages may focus on similar opinion targets. Topics in microblogs are identified by hashtags or using clustering algorithms. Experimental results on Chinese microblogs show the effectiveness of our framework and algorithms.
4 0.59414971 100 emnlp-2013-Improvements to the Bayesian Topic N-Gram Models
Author: Hiroshi Noji ; Daichi Mochihashi ; Yusuke Miyao
Abstract: One of the language phenomena that n-gram language model fails to capture is the topic information of a given situation. We advance the previous study of the Bayesian topic language model by Wallach (2006) in two directions: one, investigating new priors to alleviate the sparseness problem caused by dividing all ngrams into exclusive topics, and two, developing a novel Gibbs sampler that enables moving multiple n-grams across different documents to another topic. Our blocked sampler can efficiently search for higher probability space even with higher order n-grams. In terms of modeling assumption, we found it is effective to assign a topic to only some parts of a document.
5 0.50922233 121 emnlp-2013-Learning Topics and Positions from Debatepedia
Author: Swapna Gottipati ; Minghui Qiu ; Yanchuan Sim ; Jing Jiang ; Noah A. Smith
Abstract: We explore Debatepedia, a communityauthored encyclopedia of sociopolitical debates, as evidence for inferring a lowdimensional, human-interpretable representation in the domain of issues and positions. We introduce a generative model positing latent topics and cross-cutting positions that gives special treatment to person mentions and opinion words. We evaluate the resulting representation’s usefulness in attaching opinionated documents to arguments and its consistency with human judgments about positions.
6 0.49337915 148 emnlp-2013-Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching
7 0.4435491 199 emnlp-2013-Using Topic Modeling to Improve Prediction of Neuroticism and Depression in College Students
8 0.4273639 58 emnlp-2013-Dependency Language Models for Sentence Completion
9 0.42406625 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge
10 0.42066625 63 emnlp-2013-Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic
11 0.41046831 11 emnlp-2013-A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities
12 0.37626415 25 emnlp-2013-Appropriately Incorporating Statistical Significance in PMI
13 0.37479344 49 emnlp-2013-Combining Generative and Discriminative Model Scores for Distant Supervision
14 0.36292198 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing
15 0.35310879 6 emnlp-2013-A Generative Joint, Additive, Sequential Model of Topics and Speech Acts in Patient-Doctor Communication
16 0.34631541 191 emnlp-2013-Understanding and Quantifying Creativity in Lexical Composition
17 0.34601432 152 emnlp-2013-Predicting the Presence of Discourse Connectives
18 0.33446038 133 emnlp-2013-Modeling Scientific Impact with Topical Influence Regression
19 0.32029173 124 emnlp-2013-Leveraging Lexical Cohesion and Disruption for Topic Segmentation
20 0.3154065 138 emnlp-2013-Naive Bayes Word Sense Induction
topicId topicWeight
[(3, 0.021), (9, 0.031), (18, 0.017), (22, 0.077), (30, 0.08), (50, 0.013), (51, 0.186), (62, 0.277), (66, 0.076), (71, 0.043), (75, 0.049), (77, 0.017), (96, 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.79696655 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM
Author: Wei Wang ; Hua Xu ; Xiaoqiu Huang
Abstract: Implicit feature detection, also known as implicit feature identification, is an essential aspect of feature-specific opinion mining but previous works have often ignored it. We think, based on the explicit sentences, several Support Vector Machine (SVM) classifiers can be established to do this task. Nevertheless, we believe it is possible to do better by using a constrained topic model instead of traditional attribute selection methods. Experiments show that this method outperforms the traditional attribute selection methods by a large margin and the detection task can be completed better.
2 0.64129621 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification
Author: Jun-Ping Ng ; Min-Yen Kan ; Ziheng Lin ; Wei Feng ; Bin Chen ; Jian Su ; Chew Lim Tan
Abstract: In this paper we classify the temporal relations between pairs of events on an article-wide basis. This is in contrast to much of the existing literature which focuses on just event pairs which are found within the same or adjacent sentences. To achieve this, we leverage on discourse analysis as we believe that it provides more useful semantic information than typical lexico-syntactic features. We propose the use of several discourse analysis frameworks, including 1) Rhetorical Structure Theory (RST), 2) PDTB-styled discourse relations, and 3) topical text segmentation. We explain how features derived from these frameworks can be effectively used with support vector machines (SVM) paired with convolution kernels. Experiments show that our proposal is effective in improving on the state-of-the-art significantly by as much as 16% in terms of F1, even if we only adopt less-than-perfect automatic discourse analyzers and parsers. Making use of more accurate discourse analysis can further boost gains to 35%.
3 0.63887519 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction
Author: Zhiyuan Chen ; Arjun Mukherjee ; Bing Liu ; Meichun Hsu ; Malu Castellanos ; Riddhiman Ghosh
Abstract: Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in applications. To tackle the issue, some knowledge-based topic models have been proposed, which allow the user to input some prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings, e.g., little work has been done to incorporate the cannot-link type of knowledge or to automatically adjust the number of topics based on domain knowledge. This paper proposes a more advanced topic model, called MC-LDA (LDA with m-set and c-set), to address these problems, which is based on an Extended generalized Pólya urn (E-GPU) model (which is also proposed in this paper). Experiments on real-life product reviews from a variety of domains show that MCLDA outperforms the existing state-of-the-art models markedly.
4 0.63138133 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
Author: Zhongqing Wang ; Shoushan LI ; Fang Kong ; Guodong Zhou
Abstract: Personal profile information on social media like LinkedIn.com and Facebook.com is at the core of many interesting applications, such as talent recommendation and contextual advertising. However, personal profiles usually lack organization confronted with the large amount of available information. Therefore, it is always a challenge for people to find desired information from them. In this paper, we address the task of personal profile summarization by leveraging both personal profile textual information and social networks. Here, using social networks is motivated by the intuition that, people with similar academic, business or social connections (e.g. co-major, co-university, and cocorporation) tend to have similar experience and summaries. To achieve the learning process, we propose a collective factor graph (CoFG) model to incorporate all these resources of knowledge to summarize personal profiles with local textual attribute functions and social connection factors. Extensive evaluation on a large-scale dataset from LinkedIn.com demonstrates the effectiveness of the proposed approach. 1
5 0.6303786 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs
Author: Xinjie Zhou ; Xiaojun Wan ; Jianguo Xiao
Abstract: Microblog messages pose severe challenges for current sentiment analysis techniques due to some inherent characteristics such as the length limit and informal writing style. In this paper, we study the problem of extracting opinion targets of Chinese microblog messages. Such fine-grained word-level task has not been well investigated in microblogs yet. We propose an unsupervised label propagation algorithm to address the problem. The opinion targets of all messages in a topic are collectively extracted based on the assumption that similar messages may focus on similar opinion targets. Topics in microblogs are identified by hashtags or using clustering algorithms. Experimental results on Chinese microblogs show the effectiveness of our framework and algorithms.
6 0.6300348 143 emnlp-2013-Open Domain Targeted Sentiment
7 0.62920547 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction
8 0.62629151 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
9 0.62466502 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
10 0.62223941 152 emnlp-2013-Predicting the Presence of Discourse Connectives
11 0.62131673 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
12 0.6195792 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge
13 0.61934322 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation
14 0.61926889 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
15 0.61733532 80 emnlp-2013-Exploiting Zero Pronouns to Improve Chinese Coreference Resolution
16 0.61702365 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing
17 0.61699522 21 emnlp-2013-An Empirical Study Of Semi-Supervised Chinese Word Segmentation Using Co-Training
18 0.61688471 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types
19 0.61667967 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
20 0.61614782 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery