emnlp emnlp2012 emnlp2012-101 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kang Liu ; Liheng Xu ; Jun Zhao
Abstract: This paper proposes a novel approach to extract opinion targets based on a word-based translation model (WTM). First, we apply WTM in a monolingual scenario to mine the associations between opinion targets and opinion words. Then, a graph-based algorithm is exploited to extract opinion targets, where candidate opinion relevance, estimated from the mined associations, is combined with candidate importance to generate a global measure. By using WTM, our method can capture opinion relations more precisely, especially long-span relations. In particular, compared with previous syntax-based methods, our method can effectively avoid noise from parsing errors when dealing with informal texts in large Web corpora. By using a graph-based algorithm, opinion targets are extracted in a global process, which can effectively alleviate the problem of error propagation in traditional bootstrap-based methods such as Double Propagation. Experimental results on three real-world datasets of different sizes and languages show that our approach is more effective and robust than state-of-the-art methods.
Reference: text
sentIndex sentText sentNum sentScore
1 This paper proposes a novel approach to extract opinion targets based on a word-based translation model (WTM). [sent-4, score-1.186]
2 At first, we apply WTM in a monolingual scenario to mine the associations between opinion targets and opinion words. [sent-5, score-2.091]
3 Then, a graph-based algorithm is exploited to extract opinion targets, where candidate opinion relevance, estimated from the mined associations, is combined with candidate importance to generate a global measure. [sent-6, score-2.039]
4 By using WTM, our method can capture opinion relations more precisely, especially for long-span relations. [sent-7, score-0.879]
5 By using graph-based algorithm, opinion targets are extracted in a global process, which can effectively alleviate the problem of error propagation in traditional bootstrap-based methods, such as Double Propagation. [sent-9, score-1.187]
6 In opinion mining, one fundamental problem is opinion target extraction. [sent-14, score-1.728]
7 For example, in the sentence of “The phone has a colorful and even amazing screen”, “screen” is an opinion target. [sent-17, score-0.982]
8 In online product reviews, opinion targets often are products or product features, so this task is also named as product feature extraction in previous work (Hu et al. [sent-18, score-1.24]
9 To extract opinion targets, many studies regarded opinion words as strong indicators (Hu et al. [sent-25, score-1.73]
10 , 2010), which is based on the observation that opinion words are usually located around opinion targets, and there are associations between them. [sent-30, score-1.792]
11 Therefore, most previous methods iteratively extracted opinion targets depending upon the associations between opinion words and opinion targets (Qiu et al. [sent-31, score-3.202]
12 If “colorful” and “amazing” had been known to be opinion words, “screen” is likely to be an opinion target in this domain. [sent-35, score-1.728]
13 In addition, the extracted opinion targets can be used to expand more opinion words according to their associations. [sent-36, score-1.974]
14 Therefore, mining associations between opinion targets and opinion words is a key for opinion target extraction (Wu et al. [sent-38, score-3.016]
15 , 2008), named as adjacent methods, employed the adjacent rule, where an opinion target was regarded to have opinion relations with the surrounding opinion words in a given window. [sent-43, score-2.708]
16 However, because of the limitation of window size, opinion relations cannot be captured precisely, especially for long-span relations, which would hurt estimating associations between opinion targets and opinion words. [sent-44, score-2.967]
17 If the syntactic relation between an opinion word and an opinion target satisfied a designed pattern, then there was an opinion relation between them. [sent-51, score-2.598]
18 To overcome the weakness of the two kinds of methods mentioned above, we propose a novel unsupervised approach to extract opinion targets by using word-based translation model (WTM). [sent-60, score-1.167]
19 We formulate identifying opinion relations between opinion targets and opinion words as a word alignment task. [sent-61, score-2.88]
20 We argue that an opinion target can find its corresponding modifier through monolingual word alignment. [sent-62, score-0.911]
21 For example in Figure 1, the opinion words “colorful” and “amazing” are aligned with the target “screen” through word alignment. [sent-63, score-0.907]
22 To this end, we use WTM to perform monolingual word alignment for mining associations between opinion targets and opinion words. [sent-64, score-2.169]
23 Compared with adjacent methods, WTM doesn’t identify opinion relations between words in a given window, so long-span relations can be effectively captured (Liu et al. [sent-67, score-0.986]
24 In addition, by using WTM, our method can capture the “one-to-many” or “many-to-one” relations (“one-to-many” means that, in a sentence one opinion word modifies several opinion targets, and “many-to-one” means several opinion words modify one opinion target). [sent-71, score-3.418]
25 Thus, it’s reasonable to expect that WTM is likely to yield better performance than traditional methods for mining associations between opinion targets and opinion words. [sent-72, score-2.101]
26 Based on the mined associations, we extract opinion targets in a ranking framework. [sent-73, score-1.153]
27 All nouns/noun phrases are regarded as opinion target candidates. [sent-74, score-0.933]
28 Then a graph-based algorithm is exploited to assign confidences to each candidate, in which candidate opinion relevance and importance are incorporated to generate a global measure. [sent-75, score-1.091]
29 At last, the candidates with higher ranks are extracted as opinion targets. [sent-76, score-0.879]
30 , 2011), we don’t extract opinion targets iteratively based on the bootstrapping strategy, such as Double Propagation (Qiu et al. [sent-80, score-1.152]
31 1) We formulate the opinion relation identification between opinion targets and opinion words as a word alignment task. [sent-84, score-2.856]
32 2) We propose a graph-based algorithm for opinion target extraction in which candidate opinion relevance and importance are incorporated into a unified graph to estimate candidate confidence. [sent-87, score-2.048]
33 Then the candidates with higher confidence scores are extracted as opinion targets (in Section 3. [sent-88, score-1.238]
34 2 Related Work Many studies have focused on the task of opinion target extraction, such as (Hu et al. [sent-99, score-0.888]
35 In supervised approaches, the opinion target extraction task was usually regarded as a sequence labeling task (Jin et al. [sent-110, score-0.944]
36 (2010) proposed a Skip-Tree CRF model for opinion target extraction. [sent-119, score-0.888]
37 (2009) utilized a SVM classifier to identify relations between opinion targets and opinion expressions by leveraging phrase dependency parsing. [sent-122, score-2.032]
38 In unsupervised methods, most approaches regarded opinion words as the important indicators for opinion targets (Hu et al. [sent-124, score-1.985]
39 The basic idea was that reviewers often use the same opinion words when they comment on similar opinion targets. [sent-130, score-1.68]
40 The extraction procedure was often a bootstrapping process which extracted opinion words and opinion targets iteratively, depending upon their associations. [sent-131, score-2.016]
41 (2005) used syntactic patterns to extract opinion target candidates. [sent-133, score-0.934]
42 The adjective nearest to the frequent explicit feature was extracted as an opinion word. [sent-137, score-0.906]
43 Then the extracted opinion words were used to extract infrequent opinion targets. [sent-138, score-1.719]
44 (2009, 2011) proposed a Double Propagation method to expand a domain sentiment lexicon and an opinion target set iteratively. [sent-142, score-0.908]
45 They exploited direct dependency relations between words to extract opinion targets and opinion words iteratively. [sent-143, score-2.045]
46 , 1999) algorithm to compute the feature relevance scores, which were simply multiplied by the log of feature frequencies to rank the extracted opinion targets. [sent-151, score-0.919]
47 2) Candidate confidence estimation: Based on these associations, we exploit a graph-based algorithm to compute the confidence of each opinion target candidate. [sent-155, score-1.054]
48 Then the candidates with higher confidence scores are extracted as opinion targets. [sent-156, score-0.962]
49 2 Mining associations between opinion targets and opinion words using Word-based Translation Model This component identifies potential opinion relations in sentences and estimates associations between opinion targets and opinion words. [sent-158, score-5.032]
50 We assume opinion targets and opinion words respectively to be nouns/noun phrases and adjectives, which have been widely adopted in previous work (Hu et al. [sent-159, score-1.986]
51 Thus, our aim is to find potential opinion relations between nouns/noun phrases and adjectives in sentences, and calculate the associations between them. [sent-164, score-1.035]
52 As mentioned in the first section, we formulate opinion relation identification as a word alignment task. [sent-165, score-0.9]
53 Moreover, these models may capture “one-to-many” or “many-to-one” opinion relations (mentioned in the first section). [sent-194, score-0.879]
54 We can see that our method using WTM can successfully capture associations between opinion targets and opinion words. [sent-206, score-2.068]
55 Table 1: Examples of associations between opinion targets and opinion words. [sent-210, score-2.068]
56 3 Candidate Confidence Estimation In this component, we compute the confidence of each opinion target candidate and rank them. [sent-212, score-1.059]
57 The candidates with higher confidence are regarded as the opinion targets. [sent-213, score-0.973]
58 Opinion Relevance reflects the degree to which a candidate is associated with opinion words. [sent-215, score-0.928]
59 If an adjective has higher confidence to be an opinion word, the noun/noun phrase it modifies will have higher confidence to be an opinion target. [sent-216, score-1.933]
60 Similarly, if a noun/noun phrase has higher confidence to be an opinion target, the adjective which modifies it is highly likely to be an opinion word. [sent-217, score-1.85]
61 We assign an importance score to an opinion target candidate f according to its tf-idf score, which is further normalized by the sum of tf-idf scores of all candidates. [sent-220, score-1.017]
62 An edge between a noun/noun phrase and an adjective represents that there is an opinion relation between them. [sent-224, score-0.923]
63 Figure 2: Bipartite graph for modeling relations between opinion targets and opinion words, with opinion target candidates (nouns/noun phrases) on one side and opinion word candidates (adjectives) on the other. 1 http://books. [sent-237, score-1.995]
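64 $C^{t+1} = M M^{T} C^{t}$ (6), where $C^{t+1}$ is the candidate confidence vector at time $t+1$, $C^{t}$ is the candidate confidence vector at time $t$, and $M$ is an opinion relevance matrix of size $m \times n$, where $M_{i,j}$ is the associated weight between a noun/noun phrase $i$ and an adjective $j$. [sent-241, score-1.223]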
65 To consider the candidate importance scores, we introduce a reallocate condition: combining the candidate opinion relevance with the candidate importance at each step. [sent-242, score-1.247]
66 When $\lambda = 1$, the candidate confidence is completely determined by the candidate importance; and when $\lambda = 0$, the candidate confidence is determined by the candidate opinion relevance. [sent-245, score-1.358]
67 Using this equation, we estimate confidences for opinion target candidates. [sent-254, score-0.905]
68 The candidates with higher confidence scores than the threshold will be extracted as the opinion targets. [sent-255, score-0.962]
69 Then the opinion targets in Large were manually annotated as the gold standard for evaluations. [sent-275, score-1.116]
70 Then two annotators were required to judge whether every noun/noun phrase is opinion target or not. [sent-278, score-0.908]
71 In total, we respectively obtain 1,112, 1,241 and 1,850 opinion targets in Hotel, MP3 and Restaurant. [sent-282, score-1.116]
72 , 2004), which extracted opinion targets by using adjacent rule. [sent-347, score-1.17]
73 , 2011), which used Double Propagation algorithm to extract opinion targets depending on syntactic relations between words. [sent-349, score-1.176]
74 They extracted opinion targets candidates using syntactic patterns and other specific patterns. [sent-352, score-1.18]
75 Then HITS (Kleinberg 1999) algorithm combined with candidate frequency is employed to rank the results for opinion target extraction. [sent-353, score-0.976]
76 Hu is selected to represent adjacent methods for opinion target extraction. [sent-354, score-0.924]
77 Ours denotes full model of our method, in which we use IBM-3 model for identifying opinion relations between words. [sent-362, score-0.894]
78 This indicates that our method based on word-based translation model is effective for opinion target extraction. [sent-369, score-0.918]
79 The reason is that graph-based methods extract opinion targets in a global framework and they can effectively avoid the error propagation made by traditional methods based on Double Propagation. [sent-373, score-1.19]
80 We believe the reason is that Ours considers the opinion relevance and the candidate importance in a unified graph-based framework. [sent-375, score-1.03]
81 By contrast, Zhang simply combines opinion relevance with frequency to determine the candidate confidence. [sent-376, score-0.989]
82 This indicates that our method is more effective for opinion target extraction than state-of-the-art methods, especially for large corpora. [sent-382, score-0.915]
83 On the other hand, Ours uses WTM rather than parsing to identify opinion relations between words, so the noise introduced by inaccurate parsing can be avoided. [sent-384, score-0.966]
84 An Example In Table 6, we show top 10 opinion targets extracted by Hu, DP, Zhang and Ours in MP3 of Large. [sent-390, score-1.134]
85 From these examples, we can see Ours extracts more correct opinion targets than others. [sent-393, score-1.116]
86 Moreover, Ours considers candidate importance besides opinion relevance, so some specific opinion targets are ranked near the top, such as “voice recorder”, “fm radio” and “lcd screen”. [sent-396, score-2.085]
87 3 Effect of Word-based Translation Model In this subsection, we aim to prove the effectiveness of our WTM for estimating associations between opinion targets and opinion words. [sent-398, score-2.082]
88 4) is used to estimate associations between opinion targets and opinion words. [sent-406, score-2.068]
89 It indicates that WTM is effective for identifying opinion relations, which makes the estimation of the associations more precise. [sent-412, score-0.952]
90 4 Effect of Our Graph-based Method In this subsection, we aim to prove the effectiveness of our graph-based method for opinion target extraction. [sent-414, score-0.902]
91 Both WTM_DP and WTM_HITS use WTM to mine associations between opinion targets and opinion words. [sent-416, score-2.068]
92 2009) to extract opinion targets, which only consider the candidate opinion relevance. [sent-419, score-1.789]
93 (2010) to extract opinion targets, which consider both candidate opinion relevance and frequency. [sent-421, score-1.85]
94 It indicates that candidate importance and candidate opinion relevance are both important for candidate confidence estimation. [sent-442, score-1.289]
95 The performance of opinion target extraction benefits from their combination. [sent-443, score-0.915]
96 Experimental results when varying 5 Conclusions and Future Work This paper proposes a novel graph-based approach to extract opinion targets using WTM. [sent-447, score-1.137]
97 Compared with previous adjacent methods and syntax-based methods, by using WTM, our method can capture opinion relations more precisely and therefore be more effective for opinion target extraction, especially for large informal Web corpora. [sent-448, score-1.85]
98 Meanwhile, we will add some syntactic information into WTM to constrain the word alignment process, in order to identify opinion relations between words more precisely. [sent-451, score-0.941]
99 Moreover, we believe that some verbs or nouns can be opinion words and may be helpful for opinion target extraction. [sent-452, score-1.728]
100 And we think that it’s useful to add some prior knowledge of opinion words (sentiment lexicon) in our model for estimating candidate opinion relevance. [sent-453, score-1.768]
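The confidence estimation described in sentences 56 to 68 (opinion relevance propagated over the bipartite graph, then blended with a tf-idf based importance score) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the trade-off parameter name `lam`, the per-step normalization, the stopping rule, and the toy weights are all assumptions made for illustration.

```python
def normalize(scores):
    """Scale non-negative scores so they sum to 1 (left unchanged if all are zero)."""
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else list(scores)


def rank_candidates(M, importance, lam=0.3, iters=100, tol=1e-9):
    """Estimate a confidence score for each opinion target candidate.

    M[i][j]    : association weight between candidate i (noun/noun phrase)
                 and adjective j, e.g. mined from word alignments.
    importance : prior importance per candidate (e.g. normalized tf-idf).
    lam        : hypothetical name for the trade-off weight; 1.0 uses only
                 importance, 0.0 uses only opinion relevance.
    """
    m = len(M)
    if m == 0:
        return []
    n = len(M[0])
    conf = list(importance)
    for _ in range(iters):
        # Candidates -> adjectives: word confidence = M^T * conf
        word = [sum(M[i][j] * conf[i] for i in range(m)) for j in range(n)]
        # Adjectives -> candidates: relevance = M * word confidence
        rel = [sum(M[i][j] * word[j] for j in range(n)) for i in range(m)]
        # Assumption: renormalize each step so scores stay bounded.
        rel = normalize(rel)
        # Blend propagated relevance with the fixed importance prior.
        new = [(1.0 - lam) * r + lam * s for r, s in zip(rel, importance)]
        delta = max(abs(a - b) for a, b in zip(new, conf))
        conf = new
        if delta < tol:
            break
    return conf


if __name__ == "__main__":
    # Toy data (invented): three candidates, two adjectives.
    candidates = ["screen", "phone", "day"]
    M = [[0.6, 0.5], [0.3, 0.1], [0.0, 0.0]]
    importance = normalize([2.0, 1.0, 1.0])
    scores = rank_candidates(M, importance, lam=0.3)
    for name, score in sorted(zip(candidates, scores), key=lambda p: -p[1]):
        print(f"{name}\t{score:.3f}")
```

In this sketch, candidates whose score exceeds a chosen threshold would be kept as opinion targets; the threshold value itself is left to the user, since the extracted text does not state one.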
wordName wordTfidf (topN-words)
[('opinion', 0.84), ('targets', 0.276), ('wtm', 0.208), ('qiu', 0.126), ('associations', 0.112), ('reviews', 0.107), ('hu', 0.103), ('candidate', 0.088), ('confidence', 0.083), ('amazing', 0.066), ('pibm', 0.066), ('double', 0.064), ('screen', 0.063), ('relevance', 0.061), ('zhang', 0.055), ('bing', 0.054), ('dp', 0.049), ('adjective', 0.048), ('target', 0.048), ('waj', 0.047), ('alignment', 0.045), ('wtms', 0.044), ('ding', 0.042), ('liu', 0.042), ('importance', 0.041), ('wu', 0.04), ('relations', 0.039), ('propagation', 0.038), ('phone', 0.038), ('colorful', 0.038), ('customer', 0.036), ('chinese', 0.036), ('adjacent', 0.036), ('popescu', 0.033), ('aj', 0.033), ('mining', 0.033), ('informal', 0.033), ('translation', 0.03), ('noises', 0.03), ('hotel', 0.03), ('regarded', 0.029), ('exploited', 0.029), ('adjectives', 0.028), ('extraction', 0.027), ('wang', 0.027), ('product', 0.026), ('review', 0.026), ('patterns', 0.025), ('wn', 0.025), ('wj', 0.024), ('monolingual', 0.023), ('wa', 0.023), ('popsecu', 0.022), ('ct', 0.022), ('extract', 0.021), ('candidates', 0.021), ('customers', 0.021), ('bipartite', 0.021), ('sentiment', 0.02), ('limitation', 0.02), ('parsing', 0.02), ('phrase', 0.02), ('opinions', 0.02), ('products', 0.019), ('modifies', 0.019), ('chun', 0.019), ('guang', 0.019), ('fertility', 0.019), ('yuanbin', 0.019), ('wordbased', 0.019), ('aligned', 0.019), ('datasets', 0.018), ('jin', 0.018), ('li', 0.018), ('extracted', 0.018), ('subsection', 0.018), ('identify', 0.017), ('jiajun', 0.017), ('xiaowen', 0.017), ('confidences', 0.017), ('restaurant', 0.016), ('meanwhile', 0.016), ('vertices', 0.016), ('baselines', 0.016), ('qi', 0.016), ('mined', 0.016), ('phrases', 0.016), ('effectively', 0.015), ('relation', 0.015), ('moreover', 0.015), ('incorporated', 0.015), ('syntaxbased', 0.015), ('denotes', 0.015), ('bootstrapping', 0.015), ('align', 0.014), ('adopted', 0.014), ('precisely', 0.014), ('wi', 0.014), ('prove', 0.014), ('hits', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999946 101 emnlp-2012-Opinion Target Extraction Using Word-Based Translation Model
2 0.42927575 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields
Author: Bishan Yang ; Claire Cardie
Abstract: Extracting opinion expressions from text is usually formulated as a token-level sequence labeling task tackled using Conditional Random Fields (CRFs). CRFs, however, do not readily model potentially useful segment-level information like syntactic constituent structure. Thus, we propose a semi-CRF-based approach to the task that can perform sequence labeling at the segment level. We extend the original semi-CRF model (Sarawagi and Cohen, 2004) to allow the modeling of arbitrarily long expressions while accounting for their likely syntactic structure when modeling segment boundaries. We evaluate performance on two opinion extraction tasks, and, in contrast to previous sequence labeling approaches to the task, explore the usefulness of segment-level syntactic parse features. Experimental results demonstrate that our approach outperforms state-of-the-art methods for both opinion expression tasks.
3 0.31066528 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews
Author: Jianxing Yu ; Zheng-Jun Zha ; Tat-Seng Chua
Abstract: This paper proposes to generate appropriate answers for opinion questions about products by exploiting the hierarchical organization of consumer reviews. The hierarchy organizes product aspects as nodes following their parent-child relations. For each aspect, the reviews and corresponding opinions on this aspect are stored. We develop a new framework for opinion question answering, which enables accurate question analysis and effective answer generation by making use of the hierarchy. In particular, we first identify the (explicit/implicit) product aspects asked in the questions and their sub-aspects by referring to the hierarchy. We then retrieve the corresponding review fragments relevant to the aspects from the hierarchy. In order to generate appropriate answers from the review fragments, we develop a multi-criteria optimization approach for answer generation by simultaneously taking into account review salience, coherence, diversity, and parent-child relations among the aspects. We conduct evaluations on 11 popular products in four domains. The evaluated corpus contains 70,359 consumer reviews and 220 questions on these products. Experimental results demonstrate the effectiveness of our approach.
4 0.13205442 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts
Author: Lizhen Qu ; Rainer Gemulla ; Gerhard Weikum
Abstract: We propose the weakly supervised Multi-Experts Model (MEM) for analyzing the semantic orientation of opinions expressed in natural language reviews. In contrast to most prior work, MEM predicts both opinion polarity and opinion strength at the level of individual sentences; such fine-grained analysis helps to understand better why users like or dislike the entity under review. A key challenge in this setting is that it is hard to obtain sentence-level training data for both polarity and strength. For this reason, MEM is weakly supervised: It starts with potentially noisy indicators obtained from coarse-grained training data (i.e., document-level ratings), a small set of diverse base predictors, and, if available, small amounts of fine-grained training data. We integrate these noisy indicators into a unified probabilistic framework using ideas from ensemble learning and graph-based semi-supervised learning. Our experiments indicate that MEM outperforms state-of-the-art methods by a significant margin.
5 0.10694923 28 emnlp-2012-Collocation Polarity Disambiguation Using Web-based Pseudo Contexts
Author: Yanyan Zhao ; Bing Qin ; Ting Liu
Abstract: This paper focuses on the task of collocation polarity disambiguation. The collocation refers to a binary tuple of a polarity word and a target (such as ⟨long, battery life⟩ or ⟨long, startup⟩), in which the sentiment orientation of the polarity word (“long”) changes along with different targets (“battery life” or “startup”). To disambiguate a collocation’s polarity, previous work always turned to investigate the polarities of its surrounding contexts, and then assigned the majority polarity to the collocation. However, these contexts are limited, thus the resulting polarity is insufficient to be reliable. We therefore propose an unsupervised three-component framework to expand some pseudo contexts from the web, to help disambiguate a collocation’s polarity. Without using any additional labeled data, experiments show that our method is effective.
6 0.096132211 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes
7 0.06650617 139 emnlp-2012-Word Salad: Relating Food Prices and Descriptions
8 0.062854685 34 emnlp-2012-Do Neighbours Help? An Exploration of Graph-based Algorithms for Cross-domain Sentiment Classification
9 0.049757876 112 emnlp-2012-Resolving Complex Cases of Definite Pronouns: The Winograd Schema Challenge
10 0.041389722 15 emnlp-2012-Active Learning for Imbalanced Sentiment Classification
11 0.040879786 120 emnlp-2012-Streaming Analysis of Discourse Participants
12 0.040204883 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT
13 0.038409796 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
14 0.038161315 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering
15 0.037322041 32 emnlp-2012-Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants
16 0.036577236 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation
17 0.034587175 106 emnlp-2012-Part-of-Speech Tagging for Chinese-English Mixed Texts with Dynamic Features
18 0.032554097 25 emnlp-2012-Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation
19 0.032391232 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns
20 0.029255068 128 emnlp-2012-Translation Model Based Cross-Lingual Language Model Adaptation: from Word Models to Phrase Models
topicId topicWeight
[(0, 0.157), (1, 0.077), (2, 0.005), (3, 0.33), (4, 0.26), (5, -0.196), (6, -0.258), (7, -0.067), (8, -0.201), (9, -0.043), (10, 0.005), (11, 0.054), (12, 0.113), (13, -0.043), (14, -0.117), (15, 0.093), (16, 0.009), (17, -0.462), (18, -0.011), (19, 0.041), (20, 0.037), (21, -0.046), (22, -0.146), (23, -0.096), (24, -0.089), (25, 0.011), (26, -0.147), (27, -0.018), (28, -0.081), (29, 0.036), (30, 0.006), (31, -0.036), (32, 0.002), (33, 0.022), (34, -0.002), (35, -0.009), (36, 0.004), (37, -0.02), (38, -0.006), (39, 0.013), (40, -0.008), (41, -0.015), (42, -0.035), (43, 0.006), (44, 0.065), (45, -0.014), (46, -0.049), (47, -0.005), (48, 0.0), (49, 0.003)]
simIndex simValue paperId paperTitle
same-paper 1 0.98714781 101 emnlp-2012-Opinion Target Extraction Using Word-Based Translation Model
2 0.91249681 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields
Author: Bishan Yang ; Claire Cardie
Abstract: Extracting opinion expressions from text is usually formulated as a token-level sequence labeling task tackled using Conditional Random Fields (CRFs). CRFs, however, do not readily model potentially useful segment-level information like syntactic constituent structure. Thus, we propose a semi-CRF-based approach to the task that can perform sequence labeling at the segment level. We extend the original semi-CRF model (Sarawagi and Cohen, 2004) to allow the modeling of arbitrarily long expressions while accounting for their likely syntactic structure when modeling segment boundaries. We evaluate performance on two opinion extraction tasks, and, in contrast to previous sequence labeling approaches to the task, explore the usefulness of segment-level syntactic parse features. Experimental results demonstrate that our approach outperforms state-of-the-art methods for both opinion expression tasks.
3 0.56010556 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews
Author: Jianxing Yu ; Zheng-Jun Zha ; Tat-Seng Chua
Abstract: This paper proposes to generate appropriate answers for opinion questions about products by exploiting the hierarchical organization of consumer reviews. The hierarchy organizes product aspects as nodes following their parent-child relations. For each aspect, the reviews and corresponding opinions on this aspect are stored. We develop a new framework for opinion question answering, which enables accurate question analysis and effective answer generation by making use of the hierarchy. In particular, we first identify the (explicit/implicit) product aspects asked in the questions and their sub-aspects by referring to the hierarchy. We then retrieve the corresponding review fragments relevant to the aspects from the hierarchy. In order to generate appropriate answers from the review fragments, we develop a multi-criteria optimization approach for answer generation by simultaneously taking into account review salience, coherence, diversity, and parent-child relations among the aspects. We conduct evaluations on 11 popular products in four domains. The evaluated corpus contains 70,359 consumer reviews and 220 questions on these products. Experimental results demonstrate the effectiveness of our approach.
4 0.26892853 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts
Author: Lizhen Qu ; Rainer Gemulla ; Gerhard Weikum
Abstract: We propose the weakly supervised Multi-Experts Model (MEM) for analyzing the semantic orientation of opinions expressed in natural language reviews. In contrast to most prior work, MEM predicts both opinion polarity and opinion strength at the level of individual sentences; such fine-grained analysis helps to understand better why users like or dislike the entity under review. A key challenge in this setting is that it is hard to obtain sentence-level training data for both polarity and strength. For this reason, MEM is weakly supervised: It starts with potentially noisy indicators obtained from coarse-grained training data (i.e., document-level ratings), a small set of diverse base predictors, and, if available, small amounts of fine-grained training data. We integrate these noisy indicators into a unified probabilistic framework using ideas from ensemble learning and graph-based semi-supervised learning. Our experiments indicate that MEM outperforms state-of-the-art methods by a significant margin.
5 0.22865163 139 emnlp-2012-Word Salad: Relating Food Prices and Descriptions
Author: Victor Chahuneau ; Kevin Gimpel ; Bryan R. Routledge ; Lily Scherlis ; Noah A. Smith
Abstract: We investigate the use of language in food writing, specifically on restaurant menus and in customer reviews. Our approach is to build predictive models of concrete external variables, such as restaurant menu prices. We make use of a dataset of menus and customer reviews for thousands of restaurants in several U.S. cities. By focusing on prediction tasks and doing our analysis at scale, our methodology allows quantitative, objective measurements of the words and phrases used to describe food in restaurants. We also explore interactions in language use between menu prices and sentiment as expressed in user reviews.
6 0.22417463 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes
7 0.19902356 28 emnlp-2012-Collocation Polarity Disambiguation Using Web-based Pseudo Contexts
8 0.13395399 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT
9 0.13254517 128 emnlp-2012-Translation Model Based Cross-Lingual Language Model Adaptation: from Word Models to Phrase Models
10 0.12772073 32 emnlp-2012-Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants
11 0.12619114 25 emnlp-2012-Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation
12 0.12413663 120 emnlp-2012-Streaming Analysis of Discourse Participants
13 0.12181208 112 emnlp-2012-Resolving Complex Cases of Definite Pronouns: The Winograd Schema Challenge
14 0.11812683 100 emnlp-2012-Open Language Learning for Information Extraction
15 0.11457136 26 emnlp-2012-Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings
16 0.11308087 45 emnlp-2012-Exploiting Chunk-level Features to Improve Phrase Chunking
17 0.10991342 54 emnlp-2012-Forced Derivation Tree based Model Training to Statistical Machine Translation
18 0.10551125 34 emnlp-2012-Do Neighbours Help? An Exploration of Graph-based Algorithms for Cross-domain Sentiment Classification
19 0.10491489 86 emnlp-2012-Locally Training the Log-Linear Model for SMT
20 0.10452515 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering
topicId topicWeight
[(2, 0.022), (16, 0.049), (25, 0.011), (34, 0.057), (60, 0.085), (63, 0.05), (64, 0.034), (65, 0.04), (70, 0.028), (73, 0.012), (74, 0.035), (76, 0.043), (79, 0.015), (80, 0.017), (86, 0.042), (90, 0.285), (95, 0.069)]
simIndex simValue paperId paperTitle
same-paper 1 0.76550376 101 emnlp-2012-Opinion Target Extraction Using Word-Based Translation Model
Author: Kang Liu ; Liheng Xu ; Jun Zhao
Abstract: This paper proposes a novel approach to extract opinion targets based on a word-based translation model (WTM). First, we apply WTM in a monolingual scenario to mine the associations between opinion targets and opinion words. Then, a graph-based algorithm is exploited to extract opinion targets, where candidate opinion relevance, estimated from the mined associations, is combined with candidate importance to generate a global measure. By using WTM, our method can capture opinion relations more precisely, especially long-span relations. In particular, compared with previous syntax-based methods, our method can effectively avoid noise from parsing errors when dealing with informal texts in large Web corpora. By using a graph-based algorithm, opinion targets are extracted in a global process, which can effectively alleviate the problem of error propagation in traditional bootstrap-based methods, such as Double Propagation. The experimental results on three real-world datasets of different sizes and languages show that our approach is more effective and robust than state-of-the-art methods.
2 0.69892621 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts
Author: Lizhen Qu ; Rainer Gemulla ; Gerhard Weikum
Abstract: We propose the weakly supervised Multi-Experts Model (MEM) for analyzing the semantic orientation of opinions expressed in natural language reviews. In contrast to most prior work, MEM predicts both opinion polarity and opinion strength at the level of individual sentences; such fine-grained analysis helps to understand better why users like or dislike the entity under review. A key challenge in this setting is that it is hard to obtain sentence-level training data for both polarity and strength. For this reason, MEM is weakly supervised: It starts with potentially noisy indicators obtained from coarse-grained training data (i.e., document-level ratings), a small set of diverse base predictors, and, if available, small amounts of fine-grained training data. We integrate these noisy indicators into a unified probabilistic framework using ideas from ensemble learning and graph-based semi-supervised learning. Our experiments indicate that MEM outperforms state-of-the-art methods by a significant margin.
3 0.45753339 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents
Author: Heeyoung Lee ; Marta Recasens ; Angel Chang ; Mihai Surdeanu ; Dan Jurafsky
Abstract: We introduce a novel coreference resolution system that models entities and events jointly. Our iterative method cautiously constructs clusters of entity and event mentions using linear regression to model cluster merge operations. As clusters are built, information flows between entity and event clusters through features that model semantic role dependencies. Our system handles nominal and verbal events as well as entities, and our joint formulation allows information from event coreference to help entity coreference, and vice versa. In a cross-document domain with comparable documents, joint coreference resolution performs significantly better (over 3 CoNLL F1 points) than two strong baselines that resolve entities and events separately.
4 0.45628268 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
Author: Jayant Krishnamurthy ; Tom Mitchell
Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms of weak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependency-parsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-the-art accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.
5 0.45463958 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM
Author: Huizhong Duan ; Yanen Li ; ChengXiang Zhai ; Dan Roth
Abstract: Discriminative training in query spelling correction is difficult due to the complex internal structures of the data. Recent work on query spelling correction suggests a two-stage approach: a noisy channel model that is used to retrieve a number of candidate corrections, followed by a discriminatively trained ranker applied to these candidates. The ranker, however, suffers from the low recall of the first, suboptimal search stage. This paper proposes to directly optimize the search stage with a discriminative model based on latent structural SVM. In this model, we treat query spelling correction as a multiclass classification problem with structured input and output. The latent structural information is used to model the alignment of words in the spelling correction process. Experimental results show that as a standalone speller, our model outperforms all the baseline systems. It also attains a higher recall compared with the noisy channel model, and can therefore serve as a better filtering stage when combined with a ranker.
6 0.44690406 83 emnlp-2012-Lexical Differences in Autobiographical Narratives from Schizophrenic Patients and Healthy Controls
7 0.44648004 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games
8 0.44626597 78 emnlp-2012-Learning Lexicon Models from Search Logs for Query Expansion
9 0.44540632 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews
10 0.44408327 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules
11 0.44340587 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP
12 0.44299978 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation
13 0.44231611 49 emnlp-2012-Exploring Topic Coherence over Many Models and Many Topics
14 0.44148406 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
15 0.44123369 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
16 0.44111183 92 emnlp-2012-Multi-Domain Learning: When Do Domains Matter?
17 0.44109687 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
18 0.4389191 26 emnlp-2012-Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings
19 0.4388338 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes
20 0.4384383 114 emnlp-2012-Revisiting the Predictability of Language: Response Completion in Social Media