emnlp emnlp2012 emnlp2012-28 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yanyan Zhao ; Bing Qin ; Ting Liu
Abstract: This paper focuses on the task of collocation polarity disambiguation. The collocation refers to a binary tuple of a polarity word and a target (such as ⟨long, battery life⟩ or ⟨long, startup⟩), in which the sentiment orientation of the polarity word (“long”) changes along with different targets (“battery life” or “startup”). To disambiguate a collocation’s polarity, previous work always turned to investigate the polarities of its surrounding contexts, and then assigned the majority polarity to the collocation. However, these contexts are limited, thus the resulting polarity is insufficient to be reliable. We therefore propose an unsupervised three-component framework to expand some pseudo contexts from web, to help disambiguate a collocation’s polarity. Without using any additional labeled data, experiments show that our method is effective.
Reference: text
sentIndex sentText sentNum sentScore
1 Collocation Polarity Disambiguation Using Web-based Pseudo Contexts. Yanyan Zhao, Bing Qin and Ting Liu, Harbin Institute of Technology, Harbin, China. Abstract: This paper focuses on the task of collocation polarity disambiguation. [sent-1, score-0.915]
2 To disambiguate a collocation’s polarity, previous work always turned to investigate the polarities of its surrounding contexts, and then assigned the majority polarity to the collocation. [sent-3, score-0.738]
3 However, these contexts are limited, thus the resulting polarity is insufficient to be reliable. [sent-4, score-0.763]
4 We therefore propose an unsupervised three-component framework to expand some pseudo contexts from web, to help disambiguate a collocation’s polarity. [sent-5, score-0.746]
5 The phrases marked with the superscript p are the polarity-ambiguous words, and the phrases marked with the superscript t are the targets modified by the polarity words. [sent-35, score-0.719]
6 In the above two sentences, the sentiment orientation of the polarity word “长” (“long” in English) changes along with different targets. [sent-36, score-0.745]
7 When modifying the target “电池寿命” (“battery life” in English), its polarity is positive; and when modifying “启动时间” (“startup” in English), its polarity is negative. (Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 160–170, Jeju Island, Korea, 12–14 July 2012.) [sent-37, score-1.156]
8 We analyze 4,861 common binary tuples of polarity words and their modified targets from 478 reviews, and find that over 20% of them are the collocations defined in this paper. [sent-43, score-0.715]
9 Therefore, the task of collocation polarity disambiguation is worthy of study. [sent-44, score-0.563]
10 For a sentence s containing such a collocation c, since the in-sentence features are always ambiguous, it is difficult to disambiguate the polarity of c by using them. [sent-45, score-0.969]
11 Thus some previous work turned to investigate its surrounding contexts’ polarities (such as the sentences before or after s), and then assigned the majority polarity to the collocation c (Hatzivassiloglou and McKeown, 1997; Hu and Liu, 2004; Kanayama and Nasukawa, 2006). [sent-46, score-1.058]
12 However, since the amount of contexts from the original review is very limited, the final resulting polarity for the collocation c is insufficient to be reliable. [sent-47, score-1.194]
13 Thus for a collocation, we can collect large amounts of contexts from other reviews to improve its polarity disambiguation. [sent-49, score-0.823]
14 These expanded contexts are called pseudo contexts in this paper. [sent-50, score-1.011]
15 , 2008) expanded some pseudo contexts from a topically-related review set. [sent-53, score-0.819]
16 In order to overcome this problem, we propose an unsupervised three-component framework to expand more pseudo contexts from web for the collocation polarity disambiguation. [sent-55, score-1.639]
17 Without using any labeled data, experiments on a Chinese data set from four product domains show that the three-component framework is feasible and the web-based pseudo contexts are useful for the collocation polarity disambiguation. [sent-56, score-1.579]
18 2 Related Work The key of the collocation polarity disambiguation task is to recognize the polarity word’s sentiment orientation of a collocation. [sent-66, score-1.682]
19 Overall, most of the above approaches aim to generate a large static polarity word lexicon marked with prior polarities. [sent-74, score-0.579]
20 , the collocations mentioned in this paper, rather than only the polarity words. [sent-79, score-0.661]
21 Some researchers exploited the features of the sentences containing collocations to help disambiguate the polarity of the polarity-ambiguous word. [sent-81, score-0.595]
22 In addition, many related works tried to learn word polarity in a specific domain, but ignored the problem that even the same word in the same domain may indicate different polarities (Jijkoun et al. [sent-94, score-0.659]
23 1 Overview The motivation of our approach is to make full use of web sources to collect more useful pseudo contexts for a collocation, whose original contexts are limited or unreliable. [sent-102, score-0.926]
24 Then we can extract the pseudo contexts from these snippets. [sent-109, score-0.645]
25 Sentiment Analysis: For both original contexts and the expanded pseudo contexts from web, a simple lexicon-based sentiment computing method is used to recognize each context’s polarity. [sent-111, score-1.177]
26 Combination: Two strategies are designed to integrate the polarities of the original and pseudo contexts, under the assumption that these two kinds of contexts can be complementary to each other. [sent-113, score-0.875]
27 1 Why Expanding Queries For a collocation, such as ⟨长, 电池寿命⟩ (⟨long, battery life⟩ in English), the most intuitive query used for searching is constructed by the form of “target + polarity word”, i. [sent-119, score-0.834]
28 Even if we search this query alone, a great many web snippets covering the polarity word and target will be retrieved. [sent-122, score-0.836]
29 In fact, for a collocation, though the amount of the retrieved snippets is large, lots of them cannot provide accurate pseudo contexts. [sent-124, score-0.608]
30 The reason is that the polarity words in some snippets do not really modify the targets, such as in the sentence “The battery life is short, and finds few buyers for a long time.” [sent-125, score-0.92]
31 In addition, as the new query is short, in many retrieved snippets, there also exist no modifying relations between the polarity words and targets. [sent-129, score-0.776]
32 As a result, if we just use this query strategy, the expanded pseudo contexts are limited and cannot yield ideal performance. [sent-130, score-0.948]
33 Therefore, we need to design some effective query expansion strategies to ensure that (1) the polarity words do modify the targets in the retrieved web snippets, and (2) enough snippets are retrieved. [sent-131, score-1.215]
34 2 Query Expansion Strategy We first investigate the modifying relations between polarity words and the targets, and then construct effective queries. [sent-134, score-0.597]
35 Take the collocation ⟨长, 电池寿命⟩ (⟨long, battery life⟩ in English) as an example; the strategies are described as follows. [sent-141, score-0.573]
36 Strategy1: target + modifier + polarity word: Such as the query “电池寿命很长” or “电池寿命非常长” (“the battery life is very long” in English). [sent-142, score-0.963]
37 Strategy2: modifier + polarity word + 的 + target: Such as the query “很长的电池寿命” or “非常长的电池寿命” (“very long battery life” in English). [sent-146, score-0.898]
38 This strategy also uses modifiers to modify polarity words, and the generated queries can satisfy the “attribute-head” relation. [sent-147, score-0.631]
39 Strategy3: negation word + polarity word + 的 + target: Such as the query “不长的电池寿命” or “没有长的电池寿命” (“not long battery life” in English). [sent-148, score-0.912]
40 This strategy uses negation words to modify the polarity words. [sent-149, score-0.617]
41 The only difference is that the polarity of this kind of queries is opposite to that of the collocation. [sent-151, score-0.595]
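The query patterns listed above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `expand_queries`, the constants, and the reading of Strategy0 as a plain "target + polarity word" concatenation are assumptions made here, and the modifier and negation lists contain only the words that appear in the examples above.

```python
# Hypothetical word lists, taken only from the example queries in the text.
MODIFIERS = ("很", "非常")   # "very"
NEGATIONS = ("不", "没有")   # "not"

# Queries built with a negation word carry the opposite polarity
# to the collocation itself (see the note on Strategy3).
FLIPPED_STRATEGIES = {"Strategy3"}


def expand_queries(polarity_word, target):
    """Build the search queries for one collocation <polarity_word, target>."""
    return {
        # Assumed form of the baseline query: target + polarity word.
        "Strategy0": [target + polarity_word],
        # target + modifier + polarity word
        "Strategy1": [target + m + polarity_word for m in MODIFIERS],
        # modifier + polarity word + 的 + target
        "Strategy2": [m + polarity_word + "的" + target for m in MODIFIERS],
        # negation word + polarity word + 的 + target
        "Strategy3": [n + polarity_word + "的" + target for n in NEGATIONS],
    }


if __name__ == "__main__":
    for name, queries in expand_queries("长", "电池寿命").items():
        print(name, queries, "(flipped)" if name in FLIPPED_STRATEGIES else "")
```

Keeping the flipped strategies in a separate set lets a later step invert the polarity of contexts retrieved through negated queries without changing how the queries are built.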
42 Further, the pseudo contexts generated by these non-reviews are useless or even harmful. [sent-158, score-0.645]
43 From this function, we can collect the contexts of c by summing up all the pseudo contexts from every query_ij. [sent-164, score-0.867]
44 In detail, the pseudo context acquisition algorithm for a collocation c is illustrated in Figure 2. [sent-166, score-0.842]
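Figure 2 itself is not reproduced in this summary, so the following is only a rough sketch of how pseudo contexts might be gathered once snippets have been retrieved. It assumes, by analogy with the original contexts (the sentences before or after the sentence containing the collocation), that a pseudo context is a neighbouring sentence inside a snippet. The function names, the punctuation-based sentence splitter, and the `snippets_by_query` input are all hypothetical; no real search API is called.

```python
import re

# Naive sentence splitter: a run of non-terminators plus an optional terminator.
_SENTENCE = re.compile(r"[^。！？!?.]+[。！？!?.]?")


def split_sentences(text):
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def pseudo_contexts(snippet, target, polarity_word):
    """Return the sentences adjacent to any sentence that mentions both
    the target and the polarity word (assumed definition of a context)."""
    sentences = split_sentences(snippet)
    contexts = []
    for i, sentence in enumerate(sentences):
        if target in sentence and polarity_word in sentence:
            if i > 0:
                contexts.append(sentences[i - 1])
            if i + 1 < len(sentences):
                contexts.append(sentences[i + 1])
    return contexts


def collect_contexts(snippets_by_query, target, polarity_word):
    """Pool the pseudo contexts obtained from every query into one list."""
    conx = []
    for snippets in snippets_by_query.values():
        for snippet in snippets:
            conx.extend(pseudo_contexts(snippet, target, polarity_word))
    return conx


if __name__ == "__main__":
    retrieved = {"电池寿命很长": ["我买了这个相机。电池寿命很长。非常满意。"]}
    print(collect_contexts(retrieved, "电池寿命", "长"))
```

A substring check is a weak stand-in for verifying that the polarity word really modifies the target, which is the failure mode the expanded queries are meant to reduce.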
45 Analyzing either the pseudo contexts or the original contexts, we can find that not all of them are useful contexts. [sent-170, score-0.672]
46 3 Sentiment Analysis For both the original and expanded pseudo contexts, we employ the lexicon-based sentiment computing method (Hu and Liu, 2004) to compute the polarity value for each context. [sent-173, score-1.274]
47 [Figure 2: Pseudo Context Expansion Algorithm. Input: a collocation c and the URL list. Output: the pseudo context set Conx(c).] The polarity value Polarity(con) for a context con [sent-175, score-1.475]
48 is computed by summing up the polarity values of all words in con, making use of both the word polarity defined in the positive and negative lexicons and the contextual shifters defined in the negation lexicon. [sent-181, score-1.162]
49 In this algorithm, n is the parameter controlling the window size within which the negation words have influence on the polarity words, and here n is set to 3. [sent-183, score-0.581]
50 Normally, if the polarity value Polarity(con) is more than 0, the context con is labeled as positive; if less than 0, the context is negative. [sent-184, score-0.678]
51 $Polarity(con) = \sum_{w \in W(con)} Polarity(w)$ Figure 3: The algorithm for context polarity computation. [sent-193, score-0.565]
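A minimal sketch of this lexicon-based computation is given below. It is not the authors' implementation: the tiny lexicons are placeholders, each lexicon word is assumed to contribute +1 or -1, and a negation word is assumed to flip a polarity word only when it occurs within the n tokens before it (the text states only that n is a window size, set to 3). Returning `None` for a zero score is also an assumption.

```python
def context_polarity(tokens, positive, negative, negations, n=3):
    """Sum word polarities over a tokenized context.

    A polarity word's sign is flipped when a negation word appears in
    the n tokens preceding it (assumed direction of the window).
    """
    total = 0
    for i, word in enumerate(tokens):
        if word in positive:
            value = 1
        elif word in negative:
            value = -1
        else:
            continue
        window = tokens[max(0, i - n):i]
        if any(t in negations for t in window):
            value = -value
        total += value
    return total


def polarity_label(score):
    """Map a polarity value to a tag; a zero score is left unrecognized."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


if __name__ == "__main__":
    pos, neg, shifters = {"good", "great"}, {"bad"}, {"not"}
    score = context_polarity(["screen", "is", "not", "very", "good"], pos, neg, shifters)
    print(score, polarity_label(score))
```

For Chinese text the tokens would come from a word segmenter, which is outside the scope of this sketch.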
52 4 Combination After the pseudo context acquisition and polarity computation, two kinds of effective contexts: original contexts and pseudo contexts, and their corresponding polarities can be obtained. [sent-195, score-1.838]
53 In order to yield a relatively accurate polarity Polarity(c) for a collocation c, we exploit the following combination methods: 1. [sent-196, score-0.915]
54 Suppose c has n effective contexts (including original and pseudo contexts); it then obtains n polarity tags from the individual sentiment analysis algorithm. [sent-198, score-1.371]
55 The polarity tag receiving more votes is chosen as the final polarity of c. [sent-199, score-1.082]
56 Complementation: For a collocation c, we first employ “Majority Voting” method just on the expanded pseudo contexts to obtain the polarity tag. [sent-201, score-1.704]
57 If the polarity of c cannot be recognized, the majority polarity tag voted on the original contexts is chosen as the final polarity tag. [sent-202, score-1.897]
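The two combination strategies can be sketched as follows. The function names are illustrative, and treating a tie (or an empty context set) as "cannot be recognized" is an assumption, since the footnote that defines that case is not part of this summary.

```python
from collections import Counter


def majority_vote(tags):
    """Return the tag with more votes, or None on a tie or empty input."""
    counts = Counter(t for t in tags if t in ("positive", "negative"))
    if counts["positive"] > counts["negative"]:
        return "positive"
    if counts["negative"] > counts["positive"]:
        return "negative"
    return None


def combine_majority_voting(original_tags, pseudo_tags):
    """Majority Voting: pool both kinds of contexts and vote once."""
    return majority_vote(list(original_tags) + list(pseudo_tags))


def combine_complementation(original_tags, pseudo_tags):
    """Complementation: vote on the pseudo contexts first and fall back
    to the original contexts only if that vote is not decisive."""
    tag = majority_vote(pseudo_tags)
    return tag if tag is not None else majority_vote(original_tags)


if __name__ == "__main__":
    original = ["negative"]
    pseudo = ["positive", "negative"]
    print(combine_majority_voting(original, pseudo))
    print(combine_complementation(original, pseudo))
```

In the usage example both strategies return "negative": the pooled vote is two to one, and the tied pseudo-context vote hands the decision to the single original context.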
58 Therefore, for a collocation, if we only consider its original contexts alone or the expanded pseudo contexts from the domain-related review set alone, the contexts are obviously limited and unreliable. [sent-214, score-1.325]
59 Else, this method cannot disambiguate the polarity of c. [sent-231, score-0.595]
60 , 2008), we solve this task with the help of the pseudo contexts in the domain-related review dataset. [sent-234, score-0.675]
61 This method expands the pseudo contexts from the web. [sent-239, score-0.645]
62 The majority polarity is chosen as the final polarity. [sent-240, score-0.566]
63 The majority polarity of all the pseudo contexts is chosen as the final polarity. [sent-244, score-1.211]
64 Exp^{mv/c}_{web+exp+com}: This is the method proposed in this paper, which combines the original and expanded pseudo contexts. [sent-245, score-0.594]
65 Breaking down the boundary of the current review, Expdataset explores the pseudo contexts from other domain-related reviews. [sent-252, score-0.68]
66 Further, Expweb+sig, Expweb+exp and Exp^{mv/c}_{web+exp+com} expand the pseudo contexts from the web, which can be considered as a large corpus and can provide more evidence for the collocation polarity disambiguation. [sent-253, score-1.613]
67 Table 4: Comparative results for the collocation polarity disambiguation task. [sent-257, score-0.937]
68 Table 4 illustrates the comparative results of all systems for collocation polarity disambiguation. [sent-258, score-0.915]
69 In comparison, Expdataset adds a post-processing step of expanding pseudo contexts from the topically-related review dataset, which achieves a better result with an absolute improvement of 5. [sent-262, score-0.675]
70 However, Expdataset is effective only for a collocation c that appears many times in the domain-related reviews. [sent-265, score-0.969]
71 Thus, for such a collocation c, the pseudo contexts expanded from other reviews that contain the same c are still far from enough, since the review set size in this system is not very large. [sent-267, score-1.253]
72 In order to avoid the context limitation problem, we expand more pseudo contexts from web for each collocation. [sent-268, score-0.729]
73 It can demonstrate that our web mining based pseudo context expansion is useful for disambiguating the collocation’s polarity, since this system can explore more contexts. [sent-271, score-0.716]
74 This system can generate some harmful contexts for the reason of the wrong modifying relations between polarity words and targets in the retrieved snippets. [sent-273, score-0.893]
75 Thus this paper adds three query expansion strategies to generate more and accurate pseudo contexts. [sent-274, score-0.849]
76 Finally, Table 4 gives the results of our method in this paper, Exp^{mv}_{web+exp+com} and Exp^{c}_{web+exp+com}, which combine the original and expanded pseudo contexts to yield a final polarity. [sent-279, score-0.816]
77 We can observe that both of these systems outperform the system NoExp of just using the original contexts and the system Expweb+exp of just using the expanded pseudo contexts. [sent-280, score-0.816]
78 We can further find that, although the amount of original contexts is small, it also plays an important role in disambiguating the polarities of the collocations that cannot be recognized by the expanded pseudo contexts. [sent-284, score-0.969]
79 2 The Contributions of the Query Expansion Strategies The expanded pseudo contexts from our method can be partly credited to the query expansion strategies. [sent-286, score-1.15]
80 Table 5: The performance of our method based on each query expansion strategy for collocation polarity disambiguation. [sent-291, score-1.312]
81 Table 5 provides the performance of our method based on each query expansion strategy for collocation polarity disambiguation. [sent-292, score-1.312]
82 For each strategy, “Avg” in Table 5 denotes the average number of the expanded pseudo contexts for each collocation. [sent-293, score-0.789]
83 This can further demonstrate our idea that more and effective pseudo contexts can improve the performance of the collocation polarity disambiguation task. [sent-296, score-1.601]
84 3 Deep Experiments in the Three-Component Framework In order to do a detailed analysis into our three-component framework, some deep experiments are made: Query Expansion The aim of query expansion is to retrieve lots of relevant snippets, from which we can extract the useful pseudo contexts. [sent-301, score-0.849]
85 Table 6: The accuracies of the query expansion, pseudo context and sentiment analysis for each strategy. [sent-306, score-0.745]
86 For each snippet, if the polarity word of the collocation does modify the target, we consider this snippet as a correct query expansion result. [sent-307, score-1.307]
87 Pseudo Context For each expanded pseudo context from web, if it shows the same sentiment orientation with the collocation (or opposite with the collocation’s polarity because of the usage of transitional words), we consider this context as a correct pseudo context. [sent-308, score-2.163]
88 Sentiment Analysis For each expanded pseudo context, if its polarity can be correctly recognized by the polarity computation method in Figure 3, and meanwhile it shows the same sentiment orientation with the collocation, we consider this context as a correct one. [sent-309, score-1.877]
89 Table 6 illustrates the accuracy of each experiment for each strategy in detail, where 400 web retrieved snippets for Query Expansion and 400 expanded pseudo contexts for Pseudo Context and Sentiment Analysis are randomly selected and manually evaluated for each strategy. [sent-310, score-1.0]
90 The queries from Strategy0 are short, thus in many retrieved snippets, there exist no modifying relations between the polarity words and targets. [sent-316, score-0.671]
91 Accordingly, the pseudo contexts from these snippets are incorrect. [sent-317, score-0.749]
92 For example, we get all the pseudo contexts using the algorithm in Figure 2. [sent-323, score-0.645]
93 On the other hand, the context polarity computation algorithm in Figure 3 is just a simple attempt, which is not the best way to compute the context’s polarity. [sent-326, score-0.565]
94 6 Conclusion and Future Work This paper proposes a web-based context expansion framework for collocation polarity disambiguation. [sent-329, score-1.16]
95 The basic assumption of this framework is that, if a collocation appears in different forms, both within the same review and within topically-related reviews, then the large amounts of pseudo contexts from these reviews can help to disambiguate such a collocation’s polarity. [sent-330, score-1.182]
96 A framework including three independent components is proposed for collocation polarity disambiguation. [sent-335, score-0.934]
97 Web-based pseudo contexts are effective for disambiguating a collocation’s polarity. [sent-338, score-0.699]
98 The initial contexts from current reviews and the expanded contexts from web are complementary to each other. [sent-342, score-0.68]
99 The immediate extension of our work is to polish each component of this framework, such as improving the accuracy of query expansion and pseudo context acquisition, using other effective polarity computing methods for each context and so on. [sent-343, score-1.256]
100 Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences. [sent-529, score-0.667]
wordName wordTfidf (topN-words)
[('polarity', 0.541), ('pseudo', 0.423), ('collocation', 0.374), ('contexts', 0.222), ('expansion', 0.202), ('query', 0.159), ('expanded', 0.144), ('sentiment', 0.139), ('battery', 0.134), ('collocations', 0.12), ('polarities', 0.118), ('expweb', 0.105), ('snippets', 0.104), ('life', 0.103), ('con', 0.089), ('conx', 0.082), ('wiebe', 0.074), ('strategies', 0.065), ('orientation', 0.065), ('exp', 0.063), ('opinion', 0.063), ('reviews', 0.06), ('sig', 0.058), ('queries', 0.054), ('disambiguate', 0.054), ('targets', 0.054), ('hu', 0.052), ('ding', 0.052), ('hatzivassiloglou', 0.052), ('expdataset', 0.047), ('kanayama', 0.047), ('noexp', 0.047), ('startup', 0.047), ('janyce', 0.045), ('lots', 0.042), ('lexicons', 0.04), ('negation', 0.04), ('retrieved', 0.039), ('long', 0.038), ('avg', 0.037), ('modifying', 0.037), ('strategy', 0.036), ('disambiguating', 0.035), ('domainrelated', 0.035), ('expcweb', 0.035), ('expwmevb', 0.035), ('polar', 0.035), ('camera', 0.034), ('web', 0.032), ('snippet', 0.031), ('review', 0.03), ('esuli', 0.03), ('transitional', 0.03), ('wilson', 0.03), ('riloff', 0.03), ('forum', 0.03), ('expand', 0.028), ('com', 0.028), ('subjectivity', 0.028), ('liu', 0.028), ('original', 0.027), ('superscript', 0.027), ('velikovich', 0.027), ('modifier', 0.026), ('majority', 0.025), ('evidences', 0.025), ('sp', 0.025), ('context', 0.024), ('suzuki', 0.024), ('cexp', 0.023), ('jijkoun', 0.023), ('kamps', 0.023), ('polish', 0.023), ('quuery', 0.023), ('senaft', 0.023), ('senbef', 0.023), ('snip', 0.023), ('threecomponent', 0.023), ('usnip', 0.023), ('mckeown', 0.023), ('disambiguation', 0.022), ('acquisition', 0.021), ('voting', 0.021), ('kinds', 0.02), ('url', 0.02), ('subjective', 0.02), ('evaluative', 0.02), ('nasukawa', 0.02), ('bollegala', 0.02), ('kobayashi', 0.02), ('maarten', 0.02), ('orientations', 0.02), ('sentiments', 0.02), ('lexicon', 0.02), ('framework', 0.019), ('sites', 0.019), ('effective', 0.019), ('kaji', 0.018), ('harbin', 0.018), ('marked', 
0.018)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 28 emnlp-2012-Collocation Polarity Disambiguation Using Web-based Pseudo Contexts
Author: Yanyan Zhao ; Bing Qin ; Ting Liu
Abstract: This paper focuses on the task of collocation polarity disambiguation. The collocation refers to a binary tuple of a polarity word and a target (such as ⟨long, battery life⟩ or ⟨long, startup⟩), in which the sentiment orientation of the polarity word (“long”) changes along with different targets (“battery life” or “startup”). To disambiguate a collocation’s polarity, previous work always turned to investigate the polarities of its surrounding contexts, and then assigned the majority polarity to the collocation. However, these contexts are limited, thus the resulting polarity is insufficient to be reliable. We therefore propose an unsupervised three-component framework to expand some pseudo contexts from web, to help disambiguate a collocation’s polarity. Without using any additional labeled data, experiments show that our method is effective.
2 0.34648538 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts
Author: Lizhen Qu ; Rainer Gemulla ; Gerhard Weikum
Abstract: We propose the weakly supervised MultiExperts Model (MEM) for analyzing the semantic orientation of opinions expressed in natural language reviews. In contrast to most prior work, MEM predicts both opinion polarity and opinion strength at the level of individual sentences; such fine-grained analysis helps to understand better why users like or dislike the entity under review. A key challenge in this setting is that it is hard to obtain sentence-level training data for both polarity and strength. For this reason, MEM is weakly supervised: It starts with potentially noisy indicators obtained from coarse-grained training data (i.e., document-level ratings), a small set of diverse base predictors, and, if available, small amounts of fine-grained training data. We integrate these noisy indicators into a unified probabilistic framework using ideas from ensemble learning and graph-based semi-supervised learning. Our experiments indicate that MEM outperforms state-of-the-art methods by a significant margin.
3 0.18031076 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes
Author: Jong-Hoon Oh ; Kentaro Torisawa ; Chikara Hashimoto ; Takuya Kawada ; Stijn De Saeger ; Jun'ichi Kazama ; Yiou Wang
Abstract: In this paper we explore the utility of sentiment analysis and semantic word classes for improving why-question answering on a large-scale web corpus. Our work is motivated by the observation that a why-question and its answer often follow the pattern that if something undesirable happens, the reason is also often something undesirable, and if something desirable happens, the reason is also often something desirable. To the best of our knowledge, this is the first work that introduces sentiment analysis to non-factoid question answering. We combine this simple idea with semantic word classes for ranking answers to why-questions and show that on a set of 850 why-questions our method gains 15.2% improvement in precision at the top-1 answer over a baseline state-of-the-art QA system that achieved the best performance in a shared task of Japanese non-factoid QA in NTCIR-6.
4 0.14788443 78 emnlp-2012-Learning Lexicon Models from Search Logs for Query Expansion
Author: Jianfeng Gao ; Shasha Xie ; Xiaodong He ; Alnur Ali
Abstract: This paper explores log-based query expansion (QE) models for Web search. Three lexicon models are proposed to bridge the lexical gap between Web documents and user queries. These models are trained on pairs of user queries and titles of clicked documents. Evaluations on a real world data set show that the lexicon models, integrated into a ranker-based QE system, not only significantly improve the document retrieval performance but also outperform two state-of-the-art log-based QE methods.
5 0.1242087 112 emnlp-2012-Resolving Complex Cases of Definite Pronouns: The Winograd Schema Challenge
Author: Altaf Rahman ; Vincent Ng
Abstract: We examine the task of resolving complex cases of definite pronouns, specifically those for which traditional linguistic constraints on coreference (e.g., Binding Constraints, gender and number agreement) as well as commonly-used resolution heuristics (e.g., string-matching facilities, syntactic salience) are not useful. Being able to solve this task has broader implications in artificial intelligence: a restricted version of it, sometimes referred to as the Winograd Schema Challenge, has been suggested as a conceptually and practically appealing alternative to the Turing Test. We employ a knowledge-rich approach to this task, which yields a pronoun resolver that outperforms state-of-the-art resolvers by nearly 18 points in accuracy on our dataset.
7 0.11141022 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews
8 0.10694923 101 emnlp-2012-Opinion Target Extraction Using Word-Based Translation Model
9 0.093640387 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
10 0.087008819 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM
11 0.083517723 34 emnlp-2012-Do Neighbours Help? An Exploration of Graph-based Algorithms for Cross-domain Sentiment Classification
12 0.074966066 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields
13 0.064742528 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP
14 0.062383484 139 emnlp-2012-Word Salad: Relating Food Prices and Descriptions
15 0.059041649 15 emnlp-2012-Active Learning for Imbalanced Sentiment Classification
16 0.057338599 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces
17 0.057241216 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
18 0.056970306 97 emnlp-2012-Natural Language Questions for the Web of Data
19 0.051042244 48 emnlp-2012-Exploring Adaptor Grammars for Native Language Identification
20 0.043195069 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
topicId topicWeight
[(0, 0.159), (1, 0.119), (2, 0.018), (3, 0.321), (4, 0.23), (5, -0.218), (6, -0.041), (7, -0.05), (8, 0.01), (9, -0.031), (10, -0.006), (11, -0.076), (12, 0.191), (13, 0.134), (14, -0.075), (15, -0.072), (16, -0.067), (17, 0.346), (18, -0.008), (19, -0.021), (20, -0.203), (21, 0.056), (22, -0.037), (23, -0.017), (24, -0.169), (25, 0.094), (26, -0.103), (27, -0.044), (28, 0.062), (29, -0.13), (30, 0.092), (31, 0.026), (32, 0.013), (33, -0.123), (34, 0.018), (35, -0.034), (36, 0.074), (37, 0.029), (38, 0.023), (39, -0.042), (40, 0.066), (41, -0.07), (42, 0.051), (43, -0.062), (44, -0.056), (45, 0.066), (46, -0.035), (47, -0.107), (48, -0.057), (49, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.98500812 28 emnlp-2012-Collocation Polarity Disambiguation Using Web-based Pseudo Contexts
Author: Yanyan Zhao ; Bing Qin ; Ting Liu
Abstract: This paper focuses on the task of collocation polarity disambiguation. The collocation refers to a binary tuple of a polarity word and a target (such as ⟨long, battery life⟩ or ⟨long, startup⟩), in which the sentiment orientation of the polarity word (“long”) changes along with different targets (“battery life” or “startup”). To disambiguate a collocation’s polarity, previous work always turned to investigate the polarities of its surrounding contexts, and then assigned the majority polarity to the collocation. However, these contexts are limited, thus the resulting polarity is insufficient to be reliable. We therefore propose an unsupervised three-component framework to expand some pseudo contexts from web, to help disambiguate a collocation’s polarity. Without using any additional labeled data, experiments show that our method is effective.
2 0.84838539 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts
Author: Lizhen Qu ; Rainer Gemulla ; Gerhard Weikum
Abstract: We propose the weakly supervised MultiExperts Model (MEM) for analyzing the semantic orientation of opinions expressed in natural language reviews. In contrast to most prior work, MEM predicts both opinion polarity and opinion strength at the level of individual sentences; such fine-grained analysis helps to understand better why users like or dislike the entity under review. A key challenge in this setting is that it is hard to obtain sentence-level training data for both polarity and strength. For this reason, MEM is weakly supervised: It starts with potentially noisy indicators obtained from coarse-grained training data (i.e., document-level ratings), a small set of diverse base predictors, and, if available, small amounts of fine-grained training data. We integrate these noisy indicators into a unified probabilistic framework using ideas from ensemble learning and graph-based semi-supervised learning. Our experiments indicate that MEM outperforms state-of-the-art methods by a significant margin.
3 0.44015759 78 emnlp-2012-Learning Lexicon Models from Search Logs for Query Expansion
Author: Jianfeng Gao ; Shasha Xie ; Xiaodong He ; Alnur Ali
Abstract: This paper explores log-based query expansion (QE) models for Web search. Three lexicon models are proposed to bridge the lexical gap between Web documents and user queries. These models are trained on pairs of user queries and titles of clicked documents. Evaluations on a real world data set show that the lexicon models, integrated into a ranker-based QE system, not only significantly improve the document retrieval performance but also outperform two state-of-the-art log-based QE methods.
Author: Ahmed Hassan ; Amjad Abu-Jbara ; Dragomir Radev
Abstract: A mixture of positive (friendly) and negative (antagonistic) relations exist among users in most social media applications. However, many such applications do not allow users to explicitly express the polarity of their interactions. As a result most research has either ignored negative links or was limited to the few domains where such relations are explicitly expressed (e.g. Epinions trust/distrust). We study text exchanged between users in online communities. We find that the polarity of the links between users can be predicted with high accuracy given the text they exchange. This allows us to build a signed network representation of discussions; where every edge has a sign: positive to denote a friendly relation, or negative to denote an antagonistic relation. We also connect our analysis to social psychology theories of balance. We show that the automatically predicted networks are consistent with those theories. Inspired by that, we present a technique for identifying subgroups in discussions by partitioning signed networks representing them.
5 0.35219845 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes
Author: Jong-Hoon Oh ; Kentaro Torisawa ; Chikara Hashimoto ; Takuya Kawada ; Stijn De Saeger ; Jun'ichi Kazama ; Yiou Wang
Abstract: In this paper we explore the utility of sentiment analysis and semantic word classes for improving why-question answering on a large-scale web corpus. Our work is motivated by the observation that a why-question and its answer often follow the pattern that if something undesirable happens, the reason is also often something undesirable, and if something desirable happens, the reason is also often something desirable. To the best of our knowledge, this is the first work that introduces sentiment analysis to non-factoid question answering. We combine this simple idea with semantic word classes for ranking answers to why-questions and show that on a set of 850 why-questions our method gains 15.2% improvement in precision at the top-1 answer over a baseline state-of-the-art QA system that achieved the best performance in a shared task of Japanese non-factoid QA in NTCIR-6.
6 0.30729663 112 emnlp-2012-Resolving Complex Cases of Definite Pronouns: The Winograd Schema Challenge
7 0.25819045 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
9 0.23406665 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM
10 0.20652972 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP
11 0.20164058 101 emnlp-2012-Opinion Target Extraction Using Word-Based Translation Model
12 0.20020902 44 emnlp-2012-Excitatory or Inhibitory: A New Semantic Orientation Extracts Contradiction and Causality from the Web
13 0.19549149 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
14 0.19331828 139 emnlp-2012-Word Salad: Relating Food Prices and Descriptions
15 0.18424708 41 emnlp-2012-Entity based QA Retrieval
16 0.17858018 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews
17 0.16573054 25 emnlp-2012-Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation
18 0.16539463 97 emnlp-2012-Natural Language Questions for the Web of Data
19 0.15769495 48 emnlp-2012-Exploring Adaptor Grammars for Native Language Identification
20 0.14840704 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields
topicId topicWeight
[(2, 0.012), (16, 0.026), (34, 0.049), (60, 0.083), (63, 0.03), (64, 0.035), (65, 0.022), (70, 0.027), (73, 0.01), (74, 0.02), (76, 0.071), (86, 0.014), (95, 0.488)]
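The topic weights above form a sparse vector of (topicId, topicWeight) pairs. The page does not say how the similarity values in the following list are derived from such vectors, so the sketch below is only an illustration under the assumption that a cosine measure is used; the names `topic_weights`, `cosine` and `dominant_topic` are hypothetical.

```python
import math

# (topicId, topicWeight) pairs copied from the listing above.
topic_weights = [(2, 0.012), (16, 0.026), (34, 0.049), (60, 0.083),
                 (63, 0.03), (64, 0.035), (65, 0.022), (70, 0.027),
                 (73, 0.01), (74, 0.02), (76, 0.071), (86, 0.014),
                 (95, 0.488)]

def cosine(a, b):
    """Cosine similarity of two sparse vectors given as (id, weight) pairs.

    Assumption: the listing may use a different measure entirely.
    """
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# The topic carrying the largest weight for this paper.
dominant_topic = max(topic_weights, key=lambda pair: pair[1])
```

Comparing a vector with itself is a quick sanity check: the cosine should come out as 1 up to floating-point error.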
simIndex simValue paperId paperTitle
same-paper 1 0.90320599 28 emnlp-2012-Collocation Polarity Disambiguation Using Web-based Pseudo Contexts
Author: Yanyan Zhao ; Bing Qin ; Ting Liu
Abstract: This paper focuses on the task of collocation polarity disambiguation. The collocation refers to a binary tuple of a polarity word and a target (such as ⟨long, battery life⟩ or ⟨long, startup⟩), in which the sentiment orientation of the polarity word (“long”) changes along with different targets (“battery life” or “startup”). To disambiguate a collocation’s polarity, previous work always turned to investigate the polarities of its surrounding contexts, and then assigned the majority polarity to the collocation. However, these contexts are limited, thus the resulting polarity is insufficient to be reliable. We therefore propose an unsupervised three-component framework to expand some pseudo contexts from web, to help disambiguate a collocation’s polarity. Without using any additional labeled data, experiments show that our method is effective.
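The majority-vote step that the abstract attributes to previous work can be sketched as follows. This is a minimal illustration and not the authors' implementation; the function name `majority_polarity`, the label strings, and the rule of falling back to a default label on ties or empty input are all assumptions.

```python
from collections import Counter

def majority_polarity(context_polarities, default="neutral"):
    """Return the most frequent polarity label among the context labels.

    Ties between the two most frequent labels, and empty input, fall back
    to `default` (an assumed rule, not taken from the paper).
    """
    counts = Counter(context_polarities)
    if not counts:
        return default
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return default
    return ranked[0][0]
```

With only a handful of context labels, a single mislabeled context can flip or tie the vote, which is the reliability problem the abstract raises.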
2 0.87281525 83 emnlp-2012-Lexical Differences in Autobiographical Narratives from Schizophrenic Patients and Healthy Controls
Author: Kai Hong ; Christian G. Kohler ; Mary E. March ; Amber A. Parker ; Ani Nenkova
Abstract: We present a system for automatic identification of schizophrenic patients and healthy controls based on narratives the subjects recounted about emotional experiences in their own life. The focus of the study is to identify the lexical features that distinguish the two populations. We report the results of feature selection experiments that demonstrate that the classifier can achieve accuracy on patient level prediction as high as 76.9% with only a small set of features. We provide an in-depth discussion of the lexical features that distinguish the two groups and the unexpected relationship between emotion types of the narratives and the accuracy of patient status prediction.
3 0.85669428 49 emnlp-2012-Exploring Topic Coherence over Many Models and Many Topics
Author: Keith Stevens ; Philip Kegelmeyer ; David Andrzejewski ; David Buttler
Abstract: We apply two new automated semantic evaluations to three distinct latent topic models. Both metrics have been shown to align with human evaluations and provide a balance between internal measures of information gain and comparisons to human ratings of coherent topics. We improve upon the measures by introducing new aggregate measures that allow for comparing complete topic models. We further compare the automated measures to other metrics for topic models, comparison to manually crafted semantic tests and document classification. Our experiments reveal that LDA and LSA each have different strengths; LDA best learns descriptive topics while LSA is best at creating a compact semantic representation of documents and words in a corpus.
4 0.81066167 78 emnlp-2012-Learning Lexicon Models from Search Logs for Query Expansion
Author: Jianfeng Gao ; Shasha Xie ; Xiaodong He ; Alnur Ali
Abstract: This paper explores log-based query expansion (QE) models for Web search. Three lexicon models are proposed to bridge the lexical gap between Web documents and user queries. These models are trained on pairs of user queries and titles of clicked documents. Evaluations on a real world data set show that the lexicon models, integrated into a ranker-based QE system, not only significantly improve the document retrieval performance but also outperform two state-of-the-art log-based QE methods.
5 0.46514907 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM
Author: Huizhong Duan ; Yanen Li ; ChengXiang Zhai ; Dan Roth
Abstract: Discriminative training in query spelling correction is difficult due to the complex internal structures of the data. Recent work on query spelling correction suggests a two-stage approach: a noisy channel model that is used to retrieve a number of candidate corrections, followed by a discriminatively trained ranker applied to these candidates. The ranker, however, suffers from the low recall of the first, suboptimal, search stage. This paper proposes to directly optimize the search stage with a discriminative model based on latent structural SVM. In this model, we treat query spelling correction as a multiclass classification problem with structured input and output. The latent structural information is used to model the alignment of words in the spelling correction process. Experimental results show that as a standalone speller, our model outperforms all the baseline systems. It also attains a higher recall compared with the noisy channel model, and can therefore serve as a better filtering stage when combined with a ranker.
6 0.44718105 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts
7 0.42648524 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP
8 0.41091889 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation
9 0.40976387 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes
10 0.40959051 101 emnlp-2012-Opinion Target Extraction Using Word-Based Translation Model
11 0.39584386 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews
12 0.38779688 120 emnlp-2012-Streaming Analysis of Discourse Participants
13 0.38393772 63 emnlp-2012-Identifying Event-related Bursts via Social Media Activities
14 0.38074556 50 emnlp-2012-Extending Machine Translation Evaluation Metrics with Lexical Cohesion to Document Level
15 0.37731135 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules
16 0.37377554 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
17 0.35867757 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns
18 0.35131484 97 emnlp-2012-Natural Language Questions for the Web of Data
19 0.34985077 33 emnlp-2012-Discovering Diverse and Salient Threads in Document Collections
20 0.34958315 8 emnlp-2012-A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes