acl acl2011 acl2011-162 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ahmed Hassan ; Amjad AbuJbara ; Rahul Jha ; Dragomir Radev
Abstract: We present a method for identifying the positive or negative semantic orientation of foreign words. Identifying the semantic orientation of words has numerous applications in the areas of text classification, analysis of product review, analysis of responses to surveys, and mining online discussions. Identifying the semantic orientation of English words has been extensively studied in literature. Most of this work assumes the existence of resources (e.g. Wordnet, seeds, etc) that do not exist in foreign languages. In this work, we describe a method based on constructing a multilingual network connecting English and foreign words. We use this network to identify the semantic orientation of foreign words based on connection between words in the same language as well as multilingual connections. The method is experimentally tested using a manually labeled set of positive and negative words and has shown very promising results.
Reference: text
sentIndex sentText sentNum sentScore
1 Identifying the Semantic Orientation of Foreign Words Ahmed Hassan EECS Department University of Michigan Ann Arbor, MI has s anam@umi ch . [sent-1, score-0.053]
2 rahul j ha@umi ch edu Abstract We present a method for identifying the positive or negative semantic orientation of foreign words. [sent-3, score-1.119]
3 Identifying the semantic orientation of words has numerous applications in the areas of text classification, analysis of product review, analysis of responses to surveys, and mining online discussions. [sent-4, score-0.54]
4 Identifying the semantic orientation of English words has been extensively studied in literature. [sent-5, score-0.472]
5 Most of this work assumes the existence of resources (e. [sent-6, score-0.064]
6 Wordnet, seeds, etc) that do not exist in foreign languages. [sent-8, score-0.358]
7 In this work, we describe a method based on constructing a multilingual network connecting English and foreign words. [sent-9, score-0.743]
8 We use this network to identify the semantic orientation of foreign words based on connection between words in the same language as well as multilingual connections. [sent-10, score-1.197]
9 The method is experimentally tested using a manually labeled set of positive and negative words and has shown very promising results. [sent-11, score-0.25]
10 1 Introduction Amjad Abu-Jbara EECS Department University of Michigan Ann Arbor, MI amjbara@umi ch . [sent-12, score-0.053]
11 edu Dragomir Radev EECS Department and School of Information University of Michigan Ann Arbor, MI radev@ umi ch . [sent-13, score-0.128]
12 Another interesting application is mining attitude in discussions (Hassan et al. [sent-15, score-0.096]
13 , 2010), where the attitude of participants in a discussion is inferred using the text they exchange. [sent-16, score-0.065]
14 Due to its importance, several researchers have addressed the problem of identifying the semantic orientation of individual words. [sent-17, score-0.539]
15 For example Turney and Littman (2003) use the entire English Web corpus by submitting queries consisting of the given word and a set of seeds to a search engine. [sent-20, score-0.281]
16 In addition, several other methods have used Wordnet (Miller, 1995) for connecting semantically related words (Kamps et al. [sent-21, score-0.084]
17 When we try to apply those methods to other languages, we run into the problem of the lack of resources in other languages when compared to English. [sent-24, score-0.123]
18 , 1966) has thousands of English words labeled with semantic orientation. [sent-26, score-0.169]
19 Most of the literature has used it as a source of labeled seeds or A great body of research work has focused on identifying the semantic orientation of words. [sent-27, score-0.78]
20 Word polarity is a very important feature that has been used in several applications. [sent-28, score-0.103]
21 For example, the problem of mining product reputation from Web reviews has been extensively studied (Turney, 2002; Morinaga et al. [sent-29, score-0.068]
22 In this work, we present a method for predicting the semantic orientation of foreign words. [sent-36, score-0.83]
23 The proposed method is based on creating a multilingual network of words that represents both English and foreign words. [sent-41, score-0.725]
24 The network has English-English connections, as well as foreign-foreign connections and English-foreign connections. [sent-42, score-0.326]
25 This allows us to benefit from the richness of the resources built for the English language and in the meantime utilize resources specific to foreign languages. [sent-43, score-0.486]
26 Figure 1 shows a multilingual network where a sparse foreign network and a dense English network are connected. [sent-44, score-1.178]
27 We then define a random walk model over the multilingual network and predict the semantic orientation of any given word by comparing the mean hitting time of a random walk starting from it to a positive and a negative set of seed English words. [sent-45, score-1.606]
28 We compare the performance of several methods using the foreign language resources only and the multilingual network that has both English and foreign words. [sent-47, score-1.114]
29 We show that bootstrapping from languages with dense resources such as English is useful for improving the performance on other languages with limited resources. [sent-48, score-0.232]
30 2 Related Work The problem of identifying the polarity of individual words is a well-studied problem that attracted several research efforts in the past few years. [sent-54, score-0.2]
31 They proposed a method for identifying the polarity of adjectives. [sent-57, score-0.167]
32 Their method is based on extracting all conjunctions of adjectives from a given corpus and then they classify each conjunctive expression as either the same orientation such as “simple and well-received” or different orientation such as “simplistic but well-received”. [sent-58, score-0.698]
33 Turney and Littman (2003) identify word polarity by looking at its statistical association with a set of positive/negative seed words. [sent-60, score-0.197]
34 Co-occurrence statistics are collected by submitting queries to a search engine. [sent-62, score-0.05]
35 The number of hits for positive seeds, negative seeds, positive seeds near the given word, and negative seeds near the given word are used to estimate the association of the given word to the positive/negative seeds. [sent-63, score-0.762]
36 Wordnet (Miller, 1995), thesaurus and cooccurrence statistics have been widely used to measure word relatedness by several semantic orientation prediction methods. [sent-64, score-0.542]
37 (2004) use the length of the shortest-path in Wordnet connecting any given word to positive/negative seeds to identify word polarity. [sent-66, score-0.282]
38 Hu and Liu (2004) use Wordnet synonyms and antonyms to bootstrap from words with known polarity to words with unknown polarity. [sent-67, score-0.247]
39 They assign any given word the label of its synonyms or the opposite label of its antonyms if any of them are known. [sent-68, score-0.078]
40 (2005) proposed using spin models for extracting semantic orientation of words. [sent-71, score-0.546]
41 They construct a network of words using gloss definitions, thesaurus and cooccurrence statistics. [sent-72, score-0.314]
42 Each electron has a spin and each spin has a direction taking one of two values: up or down. [sent-74, score-0.214]
43 Two neighboring spins tend to have the same orientation from an energetic point of view. [sent-75, score-0.387]
44 Their hypothesis is that as neighboring electrons tend to have the same spin direction, neighboring words tend to have similar polarity. [sent-76, score-0.216]
45 Hassan and Radev (2010) use a random walk model defined over a word relatedness graph to classify words as either positive or negative. [sent-77, score-0.24]
46 They measure the random walk mean hitting time of the given word to the positive set and the negative set. [sent-79, score-0.588]
47 Identifying the semantic orientation of individual words is closely related to subjectivity analysis. [sent-81, score-0.523]
48 Subjectivity analysis focused on identifying text that presents opinion as opposed to objective text that presents factual information (Wiebe, 2000). [sent-82, score-0.105]
49 Some approaches to subjectivity analysis disregard the context phrases and words appear in (Wiebe, 2000; Hatzivassiloglou and Wiebe, 2000; Banea et al. [sent-83, score-0.084]
50 3 Approach The general goal of this work is to mine the semantic orientation of foreign words. [sent-85, score-0.797]
51 We do this by creating a multilingual network of words. [sent-86, score-0.334]
52 In this network two words are connected if we believe that they are semantically related. [sent-87, score-0.251]
53 Some of the English words will be used as seeds for which we know the semantic orientation. [sent-89, score-0.354]
54 Given such a network, we will measure the mean hitting time in a random walk starting at any given word to the positive set of seeds and the negative set of seeds. [sent-90, score-0.851]
55 Positive words will be more likely to hit the positive set faster than hitting the negative set and vice versa. [sent-91, score-0.462]
56 In the rest of this section, we define how the multilingual word network is built and describe an algorithm for predicting the semantic orientation of any given word. [sent-92, score-0.806]
57 1 Multilingual Word Network We build a network G(V, E) where V = V_en ∪ V_fr is the union of a set of English and foreign words. [sent-94, score-0.576]
58 There are three types of connections: English-English connections, Foreign-Foreign connections and English-Foreign connections. [sent-96, score-0.108]
59 Foreign-Foreign connections are created in a similar way to the English connections. [sent-102, score-0.108]
60 Some other languages have lexical resources based on the design of the Princeton English Wordnet. [sent-103, score-0.123]
61 Finally, to connect foreign words to English words, we use a foreign to English dictionary. [sent-108, score-0.749]
62 For every word in a list of foreign words, we look up its meaning in a dictionary and add an edge between the foreign word and every other English word that appeared as a possible meaning for it. [sent-109, score-0.716]
63 2 Semantic Orientation Prediction We use the multilingual network we described above to predict the semantic orientation of words based on the mean hitting time to two sets of positive and negative seeds. [sent-111, score-1.351]
64 Let the average number of steps that a random walker starting at some node i will need to enter a state k ∈ S be h(i|S). [sent-113, score-0.137]
65 It can be formally defined as: $h(i|S) = \begin{cases} 0 & \text{if } i \in S \\ \sum_{j \in V} p_{ij} \times h(j|S) + 1 & \text{otherwise} \end{cases}$ (2) where $p_{ij}$ is the transition probability between node i and node j. [sent-114, score-0.076]
66 Given two lists of seed English words with known polarity, we define two sets of nodes S+ and S− representing those seeds. [sent-115, score-0.127]
67 For any given word w, we calculate the mean hitting time between w and the two seed sets h(w|S+) and h(w|S−). [sent-116, score-0.43]
68 We used the list of labeled seeds from (Hatzivassiloglou and McKeown, 1997) and (Stone et al. [sent-119, score-0.277]
69 Several other similarity measures may be used to predict whether a given word is closer to the positive seeds list or the negative seeds list (e. [sent-121, score-0.671]
70 However hitting time has been shown to be more efficient and more accurate (Hassan and Radev, 2010) because it measures connectivity rather than distance. [sent-125, score-0.287]
71 1 Data We used Wordnet (Miller, 1995) as a source of synonyms and hypernyms for linking English words in the word relatedness graph. [sent-129, score-0.11]
72 We used two foreign languages for our experiments: Arabic and Hindi. [sent-130, score-0.417]
73 Both languages have a Wordnet that was constructed based on the design of the Princeton English Wordnet. [sent-131, score-0.059]
74 In addition, we used three lexicons with words labeled as either positive or negative. [sent-136, score-0.199]
75 The lexicon contains 4206 words, 1915 of which are positive and 2291 are negative. [sent-139, score-0.117]
76 For Arabic and Hindi we constructed a labeled set of 300 words for each language. Figure 2: Accuracy of the proposed method and baselines (SO-PMI, HT-FR, HT-FR+EN) for both Arabic and Hindi. [sent-140, score-0.079]
77 Those sets were labeled by two native speakers of each language. [sent-142, score-0.046]
78 This method is based on finding the semantic association of any given word to a set of positive and a set of negative words. [sent-148, score-0.261]
79 It can be calculated as follows: $\text{SO-PMI}(w) = \log \frac{hits_{w,pos} \times hits_{neg}}{hits_{w,neg} \times hits_{pos}}$ (3) where w is a word with unknown polarity, $hits_{w,pos}$ is the number of hits returned by a commercial search engine when the search query is the given word and the disjunction of all positive seed words. [sent-149, score-0.274]
80 $hits_{pos}$ is the number of hits when we search for the disjunction of all positive seed words. [sent-150, score-0.274]
81 We used 7 positive and 7 negative seeds as described in (Turney and Littman, 2003). [sent-152, score-0.402]
82 The second baseline constructs a network of foreign words only as described earlier. [sent-153, score-0.609]
83 It uses mean hitting time to find the semantic association of any given word. [sent-154, score-0.426]
84 Finally, we build a multilingual network and use the hitting time as before to predict semantic orientation. [sent-157, score-0.749]
85 , 1966) as seeds and the labeled foreign words for evaluation. [sent-159, score-0.668]
86 We notice that the SO-PMI and the hitting time based methods perform poorly on both Arabic and Hindi. [sent-162, score-0.287]
87 This supports our hypothesis that state of the art methods, designed for English, perform poorly on foreign languages due to the limited amount of resources available in foreign languages compared to English. [sent-164, score-0.898]
88 The figure also shows that the proposed method, which combines resources from both English and foreign languages, performs significantly better. [sent-165, score-0.422]
89 Finally, we studied how much improvement is achieved by including links between foreign words from global Wordnets. [sent-166, score-0.391]
90 5 Conclusions We addressed the problem of predicting the semantic orientation of foreign words. [sent-169, score-0.866]
91 Applying off-the-shelf methods developed for English to other languages does not work well because of the limited amount of resources available in foreign languages compared to English. [sent-171, score-0.54]
92 We proposed a method based on the construction of a multilingual network that uses both language specific resources as well as the rich semantic relations available in English. [sent-172, score-0.488]
93 We then use a model that computes the mean hitting time to a set of positive and negative seed words to predict whether a given word has a positive or a negative semantic orientation. [sent-173, score-0.933]
94 We showed that the proposed method can predict semantic orientation with high accuracy. [sent-174, score-0.477]
95 All statements of fact, opinion or conclusions contained herein are those of the authors and should not be construed as representing the official views or policies of IARPA, the ODNI or the U. [sent-179, score-0.041]
96 A bootstrapping method for building subjectivity lexicons for languages with scarce resources. [sent-184, score-0.144]
97 Effects of adjective orientation and gradability on sentence subjectivity. [sent-233, score-0.349]
98 An experience in building the indo wordnet - a wordnet for hindi. [sent-261, score-0.458]
99 Measuring praise and criticism: Inference of semantic orientation from association. [sent-298, score-0.439]
100 Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences. [sent-317, score-0.249]
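The hitting-time definition in Eq. (2) and the decision rule described in sentences 63 to 67 can be illustrated with a small sketch. It is not the authors' implementation. Several things are assumptions made only for illustration: the toy edge list, the placeholder foreign tokens `fr:w1` and `fr:w2`, uniform transition probabilities over a node's neighbours, and an iterative fixed-point solver for h(i|S). The function names are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected word graph from (word, word) pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def mean_hitting_time(graph, seeds, tol=1e-10, max_iter=100_000):
    """Iteratively solve h(i|S) = 0 for i in S, else 1 + sum_j p_ij * h(j|S).

    Assumes p_ij is uniform over the neighbours of i and that every node
    can reach at least one seed; otherwise the values do not converge.
    """
    h = {node: 0.0 for node in graph}
    for _ in range(max_iter):
        delta = 0.0
        for node in graph:
            if node in seeds:
                continue  # hitting time from a seed to its own set is 0
            nbrs = graph[node]
            new = 1.0 + sum(h[n] for n in nbrs) / len(nbrs)
            delta = max(delta, abs(new - h[node]))
            h[node] = new
        if delta < tol:
            break
    return h


def classify(graph, word, pos_seeds, neg_seeds):
    """Label a word by whichever seed set a random walk reaches sooner."""
    h_pos = mean_hitting_time(graph, pos_seeds)[word]
    h_neg = mean_hitting_time(graph, neg_seeds)[word]
    return "positive" if h_pos < h_neg else "negative"


# Toy multilingual network (hypothetical): English-English synonym links,
# one foreign-foreign link, and foreign-English dictionary links.
TOY_EDGES = [
    ("good", "great"), ("bad", "awful"),
    ("fr:w1", "fr:w2"),
    ("fr:w1", "good"), ("fr:w1", "great"),
    ("fr:w2", "bad"), ("fr:w2", "awful"),
]

if __name__ == "__main__":
    g = build_graph(TOY_EDGES)
    for w in ("fr:w1", "fr:w2"):
        print(w, classify(g, w, pos_seeds={"good"}, neg_seeds={"bad"}))
```

In this toy graph the foreign words carry no labels of their own. Their predicted orientation comes entirely from how quickly a walk starting at them reaches the English seed sets, which mirrors the role the dictionary edges play in the described approach.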
wordName wordTfidf (topN-words)
[('foreign', 0.358), ('orientation', 0.349), ('hitting', 0.258), ('seeds', 0.231), ('wordnet', 0.229), ('network', 0.218), ('elkateb', 0.197), ('arabic', 0.141), ('hassan', 0.141), ('kamps', 0.124), ('multilingual', 0.116), ('littman', 0.114), ('jha', 0.113), ('hatzivassiloglou', 0.111), ('connections', 0.108), ('spin', 0.107), ('turney', 0.104), ('polarity', 0.103), ('vossen', 0.099), ('seed', 0.094), ('stone', 0.094), ('semantic', 0.09), ('radev', 0.087), ('positive', 0.086), ('nasukawa', 0.086), ('negative', 0.085), ('alkhalifa', 0.085), ('narayan', 0.085), ('pease', 0.085), ('hindi', 0.083), ('walk', 0.081), ('eecs', 0.075), ('umi', 0.075), ('miller', 0.071), ('inquirer', 0.069), ('attitude', 0.065), ('ahmed', 0.064), ('banea', 0.064), ('resources', 0.064), ('identifying', 0.064), ('michigan', 0.062), ('english', 0.06), ('languages', 0.059), ('morinaga', 0.056), ('janyce', 0.055), ('takamura', 0.054), ('ch', 0.053), ('vasileios', 0.053), ('arbor', 0.051), ('fellbaum', 0.051), ('connecting', 0.051), ('subjectivity', 0.051), ('dense', 0.05), ('disjunction', 0.05), ('rodriguez', 0.05), ('awn', 0.05), ('pande', 0.05), ('submitting', 0.05), ('mean', 0.049), ('black', 0.049), ('dragomir', 0.047), ('wiebe', 0.047), ('labeled', 0.046), ('odni', 0.046), ('kanayama', 0.046), ('popescu', 0.045), ('hits', 0.044), ('iarpa', 0.043), ('princeton', 0.041), ('maarten', 0.041), ('antonyms', 0.041), ('orientations', 0.041), ('opinion', 0.041), ('relatedness', 0.04), ('tetsuya', 0.039), ('predict', 0.038), ('node', 0.038), ('neighboring', 0.038), ('mi', 0.038), ('synonyms', 0.037), ('product', 0.037), ('addressed', 0.036), ('mckeown', 0.036), ('enter', 0.035), ('lexicons', 0.034), ('ann', 0.034), ('rahul', 0.034), ('thesaurus', 0.033), ('predicting', 0.033), ('words', 0.033), ('starting', 0.032), ('walker', 0.032), ('mining', 0.031), ('lexicon', 0.031), ('cooccurrence', 0.03), ('thumbs', 0.03), ('time', 0.029), ('database', 0.029), ('etzioni', 0.029), ('kdd', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 162 acl-2011-Identifying the Semantic Orientation of Foreign Words
2 0.19351518 323 acl-2011-Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
Author: Dipanjan Das ; Slav Petrov
Abstract: We describe a novel approach for inducing unsupervised part-of-speech taggers for languages that have no labeled training data, but have translated text in a resource-rich language. Our method does not assume any knowledge about the target language (in particular no tagging dictionary is assumed), making it applicable to a wide array of resource-poor languages. We use graph-based label propagation for cross-lingual knowledge transfer and use the projected labels as features in an unsupervised model (Berg-Kirkpatrick et al., 2010). Across eight European languages, our approach results in an average absolute improvement of 10.4% over a state-of-the-art baseline, and 16.7% over vanilla hidden Markov models induced with the Expectation Maximization algorithm.
3 0.16381383 159 acl-2011-Identifying Noun Product Features that Imply Opinions
Author: Lei Zhang ; Bing Liu
Abstract: Identifying domain-dependent opinion words is a key problem in opinion mining and has been studied by several researchers. However, existing work has been focused on adjectives and to some extent verbs. Limited work has been done on nouns and noun phrases. In our work, we used the feature-based opinion mining model, and we found that in some domains nouns and noun phrases that indicate product features may also imply opinions. In many such cases, these nouns are not subjective but objective. Their involved sentences are also objective sentences and imply positive or negative opinions. Identifying such nouns and noun phrases and their polarities is very challenging but critical for effective opinion mining in these domains. To the best of our knowledge, this problem has not been studied in the literature. This paper proposes a method to deal with the problem. Experimental results based on real-life datasets show promising results.
4 0.15196121 174 acl-2011-Insights from Network Structure for Text Mining
Author: Zornitsa Kozareva ; Eduard Hovy
Abstract: Text mining and data harvesting algorithms have become popular in the computational linguistics community. They employ patterns that specify the kind of information to be harvested, and usually bootstrap either the pattern learning or the term harvesting process (or both) in a recursive cycle, using data learned in one step to generate more seeds for the next. They therefore treat the source text corpus as a network, in which words are the nodes and relations linking them are the edges. The results of computational network analysis, especially from the world wide web, are thus applicable. Surprisingly, these results have not yet been broadly introduced into the computational linguistics community. In this paper we show how various results apply to text mining, how they explain some previously observed phenomena, and how they can be helpful for computational linguistics applications.
5 0.12783672 148 acl-2011-HITS-based Seed Selection and Stop List Construction for Bootstrapping
Author: Tetsuo Kiso ; Masashi Shimbo ; Mamoru Komachi ; Yuji Matsumoto
Abstract: In bootstrapping (seed set expansion), selecting good seeds and creating stop lists are two effective ways to reduce semantic drift, but these methods generally need human supervision. In this paper, we propose a graph-based approach to helping editors choose effective seeds and stop list instances, applicable to Pantel and Pennacchiotti’s Espresso bootstrapping algorithm. The idea is to select seeds and create a stop list using the rankings of instances and patterns computed by Kleinberg’s HITS algorithm. Experimental results on a variation of the lexical sample task show the effectiveness of our method.
6 0.11007275 304 acl-2011-Together We Can: Bilingual Bootstrapping for WSD
7 0.096438102 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
8 0.093483038 131 acl-2011-Extracting Opinion Expressions and Their Polarities - Exploration of Pipelines and Joint Models
9 0.092514597 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices
10 0.090279184 289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic
11 0.089904778 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features
12 0.089440033 177 acl-2011-Interactive Group Suggesting for Twitter
13 0.084030144 167 acl-2011-Improving Dependency Parsing with Semantic Classes
14 0.080963627 7 acl-2011-A Corpus for Modeling Morpho-Syntactic Agreement in Arabic: Gender, Number and Rationality
15 0.079581328 94 acl-2011-Deciphering Foreign Language
16 0.078912832 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification
17 0.07775066 253 acl-2011-PsychoSentiWordNet
18 0.075235583 229 acl-2011-NULEX: An Open-License Broad Coverage Lexicon
19 0.075185679 67 acl-2011-Clairlib: A Toolkit for Natural Language Processing, Information Retrieval, and Network Analysis
20 0.075053111 105 acl-2011-Dr Sentiment Knows Everything!
topicId topicWeight
[(0, 0.165), (1, 0.101), (2, 0.049), (3, -0.02), (4, 0.005), (5, -0.015), (6, 0.087), (7, 0.002), (8, -0.006), (9, -0.037), (10, 0.0), (11, -0.059), (12, -0.026), (13, 0.057), (14, -0.011), (15, -0.179), (16, 0.107), (17, -0.025), (18, -0.014), (19, -0.085), (20, -0.026), (21, 0.108), (22, 0.088), (23, 0.124), (24, 0.035), (25, -0.018), (26, 0.026), (27, 0.154), (28, 0.061), (29, 0.062), (30, 0.176), (31, -0.046), (32, -0.026), (33, -0.024), (34, 0.143), (35, -0.072), (36, 0.053), (37, 0.064), (38, 0.013), (39, -0.012), (40, -0.006), (41, -0.093), (42, -0.026), (43, 0.099), (44, 0.064), (45, -0.053), (46, 0.003), (47, 0.046), (48, -0.015), (49, -0.058)]
simIndex simValue paperId paperTitle
same-paper 1 0.94389766 162 acl-2011-Identifying the Semantic Orientation of Foreign Words
2 0.6383127 174 acl-2011-Insights from Network Structure for Text Mining
Author: Zornitsa Kozareva ; Eduard Hovy
Abstract: Text mining and data harvesting algorithms have become popular in the computational linguistics community. They employ patterns that specify the kind of information to be harvested, and usually bootstrap either the pattern learning or the term harvesting process (or both) in a recursive cycle, using data learned in one step to generate more seeds for the next. They therefore treat the source text corpus as a network, in which words are the nodes and relations linking them are the edges. The results of computational network analysis, especially from the world wide web, are thus applicable. Surprisingly, these results have not yet been broadly introduced into the computational linguistics community. In this paper we show how various results apply to text mining, how they explain some previously observed phenomena, and how they can be helpful for computational linguistics applications.
3 0.61993605 304 acl-2011-Together We Can: Bilingual Bootstrapping for WSD
Author: Mitesh M. Khapra ; Salil Joshi ; Arindam Chatterjee ; Pushpak Bhattacharyya
Abstract: Recent work on bilingual Word Sense Disambiguation (WSD) has shown that a resource deprived language (L1) can benefit from the annotation work done in a resource rich language (L2) via parameter projection. However, this method assumes the presence of sufficient annotated data in one resource rich language which may not always be possible. Instead, we focus on the situation where there are two resource deprived languages, both having a very small amount of seed annotated data and a large amount of untagged data. We then use bilingual bootstrapping, wherein, a model trained using the seed annotated data of L1 is used to annotate the untagged data of L2 and vice versa using parameter projection. The untagged instances of L1 and L2 which get annotated with high confidence are then added to the seed data of the respective languages and the above process is repeated. Our experiments show that such a bilingual bootstrapping algorithm when evaluated on two different domains with small seed sizes using Hindi (L1) and Marathi (L2) as the language pair performs better than monolingual bootstrapping and significantly reduces annotation cost.
4 0.60252005 148 acl-2011-HITS-based Seed Selection and Stop List Construction for Bootstrapping
Author: Tetsuo Kiso ; Masashi Shimbo ; Mamoru Komachi ; Yuji Matsumoto
Abstract: In bootstrapping (seed set expansion), selecting good seeds and creating stop lists are two effective ways to reduce semantic drift, but these methods generally need human supervision. In this paper, we propose a graph-based approach to helping editors choose effective seeds and stop list instances, applicable to Pantel and Pennacchiotti’s Espresso bootstrapping algorithm. The idea is to select seeds and create a stop list using the rankings of instances and patterns computed by Kleinberg’s HITS algorithm. Experimental results on a variation of the lexical sample task show the effectiveness of our method.
5 0.52581453 159 acl-2011-Identifying Noun Product Features that Imply Opinions
Author: Lei Zhang ; Bing Liu
Abstract: Identifying domain-dependent opinion words is a key problem in opinion mining and has been studied by several researchers. However, existing work has been focused on adjectives and to some extent verbs. Limited work has been done on nouns and noun phrases. In our work, we used the feature-based opinion mining model, and we found that in some domains nouns and noun phrases that indicate product features may also imply opinions. In many such cases, these nouns are not subjective but objective. Their involved sentences are also objective sentences and imply positive or negative opinions. Identifying such nouns and noun phrases and their polarities is very challenging but critical for effective opinion mining in these domains. To the best of our knowledge, this problem has not been studied in the literature. This paper proposes a method to deal with the problem. Experimental results based on real-life datasets show promising results.
6 0.50280869 131 acl-2011-Extracting Opinion Expressions and Their Polarities - Exploration of Pipelines and Joint Models
7 0.49430612 323 acl-2011-Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
8 0.4880887 229 acl-2011-NULEX: An Open-License Broad Coverage Lexicon
9 0.48020273 211 acl-2011-Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political Debates
10 0.47189918 222 acl-2011-Model-Portability Experiments for Textual Temporal Analysis
11 0.43960342 67 acl-2011-Clairlib: A Toolkit for Natural Language Processing, Information Retrieval, and Network Analysis
12 0.42694187 231 acl-2011-Nonlinear Evidence Fusion and Propagation for Hyponymy Relation Mining
13 0.42103299 314 acl-2011-Typed Graph Models for Learning Latent Attributes from Names
14 0.40681714 289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic
15 0.38723227 136 acl-2011-Finding Deceptive Opinion Spam by Any Stretch of the Imagination
16 0.38215792 19 acl-2011-A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content
17 0.37719375 299 acl-2011-The Arabic Online Commentary Dataset: an Annotated Dataset of Informal Arabic with High Dialectal Content
18 0.37187165 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews
19 0.36893585 297 acl-2011-That's What She Said: Double Entendre Identification
20 0.36848813 84 acl-2011-Contrasting Opposing Views of News Articles on Contentious Issues
topicId topicWeight
[(5, 0.023), (17, 0.034), (18, 0.29), (26, 0.028), (37, 0.079), (39, 0.026), (41, 0.045), (53, 0.049), (55, 0.012), (59, 0.061), (72, 0.02), (88, 0.019), (91, 0.057), (96, 0.165), (97, 0.015)]
simIndex simValue paperId paperTitle
1 0.76306427 121 acl-2011-Event Discovery in Social Media Feeds
Author: Edward Benson ; Aria Haghighi ; Regina Barzilay
Abstract: We present a novel method for record extraction from social streams such as Twitter. Unlike typical extraction setups, these environments are characterized by short, one-sentence messages with heavily colloquial speech. To further complicate matters, individual messages may not express the full relation to be uncovered, as is often assumed in extraction tasks. We develop a graphical model that addresses these problems by learning a latent set of records and a record-message alignment simultaneously; the output of our model is a set of canonical records, the values of which are consistent with aligned messages. We demonstrate that our approach is able to accurately induce event records from Twitter messages, evaluated against events from a local city guide. Our method achieves significant error reduction over baseline methods.
same-paper 2 0.74187833 162 acl-2011-Identifying the Semantic Orientation of Foreign Words
Author: Ahmed Hassan ; Amjad AbuJbara ; Rahul Jha ; Dragomir Radev
Abstract: We present a method for identifying the positive or negative semantic orientation of foreign words. Identifying the semantic orientation of words has numerous applications in the areas of text classification, analysis of product reviews, analysis of responses to surveys, and mining online discussions. Identifying the semantic orientation of English words has been extensively studied in the literature. Most of this work assumes the existence of resources (e.g. WordNet, seeds, etc.) that do not exist in foreign languages. In this work, we describe a method based on constructing a multilingual network connecting English and foreign words. We use this network to identify the semantic orientation of foreign words based on connections between words in the same language as well as multilingual connections. The method is experimentally tested using a manually labeled set of positive and negative words and has shown very promising results.
3 0.66051918 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment
Author: Yashar Mehdad ; Matteo Negri ; Marcello Federico
Abstract: This paper explores the use of bilingual parallel corpora as a source of lexical knowledge for cross-lingual textual entailment. We claim that, in spite of the inherent difficulties of the task, phrase tables extracted from parallel data make it possible to capture both lexical relations between single words and contextual information useful for inference. We experiment with a phrasal matching method in order to: i) build a system portable across languages, and ii) evaluate the contribution of lexical knowledge in isolation, without interaction with other inference mechanisms. Results achieved on an English-Spanish corpus obtained from the RTE3 dataset support our claim, with an overall accuracy above average scores reported by RTE participants on monolingual data. Finally, we show that using parallel corpora to extract paraphrase tables reveals their potential also in the monolingual setting, improving the results achieved with other sources of lexical knowledge.
4 0.66040182 182 acl-2011-Joint Annotation of Search Queries
Author: Michael Bendersky ; W. Bruce Croft ; David A. Smith
Abstract: Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation is an important part of query processing and understanding in information retrieval systems. Due to their brevity and idiosyncratic structure, search queries pose a challenge to existing NLP tools. To address this challenge, we propose a probabilistic approach for performing joint query annotation. First, we derive a robust set of unsupervised independent annotations, using queries and pseudo-relevance feedback. Then, we stack additional classifiers on the independent annotations, and exploit the dependencies between them to further improve the accuracy, even with a very limited amount of available training data. We evaluate our method using a range of queries extracted from a web search log. Experimental results verify the effectiveness of our approach for both short keyword queries and verbose natural language queries.
5 0.57958621 67 acl-2011-Clairlib: A Toolkit for Natural Language Processing, Information Retrieval, and Network Analysis
Author: Amjad Abu-Jbara ; Dragomir Radev
Abstract: In this paper we present Clairlib, an open-source toolkit for Natural Language Processing, Information Retrieval, and Network Analysis. Clairlib provides an integrated framework intended to simplify a number of generic tasks within and across those three areas. It has a command-line interface, a graphical interface, and a documented API. Clairlib is compatible with all the common platforms and operating systems. In addition to its own functionality, it provides interfaces to external software and corpora. Clairlib comes with comprehensive documentation and a rich set of tutorials and visual demos.
6 0.57670611 225 acl-2011-Monolingual Alignment by Edit Rate Computation on Sentential Paraphrase Pairs
7 0.57535326 131 acl-2011-Extracting Opinion Expressions and Their Polarities - Exploration of Pipelines and Joint Models
8 0.5749557 132 acl-2011-Extracting Paraphrases from Definition Sentences on the Web
9 0.57487059 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
10 0.57469273 66 acl-2011-Chinese sentence segmentation as comma classification
11 0.57452792 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
12 0.57411897 159 acl-2011-Identifying Noun Product Features that Imply Opinions
13 0.573901 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
14 0.57381779 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
15 0.57374132 323 acl-2011-Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
16 0.57305741 274 acl-2011-Semi-Supervised Frame-Semantic Parsing for Unknown Predicates
17 0.57297325 240 acl-2011-ParaSense or How to Use Parallel Corpora for Word Sense Disambiguation
18 0.57246357 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
19 0.57242876 37 acl-2011-An Empirical Evaluation of Data-Driven Paraphrase Generation Techniques
20 0.57061172 198 acl-2011-Latent Semantic Word Sense Induction and Disambiguation