acl acl2010 acl2010-105 knowledge-graph by maker-knowledge-mining

105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems

Source: pdf

Author: Jungi Kim ; Jin-Ji Li ; Jong-Hyeok Lee

Abstract: Subjectivity analysis is a rapidly growing field of study. Along with its applications to various NLP tasks, much work has gone into learning multilingual subjectivity from existing resources. Multilingual subjectivity analysis requires language-independent criteria for comparable outcomes across languages. This paper proposes to measure the multilanguage-comparability of subjectivity analysis tools, and provides meaningful comparisons of multilingual subjectivity analysis from various points of view.

Reference: text

Summary: the most important sentences, as selected by a tf-idf model

sentIndex sentText sentNum sentScore

1 Along with its applications to various NLP tasks, much work has gone into learning multilingual subjectivity from existing resources. [sent-4, score-0.916]

2 Multilingual subjectivity analysis requires language-independent criteria for comparable outcomes across languages. [sent-5, score-0.839]

3 This paper proposes to measure the multilanguage-comparability of subjectivity analysis tools, and provides meaningful comparisons of multilingual subjectivity analysis from various points of view. [sent-6, score-1.767]

4 [1 Introduction] The field of NLP has recently seen a surge in research on subjectivity analysis. [sent-7, score-0.733]

5 These endeavors have been successful in constructing lexicons, annotated corpora, and tools for subjectivity analysis in multiple languages. [sent-9, score-0.83]

6 , 2007) and TextMap, an entity search engine developed by Stony Brook University for sentiment analysis, among other functions (Bautin et al. [sent-11, score-0.194]

7 Though these systems currently rely on English analysis tools and machine translation (MT) technology to [sent-13, score-0.181]

8 translate other languages into English, up-to-date research provides various ways to analyze subjectivity in multilingual environments. [sent-18, score-1.073]

9 Given sentiment analysis systems in different languages, there are many situations in which the analysis outcomes need to be multilanguage-comparable. [sent-19, score-0.354]

10 Surveying these opinions and sentiments in various languages involves merging the analysis outcomes into a single database and thereby comparing the results objectively across languages. [sent-22, score-0.322]

11 If an ideal subjectivity analysis system existed for each language, evaluating the multilanguage-comparability would be unnecessary, because the analysis in each language would correctly identify the exact meanings of all input texts regardless of the language. [sent-23, score-1.013]

12 However, current technology does not meet this requirement, so there is a clear need to define and measure the multilanguage-comparability of subjectivity analysis systems. [sent-24, score-0.846]

13 This paper proposes to evaluate the multilanguage-comparability of multilingual subjectivity analysis systems. [sent-25, score-0.975]

14 We build a number of subjectivity classifiers that distinguish subjective texts from objective ones, and measure their multilanguage-comparability according to our proposed evaluation method. [sent-26, score-1.031]

15 These approaches enable us to extend a monolingual system to many languages with a number of freely available NLP resources and tools. [sent-30, score-0.16]

16 [2 Related Work] Much research has recently gone into developing methods for multilingual subjectivity analysis. [sent-31, score-0.975]

17 (2008) proposed a number of approaches exploiting a bilingual dictionary, a parallel corpus, and an MT system to port the resources and systems available in English to languages with limited resources. [sent-37, score-0.287]

18 To overcome the shortcomings of available resources and to take advantage of ensemble systems, Wan (2008) and Wan (2009) explored methods for developing a hybrid system for Chinese using English and Chinese sentiment analyzers. [sent-41, score-0.216]

19 (2008) and Boiy and Moens (2009) have created manually annotated gold standards in target languages and studied various feature selection and learning techniques in machine learning approaches to analyze sentiments in multilingual web documents. [sent-43, score-0.433]

20 For learning multilingual subjectivity, the literature tentatively concludes that translating a lexicon is less dependable than translating a corpus in terms of preserving subjectivity (Mihalcea et al. [sent-44, score-1.01]

[Figure 1: Examples of sentiments in multilingual text]

21 Based on the observation that the performance of subjectivity analysis systems in comparable experimental settings differs between two languages, Banea et al. [sent-48, score-1.22]

22 (2008) have attributed the variations in the difficulty level of subjectivity learning to the differences in language construction. [sent-49, score-0.733]

23 (2008)’s system analyzes the sentiment scores of entities in multilingual news and blogs and adjusts these scores using language-specific entity sentiment probabilities. [sent-51, score-0.758]

24 [1 Motivation] The quality of a subjectivity analysis tool is measured by its ability to distinguish subjectivity from objectivity and/or positive sentiments from negative sentiments. [sent-53, score-1.596]

25 Consider two pairs of multilingual inputs in English and Korean, one pair with identical and one with different subjectivity meanings (Figure 1). [sent-55, score-1.028]

26 The first pair of texts carries a negative sentiment about how the release of a new electronics device might affect an emerging business market. [sent-56, score-0.21]

27 The second pair of texts share a similar positive sentiment about a mobile device’s battery capacity but with different strengths. [sent-58, score-0.21]

28 A good multilingual system must be able to identify the positive sentiments and distinguish the differences in their intensities. [sent-59, score-0.293]

29 The first approach requires multilingual texts aligned at the level of granularity at which the subjectivity analysis system works, for instance, the document, sentence, or phrase level. [sent-66, score-1.113]

30 Annotating this type of corpus can be efficient: because parallel texts must have identical semantic meanings, subjectivity-related annotations for one language can be projected into other languages without much loss of accuracy. [sent-68, score-0.238]

31 The latter approach accepts any pair of multilingual texts as long as they are annotated with labels and/or intensity. [sent-69, score-0.304]

32 In this case, evaluating the label consistency of a multilingual system is only as difficult as evaluating that of a monolingual system; we can produce all possible pairs of texts from test corpora annotated with labels for each language. [sent-70, score-0.367]
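The non-parallel alternative can be made concrete in a few lines. The sketch below is one possible reading, not the authors' implementation: it pairs every labeled text in one language with every labeled text in another and counts how often the system's same/different-label judgment matches the gold same/different-label judgment. The function name `pairwise_label_consistency` and the classifier callables are hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple

Labeled = Tuple[str, str]  # (text, gold label such as "S" or "O")


def pairwise_label_consistency(
    corpus_a: List[Labeled],
    corpus_b: List[Labeled],
    classify_a: Callable[[str], str],
    classify_b: Callable[[str], str],
) -> float:
    """Hypothetical metric: fraction of all cross-language text pairs whose
    predicted same/different-label relation matches the gold relation."""
    # Classify each text once instead of once per pair.
    pred_a = [(classify_a(text), gold) for text, gold in corpus_a]
    pred_b = [(classify_b(text), gold) for text, gold in corpus_b]
    total = 0
    matches = 0
    for (p_a, g_a), (p_b, g_b) in product(pred_a, pred_b):
        gold_same = g_a == g_b
        pred_same = p_a == p_b
        matches += int(gold_same == pred_same)
        total += 1
    return matches / total if total else 0.0
```

Because every cross-language pair is considered, the number of comparisons grows as the product of the two corpus sizes; this is one reason the parallel-text approach is cheaper to evaluate.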

33 In this paper, we use the first approach because it rests on a more principled assumption: we can reasonably hypothesize that text translated into another language by a skilled translator carries an identical semantic meaning and thereby conveys identical subjectivity. [sent-72, score-0.191]

34 For evaluation, we measure the consistency in the subjectivity labels and the correlation of subjectivity intensity scores of parallel texts. [sent-74, score-1.728]

35 [4 Multilingual Subjectivity System] We create a number of multilingual systems, each consisting of multiple subsystems that each process one language: one subsystem analyzes English, and the others analyze Korean, Chinese, and Japanese. [sent-77, score-0.404]

36 [1 Source Language System] We adopt the three systems described below as our source language systems: a state-of-the-art subjectivity classifier, a corpus-based system, and a lexicon-based system. [sent-80, score-0.885]

37 In addition, these systems cover the general spectrum of current approaches to subjectivity analysis. [sent-82, score-0.787]

38 State-of-the-art (S-SA): OpinionFinder is a publicly available NLP tool for subjectivity analysis (Wiebe and Riloff, 2005; Wilson et al. [sent-83, score-0.792]

39 The software and its resources have been widely used in the field of subjectivity analysis, and it has become the de facto standard system against which new systems are validated. [sent-85, score-0.868]

40 Of OpinionFinder’s two sentence-level subjectivity classifiers, we use the high-coverage one. [sent-86, score-0.809]

41 The classifier assesses a sentence’s subjectivity with a label and a score for confidence in its judgment. [sent-88, score-0.773]

42 We retrieve the sentence-level subjectivity labels for 11,111 sentences using the set of rules described in Wiebe and Riloff (2005). [sent-95, score-0.803]

43 The corpus is relatively balanced, with 55% subjective sentences. [sent-96, score-0.187]

44 Previous studies have found that, among several ML-based approaches, the SVM classifier generally performs well in many subjectivity analysis tasks (Pang et al. [sent-98, score-0.832]

45 Lexicon-based (S-LB): OpinionFinder contains a list of English subjectivity clue words with intensity labels (Wilson et al. [sent-103, score-0.853]

46 Riloff and Wiebe (2003) constructed a high-precision classifier for contiguous sentences using the number of strong and weak subjective words in the current and nearby sentences. [sent-106, score-0.259]

47 Using the lexicon, we build a simple, high-coverage rule-based subjectivity classifier. [sent-108, score-0.769]

48 Setting the scores of strong and weak subjective words as 1. [sent-109, score-0.22]

49 5, we evaluate the subjectivity of a given sentence as the sum of its clue words’ subjectivity scores; if the sum exceeds a threshold, the input is classified as subjective, and otherwise as objective. [sent-111, score-1.49]
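The rule-based lexicon classifier can be sketched as follows. The strong/weak weights and the threshold are placeholders, because the exact values are cut off in this excerpt. The `strongsubj`/`weaksubj` tags follow the naming used in the MPQA subjectivity clue lexicon that ships with OpinionFinder. The toy lexicon in the example is invented for illustration.

```python
from typing import Dict, Iterable, Tuple

# Placeholder values (assumption): the excerpt truncates the actual weights and threshold.
STRONG_WEIGHT = 1.0
WEAK_WEIGHT = 0.5
THRESHOLD = 1.0

CLUE_WEIGHTS = {"strongsubj": STRONG_WEIGHT, "weaksubj": WEAK_WEIGHT}


def subjectivity_score(tokens: Iterable[str], lexicon: Dict[str, str]) -> float:
    """Sum the weights of the clue words found in a sentence.

    `lexicon` maps a lowercased word to its clue type ("strongsubj" or
    "weaksubj"); words not in the lexicon contribute 0.
    """
    return sum(CLUE_WEIGHTS.get(lexicon.get(tok.lower(), ""), 0.0) for tok in tokens)


def classify(tokens: Iterable[str], lexicon: Dict[str, str],
             threshold: float = THRESHOLD) -> Tuple[str, float]:
    """Label a sentence subjective if its summed clue score exceeds the threshold."""
    score = subjectivity_score(tokens, lexicon)
    return ("subjective" if score > threshold else "objective"), score


if __name__ == "__main__":
    toy_lexicon = {"great": "strongsubj", "fairly": "weaksubj"}  # invented entries
    print(classify("Great battery , fairly cheap".split(), toy_lexicon))
    # prints ('subjective', 1.5)
```

The same function serves the target-language variant (T-LB) unchanged. You only swap the English lexicon for a translated one, which reflects the design described for T-LB.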

50 [2 Target Language System] To construct a target language system that leverages the resources available in the source language, we consider three approaches from previous literature: 1. [sent-115, score-0.206]

51 translating test sentences in the target language into the source language and inputting them into [sent-116, score-0.153]

52 translating a source language training corpus into the target language and creating a corpus-based system in the target language (Banea et al. [sent-126, score-0.29]

53 translating a subjectivity lexicon from the source language into the target language and creating a lexicon-based system in the target language (Mihalcea et al. [sent-128, score-1.024]

54 The first approach has a simple architecture, clearly separates the subjectivity and MT systems, and has only one subjectivity system, which makes it easier to maintain. [sent-130, score-1.466]

55 In the second and third approaches, a subjectivity system in the target language is constructed sharing corpora, rules, and/or features with the source language system. [sent-132, score-0.897]

56 Lexicon-based (T-LB): This classifier is identical to S-LB, except that the English lexicon is replaced by a lexicon in one of the target languages. [sent-138, score-0.179]

57 Table 1: Agreement on subjectivity (S for subjective, O objective) of 859 sentence chunks in Korean between two annotators (An. [sent-145, score-0.879]

58 Three human annotators who are fluent in the two languages manually annotated N-to-N sentence alignments for each language pair (KR-EN, KR-CH, KR-JP). [sent-159, score-0.167]

59 By keeping only the sentence chunks whose Korean chunk appears in all language pairs, we were left with 859 sentence chunk pairs. [sent-160, score-0.188]

60 The corpus was preprocessed with NLP tools for each language, and the Korean, Chinese, and Japanese texts were translated into English with the same web-based service used to translate the training corpus in Section 4. [sent-161, score-0.237]

61 Table 2: Agreement on projection of subjectivity (S for subjective, O objective) from Korean (KR) to English (EN) by one annotator. [sent-182, score-0.733]

62 To assess the performance of our subjectivity analysis systems, two native speakers of Korean manually annotated the Korean sentence chunks with Subjective and Objective labels (Table 1). [sent-183, score-0.944]

63 We set aside 743 sentence chunks that both annotators agreed on for the automatic evaluation of subjectivity analysis systems, thereby removing the borderline cases, which are difficult even for humans to assess. [sent-187, score-0.966]
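Building the agreement table and keeping only the agreed chunks is straightforward. The sketch below assumes the two annotators' labels are stored as lists aligned with the chunks. Both function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def agreement_table(ann1: Sequence[str], ann2: Sequence[str]) -> Counter:
    """Count (annotator-1 label, annotator-2 label) pairs, e.g. ("S", "O")."""
    if len(ann1) != len(ann2):
        raise ValueError("annotations must be aligned one-to-one")
    return Counter(zip(ann1, ann2))


def keep_agreed(chunks: Sequence[str], ann1: Sequence[str],
                ann2: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (chunk, label) for chunks on which both annotators agree,
    discarding the borderline cases where their labels differ."""
    if not (len(chunks) == len(ann1) == len(ann2)):
        raise ValueError("chunks and annotations must be aligned one-to-one")
    return [(c, a) for c, a, b in zip(chunks, ann1, ann2) if a == b]
```

The agreed subset corresponds to the diagonal of the agreement table, that is, the (S, S) and (O, O) cells. In the setting described above, this diagonal holds the 743 retained chunks.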

64 The corresponding sentence chunks in the other languages were extracted and tagged with the same labels as the Korean chunks. [sent-188, score-0.231]

65 In addition, to verify how consistently the subjectivity of the original texts is projected onto their translations, we carried out another manual annotation and agreement study with the Korean and English sentence chunks (Table 2). [sent-189, score-0.991]

66 (2007), where two annotators labeled the sentence subjectivity of a parallel text in different languages. [sent-191, score-0.817]

67 They reported that, as with monolingual annotations, most annotation disagreements are due to differences in the annotators’ judgments of subjectivity, and the rest to subjective meanings lost in translation and to figurative language such as irony. [sent-192, score-0.29]

68 To exclude the influence of annotators’ personal views on disagreements, the subjectivity of the English sentence chunks was manually annotated by one of the annotators of the Korean text. [sent-193, score-0.879]

69 Because both languages were judged by the same annotator, we speculate that any disagreement in the annotation accounts only for inconsistency in the subjectivity projection. [sent-194, score-0.733]

70 [Evaluation Metrics] To evaluate the multilanguage-comparability of subjectivity analysis systems, we measure 1) how consistently a system assigns subjectivity labels and 2) how closely its numeric confidence scores correlate across parallel texts in different languages. [sent-200, score-1.762]

71 In particular, we use Cohen’s kappa coefficient for the former and Pearson’s correlation coefficient for the latter. [sent-201, score-0.212]
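Both metrics are standard and can be computed directly. The sketch below is a plain-Python illustration, not the authors' evaluation code. It assumes each system's outputs on aligned chunks are stored as parallel lists: labels for κ and confidence scores for ρ.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two aligned label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n  # observed agreement
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)  # chance agreement
    if p_e == 1.0:
        return 1.0  # degenerate case: both sides use one identical label everywhere
    return (p_o - p_e) / (1.0 - p_e)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two aligned score sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

For example, `labels_a` could hold a system's labels for the English chunks and `labels_b` its labels for the aligned Korean chunks. On the toy input `["S", "S", "O", "O"]` versus `["S", "O", "O", "O"]`, observed agreement is 0.75 and chance agreement is 0.5, so κ = 0.5.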

72 [2 Subjectivity Classification] Our multilingual subjectivity analysis systems were evaluated on the test corpora described in Section 5. [sent-207, score-1.029]

73 The source language systems (S-SA, S-CB, S-LB) lose a small amount of precision when given translated input, but their recall in the target languages is generally on par or even higher. [sent-215, score-0.251]

74 Among the systems created from target language resources, corpus-based systems (T-CB) generally perform better than those built with source language resources (S-CB), whereas lexicon-based systems (T-LB) perform worse than S-LB. [sent-216, score-0.287]

75 The subjectivity analysis systems are evaluated on all language pairs using kappa and Pearson’s correlation coefficients. [sent-223, score-1.012]

76 We observe a distinct contrast in performance between corpus-based systems (S-CB and T-CB) and lexicon-based systems (S-LB and T-LB): all corpus-based systems show moderate agreement, while lexicon-based systems show only fair agreement. [sent-226, score-0.489]

77 Among lexicon-based systems, those in the target languages (T-LB) perform the worst, with only slight to fair agreement between languages. [sent-228, score-0.312]

78 The source-language lexicon-based and state-of-the-art systems (S-LB and S-SA) show average performance. [sent-229, score-0.17]

79 Table 3: Performance of subjectivity analysis with precision (P), recall (R), and F-measure (F). [sent-230, score-0.792]

80 The S-SA, S-CB, and S-LB entries for Korean, Chinese, and Japanese denote the English analysis systems given English translations of the target-language inputs. [sent-231, score-0.381]

81 Table 4: Performance of multilanguage-comparability: kappa coefficient (κ) for measuring the comparability of classification labels and Pearson’s correlation coefficient (ρ) for classification scores for English (EN), Korean (KR), Chinese (CH), and Japanese (JP). [sent-286, score-0.384]

82 Figure 3 shows scatter plots of subjectivity scores of our English and Korean test corpora evaluated on different systems; data points in the first and third quadrants are label agreements, and those in the second and fourth are disagreements. [sent-360, score-0.793]
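The quadrant reading of the scatter plots can be reproduced by comparing each aligned score pair against a decision threshold. The sketch assumes that a score above the threshold means subjective. The threshold is 0 here, so the quadrants are the ordinary ones. Both the sign convention and the function name are assumptions, not details taken from the paper.

```python
from typing import Dict, Sequence


def quadrant_counts(scores_x: Sequence[float], scores_y: Sequence[float],
                    threshold: float = 0.0) -> Dict[str, int]:
    """Bucket aligned score pairs (e.g. English vs. Korean) into quadrants.

    Assumption: a score strictly above `threshold` means subjective; a score
    at or below it means objective.
    """
    counts = {"agree_subjective": 0, "agree_objective": 0, "disagree": 0}
    for x, y in zip(scores_x, scores_y):
        subj_x, subj_y = x > threshold, y > threshold
        if subj_x and subj_y:
            counts["agree_subjective"] += 1   # first quadrant
        elif not subj_x and not subj_y:
            counts["agree_objective"] += 1    # third quadrant
        else:
            counts["disagree"] += 1           # second or fourth quadrant
    return counts
```

The two agreement counts together give the observed-agreement term that enters κ. How tightly the points cluster along the diagonal is what ρ captures.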

83 Figure 3a shows a moderate correlation for multilingual results from the state-of-the-art system (S-SA). [sent-362, score-0.287]

84 Agreements on objective instances are clustered together while agreements on subjective instances are diffused over a wide region. [sent-363, score-0.339]

85 Agreements between the source language corpus-based system (S-CB) and the corpus-based system trained with translated resources (T-CB) are more strongly correlated than the results for other pairs of systems (Figures 3b and 3d). [sent-364, score-0.367]

86 We observe that the results from the English system with translated inputs (S-LB) are more strongly correlated than those from systems with translated lexicons (T-LB), and that the analysis results from both systems are biased toward subjective scores. [sent-367, score-0.613]

87 [6 Discussion] Which approach is most suitable for multilingual subjectivity analysis? [sent-368, score-0.916]

88 In our experiments, the corpus-based systems trained on corpora translated from English into the target languages (T-CB) perform well overall on both subjectivity classification and multilanguage-comparability measures. [sent-369, score-1.081]

89 We again employed Pearson’s correlation coefficient to measure the correlations of precision (P), recall (R), and F-measure (F) with kappa (κ) and Pearson’s correlation (ρ) values. [sent-376, score-0.233]

90 Specifically, we measure the correlations between the sums of P, the sums of R, and the sums of F to κ and ρ for all pairs of systems. [sent-377, score-0.152]
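One way to set up this analysis is sketched below. For every language pair within a system, it adds the two languages' metric values (e.g. precision) and correlates those sums with the pair's κ or ρ. The dictionary layout and the function name are assumptions made for illustration, and the numbers in the check are synthetic.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson's correlation coefficient between two aligned sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def metric_sum_correlation(
    metric: Dict[Tuple[str, str], float],
    comparability: Dict[Tuple[str, str, str], float],
) -> float:
    """Correlate per-pair metric sums with per-pair comparability scores.

    Hypothetical layout:
      metric        {(system, language): P, R, or F value}
      comparability {(system, lang1, lang2): kappa or rho for that language pair}
    """
    sums, scores = [], []
    for (system, lang1, lang2), value in comparability.items():
        sums.append(metric[(system, lang1)] + metric[(system, lang2)])
        scores.append(value)
    return pearson(sums, scores)
```

Running the function once each for P, R, and F, and once each against κ and ρ, yields the six correlations the analysis calls for.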

91 However, we cannot always expect a high-precision multilingual subjectivity classifier to be multilanguage-comparable as well. [sent-389, score-0.988]

92 We implemented a number of previously proposed approaches to learning multilingual subjectivity, and evaluated the systems on multilanguage-comparability as well as classification performance. [sent-392, score-0.27]

93 Our experimental results provide meaningful comparisons of the multilingual subjectivity analysis systems across various aspects. [sent-393, score-0.846]

94 Also, we developed a multilingual subjectivity evaluation corpus from a parallel text, studied inter-annotator and inter-language agreement on subjectivity, and observed persistent subjectivity projections from one language to another in a parallel text. [sent-394, score-1.853]

95 For future work, we aim to extend this work by constructing a multilingual sentiment analysis system and evaluating it with multilingual datasets such as product reviews collected from different countries. [sent-395, score-0.599]

96 We also plan to resolve the lexicon-based classifiers’ classification bias towards subjective meanings with a list of objective words (Esuli and Sebastiani, 2006) and their multilingual expansion (Kim et al. [sent-396, score-0.523]

97 A machine learning approach to sentiment analysis in multilingual Web texts. [sent-418, score-0.194]

98 Found in translation: Conveying subjectivity of a lexicon of one language into another using a bilingual dictionary and a link analysis algorithm. [sent-441, score-0.893]

99 Using bilingual knowledge and ensemble techniques for unsupervised Chinese sentiment analysis. [sent-459, score-0.164]

100 Creating subjective and objective sentence classifiers from unannotated texts. [sent-469, score-0.247]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('subjectivity', 0.733), ('korean', 0.272), ('subjective', 0.187), ('multilingual', 0.183), ('sentiment', 0.135), ('banea', 0.122), ('agreements', 0.116), ('translated', 0.083), ('chunks', 0.082), ('languages', 0.079), ('bautin', 0.079), ('pearson', 0.078), ('kappa', 0.077), ('texts', 0.075), ('wiebe', 0.074), ('kr', 0.074), ('intensity', 0.074), ('inputted', 0.072), ('sentiments', 0.071), ('mihalcea', 0.07), ('correlation', 0.065), ('mpqa', 0.064), ('prf', 0.063), ('target', 0.063), ('janyce', 0.062), ('source', 0.062), ('kim', 0.06), ('analysis', 0.059), ('japanese', 0.056), ('systems', 0.054), ('english', 0.049), ('meanings', 0.048), ('opinionfinder', 0.047), ('outcomes', 0.047), ('chinese', 0.047), ('labels', 0.046), ('parallel', 0.044), ('riloff', 0.044), ('agreement', 0.044), ('resources', 0.042), ('translate', 0.041), ('performances', 0.041), ('annotators', 0.04), ('wilson', 0.04), ('classifier', 0.04), ('identical', 0.04), ('system', 0.039), ('opinions', 0.038), ('wan', 0.038), ('tools', 0.038), ('mt', 0.038), ('analyze', 0.037), ('analyzes', 0.037), ('objective', 0.036), ('dictionary', 0.036), ('boiy', 0.036), ('cesarano', 0.036), ('donga', 0.036), ('highcoverage', 0.036), ('lexiconbased', 0.036), ('multilanguagecomparability', 0.036), ('oasys', 0.036), ('pohang', 0.036), ('ch', 0.036), ('lexicon', 0.036), ('corpusbased', 0.035), ('coefficient', 0.035), ('jp', 0.034), ('sums', 0.034), ('classification', 0.033), ('carried', 0.033), ('scores', 0.033), ('abbasi', 0.032), ('jungi', 0.032), ('excerpts', 0.032), ('highprecision', 0.032), ('translation', 0.03), ('lexicons', 0.03), ('chunk', 0.029), ('opinion', 0.029), ('icwsm', 0.029), ('bilingual', 0.029), ('news', 0.028), ('thereby', 0.028), ('translating', 0.028), ('en', 0.028), ('xiaojun', 0.027), ('scatter', 0.027), ('korea', 0.027), ('comparability', 0.027), ('carmen', 0.026), ('precisions', 0.026), ('daily', 0.026), ('esuli', 0.026), ('correlations', 0.026), ('ellen', 0.025), ('disagreements', 0.025), 
('sentence', 0.024), ('pairs', 0.024), ('correlated', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999923 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems

Author: Jungi Kim ; Jin-Ji Li ; Jong-Hyeok Lee

2 0.23421392 123 acl-2010-Generating Focused Topic-Specific Sentiment Lexicons

Author: Valentin Jijkoun ; Maarten de Rijke ; Wouter Weerkamp

Abstract: We present a method for automatically generating focused and accurate topic-specific subjectivity lexicons from a general purpose polarity lexicon that allow users to pin-point subjective on-topic information in a set of relevant documents. We motivate the need for such lexicons in the field of media analysis, describe a bootstrapping method for generating a topic-specific lexicon from a general purpose polarity lexicon, and evaluate the quality of the generated lexicons both manually and using a TREC Blog track test set for opinionated blog post retrieval. Although the generated lexicons can be an order of magnitude more selective than the general purpose lexicon, they maintain, or even improve, the performance of an opinion retrieval system.

3 0.14527564 210 acl-2010-Sentiment Translation through Lexicon Induction

Author: Christian Scheible

Abstract: The translation of sentiment information is a task from which sentiment analysis systems can benefit. We present a novel, graph-based approach using SimRank, a well-established vertex similarity algorithm to transfer sentiment information between a source language and a target language graph. We evaluate this method in comparison with SO-PMI.

4 0.12222114 141 acl-2010-Identifying Text Polarity Using Random Walks

Author: Ahmed Hassan ; Dragomir Radev

Abstract: Automatically identifying the polarity of words is a very important task in Natural Language Processing. It has applications in text classification, text filtering, analysis of product review, analysis of responses to surveys, and mining online discussions. We propose a method for identifying the polarity of words. We apply a Markov random walk model to a large word relatedness graph, producing a polarity estimate for any given word. A key advantage of the model is its ability to accurately and quickly assign a polarity sign and magnitude to any word. The method could be used both in a semi-supervised setting where a training set of labeled words is used, and in an unsupervised setting where a handful of seeds is used to define the two polarity classes. The method is experimentally tested using a manually labeled set of positive and negative words. It outperforms the state of the art methods in the semi-supervised setting. The results in the unsupervised setting are comparable to the best reported values. However, the proposed method is faster and does not need a large corpus.

5 0.11374415 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications

Author: Bin Wei ; Christopher Pal

Abstract: In this paper, we study the problem of using an annotated corpus in English for the same natural language processing task in another language. While various machine translation systems are available, automated translation is still far from perfect. To minimize the noise introduced by translations, we propose to use only key “reliable” parts from the translations and apply structural correspondence learning (SCL) to find a low dimensional representation shared by the two languages. We perform experiments on an English-Chinese sentiment classification task and compare our results with a previous cotraining approach. To alleviate the problem of data sparseness, we create extra pseudo-examples for SCL by making queries to a search engine. Experiments on real-world on-line review data demonstrate the two techniques can effectively improve the performance compared to previous work.

6 0.11121817 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction

7 0.10868617 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse

8 0.10817326 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree

9 0.10244382 18 acl-2010-A Study of Information Retrieval Weighting Schemes for Sentiment Analysis

10 0.090592682 22 acl-2010-A Unified Graph Model for Sentence-Based Opinion Retrieval

11 0.088322386 188 acl-2010-Optimizing Informativeness and Readability for Sentiment Summarization

12 0.085336655 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes

13 0.082424238 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification

14 0.078925245 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning

15 0.072645091 195 acl-2010-Phylogenetic Grammar Induction

16 0.072578467 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages

17 0.06627316 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews

18 0.064217977 79 acl-2010-Cross-Lingual Latent Topic Extraction

19 0.062672749 162 acl-2010-Learning Common Grammar from Multilingual Corpus

20 0.062340248 50 acl-2010-Bilingual Lexicon Generation Using Non-Aligned Signatures

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.174), (1, 0.047), (2, -0.151), (3, 0.18), (4, -0.067), (5, 0.024), (6, -0.025), (7, 0.049), (8, -0.012), (9, 0.049), (10, 0.05), (11, 0.053), (12, 0.018), (13, -0.011), (14, -0.065), (15, -0.045), (16, 0.117), (17, -0.095), (18, 0.037), (19, -0.047), (20, -0.098), (21, -0.048), (22, 0.005), (23, -0.124), (24, -0.016), (25, -0.037), (26, 0.001), (27, 0.03), (28, -0.048), (29, -0.066), (30, -0.041), (31, -0.001), (32, 0.012), (33, -0.053), (34, -0.008), (35, -0.062), (36, 0.013), (37, 0.003), (38, -0.076), (39, -0.113), (40, 0.009), (41, -0.083), (42, 0.039), (43, 0.107), (44, 0.094), (45, 0.054), (46, -0.013), (47, 0.124), (48, -0.025), (49, -0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93793702 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems

Author: Jungi Kim ; Jin-Ji Li ; Jong-Hyeok Lee

2 0.75226688 210 acl-2010-Sentiment Translation through Lexicon Induction

Author: Christian Scheible

3 0.65749961 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification

Author: Ainur Yessenalina ; Yejin Choi ; Claire Cardie

Abstract: One of the central challenges in sentiment-based text categorization is that not every portion of a document is equally informative for inferring the overall sentiment of the document. Previous research has shown that enriching the sentiment labels with human annotators’ “rationales” can produce substantial improvements in categorization performance (Zaidan et al., 2007). We explore methods to automatically generate annotator rationales for document-level sentiment classification. Rather unexpectedly, we find the automatically generated rationales just as helpful as human rationales.

4 0.63702005 123 acl-2010-Generating Focused Topic-Specific Sentiment Lexicons

Author: Valentin Jijkoun ; Maarten de Rijke ; Wouter Weerkamp

5 0.62723827 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree

Author: Wei Wei ; Jon Atle Gulla

Abstract: Existing work on sentiment analysis of product reviews suffers from the following limitations: (1) The knowledge of hierarchical relationships of product attributes is not fully utilized. (2) Reviews or sentences mentioning several attributes associated with complicated sentiments are not dealt with very well. In this paper, we propose a novel HL-SOT approach to labeling a product’s attributes and their associated sentiments in product reviews by a Hierarchical Learning (HL) process with a defined Sentiment Ontology Tree (SOT). The empirical analysis against a human-labeled data set demonstrates promising and reasonable performance of the proposed HL-SOT approach. While this paper mainly focuses on sentiment analysis of reviews of one product, our proposed HL-SOT approach easily generalizes to labeling a mix of reviews of more than one product.

6 0.58250099 18 acl-2010-A Study of Information Retrieval Weighting Schemes for Sentiment Analysis

7 0.56137234 141 acl-2010-Identifying Text Polarity Using Random Walks

8 0.5568257 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages

9 0.50840557 176 acl-2010-Mood Patterns and Affective Lexicon Access in Weblogs

10 0.48365802 188 acl-2010-Optimizing Informativeness and Readability for Sentiment Summarization

11 0.47644398 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications

12 0.44922575 50 acl-2010-Bilingual Lexicon Generation Using Non-Aligned Signatures

13 0.44564381 195 acl-2010-Phylogenetic Grammar Induction

14 0.42371783 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning

15 0.41897956 79 acl-2010-Cross-Lingual Latent Topic Extraction

16 0.41838616 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web

17 0.4006525 92 acl-2010-Don't 'Have a Clue'? Unsupervised Co-Learning of Downward-Entailing Operators.

18 0.39162591 22 acl-2010-A Unified Graph Model for Sentence-Based Opinion Retrieval

19 0.39099228 104 acl-2010-Evaluating Machine Translations Using mNCD

20 0.38290682 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.288), (25, 0.025), (42, 0.069), (44, 0.029), (59, 0.082), (71, 0.017), (73, 0.074), (78, 0.025), (80, 0.017), (83, 0.099), (84, 0.023), (98, 0.138)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.95035499 234 acl-2010-The Use of Formal Language Models in the Typology of the Morphology of Amerindian Languages

Author: Andres Osvaldo Porta

Abstract: The aim of this work is to present some preliminary results of an ongoing investigation into the typology of the morphology of native South American languages from the point of view of formal language theory. To this end, we give two contrasting examples of descriptions of the finite verb form morphology of two Aboriginal languages: Argentinean Quechua (quichua santiagueño) and Toba. The description of the morphology of the finite verb forms of Argentinean Quechua uses finite automata and finite transducers. In this case the construction is straightforward using two-level morphology, and it describes Argentinean Quechua morphology in a very natural way using a regular language. In contrast, Toba verb morphology, with a system that simultaneously uses prefixes and suffixes, does not have a natural description as a regular language. Toba has a complex system of causative suffixes, whose successive applications determine the use of prefixes belonging to different person-marking prefix sets. We adopt the solution of Creider et al. (1995) to deal naturally with this and other similar morphological processes that involve interactions between prefixes and suffixes, and we then describe Toba morphology using linear context-free languages.

2 0.83704293 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs

Author: Barbara McGillivray

Abstract: We present a system that automatically induces Selectional Preferences (SPs) for Latin verbs from two treebanks by using Latin WordNet. Our method overcomes some of the problems connected with data sparseness and the small size of the input corpora. We also suggest a way to evaluate the acquired SPs on unseen events extracted from other Latin corpora.

same-paper 3 0.79463124 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems

Author: Jungi Kim ; Jin-Ji Li ; Jong-Hyeok Lee

4 0.76966083 207 acl-2010-Semantics-Driven Shallow Parsing for Chinese Semantic Role Labeling

Author: Weiwei Sun

Abstract: One deficiency of current shallow parsing based Semantic Role Labeling (SRL) methods is that syntactic chunks are too small to effectively group words. To partially resolve this problem, we propose semantics-driven shallow parsing, which takes into account both syntactic structures and predicate-argument structures. We also introduce several new “path” features to improve shallow parsing based SRL method. Experiments indicate that our new method obtains a significant improvement over the best reported Chinese SRL result.

5 0.69608319 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information

Author: WenTing Wang ; Jian Su ; Chew Lim Tan

Abstract: Syntactic knowledge is important for discourse relation recognition. Yet so far only heuristically selected flat paths and 2-level production rules have been used to incorporate such information. In this paper we propose using a tree kernel based approach to automatically mine the syntactic information from the parse trees for discourse analysis, applying the kernel function to the tree structures directly. These structural syntactic features, together with other standard flat features, are incorporated into our composite kernel to capture diverse knowledge for simultaneous discourse identification and classification of both explicit and implicit relations. The experiments show that the tree kernel approach gives statistically significant improvements over flat syntactic path features. We also illustrate that the tree kernel approach covers more structural information than the production rules, which allows the tree kernel to further incorporate information from a higher-dimensional space for possibly better discrimination. Besides, we further propose to leverage temporal ordering information to constrain the interpretation of discourse relations, which also yields statistically significant improvements for discourse relation recognition on PDTB 2.0 for both explicit and implicit relations.

6 0.61068034 146 acl-2010-Improving Chinese Semantic Role Labeling with Rich Syntactic Features

7 0.60550612 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese

8 0.60264558 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse

9 0.58722645 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews

10 0.58508122 116 acl-2010-Finding Cognate Groups Using Phylogenies

11 0.57960653 22 acl-2010-A Unified Graph Model for Sentence-Based Opinion Retrieval

12 0.57782304 214 acl-2010-Sparsity in Dependency Grammar Induction

13 0.57109046 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications

14 0.5660758 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification

15 0.56605029 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes

16 0.56529105 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features

17 0.56508166 71 acl-2010-Convolution Kernel over Packed Parse Forest

18 0.56431878 56 acl-2010-Bridging SMT and TM with Translation Recommendation

19 0.56383479 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing

20 0.5609318 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.