acl acl2011 acl2011-84 knowledge-graph by maker-knowledge-mining

84 acl-2011-Contrasting Opposing Views of News Articles on Contentious Issues

Source: pdf

Author: Souneil Park ; Kyung Soon Lee ; Junehwa Song

Abstract: We present a disputant relation-based method for classifying news articles on contentious issues. We observe that the disputants of a contention are an important feature for understanding the discourse. The method performs unsupervised classification on news articles based on disputant relations, and helps readers intuitively view the articles through the opponent-based frame. Readers can attain a balanced understanding of the contention, free from a specific biased view. We applied a modified version of the HITS algorithm and an SVM classifier trained with pseudo-relevant data for article analysis.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We present a disputant relation-based method for classifying news articles on contentious issues. [sent-4, score-0.979]

2 We observe that the disputants of a contention are an important feature for understanding the discourse. [sent-5, score-0.689]

3 It performs unsupervised classification on news articles based on disputant relations, and helps readers intuitively view the articles through the opponent-based frame. [sent-6, score-1.071]

4 However, news articles are frequently biased and fail to fairly deliver conflicting arguments of the issue. [sent-11, score-0.368]

5 In this paper, we present a disputant relation-based method for classifying news articles on contentious issues. [Affiliation footnote: Chonbuk National University, 664-14 1ga Deokjin-dong, Jeonju, Jeonbuk, Republic of Korea] [sent-14, score-0.842]

6 We observe that the disputants of a contention, i. [sent-17, score-0.44]

7 News producers primarily shape an article on a contention by selecting and covering specific disputants (Baker. [sent-21, score-0.841]

8 Readers also intuitively understand the contention by identifying who the opposing disputants are. [sent-23, score-0.856]

9 It performs classification in an unsupervised manner: it dynamically identifies opposing disputant groups and classifies the articles according to their positions. [sent-25, score-1.001]

10 As such, it effectively helps readers contrast articles of a contention and attain balanced understanding, free from specific biased viewpoints. [sent-26, score-0.472]

11 For the contention on the health care bill, an article may discuss the enlarged coverage whereas another may discuss the increase of insurance premiums. [sent-37, score-0.373]

12 In addition, we observe that opposing arguments of a contention are often complex to classify under these frames. [sent-38, score-0.507]

13 [Page footer: 2011 Association for Computational Linguistics, pages 340–349] For example, in a political contention on holding a referendum on the Sejong project, the opposition parties strongly opposed and criticized the president's office. [sent-41, score-0.462]

14 We demonstrate that the opponent-based frame is clear and effective for contrasting opposing views of contentious issues. [sent-44, score-0.435]

15 The frame does not require the documents to discuss common topics nor the opposing arguments to be positive vs. [sent-47, score-0.424]

16 On the other hand, the opposing disputants compete for news coverage to influence more readers and gain support (Miller et al. [sent-52, score-0.725]

17 Thus, the method focuses on identifying the disputants of each side and classifying the articles based on the side it covers. [sent-54, score-0.796]

18 We applied a modified version of HITS algorithm to identify the key opponents of an issue, and used disputant extraction techniques combined with an SVM classifier for article analysis. [sent-55, score-0.94]

19 The discourse of contentious issues in news articles shows different characteristics from that studied in sentiment classification tasks. [sent-87, score-0.495]

20 First, the opponents of a contentious issue often discuss different topics, as discussed in the example above. [sent-88, score-0.375]

21 Research in mass communication has shown that opposing disputants talk across each other, not by dialogue, i. [sent-89, score-0.589]

22 We frequently observed both sides of a contention articulating negative arguments attacking each other. [sent-95, score-0.422]

23 The forms of arguments are also too complex and diverse to classify as positive or negative; for example, an argument may just neglect the opponent's argument without positive or negative expressions, or emphasize a different discussion point. [sent-96, score-0.347]

24 For example, a news article can cast a negative light on a government program simply by covering the increase of deficit caused by it. [sent-99, score-0.351]

25 They assume a debate frame, which is similar to the frame of the sentiment classification task, i. [sent-105, score-0.351]

26 All articles of a debate in their corpus cover a coherent debate topic, e. [sent-109, score-0.427]

27 This debate frame is often not appropriate for contentious issues for similar reasons as the positive/negative frame. [sent-116, score-0.452]

28 In contrast, our method does not assume a fixed debate frame, and rather develops one based on the opponents of the contention at hand. [sent-117, score-0.567]

29 News articles of a contentious issue are more diverse than debate articles conveying explicit argument of a specific side. [sent-119, score-0.671]

30 There are news articles which cover both sides, facts without explicit opinions, and different topics unrelated to the arguments of either side. [sent-120, score-0.364]

31 However, these works also assume the same debate frame and use the debate corpus, e. [sent-124, score-0.401]

32 The selected issues range over diverse domains such as politics, local, diplomacy, and economy; one example is the contention on the 4 river project, of which the key opponents are the government vs. [sent-139, score-0.604]

33 Second, they classified the articles which mainly deliver arguments for the topic to the “positive” category and those delivering arguments against the topic to the “negative” category. [sent-156, score-0.46]

34 The articles are classified to the “Other” category if they do not deal with the main topic nor cover positive or negative arguments. [sent-157, score-0.402]

35 Second, we asked the annotators to classify articles to a specific side if the articles cover only the positions, arguments, or information supportive of that side, or if they cover information detrimental to or critical of the opposite side. [sent-159, score-0.688]

36 This is because the frame is more flexible to classify diverse articles of an issue, such as those covering arguments on different points, and those covering detrimental facts to a specific side without explicit positive or negative arguments. [sent-171, score-0.716]

37 The agreement was low especially when the main topic of the contention was interpreted differently among the annotators; the main topic was interpreted differently for issue 3, 7, 8, and 9. [sent-174, score-0.403]

38 Even when a disputant was assumed to have a positive attitude towards the topic, the disputant's main argument was not about the topic but about attacking the opponent. The annotators all agreed that the opponent-based frame is more effective for understanding the contention. [sent-177, score-0.914]

39 It attempts to identify the two opposing groups of the issue at hand, and analyzes whether an article more reflects the position of a specific side. [sent-179, score-0.356]

40 In this competing process, news articles may give more chance of speaking to a specific side, explain or elaborate them, or provide supportive facts of that side (Baker 1994). [sent-183, score-0.38]

41 1 Disputant Extraction In this stage, the disputants who participate in the contention have to be extracted. [sent-188, score-0.689]

42 We exploit the fact that many disputants appear as the subject of quotes in the news article set. [sent-189, score-0.914]

43 The articles actively quote disputants or cover their actions in order to convey the contention vividly. [sent-190, score-0.542]

44 The methods were effective in practice as quotes of articles frequently had a regular pattern. [sent-192, score-0.394]

45 The sentences which convey an utterance without double quotes, and those describing the action of a disputant are considered as indirect quotes (See the translated example 1 below). [sent-195, score-0.863]

46 2 Disputant Partitioning We develop a key opponent-based partitioning method for disputant partitioning. [sent-207, score-0.733]

47 The other disputants are divided according to their relation with the key opponents, i. [sent-209, score-0.489]

48 The intuition behind the method is that there usually exist key opponents who represent the contention, and many participants argue about the key opponents whereas they seldom recognize and talk about minor disputants. [sent-212, score-0.511]

49 For instance, in the contention on “investigation result of the Cheonan sinking incident”, the government of North Korea and that of South Korea are the key opponents; other disputants, such as politicians, experts, civic group of South Korea, the government of U. [sent-213, score-0.458]

50 Thus, it is effective to analyze where the disputants stand regarding their attitude toward the key opponents. [sent-216, score-0.489]

51 Selecting key opponents: In order to identify the key opponents of the issue, we search for the disputants who frequently criticize, and are also criticized by other disputants. [sent-217, score-0.798]

52 A sentence is considered to express the disputant's criticism of another disputant if the following holds: 1) the sentence is a quote, 2) the disputant is the subject of the quote, 3) another disputant appears in the quote, and 4) a word from the negative lexicon appears in the sentence. [sent-223, score-1.847]

53 On the other hand, if the disputant is not the subject but appears in the quote, the sentence is considered to express a criticism about the disputant made by another disputant (See example 3. [sent-224, score-1.79]

54 The disputants are written in italic, and negative words are in boldface. [sent-225, score-0.497]

55 Each disputant is modeled as a node, and a link is made from a criticizing disputant to a criticized disputant. [sent-232, score-1.286]
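
The four conditions in sentence 52 and the node/link model in sentence 55 can be sketched roughly as follows. This is only an illustration: the quote representation (a dict with `subject` and `text`), the toy lexicon, and the function name `criticism_links` are assumptions made here, not details taken from the paper.

```python
from collections import Counter

# Hypothetical toy lexicon; the paper's negative lexicon is not listed in this summary.
NEGATIVE_WORDS = {"criticized", "denounced", "irresponsible"}

def criticism_links(quotes, disputants, negative_words=NEGATIVE_WORDS):
    """Count criticism links (criticizing disputant -> criticized disputant).

    Each quote is a dict with 'subject' (the quoted disputant) and 'text'.
    A link is counted when the subject is a known disputant, another
    disputant is mentioned in the quote, and a negative word appears.
    """
    links = Counter()
    for quote in quotes:
        speaker = quote["subject"]
        if speaker not in disputants:
            continue
        text = quote["text"].lower()
        tokens = set(text.replace(",", " ").replace(".", " ").split())
        if not tokens & negative_words:
            continue
        for other in disputants:
            if other != speaker and other.lower() in text:
                links[(speaker, other)] += 1
    return links

quotes = [
    {"subject": "Opposition", "text": "The Opposition criticized the Government over the plan."},
    {"subject": "Government", "text": "The Government said the Opposition is irresponsible."},
    {"subject": "Government", "text": "The Government welcomed the ruling."},
]
links = criticism_links(quotes, {"Opposition", "Government"})
print(dict(links))
```

The resulting counts can serve as link weights in the graph, in the spirit of sentence 61.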

56 The hub score of a node increases if it links to nodes with high authority scores, and the authority score increases if it is pointed to by many nodes with high hub scores. [sent-244, score-0.384]

57 It enables us to separately measure the significance of a disputant's criticism (using the hub score) and the criticism about the disputant (using the authority score). [sent-246, score-0.839]

58 We aim to find the nodes which have both a high hub score and a high authority score; the key opponents will have many links to others and also be pointed to by many nodes. [sent-247, score-0.433]

59 The initial hub score of a node is set to the number of quotes in which the corresponding disputant is the subject. [sent-250, score-0.915]

60 The initial authority score is set to the number of quotes in which the disputant appears but not as the subject. [sent-251, score-0.891]

61 In addition, the weight of each link (from a criticizing disputant to a criticized disputant) is set to the number of sentences that express such criticism. [sent-252, score-0.711]
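
A minimal sketch of a weighted HITS iteration with the initial scores and link weights described in sentences 59 to 61 is given below. The exact update rule, normalization, and number of iterations used by the authors are not shown in these excerpts, so the choices here (L2 normalization, 20 iterations, the name `weighted_hits`) are assumptions.

```python
import math

def _normalize(scores):
    """Scale scores to unit L2 norm; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}

def weighted_hits(links, init_hub, init_auth, iterations=20):
    """links: {(criticizing, criticized): weight}. Returns (hub, auth) dicts."""
    nodes = set(init_hub) | set(init_auth)
    for src, dst in links:
        nodes.update((src, dst))
    hub = {n: float(init_hub.get(n, 0)) for n in nodes}
    auth = {n: float(init_auth.get(n, 0)) for n in nodes}
    for _ in range(iterations):
        # Authority: weighted sum of hub scores of nodes pointing to this node.
        new_auth = {n: 0.0 for n in nodes}
        for (src, dst), weight in links.items():
            new_auth[dst] += weight * hub[src]
        # Hub: weighted sum of authority scores of nodes this node points to.
        new_hub = {n: 0.0 for n in nodes}
        for (src, dst), weight in links.items():
            new_hub[src] += weight * new_auth[dst]
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth

# Toy criticism graph: A and B criticize each other; C criticizes both.
links = {("A", "B"): 3, ("B", "A"): 2, ("C", "A"): 1, ("C", "B"): 1}
init_hub = {"A": 5, "B": 4, "C": 2}   # quotes with the disputant as subject
init_auth = {"A": 3, "B": 4, "C": 0}  # quotes mentioning the disputant otherwise
hub, auth = weighted_hits(links, init_hub, init_auth)
```

In this toy graph nobody criticizes C, so its authority score stays at zero, while A and B score on both measures, which is the profile the text associates with key opponents.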

62 More than two disputants can be selected if more than one disputant is active from a specific side. [sent-257, score-1.015]

63 In such cases, we choose the two disputants whose criticizing relationship is the strongest among the selected ones, i. [sent-258, score-0.508]
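
The selection step in sentences 62 and 63 might look like the sketch below. Sentence 63 is truncated, so the definition of the "strongest" criticizing relationship is not available here; summing the link weights in both directions is an assumption, as is the function name `strongest_pair`.

```python
from itertools import combinations

def strongest_pair(candidates, links):
    """Return the candidate pair with the largest mutual criticism weight.

    links: {(criticizing, criticized): weight}. The strength of a pair is
    assumed to be the sum of the weights in both directions.
    """
    def mutual_weight(pair):
        a, b = pair
        return links.get((a, b), 0) + links.get((b, a), 0)
    return max(combinations(sorted(candidates), 2), key=mutual_weight)

links = {("A", "B"): 3, ("B", "A"): 2, ("C", "A"): 1, ("C", "B"): 1}
pair = strongest_pair({"A", "B", "C"}, links)
print(pair)
```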

64 Partitioning minor disputants: Given the two key opponents, we partition the rest of disputants based on their relations with the key opponents. [sent-261, score-0.567]

65 For this, we identify whether each disputant has positive or negative relations with the key opponents. [sent-262, score-0.726]

66 The disputant is classified to the side of the key opponent who shows more positive relations. [sent-263, score-0.869]

67 If the disputant shows more negative relations, the disputant is classified to the opposite side. [sent-264, score-1.277]
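
The rule in sentences 65 to 67 can be illustrated with a small decision function. The paper's actual scoring formula is not recoverable from this summary, so comparing net relation counts (positive minus negative) towards each key opponent is an assumption, and `assign_side` is a hypothetical name.

```python
def assign_side(relation_to_a, relation_to_c):
    """Assign a minor disputant to the side of key opponent 'a' or 'c'.

    Each argument is a (positive_count, negative_count) pair describing the
    minor disputant's relations with that key opponent. A net positive
    relation pulls towards that opponent; a net negative one pushes the
    disputant to the opposite side. Returns None when undecided.
    """
    net_a = relation_to_a[0] - relation_to_a[1]
    net_c = relation_to_c[0] - relation_to_c[1]
    if net_a > net_c:
        return "a"
    if net_c > net_a:
        return "c"
    return None

side_friend_of_a = assign_side((3, 0), (0, 2))
side_critic_of_a = assign_side((0, 4), (0, 0))
side_undecided = assign_side((1, 1), (0, 0))
```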

68 The minor disputants may not be covered prominently in the article set; hence, it can be difficult to obtain sufficient data for analysis. [sent-266, score-0.593]

69 1) Positive Quote Rate (PQRab): Given two disputants (a key opponent a, and a minor disputant b), the feature measures the ratio of positive quotes between them. [sent-269, score-1.439]

70 A sentence is considered as a positive quote if the following conditions hold: the sentence is a direct or indirect quote, the two disputants appear in the sentence, one is the subject of the quote, and a positive lexicon appears in the sentence. [sent-270, score-0.675]

71 The number of such sentences is divided by the number of all quotes in which the two disputants appear and one appears as the subject. [sent-271, score-0.681]

72 The same conditions are considered to detect negative quotes except that negative lexicon is used instead of positive lexicon. [sent-274, score-0.4]
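
As a rough illustration of sentences 69 to 72, the positive and negative quote rates for a pair of disputants could be computed as below. The quote representation (`subject`, `mentions`, `tokens`) and the toy lexicons are assumptions made for this sketch, not the paper's data format.

```python
def quote_rates(quotes, a, b, positive_words, negative_words):
    """Return (PQR, NQR) for disputants a and b.

    Denominator: quotes in which both disputants appear and one of them is
    the subject. Numerators: those quotes that also contain a positive
    (respectively negative) lexicon word.
    """
    shared = [
        q for q in quotes
        if q["subject"] in (a, b) and {a, b} <= set(q["mentions"])
    ]
    if not shared:
        return 0.0, 0.0
    positive = sum(1 for q in shared if set(q["tokens"]) & positive_words)
    negative = sum(1 for q in shared if set(q["tokens"]) & negative_words)
    return positive / len(shared), negative / len(shared)

quotes = [
    {"subject": "A", "mentions": ["A", "B"], "tokens": ["A", "supports", "B"]},
    {"subject": "B", "mentions": ["A", "B"], "tokens": ["B", "criticized", "A"]},
    {"subject": "B", "mentions": ["A", "B"], "tokens": ["B", "met", "A"]},
    # Ignored: the subject is neither A nor B.
    {"subject": "C", "mentions": ["A", "B", "C"], "tokens": ["C", "supports", "A", "and", "B"]},
    # Ignored: B does not appear.
    {"subject": "A", "mentions": ["A"], "tokens": ["A", "supports", "reform"]},
]
pqr, nqr = quote_rates(quotes, "A", "B", {"supports"}, {"criticized"})
```

Here three quotes qualify for the denominator, one of them positive and one negative, so both rates come out to one third.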

73 3) Frequency of Standing Together (FSTab): This feature attempts to capture whether the two disputants share a position, e. [sent-275, score-0.44]

74 The same features are also calculated from the web news search results; we collect news articles of which the title includes the two disputants, i. [sent-282, score-0.331]

75 For PQR (NQR), it counts the titles in which the two disputants appear with a positive (negative) lexicon. [sent-286, score-0.485]

76 3 Article Classification Each news article of the set is classified by analyzing which side is covered more prominently. [sent-293, score-0.353]

77 We observed that the major components which shape an article on a contention are quotes from disputants and journalists' commentary. [sent-295, score-1.054]

78 Thus, our method considers two points for classification: first, from which side the article's quotes came; second, for the rest of the article's text, the similarity of the text to the arguments of each side. [sent-296, score-0.391]

79 As for the quotes of an article, the method calculates the proportion of the quotes from each side based on the disputant partitioning result. [sent-297, score-1.255]

80 An article is classified to a specific side if more of its quotes are from that side and more sentences are similar to that side: given an article a, and the two sides b and c, classify a to b if classify a to c if classify a to other, otherwise. [sent-306, score-0.899]

81 where SU: number of all sentences of the article Qi: number of quotes from the side i. [sent-307, score-0.454]

82 Thus, for an article written purely with quotes, the article is classified to a specific side if more than 70% of the quotes are from that side. [sent-315, score-0.629]

83 On the other hand, for an article which does not include quotes from any side, more than 60% of the sentences have to be determined similar to a specific side's quotes. [sent-316, score-0.365]

84 5 Evaluation and Discussion Our evaluation of the method is twofold: first, we evaluate the disputant partitioning results, second, the accuracy of classification. [sent-318, score-0.684]

85 To evaluate the disputant partitioning results, we had the annotators extract the disputants of each issue and divide them into two opposing groups. [sent-321, score-1.31]

86 The false positives were mostly the disputants who appear only a few times both in the article set and the news search results. [sent-330, score-0.653]

87 This was mainly because some disputants were omitted in the disputant extraction stage. [sent-334, score-1.015]

88 However, most disputants who frequently appear in the article set were extracted and partitioned appropriately. [sent-338, score-0.564]

89 The disputant extraction and disputant partitioning are performed identically; however, it classifies news articles merely based on quotes. [sent-350, score-1.534]

90 An article is classified to one of the two opposing sides if more than 70% of the quotes are from that side, or to the “other” category otherwise. [sent-351, score-0.639]
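
The quote-only comparison method in sentences 89 and 90 is simple enough to sketch directly. The input format (one side label per quote, produced by the partitioning stage) and the name `classify_by_quotes` are assumptions; the 70% threshold is the one stated in the text.

```python
def classify_by_quotes(quote_sides, threshold=0.7):
    """Classify an article from the sides of its quotes.

    quote_sides: one label per quote in the article ('b' or 'c', or any other
    value for quotes whose speaker was not assigned to a side).
    Returns 'b' or 'c' if more than `threshold` of all quotes come from that
    side, and 'other' otherwise.
    """
    if not quote_sides:
        return "other"
    for side in ("b", "c"):
        if quote_sides.count(side) / len(quote_sides) > threshold:
            return side
    return "other"

mostly_b = classify_by_quotes(["b"] * 8 + ["c"] * 2)
mixed = classify_by_quotes(["b"] * 6 + ["c"] * 4)
no_quotes = classify_by_quotes([])
```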

91 The disputant relation-based method (DrC) performed better than the two comparison methods. [sent-357, score-0.575]

92 However, the news article set includes a number of articles covering different topics irrelevant to the arguments of the disputants. [sent-367, score-0.475]

93 2) Article criticizing the quoted disputants: There were some articles criticizing the quoted disputants. [sent-383, score-0.355]

94 3) Errors in disputant partitioning: Some misclassifications were due to errors in the disputant partitioning stage, specifically, disputants who were assigned to the wrong side. [sent-386, score-1.31]

95 Articles which refer to such disputants many times were misclassified. [sent-387, score-0.44]

96 6 Conclusion We study the problem of classifying news articles on contentious issues. [sent-388, score-0.404]

97 It involves new challenges as the discourse of contentious issues is complex, and news articles show different characteristics from commonly studied corpora, such as product reviews. [sent-389, score-0.419]

98 We propose an opponent-based frame, and demonstrate that it is a clear and effective classification frame to contrast arguments of contentious issues. [sent-390, score-0.401]

99 We develop disputant relation-based classification and show that the method outperforms a text similarity-based approach. [sent-391, score-0.629]

100 Discovering and developing methods for issues which involve more than two disputant groups is future work. [sent-396, score-0.517]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('disputant', 0.575), ('disputants', 0.44), ('contention', 0.249), ('quotes', 0.241), ('opponents', 0.192), ('articles', 0.153), ('frame', 0.149), ('opposing', 0.149), ('contentious', 0.137), ('debate', 0.126), ('article', 0.124), ('korea', 0.112), ('partitioning', 0.109), ('hub', 0.099), ('quote', 0.095), ('side', 0.089), ('news', 0.089), ('authority', 0.075), ('criticized', 0.068), ('criticizing', 0.068), ('arguments', 0.061), ('opponent', 0.06), ('negative', 0.057), ('drc', 0.056), ('classification', 0.054), ('government', 0.053), ('korean', 0.052), ('classified', 0.051), ('south', 0.051), ('key', 0.049), ('classify', 0.048), ('readers', 0.047), ('issue', 0.046), ('criticism', 0.045), ('positive', 0.045), ('issues', 0.04), ('topic', 0.037), ('annotators', 0.037), ('groups', 0.037), ('category', 0.037), ('sides', 0.037), ('hits', 0.036), ('argument', 0.035), ('civic', 0.034), ('criticisms', 0.034), ('fstab', 0.034), ('nqrab', 0.034), ('pqrab', 0.034), ('referendum', 0.034), ('classifies', 0.033), ('somasundaran', 0.033), ('quoted', 0.033), ('parties', 0.03), ('supportive', 0.03), ('indirect', 0.03), ('minor', 0.029), ('stance', 0.029), ('covering', 0.028), ('president', 0.028), ('opposition', 0.027), ('political', 0.026), ('classifying', 0.025), ('biased', 0.023), ('cheonan', 0.023), ('fdab', 0.023), ('fstac', 0.023), ('ideological', 0.023), ('nqr', 0.023), ('nqrac', 0.023), ('ounis', 0.023), ('pqr', 0.023), ('pqrac', 0.023), ('schon', 0.023), ('shim', 0.023), ('deliver', 0.023), ('janyce', 0.022), ('sentiment', 0.022), ('cover', 0.022), ('diverse', 0.021), ('topics', 0.02), ('wilson', 0.02), ('subject', 0.02), ('outlinks', 0.02), ('agrawal', 0.02), ('ior', 0.02), ('politicians', 0.02), ('sinking', 0.02), ('facts', 0.019), ('opposite', 0.019), ('conflicting', 0.019), ('wiebe', 0.019), ('attacking', 0.018), ('detrimental', 0.018), ('swapna', 0.018), ('thumbs', 0.018), ('understand', 0.018), ('nodes', 0.018), ('stage', 0.017), ('describing', 0.017), 
('interpreted', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 84 acl-2011-Contrasting Opposing Views of News Articles on Contentious Issues

Author: Souneil Park ; Kyung Soon Lee ; Junehwa Song

2 0.084532671 274 acl-2011-Semi-Supervised Frame-Semantic Parsing for Unknown Predicates

Author: Dipanjan Das ; Noah A. Smith

Abstract: We describe a new approach to disambiguating semantic frames evoked by lexical predicates previously unseen in a lexicon or annotated data. Our approach makes use of large amounts of unlabeled data in a graph-based semi-supervised learning framework. We construct a large graph where vertices correspond to potential predicates and use label propagation to learn possible semantic frames for new ones. The label-propagated graph is used within a frame-semantic parser and, for unknown predicates, results in over 15% absolute improvement in frame identification accuracy and over 13% absolute improvement in full frame-semantic parsing F1 score on a blind test set, over a state-of-the-art supervised baseline.

3 0.063420422 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing

Author: Ivan Titov ; Alexandre Klementiev

Abstract: We propose a non-parametric Bayesian model for unsupervised semantic parsing. Following Poon and Domingos (2009), we consider a semantic parsing setting where the goal is to (1) decompose the syntactic dependency tree of a sentence into fragments, (2) assign each of these fragments to a cluster of semantically equivalent syntactic structures, and (3) predict predicate-argument relations between the fragments. We use hierarchical Pitman-Yor processes to model statistical dependencies between meaning representations of predicates and those of their arguments, as well as the clusters of their syntactic realizations. We develop a modification of the Metropolis-Hastings split-merge sampler, resulting in an efficient inference algorithm for the model. The method is experimentally evaluated by using the induced semantic representation for the question answering task in the biomedical domain.

4 0.063036747 52 acl-2011-Automatic Labelling of Topic Models

Author: Jey Han Lau ; Karl Grieser ; David Newman ; Timothy Baldwin

Abstract: We propose a method for automatically labelling topics learned via LDA topic models. We generate our label candidate set from the top-ranking topic terms, titles of Wikipedia articles containing the top-ranking topic terms, and sub-phrases extracted from the Wikipedia article titles. We rank the label candidates using a combination of association measures and lexical features, optionally fed into a supervised ranking model. Our method is shown to perform strongly over four independent sets of topics, significantly better than a benchmark method.

5 0.062534228 211 acl-2011-Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political Debates

Author: Paula Carvalho ; Luis Sarmento ; Jorge Teixeira ; Mario J. Silva

Abstract: We investigate the expression of opinions about human entities in user-generated content (UGC). A set of 2,800 online news comments (8,000 sentences) was manually annotated, following a rich annotation scheme designed for this purpose. We conclude that the challenge in performing opinion mining in such type of content is correctly identifying the positive opinions, because (i) they are much less frequent than negative opinions and (ii) they are particularly exposed to verbal irony. We also show that the recognition of human targets poses additional challenges on mining opinions from UGC, since they are frequently mentioned by pronouns, definite descriptions and nicknames.

6 0.057388842 73 acl-2011-Collective Classification of Congressional Floor-Debate Transcripts

7 0.056608748 117 acl-2011-Entity Set Expansion using Topic information

8 0.052396376 109 acl-2011-Effective Measures of Domain Similarity for Parsing

9 0.050735105 159 acl-2011-Identifying Noun Product Features that Imply Opinions

10 0.049695704 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations

11 0.049161509 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora

12 0.046744138 292 acl-2011-Target-dependent Twitter Sentiment Classification

13 0.045741707 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering

14 0.045519434 147 acl-2011-Grammatical Error Correction with Alternating Structure Optimization

15 0.044568967 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification

16 0.042455308 315 acl-2011-Types of Common-Sense Knowledge Needed for Recognizing Textual Entailment

17 0.042179387 270 acl-2011-SciSumm: A Multi-Document Summarization System for Scientific Articles

18 0.04146643 281 acl-2011-Sentiment Analysis of Citations using Sentence Structure-Based Features

19 0.041192658 216 acl-2011-MEANT: An inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility based on semantic roles

20 0.040683623 260 acl-2011-Recognizing Authority in Dialogue with an Integer Linear Programming Constrained Model

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.113), (1, 0.075), (2, -0.007), (3, 0.019), (4, 0.007), (5, 0.021), (6, 0.001), (7, 0.035), (8, -0.015), (9, -0.032), (10, -0.041), (11, -0.03), (12, -0.009), (13, 0.007), (14, -0.01), (15, -0.011), (16, -0.012), (17, -0.039), (18, -0.005), (19, -0.025), (20, 0.005), (21, 0.018), (22, -0.044), (23, -0.042), (24, 0.037), (25, 0.023), (26, 0.0), (27, -0.003), (28, 0.002), (29, 0.02), (30, 0.061), (31, -0.021), (32, -0.075), (33, 0.011), (34, 0.015), (35, -0.002), (36, 0.023), (37, 0.02), (38, -0.07), (39, 0.0), (40, -0.023), (41, 0.07), (42, -0.042), (43, 0.01), (44, 0.005), (45, 0.024), (46, -0.033), (47, -0.042), (48, -0.098), (49, -0.001)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90605229 84 acl-2011-Contrasting Opposing Views of News Articles on Contentious Issues

Author: Souneil Park ; Kyung Soon Lee ; Junehwa Song

2 0.65913069 274 acl-2011-Semi-Supervised Frame-Semantic Parsing for Unknown Predicates

Author: Dipanjan Das ; Noah A. Smith

3 0.61360681 214 acl-2011-Lost in Translation: Authorship Attribution using Frame Semantics

Author: Steffen Hedegaard ; Jakob Grue Simonsen

Abstract: We investigate authorship attribution using classifiers based on frame semantics. The purpose is to discover whether adding semantic information to lexical and syntactic methods for authorship attribution will improve them, specifically to address the difficult problem of authorship attribution of translated texts. Our results suggest (i) that frame-based classifiers are usable for author attribution of both translated and untranslated texts; (ii) that frame-based classifiers generally perform worse than the baseline classifiers for untranslated texts, but (iii) perform as well as, or superior to the baseline classifiers on translated texts; (iv) that, contrary to current belief, naïve classifiers based on lexical markers may perform tolerably on translated texts if the combination of author and translator is present in the training set of a classifier.

4 0.60826349 68 acl-2011-Classifying arguments by scheme

Author: Vanessa Wei Feng ; Graeme Hirst

Abstract: Argumentation schemes are structures or templates for various kinds of arguments. Given the text of an argument with premises and conclusion identified, we classify it as an instance of one of five common schemes, using features specific to each scheme. We achieve accuracies of 63–91% in one-against-others classification and 80–94% in pairwise classification (baseline = 50% in both cases).

5 0.59687269 211 acl-2011-Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political Debates

Author: Paula Carvalho ; Luis Sarmento ; Jorge Teixeira ; Mario J. Silva

6 0.55448514 131 acl-2011-Extracting Opinion Expressions and Their Polarities - Exploration of Pipelines and Joint Models

7 0.54957587 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering

8 0.5406068 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation

9 0.52050382 133 acl-2011-Extracting Social Power Relationships from Natural Language

10 0.51505512 136 acl-2011-Finding Deceptive Opinion Spam by Any Stretch of the Imagination

11 0.51265883 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing

12 0.51174009 194 acl-2011-Language Use: What can it tell us?

13 0.50369459 156 acl-2011-IMASS: An Intelligent Microblog Analysis and Summarization System

14 0.49075672 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations

15 0.48500267 297 acl-2011-That's What She Said: Double Entendre Identification

16 0.48473912 286 acl-2011-Social Network Extraction from Texts: A Thesis Proposal

17 0.47609115 320 acl-2011-Unsupervised Discovery of Domain-Specific Knowledge from Text

18 0.46479648 159 acl-2011-Identifying Noun Product Features that Imply Opinions

19 0.46410021 195 acl-2011-Language of Vandalism: Improving Wikipedia Vandalism Detection via Stylometric Analysis

20 0.45780656 157 acl-2011-I Thou Thee, Thou Traitor: Predicting Formal vs. Informal Address in English Literature

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.037), (12, 0.294), (17, 0.036), (26, 0.019), (31, 0.052), (37, 0.076), (39, 0.058), (41, 0.048), (55, 0.01), (59, 0.028), (62, 0.01), (72, 0.041), (88, 0.016), (91, 0.045), (96, 0.134)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.74751961 84 acl-2011-Contrasting Opposing Views of News Articles on Contentious Issues

Author: Souneil Park ; Kyung Soon Lee ; Junehwa Song

2 0.64255816 263 acl-2011-Reordering Constraint Based on Document-Level Context

Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita

Abstract: One problem with phrase-based statistical machine translation is the problem of long-distance reordering when translating between languages with different word orders, such as Japanese-English. In this paper, we propose a method of imposing reordering constraints using document-level context. As the document-level context, we use noun phrases which significantly occur in context documents containing source sentences. Given a source sentence, zones which cover the noun phrases are used as reordering constraints. Then, in decoding, reorderings which violate the zones are restricted. Experiment results for patent translation tasks show a significant improvement of 1.20% BLEU points in Japanese-English translation and 1.41% BLEU points in English-Japanese translation.

3 0.5671916 251 acl-2011-Probabilistic Document Modeling for Syntax Removal in Text Summarization

Author: William M. Darling ; Fei Song

Abstract: Statistical approaches to automatic text summarization based on term frequency continue to perform on par with more complex summarization methods. To compute useful frequency statistics, however, the semantically important words must be separated from the low-content function words. The standard approach of using an a priori stopword list tends to result in both undercoverage, where syntactical words are seen as semantically relevant, and overcoverage, where words related to content are ignored. We present a generative probabilistic modeling approach to building content distributions for use with statistical multi-document summarization where the syntax words are learned directly from the data with a Hidden Markov Model and are thereby deemphasized in the term frequency statistics. This approach is compared to both a stopword-list and POS-tagging approach and our method demonstrates improved coverage on the DUC 2006 and TAC 2010 datasets using the ROUGE metric.

4 0.52871913 301 acl-2011-The impact of language models and loss functions on repair disfluency detection

Author: Simon Zwarts ; Mark Johnson

Abstract: Unrehearsed spoken language often contains disfluencies. In order to correctly interpret a spoken utterance, any such disfluencies must be identified and removed or otherwise dealt with. Operating on transcripts of speech which contain disfluencies, we study the effect of language model and loss function on the performance of a linear reranker that rescores the 25-best output of a noisy-channel model. We show that language models trained on large amounts of non-speech data improve performance more than a language model trained on a more modest amount of speech data, and that optimising f-score rather than log loss improves disfluency detection performance. Our approach uses a log-linear reranker, operating on the top n analyses of a noisy channel model. We use large language models, introduce new features into this reranker and examine different optimisation strategies. We obtain a disfluency detection f-score of 0.838, which improves upon the current state-of-the-art.

5 0.52650344 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation

Author: Zhongguo Li

Abstract: Lots of Chinese characters are very productive in that they can form many structured words either as prefixes or as suffixes. Previous research in Chinese word segmentation mainly focused on identifying only the word boundaries without considering the rich internal structures of many words. In this paper we argue that this is unsatisfying in many ways, both practically and theoretically. Instead, we propose that word structures should be recovered in morphological analysis. An elegant approach for doing this is given and the result is shown to be promising enough for encouraging further effort in this direction. Our probability model is trained with the Penn Chinese Treebank and actually is able to parse both word and phrase structures in a unified way.

1 Why Parse Word Structures?

Research in Chinese word segmentation has progressed tremendously in recent years, with state of the art performing at around 97% in precision and recall (Xue, 2003; Gao et al., 2005; Zhang and Clark, 2007; Li and Sun, 2009). However, virtually all these systems focus exclusively on recognizing the word boundaries, giving no consideration to the internal structures of many words. Though it has been the standard practice for many years, we argue that this paradigm is inadequate both in theory and in practice, for at least the following four reasons. The first reason is that if we confine our definition of word segmentation to the identification of word boundaries, then people tend to have divergent opinions as to whether a linguistic unit is a word or not (Sproat et al., 1996). This has led to many different annotation standards for Chinese word segmentation. Even worse, this could cause inconsistency in the same corpus. For instance, 䉂擌奒 ‘vice president’ is considered to be one word in the Penn Chinese Treebank (Xue et al., 2005), but is split into two words by the Peking University corpus in the SIGHAN Bakeoffs (Sproat and Emerson, 2003).
Meanwhile, 䉂䀓惼 ‘vice director’ and 䉂䚲䡮 ‘deputy manager’ are both segmented into two words in the same Penn Chinese Treebank. In fact, all these words are composed of the prefix 䉂 ‘vice’ and a root word. Thus the structure of 䉂擌奒 ‘vice president’ can be represented with the tree in Figure 1. Without a doubt, there is complete agreement on the correctness of this structure among native Chinese speakers. So if instead of annotating only word boundaries, we annotate the structures of every word, then the annotation tends to be more consistent and there could be less duplication of efforts in developing the expensive annotated corpus.

[Figure 1: Example of a word with internal structure.]

Footnote 1: Here it is necessary to add a note on terminology used in this paper. Since there is no universally accepted definition of the “word” concept in linguistics and especially in Chinese, whenever we use the term “word” we might mean a linguistic unit such as 䉂擌奒 ‘vice president’ whose structure is shown as the tree in Figure 1, or we might mean a smaller unit such as 擌奒 ‘president’ which is a substructure of that tree. Hopefully, the context will always make it clear what is being referred to with the term “word”.

Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pages 1405–1414, Portland, Oregon, June 19-24, 2011. © 2011 Association for Computational Linguistics

The second reason is that applications have different requirements for granularity of words. Take the personal name 撱嗤吼 ‘Zhou Shuren’ as an example. It’s considered to be one word in the Penn Chinese Treebank, but is segmented into a surname and a given name in the Peking University corpus. For some applications such as information extraction, the former segmentation is adequate, while for others like machine translation, the latter, finer-grained output is preferable. If the analyzer can produce a structure as shown in Figure 4(a), then every application can extract what it needs from this tree. A solution with tree output like this is more elegant than approaches which try to meet the needs of different applications in post-processing (Gao et al., 2004).

The third reason is that traditional word segmentation has problems in handling many phenomena in Chinese. For example, the telescopic compound 㦌撥怂惆 ‘universities, middle schools and primary schools’ is in fact composed of three coordinating elements 㦌惆 ‘university’, 撥惆 ‘middle school’ and 怂惆 ‘primary school’. Regarding it as one flat word loses this important information. Another example is separable words like 扩扙 ‘swim’. With a linear segmentation, the meaning of ‘swimming’ as in 扩堑扙 ‘after swimming’ cannot be properly represented, since 扩扙 ‘swim’ will be segmented into discontinuous units. These language usages lie at the boundary between syntax and morphology, and are not uncommon in Chinese. They can be adequately represented with trees (Figure 2).

[Figure 2: Example of telescopic compound (a) and separable word (b).]

The last reason why we should care about word structures is related to head driven statistical parsers (Collins, 2003). To illustrate this, note that in the Penn Chinese Treebank, the word 戽䊂䠽吼 ‘English People’ does not occur at all. Hence constituents headed by such words could cause some difficulty for head driven models in which out-of-vocabulary words need to be treated specially both when they are generated and when they are conditioned upon. But this word is in turn headed by its suffix 吼 ‘people’, and there are 2,233 such words in the Penn Chinese Treebank. If we annotate the structure of every compound containing this suffix (e.g. Figure 3), such data sparsity simply goes away.
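
The granularity argument in the second reason can be illustrated with a small sketch: one word-structure tree, cut at different depths, yields both the coarse and the fine segmentation of the personal name. The `Node` class, the `segment` function, the labels `NR`, `surname` and `given`, and the romanized leaves are all assumptions made for illustration; they are not the paper's data structures or the contents of its Figure 4(a).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of a hypothetical word-structure tree."""
    label: str
    text: str = ""                      # only leaves carry surface text
    children: List["Node"] = field(default_factory=list)

    def surface(self) -> str:
        """Concatenate the leaf texts below this node."""
        if not self.children:
            return self.text
        return "".join(child.surface() for child in self.children)


def segment(node: Node, depth: int) -> List[str]:
    """Cut the tree `depth` levels below `node`; leaves are never split."""
    if depth == 0 or not node.children:
        return [node.surface()]
    units: List[str] = []
    for child in node.children:
        units.extend(segment(child, depth - 1))
    return units


# Romanized stand-in for the personal name glossed as 'Zhou Shuren'.
name = Node("NR", children=[Node("surname", "Zhou"), Node("given", "Shuren")])

coarse = segment(name, 0)   # one unit, as in the Penn Chinese Treebank standard
fine = segment(name, 1)     # surname + given name, as in the Peking University standard
```

Under these assumptions an information-extraction application would read `coarse`, while a machine-translation application would read `fine`, without any change to the analyzer's output.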

6 0.52614355 32 acl-2011-Algorithm Selection and Model Adaptation for ESL Correction Tasks

7 0.52311862 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

8 0.522856 128 acl-2011-Exploring Entity Relations for Named Entity Disambiguation

9 0.52261221 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing

10 0.52233946 274 acl-2011-Semi-Supervised Frame-Semantic Parsing for Unknown Predicates

11 0.52090454 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition

12 0.5208087 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning

13 0.5202651 117 acl-2011-Entity Set Expansion using Topic information

14 0.52013499 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing

15 0.51996028 28 acl-2011-A Statistical Tree Annotator and Its Applications

16 0.51992267 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations

17 0.51992154 75 acl-2011-Combining Morpheme-based Machine Translation with Post-processing Morpheme Prediction

18 0.51970142 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering

19 0.51923895 5 acl-2011-A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing

20 0.51873136 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding