
38 emnlp-2012-Employing Compositional Semantics and Discourse Consistency in Chinese Event Extraction

Source: pdf

Author: Peifeng Li ; Guodong Zhou ; Qiaoming Zhu ; Libin Hou

Abstract: Current Chinese event extraction systems suffer much from two problems in trigger identification: unknown triggers and word segmentation errors to known triggers. To resolve these problems, this paper proposes two novel inference mechanisms to explore special characteristics in Chinese via compositional semantics inside Chinese triggers and discourse consistency between Chinese trigger mentions. Evaluation on the ACE 2005 Chinese corpus justifies the effectiveness of our approach over a strong baseline.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Current Chinese event extraction systems suffer much from two problems in trigger identification: unknown triggers and word segmentation errors to known triggers. [sent-3, score-1.497]

2 To resolve these problems, this paper proposes two novel inference mechanisms to explore special characteristics in Chinese via compositional semantics inside Chinese triggers and discourse consistency between Chinese trigger mentions. [sent-4, score-1.636]

3 1 Introduction Event extraction, a classic information extraction task, is to identify instances of a predefined event type and can be typically divided into four subtasks: trigger identification, trigger type determination, argument identification and argument role determination. [sent-6, score-2.143]

4 In the literature, most studies focus on English event extraction and have achieved certain success (e. [sent-7, score-0.26]

5 In comparison, there are few successful stories regarding Chinese event extraction due to special characteristics in Chinese trigger identification. [sent-14, score-1.028]

6 In particular, there are two major reasons for the low performance: unknown triggers and word segmentation errors to known triggers. [sent-15, score-0.469]

7 Table 1 gives the statistics of unknown triggers and word segmentation errors to known triggers in both the ACE 2005 Chinese and English corpora using 10-fold cross-validation. [sent-16, score-0.817]

8 (Footnote 1) In this paper, a trigger word/phrase occurring in the training data is called a known trigger and otherwise, an unknown trigger. [sent-17, score-1.649]

9 In each validation, we leave 10% trigger mentions as the test set and the remaining ones as the training set. [sent-18, score-0.887]
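The cross-validation statistic described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `unknown_trigger_rate`, the round-robin fold assignment, and the flat list of trigger strings are all assumptions, and it counts only unknown triggers (not word segmentation errors).

```python
def unknown_trigger_rate(mentions, k=10):
    """Average share of test mentions whose trigger is unseen in training.

    `mentions` is a list of trigger strings, one per trigger mention.
    Fold assignment is round-robin (an assumption made for determinism).
    """
    rates = []
    for fold in range(k):
        # Hold out every k-th mention as the test set for this fold.
        test = mentions[fold::k]
        # All remaining mentions define the set of known triggers.
        train = {m for i, m in enumerate(mentions) if i % k != fold}
        if test:
            unknown = sum(1 for m in test if m not in train)
            rates.append(unknown / len(test))
    # Guard against empty input so the average is always defined.
    return sum(rates) / len(rates) if rates else 0.0
```

With `k=10`, each fold holds out roughly 10% of the mentions, matching the split described in the text.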

10 It shows that these two cases cover almost 30% of Chinese trigger mentions while this figure reduces to only about 9% in English. [sent-20, score-0.889]

11 It also shows that given the same number of event mentions, there are 30% more different triggers in Chinese than that in English. [sent-21, score-0.543]

12 This justifies the low performance (specifically, the recall) of a Chinese event extraction system, which normally extracts those known triggers occurring in the training data as candidate instances and uses a classifier to distinguish correct triggers from wrong ones. [sent-22, score-1.015]

13 … English event extraction with regard to unknown triggers and word segmentation errors to known triggers. [sent-26, score-0.709]

14 In this paper, we propose two novel inference mechanisms to Chinese trigger identification by employing compositional semantics inside Chinese triggers and discourse consistency between Chinese trigger mentions. [sent-28, score-2.536]

15 For the sake of fair comparison, we choose the same number of event mentions from the English corpus as the cross-validation data. [sent-31, score-0.281]

16 unknown triggers by employing compositional semantics inside Chinese triggers. [sent-32, score-0.688]

17 Very often, distinguishing true trigger mentions from pseudo ones is only possible with contextual information. [sent-34, score-0.942]

18 Sections 4 and 5 describe two novel inference mechanisms to Chinese trigger identification by employing compositional semantics inside Chinese triggers and discourse consistency between Chinese trigger mentions. [sent-38, score-2.536]

19 2 Related Work Almost all the existing studies on event extraction concern English. [sent-41, score-0.26]

20 1 Chinese Event Extraction Compared with tremendous efforts in English event extraction, there are only a few studies on Chinese event extraction. [sent-48, score-0.41]

21 (2008) modeled event extraction as a pipeline of classification tasks. [sent-50, score-0.24]

22 Chen and Ji (2009a) proposed a bootstrapping framework, which exploited extra information captured by an English event extraction system. [sent-52, score-0.24]

23 Ji (2009) extracted cross-lingual predicate clusters using bilingual parallel corpora and a cross-lingual information extraction system, and then used the derived clusters to improve the performance of Chinese event extraction. [sent-55, score-0.24]

24 However, the compositional semantics mentioned in this paper is more fine-grained and focuses on how Chinese characters are composed into a word and how the semantics of words can be mined from word structures, especially for verbs acting as event triggers. [sent-59, score-0.546]

25 To our knowledge, there is only one paper associated with compositional semantics inside Chinese words. [sent-60, score-0.264]

26 Specially, several studies have successfully incorporated trigger or entity consistency constraint into event extraction. [sent-64, score-1.133]

27 Ji and Grishman (2008) employed a rule-based approach to propagate consistent triggers and arguments across topic-related documents. [sent-69, score-0.382]

28 Liao and Grishman (2010) employed cross-event consistency information to improve sentence-level event extraction. [sent-72, score-0.318]

29 (2011) regarded entity type consistency as a key feature to predict event mentions and adopted this inference method to improve the traditional event extraction system. [sent-74, score-0.694]

30 During testing, each word in the test set is first scanned for instances of known triggers from the training set. [sent-77, score-0.377]

31 When an instance is found, the trigger identifier is applied to distinguish true trigger mentions from pseudo ones. [sent-78, score-1.76]

32 If true, the trigger type determiner is then applied to recognize its event type. [sent-79, score-1.007]

33 For any entity mentions in the sentence, the argument identifier is employed to assign possible arguments to them afterwards. [sent-80, score-0.267]
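The trigger side of the baseline pipeline described above can be sketched as below. The two classifiers are passed in as plain callables; `extract_triggers`, `is_true_trigger` and `event_type_of` are hypothetical names used only for illustration, and the argument identification and role determination stages are left out.

```python
def extract_triggers(words, known_triggers, is_true_trigger, event_type_of):
    """Scan segmented words for known triggers, then classify each hit.

    `is_true_trigger(words, idx)` stands in for the trigger identifier and
    `event_type_of(words, idx)` for the trigger type determiner; both are
    placeholders for trained classifiers.
    """
    results = []
    for idx, word in enumerate(words):
        # Step 1: only words seen as triggers in training are candidates.
        if word not in known_triggers:
            continue
        # Step 2: separate true trigger mentions from pseudo ones.
        if not is_true_trigger(words, idx):
            continue
        # Step 3: assign an event type to the accepted mention.
        results.append((idx, word, event_type_of(words, idx)))
    return results
```

Because step 1 depends on exact matches against the segmented text, both unknown triggers and segmentation errors are silently skipped, which is the recall problem the paper targets.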

34 3 in F1-measure on trigger identification, trigger type determination, argument identification and argument role determination, respectively, with gains in both precision and recall. [sent-102, score-1.6]

35 (Footnote 3) http://ictclas.org/ [sent-103, score-0.267]

36 For our baseline system, given the small performance gaps between trigger identification and trigger type determination (3. [sent-119, score-2.883]

37 8) and between argument identification and argument role determination (3. [sent-123, score-0.362]

38 4), the performance bottlenecks of our baseline system mainly exist in trigger identification and argument identification, particularly for the former one. [sent-127, score-0.949]

39 8), the former one, trigger identification, can only achieve the performance of 61. [sent-132, score-0.788]

40 In this paper, we will focus on trigger identification to improve its performance, particularly for the recall, via compositional semantics inside Chinese triggers and discourse consistency between Chinese trigger mentions. [sent-135, score-2.454]

41 In this section, we introduce a more fine-grained semantics - the compositional semantics in Chinese verb structure - and unveil its effect and usage in Chinese language processing by employing it into Chinese event extraction. [sent-138, score-0.596]

42 Therefore, it is natural to infer unknown triggers by employing compositional semantics inside Chinese triggers. [sent-149, score-0.709]

43 ) where “划伤” is a known trigger and “刺伤” is an unknown one. [sent-154, score-0.861]

44 In the above examples, the semantics of “划伤” (injure by scratching) can be largely determined from those of its component characters “划” (scratch) and “伤” (injure), while the semantics of “刺伤” (injure by stabbing) can be determined from those of its component characters “刺” (stab) and “伤” (injure). [sent-155, score-0.276]

45 Since these two triggers have similar internal structures, we can easily infer that “刺伤” is a trigger of injure event if “划伤” is known as a trigger of injure event. [sent-156, score-2.409]

46 Similarly, we can infer more triggers for the injure event, such as “灼伤” (injure by burning), “撞伤” (injure by hitting), “压伤” (injure by pressing), all with the component character “伤” (injure) as the head and the other component character as the way of causing injury. [sent-157, score-0.577]

47 Since most triggers in Chinese event extraction are verbs, we focus on the compositional semantics in the verb structure. [sent-158, score-0.882]

48 Normally, almost all verbs contain one or more single-character verbs as the basic element of the verb (we call it the basic verb, abbreviated as BV), and the semantics of such a verb can thus be inferred from its BV. [sent-164, score-0.257]

49 (Footnote 5) Actually, in the ACE 2005 Chinese (training) corpus, more than 90% of triggers are either verbs or verbal nouns (those verbs which act as nouns). [sent-182, score-0.378]

50 From the above structures, a BV plays an important role in the verb structure: most of the semantics of a verb can be inferred from its contained BV, and two words normally have very similar semantics if they have the same BV (e. [sent-184, score-0.359]

51 2 Inferring via Compositional Semantics inside Chinese Triggers Here a simple rule is employed to infer triggers via compositional semantics inside Chinese triggers: a verb is a trigger if it contains a BV which occurs as a known trigger or is contained in a known trigger. [sent-192, score-2.42]
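The inference rule stated above can be sketched in a few lines. This is an illustrative reading of the rule, not the authors' code: `collect_bvs` and `infer_trigger` are hypothetical names, and the set of single-character verbs is assumed to come from a POS-tagged lexicon.

```python
def collect_bvs(known_triggers, single_char_verbs):
    """Collect basic verbs (BVs): single-character verbs that occur as a
    known trigger or inside one."""
    return {ch for trigger in known_triggers for ch in trigger
            if ch in single_char_verbs}


def infer_trigger(verb, bvs):
    """Apply the rule: a verb is a candidate trigger if it contains a BV."""
    return any(ch in bvs for ch in verb)


# Example from the text: "划伤" is known, "刺伤" is unknown but shares "伤".
bvs = collect_bvs({"划伤"}, {"划", "伤", "刺"})
print(infer_trigger("刺伤", bvs))
```

The rule is deliberately permissive, which is why the paper follows it with noise filtering to recover precision.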

52 Table 5 shows the distribution of trigger sets (each set containing the same BV), classified by the number of triggers. [sent-193, score-0.348]

53 As for trigger mentions, these percentages become 89. [sent-197, score-0.788]

54 2% (75/88) of triggers of Trial-Hearing event mentions contain “审” (trial) and 85. [sent-201, score-0.629]

55 4% (117/138) of triggers of injure event mentions contain “伤” (injure). [sent-202, score-0.749]

56 It is worthwhile to note that such inference works for both unknown triggers and word segmentation errors to known triggers since in both cases, their BVs will always exist as either a SCW or a component of a word. [sent-207, score-0.789]

57 (Footnote 6) We did not tag BVs in the training set and regard all single-character verbs contained in triggers as BVs. [sent-208, score-0.445]

58 3 Noise Filtering One problem with above inference is that while it is able to recover some true triggers and increase the recall, it may introduce many pseudo ones and harm the precision. [sent-210, score-0.448]

59 Non-trigger filtering: A Chinese word will not be a trigger if it appears in the training set but never triggers an event. [sent-212, score-1.576]

60 POS filtering: A Chinese word will not be a trigger if it has a different POS from that of the same known trigger or similar known triggers in the training set. [sent-215, score-2.008]

61 Verb structure filtering: A Chinese word will not be a trigger if its verb structure is different from that of the same known trigger or similar known triggers in the training set. [sent-225, score-2.069]

62 For example, we can find that all triggers including “解” (unbind) (e. [sent-228, score-0.348]

63 For unknown triggers, we can merge two or more neighboring short words or single characters as a trigger candidate. [sent-243, score-0.872]

64 In this paper, for each single-character verb in a document after word segmentation, this single-character verb can be merged with either previous SCW or next SCW to form a trigger candidate if this single-character verb has occurred in the training set with the same verb structure. [sent-244, score-1.059]
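The merging step described above can be sketched as follows. This is a simplified illustration: the function name `merge_candidates` is hypothetical, and the paper's extra condition that the verb must have occurred in training with the same verb structure is reduced here to membership in a BV set.

```python
def merge_candidates(tokens, bvs):
    """Propose trigger candidates by merging a single-character verb with a
    neighbouring single-character word (SCW).

    `tokens` is the word-segmented sentence; `bvs` is the set of
    single-character verbs seen in training triggers (verb structure
    checks from the paper are omitted in this sketch).
    """
    candidates = []
    for i, tok in enumerate(tokens):
        if len(tok) != 1 or tok not in bvs:
            continue
        # Merge with the previous token if it is also a single character.
        if i > 0 and len(tokens[i - 1]) == 1:
            candidates.append(tokens[i - 1] + tok)
        # Merge with the next token if it is also a single character.
        if i + 1 < len(tokens) and len(tokens[i + 1]) == 1:
            candidates.append(tok + tokens[i + 1])
    return candidates
```

Merging in both directions over-generates (for instance a verb followed by an aspect particle), so the resulting candidates still need the filtering and classification steps.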

65 Given the triggers recovered above, both known and unknown, the key issue here is how to distinguish true triggers from pseudo ones. [sent-245, score-0.824]

66 In this paper, we employ discourse consistency between Chinese trigger mentions for Chinese event extraction. [sent-246, score-1.255]

67 Previous studies on English event extraction have proved the effectiveness of both cross-entity and cross-document consistency. [sent-247, score-0.26]

68 Similarly, missing arguments are another issue in Chinese event extraction: almost 55% of arguments are missing in the ACE 2005 Chinese corpus. [sent-251, score-0.354]

69 Normally, using a feature-based approach to distinguish true triggers from pseudo ones is very difficult at the sentence level if some of the related arguments are missing from the trigger-occurring sentence. [sent-252, score-0.434]

70 Table caption: Comparison of discourse consistency between Chinese and English trigger mentions. Table 6 compares the probabilities of discourse consistency between Chinese and English trigger mentions in the ACE 2005 Chinese and English corpora. [sent-259, score-2.12]

71 A trigger is considered discourse-consistent when all of its appearances in a discourse have the same event type, while instance-based consistency refers to pair-wise cases. [sent-261, score-1.114]
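The discourse-level notion of consistency defined above can be computed roughly as follows. The function name and the `(doc_id, word, event_type)` tuple layout are assumptions for illustration; a non-trigger occurrence is represented by `None` as its event type.

```python
from collections import defaultdict


def discourse_consistency(mentions):
    """Share of (discourse, word) pairs whose occurrences all carry the
    same label.

    `mentions` is an iterable of (doc_id, word, event_type) tuples, with
    event_type set to None when the occurrence is not a trigger.
    """
    labels_by_word = defaultdict(set)
    for doc_id, word, event_type in mentions:
        labels_by_word[(doc_id, word)].add(event_type)
    if not labels_by_word:
        return 0.0
    # A word is consistent within a discourse if only one label was seen.
    consistent = sum(1 for labels in labels_by_word.values() if len(labels) == 1)
    return consistent / len(labels_by_word)
```

The instance-based (pair-wise) variant mentioned in the text would instead count agreeing pairs of occurrences, which weights frequent words more heavily.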

72 It shows that within the discourse, there is a strong consistency in both Chinese and English between trigger mentions: if one instance of a word is a trigger, other instances in the same discourse will be a trigger of the same event type with very high probability. [sent-262, score-1.981]

73 Figure 2: Probabilities of discourse-level consistency of the top 10 frequent triggers. It also shows that discourse consistency holds much more often for Chinese triggers than for their English counterparts. [sent-266, score-0.989]

74 Figure 2 gives the probabilities of discourse-level consistency of the top 10 frequent triggers, which account for 18% of event mentions in the ACE 2005 Chinese corpus. [sent-267, score-0.4]

75 2 Inference via Discourse Consistency between Chinese Trigger Mentions Given a discourse and different mentions of a trigger returned by the trigger identifier, we can simply accept those mentions with high probability as true mentions of the trigger and discard those with low probability. [sent-269, score-2.715]
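The accept/discard step described above can be sketched as a simple triage over classifier scores. The thresholds `accept_at` and `reject_at` and the function name are hypothetical; the source does not state the cut-off values in this excerpt, and mentions falling between the two thresholds are simply returned for a later discourse-level decision.

```python
def triage_mentions(scored_mentions, accept_at=0.8, reject_at=0.2):
    """Split trigger mentions by the identifier's probability.

    `scored_mentions` is a list of (mention, probability) pairs for one
    trigger within one discourse. Threshold values are illustrative only.
    """
    accepted, rejected, undecided = [], [], []
    for mention, prob in scored_mentions:
        if prob >= accept_at:
            accepted.append(mention)      # confident true mention
        elif prob <= reject_at:
            rejected.append(mention)      # confident pseudo mention
        else:
            undecided.append(mention)     # left for discourse-level evidence
    return accepted, rejected, undecided
```

Under the consistency observation in the text, the confidently accepted mentions are the natural evidence for deciding the undecided ones in the same discourse.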

76  Probability of the discourse consistency of the candidate trigger mention in the training set. [sent-271, score-1.02]

77 1 Chinese Trigger Identification Table 7 shows the impact of compositional semantics in trigger identification. [sent-276, score-1.006]

78 Here, the baseline just extracts those triggers occurring in the POS tags, that percentage will be increased to 14. [sent-277, score-0.348]

79 In particular, to keep as many true triggers as possible in our candidate set, we just filter out those candidates which occur as non-triggers more than 5 times in the training set, according to our validation on the development set. [sent-283, score-0.378]
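The non-trigger filter with this threshold can be sketched as below. The function name and the dictionary of non-trigger counts are assumptions; only the cut-off of 5 comes from the text.

```python
def filter_non_triggers(candidates, non_trigger_counts, max_non_trigger=5):
    """Drop candidates that occur as non-triggers more than
    `max_non_trigger` times in the training set.

    `non_trigger_counts` maps a word to the number of times it appeared in
    training without triggering an event (missing words count as 0).
    """
    return [word for word in candidates
            if non_trigger_counts.get(word, 0) <= max_non_trigger]
```

A lenient threshold like this trades some precision for recall, leaving the remaining pseudo candidates to the later filters and classifiers.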

80 7% (823) of pseudo triggers are filtered out while only 1. [sent-285, score-0.401]

81 4% of candidate triggers have wrong POS tags in the development set. [sent-291, score-0.364]

82 Manual inspection shows that if we correct those wrong … Table 8 shows the contribution of employing compositional semantics and discourse consistency to trigger identification on the held-out test set. [sent-293, score-1.304]

83 5% in recall, benefiting from both compositional semantics and discourse consistency mechanisms. [sent-296, score-0.404]

84 Our observation shows that our compositional semantics inference adds almost 10% new non-triggers into candidates which are very hard to distinguish. [sent-300, score-0.252]

85 Table 8 also justifies the impact of the discourse consistency between trigger mentions in trigger identification and the effect of the additional discourse-level trigger identifier, with a big gain of 5. [sent-301, score-2.733]

86 2 Chinese Event Extraction Table 9 shows the contribution of trigger identification with compositional semantics and discourse consistency to overall event extraction on the held-out test set. [sent-305, score-1.512]

87 From the results presented in Table 9, we can find that our approach can improve the F1-measure for trigger identification by 9. [sent-307, score-0.868]

88 In addition, the results of two annotators show that Chinese event extraction is really challenging even for a well-educated human being. [sent-318, score-0.24]

89 As shown in Table 9, the inter-annotator agreement on trigger identification and trigger type determination is even less than 45%. [sent-319, score-1.775]

90 Although this figure is very low, it is not surprising: the results on the English ACE 2005 corpus show that the inter-annotator agreement on trigger identification is only about 40% (Ji and Grishman, 2008). [sent-320, score-0.868]

91 Detailed analysis shows that a human annotator tends to make more mistakes in trigger identification for two reasons. [sent-321, score-0.884]

92 The first reason is that a human annotator always misses some event mentions when a sentence contains more than one event mention. [sent-322, score-0.492]

93 Table 9 also shows that the performance gap of human annotators between trigger identification and trigger type determination is very small (2. [sent-324, score-1.775]

94 It indicates that trigger identification is the most important step in Chinese event extraction for a human being. [sent-327, score-1.108]

95 For human annotators, it’s much easier to determine the event type of a trigger, identify its arguments and determine the role of each argument, all with more than 90% in accuracy, once a trigger is identified correctly. [sent-328, score-1.05]

96 This paper shows that the compositional semantics in the verb structure provides an ideal way to expand the coverage of triggers. [sent-337, score-0.279]

97 7 Conclusion In this paper we propose two novel inference mechanisms to Chinese trigger identification. [sent-339, score-0.838]

98 In particular, compositional semantics inside Chinese triggers and discourse consistency between Chinese trigger mentions are used to resolve two critical issues in Chinese trigger identification: unknown triggers and word segmentation errors to known triggers. [sent-340, score-2.929]

99 It shows that such novel inference mechanisms for Chinese event extraction are linguistically justified and pragmatically beneficial to real world applications. [sent-342, score-0.29]

100 In future work, we will focus on how to introduce the discourse information into the individual classifiers to capture those long-distance features and joint learning of subtasks in Chinese event extraction. [sent-343, score-0.285]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('trigger', 0.788), ('triggers', 0.348), ('event', 0.195), ('chinese', 0.182), ('bv', 0.17), ('compositional', 0.128), ('injure', 0.12), ('consistency', 0.107), ('determination', 0.095), ('semantics', 0.09), ('mentions', 0.086), ('argument', 0.081), ('identification', 0.08), ('discourse', 0.079), ('grishman', 0.07), ('verb', 0.061), ('ji', 0.061), ('ace', 0.06), ('inside', 0.046), ('extraction', 0.045), ('unknown', 0.044), ('identifier', 0.043), ('bvs', 0.043), ('scw', 0.043), ('yangarber', 0.043), ('pseudo', 0.041), ('liao', 0.037), ('segmentation', 0.033), ('employing', 0.032), ('mechanisms', 0.031), ('mention', 0.03), ('known', 0.029), ('characters', 0.028), ('heng', 0.027), ('maslennikov', 0.026), ('filtering', 0.026), ('meet', 0.026), ('role', 0.025), ('patwardhan', 0.025), ('type', 0.024), ('kill', 0.023), ('entity', 0.023), ('hardy', 0.022), ('infer', 0.021), ('component', 0.02), ('studies', 0.02), ('inference', 0.019), ('fire', 0.019), ('arguments', 0.018), ('pos', 0.018), ('hong', 0.018), ('justifies', 0.017), ('chen', 0.017), ('commerce', 0.017), ('errata', 0.017), ('qiaoming', 0.017), ('unbind', 0.017), ('normally', 0.017), ('gupta', 0.016), ('character', 0.016), ('candidate', 0.016), ('employed', 0.016), ('inconsistency', 0.016), ('causing', 0.016), ('annotator', 0.016), ('rule', 0.015), ('contained', 0.015), ('verbs', 0.015), ('almost', 0.015), ('structures', 0.015), ('borovets', 0.015), ('postgraduate', 0.015), ('ahn', 0.015), ('sometime', 0.015), ('errors', 0.015), ('units', 0.014), ('true', 0.014), ('meanings', 0.014), ('ones', 0.013), ('roman', 0.013), ('chua', 0.013), ('overt', 0.013), ('recover', 0.013), ('ralph', 0.012), ('trial', 0.012), ('talks', 0.012), ('english', 0.012), ('filtered', 0.012), ('boulder', 0.012), ('applies', 0.012), ('neighboring', 0.012), ('predefined', 0.012), ('levy', 0.012), ('occupy', 0.012), ('neighbor', 0.011), ('prague', 0.011), ('finkel', 0.011), ('subtasks', 0.011), ('document', 0.011), ('nearest', 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 38 emnlp-2012-Employing Compositional Semantics and Discourse Consistency in Chinese Event Extraction

Author: Peifeng Li ; Guodong Zhou ; Qiaoming Zhu ; Libin Hou

2 0.16611141 72 emnlp-2012-Joint Inference for Event Timeline Construction

Author: Quang Do ; Wei Lu ; Dan Roth

Abstract: This paper addresses the task of constructing a timeline of events mentioned in a given text. To accomplish that, we present a novel representation of the temporal structure of a news article based on time intervals. We then present an algorithmic approach that jointly optimizes the temporal structure by coupling local classifiers that predict associations and temporal relations between pairs of temporal entities with global constraints. Moreover, we present ways to leverage knowledge provided by event coreference to further improve the system performance. Overall, our experiments show that the joint inference model significantly outperformed the local classifiers by 9.2% of relative improvement in F1. The experiments also suggest that good event coreference could make remarkable contribution to a robust event timeline construction system.

3 0.098211333 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents

Author: Heeyoung Lee ; Marta Recasens ; Angel Chang ; Mihai Surdeanu ; Dan Jurafsky

Abstract: We introduce a novel coreference resolution system that models entities and events jointly. Our iterative method cautiously constructs clusters of entity and event mentions using linear regression to model cluster merge operations. As clusters are built, information flows between entity and event clusters through features that model semantic role dependencies. Our system handles nominal and verbal events as well as entities, and our joint formulation allows information from event coreference to help entity coreference, and vice versa. In a cross-document domain with comparable documents, joint coreference resolution performs significantly better (over 3 CoNLL F1 points) than two strong baselines that resolve entities and events separately.

4 0.074584275 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures

Author: Zhongguo Li ; Guodong Zhou

Abstract: Most previous approaches to syntactic parsing of Chinese rely on a preprocessing step of word segmentation, thereby assuming there was a clearly defined boundary between morphology and syntax in Chinese. We show how this assumption can fail badly, leading to many out-of-vocabulary words and incompatible annotations. Hence in practice the strict separation of morphology and syntax in the Chinese language proves to be untenable. We present a unified dependency parsing approach for Chinese which takes unsegmented sentences as input and outputs both morphological and syntactic structures with a single model and algorithm. By removing the intermediate word segmentation, the unified parser no longer needs separate notions for words and phrases. Evaluation proves the effectiveness of the unified model and algorithm in parsing structures of words, phrases and sentences simultaneously.

5 0.070532754 106 emnlp-2012-Part-of-Speech Tagging for Chinese-English Mixed Texts with Dynamic Features

Author: Jiayi Zhao ; Xipeng Qiu ; Shu Zhang ; Feng Ji ; Xuanjing Huang

Abstract: In modern Chinese articles or conversations, it is very popular to involve a few English words, especially in emails and Internet literature. Therefore, it becomes an important and challenging topic to analyze Chinese-English mixed texts. The underlying problem is how to tag part-of-speech (POS) for the English words involved. Due to the lack of specially annotated corpus, most of the English words are tagged as the oversimplified type, “foreign words”. In this paper, we present a method using dynamic features to tag POS of mixed texts. Experiments show that our method achieves higher performance than traditional sequence labeling methods. Meanwhile, our method also boosts the performance of POS tagging for pure Chinese texts.

6 0.061921794 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing

7 0.057196256 80 emnlp-2012-Learning Verb Inference Rules from Linguistically-Motivated Evidence

8 0.057105135 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering

9 0.048850182 7 emnlp-2012-A Novel Discriminative Framework for Sentence-Level Discourse Analysis

10 0.04744355 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition

11 0.046069894 76 emnlp-2012-Learning-based Multi-Sieve Co-reference Resolution with Knowledge

12 0.043796793 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction

13 0.042899646 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces

14 0.039160606 135 emnlp-2012-Using Discourse Information for Paraphrase Extraction

15 0.037630919 112 emnlp-2012-Resolving Complex Cases of Definite Pronouns: The Winograd Schema Challenge

16 0.035151351 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence

17 0.03495419 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction

18 0.034625694 12 emnlp-2012-A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing

19 0.03444289 19 emnlp-2012-An Entity-Topic Model for Entity Linking

20 0.033659294 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.127), (1, 0.065), (2, -0.001), (3, -0.131), (4, 0.075), (5, 0.01), (6, -0.029), (7, -0.121), (8, -0.041), (9, -0.063), (10, -0.149), (11, -0.059), (12, -0.011), (13, -0.061), (14, -0.004), (15, -0.026), (16, -0.022), (17, -0.015), (18, 0.137), (19, -0.201), (20, -0.1), (21, 0.022), (22, -0.004), (23, 0.12), (24, 0.018), (25, -0.011), (26, -0.037), (27, -0.002), (28, -0.07), (29, -0.098), (30, 0.043), (31, -0.14), (32, -0.034), (33, -0.178), (34, 0.047), (35, -0.046), (36, -0.113), (37, -0.049), (38, -0.145), (39, -0.084), (40, -0.018), (41, -0.034), (42, -0.052), (43, -0.163), (44, 0.048), (45, -0.058), (46, -0.023), (47, 0.009), (48, 0.11), (49, -0.107)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96240467 38 emnlp-2012-Employing Compositional Semantics and Discourse Consistency in Chinese Event Extraction

Author: Peifeng Li ; Guodong Zhou ; Qiaoming Zhu ; Libin Hou

2 0.65846729 72 emnlp-2012-Joint Inference for Event Timeline Construction

Author: Quang Do ; Wei Lu ; Dan Roth

3 0.38651779 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction

Author: David McClosky ; Christopher D. Manning

Abstract: We present a distantly supervised system for extracting the temporal bounds of fluents (relations which only hold during certain times, such as attends school). Unlike previous pipelined approaches, our model does not assume independence between each fluent or even between named entities with known connections (parent, spouse, employer, etc.). Instead, we model what makes timelines of fluents consistent by learning cross-fluent constraints, potentially spanning entities as well. For example, our model learns that someone is unlikely to start a job at age two or to marry someone who hasn’t been born yet. Our system achieves a 36% error reduction over a pipelined baseline.

4 0.3737554 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents

Author: Heeyoung Lee ; Marta Recasens ; Angel Chang ; Mihai Surdeanu ; Dan Jurafsky

5 0.36308065 80 emnlp-2012-Learning Verb Inference Rules from Linguistically-Motivated Evidence

Author: Hila Weisman ; Jonathan Berant ; Idan Szpektor ; Ido Dagan

Abstract: Learning inference relations between verbs is at the heart of many semantic applications. However, most prior work on learning such rules focused on a rather narrow set of information sources: mainly distributional similarity, and to a lesser extent manually constructed verb co-occurrence patterns. In this paper, we claim that it is imperative to utilize information from various textual scopes: verb co-occurrence within a sentence, verb co-occurrence within a document, as well as overall corpus statistics. To this end, we propose a much richer novel set of linguistically motivated cues for detecting entailment between verbs and combine them as features in a supervised classification framework. We empirically demonstrate that our model significantly outperforms previous methods and that information from each textual scope contributes to the verb entailment learning task.

6 0.33647743 7 emnlp-2012-A Novel Discriminative Framework for Sentence-Level Discourse Analysis

7 0.33363739 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures

8 0.31764641 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering

9 0.30829072 63 emnlp-2012-Identifying Event-related Bursts via Social Media Activities

10 0.28218347 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing

11 0.2593258 106 emnlp-2012-Part-of-Speech Tagging for Chinese-English Mixed Texts with Dynamic Features

12 0.25892344 22 emnlp-2012-Automatically Constructing a Normalisation Dictionary for Microblogs

13 0.20738952 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces

14 0.20466812 45 emnlp-2012-Exploiting Chunk-level Features to Improve Phrase Chunking

15 0.19420977 68 emnlp-2012-Iterative Annotation Transformation with Predict-Self Reestimation for Chinese Word Segmentation

16 0.17988816 103 emnlp-2012-PATTY: A Taxonomy of Relational Patterns with Semantic Types

17 0.17581801 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition

18 0.17254883 48 emnlp-2012-Exploring Adaptor Grammars for Native Language Identification

19 0.16973849 135 emnlp-2012-Using Discourse Information for Paraphrase Extraction

20 0.16561151 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.015), (16, 0.024), (25, 0.011), (34, 0.036), (41, 0.012), (60, 0.104), (63, 0.043), (64, 0.389), (65, 0.04), (70, 0.028), (73, 0.017), (74, 0.048), (76, 0.052), (80, 0.017), (86, 0.019), (95, 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.9285447 80 emnlp-2012-Learning Verb Inference Rules from Linguistically-Motivated Evidence

Author: Hila Weisman ; Jonathan Berant ; Idan Szpektor ; Ido Dagan

same-paper 2 0.87177539 38 emnlp-2012-Employing Compositional Semantics and Discourse Consistency in Chinese Event Extraction

Author: Peifeng Li ; Guodong Zhou ; Qiaoming Zhu ; Libin Hou

3 0.79923797 34 emnlp-2012-Do Neighbours Help? An Exploration of Graph-based Algorithms for Cross-domain Sentiment Classification

Author: Natalia Ponomareva ; Mike Thelwall

Abstract: This paper presents a comparative study of graph-based approaches for cross-domain sentiment classification. In particular, the paper analyses two existing methods: an optimisation problem and a ranking algorithm. We compare these graph-based methods with each other and with the other state-of-the-art approaches and conclude that graph domain representations offer a competitive solution to the domain adaptation problem. Analysis of the best parameters for graph-based algorithms reveals that there are no optimal values valid for all domain pairs and that these values are dependent on the characteristics of corresponding domains.

4 0.66414839 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries

Author: Dan Garrette ; Jason Baldridge

Abstract: Past work on learning part-of-speech taggers from tag dictionaries and raw data has reported good results, but the assumptions made about those dictionaries are often unrealistic: due to historical precedents, they assume access to information about labels in the raw and test sets. Here, we demonstrate ways to learn hidden Markov model taggers from incomplete tag dictionaries. Taking the MIN-GREEDY algorithm (Ravi et al., 2010) as a starting point, we improve it with several intuitive heuristics. We also define a simple HMM emission initialization that takes advantage of the tag dictionary and raw data to capture both the openness of a given tag and its estimated prevalence in the raw data. Altogether, our augmentations produce improvements to performance over the original MIN-GREEDY algorithm for both English and Italian data.

5 0.47650361 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts

Author: Lizhen Qu ; Rainer Gemulla ; Gerhard Weikum

Abstract: We propose the weakly supervised Multi-Experts Model (MEM) for analyzing the semantic orientation of opinions expressed in natural language reviews. In contrast to most prior work, MEM predicts both opinion polarity and opinion strength at the level of individual sentences; such fine-grained analysis helps to understand better why users like or dislike the entity under review. A key challenge in this setting is that it is hard to obtain sentence-level training data for both polarity and strength. For this reason, MEM is weakly supervised: It starts with potentially noisy indicators obtained from coarse-grained training data (i.e., document-level ratings), a small set of diverse base predictors, and, if available, small amounts of fine-grained training data. We integrate these noisy indicators into a unified probabilistic framework using ideas from ensemble learning and graph-based semi-supervised learning. Our experiments indicate that MEM outperforms state-of-the-art methods by a significant margin.

6 0.46492556 95 emnlp-2012-N-gram-based Tense Models for Statistical Machine Translation

7 0.46352208 92 emnlp-2012-Multi-Domain Learning: When Do Domains Matter?

8 0.45302412 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents

9 0.44982097 26 emnlp-2012-Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings

10 0.44147286 72 emnlp-2012-Joint Inference for Event Timeline Construction

11 0.43618339 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing

12 0.42905837 138 emnlp-2012-Wiki-ly Supervised Part-of-Speech Tagging

13 0.41682419 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

14 0.41438633 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction

15 0.40944889 114 emnlp-2012-Revisiting the Predictability of Language: Response Completion in Social Media

16 0.4088541 27 emnlp-2012-Characterizing Stylistic Elements in Syntactic Structure

17 0.40838751 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction

18 0.40785566 101 emnlp-2012-Opinion Target Extraction Using Word-Based Translation Model

19 0.4054265 22 emnlp-2012-Automatically Constructing a Normalisation Dictionary for Microblogs

20 0.40479702 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis