emnlp emnlp2013 emnlp2013-73 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jonathan K. Kummerfeld ; Dan Klein
Abstract: Coreference resolution metrics quantify errors but do not analyze them. Here, we consider an automated method of categorizing errors in the output of a coreference system into intuitive underlying error types. Using this tool, we first compare the error distributions across a large set of systems, then analyze common errors across the top ten systems, empirically characterizing the major unsolved challenges of the coreference resolution task.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Coreference resolution metrics quantify errors but do not analyze them. [sent-4, score-0.52]
2 Here, we consider an automated method of categorizing errors in the output of a coreference system into intuitive underlying error types. [sent-5, score-0.944]
3 Using this tool, we first compare the error distributions across a large set of systems, then analyze common errors across the top ten systems, empirically characterizing the major unsolved challenges of the coreference resolution task. [sent-6, score-1.105]
4 1 Introduction Metrics produce measurements that concisely summarize performance on the full range of error types, and for coreference resolution there has been extensive work on developing effective metrics (Luo, 2005; Recasens and Hovy, 2011). [sent-7, score-0.826]
5 Previous investigations of coreference errors have focused on quantifying the importance of subtasks such as named entity recognition and anaphoricity detection, typically by measuring accuracy improvements when partial gold annotations are provided (Stoyanov et al. [sent-9, score-1.223]
6 For coreference resolution the drawback of this approach is that decisions are often interdependent, and so even partial gold information is extremely informative. [sent-13, score-0.742]
7 Also, previous work only considered errors by counting links, which does not capture certain errors in a natural way, e.g. [sent-14, score-0.591]
8 265 We present a new tool that automatically classifies errors in the standard output of any coreference resolution system. [sent-18, score-1.038]
9 Since our tool uses only system output, we are able to classify errors made by systems of any architecture, including both systems that use link-based inference and systems that use global inference methods. [sent-20, score-0.468]
10 First, we compare the error distributions on coreference resolution of all of the systems from the CoNLL 2011 shared task plus several publicly available systems. [sent-22, score-0.849]
11 This comparison adds to the analysis from the shared task by illustrating the substantial variation in the types of errors different systems make. [sent-23, score-0.515]
12 This investigation identifies key outstanding challenges and presents the impact that solving each of them would have in terms of changes in the standard coreference resolution metrics. [sent-25, score-0.688]
13 We find that the best systems are not best across all error types, that a large proportion of span errors are due to superficial parse differences, and that the biggest performance loss is on missed entities that contain a small number of mentions. [sent-26, score-0.675]
14 This work presents a comprehensive investigation of common errors in coreference resolution, identifying particular issues worth focusing on in future research. [sent-27, score-0.733]
15 2 Background Most coreference work focuses on accuracy improvements, as measured by metrics such as MUC (Vilain et al. [sent-33, score-0.491]
16 The only common forms of further analysis are results for anaphoricity detection and scores for each mention type (nominal, pronoun, proper). [sent-35, score-0.335]
17 More fine-grained consideration of some subtasks does occur, for example, anaphoricity detection, which has been recognized as a key challenge in coreference resolution for decades and regularly has separate results reported (Paice and Husk, 1987; Sobha et al. [sent-40, score-0.774]
18 Some work has also included anecdotal discussion of specific error types or manual classification of a small set of errors, but these approaches do not effectively quantify the relative impact of different errors (Chen and Ng, 2012; Martschat et al. [sent-44, score-0.445]
19 First, they measured accuracy improvements when their system was given gold annotations for three subtasks of coreference resolution: mention detection, named entity recognition, and anaphoricity detection. [sent-49, score-1.212]
20 To isolate other types of errors they defined resolution classes, based on both the type of a mention, and properties of possible antecedents (for example, nominals that have a possible antecedent that is an exact string match). [sent-50, score-0.465]
21 For each resolution class they measured performance while giving the system gold annotations for all other classes. [sent-51, score-0.377]
22 One error is a mention missing from the system output, he. [sent-53, score-0.598]
23 between the nine classes they defined, it misses the cascade effect of errors that only occur when all mentions are being resolved at once. [sent-55, score-0.465]
24 , 2011, 2012), which explored the impact of mention detection and anaphoricity detection through subtasks with different types of gold annotation. [sent-57, score-0.464]
25 3 Error Classification When inspecting the output of coreference resolution systems, several types of errors become immediately apparent: entities that have been divided into pieces, spurious entities, non-referential pronouns that have been assigned antecedents, and so on. [sent-61, score-1.326]
26 For example, in Figure 1, we can intuitively see two pronoun related mistakes: a missing mention (he), and a divided entity where the two pieces are the blue pronouns (I2, I2, myself2) and the red proper names (President Clinton1, Mr. [sent-64, score-1.204]
27 We focused on using system output because other methods cannot uniformly apply to the full range of coreference resolution decoding methods, from link based methods to global inference methods. [sent-68, score-0.738]
28 Alter Span transforms an incorrect system mention into a gold mention that has the same head token. [sent-77, score-0.581]
29 In Figure 2 this stage is demonstrated by a mention in the leftmost entity, which has its span altered, indicated by the change from an X to a light blue circle. [sent-78, score-0.457]
30 Split breaks the system entities into pieces, each containing mentions from a single gold entity. [sent-80, score-0.436]
31 In Figure 2 there are three changes in this stage: the leftmost entity is split into a red piece and a light blue piece, the middle entity is split into a dark red piece and an X, and the rightmost entity is split into singletons. [sent-81, score-1.064]
32 Introduce creates a singleton entity for each mention that is missing from the system output. [sent-86, score-0.713]
33 In Figure 2 this stage involves the introduction of a light blue mention and two white mentions. [sent-87, score-0.381]
34 blue entity is merged with the rest of the blue entity, and the two white mentions are merged. [sent-96, score-0.601]
35 This could either be a single operation, one entity being split into N pieces, or N − 1 operations, each involving a single piece being split off from the rest of the entity. [sent-101, score-0.337]
36 In Figure 3(i), the system mention Gorbachev is replaced by the annotated mention Soviet leader Gorbachev. [sent-113, score-0.497]
37 the new entity includes pronouns that were already present in the system output. [sent-124, score-0.369]
38 The reasoning for this is that most pronouns in the corpus are coreferent, so including just the pronouns from an entity is not meaningfully different from missing the entity entirely. [sent-125, score-0.859]
39 As for the Missing Entity error type, this error is still assigned if the original entity contained pronouns that were valid. [sent-130, score-0.592]
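The transformation-based classification described above can be pictured procedurally: remove spurious mentions and entities, split system entities so each piece contains mentions from a single gold entity, introduce missed mentions, and merge the pieces back together, counting each operation as an error. The sketch below is a minimal illustration under simplifying assumptions (mentions are atomic identifiers, so the Alter Span stage is omitted), not the authors' released tool.

import collections

def classify_errors(system, gold):
    # system, gold: lists of entities, each a set of mention identifiers.
    errors = collections.Counter()
    gold_of = {m: i for i, ent in enumerate(gold) for m in ent}

    # Stage 1: drop mentions/entities that do not exist in the gold annotation.
    cleaned = []
    for ent in system:
        kept = {m for m in ent if m in gold_of}
        if not kept:
            errors["extra entity"] += 1
        else:
            errors["extra mention"] += len(ent) - len(kept)
            cleaned.append(kept)

    # Stage 2: split each system entity into pieces from a single gold entity.
    pieces = collections.defaultdict(list)   # gold entity index -> its pieces
    for ent in cleaned:
        by_gold = collections.defaultdict(set)
        for m in ent:
            by_gold[gold_of[m]].add(m)
        errors["conflated entities"] += len(by_gold) - 1
        for i, piece in by_gold.items():
            pieces[i].append(piece)

    # Stages 3-4: introduce missing mentions, then merge pieces of each gold entity.
    for i, ent in enumerate(gold):
        ps = pieces.get(i, [])
        found = set().union(*ps) if ps else set()
        if not found:
            errors["missing entity"] += 1
        else:
            errors["missing mention"] += len(ent - found)
            errors["divided entity"] += len(ps) - 1
    return +errors   # unary plus drops zero counts

system = [{"I", "myself"}, {"President Clinton", "Mr. Clinton"}, {"the dog"}]
gold = [{"I", "myself", "President Clinton", "Mr. Clinton", "he"}]
print(classify_errors(system, gold))
# -> extra entity: 1, missing mention: 1, divided entity: 1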
40 5 Broad System Comparison Table 1 presents the frequency of errors for each system and F-Scores for standard metrics1 on the test set of the 2011 CoNLL shared task. [sent-164, score-0.459]
41 Each bar is filled in proportion to the number of errors the system made, with a full bar corresponding to the number of errors listed in the bottom row. [sent-165, score-0.597]
42 However, the metrics do not convey the significant variation in the types of errors systems make. [sent-167, score-0.381]
43 , 2012), improvements are not monotonic, with better systems often making more errors of one type when decreasing the frequency of another type. [sent-172, score-0.349]
44 In this setting, three of the error types are not present, but there are still Missing Mentions and Missing Entities because systems do not always choose an antecedent, leaving a mention as a singleton, which is then ignored. [sent-180, score-0.425]
45 In the next section, we characterize the common errors on a finer level by breaking down each error type by a range of properties. [sent-182, score-0.45]
46 Of the system mentions involved in span errors, 27. [sent-215, score-0.366]
47 Overall it seems that span errors can best be dealt with by improving parsing, though it is not possible to completely eliminate these errors because of inconsistent annotations. [sent-218, score-0.666]
48 Table 3 divides these errors by the type of mention involved and presents some of the most frequent Extra Mentions and Missing Mentions. [sent-221, score-0.581]
49 For the corpus statistics we count as mentions all NP spans in the gold parse plus any word tagged with PRP, WP, WDT, or WRB (following the definition of gold mention boundaries for the CoNLL tasks). [sent-222, score-0.583]
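That mention definition (every NP span in the gold parse plus any token tagged PRP, WP, WDT, or WRB) is mechanical to apply. Below is a small sketch assuming a toy nested-tuple parse representation rather than the CoNLL file format; the example tree is illustrative only.

# A constituency tree as nested tuples: (label, [children]); leaves are (POS, word).
TREE = ("S",
        [("NP", [("PRP", "I")]),
         ("VP", [("VBD", "saw"),
                 ("NP", [("DT", "the"), ("NNP", "President")])])])

PRONOUN_TAGS = {"PRP", "WP", "WDT", "WRB"}

def candidate_mentions(tree):
    # Return (start, end) token spans counted as gold mention boundaries:
    # every NP constituent plus every single token with a pronoun-like tag.
    spans = []

    def walk(node, start):
        label, children = node
        if isinstance(children, str):          # leaf: (POS tag, word)
            if label in PRONOUN_TAGS:
                spans.append((start, start + 1))
            return start + 1
        end = start
        for child in children:
            end = walk(child, end)
        if label == "NP":
            spans.append((start, end))
        return end

    walk(tree, 0)
    return sorted(set(spans))

print(candidate_mentions(TREE))   # [(0, 1), (2, 4)]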
50 grouped by properties of the mention and the entity it is in. [sent-234, score-0.471]
51 In Table 4 we consider the Extra Mention errors and Missing Mention errors involving proper names and nominals. [sent-237, score-0.625]
52 The top section counts errors according to whether the mention involved has an exact string match with a mention in the cluster, or only a head match. [sent-238, score-0.901]
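A rough sketch of the exact-match versus head-match distinction used in that breakdown follows; the head finder here (last token of the phrase) is a crude stand-in for parser-derived heads, and the example mentions are made up.

def head(mention):
    # Crude head approximation: the last token of the phrase.
    return mention.split()[-1].lower()

def match_kind(mention, cluster):
    # How the erroneous mention relates to the rest of its cluster.
    others = [m for m in cluster if m != mention]
    if any(m.lower() == mention.lower() for m in others):
        return "exact match"
    if any(head(m) == head(mention) for m in others):
        return "head match"
    return "no match"

cluster = {"the Soviet leader", "Soviet leader Gorbachev", "the leader"}
print(match_kind("the leader", cluster))         # head match
print(match_kind("the soviet leader", cluster))  # exact match (case-insensitive)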
53 The second section of the table considers the named entity annotations in OntoNotes, counting how often the mention’s type matches the type of the cluster. [sent-239, score-0.403]
54 For these two error types, our observations agree with previous work: the most common specific error is the identification of pleonastic pronouns, named entity types are of limited use, and head matching is already being used about as effectively as it can be. [sent-242, score-0.537]
55 3 Extra Entities and Missing Entities In this section, we consider the errors that involve an entire entity that is either missing from the system output or does not exist in the annotations. [sent-247, score-0.761]
56 For entities containing one nominal and one pronoun (row 0 1 1), there are far more Missing errors than Extra errors, while entities containing two pronouns (row 0 0 2) have the opposite trend. [sent-251, score-0.851]
57 It is clear that entities consisting of a single type of mention are the primary source of these errors, accounting for 85. [sent-252, score-0.431]
58 Table 6 shows counts for these cases divided into three groups: when all mentions are identical, when all mentions have the same head, and the rest. [sent-255, score-0.47]
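Row labels such as 0 1 1 are composition signatures: counts of proper names, nominals, and pronouns in the entity. The grouping into identical / same-head / other cases is also straightforward to compute. The sketch below uses hypothetical mention-type labels and the same last-token head approximation.

from collections import Counter

def composition(entity, mention_type):
    # Signature like (0, 1, 1): counts of proper names, nominals, pronouns.
    kinds = Counter(mention_type[m] for m in entity)
    return (kinds["name"], kinds["nominal"], kinds["pronoun"])

def string_group(entity):
    # Group used in the table: all mentions identical, all share a head, or other.
    if len({m.lower() for m in entity}) == 1:
        return "all identical"
    if len({m.split()[-1].lower() for m in entity}) == 1:
        return "same head"
    return "other"

mention_type = {"the court": "nominal", "it": "pronoun", "the Court": "nominal"}
entity = ["the court", "it"]
print(composition(entity, mention_type), string_group(entity))   # (0, 1, 1) other
print(string_group(["the court", "the Court"]))                  # all identical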
59 Extra and Missing errors where the entity has just two mentions: a pronoun and either a nominal or a proper name. [sent-258, score-0.707]
60 These errors include cases like the example below, where two mentions are not considered coreferent because they are generic: everybody tends to mistake the part for the whole. [sent-261, score-0.504]
61 Here, mistaking the part for the whole is ... For missing entities we see the opposite trend, with Exact match cases accounting for less than 12% of nominal errors. [sent-262, score-0.466]
62 One way of interpreting these errors is from the perspective of the pronoun, which is either incorrectly coreferent (Extra), or incorrectly noncoreferent (Missing). [sent-270, score-0.355]
63 However, the distribution of errors is quite different, with 'it' being balanced here where previously it skewed heavily towards extra mentions, while 'that' was balanced in Table 3 but is skewed towards being part of Missing Entities here. [sent-272, score-0.424]
64 Extra Entity errors and Missing Entity errors are particularly challenging because they are dominated by entities that are either just nominals, or a nominal and a pronoun, and for these cases the string matching features are often misleading. [sent-273, score-0.79]
65 4 Conflated Entities and Divided Entities Table 8 breaks down the Conflated Entities errors and Divided Entity errors by the composition of the part being split/merged and the rest of the entity involved. [sent-278, score-0.799]
66 Clearly pronouns being placed incorrectly is the biggest issue here, with almost all of the common errors involving a part with just pronouns. [sent-280, score-0.484]
67 One particularly noticeable issue involves entities composed entirely of pronouns, which are often created by systems conflating the pronouns of two entities together. [sent-282, score-0.445]
68 6% of Conflated Entity errors led to a complete gold entity with no other errors, and only 21. [sent-289, score-0.566]
69 Finding finer characterizations of these errors is difficult, as almost any division produces sparse counts, reflecting the long tail of mistakes that make up these two error types. [sent-292, score-0.516]
70 5 Cataphora Cataphora (when an anaphor precedes its antecedent) is a pronoun-specific problem that does not fit easily in the common left-to-right coreference resolution approach. [sent-296, score-0.658]
71 In Table 9 we show how well systems handle this challenge by counting mentions based on whether they are cataphoric in the annotations, are cataphoric in the system output, and whether the antecedents match. [sent-299, score-0.384]
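Counting cataphora this way only needs mention positions and antecedent links. The sketch below assumes each output is reduced to a mapping from a mention's token offset to its chosen antecedent's offset, which is an illustrative simplification of the actual annotation format.

def cataphoric_links(links):
    # A link is cataphoric when the mention precedes its antecedent in the text.
    # `links` maps a mention's token offset to its chosen antecedent's offset.
    return {m for m, a in links.items() if a > m}

gold_links = {10: 25}            # gold: mention at 10 refers forward to 25
system_links = {10: 4, 30: 42}   # system: resolves 10 backwards, invents one at 30

gold_cata = cataphoric_links(gold_links)
sys_cata = cataphoric_links(system_links)
print("missed cataphora:", gold_cata - sys_cata)     # {10}
print("spurious cataphora:", sys_cata - gold_cata)   # {30}
print("agreed:", gold_cata & sys_cata)               # set()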
72 Systems handle cataphora poorly, missing almost all of the true instances, and introducing a large number of extra cases. [sent-300, score-0.397]
73 6 Entity Properties Gender, number, person, and named entity type are properties commonly used in coreference resolution systems. [sent-303, score-0.977]
74 In Table 11 we present the percentage of entities that contain mentions with properties of more than one type. [sent-307, score-0.352]
75 For named entity types we considered the annotations in OntoNotes; for the other properties we derive them from the pronouns in each cluster. [sent-308, score-0.491]
76 For all of the properties, there are many entities that we could not assign a value to, either because no named entity information was available, or because no pronouns with an unambiguous value for the property were present. [sent-309, score-0.494]
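Deriving number, gender, and person values from the pronouns in a cluster and flagging disagreements (as in Table 11) can be sketched as follows; the pronoun tables are tiny illustrative subsets, not full inventories.

# Tiny illustrative pronoun tables (subsets only).
NUMBER = {"he": "sg", "she": "sg", "it": "sg", "i": "sg", "they": "pl", "we": "pl"}
GENDER = {"he": "m", "him": "m", "she": "f", "her": "f", "it": "n"}

def property_values(entity, table):
    # Values of one property observed among the entity's pronouns.
    return {table[m.lower()] for m in entity if m.lower() in table}

def disagrees(entity, table):
    # An entity disagrees on a property if its pronouns imply more than one value.
    return len(property_values(entity, table)) > 1

entity = ["Ms. Clark", "she", "they"]
print(disagrees(entity, NUMBER))   # True  ("she" is singular, "they" is plural)
print(disagrees(entity, GENDER))   # False (only "she" carries gender here)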
77 For named entity information, OntoNotes only has annotations for 68% of gold entities, suggesting that named entity taggers are of limited usefulness, matching observations on the MUC and ACE corpora (Stoyanov et al. [sent-310, score-0.614]
78 Each row in the lower section is calculated independently, relative to the change after the span errors have been corrected. [sent-322, score-0.387]
79 Some values are negative because the merge operations involved in fixing the errors apply to clusters that contain mentions from more than one gold entity. [sent-323, score-0.672]
80 Table 11: Percentage of entities that contain mentions with properties that disagree. [sent-328, score-0.352]
81 7% of entities with a mixture of named entity types, there may be mistakes in the coreference annotations, or mistakes in the named entity annotations. [sent-332, score-1.054]
82 By fixing each of the other error types in isolation, we can get a sense of the gain if just that error type is addressed. [sent-337, score-0.337]
83 4 This difference was necessary as the later errors make changes relative to the state of the entities after the Span Errors are corrected, e.g. [sent-340, score-0.436]
84 in Figure 2, a blue and red entity that previously contained an X instead of one of the blue mentions is split. [sent-342, score-0.49]
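The isolation analysis amounts to: correct the span errors first, then apply only the fixes for one error type, re-score, and report the change relative to the span-corrected baseline. The loop below is schematic; apply_fixes and score are placeholders for the transformation operations and a standard metric (e.g. the CoNLL average), not real APIs.

ERROR_TYPES = ["conflated entities", "extra mention", "extra entity",
               "divided entity", "missing mention", "missing entity"]

def isolated_gains(system, gold, apply_fixes, score):
    # Baseline: only span errors corrected (all other error types left in place).
    base = apply_fixes(system, gold, ["span error"])
    base_score = score(base, gold)
    # For each remaining error type, apply just its fixes and measure the delta.
    return {etype: score(apply_fixes(base, gold, [etype]), gold) - base_score
            for etype in ERROR_TYPES}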
85 Missing Entity errors have the most substantial impact, which reflects the precision-oriented nature of many coreference resolution systems. [sent-344, score-0.937]
86 7 Conclusion While the improvement of metrics and the organization of shared tasks have been crucial for progress in coreference resolution, there is much insight to be gained by performing a close analysis of errors. [sent-345, score-0.632]
87 We have presented a new means of automatically classifying coreference errors that provides an exhaustive view of error types. [sent-346, score-0.864]
88 Using our tool we have analyzed the output of a large set of coreference resolution systems and investigated the common challenges across state-of-the-art systems. [sent-347, score-0.789]
89 Combining the best of two worlds: A hybrid approach to multilingual coreference resolution. [sent-386, score-0.454]
90 Simple coreference resolution with rich syntactic and semantic features. [sent-398, score-0.658]
91 An incremental model for coreference resolution with restrictive antecedent accessibility. [sent-414, score-0.697]
92 Stanford's multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. [sent-439, score-0.838]
93 CoNLL-2011 shared task: Modeling unrestricted coreference in OntoNotes. [sent-479, score-0.669]
94 CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes. [sent-483, score-0.669]
95 RelaxCor participation in CoNLL shared task on coreference resolution. [sent-497, score-0.755]
96 Link type based pre-cluster pair model for coreference resolution. [sent-511, score-0.494]
97 Conundrums in noun phrase coreference resolution: making sense of the state-of-the-art. [sent-515, score-0.454]
98 Reconciling OntoNotes: Unrestricted coreference resolution in OntoNotes with Reconcile. [sent-523, score-0.734]
99 A machine learning-based coreference detection system for OntoNotes. [sent-543, score-0.493]
100 Combining syntactic and semantic features by SVM for unrestricted coreference resolution. [sent-559, score-0.528]
wordName wordTfidf (topN-words)
[('coreference', 0.454), ('errors', 0.279), ('mention', 0.229), ('fifteenth', 0.222), ('resolution', 0.204), ('entity', 0.203), ('missing', 0.199), ('mentions', 0.186), ('extra', 0.145), ('shared', 0.141), ('error', 0.131), ('conll', 0.13), ('entities', 0.127), ('pronouns', 0.127), ('pronoun', 0.125), ('pradhan', 0.112), ('stoyanov', 0.112), ('conflated', 0.112), ('span', 0.108), ('blue', 0.09), ('gold', 0.084), ('ontonotes', 0.076), ('nominals', 0.074), ('unrestricted', 0.074), ('nominal', 0.066), ('anaphoricity', 0.066), ('kummerfeld', 0.062), ('mistakes', 0.061), ('tool', 0.06), ('divided', 0.059), ('red', 0.058), ('pages', 0.054), ('blanc', 0.053), ('cataphora', 0.053), ('zhekova', 0.053), ('annotations', 0.05), ('subtasks', 0.05), ('merge', 0.049), ('split', 0.049), ('pieces', 0.047), ('sobha', 0.046), ('conference', 0.045), ('division', 0.045), ('singleton', 0.043), ('bj', 0.042), ('operations', 0.041), ('output', 0.041), ('operation', 0.041), ('placed', 0.04), ('type', 0.04), ('system', 0.039), ('orkelund', 0.039), ('veselin', 0.039), ('cases', 0.039), ('properties', 0.039), ('antecedent', 0.039), ('incorrectly', 0.038), ('composition', 0.038), ('metrics', 0.037), ('named', 0.037), ('lance', 0.037), ('recasens', 0.037), ('sameer', 0.037), ('characterizing', 0.037), ('piece', 0.036), ('charton', 0.036), ('desislava', 0.036), ('holen', 0.036), ('paice', 0.036), ('pattabhi', 0.036), ('sapena', 0.036), ('sundar', 0.036), ('ubiu', 0.036), ('accounting', 0.035), ('ramshaw', 0.035), ('types', 0.035), ('proper', 0.034), ('berkeley', 0.034), ('noticeable', 0.034), ('antecedents', 0.034), ('counting', 0.033), ('names', 0.033), ('involved', 0.033), ('yang', 0.033), ('alter', 0.032), ('cai', 0.032), ('claire', 0.032), ('proceedings', 0.032), ('white', 0.032), ('reconcile', 0.031), ('submission', 0.031), ('martschat', 0.031), ('cataphoric', 0.031), ('stage', 0.03), ('hovy', 0.03), ('task', 0.03), ('nianwen', 0.03), ('systems', 0.03), ('changes', 0.03), ('dan', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999911 73 emnlp-2013-Error-Driven Analysis of Challenges in Coreference Resolution
Author: Jonathan K. Kummerfeld ; Dan Klein
Abstract: Coreference resolution metrics quantify errors but do not analyze them. Here, we consider an automated method of categorizing errors in the output of a coreference system into intuitive underlying error types. Using this tool, we first compare the error distributions across a large set of systems, then analyze common errors across the top ten systems, empirically characterizing the major unsolved challenges of the coreference resolution task.
2 0.60092551 67 emnlp-2013-Easy Victories and Uphill Battles in Coreference Resolution
Author: Greg Durrett ; Dan Klein
Abstract: Classical coreference systems encode various syntactic, discourse, and semantic phenomena explicitly, using heterogenous features computed from hand-crafted heuristics. In contrast, we present a state-of-the-art coreference system that captures such phenomena implicitly, with a small number of homogeneous feature templates examining shallow properties of mentions. Surprisingly, our features are actually more effective than the corresponding hand-engineered ones at modeling these key linguistic phenomena, allowing us to win “easy victories” without crafted heuristics. These features are successful on syntax and discourse; however, they do not model semantic compatibility well, nor do we see gains from experiments with shallow semantic features from the literature, suggesting that this approach to semantics is an “uphill battle.” Nonetheless, our final system1 outperforms the Stanford system (Lee et al. (2011), the winner of the CoNLL 2011 shared task) by 3.5% absolute on the CoNLL metric and outperforms the IMS system (Björkelund and Farkas (2012), the best publicly available English coreference system) by 1.9% absolute.
3 0.43547663 80 emnlp-2013-Exploiting Zero Pronouns to Improve Chinese Coreference Resolution
Author: Fang Kong ; Hwee Tou Ng
Abstract: Coreference resolution plays a critical role in discourse analysis. This paper focuses on exploiting zero pronouns to improve Chinese coreference resolution. In particular, a simplified semantic role labeling framework is proposed to identify clauses and to detect zero pronouns effectively, and two effective methods (refining syntactic parser and refining learning example generation) are employed to exploit zero pronouns for Chinese coreference resolution. Evaluation on the CoNLL-2012 shared task data set shows that zero pronouns can significantly improve Chinese coreference resolution.
4 0.3885217 1 emnlp-2013-A Constrained Latent Variable Model for Coreference Resolution
Author: Kai-Wei Chang ; Rajhans Samdani ; Dan Roth
Abstract: Coreference resolution is a well known clustering task in Natural Language Processing. In this paper, we describe the Latent Left Linking model (L3M), a novel, principled, and linguistically motivated latent structured prediction approach to coreference resolution. We show that L3M admits efficient inference and can be augmented with knowledge-based constraints; we also present a fast stochastic gradient based learning. Experiments on ACE and Ontonotes data show that L3M and its constrained version, CL3M, are more accurate than several state-of-the-art approaches as well as some structured prediction models proposed in the literature.
5 0.33657032 112 emnlp-2013-Joint Coreference Resolution and Named-Entity Linking with Multi-Pass Sieves
Author: Hannaneh Hajishirzi ; Leila Zilles ; Daniel S. Weld ; Luke Zettlemoyer
Abstract: Many errors in coreference resolution come from semantic mismatches due to inadequate world knowledge. Errors in named-entity linking (NEL), on the other hand, are often caused by superficial modeling of entity context. This paper demonstrates that these two tasks are complementary. We introduce NECO, a new model for named entity linking and coreference resolution, which solves both problems jointly, reducing the errors made on each. NECO extends the Stanford deterministic coreference system by automatically linking mentions to Wikipedia and introducing new NEL-informed mention-merging sieves. Linking improves mention-detection and enables new semantic attributes to be incorporated from Freebase, while coreference provides better context modeling by propagating named-entity links within mention clusters. Experiments show consistent improvements across a number of datasets and experimental conditions, including over 11% reduction in MUC coreference error and nearly 21% reduction in F1 NEL error on ACE 2004 newswire data.
6 0.17367819 117 emnlp-2013-Latent Anaphora Resolution for Cross-Lingual Pronoun Prediction
7 0.17349488 160 emnlp-2013-Relational Inference for Wikification
8 0.13534212 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
9 0.12687682 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types
11 0.11637995 75 emnlp-2013-Event Schema Induction with a Probabilistic Entity-Driven Model
12 0.1109192 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
13 0.10343507 130 emnlp-2013-Microblog Entity Linking by Leveraging Extra Posts
14 0.0979397 45 emnlp-2013-Chinese Zero Pronoun Resolution: Some Recent Advances
15 0.097861908 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
16 0.089292146 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
17 0.088522248 118 emnlp-2013-Learning Biological Processes with Global Constraints
18 0.08381246 62 emnlp-2013-Detection of Product Comparisons - How Far Does an Out-of-the-Box Semantic Role Labeling System Take You?
19 0.067457937 74 emnlp-2013-Event-Based Time Label Propagation for Automatic Dating of News Articles
20 0.066920213 143 emnlp-2013-Open Domain Targeted Sentiment
topicId topicWeight
[(0, -0.27), (1, 0.344), (2, 0.571), (3, -0.201), (4, 0.077), (5, -0.064), (6, 0.024), (7, -0.08), (8, -0.034), (9, -0.014), (10, -0.073), (11, -0.004), (12, 0.014), (13, 0.0), (14, 0.023), (15, -0.015), (16, 0.038), (17, -0.023), (18, -0.014), (19, -0.057), (20, -0.01), (21, -0.041), (22, 0.003), (23, -0.037), (24, 0.066), (25, 0.011), (26, -0.056), (27, -0.031), (28, 0.019), (29, 0.012), (30, 0.016), (31, -0.026), (32, 0.004), (33, -0.032), (34, -0.061), (35, -0.024), (36, 0.01), (37, -0.038), (38, -0.044), (39, -0.023), (40, -0.007), (41, -0.009), (42, 0.012), (43, -0.041), (44, -0.018), (45, -0.004), (46, 0.013), (47, 0.016), (48, 0.006), (49, 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.97456241 73 emnlp-2013-Error-Driven Analysis of Challenges in Coreference Resolution
2 0.95895875 67 emnlp-2013-Easy Victories and Uphill Battles in Coreference Resolution
3 0.93438691 112 emnlp-2013-Joint Coreference Resolution and Named-Entity Linking with Multi-Pass Sieves
4 0.90899569 1 emnlp-2013-A Constrained Latent Variable Model for Coreference Resolution
5 0.89436489 80 emnlp-2013-Exploiting Zero Pronouns to Improve Chinese Coreference Resolution
6 0.52334112 45 emnlp-2013-Chinese Zero Pronoun Resolution: Some Recent Advances
7 0.50636864 160 emnlp-2013-Relational Inference for Wikification
8 0.44433784 117 emnlp-2013-Latent Anaphora Resolution for Cross-Lingual Pronoun Prediction
9 0.41064444 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
10 0.38784587 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types
11 0.34433556 23 emnlp-2013-Animacy Detection with Voting Models
12 0.32430947 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
13 0.31199247 75 emnlp-2013-Event Schema Induction with a Probabilistic Entity-Driven Model
15 0.2886098 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
16 0.28577858 198 emnlp-2013-Using Soft Constraints in Joint Inference for Clinical Concept Recognition
17 0.27207062 62 emnlp-2013-Detection of Product Comparisons - How Far Does an Out-of-the-Box Semantic Role Labeling System Take You?
18 0.25877976 130 emnlp-2013-Microblog Entity Linking by Leveraging Extra Posts
19 0.23928498 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery
20 0.22926022 108 emnlp-2013-Interpreting Anaphoric Shell Nouns using Antecedents of Cataphoric Shell Nouns as Training Data
topicId topicWeight
[(3, 0.036), (17, 0.153), (18, 0.031), (22, 0.046), (30, 0.057), (50, 0.024), (51, 0.201), (58, 0.104), (66, 0.027), (71, 0.03), (75, 0.064), (77, 0.015), (95, 0.014), (96, 0.092), (97, 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.8575595 73 emnlp-2013-Error-Driven Analysis of Challenges in Coreference Resolution
2 0.83011544 67 emnlp-2013-Easy Victories and Uphill Battles in Coreference Resolution
Author: Jinpeng Wang ; Wayne Xin Zhao ; Haitian Wei ; Hongfei Yan ; Xiaoming Li
Abstract: Hot trends are likely to bring new business opportunities. For example, “Air Pollution” might lead to a significant increase of the sales of related products, e.g., mouth mask. For ecommerce companies, it is very important to make rapid and correct response to these hot trends in order to improve product sales. In this paper, we take the initiative to study the task of how to identify trend related products. The major novelty of our work is that we automatically learn commercial intents revealed from microblogs. We carefully construct a data collection for this task and present quite a few insightful findings. In order to solve this problem, we further propose a graph based method, which jointly models relevance and associativity. We perform extensive experiments and the results showed that our methods are very effective.
4 0.79005396 112 emnlp-2013-Joint Coreference Resolution and Named-Entity Linking with Multi-Pass Sieves
5 0.787552 80 emnlp-2013-Exploiting Zero Pronouns to Improve Chinese Coreference Resolution
7 0.77732641 105 emnlp-2013-Improving Web Search Ranking by Incorporating Structured Annotation of Queries
8 0.77510464 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter
9 0.76971102 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
10 0.76448148 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types
11 0.76312405 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
12 0.76222032 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations
13 0.75612098 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
14 0.75300109 45 emnlp-2013-Chinese Zero Pronoun Resolution: Some Recent Advances
15 0.75258809 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations
16 0.7519533 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
17 0.75038463 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction
18 0.74744231 160 emnlp-2013-Relational Inference for Wikification
19 0.74670321 130 emnlp-2013-Microblog Entity Linking by Leveraging Extra Posts
20 0.74459636 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery