acl acl2010 acl2010-233 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Micha Elsner ; Eugene Charniak
Abstract: We investigate coreference relationships between NPs with the same head noun. It is relatively common in unsupervised work to assume that such pairs are coreferent, but this is not always true, especially if realistic mention detection is used. We describe the distribution of non-coreferent same-head pairs in news text, and present an unsupervised generative model which learns not to link some same-head NPs using syntactic features, improving precision.
Reference: text
sentIndex sentText sentNum sentScore
1 We investigate coreference relationships between NPs with the same head noun. [sent-3, score-0.418]
2 It is relatively common in unsupervised work to assume that such pairs are coreferent, but this is not always true, especially if realistic mention detection is used. [sent-4, score-0.42]
3 We describe the distribution of non-coreferent same-head pairs in news text, and present an unsupervised generative model which learns not to link some same-head NPs using syntactic features, improving precision. [sent-5, score-0.333]
4 1 Introduction Full NP coreference, the task of discovering which non-pronominal NPs in a discourse refer to the same entity, is widely known to be challenging. [sent-6, score-0.041]
5 In practice, however, most work focuses on the subtask of linking NPs with different head words. [sent-7, score-0.217]
6 Decisions involving NPs with the same head word have not attracted nearly as much attention, and many systems, especially unsupervised ones, operate under the assumption that all same-head pairs corefer. [sent-8, score-0.334]
7 This is by no means always the case: there are several systematic exceptions to the rule. [sent-9, score-0.034]
8 In this paper, we show that these exceptions are fairly common, and describe an unsupervised system which learns to distinguish them from coreferent same-head pairs. [sent-10, score-0.311]
9 Primarily, this is because they are a comparatively easy subtask in a notoriously difficult area; Stoyanov et al. [sent-12, score-0.043]
10 (2009) shows that, among NPs headed by common nouns, those which have an exact match earlier in the document are the easiest to resolve (variant MUC score . [sent-13, score-0.146]
11 53), by far the worst performance is on those without any match at all (. [sent-15, score-0.042]
12 This effect is magnified by most popular metrics for coreference, which reward finding links within large clusters more than they punish proposing spurious links, making it hard to improve performance by linking conservatively. [sent-17, score-0.264]
13 Systems that use gold mention boundaries (the locations of NPs marked by annotators)1 have even less need to worry about same-head relationships, since most NPs which disobey the conventional assumption are not marked as mentions. [sent-18, score-0.396]
14 In this paper, we count how often same-head pairs fail to corefer in the MUC-6 corpus, showing that gold mention detection hides most such pairs, but more realistic detection finds large numbers. [sent-19, score-0.566]
15 We also present an unsupervised generative model which learns to make certain same-head pairs non-coreferent. [sent-20, score-0.291]
16 The model is based on the idea that pronoun referents are likely to be salient noun phrases in the discourse, so we can learn about NP antecedents using pronominal antecedents as a starting point. [sent-21, score-0.461]
17 Since our model links fewer NPs than the baseline, it improves precision but decreases recall. [sent-23, score-0.13]
18 This tradeoff is favorable for CEAF, but not for b3. [sent-24, score-0.056]
19 2 Related work Unsupervised systems specify the assumption of same-head coreference in several ways: by as [Footnote 1: Gold mention detection means something slightly different in the ACE corpus, where the system input contains every NP annotated with an entity type.] [sent-25, score-0.481]
20 (These three systems, perhaps not coincidentally, use gold mentions.) [sent-29, score-0.112]
21 An exception is Ng (2008), who points out that head identity is not an entirely reliable cue and instead uses exact string match (minus determiners) for common NPs and an alias detection system for proper NPs. [sent-30, score-0.292]
22 This work uses mentions extracted with an NP chunker. [sent-31, score-0.227]
23 However, while using exact string match raises precision, many non-matching phrases are still coreferent, so this approach cannot be considered a full solution to the problem. [sent-33, score-0.091]
24 , 2009) attempts to determine the contributions of various categories of NP to coreference scores, and shows (as stated above) that common NPs which partially match an earlier mention are not well resolved by the state-of-the-art RECONCILE system, which uses pairwise classification. [sent-36, score-0.458]
25 They also show that using gold mention boundaries makes the coreference task substantially easier, and argue that this experimental setting is “rather unrealistic”. [sent-37, score-0.574]
26 3 Descriptive study: MUC-6 We begin by examining how often non-coreferent same-head pairs appear in the MUC-6 coreference dataset. [sent-38, score-0.415]
27 To do so, we compare two artificial coreference systems: the link-all strategy links all, and only, full (non-pronominal) NP pairs with the same head which occur within 10 sentences of one another. [sent-39, score-0.546]
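The link-all strategy described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Mention` record, its field names, and the reading of "within 10 sentences" as an absolute sentence-index difference of at most 10 are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    # Hypothetical record; the field names are illustrative assumptions.
    mention_id: int
    sentence: int          # index of the sentence containing the mention
    head: str              # head word of the NP
    is_pronoun: bool = False


def link_all(mentions, window=10):
    """Link every pair of full (non-pronominal) NPs that share a head word
    and occur within `window` sentences of one another."""
    full = [m for m in mentions if not m.is_pronoun]
    links = set()
    for i, a in enumerate(full):
        for b in full[i + 1:]:
            if a.head == b.head and abs(a.sentence - b.sentence) <= window:
                links.add((a.mention_id, b.mention_id))
    return links
```

Widening `window` adds more same-head links, which trades precision for recall in the way the window-size footnote describes.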
28 We compare our results to the gold standard using two metrics. [sent-41, score-0.112]
29 b3 (Bagga and Baldwin, 1998) is a standard metric which calculates a precision and recall for each mention. [sent-42, score-0.06]
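For readers unfamiliar with b3, the per-mention precision and recall can be sketched as follows. This is a simplified version that assumes the gold and proposed clusterings cover exactly the same set of mentions; the function name and the dictionary input format are choices made for this example, and handling of mentions that appear on only one side is left out.

```python
def b_cubed(gold, predicted):
    """Compute b3 precision, recall and F1.

    gold, predicted: dicts mapping each mention to a cluster id.
    Assumes both dicts have the same keys.
    """
    def clusters(assignment):
        out = {}
        for mention, cluster_id in assignment.items():
            out.setdefault(cluster_id, set()).add(mention)
        return out

    gold_clusters = clusters(gold)
    pred_clusters = clusters(predicted)

    precision = recall = 0.0
    for mention in gold:
        g = gold_clusters[gold[mention]]
        p = pred_clusters[predicted[mention]]
        overlap = len(g & p)
        precision += overlap / len(p)  # how much of the proposed cluster is right
        recall += overlap / len(g)     # how much of the gold cluster was found

    n = len(gold)
    precision /= n
    recall /= n
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Merging everything into one cluster gives perfect recall and reduced precision, which is the pattern a link-all system tends toward.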
30 The mention CEAF (Luo, 2005) constructs a maximum-weight bipartite matching between gold and proposed clusters, then gives the percentage of entities whose gold label and proposed label match. [sent-43, score-0.142] [sent-46, score-0.224]
31 (Footnote 2) The choice of 10 sentences as the window size captures most, but not all, of the available recall. Using nouns mention detection, it misses 117 possible same-head links, or about 10%. [sent-44, score-0.245]
32 However, precision drops further as the window size increases. [sent-45, score-0.06]
34 b3 gives more weight to errors involving larger clusters (since these lower scores for several mentions at once); for mention CEAF, all mentions are weighted equally. [sent-47, score-0.668]
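Mention CEAF, by contrast, scores a one-to-one matching between gold and proposed clusters. The sketch below finds the best matching by brute force over permutations, which is only practical for a handful of clusters; a real scorer would use the Kuhn-Munkres (Hungarian) algorithm instead. The function name and the list-of-sets input format are assumptions for this example, and both clusterings are assumed to be non-empty and to cover the same mentions.

```python
from itertools import permutations


def mention_ceaf(gold_clusters, pred_clusters):
    """Fraction of mentions whose gold and proposed clusters are matched
    under the best one-to-one cluster alignment (brute-force search)."""
    # Pad the shorter side with empty clusters so a full matching exists.
    n = max(len(gold_clusters), len(pred_clusters))
    gold = list(gold_clusters) + [set()] * (n - len(gold_clusters))
    pred = list(pred_clusters) + [set()] * (n - len(pred_clusters))

    total_mentions = sum(len(c) for c in gold_clusters)
    best_overlap = max(
        sum(len(g & pred[j]) for g, j in zip(gold, perm))
        for perm in permutations(range(n))
    )
    return best_overlap / total_mentions
```

Because every mention contributes at most one unit of overlap, a misplaced mention costs the same whether it sits in a large or a small cluster.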
35 The gold mentions method takes only mentions marked by annotators. [sent-50, score-0.612]
36 The nps method takes all base noun phrases detected by the parser. [sent-51, score-0.636]
37 Finally, the nouns method takes all nouns, even those that do not head NPs; this method maximizes recall, since it does not exclude prenominals in phrases like “a Bush spokesman”. [sent-52, score-0.217]
38 ) For each experimental setting, we show the number of mentions detected, and how many of them are linked to some antecedent by the system. [sent-54, score-0.322]
39 b3 shows a large drop in precision when all same-head pairs are linked; in fact, in the nps and nouns settings, only about half the same-headed NPs are actually coreferent (864 real links, 1592 pairs for nps). [sent-56, score-1.052]
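As a quick check of the "about half" figure, the two counts quoted for the nps setting give the fraction of proposed same-head pairs that are real links. This is plain arithmetic on the numbers in the sentence above, not a reproduction of the paper's scoring; the variable names are illustrative.

```python
real_links = 864        # coreferent same-head links in the nps setting (from the text)
proposed_pairs = 1592   # same-head pairs in the nps setting (from the text)

link_precision = real_links / proposed_pairs
print(f"{link_precision:.3f}")  # 0.543
```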
40 This demonstrates that non-coreferent same-head pairs not only occur, but are actually rather common in the dataset. [sent-57, score-0.123]
41 The drop in precision is much less obvious in the gold mentions setting, however; most unlinked same-head pairs are not annotated as mentions in the gold data, which is one reason why systems run in this experimental setting can afford to ignore them. [sent-58, score-0.904]
42 Improperly linking same-head pairs causes a loss in precision, but scores are dominated by recall [footnote 3]. [sent-59, score-0.15]
43 Thus, reporting b3 helps to mask the impact of these pairs when examining the final f-score. [sent-60, score-0.141]
44 39 pairs denoted different entities (“recent employees” vs “employees who have worked for longer”) disambiguated by modifiers or sometimes by discourse position. [sent-62, score-0.174]
45 The next largest group (24) consists of time and measure phrases like “ten miles”. [sent-63, score-0.049]
46 12 pairs refer to parts or quantities [Footnote 3: This bias is exaggerated for systems which only link same-head pairs, but continues to apply to real systems; for instance, Haghighi and Klein (2009) report a b3 precision of 84 and recall of 67.] [sent-64, score-0.219]
47 [Table: number of mentions detected and linked, b3 precision, recall and F, and mention CEAF for the oracle, link-all and alignment systems under each mention detection setting (gold mentions, NPs, nouns); the numeric cells did not survive extraction.] [sent-65, score-0.227]
48 Gold mentions leave little room for improvement between baseline and oracle; detecting more mentions widens the gap between them. [sent-78, score-0.485]
49 With realistic mention detection, precision and CEAF scores improve over baselines, while recall and f-scores drop. [sent-79, score-0.264]
50 involving a generic The remaining proper 9 4 noun phrases headed by Inc. [sent-88, score-0.204]
51 To define our generative model, we assume that the parse trees for the entire document D are given, except for the subtrees with root nonterminal NP, denoted ni, which our system will generate. [sent-91, score-0.105]
52 These subtrees are related by a hidden set of alignments, ai, which link each NP to another NP (which we call a generator) appearing somewhere before it in the document, or to a null antecedent. [sent-92, score-0.161]
53 The generative process fills in all the NP nodes in order, from left to right. [sent-94, score-0.039]
54 When deciding on a generator for NP ni, we can extract features characterizing its relationship to a potential generator gj. [sent-96, score-0.19]
55 These features, which we denote f(ni, gj, D), may depend on their relative position in the document D, and on any features of gj, since we have already generated its tree. [sent-97, score-0.277]
56 As usual for IBM models, we learn using EM, and we need to start our alignment function off with a good initial set of parameters. [sent-99, score-0.056]
57 Since antecedents of NPs and pronouns (both salient NPs) often occur in similar syntactic environments, we use an alignment function for pronoun coreference as a starting point. [sent-100, score-0.612]
58 This alignment can be learned from raw data, making our approach unsupervised. [sent-101, score-0.056]
59 We take the pronoun model of Charniak and Elsner (2009) as our starting point. [sent-102, score-0.152]
60 Then our alignment (parameterized by feature weights w) is: $p(a_i = j \mid G, D) \propto \exp(f(n_i, g_j, D) \cdot w)$. The weights w are learned by gradient descent on the log-likelihood. [sent-104, score-0.367]
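The log-linear alignment above is a softmax over candidate generators. A minimal sketch, assuming each candidate's features f(ni, gj, D) have already been extracted into a plain list of numbers (the function name and input format are illustrative, not taken from the paper):

```python
import math


def alignment_distribution(feature_vectors, weights):
    """Return p(a_i = j) for each candidate generator j, proportional to
    exp(f(n_i, g_j, D) . w)."""
    scores = [sum(f * w for f, w in zip(fv, weights)) for fv in feature_vectors]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

With all weights at zero every candidate is equally likely; raising the weight on a feature shifts probability toward the generators that fire it.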
61 To use this model within EM, we alternate an E-step where we calculate the expected alignments E[ai = j], then an M-step where we run gradient descent. [sent-105, score-0.034]
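The alternation between the two steps might look roughly like the sketch below, shown for a single NP with a fixed candidate list. It is an illustration under stated assumptions rather than the authors' training code: the learning rate, the step count, the function names, and the use of gradient ascent on the expected log-likelihood (equivalent to descent on its negative) are choices made for the example.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def e_step(align_probs, translation_probs):
    """Expected alignments E[a_i = j], proportional to
    p(a_i = j) * p(n_i | g_j). Assumes at least one candidate has
    non-zero probability (for example, the null antecedent)."""
    joint = [a * t for a, t in zip(align_probs, translation_probs)]
    z = sum(joint)
    return [x / z for x in joint]


def m_step(feature_vectors, expected, weights, lr=0.1, steps=50):
    """Gradient ascent on sum_j E[a_i = j] * log p(a_i = j | w)."""
    w = list(weights)
    for _ in range(steps):
        scores = [sum(f * wk for f, wk in zip(fv, w)) for fv in feature_vectors]
        probs = softmax(scores)
        for k in range(len(w)):
            # Gradient: expected feature value minus model feature value.
            grad = sum((q - p) * fv[k]
                       for q, p, fv in zip(expected, probs, feature_vectors))
            w[k] += lr * grad
    return w
```

In a full system the gradient would be summed over every NP in the corpus before each update.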
62 (We have also had some success with stepwise EM as in Liang and Klein (2009), but this requires some tuning to work properly.) [sent-106, score-0.031]
63 As features, we take the same features as Charniak and Elsner (2009): sentence and word-count distance between ni and gj, sentence position of each, syntactic role of each, and head type of gj (proper, common or pronoun). [sent-111, score-0.62]
64 We designed this feature set to distinguish prominent NPs in the discourse, and also to be able to detect abstract or partitive phrases by examining modifiers and determiners. [sent-113, score-0.176]
65 To produce full NPs and learn same-head coreference, we focus on learning a good alignment using the pronoun model as a starting point. [sent-114, score-0.208]
66 For translation, we use a trivial model, $p(n_i \mid g_{a_i}) = 1$ if the two have the same head, and 0 otherwise, except for the null antecedent, which draws heads from a multinomial distribution over words. [sent-115, score-0.086]
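The trivial translation model can be written down directly. In this sketch the null antecedent is represented by `None` and its multinomial over head words by a plain dictionary; both representations, and the function name, are assumptions made for illustration.

```python
def translation_prob(np_head, generator_head, null_head_probs):
    """p(n_i | g_{a_i}) under the trivial translation model.

    generator_head is None for the null antecedent, in which case the head
    is drawn from a multinomial over words (null_head_probs)."""
    if generator_head is None:
        return null_head_probs.get(np_head, 0.0)
    return 1.0 if np_head == generator_head else 0.0
```

Since a non-null generator can only produce an NP with its own head, all of the modelling burden falls on the alignment distribution.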
67 While we could learn an alignment and then treat all generators as antecedents, so that only NPs aligned to the null antecedent were not labeled coreferent, in practice this model would align nearly all the same-head pairs. [sent-116, score-0.293]
68 Therefore, our model is actually a mixture of two IBM models, pC and pN, where pC produces NPs with antecedents and pN produces pairs that share a head, but are not coreferent. [sent-118, score-0.22]
69 In all experimental settings, the model improves precision over the baseline while decreasing recall; that is, it misses some legitimate coreferent pairs while correctly excluding many of the spurious ones. [sent-125, score-0.443]
70 Because of the precision-recall tradeoff at which the systems operate, this results in reduced b3 and link F. [sent-126, score-0.098]
71 However, for the nps and nouns settings, where the parser is responsible for finding mentions, the tradeoff is positive for the CEAF metrics. [sent-127, score-0.632]
72 For instance, in the nps setting, it improves over baseline by 57%. [sent-128, score-0.521]
73 As expected, the model does poorly in the gold mentions setting, doing worse than baseline on both metrics. [sent-129, score-0.339]
74 Although it is possible to get very high precision in this setting, the model is far too conservative, linking less than half of the available mentions to anything, when in fact about 60% of them are coreferent. [sent-130, score-0.348]
75 As we explain above, this experimental setting makes it mostly unnecessary to worry about non-coreferent same-head pairs because the MUC-6 annotators don’t often mark them. [sent-131, score-0.185]
76 6 Conclusions While same-head pairs are easier to resolve than same-other pairs, they are still non-trivial and deserve further attention in coreference research. [sent-132, score-0.423]
77 It is also important to report results using a realistic mention detector as well as gold mentions. [sent-134, score-0.316]
78 An Expectation Maximization approach to pronoun resolution. [sent-147, score-0.121]
79 Simple coreference resolution with rich syntactic and semantic features. [sent-160, score-0.329]
80 Conundrums in noun phrase coreference resolution: Making sense of the state-of-the-art. [sent-187, score-0.307]
wordName wordTfidf (topN-words)
[('nps', 0.521), ('gj', 0.277), ('coreference', 0.274), ('ni', 0.23), ('mentions', 0.227), ('coreferent', 0.176), ('ceaf', 0.158), ('mention', 0.142), ('np', 0.128), ('pronoun', 0.121), ('head', 0.113), ('gold', 0.112), ('antecedents', 0.097), ('antecedent', 0.095), ('generator', 0.095), ('elsner', 0.092), ('pairs', 0.089), ('charniak', 0.088), ('null', 0.086), ('haghighi', 0.078), ('pn', 0.078), ('stoyanov', 0.072), ('links', 0.07), ('detection', 0.065), ('oracle', 0.065), ('unsupervised', 0.062), ('realistic', 0.062), ('samehead', 0.062), ('linking', 0.061), ('precision', 0.06), ('ibm', 0.058), ('pt', 0.058), ('generators', 0.056), ('alignment', 0.056), ('tradeoff', 0.056), ('resolution', 0.055), ('nouns', 0.055), ('em', 0.054), ('pc', 0.053), ('employees', 0.053), ('qp', 0.053), ('vadas', 0.053), ('examining', 0.052), ('worry', 0.05), ('phrases', 0.049), ('misses', 0.048), ('micha', 0.046), ('bagga', 0.046), ('marked', 0.046), ('setting', 0.046), ('headed', 0.044), ('mcclosky', 0.044), ('modifiers', 0.044), ('subtask', 0.043), ('eugene', 0.042), ('match', 0.042), ('link', 0.042), ('poon', 0.041), ('discourse', 0.041), ('proper', 0.041), ('learns', 0.039), ('spurious', 0.039), ('klein', 0.039), ('generative', 0.039), ('cherry', 0.037), ('involving', 0.037), ('clusters', 0.035), ('ai', 0.034), ('actually', 0.034), ('exceptions', 0.034), ('nothing', 0.034), ('gradient', 0.034), ('noun', 0.033), ('salient', 0.033), ('nonterminal', 0.033), ('aria', 0.033), ('operate', 0.033), ('subtrees', 0.033), ('detected', 0.033), ('honolulu', 0.032), ('plays', 0.031), ('starting', 0.031), ('punish', 0.031), ('alias', 0.031), ('corefer', 0.031), ('spokesman', 0.031), ('sne', 0.031), ('bursty', 0.031), ('legitimate', 0.031), ('afford', 0.031), ('sourcelanguage', 0.031), ('easiest', 0.031), ('stepwise', 0.031), ('partitive', 0.031), ('deserve', 0.031), ('widens', 0.031), ('anaphora', 0.03), ('resolve', 0.029), ('real', 0.028), ('magnified', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 233 acl-2010-The Same-Head Heuristic for Coreference
Author: Micha Elsner ; Eugene Charniak
Abstract: We investigate coreference relationships between NPs with the same head noun. It is relatively common in unsupervised work to assume that such pairs are coreferent, but this is not always true, especially if realistic mention detection is used. We describe the distribution of non-coreferent same-head pairs in news text, and present an unsupervised generative model which learns not to link some same-head NPs using syntactic features, improving precision.
2 0.4648225 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years
Author: Vincent Ng
Abstract: The research focus of computational coreference resolution has exhibited a shift from heuristic approaches to machine learning approaches in the past decade. This paper surveys the major milestones in supervised coreference research since its inception fifteen years ago.
3 0.33029714 72 acl-2010-Coreference Resolution across Corpora: Languages, Coding Schemes, and Preprocessing Information
Author: Marta Recasens ; Eduard Hovy
Abstract: This paper explores the effect that different corpus configurations have on the performance of a coreference resolution system, as measured by MUC, B3, and CEAF. By varying separately three parameters (language, annotation scheme, and preprocessing information) and applying the same coreference resolution system, the strong bonds between system and corpus are demonstrated. The experiments reveal problems in coreference resolution evaluation relating to task definition, coding schemes, and features. They also expose systematic biases in the coreference evaluation metrics. We show that system comparison is only possible when corpus parameters are in exact agreement.
4 0.26336932 73 acl-2010-Coreference Resolution with Reconcile
Author: Veselin Stoyanov ; Claire Cardie ; Nathan Gilbert ; Ellen Riloff ; David Buttler ; David Hysom
Abstract: Despite the existence of several noun phrase coreference resolution data sets as well as several formal evaluations on the task, it remains frustratingly difficult to compare results across different coreference resolution systems. This is due to the high cost of implementing a complete end-to-end coreference resolution system, which often forces researchers to substitute available gold-standard information in lieu of implementing a module that would compute that information. Unfortunately, this leads to inconsistent and often unrealistic evaluation scenarios. With the aim to facilitate consistent and realistic experimental evaluations in coreference resolution, we present Reconcile, an infrastructure for the development of learning-based noun phrase (NP) coreference resolution systems. Reconcile is designed to facilitate the rapid creation of coreference resolution systems, easy implementation of new feature sets and approaches to coreference resolution, and empirical evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile and present experimental results showing that Reconcile can be used to create a coreference resolver that achieves performance comparable to state-of-the-art systems on six benchmark data sets.
5 0.24104145 28 acl-2010-An Entity-Level Approach to Information Extraction
Author: Aria Haghighi ; Dan Klein
Abstract: We present a generative model of template-filling in which coreference resolution and role assignment are jointly determined. Underlying template roles first generate abstract entities, which in turn generate concrete textual mentions. On the standard corporate acquisitions dataset, joint resolution in our entity-level model reduces error over a mention-level discriminative approach by up to 20%.
6 0.18132792 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features
7 0.16668795 139 acl-2010-Identifying Generic Noun Phrases
8 0.15161325 229 acl-2010-The Influence of Discourse on Syntax: A Psycholinguistic Model of Sentence Processing
9 0.1391909 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference
10 0.1261656 38 acl-2010-Automatic Evaluation of Linguistic Quality in Multi-Document Summarization
11 0.11820383 133 acl-2010-Hierarchical Search for Word Alignment
12 0.1128782 149 acl-2010-Incorporating Extra-Linguistic Information into Reference Resolution in Collaborative Task Dialogue
13 0.098828718 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
14 0.095413469 203 acl-2010-Rebanking CCGbank for Improved NP Interpretation
15 0.092728905 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment
16 0.083288819 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
17 0.081088357 169 acl-2010-Learning to Translate with Source and Target Syntax
18 0.077237979 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
19 0.077217706 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates
20 0.076843783 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
topicId topicWeight
[(0, -0.23), (1, 0.042), (2, 0.03), (3, -0.273), (4, -0.166), (5, 0.424), (6, -0.005), (7, 0.032), (8, 0.145), (9, 0.143), (10, 0.054), (11, -0.144), (12, -0.042), (13, -0.104), (14, 0.078), (15, -0.039), (16, -0.0), (17, -0.081), (18, -0.068), (19, -0.001), (20, -0.037), (21, 0.024), (22, 0.037), (23, -0.056), (24, 0.007), (25, -0.045), (26, -0.005), (27, 0.006), (28, -0.018), (29, -0.026), (30, -0.005), (31, -0.05), (32, 0.022), (33, 0.009), (34, -0.023), (35, -0.057), (36, -0.028), (37, -0.007), (38, -0.033), (39, 0.037), (40, 0.014), (41, -0.025), (42, 0.053), (43, 0.031), (44, -0.032), (45, 0.01), (46, 0.005), (47, 0.023), (48, 0.074), (49, -0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.96383381 233 acl-2010-The Same-Head Heuristic for Coreference
Author: Micha Elsner ; Eugene Charniak
Abstract: We investigate coreference relationships between NPs with the same head noun. It is relatively common in unsupervised work to assume that such pairs are coreferent, but this is not always true, especially if realistic mention detection is used. We describe the distribution of non-coreferent same-head pairs in news text, and present an unsupervised generative model which learns not to link some same-head NPs using syntactic features, improving precision.
2 0.92420077 72 acl-2010-Coreference Resolution across Corpora: Languages, Coding Schemes, and Preprocessing Information
Author: Marta Recasens ; Eduard Hovy
Abstract: This paper explores the effect that different corpus configurations have on the performance of a coreference resolution system, as measured by MUC, B3, and CEAF. By varying separately three parameters (language, annotation scheme, and preprocessing information) and applying the same coreference resolution system, the strong bonds between system and corpus are demonstrated. The experiments reveal problems in coreference resolution evaluation relating to task definition, coding schemes, and features. They also expose systematic biases in the coreference evaluation metrics. We show that system comparison is only possible when corpus parameters are in exact agreement.
3 0.91888654 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years
Author: Vincent Ng
Abstract: The research focus of computational coreference resolution has exhibited a shift from heuristic approaches to machine learning approaches in the past decade. This paper surveys the major milestones in supervised coreference research since its inception fifteen years ago.
4 0.91651118 73 acl-2010-Coreference Resolution with Reconcile
Author: Veselin Stoyanov ; Claire Cardie ; Nathan Gilbert ; Ellen Riloff ; David Buttler ; David Hysom
Abstract: Despite the existence of several noun phrase coreference resolution data sets as well as several formal evaluations on the task, it remains frustratingly difficult to compare results across different coreference resolution systems. This is due to the high cost of implementing a complete end-to-end coreference resolution system, which often forces researchers to substitute available gold-standard information in lieu of implementing a module that would compute that information. Unfortunately, this leads to inconsistent and often unrealistic evaluation scenarios. With the aim to facilitate consistent and realistic experimental evaluations in coreference resolution, we present Reconcile, an infrastructure for the development of learning-based noun phrase (NP) coreference resolution systems. Reconcile is designed to facilitate the rapid creation of coreference resolution systems, easy implementation of new feature sets and approaches to coreference resolution, and empirical evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile and present experimental results showing that Reconcile can be used to create a coreference resolver that achieves performance comparable to state-of-the-art systems on six benchmark data sets.
5 0.60502392 28 acl-2010-An Entity-Level Approach to Information Extraction
Author: Aria Haghighi ; Dan Klein
Abstract: We present a generative model of template-filling in which coreference resolution and role assignment are jointly determined. Underlying template roles first generate abstract entities, which in turn generate concrete textual mentions. On the standard corporate acquisitions dataset, joint resolution in our entity-level model reduces error over a mention-level discriminative approach by up to 20%.
6 0.51989275 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features
7 0.50778502 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
8 0.48558694 149 acl-2010-Incorporating Extra-Linguistic Information into Reference Resolution in Collaborative Task Dialogue
9 0.44898078 229 acl-2010-The Influence of Discourse on Syntax: A Psycholinguistic Model of Sentence Processing
10 0.44119999 139 acl-2010-Identifying Generic Noun Phrases
11 0.37252977 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference
12 0.34159079 38 acl-2010-Automatic Evaluation of Linguistic Quality in Multi-Document Summarization
13 0.30018887 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
14 0.28649867 133 acl-2010-Hierarchical Search for Word Alignment
15 0.27757078 203 acl-2010-Rebanking CCGbank for Improved NP Interpretation
16 0.27710018 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection
17 0.26146704 130 acl-2010-Hard Constraints for Grammatical Function Labelling
18 0.25428787 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
19 0.2497161 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation
20 0.2478565 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
topicId topicWeight
[(14, 0.012), (25, 0.106), (33, 0.012), (42, 0.037), (59, 0.121), (73, 0.04), (78, 0.047), (80, 0.015), (83, 0.169), (84, 0.02), (88, 0.19), (98, 0.129)]
simIndex simValue paperId paperTitle
1 0.93137115 112 acl-2010-Extracting Social Networks from Literary Fiction
Author: David Elson ; Nicholas Dames ; Kathleen McKeown
Abstract: We present a method for extracting social networks from literature, namely, nineteenth-century British novels and serials. We derive the networks from dialogue interactions, and thus our method depends on the ability to determine when two characters are in conversation. Our approach involves character name chunking, quoted speech attribution and conversation detection given the set of quotes. We extract features from the social networks and examine their correlation with one another, as well as with metadata such as the novel’s setting. Our results provide evidence that the majority of novels in this time period do not fit two characterizations provided by literacy scholars. Instead, our results suggest an alternative explanation for differences in social networks.
same-paper 2 0.90840793 233 acl-2010-The Same-Head Heuristic for Coreference
Author: Micha Elsner ; Eugene Charniak
Abstract: We investigate coreference relationships between NPs with the same head noun. It is relatively common in unsupervised work to assume that such pairs are coreferent, but this is not always true, especially if realistic mention detection is used. We describe the distribution of non-coreferent same-head pairs in news text, and present an unsupervised generative model which learns not to link some same-head NPs using syntactic features, improving precision.
3 0.86773193 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices
Author: Taniya Mishra ; Srinivas Bangalore
Abstract: The Deep Web is the collection of information repositories that are not indexed by search engines. These repositories are typically accessible through web forms and contain dynamically changing information. In this paper, we present a system that allows users to access such rich repositories of information on mobile devices using spoken language.
4 0.82198656 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years
Author: Vincent Ng
Abstract: The research focus of computational coreference resolution has exhibited a shift from heuristic approaches to machine learning approaches in the past decade. This paper surveys the major milestones in supervised coreference research since its inception fifteen years ago.
5 0.80789727 73 acl-2010-Coreference Resolution with Reconcile
Author: Veselin Stoyanov ; Claire Cardie ; Nathan Gilbert ; Ellen Riloff ; David Buttler ; David Hysom
Abstract: Despite the existence of several noun phrase coreference resolution data sets as well as several formal evaluations on the task, it remains frustratingly difficult to compare results across different coreference resolution systems. This is due to the high cost of implementing a complete end-to-end coreference resolution system, which often forces researchers to substitute available gold-standard information in lieu of implementing a module that would compute that information. Unfortunately, this leads to inconsistent and often unrealistic evaluation scenarios. With the aim to facilitate consistent and realistic experimental evaluations in coreference resolution, we present Reconcile, an infrastructure for the development of learning-based noun phrase (NP) coreference resolution systems. Reconcile is designed to facilitate the rapid creation of coreference resolution systems, easy implementation of new feature sets and approaches to coreference resolution, and empirical evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile and present experimental results showing that Reconcile can be used to create a coreference resolver that achieves performance comparable to state-of-the-art systems on six benchmark data sets.
6 0.80189413 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
7 0.80172074 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
8 0.79347235 71 acl-2010-Convolution Kernel over Packed Parse Forest
9 0.79085863 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection
10 0.78951347 1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."
11 0.7847783 169 acl-2010-Learning to Translate with Source and Target Syntax
12 0.78375584 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
13 0.78352535 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features
14 0.7804324 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
15 0.7794013 158 acl-2010-Latent Variable Models of Selectional Preference
16 0.77937382 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
17 0.77926177 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
18 0.77618855 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People
19 0.77615952 60 acl-2010-Collocation Extraction beyond the Independence Assumption
20 0.77562863 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans