acl acl2010 acl2010-73 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Veselin Stoyanov ; Claire Cardie ; Nathan Gilbert ; Ellen Riloff ; David Buttler ; David Hysom
Abstract: Despite the existence of several noun phrase coreference resolution data sets as well as several formal evaluations on the task, it remains frustratingly difficult to compare results across different coreference resolution systems. This is due to the high cost of implementing a complete end-to-end coreference resolution system, which often forces researchers to substitute available gold-standard information in lieu of implementing a module that would compute that information. Unfortunately, this leads to inconsistent and often unrealistic evaluation scenarios. With the aim of facilitating consistent and realistic experimental evaluations in coreference resolution, we present Reconcile, an infrastructure for the development of learning-based noun phrase (NP) coreference resolution systems. Reconcile is designed to facilitate the rapid creation of coreference resolution systems, easy implementation of new feature sets and approaches to coreference resolution, and empirical evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile and present experimental results showing that Reconcile can be used to create a coreference resolver that achieves performance comparable to state-of-the-art systems on six benchmark data sets.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Despite the existence of several noun phrase coreference resolution data sets as well as several formal evaluations on the task, it remains frustratingly difficult to compare results across different coreference resolution systems. [sent-6, score-1.801]
2 This is due to the high cost of implementing a complete end-to-end coreference resolution system, which often forces researchers to substitute available gold-standard information in lieu of implementing a module that would compute that information. [sent-7, score-0.939]
3 Unfortunately, this leads to inconsistent and often unrealistic evaluation scenarios. [sent-8, score-0.022]
4 With the aim of facilitating consistent and realistic experimental evaluations in coreference resolution, we present Reconcile, an infrastructure for the development of learning-based noun phrase (NP) coreference resolution systems. [sent-9, score-1.46]
5 Reconcile is designed to facilitate the rapid creation of coreference resolution systems, easy implementation of new feature sets and approaches to coreference resolution, and empirical evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. [sent-10, score-2.245]
6 We describe Reconcile and present experimental results showing that Reconcile can be used to create a coreference resolver that achieves performance comparable to state-of-the-art systems on six benchmark data sets. [sent-11, score-0.683]
7 1 Introduction Noun phrase coreference resolution (or simply coreference resolution) is the problem of identifying all noun phrases (NPs) that refer to the same entity in a text. [sent-12, score-1.407]
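For example, in the sentence "When Queen Elizabeth arrived, the monarch greeted her hosts", the NPs "Queen Elizabeth", "the monarch", and "her" all refer to the same person and together form a single coreference chain.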
8 The problem of coreference resolution is fundamental in the field of natural language processing (NLP) because of its usefulness for other NLP tasks, as well as the theoretical interest in understanding the computational mechanisms involved in government, binding and linguistic reference. [sent-13, score-0.847]
9 Several formal evaluations have been conducted for the coreference resolution task (e.g. [sent-14, score-0.847]
10 , MUC-6 (1995), ACE NIST (2004)), and the data sets created for these evaluations have become standard benchmarks in the field. [sent-16, score-0.056]
11 However, it is still frustratingly difficult to compare results across different coreference resolution systems. [sent-19, score-0.874]
12 Reported coreference resolution scores vary wildly across data sets, evaluation metrics, and system configurations. [sent-20, score-0.583]
13 We believe that one root cause of these disparities is the high cost of implementing an end-to-end coreference resolution system. [sent-27, score-0.893]
14 Coreference resolution is a complex problem, and successful systems must tackle a variety of non-trivial subproblems that are central to the coreference task (e.g. [sent-28, score-0.852]
15 , mention/markable detection, anaphor identification) and that require substantial implementation efforts. [sent-30, score-0.028]
16 As a result, many researchers exploit gold-standard annotations, when available, as a substitute for component technologies to solve these subproblems. [sent-31, score-0.023]
17 Unfortunately, the use of gold-standard annotations for critical component technologies leads to an unrealistic evaluation setting, and makes it impossible to directly compare results against coreference resolvers that solve all of these subproblems from scratch. [sent-33, score-0.677]
18 Comparison of coreference resolvers is further hindered by the use of several competing (and non-trivial) evaluation measures, and data sets that have substantially different task definitions and annotation formats. [sent-34, score-0.651]
19 Additionally, coreference resolution is a pervasive problem in NLP and many NLP applications could benefit from an effective coreference resolver that can be easily configured and customized. [sent-35, score-1.507]
20 To address these issues, we have created a platform for coreference resolution, called Reconcile, that can serve as a software infrastructure to support the creation of, experimentation with, and evaluation of coreference resolvers. [sent-36, score-1.176]
21 Reconcile was designed with the following seven desiderata in mind: • implement the basic underlying software architecture of contemporary learning-based coreference resolution systems; … [sent-37, score-0.11]
22 While several other coreference resolution systems are publicly available (e.g. [sent-42, score-0.842]
23 (2008)), none meets all seven of these desiderata (see Related Work). [sent-46, score-0.073]
24 Reconcile is a modular software platform that abstracts the basic architecture of most contemporary supervised learning-based coreference resolution systems (e.g., Soon et al. [sent-47, score-1.014]
25 (2001), Ng and Cardie (2002), Bengtson and Roth (2008)) and achieves performance comparable to the state-of-the-art on several benchmark data sets. [sent-50, score-0.058]
26 Additionally, Reconcile can be easily reconfigured to use different algorithms, features, preprocessing elements, evaluation settings and metrics. [sent-51, score-0.066]
27 In the rest of this paper, we review related work (Section 2), describe Reconcile’s organization and components (Section 3) and show experimental results for Reconcile on six data sets and two evaluation metrics (Section 4). [sent-52, score-0.103]
28 2 Related Work Several coreference resolution systems are currently publicly available. [sent-53, score-0.823]
29 JavaRap (Qiu et al., 2004) is an implementation of Lappin and Leass's (1994) Resolution of Anaphora Procedure (RAP). [sent-55, score-0.028]
30 JavaRap resolves only pronouns and, thus, it is not directly comparable to Reconcile. [sent-56, score-0.027]
31 GuiTaR (Poesio and Kabadjov, 2004) and BART (Versley et al., 2008) (which can be considered a successor of GuiTaR) are both modular systems that target the full coreference resolution task. [sent-58, score-0.857]
32 As such, both systems come close to meeting the majority of the desiderata set forth in Section 1. [sent-59, score-0.054]
33 In addition, the architecture and system components of Reconcile (including a comprehensive set of features that draw on the expertise of state-of-the-art supervised learning approaches, such as Bengtson and Roth (2008)) result in performance closer to the state-of-the-art. [sent-61, score-0.118]
34 Coreference resolution has received much research attention, resulting in an array of approaches, algorithms and features. [sent-62, score-0.304]
35 Reconcile is modeled after typical supervised learning approaches to coreference resolution. [sent-63, score-0.865]
36 However, there have been other approaches to coreference resolution, including unsupervised and semi-supervised approaches (e.g. [sent-67, score-0.583]
37 , McCallum and Wellner (2004) and Finley and Joachims (2005)), and competition approaches. [sent-71, score-0.046]
38 Most of these approaches rely on some notion of pairwise feature-based similarity and can be directly implemented in Reconcile. [sent-76, score-0.022]
39 3 System Description Reconcile was designed to be a research testbed capable of implementing most current approaches to coreference resolution. [sent-77, score-0.617]
40 Reconcile is written in Java, to be portable across platforms, and was designed to be easily reconfigurable with respect to subcomponents, feature sets, parameter settings, etc. [sent-78, score-0.068]
41 The basic architecture of the system includes five major steps. [sent-83, score-0.057]
42 Starting with a corpus of documents together with a manually annotated coreference resolution answer key (required only during training), Reconcile performs the following five steps. [sent-84, score-0.823]
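The enumerated step list did not survive extraction here. A minimal sketch of such a five-step pipeline, inferred from the surrounding description (preprocessing, pairwise feature generation, classification, clustering, scoring against the answer key), is given below; every interface and class name is a hypothetical illustration, not Reconcile's actual API.

    // Hypothetical sketch of a Reconcile-style five-step pipeline; the
    // interface and class names are illustrative, not Reconcile's real API.
    import java.util.ArrayList;
    import java.util.List;

    class Mention {}
    class Chain {}
    class ScoredPair {
        final Mention a, b; final double score;
        ScoredPair(Mention a, Mention b, double score) { this.a = a; this.b = b; this.score = score; }
    }
    class Document {
        List<Mention> nps = new ArrayList<>(); // populated by preprocessing
    }

    interface Preprocessor { void annotate(Document d); }                       // step 1: tag, parse, NE-tag, extract NPs
    interface FeatureExtractor { double[] extract(Mention a, Mention b); }      // step 2: feature vector per NP pair
    interface PairClassifier { double score(double[] features); }               // step 3: coreference confidence per pair
    interface Clusterer { List<Chain> cluster(List<ScoredPair> pairs); }        // step 4: consolidate pairs into chains
    interface Scorer { double evaluate(List<Chain> system, List<Chain> gold); } // step 5: score against the answer key

    class Pipeline {
        Preprocessor pre; FeatureExtractor fx; PairClassifier clf; Clusterer cl; Scorer sc;

        double run(Document doc, List<Chain> goldChains) {
            pre.annotate(doc);
            List<ScoredPair> pairs = new ArrayList<>();
            for (int i = 0; i < doc.nps.size(); i++)
                for (int j = i + 1; j < doc.nps.size(); j++) {
                    Mention a = doc.nps.get(i), b = doc.nps.get(j);
                    pairs.add(new ScoredPair(a, b, clf.score(fx.extract(a, b))));
                }
            return sc.evaluate(cl.cluster(pairs), goldChains); // gold key needed only for training and evaluation
        }
    }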
43 All of the extractors utilize a syntactic parse of the text and the output of a Named Entity (NE) extractor, but extract different constructs as specified by the corresponding task definition. [sent-95, score-0.028]
44 The NP extractors successfully recognize about 95% of the NPs in the MUC and ACE gold standards. [sent-96, score-0.028]
45 Using annotations produced during preprocessing, Reconcile produces feature vectors for pairs of NPs. [sent-99, score-0.025]
46 Reconcile includes over 80 features, inspired by other successful coreference resolution systems such as Soon et al. [sent-101, score-0.823]
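As a toy illustration of what such pairwise features look like, consider the sketch below; it shows a handful of classic Soon et al.-style features (string match, distance, agreement), and the NP fields are invented for the sketch rather than taken from Reconcile's actual representation.

    // Toy illustration of classic Soon et al.-style pairwise features;
    // the NP fields below are invented for this sketch.
    class NP {
        String text, headWord;
        int sentenceIndex;
        boolean plural;
    }

    class ToyPairFeatures {
        static double[] extract(NP antecedent, NP anaphor) {
            double exactMatch  = antecedent.text.equalsIgnoreCase(anaphor.text) ? 1 : 0;         // full string match
            double headMatch   = antecedent.headWord.equalsIgnoreCase(anaphor.headWord) ? 1 : 0; // head noun match
            double sentDist    = anaphor.sentenceIndex - antecedent.sentenceIndex;               // distance in sentences
            double numberAgree = (antecedent.plural == anaphor.plural) ? 1 : 0;                  // number agreement
            return new double[] { exactMatch, headMatch, sentDist, numberAgree };
        }
    }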
47 Reconcile learns a classifier that operates on feature vectors representing pairs of NPs. Table 1: Preprocessing components available in Reconcile. [sent-105, score-0.078]
48 A clustering algorithm consolidates the predictions output by the classifier and forms the final set of coreference clusters (chains). [sent-109, score-0.585]
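One common consolidation strategy in the literature, and the simplest, is transitive closure over positively classified pairs; a minimal union-find sketch follows. This illustrates the technique only and is not necessarily Reconcile's default clusterer.

    // Sketch of chain formation by transitive closure: take every NP pair
    // the classifier labels positive and merge them with union-find.
    import java.util.*;

    class TransitiveClosureClusterer {
        private int[] parent;

        private int find(int x) { return parent[x] == x ? x : (parent[x] = find(parent[x])); }
        private void union(int a, int b) { parent[find(a)] = find(b); }

        // positivePairs holds index pairs [i, j] that the classifier labeled coreferent.
        Map<Integer, List<Integer>> cluster(int numMentions, List<int[]> positivePairs) {
            parent = new int[numMentions];
            for (int i = 0; i < numMentions; i++) parent[i] = i;
            for (int[] p : positivePairs) union(p[0], p[1]);
            Map<Integer, List<Integer>> chains = new HashMap<>();
            for (int i = 0; i < numMentions; i++)
                chains.computeIfAbsent(find(i), k -> new ArrayList<>()).add(i);
            return chains; // each value is one coreference chain (mention indices)
        }
    }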
49 Finally, during testing Reconcile runs scoring algorithms that compare the chains produced by the system to the gold-standard chains in the answer key. [sent-112, score-0.131]
50 Each of the five steps above can invoke different components. [sent-113, score-0.018]
51 Reconcile’s modularity makes it easy for new components to be implemented and existing ones to be removed or replaced. [sent-114, score-0.843] [sent-120, score-0.064]
52 Footnote 2: Some structured coreference resolution algorithms (e.g., McCallum and Wellner (2004) and Finley and Joachims (2005)) combine the classification and clustering steps above. [sent-116, score-0.046]
53 Table 2: Available implementations for the different modules in Reconcile. [sent-119, score-0.04]
55 Reconcile’s standard distribution comes with a comprehensive set of implemented components; those available for steps 2–5 are shown in Table 2. [sent-121, score-0.078]
56 Only about 15% of the code is concerned with running existing components in the preprocessing step, while the rest deals with NP extraction, implementations of features, clustering algorithms and scorers. [sent-123, score-0.128]
57 More details about Reconcile’s architecture and available components and features can be found in Stoyanov et al. [sent-124, score-0.117]
58 4.1 Data Sets Reconcile incorporates the six most commonly used coreference resolution data sets, two from the MUC conferences (MUC-6, 1995; MUC-7, 1997) and four from the ACE Program (NIST, 2004). [sent-127, score-0.881]
59 Performance is evaluated according to the B3 and MUC scoring metrics. [sent-131, score-0.051]
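For reference, a hedged sketch of the B3 metric (Bagga and Baldwin, 1998) is given below: each mention is scored by the overlap between the system chain and the gold chain containing it, and per-mention scores are averaged; MUC, by contrast, is link-based. The sketch represents mentions as plain integers and assumes both partitions cover the same mention set.

    // Hedged sketch of B3: mentions are ints, each chain is a Set<Integer>,
    // and precision/recall are per-mention averages.
    import java.util.*;

    class B3 {
        static double precision(List<Set<Integer>> system, List<Set<Integer>> gold) {
            return avgOverlap(system, gold);
        }
        static double recall(List<Set<Integer>> system, List<Set<Integer>> gold) {
            return avgOverlap(gold, system);
        }
        // For each mention m in each chain of `a`: |a(m) ∩ b(m)| / |a(m)|, averaged.
        static double avgOverlap(List<Set<Integer>> a, List<Set<Integer>> b) {
            double sum = 0;
            int mentions = 0;
            for (Set<Integer> chainA : a)
                for (int m : chainA) {
                    Set<Integer> overlap = new HashSet<>(chainA);
                    overlap.retainAll(chainContaining(b, m));
                    sum += (double) overlap.size() / chainA.size();
                    mentions++;
                }
            return mentions == 0 ? 0 : sum / mentions;
        }
        static Set<Integer> chainContaining(List<Set<Integer>> chains, int m) {
            for (Set<Integer> c : chains) if (c.contains(m)) return c;
            return Collections.emptySet();
        }
    }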
60 4.2 The Reconcile2010 Configuration Reconcile can be easily configured with different algorithms for markable detection, anaphoricity determination, feature extraction, etc. [sent-133, score-0.14]
61 To differentiate it from the general platform, Reconcile2010 is configured using the following components: … [sent-136, score-0.062]
62 For all data sets, B3 scores are higher than MUC scores. [sent-147, score-0.022]
63 The MUC score is highest for the MUC-6 data set, while B3 scores are higher for the ACE data sets as compared to the MUC data sets. [sent-148, score-0.054]
64 Due to the difficulties outlined in Section 1, results for Reconcile presented here are directly comparable only to a limited number of scores reported in the literature. [sent-149, score-0.049]
65 The bottom three rows of Table 3 list these comparable scores, which show that Reconcile2010 exhibits state-of-the-art performance for supervised learning-based coreference resolvers. [sent-150, score-0.586]
66 A more detailed study of Reconcile-based coreference resolution systems in different evaluation scenarios can be found in Stoyanov et al. [sent-151, score-0.823]
67 5 Conclusions Reconcile is a general architecture for coreference resolution that can be used to easily create various coreference resolvers. [sent-153, score-1.446]
68 Reconcile provides broad support for experimentation in coreference resolution, including implementation of the basic architecture of contemporary state-of-the-art coreference systems and a variety of individual modules employed in these systems. [sent-154, score-1.261]
69 Additionally, Reconcile handles all of the formatting and scoring peculiarities of the most widely used coreference resolution data sets (those created as part of the MUC and ACE conferences) and, thus, allows for easy implementation and evaluation across these data sets. [sent-155, score-0.979]
70 We hope that Reconcile will support experimental research in coreference resolution and provide a state-of-the-art coreference resolver for both researchers and application developers. [sent-156, score-1.418]
71 We believe that in this way Reconcile will facilitate meaningful and consistent comparisons of coreference resolution systems. [sent-157, score-0.845]
72 The full Reconcile release is available for download at http://www. [sent-158, score-0.019]
73 424– – – – – – Table 3: Scores for Reconcile on six data sets and scores for comparable coreference systems. [sent-172, score-0.65]
74 A mention-synchronous coreference resolution algorithm based on the bell tree. [sent-256, score-0.823]
75 A general-purpose, off-the-shelf anaphora resolution module: implementation and preliminary evaluation. [sent-297, score-0.347]
76 A public reference implementation of the RAP anaphora resolution algorithm. [sent-306, score-0.38]
77 Conundrums in noun phrase coreference resolution: Making sense of the state-of-the-art. [sent-321, score-0.566]
wordName wordTfidf (topN-words)
[('reconcile', 0.666), ('coreference', 0.539), ('resolution', 0.284), ('muc', 0.121), ('ace', 0.088), ('nps', 0.086), ('cardie', 0.073), ('stoyanov', 0.07), ('bengtson', 0.066), ('resolvers', 0.062), ('configured', 0.062), ('livermore', 0.062), ('bart', 0.059), ('architecture', 0.057), ('resolver', 0.056), ('soon', 0.055), ('desiderata', 0.054), ('scoring', 0.051), ('versley', 0.05), ('opennlp', 0.047), ('finley', 0.046), ('substituting', 0.046), ('poesio', 0.046), ('qiu', 0.044), ('ut', 0.043), ('contemporary', 0.042), ('gilbert', 0.042), ('lnl', 0.041), ('components', 0.041), ('ah', 0.04), ('preprocessing', 0.039), ('implementing', 0.037), ('buttler', 0.036), ('lappin', 0.036), ('anaphora', 0.035), ('experimentation', 0.035), ('modular', 0.034), ('javarap', 0.033), ('rap', 0.033), ('gov', 0.033), ('sets', 0.032), ('benchmark', 0.031), ('anaphoricity', 0.031), ('guitar', 0.031), ('wellner', 0.031), ('ng', 0.031), ('chains', 0.03), ('six', 0.03), ('subproblems', 0.029), ('kabadjov', 0.029), ('frustratingly', 0.029), ('roth', 0.029), ('implementation', 0.028), ('clustering', 0.028), ('conferences', 0.028), ('extractors', 0.028), ('noun', 0.027), ('message', 0.027), ('cornell', 0.027), ('luo', 0.027), ('np', 0.027), ('comparable', 0.027), ('easily', 0.027), ('annotations', 0.025), ('infrastructure', 0.025), ('understanding', 0.024), ('competition', 0.024), ('evaluations', 0.024), ('easy', 0.023), ('mccallum', 0.023), ('substitute', 0.023), ('lawrence', 0.023), ('facilitate', 0.022), ('unrealistic', 0.022), ('across', 0.022), ('scores', 0.022), ('approaches', 0.022), ('modules', 0.021), ('determination', 0.021), ('edu', 0.021), ('platform', 0.02), ('riloff', 0.02), ('supervised', 0.02), ('algorithms', 0.02), ('seven', 0.019), ('java', 0.019), ('designed', 0.019), ('available', 0.019), ('yang', 0.019), ('named', 0.019), ('nips', 0.019), ('joachims', 0.019), ('software', 0.018), ('haghighi', 0.018), ('classifier', 0.018), ('entity', 0.018), ('steps', 0.018), ('hindered', 0.018), ('bree', 0.018), ('cile', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 73 acl-2010-Coreference Resolution with Reconcile
Author: Veselin Stoyanov ; Claire Cardie ; Nathan Gilbert ; Ellen Riloff ; David Buttler ; David Hysom
Abstract: Despite the existence of several noun phrase coreference resolution data sets as well as several formal evaluations on the task, it remains frustratingly difficult to compare results across different coreference resolution systems. This is due to the high cost of implementing a complete end-to-end coreference resolution system, which often forces researchers to substitute available gold-standard information in lieu of implementing a module that would compute that information. Unfortunately, this leads to inconsistent and often unrealistic evaluation scenarios. With the aim of facilitating consistent and realistic experimental evaluations in coreference resolution, we present Reconcile, an infrastructure for the development of learning-based noun phrase (NP) coreference resolution systems. Reconcile is designed to facilitate the rapid creation of coreference resolution systems, easy implementation of new feature sets and approaches to coreference resolution, and empirical evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile and present experimental results showing that Reconcile can be used to create a coreference resolver that achieves performance comparable to state-of-the-art systems on six benchmark data sets.
2 0.53422922 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years
Author: Vincent Ng
Abstract: The research focus of computational coreference resolution has exhibited a shift from heuristic approaches to machine learning approaches in the past decade. This paper surveys the major milestones in supervised coreference research since its inception fifteen years ago.
3 0.35266161 72 acl-2010-Coreference Resolution across Corpora: Languages, Coding Schemes, and Preprocessing Information
Author: Marta Recasens ; Eduard Hovy
Abstract: This paper explores the effect that different corpus configurations have on the performance of a coreference resolution system, as measured by MUC, B3, and CEAF. By varying separately three parameters (language, annotation scheme, and preprocessing information) and applying the same coreference resolution system, the strong bonds between system and corpus are demonstrated. The experiments reveal problems in coreference resolution evaluation relating to task definition, coding schemes, and features. They also expose systematic biases in the coreference evaluation metrics. We show that system comparison is only possible when corpus parameters are in exact agreement.
4 0.26336932 233 acl-2010-The Same-Head Heuristic for Coreference
Author: Micha Elsner ; Eugene Charniak
Abstract: We investigate coreference relationships between NPs with the same head noun. It is relatively common in unsupervised work to assume that such pairs are coreferent, but this is not always true, especially if realistic mention detection is used. We describe the distribution of noncoreferent same-head pairs in news text, and present an unsupervised generative model which learns not to link some same-head NPs using syntactic features, improving precision.
5 0.21351071 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference
Author: Shachar Mirkin ; Ido Dagan ; Sebastian Pado
Abstract: Discourse references, notably coreference and bridging, play an important role in many text understanding applications, but their impact on textual entailment is yet to be systematically understood. On the basis of an in-depth analysis of entailment instances, we argue that discourse references have the potential of substantially improving textual entailment recognition, and identify a number of research directions towards this goal.
6 0.17345537 149 acl-2010-Incorporating Extra-Linguistic Information into Reference Resolution in Collaborative Task Dialogue
7 0.16504739 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features
8 0.1556976 229 acl-2010-The Influence of Discourse on Syntax: A Psycholinguistic Model of Sentence Processing
9 0.094004601 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
10 0.080123156 28 acl-2010-An Entity-Level Approach to Information Extraction
11 0.079230882 38 acl-2010-Automatic Evaluation of Linguistic Quality in Multi-Document Summarization
12 0.057168253 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates
13 0.056608003 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
14 0.054092798 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews
15 0.051836744 1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."
16 0.043577578 139 acl-2010-Identifying Generic Noun Phrases
17 0.040519234 169 acl-2010-Learning to Translate with Source and Target Syntax
18 0.035129488 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data
19 0.034854792 152 acl-2010-It Makes Sense: A Wide-Coverage Word Sense Disambiguation System for Free Text
20 0.03477918 81 acl-2010-Decision Detection Using Hierarchical Graphical Models
topicId topicWeight
[(0, -0.135), (1, 0.108), (2, 0.01), (3, -0.311), (4, -0.186), (5, 0.457), (6, 0.006), (7, 0.023), (8, 0.049), (9, 0.167), (10, 0.074), (11, -0.073), (12, 0.02), (13, -0.097), (14, 0.045), (15, -0.024), (16, -0.018), (17, -0.096), (18, -0.123), (19, 0.007), (20, -0.034), (21, -0.001), (22, 0.002), (23, -0.088), (24, -0.007), (25, 0.015), (26, -0.033), (27, -0.035), (28, -0.038), (29, 0.01), (30, -0.06), (31, -0.06), (32, -0.002), (33, 0.025), (34, -0.037), (35, -0.024), (36, -0.056), (37, -0.037), (38, 0.028), (39, 0.062), (40, 0.007), (41, -0.028), (42, 0.078), (43, 0.045), (44, 0.015), (45, -0.03), (46, 0.01), (47, -0.014), (48, -0.079), (49, -0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.9832359 73 acl-2010-Coreference Resolution with Reconcile
Author: Veselin Stoyanov ; Claire Cardie ; Nathan Gilbert ; Ellen Riloff ; David Buttler ; David Hysom
Abstract: Despite the existence of several noun phrase coreference resolution data sets as well as several formal evaluations on the task, it remains frustratingly difficult to compare results across different coreference resolution systems. This is due to the high cost of implementing a complete end-to-end coreference resolution system, which often forces researchers to substitute available gold-standard information in lieu of implementing a module that would compute that information. Unfortunately, this leads to inconsistent and often unrealistic evaluation scenarios. With the aim of facilitating consistent and realistic experimental evaluations in coreference resolution, we present Reconcile, an infrastructure for the development of learning-based noun phrase (NP) coreference resolution systems. Reconcile is designed to facilitate the rapid creation of coreference resolution systems, easy implementation of new feature sets and approaches to coreference resolution, and empirical evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile and present experimental results showing that Reconcile can be used to create a coreference resolver that achieves performance comparable to state-of-the-art systems on six benchmark data sets.
2 0.92941099 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years
Author: Vincent Ng
Abstract: The research focus of computational coreference resolution has exhibited a shift from heuristic approaches to machine learning approaches in the past decade. This paper surveys the major milestones in supervised coreference research since its inception fifteen years ago.
3 0.89270413 72 acl-2010-Coreference Resolution across Corpora: Languages, Coding Schemes, and Preprocessing Information
Author: Marta Recasens ; Eduard Hovy
Abstract: This paper explores the effect that different corpus configurations have on the performance of a coreference resolution system, as measured by MUC, B3, and CEAF. By varying separately three parameters (language, annotation scheme, and preprocessing information) and applying the same coreference resolution system, the strong bonds between system and corpus are demonstrated. The experiments reveal problems in coreference resolution evaluation relating to task definition, coding schemes, and features. They also expose systematic biases in the coreference evaluation metrics. We show that system comparison is only possible when corpus parameters are in exact agreement.
4 0.81881607 233 acl-2010-The Same-Head Heuristic for Coreference
Author: Micha Elsner ; Eugene Charniak
Abstract: We investigate coreference relationships between NPs with the same head noun. It is relatively common in unsupervised work to assume that such pairs are coreferent, but this is not always true, especially if realistic mention detection is used. We describe the distribution of noncoreferent same-head pairs in news text, and present an unsupervised generative model which learns not to link some same-head NPs using syntactic features, improving precision.
5 0.50526172 149 acl-2010-Incorporating Extra-Linguistic Information into Reference Resolution in Collaborative Task Dialogue
Author: Ryu Iida ; Syumpei Kobayashi ; Takenobu Tokunaga
Abstract: This paper proposes an approach to reference resolution in situated dialogues by exploiting extra-linguistic information. Recently, investigations of referential behaviours involved in situations in the real world have received increasing attention by researchers (Di Eugenio et al., 2000; Byron, 2005; van Deemter, 2007; Spanger et al., 2009). In order to create an accurate reference resolution model, we need to handle extra-linguistic information as well as textual information examined by existing approaches (Soon et al., 2001; Ng and Cardie, 2002, etc.). In this paper, we incorporate extra-linguistic information into an existing corpus-based reference resolution model, and investigate its effects on reference resolution problems within a corpus of Japanese dialogues. The results demonstrate that our proposed model achieves an accuracy of 79.0% for this task.
6 0.45506343 229 acl-2010-The Influence of Discourse on Syntax: A Psycholinguistic Model of Sentence Processing
7 0.44447577 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference
8 0.43632805 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features
9 0.41363838 28 acl-2010-An Entity-Level Approach to Information Extraction
10 0.35678571 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
11 0.30579311 139 acl-2010-Identifying Generic Noun Phrases
12 0.25167432 38 acl-2010-Automatic Evaluation of Linguistic Quality in Multi-Document Summarization
13 0.19267718 81 acl-2010-Decision Detection Using Hierarchical Graphical Models
14 0.17161615 1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."
15 0.16713773 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews
16 0.16427648 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
17 0.13564566 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates
18 0.13514952 152 acl-2010-It Makes Sense: A Wide-Coverage Word Sense Disambiguation System for Free Text
19 0.12818676 59 acl-2010-Cognitively Plausible Models of Human Language Processing
20 0.12417803 61 acl-2010-Combining Data and Mathematical Models of Language Change
topicId topicWeight
[(25, 0.061), (39, 0.016), (42, 0.066), (57, 0.183), (59, 0.073), (73, 0.049), (78, 0.023), (79, 0.02), (80, 0.024), (83, 0.216), (84, 0.017), (88, 0.03), (98, 0.106)]
simIndex simValue paperId paperTitle
same-paper 1 0.88469851 73 acl-2010-Coreference Resolution with Reconcile
Author: Veselin Stoyanov ; Claire Cardie ; Nathan Gilbert ; Ellen Riloff ; David Buttler ; David Hysom
Abstract: Despite the existence of several noun phrase coreference resolution data sets as well as several formal evaluations on the task, it remains frustratingly difficult to compare results across different coreference resolution systems. This is due to the high cost of implementing a complete end-to-end coreference resolution system, which often forces researchers to substitute available gold-standard information in lieu of implementing a module that would compute that information. Unfortunately, this leads to inconsistent and often unrealistic evaluation scenarios. With the aim of facilitating consistent and realistic experimental evaluations in coreference resolution, we present Reconcile, an infrastructure for the development of learning-based noun phrase (NP) coreference resolution systems. Reconcile is designed to facilitate the rapid creation of coreference resolution systems, easy implementation of new feature sets and approaches to coreference resolution, and empirical evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile and present experimental results showing that Reconcile can be used to create a coreference resolver that achieves performance comparable to state-of-the-art systems on six benchmark data sets.
2 0.80153054 1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."
Author: Mark Sammons ; V.G.Vinod Vydiswaran ; Dan Roth
Abstract: We challenge the NLP community to participate in a large-scale, distributed effort to design and build resources for developing and evaluating solutions to new and existing NLP tasks in the context of Recognizing Textual Entailment. We argue that the single global label with which RTE examples are annotated is insufficient to effectively evaluate RTE system performance; to promote research on smaller, related NLP tasks, we believe more detailed annotation and evaluation are needed, and that this effort will benefit not just RTE researchers, but the NLP community as a whole. We use insights from successful RTE systems to propose a model for identifying and annotating textual infer- ence phenomena in textual entailment examples, and we present the results of a pilot annotation study that show this model is feasible and the results immediately useful.
3 0.78923875 72 acl-2010-Coreference Resolution across Corpora: Languages, Coding Schemes, and Preprocessing Information
Author: Marta Recasens ; Eduard Hovy
Abstract: This paper explores the effect that different corpus configurations have on the performance of a coreference resolution system, as measured by MUC, B3, and CEAF. By varying separately three parameters (language, annotation scheme, and preprocessing information) and applying the same coreference resolution system, the strong bonds between system and corpus are demonstrated. The experiments reveal problems in coreference resolution evaluation relating to task definition, coding schemes, and features. They also expose systematic biases in the coreference evaluation metrics. We show that system comparison is only possible when corpus parameters are in exact agreement.
4 0.78826547 38 acl-2010-Automatic Evaluation of Linguistic Quality in Multi-Document Summarization
Author: Emily Pitler ; Annie Louis ; Ani Nenkova
Abstract: To date, few attempts have been made to develop and validate methods for automatic evaluation of linguistic quality in text summarization. We present the first systematic assessment of several diverse classes of metrics designed to capture various aspects of well-written text. We train and test linguistic quality models on consecutive years of NIST evaluation data in order to show the generality of results. For grammaticality, the best results come from a set of syntactic features. Focus, coherence and referential clarity are best evaluated by a class of features measuring local coherence on the basis of cosine similarity between sentences, coreference information, and summarization specific features. Our best results are 90% accuracy for pairwise comparisons of competing systems over a test set of several inputs and 70% for ranking summaries of a specific input.
5 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data
Author: Jenny Rose Finkel ; Christopher D. Manning
Abstract: One of the main obstacles to producing high quality joint models is the lack of jointly annotated data. Joint modeling of multiple natural language processing tasks outperforms single-task models learned from the same data, but still underperforms compared to single-task models learned on the more abundant quantities of available single-task annotated data. In this paper we present a novel model which makes use of additional single-task annotated data to improve the performance of a joint model. Our model utilizes a hierarchical prior to link the feature weights for shared features in several single-task models and the joint model. Experiments on joint parsing and named entity recognition, using the OntoNotes corpus, show that our hierarchical joint model can produce substantial gains over a joint model trained on only the jointly annotated data.
6 0.78081089 112 acl-2010-Extracting Social Networks from Literary Fiction
7 0.78006983 4 acl-2010-A Cognitive Cost Model of Annotations Based on Eye-Tracking Data
8 0.77140689 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years
9 0.77040344 256 acl-2010-Vocabulary Choice as an Indicator of Perspective
10 0.76699239 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes
11 0.76675534 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference
12 0.75896811 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
13 0.75764453 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse
14 0.75612557 31 acl-2010-Annotation
15 0.75352818 233 acl-2010-The Same-Head Heuristic for Coreference
16 0.74194968 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews
17 0.74131364 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
18 0.74102265 81 acl-2010-Decision Detection Using Hierarchical Graphical Models
19 0.74091023 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People
20 0.73735082 32 acl-2010-Arabic Named Entity Recognition: Using Features Extracted from Noisy Data