acl acl2010 acl2010-1 knowledge-graph by maker-knowledge-mining

1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."


Source: pdf

Author: Mark Sammons ; V.G.Vinod Vydiswaran ; Dan Roth

Abstract: We challenge the NLP community to participate in a large-scale, distributed effort to design and build resources for developing and evaluating solutions to new and existing NLP tasks in the context of Recognizing Textual Entailment. We argue that the single global label with which RTE examples are annotated is insufficient to effectively evaluate RTE system performance; to promote research on smaller, related NLP tasks, we believe more detailed annotation and evaluation are needed, and that this effort will benefit not just RTE researchers, but the NLP community as a whole. We use insights from successful RTE systems to propose a model for identifying and annotating textual inference phenomena in textual entailment examples, and we present the results of a pilot annotation study that show this model is feasible and the results immediately useful.

Reference: text


Summary: the most important sentences generated by tfidf model
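The page does not document how these scores were produced; the sketch below is a minimal, hypothetical tf-idf sentence scorer (assuming scikit-learn is available; the function and variable names are ours, not from the original pipeline) of the kind that could yield sentIndex/sentScore pairs like those listed.

```python
# Minimal, hypothetical sketch: rank a paper's sentences by the sum of the
# tf-idf weights of their words, as a "most important sentences" list might be built.
from sklearn.feature_extraction.text import TfidfVectorizer

def rank_sentences(sentences):
    """Return (sentIndex, sentence, score) triples sorted by descending score."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(sentences)  # one row per sentence
    scores = matrix.sum(axis=1).A1                # sum of word weights per sentence
    order = sorted(range(len(sentences)), key=lambda i: -scores[i])
    return [(i, sentences[i], float(scores[i])) for i in order]

if __name__ == "__main__":
    demo = ["We challenge the NLP community to build shared RTE resources.",
            "This is an appropriate time to consider a systematic process.",
            "We present the results of a pilot annotation study."]
    for idx, sent, score in rank_sentences(demo):
        print(f"sentIndex={idx}  sentScore={score:.3f}  {sent}")
```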

sentIndex sentText sentNum sentScore

1 We challenge the NLP community to participate in a large-scale, distributed effort to design and build resources for developing and evaluating solutions to new and existing NLP tasks in the context of Recognizing Textual Entailment. [sent-7, score-0.327]

2 We use insights from successful RTE systems to propose a model for identifying and annotating textual inference phenomena in textual entailment examples, and we present the results of a pilot annotation study that show this model is feasible and the results immediately useful. [sent-9, score-1.112]

3 1 Introduction Much of the work in the field of Natural Language Processing is founded on an assumption of semantic compositionality: that there are identifiable, separable components of an unspecified inference process that will develop as research in NLP progresses. [sent-10, score-0.103]

4 Tasks such as Named Entity and coreference resolution, syntactic and shallow semantic parsing, and information and relation extraction have been identified as worthwhile tasks and pursued by numerous researchers. [sent-11, score-0.19]

5 While many have (nearly) immediate application to real world tasks like search, many are also motivated by their potential contribution to more ambitious Natural Language tasks. [sent-12, score-0.09]

6 It is clear that the components/tasks identified so far do not suffice in themselves to solve tasks requiring more complex reasoning and synthesis of information; many other tasks must be solved to achieve human-like performance on tasks such as Question Answering. [sent-13, score-0.256]

7 But there is no clear process for identifying potential tasks (other than consensus by a sufficient number of researchers), nor for quantifying their potential contribution to existing NLP tasks, let alone to Natural Language Understanding. [sent-14, score-0.125]

8 Recent “grand challenges” such as Learning by Reading, Learning To Read, and Machine Reading are prompting more careful thought about the way these tasks relate, and what tasks must be solved in order to understand text sufficiently well to reliably reason with it. [sent-15, score-0.225]

9 This is an appropriate time to consider a systematic process for identifying semantic analysis tasks relevant to natural language understanding, and for assessing their potential impact on NLU system performance. [sent-16, score-0.179]

10 The RTE task has been designed specifically to exercise textual inference capabilities, in a format that would make RTE systems potentially useful components in other “deep” NLP tasks such as Question Answering and Machine Translation. [sent-27, score-0.279]

11 Determining the correct label for a single textual entailment example requires human analysts to make many smaller, localized decisions which may depend on each other. [sent-29, score-0.475]

12 A broad, carefully conducted effort to identify and annotate such local phenomena in RTE corpora would allow their distributions in RTE examples to be quantified, and allow evaluation of NLP solutions in the context of RTE. [sent-30, score-0.614]

13 Such phenomena will almost certainly correspond to elements of linguistic theory; but this data-driven approach focuses attention on those phenomena that are well-represented in the RTE corpora, and which can be identified with sufficiently close agreement. [sent-32, score-0.639]

14 At present, it is hard to know what specific capabilities different RTE systems have, and hence, which aspects of successful systems are worth emulating or reusing. [sent-34, score-0.167]

15 … through reuse of successful solutions and focus on unresolved problems. [sent-36, score-0.111]

16 We argue that Textual Entailment, as an application that clearly requires sophisticated textual inference to perform well, requires the solution of a range of sub-problems, some familiar and some not yet known. [sent-38, score-0.216]

17 We therefore propose RTE as a promising and worthwhile task for large-scale community involvement, as it motivates the study of many other NLP problems in the context of general textual inference. [sent-39, score-0.202]

18 We use this to motivate a large-scale annotation effort to provide data with the mark-up sufficient to support these goals. [sent-41, score-0.151]

19 To stimulate discussion of suitable annotation and evaluation models, we propose a candidate model, and provide results from a pilot annotation effort (section 3). [sent-42, score-0.323]

20 We argue that such an evaluation and annotation effort can identify relevant subproblems whose solution will benefit not only Textual Entailment but a range of other long-standing NLP tasks, and can stimulate development of new ones. [sent-44, score-0.274]

21 We also show how this data can be used to investigate the behavior of some of the highest-scoring RTE systems from the most recent challenge (section 4). [sent-45, score-0.092]

22 The task was later extended (…, 2007) to include the additional requirement that systems identify when the Hypothesis contradicts the Text. [sent-52, score-0.093]

23 This operational definition of Textual Entailment avoids commitment to any specific knowledge representation, inference method, or learning approach, thus encouraging application of a wide range of techniques to the problem. [sent-54, score-0.089]

24 An Illustrative Example: The simple RTE examples in Figure 1 (most RTE examples have much longer Texts) illustrate some typical inference capabilities demonstrated by human readers in determining whether one span of text contains the meaning of another. [sent-56, score-0.316]

25 To recognize that Hypothesis 1 is entailed by the text, a human reader must recognize that “another company” in the Hypothesis can match “LexCorp”. [sent-57, score-0.161]

26 She must also identify the nominalized relation “purchase”, and determine that “A purchased by B” implies “B acquires A”. [sent-58, score-0.091]

27 To recognize that Hypothesis 2 contradicts the Text, similar steps are required, together with the inference that because the stated purchase price is different in the Text and Hypothesis, but with high probability refers to the same transaction, Hypothesis 2 contradicts the Text. [sent-59, score-0.238]

28 … agreement, and the bottom row shows the number of correct (positive, negative) examples on which the pair of systems agree. [sent-63, score-0.139]

29 Table 2 reports the observed agreement between systems and the lexical baseline in terms of the percentage of examples on which a pair of systems gave the same label. [sent-71, score-0.186]

30 The agreement between most systems and the baseline is about 67%, which suggests that systems are not simply augmented versions of the lexical baseline, and are also distinct from each other in their behaviors. [sent-72, score-0.109]
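A minimal sketch of how such pairwise agreement percentages can be computed from two systems' outputs (the label lists below are invented for illustration, not the actual RTE runs):

```python
# Hypothetical sketch: percentage of examples on which two RTE systems
# (or a system and a lexical baseline) assign the same label.
def pairwise_agreement(labels_a, labels_b):
    assert len(labels_a) == len(labels_b)
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * same / len(labels_a)

# Illustrative labels only; the real system outputs are not reproduced here.
system_x = ["ENTAILMENT", "CONTRADICTION", "UNKNOWN", "ENTAILMENT"]
baseline = ["ENTAILMENT", "UNKNOWN", "UNKNOWN", "CONTRADICTION"]
print(f"agreement: {pairwise_agreement(system_x, baseline):.1f}%")  # -> 50.0%
```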

31 However, it is not possible to objectively assess the role these capabilities play in each system’s performance from the system outputs alone. [sent-78, score-0.128]

32 It is premature, however, to conclude that these resources have little potential impact on RTE system performance: most RTE researchers agree that the real contribution of individual resources is difficult to assess. [sent-82, score-0.179]

33 Various efforts have been made by individual research teams to address specific capabilities that are intuitively required for good RTE performance, such as (de Marneffe et al., 2008). [sent-84, score-0.133]

34 The formal treatment of entailment phenomena in (MacCartney and Manning, 2009) depends on and formalizes a divide-and-conquer approach to entailment resolution. [sent-85, score-0.98]

35 To devote real effort to identify and develop such capabilities, researchers must be confident that the resources (and the will!) … [sent-87, score-0.24]

36 If it were even known what phenomena were relevant to specific entailment examples, it might be possible to more accurately distinguish system capabilities, and promote adoption of successful solutions to sub-problems. [sent-90, score-0.84]

37 Of course, if examples were also annotated with explanations in a consistent format, this could form the basis of a new evaluation of the kind essayed in the pilot study in (Giampiccolo et al. [sent-92, score-0.195]

38 Such an effort would require a steady output of RTE examples to form the underpinning of these annotations; and in order to get sufficient data to represent less common, but nonetheless important, phenomena, a large body of data is ultimately needed. [sent-96, score-0.165]

39 A research team interested in annotating a new phenomenon should use examples drawn from the common corpus. [sent-97, score-0.127]

40 Aside from any task-specific gold standard annotation they add to the entailment pairs, they should augment existing explanations by indicating in which examples their phenomenon occurs, and at which point in the existing explanation for each example. [sent-98, score-0.559]

41 In fact, this latter effort (identifying phenomena relevant to textual inference, marking relevant RTE examples, and generating explanations) itself enables other researchers to select from known problems, assess their likely impact, and automatically generate relevant corpora. [sent-99, score-0.73]

42 To assess the feasibility of annotating RTE-oriented local entailment phenomena, we developed an inference model that could be followed by annotators, and conducted a pilot annotation study. [sent-100, score-0.612]

43 We based our initial effort on observations about RTE data we made while participating in RTE challenges, together with intuitive conceptions of the kinds of knowledge that might be available in semi-structured or structured form. [sent-101, score-0.088]

44 In this section, we present our annotation inference model, and the results of our pilot annotation effort. [sent-102, score-0.269]

45 Inference Process: To identify and annotate RTE sub-phenomena in RTE examples, we need a defensible model for the entailment process that will lead to consistent annotation by different researchers, and to an extensible framework that can accommodate new phenomena as they are identified. [sent-104, score-0.729]

46 We modeled the entailment process as one of manipulating the text and hypothesis to be as similar as possible, by first identifying parts of the text that matched parts of the hypothesis, and then identifying connecting structure. [sent-105, score-0.571]

47 As we followed this procedure for a given example, we marked which entailment phenomena were required for the inference. [sent-109, score-0.655]

48 First, we would identify the arguments “BMI” and “another company” in the Hypothesis as matching “BMI” and “LexCorp” respectively, requiring 1) Parent-Sibling to recognize that “LexCorp” can match “company”. [sent-111, score-0.131]

49 We would then observe that the only possible match for the hypothesis argument “for $3. [sent-117, score-0.121]

50 Note that neither explanation mentions the anaphora resolution connecting “they” to “traders”, because it is not strictly required to determine the entailment label. [sent-120, score-0.425]

51 We annotated examples with domains (such as “Work”) for two reasons: to establish whether some phenomena are correlated with particular domains; and to identify domains that are sufficiently well-represented that a knowledge engineering study might be possible. [sent-125, score-0.513]

52 While we did not generate an explicit representation of our entailment process (i.e., explanations), [sent-126, score-0.348]

53 we tracked which phenomena were strictly required for inference. [sent-128, score-0.307]

54 The phenomena that we considered during annotation are presented in Tables 3, 4, 5, and 6. [sent-134, score-0.347]

55 Two passes were made over the data: the first covered 50 examples from each RTE sub-task, while the second covered an additional 20 examples from each sub-task. [sent-142, score-0.154]

56 Tables 3, 4, 5, and 6 present information about the distribution of the phenomena we tagged, and the inter-annotator agreement (Cohen’s Kappa (Cohen, 1960)) for each. [sent-144, score-0.321]

57 “Occurrence” lists the average percentage of examples labeled with a phenomenon by the two annotators. [sent-145, score-0.127]
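A minimal sketch of the two statistics used in those tables, per-phenomenon occurrence and Cohen's Kappa, assuming binary present/absent tags from two annotators over the same examples (the tag vectors below are invented, not the pilot-study data):

```python
# Hypothetical sketch: occurrence and Cohen's Kappa (Cohen, 1960) for one
# phenomenon, given binary tags from two annotators over the same examples.
def occurrence(ann1, ann2):
    """Average percentage of examples tagged with the phenomenon by the two annotators."""
    pct1 = 100.0 * sum(ann1) / len(ann1)
    pct2 = 100.0 * sum(ann2) / len(ann2)
    return (pct1 + pct2) / 2.0

def cohens_kappa(ann1, ann2):
    n = len(ann1)
    p_observed = sum(a == b for a, b in zip(ann1, ann2)) / n
    p1, p2 = sum(ann1) / n, sum(ann2) / n
    p_expected = p1 * p2 + (1 - p1) * (1 - p2)   # chance agreement for binary tags
    return (p_observed - p_expected) / (1 - p_expected)

# Illustrative tags (1 = phenomenon present), not the real annotation data.
a1 = [1, 0, 1, 1, 0, 0, 1, 0]
a2 = [1, 0, 1, 0, 0, 0, 1, 0]
print(occurrence(a1, a2), round(cohens_kappa(a1, a2), 3))  # -> 43.75 0.75
```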

58 The results confirmed our initial intuition about some phenomena: for example, that coreference resolution is central to RTE, and that detecting the connecting structure is crucial in discerning negative from positive examples. [sent-159, score-0.103]

59 We also found strong evidence that the difference between contradiction and unknown entailment examples is often due to the behavior of certain relations that either preclude certain other relations holding between the same arguments (for example, winning a contest vs. …). [sent-160, score-0.531]

60 We found that for some examples, there was more than one way to infer the hypothesis from the text. [sent-162, score-0.099]

61 Typically, for positive examples this involved overlap between phenomena; for example, Coreference might be expected to resolve implicit relations induced from appositive structures. [sent-163, score-0.1]

62 In future efforts, annotators should record the entailment steps they used to reach their decision. [sent-165, score-0.377]

63 At a minimum, each inference step must identify the spans of the Text and Hypothesis that are involved and the name of the entailment phenomenon represented; in addition, a partial order over steps must be specified when one inference step requires that another has been completed. [sent-167, score-0.614]
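A minimal sketch of a record type that could hold such a step, with the two spans, the phenomenon name, and the prerequisite steps that define the partial order; the field names and the example annotation are ours, not the paper's:

```python
# Hypothetical sketch of one inference step in an annotated explanation:
# the Text span and Hypothesis span involved, the entailment phenomenon name,
# and the ids of steps that must be completed first (a partial order).
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class InferenceStep:
    step_id: int
    text_span: Tuple[int, int]        # character offsets in the Text
    hypothesis_span: Tuple[int, int]  # character offsets in the Hypothesis
    phenomenon: str                   # e.g. "Parent-Sibling", "Nominalization"
    depends_on: List[int] = field(default_factory=list)

# Illustrative annotation of the BMI / LexCorp example; offsets are made up.
steps = [
    InferenceStep(1, (23, 30), (35, 50), "Parent-Sibling"),
    InferenceStep(2, (31, 40), (10, 18), "Nominalization", depends_on=[1]),
]
```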

64 Future annotation efforts should also add a category “Other”, to indicate for each example whether the annotator considers the listed entailment phenomena sufficient to identify the label. [sent-168, score-0.764]

65 These, together with specifications that minimize the likely disagreements between different groups of annotators, are processes that must be refined as part of the broad community effort we seek to stimulate. [sent-170, score-0.187]

66 To answer this question, we looked at the top-5 systems and tried to find which phenomena are active in the mistakes they make. [sent-175, score-0.343]

67 (a) Most systems seem to fail on examples that need numeric reasoning to get the entailment decision right. [sent-176, score-0.482]

68 For example, system H got all 10 examples with numeric reasoning wrong. [sent-177, score-0.16]

69 (c) Most systems make errors in examples that have a disconnected or exclusion component (argument/relation). [sent-180, score-0.152]

70 (d) Some phenomena are handled well by certain systems, but not by others. [sent-182, score-0.284]

71 For example, failing to recognize a parent-sibling relation between entities/concepts seems to be one of the top-5 phenomena active in systems E and H. [sent-183, score-0.389]

72 System H also fails to correctly label over 53% of the examples having a kinship relation. [sent-184, score-0.104]

73 Which phenomena have strong correlations to the entailment labels among hard examples? [sent-186, score-0.654]

74 Some of the phenomena that strongly correlate with the TE labels on hard examples are: deeper lexical relation between words (ρ = 0. [sent-189, score-0.431]

75 Further, we find that the top-5 systems tend to make mistakes in cases where the lexical approach also makes mistakes (ρ = 0. [sent-192, score-0.115]
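A minimal sketch of how such a correlation ρ could be computed, treating a phenomenon as a 0/1 indicator per example and correlating it with the gold labels (Pearson correlation; the vectors below are invented):

```python
# Hypothetical sketch: correlation between the presence of a phenomenon in an
# example and its gold entailment label, over the annotated "hard" examples.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# 1 = phenomenon present / example is a positive entailment; illustrative only.
phenomenon = [1, 0, 1, 1, 0, 0, 1, 0]
gold_label = [1, 0, 1, 0, 0, 1, 1, 0]
print(round(pearson(phenomenon, gold_label), 3))  # -> 0.5
```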

76 In order to better understand system behavior, we wanted to check whether we could predict it from the phenomena we identified as important in the examples. [sent-196, score-0.399]

77 We learned SVM classifiers over the identified phenomena and the lexical similarity score to predict both the labels and errors systems make for each of the top-5 systems. [sent-197, score-0.372]
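A minimal sketch of that kind of experiment, assuming scikit-learn; the feature matrix (one 0/1 column per phenomenon plus a lexical similarity score) and labels below are invented:

```python
# Hypothetical sketch: predict an RTE system's behaviour (its label, or whether
# it errs) from the annotated phenomena plus a lexical similarity score, via SVM.
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
import numpy as np

# Columns: numeric_reasoning, parent_sibling, kinship, lexical_similarity (illustrative)
X = np.array([[1, 0, 0, 0.42],
              [0, 1, 0, 0.77],
              [0, 0, 1, 0.55],
              [1, 1, 0, 0.30],
              [0, 0, 0, 0.90],
              [1, 0, 1, 0.25]])
y = np.array([0, 1, 1, 0, 1, 0])   # e.g. 1 = system labelled the example correctly

clf = SVC(kernel="linear")
scores = cross_val_score(clf, X, y, cv=3)
print("mean cross-validated accuracy:", scores.mean())
```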

78 This indicates that although the identified phenomena are indicative of the system performance, it is probably too simplistic to assume that system behavior can be easily reproduced solely as a disjunction of phenomena present in the examples. [sent-199, score-0.683]

79 Does identifying the phenomena correctly help learn a better TE system? [sent-201, score-0.346]

80 We tried to learn an entailment classifier over the identified phenomena and the top-5 system outputs. [sent-202, score-0.46]

81 The results show that correctly identifying the named-entity and numeric quantity mismatches improves the overall accuracy significantly. [sent-205, score-0.094]

82 If we further recognize the need for knowledge resources correctly, we can correctly explain the label for 80% of the examples. [sent-206, score-0.107]

83 Adding the entailment and negation features helps us explain the label for 97% of the examples in the annotated corpus. [sent-207, score-0.425]

84 It must be clarified that the results do not show the textual entailment problem itself is solved with 97% accuracy. [sent-208, score-0.534]

85 However, we believe that if a system could recognize key negation phenomena such as Named Entity mismatch, presence of Excluding arguments, etc. [sent-209, score-0.359]

86 … correctly and consistently, it could model them as Contradiction features in the final inference process to significantly improve its overall accuracy. [sent-210, score-0.087]

87 Similarly, identifying and resolving the key entailment phenomena in the examples would boost the inference process in positive examples. [sent-211, score-0.75]

88 However, significant effort is still required to obtain near-accurate knowledge and linguistic resources. [sent-212, score-0.111]

89 This distinction opens the door to “purposeful”, or goal-directed, inference in a way that may not be relevant to a task studied in isolation. [sent-218, score-0.094]

90 The RTE community seems mainly convinced that incremental advances in local entailment phenomena (including application of world knowledge) are needed to make significant progress. [sent-219, score-0.421]

91 They need ways to identify sub-problems of textual inference, and to evaluate those solutions both in isolation and in the context of RTE. [sent-220, score-0.252]

92 RTE system developers are likely to reward well-engineered solutions by adopting them and citing their authors, because such solutions are easier to incorporate into RTE systems. [sent-221, score-0.203]

93 They are also more likely to adopt solutions with established performance levels. [sent-222, score-0.091]

94 We have therefore proposed a suitable annotation effort, to provide the resources necessary for more detailed evaluation of RTE systems. [sent-225, score-0.089]

95 We have presented a linguistically-motivated analysis of entailment data based on a step-wise procedure to resolve entailment decisions, intended to allow independent annotators to reach consistent decisions, and conducted a pilot annotation effort to assess the feasibility of such a task. [sent-226, score-1.037]

96 We do not claim that our set of domains or phenomena is complete: for example, our illustrative example could be tagged with a domain Mergers and Acquisitions, and a different team of researchers might consider Nominalization Resolution to be a subset of Simple Verb Rules. [sent-227, score-0.413]

97 In other cases, the annotators can simply indicate the phenomena being merged or split (or even replaced). [sent-230, score-0.313]

98 This information will allow other researchers to integrate different annotation sources and maintain a consistent set of annotations. [sent-231, score-0.144]

99 Conclusions: In this paper, we have presented a case for a broad, long-term effort by the NLP community to coordinate annotation efforts around RTE corpora, and to evaluate solutions to NLP tasks relating to textual inference in the context of RTE. [sent-232, score-0.551]

100 We have proposed an initial annotation scheme to prompt discussion, and through a pilot study, demonstrated that such annotation is both feasible and useful. [sent-234, score-0.209]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('rte', 0.754), ('entailment', 0.348), ('phenomena', 0.284), ('textual', 0.127), ('hypothesis', 0.099), ('solutions', 0.091), ('effort', 0.088), ('pilot', 0.083), ('examples', 0.077), ('capabilities', 0.075), ('nlp', 0.074), ('annotation', 0.063), ('researchers', 0.061), ('inference', 0.06), ('purchase', 0.056), ('recognize', 0.054), ('recognizing', 0.052), ('phenomenon', 0.05), ('coreference', 0.047), ('giampiccolo', 0.045), ('bmi', 0.045), ('lexcorp', 0.045), ('domains', 0.044), ('tasks', 0.044), ('community', 0.043), ('promote', 0.042), ('identified', 0.041), ('dagan', 0.041), ('agreement', 0.037), ('explanations', 0.035), ('challenge', 0.035), ('identifying', 0.035), ('efforts', 0.035), ('mistakes', 0.034), ('contradicts', 0.034), ('identify', 0.034), ('relevant', 0.034), ('resolution', 0.033), ('numeric', 0.032), ('behavior', 0.032), ('worthwhile', 0.032), ('assess', 0.032), ('must', 0.031), ('smart', 0.031), ('nomena', 0.03), ('sammons', 0.03), ('sufficiently', 0.03), ('got', 0.03), ('range', 0.029), ('contradiction', 0.029), ('company', 0.029), ('annotators', 0.029), ('solved', 0.028), ('correctly', 0.027), ('text', 0.027), ('feasibility', 0.026), ('mismatch', 0.026), ('disconnected', 0.026), ('stimulate', 0.026), ('resources', 0.026), ('cohen', 0.026), ('relation', 0.026), ('tac', 0.026), ('broad', 0.025), ('pascal', 0.025), ('systems', 0.025), ('contest', 0.024), ('contradictions', 0.024), ('illustrative', 0.024), ('mirkin', 0.024), ('nlu', 0.024), ('nominalization', 0.024), ('transaction', 0.024), ('verbocean', 0.024), ('solve', 0.024), ('component', 0.024), ('ido', 0.024), ('positive', 0.023), ('components', 0.023), ('required', 0.023), ('ambitious', 0.023), ('potential', 0.023), ('lexical', 0.022), ('hard', 0.022), ('impact', 0.022), ('match', 0.022), ('explanation', 0.021), ('bentivogli', 0.021), ('grand', 0.021), ('reliably', 0.021), ('arguments', 0.021), ('system', 0.021), ('ask', 0.02), ('danilo', 0.02), ('dirt', 0.02), ('ontonotes', 0.02), ('pado', 0.02), ('unspecified', 0.02), ('successful', 0.02), ('allow', 0.02)]
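The word weights above and the "similar papers" scores that follow are presumably produced by a tf-idf vector-space model; a minimal sketch of that computation, assuming scikit-learn and toy paper texts (the titles below are stand-ins, not the actual corpus):

```python
# Hypothetical sketch: rank papers by cosine similarity of their tf-idf vectors,
# and list the top-weighted words for one paper, as this page appears to do.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

papers = ["recognizing textual entailment annotation pilot study",
          "open source package for recognizing textual entailment",
          "global learning of focused entailment graphs"]

vec = TfidfVectorizer()
tfidf = vec.fit_transform(papers)
sims = cosine_similarity(tfidf[0], tfidf).ravel()          # similarity to paper 0
top_words = sorted(zip(vec.get_feature_names_out(), tfidf[0].toarray().ravel()),
                   key=lambda p: -p[1])[:5]
print("simValue per paper:", [round(s, 3) for s in sims])
print("topN words:", top_words)
```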

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."

Author: Mark Sammons ; V.G.Vinod Vydiswaran ; Dan Roth

Abstract: We challenge the NLP community to participate in a large-scale, distributed effort to design and build resources for developing and evaluating solutions to new and existing NLP tasks in the context of Recognizing Textual Entailment. We argue that the single global label with which RTE examples are annotated is insufficient to effectively evaluate RTE system performance; to promote research on smaller, related NLP tasks, we believe more detailed annotation and evaluation are needed, and that this effort will benefit not just RTE researchers, but the NLP community as a whole. We use insights from successful RTE systems to propose a model for identifying and annotating textual inference phenomena in textual entailment examples, and we present the results of a pilot annotation study that show this model is feasible and the results immediately useful.

2 0.48842716 30 acl-2010-An Open-Source Package for Recognizing Textual Entailment

Author: Milen Kouylekov ; Matteo Negri

Abstract: This paper presents a general-purpose open source package for recognizing Textual Entailment. The system implements a collection of algorithms, providing a configurable framework to quickly set up a working environment to experiment with the RTE task. Fast prototyping of new solutions is also allowed by the possibility to extend its modular architecture. We present the tool as a useful resource to approach the Textual Entailment problem, as an instrument for didactic purposes, and as an opportunity to create a collaborative environment to promote research in the field.

3 0.35281911 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference

Author: Shachar Mirkin ; Ido Dagan ; Sebastian Pado

Abstract: Discourse references, notably coreference and bridging, play an important role in many text understanding applications, but their impact on textual entailment is yet to be systematically understood. On the basis of an in-depth analysis of entailment instances, we argue that discourse references have the potential of substantially improving textual entailment recognition, and identify a number of research directions towards this goal.

4 0.21663631 127 acl-2010-Global Learning of Focused Entailment Graphs

Author: Jonathan Berant ; Ido Dagan ; Jacob Goldberger

Abstract: We propose a global algorithm for learning entailment relations between predicates. We define a graph structure over predicates that represents entailment relations as directed edges, and use a global transitivity constraint on the graph to learn the optimal set of edges, by formulating the optimization problem as an Integer Linear Program. We motivate this graph with an application that provides a hierarchical summary for a set of propositions that focus on a target concept, and show that our global algorithm improves performance by more than 10% over baseline algorithms.

5 0.18455857 121 acl-2010-Generating Entailment Rules from FrameNet

Author: Roni Ben Aharon ; Idan Szpektor ; Ido Dagan

Abstract: FrameNet is a manually constructed database based on Frame Semantics. Many NLP tasks need accurate knowledge for semantic inference. To this end, mostly WordNet is utilized. Yet WordNet is limited, especially for inference between predicates. To help fill this gap, we present an algorithm that generates inference rules between predicates from FrameNet. Our experiment shows that the novel resource is effective and complements WordNet in terms of rule coverage.

6 0.11061214 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs

7 0.08380542 4 acl-2010-A Cognitive Cost Model of Annotations Based on Eye-Tracking Data

8 0.082722887 31 acl-2010-Annotation

9 0.065018684 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years

10 0.062440958 72 acl-2010-Coreference Resolution across Corpora: Languages, Coding Schemes, and Preprocessing Information

11 0.060690548 59 acl-2010-Cognitively Plausible Models of Human Language Processing

12 0.060197704 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages

13 0.051836744 73 acl-2010-Coreference Resolution with Reconcile

14 0.051118437 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns

15 0.050988186 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling

16 0.049976654 27 acl-2010-An Active Learning Approach to Finding Related Terms

17 0.047312085 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences

18 0.047211707 67 acl-2010-Computing Weakest Readings

19 0.046326812 2 acl-2010-"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives

20 0.045649018 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.166), (1, 0.112), (2, 0.013), (3, -0.137), (4, -0.019), (5, 0.15), (6, 0.04), (7, 0.098), (8, -0.264), (9, -0.238), (10, -0.139), (11, 0.362), (12, -0.138), (13, 0.06), (14, -0.158), (15, -0.197), (16, -0.126), (17, -0.05), (18, -0.025), (19, 0.015), (20, -0.04), (21, -0.049), (22, 0.024), (23, 0.065), (24, -0.122), (25, 0.008), (26, 0.068), (27, 0.0), (28, -0.047), (29, -0.036), (30, -0.043), (31, -0.047), (32, 0.018), (33, 0.067), (34, -0.01), (35, 0.011), (36, -0.082), (37, -0.033), (38, 0.024), (39, 0.008), (40, 0.073), (41, 0.06), (42, -0.046), (43, 0.036), (44, -0.013), (45, 0.02), (46, 0.063), (47, 0.034), (48, 0.017), (49, 0.019)]
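The signed weights above suggest an LSI representation, i.e. a truncated SVD over the tf-idf matrix; a minimal sketch, assuming scikit-learn and toy paper texts (titles and topic count are illustrative):

```python
# Hypothetical sketch: LSI-style paper similarity via truncated SVD over tf-idf.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

papers = ["recognizing textual entailment annotation pilot study",
          "open source package for recognizing textual entailment",
          "discourse references in entailment inference",
          "coreference resolution with reconcile"]

tfidf = TfidfVectorizer().fit_transform(papers)
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
print("topicId/topicWeight for paper 0:", list(enumerate(lsi[0].round(3))))
print("simValue:", cosine_similarity(lsi[0:1], lsi).ravel().round(3))
```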

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95402449 1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."

Author: Mark Sammons ; V.G.Vinod Vydiswaran ; Dan Roth

Abstract: We challenge the NLP community to participate in a large-scale, distributed effort to design and build resources for developing and evaluating solutions to new and existing NLP tasks in the context of Recognizing Textual Entailment. We argue that the single global label with which RTE examples are annotated is insufficient to effectively evaluate RTE system performance; to promote research on smaller, related NLP tasks, we believe more detailed annotation and evaluation are needed, and that this effort will benefit not just RTE researchers, but the NLP community as a whole. We use insights from successful RTE systems to propose a model for identifying and annotating textual inference phenomena in textual entailment examples, and we present the results of a pilot annotation study that show this model is feasible and the results immediately useful.

2 0.93097532 30 acl-2010-An Open-Source Package for Recognizing Textual Entailment

Author: Milen Kouylekov ; Matteo Negri

Abstract: This paper presents a general-purpose open source package for recognizing Textual Entailment. The system implements a collection of algorithms, providing a configurable framework to quickly set up a working environment to experiment with the RTE task. Fast prototyping of new solutions is also allowed by the possibility to extend its modular architecture. We present the tool as a useful resource to approach the Textual Entailment problem, as an instrument for didactic purposes, and as an opportunity to create a collaborative environment to promote research in the field.

3 0.72671801 127 acl-2010-Global Learning of Focused Entailment Graphs

Author: Jonathan Berant ; Ido Dagan ; Jacob Goldberger

Abstract: We propose a global algorithm for learning entailment relations between predicates. We define a graph structure over predicates that represents entailment relations as directed edges, and use a global transitivity constraint on the graph to learn the optimal set of edges, by formulating the optimization problem as an Integer Linear Program. We motivate this graph with an application that provides a hierarchical summary for a set of propositions that focus on a target concept, and show that our global algorithm improves performance by more than 10% over baseline algorithms.

4 0.7056753 121 acl-2010-Generating Entailment Rules from FrameNet

Author: Roni Ben Aharon ; Idan Szpektor ; Ido Dagan

Abstract: FrameNet is a manually constructed database based on Frame Semantics. Many NLP tasks need accurate knowledge for semantic inference. To this end, mostly WordNet is utilized. Yet WordNet is limited, especially for inference between predicates. To help fill this gap, we present an algorithm that generates inference rules between predicates from FrameNet. Our experiment shows that the novel resource is effective and complements WordNet in terms of rule coverage.

5 0.67748582 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference

Author: Shachar Mirkin ; Ido Dagan ; Sebastian Pado

Abstract: Discourse references, notably coreference and bridging, play an important role in many text understanding applications, but their impact on textual entailment is yet to be systematically understood. On the basis of an in-depth analysis of entailment instances, we argue that discourse references have the potential of substantially improving textual entailment recognition, and identify a number of research directions towards this goal.

6 0.30105788 67 acl-2010-Computing Weakest Readings

7 0.29182711 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs

8 0.27816138 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People

9 0.27354643 92 acl-2010-Don't 'Have a Clue'? Unsupervised Co-Learning of Downward-Entailing Operators.

10 0.25574282 4 acl-2010-A Cognitive Cost Model of Annotations Based on Eye-Tracking Data

11 0.21846981 126 acl-2010-GernEdiT - The GermaNet Editing Tool

12 0.2165657 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages

13 0.2156928 31 acl-2010-Annotation

14 0.21368721 57 acl-2010-Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation

15 0.20471346 2 acl-2010-"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives

16 0.20416375 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling

17 0.20358665 117 acl-2010-Fine-Grained Genre Classification Using Structural Learning Algorithms

18 0.2019892 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study

19 0.1892522 259 acl-2010-WebLicht: Web-Based LRT Services for German

20 0.1832978 238 acl-2010-Towards Open-Domain Semantic Role Labeling


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.014), (25, 0.059), (39, 0.022), (42, 0.039), (44, 0.017), (59, 0.073), (65, 0.14), (73, 0.05), (76, 0.02), (78, 0.048), (80, 0.078), (83, 0.196), (84, 0.034), (97, 0.011), (98, 0.101)]
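The non-negative weights above look like an LDA topic distribution; a minimal sketch, again assuming scikit-learn and toy paper texts (the number of topics and the documents are illustrative):

```python
# Hypothetical sketch: LDA topic distributions over papers, then similarity
# between papers as similarity of their topic-weight vectors.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

papers = ["textual entailment annotation pilot study",
          "recognizing textual entailment open source package",
          "coreference resolution across corpora",
          "summarization linguistic quality evaluation"]

counts = CountVectorizer().fit_transform(papers)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
theta = lda.fit_transform(counts)              # rows are per-paper topic weights
print("topicId/topicWeight for paper 0:", list(enumerate(theta[0].round(3))))
print("simValue:", cosine_similarity(theta[0:1], theta).ravel().round(3))
```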

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9082942 1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."

Author: Mark Sammons ; V.G.Vinod Vydiswaran ; Dan Roth

Abstract: We challenge the NLP community to participate in a large-scale, distributed effort to design and build resources for developing and evaluating solutions to new and existing NLP tasks in the context of Recognizing Textual Entailment. We argue that the single global label with which RTE examples are annotated is insufficient to effectively evaluate RTE system performance; to promote research on smaller, related NLP tasks, we believe more detailed annotation and evaluation are needed, and that this effort will benefit not just RTE researchers, but the NLP community as a whole. We use insights from successful RTE systems to propose a model for identifying and annotating textual inference phenomena in textual entailment examples, and we present the results of a pilot annotation study that show this model is feasible and the results immediately useful.

2 0.86894268 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference

Author: Shachar Mirkin ; Ido Dagan ; Sebastian Pado

Abstract: Discourse references, notably coreference and bridging, play an important role in many text understanding applications, but their impact on textual entailment is yet to be systematically understood. On the basis of an in-depth analysis of entailment instances, we argue that discourse references have the potential of substantially improving textual entailment recognition, and identify a number of research directions towards this goal.

3 0.81888431 73 acl-2010-Coreference Resolution with Reconcile

Author: Veselin Stoyanov ; Claire Cardie ; Nathan Gilbert ; Ellen Riloff ; David Buttler ; David Hysom

Abstract: Despite the existence of several noun phrase coreference resolution data sets as well as several formal evaluations on the task, it remains frustratingly difficult to compare results across different coreference resolution systems. This is due to the high cost of implementing a complete end-to-end coreference resolution system, which often forces researchers to substitute available gold-standard information in lieu of implementing a module that would compute that information. Unfortunately, this leads to inconsistent and often unrealistic evaluation scenarios. With the aim to facilitate consistent and realistic experimental evaluations in coreference resolution, we present Reconcile, an infrastructure for the development of learning-based noun phrase (NP) coreference resolution systems. Reconcile is designed to facilitate the rapid creation of coreference resolution systems, easy implementation of new feature sets and approaches to coreference resolution, and empirical evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile and present experimental results showing that Reconcile can be used to create a coreference resolver that achieves performance comparable to state-of-the-art systems on six benchmark data sets.

4 0.80272639 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields

Author: Jackie Chi Kit Cheung ; Gerald Penn

Abstract: One goal of natural language generation is to produce coherent text that presents information in a logical order. In this paper, we show that topological fields, which model high-level clausal structure, are an important component of local coherence in German. First, we show in a sentence ordering experiment that topological field information improves the entity grid model of Barzilay and Lapata (2008) more than grammatical role and simple clausal order information do, particularly when manual annotations of this information are not available. Then, we incorporate the model enhanced with topological fields into a natural language generation system that generates constituent orders for German text, and show that the added coherence component improves performance slightly, though not statistically significantly.

5 0.80209482 38 acl-2010-Automatic Evaluation of Linguistic Quality in Multi-Document Summarization

Author: Emily Pitler ; Annie Louis ; Ani Nenkova

Abstract: To date, few attempts have been made to develop and validate methods for automatic evaluation of linguistic quality in text summarization. We present the first systematic assessment of several diverse classes of metrics designed to capture various aspects of well-written text. We train and test linguistic quality models on consecutive years of NIST evaluation data in order to show the generality of results. For grammaticality, the best results come from a set of syntactic features. Focus, coherence and referential clarity are best evaluated by a class of features measuring local coherence on the basis of cosine similarity between sentences, coreference information, and summarization specific features. Our best results are 90% accuracy for pairwise comparisons of competing systems over a test set of several inputs and 70% for ranking summaries of a specific input.

6 0.80209231 66 acl-2010-Compositional Matrix-Space Models of Language

7 0.80184853 72 acl-2010-Coreference Resolution across Corpora: Languages, Coding Schemes, and Preprocessing Information

8 0.7992667 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing

9 0.79840893 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data

10 0.7977531 112 acl-2010-Extracting Social Networks from Literary Fiction

11 0.79613626 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation

12 0.7949158 4 acl-2010-A Cognitive Cost Model of Annotations Based on Eye-Tracking Data

13 0.79366404 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years

14 0.79241562 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese

15 0.79116368 31 acl-2010-Annotation

16 0.7852487 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse

17 0.78125185 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People

18 0.78056234 256 acl-2010-Vocabulary Choice as an Indicator of Perspective

19 0.77872002 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes

20 0.77501887 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information