acl acl2013 acl2013-297 knowledge-graph by maker-knowledge-mining

297 acl-2013-Recognizing Partial Textual Entailment


Source: pdf

Author: Omer Levy ; Torsten Zesch ; Ido Dagan ; Iryna Gurevych

Abstract: Textual entailment is an asymmetric relation between two text fragments that describes whether one fragment can be inferred from the other. It thus cannot capture the notion that the target fragment is “almost entailed” by the given text. The recently suggested idea of partial textual entailment may remedy this problem. We investigate partial entailment under the faceted entailment model and the possibility of adapting existing textual entailment methods to this setting. Indeed, our results show that these methods are useful for recognizing partial entailment. We also provide a preliminary assessment of how partial entailment may be used for recognizing (complete) textual entailment.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The recently suggested idea of partial textual entailment may remedy this problem. [sent-3, score-0.929]

2 We investigate partial entailment under the faceted entailment model and the possibility of adapting existing textual entailment methods to this setting. [sent-4, score-2.359]

3 Indeed, our results show that these methods are useful for recognizing partial entailment. [sent-5, score-0.15]

4 We also provide a preliminary assessment of how partial entailment may be used for recognizing (complete) textual entailment. [sent-6, score-1.092]

5 1 Introduction Approaches for applied semantic inference over texts gained growing attention in recent years, largely triggered by the textual entailment framework (Dagan et al. [sent-7, score-0.867]

6 Textual entailment is a generic paradigm for semantic inference, where the objective is to recognize whether a textual hypothesis (labeled H) can be inferred from another given text (labeled T). [sent-9, score-0.943]

7 The definition of textual entailment is in some sense strict, in that it requires that H’s meaning be implied by T in its entirety. [sent-10, score-0.82]

8 This means that from an entailment perspective, a text that contains the main ideas of a hypothesis, but lacks a minor detail, is indiscernible from an entirely unrelated text. [sent-11, score-0.621]

9 For example, if T is “muscles move bones”, and H is “the main job of muscles is to move bones”, then T does not entail H, and we are left with no sense of how close (T, H) was to entailment. [sent-12, score-0.438]

10 In the related problem of semantic text similarity, gradual measures are already in use. [sent-13, score-0.054]

11 The semantic text similarity challenge in SemEval 2012 (Agirre et al. [sent-14, score-0.082]

12 , 2012) explicitly defined different levels of similarity from 5 (semantic equivalence) to 0 (no relation). [sent-15, score-0.028]

13 For instance, 4 was defined as “the two sentences are mostly equivalent, but some unimportant details differ”, and 3 meant that “the two sentences are roughly equivalent, but some important information differs”. [sent-16, score-0.035]

14 Though this modeling does indeed provide finer-grained notions of similarity, it is not appropriate for semantic inference for two reasons. [sent-17, score-0.093]

15 Secondly, similarity is not sufficiently well-defined for sound semantic inference; for example, “snowdrops bloom in summer” and “snowdrops bloom in winter” may be similar, but have contradictory meanings. [sent-19, score-0.117]

16 All in all, these measures of similarity do not quite capture the gradual relation needed for semantic inference. [sent-20, score-0.103]

17 An appealing approach to dealing with the rigidity of textual entailment, while preserving the more precise nature of the entailment definition, is to break down the hypothesis into components and attempt to recognize whether each one is individually entailed by T. [sent-21, score-0.956]

18 It is called partial textual entailment, because we are only interested in recognizing whether a single element of the hypothesis is entailed. [sent-22, score-0.578]

19 To differentiate the two tasks, we will refer to the original textual entailment task as complete textual entailment. [sent-23, score-1.064]

20 Partial textual entailment was first introduced by Nielsen et al. [sent-24, score-0.799]

21 (2009), who presented a machine learning approach and showed significant improvement over baseline methods. [sent-25, score-0.026]

22 Our goal in this paper is to investigate the idea of partial textual entailment, and assess whether [sent-28, score-0.361]

23 existing complete textual entailment methods can be used to recognize it. [sent-30, score-0.904]

24 We assume the facet model presented in SemEval 2013, and adapt existing technologies to the task of recognizing partial entailment (Section 3). [sent-31, score-1.3]

25 , 2009) by evaluating these adapted methods on the new RTE-8 benchmark (Section 4). [sent-33, score-0.046]

26 Partial entailment may also facilitate an alternative divide-and-conquer approach to complete textual entailment. [sent-34, score-0.865]

27 2 Task Definition In order to tackle partial entailment, we need to find a way to decompose a hypothesis. [sent-36, score-0.155]

28 (2009) defined a model of facets, where each such facet is a pair of words in the hypothesis and the direct semantic relation connecting those two words. [sent-38, score-0.501]

29 For example, in the sentence “the main job of muscles is to move bones”, the pair (muscles, move) represents a facet. [sent-41, score-0.367]

30 While it is not explicitly stated, reading the original sentence indicates that muscles is the agent of move. [sent-42, score-0.227]

31 Formally, the task of recognizing faceted entailment is a binary classification task. [sent-43, score-0.993]

32 Given a text T, a hypothesis H, and a facet within the hypothesis (w1, w2), determine whether the facet is expressed or unaddressed by the text. [sent-44, score-1.051]

33 The facet (muscles, move) refers to the agent role in H, and is expressed by T. [sent-48, score-0.455]

34 However, the facet (move, bones), which refers to a theme or direct object relation in H, is unaddressed by T. [sent-49, score-0.497]
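To make the task input and output concrete, here is a minimal sketch of the faceted entailment interface; all names are illustrative assumptions, since this summary does not include the authors' code.

```python
# A minimal sketch of the faceted entailment task: given a text T, a
# hypothesis H, and a facet (w1, w2) within H, return a binary label.
# All names here are illustrative assumptions, not the authors' code.
from dataclasses import dataclass
from typing import Tuple

EXPRESSED = "expressed"
UNADDRESSED = "unaddressed"

@dataclass
class FacetInstance:
    text: str               # T, e.g. "muscles move bones"
    hypothesis: str         # H, e.g. "the main job of muscles is to move bones"
    facet: Tuple[str, str]  # (w1, w2), e.g. ("muscles", "move")

def classify_facet(instance: FacetInstance) -> str:
    """Return EXPRESSED or UNADDRESSED for the given facet."""
    raise NotImplementedError  # filled in by the modules described below
```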

35 3 Recognizing Faceted Entailment Our goal is to investigate whether existing entailment recognition approaches can be adapted to recognize faceted entailment. [sent-50, score-0.926]

36 Hence, we specified relatively simple decision mechanisms over a set of entailment detection modules. [sent-51, score-0.659]

37 Given a text and a facet, each module reports whether it recognizes entailment, and the decision mechanism then determines the binary class (expressed or unaddressed) accordingly. [sent-52, score-0.109]

38 1 Entailment Modules Current textual entailment systems operate across different linguistic levels, mainly on lexical inference and syntax. [sent-54, score-0.873]

39 We examined three representative modules that reflect these levels: ExactMatch, Lexical Inference, and Syntactic Inference. [sent-55, score-0.036]

40 We then check whether both facet lemmas w1, w2 appear in the text’s bag-of-words. [sent-57, score-0.461]

41 Exact matching was used as a baseline in previous recognizing textual entailment challenges (Bentivogli et al. [sent-58, score-0.988]

42 , 2011), and similar methods of lemma-matching were used as a component in recognizing textual entailment systems (Clark and Harrison, 2010; Shnarch et al. [sent-59, score-0.962]
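As a concrete illustration, the following is a minimal sketch of such a lemma-matching check; it assumes lemmatization of T is done upstream, and it is not the authors' code.

```python
# ExactMatch sketch: the facet counts as expressed iff both facet lemmas
# occur in the text's bag of lemmas. Lemmatization is assumed upstream.
def exact_match(facet, text_lemmas):
    w1, w2 = facet
    bag = set(text_lemmas)
    return w1 in bag and w2 in bag

# e.g. exact_match(("muscle", "move"), ["muscle", "move", "bone"]) -> True,
# provided both the facet and the text are lemmatized consistently.
```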

43 Lexical Inference This feature checks whether both facet words, or semantically related words, appear in T. [sent-61, score-0.439]

44 We use WordNet (Fellbaum, 1998) with the Resnik similarity measure (Resnik, 1995) and count a facet term wi as matched if the similarity score exceeds a certain threshold (0. [sent-62, score-0.463]

45 Both w1 and w2 must match for this module’s entailment decision to be positive. [sent-64, score-0.621]
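A sketch of this module using NLTK's WordNet interface is shown below; the threshold value is an assumption (the extracted text truncates the paper's value), and the noun-only restriction is a simplification.

```python
# Lexical Inference sketch: a facet term is matched if some text lemma is
# sufficiently Resnik-similar to it in WordNet. THRESHOLD is an assumed,
# illustrative value; restricting to nouns is a simplification.
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")  # information-content counts
THRESHOLD = 0.9  # illustrative only; not the paper's value

def term_matches(term, text_lemmas):
    if term in text_lemmas:
        return True
    for lemma in text_lemmas:
        for s1 in wn.synsets(term, pos=wn.NOUN):
            for s2 in wn.synsets(lemma, pos=wn.NOUN):
                try:
                    if s1.res_similarity(s2, brown_ic) >= THRESHOLD:
                        return True
                except Exception:  # no shared information content
                    continue
    return False

def lexical_inference(facet, text_lemmas):
    # both w1 and w2 must match for a positive entailment decision
    w1, w2 = facet
    return term_matches(w1, text_lemmas) and term_matches(w2, text_lemmas)
```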

46 Syntactic Inference This module builds upon the open-source Bar-Ilan University Textual Entailment Engine (BIUTEE) (Stern and Dagan, 2011). [sent-65, score-0.055]

47 It determines entailment according to the “cost” of generating the hypothesis from the text. [sent-67, score-0.654]

48 BIUTEE has shown state-of-the-art performance on previous recognizing textual entailment challenges (Stern and Dagan, 2012). [sent-69, score-0.962]

49 Since BIUTEE processes dependency trees, both T and the facet must be parsed. [sent-70, score-0.407]

50 BIUTEE can now be given T and P (as the hypothesis), and try to recognize whether the former entails the latter. [sent-78, score-0.1]
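One plausible way to obtain a parseable fragment P from a facet is to extract the dependency path between its two words; this summary omits how the paper actually constructs P, so the spaCy/networkx sketch below is an assumption, not the authors' method.

```python
# A sketch of turning a facet into a parseable fragment: parse the sentence
# and extract the dependency path between the two facet words. This
# construction is an assumption, not the authors' method.
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")

def facet_dependency_path(sentence, w1, w2):
    doc = nlp(sentence)
    graph = nx.Graph()
    for token in doc:  # dependency arcs as undirected edges
        for child in token.children:
            graph.add_edge(token.i, child.i)
    i1 = next(t.i for t in doc if t.lemma_ == w1 or t.lower_ == w1)
    i2 = next(t.i for t in doc if t.lemma_ == w2 or t.lower_ == w2)
    return [doc[i].text for i in nx.shortest_path(graph, i1, i2)]

# e.g. facet_dependency_path("muscles move bones", "muscles", "move")
# -> ["muscles", "move"]
```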

51 Given its very high precision, we decided to use this module as an initial filter, and employ the others for classifying the “harder” cases. [sent-84, score-0.036]
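Under this reading, each decision mechanism is a simple boolean combination of the three module outputs. The sketch below reconstructs the Disjunction and Majority mechanisms named in Table 1; the exact boolean forms are our assumption, based on the filter described here and the Majority description in Section 5.

```python
# Decision-mechanism sketch over the three module outputs. The boolean
# forms are reconstructed from the paper's description (ExactMatch as a
# high-precision initial filter; Majority requires Lexical and Syntactic
# Inference to agree on a positive classification) and are assumptions.
def disjunction(exact, lexical, syntactic):
    # expressed if any module recognizes entailment
    return exact or lexical or syntactic

def majority(exact, lexical, syntactic):
    # ExactMatch settles the easy cases; for the "harder" ones, both
    # remaining modules must agree on a positive classification
    return exact or (lexical and syntactic)
```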

52 The challenge focuses on the domain of scholastic quizzes, and attempts to emulate the meticulous marking process that teachers do on a daily basis. [sent-88, score-0.035]

53 Given a question, a student’s response, and a reference answer, the task of student response analysis is to determine whether the student answered correctly. [sent-89, score-0.433]

54 This task can be approximated as a special case of textual entailment; by assigning the student’s answer as T and the reference answer as H, we are basically asking whether one can infer the correct (reference) answer from the student’s response. [sent-90, score-0.426]

55 In this case, H is a reference answer to the question: Q: What is the main job of muscles? [sent-92, score-0.174]

56 T is essentially the student answer, though it is also possible to define T as the union of both the question and the student answer. [sent-93, score-0.339]

57 There were two tracks in the challenge: complete textual entailment (the main task) and partial entailment. [sent-95, score-1.016]

58 Table 1: Micro-averaged F1 on the faceted SciEntsBank test set, reported for Baseline, BaseLex, BaseSyn, Disjunction, and Majority under the Unseen Answers, Unseen Questions, and Unseen Domains scenarios. [sent-110, score-0.23]

59 , 2012), which is annotated at facet-level, and provides a convenient test-bed for evaluation of both partial and complete entailment. [sent-113, score-0.196]

60 The test set has 16,263 facet-response pairs based on 5,106 student responses over 15 domains (learning modules). [sent-115, score-0.204]

61 Performance was measured using micro-averaged F1, over three different scenarios: Unseen Answers: Classify new answers to questions seen in training; contains 464 student responses. [sent-116, score-0.063]

62 Unseen Questions: Classify new answers to questions that were not seen in training, but other questions from the same domain were. [sent-118, score-0.319]

63 Unseen Domains: Classify new answers to unseen questions from unseen domains. [sent-120, score-0.301]
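For reference, micro-averaged F1 over the facet labels can be computed directly with scikit-learn; this is a generic sketch with hypothetical labels, not the official RTE-8 scorer.

```python
# Micro-averaged F1 sketch using scikit-learn (not the official scorer).
from sklearn.metrics import f1_score

gold = ["expressed", "unaddressed", "expressed"]  # hypothetical labels
pred = ["expressed", "expressed", "expressed"]
print(f1_score(gold, pred, average="micro"))      # 0.666...
```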

64 While BaseLex and BaseSyn improve upon the baseline, they seem to make different mistakes, in particular false positives. [sent-126, score-0.05]

65 Unfortunately, our system was the only submission in the partial entailment pilot track of RTE-8, so we have no comparisons with other systems. [sent-131, score-0.797]

66 However, the absolute improvement from the exact-match baseline to the more sophisticated Majority is in the same ballpark as that of the best systems in previous recognizing textual entailment challenges. [sent-132, score-0.988]

67 For instance, in the previous recognizing textual entailment challenge (Bentivogli et al. [sent-133, score-0.997]

68 We can therefore conclude that existing approaches for recognizing textual entailment can indeed be adapted for recognizing partial entailment. [sent-137, score-1.305]

69 5 Utilizing Partial Entailment for Recognizing Complete Entailment Encouraged by our results, we ask whether the same algorithms that performed well on the faceted entailment task can be used for recognizing complete textual entailment. [sent-138, score-1.29]

70 We performed an initial experiment that examines this concept and sheds some light on the potential role of partial entailment as a possible facilitator for complete entailment. [sent-139, score-0.816]

71 Aggregate the individual facet results and decide on complete entailment accordingly. [sent-145, score-1.073]

72 Facet Decomposition For this initial investigation, we use the facets provided in SciEntsBank; i.e. [sent-146, score-0.116]

73 we assume that the step of facet decomposition has already been carried out. [sent-148, score-0.434]

74 When the dataset was created for RTE-8, many facets were extracted automatically, but only a subset was selected. [sent-149, score-0.138]

75 The facet selection process was done manually, as part of the dataset’s annotation. [sent-150, score-0.407]

76 For example, in “the main job of muscles is to move bones”, the facet (job, muscles) was not selected, because it was not critical for answering the question. [sent-151, score-0.774]

77 In addition, we introduce GoldBased, which uses the gold annotation of faceted entailment, and thus [sent-155, score-0.23]

78 Table 2: Micro-averaged F1 on the 2-way complete entailment SciEntsBank test set, reported for Baseline, Majority, and GoldBased under the Unseen Answers, Unseen Questions, and Unseen Domains scenarios. [sent-167, score-0.6]

79 provides a certain upper bound on the performance of determining complete entailment based on facets. [sent-168, score-0.666]

80 Aggregation We chose the simplest sensible aggregation rule to decide on overall entailment: a student answer is classified as correct (i.e. [sent-169, score-0.237]

81 it entails the reference answer) if it expresses each of the reference answer’s facets. [sent-171, score-0.089]
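In code, this rule is a single conjunction over the per-facet decisions; the sketch below assumes the facet classifications are already available.

```python
# Aggregation sketch: a student answer entails the reference answer iff
# every facet of the reference answer is classified as expressed.
def complete_entailment(facet_labels):
    return all(label == "expressed" for label in facet_labels)

# e.g. complete_entailment(["expressed", "unaddressed"]) -> False
```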

82 Although this heuristic is logical from a strict entailment perspective, it might yield false negatives on this particular dataset. [sent-172, score-0.676]

83 This happens because tutors may sometimes grade answers as valid even if they omit some less important, or indirectly implied, facets. [sent-173, score-0.083]

84 First, the task of student response analysis is only an approximation of textual entailment, albeit a good one. [sent-177, score-0.41]

85 This discrepancy was also observed by the RTE-8 challenge organizers (Dzikovska et al. [sent-178, score-0.035]

86 The second reason is that some of the original facets were filtered when creating the dataset. [sent-180, score-0.116]

87 This caused both false positives (when important facets were filtered out) and false negatives (when unimportant facets were retained). [sent-181, score-0.355]

88 Our Majority mechanism, which requires that the two underlying methods for partial entailment detection (Lexical Inference and Syntactic Inference) agree on a positive classification, bridges about half the gap from the baseline to the gold-based method. [sent-182, score-0.756]

89 This measure is not directly comparable to our facet-based systems, because it did not rely on manually selected facets, and due to some variations in the dataset size (about 20% of the student responses were not included in the pilot task dataset). [sent-184, score-0.252]

90 However, these results may indicate the prospects of using faceted entailment for complete entailment recognition, suggesting it as an attractive research direction. [sent-185, score-1.496]

91 6 Conclusion and Future Work In this paper, we presented an empirical attempt to tackle the problem of partial textual entailment. [sent-186, score-0.329]

92 We demonstrated that existing methods for recognizing (complete) textual entailment can be successfully adapted to this setting. [sent-187, score-0.987]

93 Furthermore, our work focused on a specific decomposition model: faceted entailment. [sent-190, score-0.257]

94 Other flavors of partial entailment should be investigated as well. [sent-191, score-0.73]

95 Finally, we examined the possibility of utilizing partial entailment for recognizing complete entailment in a semi-automatic setting, which relied on the manual facet annotation in the RTE-8 dataset. [sent-192, score-1.966]

96 Our preliminary results suggest that this approach is indeed feasible, and warrant further research on facet-based entailment methods that rely on fully automatic facet extraction. [sent-193, score-1.032]

97 SemEval-2012 Task 6: A pilot on semantic textual similarity. [sent-200, score-0.265]

98 Semeval-2013 task 7: The joint student response analysis and 8th recognizing textual entailment challenge. [sent-223, score-1.173]

99 Using information content to evaluate semantic similarity in a taxonomy. [sent-236, score-0.047]

100 Biutee: A modular open-source system for recognizing textual entailment. [sent-249, score-0.362]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('entailment', 0.6), ('facet', 0.407), ('faceted', 0.23), ('muscles', 0.207), ('textual', 0.199), ('recognizing', 0.163), ('student', 0.16), ('partial', 0.13), ('biutee', 0.122), ('nielsen', 0.118), ('facets', 0.116), ('bones', 0.115), ('dzikovska', 0.102), ('unseen', 0.095), ('baselex', 0.092), ('scientsbank', 0.092), ('ido', 0.08), ('dagan', 0.075), ('move', 0.071), ('basesyn', 0.069), ('goldbased', 0.069), ('unaddressed', 0.069), ('job', 0.068), ('complete', 0.066), ('answers', 0.063), ('stern', 0.062), ('rodney', 0.061), ('disjunction', 0.061), ('bentivogli', 0.057), ('answer', 0.055), ('majority', 0.055), ('hypothesis', 0.054), ('exact', 0.052), ('response', 0.051), ('inference', 0.049), ('questions', 0.048), ('pilot', 0.047), ('bestcomplete', 0.046), ('snowdrops', 0.046), ('myroslava', 0.041), ('shnarch', 0.041), ('recognize', 0.039), ('mechanisms', 0.038), ('module', 0.036), ('modules', 0.036), ('gradual', 0.035), ('bloom', 0.035), ('unimportant', 0.035), ('challenge', 0.035), ('semeval', 0.035), ('asher', 0.034), ('entailed', 0.032), ('whether', 0.032), ('false', 0.031), ('reference', 0.03), ('entails', 0.029), ('clark', 0.029), ('danilo', 0.028), ('luisa', 0.028), ('expressed', 0.028), ('similarity', 0.028), ('decomposition', 0.027), ('baseline', 0.026), ('scenarios', 0.026), ('negatives', 0.026), ('indeed', 0.025), ('adapted', 0.025), ('lexical', 0.025), ('decompose', 0.025), ('agirre', 0.024), ('hoa', 0.023), ('responses', 0.023), ('dataset', 0.022), ('fragment', 0.022), ('aggregation', 0.022), ('lemmas', 0.022), ('path', 0.021), ('main', 0.021), ('classify', 0.021), ('harder', 0.021), ('decision', 0.021), ('relation', 0.021), ('implied', 0.021), ('benchmark', 0.021), ('domains', 0.021), ('submission', 0.02), ('sheds', 0.02), ('unstated', 0.02), ('quizzes', 0.02), ('ognizing', 0.02), ('tutors', 0.02), ('mechanism', 0.02), ('resnik', 0.02), ('agent', 0.02), ('semantic', 0.019), ('lab', 0.019), ('essentially', 0.019), ('upon', 0.019), ('strict', 0.019), ('conjunction', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 297 acl-2013-Recognizing Partial Textual Entailment

Author: Omer Levy ; Torsten Zesch ; Ido Dagan ; Iryna Gurevych

Abstract: Textual entailment is an asymmetric relation between two text fragments that describes whether one fragment can be inferred from the other. It thus cannot capture the notion that the target fragment is “almost entailed” by the given text. The recently suggested idea of partial textual entailment may remedy this problem. We investigate partial entailment under the faceted entailment model and the possibility of adapting existing textual entailment methods to this setting. Indeed, our results show that these methods are useful for recognizing partial entailment. We also provide a preliminary assessment of how partial entailment may be used for recognizing (complete) textual entailment.

2 0.38790402 245 acl-2013-Modeling Human Inference Process for Textual Entailment Recognition

Author: Hen-Hsen Huang ; Kai-Chun Chang ; Hsin-Hsi Chen

Abstract: This paper aims at understanding what humans think in the textual entailment (TE) recognition process and modeling their thinking process to deal with this problem. We first analyze a labeled RTE-5 test set and find that the negative entailment phenomena are very effective features for TE recognition. Then, a method is proposed to extract this kind of phenomena from text-hypothesis pairs automatically. We evaluate the performance of using the negative entailment phenomena on both the English RTE-5 dataset and the Chinese NTCIR-9 RITE dataset, and reach the same findings.

3 0.17465247 75 acl-2013-Building Japanese Textual Entailment Specialized Data Sets for Inference of Basic Sentence Relations

Author: Kimi Kaneko ; Yusuke Miyao ; Daisuke Bekki

Abstract: This paper proposes a methodology for generating specialized Japanese data sets for textual entailment, which consists of pairs decomposed into basic sentence relations. We experimented with our methodology over a number of pairs taken from the RITE-2 data set. We compared our methodology with existing studies in terms of agreement, frequencies and times, and we evaluated its validity by investigating recognition accuracy.

4 0.10727467 269 acl-2013-PLIS: a Probabilistic Lexical Inference System

Author: Eyal Shnarch ; Erel Segal-haLevi ; Jacob Goldberger ; Ido Dagan

Abstract: This paper presents PLIS, an open source Probabilistic Lexical Inference System which combines two functionalities: (i) a tool for integrating lexical inference knowledge from diverse resources, and (ii) a framework for scoring textual inferences based on the integrated knowledge. We provide PLIS with two probabilistic implementations of this framework. PLIS is available for download, and developers of text processing applications can use it as an off-the-shelf component for injecting lexical knowledge into their applications. PLIS is easily configurable; components can be extended or replaced with user-generated ones to enable system customization and further research. PLIS includes an online interactive viewer, which is a powerful tool for investigating lexical inference processes.

5 0.099799417 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models

Author: Wen-tau Yih ; Ming-Wei Chang ; Christopher Meek ; Andrzej Pastusiak

Abstract: In this paper, we study the answer sentence selection problem for question answering. Unlike previous work, which primarily leverages syntactic analysis through dependency tree matching, we focus on improving the performance using models of lexical semantic resources. Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardless of the choice of learning algorithms. When evaluated on a benchmark dataset, the MAP and MRR scores are increased by 8 to 10 points, compared to one of our baseline systems using only surface-form matching. Moreover, our best system also outperforms previous work that makes use of the dependency tree structure by a wide margin.

6 0.098050259 202 acl-2013-Is a 204 cm Man Tall or Small ? Acquisition of Numerical Common Sense from the Web

7 0.09699145 304 acl-2013-SEMILAR: The Semantic Similarity Toolkit

8 0.079240814 145 acl-2013-Exploiting Qualitative Information from Automatic Word Alignment for Cross-lingual NLP Tasks

9 0.076788515 43 acl-2013-Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity

10 0.070012152 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner

11 0.069815278 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model

12 0.068538152 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals

13 0.068027951 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules

14 0.066450126 250 acl-2013-Models of Translation Competitions

15 0.066232547 222 acl-2013-Learning Semantic Textual Similarity with Structural Representations

16 0.064055115 104 acl-2013-DKPro Similarity: An Open Source Framework for Text Similarity

17 0.054622784 107 acl-2013-Deceptive Answer Prediction with User Preference Graph

18 0.054297738 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates

19 0.052180704 387 acl-2013-Why-Question Answering using Intra- and Inter-Sentential Causal Relations

20 0.045763094 169 acl-2013-Generating Synthetic Comparable Questions for News Articles


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.122), (1, 0.04), (2, -0.005), (3, -0.113), (4, -0.038), (5, 0.021), (6, -0.043), (7, -0.079), (8, 0.038), (9, 0.036), (10, 0.006), (11, 0.022), (12, -0.028), (13, -0.028), (14, 0.032), (15, -0.056), (16, 0.074), (17, 0.015), (18, 0.077), (19, -0.062), (20, 0.06), (21, 0.008), (22, -0.087), (23, 0.049), (24, -0.092), (25, 0.207), (26, -0.148), (27, 0.017), (28, 0.082), (29, -0.32), (30, -0.04), (31, 0.177), (32, 0.078), (33, -0.068), (34, -0.029), (35, -0.052), (36, 0.025), (37, -0.137), (38, -0.13), (39, -0.086), (40, 0.062), (41, 0.015), (42, -0.01), (43, 0.153), (44, -0.079), (45, 0.096), (46, 0.078), (47, 0.131), (48, 0.003), (49, 0.041)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97371036 297 acl-2013-Recognizing Partial Textual Entailment

Author: Omer Levy ; Torsten Zesch ; Ido Dagan ; Iryna Gurevych

Abstract: Textual entailment is an asymmetric relation between two text fragments that describes whether one fragment can be inferred from the other. It thus cannot capture the notion that the target fragment is “almost entailed” by the given text. The recently suggested idea of partial textual entailment may remedy this problem. We investigate partial entailment under the faceted entailment model and the possibility of adapting existing textual entailment methods to this setting. Indeed, our results show that these methods are useful for recognizing partial entailment. We also provide a preliminary assessment of how partial entailment may be used for recognizing (complete) textual entailment.

2 0.90847296 245 acl-2013-Modeling Human Inference Process for Textual Entailment Recognition

Author: Hen-Hsen Huang ; Kai-Chun Chang ; Hsin-Hsi Chen

Abstract: This paper aims at understanding what humans think in the textual entailment (TE) recognition process and modeling their thinking process to deal with this problem. We first analyze a labeled RTE-5 test set and find that the negative entailment phenomena are very effective features for TE recognition. Then, a method is proposed to extract this kind of phenomena from text-hypothesis pairs automatically. We evaluate the performance of using the negative entailment phenomena on both the English RTE-5 dataset and the Chinese NTCIR-9 RITE dataset, and reach the same findings.

3 0.88207513 75 acl-2013-Building Japanese Textual Entailment Specialized Data Sets for Inference of Basic Sentence Relations

Author: Kimi Kaneko ; Yusuke Miyao ; Daisuke Bekki

Abstract: This paper proposes a methodology for generating specialized Japanese data sets for textual entailment, which consists of pairs decomposed into basic sentence relations. We experimented with our methodology over a number of pairs taken from the RITE-2 data set. We compared our methodology with existing studies in terms of agreement, frequencies and times, and we evaluated its validity by investigating recognition accuracy.

4 0.7302748 202 acl-2013-Is a 204 cm Man Tall or Small ? Acquisition of Numerical Common Sense from the Web

Author: Katsuma Narisawa ; Yotaro Watanabe ; Junta Mizuno ; Naoaki Okazaki ; Kentaro Inui

Abstract: This paper presents novel methods for modeling numerical common sense: the ability to infer whether a given number (e.g., three billion) is large, small, or normal for a given context (e.g., number of people facing a water shortage). We first discuss the necessity of numerical common sense in solving textual entailment problems. We explore two approaches for acquiring numerical common sense. Both approaches start with extracting numerical expressions and their context from the Web. One approach estimates the distribution of numbers co-occurring within a context and examines whether a given value is large, small, or normal, based on the distribution. Another approach utilizes textual patterns with which speakers explicitly express their judgment about the value of a numerical expression. Experimental results demonstrate the effectiveness of both approaches.

5 0.6400429 269 acl-2013-PLIS: a Probabilistic Lexical Inference System

Author: Eyal Shnarch ; Erel Segal-haLevi ; Jacob Goldberger ; Ido Dagan

Abstract: This paper presents PLIS, an open source Probabilistic Lexical Inference System which combines two functionalities: (i) a tool for integrating lexical inference knowledge from diverse resources, and (ii) a framework for scoring textual inferences based on the integrated knowledge. We provide PLIS with two probabilistic implementations of this framework. PLIS is available for download, and developers of text processing applications can use it as an off-the-shelf component for injecting lexical knowledge into their applications. PLIS is easily configurable; components can be extended or replaced with user-generated ones to enable system customization and further research. PLIS includes an online interactive viewer, which is a powerful tool for investigating lexical inference processes.
The integration of resources, each has its own format, is technically complex and the quality 97 ProceedingSsof oiaf, th Beu 5lg1asrtia A,n Anuuaglu Mst 4ee-9tin 2g0 o1f3. th ?ec A20ss1o3ci Aastisoonci faotrio Cno fomrp Cuotamtipountaalti Loinnaglu Lisitnigcsu,is patigcess 97–102, Figure 1: PLIS schema - a text-hypothesis pair is processed by the Lexical Integrator which uses a set of lexical resources to extract inference chains which connect the two. The Lexical Inference component provides probability estimations for the validity of each level of the process. ofthe resulting inference links is often unknown in advance and varies considerably. For coping with this challenge we developed PLIS, a Probabilistic Lexical Inference System1 . PLIS, illustrated in Fig 1, has two main modules: the Lexical Integra- tor (Section 2) accepts a set of lexical resources and a text-hypothesis pair, and finds all the lexical inference relations between any pair of text term ti and hypothesis term hj, based on the available lexical relations found in the resources (and their combination). The Lexical Inference module (Section 3) provides validity scores for these relations. These term-level scores are used to estimate the sentence-level likelihood that the meaning of the hypothesis can be inferred from the text, thus making PLIS a complete lexical inference system. Lexical inference systems do not look into the structure of texts but rather consider them as bag ofterms (words or multi-word expressions). These systems are easy to implement, fast to run, practical across different genres and languages, while maintaining a competitive level of performance. PLIS can be used as a stand-alone efficient inference system or as the lexical component of any NLP application. PLIS is a flexible system, allowing users to choose the set of knowledge resources as well as the model by which inference 1The complete software package is available at http:// www.cs.biu.ac.il/nlp/downloads/PLIS.html and an online interactive viewer is available for examination at http://irsrv2. cs.biu.ac.il/nlp-net/PLIS.html. is done. PLIS can be easily extended with new knowledge resources and new inference models. It comes with a set of ready-to-use plug-ins for many common lexical resources (Section 2.1) as well as two implementation of the scoring framework. These implementations, described in (Shnarch et al., 2011; Shnarch et al., 2012), provide probability estimations for inference. PLIS has an interactive online viewer (Section 4) which provides a visualization of the entire inference process, and is very helpful for analysing lexical inference models and lexical resources usability. 2 Lexical integrator The input for the lexical integrator is a set of lexical resources and a pair of text T and hypothesis H. The lexical integrator extracts lexical inference links from the various lexical resources to connect each text term ti ∈ T with each hypothesis term hj ∈ H2. A lexical i∈nfTer wenicthe elianckh hinydpicoathteess a semantic∈ rHelation between two terms. It could be a directional relation (Columbus→navigator) or a bai ddiirreeccttiioonnaall one (car ←→ automobile). dSirinecceti knowledge resources vary lien) their representation methods, the lexical integrator wraps each lexical resource in a common plug-in interface which encapsulates resource’s inner representation method and exposes its knowledge as a list of inference links. The implemented plug-ins that come with PLIS are described in Section 2.1. 
Adding a new lexical resource and integrating it with the others only demands the implementation of the plug-in interface. As the knowledge needed to connect a pair of terms, ti and hj, may be scattered across several resources, the lexical integrator combines inference links into lexical inference chains to deduce new pieces of knowledge, such as Columbus −resource2→ navigator −resource1→ explorer. Therefore, the only assumption the lexical integrator makes, regarding its input lexical resources, is that the inferential lexical relations they provide are transitive.

The lexical integrator generates lexical inference chains by expanding the text and hypothesis terms with inference links. These links lead to new terms (e.g. navigator in the above chain example and t0 in Fig 1) which can be further expanded, as all inference links are transitive. A transitivity limit is set by the user to determine the maximal length for inference chains. The lexical integrator uses a graph-based representation for the inference chains, as illustrated in Fig 1. A node holds the lemma, part-of-speech and sense of a single term. The sense is the ordinal number of the WordNet sense. Whenever we do not know the sense of a term, we apply the most frequent sense heuristic (in preliminary experiments, this disambiguation policy was better than considering all senses of an ambiguous term; switching between the two policies is a matter of changing a variable in the configuration of PLIS). An edge represents an inference link and is labeled with the semantic relation of this link (e.g. cytokine → protein is labeled with the WordNet relation hypernym).

2.1 Available plug-ins for lexical resources

We have implemented plug-ins for the following resources: the English lexicon WordNet (Fellbaum, 1998) (based on either the JWI, JWNL or extJWNL Java APIs; see http://wordnet.princeton.edu/wordnet/related-projects/), CatVar (Habash and Dorr, 2003), a categorial variations database, a Wikipedia-based resource (Shnarch et al., 2009), which applies several extraction methods to derive inference links from the text and structure of Wikipedia, VerbOcean (Chklovski and Pantel, 2004), a knowledge base of fine-grained semantic relations between verbs, Lin's distributional similarity thesaurus (Lin, 1998), and DIRECT (Kotlerman et al., 2010), a directional distributional similarity thesaurus geared for lexical inference.

To summarize, the lexical integrator finds all possible inference chains (of a predefined length), resulting from any combination of inference links extracted from lexical resources, which link any t, h pair of a given text-hypothesis pair. Developers can use this tool to save the hassle of interfacing with the different lexical knowledge resources, and spare the labor of combining their knowledge via inference chains.
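Under the transitivity assumption, chain generation amounts to a depth-bounded graph search. A sketch, reusing the illustrative types above (the function and its signature are our invention, not the PLIS API):

```python
# Sketch of chain generation: expand a text term with inference links,
# up to the user-set transitivity limit, keeping chains that reach a
# hypothesis term. Reuses the illustrative Term/InferenceLink/plugin types.
def find_chains(text_term, hypothesis_terms, plugins, transitivity_limit):
    chains = []
    frontier = [[link] for p in plugins for link in p.links_from(text_term)]
    while frontier:
        chain = frontier.pop()
        if chain[-1].target in hypothesis_terms:  # exact match, for simplicity
            chains.append(chain)
        if len(chain) < transitivity_limit:
            for p in plugins:
                for link in p.links_from(chain[-1].target):
                    frontier.append(chain + [link])
    return chains
```

The transitivity limit plays a double role here: it caps chain length, and it guarantees termination even when the link graph contains cycles (e.g. synonym pairs pointing at each other).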
The lexical inference model, described next, provides a means to decide whether a given hypothesis is inferred from a given text, based on weighing the lexical inference chains extracted by the lexical integrator.

3 Lexical inference

There are many ways to implement an inference model which identifies inference relations between texts. A simple model may consider the number of hypothesis terms for which inference chains, originating from text terms, were found. In PLIS, the inference model is a plug-in, similar to the lexical knowledge resources, and can be easily replaced to change the inference logic. We provide PLIS with two implemented baseline lexical inference models which are mathematically grounded. These are two Probabilistic Lexical Models (PLMs), HN-PLM and M-PLM, which are described in (Shnarch et al., 2011) and (Shnarch et al., 2012) respectively.

A PLM provides probability estimations for the three parts of the inference process (as shown in Fig 1): the validity probability of each inference chain (i.e. the probability of a valid inference relation between its endpoint terms), P(ti → hj); the probability of each hypothesis term to be inferred by the entire text, P(T → hj) (term-level probability); and the probability of the entire hypothesis to be inferred by the text, P(T → H) (sentence-level probability).

HN-PLM describes a generative process by which the hypothesis is generated from the text. Its parameters are the reliability levels of each of the resources it utilizes (that is, the prior probability that applying an arbitrary inference link derived from each resource corresponds to a valid inference). For learning these parameters HN-PLM applies a schema of the EM algorithm (Dempster et al., 1977). Its performance on the recognizing textual entailment task, RTE (Bentivogli et al., 2009; Bentivogli et al., 2010), is in line with state-of-the-art inference systems, including complex systems which perform syntactic analysis. This model is improved upon by M-PLM, which deduces sentence-level probability from term-level probabilities by a Markovian process. PLIS with this model was used for passage retrieval in a question answering task (Wang et al., 2007), and outperformed state-of-the-art inference systems.

Both PLMs model the following prominent aspects of the lexical inference phenomenon: (i) considering the different reliability levels of the input knowledge resources, (ii) reducing inference chain probability as its length increases, and (iii) increasing term-level probability as more inference chains suggest that the hypothesis term is inferred by the text. Both PLMs only need sentence-level annotations, from which they derive term-level inference probabilities.

To summarize, the lexical inference module provides the setting for interfacing with the lexical integrator. Additionally, the module provides the framework for probabilistic inference models which estimate term-level probabilities and integrate them into a sentence-level inference decision, while implementing prominent aspects of lexical inference. The user can choose to apply another inference logic, not necessarily probabilistic, by plugging a different lexical inference model into the provided inference infrastructure.
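The three probability levels can be illustrated with a toy combination scheme that realizes aspects (i)-(iii) above: per-resource reliability priors, chain probability as a product over links (so longer chains score lower), a noisy-OR at term level (so more chains raise the score), and, here, simply a product at sentence level. This is our simplification for illustration and not the exact HN-PLM or M-PLM formulation; the reliability values are hypothetical.

```python
# Toy illustration of the three probability levels (a simplification,
# not the exact HN-PLM / M-PLM math). Uses InferenceLink from the
# earlier plug-in sketch.
import math

RELIABILITY = {'WordNet': 0.9, 'Wikipedia': 0.6}  # hypothetical priors

def chain_prob(chain):
    """Product of per-link resource reliabilities: longer => lower."""
    return math.prod(RELIABILITY[link.resource] for link in chain)

def term_prob(chains_for_term):
    """Noisy-OR over chains: more evidence => higher P(T -> hj)."""
    return 1.0 - math.prod(1.0 - chain_prob(c) for c in chains_for_term)

def sentence_prob(chains_per_hyp_term):
    """P(T -> H) here as the product of term-level probabilities."""
    return math.prod(term_prob(cs) for cs in chains_per_hyp_term)
```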
Figure 2: PLIS interactive viewer with Example 1, demonstrating knowledge integration of multiple inference chains and resource combination (additional explanations, which are not part of the demo, are provided in orange).

4 The PLIS interactive system

PLIS comes with an online interactive viewer (http://irsrv2.cs.biu.ac.il/nlp-net/PLIS.html) in which the user sets the parameters of PLIS, inserts a text-hypothesis pair and gets a visualization of the entire inference process. This is a powerful tool for investigating knowledge integration and lexical inference models. Fig 2 presents a screenshot of the processing of Example 1. On the right side, the user configures the system by selecting knowledge resources, adjusting their configuration, setting the transitivity limit, and choosing the lexical inference model to be applied by PLIS. After inserting a text and a hypothesis into the appropriate text boxes, the user clicks on the infer button, and PLIS generates all lexical inference chains, of length up to the transitivity limit, that connect text terms with hypothesis terms, as available from the combination of the selected input resources. Each inference chain is presented on a line between the text and hypothesis. PLIS also displays the probability estimations for all inference levels; the probability of each chain is presented at the end of its line. For each hypothesis term, the term-level probability, which weighs all inference chains found for it, is given below the dashed line. The overall sentence-level probability integrates the probabilities of all hypothesis terms and is displayed in the box at the bottom right corner.

Next, we detail the inference process of Example 1, as presented in Fig 2. In this QA example, the probability of the candidate answer (set as the text) to be relevant for the given question (the hypothesis) is estimated. When utilizing only two knowledge resources (WordNet and Wikipedia), PLIS is able to recognize that explorer is inferred by Christopher Columbus and that New World is inferred by America. Each one of these pairs has two independent inference chains, numbered 1–4, as evidence for its inference relation. Both inference chains 1 and 3 include a single inference link, each derived from a different relation of the Wikipedia-based resource. The inference model assigns a higher probability to chain 1 since the BeComp relation is much more reliable than the Link relation. This comparison illustrates the ability of the inference model to learn how knowledge resources differ in their reliability.

Comparing the probability assigned by the inference model to inference chain 2 with the probabilities assigned to chains 1 and 3 reveals the sophisticated way by which the inference model integrates lexical knowledge. Inference chain 2 is longer than chain 1, therefore its probability is lower. However, the inference model assigns chain 2 a higher probability than chain 3, even though the latter is shorter, since the model is sensitive enough to consider the difference in reliability levels between the two highly reliable hypernym relations (from WordNet) of chain 2 and the less reliable Link relation (from Wikipedia) of chain 3.

Another aspect of knowledge integration is exemplified in Fig 2 by the three circled probabilities. The inference model takes into consideration the multiple pieces of evidence for the inference of New World (inference chains 3 and 4, whose probabilities are circled). This results in a term-level probability estimation for New World (the third circled probability) which is higher than the probability of each chain separately. The third term of the hypothesis, discover, remains uncovered by the text, as no inference chain was found for it. Therefore, the sentence-level inference probability is very low, 37%. In order to identify that the hypothesis is indeed inferred from the text, the inference model should be provided with indications for the inference of discover.
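Using the toy model sketched earlier, both evidence-combination effects just described can be reproduced with hypothetical numbers (these are not the values shown in Fig 2):

```python
# Hypothetical numbers (not the actual values in Fig 2): the term-level
# noisy-OR exceeds each single chain, and one weakly covered term drags
# the sentence-level probability down.
p_chain3, p_chain4 = 0.5, 0.4
p_new_world = 1 - (1 - p_chain3) * (1 - p_chain4)  # 0.7 > max(0.5, 0.4)
p_explorer, p_discover = 0.8, 0.05  # stand-in prior for the uncovered term
print(p_explorer * p_new_world * p_discover)       # ~0.03: very low
```

How uncovered terms like discover are actually scored is model-specific; the stand-in prior above is only meant to show why a single uncovered term suppresses the sentence-level estimate.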
To that end, the user may increase the transitivity limit in the hope that longer inference chains provide the needed information. In addition, the user can examine other knowledge resources in search of the missing inference link. In this example, it is enough to add VerbOcean to the input of PLIS to expose two inference chains which connect reveal with discover by combining an inference link from WordNet with another one from VerbOcean. With this additional information, the sentence-level probability increases to 76%. This is a typical scenario of utilizing PLIS, either via the interactive system or via the software, for analyzing the usability of the different knowledge resources and their combination.

A feature of the interactive system which is useful for lexical resource analysis is that each term in a chain is clickable and links to another screen which presents all the terms that are inferred from it and those from which it is inferred. Additionally, the interactive system communicates with a server which runs PLIS over a full-duplex WebSocket connection (we used the socket.io implementation). This mode of operation is publicly available and provides a method for utilizing PLIS without having to install it or the lexical resources it uses.

Finally, since PLIS is a lexical system, it can easily be adjusted to other languages. One only needs to replace the basic lexical text processing tools and plug in knowledge resources in the target language. If PLIS is provided with bilingual resources (a bilingual resource holds inference links which connect terms in different languages, e.g. an English-Spanish dictionary can provide the inference link explorer → explorador), it can operate also as a cross-lingual inference system (Negri et al., 2012). For instance, the text in Fig 3 is given in English, while the hypothesis is written in Spanish (given as a list of lemma:part-of-speech pairs). The left side of the figure depicts a cross-lingual inference process in which the only lexical knowledge resource used is a manually built English-Spanish dictionary. As can be seen, two Spanish terms, jugador and casa, remain uncovered, since the dictionary alone cannot connect them to any of the English terms in the text. As illustrated in the right side of Fig 3, PLIS enables the combination of the bilingual dictionary with monolingual resources to produce cross-lingual inference chains, such as footballer −hypernym→ player −manual→ jugador. Such inference chains have the capability to overcome monolingual language variability (the first link in this chain) as well as to provide cross-lingual translation (the second link).

Figure 3: PLIS as a cross-lingual inference system. Left: the process with a single manual bilingual resource. Right: PLIS composes cross-lingual inference chains to increase hypothesis coverage and increase sentence-level inference probability.
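In the sketched plug-in terms introduced earlier, a bilingual dictionary needs no special machinery: it is just another plug-in whose links carry a translation relation, so the chain search composes it freely with monolingual links. Again, this is illustrative code with invented names, not the PLIS API, and the tiny dictionary is hypothetical.

```python
# Illustrative only: a bilingual dictionary as one more plug-in. Its links
# compose with monolingual links, yielding chains such as
# footballer -hypernym-> player -manual-> jugador.
EN_ES = {('footballer', 'n'): ['futbolista'],
         ('player', 'n'): ['jugador']}  # note: no direct footballer->jugador

class BilingualDictPlugin(LexicalResourcePlugin):
    def links_from(self, term):
        return [InferenceLink(term, Term(t, term.pos), 'manual', 'EN-ES dict')
                for t in EN_ES.get((term.lemma, term.pos), [])]
```

Because the dictionary alone has no entry from footballer to jugador, a chain reaching jugador must first pass through a monolingual hypernym link to player, mirroring the behavior shown in Fig 3.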
5 Conclusions

To utilize PLIS one should gather lexical resources, obtain sentence-level annotations and train the inference model. Annotations are available in common data sets for tasks such as QA, Information Retrieval (queries are hypotheses and snippets are texts) and Student Response Analysis (reference answers are the hypotheses that should be inferred by the student answers). For developers of NLP applications, PLIS offers a ready-to-use lexical knowledge integrator which can interface with many common lexical knowledge resources and construct lexical inference chains which combine the knowledge in them. A developer who wants to overcome lexical language variability, or to incorporate background knowledge, can utilize PLIS to inject lexical knowledge into any text understanding application. PLIS can be used as a lightweight inference system or as the lexical component of larger, more complex inference systems.

Additionally, PLIS provides scores for inference chains and determines the way to combine them in order to recognize sentence-level inference. PLIS comes with two probabilistic lexical inference models which achieved competitive performance levels in the tasks of recognizing textual entailment and passage retrieval for QA. All aspects of PLIS are configurable. The user can easily switch between the built-in lexical resources, inference models and even languages, or extend the system with additional lexical resources and new inference models.

Acknowledgments

The authors thank Eden Erez for his help with the interactive viewer and Miquel Esplà Gomis for the bilingual dictionaries. This work was partially supported by the European Community's 7th Framework Programme (FP7/2007-2013) under grant agreement no. 287923 (EXCITEMENT) and the Israel Science Foundation grant 880/12.

References

Luisa Bentivogli, Ido Dagan, Hoa Trang Dang, Danilo Giampiccolo, and Bernardo Magnini. 2009. The fifth PASCAL recognizing textual entailment challenge. In Proc. of TAC.
Luisa Bentivogli, Peter Clark, Ido Dagan, Hoa Trang Dang, and Danilo Giampiccolo. 2010. The sixth PASCAL recognizing textual entailment challenge. In Proc. of TAC.
Timothy Chklovski and Patrick Pantel. 2004. VerbOcean: Mining the web for fine-grained semantic verb relations. In Proc. of EMNLP.
Ido Dagan, Oren Glickman, and Bernardo Magnini. 2006. The PASCAL recognising textual entailment challenge. In Lecture Notes in Computer Science, volume 3944, pages 177–190.
A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38.
Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, Massachusetts.
Nizar Habash and Bonnie Dorr. 2003. A categorial variation database for English. In Proc. of NAACL.
Lili Kotlerman, Ido Dagan, Idan Szpektor, and Maayan Zhitomirsky-Geffet. 2010. Directional distributional similarity for lexical inference. Natural Language Engineering, 16(4):359–389.
Dekang Lin. 1998. Automatic retrieval and clustering of similar words. In Proc. of COLING-ACL.
Matteo Negri, Alessandro Marchetti, Yashar Mehdad, Luisa Bentivogli, and Danilo Giampiccolo. 2012. SemEval-2012 task 8: Cross-lingual textual entailment for content synchronization. In Proc. of SemEval.
Eyal Shnarch, Libby Barak, and Ido Dagan. 2009. Extracting lexical reference rules from Wikipedia. In Proc. of ACL.
Eyal Shnarch, Jacob Goldberger, and Ido Dagan. 2011. Towards a probabilistic model for lexical entailment. In Proc. of the TextInfer Workshop.
Eyal Shnarch, Ido Dagan, and Jacob Goldberger. 2012. A probabilistic lexical model for ranking textual inferences. In Proc. of *SEM.
Mengqiu Wang, Noah A. Smith, and Teruko Mitamura. 2007. What is the Jeopardy model? A quasi-synchronous grammar for QA. In Proc. of EMNLP.

6 0.41439742 145 acl-2013-Exploiting Qualitative Information from Automatic Word Alignment for Cross-lingual NLP Tasks

7 0.3660745 237 acl-2013-Margin-based Decomposed Amortized Inference

8 0.36002076 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals

9 0.34186724 365 acl-2013-Understanding Tables in Context Using Standard NLP Toolkits

10 0.33469179 321 acl-2013-Sign Language Lexical Recognition With Propositional Dynamic Logic

11 0.33051527 387 acl-2013-Why-Question Answering using Intra- and Inter-Sentential Causal Relations

12 0.32696885 304 acl-2013-SEMILAR: The Semantic Similarity Toolkit

13 0.32540596 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model

14 0.31315351 265 acl-2013-Outsourcing FrameNet to the Crowd

15 0.29902852 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models

16 0.29022521 61 acl-2013-Automatic Interpretation of the English Possessive

17 0.28187898 222 acl-2013-Learning Semantic Textual Similarity with Structural Representations

18 0.27448592 104 acl-2013-DKPro Similarity: An Open Source Framework for Text Similarity

19 0.27080289 28 acl-2013-A Unified Morpho-Syntactic Scheme of Stanford Dependencies

20 0.26370886 242 acl-2013-Mining Equivalent Relations from Linked Data


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.121), (6, 0.065), (11, 0.091), (15, 0.021), (24, 0.045), (26, 0.039), (35, 0.088), (42, 0.033), (48, 0.037), (70, 0.046), (84, 0.22), (88, 0.021), (90, 0.02), (95, 0.07)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83638531 297 acl-2013-Recognizing Partial Textual Entailment


2 0.79467046 16 acl-2013-A Novel Translation Framework Based on Rhetorical Structure Theory

Author: Mei Tu ; Yu Zhou ; Chengqing Zong

Abstract: Rhetorical structure theory (RST) is widely used for discourse understanding, which represents a discourse as a hierarchically semantic structure. In this paper, we propose a novel translation framework with the help of RST. In our framework, the translation process mainly includes three steps: 1) Source RST-tree acquisition: a source sentence is parsed into an RST tree; 2) Rule extraction: translation rules are extracted from the source tree and the target string via bilingual word alignment; 3) RST-based translation: the source RST-tree is translated with translation rules. Experiments on Chinese-to-English show that our RST-based approach achieves improvements of 2.3/0.77/1.43 BLEU points on NIST04/NIST05/CWMT2008 respectively. 1

3 0.74172717 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner

Author: Xuchen Yao ; Benjamin Van Durme ; Chris Callison-Burch ; Peter Clark

Abstract: Fast alignment is essential for many natural language tasks. But in the setting of monolingual alignment, previous work has not been able to align more than one sentence pair per second. We describe a discriminatively trained monolingual word aligner that uses a Conditional Random Field to globally decode the best alignment with features drawn from source and target sentences. Using just part-of-speech tags and WordNet as external resources, our aligner gives state-of-the-art result, while being an order-of-magnitude faster than the previous best performing system.

4 0.71941298 316 acl-2013-SenseSpotting: Never let your parallel data tie you to an old domain

Author: Marine Carpuat ; Hal Daume III ; Katharine Henry ; Ann Irvine ; Jagadeesh Jagarlamudi ; Rachel Rudinger

Abstract: Words often gain new senses in new domains. Being able to automatically identify, from a corpus of monolingual text, which word tokens are being used in a previously unseen sense has applications to machine translation and other tasks sensitive to lexical semantics. We define a task, SENSESPOTTING, in which we build systems to spot tokens that have new senses in new domain text. Instead of difficult and expensive annotation, we build a goldstandard by leveraging cheaply available parallel corpora, targeting our approach to the problem of domain adaptation for machine translation. Our system is able to achieve F-measures of as much as 80%, when applied to word types it has never seen before. Our approach is based on a large set of novel features that capture varied aspects of how words change when used in new domains.

5 0.67450613 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model

Author: Ulle Endriss ; Raquel Fernandez

Abstract: Crowdsourcing, which offers new ways of cheaply and quickly gathering large amounts of information contributed by volunteers online, has revolutionised the collection of labelled data. Yet, to create annotated linguistic resources from this data, we face the challenge of having to combine the judgements of a potentially large group of annotators. In this paper we investigate how to aggregate individual annotations into a single collective annotation, taking inspiration from the field of social choice theory. We formulate a general formal model for collective annotation and propose several aggregation methods that go beyond the commonly used majority rule. We test some of our methods on data from a crowdsourcing experiment on textual entailment annotation.

6 0.6619885 43 acl-2013-Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity

7 0.65681028 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction

8 0.65538591 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering

9 0.65462446 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning

10 0.65407497 237 acl-2013-Margin-based Decomposed Amortized Inference

11 0.65375787 304 acl-2013-SEMILAR: The Semantic Similarity Toolkit

12 0.65174264 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation

13 0.65138751 333 acl-2013-Summarization Through Submodularity and Dispersion

14 0.65095341 242 acl-2013-Mining Equivalent Relations from Linked Data

15 0.65011907 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension

16 0.64883119 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri

17 0.64789939 59 acl-2013-Automated Pyramid Scoring of Summaries using Distributional Semantics

18 0.64713132 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction

19 0.64603865 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals

20 0.64564085 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis