acl acl2012 acl2012-65 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Naomi Zeichner ; Jonathan Berant ; Ido Dagan
Abstract: The importance of inference rules to semantic applications has long been recognized and extensive work has been carried out to automatically acquire inference-rule resources. However, evaluating such resources has turned out to be a non-trivial task, slowing progress in the field. In this paper, we suggest a framework for evaluating inference-rule resources. Our framework simplifies a previously proposed “instance-based evaluation” method that involved substantial annotator training, making it suitable for crowdsourcing. We show that our method produces a large amount of annotations with high inter-annotator agreement for a low cost at a short period of time, without requiring training expert annotators.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract The importance of inference rules to semantic applications has long been recognized and extensive work has been carried out to automatically acquire inference-rule resources. [sent-3, score-0.28]
2 However, evaluating such resources has turned out to be a non-trivial task, slowing progress in the field. [sent-4, score-0.162]
3 In this paper, we suggest a framework for evaluating inference-rule resources. [sent-5, score-0.112]
4 Our framework simplifies a previously proposed “instance-based evaluation” method that involved substantial annotator training, making it suitable for crowdsourcing. [sent-6, score-0.199]
5 We show that our method produces a large amount of annotations with high inter-annotator agreement for a low cost at a short period of time, without requiring training expert annotators. [sent-7, score-0.167]
6 1 Introduction Inference rules are an important component in semantic applications, such as Question Answering (QA) (Ravichandran and Hovy, 2002) and Information Extraction (IE) (Shinyama and Sekine, 2006), describing a directional inference relation between two text patterns with variables. [sent-8, score-0.197]
7 For example, to answer the question ‘Where was Reagan raised? [sent-9, score-0.05]
8 ’ a QA system can use the rule ‘X brought up in Y → X raised in Y’ to extract the answer from ‘Reagan was brought up in Dixon’. [sent-10, score-0.37]
9 Similarly, an IE system can use the rule ‘X work as Y → X hired as Y’ to extract the PERSON and ROLE entities in the “hiring” event from ‘Bob worked as an analyst for Dell’. [sent-11, score-0.24]
10 The significance of inference rules has led to substantial effort in developing algorithms that automatically learn inference rules (Lin and Pantel, 2001; Sekine, 2005; Schoenmackers et al. [sent-12, score-0.435]
11 However, despite their potential, utilization of inference rule resources is currently somewhat limited. [sent-20, score-0.386]
12 Thus, evaluation is necessary both for resource developers and for inference system developers who want to assess the quality of each resource. [sent-22, score-0.288]
13 Unfortunately, as evaluating inference rules is hard and costly, there is no clear evaluation standard, and this has become a slowing factor for progress in the field. [sent-23, score-0.325]
14 One option for evaluating inference rule resources is to measure their impact on an end task, as that is what ultimately interests an inference system developer. [sent-24, score-0.563]
15 However, this is often problematic since inference systems have many components that address multiple phenomena, and thus it is hard to assess the effect of a single resource. [sent-25, score-0.112]
16 An example is the Recognizing Textual Entailment (RTE) framework (Dagan et al. [sent-26, score-0.047]
17 , 2009), in which given a text T and a textual hypothesis H, a system determines whether H can be inferred from T. [sent-27, score-0.056]
18 This type of evaluation was established in RTE challenges by ablation tests (see RTE ablation tests in ACLWiki) and showed that resources’ impact can vary considerably from one system to another. [sent-28, score-0.144]
19 These issues have also been noted by Sammons et al. [sent-29, score-0.034]
20 Some attempts were made to let annotators judge rule correctness directly, that is, by asking them to judge the correctness of a given rule (Shinyama et al. [sent-32, score-0.849]
21 However, Szpektor et al. (2007) observed that directly judging rules out of context often results in low inter-annotator agreement. [sent-35, score-0.125]
22 To remedy this, Szpektor et al. (2007) proposed “instance-based evaluation”, in which annotators are presented with an application of a rule in a particular context and need to judge whether it results in a valid inference. [sent-40, score-0.505]
23 This simulates the utility of rules in an application and yields high inter-annotator agreement. [sent-41, score-0.138]
24 Unfortunately, their method requires lengthy guidelines and substantial annotator training effort, which are time-consuming and costly. [sent-42, score-0.115]
25 Recently, crowdsourcing services such as Amazon Mechanical Turk (AMT) and CrowdFlower (CF) have been employed for semantic inference annotation (Snow et al. [sent-44, score-0.366]
26 These works focused on generating and annotating RTE text-hypothesis pairs, but did not address annotation and evaluation of inference rules. [sent-48, score-0.186]
27 In this paper, we propose a novel instance-based evaluation framework for inference rules that takes advantage of crowdsourcing. [sent-49, score-0.244]
28 Our method substantially simplifies annotation of rule applications and avoids annotator training completely. [sent-50, score-0.508]
29 The novelty in our framework is two-fold: (1) We simplify instance-based evaluation from a complex decision scenario to two independent binary decisions. [sent-51, score-0.047]
30 (2) We apply methodological principles that efficiently communicate the definition of the “inference” relation to untrained crowdsourcing workers (Turkers). [sent-52, score-0.365]
31 As a case study, we applied our method to evaluate algorithms for learning inference rules between predicates. [sent-53, score-0.197]
32 We show that we can produce many annotations cheaply, quickly, and at good quality, while achieving high inter-annotator agreement. [sent-54, score-0.071]
33 2 Evaluating Rule Applications As mentioned, in instance-based evaluation individual rule applications are judged rather than rules in isolation, and the quality of a rule-resource is then evaluated by the validity of a sample of applications of its rules. [sent-55, score-0.578]
34 Rule application is performed by finding an instantiation of the rule left-hand-side in a corpus (termed LHS extraction) and then applying the rule on the extraction to produce an instantiation of the rule right-hand-side (termed RHS instantiation). [sent-56, score-1.376]
35 For example, the rule ‘X observe Y → X celebrate Y’ [sent-57, score-0.398]
36 can be applied on the LHS extraction ‘they observe holidays’ to produce the RHS instantiation ‘they celebrate holidays’. [sent-60, score-0.63]
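To make the mechanics above concrete, here is a minimal Python sketch of rule application on a single extraction. The Rule and Extraction types, the apply_rule helper, and the string-template matching are illustrative assumptions, not part of the paper.

```python
from typing import NamedTuple, Optional

class Extraction(NamedTuple):
    x: str          # left argument, e.g. "they"
    predicate: str  # predicate, e.g. "observe"
    y: str          # right argument, e.g. "holidays"

class Rule(NamedTuple):
    lhs: str  # e.g. "X observe Y"
    rhs: str  # e.g. "X celebrate Y"

def apply_rule(rule: Rule, extraction: Extraction) -> Optional[str]:
    """Return the RHS instantiation if the extraction instantiates the rule's LHS."""
    lhs_predicate = rule.lhs.replace("X", "").replace("Y", "").strip()
    if extraction.predicate != lhs_predicate:
        return None  # the rule's LHS is not instantiated by this extraction
    return rule.rhs.replace("X", extraction.x).replace("Y", extraction.y)

rule = Rule(lhs="X observe Y", rhs="X celebrate Y")
ext = Extraction(x="they", predicate="observe", y="holidays")
print(apply_rule(rule, ext))  # -> they celebrate holidays
```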
37 The target of evaluation is to judge whether each rule application is valid or not. [sent-61, score-0.44]
38 Following the standard RTE task definition, a rule application is considered valid if a human reading the LHS extraction is highly likely to infer that the RHS instantiation is true (Dagan et al. [sent-62, score-0.708]
39 In the aforementioned example, the annotator is expected to judge that ‘they observe holidays’ entails ‘they celebrate holidays’. [sent-64, score-0.51]
40 Two problematic situations may arise in this process. The first is that the LHS extraction is meaningless. [sent-66, score-0.091]
41 We regard a proposition as meaningful if a human can easily understand its meaning (despite some simple grammatical errors). [sent-67, score-0.145]
42 A meaningless LHS extraction usually occurs due to a faulty extraction process (e.g. [sent-68, score-0.298]
43 Such rule applications can either be excluded from the sample so that the rule-base is not penalized (since the problem is in the extraction procedure), or can be used as examples of non-entailment, if we are interested in overall performance. [sent-71, score-0.461]
44 A second situation is a meaningless RHS instantiation, usually caused by rule application in a wrong context. [sent-72, score-0.409]
45 This case is tagged as non-entailment (for example, applying the rule ‘X observe Y → X celebrate Y’ in the context of the extraction ‘companies observe dress code’). [sent-73, score-0.398]
46 Each rule application therefore requires an answer to the following three questions: 1) Is the LHS extraction meaningful? [sent-74, score-0.434]
47 2) Is the RHS instantiation meaningful? 3) If both are meaningful, does the LHS extraction entail the RHS instantiation? [sent-76, score-0.091]
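The three binary judgments combine into a single label per rule application. A small sketch of that decision, with illustrative label names of our own choosing:

```python
def judge_rule_application(lhs_meaningful: bool, rhs_meaningful: bool,
                           entails: bool) -> str:
    """Combine the three questions above into one label (labels are illustrative)."""
    if not lhs_meaningful:
        # Faulty extraction: exclude from the sample, or count as
        # non-entailment if overall performance is of interest.
        return "exclude-or-non-entailment"
    if not rhs_meaningful:
        return "non-entailment"  # rule applied in a wrong context
    return "entailment" if entails else "non-entailment"
```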
48 3 Crowdsourcing Previous works using crowdsourcing noted some principles to help get the most out of the service (Wang et al. [sent-77, score-0.28]
49 The global task is split into simple sub-tasks, each dealing with a single aspect of the problem. [sent-80, score-0.037]
50 Gold standard (GS) examples are combined with actual annotations to continuously validate annotator reliability. [sent-85, score-0.192]
51 We split the annotation process into two tasks, the first to judge phrase meaningfulness (Questions 1 and 2 above) and the second to judge entailment (Question 3 above). [sent-86, score-0.633]
52 In Task 1, the LHS extractions and RHS instantiations of all rule applications are separated and presented to different Turkers independently of one another. [sent-87, score-0.383]
53 This task is simple, quick and cheap and allows Turkers to focus on the single aspect of judging phrase meaningfulness. [sent-88, score-0.125]
54 Rule applications for which both the LHS extraction and RHS instantiation are judged as meaningful are passed to Task 2, where Turkers need to decide whether a given rule application is valid. [sent-89, score-0.995]
55 If not for Task 1, Turkers would need to distinguish in Task 2 between non-entailment due to (1) an incorrect rule, (2) a meaningless RHS instantiation, or (3) a meaningless LHS extraction. [sent-90, score-0.728]
56 Thanks to Task 1, Turkers are presented in Task 2 with two meaningful phrases and need to decide only whether one entails the other. [sent-91, score-0.182]
57 Following previous work (2010), we only use results for which the confidence value provided by CF is greater than 70%. [sent-94, score-0.055]
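A sketch of the two-task pipeline with the confidence filter, assuming each judgment function returns a (label, aggregated CF confidence) pair; the function names and data layout are assumptions for illustration only.

```python
MIN_CONFIDENCE = 0.7  # the 70% CF confidence threshold mentioned above

def annotate(rule_applications, judge_meaningful, judge_entailment):
    """rule_applications: iterable of (lhs_extraction, rhs_instantiation) pairs."""
    labeled = []
    for lhs, rhs in rule_applications:
        # Task 1: both sides judged for meaningfulness, independently.
        lhs_ok, c1 = judge_meaningful(lhs)
        rhs_ok, c2 = judge_meaningful(rhs)
        if min(c1, c2) < MIN_CONFIDENCE:
            continue  # low-confidence judgments are discarded
        if not lhs_ok:
            continue  # faulty extraction, excluded from the sample
        if not rhs_ok:
            labeled.append(((lhs, rhs), "non-entailment"))
            continue
        # Task 2: entailment judged only for pairs of meaningful phrases.
        entails, c3 = judge_entailment(lhs, rhs)
        if c3 >= MIN_CONFIDENCE:
            labeled.append(((lhs, rhs),
                            "entailment" if entails else "non-entailment"))
    return labeled
```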
58 This is in contrast to Szpektor et al. (2007), whose judgments for each rule application are similar to ours but had to be performed simultaneously by annotators, which required substantial training. [sent-97, score-0.334]
59 In keeping with the second principle above, the task description is made up of a short verbal explanation followed by positive and negative examples. [sent-99, score-0.071]
60 The definition of “meaningfulness” is conveyed via examples pointing to properties of the automatic phrase extraction process, as seen in Table 1. [sent-100, score-0.181]
61 As mentioned, rule applications for which both sides were judged as meaningful are evaluated for entailment. [sent-102, score-0.555]
62 The challenge is to communicate the definition of “entailment” to Turkers. [sent-103, score-0.08]
63 To that end the task description begins with a short explanation followed by “easy” and “hard” examples with explanations, covering a variety of positive and negative entailment “types” (Table 2). [sent-104, score-0.351]
64 Defining “entailment” is quite difficult when dealing with expert annotators and still more with non-experts, as was noted by Negri et al. [sent-105, score-0.14]
65 We therefore employ several additional mechanisms to get the definition of entailment across to Turkers and increase agreement with the GS. [sent-107, score-0.382]
66 We run an initial small test and use its output to improve annotation in two ways: first, we take examples that were “confusing” for Turkers and add them to the GS with explanatory feedback presented when a Turker answers incorrectly. [sent-108, score-0.121]
67 For example, the pair (‘The owner be happy to help drivers’, ‘The owner assist drivers’) was judged as entailing in the test run but only achieved a confidence value of 0. [sent-111, score-0.291]
68 Second, we add examples that were annotated unanimously by Turkers to the GS to increase its size, allowing CF to better estimate each Turker’s reliability (following CF recommendations, we aim to have around 10% GS examples in every run). [sent-113, score-0.196]
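A minimal sketch of the second mechanism, assuming each judged item carries its list of Turker judgments; the data layout is an assumption, and keeping the GS near the ~10% target is left to however each run is composed.

```python
def augment_gold_standard(gs, judged_items):
    """Add unanimously judged items (with their agreed label) to the GS."""
    for item, judgments in judged_items:
        if len(set(judgments)) == 1:  # all Turkers agreed on this item
            gs.append((item, judgments[0]))
    return gs
```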
69 In Section 4 we show that these mechanisms improved annotation quality. [sent-114, score-0.125]
70 4 Case Study As a case study, we used our evaluation methodology to compare four methods for learning entailment rules between predicates: DIRT (Lin and Pantel, 2001), Cover (Weeds and Weir, 2003), BInc (Szpektor and Dagan, 2008) and Berant et al. [sent-115, score-0.364]
71 To that end, we applied the methods on a set of one billion extractions (generously provided by Fader et al. [sent-117, score-0.06]
72 (2011)) automatically extracted from the ClueWeb09 web crawl, where each extraction comprises a predicate and two arguments. [sent-118, score-0.091]
73 We randomly sampled 5,000 extractions, and for each one sampled four rules whose LHS matches the extraction from the union of the learned resources. [sent-126, score-0.246]
74 We then applied the rules, which resulted in 20,000 rule applications. [sent-127, score-0.24]
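A sketch of this sampling procedure, reusing the illustrative Extraction type from the earlier sketch; the rules_by_predicate index and the function name are assumptions.

```python
import random

def sample_rule_applications(extractions, rules_by_predicate,
                             n_extractions=5000, rules_per_extraction=4):
    """Sample extractions, then rules whose LHS matches each sampled extraction."""
    applications = []
    for ext in random.sample(extractions, n_extractions):
        candidates = rules_by_predicate.get(ext.predicate, [])
        k = min(rules_per_extraction, len(candidates))
        for rule in random.sample(candidates, k):
            applications.append((rule, ext))
    return applications  # 5,000 x 4 = 20,000 applications when enough rules match
```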
75 We annotated rule applications using our methodology and evaluated each learning method by comparing the rules it learned with the annotations generated by CF. [sent-128, score-0.567]
76 In Task 1, 281 rule applications were annotated as meaningless LHS extraction, and 1,012 were annotated as meaningful LHS extraction but meaningless RHS instantiation and so automatically annotated as non-entailment. [sent-129, score-1.164]
77 8,264 rule applications were passed on to Task 2, as both sides were judged meaningful (the remaining 10,443 were discarded due to low CF confidence). [sent-130, score-0.595]
78 In Task 2, 5,555 rule applications were judged with a high confidence and supplied as output, 2,447 of them as positive entailment and 3,108 as negative. [sent-131, score-0.698]
79 Overall, 6,567 rule applications (the dataset of this paper) were annotated for a total cost of $1000. [sent-132, score-0.362]
80 In tests run during development we experimented with Task 2 wording and GS examples, seeking to make the definition of entailment as clear as possible. [sent-134, score-0.308]
81 To do so we randomly sampled and manually annotated 200 rule applications (from the initial 20,000), and had Turkers judge them. [sent-135, score-0.513]
82 In our initial test, Turkers tended to answer “yes” compared to our own annotation, with [sent-136, score-0.05]
83 0.79 agreement between their annotation and ours, corresponding to a kappa score of 0. [sent-137, score-0.178]
84 After applying the mechanisms described in Section 3, false-positive rate was reduced from 18% to 6% while false-negative rate only increased from 4% to 5%, corresponding to a high agreement of 0. [sent-139, score-0.106]
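For reference, the agreement statistic reported here is Cohen's kappa; a minimal sketch of computing it for two parallel annotation lists (the toy labels below are illustrative, not the paper's data):

```python
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    assert len(ann_a) == len(ann_b)
    n = len(ann_a)
    p_observed = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    ca, cb = Counter(ann_a), Counter(ann_b)
    p_expected = sum(ca[l] * cb[l] for l in set(ann_a) | set(ann_b)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

ours    = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
turkers = ["yes", "yes", "no", "yes", "yes", "no", "yes", "no"]
print(cohens_kappa(ours, turkers))  # 0.75 for this toy example
```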
85 In our test, 63% of the 200 rule applications were annotated unanimously by the Turkers. [sent-142, score-0.425]
86 Importantly, all these examples were in perfect agreement with our own annotation, reflecting their high reliability. [sent-143, score-0.102]
87 For the purpose of evaluating the resources learned by the algorithms, we used annotations with CF confidence ≥ 0. [sent-144, score-0.225]
88 We report the area under the recall-precision curve (AUC) for DIRT, Cover, BInc and Berant et al. [sent-149, score-0.031]
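A sketch of computing this area under a recall-precision curve with trapezoidal integration; the (recall, precision) points below are illustrative, not the paper's results.

```python
import numpy as np

def auc_recall_precision(recall, precision):
    """Area under the recall-precision curve via trapezoidal integration."""
    order = np.argsort(recall)
    r = np.asarray(recall, dtype=float)[order]
    p = np.asarray(precision, dtype=float)[order]
    return np.trapz(p, r)  # integrate precision over recall

recall    = [0.0, 0.1, 0.2, 0.3, 0.4]   # illustrative points only
precision = [1.0, 0.8, 0.7, 0.6, 0.5]
print(auc_recall_precision(recall, precision))  # 0.285 here
```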
89 Overall, we demonstrated that our evaluation framework allowed us to compare four different learning methods at low cost and within one week. [sent-156, score-0.047]
90 5 Discussion In this paper we have suggested a crowdsourcing framework for evaluating inference rules. [sent-157, score-0.404]
91 We have shown that by simplifying the previously proposed instance-based evaluation framework we are able to take advantage of crowdsourcing services to replace trained expert annotators, resulting in good-quality, large-scale annotations at reasonable time and cost. [sent-158, score-0.268]
92 We have presented the methodological principles we developed to get the entailment decision across to Turkers, achieving very high agreement both with our annotations and between the annotators themselves. [sent-159, score-0.529]
93 Using the CrowdFlower forms we provide with this paper, the proposed methodology can be beneficial both for resource developers evaluating their output and for inference system developers wanting to assess the quality of existing resources. [sent-160, score-0.399]
94 Acknowledgments This work was partially supported by the Israel Science Foundation grant 1112/08, the PASCAL2 Network of Excellence of the European Community FP7-ICT-2007-1-216886, and the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. [sent-161, score-0.055]
95 LEDIR: An unsupervised algorithm for learning directionality of inference rules. [sent-169, score-0.112]
96 Types of common-sense knowledge needed for recognizing textual entailment. [sent-185, score-0.09]
97 Divide and conquer: Crowdsourcing the creation of cross-lingual textual entailment corpora. [sent-193, score-0.289]
wordName wordTfidf (topN-words)
[('lhs', 0.325), ('turkers', 0.26), ('instantiation', 0.256), ('rule', 0.24), ('rhs', 0.233), ('entailment', 0.233), ('crowdsourcing', 0.18), ('meaningful', 0.145), ('cf', 0.135), ('szpektor', 0.132), ('celebrate', 0.125), ('holidays', 0.125), ('dagan', 0.122), ('judge', 0.116), ('meaningless', 0.116), ('inference', 0.112), ('gs', 0.11), ('shinyama', 0.109), ('rte', 0.096), ('meaningfulness', 0.094), ('extraction', 0.091), ('developers', 0.088), ('berant', 0.088), ('judged', 0.087), ('ido', 0.087), ('rules', 0.085), ('negri', 0.084), ('mehdad', 0.084), ('applications', 0.083), ('sekine', 0.08), ('israel', 0.077), ('annotator', 0.074), ('annotation', 0.074), ('annotations', 0.071), ('auc', 0.07), ('dirt', 0.07), ('principles', 0.066), ('evaluating', 0.065), ('annotators', 0.065), ('binc', 0.063), ('lobue', 0.063), ('slowing', 0.063), ('turker', 0.063), ('unanimously', 0.063), ('extractions', 0.06), ('satoshi', 0.06), ('textual', 0.056), ('confidence', 0.055), ('agreement', 0.055), ('reagan', 0.055), ('drivers', 0.055), ('crowdflower', 0.055), ('weeds', 0.055), ('application', 0.053), ('mechanisms', 0.051), ('schoenmackers', 0.05), ('answer', 0.05), ('kappa', 0.049), ('cheap', 0.048), ('framework', 0.047), ('pantel', 0.047), ('examples', 0.047), ('sammons', 0.047), ('matteo', 0.047), ('owner', 0.047), ('idan', 0.047), ('methodology', 0.046), ('fader', 0.044), ('yashar', 0.044), ('definition', 0.043), ('bhagat', 0.042), ('substantial', 0.041), ('expert', 0.041), ('ablation', 0.04), ('passed', 0.04), ('judging', 0.04), ('brought', 0.04), ('annotated', 0.039), ('methodological', 0.039), ('termed', 0.039), ('communicate', 0.037), ('simplifies', 0.037), ('entails', 0.037), ('task', 0.037), ('correctness', 0.036), ('ravichandran', 0.036), ('snow', 0.036), ('sampled', 0.035), ('resources', 0.034), ('recognizing', 0.034), ('noted', 0.034), ('yusuke', 0.034), ('explanation', 0.034), ('qa', 0.034), ('observe', 0.033), ('tests', 0.032), ('mechanical', 0.032), ('curve', 0.031), ('amazon', 0.031), ('valid', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 65 acl-2012-Crowdsourcing Inference-Rule Evaluation
Author: Naomi Zeichner ; Jonathan Berant ; Ido Dagan
Abstract: The importance of inference rules to semantic applications has long been recognized and extensive work has been carried out to automatically acquire inference-rule resources. However, evaluating such resources has turned out to be a non-trivial task, slowing progress in the field. In this paper, we suggest a framework for evaluating inference-rule resources. Our framework simplifies a previously proposed “instance-based evaluation” method that involved substantial annotator training, making it suitable for crowdsourcing. We show that our method produces a large amount of annotations with high inter-annotator agreement for a low cost at a short period of time, without requiring training expert annotators.
2 0.18733501 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment
Author: Asher Stern ; Ido Dagan
Abstract: This paper introduces BIUTEE1 , an opensource system for recognizing textual entailment. Its main advantages are its ability to utilize various types of knowledge resources, and its extensibility by which new knowledge resources and inference components can be easily integrated. These abilities make BIUTEE an appealing RTE system for two research communities: (1) researchers of end applications, that can benefit from generic textual inference, and (2) RTE researchers, who can integrate their novel algorithms and knowledge resources into our system, saving the time and effort of developing a complete RTE system from scratch. Notable assistance for these re- searchers is provided by a visual tracing tool, by which researchers can refine and “debug” their knowledge resources and inference components.
3 0.18557996 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
Author: Jonathan Berant ; Ido Dagan ; Meni Adler ; Jacob Goldberger
Abstract: Learning entailment rules is fundamental in many semantic-inference applications and has been an active field of research in recent years. In this paper we address the problem of learning transitive graphs that describe entailment rules between predicates (termed entailment graphs). We first identify that entailment graphs exhibit a “tree-like” property and are very similar to a novel type of graph termed forest-reducible graph. We utilize this property to develop an iterative efficient approximation algorithm for learning the graph edges, where each iteration takes linear time. We compare our approximation algorithm to a recently-proposed state-of-the-art exact algorithm and show that it is more efficient and scalable both theoretically and empirically, while its output quality is close to that given by the optimal solution of the exact algorithm.
4 0.18001521 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents
Author: Yashar Mehdad ; Matteo Negri ; Marcello Federico
Abstract: We address a core aspect of the multilingual content synchronization task: the identification of novel, more informative or semantically equivalent pieces of information in two documents about the same topic. This can be seen as an application-oriented variant of textual entailment recognition where: i) T and H are in different languages, and ii) entailment relations between T and H have to be checked in both directions. Using a combination of lexical, syntactic, and semantic features to train a cross-lingual textual entailment system, we report promising results on different datasets.
5 0.1610454 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules
Author: Wei He ; Hua Wu ; Haifeng Wang ; Ting Liu
Abstract: unkown-abstract
6 0.1458938 82 acl-2012-Entailment-based Text Exploration with Application to the Health-care Domain
7 0.13749561 78 acl-2012-Efficient Search for Transformation-based Inference
8 0.09056595 184 acl-2012-String Re-writing Kernel
9 0.081744321 133 acl-2012-Learning to "Read Between the Lines" using Bayesian Logic Programs
10 0.079439215 53 acl-2012-Combining Textual Entailment and Argumentation Theory for Supporting Online Debates Interactions
11 0.075489886 215 acl-2012-WizIE: A Best Practices Guided Development Environment for Information Extraction
12 0.074458547 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
13 0.068576403 55 acl-2012-Community Answer Summarization for Multi-Sentence Question with Group L1 Regularization
14 0.067176551 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model
15 0.064769544 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
16 0.062514864 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT
17 0.058696911 177 acl-2012-Sentence Dependency Tagging in Online Question Answering Forums
18 0.058053259 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
19 0.057397578 40 acl-2012-Big Data versus the Crowd: Looking for Relationships in All the Right Places
20 0.057235062 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study
topicId topicWeight
[(0, -0.179), (1, 0.037), (2, -0.028), (3, 0.082), (4, 0.021), (5, 0.066), (6, -0.058), (7, 0.273), (8, 0.001), (9, 0.045), (10, -0.127), (11, 0.271), (12, 0.025), (13, -0.162), (14, 0.127), (15, -0.007), (16, 0.032), (17, -0.113), (18, -0.019), (19, 0.007), (20, -0.033), (21, 0.02), (22, -0.011), (23, 0.007), (24, 0.042), (25, -0.086), (26, -0.071), (27, -0.093), (28, 0.035), (29, -0.001), (30, -0.009), (31, 0.003), (32, 0.093), (33, 0.038), (34, -0.078), (35, -0.069), (36, 0.06), (37, -0.092), (38, 0.089), (39, 0.09), (40, 0.106), (41, 0.034), (42, 0.062), (43, -0.055), (44, 0.007), (45, 0.009), (46, 0.127), (47, 0.007), (48, -0.006), (49, 0.07)]
simIndex simValue paperId paperTitle
same-paper 1 0.95715356 65 acl-2012-Crowdsourcing Inference-Rule Evaluation
Author: Naomi Zeichner ; Jonathan Berant ; Ido Dagan
Abstract: The importance of inference rules to semantic applications has long been recognized and extensive work has been carried out to automatically acquire inference-rule resources. However, evaluating such resources has turned out to be a non-trivial task, slowing progress in the field. In this paper, we suggest a framework for evaluating inference-rule resources. Our framework simplifies a previously proposed “instance-based evaluation” method that involved substantial annotator training, making it suitable for crowdsourcing. We show that our method produces a large amount of annotations with high inter-annotator agreement for a low cost at a short period of time, without requiring training expert annotators.
2 0.76278669 82 acl-2012-Entailment-based Text Exploration with Application to the Health-care Domain
Author: Meni Adler ; Jonathan Berant ; Ido Dagan
Abstract: We present a novel text exploration model, which extends the scope of state-of-the-art technologies by moving from standard concept-based exploration to statement-based exploration. The proposed scheme utilizes the textual entailment relation between statements as the basis of the exploration process. A user of our system can explore the result space of a query by drilling down/up from one statement to another, according to entailment relations specified by an entailment graph and an optional concept taxonomy. As a prominent use case, we apply our exploration system and illustrate its benefit on the health-care domain. To the best of our knowledge this is the first implementation of an exploration system at the statement level that is based on the textual entailment relation. 1
3 0.74792176 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents
Author: Yashar Mehdad ; Matteo Negri ; Marcello Federico
Abstract: We address a core aspect of the multilingual content synchronization task: the identification of novel, more informative or semantically equivalent pieces of information in two documents about the same topic. This can be seen as an application-oriented variant of textual entailment recognition where: i) T and H are in different languages, and ii) entailment relations between T and H have to be checked in both directions. Using a combination of lexical, syntactic, and semantic features to train a cross-lingual textual entailment system, we report promising results on different datasets.
4 0.70543456 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
Author: Jonathan Berant ; Ido Dagan ; Meni Adler ; Jacob Goldberger
Abstract: Learning entailment rules is fundamental in many semantic-inference applications and has been an active field of research in recent years. In this paper we address the problem of learning transitive graphs that describe entailment rules between predicates (termed entailment graphs). We first identify that entailment graphs exhibit a “tree-like” property and are very similar to a novel type of graph termed forest-reducible graph. We utilize this property to develop an iterative efficient approximation algorithm for learning the graph edges, where each iteration takes linear time. We compare our approximation algorithm to a recently-proposed state-of-the-art exact algorithm and show that it is more efficient and scalable both theoretically and empirically, while its output quality is close to that given by the optimal solution of the exact algorithm.
5 0.58282429 133 acl-2012-Learning to "Read Between the Lines" using Bayesian Logic Programs
Author: Sindhu Raghavan ; Raymond Mooney ; Hyeonseo Ku
Abstract: Most information extraction (IE) systems identify facts that are explicitly stated in text. However, in natural language, some facts are implicit, and identifying them requires “reading between the lines”. Human readers naturally use common sense knowledge to infer such implicit information from the explicitly stated facts. We propose an approach that uses Bayesian Logic Programs (BLPs), a statistical relational model combining firstorder logic and Bayesian networks, to infer additional implicit information from extracted facts. It involves learning uncertain commonsense knowledge (in the form of probabilistic first-order rules) from natural language text by mining a large corpus of automatically extracted facts. These rules are then used to derive additional facts from extracted information using BLP inference. Experimental evaluation on a benchmark data set for machine reading demonstrates the efficacy of our approach.
6 0.58155489 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment
7 0.50980216 215 acl-2012-WizIE: A Best Practices Guided Development Environment for Information Extraction
8 0.49941891 53 acl-2012-Combining Textual Entailment and Argumentation Theory for Supporting Online Debates Interactions
9 0.4784582 184 acl-2012-String Re-writing Kernel
10 0.43388969 78 acl-2012-Efficient Search for Transformation-based Inference
11 0.34583077 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules
12 0.29015273 57 acl-2012-Concept-to-text Generation via Discriminative Reranking
13 0.28945261 197 acl-2012-Tokenization: Returning to a Long Solved Problem A Survey, Contrastive Experiment, Recommendations, and Toolkit
14 0.28046826 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study
15 0.27922195 186 acl-2012-Structuring E-Commerce Inventory
16 0.27807602 195 acl-2012-The Creation of a Corpus of English Metalanguage
17 0.27479783 125 acl-2012-Joint Learning of a Dual SMT System for Paraphrase Generation
18 0.27380392 44 acl-2012-CSNIPER - Annotation-by-query for Non-canonical Constructions in Large Corpora
19 0.271972 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
20 0.26972714 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT
topicId topicWeight
[(25, 0.026), (26, 0.02), (28, 0.016), (30, 0.503), (37, 0.012), (39, 0.052), (74, 0.03), (84, 0.015), (85, 0.026), (90, 0.083), (92, 0.085), (99, 0.05)]
simIndex simValue paperId paperTitle
same-paper 1 0.91107589 65 acl-2012-Crowdsourcing Inference-Rule Evaluation
Author: Naomi Zeichner ; Jonathan Berant ; Ido Dagan
Abstract: The importance of inference rules to semantic applications has long been recognized and extensive work has been carried out to automatically acquire inference-rule resources. However, evaluating such resources has turned out to be a non-trivial task, slowing progress in the field. In this paper, we suggest a framework for evaluating inference-rule resources. Our framework simplifies a previously proposed “instance-based evaluation” method that involved substantial annotator training, making it suitable for crowdsourcing. We show that our method produces a large amount of annotations with high inter-annotator agreement for a low cost at a short period of time, without requiring training expert annotators.
2 0.79792261 144 acl-2012-Modeling Review Comments
Author: Arjun Mukherjee ; Bing Liu
Abstract: Writing comments about news articles, blogs, or reviews have become a popular activity in social media. In this paper, we analyze reader comments about reviews. Analyzing review comments is important because reviews only tell the experiences and evaluations of reviewers about the reviewed products or services. Comments, on the other hand, are readers’ evaluations of reviews, their questions and concerns. Clearly, the information in comments is valuable for both future readers and brands. This paper proposes two latent variable models to simultaneously model and extract these key pieces of information. The results also enable classification of comments accurately. Experiments using Amazon review comments demonstrate the effectiveness of the proposed models.
3 0.72151577 75 acl-2012-Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
Author: Matthieu Constant ; Anthony Sigogne ; Patrick Watrin
Abstract: and Parsing Anthony Sigogne Universit e´ Paris-Est LIGM, CNRS France s igogne @univ-mlv . fr Patrick Watrin Universit e´ de Louvain CENTAL Belgium pat rick .wat rin @ ucl ouvain .be view, their incorporation has also been considered The integration of multiword expressions in a parsing procedure has been shown to improve accuracy in an artificial context where such expressions have been perfectly pre-identified. This paper evaluates two empirical strategies to integrate multiword units in a real constituency parsing context and shows that the results are not as promising as has sometimes been suggested. Firstly, we show that pregrouping multiword expressions before parsing with a state-of-the-art recognizer improves multiword recognition accuracy and unlabeled attachment score. However, it has no statistically significant impact in terms of F-score as incorrect multiword expression recognition has important side effects on parsing. Secondly, integrating multiword expressions in the parser grammar followed by a reranker specific to such expressions slightly improves all evaluation metrics.
4 0.65647876 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation
Author: Nan Yang ; Mu Li ; Dongdong Zhang ; Nenghai Yu
Abstract: Long distance word reordering is a major challenge in statistical machine translation research. Previous work has shown using source syntactic trees is an effective way to tackle this problem between two languages with substantial word order difference. In this work, we further extend this line of exploration and propose a novel but simple approach, which utilizes a ranking model based on word order precedence in the target language to reposition nodes in the syntactic parse tree of a source sentence. The ranking model is automatically derived from word aligned parallel data with a syntactic parser for source language based on both lexical and syntactical features. We evaluated our approach on largescale Japanese-English and English-Japanese machine translation tasks, and show that it can significantly outperform the baseline phrase- based SMT system.
5 0.49589986 83 acl-2012-Error Mining on Dependency Trees
Author: Claire Gardent ; Shashi Narayan
Abstract: In recent years, error mining approaches were developed to help identify the most likely sources of parsing failures in parsing systems using handcrafted grammars and lexicons. However the techniques they use to enumerate and count n-grams builds on the sequential nature of a text corpus and do not easily extend to structured data. In this paper, we propose an algorithm for mining trees and apply it to detect the most likely sources of generation failure. We show that this tree mining algorithm permits identifying not only errors in the generation system (grammar, lexicon) but also mismatches between the structures contained in the input and the input structures expected by our generator as well as a few idiosyncrasies/error in the input data.
6 0.43950865 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation
7 0.43162745 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
8 0.42661989 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling
9 0.42506728 139 acl-2012-MIX Is Not a Tree-Adjoining Language
10 0.41709435 190 acl-2012-Syntactic Stylometry for Deception Detection
11 0.41301438 182 acl-2012-Spice it up? Mining Refinements to Online Instructions from User Generated Content
12 0.39464054 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
13 0.38946196 197 acl-2012-Tokenization: Returning to a Long Solved Problem A Survey, Contrastive Experiment, Recommendations, and Toolkit
14 0.38943002 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities
15 0.38690847 34 acl-2012-Automatically Learning Measures of Child Language Development
16 0.3853749 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars
17 0.37830666 108 acl-2012-Hierarchical Chunk-to-String Translation
18 0.3740243 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction
19 0.37350547 77 acl-2012-Ecological Evaluation of Persuasive Messages Using Google AdWords
20 0.3734189 100 acl-2012-Fine Granular Aspect Analysis using Latent Structural Models