acl acl2010 acl2010-30 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Milen Kouylekov ; Matteo Negri
Abstract: This paper presents a general-purpose open source package for recognizing Textual Entailment. The system implements a collection of algorithms, providing a configurable framework to quickly set up a working environment to experiment with the RTE task. Fast prototyping of new solutions is also allowed by the possibility to extend its modular architecture. We present the tool as a useful resource to approach the Textual Entailment problem, as an instrument for didactic purposes, and as an opportunity to create a collaborative environment to promote research in the field.
Reference: text
sentIndex sentText sentNum sentScore
1 An Open-Source Package for Recognizing Textual Entailment Milen Kouylekov and Matteo Negri FBK - Fondazione Bruno Kessler Via Sommarive 18, 38100 Povo (TN), Italy [kouylekov,negri]@fbk.eu [sent-1, score-0.169]
2 Abstract This paper presents a general-purpose open source package for recognizing Textual Entailment. [sent-2, score-0.14]
3 The system implements a collection of algorithms, providing a configurable framework to quickly set up a working environment to experiment with the RTE task. [sent-3, score-0.162]
4 Fast prototyping of new solutions is also allowed by the possibility to extend its modular architecture. [sent-4, score-0.135]
5 We present the tool as a useful resource to approach the Textual Entailment problem, as an instrument for didactic purposes, and as an opportunity to create a collaborative environment to promote research in the field. [sent-5, score-0.125]
6 On the one hand, current RTE technology might not be mature enough to provide reliable components for such integration. [sent-9, score-0.038]
7 On the other hand, the lack of available tools makes experimentation with the task, and the fast prototyping of new solutions, particularly difficult. [sent-11, score-0.074]
8 We believe that RTE research would significantly benefit from such availability, since it would make it possible to quickly set up a working environment for experiments, encourage the participation of newcomers, and eventually promote state-of-the-art advances. [sent-13, score-0.187]
9 The main contribution of this paper is to present the latest release of EDITS (Edit Distance Textual Entailment Suite), a freely available, open source software package for recognizing Textual Entailment. [sent-14, score-0.206]
10 The system has been designed following three basic requirements: Modularity. [sent-15, score-0.049]
11 Modules can be composed through a configuration file, and extended as plug-ins according to individual requirements. [sent-17, score-0.107]
12 The system's workflow, the behavior of the basic components, and their I/O formats are described in comprehensive documentation available upon download. [sent-18, score-0.051]
13 In addition, both language-dependent and language-independent configurations are supported by algorithms that manipulate different representations of the input data. [sent-21, score-0.154]
14 Figure 1: Entailment Engine, main components and workflow. Adaptability. [sent-29, score-0.069]
15 Modules can be tuned over training data to optimize performance along several dimensions (e.g. [sent-30, score-0.025]
16 overall Accuracy, Precision/Recall trade-off on YES and NO entailment judgements). [sent-32, score-0.425]
17 In addition, an optimization component based on genetic algorithms is available to automatically set parameters starting from a basic configuration. [sent-33, score-0.134]
18 The latest release of the package can be downloaded from http://edits.fbk.eu. [sent-36, score-0.236]
19 System Overview The EDITS package allows users to create an Entailment Engine (Figure 1) by defining its basic components (i.e. algorithms, cost schemes, optimizer, and rules), to train it over an annotated RTE corpus, and to apply it to un-annotated test pairs. [sent-39, score-0.111]
20 EDITS implements a distance-based framework which assumes that the probability of an entailment relation between a given T-H pair is inversely proportional to the distance between T and H (i.e. the higher the distance, the lower the probability of entailment). [sent-42, score-0.617]
21 Within this framework the system implements and harmonizes different approaches to distance computation, providing both edit distance algorithms and similarity algorithms (see Section 3.1). [sent-45, score-0.675]
22 Each algorithm returns a normalized distance score (a number between 0 and 1). [sent-47, score-0.129]
23 At a training stage, distance scores calculated over annotated T-H pairs are used to estimate a threshold that best separates positive from negative examples. [sent-48, score-0.129]
24 The threshold, which is stored in a Model, is used at a test stage to assign an entailment judgement and a confidence score to each test pair. [sent-49, score-0.493]
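To make the distance-threshold mechanism concrete, the following is a minimal Python sketch of the training and test logic described above. It is an illustration of the idea only, not the EDITS implementation, and the confidence definition (the margin to the threshold) is an assumption.

    def learn_threshold(scored_pairs):
        # scored_pairs: list of (distance, is_entailment) tuples computed over
        # annotated T-H pairs; pick the cut-off that best separates the classes.
        best_t, best_acc = 0.5, -1.0
        for t in sorted(d for d, _ in scored_pairs):
            acc = sum((d <= t) == label for d, label in scored_pairs) / len(scored_pairs)
            if acc > best_acc:
                best_t, best_acc = t, acc
        return best_t

    def judge(distance, threshold):
        # YES if the pair is closer than the learned threshold; the distance-to-
        # threshold margin stands in for a confidence score.
        decision = "YES" if distance <= threshold else "NO"
        return decision, abs(distance - threshold)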
25 In the creation of a distance Entailment Engine, algorithms are combined with cost schemes (see Section 3.2) [sent-50, score-0.594]
26 and optional external knowledge represented as rules (see Section 3.4). [sent-52, score-0.035]
27 Besides the definition of a single Entailment Engine, a unique feature of EDITS is that it allows for the combination of multiple Entailment Engines in different ways (see Section 4. [sent-54, score-0.028]
28 Pre-defined basic components are already provided with EDITS, allowing users to create a variety of entailment engines. [sent-56, score-0.488]
29 Fast prototyping of new solutions is also allowed by the possibility to extend the modular architecture of the system with new algorithms, cost schemes, rules, or plug-ins to new language processing components. [sent-57, score-0.363]
30 3 Basic Components This section overviews the main components of a distance Entailment Engine, namely: i) algorithms, ii) cost schemes, iii) the cost optimizer, and iv) entailment/contradiction rules. [sent-58, score-0.575]
31 1 Algorithms Algorithms are used to compute a distance score between T-H pairs. [sent-60, score-0.129]
32 EDITS provides a set of predefined algorithms, including edit distance algorithms, and similarity algorithms adapted to the proposed distance framework. [sent-61, score-0.694]
33 The choice of the available algorithms is motivated by their widespread use documented in the RTE literature2. [sent-62, score-0.08]
34 Edit distance algorithms cast the RTE task as the problem of mapping the whole content of H into the content of T. [sent-63, score-0.209]
35 Mappings are performed as sequences of editing operations (i.e. [sent-64, score-0.084]
36 insertion, deletion, substitution of text portions) needed to transform T into H, where each edit operation has a cost associated with it. [sent-66, score-0.506]
37 The distance algorithms available in the current release of the system are: 2Detailed descriptions of all the systems participating in the TAC RTE Challenge are available at http://www. [sent-67, score-0.272]
38 Similarity algorithms are adapted to the EDITS distance framework by transforming measures of the lexical/semantic similarity between T and H into distance measures. [sent-71, score-0.426]
39 These algorithms are also adapted to use the three edit operations to support overlap calculation and to define term weights. [sent-72, score-0.387]
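To illustrate how an edit distance algorithm consumes a cost scheme, here is a hedged Python sketch of a token-level edit distance normalized to [0, 1]. The cost_scheme object with insertion/deletion/substitution methods is an interface assumed for this sketch, not the actual EDITS API.

    def token_edit_distance(t_tokens, h_tokens, cost_scheme):
        # Dynamic-programming edit distance over tokens; insertion, deletion and
        # substitution costs are delegated to the cost scheme.
        n, m = len(t_tokens), len(h_tokens)
        d = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            d[i][0] = d[i - 1][0] + cost_scheme.deletion(t_tokens[i - 1])
        for j in range(1, m + 1):
            d[0][j] = d[0][j - 1] + cost_scheme.insertion(h_tokens[j - 1])
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d[i][j] = min(
                    d[i - 1][j] + cost_scheme.deletion(t_tokens[i - 1]),
                    d[i][j - 1] + cost_scheme.insertion(h_tokens[j - 1]),
                    d[i - 1][j - 1] + cost_scheme.substitution(t_tokens[i - 1], h_tokens[j - 1]),
                )
        # Normalize by the worst case: delete all of T, then insert all of H.
        worst = d[n][0] + d[0][m]
        return d[n][m] / worst if worst > 0 else 0.0

The normalization keeps the returned value in [0, 1], matching the requirement that every algorithm produces a normalized distance score.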
40 2 Cost Schemes Cost schemes are used to define the cost of each edit operation. [sent-76, score-0.581]
41 Cost schemes are defined as XML files that explicitly associate a cost (a positive real number) with each edit operation applied to elements of T and H. [sent-77, score-0.683]
42 For instance, Tree Edit Distance will manipulate nodes in a dependency tree representation, whereas Token Edit Distance and similarity algorithms will manipulate words. [sent-79, score-0.23]
43 Substituting A with B costs 0 if A = B, and 20 if A and B are different. [sent-81, score-0.066]
44 In the distance-based framework adopted by EDITS, the interaction between algorithms and cost schemes plays a central role. [sent-82, score-0.44]
45 In fact, given a T-H pair, the distance score returned by an algorithm directly depends on the cost of the operations applied to transform T into H (edit distance algorithms), or on the cost of mapping words in H to words in T (similarity algorithms). [sent-83, score-0.754]
46 Such interaction determines the overall behaviour of an Entailment Engine, since distance scores returned by the same algorithm with different cost schemes can be considerably different. [sent-84, score-0.521]
47 This allows users to define (and optimize, as explained in Section 3.3) [sent-85, score-0.051]
48 the cost schemes that best suit the RTE data they want to model3. [sent-86, score-0.36]
49 EDITS provides two predefined cost schemes: • Simple Cost Scheme - the one shown in Figure 2, setting fixed costs for each edit operation. [sent-87, score-0.251]
50 • IDF Cost Scheme - insertion and deletion costs for a word w are set to the inverse document frequency of w (IDF(w)). [sent-88, score-0.142]
51 The substitution cost is set to 0 if a word w1 from T and a word w2 from H are the same, and IDF(w1)+IDF(w2) otherwise. [sent-89, score-0.251]
52 3For instance, when dealing with T-H pairs composed of texts that are much longer than the hypotheses (as in the RTE-5 Campaign), setting low deletion costs avoids penalizing short Hs fully contained in the Ts. [sent-90, score-0.112]
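A minimal Python sketch of the IDF scheme's behaviour under the same assumed interface; a plain dictionary stands in for the hash file of Brown-corpus IDF values, and the default value for unseen words is an assumption.

    class IDFCostScheme:
        # Insertion/deletion cost = IDF of the word; substitution is free for
        # identical words and IDF(w1) + IDF(w2) otherwise, as described above.
        def __init__(self, idf, default_idf=10.0):
            self.idf = idf                # e.g. {"the": 0.1, "entailment": 8.2, ...}
            self.default_idf = default_idf

        def _idf(self, word):
            return self.idf.get(word.lower(), self.default_idf)

        def insertion(self, word):
            return self._idf(word)

        def deletion(self, word):
            return self._idf(word)

        def substitution(self, w1, w2):
            return 0.0 if w1.lower() == w2.lower() else self._idf(w1) + self._idf(w2)

An instance of this class could be passed as the cost_scheme argument of the edit distance sketch above.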
53 In the creation of new cost schemes, users can express edit operation costs, and conditions over the A and B elements, using a meta-language based on a lisp-like syntax (e.g. [sent-91, score-0.535]
54 The system also provides functions to access data stored in hash files. [sent-94, score-0.089]
55 For example, the IDF Cost Scheme accesses the IDF values of the most frequent 100K English words (calculated on the Brown Corpus) stored in a file distributed with the system. [sent-95, score-0.149]
56 Users can create new hash files to collect statistics about words in other languages, or other information to be used inside the cost scheme. [sent-96, score-0.263]
57 3 Cost Optimizer A cost optimizer is used to adapt cost schemes (either those provided with the system, or new ones defined by the user) to specific datasets. [sent-98, score-0.668]
58 The optimizer is based on cost adaptation through genetic algorithms, as proposed in (Mehdad, 2009). [sent-99, score-0.337]
59 To this aim, cost schemes can be parametrized by externalizing the edit operation costs as parameters. [sent-100, score-0.637]
60 The optimizer iterates over training data using different values of these parameters until an optimal set is found (i.e. the one yielding the best performance on the training set). [sent-101, score-0.104]
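The sketch below conveys the optimization loop in simplified form: a plain random search over parametrized edit costs stands in for the genetic algorithm of (Mehdad, 2009), and all names are illustrative rather than the EDITS interface.

    import random

    def optimize_costs(training_pairs, evaluate, n_iterations=100, seed=0):
        # evaluate(params, pairs) is expected to build a cost scheme from params,
        # score the annotated pairs and return accuracy on them.
        rng = random.Random(seed)
        best_params, best_acc = None, -1.0
        for _ in range(n_iterations):
            params = {
                "insertion": rng.uniform(0.0, 20.0),
                "deletion": rng.uniform(0.0, 20.0),
                "substitution": rng.uniform(0.0, 20.0),
            }
            acc = evaluate(params, training_pairs)
            if acc > best_acc:
                best_params, best_acc = params, acc
        return best_params, best_acc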
61 Entailment/contradiction rules provide knowledge (e.g. lexical, syntactic, semantic) about the probability of entailment or contradiction between elements of T and H. [sent-107, score-0.49]
62 Rules are invoked by cost schemes to influence the cost of substitutions between elements of T and H. [sent-108, score-0.601]
63 Typically, the cost of the substitution between two elements A and B is inversely proportional to the probability that A entails B. [sent-109, score-0.317]
64 Rules are stored in XML files called Rule Repositories, with the format shown in Figure 3. [sent-110, score-0.068]
65 Each rule consists of three parts: i) a left-hand side, ii) a right-hand side, and iii) a probability that the left-hand side entails (or contradicts) the right-hand side. [sent-111, score-0.027]
66 EDITS provides three predefined sets of lexical entailment rules acquired from lexical resources widely used in RTE: WordNet4, Lin's word similarity dictionaries5, and VerbOcean6. [sent-112, score-0.565]
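A sketch of how a cost scheme could consult entailment rules when pricing substitutions. The rule table, the fixed base costs and the linear discount are assumptions for illustration; in EDITS this interaction is expressed declaratively through XML cost schemes and rule repositories.

    class RuleAwareCostScheme:
        # Discounts substitution costs for word pairs licensed by entailment rules:
        # the higher the entailment probability, the lower the cost.
        def __init__(self, rules, base_substitution_cost=20.0):
            # rules maps (lhs, rhs) word pairs to entailment probabilities,
            # e.g. {("buy", "acquire"): 0.9}, as acquired from WordNet, Lin, VerbOcean.
            self.rules = rules
            self.base = base_substitution_cost

        def insertion(self, word):
            return 10.0   # illustrative fixed cost

        def deletion(self, word):
            return 10.0   # illustrative fixed cost

        def substitution(self, w1, w2):
            if w1.lower() == w2.lower():
                return 0.0
            prob = self.rules.get((w1.lower(), w2.lower()), 0.0)
            return self.base * (1.0 - prob)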
67 Figure 3: Example of XML Rule Repository 4 Using the System This section provides basic information about the use of EDITS, which can be run with commands in a Unix Shell. [sent-123, score-0.025]
68 A complete guide to all the parameters of the main script is available as HTML documentation downloadable with the package. [sent-124, score-0.026]
69 1 Input The input of the system is an entailment corpus represented in the EDITS Text Annotation Format (ETAF), a simple XML internal annotation format. [sent-126, score-0.449]
70 ETAF is used to represent both the input T-H pairs, and the entailment and contradiction rules. [sent-127, score-0.453]
71 Plug-ins for several widely used annotation tools (including TreeTagger, Stanford Parser, and OpenNLP) can be downloaded from the system’s website. [sent-129, score-0.032]
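For illustration only, the sketch below loads T-H pairs from a simplified XML layout; this is not the actual ETAF schema (which is richer and documented with the package), just a stand-in showing the kind of input the system consumes.

    import xml.etree.ElementTree as ET

    # Hypothetical pair layout used only for this sketch.
    EXAMPLE = """
    <corpus>
      <pair id="1" entailment="YES">
        <t>EDITS is distributed as an open source software package.</t>
        <h>EDITS is open source.</h>
      </pair>
    </corpus>
    """

    def load_pairs(xml_string):
        root = ET.fromstring(xml_string)
        for pair in root.iter("pair"):
            yield (pair.get("id"), pair.findtext("t"),
                   pair.findtext("h"), pair.get("entailment"))

    for pid, t, h, label in load_pairs(EXAMPLE):
        print(pid, label, "|", t, "->", h)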
72 2 Configuration The creation of an Entailment Engine is done by defining its basic components (algorithms, cost schemes, optimizer, and rules) through an XML configuration file. [sent-133, score-0.399]
73 The configuration file is divided into modules, each having a set of options. [sent-134, score-0.219]
74 Adding external knowledge to an entailment engine can be done by extending the configuration file with a reference to a rules file (e.g. a Rule Repository file). [sent-136, score-0.892]
75 3 Training and Test Given a configuration file and an RTE corpus annotated in ETAF, the user can run the training procedure to learn a model. [sent-141, score-0.219]
76 Training can be optimized along several dimensions (e.g. overall Accuracy, Precision/Recall trade-off on YES and/or NO entailment judgements). [sent-144, score-0.425]
77 The output of the training phase is a model: a zip file that contains the learned threshold, the configuration file, the cost scheme, and the entailment/contradiction rules used to calculate the threshold. [sent-146, score-0.458]
78 The explicit availability of all this information in the model allows users to share, replicate and modify experiments7. [sent-147, score-0.077]
79 Given a model and an un-annotated RTE corpus as input, the test procedure produces a file containing, for each pair: i) the decision of the system (YES, NO), ii) the confidence of the decision, iii) the entailment score, and iv) the sequence of edit operations made to calculate the entailment score. [sent-148, score-1.263]
80 Multiple Entailment Engines can be combined by grouping their definitions as sub-modules in the configuration file. [sent-151, score-0.107]
81 EDITS allows users to define customized combination strategies, or to use two predefined combination modalities provided with the package. 7Our policy is to publish online the models we use for participation in the RTE Challenges. [sent-152, score-0.22]
82 We encourage other users of EDITS to do the same, thus creating a collaborative environment that allows new users to quickly modify working configurations and replicate results. [sent-153, score-0.215]
83 The two modalities combine the entailment scores produced by multiple independent engines in different ways, and return a final decision for each T-H pair. [sent-155, score-0.456]
84 Linear Combination returns an overall entailment score as the weighted sum of the entailment scores returned by each engine: score = Σ_{i=0}^{n} weight_i · score_i [sent-156, score-0.882]
85 In this formula, weight_i is an ad-hoc weight parameter for each entailment engine. [sent-159, score-0.463]
86 Optimal weight parameters can be determined using the same optimization strategy used to optimize the cost schemes, as described in Section 3.3. [sent-160, score-0.229]
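A direct Python rendering of the linear combination; the example weights are illustrative, and in practice they are set ad hoc or tuned with the same optimizer used for cost schemes.

    def linear_combination(engine_scores, weights):
        # Weighted sum of the entailment scores returned by the individual engines.
        assert len(engine_scores) == len(weights)
        return sum(w * s for w, s in zip(weights, engine_scores))

    # e.g. combining a token-level and a tree-level engine (illustrative values):
    overall_score = linear_combination([0.31, 0.44], [0.7, 0.3])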
87 Classifier Combination is similar to the approach proposed in (Malakasiotis and Androutsopoulos, 2007), and is based on using the entailment scores returned by each engine as features to train a classifier (see Figure 4). [sent-162, score-0.558]
88 By default the plug-in uses an SVM classifier, but other Weka algorithms can be specified as options in the configuration file. [sent-164, score-0.187]
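A sketch of the classifier combination idea, with scikit-learn's SVM standing in for the Weka plug-in (an assumption made only for this illustration).

    from sklearn.svm import SVC

    def train_combiner(engine_score_matrix, labels):
        # Each row holds the entailment scores returned by the individual engines
        # for one training pair; labels are the gold YES/NO judgements.
        clf = SVC(kernel="rbf")
        clf.fit(engine_score_matrix, labels)
        return clf

    # combiner = train_combiner([[0.2, 0.3], [0.8, 0.7]], ["YES", "NO"])
    # combiner.predict([[0.25, 0.35]])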
89 The following configuration file describes a combination of two engines (i.e. [sent-165, score-0.305]
90 nz/ml/weka 9A linear combination can be easily obtained by changing the alias of the highest-level module (“weka”) into “linear”. [sent-173, score-0.028]
91 5 Experiments with EDITS To give an idea of the potential of the EDITS package in terms of flexibility and adaptability, this section reports some results achieved in RTE-related tasks by previous versions of the tool. [sent-175, score-0.073]
92 As regards the RTE Challenges, in recent years EDITS has been used to participate both in the PASCAL/TAC RTE Campaigns for the English language (Mehdad et al. [sent-177, score-0.029]
93 In the “Search” task (which consists in finding all the sentences that entail a given H in a given set of documents about a topic) the same configuration achieved an F1 of 33. [sent-183, score-0.107]
94 To promote the use of EDITS and ease experimentation, the complete models used to produce each submitted run can be downloaded with the system. [sent-188, score-0.08]
95 An improved model obtained with the current release of EDITS, and trained over RTE-5 data (61. [sent-189, score-0.039]
96 As regards application-oriented integrations, EDITS has been successfully used as a core component in a Restricted-Domain Question Answering system within the EU-funded QALL-ME Project10. [sent-191, score-0.053]
97 In recognizing 14 relations relevant to the CINEMA domain in a collection of spoken English requests, the system 10http://qallme. [sent-193, score-0.091]
98 6 Conclusion We have presented the first open source package for recognizing Textual Entailment. [sent-197, score-0.14]
99 The system offers a modular, flexible, and adaptable working environment to experiment with the task. [sent-198, score-0.102]
100 In addition, the availability of pre-defined system configurations, tested in past Evaluation Campaigns, represents a first contribution towards setting up a collaborative environment and promoting advances in RTE research. [sent-199, score-0.102]
wordName wordTfidf (topN-words)
[('edits', 0.431), ('entailment', 0.425), ('rte', 0.397), ('edit', 0.221), ('cost', 0.204), ('schemes', 0.156), ('evalita', 0.13), ('mehdad', 0.13), ('distance', 0.129), ('idf', 0.114), ('file', 0.112), ('etaf', 0.108), ('negri', 0.108), ('configuration', 0.107), ('xml', 0.105), ('kouylekov', 0.104), ('optimizer', 0.104), ('engine', 0.101), ('textual', 0.085), ('algorithms', 0.08), ('matteo', 0.076), ('package', 0.073), ('milen', 0.07), ('recognizing', 0.067), ('costs', 0.066), ('cabrio', 0.065), ('fbk', 0.065), ('yashar', 0.065), ('similarity', 0.058), ('engines', 0.058), ('operations', 0.056), ('users', 0.051), ('prototyping', 0.049), ('promote', 0.048), ('substitution', 0.047), ('environment', 0.047), ('predefined', 0.047), ('manipulate', 0.046), ('deletion', 0.046), ('androutsopoulos', 0.043), ('malakasiotis', 0.043), ('overture', 0.043), ('swarm', 0.043), ('scheme', 0.043), ('yes', 0.042), ('release', 0.039), ('components', 0.038), ('parlance', 0.038), ('weighti', 0.038), ('stored', 0.037), ('tac', 0.037), ('elements', 0.037), ('modular', 0.036), ('rules', 0.035), ('participation', 0.035), ('implements', 0.034), ('operation', 0.034), ('modules', 0.033), ('elena', 0.033), ('shasha', 0.033), ('downloaded', 0.032), ('returned', 0.032), ('files', 0.031), ('working', 0.031), ('glickman', 0.031), ('campaigns', 0.031), ('modalities', 0.031), ('judgement', 0.031), ('workflow', 0.031), ('insertion', 0.03), ('iii', 0.03), ('adapted', 0.03), ('collaborative', 0.03), ('genetic', 0.029), ('particle', 0.029), ('campaign', 0.029), ('inversely', 0.029), ('regards', 0.029), ('configurations', 0.028), ('contradiction', 0.028), ('editing', 0.028), ('hash', 0.028), ('combination', 0.028), ('latest', 0.027), ('side', 0.027), ('ii', 0.027), ('replicate', 0.026), ('documentation', 0.026), ('judgements', 0.026), ('quickly', 0.026), ('bernardo', 0.025), ('magnini', 0.025), ('weka', 0.025), ('basic', 0.025), ('optimize', 0.025), ('experimentation', 0.025), ('solutions', 0.025), ('possibility', 0.025), ('creation', 0.025), ('system', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 30 acl-2010-An Open-Source Package for Recognizing Textual Entailment
Author: Milen Kouylekov ; Matteo Negri
Abstract: This paper presents a general-purpose open source package for recognizing Textual Entailment. The system implements a collection of algorithms, providing a configurable framework to quickly set up a working environment to experiment with the RTE task. Fast prototyping of new solutions is also allowed by the possibility to extend its modular architecture. We present the tool as a useful resource to approach the Textual Entailment problem, as an instrument for didactic purposes, and as an opportunity to create a collaborative environment to promote research in the field.
2 0.48842716 1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."
Author: Mark Sammons ; V.G.Vinod Vydiswaran ; Dan Roth
Abstract: We challenge the NLP community to participate in a large-scale, distributed effort to design and build resources for developing and evaluating solutions to new and existing NLP tasks in the context of Recognizing Textual Entailment. We argue that the single global label with which RTE examples are annotated is insufficient to effectively evaluate RTE system performance; to promote research on smaller, related NLP tasks, we believe more detailed annotation and evaluation are needed, and that this effort will benefit not just RTE researchers, but the NLP community as a whole. We use insights from successful RTE systems to propose a model for identifying and annotating textual infer- ence phenomena in textual entailment examples, and we present the results of a pilot annotation study that show this model is feasible and the results immediately useful.
3 0.29440075 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference
Author: Shachar Mirkin ; Ido Dagan ; Sebastian Pado
Abstract: Discourse references, notably coreference and bridging, play an important role in many text understanding applications, but their impact on textual entailment is yet to be systematically understood. On the basis of an in-depth analysis of entailment instances, we argue that discourse references have the potential of substantially improving textual entailment recognition, and identify a number of research directions towards this goal.
4 0.25010803 127 acl-2010-Global Learning of Focused Entailment Graphs
Author: Jonathan Berant ; Ido Dagan ; Jacob Goldberger
Abstract: We propose a global algorithm for learning entailment relations between predicates. We define a graph structure over predicates that represents entailment relations as directed edges, and use a global transitivity constraint on the graph to learn the optimal set of edges, by formulating the optimization problem as an Integer Linear Program. We motivate this graph with an application that provides a hierarchical summary for a set of propositions that focus on a target concept, and show that our global algorithm improves performance by more than 10% over baseline algorithms.
5 0.20367898 121 acl-2010-Generating Entailment Rules from FrameNet
Author: Roni Ben Aharon ; Idan Szpektor ; Ido Dagan
Abstract: FrameNet is a manually constructed database based on Frame Semantics. Many NLP tasks need accurate knowledge for semantic inference. To this end, mostly WordNet is utilized. Yet WordNet is limited, especially for inference between predicates. To help fill this gap, we present an algorithm that generates inference rules between predicates from FrameNet. Our experiment shows that the novel resource is effective and complements WordNet in terms of rule coverage.
6 0.12407634 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling
7 0.098725095 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
8 0.086680867 56 acl-2010-Bridging SMT and TM with Translation Recommendation
9 0.071454212 18 acl-2010-A Study of Information Retrieval Weighting Schemes for Sentiment Analysis
10 0.06956327 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models
11 0.068523422 4 acl-2010-A Cognitive Cost Model of Annotations Based on Eye-Tracking Data
12 0.05882233 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
13 0.058230735 116 acl-2010-Finding Cognate Groups Using Phylogenies
14 0.055681214 67 acl-2010-Computing Weakest Readings
15 0.054342747 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
16 0.050203837 27 acl-2010-An Active Learning Approach to Finding Related Terms
17 0.045796912 202 acl-2010-Reading between the Lines: Learning to Map High-Level Instructions to Commands
18 0.043489404 57 acl-2010-Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation
19 0.04260521 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach
20 0.041679241 164 acl-2010-Learning Phrase-Based Spelling Error Models from Clickthrough Data
topicId topicWeight
[(0, -0.16), (1, 0.074), (2, -0.003), (3, -0.117), (4, -0.003), (5, 0.091), (6, 0.046), (7, 0.09), (8, -0.287), (9, -0.248), (10, -0.167), (11, 0.376), (12, -0.122), (13, 0.034), (14, -0.168), (15, -0.238), (16, -0.125), (17, -0.065), (18, -0.051), (19, 0.02), (20, -0.012), (21, -0.072), (22, 0.046), (23, 0.086), (24, -0.071), (25, 0.046), (26, 0.035), (27, -0.025), (28, -0.06), (29, -0.004), (30, 0.008), (31, -0.058), (32, 0.01), (33, 0.053), (34, -0.024), (35, -0.006), (36, -0.048), (37, -0.063), (38, 0.047), (39, -0.011), (40, 0.058), (41, 0.068), (42, 0.005), (43, 0.016), (44, -0.009), (45, 0.066), (46, 0.081), (47, 0.034), (48, 0.013), (49, 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.9667992 30 acl-2010-An Open-Source Package for Recognizing Textual Entailment
Author: Milen Kouylekov ; Matteo Negri
Abstract: This paper presents a general-purpose open source package for recognizing Textual Entailment. The system implements a collection of algorithms, providing a configurable framework to quickly set up a working environment to experiment with the RTE task. Fast prototyping of new solutions is also allowed by the possibility to extend its modular architecture. We present the tool as a useful resource to approach the Textual Entailment problem, as an instrument for didactic purposes, and as an opportunity to create a collaborative environment to promote research in the field.
2 0.90498912 1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."
Author: Mark Sammons ; V.G.Vinod Vydiswaran ; Dan Roth
Abstract: We challenge the NLP community to participate in a large-scale, distributed effort to design and build resources for developing and evaluating solutions to new and existing NLP tasks in the context of Recognizing Textual Entailment. We argue that the single global label with which RTE examples are annotated is insufficient to effectively evaluate RTE system performance; to promote research on smaller, related NLP tasks, we believe more detailed annotation and evaluation are needed, and that this effort will benefit not just RTE researchers, but the NLP community as a whole. We use insights from successful RTE systems to propose a model for identifying and annotating textual infer- ence phenomena in textual entailment examples, and we present the results of a pilot annotation study that show this model is feasible and the results immediately useful.
3 0.73795396 127 acl-2010-Global Learning of Focused Entailment Graphs
Author: Jonathan Berant ; Ido Dagan ; Jacob Goldberger
Abstract: We propose a global algorithm for learning entailment relations between predicates. We define a graph structure over predicates that represents entailment relations as directed edges, and use a global transitivity constraint on the graph to learn the optimal set of edges, by formulating the optimization problem as an Integer Linear Program. We motivate this graph with an application that provides a hierarchical summary for a set of propositions that focus on a target concept, and show that our global algorithm improves performance by more than 10% over baseline algorithms.
4 0.68324405 121 acl-2010-Generating Entailment Rules from FrameNet
Author: Roni Ben Aharon ; Idan Szpektor ; Ido Dagan
Abstract: FrameNet is a manually constructed database based on Frame Semantics. Many NLP tasks need accurate knowledge for semantic inference. To this end, mostly WordNet is utilized. Yet WordNet is limited, especially for inference between predicates. To help fill this gap, we present an algorithm that generates inference rules between predicates from FrameNet. Our experiment shows that the novel resource is effective and complements WordNet in terms of rule coverage.
5 0.61776739 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference
Author: Shachar Mirkin ; Ido Dagan ; Sebastian Pado
Abstract: Discourse references, notably coreference and bridging, play an important role in many text understanding applications, but their impact on textual entailment is yet to be systematically understood. On the basis of an in-depth analysis of entailment instances, we argue that discourse references have the potential of substantially improving textual entailment recognition, and identify a number of research directions towards this goal.
6 0.32725421 67 acl-2010-Computing Weakest Readings
7 0.27693799 92 acl-2010-Don't 'Have a Clue'? Unsupervised Co-Learning of Downward-Entailing Operators.
8 0.23926345 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
9 0.23397234 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling
10 0.21649072 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study
11 0.21592994 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People
12 0.20542066 222 acl-2010-SystemT: An Algebraic Approach to Declarative Information Extraction
13 0.20295782 7 acl-2010-A Generalized-Zero-Preserving Method for Compact Encoding of Concept Lattices
14 0.20153777 117 acl-2010-Fine-Grained Genre Classification Using Structural Learning Algorithms
15 0.1932012 116 acl-2010-Finding Cognate Groups Using Phylogenies
16 0.19275019 126 acl-2010-GernEdiT - The GermaNet Editing Tool
17 0.18565263 56 acl-2010-Bridging SMT and TM with Translation Recommendation
18 0.18536872 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages
19 0.18375711 57 acl-2010-Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation
20 0.18362384 259 acl-2010-WebLicht: Web-Based LRT Services for German
topicId topicWeight
[(2, 0.342), (14, 0.016), (25, 0.055), (39, 0.017), (42, 0.03), (44, 0.013), (59, 0.081), (73, 0.042), (78, 0.05), (80, 0.019), (83, 0.099), (84, 0.033), (97, 0.011), (98, 0.106)]
simIndex simValue paperId paperTitle
same-paper 1 0.77445769 30 acl-2010-An Open-Source Package for Recognizing Textual Entailment
Author: Milen Kouylekov ; Matteo Negri
Abstract: This paper presents a general-purpose open source package for recognizing Textual Entailment. The system implements a collection of algorithms, providing a configurable framework to quickly set up a working environment to experiment with the RTE task. Fast prototyping of new solutions is also allowed by the possibility to extend its modular architecture. We present the tool as a useful resource to approach the Textual Entailment problem, as an instrument for didactic purposes, and as an opportunity to create a collaborative environment to promote research in the field.
2 0.64002568 31 acl-2010-Annotation
Author: Eduard Hovy
Abstract: unkown-abstract
3 0.60907489 114 acl-2010-Faster Parsing by Supertagger Adaptation
Author: Jonathan K. Kummerfeld ; Jessika Roesner ; Tim Dawborn ; James Haggerty ; James R. Curran ; Stephen Clark
Abstract: We propose a novel self-training method for a parser which uses a lexicalised grammar and supertagger, focusing on increasing the speed of the parser rather than its accuracy. The idea is to train the supertagger on large amounts of parser output, so that the supertagger can learn to supply the supertags that the parser will eventually choose as part of the highestscoring derivation. Since the supertagger supplies fewer supertags overall, the parsing speed is increased. We demonstrate the effectiveness of the method using a CCG supertagger and parser, obtain- ing significant speed increases on newspaper text with no loss in accuracy. We also show that the method can be used to adapt the CCG parser to new domains, obtaining accuracy and speed improvements for Wikipedia and biomedical text.
4 0.479514 158 acl-2010-Latent Variable Models of Selectional Preference
Author: Diarmuid O Seaghdha
Abstract: This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data.
5 0.47899878 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
Author: Junhui Li ; Guodong Zhou ; Hwee Tou Ng
Abstract: This paper explores joint syntactic and semantic parsing of Chinese to further improve the performance of both syntactic and semantic parsing, in particular the performance of semantic parsing (in this paper, semantic role labeling). This is done from two levels. Firstly, an integrated parsing approach is proposed to integrate semantic parsing into the syntactic parsing process. Secondly, semantic information generated by semantic parsing is incorporated into the syntactic parsing model to better capture semantic information in syntactic parsing. Evaluation on Chinese TreeBank, Chinese PropBank, and Chinese NomBank shows that our integrated parsing approach outperforms the pipeline parsing approach on n-best parse trees, a natural extension of the widely used pipeline parsing approach on the top-best parse tree. Moreover, it shows that incorporating semantic role-related information into the syntactic parsing model significantly improves the performance of both syntactic parsing and semantic parsing. To our best knowledge, this is the first research on exploring syntactic parsing and semantic role labeling for both verbal and nominal predicates in an integrated way. 1
6 0.47874302 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
7 0.47573537 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
8 0.47546238 71 acl-2010-Convolution Kernel over Packed Parse Forest
9 0.47362065 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
10 0.47212008 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
11 0.47088593 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
12 0.46938321 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
13 0.46743104 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning
14 0.46741629 248 acl-2010-Unsupervised Ontology Induction from Text
15 0.46714512 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
16 0.46705586 65 acl-2010-Complexity Metrics in an Incremental Right-Corner Parser
17 0.46704817 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
18 0.4670071 130 acl-2010-Hard Constraints for Grammatical Function Labelling
19 0.46665502 39 acl-2010-Automatic Generation of Story Highlights
20 0.46663877 195 acl-2010-Phylogenetic Grammar Induction