acl acl2010 acl2010-258 knowledge-graph by maker-knowledge-mining

258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs


Source: pdf

Author: Galina Tremper

Abstract: Presupposition relations between verbs are not very well covered in existing lexical semantic resources. We propose a weakly supervised algorithm for learning presupposition relations between verbs that distinguishes five semantic relations: presupposition, entailment, temporal inclusion, antonymy and other/no relation. We start with a number of seed verb pairs selected manually for each semantic relation and classify unseen verb pairs. Our algorithm achieves an overall accuracy of 36% for type-based classification.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Presupposition relations between verbs are not very well covered in existing lexical semantic resources. [sent-3, score-0.478]

2 We propose a weakly supervised algorithm for learning presupposition relations between verbs that distinguishes five semantic relations: presupposition, entailment, temporal inclusion, antonymy and other/no relation. [sent-4, score-1.338]

3 We start with a number of seed verb pairs selected manually for each semantic relation and classify unseen verb pairs. [sent-5, score-1.196]

4 This presupposition does not need to be stated, but is implicitly understood. [sent-11, score-0.473]

5 The phenomenon of presupposition has been thoroughly investigated by philosophers and linguists (i. [sent-13, score-0.473]

6 There are only a few attempts at practical implementations of presupposition in computational linguistics (e. [sent-16, score-0.473]

7 Presupposition is especially understudied in the field of corpus-based learning of semantic relations. [sent-19, score-0.598]

8 Machine learning methods have been previously applied to determine semantic relations such as is-a and part-of, also succession, reaction and production (Pantel and Pennacchiotti, 2006). [sent-20, score-0.313]

9 Chklovski and Pantel (2004) explored classification of fine-grained verb semantic relations, such as similarity, strength, antonymy, enablement and happens-before. [sent-21, score-0.659]

10 For the task of entailment recognition, learning of entailment relations was attempted (Pekar, 2008). [sent-22, score-0.515]

11 None of the previous work investigated subclassifying semantic relations including presupposition and entailment, two relations that are closely related, but behave differently in context. [sent-23, score-0.908]

12 In particular, the inferential behaviour of presuppositions and entailments crucially differs in special semantic contexts. [sent-24, score-0.279]

13 While presuppositions are preserved under negation (as in Columbus managed/didn't manage to reach India, which presupposes that Columbus tried to reach India), entailments do not survive under negation (John F. [sent-27, score-0.797]

14 This paper presents a weakly supervised algorithm for learning presupposition relations between verbs cast as a discriminative classification problem. [sent-31, score-1.093]

15 Among the semantic relations defined specifically for verbs are entailment, hyponymy, troponymy, antonymy and cause. [sent-38, score-0.618]

16 However, not all of them are well covered; for example, there are only a few entries for presupposition and entailment in WordNet. [sent-39, score-0.653]

17 One attempt to acquire fine-grained semantic relations from corpora is VerbOcean (Chklovski and Pantel, 2004). [sent-42, score-0.28]

18 Chklovski and Pantel used a semi-automatic approach for extracting semantic relations between verbs using a list of patterns. [sent-43, score-0.478]

19 The selection of the semantic relations was inspired by WordNet. [sent-44, score-0.28]

20 VerbOcean showed good accuracy values for the antonymy (50%), similarity (63%) and strength (75%) relations. [sent-45, score-0.205]

21 However, VerbOcean doesn’t distinguish between entailment and presupposition; they are conflated in the classes enablement and happens-before. [sent-46, score-0.242]

22 However, the method is able to recognize the existence of semantic relations holding between verbs and hence can be used as a basis for finding and further discriminating more detailed semantic relations. [sent-49, score-0.643]

23 3 A Weakly Supervised Approach to Learning Presupposition Relations We describe a weakly supervised approach for learning semantic relations between verbs includ- ing implicit relations such as presupposition. [sent-50, score-0.782]

24 Our aim is to perform a type-based classification of verb pairs. [sent-51, score-0.472]

25 We determine the class of a verb-pair relation by observing co-occurrences of these verbs in contexts that are indicative of their intrinsic meaning relation. [sent-54, score-0.355]

26 This task differs from a token-based classification, which aims at classifying each verb pair instance as it occurs in context. [sent-55, score-0.461]

27 We distinguish between the five classes of semantic relations presented in Table 1. [sent-57, score-0.28]

28 We chose entailment, temporal inclusion and antonymy, because these relations may be confounded with the presupposition relation. [sent-58, score-0.803]

29 A special class other/no comprises semantic relations not discussed in this paper (e. [sent-59, score-0.309]

30 synonymy) and verb pairs that are not related by a semantic relation. [sent-61, score-0.572]

31 The relations can be subdivided into symmetric and asymmetric relations, and relations that involve temporal sequence, or those that do not involve a temporal order, as displayed in Table 1. [sent-62, score-0.533]

32 Our algorithm starts with a small number of seed verb pairs selected manually for each relation and iteratively classifies a large set of unseen and un- RSeemlaatniotincExampleSymmetryTSeemqupeonrcael labeled verb pairs. [sent-64, score-1.082]

33 Training the Classifiers We independently train binary classifiers for each semantic relation using both shallow and deep features. [sent-66, score-0.382]

34 The predictions of the classifiers are combined using ensemble learning techniques to determine the most confident classification. [sent-69, score-0.216]

35 The obtained list of the classified instances is ranked using pattern scores, in order to select the most reliable candidates for extension of the training set. [sent-70, score-0.287]

36 Both shallow lexical-syntactic and deep syntactic features are used for the classification of semantic relations. [sent-72, score-0.333]

37 the distance between two analyzed verbs and the order of their appearance 2. [sent-74, score-0.248]

38 verb form (tense, aspect, modality, voice), presence of negation and polarity verbs1 3. [sent-75, score-0.456]

39 co-reference relation holding between the subjects and objects of the verbs (both verbs have the same subject/object, the subject of one verb corresponds to the object of the second, or there is no relation between them). [sent-80, score-0.98]

40 1Polarity verbs are taken from the polarity lexicon of Nairn et al. [sent-82, score-0.254]

41 Candidate verb pairs are obtained from a previously compiled list of highly associated verbs. [sent-88, score-0.491]

42 We use the DIRT Collection (Lin and Pantel, 2001) from which we further extract pairs of highly associated verbs as candidates for classification. [sent-89, score-0.346]

43 The advantage of this resource is that it consists of pairs of verbs which stand in a semantic relation (cf. [sent-90, score-0.251]

44 This considerably reduces the number of verb pairs that need to be processed as candidates in our classification task. [sent-92, score-0.62]

45 DIRT contains 5,604 verb types and 808,764 verb pair types. [sent-93, score-0.788]

46 This still represents a huge number of verb pairs to be processed. [sent-94, score-0.447]

47 We therefore filtered the extracted set by checking verb pair frequency in the first three parts of the ukWAC corpus (Baroni et al. [sent-95, score-0.434]

48 This reduces the number of verb pairs to 199,393. [sent-101, score-0.447]

49 For each semantic relation we select three verb pairs as seeds. [sent-102, score-0.667]

50 The only exception is temporal inclusion for which we selected six verb pairs, due to the low frequency of such verb pairs within a single sentence. [sent-103, score-1.007]

51 These verb pairs were used for building an initial training corpus of verb pairs in context. [sent-104, score-0.93]

52 The remaining verb pairs are used to build the corpus of unlabeled verb pairs in context in the iterative classification process. [sent-105, score-1.066]

53 Given these verb pairs, we extracted sentences for the training set and for the unlabeled data set from the first three parts of the UKWAC corpus (Baroni et al. [sent-107, score-0.474]

54 We compiled a set of CQP queries (Evert, 2005) to find sentences that contain both verbs of a verb pair and applied them on UKWAC 1 . [sent-109, score-0.706]

55 We filter out sentences with more than 60 words and sentences with a distance between verbs exceeding 20 words. [sent-113, score-0.258]

56 To avoid growing complexity, only sentences with exactly one occurrence of each verb pair are retained. [sent-114, score-0.464]

57 We also remove sentences that trigger wrong candidates, in which the auxiliaries have or do appear in a candidate verb pair. [sent-115, score-0.423]
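The filtering steps described above can be sketched as follows; the flat token-level representation is a simplifying assumption, and the auxiliary have/do check from the last step is omitted:

```python
def keep_sentence(tokens, v1, v2, max_len=60, max_dist=20):
    """Corpus filtering sketch: drop over-long sentences, sentences where
    the two verbs are too far apart, and repeated verb occurrences."""
    if len(tokens) > max_len:
        return False
    pos1 = [i for i, t in enumerate(tokens) if t == v1]
    pos2 = [i for i, t in enumerate(tokens) if t == v2]
    # keep only sentences with exactly one occurrence of each verb
    if len(pos1) != 1 or len(pos2) != 1:
        return False
    # the two verbs must not be more than max_dist words apart
    return abs(pos1[0] - pos2[0]) <= max_dist
```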

58 All sentences containing seed verb pairs extracted from UKWAC 1 are annotated manually with two values true/false in order to separate the negative training data. [sent-122, score-0.669]

59 We build an extended, heuristically annotated training set for the seed verb pairs, by extracting further instances from the remaining corpora (UKWAC 2 and UKWAC 3). [sent-125, score-0.558]

60 We manually compiled a small stoplist of patterns that are used to filter out wrong instances. [sent-127, score-0.28]

61 For example, the verbs look and see can stand in an entailment relation if look is followed by the prepositions at, on, in, but not in case of prepositions after or forward (e. [sent-129, score-0.56]

62 To further enrich the training data, synonyms of the verb pairs are manually selected from WordNet. [sent-134, score-0.571]

63 The corresponding verb pairs were extracted from UKWAC 1. [sent-135, score-0.447]

64 The overall size of the training set for the first classification step is 15,717 sentences, of which 5,032 are manually labeled, 9,918 are automatically labeled and 757 contain synonymous verb pairs. [sent-141, score-0.735]

65 We balanced the training set by undersampling entailment and other/no by 20% and correspondingly oversampling the temporal inclusion class. [sent-145, score-0.391]
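One possible reading of this balancing step is sketched below; the exact sampling scheme is an assumption, and only the 20% figures are taken from the text:

```python
import random

def rebalance(data, rng=None):
    """data: dict mapping relation -> list of training sentences.
    Undersample entailment and other/no by 20% and oversample temporal
    inclusion by duplicating 20% of its examples (sketch)."""
    rng = rng or random.Random(0)
    out = {}
    for relation, sentences in data.items():
        if relation in ("entailment", "other/no"):
            out[relation] = rng.sample(sentences, int(len(sentences) * 0.8))
        elif relation == "temporal inclusion":
            extra = rng.sample(sentences, int(len(sentences) * 0.2))
            out[relation] = sentences + extra
        else:
            out[relation] = list(sentences)
    return out
```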

66 Similar to other pattern-based approaches we use a set of seed verb pairs to induce indicative patterns for each semantic relation. [sent-147, score-0.735]

67 We use the induced patterns to restrict the number of the verb pair candidates and to rank the labelled instances in the iterative classification step. [sent-148, score-0.754]

68 The patterns use information about the verb forms of the analyzed verb pairs, modal verbs and polarity verbs (only if they are related to the analyzed verbs) and coordinating/subordinating conjunctions connecting the two verbs. [sent-149, score-1.371]

69 The analyzed verbs in the sentence are substituted with V1 and V2 placeholders in the pattern. [sent-150, score-0.248]

70 For the verb pair (find, seek) we induce the following pattern: V2 and do [not|n't] V1. [sent-152, score-0.434]
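The placeholder substitution can be sketched as below; tokenization and the handling of the kept function words are simplified, and the function name is hypothetical:

```python
def substitute_placeholders(tokens, v1, v2):
    """Replace the two analyzed verbs with V1/V2 placeholders; the actual
    patterns additionally keep modal verbs, polarity verbs and conjunctions."""
    return " ".join("V1" if t == v1 else "V2" if t == v2 else t for t in tokens)
```

For the fragment "seek and do not find" with the pair (find, seek), this yields the pattern "V2 and do not V1" from the text.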

71 Examples of the best patterns we determined for semantic relations are presented in Table 2. [sent-154, score-0.358]

72 The pattern reliability is calculated as follows:

r_π(p) = (1/|I|) Σ_{i∈I} [ pmi(i, p) / max_pmi ] · r_ι(i)    (1)

where pmi(i, p) is the pointwise mutual information (PMI) between the instance i and the pattern p, max_pmi is the maximum PMI between all patterns and all instances, and r_ι(i) is the reliability of an instance i. [sent-156, score-0.504]

73 For seeds, r_ι(i) = 1 (they are selected manually); for the next iterations the instance reliability is:

r_ι(i) = (1/|P|) Σ_{p∈P} [ pmi(i, p) / max_pmi ] · r_π(p)    (2)

We also consider using the patterns as a feature for classification, in case they turn out to be sufficiently discriminative. [sent-157, score-0.308]
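Equations (1) and (2) can be computed as sketched below; pmi is assumed to be precomputed for every instance–pattern pair, and all names are hypothetical:

```python
def pattern_reliability(p, instances, pmi, max_pmi, r_inst):
    """Eq. (1): average of pmi(i, p)/max_pmi weighted by instance reliability."""
    return sum(pmi[i, p] / max_pmi * r_inst[i] for i in instances) / len(instances)

def instance_reliability(i, patterns, pmi, max_pmi, r_pat):
    """Eq. (2): average of pmi(i, p)/max_pmi weighted by pattern reliability."""
    return sum(pmi[i, p] / max_pmi * r_pat[p] for p in patterns) / len(patterns)
```

Seed instances start with reliability 1, and the two scores are then recomputed in alternation on each bootstrapping iteration.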

74 As the primary goal of this paper is to classify semantic relations on the type level, we created a first gold-standard dataset for type-based classification. [sent-161, score-0.497]

75 We used a small sample of 100 verb pairs randomly selected from the automatically labeled corpus. [sent-162, score-0.52]

76 2 While the first gold standard dataset of verb pairs was annotated out of context, we constructed a second gold standard of verb pairs annotated at the token level, i. [sent-170, score-1.114]

77 3 We asked one judge to annotate the same 100 verb pair types as in the previous annotation task, this time in context. [sent-178, score-0.493]

78 For this purpose we randomly selected 10 instances for each verb pair type (for rare verb pair types only 5). [sent-179, score-0.968]

79 Only 10% of verb pair types were established as conflicting with the ground truth. [sent-181, score-0.509]

80 Both token-based and type-based classification start by determining the most confident classification for each instance. [sent-188, score-0.326]

81 Each instance of the corpus of unlabeled verb pairs is classified by the individual binary classifiers. [sent-189, score-0.618]

82 In order to select the most confident classification we compare the votes of the individual classifiers as follows: 1. [sent-190, score-0.252]

83 If an instance is classified as true by more than one classifier, we consider only the classification with the highest confidence. [sent-194, score-0.235]
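The vote combination can be sketched as follows; the fall-back to other/no when no classifier votes positive is an assumption:

```python
def combine_votes(votes):
    """votes: dict mapping relation -> (is_positive, confidence), one entry
    per binary classifier. Keep the most confident positive classification."""
    positives = {rel: conf for rel, (pos, conf) in votes.items() if pos}
    if not positives:
        return "other/no"  # assumed fall-back when no classifier fires
    return max(positives, key=positives.get)
```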

84 4 In contrast to token-based classification, which accepts only one semantic relation, for type-based classification we allow more than one semantic relation for a verb pair. [sent-195, score-0.935]

85 If fewer than 10% of the instances for a verb pair are classified with some specific semantic relation, this classification is considered unconfident and is discarded. [sent-197, score-0.836]

86 If a verb pair is classified as positive for more than three semantic relations, this verb pair remains unclassified. [sent-199, score-1.083]

87 If a verb pair is classified with up to three semantic relations and if more than 10% of the examples are classified with any of these relations, the verb pair is labeled with all of them. [sent-201, score-1.37]
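Taken together, the type-level decision rules above can be sketched as:

```python
from collections import Counter

def type_level_labels(instance_predictions, min_share=0.10, max_relations=3):
    """Aggregate per-instance relation predictions for one verb pair into a
    type-level label set, using the thresholds from the text (sketch)."""
    counts = Counter(instance_predictions)
    n = len(instance_predictions)
    # rule 1: discard relations supported by fewer than 10% of the instances
    confident = sorted(r for r, c in counts.items() if c / n >= min_share)
    # rule 2: a pair positive for more than three relations stays unclassified
    if len(confident) > max_relations:
        return None
    # rule 3: otherwise the pair is labeled with all remaining relations
    return confident
```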

88 4We assume that within a given context a verb pair can exhibit only one relation. [sent-210, score-0.434]

89 Majority - the semantic relation with which the majority of the sentences containing a verb pair have been annotated. [sent-213, score-0.721]

90 The same measure, but after removing the label NONE from all relation assignments except for those cases where NONE is the only label assigned to a verb pair. [sent-216, score-0.449]

91 5 We computed accuracy as the number of verb pairs which were correctly labeled by the system divided by the total number of system labels. [sent-217, score-0.527]

92 We compare our results against a baseline of random assignment, taking the distribution found in the manually labeled gold standard as the underlying verb relation distribution. [sent-218, score-0.601]

93 We also evaluate the accuracy of token-based classification as the number of instances which were correctly labeled by the system divided by the total number of system labels. [sent-221, score-0.385]

94 The best performance is achieved by antonymy (72% and 42% respectively for both measures). The second measure was used because in many cases the relation NONE has been determined to be the majority class. [sent-226, score-0.272]

95 Antonymy is followed by temporal inclusion, presupposition and entailment. [sent-228, score-0.571]

96 Given the difficulty of the classification, we suspect that correcting system output relations to establish a gold standard bears a strong risk of favouring system classifications. [sent-237, score-0.356]

97 6 Conclusion and Future Work The results achieved in our experiment show that weakly supervised methods can be applied to learning presupposition relations between verbs. [sent-238, score-0.777]

98 It would be interesting to test our algorithm with different amounts of manually annotated training sets and different combinations of manually and automatically annotated training sets, to determine the minimal amount of data needed to ensure good accuracy. [sent-241, score-0.305]

99 Also, we are going to analyze the influence of individual features on the classification and to determine optimal feature sets, as well as the question of including patterns in the feature set. [sent-243, score-0.224]

100 : Verbocean: Mining the web for fine-grained semantic verb relations. [sent-257, score-0.479]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('presupposition', 0.473), ('verb', 0.354), ('ukwac', 0.217), ('verbs', 0.198), ('pantel', 0.187), ('entailment', 0.18), ('verbocean', 0.174), ('relations', 0.155), ('antonymy', 0.14), ('reliability', 0.126), ('semantic', 0.125), ('chklovski', 0.121), ('classification', 0.118), ('weakly', 0.107), ('presuppositions', 0.1), ('temporal', 0.098), ('relation', 0.095), ('typebased', 0.093), ('pairs', 0.093), ('classified', 0.09), ('pair', 0.08), ('patterns', 0.078), ('inclusion', 0.077), ('none', 0.075), ('pennacchiotti', 0.073), ('classifiers', 0.072), ('instances', 0.069), ('pmi', 0.067), ('judges', 0.067), ('dirt', 0.063), ('confident', 0.062), ('enablement', 0.062), ('sandt', 0.062), ('stalnaker', 0.062), ('stoplist', 0.062), ('tremper', 0.062), ('baroni', 0.061), ('deep', 0.06), ('manually', 0.057), ('polarity', 0.056), ('seed', 0.056), ('candidates', 0.055), ('pekar', 0.054), ('nairn', 0.054), ('cqp', 0.054), ('entailments', 0.054), ('unlabeled', 0.054), ('gold', 0.053), ('analyzed', 0.05), ('ensemble', 0.049), ('evert', 0.047), ('xle', 0.047), ('ri', 0.046), ('negation', 0.046), ('crouch', 0.044), ('compiled', 0.044), ('annotated', 0.043), ('labeled', 0.042), ('supervised', 0.042), ('ground', 0.041), ('holding', 0.04), ('elaborated', 0.04), ('frank', 0.039), ('manage', 0.039), ('didn', 0.039), ('reach', 0.039), ('wrong', 0.039), ('accuracy', 0.038), ('annotations', 0.038), ('synonymous', 0.038), ('pattern', 0.037), ('majority', 0.037), ('truth', 0.036), ('training', 0.036), ('witten', 0.035), ('conflicting', 0.034), ('bos', 0.034), ('determine', 0.033), ('conjunctions', 0.033), ('selected', 0.031), ('columbus', 0.031), ('classify', 0.031), ('judge', 0.031), ('stand', 0.031), ('sentences', 0.03), ('shallow', 0.03), ('difficulty', 0.03), ('comprises', 0.029), ('indicative', 0.029), ('prepositions', 0.028), ('token', 0.028), ('annotation', 0.028), ('determining', 0.028), ('lin', 0.027), ('instance', 0.027), ('strength', 0.027), ('galina', 0.027), ('subdivided', 
0.027), ('condoravdi', 0.027), ('tnhse', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999928 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs

Author: Galina Tremper

Abstract: Presupposition relations between verbs are not very well covered in existing lexical semantic resources. We propose a weakly supervised algorithm for learning presupposition relations between verbs that distinguishes five semantic relations: presupposition, entailment, temporal inclusion, antonymy and other/no relation. We start with a number of seed verb pairs selected manually for each semantic relation and classify unseen verb pairs. Our algorithm achieves an overall accuracy of 36% for type-based classification.

2 0.16441378 85 acl-2010-Detecting Experiences from Weblogs

Author: Keun Chan Park ; Yoonjae Jeong ; Sung Hyon Myaeng

Abstract: Weblogs are a source of human activity knowledge comprising valuable information such as facts, opinions and personal experiences. In this paper, we propose a method for mining personal experiences from a large set of weblogs. We define experience as knowledge embedded in a collection of activities or events which an individual or group has actually undergone. Based on an observation that experience-revealing sentences have a certain linguistic style, we formulate the problem of detecting experience as a classification task using various features including tense, mood, aspect, modality, experiencer, and verb classes. We also present an activity verb lexicon construction method based on theories of lexical semantics. Our results demonstrate that the activity verb lexicon plays a pivotal role among selected features in the classification perfor- , mance and shows that our proposed method outperforms the baseline significantly.

3 0.14188436 127 acl-2010-Global Learning of Focused Entailment Graphs

Author: Jonathan Berant ; Ido Dagan ; Jacob Goldberger

Abstract: We propose a global algorithm for learning entailment relations between predicates. We define a graph structure over predicates that represents entailment relations as directed edges, and use a global transitivity constraint on the graph to learn the optimal set of edges, by formulating the optimization problem as an Integer Linear Program. We motivate this graph with an application that provides a hierarchical summary for a set of propositions that focus on a target concept, and show that our global algorithm improves performance by more than 10% over baseline algorithms.

4 0.12829426 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference

Author: Shachar Mirkin ; Ido Dagan ; Sebastian Pado

Abstract: Discourse references, notably coreference and bridging, play an important role in many text understanding applications, but their impact on textual entailment is yet to be systematically understood. On the basis of an in-depth analysis of entailment instances, we argue that discourse references have the potential of substantially improving textual entailment recognition, and identify a number of research directions towards this goal.

5 0.12699732 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models

Author: Stefan Thater ; Hagen Furstenau ; Manfred Pinkal

Abstract: We present a syntactically enriched vector model that supports the computation of contextualized semantic representations in a quasi compositional fashion. It employs a systematic combination of first- and second-order context vectors. We apply our model to two different tasks and show that (i) it substantially outperforms previous work on a paraphrase ranking task, and (ii) achieves promising results on a wordsense similarity task; to our knowledge, it is the first time that an unsupervised method has been applied to this task.

6 0.12460946 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns

7 0.12070128 121 acl-2010-Generating Entailment Rules from FrameNet

8 0.11723489 181 acl-2010-On Learning Subtypes of the Part-Whole Relation: Do Not Mix Your Seeds

9 0.11061214 1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."

10 0.10867915 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing

11 0.10708724 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction

12 0.098725095 30 acl-2010-An Open-Source Package for Recognizing Textual Entailment

13 0.093369879 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification

14 0.086819366 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences

15 0.084882818 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information

16 0.083689034 27 acl-2010-An Active Learning Approach to Finding Related Terms

17 0.082917817 225 acl-2010-Temporal Information Processing of a New Language: Fast Porting with Minimal Resources

18 0.082068257 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion

19 0.081501551 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs

20 0.078380428 25 acl-2010-Adapting Self-Training for Semantic Role Labeling


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.222), (1, 0.134), (2, 0.015), (3, -0.002), (4, 0.071), (5, 0.051), (6, 0.043), (7, 0.07), (8, -0.088), (9, -0.09), (10, -0.081), (11, 0.191), (12, -0.088), (13, -0.041), (14, 0.071), (15, 0.058), (16, 0.08), (17, 0.023), (18, 0.078), (19, -0.002), (20, 0.043), (21, 0.076), (22, -0.014), (23, 0.096), (24, 0.008), (25, -0.087), (26, -0.014), (27, 0.073), (28, -0.029), (29, 0.012), (30, -0.007), (31, 0.021), (32, 0.072), (33, -0.022), (34, 0.07), (35, -0.02), (36, -0.05), (37, 0.085), (38, 0.054), (39, -0.003), (40, -0.029), (41, -0.061), (42, 0.043), (43, -0.099), (44, -0.01), (45, -0.052), (46, -0.107), (47, 0.026), (48, -0.026), (49, -0.149)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96235371 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs

Author: Galina Tremper

Abstract: Presupposition relations between verbs are not very well covered in existing lexical semantic resources. We propose a weakly supervised algorithm for learning presupposition relations between verbs that distinguishes five semantic relations: presupposition, entailment, temporal inclusion, antonymy and other/no relation. We start with a number of seed verb pairs selected manually for each semantic relation and classify unseen verb pairs. Our algorithm achieves an overall accuracy of 36% for type-based classification.

2 0.73026043 85 acl-2010-Detecting Experiences from Weblogs

Author: Keun Chan Park ; Yoonjae Jeong ; Sung Hyon Myaeng

Abstract: Weblogs are a source of human activity knowledge comprising valuable information such as facts, opinions and personal experiences. In this paper, we propose a method for mining personal experiences from a large set of weblogs. We define experience as knowledge embedded in a collection of activities or events which an individual or group has actually undergone. Based on an observation that experience-revealing sentences have a certain linguistic style, we formulate the problem of detecting experience as a classification task using various features including tense, mood, aspect, modality, experiencer, and verb classes. We also present an activity verb lexicon construction method based on theories of lexical semantics. Our results demonstrate that the activity verb lexicon plays a pivotal role among selected features in the classification perfor- , mance and shows that our proposed method outperforms the baseline significantly.

3 0.66018224 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing

Author: Ruihong Huang ; Ellen Riloff

Abstract: This research explores the idea of inducing domain-specific semantic class taggers using only a domain-specific text collection and seed words. The learning process begins by inducing a classifier that only has access to contextual features, forcing it to generalize beyond the seeds. The contextual classifier then labels new instances, to expand and diversify the training set. Next, a cross-category bootstrapping process simultaneously trains a suite of classifiers for multiple semantic classes. The positive instances for one class are used as negative instances for the others in an iterative bootstrapping cycle. We also explore a one-semantic-class-per-discourse heuristic, and use the classifiers to dynam- ically create semantic features. We evaluate our approach by inducing six semantic taggers from a collection of veterinary medicine message board posts.

4 0.65488338 121 acl-2010-Generating Entailment Rules from FrameNet

Author: Roni Ben Aharon ; Idan Szpektor ; Ido Dagan

Abstract: Idan Szpektor Ido Dagan Yahoo! Research Department of Computer Science Haifa, Israel Bar-Ilan University idan @ yahoo- inc .com Ramat Gan, Israel dagan @ c s .biu . ac . i l FrameNet is a manually constructed database based on Frame Semantics. It models the semantic Many NLP tasks need accurate knowledge for semantic inference. To this end, mostly WordNet is utilized. Yet WordNet is limited, especially for inference be- tween predicates. To help filling this gap, we present an algorithm that generates inference rules between predicates from FrameNet. Our experiment shows that the novel resource is effective and complements WordNet in terms of rule coverage.

5 0.63465476 108 acl-2010-Expanding Verb Coverage in Cyc with VerbNet

Author: Clifton McFate

Abstract: A robust dictionary of semantic frames is an essential element of natural language understanding systems that use ontologies. However, creating lexical resources that accurately capture semantic representations en masse is a persistent problem. Where the sheer amount of content makes hand creation inefficient, computerized approaches often suffer from over generality and difficulty with sense disambiguation. This paper describes a semi-automatic method to create verb semantic frames in the Cyc ontology by converting the information contained in VerbNet into a Cyc usable format. This method captures the differences in meaning between types of verbs, and uses existing connections between WordNet, VerbNet, and Cyc to specify distinctions between individual verbs when available. This method provides 27,909 frames to OpenCyc which currently has none and can be used to extend ResearchCyc as well. We show that these frames lead to a 20% increase in sample sentences parsed over the Research Cyc verb lexicon. 1

6 0.61932462 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns

7 0.59868771 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs

8 0.58139825 126 acl-2010-GernEdiT - The GermaNet Editing Tool

9 0.57201475 127 acl-2010-Global Learning of Focused Entailment Graphs

10 0.57067543 141 acl-2010-Identifying Text Polarity Using Random Walks

11 0.54963911 181 acl-2010-On Learning Subtypes of the Part-Whole Relation: Do Not Mix Your Seeds

12 0.53732145 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation

13 0.52493203 139 acl-2010-Identifying Generic Noun Phrases

14 0.51534164 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data

15 0.51443225 92 acl-2010-Don't 'Have a Clue'? Unsupervised Co-Learning of Downward-Entailing Operators.

16 0.50940204 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction

17 0.49809703 43 acl-2010-Automatically Generating Term Frequency Induced Taxonomies

18 0.48384535 25 acl-2010-Adapting Self-Training for Semantic Role Labeling

19 0.47670904 225 acl-2010-Temporal Information Processing of a New Language: Fast Porting with Minimal Resources

20 0.47620267 1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.034), (42, 0.012), (59, 0.572), (73, 0.043), (78, 0.051), (80, 0.011), (83, 0.073), (84, 0.014), (98, 0.088)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.98479491 205 acl-2010-SVD and Clustering for Unsupervised POS Tagging

Author: Michael Lamar ; Yariv Maron ; Mark Johnson ; Elie Bienenstock

Abstract: We revisit the algorithm of Schütze (1995) for unsupervised part-of-speech tagging. The algorithm uses reduced-rank singular value decomposition followed by clustering to extract latent features from context distributions. As implemented here, it achieves state-of-the-art tagging accuracy at considerably less cost than more recent methods. It can also produce a range of finer-grained taggings, with potential applications to various tasks. 1

2 0.98222685 151 acl-2010-Intelligent Selection of Language Model Training Data

Author: Robert C. Moore ; William Lewis

Abstract: We address the problem of selecting nondomain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach is based on comparing the cross-entropy, according to domain-specific and non-domain-specific language models, for each sentence of the text source used to produce the latter language model. We show that this produces better language models, trained on less data, than both random data selection and two other previously proposed methods.
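The cross-entropy-difference criterion described above can be sketched as follows: score each candidate sentence by its per-word cross-entropy under an in-domain language model minus that under a general-domain model, then keep the lowest-scoring sentences. Unigram models with add-one smoothing stand in for real n-gram LMs here, and the tiny corpora are invented for illustration.

```python
import math
from collections import Counter

def unigram_lm(sentences):
    """Return a log-probability function for an add-one-smoothed unigram LM."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1                     # +1 slot for unseen words
    def logprob(word):
        return math.log((counts[word] + 1) / (total + vocab))
    return logprob

def cross_entropy(lp, sentence):
    """Per-word negative log-likelihood of a sentence under an LM."""
    words = sentence.split()
    return -sum(lp(w) for w in words) / len(words)

in_domain = ["the patient received treatment",
             "the doctor examined the patient"]
general   = ["the stock market fell sharply",
             "the senate passed the bill"]
pool      = ["the doctor saw the patient",
             "the market rallied today"]

lp_in, lp_gen = unigram_lm(in_domain), unigram_lm(general)

# Lower score = more in-domain-like; sort the pool by the difference.
scored = sorted(pool, key=lambda s: cross_entropy(lp_in, s)
                                    - cross_entropy(lp_gen, s))
```

Selecting a prefix of `scored` yields the "trained on less data" behaviour the abstract reports: sentences that look in-domain are kept, generic ones are dropped.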

same-paper 3 0.97577828 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs

Author: Galina Tremper

Abstract: Presupposition relations between verbs are not very well covered in existing lexical semantic resources. We propose a weakly supervised algorithm for learning presupposition relations between verbs that distinguishes five semantic relations: presupposition, entailment, temporal inclusion, antonymy and other/no relation. We start with a number of seed verb pairs selected manually for each semantic relation and classify unseen verb pairs. Our algorithm achieves an overall accuracy of 36% for type-based classification.

4 0.96750164 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out

Author: Joern Wuebker ; Arne Mauser ; Hermann Ney

Abstract: Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data. Most approaches report problems with overfitting. We describe a novel leaving-one-out approach to prevent overfitting that allows us to train phrase models that show improved translation performance on the WMT08 Europarl German-English task. In contrast to most previous work where phrase models were trained separately from other models used in translation, we include all components such as single word lexica and reordering models in training. Using this consistent training of phrase models we are able to achieve improvements of up to 1.4 points in BLEU. As a side effect, the phrase table size is reduced by more than 80%.
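The leaving-one-out idea can be illustrated with plain counts: when scoring phrase pairs extracted from a sentence, subtract that sentence's own contribution from the global counts, so a phrase pair seen only in that one sentence gets a small floor probability instead of probability 1. The phrase-pair extractions and the floor value below are hypothetical; the paper's actual training integrates this into full phrase-model estimation.

```python
from collections import Counter

# Hypothetical per-sentence phrase-pair extractions: (source, target).
sentence_pairs = [
    [("das haus", "the house"), ("haus", "house")],
    [("das haus", "the house"), ("das", "the")],
    [("grünes haus", "green house")],
]

joint = Counter(p for s in sentence_pairs for p in s)
src = Counter(p[0] for s in sentence_pairs for p in s)

def loo_prob(pair, sent, floor=1e-4):
    """p(target | source) with the current sentence's counts removed."""
    local = Counter(sent)
    j = joint[pair] - local[pair]
    m = src[pair[0]] - sum(c for q, c in local.items() if q[0] == pair[0])
    return j / m if m > 0 and j > 0 else floor

# "das haus" has support in another sentence, so it survives
# leaving-one-out; the singleton "grünes haus" falls to the floor.
p_shared = loo_prob(("das haus", "the house"), sentence_pairs[0])
p_single = loo_prob(("grünes haus", "green house"), sentence_pairs[2])
```

Singletons being pushed to the floor is what discourages the model from memorizing long one-off phrases, which is consistent with the large phrase-table reduction the abstract reports.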

5 0.96697468 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models

Author: Chris Dyer ; Adam Lopez ; Juri Ganitkevitch ; Jonathan Weese ; Ferhan Ture ; Phil Blunsom ; Hendra Setiawan ; Vladimir Eidelman ; Philip Resnik

Abstract: We present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars. Using a single unified internal representation for translation forests, the decoder strictly separates model-specific translation logic from general rescoring, pruning, and inference algorithms. From this unified representation, the decoder can extract not only the 1- or k-best translations, but also alignments to a reference, or the quantities necessary to drive discriminative training using gradient-based or gradient-free optimization techniques. Its efficient C++ implementation means that memory use and runtime performance are significantly better than comparable decoders.

6 0.86623096 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems

7 0.85777581 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study

8 0.79561698 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation

9 0.79217356 97 acl-2010-Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices

10 0.78019857 91 acl-2010-Domain Adaptation of Maximum Entropy Language Models

11 0.77666342 114 acl-2010-Faster Parsing by Supertagger Adaptation

12 0.77459931 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences

13 0.76203465 44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network

14 0.75943291 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision

15 0.75876492 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future

16 0.75014395 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network

17 0.74831009 212 acl-2010-Simple Semi-Supervised Training of Part-Of-Speech Taggers

18 0.74680626 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation

19 0.73834443 96 acl-2010-Efficient Optimization of an MDL-Inspired Objective Function for Unsupervised Part-Of-Speech Tagging

20 0.73621005 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years