acl acl2010 acl2010-41 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Barbara McGillivray
Abstract: We present a system that automatically induces Selectional Preferences (SPs) for Latin verbs from two treebanks by using Latin WordNet. Our method overcomes some of the problems connected with data sparseness and the small size of the input corpora. We also suggest a way to evaluate the acquired SPs on unseen events extracted from other Latin corpora.
Reference: text
sentIndex sentText sentNum sentScore
1 Automatic Selectional Preference Acquisition for Latin verbs Barbara McGillivray University of Pisa Italy [sent-1, score-0.049]
2 Abstract We present a system that automatically induces Selectional Preferences (SPs) for Latin verbs from two treebanks by using Latin WordNet. [sent-4, score-0.103]
3 Our method overcomes some of the problems connected with data sparseness and the small size of the input corpora. [sent-5, score-0.04]
4 We also suggest a way to evaluate the acquired SPs on unseen events extracted from other Latin corpora. [sent-6, score-0.075]
5 1 Introduction Automatic acquisition of semantic information from corpora is a challenge for research on lowresourced languages, especially when semantically annotated corpora are not available. [sent-7, score-0.224]
6 In addition, a Latin version of WordNet, Latin WordNet (LWN; Minozzi, 2009), is being compiled, consisting of around 10,000 lemmas inserted in the multilingual structure of MultiWordNet (Bentivogli et al. [sent-13, score-0.047]
7 The number and the size of these resources are small when compared with the corpora and the lexicons for modern languages, e.g. [sent-15, score-0.035]
8 Concerning semantic processing, no semantically annotated Latin corpus is available yet; building such a corpus manually would take considerable time and energy. [sent-18, score-0.096]
9 Hence, research in computational semantics for Latin would benefit from exploiting the existing resources and tools through automatic lexical acquisition methods. [sent-19, score-0.058]
10 In this paper we deal with automatic acquisition of verbal selectional preferences (SPs) for Latin, i.e. [sent-20, score-0.237]
11 the semantic preferences of verbs on their arguments: e.g. [sent-22, score-0.196]
12 we expect the object position of the verb edo ‘eat’ to be mostly filled by nouns from the food domain. [sent-24, score-0.161]
13 SPs are defined as probability distributions over semantic features extracted as sets of LWN nodes. [sent-26, score-0.157]
14 The input data are two subcategorization lexicons automatically extracted from the LDT and the ITTB (McGillivray and Passarotti, 2009). [sent-27, score-0.072]
15 Our main contribution is to create a new tool for semantic processing of Latin by adapting computational techniques developed for extant languages to the special case of Latin. [sent-28, score-0.096]
16 The way our model combines the syntactic information contained in the treebanks with the lexical semantic knowledge from LWN allows us to overcome some of the difficulties related to the small size of the input corpora. [sent-30, score-0.15]
17 This is the main difference from corpora for modern languages, together with the absence of semantic annotation. [sent-31, score-0.131]
18 Moreover, we face the problem of evaluating our system’s ability to generalize over unseen cases by using text occurrences, as access to human linguistic judgements is denied for Latin. [sent-32, score-0.075]
19 In the rest of the paper we will briefly summarize previous work on SP acquisition and motivate our approach. [sent-33, score-0.058]
20 WN-based approaches translate the generalization problem into estimating preference probabilities over a noun hierarchy and solve it by means of different statistical tools that use the input data as a training set: cf. [sent-39, score-0.084]
21 Agirre and Martinez (2001) acquire SPs for verb classes instead of single verb lemmas by using a semantically annotated corpus and WN. [sent-42, score-0.251]
22 Distributional methods aim at automatically inducing semantic classes from distributional data in corpora by means of various similarity measures and unsupervised clustering algorithms: cf. [sent-43, score-0.217]
23 In fact, our dataset consists of subcategorization frames extracted from two relatively small treebanks, amounting to a little over 100,000 word tokens overall. [sent-52, score-0.228]
24 The originality of this approach is an incremental clustering algorithm for verb occurrences (called frames), which are identified by specific syntactic and semantic features, such as the number of verbal arguments, the syntactic pattern, and the semantic properties of each argument, i.e. [sent-56, score-0.62]
25 Based on a probabilistic measure of similarity between the frames’ features, the clustering produces larger sets called constructions. [sent-59, score-0.086]
26 The constructions for a verb contribute to the next step, which acquires the verb's SPs as semantic profiles, i.e. [sent-60, score-0.315]
27 The model exploits the structure of WN so that predictions over unseen cases are possible. [sent-63, score-0.075]
28 3 The model The input data are two corpus-driven subcategorization lexicons which record the subcategorization frames of each verbal token occurring in the corpora: these frames contain morphosyntactic information on the verb’s arguments, as well as their lexical fillers. [sent-64, score-0.507]
29 We illustrate how we adapted Alishahi’s definitions of frame features and formulae to our case. [sent-70, score-0.14]
30 Alishahi uses a semantically annotated English corpus, so she defines the verb’s semantic primitives, the arguments’ participant roles and their semantic categories; since we do not have such annotation, we used the WN semantic information. [sent-71, score-0.288]
31 The syntactic feature of a frame (ft1) is the set of syntactic slots of its verb’s subcategorization pattern, extracted from the lexicons. [sent-72, score-0.178]
32 In addition, the first type of semantic features of a frame (ft2) collects the semantic properties of the verb’s arguments as the set of LWN synonyms and hypernyms of their fillers. [sent-74, score-0.505]
33 In the previous example this is {exsilium 'exile', proscriptio 'proscription', rejection, actio, actus 'act'}. [sent-75, score-0.048]
34 The second type of semantic features of a frame (ft3) collects the semantic properties of the verb in the form of the verb's synsets. [sent-77, score-0.272]
35 In the above example, these are all synsets of eo 'go', among which {eo, gradior, grassor, ingredior, procedo, prodeo, ...}. (Footnote 1: Cicero, In Catilinam, II, 7.) [sent-78, score-0.121]
36 (Footnote 2) We listed the LWN node of the lemma exsilium, followed by its hypernyms; each node (apart from rejection, which is English and is not filled by a Latin lemma in LWN) is translated by the corresponding node in the English WN. [sent-79, score-0.092]
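As an illustration only, the three frame features just described (ft1, ft2, ft3) could be held in a small data structure like the Python sketch below. All class, field and synset names here are hypothetical; only the slot label and the filler properties are taken from the paper's exsilium example.

from dataclasses import dataclass, field

@dataclass
class Frame:
    """One verb occurrence with the three features described above (illustrative field names)."""
    verb: str                                # verb lemma, e.g. "eo"
    ft1: frozenset                           # syntactic slots of the subcategorization pattern
    ft2: dict = field(default_factory=dict)  # slot -> LWN synonyms/hypernyms of its fillers
    ft3: frozenset = frozenset()             # LWN synset ids of the verb (placeholders below)

# The paper's example: eo 'go' with an 'A (in)Obj[acc]' slot filled by exsilium 'exile'.
example = Frame(
    verb="eo",
    ft1=frozenset({"A_(in)Obj[acc]"}),
    ft2={"A_(in)Obj[acc]": {"exsilium", "proscriptio", "rejection", "actio", "actus"}},
    ft3=frozenset({"eo-go-synset-1", "eo-go-synset-2"}),  # placeholder synset identifiers
)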
37 The prior probability P(k) is calculated from the [sent-83, score-0.061]
38 number of frames contained in k divided by the total number of frames. [sent-84, score-0.156]
39 P(ft1(F)|k) is 1 when all the frames in k contain all the syntactic slots of F. [sent-86, score-0.156]
40 For each argument position a, we estimated the probability P(ft2(F)|k) as the sum of the semantic scores between F and each h in k: P(ft2(F)|k) = Σ_{h∈k} sem_score(h, F) / n_k, where the semantic score sem_score(h, F) = |S(h) ∩ S(F)| / |S(F)| counts the overlap between the semantic properties S(h) of h (i.e. [sent-87, score-0.505]
41 the LWN hypernyms of the fillers in h) and the semantic properties S(F) of F (for argument a), over |S(F)|. [sent-89, score-0.570]
42 Analogously, P(ft3(F)|k) = Σ_{h∈k} syns_score(h, F) / n_k, where the synset score syns_score(h, F) = |synsets(verb(h)) ∩ synsets(verb(F))| / |synsets(verb(F))| counts the overlap between the synsets for the verb in h and the synsets for the verb in F, over the number of synsets for the verb in F. [sent-90, score-0.415]
43 We introduced the syntactic and synset scores in order to account for a frequent phenomenon in our data: the partial matches between the values of the features in F and in k. [sent-91, score-0.041]
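To make the scoring concrete, the sketch below combines the formulae above into the construction-assignment step, where (as in Alishahi's model) a frame is presumably placed in the construction maximizing P(k)·P(F|k). This is not the authors' implementation: the smoothing constant, the fallback for partial syntactic matches and the prior given to a brand-new singleton construction are assumptions (the paper only states that a very small constant is added and that partial matches are scored), and it reuses the hypothetical Frame objects from the sketch above.

from math import prod

EPS = 1e-10  # smoothing constant; footnote 3 only says "a very small constant" (value assumed here)

def sem_score(h, f, slot):
    """|S(h) & S(F)| / |S(F)| over the fillers' LWN properties for one argument slot."""
    s_h, s_f = h.ft2.get(slot, set()), f.ft2.get(slot, set())
    return len(s_h & s_f) / len(s_f) if s_f else 0.0

def syns_score(h, f):
    """|synsets(verb(h)) & synsets(verb(F))| / |synsets(verb(F))|."""
    return len(h.ft3 & f.ft3) / len(f.ft3) if f.ft3 else 0.0

def p_frame_given_k(f, k):
    """P(F|k) as a product of per-feature probabilities (feature independence assumed)."""
    n_k = len(k)
    p_ft1 = 1.0 if all(f.ft1 <= h.ft1 for h in k) else EPS      # partial-match handling simplified
    p_ft2 = prod(sum(sem_score(h, f, a) for h in k) / n_k + EPS for a in f.ft2)
    p_ft3 = sum(syns_score(h, f) for h in k) / n_k + EPS
    return p_ft1 * p_ft2 * p_ft3

def assign(f, constructions, n_frames):
    """Place F in the construction k maximizing P(k) * P(F|k); a new singleton {F} is also a candidate."""
    candidates = constructions + [[f]]
    return max(candidates, key=lambda k: len(k) / (n_frames + 1) * p_frame_given_k(f, k))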
44 3.2 Selectional preferences The clustering algorithm defines the set of constructions in which the generalization step over unseen cases is performed. [sent-93, score-0.327]
45 SPs are defined as semantic profiles, that is, probability distributions over the semantic properties, i.e. [sent-94, score-0.253]
46 For example, we get the probability of the node actio 'act' in the position 'A (in)Obj[acc]' for eo 'go'. [sent-97, score-0.213]
47 If s is a semantic property and a an argument position for a verb v, the semantic profile Pa(s|v) is the sum of Pa(s, k|v) over all constructions k containing v or a WN-synonym of v, i.e. [sent-98, score-0.421]
48 a verb contained in one or more synsets for v. [sent-100, score-0.064]
49 Pa(s, k|v) is approximated as P(k, v) Pa(s|k, v), where P(k, v) is estimated as n_k · freq(k, v) / Σ_{k'} n_{k'} · freq(k', v). To estimate Pa(s|k, v) we consider each frame h in k and account for: a) the similarity between v and the verb in h; b) the similarity between s and the fillers of h. [sent-101, score-0.529]
50 Pa(s|k, v) is thus obtained by normalizing the sum of these similarity scores over all frames h in k, divided by the total number of frames in k containing v or its synonyms. [sent-103, score-0.197]
51 The similarity scores weight the contributions of the synonyms of v, whose fillers play a role in the generalization step. [sent-104, score-0.319]
52 It was introduced because of the sparseness of our data (Footnote 3: the algorithm uses smoothed versions of all the previous formulae by adding a very small constant, so that the probabilities are never 0), where [sent-106, score-0.119]
53 many verbs are hapaxes, which makes the generalization from their fillers difficult. [sent-108, score-0.327]
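The profile estimation just described might look roughly like the following sketch. It is a loose approximation, not the authors' code: the function name is hypothetical, the verb-similarity weight and the treatment of "the similarity between s and the fillers of h" (here a simple membership test over the fillers' LWN properties) only approximate the paper's formulation, and lwn_synsets stands in for a lemma-to-synsets lookup into LWN.

def sp_profile(v, slot, constructions, lwn_synsets):
    """Semantic profile Pa(s|v) over LWN nodes s for one argument slot of verb v (rough sketch).

    Pa(s|v) = sum_k P(k,v) * Pa(s|k,v); P(k,v) is proportional to n_k * freq(k,v), and
    Pa(s|k,v) sums, over the frames h in k, a verb-similarity weight for an indicator that s
    is among the LWN properties of h's fillers, normalized over the frames of v (or a synonym) in k.
    """
    v_syn = lwn_synsets.get(v, set())

    def is_syn(w):  # v itself, or a verb sharing at least one LWN synset with v
        return w == v or bool(lwn_synsets.get(w, set()) & v_syn)

    relevant = [k for k in constructions if any(is_syn(h.verb) for h in k)]
    weights = [len(k) * sum(is_syn(h.verb) for h in k) for k in relevant]  # n_k * freq(k, v)
    z = sum(weights) or 1.0

    profile = {}
    for k, w in zip(relevant, weights):
        n_syn = sum(is_syn(h.verb) for h in k)
        for h in k:
            # similarity between v and the verb of h, via synset overlap
            sim = 1.0 if h.verb == v else len(lwn_synsets.get(h.verb, set()) & v_syn) / (len(v_syn) or 1)
            for s in h.ft2.get(slot, set()):
                profile[s] = profile.get(s, 0.0) + (w / z) * sim / n_syn
    total = sum(profile.values()) or 1.0
    return {s: p / total for s, p in profile.items()}  # renormalize to a distribution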
54 4 Results and evaluation The clustering algorithm was run on 15509 frames and it generated 7105 constructions. [sent-109, score-0.201]
55 Table 1 displays the 5 constructions assigned to the 9 frames where the verb introduco ‘bring in, introduce’ occurs. [sent-110, score-0.533]
56 Note the semantic similarity between addo ‘add to, bring to’, immitto ‘send against, insert’, induco ‘bring forward, introduce’ and introduco, and the similarity between the syntactic patterns and the argument fillers within the same construction. [sent-111, score-0.542]
57 For example, finis ‘end, borders’ and effectus ‘result’ share the semantic properties ATTRIBUTE, COGNITIO ‘cognition’, CONSCIENTIA ‘conscience’, EVENTUM ‘event’, among others. [sent-112, score-0.17]
58 The vast majority of constructions contain less than 4 frames. [sent-113, score-0.117]
59 This contrasts with the more general constructions found by Alishahi (2008) and can be explained by several factors. [sent-114, score-0.117]
60 First, the coverage of LWN is quite low with respect to the fillers in our dataset. [sent-115, score-0.239]
61 In fact, 782 fillers out of 2408 could not be assigned to any LWN synset; for these lemmas the semantic scores with all the other nouns are 0, causing probabilities lower than the baseline; this results in assigning the frame to the singleton construction consisting of the frame itself. [sent-116, score-0.698]
62 The same happens for fillers consisting of verbal lemmas, participles, pronouns and named entities, which amount to a third of the total number. [sent-117, score-0.29]
63 Table 2: Top 20 semantic properties in the semantic profile for ascendo 'ascend' + A (de)Obj[abl]. [sent-137, score-0.359]
64 As to the SP acquisition, we ran the system on all constructions generated by the clustering. [sent-142, score-0.117]
65 We excluded the pronouns occurring as argument fillers, and manually tagged the named entities. [sent-143, score-0.082]
66 For each verb lemma and slot we obtained a probability distribution over the 6608 LWN noun nodes. [sent-144, score-0.209]
67 Table 2 displays the 20 semantic properties with the highest SP probabilities as ablative arguments of ascendo 'ascend' introduced by de 'down from', 'out of'. [sent-145, score-0.332]
68 This semantic profile was created from the following fillers for the verbs contained in the constructions for ascendo and its synonyms: abyssus 'abyss', fumus 'smoke', lacus 'lake', machina 'machine', manus 'hand', negotiatio 'business', mare 'sea', os 'mouth', templum 'temple', terra 'land'. [sent-146, score-0.594]
69 These nouns are well represented by the semantic properties related to water and physical places. [sent-147, score-0.291]
70 Note also the high rank of general properties like actio 'act', which are associated with a large number of fillers and thus generally get a high probability. [sent-148, score-0.408]
71 Regarding evaluation, we are interested in testing two properties of our model: calibration and discrimination. [sent-149, score-0.108]
72 its ability to correctly estimate the SP probability of each single LWN node. [sent-162, score-0.061]
73 Table 3: 15 nouns with the highest probabilities as accusative objects of dico 'say'. [sent-178, score-0.208]
74 Figure 1: Decreasing SP probabilities of the LWN leaf nodes for the objects of dico 'say'. [sent-179, score-0.149]
75 Table 3 displays the 15 nouns with the highest probabilities as direct objects for dico 'say'. [sent-180, score-0.271]
76 From table 3 and the rest of the distribution, represented in figure 1, we see that the model assigns a high probability to most seen fillers for dico in the corpus: anima 'soul', corpus 'body', locus 'place', pars 'part', etc. [sent-181, score-0.557]
77 As for evaluating the SP probability assigned to nouns unseen in the training set, Alishahi (2008) follows the approach suggested by Resnik (1993), using human plausibility judgements on verb-noun pairs. [sent-182, score-0.195]
78 Given the absence of native speakers of Latin, we used random occurrences in corpora, considered as positive examples of plausible argument fillers; on the other hand, we cannot extract non-plausible fillers from a corpus unless we use a frequency-based criterion. [sent-183, score-0.321]
79 However, we can measure how well our system predicts the probability of these unseen events. [sent-184, score-0.136]
80 As a preliminary evaluation experiment, we randomly selected from our corpora a list of 19 high-frequency verbs (freq. [sent-185, score-0.084]
81 >51) and 7 medium-frequency verbs (11 < freq. < 50), for each of which we chose an interesting argument slot. [sent-186, score-0.131]
82 Then we randomly extracted one filler for each such pair from two collections of Latin texts (Perseus Digital Library and Corpus Thomisticum), provided that it was not in the training set. [sent-187, score-0.049]
83 The semantic score in equation 1 on page 3 is then calculated between the set of semantic properties of n and that for f, to obtain the probability of finding the random filler n as an argument for a verb v. [sent-188, score-0.56]
84 For each of the 26 (verb, slot) pairs, we looked at three measures of central tendency: mean, median and the value of the third quartile, which were compared with the probability assigned by the model to the random filler. [sent-189, score-0.061]
85 If this probability was higher than the measure, the outcome was considered a success. [sent-190, score-0.061]
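A minimal sketch of this success criterion, assuming the unseen filler's probability has already been computed with the semantic score against the verb's profile as described above; the helper name is hypothetical.

import statistics

def success_against_measures(profile, unseen_prob):
    """Check whether an unseen filler's probability beats the mean, the median and the
    third quartile of the verb's profile over all LWN leaf nodes (profile: node -> Pa(n|v))."""
    values = sorted(profile.values())
    mean, median = statistics.mean(values), statistics.median(values)
    third_quartile = statistics.quantiles(values, n=4)[2]  # value at the 75th percentile
    return {
        "mean": unseen_prob > mean,
        "median": unseen_prob > median,
        "third quartile": unseen_prob > third_quartile,
    }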
86 For example, table 3 and figure 1 show that the filler for dico+A Obj [acc] in the evaluation set sententia ‘judgement’ is ranked 13th within the verb’s semantic profile. [sent-193, score-0.193]
87 5 Conclusion and future work We proposed a method for automatically acquiring probabilistic SPs for Latin verbs from a small corpus using the WN hierarchy; we suggested some (Footnote 4: The dataset consists of all LWN leaf nodes n, for which we calculated Pa(n|v).) [sent-194, score-0.049]
88 By definition, if we divide the dataset into quartiles, 25% of the leaf nodes have a probability higher than the value at the third quartile. [sent-196, score-0.061]
89 Therefore, in 19 cases out of 26 the random fillers are placed in the high-probability quarter of the plot, which is a good result, since this is where the preferred arguments gather. [sent-197, score-0.293]
90 new strategies for tackling the data sparseness in the crucial generalization step over unseen cases. [sent-198, score-0.154]
91 Our work also contributes to the state of the art in semantic processing of Latin by integrating syntactic information from annotated corpora with the lexical resource LWN. [sent-199, score-0.131]
92 This demonstrates the usefulness of the method for small corpora and the relevance of computational approaches for historical linguistics. [sent-200, score-0.035]
93 In order to measure the impact of the frame clusters for the SP acquisition, we plan to run the system for SP acquisition without performing the clustering step, thus defining all constructions as singleton sets containing one frame each. [sent-201, score-0.432]
94 Finally, an extensive evaluation will require a more comprehensive set, composed of a higher number of unseen argument fillers; from the frequencies of these nouns, it will be possible to directly compare plausible arguments (high frequency) and implausible ones (low frequency). [sent-202, score-0.211]
95 An iterative approach to estimating frequencies over a semantic hierarchy. [sent-242, score-0.096]
96 Creating a parallel treebank of the old Indo-European Bible translations. [sent-257, score-0.042]
97 Generalizing case frames using a thesaurus and the MDL principle. [sent-263, score-0.156]
98 The Index Thomisticus treebank project: Annotation, parsing and valency lexicon. [sent-276, score-0.042]
99 Kienpointner, editors, Proceedings of the 15th International Colloquium on Latin Linguistics (ICLL), Innsbrucker Beitraege zur Sprachwissenschaft. [sent-294, score-0.036]
100 Kienpointner, editors, Proceedings of the 15th International Colloquium on Latin Linguistics, Innsbrucker Beiträge zur Sprachwissenschaft. [sent-303, score-0.036]
wordName wordTfidf (topN-words)
[('latin', 0.404), ('lwn', 0.381), ('fillers', 0.239), ('sps', 0.214), ('alishahi', 0.191), ('frames', 0.156), ('mcgillivray', 0.143), ('passarotti', 0.143), ('sp', 0.138), ('bamman', 0.119), ('exsilium', 0.119), ('constructions', 0.117), ('acc', 0.113), ('frame', 0.106), ('dico', 0.104), ('verb', 0.102), ('semantic', 0.096), ('actio', 0.095), ('introduco', 0.095), ('thomisticus', 0.095), ('obj', 0.084), ('exile', 0.083), ('argument', 0.082), ('hypernyms', 0.079), ('selectional', 0.077), ('unseen', 0.075), ('properties', 0.074), ('subcategorization', 0.072), ('bajc', 0.071), ('skcore', 0.071), ('soul', 0.071), ('synsets', 0.064), ('displays', 0.063), ('crane', 0.063), ('water', 0.062), ('probability', 0.061), ('nouns', 0.059), ('acquisition', 0.058), ('eo', 0.057), ('oj', 0.057), ('pars', 0.057), ('treebanks', 0.054), ('arguments', 0.054), ('eat', 0.052), ('preferences', 0.051), ('verbal', 0.051), ('wn', 0.05), ('verbs', 0.049), ('filler', 0.049), ('aacccc', 0.048), ('accc', 0.048), ('actus', 0.048), ('anima', 0.048), ('anreiter', 0.048), ('antrum', 0.048), ('argmkaxp', 0.048), ('ascendo', 0.048), ('foprrm', 0.048), ('haug', 0.048), ('iimndmuicotto', 0.048), ('iimntrmodittuoco', 0.048), ('innsbrucker', 0.048), ('ittb', 0.048), ('kienpointner', 0.048), ('ldt', 0.048), ('locus', 0.048), ('minozzi', 0.048), ('obbj', 0.048), ('oobbjj', 0.048), ('pnr', 0.048), ('rejection', 0.048), ('sententia', 0.048), ('vse', 0.048), ('lemmas', 0.047), ('ob', 0.047), ('lemma', 0.046), ('clustering', 0.045), ('profile', 0.045), ('probabilities', 0.045), ('bring', 0.043), ('index', 0.042), ('ph', 0.042), ('syns', 0.042), ('ascend', 0.042), ('fti', 0.042), ('treebank', 0.042), ('similarity', 0.041), ('synset', 0.041), ('sparseness', 0.04), ('generalization', 0.039), ('bible', 0.038), ('digital', 0.037), ('heritage', 0.036), ('zur', 0.036), ('corpora', 0.035), ('act', 0.035), ('bentivogli', 0.034), ('profiles', 0.034), ('calibration', 0.034), ('formulae', 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs
Author: Barbara McGillivray
Abstract: We present a system that automatically induces Selectional Preferences (SPs) for Latin verbs from two treebanks by using Latin WordNet. Our method overcomes some of the problems connected with data sparseness and the small size of the input corpora. We also suggest a way to evaluate the acquired SPs on unseen events extracted from other Latin corpora.
2 0.14423612 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
Author: Omri Abend ; Ari Rappoport
Abstract: The core-adjunct argument distinction is a basic one in the theory of argument structure. The task of distinguishing between the two has strong relations to various basic NLP tasks such as syntactic parsing, semantic role labeling and subcategorization acquisition. This paper presents a novel unsupervised algorithm for the task that uses no supervised models, utilizing instead state-of-the-art syntactic induction algorithms. This is the first work to tackle this task in a fully unsupervised scenario.
3 0.12689292 108 acl-2010-Expanding Verb Coverage in Cyc with VerbNet
Author: Clifton McFate
Abstract: A robust dictionary of semantic frames is an essential element of natural language understanding systems that use ontologies. However, creating lexical resources that accurately capture semantic representations en masse is a persistent problem. Where the sheer amount of content makes hand creation inefficient, computerized approaches often suffer from over generality and difficulty with sense disambiguation. This paper describes a semi-automatic method to create verb semantic frames in the Cyc ontology by converting the information contained in VerbNet into a Cyc usable format. This method captures the differences in meaning between types of verbs, and uses existing connections between WordNet, VerbNet, and Cyc to specify distinctions between individual verbs when available. This method provides 27,909 frames to OpenCyc which currently has none and can be used to extend ResearchCyc as well. We show that these frames lead to a 20% increase in sample sentences parsed over the Research Cyc verb lexicon. 1
4 0.11214973 158 acl-2010-Latent Variable Models of Selectional Preference
Author: Diarmuid O Seaghdha
Abstract: This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data.
5 0.11146405 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences
Author: Alan Ritter ; Mausam Mausam ; Oren Etzioni
Abstract: The computation of selectional preferences, the admissible argument values for a relation, is a well-known NLP task with broad applicability. We present LDA-SP, which utilizes LinkLDA (Erosheva et al., 2004) to model selectional preferences. By simultaneously inferring latent topics and topic distributions over relations, LDA-SP combines the benefits of previous approaches: like traditional classbased approaches, it produces humaninterpretable classes describing each relation’s preferences, but it is competitive with non-class-based methods in predictive power. We compare LDA-SP to several state-ofthe-art methods achieving an 85% increase in recall at 0.9 precision over mutual information (Erk, 2007). We also evaluate LDA-SP’s effectiveness at filtering improper applications of inference rules, where we show substantial improvement over Pantel et al. ’s system (Pantel et al., 2007).
6 0.099914193 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
7 0.089034259 238 acl-2010-Towards Open-Domain Semantic Role Labeling
8 0.083528176 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
9 0.081501551 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
10 0.080659889 116 acl-2010-Finding Cognate Groups Using Phylogenies
11 0.079033919 130 acl-2010-Hard Constraints for Grammatical Function Labelling
12 0.078003041 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
13 0.077717572 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates
14 0.077350557 216 acl-2010-Starting from Scratch in Semantic Role Labeling
15 0.068425864 103 acl-2010-Estimating Strictly Piecewise Distributions
16 0.068100668 203 acl-2010-Rebanking CCGbank for Improved NP Interpretation
17 0.066250898 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
18 0.065709174 85 acl-2010-Detecting Experiences from Weblogs
19 0.064703971 44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network
20 0.059433669 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion
topicId topicWeight
[(0, -0.177), (1, 0.095), (2, 0.073), (3, 0.024), (4, 0.093), (5, -0.006), (6, -0.001), (7, 0.023), (8, 0.041), (9, -0.069), (10, 0.036), (11, 0.057), (12, 0.032), (13, 0.03), (14, 0.116), (15, 0.025), (16, 0.082), (17, 0.067), (18, 0.027), (19, 0.013), (20, 0.004), (21, 0.017), (22, 0.013), (23, -0.099), (24, 0.083), (25, -0.043), (26, -0.066), (27, -0.005), (28, 0.056), (29, 0.017), (30, -0.002), (31, -0.09), (32, 0.144), (33, -0.022), (34, 0.039), (35, -0.112), (36, -0.118), (37, 0.012), (38, 0.011), (39, 0.032), (40, 0.065), (41, 0.023), (42, -0.097), (43, 0.047), (44, -0.074), (45, -0.106), (46, -0.12), (47, 0.014), (48, -0.175), (49, -0.051)]
simIndex simValue paperId paperTitle
same-paper 1 0.90232128 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs
Author: Barbara McGillivray
Abstract: We present a system that automatically induces Selectional Preferences (SPs) for Latin verbs from two treebanks by using Latin WordNet. Our method overcomes some of the problems connected with data sparseness and the small size of the input corpora. We also suggest a way to evaluate the acquired SPs on unseen events extracted from other Latin corpora.
2 0.749538 108 acl-2010-Expanding Verb Coverage in Cyc with VerbNet
Author: Clifton McFate
Abstract: A robust dictionary of semantic frames is an essential element of natural language understanding systems that use ontologies. However, creating lexical resources that accurately capture semantic representations en masse is a persistent problem. Where the sheer amount of content makes hand creation inefficient, computerized approaches often suffer from over generality and difficulty with sense disambiguation. This paper describes a semi-automatic method to create verb semantic frames in the Cyc ontology by converting the information contained in VerbNet into a Cyc usable format. This method captures the differences in meaning between types of verbs, and uses existing connections between WordNet, VerbNet, and Cyc to specify distinctions between individual verbs when available. This method provides 27,909 frames to OpenCyc which currently has none and can be used to extend ResearchCyc as well. We show that these frames lead to a 20% increase in sample sentences parsed over the Research Cyc verb lexicon. 1
3 0.68702984 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
Author: Nathanael Chambers ; Dan Jurafsky
Abstract: This paper improves the use of pseudowords as an evaluation framework for selectional preferences. While pseudowords originally evaluated word sense disambiguation, they are now commonly used to evaluate selectional preferences. A selectional preference model ranks a set of possible arguments for a verb by their semantic fit to the verb. Pseudo-words serve as a proxy evaluation for these decisions. The evaluation takes an argument of a verb like drive (e.g. car), pairs it with an alternative word (e.g. car/rock), and asks a model to identify the original. This paper studies two main aspects of pseudoword creation that affect performance results. (1) Pseudo-word evaluations often evaluate only a subset of the words. We show that selectional preferences should instead be evaluated on the data in its entirety. (2) Different approaches to selecting partner words can produce overly optimistic evaluations. We offer suggestions to address these factors and present a simple baseline that outperforms the state-ofthe-art by 13% absolute on a newspaper domain.
4 0.65010691 126 acl-2010-GernEdiT - The GermaNet Editing Tool
Author: Verena Henrich ; Erhard Hinrichs
Abstract: GernEdiT (short for: GermaNet Editing Tool) offers a graphical interface for the lexicographers and developers of GermaNet to access and modify the underlying GermaNet resource. GermaNet is a lexical-semantic wordnet that is modeled after the Princeton WordNet for English. The traditional lexicographic development of GermaNet was error prone and time-consuming, mainly due to a complex underlying data format and no opportunity of automatic consistency checks. GernEdiT replaces the earlier development by a more userfriendly tool, which facilitates automatic checking of internal consistency and correctness of the linguistic resource. This paper pre- sents all these core functionalities of GernEdiT along with details about its usage and usability. 1
5 0.60315442 85 acl-2010-Detecting Experiences from Weblogs
Author: Keun Chan Park ; Yoonjae Jeong ; Sung Hyon Myaeng
Abstract: Weblogs are a source of human activity knowledge comprising valuable information such as facts, opinions and personal experiences. In this paper, we propose a method for mining personal experiences from a large set of weblogs. We define experience as knowledge embedded in a collection of activities or events which an individual or group has actually undergone. Based on an observation that experience-revealing sentences have a certain linguistic style, we formulate the problem of detecting experience as a classification task using various features including tense, mood, aspect, modality, experiencer, and verb classes. We also present an activity verb lexicon construction method based on theories of lexical semantics. Our results demonstrate that the activity verb lexicon plays a pivotal role among selected features in the classification perfor- , mance and shows that our proposed method outperforms the baseline significantly.
6 0.59239388 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
7 0.52336234 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
8 0.48275 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences
9 0.47824734 238 acl-2010-Towards Open-Domain Semantic Role Labeling
10 0.45687589 158 acl-2010-Latent Variable Models of Selectional Preference
11 0.42209649 216 acl-2010-Starting from Scratch in Semantic Role Labeling
12 0.41803455 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
13 0.39423519 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
14 0.38924012 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
15 0.37749761 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates
16 0.36079299 116 acl-2010-Finding Cognate Groups Using Phylogenies
17 0.35344931 61 acl-2010-Combining Data and Mathematical Models of Language Change
18 0.35065058 16 acl-2010-A Statistical Model for Lost Language Decipherment
19 0.35013762 130 acl-2010-Hard Constraints for Grammatical Function Labelling
20 0.34525532 103 acl-2010-Estimating Strictly Piecewise Distributions
topicId topicWeight
[(7, 0.373), (14, 0.019), (25, 0.059), (42, 0.022), (59, 0.083), (72, 0.011), (73, 0.039), (76, 0.011), (78, 0.053), (80, 0.014), (83, 0.062), (84, 0.058), (97, 0.012), (98, 0.089)]
simIndex simValue paperId paperTitle
1 0.93586391 234 acl-2010-The Use of Formal Language Models in the Typology of the Morphology of Amerindian Languages
Author: Andres Osvaldo Porta
Abstract: The aim of this work is to present some preliminary results of an investigation in course on the typology of the morphology of the native South American languages from the point of view of the formal language theory. With this object, we give two contrasting examples of descriptions of two Aboriginal languages finite verb forms morphology: Argentinean Quechua (quichua santiague n˜o) and Toba. The description of the morphology of the finite verb forms of Argentinean quechua, uses finite automata and finite transducers. In this case the construction is straightforward using two level morphology and then, describes in a very natural way the Argentinean Quechua morphology using a regular language. On the contrary, the Toba verbs morphology, with a system that simultaneously uses prefixes and suffixes, has not a natural description as regular language. Toba has a complex system of causative suffixes, whose successive applications determinate the use of prefixes belonging different person marking prefix sets. We adopt the solution of Creider et al. (1995) to naturally deal with this and other similar morphological processes which involve interactions between prefixes and suffixes and then we describe the toba morphology using linear context-free languages.1 .
same-paper 2 0.75465858 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs
Author: Barbara McGillivray
Abstract: We present a system that automatically induces Selectional Preferences (SPs) for Latin verbs from two treebanks by using Latin WordNet. Our method overcomes some of the problems connected with data sparseness and the small size of the input corpora. We also suggest a way to evaluate the acquired SPs on unseen events extracted from other Latin corpora.
3 0.63605428 207 acl-2010-Semantics-Driven Shallow Parsing for Chinese Semantic Role Labeling
Author: Weiwei Sun
Abstract: One deficiency of current shallow parsing based Semantic Role Labeling (SRL) methods is that syntactic chunks are too small to effectively group words. To partially resolve this problem, we propose semantics-driven shallow parsing, which takes into account both syntactic structures and predicate-argument structures. We also introduce several new “path” features to improve shallow parsing based SRL method. Experiments indicate that our new method obtains a significant improvement over the best reported Chinese SRL result.
4 0.62732404 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems
Author: Jungi Kim ; Jin-Ji Li ; Jong-Hyeok Lee
Abstract: Subjectivity analysis is a rapidly growing field of study. Along with its applications to various NLP tasks, much work have put efforts into multilingual subjectivity learning from existing resources. Multilingual subjectivity analysis requires language-independent criteria for comparable outcomes across languages. This paper proposes to measure the multilanguage-comparability of subjectivity analysis tools, and provides meaningful comparisons of multilingual subjectivity analysis from various points of view.
5 0.55818486 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information
Author: WenTing Wang ; Jian Su ; Chew Lim Tan
Abstract: Syntactic knowledge is important for discourse relation recognition. Yet only heuristically selected flat paths and 2-level production rules have been used to incorporate such information so far. In this paper we propose using tree kernel based approach to automatically mine the syntactic information from the parse trees for discourse analysis, applying kernel function to the tree structures directly. These structural syntactic features, together with other normal flat features are incorporated into our composite kernel to capture diverse knowledge for simultaneous discourse identification and classification for both explicit and implicit relations. The experiment shows tree kernel approach is able to give statistical significant improvements over flat syntactic path feature. We also illustrate that tree kernel approach covers more structure information than the production rules, which allows tree kernel to further incorporate information from a higher dimension space for possible better discrimination. Besides, we further propose to leverage on temporal ordering information to constrain the interpretation of discourse relation, which also demonstrate statistical significant improvements for discourse relation recognition on PDTB 2.0 for both explicit and implicit as well. University of Singapore Singapore 117417 sg tacl @ comp .nus .edu . sg 1
6 0.46044034 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
7 0.44521701 146 acl-2010-Improving Chinese Semantic Role Labeling with Rich Syntactic Features
8 0.42339647 71 acl-2010-Convolution Kernel over Packed Parse Forest
9 0.41908097 116 acl-2010-Finding Cognate Groups Using Phylogenies
10 0.40451837 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
11 0.40324754 158 acl-2010-Latent Variable Models of Selectional Preference
12 0.40187696 216 acl-2010-Starting from Scratch in Semantic Role Labeling
13 0.40122372 65 acl-2010-Complexity Metrics in an Incremental Right-Corner Parser
14 0.39908943 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
15 0.39699516 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
16 0.39636898 136 acl-2010-How Many Words Is a Picture Worth? Automatic Caption Generation for News Images
17 0.39636266 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
18 0.39544141 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications
19 0.39535052 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse
20 0.3947646 130 acl-2010-Hard Constraints for Grammatical Function Labelling