acl acl2010 acl2010-148 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nathanael Chambers ; Dan Jurafsky
Abstract: This paper improves the use of pseudo-words as an evaluation framework for selectional preferences. While pseudo-words originally evaluated word sense disambiguation, they are now commonly used to evaluate selectional preferences. A selectional preference model ranks a set of possible arguments for a verb by their semantic fit to the verb. Pseudo-words serve as a proxy evaluation for these decisions. The evaluation takes an argument of a verb like drive (e.g. car), pairs it with an alternative word (e.g. car/rock), and asks a model to identify the original. This paper studies two main aspects of pseudo-word creation that affect performance results. (1) Pseudo-word evaluations often evaluate only a subset of the words. We show that selectional preferences should instead be evaluated on the data in its entirety. (2) Different approaches to selecting partner words can produce overly optimistic evaluations. We offer suggestions to address these factors and present a simple baseline that outperforms the state-of-the-art by 13% absolute on a newspaper domain.
Reference: text
sentIndex sentText sentNum sentScore
1 This paper improves the use of pseudowords as an evaluation framework for selectional preferences. [sent-2, score-0.308]
2 While pseudowords originally evaluated word sense disambiguation, they are now commonly used to evaluate selectional preferences. [sent-3, score-0.308]
3 A selectional preference model ranks a set of possible arguments for a verb by their semantic fit to the verb. [sent-4, score-0.419]
4 We show that selectional preferences should instead be evaluated on the data in its entirety. [sent-13, score-0.341]
5 While pseudo-words are now less often used for word sense disambiguation, they are a common way to evaluate selectional preferences, models that measure the strength of association between a predicate and its argument filler, e. [sent-19, score-0.294]
6 This paper studies the evaluation itself, showing how choices can lead to overly optimistic results if the evaluation is not designed carefully. [sent-28, score-0.201]
7 We show in this paper that current methods of applying pseudo-words to selectional preferences vary greatly, and suggest improvements. [sent-29, score-0.341]
8 Consider the following example of applying pseudo-words to the selectional restrictions of the verb focus: Original: This story focuses the campaign. Test: This story/part focuses the campaign. [sent-34, score-0.292]
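A minimal sketch of the pseudo-word evaluation protocol described above, assuming a generic score(verb, noun) interface; the function and variable names are illustrative, not from the paper.

```python
def pseudo_word_accuracy(score, tests):
    """Fraction of (verb, original, confounder) triples for which the
    model's score prefers the original argument over the confounder."""
    correct = sum(1 for verb, orig, conf in tests
                  if score(verb, orig) > score(verb, conf))
    return correct / len(tests)

# Toy usage: drive(car) paired with the confounder rock, as in the example.
toy_scores = {("drive", "car"): 0.9, ("drive", "rock"): 0.1}
acc = pseudo_word_accuracy(lambda v, n: toy_scores.get((v, n), 0.0),
                           [("drive", "car", "rock")])
print(acc)  # 1.0
```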
9 ... pseudo-words to evaluate selectional preferences. [sent-41, score-0.221]
10 First, selectional preferences historically focus on subsets of data such as unseen words or words in certain frequency ranges. [sent-42, score-0.699]
11 While work on unseen data is important, evaluating on the entire dataset provides an accurate picture of a model’s overall performance. [sent-43, score-0.27]
12 We will show that seen arguments actually dominate newspaper articles, and thus propose creating test sets that include all verb-argument examples to avoid artificial evaluations. [sent-45, score-0.337]
13 We argue in favor of using nearest-neighbor frequencies and show how using random confounders produces overly optimistic results. [sent-48, score-0.393]
14 (1993) soon followed with a selectional preference proposal that focused on a language model’s effectiveness on unseen data. [sent-62, score-0.528]
15 This was the first use of such verb-noun pairs, as well as the first to test only on unseen pairs. [sent-65, score-0.307]
16 Several papers followed with differing methods of choosing a test pair (v, n) and its confounder v′. [sent-66, score-0.582]
17 (1999) tested all unseen (v, n) occurrences of the most frequent 1000 verbs in their corpus. [sent-68, score-0.320]
18 They then sorted verbs by corpus frequency and chose the neighboring verb v′ of v as the confounder to ensure the closest frequency match possible. [sent-69, score-0.794]
19 (1999) tested 3000 random (v, n) pairs, but required the verbs and nouns to appear between 30 and 3000 times in training. [sent-71, score-0.217]
20 They also chose confounders randomly so that the new pair was unseen. [sent-72, score-0.224]
21 Keller and Lapata (2003) specifically addressed the impact of unseen data by using the web to first ‘see’ the data. [sent-73, score-0.308]
22 They evaluated unseen pseudowords by attempting to first observe them in a larger corpus (the Web). [sent-74, score-0.357]
23 One modeling difference was to disambiguate the nouns as selectional preferences instead of the verbs. [sent-75, score-0.417]
24 Given a test pair (v, n) and its confounder (v, n′), they used web searches such as “v Det n” to make the decision. [sent-76, score-0.572]
25 As can be seen, there are two main factors when devising a pseudo-word evaluation for selectional preferences: (1) choosing (v, n) pairs from the test set, and (2) choosing the confounding n′ (or v′). [sent-82, score-0.387]
26 The confounder has not been examined in detail, and as best we can tell, these factors have varied significantly. [sent-83, score-0.497]
27 Most NLP tasks evaluate on their entire datasets, but as described above, most selectional preference evaluations have focused only on unseen data. [sent-86, score-0.58]
28 This section investigates the extent of unseen examples in a typical training/testing environment of newspaper articles. [sent-87, score-0.308]
29 We argue that, absent a system’s need for specialized performance on unseen data, a representative test set should include the dataset in its entirety. [sent-89, score-0.307]
30 We randomly selected documents from the year 2001 in the NYT portion of the corpus as development and test sets. [sent-98, score-0.211]
31 We then record every seen (v_d, n) pair during training that is seen two or more times (footnote 3) and then count the number of unseen pairs in the NYT development set (1455 tests). [sent-102, score-0.592]
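The counting procedure just described reduces to a few lines; a sketch follows, where the (verb-slot, noun) pair representation and the function name are illustrative assumptions.

```python
from collections import Counter

def unseen_rate(train_pairs, test_pairs, min_count=2):
    """Fraction of test (verb_slot, noun) pairs not seen at least
    min_count times in training, mirroring the procedure above."""
    counts = Counter(train_pairs)
    seen = {pair for pair, c in counts.items() if c >= min_count}
    return sum(1 for pair in test_pairs if pair not in seen) / len(test_pairs)
```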
32 Figure 1 plots the percentage of unseen arguments against training size when trained on either NYT or APW (the APW portion is smaller in total size, and the smaller BNC is provided for comparison). [sent-103, score-0.505]
33 This suggests that an evaluation focusing only on unseen data is not representative, potentially missing up to 90% of the data. [sent-106, score-0.27]
34 (Footnote 3) Our results are thus conservative, as including all single occurrences would achieve even smaller unseen percentages. [sent-111, score-0.27]
35 [Figure 1 plot: "Unseen Arguments in NYT Dev"; series NYT, APW, BNC, Google; x-axis: Number of Tokens in Training (hundred millions); y-axis: unseen percentage] Figure 1: Percentage of NYT development set that is unseen when trained on varying amounts of data. [sent-112, score-0.331]
36 [Figure 2 plot: "Unseen Arguments by Type"; x-axis: Number of Tokens in Training (hundred millions)] Figure 2: Percentage of subject/object/preposition arguments in the NYT development set that is unseen when trained on varying amounts of NYT data. [sent-117, score-0.421]
37 The third line across the bottom of the figure is the number of unseen pairs using Google n-gram data as proxy argument counts. [sent-119, score-0.405]
38 We include these Web counts to illustrate how an openly available source of counts affects unseen arguments. [sent-122, score-0.342]
39 Prepositions have the largest unseen percentage, but not surprisingly, also make up less of the training examples overall. [sent-124, score-0.307]
40 To analyze why pairs are unseen, we examined the distribution of rare words across unseen and seen examples. [sent-125, score-0.518]
41 We similarly define rare verbs over their ordered frequencies (we count verb lemmas, and do not include the syntactic relations). [sent-128, score-0.224]
42 Corpus counts covered 2 years of the AP section, and we used the development set of the NYT section to extract the seen and unseen pairs. [sent-129, score-0.418]
43 Figure 3 shows the percentage of rare nouns and verbs that occur in unseen and seen pairs. [sent-130, score-0.614]
44 6% of the verbs in unseen pairs are rare, compared to only 4. [sent-132, score-0.353]
45 This suggests that many unseen pairs are unseen mainly because they contain low-frequency verbs, rather than because they contain low-frequency argument heads. [sent-137, score-0.620]
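A sketch of this rare-word analysis; defining rarity by a fixed quantile over the frequency-sorted vocabulary is an assumption for illustration, since the summary does not give the paper's exact cutoff.

```python
def rare_fraction(pairs, word_counts, position, quantile=0.25):
    """Fraction of (verb, noun) pairs whose word at `position` (0 = verb,
    1 = noun) falls in the lowest `quantile` of the frequency-sorted
    vocabulary. The quantile value is an illustrative assumption."""
    ranked = sorted(word_counts, key=word_counts.get)  # least frequent first
    rare = set(ranked[: int(len(ranked) * quantile)])
    return sum(1 for pair in pairs if pair[position] in rare) / len(pairs)
```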
46 Work in WSD has shown that confounder choice can make the pseudo-disambiguation task significantly easier. [sent-141, score-0.526]
47 Nakov and Hearst (2003) further illustrated how random confounders are easier to identify than those selected from semantically ambiguous, yet related concepts. [sent-143, score-0.236]
48 Our approach evaluates selectional preferences, not WSD, but our results complement these findings. [sent-144, score-0.252]
49 We identified three methods of confounder selection based on varying levels of corpus fre- [Figure 3 plot: "Distribution of Rare Verbs and Nouns in Tests"; bars compare unseen tests vs. seen tests] Figure 3: Comparison between seen and unseen tests (verb, relation, noun). [sent-145, score-0.986]
50 6% of unseen tests have rare verbs, compared to just 4. [sent-147, score-0.391]
51 quency: (1) choose a random noun, (2) choose a random noun from a frequency bucket similar to the original noun’s frequency, and (3) select the nearest neighbor, the noun with frequency closest to the original. [sent-150, score-0.649]
52 1 A New Baseline: The analysis of unseen slots suggests a baseline that is surprisingly obvious but, to our knowledge, has not yet been evaluated. [sent-154, score-0.399]
53 Part of the reason is that early work in pseudo-word disambiguation explicitly tested only unseen pairs (footnote 4). [sent-155, score-0.319]
54 Our evaluation will include seen data, and since our analysis suggests that up to 90% is seen, a strong baseline should address this seen portion. [sent-156, score-0.291]
55 (2008) test pairs that fall below a mutual information threshold (might include some seen pairs), and Erk (2007) selects a subset of roles in FrameNet (Baker et al. [sent-159, score-0.206]
56 We propose a conditional probability baseline: P(n | v_d) = C(v_d, n) / C(v_d, ∗) if C(v_d, n) > 0, and 0 otherwise, where C(v_d, n) is the number of times the head word n was seen as an argument to the predicate v_d, and C(v_d, ∗) is the number of times v_d was seen with any argument. [sent-162, score-0.753]
57 Given a test (v_d, n) and its confounder (v_d, n′), choose n if P(n | v_d) > P(n′ | v_d), and n′ otherwise. [sent-163, score-0.534]
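A minimal sketch of this baseline and its decision rule; the class name is our own, and the random guess on ties reflects the guessing strategy mentioned in the implementation notes below.

```python
import random
from collections import Counter

class CondProbBaseline:
    """P(n | v_d) = C(v_d, n) / C(v_d, *) if C(v_d, n) > 0, else 0."""

    def __init__(self, train_pairs):
        # train_pairs: a list of (verb_slot, noun) tuples from training data.
        self.pair_counts = Counter(train_pairs)           # C(v_d, n)
        self.slot_counts = Counter()                      # C(v_d, *)
        for (verb_slot, _), count in self.pair_counts.items():
            self.slot_counts[verb_slot] += count

    def prob(self, verb_slot, noun):
        total = self.slot_counts[verb_slot]
        return self.pair_counts[(verb_slot, noun)] / total if total else 0.0

    def choose(self, verb_slot, noun, confounder):
        p_n, p_c = self.prob(verb_slot, noun), self.prob(verb_slot, confounder)
        if p_n == p_c:                 # no evidence either way: random guess
            return random.choice([noun, confounder])
        return noun if p_n > p_c else confounder
```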
58 (1999) appear to propose a very similar baseline for verb-noun selectional preferences, but the paper evaluates unseen data, and so the conditional probability model is not studied. [sent-167, score-0.678]
59 The model is based on the idea that the arguments of a particular verb slot tend to be similar to each other. [sent-184, score-0.203]
60 Given two potential arguments for a verb, the correct one should correlate higher with the arguments observed with the verb during training. [sent-185, score-0.251]
61 All verbs and nouns are stemmed, and the development and test documents were isolated from training. [sent-192, score-0.202]
62 A noun is represented by a vector of verb slots and the number of times it is observed filling each slot. [sent-198, score-0.223]
63 2 Varying the Confounder: We generated three different confounder sets based on word corpus frequency from the 41 test documents. [sent-204, score-0.622]
64 As motivated in section 4, we use the following approaches (a minimal sketch follows the list): • Random: choose a random confounder from the set of nouns that fall within some broad corpus frequency range. [sent-206, score-0.755]
65 • Bucket: given a test pair (v_d, n), choose the bucket in which n belongs and randomly select a confounder n′ from that bucket. [sent-210, score-0.68]
66 • Neighbor: sort all seen nouns by frequency and choose the confounder n′ that is the nearest neighbor of n with greater frequency. [sent-211, score-0.566]
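A sketch of the three selection strategies above, assuming nouns and their corpus frequencies are available in a dict; the bucketing scheme and the fallback behavior are simplifications for illustration.

```python
import random

def random_confounder(nouns, rng=random):
    """Random: any noun from the candidate pool (broad frequency-range
    filtering is omitted here)."""
    return rng.choice(nouns)

def neighbor_confounder(noun, freq):
    """Neighbor: the noun whose corpus frequency is the nearest neighbor
    above the original's in the frequency-sorted list."""
    ranked = sorted(freq, key=freq.get)          # ascending corpus frequency
    i = ranked.index(noun)
    return ranked[i + 1] if i + 1 < len(ranked) else ranked[i - 1]

def bucket_confounder(noun, freq, bucket_size=1000, rng=random):
    """Bucket: a random noun whose frequency rank falls in the same
    fixed-size bucket as the original noun's rank."""
    ranked = sorted(freq, key=freq.get)
    b = ranked.index(noun) // bucket_size
    bucket = [n for n in ranked[b * bucket_size:(b + 1) * bucket_size] if n != noun]
    return rng.choice(bucket) if bucket else neighbor_confounder(noun, freq)
```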
67 3 Model Implementation: None of the models can make a decision if they identically score both potential arguments (most often true when both arguments were not seen with the verb in training). [sent-213, score-0.363]
68 For the web baseline (reported as Google), we stemmed all words in the Google n-grams and counted every verb v and noun n that appear in Gigaword. [sent-216, score-0.262]
69 A noun’s representative vector consists of verb slots and the number of times the noun was seen in each slot. [sent-223, score-0.307]
70 We removed any verb slot not seen more than x times, where x varied based on all three factors: the dataset, confounder choice, and similarity metric. [sent-224, score-0.752]
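A sketch of this similarity-based scoring using Jaccard over verb-slot sets; averaging over the slot's observed arguments is a simplification of Erk's (2007) weighting scheme, so treat it as illustrative rather than a faithful reimplementation.

```python
def jaccard(slots_a, slots_b):
    """Jaccard similarity over the sets of verb slots two nouns fill."""
    a, b = set(slots_a), set(slots_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def similarity_score(candidate_slots, observed_arg_slots):
    """Score a candidate noun by its average similarity to the arguments
    observed with the verb slot in training."""
    if not observed_arg_slots:
        return 0.0
    sims = [jaccard(candidate_slots, arg) for arg in observed_arg_slots]
    return sum(sims) / len(sims)
```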
71 These consist of always choosing the Baseline if it returns an answer (not a guessed unseen answer), and then backing off to the Google/Erk result for Baseline unknowns. [sent-229, score-0.382]
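A sketch of this backoff combination, reusing the CondProbBaseline interface from the earlier sketch; fallback_choose stands in for either the Google n-gram or the Erk model.

```python
def backoff_choose(baseline, fallback_choose, verb_slot, noun, confounder):
    """Trust the conditional probability baseline whenever it has evidence
    for either candidate; otherwise back off to the smoothed/web model."""
    if baseline.prob(verb_slot, noun) > 0 or baseline.prob(verb_slot, confounder) > 0:
        return baseline.choose(verb_slot, noun, confounder)
    return fallback_choose(verb_slot, noun, confounder)
```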
72 7 Results. Results are given for the two dimensions: confounder choice and training size. [sent-231, score-0.563]
73 Figure 4 shows the performance change over the different confounder methods. [sent-233, score-0.497]
74 Each model follows the same progression: it performs extremely well on the random test set, worse on buckets, and the lowest on the nearest neighbor. [sent-235, score-0.195]
75 3% performance with random confounders is significantly better than a 50-50 random choice. [sent-243, score-0.293]
76 Random is overly optimistic, reporting performance far above more conservative (selective) confounder choices. [sent-259, score-0.596]
77 The Google n-gram backoff model is almost as good as backing off to the Erk smoothing model. [sent-274, score-0.211]
78 The overly optimistic performance on random data suggests using the nearest neighbor approach for experiments. [sent-283, score-0.504]
79 Nearest neighbor avoids evaluating on ‘easy’ datasets, and our baseline (at 79. [sent-284, score-0.256]
80 But perhaps just as important, the nearest neighbor approach facilitates the most reproducible results in experiments, since there is little ambiguity in how the confounder is selected. [sent-286, score-0.787]
81 Realistic Confounders: Despite its overoptimism, the random approach to confounder selection may be the correct approach in some circumstances. [sent-287, score-0.554]
82 For some tasks that need selectional preferences, random confounders may be more realistic. [sent-288, score-0.457]
83 It’s possible, for example, that the options in a PP-attachment task might be distributed more like the random than the nearest neighbor condition. [sent-289, score-0.347]
84 Absent such specific motivation, a nearest neighbor approach is the most conservative, and has the advantage of creating a reproducible experiment, whereas random choice can vary across designs. [sent-291, score-0.470]
85 We optimized argument cutoffs for each training size, but the model still appears to suffer from additional noise that the conditional probability baseline does not. [sent-295, score-0.24]
86 This may suggest that observing a test argument with a verb in training is more reliable than a smoothing model that compares all training arguments against that test example. [sent-296, score-0.435]
87 The only combination when Erk is better is when the training data includes just one year (one twelfth of the NYT section) and the confounder is chosen completely randomly. [Table: "Varying the Training Size"; columns for Bucket Frequency and Neighbor Frequency confounder choices] [sent-299, score-0.584]
88 The left and right tables represent two confounder choices: choose the confounder with frequency buckets, and choose by nearest frequency neighbor. [sent-306, score-1.345]
89 These results appear consistent with Erk (2007) because that work used the BNC corpus (the same size as one year of our data) and Erk chose confounders randomly within a broad frequency range. [sent-312, score-0.399]
90 Ultimately we have found that complex models for selectional preferences may not be necessary, depending on the task. [sent-317, score-0.341]
91 Smoothing approaches, with their higher computational cost, are best reserved for backing off when unseen data is encountered. [sent-318, score-0.413]
92 Further, analysis of the data shows that as more training data is made available, the seen examples make up a much larger portion of the test data. [sent-320, score-0.226]
93 Conditional probability is thus a very strong starting point if selectional preferences are an internal piece of a larger application, such as semantic role labeling or parsing. [sent-321, score-0.371]
94 It is crucially important to be clear during evaluations about how the confounder was generated. [sent-323, score-0.549]
95 We suggest the approach of sorting nouns by frequency and using a neighbor as the confounder. [sent-324, score-0.353]
96 This will also help avoid evaluations that produce overly optimistic results. [sent-325, score-0.209]
97 We have shown that the evaluation is strongly affected by confounder choice, suggesting a nearest frequency neighbor approach to provide the most reproducible performance and avoid overly optimistic results. [sent-328, score-1.064]
98 We presented a conditional probability baseline that is novel to the pseudo-word disambiguation task and strongly outperforms state-of-the-art models on entire documents. [sent-330, score-0.205]
99 We hope this provides a new reference point to the pseudo-word disambiguation task, and enables selectional preference models whose performance on the task similarly transfers to larger NLP applications. [sent-331, score-0.307]
100 Using the web to obtain frequencies for unseen bigrams. [sent-379, score-0.308]
wordName wordTfidf (topN-words)
[('confounder', 0.497), ('nyt', 0.313), ('vd', 0.299), ('unseen', 0.27), ('selectional', 0.221), ('erk', 0.21), ('neighbor', 0.189), ('confounders', 0.179), ('preferences', 0.12), ('apw', 0.119), ('seen', 0.112), ('buckets', 0.104), ('nearest', 0.101), ('optimistic', 0.091), ('arguments', 0.09), ('frequency', 0.088), ('pseudowords', 0.087), ('smoothing', 0.079), ('google', 0.077), ('nouns', 0.076), ('rare', 0.075), ('verb', 0.071), ('bnc', 0.07), ('backoff', 0.068), ('baseline', 0.067), ('overly', 0.066), ('bucket', 0.064), ('backing', 0.064), ('varying', 0.061), ('noun', 0.06), ('conditional', 0.059), ('random', 0.057), ('bergsma', 0.054), ('gale', 0.052), ('gigaword', 0.052), ('evaluations', 0.052), ('year', 0.05), ('verbs', 0.05), ('disambiguation', 0.049), ('choosing', 0.048), ('argument', 0.047), ('tests', 0.046), ('randomly', 0.045), ('jaccard', 0.045), ('nakov', 0.045), ('choices', 0.044), ('dagan', 0.043), ('tokens', 0.043), ('approximately', 0.042), ('slot', 0.042), ('gaustad', 0.04), ('portion', 0.04), ('documents', 0.039), ('guess', 0.039), ('web', 0.038), ('newspaper', 0.038), ('size', 0.037), ('choose', 0.037), ('training', 0.037), ('preference', 0.037), ('test', 0.037), ('counts', 0.036), ('tze', 0.035), ('pseudoword', 0.035), ('creating', 0.034), ('times', 0.034), ('conservative', 0.033), ('keller', 0.033), ('pairs', 0.033), ('surprisingly', 0.032), ('reproducible', 0.032), ('zapirain', 0.032), ('evaluates', 0.031), ('percentage', 0.031), ('framenet', 0.031), ('similarity', 0.03), ('wsd', 0.03), ('slots', 0.03), ('probability', 0.03), ('ball', 0.03), ('cosine', 0.03), ('sim', 0.029), ('lapata', 0.029), ('choice', 0.029), ('filling', 0.028), ('across', 0.028), ('count', 0.028), ('rooth', 0.027), ('proxy', 0.027), ('train', 0.026), ('predicate', 0.026), ('stemmed', 0.026), ('dominate', 0.026), ('det', 0.026), ('falls', 0.026), ('sch', 0.025), ('hundred', 0.025), ('significance', 0.025), ('fall', 0.024), ('cutoff', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
Author: Nathanael Chambers ; Dan Jurafsky
Abstract: This paper improves the use of pseudo-words as an evaluation framework for selectional preferences. While pseudo-words originally evaluated word sense disambiguation, they are now commonly used to evaluate selectional preferences. A selectional preference model ranks a set of possible arguments for a verb by their semantic fit to the verb. Pseudo-words serve as a proxy evaluation for these decisions. The evaluation takes an argument of a verb like drive (e.g. car), pairs it with an alternative word (e.g. car/rock), and asks a model to identify the original. This paper studies two main aspects of pseudo-word creation that affect performance results. (1) Pseudo-word evaluations often evaluate only a subset of the words. We show that selectional preferences should instead be evaluated on the data in its entirety. (2) Different approaches to selecting partner words can produce overly optimistic evaluations. We offer suggestions to address these factors and present a simple baseline that outperforms the state-of-the-art by 13% absolute on a newspaper domain.
2 0.22385301 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences
Author: Alan Ritter ; Mausam Mausam ; Oren Etzioni
Abstract: The computation of selectional preferences, the admissible argument values for a relation, is a well-known NLP task with broad applicability. We present LDA-SP, which utilizes LinkLDA (Erosheva et al., 2004) to model selectional preferences. By simultaneously inferring latent topics and topic distributions over relations, LDA-SP combines the benefits of previous approaches: like traditional class-based approaches, it produces human-interpretable classes describing each relation's preferences, but it is competitive with non-class-based methods in predictive power. We compare LDA-SP to several state-of-the-art methods, achieving an 85% increase in recall at 0.9 precision over mutual information (Erk, 2007). We also evaluate LDA-SP's effectiveness at filtering improper applications of inference rules, where we show substantial improvement over Pantel et al.'s system (Pantel et al., 2007).
3 0.2161238 158 acl-2010-Latent Variable Models of Selectional Preference
Author: Diarmuid O Seaghdha
Abstract: This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data.
4 0.14858226 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
Author: Stefan Thater ; Hagen Furstenau ; Manfred Pinkal
Abstract: We present a syntactically enriched vector model that supports the computation of contextualized semantic representations in a quasi-compositional fashion. It employs a systematic combination of first- and second-order context vectors. We apply our model to two different tasks and show that (i) it substantially outperforms previous work on a paraphrase ranking task, and (ii) achieves promising results on a word-sense similarity task; to our knowledge, it is the first time that an unsupervised method has been applied to this task.
5 0.13689537 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
Author: Omri Abend ; Ari Rappoport
Abstract: The core-adjunct argument distinction is a basic one in the theory of argument structure. The task of distinguishing between the two has strong relations to various basic NLP tasks such as syntactic parsing, semantic role labeling and subcategorization acquisition. This paper presents a novel unsupervised algorithm for the task that uses no supervised models, utilizing instead state-of-the-art syntactic induction algorithms. This is the first work to tackle this task in a fully unsupervised scenario.
6 0.11431456 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
7 0.099914193 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs
8 0.083134793 238 acl-2010-Towards Open-Domain Semantic Role Labeling
9 0.082755022 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
10 0.079552881 216 acl-2010-Starting from Scratch in Semantic Role Labeling
11 0.0777179 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
12 0.075319469 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates
13 0.071803421 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
14 0.066264845 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
15 0.063301884 151 acl-2010-Intelligent Selection of Language Model Training Data
16 0.062248453 85 acl-2010-Detecting Experiences from Weblogs
17 0.061079271 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
18 0.057647295 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection
19 0.057085089 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling
20 0.054835286 107 acl-2010-Exemplar-Based Models for Word Meaning in Context
topicId topicWeight
[(0, -0.188), (1, 0.1), (2, 0.064), (3, 0.032), (4, 0.11), (5, -0.001), (6, -0.015), (7, -0.01), (8, 0.075), (9, -0.095), (10, 0.032), (11, 0.07), (12, 0.106), (13, 0.045), (14, 0.206), (15, 0.032), (16, 0.047), (17, -0.027), (18, -0.001), (19, -0.08), (20, 0.072), (21, 0.09), (22, 0.069), (23, -0.03), (24, 0.005), (25, 0.005), (26, 0.053), (27, 0.029), (28, -0.083), (29, 0.035), (30, -0.086), (31, -0.082), (32, 0.115), (33, -0.015), (34, -0.071), (35, -0.017), (36, -0.132), (37, -0.045), (38, -0.043), (39, 0.074), (40, 0.109), (41, 0.138), (42, -0.049), (43, 0.011), (44, -0.045), (45, -0.056), (46, -0.074), (47, 0.035), (48, -0.115), (49, -0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.93057311 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
Author: Nathanael Chambers ; Dan Jurafsky
Abstract: This paper improves the use of pseudo-words as an evaluation framework for selectional preferences. While pseudo-words originally evaluated word sense disambiguation, they are now commonly used to evaluate selectional preferences. A selectional preference model ranks a set of possible arguments for a verb by their semantic fit to the verb. Pseudo-words serve as a proxy evaluation for these decisions. The evaluation takes an argument of a verb like drive (e.g. car), pairs it with an alternative word (e.g. car/rock), and asks a model to identify the original. This paper studies two main aspects of pseudo-word creation that affect performance results. (1) Pseudo-word evaluations often evaluate only a subset of the words. We show that selectional preferences should instead be evaluated on the data in its entirety. (2) Different approaches to selecting partner words can produce overly optimistic evaluations. We offer suggestions to address these factors and present a simple baseline that outperforms the state-of-the-art by 13% absolute on a newspaper domain.
2 0.74130917 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences
Author: Alan Ritter ; Mausam Mausam ; Oren Etzioni
Abstract: The computation of selectional preferences, the admissible argument values for a relation, is a well-known NLP task with broad applicability. We present LDA-SP, which utilizes LinkLDA (Erosheva et al., 2004) to model selectional preferences. By simultaneously inferring latent topics and topic distributions over relations, LDA-SP combines the benefits of previous approaches: like traditional class-based approaches, it produces human-interpretable classes describing each relation's preferences, but it is competitive with non-class-based methods in predictive power. We compare LDA-SP to several state-of-the-art methods, achieving an 85% increase in recall at 0.9 precision over mutual information (Erk, 2007). We also evaluate LDA-SP's effectiveness at filtering improper applications of inference rules, where we show substantial improvement over Pantel et al.'s system (Pantel et al., 2007).
3 0.72054583 158 acl-2010-Latent Variable Models of Selectional Preference
Author: Diarmuid O Seaghdha
Abstract: This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data.
4 0.696244 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs
Author: Barbara McGillivray
Abstract: We present a system that automatically induces Selectional Preferences (SPs) for Latin verbs from two treebanks by using Latin WordNet. Our method overcomes some of the problems connected with data sparseness and the small size of the input corpora. We also suggest a way to evaluate the acquired SPs on unseen events extracted from other Latin corpora.
5 0.60064608 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
Author: Omri Abend ; Ari Rappoport
Abstract: The core-adjunct argument distinction is a basic one in the theory of argument structure. The task of distinguishing between the two has strong relations to various basic NLP tasks such as syntactic parsing, semantic role labeling and subcategorization acquisition. This paper presents a novel unsupervised algorithm for the task that uses no supervised models, utilizing instead state-of-the-art syntactic induction algorithms. This is the first work to tackle this task in a fully unsupervised scenario.
6 0.59457976 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
7 0.50640988 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
8 0.44832706 238 acl-2010-Towards Open-Domain Semantic Role Labeling
9 0.43467957 117 acl-2010-Fine-Grained Genre Classification Using Structural Learning Algorithms
10 0.43204394 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation
11 0.4309482 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
12 0.42519358 256 acl-2010-Vocabulary Choice as an Indicator of Perspective
13 0.40587249 108 acl-2010-Expanding Verb Coverage in Cyc with VerbNet
14 0.39103884 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
15 0.38608065 85 acl-2010-Detecting Experiences from Weblogs
16 0.38291341 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection
17 0.36086047 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates
18 0.35473514 107 acl-2010-Exemplar-Based Models for Word Meaning in Context
19 0.35375804 126 acl-2010-GernEdiT - The GermaNet Editing Tool
20 0.35323134 263 acl-2010-Word Representations: A Simple and General Method for Semi-Supervised Learning
topicId topicWeight
[(14, 0.011), (25, 0.056), (42, 0.02), (44, 0.022), (59, 0.16), (63, 0.164), (73, 0.061), (76, 0.01), (78, 0.1), (80, 0.023), (83, 0.11), (84, 0.031), (97, 0.01), (98, 0.139)]
simIndex simValue paperId paperTitle
same-paper 1 0.88936234 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
Author: Nathanael Chambers ; Dan Jurafsky
Abstract: This paper improves the use of pseudo-words as an evaluation framework for selectional preferences. While pseudo-words originally evaluated word sense disambiguation, they are now commonly used to evaluate selectional preferences. A selectional preference model ranks a set of possible arguments for a verb by their semantic fit to the verb. Pseudo-words serve as a proxy evaluation for these decisions. The evaluation takes an argument of a verb like drive (e.g. car), pairs it with an alternative word (e.g. car/rock), and asks a model to identify the original. This paper studies two main aspects of pseudo-word creation that affect performance results. (1) Pseudo-word evaluations often evaluate only a subset of the words. We show that selectional preferences should instead be evaluated on the data in its entirety. (2) Different approaches to selecting partner words can produce overly optimistic evaluations. We offer suggestions to address these factors and present a simple baseline that outperforms the state-of-the-art by 13% absolute on a newspaper domain.
2 0.84597421 158 acl-2010-Latent Variable Models of Selectional Preference
Author: Diarmuid O Seaghdha
Abstract: This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data.
3 0.8398813 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
Author: Stefan Thater ; Hagen Furstenau ; Manfred Pinkal
Abstract: We present a syntactically enriched vector model that supports the computation of contextualized semantic representations in a quasi-compositional fashion. It employs a systematic combination of first- and second-order context vectors. We apply our model to two different tasks and show that (i) it substantially outperforms previous work on a paraphrase ranking task, and (ii) achieves promising results on a word-sense similarity task; to our knowledge, it is the first time that an unsupervised method has been applied to this task.
4 0.83582878 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
Author: Omri Abend ; Ari Rappoport
Abstract: The core-adjunct argument distinction is a basic one in the theory of argument structure. The task of distinguishing between the two has strong relations to various basic NLP tasks such as syntactic parsing, semantic role labeling and subcategorization acquisition. This paper presents a novel unsupervised algorithm for the task that uses no supervised models, utilizing instead state-of-the-art syntactic induction algorithms. This is the first work to tackle this task in a fully unsupervised scenario.
5 0.83034688 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
Author: Fei Huang ; Alexander Yates
Abstract: Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. In experiments, our novel system reduces error by 16% relative to the previous state of the art on out-of-domain text.
6 0.82828945 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
7 0.82766497 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences
8 0.81037092 169 acl-2010-Learning to Translate with Source and Target Syntax
9 0.80993229 130 acl-2010-Hard Constraints for Grammatical Function Labelling
10 0.80989516 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
11 0.80754769 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
12 0.80658853 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
13 0.80536306 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems
14 0.80533242 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
15 0.80426633 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
16 0.80425155 114 acl-2010-Faster Parsing by Supertagger Adaptation
17 0.80410564 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network
18 0.80075169 25 acl-2010-Adapting Self-Training for Semantic Role Labeling
19 0.80043203 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
20 0.80035776 248 acl-2010-Unsupervised Ontology Induction from Text