acl acl2010 acl2010-158 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Diarmuid O Seaghdha
Abstract: This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data.
Reference: text
sentIndex sentText sentNum sentScore
1 Latent variable models of selectional preference Diarmuid Ó Séaghdha University of Cambridge Computer Laboratory United Kingdom do242@cl. [sent-1, score-0.558]
2 Abstract This paper describes the application of so-called topic models to selectional preference induction. [sent-2, score-0.632]
3 Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. [sent-3, score-0.559]
4 Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data. [sent-4, score-0.187]
5 Recognising whether or not a selectional restriction is satisfied can be an important trigger for metaphorical interpretations (Wilks, 1978) and also plays a role in the time course of human sentence processing (Rayner et al. [sent-8, score-0.297]
6 A more relaxed notion of selectional preference captures the idea that certain classes of entities are more likely than others to fill a given argument slot of a predicate. [sent-10, score-0.759]
7 The notion of selectional preference is not restricted [sent-13, score-0.460]
8 The fundamental problem that selectional preference models must address is data sparsity: in many cases insufficient corpus data is available to reliably measure the plausibility of a predicate-argument pair by counting its observed frequency. [sent-17, score-0.826]
9 A rarely seen pair may be fundamentally implausible (a carrot laughed) or plausible but rarely expressed (a manservant laughed). [sent-18, score-0.233]
10 In general, it is beneficial to smooth plausibility estimates by integrating knowledge about the frequency of other, similar predicate-argument pairs. [sent-19, score-0.352]
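The sparsity problem described above can be illustrated with a minimal sketch. The toy observations and the helper name `mle_conditional` are assumptions made for illustration and are not taken from the paper; the point is only that an unsmoothed relative-frequency estimate gives zero to every unseen pair, so it cannot tell a plausible unseen pair from an implausible one.

```python
from collections import Counter

# Toy (predicate, argument) observations; purely illustrative, not BNC data.
observations = [
    ("laugh:V:subj", "man"),
    ("laugh:V:subj", "man"),
    ("laugh:V:subj", "child"),
    ("eat:V:dobj", "carrot"),
]

pair_counts = Counter(observations)
pred_counts = Counter(v for v, _ in observations)


def mle_conditional(n, v):
    """Relative-frequency estimate of P(n | v); zero for any unseen pair."""
    if pred_counts[v] == 0:
        return 0.0
    return pair_counts[(v, n)] / pred_counts[v]


# Both unseen pairs receive exactly the same estimate.
plausible_unseen = mle_conditional("manservant", "laugh:V:subj")
implausible_unseen = mle_conditional("carrot", "laugh:V:subj")
```

Smoothing with knowledge about similar pairs is what allows the two cases to be separated.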
11 This paper takes up tools (“topic models”) that have been proven successful in modelling document-word co-occurrences and adapts them to the task of selectional preference learning. [sent-21, score-0.596]
12 Advantages of these models include a well-defined generative model that handles sparse data well, the ability to jointly induce semantic classes and predicate-specific distributions over those classes, and the enhanced statistical strength achieved by sharing knowledge across predicates. [sent-22, score-0.357]
13 Section 2 surveys prior work on selectional preference modelling and on semantic applications of topic models. [sent-23, score-0.767]
14 1 Selectional preference learning The representation (and latterly, learning) of selectional preferences for verbs and other predicates has long been considered a fundamental problem in computational semantics (Resnik, 1993). [sent-32, score-0.641]
15 Many approaches to the problem use lexical taxonomies such as WordNet to identify the semantic classes that typically fill a particular argument slot for a predicate (Resnik, 1993; Clark and Weir, 2002; Schulte im Walde et al. [sent-33, score-0.593]
16 In this paper, however, we focus on methods that do not assume the availability of a comprehensive taxonomy but rather induce semantic classes automatically from a corpus of text. [sent-35, score-0.213]
17 (1999) introduced a model of selectional preference induction that casts the problem in a probabilistic latent-variable framework. [sent-38, score-0.510]
18 ’s model each observed predicate-argument pair is probabilistically generated from a latent variable, which is itself generated from an underlying distribution on variables. [sent-40, score-0.215]
19 The use of latent variables, which correspond to coherent clusters of predicate-argument interactions, allows probabilities to be assigned to predicate-argument pairs which have not previously been observed by the model. [sent-41, score-0.243]
20 The discovery of these predicate-argument clusters and the estimation of distributions on latent and observed variables are performed simultaneously via an Expectation Maximisation procedure. [sent-42, score-0.194]
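As a rough sketch of the latent-variable idea described above, the following code fits a model of the form P(v, n) = Σ_z P(z) P(v|z) P(n|z) with Expectation Maximisation. The factorisation is the usual reading of such a model and is an assumption here, since the original equations are not part of this extract; the function name `rooth_em`, the random initialisation and the iteration count are all hypothetical.

```python
import random


def rooth_em(pairs, num_classes, iterations=20, seed=0):
    """EM for P(v, n) = sum_z P(z) P(v|z) P(n|z) on a list of (v, n) pairs."""
    rng = random.Random(seed)
    preds = sorted({v for v, _ in pairs})
    args = sorted({n for _, n in pairs})

    def random_dist(keys):
        weights = {k: rng.random() + 0.1 for k in keys}
        total = sum(weights.values())
        return {k: w / total for k, w in weights.items()}

    p_z = [1.0 / num_classes] * num_classes
    p_v = [random_dist(preds) for _ in range(num_classes)]
    p_n = [random_dist(args) for _ in range(num_classes)]

    for _ in range(iterations):
        # E-step: accumulate expected counts under the posterior P(z | v, n).
        cz = [0.0] * num_classes
        cv = [dict.fromkeys(preds, 0.0) for _ in range(num_classes)]
        cn = [dict.fromkeys(args, 0.0) for _ in range(num_classes)]
        for v, n in pairs:
            joint = [p_z[z] * p_v[z][v] * p_n[z][n] for z in range(num_classes)]
            norm = sum(joint)
            for z in range(num_classes):
                post = joint[z] / norm
                cz[z] += post
                cv[z][v] += post
                cn[z][n] += post
        # M-step: re-estimate all distributions from the expected counts.
        total = sum(cz)
        for z in range(num_classes):
            p_z[z] = cz[z] / total
            if cz[z] > 0:
                p_v[z] = {v: c / cz[z] for v, c in cv[z].items()}
                p_n[z] = {n: c / cz[z] for n, c in cn[z].items()}

    def joint_prob(v, n):
        return sum(p_z[z] * p_v[z].get(v, 0.0) * p_n[z].get(n, 0.0)
                   for z in range(num_classes))

    return joint_prob
```

The returned function can score pairs that never occurred together in `pairs`, which is the property the text attributes to latent-variable models. Note that this sketch applies no smoothing.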
21 ’s latent variable approach, most directly in the model described in Section 3. [sent-44, score-0.219]
22 (2007) describe a corpus-driven smoothing model which is not probabilistic in nature but relies on similarity estimates from a “semantic space” model that identifies semantic similarity with closeness in a vector space of cooccurrences. [sent-47, score-0.231]
23 (2008) suggest learning selectional preferences in a discriminative way, by training a collection of SVM classifiers to recognise likely and unlikely arguments for predicates of interest. [sent-49, score-0.546]
24 They demonstrate that noisy counts from a Web search engine can yield estimates of plausibility for predicate-argument pairs that are superior to models learned from a smaller parsed corpus. [sent-51, score-0.501]
25 2 Topic modelling The task of inducing coherent semantic clusters is common to many research areas. [sent-55, score-0.307]
26 Formally seen, these are hierarchical Bayesian models which induce a set of latent variables or topics that are shared across documents. [sent-60, score-0.281]
27 The tools provided by this research are used in this paper as the building blocks of our selectional preference models. [sent-63, score-0.46]
28 (2007) integrate a model of random walks on the WordNet graph into an LDA topic model to build an unsupervised word sense disambiguation system. [sent-69, score-0.215]
29 (2007) demonstrate that topic models learned from document-word co-occurrences are good predictors of semantic association judgements by humans. [sent-75, score-0.309]
30 (2010) have also investigated the use of topic models for selectional preference learning. [sent-77, score-0.632]
31 Their goal is slightly different to ours in that they wish to model the probability of a binary predicate taking two specified arguments, i. [sent-78, score-0.288]
32 , P(n1, n2 |v), whereas we model the joint and conditional probabilities of a predicate taking a single specified argument. [sent-80, score-0.288]
33 The most likely explanation is that LinkLDA generates its two arguments independently, which may be suitable for distinct argument positions of a given predicate but is unsuitable when one of those “arguments” is in fact the predicate. [sent-83, score-0.482]
34 The models developed in this paper, though intended for semantic modelling, also bear some similarity to the internals of generative syntax models such as the “infinite tree” (Finkel et al. [sent-84, score-0.207]
35 1 Notation In the model descriptions below we assume a predicate vocabulary of V types, an argument vocabulary of N types and a relation vocabulary of R types. [sent-89, score-0.67]
36 Each predicate type is associated with a single relation; for example the predicate type eat:V:dobj (the direct object of the verb eat) is treated as distinct from eat:V:subj (the subject of the verb eat). [sent-90, score-0.476]
37 Each model has at least one vocabulary of Z arbitrarily labelled latent variables. [sent-92, score-0.232]
38 fzn is the number of observations where the latent variable z has been associated with the argument type n, fzv is the number of observations where z has been associated with the predicate type v and fzr is the number of observations where z has been associated with the relation r. [sent-93, score-0.965]
39 Finally, fz· is the total number of observations associated with z and f·v is the total number of observations containing the predicate v. [sent-94, score-0.434]
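The count notation above maps naturally onto a handful of count tables. The sketch below is one possible layout, assuming each observation carries a latent class label; the toy assignments and the variable names are illustrative and not taken from the paper.

```python
from collections import Counter

# Each observation: (latent class z, predicate v, argument n, relation r).
# The class assignments here are made up for illustration.
assigned = [
    (0, "eat:V:dobj", "apple", "V:dobj"),
    (0, "eat:V:dobj", "bread", "V:dobj"),
    (1, "sign:V:dobj", "contract", "V:dobj"),
    (0, "cook:V:dobj", "bread", "V:dobj"),
]

f_zn = Counter((z, n) for z, _, n, _ in assigned)  # class z with argument type n
f_zv = Counter((z, v) for z, v, _, _ in assigned)  # class z with predicate type v
f_zr = Counter((z, r) for z, _, _, r in assigned)  # class z with relation r
f_z = Counter(z for z, _, _, _ in assigned)        # f_z. : all observations for z
f_v = Counter(v for _, v, _, _ in assigned)        # f_.v : all observations for v
```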
40 The analogical move from modelling document-term cooccurrences to modelling predicate-argument cooccurrences is intuitive: we assume that each predicate is associated with a distribution over semantic classes (“topics”) and that these classes are shared across predicates. [sent-98, score-0.967]
41 The high-level “generative story” for the LDA selectional preference model is as follows: (1) For each predicate v, draw a multinomial distribution Θv over argument classes from a Dirichlet distribution with parameters α. [sent-99, score-1.364]
42 (2) For each argument class z, draw a multinomial distribution over argument types from a Dirichlet with parameters β. [sent-100, score-0.744]
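The generative story can be simulated directly. The sketch below follows steps (1) and (2); the final sampling step (draw a class from Θv, then an argument from Φz) is not in this extract and is assumed here to mirror standard LDA. Symmetric hyperparameters, the function names and the `per_predicate` parameter are illustrative choices.

```python
import random


def sample_dirichlet(rng, alphas):
    """Draw from a Dirichlet by normalising independent Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_lda(predicates, arguments, num_classes, alpha, beta,
                 per_predicate=5, seed=0):
    rng = random.Random(seed)
    # (2) One multinomial over argument types per argument class z.
    phi = [sample_dirichlet(rng, [beta] * len(arguments))
           for _ in range(num_classes)]
    data = []
    for v in predicates:
        # (1) One multinomial over argument classes per predicate v.
        theta_v = sample_dirichlet(rng, [alpha] * num_classes)
        for _ in range(per_predicate):
            # Assumed final step: pick a class from theta_v, then an argument.
            z = rng.choices(range(num_classes), weights=theta_v)[0]
            n = rng.choices(arguments, weights=phi[z])[0]
            data.append((v, n, z))
    return data
```

Because the Φz distributions are drawn once and reused for every predicate, the argument classes are shared across predicates, as the text describes.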
43 The first term in (2) can be seen as a smoothed estimate of the probability that class z produces the argument n; the second is a smoothed estimate of the probability that predicate v takes an argument belonging to class z. [sent-102, score-0.867]
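Equation (2) itself is not reproduced in this extract. A form consistent with the description of its two terms and with the count notation would be a sum over classes of (fzn + β) / (fz· + Nβ) multiplied by (fzv + αz) / (f·v + Σ αz'); this is a reconstruction and should be read as an assumption. The sketch below computes that quantity from dictionaries of counts; the function and argument names are hypothetical.

```python
def lda_predictive(n, v, f_zn, f_zv, f_z, f_v, alpha, beta, num_args):
    """Smoothed estimate of P(n | v), summed over the latent classes z.

    alpha is a list with one entry per class; beta is a single float.
    """
    alpha_sum = sum(alpha)
    total = 0.0
    for z in range(len(alpha)):
        # Smoothed probability that class z produces argument n.
        p_n_given_z = (f_zn.get((z, n), 0) + beta) / (f_z.get(z, 0) + num_args * beta)
        # Smoothed probability that predicate v takes an argument from class z.
        p_z_given_v = (f_zv.get((z, v), 0) + alpha[z]) / (f_v.get(v, 0) + alpha_sum)
        total += p_n_given_z * p_z_given_v
    return total
```

When a predicate or argument has few observations, the hyperparameters dominate both ratios, which matches the remark that smoothing is strongest for rarely seen items.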
44 One important point is that the smoothing effects of the Dirichlet priors on Θv and Φz are greatest for predicates and arguments that are rarely seen, reflecting an intuitive lack of certainty. [sent-103, score-0.234]
45 This model estimates predicate-argument probabilities conditional on a given predicate v; it cannot by itself provide joint probabilities P(n, v|r), which are needed for our plausibility evaluation. [sent-106, score-0.640]
46 Given a dataset of predicate-argument combinations and values for the hyperparameters α and β, the probability model is determined by the class assignment counts fzn and fzv. [sent-107, score-0.378]
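Since the model is determined by the class assignment counts, estimation amounts to finding good assignments. The sketch below uses the standard collapsed Gibbs update for LDA with symmetric hyperparameters. The exact update and sampling schedule used in the paper are not in this extract, so this is an illustrative stand-in, and `gibbs_lda` and its parameters are hypothetical names.

```python
import random
from collections import Counter


def gibbs_lda(pairs, num_classes, num_args, alpha, beta, sweeps=50, seed=0):
    """Collapsed Gibbs sampling of class assignments for (v, n) pairs."""
    rng = random.Random(seed)
    z_of = [rng.randrange(num_classes) for _ in pairs]
    f_zn, f_zv, f_z = Counter(), Counter(), Counter()
    for (v, n), z in zip(pairs, z_of):
        f_zn[(z, n)] += 1
        f_zv[(z, v)] += 1
        f_z[z] += 1
    for _ in range(sweeps):
        for i, (v, n) in enumerate(pairs):
            z = z_of[i]
            # Remove the current assignment from the counts.
            f_zn[(z, n)] -= 1
            f_zv[(z, v)] -= 1
            f_z[z] -= 1
            # Unnormalised conditional of each class given all other assignments.
            weights = [
                (f_zn[(k, n)] + beta) / (f_z[k] + num_args * beta)
                * (f_zv[(k, v)] + alpha)
                for k in range(num_classes)
            ]
            z = rng.choices(range(num_classes), weights=weights)[0]
            z_of[i] = z
            f_zn[(z, n)] += 1
            f_zv[(z, v)] += 1
            f_z[z] += 1
    return z_of, f_zn, f_zv, f_z
```

In practice one would discard early sweeps as burn-in and average predictions over several samples; those details are left out of this sketch.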
47 ’s (1999) selectional preference model, a latent variable is responsible for generating both the predicate and argument types of an observation. [sent-124, score-1.042]
48 The basic LDA model can be extended to capture this kind of predicate-argument interaction; the generative story for the resulting ROOTH-LDA model is as follows: (1) For each relation r, draw a multinomial distribution Θr over interaction classes from a Dirichlet distribution with parameters α. [sent-125, score-0.623]
49 (2) For each class z, draw a multinomial Φz over argument types from a Dirichlet distribution with parameters β and a multinomial Ψz over predicate types from a Dirichlet distribution with parameters γ. [sent-126, score-0.985]
50 (3) To generate an observation for r, draw a class z from Θr, then draw an argument type n from Φz and a predicate type v from Ψz. [sent-127, score-0.731]
51 Unlike LDA, ROOTH-LDA does model the joint probability P(n, v|r) of a predicate and argument co-occurring. [sent-129, score-0.463]
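Following the generative story, the joint probability under ROOTH-LDA factorises as a sum over classes of P(n|z) P(v|z) P(z|r). The sketch below plugs smoothed count ratios into that factorisation. The smoothed forms are assumed by analogy with the LDA case, because the corresponding equation is not in this extract, and all names are hypothetical.

```python
def rooth_lda_joint(n, v, r, f_zn, f_zv, f_zr, f_z, f_r,
                    alpha, beta, gamma, num_args, num_preds):
    """Smoothed estimate of P(n, v | r), summed over interaction classes z."""
    alpha_sum = sum(alpha)
    total = 0.0
    for z in range(len(alpha)):
        class_total = f_z.get(z, 0)
        p_n = (f_zn.get((z, n), 0) + beta) / (class_total + num_args * beta)
        p_v = (f_zv.get((z, v), 0) + gamma) / (class_total + num_preds * gamma)
        p_z = (f_zr.get((z, r), 0) + alpha[z]) / (f_r.get(r, 0) + alpha_sum)
        total += p_n * p_v * p_z
    return total
```

Here the argument and predicate are conditionally independent given the class, so all predicate-argument interaction has to be captured by the class itself.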
52 Further differences are that information about predicate-argument co-occurrence is only shared within a given interaction class rather than across the whole dataset and that the distribution Φz is not specific to the predicate v but rather to the relation r. [sent-130, score-0.481]
53 4 A “dual-topic” model In our third model, we attempt to combine the advantages of LDA and ROOTH-LDA by clustering arguments and predicates according to separate 2http: //mallet . [sent-133, score-0.192]
54 Each observation is generated by two latent variables rather than one, which potentially allows the model to learn more flexible interactions between arguments and predicates. [sent-137, score-0.247]
55 : (1) For each relation r, draw a multinomial distribution Ξr over predicate classes from a Dirichlet with parameters κ. [sent-138, score-0.687]
56 (2) For each predicate class c, draw a multinomial Ψc over predicate types and a multinomial Θc over argument classes from Dirichlets with parameters γ and α respectively. [sent-139, score-1.230]
57 (3) For each argument class z, draw a multinomial distribution Φz over argument types from a Dirichlet with parameters β. [sent-140, score-0.744]
58 (4) To generate an observation for r, draw a predicate class c from Ξr, a predicate type from Ψc, an argument class z from Θc and an argument type from Φz. [sent-141, score-1.156]
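The four steps of the dual-topic story can be simulated as follows. This is a minimal sketch that assumes symmetric Dirichlet hyperparameters and reads the source of the argument class z in step (4) as Θc, in line with step (2); the function names and the `per_relation` parameter are hypothetical.

```python
import random


def dirichlet(rng, size, concentration):
    """Symmetric Dirichlet draw via normalised Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_dual(relations, predicates, arguments, num_pred_classes,
                  num_arg_classes, kappa, gamma, alpha, beta,
                  per_relation=5, seed=0):
    rng = random.Random(seed)
    # (1) Xi_r: distribution over predicate classes for each relation.
    xi = {r: dirichlet(rng, num_pred_classes, kappa) for r in relations}
    # (2) Psi_c over predicate types and Theta_c over argument classes.
    psi = [dirichlet(rng, len(predicates), gamma) for _ in range(num_pred_classes)]
    theta = [dirichlet(rng, num_arg_classes, alpha) for _ in range(num_pred_classes)]
    # (3) Phi_z over argument types for each argument class.
    phi = [dirichlet(rng, len(arguments), beta) for _ in range(num_arg_classes)]
    data = []
    for r in relations:
        for _ in range(per_relation):
            # (4) c ~ Xi_r, v ~ Psi_c, z ~ Theta_c, n ~ Phi_z.
            c = rng.choices(range(num_pred_classes), weights=xi[r])[0]
            v = rng.choices(predicates, weights=psi[c])[0]
            z = rng.choices(range(num_arg_classes), weights=theta[c])[0]
            n = rng.choices(arguments, weights=phi[z])[0]
            data.append((r, c, v, z, n))
    return data
```

Each observation carries two latent labels (c and z), which is what distinguishes this model from the single-class LDA and ROOTH-LDA stories.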
59 Other approaches are possible (resampling argument and then predicate class assignments for each observation in turn, or sampling argument and predicate assignments together by blocked sampling), though from our experiments it does not seem that the choice of scheme makes a significant difference. [sent-143, score-1.150]
60 One popular method for evaluating selectional preference models is by testing the correlation between their predictions and human judgements of plausibility on a dataset of predicate-argument pairs. [sent-146, score-1.100]
61 In Section 5, we use two plausibility datasets to evaluate our models and compare to other previously published results. [sent-148, score-0.423]
62 During development we used the verb-noun plausibility dataset from Padó et al. [sent-153, score-0.356]
63 Table 1: Most probable words for sample semantic classes induced from verb-object observations three runs. [sent-169, score-0.317]
64 1 Induced semantic classes Table 1 shows sample semantic classes induced by models trained on the corpus of BNC verb-object co-occurrences. [sent-174, score-0.456]
65 LDA clusters nouns only, while ROOTH-LDA and ROOTH-EM learn classes that generate both nouns and verbs and DUAL-LDA clusters nouns and verbs separately. [sent-175, score-0.338]
66 The LDA clusters are generally sensible: class 0 is exemplified by agreement and contract and class 1 by information and datum. [sent-176, score-0.294]
67 , blocked sampling of argument and predicate classes), without overcoming this brittleness. [sent-182, score-0.49]
68 Unsurprisingly, ROOTH-EM’s classes have a similar feel to ROOTH-LDA; our general impression is that some of ROOTH-EM’s classes look even more coherent than the LDA-based models, presumably because it does not use priors to smooth its per-class distributions. [sent-183, score-0.393]
69 2 Comparison with Keller and Lapata (2003) Keller and Lapata (2003) collected a dataset of human plausibility judgements for three classes of grammatical relation: verb-object, noun-noun modification and adjective-noun modification. [sent-185, score-0.561]
70 The items in this dataset were not chosen to balance plausibility and implausibility (as in prior psycholinguistic experiments) but according to their corpus frequency, leading to a more realistic task. [sent-186, score-0.392]
71 30 predicates were selected for each relation; each predicate was matched with three arguments from different co-occurrence bands in the BNC, e. [sent-187, score-0.38]
72 Table 2: Results (Pearson r and Spearman ρ correlations) on Keller and Lapata’s (2003) plausibility data. Each predicate was also matched with three random arguments with which it does not co-occur in the BNC (e. [sent-190, score-0.616]
73 Following Keller and Lapata, we report Pearson correlation coefficients between log-transformed predicted frequencies and the gold-standard plausibility scores (which are already log-transformed). [sent-197, score-0.360]
74 We also report Spearman rank correlations except where we do not have the original predictions (the Web count models), for completeness and because the predictions of preference models may not be log-normally distributed as corpus counts are. [sent-198, score-0.507]
75 1 to facilitate the log transformation; it seems natural to take a zero prediction as a non-specific prediction of very low plausibility rather than a “missing value” as is done in other work (e. [sent-200, score-0.309]
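The evaluation protocol described above (Pearson correlation on log-transformed predictions, plus Spearman rank correlation) can be sketched with the standard library alone. The `floor` constant that guards against log(0) is a hypothetical stand-in and not the paper's smoothing value, which is truncated in this extract; the function names are likewise illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


def evaluate(predictions, gold_log_scores, floor=1e-10):
    # `floor` is a hypothetical guard against log(0), not the paper's constant.
    logged = [math.log(max(p, floor)) for p in predictions]
    return pearson(logged, gold_log_scores), spearman(predictions, gold_log_scores)
```

Spearman depends only on the ordering of the predictions, so it is unaffected by the log transform and by the choice of floor as long as the floor does not create new ties.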
76 For frequent predicate-argument pairs (Seen datasets), Web counts are clearly better; however, the BNC counts are unambiguously superior to LDA and ROOTH-LDA (whose predictions are based entirely on the generative model even for observed items) for the Seen verb-object data only. [sent-210, score-0.329]
77 As might be suspected from the mixing problems observed with DUAL-LDA, this model does not perform as well as LDA and ROOTH-LDA, though it does hold its own against the other selectional preference methods. [sent-211, score-0.51]
78 We hypothesise that the main reason for the superior numerical performance of the LDA models over EM is the principled smoothing provided by the use of Dirichlet priors, which has a small but discriminative effect on model predictions. [sent-229, score-0.252]
79 ’s model on unseen adjective-noun combinations, and significantly worse than the same model on seen adjective-noun data. [sent-232, score-0.220]
80 Latent variable models that use EM for inference can be very sensitive to the number of latent variables chosen. [sent-233, score-0.226]
81 For example, the performance of ROOTH-EM worsens quickly if the number of clusters is overestimated; for the Keller and Lapata datasets, settings above 50 classes lead to clear overfitting and a precipitous drop in Pearson correlation scores. [sent-234, score-0.235]
82 (2009) demonstrate that LDA is relatively insensitive to the choice of topic vocabulary size Z when the α and β hyperparameters are optimised appropriately during estimation. [sent-236, score-0.202]
83 ’s finding for document modelling transfers to selectional preference models; within the range Z = 50–200 performance remains at a roughly similar level. [sent-239, score-0.596]
84 (2008) propose a discriminative approach to preference learning. [sent-246, score-0.203]
85 As part of their evaluation, they compare their approach to a number of others, including that of Erk (2007), on a plausibility dataset collected by Holmes et al. [sent-247, score-0.356]
86 ’s model, trained on the 3GB AQUAINT corpus, is the only model reported to achieve perfect accuracy on distinguishing plausible from implausible arguments. [sent-253, score-0.188]
87 6 6 Conclusions and future work This paper has demonstrated how Bayesian techniques originally developed for modelling the topical structure of documents can be adapted to learn probabilistic models of selectional preference. [sent-256, score-0.49]
88 These models are especially effective for estimating plausibility of low-frequency items, thus distinguishing rarity from clear implausibility. [sent-257, score-0.366]
89 The models presented here derive their predictions by modelling predicate-argument plausibility through the intermediary of latent variables. [sent-258, score-0.725]
90 report that all plausible pairs were seen in their corpus; three were unseen in ours, as well as 12 of the implausible pairs. [sent-261, score-0.258]
91 strategy for frequent combinations, where corpus counts are probably reliable and plausibility judgements may be affected by lexical collocation effects. [sent-262, score-0.445]
92 One principled method for folding corpus counts into LDA-like models would be to use hierarchical priors, as in the n-gram topic model of Wallach (2006). [sent-263, score-0.313]
93 ’s (2008) discriminative model this could be done in a number of ways, including using the induced classes of a topic model as features for a discriminative classifier or using the discriminative classifier to produce additional high-quality training data from noisy unparsed text. [sent-265, score-0.498]
94 Comparison to plausibility judgements gives an intrinsic measure of model quality. [sent-266, score-0.44]
95 As mentioned in the Introduction, selectional preferences have many uses in NLP applications, and it will be interesting to evaluate the utility of Bayesian preference models in contexts such as semantic role labelling or human sentence processing modelling. [sent-267, score-0.64]
96 The probabilistic nature of topic models, coupled with an appropriate probabilistic task model, may facilitate the integration of class induction and task learning in a tight and principled way. [sent-268, score-0.265]
97 We also anticipate that latent variable models will prove effective for learning selectional preferences of semantic predicates (e. [sent-269, score-0.719]
98 I am grateful to Frank Keller and Mirella Lapata for sharing their plausibility data, and to Andreas Vlachos and the anonymous ACL and CoNLL reviewers for their helpful comments. [sent-273, score-0.309]
99 The effect of plausibility on eye movements in reading. [sent-374, score-0.309]
100 Combining EM training and the MDL principle for an automatic verb classification incorporating selectional preferences. [sent-397, score-0.297]
wordName wordTfidf (topN-words)
[('lda', 0.347), ('plausibility', 0.309), ('selectional', 0.297), ('predicate', 0.238), ('argument', 0.175), ('preference', 0.163), ('dirichlet', 0.161), ('pad', 0.16), ('keller', 0.147), ('modelling', 0.136), ('rooth', 0.135), ('latent', 0.128), ('classes', 0.124), ('topic', 0.115), ('class', 0.114), ('bnc', 0.104), ('draw', 0.102), ('wallach', 0.1), ('multinomial', 0.098), ('observations', 0.098), ('lapata', 0.098), ('predictions', 0.095), ('bergsma', 0.09), ('implausible', 0.089), ('judgements', 0.081), ('predicates', 0.073), ('unseen', 0.069), ('arguments', 0.069), ('spearman', 0.067), ('preferences', 0.067), ('clusters', 0.066), ('topics', 0.063), ('priors', 0.06), ('linklda', 0.058), ('cooccurrences', 0.058), ('laughed', 0.058), ('models', 0.057), ('datasets', 0.057), ('semantic', 0.056), ('counts', 0.055), ('vocabulary', 0.054), ('erk', 0.052), ('seen', 0.051), ('correlation', 0.051), ('allocation', 0.051), ('model', 0.05), ('xz', 0.05), ('plausible', 0.049), ('coherent', 0.049), ('eat', 0.048), ('pearson', 0.048), ('holmes', 0.047), ('dataset', 0.047), ('griffiths', 0.047), ('assignments', 0.046), ('relation', 0.045), ('meng', 0.045), ('burnin', 0.044), ('carrot', 0.044), ('fzn', 0.044), ('pbnc', 0.044), ('xzffz', 0.044), ('parameters', 0.043), ('estimates', 0.043), ('correlations', 0.042), ('steyvers', 0.042), ('variable', 0.041), ('sampling', 0.041), ('verbs', 0.041), ('trh', 0.04), ('discriminative', 0.04), ('correlated', 0.039), ('schulte', 0.039), ('walde', 0.039), ('aquaint', 0.039), ('laughs', 0.039), ('induced', 0.039), ('distribution', 0.037), ('superior', 0.037), ('bayesian', 0.037), ('generative', 0.037), ('items', 0.036), ('blei', 0.036), ('principled', 0.036), ('brody', 0.036), ('mimno', 0.036), ('plda', 0.036), ('blocked', 0.036), ('impression', 0.036), ('laugh', 0.036), ('zapirain', 0.036), ('rasp', 0.036), ('combinations', 0.035), ('hyperparameters', 0.033), ('hanna', 0.033), ('reisinger', 0.033), ('reimplemented', 0.033), ('burnard', 
0.033), ('induce', 0.033), ('smoothing', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999887 158 acl-2010-Latent Variable Models of Selectional Preference
Author: Diarmuid O Seaghdha
Abstract: This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data.
2 0.43075702 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences
Author: Alan Ritter ; Mausam Mausam ; Oren Etzioni
Abstract: The computation of selectional preferences, the admissible argument values for a relation, is a well-known NLP task with broad applicability. We present LDA-SP, which utilizes LinkLDA (Erosheva et al., 2004) to model selectional preferences. By simultaneously inferring latent topics and topic distributions over relations, LDA-SP combines the benefits of previous approaches: like traditional class-based approaches, it produces human-interpretable classes describing each relation’s preferences, but it is competitive with non-class-based methods in predictive power. We compare LDA-SP to several state-of-the-art methods achieving an 85% increase in recall at 0.9 precision over mutual information (Erk, 2007). We also evaluate LDA-SP’s effectiveness at filtering improper applications of inference rules, where we show substantial improvement over Pantel et al.’s system (Pantel et al., 2007).
Author: Mark Johnson
Abstract: This paper establishes a connection between two apparently very different kinds of probabilistic models. Latent Dirichlet Allocation (LDA) models are used as “topic models” to produce a low-dimensional representation of documents, while Probabilistic Context-Free Grammars (PCFGs) define distributions over trees. The paper begins by showing that LDA topic models can be viewed as a special kind of PCFG, so Bayesian inference for PCFGs can be used to infer Topic Models as well. Adaptor Grammars (AGs) are a hierarchical, non-parametric Bayesian extension of PCFGs. Exploiting the close relationship between LDA and PCFGs just described, we propose two novel probabilistic models that combine insights from LDA and AG models. The first replaces the unigram component of LDA topic models with multi-word sequences or collocations generated by an AG. The second extension builds on the first one to learn aspects of the internal structure of proper names.
4 0.24325421 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
Author: Omri Abend ; Ari Rappoport
Abstract: The core-adjunct argument distinction is a basic one in the theory of argument structure. The task of distinguishing between the two has strong relations to various basic NLP tasks such as syntactic parsing, semantic role labeling and subcategorization acquisition. This paper presents a novel unsupervised algorithm for the task that uses no supervised models, utilizing instead state-of-the-art syntactic induction algorithms. This is the first work to tackle this task in a fully unsupervised scenario.
5 0.22719832 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
Author: Yotaro Watanabe ; Masayuki Asahara ; Yuji Matsumoto
Abstract: In predicate-argument structure analysis, it is important to capture non-local dependencies among arguments and interdependencies between the sense of a predicate and the semantic roles of its arguments. However, no existing approach explicitly handles both non-local dependencies and semantic dependencies between predicates and arguments. In this paper we propose a structured model that overcomes the limitation of existing approaches; the model captures both types of dependencies simultaneously by introducing four types of factors including a global factor type capturing non-local dependencies among arguments and a pairwise factor type capturing local dependencies between a predicate and an argument. In experiments the proposed model achieved competitive results compared to the state-of-the-art systems without applying any feature selection procedure.
6 0.2161238 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
7 0.17867193 220 acl-2010-Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
8 0.17055383 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates
9 0.16720228 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
10 0.1657971 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning
11 0.14976628 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling
12 0.1494891 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
13 0.13505627 216 acl-2010-Starting from Scratch in Semantic Role Labeling
14 0.13173755 79 acl-2010-Cross-Lingual Latent Topic Extraction
15 0.12726063 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
16 0.12391918 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
17 0.12248739 238 acl-2010-Towards Open-Domain Semantic Role Labeling
18 0.11999527 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection
19 0.11825049 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining
20 0.11655281 8 acl-2010-A Hybrid Hierarchical Model for Multi-Document Summarization
topicId topicWeight
[(0, -0.286), (1, 0.188), (2, 0.167), (3, 0.075), (4, 0.118), (5, 0.004), (6, -0.054), (7, -0.085), (8, 0.18), (9, -0.294), (10, 0.031), (11, 0.005), (12, 0.244), (13, 0.105), (14, 0.289), (15, 0.013), (16, 0.02), (17, 0.001), (18, -0.08), (19, -0.17), (20, -0.047), (21, 0.017), (22, 0.032), (23, -0.048), (24, -0.058), (25, 0.081), (26, 0.073), (27, 0.017), (28, -0.093), (29, 0.062), (30, -0.033), (31, -0.06), (32, -0.024), (33, 0.019), (34, -0.144), (35, 0.021), (36, -0.029), (37, -0.019), (38, 0.06), (39, 0.037), (40, 0.04), (41, 0.052), (42, 0.014), (43, -0.021), (44, -0.012), (45, 0.005), (46, -0.025), (47, 0.023), (48, -0.058), (49, 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.96052319 158 acl-2010-Latent Variable Models of Selectional Preference
Author: Diarmuid O Seaghdha
Abstract: This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data.
2 0.90802795 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences
Author: Alan Ritter ; Mausam Mausam ; Oren Etzioni
Abstract: The computation of selectional preferences, the admissible argument values for a relation, is a well-known NLP task with broad applicability. We present LDA-SP, which utilizes LinkLDA (Erosheva et al., 2004) to model selectional preferences. By simultaneously inferring latent topics and topic distributions over relations, LDA-SP combines the benefits of previous approaches: like traditional class-based approaches, it produces human-interpretable classes describing each relation’s preferences, but it is competitive with non-class-based methods in predictive power. We compare LDA-SP to several state-of-the-art methods achieving an 85% increase in recall at 0.9 precision over mutual information (Erk, 2007). We also evaluate LDA-SP’s effectiveness at filtering improper applications of inference rules, where we show substantial improvement over Pantel et al.’s system (Pantel et al., 2007).
3 0.71916658 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
Author: Nathanael Chambers ; Dan Jurafsky
Abstract: This paper improves the use of pseudo-words as an evaluation framework for selectional preferences. While pseudo-words originally evaluated word sense disambiguation, they are now commonly used to evaluate selectional preferences. A selectional preference model ranks a set of possible arguments for a verb by their semantic fit to the verb. Pseudo-words serve as a proxy evaluation for these decisions. The evaluation takes an argument of a verb like drive (e.g. car), pairs it with an alternative word (e.g. car/rock), and asks a model to identify the original. This paper studies two main aspects of pseudo-word creation that affect performance results. (1) Pseudo-word evaluations often evaluate only a subset of the words. We show that selectional preferences should instead be evaluated on the data in its entirety. (2) Different approaches to selecting partner words can produce overly optimistic evaluations. We offer suggestions to address these factors and present a simple baseline that outperforms the state-of-the-art by 13% absolute on a newspaper domain.
4 0.63277072 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
Author: Omri Abend ; Ari Rappoport
Abstract: The core-adjunct argument distinction is a basic one in the theory of argument structure. The task of distinguishing between the two has strong relations to various basic NLP tasks such as syntactic parsing, semantic role labeling and subcategorization acquisition. This paper presents a novel unsupervised algorithm for the task that uses no supervised models, utilizing instead state-of-the-art syntactic induction algorithms. This is the first work to tackle this task in a fully unsupervised scenario.
Author: Mark Johnson
Abstract: This paper establishes a connection between two apparently very different kinds of probabilistic models. Latent Dirichlet Allocation (LDA) models are used as “topic models” to produce a low-dimensional representation of documents, while Probabilistic Context-Free Grammars (PCFGs) define distributions over trees. The paper begins by showing that LDA topic models can be viewed as a special kind of PCFG, so Bayesian inference for PCFGs can be used to infer Topic Models as well. Adaptor Grammars (AGs) are a hierarchical, non-parametric Bayesian extension of PCFGs. Exploiting the close relationship between LDA and PCFGs just described, we propose two novel probabilistic models that combine insights from LDA and AG models. The first replaces the unigram component of LDA topic models with multi-word sequences or collocations generated by an AG. The second extension builds on the first one to learn aspects of the internal structure of proper names.
6 0.56596738 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
7 0.55077505 79 acl-2010-Cross-Lingual Latent Topic Extraction
8 0.5409286 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates
9 0.50290245 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs
10 0.49811137 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning
11 0.49217322 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
12 0.45795542 238 acl-2010-Towards Open-Domain Semantic Role Labeling
13 0.43445197 34 acl-2010-Authorship Attribution Using Probabilistic Context-Free Grammars
14 0.41630036 216 acl-2010-Starting from Scratch in Semantic Role Labeling
15 0.41477793 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
16 0.40500006 8 acl-2010-A Hybrid Hierarchical Model for Multi-Document Summarization
17 0.40030038 220 acl-2010-Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
18 0.3904787 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection
19 0.37428397 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
20 0.37306842 256 acl-2010-Vocabulary Choice as an Indicator of Perspective
topicId topicWeight
[(4, 0.019), (7, 0.015), (14, 0.015), (25, 0.06), (30, 0.084), (42, 0.019), (44, 0.026), (59, 0.103), (71, 0.01), (73, 0.066), (76, 0.015), (78, 0.157), (80, 0.012), (83, 0.111), (84, 0.065), (97, 0.021), (98, 0.116)]
simIndex simValue paperId paperTitle
same-paper 1 0.91786182 158 acl-2010-Latent Variable Models of Selectional Preference
Author: Diarmuid O Seaghdha
Abstract: This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data.
2 0.91289008 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences
Author: Alan Ritter ; Mausam Mausam ; Oren Etzioni
Abstract: The computation of selectional preferences, the admissible argument values for a relation, is a well-known NLP task with broad applicability. We present LDA-SP, which utilizes LinkLDA (Erosheva et al., 2004) to model selectional preferences. By simultaneously inferring latent topics and topic distributions over relations, LDA-SP combines the benefits of previous approaches: like traditional class-based approaches, it produces human-interpretable classes describing each relation’s preferences, but it is competitive with non-class-based methods in predictive power. We compare LDA-SP to several state-of-the-art methods, achieving an 85% increase in recall at 0.9 precision over mutual information (Erk, 2007). We also evaluate LDA-SP’s effectiveness at filtering improper applications of inference rules, where we show substantial improvement over Pantel et al.’s system (Pantel et al., 2007).
3 0.91231155 229 acl-2010-The Influence of Discourse on Syntax: A Psycholinguistic Model of Sentence Processing
Author: Amit Dubey
Abstract: Probabilistic models of sentence comprehension are increasingly relevant to questions concerning human language processing. However, such models are often limited to syntactic factors. This paper introduces a novel sentence processing model that consists of a parser augmented with a probabilistic logic-based model of coreference resolution, which allows us to simulate how context interacts with syntax in a reading task. Our simulations show that a Weakly Interactive cognitive architecture can explain data which had been provided as evidence for the Strongly Interactive hypothesis.
4 0.90401924 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
Author: Stefan Thater ; Hagen Furstenau ; Manfred Pinkal
Abstract: We present a syntactically enriched vector model that supports the computation of contextualized semantic representations in a quasi-compositional fashion. It employs a systematic combination of first- and second-order context vectors. We apply our model to two different tasks and show that (i) it substantially outperforms previous work on a paraphrase ranking task, and (ii) achieves promising results on a word-sense similarity task; to our knowledge, it is the first time that an unsupervised method has been applied to this task.
5 0.88237917 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling
Author: Hector-Hugo Franco-Penya
Abstract: “Tree SRL system” is a supervised Semantic Role Labelling system based on a tree-distance algorithm and a simple k-NN implementation. The novelty of the system lies in comparing the sentences as tree structures with multiple relations instead of extracting vectors of features for each relation and classifying them. The system was tested on the English CoNLL-2009 shared task data set, where it obtained 79% accuracy.
6 0.87084568 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
7 0.86545539 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
8 0.85310435 65 acl-2010-Complexity Metrics in an Incremental Right-Corner Parser
9 0.8454293 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
10 0.84047031 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates
11 0.83520198 130 acl-2010-Hard Constraints for Grammatical Function Labelling
12 0.83385938 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information
13 0.83055687 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
14 0.82995844 71 acl-2010-Convolution Kernel over Packed Parse Forest
15 0.82661057 107 acl-2010-Exemplar-Based Models for Word Meaning in Context
16 0.8244527 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning
17 0.82369804 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
18 0.82188821 59 acl-2010-Cognitively Plausible Models of Human Language Processing
19 0.81765127 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
20 0.81684405 248 acl-2010-Unsupervised Ontology Induction from Text