acl acl2010 acl2010-184 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Fei Huang ; Alexander Yates
Abstract: Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. In experiments, our novel system reduces error by 16% relative to the previous state of the art on out-of-domain text.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. [sent-5, score-0.212]
2 Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. [sent-6, score-0.446]
3 We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. [sent-7, score-0.259]
4 We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. [sent-8, score-0.449]
5 1 Introduction In recent semantic role labeling (SRL) competitions such as the shared tasks of CoNLL 2005 and CoNLL 2008, supervised SRL systems have been trained on newswire text, and then tested on both an in-domain test set (Wall Street Journal text) and an out-of-domain test set (fiction). [sent-10, score-0.412]
6 Yet the baseline from CoNLL 2005 suggests that the fiction texts are actually easier than the newswire texts. [sent-12, score-0.212]
7 Building on recent efforts in domain adaptation, we develop unsupervised techniques for learning new representations of text. [sent-22, score-0.234]
8 Using latent-variable language models, we learn representations of texts that provide novel kinds of features to our supervised learning algorithms. [sent-23, score-0.302]
9 Sections 4, 5, and 6 describe our SRL system: first, how we identify predicates in open-domain text; then, how our baseline technique identifies and classifies arguments; and finally, how we learn representations for improving argument identification and classification on out-of-domain text. [sent-29, score-0.24]
11 Typical representations in SRL and NLP use features of the local context to produce a representation. [sent-44, score-0.227]
12 For instance, one dimension of a traditional representation R might be +1 if the instance contains the word “bank” as the head of a noun-phrase chunk that occurs before the predicate in the sentence, and 0 otherwise. [sent-45, score-0.422]
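To make this concrete, here is a minimal sketch of such a binary feature function. The instance encoding (an np_chunks list with head/start fields and a predicate index) is a hypothetical structure, not the paper's; this and the later code blocks are illustrative Python sketches, not the authors' implementation.

    def bank_feature(instance):
        # +1 if "bank" heads a noun-phrase chunk occurring before the
        # predicate in the sentence, 0 otherwise (one dimension of R).
        for chunk in instance.np_chunks:  # assumed list of NP chunks
            if chunk.head == "bank" and chunk.start < instance.predicate_index:
                return 1
        return 0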
13 In our recent work (Huang and Yates, 2009) we show how to build systems that learn new representations for open-domain NLP using latent-variable language models like Hidden Markov Models (HMMs). [sent-48, score-0.233]
14 The Viterbi algorithm (Rabiner, 1989) can then be used to produce the optimal sequence of latent states si for a given instance x. [sent-59, score-0.337]
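The Viterbi recurrence itself is standard; below is a self-contained log-space sketch over a trained HMM. The parameters log_pi, log_A, log_B are assumed to come from Baum-Welch training on unlabeled text; this is a textbook implementation (Rabiner, 1989), not the authors' code.

    import numpy as np

    def viterbi(obs, log_pi, log_A, log_B):
        # obs: token indices; log_pi: (K,) initial log-probs;
        # log_A: (K, K) transition log-probs; log_B: (K, V) emission log-probs.
        T, K = len(obs), len(log_pi)
        delta = np.empty((T, K))            # best log-prob ending in state k at step t
        back = np.zeros((T, K), dtype=int)  # argmax predecessors
        delta[0] = log_pi + log_B[:, obs[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] + log_A  # prev state -> current state
            back[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
        states = [int(delta[-1].argmax())]
        for t in range(T - 1, 0, -1):
            states.append(int(back[t, states[-1]]))
        return states[::-1]                 # optimal latent states s_i for instance x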
15 These representations satisfy the criteria for open-domain representations: first, they are useful in making predictions on the training text, because the HMM latent states categorize tokens according to distributional similarity. [sent-63, score-0.308]
16 3 Experimental Setup We test our open-domain semantic role labeling system using data from the CoNLL 2005 shared task (Carreras and Màrquez, 2005). [sent-66, score-0.265]
17 Every sentence in the dataset is automatically annotated by a number of NLP pipeline systems, including part-of-speech (POS) tags, phrase chunk labels (Carreras and Màrquez, 2003), named-entity tags, and full parse information from multiple parsers. [sent-73, score-0.26]
18 These pipeline systems are important for generating features for SRL, and one key reason for the poor performance of SRL systems on the Brown corpus is that the pipeline systems themselves perform worse. [sent-74, score-0.26]
19 They use a discriminative reranking approach to jointly predict the best set of argument boundaries and the best set of argument labels for a predicate. [sent-87, score-0.402]
20 Owing to the established difficulty of the Brown test set and the different domains of the Brown test and WSJ training data, this dataset makes for an excellent testbed for open-domain semantic role labeling. [sent-91, score-0.328]
21 While this task is almost trivial in the WSJ test set, where all but two out of over 5000 predicates can be observed in the training data, it is significantly more difficult in an open-domain setting. [sent-93, score-0.262]
22 6.1% of the predicates do not appear in the training data, and 11.8% of the predicates appear at most twice in the training data (cf. 5% of the WSJ test predicates that appear at most twice in training). [sent-95, score-0.272]
25 Table 1: Using HMM features in predicate identification reduces error in out-of-domain tests by 34%. [sent-118, score-0.473]
26 There were 831 predicates in total; 51 never appeared in training and 98 appeared at most twice. [sent-122, score-0.276]
27 Moreover, words that appear as predicates in training may not be predicates in the test set. [sent-123, score-0.43]
28 In an open-domain setting, therefore, we cannot rely solely on a catalog of predicates from the training data. [sent-124, score-0.214]
29 To address the task of open-domain predicate identification, we construct a Conditional Random Field (CRF) (Lafferty et al., 2001). [sent-125, score-0.227]
30 We use words, POS tags, chunk labels, and the predicate label at the preceding and following nodes as features for our Baseline system. [sent-128, score-0.547]
31 To learn an open-domain representation, we then trained an 80-state HMM on the unlabeled texts of the training and Brown test data, and used the Viterbi-optimal states of each word as categorical features. [sent-129, score-0.375]
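A minimal sketch of such a token-level CRF using the off-the-shelf sklearn-crfsuite package (the paper used a separate CRF implementation, per its footnote; the feature names, the hmm field, and the train_sents/train_labels variables here are illustrative assumptions):

    import sklearn_crfsuite  # pip install sklearn-crfsuite

    def token_feats(sent, i):
        # sent: list of (word, pos, chunk, hmm_state) tuples for one sentence.
        word, pos, chunk, hmm = sent[i]
        feats = {"w": word, "pos": pos, "chunk": chunk, "hmm": hmm}
        if i > 0:
            feats["w-1"] = sent[i - 1][0]
        if i + 1 < len(sent):
            feats["w+1"] = sent[i + 1][0]
        return feats

    X = [[token_feats(s, i) for i in range(len(s))] for s in train_sents]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
    crf.fit(X, train_labels)  # per-token labels: predicate vs. not

The dependence on the predicate label at the preceding and following nodes is captured by the linear-chain CRF's transition features rather than by explicit feature functions here.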
32 For predicates that never or rarely appear in training, the HMM features increase F1 by 4. [sent-131, score-0.31]
33 In all subsequent experiments, we fall back on the standard evaluation in which it is assumed that the boundaries of the predicate are given. [sent-137, score-0.227]
34 5 Semantic Role Labeling with HMM-based Representations Following standard practice, we divide the SRL task into two parts: argument identification and argument classification. (1Available from http://sourceforge.) [sent-139, score-0.291]
35 During argument identification, the system must label each token with labels that indicate either the beginning or interior of an argument (B-Arg or I-Arg), or a label that indicates the token is not part of an argument (O-Arg). [sent-142, score-0.871]
36 During argument classification, the system labels each token that is part of an argument with a class label, such as Arg0 or ArgM. [sent-143, score-0.553]
37 Following argument classification, multi-word arguments may have different classification labels for each token. [sent-144, score-0.342]
38 During argument identification we use the features below to predict the label Ai for token wi: • words: wi, wi−1, and wi+1 • parts of speech (POS): POS tags ti, ti−1, and ti+1 • chunk labels (e.g., …) [sent-155, score-0.664]
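A sketch of this baseline feature extractor for argument identification (aligned tokens/pos/chunks sequences are assumed inputs):

    def arg_id_feats(tokens, pos, chunks, i):
        # Features for predicting A_i in {B-Arg, I-Arg, O-Arg} for token w_i.
        feats = {"w0": tokens[i], "t0": pos[i], "c0": chunks[i]}
        if i > 0:
            feats["w-1"], feats["t-1"] = tokens[i - 1], pos[i - 1]
        if i + 1 < len(tokens):
            feats["w+1"], feats["t+1"] = tokens[i + 1], pos[i + 1]
        return feats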
39 1 Incorporating HMM-based Representations As a first step towards an open-domain representation, we use an HMM with 80 latent state values, trained on the unlabeled text of the training and test sets, to produce Viterbi-optimal state values si for every token in the corpus. [sent-161, score-0.538]
40 2 Path Features Despite all of the features above, the SRL system has very little information to help it determine the syntactic relationship between a target predicate and a potential argument. [sent-164, score-0.348]
41 For instance, these baseline features provide only crude distance information to distinguish between multiple arguments that follow a predicate, and they make it difficult to correctly identify clause arguments or arguments that appear far from the predicate. [sent-165, score-0.501]
42 As a step in this direction, we introduce path features: features for the sequence of tokens between a word and the predicate. [sent-167, score-0.24]
43 In standard SRL systems, these path features usually consist of a sequence of constituent parse nodes representing the shortest path through the parse tree between a word and the predicate (Gildea and Jurafsky, 2002). [sent-184, score-0.529]
44 We use four types of paths: word paths, POS paths, chunk paths, and HMM state paths. [sent-186, score-0.236]
45 Given an input sentence labeled with POS tags and chunks, we construct path features for a token wi by concatenating the words (or tags or chunk labels) between wi and the predicate. [sent-187, score-0.565]
46 For example, in the sentence “The HIV infection rate is expected to peak in 2010,” the word path between “rate” and the predicate “peak” would be “is expected to”, and the POS path would be “VBZ VBD TO.” [sent-188, score-0.445]
47 Since word, POS, and chunk paths are all subject to data sparsity for arguments that are far from the predicate, we build less-sparse path features by using paths of HMM states. [sent-189, score-0.86]
48 If we use a reasonable number of HMM states, each category label is much more common in the training data than the average word, and paths containing the HMM states should be much less sparse than word paths, and even chunk paths. [sent-190, score-0.567]
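A sketch of building all four path types at once (layers is an assumed dict mapping a layer name to its per-token sequence):

    def path_feats(layers, i, pred_i):
        # Concatenate each layer's values over the tokens strictly between
        # w_i and the predicate; one feature string per layer.
        lo, hi = (i + 1, pred_i) if i < pred_i else (pred_i + 1, i)
        return {name + "_path": " ".join(map(str, seq[lo:hi]))
                for name, seq in layers.items()}

    # For "The HIV infection rate is expected to peak in 2010", the word
    # path between "rate" and "peak" is "is expected to" and the POS path
    # is "VBZ VBD TO"; an HMM-state layer yields the less-sparse state paths.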
49 We call the result of adding path features to our feature set the Baseline+HMM+Paths (BL) system. [sent-192, score-0.226]
50 As with the HMM models above, the latent states for word spans can be thought of as probabilistic categories for the spans. [sent-200, score-0.485]
51 And like the HMM models, we can turn the word span models into representations by using the state value for a span as a feature in our supervised SRL system. [sent-201, score-0.617]
52 Unlike path features, the features from our models of word spans consist of a single latent state value rather than a concatenation of state values, and as a consequence they tend to be much less sparse in the training data. [sent-202, score-0.812]
53 1 Span-HMM Representations We build our latent-variable models of word spans using variations of Hidden Markov Models, which we call Span-HMMs. [sent-204, score-0.303]
54 Each Span-HMM behaves just like a regular HMM, except that it includes one node, called a span node, that can generate an entire span rather than a single word. [sent-206, score-0.328]
55 For instance, in the Span-HMM of Figure 1, node y5 is a span node that generates a span of length 3: “is expected to. [sent-207, score-0.442]
56 That is, at test time, we generate a Span-HMM feature for word wj by constructing a Span-HMM that has a span node for the sequence of words between wj and the predicate. [sent-209, score-0.269]
57 We determine the Viterbi-optimal state of this span node, and use that state as the value of the new feature. [sent-210, score-0.306]
58 In our example in Figure 1, the value of span node y5 is used as a feature for the token “rate”, since y5 generates the sequence of words between “rate” and the predicate “peak.” [sent-211, score-0.507]
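A sketch of this test-time decoding: the tokens between w_j and the predicate are collapsed into one span observation, and the Viterbi state assigned to that position is the feature value. Here span_log_b, a lookup from a span to its (K,) emission log-probabilities, is an assumed interface standing in for the Baum-Welch-trained span emissions described below.

    import numpy as np

    def span_node_feature(obs, j, pred_i, log_pi, log_A, log_B, span_log_b):
        # Assumes at least one token lies strictly between w_j and the predicate.
        lo, hi = (j + 1, pred_i) if j < pred_i else (pred_i + 1, j)
        emit = [log_B[:, o] for o in obs[:lo]]      # regular nodes before the span
        emit.append(span_log_b(tuple(obs[lo:hi])))  # one span node for the whole span
        emit += [log_B[:, o] for o in obs[hi:]]     # regular nodes after the span
        # standard Viterbi over per-position emission vectors
        delta, back = log_pi + emit[0], []
        for e in emit[1:]:
            scores = delta[:, None] + log_A
            back.append(scores.argmax(axis=0))
            delta = scores.max(axis=0) + e
        k = int(delta.argmax())
        for b in reversed(back[lo:]):  # backtrack only down to the span node
            k = int(b[k])
        return k                       # Viterbi-optimal state of the span node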
59 Notice that by using Span-HMMs to provide these features, we have condensed all paths in our data into a small number of categorical values. [sent-212, score-0.218]
60 Whereas there is a huge number of variations among the spans themselves, we can constrain the number of categories for the Span-HMM states to a reasonable number, such that each category is likely to appear often in the training data. [sent-213, score-0.447]
61 The value of each Span-HMM state then represents a cluster of spans with similar delimiting words; some clusters will correlate with spans between predicates and arguments, and others with spans that do not connect predicates and arguments. [sent-214, score-1.13]
62 First, we take every sentence S in our training data and generate the set Spans(S) of all valid spans in the sentence. [sent-219, score-0.287]
63 For efficiency’s sake, we use only spans of length less than 15; approximately 95% of the arguments in our dataset were within 15 words of the predicate, so even with this restriction we are able to supply features for nearly all valid arguments. [sent-220, score-0.411]
64 The second step of our training procedure is to create a separate data point for each span of S. [sent-221, score-0.21]
65 For each span t ∈ Spans(S), we construct a Span-HMM with a regular node generating each element of S, except that a span node generates all of t. [sent-222, score-0.385]
66 Intuitively, running Baum-Welch over this data means that a span node with state k will be likely to generate two spans t1 and t2 if t1 and t2 tend to appear in similar contexts. [sent-224, score-0.591]
67 Thus, certain values of k will tend to appear for spans between predicates and arguments, and others will tend to appear between predicates and non-arguments. [sent-226, score-0.693]
68 This makes the value k informative for both argument identification and argument classification. [sent-227, score-0.492]
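A sketch of this data-point construction (each yielded sequence is one Baum-Welch training instance in which a single span is collapsed into one unit; Baum-Welch training itself is not shown):

    def span_instances(sent, max_len=14):
        # One instance per valid span t in Spans(S): regular nodes generate
        # the other tokens, the span node generates all of t. Spans of
        # length 15 or more are skipped, as in the paper.
        for start in range(len(sent)):
            for end in range(start + 1, min(start + max_len, len(sent)) + 1):
                yield sent[:start] + [tuple(sent[start:end])] + sent[end:]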
69 Since there are millions of different spans in our data, a straightforward implementation would require millions of parameters for each latent state of the Span-HMM. [sent-231, score-0.425]
70 If we use a small enough number of latent states in the base HMM (in experiments, we use 10 latent states), we drastically reduce the number of different spans in the data set, and therefore the number of parameters required for our model. [sent-237, score-0.569]
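A sketch of that reduction: decode every sentence with the small base HMM and build spans over state sequences rather than words, which collapses the span vocabulary (base_decode is assumed to be a Viterbi decoder like the one above, returning state ids from the 10-state base HMM):

    def spans_over_states(sentences, base_decode, max_len=14):
        # Map word spans to base-HMM state spans; with 10 states there are
        # at most 10**L distinct spans of length L, so far fewer parameters.
        state_seqs = [base_decode(s) for s in sentences]
        vocab = {tuple(seq[i:j]) for seq in state_seqs
                 for i in range(len(seq))
                 for j in range(i + 1, min(i + max_len, len(seq)) + 1)}
        return state_seqs, vocab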
71 As with our other HMM-based models, we use the largest number of latent states that will allow the resulting model to fit in our machine’s memory; our previous experiments on representations for part-of-speech tagging suggest that more latent states are usually better. [sent-239, score-0.606]
72 Our second approach trains a separate Span-HMM model for spans of different lengths. [sent-241, score-0.241]
73 We therefore use base HMM models with more latent states (up to 20) to annotate our sentences, and then train on the resulting Spans(ŝ) as before. [sent-243, score-0.244]
74 With this technique, we produce features that are combinations of the state value for span nodes and the length of the span, in order to indicate which of our Span-HMM models the state value came from. [sent-244, score-0.419]
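The resulting feature simply pairs the span node's state with the span's length, so the CRF can tell the per-length models apart; a one-line illustration with hypothetical naming:

    def length_specific_feature(state, span_len):
        # e.g. "spanhmm_len3_state7": state 7 from the length-3 Span-HMM
        return "spanhmm_len%d_state%d" % (span_len, state)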
75 4 Combining Multiple Span-HMMs So far, our Span-HMM models produce one new feature for every token during argument identification. [sent-247, score-0.289]
76 While these new features may be very helpful, ideally we would like our learned representations to produce multiple useful features for the CRF model, so that the CRF can combine the signals from each feature to learn a sophisticated model. [sent-265, score-0.34]
77 When we decode each of the models on training and test texts, we will obtain N different sequences of latent states, one for each locally-optimized model. [sent-268, score-0.236]
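A sketch of turning the N locally-optimized models into N features per token, leaving it to the CRF to combine their signals (models is an assumed list of decoders like span_node_feature above, one per locally-optimized Span-HMM copy):

    def multi_span_hmm_feats(obs, j, pred_i, models):
        # One categorical feature per Span-HMM copy.
        return {"spanhmm_%d" % n: m(obs, j, pred_i)
                for n, m in enumerate(models)}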
78 Figure 2 shows that when the argument is close to the predicate, both systems perform well, but as the distance from the predicate grows, our Multi-Span-HMM system is better able to identify and classify arguments than the Baseline+HMM+Paths system. [sent-303, score-0.583]
79 Table 6 provides results for argument identification and classification separately. [sent-304, score-0.291]
80 As observed in previous work (…, 2007), SRL systems tend to have an easier time porting argument identification to new domains, but are less strong at argument classification in new domains. [sent-307, score-0.524]
81 The Baseline suffers a drop of only …9 for argument identification, but a much larger 8% drop in argument classification. [sent-310, score-0.402]
82 The Multi-Span-HMM model improves over the Baseline in both tasks and on both test sets, but the largest improvement (6%) is in argument classification on the Brown test set. [sent-311, score-0.297]
83 Figure 2: The Multi-Span-HMM (MSH) model is better able to identify and classify arguments that are far from the predicate than the Baseline+HMM+Paths (BL) model. [sent-350, score-0.342]
84 Table 6: Baseline (BL) and Multi-Span-HMM (MSH) performance on argument identification (Id.) and argument classification. [sent-360, score-0.291]
85 While word path features can be highly valuable when there is training data available for them, only about 11% of the word paths in the Brown test set appeared at all in the training data. [sent-364, score-0.541]
86 POS and chunk paths fared a bit better (22% and 33% respectively), but even then nearly 70% of all feature values had no available training data. [sent-365, score-0.388]
87 Figure 3: HMM path and Span-HMM features are far more likely to appear often in training data than the word, POS, and chunk path features. [sent-368, score-0.6]
88 Over 70% of Span-HMM-Base10 features in the Brown corpus appear at least three times during training; in contrast, fewer than 33% of chunk path features in the Brown corpus appear at all during training. [sent-369, score-0.558]
89 Thus Span-HMMs derive their power as representations for open-domain SRL from the fact that they provide features that are mostly the same across domains; 80% of the features of our Span-HMM-Base10 in the Brown corpus were observed at least once in the training data. [sent-370, score-0.357]
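The coverage statistics above can be computed with a simple counter over feature values; a sketch (the *_paths variables are assumed lists of extracted feature strings, and the commented numbers are the paper's reported figures):

    from collections import Counter

    def coverage(train_vals, test_vals, min_count=1):
        # Fraction of test feature tokens whose value was seen at least
        # min_count times in training.
        counts = Counter(train_vals)
        return sum(counts[v] >= min_count for v in test_vals) / len(test_vals)

    # coverage(train_word_paths, brown_word_paths)        -> ~0.11
    # coverage(train_spanhmm_feats, brown_spanhmm_feats)  -> ~0.80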
90 Table 7 shows examples of spans that were clustered into the same Span-HMM state, along with the word to either side. [sent-371, score-0.241]
91 The emissions from a span node are very sparse, so the Span-HMM has unsurprisingly learned to cluster spans according to the HMM states that precede and follow the span node. [sent-374, score-0.728]
92 Table 7: Example spans labeled with the same Span-HMM state (Predicate / Span / B-Arg): picked / the things up / from; passed / through the barbed wire / at; come / down from Sundays / to; sat / over his second rock / in. [sent-376, score-0.241]
93 One potentially interesting question for future work is whether a less sparse model of the spans themselves, such as a Naïve Bayes model for the span node, would yield a better clustering for producing features for semantic role labeling. [sent-378, score-0.665]
94 Downey et al. (2007b) also incorporate HMM-based representations into a system for the related task of Web information extraction, and are able to show that the system improves performance on rare terms. [sent-383, score-0.217]
95 Fürstenau and Lapata (2009b; 2009a) use semi-supervised techniques to automatically annotate data for previously unseen predicates with semantic role information. [sent-384, score-0.343]
96 By incorporating learned features from HMMs and Span-HMMs trained on unlabeled text, our SRL system is able to correctly identify predicates in out-of-domain text with an F1 of 93.5, and it can identify and classify arguments to predicates with an F1 of 73. [sent-403, score-0.327]
98 Our successes so far on out-of-domain tests bring hope that supervised NLP systems may eventually achieve the ideal where they no longer need new manually-labeled training data for every new domain. [sent-406, score-0.225]
99 Semi-supervised semantic role labeling using the latent words language model. [sent-453, score-0.293]
100 Distributional representations for handling sparsity in supervised sequence labeling. [sent-496, score-0.222]
wordName wordTfidf (topN-words)
[('srl', 0.383), ('hmm', 0.349), ('spans', 0.241), ('predicate', 0.227), ('argument', 0.201), ('paths', 0.177), ('brown', 0.17), ('predicates', 0.168), ('chunk', 0.165), ('span', 0.164), ('representations', 0.143), ('latent', 0.113), ('wsj', 0.112), ('path', 0.109), ('msh', 0.109), ('states', 0.102), ('conll', 0.097), ('toutanova', 0.096), ('crf', 0.094), ('si', 0.092), ('identification', 0.09), ('pradhan', 0.09), ('arguments', 0.086), ('bl', 0.085), ('features', 0.084), ('fiction', 0.082), ('role', 0.077), ('tests', 0.072), ('baseline', 0.072), ('state', 0.071), ('pos', 0.067), ('carreras', 0.064), ('yates', 0.062), ('token', 0.059), ('nr', 0.059), ('blitzer', 0.059), ('appear', 0.058), ('newswire', 0.058), ('domains', 0.058), ('node', 0.057), ('wi', 0.056), ('labels', 0.055), ('downey', 0.053), ('urstenau', 0.052), ('labeling', 0.052), ('semantic', 0.051), ('hmms', 0.05), ('sparse', 0.048), ('test', 0.048), ('arquez', 0.048), ('techniques', 0.047), ('tokens', 0.047), ('gildea', 0.047), ('supervised', 0.046), ('training', 0.046), ('domain', 0.044), ('bacchiani', 0.044), ('emple', 0.044), ('wachman', 0.044), ('preceding', 0.042), ('categorical', 0.041), ('adaptation', 0.041), ('pipeline', 0.04), ('punyakanok', 0.04), ('xavier', 0.038), ('escudero', 0.038), ('swier', 0.038), ('kucera', 0.038), ('unlabeled', 0.038), ('huang', 0.037), ('system', 0.037), ('tags', 0.036), ('ai', 0.035), ('llu', 0.035), ('ported', 0.035), ('abend', 0.035), ('xi', 0.035), ('memory', 0.033), ('call', 0.033), ('fernando', 0.033), ('twice', 0.033), ('fisher', 0.033), ('hagen', 0.033), ('weston', 0.033), ('sparsity', 0.033), ('systems', 0.032), ('immediately', 0.031), ('shai', 0.031), ('temple', 0.031), ('chunking', 0.031), ('appeared', 0.031), ('points', 0.031), ('instance', 0.03), ('copies', 0.03), ('far', 0.029), ('learn', 0.029), ('bank', 0.029), ('models', 0.029), ('label', 0.029), ('ci', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999928 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
Author: Fei Huang ; Alexander Yates
Abstract: Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. In experiments, our novel system reduces error by 16% relative to the previous state of the art on out-of-domain text.
2 0.39674062 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
Author: Junhui Li ; Guodong Zhou ; Hwee Tou Ng
Abstract: This paper explores joint syntactic and semantic parsing of Chinese to further improve the performance of both syntactic and semantic parsing, in particular the performance of semantic parsing (in this paper, semantic role labeling). This is done from two levels. Firstly, an integrated parsing approach is proposed to integrate semantic parsing into the syntactic parsing process. Secondly, semantic information generated by semantic parsing is incorporated into the syntactic parsing model to better capture semantic information in syntactic parsing. Evaluation on Chinese TreeBank, Chinese PropBank, and Chinese NomBank shows that our integrated parsing approach outperforms the pipeline parsing approach on n-best parse trees, a natural extension of the widely used pipeline parsing approach on the top-best parse tree. Moreover, it shows that incorporating semantic role-related information into the syntactic parsing model significantly improves the performance of both syntactic parsing and semantic parsing. To our best knowledge, this is the first research on exploring syntactic parsing and semantic role labeling for both verbal and nominal predicates in an integrated way. 1
3 0.37564319 216 acl-2010-Starting from Scratch in Semantic Role Labeling
Author: Michael Connor ; Yael Gertner ; Cynthia Fisher ; Dan Roth
Abstract: A fundamental step in sentence comprehension involves assigning semantic roles to sentence constituents. To accomplish this, the listener must parse the sentence, find constituents that are candidate arguments, and assign semantic roles to those constituents. Each step depends on prior lexical and syntactic knowledge. Where do children learning their first languages begin in solving this problem? In this paper we focus on the parsing and argumentidentification steps that precede Semantic Role Labeling (SRL) training. We combine a simplified SRL with an unsupervised HMM part of speech tagger, and experiment with psycholinguisticallymotivated ways to label clusters resulting from the HMM so that they can be used to parse input for the SRL system. The results show that proposed shallow representations of sentence structure are robust to reductions in parsing accuracy, and that the contribution of alternative representations of sentence structure to successful semantic role labeling varies with the integrity of the parsing and argumentidentification stages.
4 0.36441618 207 acl-2010-Semantics-Driven Shallow Parsing for Chinese Semantic Role Labeling
Author: Weiwei Sun
Abstract: One deficiency of current shallow parsing based Semantic Role Labeling (SRL) methods is that syntactic chunks are too small to effectively group words. To partially resolve this problem, we propose semantics-driven shallow parsing, which takes into account both syntactic structures and predicate-argument structures. We also introduce several new “path” features to improve shallow parsing based SRL method. Experiments indicate that our new method obtains a significant improvement over the best reported Chinese SRL result.
5 0.28649148 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling
Author: Hector-Hugo Franco-Penya
Abstract: “Tree SRL system” is a supervised Semantic Role Labelling system based on a tree-distance algorithm and a simple k-NN implementation. The novelty of the system lies in comparing the sentences as tree structures with multiple relations instead of extracting vectors of features for each relation and classifying them. The system was tested with the English CoNLL-2009 shared task data set where 79% accuracy was obtained.
6 0.27047649 25 acl-2010-Adapting Self-Training for Semantic Role Labeling
7 0.25154284 238 acl-2010-Towards Open-Domain Semantic Role Labeling
8 0.2498707 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates
9 0.24956872 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
10 0.24342492 146 acl-2010-Improving Chinese Semantic Role Labeling with Rich Syntactic Features
11 0.23842832 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
12 0.18852983 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning
13 0.16720228 158 acl-2010-Latent Variable Models of Selectional Preference
14 0.14185369 263 acl-2010-Word Representations: A Simple and General Method for Semi-Supervised Learning
15 0.13571997 99 acl-2010-Efficient Third-Order Dependency Parsers
16 0.11691455 205 acl-2010-SVD and Clustering for Unsupervised POS Tagging
17 0.11567522 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
18 0.10254817 144 acl-2010-Improved Unsupervised POS Induction through Prototype Discovery
19 0.10185795 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data
20 0.09735015 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences
topicId topicWeight
[(0, -0.344), (1, 0.143), (2, 0.415), (3, 0.215), (4, 0.035), (5, 0.002), (6, -0.262), (7, -0.05), (8, 0.022), (9, 0.045), (10, 0.009), (11, -0.06), (12, 0.029), (13, -0.072), (14, -0.151), (15, -0.069), (16, -0.081), (17, 0.046), (18, -0.035), (19, 0.044), (20, 0.045), (21, 0.039), (22, -0.031), (23, -0.009), (24, -0.049), (25, 0.036), (26, -0.009), (27, -0.008), (28, -0.051), (29, -0.005), (30, -0.032), (31, -0.014), (32, -0.002), (33, 0.057), (34, 0.042), (35, 0.027), (36, 0.041), (37, 0.002), (38, -0.008), (39, 0.028), (40, -0.0), (41, 0.049), (42, 0.035), (43, 0.014), (44, 0.01), (45, -0.041), (46, 0.014), (47, -0.023), (48, 0.001), (49, 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.94396931 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
2 0.86089301 216 acl-2010-Starting from Scratch in Semantic Role Labeling
3 0.82009041 207 acl-2010-Semantics-Driven Shallow Parsing for Chinese Semantic Role Labeling
4 0.80055898 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
5 0.78901637 238 acl-2010-Towards Open-Domain Semantic Role Labeling
Author: Danilo Croce ; Cristina Giannone ; Paolo Annesi ; Roberto Basili
Abstract: Current Semantic Role Labeling technologies are based on inductive algorithms trained over large scale repositories of annotated examples. Frame-based systems currently make use of the FrameNet database but fail to show suitable generalization capabilities in out-of-domain scenarios. In this paper, a state-of-art system for frame-based SRL is extended through the encapsulation of a distributional model of semantic similarity. The resulting argument classification model promotes a simpler feature space that limits the potential overfitting effects. The large scale empirical study here discussed confirms that state-of-art accuracy can be obtained for out-of-domain evaluations.
6 0.77412653 146 acl-2010-Improving Chinese Semantic Role Labeling with Rich Syntactic Features
7 0.73439193 25 acl-2010-Adapting Self-Training for Semantic Role Labeling
8 0.68700492 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
9 0.68282503 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates
10 0.68040133 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
11 0.67950124 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling
12 0.53936672 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning
13 0.53003031 263 acl-2010-Word Representations: A Simple and General Method for Semi-Supervised Learning
14 0.44550854 158 acl-2010-Latent Variable Models of Selectional Preference
15 0.44334894 248 acl-2010-Unsupervised Ontology Induction from Text
16 0.43578762 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
17 0.43498814 205 acl-2010-SVD and Clustering for Unsupervised POS Tagging
18 0.42569387 130 acl-2010-Hard Constraints for Grammatical Function Labelling
19 0.42406276 212 acl-2010-Simple Semi-Supervised Training of Part-Of-Speech Taggers
20 0.3876082 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs
topicId topicWeight
[(7, 0.014), (14, 0.035), (23, 0.072), (25, 0.06), (39, 0.013), (42, 0.028), (44, 0.011), (59, 0.147), (73, 0.06), (76, 0.018), (78, 0.101), (80, 0.026), (83, 0.105), (84, 0.043), (97, 0.011), (98, 0.17)]
simIndex simValue paperId paperTitle
1 0.96931148 107 acl-2010-Exemplar-Based Models for Word Meaning in Context
Author: Katrin Erk ; Sebastian Pado
Abstract: This paper describes ongoing work on distributional models for word meaning in context. We abandon the usual one-vectorper-word paradigm in favor of an exemplar model that activates only relevant occurrences. On a paraphrasing task, we find that a simple exemplar model outperforms more complex state-of-the-art models.
2 0.94869268 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
Author: Stefan Thater ; Hagen Furstenau ; Manfred Pinkal
Abstract: We present a syntactically enriched vector model that supports the computation of contextualized semantic representations in a quasi compositional fashion. It employs a systematic combination of first- and second-order context vectors. We apply our model to two different tasks and show that (i) it substantially outperforms previous work on a paraphrase ranking task, and (ii) achieves promising results on a wordsense similarity task; to our knowledge, it is the first time that an unsupervised method has been applied to this task.
same-paper 3 0.94642639 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
Author: Fei Huang ; Alexander Yates
Abstract: Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. In experiments, our novel system reduces error by 16% relative to the previous state of the art on out-of-domain text.
4 0.94346178 158 acl-2010-Latent Variable Models of Selectional Preference
Author: Diarmuid O Seaghdha
Abstract: This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data.
5 0.93984699 63 acl-2010-Comparable Entity Mining from Comparative Questions
Author: Shasha Li ; Chin-Yew Lin ; Young-In Song ; Zhoujun Li
Abstract: Comparing one thing with another is a typical part of human decision making process. However, it is not always easy to know what to compare and what are the alternatives. To address this difficulty, we present a novel way to automatically mine comparable entities from comparative questions that users posted online. To ensure high precision and high recall, we develop a weakly-supervised bootstrapping method for comparative question identification and comparable entity extraction by leveraging a large online question archive. The experimental results show our method achieves F1measure of 82.5% in comparative question identification and 83.3% in comparable entity extraction. Both significantly outperform an existing state-of-the-art method. 1
6 0.93878967 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
7 0.92858434 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences
8 0.92530155 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
9 0.9225173 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
10 0.92122376 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
11 0.92072403 130 acl-2010-Hard Constraints for Grammatical Function Labelling
12 0.91995591 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
13 0.91724074 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
14 0.91463399 169 acl-2010-Learning to Translate with Source and Target Syntax
15 0.91324627 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
16 0.91185188 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
17 0.91170007 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
18 0.91139227 146 acl-2010-Improving Chinese Semantic Role Labeling with Rich Syntactic Features
19 0.91068542 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning
20 0.91053736 248 acl-2010-Unsupervised Ontology Induction from Text