emnlp emnlp2011 emnlp2011-4 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Stephen Tratz ; Eduard Hovy
Abstract: Dependency parsers are critical components within many NLP systems. However, currently available dependency parsers each exhibit at least one of several weaknesses, including high running time, limited accuracy, vague dependency labels, and lack of nonprojectivity support. Furthermore, no commonly used parser provides additional shallow semantic interpretation, such as preposition sense disambiguation and noun compound interpretation. In this paper, we present a new dependency-tree conversion of the Penn Treebank along with its associated fine-grain dependency labels and a fast, accurate parser trained on it. We explain how a non-projective extension to shift-reduce parsing can be incorporated into non-directional easy-first parsing. The parser performs well when evaluated on the standard test section of the Penn Treebank, outperforming several popular open source dependency parsers; it is, to the best of our knowledge, the first dependency parser capable of parsing more than 75 sentences per second at over 93% accuracy.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Dependency parsers are critical components within many NLP systems. [sent-2, score-0.143]
2 However, currently available dependency parsers each exhibit at least one of several weaknesses, including high running time, limited accuracy, vague dependency labels, and lack of nonprojectivity support. [sent-3, score-0.719]
3 Furthermore, no commonly used parser provides additional shallow semantic interpretation, such as preposition sense disambiguation and noun compound interpretation. [sent-4, score-0.509]
4 In this paper, we present a new dependency-tree conversion of the Penn Treebank along with its associated fine-grain dependency labels and a fast, accurate parser trained on it. [sent-5, score-0.728]
5 We explain how a non-projective extension to shift-reduce parsing can be incorporated into non-directional easy-first parsing. [sent-6, score-0.115]
6 The parser performs well when evaluated on the standard test section of the Penn Treebank, outperforming several popular open source dependency parsers; it is, to the best of our knowledge, the first dependency parser capable of parsing more than 75 sentences per second at over 93% accuracy. [sent-7, score-0.845]
7 Unfortunately, currently available dependency parsers suffer from at least one of several weaknesses, including high running time, limited accuracy, vague dependency labels, and lack of non-projectivity support. [sent-9, score-0.675]
8 Furthermore, few parsers include any sort of additional semantic interpretation, such as interpretations for prepositions, possessives, or noun compounds. [sent-10, score-0.24]
9 In this paper, we describe 1) a new dependency conversion (Section 3) of the Penn Treebank (Marcus et al., 1993) … [sent-11, score-0.54]
10 … (McDonald et al., 2005; McDonald and Pereira, 2006), the Charniak (2000) parser, and the Berkeley parser (Petrov et al., 2006). [sent-15, score-0.137]
11 The experimental results show that the parser is substantially more accurate than Goldberg and Elhadad’s original implementation, with fairly similar overall speed. [sent-17, score-0.137]
12 Furthermore, the results prove that Stanford-granularity dependency labels can be learned by modern dependency parsing systems when using our Treebank conversion, unlike the Stanford conversion, for which Cer et al. [sent-18, score-0.622]
13 The optional semantic annotation modules also perform well. [sent-20, score-0.185]
14 The preposition sense disambiguation module exceeds the accuracy of the previous best reported result for fine-grained preposition sense disambiguation (85.8%), [sent-22, score-0.491]
15 the possessives interpretation system achieves over 85% accuracy, and the noun compound interpretation system performs similarly to an earlier version described by Tratz and Hovy (2010) at just over 79% accuracy. [sent-25, score-0.445]
16 2 Background: The NLP community has recently seen a surge of interest in dependency parsing, with several CoNLL shared tasks focusing on it (Buchholz and Marsi, 2006; Nivre et al., 2007). [sent-26, score-0.279]
17 One of the main advantages of dependency parsing is the relative ease with which it can handle non-projectivity. [sent-28, score-0.343]
18 Unfortunately, most currently available dependency parsers produce relatively vague labels or, in many cases, produce no labels at all. [sent-30, score-0.627]
19 While the Stanford fine-grain dependency scheme (de Marneffe and Manning, 2008) has proven to be popular, recent experiments by Cer et al. [sent-31, score-0.337]
20 (2010) using the Stanford conversion of the Penn Treebank indicate that it is difficult for current dependency parsers to learn. [sent-32, score-0.683]
21 Indeed, the highest scoring parsers trained using the MSTPARSER (McDonald and Pereira, 2006) and MALTPARSER (Nivre et al. [sent-33, score-0.143]
22 This contrasted with the much higher performance obtained using a constituent-to-dependency conversion approach with accurate, but much slower, constituency parsers such as the Charniak and Johnson (2005) and Berkeley (Petrov et al., 2006) parsers. [sent-37, score-0.455]
23 Though there are many syntactic parsers that can reconstruct the grammatical structure of a text, there are few, if any, accurate and widely accepted systems that also produce shallow semantic analysis of the text. [sent-43, score-0.223]
24 For example, a parser may indicate that, in the case of ‘ice statue’, ‘ice’ modifies ‘statue’ but will not indicate that ‘ice’ is the substance of the statue. [sent-44, score-0.137]
25 Similarly, a parser will indicate which words a preposition connects but will not give any semantic interpretation (e.g. …). [sent-45, score-0.373]
26 Relations and Structure: Most recent English dependency parsers produce one of three sets of dependency types: unlabeled dependencies, some variant of the coarse labels used by the CoNLL dependency parsing shared tasks (Buchholz and Marsi, 2006; Nivre et al., 2007) [sent-50, score-1.032]
27 (e.g., ADV, NMOD, PMOD), or Stanford's dependency labels (de Marneffe and Manning, 2008). [sent-53, score-0.279]
28 Our dependency relation scheme is similar to Stanford’s basic scheme but has several differences. [sent-58, score-0.446]
29 The nsubjpass, csubjpass, and auxpass relations of Stanford's are left out because adding them up front makes learning more difficult. (Table 1: Dependency scheme with differences versus basic Stanford dependencies highlighted.) [sent-60, score-0.21]
30 Stanford’s aux dependencies are replaced using verbal chain (vch) links; conversion of these to Stanford-style aux dependencies is also trivial as a post-processing step. [sent-64, score-0.554]
31 The attr dependency is excluded because it is redundant with the cop relation due to the different handling of copula, and the dependency scheme does not have an abbrev label because this information is not provided by the Penn Treebank. [sent-65, score-0.632]
32 The dependency scheme, with differences from Stanford highlighted, is presented in Table 1. [sent-66, score-0.337]
33 In addition to using a slightly different set of dependency names, a handful of relations, notably cop, conj, and cc, are treated in a different manner. [sent-67, score-0.228]
34 The Stanford scheme’s treatment of copula may be one reason why dependency parsers have trouble learning and applying it. [sent-69, score-0.458]
35 (Footnote 3) The parsing system includes an optional script that can convert vch arcs into aux and auxpass and the subject relations into csubjpass and nsubjpass. [sent-71, score-0.602]
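For concreteness, a minimal sketch of such a vch-to-aux post-processing pass. The arc representation, the passive heuristic, and the arc directions below are our assumptions for illustration, not the authors' actual script:

BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}

def convert_vch_arcs(arcs, tokens):
    # arcs: (head, dependent, label) triples over 0-indexed tokens;
    # tokens: (word, pos) pairs. Assumes a vch arc points from an
    # auxiliary to the verb it governs, and Stanford-style aux arcs
    # point the other way.
    converted, passive_verbs = [], set()
    for head, dep, label in arcs:
        if label == "vch":
            aux_word, verb_pos = tokens[head][0].lower(), tokens[dep][1]
            if aux_word in BE_FORMS and verb_pos == "VBN":  # crude passive signal
                converted.append((dep, head, "auxpass"))
                passive_verbs.add(dep)
            else:
                converted.append((dep, head, "aux"))
        else:
            converted.append((head, dep, label))
    # Relabel subjects of passive verbs, as the footnote describes.
    relabel = {"nsubj": "nsubjpass", "csubj": "csubjpass"}
    return [(h, d, relabel[l]) if h in passive_verbs and l in relabel else (h, d, l)
            for h, d, l in converted]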
36 Figure 1: Example comparing Stanford's (top) handling of copula and coordinating conjunctions with ours (bottom). [sent-72, score-0.13]
37 The conversion process transforms the Penn Treebank (Marcus et al., 1993) from constituent parses into dependency trees labeled according to the dependency scheme presented in the prior section. [sent-75, score-0.609]
38 Finally, an additional script makes further changes and converts the intermediate output into the dependency scheme. [sent-84, score-0.39]
39 Using the modified head-finding rules for Johansson and Nugues’ (2007) converter results in fewer buggy trees than were present in the CoNLL shared tasks, including fewer trees in which words are headed by punctuation marks. [sent-86, score-0.231]
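An illustrative head-finding rule in this style; the rule table below is invented for illustration (it is not the authors' modified rule set), but it shows the key property that punctuation is never selected as a head:

PUNCT = {",", ".", ":", "``", "''", "-LRB-", "-RRB-"}
HEAD_RULES = {
    # phrase label -> (search direction, ordered list of preferred child labels)
    "NP": ("right", ["NN", "NNS", "NNP", "NNPS", "NP"]),
    "VP": ("left",  ["VBD", "VBN", "VBZ", "VBP", "VB", "MD", "VP"]),
}

def find_head(label, children):
    # children: list of (child_label, index); returns the head child's index.
    direction, preferred = HEAD_RULES.get(label, ("left", []))
    ordered = children if direction == "left" else list(reversed(children))
    for want in preferred:
        for child_label, idx in ordered:
            if child_label == want:
                return idx
    for child_label, idx in ordered:  # fallback: first non-punctuation child,
        if child_label not in PUNCT:  # which avoids punctuation-headed trees
            return idx
    return ordered[0][1]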
40 For sections 2–21, there are far fewer generic dep/DEP relations (2,765) than with the Stanford conversion (34,134) or the CoNLL 2008 shared task conversion (23,811). [sent-87, score-0.675]
41 Also, the additional conversion script contains various rules for correcting part-of-speech (POS) errors using the syntactic structure as well as additional rules for some specific word forms, mostly common words with inconsistent taggings. [sent-88, score-0.426]
42 In total, the script changes over 9,500 part-of-speech tags, with the most common change being to convert preposition tags (IN) into adverb tags (RB) where there is no prepositional complement/object. [sent-90, score-0.355]
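A sketch of this kind of retagging rule over a simple (head, dependent, label) arc list; the dependency label names "pobj"/"pcomp" are illustrative placeholders:

def retag_objectless_prepositions(tokens, arcs):
    # Retag IN as RB when the token has no prepositional object or
    # complement among its dependents (e.g., "He fell down .").
    dep_labels = {}
    for head, dep, label in arcs:
        dep_labels.setdefault(head, set()).add(label)
    return [(w, "RB") if p == "IN" and not dep_labels.get(i, set()) & {"pobj", "pcomp"}
            else (w, p)
            for i, (w, p) in enumerate(tokens)]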
43 The conversion script also contains a variety of additional rules for modifying the parse structure and fixing erroneous trees, including cases where one or more POS tags were incorrect and, as a result, the initial dependency parse was flawed. [sent-92, score-0.819]
44 …3% of this, or 556 sentences, is due to the secondary conversion script, with sentences containing approximate currency amounts. [sent-97, score-0.379]
45 In these cases, the approximating word (e.g., about, over, nearly) is linked to the number following the currency symbol instead of to the currency symbol as it was in the CoNLL 2008 task. [sent-102, score-0.134]
46 Table 2: Top 15 part-of-speech tag changes performed by the conversion script. [sent-103, score-0.36]
47 Algorithm: The parsing approach is based upon the non-directional easy-first algorithm recently presented by Goldberg and Elhadad (2010). [sent-105, score-0.165]
48 However, since only O(n) dot products must be calculated by the parser and these have a large constant associated with them, the running time will rival O(n) parsers for any reasonable n, and, thus, a naive O(n^2) implementation will be nearly as fast as a priority queue implementation in practice. [sent-111, score-0.28]
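A minimal sketch of that naive easy-first loop (Goldberg and Elhadad, 2010), with the learned scorer left as an opaque callable and validity constraints and labels omitted:

def easy_first_parse(tokens, score):
    # score(action, k, pending) -> float, for an attachment between the
    # adjacent pending tokens pending[k] and pending[k+1].
    pending = list(range(len(tokens)))  # indices of unattached subtree roots
    arcs = []
    while len(pending) > 1:
        best = None
        for k in range(len(pending) - 1):                   # O(n) candidates per
            for action in ("attach_left", "attach_right"):  # step, n-1 steps: O(n^2)
                s = score(action, k, pending)
                if best is None or s > best[0]:
                    best = (s, action, k)
        _, action, k = best
        if action == "attach_left":    # pending[k] attaches to pending[k+1]
            arcs.append((pending[k + 1], pending[k]))
            del pending[k]
        else:                          # pending[k+1] attaches to pending[k]
            arcs.append((pending[k], pending[k + 1]))
            del pending[k + 1]
    return arcs  # (head, dependent) pairs; the remaining pending token is the root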
49 The algorithm has a couple of potential advantages over standard shift-reduce-style parsing algorithms. [sent-112, score-0.115]
50 The second advantage is that performing parse actions in a more flexible order than left-to-right/right-to-left shift-reduce parsing reduces the chance of error propagation. [sent-114, score-0.226]
51 Provided that no node is allowed to be moved past a token in such a way that a previous move operation is undone, there can be at most O(n^2) moves and the overall worst-case complexity becomes O(n^2 log n). [sent-118, score-0.118]
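One possible bookkeeping scheme that enforces this no-undo restriction (the scheme is ours, not necessarily the authors'): record each ordered pair a node is moved past, and reject the reverse swap, so at most O(n^2) moves can ever be performed.

def try_move(pending, history, k, direction):
    # Move pending[k] one slot left/right unless that undoes an earlier move.
    j = k - 1 if direction == "left" else k + 1
    if not 0 <= j < len(pending):
        return False
    mover, neighbor = pending[k], pending[j]
    if (neighbor, mover) in history:  # neighbor was already moved past mover;
        return False                  # swapping back would undo that move
    history.add((mover, neighbor))    # history is a set of ordered pairs
    pending[k], pending[j] = pending[j], pending[k]
    return True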
52 While theoretically slower, this has a limited impact upon actual parsing times. (Footnote 4: See Goldberg and Elhadad (2010) for more explanation.) [sent-119, score-0.165]
53 Though Goldberg and Elhadad's (2010) original implementation only supports unlabeled dependencies, the algorithm itself is in no way limited in this regard, and it is simple enough to add labeled dependency support by treating each dependency label as a specific type of attach operation (e.g. …). [sent-121, score-0.456]
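In sketch form, the labeled extension just multiplies the action inventory; the label subset below is an illustrative sample, not the full scheme:

LABELS = ["nsubj", "dobj", "pobj", "cop", "vch"]
ACTIONS = [f"{d}:{l}" for d in ("attach_left", "attach_right") for l in LABELS]
# The scorer now ranks len(ACTIONS) candidate actions per adjacent pair
# instead of two, which is all the labeled extension requires.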
54 Features: One of the key aspects of the parser is the complex set of features used. [sent-149, score-0.137]
55 Various feature templates are specifically designed to produce features that help with several syntactic issues including preposition attachment, coordination, adverbial clauses, clausal complements, and relative clauses. [sent-151, score-0.146]
56 However, a list of feature templates will be provided with the parser download. [sent-153, score-0.137]
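A few invented templates of this flavor for an adjacent pending pair, just to fix ideas (the real templates ship with the download):

def pair_features(tokens, pending, k):
    l, r = pending[k], pending[k + 1]
    (lw, lp), (rw, rp) = tokens[l], tokens[r]
    return [
        f"pos_pair={lp}_{rp}",
        f"lword_rpos={lw.lower()}_{rp}",
        f"dist={min(r - l, 5)}",          # bucketed surface distance
        f"prep_left={lp == 'IN'}_{rp}",   # aimed at preposition attachment
    ]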
57 The 175 word clusters utilized by the parser were created from the New York Times corpus (Sandhaus, 2008). [sent-160, score-0.187]
58 Training: The parsing model is trained using a variant of the structured perceptron training algorithm used in the original Goldberg and Elhadad (2010) implementation. [sent-216, score-0.115]
59 As an additional restriction during training, move actions were considered valid only if no other action was valid or if the token to be moved already had all of its children attached and moving it made it adjacent to its parent. [sent-225, score-0.25]
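The core structured-perceptron update is standard and easy to sketch on dict feature vectors; the surrounding state loop and the masking of invalid move actions described above are omitted here:

from collections import defaultdict

def perceptron_update(weights, gold_feats, pred_feats, lr=1.0):
    for f, v in gold_feats.items():
        weights[f] += lr * v   # promote features of the correct action
    for f, v in pred_feats.items():
        weights[f] -= lr * v   # demote features of the mispredicted action

weights = defaultdict(float)
perceptron_update(weights, {"pos_pair=DT_NN": 1.0}, {"pos_pair=DT_VB": 1.0})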
60 Speed Enhancements: To enhance speed for practical use, the parser uses constraints based upon the part-of-speech tags of adjacent word pairs to eliminate invalid dependencies from even being evaluated. [sent-229, score-0.287]
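A sketch of this filtering step; the forbidden POS pairs below are invented placeholders, not the parser's actual table:

FORBIDDEN = {("DT", "DT"), (",", "."), (".", ",")}

def worth_scoring(tokens, pending, k):
    # Skip scoring an attachment between adjacent pending tokens whose POS
    # pair can never stand in a head/dependent relation.
    lpos, rpos = tokens[pending[k]][1], tokens[pending[k + 1]][1]
    return (lpos, rpos) not in FORBIDDEN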
61 The full details of the POS tagger are outside the scope of this paper; it is included with the parser download. [sent-244, score-0.178]
62 The final parser was trained for 31 iterations, which is the point at which its performance on the development set peaked. [sent-245, score-0.137]
63 The model trained using Goldberg and Elhadad’s (2010) easy-first parser serves as something of a baseline. [sent-251, score-0.137]
64 Unfortunately, it is not possible to directly compare the parser's accuracy with most popular constituent parsers such as the Charniak (2000) and Berkeley (Petrov et al., 2006) parsers … [sent-254, score-0.203]
65 …which are required for the final script of the constituent-to-dependency conversion routine, and because they determine part-of-speech tags in conjunction with the parsing. [sent-256, score-0.469]
66 The results of the experiment are given in Table 3, including accuracy for individual arcs, nonprojective arcs only, and full sentence match. [sent-258, score-0.217]
67 Whenever an arc being traversed is found to cross a previously traversed arc, mark it as non-projective and continue. [sent-262, score-0.153]
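That crossing-arc test can be stated directly; assuming arcs are (head, dependent) index pairs, a sketch:

def mark_nonprojective(arcs):
    def crosses(a, b):
        (a1, a2), (b1, b2) = sorted(a), sorted(b)
        return a1 < b1 < a2 < b2 or b1 < a1 < b2 < a2
    flagged, seen = [], []
    for arc in arcs:
        flagged.append(any(crosses(arc, prev) for prev in seen))
        seen.append(arc)
    return flagged  # True where the arc is non-projective

# e.g. mark_nonprojective([(0, 2), (1, 3)]) -> [False, True]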
68 To evaluate the impact of part-of-speech tagging error, results for parsing using the gold-standard part-of-speech tags are also included. [sent-263, score-0.158]
69 We also measured the speed of the parser on the various sentences in the test collection. [sent-264, score-0.137]
70 For reasonable sentence lengths, the parser scales quite well. [sent-265, score-0.137]
71 Figure 4: Parse times for Penn Treebank section 23 for the parsers on a PC with a 2.… [sent-267, score-0.143]
72 Furthermore, the parser processed the entire test section in just over 30 seconds. [sent-277, score-0.137]
73 Not surprisingly, the results for non-projective arcs are substantially lower than the results for all arcs, and the systems that are designed to handle them outperformed the strictly projective parsers in this regard. [sent-284, score-0.299]
74 The negative effect of part-of-speech tagging error appears to impact the different parsers about the same amount, with a loss of … [sent-285, score-0.143]
75 …3% accuracy scores achieved by the Charniak and Berkeley parsers are not too different from the 93.… [sent-292, score-0.203]
76 Taken together, these integrated modules enable the parsing system to produce substantially more informative output than a traditional parser. [sent-297, score-0.281]
77 Noun Compound Interpretation: The noun compound interpretation system is a newer version of the system described by Tratz and Hovy (2010) with similar accuracy (79.… [sent-305, score-0.296]
78 Possessives Interpretation: The possessive interpretation system assigns interpretations to 's possessives (e.g. …). [sent-308, score-0.209]
79 …8 combined F1 measure for automatically-generated parse trees, calculated over both predicate disambiguation and argument/adjunct classification [sent-314, score-0.159]
80 (89.6 F1 on argument and adjuncts corresponding to dependency links, and [sent-316, score-0.266]
81 86.8 F1); this score is not directly comparable to any previous work due to some differences, including differences in both the parse tree conversion and the PropBank conversion. [sent-317, score-0.351]
82 While most recent dependency parsing research has used either vague labels, such as those of the CoNLL shared tasks, or no labels at all, some descriptive dependency label schemes exist. [sent-330, score-0.794]
83 By far the most prominent of these is the Stanford typed dependency scheme (de Marneffe and Manning, 2008). [sent-331, score-0.337]
84 Another descriptive scheme that exists, but which is less widely used in the NLP community, is the one used by Tapanainen and Järvinen's (1997) parser. [sent-332, score-0.291]
85 Unfortunately, the Stanford dependency conversion of the Penn Treebank has proven difficult to learn for current dependency parsers (Cer et al. [sent-333, score-0.911]
86 , 2010), and there is no publicly available dependency conversion according to Tapanainen and Järvinen’s scheme. [sent-334, score-0.54]
87 The CoNLL 2008 and 2009 shared tasks (Surdeanu et al., 2008; Hajič et al., 2009), which required participants to build systems capable of both syntactic parsing and Semantic Role Labeling (SRL) (Gildea and Jurafsky, 2002), are the most notable attempts to encourage the development of parsers with additional semantic annotation. [sent-350, score-0.299]
88 7 Conclusion: In this paper, we have described a new high-quality dependency tree conversion of the Penn Treebank (Marcus et al., 1993) … [sent-355, score-0.54]
89 The Penn Treebank conversion process fixes a number of buggy trees and part-of-speech tags and produces dependency trees with a relatively small percentage of generic dep dependencies. [sent-358, score-0.715]
90 The experimental results show that dependency parsers can generally produce Stanford-granularity labels with high accuracy when using the new dependency conversion of the Penn Treebank, something which, according to the findings of Cer et al. [sent-359, score-1.061]
91 (2010), does not appear to be the case when training and testing dependency parsers on the Stanford conversion. [sent-360, score-0.371]
92 The parser achieves high labeled and unlabeled accuracy in the evaluation, 92.… [sent-361, score-0.197]
93 Also, the parser proves to be quite fast, processing section 23 of the Penn Treebank in just over 30 seconds (a rate of over 75 sentences per second). [sent-367, score-0.137]
94 The parsing system is capable of not only producing fine-grained dependency relations but also of producing shallow semantic annotations for prepositions, possessives, and noun compounds, using several optional integrated modules. [sent-368, score-0.536]
95 …8% by a statistically significant margin, the possessives module is over 85% accurate, and the noun compound interpretation module achieves 79.… [sent-371, score-0.487]
96 …6 F1 on argument and adjuncts corresponding to dependency links, for an overall F1 of 86.8. [sent-375, score-0.266]
97 Combined with the core parser, these modules allow the system to produce a substantially more informative textual analysis than a standard parser. [sent-377, score-0.166]
98 It would also be interesting to examine the impact on final parsing accuracy of the various differences between our dependency conversion and Stanford’s. [sent-380, score-0.715]
99 To aid future NLP research work, the code, including the treebank converter, part-of-speech tagger, parser, and semantic annotation add-ons, will be made publicly available for download via http://www. [sent-381, score-0.148]
100 The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies. [sent-594, score-0.207]
wordName wordTfidf (topN-words)
[('conversion', 0.312), ('goldberg', 0.299), ('elhadad', 0.249), ('dependency', 0.228), ('nivre', 0.16), ('stanford', 0.151), ('tratz', 0.144), ('parsers', 0.143), ('parser', 0.137), ('possessives', 0.121), ('parsing', 0.115), ('penn', 0.115), ('script', 0.114), ('psd', 0.111), ('scheme', 0.109), ('hovy', 0.107), ('preposition', 0.107), ('treebank', 0.107), ('arcs', 0.097), ('compound', 0.092), ('joakim', 0.091), ('interpretation', 0.088), ('copula', 0.087), ('modules', 0.087), ('johansson', 0.084), ('mcdonald', 0.077), ('conll', 0.077), ('vadas', 0.076), ('vague', 0.076), ('disambiguation', 0.076), ('maltparser', 0.075), ('actions', 0.072), ('petrov', 0.071), ('marneffe', 0.067), ('cop', 0.067), ('currency', 0.067), ('rquez', 0.067), ('vch', 0.067), ('arc', 0.065), ('cer', 0.065), ('module', 0.065), ('aux', 0.064), ('move', 0.061), ('accuracy', 0.06), ('mstparser', 0.06), ('nonprojective', 0.06), ('action', 0.06), ('projective', 0.059), ('dirk', 0.057), ('moved', 0.057), ('tapanainen', 0.057), ('dependencies', 0.057), ('optional', 0.057), ('noun', 0.056), ('propbank', 0.054), ('ice', 0.052), ('labels', 0.051), ('shared', 0.051), ('nugues', 0.05), ('clusters', 0.05), ('upon', 0.05), ('koo', 0.049), ('unfortunately', 0.049), ('cache', 0.048), ('converter', 0.048), ('srl', 0.048), ('changes', 0.048), ('marcus', 0.046), ('descriptive', 0.045), ('berkeley', 0.044), ('auxpass', 0.044), ('buggy', 0.044), ('cleft', 0.044), ('copyof', 0.044), ('csubjpass', 0.044), ('nonprojectivity', 0.044), ('pot', 0.044), ('rvinen', 0.044), ('slowed', 0.044), ('stale', 0.044), ('traversed', 0.044), ('trees', 0.044), ('coordinating', 0.043), ('xavier', 0.043), ('tags', 0.043), ('semantic', 0.041), ('tagger', 0.041), ('jens', 0.041), ('rewriting', 0.041), ('reordering', 0.04), ('informative', 0.04), ('parse', 0.039), ('iwpt', 0.039), ('surdeanu', 0.039), ('produce', 0.039), ('adjuncts', 0.038), ('metal', 0.038), ('statue', 0.038), ('unprocessed', 0.038), ('haji', 0.037)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000008 4 emnlp-2011-A Fast, Accurate, Non-Projective, Semantically-Enriched Parser
Author: Stephen Tratz ; Eduard Hovy
Abstract: Dependency parsers are critical components within many NLP systems. However, currently available dependency parsers each exhibit at least one of several weaknesses, including high running time, limited accuracy, vague dependency labels, and lack of nonprojectivity support. Furthermore, no commonly used parser provides additional shallow semantic interpretation, such as preposition sense disambiguation and noun compound interpretation. In this paper, we present a new dependency-tree conversion of the Penn Treebank along with its associated fine-grain dependency labels and a fast, accurate parser trained on it. We explain how a non-projective extension to shift-reduce parsing can be incorporated into non-directional easy-first parsing. The parser performs well when evaluated on the standard test section of the Penn Treebank, outperforming several popular open source dependency parsers; it is, to the best of our knowledge, the first dependency parser capable of parsing more than 75 sentences per second at over 93% accuracy.
2 0.28880516 50 emnlp-2011-Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation
Author: Reut Tsarfaty ; Joakim Nivre ; Evelina Andersson
Abstract: unkown-abstract
3 0.19949397 7 emnlp-2011-A Joint Model for Extended Semantic Role Labeling
Author: Vivek Srikumar ; Dan Roth
Abstract: This paper presents a model that extends semantic role labeling. Existing approaches independently analyze relations expressed by verb predicates or those expressed as nominalizations. However, sentences express relations via other linguistic phenomena as well. Furthermore, these phenomena interact with each other, thus restricting the structures they articulate. In this paper, we use this intuition to define a joint inference model that captures the inter-dependencies between verb semantic role labeling and relations expressed using prepositions. The scarcity of jointly labeled data presents a crucial technical challenge for learning a joint model. The key strength of our model is that we use existing structure predictors as black boxes. By enforcing consistency constraints between their predictions, we show improvements in the performance of both tasks without retraining the individual models.
4 0.19637343 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing
Author: Zhenghua Li ; Min Zhang ; Wanxiang Che ; Ting Liu ; Wenliang Chen ; Haizhou Li
Abstract: Part-of-speech (POS) is an indispensable feature in dependency parsing. Current research usually models POS tagging and dependency parsing independently. This may suffer from error propagation problem. Our experiments show that parsing accuracy drops by about 6% when using automatic POS tags instead of gold ones. To solve this issue, this paper proposes a solution by jointly optimizing POS tagging and dependency parsing in a unique model. We design several joint models and their corresponding decoding algorithms to incorporate different feature sets. We further present an effective pruning strategy to reduce the search space of candidate POS tags, leading to significant improvement of parsing speed. Experimental results on Chinese Penn Treebank 5 show that our joint models significantly improve the state-of-the-art parsing accuracy by about 1.5%. Detailed analysis shows that the joint method is able to choose such POS tags that are more helpful and discriminative from parsing viewpoint. This is the fundamental reason of parsing accuracy improvement.
5 0.17525852 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus
Author: Emily M. Bender ; Dan Flickinger ; Stephan Oepen ; Yi Zhang
Abstract: In order to obtain a fine-grained evaluation of parser accuracy over naturally occurring text, we study 100 examples each of ten reasonably frequent linguistic phenomena, randomly selected from a parsed version of the English Wikipedia. We construct a corresponding set of gold-standard target dependencies for these 1000 sentences, operationalize mappings to these targets from seven state-of-theart parsers, and evaluate the parsers against this data to measure their level of success in identifying these dependencies.
6 0.1744983 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers
7 0.1730656 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives
8 0.17179762 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
9 0.16026253 102 emnlp-2011-Parse Correction with Specialized Models for Difficult Attachment Types
10 0.15895531 136 emnlp-2011-Training a Parser for Machine Translation Reordering
11 0.13800731 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
12 0.10300668 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation
13 0.094418727 52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing
14 0.093655936 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
15 0.092114598 115 emnlp-2011-Relaxed Cross-lingual Projection of Constituent Syntax
16 0.088445693 128 emnlp-2011-Structured Relation Discovery using Generative Models
17 0.087258369 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
18 0.086638927 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning
19 0.080956504 127 emnlp-2011-Structured Lexical Similarity via Convolution Kernels on Dependency Trees
20 0.078450769 134 emnlp-2011-Third-order Variational Reranking on Packed-Shared Dependency Forests
topicId topicWeight
[(0, 0.326), (1, 0.102), (2, -0.081), (3, 0.368), (4, -0.082), (5, 0.113), (6, 0.039), (7, 0.011), (8, 0.121), (9, 0.024), (10, -0.008), (11, -0.105), (12, 0.145), (13, 0.056), (14, 0.053), (15, 0.102), (16, 0.041), (17, -0.128), (18, 0.121), (19, -0.073), (20, 0.032), (21, 0.05), (22, -0.007), (23, 0.029), (24, -0.096), (25, -0.121), (26, -0.018), (27, 0.088), (28, -0.101), (29, -0.051), (30, -0.067), (31, -0.015), (32, -0.011), (33, -0.137), (34, 0.039), (35, 0.018), (36, -0.017), (37, -0.012), (38, -0.063), (39, 0.005), (40, 0.159), (41, 0.062), (42, -0.012), (43, 0.011), (44, 0.003), (45, 0.03), (46, -0.088), (47, -0.001), (48, -0.002), (49, -0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.9674629 4 emnlp-2011-A Fast, Accurate, Non-Projective, Semantically-Enriched Parser
Author: Stephen Tratz ; Eduard Hovy
Abstract: Dependency parsers are critical components within many NLP systems. However, currently available dependency parsers each exhibit at least one of several weaknesses, including high running time, limited accuracy, vague dependency labels, and lack of nonprojectivity support. Furthermore, no commonly used parser provides additional shallow semantic interpretation, such as preposition sense disambiguation and noun compound interpretation. In this paper, we present a new dependency-tree conversion of the Penn Treebank along with its associated fine-grain dependency labels and a fast, accurate parser trained on it. We explain how a non-projective extension to shift-reduce parsing can be incorporated into non-directional easy-first parsing. The parser performs well when evaluated on the standard test section of the Penn Treebank, outperforming several popular open source dependency parsers; it is, to the best of our knowledge, the first dependency parser capable of parsing more than 75 sentences per second at over 93% accuracy.
2 0.82970405 50 emnlp-2011-Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation
Author: Reut Tsarfaty ; Joakim Nivre ; Evelina Andersson
Abstract: unkown-abstract
3 0.80699098 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus
Author: Emily M. Bender ; Dan Flickinger ; Stephan Oepen ; Yi Zhang
Abstract: In order to obtain a fine-grained evaluation of parser accuracy over naturally occurring text, we study 100 examples each of ten reasonably frequent linguistic phenomena, randomly selected from a parsed version of the English Wikipedia. We construct a corresponding set of gold-standard target dependencies for these 1000 sentences, operationalize mappings to these targets from seven state-of-theart parsers, and evaluate the parsers against this data to measure their level of success in identifying these dependencies.
4 0.69737792 102 emnlp-2011-Parse Correction with Specialized Models for Difficult Attachment Types
Author: Enrique Henestroza Anguiano ; Marie Candito
Abstract: This paper develops a framework for syntactic dependency parse correction. Dependencies in an input parse tree are revised by selecting, for a given dependent, the best governor from within a small set of candidates. We use a discriminative linear ranking model to select the best governor from a group of candidates for a dependent, and our model includes a rich feature set that encodes syntactic structure in the input parse tree. The parse correction framework is parser-agnostic, and can correct attachments using either a generic model or specialized models tailored to difficult attachment types like coordination and pp-attachment. Our experiments show that parse correction, combining a generic model with specialized models for difficult attachment types, can successfully improve the quality of predicted parse trees output by several representative state-of-the-art dependency parsers for French.
5 0.61094594 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives
Author: Keith Hall ; Ryan McDonald ; Jason Katz-Brown ; Michael Ringgaard
Abstract: We present an online learning algorithm for training parsers which allows for the inclusion of multiple objective functions. The primary example is the extension of a standard supervised parsing objective function with additional loss-functions, either based on intrinsic parsing quality or task-specific extrinsic measures of quality. Our empirical results show how this approach performs for two dependency parsing algorithms (graph-based and transition-based parsing) and how it achieves increased performance on multiple target tasks including reordering for machine translation and parser adaptation.
6 0.61076832 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing
7 0.57660478 7 emnlp-2011-A Joint Model for Extended Semantic Role Labeling
8 0.54546016 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers
9 0.52498657 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
10 0.49607897 52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing
11 0.4625636 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
12 0.43153328 136 emnlp-2011-Training a Parser for Machine Translation Reordering
13 0.40094876 118 emnlp-2011-SMT Helps Bitext Dependency Parsing
14 0.40027839 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation
15 0.39094502 45 emnlp-2011-Dual Decomposition with Many Overlapping Components
16 0.36474508 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
17 0.34416777 127 emnlp-2011-Structured Lexical Similarity via Convolution Kernels on Dependency Trees
18 0.34360588 105 emnlp-2011-Predicting Thread Discourse Structure over Technical Web Forums
19 0.34104797 97 emnlp-2011-Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French
20 0.33974782 115 emnlp-2011-Relaxed Cross-lingual Projection of Constituent Syntax
topicId topicWeight
[(23, 0.09), (36, 0.035), (37, 0.015), (45, 0.044), (54, 0.019), (57, 0.011), (62, 0.018), (64, 0.034), (66, 0.028), (79, 0.066), (82, 0.014), (87, 0.013), (90, 0.019), (96, 0.504), (98, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.93834585 4 emnlp-2011-A Fast, Accurate, Non-Projective, Semantically-Enriched Parser
Author: Stephen Tratz ; Eduard Hovy
Abstract: Dependency parsers are critical components within many NLP systems. However, currently available dependency parsers each exhibit at least one of several weaknesses, including high running time, limited accuracy, vague dependency labels, and lack of nonprojectivity support. Furthermore, no commonly used parser provides additional shallow semantic interpretation, such as preposition sense disambiguation and noun compound interpretation. In this paper, we present a new dependency-tree conversion of the Penn Treebank along with its associated fine-grain dependency labels and a fast, accurate parser trained on it. We explain how a non-projective extension to shift-reduce parsing can be incorporated into non-directional easy-first parsing. The parser performs well when evaluated on the standard test section of the Penn Treebank, outperforming several popular open source dependency parsers; it is, to the best of our knowledge, the first dependency parser capable of parsing more than 75 sentences per second at over 93% accuracy.
2 0.88409764 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning
Author: Joel Lang ; Mirella Lapata
Abstract: In this paper we present a method for unsupervised semantic role induction which we formalize as a graph partitioning problem. Argument instances of a verb are represented as vertices in a graph whose edge weights quantify their role-semantic similarity. Graph partitioning is realized with an algorithm that iteratively assigns vertices to clusters based on the cluster assignments of neighboring vertices. Our method is algorithmically and conceptually simple, especially with respect to how problem-specific knowledge is incorporated into the model. Experimental results on the CoNLL 2008 benchmark dataset demonstrate that our model is competitive with other unsupervised approaches in terms of F1 whilst attaining significantly higher cluster purity.
3 0.74465281 101 emnlp-2011-Optimizing Semantic Coherence in Topic Models
Author: David Mimno ; Hanna Wallach ; Edmund Talley ; Miriam Leenders ; Andrew McCallum
Abstract: Latent variable models have the potential to add value to large document collections by discovering interpretable, low-dimensional subspaces. In order for people to use such models, however, they must trust them. Unfortunately, typical dimensionality reduction methods for text, such as latent Dirichlet allocation, often produce low-dimensional subspaces (topics) that are obviously flawed to human domain experts. The contributions of this paper are threefold: (1) An analysis of the ways in which topics can be flawed; (2) an automated evaluation metric for identifying such topics that does not rely on human annotators or reference collections outside the training data; (3) a novel statistical topic model based on this metric that significantly improves topic quality in a large-scale document collection from the National Institutes of Health (NIH).
4 0.5306921 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing
Author: Zhenghua Li ; Min Zhang ; Wanxiang Che ; Ting Liu ; Wenliang Chen ; Haizhou Li
Abstract: Part-of-speech (POS) is an indispensable feature in dependency parsing. Current research usually models POS tagging and dependency parsing independently. This may suffer from error propagation problem. Our experiments show that parsing accuracy drops by about 6% when using automatic POS tags instead of gold ones. To solve this issue, this paper proposes a solution by jointly optimizing POS tagging and dependency parsing in a unique model. We design several joint models and their corresponding decoding algorithms to incorporate different feature sets. We further present an effective pruning strategy to reduce the search space of candidate POS tags, leading to significant improvement of parsing speed. Experimental results on Chinese Penn Treebank 5 show that our joint models significantly improve the state-of-the-art parsing accuracy by about 1.5%. Detailed analysis shows that the joint method is able to choose such POS tags that are more helpful and discriminative from parsing viewpoint. This is the fundamental reason of parsing accuracy improvement.
5 0.51163149 127 emnlp-2011-Structured Lexical Similarity via Convolution Kernels on Dependency Trees
Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili
Abstract: A central topic in natural language processing is the design of lexical and syntactic features suitable for the target application. In this paper, we study convolution dependency tree kernels for automatic engineering of syntactic and semantic patterns exploiting lexical similarities. We define efficient and powerful kernels for measuring the similarity between dependency structures whose surface forms of the lexical nodes are in part or completely different. The experiments with such kernels for question classification show unprecedented results, e.g. 41% error reduction over the former state-of-the-art. Additionally, semantic role classification confirms the benefit of semantic smoothing for dependency kernels.
6 0.5016939 116 emnlp-2011-Robust Disambiguation of Named Entities in Text
7 0.49844471 81 emnlp-2011-Learning General Connotation of Words using Graph-based Algorithms
8 0.49806648 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
9 0.48447514 78 emnlp-2011-Large-Scale Noun Compound Interpretation Using Bootstrapping and the Web as a Corpus
10 0.47702789 144 emnlp-2011-Unsupervised Learning of Selectional Restrictions and Detection of Argument Coercions
11 0.47598526 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction
12 0.4737592 126 emnlp-2011-Structural Opinion Mining for Graph-based Sentiment Representation
13 0.47301969 50 emnlp-2011-Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation
14 0.47288373 128 emnlp-2011-Structured Relation Discovery using Generative Models
15 0.47272098 134 emnlp-2011-Third-order Variational Reranking on Packed-Shared Dependency Forests
16 0.46873829 147 emnlp-2011-Using Syntactic and Semantic Structural Kernels for Classifying Definition Questions in Jeopardy!
17 0.46554375 67 emnlp-2011-Hierarchical Verb Clustering Using Graph Factorization
18 0.46416584 105 emnlp-2011-Predicting Thread Discourse Structure over Technical Web Forums
19 0.46347347 33 emnlp-2011-Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs
20 0.46143278 112 emnlp-2011-Refining the Notions of Depth and Density in WordNet-based Semantic Similarity Measures