acl acl2013 acl2013-343 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Greg Coppola ; Mark Steedman
Abstract: Higher-order dependency features are known to improve dependency parser accuracy. We investigate the incorporation of such features into a cube decoding phrase-structure parser. We find considerable gains in accuracy on the range of standard metrics. What is especially interesting is that we find strong, statistically significant gains on dependency recovery on out-of-domain tests (Brown vs. WSJ). This suggests that higher-order dependency features are not simply overfitting the training material.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Higher-order dependency features are known to improve dependency parser accuracy. [sent-10, score-0.667]
2 We investigate the incorporation of such features into a cube decoding phrase-structure parser. [sent-11, score-0.404]
3 We find considerable gains in accuracy on the range of standard metrics. [sent-12, score-0.094]
4 What is especially interesting is that we find strong, statistically significant gains on dependency recovery on out-of-domain tests (Brown vs. WSJ). [sent-13, score-0.42]
5 This suggests that higher-order dependency features are not simply overfitting the training material. [sent-15, score-0.29]
6 1 Introduction Higher-order dependency features encode more complex sub-parts of a dependency tree structure than first-order, bigram head-modifier relationships. [sent-16, score-0.506]
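To make the distinction concrete, here is a small illustrative sketch (not from the paper: the head-array encoding and feature-template names are invented for exposition, and the paper's actual templates in §3 are richer). First-order features look at a single head-modifier arc, while higher-order features also condition on a grandparent or a sibling.

    # Illustrative sketch only: a dependency tree stored as a head array, with
    # invented feature-template names (the paper's real templates are richer).

    def first_order_features(heads, words):
        # one feature per head-modifier (bigram) arc
        feats = []
        for mod, head in enumerate(heads):
            if head is None:                      # the root has no head
                continue
            feats.append(("HEAD-MOD", words[head], words[mod]))
        return feats

    def higher_order_features(heads, words):
        # grandchild (grandparent, head, modifier) and sibling (head, mod, mod) triples
        feats = []
        for mod, head in enumerate(heads):
            if head is None:
                continue
            grand = heads[head]
            if grand is not None:
                feats.append(("GRANDCHILD", words[grand], words[head], words[mod]))
            for sib in range(len(heads)):
                if sib != mod and heads[sib] == head:
                    feats.append(("SIBLING", words[head], words[mod], words[sib]))
        return feats

    # toy sentence "John saw Mary", with both "John" and "Mary" attached to "saw"
    words = ["John", "saw", "Mary"]
    heads = [1, None, 1]
    print(first_order_features(heads, words))
    print(higher_order_features(heads, words))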
7 1 The clear trend in dependency parsing has been that the addition of such higher-order features improves parse accuracy (McDonald & Pereira, 2006; Carreras, 2007; Koo & Collins, 2010; Zhang & Nivre, 2011; Zhang & McDonald, 2012). [sent-17, score-0.464]
8 This finding suggests that the same benefits might be observed in phrase-structure parsing. [sent-18, score-0.057]
9 Phrase-structure parsers are generally stronger than dependency parsers (Petrov et al.). [sent-20, score-0.384]
10 So, it might be that the information modelled by higher-order dependency features adds less of a benefit in the phrase-structure case. [sent-22, score-0.347]
11 1Examples of first-order and higher-order dependency features are given in §3. [sent-23, score-0.29]
12 To investigate this issue, we experiment using Huang’s (2008) cube decoding algorithm. [sent-25, score-0.33]
13 This algorithm allows structured prediction with non-local features, as discussed in §2. [sent-26, score-0.086]
14 Collins’s (1997) strategy of expanding the phrase-structure parser’s dynamic program to incorporate head-modifier dependency information would not scale to the complex kinds of dependencies we will consider. [sent-27, score-0.34]
15 Using Huang’s algorithm, we can indeed incorporate arbitrary types of dependency feature, using a single, simple dynamic program. [sent-28, score-0.26]
16 Compared to the baseline, non-local feature set of Collins (2000) and Charniak & Johnson (2005), we find that higher-order dependencies do in fact tend to improve performance significantly on both dependency and constituency accuracy metrics. [sent-29, score-0.409]
17 Our most interesting finding, though, is that higher-order dependency features show a consistent and unambiguous contribution to the dependency accuracy, both labelled and unlabelled, of our phrase-structure parsers on out-of-domain tests (which means, here, trained on WSJ, but tested on BROWN). [sent-30, score-0.703]
18 In fact, the gains are even stronger on out-of-domain tests than on in-domain tests. [sent-31, score-0.137]
19 One might have thought that higher-order dependencies, being rather specific by nature, would tend to pick out only very rare events, and so only serve to over-fit the training material, but this is not what we find. [sent-32, score-0.057]
20 The cube decoding paradigm requires a first-stage parser to prune the output space. [sent-35, score-0.526]
21 For this, we use the generative parser of Petrov et al. [sent-36, score-0.289]
22 We can use this parser’s model score as a feature in our discriminative model at no additional cost. [sent-38, score-0.175]
23 However, doing so conflates the contribution to accuracy of the generative model, on the one hand, and the discriminatively trained model, on the other. [sent-39, score-0.168]
24 Future systems might use the same or a similar feature set to ours, but in an architecture that does not include any generative parser. [sent-42, score-0.254]
25 On the other hand, some systems might indeed incorporate this generative model’s score. [sent-43, score-0.229]
26 So, we need to know exactly what the generative model is contributing to the accuracy of a generative-discriminative model combi- nation. [sent-44, score-0.168]
27 Thus, we conduct experiments in sets: in some cases the generative model score is used, and in others it is not used. [sent-45, score-0.128]
28 Compared to the faster and more psychologically plausible shift-reduce parsers (Zhang & Nivre, 2011; Zhang & Clark, 2011), cube decoding is a computationally expensive method. [sent-46, score-0.429]
29 But, cube decoding provides a relatively exact environment with which to compare different feature sets, has close connections with modern phrase-based machine translation methods (Huang & Chiang, 2007), and produces very accurate parsers. [sent-47, score-0.45]
30 In some cases, one might want to use a slower, but more accurate, parser during the training stage of a semi-supervised parser training strategy. [sent-48, score-0.379]
31 (2010) have shown that a fast parser (Nivre et al. [sent-50, score-0.161]
32 , 2007) can be profitably trained from the output of a slower but more accurate one (Petrov et al.). [sent-51, score-0.12]
33 1 Non-Local Features To decode using exact dynamic programming (i.e., CKY), [sent-55, score-0.035]
34 one must restrict oneself to the use of only local features. [sent-57, score-0.04]
35 Local features are those that factor according to the individual rule productions of the parse. [sent-58, score-0.143]
36 For example, a feature indicating the presence of the rule S → NP VP is local. [sent-59, score-0.138]
37 But, a feature indicating that the head word of the rule S → NP VP is, e.g. [sent-60, score-0.031]
38 , joined, is non-local, because the head word of a phrase cannot be determined by looking at a single rule production. [sent-63, score-0.226]
39 To find a phrase’s head word (or tag), we must recursively find the [sent-64, score-0.072]
40 (A feature indicating that, e.g., the first word dominated by S is Pierre is also local, since the words of the sentence are constant across hypothesized parses, and words can be referred to by their position with respect to a given rule production.) [sent-70, score-0.069]
41 head phrase of each local rule production, until we reach a terminal node (or tag node). [sent-72, score-0.315]
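A minimal sketch of this head-word recursion, assuming each rule production is annotated with which of its children is the head (the Node class and names below are our own illustration, not the authors' code):

    # Sketch: recover a phrase's head word by following head children down to a
    # terminal; every non-terminal records which of its children is the head.

    class Node:
        def __init__(self, label, children=None, word=None, head_idx=0):
            self.label = label            # e.g. "S", "VP", or a POS tag
            self.children = children or []
            self.word = word              # set only on terminal (word) nodes
            self.head_idx = head_idx      # index of the head child of this production

    def head_word(node):
        if node.word is not None:         # terminal: the head word is the word itself
            return node.word
        return head_word(node.children[node.head_idx])

    # "Pierre joined the board": S -> NP VP, with the VP as the head child of S
    np = Node("NP", [Node("NNP", word="Pierre")])
    vp = Node("VP", [Node("VBD", word="joined"),
                     Node("NP", [Node("DT", word="the"), Node("NN", word="board")],
                          head_idx=1)])
    s = Node("S", [np, vp], head_idx=1)
    print(head_word(s))                   # -> "joined"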
42 Many discriminative parsers have used only local features (Taskar et al.). [sent-74, score-0.288]
43 However, Huang (2008) shows that the use of non-local features does in fact contribute substantially to parser performance. [sent-78, score-0.235]
44 And, our desire to make heavy use of head-word dependency relations necessitates the use of non-local features. [sent-79, score-0.216]
45 2 Cube Decoding While the use of non-local features destroys the ability to do exact search, we can still do inexact search using Huang’s (2008) cube decoding algorithm. [sent-81, score-0.404]
46 A tractable first-stage parser prunes the space of possible parses, and outputs a forest, which is a set of rule production instances that can be used to make a parse for the given sentence, and which is significantly pruned compared to the entire space allowed by the grammar. [sent-82, score-0.335]
47 The size of this forest is at most cubic in the length of the sentence (Billot & Lang, 1989), but implicitly represents exponentially many parses. [sent-83, score-0.066]
48 Then, when parsing, we visit each node n in the same bottom-up order we would use for Viterbi decoding, and compute a list of the top k parses to n, according to a global linear model (Collins, 2002), using the trees that have survived the beam at earlier nodes. [sent-85, score-0.046]
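A highly simplified sketch of that bottom-up, beam-pruned k-best computation follows. It builds the full cross-product of child derivations instead of enumerating them lazily "cube"-style, and its data structures and names are assumptions for illustration only, not the authors' implementation:

    from collections import namedtuple
    import itertools

    # Sketch: forest[node] is a list of candidate rule productions for that node,
    # and each production lists the forest nodes of its children.
    Production = namedtuple("Production", ["rule", "children"])

    def kbest(node, forest, weights, feature_fn, k=10):
        # nodes with no productions (terminals) contribute an empty tree, score 0;
        # in practice each node is visited once, bottom-up, and results are memoised
        if not forest.get(node):
            return [((node, []), 0.0)]
        beam = []
        for prod in forest[node]:
            child_lists = [kbest(child, forest, weights, feature_fn, k)
                           for child in prod.children]
            # enumerate one surviving derivation per child; real cube decoding
            # explores this cross-product lazily, best-first
            for combo in itertools.product(*child_lists):
                subtrees = [subtree for subtree, _ in combo]
                tree = (prod, subtrees)
                score = sum(s for _, s in combo)
                # non-local features (head words, dependencies, ...) may inspect
                # the whole partial tree assembled so far
                score += sum(weights.get(f, 0.0) * v
                             for f, v in feature_fn(tree).items())
                beam.append((tree, score))
        beam.sort(key=lambda item: item[1], reverse=True)
        return beam[:k]                   # keep only the k best parses at this node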
49 3 The First-Stage Parser As noted, we require a first-stage parser to prune the search space. [sent-87, score-0.196]
50 As a by-product of this pruning procedure, we are able to use the model score of the first-stage parser as a feature in our ultimate model at no additional cost. [sent-88, score-0.23]
51 ’s (2010) implementation of the LA-PCFG parser of Petrov et al. [sent-90, score-0.161]
52 1 Phrase-Structure Features Our phrase-structure feature set is taken from Collins (2000) and Charniak & Johnson (2005), among others. (The cube decoding algorithm is closely related to the algorithm for phrase-based machine translation using a language model (Huang & Chiang, 2007).) [sent-93, score-0.069]
53 5All work in this paradigm has used a generative parser as the first-stage parser. [sent-94, score-0.289]
54 We could just as well use a discriminative parser with only local features, like Petrov & Klein (2007a). [sent-96, score-0.307]
55 Some features are omitted, with choices made based on the ablation studies of Johnson & Ural (2010). [sent-98, score-0.074]
56 McDonald et al. (2005) showed that chart-based dependency parsing, based on Eisner’s (1996) algorithm, could be successfully approached in a discriminative framework. [sent-102, score-0.322]
57 In this earliest work, each feature function could only refer to a single, bigram head-modifier relationship. [sent-103, score-0.069]
58 Subsequent work (McDonald & Pereira, 2006; Carreras, 2007; Koo & Collins, 2010) looked at allowing features to access more complex, higher-order relationships, including trigram and 4-gram relationships. [sent-106, score-0.074]
59 With the ability to incorporate non-local phrase-structure parse features (Huang, 2008), we can recognize dependency features of arbitrary order. [sent-109, score-0.452]
60 Our dependency feature set, which we call Φdeps, contains, among other classes, modifier head and modifier features. (The tags outside of a given XP are approximated using the marginally most likely tags given the parse.) [sent-111, score-0.551]
61 Each feature class contains more and less lexicalized versions. [sent-113, score-0.069]
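As an illustration of such lexicalization levels (a sketch; the exact back-off scheme used in the paper is not spelled out here), a single head-modifier relation can be emitted as word/word, word/tag, tag/word and tag/tag variants:

    # Sketch: emit several lexicalization levels of one head-modifier feature so
    # that rare word/word events can back off to POS-level events.

    def lexicalization_variants(head_word, head_tag, mod_word, mod_tag):
        return [
            ("HM-word-word", head_word, mod_word),
            ("HM-word-tag",  head_word, mod_tag),
            ("HM-tag-word",  head_tag,  mod_word),
            ("HM-tag-tag",   head_tag,  mod_tag),
        ]

    print(lexicalization_variants("joined", "VBD", "board", "NN"))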
62 3 Generative Model Score Feature Finally, we have a feature set, Φgen, containing only one feature function. [sent-115, score-0.138]
63 This feature maps a parse to the logarithm of the MAX-RULE-PRODUCT score of that parse according to the LA-PCFG parsing model, which is trained separately. [sent-116, score-0.247]
64 This score has the character of a conditional likelihood for the parse (see Petrov & Klein (2007b)). [sent-117, score-0.044]
65 4 Training We have two feature sets Φphrase and Φdeps, for which we fix weights using parallel stochastic optimization of a structured SVM objective (Collins, 2002; Taskar et al.). [sent-118, score-0.104]
66 To the single feature in Φgen (i.e., the generative model score), we give the weight 1. [sent-125, score-0.128]
67 The MERT stage helps to avoid feature undertraining (Sutton et al.). [sent-128, score-0.139]
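Read literally, the final score of a parse is then a weighted combination of the two discriminatively trained blocks and the generative score, with the block-level weights tuned by MERT. The following sketch encodes that reading; the block structure and names are our own, not the authors' implementation:

    # Sketch: two discriminatively trained feature blocks plus the generative
    # model score, combined with block-level weights tuned by MERT.

    def block_score(features, weights):
        return sum(weights.get(f, 0.0) * value for f, value in features.items())

    def model_score(parse, w_phrase, w_deps, mert_weights,
                    phrase_feats, deps_feats, log_gen_score):
        s_phrase = block_score(phrase_feats(parse), w_phrase)   # Phi_phrase block
        s_deps = block_score(deps_feats(parse), w_deps)         # Phi_deps block
        s_gen = log_gen_score(parse)   # log MAX-RULE-PRODUCT score; weight 1 during training
        a, b, c = mert_weights         # tuned by MERT on held-out data afterwards
        return a * s_phrase + b * s_deps + c * s_gen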
68 G abbreviates generative, D abbreviates discriminative. Some cells are empty because Φdeps features are only sensitive to unlabelled dependencies. [sent-134, score-0.258]
69 Best results in D and G+D conditions appear in bold face. [sent-135, score-0.04]
70 We evaluate using the harmonic mean of labelled bracket recall and precision (EVALB F1), unlabelled dependency accuracy (UAS), and labelled dependency accuracy (LAS). [sent-142, score-0.738]
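For concreteness, the three metrics can be computed roughly as follows (a sketch; EVALB applies additional normalisations, such as ignoring certain labels, that are not shown here):

    # Sketch of the three metrics: labelled bracket F1 over (label, start, end)
    # spans, and unlabelled / labelled attachment scores over dependency arcs.

    def bracket_f1(gold_brackets, test_brackets):
        gold, test = set(gold_brackets), set(test_brackets)
        correct = len(gold & test)
        precision = correct / len(test) if test else 0.0
        recall = correct / len(gold) if gold else 0.0
        if precision + recall == 0.0:
            return 0.0
        return 2 * precision * recall / (precision + recall)   # harmonic mean

    def attachment_scores(gold_arcs, test_arcs):
        """Arcs map modifier -> (head, label); UAS ignores the label, LAS does not."""
        n = len(gold_arcs)
        uas = sum(test_arcs.get(m, (None, None))[0] == h
                  for m, (h, _) in gold_arcs.items()) / n
        las = sum(test_arcs.get(m, (None, None)) == (h, lab)
                  for m, (h, lab) in gold_arcs.items()) / n
        return uas, las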
71 We chose this dependency extractor, firstly, because it is natively meant to be run on the output of phrase-structure parsers, rather than on gold trees with function tags and traces still present. [sent-144, score-0.253]
72 Also, this is the extractor that was used in a recent shared task (Petrov & McDonald, 2012). [sent-147, score-0.037]
73 2 Results The performance of the models is shown in Table 1, and Table 2 depicts the results of significance tests of differences between key model pairs. [sent-159, score-0.051]
74 We find that adding in the higher-order dependency feature set, Φdeps, makes a statistically significant improvement in accuracy on most metrics, in most conditions. [sent-160, score-0.362]
75 However, on the out-of-domain BROWN tests, we find that adding Φdeps always adds considerably, and in a statistically significant way, to both LAS and UAS. [sent-163, score-0.037]
76 That is, not only is Φphrase+deps better at dependency recovery than its component parts, but Φphrase+deps+gen is also considerably better on dependency recovery than Φphrase+gen, which represents the previous state-of-the-art in this vein of research (Huang, 2008). [sent-164, score-0.556]
77 This result is perhaps counter-intuitive, in the sense that one might have supposed that higher-order dependency features, being highly specific by nature, might have served only to over-fit the training material. [sent-165, score-0.33]
78 Note that the dependency features include various levels of lexicalization. [sent-167, score-0.29]
79 It might be that the more unlexicalized features capture something about the structure of correct parses that transfers well out-of-domain. [sent-168, score-0.163]
80 To our knowledge, this is the first work to specifically separate the role of the generative model feature from the other features of Collins (2000) and Charniak & Johnson (2005). [sent-171, score-0.271]
81 We note that, even without the Φgen feature, the discriminative parsing models are very strong, but adding Φgen nevertheless yields considerable gains. [sent-172, score-0.196]
82 Thus, while a fully discriminative model, perhaps implemented using a shift-reduce algorithm, can be expected to do very well, the generative model score is still worth including where the best possible accuracy is necessary. [sent-173, score-0.146]
83 This is presumably largely because our dependency features are, at present, not sensitive to arc labels, so our results probably underestimate the capability of our general framework with respect to labelled dependency recovery. [sent-177, score-0.68]
84 Note that our model Φphrase+gen uses essentially the same features as Huang (2008), so the fact that our Φphrase+gen is noticeably more accurate on F1 is presumably due to the benefits in reduced feature under-training achieved by the MERT combination strategy. [sent-179, score-0.233]
85 Also, our Φphrase+deps model is as accurate as Huang’s, without even using the generative model score feature. [sent-180, score-0.179]
86 5 million sentences of automatically labelled NANC newswire text (semi-supervised, out-of-domain), and iii) the BROWN corpus (supervised, in-domain). [sent-184, score-0.078]
87 Table 3: Comparison of constituency parsing results in the cube decoding framework, on the WSJ test set. [sent-189, score-0.041]
88 Underline indicates best trained on WSJ, bold face indicates best overall. [sent-194, score-0.04]
89 We see that our best (WSJ-trained) model is over 2% more accurate (absolute F1 difference) than the Charniak & Johnson (2005) parser trained on the same data. [sent-196, score-0.212]
90 Of course, the self-training strategy is orthogonal to the improvements we have made. [sent-199, score-0.033]
91 6 Conclusion We have shown that the addition of higher-order dependency features into a cube decoding phrase-structure parser leads to statistically significant gains in accuracy. [sent-200, score-0.872]
92 The most interesting finding is that these gains are clearly observed on out-of-domain tests. [sent-201, score-0.054]
93 This seems to imply that higher-order dependency features do not merely over-fit the training material. [sent-202, score-0.335]
94 Future work should look at other train-test domain pairs, as well as look at exactly which higher-order dependency features are most important to out-of-domain accuracy. [sent-203, score-0.29]
95 Three new probabilistic models for dependency parsing: An exploration. [sent-248, score-0.216]
96 Overview of the 2012 shared task on parsing the web. [sent-394, score-0.09]
97 Feature bagging: Preventing weight undertraining in structured discriminative learning. [sent-400, score-0.211]
98 Scalable discriminative learning for natural language parsing and translation. [sent-416, score-0.196]
wordName wordTfidf (topN-words)
[('deps', 0.4), ('gen', 0.299), ('cube', 0.22), ('dependency', 0.216), ('petrov', 0.197), ('johnson', 0.173), ('huang', 0.17), ('parser', 0.161), ('mcdonald', 0.151), ('collins', 0.146), ('charniak', 0.138), ('generative', 0.128), ('wsj', 0.126), ('modifier', 0.12), ('decoding', 0.11), ('discriminative', 0.106), ('grandchild', 0.104), ('brown', 0.104), ('mcclosky', 0.097), ('parsing', 0.09), ('phrase', 0.085), ('ural', 0.08), ('sibling', 0.079), ('labelled', 0.078), ('features', 0.074), ('klein', 0.073), ('head', 0.072), ('unlabelled', 0.07), ('billot', 0.07), ('undertraining', 0.07), ('nivre', 0.07), ('feature', 0.069), ('rule', 0.069), ('parsers', 0.068), ('forest', 0.066), ('pereira', 0.064), ('reranking', 0.064), ('recovery', 0.062), ('production', 0.061), ('mert', 0.061), ('taskar', 0.06), ('arc', 0.057), ('abbreviates', 0.057), ('might', 0.057), ('carreras', 0.056), ('zhang', 0.055), ('koo', 0.055), ('gains', 0.054), ('evalb', 0.053), ('crammer', 0.052), ('tests', 0.051), ('accurate', 0.051), ('nonlocal', 0.051), ('conjuncts', 0.051), ('las', 0.05), ('tag', 0.049), ('sutton', 0.047), ('dependencies', 0.047), ('parses', 0.046), ('higherorder', 0.045), ('incorporate', 0.044), ('parse', 0.044), ('steedman', 0.044), ('emnlp', 0.044), ('turian', 0.043), ('lang', 0.043), ('nugues', 0.043), ('accuracy', 0.04), ('bold', 0.04), ('informatics', 0.04), ('uas', 0.04), ('local', 0.04), ('presumably', 0.039), ('martins', 0.038), ('finkel', 0.038), ('johansson', 0.038), ('slower', 0.038), ('acl', 0.037), ('statistically', 0.037), ('tags', 0.037), ('constituency', 0.037), ('extractor', 0.037), ('xp', 0.035), ('prune', 0.035), ('decode', 0.035), ('structured', 0.035), ('heads', 0.034), ('manning', 0.034), ('chiang', 0.034), ('strategy', 0.033), ('eisner', 0.033), ('stronger', 0.032), ('unlexicalized', 0.032), ('hde', 0.031), ('awnodr', 0.031), ('scottish', 0.031), ('psychologically', 0.031), ('iend', 0.031), ('kleeman', 0.031), ('profitably', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
Author: Greg Coppola ; Mark Steedman
Abstract: Higher-order dependency features are known to improve dependency parser accuracy. We investigate the incorporation of such features into a cube decoding phrase-structure parser. We find considerable gains in accuracy on the range of standard metrics. What is especially interesting is that we find strong, statistically significant gains on dependency recovery on out-of-domain tests (Brown vs. WSJ). This suggests that higher-order dependency features are not simply overfitting the training material.
2 0.21139951 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
Author: Muhua Zhu ; Yue Zhang ; Wenliang Chen ; Min Zhang ; Jingbo Zhu
Abstract: Shift-reduce dependency parsers give comparable accuracies to their chartbased counterparts, yet the best shiftreduce constituent parsers still lag behind the state-of-the-art. One important reason is the existence of unary nodes in phrase structure trees, which leads to different numbers of shift-reduce actions between different outputs for the same input. This turns out to have a large empirical impact on the framework of global training and beam search. We propose a simple yet effective extension to the shift-reduce process, which eliminates size differences between action sequences in beam-search. Our parser gives comparable accuracies to the state-of-the-art chart parsers. With linear run-time complexity, our parser is over an order of magnitude faster than the fastest chart parser.
3 0.19966237 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data
Author: Xuezhe Ma ; Fei Xia
Abstract: In this paper, we propose a simple and effective approach to domain adaptation for dependency parsing. This is a feature augmentation approach in which the new features are constructed based on subtree information extracted from the autoparsed target domain data. To demonstrate the effectiveness of the proposed approach, we evaluate it on three pairs of source-target data, compared with several common baseline systems and previous approaches. Our approach achieves significant improvement on all the three pairs of data sets.
4 0.19157244 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
Author: Jinho D. Choi ; Andrew McCallum
Abstract: We present a novel approach, called selectional branching, which uses confidence estimates to decide when to employ a beam, providing the accuracy of beam search at speeds close to a greedy transition-based dependency parsing approach. Selectional branching is guaranteed to perform a fewer number of transitions than beam search yet performs as accurately. We also present a new transition-based dependency parsing algorithm that gives a complexity of O(n) for projective parsing and an expected linear time speed for non-projective parsing. With the standard setup, our parser shows an unlabeled attachment score of 92.96% and a parsing speed of 9 milliseconds per sentence, which is faster and more accurate than the current state-of-the-art transitionbased parser that uses beam search.
5 0.1872678 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
Author: Yang Liu
Abstract: We introduce a shift-reduce parsing algorithm for phrase-based string-todependency translation. As the algorithm generates dependency trees for partial translations left-to-right in decoding, it allows for efficient integration of both n-gram and dependency language models. To resolve conflicts in shift-reduce parsing, we propose a maximum entropy model trained on the derivation graph of training data. As our approach combines the merits of phrase-based and string-todependency models, it achieves significant improvements over the two baselines on the NIST Chinese-English datasets.
6 0.18685035 362 acl-2013-Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers
7 0.17713031 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing
8 0.15535735 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing
9 0.15069249 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
10 0.14789334 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT
11 0.14671983 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
12 0.14022315 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy
13 0.13412137 204 acl-2013-Iterative Transformation of Annotation Guidelines for Constituency Parsing
14 0.13372858 44 acl-2013-An Empirical Examination of Challenges in Chinese Parsing
15 0.1318472 36 acl-2013-Adapting Discriminative Reranking to Grounded Language Learning
16 0.12512089 275 acl-2013-Parsing with Compositional Vector Grammars
17 0.12201202 80 acl-2013-Chinese Parsing Exploiting Characters
18 0.11636007 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
19 0.1072436 323 acl-2013-Simpler unsupervised POS tagging with bilingual projections
20 0.10498742 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers
topicId topicWeight
[(0, 0.243), (1, -0.19), (2, -0.216), (3, 0.067), (4, -0.137), (5, 0.014), (6, 0.097), (7, -0.019), (8, 0.022), (9, -0.111), (10, 0.024), (11, 0.012), (12, -0.034), (13, 0.036), (14, 0.073), (15, 0.086), (16, -0.112), (17, 0.001), (18, 0.014), (19, 0.0), (20, 0.015), (21, 0.013), (22, 0.007), (23, 0.025), (24, -0.034), (25, 0.033), (26, -0.001), (27, -0.038), (28, -0.028), (29, -0.0), (30, 0.05), (31, 0.026), (32, -0.024), (33, 0.034), (34, 0.025), (35, 0.017), (36, 0.011), (37, -0.05), (38, 0.06), (39, 0.002), (40, -0.003), (41, 0.046), (42, -0.034), (43, -0.048), (44, 0.03), (45, 0.053), (46, 0.015), (47, 0.013), (48, -0.016), (49, -0.008)]
simIndex simValue paperId paperTitle
same-paper 1 0.97271627 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
Author: Greg Coppola ; Mark Steedman
Abstract: Higher-order dependency features are known to improve dependency parser accuracy. We investigate the incorporation of such features into a cube decoding phrase-structure parser. We find considerable gains in accuracy on the range of standard metrics. What is especially interesting is that we find strong, statistically significant gains on dependency recovery on out-of-domain tests (Brown vs. WSJ). This suggests that higher-order dependency features are not simply overfitting the training material.
2 0.87304133 362 acl-2013-Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers
Author: Andre Martins ; Miguel Almeida ; Noah A. Smith
Abstract: We present fast, accurate, direct nonprojective dependency parsers with thirdorder features. Our approach uses AD3, an accelerated dual decomposition algorithm which we extend to handle specialized head automata and sequential head bigram models. Experiments in fourteen languages yield parsing speeds competitive to projective parsers, with state-ofthe-art accuracies for the largest datasets (English, Czech, and German).
3 0.85711247 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing
Author: Guangyou Zhou ; Jun Zhao
Abstract: This paper is concerned with the problem of heterogeneous dependency parsing. In this paper, we present a novel joint inference scheme, which is able to leverage the consensus information between heterogeneous treebanks in the parsing phase. Different from stacked learning methods (Nivre and McDonald, 2008; Martins et al., 2008), which process the dependency parsing in a pipelined way (e.g., a second level uses the first level outputs), in our method, multiple dependency parsing models are coordinated to exchange consensus information. We conduct experiments on Chinese Dependency Treebank (CDT) and Penn Chinese Treebank (CTB), experimental results show that joint infer- ence can bring significant improvements to all state-of-the-art dependency parsers.
4 0.85142469 335 acl-2013-Survey on parsing three dependency representations for English
Author: Angelina Ivanova ; Stephan Oepen ; Lilja vrelid
Abstract: In this paper we focus on practical issues of data representation for dependency parsing. We carry out an experimental comparison of (a) three syntactic dependency schemes; (b) three data-driven dependency parsers; and (c) the influence of two different approaches to lexical category disambiguation (aka tagging) prior to parsing. Comparing parsing accuracies in various setups, we study the interactions of these three aspects and analyze which configurations are easier to learn for a dependency parser.
5 0.84586924 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
Author: Muhua Zhu ; Yue Zhang ; Wenliang Chen ; Min Zhang ; Jingbo Zhu
Abstract: Shift-reduce dependency parsers give comparable accuracies to their chartbased counterparts, yet the best shiftreduce constituent parsers still lag behind the state-of-the-art. One important reason is the existence of unary nodes in phrase structure trees, which leads to different numbers of shift-reduce actions between different outputs for the same input. This turns out to have a large empirical impact on the framework of global training and beam search. We propose a simple yet effective extension to the shift-reduce process, which eliminates size differences between action sequences in beam-search. Our parser gives comparable accuracies to the state-of-the-art chart parsers. With linear run-time complexity, our parser is over an order of magnitude faster than the fastest chart parser.
6 0.83261842 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
7 0.82554859 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy
8 0.77618068 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data
9 0.77542818 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
10 0.77297652 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
11 0.75487542 288 acl-2013-Punctuation Prediction with Transition-based Parsing
12 0.73891962 331 acl-2013-Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
13 0.73050874 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing
14 0.70830894 94 acl-2013-Coordination Structures in Dependency Treebanks
15 0.70017195 204 acl-2013-Iterative Transformation of Annotation Guidelines for Constituency Parsing
16 0.6639818 28 acl-2013-A Unified Morpho-Syntactic Scheme of Stanford Dependencies
17 0.66234124 36 acl-2013-Adapting Discriminative Reranking to Grounded Language Learning
18 0.66155416 176 acl-2013-Grounded Unsupervised Semantic Parsing
19 0.65339476 133 acl-2013-Efficient Implementation of Beam-Search Incremental Parsers
20 0.62724501 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models
topicId topicWeight
[(0, 0.08), (6, 0.047), (11, 0.076), (14, 0.015), (15, 0.014), (24, 0.032), (26, 0.082), (28, 0.025), (35, 0.063), (42, 0.093), (48, 0.047), (70, 0.068), (88, 0.029), (90, 0.039), (95, 0.057), (99, 0.159)]
simIndex simValue paperId paperTitle
1 0.9337644 375 acl-2013-Using Integer Linear Programming in Concept-to-Text Generation to Produce More Compact Texts
Author: Gerasimos Lampouras ; Ion Androutsopoulos
Abstract: We present an ILP model of concept-totext generation. Unlike pipeline architectures, our model jointly considers the choices in content selection, lexicalization, and aggregation to avoid greedy decisions and produce more compact texts.
2 0.87474942 189 acl-2013-ImpAr: A Deterministic Algorithm for Implicit Semantic Role Labelling
Author: Egoitz Laparra ; German Rigau
Abstract: This paper presents a novel deterministic algorithm for implicit Semantic Role Labeling. The system exploits a very simple but relevant discursive property, the argument coherence over different instances of a predicate. The algorithm solves the implicit arguments sequentially, exploiting not only explicit but also the implicit arguments previously solved. In addition, we empirically demonstrate that the algorithm obtains very competitive and robust performances with respect to supervised approaches that require large amounts of costly training data.
same-paper 3 0.86455464 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
Author: Greg Coppola ; Mark Steedman
Abstract: Higher-order dependency features are known to improve dependency parser accuracy. We investigate the incorporation of such features into a cube decoding phrase-structure parser. We find considerable gains in accuracy on the range of standard metrics. What is especially interesting is that we find strong, statistically significant gains on dependency recovery on out-of-domain tests (Brown vs. WSJ). This suggests that higher-order dependency features are not simply overfitting the training material.
Author: Trevor Cohn ; Lucia Specia
Abstract: Annotating linguistic data is often a complex, time consuming and expensive endeavour. Even with strict annotation guidelines, human subjects often deviate in their analyses, each bringing different biases, interpretations of the task and levels of consistency. We present novel techniques for learning from the outputs of multiple annotators while accounting for annotator specific behaviour. These techniques use multi-task Gaussian Processes to learn jointly a series of annotator and metadata specific models, while explicitly representing correlations between models which can be learned directly from data. Our experiments on two machine translation quality estimation datasets show uniform significant accuracy gains from multi-task learning, and consistently outperform strong baselines.
5 0.77245551 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
Author: Ulle Endriss ; Raquel Fernandez
Abstract: Crowdsourcing, which offers new ways of cheaply and quickly gathering large amounts of information contributed by volunteers online, has revolutionised the collection of labelled data. Yet, to create annotated linguistic resources from this data, we face the challenge of having to combine the judgements of a potentially large group of annotators. In this paper we investigate how to aggregate individual annotations into a single collective annotation, taking inspiration from the field of social choice theory. We formulate a general formal model for collective annotation and propose several aggregation methods that go beyond the commonly used majority rule. We test some of our methods on data from a crowdsourcing experiment on textual entailment annotation.
6 0.76584727 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
7 0.76003492 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
8 0.75787532 225 acl-2013-Learning to Order Natural Language Texts
9 0.75764316 80 acl-2013-Chinese Parsing Exploiting Characters
10 0.75540793 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
11 0.75536877 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
12 0.75405627 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
13 0.75209278 275 acl-2013-Parsing with Compositional Vector Grammars
14 0.75152749 133 acl-2013-Efficient Implementation of Beam-Search Incremental Parsers
15 0.74920124 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing
16 0.74765575 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
17 0.74743831 173 acl-2013-Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
18 0.74509448 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
19 0.74357975 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing
20 0.74235719 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction