emnlp emnlp2012 emnlp2012-66 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Daniel Fernandez-Gonzalez ; Carlos Gomez-Rodriguez
Abstract: In this paper, we show that significant improvements in the accuracy of well-known transition-based parsers can be obtained, without sacrificing efficiency, by enriching the parsers with simple transitions that act on buffer nodes. First, we show how adding a specific transition to create either a left or right arc of length one between the first two buffer nodes produces improvements in the accuracy of Nivre’s arc-eager projective parser on a number of datasets from the CoNLL-X shared task. Then, we show that accuracy can also be improved by adding transitions involving the topmost stack node and the second buffer node (allowing a limited form of non-projectivity). None of these transitions has a negative impact on the computational complexity of the algorithm. Although the experiments in this paper use the arc-eager parser, the approach is generic enough to be applicable to any stack-based dependency parser.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper, we show that significant improvements in the accuracy of well-known transition-based parsers can be obtained, without sacrificing efficiency, by enriching the parsers with simple transitions that act on buffer nodes. [sent-2, score-1.282]
2 First, we show how adding a specific transition to create either a left or right arc of length one between the first two buffer nodes produces improvements in the accuracy of Nivre’s arc-eager projective parser on a number of datasets from the CoNLL-X shared task. [sent-3, score-1.505]
3 Then, we show that accuracy can also be improved by adding transitions involving the topmost stack node and the second buffer node (allowing a limited form of non-projectivity). [sent-4, score-1.388]
4 None of these transitions has a negative impact on the computational complexity of the algorithm. [sent-5, score-0.446]
5 , 2004; Attardi, 2006; Sagae and Tsujii, 2008; Nivre, 2009; Huang and Sagae, 2010; Gómez-Rodríguez and Nivre, 2010) are stack-based (Nivre, 2008), meaning that they keep a stack of partially processed tokens and an input buffer of unread tokens. [sent-22, score-0.812]
6 In this paper, we show how the accuracy of this kind of parsers can be improved, without compromising efficiency, by extending their set of available transitions with buffer transitions. [sent-23, score-1.229]
7 These are transitions that create a dependency arc involving some node in the buffer, which would typically be considered unavailable for linking by these algorithms. [sent-24, score-0.672]
8 The rationale is that buffer transitions construct some “easy” dependency arcs in advance, before the involved nodes reach the stack, so that the classifier’s job when choosing among standard transitions is simplified. [sent-27, score-1.897]
9 Section 3 presents the first novel contribution of this paper, projective buffer transitions, and discusses their empirical results on CoNLL-X datasets. [sent-32, score-0.981]
10 Section 4 does the same for a more complex set of transitions, non-projective buffer transitions. [sent-33, score-0.712]
11 Every dependency forest can trivially be represented as a tree by adding arcs from the dummy root node 0 to every other root node. [sent-63, score-0.409]
12 A dependency forest is said to be projective if the set of nodes reachable by traversing zero or more arcs from any given node k corresponds to a continuous substring of the input (i. [sent-65, score-0.671]
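To make the contiguity condition concrete, here is a minimal sketch of the projectivity test it implies; the function name and the head-array representation are our own illustration, not from the paper.

```python
# Sketch: a forest, represented as a tree rooted at the dummy node 0,
# is projective iff every node's set of reachable nodes (its "yield")
# is a contiguous substring of the input. Illustrative code only.

def is_projective(heads):
    """heads[k] = head of node k; heads[0] = -1 for the dummy root."""
    n = len(heads)
    children = [[] for _ in range(n)]
    for dep in range(1, n):
        children[heads[dep]].append(dep)

    def yield_of(k):
        # Nodes reachable from k by traversing zero or more arcs.
        nodes, todo = {k}, [k]
        while todo:
            for c in children[todo.pop()]:
                nodes.add(c)
                todo.append(c)
        return nodes

    return all(max(y) - min(y) + 1 == len(y)
               for y in map(yield_of, range(n)))

# Example: heads = [-1, 2, 0, 2] is projective (True); with a crossing
# arc, heads = [-1, 3, 0, 2], node 3's yield {1, 3} is non-contiguous
# and the function returns False.
```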
13 2 Transition systems A transition system is a nondeterministic state machine that maps input strings to dependency graphs. [sent-71, score-0.3]
14 A stack-based transition system is a quadruple S = (C, T, cs, Ct) where • C is a set of parser configurations, • T is a set of transitions, each of which is a partial function t : C → C, • cs is an initialization function mapping each input string to a configuration, and • Ct ⊆ C is a set of terminal configurations. [sent-73, score-0.286]
15 Transition systems are nondeterministic devices, since several transitions may be applicable to the same configuration. [sent-81, score-0.447]
16 To obtain a deterministic parser from a transition system, a classifier is trained to greedily select the best transition at each state. [sent-82, score-0.515]
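A minimal sketch of that greedy loop over the quadruple S = (C, T, cs, Ct); the classifier and feature-extraction interfaces here are placeholders of our own, not the authors' implementation.

```python
# Sketch of deterministic parsing with a trained classifier, following
# the description above. `transitions`, `classifier`, `features` and
# `is_terminal` are placeholder interfaces, not the paper's code.

def greedy_parse(sentence, transitions, classifier, features, is_terminal):
    # Initial configuration (the role of cs): empty stack, all word
    # positions in the buffer, empty arc set.
    config = ([], list(range(1, len(sentence) + 1)), set())
    while not is_terminal(config):
        # Greedily apply the highest-scoring permissible transition.
        best = max((t for t in transitions if t.is_permissible(config)),
                   key=lambda t: classifier.score(features(config), t))
        config = best.apply(config)
    return config[2]  # predicted set of (head, label, dependent) arcs
```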
17 This means that links are created in a strict left-to-right order, and implies that while leftward links are built in a bottom-up fashion, a rightward link a → b will be created before the node b has collected its right dependents. [sent-89, score-0.298]
18 The arc-eager transition system has the following four transitions (note that, for convenience, we write a stack with node i on top as σ|i, and a buffer whose first node is i as i|β): • SHIFT : (σ, i|β, A) ⇒ (σ|i, β, A). • REDUCE : (σ|i, β, A) ⇒ (σ, β, A). • LEFT-ARCl : (σ|i, j|β, A) ⇒ (σ, j|β, A ∪ {(j, l, i)}). • RIGHT-ARCl : (σ|i, j|β, A) ⇒ (σ|i|j, β, A ∪ {(i, l, j)}). [sent-90, score-1.515]
19 The SHIFT transition reads an input word by removing the first node from the buffer and placing it on top of the stack. [sent-96, score-0.985]
20 The REDUCE transition pops the stack, and it can only be executed if the topmost stack node has already been assigned a head. [sent-97, score-0.414]
21 The LEFT-ARC transition creates an arc from the first node in the buffer to the node on top of the stack, and then pops the stack. [sent-98, score-1.179]
22 Finally, the RIGHT-ARC transition creates an arc from the top of the stack to the first buffer node, and then removes the latter from the buffer and moves it to the stack. [sent-100, score-1.875]
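Read literally, the four transitions translate into the following sketch, with a configuration as (stack, buffer, arcs) and arcs as (head, label, dependent) triples; this is an illustration, not the authors' code.

```python
# The four arc-eager transitions, transcribed from the definitions
# above. A configuration is (stack, buffer, arcs). Sketch only.

def shift(stack, buffer, arcs):
    # SHIFT: (σ, i|β, A) ⇒ (σ|i, β, A)
    return stack + [buffer[0]], buffer[1:], arcs

def reduce_(stack, buffer, arcs):
    # REDUCE: (σ|i, β, A) ⇒ (σ, β, A); only if stack top has a head
    assert any(dep == stack[-1] for _, _, dep in arcs)
    return stack[:-1], buffer, arcs

def left_arc(stack, buffer, arcs, label):
    # LEFT-ARC_l: (σ|i, j|β, A) ⇒ (σ, j|β, A ∪ {(j, l, i)})
    return stack[:-1], buffer, arcs | {(buffer[0], label, stack[-1])}

def right_arc(stack, buffer, arcs, label):
    # RIGHT-ARC_l: (σ|i, j|β, A) ⇒ (σ|i|j, β, A ∪ {(i, l, j)})
    return stack + [buffer[0]], buffer[1:], arcs | {(stack[-1], label, buffer[0])}
```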
23 In principle, it is restricted to projective dependency forests, but it can be used in conjunction with the pseudo-projective transformation (Nivre et al. [sent-102, score-0.366]
24 3 Projective buffer transitions In this section, we show that the accuracy of stack-based transition systems can benefit from adding one of a pair of new transitions, which we call projective buffer transitions, to their transition sets. [sent-105, score-2.6]
25 1 The transitions The two projective buffer transitions are defined as follows: • LEFT-BUFFER-ARCl : (σ, i|j|β, A) ⇒ (σ, j|β, A ∪ {(j, l, i)}). • RIGHT-BUFFER-ARCl : (σ, i|j|β, A) ⇒ (σ, i|β, A ∪ {(i, l, j)}). [sent-107, score-1.841]
26 The LEFT-BUFFER-ARC transition creates a leftward dependency link from the second node to the first node in the buffer, and then removes the first node from the buffer. [sent-109, score-0.592]
27 Conversely, the RIGHT-BUFFER-ARC transition creates a rightward dependency link from the first node to the second node in the buffer, and then removes the second node. [sent-110, score-0.534]
28 We call these transitions projective buffer transitions because, since they act on contiguous buffer nodes, they can only create projective arcs. [sent-111, score-2.865]
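In the same sketch style, the two projective buffer transitions touch only the first two buffer positions (again illustrative code, not the authors' implementation).

```python
# The two projective buffer transitions. Both link the first two
# buffer nodes and remove the dependent from the buffer; the stack is
# untouched. Sketch only, following the definitions in Section 3.1.

def left_buffer_arc(stack, buffer, arcs, label):
    # LEFT-BUFFER-ARC_l: (σ, i|j|β, A) ⇒ (σ, j|β, A ∪ {(j, l, i)})
    i, j = buffer[0], buffer[1]
    return stack, buffer[1:], arcs | {(j, label, i)}

def right_buffer_arc(stack, buffer, arcs, label):
    # RIGHT-BUFFER-ARC_l: (σ, i|j|β, A) ⇒ (σ, i|β, A ∪ {(i, l, j)})
    i, j = buffer[0], buffer[1]
    return stack, [i] + buffer[2:], arcs | {(i, label, j)}
```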
29 However, in parsers that allow them (such as those defined by Gómez-Rodríguez and Nivre (2010)), projective buffer transitions can still be added without affecting correctness if we impose explicit single-head and acyclicity preconditions on them. [sent-113, score-1.551]
30 The idea behind projective buffer transitions is to create dependency arcs of length one (i. [sent-116, score-1.745]
31 , arcs involving contiguous nodes) in advance of the standard arc-building transitions that need at least one of the nodes to get to the stack (LEFT-ARC and RIGHT-ARC in the case of the arc-eager transition system). [sent-118, score-1.055]
32 Note that the fact that projective buffer transitions create arcs of length 1 is not explicit in the definition of the transitions. [sent-120, score-1.677]
33 For instance, if we add the LEFT-BUFFER-ARCl transition only to the arc-eager transition system, LEFT-BUFFER-ARCl will only be able to create arcs of length 1, since it is easy to see that the first two buffer nodes are contiguous in all the accessible configurations. [sent-121, score-1.463]
34 Following the hypothesis explained above, our policy has been to train the parser to use buffer transitions whenever possible for arcs of length one, and to not use them for arcs of length larger than one. [sent-126, score-1.706]
35 To test this idea, we also conducted experiments with the alternative policy “use buffer transitions whenever possible, regardless of arc length”: as expected, the obtained accuracies were (slightly) worse. [sent-127, score-1.247]
36 The chosen oracle policy is generic and can be plugged into any stack-based parser: for a given transition, first check whether it is possible to build a gold-standard arc of length 1 with a projective buffer transition. [sent-128, score-1.154]
37 2 Experiments To empirically evaluate the effect of projective buffer transitions on parsing accuracy, we have conducted experiments on eight datasets of the CoNLL-X shared task (Buchholz and Marsi, 2006): Arabic (Hajič et al. [sent-131, score-1.497]
38 As our baseline parser, we use the arc-eager projective transition system by Nivre (2003). [sent-140, score-0.484]
39 Table 1 compares the accuracy obtained by this system alone with that obtained when the LEFT-BUFFER-ARC and RIGHT-BUFFER-ARC transitions are added to it as explained in Section 3. [sent-141, score-0.457]
40 In the case of RIGHT-BUFFER-ARC, this involves checking that the candidate dependent node has no dependents in the gold-standard tree (if it has any, we cannot remove it from the buffer or it would not be able to collect its dependents, so we do not use the buffer transition). [sent-149, score-0.87]
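Putting the policy together, here is a sketch of the resulting training oracle; gold_head, has_gold_dependents and standard_oracle are hypothetical helpers standing in for the gold-standard tree and the baseline arc-eager oracle, not the paper's implementation.

```python
# Oracle policy sketch: use a buffer transition whenever it would
# create a gold-standard arc of length one, otherwise delegate to the
# standard arc-eager oracle. All helper names are hypothetical.

def oracle(config, gold_head, has_gold_dependents, standard_oracle,
           use_lba=False, use_rba=False):
    stack, buffer, arcs = config
    if len(buffer) >= 2:
        i, j = buffer[0], buffer[1]
        if use_lba and gold_head[i] == j:
            return "LEFT-BUFFER-ARC"   # gold leftward arc of length 1
        if use_rba and gold_head[j] == i and not has_gold_dependents(j):
            # j is removed from the buffer by RIGHT-BUFFER-ARC, so it
            # must have no gold dependents left to collect (see above).
            return "RIGHT-BUFFER-ARC"
    return standard_oracle(config)
```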
41 (Table caption) The baseline (NE), with the LEFT-BUFFER-ARC transition added (NE+LBA) and with the RIGHT-BUFFER-ARC transition added (NE+RBA). [sent-153, score-0.484]
42 As can be seen in Table 1, adding a projective buffer transition improves the performance of the parser in seven out of the eight tested languages. [sent-155, score-1.325]
43 Note that the decision of which buffer transition to add strongly depends on the dataset. [sent-158, score-0.927]
44 In the majority of the treebanks, we can see that when the LEFT-BUFFER-ARC transition improves performance the RIGHT-BUFFER-ARC transition harms it, and vice versa. [sent-159, score-0.43]
45 The exceptions are Czech, where both transitions are beneficial, and Swedish, where both are harmful. [sent-160, score-0.43]
46 Therefore, when using projective buffer transitions in practice, the language and annotation scheme should be taken into account (or tests should be made) to decide which one to use. [sent-161, score-1.429]
47 Languages where both transitions are beneficial (Czech) or harmful (Swedish) are marked with an asterisk. [sent-170, score-0.449]
48 with a projective buffer transition added (NE+LBA, NE+RBA) and with both projective buffer transitions added (NE+LBA+RBA). [sent-173, score-2.661]
49 We mark a standard LEFT-ARC (LA) or RIGHT-ARC (RA) transition with an asterisk (LA*, RA*) when it is acting only on a “hard” subset of leftward (rightward) arcs, and thus its precision is not directly comparable to that of (LA, RA). [sent-174, score-0.302]
50 To further test this idea, we computed the labelled precision of each individual transition of the parsers with and without projective buffer transitions, as shown in Table 3. [sent-176, score-1.28]
51 As we can see, projective buffer transitions achieve better precision than standard transitions, but this is not surprising since they act only on “easy” arcs of length 1. [sent-177, score-1.649]
52 Similarly, adding a projective buffer transition decreases the precision of its corresponding standard transition, but this is because the standard transition is then dealing only with “harder” arcs of length greater than 1, not because it is making more errors. [sent-179, score-1.679]
53 This further backs the hypothesis that the filtering of “easy” links achieved by projective buffer transitions makes it easier for the classifier to decide among standard transitions. [sent-181, score-1.476]
54 We also conducted experiments adding both transitions at the same time (NE+LBA+RBA), but the results were worse than adding the suitable transition for each dataset. [sent-182, score-0.705]
55 , 2006) decide between both with the restricted feature information available for buffer nodes. [sent-184, score-0.73]
56 To further put the obtained results into context, Table 4 compares the performance of the arc-eager parser with the projective buffer transition most suitable for each dataset with the results obtained by the parser with the pseudo-projective transformation by Nivre et al. [sent-185, score-1.367]
57 (Table caption) Results obtained by the arc-eager parser with projective buffer transitions in comparison to other parsers in the literature that report results on these datasets. [sent-191, score-1.552]
58 Finally, it is worth noting that adding projective buffer transitions has no negative impact on efficiency, either in terms of computational complexity or of empirical runtime. [sent-196, score-1.457]
59 Since each projective buffer transition removes a node from the buffer, no more than n such transitions can be executed for a sentence of length n, so adding these transitions cannot increase the complexity of a transition-based parser. [sent-197, score-2.237]
60 In the particular case of the arc-eager parser, using projective buffer transitions reduces the average number of transitions needed to obtain a given dependency forest, as some nodes can be dispatched by a single transition rather than being shifted and later popped from the stack. [sent-198, score-2.164]
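As a worked example of the saving: a leftward gold arc of length one costs two standard arc-eager transitions (a SHIFT that moves the dependent onto the stack, then a LEFT-ARC that links and pops it), whereas a single LEFT-BUFFER-ARC dispatches the dependent without it ever entering the stack.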
61 4 Non-projective buffer transitions We now present a second set of transitions that still follow the idea of early processing of some dependency arcs, as in Section 3; but which are able to create arcs skipping over a buffer node, so that they can create some non-projective arcs. [sent-200, score-2.625]
62 For this reason, we call them non-projective buffer transitions. [sent-201, score-0.712]
63 1 The transitions The two non-projective buffer transitions are defined as follows: • LEFT-NONPROJ-BUFFER-ARCl : (σ|i, j|k|β, A) ⇒ (σ, j|k|β, A ∪ {(k, l, i)}). • RIGHT-NONPROJ-BUFFER-ARCl : (σ|i, j|k|β, A) ⇒ (σ|i, j|β, A ∪ {(i, l, k)}). [sent-203, score-1.572]
64 The LEFT-NONPROJ-BUFFER-ARC transition creates a leftward arc from the second buffer node to the node on top of the stack, and then pops the stack. [sent-205, score-1.266]
65 The RIGHT-NONPROJ-BUFFER-ARC transition creates an arc from the top of the stack to the second node in the buffer, and then removes the latter from the buffer. [sent-207, score-0.509]
66 Note that these transitions are analogous to projective buffer transitions, and they use the second node in the buffer in the same way, but they create arcs involving the node on top of the stack rather than the first buffer node. [sent-208, score-3.296]
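In the same sketch style, only the positions touched differ from the projective pair (illustrative code; the single-head check follows the precondition discussed just below):

```python
# The two non-projective buffer transitions. Both involve the top of
# the stack and the *second* buffer node, skipping over the first
# buffer node, which is what allows some crossing arcs. Sketch only.

def left_nonproj_buffer_arc(stack, buffer, arcs, label):
    # LEFT-NONPROJ-BUFFER-ARC_l:
    # (σ|i, j|k|β, A) ⇒ (σ, j|k|β, A ∪ {(k, l, i)})
    i, k = stack[-1], buffer[1]
    # Single-head precondition: the stack top must not have a head yet.
    assert not any(dep == i for _, _, dep in arcs)
    return stack[:-1], buffer, arcs | {(k, label, i)}

def right_nonproj_buffer_arc(stack, buffer, arcs, label):
    # RIGHT-NONPROJ-BUFFER-ARC_l:
    # (σ|i, j|k|β, A) ⇒ (σ|i, j|β, A ∪ {(i, l, k)})
    i, j, k = stack[-1], buffer[0], buffer[1]
    return stack, [j] + buffer[2:], arcs | {(i, label, k)}
```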
67 This change makes a precondition that checks for a head necessary for the LEFT-NONPROJ-BUFFER-ARC transition to respect the single-head constraint, since many stack-based parsers can generate configurations where the node on top of the stack already has a head. [sent-209, score-0.443]
68 We call these transitions non-projective buffer transitions because, as they act on non-contiguous nodes in the stack and buffer, they allow the creation of a limited set of non-projective dependency arcs. [sent-210, score-1.78]
69 This means that, when added to a projective parser, they will increase its coverage. [sent-211, score-0.311]
70 5 On the other hand, they may also increase the coverage of parsers allowing restricted forms of non-projectivity, but that depends on the particular subset of non-projective structures captured by each such parser. (Table caption) The baseline (NE), with the LEFT-NONPROJ-BUFFER-ARC transition added (NE+LNBA) and with the RIGHT-NONPROJ-BUFFER-ARC transition added (NE+RNBA). [sent-212, score-0.569]
71 adding these transitions to a stack-based transition system does not affect soundness under the same conditions and for the same reasons explained for projective buffer transitions in Section 3. [sent-214, score-2.1]
72 Note that the fact that non-projective buffer transitions are able to create non-projective dependency arcs does not mean that all the arcs that they build are non-projective, since an arc on non-contiguous nodes in the stack and buffer may or may not cross other arcs. [sent-216, score-2.612]
73 This means that non-projective buffer transitions serve a dual purpose: not only do they increase coverage, but they can also create some “easy” dependency links in advance of standard transitions, just like projective buffer transitions. [sent-217, score-2.305]
74 Contrary to projective buffer transitions, we do not impose any arc length restrictions on non-projective buffer transitions (either as a hard constraint in the transitions themselves or as a policy in the training oracle), since we would like the increase in coverage to be as large as possible. [sent-218, score-2.713]
75 We wish to allow the parsers to create non-projective arcs in a straightforward way and without compromising efficiency. [sent-219, score-0.332]
76 2 Experiments We evaluate the impact of non-projective buffer transitions on parsing accuracy by using the same baseline [sent-222, score-1.169]
77 parser, datasets and experimental settings as for projective buffer transitions in Section 3. [sent-223, score-1.428]
78 As can be seen in Table 6, adding a non-projective buffer transition to the arc-eager parser improves its performance on all eight datasets. [sent-225, score-1.056]
79 Note that the Chinese treebank is fully projective; this means that non-projective buffer transitions are also beneficial when creating projective arcs. [sent-228, score-1.461]
80 While with projective buffer transitions we observed that each of them was beneficial for about half of the treebanks (and we related this to the amount of leftward and rightward links of length 1 in each), in the case of non-projective buffer transitions we do not observe this tendency. [sent-229, score-2.692]
81 The results were similar to those for the projective transitions, and show that adding a non-projective buffer transition improves the precision of the standard transitions. [sent-232, score-1.226]
82 We also experimentally checked that adding both non-projective buffer transitions at the same time (NE+LNBA+RNBA) achieved worse performance than adding only the most suitable transition for each dataset. [sent-233, score-1.417]
83 (Table caption) Precision and recall on projective (PP, PR) and non-projective (NP, NR) arcs, averaged over all datasets, obtained by Nivre’s arc-eager parser with and without non-projective buffer transitions (NE+LNBA/RNBA, NE) and the parser with the pseudoprojective transformation (Nivre et al. [sent-240, score-1.339]
84 We can see that the algorithm with non-projective buffer transitions obtains better LAS in five out of the eight treebanks. [sent-245, score-1.17]
85 Like projective buffer transitions, non-projective transitions do not increase the computational complexity of stack-based parsers. [sent-247, score-1.442]
86 The observed training and parsing times for the arc-eager parser with non-projective buffer transitions showed a small overhead with respect to the original arc-eager (7. [sent-248, score-1.24]
87 5 Related work The approach of adding an extra transition to a parser to improve its accuracy has been applied in the past by Choi and Palmer (2011). [sent-255, score-0.316]
88 In that paper, the LEFT-ARC transition from Nivre’s arc-eager transition system is added to a list-based parser. [sent-256, score-0.457]
89 The idea of creating dependency arcs of length 1 in advance to help the classifier has been used by Cheng et al. [sent-258, score-0.372]
90 However, their system creates such arcs in a separate preprocessing step rather than dynamically by adding a transition to the parser, and our approach obtains better LAS and UAS results on all the tested datasets. [sent-260, score-0.49]
91 The projective buffer transitions presented here bear some resemblance to the easy-first parser by Goldberg and Elhadad (2010), which allows creation of dependency arcs between any pair of contiguous nodes and is based on the idea of “easy” dependency links being created first. [sent-261, score-1.923]
92 Non-projective transitions that create dependency arcs between non-contiguous nodes have been used in the transition-based parser by Attardi (2006). [sent-263, score-0.854]
93 However, the transitions in that parser do not use the second buffer node, since they are not intended to create some arcs in advance. [sent-264, score-1.458]
94 The non-projective buffer transitions presented in this paper can also be added to Attardi’s parser. [sent-265, score-1.169]
95 6 Discussion We have presented a set of two transitions, called projective buffer transitions, and showed that adding one of them to Nivre’s arc-eager parser improves its accuracy in seven out of eight tested datasets from the CoNLL-X shared task. [sent-266, score-1.141]
96 Furthermore, adding one of a set of non-projective buffer transitions achieves accuracy improvements in all of the eight datasets. [sent-267, score-1.2]
97 The obtained improvements are statistically significant for several of the treebanks, and the parser with projective buffer transitions surpassed the best published single-parser LAS results on two of them. [sent-268, score-1.482]
98 This comes at no cost either on computational complexity or (in the case of projective transitions) on empirical training and parsing times with respect to the original parser. [sent-269, score-0.312]
99 While we have chosen Nivre’s well-known arc-eager parser as our baseline, we have shown that these transitions can be added to any stack-based dependency parser, and we are not aware of any specific property of arc-eager that would make them work better in practice on this parser than on others. [sent-270, score-0.719]
100 Therefore, future work will include an evaluation of the impact of buffer transitions on more transition-based parsers. [sent-271, score-1.142]
wordName wordTfidf (topN-words)
[('buffer', 0.712), ('transitions', 0.43), ('projective', 0.269), ('arcs', 0.217), ('transition', 0.215), ('nivre', 0.112), ('stack', 0.1), ('arc', 0.088), ('leftward', 0.087), ('rightward', 0.087), ('treebanks', 0.077), ('parser', 0.071), ('parsers', 0.07), ('dependency', 0.068), ('vw', 0.067), ('node', 0.058), ('arceager', 0.052), ('ne', 0.044), ('lba', 0.044), ('rba', 0.044), ('las', 0.041), ('nodes', 0.04), ('advance', 0.038), ('joakim', 0.037), ('links', 0.033), ('haji', 0.032), ('attardi', 0.03), ('adding', 0.03), ('transformation', 0.029), ('create', 0.028), ('oracle', 0.028), ('swedish', 0.028), ('turkish', 0.028), ('creates', 0.028), ('eight', 0.028), ('guez', 0.027), ('added', 0.027), ('parsing', 0.027), ('acyclicity', 0.026), ('oflazer', 0.026), ('pseudoprojective', 0.026), ('arabic', 0.026), ('uas', 0.025), ('forests', 0.025), ('jan', 0.022), ('mcdonald', 0.021), ('executed', 0.021), ('length', 0.021), ('petr', 0.02), ('pops', 0.02), ('removes', 0.02), ('sagae', 0.02), ('nonprojective', 0.019), ('generic', 0.019), ('forest', 0.019), ('beneficial', 0.019), ('decide', 0.018), ('atalay', 0.017), ('axj', 0.017), ('bilge', 0.017), ('compromising', 0.017), ('coru', 0.017), ('delegate', 0.017), ('departamento', 0.017), ('jarmila', 0.017), ('lnba', 0.017), ('nondeterministic', 0.017), ('panevov', 0.017), ('preconditions', 0.017), ('rnba', 0.017), ('stackbased', 0.017), ('tratz', 0.017), ('universidade', 0.017), ('treebank', 0.017), ('policy', 0.017), ('stroudsburg', 0.017), ('dummy', 0.017), ('datasets', 0.017), ('yamada', 0.016), ('complexity', 0.016), ('czech', 0.015), ('increase', 0.015), ('comparator', 0.015), ('conllx', 0.015), ('converse', 0.015), ('afonso', 0.015), ('campus', 0.015), ('kemal', 0.015), ('configuration', 0.015), ('contiguous', 0.015), ('danish', 0.015), ('classifier', 0.014), ('la', 0.014), ('labelled', 0.014), ('johan', 0.014), ('buchholz', 0.014), ('creating', 0.014), ('shared', 0.014), ('soundness', 0.014), ('tiger', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 66 emnlp-2012-Improving Transition-Based Dependency Parsing with Buffer Transitions
Author: Daniel Fernandez-Gonzalez ; Carlos Gomez-Rodriguez
Abstract: In this paper, we show that significant improvements in the accuracy of well-known transition-based parsers can be obtained, without sacrificing efficiency, by enriching the parsers with simple transitions that act on buffer nodes. First, we show how adding a specific transition to create either a left or right arc of length one between the first two buffer nodes produces improvements in the accuracy of Nivre’s arc-eager projective parser on a number of datasets from the CoNLL-X shared task. Then, we show that accuracy can also be improved by adding transitions involving the topmost stack node and the second buffer node (allowing a limited form of non-projectivity). None of these transitions has a negative impact on the computational complexity of the algorithm. Although the experiments in this paper use the arc-eager parser, the approach is generic enough to be applicable to any stack-based dependency parser.
Author: Bernd Bohnet ; Joakim Nivre
Abstract: Most current dependency parsers presuppose that input words have been morphologically disambiguated using a part-of-speech tagger before parsing begins. We present a transitionbased system for joint part-of-speech tagging and labeled dependency parsing with nonprojective trees. Experimental evaluation on Chinese, Czech, English and German shows consistent improvements in both tagging and parsing accuracy when compared to a pipeline system, which lead to improved state-of-theart results for all languages.
3 0.12706383 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning
Author: Hao Zhang ; Ryan McDonald
Abstract: State-of-the-art graph-based parsers use features over higher-order dependencies that rely on decoding algorithms that are slow and difficult to generalize. On the other hand, transition-based dependency parsers can easily utilize such features without increasing the linear complexity of the shift-reduce system beyond a constant. In this paper, we attempt to address this imbalance for graph-based parsing by generalizing the Eisner (1996) algorithm to handle arbitrary features over higher-order dependencies. The generalization is at the cost of asymptotic efficiency. To account for this, cube pruning for decoding is utilized (Chiang, 2007). For the first time, label tuple and structural features such as valencies can be scored efficiently with third-order features in a graph-based parser. Our parser achieves the state-of-the-art unlabeled accuracy of 93.06% and labeled accuracy of 91.86% on the standard test set for English, at a faster speed than a reimplementation of the third-order model of Koo et al. (2010).
4 0.10289442 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
Author: Zhongguo Li ; Guodong Zhou
Abstract: Most previous approaches to syntactic parsing of Chinese rely on a preprocessing step of word segmentation, thereby assuming there was a clearly defined boundary between morphology and syntax in Chinese. We show how this assumption can fail badly, leading to many out-of-vocabulary words and incompatible annotations. Hence in practice the strict separation of morphology and syntax in the Chinese language proves to be untenable. We present a unified dependency parsing approach for Chinese which takes unsegmented sentences as input and outputs both morphological and syntactic structures with a single model and algorithm. By removing the intermediate word segmentation, the unified parser no longer needs separate notions for words and phrases. Evaluation proves the effectiveness of the unified model and algorithm in parsing structures of words, phrases and sentences simultaneously.
5 0.083239213 37 emnlp-2012-Dynamic Programming for Higher Order Parsing of Gap-Minding Trees
Author: Emily Pitler ; Sampath Kannan ; Mitchell Marcus
Abstract: We introduce gap inheritance, a new structural property on trees, which provides a way to quantify the degree to which intervals of descendants can be nested. Based on this property, two new classes of trees are derived that provide a closer approximation to the set of plausible natural language dependency trees than some alternative classes of trees: unlike projective trees, a word can have descendants in more than one interval; unlike spanning trees, these intervals cannot be nested in arbitrary ways. The 1-Inherit class of trees has exactly the same empirical coverage of natural language sentences as the class of mildly nonprojective trees, yet the optimal scoring tree can be found in an order of magnitude less time. Gap-minding trees (the second class) have the property that all edges into an interval of descendants come from the same node, and thus an algorithm which uses only single intervals can produce trees in which a node has descendants in multiple intervals.
6 0.069647528 59 emnlp-2012-Generating Non-Projective Word Order in Statistical Linearization
7 0.0635041 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
8 0.062414046 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output
9 0.058134846 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP
10 0.05538309 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
11 0.054512091 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
12 0.044243649 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing
13 0.042891007 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
14 0.037041526 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
15 0.035339635 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
16 0.034697004 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
17 0.033997517 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
18 0.033897433 55 emnlp-2012-Forest Reranking through Subtree Ranking
19 0.032384731 108 emnlp-2012-Probabilistic Finite State Machines for Regression-based MT Evaluation
20 0.031069165 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns
topicId topicWeight
[(0, 0.133), (1, -0.149), (2, 0.176), (3, -0.078), (4, 0.085), (5, 0.031), (6, -0.005), (7, -0.003), (8, -0.026), (9, 0.18), (10, 0.093), (11, -0.125), (12, 0.155), (13, 0.002), (14, 0.216), (15, -0.001), (16, 0.017), (17, 0.018), (18, 0.099), (19, 0.123), (20, -0.017), (21, -0.292), (22, -0.097), (23, 0.11), (24, -0.071), (25, -0.084), (26, 0.03), (27, -0.064), (28, 0.066), (29, 0.114), (30, 0.181), (31, 0.208), (32, -0.025), (33, -0.153), (34, -0.115), (35, -0.014), (36, -0.006), (37, -0.156), (38, -0.146), (39, -0.011), (40, 0.017), (41, -0.007), (42, -0.05), (43, 0.189), (44, 0.12), (45, -0.036), (46, -0.061), (47, 0.013), (48, 0.062), (49, 0.001)]
simIndex simValue paperId paperTitle
same-paper 1 0.9716019 66 emnlp-2012-Improving Transition-Based Dependency Parsing with Buffer Transitions
3 0.47029099 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning
4 0.37188923 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
5 0.33015388 59 emnlp-2012-Generating Non-Projective Word Order in Statistical Linearization
Author: Bernd Bohnet ; Anders Bjorkelund ; Jonas Kuhn ; Wolfgang Seeker ; Sina Zarriess
Abstract: We propose a technique to generate non-projective word orders in an efficient statistical linearization system. Our approach predicts liftings of edges in an unordered syntactic tree by means of a classifier, and uses a projective algorithm for tree linearization. We obtain statistically significant improvements on six typologically different languages: English, German, Dutch, Danish, Hungarian, and Czech.
6 0.26268935 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
7 0.23922643 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
8 0.23056528 108 emnlp-2012-Probabilistic Finite State Machines for Regression-based MT Evaluation
9 0.22423114 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
10 0.22356474 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
11 0.19178942 37 emnlp-2012-Dynamic Programming for Higher Order Parsing of Gap-Minding Trees
12 0.16553825 10 emnlp-2012-A Statistical Relational Learning Approach to Identifying Evidence Based Medicine Categories
13 0.15413547 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing
14 0.15209538 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
15 0.14920135 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
16 0.14829579 33 emnlp-2012-Discovering Diverse and Salient Threads in Document Collections
17 0.13187912 118 emnlp-2012-Source Language Adaptation for Resource-Poor Machine Translation
18 0.13169123 55 emnlp-2012-Forest Reranking through Subtree Ranking
19 0.12998489 133 emnlp-2012-Unsupervised PCFG Induction for Grounded Language Learning with Highly Ambiguous Supervision
20 0.12637413 81 emnlp-2012-Learning to Map into a Universal POS Tagset
topicId topicWeight
[(2, 0.033), (10, 0.279), (11, 0.017), (16, 0.089), (25, 0.014), (34, 0.073), (39, 0.017), (45, 0.024), (47, 0.042), (60, 0.067), (63, 0.05), (64, 0.017), (65, 0.019), (73, 0.012), (74, 0.031), (76, 0.045), (80, 0.012), (81, 0.013), (86, 0.022), (95, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.77355629 66 emnlp-2012-Improving Transition-Based Dependency Parsing with Buffer Transitions
2 0.64986062 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns
Author: Annie Louis ; Ani Nenkova
Abstract: We introduce a model of coherence which captures the intentional discourse structure in text. Our work is based on the hypothesis that syntax provides a proxy for the communicative goal of a sentence and therefore the sequence of sentences in a coherent discourse should exhibit detectable structural patterns. Results show that our method has high discriminating power for separating out coherent and incoherent news articles reaching accuracies of up to 90%. We also show that our syntactic patterns are correlated with manual annotations of intentional structure for academic conference articles and can successfully predict the coherence of abstract, introduction and related work sections of these articles.
4 0.44898039 113 emnlp-2012-Resolving This-issue Anaphora
Author: Varada Kolhatkar ; Graeme Hirst
Abstract: We annotate and resolve a particular case of abstract anaphora, namely, this-issue anaphora. We propose a candidate ranking model for this-issue anaphora resolution that explores different issue-specific and general abstract-anaphora features. The model is not restricted to nominal or verbal antecedents; rather, it is able to identify antecedents that are arbitrary spans of text. Our results show that (a) the model outperforms the strong adjacent-sentence baseline; (b) general abstract-anaphora features, as distinguished from issue-specific features, play a crucial role in this-issue anaphora resolution, suggesting that our approach can be generalized for other NPs such as this problem and this debate; and (c) it is possible to reduce the search space in order to improve performance.
5 0.43587154 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning
6 0.43409476 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
7 0.43347961 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
8 0.43072656 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction
9 0.42993453 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
10 0.42446503 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents
11 0.42415613 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games
12 0.42348149 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
13 0.4232648 122 emnlp-2012-Syntactic Surprisal Affects Spoken Word Duration in Conversational Contexts
14 0.42231661 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
15 0.42168751 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM
16 0.42093924 45 emnlp-2012-Exploiting Chunk-level Features to Improve Phrase Chunking
17 0.42057538 81 emnlp-2012-Learning to Map into a Universal POS Tagset
18 0.41927177 59 emnlp-2012-Generating Non-Projective Word Order in Statistical Linearization
19 0.41742051 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
20 0.41723648 2 emnlp-2012-A Beam-Search Decoder for Grammatical Error Correction