acl acl2010 acl2010-93 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Liang Huang ; Kenji Sagae
Abstract: Incremental parsing techniques such as shift-reduce have gained popularity thanks to their efficiency, but there remains a major problem: the search is greedy and only explores a tiny fraction of the whole space (even with beam search) as opposed to dynamic programming. We show that, surprisingly, dynamic programming is in fact possible for many shift-reduce parsers, by merging “equivalent” stacks based on feature values. Empirically, our algorithm yields up to a five-fold speedup over a state-of-the-art shift-reduce dependency parser with no loss in accuracy. Better search also leads to better learning, and our final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Incremental parsing techniques such as shift-reduce have gained popularity thanks to their efficiency, but there remains a major problem: the search is greedy and only explores a tiny fraction of the whole space (even with beam search) as opposed to dynamic programming. [sent-2, score-0.671]
2 We show that, surprisingly, dynamic programming is in fact possible for many shift-reduce parsers, by merging “equivalent” stacks based on feature values. [sent-3, score-0.338]
3 Better search also leads to better learning, and our final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster. [sent-5, score-0.356]
4 McDonald et al. (2005b) is a notable exception: the MST algorithm is exact search but not dynamic programming. [sent-9, score-0.242]
5 Can we build an incremental parser that runs in (almost) linear-time, yet searches over a huge space with dynamic programming? [sent-12, score-0.221]
6 We instead propose a dynamic programming algorithm for shift-reduce parsing which runs in polynomial time in theory, but linear-time (with beam search) in practice. [sent-14, score-0.777]
7 The key idea is to merge equivalent stacks according to feature functions, inspired by Earley parsing (Earley, 1970; Stolcke, 1995) and generalized LR parsing (Tomita, 1991). [sent-15, score-0.386]
8 Our final parser achieves the best (unlabeled) accuracies that we are aware of in both English and Chinese among dependency parsers trained on the Penn Treebanks. [sent-18, score-0.31]
9 For convenience of presentation and experimentation, we will focus on shift-reduce parsing for dependency structures in the remainder of this paper, though our formalism and algorithm can also be applied to phrase-structure parsing. [sent-26, score-0.214]
10 2.1 Vanilla Shift-Reduce Shift-reduce parsing performs a left-to-right scan of the input sentence and, at each step, chooses one of two actions: either shift the current word onto the stack, or reduce the top two (or more) items at the end of the stack (Aho and Ullman, 1972). [sent-28, score-0.495]
11 To adapt it to dependency parsing, we split the reduce action into two cases, rex and rey, depending on which one of the two items becomes the head after reduction. [sent-29, score-0.339]
12 More formally, we describe a parser configuration by a state ⟨j, S⟩, where S is a stack of trees s0, s1, . . . (with s0 on top) and j is the position of the queue head. [sent-32, score-0.526]
13 1. sh: move the head of the queue, wj, onto stack S as a singleton tree; [sent-44, score-0.314]
14 2. rex: combine the top two trees on the stack, s0 and s1, and replace them with tree s1xs0; [sent-45, score-0.161]
15 3. rey: combine the top two trees on the stack, s0 and s1, and replace them with tree s1ys0. [sent-47, score-0.161]
16 Note that the shorthand notation txt′ denotes the new tree formed by “attaching tree t′ as the leftmost child of the root of tree t”; tyt′ is the symmetric case, attaching on the right. [sent-48, score-0.244]
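To make the transition system concrete, here is a minimal Python sketch of the three actions. The tuple-based tree encoding and the head-direction convention (the standard arc-standard one, where rex makes s0 the head and rey makes s1 the head) are our assumptions, not notation taken from the paper.

```python
# Minimal sketch of the three shift-reduce actions over a state <j, S>.
# A partial tree is (head_word, children); this encoding is illustrative.

def sh(j, stack, words):
    """Shift: move the queue head w_j onto the stack as a singleton tree."""
    return j + 1, stack + [(words[j], [])]

def rex(j, stack):
    """Reduce: combine s1 and s0; here s0 becomes the head (assumption:
    arc-standard left-reduce), with s1 attached as its leftmost child."""
    *rest, s1, s0 = stack
    return j, rest + [(s0[0], [s1] + s0[1])]

def rey(j, stack):
    """Reduce: combine s1 and s0; here s1 becomes the head (assumption:
    arc-standard right-reduce), with s0 attached as its rightmost child."""
    *rest, s1, s0 = stack
    return j, rest + [(s1[0], s1[1] + [s0])]
```

Under these assumptions, applying rey to a stack ending in . . . , saw, Al yields the tree rooted at "saw" with "Al" as its rightmost child, as in step (5a) of the running example.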
17 At step (4), we face a shift-reduce conflict: either combine “saw” and “Al” in a rey action (5a), or shift “with” (5b). [sent-53, score-0.284]
18 To resolve this conflict, there is a cost c associated with each state so that we can pick the best one (or few, with a beam) at each step. [sent-54, score-0.277]
19 Costs are accumulated at each step: as shown in Figure 1, the actions sh, rex, and rey have their respective costs ξ, λ, and ρ, which are dot-products of the weights w and features extracted from the state and the action. [sent-55, score-0.357]
20 2.2 Features We view features as “abstractions” or (partial) observations of the current state, which is an important intuition for the development of dynamic programming in Section 3. [sent-57, score-0.283]
21 These feature templates (Tab. 1(b)) consist of the top few trees on the stack and the first few words on the queue. [sent-59, score-0.357]
22 For example, one template conjoins s0.w and q0.t, capturing the root word of the top tree s0 on the stack and the part-of-speech tag of the current head word q0 on the queue. [sent-64, score-0.243]
23 More formally, we denote f to be the feature function, such that f(j, S) returns a vector of feature instances for state ⟨j, S⟩. [sent-72, score-0.228]
24 To decide which action is the best for the current state ⟨j, S⟩, we perform a three-way classification based on f(j, S), and to do so, we further conjoin these feature instances with the action, producing action-conjoined instances like (s0.w, q0.t, sh). [sent-73, score-0.156]
25 We denote fsh(j, S), frex(j, S), and frey(j, S) to be the conjoined feature instances, whose dot-products with the weight vector decide the best action. [sent-76, score-0.329]
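A sketch of this scoring scheme follows; the tiny feature set, the sparse dict representation of w, and all names are illustrative assumptions, not the paper's full template set.

```python
# Sketch of feature extraction and action-conjoined scoring.
# 'weights' maps feature tuples to floats (a sparse weight vector w).

def f(j, stack, words, tags):
    """Atomic feature instances for state <j, S> (tiny illustrative set)."""
    feats = []
    if stack:
        feats.append(('s0.w', stack[-1][0]))        # root word of top tree
    if j < len(words):
        feats.append(('q0.t', tags[j]))             # tag of the queue head
    if stack and j < len(words):
        feats.append(('s0.w,q0.t', stack[-1][0], tags[j]))
    return feats

def score(weights, feats, action):
    """xi, lambda, or rho: dot product of w with f_sh / f_rex / f_rey,
    i.e., the atomic instances conjoined with the action name."""
    return sum(weights.get(feat + (action,), 0.0) for feat in feats)
```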
26 2.3 Beam Search and Early Update To improve on strictly greedy search, shift-reduce parsing is often enhanced with beam search (Zhang and Clark, 2008), where b states develop in parallel. [sent-80, score-0.645]
27 At each step we extend the states in the current beam by applying one of the three actions, and then choose the best b resulting states for the next step. [sent-81, score-0.571]
28 Our dynamic programming algorithm also runs on top of beam search in practice. [sent-82, score-0.722]
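A sketch of the beam-search loop, reusing the sh/rex/rey and f/score sketches above. The state layout and the 2n−1 step count (n shifts plus n−1 reduces for an n-word sentence) are our assumptions; lower cost is better, matching the ordering p ≺ p′ iff c < c′ used later.

```python
import heapq

def beam_search(words, tags, weights, b=8):
    """Keep the best b states per step; a state is (cost, j, stack)."""
    beam = [(0.0, 0, ())]                       # axiom: empty stack, j = 0
    for _ in range(2 * len(words) - 1):         # n shifts + (n - 1) reduces
        cands = []
        for cost, j, stack in beam:
            feats = f(j, list(stack), words, tags)
            if j < len(words):                  # sh is applicable
                nj, ns = sh(j, list(stack), words)
                cands.append((cost + score(weights, feats, 'sh'), nj, tuple(ns)))
            if len(stack) >= 2:                 # rex / rey are applicable
                for name, act in (('rex', rex), ('rey', rey)):
                    nj, ns = act(j, list(stack))
                    cands.append((cost + score(weights, feats, name), nj, tuple(ns)))
        beam = heapq.nsmallest(b, cands, key=lambda s: s[0])  # lower = better
    return beam[0]                              # best final state
```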
29 Dynamic programming turns out to be a great fit for early updating (see Section 4.3). [sent-86, score-0.176]
30 3.1 Merging Equivalent States The key observation for dynamic programming is to merge “equivalent states” in the same beam (footnote 3: as a special case, for the deterministic mode (b=1), updates always co-occur with the first mistake made). [sent-89, score-0.624]
31 That is, two states in the same beam (same step) are equivalent if they have the same feature values, because they will then have the same costs, as shown in the deductive system in Figure 1. [sent-143, score-0.249]
32 Note that j = j′ is also needed (Eq. 4), because the queue head position j determines which word to shift next. [sent-146, score-0.188]
33 In practice, however, a small subset of atomic features will be enough to determine the whole feature vector; we call these kernel features f̃(j, S), defined as the smallest set of atomic templates such that f̃(j, S) = f̃(j′, S′) ⇒ ⟨j, S⟩ ∼ ⟨j′, S′⟩. [sent-147, score-0.201]
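A sketch of the merging step this enables: key each beam by the kernel-feature signature and keep the cheaper survivor. kernel_features stands in for the paper's f̃; an edge-factored instance is sketched later, and all names here are ours.

```python
def merge_and_prune(cands, kernel_features, b):
    """Merge states whose kernel features agree, keeping the lower-cost
    one, then prune to beam width b. cands: iterable of (cost, j, stack)."""
    best = {}
    for state in cands:
        sig = kernel_features(state[1], state[2])   # signature f~(j, S)
        if sig not in best or state[0] < best[sig][0]:
            best[sig] = state
    return sorted(best.values(), key=lambda s: s[0])[:b]
```

In the full algorithm, merging equivalent shifted states also combines their predictor-state sets; that bookkeeping is sketched in the next section.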
34–36 [Figure 3 items: states of the deductive system have the form ℓ : ⟨i, j, sd..s0⟩ : (c, v, π), with axiom 0 : ⟨0, 0, ǫ⟩ : (0, 0, ∅); ℓ: step; c, v: prefix and inside costs; π: predictor states. The sh rule yields ℓ + 1 : ⟨j, j + 1, sd−1..s0, wj⟩ : (c + ξ, 0, {p}) for j < n, and the rex/rey rules combine a state q with each predictor state p ∈ π.]
37 Figure 3: Deductive system for shift-reduce parsing with dynamic programming. [sent-189, score-0.295]
38 The predictor state set π is an implicit graph-structured stack (Tomita, 1988) while the prefix cost c is inspired by Stolcke (1995). [sent-190, score-0.839]
39 The rey case is similar, replacing s′0xs0 with s′0ys0, and λ with ρ = w · frey(j, sd..s0). [sent-191, score-0.172]
40 Since the queue is static information to the parser (unlike the stack, which changes dynamically), we can use j to replace features from the queue. [sent-197, score-0.16]
41 The kernel feature function then takes the form f̃(j, S) = (j, fd(sd), . . . , f0(s0)) if the feature window looks at the top d + 1 trees on the stack, where fi(si) extracts kernel features from tree si (0 ≤ i ≤ d). [sent-201, score-0.279]
42 For shift, this suffices, as the stack grows on the right; but for reduce actions the stack shrinks, and in order to still maintain d + 1 trees, we have to know something about the history. [sent-213, score-0.57]
43 This is exactly why we needed the full stack for vanilla shift-reduce parsing in the first place, and why dynamic programming seems hard here. [sent-214, score-0.611]
44 Basically, each state p carries with it a set π(p) of predictor states, each of which can be combined with p in a reduction step. [sent-216, score-0.252]
45 In a shift step, if state p generates state q (we say “p predicts q” in Earley (1970) terms), then p is added onto π(q). [sent-217, score-0.298]
46 When two equivalent shifted states get merged, their predictor states get combined. [sent-218, score-0.459]
47 In a reduction step, state q tries to combine with every predictor state p ∈ π(q), and the resulting state r inherits the predictor state set from p, i.e., π(r) = π(p). [sent-219, score-0.665]
48 Interestingly, when two equivalent reduced states get merged, we can prove (by induction) that their predictor states are identical (proof omitted). [sent-222, score-0.459]
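A sketch of this predictor-state bookkeeping; the class layout and names are illustrative assumptions, and combine stands in for the rex/rey tree operations plus cost updates.

```python
class State:
    """A DP state: queue position j, the top d+1 trees, costs, and the
    predictor set pi (an implicit graph-structured stack)."""
    def __init__(self, j, top_trees, prefix, inside):
        self.j, self.top = j, top_trees
        self.c, self.v = prefix, inside      # prefix and inside costs
        self.pi = set()                      # predictor states

def shift_from(p, q):
    """p predicts q: shifting from p creates q, and p is added onto pi(q)."""
    q.pi.add(p)

def merge_shifted(q, q2):
    """Merging equivalent shifted states combines their predictor sets."""
    q.pi |= q2.pi

def reduce_all(q, combine):
    """q combines with every predictor p in pi(q); each result r inherits
    its predictor set from p: pi(r) = pi(p)."""
    results = []
    for p in q.pi:
        r = combine(p, q)
        r.pi = set(p.pi)
        results.append(r)
    return results
```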
49 Figure 3 shows the new deductive system with dynamic programming and GSS. [sent-223, score-0.404]
50 It can be combined with some predictor state p spanning [k..i]. [sent-232, score-0.252]
51 In fact, the chart in Earley and other agenda-based parsers is indeed a GSS when viewed left-to-right. [sent-241, score-0.161]
52 In these parsers, when a state is popped up from the agenda, it looks for possible sibling states that can combine with it; GSS, however, explicitly maintains these predictor states so that the newly popped state does not need to look them up. [sent-242, score-0.636]
53 The deductive system is optimal and runs in worst-case polynomial time as long as the kernel feature function satisfies two properties: bounded, i.e., f̃(j, S) = (j, fd(sd), . . . , f0(s0)) for some constant d, and monotonic. [sent-245, score-0.335]
54 This is also natural, since the information most relevant to the current decision is always around the stack top. [sent-252, score-0.26]
55 To decide the ordering of states in each beam we borrow the concept of prefix cost from Stolcke (1995), originally developed for weighted Earley parsing. [sent-269, score-0.717]
56 As shown in Figure 3, the prefix cost c is the total cost of the best action sequence from the initial state to the end of state p. [sent-271, score-0.78]
57 That is, it includes both the inside cost v (for the Viterbi inside derivation) and the cost of the (best) path leading towards the beginning of state p. [sent-273, score-0.56]
58 We say that a state p with prefix cost c is better than a state p′ with prefix cost c′, notated p ≺ p′ in Fig. 3, if c < c′. [sent-274, score-0.89]
59 We can also prove (by contradiction) that optimizing for prefix cost implies optimal inside cost (Nederhof, 2003). [sent-276, score-0.548]
60 The combo cost δ = ξ′ + λ consists of the shift cost ξ′ of p and the reduction cost λ of q. [sent-281, score-0.603]
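A sketch of these cost updates in a reduce step, reusing the State sketch above. The field names are ours, and the two recurrences (prefix = c_p + v_q + δ, inside = v_p + v_q + δ) are our reading of the prefix/inside cost definitions just given.

```python
def reduce_costs(p, q, xi_p, lam):
    """Costs of the state r that results from combining predictor p with q.

    xi_p: the shift cost that created q from p; lam: reduction cost of q;
    delta = xi_p + lam is the combo cost.
    """
    delta = xi_p + lam
    prefix = p.c + q.v + delta   # best path from the axiom to the end of r
    inside = p.v + q.v + delta   # Viterbi inside derivation cost of r
    return prefix, inside
```

States in a beam are then ordered by the prefix cost alone: p ≺ p′ iff p.c < p′.c.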
61 The cost in the non-DP shift-reduce algorithm (Fig. [sent-282, score-0.159]
62 1) is indeed a prefix cost, and the DP algorithm subsumes the non-DP one as a special case where no two states are equivalent. [sent-283, score-0.301]
63 As an example, we simulate an edge-factored model (McDonald et al., 2005a) using shift-reduce with dynamic programming, which is similar to bilexical PCFG parsing using CKY (Eisner and Satta, 1999). [sent-286, score-0.295]
64 Here the kernel feature function is f̃(j, S) = (j, h(s1), h(s0)). (Footnote 5: note that using inside cost v for ordering would be a bad idea, as it will always prefer shorter derivations, like in best-first parsing.) [sent-287, score-0.334]
65 As in A* search, we need some estimate of “outside cost” to predict which states are more promising, and the prefix cost includes an exact cost for the left outside context, but no right outside context. [sent-288, score-0.619]
66 Figure 4: Example of shift-reduce with dynamic programming, simulating an edge-factored model; the rex cost here is λ = w · frex(h′, h). [sent-304, score-0.38]
67 Here h(x) returns the head word index of tree x; all features in this model are based on the head and modifier indices of a dependency link. [sent-306, score-0.285]
68 The theoretical complexity of this algorithm is O(n^7), because in a reduction step we have three span indices and three head indices, plus a step index ℓ. [sent-308, score-0.222]
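A sketch of this kernel feature function, with the tree encoding from the earlier sketches and the head index assumed to sit in slot 0 (both our assumptions):

```python
def kernel_features_edge_factored(j, stack):
    """f~(j, S) = (j, h(s1), h(s0)): the queue position plus the head-word
    indices of the top two trees, which determine all feature values in an
    edge-factored model."""
    h = lambda tree: tree[0]          # head index of a partial tree
    return (j, h(stack[-2]), h(stack[-1]))
```

This function can be plugged directly into the merge_and_prune sketch from Section 3.1.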
69 We reimplemented the shift-reduce parser of Huang et al. (2009) in Python (henceforth “non-DP”), and then extended it to do dynamic programming (henceforth “DP”). [sent-312, score-0.159]
70 We evaluate their performance on the standard Penn Treebank (PTB) English dependency parsing task using the standard split: secs 02-21 for training, 22 for development, and 23 for testing. [sent-313, score-0.321]
71 Both DP and non-DP parsers use the same feature templates in Table 1. [sent-314, score-0.216]
72 4.1 Speed Comparisons To compare parsing speed between DP and non-DP, we run each parser on the development set, varying the beam width b from 2 to 16 (DP) or 64 (non-DP). [sent-324, score-0.532]
73 Fig. 5a shows the relationship between search quality (as measured by the average model score per sentence; the higher the better) and speed (average parsing time per sentence): DP with a beam width of b=16 achieves the same search quality as non-DP at b=64, while being 5 times faster. [sent-326, score-0.61]
74 The speedup is even larger with a smaller feature set (Fig. 5c), since the less expressive feature set makes more states “equivalent” and mergeable in DP. [sent-335, score-0.188]
75 Fig. 5d shows the (almost linear) correlation between dependency accuracy and search quality, confirming that better search yields better parsing. [sent-337, score-0.244]
76 4.2 Search Space, Forest, and Oracles DP achieves better search quality because it explores an exponentially large search space rather than only the b trees allowed by the beam (see Fig. 6a). [sent-339, score-0.538]
77 As a by-product, DP can output a forest encoding these exponentially many trees, out of which we can draw longer and better (in terms of oracle) k-best lists than those in the beam (see Fig. 6b). [sent-341, score-0.384]
78 4.3 Perceptron Training and Early Updates Another interesting advantage of DP over non-DP is faster training with the perceptron, even when both parsers use the same beam width. [sent-350, score-0.501]
79 This is due to early updates (see Section 2.3), which happen much more often with DP, because a gold-standard state p is often merged with an equivalent (but incorrect) state that has a higher model score, which triggers an update immediately. [sent-353, score-0.334]
80 By contrast, in non-DP beam search, states such as p might still survive in the beam throughout, even though they can no longer rank best in the beam. (Footnote 8: DP’s k-best lists are extracted from the forest using the algorithm of Huang and Chiang (2005), rather than from the final beam as in the non-DP case, because many derivations have been merged during dynamic programming.) [sent-354, score-0.917]
81 Figure 5: Speed comparisons between DP and non-DP, with beam size b ranging 2∼16 for DP and 2∼64 for non-DP; panels plot search quality against time (full model) and, in (d), the correlation between parsing accuracy (y) and search quality (x). [sent-358, score-0.476]
82 With the same level of search quality or parsing accuracy, DP (at b=16) is about 5 times faster. [sent-361, score-0.219]
83 Figure 6: DP searches over a forest of exponentially many trees, which also produces better and longer k-best lists with higher oracles, while non-DP only explores the b trees allowed in the beam (b = 16 here); panels: (a) sizes of search spaces vs. sentence length, (b) oracle precision on dev vs. k. [sent-366, score-0.579]
85 The higher frequency of early updates results in faster iterations of perceptron training. [sent-372, score-0.343]
86 While the number of updates is roughly comparable between DP and non-DP, the rate of early updates is much higher with DP, and the time per iteration is consequently shorter. [sent-374, score-0.22]
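A sketch of one perceptron epoch with early update under these statistics. Here decode_with_gold and features_of are hypothetical helpers (the former runs beam search while tracking the gold state and stops as soon as the gold state falls out of, or with DP is merged away in, the beam), and the update is written in score space, rewarding gold features and penalizing predicted ones.

```python
def train_epoch(sentences, weights, decode_with_gold, features_of):
    """One perceptron epoch with early update: on a stopped decode, the
    update uses the two partial derivations up to the stopping point."""
    updates = early = 0
    for words, tags, gold_actions in sentences:
        best, gold, stopped_early = decode_with_gold(words, tags,
                                                     gold_actions, weights)
        if best != gold:
            updates += 1
            early += stopped_early
            for feat in features_of(gold, words, tags):   # reward gold
                weights[feat] = weights.get(feat, 0.0) + 1.0
            for feat in features_of(best, words, tags):   # penalize prediction
                weights[feat] = weights.get(feat, 0.0) - 1.0
    return updates, early
```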
87 Our parser achieves the highest (unlabeled) dependency accuracy among dependency parsers trained on the Treebank, and is also much faster than most other parsers even with a pure Python implementation. [sent-387, score-0.595]
88 Early updates happen much more often with DP due to equivalent state merging, which leads to faster training (time in minutes). [sent-390, score-0.398]
89 Our parser (in pure Python) has the highest accuracy among dependency parsers trained on the Treebank, and is also much faster than major parsers. [sent-408, score-0.41]
90 The observed time complexity of our DP parser is in fact linear, compared to the superlinear complexity of Charniak and MST (McDonald et al., 2005b). [sent-417, score-0.162]
91 5 Related Work This work was inspired in part by Generalized LR parsing (Tomita, 1991) and the graph-structured stack (GSS). [sent-442, score-0.396]
92 Tomita uses GSS for exhaustive LR parsing, where the GSS is equivalent to a dynamic programming chart in chart parsing (see Footnote 4). [sent-443, score-0.427]
93 In fact, Tomita’s GLR is an instance of techniques for tabular simulation of nondeterministic pushdown automata based on deductive systems (Lang, 1974), which allow for cubic-time exhaustive shift-reduce parsing with context-free grammars (Billot and Lang, 1989). [sent-444, score-0.257]
94 Second, unlike previous theoretical results about cubic-time complexity, we achieved linear-time performance by smart beam search with prefix cost inspired by Stolcke (1995), allowing for state-of-the-art data-driven parsing. [sent-452, score-0.667]
95 To the best of our knowledge, our work is the first linear-time incremental parser that performs dynamic programming. [sent-453, score-0.298]
96 6 Conclusion We have presented a dynamic programming algorithm for shift-reduce parsing, which runs in linear time in practice with beam search. [sent-455, score-0.602]
97 Empirical results on a state-of-the-art dependency parser confirm the advantage of DP in many aspects: faster speed, larger search space, higher oracles, and better and faster learning. [sent-457, score-0.523]
98 Our final parser outperforms all previously reported dependency parsers trained on the Penn Treebanks for both English and Chinese, and is also much faster (even with a Python implementation). [sent-458, score-0.461]
99 An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. [sent-578, score-0.304]
100 A tale of two parsers: investigating and combining graph-based and transition-based dependency parsing using beam-search. [sent-597, score-0.214]
wordName wordTfidf (topN-words)
[('dp', 0.292), ('stack', 0.26), ('beam', 0.257), ('hj', 0.212), ('gss', 0.192), ('prefix', 0.168), ('cost', 0.159), ('dynamic', 0.159), ('tomita', 0.149), ('rex', 0.149), ('faster', 0.137), ('parsing', 0.136), ('predictor', 0.134), ('states', 0.133), ('programming', 0.124), ('deductive', 0.121), ('state', 0.118), ('rey', 0.116), ('cky', 0.112), ('parsers', 0.107), ('secs', 0.107), ('earley', 0.097), ('lr', 0.094), ('python', 0.091), ('parser', 0.088), ('frex', 0.085), ('updates', 0.084), ('search', 0.083), ('dependency', 0.078), ('fe', 0.077), ('zhang', 0.077), ('fsh', 0.075), ('costs', 0.073), ('queue', 0.072), ('forest', 0.072), ('perceptron', 0.07), ('monotonic', 0.069), ('huang', 0.066), ('bounded', 0.064), ('tree', 0.064), ('boundedness', 0.064), ('combo', 0.064), ('sh', 0.063), ('ft', 0.062), ('shift', 0.062), ('inside', 0.062), ('runs', 0.062), ('mst', 0.061), ('trees', 0.06), ('clark', 0.06), ('si', 0.06), ('mcdonald', 0.06), ('equivalent', 0.059), ('action', 0.058), ('kernel', 0.058), ('oracles', 0.056), ('frey', 0.056), ('vanilla', 0.056), ('koo', 0.056), ('feature', 0.055), ('exponentially', 0.055), ('templates', 0.054), ('chart', 0.054), ('head', 0.054), ('early', 0.052), ('root', 0.052), ('dev', 0.052), ('incremental', 0.051), ('speed', 0.051), ('actions', 0.05), ('hours', 0.049), ('step', 0.048), ('charniak', 0.046), ('chinese', 0.044), ('atomic', 0.044), ('duan', 0.043), ('ja', 0.043), ('sd', 0.043), ('glr', 0.043), ('joe', 0.043), ('tshtaet', 0.043), ('merged', 0.039), ('polynomial', 0.039), ('stolcke', 0.039), ('roark', 0.038), ('complexity', 0.037), ('frazier', 0.037), ('aho', 0.037), ('axiom', 0.037), ('peaking', 0.037), ('top', 0.037), ('accuracies', 0.037), ('tag', 0.036), ('greedy', 0.036), ('hi', 0.036), ('eisner', 0.036), ('constituency', 0.035), ('indices', 0.035), ('billot', 0.034), ('masaru', 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999994 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
Author: Liang Huang ; Kenji Sagae
Abstract: Incremental parsing techniques such as shift-reduce have gained popularity thanks to their efficiency, but there remains a major problem: the search is greedy and only explores a tiny fraction of the whole space (even with beam search) as opposed to dynamic programming. We show that, surprisingly, dynamic programming is in fact possible for many shift-reduce parsers, by merging “equivalent” stacks based on feature values. Empirically, our algorithm yields up to a five-fold speedup over a state-of-the-art shift-reduce dependency parser with no loss in accuracy. Better search also leads to better learning, and our final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster.
2 0.17971431 99 acl-2010-Efficient Third-Order Dependency Parsers
Author: Terry Koo ; Michael Collins
Abstract: We present algorithms for higher-order dependency parsing that are “third-order” in the sense that they can evaluate substructures containing three dependencies, and “efficient” in the sense that they require only O(n4) time. Importantly, our new parsers can utilize both sibling-style and grandchild-style interactions. We evaluate our parsers on the Penn Treebank and Prague Dependency Treebank, achieving unlabeled attachment scores of 93.04% and 87.38%, respectively.
3 0.15430735 242 acl-2010-Tree-Based Deterministic Dependency Parsing - An Application to Nivre's Method -
Author: Kotaro Kitagawa ; Kumiko Tanaka-Ishii
Abstract: Nivre’s method was improved by enhancing deterministic dependency parsing through application of a tree-based model. The model considers all words necessary for selection of parsing actions by including words in the form of trees. It chooses the most probable head candidate from among the trees and uses this candidate to select a parsing action. In an evaluation experiment using the Penn Treebank (WSJ section), the proposed model achieved higher accuracy than did previous deterministic models. Although the proposed model’s worst-case time complexity is O(n^2), the experimental results demonstrated an average parsing time not much slower than O(n).
4 0.14994307 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification
Author: Wenbin Jiang ; Qun Liu
Abstract: In this paper we describe an intuitionistic method for dependency parsing, where a classifier is used to determine whether a pair of words forms a dependency edge. And we also propose an effective strategy for dependency projection, where the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language, leading to a set of classification instances rather than a complete tree. Experiments show that the classifier trained on the projected classification instances significantly outperforms previous projected dependency parsers. More importantly, when this classifier is integrated into a maximum spanning tree (MST) dependency parser, obvious improvement is obtained over the MST baseline.
5 0.14482853 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification
Author: Martin Haulrich
Abstract: We show that using confidence-weighted classification in transition-based parsing gives results comparable to using SVMs with faster training and parsing time. We also compare with other online learning algorithms and investigate the effect of pruning features when using confidenceweighted classification.
6 0.13209155 71 acl-2010-Convolution Kernel over Packed Parse Forest
7 0.13108009 69 acl-2010-Constituency to Dependency Translation with Forests
8 0.12366986 20 acl-2010-A Transition-Based Parser for 2-Planar Dependency Structures
9 0.11933507 133 acl-2010-Hierarchical Search for Word Alignment
10 0.11046597 194 acl-2010-Phrase-Based Statistical Language Generation Using Graphical Models and Active Learning
11 0.10848286 243 acl-2010-Tree-Based and Forest-Based Translation
12 0.10377447 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
13 0.095332719 236 acl-2010-Top-Down K-Best A* Parsing
14 0.090315558 65 acl-2010-Complexity Metrics in an Incremental Right-Corner Parser
15 0.085586928 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling
16 0.085389398 169 acl-2010-Learning to Translate with Source and Target Syntax
17 0.085012436 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
18 0.083989218 4 acl-2010-A Cognitive Cost Model of Annotations Based on Eye-Tracking Data
19 0.083844133 143 acl-2010-Importance of Linguistic Constraints in Statistical Dependency Parsing
20 0.083640218 114 acl-2010-Faster Parsing by Supertagger Adaptation
topicId topicWeight
[(0, -0.242), (1, -0.065), (2, 0.098), (3, -0.022), (4, -0.125), (5, -0.124), (6, 0.093), (7, 0.014), (8, -0.049), (9, 0.144), (10, -0.197), (11, -0.042), (12, 0.008), (13, 0.125), (14, -0.005), (15, 0.015), (16, -0.018), (17, 0.007), (18, -0.067), (19, -0.0), (20, 0.059), (21, -0.024), (22, 0.023), (23, -0.058), (24, -0.022), (25, 0.105), (26, 0.004), (27, 0.052), (28, -0.044), (29, 0.012), (30, 0.127), (31, -0.021), (32, 0.02), (33, 0.021), (34, -0.059), (35, 0.013), (36, -0.086), (37, -0.025), (38, 0.075), (39, 0.087), (40, 0.019), (41, 0.008), (42, -0.026), (43, -0.025), (44, 0.058), (45, 0.089), (46, 0.03), (47, 0.178), (48, -0.002), (49, 0.053)]
simIndex simValue paperId paperTitle
same-paper 1 0.96083081 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
Author: Liang Huang ; Kenji Sagae
Abstract: Incremental parsing techniques such as shift-reduce have gained popularity thanks to their efficiency, but there remains a major problem: the search is greedy and only explores a tiny fraction of the whole space (even with beam search) as opposed to dynamic programming. We show that, surprisingly, dynamic programming is in fact possible for many shift-reduce parsers, by merging “equivalent” stacks based on feature values. Empirically, our algorithm yields up to a five-fold speedup over a state-of-the-art shift-reduce dependency parser with no loss in accuracy. Better search also leads to better learning, and our final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster.
2 0.77459443 99 acl-2010-Efficient Third-Order Dependency Parsers
Author: Terry Koo ; Michael Collins
Abstract: We present algorithms for higher-order dependency parsing that are “third-order” in the sense that they can evaluate substructures containing three dependencies, and “efficient” in the sense that they require only O(n4) time. Importantly, our new parsers can utilize both sibling-style and grandchild-style interactions. We evaluate our parsers on the Penn Treebank and Prague Dependency Treebank, achieving unlabeled attachment scores of 93.04% and 87.38%, respectively.
3 0.70880598 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification
Author: Martin Haulrich
Abstract: We show that using confidence-weighted classification in transition-based parsing gives results comparable to using SVMs with faster training and parsing time. We also compare with other online learning algorithms and investigate the effect of pruning features when using confidenceweighted classification.
4 0.6815778 242 acl-2010-Tree-Based Deterministic Dependency Parsing - An Application to Nivre's Method -
Author: Kotaro Kitagawa ; Kumiko Tanaka-Ishii
Abstract: Nivre’s method was improved by enhancing deterministic dependency parsing through application of a tree-based model. The model considers all words necessary for selection of parsing actions by including words in the form of trees. It chooses the most probable head candidate from among the trees and uses this candidate to select a parsing action. In an evaluation experiment using the Penn Treebank (WSJ section), the proposed model achieved higher accuracy than did previous deterministic models. Although the proposed model’s worst-case time complexity is O(n^2), the experimental results demonstrated an average parsing time not much slower than O(n).
5 0.67151505 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and tree-substitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-the-art lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
6 0.60875297 253 acl-2010-Using Smaller Constituents Rather Than Sentences in Active Learning for Japanese Dependency Parsing
7 0.59375995 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification
8 0.5878163 20 acl-2010-A Transition-Based Parser for 2-Planar Dependency Structures
9 0.50610679 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints
10 0.49680793 65 acl-2010-Complexity Metrics in an Incremental Right-Corner Parser
11 0.49275655 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation
12 0.47874099 69 acl-2010-Constituency to Dependency Translation with Forests
13 0.46171707 114 acl-2010-Faster Parsing by Supertagger Adaptation
14 0.44983804 194 acl-2010-Phrase-Based Statistical Language Generation Using Graphical Models and Active Learning
15 0.44921699 7 acl-2010-A Generalized-Zero-Preserving Method for Compact Encoding of Concept Lattices
16 0.44781002 143 acl-2010-Importance of Linguistic Constraints in Statistical Dependency Parsing
17 0.44584259 186 acl-2010-Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two
18 0.4284572 71 acl-2010-Convolution Kernel over Packed Parse Forest
19 0.41720313 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection
20 0.41044733 236 acl-2010-Top-Down K-Best A* Parsing
topicId topicWeight
[(14, 0.067), (25, 0.08), (33, 0.011), (39, 0.026), (42, 0.019), (44, 0.024), (59, 0.087), (65, 0.192), (73, 0.033), (76, 0.031), (78, 0.053), (83, 0.099), (84, 0.04), (98, 0.151)]
simIndex simValue paperId paperTitle
1 0.87965751 66 acl-2010-Compositional Matrix-Space Models of Language
Author: Sebastian Rudolph ; Eugenie Giesbrecht
Abstract: We propose CMSMs, a novel type of generic compositional models for syntactic and semantic aspects of natural language, based on matrix multiplication. We argue for the structural and cognitive plausibility of this model and show that it is able to cover and combine various common compositional NLP approaches ranging from statistical word space models to symbolic grammar formalisms.
same-paper 2 0.85747391 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
Author: Liang Huang ; Kenji Sagae
Abstract: Incremental parsing techniques such as shift-reduce have gained popularity thanks to their efficiency, but there remains a major problem: the search is greedy and only explores a tiny fraction of the whole space (even with beam search) as opposed to dynamic programming. We show that, surprisingly, dynamic programming is in fact possible for many shift-reduce parsers, by merging “equivalent” stacks based on feature values. Empirically, our algorithm yields up to a five-fold speedup over a state-of-the-art shift-reduce dependency parser with no loss in accuracy. Better search also leads to better learning, and our final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster.
3 0.75721085 1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."
Author: Mark Sammons ; V.G.Vinod Vydiswaran ; Dan Roth
Abstract: We challenge the NLP community to participate in a large-scale, distributed effort to design and build resources for developing and evaluating solutions to new and existing NLP tasks in the context of Recognizing Textual Entailment. We argue that the single global label with which RTE examples are annotated is insufficient to effectively evaluate RTE system performance; to promote research on smaller, related NLP tasks, we believe more detailed annotation and evaluation are needed, and that this effort will benefit not just RTE researchers, but the NLP community as a whole. We use insights from successful RTE systems to propose a model for identifying and annotating textual infer- ence phenomena in textual entailment examples, and we present the results of a pilot annotation study that show this model is feasible and the results immediately useful.
4 0.75044632 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and tree-substitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-the-art lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
5 0.74723184 71 acl-2010-Convolution Kernel over Packed Parse Forest
Author: Min Zhang ; Hui Zhang ; Haizhou Li
Abstract: This paper proposes a convolution forest kernel to effectively explore rich structured features embedded in a packed parse forest. As opposed to the convolution tree kernel, the proposed forest kernel does not have to commit to a single best parse tree, and is thus able to explore very large object spaces and much more structured features embedded in a forest. This makes the proposed kernel more robust against parsing errors and data sparseness issues than the convolution tree kernel. The paper presents the formal definition of convolution forest kernel and also illustrates the computing algorithm to fast compute the proposed convolution forest kernel. Experimental results on two NLP applications, relation extraction and semantic role labeling, show that the proposed forest kernel significantly outperforms the baseline of the convolution tree kernel.
6 0.74669552 62 acl-2010-Combining Orthogonal Monolingual and Multilingual Sources of Evidence for All Words WSD
7 0.72884011 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
8 0.72868574 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
9 0.72785234 21 acl-2010-A Tree Transducer Model for Synchronous Tree-Adjoining Grammars
10 0.72751528 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
11 0.72707528 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
12 0.72694731 202 acl-2010-Reading between the Lines: Learning to Map High-Level Instructions to Commands
13 0.72616303 99 acl-2010-Efficient Third-Order Dependency Parsers
14 0.72569627 162 acl-2010-Learning Common Grammar from Multilingual Corpus
15 0.72370636 169 acl-2010-Learning to Translate with Source and Target Syntax
16 0.72342157 130 acl-2010-Hard Constraints for Grammatical Function Labelling
17 0.72339892 65 acl-2010-Complexity Metrics in an Incremental Right-Corner Parser
18 0.72319758 133 acl-2010-Hierarchical Search for Word Alignment
19 0.72245753 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
20 0.72196686 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans