acl acl2013 acl2013-133 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yoav Goldberg ; Kai Zhao ; Liang Huang
Abstract: Beam search incremental parsers are accurate, but not as fast as they could be. We demonstrate that, contrary to popular belief, most current implementations of beam parsers in fact run in O(n2), rather than linear time, because each state transition is actually implemented as an O(n) operation. We present an improved implementation, based on Tree Structured Stack (TSS), in which a transition is performed in O(1), resulting in a real linear-time algorithm, which is verified empirically. We further improve parsing speed by sharing feature extraction and dot-products across beam items. Practically, our methods combined offer a speedup of ∼2x over strong baselines on Penn Treebank sentences, and are orders of magnitude faster on much longer sentences.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Beam search incremental parsers are accurate, but not as fast as they could be. [sent-4, score-0.189]
2 We demonstrate that, contrary to popular belief, most current implementations of beam parsers in fact run in O(n2), rather than linear time, because each state transition is actually implemented as an O(n) operation. [sent-5, score-0.412]
3 We present an improved implementation, based on Tree Structured Stack (TSS), in which a transition is performed in O(1), resulting in a real linear-time algorithm, which is verified empirically. [sent-6, score-0.091]
4 We further improve parsing speed by sharing feature extraction and dot-products across beam items. [sent-7, score-0.355]
5 Practically, our methods combined offer a speedup of ∼2x over strong baselines on Penn Treebank sentences, and are orders of magnitude faster on much longer sentences. [sent-8, score-0.131]
6 1 Introduction Beam search incremental parsers (Roark, 2001; Collins and Roark, 2004; Zhang and Clark, 2008; Huang et al. [sent-9, score-0.189]
7 , 2009; Huang and Sagae, 2010; Zhang and Nivre, 2011; Zhang and Clark, 2011) provide very competitive parsing accuracies for various grammar formalisms (CFG, CCG, and dependency grammars). [sent-10, score-0.094]
8 Contrary to common belief, in most standard implementations their actual runtime is in fact O(kn2) rather than linear. [sent-25, score-0.068]
9 Although this argument in general also applies to dynamic programming (DP) parsers,1 in this paper we only focus on the standard, non-dynamic programming approach since it is arguably still the dominant practice (e. [sent-26, score-0.07]
10 The dependence on the beam-size k is because one needs to do k times the number of basic operations (feature extractions, dot-products, and state transitions) relative to a greedy parser (Nivre and Scholz, 2004; Goldberg and Elhadad, 2010). [sent-30, score-0.099]
11 Note that in a beam setting, the same state can expand to several new states in the next step, which is usually achieved by copying the state prior to making a transition, whereas greedy search only stores one state which is modified in-place. [sent-31, score-0.933]
12 Copying accounts for a large fraction of the slowdown of beam-based parsers relative to greedy ones. [sent-32, score-0.081]
13 Copying is expensive, because the state keeps track of (a) a stack and (b) the set of dependency-arcs added so far. [sent-33, score-0.648]
14 Both the arc-set and the stack can grow to O(n) size in the worst-case, making the state-copy (and hence state-transition) an O(n) operation. [sent-34, score-0.46]
15 Thus, beam search implementations that copy the entire state are in fact quadratic O(kn2) and not linear, with a slowdown factor of O(kn) with respect to greedy parsers, which is confirmed empirically in Figure 4. [sent-35, score-0.537]
16 The dynamic-programming parser of Huang and Sagae does run in O(kn), which inspired this paper when we experimented with simulating non-DP beam search using GSS. [sent-40, score-0.258]
17 The arc-standard deductive system (Figure 1):

    input      w0 · · · wn−1
    axiom      0 : ⟨0, ∅⟩ : ∅
    SHIFT      ℓ : ⟨j, S⟩ : A        ⊢  ℓ+1 : ⟨j+1, S|wj⟩ : A              (j < n)
    REDUCEL    ℓ : ⟨j, S|s1|s0⟩ : A  ⊢  ℓ+1 : ⟨j, S|s0⟩ : A ∪ {s1 ↶ s0}
    REDUCER    ℓ : ⟨j, S|s1|s0⟩ : A  ⊢  ℓ+1 : ⟨j, S|s1⟩ : A ∪ {s1 ↷ s0}
    goal       2n−1 : ⟨n, s0⟩ : A

Figure 1: An abstraction of the arc-standard deductive system (Nivre, 2008). [sent-47, score-0.412]
18 The stack S is a list of heads, j is the index of the token at the front of the buffer, and ℓ is the step number (beam index). [sent-48, score-0.565]
19 A is the arc-set of dependency arcs accumulated so far, which we will get rid of in Section 4. [sent-49, score-0.081]
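To make the transitions of Figure 1 concrete, here is a minimal Python sketch; the (l, j, S, A) tuple layout, the (dependent, head) arc encoding, and the function names are illustrative assumptions, not the paper's code.

    # State: step number l, buffer index j, stack S (a list of token
    # indices), and arc-set A of (dependent, head) pairs.

    def shift(l, j, S, A, n):
        assert j < n                                      # SHIFT precondition
        return l + 1, j + 1, S + [j], A                   # push w_j

    def reduce_l(l, j, S, A):
        s1, s0 = S[-2], S[-1]
        return l + 1, j, S[:-2] + [s0], A | {(s1, s0)}    # s1 <- s0 (s0 is head)

    def reduce_r(l, j, S, A):
        s1, s0 = S[-2], S[-1]
        return l + 1, j, S[:-2] + [s1], A | {(s0, s1)}    # s1 -> s0 (s1 is head)

Note that S + [j] and S[:-2] already copy the stack; this is exactly the O(n)-per-transition cost that the paper identifies in the common implementation and removes with TSS.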
20 The TSS version, being linear time, leads to a speedup of ∼2x. [sent-51, score-0.067]
21 We present a simple scheme of sharing repeated scoring operations across different beam items, resulting in an additional 7 to 25% speed increase. [sent-59, score-0.367]
22 On Treebank sentences, the methods combined lead to a speedup of ∼2x over strong baselines (∼10x over naive ones), and on longer sentences they are orders of magnitude faster. [sent-60, score-0.174]
23 We briefly describe a standard shift-reduce dependency parser (which is called “arc-standard” by Nivre) to establish notation. [sent-63, score-0.095]
24 Parsing transitions are applied to states, and result in new states. [sent-65, score-0.198]
25 We thank Yue Zhang for informing us that TSS was already implemented for the CCG parser in zpar (http://sourceforge. [sent-68, score-0.127]
26 When reaching the goal state the parser returns a tree composed of the arcs in the arc-set. [sent-72, score-0.333]
27 At parsing time, transitions are chosen based on a trained scoring model which looks at features of the state. [sent-73, score-0.289]
28 In a beam parser, k items (hypotheses) are maintained. [sent-74, score-0.299]
29 At step i, each of the k items is extended by applying all possible transitions to the given state, resulting in k × a items, a being the number of possible transitions. [sent-76, score-0.267]
30 Of these, the top-scoring k items are kept and used in step i + 1. [sent-77, score-0.1]
31 Finally, the tree associated with the highest-scoring item is returned. [sent-78, score-0.091]
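A rough sketch of one such beam step follows; the score function, the transition callables, and the (state, prefix_score) bookkeeping are placeholders, not the paper's implementation.

    import heapq

    def beam_step(beam, k, transitions, score):
        # beam: list of (state, prefix_score); transitions(state) yields
        # callables, each producing a new (copied) state in the naive version.
        candidates = []
        for state, prefix in beam:
            for t in transitions(state):
                candidates.append((prefix + score(state, t), t(state)))
        # keep the k best of the up-to k * a candidates
        best = heapq.nlargest(k, candidates, key=lambda c: c[0])
        return [(s, sc) for sc, s in best]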
32 3 The Common Implementation of State The stack is usually represented as a list or an array of token indices, and the arc-set as an array heads of length n mapping the word at position m to the index of its parent. [sent-79, score-0.761]
33 In order to allow for fast feature extraction, additional arrays are used to map each token to its left-most and right-most modifier, which are used in most incremental parsers, e.g. [sent-80, score-0.168]
34 The buffer is usually implemented as a pointer to a shared sentence object, and an index j to the current front of the buffer. [sent-83, score-0.373]
35 Finally, it is common to keep an additional array holding the transition sequence leading to the current state, which can be represented compactly as a pointer to the previous state and the current action. [sent-84, score-0.554]
36 The state structure is summarized below:

    class state
        stack[n] of token_ids
        array[n] heads
        array[n] leftmost_modifiers
        array[n] rightmost_modifiers
        int j
        int last_action
        state previous

In a greedy parser, the state transition is performed in-place. [sent-85, score-1.659]
37 However, in a beam parser the states cannot be modified in place, and a state transition operation needs to result in a new, independent state object. [sent-86, score-0.849]
38 Copying a stack and arrays of size n is an O(n) operation. [sent-88, score-0.51]
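As a sketch of where that cost comes from, here is the copy-then-modify SHIFT of the conventional implementation; the State class and the SHIFT action id are illustrative stand-ins for the structure summarized above.

    SHIFT = 0                       # illustrative action id

    class State:                    # mirrors the class sketched above
        pass

    def shift_by_copy(state):
        new = State()
        new.stack = list(state.stack)                              # O(n) copy
        new.heads = list(state.heads)                              # O(n) copy
        new.leftmost_modifiers = list(state.leftmost_modifiers)    # O(n) copy
        new.rightmost_modifiers = list(state.rightmost_modifiers)  # O(n) copy
        new.stack.append(state.j)   # the transition itself is O(1) ...
        new.j = state.j + 1         # ... but the copies above dominate
        new.last_action, new.previous = SHIFT, state
        return new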
39 In what follows, we present a way to perform transitions in O(1). [sent-89, score-0.198]
40 4.1 Distributed Representation of Trees The state needs to keep track of the set of arcs added to the tree so far for two reasons: (a) In order to return the complete tree at the end. [sent-91, score-0.371]
41 In the feature set we use (Huang and Sagae, 2010), we need access to (1) items on the buffer, (2) the 3 top-most elements of the stack, and (3) the current left-most and right-most modifiers of the two topmost stack elements. [sent-96, score-0.581]
42 The left-most and right-most modifiers are already kept in the state representation, but store more information than needed: we only need to keep track of the modifiers of current stack items. [sent-97, score-0.826]
43 Once a token is removed from the stack it will never return, and we will not need access to its modifiers again. [sent-98, score-0.554]
44 We can therefore remove the left/rightmost modifier arrays, and instead have the stack store triplets (token, leftmost_mod, rightmost_mod). [sent-99, score-0.504]
45 Our new state representation becomes: [sent-101, score-0.746]

    class state
        stack[n] of (tok, left, right)
        int j
        int last_action
        state previous
46 Notice that the buffer, which is also of size O(n), is represented as a pointer to an immutable shared object, and is therefore very efficient to copy. [sent-103, score-0.219]
47 We would like to treat the stack in a similar fashion. [sent-104, score-0.46]
48 An immutable stack can be implemented functionally as a cons list, where the head is the top of the stack and the tail is the rest of the stack. [sent-105, score-1.071]
49 Pushing an item to the stack amounts to adding a new head link to the list and returning it. [sent-106, score-0.514]
50 Popping an item from the stack amounts to returning the tail of the list. [sent-107, score-0.596]
51 Notice that, crucially, a pop operation does not change the underlying list at all, and a push operation only adds to the front of a list. [sent-108, score-0.258]
52 Thus, the stack operations are non-destructive, in the sense that once you hold a reference to a stack, the view of the stack through this reference does not change regardless of future operations that are applied to the stack. [sent-109, score-1.002]
53 This stack implementation is an example of a persistent data structure, a data structure inspired by functional programming which keeps the old versions of itself intact when modified (Okasaki, 1999). [sent-111, score-0.569]
54 While each client sees the stack as a list, the underlying representation is a tree, and clients hold pointers to nodes in the tree. [sent-112, score-0.521]
55 A push operation adds a branch to the tree and returns the new pointer, while a pop operation returns the pointer of the parent; see Figure 3 for an example. [sent-113, score-0.414]
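A minimal persistent stack along these lines, as a hedged sketch (the cons-cell interface below is our own; the paper only describes the idea):

    from collections import namedtuple

    # A cons cell: `top` is the head of the stack, `tail` is the rest.
    # Cells are never mutated, so every reference keeps its own view.
    Cell = namedtuple("Cell", ["top", "tail"])
    EMPTY = None

    def push(stack, item):
        return Cell(item, stack)        # O(1): add a new head link

    def pop(stack):
        return stack.top, stack.tail    # O(1): underlying cells untouched

    s = push(push(EMPTY, 1), 2)         # stack [2, 1]
    _, rest = pop(s)                    # rest views [1]; s still views [2, 1]
    s2 = push(rest, 3)                  # [3, 1]: a new branch in the shared tree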
56 Using this stack representation, we can replace the O(n) stack by an integer holding the item at the top of the stack (s0), and a pointer to the tail of the stack (tail). [sent-115, score-2.128]
57 As discussed above, in addition to the top of the stack we also keep its leftmost and rightmost modifiers s0L and s0R. [sent-116, score-0.583]
58 The simplified state representation becomes:

    class state
        int s0, s0L, s0R
        state tail
        int j
        int last_action
        state previous

State is now reduced to seven integers, and the transitions can be implemented very efficiently as we show in Figure 2. [sent-117, score-1.338]
59 The parser state is transformed into a compact object, and state transitions are O(1) operations involving only a few pointer lookups and integer assignments. [sent-118, score-0.828]
60 GSS; Space Complexity TSS is inspired by the graph-structured stack (GSS) used in the dynamic-programming parser of Huang and Sagae (2010), but without reentrancy (see also Footnote 2). [sent-121, score-0.549]
61 More importantly, the state signature in TSS is much slimmer than that in GSS. [sent-122, score-0.292]
62 Using the notation of Huang and Sagae, instead of maintaining the full DP signature fDP(j, S) = (j, fd(sd), . . . , f0(s0)), where sd denotes the dth tree on the stack, in non-DP TSS we only need to store the features f0(s0) for the final tree on the stack: fnonDP(j, S) = (j, f0(s0)). [sent-123, score-0.104]
63 The transitions of Figure 2:

    def Shift(state):
        newstate.s0 = state.j
        newstate.s0L = newstate.s0R = None
        newstate.tail = state
        newstate.j = state.j + 1
        return newstate

    def ReduceL(state):
        newstate.s0 = state.s0
        newstate.s0L = state.tail.s0
        newstate.s0R = state.s0R
        newstate.tail = state.tail.tail
        newstate.j = state.j
        return newstate

    def ReduceR(state):
        newstate.s0 = state.tail.s0
        newstate.s0L = state.tail.s0L
        newstate.s0R = state.s0
        newstate.tail = state.tail.tail
        newstate.j = state.j
        return newstate

Figure 2: State transitions implementation in the TSS representation (see Fig. 3). [sent-156, score-0.276]
67 The forward arrows denote state transitions, and the dotted backward arrows are the tail pointers to the stack tail. [sent-161, score-0.791]
68 Notice that for b = shift(a) we perform a single push operation, getting b.tail = a. [sent-163, score-0.126]
69 For b = reduce(a) we perform two pops and a push, resulting in b.tail = a.tail.tail. [sent-164, score-0.211]
70 This works thanks to the uniqueness of tail pointers (“left-pointers” in Huang and Sagae). [sent-168, score-0.143]
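For concreteness, a runnable rendering of these transitions (a sketch: the frozen dataclass, the Optional fields, and the omission of the last_action/previous bookkeeping are our simplifications, not the paper's code):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class TSSState:
        s0: int                     # token at the top of the stack
        s0L: Optional[int]          # its leftmost / rightmost modifiers
        s0R: Optional[int]
        tail: Optional["TSSState"]  # rest of the stack (one pointer)
        j: int                      # front of the buffer

    def shift(a):
        return TSSState(a.j, None, None, a, a.j + 1)       # b.tail = a

    def reduce_l(a):
        # s1 <- s0: s0 stays on top, s1 becomes its leftmost modifier;
        # two pops and a push give b.tail = a.tail.tail
        return TSSState(a.s0, a.tail.s0, a.s0R, a.tail.tail, a.j)

    def reduce_r(a):
        # s1 -> s0: s1 returns to the top with s0 as its rightmost modifier
        return TSSState(a.tail.s0, a.tail.s0L, a.s0, a.tail.tail, a.j)

Every transition allocates one constant-size record and touches a handful of pointers, so the entire beam can share one tree of states.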
71 In terms of space complexity, each state is reduced from O(n) in size to O(d) with GSS and to O(1) with TSS (see footnote 3), making it possible to store the entire beam in O(kn) space. [sent-169, score-0.462]
72 Moreover, the constant state-size makes memory management easier and reduces fragmentation, by making it possible to pre-allocate the entire beam upfront. [sent-170, score-0.23]
73 (2011), our approach is also applicable to other transition systems and richer feature-sets with some additional book-keeping. [sent-175, score-0.198]
74 A well-known technique is to expand the beam lazily. (Footnote 3: For example, a GSS state in Huang and Sagae’s experiments also stores s1, s1L, s1R, s2 besides the f0(s0) features (s0, s0L, s0R) needed by TSS.) [sent-176, score-0.188]
75 This is done by first computing transition scores from each beam item, then keeping the top k highest scoring (state, action) pairs, performing only those k transitions. [sent-180, score-0.352]
76 This technique is especially important when the number of possible transitions is large, such as in labeled parsing. [sent-181, score-0.198]
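A hedged sketch of this lazy expansion (the actions/score callables and the tuple bookkeeping are illustrative assumptions):

    import heapq

    def lazy_beam_step(beam, k, actions, score):
        # Score every (state, action) pair first, without building new states.
        scored = [(prefix + score(state, act), state, trans)
                  for state, prefix in beam
                  for trans, act in actions(state)]
        # Apply only the k surviving transitions instead of all k * a of them.
        best = heapq.nlargest(k, scored, key=lambda t: t[0])
        return [(trans(state), s) for s, state, trans in best]

The scoring still covers all k × a pairs, but only k new states are constructed per step, which matters most when the label set is large.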
77 Notice that relatively few token indices from a state can determine the values of many features. [sent-188, score-0.262]
78 For example, knowing the buffer index j determines the words and tags of items after location j on the buffer, as well as features composed of combinations of these values. [sent-189, score-0.226]
79 Based on this observation we propose the notion of a state signature, which is a set of token indices. [sent-190, score-0.23]
80 An example of a state signature would be sig(state) = (s0, s0L, s1, s1L), indicating the indices of the two tokens at the top of the stack together with their leftmost modifiers. [sent-191, score-0.825]
81 Figure 4: Non-linearity of the standard beam search compared to the linearity of our TSS beam search, for labeled arc-eager and unlabeled arc-standard parsers on long sentences (running times vs. sentence length). [sent-192, score-0.573]
82 Given a signature, we decompose the feature function φ(x) into two parts φ(x) = φs(sig(x)) + φo(x), where φs(sig(x)) extracts all features that depend exclusively on signature items, and φo(x) extracts all other features. [sent-195, score-0.104]
83 During beam decoding, we maintain a cache mapping seen signatures sig(state) to (partial) transition scores w · φs(sig(state)). [sent-197, score-0.181]
84 We now need to calculate w · φo(x) for each beam item, but w · φs(sig(x)) only for one of the items sharing the signature. [sent-198, score-0.332]
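A sketch of such a cache (dict-based memoization; w, sig, phi_s, and phi_o are placeholders for the model's weight vector, signature function, and feature extractors):

    def transition_scores(beam, w, sig, phi_s, phi_o):
        # w: feature -> weight; phi_s / phi_o return feature lists.
        cache = {}                       # one entry per signature seen this step
        scores = []
        for state in beam:
            key = sig(state)
            if key not in cache:         # shared part scored once per signature
                cache[key] = sum(w.get(f, 0.0) for f in phi_s(key))
            scores.append(cache[key] + sum(w.get(f, 0.0) for f in phi_o(state)))
        return scores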
85 Defining the signature involves a natural balance between signatures that repeat often and signatures that cover many features. [sent-199, score-0.21]
86 In the experiments in this paper, we chose the signature function for the arc-standard parser to contain all core elements participating in feature extraction, and for the arc-eager parser a signature containing only a partial subset. [sent-200, score-0.33]
87 6 Experiments We implemented beam-based parsers using the traditional approach as well as with our proposed extension and compared their runtime. [sent-201, score-0.144]
88 Table 1 (numeric cells omitted): parsing speeds of the plain implementation (sec 3) and of the +TSS (sec 4), +TSS+lazy (sec 5), and +TSS+lazy+feat-share (sec 6) variants, for the arc-standard and arc-eager parsers. [sent-207, score-0.18]
89 All parsers are implemented in Python, with dot-products in C. [sent-223, score-0.144]
90 As parsing time is dominated by score computation, the effect is too small to be measured on natural language sentences, but it is noticeable for longer sentences. [sent-228, score-0.094]
91 We argue that parsing longer sentences is by itself an interesting and potentially important problem (e.g. [sent-230, score-0.094]
92 Our next set of experiments compares the actual speedup observed on English sentences. [sent-235, score-0.067]
93 Table 1 shows the speed of the parsers (sentences/second) with the various proposed optimization techniques. [sent-236, score-0.145]
94 We first train our parsers on Sections 02–21 of PTB, using Section 22 as the test set. [sent-237, score-0.113]
95 The accuracies of all our parsers are at the state-of-the-art level. [sent-238, score-0.113]
96 8 Conclusions We demonstrated in both theory and experiments that the standard implementation of beam search parsers runs in O(n2) time, and have presented improved algorithms which run in O(n) time. [sent-240, score-0.389]
97 Combined with other techniques, our method offers significant speedups (∼2x) over strong baselines, or ∼10x over naive ones, and is orders of magnitude faster on much longer sentences. [sent-241, score-0.139]
98 We have demonstrated that our approach is general and we believe it will benefit many other incremental parsers. [sent-242, score-0.076]
99 Pharaoh: a beam search decoder for phrase-based statistical machine translation models. [sent-271, score-0.23]
100 A tale of two parsers: investigating and combining graphbased and transition-based dependency parsing using beam-search. [sent-299, score-0.094]
wordName wordTfidf (topN-words)
[('stack', 0.46), ('tss', 0.391), ('newst', 0.239), ('beam', 0.23), ('ate', 0.217), ('transitions', 0.198), ('state', 0.188), ('pointer', 0.152), ('gss', 0.13), ('sagae', 0.13), ('buffer', 0.127), ('parsers', 0.113), ('sig', 0.111), ('signature', 0.104), ('array', 0.093), ('transition', 0.091), ('newstate', 0.087), ('reducel', 0.087), ('nivre', 0.087), ('tail', 0.082), ('huang', 0.078), ('incremental', 0.076), ('hj', 0.076), ('int', 0.074), ('reducer', 0.071), ('items', 0.069), ('push', 0.069), ('copying', 0.067), ('speedup', 0.067), ('tomita', 0.065), ('sec', 0.063), ('lazy', 0.063), ('parser', 0.061), ('pointers', 0.061), ('parsing', 0.06), ('stat', 0.058), ('operation', 0.057), ('zhang', 0.056), ('item', 0.054), ('signatures', 0.053), ('kuhlmann', 0.053), ('modifiers', 0.052), ('kn', 0.051), ('ail', 0.05), ('arrays', 0.05), ('yue', 0.05), ('st', 0.05), ('roark', 0.05), ('def', 0.048), ('arcs', 0.047), ('implementation', 0.046), ('plain', 0.045), ('store', 0.044), ('ack', 0.043), ('beams', 0.043), ('oerrd', 0.043), ('slowdown', 0.043), ('heads', 0.043), ('pop', 0.042), ('shift', 0.042), ('token', 0.042), ('leftmost', 0.041), ('operations', 0.041), ('previ', 0.038), ('immutable', 0.038), ('pops', 0.038), ('greedy', 0.038), ('implementations', 0.038), ('python', 0.037), ('tree', 0.037), ('ai', 0.037), ('goldberg', 0.036), ('zpar', 0.035), ('programming', 0.035), ('dependency', 0.034), ('ccg', 0.034), ('states', 0.034), ('longer', 0.034), ('notice', 0.034), ('fie', 0.033), ('borrowed', 0.033), ('front', 0.033), ('sharing', 0.033), ('yoav', 0.032), ('speed', 0.032), ('indices', 0.032), ('return', 0.032), ('ous', 0.032), ('deductive', 0.032), ('speedups', 0.032), ('scoring', 0.031), ('dp', 0.031), ('implemented', 0.031), ('magnitude', 0.03), ('keep', 0.03), ('runtime', 0.03), ('joakim', 0.03), ('index', 0.03), ('efficient', 0.029), ('inspired', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 133 acl-2013-Efficient Implementation of Beam-Search Incremental Parsers
Author: Yoav Goldberg ; Kai Zhao ; Liang Huang
Abstract: Beam search incremental parsers are accurate, but not as fast as they could be. We demonstrate that, contrary to popular belief, most current implementations of beam parsers in fact run in O(n2), rather than linear time, because each state transition is actually implemented as an O(n) operation. We present an improved implementation, based on Tree Structured Stack (TSS), in which a transition is performed in O(1), resulting in a real linear-time algorithm, which is verified empirically. We further improve parsing speed by sharing feature extraction and dot-products across beam items. Practically, our methods combined offer a speedup of ∼2x over strong baselines on Penn Treebank sentences, and are orders of magnitude faster on much longer sentences.
2 0.32765007 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy
Author: Francesco Sartorio ; Giorgio Satta ; Joakim Nivre
Abstract: We present a novel transition-based, greedy dependency parser which implements a flexible mix of bottom-up and top-down strategies. The new strategy allows the parser to postpone difficult decisions until the relevant information becomes available. The novel parser has a ∼12% error reduction in unlabeled attachment score over an arc-eager parser, with a slow-down factor of 2.8.
3 0.29961339 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
Author: Jinho D. Choi ; Andrew McCallum
Abstract: We present a novel approach, called selectional branching, which uses confidence estimates to decide when to employ a beam, providing the accuracy of beam search at speeds close to a greedy transition-based dependency parsing approach. Selectional branching is guaranteed to perform a fewer number of transitions than beam search yet performs as accurately. We also present a new transition-based dependency parsing algorithm that gives a complexity of O(n) for projective parsing and an expected linear time speed for non-projective parsing. With the standard setup, our parser shows an unlabeled attachment score of 92.96% and a parsing speed of 9 milliseconds per sentence, which is faster and more accurate than the current state-of-the-art transitionbased parser that uses beam search.
4 0.24028994 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
Author: Muhua Zhu ; Yue Zhang ; Wenliang Chen ; Min Zhang ; Jingbo Zhu
Abstract: Shift-reduce dependency parsers give comparable accuracies to their chartbased counterparts, yet the best shiftreduce constituent parsers still lag behind the state-of-the-art. One important reason is the existence of unary nodes in phrase structure trees, which leads to different numbers of shift-reduce actions between different outputs for the same input. This turns out to have a large empirical impact on the framework of global training and beam search. We propose a simple yet effective extension to the shift-reduce process, which eliminates size differences between action sequences in beam-search. Our parser gives comparable accuracies to the state-of-the-art chart parsers. With linear run-time complexity, our parser is over an order of magnitude faster than the fastest chart parser.
5 0.22996677 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
Author: Ji Ma ; Jingbo Zhu ; Tong Xiao ; Nan Yang
Abstract: In this paper, we combine easy-first dependency parsing and POS tagging algorithms with beam search and structured perceptron. We propose a simple variant of “early-update” to ensure valid update in the training process. The proposed solution can also be applied to combine beam search and structured perceptron with other systems that exhibit spurious ambiguity. On CTB, we achieve 94.01% tagging accuracy and 86.33% unlabeled attachment score with a relatively small beam width. On PTB, we also achieve state-of-the-art performance. 1
6 0.22304426 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
7 0.10409355 80 acl-2013-Chinese Parsing Exploiting Characters
8 0.10105815 288 acl-2013-Punctuation Prediction with Transition-based Parsing
9 0.097543448 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
10 0.079911761 362 acl-2013-Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers
11 0.078492351 206 acl-2013-Joint Event Extraction via Structured Prediction with Global Features
12 0.076469883 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
13 0.071992137 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
14 0.069504037 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data
15 0.068399638 66 acl-2013-Beam Search for Solving Substitution Ciphers
16 0.065159932 275 acl-2013-Parsing with Compositional Vector Grammars
17 0.062593736 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing
18 0.059778526 124 acl-2013-Discriminative state tracking for spoken dialog systems
19 0.057988454 335 acl-2013-Survey on parsing three dependency representations for English
20 0.057369828 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT
topicId topicWeight
[(0, 0.152), (1, -0.122), (2, -0.2), (3, 0.033), (4, -0.128), (5, 0.009), (6, 0.146), (7, -0.028), (8, 0.001), (9, -0.104), (10, -0.021), (11, 0.076), (12, -0.104), (13, -0.041), (14, 0.273), (15, 0.015), (16, -0.196), (17, -0.07), (18, -0.012), (19, -0.025), (20, -0.084), (21, 0.039), (22, 0.046), (23, 0.029), (24, 0.012), (25, -0.002), (26, -0.038), (27, 0.036), (28, 0.054), (29, 0.034), (30, -0.077), (31, -0.041), (32, 0.016), (33, -0.043), (34, -0.087), (35, -0.039), (36, -0.003), (37, 0.101), (38, -0.209), (39, 0.055), (40, 0.036), (41, -0.025), (42, 0.052), (43, 0.158), (44, -0.023), (45, 0.029), (46, 0.023), (47, -0.031), (48, 0.035), (49, 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.96787471 133 acl-2013-Efficient Implementation of Beam-Search Incremental Parsers
Author: Yoav Goldberg ; Kai Zhao ; Liang Huang
Abstract: Beam search incremental parsers are accurate, but not as fast as they could be. We demonstrate that, contrary to popular belief, most current implementations of beam parsers in fact run in O(n2), rather than linear time, because each state transition is actually implemented as an O(n) operation. We present an improved implementation, based on Tree Structured Stack (TSS), in which a transition is performed in O(1), resulting in a real linear-time algorithm, which is verified empirically. We further improve parsing speed by sharing feature extraction and dot-products across beam items. Practically, our methods combined offer a speedup of ∼2x over strong baselines on Penn Treebank sentences, and are orders of magnitude faster on much longer sentences.
2 0.83355898 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
Author: Jinho D. Choi ; Andrew McCallum
Abstract: We present a novel approach, called selectional branching, which uses confidence estimates to decide when to employ a beam, providing the accuracy of beam search at speeds close to a greedy transition-based dependency parsing approach. Selectional branching is guaranteed to perform a fewer number of transitions than beam search yet performs as accurately. We also present a new transition-based dependency parsing algorithm that gives a complexity of O(n) for projective parsing and an expected linear time speed for non-projective parsing. With the standard setup, our parser shows an unlabeled attachment score of 92.96% and a parsing speed of 9 milliseconds per sentence, which is faster and more accurate than the current state-of-the-art transitionbased parser that uses beam search.
3 0.82788849 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
Author: Ji Ma ; Jingbo Zhu ; Tong Xiao ; Nan Yang
Abstract: In this paper, we combine easy-first dependency parsing and POS tagging algorithms with beam search and structured perceptron. We propose a simple variant of “early-update” to ensure valid update in the training process. The proposed solution can also be applied to combine beam search and structured perceptron with other systems that exhibit spurious ambiguity. On CTB, we achieve 94.01% tagging accuracy and 86.33% unlabeled attachment score with a relatively small beam width. On PTB, we also achieve state-of-the-art performance. 1
4 0.82131821 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy
Author: Francesco Sartorio ; Giorgio Satta ; Joakim Nivre
Abstract: We present a novel transition-based, greedy dependency parser which implements a flexible mix of bottom-up and top-down strategies. The new strategy allows the parser to postpone difficult decisions until the relevant information becomes available. The novel parser has a ∼12% error reduction in unlabeled attachment score over an arc-eager parser, with a slow-down factor of 2.8.
5 0.67695594 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
Author: Muhua Zhu ; Yue Zhang ; Wenliang Chen ; Min Zhang ; Jingbo Zhu
Abstract: Shift-reduce dependency parsers give comparable accuracies to their chartbased counterparts, yet the best shiftreduce constituent parsers still lag behind the state-of-the-art. One important reason is the existence of unary nodes in phrase structure trees, which leads to different numbers of shift-reduce actions between different outputs for the same input. This turns out to have a large empirical impact on the framework of global training and beam search. We propose a simple yet effective extension to the shift-reduce process, which eliminates size differences between action sequences in beam-search. Our parser gives comparable accuracies to the state-of-the-art chart parsers. With linear run-time complexity, our parser is over an order of magnitude faster than the fastest chart parser.
6 0.65542251 288 acl-2013-Punctuation Prediction with Transition-based Parsing
7 0.61429995 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
8 0.5210647 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
9 0.51902336 335 acl-2013-Survey on parsing three dependency representations for English
10 0.4447313 176 acl-2013-Grounded Unsupervised Semantic Parsing
11 0.44031581 362 acl-2013-Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers
12 0.37657768 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing
13 0.35812667 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
14 0.34670824 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
15 0.34129345 28 acl-2013-A Unified Morpho-Syntactic Scheme of Stanford Dependencies
16 0.34071913 331 acl-2013-Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
17 0.33709347 275 acl-2013-Parsing with Compositional Vector Grammars
18 0.3293581 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
19 0.3246623 190 acl-2013-Implicatures and Nested Beliefs in Approximate Decentralized-POMDPs
20 0.31531009 36 acl-2013-Adapting Discriminative Reranking to Grounded Language Learning
topicId topicWeight
[(0, 0.074), (6, 0.045), (11, 0.127), (24, 0.017), (26, 0.061), (28, 0.012), (35, 0.063), (42, 0.077), (48, 0.034), (70, 0.054), (85, 0.242), (88, 0.038), (90, 0.02), (95, 0.05)]
simIndex simValue paperId paperTitle
same-paper 1 0.8426764 133 acl-2013-Efficient Implementation of Beam-Search Incremental Parsers
Author: Yoav Goldberg ; Kai Zhao ; Liang Huang
Abstract: Beam search incremental parsers are accurate, but not as fast as they could be. We demonstrate that, contrary to popular belief, most current implementations of beam parsers in fact run in O(n2), rather than linear time, because each state transition is actually implemented as an O(n) operation. We present an improved implementation, based on Tree Structured Stack (TSS), in which a transition is performed in O(1), resulting in a real linear-time algorithm, which is verified empirically. We further improve parsing speed by sharing feature extraction and dot-products across beam items. Practically, our methods combined offer a speedup of ∼2x over strong baselines on Penn Treebank sentences, and are orders of magnitude faster on much longer sentences.
2 0.79117173 324 acl-2013-Smatch: an Evaluation Metric for Semantic Feature Structures
Author: Shu Cai ; Kevin Knight
Abstract: The evaluation of whole-sentence semantic structures plays an important role in semantic parsing and large-scale semantic structure annotation. However, there is no widely-used metric to evaluate wholesentence semantic structures. In this paper, we present smatch, a metric that calculates the degree of overlap between two semantic feature structures. We give an efficient algorithm to compute the metric and show the results of an inter-annotator agreement study.
3 0.73847532 6 acl-2013-A Java Framework for Multilingual Definition and Hypernym Extraction
Author: Stefano Faralli ; Roberto Navigli
Abstract: In this paper we present a demonstration of a multilingual generalization of Word-Class Lattices (WCLs), a supervised lattice-based model used to identify textual definitions and extract hypernyms from them. Lattices are learned from a dataset of automatically-annotated definitions from Wikipedia. We release a Java API for the programmatic use of multilingual WCLs in three languages (English, French and Italian), as well as a Web application for definition and hypernym extraction from user-provided sentences.
4 0.71184146 130 acl-2013-Domain-Specific Coreference Resolution with Lexicalized Features
Author: Nathan Gilbert ; Ellen Riloff
Abstract: Most coreference resolvers rely heavily on string matching, syntactic properties, and semantic attributes of words, but they lack the ability to make decisions based on individual words. In this paper, we explore the benefits of lexicalized features in the setting of domain-specific coreference resolution. We show that adding lexicalized features to off-the-shelf coreference resolvers yields significant performance gains on four domain-specific data sets and with two types of coreference resolution architectures.
5 0.64277506 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
Author: Jinho D. Choi ; Andrew McCallum
Abstract: We present a novel approach, called selectional branching, which uses confidence estimates to decide when to employ a beam, providing the accuracy of beam search at speeds close to a greedy transition-based dependency parsing approach. Selectional branching is guaranteed to perform a fewer number of transitions than beam search yet performs as accurately. We also present a new transition-based dependency parsing algorithm that gives a complexity of O(n) for projective parsing and an expected linear time speed for non-projective parsing. With the standard setup, our parser shows an unlabeled attachment score of 92.96% and a parsing speed of 9 milliseconds per sentence, which is faster and more accurate than the current state-of-the-art transitionbased parser that uses beam search.
6 0.63657057 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers
7 0.62503421 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing
8 0.62333536 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
9 0.61413527 245 acl-2013-Modeling Human Inference Process for Textual Entailment Recognition
10 0.61263794 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
11 0.61162663 156 acl-2013-Fast and Adaptive Online Training of Feature-Rich Translation Models
12 0.60848367 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
13 0.60662615 61 acl-2013-Automatic Interpretation of the English Possessive
14 0.60375607 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
15 0.60331625 225 acl-2013-Learning to Order Natural Language Texts
16 0.60258996 333 acl-2013-Summarization Through Submodularity and Dispersion
17 0.6001485 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
18 0.59925187 242 acl-2013-Mining Equivalent Relations from Linked Data
19 0.59770268 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy
20 0.59625101 57 acl-2013-Arguments and Modifiers from the Learner's Perspective