acl acl2011 acl2011-300 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mohit Bansal ; Dan Klein
Abstract: We investigate full-scale shortest-derivation parsing (SDP), wherein the parser selects an analysis built from the fewest number of training fragments. Shortest derivation parsing exhibits an unusual range of behaviors. At one extreme, in the fully unpruned case, it is neither fast nor accurate. At the other extreme, when pruned with a coarse unlexicalized PCFG, the shortest derivation criterion becomes both fast and surprisingly effective, rivaling more complex weighted-fragment approaches. Our analysis includes an investigation of tie-breaking and associated dynamic programs. At its best, our parser achieves an accuracy of 87% F1 on the English WSJ task with minimal annotation, and 90% F1 with richer annotation.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We investigate full-scale shortest-derivation parsing (SDP), wherein the parser selects an analysis built from the fewest number of training fragments. [sent-3, score-0.208]
2 Shortest derivation parsing exhibits an unusual range of behaviors. [sent-4, score-0.222]
3 At one extreme, in the fully unpruned case, it is neither fast nor accurate. [sent-5, score-0.103]
4 At the other extreme, when pruned with a coarse unlexicalized PCFG, the shortest derivation criterion becomes both fast and surprisingly effective, rivaling more complex weighted-fragment approaches. [sent-6, score-0.558]
5 Our analysis includes an investigation of tie-breaking and associated dynamic programs. [sent-7, score-0.023]
6 At its best, our parser achieves an accuracy of 87% F1 on the English WSJ task with minimal annotation, and 90% F1 with richer annotation. [sent-8, score-0.142]
7 1 Introduction One guiding intuition in parsing, and data-driven NLP more generally, is that, all else equal, it is advantageous to memorize large fragments of training examples. [sent-9, score-0.08]
8 Taken to the extreme, this intuition suggests shortest derivation parsing (SDP), wherein a test sentence is analyzed in a way which uses as few training fragments as possible (Bod, 2000; Goodman, 2003). [sent-10, score-0.452]
9 In this paper, we consider SDP in both its pure form and with several direct modifications, finding a range of behaviors. [sent-13, score-0.033]
10 In its pure form, with no pruning or approximation, SDP is neither fast nor accurate, achieving less than 70% F1 on the English WSJ task. [sent-14, score-0.154]
11 Moreover, basic tie-breaking variants and lexical augmentation are insufficient to achieve competitive accuracies. [sent-15, score-0.066]
12 On the other hand, SDP is dramatically improved in both speed and accuracy when a simple, unlexicalized PCFG is used for coarse-to-fine pruning (and tie-breaking). [sent-16, score-0.185]
13 On the English WSJ, the coarse PCFG and the fine SDP together achieve 87% F1 with basic treebank annotation (see Table 2) and up to 90% F1 with richer treebank annotation (see Table 4). [sent-17, score-0.487]
14 The main contribution of this work is to analyze the behavior of shortest derivation parsing, showing both when it fails and when it succeeds. [sent-18, score-0.267]
15 Our final parser, which combines a simple PCFG coarse pass with an otherwise pure SDP fine pass, can be quite accurate while being straightforward to implement. [sent-19, score-0.408]
16 2 Implicit Grammar for SDP The all-fragments grammar (AFG) for a (binarized) treebank is formally the tree-substitution grammar (TSG) (Resnik, 1992; Bod, 1993) that consists of all fragments (elementary trees) of all training trees in the treebank, with some weighting on each fragment. [sent-20, score-0.296]
17 Previous work has used a parameterized weighting of the substructures in their grammar in an effort to extend the original DOP1 model (Bod, 1993; Goodman, 1996a). [sent-26, score-0.072]
18 However, for SDP, the grammar is even simpler (Goodman, 2003). [sent-27, score-0.072]
19 In principle, the implicit SDP grammar needs just two rule schemas: CONTINUE (Xp → Yq Zr) and SWITCH (Xp → Xq), with additive costs 0 and 1, respectively. [sent-28, score-0.092]
20 CONTINUE rules walk along training trees, while SWITCH rules change between trees for a unit cost. [sent-29, score-0.062]
21 Assuming that the SWITCH rules are in practice broken down into BEGIN and END sub-rules as in Bansal and Klein (2010), the grammar is linear in the size of the treebank. [sent-30, score-0.112]
22 Note that no lexicon is needed in this grammar: lexical switches are like any other. [sent-31, score-0.098]
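To make the two-schema grammar concrete, here is a minimal sketch of how the implicit grammar could be assembled from a toy binarized treebank. The tuple encoding, the Rule type, and the name build_sdp_grammar are illustrative assumptions, not the authors' code; in particular, the quadratic SWITCH enumeration below is for clarity only, whereas the BEGIN/END factorization cited above keeps the grammar linear in treebank size.

```python
from collections import namedtuple

# cost 0 = CONTINUE (walk along a training tree), cost 1 = SWITCH (jump between trees)
Rule = namedtuple("Rule", ["parent", "children", "cost"])

def build_sdp_grammar(treebank):
    rules = []
    occurrences = {}      # base symbol -> indexed occurrences X_p in training trees
    next_id = [0]

    def walk(node):
        label = node[0]
        p = (label, next_id[0])              # indexed symbol X_p
        next_id[0] += 1
        occurrences.setdefault(label, []).append(p)
        if len(node) == 3:                   # binary node: CONTINUE rule X_p -> Y_q Z_r
            rules.append(Rule(p, (walk(node[1]), walk(node[2])), 0))
        else:                                # preterminal over a word (no separate lexicon)
            rules.append(Rule(p, (node[1],), 0))
        return p

    for tree in treebank:
        walk(tree)

    # SWITCH rules X_p -> X_q, cost 1. The paper factors these into BEGIN/END
    # sub-rules through one hub symbol per base label; the quadratic loop here
    # is only for readability on toy data.
    for ids in occurrences.values():
        for p in ids:
            for q in ids:
                if p != q:
                    rules.append(Rule(p, (q,), 1))
    return rules

toy = [("S", ("NP", "dogs"), ("VP", "bark")),
       ("S", ("NP", "cats"), ("VP", "sleep"))]
print(len(build_sdp_grammar(toy)), "rules")   # 6 CONTINUE + 6 SWITCH on this toy
```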
23 A derivation in our grammar has weight (cost) w, where w is the number of switches (or the number of training fragments minus one) used to build the derivation (see Figure 1). [sent-32, score-0.46]
24 The Viterbi dynamic program for finding the shortest derivation is quite simple: it requires CKY to store only byte-valued switch-counts s(Xp, i, j) (i.e., the number of switches) for each chart item and to compute the derivation with the least switch-count. [sent-33, score-0.291]
26 Specifically, in the dynamic program, if we use a SWITCH rule Xp → Xq, then we update s(Xp, i,j) := s(Xq, i,j) + 1. [sent-36, score-0.023]
27 If we use a CONTINUE rule Xp → Yq Zr, then the update is s(Xp, i, j) := s(Yq, i, k) + s(Zr, k, j), where k is a split point in the chart. [sent-37, score-0.021]
28 Using this dynamic program, we compute the exact shortest derivation parse in the full all-fragments grammar (which is reduced to a PCFG with two rule schemas, as described above). [sent-38, score-0.423]
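The switch-count dynamic program can be sketched directly as a CKY-style chart loop. The following is an illustration of the two update equations above (SWITCH adds one; CONTINUE sums the children), assuming toy rule dictionaries over indexed symbols; it is not the authors' byte-packed implementation.

```python
import itertools
from collections import defaultdict

def sdp_cky(words, lex, continues, switches):
    """Fewest-switch CKY chart.
    lex:       word -> indexed preterminal symbols that yield it (cost 0)
    continues: indexed parent X_p -> list of (Y_q, Z_r) child pairs (cost 0)
    switches:  indexed symbol X_q -> symbols X_p buildable from it by one
               unit-cost SWITCH rule X_p -> X_q
    Returns s with s[(X_p, i, j)] = fewest switches to build X_p over (i, j)."""
    INF = float("inf")
    s = defaultdict(lambda: INF)
    n = len(words)

    def relax_switches(i, j):
        # Apply s(X_p, i, j) := s(X_q, i, j) + 1 until no cell improves.
        changed = True
        while changed:
            changed = False
            for (x, a, b), cost in list(s.items()):
                if (a, b) != (i, j) or cost == INF:
                    continue
                for y in switches.get(x, ()):
                    if cost + 1 < s[(y, i, j)]:
                        s[(y, i, j)] = cost + 1
                        changed = True

    for i, w in enumerate(words):                    # width-1 spans
        for x in lex.get(w, ()):
            s[(x, i, i + 1)] = 0
        relax_switches(i, i + 1)

    for width in range(2, n + 1):                    # wider spans
        for i in range(n - width + 1):
            j = i + width
            for parent, kid_pairs in continues.items():
                # s(X_p, i, j) := s(Y_q, i, k) + s(Z_r, k, j) over splits k
                for (y, z), k in itertools.product(kid_pairs, range(i + 1, j)):
                    cand = s[(y, i, k)] + s[(z, k, j)]
                    if cand < s[(parent, i, j)]:
                        s[(parent, i, j)] = cand
            relax_switches(i, j)
    return s

# Toy demo: one training tree covers the whole parse as a single fragment.
lex = {"dogs": [("NP", 0)], "bark": [("VP", 1)]}
continues = {("S", 2): [(("NP", 0), ("VP", 1))]}
chart = sdp_cky(["dogs", "bark"], lex, continues, switches={})
print(chart[(("S", 2), 0, 2)])   # -> 0 switches
```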
29 3 Basic SDP: Inaccurate and Slow SDP in its most basic form is appealingly simple, but has two serious issues: it is both slow and inaccurate. [sent-39, score-0.072]
30 Because there are millions of grammar symbols, exact SDP parsing takes more than 45 seconds per sentence in our implementation (in addition to being highly memory-intensive). [sent-46, score-0.111]
31 Footnote 2: This grammar is a very minor variant of the reduction of SDP suggested by Goodman (2003). [sent-40, score-0.144]
32 Footnote 3: For a compact WSJ training set with graph packing (see Bansal and Klein (2010)) and one level of parent annotation and markovization, our grammar has 0.75 million (binarized) explicitly-extracted fragments of just depth 1 and 2. [sent-41, score-0.132]
33 Figure 1: SDP - the best parse corresponds to the shortest derivation (fewest switches). [sent-45, score-0.249]
35 Many methods exist for speeding up parsing through approximation, but basic SDP is too inaccurate to merit them. [sent-47, score-0.175]
36 When implemented as described in Section 2, SDP achieves only 66% F1 on the WSJ task (dev set, ≤ 40 words). [sent-48, score-0.027]
37 One reason for low accuracy may be that there are many shortest derivations, i.e., derivations that are all built with the fewest number of fragments, and that tie-breaking could be at fault. [sent-50, score-0.156]
39 To investigate this, we tried various methods for tie-breaking: FIRST/LAST (procedurally break ties), UNIFORM (sample derivations equally), FREQ (use the frequency of local rules). [sent-53, score-0.066]
40 In fact, even oracle tie-breaking, where ties are broken to favor the number of gold constituents in the derivation, achieves only 80% F1, indicating that correct derivations are often not the shortest ones. [sent-55, score-0.361]
41 Another reason for the poor performance of SDP may be that the parameter-free treatment of the lexical layer is particularly pathological. [sent-56, score-0.02]
42 Indeed, this hypothesis is partially verified by the result that using a lexicon (similar to that in Petrov et al. (2006)) at the terminal layer brings the uniform tie-breaking result up to 80% F1. [sent-57, score-0.026]
44 However, combining a lexicon with oracle tie-breaking yields only around 81% F1. [sent-59, score-0.026]
45 These results at first seem quite discouraging, but we will show that they can be easily improved with information from even a simple PCFG. [sent-61, score-0.019]
46 4 Improvements from a Coarse PCFG The additional information that makes shortest derivation parsing work comes from a coarse unlexicalized PCFG. [sent-62, score-0.558]
47 In the standard way, our PCFG consists of the local (depth-1) rules X → Y Z with probability P(Y Z|X) computed using the count of the rule and the count of the nonterminal X in the given treebank (no smoothing was used). [sent-63, score-0.076]
48 Our coarse grammar uses a lexicon with unknown word classes, similar to that in Petrov et al. (2006). [sent-64, score-0.288]
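As a concrete reading of this estimation, a minimal unsmoothed relative-frequency estimator over the toy tuple-encoded trees used earlier might look as follows; the function name and tree encoding are illustrative assumptions, and the unknown-word lexicon is only noted in a comment.

```python
from collections import Counter

def estimate_pcfg(treebank):
    # Relative-frequency estimates P(Y Z | X) = count(X -> Y Z) / count(X),
    # with no smoothing, over toy trees encoded as nested tuples.
    rule_counts, symbol_counts = Counter(), Counter()

    def walk(node):
        symbol_counts[node[0]] += 1
        if len(node) == 3:                     # binary rule X -> Y Z
            rule_counts[(node[0], node[1][0], node[2][0])] += 1
            walk(node[1])
            walk(node[2])
        # A real coarse lexicon would also map rare and unseen words to
        # unknown-word classes here, as in Petrov et al. (2006).

    for tree in treebank:
        walk(tree)
    return {r: c / symbol_counts[r[0]] for r, c in rule_counts.items()}

toy = [("S", ("NP", "dogs"), ("VP", "bark")),
       ("S", ("NP", "cats"), ("VP", "sleep"))]
print(estimate_pcfg(toy))   # {('S', 'NP', 'VP'): 1.0}
```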
49 When filtered by a coarse PCFG pass, however, SDP becomes both fast and accurate, even for the basic, lexicon-free SDP formulation. [sent-67, score-0.214]
50 Summed marginals (posteriors) are computed in the coarse PCFG and used for pruning and tie-breaking in the SDP chart, as described next. [sent-68, score-0.287]
51 If a particular base symbol X is pruned by the PCFG coarse pass for a particular span (i, j) (i.e., if its marginal P(X, i, j|s) falls below a certain threshold), then in the full SDP pass we do not allow building any indexed symbol Xl of type X for span (i, j). [sent-71, score-0.329]
53 In all our pruning-based experiments, we use a log posterior threshold of −3. [sent-76, score-0.037]
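A small sketch of the pruning test implied here, assuming a hypothetical coarse_log_posterior interface that stands in for the summed inside-outside marginals of the coarse PCFG:

```python
LOG_POSTERIOR_THRESHOLD = -3.0   # placeholder near the value quoted above

def pruning_mask(base_symbols, n, coarse_log_posterior):
    # Keep a fine-chart cell (X, i, j) only if the coarse pass's summed
    # marginal log P(X, i, j | s) clears the threshold; every indexed symbol
    # X_l of base type X inherits this decision.
    return {(x, i, j)
            for x in base_symbols
            for i in range(n)
            for j in range(i + 1, n + 1)
            if coarse_log_posterior(x, i, j) >= LOG_POSTERIOR_THRESHOLD}

# Example with a trivial stand-in posterior that keeps everything:
mask = pruning_mask({"S", "NP", "VP"}, 2, lambda x, i, j: 0.0)
print(len(mask))   # 9 cells survive: 3 symbols x 3 spans
```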
54 We also use the PCFG coarse pass for tie-breaking. [sent-78, score-0.273]
55 During Viterbi shortest-derivation parsing (after coarse-pruning), if two derivations have the same cost (i.e., the number of switches), then we break the tie between them by choosing the derivation which has a higher sum of coarse posteriors (i.e., the sum of the coarse PCFG chart-cell posteriors P(X, i, j|s) used to build the derivation). [sent-79, score-0.127]
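This tie-breaking rule can be read as a lexicographic comparison: minimize the switch count first, then maximize the summed coarse posteriors. A sketch follows, with an illustrative Derivation record that is not the paper's data structure; in a real chart parser one would store the pair (switch count, posterior sum) per chart item rather than whole derivations.

```python
from dataclasses import dataclass, field

@dataclass
class Derivation:
    switches: int                                   # number of SWITCH rules used
    spans: list = field(default_factory=list)       # (base symbol, i, j) constituents

def better(d1, d2, coarse_posterior):
    # Lexicographic key: fewer switches first, then larger posterior sum.
    def key(d):
        total = sum(coarse_posterior(x, i, j) for x, i, j in d.spans)
        return (d.switches, -total)
    return d1 if key(d1) <= key(d2) else d2

# Example with a flat stand-in posterior: equal switch counts, so the
# derivation covering more (equally plausible) constituents wins the tie.
flat = lambda x, i, j: 0.5
a = Derivation(1, [("S", 0, 2)])
b = Derivation(1, [("S", 0, 2), ("NP", 0, 1)])
print(better(a, b, flat).spans)   # b wins: higher posterior sum
```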
58 The coarse PCFG has an extremely beneficial interaction with the fine all-fragments SDP grammar, wherein the accuracy of the combined grammars is significantly higher than either individually (see Table 2). [sent-85, score-0.124]
59 Footnote 4: This is similar to the maximum recall objective for approximate inference (Goodman, 1996b). [sent-84, score-0.19]
60 Table 1: Accuracy (F1) and exact match (EX) for Bansal and Klein (2010). [sent-95, score-0.031]
61 The pruned row shows their original results with coarse-to-fine pruning. [sent-96, score-0.056]
62 The unpruned row shows new results for an unpruned version of their parser; these accuracies are very similar to their pruned counterparts. [sent-97, score-0.239]
63 In addition, the speed of parsing and the memory requirements improve by more than an order of magnitude over the exact SDP pass alone. [sent-99, score-0.218]
64 It is perhaps surprising that coarse-pass pruning improves accuracy by such a large amount for SDP. [sent-100, score-0.122]
65 To check one such result, we ran the full, weighted AFG construction of Bansal and Klein (2010) without any pruning (using the maximum recall objective as they did). [sent-102, score-0.097]
66 Their results hold up without pruning: the results of the unpruned version are only around 0.5% less (in parsing F1) than the results achieved with pruning (see Table 1). [sent-103, score-0.1]
68 However, in the case of our shortest-derivation parser, the coarse-pass is essential for high accuracies (and for speed and memory, as always). [sent-105, score-0.049]
69 5 Results We have seen that basic, unpruned SDP is both slow and inaccurate, but improves greatly when complemented by a coarse PCFG pass; these results are shown in Table 2. [sent-106, score-0.296]
70 Shortest derivation parsing with a PCFG coarse pass (PCFG+SDP) achieves an accuracy of nearly 87% F1 (on the WSJ test set, ≤ 40 word sentences), which is significantly higher than the accuracy of the PCFG or SDP alone. [sent-107, score-0.275]
71 When the coarse PCFG is combined with basic SDP, the majority of the improvement comes from pruning with the coarse posteriors; tie-breaking with the coarse posteriors contributes only a fraction of a percent. [sent-108, score-0.353]
72 Footnote 5: PCFG+SDP accuracies are around 3% higher in F1 and 10% higher in EX than the PCFG-only accuracies. [sent-110, score-0.046]
73 PCFG results are with one level of parent annotation and horizontal markovization. [sent-119, score-0.092]
74 See end of Section 5 for a comparison to other parsing approaches. [sent-121, score-0.08]
75 Figure 2 shows the number of fragments for shortest derivation parsing (averaged for each sentence length). [sent-122, score-0.329]
76 Note that the number of fragments is of course greater for the combined PCFG+SDP model than for the exact basic SDP model (which is guaranteed to be minimal). [sent-123, score-0.156]
77 This result provides some analysis of how coarse-pruning helps SDP: it illustrates that the coarse-pass filters out certain short but inaccurate derivations (that the minimal SDP on its own is forced to choose) to improve performance. [sent-124, score-0.122]
78 Figure 3 shows the parsing accuracy of the PCFG+SDP model for various pruning thresholds in coarse-to-fine pruning. [sent-125, score-0.221]
79 Note how this is different from the standard coarse-pass pruning graphs (see Charniak et al. (2006)). [sent-126, score-0.03]
80 In contrast, coarse-pass pruning provides large accuracy benefits here, perhaps because of the unusual complementarity of the two grammars (typical coarse passes are designed to be as similar as possible to their fine counterparts, even explicitly so in Petrov and Klein (2007)). [sent-128, score-0.436]
81 Our PCFG+SDP parser is more accurate than recent sampling-based TSGs (Post and Gildea, 2009; Cohn and Blunsom, 2010), which achieve 83-85% F1, and it is competitive with more complex weighted-fragment approaches. [sent-129, score-0.087]
82 See Bansal and Klein (2010) for a more thorough comparison to other parsing work. [sent-130, score-0.08]
83 In addition to being accurate, the PCFG+SDP parser is simple and fast, requiring negligible training and tuning. [sent-131, score-0.039]
84 It takes 2 sec/sentence, less than 2 GB of memory, and is written in less than 2000 lines. [sent-132, score-0.021]
85 Footnote 6: Bansal and Klein (2010) achieve around 1.0% higher F1 than our results without a lexicon (character-level parsing) and 1. [sent-133, score-0.026]
86 Figure 2: The average number of fragments in shortest derivation parses, computed using the basic version (SDP) and the pruned version (PCFG+SDP), for the WSJ dev-set (≤ 40 words); the x-axis is sentence length. [sent-135, score-0.43]
87 Figure 3: Parsing accuracy for various coarse-pass pruning thresholds (on WSJ dev-set, ≤ 40 words); the x-axis is the coarse-pass log posterior threshold (PT). [sent-142, score-0.141]
88 5.1 Other Treebanks One nice property of the parameter-free, all-fragments SDP approach is that we can easily transfer it to any new domain with a treebank, or any new annotation of an existing treebank. [sent-148, score-0.111]
89 For German, our parser outperforms Dubey (2005), and we are not far behind latent-variable parsers. [sent-150, score-0.119]
90 Footnote 7: These statistics can be further improved with standard parsing micro-optimization. [sent-151, score-0.08]
91 Footnote 8: See Gildea (2001) and Petrov and Klein (2007) for the exact experimental setup that we followed here. [sent-152, score-0.031]
92 Table 4: Results with richer WSJ annotations from the Stanford and Berkeley parsers. [sent-171, score-0.026]
93 5.2 Treebank Annotations PCFG+SDP achieves 87% F1 on the English WSJ task using basic annotation only (i.e., parent annotation and horizontal markovization). [sent-174, score-0.103]
94 Table 4 shows that by pre-transforming the WSJ treebank with richer annotation from previous work, we can obtain state-of-the-art accuracies of up to 90% F1 with no change to our simple parser. [sent-177, score-0.136]
95 In STAN-ANNOTATION, we annotate the treebank symbols with annotations from the Stanford parser (Klein and Manning, 2003). [sent-178, score-0.119]
96 In BERK-ANNOTATION, we annotate with the splits learned via hard-EM and 5 split-merge rounds of the Berkeley parser (Petrov et al., 2006). [sent-179, score-0.039]
97 6 Conclusion Our investigation of shortest-derivation parsing showed that, in the exact case, SDP performs poorly. [sent-181, score-0.111]
98 When pruned (and, to a much lesser extent, tie-broken) by a coarse PCFG, however, it is competitive with a range of other, more complex techniques. [sent-182, score-0.267]
99 An advantage of this approach is that the fine SDP pass is actually quite simple compared to typical fine passes, while still retaining enough complementarity to the coarse PCFG to increase final accuracies. [sent-183, score-0.429]
100 Amit Dubey. 2005. What to do when lexicalization fails: parsing German. In ACL ’05. [sent-225, score-0.08]
wordName wordTfidf (topN-words)
[('sdp', 0.8), ('pcfg', 0.288), ('coarse', 0.19), ('shortest', 0.131), ('bansal', 0.131), ('derivation', 0.118), ('pruning', 0.097), ('pass', 0.083), ('wsj', 0.082), ('fragments', 0.08), ('parsing', 0.08), ('klein', 0.079), ('unpruned', 0.079), ('xp', 0.073), ('switches', 0.072), ('grammar', 0.072), ('petrov', 0.071), ('bod', 0.069), ('alel', 0.057), ('xq', 0.057), ('yq', 0.057), ('zr', 0.057), ('pruned', 0.056), ('fine', 0.056), ('posteriors', 0.054), ('treebank', 0.054), ('inaccurate', 0.05), ('switch', 0.05), ('goodman', 0.05), ('derivations', 0.047), ('fewest', 0.046), ('rens', 0.046), ('basic', 0.045), ('wherein', 0.043), ('markovization', 0.041), ('unlexicalized', 0.039), ('parser', 0.039), ('sima', 0.035), ('binarized', 0.034), ('afg', 0.033), ('pure', 0.033), ('horizontal', 0.032), ('annotation', 0.031), ('exact', 0.031), ('gildea', 0.03), ('charniak', 0.03), ('tsg', 0.029), ('parent', 0.029), ('dop', 0.027), ('accurate', 0.027), ('slow', 0.027), ('achieves', 0.027), ('extreme', 0.027), ('symbols', 0.026), ('cohn', 0.026), ('schemas', 0.026), ('lexicon', 0.026), ('richer', 0.026), ('accuracies', 0.025), ('slav', 0.025), ('complementarity', 0.025), ('mohit', 0.025), ('minimal', 0.025), ('accuracy', 0.025), ('joshua', 0.025), ('unusual', 0.024), ('post', 0.024), ('fast', 0.024), ('german', 0.024), ('speed', 0.024), ('chart', 0.024), ('dynamic', 0.023), ('tie', 0.023), ('rules', 0.022), ('competitive', 0.021), ('around', 0.021), ('berkeley', 0.021), ('continue', 0.021), ('ties', 0.02), ('chicago', 0.02), ('layer', 0.02), ('implicit', 0.02), ('blunsom', 0.019), ('thresholds', 0.019), ('quite', 0.019), ('passes', 0.019), ('threshold', 0.019), ('break', 0.019), ('trees', 0.018), ('indexed', 0.018), ('collins', 0.018), ('broken', 0.018), ('posterior', 0.018), ('fails', 0.018), ('ex', 0.018), ('dan', 0.017), ('spd', 0.017), ('discouraging', 0.017), ('twhee', 0.017), ('ttehreio', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing
Author: Mohit Bansal ; Dan Klein
Abstract: We investigate full-scale shortest-derivation parsing (SDP), wherein the parser selects an analysis built from the fewest number of training fragments. Shortest derivation parsing exhibits an unusual range of behaviors. At one extreme, in the fully unpruned case, it is neither fast nor accurate. At the other extreme, when pruned with a coarse unlexicalized PCFG, the shortest derivation criterion becomes both fast and surprisingly effective, rivaling more complex weighted-fragment approaches. Our analysis includes an investigation of tie-breaking and associated dynamic programs. At its best, our parser achieves an accuracy of 87% F1 on the English WSJ task with minimal annotation, and 90% F1 with richer annotation.
2 0.13809668 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
Author: Matt Post
Abstract: In this paper, we show that local features computed from the derivations of tree substitution grammars such as the identify of particular fragments, and a count of large and small fragments are useful in binary grammatical classification tasks. Such features outperform n-gram features and various model scores by a wide margin. Although they fall short of the performance of the hand-crafted feature set of Charniak and Johnson (2005) developed for parse tree reranking, they do so with an order of magnitude fewer features. Furthermore, since the TSGs employed are learned in a Bayesian setting, the use of their derivations can be viewed as the automatic discovery of tree patterns useful for classification. On the BLLIP dataset, we achieve an accuracy of 89.9% in discriminating between grammatical text and samples from an n-gram language model. — —
3 0.11776032 112 acl-2011-Efficient CCG Parsing: A* versus Adaptive Supertagging
Author: Michael Auli ; Adam Lopez
Abstract: We present a systematic comparison and combination of two orthogonal techniques for efficient parsing of Combinatory Categorial Grammar (CCG). First we consider adaptive supertagging, a widely used approximate search technique that prunes most lexical categories from the parser’s search space using a separate sequence model. Next we consider several variants on A*, a classic exact search technique which to our knowledge has not been applied to more expressive grammar formalisms like CCG. In addition to standard hardware-independent measures of parser effort we also present what we believe is the first evaluation of A* parsing on the more realistic but more stringent metric of CPU time. By itself, A* substantially reduces parser effort as measured by the number of edges considered during parsing, but we show that for CCG this does not always correspond to improvements in CPU time over a CKY baseline. Combining A* with adaptive supertagging decreases CPU time by 15% for our best model.
4 0.11400967 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Aaron Dunlop ; Keith Hall ; Brian Roark
Abstract: Efficient decoding for syntactic parsing has become a necessary research area as statistical grammars grow in accuracy and size and as more NLP applications leverage syntactic analyses. We review prior methods for pruning and then present a new framework that unifies their strengths into a single approach. Using a log linear model, we learn the optimal beam-search pruning parameters for each CYK chart cell, effectively predicting the most promising areas of the model space to explore. We demonstrate that our method is faster than coarse-to-fine pruning, exemplified in both the Charniak and Berkeley parsers, by empirically comparing our parser to the Berkeley parser using the same grammar and under identical operating conditions.
5 0.093844227 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Kristy Hollingshead ; Brian Roark
Abstract: We present a novel pruning method for context-free parsing that increases efficiency by disallowing phrase-level unary productions in CKY chart cells spanning a single word. Our work is orthogonal to recent work on “closing” chart cells, which has focused on multi-word constituents, leaving span-1 chart cells unpruned. We show that a simple discriminative classifier can learn with high accuracy which span-1 chart cells to close to phrase-level unary productions. Eliminating these unary productions from the search can have a large impact on downstream processing, depending on implementation details of the search. We apply our method to four parsing architectures and demonstrate how it is complementary to the cell-closing paradigm, as well as other pruning methods such as coarse-to-fine, agenda, and beam-search pruning.
6 0.078890689 195 acl-2011-Language of Vandalism: Improving Wikipedia Vandalism Detection via Stylometric Analysis
7 0.072401442 184 acl-2011-Joint Hebrew Segmentation and Parsing using a PCFGLA Lattice Parser
8 0.064919643 173 acl-2011-Insertion Operator for Bayesian Tree Substitution Grammars
9 0.063422069 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
10 0.062016107 282 acl-2011-Shift-Reduce CCG Parsing
11 0.061378639 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
12 0.058013622 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
13 0.056324121 39 acl-2011-An Ensemble Model that Combines Syntactic and Semantic Clustering for Discriminative Dependency Parsing
14 0.054746531 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
15 0.054124661 330 acl-2011-Using Derivation Trees for Treebank Error Detection
16 0.053830791 333 acl-2011-Web-Scale Features for Full-Scale Parsing
17 0.046837289 30 acl-2011-Adjoining Tree-to-String Translation
18 0.046746779 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
19 0.04633918 61 acl-2011-Binarized Forest to String Translation
20 0.045963921 59 acl-2011-Better Automatic Treebank Conversion Using A Feature-Based Approach
topicId topicWeight
[(0, 0.109), (1, -0.06), (2, -0.008), (3, -0.143), (4, -0.034), (5, -0.022), (6, -0.059), (7, 0.017), (8, -0.027), (9, -0.039), (10, -0.024), (11, 0.034), (12, 0.006), (13, 0.011), (14, -0.005), (15, 0.035), (16, 0.021), (17, 0.014), (18, 0.046), (19, -0.024), (20, 0.07), (21, 0.021), (22, 0.001), (23, -0.054), (24, -0.039), (25, 0.106), (26, 0.031), (27, -0.004), (28, 0.038), (29, -0.036), (30, -0.049), (31, 0.026), (32, -0.061), (33, -0.046), (34, 0.029), (35, -0.017), (36, -0.031), (37, -0.053), (38, -0.034), (39, -0.064), (40, 0.007), (41, 0.017), (42, -0.005), (43, 0.049), (44, 0.035), (45, 0.017), (46, 0.015), (47, -0.037), (48, 0.094), (49, -0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.93657309 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing
Author: Mohit Bansal ; Dan Klein
Abstract: We investigate full-scale shortest-derivation parsing (SDP), wherein the parser selects an analysis built from the fewest number of training fragments. Shortest derivation parsing exhibits an unusual range of behaviors. At one extreme, in the fully unpruned case, it is neither fast nor accurate. At the other extreme, when pruned with a coarse unlexicalized PCFG, the shortest derivation criterion becomes both fast and surprisingly effective, rivaling more complex weighted-fragment approaches. Our analysis includes an investigation of tie-breaking and associated dynamic programs. At its best, our parser achieves an accuracy of 87% F1 on the English WSJ task with minimal annotation, and 90% F1 with richer annotation.
2 0.76664317 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Aaron Dunlop ; Keith Hall ; Brian Roark
Abstract: Efficient decoding for syntactic parsing has become a necessary research area as statistical grammars grow in accuracy and size and as more NLP applications leverage syntactic analyses. We review prior methods for pruning and then present a new framework that unifies their strengths into a single approach. Using a log linear model, we learn the optimal beam-search pruning parameters for each CYK chart cell, effectively predicting the most promising areas of the model space to explore. We demonstrate that our method is faster than coarse-to-fine pruning, exemplified in both the Charniak and Berkeley parsers, by empirically comparing our parser to the Berkeley parser using the same grammar and under identical operating conditions.
3 0.76417303 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Kristy Hollingshead ; Brian Roark
Abstract: We present a novel pruning method for context-free parsing that increases efficiency by disallowing phrase-level unary productions in CKY chart cells spanning a single word. Our work is orthogonal to recent work on “closing” chart cells, which has focused on multi-word constituents, leaving span-1 chart cells unpruned. We show that a simple discriminative classifier can learn with high accuracy which span-1 chart cells to close to phrase-level unary productions. Eliminating these unary productions from the search can have a large impact on downstream processing, depending on implementation details of the search. We apply our method to four parsing architectures and demonstrate how it is complementary to the cell-closing paradigm, as well as other pruning methods such as coarse-to-fine, agenda, and beam-search pruning.
4 0.66461158 219 acl-2011-Metagrammar engineering: Towards systematic exploration of implemented grammars
Author: Antske Fokkens
Abstract: When designing grammars of natural language, typically, more than one formal analysis can account for a given phenomenon. Moreover, because analyses interact, the choices made by the engineer influence the possibilities available in further grammar development. The order in which phenomena are treated may therefore have a major impact on the resulting grammar. This paper proposes to tackle this problem by using metagrammar development as a methodology for grammar engineering. Iargue that metagrammar engineering as an approach facilitates the systematic exploration of grammars through comparison of competing analyses. The idea is illustrated through a comparative study of auxiliary structures in HPSG-based grammars for German and Dutch. Auxiliaries form a central phenomenon of German and Dutch and are likely to influence many components of the grammar. This study shows that a special auxiliary+verb construction significantly improves efficiency compared to the standard argument-composition analysis for both parsing and generation.
5 0.6460498 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
Author: Matt Post
Abstract: In this paper, we show that local features computed from the derivations of tree substitution grammars such as the identify of particular fragments, and a count of large and small fragments are useful in binary grammatical classification tasks. Such features outperform n-gram features and various model scores by a wide margin. Although they fall short of the performance of the hand-crafted feature set of Charniak and Johnson (2005) developed for parse tree reranking, they do so with an order of magnitude fewer features. Furthermore, since the TSGs employed are learned in a Bayesian setting, the use of their derivations can be viewed as the automatic discovery of tree patterns useful for classification. On the BLLIP dataset, we achieve an accuracy of 89.9% in discriminating between grammatical text and samples from an n-gram language model. — —
6 0.64583069 173 acl-2011-Insertion Operator for Bayesian Tree Substitution Grammars
7 0.63378143 112 acl-2011-Efficient CCG Parsing: A* versus Adaptive Supertagging
8 0.61485642 298 acl-2011-The ACL Anthology Searchbench
9 0.56337219 267 acl-2011-Reversible Stochastic Attribute-Value Grammars
10 0.53691345 184 acl-2011-Joint Hebrew Segmentation and Parsing using a PCFGLA Lattice Parser
11 0.53675234 282 acl-2011-Shift-Reduce CCG Parsing
12 0.53420371 330 acl-2011-Using Derivation Trees for Treebank Error Detection
13 0.53320318 5 acl-2011-A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing
14 0.53007931 236 acl-2011-Optimistic Backtracking - A Backtracking Overlay for Deterministic Incremental Parsing
15 0.49168944 59 acl-2011-Better Automatic Treebank Conversion Using A Feature-Based Approach
17 0.45060498 284 acl-2011-Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
18 0.45012316 195 acl-2011-Language of Vandalism: Improving Wikipedia Vandalism Detection via Stylometric Analysis
19 0.43270352 192 acl-2011-Language-Independent Parsing with Empty Elements
20 0.40952075 239 acl-2011-P11-5002 k2opt.pdf
topicId topicWeight
[(5, 0.063), (17, 0.043), (26, 0.042), (28, 0.033), (37, 0.106), (39, 0.105), (41, 0.056), (47, 0.206), (55, 0.036), (59, 0.029), (72, 0.02), (91, 0.04), (96, 0.124)]
simIndex simValue paperId paperTitle
same-paper 1 0.81768203 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing
Author: Mohit Bansal ; Dan Klein
Abstract: We investigate full-scale shortest-derivation parsing (SDP), wherein the parser selects an analysis built from the fewest number of training fragments. Shortest derivation parsing exhibits an unusual range of behaviors. At one extreme, in the fully unpruned case, it is neither fast nor accurate. At the other extreme, when pruned with a coarse unlexicalized PCFG, the shortest derivation criterion becomes both fast and surprisingly effective, rivaling more complex weighted-fragment approaches. Our analysis includes an investigation of tie-breaking and associated dynamic programs. At its best, our parser achieves an accuracy of 87% F1 on the English WSJ task with minimal annotation, and 90% F1 with richer annotation.
2 0.75485694 85 acl-2011-Coreference Resolution with World Knowledge
Author: Altaf Rahman ; Vincent Ng
Abstract: While world knowledge has been shown to improve learning-based coreference resolvers, the improvements were typically obtained by incorporating world knowledge into a fairly weak baseline resolver. Hence, it is not clear whether these benefits can carry over to a stronger baseline. Moreover, since there has been no attempt to apply different sources of world knowledge in combination to coreference resolution, it is not clear whether they offer complementary benefits to a resolver. We systematically compare commonly-used and under-investigated sources of world knowledge for coreference resolution by applying them to two learning-based coreference models and evaluating them on documents annotated with two different annotation schemes.
3 0.68789291 182 acl-2011-Joint Annotation of Search Queries
Author: Michael Bendersky ; W. Bruce Croft ; David A. Smith
Abstract: W. Bruce Croft Dept. of Computer Science University of Massachusetts Amherst, MA cro ft @ c s .uma s s .edu David A. Smith Dept. of Computer Science University of Massachusetts Amherst, MA dasmith@ c s .umas s .edu articles or web pages). As previous research shows, these differences severely limit the applicability of Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation, is an impor- tant part of query processing and understanding in information retrieval systems. Due to their brevity and idiosyncratic structure, search queries pose a challenge to existing NLP tools. To address this challenge, we propose a probabilistic approach for performing joint query annotation. First, we derive a robust set of unsupervised independent annotations, using queries and pseudo-relevance feedback. Then, we stack additional classifiers on the independent annotations, and exploit the dependencies between them to further improve the accuracy, even with a very limited amount of available training data. We evaluate our method using a range of queries extracted from a web search log. Experimental results verify the effectiveness of our approach for both short keyword queries, and verbose natural language queries.
4 0.68580747 97 acl-2011-Discovering Sociolinguistic Associations with Structured Sparsity
Author: Jacob Eisenstein ; Noah A. Smith ; Eric P. Xing
Abstract: We present a method to discover robust and interpretable sociolinguistic associations from raw geotagged text data. Using aggregate demographic statistics about the authors’ geographic communities, we solve a multi-output regression problem between demographics and lexical frequencies. By imposing a composite ‘1,∞ regularizer, we obtain structured sparsity, driving entire rows of coefficients to zero. We perform two regression studies. First, we use term frequencies to predict demographic attributes; our method identifies a compact set of words that are strongly associated with author demographics. Next, we conjoin demographic attributes into features, which we use to predict term frequencies. The composite regularizer identifies a small number of features, which correspond to communities of authors united by shared demographic and linguistic properties.
5 0.68559563 88 acl-2011-Creating a manually error-tagged and shallow-parsed learner corpus
Author: Ryo Nagata ; Edward Whittaker ; Vera Sheinman
Abstract: The availability of learner corpora, especially those which have been manually error-tagged or shallow-parsed, is still limited. This means that researchers do not have a common development and test set for natural language processing of learner English such as for grammatical error detection. Given this background, we created a novel learner corpus that was manually error-tagged and shallowparsed. This corpus is available for research and educational purposes on the web. In this paper, we describe it in detail together with its data-collection method and annotation schemes. Another contribution of this paper is that we take the first step toward evaluating the performance of existing POStagging/chunking techniques on learner corpora using the created corpus. These contributions will facilitate further research in related areas such as grammatical error detection and automated essay scoring.
6 0.68558043 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
7 0.6806494 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
8 0.67848647 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding
9 0.67661768 27 acl-2011-A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
10 0.67659378 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
11 0.67488557 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
12 0.67335057 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features
13 0.67236447 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation
14 0.67167997 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning
15 0.67158401 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
16 0.67148238 5 acl-2011-A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing
17 0.67061794 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation
18 0.66738671 192 acl-2011-Language-Independent Parsing with Empty Elements
19 0.66735351 242 acl-2011-Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
20 0.66580331 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation