acl acl2011 acl2011-188 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matt Post
Abstract: In this paper, we show that local features computed from the derivations of tree substitution grammars, such as the identity of particular fragments and a count of large and small fragments, are useful in binary grammatical classification tasks. Such features outperform n-gram features and various model scores by a wide margin. Although they fall short of the performance of the hand-crafted feature set of Charniak and Johnson (2005) developed for parse tree reranking, they do so with an order of magnitude fewer features. Furthermore, since the TSGs employed are learned in a Bayesian setting, the use of their derivations can be viewed as the automatic discovery of tree patterns useful for classification. On the BLLIP dataset, we achieve an accuracy of 89.9% in discriminating between grammatical text and samples from an n-gram language model.
Reference: text
sentIndex sentText sentNum sentScore
1 Such features outperform n-gram features and various model scores by a wide margin. [sent-2, score-0.11]
2 Although they fall short of the performance of the hand-crafted feature set of Charniak and Johnson (2005) developed for parse tree reranking, they do so with an order of magnitude fewer features. [sent-3, score-0.181]
3 Furthermore, since the TSGs employed are learned in a Bayesian setting, the use of their derivations can be viewed as the automatic discovery of tree patterns useful for classification. [sent-4, score-0.215]
4 On the BLLIP dataset, we achieve an accuracy of 89.9% in discriminating between grammatical text and samples from an n-gram language model. [sent-6, score-0.217]
5 1 Introduction The task of a language model is to provide a measure of the grammaticality of a sentence. [sent-7, score-0.218]
6 Language models are useful in a variety of settings, for both human and machine output; for example, in the automatic grading of essays, or in guiding search in a machine translation system. [sent-8, score-0.034]
7 The simplest models, n-grams, are self-evidently poor models of language, unable to (easily) capture or enforce long-distance linguistic phenomena. [sent-10, score-0.067]
8 As a result, the output of such text generation systems is often very poor grammatically, even if it is understandable. [sent-12, score-0.033]
9 Since grammaticality judgments are a matter of the syntax of a language, the obvious approach for modeling grammaticality is to start with the extensive work produced over the past two decades in the field of parsing. [sent-13, score-0.48]
10 This paper demonstrates the utility of local features derived from the fragments of tree substitution grammar derivations. [sent-14, score-0.869]
11 Following Cherry and Quirk (2008), we conduct experiments in a classification setting, where the task is to distinguish between real text and “pseudo-negative” text obtained by sampling from a trigram language model (Okanohara and Tsujii, 2007). [sent-15, score-0.072]
12 Our primary points of comparison are the latent SVM training of Cherry and Quirk (2008), mentioned above, and the extensive set of local and nonlocal feature templates developed by Charniak and Johnson (2005) for parse tree reranking. [sent-16, score-0.263]
13 In contrast to this latter set of features, the feature sets from TSG derivations require no engineering; instead, they are obtained directly from the identity of the fragments used in the derivation, plus simple statistics computed over them. [sent-17, score-0.573]
14 Since these fragments are in turn learned automatically from a Treebank with a Bayesian model, their usefulness here suggests a greater potential for adapting to other languages and datasets. [sent-18, score-0.441]
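Because the TSG feature sets require no engineering, extracting them amounts to counting which fragments appear in a sentence's derivation. A minimal sketch of this idea (the bracketed fragment strings and the `FRAG=` feature naming are illustrative assumptions, not the paper's actual implementation):

```python
from collections import Counter

def fragment_features(derivation):
    """Map a TSG derivation, given here as a list of bracketed
    fragment strings, to sparse count features keyed by the
    identity of each fragment used."""
    feats = Counter()
    for frag in derivation:
        feats["FRAG=" + frag] += 1
    return feats

# Hypothetical two-fragment derivation of "the dog barked":
# a large fragment supplying the predicate-argument skeleton,
# plus a small one filling the VP substitution site.
deriv = ["(S (NP (DT the) (NN dog)) VP)", "(VP (VBD barked))"]
feats = fragment_features(deriv)
```

Each distinct fragment thus becomes one binary-ish counter feature, which is why the feature sets in Table 3 grow with the number of fragments in the learned grammar.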
15 Evaluated by parsing accuracy, these grammars are well below state of the art. [sent-50, score-0.178]
16 Larger fragments better match linguists’ intuitions about what the basic units of grammar should be, capturing, for example, the predicate-argument structure of a verb (Figure 1). [sent-52, score-0.594]
17 The grammars are context-free and thus retain cubic-time inference procedures, yet they reduce the independence assumptions in the model’s generative story by virtue of using fewer fragments (compared to a standard CFG) to generate a tree. [sent-53, score-0.58]
18 3 A spectrum of grammaticality The use of large fragments in TSG grammar derivations provides reason to believe that such grammars might do a better job at language modeling tasks. [sent-54, score-1.12]
19 Consider an extreme case, in which a grammar consists entirely of complete parse trees. [sent-55, score-0.228]
20 In this case, ungrammaticality is synonymous with parser failure. [sent-56, score-0.052]
21 On the other extreme, a context-free grammar containing only depth-one rules can basically produce an analysis over any sequence of words. [sent-58, score-0.193]
22 However, such grammars are notoriously leaky, and the existence of an analysis does not correlate with grammaticality. [sent-59, score-0.139]
23 Context-free grammars are too poor as models of language for the linguistic definition of grammaticality (a sequence of words in the language of the grammar) to apply. [sent-60, score-0.424]
24 TSGs permit us to posit a spectrum of grammaticality in between these two extremes. [sent-61, score-0.324]
25 If we have a grammar comprising small and large fragments, we might consider that larger fragments should be less likely to fit into ungrammatical situations, whereas small fragments could be employed almost anywhere as a sort of ungrammatical glue. [sent-62, score-1.458]
26 Thus, on average, grammatical sentences will license derivations with larger fragments, whereas ungrammatical sentences will be forced to resort to small fragments. [sent-63, score-0.452]
27 This raises the question of what exactly the larger fragments are. [sent-65, score-0.408]
28 A fundamental problem with TSGs is that they are hard to learn, since there is no annotated corpus of TSG derivations and the number of possible derivations is exponential in the size of a tree. [sent-66, score-0.303]
29 The most popular TSG approach has been DataOriented Parsing (Scha, 1990; Bod, 1993), which takes all fragments in the training data. [sent-67, score-0.408]
30 The large size of such grammars (exponential in the size of the training data) forces either implicit representations (Goodman, 1996; Bansal and Klein, 2010) which do not permit arbitrary probability distributions over the grammar fragments or explicit approximations to all fragments (Bod, 2001). [sent-68, score-1.27]
31 Of these approaches, work in Bayesian learning of TSGs produces intuitive grammars in a principled way, and has demonstrated potential in language modeling tasks (Post and Gildea, 2009b; Post, 2010). [sent-71, score-0.216]
32 4 Experiments We experiment with a binary classification task, defined as follows: given a sequence of words, determine whether it is grammatical or not. [sent-73, score-0.147]
33 , 1993), and the BLLIP ’99 dataset,1 a collection of automatically-parsed sentences from three years of articles from the Wall Street Journal. [sent-75, score-0.036]
34 For both datasets, positive examples are obtained from the leaves of the parse trees, retaining their tokenization. [sent-76, score-0.12]
35 Negative examples were produced from a trigram language model by randomly generating sentences of length no more than 100 so as to match the size of the positive data. [sent-77, score-0.165]
36 The average sentence lengths for the positive and negative data were 23. [sent-79, score-0.078]
37 Each set of sentences is evenly split between positive and negative examples. [sent-85, score-0.114]
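The pseudo-negative construction above can be sketched as repeatedly drawing the next word from a trigram distribution conditioned on the previous two words, stopping at an end-of-sentence token or at the length cap. A toy illustration (the model representation and the `<s>`/`</s>` tokens are assumptions, not the paper's setup):

```python
import random

def sample_pseudo_negative(model, max_len=100, seed=None):
    """Sample one pseudo-negative sentence from a trigram model.
    `model` maps a bigram context (w1, w2) to a list of
    (next_word, prob) pairs; "<s>" pads the initial context and
    "</s>" ends the sentence. Toy stand-in for the paper's LMs."""
    rng = random.Random(seed)
    w1, w2 = "<s>", "<s>"
    words = []
    while len(words) < max_len:
        r = rng.random()
        w = "</s>"  # fallback if probabilities underflow the draw
        for cand, p in model[(w1, w2)]:
            r -= p
            if r <= 0:
                w = cand
                break
        if w == "</s>":
            break
        words.append(w)
        w1, w2 = w2, w
    return words

# Degenerate toy model that always generates "a b".
toy = {("<s>", "<s>"): [("a", 1.0)],
       ("<s>", "a"): [("b", 1.0)],
       ("a", "b"): [("</s>", 1.0)]}
```

Sampling until the size of the negative pool matches the positive pool, with a hard cap of 100 words per sentence, yields the evenly split datasets described here.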
38 For the BLLIP dataset, we followed Cherry and Quirk (2008): we randomly selected 450K sentences to train the n-gram language model, and 50K, 3K, and 3K sentences for classifier training, development, and testing, respectively. [sent-92, score-0.072]
39 Table 1 contains statistics of the datasets used in our experiments. [sent-94, score-0.041]
40 1 Base models and features Our experiments compare a number of different feature sets. [sent-101, score-0.13]
41 Central to these feature sets are features computed from the output of four language models. [sent-102, score-0.096]
42 Bigram and trigram language models (the same ones used to generate the negative data). [sent-104, score-0.116]
43 A Bayesian-learned tree substitution grammar (Post and Gildea, 2009a). [Footnote 2: The sampler was run with the default settings for 1,000 iterations, and a grammar of 192,667 fragments was then extracted from counts taken from every 10th iteration between iterations 500 and 1,000, inclusive.] [sent-106, score-0.958]
44 The Charniak parser (Charniak, 2000), run in language modeling mode. The parsing models for both datasets were built from sections 2–21 of the WSJ portion of the Treebank. [sent-110, score-0.21]
45 These models were used to score or parse the training, development, and test data for the classifier. [sent-111, score-0.083]
46 From the output, we extract the following feature sets used in the classifier. [sent-112, score-0.041]
47 These are counter features based on the rules (R). [sent-117, score-0.055]
48 From the Charniak parser output we extract [sent-122, score-0.052]
49 the complete set of reranking features of Charniak and Johnson (2005), and just the local ones (C&J; local). [sent-123, score-0.164]
50 Frontier size (F): instances of this feature class count the number of TSG fragments having frontier size n, 1 ≤ n ≤ 9. [sent-125, score-0.532]
51 We experimented with SVM classifiers instead of maximum entropy, and the only real change across all the models was for these first five models, which saw classification accuracy rise to 55–60%. [sent-131, score-0.067]
52 On the BLLIP dataset, the C&J; feature sets perform the best, even when the set of features is restricted to local ones. [sent-132, score-0.143]
53 The classifiers with TSG features outperform all the other models. [sent-134, score-0.055]
54 The (near)-perfect performance of the TSG models on the Treebank is a result of the large number of features relative to the size of the training data: 3Local features can be computed in a bottom-up manner. [sent-135, score-0.199]
55 The frontier size of a fragment is the number of terminals and nonterminals among its leaves, also known as its rank. [sent-140, score-0.036]
56 For example, the fragment in Figure 1 has a frontier size of 5. [sent-141, score-0.218]
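The frontier-size features (F) follow directly from this definition: count the leaves of each fragment, then histogram the derivation. A small sketch, assuming fragments are encoded as nested tuples (a hypothetical encoding, not the paper's):

```python
def frontier_size(frag):
    """Frontier size of a fragment: the number of leaves, i.e.
    terminals plus substitution-site nonterminals. Here a fragment
    is a nested tuple (label, child, ...); a bare string is a leaf."""
    if isinstance(frag, str):
        return 1
    return sum(frontier_size(child) for child in frag[1:])

def frontier_histogram(derivation, max_n=9):
    """The F features: how many fragments in a derivation have
    frontier size n, for 1 <= n <= max_n."""
    hist = {n: 0 for n in range(1, max_n + 1)}
    for frag in derivation:
        n = frontier_size(frag)
        if n in hist:
            hist[n] += 1
    return hist

# Hypothetical fragment (S (NP DT NN) VP): its leaves are DT, NN,
# and the substitution site VP, so its frontier size is 3.
frag = ("S", ("NP", "DT", "NN"), "VP")
```

These nine counters are what let the classifier exploit the intuition from Section 3 that ungrammatical sentences are forced into derivations dominated by small fragments.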
57 Table 3: Model size. [sent-142, score-0.082]
feature set      Treebank   BLLIP
l+R3             18K        122K
l+RP             15K        11K
l+RT             14K        60K
l+C&J; (local)   24K        607K
l+C&J;           58K        959K
l+RT+F∗          14K        60K
58 the positive and negative data really do evince different fragments, and there are enough such features relative to the size of the training data that very high weights can be placed on them. [sent-143, score-0.188]
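In a maximum entropy (log-linear) classifier, these sparse counts influence the decision through a dot product with learned per-feature weights, which is how a handful of strongly weighted fragments can dominate the outcome. A schematic sketch with invented weights:

```python
def linear_score(feats, weights):
    """Maxent-style decision score: dot product of sparse feature
    counts with learned per-feature weights; the sign of the
    score gives the predicted class."""
    return sum(count * weights.get(name, 0.0)
               for name, count in feats.items())

# Invented weights mirroring the analysis in Section 5: a large
# lexicalized fragment votes "grammatical" (positive weight),
# while the abstract depth-one rule NP -> DT votes
# "ungrammatical" (negative weight).
weights = {"FRAG=(VP (VB give) NP NP)": 2.0, "FRAG=(NP DT)": -1.5}
```

With many fragment features relative to the training set, individual weights can grow large enough that a single decisive fragment settles the classification, matching the near-perfect Treebank numbers reported here.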
59 Despite having more features available, the Charniak & Johnson feature set has significantly lower accuracy on the Treebank data, which suggests that the TSG features are more strongly associated with a particular (positive or negative) outcome. [sent-145, score-0.151]
60 For comparison, Cherry and Quirk (2008) report a classification accuracy of 81. [sent-146, score-0.033]
61 5 Analysis Table 4 lists the highest-weighted TSG features associated with each outcome, taken from the BLLIP model in the last row of Table 2. [sent-149, score-0.055]
62 The learned weights accord with the intuitions presented in Section 3. [sent-150, score-0.043]
63 Ungrammatical sentences use smaller, abstract (unlexicalized) rules, whereas grammatical sentences use higher rank rules and are more lexicalized. [sent-151, score-0.269]
64 Looking at the fragments themselves, we see that sensible patterns such as balanced parenthetical expressions or verb predicate-argument structures are associated with grammaticality, while many of the ungrammatical fragments contain unbalanced quotations and unlikely configurations. [sent-152, score-1.049]
65 Table 5 contains the most probable depth-one rules for each outcome. [sent-153, score-0.05]
66 The unary rules associated with ungrammatical sentences show some interesting patterns. [sent-154, score-0.319]
67 For example, the rule NP → DT occurs 2,344 times [sent-155, score-0.04]
68 in the training portion of the Treebank. [sent-157, score-0.066]
69 Most of these occurrences are in subject settings over articles that aren’t required to modify a noun, such as that, some, this, and all. [sent-158, score-0.048]
70 However, in the BLLIP n-gram data, this rule is used over the definite article the 465 times, the second-most common use. [sent-159, score-0.04]
71 Yet this rule occurs only nine times in the Treebank where the grammar was learned. [sent-160, score-0.183]
72 The small fragment size, together with the coarseness of the nonterminal, permits the fragment to be used in distributional settings where it should not be licensed. [sent-161, score-0.331]
73 This suggests some complementarity between fragment learning and work in using nonterminal refinements (Johnson, 1998; Petrov et al. [sent-162, score-0.128]
74 – 6 Related work Past approaches using parsers as language models in discriminative settings have seen varying degrees of success. [sent-164, score-0.175]
75 (2004) found that the score of a bilexicalized parser was not useful in distinguishing machine translation (MT) output from human reference translations. [sent-166, score-0.052]
76 Cherry and Quirk (2008) addressed this problem by using a latent SVM to adjust the CFG rule weights such that the parser score was a much more useful discriminator be- tween grammatical text and n-gram samples. [sent-167, score-0.206]
77 (2007) also addressed this problem by combining scores from different parsers using an SVM and showed an improved metric of fluency. [sent-169, score-0.044]
78 Outside of MT, Foster and Vogel (2004) argued for parsers that do not assume the grammaticality of their input. [sent-170, score-0.338]
79 (2007) used a set of templates to extract labeled sequential part-of-speech patterns (together with some other linguistic features), which were then used in an SVM setting to classify sentences in Japanese and Chinese learners’ English corpora. [sent-172, score-0.071]
80 (2009) and Foster and Andersen (2009) attempt finer-grained, more realistic (and thus more difficult) classifications against ungrammatical text modeled on the sorts of mistakes made by language learners using parser probabilities. [sent-174, score-0.412]
81 More recently, some researchers have shown that using features of parse trees (such as the rules) [table residue: Table 5, highest-weighted grammatical and ungrammatical depth-one rules] [sent-175, score-0.23]
82 7 Summary Parsers were designed to discriminate among structures, whereas language models discriminate among strings. [sent-177, score-0.155]
83 Small fragments, abstract rules, independence assumptions, and errors or peculiarities in the training corpus allow probable structures to be produced over ungrammatical text when using models that were optimized for parser accuracy. [sent-178, score-0.352]
84 The experiments in this paper demonstrate the utility of tree-substitution grammars in discriminating between grammatical and ungrammatical sentences. [sent-179, score-0.589]
85 Features are derived from the identities of the fragments used in the derivations above a sequence of words; particular fragments are associated with each outcome, and simple statistics computed over those fragments are also useful. [sent-180, score-1.348]
86 The most complicated aspect of using TSGs is grammar learning, for which there are publicly available tools. [sent-181, score-0.143]
87 Looking forward, we believe there is significant potential for TSGs in more subtle discriminative tasks, for example, in discriminating between finer grained and more realistic grammatical errors (Foster and Vogel, 2004; Wagner et al. [sent-182, score-0.334]
88 , 2009), or in discriminating among translation candidates in a machine translation framework. [sent-183, score-0.103]
89 In another line of potential work, it could prove useful to incorporate into the grammar learning procedure some knowledge of the sorts of fragments and features shown here to be helpful for discriminating grammatical and ungrammatical text. [sent-184, score-1.147]
90 What is the minimal set of fragments that achieves maximal parse accuracy? [sent-197, score-0.457]
91 GenERRate: generating errors for use in grammatical error detection. [sent-233, score-0.114]
92 Good reasons for noting bad grammar: Constructing a corpus of ungrammatical language. [sent-238, score-0.233]
wordName wordTfidf (topN-words)
[('fragments', 0.408), ('tsg', 0.288), ('bllip', 0.259), ('tsgs', 0.233), ('ungrammatical', 0.233), ('grammaticality', 0.218), ('grammar', 0.143), ('grammars', 0.139), ('post', 0.13), ('substitution', 0.125), ('derivations', 0.124), ('charniak', 0.12), ('cherry', 0.119), ('grammatical', 0.114), ('foster', 0.107), ('discriminating', 0.103), ('fragment', 0.094), ('wagner', 0.092), ('tree', 0.091), ('treebank', 0.088), ('quirk', 0.086), ('grammaticalungrammatical', 0.076), ('mutton', 0.076), ('okanohara', 0.076), ('matt', 0.072), ('johnson', 0.071), ('frontier', 0.069), ('reranking', 0.062), ('permit', 0.062), ('rens', 0.061), ('gildea', 0.061), ('sorts', 0.058), ('jennifer', 0.056), ('size', 0.055), ('features', 0.055), ('svm', 0.055), ('dras', 0.055), ('bod', 0.055), ('bansal', 0.052), ('parser', 0.052), ('rules', 0.05), ('parse', 0.049), ('discriminative', 0.049), ('settings', 0.048), ('sima', 0.047), ('zollmann', 0.047), ('local', 0.047), ('bayesian', 0.046), ('judging', 0.046), ('eugene', 0.045), ('spectrum', 0.044), ('discriminate', 0.044), ('modeling', 0.044), ('parsers', 0.044), ('intuitions', 0.043), ('negative', 0.043), ('outcome', 0.042), ('usa', 0.042), ('datasets', 0.041), ('feature', 0.041), ('rule', 0.04), ('joshi', 0.04), ('wong', 0.039), ('parsing', 0.039), ('trigram', 0.039), ('cfg', 0.038), ('procedures', 0.038), ('fan', 0.037), ('sentences', 0.036), ('nonterminals', 0.036), ('extreme', 0.036), ('leaves', 0.036), ('rt', 0.036), ('cohn', 0.035), ('realistic', 0.035), ('vogel', 0.035), ('templates', 0.035), ('positive', 0.035), ('models', 0.034), ('pcfg', 0.034), ('nonterminal', 0.034), ('learners', 0.034), ('independence', 0.033), ('andersen', 0.033), ('istein', 0.033), ('coarseness', 0.033), ('zhongyang', 0.033), ('parsimonious', 0.033), ('waikiki', 0.033), ('eex', 0.033), ('feor', 0.033), ('joachim', 0.033), ('othr', 0.033), ('scha', 0.033), ('whistler', 0.033), ('poor', 0.033), ('classification', 0.033), ('acl', 0.033), ('potential', 0.033), 
('whereas', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
Author: Matt Post
Abstract: In this paper, we show that local features computed from the derivations of tree substitution grammars, such as the identity of particular fragments and a count of large and small fragments, are useful in binary grammatical classification tasks. Such features outperform n-gram features and various model scores by a wide margin. Although they fall short of the performance of the hand-crafted feature set of Charniak and Johnson (2005) developed for parse tree reranking, they do so with an order of magnitude fewer features. Furthermore, since the TSGs employed are learned in a Bayesian setting, the use of their derivations can be viewed as the automatic discovery of tree patterns useful for classification. On the BLLIP dataset, we achieve an accuracy of 89.9% in discriminating between grammatical text and samples from an n-gram language model.
2 0.13950071 173 acl-2011-Insertion Operator for Bayesian Tree Substitution Grammars
Author: Hiroyuki Shindo ; Akinori Fujino ; Masaaki Nagata
Abstract: We propose a model that incorporates an insertion operator in Bayesian tree substitution grammars (BTSG). Tree insertion is helpful for modeling syntax patterns accurately with fewer grammar rules than BTSG. The experimental parsing results show that our model outperforms a standard PCFG and BTSG for a small dataset. For a large dataset, our model obtains comparable results to BTSG, making the number of grammar rules much smaller than with BTSG.
3 0.13809668 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing
Author: Mohit Bansal ; Dan Klein
Abstract: We investigate full-scale shortest-derivation parsing (SDP), wherein the parser selects an analysis built from the fewest number of training fragments. Shortest derivation parsing exhibits an unusual range of behaviors. At one extreme, in the fully unpruned case, it is neither fast nor accurate. At the other extreme, when pruned with a coarse unlexicalized PCFG, the shortest derivation criterion becomes both fast and surprisingly effective, rivaling more complex weighted-fragment approaches. Our analysis includes an investigation of tie-breaking and associated dynamic programs. At its best, our parser achieves an accuracy of 87% F1 on the English WSJ task with minimal annotation, and 90% F1 with richer annotation.
4 0.12882377 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
Author: Xianchao Wu ; Takuya Matsuzaki ; Jun'ichi Tsujii
Abstract: In the present paper, we propose the effective usage of function words to generate generalized translation rules for forest-based translation. Given aligned forest-string pairs, we extract composed tree-to-string translation rules that account for multiple interpretations of both aligned and unaligned target function words. In order to constrain the exhaustive attachments of function words, we limit to bind them to the nearby syntactic chunks yielded by a target dependency parser. Therefore, the proposed approach can not only capture source-tree-to-target-chunk correspondences but can also use forest structures that compactly encode an exponential number of parse trees to properly generate target function words during decoding. Extensive experiments involving large-scale English-toJapanese translation revealed a significant im- provement of 1.8 points in BLEU score, as compared with a strong forest-to-string baseline system.
5 0.12669992 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
Author: Markos Mylonakis ; Khalil Sima'an
Abstract: While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters, by select- ing and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.
6 0.1225302 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
7 0.1173248 30 acl-2011-Adjoining Tree-to-String Translation
8 0.11140017 61 acl-2011-Binarized Forest to String Translation
9 0.10596572 219 acl-2011-Metagrammar engineering: Towards systematic exploration of implemented grammars
10 0.10581459 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
11 0.10550055 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
12 0.10410403 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
13 0.10018903 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
14 0.10016404 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
15 0.097124241 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
16 0.09557277 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
17 0.095175557 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
18 0.092535488 282 acl-2011-Shift-Reduce CCG Parsing
19 0.087241359 39 acl-2011-An Ensemble Model that Combines Syntactic and Semantic Clustering for Discriminative Dependency Parsing
20 0.085522339 108 acl-2011-EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
topicId topicWeight
[(0, 0.232), (1, -0.101), (2, 0.014), (3, -0.135), (4, -0.021), (5, -0.009), (6, -0.082), (7, -0.004), (8, -0.046), (9, -0.057), (10, -0.091), (11, -0.026), (12, -0.024), (13, 0.072), (14, 0.015), (15, 0.05), (16, -0.004), (17, 0.031), (18, 0.06), (19, -0.029), (20, 0.023), (21, 0.019), (22, -0.029), (23, -0.026), (24, 0.028), (25, 0.079), (26, 0.004), (27, -0.039), (28, 0.059), (29, -0.081), (30, -0.043), (31, 0.005), (32, -0.065), (33, 0.007), (34, 0.017), (35, -0.02), (36, 0.0), (37, -0.101), (38, -0.007), (39, 0.045), (40, -0.021), (41, 0.021), (42, -0.014), (43, 0.01), (44, -0.052), (45, -0.026), (46, 0.16), (47, -0.022), (48, 0.09), (49, -0.074)]
simIndex simValue paperId paperTitle
same-paper 1 0.93813151 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
Author: Matt Post
Abstract: In this paper, we show that local features computed from the derivations of tree substitution grammars such as the identify of particular fragments, and a count of large and small fragments are useful in binary grammatical classification tasks. Such features outperform n-gram features and various model scores by a wide margin. Although they fall short of the performance of the hand-crafted feature set of Charniak and Johnson (2005) developed for parse tree reranking, they do so with an order of magnitude fewer features. Furthermore, since the TSGs employed are learned in a Bayesian setting, the use of their derivations can be viewed as the automatic discovery of tree patterns useful for classification. On the BLLIP dataset, we achieve an accuracy of 89.9% in discriminating between grammatical text and samples from an n-gram language model. — —
2 0.81980342 173 acl-2011-Insertion Operator for Bayesian Tree Substitution Grammars
Author: Hiroyuki Shindo ; Akinori Fujino ; Masaaki Nagata
Abstract: We propose a model that incorporates an insertion operator in Bayesian tree substitution grammars (BTSG). Tree insertion is helpful for modeling syntax patterns accurately with fewer grammar rules than BTSG. The experimental parsing results show that our model outperforms a standard PCFG and BTSG for a small dataset. For a large dataset, our model obtains comparable results to BTSG, making the number of grammar rules much smaller than with BTSG.
3 0.7863276 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing
Author: Mohit Bansal ; Dan Klein
Abstract: We investigate full-scale shortest-derivation parsing (SDP), wherein the parser selects an analysis built from the fewest number of training fragments. Shortest derivation parsing exhibits an unusual range of behaviors. At one extreme, in the fully unpruned case, it is neither fast nor accurate. At the other extreme, when pruned with a coarse unlexicalized PCFG, the shortest derivation criterion becomes both fast and surprisingly effective, rivaling more complex weighted-fragment approaches. Our analysis includes an investigation of tie-breaking and associated dynamic programs. At its best, our parser achieves an accuracy of 87% F1 on the English WSJ task with minimal annotation, and 90% F1 with richer annotation.
4 0.73870176 330 acl-2011-Using Derivation Trees for Treebank Error Detection
Author: Seth Kulick ; Ann Bies ; Justin Mott
Abstract: This work introduces a new approach to checking treebank consistency. Derivation trees based on a variant of Tree Adjoining Grammar are used to compare the annotation of word sequences based on their structural similarity. This overcomes the problems of earlier approaches based on using strings of words rather than tree structure to identify the appropriate contexts for comparison. We report on the result of applying this approach to the Penn Arabic Treebank and how this approach leads to high precision of error detection.
5 0.72238034 219 acl-2011-Metagrammar engineering: Towards systematic exploration of implemented grammars
Author: Antske Fokkens
Abstract: When designing grammars of natural language, typically, more than one formal analysis can account for a given phenomenon. Moreover, because analyses interact, the choices made by the engineer influence the possibilities available in further grammar development. The order in which phenomena are treated may therefore have a major impact on the resulting grammar. This paper proposes to tackle this problem by using metagrammar development as a methodology for grammar engineering. Iargue that metagrammar engineering as an approach facilitates the systematic exploration of grammars through comparison of competing analyses. The idea is illustrated through a comparative study of auxiliary structures in HPSG-based grammars for German and Dutch. Auxiliaries form a central phenomenon of German and Dutch and are likely to influence many components of the grammar. This study shows that a special auxiliary+verb construction significantly improves efficiency compared to the standard argument-composition analysis for both parsing and generation.
6 0.63556188 267 acl-2011-Reversible Stochastic Attribute-Value Grammars
7 0.6228286 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
8 0.61550236 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars
9 0.61103463 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
11 0.60700035 30 acl-2011-Adjoining Tree-to-String Translation
12 0.60274786 154 acl-2011-How to train your multi bottom-up tree transducer
13 0.6004082 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
14 0.5973233 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
15 0.59538382 77 acl-2011-Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech
16 0.58789241 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
17 0.5675959 298 acl-2011-The ACL Anthology Searchbench
18 0.55124265 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
19 0.54890025 44 acl-2011-An exponential translation model for target language morphology
20 0.5478707 108 acl-2011-EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
topicId topicWeight
[(5, 0.035), (17, 0.058), (26, 0.032), (28, 0.234), (37, 0.119), (39, 0.068), (41, 0.077), (55, 0.033), (59, 0.036), (72, 0.046), (91, 0.032), (96, 0.152), (97, 0.011)]
simIndex simValue paperId paperTitle
1 0.84245396 267 acl-2011-Reversible Stochastic Attribute-Value Grammars
Author: Daniel de Kok ; Barbara Plank ; Gertjan van Noord
Abstract: An attractive property of attribute-value grammars is their reversibility. Attribute-value grammars are usually coupled with separate statistical components for parse selection and fluency ranking. We propose reversible stochastic attribute-value grammars, in which a single statistical model is employed both for parse selection and fluency ranking.
same-paper 2 0.82553965 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
Author: Matt Post
Abstract: In this paper, we show that local features computed from the derivations of tree substitution grammars, such as the identity of particular fragments and a count of large and small fragments, are useful in binary grammatical classification tasks. Such features outperform n-gram features and various model scores by a wide margin. Although they fall short of the performance of the hand-crafted feature set of Charniak and Johnson (2005) developed for parse tree reranking, they do so with an order of magnitude fewer features. Furthermore, since the TSGs employed are learned in a Bayesian setting, the use of their derivations can be viewed as the automatic discovery of tree patterns useful for classification. On the BLLIP dataset, we achieve an accuracy of 89.9% in discriminating between grammatical text and samples from an n-gram language model.
3 0.81014097 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
Author: Kapil Thadani ; Kathleen McKeown
Abstract: The task of aligning corresponding phrases across two related sentences is an important component of approaches for natural language problems such as textual inference, paraphrase detection and text-to-text generation. In this work, we examine a state-of-the-art structured prediction model for the alignment task which uses a phrase-based representation and is forced to decode alignments using an approximate search approach. We propose instead a straightforward exact decoding technique based on integer linear programming that yields order-of-magnitude improvements in decoding speed. This ILP-based decoding strategy permits us to consider syntactically-informed constraints on alignments which significantly increase the precision of the model.
4 0.78707039 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features
Author: Yue Zhang ; Joakim Nivre
Abstract: Transition-based dependency parsers generally use heuristic decoding algorithms but can accommodate arbitrarily rich feature representations. In this paper, we show that we can improve the accuracy of such parsers by considering even richer feature sets than those employed in previous systems. In the standard Penn Treebank setup, our novel features improve attachment score from 91.4% to 92.9%, giving the best results so far for transition-based parsing and rivaling the best results overall. For the Chinese Treebank, they give a significant improvement of the state of the art. An open source release of our parser is freely available.
5 0.72055781 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing
Author: Mohit Bansal ; Dan Klein
Abstract: We investigate full-scale shortest-derivation parsing (SDP), wherein the parser selects an analysis built from the fewest number of training fragments. Shortest derivation parsing exhibits an unusual range of behaviors. At one extreme, in the fully unpruned case, it is neither fast nor accurate. At the other extreme, when pruned with a coarse unlexicalized PCFG, the shortest derivation criterion becomes both fast and surprisingly effective, rivaling more complex weighted-fragment approaches. Our analysis includes an investigation of tie-breaking and associated dynamic programs. At its best, our parser achieves an accuracy of 87% F1 on the English WSJ task with minimal annotation, and 90% F1 with richer annotation.
6 0.69763672 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
7 0.69563967 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
8 0.69527394 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
9 0.69507569 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
10 0.69402909 282 acl-2011-Shift-Reduce CCG Parsing
11 0.69235212 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
12 0.68953311 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding
13 0.68929493 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation
14 0.68900537 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning
15 0.68871582 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction
16 0.68869042 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
17 0.68807524 28 acl-2011-A Statistical Tree Annotator and Its Applications
18 0.68802267 311 acl-2011-Translationese and Its Dialects
19 0.68799627 128 acl-2011-Exploring Entity Relations for Named Entity Disambiguation
20 0.68762243 34 acl-2011-An Algorithm for Unsupervised Transliteration Mining with an Application to Word Alignment