acl acl2012 acl2012-127 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Adam Pauls ; Dan Klein
Abstract: We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context. We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours. We evaluate on perplexity and a range of grammaticality tasks, and find that we perform as well or better than n-gram models and other generative baselines. Our model even competes with state-of-the-art discriminative models hand-designed for the grammaticality tasks, despite training on positive data alone. We also show fluency improvements in a preliminary machine translation experiment.
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context. [sent-3, score-0.414]
2 We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours. [sent-4, score-0.321]
3 We evaluate on perplexity and a range of grammaticality tasks, and find that we perform as well or better than n-gram models and other generative baselines. [sent-5, score-0.38]
4 Our model even competes with state-of-the-art discriminative models hand-designed for the grammaticality tasks, despite training on positive data alone. [sent-6, score-0.337]
5 We also show fluency improvements in a preliminary machine translation experiment. [sent-7, score-0.089]
6 At the same time, because n-gram language models only condition on a local window of linear word-level context, they are poor models of long-range syntactic dependencies. [sent-9, score-0.276]
7 Although several lines of work have proposed generative syntactic language models that improve on n-gram models for moderate amounts of data (Chelba, 1997; Xu et al. [sent-10, score-0.37]
8 , 2002; Charniak, 2001; Hall, 2004; Roark, 2004), these models have only recently been scaled to the impressive amounts of data routinely used by n-gram language models (Tan et al. [sent-11, score-0.189]
9 In this paper, we describe a generative, syntactic language model that conditions on local context treelets in a parse tree, backing off to smaller treelets as necessary. [sent-13, score-0.385]
10 Our model can be trained simply by collecting counts and using the same smoothing techniques normally applied to n-gram models (Kneser and Ney, 1995), enabling us to apply techniques developed for scaling n-gram models out of the box (Brants et al. [sent-14, score-0.333]
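To make the count-collection step concrete, here is a minimal sketch (not the authors' code): it walks an automatically parsed tree in an assumed (label, children) tuple format, where a preterminal's children list holds a single word string, and emits (parent-rule context, local rule) count events that could then be handed to standard Kneser-Ney n-gram estimation machinery unchanged.

```python
from collections import Counter

def rule_of(node):
    """Return a CFG rule string 'P -> C1 ... Cd' for a tree node."""
    label, children = node
    child_labels = [c if isinstance(c, str) else c[0] for c in children]
    return label + " -> " + " ".join(child_labels)

def collect_treelet_events(node, parent_rule="ROOT", counts=None):
    """Count (parent-rule context, local rule) pairs over a parsed tree."""
    if counts is None:
        counts = Counter()
    r = rule_of(node)
    counts[(parent_rule, r)] += 1            # the conditioned treelet event
    for child in node[1]:
        if not isinstance(child, str):       # recurse past preterminal words
            collect_treelet_events(child, parent_rule=r, counts=counts)
    return counts

# Example tree: (S (NP (NNP John)) (VP (VBZ sleeps)))
tree = ("S", [("NP", [("NNP", ["John"])]), ("VP", [("VBZ", ["sleeps"])])])
print(collect_treelet_events(tree))
```

Because every event is just a (context, outcome) pair, the same pipelines used to scale n-gram counting and smoothing apply without modification.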
11 The simplicity of our training procedure allows us to train a model on a billion tokens of data in a matter of hours on a single machine, which compares favorably to the more involved training algorithm of Tan et al. [sent-16, score-0.122]
12 The simplicity of our approach also contrasts with recent work on language modeling with tree substitution grammars (Post and Gildea, 2009), where larger treelet contexts are incorporated by using sophisticated priors to learn a segmentation of parse trees. [sent-18, score-0.41]
13 We also evaluate our model on several grammaticality tasks proposed in previous work; we borrow the term treelet from Quirk et al. [sent-22, score-0.551]
14 , 2008; Cherry and Quirk, 2008), and consistently outperforms n-gram models as well as other head-driven and tree-driven generative baselines. [sent-47, score-0.149]
15 Formally, let T be a constituency tree consisting of context-free rules of the form r = P → C1 · · · Cd. [sent-59, score-0.14]
16 Conditioning on the parent rule r0 allows us to capture ... [sent-81, score-0.118]
17 Conditioning on r0 captures both P and its parent P0, which predicts the distribution over child symbols far better than just P (Johnson, 1998). [sent-84, score-0.118]
18 We are not the first to use back-off-based smoothing for syntactic language modeling; such techniques have been applied to models that condition on head-word contexts (Charniak, 2001; Roark, 2004; Zhang, 2009). [sent-90, score-0.27]
19 Parent rule context has also been employed in translation (Vaswani et al. [sent-91, score-0.141]
20 To capture linear effects, we extend the context for terminal (lexical) productions to include the previous two words w−2 and w−1 in the sentence in addition to r0; see Figure 1(c) for a depiction. [sent-96, score-0.236]
21 For non-terminal productions, we back off from r0 to P and its parent P0, and then to just P. [sent-101, score-0.173]
22 In order to generalize to unseen rule yields C1 · · · Cd, we further back off from the basic PCFG probability p(C1 · · · Cd | P) to p(Ci | Ci−3 · · · Ci−1, P), a 4-gram model over child symbols conditioned on P, interpolated with an unconditional 4-gram model p(Ci | Ci−3 · · · Ci−1). [sent-103, score-0.148]
23 From there, we back off to p(w | P, R), where R is the sibling immediately to the right of P, then to a raw PCFG p(w | P), and finally to a unigram distribution. [sent-108, score-0.103]
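The back-off orders just described can be pictured with a small, illustrative scorer. This is a stupid-backoff stand-in for the Kneser-Ney-smoothed interpolation the model actually uses, and the tuple encodings of each context level are assumptions made for the example.

```python
def backoff_prob(outcome, contexts, event_counts, context_counts, alpha=0.4):
    """Score with the most specific context that has observed mass,
    multiplying in an alpha penalty for every back-off step taken."""
    penalty = 1.0
    for context in contexts:
        total = context_counts.get(context, 0)
        if total > 0:
            return penalty * event_counts.get((context, outcome), 0) / total
        penalty *= alpha
    return 0.0

# Non-terminal productions: parent rule r0, then (parent, grandparent), then parent.
nonterminal_contexts = [("rule", "VP -> VBD NP PP"),
                        ("parents", "VP", "S"),
                        ("parent", "VP")]

# Lexical productions: (r0, w-2, w-1), then (parent, right sibling), parent, unigram.
lexical_contexts = [("rule+words", "NP -> DT JJ NN", "the", "strong"),
                    ("parent+sibling", "NN", "PP"),
                    ("parent", "NN"),
                    ("unigram",)]

event_counts = {(("parent", "NN"), "economy"): 3}
context_counts = {("parent", "NN"): 10}
print(backoff_prob("economy", lexical_contexts, event_counts, context_counts))
# nonterminal_contexts would be scored the same way for non-lexical rules.
```

Here the word "economy" is only observed under the bare parent tag NN, so two back-off steps are taken before the estimate 0.4 × 0.4 × 3/10 is returned.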
24 There is one additional hurdle in the estimation of our model: while there exist corpora with human-annotated constituency parses like the Penn Treebank (Marcus et al. [sent-115, score-0.212]
25 These parses may contain errors, but not all parsing errors are problematic for our model, since we only care about the sentences generated by our model and not the parses themselves. [sent-120, score-0.303]
26 We show in our experiments that the addition of data with automatic parses does improve the performance of our language models across a range of tasks. [sent-121, score-0.144]
27 3 Tree Transformations. In the previous section, we described how to condition on rich parse context to better capture the distribution of English trees. [sent-122, score-0.143]
28 While such context allows our model to capture many interesting dependencies, several important dependencies require additional attention. [sent-123, score-0.126]
29 Figure 2: A sample parse from the Penn Treebank after the tree transformations described in Section 3. [sent-126, score-0.133]
30 Note that we have not shown head tag annotations on preterminals because in that case, the head tag is the preterminal itself. [sent-127, score-0.344]
31 We introduce a number of transformations of Treebank constituency parses that allow us to capture such dependencies. [sent-128, score-0.228]
32 Instead, we mark any noun that is the head of an NP-TMP constituent at least once in the Treebank as a temporal noun; for example, today would be tagged as NNT and months would be tagged as NNTS. [sent-133, score-0.129]
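A small sketch of this temporal-noun transformation follows (illustrative only): it uses the same assumed (label, children) tree format, and a crude rightmost-noun stand-in for real head rules, to collect nouns that head an NP-TMP and retag them NNT/NNTS everywhere.

```python
def rightmost_noun(children):
    """Crude stand-in for a head finder: the last noun preterminal's word."""
    for label, kids in reversed(children):
        if label.startswith("NN") and isinstance(kids[0], str):
            return kids[0].lower()
    return None

def collect_temporal_nouns(tree, temporal=None):
    """Gather words heading an NP-TMP constituent at least once."""
    if temporal is None:
        temporal = set()
    label, children = tree
    if isinstance(children[0], str):               # preterminal: (tag, [word])
        return temporal
    if label.startswith("NP-TMP"):
        head = rightmost_noun(children)
        if head:
            temporal.add(head)
    for child in children:
        collect_temporal_nouns(child, temporal)
    return temporal

def retag_temporal(tree, temporal):
    """Retag NN/NNS leaves whose word is a known temporal noun."""
    label, children = tree
    if isinstance(children[0], str):
        if label in ("NN", "NNS") and children[0].lower() in temporal:
            label = "NNT" if label == "NN" else "NNTS"
        return (label, children)
    return (label, [retag_temporal(c, temporal) for c in children])

tree = ("S", [("NP-TMP", [("NN", ["today"])]),
              ("NP", [("NNS", ["sales"])]),
              ("VBD", ["rose"])])
print(retag_temporal(tree, collect_temporal_nouns(tree)))
```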
33 Head Annotations We annotate every non-terminal or preterminal with its head word if the head is a closed-class word and with its head tag otherwise. [sent-134, score-0.523]
34 Klein and Manning (2003) used head tag annotation extensively, though they applied their splits much more selectively. [sent-135, score-0.177]
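As a rough illustration of the head annotation described above (the closed-class list, the first-child head approximation, and the bracket notation below are assumptions; the real transform uses standard head-finding rules):

```python
CLOSED_CLASS = {"by", "of", "to", "in", "that", "and", "or", "not"}   # assumed list

def head_annotate(tree):
    """Annotate internal nodes with the head word (if closed-class) or head tag.
    Returns (annotated_tree, head_tag, head_word)."""
    label, children = tree
    if isinstance(children[0], str):                  # preterminal heads itself
        return tree, label, children[0]
    annotated, head = [], None
    for i, child in enumerate(children):
        sub, tag, word = head_annotate(child)
        annotated.append(sub)
        if i == 0:
            head = (tag, word)                        # crude: first child as head
    tag, word = head
    mark = word.lower() if word.lower() in CLOSED_CLASS else tag
    return ("%s[%s]" % (label, mark), annotated), tag, word

tree = ("PP", [("IN", ["by"]), ("NP", [("NNS", ["sales"])])])
print(head_annotate(tree)[0])
# ('PP[by]', [('IN', ['by']), ('NP[NNS]', [('NNS', ['sales'])])])
```

The PP is annotated with its closed-class head word by, while the NP receives only its head tag NNS, matching the behavior described in the text.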
35 By removing the dominated NP, we allow the production NNS→sales to condition on the presence of a modifying PP (here a PP head-annotated with by). [sent-138, score-0.14]
36 MX for numbers that mix letters and digits; and CD-AL for numbers that are entirely alphabetic. [sent-140, score-0.092]
37 SBAR Flattening We remove any sentential (S) nodes immediately dominated by an SBAR. [sent-141, score-0.145]
38 S nodes under SBAR have very distinct distributions from other sentential nodes, mostly due to empty subjects and/or objects. [sent-142, score-0.096]
39 By flattening such structures, we allow the main verb and its arguments to condition on the whole chain of verbs. [sent-147, score-0.206]
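The SBAR flattening step described above, for example, amounts to a one-pass splice on the assumed tree format (a sketch, not the authors' implementation):

```python
def flatten_sbar(tree):
    """Splice the children of any S node directly under its SBAR parent."""
    label, children = tree
    if isinstance(children[0], str):                 # preterminal
        return tree
    new_children = []
    for child in children:
        child = flatten_sbar(child)
        c_label, c_kids = child
        if label.startswith("SBAR") and c_label == "S" and not isinstance(c_kids[0], str):
            new_children.extend(c_kids)              # lift S's children up
        else:
            new_children.append(child)
    return (label, new_children)

tree = ("SBAR", [("IN", ["that"]),
                 ("S", [("NP", [("PRP", ["he"])]), ("VP", [("VBD", ["left"])])])])
print(flatten_sbar(tree))
# ('SBAR', [('IN', ['that']), ('NP', ...), ('VP', ...)])
```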
40 Gapped Sentence Annotation Collins (1999) and Klein and Manning (2003) annotate nodes which have empty subjects. [sent-149, score-0.146]
41 We use a very simple procedure: we annotate all S or SBAR nodes that have a VP before any NPs. [sent-151, score-0.098]
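That procedure can be sketched directly (the '-g' suffix is an illustrative choice of marker):

```python
def annotate_gapped(tree):
    """Mark S/SBAR nodes whose children contain a VP before any NP."""
    label, children = tree
    if isinstance(children[0], str):                 # preterminal
        return tree
    children = [annotate_gapped(c) for c in children]
    if label in ("S", "SBAR"):
        for c_label, _ in children:
            base = c_label.split("-")[0]
            if base == "VP":
                label += "-g"                        # VP seen before any NP
                break
            if base == "NP":
                break
    return (label, children)

tree = ("S", [("VP", [("TO", ["to"]), ("VP", [("VB", ["go"])])])])
print(annotate_gapped(tree))
# ('S-g', [('VP', [('TO', ['to']), ('VP', [('VB', ['go'])])])])
```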
42 Parent Annotation We annotate all VPs with their parent symbol. [sent-152, score-0.168]
43 Because our treelet model already conditions on the parent, this has the effect of allowing verbs to condition on their grandparents. [sent-153, score-0.508]
44 Unary Deletion We remove all unary productions except the root and preterminal productions, keeping only the bottom-most symbol. [sent-156, score-0.261]
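These last two transformations (parent annotation and unary deletion) are simple enough to sketch together; the '^' separator for parent annotation is an assumption made for the example, and the unary collapse keeps the bottom-most symbol as described.

```python
def parent_annotate_vps(tree, parent="ROOT"):
    """Annotate every VP with the label of its parent."""
    label, children = tree
    if isinstance(children[0], str):
        return tree
    new_label = label + "^" + parent if label == "VP" else label
    return (new_label, [parent_annotate_vps(c, parent=label) for c in children])

def delete_unaries(tree, is_root=True):
    """Collapse non-root unary chains, keeping the bottom-most symbol;
    preterminal productions (tag over word) are left untouched."""
    label, children = tree
    if isinstance(children[0], str):
        return tree
    children = [delete_unaries(c, is_root=False) for c in children]
    if len(children) == 1 and not is_root:
        return children[0]
    return (label, children)

tree = ("S", [("NP", [("NP", [("NNS", ["sales"])])]), ("VP", [("VBD", ["rose"])])])
print(delete_unaries(parent_annotate_vps(tree)))
# ('S', [('NNS', ['sales']), ('VP^S', [('VBD', ['rose'])])])
```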
45 For machine translation, a model that builds target-side constituency parses, such as that of Galley et al. [sent-161, score-0.147]
46 In Table 1, we show the first four samples of length between 15 and 20 generated from our model and a 5-gram model trained on the Penn Treebank. [sent-176, score-0.197]
47 We found that using the 1-best worked just as well as the 1000-best on our grammaticality tasks, but significantly overestimated our model's perplexities. [sent-177, score-0.134]
48 For training data, we constructed a large treebank by concatenating the WSJ and Brown portions of the Penn Treebank, the 50K BLLIP training sentences from Post (2011), and the AFP and APW portions of English Gigaword version 3 (Graff, 2003), totaling about 1. [sent-182, score-0.139]
49 We used the human-annotated parses for the sentences in the Penn Treebank, but parsed the Gigaword and BLLIP sentences with the Berkeley Parser. [sent-184, score-0.241]
50 TREELET-RULE The TREELET-TRANS model with the parent rule context described in Section 2. [sent-193, score-0.244]
51 This is equivalent to the full TREELET model without the lexical context described in Section 2. [sent-194, score-0.126]
52 Specifically, like Collins Model 1, we generate a rule yield conditioned on parent symbol P and head word h by first generating its head symbol Ch, then generating the head words and symbols for left and right modifiers outwards from Ch. [sent-196, score-0.591]
53 Unlike Model 1, which generates each modifier head and symbol conditioned only on Ch, h, and P, we additionally condition on the previously generated modifier’s head and symbol and back off to Model 1. [sent-197, score-0.49]
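One plausible way to write out the decomposition this describes (a sketch; the authors' exact conditioning and back-off structure may differ) is:

```latex
p(C_1 \cdots C_d, h_1 \cdots h_d \mid P, h)
  \;=\; p(C_h \mid P, h)\,
        \prod_{i} p(C_i, h_i \mid C_{i-1}, h_{i-1}, C_h, h, P),
```

where i ranges over the left and right modifiers generated outward from the head child C_h, and each factor backs off to the Collins Model 1 estimate p(C_i, h_i | C_h, h, P).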
54 PCFG-LA (marked with **) was only trained on the WSJ and Brown corpora because it does not scale to large amounts of data. [sent-200, score-0.128]
55 We used the Berkeley LM toolkit (Pauls and Klein, 2011), which implements Kneser-Ney smoothing, to estimate all back-off models for both n-gram and treelet models. [sent-201, score-0.398]
56 Our model outperforms all other generative models, though the improvement over the n-gram model is not statistically significant. [sent-203, score-0.302]
57 3 Classification of Pseudo-Negative Sentences. We make use of three kinds of automatically generated pseudo-negative sentences previously proposed in the literature: Okanohara and Tsujii (2007) proposed generating pseudo-negative examples from a trigram language model; Foster et al. [sent-206, score-0.171]
58 (2008) create “noisy” sentences by automatically inserting a single error into grammatical sentences with a script that randomly deletes, inserts, or misspells a word; and Och et al. [sent-207, score-0.102]
59 We evaluate our model’s ability to distinguish positive from pseudonegative data, and compare against generative baselines and state-of-the-art discriminative methods. [sent-210, score-0.289]
60 We would like to use our model to make grammaticality judgements, but as a generative model it can only provide us with probabilities. [sent-218, score-0.388]
61 Simply thresholding generative probabilities, even with a separate threshold for each length, has been shown to be very ineffective for grammaticality judgements, both for n-gram and syntactic language models (Cherry and Quirk, 2008; Post, 2011). [sent-219, score-0.37]
62 We used a simple measure for isolating the syntactic likelihood of a sentence: we take the log-probability under our model and subtract the log-probability under a unigram model, then normalize by the length of the sentence. [sent-220, score-0.195]
63 This measure, which we call the syntactic log-odds ratio (SLR), is a crude way of “subtracting out” the semantic component of the generative probability, so that sentences that use rare words are not penalized for doing so. [sent-221, score-0.232]
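Written out, the measure is simply

```latex
\mathrm{SLR}(s) \;=\; \frac{\log p_{\text{model}}(s) \;-\; \log p_{\text{unigram}}(s)}{|s|},
```

where |s| is the length of sentence s; grammaticality decisions are then made by thresholding SLR, with the threshold tuned on development data.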
64 1 Trigram Classification. To facilitate comparison with previous work, we used the same negative corpora as Post (2011) for trigram classification. [sent-224, score-0.164]
65 They randomly selected 50K train, 3K development, and 3K positive test sentences from the BLLIP corpus, then trained a trigram model on 450K BLLIP sentences and generated 50K train, 3K development, and 3K negative sentences. [sent-225, score-0.389]
66 We parsed the 50K positive training examples of Post (2011) with the Berkeley Parser and used the resulting treebank to train a treelet language model. [sent-226, score-0.431]
67 We set an SLR threshold for each model on the 6K positive and negative development sentences. [sent-227, score-0.118]
68 In addition to our generative baselines, we show results for the discriminative models reported in Cherry and Quirk (2008) and Post (2011). [sent-229, score-0.235]
69 We assume this is either because the length normalization is important, or because their choice of syntactic language model was poor. [sent-233, score-0.149]
70 The number reported for PCFG-LA is marked with a * to indicate that this model was trained on the training section of the WSJ, not the BLLIP corpus. [sent-235, score-0.172]
71 Our TREELET model performs nearly as well as the TSG method, and substantially outperforms the LSVM method, though the latter was not tested on the same random split. [sent-238, score-0.122]
72 This is likely because the negative data is largely coherent at the trigram level (because it was generated from a trigram model), and the full model is much more sensitive to trigram coherence than the TREELET-RULE model. [sent-240, score-0.534]
73 We emphasize that the discriminative baselines are specifically trained to separate trigram text from natural English, while our model is trained on positive examples alone. [sent-242, score-0.418]
74 Unlike some of the discriminative baselines, which require expensive operations on ... It is true that in order to train our system, one must parse large amounts of training data, which can be costly, though it only needs to be done once. [sent-245, score-0.201]
75 In contrast, even with observed training trees, the discriminative algorithms must still iteratively perform expensive operations (like parsing) for each sentence, and a new model must be trained for new types of negative data. [sent-246, score-0.241]
76 “Pairwise” accuracy is the fraction of correct sentences whose SLR score was higher than that of their noisy versions, and “independent” refers to standard binary classification accuracy. [sent-248, score-0.151]
77 each training sentence, we can very easily scale our model to much larger amounts of data. [sent-249, score-0.153]
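The two accuracy measures defined above can be sketched as follows, assuming SLR scores for each grammatical sentence and its noisy counterpart are already computed and the threshold has been tuned on development data:

```python
def pairwise_accuracy(pairs):
    """pairs: list of (slr_good, slr_noisy); correct if the good sentence wins."""
    return sum(good > noisy for good, noisy in pairs) / len(pairs)

def independent_accuracy(pairs, threshold):
    """Standard binary classification: accept a sentence iff SLR > threshold."""
    correct = sum((good > threshold) + (noisy <= threshold) for good, noisy in pairs)
    return correct / (2 * len(pairs))

pairs = [(0.8, 0.3), (0.5, 0.6), (0.9, 0.1)]
print(pairwise_accuracy(pairs), independent_accuracy(pairs, threshold=0.55))
```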
78 In Table 4, we also show the performance of the generative models trained on our 1B corpus. [sent-250, score-0.21]
79 All generative models improve, but TREELET-RULE remains the best, now outperforming the RERANK system, though of course it is likely that RERANK would improve if it could be scaled up to more training data. [sent-251, score-0.209]
80 2 “Noisy” Classification. We also evaluate the performance of our model on the task of distinguishing the noisy WSJ sentences of Foster et al. [sent-254, score-0.172]
81 Because they only report classification results on Section 0, we used Section 23 to tune an SLR threshold, and tested our model on Section 0. [sent-257, score-0.127]
82 We show the results of both independent and pairwise classification for the WSJ and 1B training sets in Table 5. [sent-258, score-0.137]
83 Note that independent classification is much more difficult than for the trigram data, because sentences contain at most one change, which may not even result in an ungrammaticality. [sent-259, score-0.224]
84 Again, our model outperforms the n-gram model for both types of classification, and achieves the same performance as the discriminative system of Foster et al. [sent-260, score-0.222]
85 The TREELET-RULE system again slightly outperforms the full TREELET model at independent classification, though not at pairwise classification. [sent-262, score-0.206]
86 This probably reflects the fact that semantic coherence can still influence the SLR score, despite our efforts to subtract it out. [sent-263, score-0.102]
87 9% on a similar experiment with German as the source language, though the translation system and training data were different, so the numbers are not comparable. [sent-274, score-0.183]
88 For pairwise comparisons, where semantic coherence is effectively held constant, such sentences are not problematic. [sent-277, score-0.191]
89 (2004) and Cherry and Quirk (2008) in evaluating our language models on their ability to distinguish the 1-best output of a machine translation system from a reference translation in a pairwise fashion. [sent-281, score-0.317]
90 , 2009) trained on 500K sentences of GALE Chinese-English parallel newswire. [sent-286, score-0.1]
91 We trained both our TREELET model and a 5-GRAM model on the union of our 1B corpus and the English sides of our parallel corpora. [sent-287, score-0.197]
92 org/wmt09 We note that the n-gram language model used by the MT system was much smaller than the 5-GRAM model, as it was only trained on the English sides of their parallel data. [sent-293, score-0.123]
93 ... a perfect language model might not be able to differentiate such translations from their references. [sent-294, score-0.133]
94 4 Machine Translation Fluency. We also carried out reranking experiments on 1000-best lists from Moses using our syntactic language model as a feature. [sent-296, score-0.149]
95 We did not find that the use of our syntactic language model produced any statistically significant increase in BLEU score. [sent-297, score-0.149]
96 However, we noticed in general that the translations favored by our model were more fluent, a useful improvement to which BLEU is often insensitive. [sent-298, score-0.133]
97 To confirm this, we carried out an Amazon Mechanical Turk experiment where users from the United States were asked to compare translations using our TREELET language model as the language model feature to those using the 5-GRAM model. [sent-299, score-0.207]
98 6 Conclusion. We have presented a simple syntactic language model that can be estimated using standard n-gram smoothing techniques on large amounts of data. [sent-304, score-0.277]
99 Our model outperforms generative baselines on several evaluation metrics and achieves the same performance as state-of-the-art discriminative classifiers specifically trained on several types of negative data. [sent-305, score-0.399]
100 Scalable inference and training of context-rich syntactic translation models. [sent-365, score-0.164]
wordName wordTfidf (topN-words)
[('treelet', 0.343), ('slr', 0.172), ('bllip', 0.16), ('grammaticality', 0.134), ('head', 0.129), ('productions', 0.128), ('pcfg', 0.125), ('trigram', 0.12), ('parent', 0.118), ('nps', 0.115), ('cii', 0.115), ('flattening', 0.115), ('treelets', 0.115), ('generative', 0.106), ('cherry', 0.106), ('quirk', 0.105), ('post', 0.104), ('ci', 0.094), ('wsj', 0.093), ('pauls', 0.092), ('sbar', 0.091), ('condition', 0.091), ('parses', 0.089), ('translation', 0.089), ('treebank', 0.088), ('lsvm', 0.086), ('preterminal', 0.086), ('perplexity', 0.085), ('pairwise', 0.084), ('amounts', 0.079), ('syntactic', 0.075), ('model', 0.074), ('discriminative', 0.074), ('penn', 0.073), ('nns', 0.073), ('constituency', 0.073), ('berkeley', 0.073), ('foster', 0.07), ('backing', 0.069), ('vps', 0.069), ('tree', 0.067), ('transformations', 0.066), ('bleu', 0.064), ('okanohara', 0.064), ('chris', 0.063), ('klein', 0.061), ('charniak', 0.06), ('translations', 0.059), ('rerank', 0.057), ('pseudonegative', 0.057), ('coherence', 0.056), ('terminal', 0.056), ('models', 0.055), ('digits', 0.055), ('back', 0.055), ('association', 0.054), ('classification', 0.053), ('context', 0.052), ('vp', 0.052), ('baselines', 0.052), ('counts', 0.051), ('sentences', 0.051), ('annotate', 0.05), ('ag', 0.05), ('nnt', 0.05), ('humanannotated', 0.05), ('vaswani', 0.05), ('brants', 0.049), ('yp', 0.049), ('tan', 0.049), ('joshua', 0.049), ('dominated', 0.049), ('marked', 0.049), ('smoothing', 0.049), ('trained', 0.049), ('nodes', 0.048), ('though', 0.048), ('immediately', 0.048), ('empty', 0.048), ('billion', 0.048), ('noisy', 0.047), ('root', 0.047), ('ot', 0.046), ('moses', 0.046), ('numbers', 0.046), ('wp', 0.046), ('subtract', 0.046), ('signatures', 0.046), ('ciprian', 0.046), ('collins', 0.045), ('negative', 0.044), ('symbol', 0.043), ('matt', 0.043), ('sn', 0.043), ('kneser', 0.043), ('tsg', 0.043), ('chelba', 0.043), ('sales', 0.043), ('german', 0.042), ('parser', 0.042)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000011 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
Author: Adam Pauls ; Dan Klein
Abstract: We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context. We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours. We evaluate on perplexity and a range of grammaticality tasks, and find that we perform as well or better than n-gram models and other generative baselines. Our model even competes with state-of-the-art discriminative models hand-designed for the grammaticality tasks, despite training on positive data alone. We also show fluency improvements in a preliminary machine translation experiment.
2 0.16004193 109 acl-2012-Higher-order Constituent Parsing and Parser Combination
Author: Xiao Chen ; Chunyu Kit
Abstract: This paper presents a higher-order model for constituent parsing aimed at utilizing more local structural context to decide the score of a grammar rule instance in a parse tree. Experiments on English and Chinese treebanks confirm its advantage over its first-order version. It achieves its best F1 scores of 91.86% and 85.58% on the two languages, respectively, and further pushes them to 92.80% and 85.60% via combination with other highperformance parsers.
3 0.1460723 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
Author: Hiroyuki Shindo ; Yusuke Miyao ; Akinori Fujino ; Masaaki Nagata
Abstract: We propose Symbol-Refined Tree Substitution Grammars (SR-TSGs) for syntactic parsing. An SR-TSG is an extension of the conventional TSG model where each nonterminal symbol can be refined (subcategorized) to fit the training data. We aim to provide a unified model where TSG rules and symbol refinement are learned from training data in a fully automatic and consistent fashion. We present a novel probabilistic SR-TSG model based on the hierarchical Pitman-Yor Process to encode backoff smoothing from a fine-grained SR-TSG to simpler CFG rules, and develop an efficient training method based on Markov Chain Monte Carlo (MCMC) sampling. Our SR-TSG parser achieves an F1 score of 92.4% in the Wall Street Journal (WSJ) English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and better than state-of-the-art discriminative reranking parsers.
4 0.14496754 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
Author: Hui Zhang ; David Chiang
Abstract: Syntax-based translation models that operate on the output of a source-language parser have been shown to perform better if allowed to choose from a set of possible parses. In this paper, we investigate whether this is because it allows the translation stage to overcome parser errors or to override the syntactic structure itself. We find that it is primarily the latter, but that under the right conditions, the translation stage does correct parser errors, improving parsing accuracy on the Chinese Treebank.
5 0.14258136 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
Author: Tong Xiao ; Jingbo Zhu ; Hao Zhang ; Qiang Li
Abstract: We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierarchical phrase-based model, and various syntax-based models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers different choices of decoding algorithms, such as phrase-based decoding, decoding as parsing/tree-parsing and forest-based decoding. Moreover, several useful utilities were distributed with the toolkit, including a discriminative reordering model, a simple and fast language model, and an implementation of minimum error rate training for weight tuning. 1
6 0.14049426 4 acl-2012-A Comparative Study of Target Dependency Structures for Statistical Machine Translation
7 0.13934048 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
8 0.13926816 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
10 0.11149015 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation
11 0.11057505 154 acl-2012-Native Language Detection with Tree Substitution Grammars
12 0.10865183 143 acl-2012-Mixing Multiple Translation Models in Statistical Machine Translation
13 0.1059109 140 acl-2012-Machine Translation without Words through Substring Alignment
14 0.10515023 178 acl-2012-Sentence Simplification by Monolingual Machine Translation
15 0.10460225 128 acl-2012-Learning Better Rule Extraction with Translation Span Alignment
16 0.10070937 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
17 0.10069396 179 acl-2012-Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the l0-norm
18 0.099251501 30 acl-2012-Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations
19 0.09907493 95 acl-2012-Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining
20 0.097840354 97 acl-2012-Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation
topicId topicWeight
[(0, -0.332), (1, -0.107), (2, -0.102), (3, -0.07), (4, -0.058), (5, -0.063), (6, -0.002), (7, 0.061), (8, 0.033), (9, -0.004), (10, -0.005), (11, -0.046), (12, -0.024), (13, 0.035), (14, 0.024), (15, -0.099), (16, 0.045), (17, 0.02), (18, -0.048), (19, -0.019), (20, 0.034), (21, -0.016), (22, 0.009), (23, -0.014), (24, 0.117), (25, -0.022), (26, 0.071), (27, 0.015), (28, 0.032), (29, 0.079), (30, -0.025), (31, -0.045), (32, -0.045), (33, 0.019), (34, 0.045), (35, 0.007), (36, -0.016), (37, -0.033), (38, -0.032), (39, -0.001), (40, 0.071), (41, -0.016), (42, -0.027), (43, 0.031), (44, -0.024), (45, 0.001), (46, 0.002), (47, -0.046), (48, 0.068), (49, 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.93764555 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
Author: Adam Pauls ; Dan Klein
Abstract: We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context. We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours. We evaluate on perplexity and a range of grammaticality tasks, and find that we perform as well or better than n-gram models and other generative baselines. Our model even competes with state-of-the-art discriminative models hand-designed for the grammaticality tasks, despite training on positive data alone. We also show fluency improvements in a preliminary machine translation experiment.
2 0.73008686 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction
Author: Dave Golland ; John DeNero ; Jakob Uszkoreit
Abstract: We present LLCCM, a log-linear variant of the constituent context model (CCM) of grammar induction. LLCCM retains the simplicity of the original CCM but extends robustly to long sentences. On sentences of up to length 40, LLCCM outperforms CCM by 13.9% bracketing F1 and outperforms a right-branching baseline in regimes where CCM does not.
Author: Patrick Simianer ; Stefan Riezler ; Chris Dyer
Abstract: With a few exceptions, discriminative training in statistical machine translation (SMT) has been content with tuning weights for large feature sets on small development data. Evidence from machine learning indicates that increasing the training sample size results in better prediction. The goal of this paper is to show that this common wisdom can also be brought to bear upon SMT. We deploy local features for SCFG-based SMT that can be read off from rules at runtime, and present a learning algorithm that applies ℓ1/ℓ2 regularization for joint feature selection over distributed stochastic learning processes. We present experiments on learning on 1.5 million training sentences, and show significant improvements over tuning discriminative models on small development sets.
4 0.68752843 109 acl-2012-Higher-order Constituent Parsing and Parser Combination
Author: Xiao Chen ; Chunyu Kit
Abstract: This paper presents a higher-order model for constituent parsing aimed at utilizing more local structural context to decide the score of a grammar rule instance in a parse tree. Experiments on English and Chinese treebanks confirm its advantage over its first-order version. It achieves its best F1 scores of 91.86% and 85.58% on the two languages, respectively, and further pushes them to 92.80% and 85.60% via combination with other highperformance parsers.
5 0.67877167 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
Author: Hiroyuki Shindo ; Yusuke Miyao ; Akinori Fujino ; Masaaki Nagata
Abstract: We propose Symbol-Refined Tree Substitution Grammars (SR-TSGs) for syntactic parsing. An SR-TSG is an extension of the conventional TSG model where each nonterminal symbol can be refined (subcategorized) to fit the training data. We aim to provide a unified model where TSG rules and symbol refinement are learned from training data in a fully automatic and consistent fashion. We present a novel probabilistic SR-TSG model based on the hierarchical Pitman-Yor Process to encode backoff smoothing from a fine-grained SR-TSG to simpler CFG rules, and develop an efficient training method based on Markov Chain Monte Carlo (MCMC) sampling. Our SR-TSG parser achieves an F1 score of 92.4% in the Wall Street Journal (WSJ) English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and better than state-of-the-art discriminative reranking parsers.
6 0.67423886 75 acl-2012-Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
7 0.66384691 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
8 0.6611073 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities
9 0.64686745 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
10 0.64231354 30 acl-2012-Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations
11 0.62856585 67 acl-2012-Deciphering Foreign Language by Combining Language Models and Context Vectors
12 0.62732673 154 acl-2012-Native Language Detection with Tree Substitution Grammars
13 0.62039256 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
14 0.61768711 189 acl-2012-Syntactic Annotations for the Google Books NGram Corpus
15 0.61510301 108 acl-2012-Hierarchical Chunk-to-String Translation
16 0.61005729 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars
17 0.60602599 105 acl-2012-Head-Driven Hierarchical Phrase-based Translation
18 0.60518116 83 acl-2012-Error Mining on Dependency Trees
19 0.60514635 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
20 0.60126591 4 acl-2012-A Comparative Study of Target Dependency Structures for Statistical Machine Translation
topicId topicWeight
[(7, 0.016), (25, 0.025), (26, 0.037), (28, 0.07), (30, 0.039), (37, 0.035), (39, 0.031), (57, 0.042), (59, 0.017), (61, 0.18), (74, 0.051), (82, 0.038), (84, 0.029), (85, 0.034), (90, 0.15), (92, 0.08), (94, 0.034), (99, 0.044)]
simIndex simValue paperId paperTitle
same-paper 1 0.81174248 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
Author: Adam Pauls ; Dan Klein
Abstract: We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context. We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours. We evaluate on perplexity and a range of grammaticality tasks, and find that we perform as well or better than n-gram models and other generative baselines. Our model even competes with state-of-the-art discriminative models hand-designed for the grammaticality tasks, despite training on positive data alone. We also show fluency improvements in a preliminary machine translation experiment.
2 0.72201079 110 acl-2012-Historical Analysis of Legal Opinions with a Sparse Mixed-Effects Latent Variable Model
Author: William Yang Wang ; Elijah Mayfield ; Suresh Naidu ; Jeremiah Dittmar
Abstract: We propose a latent variable model to enhance historical analysis of large corpora. This work extends prior work in topic modelling by incorporating metadata, and the interactions between the components in metadata, in a general way. To test this, we collect a corpus of slavery-related United States property law judgements sampled from the years 1730 to 1866. We study the language use in these legal cases, with a special focus on shifts in opinions on controversial topics across different regions. Because this is a longitudinal data set, we are also interested in understanding how these opinions change over the course of decades. We show that the joint learning scheme of our sparse mixed-effects model improves on other state-of-the-art generative and discriminative models on the region and time period identification tasks. Experiments show that our sparse mixed-effects model is more accurate quantitatively and qualitatively interesting, and that these improvements are robust across different parameter settings.
3 0.72076482 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
Author: Xinyan Xiao ; Deyi Xiong ; Min Zhang ; Qun Liu ; Shouxun Lin
Abstract: Previous work using topic model for statistical machine translation (SMT) explore topic information at the word level. However, SMT has been advanced from word-based paradigm to phrase/rule-based paradigm. We therefore propose a topic similarity model to exploit topic information at the synchronous rule level for hierarchical phrase-based translation. We associate each synchronous rule with a topic distribution, and select desirable rules according to the similarity of their topic distributions with given documents. We show that our model significantly improves the translation performance over the baseline on NIST Chinese-to-English translation experiments. Our model also achieves a better performance and a faster speed than previous approaches that work at the word level.
4 0.72034997 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation
Author: Arianna Bisazza ; Marcello Federico
Abstract: This paper presents a novel method to suggest long word reorderings to a phrase-based SMT decoder. We address language pairs where long reordering concentrates on few patterns, and use fuzzy chunk-based rules to predict likely reorderings for these phenomena. Then we use reordered n-gram LMs to rank the resulting permutations and select the n-best for translation. Finally we encode these reorderings by modifying selected entries of the distortion cost matrix, on a per-sentence basis. In this way, we expand the search space by a much finer degree than if we simply raised the distortion limit. The proposed techniques are tested on Arabic-English and German-English using well-known SMT benchmarks.
Author: Patrick Simianer ; Stefan Riezler ; Chris Dyer
Abstract: With a few exceptions, discriminative training in statistical machine translation (SMT) has been content with tuning weights for large feature sets on small development data. Evidence from machine learning indicates that increasing the training sample size results in better prediction. The goal of this paper is to show that this common wisdom can also be brought to bear upon SMT. We deploy local features for SCFG-based SMT that can be read off from rules at runtime, and present a learning algorithm that applies ℓ1/ℓ2 regularization for joint feature selection over distributed stochastic learning processes. We present experiments on learning on 1.5 million training sentences, and show significant improvements over tuning discriminative models on small development sets.
6 0.71724409 140 acl-2012-Machine Translation without Words through Substring Alignment
7 0.71354675 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
8 0.71240509 83 acl-2012-Error Mining on Dependency Trees
9 0.71048093 136 acl-2012-Learning to Translate with Multiple Objectives
10 0.70967335 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
11 0.7093097 31 acl-2012-Authorship Attribution with Author-aware Topic Models
12 0.70904851 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval
13 0.70874316 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
14 0.70531946 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
15 0.70524228 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction
16 0.70463479 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
17 0.70404047 167 acl-2012-QuickView: NLP-based Tweet Search
18 0.70396119 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
19 0.70348901 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
20 0.70328581 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization