acl acl2011 acl2011-180 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tagyoung Chung ; Licheng Fang ; Daniel Gildea
Abstract: We discuss some of the practical issues that arise from decoding with general synchronous context-free grammars. We examine problems caused by unary rules and we also examine how virtual nonterminals resulting from binarization can best be handled. We also investigate adding more flexibility to synchronous context-free grammars by adding glue rules and phrases.
Reference: text
sentIndex sentText sentNum sentScore
1 We examine problems caused by unary rules and we also examine how virtual nonterminals resulting from binarization can best be handled. [sent-2, score-1.113]
2 We also investigate adding more flexibility to synchronous context-free grammars by adding glue rules and phrases. [sent-3, score-1.099]
3 1 Introduction Synchronous context-free grammar (SCFG) is widely used for machine translation. [sent-4, score-0.102]
4 In this paper, we discuss some of the practical issues that arise from decoding general SCFGs that are seldom discussed in the literature. [sent-8, score-0.21]
5 We focus on parsing grammars extracted using the method put forth by Galley et al. [sent-9, score-0.164]
6 (2004), but the solutions to these issues are applicable to other general forms of SCFG with many nonterminals. [sent-10, score-0.053]
7 The GHKM grammar extraction method produces a large number of unary rules. [sent-11, score-0.511]
8 Unary rules are the rules that have exactly one nonterminal and no terminals on the source side. [sent-12, score-0.595]
9 They may be problematic for decoders since they may create cycles, which are unary production chains that contain duplicated dynamic programming states. [sent-13, score-0.465]
10 In later sections, we discuss why unary rules are problematic and investigate two possible solutions. [sent-14, score-0.718]
11 GHKM grammars often have rules with many right-hand-side nonterminals and require binarization to ensure O(n3) time parsing. [sent-15, score-0.612]
12 However, binarization creates a large number of virtual nonterminals. [sent-16, score-0.333]
13 We discuss the challenges of, and possible solutions to, issues arising from having a large number of virtual nonterminals. [sent-17, score-0.259]
14 We also compare binarizing the grammar with filtering rules according to scope, a concept introduced by Hopkins and Langmead (2010). [sent-18, score-0.369]
15 By explicitly considering the effect of anchoring terminals on input sentences, scope-3 rules encompass a much larger set of rules than Chomsky normal form, but they can still be parsed in O(n3) time. [sent-19, score-0.562]
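As a sketch of the scope computation (the uppercase-equals-nonterminal convention and the list-of-symbols rule encoding are assumptions for illustration, not the paper's representation):

```python
def scope(source_rhs):
    """Scope of a rule's source side (Hopkins and Langmead, 2010):
    count the positions not anchored by a terminal. A leading
    nonterminal, a trailing nonterminal, and every adjacent
    nonterminal pair each contribute one."""
    def is_nt(sym):
        return sym.isupper()  # toy convention: uppercase symbols are nonterminals
    s = 0
    if source_rhs and is_nt(source_rhs[0]):
        s += 1  # no terminal anchors the left boundary
    if source_rhs and is_nt(source_rhs[-1]):
        s += 1  # no terminal anchors the right boundary
    s += sum(1 for a, b in zip(source_rhs, source_rhs[1:])
             if is_nt(a) and is_nt(b))  # gap between adjacent nonterminals
    return s

def scope_filter(rules, max_scope=3):
    """Keep only rules parsable in O(n^3) time, i.e. scope <= 3."""
    return [r for r in rules if scope(r) <= max_scope]
```

A rule like `A b C` has scope 2 (both boundaries unanchored), while the fully abstract `A B C` has scope 4 and would be filtered.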
16 Unlike phrase-based machine translation, GHKM grammars are less flexible in how they can segment sentence pairs into phrases because they are restricted not only by alignments between words in sentence pairs, but also by target-side parse trees. [sent-20, score-0.188]
17 In general, GHKM grammars suffer more from data sparsity than phrasal rules. [sent-21, score-0.104]
18 To alleviate this issue, we discuss adding glue rules and phrases extracted using methods commonly used in phrase-based machine translation. [sent-22, score-0.922]
19 2 Handling unary rules Unary rules are common in GHKM grammars. [sent-23, score-0.911]
20 We observed that as many as 10% of the rules extracted from a Chinese-English parallel corpus are unary. [sent-24, score-0.292]
21 Some unary rules are the result of alignment errors, but other ones might be useful. [sent-25, score-0.644]
22 Extracted grammars include rules that reflect this fact: NP → NP, the NP and NP → NP, a NP. [sent-27, score-0.371]
(Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Short Papers, pages 413–417, Portland, Oregon, June 19–24, 2011.)
23 However, unary rules can be problematic: Unary production cycles corrupt the translation hypergraph generated by the decoder. [sent-29, score-0.779]
24 A hypergraph containing a unary cycle cannot be topologically sorted. [sent-30, score-0.437]
25 Many algorithms for parameter tuning and coarse-to-fine decoding, such as the inside-outside algorithm and cube-pruning, cannot be run in the presence of unary cycles. [sent-31, score-0.402]
26 The existence of many unary rules of the form “NP → NP, the NP” quickly fills a pruning bin with guesses of English words to insert without any source-side lexical evidence. [sent-32, score-0.68]
27 The most obvious way of eliminating problematic unary rules would be converting grammars into Chomsky normal form. [sent-33, score-0.788]
28 In this section, we present two different ways to handle unary rules. [sent-35, score-0.405]
29 The first involves modifying the grammar extraction method, and the second involves modifying the decoder. [sent-36, score-0.266]
30 1 Modifying grammar extraction We can modify the grammar extraction method such that it does not extract any unary rules. [sent-38, score-0.645]
31 The extraction method of Galley et al. (2004) extracts rules by segmenting the target-side parse tree based on frontier nodes. [sent-40, score-0.474]
32 We modify the definition of a frontier node in the following way. [sent-41, score-0.15]
33 We label frontier nodes in the English parse tree, and examine the Chinese span each frontier node covers. [sent-42, score-0.34]
34 If a frontier node covers the same span as the frontier node that immediately dominates it, then the dominated node is no longer considered a frontier. [sent-43, score-0.333]
35 Figure 1shows an example of an English-Chinese sentence pair with the English side automatically parsed. [sent-45, score-0.036]
36 Frontier nodes in the tree in the original GHKM rule extraction method are marked with a box. [sent-46, score-0.11]
37 With the modification, only the top boldfaced NP would be considered a frontier node. [sent-47, score-0.117]
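The modified frontier definition can be sketched as a single pass over the target-side tree; the Node class and its span/frontier/children fields are a hypothetical interface for illustration, not the extractor's actual one:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical parse-tree node: source span covered and frontier flag."""
    span: tuple
    frontier: bool = False
    children: list = field(default_factory=list)

def demote_unary_frontiers(node, dominating_span=None):
    """A frontier node that covers the same source span as the closest
    frontier node dominating it is demoted, so no unary rule is
    extracted for the pair. A whole chain of same-span frontiers
    collapses to its topmost node."""
    if node.frontier:
        if node.span == dominating_span:
            node.frontier = False  # same span as the dominating frontier
        else:
            dominating_span = node.span  # this node now dominates below
    for child in node.children:
        demote_unary_frontiers(child, dominating_span)
```

On a chain of three frontier nodes all covering the same Chinese span, only the top node stays a frontier, matching the boldfaced NP in the paper's Figure 1.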
38 2 Modifying the decoder Modifying how grammars are extracted has an obvious downside: good rules can be lost along with the bad ones. [sent-50, score-0.191]
39 In the previous example, the modification results in a bad rule, which is the result of bad alignments. [sent-53, score-0.048]
40 Before the modification, the rule set includes a good rule: NPB → 白鹭鸶, the snowy egret, which can be applied at test time. [sent-54, score-0.272]
41 Because of this, one may still want to decode with all available unary rules. [sent-55, score-0.419]
42 We handle unary rules inside the decoder in the following ways: • Unary cycle detection The naïve way to detect unary cycles is backtracking on a unary chain to see if a newly generated item has been generated before. [sent-56, score-1.558]
43 The running time of this is constrained only by the number of possible items in a chart span. [sent-57, score-0.073]
44 In practice, however, this is often not a problem: if all unary derivations have positive costs and a priority queue is used to expand unary derivations, only the best K unary items will be generated, where K is the pruning constant. [sent-58, score-1.272]
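The best-K behavior described above can be sketched as a best-first unary closure of a chart cell; the (parent, child, cost) rule triples, the dict-based cell, and the pop budget K are assumptions of this sketch, not the decoder's actual data structures:

```python
import heapq
from collections import defaultdict

def close_unaries(cell, unary_rules, K):
    """Best-first closure of a chart cell under unary rules.

    cell: dict mapping nonterminal -> cost (negative log probability).
    unary_rules: (parent, child, cost) triples with cost > 0.
    At most K items are popped, so even a cycle such as A -> B -> A
    terminates: revisiting an item only matters if it got cheaper,
    which positive unary costs rule out."""
    by_child = defaultdict(list)
    for parent, child, cost in unary_rules:
        by_child[child].append((parent, cost))
    best = dict(cell)
    heap = [(c, nt) for nt, c in cell.items()]
    heapq.heapify(heap)
    pops = 0
    while heap and pops < K:
        cost, nt = heapq.heappop(heap)
        if cost > best[nt]:
            continue  # stale entry: a cheaper derivation was found already
        pops += 1
        for parent, rule_cost in by_child[nt]:
            new = cost + rule_cost
            if new < best.get(parent, float("inf")):
                best[parent] = new
                heapq.heappush(heap, (new, parent))
    return best
```

With the cyclic rules B → A and A → B (each cost 0.5) and an initial item A at cost 1.0, the closure terminates with A at 1.0 and B at 1.5; the cycle back to A is rejected as more expensive.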
45 • Ban negative-cost unary rules When tuning feature weights, an optimizer may try feature weights that give negative costs to unary productions. [sent-59, score-1.107]
46 The solution is to set a maximum length for unary chains, or to ban negative unary productions outright. [sent-61, score-0.797]
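The two safeguards named above, banning non-positive-cost unary rules and capping the unary chain length, can be sketched together; the max_chain value and the (parent, child, cost) rule layout are assumptions of this sketch:

```python
def close_unaries_bounded(cell, unary_rules, max_chain=3):
    """Unary closure with both safeguards: rules with non-positive cost
    are dropped outright, and chains longer than max_chain (a
    hypothetical limit) are cut off, so cycles cannot recurse forever
    even if the optimizer proposes pathological weights."""
    safe = [(p, c, w) for p, c, w in unary_rules if w > 0]  # ban cost <= 0
    best = dict(cell)  # nonterminal -> cost
    agenda = [(nt, cost, 0) for nt, cost in cell.items()]
    while agenda:
        nt, cost, depth = agenda.pop()
        if depth >= max_chain:
            continue  # maximum unary chain length reached
        for parent, child, w in safe:
            if child == nt and cost + w < best.get(parent, float("inf")):
                best[parent] = cost + w
                agenda.append((parent, cost + w, depth + 1))
    return best
```

A negative-cost rule such as Y → X at cost −0.2 is simply never applied, and a cycle X → Z → X stops as soon as the round trip fails to lower any cost.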
47 However, binarization creates virtual nonterminals which require special attention at parsing time. [sent-65, score-0.287]
48 Alternatively, we can filter out rules with scope greater than three, so as to parse in O(n3) time with unbinarized rules. [sent-66, score-0.391]
49 This requires Earley (Earley, 1970) style parsing, which does implicit binarization at decoding time. [sent-67, score-0.25]
50 Scope filtering may filter out unnecessarily long rules that may never be applied, but it may also throw out rules with useful contextual information. [sent-68, score-0.534]
51 For the rest of the section, we discuss issues created by virtual nonterminals. [sent-71, score-0.259]
52 2 Handling virtual nonterminals One aspect of grammar binarization that is rarely mentioned is how to assign probabilities to binarized grammar rules. [sent-73, score-0.676]
53 The solution is to assign probability one to any rule whose left-hand side is a virtual nonterminal. [sent-74, score-0.286]
54 However, it is generally not fair to put chart items of virtual nonterminals and those of regular nonterminals in the same bin, because virtual items have artificially low costs. [sent-76, score-0.642]
55 One possible solution is adding a heuristic to push up the cost of virtual items for fair comparison. [sent-77, score-0.319]
56 For our experiments, we use an outside estimate as a heuristic for a virtual item. [sent-78, score-0.172]
57 Consider the following rule binarization (only the source side shown): A → BCD : −log(p) ⇒ V → BC : 0, A → VD : −log(p). A → BCD is the original rule and −log(p) is the cost of the rule. [sent-79, score-0.392]
58 At decoding time, when a chart item is generated from the binarized rule V → BC, we add −log(p) to its total cost as an optimistic estimate of the cost to build the original unbinarized rule. [sent-80, score-0.304]
59 The heuristic is used only for pruning purposes, and it does not change the real cost. [sent-81, score-0.036]
60 One complication is that a binarized rule can arise from multiple different unbinarized rules. [sent-83, score-0.247]
61 In this case, we pick the lowest cost among the unbinarized rules as the heuristic. [sent-84, score-0.385]
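The cost-assignment scheme of this subsection can be sketched as follows; the left-to-right binarization order and the `V[...]` naming of virtual nonterminals are assumptions of the sketch, not the paper's implementation:

```python
def binarize(rules):
    """Left-to-right binarization of source sides. Virtual rules get
    cost 0; each virtual symbol records, as an outside estimate used
    only for pruning, the cheapest cost among the unbinarized rules
    it can arise from."""
    bin_rules, heuristic = [], {}
    for lhs, rhs, cost in rules:
        if len(rhs) <= 2:
            bin_rules.append((lhs, tuple(rhs), cost))  # nothing to binarize
            continue
        prev = rhs[0]
        for sym in rhs[1:-1]:
            virt = f"V[{prev}+{sym}]"  # virtual nonterminal
            bin_rules.append((virt, (prev, sym), 0.0))  # probability one
            # a virtual item may arise from several rules: keep the cheapest
            heuristic[virt] = min(heuristic.get(virt, float("inf")), cost)
            prev = virt
        bin_rules.append((lhs, (prev, rhs[-1]), cost))  # full cost on top rule
    return bin_rules, heuristic
```

Duplicate virtual rules produced by shared prefixes could be merged; the sketch leaves them as repeated zero-cost entries, and the heuristic table is what a decoder would consult when pruning virtual chart items.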
62 Another approach for handling virtual nonterminals would be giving virtual items separate bins and avoiding pruning them at all. [sent-85, score-0.53]
63 1 Glue rules Because of data sparsity, an SCFG extracted from data may fail to parse sentences at test time. [sent-88, score-0.337]
64 For example, consider the following rules: NP → JJ NN, JJ NN; JJ → c1, e1; JJ → c2, e2; NN → c3, e3. This set of rules is able to parse the word sequences c1 c3 and c2 c3 but not c1 c2 c3, if we have not seen “NP → JJ JJ NN” at training time. [sent-89, score-0.045]
65 Therefore, we may opt to add glue rules as used in Hiero (Chiang, 2005): S → C, C and S → S C, S C, where S is the goal state and C is the glue nonterminal that can produce any nonterminal. [sent-91, score-1.328]
66 We refer to these glue rules as the monotonic glue rules. [sent-92, score-1.446]
67 We rely on GHKM rules for reordering when we use the monotonic glue rules. [sent-93, score-0.932]
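The parse-failure example above can be checked mechanically with a tiny CKY recognizer; encoding the glue productions C → X as unary rules and closing each chart cell under them is a choice of this sketch, not the paper's decoder:

```python
def close_unary(cell, unary):
    """Close a chart cell (a set of nonterminals) under unary
    (lhs, child) rules until fixpoint."""
    changed = True
    while changed:
        changed = False
        for lhs, child in unary:
            if child in cell and lhs not in cell:
                cell.add(lhs)
                changed = True

def recognizes(words, binary, unary, lexical, goal):
    """CKY recognition over source sides only: lexical (lhs, word)
    rules, binary (lhs, (x, y)) rules, and unary (lhs, child) rules."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        cell = chart[i][i + 1]
        cell.update(lhs for lhs, t in lexical if t == w)
        close_unary(cell, unary)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = chart[i][j]
            for k in range(i + 1, j):
                for lhs, (x, y) in binary:
                    if x in chart[i][k] and y in chart[k][j]:
                        cell.add(lhs)
            close_unary(cell, unary)
    return goal in chart[0][n]
```

With only the extracted rules, c1 c3 parses but c1 c2 c3 does not; adding the monotonic glue rules S → C and S → S C (with C covering any nonterminal) makes c1 c2 c3 reachable from the goal S.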
68 However, we can also allow glue rules to reorder constituents. [sent-94, score-0.815]
69 Wu (1997) presents a better-constrained grammar designed to only produce tail-recursive parses. [sent-95, score-0.102]
70 These rules always generate left-heavy derivation trees.
S → A    A → [A B]    B → ⟨B A⟩
S → B    A → [B B]    B → ⟨A A⟩
S → C    A → [C B]    B → ⟨C A⟩
         A → [A C]    B → ⟨B C⟩
         A → [B C]    B → ⟨A C⟩
         A → [C C]    B → ⟨C C⟩
Table 1: The ABC Grammar. [sent-98, score-0.267]
71 We follow the convention of Wu (1997) that square brackets stand for straight rules and angle brackets stand for inverted rules. [sent-99, score-0.418]
72 We learn probabilities of ABC glue rules by using expectation maximization (Dempster et al., 1977).
73 In our experiments, depending on the configuration, the decoder failed to parse about 5% of sentences without glue rules, which illustrates their necessity. [sent-103, score-0.621]
74 In our experiments, we compare the ABC glue rules with the monotonic glue rules. [sent-105, score-1.446]
75 2 Adding phrases GHKM grammars are more restricted than the phrase extraction methods used in phrase-based models, since, in GHKM grammar extraction, phrase segmentation is constrained by parse trees. [sent-107, score-0.322]
76 (2003) to extract phrases, and, for each phrase, we add a rule with the glue nonterminal as the left-hand side and the phrase pair as the right-hand side. [sent-110, score-0.661]
77 We experiment to see whether adding phrases is beneficial. [sent-111, score-0.082]
78 There have been other efforts to extend GHKM grammar to allow more flexible rule extraction. [sent-112, score-0.18]
79 Galley et al. (2006) introduce composed rules where minimal GHKM rules are fused to form larger rules. [sent-114, score-0.534]
80 Zollmann and Venugopal (2006) introduce a model that allows more generalized rules to be extracted. [sent-115, score-0.267]
81 [Table 2: BLEU scores for the baseline GHKM grammar and the no-unary grammar with monotonic glue rules, ABC glue rules, their scope-filtered variants, and added phrases.]
82 1 Setup We extracted a GHKM grammar from a ChineseEnglish parallel corpus with the English side parsed. [sent-123, score-0.163]
83 , 2011) was applied to all GHKM grammars that are not scope-filtered. [sent-127, score-0.104]
84 Our in-house decoder was used for experiments with a trigram language model. [sent-130, score-0.062]
85 The decoder is capable of both CNF parsing and Earley-style parsing with cube-pruning (Chiang, 2007). [sent-131, score-0.132]
86 We have limited the maximum size of phrases to be four. [sent-133, score-0.039]
87 The baseline GHKM grammar with monotonic glue rules yielded a worse result than the no-unary grammar with the same glue rules. [sent-136, score-1.65]
88 Compared to using monotonic glue rules, using ABC glue rules brought slight improvements for both the no-unary setting and the scope-filtered setting, but the differences are not statistically significant. [sent-139, score-1.47]
89 In terms of decoding speed and memory usage, using ABC glue rules and monotonic glue rules was virtually identical. [sent-140, score-1.021]
90 The fact that glue rules are seldom used at decoding time may account for why there is little difference in using monotonic glue rules and using ABC glue rules. [sent-141, score-2.35]
91 Out of all the rules that were applied to decoding our test set, less than one percent were glue rules, and among the glue rules, straight glue rules outnumbered inverted ones by three to one. [sent-142, score-2.22]
92 Compared with binarized no-unary rules, scope-3 filtered no-unary rules retained 87% of the rules but still managed to achieve a slightly better BLEU score. [sent-143, score-0.593]
93 Because the size of the grammar is smaller, compared to using no-unary grammar, it used less memory at decoding time. [sent-145, score-0.191]
94 This is because the decoder employs Earley-style dotted rules to handle unbinarized rules, and in order to decode with scope-3 rules, the decoder needs to build dotted items, which are not pruned until a rule is completely matched, thus leading to slower decoding. [sent-147, score-0.67]
95 We believe the more important reason is that once a phrase is used, only glue rules can be used to continue the derivation, thereby losing the richer information offered by GHKM grammar. [sent-152, score-0.781]
96 6 Conclusion In this paper, we discussed several issues concerning decoding with synchronous context-free grammars, focusing on grammars resulting from the GHKM extraction method. [sent-153, score-0.386]
97 We presented a modified grammar extraction scheme that eliminates unary rules. [sent-155, score-0.511]
98 We also presented a way to decode with unary rules in the grammar, and examined several different issues resulting from binarizing SCFGs. [sent-156, score-0.739]
99 We finally discussed adding flexibility to SCFGs by adding glue rules and phrases. [sent-157, score-0.911]
100 Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. [sent-219, score-0.189]
wordName wordTfidf (topN-words)
[('glue', 0.514), ('unary', 0.377), ('ghkm', 0.351), ('rules', 0.267), ('abc', 0.197), ('virtual', 0.172), ('binarization', 0.161), ('monotonic', 0.151), ('frontier', 0.117), ('grammars', 0.104), ('grammar', 0.102), ('npb', 0.098), ('egret', 0.097), ('snowy', 0.097), ('decoding', 0.089), ('scfg', 0.088), ('synchronous', 0.084), ('aa', 0.083), ('bb', 0.081), ('np', 0.08), ('nonterminals', 0.08), ('unbinarized', 0.079), ('rule', 0.078), ('scfgs', 0.067), ('modifying', 0.066), ('decoder', 0.062), ('galley', 0.061), ('binarized', 0.059), ('issues', 0.053), ('jj', 0.052), ('romance', 0.05), ('bcd', 0.048), ('nnpp', 0.048), ('modification', 0.048), ('fang', 0.045), ('parse', 0.045), ('flexibility', 0.044), ('cycles', 0.044), ('adding', 0.043), ('items', 0.043), ('pnpp', 0.043), ('ban', 0.043), ('decode', 0.042), ('problematic', 0.04), ('derivations', 0.04), ('chomsky', 0.039), ('licheng', 0.039), ('cost', 0.039), ('phrases', 0.039), ('side', 0.036), ('pruning', 0.036), ('parsing', 0.035), ('discuss', 0.034), ('reorder', 0.034), ('seldom', 0.034), ('tagyoung', 0.034), ('hypergraph', 0.034), ('earley', 0.034), ('node', 0.033), ('nonterminal', 0.033), ('translation', 0.033), ('extraction', 0.032), ('hiero', 0.032), ('nn', 0.032), ('ss', 0.031), ('arise', 0.031), ('hopkins', 0.031), ('chart', 0.03), ('zollmann', 0.03), ('handle', 0.028), ('terminals', 0.028), ('straight', 0.028), ('determiners', 0.028), ('examine', 0.028), ('chung', 0.027), ('handling', 0.027), ('inverted', 0.027), ('bc', 0.027), ('daniel', 0.026), ('dempster', 0.026), ('transduction', 0.026), ('dotted', 0.026), ('cycle', 0.026), ('brackets', 0.025), ('tuning', 0.025), ('chiang', 0.025), ('extracted', 0.025), ('log', 0.024), ('statistically', 0.024), ('chains', 0.024), ('inversion', 0.024), ('production', 0.024), ('concerning', 0.024), ('stand', 0.023), ('bleu', 0.023), ('mert', 0.022), ('costs', 0.022), ('practical', 0.022), ('rochester', 0.022), ('fair', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
2 0.27823117 296 acl-2011-Terminal-Aware Synchronous Binarization
Author: Licheng Fang ; Tagyoung Chung ; Daniel Gildea
Abstract: We present an SCFG binarization algorithm that combines the strengths of early terminal matching on the source language side and early language model integration on the target language side. We also examine how different strategies of target-side terminal attachment during binarization can significantly affect translation quality.
3 0.25897133 61 acl-2011-Binarized Forest to String Translation
Author: Hao Zhang ; Licheng Fang ; Peng Xu ; Xiaoyun Wu
Abstract: Tree-to-string translation is syntax-aware and efficient but sensitive to parsing errors. Forestto-string translation approaches mitigate the risk of propagating parser errors into translation errors by considering a forest of alternative trees, as generated by a source language parser. We propose an alternative approach to generating forests that is based on combining sub-trees within the first best parse through binarization. Provably, our binarization forest can cover any non-consitituent phrases in a sentence but maintains the desirable property that for each span there is at most one nonterminal so that the grammar constant for decoding is relatively small. For the purpose of reducing search errors, we apply the synchronous binarization technique to forest-tostring decoding. Combining the two techniques, we show that using a fast shift-reduce parser we can achieve significant quality gains in NIST 2008 English-to-Chinese track (1.3 BLEU points over a phrase-based system, 0.8 BLEU points over a hierarchical phrase-based system). Consistent and significant gains are also shown in WMT 2010 in the English to German, French, Spanish and Czech tracks.
4 0.23976578 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Kristy Hollingshead ; Brian Roark
Abstract: We present a novel pruning method for context-free parsing that increases efficiency by disallowing phrase-level unary productions in CKY chart cells spanning a single word. Our work is orthogonal to recent work on “closing” chart cells, which has focused on multi-word constituents, leaving span-1 chart cells unpruned. We show that a simple discriminative classifier can learn with high accuracy which span-1 chart cells to close to phrase-level unary productions. Eliminating these unary productions from the search can have a large impact on downstream processing, depending on implementation details of the search. We apply our method to four parsing architectures and demonstrate how it is complementary to the cell-closing paradigm, as well as other pruning methods such as coarse-to-fine, agenda, and beam-search pruning.
5 0.21451923 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
Author: Markos Mylonakis ; Khalil Sima'an
Abstract: While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters, by select- ing and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.
6 0.16811095 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
7 0.16773945 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
8 0.16390704 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
9 0.15414292 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
10 0.14607266 30 acl-2011-Adjoining Tree-to-String Translation
11 0.14241628 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars
12 0.11607717 252 acl-2011-Prototyping virtual instructors from human-human corpora
13 0.10847783 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation
14 0.10223885 44 acl-2011-An exponential translation model for target language morphology
15 0.095175557 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
16 0.089887157 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment
17 0.085151672 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
18 0.083405837 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
19 0.083292514 282 acl-2011-Shift-Reduce CCG Parsing
20 0.083024919 108 acl-2011-EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
topicId topicWeight
[(0, 0.176), (1, -0.191), (2, 0.083), (3, -0.113), (4, 0.005), (5, 0.018), (6, -0.262), (7, -0.064), (8, -0.101), (9, -0.112), (10, -0.101), (11, -0.004), (12, -0.013), (13, 0.111), (14, 0.065), (15, -0.056), (16, 0.019), (17, 0.073), (18, 0.0), (19, 0.043), (20, 0.023), (21, -0.034), (22, -0.089), (23, -0.034), (24, -0.052), (25, 0.086), (26, -0.011), (27, -0.044), (28, 0.045), (29, -0.007), (30, 0.047), (31, 0.132), (32, -0.016), (33, 0.014), (34, 0.043), (35, 0.11), (36, 0.116), (37, 0.217), (38, 0.108), (39, 0.084), (40, 0.009), (41, 0.072), (42, 0.012), (43, -0.004), (44, 0.033), (45, 0.002), (46, 0.051), (47, -0.031), (48, -0.081), (49, -0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.95054048 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
2 0.87606561 296 acl-2011-Terminal-Aware Synchronous Binarization
Author: Licheng Fang ; Tagyoung Chung ; Daniel Gildea
Abstract: We present an SCFG binarization algorithm that combines the strengths of early terminal matching on the source language side and early language model integration on the target language side. We also examine how different strategies of target-side terminal attachment during binarization can significantly affect translation quality.
3 0.77220225 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars
Author: Mark-Jan Nederhof ; Giorgio Satta
Abstract: We present a method for the computation of prefix probabilities for synchronous contextfree grammars. Our framework is fairly general and relies on the combination of a simple, novel grammar transformation and standard techniques to bring grammars into normal forms.
4 0.69319063 61 acl-2011-Binarized Forest to String Translation
Author: Hao Zhang ; Licheng Fang ; Peng Xu ; Xiaoyun Wu
Abstract: Tree-to-string translation is syntax-aware and efficient but sensitive to parsing errors. Forestto-string translation approaches mitigate the risk of propagating parser errors into translation errors by considering a forest of alternative trees, as generated by a source language parser. We propose an alternative approach to generating forests that is based on combining sub-trees within the first best parse through binarization. Provably, our binarization forest can cover any non-consitituent phrases in a sentence but maintains the desirable property that for each span there is at most one nonterminal so that the grammar constant for decoding is relatively small. For the purpose of reducing search errors, we apply the synchronous binarization technique to forest-tostring decoding. Combining the two techniques, we show that using a fast shift-reduce parser we can achieve significant quality gains in NIST 2008 English-to-Chinese track (1.3 BLEU points over a phrase-based system, 0.8 BLEU points over a hierarchical phrase-based system). Consistent and significant gains are also shown in WMT 2010 in the English to German, French, Spanish and Czech tracks.
5 0.63902879 234 acl-2011-Optimal Head-Driven Parsing Complexity for Linear Context-Free Rewriting Systems
Author: Pierluigi Crescenzi ; Daniel Gildea ; Andrea Marino ; Gianluca Rossi ; Giorgio Satta
Abstract: We study the problem offinding the best headdriven parsing strategy for Linear ContextFree Rewriting System productions. A headdriven strategy must begin with a specified righthand-side nonterminal (the head) and add the remaining nonterminals one at a time in any order. We show that it is NP-hard to find the best head-driven strategy in terms of either the time or space complexity of parsing.
6 0.58229446 154 acl-2011-How to train your multi bottom-up tree transducer
7 0.5397374 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
8 0.53747874 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
9 0.52674675 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
10 0.51503861 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
12 0.46394625 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
13 0.43540522 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
14 0.42873383 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
15 0.4261817 219 acl-2011-Metagrammar engineering: Towards systematic exploration of implemented grammars
16 0.42508006 30 acl-2011-Adjoining Tree-to-String Translation
17 0.41397607 44 acl-2011-An exponential translation model for target language morphology
18 0.39516702 232 acl-2011-Nonparametric Bayesian Machine Transliteration with Synchronous Adaptor Grammars
19 0.3943367 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment
20 0.39229974 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
topicId topicWeight
[(5, 0.04), (8, 0.022), (17, 0.112), (26, 0.019), (28, 0.014), (37, 0.09), (39, 0.054), (41, 0.069), (55, 0.025), (59, 0.036), (72, 0.04), (82, 0.236), (91, 0.016), (96, 0.143)]
simIndex simValue paperId paperTitle
same-paper 1 0.83931053 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
2 0.6624279 296 acl-2011-Terminal-Aware Synchronous Binarization
Author: Licheng Fang ; Tagyoung Chung ; Daniel Gildea
Abstract: We present an SCFG binarization algorithm that combines the strengths of early terminal matching on the source language side and early language model integration on the target language side. We also examine how different strategies of target-side terminal attachment during binarization can significantly affect translation quality.
3 0.66014451 30 acl-2011-Adjoining Tree-to-String Translation
Author: Yang Liu ; Qun Liu ; Yajuan Lu
Abstract: We introduce synchronous tree adjoining grammars (TAG) into tree-to-string translation, which converts a source tree to a target string. Without reconstructing TAG derivations explicitly, our rule extraction algorithm directly learns tree-to-string rules from aligned Treebank-style trees. As tree-to-string translation casts decoding as a tree parsing problem rather than parsing, the decoder still runs fast when adjoining is included. Less than 2 times slower, the adjoining tree-tostring system improves translation quality by +0.7 BLEU over the baseline system only allowing for tree substitution on NIST ChineseEnglish test sets.
4 0.65927929 32 acl-2011-Algorithm Selection and Model Adaptation for ESL Correction Tasks
Author: Alla Rozovskaya ; Dan Roth
Abstract: We consider the problem of correcting errors made by English as a Second Language (ESL) writers and address two issues that are essential to making progress in ESL error correction - algorithm selection and model adaptation to the first language of the ESL learner. A variety of learning algorithms have been applied to correct ESL mistakes, but often comparisons were made between incomparable data sets. We conduct an extensive, fair comparison of four popular learning methods for the task, reversing conclusions from earlier evaluations. Our results hold for different training sets, genres, and feature sets. A second key issue in ESL error correction is the adaptation of a model to the first language ofthe writer. Errors made by non-native speakers exhibit certain regularities and, as we show, models perform much better when they use knowledge about error patterns of the nonnative writers. We propose a novel way to adapt a learned algorithm to the first language of the writer that is both cheaper to implement and performs better than other adaptation methods.
5 0.65877593 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction
Author: Yee Seng Chan ; Dan Roth
Abstract: In this paper, we observe that there exists a second dimension to the relation extraction (RE) problem that is orthogonal to the relation type dimension. We show that most of these second dimensional structures are relatively constrained and not difficult to identify. We propose a novel algorithmic approach to RE that starts by first identifying these structures and then, within these, identifying the semantic type of the relation. In the real RE problem where relation arguments need to be identified, exploiting these structures also allows reducing pipelined propagated errors. We show that this RE framework provides significant improvement in RE performance.
6 0.6558975 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment
7 0.65554106 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction
8 0.65356481 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules
9 0.65291911 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction
10 0.6519618 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
11 0.65143013 311 acl-2011-Translationese and Its Dialects
12 0.65130699 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
13 0.65057826 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering
14 0.65045297 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
15 0.65041691 141 acl-2011-Gappy Phrasal Alignment By Agreement
16 0.64908677 61 acl-2011-Binarized Forest to String Translation
17 0.64830649 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
18 0.64666337 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
19 0.64656723 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
20 0.64648616 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models