acl acl2011 acl2011-202 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Markos Mylonakis ; Khalil Sima'an
Abstract: While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters, by selecting and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.
Reference: text
sentIndex sentText sentNum sentScore
1 While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. [sent-3, score-0.915]
2 The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. [sent-4, score-0.662]
3 These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. [sent-5, score-0.458]
4 Our method induces joint probability synchronous grammars and estimates their parameters, by selecting and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. [sent-7, score-0.681]
5 SCFGs in the form of the Inversion-Transduction Grammar (ITG) were first introduced by (Wu, 1997) as a formalism to recursively describe the translation process. [sent-11, score-0.335]
6 utilised an ITG-flavour which focused on hierarchical phrase-pairs to capture context-driven translation and reordering patterns with ‘gaps’, offering competitive performance particularly for language pairs with extensive reordering. [sent-14, score-0.667]
7 As Hiero uses a single non-terminal and concentrates on overcoming translation lexicon sparsity, it barely explores the recursive nature of translation past the lexical level. [sent-15, score-0.627]
8 Nevertheless, the successful employment of SCFGs for phrase-based SMT brought translation models assuming latent syntactic structure to the spotlight. [sent-16, score-0.473]
9 Simultaneously, mounting efforts have been directed towards SMT models employing linguistic syntax on the source side (Yamada and Knight, 2001 ; Quirk et al. [sent-17, score-0.48]
10 Hierarchical translation was combined with target side linguistic annotation in (Zollmann and Venugopal, 2006). [sent-24, score-0.476]
11 , 2003) exemplified the difficulties of integrating linguistic information in translation systems. [sent-26, score-0.367]
12 Syntax-based MT often suffers from inadequate constraints in the translation rules extracted, or from striving to combine these rules together towards a full derivation. [sent-27, score-0.635]
13 , 2010), or by moving from linguistically motivated synchronous grammars to systems where linguistic plausibility of the translation is assessed through additional features in a phrase-based system (Venugopal et al. [sent-29, score-0.782]
14 While it is assumed that linguistic structure does correlate with some translation phenomena, in this (page footer: Proceedings of the 49th Annual Meeting of the ..., Portland, Oregon, June 19-24, 2011) [sent-32, score-0.433]
15 In place of linguistically constrained translation imposing syntactic parse structure, we opt for linguistically motivated translation. [sent-35, score-0.632]
16 We learn latent hierarchical structure, taking advantage of linguistic annotations but shaped and trained for translation. [sent-36, score-0.432]
17 These phrase-pair label charts are the input of our learning algorithm, which extracts the linguistically motivated rules and estimates the probabilities for a stochastic SCFG, without arbitrary constraints such as phrase or span sizes. [sent-38, score-0.604]
18 In contrast, our learning objective not only avoids overfitting the training data but, most importantly, learns joint stochastic synchronous grammars which directly aim at generalisation towards yet unseen instances. [sent-41, score-0.43]
19 By advancing from structures which mimic linguistic syntax, to learning linguistically aware latent recursive structures targeting translation, we achieve significant improvements in translation quality for 4 different language pairs in comparison with a strong hierarchical translation baseline. [sent-42, score-1.063]
20 Section 2 discusses the weak independence assumptions of SCFGs and introduces a joint translation model which addresses these issues and separates hierarchical translation structure from phrase-pair emission. [sent-44, score-0.823]
21 In section 3 we consider a chart over phrase-pair spans filled with source-language linguistically motivated labels. [sent-45, score-0.334]
22 We show how we can employ this crucial input to extract and train a hierarchical translation structure model with millions of rules. [sent-46, score-0.664]
23 Section 4 demonstrates decoding with the model by constraining derivations to linguistic hints of the source sentence and presents our empirical results. [sent-47, score-0.317]
24 By crossing the links between the non-terminals of the two sides reordering phenomena are captured. [sent-52, score-0.327]
25 Also, for this work we only used grammars with either purely lexical or purely abstract rules involving one or two nonterminal pairs. [sent-56, score-0.254]
26 not only binary ones), which has received surprisingly little attention, is that the reordering pattern between the non-terminal pairs (or in the case of ITGs the choice between monotone and swap expansion) is not conditioned on any other part of a derivation. [sent-63, score-0.289]
27 As an example, a probabilistic SCFG will always assign a higher probability to derivations swapping or monotonically translating nouns and adjectives between English and French, only depending on which of the two rules NP → [NN JJ], NP → ⟨NN JJ⟩ has a higher probability. [sent-67, score-0.33]
28 The rest of the (sometimes thousands of) rule-specific features usually added to SCFG translation models do not directly help either, leaving reordering decisions disconnected from the rest of the derivation. [sent-68, score-0.521]
29 While in a decoder this is somehow mitigated by the use of a language model, we believe that the weakness of straightforward applications of SCFGs to model reordering structure at the sentence level misses a chance to learn this crucial part of the translation process during grammar induction. [sent-69, score-0.796]
30 As (Mylonakis and Sima’an, 2010) note, ‘plain’ SCFGs seem to perform worse than the grammars described next, mainly due to wrong long-range reordering decisions for which the language model can hardly help. [sent-70, score-0.323]
31 2 Hierarchical Reordering SCFG We address the weaknesses mentioned above by relying on an SCFG grammar design that is similar to the ‘Lexicalised Reordering’ grammar of (Mylonakis and Sima’an, 2010). [sent-72, score-0.336]
32 As in the rules of Figure 1, we separate non-terminals according to the reordering patterns in which they participate. [sent-73, score-0.368]
33 Furthermore, this set of pre-terminals allows us to separate the higher order translation structure from the process that emits phrase-pairs, a feature we employ next. [sent-80, score-0.455]
34 In (Mylonakis and Sima’an, 2010) this grammar design mainly contributed to model lexical reordering preferences. [sent-81, score-0.407]
35 While we retain this function, for the rich linguistically-motivated grammars used in this work this design effectively propagates reordering preferences above and below the current rule application (e. [sent-82, score-0.386]
36 Figure 1, rules (a)-(c)), allowing to learn and apply complex reordering patterns. [sent-84, score-0.368]
37 The different types of grammar rules are summarised in abstract form in Figure 2. [sent-85, score-0.297]
38 3 Generative Model We arrive at a probabilistic SCFG model which jointly generates source e and target f strings, by augmenting each grammar rule with a probability, summing up to one for every left-hand side. [sent-88, score-0.465]
39 The probability of a derivation D of tuple ⟨e, f⟩ beginning from start symbol S is equal to the product of the probabilities of the rules used to recursively generate it. [sent-89, score-0.309]
40 The grammar rules pertaining to the [sent-91, score-0.297]
41 structural part and their associated probabilities define a model p(σ) over the latent variable σ determining the recursive, reordering and phrase-pair segmenting structure of translation, as in Figure 4. [sent-93, score-0.504]
42 Given σ, the phrase-pair emission part merely generates the phrase-pairs utilising distributions from every NTP to the phrase-pairs that it covers, thereby defining a model over all sentence-pairs generated given each translation structure. [sent-94, score-0.365]
43 The probabilities of a derivation and of a sentence-pair are then as follows: $p(D) = p(\sigma)\, p(e, f \mid \sigma)$ (1) and $p(e, f) = \sum_{D : D \Rightarrow^{*} \langle e, f \rangle} p(D)$ (2). By splitting the joint model in a hierarchical structure model and a lexical emission one we facilitate estimating the two models separately. [sent-95, score-0.422]
44 For every training sentence-pair, we also input a chart containing one or more labels for every synchronous span, such as that of Figure 3. [sent-101, score-0.296]
45 We aim to induce a recursive translation structure explaining the joint generation of the source and target sentence taking advantage of these phrase-pair span labels. [sent-105, score-0.678]
46 For this work we employ the linguistically motivated labels of (Zollmann and Venugopal, 2006), albeit for the source language. [sent-106, score-0.449]
47 Binary rules are extracted from adjoining synchronous spans up to the whole sentence-pair level, with the non-terminals of both left and right-hand side derived from the label names plus their reordering function (monotone, left/right swapping) in the span examined. [sent-113, score-0.721]
48 Noun phrase NPR is recursively constructed with a preference to constitute the right branch of an order swapping nonterminal expansion. [sent-122, score-0.256]
49 The labels VP/NP and SBAR\WHNP allow linguistic syntax context to influence the lexical and reordering translation choices. [sent-124, score-0.448]
50 Crucially, all these lexical, attachment and reordering preferences (as encoded in the model’s rules and probabilities) must be matched together to arrive at the analysis in Figure 4. [sent-125, score-0.437]
51 However, apart from overfitting towards long phrase-pairs, a grammar with millions of structural rules is also liable to overfit towards degenerate latent structures which, while fitting the training data well, have limited applicability to unseen sentences. [sent-130, score-0.728]
52 For our probabilistic SCFG-based translation structure variable σ, implementing CV-EM boils down to a synchronous version of the Inside-Outside algorithm, modified to enforce the CV criterion. [sent-136, score-0.526]
53 The CV-criterion, apart from avoiding overfitting, results in discarding the structural rules which are only found in a single part of the training corpus, leading to a more compact grammar while still retaining millions of structural rules that are more hopeful to generalise. [sent-138, score-0.631]
54 Unravelling the joint generative process, by modelling latent hierarchical structure separately from phrase-pair emission, allows us to concentrate our inference efforts towards the hidden, higher-level translation mechanism. [sent-139, score-0.679]
55 1 Decoding Model The induced joint translation model can be used to recover arg maxe p(e|f), as it is equal to arg maxe p(e, f) . [sent-141, score-0.409]
56 We employ the induced probabilistic HR-SCFG G as the backbone of a log-linear, feature based translation model, with the derivation probability p(D) under the grammar estimate being one of the features. [sent-142, score-0.682]
57 This is augmented with a small number n of additional smoothing features φi for derivation rules r: (a) conditional phrase translation probabilities, (b) lexical phrase translation probabilities, (c) word generation penalty, and (d) a count of swapping reordering operations. [sent-143, score-1.191]
58 Features (a), (b) and (c) are applicable to phrase-pair emission rules and features for both translation directions are used, while (d) is only triggered by structural rules. [sent-144, score-0.565]
59 These extra features assess translation quality past the synchronous grammar derivation and learning general reordering or word emission preferences for the language pair. [sent-145, score-0.999]
60 As an example, while our probabilistic HR-SCFG maintains a separate joint phrase-pair emission distribution per non-terminal, the smoothing features (a) above assess the conditional translation of surface phrases irrespective of any notion of recursive translation structure. [sent-146, score-0.841]
61 , 2009) to translate, with the following modifications: Source Labels Constraints As for this work the phrase-pair labels used to extract the grammar are based on the linguistic analysis of the source side, we can construct the label chart for every input sentence from its parse. [sent-150, score-0.528]
62 We subsequently use it to consider only derivations with synchronous spans which are covered by non-terminals matching one of the labels for those spans. [sent-151, score-0.306]
63 In this manner we not only constrain the translation hypotheses resulting in faster decoding time, but, more importantly, we may ground the hypotheses more closely to the available linguistic information of the source sentence. [sent-153, score-0.28]
64 As our grammar uses non-terminals in the hundreds of thousands, it is important not to prune away prematurely non-terminals covering smaller spans and to leave more options to be considered as we move up the derivation tree. [sent-156, score-0.397]
65 Expected Counts Rule Pruning To compact the hierarchical structure part of the grammar prior to decoding, we prune rules that fail to accumulate $10^{-8}$ expected counts during the last CV-EM iteration. [sent-160, score-0.555]
66 We compare against a state-of-the-art hierarchical translation (Chiang, 2005) baseline, based on the Joshua translation system under the default training and decoding settings (josh-base). [sent-206, score-0.825]
67 The heuristically trained baseline takes advantage of ‘gap rules’ to reorder based on lexical context cues, but makes very limited use of the hierarchical structure above the lexical surface. [sent-208, score-0.298]
68 In contrast, our method induces a grammar with no such rules, relying on lexical content and the strength of a higher level translation structure instead. [sent-209, score-0.516]
69 The decoder does not employ any ‘glue grammar’ as is usual with hierarchical translation systems to limit reordering up to a certain cut-off length. [sent-214, score-0.815]
70 Instead, we rely on our LTS grammar to reorder and construct the translation output up to the full sentence length. [sent-215, score-0.497]
71 92 BLEU points for English to Chinese translation when training on the 400K set. [sent-228, score-0.282]
72 We selected an array of target languages of increasing reordering complexity with English as source. [sent-231, score-0.324]
73 Table 2: Additional experiments for English to Chinese translation examining (a) the impact of the linguistic annotations in the LTS system (lts), when compared with an instance not employing such annotations (lts-nolabels) and (b) decoding with a 4th-order language model (-lm4). [sent-242, score-0.712]
74 For the English to Chinese translation task, we performed further experiments along two axes. [sent-245, score-0.282]
75 We first investigate the contribution of the linguistic annotations, by comparing our complete system (lts) with an otherwise identical implementation (lts-nolabels) which does not employ any linguistically motivated labels. [sent-246, score-0.266]
76 The latter system then uses a labels chart as that of Figure 3, which however labels all phrase-pair spans solely with the generic X label. [sent-247, score-0.285]
77 Notably, as can be seen in Table 2(b), switching to a 4-gram LM results in performance gains for both the baseline and our system and while the margin between the two systems decreases, our system continues to deliver a considerable and significant improvement in translation BLEU scores. [sent-250, score-0.282]
78 5 Related Work In this work, we focus on the combination of learning latent structure with syntax and linguistic annotations, exploring the crossroads of machine learning, linguistic syntax and machine translation. [sent-251, score-0.43]
79 We show that a translation system based on such a joint model can perform competitively in comparison with conditional probability models, when it is augmented with a rich latent hierarchical structure trained adequately to avoid overfitting. [sent-253, score-0.629]
80 Earlier approaches for linguistic syntax-based translation such as (Yamada and Knight, 2001 ; Galley et al. [sent-254, score-0.367]
81 , 2006) focus on memorising and reusing parts of the structure of the source and/or target parse trees and constraining decoding by the input parse tree. [sent-257, score-0.46]
82 In contrast to this approach, we choose to employ linguistic annotations in the form of unambiguous synchronous span labels, while discovering ambiguous translation structure taking advantage of them. [sent-258, score-0.847]
83 , 2009) takes a more flexible approach, influencing translation output using linguistically motivated features, or features based on source-side linguistically-guided latent syntactic categories (Huang et al. [sent-261, score-0.561]
84 While for this work we constrain ourselves to source language syntax annotations, our method can be directly applied to employ labels taking advantage of linguistic annotations from both sides of translation. [sent-268, score-0.551]
85 The HR-SCFG we adopt allows capturing more complex reordering phenomena and, in contrast to both (Chiang, 2005; Zollmann and Venugopal, 2006), is not exposed to the issues highlighted in section 2. [sent-273, score-0.285]
86 Nevertheless, our results underline the capacity of linguistic annotations similar to those of (Zollmann and Venugopal, 2006) as part of latent translation variables. [sent-275, score-0.529]
87 Most of the aforementioned work does concentrate on learning hierarchical, linguistically motivated translation models. [sent-276, score-0.473]
88 The rich linguistically motivated latent variable learnt by our method delivers translation performance that compares favourably to a state-of-the-art system. [sent-281, score-0.598]
89 In this work we employ some of their grammar design principles for an immensely more complex grammar with millions of hierarchical latent structure rules and show how such grammar can be learnt and applied taking advantage of source language linguistic annotations. [sent-283, score-1.307]
90 6 Conclusions In this work we contribute a method to learn and apply latent hierarchical translation structure. [sent-284, score-0.516]
91 To this end, we take advantage of source-language linguistic annotations to motivate instead of constrain the translation process. [sent-285, score-0.48]
92 Instead of employing hierarchical phrase-pairs, we invest in learning the higher-order hierarchical synchronous structure behind translation, up to the full sentence length. [sent-289, score-0.58]
93 Future work directions include investigating the impact of hierarchical phrases for our models as well as any gains from additional features in the log-linear decoding model. [sent-291, score-0.261]
94 Smoothing the HR-SCFG grammar estimates could prove a possible source of further performance improvements. [sent-292, score-0.286]
95 by interpolating them with less sparse ones, could in the future lead to an additional increase in translation quality. [sent-296, score-0.282]
96 Finally, we discuss in this work how our method can already utilise hundreds of thousands of phrase-pair labels and millions of structural rules. [sent-297, score-0.333]
97 A further promising direction is broadening this set with labels taking advantage of both source and target-language linguistic annotation or categories exploring additional phrase-pair properties past the parse trees such as semantic annotations. [sent-298, score-0.317]
98 Scalable inference and training of context-rich syntactic translation models. [sent-357, score-0.282]
99 Soft syntactic constraints for hierarchical phrase-based translation using latent syntactic distributions. [sent-371, score-0.561]
100 Phrase translation probabilities with ITG priors and smoothing as learning objective. [sent-414, score-0.368]
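The generative model in sentences 38-43 and the pruning step in sentence 65 can be illustrated with a small sketch. This is not the authors' implementation: the rule tables, the non-terminal names (NP_mono, VP_pre and so on), the phrase-pairs and all probabilities below are hypothetical and list only a fragment of each distribution. The sketch assumes a derivation is given as the list of structural rules and the list of emission rules it uses, so that equation (1) becomes a product over both lists and equation (2) a sum over the derivations of one sentence-pair.

```python
import math

# Hypothetical toy rule tables; names and probabilities are illustrative only
# and show just a fragment of each left-hand side's distribution.
# Structural rules define p(sigma); emission rules define p(e, f | sigma).
STRUCTURAL = {
    ("S", ("NP_mono", "VP_mono")): 1.0,
    ("NP_mono", ("NP_pre",)): 0.6,
    ("VP_mono", ("VP_pre",)): 0.5,
}
EMISSION = {
    ("NP_pre", ("the problem", "le problème")): 0.2,
    ("VP_pre", ("is solved", "est résolu")): 0.1,
}


def derivation_probability(structural_rules, emission_rules):
    """Equation (1): p(D) = p(sigma) * p(e, f | sigma)."""
    p_sigma = math.prod(STRUCTURAL[r] for r in structural_rules)
    p_emit = math.prod(EMISSION[r] for r in emission_rules)
    return p_sigma * p_emit


def sentence_pair_probability(derivations):
    """Equation (2): sum p(D) over the derivations D that yield <e, f>."""
    return sum(derivation_probability(s, m) for s, m in derivations)


def prune_rules(expected_counts, threshold=1e-8):
    """Keep only rules whose expected count reaches the threshold."""
    return {r: c for r, c in expected_counts.items() if c >= threshold}


# One derivation using every toy rule once; its probability is roughly 0.006.
example = (list(STRUCTURAL), list(EMISSION))
p_example = derivation_probability(*example)
```

Keeping the structural and emission tables apart mirrors the split described in sentence 43, where the two models are estimated separately. In a real system the sum in equation (2) would be computed with a chart-based algorithm rather than by enumerating derivations.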
wordName wordTfidf (topN-words)
[('translation', 0.282), ('mylonakis', 0.25), ('reordering', 0.239), ('scfg', 0.206), ('grammar', 0.168), ('scfgs', 0.157), ('sima', 0.156), ('hierarchical', 0.146), ('synchronous', 0.14), ('whnp', 0.14), ('venugopal', 0.132), ('rules', 0.129), ('zollmann', 0.127), ('swapping', 0.126), ('linguistically', 0.117), ('decoding', 0.115), ('ntp', 0.114), ('lts', 0.111), ('employ', 0.107), ('sbar', 0.094), ('cri', 0.091), ('hbl', 0.091), ('osung', 0.091), ('latent', 0.088), ('derivation', 0.087), ('chart', 0.085), ('linguistic', 0.085), ('grammars', 0.084), ('emission', 0.083), ('employing', 0.082), ('source', 0.08), ('chiang', 0.078), ('motivated', 0.074), ('annotations', 0.074), ('structural', 0.071), ('labels', 0.071), ('arrive', 0.069), ('markos', 0.068), ('memorising', 0.068), ('mounting', 0.068), ('phrasepair', 0.068), ('structure', 0.066), ('rule', 0.063), ('recursive', 0.063), ('millions', 0.063), ('side', 0.062), ('die', 0.061), ('npp', 0.06), ('npr', 0.06), ('utilise', 0.06), ('spans', 0.058), ('overfitting', 0.057), ('lt', 0.055), ('span', 0.054), ('recursively', 0.053), ('syntax', 0.053), ('knight', 0.052), ('generalisation', 0.052), ('ppp', 0.052), ('degenerate', 0.052), ('towards', 0.05), ('monotone', 0.05), ('emitting', 0.049), ('galley', 0.047), ('target', 0.047), ('reorder', 0.047), ('joint', 0.047), ('association', 0.047), ('smoothing', 0.046), ('phenomena', 0.046), ('prune', 0.046), ('apl', 0.045), ('apr', 0.045), ('illc', 0.045), ('nprp', 0.045), ('uva', 0.045), ('whnpp', 0.045), ('khalil', 0.045), ('constraints', 0.045), ('parse', 0.042), ('sides', 0.042), ('decoder', 0.041), ('nonterminal', 0.041), ('huang', 0.041), ('probabilities', 0.04), ('generalise', 0.04), ('maxe', 0.04), ('joshua', 0.04), ('label', 0.039), ('advantage', 0.039), ('array', 0.038), ('kevin', 0.038), ('covering', 0.038), ('estimates', 0.038), ('probabilistic', 0.038), ('derivations', 0.037), ('delivers', 0.037), ('employment', 0.037), ('climbing', 0.037), ('preference', 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999923 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
Author: Markos Mylonakis ; Khalil Sima'an
Abstract: While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters, by selecting and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.
2 0.31018722 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
Author: Andreas Zollmann ; Stephan Vogel
Abstract: In this work we propose methods to label probabilistic synchronous context-free grammar (PSCFG) rules using only word tags, generated by either part-of-speech analysis or unsupervised word class induction. The proposals range from simple tag-combination schemes to a phrase clustering model that can incorporate an arbitrary number of features. Our models improve translation quality over the single generic label approach of Chiang (2005) and perform on par with the syntactically motivated approach from Zollmann and Venugopal (2006) on the NIST large Chinese-to-English translation task. These results persist when using automatically learned word tags, suggesting broad applicability of our technique across diverse language pairs for which syntactic resources are not available.
3 0.26343763 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
Author: Bing Zhao ; Young-Suk Lee ; Xiaoqiang Luo ; Liu Li
Abstract: We propose a novel technique of learning how to transform the source parse trees to improve the translation qualities of syntax-based translation models using synchronous context-free grammars. We transform the source tree phrasal structure into a set of simpler structures, expose such decisions to the decoding process, and find the least expensive transformation operation to better model word reordering. In particular, we integrate synchronous binarizations, verb regrouping, removal of redundant parse nodes, and incorporate a few important features such as translation boundaries. We learn the structural preferences from the data in a generative framework. The syntax-based translation system integrating the proposed techniques outperforms the best Arabic-English unconstrained system in NIST08 evaluations by 1.3 absolute BLEU, which is statistically significant.
4 0.23530769 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
Author: Lane Schwartz ; Chris Callison-Burch ; William Schuler ; Stephen Wu
Abstract: This paper describes a novel technique for incorporating syntactic knowledge into phrase-based machine translation through incremental syntactic parsing. Bottom-up and top-down parsers typically require a completed string as input. This requirement makes it difficult to incorporate them into phrase-based translation, which generates partial hypothesized translations from left-to-right. Incremental syntactic language models score sentences in a similar left-to-right fashion, and are therefore a good mechanism for incorporating syntax into phrase-based translation. We give a formal definition of one such linear-time syntactic language model, detail its relation to phrase-based decoding, and integrate the model with the Moses phrase-based translation system. We present empirical results on a constrained Urdu-English translation task that demonstrate a significant BLEU score improvement and a large decrease in perplexity.
5 0.23305908 266 acl-2011-Reordering with Source Language Collocations
Author: Zhanyi Liu ; Haifeng Wang ; Hua Wu ; Ting Liu ; Sheng Li
Abstract: This paper proposes a novel reordering model for statistical machine translation (SMT) by means of modeling the translation orders of the source language collocations. The model is learned from a word-aligned bilingual corpus where the collocated words in source sentences are automatically detected. During decoding, the model is employed to softly constrain the translation orders of the source language collocations, so as to constrain the translation orders of those source phrases containing these collocated words. The experimental results show that the proposed method significantly improves the translation quality, achieving the absolute improvements of 1.1~1.4 BLEU score over the baseline methods.
6 0.22659737 61 acl-2011-Binarized Forest to String Translation
7 0.21451923 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
8 0.20802659 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
9 0.20384164 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
10 0.20150819 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars
11 0.19550243 30 acl-2011-Adjoining Tree-to-String Translation
12 0.19052181 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
13 0.18881717 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation
14 0.18058689 44 acl-2011-An exponential translation model for target language morphology
15 0.18027309 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation
16 0.17412472 263 acl-2011-Reordering Constraint Based on Document-Level Context
17 0.15803131 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
18 0.15769219 264 acl-2011-Reordering Metrics for MT
19 0.15230195 313 acl-2011-Two Easy Improvements to Lexical Weighting
20 0.14853394 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages
topicId topicWeight
[(0, 0.35), (1, -0.303), (2, 0.161), (3, 0.005), (4, 0.058), (5, 0.016), (6, -0.222), (7, -0.034), (8, -0.048), (9, -0.023), (10, -0.043), (11, -0.087), (12, -0.002), (13, -0.029), (14, 0.041), (15, -0.037), (16, -0.059), (17, 0.042), (18, -0.03), (19, 0.036), (20, -0.034), (21, 0.043), (22, -0.026), (23, -0.116), (24, -0.077), (25, 0.106), (26, 0.054), (27, -0.072), (28, 0.017), (29, 0.042), (30, 0.013), (31, 0.048), (32, 0.074), (33, -0.064), (34, 0.055), (35, -0.042), (36, 0.032), (37, 0.019), (38, 0.031), (39, 0.039), (40, 0.062), (41, 0.033), (42, -0.03), (43, -0.015), (44, -0.004), (45, -0.063), (46, -0.039), (47, -0.016), (48, 0.047), (49, -0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.95925599 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
Author: Markos Mylonakis ; Khalil Sima'an
Abstract: While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters, by selecting and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.
2 0.81991398 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
Author: Andreas Zollmann ; Stephan Vogel
Abstract: In this work we propose methods to label probabilistic synchronous context-free grammar (PSCFG) rules using only word tags, generated by either part-of-speech analysis or unsupervised word class induction. The proposals range from simple tag-combination schemes to a phrase clustering model that can incorporate an arbitrary number of features. Our models improve translation quality over the single generic label approach of Chiang (2005) and perform on par with the syntactically motivated approach from Zollmann and Venugopal (2006) on the NIST large Chinese-to-English translation task. These results persist when using automatically learned word tags, suggesting broad applicability of our technique across diverse language pairs for which syntactic resources are not available.
3 0.81721455 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
Author: Bing Zhao ; Young-Suk Lee ; Xiaoqiang Luo ; Liu Li
Abstract: We propose a novel technique of learning how to transform the source parse trees to improve the translation qualities of syntax-based translation models using synchronous context-free grammars. We transform the source tree phrasal structure into a set of simpler structures, expose such decisions to the decoding process, and find the least expensive transformation operation to better model word reordering. In particular, we integrate synchronous binarizations, verb regrouping, removal of redundant parse nodes, and incorporate a few important features such as translation boundaries. We learn the structural preferences from the data in a generative framework. The syntax-based translation system integrating the proposed techniques outperforms the best Arabic-English unconstrained system in NIST08 evaluations by 1.3 absolute BLEU, which is statistically significant.
4 0.78852534 263 acl-2011-Reordering Constraint Based on Document-Level Context
Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita
Abstract: One problem with phrase-based statistical machine translation is the problem of long-distance reordering when translating between languages with different word orders, such as Japanese-English. In this paper, we propose a method of imposing reordering constraints using document-level context. As the document-level context, we use noun phrases which significantly occur in context documents containing source sentences. Given a source sentence, zones which cover the noun phrases are used as reordering constraints. Then, in decoding, reorderings which violate the zones are restricted. Experiment results for patent translation tasks show a significant improvement of 1.20% BLEU points in Japanese-English translation and 1.41% BLEU points in English-Japanese translation.
5 0.7775169 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
Author: Daniel Emilio Beck
Abstract: In this paper I present a Master’s thesis proposal in syntax-based Statistical Machine Translation. I propose to build discriminative SMT models using both tree-to-string and tree-to-tree approaches. Translation and language models will be represented mainly through the use of Tree Automata and Tree Transducers. These formalisms have important representational properties that make them well-suited for syntax modeling. I also present an experiment plan to evaluate these models through the use of a parallel corpus written in English and Brazilian Portuguese.
6 0.7770173 266 acl-2011-Reordering with Source Language Collocations
7 0.76774704 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
8 0.76690835 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
9 0.7625646 61 acl-2011-Binarized Forest to String Translation
10 0.74557513 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
11 0.73977721 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars
12 0.7191658 44 acl-2011-An exponential translation model for target language morphology
13 0.71445709 69 acl-2011-Clause Restructuring For SMT Not Absolutely Helpful
14 0.71218812 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
15 0.71207941 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
16 0.67303181 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation
17 0.66705728 154 acl-2011-How to train your multi bottom-up tree transducer
18 0.65515989 264 acl-2011-Reordering Metrics for MT
19 0.64317024 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages
20 0.63669044 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach
topicId topicWeight
[(5, 0.056), (17, 0.067), (26, 0.021), (31, 0.011), (37, 0.121), (39, 0.118), (41, 0.076), (53, 0.01), (55, 0.043), (59, 0.033), (72, 0.019), (78, 0.15), (91, 0.03), (96, 0.165)]
simIndex simValue paperId paperTitle
same-paper 1 0.89523256 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
Author: Markos Mylonakis ; Khalil Sima'an
Abstract: While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters, by selecting and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.
2 0.8812713 274 acl-2011-Semi-Supervised Frame-Semantic Parsing for Unknown Predicates
Author: Dipanjan Das ; Noah A. Smith
Abstract: We describe a new approach to disambiguating semantic frames evoked by lexical predicates previously unseen in a lexicon or annotated data. Our approach makes use of large amounts of unlabeled data in a graph-based semi-supervised learning framework. We construct a large graph where vertices correspond to potential predicates and use label propagation to learn possible semantic frames for new ones. The label-propagated graph is used within a frame-semantic parser and, for unknown predicates, results in over 15% absolute improvement in frame identification accuracy and over 13% absolute improvement in full frame-semantic parsing F1 score on a blind test set, over a state-of-the-art supervised baseline.
3 0.83264428 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing
Author: Nathan Bodenstab ; Aaron Dunlop ; Keith Hall ; Brian Roark
Abstract: Efficient decoding for syntactic parsing has become a necessary research area as statistical grammars grow in accuracy and size and as more NLP applications leverage syntactic analyses. We review prior methods for pruning and then present a new framework that unifies their strengths into a single approach. Using a log linear model, we learn the optimal beam-search pruning parameters for each CYK chart cell, effectively predicting the most promising areas of the model space to explore. We demonstrate that our method is faster than coarse-to-fine pruning, exemplified in both the Charniak and Berkeley parsers, by empirically comparing our parser to the Berkeley parser using the same grammar and under identical operating conditions.
4 0.83008337 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
Author: Andreas Zollmann ; Stephan Vogel
Abstract: In this work we propose methods to label probabilistic synchronous context-free grammar (PSCFG) rules using only word tags, generated by either part-of-speech analysis or unsupervised word class induction. The proposals range from simple tag-combination schemes to a phrase clustering model that can incorporate an arbitrary number of features. Our models improve translation quality over the single generic label approach of Chiang (2005) and perform on par with the syntactically motivated approach from Zollmann and Venugopal (2006) on the NIST large Chinese-to-English translation task. These results persist when using automatically learned word tags, suggesting broad applicability of our technique across diverse language pairs for which syntactic resources are not available.
5 0.82935297 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing
Author: Mohit Bansal ; Dan Klein
Abstract: We investigate full-scale shortest-derivation parsing (SDP), wherein the parser selects an analysis built from the fewest number of training fragments. Shortest derivation parsing exhibits an unusual range of behaviors. At one extreme, in the fully unpruned case, it is neither fast nor accurate. At the other extreme, when pruned with a coarse unlexicalized PCFG, the shortest derivation criterion becomes both fast and surprisingly effective, rivaling more complex weighted-fragment approaches. Our analysis includes an investigation of tie-breaking and associated dynamic programs. At its best, our parser achieves an accuracy of 87% F1 on the English WSJ task with minimal annotation, and 90% F1 with richer annotation.
6 0.82804209 97 acl-2011-Discovering Sociolinguistic Associations with Structured Sparsity
7 0.82501042 182 acl-2011-Joint Annotation of Search Queries
8 0.82432425 316 acl-2011-Unary Constraints for Efficient Context-Free Parsing
9 0.82332206 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation
10 0.82251954 27 acl-2011-A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
11 0.8224718 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding
13 0.82037389 28 acl-2011-A Statistical Tree Annotator and Its Applications
14 0.81712341 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation
15 0.81671691 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
16 0.81594217 192 acl-2011-Language-Independent Parsing with Empty Elements
17 0.81575763 128 acl-2011-Exploring Entity Relations for Named Entity Disambiguation
18 0.81424338 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
19 0.81384861 238 acl-2011-P11-2093 k2opt.pdf
20 0.81242096 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction