acl acl2010 acl2010-87 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: John DeNero ; Dan Klein
Abstract: We present a discriminative model that directly predicts which set of phrasal translation rules should be extracted from a sentence pair. Our model scores extraction sets: nested collections of all the overlapping phrase pairs consistent with an underlying word alignment. Extraction set models provide two principal advantages over word-factored alignment models. First, we can incorporate features on phrase pairs, in addition to word links. Second, we can optimize for an extraction-based loss function that relates directly to the end task of generating translations. Our model gives improvements in alignment quality relative to state-of-the-art unsupervised and supervised baselines, as well as providing up to a 1.4 improvement in BLEU score in Chinese-to-English translation experiments.
Reference: text
sentIndex sentText sentNum sentScore
1 Our model scores extraction sets: nested collections of all the overlapping phrase pairs consistent with an underlying word alignment. [sent-4, score-0.33]
2 Extraction set models provide two principal advantages over word-factored alignment models. [sent-5, score-0.309]
3 Our model gives improvements in alignment quality relative to state-of-the-art unsupervised and supervised baselines, as well as providing up to a 1.4 improvement in BLEU score in Chinese-to-English translation experiments. [sent-8, score-0.419]
4 We present a discriminative model that directly predicts which set of phrasal translation rules should be extracted from a sentence pair. [sent-17, score-0.521]
5 Our model predicts extraction sets: combinatorial objects that include the set of all overlapping phrasal translation rules consistent with an underlying word-level alignment. [sent-18, score-0.711]
6 This approach provides additional discriminative power relative to word aligners because extraction sets are scored based on the phrasal rules they contain in addition to word-to-word alignment links. [sent-19, score-0.962]
7 Moreover, the structure of our model directly reflects the purpose of alignment models in general, which is to discover translation rules. [sent-20, score-0.448]
8 First, we would like to leverage existing word-level alignment resources. [sent-22, score-0.309]
9 To do so, we define a deterministic mapping from word alignments to extraction sets, inspired by existing extraction procedures. [sent-23, score-0.541]
10 In our mapping, possible alignment links have a precise interpretation that dictates what phrasal translation rules can be extracted from a sentence pair. [sent-24, score-0.876]
11 This mapping allows us to train with existing annotated data sets and use the predictions from word-level aligners as features in our extraction set model. [sent-25, score-0.341]
12 We optimize for a phrase-level F-measure in order to focus learning on the task of predicting phrasal rules rather than word alignment links. [sent-27, score-0.605]
13 Our model does not factor over disjoint word-to-word links or minimal phrase pairs, and so existing inference procedures do not directly apply. [sent-29, score-0.333]
14 However, we show that the dynamic program for a block ITG aligner can be augmented to score extraction sets that are indexed by underlying ITG word alignments (Wu, 1997). [sent-30, score-0.632]
15 Figure 1: A word alignment A (shaded grid cells) defines projections σ(ei) and σ(fj), shown as dotted lines for each word in each sentence. [sent-33, score-0.341]
16 The extraction set R3(A) includes all bispans licensed by these projections, shown as rounded rectangles. [sent-34, score-0.502]
17 Our extraction set model outperforms both unsupervised and supervised word aligners at predicting word alignments and extraction sets. [sent-36, score-0.754]
18 2 Extraction Set Models: The input to our model is an unaligned sentence pair, and the output is an extraction set of phrasal translation rules. [sent-42, score-0.572]
19 We first specify the relationship between word alignments and extraction sets, then define our model. [sent-44, score-0.376]
20 2.1 Extraction Sets from Word Alignments: Rule extraction is a standard concept in machine translation: word alignment constellations license particular sets of overlapping rules, from which subsets are selected according to limits on phrase length (Koehn et al. [sent-46, score-0.681]
21 In this paper, we focus on phrasal rule extraction (i. [sent-49, score-0.435]
22 These types account for 96% of the possible alignment links in our data set. [sent-53, score-0.482]
23 We map word alignment links A = {(i, j)} to an extraction set of bispans Rn(A) = {[g, h) ⇔ [k, ℓ)}, where each bispan links target span [g, h) to source span [k, ℓ). [sent-54, score-0.906]
24 Let word ei project to the phrasal span σ(ei), where σ(ei) = [min{j : (i, j) ∈ A}, max{j : (i, j) ∈ A} + 1). [sent-58, score-0.311]
25 Then, Rn(A) includes a bispan [g, h) ⇔ [k, ℓ) iff σ(ei) ⊆ [k, ℓ) ∀i ∈ [g, h) and σ(fj) ⊆ [g, h) ∀j ∈ [k, ℓ). That is, every word in one of the phrasal spans must project within the other. [sent-61, score-0.441]
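This definition transcribes directly into code. Below is a minimal Python sketch of the sure-link case: the function names are ours, possible links and null-word bookkeeping are omitted, and unaligned words are treated as unconstrained, consistent with the projection rule just stated.

```python
def projection(indices):
    """Minimal half-open span covering a word's aligned positions (sigma),
    or None for an unaligned word."""
    return (min(indices), max(indices) + 1) if indices else None

def extraction_set(links, e_len, f_len, n=3):
    """Enumerate all bispans [g, h) <=> [k, l) with spans of width <= n
    licensed by a set of sure links {(i, j)} between e_i and f_j."""
    sigma_e = [projection({j for (i, j) in links if i == ei}) for ei in range(e_len)]
    sigma_f = [projection({i for (i, j) in links if j == fj}) for fj in range(f_len)]

    def within(span, lo, hi):
        # An unaligned word (span is None) places no constraint.
        return span is None or (lo <= span[0] and span[1] <= hi)

    bispans = []
    for g in range(e_len):
        for h in range(g + 1, min(g + n, e_len) + 1):
            for k in range(f_len):
                for l in range(k + 1, min(k + n, f_len) + 1):
                    if all(within(sigma_e[i], k, l) for i in range(g, h)) and \
                       all(within(sigma_f[j], g, h) for j in range(k, l)):
                        bispans.append(((g, h), (k, l)))
    return bispans
```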
26 This mapping is deterministic, and so we can interpret a word-level alignment A as also specifying the phrasal rules that should be extracted from a sentence pair. [sent-62, score-0.605]
27 Possible links account for 22% of all alignment links in these data, and we found that most of these links fall into two categories. [sent-72, score-0.828]
28 Possible links are typically not included in extraction procedures because most aligners predict only sure links. [sent-80, score-0.585]
29 However, we see a natural interpretation for possible links in rule extraction: they license phrasal rules that both include and exclude them. [sent-81, score-0.546]
30 We restrict our model to score only coherent extraction sets. (Footnote 2: We collected corpus frequencies of possible alignment link types ourselves on a sample of the hand-aligned data set.) [sent-91, score-0.427]
31 Figure 3: Possible links constrain the word-to-phrase projection of otherwise unaligned words, which in turn license overlapping phrases. [sent-92, score-0.363]
32 Coherent extraction sets Rn(A) are those licensed by an underlying word alignment A with sure alignments A(s) ⊆ A. [sent-95, score-0.722]
33 Figure 4: Above, we show a representative subset of the block alignment patterns that serve as terminal productions of the ITG that restricts the output space of our model. [sent-102, score-0.537]
34 These terminal productions cover up to n = 3 words in each sentence and include a mixture of sure (filled) and possible (striped) word-level alignment links. [sent-103, score-0.531]
35 However, the space of block ITG alignments is expressive enough to include the vast majority of patterns observed in hand-annotated parallel corpora (Haghighi et al. [sent-105, score-0.405]
36 Unlike previous work, we allow possible alignment links to appear in the block terminals, as depicted in Figure 4. [sent-109, score-0.599]
37 For each training example, MIRA requires that we find the alignment Am corresponding to the highest scoring extraction set Rn(Am) under the current model: Am = arg max_{A ∈ ITG(e, f)} θ · φ(A) (2). Section 4 describes our approach to solving this search problem for model inference. [sent-114, score-0.381]
38 Some hand-annotated alignments are outside of the block ITG model class. [sent-116, score-0.42]
39 Hence, we update toward the extraction set for a pseudo-gold alignment Ag ∈ ITG(e, f) with minimal distance from the true reference alignment At. [sent-117, score-0.809]
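For intuition, here is a rough one-best MIRA-style (passive-aggressive) update over sparse feature dictionaries. This is a generic sketch rather than the authors' implementation; the clip constant C and all helper names are our assumptions.

```python
def mira_step(w, phi_gold, phi_guess, loss, C=0.01):
    """One-best MIRA update: move weights toward the pseudo-gold extraction
    set's features and away from the model's current best guess, with the
    step size clipped at C."""
    feats = set(phi_gold) | set(phi_guess)
    delta = {f: phi_gold.get(f, 0.0) - phi_guess.get(f, 0.0) for f in feats}
    margin = sum(w.get(f, 0.0) * v for f, v in delta.items())
    norm_sq = sum(v * v for v in delta.values())
    if norm_sq > 0.0:
        # Smallest step that makes the margin exceed the loss, clipped at C.
        tau = min(C, max(0.0, loss - margin) / norm_sq)
        for f, v in delta.items():
            w[f] = w.get(f, 0.0) + tau * v
    return w
```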
40 3.1 Extraction Set Loss Function: In order to focus learning on predicting the right bispans, we use an extraction-level loss L(Am; Ag): an F-measure of the overlap between the bispans in Rn(Am) and Rn(Ag). [sent-128, score-0.379]
41 Optimizing for a bispan F-measure penalizes alignment mistakes in proportion to their rule extraction consequences. [sent-134, score-0.707]
42 That is, adding a word link that prevents the extraction of many correct phrasal rules, or which licenses many incorrect rules, is strongly discouraged by this loss. [sent-135, score-0.439]
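A sketch of this loss over the bispan representation used in the earlier sketch; the paper calls for an F-measure, and we default to F1 here (the beta weighting is our assumption).

```python
def extraction_loss(pred_bispans, gold_bispans, beta=1.0):
    """L(Am; Ag): one minus the F-measure of overlap between the predicted
    extraction set Rn(Am) and the pseudo-gold extraction set Rn(Ag)."""
    pred, gold = set(pred_bispans), set(gold_bispans)
    if not pred or not gold:
        return 1.0
    tp = len(pred & gold)
    precision, recall = tp / len(pred), tp / len(gold)
    if precision + recall == 0.0:
        return 1.0
    f = (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
    return 1.0 - f
```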
43 3.2 Features on Extraction Sets: The discriminative power of our model is driven by the features on sure word alignment links φa(i, j) and bispans φb(g, h, k, ℓ). [sent-137, score-1.023]
44 To score word-to-word links, we use the posterior predictions of a jointly trained HMM alignment model (Liang et al. [sent-139, score-0.428]
45 To score phrasal translation rules in an extraction set, we use a mixture of feature types. [sent-142, score-0.601]
46 Extraction set models allow us to incorporate the same phrasal relative frequency statistics that drive phrase-based translation performance (Koehn et al. [sent-143, score-0.337]
47 To implement these frequency features, we extract a phrase table from the alignment predictions of a jointly trained unsupervised HMM model using Moses (Koehn et al. [sent-145, score-0.463]
48 Our feature set also includes bias features on phrasal rules and links, which control the number of null-aligned words and number of rules licensed. [sent-156, score-0.353]
49 Figure 5: Both possible ITG decompositions of this example alignment will split one of the two highlighted bispans across constituents. [sent-159, score-0.646]
50 Although we have restricted Am ∈ ITG(e, f), our extraction set model does not factor over ITG productions, and so the dynamic program for a vanilla block ITG will not suffice to find Rn(Am). [sent-161, score-0.323]
51 An ITG decomposition of the underlying alignment imposes a hierarchical bracketing on each sentence, and some bispan in the extraction set for this alignment will cross any such bracketing. [sent-163, score-0.846]
52 The model score of a sub-derivation A, which scores extraction set Rn(A), decomposes over AL and AR, along with any phrasal bispans licensed by adjoining AL and AR. [sent-168, score-0.648]
53 θ · φ(A) = θ · φ(AL) + θ · φ(AR) + I(AL, AR), where I(AL, AR) is the sum of θ · φb(g, h, k, ℓ) over licensed bispans [g, h) ⇔ [k, ℓ) that overlap the boundary between AL and AR. [sent-169, score-0.337]
54 Figure 6: Augmenting the ITG grammar states with the alignment configuration in an n − 1 deep perimeter of the bispan allows us to score all overlapping phrasal rules introduced by adjoining two bispans. [sent-174, score-0.94]
55 The state must encode whether a sure link appears in each edge column or row, but the specific location of edge links is not required. [sent-175, score-0.319]
56 The state must represent (a) the specific alignment links in the n − 1 deep inner corner of each bispan, and (b) whether any sure alignments appear in the rows or columns extending from those corners. [sent-177, score-0.804]
57 With this information, we can infer the bispans licensed by adjoining AL and AR, as in Figure 6. [sent-178, score-0.367]
58 This dynamic program is an instance of ITG bitext parsing, where the grammar uses symbols to encode the alignment contexts described above. [sent-180, score-0.373]
59 Maintaining the context necessary to score non-local bispans further increases running time. [sent-184, score-0.379]
60 We discard all states corresponding to bispans that are incompatible with 3 or more alignment links under an intersected HMM—a proven approach to pruning the space of ITG alignments (Zhang and Gildea, 2006; Haghighi et al. [sent-188, score-1.129]
61 The oracle alignment error rate for the block ITG model class is 1. [sent-191, score-0.467]
62 In the coarse pass, we search over the space of ITG alignments, but score only features on alignment links and bispans that are local to terminal blocks. [sent-198, score-0.992]
63 We then compute outside scores for bispans under a max-sum semiring (Goodman, 1996). [sent-200, score-0.337]
64 We order states on agendas by the sum of their inside score under the full model and the outside score computed in the coarse pass, pruning all states not within the fixed agenda beam size. [sent-202, score-0.431]
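This agenda ordering reduces to a simple beam over inside-plus-outside scores; a minimal sketch, under the assumption that both score tables are dictionaries keyed by state.

```python
import heapq

def prune_agenda(states, inside_full, outside_coarse, beam_size):
    """Keep the beam_size agenda states with the highest sum of full-model
    inside score and coarse-pass outside score; all others are pruned."""
    return heapq.nlargest(beam_size, states,
                          key=lambda s: inside_full[s] + outside_coarse[s])
```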
65 4.3 Finding Pseudo-Gold ITG Alignments: Equation 3 asks for the block ITG alignment Ag that is closest to a reference alignment At, which may not lie in ITG(e, f). [sent-208, score-0.735]
66 Figure 7: A* search for pseudo-gold ITG alignments uses an admissible heuristic for bispans that counts the number of gold links outside of [k, ℓ) but within [g, h). [sent-209, score-0.752]
67 Above, the heuristic is 1, which is also the minimal number of alignment errors that an ITG alignment will incur using this bispan. [sent-210, score-0.618]
68 Search states, which correspond to bispans [g, h) ⇔ [k, ℓ), are scored by the number of errors within the bispan plus the number of (i, j) ∈ At such that j ∈ [k, ℓ) but i ∉ [g, h) (recall errors). [sent-212, score-0.539]
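The admissible heuristic from Figure 7 transcribes directly into code; the function name and argument order are ours.

```python
def recall_error_heuristic(gold_links, g, h, k, l):
    """Admissible A* heuristic for a bispan [g, h) <=> [k, l): the number of
    gold links whose source index lies inside [k, l) while the target index
    lies outside [g, h). Each such link is a forced recall error for any
    completed ITG alignment built on this bispan."""
    return sum(1 for (i, j) in gold_links if k <= j < l and not (g <= i < h))
```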
69 5 Relationship to Previous Work: Our model is certainly not the first alignment approach to include structures larger than words. [sent-217, score-0.35]
70 Model-based phrase-to-phrase alignment was proposed early in the history of phrase-based translation as a method for training translation models (Marcu and Wong, 2002). [sent-218, score-0.505]
71 The ITG grammar formalism, the corresponding word alignment class, and inference procedures for the class have also been explored extensively (Wu, 1997; Zhang and Gildea, 2005; Cherry and Lin, 2007; Zhang et al. [sent-232, score-0.419]
72 At the intersection of these lines of work, discriminative ITG models have also been proposed, including one-to-one alignment models (Cherry and Lin, 2006) and block models (Haghighi et al. [sent-234, score-0.478]
73 Our model directly extends this research agenda with first-class possible links, overlapping phrasal rule features, and an extraction-level loss function. [sent-236, score-0.485]
74 That work differs from ours in that it uses fixed word alignments and focuses on translation model estimation, while we focus on alignment and translate using standard relative frequency estimators. [sent-238, score-0.659]
75 Deng and Zhou (2009) present an alignment combination technique that uses phrasal features. [sent-239, score-0.548]
76 6 Experiments: We evaluate our extraction set model by the bispans it predicts, the word alignments it generates, and the translations generated by two end-to-end systems. [sent-244, score-0.754]
77 We discriminatively trained a block ITG aligner with only sure links, using block terminal productions up to 3 words by 3 words in size. [sent-258, score-0.522]
78 To remain within the alignment class, MIRA updates this model toward a pseudo-gold alignment with only sure links. [sent-262, score-0.796]
79 We add possible links to the output of the block ITG model by adding the mixed terminal block productions described in Section 2. [sent-265, score-0.559]
80 This model scores overlapping phrasal rules contained within terminal blocks that result from including or excluding possible links. [sent-267, score-0.472]
81 However, this model does not score bispans that cross the bracketing of ITG derivations. [sent-268, score-0.42]
82 Our full model includes possible links and features on extraction sets for phrasal bispans with a maximum size of 3. [sent-270, score-1.019]
83 We performed our discriminative training and alignment evaluations using a hand-aligned portion of the NIST MT02 test set, which consists of 150 training and 191 test sentences (Ayan and Dorr, 2006). [sent-275, score-0.361]
84 We use the alignment error rate (AER) measure: precision is the fraction of sure links in the system output that are sure or possible in the reference, and recall is the fraction of sure links in the reference that the system outputs as sure. [sent-289, score-0.988]
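For reference, a sketch of the standard AER computation matching this precision/recall description; treating the possible set P as a superset of the sure set S is conventional and assumed here.

```python
def alignment_error_rate(sys_links, ref_sure, ref_possible):
    """AER = 1 - (|A & P| + |A & S|) / (|A| + |S|): A is the system's sure
    links, S the reference sure links, and P = S plus reference possibles.
    Precision is |A & P| / |A|; recall is |A & S| / |S|."""
    A, S = set(sys_links), set(ref_sure)
    P = S | set(ref_possible)
    if not A and not S:
        return 0.0
    return 1.0 - (len(A & P) + len(A & S)) / (len(A) + len(S))
```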
85 For this evaluation, possible links produced by our extraction set models are ignored. [sent-290, score-0.338]
86 The second panel gives a phrasal rule-level evaluation, which measures the degree to which these aligners matched the extraction sets of hand-annotated alignments, R3(At). [sent-292, score-0.595]
87 To compete fairly, all models were evaluated on the full extraction sets induced by the word alignments they predicted. [sent-293, score-0.44]
88 First, most of the information needed to predict an extraction set can be inferred from word links and phrasal rules contained within ITG terminal productions. [sent-297, score-0.692]
89 Second, the coarse-to-fine inference may be constraining the full phrasal model to predict similar output to the coarse model. [sent-298, score-0.388]
90 Table 1: Experimental results demonstrate that the full extraction set model outperforms supervised and unsupervised baselines in evaluations of word alignment quality, extraction set quality, and translation. [sent-311, score-0.805]
91 6.3 Translation Experiments: We evaluate the alignments predicted by our model using two publicly available, open-source, state-of-the-art phrase-based translation systems. [sent-316, score-0.35]
92 Both of these systems take word alignments as input, and neither of these systems accepts possible links in the alignments they consume. [sent-321, score-0.595]
93 To interface with our extraction set models, we produced three sets of sure-only alignments from our model predictions: one that omitted possible links, one that converted all possible links to sure links, and one that includes each possible link with 0. [sent-322, score-0.773]
94 The training set we used for MT experiments is quite heterogeneous and noisy compared to our alignment test sets, and the supervised aligners did not handle certain sentence pairs in our parallel corpus well. [sent-325, score-0.477]
95 To account for these issues, we added counts of phrasal rules extracted from the baseline HMM to the counts produced by supervised aligners. [sent-327, score-0.335]
96 In Moses, our extraction set model predicts the set of phrases extracted by the system, and so the estimation techniques for the alignment model and translation model both share a common underlying representation: extraction sets. [sent-328, score-0.921]
97 In Joshua, hierarchical rule extraction is based upon phrasal rule extraction, but abstracts away sub-phrases to create a grammar. [sent-333, score-0.519]
98 7 Conclusion: Our extraction set model serves to coordinate the alignment and translation model components of a statistical translation system by unifying their representations. [sent-338, score-0.752]
99 Moreover, our model provides an effective alternative to phrase alignment models that choose a particular phrase segmentation; instead, we predict many overlapping phrases, both large and small, that are mutually consistent. [sent-339, score-0.521]
100 Soft syntactic constraints for word alignment through discriminative training. [sent-367, score-0.361]
wordName wordTfidf (topN-words)
[('itg', 0.45), ('bispans', 0.337), ('alignment', 0.309), ('phrasal', 0.239), ('alignments', 0.211), ('rn', 0.203), ('bispan', 0.202), ('ag', 0.187), ('links', 0.173), ('extraction', 0.165), ('denero', 0.133), ('block', 0.117), ('sure', 0.111), ('aligners', 0.103), ('translation', 0.098), ('overlapping', 0.077), ('ayan', 0.074), ('haghighi', 0.068), ('koehn', 0.068), ('hmm', 0.063), ('mira', 0.06), ('terminal', 0.058), ('rules', 0.057), ('agenda', 0.055), ('cherry', 0.053), ('productions', 0.053), ('states', 0.053), ('discriminative', 0.052), ('handannotated', 0.051), ('null', 0.048), ('bleu', 0.048), ('phrase', 0.047), ('pruning', 0.046), ('license', 0.046), ('fraser', 0.046), ('chris', 0.045), ('ei', 0.043), ('moses', 0.043), ('joshua', 0.042), ('coarse', 0.042), ('taskar', 0.042), ('zhang', 0.042), ('loss', 0.042), ('score', 0.042), ('model', 0.041), ('chiang', 0.041), ('dan', 0.04), ('ar', 0.039), ('inference', 0.039), ('supervised', 0.039), ('marcu', 0.039), ('inversion', 0.038), ('projection', 0.038), ('grammar', 0.038), ('annual', 0.037), ('transduction', 0.037), ('klein', 0.037), ('sets', 0.037), ('predictions', 0.036), ('conference', 0.035), ('link', 0.035), ('al', 0.034), ('aligner', 0.034), ('predicts', 0.034), ('matchings', 0.034), ('procedures', 0.033), ('association', 0.033), ('daniel', 0.033), ('discriminatively', 0.032), ('projections', 0.032), ('fj', 0.032), ('hao', 0.032), ('rule', 0.031), ('search', 0.031), ('proceedings', 0.031), ('align', 0.031), ('moore', 0.03), ('unsupervised', 0.03), ('adjoining', 0.03), ('agendas', 0.03), ('tfe', 0.03), ('fazil', 0.03), ('necip', 0.03), ('span', 0.029), ('baselines', 0.029), ('unaligned', 0.029), ('aligned', 0.029), ('augmenting', 0.028), ('dorr', 0.028), ('ji', 0.028), ('away', 0.027), ('extractable', 0.027), ('full', 0.027), ('phrases', 0.027), ('och', 0.026), ('hierarchical', 0.026), ('parallel', 0.026), ('toward', 0.026), ('dynamic', 0.026), ('ain', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000017 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
Author: John DeNero ; Dan Klein
Abstract: We present a discriminative model that directly predicts which set of phrasal translation rules should be extracted from a sentence pair. Our model scores extraction sets: nested collections of all the overlapping phrase pairs consistent with an underlying word alignment. Extraction set models provide two principal advantages over word-factored alignment models. First, we can incorporate features on phrase pairs, in addition to word links. Second, we can optimize for an extraction-based loss function that relates directly to the end task of generating translations. Our model gives improvements in alignment quality relative to state-of-the-art unsupervised and supervised baselines, as well as providing up to a 1.4 improvement in BLEU score in Chinese-to-English translation experiments.
2 0.48013783 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment
Author: Shujie Liu ; Chi-Ho Li ; Ming Zhou
Abstract: While Inversion Transduction Grammar (ITG) has regained more and more attention in recent years, it still suffers from the major obstacle of speed. We propose a discriminative ITG pruning framework using Minimum Error Rate Training and various features from previous work on ITG alignment. Experimental results show that it is superior to all existing heuristics in ITG pruning. On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and Bleu score over the baseline alignment system of GIZA++.
3 0.38016853 133 acl-2010-Hierarchical Search for Word Alignment
Author: Jason Riesa ; Daniel Marcu
Abstract: We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of features, and trained on a relatively small amount of annotated data. We report results on Arabic-English word alignment and translation tasks. Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.
4 0.30380481 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment
Author: Vamshi Ambati ; Stephan Vogel ; Jaime Carbonell
Abstract: Semi-supervised word alignment aims to improve the accuracy of automatic word alignment by incorporating full or partial manual alignments. Motivated by standard active learning query sampling frameworks like uncertainty-, margin- and query-by-committee sampling, we propose multiple query strategies for the alignment link selection task. Our experiments show that by active selection of uncertain and informative links, we reduce the overall manual effort involved in elicitation of alignment link data for training a semi-supervised word aligner.
5 0.28399491 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages
Author: Bing Xiang ; Yonggang Deng ; Bowen Zhou
Abstract: We present a novel method to improve word alignment quality and eventually the translation performance by producing and combining complementary word alignments for low-resource languages. Instead of focusing on the improvement of a single set of word alignments, we generate multiple sets of diversified alignments based on different motivations, such as linguistic knowledge, morphology and heuristics. We demonstrate this approach on an English-to-Pashto translation task by combining the alignments obtained from syntactic reordering, stemming, and partial words. The combined alignment outperforms the baseline alignment, with significantly higher F-scores and better translation performance.
6 0.27424291 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
7 0.24555434 170 acl-2010-Letter-Phoneme Alignment: An Exploration
8 0.19153185 147 acl-2010-Improving Statistical Machine Translation with Monolingual Collocation
9 0.1538699 201 acl-2010-Pseudo-Word for Phrase-Based Machine Translation
10 0.14985582 262 acl-2010-Word Alignment with Synonym Regularization
11 0.14604385 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
12 0.14168638 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
13 0.13622048 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
14 0.13538651 169 acl-2010-Learning to Translate with Source and Target Syntax
15 0.10800903 54 acl-2010-Boosting-Based System Combination for Machine Translation
16 0.10642552 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
17 0.10416067 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
18 0.10411645 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
19 0.10394874 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
20 0.10379012 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment
topicId topicWeight
[(0, -0.292), (1, -0.38), (2, -0.054), (3, -0.011), (4, 0.079), (5, 0.101), (6, -0.179), (7, 0.095), (8, 0.148), (9, -0.153), (10, -0.115), (11, -0.147), (12, -0.177), (13, 0.077), (14, -0.069), (15, 0.019), (16, 0.018), (17, 0.012), (18, -0.109), (19, -0.031), (20, 0.061), (21, 0.078), (22, 0.031), (23, -0.047), (24, -0.069), (25, 0.046), (26, -0.049), (27, 0.095), (28, 0.026), (29, -0.01), (30, 0.002), (31, -0.026), (32, 0.005), (33, 0.055), (34, 0.076), (35, 0.043), (36, -0.031), (37, 0.055), (38, -0.009), (39, 0.028), (40, -0.094), (41, 0.107), (42, -0.012), (43, -0.02), (44, 0.04), (45, -0.034), (46, -0.01), (47, -0.056), (48, -0.035), (49, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.95592135 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
Author: John DeNero ; Dan Klein
Abstract: We present a discriminative model that directly predicts which set of phrasal translation rules should be extracted from a sentence pair. Our model scores extraction sets: nested collections of all the overlapping phrase pairs consistent with an underlying word alignment. Extraction set models provide two principal advantages over word-factored alignment models. First, we can incorporate features on phrase pairs, in addition to word links. Second, we can optimize for an extraction-based loss function that relates directly to the end task of generating translations. Our model gives improvements in alignment quality relative to state-of-the-art unsupervised and supervised baselines, as well as providing up to a 1.4 improvement in BLEU score in Chinese-to-English translation experiments.
2 0.95483613 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment
Author: Shujie Liu ; Chi-Ho Li ; Ming Zhou
Abstract: While Inversion Transduction Grammar (ITG) has regained more and more attention in recent years, it still suffers from the major obstacle of speed. We propose a discriminative ITG pruning framework using Minimum Error Rate Training and various features from previous work on ITG alignment. Experimental results show that it is superior to all existing heuristics in ITG pruning. On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and Bleu score over the baseline alignment system of GIZA++.
3 0.86831152 133 acl-2010-Hierarchical Search for Word Alignment
Author: Jason Riesa ; Daniel Marcu
Abstract: We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of features, and trained on a relatively small amount of annotated data. We report results on Arabic-English word alignment and translation tasks. Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.
4 0.84195989 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages
Author: Bing Xiang ; Yonggang Deng ; Bowen Zhou
Abstract: We present a novel method to improve word alignment quality and eventually the translation performance by producing and combining complementary word alignments for low-resource languages. Instead of focusing on the improvement of a single set of word alignments, we generate multiple sets of diversified alignments based on different motivations, such as linguistic knowledge, morphology and heuristics. We demonstrate this approach on an English-to-Pashto translation task by combining the alignments obtained from syntactic reordering, stemming, and partial words. The combined alignment outperforms the baseline alignment, with significantly higher F-scores and better translation performance.
5 0.82169765 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment
Author: Vamshi Ambati ; Stephan Vogel ; Jaime Carbonell
Abstract: Semi-supervised word alignment aims to improve the accuracy of automatic word alignment by incorporating full or partial manual alignments. Motivated by standard active learning query sampling frameworks like uncertainty-, margin- and query-by-committee sampling, we propose multiple query strategies for the alignment link selection task. Our experiments show that by active selection of uncertain and informative links, we reduce the overall manual effort involved in elicitation of alignment link data for training a semi-supervised word aligner.
6 0.81934488 170 acl-2010-Letter-Phoneme Alignment: An Exploration
7 0.69699812 262 acl-2010-Word Alignment with Synonym Regularization
8 0.65258628 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
9 0.61000246 147 acl-2010-Improving Statistical Machine Translation with Monolingual Collocation
10 0.59694403 201 acl-2010-Pseudo-Word for Phrase-Based Machine Translation
11 0.54847372 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
12 0.51891059 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
13 0.46131176 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
14 0.39720017 135 acl-2010-Hindi-to-Urdu Machine Translation through Transliteration
15 0.39501956 246 acl-2010-Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure
16 0.38775989 180 acl-2010-On Jointly Recognizing and Aligning Bilingual Named Entities
17 0.36231551 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
18 0.35862225 169 acl-2010-Learning to Translate with Source and Target Syntax
19 0.35615242 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
20 0.34815261 54 acl-2010-Boosting-Based System Combination for Machine Translation
topicId topicWeight
[(14, 0.022), (16, 0.024), (25, 0.082), (33, 0.014), (39, 0.011), (42, 0.019), (44, 0.013), (59, 0.155), (73, 0.041), (76, 0.018), (78, 0.026), (80, 0.013), (83, 0.086), (84, 0.022), (94, 0.136), (98, 0.2)]
simIndex simValue paperId paperTitle
1 0.95146978 187 acl-2010-Optimising Information Presentation for Spoken Dialogue Systems
Author: Verena Rieser ; Oliver Lemon ; Xingkun Liu
Abstract: We present a novel approach to Information Presentation (IP) in Spoken Dialogue Systems (SDS) using a data-driven statistical optimisation framework for content planning and attribute selection. First we collect data in a Wizard-of-Oz (WoZ) experiment and use it to build a supervised model of human behaviour. This forms a baseline for measuring the performance of optimised policies, developed from this data using Reinforcement Learning (RL) methods. We show that the optimised policies significantly outperform the baselines in a variety of generation scenarios: while the supervised model is able to attain up to 87.6% of the possible reward on this task, the RL policies are significantly better in 5 out of 6 scenarios, gaining up to 91.5% of the total possible reward. The RL policies perform especially well in more complex scenarios. We are also the first to show that adding predictive “lower level” features (e.g. from the NLG realiser) is important for optimising IP strategies according to user preferences. This provides new insights into the nature of the IP problem for SDS.
same-paper 2 0.92987216 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
Author: John DeNero ; Dan Klein
Abstract: We present a discriminative model that directly predicts which set of phrasal translation rules should be extracted from a sentence pair. Our model scores extraction sets: nested collections of all the overlapping phrase pairs consistent with an underlying word alignment. Extraction set models provide two principal advantages over word-factored alignment models. First, we can incorporate features on phrase pairs, in addition to word links. Second, we can optimize for an extraction-based loss function that relates directly to the end task of generating translations. Our model gives improvements in alignment quality relative to state-of-the-art unsupervised and supervised baselines, as well as providing up to a 1.4 improvement in BLEU score in Chinese-to-English translation experiments.
3 0.88482046 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
Author: Xianpei Han ; Jun Zhao
Abstract: The name ambiguity problem has raised urgent demands for efficient, high-quality named entity disambiguation methods. In recent years, the increasing availability of large-scale, rich semantic knowledge sources (such as Wikipedia and WordNet) has created new opportunities to enhance named entity disambiguation by developing algorithms which can best exploit these knowledge sources. The problem is that these knowledge sources are heterogeneous, and most of the semantic knowledge within them is embedded in complex structures, such as graphs and networks. This paper proposes a knowledge-based method, called Structural Semantic Relatedness (SSR), which can enhance named entity disambiguation by capturing and leveraging the structural semantic knowledge in multiple knowledge sources. Empirical results show that, in comparison with classical BOW-based methods and social network based methods, our method can significantly improve disambiguation performance by 8.7% and 14.7%, respectively.
4 0.88140464 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
Author: Sujith Ravi ; Jason Baldridge ; Kevin Knight
Abstract: We combine two complementary ideas for learning supertaggers from highly ambiguous lexicons: grammar-informed tag transitions and models minimized via integer programming. Each strategy on its own greatly improves performance over basic expectation-maximization training with a bitag Hidden Markov Model, which we show on the CCGbank and CCG-TUT corpora. The strategies provide further error reductions when combined. We describe a new two-stage integer programming strategy that efficiently deals with the high degree of ambiguity on these datasets while obtaining the full effect of model minimization.
5 0.88033605 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
Author: Zhiyang Wang ; Yajuan Lv ; Qun Liu ; Young-Sook Hwang
Abstract: This paper presents a novel filtration criterion to restrict the rule extraction for the hierarchical phrase-based translation model, where a bilingual but relaxed wellformed dependency restriction is used to filter out bad rules. Furthermore, a new feature which describes the regularity that the source/target dependency edge triggers the target/source word is also proposed. Experimental results show that, the new criteria weeds out about 40% rules while with translation performance improvement, and the new feature brings an- other improvement to the baseline system, especially on larger corpus.
6 0.87893796 169 acl-2010-Learning to Translate with Source and Target Syntax
7 0.87773502 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment
8 0.8770256 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
10 0.87459916 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
11 0.87375641 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network
12 0.87328333 144 acl-2010-Improved Unsupervised POS Induction through Prototype Discovery
13 0.87180567 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future
14 0.87176442 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
15 0.87173831 133 acl-2010-Hierarchical Search for Word Alignment
16 0.87044895 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
17 0.86950773 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
18 0.86929107 79 acl-2010-Cross-Lingual Latent Topic Extraction
19 0.86870527 114 acl-2010-Faster Parsing by Supertagger Adaptation
20 0.86837894 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts