acl acl2013 acl2013-363 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hendra Setiawan ; Bowen Zhou ; Bing Xiang ; Libin Shen
Abstract: Long distance reordering remains one of the greatest challenges in statistical machine translation research as the key contextual information may well be beyond the confine of translation units. In this paper, we propose Two-Neighbor Orientation (TNO) model that jointly models the orientation decisions between anchors and two neighboring multi-unit chunks which may cross phrase or rule boundaries. We explicitly model the longest span of such chunks, referred to as Maximal Orientation Span, to serve as a global parameter that constrains underlying local decisions. We integrate our proposed model into a state-of-the-art string-to-dependency translation system and demonstrate the efficacy of our proposal in a large-scale Chinese-to-English translation task. On NIST MT08 set, our most advanced model brings around +2.0 BLEU and -1.0 TER improvement.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Long distance reordering remains one of the greatest challenges in statistical machine translation research as the key contextual information may well be beyond the confine of translation units. [sent-5, score-0.391]
2 In this paper, we propose Two-Neighbor Orientation (TNO) model that jointly models the orientation decisions between anchors and two neighboring multi-unit chunks which may cross phrase or rule boundaries. [sent-6, score-0.755]
3 We explicitly model the longest span of such chunks, referred to as Maximal Orientation Span, to serve as a global parameter that constrains underlying local decisions. [sent-7, score-0.172]
4 We integrate our proposed model into a state-of-the-art string-to-dependency translation system and demonstrate the efficacy of our proposal in a large-scale Chinese-to-English translation task. [sent-8, score-0.176]
5 1 Introduction Long distance reordering remains one of the greatest challenges in Statistical Machine Translation (SMT) research. [sent-12, score-0.199]
6 The challenge stems from the fact that an accurate reordering hinges upon the model’s ability to make many local and global reordering decisions accurately. [sent-13, score-0.524]
7 Often, such reordering decisions require contexts that span across multiple translation units. [sent-14, score-0.385]
8 Unfortunately, previous approaches fall short in capturing such cross-unit contextual information that could be … (Footnote 1: We define translation units as phrases in phrase-based SMT, and as translation rules in syntax-based SMT.) [sent-15, score-0.168]
9 Specifically, the popular distortion or lexicalized reordering models in phrase-based SMT focus only on making good local predictions (i.e. [sent-17, score-0.23]
10 predicting the orientation of immediate neighboring translation units), while translation rules in syntax-based SMT come with a strong context-free assumption, which models only the reordering within the confines of the rules. [sent-19, score-0.653]
11 In this paper, we argue that reordering modeling would greatly benefit from richer cross-boundary contextual information. We introduce a reordering model that incorporates such contextual information, named the Two-Neighbor Orientation (TNO) model. [sent-20, score-0.529]
12 We first identify anchors as regions in the source sentences around which ambiguous reordering patterns frequently occur and chunks as regions that are consistent with word alignment which may span multiple translation units at decoding time. [sent-21, score-0.763]
13 Most notably, anchors and chunks in our model may not necessarily respect the boundaries of translation units. [sent-22, score-0.495]
14 Then, we jointly model the orientations of chunks that immediately precede and follow the anchors (hence, the name “two-neighbor”) along with the maximal span of these chunks, to which we refer as Maximal Orientation Span (MOS). [sent-23, score-0.509]
15 As we will elaborate further in the next sections, our models provide a stronger mechanism to make more accurate global reordering decisions for the following reasons. [sent-24, score-0.294]
16 First of all, we consider the orientation decisions on both sides of the anchors simultaneously, in contrast to existing works that only consider one-sided decisions. [sent-25, score-0.534]
17 In this way, we hope to upgrade the unigram formulation of existing reordering models to a higher order formulation. [sent-26, score-0.199]
18 Second of all, we capture the reordering of chunks that may cross translation units and may be composed of multiple units, in contrast to existing works that focus on the reordering between individual translation units. [sent-27, score-0.392] [sent-29, score-0.267]
20 In effect, MOS acts as a global reordering parameter that guides or constrains the underlying local reordering decisions. [sent-30, score-0.468]
21 To show the effectiveness of our model, we integrate our TNO model into a state-of-the-art syntax-based SMT system, which uses synchronous context-free grammar (SCFG) rules to jointly model reordering and lexical translation. [sent-31, score-0.279]
22 However, as mentioned earlier, the context-free assumption ingrained in the syntax-based formalism often limits the model’s ability to influence global reordering decisions that involve cross-boundary contexts. [sent-33, score-0.238]
23 In integrating TNO, we hope to strengthen syntax-based system’s ability to make more accurate global reordering decisions. [sent-34, score-0.238]
24 We implement an efficient shift-reduce algorithm that facilitates the accumulation of partial context in a bottom-up fashion, allowing our model to influence the translation process even in the absence of full context. [sent-37, score-0.132]
25 We show the efficacy of our proposal in a large-scale Chinese-to-English translation task where the introduction of our TNO model provides a significant gain over a state-of-the-art string-to-dependency SMT system (Shen et al. [sent-38, score-0.108]
26 We define anchors as chunks around which ambiguous reordering patterns frequently occur. [sent-50, score-0.423]
27 Then, the orientations of CL and CR are OL(CL, a) and OR(CR, a) respectively, where each neighbor (f_{j3}^{j4}/e_{i3}^{i4}) ∈ CL(a) or (f_{j5}^{j6}/e_{i5}^{i6}) ∈ CR(a) may take one of the following four orientation values (similar to (Nagata et al. [sent-55, score-0.508]
28 The first clause (monotone, reverse) for OL indicates whether the target order of the chunks follows the source order; the second (adjacent, gap) indicates whether the chunks are adjacent or separated by an intervening phrase when projected. [sent-61, score-0.321]
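To make the four-way taxonomy concrete, here is a minimal Python sketch (our illustration, not the authors’ code) that classifies a neighbor chunk’s orientation from inclusive source/target spans; the span encoding, the adjacency test, and the concrete spans in the example are assumptions based on the definitions above.

def orientation(chunk_src, chunk_tgt, anchor_src, anchor_tgt):
    """Classify a neighbor chunk's orientation w.r.t. an anchor.
    All arguments are inclusive (start, end) spans; returns one of
    'MA', 'MG', 'RA', 'RG' (monotone/reverse x adjacent/gap)."""
    src_before = chunk_src[1] < anchor_src[0]   # chunk precedes anchor on the source side
    tgt_before = chunk_tgt[1] < anchor_tgt[0]   # chunk precedes anchor on the target side
    mono = (src_before == tgt_before)           # target order follows source order
    if tgt_before:
        adjacent = chunk_tgt[1] + 1 == anchor_tgt[0]
    else:
        adjacent = anchor_tgt[1] + 1 == chunk_tgt[0]
    return ('M' if mono else 'R') + ('A' if adjacent else 'G')

# A left neighbor (source span 3-6) whose projection (target 8-11) lands
# right after the anchor (source 7, target 7) is Reverse Adjacent:
print(orientation((3, 6), (8, 11), (7, 7), (7, 7)))  # -> 'RA'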
29 From Figs. 1 and 2, we can infer that a has three left neighbors and four right neighbors, i.e. [sent-68, score-0.158]
30 From Fig. 1, we can compute the orientation values of each of these neighbors, which are OL(CL(a), a) = RG, RA, RA and OR(CR(a), a) = RG, RA, RA, RA. [sent-72, score-0.254]
31 As shown, most of the neighbors have Reverse Adjacent (RA) orientation except for the smallest left and right neighbors (i.e. [sent-73, score-0.472]
32 To make the TNO model more tractable, we simplify the TNO model to consider only the largest left and right neighbors, referred to as the Maximal Orientation Span/MOS (M). [sent-79, score-0.138]
33 More formally, given a = (f_{j1}^{j2}/e_{i1}^{i2}), the left and the right MOS of a are: ML(a) = argmax over (f_{j3}^{j4}/e_{i3}^{i4}) ∈ CL(a) of (j4 − j3), and MR(a) = argmax over (f_{j5}^{j6}/e_{i5}^{i6}) ∈ CR(a) of (j6 − j5). Coming back to our example, the left and right MOS of the anchor are ML(a) = (f_3^6/e_8^{11}) and MR(a) = (f_8^{11}/e_3^6). [sent-80, score-0.298]
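The argmax above translates directly into code; the following sketch assumes each candidate chunk is a ((j_start, j_end), (i_start, i_end)) pair, and the candidate spans in the example are illustrative rather than taken from the paper.

def maximal_orientation_span(candidates):
    """Among neighbor chunks, return the one with the widest source span,
    i.e. the argmax of (j_end - j_start); None if there is no candidate."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0][1] - c[0][0])

# Illustrative left-neighbor candidates of an anchor:
left_neighbors = [((5, 6), (8, 9)), ((4, 6), (8, 10)), ((3, 6), (8, 11))]
print(maximal_orientation_span(left_neighbors))  # -> ((3, 6), (8, 11)), i.e. ML(a)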
34 Beyond simplifying the computation, the key benefit of modeling MOS is that it serves as a global parameter that can guide or constrain underlying local reorderings. [sent-86, score-0.097]
35 Xd → ⟨X1 de7 shaoshu8 guojia9 zhi10 yi11 , one11 of10 the few8 countries9 that7 X1⟩. This set of hierarchical phrases represents a translation model that has resolved all local ambiguities (i.e. [sent-90, score-0.184]
36 local reordering and lexical mappings) except for the spans of the hierarchical phrases. [sent-92, score-0.317]
37 With this example, we want to show that accurate local decisions (rather obviously) don’t always lead to accurate global reordering and to demonstrate that explicit MOS modeling can play a crucial role to address this issue. [sent-93, score-0.352]
38 To do so, we will again focus on the same anchor de (that). [sent-94, score-0.102]
39 (Footnote 3) We use a hierarchical phrase-based translation system as a case in point, but the merit is generalizable to other systems. [sent-95, score-0.113]
40 Table 2: Derivation of Xa ≺ Xb ≺ Xd. As the rule’s identifier, we attach an alphabet letter to each rule’s left-hand side; as such, the anchor de (that) appears in rule Xd. [sent-100, score-0.213]
41 We also attach the word indices as the superscript of the source words and project the indices to the aligned target words; as such, “have5” indicates that the word “have” is aligned to the 5th source word, i.e. [sent-101, score-0.149]
42 The application of the rules would show that the first derivation will produce an incorrect reordering while the last two will produce the correct ones. [sent-112, score-0.258]
43 Here, we would like to point out that even in this simple example, where all local decisions are made accurately, this ambiguity occurs; it would occur even more in a real translation task, where local decisions may be highly inaccurate. [sent-113, score-0.242]
44 Particularly, we want to show that the MOS generated by the incorrect derivation does not match the MOS learnt from Fig. [sent-116, score-0.092]
45 Running the same MOS extraction procedure on both derivations would produce the right MOS that agrees with the right MOS previously learnt from Fig. [sent-123, score-0.153]
46 As shown, the incorrect derivation produces a left MOS that spans six words, i.e. [sent-128, score-0.156]
47 (f_1^6/e_1^6), while the correct derivation produces a left MOS that spans four words, i.e. [sent-130, score-0.156]
48 Clearly, the MOS of the incorrect derivation doesn’t agree with the MOS we learnt from Fig. [sent-133, score-0.092]
49 This suggests that explicit MOS modeling would provide a mechanism for resolving crucial global reordering ambiguities that are beyond the ability of local models. [sent-135, score-0.296]
50 In Tables 1 and 2’s full derivations, we indicate rule boundaries explicitly by indexing the angle brackets, e.g. [sent-137, score-0.094]
51 As the anchor appears in Xd, we highlight its boundaries in box frames. [sent-142, score-0.14]
52 de (that)’s MOS respects rule boundaries if and only if all the words come entirely from Xd’s antecedent, or ⟨d and ⟩d appear outside of the MOS; otherwise it crosses the rule boundaries. [sent-143, score-0.15]
53 As clearly shown in Table 2, the left MOS of the correct derivation (underlined) crosses the rule boundary (of Xd) since ⟨d appears within the MOS. [sent-144, score-0.17]
54 Model 2 is designed to address the deficiency of Model 1, since Model 1 may assign non-zero probability to improbable assignments of orientation values, e.g. [sent-166, score-0.254]
55 Monotone Adjacent for the left neighbor and Reverse Adjacent for the right neighbor. [sent-168, score-0.125]
56 Before describing the specifics, we start by describing the procedure to extract anchors and their corresponding MOS from training data, from which we collect statistics and extract features to train the model. [sent-173, score-0.224]
57 For each aligned sentence pair (F, E, ∼) in the training data, the training starts with the identification of the regions in the source sentences as anchors (A). [sent-174, score-0.282]
58 For our Chinese-English experiments, we use a simple heuristic that equates, as anchors, single-word chunks whose corresponding word class belongs to closed-word classes, bearing a close resemblance to (Setiawan et al. [sent-175, score-0.125]
59 Next we generate all possible chunks ∆(Θ) as previously described in Sec. [sent-178, score-0.125]
60 We then define a function MinC(∆, j1,j2) which returns the shortest chunk that can span from j1 to j2. [sent-180, score-0.106]
61 If such a chunk is in ∆, then MinC returns it. The algorithm to extract MOS takes ∆ and an anchor a = (f_{j1}^{j2}/e_{i1}^{i2}) as input, and outputs the chunk that qualifies as MOS, or none. [sent-181, score-0.146]
62 Alg. 1 provides the algorithm to extract the right MOS; the algorithm to extract the left MOS is identical to Alg. [sent-184, score-0.098]
63 1, except that it scans for chunks to the left of the anchor. [sent-185, score-0.18]
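Since Alg. 1 itself is not reproduced on this page, the following Python sketch shows one plausible reading of MinC and the right-MOS scan described above; the chunk encoding matches the earlier sketches and is an assumption, not the authors’ exact algorithm.

def min_chunk(chunks, j1, j2):
    """MinC: shortest chunk whose source span covers [j1, j2], else None."""
    covering = [c for c in chunks if c[0][0] <= j1 and j2 <= c[0][1]]
    return min(covering, key=lambda c: c[0][1] - c[0][0]) if covering else None

def right_mos(chunks, anchor):
    """Largest chunk starting immediately to the right of the anchor's
    source span; the left-MOS variant scans to the left instead."""
    (a_j1, a_j2), _ = anchor
    right = [c for c in chunks if c[0][0] == a_j2 + 1]
    return max(right, key=lambda c: c[0][1] - c[0][0]) if right else None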
64 1268 To estimate POL and POR, we train discriminative classifiers that predict the orientation values and use the normalized posteriors at decoding time as additional feature scores in SMT’s log linear framework. [sent-189, score-0.293]
65 anchor-related: slex (the actual word of the anchor), spos (part-of-speech (POS) tag of slex), sparent (spos’s parent in the parse tree), tlex (slex’s actual target word). [sent-192, score-0.371]
66 surrounding: lslex (the previous word, f_{j1−1}), rslex (the next word, f_{j2+1}), lspos (lslex’s POS tag), rspos (rslex’s POS tag), lsparent (lslex’s parent), rsparent (rslex’s parent). [sent-195, score-0.129]
67 conjunction of the lexical form of the anchor and the lexical forms of the left and the right anchors. [sent-205, score-0.371]
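As a rough illustration, the classifier features listed above can be assembled into a sparse feature dictionary as below; the exact feature templates, dictionary keys, and input encoding are assumptions, not the authors’ feature extractor.

def orientation_features(anchor, left, right):
    """anchor/left/right are dicts with 'word', 'pos', 'parent' keys
    (the anchor also carries 'target'); this encoding is hypothetical."""
    f = {}
    # anchor-related features
    f['slex=' + anchor['word']] = 1.0
    f['spos=' + anchor['pos']] = 1.0
    f['sparent=' + anchor['parent']] = 1.0
    f['tlex=' + anchor['target']] = 1.0
    # surrounding-word features (f_{j1-1} and f_{j2+1})
    for side, w in (('ls', left), ('rs', right)):
        f[side + 'lex=' + w['word']] = 1.0
        f[side + 'pos=' + w['pos']] = 1.0
        f[side + 'parent=' + w['parent']] = 1.0
    # conjunction of the anchor's lexical form with its neighbors'
    f['conj=' + left['word'] + '|' + anchor['word'] + '|' + right['word']] = 1.0
    return f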
68 To avoid the sparsity issue, we represent ML as (mosl_int_spos, mosl_ext_spos) and MR as (mosr_int_spos, mosr_ext_spos). [sent-213, score-0.292]
69 We condition PML and PMR only on spos and the orientation, estimating them as follows: P(ML|spos, OL) = N(ML, spos, OL) / N(spos, OL) and P(MR|spos, OR) = N(MR, spos, OR) / N(spos, OR), where N returns the count of the events in the training data. [sent-214, score-0.164]
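A minimal sketch of the relative-frequency estimates above, with the MOS represented by its sparsity-reduced encoding; the event tuples in the example are illustrative.

from collections import Counter

def estimate_pml(events):
    """events: (ml, spos, ol) tuples observed in training data.
    Returns P(ml | spos, ol) = N(ml, spos, ol) / N(spos, ol)."""
    joint, marginal = Counter(), Counter()
    for ml, spos, ol in events:
        joint[(ml, spos, ol)] += 1
        marginal[(spos, ol)] += 1
    def pml(ml, spos, ol):
        denom = marginal[(spos, ol)]
        return joint[(ml, spos, ol)] / denom if denom else 0.0
    return pml

# ml here is the reduced (mosl_int_spos, mosl_ext_spos) pair:
pml = estimate_pml([(('NN', 'P'), 'DEC', 'RA'),
                    (('NN', 'P'), 'DEC', 'RA'),
                    (('VV', 'P'), 'DEC', 'RA')])
print(pml(('NN', 'P'), 'DEC', 'RA'))  # -> 0.666...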
70 Since many reordering decisions may have been made at the earlier stages, the late application of the TNO model would limit the utility of the model. [sent-219, score-0.295]
71 The shift operation advances the input stream by one symbol and pushes the symbol onto the stack, while the reduce operation applies some reduction rule to the topmost elements of the stack. [sent-223, score-0.148]
72 In our case, the input stream is the target string of the rule and the symbols are the corresponding source indices of the elements of the target string. [sent-225, score-0.167]
73 The reduction rule looks at two indices and merges them if they are adjacent (i.e. [sent-226, score-0.145]
74 It then projects the source index to the corresponding target word and then enumerates the target string in a left-to-right fashion. [sent-232, score-0.15]
75 For example, when the algorithm reads Xd at line (6), it pushes the entire stack from line (5). [sent-237, score-0.108]
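The shift-reduce pass described above can be sketched as follows; merging inclusive source-index ranges is our reading of the reduction rule, not the authors’ implementation, and the example indices are illustrative.

def shift_reduce(symbols):
    """symbols: inclusive source-index ranges, in target order.
    Shift each range onto a stack and reduce (merge) whenever the two
    topmost ranges are adjacent on the source side."""
    stack = []
    for lo, hi in symbols:
        stack.append((lo, hi))
        while len(stack) >= 2:
            (a_lo, a_hi), (b_lo, b_hi) = stack[-2], stack[-1]
            if a_hi + 1 == b_lo:        # monotone-adjacent on the source side
                stack[-2:] = [(a_lo, b_hi)]
            elif b_hi + 1 == a_lo:      # reverse-adjacent on the source side
                stack[-2:] = [(b_lo, a_hi)]
            else:
                break                   # partial context: wait for more input
    return stack

# Two non-adjacent spans stay on the stack until a later rule supplies
# the span that joins them, accumulating partial context bottom-up:
print(shift_reduce([(5, 5), (2, 3)]))          # -> [(5, 5), (2, 3)]
print(shift_reduce([(5, 5), (2, 3), (4, 4)]))  # -> [(2, 5)]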
76 For example, the orientation value of de (that)’s left neighbor is always RA. [sent-245, score-0.336]
77 This statement holds, even though at the end of Section 2, we stated that de (that)’s left neighbor may have other orientation values, i.e. [sent-246, score-0.336]
78 The formal proof is omitted, but the intuition comes from the fact that the derivations for SCFG-based translation are a subset of ∆(Θ) and that (f_6^6/e_9^9) will never become ML(a) or MinC(CL(a), a), respectively (the chunk that spans a and CL). [sent-249, score-0.144]
79 In addition to the standard features including the rule translation probabilities, we incorporate features that are found useful for developing a state-of-the-art baseline, such as the provenance features (Chiang et al. [sent-256, score-0.124]
80 As the backbone of our string-to-dependency system, we train 3-gram models for left and right dependencies and a unigram model for the head, using the target side of the bilingual training data. [sent-260, score-0.124]
81 As shown, the empirical results confirm our intuition that SMT can greatly benefit from a reordering model that incorporates cross-unit contextual information. [sent-280, score-0.271]
82 We conjecture that the weblog text has more ambiguous orientation spans that are more challenging to learn. [sent-299, score-0.365]
83 Our TNO model is closely related to the Unigram Orientation Model (UOM) (Tillman, 2004), which is the de facto reordering model of phrase-based SMT (Koehn et al. [sent-308, score-0.279]
84 UOM views reordering as a process of generating (b, o) in a left-to-right fashion, where b is the current phrase pair and o is the orientation of b with respect to the previously generated phrase pair b′. [sent-310, score-0.453]
85 Our MOS concept is also closely related to the hierarchical reordering model (Galley and Manning, 2008) in phrase-based decoding, which computes o of b with respect to a multi-block unit that may go beyond b′. [sent-320, score-0.284]
86 They mainly use it to avoid overestimating the “discontiguous” orientation but fall short in modeling the multi-block unit, perhaps due to data sparsity issues. [sent-321, score-0.281]
87 Our MOS is also closely related to the efforts of modeling the span of hierarchical phrases in formally syntax-based SMT. [sent-322, score-0.134]
88 Recent work couples span modeling tightly with reordering decisions, either by adding an additional feature for each hierarchical phrase (Chiang et al. [sent-327, score-0.333]
89 In equating anchors with the function word class, our work, particularly Model 1, is closely related to the function word-centered model of Setiawan et al. [sent-334, score-0.264]
90 The goal of PM is to reorder the input sentence F into F′ whose order is closer to the target language order, whereas the goal of our model is to directly reorder F into the target language order. [sent-342, score-0.092]
91 9 Conclusion We presented a novel approach to address a kind of long-distance reordering that requires global cross-boundary contextual information. [sent-344, score-0.27]
92 Our approach, which we formulate as a Two-Neighbor Orientation model, includes the joint modeling of two orientation decisions and the modeling of the maximal span of the reordered chunks through the concept of Maximal Orientation Span. [sent-345, score-0.609]
93 Empirical results confirm our intuition that incorporating cross-boundary contextual information improves translation quality. [sent-347, score-0.1]
94 In the future, we hope to continue this line of research, perhaps by learning to identify anchors automatically from training data, incorporating a richer set of linguistic features such as dependency structure, and strengthening the modeling of Maximal Orientation Span. [sent-349, score-0.284]
95 Source-side dependency tree reordering models with subtree movements and constraints. [sent-354, score-0.199]
96 Automatically learning sourceside reordering rules for large scale machine translation. [sent-402, score-0.199]
97 Soft syntactic constraints for hierarchical phrase-based translation using latent syntactic distributions. [sent-413, score-0.113]
98 A clustered global phrase reordering model for statistical machine translation. [sent-425, score-0.278]
99 A new string-to-dependency machine translation algorithm with a target dependency language model. [sent-445, score-0.094]
100 Discriminative reordering models for statistical machine translation. [sent-496, score-0.199]
wordName wordTfidf (topN-words)
[('mos', 0.567), ('tno', 0.263), ('orientation', 0.254), ('anchors', 0.224), ('reordering', 0.199), ('mr', 0.165), ('spos', 0.164), ('pol', 0.136), ('ml', 0.129), ('cl', 0.128), ('chunks', 0.125), ('ol', 0.125), ('slex', 0.119), ('cr', 0.119), ('xd', 0.111), ('anchor', 0.102), ('xb', 0.092), ('mosl', 0.089), ('mosr', 0.089), ('por', 0.085), ('smt', 0.082), ('ext', 0.08), ('pml', 0.079), ('pmr', 0.075), ('translation', 0.068), ('int', 0.066), ('setiawan', 0.065), ('span', 0.062), ('neighbors', 0.06), ('minc', 0.06), ('xa', 0.06), ('derivation', 0.059), ('maximal', 0.058), ('rule', 0.056), ('decisions', 0.056), ('xc', 0.055), ('left', 0.055), ('xiong', 0.054), ('ter', 0.052), ('bleu', 0.05), ('weblog', 0.049), ('hierarchical', 0.045), ('rslex', 0.045), ('adjacent', 0.045), ('indices', 0.044), ('reverse', 0.044), ('chunk', 0.044), ('right', 0.043), ('spans', 0.042), ('stack', 0.042), ('haizhou', 0.04), ('uom', 0.04), ('shen', 0.04), ('model', 0.04), ('global', 0.039), ('decoding', 0.039), ('boundaries', 0.038), ('monotone', 0.036), ('rg', 0.035), ('ra', 0.035), ('aligned', 0.035), ('conditioning', 0.034), ('derivations', 0.034), ('symbol', 0.033), ('line', 0.033), ('hendra', 0.033), ('learnt', 0.033), ('contextual', 0.032), ('association', 0.032), ('tag', 0.032), ('local', 0.031), ('newswire', 0.03), ('zollmann', 0.03), ('deyi', 0.03), ('bom', 0.03), ('ifr', 0.03), ('koreaii', 0.03), ('lanchorslex', 0.03), ('ptno', 0.03), ('ranchorslex', 0.03), ('sparent', 0.03), ('twoneighbor', 0.03), ('chiang', 0.029), ('pos', 0.028), ('libin', 0.028), ('modeling', 0.027), ('fj', 0.027), ('neighbor', 0.027), ('tromble', 0.026), ('singapore', 0.026), ('stream', 0.026), ('target', 0.026), ('marton', 0.026), ('facilitates', 0.024), ('confine', 0.024), ('visweswariah', 0.024), ('intersects', 0.024), ('min', 0.024), ('regions', 0.023), ('boulder', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 363 acl-2013-Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
Author: Hendra Setiawan ; Bowen Zhou ; Bing Xiang ; Libin Shen
Abstract: Long distance reordering remains one of the greatest challenges in statistical machine translation research as the key contextual information may well be beyond the confine of translation units. In this paper, we propose Two-Neighbor Orientation (TNO) model that jointly models the orientation decisions between anchors and two neighboring multi-unit chunks which may cross phrase or rule boundaries. We explicitly model the longest span of such chunks, referred to as Maximal Orientation Span, to serve as a global parameter that constrains underlying local decisions. We integrate our proposed model into a state-of-the-art string-to-dependency translation system and demonstrate the efficacy of our proposal in a large-scale Chinese-to-English translation task. On NIST MT08 set, our most advanced model brings around +2.0 BLEU and -1.0 TER improvement.
2 0.17608009 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation
Author: Karthik Visweswariah ; Mitesh M. Khapra ; Ananthakrishnan Ramanathan
Abstract: Preordering of a source language sentence to match target word order has proved to be useful for improving machine translation systems. Previous work has shown that a reordering model can be learned from high quality manual word alignments to improve machine translation performance. In this paper, we focus on further improving the performance of the reordering model (and thereby machine translation) by using a larger corpus of sentence aligned data for which manual word alignments are not available but automatic machine generated alignments are available. The main challenge we tackle is to generate quality data for training the reordering model in spite of the machine alignments being noisy. To mitigate the effect of noisy machine alignments, we propose a novel approach that improves reorderings produced given noisy alignments and also improves word alignments using information from the reordering model. This approach generates alignments that are 2.6 f-Measure points better than a baseline supervised aligner. The data generated allows us to train a reordering model that gives an improvement of 1.8 BLEU points on the NIST MT-08 Urdu-English evaluation set over a reordering model that only uses manual word alignments, and a gain of 5.2 BLEU points over a standard phrase-based baseline.
3 0.16528866 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation
Author: Minwei Feng ; Jan-Thorsten Peter ; Hermann Ney
Abstract: In this paper, we propose a novel reordering model based on sequence labeling techniques. Our model converts the reordering problem into a sequence labeling problem, i.e. a tagging task. Results on five Chinese-English NIST tasks show that our model improves the baseline system by 1.32 BLEU and 1.53 TER on average. Results of comparative study with other seven widely used reordering models will also be reported.
4 0.15595639 166 acl-2013-Generalized Reordering Rules for Improved SMT
Author: Fei Huang ; Cezar Pendus
Abstract: We present a simple yet effective approach to syntactic reordering for Statistical Machine Translation (SMT). Instead of solely relying on the top-1 best-matching rule for source sentence preordering, we generalize fully lexicalized rules into partially lexicalized and unlexicalized rules to broaden the rule coverage. Furthermore, we consider multiple permutations of all the matching rules, and select the final reordering path based on the weighted sum of reordering probabilities of these rules. Our experiments in English-Chinese and English-Japanese translations demonstrate the effectiveness of the proposed approach: we observe consistent and significant improvement in translation quality across multiple test sets in both language pairs, judged by both humans and an automatic metric.
5 0.12934498 200 acl-2013-Integrating Phrase-based Reordering Features into a Chart-based Decoder for Machine Translation
Author: ThuyLinh Nguyen ; Stephan Vogel
Abstract: Hiero translation models have two limitations compared to phrase-based models: 1) Limited hypothesis space; 2) No lexicalized reordering model. We propose an extension of Hiero called Phrasal-Hiero to address Hiero’s second problem. Phrasal-Hiero still has the same hypothesis space as the original Hiero but incorporates a phrase-based distance cost feature and lexicalized reordering features into the chart decoder. The work consists of two parts: 1) for each Hiero translation derivation, find its corresponding discontinuous phrase-based path. 2) Extend the chart decoder to incorporate features from the phrase-based path. We achieve significant improvement over both Hiero and phrase-based baselines for Arabic-English, Chinese-English and German-English translation.
6 0.11613262 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
7 0.097472221 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
8 0.094750904 203 acl-2013-Is word-to-phone mapping better than phone-phone mapping for handling English words?
9 0.094301268 10 acl-2013-A Markov Model of Machine Translation using Non-parametric Bayesian Inference
10 0.090583913 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers
11 0.083747759 314 acl-2013-Semantic Roles for String to Tree Machine Translation
12 0.080831707 195 acl-2013-Improving machine translation by training against an automatic semantic frame based evaluation metric
13 0.078363426 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding
14 0.077447437 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
15 0.077065237 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation
16 0.074159786 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation
17 0.073922612 77 acl-2013-Can Markov Models Over Minimal Translation Units Help Phrase-Based SMT?
18 0.073199041 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation
19 0.07145986 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT
20 0.067050859 221 acl-2013-Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines
topicId topicWeight
[(0, 0.181), (1, -0.118), (2, 0.086), (3, 0.084), (4, -0.046), (5, 0.047), (6, 0.029), (7, 0.006), (8, -0.006), (9, 0.05), (10, -0.027), (11, 0.036), (12, 0.001), (13, 0.001), (14, 0.044), (15, 0.032), (16, 0.086), (17, 0.043), (18, 0.002), (19, 0.009), (20, -0.085), (21, 0.007), (22, 0.032), (23, -0.104), (24, 0.047), (25, 0.006), (26, -0.017), (27, -0.083), (28, -0.151), (29, -0.003), (30, 0.001), (31, -0.007), (32, 0.013), (33, -0.011), (34, -0.004), (35, 0.002), (36, 0.033), (37, 0.03), (38, -0.021), (39, 0.05), (40, 0.044), (41, -0.071), (42, -0.016), (43, -0.022), (44, -0.063), (45, -0.001), (46, 0.062), (47, -0.011), (48, 0.018), (49, -0.016)]
simIndex simValue paperId paperTitle
1 0.90258831 166 acl-2013-Generalized Reordering Rules for Improved SMT
Author: Fei Huang ; Cezar Pendus
Abstract: We present a simple yet effective approach to syntactic reordering for Statistical Machine Translation (SMT). Instead of solely relying on the top-1 best-matching rule for source sentence preordering, we generalize fully lexicalized rules into partially lexicalized and unlexicalized rules to broaden the rule coverage. Furthermore, we consider multiple permutations of all the matching rules, and select the final reordering path based on the weighted sum of reordering probabilities of these rules. Our experiments in English-Chinese and English-Japanese translations demonstrate the effectiveness of the proposed approach: we observe consistent and significant improvement in translation quality across multiple test sets in both language pairs, judged by both humans and an automatic metric.
same-paper 2 0.89084387 363 acl-2013-Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
Author: Hendra Setiawan ; Bowen Zhou ; Bing Xiang ; Libin Shen
Abstract: Long distance reordering remains one of the greatest challenges in statistical machine translation research as the key contextual information may well be beyond the confine of translation units. In this paper, we propose Two-Neighbor Orientation (TNO) model that jointly models the orientation decisions between anchors and two neighboring multi-unit chunks which may cross phrase or rule boundaries. We explicitly model the longest span of such chunks, referred to as Maximal Orientation Span, to serve as a global parameter that constrains underlying local decisions. We integrate our proposed model into a state-of-the-art string-to-dependency translation system and demonstrate the efficacy of our proposal in a large-scale Chinese-to-English translation task. On NIST MT08 set, our most advanced model brings around +2.0 BLEU and -1.0 TER improvement.
3 0.87831599 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation
Author: Karthik Visweswariah ; Mitesh M. Khapra ; Ananthakrishnan Ramanathan
Abstract: Preordering of a source language sentence to match target word order has proved to be useful for improving machine translation systems. Previous work has shown that a reordering model can be learned from high quality manual word alignments to improve machine translation performance. In this paper, we focus on further improving the performance of the reordering model (and thereby machine translation) by using a larger corpus of sentence aligned data for which manual word alignments are not available but automatic machine generated alignments are available. The main challenge we tackle is to generate quality data for training the reordering model in spite of the machine alignments being noisy. To mitigate the effect of noisy machine alignments, we propose a novel approach that improves reorderings produced given noisy alignments and also improves word alignments using information from the reordering model. This approach generates alignments that are 2.6 f-Measure points better than a baseline supervised aligner. The data generated allows us to train a reordering model that gives an improvement of 1.8 BLEU points on the NIST MT-08 Urdu-English evaluation set over a reordering model that only uses manual word alignments, and a gain of 5.2 BLEU points over a standard phrase-based baseline.
4 0.87509733 200 acl-2013-Integrating Phrase-based Reordering Features into a Chart-based Decoder for Machine Translation
Author: ThuyLinh Nguyen ; Stephan Vogel
Abstract: Hiero translation models have two limitations compared to phrase-based models: 1) Limited hypothesis space; 2) No lexicalized reordering model. We propose an extension of Hiero called Phrasal-Hiero to address Hiero’s second problem. Phrasal-Hiero still has the same hypothesis space as the original Hiero but incorporates a phrase-based distance cost feature and lexicalized reordering features into the chart decoder. The work consists of two parts: 1) for each Hiero translation derivation, find its corresponding discontinuous phrase-based path. 2) Extend the chart decoder to incorporate features from the phrase-based path. We achieve significant improvement over both Hiero and phrase-based baselines for Arabic-English, Chinese-English and German-English translation.
5 0.82628351 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation
Author: Minwei Feng ; Jan-Thorsten Peter ; Hermann Ney
Abstract: In this paper, we propose a novel reordering model based on sequence labeling techniques. Our model converts the reordering problem into a sequence labeling problem, i.e. a tagging task. Results on five Chinese-English NIST tasks show that our model improves the baseline system by 1.32 BLEU and 1.53 TER on average. Results of comparative study with other seven widely used reordering models will also be reported.
6 0.82566082 77 acl-2013-Can Markov Models Over Minimal Translation Units Help Phrase-Based SMT?
7 0.79796392 125 acl-2013-Distortion Model Considering Rich Context for Statistical Machine Translation
8 0.58425015 320 acl-2013-Shallow Local Multi-Bottom-up Tree Transducers in Statistical Machine Translation
9 0.55214763 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding
10 0.5503996 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT
11 0.5469088 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
12 0.54442835 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers
13 0.54359126 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
14 0.53784376 221 acl-2013-Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines
15 0.5333342 137 acl-2013-Enlisting the Ghost: Modeling Empty Categories for Machine Translation
16 0.51426148 10 acl-2013-A Markov Model of Machine Translation using Non-parametric Bayesian Inference
17 0.49157611 156 acl-2013-Fast and Adaptive Online Training of Feature-Rich Translation Models
18 0.46993816 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation
19 0.46874499 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
20 0.46408552 314 acl-2013-Semantic Roles for String to Tree Machine Translation
topicId topicWeight
[(0, 0.04), (6, 0.047), (11, 0.047), (14, 0.02), (24, 0.05), (26, 0.04), (28, 0.291), (35, 0.061), (42, 0.084), (48, 0.032), (70, 0.047), (88, 0.03), (90, 0.049), (95, 0.076)]
simIndex simValue paperId paperTitle
1 0.94290924 349 acl-2013-The mathematics of language learning
Author: Andras Kornai ; Gerald Penn ; James Rogers ; Anssi Yli-Jyra
Abstract: unkown-abstract
same-paper 2 0.80618447 363 acl-2013-Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
Author: Hendra Setiawan ; Bowen Zhou ; Bing Xiang ; Libin Shen
Abstract: Long distance reordering remains one of the greatest challenges in statistical machine translation research as the key contextual information may well be beyond the confine of translation units. In this paper, we propose Two-Neighbor Orientation (TNO) model that jointly models the orientation decisions between anchors and two neighboring multi-unit chunks which may cross phrase or rule boundaries. We explicitly model the longest span of such chunks, referred to as Maximal Orientation Span, to serve as a global parameter that constrains underlying local decisions. We integrate our proposed model into a state-of-the-art string-to-dependency translation system and demonstrate the efficacy of our proposal in a large-scale Chinese-to-English translation task. On NIST MT08 set, our most advanced model brings around +2.0 BLEU and -1.0 TER improvement.
3 0.79945707 124 acl-2013-Discriminative state tracking for spoken dialog systems
Author: Angeliki Metallinou ; Dan Bohus ; Jason Williams
Abstract: In spoken dialog systems, statistical state tracking aims to improve robustness to speech recognition errors by tracking a posterior distribution over hidden dialog states. Current approaches based on generative or discriminative models have different but important shortcomings that limit their accuracy. In this paper we discuss these limitations and introduce a new approach for discriminative state tracking that overcomes them by leveraging the problem structure. An offline evaluation with dialog data collected from real users shows improvements in both state tracking accuracy and the quality of the posterior probabilities. Features that encode speech recognition error patterns are particularly helpful, and training requires relatively few dialogs.
4 0.76332122 107 acl-2013-Deceptive Answer Prediction with User Preference Graph
Author: Fangtao Li ; Yang Gao ; Shuchang Zhou ; Xiance Si ; Decheng Dai
Abstract: In Community question answering (QA) sites, malicious users may provide deceptive answers to promote their products or services. It is important to identify and filter out these deceptive answers. In this paper, we first solve this problem with the traditional supervised learning methods. Two kinds of features, including textual and contextual features, are investigated for this task. We further propose to exploit the user relationships to identify the deceptive answers, based on the hypothesis that similar users will have similar behaviors to post deceptive or authentic answers. To measure the user similarity, we propose a new user preference graph based on the answer preference expressed by users, such as “helpful” voting and “best answer” selection. The user preference graph is incorporated into traditional supervised learning framework with the graph regularization technique. The experiment results demonstrate that the user preference graph can indeed help improve the performance of deceptive answer prediction.
5 0.74498814 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data
Author: Xuezhe Ma ; Fei Xia
Abstract: In this paper, we propose a simple and effective approach to domain adaptation for dependency parsing. This is a feature augmentation approach in which the new features are constructed based on subtree information extracted from the auto-parsed target domain data. To demonstrate the effectiveness of the proposed approach, we evaluate it on three pairs of source-target data, compared with several common baseline systems and previous approaches. Our approach achieves significant improvement on all the three pairs of data sets.
6 0.7407285 96 acl-2013-Creating Similarity: Lateral Thinking for Vertical Similarity Judgments
7 0.60465294 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules
8 0.55303282 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
9 0.53739345 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
10 0.53412986 250 acl-2013-Models of Translation Competitions
11 0.53169006 16 acl-2013-A Novel Translation Framework Based on Rhetorical Structure Theory
12 0.52521074 328 acl-2013-Stacking for Statistical Machine Translation
13 0.52253282 280 acl-2013-Plurality, Negation, and Quantification:Towards Comprehensive Quantifier Scope Disambiguation
14 0.52163959 267 acl-2013-PARMA: A Predicate Argument Aligner
15 0.51831782 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner
16 0.51689768 367 acl-2013-Universal Conceptual Cognitive Annotation (UCCA)
17 0.51629943 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification
18 0.51549697 332 acl-2013-Subtree Extractive Summarization via Submodular Maximization
20 0.51190305 335 acl-2013-Survey on parsing three dependency representations for English