acl acl2011 acl2011-263 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita
Abstract: One problem with phrase-based statistical machine translation is the problem of long-distance reordering when translating between languages with different word orders, such as Japanese-English. In this paper, we propose a method of imposing reordering constraints using document-level context. As the document-level context, we use noun phrases which significantly occur in context documents containing source sentences. Given a source sentence, zones which cover the noun phrases are used as reordering constraints. Then, in decoding, reorderings which violate the zones are restricted. Experiment results for patent translation tasks show a significant improvement of 1.20% BLEU points in Japanese-English translation and 1.41% BLEU points in English-Japanese translation.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract One problem with phrase-based statistical machine translation is the problem of long-distance reordering when translating between languages with different word orders, such as Japanese-English. [sent-5, score-0.562]
2 In this paper, we propose a method of imposing reordering constraints using document-level context. [sent-6, score-0.512]
3 As the document-level context, we use noun phrases which significantly occur in context documents containing source sentences. [sent-7, score-0.442]
4 Given a source sentence, zones which cover the noun phrases are used as reordering constraints. [sent-8, score-1.016]
5 Then, in decoding, reorderings which violate the zones are restricted. [sent-9, score-0.65]
6 Experiment results for patent translation tasks show a significant improvement of 1.20% BLEU points in Japanese-English translation and 1.41% BLEU points in English-Japanese translation. [sent-10, score-0.636]
7 1 Introduction Phrase-based statistical machine translation is useful for translating between languages with similar word orders. [sent-13, score-0.234]
8 However, it has problems with long-distance reordering when translating between languages with different word orders, such as Japanese-English. [sent-14, score-0.402]
9 These problems are especially crucial when translating long sentences, such as patent sentences, because many combinations of word orders cause high computational costs and low translation quality. [sent-15, score-0.748]
10 These include methods where source sentences are divided into syntactic chunks or clauses and the translations are merged later (Koehn and Knight, 2003; Sudoh et al. [sent-17, score-0.06]
11 , 2010), methods where syntactic constraints or penalties for reordering are added to a decoder (Yamamoto et al. [sent-18, score-0.448]
12 However, these methods did not use document-level context to constrain reorderings. [sent-22, score-0.066]
13 We think it is a promising clue to improving translation quality. [sent-24, score-0.192]
14 In this paper, we propose a method where reordering constraints are added to a decoder using document-level context. [sent-25, score-0.478]
15 As the document-level context, we use noun phrases which significantly occur in context documents containing source sentences. [sent-26, score-0.417]
16 Given a source sentence, zones which cover the noun phrases are used as reordering constraints. [sent-27, score-1.016]
17 Then, in decoding, reorderings which violate the zones are restricted. [sent-28, score-0.65]
18 By using document-level context, contextually-appropriate reordering constraints are preferentially considered. [sent-29, score-0.397]
19 As a result, the translation quality and speed can be improved. [sent-30, score-0.195]
20 Experiment results for the NTCIR-8 patent translation tasks show a significant improvement of 1.20% BLEU points in Japanese-English translation and 1.41% BLEU points in English-Japanese translation. [sent-31, score-0.636]
21 2 Patent Translation Patent translation is difficult because of the large number of new phrases and long sentences. [sent-34, score-0.361]
22 Since a patent document explains a newly-invented apparatus or method, it contains many new phrases. [sent-35, score-0.524]
23 Learning phrase translations for these new phrases from the [sent-36, score-0.262]
24 Baseline output: an interlayer insulating film 12 is formed on the surface of a semiconductor substrate 10 , a pad electrode 11 via a first insulating film . [sent-39, score-1.187]
25 Source + Zone: パッド 電極 1 1 は 、 第 1 の 絶縁 膜 で ある 層間 絶縁 膜 1 2 を 介 し て 半導体 基板 1 0 の 表面 に 形成 さ れ て い る 。 Proposed output: pad electrode 11 is formed on the surface of the semiconductor substrate 10 through the interlayer insulating film 12 of the first insulating film . [sent-40, score-1.157]
26 training corpora is difficult because these phrases occur only in that patent specification. [sent-42, score-0.62]
27 Therefore, when translating such phrases, a decoder has to combine multiple smaller phrase translations. [sent-43, score-0.216]
28 Moreover, sentences in patent documents tend to be long. [sent-44, score-0.479]
29 This results in a large number of combinations of phrasal reorderings and a degradation of the translation quality and speed. [sent-45, score-0.336]
30 Table 1 shows how a failure in phrasal reordering can spoil the whole translation. [sent-46, score-0.287]
31 In the baseline output, the translation of “第 1 の 絶縁 膜 で あ る 層間 絶縁 膜 1 2 ” (an interlayer insulation film 12 that is a first insulation film) is divided into two blocks, “an interlayer insulating film 12” and “a first insulating film”. [sent-47, score-1.408]
32 In this case, a reordering constraint to translate “第 1 の 絶縁 膜 で ある 層間 絶縁 膜 1 2 ” as a single block can reduce incorrect reorderings and improve the translation quality. [sent-48, score-0.649]
33 Therefore, how to specify ranges for reordering constraints is a very important problem. [sent-50, score-0.43]
34 We propose a solution for this problem that uses the very nature of patent documents themselves. [sent-51, score-0.479]
35 3 Proposed Method In order to address the aforementioned problem, we propose a method for specifying phrases in a source sentence which are assumed to be translated as single blocks using document-level context. [sent-52, score-0.499]
36 When translating a document, for example a patent specification, we first extract coherent phrase candidates from the document. [sent-54, score-1.053]
37 Then, when translating each sentence in the document, we set zones which cover the coherent phrase candidates and restrict reorderings which violate the zones. [sent-55, score-1.012]
38 1 Coherent phrases in patent documents As mentioned in the previous section, specifying coherent phrases is difficult when using only one source sentence. [sent-57, score-1.326]
39 However, we have observed that document-level context can be a clue for specifying coherent phrases. [sent-58, score-0.543]
40 In a patent specification, for example, noun phrases which indicate parts of the invention are very important. [sent-59, score-0.869]
41 Since this property is not language dependent (that is, such a noun phrase denotes a part of the invention in any language), it should be translated as a single block in every language. [sent-61, score-0.49]
42 In this way, important phrases in patent documents are assumed to be coherent phrases. [sent-62, score-0.955]
43 We therefore treat the problem of specifying coherent phrases as a problem of specifying important phrases, and we use these phrases as constraints on reorderings. [sent-63, score-1.037]
44 The details of the proposed method are described below. [sent-64, score-0.068]
45 2 Finding coherent phrases We propose the following method for finding coherent phrases in patent sentences. [sent-66, score-1.431]
46 First, we extract coherent phrase candidates from a patent document. [sent-67, score-0.979]
47 Next, the candidates are ranked by a criterion which reflects the document-level context. [sent-68, score-0.134]
48 In this method, using document-level context is critically important because we cannot rank the candidates without it. [sent-70, score-0.2]
49 1 Extracting coherent phrase candidates Coherent phrase candidates are extracted from a context document, a document that contains a source sentence. [sent-73, score-0.931]
50 We extract all noun phrases as coherent phrase candidates since most noun phrases can be translated as single blocks in other languages (Koehn and Knight, 2003). [sent-74, score-1.123]
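The candidate-extraction step can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a constituency parse given as nested Python lists with an "NP" label (the paper uses the Charniak parser for English; for Japanese it takes CaboCha subtrees headed by nouns instead).

```python
# Collect the yields of all NP subtrees, nested ones included, from a
# constituency parse represented as nested lists: [label, child, child, ...].
# The tree encoding and the example parse are assumptions for illustration.

def np_candidates(tree):
    """Return the surface strings of all NP subtrees (including nested NPs)."""
    label, children = tree[0], tree[1:]
    phrases = []
    if label == "NP":
        phrases.append(" ".join(leaves(tree)))
    for child in children:
        if isinstance(child, list):
            phrases.extend(np_candidates(child))
    return phrases

def leaves(tree):
    """Flatten a nested-list parse tree into its word sequence."""
    out = []
    for child in tree[1:]:
        if isinstance(child, list):
            out.extend(leaves(child))
        else:
            out.append(child)
    return out

parse = ["S",
         ["NP", ["NP", ["DT", "the"], ["NN", "interlayer"], ["NN", "film"]],
                ["CD", "12"]],
         ["VP", ["VBZ", "is"], ["VBN", "formed"]]]
print(np_candidates(parse))
```

Note that the outer NP and the NP nested inside it are both kept as candidates; the C-value ranking described next is what arbitrates between such nested candidates.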
51 2 Ranking with C-value The candidates which have been extracted are nested and have different lengths. [sent-78, score-0.215]
52 For example, ranking by frequency cannot pick up an important phrase that is long, while ranking by length may give a long but unimportant phrase a high rank. [sent-80, score-0.242]
53 In order to select the appropriate coherent phrases, measurements which give high rank to phrases with high termhood are needed. [sent-81, score-0.507]
54 C-value is a measure used in automatic term recognition and is suitable for extracting important phrases from nested candidates. [sent-83, score-0.303]
55 Since phrases which have a large C-value frequently occur in a context document, these phrases are considered to be a significant unit, i.e. [sent-85, score-0.408]
56 , a part of the invention, and to be coherent phrases. [sent-87, score-0.305]
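A sketch of C-value scoring over nested candidates follows. The discounting scheme is the one given by Frantzi et al.'s C-value definition (a phrase's frequency is reduced by the average frequency of the longer candidates containing it); the example frequencies are invented for illustration, and single-word candidates are assumed to be excluded, since log2 of length 1 is zero.

```python
import math

def c_value(freq):
    """freq maps each candidate phrase (a tuple of words, length >= 2) to its
    frequency in the context document. Returns phrase -> C-value: frequent
    long phrases score high, but a phrase that mostly occurs inside longer
    candidates is discounted by their average frequency."""
    cands = list(freq)
    scores = {}
    for a in cands:
        containers = [b for b in cands if len(b) > len(a) and contains(b, a)]
        if containers:
            discount = sum(freq[b] for b in containers) / len(containers)
            scores[a] = math.log2(len(a)) * (freq[a] - discount)
        else:
            scores[a] = math.log2(len(a)) * freq[a]
    return scores

def contains(b, a):
    """True if word tuple a occurs contiguously inside word tuple b."""
    n = len(a)
    return any(b[i:i + n] == a for i in range(len(b) - n + 1))

# Hypothetical counts from one patent specification.
freq = {
    ("insulating", "film"): 4,
    ("first", "insulating", "film"): 2,
    ("interlayer", "insulating", "film"): 1,
}
scores = c_value(freq)
```

Here "insulating film" is frequent but mostly nested, so its score is discounted, while "first insulating film" keeps its full length-weighted frequency.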
57 3 Specifying coherent phrases Given a source sentence, we find coherent phrase candidates in the sentence in order to set zones for reordering constraints. [sent-90, score-1.752]
58 If a coherent phrase candidate is found in the source sentence, the phrase is regarded as a coherent phrase and annotated with a zone tag, which will be mentioned in the next section. [sent-91, score-1.007]
59 We check the coherent phrase candidates in the sentence in descending C-value order, and stop when the C-value goes below a certain threshold. [sent-92, score-0.557]
60 Nested zones are allowed, unless they conflict with pre-existing zones. [sent-93, score-0.744]
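The selection procedure just described, descending C-value order with a threshold stop, nesting allowed but crossing spans rejected, can be sketched as follows. Function and variable names are illustrative, not from the paper.

```python
def select_zones(spans, threshold):
    """spans: (start, end, score) half-open token intervals scored by C-value.
    Greedily accept spans in descending score order; a span nested inside or
    disjoint from every accepted span is allowed, a crossing span is not.
    Stop once scores fall below the threshold."""
    accepted = []
    for s, e, score in sorted(spans, key=lambda x: -x[2]):
        if score < threshold:
            break
        if all(compatible((s, e), z) for z in accepted):
            accepted.append((s, e))
    return sorted(accepted)

def compatible(a, b):
    """True if the two intervals are disjoint or one nests inside the other."""
    disjoint = a[1] <= b[0] or b[1] <= a[0]
    nested = (b[0] <= a[0] and a[1] <= b[1]) or (a[0] <= b[0] and b[1] <= a[1])
    return disjoint or nested

# (2, 7) crosses the higher-scoring (0, 5) and is rejected;
# (0, 3) nests inside (0, 5) and is kept; (8, 10) falls below the threshold.
spans = [(0, 5, 3.0), (2, 7, 2.5), (0, 3, 2.0), (8, 10, 1.0)]
zones = select_zones(spans, threshold=1.5)
```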
61 3 Decoding with reordering constraints In decoding, reorderings which violate zones, such as the baseline output in Table 1, are restricted and we get a more appropriate translation, such as the proposed output in Table 1. [sent-96, score-0.77]
62 , 2007; Koehn and Haddow, 2009), which can specify reordering constraints using <zone> and </zone> tags. [sent-98, score-0.43]
63 Moses restricts reorderings which violate zones and translates zones as single blocks. [sent-99, score-1.022]
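As a concrete illustration, the zones can be rendered into Moses-style input by wrapping each span in <zone> ... </zone> tags; the tag spelling follows the Moses XML-input convention as described by Koehn and Haddow (2009) and should be checked against your Moses version. The sketch below assumes zones are already nested or disjoint.

```python
def annotate_zones(tokens, zones):
    """Wrap each (start, end) half-open zone in <zone>...</zone> tags.
    Assumes zones never cross: at a shared boundary, longer zones open
    first and inner zones close first."""
    out = []
    for i, tok in enumerate(tokens):
        for s, e in sorted(zones, key=lambda z: -z[1]):  # longer zones open first
            if s == i:
                out.append("<zone>")
        out.append(tok)
        for s, e in sorted(zones, key=lambda z: -z[0]):  # inner zones close first
            if e == i + 1:
                out.append("</zone>")
    return " ".join(out)

tokens = "the first insulating film is formed".split()
marked = annotate_zones(tokens, [(0, 4), (2, 4)])
```

With the nested zones above, `marked` keeps "insulating film" inside the larger "the first insulating film" block, so the decoder must translate each as a contiguous unit.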
64 4 Experiments In order to evaluate the performance of the proposed method, we conducted Japanese-English (J-E) and English-Japanese (E-J) translation experiments using the NTCIR-8 patent translation task dataset (Fujii et al. [sent-100, score-0.807]
65 This dataset contains a training set of 3 million sentence pairs, a development set of 2,000 sentence pairs, and a test set of 1,251 (J-E) and 1,119 (E-J) sentence pairs. [sent-102, score-0.081]
66 Moreover, this dataset contains the patent specifications from which sentence pairs are extracted. [sent-103, score-0.522]
67 1 Baseline We used Moses as a baseline system, with all the settings except distortion limit (dl) at the default. [sent-106, score-0.134]
68 The distortion limit is a maximum distance of reordering. [sent-107, score-0.083]
69 It is known that an appropriate distortion-limit can improve translation quality and decoding speed. [sent-108, score-0.262]
70 In experiments, we compared dl = 6, 10, 20, 30, 40, and −1 (unlimited). [sent-110, score-0.068]
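For reference, the distortion limit is typically set in the Moses configuration file; the fragment below is a hypothetical moses.ini excerpt (the section name follows the Moses manual), with −1 meaning unlimited reordering, as in the paper's baseline.

```ini
# Hypothetical moses.ini excerpt: caps how far phrases may be reordered.
# -1 = unlimited (the baseline); the paper compares 6, 10, 20, 30, 40, and -1.
[distortion-limit]
30
```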
71 2 Compared methods We compared two methods, the method of specifying reordering constraints with a context document w/o Context: in ( this case ) , ( the leading end ) 15f of ( the segment operating body ) ( ( 15 swings ) in ( a direction opposite ) ) to ( the a arrow direction ) . [sent-114, score-0.9]
72 w/ Context: in ( this case ) , ( ( the leading end ) 15f ) of ( ( ( the segment ) operating body ) 15 ) swings in a direction opposite to ( the a arrow direction ) . [sent-115, score-0.217]
73 (w/ Context) and the method of specifying reordering constraints without a context document (w/o Context). [sent-119, score-0.683]
74 In both methods, the feature weights used in decoding are the same as those for the baseline (dl = −1). [sent-120, score-0.097]
75 1 Proposed method (w/ Context) In the proposed method, reordering constraints were defined with a context document. [sent-123, score-0.531]
76 For J-E translation, we used the CaboCha parser (Kudo and Matsumoto, 2002) to analyze the context document. [sent-124, score-0.09]
77 As coherent phrase candidates, we extracted all subtrees whose heads are nouns. [sent-125, score-0.396]
78 For E-J translation, we used the Charniak parser (Charniak, 2000) and extracted all noun phrases, labeled “NP”, as coherent phrase candidates. [sent-126, score-0.51]
79 The parsers are used only when extracting coherent phrase candidates. [sent-127, score-0.421]
80 When specifying zones for each source sentence, strings which match the coherent phrase candidates are defined to be zones. [sent-128, score-1.102]
81 Therefore, the proposed method is robust against parsing errors. [sent-129, score-0.068]
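The robustness comes from the matching step: candidates are located in the source sentence by token matching rather than by re-parsing it. A minimal sketch (names and the example sentence are illustrative):

```python
def find_spans(tokens, candidate):
    """All token-level occurrences of a candidate phrase (a list of tokens)
    in a tokenized source sentence, as half-open (start, end) spans."""
    n = len(candidate)
    return [(i, i + n) for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == candidate]

sentence = "the interlayer insulating film 12 of the first insulating film".split()
spans = find_spans(sentence, "insulating film".split())
```

Every matched span then becomes a zone candidate, so even if the sentence itself is hard to parse, the zones are inherited from candidates extracted elsewhere in the context document.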
82 2 w/o Context In this method, reordering constraints were defined without a context document. [sent-133, score-0.463]
83 For J-E translation, we converted the dependency trees of source sentences processed by the CaboCha parser into bracketed trees and used these as reordering constraints. [sent-134, score-0.371]
84 For E-J translation, we used all of the noun phrases detected by the Charniak parser as reordering constraints. [sent-135, score-0.572]
85 In both directions, our proposed method yielded the highest BLEU scores. [sent-140, score-0.094]
86 These results show that the proposed method using document-level context is effective in specifying reordering constraints. [sent-153, score-0.561]
87 Moreover, as shown in Table 3, although zone setting without context fails if source sentences have parsing errors, the proposed method can set zones appropriately using document-level context. [sent-154, score-0.63]
88 The Charniak parser tends to make errors on noun phrases with ID numbers. [sent-155, score-0.285]
89 This shows that document-level context can possibly improve parsing quality. [sent-156, score-0.066]
90 As for the distortion limit, while an appropriate distortion-limit, 30 for J-E and 40 for E-J, improved the translation quality, the gains from the proposed method were significantly better than the gains from the distortion limit. [sent-157, score-0.369]
91 In general, imposing strong constraints causes fast decoding but low translation quality. [sent-158, score-0.426]
92 However, the proposed method improves the translation quality and speed by imposing appropriate constraints. [sent-159, score-0.379]
93 5 Conclusion In this paper, we proposed a method for imposing reordering constraints using document-level context. [sent-160, score-0.55]
94 In the proposed method, coherent phrase candidates are extracted from a context document in advance. [sent-161, score-0.684]
95 Given a source sentence, zones which cover the coherent phrase candidates are defined. [sent-162, score-0.998]
96 Then, in decoding, reorderings which violate the zones are restricted. [sent-163, score-0.65]
97 Since reordering constraints reduce incorrect reorderings, the translation quality and speed can be improved. [sent-164, score-0.592]
98 The experiment results for the NTCIR-8 patent translation tasks show a significant improvement of 1.20% BLEU points in Japanese-English translation and 1.41% BLEU points in English-Japanese translation. [sent-165, score-0.636]
99 We think that the proposed method is independent of language pair and domains. [sent-168, score-0.068]
100 In the future, we want to apply our proposed method to other language pairs and domains. [sent-169, score-0.068]
wordName wordTfidf (topN-words)
[('patent', 0.449), ('zones', 0.372), ('coherent', 0.305), ('reordering', 0.287), ('film', 0.188), ('reorderings', 0.176), ('phrases', 0.171), ('insulating', 0.169), ('translation', 0.16), ('interlayer', 0.141), ('specifying', 0.14), ('candidates', 0.134), ('insulation', 0.113), ('constraints', 0.11), ('violate', 0.102), ('phrase', 0.091), ('noun', 0.09), ('bleu', 0.088), ('imposing', 0.085), ('semiconductor', 0.085), ('substrate', 0.085), ('nested', 0.081), ('electrode', 0.074), ('sudoh', 0.074), ('translating', 0.074), ('decoding', 0.071), ('invention', 0.069), ('dl', 0.068), ('context', 0.066), ('zone', 0.064), ('source', 0.06), ('koehn', 0.06), ('contextin', 0.056), ('frantzi', 0.056), ('swings', 0.056), ('yamamoto', 0.056), ('distortion', 0.055), ('decoder', 0.051), ('document', 0.05), ('katsuhito', 0.05), ('masao', 0.046), ('utiyama', 0.046), ('specifications', 0.046), ('moses', 0.045), ('charniak', 0.045), ('cabocha', 0.043), ('tsukada', 0.043), ('fujii', 0.041), ('arrow', 0.041), ('longdistance', 0.041), ('blocks', 0.038), ('proposed', 0.038), ('isozaki', 0.036), ('hajime', 0.036), ('cover', 0.036), ('points', 0.036), ('speed', 0.035), ('metricsmatr', 0.035), ('orders', 0.035), ('eiichiro', 0.034), ('philipp', 0.034), ('formed', 0.033), ('direction', 0.033), ('translated', 0.033), ('marton', 0.033), ('specify', 0.033), ('kudo', 0.032), ('clue', 0.032), ('shi', 0.032), ('appropriate', 0.031), ('specification', 0.031), ('xiong', 0.031), ('documents', 0.03), ('method', 0.03), ('long', 0.03), ('pad', 0.03), ('operating', 0.03), ('limit', 0.028), ('sentence', 0.027), ('improvement', 0.027), ('measurement', 0.026), ('block', 0.026), ('baseline', 0.026), ('yielded', 0.026), ('surface', 0.025), ('tsutomu', 0.025), ('tings', 0.025), ('documentlevel', 0.025), ('sov', 0.025), ('japaneseenglish', 0.025), ('hideo', 0.025), ('okuma', 0.025), ('keihanna', 0.025), ('apparatus', 0.025), ('utsuro', 0.025), ('extracting', 0.025), ('parser', 0.024), ('opposite', 0.024), 
('kevin', 0.024), ('japanese', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 263 acl-2011-Reordering Constraint Based on Document-Level Context
Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita
Abstract: One problem with phrase-based statistical machine translation is the problem of long-distance reordering when translating between languages with different word orders, such as Japanese-English. In this paper, we propose a method of imposing reordering constraints using document-level context. As the document-level context, we use noun phrases which significantly occur in context documents containing source sentences. Given a source sentence, zones which cover the noun phrases are used as reordering constraints. Then, in decoding, reorderings which violate the zones are restricted. Experiment results for patent translation tasks show a significant improvement of 1.20% BLEU points in Japanese-English translation and 1.41% BLEU points in English-Japanese translation.
2 0.2407655 266 acl-2011-Reordering with Source Language Collocations
Author: Zhanyi Liu ; Haifeng Wang ; Hua Wu ; Ting Liu ; Sheng Li
Abstract: This paper proposes a novel reordering model for statistical machine translation (SMT) by means of modeling the translation orders of the source language collocations. The model is learned from a word-aligned bilingual corpus where the collocated words in source sentences are automatically detected. During decoding, the model is employed to softly constrain the translation orders of the source language collocations, so as to constrain the translation orders of those source phrases containing these collocated words. The experimental results show that the proposed method significantly improves the translation quality, achieving the absolute improvements of 1.1~1.4 BLEU score over the baseline methods. 1
3 0.18429323 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
Author: Nadir Durrani ; Helmut Schmid ; Alexander Fraser
Abstract: We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance reorderings in a unified way, and (ii) a joint sequence model for the translation and reordering probabilities which is more flexible than standard phrase-based MT. We observe statistically significant improvements in BLEU over Moses for German-to-English and Spanish-to-English tasks, and comparable results for a French-to-English task.
4 0.17412472 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
Author: Markos Mylonakis ; Khalil Sima'an
Abstract: While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters, by select- ing and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.
5 0.15697135 264 acl-2011-Reordering Metrics for MT
Author: Alexandra Birch ; Miles Osborne
Abstract: One of the major challenges facing statistical machine translation is how to model differences in word order between languages. Although a great deal of research has focussed on this problem, progress is hampered by the lack of reliable metrics. Most current metrics are based on matching lexical items in the translation and the reference, and their ability to measure the quality of word order has not been demonstrated. This paper presents a novel metric, the LRscore, which explicitly measures the quality of word order by using permutation distance metrics. We show that the metric is more consistent with human judgements than other metrics, including the BLEU score. We also show that the LRscore can successfully be used as the objective function when training translation model parameters. Training with the LRscore leads to output which is preferred by humans. Moreover, the translations incur no penalty in terms of BLEU scores.
6 0.14379384 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
7 0.14160059 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction
8 0.11519224 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules
9 0.11354946 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
10 0.11171462 69 acl-2011-Clause Restructuring For SMT Not Absolutely Helpful
11 0.11096458 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages
12 0.10492428 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices
13 0.10485877 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
14 0.10350408 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
15 0.10130931 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation
16 0.099769257 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation
17 0.099220984 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach
18 0.089744404 245 acl-2011-Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives
19 0.087983213 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals
20 0.086609662 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words
topicId topicWeight
[(0, 0.19), (1, -0.164), (2, 0.095), (3, 0.097), (4, 0.061), (5, 0.026), (6, -0.042), (7, 0.008), (8, 0.017), (9, -0.011), (10, -0.005), (11, -0.055), (12, -0.013), (13, -0.173), (14, -0.004), (15, 0.002), (16, -0.039), (17, 0.045), (18, -0.086), (19, -0.008), (20, -0.069), (21, 0.046), (22, 0.006), (23, -0.119), (24, -0.042), (25, 0.079), (26, 0.115), (27, -0.015), (28, -0.028), (29, 0.082), (30, 0.056), (31, 0.009), (32, 0.18), (33, -0.046), (34, 0.01), (35, -0.168), (36, -0.091), (37, -0.02), (38, 0.009), (39, 0.047), (40, 0.073), (41, -0.015), (42, -0.064), (43, 0.014), (44, 0.03), (45, -0.059), (46, -0.032), (47, 0.069), (48, -0.023), (49, -0.007)]
simIndex simValue paperId paperTitle
same-paper 1 0.94112456 263 acl-2011-Reordering Constraint Based on Document-Level Context
Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita
Abstract: One problem with phrase-based statistical machine translation is the problem of long-distance reordering when translating between languages with different word orders, such as Japanese-English. In this paper, we propose a method of imposing reordering constraints using document-level context. As the document-level context, we use noun phrases which significantly occur in context documents containing source sentences. Given a source sentence, zones which cover the noun phrases are used as reordering constraints. Then, in decoding, reorderings which violate the zones are restricted. Experiment results for patent translation tasks show a significant improvement of 1.20% BLEU points in Japanese-English translation and 1.41% BLEU points in English-Japanese translation.
2 0.91901243 266 acl-2011-Reordering with Source Language Collocations
Author: Zhanyi Liu ; Haifeng Wang ; Hua Wu ; Ting Liu ; Sheng Li
Abstract: This paper proposes a novel reordering model for statistical machine translation (SMT) by means of modeling the translation orders of the source language collocations. The model is learned from a word-aligned bilingual corpus where the collocated words in source sentences are automatically detected. During decoding, the model is employed to softly constrain the translation orders of the source language collocations, so as to constrain the translation orders of those source phrases containing these collocated words. The experimental results show that the proposed method significantly improves the translation quality, achieving the absolute improvements of 1.1~1.4 BLEU score over the baseline methods. 1
3 0.8217904 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
Author: Nadir Durrani ; Helmut Schmid ; Alexander Fraser
Abstract: We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance reorderings in a unified way, and (ii) a joint sequence model for the translation and reordering probabilities which is more flexible than standard phrase-based MT. We observe statistically significant improvements in BLEU over Moses for German-to-English and Spanish-to-English tasks, and comparable results for a French-to-English task.
4 0.76527578 69 acl-2011-Clause Restructuring For SMT Not Absolutely Helpful
Author: Susan Howlett ; Mark Dras
Abstract: There are a number of systems that use a syntax-based reordering step prior to phrasebased statistical MT. An early work proposing this idea showed improved translation performance, but subsequent work has had mixed results. Speculations as to cause have suggested the parser, the data, or other factors. We systematically investigate possible factors to give an initial answer to the question: Under what conditions does this use of syntax help PSMT?
5 0.75608689 264 acl-2011-Reordering Metrics for MT
Author: Alexandra Birch ; Miles Osborne
Abstract: One of the major challenges facing statistical machine translation is how to model differences in word order between languages. Although a great deal of research has focussed on this problem, progress is hampered by the lack of reliable metrics. Most current metrics are based on matching lexical items in the translation and the reference, and their ability to measure the quality of word order has not been demonstrated. This paper presents a novel metric, the LRscore, which explicitly measures the quality of word order by using permutation distance metrics. We show that the metric is more consistent with human judgements than other metrics, including the BLEU score. We also show that the LRscore can successfully be used as the objective function when training translation model parameters. Training with the LRscore leads to output which is preferred by humans. Moreover, the translations incur no penalty in terms of BLEU scores.
6 0.6734429 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
7 0.61152762 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages
8 0.60657746 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices
10 0.56882524 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
11 0.53699583 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach
12 0.53262609 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules
13 0.52712387 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
14 0.51939499 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
15 0.50489789 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction
16 0.49515823 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation
17 0.49187443 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation
18 0.48051256 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation
19 0.47915223 151 acl-2011-Hindi to Punjabi Machine Translation System
20 0.47599223 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
topicId topicWeight
[(5, 0.021), (12, 0.256), (17, 0.078), (26, 0.023), (31, 0.01), (37, 0.081), (39, 0.041), (41, 0.045), (55, 0.025), (59, 0.036), (72, 0.022), (91, 0.035), (96, 0.231)]
simIndex simValue paperId paperTitle
1 0.87648094 84 acl-2011-Contrasting Opposing Views of News Articles on Contentious Issues
Author: Souneil Park ; Kyung Soon Lee ; Junehwa Song
Abstract: We present disputant relation-based method for classifying news articles on contentious issues. We observe that the disputants of a contention are an important feature for understanding the discourse. It performs unsupervised classification on news articles based on disputant relations, and helps readers intuitively view the articles through the opponent-based frame. The readers can attain balanced understanding on the contention, free from a specific biased view. We applied a modified version of HITS algorithm and an SVM classifier trained with pseudo-relevant data for article analysis. 1
same-paper 2 0.84011352 263 acl-2011-Reordering Constraint Based on Document-Level Context
Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita
Abstract: One problem with phrase-based statistical machine translation is the problem of long-distance reordering when translating between languages with different word orders, such as Japanese-English. In this paper, we propose a method of imposing reordering constraints using document-level context. As the document-level context, we use noun phrases which significantly occur in context documents containing source sentences. Given a source sentence, zones which cover the noun phrases are used as reordering constraints. Then, in decoding, reorderings which violate the zones are restricted. Experiment results for patent translation tasks show a significant improvement of 1.20% BLEU points in Japanese-English translation and 1.41% BLEU points in English-Japanese translation.
3 0.7686038 251 acl-2011-Probabilistic Document Modeling for Syntax Removal in Text Summarization
Author: William M. Darling ; Fei Song
Abstract: Statistical approaches to automatic text summarization based on term frequency continue to perform on par with more complex summarization methods. To compute useful frequency statistics, however, the semantically important words must be separated from the low-content function words. The standard approach of using an a priori stopword list tends to result in both undercoverage, where syntactical words are seen as semantically relevant, and overcoverage, where words related to content are ignored. We present a generative probabilistic modeling approach to building content distributions for use with statistical multi-document summarization where the syntax words are learned directly from the data with a Hidden Markov Model and are thereby deemphasized in the term frequency statistics. This approach is compared to both a stopword-list and POS-tagging approach and our method demonstrates improved coverage on the DUC 2006 and TAC 2010 datasets using the ROUGE metric.
4 0.72300625 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
Author: Xianchao Wu ; Takuya Matsuzaki ; Jun'ichi Tsujii
Abstract: In the present paper, we propose the effective use of function words to generate generalized translation rules for forest-based translation. Given aligned forest-string pairs, we extract composed tree-to-string translation rules that account for multiple interpretations of both aligned and unaligned target function words. In order to constrain the exhaustive attachment of function words, we bind them only to nearby syntactic chunks yielded by a target dependency parser. Therefore, the proposed approach can not only capture source-tree-to-target-chunk correspondences but can also use forest structures that compactly encode an exponential number of parse trees to properly generate target function words during decoding. Extensive experiments involving large-scale English-to-Japanese translation revealed a significant improvement of 1.8 points in BLEU score, as compared with a strong forest-to-string baseline system.
5 0.72056007 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
Author: Lane Schwartz ; Chris Callison-Burch ; William Schuler ; Stephen Wu
Abstract: This paper describes a novel technique for incorporating syntactic knowledge into phrase-based machine translation through incremental syntactic parsing. Bottom-up and top-down parsers typically require a completed string as input. This requirement makes it difficult to incorporate them into phrase-based translation, which generates partial hypothesized translations from left to right. Incremental syntactic language models score sentences in a similar left-to-right fashion, and are therefore a good mechanism for incorporating syntax into phrase-based translation. We give a formal definition of one such linear-time syntactic language model, detail its relation to phrase-based decoding, and integrate the model with the Moses phrase-based translation system. We present empirical results on a constrained Urdu-English translation task that demonstrate a significant BLEU score improvement and a large decrease in perplexity.
6 0.71818787 61 acl-2011-Binarized Forest to String Translation
7 0.71610707 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation
8 0.71514517 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach
10 0.71465433 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
11 0.71401823 169 acl-2011-Improving Question Recommendation by Exploiting Information Need
12 0.7137019 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations
13 0.71335268 117 acl-2011-Entity Set Expansion using Topic information
14 0.71254802 177 acl-2011-Interactive Group Suggesting for Twitter
15 0.71252847 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment
16 0.71243668 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
17 0.71196038 77 acl-2011-Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech
18 0.7119143 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
19 0.71137965 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
20 0.71128017 18 acl-2011-A Latent Topic Extracting Method based on Events in a Document and its Application