acl acl2012 acl2012-162 knowledge-graph by maker-knowledge-mining

162 acl-2012-Post-ordering by Parsing for Japanese-English Statistical Machine Translation


Source: pdf

Author: Isao Goto ; Masao Utiyama ; Eiichiro Sumita

Abstract: Reordering is a difficult task in translating between widely different languages such as Japanese and English. We employ the post-ordering framework proposed by (Sudoh et al., 2011b) for Japanese to English translation and improve upon the reordering method. The existing post-ordering method reorders a sequence of target language words in a source language word order via SMT, while our method reorders the sequence by: 1) parsing the sequence to obtain syntax structures similar to a source language structure, and 2) transferring the obtained syntax structures into the syntax structures of the target language.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We employ the postordering framework proposed by (Sudoh et al. [sent-5, score-0.104]

2 , 2011b) for Japanese to English translation and improve upon the reordering method. [sent-6, score-0.283]

3 1 Introduction The word reordering problem is a challenging one when translating between languages with widely different word orders such as Japanese and English. [sent-8, score-0.253]

4 Many reordering methods have been proposed in statistical machine translation (SMT) research. [sent-9, score-0.308]

5 Those methods can be classified into the following three types: Type-1 : Conducting the target word selection and reordering jointly. [sent-10, score-0.231]

6 , 2003), hierarchical phrase-based SMT (Chiang, 2007), and syntax-based SMT (Galley et al. [sent-12, score-0.044]

7 First, these methods reorder the source language sentence into the target language word order. [sent-20, score-0.166]

8 Then, they translate the reordered source word sequence using SMT methods. [sent-21, score-0.151]

9 First, these methods translate the source sentence almost monotonously into a sequence of the target language words. [sent-25, score-0.182]

10 Then, they reorder the translated word sequence into the target language word order. [sent-26, score-0.148]

11 This paper employs the post-ordering framework for Japanese-English translation based on the discussions given in Section 2, and improves upon the reordering method. [sent-27, score-0.283]

12 Our method uses syntactic structures, which are essential for improving the target word order in translating long sentences between Japanese (a Subject-Object-Verb (SOV) language) and English (an SVO language). [sent-28, score-0.259]

13 Before explaining our method, we explain the preordering method for English to Japanese used in the post-ordering framework. [sent-29, score-0.08]

14 (2010b) proposed a simple pre-ordering method that achieved the best quality in human evaluations, which were conducted for the NTCIR-9 patent machine translation task (Sudoh et al. [sent-31, score-0.244]

15 The method, which is called head finalization, simply moves syntactic heads to the end of corresponding syntactic constituents (e. [sent-34, score-0.173]

16 This method first changes the English word order into a word order similar to Japanese word order using the head finalization rule. [sent-37, score-0.443]

17 There are two key reasons why this pre-ordering method works for estimating Japanese word order. [sent-93, score-0.064]

18 That is, a syntactic head word comes after nonhead (dependent) words. [sent-95, score-0.143]

19 Second, input English sentences are parsed by a high-quality parser, Enju (Miyao and Tsujii, 2008), which outputs syntactic heads. [sent-96, score-0.142]

20 Consequently, the parsed English input sentences can be pre-ordered into a Japanese-like word order using the head finalization rule. [sent-97, score-0.393]

21 Pre-ordering using the head finalization rule naturally cannot be applied to Japanese-English translation, because English is not a head-final language. [sent-98, score-0.218]

22 If we want to pre-order Japanese sentences into an English-like word order, we therefore have to build complex rules (Sudoh et al. [sent-99, score-0.083]

23 The translation flow for the post-ordering method is shown in Figure 1, where “HFE” is an abbreviation of “Head Final English”. [sent-103, score-0.153]

24 An HFE sentence consists of English words in a Japanese-like structure. [sent-104, score-0.044]

25 Therefore, if good rules are applied to this HFE sentence, the underlying English sentence can be recovered. [sent-107, score-0.044]

26 This is the key observation of the post-ordering method. [sent-108, score-0.104]

27 The process of post-ordering translation consists of two steps. [sent-109, score-0.114]

28 First, the Japanese input sentence is translated into HFE almost monotonously. [sent-110, score-0.044]

29 Then, the word order of HFE is changed into an English word order. [sent-111, score-0.087]

30 Training for the post-ordering method is conducted by first converting the English sentences in a Japanese-English parallel corpus into HFE sentences using the head-finalization rule. [sent-112, score-0.132]

31 Next, a monotone phrase-based Japanese-HFE SMT model is built using the Japanese-HFE parallel corpus. [sent-113, score-0.068]

32 Finally, an HFE-to-English word reordering model is built using the HFE-English parallel corpus. [sent-269, score-0.229]

33 (2011b) have proposed using phrase-based SMT for converting HFE sentences into English sentences. [sent-272, score-0.1]

34 2 Parsing Model Our proposed model is called the parsing model. [sent-275, score-0.069]

35 The translation process for the parsing model is shown in Figure 2. [sent-276, score-0.183]

36 In this method, we first parse the HFE sentence into a binary tree. [sent-277, score-0.081]

37 We then swap the nodes annotated with “_SW” suffixes in this binary tree in order to produce an English sentence. [sent-278, score-0.192]

38 The structures of the HFE sentences, which are used for training our parsing model, can be obtained from the corresponding English sentences as follows. [sent-279, score-0.172]

39 1 First, each English sentence in the training Japanese-English parallel corpus is parsed into a binary tree by applying Enju. [sent-280, score-0.171]

40 Then, for each node in this English binary tree, the two children of each node are swapped if its first child is the head node (See (Isozaki et al. [sent-281, score-0.275]

41 , 2010b) for details of the head 1 The explanations of pseudo-particles (va0 and va2) and other details of the HFE are given in Section 4. [sent-282, score-0.089]

42 At the same time, these swapped nodes are annotated with “_SW”. [sent-285, score-0.121]

43 When the two nodes are not swapped, they are annotated with “_ST” (indicating “Straight”). [sent-286, score-0.053]

44 A node with only one child is not annotated with either “_ST” or “_SW”. [sent-287, score-0.055]

45 The result is an HFE sentence in a binary tree annotated with “_SW” and “_ST” suffixes. [sent-288, score-0.109]
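
The annotation step in rows 40 to 45 can be illustrated with a minimal Python sketch. It assumes a hand-built binary tree in place of Enju output; the Node class, the head_is_left flag, and the underscore spelling of the "_SW"/"_ST" suffixes are assumptions made for illustration, and the sketch ignores article dropping and pseudo-particles.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    label: str
    word: Optional[str] = None      # set only on leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None  # None for a unary node
    head_is_left: bool = False      # hypothetical flag: first child is the head


def head_finalize(node: Node) -> Node:
    """Return a head-final copy of an English tree with _SW/_ST labels."""
    if node.word is not None:                 # leaf: copy unchanged
        return Node(node.label, word=node.word)
    left = head_finalize(node.left)
    if node.right is None:                    # unary node: no suffix added
        return Node(node.label, left=left)
    right = head_finalize(node.right)
    if node.head_is_left:                     # head comes first: swap children
        return Node(node.label + "_SW", left=right, right=left)
    return Node(node.label + "_ST", left=left, right=right)


def leaves(node: Node) -> List[str]:
    """Collect the words of a tree from left to right."""
    if node.word is not None:
        return [node.word]
    words = leaves(node.left)
    if node.right is not None:
        words += leaves(node.right)
    return words


# Toy English tree for "John hit the ball"; the verb is the head of the VP.
english = Node(
    "S",
    left=Node("NP", word="John"),
    right=Node(
        "VP",
        head_is_left=True,
        left=Node("V", word="hit"),
        right=Node("NP", left=Node("DT", word="the"), right=Node("NN", word="ball")),
    ),
)
hfe = head_finalize(english)
print(hfe.label, " ".join(leaves(hfe)))
```

In this toy case only the VP node is swapped, so the verb moves to the end of its constituent while the subject stays in place.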

46 Observe that the HFE sentences can be regarded as binary trees annotated with syntax tags augmented with swap/straight suffixes. [sent-289, score-0.181]

47 Therefore, the structures of these binary trees can be learned using an off-the-shelf grammar learning algorithm. [sent-290, score-0.082]

48 The learned parsing model can be regarded as an ITG model (Wu, 1997) between the HFE and English sentences. [sent-291, score-0.094]

49 The HFE sentences can be parsed by using the learned parsing model. [sent-293, score-0.182]

50 Then the parsed structures can be converted into their corresponding English structures by swapping the “_SW” nodes. [sent-294, score-0.19]
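
The reverse direction in rows 49 and 50 can be sketched in a few lines of Python. The nested-tuple tree encoding, the function name restore_english, and the underscore form of the "_SW" suffix are assumptions for illustration; in the paper the trees come from a trained parser rather than being written by hand.

```python
def restore_english(tree):
    """Read off the words of an HFE parse, swapping children of *_SW nodes.

    A tree is either a word (str), a unary node (label, child),
    or a binary node (label, left, right).
    """
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    parts = [restore_english(child) for child in children]
    if len(parts) == 2 and label.endswith("_SW"):
        parts.reverse()                      # undo the head-final swap
    return [word for part in parts for word in part]


# Toy HFE parse of "John the ball hit".
hfe_tree = (
    "S_ST",
    ("NP", "John"),
    ("VP_SW", ("NP_ST", "the", "ball"), ("V", "hit")),
)
print(" ".join(restore_english(hfe_tree)))
```

Nodes labeled "_ST" and unary nodes are read left to right unchanged, so reordering quality depends entirely on the labels the parser predicts.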

51 Note that this parsing model jointly learns how to parse and swap the HFE sentences. [sent-295, score-0.105]

52 4 Detailed Explanation of Our Method This section explains the proposed method, which is based on the post-ordering framework using the parsing model. [sent-296, score-0.069]

53 1 Translation Method First, we produce N-best HFE sentences using Japanese-to-HFE monotone phrase-based SMT. [sent-298, score-0.12]

54 Next, we produce K-best parse trees for each HFE sentence by parsing, and produce English sentences by swapping any nodes annotated with “_SW”. [sent-299, score-0.265]

55 Then we score the English sentences and select the English sentence with the highest score. [sent-300, score-0.102]
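
The N-best by K-best search in rows 53 to 55 can be outlined as a nested loop. This is only a sketch: the rows above do not say how the scores are combined, so the score callable, the toy parser, and the additive log-score combination below are all hypothetical.

```python
def select_translation(hfe_nbest, parse_kbest, reorder, score):
    """Pick the best English candidate over N-best HFE x K-best parses.

    hfe_nbest   : list of (hfe_sentence, smt_score)
    parse_kbest : function hfe_sentence -> list of (tree, parse_score)
    reorder     : function tree -> list of English words
    score       : function (smt_score, parse_score, english) -> float
    """
    best_score, best_english = float("-inf"), None
    for hfe, smt_score in hfe_nbest:
        for tree, parse_score in parse_kbest(hfe):
            english = reorder(tree)
            total = score(smt_score, parse_score, english)
            if total > best_score:
                best_score, best_english = total, english
    return best_english


# Toy stand-ins: each "tree" is already a candidate word order.
def toy_parse_kbest(hfe):
    words = hfe.split()
    return [(words, -0.2), (words[::-1], -2.0)]


nbest = [("John the ball hit", -1.0), ("John ball the hit", -1.5)]
chosen = select_translation(
    nbest,
    toy_parse_kbest,
    reorder=lambda tree: tree,
    score=lambda smt, parse, english: smt + parse,  # assumed combination
)
print(chosen)
```

With N = K = 10, as in the setup rows further down, this loop would evaluate at most 100 candidates per input sentence.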

56 2 There are works using the ITG model in SMT: ITG was used for training pre-ordering models (DeNero and Uszkoreit, 2011); hierarchical phrase-based SMT (Chiang, 2007), which is an extension of ITG; and reordering models using ITG (Chen et al. [sent-302, score-0.213]

57 2) Pseudo-particles are inserted after verb arguments: va0 (subject of sentence head), va1 (subject of verb), and va2 (object of verb). [sent-309, score-0.074]

58 Applying our parsing model to an HFE sentence produces an English sentence that does not have articles, but does have pseudo-particles. [sent-313, score-0.157]

59 We removed the pseudo-particles from the reordered sentences before calculating the probabilities used for the scores of the reordered sentences. [sent-314, score-0.212]

60 A reordered sentence without pseudo-particles is represented by E. [sent-315, score-0.121]

61 A language model P(E) was trained from English sentences whose articles were dropped. [sent-316, score-0.098]

62 In order to output a genuine English sentence E′ from E, articles must be inserted into E. [sent-317, score-0.187]

63 A language model trained using genuine English sentences is used for this purpose. [sent-318, score-0.094]

64 We try to insert one of the articles {a, an, the} or no article for each word in E. [sent-319, score-0.065]
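
Article insertion as described in rows 62 to 64 can be sketched as a search over article choices scored by a language model. The exhaustive search and the toy scoring function below are assumptions for illustration; the rows above do not give the search procedure, and a real system would use a language model trained on genuine English sentences.

```python
from itertools import product

ARTICLES = ("", "a", "an", "the")   # "" means no article before the word


def insert_articles(words, lm_score):
    """Try every article choice before each word and keep the best-scoring one.

    Exhaustive search over 4**len(words) candidates, so only for short inputs.
    """
    best_tokens, best_score = None, float("-inf")
    for choice in product(ARTICLES, repeat=len(words)):
        tokens = []
        for article, word in zip(choice, words):
            if article:
                tokens.append(article)
            tokens.append(word)
        candidate_score = lm_score(tokens)
        if candidate_score > best_score:
            best_tokens, best_score = tokens, candidate_score
    return best_tokens


# Hypothetical toy "language model": rewards two known bigrams and
# applies a small length penalty so unnecessary articles are not added.
GOOD_BIGRAMS = {("hit", "the"), ("the", "ball")}


def toy_lm(tokens):
    hits = sum(1 for pair in zip(tokens, tokens[1:]) if pair in GOOD_BIGRAMS)
    return hits - 0.1 * len(tokens)


print(" ".join(insert_articles(["John", "hit", "ball"], toy_lm)))
```

For realistic sentence lengths the exhaustive loop would need to be replaced by a beam or dynamic-programming search, since the number of candidates grows as 4 to the power of the sentence length.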

65 1 Setup We used patent sentence data for the Japanese to English translation subtask from the NTCIR-9 and 8 (Goto et al. [sent-323, score-0.249]

66 There were 2,000 test sentences for NTCIR-9 and 1,251 for NTCIR-8. [sent-326, score-0.058]

67 2 for parsing the English side of the training data. [sent-330, score-0.069]

68 The translation model was trained using sentences of 64 words or less from the training corpus as (Sudoh et al. [sent-333, score-0.172]

69 We used the Berkeley parser (Petrov and Klein, 2007) to train the parsing model for HFE and to 3http://mecab. [sent-337, score-0.069]

70 5 million sentences randomly selected from training sentences of 40 words or less. [sent-341, score-0.116]

71 We used 10-best Moses outputs and 10-best parsing results of Berkeley parser. [sent-345, score-0.069]

72 For PO-PBMT, a distortion limit 0 was used for the Japanese-to-HFE translation and a distortion limit 20 was used for the HFE-to-English translation. [sent-350, score-0.276]

73 The PO-HPBMT method changes the post-ordering method of PO-PBMT from a phrase-based SMT to a hierarchical phrase-based SMT. [sent-351, score-0.122]

74 We used distortion limits of 12 or 20 for PBMT and a max-chart-span 15 for HPBMT. [sent-353, score-0.053]

75 3 Results and Discussion We evaluated translation quality based on the case-insensitive automatic evaluation scores of RIBES v1. [sent-356, score-0.114]

76 Since RIBES is sensitive to global word order and BLEU is sensitive to local word order, the effectiveness of the proposed method for both global and local reordering can be demonstrated through these comparisons. [sent-361, score-0.295]

77 In order to investigate the effects of our post-ordering method in detail, we conducted an “HFE-to-English reordering” experiment, which shows the main contribution of our post-ordering method in the framework of post-ordering SMT as compared with (Sudoh et al. [sent-362, score-0.219]

78 In this experiment, we changed the word order of the oracle-HFE sentences made from reference sentences into English, in the same way as Table 4 in (Sudoh et al. [sent-364, score-0.178]

79 Since RIBES is based on the rank order correlation coefficient, these results show that the proposed method correctly recovered the word order of the English sentences. [sent-368, score-0.138]
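
To make the rank-correlation idea in row 79 concrete, the following sketch computes a normalized Kendall's tau over the positions that output words take in the reference. It is an illustration of rank correlation in general and not the RIBES implementation; the function name and the assumption of tie-free ranks are choices made for this example.

```python
def normalized_kendall_tau(ranks):
    """Fraction of word pairs that appear in the same relative order.

    ranks[i] is the reference position of the i-th output word.
    Returns 1.0 for identical order and 0.0 for fully reversed order
    (assuming no tied ranks).
    """
    n = len(ranks)
    if n < 2:
        return 1.0
    total_pairs = n * (n - 1) // 2
    concordant = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if ranks[i] < ranks[j]
    )
    return concordant / total_pairs


print(normalized_kendall_tau([0, 1, 2, 3]))  # same order as the reference
print(normalized_kendall_tau([3, 2, 1, 0]))  # fully reversed
print(normalized_kendall_tau([0, 2, 1, 3]))  # one adjacent pair swapped
```

A score near 1 means almost every pair of words is in the reference order, which is why such a measure reacts strongly to global reordering errors.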

80 These high scores also indicate that the parsing results for high quality HFE are fairly trustworthy. [sent-369, score-0.069]

81 However, some groups used pre-ordering methods in the NTCIR-9 Japanese to English translation subtask. [sent-376, score-0.114]

82 The proposed method parses sentences that consist of target language words in a source language word order, and does reordering by transferring the syntactic structures similar to the source language syntactic structures into the target language syntactic structures. [sent-382, score-0.626]

83 In Proceedings of Human Language Technologies: The 2009 NAACL, pages 254–262, Boulder, Colorado, June. [sent-387, score-0.048]

84 In Proceedings of the 43rd ACL, pages 531–540, Ann Arbor, Michigan, June. [sent-396, score-0.048]

85 In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 193–203, Edinburgh, Scotland, UK. [sent-401, score-0.048]

86 In Proceedings of the 43rd ACL, pages 541–548, Ann Arbor, Michigan, June. [sent-407, score-0.048]

87 In Daniel Marcu Susan Dumais and Salim Roukos, editors, HLT-NAACL 2004: Main Proceedings, pages 273–280, Boston, Massachusetts, USA, May 2 - May 7. [sent-416, score-0.048]

88 In Proceedings of NAACL-HLT, pages 849–857, Los Angeles, California, June. [sent-421, score-0.048]

89 In Proceedings of the 23rd Coling, pages 447–455, Beijing, China, August. [sent-431, score-0.048]

90 In Proceedings of the 2010 EMNLP, pages 944–952. [sent-436, score-0.048]

91 In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, pages 244–251, Uppsala, Sweden, July. [sent-440, score-0.048]

92 In Proceedings of the 45th ACL, pages 177–180, Prague, Czech Republic, June. [sent-450, score-0.048]

93 In Proceedings of the 21st ACL, pages 609–616, Sydney, Australia, July. [sent-459, score-0.048]

94 In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 558–566, Suntec, Singapore, August. [sent-464, score-0.048]

95 In Computational Linguistics, Volume 34, Number 1, pages 81–88. [sent-475, score-0.048]

96 In NAACL-HLT, pages 404–411, Rochester, New York, April. [sent-479, score-0.048]

97 In Proceedings of the 13th Machine Translation Summit, pages 316–323. [sent-492, score-0.048]

98 In Proceedings of the 2009 EMNLP, pages 1007–1016, Singapore, August. [sent-496, score-0.048]

99 In Proceedings of the 13th Machine Translation Summit, pages 300–307. [sent-501, score-0.048]

100 In Proceedings of Coling, pages 508–514, Geneva, Switzerland, Aug 23–Aug 27. [sent-509, score-0.048]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hfe', 0.699), ('sudoh', 0.311), ('smt', 0.193), ('reordering', 0.169), ('japanese', 0.143), ('finalization', 0.129), ('translation', 0.114), ('ribes', 0.113), ('hajime', 0.109), ('katsuhito', 0.109), ('isozaki', 0.107), ('postordering', 0.104), ('english', 0.094), ('patent', 0.091), ('head', 0.089), ('itg', 0.086), ('goto', 0.077), ('reordered', 0.077), ('duh', 0.077), ('tsukada', 0.073), ('sw', 0.069), ('parsing', 0.069), ('swapped', 0.068), ('sentences', 0.058), ('parsed', 0.055), ('xianchao', 0.054), ('kevin', 0.054), ('distortion', 0.053), ('kondo', 0.052), ('monotonously', 0.052), ('reorders', 0.052), ('swapping', 0.052), ('pages', 0.048), ('wu', 0.046), ('sov', 0.045), ('masao', 0.045), ('utiyama', 0.045), ('matusov', 0.045), ('structures', 0.045), ('masaaki', 0.044), ('sentence', 0.044), ('hierarchical', 0.044), ('denero', 0.043), ('phrasebased', 0.042), ('miyao', 0.042), ('moses', 0.042), ('enju', 0.041), ('preordering', 0.041), ('pbmt', 0.041), ('fujii', 0.041), ('articles', 0.04), ('method', 0.039), ('isao', 0.038), ('tromble', 0.038), ('converted', 0.038), ('order', 0.037), ('koehn', 0.037), ('binary', 0.037), ('target', 0.037), ('swap', 0.036), ('reorder', 0.036), ('aug', 0.036), ('transferring', 0.036), ('genuine', 0.036), ('parallel', 0.035), ('uszkoreit', 0.035), ('sumita', 0.035), ('translating', 0.034), ('syntax', 0.033), ('monotone', 0.033), ('hideki', 0.032), ('eiichiro', 0.031), ('inserted', 0.03), ('produce', 0.029), ('michigan', 0.029), ('xia', 0.029), ('ding', 0.029), ('syntactic', 0.029), ('annotated', 0.028), ('limit', 0.028), ('berkeley', 0.028), ('tsujii', 0.028), ('arbor', 0.027), ('association', 0.027), ('node', 0.027), ('petrov', 0.026), ('srilm', 0.026), ('stolcke', 0.026), ('heads', 0.026), ('summit', 0.026), ('philipp', 0.026), ('sequence', 0.025), ('statistical', 0.025), ('word', 0.025), ('nodes', 0.025), ('bleu', 0.025), ('alexandra', 0.025), ('regarded', 0.025), ('liu', 0.024), ('source', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999994 162 acl-2012-Post-ordering by Parsing for Japanese-English Statistical Machine Translation

Author: Isao Goto ; Masao Utiyama ; Eiichiro Sumita

Abstract: Reordering is a difficult task in translating between widely different languages such as Japanese and English. We employ the post-ordering framework proposed by (Sudoh et al., 2011b) for Japanese to English translation and improve upon the reordering method. The existing post-ordering method reorders a sequence of target language words in a source language word order via SMT, while our method reorders the sequence by: 1) parsing the sequence to obtain syntax structures similar to a source language structure, and 2) transferring the obtained syntax structures into the syntax structures of the target language.

2 0.22061759 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation

Author: Nan Yang ; Mu Li ; Dongdong Zhang ; Nenghai Yu

Abstract: Long distance word reordering is a major challenge in statistical machine translation research. Previous work has shown using source syntactic trees is an effective way to tackle this problem between two languages with substantial word order difference. In this work, we further extend this line of exploration and propose a novel but simple approach, which utilizes a ranking model based on word order precedence in the target language to reposition nodes in the syntactic parse tree of a source sentence. The ranking model is automatically derived from word aligned parallel data with a syntactic parser for source language based on both lexical and syntactical features. We evaluated our approach on large-scale Japanese-English and English-Japanese machine translation tasks, and show that it can significantly outperform the baseline phrase-based SMT system.

3 0.18360353 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation

Author: Arianna Bisazza ; Marcello Federico

Abstract: This paper presents a novel method to suggest long word reorderings to a phrase-based SMT decoder. We address language pairs where long reordering concentrates on few patterns, and use fuzzy chunk-based rules to predict likely reorderings for these phenomena. Then we use reordered n-gram LMs to rank the resulting permutations and select the n-best for translation. Finally we encode these reorderings by modifying selected entries of the distortion cost matrix, on a per-sentence basis. In this way, we expand the search space by a much finer degree than if we simply raised the distortion limit. The proposed techniques are tested on Arabic-English and German-English using well-known SMT benchmarks.

4 0.15664998 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation

Author: Tong Xiao ; Jingbo Zhu ; Hao Zhang ; Qiang Li

Abstract: We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierarchical phrase-based model, and various syntax-based models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers different choices of decoding algorithms, such as phrase-based decoding, decoding as parsing/tree-parsing and forest-based decoding. Moreover, several useful utilities were distributed with the toolkit, including a discriminative reordering model, a simple and fast language model, and an implementation of minimum error rate training for weight tuning.

5 0.1527127 4 acl-2012-A Comparative Study of Target Dependency Structures for Statistical Machine Translation

Author: Xianchao Wu ; Katsuhito Sudoh ; Kevin Duh ; Hajime Tsukada ; Masaaki Nagata

Abstract: This paper presents a comparative study of target dependency structures yielded by several state-of-the-art linguistic parsers. Our approach is to measure the impact of these nonisomorphic dependency structures to be used for string-to-dependency translation. Besides using traditional dependency parsers, we also use the dependency structures transformed from PCFG trees and predicate-argument structures (PASs) which are generated by an HPSG parser and a CCG parser. The experiments on Chinese-to-English translation show that the HPSG parser’s PASs achieved the best dependency and translation accuracies.

6 0.1495612 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT

7 0.12616883 140 acl-2012-Machine Translation without Words through Substring Alignment

8 0.11721632 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT

9 0.11488294 204 acl-2012-Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation

10 0.10551373 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?

11 0.10464668 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models

12 0.092477895 54 acl-2012-Combining Word-Level and Character-Level Models for Machine Translation Between Closely-Related Languages

13 0.092382893 105 acl-2012-Head-Driven Hierarchical Phrase-based Translation

14 0.092241324 125 acl-2012-Joint Learning of a Dual SMT System for Paraphrase Generation

15 0.090977542 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence

16 0.082992829 128 acl-2012-Learning Better Rule Extraction with Translation Span Alignment

17 0.082203858 203 acl-2012-Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information

18 0.080550939 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules

19 0.080310777 143 acl-2012-Mixing Multiple Translation Models in Statistical Machine Translation

20 0.079733193 108 acl-2012-Hierarchical Chunk-to-String Translation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.212), (1, -0.206), (2, 0.016), (3, -0.017), (4, 0.047), (5, -0.153), (6, -0.027), (7, 0.037), (8, 0.034), (9, -0.017), (10, 0.057), (11, 0.027), (12, 0.025), (13, 0.003), (14, -0.061), (15, -0.01), (16, -0.007), (17, -0.09), (18, -0.009), (19, -0.174), (20, 0.03), (21, 0.051), (22, -0.077), (23, -0.006), (24, -0.136), (25, 0.092), (26, -0.061), (27, 0.143), (28, 0.086), (29, 0.075), (30, -0.059), (31, 0.038), (32, -0.025), (33, -0.065), (34, 0.049), (35, -0.027), (36, 0.096), (37, -0.08), (38, 0.047), (39, 0.056), (40, -0.119), (41, 0.044), (42, 0.018), (43, -0.009), (44, -0.093), (45, -0.067), (46, -0.038), (47, 0.03), (48, 0.021), (49, 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.8936258 162 acl-2012-Post-ordering by Parsing for Japanese-English Statistical Machine Translation

Author: Isao Goto ; Masao Utiyama ; Eiichiro Sumita

Abstract: Reordering is a difficult task in translating between widely different languages such as Japanese and English. We employ the post-ordering framework proposed by (Sudoh et al., 2011b) for Japanese to English translation and improve upon the reordering method. The existing post-ordering method reorders a sequence of target language words in a source language word order via SMT, while our method reorders the sequence by: 1) parsing the sequence to obtain syntax structures similar to a source language structure, and 2) transferring the obtained syntax structures into the syntax structures of the target language.

2 0.87374717 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation

Author: Nan Yang ; Mu Li ; Dongdong Zhang ; Nenghai Yu

Abstract: Long distance word reordering is a major challenge in statistical machine translation research. Previous work has shown using source syntactic trees is an effective way to tackle this problem between two languages with substantial word order difference. In this work, we further extend this line of exploration and propose a novel but simple approach, which utilizes a ranking model based on word order precedence in the target language to reposition nodes in the syntactic parse tree of a source sentence. The ranking model is automatically derived from word aligned parallel data with a syntactic parser for source language based on both lexical and syntactical features. We evaluated our approach on large-scale Japanese-English and English-Japanese machine translation tasks, and show that it can significantly outperform the baseline phrase-based SMT system.

3 0.82133418 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation

Author: Arianna Bisazza ; Marcello Federico

Abstract: This paper presents a novel method to suggest long word reorderings to a phrase-based SMT decoder. We address language pairs where long reordering concentrates on few patterns, and use fuzzy chunk-based rules to predict likely reorderings for these phenomena. Then we use reordered n-gram LMs to rank the resulting permutations and select the n-best for translation. Finally we encode these reorderings by modifying selected entries of the distortion cost matrix, on a per-sentence basis. In this way, we expand the search space by a much finer degree than if we simply raised the distortion limit. The proposed techniques are tested on Arabic-English and German-English using well-known SMT benchmarks.

4 0.59084976 105 acl-2012-Head-Driven Hierarchical Phrase-based Translation

Author: Junhui Li ; Zhaopeng Tu ; Guodong Zhou ; Josef van Genabith

Abstract: This paper presents an extension of Chiang’s hierarchical phrase-based (HPB) model, called Head-Driven HPB (HD-HPB), which incorporates head information in translation rules to better capture syntax-driven information, as well as improved reordering between any two neighboring non-terminals at any stage of a derivation to explore a larger reordering search space. Experiments on Chinese-English translation on four NIST MT test sets show that the HD-HPB model significantly outperforms Chiang’s model with average gains of 1.91 points absolute in BLEU.

5 0.55102205 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation

Author: Tong Xiao ; Jingbo Zhu ; Hao Zhang ; Qiang Li

Abstract: We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierarchical phrase-based model, and various syntax-based models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers different choices of decoding algorithms, such as phrase-based decoding, decoding as parsing/tree-parsing and forest-based decoding. Moreover, several useful utilities were distributed with the toolkit, including a discriminative reordering model, a simple and fast language model, and an implementation of minimum error rate training for weight tuning.

6 0.54715431 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT

7 0.53312612 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence

8 0.50306416 4 acl-2012-A Comparative Study of Target Dependency Structures for Statistical Machine Translation

9 0.45091593 108 acl-2012-Hierarchical Chunk-to-String Translation

10 0.43067399 204 acl-2012-Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation

11 0.4282355 54 acl-2012-Combining Word-Level and Character-Level Models for Machine Translation Between Closely-Related Languages

12 0.42519146 138 acl-2012-LetsMT!: Cloud-Based Platform for Do-It-Yourself Machine Translation

13 0.41720852 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

14 0.41632485 178 acl-2012-Sentence Simplification by Monolingual Machine Translation

15 0.41117775 140 acl-2012-Machine Translation without Words through Substring Alignment

16 0.40540224 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?

17 0.39547846 158 acl-2012-PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning

18 0.39134702 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules

19 0.37478065 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models

20 0.37378889 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(26, 0.029), (28, 0.035), (30, 0.051), (37, 0.021), (39, 0.021), (57, 0.041), (71, 0.021), (74, 0.033), (82, 0.031), (84, 0.011), (85, 0.385), (90, 0.129), (92, 0.025), (94, 0.034), (99, 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.9150731 82 acl-2012-Entailment-based Text Exploration with Application to the Health-care Domain

Author: Meni Adler ; Jonathan Berant ; Ido Dagan

Abstract: We present a novel text exploration model, which extends the scope of state-of-the-art technologies by moving from standard concept-based exploration to statement-based exploration. The proposed scheme utilizes the textual entailment relation between statements as the basis of the exploration process. A user of our system can explore the result space of a query by drilling down/up from one statement to another, according to entailment relations specified by an entailment graph and an optional concept taxonomy. As a prominent use case, we apply our exploration system and illustrate its benefit on the health-care domain. To the best of our knowledge this is the first implementation of an exploration system at the statement level that is based on the textual entailment relation.

2 0.90725631 13 acl-2012-A Graphical Interface for MT Evaluation and Error Analysis

Author: Meritxell Gonzalez ; Jesus Gimenez ; Lluis Marquez

Abstract: Error analysis in machine translation is a necessary step in order to investigate the strengths and weaknesses of the MT systems under development and allow fair comparisons among them. This work presents an application that shows how a set of heterogeneous automatic metrics can be used to evaluate a test bed of automatic translations. To do so, we have set up an online graphical interface for the ASIYA toolkit, a rich repository of evaluation measures working at different linguistic levels. The current implementation of the interface shows constituency and dependency trees as well as shallow syntactic and semantic annotations, and word alignments. The intelligent visualization of the linguistic structures used by the metrics, as well as a set of navigational functionalities, may lead towards advanced methods for automatic error analysis.

3 0.87705761 165 acl-2012-Probabilistic Integration of Partial Lexical Information for Noise Robust Haptic Voice Recognition

Author: Khe Chai Sim

Abstract: This paper presents a probabilistic framework that combines multiple knowledge sources for Haptic Voice Recognition (HVR), a multimodal input method designed to provide efficient text entry on modern mobile devices. HVR extends the conventional voice input by allowing users to provide complementary partial lexical information via touch input to improve the efficiency and accuracy of voice recognition. This paper investigates the use of the initial letter of the words in the utterance as the partial lexical information. In addition to the acoustic and language models used in automatic speech recognition systems, HVR uses the haptic and partial lexical models as additional knowledge sources to reduce the recognition search space and suppress confusions. Experimental results show that both the word error rate and runtime factor can be reduced by a factor of two using HVR.

4 0.85873431 152 acl-2012-Multilingual WSD with Just a Few Lines of Code: the BabelNet API

Author: Roberto Navigli ; Simone Paolo Ponzetto

Abstract: In this paper we present an API for programmatic access to BabelNet (a wide-coverage multilingual lexical knowledge base) and multilingual knowledge-rich Word Sense Disambiguation (WSD). Our aim is to provide the research community with easy-to-use tools to perform multilingual lexical semantic analysis and foster further research in this direction.

5 0.8563441 96 acl-2012-Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection

Author: Jinho D. Choi ; Martha Palmer

Abstract: This paper presents a novel way of improving POS tagging on heterogeneous data. First, two separate models are trained (generalized and domain-specific) from the same data set by controlling lexical items with different document frequencies. During decoding, one of the models is selected dynamically given the cosine similarity between each sentence and the training data. This dynamic model selection approach, coupled with a one-pass, left-to-right POS tagging algorithm, is evaluated on corpora from seven different genres. Even with this simple tagging algorithm, our system shows comparable results against other state-of-the-art systems, and gives higher accuracies when evaluated on a mixture of the data. Furthermore, our system is able to tag about 32K tokens per second. We believe that this model selection approach can be applied to more sophisticated tagging algorithms and improve their robustness even further.

same-paper 6 0.78773457 162 acl-2012-Post-ordering by Parsing for Japanese-English Statistical Machine Translation

7 0.61904877 158 acl-2012-PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning

8 0.58397573 136 acl-2012-Learning to Translate with Multiple Objectives

9 0.57728291 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

10 0.56584787 52 acl-2012-Combining Coherence Models and Machine Translation Evaluation Metrics for Summarization Evaluation

11 0.56066453 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents

12 0.55784255 29 acl-2012-Assessing the Effect of Inconsistent Assessors on Summarization Evaluation

13 0.55546176 46 acl-2012-Character-Level Machine Translation Evaluation for Languages with Ambiguous Word Boundaries

14 0.54258019 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

15 0.53540039 138 acl-2012-LetsMT!: Cloud-Based Platform for Do-It-Yourself Machine Translation

16 0.525823 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence

17 0.525765 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

18 0.5230819 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies

19 0.52219027 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation

20 0.52175862 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool