emnlp emnlp2010 emnlp2010-50 emnlp2010-50-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jinhua Du ; Jie Jiang ; Andy Way
Abstract: For resource-limited language pairs, coverage of the test set by the parallel corpus is an important factor that affects translation quality in two respects: 1) out of vocabulary words; 2) the same information in an input sentence can be expressed in different ways, while current phrase-based SMT systems cannot automatically select an alternative way to transfer the same information. Therefore, given limited data, in order to facilitate translation from the input side, this paper proposes a novel method to reduce the translation difficulty using source-side lattice-based paraphrases. We utilise the original phrases from the input sentence and the corresponding paraphrases to build a lattice with estimated weights for each edge to improve translation quality. Compared to the baseline system, our method achieves relative improvements of 7.07%, 6.78% and 3.63% in terms of BLEU score on small, medium and large- scale English-to-Chinese translation tasks respectively. The results show that the proposed method is effective not only for resourcelimited language pairs, but also for resourcesufficient pairs to some extent.