acl acl2010 acl2010-223 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ondrej Bojar ; Kamil Kos ; David Marecek
Abstract: We illustrate and explain problems of n-grams-based machine translation (MT) metrics (e.g. BLEU) when applied to morphologically rich languages such as Czech. A novel metric SemPOS based on the deep-syntactic representation of the sentence tackles the issue and retains the performance for translation to English as well.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We illustrate and explain problems of n-grams-based machine translation (MT) metrics (e. [sent-3, score-0.146]
2 BLEU) when applied to morphologically rich languages such as Czech. [sent-5, score-0.032]
3 A novel metric SemPOS based on the deep-syntactic representation of the sentence tackles the issue and retains the performance for translation to English as well. [sent-6, score-0.176]
4 1 Introduction Automatic metrics of machine translation (MT) quality are vital for research progress at a fast pace. [sent-7, score-0.146]
5 Many automatic metrics of MT quality have been proposed and evaluated in terms of correlation with human judgments while various techniques of manual judging are being examined as well, see e. [sent-8, score-0.29]
6 Section 2 illustrates and explains severe problems of a widely used BLEU metric (Papineni et al. [sent-14, score-0.065]
7 , 2002) when applied to Czech as a representative of languages with rich morphology. [sent-15, score-0.032]
8 We see this as an instance of the sparse data problem well known for MT itself: too much detail in the formal representation leading to low coverage of e. [sent-16, score-0.049]
9 Figure 1: BLEU and human ranks of systems participating in the English-to-Czech WMT09 shared task. [sent-39, score-0.029]
10 Section 3 introduces and evaluates some new variations of SemPOS (Kos and Bojar, 2009), a metric based on the deep syntactic representation of the sentence performing very well for Czech as the target language. [sent-40, score-0.099]
11 Its correlation to human judgments was originally deemed high (for English) but better correlating metrics (esp. [sent-44, score-0.326]
12 Figure 1 illustrates a very low correlation to human judgments when translating to Czech. [sent-51, score-0.214]
13 Table 1: n-grams confirmed by the reference and containing error flags; total n-grams for 1- to 4-grams: 35,531 / 33,891 / 32,251 / 30,611. [sent-75, score-0.245]
14 This focus goes directly against the properties of Czech: relatively free word order allows many permutations of words and rich morphology renders many valid word forms not confirmed by the reference. [sent-77, score-0.187]
15 3 These problems are to some extent mitigated if several reference translations are available, but this is often not the case. [sent-78, score-0.099]
16 In the case of pctrans, the match is even a false positive: “do” (to) is a preposition that should be used for the “minus” phrase and not for the “end of the day” phrase. [sent-81, score-0.052]
17 Table 1 estimates the overall magnitude of this issue: For 1-grams to 4-grams in 1640 instances (different MT outputs and different annotators) of 200 sentences with manually flagged errors4, we count how often the n-gram is confirmed by the reference and how often it contains an error flag. [sent-83, score-0.278]
18 The suspicious cases are n-grams confirmed by the reference but still containing a flag (false positives) and n-grams not confirmed despite containing no error flag (false negatives). [sent-84, score-0.492]
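The counting scheme of sentences 17–18 can be sketched in a few lines. The token/flag representation and the clipped-count notion of “confirmed” below are assumptions for illustration, not the paper's exact procedure:

```python
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def confusion_counts(hyp_tokens, hyp_flags, ref_tokens, n):
    """Split hypothesis n-grams into four classes by (confirmed, flagged).

    hyp_flags[i] is True when token i was manually flagged as an error.
    An n-gram is 'confirmed' if the reference still contains it (clipped
    counts) and 'flagged' if any of its tokens carries an error flag.
    Following the text: confirmed-but-flagged = false positive ('fp'),
    unconfirmed-but-clean = false negative ('fn').
    """
    ref_counts = Counter(ngrams(ref_tokens, n))
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for i in range(len(hyp_tokens) - n + 1):
        gram = tuple(hyp_tokens[i:i + n])
        flagged = any(hyp_flags[i:i + n])
        if ref_counts[gram] > 0:
            ref_counts[gram] -= 1  # clip: each reference n-gram confirms once
            counts["fp" if flagged else "tp"] += 1
        else:
            counts["tn" if flagged else "fn"] += 1
    return counts
```

Running this per n from 1 to 4 over the 1640 annotated outputs would reproduce the percentages summarized in Table 1.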
19 Fortunately, there are relatively few false positives in n-gram based metrics: 6. [sent-85, score-0.052]
20 The issue of false negatives is more serious and confirms the problem of sparse data if only one reference is available. [sent-87, score-0.23]
21 (2009) identify similar issues when evaluating translation to Arabic and employ rule-based normalization of MT output to improve the correlation. [sent-89, score-0.07]
22 4The dataset with manually flagged errors is available at http://ufal. [sent-93, score-0.033]
23 This amounts to 34% of running unigrams, giving enough space to differ in human judgments and still remain unscored. [sent-97, score-0.113]
24 Figure 3 documents the issue across languages: the lower the BLEU score itself (i. [sent-98, score-0.041]
25 fewer confirmed n-grams), the lower the correlation to human judgments regardless of the target language (WMT09 shared task, 2025 sentences per language). [sent-100, score-0.401]
26 The framed words in the illustration are not confirmed by the reference, but the actual error in these words is very severe for comprehension: nouns were used twice instead of finite verbs, and a misleading translation of a preposition was chosen. [sent-103, score-0.257]
27 The output by pctrans preserves the meaning much better despite not scoring in either of the finite verbs and producing far shorter confirmed sequences. [sent-104, score-0.298]
28 3 Extensions of SemPOS SemPOS (Kos and Bojar, 2009) is inspired by metrics based on overlapping of linguistic features in the reference and in the translation (Giménez and Márquez, 2007). [sent-105, score-0.204]
29 , 2006), formally a dependency tree that includes only autosemantic (content-bearing) words. [sent-108, score-0.133]
30 5 SemPOS as defined in Kos and Bojar (2009) disregards the syntactic structure and uses the semantic part of speech of the words (noun, verb, etc. [sent-109, score-0.065]
31 For each semantic part of speech t, the overlapping O(t) is set to zero if the part of speech does not occur in the reference or the candidate set and otherwise it is computed as given in Equation 1below. [sent-112, score-0.122]
32 Our plans include experiments with approximating the deep syntactic analysis with a simple tagger, which would also decrease the installation burden and computation costs, at the expense of accuracy. [sent-118, score-0.034]
33 Only a single unigram in each hypothesis is confirmed in the reference. [sent-120, score-0.187]
34 Figure 3: BLEU score correlates with its correlation to human judgments. [sent-129, score-0.13]
35 The final SemPOS score is obtained by macro-averaging over all parts of speech: SemPOS = 1/|T| · Σ_{t∈T} O(t) (2) where T is the set of all possible semantic parts of speech types. [sent-137, score-0.032]
36 (The degenerate case of blank candidate and reference has SemPOS zero. [sent-138, score-0.058]
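The per-class overlap and the macro-average of Equation 2 can be sketched as follows. Since Equation 1 itself is not reproduced in this summary, the intersection-over-union form of the overlap below is an assumption, as is the concrete set of semantic POS classes:

```python
from collections import Counter

def overlap(t, cand, ref):
    """Overlap O(t) for one semantic part of speech t.

    cand/ref are lists of (lemma, sempos) pairs for autosemantic words.
    This uses clipped lemma counts: intersection over union of the
    per-class multisets -- an assumed instantiation of Equation 1.
    """
    c = Counter(lemma for lemma, pos in cand if pos == t)
    r = Counter(lemma for lemma, pos in ref if pos == t)
    inter = sum((c & r).values())  # clipped matching lemma counts
    union = sum((c | r).values())  # max counts over both sides
    return inter / union if union else 0.0  # absent class contributes zero

def sempos(cand, ref, classes=("n", "v", "adj", "adv")):
    """Macro-average over all semantic POS types (Equation 2)."""
    return sum(overlap(t, cand, ref) for t in classes) / len(classes)
```

With a blank candidate and reference every O(t) is zero, matching the degenerate case noted above.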
37 SemPOS uses semantic parts of speech to classify autosemantic words. [sent-144, score-0.165]
38 The tectogrammatical layer also offers a feature called Functor, describing the relation of a word to its governor, much as semantic roles do. [sent-145, score-0.119]
39 In SemPOS, an autosemantic word of a class is confirmed if its lemma matches the reference. [sent-150, score-0.32]
40 We utilize the dependency relations at the tectogrammatical layer to validate valence by refining the overlap and requiring also the lemma of 1) the parent (denoted “par”), or 2) all the children regardless of their order (denoted “sons”) to match. [sent-151, score-0.119]
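The “par” and “sons” refinements amount to enlarging the matching key of a word. Here is a minimal sketch, assuming a flat node representation (lemma, POS, parent lemma, child lemmas); the paper's actual tectogrammatical data structures are richer:

```python
from collections import Counter

def refined_overlap(cand, ref, mode="plain"):
    """Valence-refined overlap, a sketch of the 'par'/'sons' variants.

    Each word is a dict with 'lemma', 'pos', 'parent' (parent lemma) and
    'sons' (iterable of child lemmas). A candidate word counts as
    confirmed only if lemma, POS and the required dependency context all
    match a reference word; child order is ignored via frozenset.
    """
    def key(w):
        if mode == "par":
            return (w["lemma"], w["pos"], w["parent"])
        if mode == "sons":
            return (w["lemma"], w["pos"], frozenset(w["sons"]))
        return (w["lemma"], w["pos"])  # plain SemPOS-style matching
    c = Counter(key(w) for w in cand)
    r = Counter(key(w) for w in ref)
    union = sum((c | r).values())
    return sum((c & r).values()) / union if union else 0.0
```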
41 This is too coarse even for languages with relatively free word order like Czech. [sent-154, score-0.032]
42 Another issue is that it operates on lemmas and it completely disregards correct word forms. [sent-155, score-0.107]
43 For the purposes of the combination, we compute BLEU only on unigrams up to fourgrams (denoted BLEU1, …, BLEU4). [sent-157, score-0.1]
44 The tectogrammatical layer is being adapted for English (Cinková et al. [sent-172, score-0.119]
45 2 Evaluation of SemPOS and Friends We measured the metric performance on data used in MetricsMATR08, WMT09 and WMT08. [sent-176, score-0.065]
46 For the evaluation of metric correlation with human judgments at the system level, we used the Pearson correlation coefficient ρ applied to ranks. [sent-177, score-0.38]
47 When correlating ranks (instead of exact scores) and with this handling of ties, the Pearson coefficient is equivalent to Spearman’s rank correlation coefficient. [sent-180, score-0.137]
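The equivalence stated here is easy to verify: applying the plain Pearson formula to midranks (ties averaged) yields exactly Spearman's rank correlation. A minimal sketch with hypothetical scores:

```python
def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def ranks(scores):
    """1-based ranks with ties averaged (midranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    r = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # average rank for the tie group
        for k in range(i, j + 1):
            r[order[k]] = mid
        i = j + 1
    return r

# Pearson applied to midranks reproduces Spearman's rho; the scores
# below are made up for illustration.
metric_scores = [0.31, 0.28, 0.28, 0.40]
human_scores = [0.55, 0.40, 0.42, 0.60]
rho = pearson(ranks(metric_scores), ranks(human_scores))
```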
48 We assigned a human ranking to the systems based on the percent of time that their translations were judged to be better than or equal to the translations of any other system in the manual evaluation. [sent-182, score-0.111]
49 Correlation coefficients for English are shown in Table 2. [sent-185, score-0.031]
50 The best metric is Voidpar closely followed by Voidsons. [sent-186, score-0.065]
51 The explanation is that Void compared to SemPOS or Functor does not lose points by an erroneous assignment of the POS or the functor, and that Voidpar profits from checking the dependency relations between autosemantic words. [sent-187, score-0.133]
52 Additionally, we confirm that 4-grams alone have little discriminative power both when used as a metric of their own (BLEU4) as well as in a linear combination with SemPOS. [sent-189, score-0.065]
53 The best metric for Czech (see Table 3) is a linear combination of SemPOS and 4-gram BLEU closely followed by other SemPOS and BLEUn combinations. [sent-190, score-0.065]
54 We assume this is because BLEU4 can capture correctly translated fixed phrases, which is positively reflected in human judgments. [sent-191, score-0.029]
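Combining the metrics is a simple weighted sum of their scores; the 3:1 weighting mirrors the best Czech setting above, while the normalization by the weight sum (to keep the result in [0, 1]) is an assumption:

```python
def combined_score(sempos_score, bleu4_score, w_sempos=3, w_bleu=1):
    """Linear combination of SemPOS and BLEU4 scores, normalized by the
    weight sum. Default weights follow the text's 3*SemPOS + 1*BLEU4."""
    total = w_sempos + w_bleu
    return (w_sempos * sempos_score + w_bleu * bleu4_score) / total
```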
55 Including BLEU1 in the combination favors translations with word forms as expected by the reference. 6For each n ∈ {1, 2, 3, 4}, we show only the best weight setting for SemPOS and BLEUn. [sent-192, score-0.041]
56 Table 2: Average, best and worst system-level correlation coefficients for translation to English from various source languages evaluated on 10 different testsets. [sent-259, score-0.234]
57 Given the negligible difference between SemPOS alone and the linear combinations, we see that word forms are not the major issue for humans interpreting the translation—most likely because the systems so far often make more important errors. [sent-262, score-0.041]
58 This is also confirmed by the observation that using BLEU alone is rather unreliable for Czech and BLEU-1 (which judges unigrams only) is even worse. [sent-263, score-0.243]
59 The error metrics PER and TER showed the lowest correlation with human judgments for translation to Czech. [sent-265, score-0.36]
60 4 Conclusion This paper documented problems of singlereference BLEU when applied to morphologically rich languages such as Czech. [sent-266, score-0.032]
61 BLEU suffers from a sparse data problem, unable to judge the quality of tokens not confirmed by the reference. [sent-267, score-0.236]
62 This is confirmed for other languages as well: the lower the BLEU score the lower the correlation to human judgments. [sent-268, score-0.349]
63 We introduced a refinement of SemPOS, an automatic metric of MT quality based on deep-syntactic representation of the sentence, tackling the sparse data issue. [sent-269, score-0.065]
64 Table 3: System-level correlation coefficients (Avg/Best/Worst) for English-to-Czech translation evaluated on 3 different testsets; the best metric is 3·SemPOS+1·BLEU4 (0.23). [sent-335, score-0.202]
65 SemPOS was evaluated on translation to Czech and to English, scoring better than or comparable to many established metrics. [sent-337, score-0.07]
wordName wordTfidf (topN-words)
[('sempos', 0.709), ('bleu', 0.234), ('confirmed', 0.187), ('autosemantic', 0.133), ('functor', 0.124), ('kos', 0.111), ('pctrans', 0.111), ('correlation', 0.101), ('czech', 0.097), ('mt', 0.093), ('judgments', 0.084), ('tectogrammatical', 0.083), ('bojar', 0.082), ('metrics', 0.076), ('void', 0.071), ('translation', 0.07), ('voidpar', 0.066), ('metric', 0.065), ('reference', 0.058), ('unigrams', 0.056), ('false', 0.052), ('sparse', 0.049), ('testsets', 0.039), ('layer', 0.036), ('disregards', 0.033), ('languages', 0.032), ('speech', 0.032), ('coefficients', 0.031), ('flag', 0.03), ('negatives', 0.03), ('human', 0.029), ('denoted', 0.029), …]
simIndex simValue paperId paperTitle
same-paper 1 1.0 223 acl-2010-Tackling Sparse Data Issue in Machine Translation Evaluation
Author: Ondrej Bojar ; Kamil Kos ; David Marecek
Abstract: We illustrate and explain problems of n-grams-based machine translation (MT) metrics (e.g. BLEU) when applied to morphologically rich languages such as Czech. A novel metric SemPOS based on the deep-syntactic representation of the sentence tackles the issue and retains the performance for translation to English as well.
2 0.16190243 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking
Author: Radu Soricut ; Abdessamad Echihabi
Abstract: The adoption ofMachine Translation technology for commercial applications is hampered by the lack of trust associated with machine-translated output. In this paper, we describe TrustRank, an MT system enhanced with a capability to rank the quality of translation outputs from good to bad. This enables the user to set a quality threshold, granting the user control over the quality of the translations. We quantify the gains we obtain in translation quality, and show that our solution works on a wide variety of domains and language pairs.
3 0.11459123 54 acl-2010-Boosting-Based System Combination for Machine Translation
Author: Tong Xiao ; Jingbo Zhu ; Muhua Zhu ; Huizhen Wang
Abstract: In this paper, we present a simple and effective method to address the issue of how to generate diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. Our method is based on the framework of boosting. First, a sequence of weak translation systems is generated from a baseline system in an iterative manner. Then, a strong translation system is built from the ensemble of these weak translation systems. To adapt boosting to SMT system combination, several key components of the original boosting algorithms are redesigned in this work. We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrasebased system and a syntax-based system. The experimental results on three NIST evaluation test sets show that our method leads to significant improvements in translation accuracy over the baseline systems. 1
4 0.10948976 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking
Author: Hiroshi Echizen-ya ; Kenji Araki
Abstract: As described in this paper, we propose a new automatic evaluation method for machine translation using noun-phrase chunking. Our method correctly determines the matching words between two sentences using corresponding noun phrases. Moreover, our method determines the similarity between two sentences in terms of the noun-phrase order of appearance. Evaluation experiments were conducted to calculate the correlation among human judgments, along with the scores produced us- ing automatic evaluation methods for MT outputs obtained from the 12 machine translation systems in NTCIR7. Experimental results show that our method obtained the highest correlations among the methods in both sentence-level adequacy and fluency.
5 0.078695163 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
Author: Joern Wuebker ; Arne Mauser ; Hermann Ney
Abstract: Several attempts have been made to learn phrase translation probabilities for phrasebased statistical machine translation that go beyond pure counting of phrases in word-aligned training data. Most approaches report problems with overfitting. We describe a novel leavingone-out approach to prevent over-fitting that allows us to train phrase models that show improved translation performance on the WMT08 Europarl German-English task. In contrast to most previous work where phrase models were trained separately from other models used in translation, we include all components such as single word lexica and reordering mod- els in training. Using this consistent training of phrase models we are able to achieve improvements of up to 1.4 points in BLEU. As a side effect, the phrase table size is reduced by more than 80%.
6 0.075897753 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
7 0.073902048 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction
8 0.073833607 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment
9 0.067238301 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
10 0.062497046 133 acl-2010-Hierarchical Search for Word Alignment
11 0.062247273 57 acl-2010-Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation
12 0.060692459 69 acl-2010-Constituency to Dependency Translation with Forests
13 0.058513496 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
14 0.055677753 45 acl-2010-Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures
15 0.055553392 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation
16 0.055454731 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
17 0.052367266 56 acl-2010-Bridging SMT and TM with Translation Recommendation
18 0.051653907 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages
19 0.048196245 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation
20 0.048184339 99 acl-2010-Efficient Third-Order Dependency Parsers
topicId topicWeight
[(0, -0.136), (1, -0.095), (2, -0.039), (3, 0.001), (4, 0.022), (5, 0.012), (6, -0.037), (7, -0.04), (8, -0.034), (9, 0.066), (10, 0.104), (11, 0.13), (12, 0.055), (13, -0.021), (14, 0.024), (15, 0.017), (16, -0.007), (17, 0.073), (18, -0.041), (19, 0.041), (20, -0.029), (21, 0.009), (22, 0.053), (23, -0.023), (24, -0.02), (25, -0.052), (26, 0.052), (27, 0.124), (28, -0.048), (29, 0.106), (30, 0.06), (31, 0.036), (32, 0.079), (33, 0.058), (34, -0.067), (35, -0.006), (36, 0.068), (37, -0.101), (38, -0.104), (39, -0.011), (40, -0.05), (41, -0.003), (42, 0.02), (43, 0.062), (44, 0.084), (45, -0.044), (46, -0.007), (47, 0.071), (48, 0.006), (49, 0.084)]
simIndex simValue paperId paperTitle
same-paper 1 0.92990422 223 acl-2010-Tackling Sparse Data Issue in Machine Translation Evaluation
Author: Ondrej Bojar ; Kamil Kos ; David Marecek
Abstract: We illustrate and explain problems of n-grams-based machine translation (MT) metrics (e.g. BLEU) when applied to morphologically rich languages such as Czech. A novel metric SemPOS based on the deep-syntactic representation of the sentence tackles the issue and retains the performance for translation to English as well.
2 0.87239563 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking
Author: Hiroshi Echizen-ya ; Kenji Araki
Abstract: As described in this paper, we propose a new automatic evaluation method for machine translation using noun-phrase chunking. Our method correctly determines the matching words between two sentences using corresponding noun phrases. Moreover, our method determines the similarity between two sentences in terms of the noun-phrase order of appearance. Evaluation experiments were conducted to calculate the correlation among human judgments, along with the scores produced us- ing automatic evaluation methods for MT outputs obtained from the 12 machine translation systems in NTCIR7. Experimental results show that our method obtained the highest correlations among the methods in both sentence-level adequacy and fluency.
3 0.83664089 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking
Author: Radu Soricut ; Abdessamad Echihabi
Abstract: The adoption ofMachine Translation technology for commercial applications is hampered by the lack of trust associated with machine-translated output. In this paper, we describe TrustRank, an MT system enhanced with a capability to rank the quality of translation outputs from good to bad. This enables the user to set a quality threshold, granting the user control over the quality of the translations. We quantify the gains we obtain in translation quality, and show that our solution works on a wide variety of domains and language pairs.
4 0.80368537 104 acl-2010-Evaluating Machine Translations Using mNCD
Author: Marcus Dobrinkat ; Tero Tapiovaara ; Jaakko Vayrynen ; Kimmo Kettunen
Abstract: This paper introduces mNCD, a method for automatic evaluation of machine translations. The measure is based on normalized compression distance (NCD), a general information theoretic measure of string similarity, and flexible word matching provided by stemming and synonyms. The mNCD measure outperforms NCD in system-level correlation to human judgments in English.
5 0.73258239 56 acl-2010-Bridging SMT and TM with Translation Recommendation
Author: Yifan He ; Yanjun Ma ; Josef van Genabith ; Andy Way
Abstract: We propose a translation recommendation framework to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We describe an implementation of this framework using an SVM binary classifier. We exploit methods to fine-tune the classifier and investigate a variety of features of different types. We rely on automatic MT evaluation metrics to approximate human judgements in our experiments. Experimental results show that our system can achieve 0.85 precision at 0.89 recall, excluding ex- act matches. Furthermore, it is possible for the end-user to achieve a desired balance between precision and recall by adjusting confidence levels.
6 0.68956518 45 acl-2010-Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures
7 0.64188838 54 acl-2010-Boosting-Based System Combination for Machine Translation
8 0.53827232 57 acl-2010-Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation
9 0.52898985 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
10 0.485055 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
11 0.441284 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction
12 0.4240486 135 acl-2010-Hindi-to-Urdu Machine Translation through Transliteration
13 0.42195493 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation
14 0.41544721 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation
15 0.38847163 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web
16 0.3835462 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
17 0.38037291 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation
18 0.37585562 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages
19 0.37532765 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
20 0.37138194 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems
topicId topicWeight
[(14, 0.038), (16, 0.413), (18, 0.015), (25, 0.037), (42, 0.017), (44, 0.012), (59, 0.091), (73, 0.042), (76, 0.017), (78, 0.038), (83, 0.074), (84, 0.013), (98, 0.102)]
simIndex simValue paperId paperTitle
same-paper 1 0.78533161 223 acl-2010-Tackling Sparse Data Issue in Machine Translation Evaluation
Author: Ondrej Bojar ; Kamil Kos ; David Marecek
Abstract: We illustrate and explain problems of n-grams-based machine translation (MT) metrics (e.g. BLEU) when applied to morphologically rich languages such as Czech. A novel metric SemPOS based on the deep-syntactic representation of the sentence tackles the issue and retains the performance for translation to English as well.
2 0.7286483 250 acl-2010-Untangling the Cross-Lingual Link Structure of Wikipedia
Author: Gerard de Melo ; Gerhard Weikum
Abstract: Wikipedia articles in different languages are connected by interwiki links that are increasingly being recognized as a valuable source of cross-lingual information. Unfortunately, large numbers of links are imprecise or simply wrong. In this paper, techniques to detect such problems are identified. We formalize their removal as an optimization task based on graph repair operations. We then present an algorithm with provable properties that uses linear programming and a region growing technique to tackle this challenge. This allows us to transform Wikipedia into a much more consistent multilingual register of the world’s entities and concepts.
3 0.6076144 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
Author: Coskun Mermer ; Ahmet Afsin Akin
Abstract: We tackle the previously unaddressed problem of unsupervised determination of the optimal morphological segmentation for statistical machine translation (SMT) and propose a segmentation metric that takes into account both sides of the SMT training corpus. We formulate the objective function as the posterior probability of the training corpus according to a generative segmentation-translation model. We describe how the IBM Model-1 translation likelihood can be computed incrementally between adjacent segmentation states for efficient computation. Submerging the proposed segmentation method in a SMT task from morphologically-rich Turkish to English does not exhibit the expected improvement in translation BLEU scores and confirms the robustness of phrase-based SMT to translation unit combinatorics. A positive outcome of this work is the described modification to the sequential search algorithm of Morfessor (Creutz and Lagus, 2007) that enables arbitrary-fold parallelization of the computation, which unexpectedly improves the translation performance as measured by BLEU.
4 0.56891286 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
Author: Jun Sun ; Min Zhang ; Chew Lim Tan
Abstract: We propose Bilingual Tree Kernels (BTKs) to capture the structural similarities across a pair of syntactic translational equivalences and apply BTKs to sub-tree alignment along with some plain features. Our study reveals that the structural features embedded in a bilingual parse tree pair are very effective for sub-tree alignment and the bilingual tree kernels can well capture such features. The experimental results show that our approach achieves a significant improvement on both gold standard tree bank and automatically parsed tree pairs against a heuristic similarity based method. We further apply the sub-tree alignment in machine translation with two methods. It is suggested that the subtree alignment benefits both phrase and syntax based systems by relaxing the constraint of the word alignment. 1
5 0.42003608 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
Author: Hailong Cao ; Eiichiro Sumita
Abstract: Source language parse trees offer very useful but imperfect reordering constraints for statistical machine translation. A lot of effort has been made for soft applications of syntactic constraints. We alternatively propose the selective use of syntactic constraints. A classifier is built automatically to decide whether a node in the parse trees should be used as a reordering constraint or not. Using this information yields a 0.8 BLEU point improvement over a full constraint-based system.
6 0.41656917 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation
8 0.40984964 163 acl-2010-Learning Lexicalized Reordering Models from Reordering Graphs
9 0.4041788 56 acl-2010-Bridging SMT and TM with Translation Recommendation
10 0.39789826 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
11 0.395307 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
12 0.39493757 71 acl-2010-Convolution Kernel over Packed Parse Forest
13 0.38932639 147 acl-2010-Improving Statistical Machine Translation with Monolingual Collocation
14 0.38824546 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
15 0.38817376 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation
16 0.38088447 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
17 0.37904283 135 acl-2010-Hindi-to-Urdu Machine Translation through Transliteration
18 0.37699538 62 acl-2010-Combining Orthogonal Monolingual and Multilingual Sources of Evidence for All Words WSD
19 0.3768588 158 acl-2010-Latent Variable Models of Selectional Preference
20 0.37645298 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts