acl acl2010 acl2010-37 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hiroshi Echizen-ya ; Kenji Araki
Abstract: As described in this paper, we propose a new automatic evaluation method for machine translation using noun-phrase chunking. Our method correctly determines the matching words between two sentences using corresponding noun phrases. Moreover, our method determines the similarity between two sentences in terms of the noun-phrase order of appearance. Evaluation experiments were conducted to calculate the correlation among human judgments, along with the scores produced using automatic evaluation methods for MT outputs obtained from the 12 machine translation systems in NTCIR7. Experimental results show that our method obtained the highest correlations among the methods in both sentence-level adequacy and fluency.
Reference: text
sentIndex sentText sentNum sentScore
1 Our method correctly determines the matching words between two sentences using corresponding noun phrases. [sent-5, score-0.364]
2 Evaluation experiments were conducted to calculate the correlation among human judgments, along with the scores produced using automatic evaluation methods for MT outputs obtained from the 12 machine translation systems in NTCIR7. [sent-7, score-0.577]
3 Experimental results show that our method obtained the highest correlations among the methods in both sentence-level adequacy and fluency. [sent-8, score-0.294]
4 The scores of some automatic evaluation methods can obtain high correlation with human judgment in document-level automatic evaluation (Coughlin, 2007). [sent-10, score-0.441]
5 As described herein, for use with MT systems, we propose a new automatic evaluation method using noun-phrase chunking to obtain higher sentence-level correlations. [sent-27, score-0.29]
6 Using noun phrases produced by chunking, our method yields the correct word correspondences and determines the similarity between two sentences in terms of the noun phrase order of appearance. [sent-28, score-0.744]
7 Evaluation experiments using the NTCIR-7 data (Fujii et al., 2008) demonstrate that the scores obtained using our system yield the highest correlation with the human judgments among the automatic evaluation methods in both sentence-level adequacy and fluency. [sent-30, score-0.711]
8 Moreover, the differences between correlation coefficients obtained using our method and other methods are statistically significant at the 5% or lower significance level for adequacy. [sent-31, score-0.527]
9 Results confirmed that our method using noun-phrase chunking is effective for automatic evaluation for machine translation. [sent-32, score-0.29]
10 First, the system determines the correspondences of noun phrases between MT outputs and references using chunking. [sent-36, score-0.379]
11 Secondly, the system calculates word-level scores based on the correct matched words using the determined correspondences of noun phrases. [sent-37, score-0.569]
12 The system calculates the final scores combining word-level scores and phrase-level scores. [sent-39, score-0.34]
13 1 Correspondence of Noun Phrases by Chunking The system obtains the noun phrases from each sentence by chunking. [sent-41, score-0.417]
14 It then determines corresponding noun phrases between MT outputs and references by calculating the similarity of two noun phrases with the PER score (Su et al.). [sent-42, score-0.813]
15 A high score indicates that the similarity between the two noun phrases is high. [sent-48, score-0.39]
16 Figure 1 presents an example of the determination of the corresponding noun phrases. [sent-49, score-0.361]
17 (1) Use of noun phrase chunking MT output : in general , [NP the amount ] of [NP the crowning fall ] is large like [NP the end ] . [sent-50, score-0.944]
18 Reference : generally , the closer [NP it ] is to [NP the end part ] , the larger [NP the amount ] of [NP crowning drop ] is . [sent-51, score-0.465]
19 (2) Determination of corresponding noun phrases: the MT output and reference with the matched noun phrases labeled NP1–NP3 [figure text garbled]. [sent-52, score-0.452]
20 Figure 1: Example of determination of corresponding noun phrases. [sent-56, score-0.361]
21 In Fig. 1, “the amount” , “the crowning fall” , and “the end” are obtained as noun phrases in the MT output by chunking, and “it” , “the end part” , “the amount” , and “crowning drop” are obtained in the reference by chunking. [sent-58, score-0.972]
22 Next, the system determines the corresponding noun phrases from these noun phrases between the MT output and reference. [sent-59, score-0.887]
23 The score between “the end” and “the end part” is the highest among the scores between “the end” in the MT output and “it” , “the end part” , “the amount” , and “crowning drop” in the reference. [sent-60, score-0.339]
24 Moreover, the score between “the end part” and “the end” is the highest among the scores between “the end part” in the reference and “the amount” , “the crowning fall” , and “the end” in the MT output. [sent-61, score-0.642]
25 Consequently, “the end” and “the end part” are selected as noun phrases with the highest mutual scores: “the end” and “the end part” are determined as one corresponding noun phrase. [sent-62, score-0.783]
26 In Fig. 1, “the amount” in the MT output and “the amount” in the reference, and “the crowning fall” in the MT output and “crowning drop” in the reference, are also determined as corresponding noun phrases. [sent-64, score-0.823]
27 A noun phrase whose score with every noun phrase on the other side is 0 is left without a corresponding noun phrase. [sent-65, score-0.637]
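The correspondence step can be pictured with a short sketch. The PER-style similarity below and the names `np_similarity` and `corresponding_nps` are assumptions for illustration; the paper cites the PER score but its exact formula is not reproduced here.

```python
from collections import Counter

def np_similarity(cand, ref):
    # PER-style similarity: count words matched regardless of position,
    # normalized by the longer phrase. A stand-in for the cited PER
    # score; the paper's exact formula may differ.
    matched = sum((Counter(cand) & Counter(ref)).values())
    return matched / max(len(cand), len(ref))

def corresponding_nps(mt_nps, ref_nps):
    # Keep only pairs that are mutual best matches with a non-zero
    # score; an NP whose scores are all 0 gets no correspondence.
    pairs = []
    for i, cand in enumerate(mt_nps):
        j = max(range(len(ref_nps)), key=lambda k: np_similarity(cand, ref_nps[k]))
        back = max(range(len(mt_nps)), key=lambda k: np_similarity(mt_nps[k], ref_nps[j]))
        if back == i and np_similarity(cand, ref_nps[j]) > 0:
            pairs.append((i, j))
    return pairs

mt_nps = [["the", "amount"], ["the", "crowning", "fall"], ["the", "end"]]
ref_nps = [["it"], ["the", "end", "part"], ["the", "amount"], ["crowning", "drop"]]
print(corresponding_nps(mt_nps, ref_nps))  # [(0, 2), (2, 1)] under this toy similarity
```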
28 The use of the noun phrases is effective because the frequency of noun phrases is higher than that of other phrase types. [sent-69, score-0.684]
29 It is difficult to determine corresponding verb phrases correctly because each verb phrase often contains fewer words than a noun phrase. [sent-71, score-0.411]
30 2 Word-level Score The system calculates the word-level scores between MT output and reference using the corresponding noun phrases. [sent-73, score-0.669]
31 In IMPACT, which uses no analytical knowledge, the LCS route is determined using information about the number of words in the common parts and the positions of the common parts. [sent-93, score-0.482]
32 In the LCS route selected by IMPACT, the weight of “the” in the common part “, the” is 1 because “the” in the reference is not included in the corresponding noun phrase. [sent-107, score-0.59]
33 In the LCS route selected using our method, the weight of “the” in “the amount of” is 2 because “the” in MT output and “the” in the reference are included in the corresponding noun phrase “NP1” . [sent-108, score-0.68]
34 Moreover, the word-level score is calculated using the common parts in the selected LCS route as the following Eqs. [sent-110, score-0.367]
35 [Eqs. (1)–(4), which compute the word-level score from the weighted common parts, are garbled in extraction.] (1) First process for determination of common parts: LCS = 7 (our method; the aligned MT output lines for our method and IMPACT are garbled). [sent-120, score-0.376]
36 (2) Second process for determination of common parts: LCS = 3. Our method MT output : in general , [NP1 the amount ] of [NP2 the crowning fall ] is large like [NP3 the end ] . [sent-132, score-0.919]
37 Reference : generally , the closer [NP it ] is to [NP3 the end part ] , the larger [NP1 the amount ] of [NP2 crowning drop ] is . [sent-133, score-0.465]
38 Therefore, Rwd and Pwd become smaller as the appearance order of the common parts differs between the MT output and the reference. [sent-144, score-0.304]
39 The system calculates scorewd as the word-level score in Eq. [sent-148, score-0.481]
40 using the two common parts “the” and “the end” , except the common parts determined using the first process. [sent-176, score-0.341]
41 The system can determine the matching words correctly using the corresponding noun phrases between the MT output and the reference. [sent-208, score-0.492]
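As a rough illustration of how the noun-phrase correspondences can steer the LCS route: matches inside a corresponding noun-phrase pair get weight 2, all others weight 1, so the route prefers “the” inside “the amount of” over a stray “, the”. The weighted dynamic program and the helper names below are assumptions; the paper selects routes through Eqs. (1)–(4), which are not reproduced here.

```python
def weighted_lcs(mt_words, ref_words, weight):
    # Standard LCS dynamic program, except a match at (i, j) adds
    # weight(i, j) instead of 1, so higher-weight common parts attract
    # the selected route.
    m, n = len(mt_words), len(ref_words)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if mt_words[i] == ref_words[j]:
                dp[i + 1][j + 1] = dp[i][j] + weight(i, j)
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n]

def make_np_weight(mt_np_of, ref_np_of, np_pairs):
    # mt_np_of / ref_np_of map a word position to the index of the NP
    # containing it (absent if outside any NP); np_pairs is the set of
    # corresponding NP index pairs found in section 2.1.
    def weight(i, j):
        pair = (mt_np_of.get(i), ref_np_of.get(j))
        return 2.0 if pair[0] is not None and pair in np_pairs else 1.0
    return weight
```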
42 The system calculates scorewd multi using Rwd multi and Pwd multi, which are, respectively, the maximum Rwd and Pwd when multiple references are used, as in the following Eqs. [sent-209, score-0.931]
43 In Eq. (8), γ is determined as Pwd multi / Rwd multi. The scorewd multi is between 0 and 1. [sent-212, score-0.648]
44 Rwd multi = max_{j=1,…,u} Rwd(j) and Pwd multi = max_{j=1,…,u} Pwd(j), where u is the number of references. [sent-216, score-0.332]
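In code, the multi-reference combination reads roughly as follows. The F-measure-style mixing with γ = Pwd multi / Rwd multi is an assumption about the garbled equations, not the paper's verified form.

```python
def score_wd_multi(rwd_per_ref, pwd_per_ref):
    # Take the maximum Rwd and Pwd over the references, then combine
    # them F-measure style with gamma = Pwd_multi / Rwd_multi.  The
    # exact mixing in the paper may differ; the result stays in [0, 1].
    rwd, pwd = max(rwd_per_ref), max(pwd_per_ref)
    if rwd == 0.0 or pwd == 0.0:
        return 0.0
    gamma = pwd / rwd
    return (1 + gamma ** 2) * rwd * pwd / (rwd + gamma ** 2 * pwd)
```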
45 3 Phrase-level Score The system calculates the phrase-level score using the noun phrases obtained by chunking. [sent-225, score-0.617]
46 First, the system extracts only noun phrases from sentences. [sent-226, score-0.382]
47 (1) Corresponding noun phrases MT output : in general , [NP1 the amount ] of [NP2 the crowning fall ] is large like [NP3 the end ] . [sent-229, score-0.885]
48 Reference : generally , the closer [NP it ] is to [NP3 the end part ] , the larger [NP1 the amount ] of [NP2 crowning drop ] is . [sent-230, score-0.465]
49 (2) Generalization by noun phrases MT output : NP1 NP2 NP3 Reference : NP NP3 NP1 NP2 Figure 3: Example of generalization by noun phrases. [sent-231, score-0.63]
50 Figure 3 presents three corresponding noun phrases between the MT output and the reference. [sent-232, score-0.452]
51 The noun phrase “it” , which has no corresponding noun phrase, is expressed as “NP” in the reference. [sent-233, score-0.503]
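The generalization of Figure 3 is mechanical; a minimal sketch (function and variable names are hypothetical):

```python
def generalize(num_nps, pair_label):
    # pair_label maps an NP's position in the sentence to its shared
    # label ("NP1", "NP2", ...); NPs without a correspondence become
    # plain "NP", as with "it" in the reference above.
    return [pair_label.get(i, "NP") for i in range(num_nps)]

mt_seq = generalize(3, {0: "NP1", 1: "NP2", 2: "NP3"})   # ['NP1', 'NP2', 'NP3']
ref_seq = generalize(4, {1: "NP3", 2: "NP1", 3: "NP2"})  # ['NP', 'NP3', 'NP1', 'NP2']
```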
52 Subsequently, the system obtains the phrase-level score between the generalized MT output and reference as in the following Eqs. [sent-235, score-0.324]
53 In Eqs. (9) and (10), cnp denotes the common noun-phrase parts; mcnp and ncnp respectively signify the quantities of common noun phrases in the reference and the MT output. [sent-257, score-0.821]
54 Moreover, mno cnp and nno cnp are the quantities of noun phrases except the common noun phrases in the reference and MT output. [sent-258, score-1.23]
55 The values of mno cnp and nno cnp are processed as 1 when no non-corresponding noun phrases exist. [sent-259, score-0.728]
56 The square root applied to mno cnp and nno cnp decreases the weight of the non-corresponding noun phrases. [sent-260, score-0.603]
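Putting those quantities together, one plausible reading of Eqs. (9)–(11) is sketched below; the harmonic-mean combination and the function name are assumptions.

```python
import math

def score_np(cnp, m_cnp, n_cnp, m_no_cnp, n_no_cnp):
    # cnp: length of the common noun-phrase parts between the
    # generalized sequences; m_cnp / n_cnp: common-NP counts in the
    # reference / MT output; m_no_cnp / n_no_cnp: non-corresponding NP
    # counts, treated as 1 when zero and discounted by a square root.
    m_no_cnp, n_no_cnp = max(m_no_cnp, 1), max(n_no_cnp, 1)
    r_np = cnp / (m_cnp * math.sqrt(m_no_cnp))
    p_np = cnp / (n_cnp * math.sqrt(n_no_cnp))
    if r_np == 0.0 or p_np == 0.0:
        return 0.0
    return 2 * r_np * p_np / (r_np + p_np)  # harmonic mean as a stand-in
```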
57 The system obtains scorenp multi by calculating the average of scorenp over the references when multiple references are used, as in the following Eq. [sent-277, score-0.609]
58 4 Final Score The system calculates the final score by combining the word-level score and the phrase-level score as shown in the following Eq. [sent-281, score-0.354]
59 The ratio of scorewd to scorenp is 1:1 when δ is 1. [sent-286, score-0.479]
60 Moreover, scorewd multi and scorenp multi are used for Eq. [sent-288, score-0.781]
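A minimal sketch of the final combination, assuming a δ-weighted average (the extracted equation is missing, so the exact form is an assumption):

```python
def final_score(score_wd, score_np, delta=1.0):
    # delta = 1 gives the 1:1 ratio of scorewd to scorenp noted above;
    # with multiple references, pass scorewd_multi and scorenp_multi.
    return (score_wd + delta * score_np) / (1.0 + delta)
```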
61 1 Experimental Procedure We calculated the correlation between the scores obtained using our method and scores produced by human judgment. [sent-300, score-0.512]
62 The system based on our method obtained the evaluation scores for 1,200 English output sentences related to the patent sentences. [sent-301, score-0.46]
63 We calculated Pearson’s correlation coefficient and Spearman’s rank correlation coefficient between the scores obtained using our method and the scores by human judgments in terms of sentence-level adequacy and fluency. [sent-308, score-0.957]
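For reproducing this evaluation, both correlation coefficients can be computed with standard tooling, e.g. SciPy; the data below are illustrative, not the paper's.

```python
from scipy.stats import pearsonr, spearmanr

system_scores = [0.42, 0.55, 0.31, 0.78]  # one metric score per MT output
human_scores = [3, 4, 2, 5]               # sentence-level adequacy judgments

r, _ = pearsonr(system_scores, human_scores)      # Pearson's r
rho, _ = spearmanr(system_scores, human_scores)   # Spearman's rank rho
```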
64 Additionally, we calculated the correlations between the scores using seven other methods and the scores by human judgments to compare our method with other automatic evaluation methods. [sent-309, score-0.347]
65 Table 2: Pearson’s correlation coefficient for sentence-level adequacy. [sent-324, score-0.297]
66 Moreover, we obtained the noun phrases using a shallow parser (Sha and Pereira, 2003) as the chunking tool. [sent-326, score-0.555]
67 Tables 2 and 3 respectively show Pearson’s correlation coefficient for sentence-level adequacy and fluency. [sent-330, score-0.477]
68 Tables 4 and 5 respectively show Spearman’s rank correlation coefficient for sentence-level adequacy and fluency. [sent-331, score-0.477]
69 In Tables 2–5, bold typeface signifies the maximum correlation coefficients among eight automatic evaluation methods. [sent-332, score-0.547]
70 Underlining in “Our method” signifies that the differences between correlation coefficients obtained using our method and IMPACT are statistically significant at the 5% significance level. [sent-333, score-0.68]
71 “Avg.” signifies the average of the correlation coefficients obtained for the 12 machine translation systems under the respective automatic evaluation methods, and “All” gives the correlation coefficients computed using the scores of the 1,200 output sentences obtained from the 12 machine translation systems. [sent-335, score-1.308]
72 8 and “All” of Tables 2 and 4, the differences between correlation coefficients obtained using our method and IMPACT are statistically significant at the 5% significance level. [sent-343, score-0.527]
73 Moreover, we investigated the correlation for each type of machine translation system. [sent-344, score-0.314]
74 Table 6 shows “All” of Pearson’s correlation coefficient and Spearman’s rank correlation coefficient in SMT (i. [sent-345, score-0.594]
75 [Table 3: Pearson’s correlation coefficient for sentence-level fluency.] [sent-354, score-0.513]
76 The scores of 900 output sentences obtained by the 9 machine translation systems in SMT and the scores of 200 output sentences obtained by the 2 machine translation systems in RBMT are used, respectively. [sent-355, score-0.392]
77 In Table 6, our method obtained the highest correlation among the eight methods, except in terms of the adequacy of RBMT under Pearson’s correlation coefficient. [sent-358, score-0.775]
78 The differences between correlation coefficients obtained using our method and IMPACT are statistically significant at the 5% significance level for adequacy of SMT. [sent-359, score-0.707]
79 In this case, BLEU scores were used as scorewd in Eq. [sent-361, score-0.351]
80 In the results of “BLEU with our method” in Tables 2–5, underlining signifies that the differences between correlation coefficients obtained using BLEU with our method and BLEU alone are statistically significant at the 5% significance level. [sent-364, score-0.679]
81 The coefficients of correlation for BLEU with our method are higher than those of BLEU in every machine translation system, in “Avg.” , and in “All” . [sent-365, score-0.482]
82 These results indicate that our method using noun-phrase chunking is effective for some methods and that the improvement is statistically significant in each machine translation system, not only in “All” , which has a large number of sentences. [sent-368, score-0.333]
83 Subsequently, we investigated the precision of the determination process for the corresponding noun phrases described in section 2.1. [sent-369, score-0.486]
84 We calculated the precision as the ratio of the number of correct corresponding noun phrases to the number of all noun-phrase correspondences obtained using the system based on our method. [sent-371, score-0.557]
85 4%, demonstrating that our method can determine the corresponding noun phrases correctly. [sent-373, score-0.436]
86 [Table 4: Spearman’s rank correlation coefficient for sentence-level adequacy.] [sent-374, score-0.297]
87 Moreover, we investigated the relation between the correlation obtained by our method and the quality of chunking. [sent-375, score-0.34]
88 In “Our method” in Tables 2–5, noun phrases for which the chunking tool produced erroneous results were revised. [sent-376, score-0.555]
89 “Our method II” in Tables 2–5 used the noun phrases exactly as produced by the chunking tool. [sent-377, score-0.61]
90 Underlining in “Our method II” of Tables 2–5 signifies that the differences between correlation coefficients obtained using our method II and IMPACT are statistically significant at the 5% significance level. [sent-378, score-0.68]
91 In “Avg.” and “All” of Tables 2–5, the correlation coefficients of our method II without the revised noun phrases are lower than those of our method using the revised noun phrases. [sent-380, score-1.084]
92 The performance of the chunking tool has no great influence on the results of our method because scorewd in Eqs. [sent-383, score-0.474]
93 2 when “the crowning fall” in the MT output and “crowning drop” in the reference are not determined as noun phrases. [sent-387, score-0.713]
94 Other common parts are determined correctly because the weight of the common part “the amount of” is higher than those of other common parts by Eqs. [sent-388, score-0.474]
95 Consequently, the determination of the common parts except “the amount of” is not difficult. [sent-390, score-0.311]
96 Results show that the correlation coefficients of IMPACT with our method, for which IMPACT scores were used as scorewd in Eq. [sent-393, score-0.69]
97 4 Conclusion As described herein, we proposed a new automatic evaluation method for machine translation. [Table 5: Spearman’s rank correlation coefficient for sentence-level fluency.] [sent-397, score-0.433]
98 Our method calculates the scores for MT outputs using noun-phrase chunking. [sent-408, score-0.306]
99 Consequently, the system obtains scores using the correctly matched words and phrase-level information based on the corresponding noun phrases. [sent-409, score-0.417]
100 Experimental results demonstrate that our method yields the highest correlation among eight methods in terms of sentence-level adequacy and fluency. [sent-410, score-0.49]
wordName wordTfidf (topN-words)
[('lcs', 0.375), ('crowning', 0.286), ('scorewd', 0.265), ('mt', 0.262), ('correlation', 0.226), ('noun', 0.217), ('scorenp', 0.184), ('adequacy', 0.18), ('route', 0.174), ('multi', 0.166), ('chunking', 0.154), ('cnp', 0.143), ('calculates', 0.128), ('pwd', 0.125), ('phrases', 0.125), ('rwd', 0.122), ('patent', 0.116), ('coefficients', 0.113), ('determination', 0.105), ('signifies', 0.098), ('tables', 0.091), ('translation', 0.088), ('reference', 0.088), ('moreover', 0.087), ('bleu', 0.086), ('scores', 0.086), ('parts', 0.073), ('common', 0.072), ('output', 0.071), ('coefficient', 0.071), ('np', 0.069), ('end', 0.067), ('impact', 0.064), ('spearman', 0.062), ('araki', 0.061), ('rbmt', 0.061), ('amount', 0.061), ('obtained', 0.059), ('fall', 0.058), ('method', 0.055), ('utiyama', 0.055), ('pearson', 0.055), ('fujii', 0.054), ('mno', 0.054), ('pnp', 0.054), ('underlining', 0.054), ('determines', 0.053), ('drop', 0.051), ('determined', 0.051), ('automatic', 0.048), ('score', 0.048), ('summit', 0.047), ('hiroshi', 0.047), ('correspondences', 0.047), ('ir', 0.046), ('nno', 0.046), ('meteor', 0.044), ('counter', 0.044), ('masao', 0.044), ('xii', 0.044), ('phraselevel', 0.042), ('atsushi', 0.041), ('cncnp', 0.041), ('ebmt', 0.041), ('ehara', 0.041), ('japio', 0.041), ('leusch', 0.041), ('maxju', 0.041), ('mehay', 0.041), ('mikio', 0.041), ('mutton', 0.041), ('oyamada', 0.041), ('rnp', 0.041), ('slength', 0.041), ('takehito', 0.041), ('terumasa', 0.041), ('tesesm', 0.041), ('analytical', 0.04), ('system', 0.04), ('japanese', 0.039), ('corresponding', 0.039), ('judgments', 0.039), ('revised', 0.038), ('significance', 0.038), ('outputs', 0.037), ('statistically', 0.036), ('rs', 0.036), ('pozar', 0.036), ('obtains', 0.035), ('kenji', 0.034), ('herein', 0.033), ('nii', 0.033), ('routes', 0.033), ('evaluation', 0.033), ('ii', 0.031), ('therein', 0.031), ('phrase', 0.03), ('consequently', 0.03), ('ratio', 0.03), ('eight', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking
Author: Hiroshi Echizen-ya ; Kenji Araki
Abstract: As described in this paper, we propose a new automatic evaluation method for machine translation using noun-phrase chunking. Our method correctly determines the matching words between two sentences using corresponding noun phrases. Moreover, our method determines the similarity between two sentences in terms of the noun-phrase order of appearance. Evaluation experiments were conducted to calculate the correlation among human judgments, along with the scores produced using automatic evaluation methods for MT outputs obtained from the 12 machine translation systems in NTCIR7. Experimental results show that our method obtained the highest correlations among the methods in both sentence-level adequacy and fluency.
2 0.13291918 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking
Author: Radu Soricut ; Abdessamad Echihabi
Abstract: The adoption ofMachine Translation technology for commercial applications is hampered by the lack of trust associated with machine-translated output. In this paper, we describe TrustRank, an MT system enhanced with a capability to rank the quality of translation outputs from good to bad. This enables the user to set a quality threshold, granting the user control over the quality of the translations. We quantify the gains we obtain in translation quality, and show that our solution works on a wide variety of domains and language pairs.
3 0.12852809 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction
Author: Xiaojun Wan ; Huiying Li ; Jianguo Xiao
Abstract: Cross-language document summarization is a task of producing a summary in one language for a document set in a different language. Existing methods simply use machine translation for document translation or summary translation. However, current machine translation services are far from satisfactory, which results in that the quality of the cross-language summary is usually very poor, both in readability and content. In this paper, we propose to consider the translation quality of each sentence in the English-to-Chinese cross-language summarization process. First, the translation quality of each English sentence in the document set is predicted with the SVM regression method, and then the quality score of each sentence is incorporated into the summarization process. Finally, the English sentences with high translation quality and high informativeness are selected and translated to form the Chinese summary. Experimental results demonstrate the effectiveness and usefulness of the proposed approach. 1
4 0.10948976 223 acl-2010-Tackling Sparse Data Issue in Machine Translation Evaluation
Author: Ondrej Bojar ; Kamil Kos ; David Marecek
Abstract: We illustrate and explain problems of n-grams-based machine translation (MT) metrics (e.g. BLEU) when applied to morphologically rich languages such as Czech. A novel metric SemPOS based on the deep-syntactic representation of the sentence tackles the issue and retains the performance for translation to English as well.
5 0.10083904 56 acl-2010-Bridging SMT and TM with Translation Recommendation
Author: Yifan He ; Yanjun Ma ; Josef van Genabith ; Andy Way
Abstract: We propose a translation recommendation framework to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We describe an implementation of this framework using an SVM binary classifier. We exploit methods to fine-tune the classifier and investigate a variety of features of different types. We rely on automatic MT evaluation metrics to approximate human judgements in our experiments. Experimental results show that our system can achieve 0.85 precision at 0.89 recall, excluding ex- act matches. Furthermore, it is possible for the end-user to achieve a desired balance between precision and recall by adjusting confidence levels.
6 0.097039729 54 acl-2010-Boosting-Based System Combination for Machine Translation
7 0.092856161 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
9 0.070681274 104 acl-2010-Evaluating Machine Translations Using mNCD
10 0.067740187 117 acl-2010-Fine-Grained Genre Classification Using Structural Learning Algorithms
11 0.06770733 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
12 0.06649942 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
13 0.061559692 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
14 0.060861889 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation
15 0.060451936 169 acl-2010-Learning to Translate with Source and Target Syntax
16 0.05450156 203 acl-2010-Rebanking CCGbank for Improved NP Interpretation
17 0.054204591 133 acl-2010-Hierarchical Search for Word Alignment
18 0.054105777 57 acl-2010-Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation
19 0.053510949 207 acl-2010-Semantics-Driven Shallow Parsing for Chinese Semantic Role Labeling
20 0.052619625 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems
topicId topicWeight
[(0, -0.15), (1, -0.071), (2, -0.047), (3, 0.002), (4, 0.023), (5, 0.033), (6, -0.026), (7, -0.073), (8, -0.034), (9, 0.056), (10, 0.122), (11, 0.095), (12, 0.014), (13, -0.03), (14, 0.002), (15, 0.046), (16, 0.029), (17, 0.025), (18, -0.034), (19, 0.014), (20, -0.04), (21, -0.002), (22, 0.099), (23, 0.011), (24, -0.014), (25, -0.099), (26, 0.088), (27, 0.108), (28, -0.05), (29, 0.064), (30, 0.048), (31, -0.007), (32, 0.071), (33, 0.079), (34, -0.044), (35, 0.056), (36, 0.126), (37, -0.162), (38, -0.209), (39, -0.05), (40, -0.018), (41, 0.043), (42, 0.017), (43, 0.027), (44, 0.068), (45, -0.048), (46, 0.001), (47, 0.055), (48, 0.013), (49, 0.073)]
simIndex simValue paperId paperTitle
same-paper 1 0.95663637 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking
Author: Hiroshi Echizen-ya ; Kenji Araki
Abstract: As described in this paper, we propose a new automatic evaluation method for machine translation using noun-phrase chunking. Our method correctly determines the matching words between two sentences using corresponding noun phrases. Moreover, our method determines the similarity between two sentences in terms of the noun-phrase order of appearance. Evaluation experiments were conducted to calculate the correlation among human judgments, along with the scores produced using automatic evaluation methods for MT outputs obtained from the 12 machine translation systems in NTCIR7. Experimental results show that our method obtained the highest correlations among the methods in both sentence-level adequacy and fluency.
2 0.82712978 223 acl-2010-Tackling Sparse Data Issue in Machine Translation Evaluation
Author: Ondrej Bojar ; Kamil Kos ; David Marecek
Abstract: We illustrate and explain problems of n-grams-based machine translation (MT) metrics (e.g. BLEU) when applied to morphologically rich languages such as Czech. A novel metric SemPOS based on the deep-syntactic representation of the sentence tackles the issue and retains the performance for translation to English as well.
3 0.77151257 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking
Author: Radu Soricut ; Abdessamad Echihabi
Abstract: The adoption ofMachine Translation technology for commercial applications is hampered by the lack of trust associated with machine-translated output. In this paper, we describe TrustRank, an MT system enhanced with a capability to rank the quality of translation outputs from good to bad. This enables the user to set a quality threshold, granting the user control over the quality of the translations. We quantify the gains we obtain in translation quality, and show that our solution works on a wide variety of domains and language pairs.
4 0.76683289 104 acl-2010-Evaluating Machine Translations Using mNCD
Author: Marcus Dobrinkat ; Tero Tapiovaara ; Jaakko Vayrynen ; Kimmo Kettunen
Abstract: This paper introduces mNCD, a method for automatic evaluation of machine translations. The measure is based on normalized compression distance (NCD), a general information theoretic measure of string similarity, and flexible word matching provided by stemming and synonyms. The mNCD measure outperforms NCD in system-level correlation to human judgments in English.
5 0.65592647 56 acl-2010-Bridging SMT and TM with Translation Recommendation
Author: Yifan He ; Yanjun Ma ; Josef van Genabith ; Andy Way
Abstract: We propose a translation recommendation framework to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We describe an implementation of this framework using an SVM binary classifier. We exploit methods to fine-tune the classifier and investigate a variety of features of different types. We rely on automatic MT evaluation metrics to approximate human judgements in our experiments. Experimental results show that our system can achieve 0.85 precision at 0.89 recall, excluding ex- act matches. Furthermore, it is possible for the end-user to achieve a desired balance between precision and recall by adjusting confidence levels.
6 0.54382241 45 acl-2010-Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures
7 0.52736419 54 acl-2010-Boosting-Based System Combination for Machine Translation
8 0.47195733 57 acl-2010-Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation
9 0.43831035 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction
10 0.43385646 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
11 0.43163303 117 acl-2010-Fine-Grained Genre Classification Using Structural Learning Algorithms
12 0.41999334 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
13 0.41249171 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network
14 0.40157041 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
15 0.37713718 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
16 0.3763983 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation
17 0.37558481 151 acl-2010-Intelligent Selection of Language Model Training Data
18 0.37313303 39 acl-2010-Automatic Generation of Story Highlights
19 0.37165701 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation
20 0.37105423 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems
topicId topicWeight
[(16, 0.012), (25, 0.044), (39, 0.011), (42, 0.015), (44, 0.107), (56, 0.286), (59, 0.102), (73, 0.043), (78, 0.028), (83, 0.064), (84, 0.019), (98, 0.156)]
simIndex simValue paperId paperTitle
1 0.87108266 64 acl-2010-Complexity Assumptions in Ontology Verbalisation
Author: Richard Power
Abstract: We describe the strategy currently pursued for verbalising OWL ontologies by sentences in Controlled Natural Language (i.e., combining generic rules for realising logical patterns with ontology-specific lexicons for realising atomic terms for individuals, classes, and properties) and argue that its success depends on assumptions about the complexity of terms and axioms in the ontology. We then show, through analysis of a corpus of ontologies, that although these assumptions could in principle be violated, they are overwhelmingly respected in practice by ontology developers.
same-paper 2 0.74961972 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking
Author: Hiroshi Echizen-ya ; Kenji Araki
Abstract: As described in this paper, we propose a new automatic evaluation method for machine translation using noun-phrase chunking. Our method correctly determines the matching words between two sentences using corresponding noun phrases. Moreover, our method determines the similarity between two sentences in terms of the noun-phrase order of appearance. Evaluation experiments were conducted to calculate the correlation among human judgments, along with the scores produced using automatic evaluation methods for MT outputs obtained from the 12 machine translation systems in NTCIR7. Experimental results show that our method obtained the highest correlations among the methods in both sentence-level adequacy and fluency.
3 0.64087486 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment
Author: Shujie Liu ; Chi-Ho Li ; Ming Zhou
Abstract: While Inversion Transduction Grammar (ITG) has regained more and more attention in recent years, it still suffers from the major obstacle of speed. We propose a discriminative ITG pruning framework using Minimum Error Rate Training and various features from previous work on ITG alignment. Experiment results show that it is superior to all existing heuristics in ITG pruning. On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and Bleu score over the baseline alignment system of GIZA++. 1
4 0.63311696 210 acl-2010-Sentiment Translation through Lexicon Induction
Author: Christian Scheible
Abstract: The translation of sentiment information is a task from which sentiment analysis systems can benefit. We present a novel, graph-based approach using SimRank, a well-established vertex similarity algorithm to transfer sentiment information between a source language and a target language graph. We evaluate this method in comparison with SO-PMI.
5 0.63062513 165 acl-2010-Learning Script Knowledge with Web Experiments
Author: Michaela Regneri ; Alexander Koller ; Manfred Pinkal
Abstract: We describe a novel approach to unsupervised learning of the events that make up a script, along with constraints on their temporal ordering. We collect naturallanguage descriptions of script-specific event sequences from volunteers over the Internet. Then we compute a graph representation of the script’s temporal structure using a multiple sequence alignment algorithm. The evaluation of our system shows that we outperform two informed baselines.
6 0.57952464 243 acl-2010-Tree-Based and Forest-Based Translation
7 0.57037491 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future
8 0.56688428 106 acl-2010-Event-Based Hyperspace Analogue to Language for Query Expansion
10 0.56157094 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
11 0.56135023 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
12 0.55887437 170 acl-2010-Letter-Phoneme Alignment: An Exploration
13 0.55857229 144 acl-2010-Improved Unsupervised POS Induction through Prototype Discovery
14 0.55818862 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
15 0.55796319 146 acl-2010-Improving Chinese Semantic Role Labeling with Rich Syntactic Features
16 0.55795723 163 acl-2010-Learning Lexicalized Reordering Models from Reordering Graphs
17 0.55753136 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
18 0.55653578 79 acl-2010-Cross-Lingual Latent Topic Extraction
19 0.5559594 98 acl-2010-Efficient Staggered Decoding for Sequence Labeling
20 0.55464685 116 acl-2010-Finding Cognate Groups Using Phylogenies