emnlp emnlp2012 emnlp2012-50 knowledge-graph by maker-knowledge-mining

50 emnlp-2012-Extending Machine Translation Evaluation Metrics with Lexical Cohesion to Document Level


Source: pdf

Author: Billy T. M. Wong ; Chunyu Kit

Abstract: This paper proposes the utilization of lexical cohesion to facilitate evaluation of machine translation at the document level. As a linguistic means to achieve text coherence, lexical cohesion ties sentences together into a meaningfully interwoven structure through words with the same or related meaning. A comparison between machine and human translation is conducted to illustrate one of their critical distinctions that human translators tend to use more cohesion devices than machine. Various ways to apply this feature to evaluate machinetranslated documents are presented, including one without reliance on reference translation. Experimental results show that incorporating this feature into sentence-level evaluation metrics can enhance their correlation with human judgements.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract This paper proposes the utilization of lexical cohesion to facilitate evaluation of machine translation at the document level. [sent-7, score-1.02]

2 As a linguistic means to achieve text coherence, lexical cohesion ties sentences together into a meaningfully interwoven structure through words with the same or related meaning. [sent-8, score-0.856]

3 A comparison between machine and human translation is conducted to illustrate one of their critical distinctions that human translators tend to use more cohesion devices than machine. [sent-9, score-1.108]

4 Experimental results show that incorporating this feature into sentence-level evaluation metrics can enhance their correlation with human judgements. [sent-11, score-0.195]

5 1 Introduction Machine translation (MT) has benefited a lot from the advancement of automatic evaluation in the past decade. [sent-12, score-0.111]

6 To a certain degree, its progress is also confined to the limitations of evaluation metrics in use. [sent-13, score-0.109]

7 As a consequence, an MT system optimized this way to any of these metrics can only have a very dim chance of producing a translated document that reads as naturally as human writing. [sent-21, score-0.186]

8 The accuracy of MT output at the document level is particularly important to MT users, for they care about the overall meaning of a text in question more than the grammatical correctness of each sentence (Visser and Fuji, 1996). [sent-22, score-0.166]

9 Post-editors particularly need to ensure the quality of a whole document of MT output when revising its sentences. [sent-23, score-0.114]

10 This paper studies the inter-sentential linguistic features of cohesion and coherence and presents plausible ways to incorporate them into the sentence-based metrics to support MT evaluation at the document level. [sent-25, score-0.984]

11 Cohesion is realized via the interlinkage of grammatical and lexical elements across sentences. [sent-29, score-0.159]

12 Grammatical cohesion refers to the syntactic links between text items, while lexical cohesion is achieved through the word choices in a text. [sent-32, score-1.579]

13 A quantitative comparison of lexical cohesion devices between MT output and human translation is first conducted, to examine the weakness of current MT systems in handling this feature. [sent-34, score-1.194]

14 Different ways of exploiting lexical cohesion devices for MT evaluation at the document level are then illustrated. [sent-35, score-1.201]

15 They can hardly be evaluated in isolation and have to be conjoined with other quality criteria such as adequacy and fluency. [sent-37, score-0.138]

16 A survey of MT post-editing (Vasconcellos, 1989) suggests that cohesion and coherence serve as higher level quality criteria beyond many others such as syntactic well-formedness. [sent-38, score-0.865]

17 Post-editors tend to correct syntactic errors first before any amendment for improving the cohesion and coherence of an MT output. [sent-39, score-0.801]

18 Cohesion and coherence are appropriate to serve as criteria for the overall quality of MT output. [sent-41, score-0.118]

19 Previous research in MT predominantly focuses on specific types of cohesion devices. [sent-42, score-0.723]

20 For lexical cohesion, it has been only partially and indirectly addressed in terms of translation consistency in MT output. [sent-53, score-0.233]

21 Carpuat (2009) also observes a general tendency in human translation that a given sense is usually lexicalized in a consistent manner throughout the whole translation. [sent-58, score-0.114]

22 Miller and Vanni (2001) devise a human evaluation approach to measure the comprehensibility of a text as a whole, based on the Rhetorical Structure Theory (Mann and Thompson, 1988), a theory of text organization specifying coherence relations in an authentic text. [sent-60, score-0.169]

23 (2010) present a family of automatic MT evaluation measures, based on the Discourse Representation Theory (Kamp and Reyle, 1993), that generate semantic trees to put together different text entities for the same referent according to their contexts and grammatical connections. [sent-66, score-0.113]

24 Apart from MT evaluation, automated essay scoring programs such as E-rater (Burstein, 2003) also employ a rich set of discourse features for assessment. [sent-67, score-0.115]

25 Lexical cohesion has so far been neglected in both MT and MT evaluation, even though it is the single most important form of cohesion devices, accounting for nearly half of the cohesion devices in English (Halliday and Hasan, 1976). [sent-70, score-2.414]

26 It is also a significant feature contributing to translation equivalence of texts by preserving their texture (Lotfipour-Saedi, 1997). [sent-71, score-0.119]

27 The lexical cohesion devices in a text can be represented as lexical chains conjoining related entities. [sent-72, score-1.241]

28 There are many methods of computing lexical chains for various purposes, e. [sent-73, score-0.14]
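
To make the chain representation concrete, here is a minimal sketch of greedy lexical chaining over content words, using NLTK's WordNet interface and Porter stemmer; the relatedness test (same stem or shared synset) and the function names are illustrative assumptions, not the authors' exact procedure.

```python
# A minimal lexical-chaining sketch.  Requires the NLTK 'wordnet' corpus
# (nltk.download('wordnet')).  The relatedness test is deliberately simple;
# richer relations (hypernymy, meronymy, ...) could be plugged in.
from nltk.corpus import wordnet as wn
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def related(w1, w2):
    """Loose relatedness test: same stem (repetition) or a shared synset."""
    if stemmer.stem(w1) == stemmer.stem(w2):
        return True
    return bool(set(wn.synsets(w1)) & set(wn.synsets(w2)))

def lexical_chains(content_words):
    """Greedily attach each word to the first chain holding a related word."""
    chains = []
    for word in content_words:
        for chain in chains:
            if any(related(word, member) for member in chain):
                chain.append(word)
                break
        else:
            chains.append([word])
    return chains

# groups repeated/inflected forms such as technology/technologies into one chain
print(lexical_chains(["technology", "achieve", "technologies", "achieving", "treaty"]))
```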

29 Contrary to grammatical cohesion, which depends highly on the syntactic well-formedness of a text, lexical cohesion is less affected by grammatical errors. [sent-77, score-1.652]

30 In this research, a number of formulations of lexical cohesion, with or without reliance on an external language resource, will be explored for the purpose of MT evaluation. [sent-79, score-0.112]

31 3 Lexical Cohesion in Machine and Human Translation This section presents a comparative study of MT and human translation (HT) in terms of the use of lexical cohesion devices. [sent-80, score-0.949]

32 It is an intuition that more cohesion devices are used by humans than machines in translation, as part of the superior quality of HT. [sent-81, score-0.316]

33 The results confirm the incapability of MT in handling this feature and the necessity of using lexical cohesion in MT evaluation. [sent-83, score-0.835]

34 Both datasets include human assessments of MT output, from which the part of adequacy assessment is selected for this study. [sent-89, score-0.261]

35 2 Identification of Lexical Cohesion Devices Lexical cohesion is achieved through word choices of two major types: reiteration and collocation. [sent-92, score-0.77]

36 Collocation refers to those lexical items that share the same or similar semantic relations, including complementarity, antonym, converse, coordinate term, meronym, troponym, and so on. [sent-95, score-0.154]

37 In this study, lexical cohesion devices are defined as content words (i. [sent-96, score-1.103]

38 To classify the semantic relationships of words, WordNet (Fellbaum, 1998) is used as a lexical resource, which clusters words of the same sense (i. [sent-101, score-0.134]

39 Superordinate and collocation are formed by words in a proximate semantic relationship, such as bicycle and vehicle (hypernym), bicycle and wheel (meronym), bicycle and car (coordinate term), and so on. [sent-105, score-0.145]
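
As a rough illustration of this classification, the sketch below labels a pair of content words as repetition (same stem), synonym (shared WordNet synset), or superordinate/collocation (a direct hypernym, hyponym, meronym or holonym link); the matching rules and category names are simplifying assumptions rather than the paper's exact definitions.

```python
# Classify a word pair into a simplified lexical cohesion category.
# Requires the NLTK 'wordnet' corpus (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def cohesion_relation(w1, w2):
    w1, w2 = w1.lower(), w2.lower()
    if stemmer.stem(w1) == stemmer.stem(w2):
        return "repetition"                       # same word or same stem
    s1, s2 = set(wn.synsets(w1)), set(wn.synsets(w2))
    if s1 & s2:
        return "synonym"                          # share a sense (synset)
    for syn in s1:
        neighbours = (set(syn.hypernyms()) | set(syn.hyponyms()) |
                      set(syn.part_meronyms()) | set(syn.part_holonyms()))
        if neighbours & s2:
            return "superordinate/collocation"    # e.g. a hypernym or meronym link
    return None                                   # not treated as cohesive here

for pair in [("bicycle", "bicycles"), ("bicycle", "bike"), ("dog", "canine")]:
    print(pair, cohesion_relation(*pair))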

40 3 Results The difference between MT and HT (reference translation) in terms of the frequencies of lexical cohesion devices in MetricsMATR and MTC4 datasets is presented in Table 2. [sent-116, score-0.41]

41 A further categorization breaks down content words into lexical cohesion devices and those that are not. [sent-118, score-1.103]

42 The count of each type of lexical cohesion device is also provided. [sent-119, score-0.835]

43 , not lexical cohesion devices) are close in MT and HT. [sent-126, score-0.835]

44 The difference of content words in HT and MT is mostly due to that of lexical cohesion devices, which are mostly repetition. [sent-127, score-0.866]

45 4% more lexical cohesion devices are found in HT than in MT in the datasets. [sent-130, score-1.08]

46 A further analysis is carried out to investigate the use of lexical cohesion devices in each version of MT and HT in terms of the following two ratios: LC = lexical cohesion devices / content words, and RC = repetition / content words. [sent-131, score-2.242]

47 A higher LC or RC ratio means that a greater proportion of content words are used as lexical cohesion devices. [sent-132, score-0.866]
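
The two ratios can be computed for a tokenized document as in the self-contained sketch below; content words are approximated as non-stopword alphabetic tokens, repetition as a shared Porter stem, and other cohesion devices as a shared WordNet synset with some other content word (restating the simple relatedness test from the earlier sketches so this snippet runs on its own).

```python
# Compute (LC, RC) for one tokenized document, per the definitions above.
# Requires the NLTK 'stopwords' and 'wordnet' corpora.
from collections import Counter
from nltk.corpus import stopwords, wordnet as wn
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
STOP = set(stopwords.words("english"))

def lc_rc_ratios(tokens):
    content = [t.lower() for t in tokens if t.isalpha() and t.lower() not in STOP]
    if not content:
        return 0.0, 0.0
    stem_counts = Counter(stemmer.stem(t) for t in content)

    def repeats(t):                      # repetition: the stem occurs more than once
        return stem_counts[stemmer.stem(t)] > 1

    def related_elsewhere(t):            # shares a synset with some other content word
        synsets = set(wn.synsets(t))
        return any(synsets & set(wn.synsets(o))
                   for o in content if stemmer.stem(o) != stemmer.stem(t))

    repetition = sum(1 for t in content if repeats(t))
    devices = sum(1 for t in content if repeats(t) or related_elsewhere(t))
    return devices / len(content), repetition / len(content)   # (LC, RC)
```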

48 Figure 1 illustrates the RC and LC ratios in the two datasets. [sent-133, score-0.158]

49 The ratios of different MT systems are presented in ascending order in each graph from left to right, according to their human assessment results. [sent-134, score-0.245]

50 First, most of the RC and LC ratios are within an observable range, i. [sent-136, score-0.158]

51 107 (number of lexical cohesion devices: 5) Human assessment: 2. [sent-145, score-0.835]

52 231 (number of lexical cohesion devices: 9) Human assessment: 4. [sent-149, score-0.835]

53 Table 3: An example of MT outputs of different quality (underlined: matched n-grams; italic: lexical cohesion devices) one MT system. [sent-152, score-0.896]

54 Second, the ratios in those different HT versions are very stable in comparison with those of MT. [sent-153, score-0.18]

55 This shows a typical level of the use of lexical cohesion device. [sent-156, score-0.859]

56 Third, the ratios in MT are lower than or at most equal to those in HT, suggesting their correlation with translation quality: the closer their RC and LC ratios are to those in HT, the better the MT. [sent-157, score-0.464]

57 These results verify our assumption that lexical cohesion can serve as an effective proxy of the level of translation quality. [sent-158, score-0.947]

58 4 MT Evaluation at Document Level As a feature at the discourse level, lexical cohesion is a good complement to current evaluation metrics focusing on features at the sentence level. [sent-159, score-0.995]

59 The n-grams matched with the reference are underlined, while the lexical cohesion devices are italicized. [sent-161, score-1.107]

60 Instead, their LC ratios seem to represent such a variation more accurately. [sent-164, score-0.158]

61 The theme of the second output is also highlighted through the lexical chains, including main/important, technology/technologies and achieve/achieving, which create a tight texture between the two sentences, a crucial factor of text quality. [sent-165, score-0.164]

62 To perform MT evaluation at the document level, the LC and RC ratios can be used alone or integrated into a sentence-level metric. [sent-166, score-0.255]

63 Although lexical cohesion gives a strong indication of text coherence, it is not indispensable, because a text can be coherent without any surface cohesive clue. [sent-170, score-0.877]

64 Furthermore, the quality of a document is also reflected in that of its sentences. [sent-171, score-0.114]

65 A coherent translation may be mistranslated, and on the other hand, a text containing lots of sentence-level errors would make it difficult to determine its document-level quality. [sent-172, score-0.109]

66 A previous study comparing MT evaluation at the sentence versus document level (Wong et al. [sent-173, score-0.121]

67 5 Experiments We examine, through experiments, the effectiveness of using LC and RC ratios alone and integrating them into other evaluation metrics for MT evaluation at the document and system levels. [sent-176, score-0.364]

68 These metrics are evaluated in terms of their correlation with human assessments, using Pearson’s r correlation coefficient. [sent-179, score-0.232]
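
For concreteness, a system-level correlation of this kind can be computed with SciPy as below; the score lists are placeholder values, not figures from the paper.

```python
# Pearson's r between metric scores and human adequacy assessments,
# one value per MT system.  The numbers below are placeholders.
from scipy.stats import pearsonr

metric_scores = [0.21, 0.25, 0.28, 0.31, 0.34]   # e.g. the LC ratio per system
human_scores = [2.8, 3.1, 3.0, 3.5, 3.7]          # mean adequacy per system

r, p_value = pearsonr(metric_scores, human_scores)
print(f"Pearson's r = {r:.3f} (p = {p_value:.3f})")
```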

69 The MetricsMATR and MTC4 datasets and their adequacy assessments are used as evaluation data. [sent-180, score-0.197]

70 Note that the adequacy assessment is in fact an evaluation method for the sentence level. [sent-181, score-0.182]

71 The integration of the two ratios into an evaluation metric follows a simple weighted average approach. [sent-184, score-0.207]
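
A minimal sketch of such a weighted-average combination is given below; the default weight and the function name are illustrative assumptions, with the actual weights tuned on the data as reported in Table 4.

```python
def combined_document_score(sentence_scores, lc_ratio, weight=0.8):
    """Weighted average of a sentence-level metric (averaged over the document)
    and the document's LC (or RC) ratio.  The weight is a tunable parameter."""
    sentence_level = sum(sentence_scores) / len(sentence_scores)
    return weight * sentence_level + (1.0 - weight) * lc_ratio

# e.g. combine per-sentence metric scores with the document's LC ratio
print(combined_document_score([0.31, 0.27, 0.40], lc_ratio=0.35))
```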

72 Table 4 shows the optimized weights for the metrics for evaluation at the document level. [sent-191, score-0.183]

73 (Table 4: Optimized weights for the integration of the discourse feature into sentence-level metrics) Table 5 presents the correlation rates of evaluation metrics obtained in our experiments under different settings, with their 95% confidence intervals (CI) provided. [sent-198, score-0.332]

74 The LC and RC ratios are found to have strong correlations with human assessments at the system level even when used alone, highly comparable to BLEU and TER. [sent-199, score-0.29]

75 Its correlation cannot be improved by integrating LC or RC, and even drops slightly at the document level. [sent-214, score-0.134]

76 Nevertheless, these results confirm the close relationship between an MT system's capability to appropriately generate lexical cohesion devices and the quality of its output. [sent-216, score-1.12]

77 Table 6 presents the Pearson correlations between evaluation results at the document level using different evaluation metrics in the MTC4 data. [sent-217, score-0.258]

78 It illustrates the homogeneity/heterogeneity of different metrics and helps explain the performance change (Table 5: Correlation of different metrics with adequacy assessment in MTC4 data) [sent-218, score-0.331]

79 (Table 6: Correlation between the evaluation results of different metrics) by combining sentence- and document-level metrics. [sent-228, score-0.109]

80 The table shows that the two ratios LC and RC highly correlate with each other, as if they are two variants of quantifying lexical cohesion devices. [sent-229, score-0.993]

81 On the one hand, lexical cohesion is word choice oriented, which is only sensitive to the reiteration and semantic relatedness of words in MT output. [sent-237, score-0.904]

82 6 Discussion and Conclusion In this study we have attempted to address the problem that most existing MT evaluation metrics disregard the connectivity of sentences in a document. [sent-240, score-0.13]

83 , lexical cohesion, we have shown that its use frequency is a significant factor to differentiate HT from MT and MT outputs of different quality from each other. [sent-243, score-0.173]

84 The high correlation rate of its use with translation adequacy also suggests that the more lexical cohesion devices in use, the better the quality of MT output. [sent-244, score-1.366]

85 Our approach to extending the granularity of MT evaluation from sentence to document through lexical cohesion is highly applicable to different languages. [sent-249, score-0.932]

86 Our future work will continue to explore the relationship of lexical cohesion to translation quality, so as to identify, apart from its use frequency, other significant aspects for MT evaluation at the document level. [sent-252, score-1.02]

87 A frequent use of cohesion devices in a text is not necessarily appropriate, because an excess of them may decrease the quality and readability of a text. [sent-253, score-1.029]

88 Human writers can strategically change the ways of expression to achieve appropriate coherence and also avoid overuse of the same lexical item. [sent-254, score-0.19]

89 To a certain extent, this is one of the causes of the unnaturalness of MT output: it may contain a large number of lexical cohesion devices which are simply direct translations of those in the source text and do not fit the target context. [sent-255, score-1.189]

90 How to use lexical cohesion devices appropriately instead of frequently is thus an important issue to tackle before we can adopt them in MT and MT evaluation by a suitable means. [sent-256, score-1.103]

91 Extraction of salient textual patterns: Synergy between lexical cohesion and contextual coherence. [sent-286, score-0.835]

92 Automatic validation of terminology translation consistency with statistical method. [sent-309, score-0.121]

93 Scaling the ISLE taxonomy: Development of metrics for the multi-dimensional characterisation of machine translation quality. [sent-340, score-0.174]

94 Lexical cohesion computed by thesaural relations as an indicator of the structure of text. [sent-344, score-0.723]

95 Determination of referential property and number of nouns in Japanese sentences for machine translation into English. [sent-348, score-0.122]

96 Zero pronoun resolution in a machine translation system by using Japanese to English verbal semantic attributes. [sent-356, score-0.131]

97 2008 NIST metrics for machine translation (MetricsMATR08) development data. [sent-381, score-0.174]

98 A study of translation edit rate with targeted human annotation. [sent-385, score-0.114]

99 Cohesion and coherence in the presentation of machine translation products. [sent-399, score-0.166]

100 Lexical cohesion for evaluation of machine translation at document level. [sent-419, score-0.908]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('cohesion', 0.723), ('mt', 0.377), ('devices', 0.245), ('ratios', 0.158), ('rc', 0.113), ('lexical', 0.112), ('metricsmatr', 0.098), ('adequacy', 0.098), ('nakaiwa', 0.094), ('ht', 0.092), ('translation', 0.088), ('metrics', 0.086), ('coherence', 0.078), ('document', 0.074), ('bleu', 0.071), ('meteor', 0.07), ('assessment', 0.061), ('correlation', 0.06), ('ter', 0.057), ('assessments', 0.054), ('discourse', 0.051), ('japanese', 0.051), ('snover', 0.05), ('grammatical', 0.047), ('billy', 0.047), ('halliday', 0.047), ('hiromi', 0.047), ('reiteration', 0.047), ('murata', 0.042), ('essay', 0.041), ('bicycle', 0.041), ('quality', 0.04), ('repetition', 0.036), ('referential', 0.034), ('chunyu', 0.034), ('wong', 0.034), ('consistency', 0.033), ('summit', 0.032), ('masaki', 0.032), ('kit', 0.032), ('breakthroughs', 0.031), ('cartoni', 0.031), ('comelles', 0.031), ('comtis', 0.031), ('femti', 0.031), ('hesion', 0.031), ('ikehara', 0.031), ('itagaki', 0.031), ('mdoc', 0.031), ('meronym', 0.031), ('mseg', 0.031), ('ofcontent', 0.031), ('peral', 0.031), ('sar', 0.031), ('satoru', 0.031), ('superordinate', 0.031), ('texture', 0.031), ('visser', 0.031), ('kong', 0.029), ('anaphora', 0.029), ('chains', 0.028), ('correlations', 0.028), ('reference', 0.027), ('burstein', 0.027), ('andrei', 0.027), ('georgetown', 0.027), ('hter', 0.027), ('interlingual', 0.027), ('kamp', 0.027), ('shirai', 0.027), ('integration', 0.026), ('human', 0.026), ('level', 0.024), ('mtc', 0.024), ('przybocki', 0.024), ('jill', 0.024), ('morris', 0.024), ('stemmer', 0.024), ('hong', 0.024), ('evaluation', 0.023), ('automated', 0.023), ('content', 0.023), ('pronouns', 0.023), ('gong', 0.023), ('hasan', 0.023), ('datasets', 0.022), ('versions', 0.022), ('semantic', 0.022), ('resolution', 0.021), ('rhetorical', 0.021), ('connectivity', 0.021), ('king', 0.021), ('outputs', 0.021), ('china', 0.021), ('text', 0.021), ('wordnet', 0.02), ('achieving', 0.02), ('explaining', 0.02), ('coordinate', 0.02), ('banerjee', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 50 emnlp-2012-Extending Machine Translation Evaluation Metrics with Lexical Cohesion to Document Level

Author: Billy T. M. Wong ; Chunyu Kit

Abstract: This paper proposes the utilization of lexical cohesion to facilitate evaluation of machine translation at the document level. As a linguistic means to achieve text coherence, lexical cohesion ties sentences together into a meaningfully interwoven structure through words with the same or related meaning. A comparison between machine and human translation is conducted to illustrate one of their critical distinctions that human translators tend to use more cohesion devices than machine. Various ways to apply this feature to evaluate machinetranslated documents are presented, including one without reliance on reference translation. Experimental results show that incorporating this feature into sentence-level evaluation metrics can enhance their correlation with human judgements.

2 0.10642584 108 emnlp-2012-Probabilistic Finite State Machines for Regression-based MT Evaluation

Author: Mengqiu Wang ; Christopher D. Manning

Abstract: Accurate and robust metrics for automatic evaluation are key to the development of statistical machine translation (MT) systems. We first introduce a new regression model that uses a probabilistic finite state machine (pFSM) to compute weighted edit distance as predictions of translation quality. We also propose a novel pushdown automaton extension of the pFSM model for modeling word swapping and cross alignments that cannot be captured by standard edit distance models. Our models can easily incorporate a rich set of linguistic features, and automatically learn their weights, eliminating the need for ad-hoc parameter tuning. Our methods achieve state-of-the-art correlation with human judgments on two different prediction tasks across a diverse set of standard evaluations (NIST OpenMT06,08; WMT0608).

3 0.067400791 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence

Author: David Burkett ; Dan Klein

Abstract: We describe a transformation-based learning method for learning a sequence of monolingual tree transformations that improve the agreement between constituent trees and word alignments in bilingual corpora. Using the manually annotated English Chinese Translation Treebank, we show how our method automatically discovers transformations that accommodate differences in English and Chinese syntax. Furthermore, when transformations are learned on automatically generated trees and alignments from the same domain as the training data for a syntactic MT system, the transformed trees achieve a 0.9 BLEU improvement over baseline trees.

4 0.066800892 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation

Author: Liwei Chen ; Yansong Feng ; Lei Zou ; Dongyan Zhao

Abstract: In this paper, we investigate different usages of feature representations in the web person name disambiguation task which has been suffering from the mismatch of vocabulary and lack of clues in web environments. In literature, the latter receives less attention and remains more challenging. We explore the feature space in this task and argue that collecting person specific evidences from a corpus level can provide a more reasonable and robust estimation for evaluating a feature’s importance in a given web page. This can alleviate the lack of clues where discriminative features can be reasonably weighted by taking their corpus level importance into account, not just relying on the current local context. We therefore propose a topic-based model to exploit the person specific global importance and embed it into the person name similarity. The experimental results show that the corpus level topic in- formation provides more stable evidences for discriminative features and our method outperforms the state-of-the-art systems on three WePS datasets.

5 0.061695304 86 emnlp-2012-Locally Training the Log-Linear Model for SMT

Author: Lemao Liu ; Hailong Cao ; Taro Watanabe ; Tiejun Zhao ; Mo Yu ; Conghui Zhu

Abstract: In statistical machine translation, minimum error rate training (MERT) is a standard method for tuning a single weight with regard to a given development data. However, due to the diversity and uneven distribution of source sentences, there are two problems suffered by this method. First, its performance is highly dependent on the choice of a development set, which may lead to an unstable performance for testing. Second, translations become inconsistent at the sentence level since tuning is performed globally on a document level. In this paper, we propose a novel local training method to address these two problems. Unlike a global training method, such as MERT, in which a single weight is learned and used for all the input sentences, we perform training and testing in one step by learning a sentencewise weight for each input sentence. We pro- pose efficient incremental training methods to put the local training into practice. In NIST Chinese-to-English translation tasks, our local training method significantly outperforms MERT with the maximal improvements up to 2.0 BLEU points, meanwhile its efficiency is comparable to that of the global method.

6 0.061603304 17 emnlp-2012-An "AI readability" Formula for French as a Foreign Language

7 0.059709493 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation

8 0.053086501 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns

9 0.047798637 135 emnlp-2012-Using Discourse Information for Paraphrase Extraction

10 0.047743492 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering

11 0.045326754 54 emnlp-2012-Forced Derivation Tree based Model Training to Statistical Machine Translation

12 0.045269392 67 emnlp-2012-Inducing a Discriminative Parser to Optimize Machine Translation Reordering

13 0.044143587 49 emnlp-2012-Exploring Topic Coherence over Many Models and Many Topics

14 0.043778207 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT

15 0.043628681 21 emnlp-2012-Assessment of ESL Learners' Syntactic Competence Based on Similarity Measures

16 0.042594358 25 emnlp-2012-Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation

17 0.040370867 11 emnlp-2012-A Systematic Comparison of Phrase Table Pruning Techniques

18 0.039431844 58 emnlp-2012-Generalizing Sub-sentential Paraphrase Acquisition across Original Signal Type of Text Pairs

19 0.039008874 94 emnlp-2012-Multiple Aspect Summarization Using Integer Linear Programming

20 0.038213935 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.142), (1, -0.018), (2, -0.101), (3, 0.016), (4, -0.031), (5, 0.006), (6, -0.03), (7, -0.003), (8, -0.02), (9, -0.043), (10, -0.015), (11, -0.039), (12, -0.063), (13, 0.116), (14, 0.061), (15, -0.03), (16, 0.037), (17, 0.002), (18, 0.072), (19, 0.124), (20, -0.015), (21, 0.055), (22, 0.097), (23, 0.045), (24, -0.011), (25, 0.071), (26, -0.055), (27, -0.191), (28, 0.04), (29, 0.157), (30, 0.119), (31, -0.224), (32, -0.063), (33, 0.082), (34, -0.194), (35, 0.182), (36, -0.177), (37, 0.062), (38, 0.097), (39, -0.188), (40, -0.196), (41, 0.024), (42, 0.077), (43, 0.06), (44, 0.125), (45, -0.225), (46, -0.136), (47, -0.11), (48, 0.048), (49, 0.102)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97100961 50 emnlp-2012-Extending Machine Translation Evaluation Metrics with Lexical Cohesion to Document Level

Author: Billy T. M. Wong ; Chunyu Kit

Abstract: This paper proposes the utilization of lexical cohesion to facilitate evaluation of machine translation at the document level. As a linguistic means to achieve text coherence, lexical cohesion ties sentences together into a meaningfully interwoven structure through words with the same or related meaning. A comparison between machine and human translation is conducted to illustrate one of their critical distinctions that human translators tend to use more cohesion devices than machine. Various ways to apply this feature to evaluate machinetranslated documents are presented, including one without reliance on reference translation. Experimental results show that incorporating this feature into sentence-level evaluation metrics can enhance their correlation with human judgements.

2 0.72344828 108 emnlp-2012-Probabilistic Finite State Machines for Regression-based MT Evaluation

Author: Mengqiu Wang ; Christopher D. Manning

Abstract: Accurate and robust metrics for automatic evaluation are key to the development of statistical machine translation (MT) systems. We first introduce a new regression model that uses a probabilistic finite state machine (pFSM) to compute weighted edit distance as predictions of translation quality. We also propose a novel pushdown automaton extension of the pFSM model for modeling word swapping and cross alignments that cannot be captured by standard edit distance models. Our models can easily incorporate a rich set of linguistic features, and automatically learn their weights, eliminating the need for ad-hoc parameter tuning. Our methods achieve state-of-the-art correlation with human judgments on two different prediction tasks across a diverse set of standard evaluations (NIST OpenMT06,08; WMT0608).

3 0.3943809 17 emnlp-2012-An "AI readability" Formula for French as a Foreign Language

Author: Thomas Francois ; Cedrick Fairon

Abstract: This paper present a new readability formula for French as a foreign language (FFL), which relies on 46 textual features representative of the lexical, syntactic, and semantic levels as well as some of the specificities of the FFL context. We report comparisons between several techniques for feature selection and various learning algorithms. Our best model, based on support vector machines (SVM), significantly outperforms previous FFL formulas. We also found that semantic features behave poorly in our case, in contrast with some previous readability studies on English as a first language.

4 0.30299816 86 emnlp-2012-Locally Training the Log-Linear Model for SMT

Author: Lemao Liu ; Hailong Cao ; Taro Watanabe ; Tiejun Zhao ; Mo Yu ; Conghui Zhu

Abstract: In statistical machine translation, minimum error rate training (MERT) is a standard method for tuning a single weight with regard to a given development data. However, due to the diversity and uneven distribution of source sentences, there are two problems suffered by this method. First, its performance is highly dependent on the choice of a development set, which may lead to an unstable performance for testing. Second, translations become inconsistent at the sentence level since tuning is performed globally on a document level. In this paper, we propose a novel local training method to address these two problems. Unlike a global training method, such as MERT, in which a single weight is learned and used for all the input sentences, we perform training and testing in one step by learning a sentencewise weight for each input sentence. We pro- pose efficient incremental training methods to put the local training into practice. In NIST Chinese-to-English translation tasks, our local training method significantly outperforms MERT with the maximal improvements up to 2.0 BLEU points, meanwhile its efficiency is comparable to that of the global method.

5 0.27738178 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis

Author: Wen-tau Yih ; Geoffrey Zweig ; John Platt

Abstract: Existing vector space models typically map synonyms and antonyms to similar word vectors, and thus fail to represent antonymy. We introduce a new vector space representation where antonyms lie on opposite sides of a sphere: in the word vector space, synonyms have cosine similarities close to one, while antonyms are close to minus one. We derive this representation with the aid of a thesaurus and latent semantic analysis (LSA). Each entry in the thesaurus a word sense along with its synonyms and antonyms is treated as a “document,” and the resulting document collection is subjected to LSA. The key contribution of this work is to show how to assign signs to the entries in the co-occurrence matrix on which LSA operates, so as to induce a subspace with the desired property. – – We evaluate this procedure with the Graduate Record Examination questions of (Mohammed et al., 2008) and find that the method improves on the results of that study. Further improvements result from refining the subspace representation with discriminative training, and augmenting the training data with general newspaper text. Altogether, we improve on the best previous results by 11points absolute in F measure.

6 0.26063591 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns

7 0.25583288 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence

8 0.23259658 49 emnlp-2012-Exploring Topic Coherence over Many Models and Many Topics

9 0.21543235 21 emnlp-2012-Assessment of ESL Learners' Syntactic Competence Based on Similarity Measures

10 0.21541008 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation

11 0.21506067 25 emnlp-2012-Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation

12 0.21357642 7 emnlp-2012-A Novel Discriminative Framework for Sentence-Level Discourse Analysis

13 0.20780221 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT

14 0.19323665 75 emnlp-2012-Large Scale Decipherment for Out-of-Domain Machine Translation

15 0.19315682 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP

16 0.19183744 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering

17 0.19174349 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation

18 0.18656665 54 emnlp-2012-Forced Derivation Tree based Model Training to Statistical Machine Translation

19 0.17538273 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing

20 0.17532326 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.021), (16, 0.036), (34, 0.109), (41, 0.015), (45, 0.013), (60, 0.105), (63, 0.061), (64, 0.025), (65, 0.028), (70, 0.016), (73, 0.017), (74, 0.053), (76, 0.031), (80, 0.019), (82, 0.304), (86, 0.021), (95, 0.037)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.8644672 88 emnlp-2012-Minimal Dependency Length in Realization Ranking

Author: Michael White ; Rajakrishnan Rajkumar

Abstract: Comprehension and corpus studies have found that the tendency to minimize dependency length has a strong influence on constituent ordering choices. In this paper, we investigate dependency length minimization in the context of discriminative realization ranking, focusing on its potential to eliminate egregious ordering errors as well as better match the distributional characteristics of sentence orderings in news text. We find that with a stateof-the-art, comprehensive realization ranking model, dependency length minimization yields statistically significant improvements in BLEU scores and significantly reduces the number of heavy/light ordering errors. Through distributional analyses, we also show that with simpler ranking models, dependency length minimization can go overboard, too often sacrificing canonical word order to shorten dependencies, while richer models manage to better counterbalance the dependency length minimization preference against (sometimes) competing canonical word order preferences.

same-paper 2 0.76961476 50 emnlp-2012-Extending Machine Translation Evaluation Metrics with Lexical Cohesion to Document Level

Author: Billy T. M. Wong ; Chunyu Kit

Abstract: This paper proposes the utilization of lexical cohesion to facilitate evaluation of machine translation at the document level. As a linguistic means to achieve text coherence, lexical cohesion ties sentences together into a meaningfully interwoven structure through words with the same or related meaning. A comparison between machine and human translation is conducted to illustrate one of their critical distinctions that human translators tend to use more cohesion devices than machine. Various ways to apply this feature to evaluate machinetranslated documents are presented, including one without reliance on reference translation. Experimental results show that incorporating this feature into sentence-level evaluation metrics can enhance their correlation with human judgements.

3 0.50708669 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation

Author: Wang Ling ; Joao Graca ; Isabel Trancoso ; Alan Black

Abstract: Phrase-based machine translation models have shown to yield better translations than Word-based models, since phrase pairs encode the contextual information that is needed for a more accurate translation. However, many phrase pairs do not encode any relevant context, which means that the translation event encoded in that phrase pair is led by smaller translation events that are independent from each other, and can be found on smaller phrase pairs, with little or no loss in translation accuracy. In this work, we propose a relative entropy model for translation models, that measures how likely a phrase pair encodes a translation event that is derivable using smaller translation events with similar probabilities. This model is then applied to phrase table pruning. Tests show that considerable amounts of phrase pairs can be excluded, without much impact on the transla- . tion quality. In fact, we show that better translations can be obtained using our pruned models, due to the compression of the search space during decoding.

4 0.49640688 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP

Author: Taylor Berg-Kirkpatrick ; David Burkett ; Dan Klein

Abstract: We investigate two aspects of the empirical behavior of paired significance tests for NLP systems. First, when one system appears to outperform another, how does significance level relate in practice to the magnitude of the gain, to the size of the test set, to the similarity of the systems, and so on? Is it true that for each task there is a gain which roughly implies significance? We explore these issues across a range of NLP tasks using both large collections of past systems’ outputs and variants of single systems. Next, once significance levels are computed, how well does the standard i.i.d. notion of significance hold up in practical settings where future distributions are neither independent nor identically distributed, such as across domains? We explore this question using a range of test set variations for constituency parsing.

5 0.49542743 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling

Author: Michael J. Paul

Abstract: Recent work has explored the use of hidden Markov models for unsupervised discourse and conversation modeling, where each segment or block of text such as a message in a conversation is associated with a hidden state in a sequence. We extend this approach to allow each block of text to be a mixture of multiple classes. Under our model, the probability of a class in a text block is a log-linear function of the classes in the previous block. We show that this model performs well at predictive tasks on two conversation data sets, improving thread reconstruction accuracy by up to 15 percentage points over a standard HMM. Additionally, we show quantitatively that the induced word clusters correspond to speech acts more closely than baseline models.

6 0.49430999 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon

7 0.49428761 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM

8 0.49344417 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts

9 0.49274641 11 emnlp-2012-A Systematic Comparison of Phrase Table Pruning Techniques

10 0.49272689 54 emnlp-2012-Forced Derivation Tree based Model Training to Statistical Machine Translation

11 0.49259815 108 emnlp-2012-Probabilistic Finite State Machines for Regression-based MT Evaluation

12 0.49209875 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing

13 0.49163595 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries

14 0.49041009 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

15 0.49009117 45 emnlp-2012-Exploiting Chunk-level Features to Improve Phrase Chunking

16 0.48950267 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT

17 0.48890403 24 emnlp-2012-Biased Representation Learning for Domain Adaptation

18 0.48877326 95 emnlp-2012-N-gram-based Tense Models for Statistical Machine Translation

19 0.48702177 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules

20 0.48695135 92 emnlp-2012-Multi-Domain Learning: When Do Domains Matter?