acl acl2013 acl2013-13 knowledge-graph by maker-knowledge-mining

13 acl-2013-A New Syntactic Metric for Evaluation of Machine Translation


Source: pdf

Author: Melania Duma ; Cristina Vertan ; Wolfgang Menzel

Abstract: Machine translation (MT) evaluation aims at measuring the quality of a candidate translation by comparing it with a reference translation. This comparison can be performed on multiple levels: lexical, syntactic or semantic. In this paper, we propose a new syntactic metric for MT evaluation based on the comparison of the dependency structures of the reference and the candidate translations. The dependency structures are obtained by means of a Weighted Constraints Dependency Grammar parser. Based on experiments performed on English to German translations, we show that the new metric correlates well with human judgments at the system level. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Machine translation (MT) evaluation aims at measuring the quality of a candidate translation by comparing it with a reference translation. [sent-7, score-0.601]

2 In this paper, we propose a new syntactic metric for MT evaluation based on the comparison of the dependency structures of the reference and the candidate translations. [sent-9, score-0.66]

3 The dependency structures are obtained by means of a Weighted Constraints Dependency Grammar parser. [sent-10, score-0.202]

4 Based on experiments performed on English to German translations, we show that the new metric correlates well with human judgments at the system level. [sent-11, score-0.274]

5 1 Introduction Research in automatic machine translation (MT) evaluation has the goal of developing a set of computer-based methods that accurately measure the correctness of the output generated by an MT system. [sent-12, score-0.227]

6 However, this task is a difficult one mainly because there is no unique reference output that can be used in the comparison with the candidate translation. [sent-13, score-0.229]

7 Thus, it is difficult to decide if the deviation from an existing reference translation is a matter of style (the use of synonymous words, different syntax etc.). [sent-15, score-0.299]

8 Most of the automatic evaluation metrics developed so far are focused on the idea of lexical matching between the tokens of one or more reference translations and the tokens of a candidate translation. [sent-17, score-0.445]

9 However, structural similarity between a reference translation and a candidate one cannot be captured by lexical features. [sent-18, score-0.448]

10 Therefore, research in MT evaluation experiences a gradual shift of focus from lexical metrics to structural ones, whether they are syntactic or semantic or a combination of both. [sent-19, score-0.206]

11 This paper introduces a new syntactic automatic MT evaluation method. [sent-20, score-0.096]

12 At this stage of research, the new metric evaluates translations from any source language into German. [sent-21, score-0.237]

13 The chosen tool for providing syntactic information for German is the Weighted Constraints Dependency Grammar (WCDG) parser (Menzel and Schröder, 1998), which is preferred over other parsers because of its robustness to ungrammatical input, as it is typical for MT output. [sent-23, score-0.274]

14 In Section 2 the state of the art in MT evaluation is presented, while in Section 3 the new syntactic metric is described. [sent-25, score-0.272]

15 Using an automatic evaluation method, a score is computed based on the similarity between the output of the MT system and the reference. [sent-29, score-0.113]

16 This similarity can be computed at different levels: lexical, syntactic or semantic. [sent-30, score-0.118]

17 At the lexical level, the metrics developed so far can be divided into two major categories: n-gram based and edit distance based. [sent-31, score-0.181]

18 1 We will use the term reference for the reference translation and the term translation for the candidate translation. [sent-32, score-0.675]

19 Lexical metrics that use the edit distance are constructed using the Levenshtein distance applied at the word level. [sent-38, score-0.181]

20 , 2000) is the one used most frequently; it calculates the minimal number of insertions, substitutions and deletions needed to transform the candidate translation into a reference. [sent-40, score-0.224]
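The word-level edit distance described here, the basis of WER-style metrics, can be sketched in a few lines of dynamic programming; the function name and layout below are our own illustration, not code from the paper.

```python
def word_edit_distance(candidate, reference):
    """Minimal number of insertions, substitutions and deletions needed
    to transform the candidate word sequence into the reference
    (Levenshtein distance applied at the word level)."""
    c, r = candidate.split(), reference.split()
    # dp[i][j]: distance between the first i candidate words
    # and the first j reference words
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i in range(len(c) + 1):
        dp[i][0] = i
    for j in range(len(r) + 1):
        dp[0][j] = j
    for i in range(1, len(c) + 1):
        for j in range(1, len(r) + 1):
            cost = 0 if c[i - 1] == r[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(c)][len(r)]

print(word_edit_distance("the cat sat on the mat", "the cat sat on a mat"))  # 1
```

WER itself then normalizes this count by the length of the reference.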

21 Thus, they assign a low score to an otherwise fluent and syntactically correct candidate translation if it does not share a certain number of words with the set of references. [sent-42, score-0.107]

22 Another disadvantage is that many of them cannot be applied at the segment level, which is often needed in order to better assess the quality of MT output and to determine which improvements should be made to the MT system. [sent-45, score-0.128]

23 Because of these disadvantages there is an increasing need for other approaches to MT evaluation that go beyond the lexical level of the phrases compared. [sent-46, score-0.115]

24 In Liu and Gildea (2005), three syntactic evaluation metrics are presented. [sent-47, score-0.172]

25 The first of these metrics, the Subtree Metric (SMT), is based on determining the number of subtrees that can be found in both the candidate translation and the reference phrase structure trees. [sent-48, score-0.475]

26 The third metric proposed computes the number of matching n-grams between the headword chains of the reference and the candidate translation dependency trees obtained using the parser described in (Collins, 1999). [sent-50, score-0.948]

27 The idea of syntactic similarity is further exploited in Owczarzak et al. [sent-51, score-0.089]

28 The similarity between the translation and the reference is computed using the precision and the recall of the dependencies that illustrate the pair of sentences. [sent-53, score-0.366]

29 Furthermore, paraphrases are used in order to improve the correlation with human judgments. [sent-54, score-0.182]

30 Another set of syntactic metrics has been introduced in Gimenez (2008); some of them are based on analyzing different types of linguistic information (i. [sent-55, score-0.127]

31 3 A new syntactic automatic metric In this section we introduce the new syntactic metric which is based on constraint dependency parsing. [sent-58, score-0.687]

32 In the first subsection, the WCDG parser is presented, together with the advantages of using this parser over the other ones available, while the second subsection provides a detailed description of the new metric. [sent-59, score-0.31]

33 1 Weighted Constraint Dependency Grammar Parser Our research was performed using a dependency parser. [sent-61, score-0.159]

34 We decided on this type of parser because, as opposed to constituent parsers, it offers the possibility of better representing nonprojective structures. [sent-62, score-0.156]

35 The goal of constraint dependency grammars (CDG) is to create dependency structures that represent a given phrase (Schröder et al. [sent-64, score-0.422]

36 A relation between two words in a sentence is represented using an edge, which connects the regent and the dependent. [sent-66, score-0.083]

37 One property, for example, that is always enforced is that no word can have more than one regent on any level at a time. [sent-69, score-0.119]

38 During the analysis, each of the constraints is applied to every edge or every pair of edges belonging to the constructed dependency parse tree. [sent-70, score-0.308]
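A minimal check of the hard one-regent rule mentioned above might look as follows; the edge-list representation and function name are our own illustration (one analysis level only), not the WCDG interface.

```python
def multi_regent_violations(edges):
    """Given (dependent, regent) edges of one analysis level, return the
    dependents attached to more than one regent, i.e. the violations of
    the hard one-regent-per-level constraint."""
    regents = {}
    for dependent, regent in edges:
        regents.setdefault(dependent, set()).add(regent)
    return sorted(d for d, heads in regents.items() if len(heads) > 1)

edges = [("Katze", "springt"), ("schwarze", "Katze"),
         ("Katze", "auf")]  # "Katze" is (wrongly) attached to two regents
print(multi_regent_violations(edges))  # ['Katze']
```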

39 The main advantage of using constraint dependency grammars over dependency grammars based on generative rules is that they can deal better with free word order languages (Foth, 2004). [sent-71, score-0.485]

40 Every constraint in WCDG is assigned a score which is a number between 0.0 and 1.0. [sent-73, score-0.104]

41 The general score of a parse is calculated as the product of the scores of all the constraint instances that have not been satisfied. [sent-75, score-0.151]

42 Rules that have a score of 0 are called hard rules, meaning that they cannot be ignored, which is the case for the one-regent-only rule mentioned earlier. [sent-76, score-0.113]
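The scoring scheme just described, a weight in [0.0, 1.0] per constraint, the parse score as the product over violated instances, and hard rules with weight 0.0, can be sketched as follows (illustrative names, not WCDG code):

```python
def parse_score(violated_weights):
    """Overall score of a parse: the product of the weights of all
    violated constraint instances; 1.0 if nothing is violated."""
    score = 1.0
    for weight in violated_weights:
        score *= weight
    return score

print(parse_score([]))                     # 1.0  (a perfect parse)
print(round(parse_score([0.9, 0.8]), 2))   # 0.72 (two soft violations)
print(parse_score([0.9, 0.0]))             # 0.0  (a hard rule was violated)
```

Multiplying weights means a single hard violation zeroes the score, while repeated soft violations degrade it gradually, which is what makes the parser robust to ungrammatical input.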

43 The advantage of using graded constraints, as opposed to crisp ones, stems from the fact that weights allow the parser to tolerate constraint violations, which, in turn, makes the parser robust against ungrammaticality. [sent-77, score-0.378]

44 The parser was evaluated using different types of texts, and the results show that it has an accuracy between 80% and 90% in computing correct dependency attachments depending on the type of text (Foth et al. [sent-78, score-0.279]

45 The benefit of using WCDG over other parsers is that it provides further information on a parse, like the general score of the parse and the constraints that are violated by the final result. [sent-80, score-0.25]

46 Moreover, because the candidate translations are sometimes not well-formed, parsing them represents a challenge. [sent-82, score-0.176]

47 However, WCDG will always provide a final result, in the form of a dependency structure, even though it might have a low score due to the violated constraints. [sent-83, score-0.227]

48 2 Description of the metric In order to define a new syntactic metric for MT evaluation, we have incorporated the WCDG parser in the process of evaluation. [sent-85, score-0.556]

49 Because the output of the WCDG parser is a dependency tree, we have looked into techniques of measuring how similar two trees are. [sent-86, score-0.386]

50 Our aim was to determine whether a tree similarity metric applied to the two dependency parse trees would prove to be an efficient way of capturing the similarity between the reference and the translation. [sent-87, score-0.746]

51 Let us consider this example, in which the reference sentence is “Die schwarze Katze springt schnell auf den roten Stuhl.” [sent-88, score-0.493]

52 The black cat jumps quickly on the red chair) and the candidate translation is “Auf den roten Stuhl schnell springt die schwarze Katze” (engl. [sent-90, score-0.664]

53 Even though the word order of the two segments is quite different and the translation has incorrect syntax, they have roughly the same meaning. [sent-92, score-0.236]

54 We present in Figure 1 the dependency parse trees obtained using WCDG for the sentences considered. [sent-93, score-0.342]

55 We can observe that the general structure of the translation is similar to that of the reference, the only difference being the reverse order between the left subtree and the right subtree. [sent-94, score-0.244]

56 The tree similarity measure that we chose to use was the All Common Embedded Subtrees (ACET) (Lin et al. [sent-95, score-0.081]

57 Given a tree T, an embedded subtree is obtained by removing one or more nodes, except for the root, from the tree T. [sent-97, score-0.297]

58 Therefore, ACET is defined as the number of common embedded subtrees shared between two trees. [sent-99, score-0.203]
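For very small trees, the notion can be made concrete by brute force: enumerate every embedded subtree (the root is kept; deleting a node reattaches its children to its parent, preserving order) and intersect the two sets. This is our own illustration; the actual ACET algorithm of Lin et al. is far more efficient.

```python
from itertools import product

# A tree is a nested tuple: (label, (child, child, ...)).
def forests(node):
    """All ordered forests obtainable from `node` by deleting zero or
    more of its nodes (a deleted node's children move up to its parent)."""
    label, children = node
    out = set()
    for choice in product(*(forests(c) for c in children)):
        kids = tuple(t for forest in choice for t in forest)
        out.add(((label, kids),))  # keep this node
        out.add(kids)              # delete this node
    return out

def embedded_subtrees(tree):
    """All embedded subtrees of `tree`; the root is never removed."""
    label, children = tree
    subs = set()
    for choice in product(*(forests(c) for c in children)):
        kids = tuple(t for forest in choice for t in forest)
        subs.add((label, kids))
    return subs

def acet(t1, t2):
    """Number of embedded subtrees shared by the two trees."""
    return len(embedded_subtrees(t1) & embedded_subtrees(t2))

t1 = ("r", (("a", ()), ("b", ())))
t2 = ("r", (("b", ()), ("a", ())))  # same nodes, mirrored order
print(acet(t1, t2))  # 3: "r", "r-a" and "r-b" are shared, "r-(a,b)" is not
```

Treating the trees as ordered makes the count sensitive to word order, which is exactly what distinguishes the mirrored reference and candidate trees in the example above.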

59 (2008) show that ACET outperforms tree edit distance (Zhang and Shasha, 1989) in terms of efficiency. [sent-101, score-0.114]

60 Example of dependency parse trees for reference and candidate translations In our experiments, we have applied the ACET algorithm, and computed the number of common embedded subtrees between the dependency parse trees of the hypothesis and the reference. [sent-103, score-1.12]

61 Because of the additional information provided by the parsing, pre-processing of the output of the WCDG parser was necessary in order to transform the dependency tree into a general tree. [sent-104, score-0.355]

62 In the following, we will refer to the newly proposed metric as CESM (Common Embedded Subtree Metric). [sent-106, score-0.176]

63 CESM was computed using the precision, the recall and the F-measure of the common embedded subtrees of the reference and the translation, where tree_ref and tree_hyp represent the preprocessed dependency trees of the reference and the hypothesis translations. [sent-107, score-0.769]
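The equations themselves are not reproduced in this extract; the sentence reads as the standard precision, recall and F-measure computed over the common embedded subtrees of the two trees, which is what the sketch below assumes (names are ours):

```python
def cesm(common, n_hyp, n_ref):
    """Precision/recall/F-measure over embedded subtrees.
    common: subtrees shared by the hypothesis and reference trees;
    n_hyp, n_ref: total embedded-subtree counts of each tree."""
    precision = common / n_hyp
    recall = common / n_ref
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

p, r, f = cesm(common=3, n_hyp=4, n_ref=4)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.75 0.75 0.75
```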

64 4 Experimental setup and evaluation In order to determine how accurate CESM is in capturing the similarity between references and translations, we evaluated it at the system level and at the segment level. [sent-108, score-0.215]

65 The evaluation was conducted using data provided by the NAACL 2012 WMT workshop (Callison-Burch et al. [sent-109, score-0.092]

66 As a result, 500 segments with a length between 50 and 80 characters were extracted from the German reference file. [sent-114, score-0.208]

67 In the next step, we arbitrarily selected the outputs of 7 of the 15 systems that were submitted for evaluation in the English to German translation task: DFKI (Vilar, 2012), JHU (Ganitkevitch et al. [sent-115, score-0.192]

68 After this initial step of filtering the data, the 7 systems were evaluated by calculating the CESM score for every pair of reference and translation segments corresponding to a system. [sent-118, score-0.385]

69 The minimum value of ρ is -1, when there is no correlation between the two rankings, while the maximum value is 1, when the two rankings correlate perfectly (Callison-Burch et al. [sent-121, score-0.14]
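For rankings without ties, Spearman's ρ reduces to 1 - 6*sum(d^2) / (n*(n^2 - 1)), where d is the rank difference per system; a small sketch with hypothetical system names:

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation between two tie-free rankings of the
    same systems: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    assert set(ranking_a) == set(ranking_b)
    pos_b = {system: i for i, system in enumerate(ranking_b)}
    d2 = sum((i - pos_b[system]) ** 2
             for i, system in enumerate(ranking_a))
    n = len(ranking_a)
    return 1 - 6 * d2 / (n * (n * n - 1))

metric_ranking = ["sys1", "sys2", "sys3", "sys4"]  # hypothetical systems
human_ranking = ["sys2", "sys1", "sys3", "sys4"]   # top two swapped
print(round(spearman_rho(metric_ranking, human_ranking), 2))  # 0.8
```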

70 The ρ rank correlation coefficient was calculated as ρ = 0.92. [sent-125, score-0.187]

71 This shows that there is a strong correlation between the results of CESM and the human judgments. [sent-126, score-0.108]

72 In order to better assess the quality of CESM, the test set was also evaluated using NIST (Doddington, 2002), which obtained the same rank correlation coefficient of ρ = 0.92. [sent-127, score-0.252]

73 The first step in evaluating at the segment level was filtering the initial test set provided by the WMT12 workshop. [sent-129, score-0.099]

74 For this purpose, 2500 reference and translation segments were selected with a length between 50 and 80 characters. [sent-130, score-0.355]

75 The Kendall tau rank correlation coefficient was calculated in order to measure the correlation with human judgments, where Kendall tau (Callison-Burch et al. [sent-131, score-0.46]

76 , 2012), we penalized ties given by CESM and ignored ties assigned by the human judgments. [sent-134, score-0.089]
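A sketch of a segment-level Kendall τ with exactly this tie policy, ignoring pairs tied in the human ranking and counting metric ties as discordant; this is our reading of the description, the official WMT 2012 scoring script may differ in details, and we assume a lower human rank means a better translation:

```python
def kendall_tau_wmt(metric_scores, human_ranks):
    """tau = (concordant - discordant) / (concordant + discordant) over
    all segment pairs; human ties are skipped, metric ties penalized."""
    concordant = discordant = 0
    n = len(metric_scores)
    for i in range(n):
        for j in range(i + 1, n):
            if human_ranks[i] == human_ranks[j]:
                continue  # ignore ties assigned by the human judges
            human_prefers_i = human_ranks[i] < human_ranks[j]
            if metric_scores[i] == metric_scores[j]:
                discordant += 1  # penalize ties given by the metric
            elif (metric_scores[i] > metric_scores[j]) == human_prefers_i:
                concordant += 1  # metric agrees with the human preference
            else:
                discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical scores and human ranks for four candidate translations:
print(round(kendall_tau_wmt([0.9, 0.7, 0.7, 0.2], [1, 2, 3, 4]), 3))  # 0.667
```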

77 As a term of comparison, the highest correlation for the segment level reported in Callison-Burch et al. [sent-137, score-0.207]

78 The rather low correlation result we obtained can be partially explained by the fact that only one judgment of a pair of reference and translation was taken into account. [sent-143, score-0.45]

79 It will be interesting to see how the averaging of the ranks of a translation influences the correlation coefficient. [sent-144, score-0.288]

80 5 Conclusions and future work In this paper, a new evaluation metric for MT was introduced, which is based on the comparison of dependency parse trees. [sent-145, score-0.446]

81 The dependency trees were obtained using the WCDG German parser. [sent-146, score-0.276]

82 The reason why we chose this parser was that, due to its architecture, it is able to handle ungrammatical and ambiguous input data. [sent-147, score-0.162]

83 The experiments conducted so far show that using the data made available at the NAACL 2012 WMT workshop, CESM correlates well with the human judgments at the system level. [sent-148, score-0.098]

84 One of the future experiments that we intend to perform is to assess metric quality on the entire evaluation set. [sent-149, score-0.284]

85 Furthermore, the WMT12 workshop offers different ranking possibilities, like the ones presented in Bojar et al. (2011) and in Lopez (2012). [sent-151, score-0.078]

86 It will be determined how much the segment-level evaluation results are influenced by these ranking orders. [sent-152, score-0.144]

87 One limitation of the proposed metric is that, at the moment, it is restricted to translations from any source language to German as a target language. [sent-153, score-0.237]

88 For this reason, we plan to extend the metric to other languages and see how well it performs in different settings. [sent-154, score-0.176]

89 In further experiments we also intend to test CESM using statistical based dependency parsers, like the Malt Parser (Nivre et al. [sent-155, score-0.22]

90 , 2006), in order to decide whether the choice of parser influences the performance of the metric. [sent-157, score-0.186]

91 Another approach that we will explore for improving CESM is to compare dependency parse trees using the base form and the part-ofspeech of the tokens, instead of the exact lexical match. [sent-158, score-0.333]

92 The accuracy of CESM can be further increased by the use of paraphrases, which can be obtained by using a German thesaurus or a lexical resource like GermaNet (Hamp and Feldweg, 1997). [sent-160, score-0.077]

93 The results reported show that the use of this kind of paraphrases in order to produce new references has increased the BLEU score; therefore, this approach will be further investigated. [sent-162, score-0.074]

94 A broad-coverage parser for German based on defeasible constraints. [sent-211, score-0.12]

95 KONVENS 2004, Beiträge zur 7. Konferenz zur Verarbeitung natürlicher Sprache, Wien. [sent-212, score-0.096]

96 Putting human assessments of machine translation systems in order. [sent-297, score-0.182]

97 The karlsruhe institute of technology translation systems for the WMT 2012. [sent-320, score-0.147]

98 Bleu: a method for automatic evaluation of machine translation. [sent-363, score-0.08]

99 Class error rates for evaluation of machine translation output. [sent-370, score-0.227]

100 Simple fast algorithms for the editing distance between trees and related problems. [sent-410, score-0.108]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('cesm', 0.44), ('wcdg', 0.346), ('hamburg', 0.222), ('schr', 0.22), ('menzel', 0.189), ('metric', 0.176), ('foth', 0.167), ('dependency', 0.159), ('reference', 0.152), ('translation', 0.147), ('acet', 0.139), ('mt', 0.134), ('parser', 0.12), ('german', 0.109), ('correlation', 0.108), ('embedded', 0.104), ('subtrees', 0.099), ('regent', 0.083), ('wmt', 0.081), ('candidate', 0.077), ('metrics', 0.076), ('trees', 0.074), ('constraint', 0.074), ('seventh', 0.073), ('owczarzak', 0.072), ('grammar', 0.069), ('judgments', 0.068), ('parse', 0.066), ('kendall', 0.066), ('tau', 0.066), ('der', 0.064), ('subtree', 0.064), ('graded', 0.064), ('duma', 0.063), ('fishel', 0.063), ('hamp', 0.063), ('katze', 0.063), ('niessen', 0.063), ('roten', 0.063), ('schnell', 0.063), ('schwarze', 0.063), ('springt', 0.063), ('terrorcat', 0.063), ('segment', 0.063), ('parsers', 0.061), ('translations', 0.061), ('dfki', 0.056), ('germanet', 0.056), ('popovic', 0.056), ('segments', 0.056), ('constraints', 0.055), ('syntactic', 0.051), ('auf', 0.051), ('jumps', 0.048), ('bler', 0.048), ('niehues', 0.048), ('zur', 0.048), ('coefficient', 0.048), ('bleu', 0.048), ('workshop', 0.047), ('evaluation', 0.045), ('chair', 0.044), ('tree', 0.043), ('obtained', 0.043), ('ungrammatical', 0.042), ('paraphrases', 0.041), ('ganitkevitch', 0.04), ('subsection', 0.039), ('parsing', 0.038), ('similarity', 0.038), ('den', 0.038), ('die', 0.038), ('violated', 0.038), ('edit', 0.037), ('constituent', 0.036), ('level', 0.036), ('machine', 0.035), ('monz', 0.035), ('lexical', 0.034), ('distance', 0.034), ('red', 0.033), ('influences', 0.033), ('weighted', 0.033), ('order', 0.033), ('measuring', 0.033), ('rankings', 0.032), ('assess', 0.032), ('rank', 0.031), ('intend', 0.031), ('ones', 0.031), ('daum', 0.031), ('cat', 0.031), ('ties', 0.03), ('score', 0.03), ('statistical', 0.03), ('grammars', 0.03), ('correlates', 0.03), ('ignored', 0.029), ('computed', 0.029), ('edge', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999994 13 acl-2013-A New Syntactic Metric for Evaluation of Machine Translation

Author: Melania Duma ; Cristina Vertan ; Wolfgang Menzel

Abstract: Machine translation (MT) evaluation aims at measuring the quality of a candidate translation by comparing it with a reference translation. This comparison can be performed on multiple levels: lexical, syntactic or semantic. In this paper, we propose a new syntactic metric for MT evaluation based on the comparison of the dependency structures of the reference and the candidate translations. The dependency structures are obtained by means of a Weighted Constraints Dependency Grammar parser. Based on experiments performed on English to German translations, we show that the new metric correlates well with human judgments at the system level. 1

2 0.14332867 195 acl-2013-Improving machine translation by training against an automatic semantic frame based evaluation metric

Author: Chi-kiu Lo ; Karteek Addanki ; Markus Saers ; Dekai Wu

Abstract: We present the first ever results showing that tuning a machine translation system against a semantic frame based objective function, MEANT, produces more robustly adequate translations than tuning against BLEU or TER as measured across commonly used metrics and human subjective evaluation. Moreover, for informal web forum data, human evaluators preferred MEANT-tuned systems over BLEU- or TER-tuned systems by a significantly wider margin than that for formal newswire—even though automatic semantic parsing might be expected to fare worse on informal language. We argue that by preserving the meaning of the translations as captured by semantic frames right in the training process, an MT system is constrained to make more accurate choices of both lexical and reordering rules. As a result, MT systems tuned against semantic frame based MT evaluation metrics produce output that is more adequate. Tuning a machine translation system against a semantic frame based objective function is independent of the translation model paradigm, so any translation model can benefit from the semantic knowledge incorporated to improve translation adequacy through our approach.

3 0.11635523 255 acl-2013-Name-aware Machine Translation

Author: Haibo Li ; Jing Zheng ; Heng Ji ; Qi Li ; Wen Wang

Abstract: We propose a Name-aware Machine Translation (MT) approach which can tightly integrate name processing into MT model, by jointly annotating parallel corpora, extracting name-aware translation grammar and rules, adding name phrase table and name translation driven decoding. Additionally, we also propose a new MT metric to appropriately evaluate the translation quality of informative words, by assigning different weights to different words according to their importance values in a document. Experiments on Chinese-English translation demonstrated the effectiveness of our approach on enhancing the quality of overall translation, name translation and word alignment over a high-quality MT baseline1 .

4 0.11331266 135 acl-2013-English-to-Russian MT evaluation campaign

Author: Pavel Braslavski ; Alexander Beloborodov ; Maxim Khalilov ; Serge Sharoff

Abstract: This paper presents the settings and the results of the ROMIP 2013 MT shared task for the English→Russian language direction. The quality of generated translations was assessed using automatic metrics and human evaluation. We also discuss ways to reduce human evaluation efforts using pairwise sentence comparisons by human judges to simulate sort operations.

5 0.1070985 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing

Author: Guangyou Zhou ; Jun Zhao

Abstract: This paper is concerned with the problem of heterogeneous dependency parsing. In this paper, we present a novel joint inference scheme, which is able to leverage the consensus information between heterogeneous treebanks in the parsing phase. Different from stacked learning methods (Nivre and McDonald, 2008; Martins et al., 2008), which process the dependency parsing in a pipelined way (e.g., a second level uses the first level outputs), in our method, multiple dependency parsing models are coordinated to exchange consensus information. We conduct experiments on Chinese Dependency Treebank (CDT) and Penn Chinese Treebank (CTB); experimental results show that joint inference can bring significant improvements to all state-of-the-art dependency parsers.

6 0.10640141 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data

7 0.10431558 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

8 0.10328433 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing

9 0.10022461 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers

10 0.099588893 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing

11 0.097106718 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing

12 0.093697071 10 acl-2013-A Markov Model of Machine Translation using Non-parametric Bayesian Inference

13 0.09318357 312 acl-2013-Semantic Parsing as Machine Translation

14 0.090333052 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching

15 0.089894317 289 acl-2013-QuEst - A translation quality estimation framework

16 0.087692663 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy

17 0.087537631 314 acl-2013-Semantic Roles for String to Tree Machine Translation

18 0.08421541 120 acl-2013-Dirt Cheap Web-Scale Parallel Text from the Common Crawl

19 0.083263986 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation

20 0.08139959 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.222), (1, -0.133), (2, 0.013), (3, 0.022), (4, -0.092), (5, 0.001), (6, 0.043), (7, -0.011), (8, 0.068), (9, -0.043), (10, -0.04), (11, 0.092), (12, -0.062), (13, 0.056), (14, 0.012), (15, 0.057), (16, -0.029), (17, -0.026), (18, -0.025), (19, 0.035), (20, 0.055), (21, -0.009), (22, -0.052), (23, -0.01), (24, -0.099), (25, 0.018), (26, -0.019), (27, 0.077), (28, -0.003), (29, 0.059), (30, -0.014), (31, -0.024), (32, 0.026), (33, -0.005), (34, 0.036), (35, -0.035), (36, -0.101), (37, 0.039), (38, 0.023), (39, -0.026), (40, -0.03), (41, 0.073), (42, -0.034), (43, -0.114), (44, -0.042), (45, -0.048), (46, -0.029), (47, -0.051), (48, -0.058), (49, -0.084)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95284146 13 acl-2013-A New Syntactic Metric for Evaluation of Machine Translation

Author: Melania Duma ; Cristina Vertan ; Wolfgang Menzel

Abstract: Machine translation (MT) evaluation aims at measuring the quality of a candidate translation by comparing it with a reference translation. This comparison can be performed on multiple levels: lexical, syntactic or semantic. In this paper, we propose a new syntactic metric for MT evaluation based on the comparison of the dependency structures of the reference and the candidate translations. The dependency structures are obtained by means of a Weighted Constraints Dependency Grammar parser. Based on experiments performed on English to German translations, we show that the new metric correlates well with human judgments at the system level. 1

2 0.71800673 135 acl-2013-English-to-Russian MT evaluation campaign

Author: Pavel Braslavski ; Alexander Beloborodov ; Maxim Khalilov ; Serge Sharoff

Abstract: This paper presents the settings and the results of the ROMIP 2013 MT shared task for the English→Russian language direction. The quality of generated translations was assessed using automatic metrics and human evaluation. We also discuss ways to reduce human evaluation efforts using pairwise sentence comparisons by human judges to simulate sort operations.

3 0.70573616 195 acl-2013-Improving machine translation by training against an automatic semantic frame based evaluation metric

Author: Chi-kiu Lo ; Karteek Addanki ; Markus Saers ; Dekai Wu

Abstract: We present the first ever results showing that tuning a machine translation system against a semantic frame based objective function, MEANT, produces more robustly adequate translations than tuning against BLEU or TER as measured across commonly used metrics and human subjective evaluation. Moreover, for informal web forum data, human evaluators preferred MEANT-tuned systems over BLEU- or TER-tuned systems by a significantly wider margin than that for formal newswire—even though automatic semantic parsing might be expected to fare worse on informal language. We argue that by preserving the meaning of the translations as captured by semantic frames right in the training process, an MT system is constrained to make more accurate choices of both lexical and reordering rules. As a result, MT systems tuned against semantic frame based MT evaluation metrics produce output that is more adequate. Tuning a machine translation system against a semantic frame based objective function is independent of the translation model paradigm, so any translation model can benefit from the semantic knowledge incorporated to improve translation adequacy through our approach.

4 0.68840778 64 acl-2013-Automatically Predicting Sentence Translation Difficulty

Author: Abhijit Mishra ; Pushpak Bhattacharyya ; Michael Carl

Abstract: In this paper we introduce Translation Difficulty Index (TDI), a measure of difficulty in text translation. We first define and quantify translation difficulty in terms of TDI. We realize that any measure of TDI based on direct input by translators is fraught with subjectivity and adhocism. We, rather, rely on cognitive evidences from eye tracking. TDI is measured as the sum of fixation (gaze) and saccade (rapid eye movement) times of the eye. We then establish that TDI is correlated with three properties of the input sentence, viz. length (L), degree of polysemy (DP) and structural complexity (SC). We train a Support Vector Regression (SVR) system to predict TDIs for new sentences using these features as input. The prediction done by our framework is well correlated with the empirical gold standard data, which is a repository of < L, DP, SC > and TDI pairs for a set of sentences. The primary use of our work is a way of “binning” sentences (to be translated) in “easy”, “medium” and “hard” categories as per their predicted TDI. This can decide pricing of any translation task, especially useful in a scenario where parallel corpora for Machine Translation are built through translation crowdsourcing/outsourcing. This can also provide a way of monitoring progress of second language learners.

5 0.66232115 110 acl-2013-Deepfix: Statistical Post-editing of Statistical Machine Translation Using Deep Syntactic Analysis

Author: Rudolf Rosa ; David Marecek ; Ales Tamchyna

Abstract: Deepfix is a statistical post-editing system for improving the quality of statistical machine translation outputs. It attempts to correct errors in verb-noun valency using deep syntactic analysis and a simple probabilistic model of valency. On the English-to-Czech translation pair, we show that statistical post-editing of statistical machine translation leads to an improvement of the translation quality when helped by deep linguistic knowledge.

6 0.63928258 312 acl-2013-Semantic Parsing as Machine Translation

7 0.61093718 360 acl-2013-Translating Italian connectives into Italian Sign Language

8 0.61062288 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

9 0.6094647 255 acl-2013-Name-aware Machine Translation

10 0.60127282 305 acl-2013-SORT: An Interactive Source-Rewriting Tool for Improved Translation

11 0.59515959 324 acl-2013-Smatch: an Evaluation Metric for Semantic Feature Structures

12 0.59389067 289 acl-2013-QuEst - A translation quality estimation framework

13 0.58462566 92 acl-2013-Context-Dependent Multilingual Lexical Lookup for Under-Resourced Languages

14 0.58456385 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers

15 0.58358735 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing

16 0.58211708 163 acl-2013-From Natural Language Specifications to Program Input Parsers

17 0.57537878 16 acl-2013-A Novel Translation Framework Based on Rhetorical Structure Theory

18 0.56543064 263 acl-2013-On the Predictability of Human Assessment: when Matrix Completion Meets NLP Evaluation

19 0.56433058 94 acl-2013-Coordination Structures in Dependency Treebanks

20 0.55496567 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing


similar papers computed by the LDA model

LDA topic distribution for this paper:

topicId topicWeight

[(0, 0.061), (6, 0.046), (11, 0.046), (14, 0.33), (15, 0.023), (24, 0.03), (26, 0.03), (35, 0.066), (42, 0.056), (48, 0.034), (70, 0.029), (88, 0.015), (90, 0.04), (95, 0.128)]
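The sparse (topicId, topicWeight) pairs above are what the page's LDA-based similarity is computed over. The page does not document which similarity measure it uses; a minimal sketch under the assumption of cosine similarity between sparse topic vectors:

```python
# Cosine similarity between sparse LDA topic vectors stored as
# {topicId: weight} dicts; topics absent from a dict have weight 0.
import math

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Topic distribution of this paper, copied from the listing above:
paper = {0: 0.061, 6: 0.046, 11: 0.046, 14: 0.33, 15: 0.023, 24: 0.03,
         26: 0.03, 35: 0.066, 42: 0.056, 48: 0.034, 70: 0.029, 88: 0.015,
         90: 0.04, 95: 0.128}
# Hypothetical topic distribution of another paper:
other = {14: 0.25, 35: 0.10, 42: 0.05, 95: 0.20}

print(round(cosine(paper, other), 3))
```

The "simValue" column in the list below would then be this score computed against every other paper, sorted in descending order.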

similar papers list:

simIndex simValue paperId paperTitle

1 0.79543018 357 acl-2013-Transfer Learning for Constituency-Based Grammars

Author: Yuan Zhang ; Regina Barzilay ; Amir Globerson

Abstract: In this paper, we consider the problem of cross-formalism transfer in parsing. We are interested in parsing constituency-based grammars such as HPSG and CCG using a small amount of data specific to the target formalism and a large quantity of coarse CFG annotations from the Penn Treebank. While all of the target formalisms share a similar basic syntactic structure with Penn Treebank CFG, they also encode additional constraints and semantic features. To handle this apparent discrepancy, we design a probabilistic model that jointly generates CFG and target-formalism parses. The model includes features of both parses, allowing transfer between the formalisms while preserving parsing efficiency. We evaluate our approach on three constituency-based grammars (CCG, HPSG, and LFG), augmented with the Penn Treebank-1. Our experiments show that across all three formalisms, the target parsers significantly benefit from the coarse annotations.

same-paper 2 0.74294341 13 acl-2013-A New Syntactic Metric for Evaluation of Machine Translation

Author: Melania Duma ; Cristina Vertan ; Wolfgang Menzel

Abstract: Machine translation (MT) evaluation aims at measuring the quality of a candidate translation by comparing it with a reference translation. This comparison can be performed on multiple levels: lexical, syntactic or semantic. In this paper, we propose a new syntactic metric for MT evaluation based on the comparison of the dependency structures of the reference and the candidate translations. The dependency structures are obtained by means of a Weighted Constraints Dependency Grammar parser. Based on experiments performed on English to German translations, we show that the new metric correlates well with human judgments at the system level.
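A hedged sketch of the general idea behind such a metric: score the candidate by the overlap of its labeled dependency triples with those of the reference. This is a generic illustration, not the authors' WCDG-based formulation, and the example parses are invented:

```python
# F1 over labeled dependency triples (head, label, dependent) as a
# generic dependency-overlap MT metric. Parses below are hypothetical.
def dependency_f1(candidate, reference):
    cand, ref = set(candidate), set(reference)
    if not cand or not ref:
        return 0.0
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Invented dependency parses of a German reference and a candidate:
ref = [("sah", "SUBJ", "er"), ("sah", "OBJA", "Hund"), ("Hund", "DET", "den")]
cand = [("sah", "SUBJ", "er"), ("sah", "OBJA", "Katze"), ("Katze", "DET", "die")]

print(round(dependency_f1(cand, ref), 3))
```

Sentence-level scores like this can then be averaged over a test set to produce the system-level score that is correlated with human judgments.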

3 0.70617443 314 acl-2013-Semantic Roles for String to Tree Machine Translation

Author: Marzieh Bazrafshan ; Daniel Gildea

Abstract: We experiment with adding semantic role information to a string-to-tree machine translation system based on the rule extraction procedure of Galley et al. (2004). We compare methods based on augmenting the set of nonterminals by adding semantic role labels, and altering the rule extraction process to produce a separate set of rules for each predicate that encompass its entire predicate-argument structure. Our results demonstrate that the second approach is effective in increasing the quality of translations.

4 0.69634008 31 acl-2013-A corpus-based evaluation method for Distributional Semantic Models

Author: Abdellah Fourtassi ; Emmanuel Dupoux

Abstract: Evaluation methods for Distributional Semantic Models typically rely on behaviorally derived gold standards. These methods are difficult to deploy in languages with scarce linguistic/behavioral resources. We introduce a corpus-based measure that evaluates the stability of the lexical semantic similarity space using a pseudo-synonym same-different detection task and no external resources. We show that it enables the prediction of two behavior-based measures across a range of parameters in a Latent Semantic Analysis model.

5 0.65720493 137 acl-2013-Enlisting the Ghost: Modeling Empty Categories for Machine Translation

Author: Bing Xiang ; Xiaoqiang Luo ; Bowen Zhou

Abstract: Empty categories (EC) are artificial elements in Penn Treebanks motivated by the government-binding (GB) theory to explain certain language phenomena such as pro-drop. ECs are ubiquitous in languages like Chinese, but they are tacitly ignored in most machine translation (MT) work because of their elusive nature. In this paper we present a comprehensive treatment of ECs by first recovering them with a structured MaxEnt model with a rich set of syntactic and lexical features, and then incorporating the predicted ECs into a Chinese-to-English machine translation task through multiple approaches, including the extraction of EC-specific sparse features. We show that the recovered empty categories not only improve the word alignment quality, but also lead to significant improvements in a large-scale state-of-the-art syntactic MT system.

6 0.52281946 303 acl-2013-Robust multilingual statistical morphological generation models

7 0.51877129 267 acl-2013-PARMA: A Predicate Argument Aligner

8 0.50611275 205 acl-2013-Joint Apposition Extraction with Syntactic and Semantic Constraints

9 0.5038777 204 acl-2013-Iterative Transformation of Annotation Guidelines for Constituency Parsing

10 0.50301242 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification

11 0.50182223 344 acl-2013-The Effects of Lexical Resource Quality on Preference Violation Detection

12 0.49689659 4 acl-2013-A Context Free TAG Variant

13 0.49620771 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing

14 0.49122828 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art

15 0.49012697 240 acl-2013-Microblogs as Parallel Corpora

16 0.48817444 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers

17 0.48814881 322 acl-2013-Simple, readable sub-sentences

18 0.48756173 386 acl-2013-What causes a causal relation? Detecting Causal Triggers in Biomedical Scientific Discourse

19 0.4867121 25 acl-2013-A Tightly-coupled Unsupervised Clustering and Bilingual Alignment Model for Transliteration

20 0.4865889 80 acl-2013-Chinese Parsing Exploiting Characters