acl acl2012 acl2012-178 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sander Wubben ; Antal van den Bosch ; Emiel Krahmer
Abstract: In this paper we describe a method for simplifying sentences using Phrase Based Machine Translation, augmented with a re-ranking heuristic based on dissimilarity, and trained on a monolingual parallel corpus. We compare our system to a word-substitution baseline and two state-of-the-art systems, all trained and tested on paired sentences from the English part of Wikipedia and Simple Wikipedia. Human test subjects judge the output of the different systems. Analysing the judgements shows that by relatively careful phrase-based paraphrasing our model achieves similar simplification results to state-of-the-art systems, while generating better formed output. We also argue that text readability metrics such as the Flesch-Kincaid grade level should be used with caution when evaluating the output of simplification systems.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper we describe a method for simplifying sentences using Phrase Based Machine Translation, augmented with a re-ranking heuristic based on dissimilarity, and trained on a monolingual parallel corpus. [sent-5, score-0.199]
2 We compare our system to a word-substitution baseline and two state-of-the-art systems, all trained and tested on paired sentences from the English part of Wikipedia and Simple Wikipedia. [sent-6, score-0.188]
3 Analysing the judgements shows that by relatively careful phrase-based paraphrasing our model achieves similar simplification results to state-of-the-art systems, while generating better formed output. [sent-8, score-0.108]
4 We also argue that text readability metrics such as the Flesch-Kincaid grade level should be used with caution when evaluating the output of simplification systems. [sent-10, score-0.731]
5 Sentence simplification can also serve to preprocess the input [sent-18, score-0.426]
6 of other tasks, such as summarization (Knight and Marcu, 2000), parsing, machine translation (Chandrasekar et al. [sent-28, score-0.105]
7 , 1996), semantic role labeling (Vickrey and Koller, 2008) or sentence fusion (Filippova and Strube, 2008). [sent-29, score-0.086]
8 The goal of simplification is to achieve an improvement in readability, defined as the ease with which a text can be understood. [sent-30, score-0.465]
9 Some of the factors that are known to help increase the readability of text are the vocabulary used, the length of the sentences, the syntactic structures present in the text, and the usage of discourse markers. [sent-31, score-0.132]
10 One effort to create a simple version of English at the vocabulary level has been the creation of Basic English by Charles Kay Ogden. [sent-32, score-0.101]
11 Generally the structure of the sentences in English Simple Wikipedia is less complicated and the sentences are somewhat shorter than those found in English Wikipedia; we offer more detailed statistics below. [sent-37, score-0.122]
12 1 Related work Most earlier work on sentence simplification adopted rule-based approaches. [sent-39, score-0.512]
13 A frequently applied type of rule, aimed to reduce overall sentence length, splits long sentences on the basis of syntactic [sent-40, score-0.147]
14 There has also been work on lexical substitution for simplification, where the aim is to substitute difficult words with simpler synonyms, derived from WordNet or dictionaries (Inui et al. [sent-45, score-0.092]
15 (2010) examine the use of paired documents in English Wikipedia and Simple Wikipedia for a data-driven approach to the sentence simplifi- cation task. [sent-48, score-0.127]
16 They propose a probabilistic, syntaxbased machine translation approach to the problem and compare against a baseline of no simplification and a phrase-based machine translation approach. [sent-49, score-0.681]
17 In a similar vein, Coster and Kauchak (2011) use a parallel corpus of paired documents from Simple Wikipedia and Wikipedia to train a phrase-based machine translation model coupled with a deletion model. [sent-50, score-0.186]
18 Another useful resource is the edit history of Simple Wikipedia, from which simplifications can be learned (Yatskar et al. [sent-51, score-0.131]
19 They select the most appropriate simplification by using integer linear programming. [sent-54, score-0.426]
20 (2010) and Coster and Kauchak (2011) in proposing that sentence simplification can be approached as a monolingual machine translation task, where the source and target languages are the same and where the output should be simpler in form than the input but similar in meaning. [sent-56, score-0.779]
21 1 shows the average sentence length and the average [sent-61, score-0.086]
22 Statistical machine translation (SMT) has already been successfully applied to the related task of paraphrasing (Quirk et al. [sent-74, score-0.161]
23 These corpora need to be aligned at the sentence level. [sent-80, score-0.086]
24 Phrase-Based Machine Translation (PBMT) is a form of SMT where the translation model aims to translate longer sequences of words (“phrases”) in one go, solving part of the word ordering problem along the way that would be left to the target language model in a word-based SMT system. [sent-82, score-0.105]
25 The PBMT model makes use of a translation model, derived from the parallel corpus, and a language model, derived from a monolingual corpus in the target language. [sent-85, score-0.208]
26 For any given input sentence, a search is carried out producing an n-best list of candidate translations, ranked by the decoder score, a complex scoring function including likelihood scores from the translation model, and the target language model. [sent-87, score-0.208]
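The decoder score mentioned here is a log-linear combination of feature scores. A minimal sketch of such a scoring function follows; the feature names and weights are illustrative assumptions, not the actual Moses configuration used in the paper.

```python
# Minimal sketch of a PBMT-style log-linear decoder score: each candidate
# is scored as a weighted sum of feature scores. The feature names and
# weights below are illustrative assumptions, not Moses's actual setup.

def decoder_score(features, weights):
    """Combine per-candidate feature scores into a single decoder score."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature values for one candidate simplification.
candidate_features = {
    "tm_log_prob": -4.2,    # translation model (phrase table) log-probability
    "lm_log_prob": -12.7,   # target language model log-probability
    "word_penalty": 9.0,    # number of output words (length control)
}
weights = {"tm_log_prob": 1.0, "lm_log_prob": 0.6, "word_penalty": -0.1}

print(decoder_score(candidate_features, weights))
```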
27 In principle, all of this should be transportable to a data-driven machine translation account of sentence simplification, provided that a parallel corpus is available that pairs text to simplified versions of that text. [sent-88, score-0.299]
28 2 This study In this work we aim to investigate the use of phrase-based machine translation modified with a dissimilarity component for the task of sentence simplification. [sent-90, score-0.269]
29 (2010) have demonstrated that their approach outperforms a PBMT approach in terms of Flesch Reading Ease test scores, we are not aware of any studies that evaluate PBMT for sentence simplification with human judgements. [sent-92, score-0.426]
30 1 Word-Substitution Baseline The word substitution baseline replaces words in the source sentence with (near-)synonyms that are more likely according to a language model. [sent-99, score-0.188]
31 For each noun, adjective and verb in the sentence this model takes that word and its part-of-speech tag and retrieves from WordNet all synonyms from all synsets the word occurs in. [sent-100, score-0.086]
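A minimal sketch of this baseline is given below, assuming NLTK's WordNet interface, coarse part-of-speech labels such as NOUN, VERB and ADJ, and a placeholder language-model scoring function; lm_logprob is hypothetical and would have to be backed by a real n-gram model.

```python
# Sketch of the word-substitution baseline: for each noun, verb and adjective,
# collect WordNet synonyms from all synsets of the word and keep the variant
# the language model prefers. lm_logprob is a hypothetical placeholder.
from nltk.corpus import wordnet as wn

POS_MAP = {"NOUN": wn.NOUN, "VERB": wn.VERB, "ADJ": wn.ADJ}

def lm_logprob(tokens):
    # Placeholder: replace with a real n-gram language model score.
    return 0.0

def substitute_words(tokens, pos_tags):
    out = list(tokens)
    for i, (word, pos) in enumerate(zip(tokens, pos_tags)):
        if pos not in POS_MAP:
            continue
        candidates = {lemma.replace("_", " ")
                      for synset in wn.synsets(word, POS_MAP[pos])
                      for lemma in synset.lemma_names()}
        candidates.add(word)  # keeping the original word is always an option
        # choose the candidate the language model scores highest in context
        out[i] = max(candidates,
                     key=lambda c: lm_logprob(out[:i] + [c] + out[i + 1:]))
    return out
```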
32 (2010) learn a sentence simplification model which is able to perform four rewrite operations on the parse trees of the input sentences, namely substitution, reordering, splitting, and deletion. [sent-113, score-0.553]
33 Their model is inspired by syntax-based SMT (Yamada and Knight, 2001) and consists of a language model, a translation model and a decoder. [sent-114, score-0.105]
34 The four mentioned simplification operations together form the translation model. [sent-115, score-0.572]
35 TF*IDF at the sentence level was used to align the sentences in the different articles (Nelken and Shieber, 2006). [sent-119, score-0.147]
36 (2010) evaluate their system using BLEU and NIST scores, as well as various readability scores that only take into account the output sentence, such as the Flesch Reading Ease test and n-gram language model perplexity. [sent-121, score-0.306]
37 Although their system outperforms several baselines at the level of these readability metrics, it does not achieve better results when evaluated with BLEU or NIST. [sent-122, score-0.173]
38 Their model is trained on two different datasets: one containing alignments between Wikipedia and English Simple Wikipedia (AlignILP), and one containing alignments between edits in the revision history of Simple Wikipedia (RevILP). [sent-128, score-0.144]
39 (2010)’s system and is not scored significantly differently from English Simple Wikipedia. [sent-133, score-0.108]
40 Figure 1: Levenshtein distance and Flesch-Kincaid score of output when varying the n of the n-best output of Moses. [sent-135, score-0.195]
41 Then we invoke the GIZA++ aligner using the training simplification pairs. [sent-144, score-0.426]
42 Finally, we use the Moses decoder to generate simplifications for the sentences in the test set. [sent-147, score-0.163]
43 For each sentence we let the system generate the ten best distinct solutions (or fewer, if fewer than ten solutions are generated) as ranked by Moses. [sent-148, score-0.225]
44 Arguably, dissimilarity is a key factor in simplification (and in paraphrasing in general). [sent-149, score-0.56]
45 As output we would like to be able to select fluent sentences that adequately convey the meaning of the original input, yet that contain differences that operationalize the intended simplification. [sent-150, score-0.165]
46 To expand the functionality of Moses in the intended direction we perform post-hoc re-ranking on the output based on dissimilarity to the input. [sent-152, score-0.142]
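A sketch of this re-ranking step is given below. It assumes the ten best candidates are already available as token lists in Moses rank order; the tie-breaking on the original Moses rank is an assumption, not a detail stated in the text.

```python
# Sketch of post-hoc dissimilarity re-ranking: among the n-best candidates
# produced by Moses for one source sentence, keep the one with the largest
# Levenshtein distance to the input. Tie-breaking by the original Moses rank
# is an assumption.

def levenshtein(a, b):
    """Edit distance between two sequences (tokens or characters)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]

def rerank_by_dissimilarity(source_tokens, nbest_candidates):
    """nbest_candidates: token lists ordered by Moses decoder score."""
    _, best = max(enumerate(nbest_candidates),
                  key=lambda item: (levenshtein(source_tokens, item[1]), -item[0]))
    return best
```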
47 Figure 1 displays Levenshtein Distance and Flesch-Kincaid grade level scores for different values of n. [sent-157, score-0.179]
48 The readability score stays more or less the same, indicating no relation between n and readability. [sent-159, score-0.164]
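The Flesch-Kincaid grade level plotted in Figure 1 depends only on average sentence length and average syllables per word. A small sketch follows; the constants are the standard Flesch-Kincaid ones, but the syllable counter is a crude vowel-group heuristic rather than a dictionary-based one.

```python
# Flesch-Kincaid grade level from sentence length and syllable counts.
# The syllable counter is a rough vowel-group approximation.
import re

def count_syllables(word):
    # rough approximation: one syllable per group of consecutive vowels
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(sentences):
    """sentences: list of token lists making up one text."""
    words = [w for sent in sentences for w in sent if w.isalpha()]
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

print(flesch_kincaid_grade([["The", "cat", "sat", "on", "the", "mat"]]))
```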
49 The average edit distance starts out at just above 2 when selecting the 1-best output string, and increases roughly until n = 10. [sent-160, score-0.162]
50 5 Descriptive statistics Table 2 displays the average edit distance and the percentage of cases in which no edits were performed for each of the systems and for Simple Wikipedia. [sent-162, score-0.217]
51 About half of the original tokens in the source sentence do not return in the output. [sent-169, score-0.126]
52 Of the three simplification systems, the Zhu system (7. [sent-170, score-0.467]
53 18) attain similar edit distances, less substantial than the edits in Simple Wikipedia, but still considerable [sent-172, score-0.141]
54 compared to the baseline word-substitution system (4. [sent-176, score-0.086]
55 On the other hand, we observe some differences in the percentage of cases in which the systems decide to produce a sentence identical to the input. [sent-183, score-0.086]
56 This test set consists of 100 sentences from articles on English Wikipedia, paired with sentences from corresponding articles in English Simple Wikipedia. [sent-193, score-0.163]
57 We selected only those sentences where every system would perform at least one edit, because we only want to compare the different systems when they actually generate altered, presumably simplified output. [sent-194, score-0.17]
58 From this subset we randomly pick 20 source sentences, resulting in 20 clusters of one source sentence and 5 simplified sentences, as generated by humans (Simple Wikipedia) and the four systems. [sent-195, score-0.154]
59 3 Procedure The participants were told that they participated in the evaluation of a system that could simplify sentences, and that they would see one source sentence and five automatically simplified versions of that sentence. [sent-197, score-0.316]
60 Following earlier evaluation studies (Doddington, 2002; Woodsend and Lapata, 2011), we asked participants to evaluate Simplicity, Fluency and Adequacy of the target sentences on a five point Likert scale. [sent-199, score-0.078]
61 Fluency was defined in the instructions as the extent to which a sentence is proper, grammatical English. [sent-200, score-0.086]
62 Adequacy was defined as the extent to which the sentence has the same meaning as the source sentence. [sent-201, score-0.086]
63 Simplicity was defined as the extent to which the sentence was simpler than the original and thus easier to understand. [sent-202, score-0.161]
64 The order in which the clusters had to be judged was randomized and the order of the output of the various systems was randomized as well. [sent-203, score-0.126]
65 In terms of the Flesch-Kincaid grade level score, where lower scores are better, the Zhu system scores best, with 7. [sent-206, score-0.248]
66 With regard to the BLEU score, where Simple Wikipedia is the reference, the PBMT-R system scores highest with 0. [sent-213, score-0.11]
67 The word substitution baseline scores lowest with a BLEU score of 0. [sent-217, score-0.203]
68 Table 3: Flesch-Kincaid grade level and BLEU scores [sent-221, score-0.138]
69 2 Human judgements To test for significance we ran repeated measures analyses of variance with system (Simple Wikipedia, PBMT-R, Zhu, RevILP, word-substitution baseline) as the independent variable, and the three individual metrics as well as their combined mean as the dependent variables. [sent-222, score-0.172]
70 Mauchly's test for sphericity was used to test for homogeneity of variance, and when this test was significant we applied a Greenhouse-Geisser correction on the degrees of freedom (for the purpose of readability we report the normal degrees of freedom in these cases). [sent-223, score-0.167]
71 Planned pairwise comparisons were made with the Bonferroni method. [sent-224, score-0.086]
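A sketch of the Bonferroni-corrected pairwise step is shown below; the repeated measures ANOVA itself, Mauchly's test and the Greenhouse-Geisser correction are usually run in a dedicated statistics package, and the data layout assumed here (one mean rating per participant per system) is an assumption.

```python
# Sketch of planned pairwise comparisons with a Bonferroni correction.
# ratings maps each system name to a list with one mean rating per
# participant (hypothetical layout); paired t-tests stand in for the
# pairwise contrasts of the repeated measures design.
from itertools import combinations
from scipy.stats import ttest_rel

def pairwise_bonferroni(ratings):
    pairs = list(combinations(sorted(ratings), 2))
    results = {}
    for a, b in pairs:
        t, p = ttest_rel(ratings[a], ratings[b])
        # Bonferroni: scale each p-value by the number of comparisons
        results[(a, b)] = (t, min(1.0, p * len(pairs)))
    return results
```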
72 We find that participants rated the Fluency of the simplified sentences from the four systems and Simple Wikipedia differently, F(4, 180) = 178. [sent-227, score-0.253]
73 All other pairwise comparisons are significant at p < . [sent-235, score-0.121]
74 Participants also rated the systems significantly differently on the Adequacy scale, F(4, 180) = 116. [sent-240, score-0.113]
75 Simple Wikipedia and the Zhu system do not differ significantly, and all other pairwise comparisons are significant at p < . [sent-250, score-0.162]
76 Key to the task of simplification are the human judgements of Simplicity. [sent-253, score-0.478]
77 Participants rated the Simplicity of the output from the four systems and Simple Wikipedia differently, F(4, 180) = 74. [sent-254, score-0.11]
78 scores as assigned by humans and the automatic metrics; correlations marked ** are significant. [sent-279, score-0.104]
79 systems, which do not score significantly differently from each other. [sent-281, score-0.099]
80 All other pairwise comparisons are significant at p < . [sent-282, score-0.121]
81 We find that participants rated the systems significantly differently overall, F(4, 180) = 98. [sent-291, score-0.191]
82 All pairwise comparisons were statistically significant (p < . [sent-295, score-0.121]
83 3 Correlations Table 5 displays the correlations between the scores assigned by humans (Fluency, Adequacy and Simplicity) and the automatic metrics (Flesch-Kincaid and BLEU). [sent-298, score-0.188]
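A sketch of how such a correlation table can be computed over the evaluated sentences follows; whether the paper uses Pearson's r or another coefficient is not stated in this extract, so Pearson is assumed.

```python
# Sketch of the correlation analysis behind Table 5: correlate each kind of
# human judgement with each automatic metric over the evaluated sentences.
# Pearson's r is assumed here.
from scipy.stats import pearsonr

def correlation_table(human_scores, metric_scores):
    """Both arguments map a name to a list of per-sentence scores."""
    table = {}
    for judgement, human in human_scores.items():
        for metric, automatic in metric_scores.items():
            r, p = pearsonr(human, automatic)
            table[(judgement, metric)] = (r, p)
    return table
```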
84 There is a significant negative correlation between Flesch-Kincaid scores and Simplicity (-0. [sent-302, score-0.142]
85 5 Discussion We conclude that a phrase-based machine translation system with added dissimilarity-based re-ranking of the best ten output sentences can successfully be used to perform sentence simplification. [sent-309, score-0.301]
86 From the relatively low average number of edits made by our system we can conclude that it makes relatively few changes to the input, which still constitute sensible simplifications. [sent-312, score-0.16]
87 The output of all systems, the original, and the simplified version of an example sentence from the PWKP dataset are displayed in Table 6. [sent-315, score-0.258]
88 The Simple Wikipedia sentences illustrate that significant portions of the original sentences may be dropped, and parts of the semantics of the original sentence discarded. [sent-316, score-0.323]
89 We also see the Zhu and RevILP systems resorting to splitting the original sentence in two, leading to better Flesch-Kincaid scores. [sent-317, score-0.126]
90 The word-substitution baseline changes ‘receive’ to ‘have’, while the PBMT-R system changes the same ‘receive’ to ‘get’, ‘slightly’ to ‘a little bit’, and ‘maximum’ to ‘highest’. [sent-318, score-0.125]
91 In terms of automatic measures we see that the Zhu system scores particularly well on the FleschKincaid metric, while the RevILP system and our PBMT-R system achieve the highest BLEU scores. [sent-319, score-0.192]
92 We believe that for the evaluation of sentence simplification, BLEU is a more appropriate metric than Flesch-Kincaid or a similar readability metric, although it should be noted that BLEU was found only to correlate significantly with Fluency, not with Adequacy. [sent-320, score-0.218]
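A sketch of BLEU computed against Simple Wikipedia as the single reference, as in Table 3, is given below; the paper's exact BLEU implementation and settings are not specified in this extract, so NLTK's corpus_bleu with default 4-gram weights and simple smoothing stands in.

```python
# Sketch of corpus-level BLEU with Simple Wikipedia as the single reference.
# NLTK's corpus_bleu (default 4-gram weights, simple smoothing) is used as a
# stand-in for whatever BLEU implementation the paper actually used.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def bleu_against_simple_wikipedia(system_outputs, simple_wiki_refs):
    """Both arguments are lists of token lists, aligned sentence by sentence."""
    references = [[ref] for ref in simple_wiki_refs]  # one reference per sentence
    return corpus_bleu(references, system_outputs,
                       smoothing_function=SmoothingFunction().method1)
```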
93 While BLEU and NIST may be used with this in mind, readability metrics should be avoided altogether in our view. [sent-321, score-0.172]
94 Arguably, readability metrics are best suited to be applied to texts that can be considered grammatical and meaningful, which is not necessarily true for the output of simplification algorithms. [sent-323, score-0.662]
95 In the future we would like to investigate how we can boost the number of edits the system performs, while still producing grammatical and meaningpreserving output. [sent-325, score-0.119]
96 Practical simplification of English newspaper text to assist aphasic readers. [sent-347, score-0.426]
97 The hiero machine translation system: extensions, evaluation, and analysis. [sent-366, score-0.105]
98 Automatic sentence simplification for subtitling in Dutch and English. [sent-382, score-0.512]
99 Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. [sent-386, score-0.105]
100 Learning to simplify sentences with quasi-synchronous grammar and integer programming. [sent-480, score-0.104]
wordName wordTfidf (topN-words)
[('simplification', 0.426), ('revilp', 0.352), ('wikipedia', 0.342), ('zhu', 0.25), ('fluency', 0.156), ('pbmt', 0.156), ('adequacy', 0.144), ('pwkp', 0.137), ('readability', 0.132), ('woodsend', 0.125), ('translation', 0.105), ('simple', 0.101), ('bleu', 0.1), ('tilburg', 0.098), ('sentence', 0.086), ('simplicity', 0.08), ('edits', 0.078), ('canning', 0.078), ('chandrasekar', 0.078), ('coster', 0.078), ('dissimilarity', 0.078), ('participants', 0.078), ('moses', 0.071), ('levenshtein', 0.07), ('scores', 0.069), ('grade', 0.069), ('simplifications', 0.068), ('simplified', 0.068), ('differently', 0.067), ('output', 0.064), ('edit', 0.063), ('monolingual', 0.063), ('daelemans', 0.062), ('sentences', 0.061), ('fleschkincaid', 0.059), ('vickrey', 0.059), ('wubben', 0.059), ('yvonne', 0.059), ('substitution', 0.057), ('paraphrasing', 0.056), ('judgements', 0.052), ('madnani', 0.051), ('netherlands', 0.051), ('lapata', 0.05), ('chris', 0.049), ('ten', 0.049), ('carroll', 0.048), ('kauchak', 0.047), ('stroudsburg', 0.047), ('rated', 0.046), ('comparisons', 0.045), ('baseline', 0.045), ('simplify', 0.043), ('english', 0.042), ('system', 0.041), ('displays', 0.041), ('paired', 0.041), ('pairwise', 0.041), ('operations', 0.041), ('original', 0.04), ('metrics', 0.04), ('parallel', 0.04), ('reading', 0.039), ('emiel', 0.039), ('fathom', 0.039), ('flesch', 0.039), ('krahmer', 0.039), ('nelken', 0.039), ('nijmegen', 0.039), ('sander', 0.039), ('siobhan', 0.039), ('uvt', 0.039), ('wordsubstitution', 0.039), ('yatskar', 0.039), ('zhemin', 0.039), ('ease', 0.039), ('correlation', 0.038), ('correlations', 0.038), ('inui', 0.038), ('wordnet', 0.036), ('significant', 0.035), ('distance', 0.035), ('pages', 0.035), ('henceforth', 0.035), ('simplifying', 0.035), ('simpler', 0.035), ('decoder', 0.034), ('pa', 0.034), ('lingua', 0.034), ('guido', 0.034), ('cpan', 0.034), ('antal', 0.034), ('devlin', 0.034), ('alignments', 0.033), ('score', 0.032), ('smt', 0.032), ('minnen', 0.031), ('randomized', 0.031), ('quasisynchronous', 0.031), ('filippova', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 178 acl-2012-Sentence Simplification by Monolingual Machine Translation
Author: Sander Wubben ; Antal van den Bosch ; Emiel Krahmer
Abstract: In this paper we describe a method for simplifying sentences using Phrase Based Machine Translation, augmented with a re-ranking heuristic based on dissimilarity, and trained on a monolingual parallel corpus. We compare our system to a word-substitution baseline and two state-of-the-art systems, all trained and tested on paired sentences from the English part of Wikipedia and Simple Wikipedia. Human test subjects judge the output of the different systems. Analysing the judgements shows that by relatively careful phrase-based paraphrasing our model achieves similar simplification results to state-of-the-art systems, while generating better formed output. We also argue that text readability metrics such as the Flesch-Kincaid grade level should be used with caution when evaluating the output of simplification systems.
2 0.17284854 125 acl-2012-Joint Learning of a Dual SMT System for Paraphrase Generation
Author: Hong Sun ; Ming Zhou
Abstract: SMT has been used in paraphrase generation by translating a source sentence into another (pivot) language and then back into the source. The resulting sentences can be used as candidate paraphrases of the source sentence. Existing work that uses two independently trained SMT systems cannot directly optimize the paraphrase results. Paraphrase criteria especially the paraphrase rate is not able to be ensured in that way. In this paper, we propose a joint learning method of two SMT systems to optimize the process of paraphrase generation. In addition, a revised BLEU score (called iBLEU) which measures the adequacy and diversity of the generated paraphrase sentence is proposed for tuning parameters in SMT systems. Our experiments on NIST 2008 testing data with automatic evaluation as well as human judgments suggest that the proposed method is able to enhance the paraphrase quality by adjusting between semantic equivalency and surface dissimilarity.
3 0.13627987 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
Author: Xiaodong He ; Li Deng
Abstract: This paper proposes a new discriminative training method in constructing phrase and lexicon translation models. In order to reliably learn a myriad of parameters in these models, we propose an expected BLEU score-based utility function with KL regularization as the objective, and train the models on a large parallel dataset. For training, we derive growth transformations for phrase and lexicon translation probabilities to iteratively improve the objective. The proposed method, evaluated on the Europarl German-to-English dataset, leads to a 1.1 BLEU point improvement over a state-of-the-art baseline translation system. In IWSLT 201 1 Benchmark, our system using the proposed method achieves the best Chinese-to-English translation result on the task of translating TED talks.
4 0.10913607 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
Author: Tong Xiao ; Jingbo Zhu ; Hao Zhang ; Qiang Li
Abstract: We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierarchical phrase-based model, and various syntax-based models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers different choices of decoding algorithms, such as phrase-based decoding, decoding as parsing/tree-parsing and forest-based decoding. Moreover, several useful utilities were distributed with the toolkit, including a discriminative reordering model, a simple and fast language model, and an implementation of minimum error rate training for weight tuning.
5 0.10893459 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules
Author: Wei He ; Hua Wu ; Haifeng Wang ; Ting Liu
Abstract: unkown-abstract
6 0.10515023 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
7 0.10471155 203 acl-2012-Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information
8 0.10379709 134 acl-2012-Learning to Find Translations and Transliterations on the Web
9 0.10001711 143 acl-2012-Mixing Multiple Translation Models in Statistical Machine Translation
10 0.099774867 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia
11 0.09809012 140 acl-2012-Machine Translation without Words through Substring Alignment
12 0.097355567 179 acl-2012-Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the l0-norm
13 0.096906431 46 acl-2012-Character-Level Machine Translation Evaluation for Languages with Ambiguous Word Boundaries
14 0.096080281 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
15 0.092117287 52 acl-2012-Combining Coherence Models and Machine Translation Evaluation Metrics for Summarization Evaluation
16 0.089040615 54 acl-2012-Combining Word-Level and Character-Level Models for Machine Translation Between Closely-Related Languages
17 0.086439595 128 acl-2012-Learning Better Rule Extraction with Translation Span Alignment
18 0.085553445 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
19 0.08418826 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
20 0.083383858 158 acl-2012-PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning
topicId topicWeight
[(0, -0.239), (1, -0.129), (2, 0.069), (3, 0.067), (4, 0.09), (5, 0.015), (6, -0.039), (7, 0.026), (8, -0.049), (9, 0.008), (10, -0.001), (11, 0.022), (12, 0.013), (13, 0.101), (14, 0.018), (15, 0.001), (16, 0.074), (17, -0.085), (18, -0.069), (19, 0.065), (20, 0.035), (21, -0.069), (22, 0.085), (23, -0.044), (24, 0.043), (25, 0.089), (26, 0.003), (27, 0.029), (28, -0.009), (29, 0.072), (30, 0.027), (31, 0.018), (32, -0.017), (33, 0.084), (34, -0.015), (35, -0.069), (36, -0.09), (37, -0.032), (38, 0.081), (39, -0.001), (40, -0.063), (41, 0.009), (42, -0.037), (43, 0.112), (44, 0.011), (45, 0.037), (46, 0.092), (47, -0.031), (48, -0.049), (49, 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.91960579 178 acl-2012-Sentence Simplification by Monolingual Machine Translation
Author: Sander Wubben ; Antal van den Bosch ; Emiel Krahmer
Abstract: In this paper we describe a method for simplifying sentences using Phrase Based Machine Translation, augmented with a re-ranking heuristic based on dissimilarity, and trained on a monolingual parallel corpus. We compare our system to a word-substitution baseline and two state-of-the-art systems, all trained and tested on paired sentences from the English part of Wikipedia and Simple Wikipedia. Human test subjects judge the output of the different systems. Analysing the judgements shows that by relatively careful phrase-based paraphrasing our model achieves similar simplification results to state-of-the-art systems, while generating better formed output. We also argue that text readability metrics such as the Flesch-Kincaid grade level should be used with caution when evaluating the output of simplification systems.
2 0.65829843 125 acl-2012-Joint Learning of a Dual SMT System for Paraphrase Generation
Author: Hong Sun ; Ming Zhou
Abstract: SMT has been used in paraphrase generation by translating a source sentence into another (pivot) language and then back into the source. The resulting sentences can be used as candidate paraphrases of the source sentence. Existing work that uses two independently trained SMT systems cannot directly optimize the paraphrase results. Paraphrase criteria especially the paraphrase rate is not able to be ensured in that way. In this paper, we propose a joint learning method of two SMT systems to optimize the process of paraphrase generation. In addition, a revised BLEU score (called iBLEU) which measures the adequacy and diversity of the generated paraphrase sentence is proposed for tuning parameters in SMT systems. Our experiments on NIST 2008 testing data with automatic evaluation as well as human judgments suggest that the proposed method is able to enhance the paraphrase quality by adjusting between semantic equivalency and surface dissimilarity.
3 0.62836224 67 acl-2012-Deciphering Foreign Language by Combining Language Models and Context Vectors
Author: Malte Nuhn ; Arne Mauser ; Hermann Ney
Abstract: In this paper we show how to train statistical machine translation systems on reallife tasks using only non-parallel monolingual data from two languages. We present a modification of the method shown in (Ravi and Knight, 2011) that is scalable to vocabulary sizes of several thousand words. On the task shown in (Ravi and Knight, 2011) we obtain better results with only 5% of the computational effort when running our method with an n-gram language model. The efficiency improvement of our method allows us to run experiments with vocabulary sizes of around 5,000 words, such as a non-parallel version of the VERBMOBIL corpus. We also report results using data from the monolingual French and English GIGAWORD corpora.
4 0.62483066 158 acl-2012-PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning
Author: Boxing Chen ; Roland Kuhn ; Samuel Larkin
Abstract: Many machine translation (MT) evaluation metrics have been shown to correlate better with human judgment than BLEU. In principle, tuning on these metrics should yield better systems than tuning on BLEU. However, due to issues such as speed, requirements for linguistic resources, and optimization difficulty, they have not been widely adopted for tuning. This paper presents PORT, a new MT evaluation metric which combines precision, recall and an ordering metric and which is primarily designed for tuning MT systems. PORT does not require external resources and is quick to compute. It has a better correlation with human judgment than BLEU. We compare PORT-tuned MT systems to BLEU-tuned baselines in five experimental conditions involving four language pairs. PORT tuning achieves consistently better performance than BLEU tuning, according to four automated metrics (including BLEU) and to human evaluation: in comparisons of outputs from 300 source sentences, human judges preferred the PORT-tuned output 45.3% of the time (vs. 32.7% BLEU tuning preferences and 22.0% ties).
5 0.61944991 46 acl-2012-Character-Level Machine Translation Evaluation for Languages with Ambiguous Word Boundaries
Author: Chang Liu ; Hwee Tou Ng
Abstract: In this work, we introduce the TESLA-CELAB metric (Translation Evaluation of Sentences with Linear-programming-based Analysis Character-level Evaluation for Languages with Ambiguous word Boundaries) for automatic machine translation evaluation. For languages such as Chinese where words usually have meaningful internal structure and word boundaries are often fuzzy, TESLA-CELAB acknowledges the advantage of character-level evaluation over word-level evaluation. By reformulating the problem in the linear programming framework, TESLA-CELAB addresses several drawbacks of the character-level metrics, in particular the modeling of synonyms spanning multiple characters. We show empirically that TESLA-CELAB significantly outperforms character-level BLEU in the English-Chinese translation evaluation tasks.
6 0.6122725 92 acl-2012-FLOW: A First-Language-Oriented Writing Assistant System
7 0.60674709 1 acl-2012-ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora
8 0.59750074 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules
9 0.57212824 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
11 0.54470396 195 acl-2012-The Creation of a Corpus of English Metalanguage
12 0.53752553 136 acl-2012-Learning to Translate with Multiple Objectives
13 0.52778435 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
14 0.52744895 13 acl-2012-A Graphical Interface for MT Evaluation and Error Analysis
15 0.52598441 163 acl-2012-Prediction of Learning Curves in Machine Translation
16 0.50131124 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia
17 0.49937016 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
18 0.49907893 39 acl-2012-Beefmoves: Dissemination, Diversity, and Dynamics of English Borrowings in a German Hip Hop Forum
19 0.49750495 97 acl-2012-Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation
20 0.49083203 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study
topicId topicWeight
[(25, 0.031), (26, 0.053), (28, 0.06), (30, 0.039), (37, 0.027), (39, 0.046), (48, 0.028), (57, 0.034), (69, 0.239), (74, 0.039), (81, 0.011), (82, 0.017), (84, 0.022), (85, 0.042), (90, 0.124), (92, 0.041), (94, 0.026), (99, 0.05)]
simIndex simValue paperId paperTitle
1 0.90295041 70 acl-2012-Demonstration of IlluMe: Creating Ambient According to Instant Message Logs
Author: Lun-Wei Ku ; Cheng-Wei Sun ; Ya-Hsin Hsueh
Abstract: We present IlluMe, a software tool pack which creates a personalized ambient using the music and lighting. IlluMe includes an emotion analysis software, the small space ambient lighting, and a multimedia controller. The software analyzes emotional changes from instant message logs and corresponds the detected emotion to the best sound and light settings. The ambient lighting can sparkle with different forms of light and the smart phone can broadcast music respectively according to different atmosphere. All settings can be modified by the multimedia controller at any time and the new settings will be feedback to the emotion analysis software. The IlluMe system, equipped with the learning function, provides a link between residential situation and personal emotion. It works in a Chinese chatting environment to illustrate the language technology in life.
Author: Yukino Baba ; Hisami Suzuki
Abstract: This paper presents a comparative study of spelling errors that are corrected as you type, vs. those that remain uncorrected. First, we generate naturally occurring online error correction data by logging users’ keystrokes, and by automatically deriving pre- and postcorrection strings from them. We then perform an analysis of this data against the errors that remain in the final text as well as across languages. Our analysis shows a clear distinction between the types of errors that are generated and those that remain uncorrected, as well as across languages.
same-paper 3 0.72451651 178 acl-2012-Sentence Simplification by Monolingual Machine Translation
Author: Sander Wubben ; Antal van den Bosch ; Emiel Krahmer
Abstract: In this paper we describe a method for simplifying sentences using Phrase Based Machine Translation, augmented with a re-ranking heuristic based on dissimilarity, and trained on a monolingual parallel corpus. We compare our system to a word-substitution baseline and two state-of-the-art systems, all trained and tested on paired sentences from the English part of Wikipedia and Simple Wikipedia. Human test subjects judge the output of the different systems. Analysing the judgements shows that by relatively careful phrase-based paraphrasing our model achieves similar simplification results to state-of-the-art systems, while generating better formed output. We also argue that text readability metrics such as the Flesch-Kincaid grade level should be used with caution when evaluating the output of simplification systems.
4 0.58122098 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation
Author: Arianna Bisazza ; Marcello Federico
Abstract: This paper presents a novel method to suggest long word reorderings to a phrase-based SMT decoder. We address language pairs where long reordering concentrates on few patterns, and use fuzzy chunk-based rules to predict likely reorderings for these phenomena. Then we use reordered n-gram LMs to rank the resulting permutations and select the n-best for translation. Finally we encode these reorderings by modifying selected entries of the distortion cost matrix, on a per-sentence basis. In this way, we expand the search space by a much finer degree than if we simply raised the distortion limit. The proposed techniques are tested on Arabic-English and German-English using well-known SMT benchmarks.
5 0.58096385 83 acl-2012-Error Mining on Dependency Trees
Author: Claire Gardent ; Shashi Narayan
Abstract: In recent years, error mining approaches were developed to help identify the most likely sources of parsing failures in parsing systems using handcrafted grammars and lexicons. However the techniques they use to enumerate and count n-grams builds on the sequential nature of a text corpus and do not easily extend to structured data. In this paper, we propose an algorithm for mining trees and apply it to detect the most likely sources of generation failure. We show that this tree mining algorithm permits identifying not only errors in the generation system (grammar, lexicon) but also mismatches between the structures contained in the input and the input structures expected by our generator as well as a few idiosyncrasies/error in the input data.
7 0.57367837 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents
8 0.5736683 140 acl-2012-Machine Translation without Words through Substring Alignment
9 0.57140064 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
10 0.57118791 125 acl-2012-Joint Learning of a Dual SMT System for Paraphrase Generation
11 0.57072407 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
12 0.57016045 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules
13 0.56947517 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
14 0.56900764 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
15 0.56885749 136 acl-2012-Learning to Translate with Multiple Objectives
16 0.56748712 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction
17 0.56657195 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
18 0.56649143 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities
19 0.56643909 62 acl-2012-Cross-Lingual Mixture Model for Sentiment Classification
20 0.56437939 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons