acl acl2013 acl2013-322 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sigrid Klerke ; Anders Søgaard
Abstract: We present experiments using a new unsupervised approach to automatic text simplification, which builds on sampling and ranking via a loss function informed by readability research. The main idea is that a loss function can distinguish good simplification candidates among randomly sampled sub-sentences of the input sentence. Our approach is rated as equally grammatical and beginner reader appropriate as a supervised SMT-based baseline system by native speakers, but our setup performs more radical changes that better resemble the variation observed in human generated simplifications.
Reference: text
sentIndex sentText sentNum sentScore
1 Simple, readable sub-sentences Sigrid Klerke Centre for Language Technology University of Copenhagen sigridklerke@gmail.com [sent-1, score-0.054]
2 Abstract We present experiments using a new unsupervised approach to automatic text simplification, which builds on sampling and ranking via a loss function informed by readability research. [sent-2, score-0.412]
3 The main idea is that a loss function can distinguish good simplification candidates among randomly sampled sub-sentences of the input sentence. [sent-3, score-0.465]
4 Our approach is rated as equally grammatical and beginner reader appropriate as a supervised SMT-based baseline system by native speakers, but our setup performs more radical changes that better resemble the variation observed in human generated simplifications. [sent-4, score-0.397]
5 1 Introduction As a field of research in NLP, text simplification (TS) has gained increasing attention recently, primarily for English text, but also for Brazilian Portuguese (Specia, 2010; Aluísio et al. [sent-5, score-0.312]
6 TS can help readers with below average reading skills access information and may supply relevant training material, which is crucial for developing reading skills. [sent-10, score-0.09]
7 One of the persistent challenges of TS is that different interventions are called for depending on the target reader population. [sent-12, score-0.129]
8 2 Approach Definitions of TS typically reflect varying target reader populations and the methods studied. [sent-16, score-0.204]
9 We cast the problem of generating a more readable sentence from an input as a problem of choosing a reasonable sub-sentence from the words present in the original. [sent-22, score-0.105]
10 The corpus-example below illustrates how a simplified sentence can be embedded as scattered parts of a non-simplified sentence. [sent-23, score-0.188]
11 The words in bold are the common parts, which make up almost the entire human generated simplification and constitute a suitable simplification on their own. [sent-24, score-0.59]
12 Original: Der er målt hvad der bliver betegnet som abnormt store mængder af radioaktivt materiale i havvand nær det jordskælvsramte atomkraftværk i Japan . [sent-12, score-1.039]
13 What has been termed an abnormally large amount of radioactivity has been measured in sea water near the nuclear power plant that was hit by earthquakes in Japan. Simplified: Der er målt en stor mængde radioaktivt materiale i havet nær atom-kraftværket Fukushima i Japan . [sent-13, score-0.296]
14 A large amount of radioactivity has been measured in the sea near the nuclear power plant Fukushima in Japan. To generate candidate sub-sentences we use a random deletion procedure in combination with [sent-14, score-0.177]
15 general dependency-based heuristics for conserving main sentence constituents, and then introduce a loss-function for choosing between candidates. [Page footer: Proceedings of the ACL Student Research Workshop, pages 142–149, Sofia, Bulgaria, August 4-9 2013. © 2013 Association for Computational Linguistics] [sent-15, score-0.094]
16 Since we avoid relying on a specialized parallel corpus or a simplification grammar, which can be expensive to create, the method is especially relevant for under-resourced languages and organizations. [sent-30, score-0.279]
17 Although we limit rewriting to deletions, the space of possible candidates grows exponentially with the length of the input sentence, prohibiting exhaustive candidate generation, which is why we chose to sample the deletions randomly. [sent-31, score-0.217]
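A sentence of n tokens has 2^n possible deletion subsets, which is why exhaustive generation is ruled out. The following is a minimal sketch of random deletion sampling, not the authors' implementation: the function name `sample_sub_sentences`, the deletion probability `p_delete`, the sample count, and the representation of heuristically preserved tokens as a set of indices are all assumptions made for illustration.

```python
import random


def sample_sub_sentences(tokens, protected, n_samples=100, p_delete=0.3, seed=0):
    """Sample sub-sentences by randomly deleting unprotected tokens.

    tokens    -- list of word strings (the input sentence)
    protected -- set of token indices that must never be deleted
                 (a stand-in for the structural heuristics; hypothetical)
    p_delete  -- per-token deletion probability (assumed value)
    """
    rng = random.Random(seed)
    candidates = set()
    for _ in range(n_samples):
        # Keep a token if it is protected or survives the random draw.
        kept = tuple(
            tok for i, tok in enumerate(tokens)
            if i in protected or rng.random() >= p_delete
        )
        if kept:  # discard the empty candidate
            candidates.add(kept)
    # Word order is preserved, so every candidate is a sub-sentence.
    return [list(c) for c in candidates]
```

Collecting candidates in a set removes duplicate samples, so the number of returned candidates can be smaller than `n_samples`.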
18 Another way in which we restrict the candidate space is by splitting long sentences. [sent-33, score-0.08]
19 Some clauses are simple to identify and extract, like relative clauses, and doing so can dramatically reduce sentence length. [sent-34, score-0.114]
20 Both simple deletions and extraction of clauses can be observed in professionally simplified text. [sent-35, score-0.321]
21 , 2010), on identifying re-write rules at sentence level either manually (Chandrasekar et al. [sent-41, score-0.051]
22 (1996) propose a structural approach, which uses syntactic cues to recover relative clauses and appositives. [sent-48, score-0.101]
23 Sentence level syntactic re-writing has since seen a variety of manually constructed general sentence splitting rules, designed to operate both on dependencies and phrase structure trees, and typically including lexical cues (Siddharthan, 2011; Heilman and Smith, 2010; Canning et al. [sent-49, score-0.169]
24 Similar rules have been created from direct inspection of simplification corpora (Decker, 2003; Seretan, 2012) and discovered automatically from large scale aligned corpora (Woodsend and Lapata, 2011; Zhu et al. [sent-51, score-0.279]
25 In our experiment we apply a few basic sentence splitting rules as a pre-processing technique before using an over-generating random deletion approach. [sent-53, score-0.196]
26 Their system further includes compound sentence splitting and rewriting of passive sentences to active ones (Canning et al. [sent-56, score-0.131]
27 (2012) are both recent publications of new resources for evaluating lexical simplification in English consisting of lists of synonyms ranked by human judges. [sent-60, score-0.279]
28 In a minimally supervised setup, our TS approach can be modified to include lexical simplifications as part of the random generation process. [sent-64, score-0.105]
29 An exception is Siddharthan and Katsos (2012), who seek to isolate the psycholinguistically motivated notions of sentence comprehension from sentence acceptability by actually measuring the effect of TS on cognition on a small scale. [sent-67, score-0.102]
30 One goal is to develop techniques and metrics for assessing the readability of unseen text. [sent-69, score-0.233]
31 Language modeling in particular has been shown to be a robust and informative component of systems for assessing text readability (Schwarm and Ostendorf, 2005; Vajjala and Meurers, 2012) as it is better suited to evaluate grammaticality than standard metrics. [sent-72, score-0.382]
32 We use language modeling alongside traditional metrics for selecting good simplification candidates. [sent-73, score-0.279]
33 1 Baseline Systems We used the original input text and the human simplified text from the sentence aligned DSim corpus, which consists of 48k original and manually simplified sentences of Danish news wire text (Klerke and Søgaard, 2012), as reference in the evaluations. [sent-75, score-0.526]
34 In addition we trained a statistical machine translation (SMT) simplification system, in effect translating from normal news wire text to simplified news. [sent-76, score-0.551]
35 Including both corpora gives better coverage, a lower average perplexity (ppl), and a similar difference in average ppl between the two sides of a held-out part of the DSim corpus, compared to using only the simplified part of DSim for the language model. [sent-80, score-0.643]
36 2 Experimental setup Three system variants were set up to generate simplified output from the original news wire of the development and test partitions of the DSim corpus. [sent-87, score-0.239]
37 Sample over-generates candidates by sampling the heuristically restricted space of random lexical deletions and ranks the candidates with a loss function. [sent-92, score-0.421]
38 Combined is a combination of the two, applying the sampling procedure of Sample to the split sentences from Split. [sent-94, score-0.124]
39 Sentence Splitting We implemented sentence splitting to extract relative clauses, as marked by the dependency relation rel, coordinated clauses, coord, and conjuncts, conj, when at least a verb and a noun are left in each part of the split. [sent-95, score-0.131]
40 In case of more than one possibility, the split resulting in the most balanced division of the sentence was chosen and the rules were re-applied if a new sentence was still longer than ten tokens. [sent-98, score-0.166]
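The split selection described here can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `find_split_points` is a hypothetical callback standing in for the dependency-based detection of rel, coord and conj boundaries, the verb-and-noun condition is assumed to be checked inside that callback, and splitting is assumed to apply only to sentences longer than ten tokens.

```python
def most_balanced_split(tokens, split_points):
    """Return the split index dividing tokens most evenly, or None."""
    # Ignore indices that would leave one side empty.
    valid = [i for i in split_points if 0 < i < len(tokens)]
    if not valid:
        return None
    # |len - 2i| is the length difference between the right and left parts.
    return min(valid, key=lambda i: abs(len(tokens) - 2 * i))


def split_recursively(tokens, find_split_points, max_len=10):
    """Split at the most balanced point; re-apply while a part exceeds max_len."""
    if len(tokens) <= max_len:
        return [tokens]
    point = most_balanced_split(tokens, find_split_points(tokens))
    if point is None:
        return [tokens]  # no licensed split: keep the sentence whole
    left, right = tokens[:point], tokens[point:]
    return (split_recursively(left, find_split_points, max_len)
            + split_recursively(right, find_split_points, max_len))
```

When two split points are equally balanced, `min` returns the first one; the source does not say how such ties were resolved.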
41 Structural Heuristics To preserve nodes from later deletion we applied heuristics using simple structural cues from the dependency structures. [sent-99, score-0.146]
42 The heuristics were applied both to trees, preserving entire subtrees, and to words, preserving only single tokens. [sent-101, score-0.117]
43 3 Scoring We rank candidates according to a loss function incorporating both readability score (the lower, the more readable) and language model perplexity (the lower, the less perplexing) as described below. [sent-113, score-0.419]
44 The loss function assigns values to the candidates such that the best simplification candidate receives the lowest score. [sent-114, score-0.465]
45 The loss function is a weighted combination of three scores: perplexity (PPL), LIX and wordclass distribution (WCD). [sent-115, score-0.086]
46 The PPL scores were obtained from a 5-gram language model of Danish [footnote 6]. We used the standard readability metric for Danish, LIX (Bjornsson, 1983) [footnote 7]. [sent-116, score-0.233]
47 Finally, the WCD measured the variation in universal POS-tag distribution [footnote 8] compared to the observed tag variation in the entire simplified corpus. [sent-117, score-0.137]
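For orientation, LIX is the average sentence length plus the percentage of long words (more than six letters). The sketch below computes it with a deliberately naive tokenizer; the regular expressions and the choice of sentence delimiters are assumptions for illustration and are not taken from the paper.

```python
import re


def lix(text):
    """LIX = words / sentences + 100 * long_words / words.

    A long word has more than six letters. Tokenization here is naive:
    words are runs of letters, sentences end at '.', '!' or '?'.
    """
    words = re.findall(r"[^\W\d_]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100.0 * len(long_words) / len(words)
```

Because both terms depend on token counts, deleting long words or splitting sentences lowers the score, which is the behaviour the loss function exploits.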
48 For PPL and LIX we calculated the difference between the score of the input sentence and the candidate. [sent-118, score-0.092]
49 Development data was used for tuning the weights of the loss function. [sent-119, score-0.086]
50 Because the candidate-generation is free to produce extremely short candidates, we have to deal with candidates [Footnote 6: The LM was Kneser-Ney smoothed, using the same corpora as the baseline system, without punctuation, and built using SRILM (Stolcke, 2002).] [sent-50, score-0.1]
51 (Anderson, 1983) calculated a conversion from LIX to grade levels. [sent-123, score-0.091]
52 Those scores never arise in the professionally simplified text, so we eliminate extreme candidates by introducing filters on all scores. [sent-126, score-0.283]
53 The upper limit was fixed at the input-level plus 20% to allow more varied candidates through the filters. [sent-128, score-0.1]
54 The WCD-filter accepted all candidates with a tag-variance that fell below the 75-percentile observed variance in the simplified training part of the DSim corpus. [sent-129, score-0.237]
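The filtering step can be sketched as below. The dictionary keys, the function names, and the optional `lower` bounds are hypothetical; the source text does not preserve how the lower limits were set, so they are left as a caller-supplied parameter. Applying the 20% upper limit to both LIX and PPL is also an assumption. `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics


def percentile_75(values):
    """75th percentile; quantiles(n=4) returns the three quartile cut points."""
    return statistics.quantiles(values, n=4)[2]


def passes_filters(cand, source, wcd_threshold, lower=None, slack=0.20):
    """Return True if a candidate survives all score filters.

    cand, source  -- dicts with 'lix' and 'ppl'; cand also has 'wcd'
    wcd_threshold -- e.g. percentile_75 of tag variances in simplified text
    lower         -- optional dict of lower bounds (not given in the source)
    slack         -- upper limit is the input score plus this fraction
    """
    for key in ("lix", "ppl"):
        if cand[key] > source[key] * (1.0 + slack):
            return False
        if lower is not None and cand[key] < lower[key]:
            return False
    return cand["wcd"] < wcd_threshold
```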
55 The resulting loss was calculated as the sum of three weighted scores. [sent-130, score-0.127]
56 Below is the loss function we minimized over the filtered candidates t ∈ Ts for each input sentence, s. [sent-131, score-0.186]
57 [Equation not recoverable from the extracted text.] If no candidates passed through the filters, the input sentence was kept. [sent-57, score-0.151]
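The selection step, a weighted sum of three scores minimized over the filtered candidates with a fallback to the input, can be sketched as follows. This is not the paper's equation, which did not survive extraction: the weight names, the sign convention (candidate score minus input score, so that reductions lower the loss), and the `keep` predicate are assumptions.

```python
def loss(cand, source, weights):
    """Weighted sum of LIX difference, PPL difference and WCD (lower is better)."""
    d_lix = cand["lix"] - source["lix"]  # negative when the candidate is more readable
    d_ppl = cand["ppl"] - source["ppl"]  # negative when the candidate is less perplexing
    return (weights["lix"] * d_lix
            + weights["ppl"] * d_ppl
            + weights["wcd"] * cand["wcd"])


def select_simplification(source, candidates, weights, keep=lambda c: True):
    """Return the lowest-loss candidate that passes `keep`, else the input."""
    survivors = [c for c in candidates if keep(c)]
    if not survivors:
        return source  # no candidate passed the filters: keep the input sentence
    return min(survivors, key=lambda c: loss(c, source, weights))
```

In the paper the weights were tuned on development data; any values passed in here are placeholders.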
58 The judges were asked to rate each sentence in terms of grammaticality and in terms of perceived beginner reader appropriateness, both on a 5-point scale, with one signifying very good and five signifying very bad. [sent-141, score-0.585]
59 The evaluators had to rate six versions of each sentence: original news wire, a human simplified version, the baseline system, a split sentence version (Split), a sampled only version (Sample), and a version combining the Split and Sample techniques (Combined). [sent-142, score-0.252]
60 Below are example outputs for the baseline and the other three automatic systems: BL: Der er hvad der bliver betegnet som abnormt store mængder radioaktivt materiale i havvand nærrygter atomkraftværk . [sent-144, score-1.039]
61 Hvad bliver betegnet som abnormt store mængder af radioaktivt materiale i havvand nær det jordskælvsramte atomkraftværk i Japan . [sent-146, score-0.831]
62 Sample: Der er målt hvad der bliver betegnet som store mængder af radioaktivt materiale i havvand japan . [sent-147, score-1.078]
63 Hvad bliver betegnet som store mængder af radioaktivt materiale det atomkraftværk i japan . [sent-150, score-0.74]
64 5 Results The rankings of the systems in terms of beginner reader appropriateness and grammaticality are shown in Figure 1. [sent-151, score-0.459]
65 From the test set of the DSim corpus, 15 news wire texts were arbitrarily selected for evaluation. [sent-152, score-0.102]
66 As expected, the filtering of candidates and the loss function force the systems Sample and Combined to choose simplifications with LIX and PPL scores close to the ones observed in the human simplified version. [sent-156, score-0.394]
67 Split sentences only reduce LIX as a result of shorter sentences; however, PPL is the highest, indicating a loss of grammaticality. [sent-157, score-0.086]
68 For texts ranked by more than one judge, we calculated agreement as Krippendorff’s α. [sent-162, score-0.082]
69 In addition to sentence-wise agreement, the systemwise evaluation agreement was calculated as all judges were evaluating the same 6 systems 8 times each. [sent-164, score-0.082]
70 We calculated α of the most frequent score (mode) assigned by each judge to each system. [sent-165, score-0.081]
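The aggregation that precedes the system-level agreement computation, taking the most frequent score per judge and system, can be sketched as below. The tuple layout of `ratings` and the tie-breaking rule (the lowest, i.e. best, score wins) are assumptions; the source does not say how ties were handled, and the α computation itself is not shown.

```python
from collections import Counter, defaultdict


def mode_scores(ratings):
    """Map (judge, system) to the most frequent score that judge gave that system.

    ratings -- iterable of (judge, system, score) tuples (hypothetical layout)
    """
    grouped = defaultdict(list)
    for judge, system, score in ratings:
        grouped[(judge, system)].append(score)
    result = {}
    for key, scores in grouped.items():
        counts = Counter(scores)
        top = max(counts.values())
        # Assumed tie-break: pick the smallest score among the most frequent.
        result[key] = min(s for s, c in counts.items() if c == top)
    return result
```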
71 As shown in Table 2, this system score agreement was only about half of the single sentence agreement, which reflects a notable instability in the output quality of all computer generated systems. [sent-166, score-0.092]
72 While grammaticality is mostly agreed upon when the scores are collapsed into three bins (α = 0. [sent-168, score-0.116]
73 650), proficient speakers do not agree to the same extent on what constitutes beginner reader appropriate text (α = 0. [sent-169, score-0.451]
74 Figure 1: Distribution of all rankings on systems before collapsing rankings. Panels: beginning reader appropriateness votes for each system; grammaticality votes for each system. [sent-174, score-0.252]
75 Beginner reader appropriateness is significantly better in the human simplified version [sent-182, score-0.389]
76 Table 2: Krippendorff’s α agreement for full-text and sentence evaluation. [Table body not recoverable from the extracted text.] [sent-184, score-0.092]
77 Agreement on system ranks was calculated from the most frequent score per judge per system. [sent-185, score-0.081]
78 We found that our Combined system produced sentences that were as grammatical as the baseline and also frequently judged to be appropriate for beginner readers. [sent-188, score-0.207]
79 The main source of error affecting both Combined and Split is faulty sentence splitting as a result of errors in tagging and parsing. [sent-189, score-0.172]
80 One way to avoid this in future development is to propagate several split variants to the final sampling and scoring. [sent-190, score-0.124]
81 As can be expected in a system operating exclusively at sentence level, coherence across sentence boundaries remains a weak point. [sent-192, score-0.102]
82 Another important point is that while the baseline system performs well in the evaluation, this is likely due to its conservativeness: choosing simplifications resembling the original input very closely. [sent-193, score-0.071]
83 Our systems Sample and Combined, on the other hand, have been tuned to perform much more radical changes and in this respect more closely model the changes we see in the human simplification. [sent-195, score-0.061]
84 Combined is thus evaluated to be on a level with the baseline in grammaticality and beginner reader appropriateness, despite the fact that the baseline system is supervised. [sent-196, score-0.452]
85 Conclusion and perspectives We have shown promising results for simplification of Danish sentences. [sent-197, score-0.312]
86 Mean (x̄), median (x̃) and most frequent value (mode) of assigned ranks by beginner reader appropriateness and grammaticality as assessed by proficient Danish speakers. [sent-205, score-0.625]
87 Table 4: Significant differences between systems in experiment b: Beginner reader appropriateness and g: Grammaticality. [sent-210, score-0.252]
88 To integrate language modeling and readability metrics in scoring is a first step towards applying results from readability research to the simplification framework. [sent-215, score-0.745]
89 Future perspectives include combining supervised and unsupervised methods to exploit the radical unsupervised deletion approach and the knowledge obtainable from observable structural changes and potential lexical simplifications. [sent-217, score-0.159]
90 We plan to focus on refining the reliability of sentence splitting in the presence of parser errors as well as on developing a loss function that incorporates more of the insights from readability research, and to apply machine learning techniques to the weighting of features. [sent-218, score-0.45]
91 A corpus analysis of simple account texts and the proposal of simplification strategies: first steps towards text simplification systems. [sent-235, score-0.591]
92 LIX and RIX: Variations on a little-known readability index. [sent-240, score-0.233]
93 In Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations, pages 33–39, Montréal, Canada, June. [sent-263, score-0.129]
94 Automatic sentence simplification for subtitling in Dutch and English. [sent-300, score-0.33]
95 On the failure of readability formulas to define readable texts: A case study from adaptations. [sent-307, score-0.287]
96 In Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations, pages 8–16, Montréal, Canada, June. [sent-323, score-0.129]
97 sampling a restricted space of rewrites to optimize readability using lexical substitutions and dependency analyses. [sent-344, score-0.293]
98 In Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations, pages 17–24, Montréal, Canada, June. [sent-402, score-0.129]
99 On improving the accuracy of readability classification using insights from second language acquisition. [sent-433, score-0.233]
100 For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia. [sent-442, score-0.071]
wordName wordTfidf (topN-words)
[('simplification', 0.279), ('dsim', 0.253), ('ppl', 0.253), ('lix', 0.247), ('readability', 0.233), ('beginner', 0.207), ('danish', 0.188), ('klerke', 0.161), ('materiale', 0.138), ('simplified', 0.137), ('reader', 0.129), ('appropriateness', 0.123), ('ts', 0.122), ('grammaticality', 0.116), ('betegnet', 0.115), ('bliver', 0.115), ('hvad', 0.115), ('radioaktivt', 0.115), ('canning', 0.102), ('som', 0.102), ('wire', 0.102), ('candidates', 0.1), ('atomkraftv', 0.092), ('havvand', 0.092), ('ngder', 0.092), ('siddharthan', 0.088), ('loss', 0.086), ('der', 0.085), ('splitting', 0.08), ('deletions', 0.075), ('populations', 0.075), ('specia', 0.073), ('simplifications', 0.071), ('japan', 0.07), ('abnormt', 0.069), ('sigrid', 0.069), ('deletion', 0.065), ('gaard', 0.064), ('split', 0.064), ('clauses', 0.063), ('woodsend', 0.062), ('belder', 0.061), ('radical', 0.061), ('chandrasekar', 0.061), ('seretan', 0.061), ('lt', 0.06), ('sampling', 0.06), ('rk', 0.056), ('korpus', 0.056), ('coster', 0.056), ('readable', 0.054), ('sentence', 0.051), ('proficient', 0.05), ('grade', 0.05), ('advaith', 0.05), ('store', 0.047), ('er', 0.046), ('attatchment', 0.046), ('bjornsson', 0.046), ('decker', 0.046), ('drndarevic', 0.046), ('jordsk', 0.046), ('lvsramte', 0.046), ('medero', 0.046), ('petersen', 0.046), ('professionally', 0.046), ('radioactivity', 0.046), ('rybing', 0.046), ('wcd', 0.046), ('reading', 0.045), ('heuristics', 0.043), ('sample', 0.042), ('agreement', 0.041), ('calculated', 0.041), ('bott', 0.041), ('vajjala', 0.041), ('signifying', 0.041), ('schwarm', 0.041), ('faulty', 0.041), ('carroll', 0.04), ('judge', 0.04), ('master', 0.038), ('af', 0.038), ('cues', 0.038), ('sio', 0.037), ('fukushima', 0.037), ('irstlm', 0.037), ('kauchak', 0.037), ('preserving', 0.037), ('alu', 0.035), ('yatskar', 0.035), ('daelemans', 0.035), ('combined', 0.034), ('generation', 0.034), ('krippendorff', 0.034), ('plant', 0.034), ('text', 0.033), ('perspectives', 0.033), ('constitutes', 0.032), 
('nuclear', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999899 322 acl-2013-Simple, readable sub-sentences
Author: Sigrid Klerke ; Anders Søgaard
Abstract: We present experiments using a new unsupervised approach to automatic text simplification, which builds on sampling and ranking via a loss function informed by readability research. The main idea is that a loss function can distinguish good simplification candidates among randomly sampled sub-sentences of the input sentence. Our approach is rated as equally grammatical and beginner reader appropriate as a supervised SMT-based baseline system by native speakers, but our setup performs more radical changes that better resemble the variation observed in human generated simplifications.
2 0.19854055 3 acl-2013-A Comparison of Techniques to Automatically Identify Complex Words.
Author: Matthew Shardlow
Abstract: Identifying complex words (CWs) is an important, yet often overlooked, task within lexical simplification (The process of automatically replacing CWs with simpler alternatives). If too many words are identified then substitutions may be made erroneously, leading to a loss of meaning. If too few words are identified then those which impede a user’s understanding may be missed, resulting in a complex final text. This paper addresses the task of evaluating different methods for CW identification. A corpus of sentences with annotated CWs is mined from Simple Wikipedia edit histories, which is then used as the basis for several experiments. Firstly, the corpus design is explained and the results of the validation experiments using human judges are reported. Experiments are carried out into the CW identification techniques of: simplifying everything, frequency thresholding and training a support vector machine. These are based upon previous approaches to the task and show that thresholding does not perform significantly differently to the more naïve technique of simplifying everything. The support vector machine achieves a slight increase in precision over the other two methods, but at the cost of a dramatic trade off in recall.
3 0.18419945 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data
Author: David Kauchak
Abstract: In this paper we examine language modeling for text simplification. Unlike some text-to-text translation tasks, text simplification is a monolingual translation task allowing for text in both the input and output domain to be used for training the language model. We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of text from each. We evaluate the models intrinsically with perplexity and extrinsically on the lexical simplification task from SemEval 2012. We find that a combined model using both simplified and normal English data achieves a 23% improvement in perplexity and a 24% improvement on the lexical simplification task over a model trained only on simple data. Post-hoc analysis shows that the additional unsimplified data provides better coverage for unseen and rare n-grams.
4 0.11288077 305 acl-2013-SORT: An Interactive Source-Rewriting Tool for Improved Translation
Author: Shachar Mirkin ; Sriram Venkatapathy ; Marc Dymetman ; Ioan Calapodescu
Abstract: The quality of automatic translation is affected by many factors. One is the divergence between the specific source and target languages. Another lies in the source text itself, as some texts are more complex than others. One way to handle such texts is to modify them prior to translation. Yet, an important factor that is often overlooked is the source translatability with respect to the specific translation system and the specific model that are being used. In this paper we present an interactive system where source modifications are induced by confidence estimates that are derived from the translation model in use. Modifications are automatically generated and proposed for the user’s approval. Such a system can reduce post-editing effort, replacing it by cost-effective pre-editing that can be done by monolinguals.
5 0.10226186 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
Author: Christian Hardmeier ; Sara Stymne ; Jorg Tiedemann ; Joakim Nivre
Abstract: We describe Docent, an open-source decoder for statistical machine translation that breaks with the usual sentence-by-sentence paradigm and translates complete documents as units. By taking translation to the document level, our decoder can handle feature models with arbitrary discourse-wide dependencies and constitutes an essential infrastructure component in the quest for discourse-aware SMT models. 1 Motivation Most of the research on statistical machine translation (SMT) that was conducted during the last 20 years treated every text as a “bag of sentences” and disregarded all relations between elements in different sentences. Systematic research into explicitly discourse-related problems has only begun very recently in the SMT community (Hardmeier, 2012) with work on topics such as pronominal anaphora (Le Nagard and Koehn, 2010; Hardmeier and Federico, 2010; Guillou, 2012), verb tense (Gong et al., 2012) and discourse connectives (Meyer et al., 2012). One of the problems that hamper the development of cross-sentence models for SMT is the fact that the assumption of sentence independence is at the heart of the dynamic programming (DP) beam search algorithm most commonly used for decoding in phrase-based SMT systems (Koehn et al., 2003). For integrating cross-sentence features into the decoding process, researchers had to adopt strategies like two-pass decoding (Le Nagard and Koehn, 2010). We have previously proposed an algorithm for document-level phrase-based SMT decoding (Hardmeier et al., 2012). Our decoding algorithm is based on local search instead of dynamic programming and permits the integration of document-level models with unrestricted dependencies, so that a model score can be conditioned on arbitrary elements occurring anywhere in the input document or in the translation that is being generated. In this paper, we present an open-source implementation of this search algorithm.
The decoder is written in C++ and follows an object-oriented design that makes it easy to extend it with new feature models, new search operations or different types of local search algorithms. The code is released under the GNU General Public License and published on Github (https://github.com/chardmeier/docent/wiki) to make it easy for other researchers to use it in their own experiments. 2 Document-Level Decoding with Local Search Our decoder is based on the phrase-based SMT model described by Koehn et al. (2003) and implemented, for example, in the popular Moses decoder (Koehn et al., 2007). Translation is performed by splitting the input sentence into a number of contiguous word sequences, called phrases, which are translated into the target language through a phrase dictionary lookup and optionally reordered. The choice between different translations of an ambiguous source phrase and the ordering of the target phrases are guided by a scoring function that combines a set of scores taken from the phrase table with scores from other models such as an n-gram language model. The actual translation process is realised as a search for the highest-scoring translation in the space of all the possible translations that could be generated given the models. The decoding approach that is implemented in Docent was first proposed by Hardmeier et al. (2012) and is based on local search. This means that it has a state corresponding to a complete, if possibly bad, translation of a document at every stage of the search progress. [Page footer: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 193–198, Sofia, Bulgaria, August 4-9 2013. © 2013 Association for Computational Linguistics] Search proceeds by making small changes to the current search state in order to transform it gradually into a better translation. This differs from the DP algorithm used in other decoders, which starts with an empty translation and expands it bit by bit.
It is similar to previous work on phrase-based SMT decoding by Langlais et al. (2007), but enables the creation of document-level models, which was not addressed by earlier approaches. Docent currently implements two search algorithms that are different generalisations of the hill climbing local search algorithm by Hardmeier et al. (2012). The original hill climbing algorithm starts with an initial state and generates possible successor states by randomly applying simple elementary operations to the state. After each operation, the new state is scored and accepted if its score is better than that of the previous state, else rejected. Search terminates when the decoder cannot find an acceptable successor state after a certain number of attempts, or when a maximum number of steps is reached. Simulated annealing is a stochastic variant of hill climbing that always accepts moves towards better states, but can also accept moves towards lower-scoring states with a certain probability that depends on a temperature parameter in order to escape local maxima. Local beam search generalises hill climbing in a different way by keeping a beam of a fixed number of multiple states at any time and randomly picking a state from the beam to modify at each move. The original hill climbing procedure can be recovered as a special case of either one of these search algorithms, by calling simulated annealing with a fixed temperature of 0 or local beam search with a beam size of 1. Initial states for the search process can be generated either by selecting a random segmentation with random translations from the phrase table in monotonic order, or by running DP beam search with sentence-local models as a first pass. For the second option, which generally yields better search results, Docent is linked with the Moses decoder and makes direct calls to the DP beam search algorithm implemented by Moses. 
In addition to these state initialisation procedures, Docent can save a search state to a disk file which can be loaded again in a subsequent decoding pass. This saves time especially when running repeated experiments from the same starting point obtained by DP search. In order to explore the complete search space of phrase-based SMT, the search operations in a local search decoder must be able to change the phrase translations, the order of the output phrases and the segmentation of the source sentence into phrases. The three operations used by Hardmeier et al. (2012), change-phrase-translation, resegment and swap-phrases, jointly meet this requirement and are all implemented in Docent. Additionally, Docent features three extra operations, all of which affect the target word order: The move-phrases operation moves a phrase to another location in the sentence. Unlike swap-phrases, it does not require that another phrase be moved in the opposite direction at the same time. A pair of operations called permute-phrases and linearise-phrases can reorder a sequence of phrases into random order and back into the order corresponding to the source language. Since the search algorithm in Docent is stochastic, repeated runs of the decoder will generally produce different output. However, the variance of the output is usually small, especially when initialising with a DP search pass, and it tends to be lower than the variance introduced by feature weight tuning (Hardmeier et al., 2012; Stymne et al., 2013a). 3 Available Feature Models In its current version, Docent implements a selection of sentence-local feature models that makes it possible to build a baseline system with a configuration comparable to that of a typical Moses baseline system. The published source code also includes prototype implementations of a few document-level models. These models should be considered work in progress and serve as a demonstration of the cross-sentence modelling capabilities of the decoder.
They have not yet reached a state of maturity that would make them suitable for production use. The sentence-level models provided by Docent include the phrase table, n-gram language models implemented with the KenLM toolkit (Heafield, 2011), an unlexicalised distortion cost model with geometric decay (Koehn et al., 2003) and a word penalty cost. All of these features are designed to be compatible with the corresponding features in Moses. From among the typical set of baseline features in Moses, we have not implemented the lexicalised distortion model, but this model could easily be added if required. Docent uses the same binary file format for phrase tables as Moses, so the same training apparatus can be used. DP-based SMT decoders have a parameter called distortion limit that limits the difference in word order between the input and the MT output. In DP search, this is formally considered to be a parameter of the search algorithm because it affects the algorithmic complexity of the search by controlling how many translation options must be considered at each hypothesis expansion. The stochastic search algorithm in Docent does not require this limitation, but it can still be useful because the standard models of SMT do not model long-distance reordering well. Docent therefore includes a separate indicator feature to indicate a violated distortion limit. In conjunction with a very large weight, this feature can effectively ensure that the distortion limit is enforced. In contrast with the distortion limit parameter of a DP decoder, the weight ofour distortion limit feature can potentially be tuned to permit occasional distortion limit violations when they contribute to better translations. The document-level models included in Docent include a length parity model, a semantic language model as well as a collection of documentlevel readability models. 
The length parity model is a proof-of-concept model that ensures that all sentences in a document have either consistently odd or consistently even length. It serves mostly as a template to demonstrate how a simple documentlevel model can be implemented in the decoder. The semantic language model was originally proposed by Hardmeier et al. (2012) to improve lexical cohesion in a document. It is a cross-sentence model over sequences of content words that are scored based on their similarity in a word vector space. The readability models serve to improve the readability of the translation by encouraging the selection of easier and more consistent target words. They are described and demonstrated in more detail in section 5. Docent can read input files both in the NISTXML format commonly used to encode documents in MT shared tasks such as NIST or WMT and in the more elaborate MMAX format (Müller and Strube, 2003). The MMAX format makes it possible to include a wide range of discourselevel corpus annotations such as coreference links. 195 These annotations can then be accessed by the feature models. To allow for additional targetlanguage information such as morphological features of target words, Docent can handle simple word-level annotations that are encoded in the phrase table in the same way as target language factors in Moses. In order to optimise feature weights we have adapted the Moses tuning infrastructure to Docent. In this way we can take advantage of all its features, for instance using different optimisation algorithms such as MERT (Och, 2003) or PRO (Hopkins and May, 2011), and selective tuning of a subset of features. Since document features only give meaningful scores on the document level and not on the sentence level, we naturally perform optimisation on document level, which typically means that we need more data than for the optimisation of sentence-based decoding. 
The results we obtain are relatively stable and competitive with sentence-level optimisation of the same models (Stymne et al., 2013a). 4 Implementing Feature Models Efficiently While translating a document, the local search decoder attempts to make a great number of moves. For each move, a score must be computed and tested against the acceptance criterion. An overwhelming majority of the proposed moves will be rejected. In order to achieve reasonably fast decoding times, efficient scoring is paramount. Recomputing the scores of the whole document at every step would be far too slow for the decoder to be useful. Fortunately, score computation can be sped up in two ways. Knowledge about how the state to be scored was generated from its predecessor helps to limit recomputations to a minimum, and by adopting a two-step scoring procedure that just computes the scores that can be calculated with little effort at first, we need to compute the complete score only if the new state has some chance of being accepted. The scores of SMT feature models can usually be decomposed in some way over parts of the document. The traditional models borrowed from sentence-based decoding are necessarily decomposable at the sentence level, and in practice, all common models are designed to meet the constraints of DP beam search, which ensures that they can in fact be decomposed over even smaller sequences of just a few words. For genuine document-level features, this is not the case, but even these models can often be decomposed in some way, for instance over paragraphs, anaphoric links or lexical chains. To take advantage of this fact, feature models in Docent always have access to the previous state and its score and to a list of the state modifications that transform the previous state into the next. 
The scores of the new state are calculated by identifying the parts of a document that are affected by the modifications, subtracting the old scores of this part from the previous score and adding the new scores. This approach to scoring makes feature model implementation a bit more complicated than in DP search, but it gives the feature models full control over how they decompose a document while still permitting efficient decoding. A feature model class in Docent implements three methods. The initDocument method is called once per document when decoding starts. It straightforwardly computes the model score for the entire document from scratch. When a state is modified, the decoder first invokes the estimateScoreUpdate method. Rather than calculating the new score exactly, this method is only required to return an upper bound that reflects the maximum score that could possibly be achieved by this state. The search algorithm then checks this upper bound against the acceptance criterion. Only if the upper bound meets the criterion does it call the updateScore method to calculate the exact score, which is then checked against the acceptance criterion again. The motivation for this two-step procedure is that some models can compute an upper bound approximation much more efficiently than an exact score. For any model whose score is a log probability, a value of 0 is a loose upper bound that can be returned instantly, but in many cases, we can do much better. In the case of the n-gram language model, for instance, a more accurate upper bound can be computed cheaply by subtracting from the old score all log-probabilities of n-grams that are affected by the state modifications without adding the scores of the n-grams replacing them in the new state. This approximation can be calculated without doing any language model lookups at all. 
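The two-step scoring procedure for the n-gram language model can be sketched in Python as follows. This is only an illustration under stated assumptions: the class and method names loosely mirror initDocument, estimateScoreUpdate and updateScore but are not Docent's actual interface, the language model is a hypothetical toy bigram table, the only modification considered is replacing one word, and the cache is not refreshed after an accepted move.

```python
class ToyBigramLM:
    """Hypothetical bigram model; counts lookups so they can be inspected."""

    def __init__(self, logprobs, unknown=-10.0):
        self.logprobs = logprobs
        self.unknown = unknown
        self.lookups = 0

    def logprob(self, prev, word):
        self.lookups += 1
        return self.logprobs.get((prev, word), self.unknown)


class BigramFeature:
    def __init__(self, lm):
        self.lm = lm
        self.cached = []  # cached[i] = log P(tokens[i + 1] | tokens[i])

    def init_document(self, tokens):
        """Score the whole document from scratch and cache per-bigram scores."""
        self.cached = [self.lm.logprob(a, b) for a, b in zip(tokens, tokens[1:])]
        return sum(self.cached)

    def _affected(self, n_tokens, pos):
        # Indices of the cached bigrams that contain the word at position pos.
        return [i for i in (pos - 1, pos) if 0 <= i < n_tokens - 1]

    def estimate_score_update(self, tokens, old_score, pos):
        """Upper bound: drop the affected bigram scores, add nothing back.

        Log-probabilities are at most 0, so the exact score cannot exceed
        this value. Only cached scores are used, so no LM lookup happens.
        """
        removed = sum(self.cached[i] for i in self._affected(len(tokens), pos))
        return old_score - removed

    def update_score(self, tokens, old_score, pos, new_word):
        """Exact score: the upper bound plus the scores of the new bigrams."""
        new_tokens = tokens[:pos] + [new_word] + tokens[pos + 1:]
        exact = self.estimate_score_update(tokens, old_score, pos)
        for i in self._affected(len(tokens), pos):
            exact += self.lm.logprob(new_tokens[i], new_tokens[i + 1])
        return exact


def try_replace(feature, tokens, old_score, pos, new_word):
    """Hill-climbing acceptance with the cheap bound checked first."""
    bound = feature.estimate_score_update(tokens, old_score, pos)
    if bound <= old_score:
        return False, old_score   # rejected without computing the exact score
    exact = feature.update_score(tokens, old_score, pos, new_word)
    if exact > old_score:
        return True, exact
    return False, old_score
```

With several feature models, the decoder would sum the bounds of all models before testing the acceptance criterion, so a poor bound from one cheap model can spare the exact computation in the expensive ones.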
On the other hand, some models like the distortion cost or the word penalty are very cheap to compute, so that the estimateScoreUpdate method can simply return the precise score as a tight upper bound. If a state gets rejected because of a low score on one of the cheap models, this means we will never have to compute the more expensive feature scores at all. 5 Readability: A Case Study As a case study we report initial results on how document-wide features can be used in Docent in order to improve the readability of texts by encouraging simple and consistent terminology (Stymne et al., 2013b). This work is a first step towards achieving joint SMT and text simplification, with the final goal of adapting MT to user groups such as people with reading disabilities. Lexical consistency modelling for SMT has been attempted before. The suggested approaches have been limited by the use of sentence-level decoders, however, and had to resort to procedures like post processing (Carpuat, 2009), multiple decoding runs with frozen counts from previous runs (Ture et al., 2012), or cache-based models (Tiedemann, 2010). In Docent, however, we always have access to a full document translation, which makes it straightforward to include features directly into the decoder. We implemented four features on the document level. The first two features are type token ratio (TTR) and a reformulation of it, OVIX, which is less sensitive to text length. These ratios have been related to the “idea density” of a text (Mühlenbock and Kokkinakis, 2009). We also wanted to encourage consistent translations of words, for which we used the Q-value (Deléger et al., 2006), which has been proposed to measure term quality. We applied it on word level (QW) and phrase level (QP). These features need access to the full target document, which we have in Docent.
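The two ratio features can be sketched in a few lines of Python. The text above does not spell out the OVIX formula, so the version below uses the commonly cited definition, log(tokens) / log(2 - log(types) / log(tokens)), as an assumption; the function names and the handling of degenerate inputs are likewise illustrative choices.

```python
import math


def type_token_ratio(tokens):
    """TTR: number of distinct word types divided by number of tokens."""
    if not tokens:
        raise ValueError("TTR is undefined for an empty document")
    return len(set(tokens)) / len(tokens)


def ovix(tokens):
    """Word variation index (OVIX), assuming the commonly cited definition."""
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    if n_tokens < 2:
        raise ValueError("OVIX needs at least two tokens")
    if n_types == n_tokens:
        # Every token is distinct: the denominator log(1) is zero.
        return math.inf
    ratio = math.log(n_types) / math.log(n_tokens)
    return math.log(n_tokens) / math.log(2.0 - ratio)
```

Both functions take the tokens of the whole target document, which is why a document-level decoder is needed to use them as features; lower values indicate a more repetitive, and by this measure more consistent, vocabulary.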
In addition, we included two sentence-level count features for long words that have been used to measure the readability of Swedish texts (Mühlenbock and Kokkinakis, 2009). We tested our features on English–Swedish translation using the Europarl corpus. For training we used 1,488,322 sentences. As test data, we extracted 20 documents with a total of 690 sentences. We used the standard set of baseline features: 5-gram language model, translation model with 5 weights, a word penalty and a distortion penalty.
Table 1: Results for adding single lexical consistency features to Docent
Feature | BLEU | OVIX | LIX
Baseline | 0.243 | 56.88 | 51.17
TTR | 0.243 | 55.25 | 51.04
OVIX | 0.243 | 54.65 | 51.00
QW | 0.242 | 57.16 | 51.16
QP | 0.243 | 57.07 | 51.06
All | 0.235 | 47.80 | 49.29
Table 2: Example translation snippets with comments
Baseline | Readability features | Comment
de ärade ledamöterna (the honourable Members) | ledamöterna (the members) / ni (you) | + Removal of non-essential words
på ett sådant sätt att (in such a way that) | så att (so that) | + Simplified expression
gemenskapslagstiftningen (the community legislation) | gemenskapens lagstiftning (the community’s legislation) | + Shorter words by changing long compound to genitive construction
Världshandelsorganisationen (World Trade Organisation) | WTO (WTO) | − Changing long compound to English-based abbreviation
handlingsplanen (the action plan) | planen (the plan) | − Removal of important word
ägnat särskild uppmärksamhet åt (paid particular attention to) | särskilt uppmärksam på (particular attentive on) | − Bad grammar because of changed part of speech and missing verb
To evaluate our system we used the BLEU score (Papineni et al., 2002) together with a set of readability metrics, since readability is what we hoped to improve by adding consistency features. Here we used OVIX to confirm a direct impact on consistency, and LIX (Björnsson, 1968), which is a common readability measure for Swedish.
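For readers unfamiliar with LIX, the following Python sketch computes it in its standard form: the average sentence length in words plus the percentage of long words, where a long word has more than six characters. The function name and the assumption that the input is already tokenised with punctuation removed are illustrative choices, not part of the evaluation setup described here.

```python
def lix(sentences):
    """LIX for a document given as a list of tokenised sentences.

    Assumes each sentence is a list of word strings with punctuation
    already removed.
    """
    words = [word for sentence in sentences for word in sentence]
    if not sentences or not words:
        raise ValueError("LIX is undefined for an empty document")
    long_words = sum(1 for word in words if len(word) > 6)
    return len(words) / len(sentences) + 100.0 * long_words / len(words)
```

Replacing a long compound such as handlingsplanen with planen lowers the long-word percentage, which is one way the readability features can reduce LIX.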
Unfortunately we do not have access to simplified translated text, so we calculate the MT metrics against a standard reference, which means that simple texts will likely have worse scores than complicated texts closer to the reference translation. We tuned the standard features using Moses and MERT, and then added each lexical consistency feature with a small weight, using a grid search approach to find values with a small impact. The results are shown in Table 1. As can be seen, for individual features the translation quality was maintained, with small improvements in LIX, and in OVIX for the TTR and OVIX features. For the combination we lost a little bit on translation quality, but there was a larger effect on the readability metrics. When we used larger weights, there was a bigger impact on the readability metrics, with a further decrease on MT quality. We also investigated what types of changes the readability features could lead to. Table 2 shows a sample of translations where the baseline is compared to systems with readability features. There are both cases where the readability features help and cases where they are problematic. Overall, these examples show that our simple features can help achieve some interesting simplifications. There is still much work to do on how to take best advantage of the possibilities in Docent in order to achieve readable texts. This attempt shows the feasibility of the approach. We plan to extend this work for instance by better feature optimisation, by integrating part-of-speech tags into our features in order to focus on terms rather than common words, and by using simplified texts for evaluation and tuning. 6 Conclusions In this paper, we have presented Docent, an open-source document-level decoder for phrase-based SMT released under the GNU General Public License.
Docent is the first decoder that permits the inclusion of feature models with unrestricted dependencies between arbitrary parts of the output, even crossing sentence boundaries. A number of research groups have recently started to investigate the interplay between SMT and discourse-level phenomena such as pronominal anaphora, verb tense selection and the generation of discourse connectives. We expect that the availability of a document-level decoder will make it substantially easier to leverage discourse information in SMT and make SMT models explore new ground beyond the next sentence boundary. References Carl-Hugo Björnsson. 1968. Läsbarhet. Liber, Stockholm. Marine Carpuat. 2009. One translation per discourse. In Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009), pages 19–27, Boulder, Colorado. Louise Deléger, Magnus Merkel, and Pierre Zweigenbaum. 2006. Enriching medical terminologies: an approach based on aligned corpora. In International Congress of the European Federation for Medical Informatics, pages 747–752, Maastricht, The Netherlands. Zhengxian Gong, Min Zhang, Chew Lim Tan, and Guodong Zhou. 2012. N-gram-based tense models for statistical machine translation. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 276–285, Jeju Island, Korea. Liane Guillou. 2012. Improving pronoun translation for statistical machine translation. In Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 1–10, Avignon, France. Christian Hardmeier and Marcello Federico. 2010. Modelling pronominal anaphora in statistical machine translation. In Proceedings of the seventh International Workshop on Spoken Language Translation (IWSLT), pages 283–289, Paris, France. Christian Hardmeier, Joakim Nivre, and Jörg Tiedemann. 2012. 
Document-wide decoding for phrase-based statistical machine translation. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1179–1190, Jeju Island, Korea. Christian Hardmeier. 2012. Discourse in statistical machine translation: A survey and a case study. Discours, 11. Kenneth Heafield. 2011. KenLM: faster and smaller language model queries. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 187–197, Edinburgh, Scotland. Mark Hopkins and Jonathan May. 2011. Tuning as ranking. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1352–1362, Edinburgh, Scotland. Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 conference of the North American chapter of the Association for Computational Linguistics on Human Language Technology, pages 48–54, Edmonton. Philipp Koehn, Hieu Hoang, Alexandra Birch, et al. 2007. Moses: open source toolkit for Statistical Machine Translation. In Annual meeting of the Association for Computational Linguistics: Demonstration session, pages 177–180, Prague, Czech Republic. Philippe Langlais, Alexandre Patry, and Fabrizio Gotti. 2007. A greedy decoder for phrase-based statistical machine translation. In TMI-2007: Proceedings of the 11th International Conference on Theoretical and Methodological Issues in Machine Translation, pages 104–113, Skövde, Sweden. Ronan Le Nagard and Philipp Koehn. 2010. Aiding pronoun translation with co-reference resolution. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, pages 252–261, Uppsala, Sweden. Thomas Meyer, Andrei Popescu-Belis, Najeh Hajlaoui, and Andrea Gesmundo. 2012. Machine translation of labeled discourse connectives.
In Proceedings of the Tenth Biennial Conference of the Association for Machine Translation in the Americas (AMTA), San Diego, California, USA. Katarina Mühlenbock and Sofie Johansson Kokkinakis. 2009. LIX 68 revisited – an extended readability. In Proceedings of the Corpus Linguistics Conference, Liverpool, UK. Christoph Müller and Michael Strube. 2003. Multilevel annotation in MMAX. In Proceedings of the Fourth SIGdial Workshop on Discourse and Dialogue, pages 198–207, Sapporo, Japan. Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 160–167, Sapporo, Japan. Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Sara Stymne, Christian Hardmeier, Jörg Tiedemann, and Joakim Nivre. 2013a. Feature weight optimization for discourse-level SMT. In Proceedings of the Workshop on Discourse in Machine Translation (DiscoMT), Sofia, Bulgaria. Sara Stymne, Jörg Tiedemann, Christian Hardmeier, and Joakim Nivre. 2013b. Statistical machine translation with readability constraints. In Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), pages 375–386, Oslo, Norway. Jörg Tiedemann. 2010. Context adaptation in statistical machine translation using models with exponentially decaying cache. In Proceedings of the ACL 2010 Workshop on Domain Adaptation for Natural Language Processing (DANLP), pages 8–15, Uppsala, Sweden. Ferhan Ture, Douglas W. Oard, and Philip Resnik. 2012. Encouraging consistent translation choices. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 417–426, Montréal, Canada.
6 0.068158917 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing
7 0.064070314 300 acl-2013-Reducing Annotation Effort for Quality Estimation via Active Learning
8 0.063492194 120 acl-2013-Dirt Cheap Web-Scale Parallel Text from the Common Crawl
9 0.062888347 64 acl-2013-Automatically Predicting Sentence Translation Difficulty
10 0.060079269 307 acl-2013-Scalable Decipherment for Machine Translation via Hash Sampling
11 0.059723068 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation
12 0.058396887 172 acl-2013-Graph-based Local Coherence Modeling
13 0.058197632 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
14 0.05812503 235 acl-2013-Machine Translation Detection from Monolingual Web-Text
15 0.058049869 135 acl-2013-English-to-Russian MT evaluation campaign
16 0.057058308 13 acl-2013-A New Syntactic Metric for Evaluation of Machine Translation
17 0.055549301 53 acl-2013-Annotation of regular polysemy and underspecification
18 0.054120317 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification
19 0.052872222 263 acl-2013-On the Predictability of Human Assessment: when Matrix Completion Meets NLP Evaluation
20 0.052736111 221 acl-2013-Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines
topicId topicWeight
[(0, 0.174), (1, -0.02), (2, 0.02), (3, -0.019), (4, -0.032), (5, -0.015), (6, 0.031), (7, -0.013), (8, 0.014), (9, -0.002), (10, -0.044), (11, 0.04), (12, -0.109), (13, 0.028), (14, -0.113), (15, -0.027), (16, -0.026), (17, 0.004), (18, -0.01), (19, -0.019), (20, 0.033), (21, -0.009), (22, -0.026), (23, -0.003), (24, -0.026), (25, 0.043), (26, -0.01), (27, 0.019), (28, 0.005), (29, 0.058), (30, -0.114), (31, -0.058), (32, -0.033), (33, 0.104), (34, 0.021), (35, -0.079), (36, 0.089), (37, 0.03), (38, -0.015), (39, -0.066), (40, -0.168), (41, -0.028), (42, -0.139), (43, 0.038), (44, -0.021), (45, 0.053), (46, -0.004), (47, -0.096), (48, -0.094), (49, 0.213)]
simIndex simValue paperId paperTitle
same-paper 1 0.89099622 322 acl-2013-Simple, readable sub-sentences
Author: Sigrid Klerke ; Anders Søgaard
Abstract: We present experiments using a new unsupervised approach to automatic text simplification, which builds on sampling and ranking via a loss function informed by readability research. The main idea is that a loss function can distinguish good simplification candidates among randomly sampled sub-sentences of the input sentence. Our approach is rated as equally grammatical and beginner reader appropriate as a supervised SMT-based baseline system by native speakers, but our setup performs more radical changes that better resembles the variation observed in human generated simplifications.
2 0.85860825 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data
Author: David Kauchak
Abstract: In this paper we examine language modeling for text simplification. Unlike some text-to-text translation tasks, text simplification is a monolingual translation task allowing for text in both the input and output domain to be used for training the language model. We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of text from each. We evaluate the models intrinsically with perplexity and extrinsically on the lexical simplification task from SemEval 2012. We find that a combined model using both simplified and normal English data achieves a 23% improvement in perplexity and a 24% improvement on the lexical simplification task over a model trained only on simple data. Post-hoc analysis shows that the additional unsimplified data provides better coverage for unseen and rare n-grams.
3 0.82803464 3 acl-2013-A Comparison of Techniques to Automatically Identify Complex Words.
Author: Matthew Shardlow
Abstract: Identifying complex words (CWs) is an important, yet often overlooked, task within lexical simplification (The process of automatically replacing CWs with simpler alternatives). If too many words are identified then substitutions may be made erroneously, leading to a loss of meaning. If too few words are identified then those which impede a user’s understanding may be missed, resulting in a complex final text. This paper addresses the task of evaluating different methods for CW identification. A corpus of sentences with annotated CWs is mined from Simple Wikipedia edit histories, which is then used as the basis for several experiments. Firstly, the corpus design is explained and the results of the validation experiments using human judges are reported. Experiments are carried out into the CW identification techniques of: simplifying everything, frequency thresholding and training a support vector machine. These are based upon previous approaches to the task and show that thresholding does not perform significantly differently to the more naïve technique of simplifying everything. The support vector machine achieves a slight increase in precision over the other two methods, but at the cost of a dramatic trade off in recall.
4 0.60981464 308 acl-2013-Scalable Modified Kneser-Ney Language Model Estimation
Author: Kenneth Heafield ; Ivan Pouzyrevsky ; Jonathan H. Clark ; Philipp Koehn
Abstract: We present an efficient algorithm to estimate large modified Kneser-Ney models including interpolation. Streaming and sorting enables the algorithm to scale to much larger models by using a fixed amount of RAM and variable amount of disk. Using one machine with 140 GB RAM for 2.8 days, we built an unpruned model on 126 billion tokens. Machine translation experiments with this model show improvement of 0.8 BLEU point over constrained systems for the 2013 Workshop on Machine Translation task in three language pairs. Our algorithm is also faster for small models: we estimated a model on 302 million tokens using 7.7% of the RAM and 14.0% of the wall time taken by SRILM. The code is open source as part of KenLM.
5 0.60219049 305 acl-2013-SORT: An Interactive Source-Rewriting Tool for Improved Translation
Author: Shachar Mirkin ; Sriram Venkatapathy ; Marc Dymetman ; Ioan Calapodescu
Abstract: The quality of automatic translation is affected by many factors. One is the divergence between the specific source and target languages. Another lies in the source text itself, as some texts are more complex than others. One way to handle such texts is to modify them prior to translation. Yet, an important factor that is often overlooked is the source translatability with respect to the specific translation system and the specific model that are being used. In this paper we present an interactive system where source modifications are induced by confidence estimates that are derived from the translation model in use. Modifications are automatically generated and proposed for the user’s approval. Such a system can reduce postediting effort, replacing it by cost-effective pre-editing that can be done by monolinguals.
6 0.56098044 64 acl-2013-Automatically Predicting Sentence Translation Difficulty
7 0.53612572 381 acl-2013-Variable Bit Quantisation for LSH
8 0.51860589 390 acl-2013-Word surprisal predicts N400 amplitude during reading
9 0.51411462 135 acl-2013-English-to-Russian MT evaluation campaign
10 0.50849146 371 acl-2013-Unsupervised joke generation from big data
11 0.50142473 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
12 0.48719046 225 acl-2013-Learning to Order Natural Language Texts
13 0.48546028 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation
14 0.48341694 325 acl-2013-Smoothed marginal distribution constraints for language modeling
15 0.46444798 110 acl-2013-Deepfix: Statistical Post-editing of Statistical Machine Translation Using Deep Syntactic Analysis
16 0.46216267 300 acl-2013-Reducing Annotation Effort for Quality Estimation via Active Learning
17 0.45562938 250 acl-2013-Models of Translation Competitions
18 0.44848779 21 acl-2013-A Statistical NLG Framework for Aggregated Planning and Realization
19 0.44334793 88 acl-2013-Computational considerations of comparisons and similes
20 0.43174943 120 acl-2013-Dirt Cheap Web-Scale Parallel Text from the Common Crawl
topicId topicWeight
[(0, 0.07), (5, 0.024), (6, 0.041), (11, 0.053), (13, 0.291), (14, 0.019), (15, 0.019), (24, 0.035), (26, 0.078), (35, 0.062), (42, 0.073), (48, 0.026), (70, 0.029), (88, 0.028), (90, 0.035), (95, 0.055)]
simIndex simValue paperId paperTitle
1 0.74999475 386 acl-2013-What causes a causal relation? Detecting Causal Triggers in Biomedical Scientific Discourse
Author: Claudiu Mihaila ; Sophia Ananiadou
Abstract: Current domain-specific information extraction systems represent an important resource for biomedical researchers, who need to process vaster amounts of knowledge in short times. Automatic discourse causality recognition can further improve their workload by suggesting possible causal connections and aiding in the curation of pathway models. We here describe an approach to the automatic identification of discourse causality triggers in the biomedical domain using machine learning. We create several baselines and experiment with various parameter settings for three algorithms, i.e., Conditional Random Fields (CRF), Support Vector Machines (SVM) and Random Forests (RF). Also, we evaluate the impact of lexical, syntactic and semantic features on each of the algorithms and look at errors. The best performance of 79.35% F-score is achieved by CRFs when using all three feature types.
same-paper 2 0.74219388 322 acl-2013-Simple, readable sub-sentences
Author: Sigrid Klerke ; Anders Søgaard
Abstract: We present experiments using a new unsupervised approach to automatic text simplification, which builds on sampling and ranking via a loss function informed by readability research. The main idea is that a loss function can distinguish good simplification candidates among randomly sampled sub-sentences of the input sentence. Our approach is rated as equally grammatical and beginner reader appropriate as a supervised SMT-based baseline system by native speakers, but our setup performs more radical changes that better resembles the variation observed in human generated simplifications.
3 0.70937878 186 acl-2013-Identifying English and Hungarian Light Verb Constructions: A Contrastive Approach
Author: Veronika Vincze ; Istvan Nagy T. ; Richard Farkas
Abstract: Here, we introduce a machine learning-based approach that allows us to identify light verb constructions (LVCs) in Hungarian and English free texts. We also present the results of our experiments on the SzegedParalellFX English–Hungarian parallel corpus where LVCs were manually annotated in both languages. With our approach, we were able to contrast the performance of our method and define language-specific features for these typologically different languages. Our presented method proved to be sufficiently robust as it achieved approximately the same scores on the two typologically different languages.
4 0.58794647 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
Author: Lei Cui ; Dongdong Zhang ; Shujie Liu ; Mu Li ; Ming Zhou
Abstract: The quality of bilingual data is a key factor in Statistical Machine Translation (SMT). Low-quality bilingual data tends to produce incorrect translation knowledge and also degrades translation modeling performance. Previous work often used supervised learning methods to filter low-quality data, but a fair amount of human labeled examples are needed which are not easy to obtain. To reduce the reliance on labeled examples, we propose an unsupervised method to clean bilingual data. The method leverages the mutual reinforcement between the sentence pairs and the extracted phrase pairs, based on the observation that better sentence pairs often lead to better phrase extraction and vice versa. End-to-end experiments show that the proposed method substantially improves the performance in large-scale Chinese-to-English translation tasks.
5 0.50055337 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
Author: Kai Liu ; Yajuan Lu ; Wenbin Jiang ; Qun Liu
Abstract: This paper describes a novel strategy for automatic induction of a monolingual dependency grammar under the guidance of bilingually-projected dependency. By moderately leveraging the dependency information projected from the parsed counterpart language, and simultaneously mining the underlying syntactic structure of the language considered, it effectively integrates the advantages of bilingual projection and unsupervised induction, so as to induce a monolingual grammar much better than previous models only using bilingual projection or unsupervised induction. We induced dependency grammar for five different languages under the guidance of dependency information projected from the parsed English translation, experiments show that the bilingually-guided method achieves a significant improvement of 28.5% over the unsupervised baseline and 3.0% over the best projection baseline on average.
6 0.49922985 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
7 0.49782187 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
8 0.49762589 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
9 0.49641594 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
10 0.4959746 225 acl-2013-Learning to Order Natural Language Texts
11 0.49562371 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
12 0.49315733 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
13 0.49256158 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
14 0.48996329 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
15 0.48992398 318 acl-2013-Sentiment Relevance
16 0.48959678 183 acl-2013-ICARUS - An Extensible Graphical Search Tool for Dependency Treebanks
17 0.48830879 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation
18 0.48829895 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
19 0.48786899 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT
20 0.48734584 312 acl-2013-Semantic Parsing as Machine Translation