acl acl2013 acl2013-194 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Kauchak
Abstract: In this paper we examine language modeling for text simplification. Unlike some text-to-text translation tasks, text simplification is a monolingual translation task allowing for text in both the input and output domain to be used for training the language model. We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of text from each. We evaluate the models intrinsically with perplexity and extrinsically on the lexical simplification task from SemEval 2012. We find that a combined model using both simplified and normal English data achieves a 23% improvement in perplexity and a 24% improvement on the lexical simplification task over a model trained only on simple data. Post-hoc analysis shows that the additional unsimplified data provides better coverage for unseen and rare n-grams.
Reference: text
sentIndex sentText sentNum sentScore
1 Unlike some text-to-text translation tasks, text simplification is a monolingual translation task allowing for text in both the input and output domain to be used for training the language model. [sent-3, score-0.579]
2 We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of text from each. [sent-4, score-1.05]
3 We evaluate the models intrinsically with perplexity and extrinsically on the lexical simplification task from SemEval 2012. [sent-5, score-0.707]
4 We find that a combined model using both simplified and normal English data achieves a 23% improvement in perplexity and a 24% improvement on the lexical simplification task over a model trained only on simple data. [sent-6, score-1.784]
5 Many text-to-text tasks (e.g. text compression, text simplification and summarization) can be viewed as monolingual translation tasks, translating between text variations within a single language. [sent-12, score-0.542]
6 In this paper, we investigate this possibility for text simplification where both simplified English text and normal English text are available for training a simple English language model. [sent-14, score-1.38]
7 Throughout the rest of this paper we refer to sentences/articles/text from English Wikipedia as normal and sentences/articles/text from Simple English Wikipedia as simple. [sent-17, score-0.708]
8 On the one hand, there is a strong correspondence between the simple and normal data. [sent-18, score-0.863]
9 At the word level 96% of the simple words are found in the normal corpus and even for n-grams as large as 5, more than half of the n-grams can be found in the normal text. [sent-19, score-1.571]
10 In addition, the normal text does represent English text and contains many n-grams not seen in the simple corpus. [sent-20, score-0.944]
11 If the word distributions were very similar between simple and normal text, then the overlap proportions between the two languages would be similar regardless of the direction in which the comparison is made. [sent-23, score-0.916]
12 Instead, we see that the normal text has more varied language and contains more n-grams. [sent-24, score-0.737]
13 Previous research has also shown other differences between simple and normal data sources that could impact language model performance, including average number of syllables and reading level. [sent-25, score-0.932]
14 Although this question arises in other monolingual translation domains, text simplification represents an ideal problem area for analysis. [sent-39, score-0.484]
15 After preprocessing, the 60K articles represent less than half a million sentences, which is orders of magnitude smaller than the amount of normal English data available (for example, the English Gigaword corpus (Graff, 2003)). [sent-44, score-0.846]
16 Finally, many recent text simplification systems have utilized language models trained only on simplified data (Zhu et al. [sent-45, score-0.595]
17 Our goal is more general: to examine the relationship between simple and normal data and determine whether normal data is helpful. [sent-53, score-1.641]
18 Simple language models play a role in a variety of text simplification applications. [sent-55, score-0.442]
19 Many recent statistical simplification techniques build upon models from machine translation and utilize a simple language model during simplification/decoding both in English (Zhu et al. [sent-56, score-0.659]
20 Simple English language models have also been used as predictive features in other simplification sub-problems such as lexical simplification (Specia et al. [sent-59, score-0.827]
21 3 Corpus We collected a data set from English Wikipedia and Simple English Wikipedia representing normal English and simple English. [sent-66, score-0.763]
22 , 2012) and has been shown to be simpler than normal English Wikipedia by both automatic measures and human perception (Coster and Kauchak, 2011b). [sent-71, score-0.708]
23 We extracted the corresponding 60K normal articles from English Wikipedia based on the article title to represent the normal data. [sent-77, score-1.487]
24 Although the simple and normal data contain the same number of articles, because normal articles tend to be longer and contain more content, the normal side is an order of magnitude larger. [sent-81, score-2.354]
25 4 Language Model Evaluation: Perplexity To analyze the impact of data source on simple English language modeling, we trained language models on varying amounts of simple data, normal data, and a combination of the two. [sent-82, score-1.31]
26 For our first task, we evaluated these language models using perplexity based on how well they modeled the simple side of the held-out data. [sent-83, score-0.448]
27 We trained models on three types of data: simple-only (simple sentences only); normal-only (normal sentences only); and simple-X+normal (X simple sentences combined with a varying number of normal sentences). To evaluate the language models we calculated the model perplexity (Chen et al. [sent-91, score-2.286]
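To make the perplexity computation concrete, the following is a minimal Python sketch (ours, not from the paper); logprob_fn is a hypothetical stand-in for whichever trained n-gram model is being evaluated, returning the total log10 probability and the number of scored tokens for a sentence.

def perplexity(logprob_fn, heldout_sentences):
    # Perplexity = 10 ** (-(1/N) * total log10 probability), where N is
    # the total number of scored tokens in the held-out data.
    total_logprob = 0.0
    total_tokens = 0
    for sentence in heldout_sentences:
        logprob, num_tokens = logprob_fn(sentence)
        total_logprob += logprob
        total_tokens += num_tokens
    return 10 ** (-total_logprob / total_tokens)

Lower perplexity means the model assigns higher probability to the held-out simple text.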
28 As expected, when trained on the same amount of data, the language models trained on simple data perform significantly better than language models trained on normal data. [sent-97, score-1.191]
29 However, the results also show that the normal data does have some benefit. [sent-99, score-0.743]
30 The perplexity for the simple-ALL+normal model, which starts with all available simple data, continues to improve as normal data is added resulting in a 23% improvement over the model trained with only simple data (from a perplexity of 129 down to 100). [sent-100, score-1.692]
31 Figure 2: Language model perplexities for combined simple-normal models (x-axis: number of additional normal sentences). [sent-103, score-0.898]
32 Each line represents a model trained on a different amount of simple data as normal data is added. [sent-104, score-1.058]
33 To better understand how the amount of simple and normal data impacts perplexity, Figure 2 shows perplexity scores for models trained on varying amounts of simple data as we add increasing amounts of normal data. [sent-105, score-2.34]
34 We again see that normal data is beneficial; regardless of the amount of simple data, adding normal data improves perplexity. [sent-106, score-1.72]
35 Models trained on less simple data achieved larger performance increases than those models trained on more simple data. [sent-108, score-0.532]
36 Figure 2 again shows that simple data is more valuable than normal data. [sent-109, score-0.898]
37 To achieve this same perplexity level starting with 200K simple sentences requires an additional 300K normal sentences, or starting with 100K simple sentences an additional 850K normal sentences. [sent-111, score-2.136]
38 3 Language Model Adaptation In the experiments above, we generated the language models by treating the simple and normal data as one combined corpus. [sent-113, score-0.969]
39 Our goal for this paper is not to explore domain adaptation techniques, but to determine if normal data is useful for the simple language modeling task. [sent-115, score-1.041]
40 However, to provide another dimension for comparison, Figure 3 shows perplexity scores for a linearly interpolated model between the simple-only model and the normal-only model for varying lambda values. [sent-116, score-0.445]
41 The lambda values range from the simple-only model on the left (λ = 0) to the normal-only model on the right (λ = 1). [sent-121, score-0.479]
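A linearly interpolated model of the kind swept in Figure 3 can be sketched as below (an illustration under our own naming; p_simple and p_normal are assumed to be conditional probability functions from the two separately trained models, with lambda following the convention above: 0 is simple-only, 1 is normal-only).

def interpolated_prob(p_simple, p_normal, lam):
    # p(w | h) = (1 - lam) * p_simple(w | h) + lam * p_normal(w | h)
    def prob(word, history):
        return (1.0 - lam) * p_simple(word, history) + lam * p_normal(word, history)
    return prob

# Sweep: evaluate the perplexity of interpolated_prob(p_simple, p_normal, lam)
# for lam in steps from 0.0 to 1.0 and pick the best value on development data.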
42 As with the previous experiments, adding normal data improves perplexity. [sent-122, score-0.813]
43 The results also highlight the balance between simple and normal data; normal data is not as good as simple data and adding too much of it can cause the results to degrade. [sent-125, score-1.826]
44 We also evaluated the language models extrinsically, based on the lexical simplification task from SemEval 2012 (Specia et al. [sent-129, score-0.414]
45 Lexical simplification is a sub-problem of the general text simplification problem (Chandrasekar and Srinivas, 1997); a sentence is simplified by substituting words or phrases in the sentence with “simpler” variations. [sent-131, score-0.877]
46 1 Experimental Setup Examples from the lexical simplification data set from SemEval 2012 consist of three parts: w, the word to be simplified; s1, . [sent-135, score-0.449]
47 Given a language model p(·) and a lexical simplification example, we ranked the list of candidates based on the probability the language model assigns to the sentence with the candidate simplification inserted in context. [sent-149, score-0.878]
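As a sketch of this ranking step (hypothetical names; sentence_logprob stands for whichever language model is being compared), each candidate is substituted at the target position and the candidates are ordered by the log probability of the resulting sentence.

def rank_candidates(sentence_logprob, tokens, target_index, candidates):
    # Substitute each candidate for the target word and score the full
    # sentence with the language model; higher probability ranks first.
    scored = []
    for candidate in candidates:
        substituted = list(tokens)
        substituted[target_index] = candidate
        scored.append((sentence_logprob(substituted), candidate))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored]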
48 This ranking approach does not represent a complete lexical substitution system, but it was a common feature for many of the submitted systems, it performs well relative to the other systems, and it allows for a concrete comparison between the language models on a simplification task. [sent-178, score-0.453]
49 Open vocabulary models allow the language models to better utilize the varying amounts of data, and since the lexical simplification problem only requires comparing probabilities within a given model to produce the final ranking, we do not need a closed vocabulary. [sent-183, score-0.808]
50 As with the perplexity results, for similar amounts of data the simple-only model performs better than the normal-only model. [sent-186, score-0.414]
51 Figure 6: Kappa rank scores for models trained with varying amounts of simple data combined with increasing amounts of normal data (x-axis: number of additional normal sentences). [sent-188, score-2.139]
52 However, unlike the perplexity results, simply appending additional normal data to the entire simple data set does not improve the performance of the lexical simplifier. [sent-189, score-1.296]
53 To determine if additional normal data improves the performance for models trained on smaller amounts of simple data, Figure 6 shows the kappa rank scores for models trained on different amounts of simple data as additional normal data is added. [sent-190, score-2.45]
54 For smaller amounts of simple data adding normal data does improve the kappa rank score. [sent-191, score-1.181]
55 For these models, we see a small gain (a 0.01 improvement in kappa rank score) by adding normal data. [sent-196, score-0.865]
56 3 Language Model Adaptation The results in the previous section show that adding normal data to a simple data set can improve the lexical simplifier if the amount of simple data is limited. [sent-198, score-1.222]
57 Figure 7 shows results for the same experimental design as Figure 6 with varying amounts of simple and normal data; however, rather than appending the normal data, we trained the models separately and created a linearly interpolated model as described in Section 4. [sent-200, score-2.149]
58 Figure 7: Kappa rank scores for linearly interpolated models between simple-only and normal-only models trained with varying amounts of simple and normal data (x-axis: number of additional normal sentences). [sent-203, score-2.466]
59 For all starting amounts of simple data, interpolating the simple model with the normal model results in a large increase in the kappa rank score. [sent-204, score-1.058]
60 Combining the model trained on all the simple data with the model trained on all the normal data achieves a score of 0. [sent-205, score-1.125]
61 Although our goal was not to create the best lexical simplification system, this approach would have ranked 6th out of 11 submitted systems in the SemEval 2012 competition (Specia et al. [sent-207, score-0.414]
62 Interestingly, although the performance of the simple-only models varied based on the amount of simple data, when these models were interpolated with a model trained on normal data, performance tended to converge. [sent-209, score-1.22]
63 This may indicate that for tasks like lexical simplification, only a modest amount of simple data is required when combined with additional normal data to achieve reasonable performance. [sent-211, score-1.088]
64 For both the perplexity experiments and the lexical simplification experiments, utilizing additional normal data resulted in large performance improvements; even when using all of the simple data available, performance still improved significantly when combined with normal data. [sent-213, score-2.406]
65 In this section, we investigate why the additional normal data is beneficial for simple language modeling. [sent-214, score-0.968]
66 1 More n-grams Intuitively, adding normal data provides additional English data to train on. [sent-216, score-0.852]
67 Table 3: Proportion of n-grams in the test sets that occur in the simple and normal training data sets. [sent-225, score-0.898]
68 We hypothesize that the key benefit of additional normal data is access to more n-gram counts and therefore better probability estimation, particularly for n-grams in the simple corpus that are unseen or have low frequency. [sent-229, score-0.964]
69 For n-grams that have never been seen before, the normal data provides some estimate from English text. [sent-230, score-0.766]
70 For n-grams that have been seen but are rare, the additional normal data can help provide better probability estimates. [sent-234, score-0.81]
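As a toy illustration of this effect (add-one smoothing here is only a stand-in for the actual smoothing used by the models, and all counts are hypothetical), pooling normal-data counts lifts the estimate of a trigram that is unseen in the simple data but genuinely occurs in English:

V = 10000                            # assumed vocabulary size
simple_count, simple_hist = 0, 5     # trigram unseen in the simple data
normal_count, normal_hist = 3, 40    # same trigram observed in normal data

p_simple = (simple_count + 1) / (simple_hist + V)   # ~1.0e-4, pure smoothing mass
p_pooled = (simple_count + normal_count + 1) / (simple_hist + normal_hist + V)  # ~4.0e-4
print(p_simple, p_pooled)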
71 Table 3 shows the percentage of unigrams, bigrams and trigrams from the two test sets that are found in the simple and normal training data. [sent-238, score-0.892]
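The overlap percentages of Table 3 can be reproduced with a simple membership count (a sketch assuming each sentence is a list of tokens):

def ngrams(tokens, n):
    # All contiguous n-grams of a token list, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def coverage(train_sentences, test_sentences, n):
    # Proportion of test-set n-gram occurrences that appear in training.
    seen = set()
    for sentence in train_sentences:
        seen.update(ngrams(sentence, n))
    test = [g for sentence in test_sentences for g in ngrams(sentence, n)]
    return sum(g in seen for g in test) / len(test)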
72 For all n-gram sizes the normal data contained more test set n-grams than the simple data. [sent-239, score-0.918]
73 Even at the unigram level, the normal data contained significantly more of the test set unigrams than the simple data. [sent-240, score-0.95]
74 The 4% increase in word occurrence between the simple and normal data sets represents an over 50% reduction in the number of out-of-vocabulary words. [sent-242, score-0.934]
75 Table 4: Proportion of n-grams in the test sets that occur in the combination of both the simple and normal data. [sent-248, score-0.863]
76 For larger n-grams, the difference between the simple and normal data sets is even more pronounced. [sent-249, score-0.898]
77 On the lexical simplification data the normal data contained more than twice as many test trigrams as the simple data. [sent-250, score-1.396]
78 Table 4 shows the test set n-gram overlap on the combined data set of simple and normal data. [sent-254, score-0.962]
79 Because the simple and normal data come from the same content areas, the simple data provides little additional coverage if the normal data is already used. [sent-255, score-1.875]
80 For example, adding the simple data to the normal data only increases the number of seen unigrams by 0. [sent-256, score-1.018]
81 However, the experiments above showed the combined models performed much better than models trained only on normal data. [sent-258, score-0.88]
82 This discrepancy highlights the key problem with normal data: it is out-of-domain data. [sent-259, score-0.73]
83 To make this discrepancy more explicit, we created a sentence-aligned data set by aligning the simple and normal articles using the approach from Coster and Kauchak (2011b). [sent-261, score-1.044]
84 Even though the aligned sentences represent the same content, the language use is different between simple and normal, and the normal data performs consistently worse. [sent-267, score-1.64]
85 3 A Balance Between Simple and Normal Examining the optimal lambda values for the linearly interpolated models also helps understand the role of the normal data. [sent-269, score-1.014]
86 On the perplexity task, the best perplexity results were obtained with a lambda of 0.5, or an equal weighting between the simple and normal models. [sent-270, score-0.6]
88 Even though the normal data contained six times as many sentences and nine times as many words, the best modeling performance balanced the quality of the simple model with the coverage of the normal model. [sent-272, score-1.738]
89 For the simplification task, the optimal lambda value determined on the development set was close to 0, strongly weighting the simple-only model. [sent-273, score-0.466]
90 Only when the simple model does not provide differentiation between lexical choices does the normal model play a role in selecting the candidates. [sent-275, score-0.971]
91 For the lexical simplification task, the role of the normal model is even more clear: to handle rare occurrences not covered by the simple model and to smooth the simple model estimates. [sent-276, score-1.574]
92 7 Conclusions and Future Work In the experiments above, we have shown that on two different tasks, utilizing additional normal data improves the performance of simple English language models. [sent-277, score-0.983]
93 On the perplexity task, the combined model achieved a performance improvement of 23% over the simple-only model and on the lexical simplification task, the combined model achieved a 24% improvement. [sent-278, score-0.882]
94 For both tasks, the best improvements were seen when using language model adaptation techniques; however, the adaptation results also indicated that the role of normal data is partially task dependent. [sent-280, score-0.976]
95 However, on the lexical simplification task, the best results were achieved with a very strong bias towards the simple-only model. [sent-282, score-0.438]
96 For many of the experiments, combining a smaller amount of simple data (50K-100K sentences) with normal data achieved results that were similar to larger simple data set sizes. [sent-284, score-1.197]
97 For example, on the lexical simplification task, when using a linearly interpolated model, the model combining 100K simple sentences with all the normal data achieved comparable results to the model combining all the simple sentences with all the normal data. [sent-285, score-2.604]
98 This is encouraging for other monolingual domains such as text compression or text simplification in non-English languages where less data is available. [sent-286, score-0.581]
99 First, further experiments with larger normal data sets are required to understand the limits of adding out-of-domain data. [sent-288, score-0.794]
100 Second, we have only utilized data from Wikipedia for normal text. [sent-289, score-0.743]
wordName wordTfidf (topN-words)
[('normal', 0.708), ('simplification', 0.374), ('perplexity', 0.254), ('simple', 0.155), ('interpolated', 0.154), ('coster', 0.129), ('kauchak', 0.129), ('lambda', 0.092), ('amounts', 0.091), ('adaptation', 0.088), ('kappa', 0.079), ('wikipedia', 0.077), ('wi', 0.076), ('linearly', 0.073), ('wubben', 0.07), ('specia', 0.067), ('varying', 0.065), ('market', 0.063), ('semeval', 0.062), ('trained', 0.062), ('woodsend', 0.059), ('simplified', 0.056), ('physical', 0.053), ('bacchiani', 0.053), ('constricted', 0.053), ('middlebury', 0.053), ('pressurised', 0.053), ('unsimplified', 0.053), ('tight', 0.053), ('rank', 0.048), ('perplexities', 0.046), ('compression', 0.045), ('monolingual', 0.045), ('additional', 0.044), ('english', 0.042), ('simplifications', 0.041), ('rare', 0.04), ('articles', 0.04), ('lexical', 0.04), ('models', 0.039), ('aligned', 0.037), ('translation', 0.036), ('vocabulary', 0.036), ('lapata', 0.036), ('data', 0.035), ('eickhoff', 0.035), ('leroy', 0.035), ('model', 0.034), ('sentences', 0.034), ('overlap', 0.032), ('combined', 0.032), ('unigrams', 0.032), ('article', 0.031), ('chandrasekar', 0.031), ('adding', 0.03), ('domain', 0.03), ('trigrams', 0.029), ('amount', 0.029), ('zhu', 0.029), ('text', 0.029), ('yatskar', 0.027), ('turner', 0.027), ('jelinek', 0.027), ('increasing', 0.027), ('beneficial', 0.026), ('cat', 0.026), ('biran', 0.026), ('landis', 0.026), ('napoles', 0.026), ('modeling', 0.025), ('aligning', 0.025), ('appending', 0.025), ('cohn', 0.024), ('achieved', 0.024), ('domains', 0.024), ('seen', 0.023), ('rj', 0.023), ('daume', 0.023), ('sentence', 0.022), ('unseen', 0.022), ('si', 0.022), ('discrepancy', 0.022), ('banko', 0.022), ('combining', 0.021), ('techniques', 0.021), ('understand', 0.021), ('utilizing', 0.021), ('marcu', 0.021), ('modest', 0.021), ('proportions', 0.021), ('improves', 0.02), ('contained', 0.02), ('roark', 0.02), ('pan', 0.02), ('ple', 0.02), ('ronald', 0.02), ('simplicity', 0.02), ('ranking', 0.019), ('though', 0.019), ('closed', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data
Author: David Kauchak
Abstract: In this paper we examine language modeling for text simplification. Unlike some text-to-text translation tasks, text simplification is a monolingual translation task allowing for text in both the input and output domain to be used for training the language model. We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of text from each. We evaluate the models intrinsically with perplexity and extrinsically on the lexical simplification task from SemEval 2012. We find that a combined model using both simplified and normal English data achieves a 23% improvement in perplexity and a 24% improvement on the lexical simplification task over a model trained only on simple data. Post-hoc analysis shows that the additional unsimplified data provides better coverage for unseen and rare n-grams.
2 0.20028149 3 acl-2013-A Comparison of Techniques to Automatically Identify Complex Words.
Author: Matthew Shardlow
Abstract: Identifying complex words (CWs) is an important, yet often overlooked, task within lexical simplification (the process of automatically replacing CWs with simpler alternatives). If too many words are identified then substitutions may be made erroneously, leading to a loss of meaning. If too few words are identified then those which impede a user’s understanding may be missed, resulting in a complex final text. This paper addresses the task of evaluating different methods for CW identification. A corpus of sentences with annotated CWs is mined from Simple Wikipedia edit histories, which is then used as the basis for several experiments. Firstly, the corpus design is explained and the results of the validation experiments using human judges are reported. Experiments are carried out into the CW identification techniques of: simplifying everything, frequency thresholding and training a support vector machine. These are based upon previous approaches to the task and show that thresholding does not perform significantly differently to the more naïve technique of simplifying everything. The support vector machine achieves a slight increase in precision over the other two methods, but at the cost of a dramatic trade off in recall.
3 0.18419945 322 acl-2013-Simple, readable sub-sentences
Author: Sigrid Klerke ; Anders Søgaard
Abstract: We present experiments using a new unsupervised approach to automatic text simplification, which builds on sampling and ranking via a loss function informed by readability research. The main idea is that a loss function can distinguish good simplification candidates among randomly sampled sub-sentences of the input sentence. Our approach is rated as equally grammatical and beginner reader appropriate as a supervised SMT-based baseline system by native speakers, but our setup performs more radical changes that better resembles the variation observed in human generated simplifications.
4 0.11235111 123 acl-2013-Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
Author: Wenbin Jiang ; Meng Sun ; Yajuan Lu ; Yating Yang ; Qun Liu
Abstract: Structural information in web text provides natural annotations for NLP problems such as word segmentation and parsing. In this paper we propose a discriminative learning algorithm to take advantage of the linguistic knowledge in large amounts of natural annotations on the Internet. It utilizes the Internet as an external corpus with massive (although slight and sparse) natural annotations, and enables a classifier to evolve on the large-scaled and real-time updated web text. With Chinese word segmentation as a case study, experiments show that the segmenter enhanced with the Chinese wikipedia achieves sig- nificant improvement on a series of testing sets from different domains, even with a single classifier and local features.
5 0.096550584 305 acl-2013-SORT: An Interactive Source-Rewriting Tool for Improved Translation
Author: Shachar Mirkin ; Sriram Venkatapathy ; Marc Dymetman ; Ioan Calapodescu
Abstract: The quality of automatic translation is affected by many factors. One is the divergence between the specific source and target languages. Another lies in the source text itself, as some texts are more complex than others. One way to handle such texts is to modify them prior to translation. Yet, an important factor that is often overlooked is the source translatability with respect to the specific translation system and the specific model that are being used. In this paper we present an interactive system where source modifications are induced by confidence estimates that are derived from the translation model in use. Modifications are automatically generated and proposed for the user’s approval. Such a system can reduce postediting effort, replacing it by cost-effective pre-editing that can be done by monolinguals.
6 0.087602116 325 acl-2013-Smoothed marginal distribution constraints for language modeling
7 0.085642695 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation
8 0.079243779 381 acl-2013-Variable Bit Quantisation for LSH
9 0.073655762 247 acl-2013-Modeling of term-distance and term-occurrence information for improving n-gram language model performance
10 0.072843306 202 acl-2013-Is a 204 cm Man Tall or Small ? Acquisition of Numerical Common Sense from the Web
11 0.070669889 224 acl-2013-Learning to Extract International Relations from Political Context
12 0.065223396 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
13 0.061032597 39 acl-2013-Addressing Ambiguity in Unsupervised Part-of-Speech Induction with Substitute Vectors
14 0.058476284 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation
15 0.057605721 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
16 0.056586813 84 acl-2013-Combination of Recurrent Neural Networks and Factored Language Models for Code-Switching Language Modeling
17 0.054148391 300 acl-2013-Reducing Annotation Effort for Quality Estimation via Active Learning
18 0.052743472 383 acl-2013-Vector Space Model for Adaptation in Statistical Machine Translation
19 0.050491679 308 acl-2013-Scalable Modified Kneser-Ney Language Model Estimation
20 0.050046246 250 acl-2013-Models of Translation Competitions
topicId topicWeight
[(0, 0.144), (1, -0.021), (2, 0.04), (3, -0.015), (4, 0.031), (5, -0.034), (6, 0.017), (7, 0.003), (8, -0.038), (9, 0.017), (10, -0.031), (11, 0.014), (12, -0.066), (13, 0.023), (14, -0.096), (15, -0.017), (16, -0.008), (17, -0.008), (18, -0.026), (19, -0.012), (20, 0.048), (21, -0.028), (22, -0.013), (23, 0.037), (24, -0.021), (25, -0.024), (26, 0.029), (27, 0.022), (28, 0.031), (29, 0.006), (30, -0.089), (31, -0.002), (32, -0.052), (33, 0.083), (34, 0.046), (35, -0.078), (36, 0.157), (37, 0.024), (38, -0.057), (39, -0.041), (40, -0.153), (41, -0.004), (42, -0.195), (43, 0.054), (44, -0.071), (45, 0.118), (46, -0.024), (47, -0.125), (48, -0.045), (49, 0.168)]
simIndex simValue paperId paperTitle
same-paper 1 0.93570518 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data
Author: David Kauchak
Abstract: In this paper we examine language modeling for text simplification. Unlike some text-to-text translation tasks, text simplification is a monolingual translation task allowing for text in both the input and output domain to be used for training the language model. We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of text from each. We evaluate the models intrinsically with perplexity and extrinsically on the lexical simplification task from SemEval 2012. We find that a combined model using both simplified and normal English data achieves a 23% improvement in perplexity and a 24% improvement on the lexical simplification task over a model trained only on simple data. Post-hoc analysis shows that the additional unsimplified data provides better coverage for unseen and rare n-grams.
2 0.78657186 3 acl-2013-A Comparison of Techniques to Automatically Identify Complex Words.
Author: Matthew Shardlow
Abstract: Identifying complex words (CWs) is an important, yet often overlooked, task within lexical simplification (the process of automatically replacing CWs with simpler alternatives). If too many words are identified then substitutions may be made erroneously, leading to a loss of meaning. If too few words are identified then those which impede a user’s understanding may be missed, resulting in a complex final text. This paper addresses the task of evaluating different methods for CW identification. A corpus of sentences with annotated CWs is mined from Simple Wikipedia edit histories, which is then used as the basis for several experiments. Firstly, the corpus design is explained and the results of the validation experiments using human judges are reported. Experiments are carried out into the CW identification techniques of: simplifying everything, frequency thresholding and training a support vector machine. These are based upon previous approaches to the task and show that thresholding does not perform significantly differently to the more naïve technique of simplifying everything. The support vector machine achieves a slight increase in precision over the other two methods, but at the cost of a dramatic trade off in recall.
3 0.7733922 322 acl-2013-Simple, readable sub-sentences
Author: Sigrid Klerke ; Anders Søgaard
Abstract: We present experiments using a new unsupervised approach to automatic text simplification, which builds on sampling and ranking via a loss function informed by readability research. The main idea is that a loss function can distinguish good simplification candidates among randomly sampled sub-sentences of the input sentence. Our approach is rated as equally grammatical and beginner reader appropriate as a supervised SMT-based baseline system by native speakers, but our setup performs more radical changes that better resembles the variation observed in human generated simplifications.
4 0.68980455 308 acl-2013-Scalable Modified Kneser-Ney Language Model Estimation
Author: Kenneth Heafield ; Ivan Pouzyrevsky ; Jonathan H. Clark ; Philipp Koehn
Abstract: We present an efficient algorithm to estimate large modified Kneser-Ney models including interpolation. Streaming and sorting enables the algorithm to scale to much larger models by using a fixed amount of RAM and variable amount of disk. Using one machine with 140 GB RAM for 2.8 days, we built an unpruned model on 126 billion tokens. Machine translation experiments with this model show improvement of 0.8 BLEU point over constrained systems for the 2013 Workshop on Machine Translation task in three language pairs. Our algorithm is also faster for small models: we estimated a model on 302 million tokens using 7.7% of the RAM and 14.0% of the wall time taken by SRILM. The code is open source as part of KenLM.
5 0.62522042 325 acl-2013-Smoothed marginal distribution constraints for language modeling
Author: Brian Roark ; Cyril Allauzen ; Michael Riley
Abstract: We present an algorithm for re-estimating parameters of backoff n-gram language models so as to preserve given marginal distributions, along the lines of well-known Kneser-Ney (1995) smoothing. Unlike Kneser-Ney, our approach is designed to be applied to any given smoothed backoff model, including models that have already been heavily pruned. As a result, the algorithm avoids issues observed when pruning Kneser-Ney models (Siivola et al., 2007; Chelba et al., 2010), while retaining the benefits of such marginal distribution constraints. We present experimental results for heavily pruned backoff ngram models, and demonstrate perplexity and word error rate reductions when used with various baseline smoothing methods. An open-source version of the algorithm has been released as part of the OpenGrm ngram library.1
7 0.54826409 381 acl-2013-Variable Bit Quantisation for LSH
8 0.5218637 390 acl-2013-Word surprisal predicts N400 amplitude during reading
9 0.51544309 305 acl-2013-SORT: An Interactive Source-Rewriting Tool for Improved Translation
10 0.46728611 64 acl-2013-Automatically Predicting Sentence Translation Difficulty
11 0.4436368 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
12 0.43650135 225 acl-2013-Learning to Order Natural Language Texts
13 0.43407542 371 acl-2013-Unsupervised joke generation from big data
14 0.41911876 120 acl-2013-Dirt Cheap Web-Scale Parallel Text from the Common Crawl
15 0.40861338 250 acl-2013-Models of Translation Competitions
16 0.40723339 257 acl-2013-Natural Language Models for Predicting Programming Comments
17 0.39402026 50 acl-2013-An improved MDL-based compression algorithm for unsupervised word segmentation
18 0.39160511 135 acl-2013-English-to-Russian MT evaluation campaign
19 0.38998359 84 acl-2013-Combination of Recurrent Neural Networks and Factored Language Models for Code-Switching Language Modeling
20 0.3873117 145 acl-2013-Exploiting Qualitative Information from Automatic Word Alignment for Cross-lingual NLP Tasks
topicId topicWeight
[(0, 0.04), (6, 0.054), (11, 0.055), (15, 0.019), (22, 0.19), (24, 0.039), (26, 0.056), (29, 0.015), (35, 0.098), (42, 0.058), (48, 0.061), (70, 0.053), (88, 0.025), (90, 0.03), (95, 0.085)]
simIndex simValue paperId paperTitle
1 0.78806651 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
Author: Young-Bum Kim ; Benjamin Snyder
Abstract: In this paper, we present a solution to one aspect of the decipherment task: the prediction of consonants and vowels for an unknown language and alphabet. Adopting a classical Bayesian perspective, we perform posterior inference over hundreds of languages, leveraging knowledge of known languages and alphabets to uncover general linguistic patterns of typologically coherent language clusters. We achieve average accuracy in the unsupervised consonant/vowel prediction task of 99% across 503 languages. We further show that our methodology can be used to predict more fine-grained phonetic distinctions. On a three-way classification task between vowels, nasals, and nonnasal consonants, our model yields unsupervised accuracy of 89% across the same set of languages.
same-paper 2 0.78442115 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data
Author: David Kauchak
Abstract: In this paper we examine language modeling for text simplification. Unlike some text-to-text translation tasks, text simplification is a monolingual translation task allowing for text in both the input and output domain to be used for training the language model. We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of text from each. We evaluate the models intrinsically with perplexity and extrinsically on the lexical simplification task from SemEval 2012. We find that a combined model using both simplified and normal English data achieves a 23% improvement in perplexity and a 24% improvement on the lexical simplification task over a model trained only on simple data. Post-hoc analysis shows that the additional unsimplified data provides better coverage for unseen and rare n-grams.
3 0.78224885 292 acl-2013-Question Classification Transfer
Author: Anne-Laure Ligozat
Abstract: Question answering systems have been developed for many languages, but most resources were created for English, which can be a problem when developing a system in another language such as French. In particular, for question classification, no labeled question corpus is available for French, so this paper studies the possibility to use existing English corpora and transfer a classification by translating the question and their labels. By translating the training corpus, we obtain results close to a monolingual setting.
4 0.70172292 275 acl-2013-Parsing with Compositional Vector Grammars
Author: Richard Socher ; John Bauer ; Christopher D. Manning ; Andrew Y. Ng
Abstract: Natural language parsing has typically been done with small sets of discrete categories such as NP and VP, but this representation does not capture the full syntactic nor semantic richness of linguistic phrases, and attempts to improve on this by lexicalizing phrases or splitting categories only partly address the problem at the cost of huge feature spaces and sparseness. Instead, we introduce a Compositional Vector Grammar (CVG), which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations. The CVG improves the PCFG of the Stanford Parser by 3.8% to obtain an F1 score of 90.4%. It is fast to train and, implemented approximately as an efficient reranker, is about 20% faster than the current Stanford factored parser. The CVG learns a soft notion of head words and improves performance on the types of ambiguities that require semantic information such as PP attachments.
5 0.7008152 172 acl-2013-Graph-based Local Coherence Modeling
Author: Camille Guinaudeau ; Michael Strube
Abstract: We propose a computationally efficient graph-based approach for local coherence modeling. We evaluate our system on three tasks: sentence ordering, summary coherence rating and readability assessment. The performance is comparable to entity grid based approaches though these rely on a computationally expensive training phase and face data sparsity problems.
6 0.69680017 62 acl-2013-Automatic Term Ambiguity Detection
7 0.69548213 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
8 0.69517559 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation
9 0.69496787 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
10 0.69422412 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
11 0.69385999 250 acl-2013-Models of Translation Competitions
12 0.69384915 212 acl-2013-Language-Independent Discriminative Parsing of Temporal Expressions
14 0.69242322 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
15 0.69240391 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
16 0.69206899 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
17 0.69163108 318 acl-2013-Sentiment Relevance
18 0.69142699 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
19 0.69134527 264 acl-2013-Online Relative Margin Maximization for Statistical Machine Translation
20 0.69127113 175 acl-2013-Grounded Language Learning from Video Described with Sentences