acl acl2013 acl2013-308 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kenneth Heafield ; Ivan Pouzyrevsky ; Jonathan H. Clark ; Philipp Koehn
Abstract: We present an efficient algorithm to estimate large modified Kneser-Ney models including interpolation. Streaming and sorting enables the algorithm to scale to much larger models by using a fixed amount of RAM and variable amount of disk. Using one machine with 140 GB RAM for 2.8 days, we built an unpruned model on 126 billion tokens. Machine translation experiments with this model show improvement of 0.8 BLEU point over constrained systems for the 2013 Workshop on Machine Translation task in three language pairs. Our algorithm is also faster for small models: we estimated a model on 302 million tokens using 7.7% of the RAM and 14.0% of the wall time taken by SRILM. The code is open source as part of KenLM.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We present an efficient algorithm to estimate large modified Kneser-Ney models including interpolation. [sent-8, score-0.155]
2 Using one machine with 140 GB RAM for 2.8 days, we built an unpruned model on 126 billion tokens. [sent-11, score-0.242]
3 Our algorithm is also faster for small models: we estimated a model on 302 million tokens using 7.7% of the RAM and 14.0% of the wall time taken by SRILM. [sent-14, score-0.086]
4 1 Introduction: Relatively low perplexity has made modified Kneser-Ney smoothing (Kneser and Ney, 1995; Chen and Goodman, 1998) a popular choice for language modeling. [sent-18, score-0.198]
5 However, existing estimation methods require either large amounts of RAM (Stolcke, 2002) or machines (Brants et al., 2007). [sent-19, score-0.137]
6 As a result, practitioners have chosen to use less data (Callison-Burch et al. [sent-21, score-0.048]
7 Backoff-smoothed n-gram language models (Katz, 1987) assign probability to a word $w_n$ in context $w_1^{n-1}$ according to the recursive equation
$$p(w_n \mid w_1^{n-1}) = \begin{cases} p(w_n \mid w_1^{n-1}) & \text{if } w_1^{n} \text{ was seen} \\ b(w_1^{n-1})\, p(w_n \mid w_2^{n-1}) & \text{otherwise.} \end{cases}$$
The task is to estimate the probability $p$ and backoff $b$ from text for each seen entry $w_1^n$. [sent-24, score-0.345]
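To make the recursion concrete, here is a hedged sketch (not KenLM's actual data structures or API) of a backoff lookup over an in-memory map from n-gram strings to probability/backoff pairs; the `Model` type, the `Join` key format, and the fixed `<unk>` score are assumptions introduced for the example.

```cpp
// Hedged sketch of backoff-smoothed lookup: p(w_n | w_1^{n-1}) is the stored
// probability if w_1^n was seen, otherwise b(w_1^{n-1}) * p(w_n | w_2^{n-1}).
// The in-memory map and <unk> fallback are illustrative assumptions only.
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

struct Entry { double log_prob; double log_backoff; };  // base-10 logs
using Model = std::unordered_map<std::string, Entry>;

// Join words [begin, end] into a space-separated key (assumed key format).
std::string Join(const std::vector<std::string>& w, size_t begin, size_t end) {
  std::string key;
  for (size_t i = begin; i <= end; ++i) {
    if (i > begin) key += ' ';
    key += w[i];
  }
  return key;
}

double LogProb(const Model& model, const std::vector<std::string>& w,
               size_t begin, size_t n) {
  if (begin == n) {  // unigram base case; assumed <unk> score if unseen
    auto uni = model.find(w[n]);
    return uni != model.end() ? uni->second.log_prob : -5.0;
  }
  auto hit = model.find(Join(w, begin, n));
  if (hit != model.end()) return hit->second.log_prob;  // w_begin^n was seen
  double backoff = 0.0;  // backoff defaults to 1 (log 0) if the context is unseen
  auto ctx = model.find(Join(w, begin, n - 1));
  if (ctx != model.end()) backoff = ctx->second.log_backoff;
  return backoff + LogProb(model, w, begin + 1, n);  // drop the leftmost word
}

int main() {
  Model model = {{"the", {-1.2, -0.4}}, {"cat", {-2.1, -0.3}},
                 {"the cat", {-0.8, -0.2}}};
  std::vector<std::string> words = {"the", "cat"};
  std::cout << "log10 p(cat | the) = " << LogProb(model, words, 0, 1) << "\n";
}
```

The recursion drops the leftmost context word each time it backs off, adding (in log space) the backoff weight of the context it failed to match.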
8 Figure 1: Each MapReduce performs three copies over the network when only one is required. [sent-33, score-0.254]
9 Both options use local disk within each reducer for merge sort. [sent-37, score-0.417]
10 This paper contributes an efficient multi-pass streaming algorithm using disk and a user-specified amount of RAM. [sent-38, score-0.371]
11 Brants et al. (2007) showed how to estimate Kneser-Ney models with a series of five MapReduces (Dean and Ghemawat, 2004). [sent-40, score-0.061]
12 On 31 billion words, estimation took 400 machines for two days. [sent-41, score-0.227]
13 Recently, Google estimated a pruned Kneser-Ney model on 230 billion words (Chelba and Schalkwyk, 2013), though no cost was provided. [sent-42, score-0.194]
14 Each MapReduce consists of one layer of mappers and an optional layer of reducers. [sent-43, score-0.353]
15 Mappers read from a network filesystem, perform optional processing, and route data to reducers. [sent-44, score-0.244]
16 Reducers process input and write to a network filesystem. [sent-45, score-0.201]
17 Ideally, reducers would send data directly to another layer of reducers, but this is not supported. [sent-46, score-0.444]
18 Their workaround, a series of MapReduces, performs unnecessary copies over the network (Figure 1). [sent-47, score-0.254]
19 Writing and reading from the distributed filesystem improves fault tolerance. [sent-51, score-0.528]
20 However, the same level of fault tolerance could be achieved by checkpointing to the network filesystem then only reading in the case of failures. [sent-52, score-0.641]
21 Doing so would enable reducers to start processing without waiting for the network filesystem to write all the data. [sent-53, score-0.869]
22 (2013) identify several problems with the scale-out approach to distributed computation and put forward several scenarios in which a single-machine scale-up approach is more cost-effective in terms of both raw performance and performance per dollar. [sent-56, score-0.056]
23 Brants et al. (2007) contributed Stupid Backoff, a simpler form of smoothing calculated at runtime from counts. [sent-58, score-0.149]
24 We agree that Stupid Backoff is cheaper to estimate, but contend that this work makes Kneser-Ney smoothing cheap enough. [sent-61, score-0.203]
25 In prior work (Heafield et al., 2012), we showed how to collapse probability and backoff into a single value without changing sentence-level probabilities. [sent-64, score-0.27]
26 However, local scores do change and, like Stupid Backoff, are no longer probabilities. [sent-65, score-0.047]
27 MSRLM (Nguyen et al., 2007) aims to scalably estimate language models on a single machine. [sent-67, score-0.112]
28 Counting is performed with streaming algorithms similarly to this work. [sent-68, score-0.181]
29 Their parallel merge sort also has the potential to be faster than ours. [sent-69, score-0.169]
30 The biggest difference is that their pipeline delays some computation (part of normalization and all of interpolation) until query time. [sent-70, score-0.197]
31 This means that it cannot produce a standard ARPA file and that more time and memory are required at query time. [sent-71, score-0.104]
32 Moreover, they use memory mapping on entire files and these files may be larger than physical RAM. [sent-72, score-0.251]
33 We have found that, even with mostly-sequential access, memory mapping is slower because the kernel does not explicitly know where to read ahead or write behind. [sent-73, score-0.301]
34 In contrast, we use dedicated threads for reading and writing. [sent-74, score-0.172]
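As an illustration of the dedicated-I/O-thread idea (a sketch only, not KenLM's util library), the following program has one thread read fixed-size blocks from a file into a bounded queue while the main thread consumes them; the block size, queue depth, and byte-counting "work" are arbitrary choices for the example.

```cpp
// Sketch of a dedicated read-ahead thread: the reader streams fixed-size
// blocks into a bounded queue; the consumer drains them. Illustrative only.
#include <condition_variable>
#include <deque>
#include <fstream>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

class BlockQueue {
 public:
  explicit BlockQueue(size_t max_blocks) : max_blocks_(max_blocks) {}

  void Push(std::vector<char> block) {
    std::unique_lock<std::mutex> lock(mu_);
    not_full_.wait(lock, [&] { return queue_.size() < max_blocks_; });
    queue_.push_back(std::move(block));
    not_empty_.notify_one();
  }

  // Returns false once the producer is done and the queue has drained.
  bool Pop(std::vector<char>* block) {
    std::unique_lock<std::mutex> lock(mu_);
    not_empty_.wait(lock, [&] { return !queue_.empty() || done_; });
    if (queue_.empty()) return false;
    *block = std::move(queue_.front());
    queue_.pop_front();
    not_full_.notify_one();
    return true;
  }

  void MarkDone() {
    std::lock_guard<std::mutex> lock(mu_);
    done_ = true;
    not_empty_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable not_full_, not_empty_;
  std::deque<std::vector<char>> queue_;
  size_t max_blocks_;
  bool done_ = false;
};

int main(int argc, char** argv) {
  if (argc != 2) { std::cerr << "usage: reader <file>\n"; return 1; }
  const size_t kBlockSize = 1 << 20;  // 1 MB blocks (arbitrary for the sketch)
  BlockQueue queue(4);                // bounded read-ahead of four blocks

  std::thread reader([&] {
    std::ifstream in(argv[1], std::ios::binary);
    std::vector<char> block(kBlockSize);
    while (in.read(block.data(), block.size()) || in.gcount() > 0) {
      block.resize(static_cast<size_t>(in.gcount()));
      queue.Push(block);
      block.assign(kBlockSize, 0);
    }
    queue.MarkDone();
  });

  size_t total = 0;
  std::vector<char> block;
  while (queue.Pop(&block)) total += block.size();  // stand-in for real work
  reader.join();
  std::cout << "read " << total << " bytes\n";
}
```

The point of the design is that the reader thread always knows exactly what to fetch next, whereas a memory-mapped file leaves read-ahead decisions to the kernel.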
35 Performance comparisons are omitted because we were unable to compile and run MSRLM on recent versions of Linux. [sent-75, score-0.051]
36 SRILM (Stolcke, 2002) estimates modified Kneser-Ney models by storing n-grams in RAM. [sent-76, score-0.094]
37 Figure 2: Data flow in the estimation pipeline. [sent-77, score-0.154]
38 Normalization has two threads per order: summing and division. [sent-78, score-0.072]
39 It also offers a disk-based pipeline for initial steps (i. [sent-80, score-0.118]
40 However, the later steps store in RAM all n-grams that survived count pruning. [sent-83, score-0.142]
41 IRSTLM (Federico et al., 2008) does not implement modified Kneser-Ney but rather an approximation dubbed “improved Kneser-Ney” (or “modified shift-beta” depending on the version). [sent-86, score-0.142]
42 3 Estimation Pipeline: Estimation has four streaming passes: counting, adjusting counts, normalization, and interpolation. [sent-89, score-0.257]
43 1 Counting: For a language model of order N, this step counts all N-grams (with length exactly N) by streaming through the corpus. [sent-93, score-0.298]
44 Words near the beginning of sentence also form N-grams padded by the begin-of-sentence marker <s> (possibly repeated multiple times). [sent-94, score-0.12]
45 The end-of-sentence marker </s> is appended to each sentence and acts like a normal token. [sent-95, score-0.108]
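A minimal sketch of this step's N-gram enumeration, assuming whitespace-tokenized input, N−1 copies of <s> padding, and an in-memory result (the real pipeline streams each N-gram into the counting hash table instead):

```cpp
// Sketch: enumerate all N-grams of a tokenized sentence, padding the start
// with <s> (repeated) and appending </s>. Illustrative only; the paper's
// pipeline streams these to a hash table / disk rather than a vector.
#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> NGrams(const std::vector<std::string>& sentence,
                                size_t order) {
  // Pad: N-1 copies of <s> in front, one </s> at the end (assumed convention).
  std::vector<std::string> padded(order - 1, "<s>");
  padded.insert(padded.end(), sentence.begin(), sentence.end());
  padded.push_back("</s>");

  std::vector<std::string> grams;
  for (size_t i = 0; i + order <= padded.size(); ++i) {
    std::string gram = padded[i];
    for (size_t j = 1; j < order; ++j) gram += ' ' + padded[i + j];
    grams.push_back(gram);
  }
  return grams;
}

int main() {
  for (const std::string& g : NGrams({"this", "is", "a", "test"}, 3)) {
    std::cout << g << "\n";
  }
  // Prints: "<s> <s> this", "<s> this is", "this is a", "is a test",
  //         "a test </s>"
}
```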
46 Unpruned N-gram counts are sufficient, so lower-order n-grams (n < N) are not counted. [sent-96, score-0.117]
47 Even pruned models require unpruned N-gram counts to compute smoothing statistics. [sent-97, score-0.431]
48 Token strings are written to disk and a 64-bit MurmurHash of each token is retained in RAM. (Footnote: this hash table is the only part of the pipeline that can grow. [sent-99, score-0.38]
49 Users can specify an estimated vocabulary size for memory budgeting. [sent-100, score-0.15]
50 In future work, we plan to support local vocabularies with renumbering. [sent-101, score-0.047]
51 Figure 3: In suffix order, the last word is primary. [sent-102, score-0.059]
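The two orders can be written as comparators over word positions: in suffix order the last word is primary (positions N, N−1, …, 1), while in context order the penultimate word is primary and the last word is only a tie-breaker (positions N−1, …, 1, then N). The sketch below assumes n-grams stored as word vectors of equal length; the actual pipeline compares fixed-width vocabulary identifiers in packed records.

```cpp
// Comparators for the two sort orders described around Figure 3. Suffix order:
// compare words from the last position leftward. Context order: compare the
// context (positions N-1 down to 1) first, then the last word as tie-breaker.
// N-grams are word vectors here; the real pipeline compares packed word IDs.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

using NGram = std::vector<std::string>;  // all n-grams assumed same length

bool SuffixLess(const NGram& a, const NGram& b) {
  for (size_t i = a.size(); i-- > 0;) {      // positions N, N-1, ..., 1
    if (a[i] != b[i]) return a[i] < b[i];
  }
  return false;
}

bool ContextLess(const NGram& a, const NGram& b) {
  for (size_t i = a.size() - 1; i-- > 0;) {  // positions N-1, ..., 1
    if (a[i] != b[i]) return a[i] < b[i];
  }
  return a.back() < b.back();                // last word breaks ties
}

int main() {
  std::vector<NGram> grams = {{"Z", "B", "A"}, {"B", "A", "B"}, {"Z", "B", "B"}};
  std::sort(grams.begin(), grams.end(), SuffixLess);
  for (const NGram& g : grams) std::cout << g[0] << g[1] << g[2] << "  ";
  std::cout << "(suffix order)\n";
  std::sort(grams.begin(), grams.end(), ContextLess);
  for (const NGram& g : grams) std::cout << g[0] << g[1] << g[2] << "  ";
  std::cout << "(context order)\n";
}
```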
52 Counts are combined in a hash table and spilled to disk when a fixed amount of memory is full. [sent-105, score-0.405]
53 Merge sort also combines identical N-grams (Bitton and DeWitt, 1983). [sent-106, score-0.051]
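Putting the last two sentences together, here is a simplified count–spill–merge sketch: counts accumulate in a bounded hash table, full tables are flushed as sorted runs (kept in memory here purely for brevity, where the paper spills to disk), and a final merge sums the counts of identical N-grams. The class name, sizes, and key format are assumptions for the example, not the pipeline's actual record layout.

```cpp
// Sketch of count-spill-merge: combine counts in a bounded hash table, flush
// sorted runs when it fills, then merge runs while summing identical keys.
// Runs are held in memory here purely for brevity; the paper spills to disk.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Run = std::vector<std::pair<std::string, uint64_t>>;  // sorted by key

class SpillingCounter {
 public:
  explicit SpillingCounter(size_t max_entries) : max_entries_(max_entries) {}

  void Add(const std::string& ngram) {
    if (table_.size() >= max_entries_ && !table_.count(ngram)) Spill();
    ++table_[ngram];
  }

  // Merge all runs, summing the counts of identical N-grams.
  Run Finish() {
    Spill();
    std::map<std::string, uint64_t> merged;  // ordered, standing in for a k-way merge
    for (const Run& run : runs_)
      for (const auto& kv : run) merged[kv.first] += kv.second;
    return Run(merged.begin(), merged.end());
  }

 private:
  void Spill() {
    if (table_.empty()) return;
    Run run(table_.begin(), table_.end());
    std::sort(run.begin(), run.end());
    runs_.push_back(std::move(run));
    table_.clear();
  }

  size_t max_entries_;
  std::unordered_map<std::string, uint64_t> table_;
  std::vector<Run> runs_;
};

int main() {
  SpillingCounter counter(2);  // tiny memory budget to force spills
  for (const char* g : {"a b c", "a b c", "b c d", "c d e", "a b c"})
    counter.Add(g);
  for (const auto& kv : counter.Finish())
    std::cout << kv.first << "\t" << kv.second << "\n";
  // a b c  3
  // b c d  1
  // c d e  1
}
```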
54 2 Adjusting Counts: The counts c are replaced with adjusted counts a. [sent-108, score-0.234]
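For reference, the usual modified Kneser-Ney adjustment keeps raw counts at the highest order and replaces lower-order counts with continuation counts. A common statement of the rule (this follows the standard Kneser-Ney definition; the paper's exact treatment of <s> should be checked against the original) is
$$a(w_1^n) = \begin{cases} c(w_1^n) & \text{if } n = N \text{ or } w_1 = \text{<s>} \\ \left|\{\, v : c(v\, w_1^n) > 0 \,\}\right| & \text{otherwise,} \end{cases}$$
i.e., a lower-order entry is credited once for each distinct word that precedes it rather than with its raw frequency.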
wordName wordTfidf (topN-words)
[('filesystem', 0.31), ('reducers', 0.31), ('stupid', 0.31), ('backoff', 0.228), ('disk', 0.19), ('streaming', 0.181), ('mapreduce', 0.167), ('ram', 0.152), ('unpruned', 0.152), ('counting', 0.13), ('network', 0.129), ('copies', 0.125), ('mapreduces', 0.124), ('msrlm', 0.124), ('brants', 0.123), ('counts', 0.117), ('hash', 0.111), ('mappers', 0.11), ('memory', 0.104), ('smoothing', 0.104), ('fault', 0.101), ('modified', 0.094), ('estimation', 0.094), ('billion', 0.09), ('layer', 0.086), ('pipeline', 0.079), ('heafield', 0.078), ('merge', 0.078), ('adjusting', 0.076), ('passes', 0.072), ('threads', 0.072), ('write', 0.072), ('optional', 0.071), ('arrows', 0.071), ('normalization', 0.067), ('marker', 0.065), ('ivan', 0.064), ('cmu', 0.062), ('estimate', 0.061), ('reading', 0.061), ('flow', 0.06), ('suffix', 0.059), ('pruned', 0.058), ('distributed', 0.056), ('wn', 0.056), ('trillion', 0.055), ('penultimate', 0.055), ('dewitt', 0.055), ('yandex', 0.055), ('padded', 0.055), ('workaround', 0.055), ('zb', 0.055), ('crichton', 0.055), ('ort', 0.055), ('schalkwyk', 0.055), ('pruning', 0.055), ('stolcke', 0.055), ('files', 0.054), ('options', 0.051), ('sort', 0.051), ('delays', 0.051), ('forbes', 0.051), ('ghemawat', 0.051), ('moscow', 0.051), ('reducer', 0.051), ('irstlm', 0.051), ('arpa', 0.051), ('compile', 0.051), ('contend', 0.051), ('chelba', 0.051), ('scalably', 0.051), ('survived', 0.048), ('send', 0.048), ('waiting', 0.048), ('pkoehn', 0.048), ('cheaper', 0.048), ('practitioners', 0.048), ('dubbed', 0.048), ('local', 0.047), ('estimated', 0.046), ('simpler', 0.045), ('read', 0.044), ('appended', 0.043), ('russia', 0.043), ('thick', 0.043), ('machines', 0.043), ('ahead', 0.042), ('collapse', 0.042), ('tolerance', 0.04), ('dean', 0.04), ('faster', 0.04), ('mapping', 0.039), ('steps', 0.039), ('scaled', 0.039), ('kneser', 0.039), ('mation', 0.039), ('dedicated', 0.039), ('uk', 0.038), ('avenue', 0.038), ('wf', 0.038)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 308 acl-2013-Scalable Modified Kneser-Ney Language Model Estimation
Author: Kenneth Heafield ; Ivan Pouzyrevsky ; Jonathan H. Clark ; Philipp Koehn
Abstract: We present an efficient algorithm to estimate large modified Kneser-Ney models including interpolation. Streaming and sorting enables the algorithm to scale to much larger models by using a fixed amount of RAM and variable amount of disk. Using one machine with 140 GB RAM for 2.8 days, we built an unpruned model on 126 billion tokens. Machine translation experiments with this model show improvement of 0.8 BLEU point over constrained systems for the 2013 Workshop on Machine Translation task in three language pairs. Our algorithm is also faster for small models: we estimated a model on 302 million tokens using 7.7% of the RAM and 14.0% of the wall time taken by SRILM. The code is open source as part of KenLM.
2 0.17943686 325 acl-2013-Smoothed marginal distribution constraints for language modeling
Author: Brian Roark ; Cyril Allauzen ; Michael Riley
Abstract: We present an algorithm for re-estimating parameters of backoff n-gram language models so as to preserve given marginal distributions, along the lines of wellknown Kneser-Ney (1995) smoothing. Unlike Kneser-Ney, our approach is designed to be applied to any given smoothed backoff model, including models that have already been heavily pruned. As a result, the algorithm avoids issues observed when pruning Kneser-Ney models (Siivola et al., 2007; Chelba et al., 2010), while retaining the benefits of such marginal distribution constraints. We present experimental results for heavily pruned backoff ngram models, and demonstrate perplexity and word error rate reductions when used with various baseline smoothing methods. An open-source version of the algorithm has been released as part of the OpenGrm ngram library.1
Author: Heike Adel ; Ngoc Thang Vu ; Tanja Schultz
Abstract: In this paper, we investigate the application of recurrent neural network language models (RNNLM) and factored language models (FLM) to the task of language modeling for Code-Switching speech. We present a way to integrate part-of-speech tags (POS) and language information (LID) into these models which leads to significant improvements in terms of perplexity. Furthermore, a comparison between RNNLMs and FLMs and a detailed analysis of perplexities on the different backoff levels are performed. Finally, we show that recurrent neural networks and factored language models can be combined using linear interpolation to achieve the best performance. The final combined language model provides 37.8% relative improvement in terms of perplexity on the SEAME development set and a relative improvement of 32.7% on the evaluation set compared to the traditional n-gram language model. Index Terms: multilingual speech processing, code switching, language modeling, recurrent neural networks, factored language models
4 0.070441402 307 acl-2013-Scalable Decipherment for Machine Translation via Hash Sampling
Author: Sujith Ravi
Abstract: In this paper, we propose a new Bayesian inference method to train statistical machine translation systems using only nonparallel corpora. Following a probabilistic decipherment approach, we first introduce a new framework for decipherment training that is flexible enough to incorporate any number/type of features (besides simple bag-of-words) as side-information used for estimating translation models. In order to perform fast, efficient Bayesian inference in this framework, we then derive a hash sampling strategy that is inspired by the work of Ahmed et al. (2012). The new translation hash sampler enables us to scale elegantly to complex models (for the first time) and large vocabulary/corpora sizes. We show empirical results on the OPUS data—our method yields the best BLEU scores compared to existing approaches, while achieving significant computational speedups (several orders faster). We also report, for the first time, BLEU score results for a large-scale MT task using only non-parallel data (EMEA corpus).
5 0.067985132 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network
Author: Nan Yang ; Shujie Liu ; Mu Li ; Ming Zhou ; Nenghai Yu
Abstract: In this paper, we explore a novel bilingual word alignment approach based on DNN (Deep Neural Network), which has been proven to be very effective in various machine learning tasks (Collobert et al., 2011). We describe in detail how we adapt and extend the CD-DNNHMM (Dahl et al., 2012) method introduced in speech recognition to the HMMbased word alignment model, in which bilingual word embedding is discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences. While being capable to model the rich bilingual correspondence, our method generates a very compact model with much fewer parameters. Experiments on a large scale EnglishChinese word alignment task show that the proposed method outperforms the HMM and IBM model 4 baselines by 2 points in F-score.
6 0.067735508 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT
7 0.064774074 326 acl-2013-Social Text Normalization using Contextual Graph Random Walks
8 0.063371658 216 acl-2013-Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language
9 0.059902683 113 acl-2013-Derivational Smoothing for Syntactic Distributional Semantics
10 0.056781054 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation
11 0.054416962 257 acl-2013-Natural Language Models for Predicting Programming Comments
12 0.053857788 251 acl-2013-Mr. MIRA: Open-Source Large-Margin Structured Learning on MapReduce
13 0.051417269 37 acl-2013-Adaptive Parser-Centric Text Normalization
14 0.050491679 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data
15 0.048194434 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation
16 0.047984928 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
17 0.047064263 120 acl-2013-Dirt Cheap Web-Scale Parallel Text from the Common Crawl
18 0.045767959 247 acl-2013-Modeling of term-distance and term-occurrence information for improving n-gram language model performance
19 0.044206612 41 acl-2013-Aggregated Word Pair Features for Implicit Discourse Relation Disambiguation
20 0.042919062 219 acl-2013-Learning Entity Representation for Entity Disambiguation
topicId topicWeight
[(0, 0.115), (1, -0.017), (2, 0.038), (3, 0.003), (4, -0.007), (5, -0.023), (6, 0.03), (7, -0.0), (8, -0.024), (9, -0.012), (10, -0.049), (11, 0.0), (12, -0.016), (13, -0.08), (14, -0.029), (15, -0.058), (16, -0.068), (17, -0.009), (18, -0.005), (19, -0.093), (20, 0.02), (21, 0.011), (22, -0.001), (23, 0.012), (24, -0.011), (25, -0.075), (26, 0.054), (27, 0.009), (28, 0.014), (29, -0.024), (30, -0.051), (31, -0.063), (32, -0.115), (33, 0.056), (34, 0.03), (35, -0.111), (36, 0.088), (37, -0.064), (38, -0.028), (39, -0.007), (40, -0.088), (41, 0.02), (42, -0.11), (43, -0.006), (44, -0.033), (45, 0.058), (46, -0.041), (47, -0.031), (48, 0.049), (49, 0.04)]
simIndex simValue paperId paperTitle
same-paper 1 0.9409219 308 acl-2013-Scalable Modified Kneser-Ney Language Model Estimation
Author: Kenneth Heafield ; Ivan Pouzyrevsky ; Jonathan H. Clark ; Philipp Koehn
Abstract: We present an efficient algorithm to estimate large modified Kneser-Ney models including interpolation. Streaming and sorting enables the algorithm to scale to much larger models by using a fixed amount of RAM and variable amount of disk. Using one machine with 140 GB RAM for 2.8 days, we built an unpruned model on 126 billion tokens. Machine translation experiments with this model show improvement of 0.8 BLEU point over constrained systems for the 2013 Workshop on Machine Translation task in three language pairs. Our algorithm is also faster for small models: we estimated a model on 302 million tokens using 7.7% of the RAM and 14.0% of the wall time taken by SRILM. The code is open source as part of KenLM.
2 0.81803346 325 acl-2013-Smoothed marginal distribution constraints for language modeling
Author: Brian Roark ; Cyril Allauzen ; Michael Riley
Abstract: We present an algorithm for re-estimating parameters of backoff n-gram language models so as to preserve given marginal distributions, along the lines of wellknown Kneser-Ney (1995) smoothing. Unlike Kneser-Ney, our approach is designed to be applied to any given smoothed backoff model, including models that have already been heavily pruned. As a result, the algorithm avoids issues observed when pruning Kneser-Ney models (Siivola et al., 2007; Chelba et al., 2010), while retaining the benefits of such marginal distribution constraints. We present experimental results for heavily pruned backoff ngram models, and demonstrate perplexity and word error rate reductions when used with various baseline smoothing methods. An open-source version of the algorithm has been released as part of the OpenGrm ngram library.1
3 0.69342935 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data
Author: David Kauchak
Abstract: In this paper we examine language modeling for text simplification. Unlike some text-to-text translation tasks, text simplification is a monolingual translation task allowing for text in both the input and output domain to be used for training the language model. We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of text from each. We evaluate the models intrinsically with perplexity and extrinsically on the lexical simplification task from SemEval 2012. We find that a combined model using both simplified and normal English data achieves a 23% improvement in perplexity and a 24% improvement on the lexical simplification task over a model trained only on simple data. Post-hoc analysis shows that the additional unsimplified data provides better coverage for unseen and rare n-grams.
Author: Tze Yuang Chong ; Rafael E. Banchs ; Eng Siong Chng ; Haizhou Li
Abstract: In this paper, we explore the use of distance and co-occurrence information of word-pairs for language modeling. We attempt to extract this information from history-contexts of up to ten words in size, and found it complements well the n-gram model, which inherently suffers from data scarcity in learning long history-contexts. Evaluated on the WSJ corpus, bigram and trigram model perplexity were reduced up to 23.5% and 14.0%, respectively. Compared to the distant bigram, we show that word-pairs can be more effectively modeled in terms of both distance and occurrence. 1
Author: Heike Adel ; Ngoc Thang Vu ; Tanja Schultz
Abstract: In this paper, we investigate the application of recurrent neural network language models (RNNLM) and factored language models (FLM) to the task of language modeling for Code-Switching speech. We present a way to integrate part-of-speech tags (POS) and language information (LID) into these models which leads to significant improvements in terms of perplexity. Furthermore, a comparison between RNNLMs and FLMs and a detailed analysis of perplexities on the different backoff levels are performed. Finally, we show that recurrent neural networks and factored language models can be combined using linear interpolation to achieve the best performance. The final combined language model provides 37.8% relative improvement in terms of perplexity on the SEAME development set and a relative improvement of 32.7% on the evaluation set compared to the traditional n-gram language model. Index Terms: multilingual speech processing, code switching, language modeling, recurrent neural networks, factored language models
6 0.55215317 390 acl-2013-Word surprisal predicts N400 amplitude during reading
7 0.54173315 322 acl-2013-Simple, readable sub-sentences
8 0.54147261 3 acl-2013-A Comparison of Techniques to Automatically Identify Complex Words.
9 0.50906205 35 acl-2013-Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation
10 0.50151354 216 acl-2013-Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language
11 0.42913568 37 acl-2013-Adaptive Parser-Centric Text Normalization
12 0.4174189 149 acl-2013-Exploring Word Order Universals: a Probabilistic Graphical Model Approach
13 0.40996602 257 acl-2013-Natural Language Models for Predicting Programming Comments
14 0.40643817 326 acl-2013-Social Text Normalization using Contextual Graph Random Walks
15 0.4058809 250 acl-2013-Models of Translation Competitions
16 0.40415898 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
17 0.39438981 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
18 0.39180937 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network
19 0.39034587 349 acl-2013-The mathematics of language learning
20 0.3887296 381 acl-2013-Variable Bit Quantisation for LSH
topicId topicWeight
[(0, 0.086), (6, 0.016), (11, 0.053), (24, 0.032), (26, 0.06), (35, 0.074), (40, 0.339), (42, 0.073), (48, 0.044), (70, 0.027), (88, 0.043), (90, 0.022), (95, 0.059)]
simIndex simValue paperId paperTitle
same-paper 1 0.79922885 308 acl-2013-Scalable Modified Kneser-Ney Language Model Estimation
Author: Kenneth Heafield ; Ivan Pouzyrevsky ; Jonathan H. Clark ; Philipp Koehn
Abstract: We present an efficient algorithm to estimate large modified Kneser-Ney models including interpolation. Streaming and sorting enables the algorithm to scale to much larger models by using a fixed amount of RAM and variable amount of disk. Using one machine with 140 GB RAM for 2.8 days, we built an unpruned model on 126 billion tokens. Machine translation experiments with this model show improvement of 0.8 BLEU point over constrained systems for the 2013 Workshop on Machine Translation task in three language pairs. Our algorithm is also faster for small models: we estimated a model on 302 million tokens using 7.7% of the RAM and 14.0% of the wall time taken by SRILM. The code is open source as part of KenLM.
2 0.75983703 94 acl-2013-Coordination Structures in Dependency Treebanks
Author: Martin Popel ; David Mareček ; Jan Štěpánek ; Daniel Zeman ; Zdeněk Žabokrtský
Abstract: Paratactic syntactic structures are notoriously difficult to represent in dependency formalisms. This has painful consequences such as high frequency of parsing errors related to coordination. In other words, coordination is a pending problem in dependency analysis of natural languages. This paper tries to shed some light on this area by bringing a systematizing view of various formal means developed for encoding coordination structures. We introduce a novel taxonomy of such approaches and apply it to treebanks across a typologically diverse range of 26 languages. In addition, empirical observations on convertibility between selected styles of representations are shown too.
3 0.72547722 163 acl-2013-From Natural Language Specifications to Program Input Parsers
Author: Tao Lei ; Fan Long ; Regina Barzilay ; Martin Rinard
Abstract: We present a method for automatically generating input parsers from English specifications of input file formats. We use a Bayesian generative model to capture relevant natural language phenomena and translate the English specification into a specification tree, which is then translated into a C++ input parser. We model the problem as a joint dependency parsing and semantic role labeling task. Our method is based on two sources of information: (1) the correlation between the text and the specification tree and (2) noisy supervision as determined by the success of the generated C++ parser in reading input examples. Our results show that our approach achieves 80.0% F-Score accuracy compared to an F-Score of 66.7% produced by a state-of-the-art semantic parser on a dataset of input format specifications from the ACM International Collegiate Programming Contest (which were written in English for humans with no intention of providing support for automated processing).1
4 0.7148034 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models
Author: Matthew R. Gormley ; Jason Eisner
Abstract: Many models in NLP involve latent variables, such as unknown parses, tags, or alignments. Finding the optimal model parameters is then usually a difficult nonconvex optimization problem. The usual practice is to settle for local optimization methods such as EM or gradient ascent. We explore how one might instead search for a global optimum in parameter space, using branch-and-bound. Our method would eventually find the global maximum (up to a user-specified ε) if run for long enough, but at any point can return a suboptimal solution together with an upper bound on the global maximum. As an illustrative case, we study a generative model for dependency parsing. We search for the maximum-likelihood model parameters and corpus parse, subject to posterior constraints. We show how to formulate this as a mixed integer quadratic programming problem with nonlinear constraints. We use the Reformulation Linearization Technique to produce convex relaxations during branch-and-bound. Although these techniques do not yet provide a practical solution to our instance of this NP-hard problem, they sometimes find better solutions than Viterbi EM with random restarts, in the same time.
5 0.62818217 235 acl-2013-Machine Translation Detection from Monolingual Web-Text
Author: Yuki Arase ; Ming Zhou
Abstract: We propose a method for automatically detecting low-quality Web-text translated by statistical machine translation (SMT) systems. We focus on the phrase salad phenomenon that is observed in existing SMT results and propose a set of computationally inexpensive features to effectively detect such machine-translated sentences from a large-scale Web-mined text. Unlike previous approaches that require bilingual data, our method uses only monolingual text as input; therefore it is applicable for refining data produced by a variety of Web-mining activities. Evaluation results show that the proposed method achieves an accuracy of 95.8% for sentences and 80.6% for text in noisy Web pages.
6 0.57797819 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
7 0.5144673 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation
8 0.46291956 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
9 0.45828038 225 acl-2013-Learning to Order Natural Language Texts
10 0.4552702 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
11 0.45493928 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing
12 0.4549222 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
13 0.45444223 331 acl-2013-Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
14 0.45415568 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation
15 0.45395157 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation
16 0.45380637 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
17 0.45317268 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
18 0.45105225 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
19 0.45068818 183 acl-2013-ICARUS - An Extensible Graphical Search Tool for Dependency Treebanks
20 0.45005363 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization