emnlp emnlp2012 emnlp2012-75 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Qing Dou ; Kevin Knight
Abstract: We apply slice sampling to Bayesian decipherment and use our new decipherment framework to improve out-of-domain machine translation. Compared with the state-of-the-art algorithm, our approach is highly scalable and produces better results, which allows us to decipher ciphertext with billions of tokens and hundreds of thousands of word types with high accuracy. We decipher a large amount of monolingual data to improve out-of-domain translation and achieve significant gains of up to 3.8 BLEU points.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We apply slice sampling to Bayesian decipherment and use our new decipherment framework to improve out-of-domain machine translation. [sent-2, score-1.149]
2 Compared with the state-of-the-art algorithm, our approach is highly scalable and produces better results, which allows us to decipher ciphertext with billions of tokens and hundreds of thousands of word types with high accuracy. [sent-3, score-0.491]
3 We decipher a large amount of monolingual data to improve out-of-domain translation and achieve significant gains of up to 3.8 BLEU points. [sent-4, score-0.175]
4 1 Introduction Nowadays, state-of-the-art statistical machine translation (SMT) systems are built using large amounts of bilingual parallel corpora. [sent-6, score-0.107]
5 Those corpora are used to estimate probabilities of word-to-word translation, word sequence rearrangement, and even syntactic transformation. [sent-7, score-0.031]
6 Unfortunately, as parallel corpora are expensive and not available for every domain, performance of SMT systems drops significantly when translating out-of-domain texts (Callison-Burch et al. [sent-8, score-0.062]
7 In general, it is easier to obtain in-domain monolingual corpora. [sent-10, score-0.05]
8 Is it possible to use domain specific monolingual data to improve an MT system trained on parallel texts from a different domain? [sent-11, score-0.138]
9 Some researchers have attempted to do this by adding a domain specific dictionary (Wu et al. [sent-12, score-0.047]
10 , 2008), or mining unseen words (Daumé and Jagarlamudi, 2011) using one of several translation lexicon induction techniques (Haghighi et al. [sent-13, score-0.075]
11 However, a dictionary is not always available, and it is difficult to assign probabilities to a translation lexicon. [sent-15, score-0.108]
12 (Ravi and Knight, 2011b) have shown that one can use decipherment to learn a full translation model from non-parallel data. [sent-16, score-0.48]
13 Their approach is able to find translations, and assign probabilities to them. [sent-17, score-0.031]
14 First of all, the corpus they use to build the translation system has a very small vocabulary. [sent-19, score-0.056]
15 Secondly, although their algorithm is able to handle word substitution ciphers with limited vocabulary, its deciphering accuracy is low. [sent-20, score-0.742]
16 The contributions of this work are: • We improve previous decipherment work by introducing a more efficient sampling algorithm. [sent-21, score-0.536]
17 In experiments, our new method improves deciphering accuracy from 82. [sent-22, score-0.197]
18 1% on (Ravi and Knight, 2011b)’s domain specific data set. [sent-24, score-0.026]
19 Furthermore, we also solve a very large word substitution cipher built from the English Gigaword corpus and achieve 92. [sent-25, score-0.823]
20 With the ability to handle a much larger vocabulary, we learn a domain specific translation table from a large amount of monolingual data and use the translation table to improve out-of-domain machine translation. [sent-27, score-0.157]
21 In experiments, we observe significant gains of up to 3.8 BLEU points. [sent-28, score-0.02]
22 Unlike previous works, the translation table we build from monolingual data contains not only unseen words but also words seen in parallel data. [sent-30, score-0.155]
23 2 Word Substitution Ciphers Before we present our new decipherment framework, we quickly review word substitution decipherment. [sent-33, score-0.683]
24 Recently, there has been an increasing interest in decipherment work (Ravi and Knight, 2011a; Ravi and Knight, 2008). [sent-34, score-0.424]
25 While letter substitution ciphers can be solved easily, nobody has been able to solve a word substitution cipher with high accuracy. [sent-35, score-1.41]
26 As shown in Figure 1, a word substitution cipher is generated by replacing each word in a natural language (plaintext) sequence with a cipher token according to a substitution table. [sent-36, score-1.542]
27 The mapping in the table is deterministic: each plaintext word type is only encoded with one unique cipher token. [sent-37, score-0.823]
28 Solving a word substitution cipher means recovering the original plaintext from the ciphertext without knowing the substitution table. [sent-38, score-1.591]
29 The only thing we rely on is knowledge about the underlying language. [sent-39, score-0.019]
30 Figure 1: Encoding and Decipherment of a Word Substitution Cipher. How can we solve a word substitution cipher? [sent-40, score-0.329]
31 The approach is similar to those taken by cryptanalysts who try to recover keys that convert encrypted texts to readable texts. [sent-41, score-0.079]
32 Suppose we observe a large cipher string f and want to decipher it into English e. [sent-42, score-0.626]
33 We can follow the work in (Ravi and Knight, 2011b) and assume that the cipher string f is generated in the following way: • Generate English plaintext sequence e1, e2. [sent-43, score-0.856]
34 • Replace each English plaintext token ei with a cipher token fi with probability P(fi|ei). [sent-47, score-1.037]
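To make the encoding step concrete, here is a minimal sketch (not the authors' code) that builds a deterministic substitution table over a toy vocabulary and enciphers a plaintext sequence; the probabilistic parts of the story, P(e) and P(fi|ei), are omitted.

import random

def build_substitution_table(plain_vocab, seed=0):
    # Deterministic table: each plaintext word type maps to one unique cipher token.
    rng = random.Random(seed)
    cipher_tokens = ["c%d" % i for i in range(len(plain_vocab))]
    rng.shuffle(cipher_tokens)
    return dict(zip(sorted(plain_vocab), cipher_tokens))

def encipher(plaintext_tokens, table):
    # Replace each plaintext token e_i with its cipher token f_i.
    return [table[e] for e in plaintext_tokens]

plaintext = "the man saw the dog".split()
table = build_substitution_table(set(plaintext))
print(encipher(plaintext, table))  # e.g. ['c3', 'c1', 'c2', 'c3', 'c0']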
35 Based on the above generative story, we write the probability of the cipher string f as: P(f) = ∑e P(e) · ∏i=1..n Pθ(fi|ei) (1) We use this equation as an objective function for maximum likelihood training. [sent-48, score-0.597]
36 In the equation, P(e) is given by an ngram language model, which is trained using a large amount of monolingual texts. [sent-49, score-0.096]
37 The rest of the task is to manipulate channel probabilities Pθ(fi|ei) so that the probability of the observed texts Pθ(f) is maximized. [sent-50, score-0.191]
38 , 2006), or Bayesian decipherment (Ravi and Knight, 2011a) to solve the problem. [sent-52, score-0.494]
39 However, unlike letter substitution ciphers, word substitution ciphers pose much greater challenges to algorithm scalability. [sent-53, score-0.837]
40 In the world of word substitution ciphers, both V and N are very large, making these approaches impractical. [sent-55, score-0.259]
41 However, the modified algorithms are only an approximation of the original algorithms and produce poor deciphering accuracy, and they are still unable to handle very large scale ciphers. [sent-57, score-0.241]
42 To address the above problems, we propose the following two new improvements to previous decipherment methods. [sent-58, score-0.424]
43 • We apply slice sampling (Neal, 2000) to scale up to ciphers with a very large vocabulary. [sent-59, score-0.586]
44 • Instead of deciphering using the original ciphertext, we break the ciphertext into bigrams, collect their counts, and use the bigrams with their counts for decipherment. [sent-60, score-0.488]
45 The new improvements allow us to solve a word substitution cipher with billions of tokens and hundreds of thousands of word types. [sent-61, score-0.964]
46 Through better approximation, we achieve a significant increase in deciphering accuracy. [sent-62, score-0.197]
47 3 Slice Sampling for Bayesian Decipherment In this section, we first give an introduction to Bayesian decipherment and then describe how to use slice sampling for it. [sent-64, score-0.725]
48 It is very attractive for problems like word substitution ciphers for the following reasons. [sent-68, score-0.541]
49 First, there are no memory bottlenecks as compared to EM, which has an O(N · V2) space complexity. [sent-69, score-0.028]
50 Each sampling operation involves changing a plaintext token ei, which has V possible choices, where V is the plaintext vocabulary size, and the final sample is chosen with probability P(d)/∑n=1..V P(dn). [sent-73, score-0.898]
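As a rough illustration (the helper functions lm_score and channel_score below are hypothetical placeholders, not the paper's code), one such Gibbs operation scores all V plaintext types for a position and then draws the new token from the normalized scores, which is why each operation is expensive when V is large.

import random

def gibbs_resample_position(i, e, f, vocab, lm_score, channel_score, rng=random):
    # Score every possible plaintext type for position i.
    scores = []
    for cand in vocab:
        e[i] = cand
        scores.append(lm_score(e, i) * channel_score(f[i], cand))
    # Draw the new token with probability proportional to its score.
    r = rng.uniform(0.0, sum(scores))
    acc = 0.0
    for cand, s in zip(vocab, scores):
        acc += s
        if acc >= r:
            e[i] = cand
            break
    return e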
51 2 Slice Sampling With Gibbs sampling, one has to evaluate all possible plaintext word types (10k—1M) for each sample decision. [sent-75, score-0.38]
52 This becomes intractable when the vocabulary is large and the ciphertext is long. [sent-76, score-0.277]
53 Slice sampling (Neal, 2000) can solve this problem by automatically adjusting the number of samples to be considered for each sampling operation. [sent-77, score-0.388]
54 Suppose the derivation probability for the current sample is P(current s). [sent-78, score-0.135]
55 Then slice sampling draws a sample in two steps: • Select a threshold T uniformly from the range {0, P(current s)}. [sent-79, score-0.419]
56 • Draw a new sample new s uniformly from a pool of candidate samples: {new s | P(new s) > T}. [sent-80, score-0.111]
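A minimal sketch of these two steps, assuming a score(sample) function proportional to the derivation probability and a candidates(current) enumerator (both hypothetical placeholders, not from the paper):

import random

def slice_sample_step(current, score, candidates, rng=random):
    # Step 1: pick a threshold uniformly below the current sample's score.
    threshold = rng.uniform(0.0, score(current))
    # Step 2: draw uniformly among candidates whose score beats the threshold.
    pool = [c for c in candidates(current) if score(c) > threshold]
    return rng.choice(pool) if pool else current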
57 From the above two steps, we can see that given a threshold T, we only need to consider those samples whose probability is higher than the threshold. [sent-81, score-0.136]
58 This will lead to a significant reduction in the number of samples to be considered, if the probabilities of most samples are below T. [sent-82, score-0.175]
59 An obvious way to collect candidate samples is to go over all possible samples and record those with probabilities higher than T. [sent-84, score-0.195]
60 According to Equation 1, the probability of the current sample is given by a language model P(e) and a channel model P(c|e). [sent-87, score-0.199]
61 The language model is usually an ngram language model. [sent-88, score-0.072]
62 Suppose our current sample current s contains English tokens X, Y, and Z at positions i−1, i, and i+1, respectively. [sent-89, score-0.111]
63 Let ci be the cipher token at position i. [sent-90, score-0.594]
64 To obtain a new sample, we just need to change token Y to Y′. [sent-91, score-0.036]
65 Since the rest of the sample stays the same, we only need to calculate the probability of the trigram1: P(XY′Z) and the channel model probability: P(ci|Y′), and multiply them together as shown in Equation 4. [sent-92, score-0.245]
66 P(XY′Z) · P(ci|Y′) (4) (Footnote 1: The probability is given by a bigram language model.) [sent-93, score-0.037]
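A sketch of this local score (Equation 4), computing the trigram probability from a bigram language model as the footnote describes; bigram_prob and channel_prob are assumed lookup functions, not the paper's API.

def candidate_score(x, y_new, z, c_i, bigram_prob, channel_prob):
    # P(X Y' Z) approximated under the bigram LM: P(Y'|X) * P(Z|Y').
    lm = bigram_prob(x, y_new) * bigram_prob(y_new, z)
    # Multiply by the channel probability P(c_i | Y').
    return lm * channel_prob(c_i, y_new)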
67 In slice sampling, each sampling operation has two steps. [sent-94, score-0.324]
68 For the first step, we choose a threshold T uniformly between 0 and P(XYZ) · P(ci|Y). [sent-95, score-0.067]
69 First, we notice that two types of Y′ are more likely to pass the threshold T: (1) those that have a very high trigram probability, and (2) those that have high channel model probability. [sent-97, score-0.183]
70 To find candidates that have high trigram probability, we build sorted lists ranked by P(XY′Z), which can be precomputed off-line. [sent-98, score-0.087]
71 We only keep the top K English words for each of the sorted lists. [sent-99, score-0.031]
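One possible realization of this candidate generation (a sketch under the assumption of a bigram LM; the paper's exact data structures may differ): precompute, for each (X, Z) context, the K plaintext types with the highest LM score, and union them at sampling time with the types that have high channel probability for the observed cipher token. The channel_candidates function is a hypothetical placeholder.

import heapq

def precompute_topk(contexts, vocab, bigram_prob, k=500):
    # Offline: for each (X, Z) context, keep the K best Y' ranked by P(Y'|X) * P(Z|Y').
    table = {}
    for x, z in contexts:
        scored = ((bigram_prob(x, y) * bigram_prob(y, z), y) for y in vocab)
        table[(x, z)] = [y for _, y in heapq.nlargest(k, scored)]
    return table

def candidate_pool(x, z, c_i, topk_table, channel_candidates):
    # Candidates likely to pass the slice threshold: high-LM types for this context
    # plus types with high channel probability for the cipher token c_i.
    return set(topk_table.get((x, z), [])) | set(channel_candidates(c_i))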
wordName wordTfidf (topN-words)
[('cipher', 0.494), ('decipherment', 0.424), ('plaintext', 0.329), ('ciphers', 0.263), ('substitution', 0.259), ('ciphertext', 0.23), ('deciphering', 0.197), ('slice', 0.189), ('ravi', 0.146), ('knight', 0.142), ('sampling', 0.112), ('decipher', 0.099), ('channel', 0.091), ('xy', 0.089), ('ei', 0.088), ('bayesian', 0.086), ('ci', 0.072), ('samples', 0.072), ('solve', 0.07), ('translation', 0.056), ('fi', 0.053), ('sample', 0.051), ('billions', 0.051), ('monolingual', 0.05), ('vocabulary', 0.047), ('ngram', 0.046), ('neal', 0.042), ('uniformly', 0.04), ('hundreds', 0.038), ('suppose', 0.038), ('probability', 0.037), ('letter', 0.037), ('token', 0.036), ('gibbs', 0.035), ('equation', 0.033), ('string', 0.033), ('texts', 0.032), ('thousands', 0.032), ('probabilities', 0.031), ('sorted', 0.031), ('parallel', 0.03), ('nvp', 0.028), ('doo', 0.028), ('bottlenecks', 0.028), ('eisx', 0.028), ('aantd', 0.028), ('nobody', 0.028), ('precomputed', 0.028), ('trigram', 0.028), ('threshold', 0.027), ('derivation', 0.027), ('smt', 0.027), ('domain', 0.026), ('em', 0.026), ('readable', 0.026), ('jagarlamudi', 0.026), ('rapp', 0.026), ('nod', 0.026), ('dou', 0.026), ('aisn', 0.026), ('nowadays', 0.026), ('nthe', 0.026), ('stays', 0.024), ('southern', 0.024), ('crp', 0.024), ('cache', 0.024), ('qing', 0.024), ('operation', 0.023), ('handle', 0.023), ('bleu', 0.023), ('draw', 0.023), ('ple', 0.022), ('tthe', 0.022), ('adjusting', 0.022), ('oa', 0.022), ('keys', 0.021), ('multiply', 0.021), ('gram', 0.021), ('ein', 0.021), ('approximation', 0.021), ('art', 0.021), ('dictionary', 0.021), ('tokens', 0.02), ('bigrams', 0.02), ('gains', 0.02), ('recovering', 0.02), ('pool', 0.02), ('po', 0.02), ('fortunately', 0.02), ('current', 0.02), ('collect', 0.02), ('unseen', 0.019), ('satisfies', 0.019), ('sv', 0.019), ('thing', 0.019), ('fm', 0.019), ('attractive', 0.019), ('pose', 0.019), ('theoretically', 0.018), ('yk', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 75 emnlp-2012-Large Scale Decipherment for Out-of-Domain Machine Translation
Author: Qing Dou ; Kevin Knight
Abstract: We apply slice sampling to Bayesian decipherment and use our new decipherment framework to improve out-of-domain machine translation. Compared with the state of the art algorithm, our approach is highly scalable and produces better results, which allows us to decipher ciphertext with billions of tokens and hundreds of thousands of word types with high accuracy. We decipher a large amount ofmonolingual data to improve out-of-domain translation and achieve significant gains of up to 3.8 BLEU points.
2 0.076546058 43 emnlp-2012-Exact Sampling and Decoding in High-Order Hidden Markov Models
Author: Simon Carter ; Marc Dymetman ; Guillaume Bouchard
Abstract: We present a method for exact optimization and sampling from high order Hidden Markov Models (HMMs), which are generally handled by approximation techniques. Motivated by adaptive rejection sampling and heuristic search, we propose a strategy based on sequentially refining a lower-order language model that is an upper bound on the true model we wish to decode and sample from. This allows us to build tractable variable-order HMMs. The ARPA format for language models is extended to enable an efficient use of the max-backoff quantities required to compute the upper bound. We evaluate our approach on two problems: a SMS-retrieval task and a POS tagging experiment using 5-gram models. Results show that the same approach can be used for exact optimization and sampling, while explicitly constructing only a fraction of the total implicit state-space.
3 0.066804618 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules
Author: Abby Levenberg ; Chris Dyer ; Phil Blunsom
Abstract: We describe a nonparametric model and corresponding inference algorithm for learning Synchronous Context Free Grammar derivations for parallel text. The model employs a Pitman-Yor Process prior which uses a novel base distribution over synchronous grammar rules. Through both synthetic grammar induction and statistical machine translation experiments, we show that our model learns complex translational correspondences— including discontiguous, many-to-many alignments—and produces competitive translation results. Further, inference is efficient and we present results on significantly larger corpora than prior work.
4 0.050666336 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
Author: Dan Garrette ; Jason Baldridge
Abstract: Past work on learning part-of-speech taggers from tag dictionaries and raw data has reported good results, but the assumptions made about those dictionaries are often unrealistic: due to historical precedents, they assume access to information about labels in the raw and test sets. Here, we demonstrate ways to learn hidden Markov model taggers from incomplete tag dictionaries. Taking the MINGREEDY algorithm (Ravi et al., 2010) as a starting point, we improve it with several intuitive heuristics. We also define a simple HMM emission initialization that takes advantage of the tag dictionary and raw data to capture both the openness of a given tag and its estimated prevalence in the raw data. Altogether, our augmentations produce improvements to performance over the original MIN-GREEDY algorithm for both English and Italian data.
5 0.04409343 8 emnlp-2012-A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes
Author: Robert Lindsey ; William Headden ; Michael Stipicevic
Abstract: Topic models traditionally rely on the bag-of-words assumption. In data mining applications, this often results in end-users being presented with inscrutable lists of topical unigrams, single words inferred as representative of their topics. In this article, we present a hierarchical generative probabilistic model of topical phrases. The model simultaneously infers the location, length, and topic of phrases within a corpus and relaxes the bag-of-words assumption within phrases by using a hierarchy of Pitman-Yor processes. We use Markov chain Monte Carlo techniques for approximate inference in the model and perform slice sampling to learn its hyperparameters. We show via an experiment on human subjects that our model finds substantially better, more interpretable topical phrases than do competing models.
6 0.043092918 91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling
7 0.043068092 15 emnlp-2012-Active Learning for Imbalanced Sentiment Classification
8 0.042181898 108 emnlp-2012-Probabilistic Finite State Machines for Regression-based MT Evaluation
9 0.039513268 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation
10 0.037395779 96 emnlp-2012-Name Phylogeny: A Generative Model of String Variation
11 0.035934739 25 emnlp-2012-Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation
12 0.034108344 132 emnlp-2012-Universal Grapheme-to-Phoneme Prediction Over Latin Alphabets
13 0.031888872 94 emnlp-2012-Multiple Aspect Summarization Using Integer Linear Programming
14 0.031437255 54 emnlp-2012-Forced Derivation Tree based Model Training to Statistical Machine Translation
15 0.031272043 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation
16 0.031070322 39 emnlp-2012-Enlarging Paraphrase Collections through Generalization and Instantiation
17 0.030583374 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM
18 0.028335176 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
19 0.027703699 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence
20 0.027693767 86 emnlp-2012-Locally Training the Log-Linear Model for SMT
topicId topicWeight
[(0, 0.103), (1, -0.035), (2, -0.037), (3, 0.016), (4, -0.083), (5, 0.005), (6, -0.003), (7, -0.023), (8, 0.053), (9, -0.052), (10, 0.081), (11, 0.063), (12, 0.014), (13, -0.046), (14, -0.115), (15, 0.046), (16, -0.05), (17, 0.005), (18, -0.001), (19, 0.126), (20, -0.068), (21, 0.08), (22, -0.055), (23, 0.099), (24, 0.065), (25, 0.004), (26, -0.034), (27, 0.03), (28, -0.011), (29, -0.066), (30, -0.001), (31, 0.057), (32, -0.254), (33, 0.018), (34, -0.149), (35, -0.006), (36, -0.228), (37, 0.045), (38, -0.037), (39, 0.066), (40, -0.007), (41, 0.006), (42, -0.052), (43, 0.123), (44, 0.04), (45, 0.144), (46, -0.04), (47, -0.087), (48, 0.086), (49, 0.038)]
simIndex simValue paperId paperTitle
same-paper 1 0.94929647 75 emnlp-2012-Large Scale Decipherment for Out-of-Domain Machine Translation
Author: Qing Dou ; Kevin Knight
Abstract: We apply slice sampling to Bayesian decipherment and use our new decipherment framework to improve out-of-domain machine translation. Compared with the state-of-the-art algorithm, our approach is highly scalable and produces better results, which allows us to decipher ciphertext with billions of tokens and hundreds of thousands of word types with high accuracy. We decipher a large amount of monolingual data to improve out-of-domain translation and achieve significant gains of up to 3.8 BLEU points.
2 0.56626475 43 emnlp-2012-Exact Sampling and Decoding in High-Order Hidden Markov Models
Author: Simon Carter ; Marc Dymetman ; Guillaume Bouchard
Abstract: We present a method for exact optimization and sampling from high order Hidden Markov Models (HMMs), which are generally handled by approximation techniques. Motivated by adaptive rejection sampling and heuristic search, we propose a strategy based on sequentially refining a lower-order language model that is an upper bound on the true model we wish to decode and sample from. This allows us to build tractable variable-order HMMs. The ARPA format for language models is extended to enable an efficient use of the max-backoff quantities required to compute the upper bound. We evaluate our approach on two problems: a SMS-retrieval task and a POS tagging experiment using 5-gram models. Results show that the same approach can be used for exact optimization and sampling, while explicitly constructing only a fraction of the total implicit state-space.
3 0.47003004 15 emnlp-2012-Active Learning for Imbalanced Sentiment Classification
Author: Shoushan Li ; Shengfeng Ju ; Guodong Zhou ; Xiaojun Li
Abstract: Active learning is a promising way for sentiment classification to reduce the annotation cost. In this paper, we focus on the imbalanced class distribution scenario for sentiment classification, wherein the number of positive samples is quite different from that of negative samples. This scenario posits new challenges to active learning. To address these challenges, we propose a novel active learning approach, named co-selecting, by taking both the imbalanced class distribution issue and uncertainty into account. Specifically, our co-selecting approach employs two feature subspace classifiers to collectively select most informative minority-class samples for manual annotation by leveraging a certainty measurement and an uncertainty measurement, and in the meanwhile, automatically label most informative majority-class samples, to reduce humanannotation efforts. Extensive experiments across four domains demonstrate great potential and effectiveness of our proposed co-selecting approach to active learning for imbalanced sentiment classification. 1
4 0.38904604 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules
Author: Abby Levenberg ; Chris Dyer ; Phil Blunsom
Abstract: We describe a nonparametric model and corresponding inference algorithm for learning Synchronous Context Free Grammar derivations for parallel text. The model employs a Pitman-Yor Process prior which uses a novel base distribution over synchronous grammar rules. Through both synthetic grammar induction and statistical machine translation experiments, we show that our model learns complex translational correspondences— including discontiguous, many-to-many alignments—and produces competitive translation results. Further, inference is efficient and we present results on significantly larger corpora than prior work.
5 0.37759802 108 emnlp-2012-Probabilistic Finite State Machines for Regression-based MT Evaluation
Author: Mengqiu Wang ; Christopher D. Manning
Abstract: Accurate and robust metrics for automatic evaluation are key to the development of statistical machine translation (MT) systems. We first introduce a new regression model that uses a probabilistic finite state machine (pFSM) to compute weighted edit distance as predictions of translation quality. We also propose a novel pushdown automaton extension of the pFSM model for modeling word swapping and cross alignments that cannot be captured by standard edit distance models. Our models can easily incorporate a rich set of linguistic features, and automatically learn their weights, eliminating the need for ad-hoc parameter tuning. Our methods achieve state-of-the-art correlation with human judgments on two different prediction tasks across a diverse set of standard evaluations (NIST OpenMT06,08; WMT0608).
6 0.36130881 91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling
7 0.35022837 118 emnlp-2012-Source Language Adaptation for Resource-Poor Machine Translation
8 0.29661819 33 emnlp-2012-Discovering Diverse and Salient Threads in Document Collections
9 0.27699122 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
10 0.2433157 133 emnlp-2012-Unsupervised PCFG Induction for Grounded Language Learning with Highly Ambiguous Supervision
11 0.23589785 96 emnlp-2012-Name Phylogeny: A Generative Model of String Variation
12 0.23428801 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
13 0.20652346 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
14 0.20547572 50 emnlp-2012-Extending Machine Translation Evaluation Metrics with Lexical Cohesion to Document Level
15 0.20251739 86 emnlp-2012-Locally Training the Log-Linear Model for SMT
16 0.17992505 54 emnlp-2012-Forced Derivation Tree based Model Training to Statistical Machine Translation
17 0.17593794 114 emnlp-2012-Revisiting the Predictability of Language: Response Completion in Social Media
18 0.17574224 94 emnlp-2012-Multiple Aspect Summarization Using Integer Linear Programming
19 0.17250216 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
20 0.17109238 26 emnlp-2012-Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings
topicId topicWeight
[(14, 0.02), (16, 0.03), (21, 0.389), (25, 0.014), (34, 0.078), (45, 0.01), (60, 0.063), (63, 0.03), (65, 0.029), (70, 0.023), (74, 0.081), (76, 0.026), (79, 0.014), (80, 0.032), (81, 0.011), (86, 0.024), (95, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.67137617 75 emnlp-2012-Large Scale Decipherment for Out-of-Domain Machine Translation
Author: Qing Dou ; Kevin Knight
Abstract: We apply slice sampling to Bayesian decipherment and use our new decipherment framework to improve out-of-domain machine translation. Compared with the state-of-the-art algorithm, our approach is highly scalable and produces better results, which allows us to decipher ciphertext with billions of tokens and hundreds of thousands of word types with high accuracy. We decipher a large amount of monolingual data to improve out-of-domain translation and achieve significant gains of up to 3.8 BLEU points.
2 0.33180246 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
Author: Kewei Tu ; Vasant Honavar
Abstract: We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. The approach is based on the observation that natural language is remarkably unambiguous in the sense that only a tiny portion of the large number of possible parses of a natural language sentence are syntactically valid. We incorporate an inductive bias into grammar learning in favor of grammars that lead to unambiguous parses on natural language sentences. The resulting family of algorithms includes the expectation-maximization algorithm (EM) and its variant, Viterbi EM, as well as a so-called softmax-EM algorithm. The softmax-EM algorithm can be implemented with a simple and computationally efficient extension to standard EM. In our experiments of unsupervised dependency grammar learning, we show that unambiguity regularization is beneficial to learning, and in combination with annealing (of the regularization strength) and sparsity priors it leads to improvement over the current state of the art.
3 0.33084652 7 emnlp-2012-A Novel Discriminative Framework for Sentence-Level Discourse Analysis
Author: Shafiq Joty ; Giuseppe Carenini ; Raymond Ng
Abstract: We propose a complete probabilistic discriminative framework for performing sentence-level discourse analysis. Our framework comprises a discourse segmenter, based on a binary classifier, and a discourse parser, which applies an optimal CKY-like parsing algorithm to probabilities inferred from a Dynamic Conditional Random Field. We show on two corpora that our approach outperforms the state-of-the-art, often by a wide margin.
4 0.32289937 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
Author: Jayant Krishnamurthy ; Tom Mitchell
Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms of weak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependency-parsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-the-art accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.
5 0.32097945 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT
Author: Shujie Liu ; Chi-Ho Li ; Mu Li ; Ming Zhou
Abstract: The training of most syntactic SMT approaches involves two essential components, word alignment and monolingual parser. In the current state of the art these two components are mutually independent, thus causing problems like lack of rule generalization, and violation of syntactic correspondence in translation rules. In this paper, we propose two ways of re-training monolingual parser with the target of maximizing the consistency between parse trees and alignment matrices. One is targeted self-training with a simple evaluation function; the other is based on training data selection from forced alignment of bilingual data. We also propose an auxiliary method for boosting alignment quality, by symmetrizing alignment matrices with respect to parse trees. The best combination of these novel methods achieves 3 Bleu point gain in an IWSLT task and more than 1 Bleu point gain in NIST tasks. 1
6 0.32097208 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules
7 0.31906351 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction
8 0.3147772 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
9 0.31296921 54 emnlp-2012-Forced Derivation Tree based Model Training to Statistical Machine Translation
10 0.31227592 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
11 0.31149411 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing
12 0.31010193 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP
13 0.3085537 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation
14 0.3085508 67 emnlp-2012-Inducing a Discriminative Parser to Optimize Machine Translation Reordering
15 0.30707315 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
16 0.30699992 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation
17 0.3069545 45 emnlp-2012-Exploiting Chunk-level Features to Improve Phrase Chunking
18 0.30636373 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
19 0.30591461 122 emnlp-2012-Syntactic Surprisal Affects Spoken Word Duration in Conversational Contexts
20 0.30519342 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction