emnlp emnlp2013 emnlp2013-54 knowledge-graph by maker-knowledge-mining

54 emnlp-2013-Decipherment with a Million Random Restarts


Source: pdf

Author: Taylor Berg-Kirkpatrick ; Dan Klein

Abstract: This paper investigates the utility and effect of running numerous random restarts when using EM to attack decipherment problems. We find that simple decipherment models are able to crack homophonic substitution ciphers with high accuracy if a large number of random restarts are used but almost completely fail with only a few random restarts. For particularly difficult homophonic ciphers, we find that big gains in accuracy are to be had by running upwards of 100K random restarts, which we accomplish efficiently using a GPU-based parallel implementation. We run a series of experiments using millions of random restarts in order to investigate other empirical properties of decipherment problems, including the famously uncracked Zodiac 340.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: This paper investigates the utility and effect of running numerous random restarts when using EM to attack decipherment problems. [sent-3, score-0.752]

2 We find that simple decipherment models are able to crack homophonic substitution ciphers with high accuracy if a large number of random restarts are used but almost completely fail with only a few random restarts. [sent-4, score-1.337]

3 For particularly difficult homophonic ciphers, we find that big gains in accuracy are to be had by running upwards of 100K random restarts, which we accomplish efficiently using a GPU-based parallel implementation. [sent-5, score-0.392]

4 We run a series of experiments using millions of random restarts in order to investigate other empirical properties of decipherment problems, including the famously uncracked Zodiac 340. [sent-6, score-0.729]

5 1 Introduction: What can a million restarts do for decipherment? [sent-7, score-0.474]

6 EM frequently gets stuck in local optima, so running between ten and a hundred random restarts is common practice (Knight et al. [sent-8, score-0.641]

7 But how important are random restarts, and how many does it take to saturate gains in accuracy? [sent-10, score-1.135]

8 We look at both Zodiac 408, a famous homophonic substitution cipher, and a more difficult homophonic cipher constructed to match properties of the famously unsolved Zodiac 340. [sent-12, score-1.034]

9 Gains in accuracy saturate after only a hundred random restarts for Zodiac 408, but for the constructed cipher we see large gains in accuracy even as we scale the number of random restarts up into the hundreds of thousands. [sent-13, score-1.716]

10 In both cases the difference between few and many random restarts is the difference between almost complete failure and successful decipherment. [sent-14, score-0.543]

11 We also find that millions of random restarts can be helpful for performing exploratory analysis. [sent-15, score-0.575]

12 We look at some empirical properties of decipherment problems, visualizing the distribution of local optima encountered by EM both in a successful decipherment of a homophonic cipher and in an unsuccessful attempt to decipher Zodiac 340. [sent-16, score-1.073]

13 Finally, we attack a series of ciphers generated to match properties of Zodiac 340 and use the results to argue that Zodiac 340 is likely not a homophonic cipher under the commonly assumed linearization order. [sent-17, score-0.942]

14 2 Decipherment Model: Various types of ciphers have been tackled by the NLP community with great success (Knight et al. [sent-18, score-0.193]

15 Many of these approaches learn an encryption key by maximizing the score of the decrypted message under a language model. [sent-21, score-0.092]

16 We focus on homophonic substitution ciphers, where the encryption key is a 1-to-many mapping from a plaintext alphabet to a cipher alphabet. [sent-22, score-0.815]

17 EM (Dempster et al., 1977) is used to learn the emission parameters of an HMM that has a character trigram language model as a backbone and the ciphertext as the observed sequence of emissions. [sent-25, score-0.125]

18 This means that we learn a multinomial over cipher symbols for each plaintext character, but do not learn transition parameters. [sent-26, score-0.558]

19 We predict the deciphered text using posterior decoding in the learned HMM. [sent-29, score-0.026]
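
To make the model just described concrete, here is a minimal sketch (not the authors' code; all function and variable names are illustrative assumptions) of one EM iteration for an HMM whose transitions are fixed by a character language model and whose per-character emission multinomials over cipher symbols are re-estimated from the ciphertext. For brevity it uses a bigram language model rather than the paper's trigram backbone.

```python
# Minimal sketch, not the paper's implementation: one EM iteration for an HMM with
# fixed LM transitions (bigram here; the paper uses a trigram backbone) and learned
# per-character emission multinomials over cipher symbols.
import numpy as np

def em_step(emit, trans, start, obs):
    """emit:  (num_chars, num_symbols) emission multinomials, one row per plaintext char
       trans: (num_chars, num_chars)   fixed LM transition matrix
       start: (num_chars,)             fixed start distribution
       obs:   list of cipher-symbol ids (the ciphertext)
       Returns (re-estimated emissions, log-likelihood under the current parameters)."""
    T, K = len(obs), emit.shape[0]
    alpha, beta, scale = np.zeros((T, K)), np.zeros((T, K)), np.zeros(T)

    # Forward pass with per-step scaling to avoid numerical underflow.
    alpha[0] = start * emit[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scaling factors.
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1]) / scale[t + 1]

    # E-step: posterior marginals over plaintext characters at every position.
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)

    # M-step: expected emission counts, renormalized row-wise.
    counts = np.zeros_like(emit)
    np.add.at(counts.T, obs, gamma)        # counts[k, obs[t]] += gamma[t, k]
    new_emit = counts / (counts.sum(axis=1, keepdims=True) + 1e-12)
    return new_emit, float(np.log(scale).sum())
```

Posterior decoding, as mentioned above, then amounts to taking, at each position, the plaintext character with the largest posterior marginal (the argmax over each row of gamma).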

20 2.1 Implementation: Running multiple random restarts means running EM to convergence multiple times, which can be computationally intensive; luckily, restarts can be run in parallel. [sent-31, score-1.035]

21 We implemented EM with parallel random restarts using the CUDA API (Nickolls et al. [sent-33, score-0.543]

22 With a GPU workstation, we can complete a million random restarts roughly a thousand times more quickly than we can complete the same computation with a serial implementation on a CPU. [sent-35, score-0.577]
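
The speedup reported here comes from the paper's CUDA implementation, which runs many EM instances on the GPU at once. As a rough CPU-side illustration only (run_one_restart is a hypothetical helper, not something from the paper), restarts are embarrassingly parallel and can simply be farmed out to worker processes, keeping whichever restart attains the best model score:

```python
# Illustration only, not the paper's GPU code: random restarts are independent, so they
# can be distributed across processes; run_one_restart is a hypothetical top-level
# function that maps a random seed to (model_score, decode) by running EM from that seed.
from concurrent.futures import ProcessPoolExecutor

def parallel_restarts(run_one_restart, seeds, max_workers=8):
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_one_restart, seeds))
    return max(results, key=lambda r: r[0])   # keep the highest-scoring restart
```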

23 3 Experiments: We ran experiments on several homophonic substitution ciphers: some produced by the infamous Zodiac killer and others that were automatically generated to be similar to the Zodiac ciphers. [sent-36, score-0.289]

24 In each of these experiments, we ran numerous random restarts; and in all cases we chose the random restart that attained the highest model score in order to produce the final decode. [sent-37, score-0.296]

25 3.1 Experimental Setup: The specifics of how random restarts are produced are usually considered a detail; however, in this work it is important to describe the process precisely. [sent-39, score-0.543]

26 In order to generate random restarts, we sampled emission parameters by drawing uniformly at random from the interval [0, 1] and then normalizing. [sent-40, score-0.223]
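
Putting that initialization scheme together with the best-model-score selection described earlier, a restart loop might look like the sketch below. It reuses the hypothetical em_step from the earlier snippet; the 200-iteration cap matches the setup reported in the paper, while everything else (names, argument layout) is an assumption.

```python
# Sketch only: draw emission parameters uniformly from [0, 1], renormalize each row,
# run EM for a fixed number of iterations, and keep the restart whose final model
# score (log-likelihood) is highest.
import numpy as np

def best_of_restarts(trans, start, obs, num_chars, num_symbols,
                     num_restarts=100, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    best_score, best_emit = -np.inf, None
    for _ in range(num_restarts):
        emit = rng.uniform(size=(num_chars, num_symbols))
        emit /= emit.sum(axis=1, keepdims=True)      # normalize rows to multinomials
        for _ in range(iters):
            emit, loglik = em_step(emit, trans, start, obs)
        if loglik > best_score:
            best_score, best_emit = loglik, emit
    return best_emit, best_score
```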

27 The corresponding distribution on the multinomial emission parameters is mildly concentrated at the center of the simplex. [sent-41, score-0.073]

28 For each random restart, we ran EM for 200 iterations. (Footnote: We used a single workstation with three NVIDIA GTX 580 GPUs.) [sent-42, score-0.102]

29 We also ran experiments where emission parameters were drawn from Dirichlet distributions with various concentration parameter settings. [sent-44, score-0.059]

30 We noticed little effect so long as the distribution did not favor the corners of the simplex. [sent-45, score-0.044]

31 If the distribution did favor the corners of the simplex, decipherment results deteriorated sharply. [sent-46, score-0.188]

32 Figure 1: Zodiac 408 cipher. [sent-47, score-0.543]

33 Accuracy by best model score and best model score vs. number of random restarts. [sent-48, score-0.04]

34 3.2 An Easy Cipher: Zodiac 408. Zodiac 408 is a homophonic cipher that is 408 characters long and contains 54 different cipher symbols. [sent-58, score-1.147]

35 Produced by the Zodiac killer, this cipher was solved, manually, by two amateur code-breakers a week after its release to the public in 1969. [sent-59, score-0.451]

36 Ravi and Knight (2011) were the first to crack Zodiac 408 using completely automatic methods. [sent-60, score-0.089]

37 In our first experiment, we compare a decode of Zodiac 408 using one random restart to a decode using 100 random restarts. [sent-61, score-0.317]

38 Random restarts have high variance. (Footnote: While this does not guarantee convergence, in practice 200 iterations seems to be sufficient for the problems we looked at.) [sent-62, score-0.487]

39 The interpolation between n-gram orders is uniform, and the interpolation between corpora favors the Zodiac corpus with weight 0. [sent-63, score-0.03]

40 Because of this variance, when we present the accuracy corresponding to a given number of restarts, we present an average over many bootstrap samples drawn from a set of one million random restarts. [sent-65, score-0.595]
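
A plausible way to compute such bootstrapped accuracy curves (an assumption about the evaluation procedure, not the authors' exact code) is to repeatedly resample k restarts from the completed pool and record the accuracy of the best-scoring restart in each sample:

```python
# Sketch: expected accuracy when only k restarts are available, estimated by bootstrap
# resampling from a large pool of completed restarts (e.g. one million of them).
import numpy as np

def expected_best_of_k(scores, accuracies, k, num_samples=1000, seed=0):
    """scores, accuracies: per-restart model scores and decode accuracies for the pool."""
    scores, accuracies = np.asarray(scores), np.asarray(accuracies)
    rng = np.random.default_rng(seed)
    picks = []
    for _ in range(num_samples):
        idx = rng.choice(len(scores), size=k, replace=True)    # one bootstrap sample
        picks.append(accuracies[idx[np.argmax(scores[idx])]])  # accuracy of its best restart
    return float(np.mean(picks))
```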

41 If we attack Zodiac 408 with a single random restart, on average we achieve an accuracy of 18%. [sent-66, score-0.16]

42 If we instead use 100 random restarts we achieve a much better average accuracy of 90%. [sent-67, score-0.58]

43 The accuracies for various numbers of random restarts are plotted in Figure 1. [sent-68, score-0.543]

44 Based on these results, we expect accuracy to increase by about 72% (absolute) when using 100 random restarts instead of a single random restart; however, using more than 100 random restarts for this particular cipher does not appear to be useful. [sent-69, score-1.658]

45 Also in Figure 1, we plot a related graph, this time showing the effect that random restarts have on the achieved model score. [sent-70, score-0.569]

46 By construction, the (maximum) model score must increase as we increase the number of random restarts. [sent-71, score-0.104]

47 We see that it quickly saturates in the same way that accuracy did. [sent-72, score-0.056]

48 This raises the question: have we actually achieved the globally optimal model score or have we only saturated the usefulness of random restarts? [sent-73, score-0.134]

49 We can’t prove that we have achieved the global optimum, but we can at least check that we have surpassed the model score achieved by EM when it is initialized with the gold encryption key. [sent-74, score-0.152]
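
That sanity check is straightforward to express given the pieces sketched above (again an assumption about the procedure, reusing the hypothetical em_step; gold_key and char_ids are illustrative names): initialize the emissions from the gold encryption key, run EM, and compare the resulting score with the best score found by random restarts.

```python
# Sketch: initialize emissions from the gold key (near-uniform over each character's true
# cipher symbols), run EM, and return the model score reached from that initialization.
import numpy as np

def gold_init_score(gold_key, char_ids, trans, start, obs, num_symbols, iters=200):
    """gold_key: plaintext char -> list of its true cipher-symbol ids (assumed format)
       char_ids: plaintext char -> row index in the emission matrix (assumed format)"""
    emit = np.full((len(char_ids), num_symbols), 1e-6)   # tiny floor for unused symbols
    for c, k in char_ids.items():
        emit[k, gold_key[c]] = 1.0
    emit /= emit.sum(axis=1, keepdims=True)
    for _ in range(iters):
        emit, loglik = em_step(emit, trans, start, obs)
    return loglik
```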

50 On Zodiac 408, if we initialize with the gold key, EM finds a local optimum with a model score of −1467. [sent-75, score-0.161]

51 The accuracy after gold initialization was 92%, while the accuracy of the best local optimum was only 89%. [sent-81, score-0.229]

52 This suggests that the global optimum may not be worth finding if we haven’t already found it. [sent-82, score-0.069]

53 From Figure 1, it appears that large increases in likelihood are correlated with increases in accuracy, but small improvements to high likelihoods (e.g. [sent-83, score-0.084]

54 the best local optimum versus the gold initialization) may not be. [sent-85, score-0.141]

55 Figure 2: Synth 340 cipher. [sent-87, score-0.543]

56 Accuracy by best model score and best model score vs. number of random restarts. [sent-88, score-0.04]

57 Zodiac 340 is the second cipher released by the Zodiac killer, and it remains unsolved to this day. [sent-93, score-0.486]

58 However, it is unknown whether Zodiac 340 is actually a homophonic cipher. [sent-94, score-0.232]

59 If it were a homophonic cipher we would certainly expect it to be harder than Zodiac 408 because Zodiac 340 is shorter (only 340 characters long) and at the same time has more cipher symbols: 63. [sent-95, score-1.164]

60 We sample a random consecutive sequence of 340 characters from our small Zodiac corpus and use this as our message (and, of course, remove this sequence from our language model training data). [sent-97, score-0.117]

61 We then generate an encryption key by assigning each of 63 cipher symbols to a single plain text character so that the number of cipher symbols mapped to each plaintext character is proportional to the frequency of that character in the message (this balancing makes the cipher more difficult). [sent-98, score-1.68]

62 Finally, we generate the actual ciphertext by randomly sampling a cipher token for each plain text token uniformly at random from the cipher symbols allowed for that to- ken under our generated key. [sent-99, score-1.076]
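
A hedged reconstruction of that generation procedure (details such as how leftover symbols are apportioned after rounding are assumptions, and the function name is illustrative) could look like this:

```python
# Sketch of generating a Synth-340-style homophonic cipher: assign each of num_symbols
# cipher symbols to one plaintext character, roughly in proportion to that character's
# frequency in the message, then encipher by sampling uniformly among allowed symbols.
import collections
import numpy as np

def make_homophonic_cipher(plaintext, num_symbols=63, seed=0):
    rng = np.random.default_rng(seed)
    counts = collections.Counter(plaintext)
    chars = sorted(counts)
    freqs = np.array([counts[c] for c in chars], dtype=float)
    alloc = np.maximum(1, np.round(num_symbols * freqs / freqs.sum()).astype(int))
    while alloc.sum() > num_symbols:      # trim rounding overshoot
        alloc[np.argmax(alloc)] -= 1
    while alloc.sum() < num_symbols:      # give leftover symbols to the most frequent char
        alloc[np.argmax(freqs)] += 1
    symbols = list(range(num_symbols))
    rng.shuffle(symbols)
    key, i = {}, 0
    for c, n in zip(chars, alloc):        # each char gets its block of homophones
        key[c], i = symbols[i:i + n], i + n
    ciphertext = [int(rng.choice(key[c])) for c in plaintext]
    return ciphertext, key
```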

63 Figure 3: Synth 340 cipher (histogram over log likelihood). [sent-101, score-0.025]

64 Histogram of the likelihoods of the local optima encountered by EM across 1M random restarts. [sent-102, score-0.332]

65 Several peaks are labeled with their average accuracy and a snippet of a decode. [sent-103, score-0.144]

66 For this cipher, there is an absolute gain in accuracy of about 9% between 100 random restarts and 100K random restarts. [sent-105, score-0.677]

67 This means that, even after tens of thousands of random restarts, EM is still finding new local optima with better likelihoods. [sent-107, score-0.237]

68 It also appears that, even for a short cipher like Synth 340, likelihood and accuracy are reasonably coupled. [sent-108, score-0.513]

69 We can visualize the distribution of local optima encountered by EM across 1M random restarts by plotting a histogram. [sent-109, score-0.757]

70 Figure 3 shows, for each range of likelihood, the number of random restarts that led to a local optimum with a model score in that range. [sent-110, score-0.682]

71 It is quickly visible that a few model scores are substantially more likely than all the rest. [sent-111, score-0.035]

72 This kind of sparsity might be expected if there were a small number of local optima that EM was extremely likely to find. [sent-112, score-0.153]

73 We can check whether the peaks of this histogram each correspond to a single local optimum or whether each is composed of multiple local optima that happen to have the same likelihood. [sent-113, score-0.428]

74 For the histogram bucket corresponding to a particular peak, we compute the average relative difference between each multinomial parameter and its mean. [sent-114, score-0.109]

75 The average relative difference for the highest peak in Figure 3 is 0. [sent-115, score-0.055]
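
One way to implement that check (an assumed reading of the measurement, not the authors' code) is to stack the emission parameters of all restarts whose final likelihood falls in a given histogram bucket and average the relative deviation of every parameter from the bucket mean:

```python
# Sketch: average relative difference between each multinomial parameter and its mean,
# over all restarts landing in one histogram bucket; small values suggest the bucket is
# one local optimum (or a collection of nearly identical ones).
import numpy as np

def avg_relative_difference(bucket_params):
    """bucket_params: (num_restarts_in_bucket, num_chars, num_symbols) emission params."""
    mean = bucket_params.mean(axis=0)
    rel = np.abs(bucket_params - mean) / (mean + 1e-12)
    return float(rel.mean())
```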

76 Figure 4: Zodiac 340 cipher (histogram over log likelihood). [sent-118, score-0.025]

77 Histogram of the likelihoods of the local optima encountered by EM across 1M random restarts. [sent-119, score-0.332]

78 These values are much smaller than the average relative difference between the means of these two peaks, 40%, indicating that the peaks do correspond to single local optima or collections of extremely similar local optima. [sent-120, score-0.269]

79 There are several very small peaks that have the highest model scores (the peak with the highest model score has a frequency of 90, which is too small to be visible in Figure 3). [sent-121, score-0.173]

80 The fact that these model scores are both high and rare is the reason we continue to see improvements to both accuracy and model score as we run numerous random restarts. [sent-122, score-0.16]

81 The two tallest peaks and the peak with highest model score are labeled with their average accuracy and a small snippet of a decode in Figure 3. [sent-123, score-0.266]

82 As mentioned, this cipher has never been cracked and may not be a homophonic cipher or even a valid cipher of any kind. [sent-127, score-1.353]

83 The reading order of the cipher, which consists of a grid of symbols, is unknown. [sent-128, score-0.021]

84 We make two arguments supporting the claim that Zodiac 340 is not a homophonic cipher with row-major reading order: the first is statistical, based on the success rate of attempts to crack similar synthetic ciphers; the second is qualitative, comparing distributions of local optimum likelihoods. [sent-129, score-0.929]

85 If Zodiac 340 is a homophonic cipher, should we expect to crack it? [sent-130, score-0.772]

86 In order to answer this question we generate 100 more ciphers in the same way we generated Synth 340. [sent-131, score-0.193]

87 We use 10K random restarts to attack each cipher, and compute accuracies by best model score. [sent-132, score-0.582]

88 The average accuracy across these 100 ciphers was 75% and the minimum accuracy was 36%. [sent-133, score-0.267]

89 All but two of the ciphers were deciphered with more than 51% accuracy, which is usually sufficient for a human to identify a decode as partially correct. [sent-134, score-0.266]

90 We attempted to crack Zodiac 340 using a rowmajor reading order and 1M random restarts, but the decode with best model score was nonsensical. [sent-135, score-0.261]

91 This outcome would be unlikely if Zodiac 340 were like our synthetic ciphers, so Zodiac 340 is probably not a homophonic cipher with a row-major order. [sent-136, score-0.7]

92 Of course, it could be a homophonic cipher with a different reading order. [sent-137, score-0.704]

93 In Figure 4, we show the histogram of model scores for the attempt to crack Zodiac 340. [sent-139, score-0.179]

94 We note that this histogram is strikingly different from the histogram for Synth 340. [sent-140, score-0.18]

95 Zodiac 340’s histogram is not as sparse, and the range of model scores is much smaller. [sent-141, score-0.09]

96 The sparsity of Synth 340’s histogram (but not Zodiac 340’s histogram) is typical of histograms corresponding to our set of 100 generated ciphers. [sent-142, score-0.09]

97 In particular, we found that the initializations that lead to the local optima with highest likelihoods are sometimes very rare, but finding them can be worthwhile; for the problems we looked at, local optima with high likelihoods also achieved high accuracies. [sent-144, score-0.482]

98 While the present experiments are on a very specific unsupervised learning problem, it is certainly reasonable to think that large-scale random restarts have potential more broadly. [sent-145, score-0.56]

99 In addition to improving search, large-scale restarts can also provide a novel perspective when performing exploratory analysis, here letting us argue in support of the hypothesis that Zodiac 340 is not a row-major homophonic cipher. [sent-146, score-0.721]

100 Maximum likelihood from incomplete data via the EM algorithm. [sent-157, score-0.025]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('zodiac', 0.578), ('restarts', 0.459), ('cipher', 0.451), ('homophonic', 0.232), ('ciphers', 0.193), ('synth', 0.133), ('decipherment', 0.131), ('optima', 0.103), ('histogram', 0.09), ('crack', 0.089), ('random', 0.084), ('em', 0.077), ('optimum', 0.069), ('peaks', 0.066), ('ravi', 0.059), ('likelihoods', 0.059), ('restart', 0.055), ('knight', 0.052), ('encryption', 0.052), ('plaintext', 0.052), ('local', 0.05), ('decode', 0.047), ('snippet', 0.041), ('emission', 0.041), ('peak', 0.039), ('attack', 0.039), ('character', 0.039), ('killer', 0.039), ('accuracy', 0.037), ('encountered', 0.036), ('symbols', 0.036), ('unsolved', 0.035), ('corners', 0.03), ('nickolls', 0.03), ('saturate', 0.03), ('surpassed', 0.03), ('substitution', 0.028), ('hundred', 0.028), ('ciphertext', 0.026), ('deciphered', 0.026), ('famously', 0.026), ('objectives', 0.025), ('likelihood', 0.025), ('kevin', 0.024), ('snyder', 0.024), ('sujith', 0.024), ('gold', 0.022), ('reading', 0.021), ('score', 0.02), ('message', 0.02), ('dempster', 0.02), ('running', 0.02), ('multinomial', 0.019), ('trigram', 0.019), ('numerous', 0.019), ('gains', 0.019), ('quickly', 0.019), ('exploratory', 0.018), ('bootstrapped', 0.018), ('ran', 0.018), ('synthetic', 0.017), ('certainly', 0.017), ('visible', 0.016), ('globally', 0.016), ('highest', 0.016), ('look', 0.015), ('million', 0.015), ('interpolation', 0.015), ('properties', 0.015), ('brants', 0.015), ('initialization', 0.014), ('favor', 0.014), ('taylor', 0.014), ('plain', 0.014), ('looked', 0.014), ('millions', 0.014), ('achieved', 0.014), ('uniformly', 0.014), ('problems', 0.014), ('convergence', 0.013), ('characters', 0.013), ('attacking', 0.013), ('cuda', 0.013), ('nvidia', 0.013), ('simt', 0.013), ('mildly', 0.013), ('anish', 0.013), ('deteriorated', 0.013), ('haven', 0.013), ('lute', 0.013), ('nishit', 0.013), ('optimally', 0.013), ('plotting', 0.013), ('rathod', 0.013), ('plot', 0.012), ('argue', 0.012), ('decipher', 0.012), ('kle', 0.012), ('visualize', 0.012), ('buck', 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 54 emnlp-2013-Decipherment with a Million Random Restarts

Author: Taylor Berg-Kirkpatrick ; Dan Klein

Abstract: This paper investigates the utility and effect of running numerous random restarts when using EM to attack decipherment problems. We find that simple decipherment models are able to crack homophonic substitution ciphers with high accuracy if a large number of random restarts are used but almost completely fail with only a few random restarts. For particularly difficult homophonic ciphers, we find that big gains in accuracy are to be had by running upwards of 100K random restarts, which we accomplish efficiently using a GPU-based parallel implementation. We run a series of experiments using millions of random restarts in order to investigate other empirical properties of decipherment problems, including the famously uncracked Zodiac 340.

2 0.14513485 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation

Author: Qing Dou ; Kevin Knight

Abstract: We introduce dependency relations into deciphering foreign languages and show that dependency relations help improve the state-ofthe-art deciphering accuracy by over 500%. We learn a translation lexicon from large amounts of genuinely non parallel data with decipherment to improve a phrase-based machine translation system trained with limited parallel data. In experiments, we observe BLEU gains of 1.2 to 1.8 across three different test sets.

3 0.11058 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction

Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky

Abstract: Many statistical learning problems in NLP call for local model search methods. But accuracy tends to suffer with current techniques, which often explore either too narrowly or too broadly: hill-climbers can get stuck in local optima, whereas samplers may be inefficient. We propose to arrange individual local optimizers into organized networks. Our building blocks are operators of two types: (i) transform, which suggests new places to search, via non-random restarts from already-found local optima; and (ii) join, which merges candidate solutions to find better optima. Experiments on grammar induction show that pursuing different transforms (e.g., discarding parts of a learned model or ignoring portions of training data) results in improvements. Groups of locally-optimal solutions can be further perturbed jointly, by constructing mixtures. Using these tools, we designed several modular dependency grammar induction networks of increasing complexity. Our complete sys- tem achieves 48.6% accuracy (directed dependency macro-average over all 19 languages in the 2006/7 CoNLL data) more than 5% higher than the previous state-of-the-art. —

4 0.025521893 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

Author: Xiaoqing Zheng ; Hanyang Chen ; Tianyu Xu

Abstract: This study explores the feasibility of performing Chinese word segmentation (CWS) and POS tagging by deep learning. We try to avoid task-specific feature engineering, and use deep layers of neural networks to discover relevant features to the tasks. We leverage large-scale unlabeled data to improve internal representation of Chinese characters, and use these improved representations to enhance supervised word segmentation and POS tagging models. Our networks achieved close to state-of-theart performance with minimal computational cost. We also describe a perceptron-style algorithm for training the neural networks, as an alternative to maximum-likelihood method, to speed up the training process and make the learning algorithm easier to be implemented.

5 0.025178123 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types

Author: Hrushikesh Mohapatra ; Siddhanth Jain ; Soumen Chakrabarti

Abstract: Web search can be enhanced in powerful ways if token spans in Web text are annotated with disambiguated entities from large catalogs like Freebase. Entity annotators need to be trained on sample mention snippets. Wikipedia entities and annotated pages offer high-quality labeled data for training and evaluation. Unfortunately, Wikipedia features only one-ninth the number of entities as Freebase, and these are a highly biased sample of well-connected, frequently mentioned “head” entities. To bring hope to “tail” entities, we broaden our goal to a second task: assigning types to entities in Freebase but not Wikipedia. The two tasks are synergistic: knowing the types of unfamiliar entities helps disambiguate mentions, and words in mention contexts help assign types to entities. We present TMI, a bipartite graphical model for joint type-mention inference. TMI attempts no schema integration or entity resolution, but exploits the above-mentioned synergy. In experiments involving 780,000 people in Wikipedia, 2.3 million people in Freebase, 700 million Web pages, and over 20 professional editors, TMI shows considerable annotation accuracy improvement (e.g., 70%) compared to baselines (e.g., 46%), especially for “tail” and emerging entities. We also compare with Google’s recent annotations of the same corpus with Freebase entities, and report considerable improvements within the people domain.

6 0.024892636 2 emnlp-2013-A Convex Alternative to IBM Model 2

7 0.02245629 159 emnlp-2013-Regularized Minimum Error Rate Training

8 0.022345027 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs

9 0.02171284 123 emnlp-2013-Learning to Rank Lexical Substitutions

10 0.020564664 8 emnlp-2013-A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability

11 0.020500928 138 emnlp-2013-Naive Bayes Word Sense Induction

12 0.018818395 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation

13 0.018151466 21 emnlp-2013-An Empirical Study Of Semi-Supervised Chinese Word Segmentation Using Co-Training

14 0.016959282 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation

15 0.016702106 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

16 0.01591613 72 emnlp-2013-Elephant: Sequence Labeling for Word and Sentence Segmentation

17 0.015785633 83 emnlp-2013-Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech

18 0.015078743 9 emnlp-2013-A Log-Linear Model for Unsupervised Text Normalization

19 0.014568635 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

20 0.014305782 70 emnlp-2013-Efficient Higher-Order CRFs for Morphological Tagging


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.06), (1, -0.022), (2, 0.001), (3, -0.005), (4, -0.024), (5, -0.01), (6, 0.006), (7, 0.023), (8, -0.006), (9, -0.007), (10, -0.006), (11, -0.01), (12, 0.021), (13, 0.019), (14, 0.019), (15, -0.032), (16, 0.005), (17, 0.016), (18, 0.042), (19, 0.027), (20, -0.006), (21, 0.049), (22, 0.079), (23, 0.117), (24, 0.037), (25, -0.041), (26, -0.153), (27, 0.043), (28, 0.114), (29, -0.157), (30, 0.023), (31, -0.174), (32, 0.057), (33, 0.056), (34, 0.014), (35, -0.142), (36, 0.042), (37, 0.122), (38, 0.052), (39, -0.308), (40, -0.263), (41, -0.01), (42, -0.107), (43, 0.097), (44, 0.092), (45, 0.074), (46, 0.021), (47, 0.094), (48, 0.001), (49, 0.045)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97065443 54 emnlp-2013-Decipherment with a Million Random Restarts

Author: Taylor Berg-Kirkpatrick ; Dan Klein

Abstract: This paper investigates the utility and effect of running numerous random restarts when using EM to attack decipherment problems. We find that simple decipherment models are able to crack homophonic substitution ciphers with high accuracy if a large number of random restarts are used but almost completely fail with only a few random restarts. For particularly difficult homophonic ciphers, we find that big gains in accuracy are to be had by running upwards of 100K random restarts, which we accomplish efficiently using a GPU-based parallel implementation. We run a series of experiments using millions of random restarts in order to investigate other empirical properties of decipherment problems, including the famously uncracked Zodiac 340.

2 0.55748481 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction

Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky

Abstract: Many statistical learning problems in NLP call for local model search methods. But accuracy tends to suffer with current techniques, which often explore either too narrowly or too broadly: hill-climbers can get stuck in local optima, whereas samplers may be inefficient. We propose to arrange individual local optimizers into organized networks. Our building blocks are operators of two types: (i) transform, which suggests new places to search, via non-random restarts from already-found local optima; and (ii) join, which merges candidate solutions to find better optima. Experiments on grammar induction show that pursuing different transforms (e.g., discarding parts of a learned model or ignoring portions of training data) results in improvements. Groups of locally-optimal solutions can be further perturbed jointly, by constructing mixtures. Using these tools, we designed several modular dependency grammar induction networks of increasing complexity. Our complete sys- tem achieves 48.6% accuracy (directed dependency macro-average over all 19 languages in the 2006/7 CoNLL data) more than 5% higher than the previous state-of-the-art. —

3 0.54743904 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation

Author: Qing Dou ; Kevin Knight

Abstract: We introduce dependency relations into deciphering foreign languages and show that dependency relations help improve the state-ofthe-art deciphering accuracy by over 500%. We learn a translation lexicon from large amounts of genuinely non parallel data with decipherment to improve a phrase-based machine translation system trained with limited parallel data. In experiments, we observe BLEU gains of 1.2 to 1.8 across three different test sets.

4 0.25659376 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation

Author: Ann Irvine ; Chris Quirk ; Hal Daume III

Abstract: When using a machine translation (MT) model trained on OLD-domain parallel data to translate NEW-domain text, one major challenge is the large number of out-of-vocabulary (OOV) and new-translation-sense words. We present a method to identify new translations of both known and unknown source language words that uses NEW-domain comparable document pairs. Starting with a joint distribution of source-target word pairs derived from the OLD-domain parallel corpus, our method recovers a new joint distribution that matches the marginal distributions of the NEW-domain comparable document pairs, while minimizing the divergence from the OLD-domain distribution. Adding learned translations to our French-English MT model results in gains of about 2 BLEU points over strong baselines.

5 0.23327473 138 emnlp-2013-Naive Bayes Word Sense Induction

Author: Do Kook Choe ; Eugene Charniak

Abstract: We introduce an extended naive Bayes model for word sense induction (WSI) and apply it to a WSI task. The extended model incorporates the idea the words closer to the target word are more relevant in predicting its sense. The proposed model is very simple yet effective when evaluated on SemEval-2010 WSI data. 1

6 0.22338215 129 emnlp-2013-Measuring Ideological Proportions in Political Speeches

7 0.21722433 203 emnlp-2013-With Blinkers on: Robust Prediction of Eye Movements across Readers

8 0.20235306 2 emnlp-2013-A Convex Alternative to IBM Model 2

9 0.18038891 173 emnlp-2013-Simulating Early-Termination Search for Verbose Spoken Queries

10 0.17704678 122 emnlp-2013-Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation

11 0.17354994 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types

12 0.17003822 195 emnlp-2013-Unsupervised Spectral Learning of WCFG as Low-rank Matrix Completion

13 0.15867963 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

14 0.15110455 58 emnlp-2013-Dependency Language Models for Sentence Completion

15 0.14958559 95 emnlp-2013-Identifying Multiple Userids of the Same Author

16 0.14761549 188 emnlp-2013-Tree Kernel-based Negation and Speculation Scope Detection with Structured Syntactic Parse Features

17 0.14567272 23 emnlp-2013-Animacy Detection with Voting Models

18 0.1434046 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs

19 0.14142685 8 emnlp-2013-A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability

20 0.13943717 190 emnlp-2013-Ubertagging: Joint Segmentation and Supertagging for English


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.017), (18, 0.025), (22, 0.031), (30, 0.067), (50, 0.022), (51, 0.118), (53, 0.41), (66, 0.037), (71, 0.019), (75, 0.024), (77, 0.047), (96, 0.021), (97, 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.71394837 54 emnlp-2013-Decipherment with a Million Random Restarts

Author: Taylor Berg-Kirkpatrick ; Dan Klein

Abstract: This paper investigates the utility and effect of running numerous random restarts when using EM to attack decipherment problems. We find that simple decipherment models are able to crack homophonic substitution ciphers with high accuracy if a large number of random restarts are used but almost completely fail with only a few random restarts. For particularly difficult homophonic ciphers, we find that big gains in accuracy are to be had by running upwards of 100K random restarts, which we accomplish efficiently using a GPU-based parallel implementation. We run a series of experiments using millions of random restarts in order to investigate other empirical properties of decipherment problems, including the famously uncracked Zodiac 340.

2 0.67071754 129 emnlp-2013-Measuring Ideological Proportions in Political Speeches

Author: Yanchuan Sim ; Brice D. L. Acree ; Justin H. Gross ; Noah A. Smith

Abstract: We seek to measure political candidates’ ideological positioning from their speeches. To accomplish this, we infer ideological cues from a corpus of political writings annotated with known ideologies. We then represent the speeches of U.S. Presidential candidates as sequences of cues and lags (filler distinguished only by its length in words). We apply a domain-informed Bayesian HMM to infer the proportions of ideologies each candidate uses in each campaign. The results are validated against a set of preregistered, domain expertauthored hypotheses.

3 0.66748488 185 emnlp-2013-Towards Situated Dialogue: Revisiting Referring Expression Generation

Author: Rui Fang ; Changsong Liu ; Lanbo She ; Joyce Y. Chai

Abstract: In situated dialogue, humans and agents have mismatched capabilities of perceiving the shared environment. Their representations of the shared world are misaligned. Thus referring expression generation (REG) will need to take this discrepancy into consideration. To address this issue, we developed a hypergraph-based approach to account for group-based spatial relations and uncertainties in perceiving the environment. Our empirical results have shown that this approach outperforms a previous graph-based approach with an absolute gain of 9%. However, while these graph-based approaches perform effectively when the agent has perfect knowledge or perception of the environment (e.g., 84%), they perform rather poorly when the agent has imperfect perception of the environment (e.g., 45%). This big performance gap calls for new solutions to REG that can mediate a shared perceptual basis in situated dialogue.

4 0.58618796 26 emnlp-2013-Assembling the Kazakh Language Corpus

Author: Olzhas Makhambetov ; Aibek Makazhanov ; Zhandos Yessenbayev ; Bakhyt Matkarimov ; Islam Sabyrgaliyev ; Anuar Sharafudinov

Abstract: This paper presents the Kazakh Language Corpus (KLC), which is one of the first attempts made within a local research community to assemble a Kazakh corpus. KLC is designed to be a large scale corpus containing over 135 million words and conveying five stylistic genres: literary, publicistic, official, scientific and informal. Along with its primary part KLC comprises such parts as: (i) annotated sub-corpus, containing segmented documents encoded in the eXtensible Markup Language (XML) that marks complete morphological, syntactic, and structural characteristics of texts; (ii) as well as a sub-corpus with the annotated speech data. KLC has a web-based corpus management system that helps to navigate the data and retrieve necessary information. KLC is also open for contributors, who are willing to make suggestions, donate texts and help with annotation of existing materials.

5 0.36138961 121 emnlp-2013-Learning Topics and Positions from Debatepedia

Author: Swapna Gottipati ; Minghui Qiu ; Yanchuan Sim ; Jing Jiang ; Noah A. Smith

Abstract: We explore Debatepedia, a communityauthored encyclopedia of sociopolitical debates, as evidence for inferring a lowdimensional, human-interpretable representation in the domain of issues and positions. We introduce a generative model positing latent topics and cross-cutting positions that gives special treatment to person mentions and opinion words. We evaluate the resulting representation’s usefulness in attaching opinionated documents to arguments and its consistency with human judgments about positions.

6 0.35197067 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

7 0.35192397 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

8 0.35011354 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

9 0.34986317 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

10 0.34954008 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

11 0.34930891 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

12 0.34783593 143 emnlp-2013-Open Domain Targeted Sentiment

13 0.34748712 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction

14 0.34731206 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

15 0.34625182 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

16 0.34496093 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation

17 0.34424004 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation

18 0.34297973 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)

19 0.34244648 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

20 0.34214628 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks