emnlp emnlp2011 emnlp2011-77 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Hall ; Dan Klein
Abstract: We present a system for the large scale induction of cognate groups. Our model explains the evolution of cognates as a sequence of mutations and innovations along a phylogeny. On the task of identifying cognates from over 21,000 words in 218 different languages from the Oceanic language family, our model achieves a cluster purity score over 91%, while maintaining pairwise recall over 62%.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a system for the large scale induction of cognate groups. [sent-3, score-0.574]
2 Our model explains the evolution of cognates as a sequence of mutations and innovations along a phylogeny. [sent-4, score-0.349]
3 On the task of identifying cognates from over 21,000 words in 218 different languages from the Oceanic language family, our model achieves a cluster purity score over 91%, while maintaining pairwise recall over 62%. [sent-5, score-0.331]
4 1 Introduction The critical first step in the reconstruction of an ancient language is the recovery of related cognate words in its descendants. [sent-6, score-0.725]
5 The traditional approach used by linguists—the comparative method—iterates between positing putative cognates and then identifying regular sound laws that explain correspondences between those words (Bloomfield, 1938). [sent-8, score-0.453]
6 Successful computational approaches have been developed for large-scale reconstruction of phylogenies (Ringe et al. [sent-9, score-0.143]
7 , 2002; Daumé III and Campbell, 2007; Daumé III, 2009; Nerbonne, 2010) and ancestral word forms of known cognate sets (Oakes, 2000; Bouchard-Côté et al. [sent-10, score-0.641]
8 While it may seem surprising that cognate detection has not successfully scaled to large numbers of languages, the task poses challenges not seen in reconstruction and phylogeny inference. [sent-15, score-0.835]
9 For instance, morphological innovations and irregular sound changes can completely obscure relationships between words in different languages. [sent-16, score-0.286]
10 In this paper, we present a system that uses two generative models for large-scale cognate identification. [sent-18, score-0.574]
11 Both models describe the evolution of words along a phylogeny according to automatically learned sound laws in the form of parametric edit distances. [sent-19, score-0.458]
12 The first is an adaptation of the generative model of Hall and Klein (2010), and the other is a new generative model called PARSIM with connections to parsimony methods in computational biology (Cavalli-Sforza and Edwards, 1965; Fitch, 1971). [sent-20, score-0.231]
13 To help correct this deficiency, we also describe an agglomerative inference procedure for the model of Hall and Klein (2010). [sent-22, score-0.177]
14 By using the output of PARSIM as input to this agglomerative procedure, we can find cognate groups that PARSIM alone cannot recover. [sent-23, score-0.659]
15 We apply these models to identifying cognate groups from two language families using the Austronesian Basic Vocabulary Database (Greenhill et al. [sent-24, score-0.659]
16 The datasets are by far the largest on which automated cognate recovery has ever been attempted, with 18 and 271 languages respectively. [sent-29, score-0.693]
17 We also analyze the mistakes of our system, finding that some of its putatively erroneous cognate groups may not be errors at all. [sent-33, score-0.687]
18 It also contains manual reconstructions for select ancestor languages produced by linguists. [sent-50, score-0.158]
19 These words are grouped into cognate groups and arranged by gloss. [sent-53, score-0.659]
20 For instance, there are 37 distinct cognate groups for the gloss “tail”. [sent-54, score-0.753]
21 In this sample, there are 6307 such cognate groups and 210 distinct glosses. [sent-58, score-0.659]
22 Moreover, there is some amount of homoplasy—that is, some languages have words from more than one cognate group for a given gloss. [sent-60, score-0.727]
23 The languages in this group are exclusively found on the Austronesian homeland of Formosa. [sent-65, score-0.179]
24 For our final test set, we use the Oceanic subfamily, which includes almost 50% of the languages in the Austronesian family, meaning that it represents around 10% of all languages in the world. [sent-69, score-0.17]
25 3 Models In this section we describe two models, one based on Hall and Klein (2010)—which we call HK10—and another new model that shares some connection to parsimony methods in computational biology, which we call PARSIM. [sent-72, score-0.183]
26 Both are generative models that describe the evolution of words wℓ from a set of languages {ℓ} in a cognate group g along a fixed phylogeny T. [sent-73, score-0.917]
27 Each cognate group and word is associated with a gloss or meaning m, which we assume to be fixed. [sent-74, score-0.736]
28 In both models, words evolve according to regular sound laws ϕℓ, which are specific to each language. [sent-75, score-0.273]
29 In HK10, there is an unknown number of cognate groups G where each cognate group g consists of a set of words {wg,ℓ}. [sent-81, score-1.301]
30 In each cognate group, words evolve along a phylogeny, where each word in a language is the result of that word evolving from its parent according to regular sound laws. [sent-82, score-0.814]
31 To model the fact that not all languages have a cognate in each group, each language in the tree has an associated “survival” variable Sg,ℓ, where a word may be lost on that branch (and its descendants) instead of evolving. [sent-83, score-0.684]
32 Figure 1: Plate diagrams for (a) HK10 (Hall and Klein, 2010) and (b) PARSIM, our new parsimony model, for a small set of languages. [sent-89, score-0.183]
33 In HK10, words are generated following a phylogenetic tree according to sound laws ϕ, and then “scrambled” with a permutation π so that the original cognate groups are lost. [sent-90, score-0.953]
34 In PARSIM, all words for each of the M glosses are generated in a single tree, with innovations I starting new cognate groups. [sent-91, score-0.676]
35 The task of inference then is to recover the original cognate groups. [sent-94, score-0.644]
36 The generative process for their model is as follows: • For each cognate group g, choose a root word Wroot ∼ p(W|λ), a language model over words. [sent-95, score-0.642]
37 We reproduce the graphical model for HK10 for a small phylogeny in Figure 1a. [sent-103, score-0.144]
38 Inference in this model is intractable; to perform inference exactly, one has to reason over all partitions of the data into cognate groups. [sent-104, score-0.632]
39 First, we restrict the cognate assignments to stay within a gloss. [sent-108, score-0.574]
40 Second, we use an agglomerative inference procedure, which greedily merges cognate groups that result in the greatest gain in likelihood. [sent-110, score-0.836]
41 That is, for all pairs of cognate groups ga with words wa and gb with words wb, we compute the score: log p(wa∪b | ϕ) − log p(wa | ϕ) − log p(wb | ϕ). This score is the difference between the log probability of generating the two cognate groups jointly and generating them separately. [sent-111, score-1.318]
42 These changes may either be mutations, which merely change the surface form of the word, or innovations, which start a new word in a new cognate group that is unrelated to the previous word. [sent-121, score-0.69]
43 Mutations take the form of a conditional edit operation that models insertions, substitutions, and deletions that correspond to regular (and, with lower probability, irregular) sound changes that are likely to occur between a language and its parent. [sent-122, score-0.307]
44 We also depict our model as a plate diagram for a small phylogeny in Figure 1b. [sent-134, score-0.144]
45 Instead, pieces of the phylogeny are simply “cut” into subtrees whenever an innovation occurs. [sent-136, score-0.243]
46 4 Relation to Parsimony PARSIM is related to the parsimony principle from computational biology (Cavalli-Sforza and Edwards, 1965; Fitch, 1971), where it is used to search for phylogenies. [sent-142, score-0.231]
47 When using parsimony, a phylogeny is scored according to the derivation that requires the fewest number of changes of state, where a state is typically thought of as a gene or some other trait in a species. [sent-143, score-0.192]
48 When inducing phylogenies of languages, a natural choice of characters is glosses from a restricted vocabulary like a Swadesh list, and two words are represented as the same value for a character if they are cognate (Ringe et al. [sent-145, score-0.68]
49 Therefore, the parsimony score for this tree is two. [sent-151, score-0.208]
50 For instance, it might be extremely likely that B changes into both A and C, but that A never changes into B or C, and so weighted variants of parsimony might be necessary (Sankoff and Cedergren, 1983). [sent-153, score-0.279]
51 5 Limitations of the Parsimony Model Potentially, our parsimony model sacrifices a certain amount of power to make inference tractable. [sent-161, score-0.218]
52 Specifically, it cannot model homoplasy, the presence of more than one word in a language for a given gloss. (It is worth noting that we are not the first to point out a connection between parsimony and likelihood.) [sent-162, score-0.183]
53 Figure 3: Trees illustrating parsimony and its limitations. [sent-164, score-0.183]
54 Here, given this tree, it seems likely that the ancestral languages contained both A and B. [sent-169, score-0.152]
55 Instead, it can at best only select group A or group B as the value for the parent, and leave the other group fragmented as two innovations, as in Figure 3c. [sent-176, score-0.204]
56 Where it does appear, our model should simply fail to get one of the cognate groups, instead explaining all of them via innovation. [sent-183, score-0.574]
57 To repair this shortcoming, we can simply run the agglomerative clustering procedure for the model of Hall and Klein (2010), starting from the groups that PARSIM has recovered. [sent-184, score-0.227]
58 Specifically, we group each word variable Wℓ with its innovation parameter Iℓ. [sent-190, score-0.167]
59 Automata have been used to successfully model distributions of strings for inferring morphology (Dreyer and Eisner, 2009) as well as cognate detection (Hall and Klein, 2010). [sent-200, score-0.574]
60 Even in models that would be tractable with “ordinary” messages, inference with automata quickly becomes intractable, because the size of the automata grows exponentially with the number of messages passed. [sent-201, score-0.232]
61 Dreyer and Eisner (2009) used a mixture of a k-best list and a unigram language model, while Hall and Klein (2010) used an approximation procedure that projected complex automata to simple, tractable automata using a modified KL divergence. [sent-203, score-0.169]
62 That is, if a gloss has 10 distinct words across all the languages in our dataset, we pass messages that only contain information about those 10 words. [sent-208, score-0.207]
63 1 Sound Laws The core piece of our system is learning the sound laws associated with each edge. [sent-218, score-0.242]
64 Since the foundation of historical linguistics with the Neogrammarians, linguists have argued for the regularity of sound change at the phonemic level (Schleicher, 1861; Bloomfield, 1938). [sent-219, score-0.313]
65 In practice, of course, sound change is not entirely regular, and complex extralinguistic events can lead to sound changes that are irregular. [sent-221, score-0.382]
66 Speakers of these languages do find ways around this prohibition, often resulting in sound changes that cannot be explained by sound laws alone (Keesing and Fifi’i, 1969). [sent-223, score-0.542]
67 Nevertheless, we find it useful to model sound change as a largely regular if stochastic process. [sent-224, score-0.198]
68 We employ a sound change model whose expressive power is equivalent to that of Hall and Klein (2010), though with a different parameterization. [sent-225, score-0.167]
69 This last feature reflects the propensity of a particular phoneme to appear in a given language at all, no matter what its ancestral phoneme was. [sent-236, score-0.155]
70 These parameters can be learned separately, though due to data sparsity, we found it better to use a tied parameterization as with the sound laws. [sent-242, score-0.167]
71 As with the sound laws, we used an ℓ2 regularization penalty to encourage the use of the global innovation parameter. [sent-244, score-0.266]
72 PARSIM is the new parsimony model in this paper, Agg. [sent-259, score-0.183]
73 HK10 is our agglomerative variant of Hall and Klein (2010) and Combination uses PARSIM’s output to seed the agglomerative matcher. [sent-260, score-0.284]
74 For the agglomerative systems, we report the point with maximal F1 score, but we also show precision/recall curves. [sent-261, score-0.142]
75 1 Cognate Recovery We ran both PARSIM and our agglomerative version of HK10 on the Formosan datasets. [sent-265, score-0.142]
76 For PARSIM, we initialized the mutation parameters ϕ to a model that preferred matches to insertions, substitutions, and deletions by a factor of e^3, and innovation parameters to 0. [sent-266, score-0.173]
77 For the agglomerative HK10, we initialized its parameters to the values found by our model. [sent-268, score-0.142]
78 Based on our observations about homoplasy, we also considered a combined system where we ran PARSIM and then seeded the agglomerative clustering algorithm with the clusters found by PARSIM. [sent-269, score-0.142]
79 Specifically, each cluster is assigned to the gold cognate group that is most common among its words, and purity is then computed as the fraction of words whose gold cognate group matches the group assigned to their cluster. (Attempts to learn parameters directly with the agglomerative clustering algorithm were not effective.) [sent-272, score-0.142] [sent-276, score-0.068]
81 On Formosan, PARSIM has much higher precision and purity than our agglomerative version of HK10 at its highest point, though its recall and F1 suffer somewhat. [sent-287, score-0.232]
82 Next, we also examined precision and recall curves for the two agglomerative systems on Formosan. (The main difference between precision and purity is that pairwise precision is inherently quadratic, meaning that it penalizes mistakes in large groups much more heavily than mistakes in small groups.) [sent-291, score-0.421]
83 2 Reconstruction We also wanted to see how well our cognates could be used to actually reconstruct the ancestral forms of words. [sent-297, score-0.289]
84 We ran Bouchard-Côté et al. (2009)'s reconstruction system using both the cognate groups PARSIM found in the Oceanic language family and the gold cognate groups provided by the ABVD. [sent-299, score-1.497]
85 We then evaluated the average Levenshtein distance of the reconstruction for each word to the reconstruction of that word’s Proto-Oceanic ancestor provided by linguists. [sent-300, score-0.268]
86 This differs from Bouchard-Côté et al. (2009) in that they averaged over cognate groups, which does not make sense for our task because there are different cognate groups. [sent-302, score-1.148]
87 Using this metric, reconstructions using our system's cognates are an average of 2.47 edit operations from the gold reconstruction, while with gold cognates the error is 2. [sent-304, score-0.219] [sent-305, score-0.206]
89 While the plots are similar, the automatic cognates exhibit a longer tail. [sent-310, score-0.18]
90 Thus, even with automatic cognates, the reconstruction system can reconstruct words faithfully in many cases, but in a few instances our system fails. [sent-311, score-0.159]
91 1 Precision Many of our precision errors seem to be due to our somewhat limited model of sound change. [sent-315, score-0.191]
92 Somewhat surprisingly, the former word is cognate with Paiwan /qmereN/ and Saisiat /maPr@m/ while the latter is not. [sent-321, score-0.574]
93 Perhaps a more sophisticated model of sound change could correctly learn this relationship. [sent-324, score-0.167]
94 However, despite these words’ similarity, there are actually three cognate groups here. [sent-327, score-0.659]
95 Crucially, these cognate groups do not follow the phylogeny closely. [sent-329, score-0.803]
96 While a more thorough, linguistically-informed analysis is needed to ensure that these are actually cognates, we believe that our system, in conjunction with a trained Austronesian specialist, could potentially find many more cognate groups, speeding up the process of completing the ABVD. [sent-331, score-0.574]
97 For instance, there is a cognate group for “to eat” that includes Bunun /maun/, Thao /kman/, Favorlang /man/, and Sediq /manakamakan/, among others. [sent-335, score-0.642]
98 Reduplication cannot be modeled using mere sound laws, and so a more complex transition model is needed to correctly identify these kinds of changes. [sent-337, score-0.167]
99 10 Conclusion We have presented a new system for automatically finding cognates across many languages. [sent-338, score-0.18]
100 Automatic prediction of cognate orthography using support vector machines. [sent-469, score-0.574]
wordName wordTfidf (topN-words)
[('cognate', 0.574), ('parsim', 0.34), ('austronesian', 0.313), ('formosan', 0.196), ('parsimony', 0.183), ('cognates', 0.18), ('sound', 0.167), ('oceanic', 0.157), ('phylogeny', 0.144), ('agglomerative', 0.142), ('reconstruction', 0.117), ('homoplasy', 0.104), ('innovation', 0.099), ('gloss', 0.094), ('groups', 0.085), ('languages', 0.085), ('paiwan', 0.078), ('wpar', 0.078), ('laws', 0.075), ('innovations', 0.071), ('group', 0.068), ('automata', 0.068), ('ancestral', 0.067), ('hall', 0.067), ('purity', 0.066), ('squliq', 0.065), ('family', 0.062), ('linguists', 0.056), ('insertions', 0.053), ('klein', 0.052), ('abvd', 0.052), ('atayalic', 0.052), ('cognacy', 0.052), ('mutations', 0.052), ('changes', 0.048), ('biology', 0.048), ('evolution', 0.046), ('reconstruct', 0.042), ('parent', 0.042), ('levenshtein', 0.041), ('daum', 0.039), ('blust', 0.039), ('bunun', 0.039), ('greenhill', 0.039), ('mutation', 0.039), ('reconstructions', 0.039), ('ringe', 0.039), ('saisiat', 0.039), ('zoology', 0.039), ('deletions', 0.035), ('inference', 0.035), ('recover', 0.035), ('historical', 0.034), ('recovery', 0.034), ('ancestor', 0.034), ('tractable', 0.033), ('regular', 0.031), ('dreyer', 0.031), ('phoneme', 0.031), ('glosses', 0.031), ('kondrak', 0.028), ('drift', 0.028), ('messages', 0.028), ('mistakes', 0.028), ('database', 0.028), ('bernoulli', 0.027), ('phylogenetic', 0.027), ('bloomfield', 0.026), ('campbell', 0.026), ('cedergren', 0.026), ('centralami', 0.026), ('ciuli', 0.026), ('durbin', 0.026), ('fifi', 0.026), ('fitch', 0.026), ('genetics', 0.026), ('hawai', 0.026), ('hawaiian', 0.026), ('homeland', 0.026), ('kavalan', 0.026), ('keesing', 0.026), ('needleman', 0.026), ('oakes', 0.026), ('phylogenies', 0.026), ('propensity', 0.026), ('reduplication', 0.026), ('sankoff', 0.026), ('subfamilies', 0.026), ('wroot', 0.026), ('vocabulary', 0.026), ('edit', 0.026), ('tree', 0.025), ('phonetic', 0.025), ('precision', 0.024), ('characters', 0.023), ('transitions', 0.023), ('partitions', 0.023), ('edwards', 0.022), ('affecting', 0.022), ('ahead', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 77 emnlp-2011-Large-Scale Cognate Recovery
Author: David Hall ; Dan Klein
Abstract: We present a system for the large scale induction of cognate groups. Our model explains the evolution of cognates as a sequence of mutations and innovations along a phylogeny. On the task of identifying cognates from over 21,000 words in 218 different languages from the Oceanic language family, our model achieves a cluster purity score over 91%, while maintaining pairwise recall over 62%.
2 0.31767011 122 emnlp-2011-Simple Effective Decipherment via Combinatorial Optimization
Author: Taylor Berg-Kirkpatrick ; Dan Klein
Abstract: We present a simple objective function that when optimized yields accurate solutions to both decipherment and cognate pair identification problems. The objective simultaneously scores a matching between two alphabets and a matching between two lexicons, each in a different language. We introduce a simple coordinate descent procedure that efficiently finds effective solutions to the resulting combinatorial optimization problem. Our system requires only a list of words in both languages as input, yet it competes with and surpasses several state-of-the-art systems that are both substantially more complex and make use of more information.
3 0.059927311 129 emnlp-2011-Structured Sparsity in Structured Prediction
Author: Andre Martins ; Noah Smith ; Mario Figueiredo ; Pedro Aguiar
Abstract: Linear models have enjoyed great success in structured prediction in NLP. While a lot of progress has been made on efficient training with several loss functions, the problem of endowing learners with a mechanism for feature selection is still unsolved. Common approaches employ ad hoc filtering or L1regularization; both ignore the structure of the feature space, preventing practicioners from encoding structural prior knowledge. We fill this gap by adopting regularizers that promote structured sparsity, along with efficient algorithms to handle them. Experiments on three tasks (chunking, entity recognition, and dependency parsing) show gains in performance, compactness, and model interpretability.
4 0.050494183 120 emnlp-2011-Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions
Author: Richard Socher ; Jeffrey Pennington ; Eric H. Huang ; Andrew Y. Ng ; Christopher D. Manning
Abstract: We introduce a novel machine learning framework based on recursive autoencoders for sentence-level prediction of sentiment label distributions. Our method learns vector space representations for multi-word phrases. In sentiment prediction tasks these representations outperform other state-of-the-art approaches on commonly used datasets, such as movie reviews, without using any pre-defined sentiment lexica or polarity shifting rules. We also evaluate the model’s ability to predict sentiment distributions on a new dataset based on confessions from the experience project. The dataset consists of personal user stories annotated with multiple labels which, when aggregated, form a multinomial distribution that captures emotional reactions. Our algorithm can more accurately predict distributions over such labels compared to several competitive baselines.
5 0.046262927 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction
Author: Young-Bum Kim ; Joao Graca ; Benjamin Snyder
Abstract: In this paper, we consider the problem of unsupervised morphological analysis from a new angle. Past work has endeavored to design unsupervised learning methods which explicitly or implicitly encode inductive biases appropriate to the task at hand. We propose instead to treat morphological analysis as a structured prediction problem, where languages with labeled data serve as training examples for unlabeled languages, without the assumption of parallel data. We define a universal morphological feature space in which every language and its morphological analysis reside. We develop a novel structured nearest neighbor prediction method which seeks to find the morphological analysis for each unlabeled language which lies as close as possible in the feature space to a training language. We apply our model to eight inflecting languages, and induce nominal morphology with substantially higher accuracy than a traditional, MDL-based approach. Our analysis indicates that accuracy continues to improve substantially as the number of training languages increases.
6 0.042258427 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning
7 0.041154373 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
8 0.040157087 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers
9 0.039452489 50 emnlp-2011-Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation
10 0.037640505 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
11 0.03701197 9 emnlp-2011-A Non-negative Matrix Factorization Based Approach for Active Dual Supervision from Document and Word Labels
12 0.033436287 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
13 0.03236879 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model
14 0.028768573 52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing
15 0.028500399 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
16 0.027994547 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
17 0.02742642 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction
18 0.027384281 67 emnlp-2011-Hierarchical Verb Clustering Using Graph Factorization
19 0.026084015 111 emnlp-2011-Reducing Grounded Learning Tasks To Grammatical Inference
20 0.025905456 96 emnlp-2011-Multilayer Sequence Labeling
topicId topicWeight
[(0, 0.114), (1, -0.008), (2, -0.019), (3, 0.025), (4, 0.042), (5, 0.026), (6, -0.106), (7, -0.049), (8, -0.176), (9, 0.07), (10, -0.054), (11, 0.07), (12, -0.094), (13, -0.002), (14, -0.051), (15, 0.009), (16, -0.215), (17, 0.254), (18, -0.156), (19, -0.429), (20, 0.272), (21, 0.23), (22, 0.025), (23, -0.021), (24, -0.031), (25, -0.122), (26, -0.034), (27, 0.023), (28, -0.127), (29, -0.055), (30, -0.018), (31, -0.134), (32, -0.076), (33, -0.024), (34, -0.003), (35, 0.12), (36, -0.06), (37, -0.086), (38, -0.014), (39, 0.003), (40, 0.025), (41, -0.082), (42, 0.098), (43, 0.002), (44, -0.016), (45, 0.077), (46, 0.038), (47, -0.025), (48, 0.036), (49, -0.009)]
simIndex simValue paperId paperTitle
same-paper 1 0.93896741 77 emnlp-2011-Large-Scale Cognate Recovery
Author: David Hall ; Dan Klein
Abstract: We present a system for the large scale induction of cognate groups. Our model explains the evolution of cognates as a sequence of mutations and innovations along a phylogeny. On the task of identifying cognates from over 21,000 words in 218 different languages from the Oceanic language family, our model achieves a cluster purity score over 91%, while maintaining pairwise recall over 62%.
2 0.87185603 122 emnlp-2011-Simple Effective Decipherment via Combinatorial Optimization
Author: Taylor Berg-Kirkpatrick ; Dan Klein
Abstract: We present a simple objective function that when optimized yields accurate solutions to both decipherment and cognate pair identification problems. The objective simultaneously scores a matching between two alphabets and a matching between two lexicons, each in a different language. We introduce a simple coordinate descent procedure that efficiently finds effective solutions to the resulting combinatorial optimization problem. Our system requires only a list of words in both languages as input, yet it competes with and surpasses several state-of-the-art systems that are both substantially more complex and make use of more information.
3 0.16887282 129 emnlp-2011-Structured Sparsity in Structured Prediction
Author: Andre Martins ; Noah Smith ; Mario Figueiredo ; Pedro Aguiar
Abstract: Linear models have enjoyed great success in structured prediction in NLP. While a lot of progress has been made on efficient training with several loss functions, the problem of endowing learners with a mechanism for feature selection is still unsolved. Common approaches employ ad hoc filtering or L1regularization; both ignore the structure of the feature space, preventing practicioners from encoding structural prior knowledge. We fill this gap by adopting regularizers that promote structured sparsity, along with efficient algorithms to handle them. Experiments on three tasks (chunking, entity recognition, and dependency parsing) show gains in performance, compactness, and model interpretability.
4 0.15878764 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
Author: Jiajun Zhang ; Feifei Zhai ; Chengqing Zong
Abstract: Due to its explicit modeling of the grammaticality of the output via target-side syntax, the string-to-tree model has been shown to be one of the most successful syntax-based translation models. However, a major limitation of this model is that it does not utilize any useful syntactic information on the source side. In this paper, we analyze the difficulties of incorporating source syntax in a string-totree model. We then propose a new way to use the source syntax in a fuzzy manner, both in source syntactic annotation and in rule matching. We further explore three algorithms in rule matching: 0-1 matching, likelihood matching, and deep similarity matching. Our method not only guarantees grammatical output with an explicit target tree, but also enables the system to choose the proper translation rules via fuzzy use of the source syntax. Our extensive experiments have shown significant improvements over the state-of-the-art string-to-tree system. 1
5 0.15345538 50 emnlp-2011-Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation
Author: Reut Tsarfaty ; Joakim Nivre ; Evelina Andersson
Abstract: unkown-abstract
6 0.15048835 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction
7 0.13542815 33 emnlp-2011-Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs
8 0.12914099 47 emnlp-2011-Efficient retrieval of tree translation examples for Syntax-Based Machine Translation
9 0.1287678 120 emnlp-2011-Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions
10 0.12081441 111 emnlp-2011-Reducing Grounded Learning Tasks To Grammatical Inference
11 0.11800899 79 emnlp-2011-Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction
12 0.11769101 73 emnlp-2011-Improving Bilingual Projections via Sparse Covariance Matrices
13 0.11194485 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers
14 0.11151505 24 emnlp-2011-Bootstrapping Semantic Parsers from Conversations
15 0.10826792 131 emnlp-2011-Syntactic Decision Tree LMs: Random Selection or Intelligent Design?
16 0.10556874 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
17 0.10508518 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model
18 0.10465475 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
19 0.10357438 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction
20 0.10345085 66 emnlp-2011-Hierarchical Phrase-based Translation Representations
topicId topicWeight
[(15, 0.014), (18, 0.376), (23, 0.074), (36, 0.025), (37, 0.028), (45, 0.057), (53, 0.013), (54, 0.035), (57, 0.019), (62, 0.02), (64, 0.052), (66, 0.029), (69, 0.019), (79, 0.039), (82, 0.035), (87, 0.016), (90, 0.014), (96, 0.034), (98, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.71313876 77 emnlp-2011-Large-Scale Cognate Recovery
Author: David Hall ; Dan Klein
Abstract: We present a system for the large scale induction of cognate groups. Our model explains the evolution of cognates as a sequence of mutations and innovations along a phylogeny. On the task of identifying cognates from over 21,000 words in 218 different languages from the Oceanic language family, our model achieves a cluster purity score over 91%, while maintaining pairwise recall over 62%.
2 0.56595641 57 emnlp-2011-Extreme Extraction - Machine Reading in a Week
Author: Marjorie Freedman ; Lance Ramshaw ; Elizabeth Boschee ; Ryan Gabbard ; Gary Kratkiewicz ; Nicolas Ward ; Ralph Weischedel
Abstract: We report on empirical results in extreme extraction. It is extreme in that (1) from receipt of the ontology specifying the target concepts and relations, development is limited to one week and that (2) relatively little training data is assumed. We are able to surpass human recall and achieve an F1 of 0.51 on a question-answering task with less than 50 hours of effort using a hybrid approach that mixes active learning, bootstrapping, and limited (5 hours) manual rule writing. We compare the performance of three systems: extraction with handwritten rules, bootstrapped extraction, and a combination. We show that while the recall of the handwritten rules surpasses that of the learned system, the learned system is able to improve the overall recall and F1.
3 0.32129768 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
Author: Kevin Gimpel ; Noah A. Smith
Abstract: We present a quasi-synchronous dependency grammar (Smith and Eisner, 2006) for machine translation in which the leaves of the tree are phrases rather than words as in previous work (Gimpel and Smith, 2009). This formulation allows us to combine structural components of phrase-based and syntax-based MT in a single model. We describe a method of extracting phrase dependencies from parallel text using a target-side dependency parser. For decoding, we describe a coarse-to-fine approach based on lattice dependency parsing of phrase lattices. We demonstrate performance improvements for Chinese-English and UrduEnglish translation over a phrase-based baseline. We also investigate the use of unsupervised dependency parsers, reporting encouraging preliminary results.
4 0.32095623 107 emnlp-2011-Probabilistic models of similarity in syntactic context
Author: Diarmuid O Seaghdha ; Anna Korhonen
Abstract: This paper investigates novel methods for incorporating syntactic information in probabilistic latent variable models of lexical choice and contextual similarity. The resulting models capture the effects of context on the interpretation of a word and in particular its effect on the appropriateness of replacing that word with a potentially related one. Evaluating our techniques on two datasets, we report performance above the prior state of the art for estimating sentence similarity and ranking lexical substitutes.
5 0.32042775 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
Author: Christos Christodoulopoulos ; Sharon Goldwater ; Mark Steedman
Abstract: In this paper we present a fully unsupervised syntactic class induction system formulated as a Bayesian multinomial mixture model, where each word type is constrained to belong to a single class. By using a mixture model rather than a sequence model (e.g., HMM), we are able to easily add multiple kinds of features, including those at both the type level (morphology features) and token level (context and alignment features, the latter from parallel corpora). Using only context features, our system yields results comparable to state-of-the art, far better than a similar model without the one-class-per-type constraint. Using the additional features provides added benefit, and our final system outperforms the best published results on most of the 25 corpora tested.
6 0.31861559 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction
7 0.31781089 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing
8 0.31412998 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
9 0.31221089 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
10 0.3120918 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
11 0.31173143 136 emnlp-2011-Training a Parser for Machine Translation Reordering
12 0.31089997 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
13 0.31027663 66 emnlp-2011-Hierarchical Phrase-based Translation Representations
14 0.30995491 87 emnlp-2011-Lexical Generalization in CCG Grammar Induction for Semantic Parsing
15 0.30881602 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
16 0.30823061 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model
17 0.30739021 111 emnlp-2011-Reducing Grounded Learning Tasks To Grammatical Inference
18 0.30736575 68 emnlp-2011-Hypotheses Selection Criteria in a Reranking Framework for Spoken Language Understanding
19 0.30684242 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances
20 0.30645159 134 emnlp-2011-Third-order Variational Reranking on Packed-Shared Dependency Forests