emnlp emnlp2011 emnlp2011-140 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Young-Bum Kim ; Joao Graca ; Benjamin Snyder
Abstract: In this paper, we consider the problem of unsupervised morphological analysis from a new angle. Past work has endeavored to design unsupervised learning methods which explicitly or implicitly encode inductive biases appropriate to the task at hand. We propose instead to treat morphological analysis as a structured prediction problem, where languages with labeled data serve as training examples for unlabeled languages, without the assumption of parallel data. We define a universal morphological feature space in which every language and its morphological analysis reside. We develop a novel structured nearest neighbor prediction method which seeks to find the morphological analysis for each unlabeled language which lies as close as possible in the feature space to a training language. We apply our model to eight inflecting languages, and induce nominal morphology with substantially higher accuracy than a traditional, MDL-based approach. Our analysis indicates that accuracy continues to improve substantially as the number of training languages increases.
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper, we consider the problem of unsupervised morphological analysis from a new angle. [sent-3, score-0.473]
2 We propose instead to treat morphological analysis as a structured prediction problem, where languages with labeled data serve as training examples for unlabeled languages, without the assumption of parallel data. [sent-5, score-0.832]
3 We define a universal morphological feature space in which every language and its morphological analysis reside. [sent-6, score-1.069]
4 We develop a novel structured nearest neighbor prediction method which seeks to find the morphological analysis for each unlabeled language which lies as close as possible in the feature space to a training language. [sent-7, score-1.161]
5 We apply our model to eight inflecting languages, and induce nominal morphology with substantially higher accuracy than a traditional, MDL-based approach. [sent-8, score-0.338]
6 Our analysis indicates that accuracy continues to improve substantially as the number of training languages increases. [sent-9, score-0.317]
7 Recently, increasing attention has been paid to the wide variety of other languages of the world. [sent-11, score-0.225]
8 Most of these languages still pose severe difficulties, due to (i) their … [sent-12, score-0.225]
9 …, 1994), and as a result, it is easy to treat the words bearing these tags as completely distinct word classes, with no internal morphological structure. [sent-21, score-0.366]
10 In this paper, we argue that languages for which we have gold-standard morphological analyses can be used as effective guides for languages lacking such resources. [sent-27, score-1.053]
11 More formally, we recast morphological induction as a new kind of supervised structured prediction problem, where each annotated language serves as a single training example. [sent-31, score-0.628]
12 Each language’s noun lexicon serves as a single input x, and the analysis of the nouns into stems and suffixes serves as a complex structured label y. [sent-32, score-0.67]
13 Our first step is to define a universal morphological feature space, into which each language and its morphological analysis can be mapped. [sent-33, score-1.012]
14 We opt for a simple and intuitive mapping, which measures the sizes of the stem and suffix lexicons, the entropy of these lexicons, and the fraction of word forms which appear without any inflection. [sent-34, score-0.326]
15 Because languages tend to cluster into well-defined morphological groups, we cast our learning and prediction problem in the nearest neighbor framework (Cover and Hart, 1967). [sent-35, score-1.088]
16 In contrast to its typical use in classification problems, where one can simply pick the label of the nearest training example, we are here faced with a structured prediction problem, where locations in feature space depend jointly on the input-label pair (x, y). [sent-36, score-0.579]
17 Finding a nearest neighbor thus consists of searching over the space of morphological analyses, until a point in feature space is reached which lies closest to one of the labeled languages. [sent-37, score-0.672]
18 To provide a measure of empirical validation, we applied our approach to eight languages with inflectional nominal morphology, ranging in complexity from very simple (English) to very complex (Hungarian). [sent-39, score-0.453]
19 Further analysis indicates that accuracy improves as the number of training languages increases. [sent-43, score-0.317]
20 2 Related Work In this section, we briefly review prior work on unsupervised morphological induction, as well as multilingual analysis in NLP. [sent-44, score-0.548]
21 Multilingual Analysis: An influential line of prior multilingual work starts with the observation that rich linguistic resources exist for some languages but not others. [sent-51, score-0.3]
22 …, 2005; Padó and Lapata, 2006). In these cases, the existence of a bilingual parallel text along with highly accurate predictions for one of the languages was assumed. [sent-55, score-0.332]
23 This idea has been developed and applied to a wide variety of tasks, including morphological analysis (Snyder and Barzilay, 2008b; Snyder and Barzilay, 2008a), part-of-speech induction (Snyder et al. [sent-58, score-0.484]
24 An even more recent line of work does away with the assumption of parallel texts and performs joint unsupervised induction for various languages through the use of coupled priors in the context of grammar induction (Cohen and Smith, 2009; Berg-Kirkpatrick and Klein, 2010). [sent-65, score-0.479]
25 3 Structured Nearest Neighbor We reformulate morphological induction as a supervised learning task, where each annotated language serves as a single training example for our language-independent model. [sent-67, score-0.478]
26 Because our goal is to generalize across languages, we define a feature function which maps each (x, y) pair to a universal feature space: f : X × Y → Rd. [sent-69, score-0.293]
27 For each unlabeled input language x, our goal is to predict a complete morphological analysis y ∈ Y which maximizes a scoring function on the feature space, score : Rd → R. [sent-70, score-0.411]
28 On the assumption that for each test language, at least one typologically similar language will be present in the training set, we employ a nearest neighbor scoring function. [sent-77, score-0.461]
29 In the standard nearest neighbor classification setting, one simply predicts the label of the closest training example in the input space. [sent-78, score-0.52]
30 In our structured prediction setting, the mapping to the universal feature space depends crucially on the structure of the proposed label y, not simply the input x. [Footnote 1: Technically, the label space of each input, Y, should be thought of as a function of the input x.] [sent-79, score-0.609]
31 We thus generalize nearest-neighbor prediction to the structured scenario and propose the following prediction rule: y∗ = argmin_{y ∈ Y} min_ℓ ‖f(x, y) − f(x_ℓ, y_ℓ)‖, (1) where the index ℓ ranges over the training languages. [sent-83, score-0.23]
32 In words, we predict the morphological analysis y for our test language which places it as close as possible in the universal feature space to one of the training languages ℓ. [sent-84, score-0.961]
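To make Equation 1 concrete, here is a minimal Python sketch of the structured nearest-neighbor rule, assuming a feature map features(x, y) into Rd and a search routine that drives an analysis toward a fixed target point; all names are illustrative assumptions, not the authors' implementation.

```python
import math

def distance(u, v):
    """Euclidean distance between two points in the universal feature space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def predict(x, training_languages, features, search):
    """Structured nearest-neighbor prediction (a sketch of Equation 1).

    `training_languages` is a list of (x_l, y_l) pairs; for each pair we run
    a search that drives the analysis of x toward that language's feature
    point, then keep the overall closest result.
    """
    best_y, best_dist = None, float("inf")
    for x_l, y_l in training_languages:
        target = features(x_l, y_l)      # fixed point for this training language
        y_candidate = search(x, target)  # y^(l): analysis optimized toward target
        d = distance(features(x, y_candidate), target)
        if d < best_dist:
            best_y, best_dist = y_candidate, d
    return best_y
```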
33 Morphological Analysis: In this paper we focus on nominal inflectional suffix morphology. [sent-85, score-0.364]
34 A correct analysis of this word would divide it into a stem (utisak = impression), a suffix (-om = instrumental case), and a phonological deletion rule on the stem's penultimate vowel (utisak + -om → utiskom). [sent-87, score-0.885]
35 More generally, as we define it, a morphological analysis of a word type w consists of (i) a stem t, (ii) a suffix f, and (iii) a deletion rule d. [sent-92, score-1.033]
36 Either or both of the suffix and deletion rule can be NULL. [sent-93, score-0.508]
37 We allow three types of deletion rules on stems: deletion of final vowels (…). [sent-94, score-0.478]
38 And, of course, we require that after (1) applying deletion rule d to stem t, and (2) adding suffix f to the result, we obtain word w. [sent-121, score-0.622]
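As a small illustration of this requirement, the sketch below applies an analysis (stem, suffix, deletion rule) to reconstruct a surface word; the rule encoding is a simplification assumed for the Serbian example above, not the paper's exact rule inventory.

```python
def apply_analysis(stem, suffix, deletion=None):
    """Apply a deletion rule to the stem, then append the (possibly NULL) suffix."""
    t = stem
    if deletion == "final":          # delete the stem-final character
        t = t[:-1]
    elif deletion == "penultimate":  # delete the penultimate character
        t = t[:-2] + t[-1]
    return t + (suffix or "")

# The Serbian example from the text: utisak + -om with penultimate-vowel deletion.
assert apply_analysis("utisak", "om", "penultimate") == "utiskom"
```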
39 Consider the set of stems T, suffixes F, and deletion rules D, induced by the morphological analyses y of the words x. [sent-123, score-1.232]
40 After convergence, the label which is closest in distance to a training language is predicted, in this case being the label near training language (x3, y3). [sent-129, score-0.229]
41 …the percentage of words which have the null suffix (2/3 in the example above), and the percentage of segmented words which employ a deletion rule (0 in the example above). [sent-133, score-0.397]
42 Thus, in total, our model employs 8 universal morphological features. [sent-134, score-0.543]
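A hedged sketch of how such a feature vector might be computed from a full set of analyses; the exact choice and scaling of the eight features are assumptions reconstructed from the descriptions in this text (lexicon sizes, entropies over word types, and the two percentages).

```python
from collections import Counter
import math

def entropy(counts):
    """Shannon entropy (bits) of a count distribution."""
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total, 2) for c in counts.values())

def feature_vector(analyses):
    """analyses maps each word type to a (stem, suffix, deletion) triple;
    a suffix/deletion of None stands for NULL."""
    stems = Counter(t for t, f, d in analyses.values())
    suffixes = Counter(f for t, f, d in analyses.values())
    deletions = Counter(d for t, f, d in analyses.values())
    n = len(analyses)
    suffixed = [a for a in analyses.values() if a[1] is not None]
    return [
        len(stems), len(suffixes), len(deletions),              # lexicon sizes
        entropy(stems), entropy(suffixes), entropy(deletions),  # entropies
        sum(1 for a in analyses.values() if a[1] is None) / n,  # % NULL suffix
        # share of suffixed words using a deletion rule (the text reports a
        # count; normalizing it here is an assumption)
        sum(1 for a in suffixed if a[2] is not None) / max(len(suffixed), 1),
    ]
```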
43 3.1 Search Algorithm The main algorithmic challenge for our model lies in efficiently computing the best morphological analysis y for each language-specific word set x, according to Equation 1. [sent-138, score-0.401]
44 Exhaustive search through the set of all possible morphological analyses is impossible, as the number of such analyses grows exponentially in the size of the vocabulary. [sent-139, score-0.84]
45 We finally select from amongst these analyses and make our prediction: ℓ∗ = argmin_ℓ ‖f(x, y^(ℓ)) − f(x_ℓ, y_ℓ)‖, y∗ = y^(ℓ∗). The main outline of our search algorithm is based on the MDL-based greedy search heuristic developed and studied by Goldsmith (2005). [sent-168, score-0.237]
46 At a high level, this search procedure alternates between individual analyses of words (keeping the set of stems and suffixes fixed), aggregate discoveries of new stems (keeping the suffixes fixed), and aggregate discoveries of new suffixes (keeping stems fixed). [sent-169, score-1.563]
47 As this procedure tends to produce an overly large set of suffixes F, we further prune F down to the number of suffixes found in the training language, retaining those which appear with the largest number of stems. [sent-174, score-0.44]
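An outline of this alternating greedy search, driven toward one training language's feature point; the three stage routines (reanalyze words, find new stems, find new suffixes) are assumed to exist, and convergence is declared when the distance to the target stops improving.

```python
def search(x, target, stages, features, distance, max_rounds=50):
    """Greedy alternating search (after Goldsmith, 2005), guided by `target`.

    `stages` is a sequence of callables, e.g. (reanalyze_words, find_new_stems,
    find_new_suffixes), each taking and returning the current analyses.
    """
    y = {w: (w, None, None) for w in x}  # start with every word unsegmented
    best = distance(features(x, y), target)
    for _ in range(max_rounds):
        for stage in stages:
            y = stage(y, target)
        d = distance(features(x, y), target)
        if d >= best:  # no improvement this round: treat as converged
            break
        best = d
    return y
```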
48 We use the set of stems T and suffixes F obtained from the previous stage, and don’t permit the addition of any new items to these lists. [sent-177, score-0.39]
49 Instead, we focus on obtaining better analyses of each word, while also building up a set of phonological deletion rules D. [sent-178, score-0.578]
50 [Footnote 5] With the restriction that at this stage we only allow suffixes up to length 5, and stems of at least length 3. [sent-180, score-0.49]
51 For each such possible analysis y′, we compute the resulting location in feature space f(x, y′), and select the analysis that brings us closest to our target training language: y = argmin_{y′} ‖f(x, y′) − f(x_ℓ, y_ℓ)‖. [sent-184, score-0.253]
52 Stage 2: Find New Stems In this stage, we keep our set of suffixes F and deletion rules D from the previous stage fixed, and attempt to find new stems to add to T through an aggregate analysis of unsegmented words. [sent-185, score-0.859]
53 For every string s, we consider the set of words which are currently unsegmented, and can be analyzed as a stem-suffix pair (s, f) for some existing suffix f ∈ F, and some deletion rule d ∈ D. [sent-186, score-0.256]
54 Stage 3: Find New Suffixes This stage is exactly analogous to the previous stage, except we now fix the set of stems T and seek to find new suffixes. [sent-190, score-0.27]
55 3.2 A Monolingual Supervised Model In order to provide a plausible upper bound on performance, we also formulate a supervised monolingual morphological model, using the structured perceptron framework (Collins, 2002). [sent-192, score-0.596]
56 Here we assume that we are given some training sequence of inputs and morphological analyses (all within one language): (x1, y1), (x2, y2), … [sent-193, score-0.603]
57 We define each input xi to be a noun w, along with a morphological tag z, which specifies the gender, case, and number of the noun. [sent-197, score-0.366]
58 The goal is to predict the correct segmentation of w into stem, suffix, and phonological deletion rule: yi = (t, f, d). [sent-198, score-0.433]
59 The next three columns give, respectively, the entropies of the distributions of stems, suffixes (including NULL), and deletion rules (including NULL) over word types. [sent-201, score-0.503]
60 The final two columns give, respectively, the percentage of word types occurring with the NULL suffix, and the number of non-NULL suffix words which use a phonological deletion rule. [sent-202, score-0.597]
61 Note that the final eight columns define the universal feature space used by our model. [sent-203, score-0.368]
62 (2) According to label yi, the suffix and deletion rule are (f, d) (one feature for every possible pair of deletion rules and suffixes). [sent-205, score-0.86]
63 (3) According to label yi and morphological tag z, the suffix, deletion rule, and gender are respectively (f, d, G). [sent-206, score-0.698]
64 (4) According to label yi and morphological tag z, the suffix, deletion rule, and case are (f, d, C). [sent-207, score-0.698]
65 (5) According to label yi and morphological tag z, the suffix, deletion rule, and number are (f, d, N). [sent-208, score-0.698]
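A sketch of these feature templates as a feature-extraction function for the structured perceptron; template (1) is not visible in the extracted text, so a stem-identity feature is assumed in its place, and the string keys are purely illustrative.

```python
def perceptron_features(word, tag, label):
    """`tag` = (gender, case, number); `label` = (stem, suffix, deletion)."""
    t, f, d = label
    G, C, N = tag
    return {
        ("stem", t): 1.0,                     # (1) assumed stem-identity feature
        ("suffix+del", f, d): 1.0,            # (2) suffix/deletion-rule pair
        ("suffix+del+gender", f, d, G): 1.0,  # (3)
        ("suffix+del+case", f, d, C): 1.0,    # (4)
        ("suffix+del+number", f, d, N): 1.0,  # (5)
    }
```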
66 Corpus: To test our cross-lingual model, we apply it to a morphologically analyzed corpus of eight languages (Erjavec, 2004). [sent-211, score-0.36]
67 All the words in the corpus are tagged with morphological stems and a detailed morpho-syntactic analysis. [sent-213, score-0.536]
68 In contrast, the number of suffixes across the languages varies quite a bit. [sent-218, score-0.479]
69 Hungarian and Estonian, both Uralic languages with very complex nominal morphology, use 231 and 141 nominal suffixes, respectively. [sent-219, score-0.411]
70 Besides English, the remaining languages employ between 21 and 32 suffixes; English is the outlier in the other direction, with just three nominal inflectional suffixes. [sent-220, score-0.377]
71 (i) The training languages include all seven other languages; (ii) Self (oracle): each language is trained to minimize the distance to its own gold-standard analysis; (iii) Average: the feature values of all seven training languages are averaged together to create a single objective. [sent-227, score-0.681]
72 Each of the eight languages serves in turn as the test language, with the other seven serving as training examples. [sent-228, score-0.276]
73 In the Average variant, we average the feature values of all seven training languages into a single objective. [sent-235, score-0.334]
74 For all languages other than English (which is a morphological loner in our group of languages), our model improves over the baseline by a substantial margin, yielding an average increase of 11. [sent-241, score-0.591]
75 These languages are closely related to one another, and indeed our model discovers that they are each other's nearest neighbors. [sent-244, score-0.442]
76 By guiding their morphological analyses towards one another, our model achieves a 21 percentage point increase in the case of Serbian and a 15 percentage point increase in the case of Slovene. [sent-245, score-0.691]
77 By the same token, the resulting distance in universal feature space between training and test analyses is cut in half under this variant, when compared to the non-oracular nearest neighbor method. [sent-248, score-1.017]
78 …(incorrect analyses might map to the same feature values as the correct analysis). [sent-253, score-0.295]
79 Finally, we note that minimizing the distance to the average feature values of the seven training languages (Avg. [sent-254, score-0.405]
80 in Table 2) yields subpar performance and very large distances between predicted analyses and target feature values (4.…). [sent-255, score-0.295]
81 Figure 2: Locations in Feature Space of Linguistica predictions (green squares), gold standard analyses (red triangles), and our model's nearest neighbor predictions (blue circles). [sent-258, score-0.776]
82 This result may indicate that the average feature point between training languages is simply unattainable as an analysis of a real lexicon of nouns. [sent-260, score-0.36]
83 Visualizing Locations in Feature Space: Besides assessing our method quantitatively, we can also visualize the eight languages in universal feature space according to (i) their gold standard analyses, (ii) the predictions of our model, and (iii) the predictions of Linguistica. [sent-261, score-0.715]
84 With the exception of English, our model’s analyses lie closer in feature space to their gold standard counterparts than those of the baseline. [sent-264, score-0.352]
85 Learning Curves: We also measured the performance of our method as a function of the number of languages in the training set. [sent-267, score-0.269]
86 For each target language, we consider all possible training sets of sizes ranging from 1 to 7 and select the predictions which bring our test language closest in distance to one of the languages in the set. [sent-268, score-0.405]
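A small sketch of this learning-curve protocol under the stated setup (all subsets of the seven candidate training languages); `predict` is assumed to close over the feature map and search routine, as in the rule sketched earlier.

```python
from itertools import combinations

def learning_curve(test_x, candidates, predict):
    """For every training-set size k = 1..7, collect predictions over all
    size-k subsets of the candidate training languages; accuracies would
    then be averaged within each k."""
    curve = {}
    for k in range(1, len(candidates) + 1):
        curve[k] = [predict(test_x, list(subset))
                    for subset in combinations(candidates, k)]
    return curve
```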
87 Figure 3 shows the resulting learning curves averaged over all test languages (left), as well as broken down by test language (right). [sent-270, score-0.269]
88 The overall trend is clear: as additional languages are added to the training set, test performance improves. [sent-271, score-0.225]
89 The more training languages available, the greater the chance that we can guide our test language into very close proximity to one of them. Figure 3: Learning curves for our model as the number of training languages increases. [sent-276, score-0.527]
90 The figure on the left shows the average accuracy of all eight languages for increasingly larger training sets (results are averaged over all training sets of size 1, 2, 3, …, 7). [sent-277, score-0.348]
91 We separately scaled accuracy and distance to the unit interval for each test language (as some test languages are inherently more difficult than others). [sent-286, score-0.38]
92 5 Conclusions and Future Work The approach presented in this paper recasts morphological induction as a structured prediction task. [sent-289, score-0.589]
93 We assume the presence of morphologically labeled languages as training examples which guide the induction process for unlabeled test languages. [sent-290, score-0.357]
94 We developed a novel structured nearest neighbor approach for this task, in which all languages and their morphological analyses lie in a universal feature space. [sent-291, score-1.55]
95 The task of the learner is to search through the space of morphological analyses for the test language and return the result which lies closest to one of the training languages. Figure 4: Accuracy vs. [sent-292, score-0.743]
96 Distance: For all 56 possible test-train language pairs, we computed test accuracy along with resulting distance in universal feature space to the training language. [sent-293, score-0.41]
97 Our empirical findings validate this approach: On a set of eight different languages, our method yields substantial accuracy gains over a traditional MDL-based approach in the task of nominal morphological induction. [sent-297, score-0.582]
98 Besides potential gains in prediction accuracy, this approach may shed light on deeper relationships between languages than are otherwise apparent. [sent-301, score-0.305]
99 Unsupervised morpheme segmentation and morphology induction from text corpora using Morfessor 1.0. [sent-347, score-0.262]
100 Adding more languages improves unsupervised multilingual part-of-speech tagging: a Bayesian non-parametric approach. [sent-434, score-0.362]
wordName wordTfidf (topN-words)
[('morphological', 0.366), ('deletion', 0.239), ('analyses', 0.237), ('languages', 0.225), ('suffixes', 0.22), ('nearest', 0.217), ('suffix', 0.212), ('neighbor', 0.2), ('universal', 0.177), ('stems', 0.17), ('snyder', 0.153), ('goldsmith', 0.132), ('hungarian', 0.126), ('stem', 0.114), ('phonological', 0.102), ('stage', 0.1), ('yarowsky', 0.098), ('linguistica', 0.094), ('nominal', 0.093), ('morphology', 0.088), ('estonian', 0.087), ('serbian', 0.087), ('creutz', 0.085), ('prediction', 0.08), ('eight', 0.076), ('multilingual', 0.075), ('induction', 0.073), ('distance', 0.071), ('benjamin', 0.07), ('structured', 0.07), ('naseem', 0.067), ('mdl', 0.066), ('slovene', 0.063), ('regina', 0.062), ('unsupervised', 0.062), ('predictions', 0.061), ('tahira', 0.059), ('morphologically', 0.059), ('inflectional', 0.059), ('inductive', 0.059), ('feature', 0.058), ('rule', 0.057), ('space', 0.057), ('null', 0.057), ('successor', 0.056), ('bulgarian', 0.056), ('label', 0.055), ('segmentation', 0.054), ('seven', 0.051), ('lagus', 0.051), ('unsegmented', 0.051), ('perceptron', 0.05), ('closest', 0.048), ('morpheme', 0.047), ('accuracy', 0.047), ('parallel', 0.046), ('analysis', 0.045), ('self', 0.044), ('percentage', 0.044), ('curves', 0.044), ('adler', 0.044), ('discoveries', 0.044), ('entropies', 0.044), ('penultimate', 0.044), ('romanian', 0.044), ('stemsuffix', 0.044), ('typologically', 0.044), ('uralic', 0.044), ('locations', 0.042), ('monolingual', 0.039), ('serves', 0.039), ('jacob', 0.038), ('upper', 0.038), ('yi', 0.038), ('erjavec', 0.038), ('mathias', 0.038), ('eac', 0.038), ('reanalyze', 0.038), ('hug', 0.038), ('ink', 0.038), ('instrumental', 0.038), ('impression', 0.038), ('kiss', 0.038), ('eisenstein', 0.037), ('interval', 0.037), ('lexicons', 0.036), ('lies', 0.035), ('varies', 0.034), ('aggregate', 0.034), ('krista', 0.034), ('schone', 0.034), ('ngai', 0.034), ('inflecting', 0.034), ('vowel', 0.034), ('aryg', 0.034), ('dasgupta', 0.034), ('close', 0.033), ('plausible', 0.033), ('lexicon', 0.032), ('resnik', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction
Author: Young-Bum Kim ; Joao Graca ; Benjamin Snyder
Abstract: In this paper, we consider the problem of unsupervised morphological analysis from a new angle. Past work has endeavored to design unsupervised learning methods which explicitly or implicitly encode inductive biases appropriate to the task at hand. We propose instead to treat morphological analysis as a structured prediction problem, where languages with labeled data serve as training examples for unlabeled languages, without the assumption of parallel data. We define a universal morphological feature space in which every language and its morphological analysis reside. We develop a novel structured nearest neighbor prediction method which seeks to find the morphological analysis for each unlabeled language which lies as close as possible in the feature space to a training language. We apply our model to eight inflecting languages, and induce nominal morphology with substantially higher accuracy than a traditional, MDL-based approach. Our analysis indicates that accuracy continues to improve substantially as the number of training languages increases.
2 0.19480969 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
Author: Christos Christodoulopoulos ; Sharon Goldwater ; Mark Steedman
Abstract: In this paper we present a fully unsupervised syntactic class induction system formulated as a Bayesian multinomial mixture model, where each word type is constrained to belong to a single class. By using a mixture model rather than a sequence model (e.g., HMM), we are able to easily add multiple kinds of features, including those at both the type level (morphology features) and token level (context and alignment features, the latter from parallel corpora). Using only context features, our system yields results comparable to state-of-the art, far better than a similar model without the one-class-per-type constraint. Using the additional features provides added benefit, and our final system outperforms the best published results on most of the 25 corpora tested.
3 0.15663446 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
Author: Shay B. Cohen ; Dipanjan Das ; Noah A. Smith
Abstract: We describe a method for prediction of linguistic structure in a language for which only unlabeled data is available, using annotated data from a set of one or more helper languages. Our approach is based on a model that locally mixes between supervised models from the helper languages. Parallel data is not used, allowing the technique to be applied even in domains where human-translated texts are unavailable. We obtain state-of-the-art performance for two tasks of structure prediction: unsupervised part-of-speech tagging and unsupervised dependency parsing.
4 0.13875221 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers
Author: Ryan McDonald ; Slav Petrov ; Keith Hall
Abstract: We present a simple method for transferring dependency parsers from source languages with labeled training data to target languages without labeled training data. We first demonstrate that delexicalized parsers can be directly transferred between languages, producing significantly higher accuracies than unsupervised parsers. We then use a constraint-driven learning algorithm where constraints are drawn from parallel corpora to project the final parser. Unlike previous work on projecting syntactic resources, we show that simple methods for introducing multiple source languages can significantly improve the overall quality of the resulting parsers. The projected parsers from our system result in state-of-the-art performance when compared to previously studied unsupervised and projected parsing systems across eight different languages.
5 0.12758477 99 emnlp-2011-Non-parametric Bayesian Segmentation of Japanese Noun Phrases
Author: Yugo Murawaki ; Sadao Kurohashi
Abstract: A key factor of high quality word segmentation for Japanese is a high-coverage dictionary, but it is costly to manually build such a lexical resource. Although external lexical resources for human readers are potentially good knowledge sources, they have not been utilized due to differences in segmentation criteria. To supplement a morphological dictionary with these resources, we propose a new task of Japanese noun phrase segmentation. We apply non-parametric Bayesian language models to segment each noun phrase in these resources according to the statistical behavior of its supposed constituents in text. For inference, we propose a novel block sampling procedure named hybrid type-based sampling, which has the ability to directly escape a local optimum that is not too distant from the global optimum. Experiments show that the proposed method efficiently corrects the initial segmentation given by a morphological analyzer.
6 0.12741284 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model
7 0.082188934 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
8 0.072717547 122 emnlp-2011-Simple Effective Decipherment via Combinatorial Optimization
9 0.071895011 96 emnlp-2011-Multilayer Sequence Labeling
10 0.061429296 48 emnlp-2011-Enhancing Chinese Word Segmentation Using Unlabeled Data
11 0.060721144 129 emnlp-2011-Structured Sparsity in Structured Prediction
12 0.060045965 115 emnlp-2011-Relaxed Cross-lingual Projection of Constituent Syntax
13 0.059373796 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing
14 0.059359264 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
15 0.05508364 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives
16 0.051943239 145 emnlp-2011-Unsupervised Semantic Role Induction with Graph Partitioning
17 0.051866658 69 emnlp-2011-Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources
18 0.051209513 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
19 0.050035767 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
20 0.049764805 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification
topicId topicWeight
[(0, 0.218), (1, 0.019), (2, -0.033), (3, 0.078), (4, -0.017), (5, 0.096), (6, -0.205), (7, 0.021), (8, -0.291), (9, 0.035), (10, -0.075), (11, 0.068), (12, -0.071), (13, 0.046), (14, -0.053), (15, -0.013), (16, -0.092), (17, 0.044), (18, -0.049), (19, 0.109), (20, -0.059), (21, -0.019), (22, 0.079), (23, 0.066), (24, 0.058), (25, 0.085), (26, -0.016), (27, -0.054), (28, 0.014), (29, -0.126), (30, 0.087), (31, -0.036), (32, -0.06), (33, -0.055), (34, 0.135), (35, -0.055), (36, 0.108), (37, 0.057), (38, 0.014), (39, -0.193), (40, 0.081), (41, 0.063), (42, -0.077), (43, 0.062), (44, -0.069), (45, -0.106), (46, -0.009), (47, 0.035), (48, 0.211), (49, 0.106)]
simIndex simValue paperId paperTitle
same-paper 1 0.96115017 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction
Author: Young-Bum Kim ; Joao Graca ; Benjamin Snyder
Abstract: In this paper, we consider the problem of unsupervised morphological analysis from a new angle. Past work has endeavored to design unsupervised learning methods which explicitly or implicitly encode inductive biases appropriate to the task at hand. We propose instead to treat morphological analysis as a structured prediction problem, where languages with labeled data serve as training examples for unlabeled languages, without the assumption of parallel data. We define a universal morphological feature space in which every language and its morphological analysis reside. We develop a novel structured nearest neighbor prediction method which seeks to find the morphological analysis for each unlabeled language which lies as close as possible in the feature space to a training language. We apply our model to eight inflecting languages, and induce nominal morphology with substantially higher accuracy than a traditional, MDL-based approach. Our analysis indicates that accuracy continues to improve substantially as the number of training languages increases.
2 0.7776233 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
Author: Christos Christodoulopoulos ; Sharon Goldwater ; Mark Steedman
Abstract: In this paper we present a fully unsupervised syntactic class induction system formulated as a Bayesian multinomial mixture model, where each word type is constrained to belong to a single class. By using a mixture model rather than a sequence model (e.g., HMM), we are able to easily add multiple kinds of features, including those at both the type level (morphology features) and token level (context and alignment features, the latter from parallel corpora). Using only context features, our system yields results comparable to state-of-the art, far better than a similar model without the one-class-per-type constraint. Using the additional features provides added benefit, and our final system outperforms the best published results on most of the 25 corpora tested.
3 0.62752408 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model
Author: Markus Dreyer ; Jason Eisner
Abstract: We present an inference algorithm that organizes observed words (tokens) into structured inflectional paradigms (types). It also naturally predicts the spelling of unobserved forms that are missing from these paradigms, and discovers inflectional principles (grammar) that generalize to wholly unobserved words. Our Bayesian generative model of the data explicitly represents tokens, types, inflections, paradigms, and locally conditioned string edits. It assumes that inflected word tokens are generated from an infinite mixture of inflectional paradigms (string tuples). Each paradigm is sampled all at once from a graphical model, whose potential functions are weighted finite-state transducers with language-specific parameters to be learned. These assumptions naturally lead to an elegant empirical Bayes inference procedure that exploits Monte Carlo EM, belief propagation, and dynamic programming. Given 50–100 seed paradigms, adding a 10-million-word corpus reduces prediction error for morphological inflections by up to 10%.
4 0.5743174 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
Author: Shay B. Cohen ; Dipanjan Das ; Noah A. Smith
Abstract: We describe a method for prediction of linguistic structure in a language for which only unlabeled data is available, using annotated data from a set of one or more helper languages. Our approach is based on a model that locally mixes between supervised models from the helper languages. Parallel data is not used, allowing the technique to be applied even in domains where human-translated texts are unavailable. We obtain state-of-the-art performance for two tasks of structure prediction: unsupervised part-of-speech tagging and unsupervised dependency parsing.
5 0.46681523 99 emnlp-2011-Non-parametric Bayesian Segmentation of Japanese Noun Phrases
Author: Yugo Murawaki ; Sadao Kurohashi
Abstract: A key factor of high quality word segmentation for Japanese is a high-coverage dictionary, but it is costly to manually build such a lexical resource. Although external lexical resources for human readers are potentially good knowledge sources, they have not been utilized due to differences in segmentation criteria. To supplement a morphological dictionary with these resources, we propose a new task of Japanese noun phrase segmentation. We apply non-parametric Bayesian language models to segment each noun phrase in these resources according to the statistical behavior of its supposed constituents in text. For inference, we propose a novel block sampling procedure named hybrid type-based sampling, which has the ability to directly escape a local optimum that is not too distant from the global optimum. Experiments show that the proposed method efficiently corrects the initial segmentation given by a morphological analyzer.
6 0.43539602 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers
7 0.3187252 96 emnlp-2011-Multilayer Sequence Labeling
8 0.3065244 122 emnlp-2011-Simple Effective Decipherment via Combinatorial Optimization
9 0.30246824 115 emnlp-2011-Relaxed Cross-lingual Projection of Constituent Syntax
10 0.2881133 77 emnlp-2011-Large-Scale Cognate Recovery
11 0.2830058 19 emnlp-2011-Approximate Scalable Bounded Space Sketch for Large Data NLP
12 0.26899478 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances
13 0.26442361 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification
14 0.26437068 3 emnlp-2011-A Correction Model for Word Alignments
15 0.26132062 69 emnlp-2011-Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources
16 0.26096746 100 emnlp-2011-Optimal Search for Minimum Error Rate Training
17 0.25593299 143 emnlp-2011-Unsupervised Information Extraction with Distributional Prior Knowledge
18 0.25459236 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
19 0.25440902 48 emnlp-2011-Enhancing Chinese Word Segmentation Using Unlabeled Data
20 0.25125748 124 emnlp-2011-Splitting Noun Compounds via Monolingual and Bilingual Paraphrasing: A Study on Japanese Katakana Words
topicId topicWeight
[(23, 0.107), (36, 0.039), (37, 0.028), (45, 0.057), (53, 0.02), (54, 0.026), (62, 0.014), (64, 0.09), (66, 0.366), (69, 0.01), (79, 0.041), (82, 0.039), (90, 0.019), (96, 0.031), (98, 0.016)]
simIndex simValue paperId paperTitle
1 0.93525034 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Angel X. Chang ; Daniel Jurafsky
Abstract: We show that categories induced by unsupervised word clustering can surpass the performance of gold part-of-speech tags in dependency grammar induction. Unlike classic clustering algorithms, our method allows a word to have different tags in different contexts. In an ablative analysis, we first demonstrate that this context-dependence is crucial to the superior performance of gold tags: requiring a word to always have the same part-of-speech significantly degrades the performance of manual tags in grammar induction, eliminating the advantage that human annotation has over unsupervised tags. We then introduce a sequence modeling technique that combines the output of a word clustering algorithm with context-colored noise, to allow words to be tagged differently in different contexts. With these new induced tags as input, our state-of-the-art dependency grammar inducer achieves 59.1% directed accuracy on Section 23 (all sentences) of the Wall Street Journal (WSJ) corpus, 0.7% higher than using gold tags.
2 0.93509817 60 emnlp-2011-Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation
Author: Jason Riesa ; Ann Irvine ; Daniel Marcu
Abstract: unkown-abstract
3 0.93011969 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context
Author: Tim Van de Cruys ; Thierry Poibeau ; Anna Korhonen
Abstract: This paper presents a novel method for the computation of word meaning in context. We make use of a factorization model in which words, together with their window-based context words and their dependency relations, are linked to latent dimensions. The factorization model allows us to determine which dimensions are important for a particular context, and adapt the dependency-based feature vector of the word accordingly. The evaluation on a lexical substitution task, carried out for both English and French, indicates that our approach is able to reach better results than state-of-the-art methods in lexical substitution, while at the same time providing more accurate meaning representations.
same-paper 4 0.87299371 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction
Author: Young-Bum Kim ; Joao Graca ; Benjamin Snyder
Abstract: In this paper, we consider the problem of unsupervised morphological analysis from a new angle. Past work has endeavored to design unsupervised learning methods which explicitly or implicitly encode inductive biases appropriate to the task at hand. We propose instead to treat morphological analysis as a structured prediction problem, where languages with labeled data serve as training examples for unlabeled languages, without the assumption of parallel data. We define a universal morphological feature space in which every language and its morphological analysis reside. We develop a novel structured nearest neighbor prediction method which seeks to find the morphological analysis for each unlabeled language which lies as close as possible in the feature space to a training language. We apply our model to eight inflecting languages, and induce nominal morphology with substantially higher accuracy than a traditional, MDL-based approach. Our analysis indicates that accuracy continues to improve substantially as the number of training languages increases.
5 0.63630772 107 emnlp-2011-Probabilistic models of similarity in syntactic context
Author: Diarmuid O Seaghdha ; Anna Korhonen
Abstract: This paper investigates novel methods for incorporating syntactic information in probabilistic latent variable models of lexical choice and contextual similarity. The resulting models capture the effects of context on the interpretation of a word and in particular its effect on the appropriateness of replacing that word with a potentially related one. Evaluating our techniques on two datasets, we report performance above the prior state of the art for estimating sentence similarity and ranking lexical substitutes.
6 0.6337992 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
7 0.58184224 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing
8 0.58106846 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases
9 0.57431585 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model
10 0.57155585 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
11 0.56813663 138 emnlp-2011-Tuning as Ranking
12 0.56062436 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
13 0.55102962 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
14 0.55073595 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
15 0.54868644 97 emnlp-2011-Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French
16 0.54783648 37 emnlp-2011-Cross-Cutting Models of Lexical Semantics
17 0.54406995 79 emnlp-2011-Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction
18 0.53801805 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification
19 0.53176272 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers
20 0.52058113 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation