emnlp emnlp2011 emnlp2011-146 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shay B. Cohen ; Dipanjan Das ; Noah A. Smith
Abstract: We describe a method for prediction of linguistic structure in a language for which only unlabeled data is available, using annotated data from a set of one or more helper languages. Our approach is based on a model that locally mixes between supervised models from the helper languages. Parallel data is not used, allowing the technique to be applied even in domains where human-translated texts are unavailable. We obtain state-of-the-art performance for two tasks of structure prediction: unsupervised part-of-speech tagging and unsupervised dependency parsing.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We describe a method for prediction of linguistic structure in a language for which only unlabeled data is available, using annotated data from a set of one or more helper languages. [sent-5, score-0.645]
2 Our approach is based on a model that locally mixes between supervised models from the helper languages. [sent-6, score-0.682]
3 We obtain state-of-the-art performance for two tasks of structure prediction: unsupervised part-of-speech tagging and unsupervised dependency parsing. [sent-8, score-0.369]
4 1 Introduction A major focus of recent NLP research has involved unsupervised learning of structure such as POS tag sequences and parse trees (Klein and Manning, 2004; Johnson et al. [sent-9, score-0.213]
5 1 In this paper, we present an approach to using annotated data from one or more languages (helper languages) to learn models for another language that lacks annotated data (the target language). [sent-19, score-0.169]
6 We focus on generative probabilistic models parameterized by multinomial distributions. [sent-24, score-0.167]
7 We begin with supervised maximum likelihood estimates for models of the helper languages. [sent-25, score-0.648]
8 In the second stage, we learn a model for the target language using unannotated data, maximizing likelihood over interpolations of the helper language models’ distributions. [sent-26, score-0.616]
9 The idea of letting parameters drift down phylogenetic trees (Berg-Kirkpatrick and Klein, 2010) is comparable, but the practical assumption of supervised helper languages is new to this work. [sent-32, score-0.759]
10 Naseem et al. (2010) used universal syntactic categories and rules to improve grammar induction, but their model required expert handwritten rules as constraints. [sent-34, score-0.271]
11 Herein, we specifically focus on two problems in linguistic structure prediction: unsupervised POS tagging and unsupervised dependency grammar induction. [sent-35, score-0.474]
12 We also experiment with unsupervised learning of dependency structures from words, by combining our tagger and parser. [sent-38, score-0.221]
13 Select a set of L helper languages for which there exists annotated data ⟨D1, . . . , DL⟩. [sent-43, score-0.756]
14 The link between the languages is achieved through coarse-grained categories, which are now commonplace (and arguably central to any theory of natural language syntax). [sent-75, score-0.166]
15 A key novel contribution is the use of helper languages for initialization, and of unsupervised learning to learn the contribution of each helper language to that initialization (step 3). [sent-76, score-1.499]
16 3 Interpolated Multilingual Probabilistic Context-Free Grammars Our focus in this paper is on models built from multinomial distributions that are related to each other through a generative process such as a probabilistic context-free grammar (PCFG). [sent-78, score-0.275]
17 The shaded area corresponds to a convex hull inside the probability simplex, indicating a mixture of the parameters of the four languages shown in the figure. [sent-80, score-0.409]
18 Each such estimate θ(ℓ), for 1 ≤ ℓ ≤ L, corresponds to the maximum likelihood estimate based on annotated data for the ℓth helper language. [sent-94, score-0.703]
19 Then, to create a model for the new language, we define a new set of parameters θ as: $\theta_{k,i} = \sum_{\ell=1}^{L} \beta_{\ell,k}\, \theta^{(\ell)}_{k,i}$ (4), where β is the set of coefficients that we will now be interested in estimating (instead of directly estimating θ). [sent-95, score-0.274]
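To make Eq. 4 concrete, here is a minimal numpy sketch of the interpolation step; the function name and the toy numbers are our own illustration, not taken from the paper:

```python
import numpy as np

def interpolate_multinomial(helper_thetas, beta_k):
    """Eq. 4: mix helper-language multinomials into one target multinomial.

    helper_thetas: shape (L, V) -- one distribution per helper language
                   over the V events of multinomial k (held fixed).
    beta_k:        shape (L,)   -- mixture weights for this multinomial,
                   a point on the probability simplex.
    The result is again a valid distribution, because a convex combination
    of points on the simplex stays inside the simplex.
    """
    return beta_k @ helper_thetas  # theta_{k,i} = sum_l beta_{l,k} theta^{(l)}_{k,i}

# Toy example: L = 2 helper languages, a 3-event multinomial.
helpers = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.6, 0.3]])
beta_k = np.array([0.25, 0.75])
print(interpolate_multinomial(helpers, beta_k))  # [0.25 0.5 0.25]
```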
20 and the dependency grammar induction we experiment with in §6. [sent-107, score-0.265]
21 We assume that there exist L sets of parameters for this PCFG, each corresponding to a helper language. [sent-110, score-0.661]
22 where $\theta^{(\ell)}_{A \to \alpha}$ (5) is the probability associated with rule $A \to \alpha$ in the ℓth helper language. [sent-128, score-0.616]
23 At each point, the derivational process of this PCFG uses the nonterminal’s specific β coefficients to choose one of the helper languages. [sent-129, score-0.739]
24 Such a construction allows more syntactic variability in the language we are trying to estimate, originating in the syntax of the various helper languages. [sent-134, score-0.616]
25 These derivations will now include L × K features of the form gℓ,k(x, y), corresponding to counts of the event of choosing the ℓth mixture component for multinomial k. [sent-140, score-0.441]
26 4 Inference and Parameter Estimation The main building block commonly required for unsupervised learning in NLP is that of computing feature expectations for a given model. [sent-145, score-0.208]
27 Berg-Kirkpatrick et al. (2010) found that replacing traditional multinomial parameterizations with locally normalized, feature-based log-linear models was advantageous. [sent-162, score-0.17]
28 For such a feature-rich model, our multilingual modeling framework still substitutes θ with a mixture of supervised multinomials for L helper languages as in Eq. 4. [sent-164, score-1.122]
29 However, for computational convenience, we also reparametrize the mixture coefficients β: $\beta_{\ell,k} = \frac{\exp \gamma_{\ell,k}}{\sum_{\ell'=1}^{L} \exp \gamma_{\ell',k}}$ (10) Here, each γℓ,k is an unconstrained parameter, and the above “softmax” transformation ensures that β lies within the probability simplex for context k. [sent-166, score-0.422]
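A small sketch of the softmax reparametrization in Eq. 10; the max-shift for numerical stability is our addition, since the paper does not discuss implementation details:

```python
import numpy as np

def beta_from_gamma(gamma_k):
    """Eq. 10: softmax over the unconstrained gamma_{l,k} for context k,
    guaranteeing that the resulting beta_{l,k} lie on the simplex."""
    z = gamma_k - gamma_k.max()   # shift for numerical stability
    expz = np.exp(z)
    return expz / expz.sum()

gamma_k = np.array([0.3, -1.2, 2.0, 0.0])  # one entry per helper language
beta_k = beta_from_gamma(gamma_k)
assert abs(beta_k.sum() - 1.0) < 1e-12
```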
30 In addition to these estimation techniques, which are based on the optimization of the log-likelihood, we also consider a trivially simple technique for estimating β: setting βℓ,k to the uniform weight 1/L, where L is the number of helper languages. [sent-173, score-0.821]
31 Whenever a probability θi within a multinomial distribution involves a coarse-grained category c as an event (i.e. [sent-176, score-0.188]
32 During this expansion process for a coarse event, we tried adding random noise to $\lambda_t^{-1}\theta_i(c)$ and renormalizing, to break symmetry between the fine events, but that was found to be harmful in preliminary experiments. [sent-180, score-0.238]
33 The result of this expansion is a model in the desired family; we use it to initialize conventional unsupervised parameter estimation. [sent-181, score-0.216]
34 Lexical parameters, if any, do not undergo this expansion process; they are estimated anew in the fine-grained model during unsupervised learning, initialized using standard methods. [sent-182, score-0.209]
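The following sketch shows one plausible reading of this coarse-to-fine expansion for transition parameters, under the assumption (consistent with the footnote above) that each coarse probability is split evenly among the fine tags of the destination category; all names are ours:

```python
import numpy as np

def expand_coarse_transitions(theta_coarse, fine_to_coarse):
    """Expand a coarse transition matrix to fine-grained tags.

    theta_coarse:   (C, C) matrix; row c is a distribution over coarse tags.
    fine_to_coarse: length-F sequence mapping each fine tag to its coarse tag.
    Each coarse probability is divided evenly among the fine tags of the
    destination category (a symmetric split, since noisy symmetry-breaking
    was reported to hurt).  Rows of the result still sum to one.
    """
    fine_to_coarse = np.asarray(fine_to_coarse)
    counts = np.bincount(fine_to_coarse)        # fine tags per coarse tag
    F = len(fine_to_coarse)
    theta_fine = np.zeros((F, F))
    for i, ci in enumerate(fine_to_coarse):
        for j, cj in enumerate(fine_to_coarse):
            theta_fine[i, j] = theta_coarse[ci, cj] / counts[cj]
    return theta_fine

theta_c = np.array([[0.9, 0.1],
                    [0.4, 0.6]])                # 2 coarse tags
print(expand_coarse_transitions(theta_c, [0, 0, 1]).sum(axis=1))  # [1. 1. 1.]
```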
35 We first note the characteristics of the datasets and the universal POS tags used in multilingual modeling. [sent-184, score-0.401]
36 1 Data For our experiments, we fixed a set of four helper languages with relatively large amounts of data, displaying nontrivial linguistic diversity: Czech (Slavic), English (West-Germanic), German (West-Germanic), and Italian (Romance). [sent-186, score-0.727]
37 This was the only set of helper languages we tested; improvements are likely possible. [sent-190, score-0.727]
38 We leave an exploration of helper language choice (a subset selection problem) to future research, instead demonstrating that the concept has merit. [sent-191, score-0.616]
39 Following standard practice, in unsupervised grammar induction experiments we remove punctuation and then eliminate sentences from the data of length greater than 10. [sent-197, score-0.295]
40 These follow recent work by Das and Petrov (2011) on unsupervised POS tagging in a multilingual setting with parallel data, and have been described in detail by Petrov et al. [sent-204, score-0.334]
41 While there might be some controversy about what an appropriate universal tag set should include, these 12 categories (or a subset) cover the most frequent parts of speech and exist in one form or another in all of the languages that we studied. [sent-206, score-0.381]
42 For each language in our data, a mapping from the fine-grained treebank POS tags to these universal POS tags was constructed manually by Petrov et al. [sent-207, score-0.456]
43 1 Model The model is a hidden Markov model (HMM), which has been popular for unsupervised tagging tasks (Merialdo, 1994; Elworthy, 1994; Smith and Eisner, 2005; Berg-Kirkpatrick et al. [sent-213, score-0.181]
44 These locally normalized log-linear models can look at various aspects of the observation x given a tag y, or the pair of tags in a transition, incorporating overlapping features. [sent-217, score-0.26]
45 Berg-Kirkpatrick et al. (2010) used only a single indicator feature of a tag pair, essentially equating to a traditional multinomial distribution. [sent-222, score-0.24]
46 Since only the unlexicalized transition distributions are common across multiple languages, assuming that they all use a set of universal POS tags, akin to Eq. 4, [sent-224, score-0.284]
47 we can have a multilingual version of the transition distributions, by incorporating supervised helper transition probabilities. [sent-225, score-0.989]
48 Thus, we can write: $\theta_{y \to y'} = \sum_{\ell=1}^{L} \beta_{\ell,y}\, \theta^{(\ell)}_{y \to y'}$ (11) We use the above expression to replace the transition distributions, obtaining a multilingual mixture version of the model. [sent-226, score-0.5]
49 Here, the transition probabilities $\theta^{(\ell)}_{y \to y'}$ for the ℓth helper language are fixed after being estimated using maximum likelihood estimation on the helper language’s treebank. [sent-227, score-1.385]
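A hedged sketch of Eq. 11, mixing fixed helper transition matrices with per-tag weights; the array layout and function name are our own choices:

```python
import numpy as np

def mix_transitions(helper_trans, beta):
    """Eq. 11: per-row mixture of fixed helper transition matrices.

    helper_trans: (L, K, K) -- MLE transition matrix of each helper
                  language over the K universal tags, held fixed.
    beta:         (L, K)    -- beta[l, y] weights helper l when leaving
                  tag y, so each row y has its own mixture.
    Returns a (K, K) matrix whose rows are valid distributions.
    """
    # theta[y, y'] = sum_l beta[l, y] * helper_trans[l, y, y']
    return np.einsum('ly,lyz->yz', beta, helper_trans)
```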
50 Table 2: Results for unsupervised POS induction (a) without a tagging dictionary and (b) with a tag dictionary constructed from the training section of the corresponding treebank. [sent-255, score-0.52]
51 “Mixture+DG” is the model where the multilingual mixture coefficients β of the helper languages are estimated using coarse tags (§4), followed by expansion (§5), and then initializing DG with the expanded transition parameters. [sent-258, score-1.605]
52 “Uniform+DG” is the case where β are set to 1/4, and transitions of the helper languages are mixed, expanded, and then used to initialize DG. [sent-259, score-0.727]
53 In the case of (b), the tag dictionary solves the problem of tag identification, and performance is measured using per-word POS accuracy. [sent-262, score-0.285]
54 (2010), while for the mixture case, we computed gradients with respect to γ, the unconstrained parameters used to express the mixture coefficients β (see Eq. 10). [sent-267, score-0.674]
55 For our multilingual model (step 3 in §2), we similarly sampled initial real values from N(0, 0.01); [sent-274, score-0.215]
56 as explained in §2, coarse universal tags are used. [sent-277, score-0.229]
57 After the mixture parameters γ are estimated, we compute the mixture probabilities β using Eq. 10. [sent-279, score-0.551]
58 Next, for each tag pair y, y′, we compute θy→y′, which are the coarse transition probabilities interpolated using β, given the helper languages. [sent-281, score-0.954]
59 Finally, we train a feature-HMM by initializing its transition parameters with natural logarithms of the expanded θ parameters, and the emission parameters using small random real values sampled from N(0, 0.01). [sent-283, score-0.315]
60 This implies that the lexicalized emission parameters η that were previously estimated in the coarse multilingual model are thrown away and not used for initialization; instead standard initialization is used. [sent-285, score-0.382]
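A minimal sketch of this initialization step, assuming N(0, 0.01) denotes a variance of 0.01; the function and its arguments are illustrative, not the authors' code:

```python
import numpy as np

def init_feature_hmm(theta_fine_trans, vocab_size, seed=0):
    """Initialize feature-HMM weights from the expanded transitions.

    Transition weights start at log(theta), so the initial model
    reproduces the expanded transition distributions exactly; emission
    weights are drawn from N(0, 0.01) (read here as variance 0.01),
    since the coarse model's lexical parameters eta are discarded.
    """
    rng = np.random.default_rng(seed)
    F = theta_fine_trans.shape[0]
    trans_weights = np.log(theta_fine_trans + 1e-12)   # avoid log(0)
    emit_weights = rng.normal(0.0, np.sqrt(0.01), size=(F, vocab_size))
    return trans_weights, emit_weights
```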
61 In this baseline, we set the number of HMM states to the number of fine-grained treebank tags for the given language. [sent-296, score-0.173]
62 The first initializes training of the target language’s POS model using a uniform mixture of the helper language models [sent-298, score-0.962]
63 (i.e., each βℓ,y = 1/L = 1/4), and expansion from coarse-grained to fine-grained POS tags as described in §5. [sent-300, score-0.179]
64 The second version estimates the mixture coefficients to maximize likelihood, then expands the POS tags (§5), using the result to initialize training of the final model. [sent-302, score-0.548]
65 No Tag Dictionary For each of the above configurations, we ran purely unsupervised training without a tag dictionary, and evaluated using one-to-one mapping accuracy, constraining at most one HMM state to map to a unique treebank tag in the test data, using maximum bipartite matching. [sent-304, score-0.433]
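One-to-one mapping accuracy can be computed with a maximum-weight bipartite matching over the state/tag contingency table, e.g. via the Hungarian algorithm; a sketch (our own, using scipy) follows:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def one_to_one_accuracy(pred_states, gold_tags, n_states, n_tags):
    """1-1 mapping accuracy: each HMM state maps to at most one gold tag;
    the token-maximizing mapping is a maximum-weight bipartite matching,
    solvable exactly with the Hungarian algorithm."""
    counts = np.zeros((n_states, n_tags), dtype=int)  # contingency table
    for s, t in zip(pred_states, gold_tags):
        counts[s, t] += 1
    rows, cols = linear_sum_assignment(-counts)       # negate to maximize
    return counts[rows, cols].sum() / len(gold_tags)

# Toy usage with 3 states and 2 gold tags (integer ids).
print(one_to_one_accuracy([0, 0, 1, 2], [1, 1, 0, 0], 3, 2))  # 0.75
```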
66 With a Tag Dictionary We also ran a second version of each experimental configuration, where we used a tag dictionary to restrict the possible path sequences of the HMM during both learning and inference. [sent-306, score-0.181]
67 This tag dictionary was constructed only from the training section of a given language’s treebank. [sent-307, score-0.181]
68 For this experiment we removed punctuation from the training and test data, enabling direct use within the dependency grammar induction experiments. [sent-309, score-0.265]
69 We did not choose the other variant, many-to-one mapping accuracy, because quite often the metric mapped several HMM states to one treebank tag, leaving many treebank tags unaccounted for. [sent-311, score-0.259]
70 Without a tag dictionary, in eight out of ten cases, either Uniform+DG or Mixture+DG outperforms the monolingual baseline (Table 2a). [sent-315, score-0.194]
71 For six of these eight languages, the latter model where the mixture coefficients are learned automatically fares better than uniform weighting. [sent-316, score-0.469]
72 With a tag dictionary, the multilingual variants outperform the baseline in seven out of ten cases, and the learned mixture outperforms or matches the uniform mixture in five of those seven (Table 2b). [sent-317, score-0.908]
73 4 Dependency Grammar Induction We next describe experiments for dependency grammar induction. [sent-319, score-0.184]
74 As the basic grammatical model, we adopt the dependency model with valence (Klein and Manning, 2004), which forms the basis for state-of-the-art results for dependency grammar induction in various settings (Cohen and Smith, 2009; Spitkovsky et al. [sent-320, score-0.344]
75 Uniform and Mixture behave similarly, with a slight advantage to the trained mixture setting. [sent-330, score-0.253]
76 Using EM to train the mixture coefficients more often hurts than helps (six languages out of ten). [sent-331, score-0.487]
77 It is well known that likelihood does not correlate with the true accuracy measurement. (Footnote 9: Its supervised performance is still far from the supervised state of the art in dependency parsing.) [sent-332, score-0.173]
78 Table 3: Results for dependency grammar induction given gold-standard POS tags, reported as attachment accuracy (fraction of parents which are correct). [sent-333, score-0.265]
79 Figure: Projection of the learned mixture coefficients through PCA. [sent-338, score-0.376]
80 Since likelihood does not correlate with accuracy, it is unsurprising that this holds in the constrained mixture family as well. [sent-345, score-0.283]
81 In future work, a different parametrization of the mixture coefficients, through features, or perhaps a Bayesian prior on the weights, might lead to an objective that better simulates accuracy. [sent-346, score-0.253]
82 Table 3 shows that even uniform mixture coefficients are sufficient to obtain accuracy which supersedes most unsupervised baselines. [sent-347, score-0.578]
83 Our experiments also show that multilingual learning performs better for dependency grammar induction than part-of-speech tagging. [sent-351, score-0.418]
84 The transition matrix in part-of-speech tagging largely depends on word order in the various helper languages, which differs greatly. [sent-353, score-0.782]
85 This means that a mixture of transition matrices will not necessarily yield a meaningful transition matrix. [sent-354, score-0.441]
86 However, for dependency grammar, there are certain universal dependencies which appear in all helper languages, and therefore, a mixture between multinomials for these dependencies still yields a useful multinomial. [sent-355, score-1.26]
87 5 Inducing Dependencies from Words Finally, we combine the models for POS tagging and grammar induction to perform grammar induction directly from words, instead of gold-standard POS tags. [sent-357, score-0.444]
88 With a tag dictionary, learn a fine-grained POS tagging model unsupervised, using either DG or Mixture+DG as described in §6. [sent-359, score-0.176]
89 Table 4: Results for dependency grammar induction over words. [sent-372, score-0.265]
90 Given the two models, we infer POS tags on the test data using DG or Mixture+DG to get a lattice (Joint) or a sequence (Pipeline), and then parse using the model from the previous step. The resulting dependency trees are evaluated against the gold standard. [sent-394, score-0.201]
91 In almost all cases, joint decoding of tags and trees performs better than the pipeline. [sent-397, score-0.166]
92 Even though our part-of-speech tagger with multilingual guidance outperforms the completely unsupervised baseline, there is not always an advantage to using this multilingually guided part-of-speech tagger for dependency grammar induction. [sent-398, score-0.558]
93 For Turkish, Japanese, Slovene and Dutch, our unsupervised learner from words outperforms unsupervised parsing using gold-standard part-of-speech tags. [sent-399, score-0.218]
94 Earlier work that induced part-of-speech tags and then performed unsupervised parsing in a pipeline includes Klein and Manning (2004) and Smith (2006). [sent-402, score-0.283]
95 7 Conclusion We presented an approach to exploiting annotated data in helper languages to infer part-of-speech tagging and dependency parsing models in a different, target language, without parallel data. [sent-405, score-0.907]
96 We also described a way to do joint decoding of part-of-speech tags and dependencies which performs better than a pipeline. [sent-407, score-0.204]
97 Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. [sent-463, score-0.291]
98 Improving unsupervised dependency parsing with richer contexts and smoothing. [sent-520, score-0.188]
99 From baby steps to leapfrog: How “less is more” in unsupervised dependency parsing. [sent-676, score-0.188]
100 Inducing multilingual POS taggers and NP bracketers via robust projection across aligned corpora. [sent-688, score-0.183]
wordName wordTfidf (topN-words)
[('helper', 0.616), ('mixture', 0.253), ('multilingual', 0.153), ('dg', 0.149), ('multinomial', 0.136), ('universal', 0.126), ('coefficients', 0.123), ('tags', 0.122), ('pcfg', 0.117), ('languages', 0.111), ('multinomials', 0.11), ('unsupervised', 0.109), ('pos', 0.107), ('coarse', 0.107), ('grammar', 0.105), ('tag', 0.104), ('expectations', 0.099), ('transition', 0.094), ('uniform', 0.093), ('smith', 0.089), ('aa', 0.087), ('cohen', 0.081), ('induction', 0.081), ('dependency', 0.079), ('dictionary', 0.077), ('tagging', 0.072), ('hmm', 0.072), ('petrov', 0.068), ('klein', 0.068), ('naseem', 0.065), ('das', 0.062), ('estimation', 0.059), ('dmv', 0.058), ('snyder', 0.057), ('expansion', 0.057), ('em', 0.056), ('coarsegrained', 0.055), ('gillenwater', 0.055), ('estimating', 0.053), ('pipeline', 0.052), ('ten', 0.052), ('event', 0.052), ('nonterminal', 0.052), ('treebank', 0.051), ('initialize', 0.05), ('derivatives', 0.05), ('hg', 0.05), ('initialization', 0.047), ('conjoined', 0.046), ('simplex', 0.046), ('guidance', 0.046), ('parameters', 0.045), ('decoding', 0.044), ('avg', 0.043), ('tying', 0.043), ('fine', 0.043), ('montemagni', 0.042), ('ngiextfh', 0.042), ('udm', 0.042), ('hthee', 0.041), ('categories', 0.04), ('emission', 0.039), ('xl', 0.039), ('ganchev', 0.039), ('monolingual', 0.038), ('dependencies', 0.038), ('egt', 0.037), ('initializer', 0.037), ('dutch', 0.036), ('italian', 0.036), ('mapping', 0.035), ('distributions', 0.034), ('locally', 0.034), ('denoted', 0.034), ('ien', 0.033), ('interpolated', 0.033), ('tiger', 0.033), ('headden', 0.033), ('tagger', 0.033), ('supervised', 0.032), ('buchholz', 0.032), ('expanded', 0.032), ('understood', 0.032), ('parameterized', 0.031), ('danish', 0.031), ('slovene', 0.031), ('backbone', 0.031), ('initializing', 0.031), ('symmetry', 0.031), ('lexicalized', 0.03), ('projection', 0.03), ('state', 0.03), ('unlexicalized', 0.03), ('family', 0.03), ('mcdonald', 0.03), ('posterior', 0.029), ('sampled', 0.029), ('annotated', 0.029), ('lr', 0.029), ('pcfgs', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999893 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
Author: Shay B. Cohen ; Dipanjan Das ; Noah A. Smith
Abstract: We describe a method for prediction of linguistic structure in a language for which only unlabeled data is available, using annotated data from a set of one or more helper languages. Our approach is based on a model that locally mixes between supervised models from the helper languages. Parallel data is not used, allowing the technique to be applied even in domains where human-translated texts are unavailable. We obtain state-of-the-art performance for two tasks of structure prediction: unsupervised part-of-speech tagging and unsupervised dependency parsing.
2 0.22591098 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers
Author: Ryan McDonald ; Slav Petrov ; Keith Hall
Abstract: We present a simple method for transferring dependency parsers from source languages with labeled training data to target languages without labeled training data. We first demonstrate that delexicalized parsers can be directly transferred between languages, producing significantly higher accuracies than unsupervised parsers. We then use a constraint driven learning algorithm where constraints are drawn from parallel corpora to project the final parser. Unlike previous work on projecting syntactic resources, we show that simple methods for introducing multiple source languages can significantly improve the overall quality of the resulting parsers. The projected parsers from our system result in state-of-the-art performance when compared to previously studied unsupervised and projected parsing systems across eight different languages.
3 0.18868764 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Angel X. Chang ; Daniel Jurafsky
Abstract: We show that categories induced by unsupervised word clustering can surpass the performance of gold part-of-speech tags in dependency grammar induction. Unlike classic clustering algorithms, our method allows a word to have different tags in different contexts. In an ablative analysis, we first demonstrate that this context-dependence is crucial to the superior performance of gold tags — requiring a word to always have the same part-of-speech significantly degrades the performance of manual tags in grammar induction, eliminating the advantage that human annotation has over unsupervised tags. We then introduce a sequence modeling technique that combines the output of a word clustering algorithm with context-colored noise, to allow words to be tagged differently in different contexts. With these new induced tags as input, our state-of-the-art dependency grammar inducer achieves 59.1% directed accuracy on Section 23 (all sentences) of the Wall Street Journal (WSJ) corpus — 0.7% higher than using gold tags.
4 0.15950106 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
Author: Christos Christodoulopoulos ; Sharon Goldwater ; Mark Steedman
Abstract: In this paper we present a fully unsupervised syntactic class induction system formulated as a Bayesian multinomial mixture model, where each word type is constrained to belong to a single class. By using a mixture model rather than a sequence model (e.g., HMM), we are able to easily add multiple kinds of features, including those at both the type level (morphology features) and token level (context and alignment features, the latter from parallel corpora). Using only context features, our system yields results comparable to state-of-the-art, far better than a similar model without the one-class-per-type constraint. Using the additional features provides added benefit, and our final system outperforms the best published results on most of the 25 corpora tested.
5 0.15663446 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction
Author: Young-Bum Kim ; Joao Graca ; Benjamin Snyder
Abstract: In this paper, we consider the problem of unsupervised morphological analysis from a new angle. Past work has endeavored to design unsupervised learning methods which explicitly or implicitly encode inductive biases appropriate to the task at hand. We propose instead to treat morphological analysis as a structured prediction problem, where languages with labeled data serve as training examples for unlabeled languages, without the assumption of parallel data. We define a universal morphological feature space in which every language and its morphological analysis reside. We develop a novel structured nearest neighbor prediction method which seeks to find the morphological analysis for each unlabeled language which lies as close as possible in the feature space to a training language. We apply our model to eight inflecting languages, and induce nominal morphology with substantially higher accuracy than a traditional, MDL-based approach. Our analysis indicates that accuracy continues to improve substantially as the number of training languages increases.
6 0.14866047 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing
7 0.13896364 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
8 0.11993639 115 emnlp-2011-Relaxed Cross-lingual Projection of Constituent Syntax
9 0.095363989 125 emnlp-2011-Statistical Machine Translation with Local Language Models
10 0.093655936 4 emnlp-2011-A Fast, Accurate, Non-Projective, Semantically-Enriched Parser
11 0.081785314 52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing
12 0.077066623 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP
13 0.075319186 31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars
14 0.073508941 11 emnlp-2011-A Simple Word Trigger Method for Social Tag Suggestion
15 0.07317438 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus
16 0.07239034 129 emnlp-2011-Structured Sparsity in Structured Prediction
17 0.071894728 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives
18 0.068741791 79 emnlp-2011-Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction
19 0.067855299 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
20 0.067070171 96 emnlp-2011-Multilayer Sequence Labeling
topicId topicWeight
[(0, 0.273), (1, 0.084), (2, -0.054), (3, 0.175), (4, -0.049), (5, 0.173), (6, -0.18), (7, -0.05), (8, -0.199), (9, -0.024), (10, -0.109), (11, 0.124), (12, 0.044), (13, 0.073), (14, 0.124), (15, 0.046), (16, 0.071), (17, 0.081), (18, -0.106), (19, 0.11), (20, -0.031), (21, -0.047), (22, 0.077), (23, 0.026), (24, 0.09), (25, 0.18), (26, -0.01), (27, -0.099), (28, -0.029), (29, -0.131), (30, -0.019), (31, -0.0), (32, -0.016), (33, 0.053), (34, -0.034), (35, 0.067), (36, 0.014), (37, 0.025), (38, 0.054), (39, 0.047), (40, 0.018), (41, -0.046), (42, -0.103), (43, 0.011), (44, -0.004), (45, -0.008), (46, 0.004), (47, 0.003), (48, 0.008), (49, -0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.95799017 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
Author: Shay B. Cohen ; Dipanjan Das ; Noah A. Smith
Abstract: We describe a method for prediction of linguistic structure in a language for which only unlabeled data is available, using annotated data from a set of one or more helper languages. Our approach is based on a model that locally mixes between supervised models from the helper languages. Parallel data is not used, allowing the technique to be applied even in domains where human-translated texts are unavailable. We obtain state-of-the-art performance for two tasks of structure prediction: unsupervised part-of-speech tagging and unsupervised dependency parsing.
2 0.72140074 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Angel X. Chang ; Daniel Jurafsky
Abstract: We show that categories induced by unsupervised word clustering can surpass the performance of gold part-of-speech tags in dependency grammar induction. Unlike classic clustering algorithms, our method allows a word to have different tags in different contexts. In an ablative analysis, we first demonstrate that this context-dependence is crucial to the superior performance of gold tags — requiring a word to always have the same part-of-speech significantly degrades the performance of manual tags in grammar induction, eliminating the advantage that human annotation has over unsupervised tags. We then introduce a sequence modeling technique that combines the output of a word clustering algorithm with context-colored noise, to allow words to be tagged differently in different contexts. With these new induced tags as input, our state-of-the-art dependency grammar inducer achieves 59.1% directed accuracy on Section 23 (all sentences) of the Wall Street Journal (WSJ) corpus — 0.7% higher than using gold tags.
3 0.71606338 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers
Author: Ryan McDonald ; Slav Petrov ; Keith Hall
Abstract: We present a simple method for transferring dependency parsers from source languages with labeled training data to target languages without labeled training data. We first demonstrate that delexicalized parsers can be directly transferred between languages, producing significantly higher accuracies than unsupervised parsers. We then use a constraint driven learning algorithm where constraints are drawn from parallel corpora to project the final parser. Unlike previous work on projecting syntactic resources, we show that simple methods for introducing multiple source languages can significantly improve the overall quality of the resulting parsers. The projected parsers from our system result in state-of-the-art performance when compared to previously studied unsupervised and projected parsing systems across eight different languages.
4 0.64793378 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction
Author: Young-Bum Kim ; Joao Graca ; Benjamin Snyder
Abstract: In this paper, we consider the problem of unsupervised morphological analysis from a new angle. Past work has endeavored to design unsupervised learning methods which explicitly or implicitly encode inductive biases appropriate to the task at hand. We propose instead to treat morphological analysis as a structured prediction problem, where languages with labeled data serve as training examples for unlabeled languages, without the assumption of parallel data. We define a universal morphological feature space in which every language and its morphological analysis reside. We develop a novel structured nearest neighbor prediction method which seeks to find the morphological analysis for each unlabeled language which lies as close as possible in the feature space to a training language. We apply our model to eight inflecting languages, and induce nominal morphology with substantially higher accuracy than a traditional, MDL-based approach. Our analysis indicates that accuracy continues to improve substantially as the number of training languages increases.
5 0.62720907 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
Author: Christos Christodoulopoulos ; Sharon Goldwater ; Mark Steedman
Abstract: In this paper we present a fully unsupervised syntactic class induction system formulated as a Bayesian multinomial mixture model, where each word type is constrained to belong to a single class. By using a mixture model rather than a sequence model (e.g., HMM), we are able to easily add multiple kinds of features, including those at both the type level (morphology features) and token level (context and alignment features, the latter from parallel corpora). Using only context features, our system yields results comparable to state-of-the-art, far better than a similar model without the one-class-per-type constraint. Using the additional features provides added benefit, and our final system outperforms the best published results on most of the 25 corpora tested.
6 0.59203064 115 emnlp-2011-Relaxed Cross-lingual Projection of Constituent Syntax
7 0.54639381 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing
8 0.51947033 11 emnlp-2011-A Simple Word Trigger Method for Social Tag Suggestion
9 0.44310024 79 emnlp-2011-Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction
10 0.43943509 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP
11 0.3907679 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model
12 0.37094408 125 emnlp-2011-Statistical Machine Translation with Local Language Models
13 0.36486009 129 emnlp-2011-Structured Sparsity in Structured Prediction
14 0.36206573 96 emnlp-2011-Multilayer Sequence Labeling
15 0.34698728 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
16 0.34481081 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing
17 0.34158313 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus
18 0.33533722 111 emnlp-2011-Reducing Grounded Learning Tasks To Grammatical Inference
19 0.33241123 31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars
20 0.32668731 4 emnlp-2011-A Fast, Accurate, Non-Projective, Semantically-Enriched Parser
topicId topicWeight
[(23, 0.073), (36, 0.026), (37, 0.02), (45, 0.058), (53, 0.031), (54, 0.029), (57, 0.011), (62, 0.015), (64, 0.411), (66, 0.065), (69, 0.022), (79, 0.062), (82, 0.034), (90, 0.019), (96, 0.036)]
simIndex simValue paperId paperTitle
1 0.92242295 45 emnlp-2011-Dual Decomposition with Many Overlapping Components
Author: Andre Martins ; Noah Smith ; Mario Figueiredo ; Pedro Aguiar
Abstract: Dual decomposition has been recently proposed as a way of combining complementary models, with a boost in predictive power. However, in cases where lightweight decompositions are not readily available (e.g., due to the presence of rich features or logical constraints), the original subgradient algorithm is inefficient. We sidestep that difficulty by adopting an augmented Lagrangian method that accelerates model consensus by regularizing towards the averaged votes. We show how first-order logical constraints can be handled efficiently, even though the corresponding subproblems are no longer combinatorial, and report experiments in dependency parsing, with state-of-the-art results.
2 0.87101567 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers
Author: Ryan McDonald ; Slav Petrov ; Keith Hall
Abstract: We present a simple method for transferring dependency parsers from source languages with labeled training data to target languages without labeled training data. We first demonstrate that delexicalized parsers can be directly transferred between languages, producing significantly higher accuracies than unsupervised parsers. We then use a constraint driven learning algorithm where constraints are drawn from parallel corpora to project the final parser. Unlike previous work on projecting syntactic resources, we show that simple methods for introducing multiple source languages can significantly improve the overall quality of the resulting parsers. The projected parsers from our system result in state-of-the-art performance when compared to previously studied unsupervised and projected parsing systems across eight different languages.
same-paper 3 0.87003374 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
Author: Shay B. Cohen ; Dipanjan Das ; Noah A. Smith
Abstract: We describe a method for prediction of linguistic structure in a language for which only unlabeled data is available, using annotated data from a set of one or more helper languages. Our approach is based on a model that locally mixes between supervised models from the helper languages. Parallel data is not used, allowing the technique to be applied even in domains where human-translated texts are unavailable. We obtain state-of-the-art performance for two tasks of structure prediction: unsupervised part-of-speech tagging and unsupervised dependency parsing.
4 0.52875382 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction
Author: Sebastian Riedel ; Andrew McCallum
Abstract: Extracting biomedical events from literature has attracted much recent attention. The best-performing systems so far have been pipelines of simple subtask-specific local classifiers. A natural drawback of such approaches is the cascading errors introduced in early stages of the pipeline. We present three joint models of increasing complexity designed to overcome this problem. The first model performs joint trigger and argument extraction, and lends itself to a simple, efficient and exact inference algorithm. The second model captures correlations between events, while the third model ensures consistency between arguments of the same event. Inference in these models is kept tractable through dual decomposition. The first two models outperform the previous best joint approaches and are very competitive with respect to the current state-of-the-art. The third model yields the best results reported so far on the BioNLP 2009 shared task, the BioNLP 2011 Genia task and the BioNLP 2011 Infectious Diseases task.
5 0.51513207 122 emnlp-2011-Simple Effective Decipherment via Combinatorial Optimization
Author: Taylor Berg-Kirkpatrick ; Dan Klein
Abstract: We present a simple objective function that when optimized yields accurate solutions to both decipherment and cognate pair identification problems. The objective simultaneously scores a matching between two alphabets and a matching between two lexicons, each in a different language. We introduce a simple coordinate descent procedure that efficiently finds effective solutions to the resulting combinatorial optimization problem. Our system requires only a list of words in both languages as input, yet it competes with and surpasses several state-of-the-art systems that are both substantially more complex and make use of more information.
6 0.49415565 140 emnlp-2011-Universal Morphological Analysis using Structured Nearest Neighbor Prediction
7 0.49073821 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
8 0.47468343 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
9 0.47042996 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
10 0.46774498 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
11 0.45303354 77 emnlp-2011-Large-Scale Cognate Recovery
12 0.44394356 11 emnlp-2011-A Simple Word Trigger Method for Social Tag Suggestion
13 0.44366285 115 emnlp-2011-Relaxed Cross-lingual Projection of Constituent Syntax
14 0.43853766 136 emnlp-2011-Training a Parser for Machine Translation Reordering
15 0.43809536 64 emnlp-2011-Harnessing different knowledge sources to measure semantic relatedness under a uniform model
16 0.43791026 13 emnlp-2011-A Word Reordering Model for Improved Machine Translation
17 0.43607891 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
18 0.43195915 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation
19 0.43184832 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification
20 0.43135041 76 emnlp-2011-Language Models for Machine Translation: Original vs. Translated Texts