emnlp emnlp2012 emnlp2012-123 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Greg Durrett ; Adam Pauls ; Dan Klein
Abstract: We consider the problem of using a bilingual dictionary to transfer lexico-syntactic information from a resource-rich source language to a resource-poor target language. In contrast to past work that used bitexts to transfer analyses of specific sentences at the token level, we instead use features to transfer the behavior of words at a type level. In a discriminative dependency parsing framework, our approach produces gains across a range of target languages, using two different low-resource training methodologies (one weakly supervised and one indirectly supervised) and two different dictionary sources (one manually constructed and one automatically constructed).
Reference: text
sentIndex sentText sentNum sentScore
1 We consider the problem of using a bilingual dictionary to transfer lexico-syntactic information from a resource-rich source language to a resource-poor target language. [sent-3, score-0.745]
2 In contrast to past work that used bitexts to transfer analyses of specific sentences at the token level, we instead use features to transfer the behavior of words at a type level. [sent-4, score-0.687]
3 [Figure 1 shows a tagged German sentence with an English gloss: Gewerkschaften/NNS verlangen/VVFIN Verzicht/NN auf/APPR die/ART Reform/NN, "Unions demand abandonment on the reform".] Figure 1: Sentences in English and German both containing words that mean "demand." [sent-19, score-0.382]
4 The fact that the English demand takes nouns on its left and right indicates that the German verlangen should do the same, correctly suggesting attachments to Verzicht and Gewerkschaften. [sent-20, score-0.371]
5 While these universal parsers currently constitute the highest-performing methods for languages without treebanks, they are inherently limited by operating at the coarse POS level, as lexical features are vital to supervised parsing models. [sent-24, score-0.392]
6 In this work, we consider augmenting delexicalized parsers by transferring syntactic information through a bilingual lexicon at the word type level. [sent-25, score-0.598]
7 These parsers are delexicalized in the sense that, although they receive target language words as input, their feature sets do not include indicators on those words. [sent-26, score-0.337]
8 This setting is appropriate when there is too little target language data to learn lexical features directly. [sent-27, score-0.201]
9 Our main approach is to add features which are lexical in the sense that they compute a function of specific target language words, but are still un- [sent-28, score-0.201]
10 lexical in the sense that all lexical knowledge comes from the bilingual lexicon and training data in the source language. [sent-30, score-0.521]
11 A delexicalized parser operating at the part of speech level does not have sufficient information to make the correct decision about, for example, the choice of subcategorization frame for the verb verlangen. [sent-32, score-0.209]
12 We can fire features in our German parser on the attachments of Gewerkschaften and Verzicht to verlangen indicating that similar-looking attachments are attested in English for an English translation of verlangen. [sent-34, score-0.52]
13 This allows us to exploit fine-grained lexical cues to make German parsing decisions even when we have little or no supervised German data; moreover, this syntactic transfer is possible even in spite of the fact that demand and verlangen are not observed in parallel context. [sent-35, score-0.759]
14 Using type-level transfer through a dictionary in this way allows us to decouple the lexico-syntactic projection from the data conditions under which we are learning the parser. [sent-36, score-0.484]
15 After computing feature values using source language resources and a bilingual lexicon, our model can be trained very simply using any appropriate training method for a supervised parser. [sent-37, score-0.368]
16 Furthermore, because the transfer mechanism is just a set of features over word types, we are free to derive our bilingual lexicon either from bitext or from a manually-constructed dictionary, making our method strictly more general than those of McDonald et al. [sent-38, score-0.828]
17 This flexibility is potentially useful for resource-poor languages, where a human-curated bilingual lexicon may be broader in coverage or more robust to noise than a small, domain-limited bitext. [sent-41, score-0.393]
18 Of course, it is an empirical question whether transferring type level information about word behavior is effective; we show that, indeed, this method compares favorably with other transfer mechanisms used in past work. [sent-42, score-0.319]
19 The actual syntactic information that we transfer consists of purely monolingual lexical attachment statistics computed on an annotated source language resource. [sent-43, score-0.595]
20 , 2011), doing so in a projection setting is novel and forces us to design features suitable for projection through a bilingual lexicon. [sent-46, score-0.454]
21 Our features must also be flexible enough to provide benefit even in the presence of cross-lingual syntactic differences and noise introduced by the bilingual dictionary. [sent-47, score-0.328]
22 Under two different training conditions and with two different varieties of bilingual lexicons, we show that our method of lexico-syntactic projection does indeed improve the performance of parsers that would otherwise be agnostic to lexical information. [sent-48, score-0.493]
23 In all settings, we see statistically significant gains for a range of languages, with our method providing up to 3% absolute improvement in unlabeled attachment score (UAS) and 11% relative error reduction. [sent-49, score-0.249]
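To make the two evaluation numbers above concrete, here is a minimal Python sketch (the function names are ours, not the paper's) of how UAS and relative error reduction relate: UAS is the percentage of tokens whose predicted head matches the gold head, and relative error reduction measures what fraction of the baseline's remaining error a gain removes.

    def uas(predicted_heads, gold_heads):
        # Unlabeled attachment score: percentage of tokens whose
        # predicted head index matches the gold head index.
        correct = sum(p == g for p, g in zip(predicted_heads, gold_heads))
        return 100.0 * correct / len(gold_heads)

    def relative_error_reduction(baseline_uas, improved_uas):
        # Fraction of the baseline's remaining error removed by the new system.
        return 100.0 * (improved_uas - baseline_uas) / (100.0 - baseline_uas)

    # For example, a 3-point absolute gain from 73.0 to 76.0 UAS is a
    # roughly 11% relative error reduction:
    print(relative_error_reduction(73.0, 76.0))  # 11.11...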
24 2 Model: The projected lexical features that we propose in this work are based on lexicalized versions of features found in MSTParser (McDonald et al. [sent-50, score-0.228]
25 By instantiating the basic MSTParser features over coarse parts of speech, we construct a state-of-the-art delexicalized parser in the style of McDonald et al. [sent-53, score-0.269]
26 (2011), where feature weights can be directly transferred from a source language or languages to a desired target language. [sent-54, score-0.267]
27 When we add projected lexical features on top of this baseline parser, we do so in a way that does not sacrifice this generality: while our new features take on values that are language-specific, they interact with the model at a language-independent level. [sent-55, score-0.228]
28 We therefore have the best of [footnote 1: Throughout this work, we will use English as the source language, but it is possible to use any language for which the appropriate bilingual lexicons and treebanks exist.] [sent-56, score-0.661]
29 We form a query from each stipulated set of characteristics, compute the values of these queries heuristically, and then fire a feature based on each query’s signature. [sent-62, score-0.243]
30 Signatures indicate which attachment properties were considered, which part of the query was lexicalized (shown by brackets here), and the POS of the query word. [sent-63, score-0.468]
31 two worlds in that our features can be learned on any treebank or treebanks that are available to us, but still exploit highly specific lexical information to achieve performance gains over using coarse POS features alone. [sent-65, score-0.589]
32 The ATTACH features for a given dependency link consist of indicators of the tags of the head and modifier, separately as well as together. [sent-70, score-0.17]
33 The INBETWEEN and SURROUNDING features are indicators on the tags of the head and modifier in addition to each intervening tag in turn (INBETWEEN) or various combinations of tags adjacent to the head or modifier (SURROUNDING). [sent-71, score-0.241]
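The template families just described can be sketched as follows; the string encodings and the exact SURROUNDING combinations are illustrative assumptions, and MSTParser's real inventory is larger.

    def delex_features(tags, head, mod):
        # Coarse-POS indicators for a proposed edge head -> mod
        # (token indices into the POS sequence `tags`).
        h, m = tags[head], tags[mod]
        feats = ["ATTACH:h=" + h, "ATTACH:m=" + m, "ATTACH:h=" + h + ",m=" + m]
        lo, hi = sorted((head, mod))
        for t in tags[lo + 1:hi]:  # INBETWEEN: each intervening tag in turn
            feats.append("INBETWEEN:" + h + "," + t + "," + m)
        pad = ["<S>"] + list(tags) + ["</S>"]  # SURROUNDING: adjacent tags
        for dh in (-1, 1):
            for dm in (-1, 1):
                feats.append("SURROUNDING:" + pad[head + 1 + dh] + "," + h +
                             "," + pad[mod + 1 + dm] + "," + m)
        return feats

    # e.g. the edge verlangen(VVFIN) -> Gewerkschaften(NNS) from Figure 1:
    print(delex_features(["NNS", "VVFIN", "NN", "APPR", "ART", "NN"], 1, 0))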
34 MSTParser by default also includes a copy of each of these indicator features conjoined with the direction and distance of the attachment it denotes. [sent-72, score-0.244]
35 We slightly modify the conjunction scheme and expand it with additional backed-off conjunctions, since these changes lead to features that empirically transfer better than the MSTParser defaults. [sent-77, score-0.332]
36 Specifically, we use conjunctions with attachment direction (left or right), coarsened distance, and attachment direction and coarsened distance combined. [sent-78, score-0.42]
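A sketch of that conjunction scheme; the distance buckets below are an assumption, since the excerpt does not list the actual coarsening.

    def coarsen_distance(d):
        # Bin an attachment distance into a coarse bucket (assumed binning).
        d = abs(d)
        for cutoff in (1, 2, 3, 5, 10):
            if d <= cutoff:
                return str(cutoff)
        return "10+"

    def conjoin(base_feats, head, mod):
        # Keep each backed-off bare feature and add copies conjoined with
        # direction, coarsened distance, and both combined.
        direction = "R" if mod > head else "L"
        dist = coarsen_distance(mod - head)
        out = []
        for f in base_feats:
            out += [f, f + "&" + direction, f + "&" + dist,
                    f + "&" + direction + "&" + dist]
        return out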
37 We emphasize again that these baseline features are entirely standard, and all the DELEX feature set does is recreate an MSTParser-based analogue of the direct transfer parser described by McDonald et al. [sent-79, score-0.404]
38 2.2 PROJ Features: We will now describe how to compute our projected lexical features, the PROJ feature set, which constitutes the main contribution of this work. [sent-82, score-0.164]
39 Recall that we wish our method to be as general as possible and work under many different training conditions; in particular, we wish to be able to train our model on only existing treebanks in other languages when no target language trees are available (discussed in Section 3. [sent-83, score-0.447]
40 3), or on only a very small target language treebank (Section 3.4). [sent-84, score-0.168]
41 We instead must augment our baseline model with a relatively small number of features that are nonetheless rich enough to transfer the necessary lexical information. [sent-88, score-0.388]
42 Our overall approach is sketched in Figure 2, where we show the features that fire on two proposed edges in a German dependency parse. [sent-89, score-0.178]
43 For sets of properties that do not include a lexical item, such as VERB→NOUN, we fire an indicator feature from the DELEX feature set. [sent-91, score-0.28]
44 Rather than firing the query as an indicator feature directly, which would result in a model parameter for each target word, we fire a broad feature called a signature whose value reflects the specifics of the query (computation of these values is discussed in Section 2.2.2). [sent-93, score-0.625]
45 Our signatures allow us to instantiate features at different levels of granularity corresponding to the levels of granularity in the DELEX feature set. [sent-101, score-0.29]
46 Therefore, our feature set manages to provide the training procedure with choices about how much syntactic information to transfer, while preventing overfitting and providing language independence. [sent-104, score-0.352]
47 2.2.1 Query and Signature Types: A query is a subset of the following pieces of information about an edge: parent word, parent POS, child word, child POS, attachment direction, and binned attachment distance. [sent-107, score-0.645]
48 The non-empty subset of attachment properties included in the query (the first component of a signature; the others, as described above, are which part of the query was lexicalized and the POS of the query word). [sent-111, score-0.339]
49 As an example, the queries verlangen → NOUN, verlangen → ADP, and sprechen → NOUN all share the signature [VERB]→CHILD, but verlangen → NOUN,RIGHT, Verzicht → ADP, and VERB → Verzicht have [VERB]→CHILD,DIR, [ADP]→CHILD, and PARENT→[NOUN] as their signatures, respectively. [sent-115, score-0.776]
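The collapse from queries to signatures can be written down directly; this sketch uses an ASCII "->" and our own argument names, but it reproduces the worked examples above.

    def signature(parent_pos, child_pos, lexicalized, extra_props=()):
        # Keep which side was lexicalized (bracketed), the POS of the
        # lexicalized query word, and any extra attachment properties
        # (e.g. direction) that the query included; drop the word itself.
        if lexicalized == "parent":
            sig = "[" + parent_pos + "]->CHILD"
        else:
            sig = "PARENT->[" + child_pos + "]"
        return sig + ("," + ",".join(extra_props) if extra_props else "")

    print(signature("VERB", "NOUN", "parent"))            # [VERB]->CHILD
    print(signature("VERB", "NOUN", "parent", ("DIR",)))  # [VERB]->CHILD,DIR
    print(signature("VERB", "NOUN", "child"))             # PARENT->[NOUN]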
50 The level of granularity for signatures is a parameter that simply must be engineered. [sent-116, score-0.174]
51 We found some benefit in actually instantiating two signatures for every query, one as described above and one that [footnote 5: Bilexical features are possible in our framework, but we do not use them here, so for clarity we assume that each query has one associated word.] [sent-117, score-0.315]
52 We then compute relative frequency counts for each possible query type to get source language scores, which will later be projected through the dictionary to obtain target language feature values. [sent-121, score-0.454]
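A minimal sketch of that counting step for one query type, P(child POS | head word); the corpus representation is an assumption.

    from collections import Counter, defaultdict

    def source_query_scores(parsed_corpus):
        # `parsed_corpus`: list of sentences, each a list of
        # (word, pos, head_index) triples, head_index == -1 for the root.
        counts = defaultdict(Counter)
        for sent in parsed_corpus:
            for word, pos, head in sent:
                if head >= 0:
                    counts[sent[head][0]][pos] += 1  # head word -> child POS
        return {(hw, child_pos): n / sum(c.values())
                for hw, c in counts.items() for child_pos, n in c.items()}

    # scores[("demand", "NOUN")] would then be the fraction of demand's
    # children that are nouns in the annotated source-language resource.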
53 2.2.2 Query Value Estimation: Each query is given a value according to a generative heuristic that involves the source training data and the probabilistic bilingual lexicon. [sent-127, score-0.428]
54 (x1, . . . , xk, wt), where wt is the target language query word and the xi are the values of the included language-independent attachment properties. [sent-131, score-0.456]
55 The value this feature takes is given by a simple generative model: we imagine generating the attachment properties xi given wt by first generating a source [footnote 7: Lexicons such as those produced by automatic aligners include probabilities natively, but obviously human-created lexicons do not.] [sent-132, score-0.51]
56 word ws from wt based on the bilingual lexicon, then jointly generating the xi conditioned on ws. [sent-135, score-0.359]
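Combining the lexicon with the source-language scores from above, the generative heuristic is a mixture over translations. Whether the final feature value is this mixture or its logarithm is not settled by this excerpt, so the sketch returns the raw probability.

    def project_query_value(wt, props, lexicon, source_scores):
        # value = sum over ws of P(ws | wt) * P(props | ws), where
        # `lexicon[wt]` maps source translations ws to P(ws | wt) and
        # `source_scores[(ws, props)]` holds the relative frequencies
        # computed on the source treebank.
        return sum(p * source_scores.get((ws, props), 0.0)
                   for ws, p in lexicon.get(wt, {}).items())

    # Toy numbers for the German query "verlangen -> NOUN":
    lexicon = {"verlangen": {"demand": 0.7, "require": 0.3}}
    source_scores = {("demand", "NOUN"): 0.5, ("require", "NOUN"): 0.4}
    print(project_query_value("verlangen", "NOUN", lexicon, source_scores))  # 0.47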
57 In addition, we consider two different sources for our bilingual lexicon: AUTOMATIC (extracted from bitext) and MANUAL (constructed from human annotations). Both bitexts and human-curated bilingual dictionaries are more widely available than complete treebanks. [sent-161, score-0.743]
58 We show results of our method on bilingual dictionaries derived from both sources, in order to show that it is applicable under a variety of data conditions and can successfully take advantage of such resources as are available. [sent-163, score-0.34]
59 We make use of dependency treebanks for Danish, German, Greek, Spanish, Italian, Dutch, Portuguese, and Swedish, all from the 2006 shared task. [sent-167, score-0.263]
60 (2011), we strip punctuation from all treebanks for the results of Section 3. [sent-177, score-0.215]
61 All results are given in terms of unlabeled attachment score (UAS), ignoring punctuation even when it is present. [sent-179, score-0.164]
62 We use the Europarl parallel corpus (Koehn, 2005) as the bitext from which to extract the AUTOMATIC bilingual lexicons. [sent-180, score-0.422]
63 For each target language, we produce one-to-one alignments on the English-target bitext by running the Berkeley Aligner (Liang et al. [sent-181, score-0.336]
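One plausible way to turn such alignments into the probabilistic AUTOMATIC lexicon is relative-frequency estimation over aligned token pairs; the exact recipe (thresholds, filtering) is not specified in this excerpt.

    from collections import Counter, defaultdict

    def lexicon_from_alignments(aligned_pairs):
        # `aligned_pairs`: iterable of (source_word, target_word) token
        # pairs read off one-to-one aligner output. Returns, for each
        # target word wt, a distribution P(ws | wt) over source words.
        counts = defaultdict(Counter)
        for ws, wt in aligned_pairs:
            counts[wt][ws] += 1
        return {wt: {ws: n / sum(c.values()) for ws, n in c.items()}
                for wt, c in counts.items()}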
64 Coverage of rare words in the treebank is less important when a given word must also appear in the bilingual lexicon as the translation of an observed German word in order to be useful. [sent-183, score-0.425]
65 that even in the absence of gold annotation, such tags could be produced from bitext using the method of Das and Petrov (2011) or could be read off from a bilingual lexicon. [sent-184, score-0.425]
66 Table 1: Evaluation of features derived from AUTOMATIC and MANUAL bilingual lexicons when trained on a concatenation of non-target-language treebanks (the BANKS setting). [sent-195, score-0.654]
67 (2012) and improvements from their best methods of using bitext and lexical information. [sent-202, score-0.198]
68 However, we still see that the performance of our type-level transfer method approaches that of bitext-based methods, which require complex bilingual training for each new language. [sent-204, score-0.528]
69 We also use our method on bilingual dictionaries constructed in a more conventional way. [sent-207, score-0.286]
70 For this purpose, we scrape our MANUAL bilingual lexicons from English Wiktionary (Wikimedia Foundation, 2012). [sent-208, score-0.437]
71 Finally, because our method requires a dictionary with probability weights, we assume that each target language word translates with uniform probability into any of the candidates that we scrape. [sent-211, score-0.162]
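That uniform weighting is simple to state in code:

    def uniform_lexicon(scraped):
        # `scraped` maps each target word to the list of English
        # candidates scraped for it; each candidate gets equal probability.
        return {wt: {ws: 1.0 / len(cands) for ws in cands}
                for wt, cands in scraped.items() if cands}

    print(uniform_lexicon({"verlangen": ["demand", "require", "ask"]}))
    # {'verlangen': {'demand': 0.333..., 'require': 0.333..., 'ask': 0.333...}}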
72 Training our PROJ features on the non-English treebanks in this concatenation can be understood as trying to learn which lexico-syntactic properties transfer “universally,” or at least transfer broadly within the families of languages we are considering. [sent-217, score-0.993]
73 Table 1 shows the performance of the DELEX feature set and the DELEX+PROJ feature set using both AUTOMATIC and MANUAL bilingual lexicons. [sent-218, score-0.308]
74 (2012) (multi-direct transfer and no clusters) and the improvements from their best methods using lexical information (multi-projected transfer and cross-lingual clusters). [sent-223, score-0.624]
75 and MANUAL bilingual lexicons when trained on various training set sizes. Values reported are UAS for sentences of all lengths on our enlarged CoNLL test sets (see text); each value is based on 50 sampled training sets of the given size. [sent-225, score-0.391]
76 3.4 SEED: We now turn our attention to the SEED scenario, where a small number of target language trees are available for each language we consider. [sent-232, score-0.149]
77 While it is imaginable to continue to exploit the other treebanks in the presence of target language trees, we found that training our DELEX features on the seed treebank alone gave higher performance than any [footnote 10: The baseline of Täckström et al.] [sent-233, score-0.520]
78 attempt to also use the concatenation of treebanks from the previous section. [sent-235, score-0.215]
79 This is not too surprising because, with this number of sentences, there is already good monolingual coverage of coarse POS features, and attempting to train features on other languages can be expected to introduce noise into otherwise accurate monolingual feature weights. [sent-236, score-0.268]
80 We train our DELEX+PROJ model with both AUTOMATIC and MANUAL lexicons on target language training sets of size 100, 200, and 400, and give results for each language in Table 2. [sent-237, score-0.244]
81 The performance of parsers trained on small numbers of trees can be highly variable, so we create multiple treebanks of each size by repeatedly sampling from each language’s train treebank, and report averaged results. [sent-238, score-0.325]
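The sampling protocol can be sketched as below; train_parser and evaluate are hypothetical stand-ins for the actual training procedure and the UAS computation.

    import random

    def averaged_uas(train_sents, test_sents, size, train_parser, evaluate,
                     n_samples=50):
        # Average test UAS over repeated random samples of `size` training
        # trees, taming the high variance of parsers trained on few trees.
        scores = []
        for seed in range(n_samples):
            sample = random.Random(seed).sample(train_sents, size)
            scores.append(evaluate(train_parser(sample), test_sents))
        return sum(scores) / len(scores)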
82 These are not comparable to the other values in the table because we are using our projection strategy monolingually, which removes the barriers of imperfect lexical correspondence (from using the lexicon) and imperfect syntactic correspondence (from projecting). [sent-244, score-0.173]
83 As one might expect, the gains on English are far higher than the gains on other languages. [sent-245, score-0.17]
84 We conclude that accurately learning which signatures transfer between languages is important, and it is easier to learn good feature weights when some target language data is available. [sent-251, score-0.634]
85 4.1 AUTOMATIC versus MANUAL: Overall, we see that gains from using our MANUAL lexicons are slightly lower than those from our AUTOMATIC lexicons. [sent-254, score-0.232]
86 One might expect higher performance because scraped bilingual lexicons are not prone to some of the same noise that exists in automatically extracted lexicons. [Table 3 column headers: AUTOMATIC (Voc, OCC) and MANUAL (Voc, OCC).] Table 3: Lexicon statistics for all languages for both sources of bilingual lexicons. [sent-255, score-0.718]
87 “Voc” indicates vocabulary size and “OCC” indicates open-class coverage, the fraction of open-class tokens in the test treebanks with entries in our bilingual lexicon. [sent-256, score-0.459]
88 Rather, as we see in Table 3, the low recall of our MANUAL lexicons on open-class words appears to be a possible culprit. [sent-258, score-0.147]
89 The coverage gap between these and the AUTOMATIC lexicons is partially due to the inconsistent structure of Wiktionary: inflected German and Greek words often do not have their own pages, so we miss even common morphological variants of verb forms in those languages. [sent-259, score-0.289]
90 One might hypothesize that our uniform weighting scheme in the MANUAL lexicon is another source of problems, and that bitext-derived weights are necessary to get high performance. [sent-262, score-0.165]
91 The English verb want takes fundamentally different children than wollen does, so properties of the sort we present in Section 2. [sent-267, score-0.168]
92 Bitext is not necessary for our approach to work, and results on the AUTOMATIC lexicon suggest that our type-level transfer method can in fact do much better given a higher quality resource. [sent-270, score-0.394]
93 4.2 Limitations: While our method does provide consistent gains across a range of languages, the injection of lexical information is clearly not sufficient to bridge the gap between unsupervised and supervised parsers. [sent-272, score-0.178]
94 in Section 3.4 that the cross-lingual transfer step of our method imposes a fundamental limitation on how useful any such approach can be, which we now investigate further. [sent-274, score-0.284]
95 While it is still broadly true that want and wollen both have verbal elements located to their right, it is less clear how to design features that can still take advantage of this while working around the differences we have described. [sent-279, score-0.149]
96 Therefore, a gap between the performance of our features on English and the performance of our projected features, as is observed in Table 2, is to be expected in the absence of a more complete model of syntactic divergence. [sent-280, score-0.16]
97 5 Conclusion: In this work, we showed that lexical attachment preferences can be projected to a target language at the type level using only a bilingual lexicon, improving over a delexicalized baseline parser. [sent-281, score-0.752]
98 This method is broadly applicable in the presence or absence of target language training trees and with bilingual lexicons derived from either manually-annotated resources or bitexts. [sent-282, score-0.573]
99 The greatest improvements arise when the bilingual lexicon has high coverage and a number of target language trees are available in order to learn exactly what lexico-syntactic properties transfer from the source language. [sent-283, score-0.927]
100 While unsupervised and existing projection methods do feature great versatility and may yet produce state-of-the-art parsers on resource-poor languages, spending time constructing small supervised resources appears to be the fastest method to achieve high performance in these settings. [sent-285, score-0.208]
wordName wordTfidf (topN-words)
[('delex', 0.319), ('transfer', 0.284), ('bilingual', 0.244), ('proj', 0.235), ('verlangen', 0.228), ('treebanks', 0.215), ('attachment', 0.164), ('lexicons', 0.147), ('bitext', 0.142), ('mstparser', 0.138), ('signatures', 0.138), ('query', 0.129), ('german', 0.128), ('mcdonald', 0.118), ('delexicalized', 0.115), ('ackstr', 0.114), ('banks', 0.114), ('verzicht', 0.114), ('lexicon', 0.11), ('target', 0.097), ('manual', 0.093), ('signature', 0.092), ('inbetween', 0.091), ('gains', 0.085), ('languages', 0.083), ('demand', 0.082), ('fire', 0.082), ('projection', 0.081), ('projected', 0.076), ('treebank', 0.071), ('bitexts', 0.071), ('wollen', 0.068), ('petrov', 0.066), ('coarse', 0.066), ('wt', 0.066), ('uas', 0.065), ('dictionary', 0.065), ('attachments', 0.061), ('adp', 0.059), ('parsers', 0.058), ('attach', 0.058), ('lexical', 0.056), ('source', 0.055), ('seed', 0.055), ('conditions', 0.054), ('verb', 0.054), ('wiktionary', 0.053), ('child', 0.052), ('trees', 0.052), ('english', 0.051), ('conll', 0.05), ('inflected', 0.049), ('ws', 0.049), ('das', 0.048), ('dependency', 0.048), ('features', 0.048), ('properties', 0.046), ('coarsened', 0.046), ('daggers', 0.046), ('fight', 0.046), ('gewerkschaften', 0.046), ('quota', 0.046), ('scrape', 0.046), ('svo', 0.046), ('universal', 0.044), ('klein', 0.043), ('dictionaries', 0.042), ('parent', 0.042), ('nns', 0.041), ('nn', 0.041), ('surrounding', 0.041), ('pos', 0.04), ('parser', 0.04), ('modifier', 0.04), ('coverage', 0.039), ('tags', 0.039), ('dagger', 0.039), ('bansal', 0.039), ('reform', 0.039), ('wikimedia', 0.039), ('ryan', 0.038), ('supervised', 0.037), ('slav', 0.037), ('syntactic', 0.036), ('granularity', 0.036), ('parallel', 0.036), ('appr', 0.035), ('lightly', 0.035), ('transferring', 0.035), ('koo', 0.035), ('dipanjan', 0.035), ('indicators', 0.035), ('continue', 0.034), ('broadly', 0.033), ('die', 0.033), ('greek', 0.033), ('logarithm', 0.033), ('indicator', 0.032), ('noun', 0.032), ('feature', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
Author: Greg Durrett ; Adam Pauls ; Dan Klein
Abstract: We consider the problem of using a bilingual dictionary to transfer lexico-syntactic information from a resource-rich source language to a resource-poor target language. In contrast to past work that used bitexts to transfer analyses of specific sentences at the token level, we instead use features to transfer the behavior of words at a type level. In a discriminative dependency parsing framework, our approach produces gains across a range of target languages, using two different low-resource training methodologies (one weakly supervised and one indirectly supervised) and two different dictionary sources (one manually constructed and one automatically constructed).
2 0.20258649 81 emnlp-2012-Learning to Map into a Universal POS Tagset
Author: Yuan Zhang ; Roi Reichart ; Regina Barzilay ; Amir Globerson
Abstract: We present an automatic method for mapping language-specific part-of-speech tags to a set of universal tags. This unified representation plays a crucial role in cross-lingual syntactic transfer of multilingual dependency parsers. Until now, however, such conversion schemes have been created manually. Our central hypothesis is that a valid mapping yields POS annotations with coherent linguistic properties which are consistent across source and target languages. We encode this intuition in an objective function that captures a range of distributional and typological characteristics of the derived mapping. Given the exponential size of the mapping space, we propose a novel method for optimizing over soft mappings, and use entropy regularization to drive those towards hard mappings. Our results demonstrate that automatically induced mappings rival the quality of their manually designed counterparts when evaluated in the context of multilingual parsing.
3 0.14831448 25 emnlp-2012-Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation
Author: Akihiro Tamura ; Taro Watanabe ; Eiichiro Sumita
Abstract: This paper proposes a novel method for lexicon extraction that extracts translation pairs from comparable corpora by using graphbased label propagation. In previous work, it was established that performance drastically decreases when the coverage of a seed lexicon is small. We resolve this problem by utilizing indirect relations with the bilingual seeds together with direct relations, in which each word is represented by a distribution of translated seeds. The seed distributions are propagated over a graph representing relations among words, and translation pairs are extracted by identifying word pairs with a high similarity in the seed distributions. We propose two types of the graphs: a co-occurrence graph, representing co-occurrence relations between words, and a similarity graph, representing context similarities between words. Evaluations using English and Japanese patent comparable corpora show that our proposed graph propagation method outperforms conventional methods. Further, the similarity graph achieved improved performance by clustering synonyms into the same translation.
Author: Bernd Bohnet ; Joakim Nivre
Abstract: Most current dependency parsers presuppose that input words have been morphologically disambiguated using a part-of-speech tagger before parsing begins. We present a transition-based system for joint part-of-speech tagging and labeled dependency parsing with non-projective trees. Experimental evaluation on Chinese, Czech, English and German shows consistent improvements in both tagging and parsing accuracy when compared to a pipeline system, which lead to improved state-of-the-art results for all languages.
5 0.14173771 138 emnlp-2012-Wiki-ly Supervised Part-of-Speech Tagging
Author: Shen Li ; Joao Graca ; Ben Taskar
Abstract: Despite significant recent work, purely unsupervised techniques for part-of-speech (POS) tagging have not achieved useful accuracies required by many language processing tasks. Use of parallel text between resource-rich and resource-poor languages is one source of weak supervision that significantly improves accuracy. However, parallel text is not always available and techniques for using it require multiple complex algorithmic steps. In this paper we show that we can build POS-taggers exceeding state-of-the-art bilingual methods by using simple hidden Markov models and a freely available and naturally growing resource, the Wiktionary. Across eight languages for which we have labeled data to evaluate results, we achieve accuracy that significantly exceeds best unsupervised and parallel text methods. We achieve highest accuracy reported for several languages and show that our approach yields better out-of-domain taggers than those trained using fully supervised Penn Treebank.
6 0.12853558 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning
7 0.12218773 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output
8 0.12200933 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
9 0.10396685 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
10 0.09632124 37 emnlp-2012-Dynamic Programming for Higher Order Parsing of Gap-Minding Trees
11 0.091042802 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT
12 0.087597623 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence
13 0.084612451 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
14 0.07589823 39 emnlp-2012-Enlarging Paraphrase Collections through Generalization and Instantiation
15 0.074741706 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation
16 0.074398972 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM
17 0.072602183 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP
18 0.071144082 78 emnlp-2012-Learning Lexicon Models from Search Logs for Query Expansion
19 0.06845165 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
20 0.065960072 132 emnlp-2012-Universal Grapheme-to-Phoneme Prediction Over Latin Alphabets
topicId topicWeight
[(0, 0.273), (1, -0.148), (2, 0.068), (3, -0.036), (4, 0.097), (5, 0.052), (6, 0.097), (7, 0.085), (8, 0.04), (9, 0.01), (10, 0.202), (11, 0.013), (12, 0.122), (13, 0.196), (14, -0.176), (15, -0.082), (16, 0.056), (17, 0.035), (18, -0.019), (19, -0.077), (20, 0.024), (21, -0.161), (22, -0.027), (23, 0.077), (24, 0.017), (25, -0.007), (26, -0.128), (27, -0.169), (28, -0.096), (29, 0.066), (30, -0.123), (31, 0.007), (32, 0.015), (33, 0.109), (34, 0.091), (35, -0.142), (36, -0.002), (37, 0.076), (38, -0.095), (39, -0.133), (40, 0.045), (41, 0.118), (42, -0.019), (43, 0.016), (44, 0.015), (45, -0.072), (46, -0.022), (47, 0.079), (48, 0.088), (49, -0.073)]
simIndex simValue paperId paperTitle
same-paper 1 0.94364202 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
Author: Greg Durrett ; Adam Pauls ; Dan Klein
Abstract: We consider the problem of using a bilingual dictionary to transfer lexico-syntactic information from a resource-rich source language to a resource-poor target language. In contrast to past work that used bitexts to transfer analyses of specific sentences at the token level, we instead use features to transfer the behavior of words at a type level. In a discriminative dependency parsing framework, our approach produces gains across a range of target languages, using two different low-resource training methodologies (one weakly supervised and one indirectly supervised) and two different dictionary sources (one manually constructed and one automatically constructed).
2 0.68455112 81 emnlp-2012-Learning to Map into a Universal POS Tagset
Author: Yuan Zhang ; Roi Reichart ; Regina Barzilay ; Amir Globerson
Abstract: We present an automatic method for mapping language-specific part-of-speech tags to a set of universal tags. This unified representation plays a crucial role in cross-lingual syntactic transfer of multilingual dependency parsers. Until now, however, such conversion schemes have been created manually. Our central hypothesis is that a valid mapping yields POS annotations with coherent linguistic properties which are consistent across source and target languages. We encode this intuition in an objective function that captures a range of distributional and typological characteristics of the derived mapping. Given the exponential size of the mapping space, we propose a novel method for optimizing over soft mappings, and use entropy regularization to drive those towards hard mappings. Our results demonstrate that automatically induced mappings rival the quality of their manually designed counterparts when evaluated in the context of multilingual parsing.
3 0.5860163 25 emnlp-2012-Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation
Author: Akihiro Tamura ; Taro Watanabe ; Eiichiro Sumita
Abstract: This paper proposes a novel method for lexicon extraction that extracts translation pairs from comparable corpora by using graphbased label propagation. In previous work, it was established that performance drastically decreases when the coverage of a seed lexicon is small. We resolve this problem by utilizing indirect relations with the bilingual seeds together with direct relations, in which each word is represented by a distribution of translated seeds. The seed distributions are propagated over a graph representing relations among words, and translation pairs are extracted by identifying word pairs with a high similarity in the seed distributions. We propose two types of the graphs: a co-occurrence graph, representing co-occurrence relations between words, and a similarity graph, representing context similarities between words. Evaluations using English and Japanese patent comparable corpora show that our proposed graph propagation method outperforms conventional methods. Further, the similarity graph achieved improved performance by clustering synonyms into the same translation.
4 0.57049644 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
Author: David Marecek ; Zdenek Zabokrtsky
Abstract: The possibility of deleting a word from a sentence without violating its syntactic correctness belongs to traditionally known manifestations of syntactic dependency. We introduce a novel unsupervised parsing approach that is based on a new n-gram reducibility measure. We perform experiments across 18 languages available in CoNLL data and we show that our approach achieves better accuracy for the majority of the languages than previously reported results.
5 0.51049787 138 emnlp-2012-Wiki-ly Supervised Part-of-Speech Tagging
Author: Shen Li ; Joao Graca ; Ben Taskar
Abstract: Despite significant recent work, purely unsupervised techniques for part-of-speech (POS) tagging have not achieved useful accuracies required by many language processing tasks. Use of parallel text between resource-rich and resource-poor languages is one source of weak supervision that significantly improves accuracy. However, parallel text is not always available and techniques for using it require multiple complex algorithmic steps. In this paper we show that we can build POS-taggers exceeding state-of-the-art bilingual methods by using simple hidden Markov models and a freely available and naturally growing resource, the Wiktionary. Across eight languages for which we have labeled data to evaluate results, we achieve accuracy that significantly exceeds best unsupervised and parallel text methods. We achieve highest accuracy reported for several languages and show that our approach yields better out-of-domain taggers than those trained using fully supervised Penn Treebank.
6 0.43342823 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
7 0.41499957 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output
8 0.39816657 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT
10 0.36053717 37 emnlp-2012-Dynamic Programming for Higher Order Parsing of Gap-Minding Trees
11 0.35932499 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP
12 0.35318905 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning
13 0.3486827 132 emnlp-2012-Universal Grapheme-to-Phoneme Prediction Over Latin Alphabets
14 0.32669005 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence
15 0.31867939 29 emnlp-2012-Concurrent Acquisition of Word Meaning and Lexical Categories
16 0.31445628 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context
17 0.31136912 78 emnlp-2012-Learning Lexicon Models from Search Logs for Query Expansion
18 0.31014773 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
19 0.29190037 66 emnlp-2012-Improving Transition-Based Dependency Parsing with Buffer Transitions
20 0.28251582 39 emnlp-2012-Enlarging Paraphrase Collections through Generalization and Instantiation
topicId topicWeight
[(2, 0.022), (11, 0.019), (16, 0.07), (25, 0.014), (29, 0.023), (34, 0.085), (39, 0.029), (42, 0.232), (45, 0.022), (60, 0.092), (63, 0.04), (64, 0.025), (65, 0.021), (70, 0.051), (74, 0.053), (76, 0.062), (80, 0.035), (86, 0.019), (95, 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.7571553 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
Author: Greg Durrett ; Adam Pauls ; Dan Klein
Abstract: We consider the problem of using a bilingual dictionary to transfer lexico-syntactic information from a resource-rich source language to a resource-poor target language. In contrast to past work that used bitexts to transfer analyses of specific sentences at the token level, we instead use features to transfer the behavior of words at a type level. In a discriminative dependency parsing framework, our approach produces gains across a range of target languages, using two different low-resource training methodologies (one weakly supervised and one indirectly supervised) and two different dictionary sources (one manually constructed and one automatically constructed).
2 0.57831943 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
Author: Jayant Krishnamurthy ; Tom Mitchell
Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms of weak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependency-parsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-the-art accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.
3 0.57124871 81 emnlp-2012-Learning to Map into a Universal POS Tagset
Author: Yuan Zhang ; Roi Reichart ; Regina Barzilay ; Amir Globerson
Abstract: We present an automatic method for mapping language-specific part-of-speech tags to a set of universal tags. This unified representation plays a crucial role in cross-lingual syntactic transfer of multilingual dependency parsers. Until now, however, such conversion schemes have been created manually. Our central hypothesis is that a valid mapping yields POS annotations with coherent linguistic properties which are consistent across source and target languages. We encode this intuition in an objective function that captures a range of distributional and typological characteristics of the derived mapping. Given the exponential size of the mapping space, we propose a novel method for optimizing over soft mappings, and use entropy regularization to drive those towards hard mappings. Our results demonstrate that automatically induced mappings rival the quality of their manually designed counterparts when evaluated in the context of multilingual parsing.
4 0.57035363 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
Author: Alexander Rush ; Roi Reichart ; Michael Collins ; Amir Globerson
Abstract: State-of-the-art statistical parsers and POS taggers perform very well when trained with large amounts of in-domain data. When training data is out-of-domain or limited, accuracy degrades. In this paper, we aim to compensate for the lack of available training data by exploiting similarities between test set sentences. We show how to augment sentencelevel models for parsing and POS tagging with inter-sentence consistency constraints. To deal with the resulting global objective, we present an efficient and exact dual decomposition decoding algorithm. In experiments, we add consistency constraints to the MST parser and the Stanford part-of-speech tagger and demonstrate significant error reduction in the domain adaptation and the lightly supervised settings across five languages.
Author: Bernd Bohnet ; Joakim Nivre
Abstract: Most current dependency parsers presuppose that input words have been morphologically disambiguated using a part-of-speech tagger before parsing begins. We present a transition-based system for joint part-of-speech tagging and labeled dependency parsing with non-projective trees. Experimental evaluation on Chinese, Czech, English and German shows consistent improvements in both tagging and parsing accuracy when compared to a pipeline system, which lead to improved state-of-the-art results for all languages.
6 0.56416208 45 emnlp-2012-Exploiting Chunk-level Features to Improve Phrase Chunking
7 0.56128752 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing
8 0.56077504 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
9 0.56001687 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
10 0.5592075 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM
11 0.55796289 2 emnlp-2012-A Beam-Search Decoder for Grammatical Error Correction
12 0.55725962 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT
13 0.5531339 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts
14 0.55100483 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
15 0.55080152 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents
16 0.55073196 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence
17 0.54991585 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
18 0.54940432 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
19 0.54776967 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games
20 0.54692173 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction