acl acl2013 acl2013-154 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ahmet Aker ; Monica Paramita ; Rob Gaizauskas
Abstract: In this paper we present a method for extracting bilingual terminologies from comparable corpora. In our approach we treat bilingual term extraction as a classification problem. For classification we use an SVM binary classifier and training data taken from the EUROVOC thesaurus. We test our approach on a held-out test set from EUROVOC and perform precision, recall and f-measure evaluations for 20 European language pairs. The performance of our classifier reaches the 100% precision level for many language pairs. We also perform manual evaluation on bilingual terms extracted from English-German term-tagged comparable corpora. The results of this manual evaluation showed 60-83% of the term pairs generated are exact translations and over 90% exact or partial translations.
Reference: text
sentIndex sentText sentNum sentScore
1 In our approach we treat bilingual term extraction as a classification problem. [sent-7, score-0.705]
2 The results of this manual evaluation showed 60-83% of the term pairs generated are exact translations and over 90% exact or partial translations. [sent-12, score-0.843]
3 We focus on techniques for bilingual term extraction from comparable corpora, i.e. collections of source-target language document pairs that are not direct translations but are topically related. [sent-16, score-1.08]
4 In Section 3 we review related work on bilingual term extraction. [sent-21, score-0.66]
5 Section 4 describes feature extraction for term pair classification. [sent-22, score-0.601]
6 2 Method The method we present below for bilingual term extraction is a symmetric approach, i. [sent-25, score-0.705]
7 it assumes a method exists for monolingual term extraction in both source and target languages. [sent-27, score-0.816]
8 Our method works by first pairing each term extracted from a source language document S with each term extracted from a target language document T aligned with S in the comparable corpus. [sent-30, score-1.384]
9 We then treat term alignment as a binary classification task, i. [sent-31, score-0.509]
10 we extract features for each source-target language potential term pair and decide whether to classify the pair as a term equivalent or not. [sent-33, score-1.112]
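The pairing-then-classification step described in the sentences above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `candidate_pairs` and `align_terms` are hypothetical, and the `is_equivalent` callback is a toy stand-in for the trained SVM classifier.

```python
from itertools import product


def candidate_pairs(source_terms, target_terms):
    """Pair every source term with every target term of an aligned document pair."""
    return list(product(source_terms, target_terms))


def align_terms(source_terms, target_terms, is_equivalent):
    """Keep only the candidate pairs that the binary decision function accepts."""
    return [
        (s, t)
        for s, t in candidate_pairs(source_terms, target_terms)
        if is_equivalent(s, t)
    ]


# Toy stand-in for the classifier (assumption): exact lookup in a tiny gold set.
gold = {("engine", "Motor"), ("brake", "Bremse")}
accepted = align_terms(
    ["engine", "brake"],
    ["Motor", "Bremse", "Rad"],
    lambda s, t: (s, t) in gold,
)
```

Because every source term is paired with every target term, the number of candidates grows as the product of the two term counts, so two documents with 100 terms each yield 10,000 candidates.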
11 , 2002), a term thesaurus covering the activities of the EU and the European Parliament. [sent-36, score-0.509]
12 This second test simulates the situation of using the term alignment system in a real world scenario. [sent-49, score-0.509]
13 For this evaluation we collected English-German comparable corpora from Wikipedia, performed monolingual term tagging and ran our tool over the term tagged corpora to extract bilingual terms. [sent-50, score-1.443]
14 3 Related Work Previous studies have investigated the extraction of bilingual terms from parallel and comparable corpora. [sent-51, score-0.45]
15 (2007), Cao and Li (2002) and Ismail and Manandhar (2010) the context of text units is used to identify term mappings. [sent-65, score-0.509]
16 Very few approaches have treated term alignment as a classification problem suitable for machine learning (ML) techniques. [sent-69, score-0.509]
17 Secondly, for a given source language term ⟨N1, N2⟩, target language candidate terms are proposed by composing all translations (given by a bilingual dictionary) of N1 into the target language with all translations of N2. [sent-74, score-1.191]
18 By considering all terms proposed by monolingual term extractors we consider terms that are syntactically much richer than noun-noun pairs. [sent-76, score-0.795]
19 In addition, the term pairs we align are not constrained by an assumption that their component words must be translations of each other as found in a particular dictionary resource. [sent-77, score-0.853]
20 4 Feature extraction To align or map source and target terms we use an SVM binary classifier (Joachims, 2002) with a linear kernel and the trade-off between training error and margin parameter c = 10. [sent-78, score-0.499]
21 In addition we also removed every entry from the dictionary where the source word was less than four characters and the target word more than five characters in length and vice versa. [sent-87, score-0.434]
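One possible reading of this length-based filter is sketched below. The names `keep_entry` and `filter_dictionary` are hypothetical, and the sketch assumes that "vice versa" means the mirrored case of a short target word paired with a long source word.

```python
def keep_entry(source_word, target_word, short=4, long=5):
    """Return False for entries pairing a very short word with a long one."""
    # Source word under four characters, target word over five characters.
    src_short_tgt_long = len(source_word) < short and len(target_word) > long
    # The mirrored case (assumed meaning of "vice versa").
    tgt_short_src_long = len(target_word) < short and len(source_word) > long
    return not (src_short_tgt_long or tgt_short_src_long)


def filter_dictionary(entries):
    """Filter a list of (source_word, target_word) dictionary entries."""
    return [(s, t) for s, t in entries if keep_entry(s, t)]


entries = [("the", "Fahrzeug"), ("car", "Auto"), ("vehicle", "das"), ("engine", "Motor")]
filtered = filter_dictionary(entries)
```

In the toy input, the two entries that pair a three-letter word with a word longer than five characters are dropped, while short-short and long-long entries are kept.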
22 After performing these filtering steps we use the dictionaries to extract the following language dependent features: • isFirstWordTranslated is a binary feature indicating whether the first word in the source term is a translation of the first word in the target term. [sent-89, score-1.012]
23 isLastWordTranslated is a binary feature indicating whether the last word in the source term is a translation of the last word in the target term. [sent-93, score-0.949]
24 As with the previous feature in case of compound terms we check whether the source term ends with the translation of the target last word. [sent-94, score-0.972]
25 percentageOfTranslatedWords returns the percentage of words in the source term which have their translations in the target term. [sent-95, score-0.427]
26 To address compound terms we check for each source word translation whether it appears anywhere within the target term. [sent-96, score-0.498]
27 percentageOfNotTranslatedWords returns the percentage of words of the source term which have no translations in the target term. [sent-97, score-0.936]
28 longestTranslatedUnitInPercentage returns the ratio of the number of words within the longest contiguous sequence of source words which has a translation in the target term to the length of the source term, expressed as a percentage. [sent-98, score-1.125]
29 longestNotTranslatedUnitInPercentage returns the percentage of the number of words within the longest sequence of source words which have no translations in the target term. [sent-100, score-0.506]
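The six dictionary-based features listed above can be approximated as in the sketch below. It rests on several assumptions that are not confirmed by the source: terms are non-empty whitespace-separated strings, the dictionary is a plain mapping from a lower-cased source word to a set of target words, a word counts as translated when one of its translations occurs anywhere inside the target string (a simple way to handle compounds), and the "longest unit" features are taken as the longest run of consecutive translated (or untranslated) source words. The helper names are hypothetical.

```python
def is_translated(word, target_text, dictionary):
    """True if any dictionary translation of `word` occurs inside `target_text`."""
    target = target_text.lower()
    return any(t.lower() in target for t in dictionary.get(word.lower(), ()))


def dictionary_features(source_term, target_term, dictionary):
    """Compute the six source-to-target dictionary features for one term pair."""
    src_words = source_term.split()
    tgt_words = target_term.split()
    # One flag per source word: does it have a translation in the target term?
    flags = [is_translated(w, target_term, dictionary) for w in src_words]

    def longest_run(value):
        # Length of the longest run of consecutive flags equal to `value`.
        best = run = 0
        for flag in flags:
            run = run + 1 if flag == value else 0
            best = max(best, run)
        return best

    n = len(src_words)
    translated = sum(flags)
    return {
        "isFirstWordTranslated": is_translated(src_words[0], tgt_words[0], dictionary),
        "isLastWordTranslated": is_translated(src_words[-1], tgt_words[-1], dictionary),
        "percentageOfTranslatedWords": 100.0 * translated / n,
        "percentageOfNotTranslatedWords": 100.0 * (n - translated) / n,
        "longestTranslatedUnitInPercentage": 100.0 * longest_run(True) / n,
        "longestNotTranslatedUnitInPercentage": 100.0 * longest_run(False) / n,
    }


toy_dictionary = {"adult": {"erwachsenen"}, "education": {"bildung"}, "engine": {"motor"}}
full = dictionary_features("adult education", "Erwachsenenbildung", toy_dictionary)
partial = dictionary_features("thermoplastic racing engine", "Rennmotor", toy_dictionary)
```

Since the features are direction-dependent, the same function would be called a second time with the term arguments swapped and a target-to-source dictionary.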
30 Because of this we also use cognate based methods to perform the mapping between source and target words or vice versa. [sent-112, score-0.576]
31 , 2001) to implement LCS, so that its computation is efficient and can be applied to a large number of possible term pairs quickly. [sent-120, score-0.615]
32 We normalize relative to the length of the longest term: $\mathrm{LCSR}(X, Y) = \frac{\mathrm{len}[\mathrm{LCS}(X, Y)]}{\max[\mathrm{len}(X), \mathrm{len}(Y)]}$, where LCS is the longest common subsequence between two strings and characters in this subsequence need not be contiguous. [sent-121, score-0.478]
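The normalized score can be computed with a standard dynamic-programming formulation of the longest common subsequence, as sketched here. The function names are illustrative, and returning 0.0 for two empty strings is an assumption of this sketch rather than something stated in the source.

```python
def lcs_length(x, y):
    """Length of the longest common (not necessarily contiguous) subsequence."""
    prev = [0] * (len(y) + 1)  # DP row for the previous prefix of x
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[len(y)]


def lcsr(x, y):
    """LCS length divided by the length of the longer string."""
    longest = max(len(x), len(y))
    return lcs_length(x, y) / longest if longest else 0.0


score_capital = lcsr("capital", "kapitalo")  # shared subsequence "apital": 6 / 8
score_arab = lcsr("arab", "arabu")           # shared subsequence "arab": 4 / 5
```

Keeping only two rows of the table holds memory linear in the length of the second string, which helps when the score is computed for a large number of candidate pairs.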
33 3 Cognate based features with term matching The cognate methods assume that the source and target language strings being compared are drawn from the same character set and fail to capture the corresponding terms if this is not the case. [sent-132, score-1.193]
34 We address this problem by mapping a source term to the target language writing system or vice versa. [sent-137, score-0.775]
35 After mapping a term from source to target language we apply the cognate metrics described in 4. [sent-143, score-1.085]
36 2 to the resulting mapped term and the original term in the other language. [sent-144, score-1.018]
37 Since we perform both target to source and source to target mapping, the number of cognate feature scores on the mapped terms is 10: 5 due to source to target mapping and 5 due to target to source mapping. [sent-145, score-1.334]
38 The combined features are as follows: • isFirstWordCovered is a binary feature indicating whether the first word in the source term has a translation (i. [sent-148, score-0.704]
39 • isLastWordCovered is similar to the previous feature but indicates whether the last word in the source term has a translation or [footnote 1: Assuming the terms are correctly spelled, otherwise the misspelling is another problem.] [sent-159, score-0.808]
40 percentageOfCoverage returns the percentage of source term words which have a translation or transliteration in the target term. [sent-163, score-0.455]
41 percentageOfNonCoverage returns the percentage of source term words which have neither a translation nor transliteration in the target term. [sent-164, score-0.542]
42 Like the dictionary based features, these five features are direction-dependent and are computed in both directions (source to target and target to source), resulting in 10 combined features. [sent-166, score-0.428]
43 2, 10 cognate related features derived from character mappings over terms as described in Section 4. [sent-169, score-0.51]
44 1 EUROVOC terms EUROVOC is a term thesaurus covering the activities of the EU and the European Parliament in particular. [sent-175, score-0.613]
45 It contains 6797 term entries in 24 different languages including 22 EU languages and Croatian and Serbian (Steinberger et al. [sent-176, score-0.646]
46 We pre-processed each Wikipedia article by performing monolingual term tagging using TWSC (Pinnis et al. [sent-195, score-0.553]
47 TWSC is a term extraction tool which identifies terms ranging from one to four tokens in length. [sent-197, score-0.658]
48 Next, it uses term grammar rules, in the form of sequences of POS tags or non-stop words, to identify candidate terms. [sent-200, score-0.509]
49 2 Performance test of the classifier To test the classifier’s performance we evaluated it against a list of positive and negative examples of bilingual term pairs using the measures of precision, recall and F-measure. [sent-203, score-0.956]
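For reference, the three measures can be computed from parallel lists of gold and predicted binary labels as in this short sketch. The function name and the convention of returning 0.0 when a denominator is zero are assumptions of the example.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall and F1 for binary labels (truthy = positive term pair)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # true positives
    fp = sum(1 for g, p in pairs if not g and p)      # false positives
    fn = sum(1 for g, p in pairs if g and not p)      # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


p, r, f = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 0, 0])
```

In the toy call the classifier proposes no false positives, so precision is 1.0, while one missed positive lowers recall to 2/3.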
50 4 In the evaluation we used 600 positive term pairs taken randomly from the EUROVOC term list. [sent-205, score-1.182]
51 3M negative term pairs by pairing a source term with 200 randomly chosen distinct target terms. [sent-207, score-1.419]
52 We select such a large number to simulate the real application scenario where the classifier will be confronted with a huge number of negative cases [footnote 4] Note that we do not use the Maltese-English language pair, as for this pair we found that 5861 out of 6797 term pairs were identical, i. [sent-208, score-0.794]
53 Table 1: Wikipedia term pairs processed and judged as positive by the classifier. [sent-213, score-0.673]
54 The 600 positive examples contain 200 single term pairs (i. [sent-215, score-0.673]
55 single word on both sides), 200 term pairs with a single word on only one side (either source or target) and 200 term pairs with more than one word on each side. [sent-217, score-1.377]
56 For training we took the remaining 6200 positive term pairs from EUROVOC and constructed another 6200 term pairs as negative examples, leading to total of 12400 term pairs. [sent-218, score-1.836]
57 To construct the 6200 negative examples we used the 6200 terms on the source side and paired each source term with an incorrect target term. [sent-219, score-1.017]
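A simple way to build one negative example per positive pair is sketched below. How the incorrect target was actually chosen is not stated in the sentences above, so the uniform random choice, the fixed seed and the name `build_negatives` are assumptions; the sketch also requires at least two distinct target terms.

```python
import random


def build_negatives(positive_pairs, seed=0):
    """Pair each source term with a randomly chosen incorrect target term."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    targets = [t for _, t in positive_pairs]
    negatives = []
    for source, correct in positive_pairs:
        wrong_targets = [t for t in targets if t != correct]
        negatives.append((source, rng.choice(wrong_targets)))
    return negatives


positives = [("engine", "Motor"), ("brake", "Bremse"), ("wheel", "Rad")]
negatives = build_negatives(positives)
training_set = [(pair, 1) for pair in positives] + [(pair, 0) for pair in negatives]
```

This produces as many negatives as positives, which matches the balanced 6200 + 6200 training set described above.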
58 The same term is translated into istruzione degli adulti in Italian and contains three words. [sent-227, score-0.553]
59 For this reason we carry out the data preparation process separately for each language pair in order to obtain the three term pair sets consisting of term pairs with only a single word on each side, term pairs with a single word on just one side and term pairs with multiple words on both sides. [sent-228, score-2.487]
60 For each pair of Wikipedia articles we used the terms tagged by TWSC and aligned each source term with every target term. [sent-231, score-0.878]
61 This means if both source and target articles contain 100 terms then this leads to 10K term pairs. [sent-232, score-0.831]
62 Table 1 shows the number of term pairs processed and the count of pairs classified as positive. [sent-234, score-0.721]
63 Table 2 shows five positive term pairs extracted from the English-German comparable corpora for each of the IT and automotive domains. [sent-235, score-1.05]
64 We asked human assessors to categorize each term pair into one of the following categories: 1. [sent-237, score-0.625]
65 Inclusion: Not an exact translation/transliteration, but an exact translation/transliteration of one term is entirely contained within the term in the other language, e. [sent-240, score-1.171]
66 Unrelated: No word in either term is a translation/transliteration of a word in the other. [sent-246, score-0.509]
67 We asked the assessors to place each of the term pairs into one of the categories 1 to 4. [sent-248, score-0.625]
68 each term pair contains at least two words on each side. [sent-260, score-0.556]
69 Of these, 187 contained 50% or more translation due to cognate words; examples of such cases are capital increase - kapitalo eksportas or Arab organisation - Arabu lyga, with the cognates capital - kapitalo and Arab - Arabu respectively. [sent-262, score-0.542]
70 We observed that all the missing term pairs were not cognates. [sent-296, score-0.615]
71 For these term pairs either the source or target terms were not found in the dictionaries. [sent-299, score-0.937]
72 For instance, for the term pair offshoring - uudelleensijoittautuminen the GIZA++ dictionary contains the entry offshoring but according to the dictionary it is not translated into uudelleensijoittautuminen, which is the matching term in EUROVOC. [sent-300, score-1.444]
73 Only a small proportion of the term pairs are judged as belonging to category 4 (3–7%), the category containing unrelated term pairs. [sent-308, score-1.293]
74 For the automotive domain the proportion of equivalent term pairs varies between 60 and 66%. [sent-309, score-0.86]
75 For unrelated term pairs this is below 10% for both assessors. [sent-310, score-0.656]
76 Across the four classes the percentage agreement was 83% for the automotive domain term pairs and 86% for the IT domain term pairs. [sent-312, score-1.455]
77 We also considered two class agreement where we treated term pairs within categories 2 and 3 as belonging to category 4 (i. [sent-316, score-0.714]
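Percentage agreement between two assessors, with an optional collapsing of categories for the two-class view, could be computed as in this sketch. The function name, the `collapse` parameter and the toy labels are illustrative assumptions, not data from the evaluation.

```python
def percentage_agreement(labels_a, labels_b, collapse=None):
    """Percentage of items on which two assessors chose the same category."""
    collapse = collapse or {}
    # Optionally map some categories onto others before comparing.
    a = [collapse.get(x, x) for x in labels_a]
    b = [collapse.get(x, x) for x in labels_b]
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


assessor_1 = [1, 2, 3, 4, 1]
assessor_2 = [1, 3, 3, 4, 2]
four_class = percentage_agreement(assessor_1, assessor_2)
# Two-class view: categories 2 and 3 are treated as category 4.
two_class = percentage_agreement(assessor_1, assessor_2, collapse={2: 4, 3: 4})
```

Collapsing categories can only keep or raise the raw agreement, because disagreements between merged categories disappear.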
78 We analyzed the differences and found that they differ in cases where the German and the English term are both in English. [sent-323, score-0.509]
79 Since the GIZA++ dictionaries contain only single word–single word mappings, we examined the newly aligned term pairs that consisted of one word on both source and target sides. [sent-326, score-0.896]
80 Taking both the IT and automotive domains together, our algorithm proposed 5021 term pairs of which 2751 (55%) were word-word term pairs. [sent-327, score-1.371]
81 17% of the word-word term pairs or 9% of the overall set of aligned term pairs) were already in either the EN-DE or DE-EN GIZA++ dictionaries. [sent-330, score-1.124]
82 Thus, of our newly extracted term pairs a relatively small proportion are rediscovered dictionary entries. [sent-331, score-0.715]
83 We also checked our evaluation data to see what proportion of the assessed term pairs were already to be found in the GIZA++ dictionaries. [sent-332, score-0.658]
84 A total of 600 term pairs were put in front of the judges of which 198 (33%) were word-word term pairs. [sent-333, score-1.124]
85 Of these, 15 (less than 8% of the word-word pairs and less than 3% of the overall set of assessed term pairs) were word-word pairs already in the dictionaries. [sent-334, score-0.807]
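The proportions quoted in these sentences can be checked with a few lines of arithmetic. Only the counts come from the text; the helper `pct` and the variable names are illustrative.

```python
def pct(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


# 2751 of the 5021 proposed pairs are word-word pairs (reported as 55%).
word_word_share = pct(2751, 5021)
# 198 of the 600 judged pairs are word-word pairs (reported as 33%).
judged_word_word_share = pct(198, 600)
# 15 judged word-word pairs were already in the dictionaries.
in_dict_of_word_word = pct(15, 198)  # reported as less than 8%
in_dict_of_all = pct(15, 600)        # reported as less than 3%
```

All four values agree with the rounded figures given in the text.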
86 We conclude that our evaluation results are not unduly affected by assessing term pairs which were given to the algorithm. [sent-335, score-0.615]
87 To achieve this, training data consisting of term pairs along with contextual information is required. [sent-353, score-0.615]
88 Partial Translation The assessors assigned 6–7% of the term pairs in the IT domain and 12–16% in the automotive domain to categories 2 and 3. [sent-357, score-0.967]
89 In both categories the term pairs share translations or cognates. [sent-358, score-0.714]
90 In category 2 this will be the entire translation of one term in the other such as the following examples. [sent-360, score-0.66]
91 In example (4), again the translation of the German term is entirely found in the English term, but as in the previous example, one of the English words, systems in this case, has no match within the German term. [sent-362, score-0.669]
92 Another application of the extracted term pairs is to use them to enhance existing parallel corpora to train SMT systems. [sent-366, score-0.715]
93 [footnote 5] In our data it is always the case that the target term is entirely translated within the English one and the other way round. [sent-370, score-0.736]
94 6 Conclusion In this paper we presented an approach to align terms identified by a monolingual term extractor in bilingual comparable corpora using a binary classifier. [sent-371, score-1.017]
95 Each candidate term pair was pre-processed to extract various features which are cognate-based or dictionary-based. [sent-373, score-0.556]
96 In the manual evaluation we had our algorithm extract pairs of terms from Wikipedia articles (articles forming comparable corpora in the IT and automotive domains) and asked native speakers to categorize a selection of the term pairs into categories reflecting the level of translation of the terms. [sent-378, score-1.378]
97 In the manual evaluation we used the English-German language pair and showed that over 80% of the extracted term pairs were exact translations in the IT domain and over 60% in the automotive domain. [sent-379, score-1.095]
98 For both domains over 90% of the extracted term pairs were either exact or partial translations. [sent-380, score-0.695]
99 Exploring ways to add contextual or distributional features to our term representations is also an avenue for future work, though it clearly significantly complicates the approach, one of whose advantages is its simplicity. [sent-382, score-0.509]
100 We would also like to thank partners at Tilde SIA and at the University of Zagreb for supplying the TWSC term extraction tool, developed within the EU funded project ACCURAT. [sent-388, score-0.589]
wordName wordTfidf (topN-words)
[('term', 0.509), ('eurovoc', 0.314), ('cognate', 0.31), ('automotive', 0.207), ('bilingual', 0.151), ('longest', 0.141), ('transliteration', 0.127), ('comparable', 0.11), ('target', 0.11), ('source', 0.108), ('pairs', 0.106), ('terms', 0.104), ('dictionary', 0.1), ('translations', 0.099), ('classifier', 0.093), ('eu', 0.092), ('twsc', 0.09), ('translation', 0.087), ('daille', 0.079), ('steinberger', 0.076), ('german', 0.071), ('assessors', 0.069), ('subsequence', 0.069), ('lcstr', 0.067), ('levenshtein', 0.067), ('terminologies', 0.066), ('category', 0.064), ('lcs', 0.063), ('dictionaries', 0.063), ('returns', 0.062), ('corpora', 0.06), ('aker', 0.059), ('giza', 0.059), ('positive', 0.058), ('characters', 0.058), ('cognates', 0.055), ('compound', 0.054), ('precision', 0.052), ('character', 0.052), ('wikipedia', 0.052), ('manual', 0.049), ('cao', 0.049), ('entries', 0.049), ('percentage', 0.048), ('terminology', 0.048), ('mapping', 0.048), ('pair', 0.047), ('dice', 0.045), ('len', 0.045), ('allowable', 0.045), ('arabu', 0.045), ('boirnda', 0.045), ('cormen', 0.045), ('ismail', 0.045), ('kapitalo', 0.045), ('karimi', 0.045), ('lithuanian', 0.045), ('offshoring', 0.045), ('okita', 0.045), ('percentageoftranslatedwords', 0.045), ('pinnis', 0.045), ('racing', 0.045), ('riny', 0.045), ('standardsprache', 0.045), ('tfehea', 0.045), ('thermoplastic', 0.045), ('uudelleensijoittautuminen', 0.045), ('extraction', 0.045), ('translated', 0.044), ('monolingual', 0.044), ('mappings', 0.044), ('languages', 0.044), ('assessed', 0.043), ('crawling', 0.041), ('unrelated', 0.041), ('parallel', 0.04), ('domains', 0.04), ('exact', 0.04), ('bouamor', 0.04), ('nwd', 0.04), ('lcsr', 0.04), ('maltese', 0.04), ('side', 0.039), ('substring', 0.039), ('seed', 0.039), ('align', 0.039), ('negative', 0.039), ('pairing', 0.038), ('entirely', 0.038), ('domain', 0.038), ('threshold', 0.038), ('sk', 0.038), ('aswani', 0.037), ('morin', 0.037), ('clicks', 0.037), ('within', 0.035), ('alphabets', 
0.034), ('nounnoun', 0.034), ('arab', 0.034), ('och', 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 154 acl-2013-Extracting bilingual terminologies from comparable corpora
Author: Ahmet Aker ; Monica Paramita ; Rob Gaizauskas
Abstract: In this paper we present a method for extracting bilingual terminologies from comparable corpora. In our approach we treat bilingual term extraction as a classification problem. For classification we use an SVM binary classifier and training data taken from the EUROVOC thesaurus. We test our approach on a held-out test set from EUROVOC and perform precision, recall and f-measure evaluations for 20 European language pairs. The performance of our classifier reaches the 100% precision level for many language pairs. We also perform manual evaluation on bilingual terms extracted from English-German term-tagged comparable corpora. The results of this manual evaluation showed 60-83% of the term pairs generated are exact translations and over 90% exact or partial translations.
2 0.17884426 62 acl-2013-Automatic Term Ambiguity Detection
Author: Tyler Baldwin ; Yunyao Li ; Bogdan Alexe ; Ioana R. Stanoi
Abstract: While the resolution of term ambiguity is important for information extraction (IE) systems, the cost of resolving each instance of an entity can be prohibitively expensive on large datasets. To combat this, this work looks at ambiguity detection at the term, rather than the instance, level. By making a judgment about the general ambiguity of a term, a system is able to handle ambiguous and unambiguous cases differently, improving throughput and quality. To address the term ambiguity detection problem, we employ a model that combines data from language models, ontologies, and topic modeling. Results over a dataset of entities from four product domains show that the proposed approach achieves significantly above baseline F-measure of 0.96.
3 0.17716312 93 acl-2013-Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora
Author: Dhouha Bouamor ; Nasredine Semmar ; Pierre Zweigenbaum
Abstract: This paper presents an approach that extends the standard approach used for bilingual lexicon extraction from comparable corpora. We focus on the unresolved problem of polysemous words revealed by the bilingual dictionary and introduce a use of a Word Sense Disambiguation process that aims at improving the adequacy of context vectors. On two specialized French-English comparable corpora, empirical experimental results show that our method improves the results obtained by two state-of-the-art approaches.
4 0.17271502 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
Author: Jiajun Zhang ; Chengqing Zong
Abstract: Currently, almost all of the statistical machine translation (SMT) models are trained with the parallel corpora in some specific domains. However, when it comes to a language pair or a different domain without any bilingual resources, the traditional SMT loses its power. Recently, some research works study the unsupervised SMT for inducing a simple word-based translation model from the monolingual corpora. It successfully bypasses the constraint of bitext for SMT and obtains a relatively promising result. In this paper, we take a step forward and propose a simple but effective method to induce a phrase-based model from the monolingual corpora given an automatically-induced translation lexicon or a manually-edited translation dictionary. We apply our method for the domain adaptation task and the extensive experiments show that our proposed method can substantially improve the translation quality.
5 0.16170041 48 acl-2013-An Open Source Toolkit for Quantitative Historical Linguistics
Author: Johann-Mattis List ; Steven Moran
Abstract: Given the increasing interest and development of computational and quantitative methods in historical linguistics, it is important that scholars have a basis for documenting, testing, evaluating, and sharing complex workflows. We present a novel open-source toolkit for quantitative tasks in historical linguistics that offers these features. This toolkit also serves as an interface between existing software packages and frequently used data formats, and it provides implementations of new and existing algorithms within a homogeneous framework. We illustrate the toolkit’s functionality with an exemplary workflow that starts with raw language data and ends with automatically calculated phonetic alignments, cognates and borrowings. We then illustrate evaluation metrics on gold standard datasets that are provided with the toolkit.
6 0.15859205 74 acl-2013-Building Comparable Corpora Based on Bilingual LDA Model
7 0.12192523 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
8 0.11878902 25 acl-2013-A Tightly-coupled Unsupervised Clustering and Bilingual Alignment Model for Transliteration
9 0.11403506 255 acl-2013-Name-aware Machine Translation
10 0.11369652 120 acl-2013-Dirt Cheap Web-Scale Parallel Text from the Common Crawl
11 0.1062914 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
12 0.10515023 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning
13 0.10477147 374 acl-2013-Using Context Vectors in Improving a Machine Translation System with Bridge Language
14 0.10374541 307 acl-2013-Scalable Decipherment for Machine Translation via Hash Sampling
15 0.10296135 71 acl-2013-Bootstrapping Entity Translation on Weakly Comparable Corpora
16 0.10104777 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation
17 0.096445501 10 acl-2013-A Markov Model of Machine Translation using Non-parametric Bayesian Inference
18 0.093995184 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation
19 0.093721725 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner
20 0.090647496 338 acl-2013-Task Alternation in Parallel Sentence Retrieval for Twitter Translation
topicId topicWeight
[(0, 0.259), (1, -0.029), (2, 0.151), (3, 0.016), (4, 0.056), (5, -0.057), (6, -0.101), (7, 0.028), (8, 0.029), (9, -0.07), (10, 0.035), (11, -0.038), (12, -0.018), (13, 0.045), (14, 0.016), (15, -0.021), (16, -0.008), (17, -0.027), (18, -0.1), (19, 0.012), (20, -0.049), (21, -0.015), (22, 0.008), (23, 0.053), (24, -0.003), (25, 0.072), (26, -0.091), (27, -0.009), (28, 0.016), (29, 0.056), (30, -0.047), (31, -0.012), (32, 0.037), (33, 0.005), (34, 0.037), (35, 0.071), (36, -0.04), (37, 0.078), (38, -0.074), (39, -0.069), (40, 0.008), (41, 0.041), (42, 0.033), (43, -0.052), (44, -0.036), (45, -0.006), (46, 0.053), (47, 0.006), (48, 0.076), (49, -0.07)]
simIndex simValue paperId paperTitle
same-paper 1 0.95986068 154 acl-2013-Extracting bilingual terminologies from comparable corpora
Author: Ahmet Aker ; Monica Paramita ; Rob Gaizauskas
Abstract: In this paper we present a method for extracting bilingual terminologies from comparable corpora. In our approach we treat bilingual term extraction as a classification problem. For classification we use an SVM binary classifier and training data taken from the EUROVOC thesaurus. We test our approach on a held-out test set from EUROVOC and perform precision, recall and f-measure evaluations for 20 European language pairs. The performance of our classifier reaches the 100% precision level for many language pairs. We also perform manual evaluation on bilingual terms extracted from English-German term-tagged comparable corpora. The results of this manual evaluation showed 60-83% of the term pairs generated are exact translations and over 90% exact or partial translations.
2 0.80283016 93 acl-2013-Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora
Author: Dhouha Bouamor ; Nasredine Semmar ; Pierre Zweigenbaum
Abstract: This paper presents an approach that extends the standard approach used for bilingual lexicon extraction from comparable corpora. We focus on the unresolved problem of polysemous words revealed by the bilingual dictionary and introduce a use of a Word Sense Disambiguation process that aims at improving the adequacy of context vectors. On two specialized French-English comparable corpora, empirical experimental results show that our method improves the results obtained by two state-of-the-art approaches.
3 0.74165457 92 acl-2013-Context-Dependent Multilingual Lexical Lookup for Under-Resourced Languages
Author: Lian Tze Lim ; Lay-Ki Soon ; Tek Yong Lim ; Enya Kong Tang ; Bali Ranaivo-Malancon
Abstract: Current approaches for word sense disambiguation and translation selection typically require lexical resources or large bilingual corpora with rich information fields and annotations, which are often infeasible for under-resourced languages. We extract translation context knowledge from a bilingual comparable corpus of a richer-resourced language pair, and inject it into a multilingual lexicon. The multilingual lexicon can then be used to perform context-dependent lexical lookup on texts of any language, including under-resourced ones. Evaluations on a prototype lookup tool, trained on an English–Malay bilingual Wikipedia corpus, show a precision score of 0.65 (baseline 0.55) and mean reciprocal rank score of 0.81 (baseline 0.771). Based on the early encouraging results, the context-dependent lexical lookup tool may be developed further into an intelligent reading aid, to help users grasp the gist of a second or foreign language text.
4 0.70914054 71 acl-2013-Bootstrapping Entity Translation on Weakly Comparable Corpora
Author: Taesung Lee ; Seung-won Hwang
Abstract: This paper studies the problem of mining named entity translations from comparable corpora with some “asymmetry”. Unlike the previous approaches relying on the “symmetry” found in parallel corpora, the proposed method is tolerant to asymmetry often found in comparable corpora, by distinguishing different semantics of relations of entity pairs to selectively propagate seed entity translations on weakly comparable corpora. Our experimental results on English-Chinese corpora show that our selective propagation approach outperforms the previous approaches in named entity translation in terms of the mean reciprocal rank by up to 0.16 for organization names, and 0.14 in a low comparability case.
5 0.70854247 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
Author: Manaal Faruqui ; Chris Dyer
Abstract: We present an information theoretic objective for bilingual word clustering that incorporates both monolingual distributional evidence as well as cross-lingual evidence from parallel corpora to learn high quality word clusters jointly in any number of languages. The monolingual component of our objective is the average mutual information of clusters of adjacent words in each language, while the bilingual component is the average mutual information of the aligned clusters. To evaluate our method, we use the word clusters in an NER system and demonstrate a statistically significant improvement in F1 score when using bilingual word clusters instead of monolingual clusters.
6 0.70766729 255 acl-2013-Name-aware Machine Translation
7 0.70703363 120 acl-2013-Dirt Cheap Web-Scale Parallel Text from the Common Crawl
8 0.70340735 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
9 0.69801134 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
10 0.68777275 72 acl-2013-Bridging Languages through Etymology: The case of cross language text categorization
11 0.6835705 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation
12 0.67631811 360 acl-2013-Translating Italian connectives into Italian Sign Language
13 0.65817529 235 acl-2013-Machine Translation Detection from Monolingual Web-Text
14 0.64238727 236 acl-2013-Mapping Source to Target Strings without Alignment by Analogical Learning: A Case Study with Transliteration
15 0.63584971 356 acl-2013-Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia
16 0.6206851 64 acl-2013-Automatically Predicting Sentence Translation Difficulty
17 0.61896008 25 acl-2013-A Tightly-coupled Unsupervised Clustering and Bilingual Alignment Model for Transliteration
18 0.61663443 48 acl-2013-An Open Source Toolkit for Quantitative Historical Linguistics
19 0.61612284 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning
20 0.61375815 74 acl-2013-Building Comparable Corpora Based on Bilingual LDA Model
topicId topicWeight
[(0, 0.052), (6, 0.032), (11, 0.117), (24, 0.064), (26, 0.049), (35, 0.07), (42, 0.056), (48, 0.045), (70, 0.051), (88, 0.037), (90, 0.038), (95, 0.132), (98, 0.184)]
simIndex simValue paperId paperTitle
1 0.87248802 67 acl-2013-Bi-directional Inter-dependencies of Subjective Expressions and Targets and their Value for a Joint Model
Author: Roman Klinger ; Philipp Cimiano
Abstract: Opinion mining is often regarded as a classification or segmentation task, involving the prediction of i) subjective expressions, ii) their target and iii) their polarity. Intuitively, these three variables are bidirectionally interdependent, but most work has either attempted to predict them in isolation or proposed pipeline-based approaches that cannot model the bidirectional interaction between these variables. Towards better understanding the interaction between these variables, we propose a model that allows for analyzing the relation of target and subjective phrases in both directions, thus providing an upper bound for the impact of a joint model in comparison to a pipeline model. We report results on two public datasets (cameras and cars), showing that our model outperforms state-of-the-art models, as well as on a new dataset consisting of Twitter posts.
same-paper 2 0.86078972 154 acl-2013-Extracting bilingual terminologies from comparable corpora
Author: Ahmet Aker ; Monica Paramita ; Rob Gaizauskas
Abstract: In this paper we present a method for extracting bilingual terminologies from comparable corpora. In our approach we treat bilingual term extraction as a classification problem. For classification we use an SVM binary classifier and training data taken from the EUROVOC thesaurus. We test our approach on a held-out test set from EUROVOC and perform precision, recall and f-measure evaluations for 20 European language pairs. The performance of our classifier reaches the 100% precision level for many language pairs. We also perform manual evaluation on bilingual terms extracted from English-German term-tagged comparable corpora. The results of this manual evaluation showed 60-83% of the term pairs generated are exact translations and over 90% exact or partial translations.
3 0.81772292 237 acl-2013-Margin-based Decomposed Amortized Inference
Author: Gourab Kundu ; Vivek Srikumar ; Dan Roth
Abstract: Given that structured output prediction is typically performed over entire datasets, one natural question is whether it is possible to re-use computation from earlier inference instances to speed up inference for future instances. Amortized inference has been proposed as a way to accomplish this. In this paper, first, we introduce a new amortized inference algorithm called the Margin-based Amortized Inference, which uses the notion of structured margin to identify inference problems for which previous solutions are provably optimal. Second, we introduce decomposed amortized inference, which is designed to address very large inference problems, where earlier amortization methods become less effective. This approach works by decomposing the output structure and applying amortization piece-wise, thus increasing the chance that we can re-use previous solutions for parts of the output structure. These parts are then combined to a global coherent solution using Lagrangian relaxation. In our experiments, using the NLP tasks of semantic role labeling and entity-relation extraction, we demonstrate that with the margin-based algorithm, we need to call the inference engine only for a third of the test examples. Further, we show that the decomposed variant of margin-based amortized inference achieves a greater reduction in the number of inference calls.
4 0.7861349 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
Author: Mikhail Kozhevnikov ; Ivan Titov
Abstract: Semantic Role Labeling (SRL) has become one of the standard tasks of natural language processing and proven useful as a source of information for a number of other applications. We address the problem of transferring an SRL model from one language to another using a shared feature representation. This approach is then evaluated on three language pairs, demonstrating competitive performance as compared to a state-of-the-art unsupervised SRL system and a cross-lingual annotation projection baseline. We also consider the contribution of different aspects of the feature representation to the performance of the model and discuss practical applicability of this method.
1 Background and Motivation
Semantic role labeling has proven useful in many natural language processing tasks, such as question answering (Shen and Lapata, 2007; Kaisser and Webber, 2007), textual entailment (Sammons et al., 2009), machine translation (Wu and Fung, 2009; Liu and Gildea, 2010; Gao and Vogel, 2011) and dialogue systems (Basili et al., 2009; van der Plas et al., 2009). Multiple models have been designed to automatically predict semantic roles, and a considerable amount of data has been annotated to train these models, if only for a few more popular languages. As the annotation is costly, one would like to leverage existing resources to minimize the human effort required to construct a model for a new language. A number of approaches to the construction of semantic role labeling models for new languages have been proposed. On one end of the scale is unsupervised SRL, such as Grenager and Manning (2006), which requires some expert knowledge, but no labeled data. It clusters together arguments that should bear the same semantic role, but does not assign a particular role to each cluster. On the other end is annotating a new dataset from scratch. There are also intermediate options, which often make use of similarities between languages.
This way, if an accurate model exists for one language, it should help simplify the construction of a model for another, related language. The approaches in this third group often use parallel data to bridge the gap between languages. Cross-lingual annotation projection systems (Padó and Lapata, 2009), for example, propagate information directly via word alignment links. However, they are very sensitive to the quality of parallel data, as well as the accuracy of a source-language model on it. An alternative approach, known as cross-lingual model transfer, or cross-lingual model adaptation, consists of modifying a source-language model to make it directly applicable to a new language. This usually involves constructing a shared feature representation across the two languages. McDonald et al. (2011) successfully apply this idea to the transfer of dependency parsers, using part-of-speech tags as the shared representation of words. A later extension of Täckström et al. (2012) enriches this representation with cross-lingual word clusters, considerably improving the performance. In the case of SRL, a shared representation that is purely syntactic is likely to be insufficient, since structures with different semantics may be realized by the same syntactic construct, for example “in August” vs “in Britain”. However, with the help of recently introduced cross-lingual word representations, such as the cross-lingual clustering mentioned above or cross-lingual distributed word representations of Klementiev et al. (2012), we may be able to transfer models of shallow semantics in a similar fashion.
(Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 1190–1200, Sofia, Bulgaria, August 4-9 2013. © 2013 Association for Computational Linguistics)
In this work we construct a shared feature representation for a pair of languages, employing cross-lingual representations of syntactic and lexical information, train a semantic role labeling model on one language and apply it to the other one. This approach yields an SRL model for a new language at a very low cost, effectively requiring only a source language model and parallel data. We evaluate on five (directed) language pairs: EN-ZH, ZH-EN, EN-CZ, CZ-EN and EN-FR, where EN, FR, CZ and ZH denote English, French, Czech and Chinese, respectively. The transferred model is compared against two baselines: an unsupervised SRL system and a model trained on the output of a cross-lingual annotation projection system. In the next section we will describe our setup, then in section 3 present the shared feature representation we use, discuss the evaluation data and other technical aspects in section 4, present the results and conclude with an overview of related work.
2 Setup
The purpose of the study is not to develop yet another semantic role labeling system (any existing SRL system can, after some modification, be used in this setup) but to assess the practical applicability of cross-lingual model transfer to this problem, compare it against the alternatives and identify its strong/weak points depending on a particular setup.
2.1 Semantic Role Labeling Model
We consider the dependency-based version of semantic role labeling as described in Hajič et al. (2009) and transfer an SRL model from one language to another. We only consider verbal predicates and ignore the predicate disambiguation stage. We also assume that the predicate identification information is available; in most languages it can be obtained using a relatively simple heuristic based on part-of-speech tags.
The model performs argument identification and classification (Johansson and Nugues, 2008) separately in a pipeline: first each candidate is classified as being or not being a head of an argument phrase with respect to the predicate in question, and then each of the arguments is assigned a role from a given inventory. The model is factorized over arguments: the decisions regarding the classification of different arguments are made independently of each other. With respect to the use of syntactic annotation we consider two options: using an existing dependency parser for the target language and obtaining one by means of cross-lingual transfer (see section 4.2). Following McDonald et al. (2011), we assume that a part-of-speech tagger is available for the target language.
2.2 SRL in the Low-resource Setting
Several approaches have been proposed to obtain an SRL model for a new language with little or no manual annotation. Unsupervised SRL models (Lang and Lapata, 2010) cluster the arguments of predicates in a given corpus according to their semantic roles. The performance of such models can be impressive, especially for those languages where semantic roles correlate strongly with the syntactic relation of the argument to its predicate. However, assigning meaningful role labels to the resulting clusters requires additional effort and the model’s parameters generally need some adjustment for every language. If the necessary resources are already available for a closely related language, they can be utilized to facilitate the construction of a model for the target language. This can be achieved either by means of cross-lingual annotation projection (Yarowsky et al., 2001) or by cross-lingual model transfer (Zeman and Resnik, 2008). This last approach is the one we are considering in this work, and the other two options are treated as baselines. The unsupervised model will be further referred to as UNSUP and the projection baseline as PROJ.
2.3 Evaluation Measures
We use the F1 measure as a metric for the argument identification stage and accuracy as an aggregate measure of argument classification performance. When comparing to the unsupervised SRL system the clustering evaluation measures are used instead. These are purity and collocation
$$ PU = \frac{1}{N} \sum_i \max_j |G_j \cap C_i|, \qquad CO = \frac{1}{N} \sum_j \max_i |G_j \cap C_i|, $$
where C_i is the set of arguments in the i-th induced cluster, G_j is the set of arguments in the j-th gold cluster and N is the total number of arguments. We report the harmonic mean of the two (Lang and Lapata, 2011) and denote it F1c to avoid confusing it with the supervised metric.
3 Model Transfer
The idea of this work is to abstract the model away from the particular source language and apply it to a new one. This setup requires that we use the same feature representation for both languages, for example part-of-speech tags and dependency relation labels should be from the same inventory. Some features are not applicable to certain languages because the corresponding phenomena are absent in them. For example, consider a strongly inflected language and an analytic one. While the latter can usually convey the information encoded in the word form in the former one (number, gender, etc.), finding a shared feature representation for such information is non-trivial. In this study we will confine ourselves to those features that are applicable to all languages in question, namely: part-of-speech tags, syntactic dependency structures and representations of the word’s identity.
3.1 Lexical Information
We train a model on one language and apply it to a different one. In order for this to work, the words of the two languages have to be mapped into a common feature space. It is also desirable that closely related words from both languages have similar representations in this space.
Word mapping. The first option is simply to use the source language words as the shared representation.
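As an illustration of the clustering measures from Section 2.3, the following is a minimal Python sketch of purity, collocation and their harmonic mean F1c. It assumes each argument is given as a pair of cluster identifiers (induced, gold); the function name and the input format are hypothetical and not taken from the paper.

```python
from collections import Counter


def purity_collocation_f1c(induced, gold):
    """Compute purity (PU), collocation (CO) and their harmonic mean (F1c).

    induced, gold: equal-length sequences of cluster ids, one entry per
    argument (an assumed input format, chosen for this sketch).
    """
    if len(induced) != len(gold) or not induced:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(induced)
    # overlap[(c, g)] is the size of the intersection of induced cluster c
    # and gold cluster g.
    overlap = Counter(zip(induced, gold))
    best_for_induced = {}  # for each induced cluster: largest overlap with a gold cluster
    best_for_gold = {}     # for each gold cluster: largest overlap with an induced cluster
    for (c, g), size in overlap.items():
        best_for_induced[c] = max(best_for_induced.get(c, 0), size)
        best_for_gold[g] = max(best_for_gold.get(g, 0), size)
    pu = sum(best_for_induced.values()) / n
    co = sum(best_for_gold.values()) / n
    f1c = 2 * pu * co / (pu + co)
    return pu, co, f1c


# A single induced cluster covering two gold roles: perfect collocation,
# but only half of the arguments agree with the majority gold role.
pu, co, f1c = purity_collocation_f1c([0, 0, 0, 0], ["A", "A", "B", "B"])
```

Putting everything in one cluster maximizes collocation and putting every argument in its own cluster maximizes purity, which is why the two are combined into a harmonic mean.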
Here every source language word would have itself as its representation and every target word would map into a source word that corresponds to it. In other words, we supply the model with a gloss of the target sentence. The mapping (bilingual dictionary) we use is derived from a word-aligned parallel corpus, by identifying, for each word in the target language, the word in the source language it is most often aligned to.
Cross-lingual clusters. There is no guarantee that each of the words in the evaluation data is present in our dictionary, nor that the corresponding source-language word is present in the training data, so the model would benefit from the ability to generalize over closely related words. This can, for example, be achieved by using cross-lingual word clusters induced in Täckström et al. (2012). We incorporate these clusters as features into our model.
3.2 Syntactic Information
Part-of-speech Tags. We map part-of-speech tags into the universal tagset following Petrov et al. (2012). This may have a negative effect on the performance of a monolingual model, since most part-of-speech tagsets are more fine-grained than the universal POS tags considered here. For example, the Penn Treebank inventory contains 36 tags and the universal POS tagset only 12. Since the finer-grained POS tags often reflect more language-specific phenomena, however, they would only be useful for very closely related languages in the cross-lingual setting. The universal part-of-speech tags used in evaluation are derived from gold-standard annotation for all languages except French, where predicted ones had to be used instead.
Dependency Structure. Another important aspect of syntactic information is the dependency structure. Most dependency relation inventories are language-specific, and finding a shared representation for them is a challenging problem.
One could map dependency relations into a simplified form that would be shared between languages, as is done for part-of-speech tags in Petrov et al. (2012). The extent to which this would be useful, however, depends on the similarity of syntactic-semantic interfaces of the languages in question. In this work we discard the dependency relation labels where the inventories do not match and only consider the unlabeled syntactic dependency graph. Some discrepancies, such as variations in attachment order, may be present even there, but this does not appear to be the case with the datasets we use for evaluation. If a target language is poor in resources, one can obtain a dependency parser for the target language by means of cross-lingual model transfer (Zeman and Resnik, 2008). We take this into account and evaluate both using the original dependency structures and the ones obtained by means of cross-lingual model transfer.
3.3 The Model
The model we use is based on that of Björkelund et al. (2009). It consists of a set of linear classifiers trained using Liblinear (Fan et al., 2008). The feature model was modified to accommodate the cross-lingual cluster features and the reranker component was not used. We do not model the interaction between different argument roles in the same predicate. While this has been found useful, in the cross-lingual setup one has to be careful with the assumptions made. For example, modeling the sequence of roles using a Markov chain (Thompson et al., 2003) may not work well in the present setting, especially between distant languages, as the order of arguments is not necessarily preserved. Most constraints that prove useful for SRL (Chang et al., 2007) also require customization when applied to a new language, and some rely on language-specific resources, such as a valency lexicon.
Taking into account the interaction between different arguments of a predicate is likely to improve the performance of the transferred model, but this is outside the scope of this work.
3.4 Feature Selection
Compatibility of feature representations is necessary but not sufficient for successful model transfer. We have to make sure that the features we use are predictive of similar outcomes in the two languages as well. Depending on the pair of languages in question, different aspects of the feature representation will retain or lose their predictive power. We can be reasonably certain that the identity of an argument word is predictive of its semantic role in any language, but it might or might not be true of, for example, the word directly preceding the argument word. It is therefore important to prevent the model from capturing overly specific aspects of the source language, which we do by confining the model to first-order features. We also avoid feature selection, which, performed on the source language, is unlikely to help the model to better generalize to the target one. The experiments confirm that feature selection and the use of second-order features degrade the performance of the transferred model.
[Table 1: Feature groups.]
3.5 Feature Groups
For each word, we use its part-of-speech tag, cross-lingual cluster id, word identity (glossed, when evaluating on the target language) and its dependency relation to its parent. Features associated with an argument word include the attributes of the predicate word, the argument word, its parent, siblings and children, and the words directly preceding and following it. Also included are the sequences of part-of-speech tags and dependency relations on the path between the predicate and the argument.
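The feature scheme of Section 3.5 can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: tokens are dictionaries with the hypothetical keys `pos`, `cluster`, `gloss`, `deprel` and `head`, the group names follow those used in the text, the dependency labels in the example are made up, and the path-based features are left out for brevity. Only first-order features (one attribute of one word) are produced, in line with Section 3.4.

```python
# Which token attribute each feature group exposes (attribute names are
# assumptions made for this sketch).
ATTRIBUTE_OF_GROUP = {"POS": "pos", "Cluster": "cluster",
                      "Gloss": "gloss", "Deprel": "deprel"}


def related_words(sentence, pred, arg, use_syntax):
    """Return (role, index) pairs for the words whose attributes are used."""
    words = [("pred", pred), ("arg", arg)]
    if arg > 0:
        words.append(("prev", arg - 1))
    if arg + 1 < len(sentence):
        words.append(("next", arg + 1))
    if use_syntax:
        parent = sentence[arg]["head"]
        if parent is not None:
            words.append(("parent", parent))
        for i, tok in enumerate(sentence):
            if tok["head"] == arg:
                words.append(("child", i))
            elif parent is not None and tok["head"] == parent and i != arg:
                words.append(("sibling", i))
    return words


def extract_features(sentence, pred, arg, groups):
    """First-order features for one (predicate, argument candidate) pair."""
    # Word order information is always available.
    feats = {"position=" + ("left" if arg < pred else "right")}
    for role, idx in related_words(sentence, pred, arg, "Synt" in groups):
        for group in groups:
            attr = ATTRIBUTE_OF_GROUP.get(group)
            if attr is not None:
                feats.add(f"{role}.{attr}={sentence[idx][attr]}")
    return feats


# Toy sentence "She reads books"; "head" is the index of the parent token.
sentence = [
    {"pos": "PRON", "cluster": 7, "gloss": "she", "deprel": "subj", "head": 1},
    {"pos": "VERB", "cluster": 3, "gloss": "reads", "deprel": "root", "head": None},
    {"pos": "NOUN", "cluster": 9, "gloss": "books", "deprel": "obj", "head": 1},
]
pos_only = extract_features(sentence, pred=1, arg=2, groups=("POS",))
pos_and_synt = extract_features(sentence, pred=1, arg=2, groups=("POS", "Synt"))
```

Enabling a group only adds features, so the contribution of each group can be measured by training with different `groups` settings.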
Since we are also interested in the impact of different aspects of the feature representation, we divide the features into groups as summarized in table 1 and evaluate their respective contributions to the performance of the model. If a feature group is enabled the model has access to the corresponding source of information. For example, if only the POS group is enabled, the model relies on the part-of-speech tags of the argument, the predicate and the words to the right and left of the argument word. If Synt is enabled too, it also uses the POS tags of the argument’s parent, children and siblings. Word order information constitutes an implicit group that is always available. It includes the Position feature, which indicates whether the argument is located to the left or to the right of the predicate, and allows the model to look up the attributes of the words directly preceding and following the argument word. The model we compare against the baselines uses all applicable feature groups (Deprel is only used in EN-CZ and CZ-EN experiments with original syntax).
4 Evaluation
4.1 Datasets and Preprocessing
Evaluation of the cross-lingual model transfer requires a rather specific kind of dataset. Namely, the data in both languages has to be annotated with the same set of semantic roles following the same (or compatible) guidelines, which is seldom the case. We have identified three language pairs for which such resources are available: English-Chinese, English-Czech and English-French. The evaluation datasets for English and Chinese are those from the CoNLL Shared Task 2009 (Hajič et al., 2009) (henceforth CoNLL-ST). Their annotation in the CoNLL-ST is not identical, but the guidelines for “core” semantic roles are similar (Kingsbury et al., 2004), so we evaluate only on core roles here.
The data for the second language pair is drawn from the Prague Czech-English Dependency Treebank 2.0 (Hajič et al., 2012), which we converted to a format similar to that of CoNLL-ST (see http://www.ml4nlp.de/code-and-data/treex2conll). The original annotation uses the tectogrammatical representation (Hajič, 2002) and an inventory of semantic roles (or functors), most of which are interpretable across various predicates. Also note that the syntactic annotation of English and Czech in PCEDT 2.0 is quite similar (to the extent permitted by the difference in the structure of the two languages) and we can use the dependency relations in our experiments. For English-French, the English CoNLL-ST dataset was used as a source and the model was evaluated on the manually annotated dataset from van der Plas et al. (2011). The latter contains one thousand sentences from the French part of the Europarl (Koehn, 2005) corpus, annotated with semantic roles following an adapted version of PropBank (Palmer et al., 2005) guidelines. The authors perform annotation projection from English to French, using a joint model of syntax and semantics and employing heuristics for filtering. We use a model trained on the output of this projection system as one of the baselines. The evaluation dataset is relatively small in this case, so we perform the transfer only one-way, from English to French. The part-of-speech tags in all datasets were replaced with the universal POS tags of Petrov et al. (2012). For Czech, we have augmented the mappings to account for the tags that were not present in the datasets from which the original mappings were derived. Namely, tag “t” is mapped to “VERB” and “Y” to “PRON”. We use parallel data to construct a bilingual dictionary used in word mapping, as well as in the projection baseline. For English-Czech and English-French, the data is drawn from Europarl (Koehn, 2005), for English-Chinese from MultiUN (Eisele and Chen, 2010).
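The bilingual dictionary used for word mapping (Section 3.1) maps each target word to the source word it is most often aligned to. A minimal Python sketch of that step is given below. It assumes the word alignments are already available as lists of (source index, target index) links per sentence pair; the function names, the input format and the `<unk>` placeholder for unmapped words are assumptions of this sketch, not details from the paper.

```python
from collections import Counter, defaultdict


def build_word_mapping(bitext, alignments):
    """Map each target word to its most frequently aligned source word.

    bitext: list of (source_tokens, target_tokens) pairs.
    alignments: one list of (source_index, target_index) links per pair.
    """
    counts = defaultdict(Counter)
    for (src, tgt), links in zip(bitext, alignments):
        for i, j in links:
            counts[tgt[j]][src[i]] += 1
    return {t: c.most_common(1)[0][0] for t, c in counts.items()}


def gloss(target_tokens, mapping, unknown="<unk>"):
    """Replace target words by their mapped source words.

    Words missing from the mapping get a placeholder (an assumption here).
    """
    return [mapping.get(tok, unknown) for tok in target_tokens]


# Toy French-to-English example with one-to-one alignment links.
bitext = [
    (["the", "house"], ["la", "maison"]),
    (["a", "house"], ["une", "maison"]),
    (["the", "home"], ["la", "maison"]),
]
alignments = [[(0, 0), (1, 1)], [(0, 0), (1, 1)], [(0, 0), (1, 1)]]
mapping = build_word_mapping(bitext, alignments)
glossed = gloss(["la", "maison", "bleue"], mapping)
```

Words without a dictionary entry, such as "bleue" above, are the case where the cross-lingual cluster features are expected to help.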
The word alignments were obtained using GIZA++ (Och and Ney, 2003) and the intersection heuristic.
4.2 Syntactic Transfer
In the low-resource setting, we cannot always rely on the availability of an accurate dependency parser for the target language. If one is not available, the natural solution would be to use cross-lingual model transfer to obtain it. Unfortunately, the models presented in the previous work, such as Zeman and Resnik (2008), McDonald et al. (2011) and Täckström et al. (2012), were not made available, so we reproduced the direct transfer algorithm of McDonald et al. (2011), using Malt parser (Nivre, 2008) and the same set of features. We did not reimplement the projected transfer algorithm, however, and used the default training procedure instead of perceptron-based learning. The dependency structure thus obtained is, of course, only a rough approximation: even a much more sophisticated algorithm may not perform well when transferring syntax between such languages as Czech and English, given the inherent difference in their structure. The scores are shown in table 2. We will henceforth refer to the syntactic annotations that were provided with the datasets as original, as opposed to the annotations obtained by means of syntactic transfer.
[Table 2: Syntactic transfer accuracy, unlabeled attachment score (percent). Columns: Setup, UAS (%). Note that in case of French we evaluate against the output of a supervised system, since manual annotation is not available for this dataset. This score does not reflect the true performance of syntactic transfer.]
4.3 Baselines
Unsupervised Baseline: We are using a version of the unsupervised semantic role induction system of Titov and Klementiev (2012a) adapted to the shared feature representation considered in order to make the scores comparable with those of the transfer model and, more importantly, to enable evaluation on transferred syntax.
Note that the original system, tailored to a more expressive language-specific syntactic representation and equipped with heuristics to identify active/passive voice and other phenomena, achieves higher scores than those we report here.
Projection Baseline: The projection baseline we use for English-Czech and English-Chinese is a straightforward one: we label the source side of a parallel corpus using the source-language model, then identify those verbs on the target side that are aligned to a predicate, mark them as predicates and propagate the argument roles in the same fashion. A model is then trained on the resulting training data and applied to the test set. For English-French we instead use the output of a fully featured projection model of van der Plas et al. (2011), published in the CLASSiC project.
5 Results
In order to ensure that the results are consistent, the test sets, except for the French one, were partitioned into five equal parts (of 5 to 10 thousand sentences each, depending on the dataset) and the evaluation performed separately on each one. All evaluation figures for English, Czech or Chinese below are the average values over the five subsets. In case of French, the evaluation dataset is too small to split it further, so instead we ran the evaluation five times on a randomly selected 80% sample of the evaluation data and averaged over those. In both cases the results are consistent over the subsets: the standard deviation does not exceed 0.5% for the transfer system and projection baseline and 1% for the unsupervised system.
5.1 Argument Identification
We summarize the results in table 3. Argument identification is known to rely heavily on syntactic information, so it is unsurprising that it proves inaccurate when transferred syntax is used. Our simple projection baseline suffers from the same problem. Even with original syntactic information available, the performance of argument identification is moderate.
Note that the model of van der Plas et al. (2011), though relying on more expressive syntax, only outperforms the transferred system by 3% (F1) on this task.
[Table 3: Argument identification, transferred model vs. projection baseline, F1. Columns: Setup, Syntax, TRANS, PROJ.]
Most unsupervised SRL approaches assume that the argument identification is performed by some external means, for example heuristically (Lang and Lapata, 2011). Such heuristics or unsupervised approaches to argument identification (Abend et al., 2009) can also be used in the present setup.
5.2 Argument Classification
In the following tables, the TRANS column contains the results for the transferred system, UNSUP for the unsupervised baseline and PROJ for the projection baseline. We highlight in bold the higher score where the difference exceeds twice the maximum of the standard deviation estimates of the two results. Table 4 presents the unsupervised evaluation results. Note that the unsupervised model performs as well as the transferred one or better where the original syntactic dependencies are available. In the more realistic scenario with transferred syntax, however, the transferred model proves more accurate.
[Table 4: Argument classification, transferred model vs. unsupervised baseline in terms of the clustering metric F1c (see section 2.3). Columns: Setup, Syntax, TRANS, UNSUP.]
[Table 5: Argument classification, transferred model vs. projection baseline, accuracy. Columns: Setup, Syntax, TRANS, PROJ.]
In table 5 we compare the transferred system with the projection baseline.
It is easy to see that the scores vary strongly depending on the language pair, due to both the difference in the annotation scheme used and the degree of relatedness between the languages. The drop in performance when transferring the model to another language is large in every case, though; see table 6.
[Table 6: Model accuracy on the source and target language using original syntax. Columns: Setup, Target, Source. The source language scores for English vary between language pairs because of the difference in syntactic annotation and role subset used.]
We also include the individual F1 scores for the top-10 most frequent labels for EN-CZ transfer with original syntax in table 7. The model provides meaningful predictions here, despite low overall accuracy. Most of the labels (see http://ufal.mff.cuni.cz/~toman/pcedt/en/functors.html) are self-explanatory: Patient (PAT), Actor (ACT), Time (TWHEN), Effect (EFF), Location (LOC), Manner (MANN), Addressee (ADDR), Extent (EXT). CPHR marks the nominal part of a complex predicate, as in “to have [a plan]CPHR”, and DIR3 indicates destination.
[Table 7: F1, recall and precision for the top-10 most frequent roles. Columns: Label, Freq., F1, Re., Pr.]
5.3 Additional Experiments
We now evaluate the contribution of different aspects of the feature representation to the performance of the model. Table 8 contains the results for English-French.
[Table 8: Results for English-French with different feature subsets, using original and transferred syntactic information. Columns: Features, Orig, Trans.]
The fact that the model performs slightly better with transferred syntax may be explained by two factors. Firstly, as we already mentioned, the original syntactic annotation is also produced automatically.
Secondly, in the model transfer setup it is more important how closely the syntactic-semantic interface on the target side resembles that on the source side than how well it matches the “true” structure of the target language, and in this respect a transferred dependency parser may have an advantage over one trained on target-language data. The high impact of the Gloss features here may be partly attributed to the fact that the mapping is derived from the same corpus as the evaluation data, Europarl (Koehn, 2005), and partly to the similarity between English and French in terms of word order, usage of articles and prepositions. The moderate contribution of the cross-lingual cluster features is likely due to the insufficient granularity of the clustering for this task. For more distant language pairs, the contributions of individual feature groups are less interpretable, so we only highlight a few observations. First of all, both EN-CZ and CZ-EN benefit noticeably from the use of the original syntactic annotation, including dependency relations, but not from the transferred syntax, most likely due to the low syntactic transfer performance. Both perform better when lexical information is available, although the improvement is not as significant as in the case of French (only up to 5%). The situation with Chinese is somewhat complicated in that adding lexical information here fails to yield an improvement in terms of the metric considered. This is likely due to the fact that we consider only the core roles, which can usually be predicted with high accuracy based on syntactic information alone.
6 Related Work
Development of robust statistical models for core NLP tasks is a challenging problem, and adaptation of existing models to new languages presents a viable alternative to exhaustive annotation for each language.
Although the models thus obtained are generally imperfect, they can be further refined for a particular language and domain using techniques such as active learning (Settles, 2010; Chen et al., 2011).
Cross-lingual annotation projection (Yarowsky et al., 2001) approaches have been applied extensively to a variety of tasks, including POS tagging (Xi and Hwa, 2005; Das and Petrov, 2011), morphology segmentation (Snyder and Barzilay, 2008), verb classification (Merlo et al., 2002), mention detection (Zitouni and Florian, 2008), LFG parsing (Wróblewska and Frank, 2009), information extraction (Kim et al., 2010), SRL (Padó and Lapata, 2009; van der Plas et al., 2011; Annesi and Basili, 2010; Tonelli and Pianta, 2008), dependency parsing (Naseem et al., 2012; Ganchev et al., 2009; Smith and Eisner, 2009; Hwa et al., 2005) and temporal relation prediction (Spreyer and Frank, 2008). Interestingly, it has also been used to propagate morphosyntactic information between old and modern versions of the same language (Meyer, 2011).
Cross-lingual model transfer methods (McDonald et al., 2011; Zeman and Resnik, 2008; Durrett et al., 2012; Søgaard, 2011; Lopez et al., 2008) have also been receiving much attention recently. The basic idea behind model transfer is similar to that of cross-lingual annotation projection, as we can see from the way parallel data is used in, for example, McDonald et al. (2011).
A crucial component of direct transfer approaches is the unified feature representation. There are at least two such representations of lexical information (Klementiev et al., 2012; Täckström et al., 2012), but both work at the word level. This makes it hard to account for phenomena that are expressed differently in the languages considered; for example, the syntactic function of a certain word may be indicated by a preposition, inflection or word order, depending on the language.
Accurate representation of such information would require an extra level of abstraction (Hajič, 2002).
A side-effect of using adaptation methods is that we are forced to use the same annotation scheme for the task in question (SRL, in our case), which in turn simplifies the development of cross-lingual tools for downstream tasks. Such representations are also likely to be useful in machine translation.
Unsupervised semantic role labeling methods (Lang and Lapata, 2010; Lang and Lapata, 2011; Titov and Klementiev, 2012a; Lorenzo and Cerisara, 2012) also constitute an alternative to cross-lingual model transfer. For an overview of semi-supervised approaches we refer the reader to Titov and Klementiev (2012b).
7 Conclusion
We have considered the cross-lingual model transfer approach as applied to the task of semantic role labeling and observed that for closely related languages it performs comparably to annotation projection approaches. It allows one to quickly construct an SRL model for a new language without manual annotation or language-specific heuristics, provided an accurate model is available for one of the related languages along with a certain amount of parallel data for the two languages. While annotation projection approaches require sentence- and word-aligned parallel data and crucially depend on the accuracy of the syntactic parsing and SRL on the source side of the parallel corpus, cross-lingual model transfer can be performed using only a bilingual dictionary.
Unsupervised SRL approaches have their advantages, in particular when no annotated data is available for any of the related languages and there is a syntactic parser available for the target one, but the annotation they produce is not always sufficient. In applications such as information retrieval it is preferable to have precise labels, rather than just clusters of arguments, for example.
Also note that when applying cross-lingual model transfer in practice, one can improve upon the performance of the simplistic model we use for evaluation, for example by picking the features manually, taking into account the properties of the target language. Domain adaptation techniques can also be employed to adjust the model to the target language.
Acknowledgments
The authors would like to thank Alexandre Klementiev and Ryan McDonald for useful suggestions and Täckström et al. (2012) for sharing the cross-lingual word representations. This research is supported by the MMCI Cluster of Excellence.
References
Omri Abend, Roi Reichart, and Ari Rappoport. 2009. Unsupervised argument identification for semantic role labeling. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL ’09, pages 28–36, Stroudsburg, PA, USA. Association for Computational Linguistics.
Paolo Annesi and Roberto Basili. 2010. Cross-lingual alignment of FrameNet annotations through hidden Markov models. In Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing, CICLing ’10, pages 12–25, Berlin, Heidelberg. Springer-Verlag.
Roberto Basili, Diego De Cao, Danilo Croce, Bonaventura Coppola, and Alessandro Moschitti. 2009. Cross-language frame semantics transfer in bilingual corpora. In Alexander F. Gelbukh, editor, Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing, pages 332–345.
Anders Björkelund, Love Hafdell, and Pierre Nugues. 2009. Multilingual semantic role labeling. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, pages 43–48, Boulder, Colorado, June. Association for Computational Linguistics.
Ming-Wei Chang, Lev Ratinov, and Dan Roth. 2007. Guiding semi-supervision with constraint-driven learning. In ACL.
Chenhua Chen, Alexis Palmer, and Caroline Sporleder. 2011. Enhancing active learning for semantic role labeling via compressed dependency trees. In Proceedings of 5th International Joint Conference on Natural Language Processing, pages 183–191, Chiang Mai, Thailand, November. Asian Federation of Natural Language Processing.
Dipanjan Das and Slav Petrov. 2011. Unsupervised part-of-speech tagging with bilingual graph-based projections. Proceedings of the Association for Computational Linguistics.
Greg Durrett, Adam Pauls, and Dan Klein. 2012. Syntactic transfer using a bilingual lexicon. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1–11, Jeju Island, Korea, July. Association for Computational Linguistics.
Andreas Eisele and Yu Chen. 2010. MultiUN: A multilingual corpus from United Nation documents. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10). European Language Resources Association (ELRA).
Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874.
Kuzman Ganchev, Jennifer Gillenwater, and Ben Taskar. 2009. Dependency grammar induction via bitext projection constraints. In Proceedings of the 47th Annual Meeting of the ACL, pages 369–377, Stroudsburg, PA, USA. Association for Computational Linguistics.
Qin Gao and Stephan Vogel. 2011. Corpus expansion for statistical machine translation with semantic role label substitution rules. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 294–298, Portland, Oregon, USA.
Trond Grenager and Christopher D. Manning. 2006. Unsupervised discovery of a statistical verb lexicon. In Proceedings of EMNLP.
Jan Hajič. 2002. Tectogrammatical representation: Towards a minimal transfer in machine translation. In Robert Frank, editor, Proceedings of the 6th International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+6), pages 216–226, Venezia. Universita di Venezia.
Jan Hajič, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Štěpánek, Pavel Straňák, Mihai Surdeanu, Nianwen Xue, and Yi Zhang. 2009. The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, pages 1–18, Boulder, Colorado.
Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Ondřej Bojar, Silvie Cinková, Eva Fučíková, Marie Mikulová, Petr Pajas, Jan Popelka, Jiří Semecký, Jana Šindlerová, Jan Štěpánek, Josef Toman, Zdeňka Urešová, and Zdeněk Žabokrtský. 2012. Announcing Prague Czech-English dependency treebank 2.0. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, May. European Language Resources Association (ELRA).
Rebecca Hwa, Philip Resnik, Amy Weinberg, Clara Cabezas, and Okan Kolak. 2005. Bootstrapping parsers via syntactic projection across parallel text. Natural Language Engineering, 11(3):311–325.
Richard Johansson and Pierre Nugues. 2008. Dependency-based semantic role labeling of PropBank. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 69–78, Honolulu, Hawaii.
Michael Kaisser and Bonnie Webber. 2007. Question answering based on semantic roles. In ACL Workshop on Deep Linguistic Processing.
Seokhwan Kim, Minwoo Jeong, Jonghoon Lee, and Gary Geunbae Lee. 2010. A cross-lingual annotation projection approach for relation detection. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING ’10, pages 564–571, Stroudsburg, PA, USA. Association for Computational Linguistics.
Paul Kingsbury, Nianwen Xue, and Martha Palmer. 2004. Propbanking in parallel. In Proceedings of the Workshop on the Amazing Utility of Parallel and Comparable Corpora, in conjunction with LREC’04.
Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. 2012. Inducing crosslingual distributed representations of words. In Proceedings of the International Conference on Computational Linguistics (COLING), Bombay, India.
Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In Conference Proceedings: the tenth Machine Translation Summit, pages 79–86, Phuket, Thailand. AAMT.
Joel Lang and Mirella Lapata. 2010. Unsupervised induction of semantic roles. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 939–947, Los Angeles, California, June. Association for Computational Linguistics.
Joel Lang and Mirella Lapata. 2011. Unsupervised semantic role induction via split-merge clustering. In Proc. of Annual Meeting of the Association for Computational Linguistics (ACL).
Ding Liu and Daniel Gildea. 2010. Semantic role features for machine translation. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), Beijing, China.
Adam Lopez, Daniel Zeman, Michael Nossal, Philip Resnik, and Rebecca Hwa. 2008. Cross-language parser adaptation between related languages. In IJCNLP-08 Workshop on NLP for Less Privileged Languages, pages 35–42, Hyderabad, India, January.
Alejandra Lorenzo and Christophe Cerisara. 2012. Unsupervised frame based semantic role induction: application to French and English. In Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages, pages 30–35, Jeju, Republic of Korea, July. Association for Computational Linguistics.
Ryan McDonald, Slav Petrov, and Keith Hall. 2011. Multi-source transfer of delexicalized dependency parsers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’11, pages 62–72, Stroudsburg, PA, USA. Association for Computational Linguistics.
Paola Merlo, Suzanne Stevenson, Vivian Tsang, and Gianluca Allaria. 2002. A multi-lingual paradigm for automatic verb classification. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL’02), pages 207–214, Philadelphia, PA.
Roland Meyer. 2011. New wine in old wineskins? – Tagging old Russian via annotation projection from modern translations. Russian Linguistics, 35(2):267(15).
Tahira Naseem, Regina Barzilay, and Amir Globerson. 2012. Selective sharing for multilingual dependency parsing. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 629–637, Jeju Island, Korea, July. Association for Computational Linguistics.
Joakim Nivre. 2008. Algorithms for deterministic incremental dependency parsing. Comput. Linguist., 34(4):513–553, December.
Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1).
Sebastian Padó and Mirella Lapata. 2009. Cross-lingual annotation projection for semantic roles. Journal of Artificial Intelligence Research, 36:307–340.
Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31:71–105.
Slav Petrov, Dipanjan Das, and Ryan McDonald. 2012. A universal part-of-speech tagset. In Proceedings of LREC, May.
Mark Sammons, Vinod Vydiswaran, Tim Vieira, Nikhil Johri, Ming-Wei Chang, Dan Goldwasser, Vivek Srikumar, Gourab Kundu, Yuancheng Tu, Kevin Small, Joshua Rule, Quang Do, and Dan Roth. 2009. Relation alignment for textual entailment recognition. In Text Analysis Conference (TAC).
Burr Settles. 2010. Active learning literature survey. Computer Sciences Technical Report, 1648.
Dan Shen and Mirella Lapata. 2007. Using semantic roles to improve question answering. In EMNLP.
David A Smith and Jason Eisner. 2009. Parser adaptation and projection with quasi-synchronous grammar features. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 822–831. Association for Computational Linguistics.
Benjamin Snyder and Regina Barzilay. 2008. Cross-lingual propagation for morphological analysis. In Proceedings of the 23rd national conference on Artificial intelligence.
Anders Søgaard. 2011. Data point selection for cross-language adaptation of dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, volume 2 of HLT ’11, pages 682–686, Stroudsburg, PA, USA. Association for Computational Linguistics.
Kathrin Spreyer and Anette Frank. 2008. Projection-based acquisition of a temporal labeller. Proceedings of IJCNLP 2008.
Oscar Täckström, Ryan McDonald, and Jakob Uszkoreit. 2012. Cross-lingual word clusters for direct transfer of linguistic structure. In Proc. of the Annual Meeting of the North American Association of Computational Linguistics (NAACL), pages 477–487, Montréal, Canada.
Cynthia A. Thompson, Roger Levy, and Christopher D. Manning. 2003. A generative model for semantic role labeling. In Proceedings of the 14th European Conference on Machine Learning, ECML 2003, pages 397–408, Dubrovnik, Croatia.
Ivan Titov and Alexandre Klementiev. 2012a. A Bayesian approach to unsupervised semantic role induction. In Proc. of European Chapter of the Association for Computational Linguistics (EACL).
Ivan Titov and Alexandre Klementiev. 2012b. Semi-supervised semantic role labeling: Approaching from an unsupervised perspective. In Proceedings of the International Conference on Computational Linguistics (COLING), Bombay, India, December.
Sara Tonelli and Emanuele Pianta. 2008. Frame information transfer from English to Italian. In Proceedings of LREC 2008.
Lonneke van der Plas, James Henderson, and Paola Merlo. 2009. Domain adaptation with artificial data for semantic parsing of speech. In Proc. 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 125–128, Boulder, Colorado.
Lonneke van der Plas, Paola Merlo, and James Henderson. 2011. Scaling up automatic cross-lingual semantic role annotation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, HLT ’11, pages 299–304, Stroudsburg, PA, USA. Association for Computational Linguistics.
Alina Wróblewska and Anette Frank. 2009. Cross-lingual projection of LFG F-structures: Building an F-structure bank for Polish. In Eighth International Workshop on Treebanks and Linguistic Theories, page 209.
Dekai Wu and Pascale Fung. 2009. Can semantic role labeling improve SMT? In Proceedings of 13th Annual Conference of the European Association for Machine Translation (EAMT 2009), Barcelona.
Chenhai Xi and Rebecca Hwa. 2005. A backoff model for bootstrapping resources for non-English languages. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 851–858, Stroudsburg, PA, USA.
David Yarowsky, Grace Ngai, and Richard Wicentowski. 2001. Inducing multilingual text analysis tools via robust projection across aligned corpora. In Proceedings of Human Language Technology Conference.
Daniel Zeman and Philip Resnik. 2008. Cross-language parser adaptation between related languages. In Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages, pages 35–42, Hyderabad, India, January. Asian Federation of Natural Language Processing.
Imed Zitouni and Radu Florian. 2008. Mention detection crossing the language barrier. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
5 0.77046663 156 acl-2013-Fast and Adaptive Online Training of Feature-Rich Translation Models
Author: Spence Green ; Sida Wang ; Daniel Cer ; Christopher D. Manning
Abstract: We present a fast and scalable online method for tuning statistical machine translation models with large feature sets. The standard tuning algorithm, MERT, only scales to tens of features. Recent discriminative algorithms that accommodate sparse features have produced smaller than expected translation quality gains in large systems. Our method, which is based on stochastic gradient descent with an adaptive learning rate, scales to millions of features and tuning sets with tens of thousands of sentences, while still converging after only a few epochs. Large-scale experiments on Arabic-English and Chinese-English show that our method produces significant translation quality gains by exploiting sparse features. Equally important is our analysis, which suggests techniques for mitigating overfitting and domain mismatch, and applies to other recent discriminative methods for machine translation.
6 0.76019204 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
7 0.75796968 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation
8 0.75719464 207 acl-2013-Joint Inference for Fine-grained Opinion Extraction
9 0.75316626 333 acl-2013-Summarization Through Submodularity and Dispersion
10 0.75314677 25 acl-2013-A Tightly-coupled Unsupervised Clustering and Bilingual Alignment Model for Transliteration
11 0.75265884 240 acl-2013-Microblogs as Parallel Corpora
12 0.75103533 288 acl-2013-Punctuation Prediction with Transition-based Parsing
13 0.75001585 267 acl-2013-PARMA: A Predicate Argument Aligner
14 0.7499463 196 acl-2013-Improving pairwise coreference models through feature space hierarchy learning
15 0.74952883 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
16 0.74866396 81 acl-2013-Co-Regression for Cross-Language Review Rating Prediction
17 0.74860537 137 acl-2013-Enlisting the Ghost: Modeling Empty Categories for Machine Translation
18 0.74830663 44 acl-2013-An Empirical Examination of Challenges in Chinese Parsing
19 0.74813902 193 acl-2013-Improving Chinese Word Segmentation on Micro-blog Using Rich Punctuations
20 0.74745649 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
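The abstract of entry 5 in the list above describes tuning with stochastic gradient descent and an adaptive learning rate over sparse features. As a rough illustration of that general idea, and not of the authors' actual system, the sketch below applies an AdaGrad-style per-feature step size to a sparse linear model with squared loss. The choice of AdaGrad, the loss function, and the names `adagrad_step`, `train`, `eta`, and `eps` are all assumptions made for this example.

```python
import math
from collections import defaultdict


def adagrad_step(weights, grad_sq_sum, gradient, eta=0.1, eps=1e-8):
    """Apply one AdaGrad-style update in place for a sparse gradient.

    `gradient` maps feature names to partial derivatives; features that
    are absent from it are left untouched.
    """
    for feature, g in gradient.items():
        if g == 0.0:
            continue
        # Accumulate the squared gradient seen so far for this feature.
        grad_sq_sum[feature] += g * g
        # The effective step size shrinks as a feature accumulates gradient mass,
        # so rarely seen (sparse) features keep a comparatively large step.
        weights[feature] -= eta * g / (math.sqrt(grad_sq_sum[feature]) + eps)


def train(examples, epochs=5, eta=0.5):
    """Fit a sparse linear model with squared loss (hypothetical toy objective)."""
    weights = defaultdict(float)
    grad_sq_sum = defaultdict(float)
    for _ in range(epochs):
        for features, target in examples:
            prediction = sum(weights[f] * v for f, v in features.items())
            error = prediction - target
            # Gradient of 0.5 * error**2 with respect to each active weight.
            gradient = {f: error * v for f, v in features.items()}
            adagrad_step(weights, grad_sq_sum, gradient, eta=eta)
    return weights
```

Calling `train([({"x": 1.0}, 1.0)], epochs=50)` on a single toy example drives the weight for `x` towards 1.0. A real tuning setup would replace the squared loss with a translation-quality objective, which this sketch does not attempt.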