acl acl2011 acl2011-92 knowledge-graph by maker-knowledge-mining

92 acl-2011-Data point selection for cross-language adaptation of dependency parsers


Source: pdf

Author: Anders Søgaard

Abstract: We consider a very simple, yet effective, approach to cross-language adaptation of dependency parsers. We first remove lexical items from the treebanks and map part-of-speech tags into a common tagset. We then train a language model on tag sequences in otherwise unlabeled target data and rank labeled source data by perplexity per word of tag sequences from less similar to most similar to the target. We then train our target language parser on the most similar data points in the source labeled data. The strategy achieves much better results than a non-adapted baseline and state-of-the-art unsupervised dependency parsing, and results are comparable to more complex projection-based cross-language adaptation algorithms.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Data point selection for cross-language adaptation of dependency parsers Anders Søgaard Center for Language Technology University of Copenhagen Njalsgade 142, DK-2300 Copenhagen S soegaard@hum.dk [sent-1, score-0.512]

2 Abstract We consider a very simple, yet effective, approach to cross-language adaptation of dependency parsers. [sent-3, score-0.415]

3 We first remove lexical items from the treebanks and map part-of-speech tags into a common tagset. [sent-4, score-0.355]
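
As a concrete picture of this preprocessing step, the sketch below delexicalizes CoNLL-X-style rows and maps tags into a shared set. This is a minimal sketch, not the authors' code: the `TAG_MAP` table is a hypothetical stand-in for the Interset-based mappings of Zeman and Resnik (2008), and the column layout assumes standard CoNLL-X fields.

```python
# Hedged sketch of delexicalization plus tag mapping. TAG_MAP is a toy
# placeholder; the paper relies on the Interset-based mappings of Zeman
# and Resnik (2008), which are far richer than this.
TAG_MAP = {"NN": "N", "NNS": "N", "VBD": "V", "IN": "P"}  # hypothetical

def delexicalize(sentence):
    """sentence: list of CoNLL-X field lists
    [id, form, lemma, cpostag, postag, feats, head, deprel, ...]."""
    out = []
    for fields in sentence:
        fields = list(fields)
        fields[1] = "_"                       # remove the word form
        fields[2] = "_"                       # remove the lemma
        common = TAG_MAP.get(fields[4], "X")  # map into the common tagset
        fields[3] = fields[4] = common
        out.append(fields)
    return out
```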

4 We then train a language model on tag sequences in otherwise unlabeled target data and rank labeled source data by perplexity per word of tag sequences from less similar to most similar to the target. [sent-5, score-0.568]

5 We then train our target language parser on the most similar data points in the source labeled data. [sent-6, score-0.4]
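
Put together, the selection strategy is score, sort, truncate. The sketch below assumes the `perplexity_per_word` function given in a later sketch and reads tags from CoNLL-X column 4; `keep=0.9` matches the paper's main setting.

```python
def pos_tags(conll_sentence):
    # Column 4 holds the (already mapped) POS tag; see the sketch above.
    return [fields[4] for fields in conll_sentence]

def select_data_points(source_sentences, target_lm, keep=0.9):
    """Rank labeled source sentences by perplexity per word of their POS
    sequence under the target-language tag model, and keep the most
    target-like fraction (lowest perplexity = most similar)."""
    ranked = sorted(
        source_sentences,
        key=lambda s: perplexity_per_word(pos_tags(s), target_lm),
    )
    return ranked[: int(len(ranked) * keep)]
```

The target parser is then trained on the returned subset; in the paper this is a second-order MST parser, but any treebank-trained parser slots in here.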

6 The strategy achieves much better results than a non-adapted baseline and state-of-the-art unsupervised dependency parsing, and results are comparable to more complex projection-based cross-language adaptation algorithms. [sent-7, score-0.518]

7 1 Introduction While unsupervised dependency parsing has seen rapid progress in recent years, results are still far from the results that can be achieved with supervised parsers and not yet good enough to solve real-world problems. [sent-8, score-0.32]

8 In this paper, we will be interested in an alternative strategy, namely cross-language adaptation of dependency parsers. [sent-9, score-0.375]

9 This is similar to, but more difficult than most domain adaptation or transfer learning scenarios, where differences between source and target distributions are smaller. [sent-11, score-0.445]

10 Most previous work in cross-language adaptation has used parallel corpora to project dependency structures across translations using word alignments (Smith and Eisner, 2009; Spreyer and Kuhn, 2009; Ganchev et al. [sent-12, score-0.414]

11 , 2009), but in this paper we show that similar results can be achieved by much simpler means. [sent-13, score-0.038]

12 Specifically, we build on the cross-language adaptation algorithm for closely related languages developed by Zeman and Resnik (2008) and extend it to much less related languages. [sent-14, score-0.302]

13 1 Related work Zeman and Resnik (2008) simply mapped part-of-speech tags of source and target language treebanks into a common tagset, delexicalized them (removed all words), trained a parser on the source language treebank and applied it to the target language. [sent-16, score-1.056]

14 The intuition is that, at least for relatively similar languages, features based on part-of-speech tags are enough to do reasonably well, and languages are relatively similar at this level of abstraction. [sent-17, score-0.304]

15 Of course annotations differ, but nouns are likely to be dependents of verbs, prepositions are likely to be dependents of nouns, and so on. [sent-18, score-0.122]

16 Specifically, Zeman and Resnik (2008) trained a constituent-based parser on the training section of the Danish treebank and evaluated it on sentences of up to 40 words in the test section of the Swedish treebank and obtained an F1-score of 66. [sent-19, score-0.291]

17 Danish and Swedish are of course very similar languages with almost identical syntax, so in a way this result is not very surprising. [sent-21, score-0.127]

18 In this paper, we present similar results (50-75%) on full length sentences for very different languages from different language families. [sent-22, score-0.127]

19 Since less related languages differ more in their syntax, we use data point selection to find syntactic [sent-23, score-0.177]

20 constructions in the source language that are likely to be similar to constructions in the target language. [sent-25, score-0.34]

21 Smith and Eisner (2009) think of cross-language adaptation as unsupervised projection using word-aligned parallel text to construct training material for the target language. [sent-26, score-0.509]

22 The authors report good results (65%-70%) for somewhat related languages, training on English and testing on German and Spanish, but they modified the annotation in the German data making the treatment of certain syntactic constructions more similar to the English annotations. [sent-28, score-0.187]

23 Spreyer and Kuhn (2009) use a similar approach to parse Dutch using labeled data from German and obtain good results, but again these are very similar languages. [sent-29, score-0.189]

24 , 2010), but also modified annotation considerably in order to do so. [sent-31, score-0.033]

25 (2009) report results of a similar approach for Bulgarian and Spanish; they report results with and without hand-written language-specific rules that complete the projected partial dependency trees. [sent-33, score-0.234]

26 (2009) without hand-written rules and two recent contributions to unsupervised dependency parsing, Gillenwater et al. [sent-35, score-0.209]

27 (2010) is a fully unsupervised extension of the approach described in Klein and Manning (2004), whereas Naseem et al. [sent-39, score-0.047]

28 2 Data We use four treebanks from the CoNLL 2006 Shared Task with standard splits. [sent-41, score-0.272]

29 We use the tagset mappings also used by Zeman and Resnik (2008) to obtain a common tagset. [sent-42, score-0.168]

30 cz/user:zeman:interset. We use the first letter in the common tag as coarse-grained part-of-speech, and the first three as fine-grained part-of-speech. [sent-48, score-0.036]
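
The coarse/fine split described in this footnote is literally a prefix operation on the common tag string; a two-line sketch (the example tag is made up):

```python
def split_tag(common_tag):
    # First character = coarse-grained POS, first three = fine-grained POS.
    return common_tag[0], common_tag[:3]

# split_tag("NC2S")  ->  ("N", "NC2")   # hypothetical tag string
```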

31 We only use four of these treebanks, since Bulgarian and Czech as well as Danish and Swedish are very similar languages. [sent-50, score-0.038]

32 The four treebanks used in our experiments are thus those for Arabic, Bulgarian, Danish and Portuguese. [sent-51, score-0.272]

33 Arabic is a Semitic VSO language with relatively free word order and rich morphology. [sent-52, score-0.082]

34 Bulgarian is a Slavic language with relatively free word order and rich morphology. [sent-53, score-0.082]

35 Danish is a Germanic V2 language with relatively poor morphology. [sent-54, score-0.046]

36 Finally, Portuguese is a Romance language with relatively free word order and rich morphology. [sent-55, score-0.082]

37 In sum, we consider four languages that are less related than the language pairs studied in earlier papers on cross-language adaptation of dependency parsers. [sent-56, score-0.464]

38 The idea is that we order the labeled source data from most similar to least similar to our target data, using perplexity per word as metric, and use only a portion of the source data that is similar to our target data. [sent-59, score-0.666]

39 In cross-language adaptation, the sample selection bias is primarily a bias in the marginal distribution P(x). [sent-60, score-0.146]

40 Consequently, each sentence should be weighted by Pt(x)/Ps(x), where Pt is the target distribution, and Ps the source distribution. [sent-62, score-0.194]

41 To see this, let x ∈ X in lowercase denote a specific value of the input variable, an unlabeled example. [sent-63, score-0.058]

42 y ∈ Y in lowercase denotes a class value, and ⟨x, y⟩ is a labeled example. [sent-64, score-0.14]

43 P(⟨x, y⟩) is the joint probability of the labeled example, and P̂(⟨x, y⟩) its empirical distribution. [sent-65, score-0.111]

44 We simplify this function further by assuming that Pt(x)/Ps(x) = [sent-67, score-0.141]
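
The weighting argument compressed in the garbled passage above is the standard covariate-shift identity of Shimodaira (2000), which the paper cites; the derivation below is a reconstruction under the stated assumption that the conditional label distribution is shared across domains, not a verbatim copy of the paper's equations.

```latex
% Assume covariate shift: P_t(y \mid x) = P_s(y \mid x). Then for any
% loss \ell, the target risk equals a reweighted source risk:
\mathbb{E}_{(x,y)\sim P_t}\!\left[\ell(x,y)\right]
  = \sum_{x,y} P_t(y \mid x)\, P_t(x)\, \ell(x,y)
  = \sum_{x,y} \frac{P_t(x)}{P_s(x)}\, P_s(y \mid x)\, P_s(x)\, \ell(x,y)
  = \mathbb{E}_{(x,y)\sim P_s}\!\left[\frac{P_t(x)}{P_s(x)}\, \ell(x,y)\right],
% which is why each source sentence should be weighted by P_t(x)/P_s(x).
```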

45 We use perplexity per word of the source language POS sequences relative to a model trained on target language POS sequences to guess whether Pt(x) is high or low. [sent-69, score-0.41]

46 The treebanks are first delexicalized and all features except part-of-speech tags removed. [sent-70, score-0.445]

47 The part-of-speech tags are mapped into a common tagset using the technique described in Zeman and Resnik (2008). [sent-71, score-0.234]

48 For our main results, which are presented in Figure 1, we use the remaining three treebanks as training material for each language. [sent-72, score-0.392]

49 The test section of the language in question is used for testing, while the POS sequences in the target training section are used for training the unsmoothed language model. [sent-73, score-0.294]

50 We use an unsmoothed trigram language model rather than a smoothed language model, since modified Kneser-Ney smoothing is not defined for sequences of part-of-speech tags. [sent-74, score-0.161]
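
A minimal version of the unsmoothed trigram model and the perplexity-per-word score might look as follows. The padding symbols and the treatment of target-unseen trigrams (scored as infinitely dissimilar) are assumptions; the paper does not spell these details out.

```python
import math
from collections import Counter

def train_trigram_lm(tag_sequences):
    """Unsmoothed (maximum-likelihood) trigram model over POS tags."""
    tri, bi = Counter(), Counter()
    for tags in tag_sequences:
        padded = ["<s>", "<s>"] + list(tags) + ["</s>"]  # padding is an assumption
        for i in range(2, len(padded)):
            tri[tuple(padded[i - 2 : i + 1])] += 1
            bi[tuple(padded[i - 2 : i])] += 1
    return tri, bi

def perplexity_per_word(tags, lm):
    tri, bi = lm
    padded = ["<s>", "<s>"] + list(tags) + ["</s>"]
    log_prob, n = 0.0, 0
    for i in range(2, len(padded)):
        count = tri[tuple(padded[i - 2 : i + 1])]
        if count == 0:              # unseen trigram, no smoothing:
            return float("inf")     # treat as maximally dissimilar (assumption)
        log_prob += math.log(count / bi[tuple(padded[i - 2 : i])])
        n += 1
    return math.exp(-log_prob / n)
```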

51 3 In our experiments we use a graph-based second-order non-projective dependency parser that induces models using MIRA (McDonald et al. [sent-75, score-0.246]

52 Our baseline is the accuracy of our dependency parser trained on three languages and evaluated on the fourth language, where treebanks have been delexicalized, and part-of-speech tags mapped into a common format. [sent-82, score-0.705]

53 We then present results using the 90% most similar data points and results where the amount of labeled data used is selected using 100 sentences sampled from the training data as held-out data. [sent-84, score-0.187]

54 It can be seen that using 90% of the labeled data seems to be a good strategy if using held-out data is not an option. [sent-85, score-0.169]
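
The held-out variant described above reduces to a one-dimensional sweep over cutoffs. In the sketch below, `train_and_score` is a hypothetical wrapper (train the parser on the subset, return attachment accuracy on the 100 held-out sentences); it is not part of the paper.

```python
def pick_cutoff(ranked_source, heldout):
    """Try cutoffs 10%, 20%, ..., 100% of the perplexity-ranked source
    data and keep whichever scores best on the held-out sentences."""
    best_frac, best_acc = 1.0, float("-inf")
    for frac in (i / 10 for i in range(1, 11)):
        subset = ranked_source[: int(len(ranked_source) * frac)]
        acc = train_and_score(subset, heldout)  # hypothetical wrapper
        if acc > best_acc:
            best_frac, best_acc = frac, acc
    return best_frac
```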

55 Since we consider the unsupervised scenario where no labeled data is available for the target language, we consider the results obtained using the 90% most similar sentences in the labeled data as our primary results. [sent-86, score-0.387]

56 That we obtain good results training on all three remaining treebanks for each language illustrates the robustness of our approach. [sent-87, score-0.379]

57 The results presented in Figure 2 are the best results obtained with varying amounts of source language data (10%, 20%, . [sent-89, score-0.124]

58 In all cases, we obtain results with training material from only one language that are slightly better than or as good as our main results, but differences are marginal. [sent-94, score-0.106]

59 We obtain the best results for Arabic when training on labeled data from the Bulgarian treebank, and the best results for Bulgarian when training on Portuguese only. [sent-95, score-0.144]

60 The best results for Danish were, somewhat surprisingly, obtained using the Arabic treebank (footnote 5), and the best results for Portuguese were obtained training only on Bulgarian data. [sent-96, score-0.099]

61 4 Error analysis Consider our analysis of the Arabic sentence in Figure 3, using the three remaining treebanks as source data. [sent-97, score-0.407]

62 First note that our dependency labels are all wrong; we did not map the dependency labels of the source and target treebanks into a common set of labels. [sent-98, score-0.826]

63 Our labels seem meaningful, but come (footnote 5: Arabic and Danish have in common that definiteness is expressed by inflectional morphology, though, and both languages frequently use VSO constructions.) [sent-100, score-0.155]

64 Figure 2: Best results obtained with different combinations of source and target languages. [sent-127, score-0.228]

65 ’pnct’ from the Danish treebank and ’PUNC’ from the Portuguese one. [sent-131, score-0.088]

66 For Bulgarian, error reductions are equally distributed. [sent-136, score-0.042]

67 Generally, it is interesting that cross-language adaptation and data point selection were less effective for Danish. [sent-137, score-0.301]

68 The Danish dependency treebank is annotated very differently from most other dependency treebanks; for example, the treebank adopts a DP-analysis of noun phrases. [sent-139, score-0.5]

69 Finally, we note that all languages benefit from removing the least similar 10% of the labeled source data, but results are less influenced by how much of the remaining data we use. [sent-140, score-0.344]

70–71 For example, for Bulgarian our baseline result using 100% of the source data is 44.5%, and the result obtained using 90% of the source data is 70. [sent-141, score-0.09] [sent-142, score-0.124]

72–73 Using held-out data, we only use 80% of the source data, which is slightly better (70.3%), but even if we only use 10% of the source data, our accuracy is still significantly better than the baseline (66. [sent-144, score-0.09] [sent-145, score-0.09]

74 5 Conclusions This paper presented a simple data point selection strategy for semi-supervised cross-language adaptation where no labeled target data is available. [sent-147, score-0.583]

75 Since our strategy is a parameter-free wrapper method, it can easily be applied to other dependency parsers and other problems in natural language processing, incl. [sent-149, score-0.267]

76 Corpus-based induction of syntactic structure: models of dependency and constituency. [sent-161, score-0.162]

77 Improving predictive inference under covariate shift by weighting the log-likelihood function. [sent-173, score-0.199]

78 Data-driven dependency parsing of new languages using incomplete and noisy training data. [sent-181, score-0.313]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('zeman', 0.421), ('danish', 0.321), ('bulgarian', 0.289), ('treebanks', 0.272), ('adaptation', 0.213), ('spreyer', 0.21), ('portuguese', 0.162), ('dependency', 0.162), ('resnik', 0.158), ('arabic', 0.158), ('delexicalized', 0.126), ('ganchev', 0.124), ('gillenwater', 0.112), ('ppst', 0.111), ('target', 0.104), ('covariate', 0.102), ('tagset', 0.102), ('source', 0.09), ('languages', 0.089), ('treebank', 0.088), ('kathrin', 0.084), ('naseem', 0.084), ('labeled', 0.082), ('perplexity', 0.082), ('swedish', 0.079), ('projection', 0.07), ('sequences', 0.067), ('shift', 0.065), ('xx', 0.064), ('dependents', 0.061), ('unsmoothed', 0.061), ('jonas', 0.061), ('vso', 0.058), ('kuhn', 0.058), ('lowercase', 0.058), ('strategy', 0.056), ('argm', 0.056), ('hx', 0.056), ('copenhagen', 0.056), ('selection', 0.054), ('constructions', 0.054), ('parser', 0.05), ('kuzman', 0.05), ('mapped', 0.049), ('parsers', 0.049), ('unsupervised', 0.047), ('tags', 0.047), ('bias', 0.046), ('relatively', 0.046), ('yi', 0.045), ('remaining', 0.045), ('material', 0.044), ('jennifer', 0.042), ('reductions', 0.042), ('cross', 0.04), ('german', 0.04), ('alignments', 0.039), ('similar', 0.038), ('smith', 0.038), ('ntst', 0.037), ('vrelid', 0.037), ('dlab', 0.037), ('hum', 0.037), ('inbe', 0.037), ('nca', 0.037), ('nfu', 0.037), ('nix', 0.037), ('oegaard', 0.037), ('punc', 0.037), ('tehlee', 0.037), ('tphaiisr', 0.037), ('common', 0.036), ('free', 0.036), ('points', 0.036), ('grammar', 0.035), ('klein', 0.035), ('eisner', 0.035), ('obtained', 0.034), ('reestimate', 0.034), ('secondorder', 0.034), ('languagespecific', 0.034), ('anders', 0.034), ('gaard', 0.034), ('graca', 0.034), ('hidetoshi', 0.034), ('njalsgade', 0.034), ('shimodaira', 0.034), ('point', 0.034), ('modified', 0.033), ('spanish', 0.032), ('weighting', 0.032), ('good', 0.031), ('pt', 0.031), ('fernando', 0.031), ('training', 0.031), ('parsing', 0.031), ('rre', 0.03), ('tain', 0.03), ('definiteness', 0.03), ('germanic', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 92 acl-2011-Data point selection for cross-language adaptation of dependency parsers

Author: Anders Søgaard

Abstract: We consider a very simple, yet effective, approach to cross-language adaptation of dependency parsers. We first remove lexical items from the treebanks and map part-of-speech tags into a common tagset. We then train a language model on tag sequences in otherwise unlabeled target data and rank labeled source data by perplexity per word of tag sequences from less similar to most similar to the target. We then train our target language parser on the most similar data points in the source labeled data. The strategy achieves much better results than a non-adapted baseline and state-of-the-art unsupervised dependency parsing, and results are comparable to more complex projection-based cross-language adaptation algorithms.

2 0.15633591 243 acl-2011-Partial Parsing from Bitext Projections

Author: Prashanth Mannem ; Aswarth Dara

Abstract: Recent work has shown how a parallel corpus can be leveraged to build syntactic parser for a target language by projecting automatic source parse onto the target sentence using word alignments. The projected target dependency parses are not always fully connected to be useful for training traditional dependency parsers. In this paper, we present a greedy non-directional parsing algorithm which doesn’t need a fully connected parse and can learn from partial parses by utilizing available structural and syntactic information in them. Our parser achieved statistically significant improvements over a baseline system that trains on only fully connected parses for Bulgarian, Spanish and Hindi. It also gave a significant improvement over previously reported results for Bulgarian and set a benchmark for Hindi.

3 0.14612441 48 acl-2011-Automatic Detection and Correction of Errors in Dependency Treebanks

Author: Alexander Volokh ; Gunter Neumann

Abstract: Annotated corpora are essential for almost all NLP applications. Whereas they are expected to be of a very high quality because of their importance for the followup developments, they still contain a considerable number of errors. With this work we want to draw attention to this fact. Additionally, we try to estimate the amount of errors and propose a method for their automatic correction. Whereas our approach is able to find only a portion of the errors that we suppose are contained in almost any annotated corpus due to the nature of the process of its creation, it has a very high precision, and thus is in any case beneficial for the quality of the corpus it is applied to. At last, we compare it to a different method for error detection in treebanks and find out that the errors that we are able to detect are mostly different and that our approaches are complementary. 1

4 0.14465041 179 acl-2011-Is Machine Translation Ripe for Cross-Lingual Sentiment Classification?

Author: Kevin Duh ; Akinori Fujino ; Masaaki Nagata

Abstract: Recent advances in Machine Translation (MT) have brought forth a new paradigm for building NLP applications in low-resource scenarios. To build a sentiment classifier for a language with no labeled resources, one can translate labeled data from another language, then train a classifier on the translated text. This can be viewed as a domain adaptation problem, where labeled translations and test data have some mismatch. Various prior work have achieved positive results using this approach. In this opinion piece, we take a step back and make some general statements about cross-lingual adaptation problems. First, we claim that domain mismatch is not caused by MT errors, and accuracy degradation will occur even in the case of perfect MT. Second, we argue that the cross-lingual adaptation problem is qualitatively different from other (monolingual) adaptation problems in NLP; thus new adaptation algorithms ought to be considered. This paper will describe a series of carefully-designed experiments that led us to these conclusions.

5 0.12790252 342 acl-2011-full-for-print

Author: Kuzman Ganchev

Abstract: unkown-abstract

6 0.12255412 323 acl-2011-Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

7 0.11769816 164 acl-2011-Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features

8 0.10703953 59 acl-2011-Better Automatic Treebank Conversion Using A Feature-Based Approach

9 0.10640965 7 acl-2011-A Corpus for Modeling Morpho-Syntactic Agreement in Arabic: Gender, Number and Rationality

10 0.10208489 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words

11 0.10200632 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

12 0.09827812 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation

13 0.096712127 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing

14 0.095301576 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages

15 0.09476795 39 acl-2011-An Ensemble Model that Combines Syntactic and Semantic Clustering for Discriminative Dependency Parsing

16 0.094264895 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features

17 0.093941361 329 acl-2011-Using Deep Morphology to Improve Automatic Error Detection in Arabic Handwriting Recognition

18 0.089158215 256 acl-2011-Query Weighting for Ranking Model Adaptation

19 0.085481033 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation

20 0.084718309 109 acl-2011-Effective Measures of Domain Similarity for Parsing


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.196), (1, -0.041), (2, 0.002), (3, -0.136), (4, -0.033), (5, -0.039), (6, 0.122), (7, 0.053), (8, 0.075), (9, 0.096), (10, 0.041), (11, -0.014), (12, -0.025), (13, -0.021), (14, 0.092), (15, -0.009), (16, -0.005), (17, 0.0), (18, -0.052), (19, -0.117), (20, -0.124), (21, -0.072), (22, 0.034), (23, 0.064), (24, 0.009), (25, 0.036), (26, -0.105), (27, -0.051), (28, 0.069), (29, -0.037), (30, 0.008), (31, 0.036), (32, 0.063), (33, -0.048), (34, 0.079), (35, -0.045), (36, -0.055), (37, 0.069), (38, -0.046), (39, -0.082), (40, -0.017), (41, 0.035), (42, -0.023), (43, -0.026), (44, -0.086), (45, -0.075), (46, -0.006), (47, 0.062), (48, -0.04), (49, -0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94023091 92 acl-2011-Data point selection for cross-language adaptation of dependency parsers

Author: Anders Søgaard

Abstract: We consider a very simple, yet effective, approach to cross-language adaptation of dependency parsers. We first remove lexical items from the treebanks and map part-of-speech tags into a common tagset. We then train a language model on tag sequences in otherwise unlabeled target data and rank labeled source data by perplexity per word of tag sequences from less similar to most similar to the target. We then train our target language parser on the most similar data points in the source labeled data. The strategy achieves much better results than a non-adapted baseline and state-of-the-art unsupervised dependency parsing, and results are comparable to more complex projection-based cross-language adaptation algorithms.

2 0.65826714 243 acl-2011-Partial Parsing from Bitext Projections

Author: Prashanth Mannem ; Aswarth Dara

Abstract: Recent work has shown how a parallel corpus can be leveraged to build syntactic parser for a target language by projecting automatic source parse onto the target sentence using word alignments. The projected target dependency parses are not always fully connected to be useful for training traditional dependency parsers. In this paper, we present a greedy non-directional parsing algorithm which doesn’t need a fully connected parse and can learn from partial parses by utilizing available structural and syntactic information in them. Our parser achieved statistically significant improvements over a baseline system that trains on only fully connected parses for Bulgarian, Spanish and Hindi. It also gave a significant improvement over previously reported results for Bulgarian and set a benchmark for Hindi.

3 0.63036132 230 acl-2011-Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation

Author: Roy Schwartz ; Omri Abend ; Roi Reichart ; Ari Rappoport

Abstract: Dependency parsing is a central NLP task. In this paper we show that the common evaluation for unsupervised dependency parsing is highly sensitive to problematic annotations. We show that for three leading unsupervised parsers (Klein and Manning, 2004; Cohen and Smith, 2009; Spitkovsky et al., 2010a), a small set of parameters can be found whose modification yields a significant improvement in standard evaluation measures. These parameters correspond to local cases where no linguistic consensus exists as to the proper gold annotation. Therefore, the standard evaluation does not provide a true indication of algorithm quality. We present a new measure, Neutral Edge Direction (NED), and show that it greatly reduces this undesired phenomenon.

4 0.62302405 342 acl-2011-full-for-print

Author: Kuzman Ganchev

Abstract: unkown-abstract

5 0.61841881 164 acl-2011-Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features

Author: Yuval Marton ; Nizar Habash ; Owen Rambow

Abstract: We explore the contribution of morphological features, both lexical and inflectional, to dependency parsing of Arabic, a morphologically rich language. Using controlled experiments, we find that definiteness, person, number, gender, and the undiacritized lemma are most helpful for parsing on automatically tagged input. We further contrast the contribution of form-based and functional features, and show that functional gender and number (e.g., “broken plurals”) and the related rationality feature improve over form-based features. It is the first time functional morphological features are used for Arabic NLP.

6 0.61717737 179 acl-2011-Is Machine Translation Ripe for Cross-Lingual Sentiment Classification?

7 0.61193627 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation

8 0.6029222 59 acl-2011-Better Automatic Treebank Conversion Using A Feature-Based Approach

9 0.60062236 323 acl-2011-Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

10 0.5855121 48 acl-2011-Automatic Detection and Correction of Errors in Dependency Treebanks

11 0.5843398 109 acl-2011-Effective Measures of Domain Similarity for Parsing

12 0.54396778 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing

13 0.53718054 7 acl-2011-A Corpus for Modeling Morpho-Syntactic Agreement in Arabic: Gender, Number and Rationality

14 0.53023744 329 acl-2011-Using Deep Morphology to Improve Automatic Error Detection in Arabic Handwriting Recognition

15 0.52908128 238 acl-2011-P11-2093 k2opt.pdf

16 0.52296001 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation

17 0.51039243 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words

18 0.50525761 289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic

19 0.50509721 278 acl-2011-Semi-supervised condensed nearest neighbor for part-of-speech tagging

20 0.4984948 295 acl-2011-Temporal Restricted Boltzmann Machines for Dependency Parsing


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.013), (17, 0.045), (25, 0.214), (26, 0.024), (37, 0.174), (39, 0.075), (41, 0.086), (53, 0.012), (55, 0.022), (59, 0.044), (72, 0.024), (77, 0.042), (91, 0.031), (96, 0.111)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.80898046 340 acl-2011-Word Alignment via Submodular Maximization over Matroids

Author: Hui Lin ; Jeff Bilmes

Abstract: We cast the word alignment problem as maximizing a submodular function under matroid constraints. Our framework is able to express complex interactions between alignment components while remaining computationally efficient, thanks to the power and generality of submodular functions. We show that submodularity naturally arises when modeling word fertility. Experiments on the English-French Hansards alignment task show that our approach achieves lower alignment error rates compared to conventional matching based approaches.

same-paper 2 0.80484951 92 acl-2011-Data point selection for cross-language adaptation of dependency parsers

Author: Anders Søgaard

Abstract: We consider a very simple, yet effective, approach to cross-language adaptation of dependency parsers. We first remove lexical items from the treebanks and map part-of-speech tags into a common tagset. We then train a language model on tag sequences in otherwise unlabeled target data and rank labeled source data by perplexity per word of tag sequences from less similar to most similar to the target. We then train our target language parser on the most similar data points in the source labeled data. The strategy achieves much better results than a non-adapted baseline and state-of-the-art unsupervised dependency parsing, and results are comparable to more complex projection-based cross-language adaptation algorithms.

3 0.76969004 262 acl-2011-Relation Guided Bootstrapping of Semantic Lexicons

Author: Tara McIntosh ; Lars Yencken ; James R. Curran ; Timothy Baldwin

Abstract: State-of-the-art bootstrapping systems rely on expert-crafted semantic constraints such as negative categories to reduce semantic drift. Unfortunately, their use introduces a substantial amount of supervised knowledge. We present the Relation Guided Bootstrapping (RGB) algorithm, which simultaneously extracts lexicons and open relationships to guide lexicon growth and reduce semantic drift. This removes the necessity for manually crafting category and relationship constraints, and manually generating negative categories.

4 0.71786302 122 acl-2011-Event Extraction as Dependency Parsing

Author: David McClosky ; Mihai Surdeanu ; Christopher Manning

Abstract: Nested event structures are a common occurrence in both open domain and domain specific extraction tasks, e.g., a “crime” event can cause an “investigation” event, which can lead to an “arrest” event. However, most current approaches address event extraction with highly local models that extract each event and argument independently. We propose a simple approach for the extraction of such structures by taking the tree of event-argument relations and using it directly as the representation in a reranking dependency parser. This provides a simple framework that captures global properties of both nested and flat event structures. We explore a rich feature space that models both the events to be parsed and context from the original supporting text. Our approach obtains competitive results in the extraction of biomedical events from the BioNLP’09 shared task with an F1 score of 53.5% in development and 48.6% in testing.

5 0.71312153 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars

Author: Mark-Jan Nederhof ; Giorgio Satta

Abstract: We present a method for the computation of prefix probabilities for synchronous contextfree grammars. Our framework is fairly general and relies on the combination of a simple, novel grammar transformation and standard techniques to bring grammars into normal forms.

6 0.70683455 186 acl-2011-Joint Training of Dependency Parsing Filters through Latent Support Vector Machines

7 0.70556319 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification

8 0.70424128 334 acl-2011-Which Noun Phrases Denote Which Concepts?

9 0.70308506 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation

10 0.70026332 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction

11 0.70011711 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features

12 0.70007408 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation

13 0.69973177 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification

14 0.69933462 204 acl-2011-Learning Word Vectors for Sentiment Analysis

15 0.69816786 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing

16 0.69712281 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora

17 0.69674659 256 acl-2011-Query Weighting for Ranking Model Adaptation

18 0.69372797 230 acl-2011-Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation

19 0.69346976 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation

20 0.69313926 85 acl-2011-Coreference Resolution with World Knowledge