acl acl2012 acl2012-172 knowledge-graph by maker-knowledge-mining

172 acl-2012-Selective Sharing for Multilingual Dependency Parsing


Source: pdf

Author: Tahira Naseem ; Regina Barzilay ; Amir Globerson

Abstract: We present a novel algorithm for multilingual dependency parsing that uses annotations from a diverse set of source languages to parse a new unannotated language. Our motivation is to broaden the advantages of multilingual learning to languages that exhibit significant differences from existing resource-rich languages. The algorithm learns which aspects of the source languages are relevant for the target language and ties model parameters accordingly. The model factorizes the process of generating a dependency tree into two steps: selection of syntactic dependents and their ordering. Being largely languageuniversal, the selection component is learned in a supervised fashion from all the training languages. In contrast, the ordering decisions are only influenced by languages with similar properties. We systematically model this cross-lingual sharing using typological features. In our experiments, the model consistently outperforms a state-of-the-art multilingual parser. The largest improvement is achieved on the non Indo-European languages yielding a gain of 14.4%.1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We present a novel algorithm for multilingual dependency parsing that uses annotations from a diverse set of source languages to parse a new unannotated language. [sent-3, score-0.7]

2 Our motivation is to broaden the advantages of multilingual learning to languages that exhibit significant differences from existing resource-rich languages. [sent-4, score-0.441]

3 The algorithm learns which aspects of the source languages are relevant for the target language and ties model parameters accordingly. [sent-5, score-0.529]

4 The model factorizes the process of generating a dependency tree into two steps: selection of syntactic dependents and their ordering. [sent-6, score-0.678]

5 Being largely languageuniversal, the selection component is learned in a supervised fashion from all the training languages. [sent-7, score-0.291]

6 In contrast, the ordering decisions are only influenced by languages with similar properties. [sent-8, score-0.47]

7 We systematically model this cross-lingual sharing using typological features. [sent-9, score-0.684]

8 The largest improvement is achieved on the non Indo-European languages yielding a gain of 14.4%. [sent-11, score-0.276]

9 Standard approaches for extending these techniques to resource-lean languages either use parallel corpora or rely on ... 1 The source code for the work presented in this paper is available at http://groups. [sent-14, score-0.344]

10 Unfortunately, for many languages there are no available parallel corpora or annotated resources in related languages. [sent-24, score-0.269]

11 For such languages the only remaining option is to resort to unsupervised approaches, which are known to produce highly inaccurate results. [sent-25, score-0.261]

12 In contrast to previous approaches, this algorithm can learn dependency structures using annotations from a diverse set of source languages, even if this set is not related to the target language. [sent-27, score-0.425]

13 In our selective sharing approach, the algorithm learns which aspects of the source languages are relevant for the target language and ties model parameters accordingly. [sent-28, score-0.849]

14 However, the order of these dependents with respect to the parent is influenced by the typological features of each language. [sent-32, score-0.97]

15 To implement this intuition, we factorize generation of a dependency tree into two processes: selection of syntactic dependents and their ordering. [sent-33, score-0.632]

16 The first component models the distribution of dependents for each part-of-speech tag, abstracting over their order. [sent-34, score-0.439]

17 The ordering of dependents varies greatly across languages and therefore should only be influenced by languages with similar properties. [sent-38, score-1.093]

18 To systematically model this cross-lingual sharing, we rely on typological features that reflect ordering preferences of a given language. [sent-42, score-0.775]

19 In addition to the known typological features, our parsing model embeds latent features that can capture cross-lingual structural similarities. [sent-43, score-0.704]

20 While the approach described so far supports a seamless transfer of shared information, it does not account for syntactic properties of the target language unseen in the training languages. [sent-44, score-0.417]

21 To handle such cases, our approach augments cross-lingual sharing with unsupervised learning on the target languages. [sent-46, score-0.311]

22 We evaluated our selective sharing model on 17 languages from 10 language families. [sent-47, score-0.594]

23 On this diverse set, our model consistently outperforms stateof-the-art multilingual dependency parsers. [sent-48, score-0.341]

24 We also demonstrate that in the absence of observed typological information, a set of automatically induced latent features can effectively work as a proxy for typology. [sent-53, score-0.601]

25 However, recent work in multilingual parsing has demonstrated the feasibility of transfer in the absence of parallel data. [sent-57, score-0.377]

26 The challenge, however, is to enable dependency transfer for target languages that exhibit structural differences from source languages. [sent-65, score-0.833]

27 In such cases, the extent of multilingual transfer is determined by the relation between source and target languages. [sent-66, score-0.476]

28 (2011) do not use a predefined linguistic hierarchy of language relations, but instead learn the contribution of source languages to the training mixture based on the likelihood of the target language. [sent-69, score-0.541]

29 While all of the above techniques demonstrate gains from modeling language relatedness, they still underperform when the source and target languages are unrelated. [sent-72, score-0.459]

30 Our model differs from the above approaches in its emphasis on the selective information sharing driven by language relatedness. [sent-73, score-0.366]

31 As our evaluation demonstrates, this layered approach broadens the advantages of multilingual learning to languages that exhibit significant differences from the languages in the training mix. [sent-75, score-0.669]

32 3 Linguistic Motivation Language-Independent Dependency Properties Despite significant syntactic differences, human languages exhibit striking similarity in dependency patterns. [sent-76, score-0.486]

33 For a given part-of-speech tag, the set of tags that can occur as its dependents is largely consistent across languages. [sent-77, score-0.432]

34 For instance, adverbs and nouns are likely to be dependents of verbs, while adjectives are not. [sent-78, score-0.358]

35 Shared Dependency Properties Unlike dependent selection, the ordering of dependents in a sentence differs greatly across languages. [sent-80, score-0.635]

36 In fact, crosslingual syntactic variations are primarily expressed in different ordering of dependents (Harris, 1968; Greenberg, 1963). [sent-81, score-0.642]

37 Moreover, a language may be close to different languages for different dependency types. [sent-85, score-0.368]

38 Therefore, we seek a model that can express parameter sharing at the level of dependency types and can benefit from known language relations. [sent-87, score-0.342]

39 This is particularly true given a limited number of supervised source languages; it is quite likely that a target language will have previously unseen syntactic phenomena. [sent-89, score-0.293]

40 4 Model We propose a probabilistic model for generating dependency trees that facilitates parameter sharing across languages. [sent-91, score-0.414]

41 We assume a setup where dependency tree annotations are available for a set of source languages and we want to use these annotations to infer a parser for a target language. [sent-92, score-0.548]

42 We also assume that both source and target languages are annotated with a coarse parts-of-speech tagset which is shared across languages. [sent-94, score-0.593]
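
Since cross-lingual sharing rests on this shared coarse tagset, the mapping involved can be pictured as a simple lookup from language-specific tags to coarse categories. The sketch below is illustrative only; the specific languages, fine-grained tags, and coarse labels are assumptions, not the paper's actual tag map.

```python
# Illustrative mapping from language-specific fine-grained tags to a shared
# coarse tagset; the entries below are made up for the example and are not
# the paper's actual tag map.
COARSE_TAG_MAP = {
    ("english", "NNS"):  "NOUN",
    ("english", "VBD"):  "VERB",
    ("german",  "ADJA"): "ADJ",
    ("turkish", "Noun"): "NOUN",
}

def to_coarse(language, fine_tag):
    """Look up the shared coarse tag for a language-specific fine-grained tag."""
    return COARSE_TAG_MAP.get((language.lower(), fine_tag), "X")  # X = unknown
```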

43 The key feature of our model is a two-tier approach that separates the selection of dependents from their ordering. [sent-98, score-0.49]

44 As mentioned in Section 3, the ordering of dependents is largely determined by the typological features of the language. [sent-106, score-1.124]

45 We also experiment with a variant of our model where typological features are not observed. [sent-108, score-0.608]

46 Instead, the model captures structural variations across languages by means of a small set of binary latent features. [sent-109, score-0.419]

47 1 Generative Process Our model generates dependency trees one fragment at a time. [sent-116, score-0.304]

48 A fragment is defined as a subtree comprising the immediate dependents of any node in the tree. [sent-117, score-0.441]

49 A fragment with head node h is generated in language l via the following stages (Figure 1: The steps of the generative process for a fragment with head h). [sent-121, score-0.292]

50 In step (a), the unordered set of dependents is chosen. [sent-122, score-0.477]

51 • Generate the set of dependents of h via a distribution Psel(S|h). [sent-125, score-0.358]

52 This results in two unordered sets SR and SL, the right and left dependents of h. [sent-132, score-0.397]

53 This part does depend on the language l, since the relative ordering of dependents is not likely to be universal. [sent-133, score-0.561]

54 The first step constitutes the selection component and the last two steps constitute the ordering component. [sent-139, score-0.409]

55 Given this generation scheme, the probability P(D) of generating a given fragment D with head h will be: P(D) = Psel({D}|h) · ∏_{a∈D} Pord(dD(a)|a,h,l) · 1/(n(DR) · n(DL)) (1), where we use the following notation: DR and DL denote the parts of the fragment that are to the right and left of h, respectively. [sent-140, score-0.268]

56 3We acknowledge that assuming a uniform distribution over the permutations of the right and left dependents is linguistically counterintuitive. [sent-145, score-0.443]
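
Equation (1) above is compact enough to sketch in code. The following is a minimal illustration, not the authors' implementation: p_sel and p_ord are assumed callables standing in for Psel and Pord, and the factorial terms play the role of the uniform-permutation factor 1/(n(DR) n(DL)), using the |S|! approximation the paper mentions later.

```python
import math

def fragment_log_prob(head, deps, p_sel, p_ord, lang):
    """Score one fragment per Eq. (1).

    head  -- coarse POS tag of the head node
    deps  -- list of (tag, side) pairs, side in {"L", "R"}
    p_sel -- assumed callable (multiset, head) -> probability, standing in for Psel
    p_ord -- assumed callable (side, tag, head, lang) -> probability, standing in for Pord
    """
    multiset = tuple(sorted(tag for tag, _ in deps))
    logp = math.log(p_sel(multiset, head))               # selection component
    for tag, side in deps:                               # ordering component
        logp += math.log(p_ord(side, tag, head, lang))
    n_left = sum(1 for _, s in deps if s == "L")
    n_right = sum(1 for _, s in deps if s == "R")
    # uniform probability over orderings within each side; factorials stand in
    # for n(DL) and n(DR), the |S|! approximation discussed later in the paper
    logp -= math.log(math.factorial(n_left)) + math.log(math.factorial(n_right))
    return logp
```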

57 1 Selection Component The selection component draws an unordered set of tags S given the head tag h. [sent-155, score-0.408]

58 First the number of dependents n is drawn from a distribution: Psize(n|h) = θsize(n|h) (2) where θsize(n| h) is a parameter for each value of n and h. [sent-157, score-0.358]

59 We restrict the maximum value of n to four, since this is a reasonable bound on the total number of dependents for a single parent node in a tree. [sent-158, score-0.405]
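
Put together, the selection component amounts to a two-stage draw: a size n from Eq. (2), capped at four, then the dependent tags themselves. The sketch below is only a stand-in: the paper's actual set distribution is not legible in this summary, so independent per-tag draws are used in its place, and all parameter names are hypothetical.

```python
import random

def sample_dependent_set(head, theta_size, theta_tag, max_n=4):
    """Draw an unordered multiset of dependent tags for a head tag.

    theta_size[head] -- dict {n: probability} for n = 0..max_n, as in Eq. (2)
    theta_tag[head]  -- dict {dependent tag: probability}; a placeholder for
                        the set distribution, which is not shown in this summary
    """
    sizes, size_probs = zip(*theta_size[head].items())
    n = random.choices(sizes, weights=size_probs, k=1)[0]
    n = min(n, max_n)                        # the paper bounds n at four
    tags, tag_probs = zip(*theta_tag[head].items())
    return sorted(random.choices(tags, weights=tag_probs, k=n))
```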

60 Table 1: The set of typological features that we use in our model. [sent-164, score-0.526]

61 2 Ordering Component The ordering component consists of distributions Pord(d|a, h, l) that determine whether tag a will be mapped to the left or right of the head tag h. [sent-169, score-0.441]

62 We model it using the following log-linear model: Pord(d|a,h,l) = (1/Zord(a,h,l)) · exp(w_ord · g(d,a,h,vl)), where Zord(a,h,l) = Σ_{d∈{R,L}} exp(w_ord · g(d,a,h,vl)). Note that in the above equations the ordering component depends on the known typological features vl. [sent-170, score-0.856]

63 In the setup when typological features are not known, vl is replaced with the latent ordering feature set bl. [sent-171, score-0.804]
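
In code, each ordering decision is a two-way softmax over left/right placement, as in the log-linear equations above. In the sketch below, feature_fn and weights are assumed stand-ins for the feature function g(d, a, h, vl) and the weight vector w_ord; when typology is unobserved, the latent feature set bl would be passed in place of vl.

```python
import math

def ordering_prob(side, tag, head, v_l, weights, feature_fn):
    """Two-way log-linear model for Pord(side | tag, head, language).

    side       -- "R" or "L"
    v_l        -- the language's typological feature vector (or latent b_l)
    weights    -- assumed stand-in for the weight vector w_ord
    feature_fn -- assumed stand-in for g(d, a, h, v_l); returns a list of
                  feature values for a candidate side d
    """
    scores = {d: sum(w * f for w, f in zip(weights, feature_fn(d, tag, head, v_l)))
              for d in ("R", "L")}
    z = sum(math.exp(s) for s in scores.values())   # Zord(a, h, l)
    return math.exp(scores[side]) / z
```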

64 2 Typological Features The typological features we use are a subset of order-related typological features from “The World Atlas of Language Structure” (Haspelmath et al. [sent-176, score-1.052]

65 We include only those features whose values are available for all the languages in our dataset. [sent-178, score-0.272]
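
Concretely, each language contributes a small vector of order-related typological values that the feature function g can consume. The WALS-style feature names, values, and languages below are illustrative assumptions, not the contents of the paper's Table 1.

```python
# Hypothetical order-related typological feature values in the WALS style;
# both the languages and the values are made up for illustration.
TYPOLOGY = {
    "lang_A": {"order_of_subject_object_verb": "SOV",
               "order_of_adposition_and_noun": "Postposition",
               "order_of_adjective_and_noun": "Adjective-Noun"},
    "lang_B": {"order_of_subject_object_verb": "SVO",
               "order_of_adposition_and_noun": "Preposition",
               "order_of_adjective_and_noun": "Noun-Adjective"},
}

POSSIBLE_VALUES = ["SOV", "SVO", "Postposition", "Preposition",
                   "Adjective-Noun", "Noun-Adjective"]

def typology_vector(lang):
    """One-hot encode a language's typological values so the ordering model's
    feature function g can combine them with the (d, a, h) information."""
    values = set(TYPOLOGY[lang].values())
    return [1.0 if v in values else 0.0 for v in POSSIBLE_VALUES]
```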

66 The derivations include the choice of the unordered sets' sizes n, the unordered sets themselves S, their left/right allocations, and the orderings within the left and right branches. [sent-199, score-0.277]

67 Although it is possible to run this exact algorithm in our case, where the number of dependents is limited to 4, we use an approximation that works well in practice: instead of 1/n(Sr) we use 1/|Sr|! [sent-218, score-0.358]
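
Reading n(Sr) as the number of distinct orderings of the multiset Sr (consistent with the uniform-permutation assumption in footnote 3), the approximation simply replaces that count with |Sr|!. A small worked example with hypothetical tags shows the difference.

```python
from math import factorial
from collections import Counter

def distinct_orderings(side_tags):
    """Exact number of distinct orderings of a multiset of dependent tags:
    |S|! divided by the factorial of each tag's multiplicity."""
    n = factorial(len(side_tags))
    for count in Counter(side_tags).values():
        n //= factorial(count)
    return n

right_side = ["NOUN", "NOUN", "ADJ"]      # hypothetical right-side dependents
print(distinct_orderings(right_side))      # 3 distinct orderings
print(factorial(len(right_side)))          # 6 = |S|!, the approximation used
```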

68 4This corresponds to the case when typological features are not known. [sent-222, score-0.526]

69 When the model involves latent typological variables, the initialization of these variables can impact the final performance. [sent-240, score-0.603]

70 As a selection criterion for initialization, we consider the performance of the final model averaged over the supervised source languages. [sent-241, score-0.255]

71 Likewise, the threshold value b for the PR constraint on the dependency length is tuned on the source languages, using average test set accuracy as the selection criterion. [sent-243, score-0.301]

72 Baselines We compare against the state-of-the-art multilingual dependency parsers that do not use parallel corpora for training. [sent-244, score-0.293]

73 The first baseline, Transfer, uses direct transfer of a discriminative parser trained on all the source languages (McDonald et al. [sent-246, score-0.537]

74 In the second baseline (Mixture), parameters of the target language are estimated as a weighted mixture of the parameters learned from annotated source languages (Cohen et al. [sent-249, score-0.619]
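
The Mixture baseline reduces to a convex combination of per-source parameter vectors. The sketch below shows only that combination step, with made-up parameter names; how the weights are estimated (e.g., from target-language likelihood, as sentence 28 notes) is not shown.

```python
def mixture_parameters(source_params, weights):
    """Form target-language parameters as a weighted combination of the
    per-source parameter dictionaries. The weights would be chosen to fit
    the unannotated target language; that estimation step is omitted here."""
    target = {}
    for lang, params in source_params.items():
        for key, value in params.items():
            target[key] = target.get(key, 0.0) + weights[lang] * value
    return target

# Example with made-up parameter names and weights:
sources = {"english": {"stop_VERB_R": 0.4}, "italian": {"stop_VERB_R": 0.6}}
print(mixture_parameters(sources, {"english": 0.7, "italian": 0.3}))
```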

75 The underlying parsing model is the dependency model with valence (DMV) (Klein and Manning, 2004). [sent-251, score-0.289]

76 Originally, the baseline methods were evaluated on different sets of languages using a different tag mapping. [sent-252, score-0.287]

77 For the Transfer baseline, for each target language we trained the model on all other languages in our dataset. [sent-254, score-0.43]

78 For the Mixture baseline, we trained the model on the same four languages used in the original paper: English, German, Czech and Italian. [sent-255, score-0.308]

79 Comparison against Baselines On average, the selective sharing model outperforms both baselines, yielding 8. [sent-258, score-0.366]

80 Our model outperforms the weighted mixture model on 15 of the 17 languages and the transfer method on 12 of the 17 languages. [sent-263, score-0.565]

81 The average accuracy of our supervised model on these languages is 66. [sent-274, score-0.322]

82 Since Indo-European languages are overrepresented in our dataset, a target language from this family is likely to exhibit more similarity to the training data. [sent-277, score-0.42]

83 A similar trait can be seen by comparing the performance of our model to an oracle version of our model which selects the optimal source language for a given target language (column 7). [sent-279, score-0.289]

84 However, the gain for non Indo-European languages is 1. [sent-281, score-0.276]

85 We compare the performance of our model (column 6) against a variant (column 8) where this component is trained from annotations on the target language. [sent-285, score-0.364]

86 To assess the contribution of other layers of selective sharing, we first explore the role of typological features in learning the ordering component. [sent-288, score-0.893]

87 When the model does not have access to observed typological features, and does not use latent ones (column 4), the accuracy drops by 2. [sent-289, score-0.603]

88 Latent typological features (column 5) do not yield the same gain as observed ones, but they do improve the performance of the typology-free model by 1. [sent-294, score-0.62]

89 When the model has to make all the ordering decisions based on meta-linguistic features without accounting for unique properties of the target languages, the performance decreases by 0. [sent-297, score-0.454]

90 To assess the relative difficulty of learning the ordering and selection components, we consider model variants where each of these components is ... 6In this setup, the ordering component is trained in an unsupervised fashion on the target language. [sent-299, score-0.814]

91 Table 2: Directed dependency accuracy of different variants of our selective sharing model and the baselines. [sent-304, score-0.506]

92 ... indicates the use of observed typological features for all languages and ... indicates the use of latent typological features for all languages. [sent-310, score-1.236]

93 (Best Pair) Model parameters are borrowed from the best source language based on the accuracy on the target language b. [sent-312, score-0.255]

94 (MLE) All model parameters are trained on the target language in a supervised fashion. [sent-319, score-0.308]

95 This finding is expected given that ordering involves selective sharing from multiple languages. [sent-327, score-0.523]

96 Overall, the performance gap between the selective sharing model and its monolingual supervised counterpart is 7. [sent-328, score-0.455]

97 8 Conclusions We present a novel algorithm for multilingual dependency parsing that uses annotations from a diverse set of source languages to parse a new unannotated language. [sent-333, score-0.7]

98 Overall, our model consistently outperforms the multi-source transfer based dependency parser of McDonald et al. [sent-334, score-0.386]

99 Our experiments demonstrate that the model is particularly effective in processing languages that exhibit significant differences from the training languages. [sent-336, score-0.375]

100 Two languages are better than one (for syntactic parsing). [sent-349, score-0.276]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('typological', 0.482), ('dependents', 0.358), ('languages', 0.228), ('ordering', 0.203), ('transfer', 0.167), ('selective', 0.164), ('sharing', 0.156), ('dependency', 0.14), ('target', 0.122), ('naseem', 0.119), ('unordered', 0.119), ('multilingual', 0.112), ('mcdonald', 0.092), ('posterior', 0.086), ('selection', 0.086), ('sel', 0.085), ('fragment', 0.083), ('component', 0.081), ('zeman', 0.08), ('mixture', 0.078), ('column', 0.076), ('latent', 0.075), ('source', 0.075), ('gaard', 0.073), ('gra', 0.073), ('exhibit', 0.07), ('haspelmath', 0.069), ('pord', 0.069), ('psel', 0.069), ('tahira', 0.068), ('sr', 0.064), ('dl', 0.064), ('head', 0.063), ('cohen', 0.061), ('universals', 0.06), ('tag', 0.059), ('resnik', 0.059), ('tagset', 0.059), ('parameters', 0.058), ('parsing', 0.057), ('universal', 0.053), ('marginalize', 0.051), ('si', 0.049), ('gain', 0.048), ('mle', 0.048), ('supervised', 0.048), ('syntactic', 0.048), ('parent', 0.047), ('permutations', 0.046), ('bli', 0.046), ('comrie', 0.046), ('epsi', 0.046), ('eword', 0.046), ('pset', 0.046), ('psize', 0.046), ('model', 0.046), ('annotations', 0.045), ('features', 0.044), ('regina', 0.044), ('conll', 0.043), ('diverse', 0.043), ('parallel', 0.041), ('shared', 0.041), ('monolingual', 0.041), ('hwa', 0.041), ('phylogenetic', 0.04), ('atlas', 0.04), ('bernard', 0.04), ('csail', 0.04), ('zord', 0.04), ('constitutes', 0.039), ('influenced', 0.039), ('fashion', 0.039), ('properties', 0.039), ('right', 0.039), ('likelihood', 0.038), ('ryan', 0.037), ('pr', 0.037), ('largely', 0.037), ('dependent', 0.037), ('across', 0.037), ('dmv', 0.037), ('portuguese', 0.037), ('greenberg', 0.037), ('variant', 0.036), ('trees', 0.035), ('wt', 0.035), ('baselines', 0.035), ('gains', 0.034), ('trained', 0.034), ('parser', 0.033), ('variations', 0.033), ('unsupervised', 0.033), ('buchholz', 0.032), ('families', 0.032), ('ith', 0.032), ('smith', 0.031), ('mit', 0.031), ('coarse', 0.031), ('differences', 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing

Author: Tahira Naseem ; Regina Barzilay ; Amir Globerson

Abstract: We present a novel algorithm for multilingual dependency parsing that uses annotations from a diverse set of source languages to parse a new unannotated language. Our motivation is to broaden the advantages of multilingual learning to languages that exhibit significant differences from existing resource-rich languages. The algorithm learns which aspects of the source languages are relevant for the target language and ties model parameters accordingly. The model factorizes the process of generating a dependency tree into two steps: selection of syntactic dependents and their ordering. Being largely languageuniversal, the selection component is learned in a supervised fashion from all the training languages. In contrast, the ordering decisions are only influenced by languages with similar properties. We systematically model this cross-lingual sharing using typological features. In our experiments, the model consistently outperforms a state-of-the-art multilingual parser. The largest improvement is achieved on the non Indo-European languages yielding a gain of 14.4%.1

2 0.139972 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models

Author: Wenliang Chen ; Min Zhang ; Haizhou Li

Abstract: Most previous graph-based parsing models increase decoding complexity when they use high-order features due to exact-inference decoding. In this paper, we present an approach to enriching high-orderfeature representations for graph-based dependency parsing models using a dependency language model and beam search. The dependency language model is built on a large-amount of additional autoparsed data that is processed by a baseline parser. Based on the dependency language model, we represent a set of features for the parsing model. Finally, the features are efficiently integrated into the parsing model during decoding using beam search. Our approach has two advantages. Firstly we utilize rich high-order features defined over a view of large scope and additional large raw corpus. Secondly our approach does not increase the decoding complexity. We evaluate the proposed approach on English and Chinese data. The experimental results show that our new parser achieves the best accuracy on the Chinese data and comparable accuracy with the best known systems on the English data.

3 0.12214352 4 acl-2012-A Comparative Study of Target Dependency Structures for Statistical Machine Translation

Author: Xianchao Wu ; Katsuhito Sudoh ; Kevin Duh ; Hajime Tsukada ; Masaaki Nagata

Abstract: This paper presents a comparative study of target dependency structures yielded by several state-of-the-art linguistic parsers. Our approach is to measure the impact of these nonisomorphic dependency structures to be used for string-to-dependency translation. Besides using traditional dependency parsers, we also use the dependency structures transformed from PCFG trees and predicate-argument structures (PASs) which are generated by an HPSG parser and a CCG parser. The experiments on Chinese-to-English translation show that the HPSG parser’s PASs achieved the best dependency and translation accuracies. 1

4 0.11319434 64 acl-2012-Crosslingual Induction of Semantic Roles

Author: Ivan Titov ; Alexandre Klementiev

Abstract: We argue that multilingual parallel data provides a valuable source of indirect supervision for induction of shallow semantic representations. Specifically, we consider unsupervised induction of semantic roles from sentences annotated with automatically-predicted syntactic dependency representations and use a stateof-the-art generative Bayesian non-parametric model. At inference time, instead of only seeking the model which explains the monolingual data available for each language, we regularize the objective by introducing a soft constraint penalizing for disagreement in argument labeling on aligned sentences. We propose a simple approximate learning algorithm for our set-up which results in efficient inference. When applied to German-English parallel data, our method obtains a substantial improvement over a model trained without using the agreement signal, when both are tested on non-parallel sentences.

5 0.1073867 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction

Author: Katsuhiko Hayashi ; Taro Watanabe ; Masayuki Asahara ; Yuji Matsumoto

Abstract: This paper presents a novel top-down headdriven parsing algorithm for data-driven projective dependency analysis. This algorithm handles global structures, such as clause and coordination, better than shift-reduce or other bottom-up algorithms. Experiments on the English Penn Treebank data and the Chinese CoNLL-06 data show that the proposed algorithm achieves comparable results with other data-driven dependency parsing algorithms.

6 0.10669561 109 acl-2012-Higher-order Constituent Parsing and Parser Combination

7 0.10636681 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies

8 0.097624011 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence

9 0.096005812 87 acl-2012-Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars

10 0.095823698 194 acl-2012-Text Segmentation by Language Using Minimum Description Length

11 0.093730882 95 acl-2012-Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining

12 0.093617544 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures

13 0.092819437 143 acl-2012-Mixing Multiple Translation Models in Statistical Machine Translation

14 0.091815941 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia

15 0.091070406 209 acl-2012-Unsupervised Semantic Role Induction with Global Role Ordering

16 0.088192761 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation

17 0.083997518 177 acl-2012-Sentence Dependency Tagging in Online Question Answering Forums

18 0.083664231 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

19 0.078088313 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations

20 0.077348925 163 acl-2012-Prediction of Learning Curves in Machine Translation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.252), (1, -0.011), (2, -0.109), (3, -0.092), (4, -0.037), (5, -0.003), (6, 0.0), (7, -0.023), (8, 0.021), (9, -0.029), (10, 0.078), (11, 0.012), (12, 0.017), (13, -0.043), (14, -0.09), (15, -0.019), (16, 0.044), (17, -0.011), (18, 0.017), (19, 0.102), (20, -0.013), (21, -0.053), (22, 0.056), (23, -0.129), (24, -0.038), (25, 0.071), (26, -0.023), (27, 0.101), (28, -0.008), (29, 0.141), (30, -0.071), (31, 0.001), (32, 0.088), (33, 0.019), (34, -0.019), (35, -0.008), (36, -0.116), (37, -0.097), (38, -0.044), (39, -0.102), (40, 0.006), (41, 0.131), (42, -0.132), (43, -0.071), (44, -0.093), (45, -0.106), (46, 0.092), (47, 0.03), (48, -0.129), (49, -0.091)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94312733 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing

Author: Tahira Naseem ; Regina Barzilay ; Amir Globerson

Abstract: We present a novel algorithm for multilingual dependency parsing that uses annotations from a diverse set of source languages to parse a new unannotated language. Our motivation is to broaden the advantages of multilingual learning to languages that exhibit significant differences from existing resource-rich languages. The algorithm learns which aspects of the source languages are relevant for the target language and ties model parameters accordingly. The model factorizes the process of generating a dependency tree into two steps: selection of syntactic dependents and their ordering. Being largely languageuniversal, the selection component is learned in a supervised fashion from all the training languages. In contrast, the ordering decisions are only influenced by languages with similar properties. We systematically model this cross-lingual sharing using typological features. In our experiments, the model consistently outperforms a state-of-the-art multilingual parser. The largest improvement is achieved on the non Indo-European languages yielding a gain of 14.4%.1

2 0.65456325 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction

Author: Dave Golland ; John DeNero ; Jakob Uszkoreit

Abstract: We present LLCCM, a log-linear variant ofthe constituent context model (CCM) of grammar induction. LLCCM retains the simplicity of the original CCM but extends robustly to long sentences. On sentences of up to length 40, LLCCM outperforms CCM by 13.9% bracketing F1 and outperforms a right-branching baseline in regimes where CCM does not.

3 0.6027087 194 acl-2012-Text Segmentation by Language Using Minimum Description Length

Author: Hiroshi Yamaguchi ; Kumiko Tanaka-Ishii

Abstract: The problem addressed in this paper is to segment a given multilingual document into segments for each language and then identify the language of each segment. The problem was motivated by an attempt to collect a large amount of linguistic data for non-major languages from the web. The problem is formulated in terms of obtaining the minimum description length of a text, and the proposed solution finds the segments and their languages through dynamic programming. Empirical results demonstrating the potential of this approach are presented for experiments using texts taken from the Universal Declaration of Human Rights and Wikipedia, covering more than 200 languages.

4 0.59988099 163 acl-2012-Prediction of Learning Curves in Machine Translation

Author: Prasanth Kolachina ; Nicola Cancedda ; Marc Dymetman ; Sriram Venkatapathy

Abstract: Parallel data in the domain of interest is the key resource when training a statistical machine translation (SMT) system for a specific purpose. Since ad-hoc manual translation can represent a significant investment in time and money, a prior assesment of the amount of training data required to achieve a satisfactory accuracy level can be very useful. In this work, we show how to predict what the learning curve would look like if we were to manually translate increasing amounts of data. We consider two scenarios, 1) Monolingual samples in the source and target languages are available and 2) An additional small amount of parallel corpus is also available. We propose methods for predicting learning curves in both these scenarios.

5 0.58738458 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool

Author: Marco Lui ; Timothy Baldwin

Abstract: We present langid .py, an off-the-shelflanguage identification tool. We discuss the design and implementation of langid .py, and provide an empirical comparison on 5 longdocument datasets, and 2 datasets from the microblog domain. We find that langid .py maintains consistently high accuracy across all domains, making it ideal for end-users that require language identification without wanting to invest in preparation of in-domain training data.

6 0.54219413 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models

7 0.53935283 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities

8 0.52383 87 acl-2012-Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars

9 0.52172101 30 acl-2012-Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations

10 0.51805151 34 acl-2012-Automatically Learning Measures of Child Language Development

11 0.51487356 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence

12 0.51463008 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies

13 0.50276661 122 acl-2012-Joint Evaluation of Morphological Segmentation and Syntactic Parsing

14 0.48721638 4 acl-2012-A Comparative Study of Target Dependency Structures for Statistical Machine Translation

15 0.48042786 209 acl-2012-Unsupervised Semantic Role Induction with Global Role Ordering

16 0.47170568 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

17 0.46734035 64 acl-2012-Crosslingual Induction of Semantic Roles

18 0.46061417 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction

19 0.44300911 75 acl-2012-Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing

20 0.44140834 200 acl-2012-Toward Automatically Assembling Hittite-Language Cuneiform Tablet Fragments into Larger Texts


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.018), (26, 0.042), (28, 0.055), (30, 0.034), (37, 0.061), (39, 0.046), (44, 0.145), (57, 0.012), (59, 0.012), (61, 0.015), (71, 0.024), (74, 0.039), (82, 0.033), (84, 0.027), (85, 0.041), (90, 0.179), (92, 0.06), (94, 0.029), (99, 0.039)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.86958194 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing

Author: Tahira Naseem ; Regina Barzilay ; Amir Globerson

Abstract: We present a novel algorithm for multilingual dependency parsing that uses annotations from a diverse set of source languages to parse a new unannotated language. Our motivation is to broaden the advantages of multilingual learning to languages that exhibit significant differences from existing resource-rich languages. The algorithm learns which aspects of the source languages are relevant for the target language and ties model parameters accordingly. The model factorizes the process of generating a dependency tree into two steps: selection of syntactic dependents and their ordering. Being largely languageuniversal, the selection component is learned in a supervised fashion from all the training languages. In contrast, the ordering decisions are only influenced by languages with similar properties. We systematically model this cross-lingual sharing using typological features. In our experiments, the model consistently outperforms a state-of-the-art multilingual parser. The largest improvement is achieved on the non Indo-European languages yielding a gain of 14.4%.1

2 0.84246355 23 acl-2012-A Two-step Approach to Sentence Compression of Spoken Utterances

Author: Dong Wang ; Xian Qian ; Yang Liu

Abstract: This paper presents a two-step approach to compress spontaneous spoken utterances. In the first step, we use a sequence labeling method to determine if a word in the utterance can be removed, and generate n-best compressed sentences. In the second step, we use a discriminative training approach to capture sentence level global information from the candidates and rerank them. For evaluation, we compare our system output with multiple human references. Our results show that the new features we introduced in the first compression step improve performance upon the previous work on the same data set, and reranking is able to yield additional gain, especially when training is performed to take into account multiple references.

3 0.81829286 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

Author: Adam Pauls ; Dan Klein

Abstract: We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context. We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours. We evaluate on perplexity and a range of grammaticality tasks, and find that we perform as well or better than n-gram models and other generative baselines. Our model even competes with state-of-the-art discriminative models hand-designed for the grammaticality tasks, despite training on positive data alone. We also show fluency improvements in a preliminary machine translation experiment.

4 0.8167432 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT

Author: Patrick Simianer ; Stefan Riezler ; Chris Dyer

Abstract: With a few exceptions, discriminative training in statistical machine translation (SMT) has been content with tuning weights for large feature sets on small development data. Evidence from machine learning indicates that increasing the training sample size results in better prediction. The goal of this paper is to show that this common wisdom can also be brought to bear upon SMT. We deploy local features for SCFG-based SMT that can be read off from rules at runtime, and present a learning algorithm that applies ‘1/‘2 regularization for joint feature selection over distributed stochastic learning processes. We present experiments on learning on 1.5 million training sentences, and show significant improvements over tuning discriminative models on small development sets.

5 0.81379014 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging

Author: Weiwei Sun ; Hans Uszkoreit

Abstract: From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. Syntagmatic lexical relations are implicitly captured by constituent parsing and are utilized via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated approaches yield a relative error reduction of 18% in total over a stateof-the-art baseline.

6 0.80975646 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?

7 0.80890101 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation

8 0.80829078 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

9 0.80816919 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations

10 0.80756986 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT

11 0.80441058 140 acl-2012-Machine Translation without Words through Substring Alignment

12 0.8040061 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization

13 0.80284286 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence

14 0.8023954 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information

15 0.80224401 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval

16 0.80159217 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation

17 0.79975069 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

18 0.79931766 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction

19 0.79876679 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities

20 0.79854643 136 acl-2012-Learning to Translate with Multiple Objectives