acl acl2010 acl2010-195 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Taylor Berg-Kirkpatrick ; Dan Klein
Abstract: We present an approach to multilingual grammar induction that exploits a phylogeny-structured model of parameter drift. Our method does not require any translated texts or token-level alignments. Instead, the phylogenetic prior couples languages at a parameter level. Joint induction in the multilingual model substantially outperforms independent learning, with larger gains both from more articulated phylogenies and from increasing numbers of languages. Across eight languages, the multilingual approach gives error reductions over the standard monolingual DMV averaging 21.1% and reaching as high as 39%.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present an approach to multilingual grammar induction that exploits a phylogeny-structured model of parameter drift. [sent-3, score-0.333]
2 Instead, the phylogenetic prior couples languages at a parameter level. [sent-5, score-0.795]
3 Joint induction in the multilingual model substantially outperforms independent learning, with larger gains both from more articulated phylogenies and from increasing numbers of languages. [sent-6, score-0.46]
4 Across eight languages, the multilingual approach gives error reductions over the standard monolingual DMV averaging 21.1% and reaching as high as 39%. [sent-7, score-0.353]
5 1 Introduction Learning multiple languages together should be easier than learning them separately. [sent-9, score-0.208]
6 (2007) in the context of phonology) show that extending beyond two languages can provide increasing benefit. [sent-15, score-0.208]
7 However, multitexts are only available for limited languages and domains. [sent-16, score-0.208]
8 Rather, we capture multilingual constraints at a parameter level, using a phylogeny-structured prior to tie together the various individual languages’ learning problems. [sent-19, score-0.274]
9 Our joint, hierarchical prior couples model parameters for different languages in a way that respects knowledge about how the languages evolved. [sent-20, score-0.771]
10 In their work, structurally constrained covariance in a logistic normal prior is used to couple parameters between the two languages. [sent-24, score-0.211]
11 Our work, though also different in technical approach, differs most centrally in the extension to multiple languages and the use of a phylogeny. [sent-25, score-0.208]
12 (2007) considers an entirely different problem, phonological reconstruction, but shares with this work both the use of a phylogenetic structure and the use of log-linear parameterization of local model components. [sent-27, score-0.499]
13 phonology) and the variables governed by the phylogeny: in our model it is the grammar parameters that drift (in the prior) rather than individual word forms (in the likelihood model). [sent-29, score-0.192]
14 Specifically, we consider dependency induction in the DMV model of Klein and Manning (2004). [sent-30, score-0.196]
15 Our focus is not the DMV model itself, which is well-studied, but rather the prior which couples the various languages’ parameters. [sent-32, score-0.267]
16 While some choices of prior structure can greatly complicate inference (Cohen and Smith, 2009), we choose a hierarchical Gaussian form for the drift term, which allows the gradient of the observed data likelihood to be easily computed using standard dynamic programming methods. [sent-33, score-0.355]
17 In our experiments, joint multilingual learning substantially outperforms independent monolingual learning. [sent-34, score-0.351]
18 Using a limited phylogeny that only couples languages within linguistic families reduces error by 5. [sent-35, score-0.377] [sent-37, score-0.544]
20 Using a flat, global phylogeny gives a greater reduction, almost 10%. [sent-39, score-0.465]
21 Finally, a more articulated phylogeny that captures both inter- and intra-family effects gives an even larger average relative error reduction of 21.1%. [sent-40, score-0.575]
22 The prior is what couples the θℓ parameter vectors across languages; it is the focus of this work. [sent-46, score-0.308]
23 Each edge of the tree specifies a directed dependency from a head token to a dependent, or argument token. [sent-50, score-0.197]
24 Thus, the basic observed “word” types are ... [Figure 1: An example of a linguistically-plausible phylogenetic tree over the languages in our training data; nodes shown include Global, Indo-European, and Sino-Tibetan.] [sent-59, score-0.578]
25 1 Log-Linear Parameterization The DMV’s local conditional distributions were originally given as simple multinomial distributions with one parameter per outcome. [sent-64, score-0.174]
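As a rough illustration of the log-linear reparameterization mentioned above, the sketch below builds one local DMV distribution from feature weights via a softmax. The feature names and weights are invented for the example and are not the paper's actual feature set; Python is used only for concreteness.

```python
import math

def loglinear_multinomial(outcomes, features, weights):
    """Turn feature weights into a normalized distribution over outcomes.

    features(outcome) returns the active feature names for that outcome;
    weights maps feature name to a real-valued weight (a component of theta).
    """
    scores = {o: sum(weights.get(f, 0.0) for f in features(o)) for o in outcomes}
    log_z = math.log(sum(math.exp(s) for s in scores.values()))  # log normalizer
    return {o: math.exp(s - log_z) for o, s in scores.items()}

# Illustrative ATTACH distribution over argument tags for one head/direction.
weights = {"HEAD=VB&ARG=NN": 1.2, "HEAD=VB&ARG=DT": -0.5}
features = lambda arg: ["HEAD=VB&ARG=" + arg, "ARG=" + arg]
print(loglinear_multinomial(["NN", "DT", "IN"], features, weights))
# roughly {'NN': 0.67, 'DT': 0.12, 'IN': 0.20}
```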
26 Consider a phylogeny like the one shown in Figure 1, where each modern language ℓ in L is a leaf. [sent-72, score-0.377]
27 However, in the simple case of our diagonal covariance Gaussians, the gradient of the observed data likelihood can be computed directly using the DMV’s expected counts and maximum-likelihood estimation can be accomplished by applying standard gradient optimization methods. [sent-82, score-0.409]
28 Second, while the choice of diagonal covariance is efficient, it causes components of θ that correspond to features occurring in only one language to be marginally independent of the parameters of all other languages. [sent-83, score-0.197]
29 3 Projected Features With diagonal covariance in the Gaussian drift terms, each parameter evolves independently of the others. [sent-87, score-0.203]
30 Therefore, our prior will be most informative when features activate in multiple languages. [sent-88, score-0.176]
31 This feature will now occur in multiple languages and will contribute to each of those languages’ attachment models. [sent-144, score-0.269]
32 The coarse features are defined via a projection π from language-specific part-of-speech labels to coarser, cross-lingual word classes, and hence we refer to them as SHARED features. [sent-146, score-0.197]
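A minimal sketch of how such a projection might be wired into feature extraction; the tag-to-class map PI and the feature naming scheme below are hypothetical stand-ins, not the projection actually used in the paper.

```python
# Hypothetical projection pi from language-specific POS tags to coarse classes.
PI = {
    ("english", "NNS"): "NOUN", ("english", "VBZ"): "VERB",
    ("dutch", "N"): "NOUN",     ("dutch", "V"): "VERB",
}

def attachment_features(lang, head_tag, arg_tag, direction):
    """Emit a language-specific feature plus a coarse SHARED feature when pi applies."""
    feats = [f"{lang}:ATTACH:{head_tag}->{arg_tag}:{direction}"]
    coarse_head = PI.get((lang, head_tag))
    coarse_arg = PI.get((lang, arg_tag))
    if coarse_head and coarse_arg:
        # The same SHARED feature fires in every language whose tags map to it.
        feats.append(f"SHARED:ATTACH:{coarse_head}->{coarse_arg}:{direction}")
    return feats

print(attachment_features("dutch", "V", "N", "right"))
# ['dutch:ATTACH:V->N:right', 'SHARED:ATTACH:VERB->NOUN:right']
```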
33 Again, only the coarse features occur in multiple languages, so all phylogenetic influence is through those. [sent-152, score-0.469]
34 Nonetheless, the effect of the phylogeny turns out to be quite strong. [sent-153, score-0.377]
35 4 Learning We now turn to learning with the phylogenetic prior. [sent-155, score-0.32]
36 Since the prior couples parameters across languages, this learning problem requires that parameters for all languages be estimated jointly. [sent-156, score-0.589]
37 The form of log P(Θ) immediately shows how parameters are penalized for being different across languages, more so for languages that are near each other in the phylogeny. [sent-160, score-0.353]
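Under the diagonal-Gaussian drift assumption, log P(Θ) reduces, up to an additive constant, to a sum of squared parameter differences along the edges of the phylogeny. The sketch below assumes a toy tree and a single shared variance; the node names, dimensions, and values are illustrative only.

```python
import numpy as np

# Hypothetical phylogeny edges: child node -> parent node (root prior omitted).
PARENT = {
    "indo_european": "global",
    "english": "indo_european",
    "dutch": "indo_european",
    "chinese": "global",
}

def log_prior(theta, sigma2=1.0):
    """log P(Theta), up to a constant, under diagonal Gaussian drift.

    theta maps each phylogeny node to its parameter vector; every edge
    contributes a squared-difference penalty, so parameters are pulled
    toward their ancestors and hence toward related languages.
    """
    total = 0.0
    for child, parent in PARENT.items():
        diff = theta[child] - theta[parent]
        total += -0.5 * float(np.dot(diff, diff)) / sigma2
    return total

theta = {n: np.zeros(3) for n in ["global", "indo_european", "english", "dutch", "chinese"]}
theta["english"] = theta["english"] + np.array([0.5, 0.0, -0.2])
print(log_prior(theta))  # ~ -0.145
```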
38 This requires computation of the gradient of the observed data likelihood log P(sℓ|θℓ), which is given by ∇ log P(sℓ|θℓ) = E_{tℓ|sℓ}[∇ log P(tℓ, sℓ|θℓ)]; this expectation decomposes over the expected counts of the fATTACH and fCONTINUE decisions. [sent-165, score-0.231]
39 The expected gradient of the log joint likelihood of sentences and parses is equal to the gradient of the log marginal likelihood of just sentences, or the observed data likelihood (Salakhutdinov et al. [sent-168, score-0.55]
40 e_{a,h,dir}(sℓ; θℓ) is the expected count of the number of times head h is attached to a in direction dir, given the observed sentences sℓ and DMV parameters θℓ. [sent-170, score-0.307]
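A sketch of how such a gradient can be accumulated for the ATTACH component once the expected counts e_{a,h,dir} and the current local probabilities are available; both are assumed to come from the DMV's inside-outside dynamic program, which is not shown, and the CONTINUE component contributes an analogous term.

```python
from collections import defaultdict

def attach_gradient(expected_counts, p_attach, feat):
    """Gradient contribution of the ATTACH component of the observed-data log likelihood.

    expected_counts[(a, h, d)] : e_{a,h,dir}: expected times head h takes argument a in direction d.
    p_attach[(h, d)][a2]       : current model probability P_ATTACH(a2 | h, d).
    feat(a, h, d)              : active feature names for that decision.
    """
    grad = defaultdict(float)
    for (a, h, d), count in expected_counts.items():
        for f in feat(a, h, d):                    # observed-feature term
            grad[f] += count
        for a2, p in p_attach[(h, d)].items():     # model-expectation term
            for f in feat(a2, h, d):
                grad[f] -= count * p
    return grad

# Tiny illustrative inputs (not real DMV quantities).
counts = {("NN", "VB", "right"): 2.0}
p = {("VB", "right"): {"NN": 0.7, "DT": 0.3}}
feat = lambda a, h, d: [f"ATTACH:{h}->{a}:{d}"]
print(dict(attach_gradient(counts, p, feat)))
# ~ {'ATTACH:VB->NN:right': 0.6, 'ATTACH:VB->DT:right': -0.6}
```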
41 The computation time is dominated by the computation of each sentence’s posterior expected counts, which are independent given the parameters, so the time required per iteration is essentially the same whether training all languages jointly or independently. [sent-173, score-0.208]
42 For all languages but English and Chinese, we used corpora from the 2006 CoNLL-X Shared Task dependency parsing data set (Buchholz and Marsi, 2006). [sent-177, score-0.305]
43 We used the Bikel Chinese head finder (Bikel and Chiang, 2000) and the Collins English head finder (Collins, 1999) to transform the gold constituency parses into gold dependency parses. [sent-185, score-0.372]
44 2 Models Compared We evaluated three phylogenetic priors, each with a different phylogenetic structure. [sent-201, score-0.64]
45 We compare with two monolingual baselines, as well as an allpairs multilingual model that does not have a phylogenetic interpretation, but which provides very similar capacity for parameter coupling. [sent-202, score-0.921]
46 1 Phylogenetic Models The first phylogenetic model uses the shallow phylogeny shown in Figure 2(a), in which only languages within the same family have a shared parent node. [sent-205, score-1.097]
47 Under this prior, the learning task decouples into independent subtasks for each family, but no regularities across families can be captured. [sent-207, score-0.194]
48 Figure 2(b) shows another simple configuration, wherein all languages share a common parent node in the prior, meaning that global regularities that are consistent across all languages can be captured. [sent-209, score-0.648]
49 While the global model couples the parameters for all eight languages, it does so without sensitivity to the articulated structure of their descent. [sent-211, score-0.39]
50 Figure 2(c) shows a more nuanced prior structure, LINGUISTIC, which groups languages first by family and then under a global node. [sent-212, score-0.528]
51 This structure allows global regularities as well as regularities within families to be learned. [sent-213, score-0.294]
52 4 in terms of multiple sets of weights, one at each node in the phylogeny (the hierarchical parameterization, described in Section 2. [sent-219, score-0.41]
53 3 for each node in the phylogeny, each of which is active only on the languages that are its descendants. [sent-222, score-0.208]
54 In the flat parameterization, it seems equally reasonable to simply tie all pairs of languages by adding duplicate sets of features for each pair. [sent-225, score-0.297]
55 This gives the ALLPAIRS setting, which we also compare to the tree-structured phylogenetic models above. [sent-226, score-0.32]
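One way to realize the flat parameterization is to duplicate every base feature once per group of languages and let each copy fire only for languages in that group; the sketch below uses invented group names, with all_pairs=True standing in for the ALLPAIRS setting.

```python
from itertools import combinations

# Hypothetical language groupings: phylogeny node -> languages it dominates.
NODES = {
    "global": {"english", "dutch", "chinese"},
    "indo_european": {"english", "dutch"},
}

def duplicated_features(lang, base_feats, all_pairs=False):
    """Copy each base feature once per group that contains `lang`.

    With all_pairs=False the groups are phylogeny nodes (flat parameterization
    of the tree-structured priors); with all_pairs=True they are all language
    pairs, i.e. the ALLPAIRS setting.
    """
    languages = set().union(*NODES.values())
    if all_pairs:
        groups = {f"{a}+{b}": {a, b} for a, b in combinations(sorted(languages), 2)}
    else:
        groups = NODES
    feats = [f"{lang}:{f}" for f in base_feats]  # language-specific copies
    for name, members in groups.items():
        if lang in members:
            feats.extend(f"{name}:{f}" for f in base_feats)
    return feats

print(duplicated_features("dutch", ["ATTACH:VERB->NOUN:right"]))
print(duplicated_features("dutch", ["ATTACH:VERB->NOUN:right"], all_pairs=True))
```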
56 3 Baselines To evaluate the impact of multilingual constraint, we compared against two monolingual baselines. [sent-228, score-0.317]
57 To facilitate comparison to past work, we used no prior for this monolingual model. [sent-230, score-0.246]
58 This model includes a simple isotropic Gaussian prior on parameters. [Table 2: Directed dependency accuracy of monolingual and multilingual models, and relative error reduction over the monolingual baseline with SHARED features, macro-averaged over languages.] [sent-232, score-0.744]
59 Additionally, more nuanced phylogenetic structures outperformed cruder ones. [sent-234, score-0.383]
60 4 Evaluation For each setting, we evaluated the directed dependency accuracy of the minimum Bayes risk (MBR) dependency parses produced by our models under maximum (posterior) likelihood parameter estimates. [sent-238, score-0.407]
61 In addition, for multilingual models, we computed the relative error reduction over the strong monolingual baseline, macro-averaged over languages. [sent-240, score-0.452]
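For concreteness, the two evaluation quantities can be computed as below; the head indices and accuracy values in the example are placeholders, not results from the paper, and the MBR decoding that produces the predicted heads is not shown.

```python
def directed_accuracy(gold_heads, pred_heads):
    """Fraction of tokens whose predicted head index matches the gold head."""
    return sum(g == p for g, p in zip(gold_heads, pred_heads)) / len(gold_heads)

def macro_avg_error_reduction(baseline_acc, model_acc):
    """Macro-average of per-language relative error reduction (accuracies in [0, 1])."""
    reductions = [((1 - b) - (1 - m)) / (1 - b) for b, m in zip(baseline_acc, model_acc)]
    return sum(reductions) / len(reductions)

print(directed_accuracy([2, 0, 2], [2, 0, 1]))                  # 0.666...
print(macro_avg_error_reduction([0.40, 0.50], [0.52, 0.56]))    # ~0.16
```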
62 5 Training Our implementation used the flat parameterization described in Section 3. [sent-242, score-0.198]
63 In practice, optimizing with the hierarchical parameterization also seemed to underperform. [sent-246, score-0.214]
64 (2010) suggest that directly optimizing the observed data likelihood may offer improvements over the more standard expectation-maximization (EM) optimization procedure for models such as the DMV, especially when the model is parameterized using features. [sent-251, score-0.192]
65 In all cases, methods which coupled the languages in some way outperformed the independent baselines that considered each language independently. [sent-264, score-0.25]
66 1 Bilingual Models The weakest of the coupled models was FAMILIES, which had an average relative error reduction of 5. [sent-266, score-0.177]
67 The limited improvement of the family-level prior compared to other phylogenies suggests that there are important multilingual interactions that do not happen within families. [sent-269, score-0.347]
68 When pairs of languages were trained together in isolation, the largest benefit was seen for languages with small training corpora, not necessarily languages with common ancestry. [sent-272, score-0.624]
69 In our setup, Spanish, Slovene, and Chinese have substantially smaller training corpora than the rest of the languages considered. [sent-273, score-0.242]
70 2 Multilingual Models Models that coupled multiple languages performed better in general than models that only considered pairs of languages. [sent-276, score-0.25]
71 The GLOBAL model, which couples all languages, if crudely, yielded an average relative error reduction of 9. [sent-277, score-0.282]
72 This improvement comes as the number of languages able to exert mutual constraint increases. [sent-279, score-0.242]
73 For example, Dutch and Danish had large improvements, over and above any improvements these two languages gained when trained with a single additional language. [sent-280, score-0.252]
74 Indeed, the LINGUISTIC model is the only model we evaluated that gave improvements for all the languages we considered. [sent-282, score-0.326]
75 It is reasonable to worry that the improvements from these multilingual models might be partially due to having more total training data in the multilingual setting. [sent-283, score-0.352]
76 The GLOBAL phylogeny captures only “universals,” while FAMILIES captures only correlations between languages that are known to be similar. [sent-288, score-0.585]
77 ALLPAIRS The phylogeny is capable of allowing appropriate influence to pass between languages at multiple levels. [sent-293, score-0.585]
78 However, the rich phylogeny of the LINGUISTIC model, which incorporates linguistic constraints, outperformed the freer ALLPAIRS model. [sent-297, score-0.43]
79 We found that the improved English analyses produced by the LINGUISTIC model were more consistent with this model’s analyses of other languages. [sent-299, score-0.269]
80 5 Comparison to Related Work The likelihood models for both the strong monolingual baseline and the various multilingual models are the same, both expanding upon the standard DMV by adding coarse SHARED features. [sent-303, score-0.494]
81 These coarse features, even in a monolingual setting, improved performance slightly over the weak baseline, perhaps by encouraging consistent treatment of the different finer-grained variants of parts-of-speech (Berg-Kirkpatrick et al. [sent-304, score-0.329]
82 When Cohen and Smith compared their best shared logistic-normal bilingual models to monolingual counterparts for the languages they investigated (Chinese and English), they reported a relative error reduction of 5. [sent-308, score-0.575]
83 5 Analysis By examining the proposed parses we found that the LINGUISTIC and ALLPAIRS models produced analyses that were more consistent across languages than those of the other models. [sent-320, score-0.467]
84 We also observed that the most common errors can be summarized succinctly by looking at attachment counts between coarse parts-of-speech. [sent-321, score-0.274]
85 For example, the monolingual learners are divided as to whether determiners or nouns head noun phrases. [sent-329, score-0.225]
86 Dutch has the problem that verbs modify pronouns more often than pronouns modify verbs, and pronouns are predicted to head sentences as often as verbs are. [sent-331, score-0.26]
87 More subtly, the monolingual analyses are inconsistent in the way they head prepositional phrases. [sent-333, score-0.354]
88 In the monolingual Portuguese hypotheses, prepositions modify nouns more often than nouns modify prepositions. [sent-334, score-0.307]
89 Under the LINGUISTIC model, Dutch now attaches pronouns to verbs, and thus looks more like English, its sister in the phylogenetic tree. [sent-340, score-0.354]
90 The LINGUISTIC model has also chosen consistent analyses for prepositional phrases and noun phrases, calling prepositions and nouns the heads of each, respectively. [sent-341, score-0.264]
91 Figure 3(b) shows dependency counts for the GLOBAL multilingual model. [sent-343, score-0.298]
92 Unsurprisingly, the analyses proposed under global constraint appear somewhat more consistent than those proposed under no multi-lingual constraint (now three languages ...). [Figure 3: Dependency counts in proposed parses.] [sent-344, score-0.344]
93 Analyses proposed by the monolingual baseline show significant inconsistencies across languages. [sent-350, score-0.204]
94 Analyses proposed by the LINGUISTIC model are more consistent across languages than those proposed by either the monolingual baseline or the GLOBAL model. [sent-351, score-0.499]
95 Finally, Figure 3(d) shows dependency counts in the hand-labeled dependency parses. [sent-353, score-0.241]
96 It appears that even the very consistent LINGUISTIC parses do not capture the non-determinism of prepositional phrase attachment to both nouns and verbs. [sent-354, score-0.226]
97 6 Conclusion Even without translated texts, multilingual constraints expressed in the form of a phylogenetic prior on parameters can give substantial gains in grammar induction accuracy over treating languages in isolation. [sent-355, score-0.925]
98 Computational Natural Language Learning-X shared task on multilingual dependency parsing. [sent-386, score-0.32]
99 Two languages are better than one (for syntactic parsing). [sent-392, score-0.208]
100 Adding more languages improves unsupervised multilingual part-of-speech tagging: A Bayesian non-parametric approach. [sent-498, score-0.362]
wordName wordTfidf (topN-words)
[('phylogeny', 0.377), ('phylogenetic', 0.32), ('dmv', 0.32), ('allpairs', 0.21), ('languages', 0.208), ('monolingual', 0.163), ('multilingual', 0.154), ('couples', 0.147), ('parameterization', 0.142), ('dir', 0.14), ('slovene', 0.128), ('coarse', 0.116), ('dutch', 0.113), ('phylogenies', 0.11), ('spanish', 0.101), ('families', 0.1), ('dependency', 0.097), ('swedish', 0.093), ('chinese', 0.093), ('danish', 0.092), ('analyses', 0.091), ('cohen', 0.09), ('portuguese', 0.089), ('global', 0.088), ('family', 0.086), ('prior', 0.083), ('par', 0.082), ('attach', 0.079), ('smith', 0.078), ('parses', 0.077), ('covariance', 0.073), ('gradient', 0.071), ('shared', 0.069), ('reduction', 0.066), ('klein', 0.063), ('articulated', 0.063), ('nuanced', 0.063), ('head', 0.062), ('induction', 0.062), ('likelihood', 0.061), ('attachment', 0.061), ('activate', 0.06), ('gaussian', 0.059), ('drift', 0.057), ('flat', 0.056), ('parameters', 0.055), ('linguistic', 0.053), ('regularities', 0.053), ('continue', 0.052), ('snyder', 0.051), ('ancestral', 0.05), ('consistent', 0.05), ('observed', 0.05), ('adj', 0.049), ('log', 0.049), ('modify', 0.048), ('distributions', 0.048), ('manning', 0.048), ('prepositions', 0.048), ('projection', 0.048), ('counts', 0.047), ('logp', 0.045), ('tagset', 0.045), ('improvements', 0.044), ('grammar', 0.043), ('bikel', 0.043), ('coupled', 0.042), ('isotropic', 0.042), ('ome', 0.042), ('pattach', 0.042), ('pcontinue', 0.042), ('sinitic', 0.042), ('tfattach', 0.042), ('tfcontinue', 0.042), ('multinomial', 0.041), ('finkel', 0.041), ('across', 0.041), ('seemed', 0.039), ('headed', 0.039), ('prepositional', 0.038), ('conjunction', 0.038), ('directed', 0.038), ('parameter', 0.037), ('model', 0.037), ('english', 0.037), ('slavic', 0.037), ('naseem', 0.037), ('salakhutdinov', 0.037), ('finder', 0.037), ('error', 0.036), ('diagonal', 0.036), ('phonology', 0.036), ('pronouns', 0.034), ('constraint', 0.034), ('substantially', 0.034), ('burkett', 0.034), ('conjunctions', 0.033), ('relative', 0.033), ('features', 0.033), ('hierarchical', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 195 acl-2010-Phylogenetic Grammar Induction
Author: Taylor Berg-Kirkpatrick ; Dan Klein
Abstract: We present an approach to multilingual grammar induction that exploits a phylogeny-structured model of parameter drift. Our method does not require any translated texts or token-level alignments. Instead, the phylogenetic prior couples languages at a parameter level. Joint induction in the multilingual model substantially outperforms independent learning, with larger gains both from more articulated phylogenies and from increasing numbers of languages. Across eight languages, the multilingual approach gives error reductions over the standard monolingual DMV averaging 21.1% and reaching as high as 39%.
2 0.21688302 214 acl-2010-Sparsity in Dependency Grammar Induction
Author: Jennifer Gillenwater ; Kuzman Ganchev ; Joao Graca ; Fernando Pereira ; Ben Taskar
Abstract: A strong inductive bias is essential in unsupervised grammar induction. We explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. Specifically, we investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 languages, we achieve substantial gains over the standard expectation maximization (EM) baseline, with average improvement in attachment accuracy of 6.3%. Further, our method outperforms models based on a standard Bayesian sparsity-inducing prior by an average of 4.9%. On English in particular, we show that our approach improves on several other state-of-the-art techniques.
3 0.16566329 162 acl-2010-Learning Common Grammar from Multilingual Corpus
Author: Tomoharu Iwata ; Daichi Mochihashi ; Hiroshi Sawada
Abstract: We propose a corpus-based probabilistic framework to extract hidden common syntax across languages from non-parallel multilingual corpora in an unsupervised fashion. For this purpose, we assume a generative model for multilingual corpora, where each sentence is generated from a language-dependent probabilistic context-free grammar (PCFG), and these PCFGs are generated from a prior grammar that is common across languages. We also develop a variational method for efficient inference. Experiments on a non-parallel multilingual corpus of eleven languages demonstrate the feasibility of the proposed method.
4 0.12304989 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
Author: Valentin I. Spitkovsky ; Daniel Jurafsky ; Hiyan Alshawi
Abstract: We show how web mark-up can be used to improve unsupervised dependency parsing. Starting from raw bracketings of four common HTML tags (anchors, bold, italics and underlines), we refine approximate partial phrase boundaries to yield accurate parsing constraints. Conversion procedures fall out of our linguistic analysis of a newly available million-word hyper-text corpus. We demonstrate that derived constraints aid grammar induction by training Klein and Manning’s Dependency Model with Valence (DMV) on this data set: parsing accuracy on Section 23 (all sentences) of the Wall Street Journal corpus jumps to 50.4%, beating previous state-of-the-art by more than 5%. Web-scale experiments show that the DMV, perhaps because it is unlexicalized, does not benefit from orders of magnitude more annotated but noisier data. Our model, trained on a single blog, generalizes to 53.3% accuracy out-of-domain against the Brown corpus, nearly 10% higher than the previous published best. The fact that web mark-up strongly correlates with syntactic structure may have broad applicability in NLP.
5 0.10614375 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification
Author: Wenbin Jiang ; Qun Liu
Abstract: In this paper we describe an intuitionistic method for dependency parsing, where a classifier is used to determine whether a pair of words forms a dependency edge. And we also propose an effective strategy for dependency projection, where the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language, leading to a set of classification instances rather than a complete tree. Experiments show that the classifier trained on the projected classification instances significantly outperforms previous projected dependency parsers. More importantly, when this classifier is integrated into a maximum spanning tree (MST) dependency parser, obvious improvement is obtained over the MST baseline.
7 0.094449081 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints
8 0.088024363 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
9 0.08087083 116 acl-2010-Finding Cognate Groups Using Phylogenies
10 0.077487193 79 acl-2010-Cross-Lingual Latent Topic Extraction
11 0.076767907 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
12 0.076109849 91 acl-2010-Domain Adaptation of Maximum Entropy Language Models
13 0.075891316 84 acl-2010-Detecting Errors in Automatically-Parsed Dependency Relations
14 0.072645091 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems
15 0.071454823 146 acl-2010-Improving Chinese Semantic Role Labeling with Rich Syntactic Features
16 0.068223149 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages
17 0.067276306 96 acl-2010-Efficient Optimization of an MDL-Inspired Objective Function for Unsupervised Part-Of-Speech Tagging
18 0.066922732 217 acl-2010-String Extension Learning
19 0.065417133 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification
20 0.06482394 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
topicId topicWeight
[(0, -0.218), (1, -0.034), (2, 0.046), (3, 0.015), (4, -0.03), (5, -0.045), (6, 0.087), (7, -0.031), (8, 0.105), (9, 0.078), (10, -0.094), (11, 0.028), (12, 0.068), (13, -0.036), (14, 0.059), (15, -0.155), (16, -0.024), (17, -0.012), (18, 0.095), (19, -0.089), (20, -0.142), (21, -0.085), (22, 0.027), (23, -0.049), (24, -0.02), (25, 0.054), (26, 0.015), (27, 0.045), (28, 0.02), (29, -0.033), (30, -0.079), (31, 0.027), (32, 0.014), (33, -0.096), (34, 0.178), (35, -0.092), (36, -0.01), (37, -0.035), (38, -0.097), (39, -0.201), (40, -0.008), (41, -0.132), (42, 0.022), (43, 0.153), (44, -0.073), (45, 0.022), (46, 0.01), (47, -0.044), (48, -0.011), (49, -0.05)]
simIndex simValue paperId paperTitle
same-paper 1 0.94140369 195 acl-2010-Phylogenetic Grammar Induction
Author: Taylor Berg-Kirkpatrick ; Dan Klein
Abstract: We present an approach to multilingual grammar induction that exploits a phylogeny-structured model of parameter drift. Our method does not require any translated texts or token-level alignments. Instead, the phylogenetic prior couples languages at a parameter level. Joint induction in the multilingual model substantially outperforms independent learning, with larger gains both from more articulated phylogenies and from increasing numbers of languages. Across eight languages, the multilingual approach gives error reductions over the standard monolingual DMV averaging 21.1% and reaching as high as 39%.
2 0.81071562 214 acl-2010-Sparsity in Dependency Grammar Induction
Author: Jennifer Gillenwater ; Kuzman Ganchev ; Joao Graca ; Fernando Pereira ; Ben Taskar
Abstract: A strong inductive bias is essential in unsupervised grammar induction. We explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. Specifically, we investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 languages, we achieve substantial gains over the standard expectation maximization (EM) baseline, with average improvement in attachment accuracy of 6.3%. Further, our method outperforms models based on a standard Bayesian sparsity-inducing prior by an average of 4.9%. On English in particular, we show that our approach improves on several other state-of-the-art techniques.
3 0.69838578 162 acl-2010-Learning Common Grammar from Multilingual Corpus
Author: Tomoharu Iwata ; Daichi Mochihashi ; Hiroshi Sawada
Abstract: We propose a corpus-based probabilistic framework to extract hidden common syntax across languages from non-parallel multilingual corpora in an unsupervised fashion. For this purpose, we assume a generative model for multilingual corpora, where each sentence is generated from a language-dependent probabilistic context-free grammar (PCFG), and these PCFGs are generated from a prior grammar that is common across languages. We also develop a variational method for efficient inference. Experiments on a non-parallel multilingual corpus of eleven languages demonstrate the feasibility of the proposed method.
4 0.49812451 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification
Author: Wenbin Jiang ; Qun Liu
Abstract: In this paper we describe an intuitionistic method for dependency parsing, where a classifier is used to determine whether a pair of words forms a dependency edge. And we also propose an effective strategy for dependency projection, where the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language, leading to a set of classification instances rather than a complete tree. Experiments show that the classifier trained on the projected classification instances significantly outperforms previous projected dependency parsers. More importantly, when this classifier is integrated into a maximum spanning tree (MST) dependency parser, obvious improvement is obtained over the MST baseline.
5 0.49458724 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems
Author: Jungi Kim ; Jin-Ji Li ; Jong-Hyeok Lee
Abstract: Subjectivity analysis is a rapidly growing field of study. Along with its applications to various NLP tasks, much work has put effort into multilingual subjectivity learning from existing resources. Multilingual subjectivity analysis requires language-independent criteria for comparable outcomes across languages. This paper proposes to measure the multilanguage-comparability of subjectivity analysis tools, and provides meaningful comparisons of multilingual subjectivity analysis from various points of view.
6 0.49325404 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
7 0.48979518 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints
8 0.48316798 143 acl-2010-Importance of Linguistic Constraints in Statistical Dependency Parsing
9 0.48003429 84 acl-2010-Detecting Errors in Automatically-Parsed Dependency Relations
10 0.46046492 255 acl-2010-Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
11 0.45759869 96 acl-2010-Efficient Optimization of an MDL-Inspired Objective Function for Unsupervised Part-Of-Speech Tagging
12 0.4431656 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation
13 0.43980095 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages
14 0.4362956 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web
15 0.43595397 79 acl-2010-Cross-Lingual Latent Topic Extraction
16 0.42995149 116 acl-2010-Finding Cognate Groups Using Phylogenies
17 0.41294381 130 acl-2010-Hard Constraints for Grammatical Function Labelling
18 0.40714288 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
19 0.40280771 61 acl-2010-Combining Data and Mathematical Models of Language Change
20 0.39904401 99 acl-2010-Efficient Third-Order Dependency Parsers
topicId topicWeight
[(14, 0.031), (25, 0.059), (26, 0.232), (39, 0.012), (42, 0.026), (44, 0.013), (59, 0.1), (71, 0.018), (73, 0.04), (76, 0.027), (78, 0.045), (83, 0.12), (84, 0.046), (98, 0.134)]
simIndex simValue paperId paperTitle
same-paper 1 0.8165217 195 acl-2010-Phylogenetic Grammar Induction
Author: Taylor Berg-Kirkpatrick ; Dan Klein
Abstract: We present an approach to multilingual grammar induction that exploits a phylogeny-structured model of parameter drift. Our method does not require any translated texts or token-level alignments. Instead, the phylogenetic prior couples languages at a parameter level. Joint induction in the multilingual model substantially outperforms independent learning, with larger gains both from more articulated phylogenies and from increasing numbers of languages. Across eight languages, the multilingual approach gives error reductions over the standard monolingual DMV averaging 21.1% and reaching as high as 39%.
2 0.69642925 169 acl-2010-Learning to Translate with Source and Target Syntax
Author: David Chiang
Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a sim- ple approach that uses both source and target syntax for significant improvements in translation accuracy.
3 0.68766367 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and tree-substitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-the-art lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
4 0.6871016 98 acl-2010-Efficient Staggered Decoding for Sequence Labeling
Author: Nobuhiro Kaji ; Yasuhiro Fujiwara ; Naoki Yoshinaga ; Masaru Kitsuregawa
Abstract: The Viterbi algorithm is the conventional decoding algorithm most widely adopted for sequence labeling. Viterbi decoding is, however, prohibitively slow when the label set is large, because its time complexity is quadratic in the number of labels. This paper proposes an exact decoding algorithm that overcomes this problem. A novel property of our algorithm is that it efficiently reduces the labels to be decoded, while still allowing us to check the optimality of the solution. Experiments on three tasks (POS tagging, joint POS tagging and chunking, and supertagging) show that the new algorithm is several orders of magnitude faster than the basic Viterbi and a state-of-the-art algorithm, CARPEDIEM (Esposito and Radicioni, 2009).
5 0.68461108 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study
Author: Yun-Cheng Ju ; Tim Paek
Abstract: Speech recognition affords automobile drivers a hands-free, eyes-free method of replying to Short Message Service (SMS) text messages. Although a voice search approach based on template matching has been shown to be more robust to the challenging acoustic environment of automobiles than using dictation, users may have difficulties verifying whether SMS response templates match their intended meaning, especially while driving. Using a high-fidelity driving simulator, we compared dictation for SMS replies versus voice search in increasingly difficult driving conditions. Although the two approaches did not differ in terms of driving performance measures, users made about six times more errors on average using dictation than voice search.
6 0.68404812 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
7 0.6831187 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
8 0.68037051 158 acl-2010-Latent Variable Models of Selectional Preference
9 0.67861277 65 acl-2010-Complexity Metrics in an Incremental Right-Corner Parser
10 0.67828071 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
11 0.67804635 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
12 0.67793345 71 acl-2010-Convolution Kernel over Packed Parse Forest
13 0.67738104 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
14 0.67626703 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
15 0.67428505 39 acl-2010-Automatic Generation of Story Highlights
16 0.67340016 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
17 0.67232609 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries
18 0.67072797 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
19 0.67057502 13 acl-2010-A Rational Model of Eye Movement Control in Reading
20 0.67056048 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation