acl acl2010 acl2010-162 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tomoharu Iwata ; Daichi Mochihashi ; Hiroshi Sawada
Abstract: We propose a corpus-based probabilistic framework to extract hidden common syntax across languages from non-parallel multilingual corpora in an unsupervised fashion. For this purpose, we assume a generative model for multilingual corpora, where each sentence is generated from a language dependent probabilistic contextfree grammar (PCFG), and these PCFGs are generated from a prior grammar that is common across languages. We also develop a variational method for efficient inference. Experiments on a non-parallel multilingual corpus of eleven languages demonstrate the feasibility of the proposed method.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We propose a corpus-based probabilistic framework to extract hidden common syntax across languages from non-parallel multilingual corpora in an unsupervised fashion. [sent-5, score-0.555]
2 For this purpose, we assume a generative model for multilingual corpora, where each sentence is generated from a language dependent probabilistic contextfree grammar (PCFG), and these PCFGs are generated from a prior grammar that is common across languages. [sent-6, score-0.891]
3 We also develop a variational method for efficient inference. [sent-7, score-0.208]
4 Experiments on a non-parallel multilingual corpus of eleven languages demonstrate the feasibility of the proposed method. [sent-8, score-0.407]
5 1 Introduction Languages share certain common properties (Pinker, 1994). [sent-9, score-0.089]
6 For example, the word order in most European languages is subject-verb-object (SVO), and some words with similar forms are used with similar meanings in different languages. [sent-10, score-0.122]
7 The reasons for these common properties can be attributed to: 1) a common ancestor language, 2) borrowing from nearby languages, and 3) the innate abilities of humans (Chomsky, 1965). [sent-11, score-0.178]
8 We assume hidden commonalities in syntax across languages, and try to extract a common grammar from non-parallel multilingual corpora. [sent-12, score-0.688]
9 For this purpose, we propose a generative model for multilingual grammars that is learned in an unsupervised fashion. [sent-13, score-0.322]
10 There are some computational models for capturing commonalities at the phoneme and word level (Oakes, 2000; Bouchard-Côté et al. [sent-14, score-0.182]
11 , 2008), but, as far as we know, no attempt has been made to extract commonalities at the syntax level from non-parallel and non-annotated multilingual corpora. [sent-15, score-0.38]
12 In our scenario, we use probabilistic contextfree grammars (PCFGs) as our monolingual grammar model. [sent-16, score-0.365]
13 We assume that a PCFG for each language is generated from a general model that is common across languages, and each sentence in the multilingual corpora is generated from the language-dependent PCFG. [sent-17, score-0.362]
14 The inference of the general model as well as the multilingual PCFGs can be performed by using a variational method for efficiency. [sent-18, score-0.441]
15 Our approach is based on a Bayesian multitask learning framework (Yu et al. [sent-19, score-0.056]
16 Hierarchical Bayesian modeling provides a natural way of obtaining a joint regularization for individual models by assuming that the model parameters are drawn from a common prior distribution (Yu et al. [sent-21, score-0.208]
17 2 Related work The unsupervised grammar induction task has been extensively studied (Carroll and Charniak, 1992; Stolcke and Omohundro, 1994; Klein and Manning, 2002; Klein and Manning, 2004; Liang et al. [sent-23, score-0.228]
18 Recently, models have been proposed that outperform PCFG in the grammar induction task (Klein and Manning, 2002; Klein and Manning, 2004). [sent-25, score-0.228]
19 We used PCFG as a first step for capturing commonalities in syntax across languages because of its simplicity. [sent-26, score-0.412]
20 The proposed framework can be used for probabilistic grammar models other than PCFG. [sent-27, score-0.225]
21 Grammar induction using bilingual parallel corpora has been studied mainly in machine translation research (Wu, 1997; Melamed, 2003; Eisner, 2003; Chiang, 2005; Blunsom et al. [sent-28, score-0.093]
22 These methods require sentence-aligned parallel data, which can be costly to obtain and difficult to scale to many languages. [sent-31, score-0.035]
23 To our knowledge, the only grammar induction work on non-parallel corpora is (Cohen and Smith, 2009), but their method does not model a common grammar, and requires prior information such as part-of-speech tags. [sent-36, score-0.366]
24 In contrast, our method does not require any such prior information. [sent-37, score-0.049]
25 The task is to learn multilingual PCFGs G = {Gl}l∈L and a common grammar that generates these PCFGs. [sent-40, score-0.44]
26 Here, Gl = (K, Wl, Φl) represents a PCFG of language l, where K is a set of nonterminals, Wl is a set of terminals, and Φl is a set of rule probabilities. [sent-41, score-0.08]
27 Note that the set of nonterminals K is shared among languages, but the set of terminals Wl and the rule probabilities Φl are specific to each language. [sent-42, score-0.308]
28 For simplicity, we consider Chomsky normal form grammars, which have two types of rules: emissions rewrite a nonterminal as a terminal, A → w, and binary productions rewrite a nonterminal as two nonterminals, A → BC, where A, B, C ∈ K and w ∈ Wl. [sent-43, score-0.211]
29 In the proposed model, multinomial parameters θlA and φlA are generated from Dirichlet distributions that are common across languages: θlA ∼ Dir(αθA) and φlA ∼ Dir(αφA), since we assume that languages share a common syntax structure. [sent-46, score-0.478]
30 αθA and αφA represent the parameters of a common grammar. [sent-47, score-0.159]
31 We use the Dirichlet prior because it is the conjugate prior for the multinomial distribution. [sent-48, score-0.137]
32 In summary, the proposed model assumes the following generative process for a multilingual corpus: 1. [sent-49, score-0.224]
33 For each nonterminal A ∈ K: (Figure 1: Graphical model.) [sent-50, score-0.112]
34 Draw common rule type parameters αθA ∼ Gam(aθ, bθ) (b) For each nonterminal pair (B, C): i. [sent-52, score-0.182]
35 Draw common production parameters αAφBC ∼ Gam(aφ , bφ) 2. [sent-53, score-0.242]
36 For each language l ∈ L: (a) For each nonterminal A ∈ K: i. [sent-54, score-0.112]
37 Draw emission parameters ψlA ∼ Dir(αψ) (b) For each node i in the parse tree: i. [sent-57, score-0.25]
38 Generate children nonterminals (zlL(i), zlR(i)) ∼ Mult(φlzi), where L(i) and R(i) represent the left and right children of node i. [sent-62, score-0.135]
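To make the generative process above concrete, here is a minimal sketch (not the authors' code): it samples the common Gamma-distributed Dirichlet parameters, draws a language-dependent PCFG from them, and generates a sentence. The toy nonterminal/terminal sets, the hyperparameter values, and the use of numpy are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

K = ["S", "NP", "VP", "N", "V"]                                      # shared nonterminals (toy set)
W = {"en": ["the", "dog", "runs"], "de": ["der", "hund", "rennt"]}   # language-specific terminals (toy set)
pairs = [(B, C) for B in K for C in K]                               # candidate right-hand sides A -> B C
a_theta, b_theta, a_phi, b_phi, alpha_psi = 1.0, 1.0, 1.0, 1.0, 1.0  # assumed hyperparameter values

# 1. Draw common (language-independent) Dirichlet parameters; Gam(a, b) with rate b means scale 1/b.
alpha_theta = {A: rng.gamma(a_theta, 1.0 / b_theta, size=2) for A in K}     # rule type: [emit, binary]
alpha_phi = {A: rng.gamma(a_phi, 1.0 / b_phi, size=len(pairs)) for A in K}  # binary productions

# 2. Draw the language-dependent PCFG parameters from the common prior.
grammars = {}
for l, terms in W.items():
    theta = {A: rng.dirichlet(alpha_theta[A]) for A in K}                    # P(rule type | A, l)
    phi = {A: rng.dirichlet(alpha_phi[A]) for A in K}                        # P(B C | A, l)
    psi = {A: rng.dirichlet(np.full(len(terms), alpha_psi)) for A in K}      # P(w | A, l)
    grammars[l] = (theta, phi, psi)

def generate(l, A="S", depth=0, max_depth=5):
    """Expand nonterminal A in language l; the depth cap keeps the sketch finite."""
    theta, phi, psi = grammars[l]
    terms = W[l]
    if rng.random() < theta[A][0] or depth >= max_depth:          # emission A -> w
        return [terms[rng.choice(len(terms), p=psi[A])]]
    B, C = pairs[rng.choice(len(pairs), p=phi[A])]                # binary production A -> B C
    return generate(l, B, depth + 1) + generate(l, C, depth + 1)

print(generate("en"))
```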
39 2 Inference The inference of the proposed model can be efficiently computed using a variational Bayesian method. [sent-65, score-0.26]
40 We extend the variational method for monolingual PCFG learning of Kurihara and Sato (2004) to multilingual corpora. [sent-66, score-0.389]
41 The goal is to estimate the posterior p(Z, Φ, α|X), where Z is a set of parse trees, Φ = {Φl}l∈L is a set of language-dependent parameters, Φl = {θlA, φlA, ψlA}A∈K, and α = {αθA, αφA}A∈K is a set of common parameters. [sent-67, score-0.195]
42 In the variational method, the posterior p(Z, Φ, α|X) is approximated by a tractable variational distribution q(Z, Φ, α). [sent-68, score-0.119]
43 We use the following variational distribution, q(Z, Φ, α) = ∏A q(αθA) q(αφA) ∏l,d q(zld) ∏l,A q(θlA) q(φlA) q(ψlA), (1) where we assume that the hyperparameter posteriors q(αθA) and q(αφA) are degenerate, i.e. q(α) = δα∗(α), and infer them by point estimation instead of distribution estimation. [sent-69, score-0.264]
44 We find an approximate posterior distribution that minimizes the Kullback-Leibler divergence from the true posterior. [sent-70, score-0.183]
45 The variational distribution of the parse tree of the dth sentence in language l is obtained as follows, q(zld) ∝ ∏A→BC (πlθA1 πlφABC)^C(A→BC; zld, l, d) ∏A→w (πlθA0 πlψAw)^C(A→w; zld, l, d), (2) where C(r; z, l, d) is the count of rule r that occurs in the dth sentence of language l with parse tree z. [sent-71, score-0.57]
46 The multinomial weights are calculated as follows, πlθAt = exp(Eq(θlA)[log θlAt]). [sent-72, score-0.039]
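As a concrete illustration of how weights such as (3) can be computed: for a Dirichlet variational posterior q(θlA) = Dir(γlA), the expectation Eq[log θlAt] equals Ψ(γlAt) − Ψ(∑t′ γlAt′), so each weight is an exponentiated digamma difference. The sketch below assumes numpy and scipy; the variable names and values are hypothetical, not from the paper.

```python
import numpy as np
from scipy.special import digamma

def multinomial_weights(gamma):
    """Exponentials of the expected log-probabilities under q = Dir(gamma)."""
    return np.exp(digamma(gamma) - digamma(gamma.sum()))

# Example: rule-type weights (emission, binary) for one nonterminal in one language.
gamma_theta_lA = np.array([3.2, 7.8])   # hypothetical variational Dirichlet parameters
print(multinomial_weights(gamma_theta_lA))
```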
47 The common rule type parameter αAθt that minimizes the KL divergence between the true posterior and the approximate posterior can be obtained by using the fixed-point iteration method described in (Minka, 2000). [sent-74, score-0.432]
48 The update rule is as follows, αAθt(new) ← (aθ − 1 + αAθt ∑l (Ψ(∑t′ γlθAt′) − Ψ(γlθAt))) / (bθ + L (Ψ(∑t′ αAθt′) − Ψ(αAθt))), (9) where L is the number of languages, and Ψ(x) = ∂ log Γ(x)/∂x is the digamma function. [sent-75, score-0.08]
49 Similarly, the common production parameter αAφBC can be updated as follows, αAφBC(new) ← (aφ − 1 + αAφBC ∑l J′lABC) / (bφ + L JABC), (10) where JABC = Ψ(∑B′,C′ αAφB′C′) − Ψ(αAφBC), and J′lABC = Ψ(∑B′,C′ γlφAB′C′) − Ψ(γlφABC). [sent-76, score-0.172]
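The sketch below shows how a Minka-style fixed-point update of the common rule-type parameters could look, following the structure of (9)–(10) as written above; since (9) is reconstructed from a garbled source, the exact arrangement of terms in the original paper may differ, and the function and variable names are assumptions for illustration only.

```python
import numpy as np
from scipy.special import digamma

def update_common_alpha(alpha, gammas, a=1.0, b=1.0, n_iter=50):
    """Fixed-point update of the common Dirichlet parameters alpha (length T),
    given per-language variational Dirichlet parameters gammas (L x T)."""
    L = gammas.shape[0]
    for _ in range(n_iter):
        num = a - 1.0 + alpha * (digamma(gammas.sum(axis=1, keepdims=True)) - digamma(gammas)).sum(axis=0)
        den = b + L * (digamma(alpha.sum()) - digamma(alpha))
        alpha = num / den
    return alpha

alpha0 = np.ones(2)                                        # initial common rule-type parameters
gammas = np.array([[3.2, 7.8], [2.5, 9.1], [4.0, 6.0]])    # hypothetical posteriors for three languages
print(update_common_alpha(alpha0, gammas))
```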
50 Since the factored variational distributions depend on each other, an optimal approximated posterior can be obtained by updating the parameters by (2)–(10) alternately until convergence. [sent-77, score-0.248]
51 The updating of the language-dependent distributions by (2)–(8) is also described in (Kurihara and Sato, 2004; Liang et al. [sent-78, score-0.102]
52 , 2007) while the updating of common grammar parameters by (9) and (10) is new. [sent-79, score-0.388]
53 The inference can be carried out efficiently using the inside-outside algorithm based on dynamic programming (Lari and Young, 1990). [sent-80, score-0.052]
54 After the inference, the probability of a common grammar rule A → BC is calculated by p(A → BC) = θˆA1 φˆABC, where θˆA1 = αAθ1/(αAθ0 + αAθ1) and φˆABC = αAφBC/∑B′,C′ αAφB′C′ represent the mean values of θlA1 and φlABC, respectively. [sent-81, score-0.339]
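As a small illustration of this read-out step, the sketch below computes the common probability of one binary rule from learned common parameters by taking Dirichlet means, as in the formula above; the variable names and numeric values are hypothetical.

```python
import numpy as np

def common_rule_probability(alpha_theta_A, alpha_phi_A, bc_index):
    """alpha_theta_A: [alpha_A0 (emission), alpha_A1 (binary)];
    alpha_phi_A: common production parameters over all (B, C) pairs for nonterminal A."""
    theta_hat_1 = alpha_theta_A[1] / alpha_theta_A.sum()    # mean probability of choosing a binary rule
    phi_hat = alpha_phi_A[bc_index] / alpha_phi_A.sum()     # mean probability of this (B, C) pair
    return theta_hat_1 * phi_hat

alpha_theta_A = np.array([2.0, 6.0])            # hypothetical learned rule-type parameters
alpha_phi_A = np.array([5.0, 1.0, 0.5, 0.5])    # hypothetical values over four (B, C) pairs
print(common_rule_probability(alpha_theta_A, alpha_phi_A, bc_index=0))
```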
55 We set the number of nonterminals at |K| = 20, and omitted sentences with more than ten words for tractability. [sent-84, score-0.17]
56 Figure 2 shows the most probable terminals of emission for each language and nonterminal with a high probability of selecting the emission rule. [sent-87, score-0.491]
57 Figure 2: Probable terminals of emission for each language and nonterminal (e.g. nonterminal 13: determiner (DT)). [sent-89, score-0.236]
58 Figure 3: Examples of inferred common grammar rules in eleven languages, and their probabilities. [sent-104, score-0.363]
59 We named nonterminals by using grammatical categories after the inference. [sent-106, score-0.175]
60 We can see that words in the same grammatical category are clustered across languages as well as within a language. [sent-107, score-0.211]
61 Figure 3 shows examples of inferred common grammar rules with high probabilities. [sent-108, score-0.259]
62 Grammar rules that seem to be common to European languages have been extracted. [sent-109, score-0.211]
63 5 Discussion We have proposed a Bayesian hierarchical PCFG model for capturing commonalities at the syntax level for non-parallel multilingual corpora. [sent-110, score-0.457]
64 For example, we can infer the number of nonterminals with a nonparametric Bayesian model (Liang et al. [sent-115, score-0.191]
65 , 2007), infer the model more robustly based on a Markov chain Monte Carlo inference (Johnson et al. [sent-116, score-0.108]
66 , 2007), and use probabilistic grammar models other than PCFGs. [sent-117, score-0.225]
67 In our model, all the multilingual grammars are generated from a general model. [sent-118, score-0.279]
68 That model may help to infer an evolutionary tree of languages in terms of grammatical structure without the etymological information that is generally used (Gray and Atkinson, 2003). [sent-120, score-0.258]
69 Finally, the proposed approach may help to indicate the presence of a universal grammar (Chomsky, 1965), or to find it. [sent-121, score-0.17]
70 Two experiments on learning probabilistic dependency grammars from corpora. [sent-142, score-0.153]
71 Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. [sent-158, score-0.17]
72 Language-tree divergence times support the Anatolian theory of Indo-European origin. [sent-175, score-0.055]
73 Bayesian inference for PCFGs via Markov chain Monte Carlo. [sent-179, score-0.052]
74 Corpus-based induction of syntactic structure: models of dependency and constituency. [sent-198, score-0.058]
75 An application of the variational Bayesian approach to probabilistic context-free grammars. [sent-207, score-0.263]
76 The estimation of stochastic context-free grammars using the inside-outside algorithm. [sent-214, score-0.136]
77 Computer estimation of vocabulary in a protolanguage from word lists in four daughter languages. [sent-236, score-0.035]
78 Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. [sent-255, score-0.133]
wordName wordTfidf (topN-words)
[('la', 0.3), ('labc', 0.236), ('zld', 0.236), ('variational', 0.208), ('bc', 0.197), ('multilingual', 0.181), ('grammar', 0.17), ('dir', 0.167), ('pcfg', 0.153), ('abc', 0.148), ('emission', 0.143), ('commonalities', 0.14), ('nonterminals', 0.135), ('pcfgs', 0.128), ('languages', 0.122), ('lzi', 0.118), ('nonterminal', 0.112), ('wl', 0.109), ('law', 0.109), ('bayesian', 0.109), ('eleven', 0.104), ('gl', 0.104), ('kurihara', 0.104), ('grammars', 0.098), ('mult', 0.095), ('lat', 0.095), ('terminals', 0.093), ('common', 0.089), ('eq', 0.084), ('production', 0.083), ('posterior', 0.08), ('rule', 0.08), ('morristown', 0.072), ('parameters', 0.07), ('chomsky', 0.069), ('tli', 0.069), ('liang', 0.067), ('klein', 0.065), ('dth', 0.064), ('sato', 0.063), ('nj', 0.063), ('aw', 0.062), ('np', 0.062), ('updating', 0.059), ('lari', 0.059), ('syntax', 0.059), ('induction', 0.058), ('association', 0.058), ('multitask', 0.056), ('exp', 0.056), ('yu', 0.056), ('infer', 0.056), ('probabilistic', 0.055), ('divergence', 0.055), ('xl', 0.054), ('inference', 0.052), ('terminal', 0.051), ('prior', 0.049), ('across', 0.049), ('gray', 0.048), ('minimizes', 0.048), ('snyder', 0.048), ('dirichlet', 0.048), ('europarl', 0.046), ('blunsom', 0.046), ('draw', 0.045), ('monte', 0.045), ('manning', 0.045), ('vp', 0.045), ('generative', 0.043), ('dependent', 0.043), ('capturing', 0.042), ('koller', 0.042), ('contextfree', 0.042), ('percy', 0.041), ('grammatical', 0.04), ('thomas', 0.04), ('carroll', 0.04), ('tree', 0.04), ('dan', 0.039), ('multinomial', 0.039), ('daum', 0.039), ('approximated', 0.039), ('stochastic', 0.038), ('parse', 0.037), ('stolcke', 0.036), ('hierarchical', 0.035), ('parallel', 0.035), ('synchronous', 0.035), ('daichi', 0.035), ('mochihashi', 0.035), ('emit', 0.035), ('rul', 0.035), ('zll', 0.035), ('meters', 0.035), ('minka', 0.035), ('protolanguage', 0.035), ('tresp', 0.035), ('moaft', 0.035), ('aorn', 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 162 acl-2010-Learning Common Grammar from Multilingual Corpus
Author: Tomoharu Iwata ; Daichi Mochihashi ; Hiroshi Sawada
Abstract: We propose a corpus-based probabilistic framework to extract hidden common syntax across languages from non-parallel multilingual corpora in an unsupervised fashion. For this purpose, we assume a generative model for multilingual corpora, where each sentence is generated from a language dependent probabilistic contextfree grammar (PCFG), and these PCFGs are generated from a prior grammar that is common across languages. We also develop a variational method for efficient inference. Experiments on a non-parallel multilingual corpus of eleven languages demonstrate the feasibility of the proposed method.
Author: Mark Johnson
Abstract: This paper establishes a connection between two apparently very different kinds of probabilistic models. Latent Dirichlet Allocation (LDA) models are used as “topic models” to produce a lowdimensional representation of documents, while Probabilistic Context-Free Grammars (PCFGs) define distributions over trees. The paper begins by showing that LDA topic models can be viewed as a special kind of PCFG, so Bayesian inference for PCFGs can be used to infer Topic Models as well. Adaptor Grammars (AGs) are a hierarchical, non-parameteric Bayesian extension of PCFGs. Exploiting the close relationship between LDA and PCFGs just described, we propose two novel probabilistic models that combine insights from LDA and AG models. The first replaces the unigram component of LDA topic models with multi-word sequences or collocations generated by an AG. The second extension builds on the first one to learn aspects of the internal structure of proper names.
3 0.16566329 195 acl-2010-Phylogenetic Grammar Induction
Author: Taylor Berg-Kirkpatrick ; Dan Klein
Abstract: We present an approach to multilingual grammar induction that exploits a phylogeny-structured model of parameter drift. Our method does not require any translated texts or token-level alignments. Instead, the phylogenetic prior couples languages at a parameter level. Joint induction in the multilingual model substantially outperforms independent learning, with larger gains both from more articulated phylogenies and as well as from increasing numbers of languages. Across eight languages, the multilingual approach gives error reductions over the standard monolingual DMV averaging 21. 1% and reaching as high as 39%.
4 0.14269966 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
Author: Trevor Cohn ; Phil Blunsom
Abstract: Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently con- verges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.
5 0.13134341 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
Author: Elif Yamangil ; Stuart M. Shieber
Abstract: We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar.
6 0.11972844 169 acl-2010-Learning to Translate with Source and Target Syntax
7 0.11536831 214 acl-2010-Sparsity in Dependency Grammar Induction
8 0.11431765 255 acl-2010-Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
10 0.099846005 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
11 0.094409764 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System
12 0.082436264 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web
13 0.081112474 3 acl-2010-A Bayesian Method for Robust Estimation of Distributional Similarities
14 0.079930387 34 acl-2010-Authorship Attribution Using Probabilistic Context-Free Grammars
15 0.076308519 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar
16 0.07602261 217 acl-2010-String Extension Learning
17 0.074586175 133 acl-2010-Hierarchical Search for Word Alignment
18 0.073524855 260 acl-2010-Wide-Coverage NLP with Linguistically Expressive Grammars
19 0.07239607 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
20 0.072213501 14 acl-2010-A Risk Minimization Framework for Extractive Speech Summarization
topicId topicWeight
[(0, -0.215), (1, -0.071), (2, 0.035), (3, -0.032), (4, -0.072), (5, -0.069), (6, 0.168), (7, -0.045), (8, 0.13), (9, -0.12), (10, -0.02), (11, -0.052), (12, 0.128), (13, -0.065), (14, 0.01), (15, -0.122), (16, -0.004), (17, -0.069), (18, 0.037), (19, -0.062), (20, -0.191), (21, -0.131), (22, -0.011), (23, -0.044), (24, -0.062), (25, -0.062), (26, 0.006), (27, 0.094), (28, 0.064), (29, -0.032), (30, 0.038), (31, 0.027), (32, -0.065), (33, 0.065), (34, 0.178), (35, 0.042), (36, 0.005), (37, 0.058), (38, -0.02), (39, -0.115), (40, -0.087), (41, -0.033), (42, -0.017), (43, 0.081), (44, -0.05), (45, 0.099), (46, -0.036), (47, 0.002), (48, -0.015), (49, -0.082)]
simIndex simValue paperId paperTitle
same-paper 1 0.96188861 162 acl-2010-Learning Common Grammar from Multilingual Corpus
Author: Tomoharu Iwata ; Daichi Mochihashi ; Hiroshi Sawada
Abstract: We propose a corpus-based probabilistic framework to extract hidden common syntax across languages from non-parallel multilingual corpora in an unsupervised fashion. For this purpose, we assume a generative model for multilingual corpora, where each sentence is generated from a language dependent probabilistic contextfree grammar (PCFG), and these PCFGs are generated from a prior grammar that is common across languages. We also develop a variational method for efficient inference. Experiments on a non-parallel multilingual corpus of eleven languages demonstrate the feasibility of the proposed method.
Author: Mark Johnson
Abstract: This paper establishes a connection between two apparently very different kinds of probabilistic models. Latent Dirichlet Allocation (LDA) models are used as “topic models” to produce a lowdimensional representation of documents, while Probabilistic Context-Free Grammars (PCFGs) define distributions over trees. The paper begins by showing that LDA topic models can be viewed as a special kind of PCFG, so Bayesian inference for PCFGs can be used to infer Topic Models as well. Adaptor Grammars (AGs) are a hierarchical, non-parameteric Bayesian extension of PCFGs. Exploiting the close relationship between LDA and PCFGs just described, we propose two novel probabilistic models that combine insights from LDA and AG models. The first replaces the unigram component of LDA topic models with multi-word sequences or collocations generated by an AG. The second extension builds on the first one to learn aspects of the internal structure of proper names.
3 0.68451434 195 acl-2010-Phylogenetic Grammar Induction
Author: Taylor Berg-Kirkpatrick ; Dan Klein
Abstract: We present an approach to multilingual grammar induction that exploits a phylogeny-structured model of parameter drift. Our method does not require any translated texts or token-level alignments. Instead, the phylogenetic prior couples languages at a parameter level. Joint induction in the multilingual model substantially outperforms independent learning, with larger gains both from more articulated phylogenies and as well as from increasing numbers of languages. Across eight languages, the multilingual approach gives error reductions over the standard monolingual DMV averaging 21. 1% and reaching as high as 39%.
4 0.62267262 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
Author: Trevor Cohn ; Phil Blunsom
Abstract: Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently con- verges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.
5 0.56678492 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System
Author: Emily M. Bender ; Scott Drellishak ; Antske Fokkens ; Michael Wayne Goodman ; Daniel P. Mills ; Laurie Poulson ; Safiyyah Saleem
Abstract: This demonstration presents the LinGO Grammar Matrix grammar customization system: a repository of distilled linguistic knowledge and a web-based service which elicits a typological description of a language from the user and yields a customized grammar fragment ready for sustained development into a broad-coverage grammar. We describe the implementation of this repository with an emphasis on how the information is made available to users, including in-browser testing capabilities.
6 0.56173944 255 acl-2010-Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
7 0.55528927 214 acl-2010-Sparsity in Dependency Grammar Induction
8 0.55147928 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
9 0.54127198 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
10 0.52418888 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web
11 0.5226416 34 acl-2010-Authorship Attribution Using Probabilistic Context-Free Grammars
12 0.46172619 67 acl-2010-Computing Weakest Readings
13 0.45961705 217 acl-2010-String Extension Learning
14 0.44206804 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data
15 0.42696965 186 acl-2010-Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two
16 0.42289805 79 acl-2010-Cross-Lingual Latent Topic Extraction
17 0.39313006 234 acl-2010-The Use of Formal Language Models in the Typology of the Morphology of Amerindian Languages
18 0.38743034 260 acl-2010-Wide-Coverage NLP with Linguistically Expressive Grammars
19 0.38508976 61 acl-2010-Combining Data and Mathematical Models of Language Change
20 0.38311282 103 acl-2010-Estimating Strictly Piecewise Distributions
topicId topicWeight
[(14, 0.033), (25, 0.087), (31, 0.258), (33, 0.021), (39, 0.016), (42, 0.017), (59, 0.083), (69, 0.025), (71, 0.012), (73, 0.07), (76, 0.015), (78, 0.035), (83, 0.086), (84, 0.042), (98, 0.119)]
simIndex simValue paperId paperTitle
same-paper 1 0.77193397 162 acl-2010-Learning Common Grammar from Multilingual Corpus
Author: Tomoharu Iwata ; Daichi Mochihashi ; Hiroshi Sawada
Abstract: We propose a corpus-based probabilistic framework to extract hidden common syntax across languages from non-parallel multilingual corpora in an unsupervised fashion. For this purpose, we assume a generative model for multilingual corpora, where each sentence is generated from a language dependent probabilistic contextfree grammar (PCFG), and these PCFGs are generated from a prior grammar that is common across languages. We also develop a variational method for efficient inference. Experiments on a non-parallel multilingual corpus of eleven languages demonstrate the feasibility of the proposed method.
Author: Mark Johnson
Abstract: This paper establishes a connection between two apparently very different kinds of probabilistic models. Latent Dirichlet Allocation (LDA) models are used as “topic models” to produce a lowdimensional representation of documents, while Probabilistic Context-Free Grammars (PCFGs) define distributions over trees. The paper begins by showing that LDA topic models can be viewed as a special kind of PCFG, so Bayesian inference for PCFGs can be used to infer Topic Models as well. Adaptor Grammars (AGs) are a hierarchical, non-parameteric Bayesian extension of PCFGs. Exploiting the close relationship between LDA and PCFGs just described, we propose two novel probabilistic models that combine insights from LDA and AG models. The first replaces the unigram component of LDA topic models with multi-word sequences or collocations generated by an AG. The second extension builds on the first one to learn aspects of the internal structure of proper names.
3 0.61067861 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and treesubstitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-theart lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
4 0.60829365 71 acl-2010-Convolution Kernel over Packed Parse Forest
Author: Min Zhang ; Hui Zhang ; Haizhou Li
Abstract: This paper proposes a convolution forest kernel to effectively explore rich structured features embedded in a packed parse forest. As opposed to the convolution tree kernel, the proposed forest kernel does not have to commit to a single best parse tree, is thus able to explore very large object spaces and much more structured features embedded in a forest. This makes the proposed kernel more robust against parsing errors and data sparseness issues than the convolution tree kernel. The paper presents the formal definition of convolution forest kernel and also illustrates the computing algorithm to fast compute the proposed convolution forest kernel. Experimental results on two NLP applications, relation extraction and semantic role labeling, show that the proposed forest kernel significantly outperforms the baseline of the convolution tree kernel. 1
5 0.60024762 169 acl-2010-Learning to Translate with Source and Target Syntax
Author: David Chiang
Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a sim- ple approach that uses both source and target syntax for significant improvements in translation accuracy.
6 0.59916013 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
7 0.59724063 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
8 0.59714425 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
9 0.59607857 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
10 0.5956884 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
11 0.5954864 65 acl-2010-Complexity Metrics in an Incremental Right-Corner Parser
12 0.59548086 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System
13 0.59347987 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
14 0.59318894 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
15 0.59288573 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
16 0.59248841 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
17 0.59247804 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
18 0.59241569 116 acl-2010-Finding Cognate Groups Using Phylogenies
19 0.59229434 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
20 0.59217489 248 acl-2010-Unsupervised Ontology Induction from Text