jmlr jmlr2011 jmlr2011-77 jmlr2011-77-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jennifer Gillenwater, Kuzman Ganchev, João Graça, Fernando Pereira, Ben Taskar
Abstract: A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 different languages, we achieve significant gains in directed attachment accuracy over the standard expectation maximization (EM) baseline, with an average accuracy improvement of 6.5%, outperforming EM by at least 1% for 9 out of 12 languages. Furthermore, the new method outperforms models based on standard Bayesian sparsity-inducing parameter priors with an average improvement of 5% and positive gains of at least 1% for 9 out of 12 languages. On English text in particular, we show that our approach improves performance over other state-of-the-art techniques.
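To make the sparsity bias above concrete, here is a minimal Python sketch of one way to score how many distinct parent-child POS tag pairs a set of posterior edge probabilities uses. It assumes an ℓ1/ℓ∞-style measure: for each tag pair, take the largest posterior over all of its occurrences in the corpus, then sum over pairs. The abstract does not spell out the exact penalty, so the function name `l1_linf_penalty`, the input layout, and the example probabilities are illustrative assumptions rather than the authors' implementation.

```python
from collections import defaultdict


def l1_linf_penalty(edge_posteriors):
    """Sum, over (parent_tag, child_tag) pairs, of the maximum posterior
    probability assigned to any single occurrence of that pair.

    edge_posteriors: iterable of (parent_tag, child_tag, probability)
    triples, one per candidate dependency edge (hypothetical layout).
    """
    max_per_pair = defaultdict(float)
    for parent_tag, child_tag, prob in edge_posteriors:
        key = (parent_tag, child_tag)
        # Inner l-infinity norm: keep only the largest posterior per tag pair.
        max_per_pair[key] = max(max_per_pair[key], prob)
    # Outer l1 norm: add up the per-pair maxima.
    return sum(max_per_pair.values())


# Hypothetical posteriors: mass concentrated on two tag pairs ...
concentrated = [
    ("VERB", "NOUN", 0.9),
    ("VERB", "NOUN", 0.8),
    ("NOUN", "DET", 0.9),
]
# ... versus mass spread thinly over four tag pairs.
spread = [
    ("VERB", "NOUN", 0.5),
    ("NOUN", "VERB", 0.5),
    ("NOUN", "DET", 0.5),
    ("DET", "NOUN", 0.5),
]

print(l1_linf_penalty(concentrated))
print(l1_linf_penalty(spread))
```

Under this measure, reusing an already active tag pair costs nothing extra, while each new pair adds to the total, so posteriors that rely on few dependency types score lower than posteriors that spread mass over many.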
Title: Posterior Sparsity in Unsupervised Dependency Parsing
References:
S. Afonso, E. Bick, R. Haber, and D. Santos. Floresta Sinta(c)tica: A treebank for Portuguese. In Proc. LREC, 2002.
K. Bellare, G. Druck, and A. McCallum. Alternating projections for learning with expectation constraints. In Proc. UAI, 2009.
D.P. Bertsekas. Nonlinear Programming. Athena Scientific, 1995.
A. Bohomovà, J. Hajic, E. Hajicova, and B. Hladka. The Prague dependency treebank: Three-level annotation scenario. In Anne Abeillé, editor, Treebanks: Building and Using Syntactically Annotated Corpora. Kluwer Academic Publishers, 2001.
S. Brants, S. Dipper, S. Hansen, W. Lezius, and G. Smith. The TIGER treebank. In Proceedings of the Workshop on Treebanks and Linguistic Theories, 2002.
M. Civit and M.A. Martí. Building cast3lb: A Spanish treebank. Research on Language & Computation, 2004.
S.B. Cohen and N.A. Smith. The shared logistic normal distribution for grammar induction. In Proc. NAACL, 2009.
S.B. Cohen, K. Gimpel, and N.A. Smith. Logistic normal priors for unsupervised probabilistic grammar induction. In Proc. NIPS, 2008.
A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1):1–38, 1977.
S. Džeroski, T. Erjavec, N. Ledinek, P. Pajas, Z. Žabokrtsky, and A. Žele. Towards a Slovene dependency treebank. In Proc. LREC, 2006.
J. Finkel, T. Grenager, and C. Manning. The infinite tree. In Proc. ACL, 2007.
K. Ganchev, J. Graça, J. Gillenwater, and B. Taskar. Posterior regularization for structured latent variable models. Journal of Machine Learning Research, 2010.
J. Graça, K. Ganchev, and B. Taskar. Expectation maximization and posterior constraints. In Proc. NIPS, 2007.
J. Graça, K. Ganchev, B. Taskar, and F. Pereira. Posterior sparsity vs parameter sparsity. In Proc. NIPS, 2009.
J. Graça, K. Ganchev, and B. Taskar. Learning tractable word alignment models with complex constraints. Computational Linguistics, 2010.
W. Headden III, D. McClosky, and E. Charniak. Evaluating unsupervised part-of-speech tagging for grammar induction. In Proc. CoNLL, 2008.
W.P. Headden III, M. Johnson, and D. McClosky. Improving unsupervised dependency parsing with richer contexts and smoothing. In Proc. NAACL, 2009.
M. Johnson, T.L. Griffiths, and S. Goldwater. Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models. In Proc. NIPS, 2007.
Y. Kawata and J. Bartels. Stylebook for the Japanese treebank in VERBMOBIL. Technical report, Eberhard-Karls-Universitat Tubingen, 2000.
D. Klein and C. Manning. Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proc. ACL, 2004.
M.T. Kromann, L. Mikkelsen, and S.K. Lynge. Danish dependency treebank. In Proc. TLT, 2003.
K. Kurihara and T. Sato. An application of the variational Bayesian approach to probabilistic context-free grammars. In IJC-NLP Workshop: Beyond Shallow Analyses, 2004.
P. Liang, S. Petrov, M.I. Jordan, and D. Klein. The infinite PCFG using hierarchical Dirichlet processes. In Proc. EMNLP, 2007.
P. Liang, M.I. Jordan, and D. Klein. Learning from measurements in exponential families. In Proc. ICML, 2009.
G. Mann and A. McCallum. Simple, robust, scalable semi-supervised learning via expectation regularization. In Proc. ICML, 2007.
G. Mann and A. McCallum. Generalized expectation criteria for semi-supervised learning of conditional random fields. In Proc. ACL, 2008.
M. Marcus, M. Marcinkiewicz, and B. Santorini. Building a large annotated corpus of English: The Penn treebank. Computational Linguistics, 1993.
D. McClosky. Modeling valence effects in unsupervised grammar induction. Technical report, CS-09-01, Brown University, 2008.
R. Neal and G. Hinton. A new view of the EM algorithm that justifies incremental, sparse and other variants. In M. I. Jordan, editor, Learning in Graphical Models, pages 355–368. MIT Press, 1998.
J. Nilsson, J. Hall, and J. Nivre. MAMBA meets TIGER: Reconstructing a Swedish treebank from antiquity. NODALIDA Special Session on Treebanks, 2005.
K. Oflazer, B. Say, D.Z. Hakkani-Tür, and G. Tür. Building a Turkish treebank. Treebanks: Building and Using Parsed Corpora, 2003.
S. Ravi, J. Baldridge, and K. Knight. Minimized models and grammar-informed initialization for supertagging with highly ambiguous lexicons. In Proc. ACL, 2010.
R. Reichart and A. Rappoport. Automatic selection of high quality parses created by a fully unsupervised parser. In Proc. CoNLL, 2009.
K. Simov, P. Osenova, M. Slavcheva, S. Kolkovska, E. Balabanova, D. Doikoff, K. Ivanova, A. Simov, E. Simov, and M. Kouylekov. Building a linguistically interpreted corpus of Bulgarian: the BulTreebank. In Proc. LREC, 2002.
N. Smith. Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text. PhD thesis, Johns Hopkins University, 2006.
N. Smith and J. Eisner. Contrastive estimation: Training log-linear models on unlabeled data. In Proc. IJC-AI Workshop: Grammatical Inference Applications, 2005a.
N. Smith and J. Eisner. Guiding unsupervised grammar induction using contrastive estimation. In Proc. IJC-AI Workshop: Grammatical Inference Applications, 2005b.
N. Smith and J. Eisner. Annealing structural bias in multilingual weighted grammar induction. In Proc. ACL, 2006.
V. I. Spitkovsky, H. Alshawi, and D. Jurafsky. From baby steps to leapfrog: How “less is more” in unsupervised dependency parsing. In Proc. NAACL-HLT, 2010.
P. Tseng. An analysis of the EM algorithm and entropy-like proximal point methods. Mathematics of Operations Research, 29(1):27–44, 2004.
L. Van der Beek, G. Bouma, R. Malouf, and G. Van Noord. The Alpino dependency treebank. Language and Computers, 2002.