jmlr jmlr2010 jmlr2010-53 jmlr2010-53-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Trevor Cohn, Phil Blunsom, Sharon Goldwater
Abstract: Inducing a grammar from text has proven to be a notoriously challenging learning task despite decades of research. The primary reason for its difficulty is that in order to induce plausible grammars, the underlying model must be capable of representing the intricacies of language while also ensuring that it can be readily learned from data. The majority of existing work on grammar induction has favoured model simplicity (and thus learnability) over representational capacity by using context free grammars and first order dependency grammars, which are not sufficiently expressive to model many common linguistic constructions. We propose a novel compromise by inferring a probabilistic tree substitution grammar, a formalism which allows for arbitrarily large tree fragments and thereby better represent complex linguistic structures. To limit the model’s complexity we employ a Bayesian non-parametric prior which biases the model towards a sparse grammar with shallow productions. We demonstrate the model’s efficacy on supervised phrase-structure parsing, where we induce a latent segmentation of the training treebank, and on unsupervised dependency grammar induction. In both cases the model uncovers interesting latent linguistic structures while producing competitive results. Keywords: grammar induction, tree substitution grammar, Bayesian non-parametrics, Pitman-Yor process, Chinese restaurant process
Steven Paul Abney. The English Noun Phrase in its Sentential Aspect. PhD thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts, 1987. ´ David Aldous. Exchangeability and related topics. In Ecole d’´ t´ de probabilit´ s de Saint-Flour, ee e XIII—1983, pages 1–198. Springer, Berlin, 1985. Phil Blunsom and Trevor Cohn. Unsupervised induction of tree substitution grammars for dependency parsing. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1204–1213, Boston, Massachusetts, October 2010. 3091 C OHN , B LUNSOM AND G OLDWATER Rens Bod. Using an annotated language corpus as a virtual stochastic grammar. In 11th National Conference on Artificial Intelligence, pages 778–783, Washington D.C., USA, July 1993. Rens Bod. The problem of computing the most probable tree in data-oriented parsing and stochastic tree grammars. In Proceedings of the 7th conference on European chapter of the Association for Computational Linguistics, pages 104–111, Dublin, Ireland, 1995. Rens Bod. Combining semantic and syntactic structure for language modeling. In Proceedings of the 6th International Conference on Spoken Language Processing, pages 106–109, Beijing, China, 2000. Rens Bod. An efficient implementation of a new DOP model. In Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics, Budapest, Hungary, April 2003. Rens Bod. An all-subtrees approach to unsupervised parsing. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 865–872, Sydney, Australia, July 2006. Rens Bod, Remko Scha, and Khalil Sima’an, editors. Data-oriented parsing. Center for the Study of Language and Information — Studies in Computational Linguistics. University of Chicago Press, 2003. Glenn Carroll and Eugene Charniak. Two experiments on learning probabilistic dependency grammars from corpora. In Proceedings of the AAAI Workshop on Statistically-Based Natural Language Processing Techniques, San Jose, California, 1992. Eugene Charniak and Mark Johnson. Coarse-to-fine n-best parsing and maxent discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 173–180, Ann Arbor, Michigan, June 2005. David Chiang and Daniel M. Bikel. Recovering latent information in treebanks. In Proceedings of the 19th International Conference on Computational Linguistics, pages 183–189, Taipei, Taiwan, 2002. Alexander Clark. Unsupervised induction of stochastic context-free grammars using distributional clustering. In Proceedings of the 2001 workshop on Computational Natural Language Learning, pages 1–8, Toulouse, France, 2001. Shay B. Cohen and Noah A. Smith. Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 74–82, 2009. Shay B. Cohen, Kevin Gimpel, and Noah A. Smith. Logistic normal priors for unsupervised probabilistic grammar induction. In Daphne Koller, Dale Schuurmans, Yoshua Bengio, and Lon Bottou, editors, Advances in Neural Information Processing Systems 21, pages 321–328. 2009. 3092 I NDUCING T REE S UBSTITUTION G RAMMARS Shay B. Cohen, David M. Blei, and Noah A. Smith. Variational inference for adaptor grammars. In Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 564–572, 2010. Trevor Cohn and Phil Blunsom. Blocked inference in Bayesian tree substitution grammars. In Proceedings of the ACL 2010 Conference Short Papers, pages 225–230, Uppsala, Sweden, July 2010. Trevor Cohn, Sharon Goldwater, and Phil Blunsom. Inducing compact but accurate tree-substitution grammars. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 548–556, Boulder, Colorado, June 2009. Jason Eisner. Bilexical grammars and their cubic-time parsing algorithms. In Harry Bunt and Anton Nijholt, editors, Advances in Probabilistic and Other Parsing Technologies, pages 29–62. Kluwer Academic Publishers, October 2000. Thomas S. Ferguson. A bayesian analysis of some nonparametric problems. Annals of Statistics, 1(2):209–230, 1973. Jenny Rose Finkel, Trond Grenager, and Christopher D. Manning. The infinite tree. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 272–279, Prague, Czech Republic, June 2007. Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721– 741, 1984. E. Mark Gold. Language identification in the limit. Information and Control, 10(5):447–474, 1967. Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. Contextual dependencies in unsupervised word segmentation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 673–680, Sydney, Australia, July 2006. Joshua Goodman. Parsing Inside-Out. PhD thesis, Harvard University, 1998. Joshua Goodman. Efficient parsing of DOP with PCFG-reductions. In Bod et al. (2003), chapter 8. William P. Headden III, Mark Johnson, and David McClosky. Improving unsupervised dependency parsing with richer contexts and smoothing. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 101–109, Boulder, Colorado, June 2009. Hemant Ishwaran and Lancelot F. James. Generalized weighted Chinese restaurant processes for species sampling mixture models. Statistica Sinica, 13:1211–1235, 2003. Mark Johnson. The DOP estimation method is biased and inconsistent. Computational Lingusitics, 28(1):71–76, March 2002. 3093 C OHN , B LUNSOM AND G OLDWATER Mark Johnson. Transforming projective bilexical dependency grammars into efficiently-parsable cfgs with unfold-fold. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 168–175, Prague, Czech Republic, June 2007. Mark Johnson. Using adaptor grammars to identify synergies in the unsupervised acquisition of linguistic structure. In Proceedings of ACL-08: HLT, pages 398–406, Columbus, Ohio, June 2008a. Mark Johnson. Unsupervised word segmentation for Sesotho using adaptor grammars. In Proceedings of the Tenth Meeting of ACL Special Interest Group on Computational Morphology and Phonology, pages 20–27, Columbus, Ohio, June 2008b. Mark Johnson and Sharon Goldwater. Improving nonparameteric bayesian inference: experiments on unsupervised word segmentation with adaptor grammars. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 317–325, Boulder, Colorado, June 2009. Mark Johnson, Thomas Griffiths, and Sharon Goldwater. Bayesian inference for PCFGs via Markov chain Monte Carlo. In Proceedings of Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, pages 139–146, Rochester, NY, April 2007a. Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models. In B. Sch¨ lkopf, J. Platt, and o T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 641–648. 2007b. Aravind Joshi. Tree adjoining grammars. In Ruslan Mikkov, editor, The Oxford Handbook of Computational Linguistics, pages 483–501. Oxford University Press, Oxford, England, 2003. Dan Klein and Christopher D. Manning. A generative constituent-context model for improved grammar induction. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, pages 128–135, Philadelphia, Pennsylvania, USA, July 2002. Dan Klein and Christopher D. Manning. Corpus-based induction of syntactic structure: models of dependency and constituency. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, pages 478–485, 2004. Karim Lari and Steve J. Young. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language, 4:35–56, 1990. Percy Liang, Slav Petrov, Michael Jordan, and Dan Klein. The infinite PCFG using hierarchical Dirichlet processes. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 688–697, Prague, Czech Republic, June 2007. Percy Liang, Michael I. Jordan, and Dan Klein. Type-based mcmc. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 573–581, Los Angeles, California, June 2010. 3094 I NDUCING T REE S UBSTITUTION G RAMMARS Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of English: the Penn treebank. Computational Linguistics, 19(2):313–330, 1993. ˇ Igor′ A. Mel′ cuk. Dependency Syntax: Theory and Practice. State University of New York Press, Albany, 1988. Bernard Merialdo. Tagging English text with a probabilistic model. Computational Linguistics, 20 (2):155–172, 1994. Radford Neal. Slice sampling. Annals of Statistics, 31:705–767, 2003. Timothy J. O’Donnell, Noah D. Goodman, and Joshua B. Tenenbaum. Fragment grammar: Exploring reuse in hierarchical generative processes. Technical Report MIT-CSAIL-TR-2009-013, MIT, 2009. Slav Petrov. Products of random latent variable grammars. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 19–27, Los Angeles, California, June 2010. Slav Petrov and Dan Klein. Improved inference for unlexicalized parsing. In Proceedings of Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, pages 404–411, Rochester, NY, April 2007. Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. Learning accurate, compact, and interpretable tree annotation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 433–440, Sydney, Australia, July 2006. Jim Pitman. Exchangeable and partially exchangeable random partitions. Probability Theory and Related Fields, 102:145–158, 1995. Jim Pitman. Combinatorial Stochastic Processes. Springer-Verlag, New York, 2006. Jim Pitman and Marc Yor. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Annals of Probability, 25:855–900, 1997. Matt Post and Daniel Gildea. Bayesian learning of a tree substitution grammar. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 45–48, Suntec, Singapore, August 2009. Detlef Prescher, Remko Scha, Khalil Sima’an, and Andreas Zollmann. On the statistical consistency of DOP estimators. In Proceedings of the 14th Meeting of Computational Linguistics in the Netherlands, Antwerp, Belgium, 2004. Valentin I. Spitkovsky, Hiyan Alshawi, and Daniel Jurafsky. From Baby Steps to Leapfrog: How “Less is More” in unsupervised dependency parsing. In Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 751–759, 2010. Fei Xia. Automatic grammar generation from two different perspectives. PhD thesis, University of Pennsylvania, 2002. 3095 C OHN , B LUNSOM AND G OLDWATER Andreas Zollmann and Khalil Sima’an. A consistent and efficient estimator for data-oriented parsing. Journal of Automata, Languages and Combinatorics, 10(2):367–388, 2005. Willem Zuidema. Parsimonious data-oriented parsing. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 551–560, Prague, Czech Republic, June 2007. 3096