acl acl2013 acl2013-348 acl2013-348-reference knowledge-graph by maker-knowledge-mining

348 acl-2013-The effect of non-tightness on Bayesian estimation of PCFGs

Source: pdf

Author: Shay B. Cohen ; Mark Johnson

Abstract: Probabilistic context-free grammars have the unusual property of not always defining tight distributions (i.e., the sum of the “probabilities” of the trees the grammar generates can be less than one). This paper reviews how this non-tightness can arise and discusses its impact on Bayesian estimation of PCFGs. We begin by presenting the notion of “almost everywhere tight grammars” and show that linear CFGs follow it. We then propose three different ways of reinterpreting non-tight PCFGs to make them tight, show that the Bayesian estimators in Johnson et al. (2007) are correct under one of them, and provide MCMC samplers for the other two. We conclude with a discussion of the impact of tightness empirically.

reference text

K. B. Atherya and P. E. Ney. 1972. Branching Processes. Dover Publications. Y. Bar-Hillel, M. Perles, and E. Shamir. 1964. On formal properties of simple phrase structure grammars. Language and Information: Selected Essays on Their Theory and Application, pages 116–150. T. L. Booth and R. A. Thompson. 1973. Applying probability measures to abstract languages. IEEE Transactions on Computers, C-22:442–450. Z. Chi and S. Geman. 1998. Estimation of probabilistic context-free grammars. Computational Linguistics, 24(2):299–305. Z. Chi. 1999. Statistical properties of probabilistic context-free grammars. Computational Linguistics, 25(1): 131–160. S. B. Cohen and N. A. Smith. 2012. Empirical risk minimization for probabilistic grammars: Sample complexity and hardness of learning. Computational Linguistics, 38(3):479–526. M. H. DeGroot. 1991 . Probability and Statistics (3rd edition). Addison-Wesley. M. Johnson, T. L. Griffiths, and S. Goldwater. 2007. Bayesian inference for PCFGs via Markov chain Monte Carlo. In Proceedings of NAACL. D. Klein and C. D. Manning. 2004. Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proceedings of ACL. K. Kurihara and T. Sato. 2006. Variational Bayesian grammar induction for natural language. In 8th International Colloquium on Grammatical Inference. K. Lari and S.J. Young. 1990. The estimation of Stochastic Context-Free Grammars using the InsideOutside algorithm. Computer Speech and Language, 4(35-56). M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn treebank. Computational Linguistics, 19:313–330. M.-J. Nederhof and G. Satta. 2008. Computing partition functions of PCFGs. Research on Language and Computation, 6(2): 139–162. C. P. Robert and G. Casella. 2004. Monte Carlo Statistical Methods. Springer-Verlag New York. N. A. Smith. 2006. Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text. Ph.D. thesis, Johns Hopkins University. C. S. Wetherell. 1980. Probabilistic languages: A review and some open questions. Computing Surveys, 12:361–379. 1041