
Covariance in Unsupervised Learning of Probabilistic Grammars (JMLR, 2010)

Source: pdf

Authors: Shay B. Cohen, Noah A. Smith

Abstract: Probabilistic grammars offer great flexibility in modeling discrete sequential data like natural language text. Their symbolic component is amenable to inspection by humans, while their probabilistic component helps resolve ambiguity. They also permit the use of well-understood, general-purpose learning algorithms. There has been an increased interest in using probabilistic grammars in the Bayesian setting. To date, most of the literature has focused on using a Dirichlet prior. The Dirichlet prior has several limitations, including that it cannot directly model covariance between the probabilistic grammar's parameters. Yet, various grammar parameters are expected to be correlated because the elements in language they represent share linguistic properties. In this paper, we suggest an alternative to the Dirichlet prior, a family of logistic normal distributions. We derive an inference algorithm for this family of distributions and experiment with the task of dependency grammar induction, demonstrating performance improvements with our priors on a set of six treebanks in different natural languages. Our covariance framework permits soft parameter tying within grammars and across grammars for text in different languages, and we show empirical gains in a novel learning setting using bilingual, non-parallel data.

Keywords: dependency grammar induction, variational inference, logistic normal distribution, Bayesian inference
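To make the covariance point concrete, here is a small simulation. It is an illustrative sketch, not the authors' implementation. It assumes a K-dimensional Gaussian eta ~ N(mu, Sigma) mapped onto the probability simplex by a softmax (one common way to parametrize the logistic normal), a hypothetical 4-outcome multinomial standing in for one grammar distribution, and a hand-picked Sigma in which outcomes 0 and 1 are correlated. The helper names `sample_logistic_normal` and `dirichlet_offdiag_cov` are hypothetical. The script compares the empirical correlation between two components under this logistic normal and under a symmetric Dirichlet(1, 1, 1, 1).

```python
import numpy as np


def sample_logistic_normal(mu, sigma, n_samples, rng):
    """Draw points on the simplex: eta ~ N(mu, sigma) in R^K, theta = softmax(eta).

    Hypothetical helper; uses a K-dimensional (not K-1) Gaussian.
    """
    eta = rng.multivariate_normal(mu, sigma, size=n_samples)  # shape (n, K)
    eta = eta - eta.max(axis=1, keepdims=True)  # shift for numerical stability
    expd = np.exp(eta)
    return expd / expd.sum(axis=1, keepdims=True)  # each row sums to 1


def dirichlet_offdiag_cov(alpha, i, j):
    """Closed-form Cov(theta_i, theta_j), i != j, under Dirichlet(alpha)."""
    a0 = alpha.sum()
    return -alpha[i] * alpha[j] / (a0 ** 2 * (a0 + 1.0))


rng = np.random.default_rng(0)
K = 4
mu = np.zeros(K)
# Assumed covariance: outcomes 0 and 1 (e.g., two linguistically related
# grammar events) are strongly positively correlated in the Gaussian space.
sigma = np.eye(K)
sigma[0, 1] = sigma[1, 0] = 0.9

theta_ln = sample_logistic_normal(mu, sigma, 20000, rng)
corr_ln = np.corrcoef(theta_ln[:, 0], theta_ln[:, 1])[0, 1]

alpha = np.ones(K)
theta_dir = rng.dirichlet(alpha, size=20000)
corr_dir = np.corrcoef(theta_dir[:, 0], theta_dir[:, 1])[0, 1]

print(f"logistic normal corr(theta_0, theta_1): {corr_ln:+.2f}")
print(f"Dirichlet corr(theta_0, theta_1):       {corr_dir:+.2f}")
print(f"Dirichlet closed-form cov:              {dirichlet_offdiag_cov(alpha, 0, 1):+.4f}")
```

Under any Dirichlet(alpha), two distinct components have covariance -alpha_i alpha_j / (alpha_0^2 (alpha_0 + 1)), where alpha_0 is the sum of all alpha_k. This is always negative, so no setting of alpha lets two grammar parameters tend to rise together. In the logistic normal, that kind of dependency is expressed directly through the off-diagonal entries of Sigma. With the settings above, the first correlation should come out positive and the Dirichlet one close to -1/3.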

reference text

S. Afonso, E. Bick, R. Haber, and D. Santos. Floresta sinta(c)tica: a treebank for Portuguese. In Proceedings of LREC, 2002.
A. Ahmed and E. Xing. On tight approximate inference of the logistic normal topic admixture model. In Proceedings of AISTATS, 2007.
J. Aitchison. The Statistical Analysis of Compositional Data. Chapman and Hall, London, 1986.
H. Alshawi and A. L. Buchsbaum. Head automata and bilingual tiling: Translation with minimal representations. In Proceedings of ACL, 1996.
D. Angluin. Learning regular sets from queries and counterexamples. Information and Computation, pages 87–106, 1988.
N. B. Atalay, K. Oflazer, and B. Say. The annotation process in the Turkish treebank. In Proceedings of LINC, 2003.
J. Baker. Trainable grammars for speech recognition. In The 97th meeting of the Acoustical Society of America, 1979.
A. Banerjee. On Bayesian bounds. In Proceedings of ICML, 2006.
T. Berg-Kirkpatrick and D. Klein. Phylogenetic grammar induction. In Proceedings of ACL, 2010.
T. Berg-Kirkpatrick, A. Bouchard-Côté, J. DeNero, and D. Klein. Unsupervised learning with features. In Proceedings of NAACL, 2010.
D. M. Blei and J. D. Lafferty. Correlated topic models. In Proceedings of NIPS, 2006.
D. M. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
J. L. Boyd-Graber and D. M. Blei. Syntactic topic models. CoRR, abs/1002.4665, 2010.
P. F. Brown, V. J. Della Pietra, P. V. deSouza, J. C. Lai, and R. L. Mercer. Class-based n-gram models of natural language. Computational Linguistics, 1990.
S. Buchholz and E. Marsi. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of CoNLL, 2006.
D. Burkett and D. Klein. Two languages are better than one (for syntactic parsing). In Proceedings of EMNLP, 2008.
G. Carroll and E. Charniak. Two experiments on learning probabilistic dependency grammars from corpora. Technical report, Brown University, 1992.
E. Charniak and M. Johnson. Coarse-to-fine n-best parsing and maxent discriminative reranking. In Proceedings of ACL, 2005.
S. F. Chen. Bayesian grammar induction for language modeling. In Proceedings of ACL, 1995.
D. Chiang. A hierarchical phrase-based model for statistical machine translation. In Proceedings of ACL, 2005.
A. Clark and F. Thollard. PAC-learnability of probabilistic deterministic finite state automata. Journal of Machine Learning Research, 5:473–497, 2004.
A. Clark, R. Eyraud, and A. Habrard. A polynomial algorithm for the inference of context free languages. In Proceedings of ICGI, 2008.
S. B. Cohen and N. A. Smith. Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. In Proceedings of NAACL-HLT, 2009.
S. B. Cohen and N. A. Smith. Empirical risk minimization with approximations of probabilistic grammars. In Proceedings of NIPS, 2010.
S. B. Cohen, K. Gimpel, and N. A. Smith. Logistic normal priors for unsupervised probabilistic grammar induction. In Proceedings of NIPS, 2008.
S. B. Cohen, D. M. Blei, and N. A. Smith. Variational inference for adaptor grammars. In Proceedings of NAACL, 2010.
T. Cohn, S. Goldwater, and P. Blunsom. Inducing compact but accurate tree-substitution grammars. In Proceedings of HLT-NAACL, 2009.
M. Collins. Head-Driven Statistical Models for Natural Language Parsing. PhD thesis, U. Penn., 1999.
M. Collins. Head-driven statistical models for natural language processing. Computational Linguistics, 29:589–637, 2003.
I. Dagan. Two languages are more informative than one. In Proceedings of ACL, 1991.
D. Das, N. Schneider, D. Chen, and N. A. Smith. Probabilistic frame-semantic parsing. In Proceedings of ACL, 2010.
C. de la Higuera. A bibliographical study of grammatical inference. Pattern Recognition, 38:1332–1348, 2005.
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Proceedings of OSDI, 2004.
A. Dempster, N. Laird, and D. Rubin. Maximum likelihood estimation from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39:1–38, 1977.
Y. Ding and M. Palmer. Machine translation using probabilistic synchronous dependency insertion grammars. In Proceedings of ACL, 2005.
J. Eisner. Bilexical grammars and a cubic-time probabilistic parser. In Proceedings of IWPT, 1997.
J. R. Finkel, T. Grenager, and C. D. Manning. The infinite tree. In Proceedings of ACL, 2007.
H. Gaifman. Dependency systems and phrase-structure systems. Information and Control, 8, 1965.
K. Ganchev, J. Gillenwater, and B. Taskar. Dependency grammar induction via bitext projection constraints. In Proceedings of ACL, 2009.
J. Gillenwater, K. Ganchev, J. Graça, F. Pereira, and B. Taskar. Sparsity in dependency grammar induction. In Proceedings of ACL, 2010.
K. Gimpel and N. A. Smith. Feature-rich translation by quasi-synchronous lattice parsing. In Proceedings of EMNLP, 2009.
S. Goldwater. Nonparametric Bayesian models of lexical acquisition. PhD thesis, Brown University, 2006.
S. Goldwater and T. L. Griffiths. A fully Bayesian approach to unsupervised part-of-speech tagging. In Proceedings of ACL, 2007.
A. Haghighi, P. Liang, T. Berg-Kirkpatrick, and D. Klein. Learning bilingual lexicons from monolingual corpora. In Proceedings of ACL, 2008.
J. Hajič, A. Böhmová, E. Hajičová, and B. Vidová Hladká. The Prague dependency treebank: A three-level annotation scenario. Treebanks: Building and Using Parsed Corpora, pages 103–127, 2000.
W. P. Headden, M. Johnson, and D. McClosky. Improving unsupervised dependency parsing with richer contexts and smoothing. In Proceedings of NAACL-HLT, 2009.
G. E. Hinton. Products of experts. In Proceedings of ICANN, 1999.
R. Hwa, P. Resnik, A. Weinberg, C. Cabezas, and O. Kolak. Bootstrapping parsers via syntactic projection across parallel texts. Journal of Natural Language Engineering, 11(3):311–325, 2005.
R. Johansson and P. Nugues. LTH: Semantic structure extraction using nonprojective dependency trees. In Proceedings of SemEval, 2007.
M. Johnson. Why doesn’t EM find good HMM POS-taggers? In Proceedings of EMNLP-CoNLL, 2007.
M. Johnson, T. L. Griffiths, and S. Goldwater. Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models. In Proceedings of NIPS, 2006.
M. Johnson, T. L. Griffiths, and S. Goldwater. Bayesian inference for PCFGs via Markov chain Monte Carlo. In Proceedings of NAACL, 2007.
Y. Kawata and J. Bartels. Stylebook for the Japanese treebank in VERBMOBIL. Technical Report Verbmobil-Report 240, Seminar für Sprachwissenschaft, Universität Tübingen, 2000.
D. Klein and C. D. Manning. A generative constituent-context model for improved grammar induction. In Proceedings of ACL, 2002.
D. Klein and C. D. Manning. Accurate unlexicalized parsing. In Proceedings of ACL, pages 423–430, 2003.
D. Klein and C. D. Manning. Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proceedings of ACL, 2004.
K. Kurihara and T. Sato. Variational Bayesian grammar induction for natural language. In Proceedings of ICGI, 2006.
K. Lari and S. J. Young. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language, 4:35–56, 1990.
P. Liang, S. Petrov, M. Jordan, and D. Klein. The infinite PCFG using hierarchical Dirichlet processes. In Proceedings of EMNLP, 2007.
D. Lin. A path-based transfer model for machine translation. In Proceedings of COLING, 2004.
M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz. Building a large annotated corpus of English: The Penn treebank. Computational Linguistics, 19:313–330, 1993.
D. McAllester. PAC-Bayesian model averaging. Machine Learning Journal, 5:5–21, 2003.
D. Mimno, H. Wallach, and A. McCallum. Gibbs sampling for logistic normal topic models with graph-based priors. In Proceedings of NIPS Workshop on Analyzing Graphs, 2008.
J. Nivre, J. Hall, S. Kübler, R. McDonald, J. Nilsson, S. Riedel, and D. Yuret. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL Shared Task, EMNLP-CoNLL, 2007.
K. Oflazer, B. Say, D. Z. Hakkani-Tür, and G. Tür. Building a Turkish treebank. In A. Abeillé, editor, Building and Exploiting Syntactically-Annotated Corpora. Kluwer, 2003.
F. C. N. Pereira and Y. Schabes. Inside-outside reestimation from partially bracketed corpora. In Proceedings of ACL, 1992.
H. Raiffa and R. Schlaifer. Applied Statistical Decision Theory. Wiley-Interscience, 1961.
M. Seeger. PAC-Bayesian generalization bounds for Gaussian processes. Journal of Machine Learning Research, 3:233–269, 2002.
D. A. Smith and N. A. Smith. Bilingual parsing with factored estimation: Using English to parse Korean. In Proceedings of EMNLP, 2004.
N. A. Smith. Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text. PhD thesis, Johns Hopkins University, 2006.
N. A. Smith and J. Eisner. Guiding unsupervised grammar induction using contrastive estimation. In Proceedings of IJCAI Workshop on Grammatical Inference Applications, 2005.
N. A. Smith and J. Eisner. Annealing structural bias in multilingual weighted grammar induction. In Proceedings of COLING-ACL, 2006.
B. Snyder and R. Barzilay. Unsupervised multilingual learning for morphological segmentation. In Proceedings of ACL, 2008.
V. I. Spitkovsky, H. Alshawi, and D. Jurafsky. From baby steps to leapfrog: How “less is more” in unsupervised dependency parsing. In Proceedings of NAACL, 2010a.
V. I. Spitkovsky, H. Alshawi, D. Jurafsky, and C. D. Manning. Viterbi training improves unsupervised dependency parsing. In Proceedings of CoNLL, 2010b.
V. I. Spitkovsky, D. Jurafsky, and H. Alshawi. Profiting from mark-up: Hyper-text annotations for guided parsing. In Proceedings of ACL, 2010c.
L. Tesnière. Éléments de Syntaxe Structurale. Klincksieck, 1959.
K. Toutanova and M. Johnson. A Bayesian LDA-based model for semi-supervised part-of-speech tagging. In Proceedings of NIPS, 2007.
M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1:1–305, 2008.
M. Wang, N. A. Smith, and T. Mitamura. What is the Jeopardy model? A quasi-synchronous grammar for question answering. In Proceedings of EMNLP, 2007.
D. Wu. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23(3):377–404, 1997.
N. Xue, F. Xia, F.-D. Chiou, and M. Palmer. The Penn Chinese Treebank: Phrase structure annotation of a large corpus. Natural Language Engineering, 10(4):1–30, 2004.
D. Yarowsky, G. Ngai, and R. Wicentowski. Inducing multilingual text analysis tools via robust projection across aligned corpora. In Proceedings of HLT, 2001.