nips2012-156
Source: pdf
Author: Percy Liang, Daniel J. Hsu, Sham M. Kakade
Abstract: This paper explores unsupervised learning of parsing models along two directions. First, which models are identifiable from infinite data? We use a general technique for numerically checking identifiability based on the rank of a Jacobian matrix, and apply it to several standard constituency and dependency parsing models. Second, for identifiable models, how do we estimate the parameters efficiently? EM suffers from local optima, while recent work using spectral methods [1] cannot be directly applied since the topology of the parse tree varies across sentences. We develop a strategy, unmixing, which deals with this additional complexity for restricted classes of parsing models.
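The identifiability check mentioned in the abstract can be illustrated with a small numerical sketch (ours, not the authors' code, and on a toy mixture model rather than the parsing models studied in the paper): parametrize the model minimally, map parameters to a vector of observable moments, approximate the Jacobian of that map by finite differences at a random parameter point, and compare its numerical rank to the number of free parameters. Full rank is consistent with local identifiability; a rank deficit exhibits directions in which the parameters can move without changing the observable moments.

import itertools
import numpy as np

K, D = 2, 3                       # hidden states, observed symbols (toy sizes, our choice)
DIM = (K - 1) + K * (D - 1)       # minimal parametrization: 1 + 4 = 5 free parameters

def unpack(theta):
    """theta -> (mixing weights pi, row-stochastic emission matrix O)."""
    pi = np.append(theta[:K - 1], 1.0 - theta[:K - 1].sum())
    O = np.empty((K, D))
    O[:, :D - 1] = theta[K - 1:].reshape(K, D - 1)
    O[:, D - 1] = 1.0 - O[:, :D - 1].sum(axis=1)
    return pi, O

def moments(theta, views):
    """Flattened joint probability tensor of `views` i.i.d. observations."""
    pi, O = unpack(theta)
    T = np.zeros((D,) * views)
    for idx in itertools.product(range(D), repeat=views):
        T[idx] = sum(pi[h] * np.prod(O[h, list(idx)]) for h in range(K))
    return T.ravel()

def jacobian_rank(theta, views, eps=1e-5, tol=1e-6):
    """Numerical rank of the central-difference Jacobian of the moment map."""
    J = np.empty((D ** views, DIM))
    for j in range(DIM):
        e = np.zeros(DIM)
        e[j] = eps
        J[:, j] = (moments(theta + e, views) - moments(theta - e, views)) / (2 * eps)
    return np.linalg.matrix_rank(J, tol=tol)

# Generic (random) parameter point in the interior of the simplex constraints.
rng = np.random.default_rng(0)
pi = rng.dirichlet(np.ones(K))
O = rng.dirichlet(np.ones(D), size=K)
theta = np.concatenate([pi[:K - 1], O[:, :D - 1].ravel()])

# Expected at a generic point: rank 5 (= DIM) from third-order moments, consistent
# with local identifiability, but only rank 4 from pairwise moments, flagging the
# known non-identifiability of a two-view mixture.
print("two views :", jacobian_rank(theta, views=2), "of", DIM)
print("three views:", jacobian_rank(theta, views=3), "of", DIM)

Evaluating at a single random point follows the usual generic-point heuristic: for a polynomial moment map the Jacobian rank is constant on a dense open set of parameters, so a random draw almost surely reports the generic rank.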
[1] A. Anandkumar, D. Hsu, and S. M. Kakade. A method of moments for mixture models and hidden Markov models. In COLT, 2012.
[2] F. Pereira and Y. Schabes. Inside-outside reestimation from partially bracketed corpora. In ACL, 1992.
[3] G. Carroll and E. Charniak. Two experiments on learning probabilistic dependency grammars from corpora. In Workshop Notes for Statistically-Based NLP Techniques, AAAI, pages 1–13, 1992.
[4] M. A. Paskin. Grammatical bigrams. In NIPS, 2002.
[5] D. Klein and C. D. Manning. Conditional structure versus conditional estimation in NLP models. In EMNLP, 2002.
[6] D. Klein and C. D. Manning. Corpus-based induction of syntactic structure: Models of dependency and constituency. In ACL, 2004.
[7] P. Liang and D. Klein. Analyzing the errors of unsupervised learning. In HLT/ACL, 2008.
[8] J. T. Chang. Full reconstruction of Markov models on evolutionary trees: Identifiability and consistency. Mathematical Biosciences, 137:51–73, 1996.
[9] A. Anandkumar, K. Chaudhuri, D. Hsu, S. M. Kakade, L. Song, and T. Zhang. Spectral methods for learning multivariate latent tree structure. In NIPS, 2011.
[10] J. B. Kruskal. Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra and its Applications, 18:95–138, 1977.
[11] E. S. Allman, C. Matias, and J. A. Rhodes. Identifiability of parameters in latent structure models with many observed variables. Annals of Statistics, 37:3099–3132, 2009.
[12] E. S. Allman, S. Petrović, J. A. Rhodes, and S. Sullivant. Identifiability of 2-tree mixtures for group-based models. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8:710–722, 2011.
[13] J. A. Rhodes and S. Sullivant. Identifiability of large phylogenetic mixture models. Bulletin of Mathematical Biology, 74(1):212–231, 2012.
[14] T. J. Rothenberg. Identification in parametric models. Econometrica, 39:577–591, 1971.
[15] L. A. Goodman. Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61(2):215–231, 1974.
[16] D. Bamber and J. P. H. van Santen. How many parameters can a model have and still be testable? Journal of Mathematical Psychology, 29:443–473, 1985.
[17] D. Geiger, D. Heckerman, H. King, and C. Meek. Stratified exponential families: graphical models and model selection. Annals of Statistics, 29:505–529, 2001.
[18] K. Lari and S. J. Young. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language, 4:35–56, 1990.
[19] M. Johnson, T. Griffiths, and S. Goldwater. Bayesian inference for PCFGs via Markov chain Monte Carlo. In HLT/NAACL, 2007.
[20] E. Mossel and S. Roch. Learning nonsingular phylogenies and hidden Markov models. Annals of Applied Probability, 16(2):583–614, 2006.
[21] D. Hsu, S. M. Kakade, and T. Zhang. A spectral algorithm for learning hidden Markov models. In COLT, 2009.
[22] S. M. Siddiqi, B. Boots, and G. J. Gordon. Reduced-rank hidden Markov models. In AISTATS, 2010.
[23] A. Parikh, L. Song, and E. P. Xing. A spectral algorithm for latent tree graphical models. In ICML, 2011.
[24] S. B. Cohen, K. Stratos, M. Collins, D. P. Foster, and L. Ungar. Spectral learning of latent-variable PCFGs. In ACL, 2012.
[25] F. M. Luque, A. Quattoni, B. Balle, and X. Carreras. Spectral learning for non-deterministic dependency parsing. In EACL, 2012.
[26] P. Dhillon, J. Rodu, M. Collins, D. P. Foster, and L. Ungar. Spectral dependency parsing with latent variables. In EMNLP-CoNLL, 2012.
[27] J. Eisner. Three new probabilistic models for dependency parsing: An exploration. In COLING, 1996.
[28] S. Sahni. Computationally related problems. SIAM Journal on Computing, 3:262–279, 1974.
[29] J. Eisner. Bilexical grammars and their cubic-time parsing algorithms. In Advances in Probabilistic and Other Parsing Technologies, pages 29–62, 2000.