
192 nips-2009-Posterior vs Parameter Sparsity in Latent Variable Models


Source: pdf

Author: Kuzman Ganchev, Ben Taskar, Fernando Pereira, João Graça

Abstract: We address the problem of learning structured unsupervised models with moment sparsity typical in many natural language induction tasks. For example, in unsupervised part-of-speech (POS) induction using hidden Markov models, we introduce a bias for words to be labeled by a small number of tags. In order to express this bias of posterior sparsity as opposed to parametric sparsity, we extend the posterior regularization framework [7]. We evaluate our methods on three languages — English, Bulgarian and Portuguese — showing consistent and significant accuracy improvement over EM-trained HMMs, and HMMs with sparsity-inducing Dirichlet priors trained by variational EM. We increase accuracy with respect to EM by 2.3%–6.5% in a purely unsupervised setting as well as in a weakly-supervised setting where the closed-class words are provided. Finally, we show improvements when the induced clusters are used as features of a discriminative model in a semi-supervised setting.
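
Since the abstract only names the posterior-sparsity bias, a rough sketch of the kind of objective involved may help. The following is an illustrative assumption based on the posterior regularization framework of [7], not a formula taken from the paper: q is an auxiliary posterior over tag sequences, σ is a constraint-strength hyperparameter, and φ_{wti} is an indicator that the i-th occurrence of word w is tagged t.

\max_{\theta} \; \mathcal{L}(\theta) \;-\; \min_{q} \Big[ \mathrm{KL}\big(q(\mathbf{y}) \,\|\, p_{\theta}(\mathbf{y} \mid \mathbf{x})\big) \;+\; \sigma \sum_{w,t} \max_{i} \, \mathbb{E}_{q}\big[\phi_{wti}(\mathbf{x},\mathbf{y})\big] \Big]

The sum of maxima (an ℓ1/ℓ∞-style penalty over expected tag indicators) charges roughly one unit for each word–tag pair that is ever used, so posteriors that label each word with only a few tags are preferred. Under this reading, the penalty acts on the posterior distribution directly rather than on the model parameters, which is the contrast with Dirichlet-style parameter sparsity drawn in the abstract.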


References

[1] S. Afonso, E. Bick, R. Haber, and D. Santos. Floresta Sintá(c)tica: a treebank for Portuguese. In Proc. LREC, pages 1698–1703, 2002.

[2] K. Bellare, G. Druck, and A. McCallum. Alternating projections for learning with expectation constraints. In Proc. UAI, 2009.

[3] D. P. Bertsekas, M. L. Homer, D. A. Logan, and S. D. Patek. Nonlinear Programming. Athena Scientific, 1995.

[4] J. Gao and M. Johnson. A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers. In Proc. EMNLP, pages 344–352, Honolulu, Hawaii, October 2008. ACL.

[5] Y. Goldberg, M. Adler, and M. Elhadad. EM can find pretty good HMM POS-taggers (when given a good start). In Proc. ACL, pages 746–754, 2008.

[6] S. Goldwater and T. Griffiths. A fully Bayesian approach to unsupervised part-of-speech tagging. In Proc. ACL, volume 45, page 744, 2007.

[7] J. Graça, K. Ganchev, and B. Taskar. Expectation maximization and posterior constraints. In Proc. NIPS. MIT Press, 2008.

[8] A. Haghighi and D. Klein. Prototype-driven learning for sequence models. In Proc. NAACL, pages 320–327, 2006.

[9] M. Johnson. Why doesn't EM find good HMM POS-taggers? In Proc. EMNLP-CoNLL, 2007.

[10] P. Liang, M. I. Jordan, and D. Klein. Learning from measurements in exponential families. In Proc. ICML, 2009.

[11] G. Mann and A. McCallum. Simple, robust, scalable semi-supervised learning via expectation regularization. In Proc. ICML, 2007.

[12] G. Mann and A. McCallum. Generalized expectation criteria for semi-supervised learning of conditional random fields. In Proc. ACL, pages 870–878, 2008.

[13] M.P. Marcus, M.A. Marcinkiewicz, and B. Santorini. Building a large annotated corpus of English: The Penn Treebank. Computational linguistics, 19(2):313–330, 1993.

[14] B. Merialdo. Tagging English text with a probabilistic model. Computational linguistics, 20(2):155–171, 1994.

[15] S. Ravi and K. Knight. Minimized models for unsupervised part-of-speech tagging. In Proc. ACL, 2009.

[16] K. Simov, P. Osenova, M. Slavcheva, S. Kolkovska, E. Balabanova, D. Doikoff, K. Ivanova, A. Simov, and M. Kouylekov. Building a linguistically interpreted corpus of Bulgarian: the BulTreeBank. In Proc. LREC, 2002.

[17] N. A. Smith and J. Eisner. Contrastive estimation: Training log-linear models on unlabeled data. In Proc. ACL, pages 354–362, 2005.

[18] K. Toutanova and M. Johnson. A Bayesian LDA-based model for semi-supervised part-of-speech tagging. In Proc. NIPS, 20, 2007.