emnlp emnlp2010 emnlp2010-71 emnlp2010-71-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Michael Lamar ; Yariv Maron ; Elie Bienenstock
Abstract: We present a novel approach to distributional-only, fully unsupervised, POS tagging, based on an adaptation of the EM algorithm for the estimation of a Gaussian mixture. In this approach, which we call Latent-Descriptor Clustering (LDC), word types are clustered using a series of progressively more informative descriptor vectors. These descriptors, which are computed from the immediate left and right context of each word in the corpus, are updated based on the previous state of the cluster assignments. The LDC algorithm is simple and intuitive. Using standard evaluation criteria for unsupervised POS tagging, LDC shows a substantial improvement in performance over state-of-the-art methods, along with a several-fold reduction in computational cost.
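The loop described in the abstract (build descriptors from the left and right context, recluster, rebuild the descriptors from the new clusters) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes hard assignments and nearest-centroid reassignment, which corresponds to a Gaussian mixture with a shared spherical covariance, and the function name `ldc_sketch` and its parameters are hypothetical.

```python
import random


def ldc_sketch(tokens, k, iters=10, seed=0):
    """Toy hard-assignment illustration of descriptor-based clustering.

    Assumption: a simplified, k-means-like stand-in for the EM procedure
    named in the abstract, not the published LDC algorithm.
    """
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    # Start from a random cluster for every word type.
    assign = {w: rng.randrange(k) for w in vocab}

    for _ in range(iters):
        # Descriptor of a word type: distribution over the clusters of its
        # left neighbours (first k slots) and right neighbours (last k slots),
        # computed from the current cluster assignments.
        desc = {w: [0.0] * (2 * k) for w in vocab}
        for i, w in enumerate(tokens):
            if i > 0:
                desc[w][assign[tokens[i - 1]]] += 1.0
            if i + 1 < len(tokens):
                desc[w][k + assign[tokens[i + 1]]] += 1.0
        # Normalize the left half and the right half separately.
        for vec in desc.values():
            for start in (0, k):
                total = sum(vec[start:start + k])
                if total > 0:
                    for j in range(start, start + k):
                        vec[j] /= total

        # Centroid of each non-empty cluster: mean descriptor of its members.
        centroids = {}
        for c in range(k):
            members = [w for w in vocab if assign[w] == c]
            if members:
                centroids[c] = [
                    sum(desc[w][j] for w in members) / len(members)
                    for j in range(2 * k)
                ]

        # Reassign every word type to the nearest centroid.
        new_assign = {}
        for w in vocab:
            new_assign[w] = min(
                centroids,
                key=lambda c: sum(
                    (a - b) ** 2 for a, b in zip(desc[w], centroids[c])
                ),
            )

        if new_assign == assign:
            break  # assignments stable, so descriptors would not change
        assign = new_assign

    return assign


tokens = "the cat sat on the mat while a dog sat on a rug".split()
clusters = ldc_sketch(tokens, k=3)
```

Because the descriptors are recomputed from the latest assignments on every pass, they change as the clustering changes, which is the sense in which the abstract calls them progressively more informative. A soft-assignment version with per-cluster weights would be closer to a full Gaussian mixture EM.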
Omri Abend, Roi Reichart and Ari Rappoport. 2010. Improved Unsupervised POS Induction through Prototype Discovery. In Proceedings of the 48th Annual Meeting of the ACL.
Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer-Verlag, New York, LLC.
Taylor Berg-Kirkpatrick, Alexandre Bouchard-Côté, John DeNero, and Dan Klein. 2010. Painless Unsupervised Learning with Features. In Proceedings of NAACL 2010.
Alexander Clark. 2001. The unsupervised induction of stochastic context-free grammars using distributional clustering. In CoNLL.
Alexander Clark. 2003. Combining distributional and morphological information for part of speech induction. In 10th Conference of the European Chapter of the Association for Computational Linguistics, pages 59–66.
Jianfeng Gao and Mark Johnson. 2008. A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 344–352.
Sharon Goldwater and Tom Griffiths. 2007. A fully Bayesian approach to unsupervised part-of-speech tagging. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 744–751.
João V. Graça, Kuzman Ganchev, Ben Taskar, and Fernando Pereira. 2009. Posterior vs. Parameter Sparsity in Latent Variable Models. Neural Information Processing Systems Conference (NIPS).
Michael Lamar, Yariv Maron, Mark Johnson, Elie Bienenstock. 2010. SVD and Clustering for Unsupervised POS Tagging. In Proceedings of the 48th Annual Meeting of the ACL.
Aria Haghighi and Dan Klein. 2006. Prototype-driven learning for sequence models. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference, pages 320–327, New York City, USA, June. Association for Computational Linguistics.
William P. Headden, David McClosky, and Eugene Charniak. 2008. Evaluating unsupervised part-of-speech tagging for grammar induction. In Proceedings of the International Conference on Computational Linguistics (COLING ’08).
Mark Johnson. 2007. Why doesn’t EM find good HMM POS-taggers? In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 296–305.
Hermann Ney, Ute Essen, and Reinhard Kneser. 1994. On structuring probabilistic dependences in stochastic language modelling. Computer Speech and Language, 8, 1–38.
Roi Reichart, Raanan Fattal and Ari Rappoport. 2010. Improved Unsupervised POS Induction Using Intrinsic Clustering Quality and a Zipfian Constraint. In CoNLL.
Sujith Ravi and Kevin Knight. 2009. Minimized models for unsupervised part-of-speech tagging. In Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, pages 504–512.
Hinrich Schütze. 1995. Distributional part-of-speech tagging. In Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics, pages 141–148.
Noah A. Smith and Jason Eisner. 2005. Contrastive estimation: Training log-linear models on unlabeled data. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL ’05), pages 354–362.
Kristina Toutanova, Dan Klein, Christopher D. Manning and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of HLT-NAACL 2003, pages 252–259.
Yoshimasa Tsuruoka and Jun'ichi Tsujii. 2005. Bidirectional Inference with the Easiest-First Strategy for Tagging Sequence Data. In Proceedings of HLT/EMNLP, pages 467–474.