nips2007-84
Source: pdf
Author: Kuzman Ganchev, Ben Taskar, João Graça
Abstract: The expectation maximization (EM) algorithm is a widely used maximum likelihood estimation procedure for statistical models when the values of some of the variables in the model are not observed. Very often, however, our aim is primarily to find a model that assigns values to the latent variables that have intended meaning for our data, and maximizing expected likelihood only sometimes accomplishes this. Unfortunately, it is typically difficult to add even simple a priori information about latent variables in graphical models without making the models overly complex or intractable. In this paper, we present an efficient, principled way to inject rich constraints on the posteriors of latent variables into the EM algorithm. Our method can be used to learn tractable graphical models that satisfy additional, otherwise intractable constraints. Focusing on clustering and the alignment problem for statistical machine translation, we show that simple, intuitive posterior constraints can greatly improve performance over standard baselines and be competitive with more complex, intractable models.
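As a rough sketch of the core idea (the notation below is ours and only approximates the paper's formulation): the E-step is replaced by a KL (I-)projection of the current model posterior onto a set of distributions whose expectations satisfy the desired constraints,

    q^{t+1} = \arg\min_{q \in \mathcal{Q}} \; \mathrm{KL}\big( q(\mathbf{z}) \,\|\, p_{\theta^{t}}(\mathbf{z} \mid \mathbf{x}) \big),
    \qquad
    \mathcal{Q} = \{\, q : \mathbb{E}_{q}[\mathbf{f}(\mathbf{x}, \mathbf{z})] \le \mathbf{b} \,\},

while the M-step maximizes \mathbb{E}_{q^{t+1}}[\log p_{\theta}(\mathbf{x}, \mathbf{z})] over \theta as in standard EM [6, 14]. By convex duality, the projection reduces to reweighting the posterior as q^{t+1}(\mathbf{z}) \propto p_{\theta^{t}}(\mathbf{z} \mid \mathbf{x}) \exp\{-\boldsymbol{\lambda} \cdot \mathbf{f}(\mathbf{x}, \mathbf{z})\} with dual variables \boldsymbol{\lambda} \ge 0 chosen to satisfy the constraints, in the spirit of the I-divergence geometry of [5].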
[1] D. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, 1999.
[2] P. F. Brown, S. A. Della Pietra, V. J. Della Pietra, M. J. Goldsmith, J. Hajic, R. L. Mercer, and S. Mohanty. But dictionaries are data too. In Proc. HLT, 1993.
[3] Peter F. Brown, Stephen Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311, 1993.
[4] O. Chapelle, B. Schölkopf, and A. Zien, editors. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2006.
[5] I. Csiszár. I-divergence geometry of probability distributions and minimization problems. The Annals of Probability, 3, 1975.
[6] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1–38, 1977.
[7] Alexander Fraser and Daniel Marcu. Measuring word alignment quality for statistical machine translation. Computational Linguistics, 33(3):293–303, 2007.
[8] Nir Friedman, Ori Mosenzon, Noam Slonim, and Naftali Tishby. Multivariate information bottleneck. In UAI, 2001.
[9] Michael I. Jordan, Zoubin Ghahramani, Tommi Jaakkola, and Lawrence K. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183–233, 1999.
[10] Philipp Koehn. Europarl: A multilingual corpus for evaluation of machine translation, 2002.
[11] P. Lambert, A. de Gispert, R. Banchs, and J. B. Mariño. Guidelines for word alignment evaluation and manual alignment. Language Resources and Evaluation, 39(4):267–285, 2005.
[12] Percy Liang, Ben Taskar, and Dan Klein. Alignment by agreement. In Proc. HLT-NAACL, 2006.
[13] Gideon S. Mann and Andrew McCallum. Simple, robust, scalable semi-supervised learning via expectation regularization. In Proc. ICML, 2007.
[14] R. M. Neal and G. E. Hinton. A new view of the EM algorithm that justifies incremental, sparse and other variants. In M. I. Jordan, editor, Learning in Graphical Models, pages 355– 368. Kluwer, 1998.
[15] Franz Josef Och and Hermann Ney. Improved statistical alignment models. In Proc. ACL, 2000.
[16] Franz Josef Och and Hermann Ney. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51, 2003.
[17] Noah A. Smith and Jason Eisner. Annealing structural bias in multilingual weighted grammar induction. In Proc. ACL, pages 569–576, 2006.
[18] Martin Szummer and Tommi Jaakkola. Information regularization with partially labeled data. In Proc. NIPS, pages 1025–1032, 2003.
[19] L. G. Valiant. The complexity of computing the permanent. Theoretical Computer Science, 8:189–201, 1979.
[20] Stephan Vogel, Hermann Ney, and Christoph Tillmann. HMM-based word alignment in statistical translation. In Proc. COLING, 1996.