
Message-Passing for Approximate MAP Inference with Latent Variables (NIPS 2011, paper 170)


Source: pdf

Author: Jiarong Jiang, Piyush Rai, Hal Daumé III

Abstract: We consider a general inference setting for discrete probabilistic graphical models where we seek maximum a posteriori (MAP) estimates for a subset of the random variables (max nodes), marginalizing over the rest (sum nodes). We present a hybrid message-passing algorithm to accomplish this. The hybrid algorithm passes a mix of sum and max messages depending on the type of source node (sum or max). We derive our algorithm by showing that it falls out as the solution of a particular relaxation of a variational framework. We further show that the Expectation Maximization algorithm can be seen as an approximation to our algorithm. Experimental results on synthetic and real-world datasets, against several baselines, demonstrate the efficacy of our proposed algorithm.
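To make the abstract's key idea concrete, below is a minimal sketch of hybrid sum/max message passing on a small chain-structured pairwise MRF: a message sent from a "sum" node marginalizes over its state, while a message sent from a "max" node maximizes over it. The graph, potentials, update schedule, and variable names are illustrative assumptions, not the authors' exact formulation or relaxation.

```python
# Hybrid sum/max message passing sketch on a chain x0 - x1 - x2 (assumed example).
import numpy as np

K = 3  # number of states per variable
rng = np.random.default_rng(0)
node_pot = [rng.random(K) + 0.1 for _ in range(3)]           # unary potentials psi_i
edge_pot = {(0, 1): rng.random((K, K)) + 0.1,                # pairwise potentials psi_ij
            (1, 2): rng.random((K, K)) + 0.1}
node_type = {0: "max", 1: "sum", 2: "max"}                   # MAP over x0, x2; marginalize x1
neighbors = {0: [1], 1: [0, 2], 2: [1]}

def pairwise(i, j):
    """Return psi_ij oriented so axis 0 indexes x_i and axis 1 indexes x_j."""
    return edge_pot[(i, j)] if (i, j) in edge_pot else edge_pot[(j, i)].T

def message(i, j, msgs):
    """Message m_{i->j}(x_j): combine psi_i, psi_ij, and incoming messages to i
    (except the one from j), then sum or max over x_i depending on i's type."""
    inc = node_pot[i].copy()
    for k in neighbors[i]:
        if k != j:
            inc *= msgs[(k, i)]
    joint = inc[:, None] * pairwise(i, j)                    # shape (K_i, K_j)
    out = joint.max(axis=0) if node_type[i] == "max" else joint.sum(axis=0)
    return out / out.sum()                                   # normalize for stability

# Initialize messages to uniform and iterate (exact on a tree after a few passes).
msgs = {(i, j): np.ones(K) / K for i in neighbors for j in neighbors[i]}
for _ in range(10):
    msgs = {(i, j): message(i, j, msgs) for i in neighbors for j in neighbors[i]}

# Beliefs at the max nodes yield the (approximate) MAP assignment for that subset.
for i, t in node_type.items():
    belief = node_pot[i].copy()
    for k in neighbors[i]:
        belief *= msgs[(k, i)]
    if t == "max":
        print(f"x{i}: argmax belief = {belief.argmax()}")
```

On this toy tree the procedure is exact; the paper's contribution concerns the general (loopy) case, where the hybrid updates are derived from a relaxed variational objective.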


References

[1] Shankar Kumar and William Byrne. Minimum Bayes-risk decoding for statistical machine translation. In HLT-NAACL, 2004.

[2] David Sontag and Tommi Jaakkola. New outer bounds on the marginal polytope. In NIPS, 2007.

[3] Amir Globerson and Tommi Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In NIPS, 2007.

[4] Pradeep Ravikumar, Alekh Agarwal, and Martin J. Wainwright. Message-passing for graph-structured linear programs: proximal projections, convergence and rounding schemes. In ICML, 2008.

[5] Qiang Liu and Alexander Ihler. Variational algorithms for marginal MAP. In UAI, 2011.

[6] James D. Park. MAP Complexity Results and Approximation Methods. In UAI, 2002.

[7] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

[8] Shaul K. Bar-Lev, Daoud Bshouty, Peter Enis, Gerard Letac, I-Li Lu, and Donald Richards. The diagonal multivariate natural exponential families and their classification. Journal of Theoretical Probability, pages 883–929, 1994.

[9] Vaibhava Goel and William J. Byrne. Minimum Bayes-risk automatic speech recognition. Computer Speech and Language, 14(2), 2000.

[10] M. J. Wainwright and M. I. Jordan. Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends in Machine Learning, 2008.

[11] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1988.

[12] Jonathan S. Yedidia, William T. Freeman, and Yair Weiss. Generalized belief propagation. In NIPS, 2000.

[13] Martin J. Wainwright, Tommi S. Jaakkola, and Alan S. Willsky. Exact MAP estimates by tree agreement. In NIPS, 2002.

[14] Martin J. Wainwright, Tommi S. Jaakkola, and Alan S. Willsky. Tree-reweighted belief propagation algorithms and approximate ML estimation by pseudo-moment matching. In AISTATS, 2003.

[15] Mark Johnson. Why doesn't EM find good HMM POS-taggers? In EMNLP, pages 296–305, 2007.

[16] Pradeep Ravikumar, Martin J. Wainwright, and Alekh Agarwal. Message-passing for graph-structured linear programs: Proximal methods and rounding schemes, 2008.

[17] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 1977.

[18] Radford M. Neal and Geoffrey E. Hinton. A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants. In Learning in graphical models, pages 355–368, 1999.

[19] Slav Petrov and Dan Klein. Discriminative log-linear grammars with latent variables. In NIPS, 2008.

[20] Ivan Titov and James Henderson. A latent variable model for generative dependency parsing. In IWPT, 2007.

[21] Ben Taskar, Vassil Chatalbashev, Daphne Koller, and Carlos Guestrin. Learning structured prediction models: a large margin approach. 2004.

[22] Ioannis Tsochantaridis, Thorsten Joachims, Thomas Hofmann, and Yasemin Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:1453–1484, 2005.

[23] Chen Yanover, Talya Meltzer, and Yair Weiss. Linear programming relaxations and belief propagation – an empirical study. Journal of Machine Learning Research, 7:1887–1907, 2006.

[24] Chen Yanover, Ora Schueler-Furman, and Yair Weiss. Minimizing and learning energy functions for side-chain prediction. In RECOMB, 2007.