jmlr jmlr2013 jmlr2013-120 jmlr2013-120-reference knowledge-graph by maker-knowledge-mining

120 jmlr-2013-Variational Algorithms for Marginal MAP


Source: pdf

Author: Qiang Liu, Alexander Ihler

Abstract: The marginal maximum a posteriori probability (MAP) estimation problem, which calculates the mode of the marginal posterior distribution of a subset of variables with the remaining variables marginalized, is an important inference problem in many models, such as those with hidden variables or uncertain parameters. Unfortunately, marginal MAP can be NP-hard even on trees, and has attracted less attention in the literature compared to the joint MAP (maximization) and marginalization problems. We derive a general dual representation for marginal MAP that naturally integrates the marginalization and maximization operations into a joint variational optimization problem, making it possible to easily extend most or all variational-based algorithms to marginal MAP. In particular, we derive a set of “mixed-product” message passing algorithms for marginal MAP, whose form is a hybrid of max-product, sum-product and a novel “argmax-product” message updates. We also derive a class of convergent algorithms based on proximal point methods, including one that transforms the marginal MAP problem into a sequence of standard marginalization problems. Theoretically, we provide guarantees under which our algorithms give globally or locally optimal solutions, and provide novel upper bounds on the optimal objectives. Empirically, we demonstrate that our algorithms significantly outperform the existing approaches, including a state-of-the-art algorithm based on local search methods. Keywords: graphical models, message passing, belief propagation, variational methods, maximum a posteriori, marginal-MAP, hidden variable models


reference text

F. Altarelli, A. Braunstein, A. Ramezanpour, and R. Zecchina. Stochastic optimization by message passing. Journal of Statistical Mechanics: Theory and Experiment, 2011(11):P11009, 2011. J.R. Birge and F. Louveaux. Introduction to Stochastic Programming. Springer Verlag, 1997. S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 2010. T.M. Cover and J.A. Thomas. Elements of Information Theory. John Wiley & Sons, INC., 2nd edition, 2006. C.P. De Campos. New complexity results for MAP in Bayesian networks. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI-11), pages 2100–2106, 2011. R. Dechter and I. Rish. Mini-buckets: A general scheme for bounded inference. Journal of the ACM (JACM), 50(2):107–153, 2003. 3197 L IU AND I HLER A. Doucet, S.J. Godsill, and C.P. Robert. Marginal maximum a posteriori estimation using Markov chain Monte Carlo. Statistics and Computing, 12(1), January 2002. A. Globerson and T.S. Jaakkola. Approximate inference using conditional entropy decompositions. In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS-07), 2007. D.M. Greig, B.T. Porteous, and A.H. Seheult. Exact maximum a posteriori estimation for binary images. Journal of the Royal Statistical Society. Series B (Methodological), pages 271–279, 1989. T. Hazan and A. Shashua. Norm-product belief propagation: Primal-dual message-passing for approximate inference. Information Theory, IEEE Transactions on, 56(12):6294–6316, 2010. T. Hazan, J. Peng, and A. Shashua. Tightening fractional covering upper bounds on the partition function for high-order region graphs. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI-12), 2012. R.A. Howard and J.E. Matheson. Influence diagrams. Decision Analysis, 2(3):127–143, 2005. D.R. Hunter and K. Lange. A tutorial on MM algorithms. The American Statistician, 1(58), February 2004. M. Ibrahimi, A. Javanmard, Y. Kanoria, and A. Montanari. Robust max-product belief propagation. In Signals, Systems and Computers (ASILOMAR), 2011 Conference Record of the Forty Fifth Asilomar Conference on, pages 43–49. IEEE, 2011. A. Iusem and M. Teboulle. On the convergence rate of entropic proximal optimization methods. Computational and Applied Mathematics, 12:153–168, 1993. E.T. Jaynes. Information theory and statistical mechanics. Physical Review, 106(4):620–630, May 1957. M. Jerrum and A. Sinclair. Polynomial-time approximation algorithms for the ising model. SIAM Journal on Computing, 22(5):1087–1116, 1993. J. Jiang, P. Rai, and H. Daum´ III. Message-passing for approximate map inference with latent e variables. In Advances in Neural Information Processing Systems (NIPS-11), 2011. V. Jojic, S. Gould, and D. Koller. Accelerated dual decomposition for MAP inference. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010. D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT press, 2009. V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 28(10):1568 –1583, oct. 2006. Q. Liu and A. Ihler. Bounding the partition function using H¨ lder’s inequality. In Proceedings of o the 28th International Conference on Machine Learning (ICML-11), pages 849–856, June 2011a. 3198 VARIATIONAL ALGORITHMS FOR M ARGINAL MAP Q. Liu and A. Ihler. Variational algorithms for marginal MAP. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI-11). 2011b. Q. Liu and A. Ihler. Belief propagation for structured decision making. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI-12). August 2012. B. Martinet. R´ gularisation d’in´ quations variationnelles par approximations successives. Revue e e Francaise dInformatique et de Recherche Op´ rationelle, 4:154–158, 1970. ¸ e R. Mateescu, K. Kask, V. Gogate, and R. Dechter. Join-graph propagation algorithms. Journal of Artificial Intelligence Research, 37(1):279–328, 2010. D.D. Mau´ and C.P. de Campos. Anytime marginal maximum a posteriori inference. In Proceedings a of the 29th International Conference on Machine Learning (ICML-12), 2012. T. Meltzer, A. Globerson, and Y. Weiss. Convergent message passing algorithms: a unifying view. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI-09), 2009. O.P. Meshi, T. Jaakkola, and A. Globerson. Convergence rate analysis of MAP coordinate minimization algorithms. In Advances in Neural Information Processing Systems (NIPS-12), 2012. R. Neal and G.E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. Jordan, editor, Learning in Graphical Models, pages 355–368. Kluwer, 1998. J. Park and A. Darwiche. Complexity results and approximation strategies for MAP explanations. Journal of Artificial Intelligence Research, 21:101–133, 2004. J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988. P. Ravikumar, A. Agarwal, and M.J. Wainwright. Message-passing for graph-structured linear programs: Proximal projections, convergence, and rounding schemes. Journal of Machine Learning Research, 11:1043–1080, Mar 2010. R.T. Rockafellar. Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14(5):877, 1976. B. Savchynskyy, S. Schmidt, J.H. Kappes, and C. Schn¨ rr. Efficient MRF energy minimization o via adaptive diminishing smoothing. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI-12), 2010. D. Sontag, A. Globerson, and T.S. Jaakkola. Introduction to dual decomposition for inference. In Optimization for Machine Learning. MIT Press, 2011. M. Teboulle. Entropic proximal mappings with applications to nonlinear programming. Mathematics of Operations Research, 17(3):pp. 670–690, 1992. P. Tseng and D.P. Bertsekas. On the convergence of the exponential multiplier method for convex programming. Mathematical Programming, 60(1):1–19, 1993. 3199 L IU AND I HLER M.J. Wainwright and M. Jordan. Graphical models, exponential families, and variational inference. Foundation and Trends in Machine Learning, 1(1-2):1–305, 2008. M.J. Wainwright, T.S. Jaakkola, and A.S. Willsky. Tree-based reparameterization framework for analysis of sum-product and related algorithms. Information Theory, IEEE Transactions on, 45: 1120–1146, 2003. M.J. Wainwright, T.S. Jaakkola, and A.S. Willsky. A new class of upper bounds on the log partition function. Information Theory, IEEE Transactions on, 51(7):2313–2335, July 2005a. M.J. Wainwright, T.S. Jaakkola, and A.S. Willsky. MAP estimation via agreement on trees: message-passing and linear programming. Information Theory, IEEE Transactions on, 51(11): 3697–3717, 2005b. Y. Weiss, C. Yanover, and T. Meltzer. MAP estimation, linear programming and belief propagation with convex free energies. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI-07), 2007. T. Werner. A linear programming approach to max-sum problem: A review. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 29(7):1165–1179, 2007. T. Werner. Revisiting the linear programming relaxation approach to Gibbs energy minimization and weighted constraint satisfaction. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(8):1474–1488, 2010. W. Wiegerinck. Approximations with reweighted generalized belief propagation. In In Proceedings of the 9th International Conference on Artificial Intelligence and Statistics (AISTATS-05), 2005. J.S. Yedidia, W.T. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. Exploring Artificial Intelligence in the New Millennium, 8:236–239, 2003. J.S. Yedidia, W.T. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized BP algorithms. Information Theory, IEEE Transactions on, 51, July 2005. C. Yuan, T.C. Lu, and M.J. Druzdzel. Annealed MAP. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI-04), pages 628–635, 2004. A. L. Yuille. CCCP algorithms to minimize the Bethe and Kikuchi free energies: Convergent alternatives to belief propagation. Neural Computation, 14(7):1691–1722, July 2002. 3200