NIPS 2010, Paper 118
Author: Justin Domke
Abstract: This paper proposes a simple and efficient finite difference method for implicit differentiation of marginal inference results in discrete graphical models. Given an arbitrary loss function defined on marginals, we show that the derivatives of this loss with respect to model parameters can be obtained by running the inference procedure twice, on slightly perturbed model parameters. This method can be used with approximate inference, with the loss function defined over the approximate marginals. Convenient choices of loss functions make it practical to fit graphical models with hidden variables, high treewidth and/or model misspecification.
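As a rough sketch of the two-inference-call idea described in the abstract, the Python fragment below approximates the gradient of a loss defined on marginals by perturbing the model parameters along the loss gradient and rerunning inference. The routine names (marginal_inference, loss_grad), the central-difference form, and the step size r are illustrative assumptions, not the paper's exact algorithm; it also assumes the parameters and marginals are stored as matching flat arrays.

import numpy as np

def perturbation_gradient(theta, marginal_inference, loss_grad, r=1e-4):
    # theta:              flat array of model parameters (log-potentials).
    # marginal_inference: black-box (possibly approximate) inference routine
    #                     returning marginals as a flat array of the same shape.
    # loss_grad:          dL/dmu, the gradient of the loss with respect to the
    #                     marginals, evaluated at the unperturbed solution.
    # r:                  finite-difference step size (illustrative choice).
    theta = np.asarray(theta, dtype=float)
    loss_grad = np.asarray(loss_grad, dtype=float)
    # Run inference twice on slightly perturbed parameters and take a
    # central difference of the resulting marginals; this approximates
    # dL/dtheta for a loss that depends on theta only through the marginals.
    mu_plus = np.asarray(marginal_inference(theta + r * loss_grad))
    mu_minus = np.asarray(marginal_inference(theta - r * loss_grad))
    return (mu_plus - mu_minus) / (2.0 * r)

Whatever the number of parameters, the cost is two extra calls to the same inference procedure, which is what makes this kind of scheme practical for fitting models with hidden variables, high treewidth, or misspecification.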
[1] Percy Liang and Michael Jordan. An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators. In ICML, 2008.
[2] Samuel Gross, Olga Russakovsky, Chuong Do, and Serafim Batzoglou. Training conditional random fields for maximum labelwise accuracy. In NIPS, 2006.
[3] Sham Kakade, Yee Whye Teh, and Sam Roweis. An alternate objective function for Markovian fields. In ICML, 2002.
[4] Martin Wainwright. Estimating the "wrong" graphical model: Benefits in the computation-limited setting. Journal of Machine Learning Research, 7:1829–1859, 2006.
[5] Justin Domke. Learning convex inference of marginals. In UAI, 2008.
[6] Frederik Eaton and Zoubin Ghahramani. Choosing a variable to clamp. In AISTATS, 2009.
[7] Max Welling and Yee Whye Teh. Belief optimization for binary networks: A stable alternative to loopy belief propagation. In UAI, 2001.
[8] Tom Heskes, Kees Albers, and Bert Kappen. Approximate inference and constrained optimization. In UAI, 2003.
[9] Alan Yuille. CCCP algorithms to minimize the Bethe and Kikuchi free energies: Convergent alternatives to belief propagation. Neural Computation, 14(7):1691–1722, 2002.
[10] Martin Wainwright and Michael Jordan. Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn., 1(1-2):1–305, 2008.
[11] Jonathan Yedidia, William Freeman, and Yair Weiss. Constructing free energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51:2282–2312, 2005.
[12] Martin Wainwright, Tommi Jaakkola, and Alan Willsky. A new class of upper bounds on the log partition function. IEEE Transactions on Information Theory, 51(7):2313–2335, 2005.
[13] Ofer Meshi, Ariel Jaimovich, Amir Globerson, and Nir Friedman. Convexifying the Bethe free energy. In UAI, 2009.
[14] Tom Heskes. Convexity arguments for efficient minimization of the Bethe and Kikuchi free energies. J. Artif. Intell. Res. (JAIR), 26:153–190, 2006.
[15] Varun Ganapathi, David Vickrey, John Duchi, and Daphne Koller. Constrained approximate maximum entropy learning of Markov random fields. In UAI, 2008.
[16] Tamir Hazan and Amnon Shashua. Convergent message-passing algorithms for inference over general graphs with convex free energies. In UAI, pages 264–273, 2008.
[17] Neculai Andrei. Accelerated conjugate gradient algorithm with finite difference Hessian/vector product approximation for unconstrained optimization. J. Comput. Appl. Math., 230(2):570–582, 2009.
[18] Talya Meltzer, Amir Globerson, and Yair Weiss. Convergent message passing algorithms: a unifying view. In UAI, 2009.
[19] Joris M. Mooij et al. libDAI 0.2.4: A free/open source C++ library for Discrete Approximate Inference. http://www.libdai.org/, 2010.
[20] Sanjiv Kumar, Jonas August, and Martial Hebert. Exploiting inference for approximate parameter learning in discriminative fields: An empirical study. In EMMCVPR, 2005.
[21] Sanjiv Kumar and Martial Hebert. Discriminative random fields. International Journal of Computer Vision, 68(2):179–201, 2006.
[22] S. V. N. Vishwanathan, Nicol Schraudolph, Mark Schmidt, and Kevin Murphy. Accelerated training of conditional random fields with stochastic gradient methods. In ICML, 2006.
[23] Patrick Pletscher, Cheng Soon Ong, and Joachim Buhmann. Spanning tree approximations for conditional random fields. In AISTATS, 2009.