nips nips2010 nips2010-118 nips2010-118-reference knowledge-graph by maker-knowledge-mining

118 nips-2010-Implicit Differentiation by Perturbation

Source: pdf

Author: Justin Domke

Abstract: This paper proposes a simple and eﬃcient ﬁnite diﬀerence method for implicit diﬀerentiation of marginal inference results in discrete graphical models. Given an arbitrary loss function, deﬁned on marginals, we show that the derivatives of this loss with respect to model parameters can be obtained by running the inference procedure twice, on slightly perturbed model parameters. This method can be used with approximate inference, with a loss function over approximate marginals. Convenient choices of loss functions make it practical to ﬁt graphical models with hidden variables, high treewidth and/or model misspeciﬁcation. 1

reference text

[1] Percy Liang and Michael Jordan. An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators. In ICML, 2008.

[2] Samuel Gross, Olga Russakovsky, Chuong Do, and Seraﬁm Batzoglou. Training conditional random ﬁelds for maximum labelwise accuracy. In NIPS. 2006.

[3] Sham Kakade, Yee Whye Teh, and Sam Roweis. An alternate objective function for Markovian ﬁelds. In ICML, 2002.

[4] Martin Wainwright. Estimating the

[5] Justin Domke. Learning convex inference of marginals. In UAI, 2008.

[6] Frederik Eaton and Zoubin Ghahramani. Choosing a variable to clamp. In AISTATS, 2009.

[7] Max Welling and Yee Whye Teh. Belief optimization for binary networks: A stable alternative to loopy belief propagation. In UAI, 2001.

[8] Tom Heskes, Kees Albers, and Bert Kappen. Approximate inference and constrained optimization. In UAI, 2003.

[9] Alan Yuille. CCCP algorithms to minimize the Bethe and Kikuchi free energies: Convergent alternatives to belief propagation. Neural Computation, 14:2002, 2002.

[10] Martin Wainwright and Michael Jordan. Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn., 1(1-2):1–305, 2008.

[11] Jonathan Yedidia, William Freeman, and Yair Weiss. Constructing free energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51:2282–2312, 2005.

[12] Martin Wainwright, Tommi Jaakkola, and Alan Willsky. A new class of upper bounds on the log partition function. IEEE Transactions on Information Theory, 51(7):2313–2335, 2005.

[13] Ofer Meshi, Ariel Jaimovich, Amir Globerson, and Nir Friedman. Convexifying the bethe free energy. In UAI, 2009.

[14] Tom Heskes. Convexity arguments for eﬃcient minimization of the bethe and kikuchi free energies. J. Artif. Intell. Res. (JAIR), 26:153–190, 2006.

[15] Varun Ganapathi, David Vickrey, John Duchi, and Daphne Koller. Constrained approximate maximum entropy learning of markov random ﬁelds. In UAI, 2008.

[16] Tamir Hazan and Amnon Shashua. Convergent message-passing algorithms for inference over general graphs with convex free energies. In UAI, pages 264–273, 2008.

[17] Neculai Andrei. Accelerated conjugate gradient algorithm with ﬁnite diﬀerence hessian/vector product approximation for unconstrained optimization. J. Comput. Appl. Math., 230(2):570– 582, 2009.

[18] Talya Meltzer, Amir Globerson, and Yair Weiss. Convergent message passing algorithms - a unifying view, 2009.

[19] Joris M. Mooij et al. libDAI 0.2.4: A free/open source C++ library for Discrete Approximate Inference. http://www.libdai.org/, 2010.

[20] Sanjiv Kumar, Jonas August, and Martial Hebert. Exploiting inference for approximate parameter learning in discriminative ﬁelds: An empirical study. In EMMCVPR, 2005.

[21] Sanjiv Kumar and Martial Hebert. Discriminative random ﬁelds. International Journal of Computer Vision, 68(2):179–201, 2006.

[22] S. V. N. Vishwanathan, Nicol Schraudolph, Mark Schmidt, and Kevin Murphy. Accelerated training of conditional random ﬁelds with stochastic gradient methods. In ICML, 2006.

[23] Patrick Pletscher, Cheng Soon Ong, and Joachim Buhmann. Spanning tree approximations for conditional random ﬁelds. In AISTATS, 2009. 9