nips nips2011 nips2011-306 nips2011-306-reference knowledge-graph by maker-knowledge-mining

306 nips-2011-t-divergence Based Approximate Inference


Source: pdf

Author: Nan Ding, Yuan Qi, S. V. N. Vishwanathan

Abstract: Approximate inference is an important technique for dealing with large, intractable graphical models based on the exponential family of distributions. We extend the idea of approximate inference to the t-exponential family by defining a new t-divergence. This divergence measure is obtained via convex duality between the log-partition function of the t-exponential family and a new t-entropy. We illustrate our approach on the Bayes Point Machine with a Student’s t-prior.
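For orientation, the following is a rough sketch of the deformed-exponential machinery behind the abstract, using the standard Tsallis/Naudts definitions from the references cited below ([10], [11], [15]); the precise forms of the t-entropy and t-divergence are defined in the paper itself, and the divergence written here is only the natural escort-weighted candidate, not necessarily the paper's exact definition.

\[
\log_t(x) = \frac{x^{1-t} - 1}{1 - t}, \qquad
\exp_t(x) = \big[\, 1 + (1-t)\,x \,\big]_+^{1/(1-t)},
\]
with the usual \(\log\) and \(\exp\) recovered as \(t \to 1\). A member of the t-exponential family has the form
\[
p(x; \theta) = \exp_t\!\big( \langle \Phi(x), \theta \rangle - g_t(\theta) \big),
\]
where \(g_t(\theta)\) is the log-partition function whose convex dual gives the t-entropy, and \(q(x) \propto p(x)^t\) is the associated escort distribution. An escort-weighted divergence of the form
\[
D_t(p \,\|\, \tilde{p}) = \int q(x)\,\big( \log_t p(x) - \log_t \tilde{p}(x) \big)\, dx
\]
reduces to the Kullback–Leibler divergence at \(t = 1\) (where \(q = p\) and \(\log_t = \log\)).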


reference text

[1] W. R. Gilks, S. Richardson, and D. J. Spiegelhalter. Markov Chain Monte Carlo in Practice. Chapman & Hall, 1995.

[2] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2):1–305, 2008.

[3] T. Minka. Expectation Propagation for approximate Bayesian inference. PhD thesis, MIT Media Labs, Cambridge, USA, 2001.

[4] Y. Weiss. Comparing the mean field method and belief propagation for approximate inference in MRFs. In David Saad and Manfred Opper, editors, Advanced Mean Field Methods. MIT Press, 2001.

[5] T. Minka. Divergence measures and message passing. Report 173, Microsoft Research, 2005.

[6] C. Bishop, N. Lawrence, T. Jaakkola, and M. Jordan. Approximating posterior distributions in belief networks using mixtures. In Advances in Neural Information Processing Systems 10, 1997.

[7] G. Bouchard and O. Zoeter. Split variational inference. In Proc. Intl. Conf. Machine Learning, 2009.

[8] P. Grünwald and A. Dawid. Game theory, maximum entropy, minimum discrepancy, and robust Bayesian decision theory. Annals of Statistics, 32(4):1367–1433, 2004.

[9] C. R. Shalizi. Maximum likelihood estimation for q-exponential (Tsallis) distributions, 2007. URL http://arxiv.org/abs/math.ST/0701854.

[10] C. Tsallis. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys., 52:479–487, 1988.

[11] J. Naudts. Deformed exponentials and logarithms in generalized thermostatistics. Physica A, 316:323–334, 2002. URL http://arxiv.org/pdf/cond-mat/0203489.

[12] T. D. Sears. Generalized Maximum Entropy, Convexity, and Machine Learning. PhD thesis, Australian National University, 2008.

[13] A. Sousa and C. Tsallis. Student’s t- and r-distributions: Unified derivation from an entropic variational principle. Physica A, 236:52–57, 1994.

[14] C. Tsallis, R. S. Mendes, and A. R. Plastino. The role of constraints within generalized nonextensive statistics. Physica A: Statistical and Theoretical Physics, 261:534–554, 1998.

[15] J. Naudts. Generalized thermostatistics based on deformed exponential and logarithmic functions. Physica A, 340:32–40, 2004.

[16] J. Naudts. Generalized thermostatistics and mean-field theory. Physica A, 332:279–300, 2004.

[17] J. Naudts. Estimators, escort probabilities, and φ-exponential families in statistical physics. Journal of Inequalities in Pure and Applied Mathematics, 5(4), 2004.

[18] N. Ding and S. V. N. Vishwanathan. t-logistic regression. In Richard Zemel, John Shawe-Taylor, John Lafferty, Chris Williams, and Aron Culotta, editors, Advances in Neural Information Processing Systems 23, 2010.

[19] A. Rényi. On measures of information and entropy. In Proc. 4th Berkeley Symposium on Mathematical Statistics and Probability, pages 547–561, 1960.

[20] J. D. Lafferty. Additive models, boosting, and inference for generalized divergences. In Proc. Annual Conf. Computational Learning Theory, volume 12, pages 125–133. ACM Press, New York, NY, 1999.

[21] I. Csiszár. Information type measures of differences of probability distributions and indirect observations. Studia Math. Hungarica, 2:299–318, 1967.

[22] K. Azoury and M. K. Warmuth. Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine Learning, 43(3):211–246, 2001. Special issue on Theoretical Advances in On-line Learning, Game Theory and Boosting.

[23] W. Wiegerinck and T. Heskes. Fractional belief propagation. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 438–445, 2003.

[24] M. Opper. A Bayesian approach to online learning. In On-line Learning in Neural Networks, pages 363–378. Cambridge University Press, 1998.

[25] X. Boyen and D. Koller. Tractable inference for complex stochastic processes. In Proc. Conf. on Uncertainty in Artificial Intelligence (UAI), 1998.

[26] J. S. Rosenthal. A First Look at Rigorous Probability Theory. World Scientific Publishing, 2006.