
NIPS 2013, Paper 46: Bayesian Estimation of Latently-grouped Parameters in Undirected Graphical Models



Author: Jie Liu, David Page

Abstract: In large-scale applications of undirected graphical models, such as social networks and biological networks, similar patterns occur frequently and give rise to similar parameters. In this situation, it is beneficial to group the parameters for more efficient learning. We show that even when the grouping is unknown, we can infer these parameter groups during learning via a Bayesian approach. We impose a Dirichlet process prior on the parameters. Posterior inference usually involves calculating intractable terms, and we propose two approximation algorithms, namely a Metropolis-Hastings algorithm with auxiliary variables and a Gibbs sampling algorithm with “stripped” Beta approximation (Gibbs SBA). Simulations show that both algorithms outperform conventional maximum likelihood estimation (MLE). Gibbs SBA’s performance is close to Gibbs sampling with exact likelihood calculation. Models learned with Gibbs SBA also generalize better than the models learned by MLE on real-world Senate voting data.

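The abstract only sketches the approach at a high level. As an illustration of the general idea of inferring latent parameter groups under a Dirichlet process prior, the sketch below shows a standard Chinese-restaurant-process Gibbs reassignment step for a DP mixture of Gaussians with known variance (in the spirit of Neal [22]), applied to clustering scalar parameter estimates. This is not the paper's Metropolis-Hastings or Gibbs SBA algorithm; the function name crp_gibbs and all hyperparameter values are hypothetical choices for the example.

import numpy as np

def crp_gibbs(theta, alpha=1.0, sigma2=0.1, mu0=0.0, tau2=1.0, n_iters=100, seed=0):
    """Collapsed Gibbs sampling for a DP mixture of Gaussians (known variance sigma2).

    Illustrative only: groups scalar parameter estimates `theta` into latent clusters
    via Chinese-restaurant-process reassignment; not the paper's Gibbs SBA algorithm.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    n = len(theta)
    z = np.zeros(n, dtype=int)  # cluster assignments; start with everything in one cluster
    for _ in range(n_iters):
        for i in range(n):
            z[i] = -1  # remove point i from its current cluster
            labels, counts = np.unique(z[z >= 0], return_counts=True)
            log_p = []
            for k, nk in zip(labels, counts):
                members = theta[z == k]
                # posterior predictive for an existing cluster (normal-normal conjugacy)
                prec = 1.0 / tau2 + len(members) / sigma2
                mu_k = (mu0 / tau2 + members.sum() / sigma2) / prec
                var_k = 1.0 / prec + sigma2
                log_p.append(np.log(nk) - 0.5 * np.log(2 * np.pi * var_k)
                             - 0.5 * (theta[i] - mu_k) ** 2 / var_k)
            # probability of opening a new cluster (new "table" in the CRP)
            var_new = tau2 + sigma2
            log_p.append(np.log(alpha) - 0.5 * np.log(2 * np.pi * var_new)
                         - 0.5 * (theta[i] - mu0) ** 2 / var_new)
            log_p = np.asarray(log_p)
            p = np.exp(log_p - log_p.max())
            p /= p.sum()
            choice = rng.choice(len(p), p=p)
            z[i] = labels[choice] if choice < len(labels) else z.max() + 1
        _, z = np.unique(z, return_inverse=True)  # relabel clusters compactly as 0..K-1
    return z

Run on a vector of per-edge parameter estimates, such a sampler tends to place estimates of similar magnitude into the same cluster, which conveys the intuition behind grouping latently similar parameters that the abstract describes.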

References

[1] A. U. Asuncion, Q. Liu, A. T. Ihler, and P. Smyth. Particle filtered MCMC-MLE with connections to contrastive divergence. In ICML, 2010.

[2] O. Banerjee, L. El Ghaoui, and A. d’Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. JMLR, 9:485–516, June 2008.

[3] M. J. Beal, Z. Ghahramani, and C. E. Rasmussen. The infinite hidden Markov model. In NIPS, 2002.

[4] J. Besag. Statistical analysis of non-lattice data. JRSS-D, 24(3):179–195, 1975.

[5] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003.

[6] S. P. Chatzis and G. Tsechpenakis. The infinite hidden Markov random field model. In ICCV, 2009.

[7] T. S. Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2):209–230, 1973.

[8] J. V. Gael, Y. Saatci, Y. W. Teh, and Z. Ghahramani. Beam sampling for the infinite hidden Markov model. In ICML, 2008.

[9] A. Gelman and J. Hill. Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, New York, 2007.

[10] S. J. Gershman and D. M. Blei. A tutorial on Bayesian nonparametric models. Journal of Mathematical Psychology, 56(1):1–12, 2012.

[11] C. J. Geyer. Markov chain Monte Carlo maximum likelihood. Computing Science and Statistics, pages 156–163, 1991.

[12] M. Gutmann and J. Hirayama. Bregman divergence as general framework to estimate unnormalized statistical models. In UAI, pages 283–290, Corvallis, Oregon, 2011. AUAI Press.

[13] M. Gutmann and A. Hyvärinen. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In AISTATS, 2010.

[14] L. A. Hannah, D. M. Blei, and W. B. Powell. Dirichlet process mixtures of generalized linear models. JMLR, 12:1923–1953, 2011.

[15] G. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14:1771–1800, 2002.

[16] A. Hyvärinen. Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables. IEEE Transactions on Neural Networks, 18(5):1529–1531, 2007.

[17] A. Hyvärinen. Some extensions of score matching. Computational Statistics & Data Analysis, 51(5):2499–2512, 2007.

[18] S. Lyu. Unifying non-maximum likelihood learning objectives with minimum KL contraction. In NIPS, 2011.

[19] M. Meila. Comparing clusterings by the variation of information. In COLT, 2003.

[20] J. Møller, A. Pettitt, R. Reeves, and K. Berthelsen. An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika, 93(2):451–458, 2006.

[21] I. Murray, Z. Ghahramani, and D. J. C. MacKay. MCMC for doubly-intractable distributions. In UAI, 2006.

[22] R. M. Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2):249–265, 2000.

[23] P. Orbanz and Y. W. Teh. Bayesian nonparametric models. In Encyclopedia of Machine Learning. Springer, 2010.

[24] K. Palla, D. A. Knowles, and Z. Ghahramani. An infinite latent attribute model for network data. In ICML, 2012.

[25] J. G. Propp and D. B. Wilson. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms, 9(1-2):223–252, 1996.

[26] C. E. Rasmussen. The infinite Gaussian mixture model. In NIPS, 2000.

[27] C. E. Rasmussen and Z. Ghahramani. Infinite mixtures of Gaussian process experts. In NIPS, 2001.

[28] B. Shahbaba and R. Neal. Nonlinear models using Dirichlet process mixtures. JMLR, 10:1829–1850, 2009.

[29] T. Tieleman. Training restricted Boltzmann machines using approximations to the likelihood gradient. In ICML, 2008.

[30] T. Tieleman and G. Hinton. Using fast weights to improve persistent contrastive divergence. In ICML, 2009.

[31] D. Vickrey, C. Lin, and D. Koller. Non-local contrastive objectives. In ICML, 2010.

[32] J. Zhu, N. Chen, and E. P. Xing. Infinite latent SVM for classification and multi-task learning. In NIPS, 2011.

[33] J. Zhu, N. Chen, and E. P. Xing. Infinite SVM: a Dirichlet process mixture of large-margin kernel machines. In ICML, 2011.

[34] S. C. Zhu and X. Liu. Learning in Gibbsian fields: How accurate and how fast can it be? IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:1001–1006, 2002.