nips nips2013 nips2013-46 nips2013-46-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jie Liu, David Page
Abstract: In large-scale applications of undirected graphical models, such as social networks and biological networks, similar patterns occur frequently and give rise to similar parameters. In this situation, it is beneficial to group the parameters for more efficient learning. We show that even when the grouping is unknown, we can infer these parameter groups during learning via a Bayesian approach. We impose a Dirichlet process prior on the parameters. Posterior inference usually involves calculating intractable terms, and we propose two approximation algorithms, namely a Metropolis-Hastings algorithm with auxiliary variables and a Gibbs sampling algorithm with “stripped” Beta approximation (Gibbs SBA). Simulations show that both algorithms outperform conventional maximum likelihood estimation (MLE). Gibbs SBA’s performance is close to that of Gibbs sampling with exact likelihood calculation. Models learned with Gibbs SBA also generalize better than the models learned by MLE on real-world Senate voting data.
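The abstract describes the grouping mechanism at a high level: a Dirichlet process prior over the parameters induces a random partition, so parameters that fall in the same group share a value. The sketch below is a minimal illustration of that prior only, not the paper's Metropolis-Hastings or Gibbs SBA inference; the function name sample_crp_partition, the concentration parameter alpha, and the use of NumPy are assumptions made for the example. It draws a partition of parameter indices from the Chinese Restaurant Process representation of a Dirichlet process.

import numpy as np

def sample_crp_partition(n_params, alpha, seed=None):
    # Illustrative sketch (not the paper's algorithm): draw a random partition of
    # n_params items from a Chinese Restaurant Process with concentration alpha.
    # Item i joins an existing group with probability proportional to the group's
    # current size, or starts a new group with probability proportional to alpha.
    rng = np.random.default_rng(seed)
    assignments = np.empty(n_params, dtype=int)
    group_sizes = []  # group_sizes[k] = number of items already in group k
    for i in range(n_params):
        weights = np.array(group_sizes + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(group_sizes):
            group_sizes.append(1)  # open a new group
        else:
            group_sizes[k] += 1
        assignments[i] = k
    return assignments

if __name__ == "__main__":
    # With a modest concentration, most of the 50 parameters end up sharing a few
    # groups, which is the kind of sharing the DP prior in the abstract is meant to capture.
    z = sample_crp_partition(n_params=50, alpha=1.0, seed=0)
    print("group assignments:", z)
    print("number of groups :", len(np.unique(z)))

With a small alpha the process reuses a handful of groups, mirroring the situation described in the abstract where repeated network patterns give rise to a small number of distinct parameter values.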
[1] A. U. Asuncion, Q. Liu, A. T. Ihler, and P. Smyth. Particle filtered MCMC-MLE with connections to contrastive divergence. In ICML, 2010.
[2] O. Banerjee, L. El Ghaoui, and A. d’Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. JMLR, 9:485–516, June 2008.
[3] M. J. Beal, Z. Ghahramani, and C. E. Rasmussen. The infinite hidden Markov model. In NIPS, 2002.
[4] J. Besag. Statistical analysis of non-lattice data. JRSS-D, 24(3):179–195, 1975.
[5] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003.
[6] S. P. Chatzis and G. Tsechpenakis. The infinite hidden Markov random field model. In ICCV, 2009.
[7] T. S. Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2):209–230, 1973.
[8] J. V. Gael, Y. Saatci, Y. W. Teh, and Z. Ghahramani. Beam sampling for the infinite hidden Markov model. In ICML, 2008.
[9] A. Gelman and J. Hill. Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, New York, 2007.
[10] S. J. Gershman and D. M. Blei. A tutorial on Bayesian nonparametric models. Journal of Mathematical Psychology, 56(1):1–12, 2012.
[11] C. J. Geyer. Markov chain Monte Carlo maximum likelihood. Computing Science and Statistics, pages 156–163, 1991.
[12] M. Gutmann and J. Hirayama. Bregman divergence as a general framework to estimate unnormalized statistical models. In UAI, 2011.
[13] M. Gutmann and A. Hyvärinen. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In AISTATS, 2010.
[14] L. A. Hannah, D. M. Blei, and W. B. Powell. Dirichlet process mixtures of generalized linear models. JMLR, 12:1923–1953, 2011.
[15] G. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14:1771–1800, 2002.
[16] A. Hyvärinen. Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables. IEEE Transactions on Neural Networks, 18(5):1529–1531, 2007.
[17] A. Hyvärinen. Some extensions of score matching. Computational Statistics & Data Analysis, 51(5):2499–2512, 2007.
[18] S. Lyu. Unifying non-maximum likelihood learning objectives with minimum KL contraction. In NIPS, 2011.
[19] M. Meila. Comparing clusterings by the variation of information. In COLT, 2003.
[20] J. Møller, A. Pettitt, R. Reeves, and K. Berthelsen. An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika, 93(2):451–458, 2006.
[21] I. Murray, Z. Ghahramani, and D. J. C. MacKay. MCMC for doubly-intractable distributions. In UAI, 2006.
[22] R. M. Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2):249–265, 2000.
[23] P. Orbanz and Y. W. Teh. Bayesian nonparametric models. In Encyclopedia of Machine Learning. Springer, 2010.
[24] K. Palla, D. A. Knowles, and Z. Ghahramani. An infinite latent attribute model for network data. In ICML, 2012.
[25] J. G. Propp and D. B. Wilson. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms, 9(1-2):223–252, 1996.
[26] C. E. Rasmussen. The infinite Gaussian mixture model. In NIPS, 2000.
[27] C. E. Rasmussen and Z. Ghahramani. Infinite mixtures of Gaussian process experts. In NIPS, 2001.
[28] B. Shahbaba and R. Neal. Nonlinear models using Dirichlet process mixtures. JMLR, 10:1829–1850, 2009.
[29] T. Tieleman. Training restricted Boltzmann machines using approximations to the likelihood gradient. In ICML, 2008.
[30] T. Tieleman and G. Hinton. Using fast weights to improve persistent contrastive divergence. In ICML, 2009.
[31] D. Vickrey, C. Lin, and D. Koller. Non-local contrastive objectives. In ICML, 2010.
[32] J. Zhu, N. Chen, and E. P. Xing. Infinite latent SVM for classification and multi-task learning. In NIPS, 2011.
[33] J. Zhu, N. Chen, and E. P. Xing. Infinite SVM: a Dirichlet process mixture of large-margin kernel machines. In ICML, 2011.
[34] S. C. Zhu and X. Liu. Learning in Gibbsian fields: How accurate and how fast can it be? IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:1001–1006, 2002.