nips2011-228-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yichuan Zhang, Charles A. Sutton
Abstract: The performance of Markov chain Monte Carlo methods is often sensitive to the scaling of and correlations between the random variables of interest. An important source of information about the local correlation and scale is the Hessian matrix of the target distribution, but computing it is often either expensive or infeasible. In this paper we propose MCMC samplers that make use of quasi-Newton approximations, which approximate the Hessian of the target distribution from previous samples and gradients generated by the sampler. A key issue is that MCMC samplers that depend on the history of previous states are in general not valid. We address this problem by using limited-memory quasi-Newton methods, which depend only on a fixed window of previous samples. On several real-world datasets, we show that the quasi-Newton sampler is more effective than standard Hamiltonian Monte Carlo, at a fraction of the cost of MCMC methods that require higher-order derivatives.
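The central computational ingredient is easy to sketch: build a limited-memory quasi-Newton (L-BFGS-style) approximation of the inverse Hessian from a fixed window of previous samples and gradients, then apply it to a vector with the standard two-loop recursion of Nocedal and Wright [10]. The minimal Python sketch below is illustrative only; the function name, the toy diagonal-precision Gaussian, and the fabricated sample window are assumptions made for the example, not the authors' implementation.

```python
import numpy as np

def lbfgs_inv_hessian_vec(s_list, y_list, v):
    """Apply an approximate inverse Hessian to v via the L-BFGS
    two-loop recursion, built from a fixed window of differences
    s_k = x_{k+1} - x_k and y_k = g_{k+1} - g_k (oldest first)."""
    q = np.array(v, dtype=float)
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q = q - a * y
    # Initial scaling H_0 = gamma * I from the most recent pair.
    gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
    r = gamma * q
    # Second loop: oldest pair to newest.
    for s, y, rho, a in zip(s_list, y_list, rhos, reversed(alphas)):
        b = rho * (y @ r)
        r = r + (a - b) * s
    return r

# Toy check on an ill-conditioned Gaussian: -log p(x) = 0.5 x^T A x,
# so the gradient is A @ x and the true inverse Hessian is A^{-1}.
A = np.diag([100.0, 1.0])
grad = lambda x: A @ x

rng = np.random.default_rng(0)
xs = [rng.standard_normal(2) for _ in range(6)]   # stand-in "window" of samples
s_list = [xs[i + 1] - xs[i] for i in range(5)]
y_list = [grad(xs[i + 1]) - grad(xs[i]) for i in range(5)]

v = grad(np.array([1.0, 1.0]))
print(lbfgs_inv_hessian_vec(s_list, y_list, v))   # roughly A^{-1} v = [1., 1.]
```

In a sampler, such a product would precondition gradient-based proposals (e.g., the Hamiltonian dynamics of [9]); restricting the approximation to a fixed window of past states is what the abstract identifies as the key to keeping the chain valid.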
[1] S. Barthelme and N. Chopin. Discussion on Riemannian Manifold Hamiltonian Monte Carlo. Journal of the Royal Statistical Society, B (Statistical Methodology), 73:163–164, 2011. doi: 10.1111/j.1467-9868.2010.00765.x.
[2] K. Brodlie, A. Gourlay, and J. Greenstadt. Rank-one and rank-two corrections to positive definite matrices expressed in product form. IMA Journal of Applied Mathematics, 11(1):73–82, 1973.
[3] S. Chib, E. Greenberg, and R. Winkelmann. Posterior simulation and Bayes factors in panel count data models. Journal of Econometrics, 86(1):33–54, June 1998. URL http://ideas.repec.org/a/eee/econom/v86y1998i1p33-54.html.
[4] W. R. Gilks, G. O. Roberts, and E. I. George. Adaptive direction sampling. The Statistician, 43(1):179–189, 1994.
[5] M. Girolami and B. Calderhead. Riemannian manifold Hamiltonian Monte Carlo (with discussion). Journal of the Royal Statistical Society, B (Statistical Methodology), 73:123–214, 2011. doi: 10.1111/j.1467-9868.2010.00765.x.
[6] H. Haario, E. Saksman, and J. Tamminen. An adaptive Metropolis algorithm. Bernoulli, 7(2): 223–242, 2001.
[7] J. S. Liu, F. Liang, and W. H. Wong. The multiple-try method and local optimization in Metropolis sampling. Journal of the American Statistical Association, 95(449):121–134, 2000.
[8] A. McCallum. Frequently asked questions data set. http://www.cs.umass.edu/~mccallum/data/faqdata.
[9] R. M. Neal. MCMC using Hamiltonian dynamics. In S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, editors, Handbook of Markov Chain Monte Carlo. Chapman & Hall / CRC Press, 2010.
[10] J. Nocedal and S. J. Wright. Numerical Optimization. Springer-Verlag, New York, 1999. ISBN 0-387-98793-2.
[11] Y. Qi and T. P. Minka. Hessian-based Markov chain Monte Carlo algorithms. In First Cape Cod Workshop on Monte Carlo Methods, September 2002.
[12] Y. Qi, M. Szummer, and T. P. Minka. Bayesian conditional random fields. In Artificial Intelligence and Statistics (AISTATS), Barbados, January 2005.
[13] G. O. Roberts and J. S. Rosenthal. Coupling and ergodicity of adaptive MCMC. Journal of Applied Probability, 44(2):458–475, 2007.
[14] D. M. Roy. Discussion on Riemannian Manifold Hamiltonian Monte Carlo. Journal of the Royal Statistical Society, B (Statistical Methodology), 73:194–195, 2011. doi: 10.1111/j. 1467-9868.2010.00765.x.
[15] M. Welling and S. Parise. Bayesian random fields: The Bethe-Laplace approximation. In Uncertainty in Artificial Intelligence (UAI), 2006.