nips nips2012 nips2012-118 nips2012-118-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Seong-hwan Jun, Liangliang Wang, Alexandre Bouchard-côté
Abstract: We propose a novel method for scalable parallelization of SMC algorithms, Entangled Monte Carlo simulation (EMC). EMC avoids the transmission of particles between nodes, and instead reconstructs them from the particle genealogy. In particular, we show that we can reduce the communication to the particle weights for each machine while efficiently maintaining implicit global coherence of the parallel simulation. We explain methods to efficiently maintain a genealogy of particles from which any particle can be reconstructed. We demonstrate using examples from Bayesian phylogenetic that the computational gain from parallelization using EMC significantly outweighs the cost of particle reconstruction. The timing experiments show that reconstruction of particles is indeed much more efficient as compared to transmission of particles. 1
[1] D. P. Anderson. BOINC: A System for Public-Resource Computing and Storage. In GRID ’04: Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing, pages 4–10, Washington, DC, USA, 2004. IEEE Computer Society.
[2] Y. W. Teh, H. Daum´ III, and D. M. Roy. Bayesian agglomerative clustering with coalescents. e In Advances in Neural Information Processing Systems (NIPS), 2008.
[3] J. G. Propp and D. B. Wilson. Coupling from the past: a user’s guide. Microsurveys in Discrete Probability. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 41:181–192, 1998.
[4] P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo samplers. Journal of The Royal Statistical Society Series B-statistical Methodology, 68(3):411–436, 2006.
[5] P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo for Bayesian computation. Bayesian Statistics, 8:1–34, 2007.
[6] A. Lee, C. Yau, M. B. Giles, A. Doucet, and C. C. Holmes. On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Journal of Computational and Graphical Statistics, 19(4):769–789, 2010.
[7] S. Singh and A. McCallum. Towards asynchronous distributed MCMC inference for large graphical models. In Neural Information Processing Systems (NIPS), Big Learning Workshop on Algorithms, Systems, and Tools for Learning at Scale, 2011.
[8] J. Gonzalez, Y. Low, A. Gretton, and C. Guestrin. Parallel Gibbs sampling: From colored fields to thin junction trees. In In Artificial Intelligence and Statistics (AISTATS), Ft. Lauderdale, FL, May 2011.
[9] S. Singh, A. Subramanya, F. Pereira, and A. McCallum. Large-scale cross-document coreference using distributed inference and hierarchical models. In Association for Computational Linguistics: Human Language Technologies (ACL HLT), 2011.
[10] R. H. Swendsen and J-S. Wang. Replica Monte Carlo simulation of spin-glasses. Phys. Rev. Lett., 57:2607–2609, Nov 1986.
[11] A. Bouchard-Cˆ t´ , S. Sankararaman, and M. I. Jordan. Phylogenetic inference via Sequential oe Monte Carlo. Systematic Biology, 2011.
[12] A. Doucet, N. de Freitas, and N. Gordon. Sequential Monte Carlo methods in practice. Springer, 2001.
[13] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. ACM SIGCOMM 2001, pages 149–160, 2001.
[14] J. F. C. Kingman. On the Genealogy of Large Populations. Journal of Applied Probability, 19:27–43, 1982.
[15] J. Felsenstein. Inferring phylogenies. Sinauer Associates, 2003.
[16] C. Semple and M. Steel. Phylogenetics. Oxford, 2003.
[17] J. P. Huelsenbeck and F. Ronquist. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics, 17(8):754–755, August 2001.
[18] J.J. Cannone, S. Subramanian, M.N. Schnare, J.R. Collett, L.M. D’Souza, Y. Du, B. Feng, N. Lin, L.V. Madabusi, K.M. Muller, N. Pande, Z. Shang, N. Yu, and R.R. Gutell. The comparative RNA web (CRW) site: An online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BioMed Central Bioinformatics, 2002. 9