nips nips2007 nips2007-31 nips2007-31-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yee W. Teh, Daniel J. Hsu, Hal Daumé III
Abstract: We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman’s coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over the state-of-the-art, and demonstrate our approach in document clustering and phylolinguistics.
[1] R. O. Duda and P. E. Hart. Pattern Classification And Scene Analysis. Wiley and Sons, New York, 1973.
[2] R. M. Neal. Defining priors for distributions using Dirichlet diffusion trees. Technical Report 0104, Department of Statistics, University of Toronto, 2001.
[3] C. K. I. Williams. A MCMC approach to hierarchical mixture modelling. In Advances in Neural Information Processing Systems, volume 12, 2000.
[4] C. Kemp, T. L. Griffiths, S. Stromsten, and J. B. Tenenbaum. Semi-supervised learning with trees. In Advances in Neural Information Processing Systems, volume 16, 2004.
[5] D. M. Roy, C. Kemp, V. Mansinghka, and J. B. Tenenbaum. Learning annotated hierarchies from relational data. In Advances in Neural Information Processing Systems, volume 19, 2007.
[6] K. A. Heller and Z. Ghahramani. Bayesian hierarchical clustering. In Proceedings of the International Conference on Machine Learning, volume 22, 2005.
[7] N. Friedman. Pcluster: Probabilistic agglomerative clustering of gene expression profiles. Technical Report 2003-80, Hebrew University, 2003.
[8] J. F. C. Kingman. On the genealogy of large populations. Journal of Applied Probability, 19:27–43, 1982. Essays in Statistical Science.
[9] J. F. C. Kingman. The coalescent. Stochastic Processes and their Applications, 13:235–248, 1982.
[10] P. Fearnhead. Sequential Monte Carlo Methods in Filter Theory. PhD thesis, Merton College, University of Oxford, 1998.
[11] R. M. Neal. Annealed importance sampling. Technical Report 9805, Department of Statistics, University of Toronto, 1998.
[12] A. McMahon and R. McMahon. Language Classification by Numbers. Oxford University Press, 2005.
[13] M. Haspelmath, M. Dryer, D. Gil, and B. Comrie, editors. The World Atlas of Language Structures. Oxford University Press, 2005.
[14] H. Daumé III and L. Campbell. A Bayesian model for discovering typological implications. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2007.
[15] J. Pitman. Coalescents with multiple collisions. Annals of Probability, 27:1870–1902, 1999.