nips nips2010 nips2010-276 nips2010-276-reference knowledge-graph by maker-knowledge-mining

276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data


Source: pdf

Author: Ryan P. Adams, Zoubin Ghahramani, Michael I. Jordan

Abstract: Many data are naturally modeled by an unobserved hierarchical structure. In this paper we propose a flexible nonparametric prior over unknown data hierarchies. The approach uses nested stick-breaking processes to allow for trees of unbounded width and depth, where data can live at any node and are infinitely exchangeable. One can view our model as providing infinite mixtures where the components have a dependency structure corresponding to an evolutionary diffusion down a tree. By using a stick-breaking approach, we can apply Markov chain Monte Carlo methods based on slice sampling to perform Bayesian inference and simulate from the posterior distribution on trees. We apply our method to hierarchical clustering of images and topic modeling of text data.

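The abstract sketches the generative process: each datum walks down an unboundedly wide and deep tree, stopping at a node with a depth-dependent probability and otherwise choosing a child by ordinary stick breaking before recursing. The short Python sketch below illustrates one reading of that process; the hyperparameter names (alpha0, lambda_, gamma) and the depth-decaying stopping schedule are illustrative assumptions, not the authors' code.

import numpy as np

rng = np.random.default_rng(0)

def sample_node(path=(), alpha0=1.0, lambda_=0.5, gamma=1.0):
    # Walk down the tree lazily; return the path of the node this datum
    # lands on, e.g. () is the root and (0, 2) is the third child of the
    # root's first child.
    depth = len(path)
    # Stop here with probability nu; the Beta(1, alpha0 * lambda_**depth)
    # prior (an assumed depth-dependent schedule) makes deeper nodes
    # progressively easier to stop at, so paths are finite almost surely.
    nu = rng.beta(1.0, alpha0 * lambda_ ** depth)
    if rng.random() < nu:
        return path
    # Otherwise pick a child by sequential stick breaking over the
    # countably infinite children, instantiating sticks only as needed:
    # child k is chosen with probability psi_k * prod_{j<k} (1 - psi_j).
    child = 0
    while True:
        psi = rng.beta(1.0, gamma)
        if rng.random() < psi:
            return sample_node(path + (child,), alpha0, lambda_, gamma)
        child += 1

# Draw node assignments for a few data points; identical paths indicate
# data clustered at (or near) the same node of the infinite tree.
print([sample_node() for _ in range(5)])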

reference text

[1] Christopher K. I. Williams. A MCMC approach to hierarchical mixture modelling. In Advances in Neural Information Processing Systems 12, pages 680–686, 2000.

[2] Radford M. Neal. Density modeling and clustering using Dirichlet diffusion trees. In Bayesian Statistics 7, pages 619–629, 2003.

[3] Katherine A. Heller and Zoubin Ghahramani. Bayesian hierarchical clustering. In Proceedings of the 22nd International Conference on Machine Learning, 2005.

[4] Yee Whye Teh, Hal Daumé III, and Daniel Roy. Bayesian agglomerative clustering with coalescents. In Advances in Neural Information Processing Systems 20, 2007.

[5] Charles Kemp, Thomas L. Griffiths, Sean Stromsten, and Joshua B. Tenenbaum. Semi-supervised learning with trees. In Advances in Neural Information Processing Systems 16, 2004.

[6] Daniel M. Roy, Charles Kemp, Vikash K. Mansinghka, and Joshua B. Tenenbaum. Learning annotated hierarchies from relational data. In Advances in Neural Information Processing Systems 19, 2007.

[7] Hal Daumé III. Bayesian multitask learning with latent hierarchies. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009.

[8] Edward Meeds, David A. Ross, Richard S. Zemel, and Sam T. Roweis. Learning stick-figure models using nonparametric Bayesian priors over trees. In IEEE Conference on Computer Vision and Pattern Recognition, 2008.

[9] Evgeniy Bart, Ian Porteous, Pietro Perona, and Max Welling. Unsupervised learning of visual taxonomies. In IEEE Conference on Computer Vision and Pattern Recognition, 2008.

[10] Jayaram Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639–650, 1994.

[11] Jim Pitman. Poisson–Dirichlet and GEM invariant distributions for split-and-merge transformation of an interval partition. Combinatorics, Probability and Computing, 11:501–514, 2002.

[12] Jim Pitman and Marc Yor. The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator. The Annals of Probability, 25(2):855–900, 1997.

[13] R. Daniel Mauldin, William D. Sudderth, and S. C. Williams. Pólya trees and random distributions. The Annals of Statistics, 20(3):1203–1221, September 1992.

[14] Hemant Ishwaran and Lancelot F. James. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96(453):161–173, March 2001.

[15] David Blackwell and James B. MacQueen. Ferguson distributions via Pólya urn schemes. The Annals of Statistics, 1(2):353–355, 1973.

[16] Jim Pitman. Random discrete distributions invariant under size-biased permutation. Advances in Applied Probability, 28(2):525–539, 1996.

[17] David M. Blei, Thomas L. Griffiths, and Michael I. Jordan. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. Journal of the ACM, 57(2):1–30, 2010.

[18] Yee Whye Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.

[19] Radford M. Neal. Slice sampling (with discussion). The Annals of Statistics, 31(3):705–767, 2003.

[20] Stephen G. Walker. Sampling the Dirichlet mixture model with slices. Communications in Statistics, 36:45–54, 2007.

[21] Omiros Papaspiliopoulos and Gareth O. Roberts. Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika, 95(1):169–186, 2008.

[22] Antonio Torralba, Rob Fergus, and William T. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11):1958–1970, 2008.

[23] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, Department of Computer Science, University of Toronto, 2009.

[24] Radford M. Neal. MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo. Chapman & Hall / CRC Press, 2011.

[25] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

[26] Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov, and David Mimno. Evaluation methods for topic models. In Proceedings of the 26th International Conference on Machine Learning, 2009.

[27] Thomas L. Griffiths and Mark Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America, 101(Suppl. 1):5228–5235, 2004.

[28] Steven N. MacEachern. Dependent nonparametric processes. In Proceedings of the Section on Bayesian Statistical Science, 1999.

[29] Steven N. MacEachern, Athanasios Kottas, and Alan E. Gelfand. Spatial nonparametric Bayesian models. Technical Report 01-10, Institute of Statistics and Decision Sciences, Duke University, 2001.