jmlr jmlr2011 jmlr2011-90 jmlr2011-90-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Thomas L. Griffiths, Zoubin Ghahramani
Abstract: The Indian buffet process is a stochastic process defining a probability distribution over equivalence classes of sparse binary matrices with a finite number of rows and an unbounded number of columns. This distribution is suitable for use as a prior in probabilistic models that represent objects using a potentially infinite array of features, or that involve bipartite graphs in which the size of at least one class of nodes is unknown. We give a detailed derivation of this distribution, and illustrate its use as a prior in an infinite latent feature model. We then review recent applications of the Indian buffet process in machine learning, discuss its extensions, and summarize its connections to other stochastic processes.
Keywords: nonparametric Bayes, Markov chain Monte Carlo, latent variable models, Chinese restaurant processes, beta process, exchangeable distributions, sparse binary matrices
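As a rough illustration of the generative scheme the abstract summarizes (the "customers and dishes" metaphor behind the one-parameter Indian buffet process), the following sketch draws a sparse binary feature matrix whose number of columns is random but finite. It is a minimal sketch only; the function name sample_ibp and its parameters (n_customers, alpha, rng) are illustrative choices, not code from the paper.

import numpy as np

def sample_ibp(n_customers, alpha, rng=None):
    """Draw one binary feature matrix Z from the one-parameter IBP(alpha)."""
    rng = np.random.default_rng(rng)
    dish_counts = []          # m_k: how many customers have taken dish k so far
    rows = []                 # one list of 0/1 entries per customer
    for i in range(1, n_customers + 1):
        # Customer i takes existing dish k with probability m_k / i.
        row = [int(rng.random() < m / i) for m in dish_counts]
        for k, taken in enumerate(row):
            dish_counts[k] += taken
        # The customer then tries Poisson(alpha / i) brand-new dishes.
        n_new = rng.poisson(alpha / i)
        row.extend([1] * n_new)
        dish_counts.extend([1] * n_new)
        rows.append(row)
    # Pad earlier rows with zeros so all rows share the final number of columns.
    n_cols = len(dish_counts)
    Z = np.zeros((n_customers, n_cols), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

# Example: a 10-row matrix; the expected total number of dishes grows as alpha * H_N.
print(sample_ibp(10, alpha=2.0, rng=0))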
D. Aldous. Exchangeability and related topics. In École d'été de probabilités de Saint-Flour, XIII—1983, pages 1–198. Springer, Berlin, 1985.
C. Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, 2:1152–1174, 1974.
M. J. Beal, Z. Ghahramani, and C. E. Rasmussen. The infinite hidden Markov model. Machine Learning, pages 29–245, 2002.
A. J. Bell and T. J. Sejnowski. An information maximisation approach to blind separation and blind deconvolution. Neural Computation, 7(6):1129–1159, 1995.
J. M. Bernardo and A. F. M. Smith. Bayesian Theory. Wiley, New York, 1994.
D. Blackwell and J. MacQueen. Ferguson distributions via Pólya urn schemes. The Annals of Statistics, 1:353–355, 1973.
D. Blei and M. Jordan. Variational inference for Dirichlet process mixtures. Journal of Bayesian Analysis, 1:121–144, 2006.
D. Blei, T. Griffiths, M. Jordan, and J. Tenenbaum. Hierarchical topic models and the nested Chinese restaurant process. In Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.
C. A. Bush and S. N. MacEachern. A semi-parametric Bayesian model for randomized block designs. Biometrika, 83:275–286, 1996.
J.-F. Cardoso. Blind signal separation: statistical principles. Proceedings of the IEEE, 86(10):2009–2025, October 1998.
W. Chu, Z. Ghahramani, R. Krause, and D. L. Wild. Identifying protein complexes in high-throughput protein interaction screens using an infinite latent feature model. In BIOCOMPUTING 2006: Proceedings of the Pacific Symposium, volume 11, pages 231–242, 2006.
P. Comon. Independent component analysis: A new concept. Signal Processing, 36:287–314, 1994.
D. B. Dahl. An improved merge-split sampler for conjugate Dirichlet process mixture models. Technical Report 1086, Department of Statistics, University of Wisconsin, 2003.
A. d'Aspremont, L. El Ghaoui, M. I. Jordan, and G. R. G. Lanckriet. A direct formulation for sparse PCA using semidefinite programming. Technical Report UCB/CSD-04-1330, Computer Science Division, University of California, Berkeley, 2004.
F. Doshi-Velez and Z. Ghahramani. Accelerated sampling for the Indian buffet process. In International Conference on Machine Learning (ICML 2009), 2009a.
F. Doshi-Velez and Z. Ghahramani. Correlated non-parametric latent feature models. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI-09), pages 143–150, 2009b.
F. Doshi-Velez, K. T. Miller, J. Van Gael, and Y. W. Teh. Variational inference for the Indian buffet process. In Artificial Intelligence and Statistics Conference (AISTATS 2009), 2009.
F. Doshi-Velez, D. Knowles, S. Mohamed, and Z. Ghahramani. Large scale nonparametric Bayesian inference: Data parallelisation in the Indian buffet process. In Advances in Neural Information Processing Systems 22, 2010.
M. D. Escobar and M. West. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90:577–588, 1995.
P. Fearnhead. Particle filters for mixture models with an unknown number of components. Statistics and Computing, 14:11–21, 2004.
T. Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1:209–230, 1973.
T. S. Ferguson. Bayesian density estimation by mixtures of normal distributions. In M. Rizvi, J. Rustagi, and D. Siegmund, editors, Recent Advances in Statistics, pages 287–302. Academic Press, New York, 1983.
E. B. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky. Sharing features among dynamical systems with beta processes. In Advances in Neural Information Processing Systems 22, 2010.
S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721–741, 1984.
Z. Ghahramani. Factorial learning and the EM algorithm. In Advances in Neural Information Processing Systems 7. Morgan Kaufmann, San Francisco, CA, 1995.
Z. Ghahramani, T. L. Griffiths, and P. Sollich. Bayesian nonparametric latent feature models. In Bayesian Statistics 8. Oxford University Press, Oxford, 2007.
W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, editors. Markov Chain Monte Carlo in Practice. Chapman and Hall, Suffolk, UK, 1996.
D. Görür, F. Jäkel, and C. E. Rasmussen. A choice model with infinitely many latent features. In Proceedings of the 23rd International Conference on Machine Learning (ICML 2006), pages 361–368, New York, 2006. ACM Press.
P. Green and S. Richardson. Modelling heterogeneity with and without the Dirichlet process. Scandinavian Journal of Statistics, 28:355–377, 2001.
T. L. Griffiths and Z. Ghahramani. Infinite latent feature models and the Indian buffet process. Technical Report 2005-001, Gatsby Computational Neuroscience Unit, 2005.
T. L. Griffiths and Z. Ghahramani. Infinite latent feature models and the Indian buffet process. In Advances in Neural Information Processing Systems 18, Cambridge, MA, 2006. MIT Press.
K. A. Heller and Z. Ghahramani. Bayesian hierarchical clustering. In International Conference on Machine Learning (ICML 2005), 2005.
N. L. Hjort. Nonparametric Bayes estimators based on beta processes in models for life history data. Annals of Statistics, 18:1259–1294, 1990.
H. Ishwaran and L. F. James. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96:1316–1332, 2001.
S. Jain and R. M. Neal. A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of Computational and Graphical Statistics, 13:158–182, 2004.
I. T. Jolliffe. Principal Component Analysis. Springer, New York, 1986.
I. T. Jolliffe and M. Uddin. A modified principal component technique based on the lasso. Journal of Computational and Graphical Statistics, 12:531–547, 2003.
C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda. Learning systems of concepts with an infinite relational model. In Proceedings of the 21st National Conference on Artificial Intelligence, 2006.
D. Knowles and Z. Ghahramani. Infinite sparse factor analysis and infinite independent components analysis. In 7th International Conference on Independent Component Analysis and Signal Separation (ICA 2007), Lecture Notes in Computer Science (LNCS). Springer, 2007.
D. J. C. MacKay. Maximum likelihood and covariant algorithms for independent component analysis. Technical Report Draft 3.7, Cavendish Laboratory, University of Cambridge, Madingley Road, Cambridge CB3 0HE, December 1996.
E. Meeds, Z. Ghahramani, R. Neal, and S. T. Roweis. Modeling dyadic data with binary latent factors. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems, Cambridge, MA, 2007. MIT Press.
K. T. Miller, T. L. Griffiths, and M. I. Jordan. The phylogenetic Indian buffet process: A non-exchangeable nonparametric prior for latent features. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI 2008), 2008.
K. T. Miller, T. L. Griffiths, and M. I. Jordan. Nonparametric latent feature models for link prediction. In Advances in Neural Information Processing Systems 22, 2010.
T. Minka. Bayesian linear regression. Technical report, MIT Media Lab, 2000. http://research.microsoft.com/en-us/um/people/minka/papers/linear.html.
D. J. Navarro and T. L. Griffiths. A nonparametric Bayesian model for inferring features from similarity judgments. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, Cambridge, MA, 2007. MIT Press.
R. M. Neal. Bayesian mixture modeling. In Maximum Entropy and Bayesian Methods: Proceedings of the 11th International Workshop on Maximum Entropy and Bayesian Methods of Statistical Analysis, pages 197–211. Kluwer, Dordrecht, 1992.
R. M. Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9:249–265, 2000.
R. M. Neal. Density modeling and clustering using Dirichlet diffusion trees. In J. M. Bernardo et al., editors, Bayesian Statistics 7, pages 619–629, 2003.
K. Nowicki and T. A. B. Snijders. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96:1077–1087, 2001.
P. Orbanz. Construction of nonparametric Bayesian models from parametric Bayes equations. In Advances in Neural Information Processing Systems 22, 2010.
J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Francisco, CA, 1988.
J. Pitman. Combinatorial stochastic processes, 2002. Notes for Saint Flour Summer School.
P. Rai and H. Daumé. The infinite hierarchical factor regression model. In Advances in Neural Information Processing Systems, volume 21, 2009.
C. Rasmussen. The infinite Gaussian mixture model. In Advances in Neural Information Processing Systems 12. MIT Press, Cambridge, MA, 2000.
C. E. Rasmussen and Z. Ghahramani. Occam's razor. In Advances in Neural Information Processing Systems 13. MIT Press, Cambridge, MA, 2001.
S. Roweis and Z. Ghahramani. A unifying review of linear Gaussian models. Neural Computation, 11:305–345, 1999.
D. M. Roy and Y. W. Teh. The Mondrian process. In Advances in Neural Information Processing Systems 21, 2009.
J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639–650, 1994.
R. Shepard and P. Arabie. Additive clustering: Representation of similarities as combinations of discrete overlapping properties. Psychological Review, 86:87–123, 1979.
P. Sollich. Indian buffet process with tunable feature repulsion, 2005.
E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. Describing visual scenes using transformed Dirichlet processes. In Advances in Neural Information Processing Systems 18, Cambridge, MA, 2006. MIT Press.
Y. Teh, M. Jordan, M. Beal, and D. Blei. Hierarchical Dirichlet processes. In Advances in Neural Information Processing Systems 17. MIT Press, Cambridge, MA, 2004.
Y. W. Teh and D. Görür. Indian buffet processes with power-law behavior. In Advances in Neural Information Processing Systems 22, 2010.
Y. W. Teh, D. Görür, and Z. Ghahramani. Stick-breaking construction for the Indian buffet process. In Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS 2007), San Juan, Puerto Rico, 2007.
Y. W. Teh, H. Daumé, and D. M. Roy. Bayesian agglomerative clustering with coalescents. In Advances in Neural Information Processing Systems, volume 20, 2008.
J. B. Tenenbaum. Learning the structure of similarity. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 3–9. MIT Press, Cambridge, MA, 1996.
R. Thibaux and M. I. Jordan. Hierarchical beta processes and the Indian buffet process. In Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS 2007), 2007.
M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 61:611–622, 1999.
M. Titsias. The infinite gamma-Poisson feature model. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20. MIT Press, Cambridge, MA, 2008.
A. Tversky. Elimination by aspects: A theory of choice. Psychological Review, 79:281–299, 1972.
N. Ueda and K. Saito. Parametric mixture models for multi-labeled text. In Advances in Neural Information Processing Systems 15, Cambridge, MA, 2003. MIT Press.
J. Van Gael, Y. W. Teh, and Z. Ghahramani. The infinite factorial hidden Markov model. In Advances in Neural Information Processing Systems, volume 21, 2009.
Y. J. Wang and G. Y. Wong. Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82:8–19, 1987.
M. West, P. Müller, and M. Escobar. Hierarchical priors and mixture models, with application in regression and density estimation. In P. Freeman and A. Smith, editors, Aspects of Uncertainty, pages 363–386. Wiley, New York, 1994.
F. Wood and T. L. Griffiths. Particle filtering for nonparametric Bayesian matrix factorization. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 1513–1520. MIT Press, Cambridge, MA, 2007.
F. Wood, T. L. Griffiths, and Z. Ghahramani. A non-parametric Bayesian method for inferring hidden causes. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI 2006), 2006.
R. S. Zemel and G. E. Hinton. Developing population codes by minimizing description length. In Advances in Neural Information Processing Systems 6. Morgan Kaufmann, San Francisco, CA, 1994.
H. Zou, T. Hastie, and R. Tibshirani. Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15:262–286, 2006.