281 nips-2011-The Doubly Correlated Nonparametric Topic Model

Source: pdf

Author: Dae I. Kim, Erik B. Sudderth

Abstract: Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata.
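
The stick-breaking construction mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the details, so several choices here are assumptions: the logistic squashing function, the finite number of topics, the noise level, and every name in the code (`document_scores`, `eta`, `features`) are hypothetical. The sketch also leaves out the square-root covariance representation that the DCNT uses for topic correlations.

```python
import math
import random


def logistic(x):
    """Map a real number to a fraction in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def stick_breaking_weights(scores):
    """Turn real-valued scores into topic proportions by stick breaking.

    Each score is squashed to a fraction in (0, 1); topic k takes that
    fraction of whatever stick length the earlier topics left over.
    The mass left after the last score is returned separately.
    """
    weights = []
    remaining = 1.0
    for v in scores:
        frac = logistic(v)
        weights.append(frac * remaining)
        remaining *= 1.0 - frac
    return weights, remaining


def document_scores(features, regression_weights, noise_std, rng):
    """Assumed Gaussian regression: score_k ~ N(eta_k . features, noise_std^2)."""
    scores = []
    for eta_k in regression_weights:
        mean = sum(e * f for e, f in zip(eta_k, features))
        scores.append(rng.gauss(mean, noise_std))
    return scores


rng = random.Random(0)
features = [1.0, 0.5]                          # hypothetical metadata for one document
eta = [[0.2, -1.0], [1.5, 0.3], [-0.7, 0.8]]   # hypothetical per-topic regression weights
scores = document_scores(features, eta, noise_std=0.1, rng=rng)
weights, leftover = stick_breaking_weights(scores)
```

In this sketch the topic proportions plus the leftover mass always sum to one, so adding more scores only subdivides the leftover. That property is what lets a stick-breaking prior range over an unbounded series of topics.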

reference text

[1] A. Agovic and A. Banerjee. Gaussian process topic models. In UAI, 2010.

[2] D. M. Blei and J. D. Lafferty. A correlated topic model of science. Ann. Appl. Stat., 1(1):17–35, 2007.

[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. J. Mach. Learn. Res., 3:993–1022, March 2003.

[4] J. Chang, J. Boyd-Graber, S. Gerrish, C. Wang, and D. M. Blei. Reading tea leaves: How humans interpret topic models. In NIPS, 2009.

[5] T. S. Ferguson. A Bayesian analysis of some nonparametric problems. Ann. Stat., 1(2):209–230, 1973.

[6] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman & Hall, 2004.

[7] T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, 2004.

[8] H. Ishwaran and L. F. James. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96(453):161–173, Mar. 2001.

[9] W. Li, D. Blei, and A. McCallum. Nonparametric Bayes Pachinko allocation. In UAI, 2008.

[10] H. F. Lopes and M. West. Bayesian model assessment in factor analysis. Stat. Sin., 14:41–67, 2004.

[11] D. Mimno and A. McCallum. Topic models conditioned on arbitrary features with Dirichlet-multinomial regression. In UAI, 2008.

[12] I. Murray and R. Salakhutdinov. Evaluating probabilities under high-dimensional latent variable models. In NIPS 21, pages 1137–1144. 2009.

[13] J. Paisley, C. Wang, and D. Blei. The discrete infinite logistic normal distribution for mixed-membership modeling. In AISTATS, 2011.

[14] L. Ren, L. Du, L. Carin, and D. B. Dunson. Logistic stick-breaking process. J. Mach. Learn. Res., 12, 2011.

[15] A. Rodriguez and D. B. Dunson. Nonparametric Bayesian models through probit stick-breaking processes. J. Bayesian Analysis, 2011.

[16] J. Sethuraman. A constructive definition of Dirichlet priors. Stat. Sin., 4:639–650, 1994.

[17] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.

[18] Y. W. Teh, M. Seeger, and M. I. Jordan. Semiparametric latent factor models. In AISTATS 10, 2005.

[19] H. M. Wallach, I. Murray, R. Salakhutdinov, and D. Mimno. Evaluation methods for topic models. In ICML, 2009.