nips nips2007 nips2007-189 nips2007-189-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jon D. McAuliffe, David M. Blei
Abstract: We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and web page popularity predicted from text descriptions. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression.
[1] P. Bickel and K. Doksum. Mathematical Statistics. Prentice Hall, 2000.
[2] D. Blei and M. Jordan. Modeling annotated data. In SIGIR, pages 127–134. ACM Press, 2003.
[3] D. Blei and J. McAuliffe. Supervised topic models. In preparation, 2007.
[4] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
[5] P. Flaherty, G. Giaever, J. Kumm, M. Jordan, and A. Arkin. A latent variable model for chemogenomic profiling. Bioinformatics, 21(15):3286–3293, 2005.
[6] K. Fukumizu, F. Bach, and M. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. Journal of Machine Learning Research, 5:73–99, 2004.
[7] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. 2001.
[8] A. McCallum, C. Pal, G. Druck, and X. Wang. Multi-conditional learning: Generative/discriminative training for clustering and classification. In AAAI, 2006.
[9] P. McCullagh and J. A. Nelder. Generalized Linear Models. Chapman & Hall, 1989.
[10] B. Pang and L. Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the ACL, 2005.