
64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification


Source: pdf

Author: Simon Lacoste-Julien, Fei Sha, Michael I. Jordan

Abstract: Probabilistic topic models have become popular as methods for dimensionality reduction in collections of text documents or images. These models are usually treated as generative models and trained using maximum likelihood or Bayesian methods. In this paper, we discuss an alternative: a discriminative framework in which we assume that supervised side information is present, and in which we wish to take that side information into account in finding a reduced dimensionality representation. Specifically, we present DiscLDA, a discriminative variation on Latent Dirichlet Allocation (LDA) in which a class-dependent linear transformation is introduced on the topic mixture proportions. This parameter is estimated by maximizing the conditional likelihood. By using the transformed topic mixture proportions as a new representation of documents, we obtain a supervised dimensionality reduction algorithm that uncovers the latent structure in a document collection while preserving predictive power for the task of classification. We compare the predictive power of the latent structure of DiscLDA with unsupervised LDA on the 20 Newsgroups document classification task and show how our model can identify shared topics across classes as well as class-dependent topics.
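The abstract's central device, a class-dependent linear transformation applied to the topic mixture proportions, can be conveyed with a small sketch. The following Python code is an illustration of that structure, not the authors' implementation: it fixes random topic-word distributions `phi` and class maps `T`, fits a per-document point estimate of theta with a simple EM loop (an assumption; the paper integrates theta out and learns T by maximizing the conditional likelihood), and classifies by approximate conditional likelihood. All variable names are hypothetical.

```python
# Toy sketch of the DiscLDA idea from the abstract (NOT the authors' code).
# Each class y owns a linear map T[y] that turns shared topic proportions
# theta (on the L-simplex) into class-dependent topic proportions T[y] @ theta
# (on the K-simplex). Theta is fit here by a simple EM point estimate; the
# paper instead integrates theta out and maximizes the conditional likelihood.
import numpy as np

rng = np.random.default_rng(0)
V, K, L, C = 1000, 10, 6, 2          # vocabulary, topics, theta dims, classes

phi = rng.dirichlet(np.ones(V), size=K)       # (K, V) topic-word distributions
T = rng.dirichlet(np.ones(K), size=(C, L))    # (C, L, K), rows on the K-simplex
T = T.transpose(0, 2, 1)                      # (C, K, L): each column sums to 1

def fit_theta(counts, Ty, n_iter=50):
    """EM point estimate of theta for one document (word-count vector `counts`)
    under class map Ty. The induced model is a mixture of multinomials with
    component word distributions p(w | l) = sum_k Ty[k, l] * phi[k, w]."""
    p_w_given_l = phi.T @ Ty                  # (V, L)
    theta = np.full(L, 1.0 / L)
    for _ in range(n_iter):
        resp = p_w_given_l * theta            # unnormalized p(l | w), (V, L)
        resp /= resp.sum(axis=1, keepdims=True) + 1e-12
        theta = counts @ resp                 # expected counts per component l
        theta /= theta.sum()
    return theta

def predict(counts):
    """Classify by approximate conditional likelihood with a uniform prior:
    argmax_y sum_w counts[w] * log p(w | T[y] @ theta_y)."""
    scores = []
    for y in range(C):
        theta = fit_theta(counts, T[y])
        word_probs = phi.T @ (T[y] @ theta)   # class-dependent word distribution
        scores.append(counts @ np.log(word_probs + 1e-12))
    return int(np.argmax(scores))

# Example: score one random document. T[y] @ theta, the transformed topic
# proportions, plays the role of the supervised low-dimensional representation
# described in the abstract.
doc = rng.multinomial(200, 0.7 * phi[0] + 0.3 * phi[1])
print(predict(doc))
```

Note that in the paper the transformation parameters are estimated by maximizing the conditional likelihood and the transformed topic mixture proportions serve as the new document representation; the EM shortcut above is only a stand-in for that inference procedure.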


reference text

[1] T. L. Berg, A. C. Berg, J. Edwards, M. Maire, R. White, Y. W. Teh, E. Learned-Miller, and D. A. Forsyth. Names and faces in the news. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, 2004.

[2] D. Blei and J. McAuliffe. Supervised topic models. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, Cambridge, MA, 2008. MIT Press.

[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

[4] F. Chiaromonte and R. D. Cook. Sufficient dimension reduction and graphics in regression. Annals of the Institute of Statistical Mathematics, 54(4):768–795, 2002.

[5] L. Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, 2005.

[6] P. Flaherty, G. Giaever, J. Kumm, M. I. Jordan, and A. P. Arkin. A latent variable model for chemogenomic profiling. Bioinformatics, 21:3286–3293, 2005.

[7] K. Fukumizu, F. R. Bach, and M. I. Jordan. Kernel dimension reduction in regression. Annals of Statistics, 2008. To appear.

[8] T. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101:5228–5235, 2004.

[9] D. Mimno and A. McCallum. Topic models conditioned on arbitrary features with Dirichlet-multinomial regression. In Proceedings of the 24th Annual Conference on Uncertainty in Artificial Intelligence, Helsinki, Finland, 2008.

[10] M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth. The author-topic model for authors and documents. In Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence, Banff, Canada, 2004.

[11] L. J. P. van der Maaten and G. E. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.