MedLDA: Maximum Margin Supervised Topic Models (JMLR 2012)

Source: pdf

Author: Jun Zhu, Amr Ahmed, Eric P. Xing

Abstract: A supervised topic model can use side information such as ratings or labels associated with documents or images to discover more predictive low dimensional topical representations of the data. However, existing supervised topic models predominantly employ likelihood-driven objective functions for learning and inference, leaving the popular and potentially powerful max-margin principle unexploited for seeking predictive representations of data and more discriminative topic bases for the corpus. In this paper, we propose the maximum entropy discrimination latent Dirichlet allocation (MedLDA) model, which integrates the mechanism behind the max-margin prediction models (e.g., SVMs) with the mechanism behind the hierarchical Bayesian topic models (e.g., LDA) under a unified constrained optimization framework, and yields latent topical representations that are more discriminative and more suitable for prediction tasks such as document classification or regression. The principle underlying the MedLDA formalism is quite general and can be applied for jointly max-margin and maximum likelihood learning of directed or undirected topic models when supervising side information is available. Efficient variational methods for posterior inference and parameter estimation are derived and extensive empirical studies on several real data sets are also provided. Our experimental results demonstrate qualitatively and quantitatively that MedLDA could: 1) discover sparse and highly discriminative topical representations; 2) achieve state of the art prediction performance; and 3) be more efficient than existing supervised topic models, especially for classification.

Keywords: supervised topic models, max-margin learning, maximum entropy discrimination, latent Dirichlet allocation, support vector machines
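As a rough illustration of the max-margin principle the abstract refers to, the sketch below computes a multiclass hinge loss (in the style of Crammer and Singer, 2001) on a document's topic proportions. It is not the MedLDA algorithm, which the abstract describes as a unified constrained optimization solved with variational methods. The function name `multiclass_hinge_loss`, its arguments, and the toy numbers are assumptions made for illustration only.

```python
from typing import Sequence


def multiclass_hinge_loss(
    topic_props: Sequence[float],
    weights: Sequence[Sequence[float]],
    true_label: int,
    margin: float = 1.0,
) -> float:
    """Hinge loss of a linear multiclass classifier on topic proportions.

    topic_props: length-K vector of a document's topic proportions
        (hypothetical input for this sketch).
    weights: one length-K weight vector per class (hypothetical parameters).
    true_label: index of the correct class.
    """
    # Linear score of each class on the low dimensional topic representation.
    scores = [
        sum(w_k * t_k for w_k, t_k in zip(class_weights, topic_props))
        for class_weights in weights
    ]
    true_score = scores[true_label]
    # Margin violation for every wrong label: the true class should beat
    # each competitor by at least `margin`.
    violations = [
        margin - (true_score - score)
        for label, score in enumerate(scores)
        if label != true_label
    ]
    # The loss is the worst violation, or zero when all margins are met.
    return max(0.0, max(violations, default=0.0))


if __name__ == "__main__":
    doc = [0.75, 0.25, 0.0]       # toy proportions over K = 3 topics
    eta = [[2.0, 0.0, 0.0],       # toy weights for class 0
           [0.0, 2.0, 0.0]]       # toy weights for class 1
    print(multiclass_hinge_loss(doc, eta, true_label=0))
    print(multiclass_hinge_loss(doc, eta, true_label=1))
```

With these toy values the document is dominated by the topic that class 0 weights heavily, so labeling it as class 0 meets the margin and labeling it as class 1 incurs a positive loss. A topic representation that keeps such losses small across a corpus is what "discriminative" means in this setting.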

reference text

Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, and Eric P. Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981–2014, 2008.

David Blei and John Lafferty. Correlated topic models. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems (NIPS), pages 147–154, Cambridge, MA, 2005. MIT Press.

David Blei and Jon D. McAuliffe. Supervised topic models. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems (NIPS), pages 121–128, Cambridge, MA, 2007. MIT Press.

David Blei, Andrew Ng, and Michael Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

Gal Chechik and Naftali Tishby. Extracting relevant structures with side information. In S. Thrun, S. Becker, and K. Obermayer, editors, Advances in Neural Information Processing Systems (NIPS), pages 857–864, Cambridge, MA, 2002. MIT Press.

Ning Chen, Jun Zhu, and Eric P. Xing. Predictive subspace learning for multi-view data: a large margin approach. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems (NIPS), pages 361–369, 2010.

Koby Crammer and Yoram Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2:265–292, 2001.

Paul Damien and Stephen G. Walker. Sampling truncated Normal, Beta, and Gamma densities. Journal of Computational and Graphical Statistics, 10(2):206–215, 2001.

Li Fei-Fei and Pietro Perona. A Bayesian hierarchical model for learning natural scene categories. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pages 524–531, San Diego, CA, 2005.

Pedro Felzenszwalb, Ross Girshick, David McAllester, and Deva Ramanan. Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627–1645, 2010.

Thomas L. Griffiths and Mark Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101:5228–5235, 2004.

William E. Griffiths. A Gibbs sampler for the parameters of a truncated multivariate normal distribution. No 856, Department of Economics, University of Melbourne, 2002.

Amit Gruber, Michal Rosen-Zvi, and Yair Weiss. Hidden topic Markov models. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 163–170, San Juan, Puerto Rico, 2007.

Xuming He and Richard S. Zemel. Learning hybrid models for image annotation with partially labeled data. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems (NIPS), pages 625–632, 2008.

Tommi Jaakkola, Marina Meila, and Tony Jebara. Maximum entropy discrimination. In Advances in Neural Information Processing Systems (NIPS), pages 470–476, Denver, Colorado, 1999.

Tony Jebara. Discriminative, Generative and Imitative Learning. PhD thesis, Media Laboratory, MIT, Dec 2001.

Thorsten Joachims. Making large-scale SVM learning practical. Advances in Kernel Methods – Support Vector Learning, MIT Press, 1999.

Thorsten Joachims, Thomas Finley, and Chun-Nam Yu. Cutting-plane training of structural SVMs. Machine Learning Journal, 77(1):27–59, 2009.

Michael I. Jordan, Zoubin Ghahramani, Tommi Jaakkola, and Lawrence K. Saul. An Introduction to Variational Methods for Graphical Models. M. I. Jordan (Ed.), Learning in Graphical Models, Cambridge: MIT Press, Cambridge, MA, 1999.

Simon Lacoste-Julien. Discriminative Machine Learning with Structure. PhD thesis, EECS Department, University of California, Berkeley, Jan 2009. URL http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-4.html.

Simon Lacoste-Julien, Fei Sha, and Michael I. Jordan. DiscLDA: Discriminative learning for dimensionality reduction and classification. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems (NIPS), pages 897–904, 2008.

Dingcheng Li, Swapna Somasundaran, and Amit Chakraborty. A combination of topic models with max-margin learning for relation detection. In ACL TextGraphs-6 Workshop, 2011.

Li-Jia Li, Richard Socher, and Fei-Fei Li. Towards total scene understanding: Classification, annotation and segmentation in an automatic framework. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pages 2036–2043, Miami, Florida, 2009.

David Mimno and Andrew McCallum. Topic models conditioned on arbitrary features with Dirichlet-multinomial regression. In International Conference on Uncertainty in Artificial Intelligence (UAI), pages 411–418, Corvallis, Oregon, 2008.

Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 115–124, Ann Arbor, Michigan, 2005.

Dmitry Pavlov, Alexandrin Popescul, David M. Pennock, and Lyle H. Ungar. Mixtures of conditional maximum entropy models. In International Conference on Machine Learning (ICML), Washington, DC USA, 2003.

Daniel Ramage, David Hall, Ramesh Nallapati, and Christopher D. Manning. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 248–256, Singapore, 2009.

Gabriel Rodriguez-Yam, Richard Davis, and Louis Scharf. Efficient Gibbs sampling of truncated multivariate normal with application to constrained linear regression. Technical Report, Department of Statistics, Columbia University, 2004.

Bryan C. Russell, Antonio Torralba, Kevin P. Murphy, and William T. Freeman. Labelme: a database and web-based tool for image annotation. International Journal of Computer Vision, 77(1-3):157–173, 2008.

Ruslan Salakhutdinov and Geoffrey Hinton. Replicated softmax: an undirected topic model. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems (NIPS), pages 1607–1614, Vancouver, B.C., Canada, 2009.

Edward Schofield. Fitting Maximum-Entropy Models on Large Sample Spaces. PhD thesis, Department of Computing, Imperial College London, Jan 2007.

Alex J. Smola and Bernhard Schölkopf. A tutorial on support vector regression. Statistics and Computing, 14(3):199–222, 2003.

Rion Snow, Brendan O’Connor, Daniel Jurafsky, and Andrew Ng. Cheap and fast – but is it good? evaluating non-expert annotations for natural language tasks. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 254–263, Honolulu, Hawaii, 2008.

Erik Sudderth, Antonio Torralba, William Freeman, and Alan Willsky. Learning hierarchical models of scenes, objects, and parts. In IEEE International Conference on Computer Vision (ICCV), pages 1331–1338, Beijing, China, 2005.

Ben Taskar, Carlos Guestrin, and Daphne Koller. Max-margin Markov networks. In Sebastian Thrun, Lawrence Saul, and Bernhard Schölkopf, editors, Advances in Neural Information Processing Systems (NIPS), Cambridge, MA, 2003. MIT Press.

Yee Whye Teh, David Newman, and Max Welling. A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems (NIPS), pages 1353–1360, Cambridge, MA, 2006. MIT Press.

Ivan Titov and Ryan McDonald. A joint model of text and aspect ratings for sentiment summarization. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 308–316, Columbus, Ohio, 2008.

Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.

Vladimir Vapnik. Statistical Learning Theory. John Wiley and Sons, New York, 1998.

Chong Wang, David Blei, and Li Fei-Fei. Simultaneous image classification and annotation. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pages 1903–1910, Miami, Florida, 2009.

Yang Wang and G. Mori. Max-margin latent Dirichlet allocation for image classification and annotation. In British Machine Vision Conference (BMVC), 2011.

Max Welling, Michal Rosen-Zvi, and Geoffrey Hinton. Exponential family harmoniums with an application to information retrieval. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems (NIPS), pages 1481–1488, Cambridge, MA, 2004. MIT Press.

Eric P. Xing, Rong Yan, and Alexander G. Hauptmann. Mining associated text and images with dual-wing Harmoniums. In International Conference on Uncertainty in Artificial Intelligence (UAI), pages 633–641, Arlington, Virginia, 2005.

Shuanghong Yang, Jiang Bian, and Hongyuan Zha. Hybrid generative/discriminative learning for automatic image annotation. In International Conference on Uncertainty in Artificial Intelligence (UAI), pages 683–690, Corvallis, Oregon, 2010.

Chun-Nam Yu and Thorsten Joachims. Learning structural SVMs with latent variables. In Léon Bottou and Michael Littman, editors, International Conference on Machine Learning (ICML), pages 1169–1176, Montreal, 2009.

Bing Zhao and Eric P. Xing. HM-BiTAM: Bilingual topic exploration, word alignment, and translation. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems (NIPS), pages 1689–1696, Cambridge, MA, 2007. MIT Press.

Jun Zhu and Eric P. Xing. Maximum entropy discrimination Markov networks. Journal of Machine Learning Research, 10:2531–2569, 2009.

Jun Zhu and Eric P. Xing. Conditional topic random fields. In J. Fürnkranz and T. Joachims, editors, International Conference on Machine Learning (ICML), pages 1239–1246, Haifa, Israel, 2010.

Jun Zhu, Eric P. Xing, and Bo Zhang. Partially observed maximum entropy discrimination Markov networks. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems (NIPS), pages 1977–1984, 2008.

Jun Zhu, Amr Ahmed, and Eric P. Xing. MedLDA: Maximum margin supervised topic models for regression and classification. In Léon Bottou and Michael Littman, editors, International Conference on Machine Learning (ICML), pages 1257–1264, Montreal, 2009.

Jun Zhu, Li-Jia Li, Li Fei-Fei, and Eric P. Xing. Large margin training of upstream scene understanding models. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems (NIPS), pages 2586–2594, 2010.

Jun Zhu, Ning Chen, and Eric P. Xing. Infinite SVM: a Dirichlet process mixture of large-margin kernel machines. In L. Getoor and T. Scheffer, editors, International Conference on Machine Learning (ICML), pages 617–624, Bellevue, Washington, USA, 2011a.

Jun Zhu, Ning Chen, and Eric P. Xing. Infinite latent SVM for classification and multi-task learning. In J. Shawe-Taylor, R.S. Zemel, P. Bartlett, F.C.N. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems (NIPS), pages 1620–1628, 2011b.