
Improved Bayesian Logistic Supervised Topic Models with Data Augmentation


Source: pdf

Authors: Jun Zhu; Xun Zheng; Bo Zhang

Abstract: Supervised topic models with a logistic likelihood have two issues that potentially limit their practical use: 1) response variables are usually over-weighted by document word counts; and 2) existing variational inference methods make strict mean-field assumptions. We address these issues by: 1) introducing a regularization constant to better balance the two parts, based on an optimization formulation of Bayesian inference; and 2) developing a simple Gibbs sampling algorithm that introduces auxiliary Polya-Gamma variables and collapses out the Dirichlet variables. Our augment-and-collapse sampling algorithm has an analytical form for each conditional distribution, makes no restrictive assumptions, and can be easily parallelized. Empirical results demonstrate significant improvements in prediction performance and time efficiency.
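
The augment-and-collapse idea is easiest to see on plain Bayesian logistic regression, where the same Polya-Gamma trick applies. Below is a minimal sketch, assuming a Gaussian prior N(0, prior_var * I) on the weights and approximating Polya-Gamma draws with a truncated sum-of-gammas representation from Polson et al. (2012); the function names (sample_pg, gibbs_logistic), the truncation level, and the prior are illustrative assumptions, not the authors' implementation, which targets the full supervised topic model.

    import numpy as np

    def sample_pg(b, c, rng, trunc=200):
        # Approximate draw from PG(b, c) via the truncated sum-of-gammas
        # representation of Polson, Scott & Windle (2012):
        #   PG(b, c) = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
        # with g_k ~ Gamma(b, 1) i.i.d.; trunc=200 is an illustrative cutoff.
        k = np.arange(1, trunc + 1)
        g = rng.gamma(shape=b, scale=1.0, size=trunc)
        return np.sum(g / ((k - 0.5) ** 2 + c ** 2 / (4.0 * np.pi ** 2))) / (2.0 * np.pi ** 2)

    def gibbs_logistic(X, y, n_iter=1000, prior_var=100.0, seed=0):
        # Gibbs sampler for Bayesian logistic regression with PG augmentation.
        # Both conditionals are exact, so no mean-field or other variational
        # approximation is needed -- the property the abstract highlights.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        eta = np.zeros(d)
        kappa = y - 0.5                  # kappa_i = y_i - 1/2 from the PG identity
        B_inv = np.eye(d) / prior_var    # Gaussian prior N(0, prior_var * I), assumed
        samples = []
        for _ in range(n_iter):
            # Step 1 (augment): omega_i | eta ~ PG(1, x_i^T eta).
            psi = X @ eta
            omega = np.array([sample_pg(1.0, c, rng) for c in psi])
            # Step 2 (sample): given omega, eta's conditional is exactly Gaussian.
            V = np.linalg.inv(X.T @ (omega[:, None] * X) + B_inv)
            m = V @ (X.T @ kappa)
            eta = rng.multivariate_normal(m, V)
            samples.append(eta)
        return np.array(samples)

In the paper's setting, the Gaussian step is replaced by the conditionals of the topic-model variables after the Dirichlet variables are collapsed out, but the two-step augment-then-sample structure is the same.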


reference text

A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. Smola. 2012. Scalable inference in latent variable models. In International Conference on Web Search and Data Mining (WSDM).
D.M. Blei and J.D. McAuliffe. 2010. Supervised topic models. arXiv:1003.0783v1.
D.M. Blei, A.Y. Ng, and M.I. Jordan. 2003. Latent Dirichlet allocation. JMLR, 3:993–1022.
M. Chen, J. Ibrahim, and C. Yiannoutsos. 1999. Prior elicitation, variable selection and Bayesian computation for logistic regression models. Journal of the Royal Statistical Society, Ser. B, (61):223–242.
P. Germain, A. Lacasse, F. Laviolette, and M. Marchand. 2009. PAC-Bayesian learning of linear classifiers. In International Conference on Machine Learning (ICML), pages 353–360.
A. Globerson, T. Koo, X. Carreras, and M. Collins. 2007. Exponentiated gradient algorithms for log-linear structured prediction. In ICML, pages 305–312.
J.E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI).
T.L. Griffiths and M. Steyvers. 2004. Finding scientific topics. Proceedings of the National Academy of Sciences (PNAS), pages 5228–5235.
Y. Halpern, S. Horng, L. Nathanson, N. Shapiro, and D. Sontag. 2012. A comparison of dimensionality reduction techniques for unstructured clinical text. In ICML 2012 Workshop on Clinical Data Analysis.
C. Holmes and L. Held. 2006. Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1(1):145–168.
Q. Jiang, J. Zhu, M. Sun, and E.P. Xing. 2012. Monte Carlo methods for maximum margin supervised topic models. In Advances in Neural Information Processing Systems (NIPS).
T. Joachims. 1999. Making large-scale SVM learning practical. MIT Press.
S. Lacoste-Julien, F. Sha, and M.I. Jordan. 2009. DiscLDA: Discriminative learning for dimensionality reduction and classification. In Advances in Neural Information Processing Systems (NIPS), pages 897–904.
Y. Lin. 2001. A note on margin-based loss functions in classification. Technical Report No. 1044, University of Wisconsin.
D. McAllester. 2003. PAC-Bayesian stochastic model selection. Machine Learning, 51:5–21.
M. Meyer and P. Laud. 2002. Predictive variable selection in generalized linear models. Journal of the American Statistical Association, 97(459):859–871.
D. Newman, A. Asuncion, P. Smyth, and M. Welling. 2009. Distributed algorithms for topic models. Journal of Machine Learning Research (JMLR), (10):1801–1828.
N.G. Polson, J.G. Scott, and J. Windle. 2012. Bayesian inference for logistic models using Polya-Gamma latent variables. arXiv:1205.0310v1.
R. Rifkin and A. Klautau. 2004. In defense of one-vs-all classification. Journal of Machine Learning Research (JMLR), (5):101–141.
L. Rosasco, E. De Vito, A. Caponnetto, M. Piana, and A. Verri. 2004. Are loss functions all the same? Neural Computation, (16):1063–1076.
A. Smola and S. Narayanamurthy. 2010. An architecture for parallel topic models. Very Large Data Bases (VLDB), 3(1-2):703–710.
M.A. Tanner and W.-H. Wong. 1987. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association (JASA), 82(398):528–540.
D. van Dyk and X. Meng. 2001. The art of data augmentation. Journal of Computational and Graphical Statistics (JCGS), 10(1):1–50.
C. Wang, D.M. Blei, and L. Fei-Fei. 2009. Simultaneous image classification and annotation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
J. Zhu, N. Chen, and E.P. Xing. 2011. Infinite latent SVM for classification and multi-task learning. In Advances in Neural Information Processing Systems (NIPS), pages 1620–1628.
J. Zhu, A. Ahmed, and E.P. Xing. 2012. MedLDA: Maximum margin supervised topic models. Journal of Machine Learning Research (JMLR), (13):2237–2278.
J. Zhu, N. Chen, H. Perkins, and B. Zhang. 2013a. Gibbs max-margin topic models with fast sampling algorithms. In International Conference on Machine Learning (ICML).
J. Zhu, N. Chen, and E.P. Xing. 2013b. Bayesian inference with posterior regularization and applications to infinite latent SVMs. arXiv:1210.1766v2.