90 emnlp-2012-Modelling Sequential Text with an Adaptive Topic Model


Source: pdf

Author: Lan Du ; Wray Buntine ; Huidong Jin

Abstract: Topic models are increasingly used for text analysis tasks, often replacing earlier semantic techniques such as latent semantic analysis. In this paper, we develop a novel adaptive topic model that can adapt topics from both the previous segment and the parent document. We develop a Gibbs sampler for posterior inference in the proposed model. Experimental results show that, with topic adaptation, our model significantly improves over existing approaches in terms of perplexity, and is able to uncover clear sequential structure in, for example, Herman Melville’s book “Moby Dick”.


reference text

R. Arora and B. Ravindran. 2008. Latent Dirichlet allocation and singular value decomposition based multidocument summarization. In ICDM ’08: Proc. of 2008 Eighth IEEE Inter. Conf. on Data Mining, pages 713–718.
R. Barzilay and L. Lee. 2004. Catching the drift: Probabilistic content models, with applications to generation and summarization. In HLT-NAACL 2004: Main Proceedings, pages 113–120. Association for Computational Linguistics.
D.M. Blei and J.D. Lafferty. 2006. Dynamic topic models. In ICML ’06: Proc. of 23rd international conference on Machine learning, pages 113–120.
D.M. Blei and P.J. Moreno. 2001. Topic segmentation with an aspect hidden Markov model. In Proc. of 24th annual international ACM SIGIR conference on Research and development in information retrieval, pages 343–348.
D.M. Blei, A.Y. Ng, and M.I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022.
W. Buntine and M. Hutter. 2012. A Bayesian view of the Poisson-Dirichlet process. Technical Report arXiv:1007.0296v2, ArXiv, Cornell, February.
H. Chen, S.R.K. Branavan, R. Barzilay, and D.R. Karger. 2009. Global models of document structure using latent permutations. In Proceedings of Human Language Technologies: The 2009 Annual Conf. of the North American Chapter of the Association for Computational Linguistics, pages 371–379, Stroudsburg, PA, USA. Association for Computational Linguistics.
C. Chen, L. Du, and W. Buntine. 2011. Sampling for the Poisson-Dirichlet process. In European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Database, pages 296–311.
S.C. Deerwester, S.T. Dumais, T.K. Landauer, G.W. Furnas, and R.A. Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391–407.
L. Du, W. Buntine, and H. Jin. 2010. A segmented topic model based on the two-parameter Poisson-Dirichlet process. Machine Learning, 81:5–19.
L. Du, W. Buntine, H. Jin, and C. Chen. 2012. Sequential latent Dirichlet allocation. Knowledge and Information Systems, 31(3):475–503.
J. Eisenstein and R. Barzilay. 2008. Bayesian unsupervised topic segmentation. In Proc. of Conf. on Empirical Methods in Natural Language Processing, pages 334–343. Association for Computational Linguistics.
T.L. Griffiths, M. Steyvers, D.M. Blei, and J.B. Tenenbaum. 2005. Integrating topics and syntax. In Advances in Neural Information Processing Systems 17, pages 537–544.
A. Gruber, Y. Weiss, and M. Rosen-Zvi. 2007. Hidden topic Markov models. Journal of Machine Learning Research - Proceedings Track, 2:163–170.
E.A. Hardisty, J. Boyd-Graber, and P. Resnik. 2010. Modeling perspective using adaptor grammars. In Proc. of the 2010 Conf. on Empirical Methods in Natural Language Processing, pages 284–292, Stroudsburg, PA, USA. Association for Computational Linguistics.
M. Johnson. 2010. PCFGs, topic models, adaptor grammars and learning topical collocations and the structure of proper names. In Proc. of 48th Annual Meeting of the ACL, pages 1148–1157, Uppsala, Sweden, July. Association for Computational Linguistics.
H. Misra, F. Yvon, O. Cappé, and J. Jose. 2011. Text segmentation: A topic modeling perspective. Information Processing & Management, 47(4):528–544.
D. Newman, J.H. Lau, K. Grieser, and T. Baldwin. 2010. Automatic evaluation of topic coherence. In North American Chapter of the Association for Computational Linguistics - Human Language Technologies, pages 100–108.
D. Newman, E.V. Bonilla, and W. Buntine. 2011. Improving topic coherence with regularized topic models. In J. Shawe-Taylor, R.S. Zemel, P. Bartlett, F.C.N. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 496–504.
C.P. Robert and G. Casella. 2004. Monte Carlo statistical methods. Springer, second edition.
M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth. 2004. The author-topic model for authors and documents. In Proc. of 20th conference on Uncertainty in Artificial Intelligence, pages 487–494.
Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. 2006. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101:1566–1581.
Y. W. Teh. 2006. A hierarchical Bayesian language model based on Pitman-Yor processes. In Proc. of 21st Inter. Conf. on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 985–992.
H. Wallach, D. Mimno, and A. McCallum. 2009. Rethinking LDA: Why priors matter. In Advances in Neural Information Processing Systems 19.
H. Wang, D. Zhang, and C. Zhai. 2011. Structural topic model for latent topical structure analysis. In Proc. of 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies Volume 1, pages 1526–1535, Stroudsburg, PA, USA. Association for Computational Linguistics.