
150 nips-2010: Learning concept graphs from text with stick-breaking priors


Source: pdf

Author: America Chambers, Padhraic Smyth, Mark Steyvers

Abstract: We present a generative probabilistic model for learning general graph structures, which we term concept graphs, from text. Concept graphs provide a visual summary of the thematic content of a collection of documents—a task that is difficult to accomplish using only keyword search. The proposed model can learn different types of concept graph structures and is capable of utilizing partial prior knowledge about graph structure as well as labeled documents. We describe a generative model that is based on a stick-breaking process for graphs, and a Markov Chain Monte Carlo inference procedure. Experiments on simulated data show that the model can recover known graph structure when learning in both unsupervised and semi-supervised modes. We also show that the proposed model is competitive in terms of empirical log likelihood with existing structure-based topic models (hPAM and hLDA) on real-world text data sets. Finally, we illustrate the application of the model to the problem of updating Wikipedia category graphs.
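As background for the stick-breaking prior the abstract refers to, the sketch below shows the standard truncated stick-breaking construction of mixture weights in the style of Ishwaran and James [8]. It is only an illustrative sketch of the generic construction, not the paper's graph-specific variant; the function name and parameters are hypothetical.

    # Illustrative sketch only: generic truncated stick-breaking weights,
    # not the graph-structured prior proposed in the paper.
    import numpy as np

    def stick_breaking_weights(alpha, truncation, rng=None):
        """Draw a truncated vector of stick-breaking weights pi_1..pi_K.

        alpha      -- concentration parameter of the Beta(1, alpha) draws
        truncation -- number of sticks K in the finite approximation
        """
        rng = np.random.default_rng() if rng is None else rng
        betas = rng.beta(1.0, alpha, size=truncation)          # stick proportions
        remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
        return betas * remaining                                # pi_k = beta_k * prod_{j<k}(1 - beta_j)

    # Example: with small alpha, most of the mass falls on the first few components.
    pi = stick_breaking_weights(alpha=1.0, truncation=20)
    print(pi, pi.sum())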


reference text

[1] David Blei, Andrew Ng, and Michael Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

[2] David M. Blei, Thomas L. Griffiths, and Michael I. Jordan. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. Journal of the ACM, 57, 2010.

[3] David Mimno, Wei Li, and Andrew McCallum. Mixtures of hierarchical topics with Pachinko allocation. In Proceedings of the 24th Intl. Conf. on Machine Learning, 2007.

[4] Wei Li, David Blei, and Andrew McCallum. Nonparametric Bayes Pachinko allocation. In Proceedings of the Twenty-Third Annual Conference on Uncertainty in Artificial Intelligence (UAI-07), pages 243–250, 2007.

[5] Blaz Fortuna, Marko Grobelnik, and Dunja Mladenic. OntoGen: Semi-automatic ontology editor. In Proceedings of the Human Computer Interaction International Conference, volume 4558, pages 309–318, 2007.

[6] S. Bloehdorn, P. Cimiano, and A. Hotho. Learning ontologies to improve text clustering and classification. In From Data and Inf. Analysis to Know. Eng.: Proc. of the 29th Annual Conf. of the German Classification Society (GfKl ’05), volume 30 of Studies in Classification, Data Analysis and Know. Org., pages 334–341. Springer, Feb. 2005.

[7] P. Cimiano, A. Hotho, and S. Staab. Learning concept hierarchies from text using formal concept analysis. J. Artificial Intelligence Research (JAIR), 24:305–339, 2005.

[8] Hemant Ishwaran and Lancelot F. James. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96(453):161–173, March 2001.

[9] Tom Griffiths and Mark Steyvers. Finding scientific topics. Proceedings of the Natl. Academy of Sciences of the U.S.A., 101 Suppl 1:5228–5235, 2004.

[10] Ian Porteous, Alex Ihler, Padhraic Smyth, and Max Welling. Gibbs sampling for coupled infinite mixture models in the stick-breaking representation. In Proceedings of UAI 2006, pages 385–392, July 2006.

[11] Andrew Kachites McCallum. MALLET: A machine learning for language toolkit. http://mallet.cs.umass.edu, 2002.

[12] Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov, and David Mimno. Evaluation methods for topic models. In Proceedings of the 26th Intl. Conf. on Machine Learning (ICML 2009), 2009.