162 nips-2002-Parametric Mixture Models for Multi-Labeled Text


Source: pdf

Author: Naonori Ueda, Kazumi Saito

Abstract: We propose probabilistic generative models, called parametric mixture models (PMMs), for the multiclass, multi-labeled text categorization problem. Conventionally, the binary classification approach has been employed, in which a separate binary classifier judges, for every category, whether or not a text belongs to that category. In contrast, our approach can simultaneously detect multiple categories of a text using PMMs. We derive efficient learning and prediction algorithms for PMMs. We also show empirically that our method can significantly outperform the conventional binary methods when applied to multi-labeled text categorization using real World Wide Web pages.


reference text

[1] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. To appear in Advances in Neural Information Processing Systems 14. MIT Press.

[2] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39:1-38, 1977.

[3] S. T. Dumais, J. Platt, D. Heckerman, and M. Sahami. Inductive learning algorithms and representations for text categorization. In Proc. of ACM-CIKM’98, 1998.

[4] T. Joachims. Text categorization with support vector machines: Learning with many relevant features. In Proc. of the European Conference on Machine Learning, 137-142, Berlin, 1998.

[5] D. Lewis and M. Ringuette. A comparison of two learning algorithms for text categorization. In Third Annual Symposium on Document Analysis and Information Retrieval, 81-93, 1994.

[6] K. Morik, P. Brockhausen, and T. Joachims. Combining statistical learning with a knowledge-based approach: A case study in intensive care monitoring. In Proc. of International Conference on Machine Learning (ICML’99), 1999.

[7] K. Nigam, A. K. McCallum, S. Thrun, and T. Mitchell. Text classification from labeled and unlabeled documents using EM. Machine Learning, 39:103-134, 2000.

[8] Y. Yang and J. Pedersen. A comparative study on feature selection in text categorization. In Proc. of International Conference on Machine Learning, 412-420, 1997.

[9] V. N. Vapnik. Statistical learning theory. John Wiley & Sons, Inc., New York, 1998.