NIPS 2005, Paper 195: reference knowledge graph
Source: PDF
Authors: Chuong B. Do, Andrew Y. Ng
Abstract: Linear text classification algorithms work by computing an inner product between a test document vector and a parameter vector. In many such algorithms, including naive Bayes and most TFIDF variants, the parameters are determined by some simple, closed-form function of training set statistics; we call this mapping from statistics to parameters the parameter function. Much research in text classification over the last few decades has consisted of manual efforts to identify better parameter functions. In this paper, we propose an algorithm for automatically learning this function from related classification problems. The parameter function found by our algorithm then defines a new learning algorithm for text classification, which we can apply to novel classification tasks. We find that our learned classifier outperforms existing methods on a variety of multiclass text classification tasks.
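To make the abstract's setup concrete, here is a minimal Python sketch of one fixed, hand-designed parameter function: the multinomial naive Bayes mapping from per-class word counts (training-set statistics) to a log-probability parameter vector, with classification by inner product. The names (naive_bayes_parameter_function, classify, smoothing) are illustrative, not from the paper; the paper's contribution is learning such a mapping from related tasks rather than fixing it by hand, which this sketch does not implement.

```python
import numpy as np

def naive_bayes_parameter_function(counts, smoothing=1.0):
    """Closed-form 'parameter function' for multinomial naive Bayes.

    Maps training-set statistics (per-class word counts) to parameter
    vectors of log word probabilities, with Laplace smoothing.
    counts: array of shape (n_classes, vocab_size).
    """
    smoothed = counts + smoothing
    # log P(word | class), normalized over the vocabulary for each class
    return np.log(smoothed / smoothed.sum(axis=1, keepdims=True))

def classify(doc_vector, theta, log_prior):
    # Linear classification: inner product of the document's word-count
    # vector with each class's parameter vector, plus the class log-prior.
    scores = theta @ doc_vector + log_prior
    return int(np.argmax(scores))

# Toy example: 2 classes, 3-word vocabulary.
counts = np.array([[8.0, 1.0, 1.0],    # class 0 mostly uses word 0
                   [1.0, 1.0, 8.0]])   # class 1 mostly uses word 2
theta = naive_bayes_parameter_function(counts)
log_prior = np.log(np.array([0.5, 0.5]))
print(classify(np.array([3.0, 0.0, 1.0]), theta, log_prior))  # -> 0
```

A TFIDF-style classifier fits the same template with a different closed-form mapping from corpus statistics (term and document frequencies) to the parameter vector.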
[1] K. Nigam, J. Lafferty, and A. McCallum. Using maximum entropy for text classification. In IJCAI-99 Workshop on Machine Learning for Information Filtering, pages 61–67, 1999.
[2] T. Joachims. Text categorization with support vector machines: Learning with many relevant features. In Machine Learning: ECML-98, pages 137–142, 1998.
[3] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998.
[4] G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513–523, 1988.
[5] T. Joachims. A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization. In Proceedings of ICML-97, pages 143–151, 1997.
[6] J. D. Rennie, L. Shih, J. Teevan, and D. R. Karger. Tackling the poor assumptions of naive Bayes text classifiers. In ICML, pages 616–623, 2003.
[7] A. Moffat and J. Zobel. Exploring the similarity space. ACM SIGIR Forum, 32(1), 1998.
[8] C. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[9] A. Ng and M. Jordan. On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In NIPS 14, 2002.
[10] G. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Math. Anal. Appl., 33:82–95, 1971.
[11] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, 1999.
[12] R. Rifkin and A. Klautau. In defense of one-vs-all classification. J. Mach. Learn. Res., 5:101–141, 2004.
[13] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res., 2:265–292, 2001.
[14] C-C. Chang and C-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[15] T. Joachims. Making large-scale support vector machine learning practical. In Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge, MA, 1998.
[16] S. Thrun. Lifelong learning: A case study. Technical Report CMU-CS-95-208, Carnegie Mellon University, 1995.
[17] R. Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.
[18] P. N. Bennett, S. T. Dumais, and E. Horvitz. Inductive transfer for text classification using generalized reliability indicators. In Proceedings of ICML Workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, 2003.
[19] J. Teevan and D. R. Karger. Empirical development of an exponential probabilistic model for text retrieval: Using textual analysis to build a better model. In SIGIR ’03, 2003.
[20] S. Thrun and J. O’Sullivan. Discovering structure in multiple learning tasks: The TC algorithm. In International Conference on Machine Learning, pages 489–497, 1996.