
NIPS 2005, Paper 195: Transfer learning for text classification


Source: pdf

Author: Chuong B. Do, Andrew Y. Ng

Abstract: Linear text classification algorithms work by computing an inner product between a test document vector and a parameter vector. In many such algorithms, including naive Bayes and most TFIDF variants, the parameters are determined by some simple, closed-form function of training set statistics; we call this mapping from statistics to parameters the parameter function. Much research in text classification over the last few decades has consisted of manual efforts to identify better parameter functions. In this paper, we propose an algorithm for automatically learning this function from related classification problems. The parameter function found by our algorithm then defines a new learning algorithm for text classification, which we can apply to novel classification tasks. We find that our learned classifier outperforms existing methods on a variety of multiclass text classification tasks.
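
To make the "parameter function" idea concrete, here is a minimal sketch (not the paper's code; the function names are hypothetical and NumPy is assumed) of multinomial naive Bayes cast in this form: a closed-form map from per-class training word counts to a log-parameter matrix, with classification performed as an inner product against the document's word-count vector.

import numpy as np

# A parameter function: a closed-form map from training-set statistics
# (per-class word counts) to a parameter matrix. Here: Laplace-smoothed
# multinomial naive Bayes log-probabilities.
def naive_bayes_parameter_function(counts, alpha=1.0):
    smoothed = counts + alpha
    return np.log(smoothed / smoothed.sum(axis=1, keepdims=True))

# Classification is then just an inner product between the document's
# word-count vector and each class's parameter vector (plus log prior).
def classify(doc_vector, log_params, log_priors):
    return int(np.argmax(log_params @ doc_vector + log_priors))

# Toy data: 2 classes over a 3-word vocabulary (counts are made up).
train_counts = np.array([[8.0, 1.0, 1.0],
                         [1.0, 1.0, 8.0]])
log_params = naive_bayes_parameter_function(train_counts)
log_priors = np.log(np.array([0.5, 0.5]))
print(classify(np.array([3.0, 0.0, 1.0]), log_params, log_priors))  # prints 0

In the paper's framing, decades of manual work amount to redesigning this one closed-form map by hand; the proposed algorithm instead learns it from related classification problems.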


Reference text

[1] K. Nigam, J. Lafferty, and A. McCallum. Using maximum entropy for text classification. In IJCAI-99 Workshop on Machine Learning for Information Filtering, pages 61–67, 1999.

[2] T. Joachims. Text categorization with support vector machines: Learning with many relevant features. In Machine Learning: ECML-98, pages 137–142, 1998.

[3] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998.

[4] G. Salton and C. Buckley. Term weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513–523, 1988.

[5] T. Joachims. A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization. In Proceedings of ICML-97, pages 143–151, 1997.

[6] J. D. Rennie, L. Shih, J. Teevan, and D. R. Karger. Tackling the poor assumptions of naive Bayes text classifiers. In ICML, pages 616–623, 2003.

[7] J. Zobel and A. Moffat. Exploring the similarity space. ACM SIGIR Forum, 32(1):18–34, 1998.

[8] C. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.

[9] A. Ng and M. Jordan. On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In NIPS 14, 2002.

[10] G. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. J. Math. Anal. Appl., 33:82–95, 1971.

[11] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, 1999.

[12] R. Rifkin and A. Klautau. In defense of one-vs-all classification. J. Mach. Learn. Res., 5:101–141, 2004.

[13] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res., 2:265–292, 2001.

[14] C-C. Chang and C-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[15] T. Joachims. Making large-scale support vector machine learning practical. In Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge, MA, 1999.

[16] S. Thrun. Lifelong learning: A case study. Technical Report CMU-CS-95-208, Carnegie Mellon University, 1995.

[17] R. Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.

[18] P. N. Bennett, S. T. Dumais, and E. Horvitz. Inductive transfer for text classification using generalized reliability indicators. In Proceedings of ICML Workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, 2003.

[19] J. Teevan and D. R. Karger. Empirical development of an exponential probabilistic model for text retrieval: Using textual analysis to build a better model. In SIGIR ’03, 2003.

[20] S. Thrun and J. O’Sullivan. Discovering structure in multiple learning tasks: The TC algorithm. In International Conference on Machine Learning, pages 489–497, 1996.