
85 emnlp-2010-Negative Training Data Can be Harmful to Text Classification

Source: pdf

Author: Xiao-Li Li ; Bing Liu ; See-Kiong Ng

Abstract: This paper studies the effects of training data on binary text classification and postulates that negative training data is not needed and may even be harmful for the task. Traditional binary classification involves building a classifier using labeled positive and negative training examples. The classifier is then applied to classify test instances into positive and negative classes. A fundamental assumption is that the training and test data are identically distributed. However, this assumption may not hold in practice. In this paper, we study a particular problem where the positive data is identically distributed but the negative data may or may not be so. Many practical text classification and retrieval applications fit this model. We argue that in this setting negative training data should not be used, and that PU learning can be employed to solve the problem. Empirical evaluation has been conducted to support our claim. This result is important as it may fundamentally change the current binary classification paradigm.
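The setting in the abstract (labeled positives plus unlabeled data, no labeled negatives) can be illustrated with a minimal sketch of one common two-step PU scheme in the Rocchio style. This is not necessarily the method evaluated in the paper. The function names, the toy documents, and the choice of cosine similarity over raw term counts are all assumptions made for illustration, as is the idea that the unlabeled pool is drawn from the data to be classified.

```python
import math
from collections import Counter

def vec(text):
    """Bag-of-words term-count vector (illustrative tokenization)."""
    return Counter(text.lower().split())

def centroid(vectors):
    """Average the term counts of a list of vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {term: count / len(vectors) for term, count in total.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def pu_classifier(positive_docs, unlabeled_docs):
    """Hypothetical two-step PU learner; returns a predict(text) -> bool function."""
    P = [vec(d) for d in positive_docs]
    U = [vec(d) for d in unlabeled_docs]
    pos_c, unl_c = centroid(P), centroid(U)
    # Step 1: unlabeled documents that look more like the unlabeled pool
    # than like the positives are kept as "reliable negatives".
    reliable_neg = [u for u in U if cosine(u, unl_c) > cosine(u, pos_c)]
    neg_c = centroid(reliable_neg) if reliable_neg else unl_c
    # Step 2: classify by the nearer of the positive and reliable-negative centroids.
    def predict(text):
        v = vec(text)
        return cosine(v, pos_c) > cosine(v, neg_c)
    return predict

# Toy data (made up): positives are travel documents; the unlabeled pool
# mixes one hidden positive with documents from other topics.
positive = ["cheap flights paris", "flights hotel deals", "cheap hotel paris"]
unlabeled = ["cheap flights london", "football match score",
             "stock market prices", "football league score"]

predict = pu_classifier(positive, unlabeled)
print(predict("cheap hotel london"))    # travel-like document
print(predict("football score today"))  # resembles the reliable negatives
```

No labeled negative examples enter training here; the negative side of the decision is derived entirely from the unlabeled pool, which is the property the abstract argues for.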

reference text

Agirre E., Lacalle L.O. 2009. Supervised Domain Adaption for WSD. Proceedings of the 12th Conference of the European Chapter for Computational Linguistics (EACL09), pp 42-50.
Andrew A., Nallapati R., Cohen W. 2008. Exploiting Feature Hierarchy for Transfer Learning in Named Entity Recognition, ACL.
Bickel S., Bruckner M., and Scheffer T. 2009. Discriminative learning under covariate shift. Journal of Machine Learning Research.
Bickel S. and Scheffer T. 2007. Dirichlet-enhanced spam filtering based on biased samples. In Advances in Neural Information Processing Systems.
Bollmann, P., & Cherniavsky, V. 1981. Measurement-theoretical investigation of the mz-metric. Information Retrieval Research.
Buckley, C., Salton, G., & Allan, J. 1994. The effect of adding relevance information in a relevance feedback environment, SIGIR.
Blum, A. and Mitchell, T. 1998. Combining labeled and unlabeled data with co-training. In Proc. of Computational Learning Theory, pp. 92–10.
Chan Y. S., Ng H. T. 2007. Domain Adaptation with Active Learning for Word Sense Disambiguation, ACL.
Dempster A., Laird N. and Rubin D. 1977. Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society.
Denis F. 1998. PAC learning from positive statistical queries. ALT.
Denis F., Laurent A., Rémi G., Marc T. 2003. Text classification and co-training from positive and unlabeled examples. ICML.
Denis, F, Rémi G, and Marc T. 2002. Text Classification from Positive and Unlabeled Examples. In Proceedings of the 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems.
Downey, D., Broadhead, M. and Etzioni, O. 2007. Locating complex named entities in Web Text. IJCAI.
Dudik M., Schapire R., and Phillips S. 2005. Correcting sample selection bias in maximum entropy density estimation. In Advances in Neural Information Processing Systems.
Elkan, C. and Noto, K. 2008. Learning classifiers from only positive and unlabeled data. KDD, 213-220.
Goldwasser, D., Roth D. 2008. Active Sample Selection for Named Entity Transliteration, ACL.
Heckman J. 1979. Sample selection bias as a specification error. Econometrica, 47: 153–161.
Huang J., Smola A., Gretton A., Borgwardt K., and Scholkopf B. 2007. Correcting sample selection bias by unlabeled data. In Advances in Neural Information Processing Systems.
Jiang J. and Zhai C. X. 2007. Instance Weighting for Domain Adaptation in NLP, ACL.
Lee, W. S. and Liu, B. 2003. Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression. ICML.
Lewis D. 1995. A sequential algorithm for training text classifiers: corrigendum and additional data. SIGIR Forum, 13-19.
Li, S., Zong C. 2008. Multi-Domain Sentiment Classification, ACL.
Li, X., Liu, B. 2003. Learning to classify texts using positive and unlabeled data, IJCAI.
Li, X., Liu, B. 2005. Learning from Positive and Unlabeled Examples with Different Data Distributions. ECML.
Li, X., Liu, B. 2007. Learning to Identify Unexpected Instances in the Test Set. IJCAI.
Li, X., Yu, P. S., Liu B., and Ng, S. 2009. Positive Unlabeled Learning for Data Stream Classification, SDM.
Li, X., Zhang L., Liu B., and Ng, S. 2010. Distributional Similarity vs. PU Learning for Entity Set Expansion, ACL.
Liu, B, Dai, Y., Li, X., Lee, W-S., and Yu, P. 2003. Building text classifiers using positive and unlabeled examples. ICDM, 179-188.
Liu, B, Lee, W-S, Yu, P. S, and Li, X. 2002. Partially supervised text classification. ICML, 387-394.
Nigam, K., McCallum, A., Thrun, S. and Mitchell, T. 2000. Text classification from labeled and unlabeled documents using EM. Machine Learning, 39(2/3), 103–134.
Pan, S. J. and Yang, Q. 2009. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, Vol. 99, No. 1.
Rocchio, J. 1971. Relevant feedback in information retrieval. In G. Salton (ed.). The smart retrieval system: experiments in automatic document processing, Englewood Cliffs, NJ, 1971.
Sagae K., Tsujii J. 2008. Online Methods for Multi-Domain Learning and Adaptation, EMNLP.
Salton G. and McGill M. J. 1986. Introduction to Modern Information Retrieval.
Schölkopf B., Platt J.C., Shawe-Taylor J., Smola A.J., and Williamson R.C. 1999. Estimating the support of a high-dimensional distribution. Technical report, Microsoft Research, MSR-TR-99-87.
Shimodaira H. 2000. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90:227–244.
Sugiyama M. and Muller K.-R. 2005. Input-dependent estimation of generalization error under covariate shift. Statistics and Decision, 23(4):249–279.
Sugiyama M., Nakajima S., Kashima H., von Bunau P., and Kawanabe M. 2008. Direct importance estimation with model selection and its application to covariate shift adaptation. In Advances in Neural Information Processing Systems.
Tsuboi J., Kashima H., Hido S., Bickel S., and Sugiyama M. 2008. Direct density ratio estimation for large-scale covariate shift adaptation. In Proceedings of the SIAM International Conference on Data Mining, 2008.
Wu D., Lee W.S., Ye N. and Chieu H. L. 2009. Domain adaptive bootstrapping for named entity recognition, ACL.
Wu Q., Tan S. and Cheng X. 2009. Graph Ranking for Sentiment Transfer, ACL.
Yang Q., Chen Y., Xue G., Dai W., Yu Y. 2009. Heterogeneous Transfer Learning for Image Clustering via the Social Web, ACL.
Yu, H., Han, J., K. Chang. 2002. PEBL: Positive example based learning for Web page classification using SVM. KDD, 239-248.
Zadrozny B. 2004. Learning and evaluating classifiers under sample selection bias, ICML.
Zhou Z., Gao J., Soong F., Meng H. 2006. A Comparative Study of Discriminative Methods for Reranking LVCSR N-best Hypotheses in Domain Adaptation and Generalization. ICASSP.