
15 emnlp-2012-Active Learning for Imbalanced Sentiment Classification


Source: pdf

Author: Shoushan Li ; Shengfeng Ju ; Guodong Zhou ; Xiaojun Li

Abstract: Active learning is a promising way to reduce the annotation cost of sentiment classification. In this paper, we focus on the imbalanced class distribution scenario for sentiment classification, wherein the number of positive samples is quite different from that of negative samples. This scenario poses new challenges to active learning. To address these challenges, we propose a novel active learning approach, named co-selecting, which takes both the imbalanced class distribution issue and uncertainty into account. Specifically, our co-selecting approach employs two feature subspace classifiers to collectively select the most informative minority-class samples for manual annotation by leveraging a certainty measurement and an uncertainty measurement, and meanwhile automatically labels the most informative majority-class samples, to reduce human annotation effort. Extensive experiments across four domains demonstrate the great potential and effectiveness of our proposed co-selecting approach to active learning for imbalanced sentiment classification.
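The abstract gives only a high-level description of co-selecting, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that one subspace classifier supplies the certainty measurement (shortlisting the samples it most confidently assigns to each class) and that the other supplies the uncertainty measurement (picking, within each shortlist, the sample whose posterior is closest to 0.5). The names `split_feature_indices`, `co_select`, the shortlist size `k`, and the representation of a classifier as a callable returning P(minority | sample) are all hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
# Assumption: a "classifier" is any callable returning P(minority class | sample).
Scorer = Callable[[Sample], float]


def split_feature_indices(n_features: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly partition feature indices into two disjoint subspaces."""
    idx = list(range(n_features))
    random.Random(seed).shuffle(idx)
    half = n_features // 2
    return sorted(idx[:half]), sorted(idx[half:])


def co_select(pool: Sequence[Sample], certainty_clf: Scorer,
              uncertainty_clf: Scorer, k: int = 4) -> Tuple[int, int]:
    """Return (index to annotate manually, index to auto-label as majority)."""
    if len(pool) < 2:
        raise ValueError("pool must contain at least two samples")
    k = max(1, min(k, len(pool) // 2))  # keep the two shortlists disjoint

    # Certainty step: rank the pool by the first classifier's posterior.
    p_cert = [certainty_clf(x) for x in pool]
    order = sorted(range(len(pool)), key=lambda i: p_cert[i])
    likely_majority = order[:k]   # lowest P(minority)
    likely_minority = order[-k:]  # highest P(minority)

    # Uncertainty step: the second classifier prefers posteriors near 0.5.
    def uncertainty(i: int) -> float:
        return -abs(uncertainty_clf(pool[i]) - 0.5)

    ask_human = max(likely_minority, key=uncertainty)
    auto_label = max(likely_majority, key=uncertainty)
    return ask_human, auto_label


# Toy usage: feature 0 stands in for the first classifier's posterior,
# feature 1 for the second classifier's posterior.
pool = [(0.9, 0.5), (0.95, 0.9), (0.1, 0.45), (0.05, 0.1), (0.5, 0.5)]
ask_human, auto_label = co_select(pool, lambda x: x[0], lambda x: x[1], k=2)
```

In this toy pool, sample 0 would go to the human annotator as a likely minority-class sample, and sample 2 would be labeled automatically as majority class. In a real loop both would be added to the labeled set and the two subspace classifiers retrained before the next round.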


reference text

Attenberg J. and F. Provost. 2010. Why Label when you can Search? Alternatives to Active Learning for Applying Human Resources to Build Classification Models Under Extreme Class Imbalance. In Proceedings of KDD-10, 423-432.
Blitzer J., M. Dredze and F. Pereira. 2007. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In Proceedings of ACL-07, 440-447.
Cui H., V. Mittal and M. Datar. 2006. Comparative Experiments on Sentiment Classification for Online Product Reviews. In Proceedings of AAAI-06, 1265-1270.
Doyle S., J. Monaco, M. Feldman, J. Tomaszewski and A. Madabhushi. 2011. An Active Learning based Classification Strategy for the Minority Class Problem: Application to Histopathology Annotation. BMC Bioinformatics, 12: 424, 1471-2105.
Ertekin S., J. Huang, L. Bottou and C. Giles. 2007a. Learning on the Border: Active Learning in Imbalanced Data Classification. In Proceedings of CIKM-07, 127-136.
Ertekin S., J. Huang, L. Bottou and C. Giles. 2007b. Active Learning in Class Imbalanced Problem. In Proceedings of SIGIR-07, 823-824.
Freund Y., H. Seung, E. Shamir and N. Tishby. 1997. Selective Sampling using the Query by Committee algorithm. Machine Learning, 28(2-3), 133-168.
He Y., C. Lin and H. Alani. 2011. Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification. In Proceedings of ACL-11, 123-131.
Lewis D. and W. Gale. 1994. Training Text Classifiers by Uncertainty Sampling. In Proceedings of SIGIR-94, 3-12.
Li F., Y. Tang, M. Huang and X. Zhu. 2009. Answering Opinion Questions with Random Walks on Graphs. In Proceedings of ACL-IJCNLP-09, 737-745.
Li S. and C. Zong. 2008. Multi-domain Sentiment Classification. In Proceedings of ACL-08, short paper, 257-260.
Li S., C. Huang, G. Zhou and S. Lee. 2010. Employing Personal/Impersonal Views in Supervised and Semi-supervised Sentiment Classification. In Proceedings of ACL-10, 414-423.
Li S., Z. Wang, G. Zhou and S. Lee. 2011a. Semi-supervised Learning for Imbalanced Sentiment Classification. In Proceedings of IJCAI-11, 1826-1831.
Li S., G. Zhou, Z. Wang, S. Lee and R. Wang. 2011b. Imbalanced Sentiment Classification. In Proceedings of CIKM-11, poster paper, 2469-2472.
Lloret E., A. Balahur, M. Palomar and A. Montoyo. 2009. Towards Building a Competitive Opinion Summarization System. In Proceedings of NAACL-09 Student Research Workshop and Doctoral Consortium, 72-77.
Kubat M. and S. Matwin. 1997. Addressing the Curse of Imbalanced Training Sets: One-Sided Selection. In Proceedings of ICML-97, 179-186.
Muslea I., S. Minton and C. Knoblock. 2006. Active Learning with Multiple Views. Journal of Artificial Intelligence Research, vol.27, 203-233.
Pang B. and L. Lee. 2008. Opinion Mining and Sentiment Analysis: Foundations and Trends. Information Retrieval, vol.2(12), 1-135.
Pang B., L. Lee and S. Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. In Proceedings of EMNLP-02, 79-86.
Settles B. 2009. Active Learning Literature Survey. Computer Sciences Technical Report 1648, University of Wisconsin, Madison, 2009.
Turney P. 2002. Thumbs up or Thumbs down? Semantic Orientation Applied to Unsupervised Classification of reviews. In Proceedings of ACL-02, 417-424.
Wan X. 2009. Co-Training for Cross-Lingual Sentiment Classification. In Proceedings of ACL-IJCNLP-09, 235-243.
Yang Y. and G. Ma. 2010. Ensemble-based Active Learning for Class Imbalance Problem. J. Biomedical Science and Engineering, vol.3, 1021-1028.
Zhang M. and X. Ye. 2008. A Generation Model to Unify Topic Relevance and Lexicon-based Sentiment for Opinion Retrieval. In Proceedings of SIGIR-08, 411-418.
Zhu J. and E. Hovy. 2007. Active Learning for Word Sense Disambiguation with Methods for Addressing the Class Imbalance Problem. In Proceedings of ACL-07, 783-793.