289 acl-2011-Subjectivity and Sentiment Analysis of Modern Standard Arabic

Source: pdf

Author: Muhammad Abdul-Mageed; Mona Diab; Mohammed Korayem

Abstract: Although Subjectivity and Sentiment Analysis (SSA) has been witnessing a flurry of novel research, there are few attempts to build SSA systems for Morphologically-Rich Languages (MRL). In the current study, we report efforts to partially fill this gap. We present a newly developed, manually annotated corpus of Modern Standard Arabic (MSA) together with a new polarity lexicon. The corpus is a collection of newswire documents annotated at the sentence level. We also describe an automatic SSA tagging system that exploits the annotated data. We investigate the impact of different levels of preprocessing on the SSA classification task. We show that by explicitly accounting for the rich morphology, the system is able to achieve significantly higher levels of performance.

reference text

A. Abbasi, H. Chen, and A. Salem. 2008. Sentiment analysis in multiple languages: Feature selection for opinion classification in web forums. ACM Trans. Inf. Syst., 26:1–34.
M. Abdul-Mageed. 2008. Online News Sites and Journalism 2.0: Reader Comments on Al Jazeera Arabic. tripleC-Cognition, Communication, Cooperation, 6(2):59.
A. Banfield. 1982. Unspeakable Sentences: Narration and Representation in the Language of Fiction. Routledge Kegan Paul, Boston.
R. Bruce and J. Wiebe. 1999. Recognizing subjectivity: A case study of manual tagging. Natural Language Engineering, 5(2).
T. Joachims. 2008. SVMlight: Support vector machine. http://svmlight.joachims.org/, Cornell University.
S. Kim and E. Hovy. 2004. Determining the sentiment of opinions. In Proceedings of the 20th International Conference on Computational Linguistics, pages 1367–1373.
M. Maamouri, A. Bies, T. Buckwalter, and W. Mekki. 2004. The Penn Arabic Treebank: Building a large-scale annotated Arabic corpus. In NEMLAR Conference on Arabic Language Resources and Tools, pages 102–109.
R. Tsarfaty, D. Seddah, Y. Goldberg, S. Kuebler, Y. Versley, M. Candito, J. Foster, I. Rehbein, and L. Tounsi. 2010. Statistical parsing of morphologically rich languages (SPMRL): What, how and whither. In Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages, Los Angeles, CA.
J. Wiebe, R. Bruce, and T. O’Hara. 1999. Development and use of a gold standard data set for subjectivity classifications. In Proc. 37th Annual Meeting of the Assoc. for Computational Linguistics (ACL-99), pages 246–253, University of Maryland: ACL.
J. Wiebe, T. Wilson, R. Bruce, M. Bell, and M. Martin. 2004. Learning subjective language. Computational Linguistics, 30(3):277–308.
J. Wiebe. 1994. Tracking point of view in narrative. Computational Linguistics, 20(2):233–287.
T. Wilson, J. Wiebe, and P. Hoffmann. 2009. Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis. Computational Linguistics, 35(3):399–433.
J. Yi, T. Nasukawa, R. Bunescu, and W. Niblack. 2003. Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques. In Proceedings of the 3rd IEEE International Conference on Data Mining, pages 427–434.
H. Yu and V. Hatzivassiloglou. 2003. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 129–136.