256 acl-2010-Vocabulary Choice as an Indicator of Perspective

Source: pdf

Author: Beata Beigman Klebanov ; Eyal Beigman ; Daniel Diermeier

Abstract: We establish the following characteristics of the task of perspective classification: (a) using term frequencies in a document does not improve classification achieved with absence/presence features; (b) for datasets allowing the relevant comparisons, a small number of top features is found to be as effective as the full feature set and indispensable for the best achieved performance, testifying to the existence of perspective-specific keywords. We relate our findings to research on word frequency distributions and to discourse analytic studies of perspective.
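To make the contrast in finding (a) concrete, the following minimal sketch builds both representations for one document: term-frequency features and absence/presence features. The tokenizer, the toy document, the vocabulary, and the function names are all assumptions introduced for illustration; they are not the authors' data or code.

```python
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for illustration).
    return text.lower().split()


def frequency_features(tokens, vocabulary):
    # One count per vocabulary term: how often the term occurs in the document.
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def presence_features(tokens, vocabulary):
    # One bit per vocabulary term: 1 if the term occurs at all, else 0.
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


# Hypothetical toy document and vocabulary.
doc = "tax relief now tax relief for all"
vocabulary = ["tax", "relief", "cuts", "all"]

tokens = tokenize(doc)
tf = frequency_features(tokens, vocabulary)
bp = presence_features(tokens, vocabulary)

print(tf)  # [2, 2, 0, 1]
print(bp)  # [1, 1, 0, 1]
```

The two vectors differ only where a term repeats, which is exactly the information that, according to the abstract, does not improve perspective classification.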

reference text

Harald Baayen. 2001. Word frequency distributions. Dordrecht: Kluwer.

Marco Baroni and Stefan Evert. 2007. Words and Echoes: Assessing and Mitigating the Non-Randomness Problem in Word Frequency Distribution Modeling. In Proceedings of the ACL, pages 904–911, Prague, Czech Republic.

Suma Bhat and Richard Sproat. 2009. Knowing the Unseen: Estimating Vocabulary Size over Unseen Samples. In Proceedings of the ACL, pages 109–117, Suntec, Singapore, August.

Stephan Greene and Philip Resnik. 2009. More than Words: Syntactic Packaging and Implicit Sentiment. In Proceedings of HLT-NAACL, pages 503–511, Boulder, CO, June.

Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. 2009. The WEKA data mining software: An update. SIGKDD Explorations, 11(1).

Marti Hearst. 1997. TextTiling: Segmenting Text into Multi-Paragraph Subtopic Passages. Computational Linguistics, 23(1):33–64.

Michael Hoey. 1991. Patterns of Lexis in Text. Oxford University Press.

Martin Jansche. 2003. Parametric Models of Linguistic Count Data. In Proceedings of the ACL, pages 288–295, Sapporo, Japan, July.

Thorsten Joachims. 1999. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods: Support Vector Learning. MIT Press.

Wei-Hao Lin and Alexander Hauptmann. 2006. Are these documents written from different perspectives? A test of different perspectives based on statistical distribution divergence. In Proceedings of the ACL, pages 1057–1064, Morristown, NJ, USA.

Wei-Hao Lin, Theresa Wilson, Janyce Wiebe, and Alexander Hauptmann. 2006. Which side are you on? Identifying perspectives at the document and sentence levels. In Proceedings of CoNLL, pages 109–116, Morristown, NJ, USA.

Andrew McCallum and Kamal Nigam. 1998. A comparison of event models for Naive Bayes text classification. In Proceedings of AAAI-98 Workshop on Learning for Text Categorization, pages 41–48, Madison, WI, July.

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. In Proceedings of EMNLP, Philadelphia, PA, July.

Wolfgang Teubert. 2001. A Province of a Federal Superstate, Ruled by an Unelected Bureaucracy: Keywords of the Euro-Sceptic Discourse in Britain. In Andreas Musolff, Colin Good, Petra Points, and Ruth Wittlinger, editors, Attitudes towards Europe: Language in the unification process, pages 45–86. Ashgate Publishing Ltd, Hants, England.

Ian H. Witten and Eibe Frank. 2005. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2nd edition.

Bei Yu, Stefan Kaufmann, and Daniel Diermeier. 2008. Classifying party affiliation from political speech. Journal of Information Technology and Politics, 5(1):33–48.