
379 acl-2013-Utterance-Level Multimodal Sentiment Analysis


Source: pdf

Author: Veronica Perez-Rosas; Rada Mihalcea; Louis-Philippe Morency

Abstract: During real-life interactions, people are naturally gesturing and modulating their voice to emphasize specific points or to express their emotions. With the recent growth of social websites such as YouTube, Facebook, and Amazon, video reviews are emerging as a new source of multimodal and natural opinions that has been left almost untapped by automatic opinion analysis techniques. This paper presents a method for multimodal sentiment classification, which can identify the sentiment expressed in utterance-level visual datastreams. Using a new multimodal dataset consisting of sentiment annotated utterances extracted from video reviews, we show that multimodal sentiment analysis can be effectively performed, and that the joint use of visual, acoustic, and linguistic modalities can lead to error rate reductions of up to 10.5% as compared to the best performing individual modality.
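The joint use of modalities described in the abstract is commonly realized as early (feature-level) fusion: per-utterance feature vectors from each modality are concatenated and a single classifier is trained on the joint representation. The sketch below illustrates that idea only; the feature values and the nearest-centroid classifier are hypothetical stand-ins, not the paper's actual features or learner.

```python
# Minimal sketch of utterance-level early fusion, assuming per-modality
# feature vectors are already extracted. The toy data and the
# nearest-centroid classifier are illustrative, not the paper's setup.

def fuse(linguistic, acoustic, visual):
    """Concatenate per-modality feature vectors into one joint vector."""
    return linguistic + acoustic + visual

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def nearest_centroid_predict(x, centroids):
    """Return the label whose centroid is closest to x (squared L2)."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: dist(x, centroids[label]))

# Toy training utterances: (linguistic, acoustic, visual) features + label.
train = [
    (([0.9], [0.7], [0.8]), "positive"),
    (([0.8], [0.6], [0.9]), "positive"),
    (([0.1], [0.2], [0.1]), "negative"),
    (([0.2], [0.3], [0.2]), "negative"),
]

fused_by_label = {}
for (ling, ac, vis), label in train:
    fused_by_label.setdefault(label, []).append(fuse(ling, ac, vis))
centroids = {label: centroid(vecs) for label, vecs in fused_by_label.items()}

prediction = nearest_centroid_predict(fuse([0.85], [0.65], [0.75]), centroids)
print(prediction)  # "positive" on this toy data
```

Early fusion lets a single model exploit cross-modal correlations, at the cost of a larger joint feature space; the alternative, decision-level fusion, combines per-modality classifier outputs instead.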


reference text

C. Alm, D. Roth, and R. Sproat. 2005. Emotions from text: Machine learning for text-based emotion prediction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 347–354, Vancouver, Canada.

C. Anagnostopoulos and E. Vovoli. 2010. Sound processing features for speaker-dependent and phrase-independent emotion recognition in Berlin database. In Information Systems Development, pages 413–421. Springer.

A. Athar and S. Teufel. 2012. Context-enhanced citation sentiment detection. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Montréal, Canada, June.

P. K. Atrey, M. A. Hossain, A. El Saddik, and M. Kankanhalli. 2010. Multimodal fusion for multimedia analysis: a survey. Multimedia Systems, 16.

M. El Ayadi, M. Kamel, and F. Karray. 2011. Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 44(3):572–587.

K. Balog, G. Mishne, and M. de Rijke. 2006. Why are they excited? Identifying and explaining spikes in blog mood levels. In Proceedings of the 11th Meeting of the European Chapter of the Association for Computational Linguistics (EACL-2006).

D. Bitouk, R. Verma, and A. Nenkova. 2010. Class-level spectral features for emotion recognition. Speech Communication, 52(7-8):613–625, July.

J. Blitzer, M. Dredze, and F. Pereira. 2007. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the Association for Computational Linguistics.

A. J. Calder, A. M. Burton, P. Miller, A. W. Young, and S. Akamatsu. 2001. A principal component analysis of facial expressions. Vision Research, 41(9):1179–1208, April.

G. Carenini, R. Ng, and X. Zhou. 2008. Summarizing emails with conversational cohesion and subjectivity. In Proceedings of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2008), Columbus, Ohio.

P. Carvalho, L. Sarmento, J. Teixeira, and M. Silva. 2011. Liars and saviors in a sentiment annotated corpus of comments to political debates. In Proceedings of the Association for Computational Linguistics (ACL 2011), Portland, OR.

L. S. Chen, T. S. Huang, T. Miyasato, and R. Nakatsu. 1998. Multimodal human emotion/expression recognition. In Proceedings of the 3rd International Conference on Face & Gesture Recognition, pages 366–, Washington, DC, USA. IEEE Computer Society.

L. C. De Silva, T. Miyasato, and R. Nakatsu. 1997. Facial emotion recognition using multi-modal information, volume 1, pages 397–401. IEEE Signal Processing Society.

P. Ekman, W. Friesen, and J. Hager. 2002. Facial action coding system.

P. Ekman. 1993. Facial expression of emotion. American Psychologist, 48:384–392.

I. A. Essa and A. P. Pentland. 1997. Coding, analysis, interpretation, and recognition of facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):757–763, July.

A. Esuli and F. Sebastiani. 2006. SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of the 5th Conference on Language Resources and Evaluation (LREC 2006), Genova, IT.

D. L. Hall and J. Llinas. 1997. An introduction to multisensor fusion. IEEE Special Issue on Data Fusion, 85(1).

S. Haq and P. Jackson. 2009. Speaker-dependent audio-visual emotion recognition. In International Conference on Audio-Visual Speech Processing.

V. Hatzivassiloglou and K. McKeown. 1997. Predicting the semantic orientation of adjectives. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics, pages 174–181.

M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington.

F. Li, S. J. Pan, O. Jin, Q. Yang, and X. Zhu. 2012. Cross-domain co-extraction of sentiment and topic lexicons. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju Island, Korea.

G. Littlewort, J. Whitehill, T. Wu, I. Fasel, M. Frank, J. Movellan, and M. Bartlett. 2011. The computer expression recognition toolbox (CERT). In 2011 IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG 2011), pages 298–305, March.

A. Maas, R. Daly, P. Pham, D. Huang, A. Ng, and C. Potts. 2011. Learning word vectors for sentiment analysis. In Proceedings of the Association for Computational Linguistics (ACL 2011), Portland, OR.

F. Mairesse, J. Polifroni, and G. Di Fabbrizio. 2012. Can prosody inform sentiment analysis? Experiments on short spoken reviews. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5093–5096, March.

X. Meng, F. Wei, X. Liu, M. Zhou, G. Xu, and H. Wang. 2012. Cross-lingual mixture model for sentiment classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju Island, Korea.

F. Metze, T. Polzehl, and M. Wagner. 2009. Fusion of acoustic and linguistic features for emotion detection. In 2009 IEEE International Conference on Semantic Computing (ICSC '09), pages 153–160, September.

R. Mihalcea, C. Banea, and J. Wiebe. 2007. Learning multilingual subjective language via cross-lingual projections. In Proceedings of the Association for Computational Linguistics, Prague, Czech Republic.

L.-P. Morency, R. Mihalcea, and P. Doshi. 2011. Towards multimodal sentiment analysis: Harvesting opinions from the web. In Proceedings of the International Conference on Multimodal Computing, Alicante, Spain.

J. Oh, K. Torisawa, C. Hashimoto, T. Kawada, S. De Saeger, J. Kazama, and Y. Wang. 2012. Why question answering using sentiment analysis and word classes. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Korea.

B. Pang and L. Lee. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics, Barcelona, Spain, July.

V. Perez-Rosas, R. Mihalcea, and L.-P. Morency. 2013. Multimodal sentiment analysis of Spanish online videos. IEEE Intelligent Systems.

T. Polzin and A. Waibel. 1996. Recognizing emotions in speech. In ICSLP.

S. Raaijmakers, K. Truong, and T. Wilson. 2008. Multimodal subjectivity analysis of multiparty conversation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 466–474, Honolulu, Hawaii.

M. Rosenblum, Y. Yacoob, and L. S. Davis. 1996. Human expression recognition from motion using a radial basis function network architecture. IEEE Transactions on Neural Networks, 7(5):1121–1138, September.

B. Schuller, M. Valstar, R. Cowie, and M. Pantic, editors. 2011a. Audio/Visual Emotion Challenge and Workshop (AVEC 2011).

B. Schuller, M. Valstar, F. Eyben, R. Cowie, and M. Pantic, editors. 2011b. Audio/Visual Emotion Challenge and Workshop (AVEC 2011).

F. Eyben, M. Wollmer, and B. Schuller. 2009. OpenEAR: Introducing the Munich open-source emotion and affect recognition toolkit. In ACII.

N. Sebe, I. Cohen, T. Gevers, and T. S. Huang. 2006. Emotion recognition based on joint visual and audio cues. In ICPR.

D. Silva, T. Miyasato, and R. Nakatsu. 1997. Facial emotion recognition using multi-modal information. In Proceedings of the International Conference on Information and Communications Security.

S. Somasundaran, J. Wiebe, P. Hoffmann, and D. Litman. 2006. Manual annotation of opinion categories in meetings. In Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006.

P. Stone. 1968. General Inquirer: Computer Approach to Content Analysis. MIT Press.

C. Strapparava and R. Mihalcea. 2007. SemEval-2007 task 14: Affective text. In Proceedings of the 4th International Workshop on the Semantic Evaluations (SemEval 2007), Prague, Czech Republic.

M. Taboada, J. Brooke, M. Tofiloski, K. Voli, and M. Stede. 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(3).

R. Tato, R. Santos, R. Kompe, and J. M. Pardo. 2002. Emotional space improves emotion recognition. In Proc. ICSLP 2002, pages 2029–2032.

Y.-I. Tian, T. Kanade, and J. F. Cohn. 2001. Recognizing action units for facial expression analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2):97–115, February.

P. Turney. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002), pages 417–424, Philadelphia.

D. Ververidis and C. Kotropoulos. 2006. Emotional speech recognition: Resources, features, and methods. Speech Communication, 48(9):1162–1181, September.

J. Wagner, E. Andre, F. Lingenfelser, and J. Kim. 2011. Exploring fusion methods for multimodal emotion recognition with missing data. IEEE Transactions on Affective Computing, 2(4):206–218, October-December.

X. Wan. 2009. Co-training for cross-lingual sentiment classification. In Proceedings of the Joint Conference of the Association of Computational Linguistics and the International Joint Conference on Natural Language Processing, Singapore, August.

J. Wiebe and E. Riloff. 2005. Creating subjective and objective sentence classifiers from unannotated texts. In Proceedings of the 6th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2005) (invited paper), Mexico City, Mexico.

J. Wiebe, T. Wilson, and C. Cardie. 2005. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2-3):165–210.

M. Wiegand and D. Klakow. 2009. The role of knowledge-based features in polarity classification at sentence level. In Proceedings of the International Conference of the Florida Artificial Intelligence Research Society.

T. Wilson, J. Wiebe, and R. Hwa. 2004. Just how mad are you? Finding strong and weak opinion clauses. In Proceedings of the American Association for Artificial Intelligence.

M. Wollmer, B. Schuller, F. Eyben, and G. Rigoll. 2010. Combining long short-term memory and dynamic Bayesian networks for incremental emotion-sensitive artificial listening. IEEE Journal of Selected Topics in Signal Processing, 4(5), October.

B. Yang and C. Cardie. 2012. Extracting opinion expressions with semi-Markov conditional random fields. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Korea.

Z. Zhihong, M. Pantic, G. I. Roisman, and T. S. Huang. 2009. A survey of affect recognition methods: Audio, visual, and spontaneous expressions. PAMI, 31(1).