acl acl2013 acl2013-379 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Veronica Perez-Rosas ; Rada Mihalcea ; Louis-Philippe Morency
Abstract: During real-life interactions, people are naturally gesturing and modulating their voice to emphasize specific points or to express their emotions. With the recent growth of social websites such as YouTube, Facebook, and Amazon, video reviews are emerging as a new source of multimodal and natural opinions that has been left almost untapped by automatic opinion analysis techniques. This paper presents a method for multimodal sentiment classification, which can identify the sentiment expressed in utterance-level visual datastreams. Using a new multimodal dataset consisting of sentiment annotated utterances extracted from video reviews, we show that multimodal sentiment analysis can be effectively performed, and that the joint use of visual, acoustic, and linguistic modalities can lead to error rate reductions of up to 10.5% as compared to the best performing individual modality.
Reference: text
sentIndex sentText sentNum sentScore
1 With the recent growth of social websites such as YouTube, Facebook, and Amazon, video reviews are emerging as a new source of multimodal and natural opinions that has been left almost untapped by automatic opinion analysis techniques. [sent-8, score-0.728]
2 This paper presents a method for multimodal sentiment classification, which can identify the sentiment expressed in utterance-level visual datastreams. [sent-9, score-1.201]
3 Using a new multimodal dataset consisting of sentiment annotated utterances extracted from video reviews, we show that multimodal sentiment analysis can be effectively performed, and that the joint use of visual, acoustic, and linguistic modalities can lead to error rate reductions of up to 10.5% as compared to the best performing individual modality. [sent-10, score-1.883]
4 Popular web platforms such as YouTube, Amazon, Facebook, and ExpoTV have reported a significant increase in the number of consumer reviews in video format over the past five years. [sent-13, score-0.288]
5 Compared to traditional text reviews, video reviews provide a more natural experience as they allow the viewer to better sense the reviewer’s emotions, beliefs, and intentions through richer channels such as intonations, facial expressions, and body language. [sent-14, score-0.543]
6 Given the growing amount of multimodal content on the web (e.g., podcasts), the ability to address the identification of opinions in the presence of diverse modalities is becoming increasingly important. [sent-23, score-0.324]
7 This has motivated researchers to start exploring multimodal clues for the detection of sentiment and emotions in video content (Morency et al. [sent-24, score-0.887]
8 In this paper, we explore the addition of speech and visual modalities to text analysis in order to identify the sentiment expressed in video reviews. [sent-27, score-1.204]
9 This is in line with earlier work on text-based sentiment analysis, where it has been observed that full-document reviews often contain both positive and negative comments, which led to a number of methods addressing opinion analysis at sentence level. [sent-29, score-0.539]
10 Our results show that relying on the joint use of linguistic, acoustic, and visual modalities allows us to better sense the sentiment being expressed as compared to the use of only one modality at a time. [sent-30, score-1.006]
11 Another important aspect of this paper is the introduction of a new multimodal opinion database annotated at the utterance level which is, to our knowledge, the first of its kind. [sent-31, score-0.49]
12 In our work, this dataset enabled a wide range of multimodal sentiment analysis experiments, addressing the relative importance of modalities and individual features. [sent-32, score-0.924]
13 The following section presents related work in text-based sentiment analysis and audio-visual emotion recognition. [sent-33, score-0.591]
14 Section 3 describes our new multimodal datasets with utterance-level sentiment annotations. [sent-34, score-0.583]
15 Section 4 describes our multimodal sentiment analysis approach, including details about our linguistic, acoustic, and visual features. [sent-37, score-0.335]
16 Our experiments and results on multimodal sentiment classification are presented in Section 5, with a detailed discussion and analysis in Section 6. [sent-38, score-0.65]
17 2 Related Work In this section we provide a brief overview of related work in text-based sentiment analysis, as well as audio-visual emotion analysis. [sent-39, score-0.555]
18 , 2005), tracking sentiment timelines in on-line forums and news (Balog et al. [sent-43, score-0.319]
19 , 2008), and citation sentiment detection (Athar and Teufel, 2012). [sent-47, score-0.319]
20 One of the first lexicons used in sentiment analysis is the General Inquirer (Stone, 1968). [sent-48, score-0.355]
21 , 2012) portability have been addressed, not much has been done in terms of extending the applicability of sentiment analysis to other modalities, such as speech or facial expressions. [sent-59, score-0.672]
22 , 2009), where speech and text have been analyzed jointly for the purpose of subjectivity or sentiment identification, without, however, addressing other modalities such as visual cues; and the work reported in (Morency et al. [sent-64, score-0.991]
23 , 2013), where multimodal cues have been used for the analysis of sentiment in product reviews, but where the analysis was done at the much coarser level of full videos rather than individual utterances as we do in our work. [sent-66, score-0.92]
24 Proposed methods for emotion recognition from speech focus both on what is being said and how it is being said, and rely mainly on the analysis of the speech signal by sampling the content at utterance or frame level (Bitouk et al. [sent-71, score-0.638]
25 , pitch, speaking rate, Mel frequency coefficients) for speech-based emotion recognition (Polzin and Waibel, 1996; Tato et al. [sent-75, score-0.299]
26 There are also studies that analyzed the visual cues, such as facial expressions and body movements (Calder et al. [sent-78, score-0.585]
27 Emotions can be also expressed unconsciously, through subtle movements of facial muscles such as smiling or eyebrow raising, often measured and described using the Facial Action Coding System (FACS) (Ekman et al. [sent-83, score-0.286]
28 , 1998) presented one of the early works that integrate both acoustic and visual information for emotion recognition. [sent-91, score-0.864]
29 In addition to work that considered individual modalities, there is also a growing body of work concerned with multimodal emotion analysis (Silva et al. [sent-92, score-0.536]
30 Table 1 excerpt (label: neg): “But the problem with those lipsticks is that when you apply them, they leave a very nasty taste”; “Sinceramente, es no es que sea el olor sino que es mas bien el gusto” (“Honestly, it is not the smell, it is the taste”). [sent-103, score-0.38]
31 More recently, two challenges have been organized focusing on the recognition of emotions using audio and visual cues (Schuller et al. [sent-107, score-0.562]
32 Note however that most of the previous work on audio-visual emotion analysis has focused exclusively on the audio and video modalities, and did not consider textual features, as we do in our work. [sent-110, score-0.574]
33 The final video set includes 80 videos randomly selected from the videos retrieved from YouTube that also met the guidelines above. [sent-118, score-0.458]
34 Since the reviewers often switched topics when expressing their opinions, we manually selected a 30-second opinion segment from each video to avoid having multiple topics in a single review. [sent-121, score-0.304]
35 Each utterance is linked to the corresponding audio and video stream, as well as its manual transcription. [sent-128, score-0.444]
36 2 Sentiment Annotation To enable the use of this dataset for sentiment detection, we performed sentiment annotations at utterance level. [sent-132, score-0.817]
37 Annotations were done using Elan, which is a widely used tool for the annotation of video and audio resources. [sent-133, score-0.302]
38 The annotation was done after seeing the video corresponding to an utterance (along with the corresponding audio source). [sent-135, score-0.444]
39 Table 1 shows the five utterances obtained from a video in our dataset, along with their corresponding sentiment annotations. [sent-142, score-0.332]
40 Note also that sentiment is not always explicit in the text: for example, the last utterance “Honestly, it is not the smell, it is the taste” has an implicit reference to the “nasty taste” expressed in the previous utterance, and thus it was also labeled as negative by both annotators. [sent-146, score-0.493]
41 4 Multimodal Sentiment Analysis The main advantage that comes with the analysis of video opinions, as compared to their textual counterparts, is the availability of visual and speech cues. [sent-147, score-0.318]
42 In textual opinions, the only source of information consists of words and their dependencies, which may sometimes prove insufficient to convey the exact sentiment of the user. [sent-148, score-0.319]
43 Instead, video opinions naturally contain multiple modalities, consisting of visual, acoustic, and linguistic datastreams. [sent-149, score-0.32]
44 We hypothesize that the simultaneous use of these three modalities will help create a better opinion analysis model. [sent-150, score-0.388]
45 1 Feature Extraction This section describes the process of automatically extracting linguistic, acoustic and visual features from the video reviews. [sent-152, score-0.888]
46 1 Linguistic Features We use a bag-of-words representation of the video transcriptions of each utterance to derive unigram counts, which are then used as linguistic features. [sent-158, score-0.406]
47 These simple weighted unigram features have been successfully used in the past to build sentiment classifiers on text, and in conjunction with Support Vector Machines (SVM) have been shown to lead to state-of-the-art performance (Maas et al. [sent-162, score-0.359]
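As an illustration of this kind of unigram-plus-SVM text baseline (a minimal sketch, not the authors' exact pipeline; the toy transcriptions and labels below are invented), the features can be built and evaluated roughly as follows:

```python
# Minimal sketch of a weighted-unigram + SVM sentiment baseline (toy data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# Toy utterance transcriptions and polarity labels (1 = positive, 0 = negative).
utterances = [
    "i really recommend this perfume",
    "the taste is very nasty",
    "this lipstick is one of my favorites",
    "i would not buy it again",
]
labels = [1, 0, 1, 0]

# Bag-of-words unigram counts; these could also be tf-idf weighted.
X = CountVectorizer(ngram_range=(1, 1)).fit_transform(utterances)

# Linear SVM over the unigram features, evaluated by cross-validation.
print(cross_val_score(LinearSVC(), X, labels, cv=2).mean())
```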
48 We used the open source software OpenEAR (Schuller, 2009) to automatically compute a set of acoustic features. [sent-167, score-0.329]
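OpenEAR's feature extraction is typically driven by its SMILExtract binary and a feature configuration file; the call below is a hedged sketch of that workflow (the config path and output handling are illustrative assumptions, not the authors' exact setup).

```python
# Sketch: driving openEAR's SMILExtract to compute utterance-level acoustic features.
# Assumptions: SMILExtract is on PATH and an emobase-style config file is available.
import subprocess

def extract_acoustic_features(wav_path: str, out_path: str,
                              config: str = "config/emobase.conf") -> str:
    # -C selects the feature configuration, -I the input audio, -O the output file.
    subprocess.run(
        ["SMILExtract", "-C", config, "-I", wav_path, "-O", out_path],
        check=True,
    )
    # The output file holds prosodic, spectral, and cepstral functionals
    # (one feature vector per utterance); its exact format depends on the config.
    return out_path

# extract_acoustic_features("utterance_01.wav", "utterance_01_feats.arff")
```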
49 3 Facial Features Facial expressions can provide important clues for affect recognition, which we use to complement the linguistic and acoustic features extracted from the speech stream. [sent-187, score-0.475]
50 The most widely used system for measuring and describing facial behaviors is the Facial Action Coding System (FACS), which allows for the description of face muscle activities through the use of a set of Action Units (AUs). [sent-188, score-0.349]
51 , 2011), which allows us to automatically extract the following visual features: • Smile and head pose estimates. [sent-193, score-0.299]
52 These features are the raw estimates for 30 facial AUs related to muscle movements for the eyes, eyebrows, nose, lips, and other parts of the face. [sent-200, score-0.359]
53 They provide detailed information about facial behaviors from which we expect to find differences between positive and negative states. [sent-205, score-0.287]
54 For example, the unit A12 describes the pulling of the lip corners, a movement which usually suggests a smile, but when associated with a cheek raiser movement (unit A6), it represents a marker for the emotion of happiness. [sent-209, score-0.303]
55 We extract a total of 40 visual features, each of them obtained at frame level. [sent-210, score-0.299]
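Because these visual features are computed per video frame, they have to be pooled into a single vector per utterance before classification; the sketch below shows simple mean pooling (the array shapes are illustrative assumptions).

```python
# Sketch: pooling frame-level visual features into one utterance-level vector.
import numpy as np

# Assume one row per video frame and 40 visual features per frame.
frame_features = np.random.rand(250, 40)   # e.g., 250 frames in one utterance

# Average over frames, ignoring frames where the face tracker returned NaN.
utterance_visual = np.nanmean(frame_features, axis=0)
print(utterance_visual.shape)              # -> (40,)
```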
56 Since only one person is present in each video clip, most of the time facing the camera, the facial tracking was successfully applied for most of our data. [sent-211, score-0.475]
57 5 Experiments and Results We run our sentiment classification experiments on the MOUD dataset introduced earlier. [sent-215, score-0.387]
58 From the dataset, we remove utterances labeled as neutral, thus keeping only the positive and negative utterances with valid visual features. [sent-216, score-0.555]
59 Second, previous work in subjectivity and sentiment analysis has demonstrated that a layered approach (where neutral statements are first separated from opinion statements followed by a separation between positive and negative statements) works better than a single three-way classification. [sent-219, score-0.634]
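The layered approach mentioned here can be pictured as two binary classifiers in sequence: one separating neutral from opinionated utterances, and one separating positive from negative among the opinionated ones. The sketch below is a generic illustration of such a cascade, not the exact setup of the cited work; X is assumed to be a dense NumPy feature matrix.

```python
# Sketch of a layered (two-stage) polarity classifier.
import numpy as np
from sklearn.svm import LinearSVC

def train_layered(X, is_opinion, polarity):
    # is_opinion: 1 = opinionated, 0 = neutral.
    # polarity: 1 = positive, 0 = negative (meaningful only where is_opinion == 1).
    stage1 = LinearSVC().fit(X, is_opinion)
    mask = np.asarray(is_opinion) == 1
    stage2 = LinearSVC().fit(X[mask], np.asarray(polarity)[mask])
    return stage1, stage2

def predict_layered(stage1, stage2, X):
    opinionated = stage1.predict(X)
    polar = stage2.predict(X)
    return ["neutral" if o == 0 else ("positive" if p == 1 else "negative")
            for o, p in zip(opinionated, polar)]
```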
60 Table 2: Utterance-level sentiment classification with linguistic, acoustic, and visual features. [sent-230, score-0.649]
61 In this approach, the features collected from all the multimodal streams are combined into a single feature vector, thus resulting in one vector for each utterance in the dataset, which is used to make a decision about the sentiment orientation of the utterance. [sent-233, score-0.833]
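Feature-level fusion of this kind amounts to concatenating the per-utterance vectors produced by each modality before training a single classifier; a minimal sketch follows (the linguistic and acoustic dimensions are illustrative assumptions, the 40 visual features follow the description above).

```python
# Sketch: feature-level (early) fusion by concatenating per-utterance vectors.
import numpy as np

def fuse(linguistic_vec, acoustic_vec, visual_vec):
    # One fused vector per utterance, later fed to a single SVM.
    return np.concatenate([linguistic_vec, acoustic_vec, visual_vec])

fused = fuse(np.zeros(2000), np.zeros(300), np.zeros(40))
print(fused.shape)   # -> (2340,)
```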
62 We run several comparative experiments, using one, two, and three modalities at a time. [sent-234, score-0.268]
63 In line with previous work on emotion recognition in speech (Haq and Jackson, 2009; Anagnostopoulos and Vovoli, 2010), where utterances are selected in a speaker-dependent manner (i.e. [sent-236, score-0.512]
64 , utterances from the same speaker are included in both training and test), as well as work on sentence-level opinion classification where document boundaries are not considered in the split performed between the training and test sets (Wilson et al. [sent-238, score-0.266]
65 Table 2 shows the results of the utterance-level sentiment classification experiments. [sent-240, score-0.35]
66 6 Discussion The experimental results show that sentiment classification can be effectively performed on multimodal datastreams. [sent-242, score-0.614]
67 Figure 2: Visual and acoustic feature weights. [sent-247, score-0.329]
68 Among the individual classifiers, the linguistic classifier appears to be the most accurate, followed by the classifier that relies on visual clues, and by the audio classifier. [sent-250, score-0.425]
69 The results obtained with this multimodal utterance classifier are found to be significantly better than the best individual results (obtained with the text modality), with significance being tested with a t-test (p=0. [sent-253, score-0.406]
70 To determine the role played by each of the visual and acoustic features, we compare the feature weights assigned by the learning algorithm, as shown in Figure 2. [sent-256, score-0.628]
71 Other informative features for sentiment classification are the voice probability, representing the energy in speech, the combined visual features that represent an angry face, and two of the cepstral coefficients. [sent-258, score-0.897]
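One common way to obtain such per-feature weights is to inspect the coefficient vector of a trained linear SVM; the helper below sketches that idea (feature names and data are placeholders).

```python
# Sketch: ranking fused features by the magnitude of linear-SVM weights.
import numpy as np
from sklearn.svm import LinearSVC

def top_features(X, y, feature_names, k=10):
    weights = LinearSVC().fit(X, y).coef_.ravel()
    order = np.argsort(np.abs(weights))[::-1][:k]
    return [(feature_names[i], float(weights[i])) for i in order]
```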
72 To reach a better understanding of the relation between features, we also calculate the Pearson correlation between the visual and acoustic features. [sent-259, score-0.628]
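Such pairwise correlations can be computed directly from the per-utterance feature matrix; a brief sketch with SciPy is given below (the two columns are illustrative placeholders for one visual and one acoustic feature).

```python
# Sketch: Pearson correlation between one visual and one acoustic feature
# measured across all utterances.
import numpy as np
from scipy.stats import pearsonr

features = np.random.rand(400, 2)   # e.g., smile estimate vs. voice intensity
r, p_value = pearsonr(features[:, 0], features[:, 1])
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")
```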
73 To understand the role played by the size of the video-segments considered in the sentiment classification experiments, as well as the potential effect of a speaker-independence assumption, we also run a set of experiments where we use full videos for the classification. [sent-266, score-0.469]
74 In these experiments, once again the sentiment annotation is done by two independent annotators, using the same protocol as in the utterance-based annotations. [sent-267, score-0.319]
75 Videos that were ambivalent about the general sentiment were either labeled as neutral (and thus removed from the experiments), or labeled with the dominant sentiment. [sent-268, score-0.361]
76 As before, the linguistic, acoustic, and visual features are averaged over the entire video, and we use an SVM classifier in ten-fold cross validation experiments. [sent-271, score-0.339]
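The protocol described here (averaged features, SVM, ten-fold cross-validation) can be reproduced generically as sketched below; the feature matrix and labels are synthetic placeholders rather than the MOUD data.

```python
# Sketch: SVM with ten-fold cross-validation over fused, averaged feature vectors.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X = np.random.rand(80, 1500)   # one fused vector per video (illustrative size)
y = np.tile([0, 1], 40)        # positive / negative labels (illustrative)

accuracies = cross_val_score(SVC(kernel="linear"), X, y, cv=10)
print("mean accuracy:", accuracies.mean())
```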
77 While the combination of modalities still helps, the improvement is smaller than the one obtained during the utterance-level classification. [sent-273, score-0.268]
78 Specifically, the combined effect of acoustic and visual features improves significantly over the individual modalities. [sent-274, score-0.668]
79 However, the combination of linguistic features with other modalities does not lead to clear improvements. [sent-275, score-0.352]
80 Another possible reason is that the acoustic and visual modalities are significantly weaker than the linguistic modality, most likely because the feature vectors are now speaker-independent, which makes it harder to improve over the linguistic modality alone. [sent-277, score-1.104]
81 7 Conclusions In this paper, we presented a multimodal approach for utterance-level sentiment classification. [sent-278, score-0.583]
82 We introduced a new multimodal dataset consisting [sent-279, score-0.301]
83 Table 3: Correlations between several visual and acoustic features (AU6, AU12, AU45, AUs 1 and 1+4, pitch, voice probability, intensity, and loudness). [sent-315, score-0.628]
84 Table 4 (excerpt): two modalities at a time, e.g., Linguistic + Acoustic. [sent-324, score-0.268]
85 Table 4: Video-level sentiment classification with linguistic, acoustic, and visual features. [sent-328, score-0.649]
86 of sentiment annotated utterances extracted from video reviews, where each utterance is associated with a video, acoustic, and linguistic datastream. [sent-329, score-0.837]
87 Our experiments show that sentiment annotation of utterance-level visual datastreams can be effectively performed, and that the use of multiple modalities can lead to error rate reductions of up to 10.5%. [sent-330, score-0.886]
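If the reported figure is read as a relative error-rate reduction (the usual interpretation, i.e., the drop in error divided by the error of the best single modality), the computation looks as follows; the accuracies below are made up purely to illustrate the formula.

```python
# Relative error-rate reduction of the multimodal classifier vs. the best single modality.
def relative_error_reduction(acc_best_single, acc_multimodal):
    err_single = 1.0 - acc_best_single
    err_multi = 1.0 - acc_multimodal
    return (err_single - err_multi) / err_single

print(f"{relative_error_reduction(0.70, 0.73):.1%}")   # -> 10.0%
```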
88 In future work, we plan to explore alternative multimodal fusion methods, such as decision-level and meta-level fusion, to improve the integration of the visual, acoustic, and linguistic modalities. [sent-332, score-0.39]
89 Acknowledgments We would like to thank Alberto Castro for his help with the sentiment annotations. [sent-333, score-0.319]
90 Sound processing features for speaker-dependent and phrase-independent emotion recognition in Berlin database. [sent-351, score-0.339]
91 Survey on speech emotion recognition: Features, classification schemes, and databases. [sent-375, score-0.329]
92 Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. [sent-395, score-0.319]
93 Liars and saviors in a sentiment annotated corpus of comments to political debates. [sent-422, score-0.319]
94 Facial emotion recognition using multi-modal information, volume 1, pages 397–401. [sent-437, score-0.299]
95 Fusion of acoustic and linguistic features for emotion detection. [sent-542, score-0.649]
96 Towards multimodal sentiment analysis: Harvesting opinions from the web. [sent-559, score-0.639]
97 Why question answering using sentiment analysis and word classes. [sent-570, score-0.355]
98 OpenEAR - introducing the Munich open-source emotion and affect recognition toolkit. [sent-627, score-0.299]
99 Emotion recognition based on joint visual and audio cues. [sent-636, score-0.444]
100 Exploring fusion methods for multimodal emotion recognition with missing data. [sent-709, score-0.645]
wordName wordTfidf (topN-words)
[('acoustic', 0.329), ('sentiment', 0.319), ('visual', 0.299), ('modalities', 0.268), ('multimodal', 0.264), ('facial', 0.255), ('emotion', 0.236), ('video', 0.22), ('utterance', 0.142), ('modality', 0.12), ('videos', 0.119), ('que', 0.115), ('utterances', 0.112), ('aus', 0.103), ('schuller', 0.1), ('youtube', 0.095), ('emotions', 0.084), ('opinion', 0.084), ('morency', 0.083), ('audio', 0.082), ('fusion', 0.082), ('loudness', 0.074), ('reviews', 0.068), ('pitch', 0.068), ('intensity', 0.067), ('smile', 0.067), ('voice', 0.066), ('silva', 0.064), ('recognition', 0.063), ('speech', 0.062), ('face', 0.061), ('cepstral', 0.059), ('ieee', 0.058), ('opinions', 0.056), ('recommended', 0.054), ('ekman', 0.05), ('eyben', 0.05), ('miyasato', 0.05), ('moud', 0.05), ('pantic', 0.05), ('recomiendo', 0.05), ('wollmer', 0.05), ('prosody', 0.049), ('maas', 0.049), ('wiebe', 0.046), ('transcription', 0.045), ('cert', 0.044), ('linguistic', 0.044), ('energy', 0.043), ('subjectivity', 0.043), ('neutral', 0.042), ('mihalcea', 0.041), ('taste', 0.041), ('honestly', 0.041), ('features', 0.04), ('statements', 0.039), ('speaker', 0.039), ('el', 0.038), ('signal', 0.037), ('dataset', 0.037), ('frames', 0.037), ('analysis', 0.036), ('coding', 0.036), ('cues', 0.034), ('athar', 0.033), ('atrey', 0.033), ('balog', 0.033), ('bitouk', 0.033), ('calder', 0.033), ('essa', 0.033), ('facs', 0.033), ('favoritos', 0.033), ('haq', 0.033), ('littlewort', 0.033), ('metze', 0.033), ('muscle', 0.033), ('nasty', 0.033), ('openear', 0.033), ('peliculas', 0.033), ('perfumes', 0.033), ('pinta', 0.033), ('polzin', 0.033), ('rosenblum', 0.033), ('sebe', 0.033), ('sinceramente', 0.033), ('valstar', 0.033), ('ververidis', 0.033), ('wiegand', 0.033), ('zhihong', 0.033), ('non', 0.033), ('spectral', 0.033), ('action', 0.032), ('negative', 0.032), ('classification', 0.031), ('movements', 0.031), ('orientation', 0.031), ('carvalho', 0.029), ('cowie', 0.029), ('libros', 0.029), ('pauses', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999946 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis
Author: Veronica Perez-Rosas ; Rada Mihalcea ; Louis-Philippe Morency
Abstract: During real-life interactions, people are naturally gesturing and modulating their voice to emphasize specific points or to express their emotions. With the recent growth of social websites such as YouTube, Facebook, and Amazon, video reviews are emerging as a new source of multimodal and natural opinions that has been left almost untapped by automatic opinion analysis techniques. This paper presents a method for multimodal sentiment classification, which can identify the sentiment expressed in utterance-level visual datastreams. Using a new multimodal dataset consisting of sentiment annotated utterances extracted from video reviews, we show that multimodal sentiment analysis can be effectively performed, and that the joint use of visual, acoustic, and linguistic modalities can lead to error rate reductions of up to 10.5% as compared to the best performing individual modality.
2 0.23171419 384 acl-2013-Visual Features for Linguists: Basic image analysis techniques for multimodally-curious NLPers
Author: Elia Bruni ; Marco Baroni
Abstract: unkown-abstract
3 0.22852023 209 acl-2013-Joint Modeling of News Reader's and Comment Writer's Emotions
Author: Huanhuan Liu ; Shoushan Li ; Guodong Zhou ; Chu-ren Huang ; Peifeng Li
Abstract: Emotion classification can be generally done from both the writer’s and reader’s perspectives. In this study, we find that two foundational tasks in emotion classification, i.e., reader’s emotion classification on the news and writer’s emotion classification on the comments, are strongly related to each other in terms of coarse-grained emotion categories, i.e., negative and positive. On this basis, we propose a respective way to jointly model these two tasks. In particular, a co-training algorithm is proposed to improve semi-supervised learning of the two tasks. Experimental evaluation shows the effectiveness of our joint modeling approach.
4 0.22794051 282 acl-2013-Predicting and Eliciting Addressee's Emotion in Online Dialogue
Author: Takayuki Hasegawa ; Nobuhiro Kaji ; Naoki Yoshinaga ; Masashi Toyoda
Abstract: While there have been many attempts to estimate the emotion of an addresser from her/his utterance, few studies have explored how her/his utterance affects the emotion of the addressee. This has motivated us to investigate two novel tasks: predicting the emotion of the addressee and generating a response that elicits a specific emotion in the addressee’s mind. We target Japanese Twitter posts as a source of dialogue data and automatically build training data for learning the predictors and generators. The feasibility of our approaches is assessed by using 1099 utterance-response pairs that are built by five human workers.
5 0.20999856 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words
Author: Hongliang Yu ; Zhi-Hong Deng ; Shiyingxue Li
Abstract: Sentiment Word Identification (SWI) is a basic technique in many sentiment analysis applications. Most existing researches exploit seed words, and lead to low robustness. In this paper, we propose a novel optimization-based model for SWI. Unlike previous approaches, our model exploits the sentiment labels of documents instead of seed words. Several experiments on real datasets show that WEED is effective and outperforms the state-of-the-art methods with seed words.
6 0.20914422 318 acl-2013-Sentiment Relevance
7 0.20475903 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations
8 0.19899978 249 acl-2013-Models of Semantic Representation with Visual Attributes
9 0.18677753 284 acl-2013-Probabilistic Sense Sentiment Similarity through Hidden Emotions
10 0.18063366 79 acl-2013-Character-to-Character Sentiment Analysis in Shakespeare's Plays
11 0.17762272 148 acl-2013-Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams
12 0.16956863 211 acl-2013-LABR: A Large Scale Arabic Book Reviews Dataset
13 0.14968422 115 acl-2013-Detecting Event-Related Links and Sentiments from Social Media Texts
14 0.14290448 380 acl-2013-VSEM: An open library for visual semantics representation
15 0.13599537 345 acl-2013-The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis
16 0.13498825 175 acl-2013-Grounded Language Learning from Video Described with Sentences
17 0.1321876 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions
18 0.12494691 184 acl-2013-Identification of Speakers in Novels
19 0.12418984 147 acl-2013-Exploiting Topic based Twitter Sentiment for Stock Prediction
20 0.12222871 244 acl-2013-Mining Opinion Words and Opinion Targets in a Two-Stage Framework
topicId topicWeight
[(0, 0.177), (1, 0.288), (2, -0.04), (3, 0.214), (4, -0.126), (5, -0.14), (6, 0.074), (7, -0.021), (8, 0.006), (9, 0.228), (10, -0.115), (11, -0.175), (12, -0.039), (13, 0.117), (14, 0.08), (15, -0.009), (16, 0.012), (17, 0.052), (18, 0.051), (19, 0.091), (20, -0.06), (21, -0.153), (22, 0.034), (23, 0.02), (24, -0.095), (25, 0.113), (26, 0.08), (27, -0.046), (28, 0.048), (29, 0.033), (30, 0.032), (31, 0.047), (32, -0.06), (33, -0.003), (34, -0.022), (35, -0.072), (36, -0.001), (37, -0.005), (38, 0.022), (39, -0.009), (40, -0.026), (41, -0.057), (42, -0.008), (43, 0.019), (44, -0.003), (45, 0.048), (46, -0.0), (47, 0.033), (48, -0.003), (49, -0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.94670093 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis
Author: Veronica Perez-Rosas ; Rada Mihalcea ; Louis-Philippe Morency
Abstract: During real-life interactions, people are naturally gesturing and modulating their voice to emphasize specific points or to express their emotions. With the recent growth of social websites such as YouTube, Facebook, and Amazon, video reviews are emerging as a new source of multimodal and natural opinions that has been left almost untapped by automatic opinion analysis techniques. This paper presents a method for multimodal sentiment classification, which can identify the sentiment expressed in utterance-level visual datastreams. Using a new multimodal dataset consisting of sentiment annotated utterances extracted from video reviews, we show that multimodal sentiment analysis can be effectively performed, and that the joint use of visual, acoustic, and linguistic modalities can lead to error rate reductions of up to 10.5% as compared to the best performing individual modality.
2 0.76680523 79 acl-2013-Character-to-Character Sentiment Analysis in Shakespeare's Plays
Author: Eric T. Nalisnick ; Henry S. Baird
Abstract: We present an automatic method for analyzing sentiment dynamics between characters in plays. This literary format’s structured dialogue allows us to make assumptions about who is participating in a conversation. Once we have an idea of who a character is speaking to, the sentiment in his or her speech can be attributed accordingly, allowing us to generate lists of a character’s enemies and allies as well as pinpoint scenes critical to a character’s emotional development. Results of experiments on Shakespeare’s plays are presented along with discussion of how this work can be extended to unstructured texts (i.e. novels).
3 0.68568957 209 acl-2013-Joint Modeling of News Reader's and Comment Writer's Emotions
Author: Huanhuan Liu ; Shoushan Li ; Guodong Zhou ; Chu-ren Huang ; Peifeng Li
Abstract: Emotion classification can be generally done from both the writer’s and reader’s perspectives. In this study, we find that two foundational tasks in emotion classification, i.e., reader’s emotion classification on the news and writer’s emotion classification on the comments, are strongly related to each other in terms of coarse-grained emotion categories, i.e., negative and positive. On this basis, we propose a respective way to jointly model these two tasks. In particular, a co-training algorithm is proposed to improve semi-supervised learning of the two tasks. Experimental evaluation shows the effectiveness of our joint modeling approach.
4 0.65170032 284 acl-2013-Probabilistic Sense Sentiment Similarity through Hidden Emotions
Author: Mitra Mohtarami ; Man Lan ; Chew Lim Tan
Abstract: Sentiment Similarity of word pairs reflects the distance between the words regarding their underlying sentiments. This paper aims to infer the sentiment similarity between word pairs with respect to their senses. To achieve this aim, we propose a probabilistic emotionbased approach that is built on a hidden emotional model. The model aims to predict a vector of basic human emotions for each sense of the words. The resultant emotional vectors are then employed to infer the sentiment similarity of word pairs. We apply the proposed approach to address two main NLP tasks, namely, Indirect yes/no Question Answer Pairs inference and Sentiment Orientation prediction. Extensive experiments demonstrate the effectiveness of the proposed approach.
5 0.64318389 282 acl-2013-Predicting and Eliciting Addressee's Emotion in Online Dialogue
Author: Takayuki Hasegawa ; Nobuhiro Kaji ; Naoki Yoshinaga ; Masashi Toyoda
Abstract: While there have been many attempts to estimate the emotion of an addresser from her/his utterance, few studies have explored how her/his utterance affects the emotion of the addressee. This has motivated us to investigate two novel tasks: predicting the emotion of the addressee and generating a response that elicits a specific emotion in the addressee’s mind. We target Japanese Twitter posts as a source of dialogue data and automatically build training data for learning the predictors and generators. The feasibility of our approaches is assessed by using 1099 utterance-response pairs that are built by five human workers.
6 0.56923181 318 acl-2013-Sentiment Relevance
7 0.56221622 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words
8 0.54705375 117 acl-2013-Detecting Turnarounds in Sentiment Analysis: Thwarting
9 0.52709657 384 acl-2013-Visual Features for Linguists: Basic image analysis techniques for multimodally-curious NLPers
10 0.52270329 211 acl-2013-LABR: A Large Scale Arabic Book Reviews Dataset
11 0.51775193 278 acl-2013-Patient Experience in Online Support Forums: Modeling Interpersonal Interactions and Medication Use
12 0.50290954 249 acl-2013-Models of Semantic Representation with Visual Attributes
13 0.49742076 380 acl-2013-VSEM: An open library for visual semantics representation
14 0.48651507 148 acl-2013-Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams
15 0.4767946 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification
16 0.46760628 49 acl-2013-An annotated corpus of quoted opinions in news articles
17 0.44311893 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations
18 0.43584713 167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus
19 0.42644534 184 acl-2013-Identification of Speakers in Novels
20 0.42518261 91 acl-2013-Connotation Lexicon: A Dash of Sentiment Beneath the Surface Meaning
topicId topicWeight
[(0, 0.082), (6, 0.038), (11, 0.025), (15, 0.011), (24, 0.075), (26, 0.063), (28, 0.012), (35, 0.065), (42, 0.036), (48, 0.036), (70, 0.055), (82, 0.256), (88, 0.037), (90, 0.027), (95, 0.074)]
simIndex simValue paperId paperTitle
1 0.80544788 20 acl-2013-A Stacking-based Approach to Twitter User Geolocation Prediction
Author: Bo Han ; Paul Cook ; Timothy Baldwin
Abstract: We implement a city-level geolocation prediction system for Twitter users. The system infers a user’s location based on both tweet text and user-declared metadata using a stacking approach. We demonstrate that the stacking method substantially outperforms benchmark methods, achieving 49% accuracy on a benchmark dataset. We further evaluate our method on a recent crawl of Twitter data to investigate the impact of temporal factors on model generalisation. Our results suggest that user-declared location metadata is more sensitive to temporal change than the text of Twitter messages. We also describe two ways of accessing/demoing our system.
same-paper 2 0.79816717 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis
Author: Veronica Perez-Rosas ; Rada Mihalcea ; Louis-Philippe Morency
Abstract: During real-life interactions, people are naturally gesturing and modulating their voice to emphasize specific points or to express their emotions. With the recent growth of social websites such as YouTube, Facebook, and Amazon, video reviews are emerging as a new source of multimodal and natural opinions that has been left almost untapped by automatic opinion analysis techniques. This paper presents a method for multimodal sentiment classification, which can identify the sentiment expressed in utterance-level visual datastreams. Using a new multimodal dataset consisting of sentiment annotated utterances extracted from video reviews, we show that multimodal sentiment analysis can be effectively performed, and that the joint use of visual, acoustic, and linguistic modalities can lead to error rate reductions of up to 10.5% as compared to the best performing individual modality.
3 0.67871815 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation
Author: Zhenhua Tian ; Hengheng Xiang ; Ziqi Liu ; Qinghua Zheng
Abstract: This paper presents an unsupervised random walk approach to alleviate data sparsity for selectional preferences. Based on the measure of preferences between predicates and arguments, the model aggregates all the transitions from a given predicate to its nearby predicates, and propagates their argument preferences as the given predicate’s smoothed preferences. Experimental results show that this approach outperforms several state-of-the-art methods on the pseudo-disambiguation task, and it better correlates with human plausibility judgements.
4 0.56341553 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
Author: Young-Bum Kim ; Benjamin Snyder
Abstract: In this paper, we present a solution to one aspect of the decipherment task: the prediction of consonants and vowels for an unknown language and alphabet. Adopting a classical Bayesian perspective, we perform posterior inference over hundreds of languages, leveraging knowledge of known languages and alphabets to uncover general linguistic patterns of typologically coherent language clusters. We achieve average accuracy in the unsupervised consonant/vowel prediction task of 99% across 503 languages. We further show that our methodology can be used to predict more fine-grained phonetic distinctions. On a three-way classification task between vowels, nasals, and non-nasal consonants, our model yields unsupervised accuracy of 89% across the same set of languages.
5 0.56055003 373 acl-2013-Using Conceptual Class Attributes to Characterize Social Media Users
Author: Shane Bergsma ; Benjamin Van Durme
Abstract: We describe a novel approach for automatically predicting the hidden demographic properties of social media users. Building on prior work in common-sense knowledge acquisition from third-person text, we first learn the distinguishing attributes of certain classes of people. For example, we learn that people in the Female class tend to have maiden names and engagement rings. We then show that this knowledge can be used in the analysis of first-person communication; knowledge of distinguishing attributes allows us to both classify users and to bootstrap new training examples. Our novel approach enables substantial improvements on the widely-studied task of user gender prediction, obtaining a 20% relative error reduction over the current state-of-the-art.
6 0.55790365 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations
7 0.55606508 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification
8 0.5551759 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
9 0.55474991 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
10 0.55379742 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions
11 0.55351484 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
12 0.55331039 267 acl-2013-PARMA: A Predicate Argument Aligner
13 0.55167925 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data
14 0.55144948 340 acl-2013-Text-Driven Toponym Resolution using Indirect Supervision
15 0.55024159 316 acl-2013-SenseSpotting: Never let your parallel data tie you to an old domain
16 0.5499177 318 acl-2013-Sentiment Relevance
17 0.5494799 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification
19 0.54761112 183 acl-2013-ICARUS - An Extensible Graphical Search Tool for Dependency Treebanks
20 0.54665631 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization