acl acl2013 acl2013-209 knowledge-graph by maker-knowledge-mining

209 acl-2013-Joint Modeling of News Reader's and Comment Writer's Emotions


Source: pdf

Author: Huanhuan Liu ; Shoushan Li ; Guodong Zhou ; Chu-ren Huang ; Peifeng Li

Abstract: Emotion classification can be generally done from both the writer's and reader's perspectives. In this study, we find that two foundational tasks in emotion classification, i.e., reader's emotion classification on the news and writer's emotion classification on the comments, are strongly related to each other in terms of coarse-grained emotion categories, i.e., negative and positive. On this basis, we propose an approach to jointly model these two tasks. In particular, a co-training algorithm is proposed to improve semi-supervised learning of the two tasks. Experimental evaluation shows the effectiveness of our joint modeling approach.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Joint Modeling of News Reader's and Comment Writer's Emotions. Huanhuan Liu†, Shoushan Li†‡*, Guodong Zhou†, Chu-Ren Huang‡, Peifeng Li†. †Natural Language Processing Lab, Soochow University, China; ‡Department of CBS, the Hong Kong Polytechnic University. {huanhuanliu.suda, shoushan, churenhuang}@gmail.com [sent-1, score-0.052]

2 Abstract Emotion classification can be generally done from both the writer’s and reader’s perspectives. [sent-3, score-0.075]

3 In this study, we find that two foundational tasks in emotion classification, i.e., [sent-4, score-0.715]

4 reader's emotion classification on the news and writer's emotion classification on the comments, are strongly related to each other in terms of coarse-grained emotion categories, i.e., negative and positive. [sent-6, score-2.496]

5 In particular, a co-training algorithm is proposed to improve semi-supervised learning of the two tasks. [sent-10, score-0.03]

6 Experimental evaluation shows the effectiveness of our joint modeling approach. [sent-11, score-0.013]

7 1 Introduction: Emotion classification aims to predict the emotion categories (e.g., happy, angry, and sad) of a given text. [sent-13, score-0.838]

8 With the rapid growth of computer mediated communication applications, such as social websites and micro-blogs, research on emotion classification has recently been attracting increasing attention from the natural language processing (NLP) community (Chen et al.). [sent-16, score-0.82]

9 In general, a single text may possess two kinds of emotions, writer's emotion and reader's emotion, where the former concerns the emotion expressed by the writer when writing the text and the latter concerns the emotion evoked in a reader after reading the text. [sent-18, score-2.757]

10 For example, consider two short texts drawn from a news article and its corresponding comments, as shown in Figure 1. [sent-19, score-0.201]

11 *Corresponding author. ‡The Hong Kong Polytechnic University. {gdzhou, pfli}@suda.edu.cn [sent-20, score-0.051]

12 On one hand, for the news text, while its writer just objectively reports the news and thus does not express his emotion in the text, a reader could feel sad or worried. [sent-22, score-1.818]

13 On the other hand, for the comment text, its writer clearly expresses his sad emotion, while the emotion of a reader after reading the comments is not clear (some may feel sorry while others might feel indifferent). [sent-23, score-2.511]

14 (Figure 1: illustration of emotions on a news article and its comments.) Accordingly, emotion classification can be grouped into two categories: reader's emotion classification and writer's emotion classification. [sent-30, score-2.816]

15 Although both emotion classification tasks have been widely studied in recent years, they are always considered independently and treated separately. [sent-31, score-0.79]

16 However, news articles and their corresponding comments often appear together. [sent-32, score-0.327]

17 For example, on many news websites, it is common to see a news article followed by many comments. [sent-33, score-0.402]

18 In this case, because the writers of the comments are a part of the readers of the news, the writers' emotions in the comments are a direct reflection of the readers' emotions on the news. [sent-34, score-0.829]

19 That is, the comment writer’s emotions and the news reader’s emotions are strongly related. [sent-35, score-1.035]

20 As shown in Figure 1, the comment writer's emotion 'sad' is among the news reader's emotions. [sent-38, score-1.212]

21 The above observation motivates joint modeling of news reader's and comment writer's emotions. [sent-39, score-0.528]

22 In this study, we systematically investigate the relationship between the news reader’s emotions and the comment writer’s emotions. [sent-40, score-0.811]

23 Specifically, we manually analyze their agreement in a corpus collected from a news website. [sent-41, score-0.237]

24 It is interesting to find that such agreement only applies to coarse-grained emotion categories (i.e., [sent-42, score-0.831]

25 positive and negative) with a high probability and does not apply to fine-grained emotion categories (e.g., happy and sad). [sent-44, score-0.787]

26 This motivates our joint modeling in terms of the coarse-grained emotion categories. [sent-47, score-0.746]

27 Specifically, we consider the news text and the comment text as two different views of expressing either the news reader’s or comment writer’s emotions. [sent-48, score-1.014]

28 Given the two views, a co-training algorithm is proposed to perform semi-supervised emotion classification so that the information in the unlabeled data can be exploited to improve the classification performance. [sent-49, score-0.893]

29 Emotion classification has been an active topic in NLP during the last decade (Pang et al., 2002; Turney, 2002; Alm et al., 2005; Wilson et al., 2009). [sent-53, score-0.79]

30 Previous studies can be mainly grouped into two categories: coarse-grained and fine-grained emotion classification. [sent-54, score-0.715]

31 Coarse-grained emotion classification, also called sentiment classification, concerns only two emotion categories, such as like or dislike and positive or negative (Pang and Lee, 2008; Liu, 2012). [sent-55, score-1.549]

32 This kind of emotion classification has attracted much attention since the pioneering work by Pang et al. [sent-56, score-0.803]

33 In comparison, fine-grained emotion classification aims to classify a text into multiple emotion categories, such as happy, angry, and sad. [sent-62, score-1.505]

34 One main group of related studies on this task concerns emotion resource construction, such as emotion lexicon building (Xu et al.). [sent-63, score-1.447]

35 Besides, all the related studies focus on supervised learning (Alm et al.). [sent-66, score-0.017]

36 So far, we have not seen any studies on semi-supervised learning for fine-grained emotion classification. [sent-69, score-0.732]

37 2.2 News Reader's Emotion Classification: While comment writer's emotion classification has been extensively studied, there are only a few studies on news reader's emotion classification from the NLP and related communities. [sent-71, score-2.094]

38 Lin et al. (2007) first describe the task of reader's emotion classification on news articles and then employ some standard machine learning approaches to train a classifier for determining the reader's emotion towards a news article. [sent-73, score-1.769]

39 Unlike all the studies mentioned above, our study is the first attempt to explore the relationship between comment writer's emotion classification and news reader's emotion classification. [sent-76, score-2.048]

40 3 Relationship between News Reader's and Comment Writer's Emotions: To investigate the relationship between news reader's and comment writer's emotions, we collect a corpus of Chinese news articles and their corresponding comments from Yahoo! [sent-77, score-0.887]

41 com), where each news article receives reader votes over emotion tags from eight categories: happy, sad, angry, meaningless, boring, heartwarming, worried, and useful. [sent-81, score-0.936]

42 The emotion tags on each news article are selected by its readers. [sent-82, score-0.933]

43 Note that because the categories of “useful” and “meaningless” are not real emotion categories, we ignore them in our study. [sent-83, score-0.763]

44 Following Lin et al. (2008), we consider the voted emotions as the reader's emotions on the news. [sent-86, score-0.558]

45 We only select news articles with a dominant emotion (receiving more than 50% of the votes) for our data. [sent-89, score-0.964]
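
As a rough illustration of this filtering step, a minimal sketch in Python; the article fields ("votes", "comments") and the helper name are assumptions for illustration, not the authors' actual data schema.

def select_articles(articles):
    # Keep articles with a dominant emotion (> 50% of the reader votes)
    # and at least one comment, as described above.
    selected = []
    for art in articles:
        total = sum(art["votes"].values())
        emotion, count = max(art["votes"].items(), key=lambda kv: kv[1])
        if total > 0 and count > 0.5 * total and art["comments"]:
            art["label"] = emotion
            selected.append(art)
    return selected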

46 Besides, as we attempt to consider the comment writer's emotions, news articles without any comments are filtered out. [sent-90, score-0.657]

47 As a result, we obtain a corpus of 3495 news articles together with their comments; the numbers of articles labeled happy, sad, angry, boring, heartwarming, and worried are 1405, 230, 1673, 75, 92, and 20, respectively. [sent-91, score-0.441]

48 For coarse-grained categories, happy and heartwarming are merged into the positive category, while sad, angry, boring, and worried are merged into the negative category. [sent-92, score-0.242]
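
The merge into coarse-grained categories can be written down directly from the sentence above; this small mapping is a sketch, with the category names taken from the paper.

COARSE = {
    "happy": "positive", "heartwarming": "positive",
    "sad": "negative", "angry": "negative",
    "boring": "negative", "worried": "negative",
}

def to_coarse(fine_label):
    # "useful" and "meaningless" are not real emotions and map to None here.
    return COARSE.get(fine_label)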

49 Besides the tags of the reader's emotions, each news article is followed by some comments, which can be seen as a reflection of the writer's emotions (on average, each news article is followed by 15 comments). [sent-93, score-0.693]

50 In order to know the exact relationship between these two kinds of emotions, we select 20 news articles from each category and ask two human annotators, A and B, to manually annotate the writer's emotion (single-label) according to the comments of each article. [sent-94, score-1.089]

51 Table 1 reports the agreement between the annotators and between the two kinds of emotions, measured with Cohen's kappa (κ) value (Cohen, 1960). [sent-95, score-0.056]

52 Agreement between two annotators: The annotation agreement between the two annotators is 0. [sent-98, score-0.056]
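
For reference, Cohen's kappa is kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. Below is a hedged sketch of how such a value could be computed; the toy label lists are made up, not the paper's annotations.

from sklearn.metrics import cohen_kappa_score

annotator_a = ["positive", "negative", "negative", "positive"]
annotator_b = ["positive", "negative", "positive", "positive"]
print(cohen_kappa_score(annotator_a, annotator_b))  # kappa in [-1, 1]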

53 Agreement between news reader’s and comment writer’s emotions: We compare the news reader’s emotion (automatically extracted from the web page) and the comment writer’s emotion (manually annotated by annotator A). [sent-101, score-2.424]

54 The annotation agreement between the two kinds of emotions is 0. [sent-102, score-0.323]

55 From the results, we can see that the agreement on the fine-grained emotions is a bit low, while the agreement on the coarse-grained emotions, i.e., positive and negative, is much higher. [sent-105, score-0.373]

56 We find that although some fine-grained emotions of the comments are not consistent with the dominant emotion of the news, they belong to the same coarse-grained category. [sent-108, score-1.124]

57 In a word, the agreement between news reader's and comment writer's emotions on the coarse-grained emotions is very high, even higher than the agreement between the two annotators (0. [sent-109, score-1.127]

58 In the following, we focus on the coarse-grained emotions in emotion classification. [sent-113, score-1.016]

59 In semi-supervised learning, the unlabeled data is exploited to improve the models with a small amount of the labeled data. [sent-115, score-0.062]

60 In our approach, we consider the news text and the comment text as two different views to express the news or comment emotion and build the two classifiers CN and CC . [sent-116, score-1.748]

61 Given the two-view classifiers, we perform co-training for semisupervised emotion classification, as shown in Figure 2, on both news reader’s and comment writer’s emotion classification. [sent-117, score-1.953]

62 Input: LNews, the labeled data on the news; LComment, the labeled data on the comments; UNews, the unlabeled data on the news; UComment, the unlabeled data on the comments. Output: LNews, new labeled data on the news; LComment, new labeled data on the comments. Procedure: Loop (1). [sent-118, score-1.179]

63 UNews UNews N1 N2 UComment  UComment M1 M2 Figure 2: Co-training algorithm for semisupervised emotion classification 513 5 Experimentation 5. [sent-128, score-0.816]

64 5 Experimentation. 5.1 Experimental Settings. Data Setting: the data set includes 3495 news articles (1572 positive and 1923 negative) and their comments, as described in Section 3. [sent-129, score-0.385]

65 Although the emotions of the comments are not given on the website, we set their coarse-grained emotion categories to be the same as those of their source news, due to the close relationship described in Section 3. [sent-130, score-1.628]

66 To make the data balanced, we randomly select 1500 positive and 1500 negative news articles with their comments for the empirical study. [sent-131, score-0.38]

67 Among them, we randomly select 400 news articles with their comments as the test data. [sent-132, score-0.327]

68 Features: Each news or comment text is treated as a bag-of-words and transformed into a binary vector encoding the presence or absence of word unigrams. [sent-133, score-0.497]

69 Classification algorithm: the maximum entropy (ME) classifier implemented with the public Mallet toolkit*. [sent-134, score-0.029]
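
Putting the two settings above together, a small end-to-end sketch; scikit-learn's binary CountVectorizer and LogisticRegression stand in for the described features and the Mallet ME classifier, and the toy texts are invented.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["the rescue made readers happy", "the accident left readers sad"]
labels = ["positive", "negative"]

vec = CountVectorizer(binary=True)   # presence/absence of word unigrams
X = vec.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)
print(clf.predict(vec.transform(["a happy rescue"])))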

70 5.2 Experimental Results. News reader's emotion classifier: the classifier trained on the news text. [sent-136, score-0.945]

71 Comment writer's emotion classifier: the classifier trained on the comment text. [sent-137, score-1.04]

72 Figure 3 shows the performance of the news reader's and comment writer's emotion classifiers trained with 10 and 50 initial labeled samples plus automatically labeled data from co-training. [sent-138, score-1.387]

73 Here, in each iteration, we pick the 2 most confident positive and the 2 most confident negative samples. [sent-139, score-0.053]

74 From this figure, we can see that our co-training algorithm is very effective: using only 10 labeled samples per category achieves very promising performance on both news reader's and comment writer's emotion classification. [sent-141, score-1.302]

75 In particular, the performance with only 10 labeled samples is comparable to that of supervised learning with more than 1200 labeled samples for comment writer's emotion classification. [sent-142, score-1.191]

76 For comparison, we also implement a self-training algorithm for the news reader's and comment writer's emotion classifiers, each of which automatically labels samples from the unlabeled data independently. [sent-143, score-1.316]
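
For contrast with the co-training sketch above, a single-view self-training loop under the same assumed interfaces; here the classifier teaches itself from its own most confident predictions.

import numpy as np
from sklearn.linear_model import LogisticRegression

def selftrain(X, y, labeled, unlabeled, rounds=50, k=2):
    labeled, unlabeled = list(labeled), list(unlabeled)
    clf = None
    for _ in range(rounds):
        clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
        if not unlabeled:
            break
        proba = clf.predict_proba(X[unlabeled])
        picked = set()
        for cls in range(proba.shape[1]):      # k most confident per class
            for row in np.argsort(-proba[:, cls])[:k]:
                picked.add(unlabeled[row])
        for i in picked:
            y[i] = clf.predict(X[[i]])[0]
        labeled.extend(picked)
        unlabeled = [i for i in unlabeled if i not in picked]
    return clf

Without the second view, labeling errors can reinforce themselves, which is one plausible reading of the gap the authors report.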

77 For news reader's emotion classification, the performances of self-training are 0. [sent-144, score-0.968]

78 For comment writer’s emotion classification, the performances of self-training are 0. [sent-150, score-1.043]

79 These results are much lower than those of our co-training approach, especially on comment writer's emotion classification. [sent-153, score-1.148]

80 In this paper, we study two foundational tasks in emotion classification, i.e., reader's emotion classification on the news and writer's emotion classification on the comments. [sent-166, score-1.781]

81 From the data analysis, we find that the news reader's and comment writer's emotions are highly consistent with each other in terms of the coarse-grained emotion categories, positive and negative. [sent-167, score-1.505]

82 On this basis, we propose a co-training approach to perform semi-supervised learning on the two tasks. [sent-168, score-0.026]

83 Evaluation shows that the co-training approach is so effective that using only 10 labeled samples achieves promising performance on both news reader's and comment writer's emotion classification. [sent-169, score-1.334]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('emotion', 0.715), ('writer', 0.337), ('comment', 0.296), ('emotions', 0.269), ('reader', 0.211), ('news', 0.201), ('comments', 0.126), ('sad', 0.081), ('angry', 0.08), ('classification', 0.075), ('lcomment', 0.075), ('lnews', 0.075), ('ucomment', 0.075), ('unews', 0.075), ('happy', 0.066), ('samples', 0.056), ('categories', 0.048), ('worried', 0.046), ('heartwarming', 0.045), ('sentiment', 0.043), ('purver', 0.04), ('alm', 0.037), ('ini', 0.037), ('quan', 0.037), ('agreement', 0.036), ('labeled', 0.034), ('articles', 0.034), ('proceeding', 0.034), ('performances', 0.032), ('pang', 0.032), ('coarsegrained', 0.032), ('boring', 0.032), ('aman', 0.03), ('battersby', 0.03), ('cotraining', 0.03), ('ial', 0.03), ('moshfeghi', 0.03), ('cc', 0.029), ('relationship', 0.029), ('negative', 0.029), ('classifier', 0.029), ('unlabeled', 0.028), ('shoushan', 0.027), ('zhejiang', 0.027), ('bandyopadhyay', 0.027), ('duin', 0.027), ('pami', 0.027), ('cn', 0.026), ('semisupervised', 0.026), ('thumbs', 0.025), ('suda', 0.025), ('confidently', 0.025), ('volkova', 0.025), ('positive', 0.024), ('concerns', 0.023), ('reflection', 0.022), ('meaningless', 0.022), ('ze', 0.021), ('annotators', 0.02), ('views', 0.02), ('voted', 0.02), ('selftraining', 0.02), ('dasgupta', 0.02), ('classifiers', 0.019), ('grants', 0.018), ('motivates', 0.018), ('lin', 0.018), ('kinds', 0.018), ('websites', 0.018), ('readers', 0.017), ('das', 0.017), ('li', 0.017), ('cui', 0.017), ('cohen', 0.017), ('studies', 0.017), ('ren', 0.016), ('systematically', 0.016), ('zhou', 0.016), ('feel', 0.015), ('riloff', 0.015), ('opinion', 0.015), ('wilson', 0.015), ('dominant', 0.014), ('besides', 0.014), ('choose', 0.013), ('pioneer', 0.013), ('gdzhou', 0.013), ('pfl', 0.013), ('odta', 0.013), ('grf', 0.013), ('polytechnic', 0.013), ('possessing', 0.013), ('kong', 0.013), ('modeling', 0.013), ('hong', 0.013), ('china', 0.013), ('attentions', 0.012), ('averagely', 0.012), ('peifeng', 0.012), ('cbs', 0.012)]
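
A plausible sketch of how the similarity lists below can be produced: papers are represented as tfidf vectors and ranked by cosine similarity. The pipeline and toy texts are assumptions, not the actual code behind this page.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

papers = ["joint modeling of reader and writer emotions",
          "predicting addressee emotion in online dialogue",
          "language models for predicting programming comments"]
sims = cosine_similarity(TfidfVectorizer().fit_transform(papers))
print(sims[0])   # similarity of the first paper to every paper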

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 209 acl-2013-Joint Modeling of News Reader's and Comment Writer's Emotions

Author: Huanhuan Liu ; Shoushan Li ; Guodong Zhou ; Chu-ren Huang ; Peifeng Li

Abstract: Emotion classification can be generally done from both the writer's and reader's perspectives. In this study, we find that two foundational tasks in emotion classification, i.e., reader's emotion classification on the news and writer's emotion classification on the comments, are strongly related to each other in terms of coarse-grained emotion categories, i.e., negative and positive. On this basis, we propose an approach to jointly model these two tasks. In particular, a co-training algorithm is proposed to improve semi-supervised learning of the two tasks. Experimental evaluation shows the effectiveness of our joint modeling approach.

2 0.47069609 282 acl-2013-Predicting and Eliciting Addressee's Emotion in Online Dialogue

Author: Takayuki Hasegawa ; Nobuhiro Kaji ; Naoki Yoshinaga ; Masashi Toyoda

Abstract: While there have been many attempts to estimate the emotion of an addresser from her/his utterance, few studies have explored how her/his utterance affects the emotion of the addressee. This has motivated us to investigate two novel tasks: predicting the emotion of the addressee and generating a response that elicits a specific emotion in the addressee’s mind. We target Japanese Twitter posts as a source of dialogue data and automatically build training data for learning the predictors and generators. The feasibility of our approaches is assessed by using 1099 utterance-response pairs that are built by five human workers.

3 0.22852023 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis

Author: Veronica Perez-Rosas ; Rada Mihalcea ; Louis-Philippe Morency

Abstract: During real-life interactions, people are naturally gesturing and modulating their voice to emphasize specific points or to express their emotions. With the recent growth of social websites such as YouTube, Facebook, and Amazon, video reviews are emerging as a new source of multimodal and natural opinions that has been left almost untapped by automatic opinion analysis techniques. This paper presents a method for multimodal sentiment classification, which can identify the sentiment expressed in utterance-level visual datastreams. Using a new multimodal dataset consisting of sentiment annotated utterances extracted from video reviews, we show that multimodal sentiment analysis can be effectively performed, and that the joint use of visual, acoustic, and linguistic modalities can lead to error rate reductions of up to 10.5% as compared to the best performing individual modality.

4 0.22830421 284 acl-2013-Probabilistic Sense Sentiment Similarity through Hidden Emotions

Author: Mitra Mohtarami ; Man Lan ; Chew Lim Tan

Abstract: Sentiment Similarity of word pairs reflects the distance between the words regarding their underlying sentiments. This paper aims to infer the sentiment similarity between word pairs with respect to their senses. To achieve this aim, we propose a probabilistic emotionbased approach that is built on a hidden emotional model. The model aims to predict a vector of basic human emotions for each sense of the words. The resultant emotional vectors are then employed to infer the sentiment similarity of word pairs. We apply the proposed approach to address two main NLP tasks, namely, Indirect yes/no Question Answer Pairs inference and Sentiment Orientation prediction. Extensive experiments demonstrate the effectiveness of the proposed approach.

5 0.18599534 257 acl-2013-Natural Language Models for Predicting Programming Comments

Author: Dana Movshovitz-Attias ; William W. Cohen

Abstract: Statistical language models have successfully been used to describe and analyze natural language documents. Recent work applying language models to programming languages is focused on the task of predicting code, while mainly ignoring the prediction of programmer comments. In this work, we predict comments from JAVA source files of open source projects, using topic models and n-grams, and we analyze the performance of the models given varying amounts of background data on the project being predicted. We evaluate models on their comment-completion capability in a setting similar to code-completion tools built into standard code editors, and show that using a comment completion tool can save up to 47% of the comment typing.

6 0.13514873 79 acl-2013-Character-to-Character Sentiment Analysis in Shakespeare's Plays

7 0.069016278 115 acl-2013-Detecting Event-Related Links and Sentiments from Social Media Texts

8 0.066459849 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation

9 0.065702006 333 acl-2013-Summarization Through Submodularity and Dispersion

10 0.059397869 233 acl-2013-Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media

11 0.05278755 49 acl-2013-An annotated corpus of quoted opinions in news articles

12 0.052101016 278 acl-2013-Patient Experience in Online Support Forums: Modeling Interpersonal Interactions and Medication Use

13 0.050157771 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words

14 0.048195846 318 acl-2013-Sentiment Relevance

15 0.045832362 310 acl-2013-Semantic Frames to Predict Stock Price Movement

16 0.045679219 148 acl-2013-Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams

17 0.045301843 63 acl-2013-Automatic detection of deception in child-produced speech using syntactic complexity features

18 0.043335348 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

19 0.043316908 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification

20 0.042412337 178 acl-2013-HEADY: News headline abstraction through event pattern clustering


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.088), (1, 0.133), (2, -0.02), (3, 0.094), (4, -0.002), (5, -0.07), (6, 0.065), (7, 0.004), (8, 0.03), (9, 0.116), (10, -0.007), (11, -0.016), (12, -0.067), (13, 0.038), (14, -0.024), (15, -0.005), (16, -0.017), (17, 0.07), (18, 0.105), (19, 0.091), (20, -0.109), (21, -0.308), (22, 0.067), (23, -0.006), (24, -0.188), (25, 0.324), (26, 0.225), (27, -0.056), (28, 0.086), (29, 0.133), (30, 0.07), (31, 0.069), (32, -0.105), (33, 0.067), (34, -0.055), (35, -0.144), (36, 0.01), (37, -0.004), (38, 0.093), (39, 0.04), (40, -0.022), (41, -0.134), (42, 0.025), (43, 0.011), (44, -0.029), (45, -0.103), (46, -0.05), (47, 0.06), (48, 0.098), (49, 0.109)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98087436 209 acl-2013-Joint Modeling of News Reader's and Comment Writer's Emotions

Author: Huanhuan Liu ; Shoushan Li ; Guodong Zhou ; Chu-ren Huang ; Peifeng Li

Abstract: Emotion classification can be generally done from both the writer's and reader's perspectives. In this study, we find that two foundational tasks in emotion classification, i.e., reader's emotion classification on the news and writer's emotion classification on the comments, are strongly related to each other in terms of coarse-grained emotion categories, i.e., negative and positive. On this basis, we propose an approach to jointly model these two tasks. In particular, a co-training algorithm is proposed to improve semi-supervised learning of the two tasks. Experimental evaluation shows the effectiveness of our joint modeling approach.

2 0.8867563 282 acl-2013-Predicting and Eliciting Addressee's Emotion in Online Dialogue

Author: Takayuki Hasegawa ; Nobuhiro Kaji ; Naoki Yoshinaga ; Masashi Toyoda

Abstract: While there have been many attempts to estimate the emotion of an addresser from her/his utterance, few studies have explored how her/his utterance affects the emotion of the addressee. This has motivated us to investigate two novel tasks: predicting the emotion of the addressee and generating a response that elicits a specific emotion in the addressee’s mind. We target Japanese Twitter posts as a source of dialogue data and automatically build training data for learning the predictors and generators. The feasibility of our approaches is assessed by using 1099 utterance-response pairs that are built by five human workers.

3 0.581173 284 acl-2013-Probabilistic Sense Sentiment Similarity through Hidden Emotions

Author: Mitra Mohtarami ; Man Lan ; Chew Lim Tan

Abstract: Sentiment Similarity of word pairs reflects the distance between the words regarding their underlying sentiments. This paper aims to infer the sentiment similarity between word pairs with respect to their senses. To achieve this aim, we propose a probabilistic emotionbased approach that is built on a hidden emotional model. The model aims to predict a vector of basic human emotions for each sense of the words. The resultant emotional vectors are then employed to infer the sentiment similarity of word pairs. We apply the proposed approach to address two main NLP tasks, namely, Indirect yes/no Question Answer Pairs inference and Sentiment Orientation prediction. Extensive experiments demonstrate the effectiveness of the proposed approach.

4 0.53801376 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis

Author: Veronica Perez-Rosas ; Rada Mihalcea ; Louis-Philippe Morency

Abstract: During real-life interactions, people are naturally gesturing and modulating their voice to emphasize specific points or to express their emotions. With the recent growth of social websites such as YouTube, Facebook, and Amazon, video reviews are emerging as a new source of multimodal and natural opinions that has been left almost untapped by automatic opinion analysis techniques. This paper presents a method for multimodal sentiment classification, which can identify the sentiment expressed in utterance-level visual datastreams. Using a new multimodal dataset consisting of sentiment annotated utterances extracted from video reviews, we show that multimodal sentiment analysis can be effectively performed, and that the joint use of visual, acoustic, and linguistic modalities can lead to error rate reductions of up to 10.5% as compared to the best performing individual modality.

5 0.4520936 79 acl-2013-Character-to-Character Sentiment Analysis in Shakespeare's Plays

Author: Eric T. Nalisnick ; Henry S. Baird

Abstract: We present an automatic method for analyzing sentiment dynamics between characters in plays. This literary format’s structured dialogue allows us to make assumptions about who is participating in a conversation. Once we have an idea of who a character is speaking to, the sentiment in his or her speech can be attributed accordingly, allowing us to generate lists of a character’s enemies and allies as well as pinpoint scenes critical to a character’s emotional development. Results of experiments on Shakespeare’s plays are presented along with discussion of how this work can be extended to unstructured texts (i.e. novels).

6 0.42089057 278 acl-2013-Patient Experience in Online Support Forums: Modeling Interpersonal Interactions and Medication Use

7 0.38200074 184 acl-2013-Identification of Speakers in Novels

8 0.37215745 257 acl-2013-Natural Language Models for Predicting Programming Comments

9 0.29238796 239 acl-2013-Meet EDGAR, a tutoring agent at MONSERRATE

10 0.25609276 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation

11 0.24377058 171 acl-2013-Grammatical Error Correction Using Integer Linear Programming

12 0.2387968 178 acl-2013-HEADY: News headline abstraction through event pattern clustering

13 0.21182232 63 acl-2013-Automatic detection of deception in child-produced speech using syntactic complexity features

14 0.19447225 327 acl-2013-Sorani Kurdish versus Kurmanji Kurdish: An Empirical Comparison

15 0.17845146 115 acl-2013-Detecting Event-Related Links and Sentiments from Social Media Texts

16 0.17839111 311 acl-2013-Semantic Neighborhoods as Hypergraphs

17 0.17583494 310 acl-2013-Semantic Frames to Predict Stock Price Movement

18 0.1720534 203 acl-2013-Is word-to-phone mapping better than phone-phone mapping for handling English words?

19 0.16559529 49 acl-2013-An annotated corpus of quoted opinions in news articles

20 0.1633019 322 acl-2013-Simple, readable sub-sentences


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.091), (6, 0.016), (11, 0.045), (23, 0.322), (24, 0.042), (26, 0.123), (28, 0.015), (35, 0.03), (42, 0.026), (48, 0.021), (70, 0.035), (88, 0.034), (90, 0.016), (95, 0.037)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.78584689 209 acl-2013-Joint Modeling of News Reader's and Comment Writer's Emotions

Author: Huanhuan Liu ; Shoushan Li ; Guodong Zhou ; Chu-ren Huang ; Peifeng Li

Abstract: Emotion classification can be generally done from both the writer's and reader's perspectives. In this study, we find that two foundational tasks in emotion classification, i.e., reader's emotion classification on the news and writer's emotion classification on the comments, are strongly related to each other in terms of coarse-grained emotion categories, i.e., negative and positive. On this basis, we propose an approach to jointly model these two tasks. In particular, a co-training algorithm is proposed to improve semi-supervised learning of the two tasks. Experimental evaluation shows the effectiveness of our joint modeling approach.

2 0.60445005 365 acl-2013-Understanding Tables in Context Using Standard NLP Toolkits

Author: Vidhya Govindaraju ; Ce Zhang ; Christopher Re

Abstract: Tabular information in text documents contains a wealth of information, and so tables are a natural candidate for information extraction. There are many cues buried in both a table and its surrounding text that allow us to understand the meaning of the data in a table. We study how natural-language tools, such as part-of-speech tagging, dependency paths, and named-entity recognition, can be used to improve the quality of relation extraction from tables. In three domains we show that (1) a model that performs joint probabilistic inference across tabular and natural language features achieves an F1 score that is twice as high as either a puretable or pure-text system, and (2) using only shallower features or non-joint inference results in lower quality.

3 0.57460266 328 acl-2013-Stacking for Statistical Machine Translation

Author: Majid Razmara ; Anoop Sarkar

Abstract: We propose the use of stacking, an ensemble learning technique, to the statistical machine translation (SMT) models. A diverse ensemble of weak learners is created using the same SMT engine (a hierarchical phrase-based system) by manipulating the training data and a strong model is created by combining the weak models on-the-fly. Experimental results on two language pairs and three different sizes of training data show significant improvements of up to 4 BLEU points over a conventionally trained SMT model.

4 0.56598854 333 acl-2013-Summarization Through Submodularity and Dispersion

Author: Anirban Dasgupta ; Ravi Kumar ; Sujith Ravi

Abstract: We propose a new optimization framework for summarization by generalizing the submodular framework of (Lin and Bilmes, 2011). In our framework the summarization desideratum is expressed as a sum of a submodular function and a nonsubmodular function, which we call dispersion; the latter uses inter-sentence dissimilarities in different ways in order to ensure non-redundancy of the summary. We consider three natural dispersion functions and show that a greedy algorithm can obtain an approximately optimal summary in all three cases. We conduct experiments on two corpora—DUC 2004 and user comments on news articles—and show that the performance of our algorithm outperforms those that rely only on submodularity.

5 0.44941422 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages

Author: Young-Bum Kim ; Benjamin Snyder

Abstract: In this paper, we present a solution to one aspect of the decipherment task: the prediction of consonants and vowels for an unknown language and alphabet. Adopting a classical Bayesian perspective, we performs posterior inference over hundreds of languages, leveraging knowledge of known languages and alphabets to uncover general linguistic patterns of typologically coherent language clusters. We achieve average accuracy in the unsupervised consonant/vowel prediction task of 99% across 503 languages. We further show that our methodology can be used to predict more fine-grained phonetic distinctions. On a three-way classification task between vowels, nasals, and nonnasal consonants, our model yields unsu- pervised accuracy of 89% across the same set of languages.

6 0.44922358 257 acl-2013-Natural Language Models for Predicting Programming Comments

7 0.44516319 28 acl-2013-A Unified Morpho-Syntactic Scheme of Stanford Dependencies

8 0.43936348 305 acl-2013-SORT: An Interactive Source-Rewriting Tool for Improved Translation

9 0.43453732 221 acl-2013-Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines

10 0.43310204 115 acl-2013-Detecting Event-Related Links and Sentiments from Social Media Texts

11 0.43223912 323 acl-2013-Simpler unsupervised POS tagging with bilingual projections

12 0.42497283 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification

13 0.42324758 236 acl-2013-Mapping Source to Target Strings without Alignment by Analogical Learning: A Case Study with Transliteration

14 0.42307812 163 acl-2013-From Natural Language Specifications to Program Input Parsers

15 0.42125323 216 acl-2013-Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language

16 0.41889688 20 acl-2013-A Stacking-based Approach to Twitter User Geolocation Prediction

17 0.41825542 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing

18 0.41642013 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification

19 0.41612256 95 acl-2013-Crawling microblogging services to gather language-classified URLs. Workflow and case study

20 0.4142822 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing