acl acl2012 acl2012-98 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Qiming Diao ; Jing Jiang ; Feida Zhu ; Ee-Peng Lim
Abstract: Microblogs such as Twitter reflect the general public’s reactions to major events. Bursty topics from microblogs reveal what events have attracted the most online attention. Although bursty event detection from text streams has been studied before, previous work may not be suitable for microblogs because compared with other text streams such as news articles and scientific publications, microblog posts are particularly diverse and noisy. To find topics that have bursty patterns on microblogs, we propose a topic model that simultaneously captures two observations: (1) posts published around the same time are more likely to have the same topic, and (2) posts published by the same user are more likely to have the same topic. The former helps find eventdriven posts while the latter helps identify and filter out “personal” posts. Our experiments on a large Twitter dataset show that there are more meaningful and unique bursty topics in the top-ranked results returned by our model than an LDA baseline and two degenerate variations of our model. We also show some case studies that demonstrate the importance of considering both the temporal information and users’ personal interests for bursty topic detection from microblogs.
Reference: text
sentIndex sentText sentNum sentScore
1 Although bursty event detection from text streams has been studied before, previous work may not be suitable for microblogs because compared with other text streams such as news articles and scientific publications, microblog posts are particularly diverse and noisy. [sent-7, score-1.343]
2 To find topics that have bursty patterns on microblogs, we propose a topic model that simultaneously captures two observations: (1) posts published around the same time are more likely to have the same topic, and (2) posts published by the same user are more likely to have the same topic. [sent-8, score-1.831]
3 Our experiments on a large Twitter dataset show that there are more meaningful and unique bursty topics in the top-ranked results returned by our model than an LDA baseline and two degenerate variations of our model. [sent-10, score-0.982]
4 We also show some case studies that demonstrate the importance of considering both the temporal information and users’ personal interests for bursty topic detection from microblogs. [sent-11, score-1.295]
5 In particular, microblogging sites such as Twitter allow users to easily publish short instant posts about any topic to be shared with the general public. [sent-14, score-0.592]
6 A sudden increase of topically similar posts usually indicates a burst of interest in some event that has happened offline (such as a product launch or a natural disaster) or online (such as the spread of a viral video). [sent-16, score-0.441]
7 Finding bursty topics from microblogs therefore can help us identify the most popular events that have drawn the public’s attention. [sent-17, score-0.947]
8 In this paper, we study the problem of finding bursty topics from a stream of microblog posts generated by different users. [sent-18, score-1.287]
9 Retrospective bursty event detection from text streams is not new (Kleinberg, 2002; Fung et al. [sent-20, score-0.837]
10 , 2007), but finding bursty topics from microblog streams has not been well studied. [sent-22, score-0.995]
11 In contrast, discovering interesting topics that have drawn bursts of interest from a stream of topically diverse microblog posts is itself a challenge. [sent-26, score-0.667]
12 For microblogs, where posts are short and often event-driven, temporal information can sometimes be critical in determining the topic of a post. [sent-29, score-0.595]
13 To capture this intuition, one solution is to assume that posts published within the same short time window follow the same topic distribution. [sent-34, score-0.612]
14 Wang et al. (2007) proposed a PLSA-based topic model that exploits this idea to find correlated bursty patterns across multiple text streams. [sent-36, score-1.035]
15 In order to detect global bursty events from microblog posts, it is important to filter out these “personal” posts. [sent-41, score-0.934]
16 In this paper, we propose a topic model designed for finding bursty topics from microblogs. [sent-42, score-1.216]
17 Our model is based on the following two assumptions: (1) If a post is about a global event, it is likely to follow a global topic distribution that is time-dependent. [sent-43, score-0.558]
18 (2) If a post is about a personal topic, it is likely to follow a personal topic distribution that is more or less stable over time. [sent-44, score-0.643]
19 We find that compared with bursty topics discovered by standard LDA and by two degenerate variations of our model, bursty topics discovered by our model are more accurate and less redundant within the top-ranked results. [sent-48, score-2.019]
20 We also use some example bursty topics to explain the advantages of our model. [sent-49, score-0.898]
21 2 Related Work To find bursty patterns from data streams, Kleinberg (2002) proposed a state machine to model the arrival times of documents in a stream. [sent-50, score-0.825]
22 Different states generate time gaps according to exponential density functions with different expected values, and bursty intervals can be discovered from the underlying state sequence. [sent-51, score-0.806]
23 To apply these methods to find bursty topics, the data stream used must represent a single topic. [sent-54, score-0.748]
24 The method first finds individual words that have bursty patterns. [sent-57, score-0.695]
25 It then finds groups of words that tend to share bursty periods and co-occur in the same documents to form topics. [sent-58, score-0.695]
26 A major problem with these methods is that the word clustering step can be expensive when the number of bursty words is large. [sent-60, score-0.695]
27 Weng and Lee (2011) applied word clustering to only the top bursty words within a single day, and consequently their topics mostly consist of two or three words. [sent-63, score-0.898]
28 In contrast, our method is scalable and each detected bursty topic is directly associated with a word distribution and a set of tweets (see Table 3), which makes it easier to interpret the topic. [sent-64, score-1.082]
29 A number of temporal topic models have been proposed to consider topic changes over time. [sent-67, score-0.653]
30 Much of this work models the evolution of topics’ word distributions, which is not relevant to bursty topic detection (Blei and Lafferty, 2006; Nallapati et al. [sent-70, score-1.034]
31 Some other work looks at the temporal evolution of topics, but the focus is not on bursty patterns (Wang and McCallum, 2006; Ahmed and Xing, 2008; Masada et al. [sent-73, score-0.776]
32 As we will show later in our experiments, for microblogs it is critical to model users’ personal interests in addition to global topical trends. [sent-79, score-0.47]
33 Zhao et al. (2011) further assume that each post is assigned a single topic and some words can be background words. [sent-83, score-0.432]
34 However, these studies do not aim to detect bursty patterns. [sent-84, score-0.726]
35 Our work is novel in that it combines users’ interests and temporal information to detect bursty topics. [sent-85, score-0.872]
36 We define a bursty topic b as a word distribution coupled with a bursty interval, denoted as (ϕb, ts_b, te_b), where ϕb is a multinomial distribution over the vocabulary, and ts_b and te_b (1 ≤ ts_b ≤ te_b ≤ T) are the start and the end timestamps of the bursty interval, respectively. [sent-99, score-2.510]
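To make the definition concrete, the following minimal Python sketch (ours, not the authors' code) shows one way to represent the input posts and an output bursty topic; all names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Post:
    user: int          # index of the publishing user u_i
    time: int          # discretized timestamp t_i, with 1 <= t_i <= T
    words: List[int]   # word indices into a vocabulary of size V

@dataclass
class BurstyTopic:
    phi: List[float]           # multinomial word distribution over the vocabulary
    interval: Tuple[int, int]  # (ts_b, te_b), with 1 <= ts_b <= te_b <= T
```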
37 Our task is to find meaningful bursty topics from the input text stream. [sent-100, score-0.922]
38 Our method consists of a topic discovery step and a burst detection step. [sent-101, score-0.5]
39 At the topic discovery step, we propose a topic model that considers both users’ topical interests and the global topic trends. [sent-102, score-1.158]
40 2 Our Topic Model We assume that there are C (latent) topics in the text stream, where each topic c has a word distribution ϕc. [sent-105, score-0.549]
41 On the other hand, a topic may have multiple bursty intervals and hence lead to multiple bursty topics. [sent-107, score-1.687]
42 In standard LDA, a document contains a mixture of topics, represented by a topic distribution, and each word has a hidden topic label. [sent-110, score-0.636]
43 As we have discussed in Section 1, an important observation we have is that when everything else is equal, a pair of posts published around the same time is more likely to be about the same topic than a random pair of posts. [sent-117, score-0.590]
44 To model this observation, we assume that there is a global topic distribution θt for each time point t. [sent-118, score-0.453]
45 Unlike news articles from traditional media, which are mostly about current affairs, many microblog posts are about users’ personal encounters and interests rather than global events. [sent-120, score-0.839]
46 When user u publishes a post at time point t, she first decides whether to write about a global trendy topic or a personal topic. [sent-126, score-0.618]
47 If she chooses the former, she then selects a topic according to θt. Otherwise, she selects a topic according to her own topic distribution ηu. [sent-127, score-0.918]
48 With the chosen topic, words in the post are generated from the word distribution for that topic or from the background word distribution that captures white noise. [sent-128, score-0.44]
49 We use π to denote the probability of choosing to talk about a global topic rather than a personal topic. [sent-153, score-0.474]
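Putting the pieces of this generative story together, the following Python sketch shows how one post would be generated under the full model. The parameter names follow the paper's notation where possible; rho, the probability that a token is a topic word rather than background noise, is our own naming assumption.

```python
import numpy as np

def generate_post(u, t, pi, theta, eta, phi, phi_bg, rho, n_words, rng):
    """Sketch of the generative story for one post by user u at time t."""
    y = rng.random() < pi                      # 1: global trend, 0: personal
    dist = theta[t] if y else eta[u]           # theta_t or eta_u
    c = int(rng.choice(len(phi), p=dist))      # pick a topic
    words = []
    for _ in range(n_words):
        if rng.random() < rho:                 # topic word from phi_c
            words.append(int(rng.choice(len(phi[c]), p=phi[c])))
        else:                                  # white-noise background word
            words.append(int(rng.choice(len(phi_bg), p=phi_bg)))
    return int(y), c, words

# Example call (theta, eta, phi, phi_bg are probability arrays):
# generate_post(u=3, t=7, pi=0.4, theta=theta, eta=eta, phi=phi,
#               phi_bg=phi_bg, rho=0.8, n_words=12,
#               rng=np.random.default_rng(0))
```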
50 In this model, we only consider the time-dependent topic distributions that capture the global topical trends. [sent-158, score-0.451]
51 M(π=0) is the number of posts generated by personal interests, while M(π=1) is the number of posts coming from global topical trends. [sent-173, score-0.726]
52 M_ui(c) is the number of posts by user ui that are assigned to topic c, and M_ui(·) is the total number of posts by ui. [sent-175, score-0.888]
53 M_ti(c) is the number of posts assigned to topic c at time point ti, and M_ti(·) is the total number of posts at ti. [sent-176, score-0.823]
54 E_i(v) is the number of times word v occurs in the i-th post and is labeled as a topic word, while E_i(·) is the total number of topic words in the i-th post. [sent-177, score-0.683]
55 M_c(v) is the number of times word v is assigned to topic c, and M_c(·) is the total number of words assigned to topic c. [sent-179, score-0.642]
56 M_zi(wi,j) counts the number of times word wi,j is assigned to topic zi, and M_zi(·) is the total number of words assigned to topic zi. [sent-184, score-0.642]
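These counts are the bookkeeping a collapsed Gibbs sampler needs. The sketch below is our reading of the sampler, not the authors' code: `counts` is an assumed bookkeeping object holding the quantities defined above (with post i's own contributions already decremented), and beta is the standard symmetric Dirichlet prior on word distributions.

```python
import numpy as np

def post_topic_scores(i, counts, alpha, beta, gamma, C, V):
    """Unnormalized scores for jointly resampling the global/personal
    switch y_i and the topic z_i of post i."""
    u, t = counts.user[i], counts.time[i]
    scores = np.zeros((2, C))
    for c in range(C):
        # Likelihood of post i's topic words under topic c (Polya urn):
        # one factor per occurrence of each topic word v.
        like, n = 1.0, 0
        for v, e_v in counts.topic_words[i].items():   # E_i(v) per word v
            for k in range(e_v):
                like *= (counts.M_cv[c, v] + k + beta) / \
                        (counts.M_c[c] + n + V * beta)
                n += 1
        # y_i = 0: personal post, topic drawn from the user's eta_u
        scores[0, c] = (counts.M_pi[0] + gamma) * \
                       (counts.M_uc[u, c] + alpha) / (counts.M_u[u] + C * alpha) * like
        # y_i = 1: global post, topic drawn from the time-dependent theta_t
        scores[1, c] = (counts.M_pi[1] + gamma) * \
                       (counts.M_tc[t, c] + alpha) / (counts.M_t[t] + C * alpha) * like
    return scores  # normalize, then sample (y_i, z_i) from the 2 x C table
```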
57 Figure 1: (a) Our topic model for burst detection. [sent-186, score-0.453]
58 3 Burst Detection Just like standard LDA, our topic model itself finds a set of topics represented by ϕc but does not directly generate bursty topics. [sent-190, score-1.216]
59 To identify bursty topics, we use the following mechanism, which is based on the idea by Kleinberg (2002) and Ihler et al. [sent-191, score-0.695]
60 We assume that after topic modeling, for each discovered topic c, we can obtain a series of counts (m_c^1, m_c^2, . . . , m_c^T). [sent-194, score-0.665]
61 For TimeUserLDA, these are the numbers of posts which are in topic c and generated by the global topic distribution θti, i.e., posts attributed to the global trend rather than to personal interests. [sent-199, score-0.922]
62 For other models, these are the numbers of posts in topic c. [sent-201, score-0.536]
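As a small illustrative sketch (ours), the per-topic count series could be assembled from the sampled assignments as follows, dropping posts attributed to personal interests in the TimeUserLDA case:

```python
def topic_count_series(posts, y, z, C, T):
    """Assemble the count series (m_c^1, ..., m_c^T) fed to the burst
    detector (a sketch; posts carry a 1-based .time field, and y/z are
    the sampled switch/topic assignments). For TimeUserLDA, only posts
    with y = 1 are counted, which filters out "personal" posts."""
    m = [[0] * (T + 1) for _ in range(C)]   # m[c][t], t = 1..T
    for i, post in enumerate(posts):
        if y[i] == 1:
            m[z[i]][post.time] += 1
    return m
```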
63 We assume that these counts are generated by two Poisson distributions corresponding to a bursty state and a normal state, respectively. [sent-202, score-0.776]
64 Let µ0 denote the expected count for the normal state and µ1 for the bursty state. [sent-203, score-0.733]
65 Let vt denote the state for time point t, where vt = 0 indicates the normal state and vt = 1 indicates the bursty state. [sent-204, score-0.957]
66 Table 2: Precision at K for the various models after we remove redundant bursty topics. [sent-219, score-0.760]
67 Finally, a burst is marked by a consecutive subsequence of bursty states. [sent-228, score-0.830]
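The two-state machine can be decoded with a simple Viterbi pass. The sketch below is our interpretation of this mechanism, in the spirit of Kleinberg (2002): trans_cost, a penalty for switching states, is an assumed stand-in for whatever transition cost is used, and bursty intervals are read off as maximal runs of the bursty state.

```python
import math

def detect_bursts(m, mu0, mu1, trans_cost=1.0):
    """Decode normal/bursty states for one topic's counts m[0..T-1]
    under two Poisson rates (mu0 normal, mu1 bursty), then return
    bursty intervals as (t_start, t_end) index pairs."""
    def logpois(k, mu):
        return k * math.log(mu) - mu - math.lgamma(k + 1)

    T = len(m)
    # Viterbi over states {0: normal, 1: bursty}
    score = [[-math.inf] * 2 for _ in range(T)]
    back = [[0] * 2 for _ in range(T)]
    score[0][0] = logpois(m[0], mu0)
    score[0][1] = logpois(m[0], mu1)
    for t in range(1, T):
        for s, mu in ((0, mu0), (1, mu1)):
            for prev in (0, 1):
                cand = score[t - 1][prev] + logpois(m[t], mu) \
                       - (trans_cost if prev != s else 0.0)
                if cand > score[t][s]:
                    score[t][s], back[t][s] = cand, prev
    # Backtrace the best state sequence.
    states = [0] * T
    states[-1] = 0 if score[-1][0] >= score[-1][1] else 1
    for t in range(T - 1, 0, -1):
        states[t - 1] = back[t][states[t]]
    # A burst is a maximal run of consecutive bursty states.
    bursts, start = [], None
    for t, s in enumerate(states):
        if s == 1 and start is None:
            start = t
        if s == 0 and start is not None:
            bursts.append((start, t - 1))
            start = None
    if start is not None:
        bursts.append((start, T - 1))
    return bursts
```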
68 These Twitter users were obtained by starting from a set of seed Singapore users who are active online and tracing [. . . ]. Table 3: Top-5 bursty topics ranked by TimeUserLDA (columns: Bursty Period, Top Words, Example Tweets, Label). [sent-232, score-1.041]
69 The 3rd and the 4th bursty topics come from the same topic but have different bursty periods. [sent-234, score-1.89]
70 Table 4: Top-5 bursty topics ranked by other models (columns: Rank, LDA, UserLDA, TimeLDA; cell contents lost in extraction). [sent-235, score-0.929]
71 As we have explained in Section 3, each model gives us time series data for a number of topics, and by applying a Poisson-based state machine, we can obtain a set of bursty topics. [sent-246, score-0.778]
72 For each method, we rank the obtained bursty topics by the number of tweets (or words in the case of the LDA model) assigned to the topics and take the top-30 bursty topics from each model. [sent-247, score-2.086]
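A minimal sketch of this ranking step (ours; the input format is an assumption):

```python
def top_bursty_topics(bursts, n_assigned, k=30):
    """Rank detected bursty topics by the number of tweets assigned to
    the underlying topic and keep the top k (each burst is an assumed
    (topic_id, t_start, t_end) triple)."""
    ranked = sorted(bursts, key=lambda b: n_assigned[b[0]], reverse=True)
    return ranked[:k]
```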
73 In the case of the LDA model, only 23 bursty topics were detected. [sent-248, score-0.898]
74 The judges were given the bursty period and 100 randomly selected tweets for the given topic within that period for each bursty topic. [sent-251, score-1.828]
75 A bursty topic was scored 1 if the 100 tweets coherently describe a bursty event based on the human judge’s understanding. [sent-253, score-1.815]
76 For ground truth, we consider a bursty topic to be correct if both human judges have scored it 1. [sent-256, score-1.074]
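Precision at K under this two-judge ground-truth rule can be computed as in the following sketch (ours; inputs are assumed per-rank 0/1 judge scores):

```python
def precision_at_k(judge1, judge2, ks=(5, 10, 20, 30)):
    """A ranked bursty topic counts as correct only if BOTH judges
    scored it 1; returns precision at each cutoff K."""
    correct = [a == 1 and b == 1 for a, b in zip(judge1, judge2)]
    return {k: sum(correct[:k]) / k for k in ks if k <= len(correct)}
```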
77 Since some models gave redundant bursty topics, we also asked one of the judges to identify unique bursty topics from the ground truth bursty topics. [sent-257, score-2.435]
78 As we have pointed out, some of the bursty topics are redundant, i.e., they correspond to the same underlying event. [sent-267, score-0.898]
79 We will further discuss redundant bursty topics in the next section. [sent-273, score-0.963]
80 First, we show the top-5 bursty topics discovered by the TimeUserLDA model in Table 3. [sent-276, score-0.968]
81 As we can see, all these bursty topics are meaningful. [sent-277, score-0.898]
82 For comparison, we also show the top-5 bursty topics discovered by other models in Table 4. [sent-279, score-0.947]
83 We find that this can help separate bursty topics from general ones. [sent-283, score-0.898]
84 A close inspection tells us that the topic under UserLDA is actually related to the subway systems in Singapore in general, which include a few other subway lines, and the Circle Line topic is merged with this general topic. [sent-289, score-0.698]
85 On the other hand, TimeLDA and TimeUserLDA are both able to separate the Circle Line topic from the general subway topic because the Circle Line has several bursts. [sent-290, score-0.646]
86 We can see that TimeLDA and TimeUserLDA show clearer bursty patterns than UserLDA for this topic. [sent-292, score-0.717]
87 All our topic models give multiple topics related to Korean pop music, and many of them have a burst on November 29, 2011. [sent-299, score-0.681]
88 Under the TimeLDA and UserLDA models, this leads to several redundant bursty topics for the MAMA event ranked within the top-30. [sent-300, score-1.038]
89 We find that this is because with TimeUserLDA, we can remove tweets that are considered personal and therefore do not contribute to bursty topic ranking. [sent-302, score-1.17]
90 We show the topic intensity of a topic about a Korean pop singer in Figure 4. Figure 3: Topic intensity over time (t) for the topic on the Circle Line, with one panel each for UserLDA, TimeLDA and TimeUserLDA. [sent-303, score-1.143]
91 Figure 4: Topic intensity over time (t) for the topic about a Korean pop singer, with one panel each for UserLDA, TimeLDA and TimeUserLDA. [sent-304, score-0.458]
92 We can see that because this topic is related to Korean pop music, it has a burst on day 90 (November 29). [sent-308, score-0.521]
93 But if we consider the relative intensity of this burst compared with Steve Jobs’ death, under TimeLDA and UserLDA, this topic is still strong but under TimeUserLDA its intensity can almost be ignored. [sent-309, score-0.614]
94 This is why with TimeLDA and UserLDA this topic leads to a redundant burst within the top-30 results but with TimeUserLDA the burst is not ranked high. [sent-310, score-0.663]
95 5 Conclusions In this paper, we studied the problem of finding bursty topics from the text streams on microblogs. [sent-311, score-0.954]
96 Because existing work on burst detection from text streams may not be suitable for microblogs, we proposed a new topic model that considers both the temporal information of microblog posts and users’ personal interests. [sent-312, score-1.061]
97 We then applied a Poissonbased state machine to identify bursty periods from the topics discovered by our model. [sent-313, score-0.985]
98 Our quantitative evaluation showed that our model could more accurately detect unique bursty topics among the top-ranked results. [sent-315, score-0.981]
99 Our method currently can only detect bursty topics in a retrospective and offline manner. [sent-317, score-0.96]
100 Mining correlated bursty topic patterns from coordinated text streams. [sent-388, score-1.014]
wordName wordTfidf (topN-words)
[('bursty', 0.695), ('topic', 0.297), ('posts', 0.239), ('topics', 0.203), ('timeuserlda', 0.182), ('timelda', 0.156), ('userlda', 0.143), ('burst', 0.135), ('personal', 0.115), ('microblogs', 0.114), ('microblog', 0.097), ('intensity', 0.091), ('post', 0.089), ('interests', 0.087), ('circle', 0.077), ('topical', 0.071), ('jobs', 0.068), ('redundant', 0.065), ('twitter', 0.063), ('tweets', 0.063), ('lda', 0.062), ('global', 0.062), ('temporal', 0.059), ('ui', 0.058), ('users', 0.056), ('streams', 0.056), ('zi', 0.055), ('vt', 0.054), ('stream', 0.053), ('korean', 0.052), ('subway', 0.052), ('bursts', 0.052), ('singapore', 0.052), ('discovered', 0.049), ('events', 0.049), ('music', 0.048), ('pop', 0.046), ('november', 0.046), ('weng', 0.045), ('event', 0.044), ('day', 0.043), ('detection', 0.042), ('hidden', 0.042), ('ahmed', 0.04), ('ihler', 0.039), ('degenerate', 0.039), ('state', 0.038), ('draw', 0.037), ('judges', 0.036), ('fung', 0.035), ('kleinberg', 0.035), ('mm', 0.033), ('sigkdd', 0.033), ('user', 0.031), ('retrospective', 0.031), ('poisson', 0.031), ('detect', 0.031), ('steve', 0.031), ('ranked', 0.031), ('published', 0.03), ('ti', 0.027), ('blei', 0.027), ('yi', 0.027), ('distribution', 0.027), ('death', 0.026), ('aedr', 0.026), ('jianshu', 0.026), ('mama', 0.026), ('masada', 0.026), ('mct', 0.026), ('methodp', 0.026), ('mnet', 0.026), ('mtc', 0.026), ('ofdocuments', 0.026), ('teb', 0.026), ('timutiemsuledresl', 0.026), ('tsb', 0.026), ('discovery', 0.026), ('dirichlet', 0.025), ('ground', 0.025), ('wang', 0.025), ('time', 0.024), ('meaningful', 0.024), ('assigned', 0.024), ('topically', 0.023), ('padhraic', 0.023), ('arrival', 0.023), ('amr', 0.023), ('publisher', 0.023), ('assume', 0.022), ('xing', 0.022), ('multinomial', 0.022), ('patterns', 0.022), ('model', 0.021), ('period', 0.021), ('scored', 0.021), ('truth', 0.021), ('distributions', 0.021), ('timestamp', 0.021), ('michal', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 98 acl-2012-Finding Bursty Topics from Microblogs
Author: Qiming Diao ; Jing Jiang ; Feida Zhu ; Ee-Peng Lim
Abstract: Microblogs such as Twitter reflect the general public’s reactions to major events. Bursty topics from microblogs reveal what events have attracted the most online attention. Although bursty event detection from text streams has been studied before, previous work may not be suitable for microblogs because compared with other text streams such as news articles and scientific publications, microblog posts are particularly diverse and noisy. To find topics that have bursty patterns on microblogs, we propose a topic model that simultaneously captures two observations: (1) posts published around the same time are more likely to have the same topic, and (2) posts published by the same user are more likely to have the same topic. The former helps find eventdriven posts while the latter helps identify and filter out “personal” posts. Our experiments on a large Twitter dataset show that there are more meaningful and unique bursty topics in the top-ranked results returned by our model than an LDA baseline and two degenerate variations of our model. We also show some case studies that demonstrate the importance of considering both the temporal information and users’ personal interests for bursty topic detection from microblogs.
2 0.47413298 17 acl-2012-A Novel Burst-based Text Representation Model for Scalable Event Detection
Author: Xin Zhao ; Rishan Chen ; Kai Fan ; Hongfei Yan ; Xiaoming Li
Abstract: Mining retrospective events from text streams has been an important research topic. Classic text representation model (i.e., vector space model) cannot model temporal aspects of documents. To address it, we proposed a novel burst-based text representation model, denoted as BurstVSM. BurstVSM corresponds dimensions to bursty features instead of terms, which can capture semantic and temporal information. Meanwhile, it significantly reduces the number of non-zero entries in the representation. We test it via scalable event detection, and experiments in a 10-year news archive show that our methods are both effective and efficient.
3 0.25432366 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
Author: Xinyan Xiao ; Deyi Xiong ; Min Zhang ; Qun Liu ; Shouxun Lin
Abstract: Previous work using topic model for statistical machine translation (SMT) explore topic information at the word level. However, SMT has been advanced from word-based paradigm to phrase/rule-based paradigm. We therefore propose a topic similarity model to exploit topic information at the synchronous rule level for hierarchical phrase-based translation. We associate each synchronous rule with a topic distribution, and select desirable rules according to the similarity of their topic distributions with given documents. We show that our model significantly improves the translation performance over the baseline on NIST Chinese-to-English translation experiments. Our model also achieves a better performance and a faster speed than previous approaches that work at the word level.
Author: Viet-An Nguyen ; Jordan Boyd-Graber ; Philip Resnik
Abstract: One of the key tasks for analyzing conversational data is segmenting it into coherent topic segments. However, most models of topic segmentation ignore the social aspect of conversations, focusing only on the words used. We introduce a hierarchical Bayesian nonparametric model, Speaker Identity for Topic Segmentation (SITS), that discovers (1) the topics used in a conversation, (2) how these topics are shared across conversations, (3) when these topics shift, and (4) a person-specific tendency to introduce new topics. We evaluate against current unsupervised segmentation models to show that including person-specific information improves segmentation performance on meeting corpora and on political debates. Moreover, we provide evidence that SITS captures an individual’s tendency to introduce new topics in political contexts, via analysis of the 2008 US presidential debates and the television program Crossfire.
5 0.16774212 199 acl-2012-Topic Models for Dynamic Translation Model Adaptation
Author: Vladimir Eidelman ; Jordan Boyd-Graber ; Philip Resnik
Abstract: We propose an approach that biases machine translation systems toward relevant translations based on topic-specific contexts, where topics are induced in an unsupervised way using topic models; this can be thought of as inducing subcorpora for adaptation without any human annotation. We use these topic distributions to compute topic-dependent lexical weighting probabilities and directly incorporate them into our translation model as features. Conditioning lexical probabilities on the topic biases translations toward topic-relevant output, resulting in significant improvements of up to 1 BLEU and 3 TER on Chinese to English translation over a strong baseline.
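As a minimal sketch of the topic-dependent lexical weighting idea described above (an illustration under assumed structures, not the authors' implementation; `p_lex` and `doc_topics` are hypothetical names): per-topic lexical tables are mixed by the document's inferred topic distribution.

```python
# Sketch (assumed): topic-conditioned lexical weight
# p(e | f, doc) = sum_k p(k | doc) * p(e | f, k).

def topic_lexical_weight(e, f, doc_topics, p_lex):
    """doc_topics: list of p(k|doc); p_lex: per-topic dicts {(e, f): prob}."""
    return sum(p_k * p_lex[k].get((e, f), 0.0)
               for k, p_k in enumerate(doc_topics))

# Two hypothetical topics with different preferred translations of "banque":
p_lex = [{("bank", "banque"): 0.7}, {("shore", "banque"): 0.6}]
doc_topics = [0.9, 0.1]  # the input document skews toward topic 0
print(topic_lexical_weight("bank", "banque", doc_topics, p_lex))  # 0.63
```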
6 0.16410758 86 acl-2012-Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks
7 0.14259252 79 acl-2012-Efficient Tree-Based Topic Modeling
8 0.12376512 144 acl-2012-Modeling Review Comments
9 0.11277722 203 acl-2012-Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information
10 0.11175175 31 acl-2012-Authorship Attribution with Author-aware Topic Models
11 0.11120788 205 acl-2012-Tweet Recommendation with Graph Co-Ranking
13 0.098930687 88 acl-2012-Exploiting Social Information in Grounded Language Learning via Grammatical Reduction
14 0.098459966 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
15 0.09031333 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle
16 0.0849379 110 acl-2012-Historical Analysis of Legal Opinions with a Sparse Mixed-Effects Latent Variable Model
17 0.078815073 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System
19 0.068117276 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
20 0.067681953 99 acl-2012-Finding Salient Dates for Building Thematic Timelines
topicId topicWeight
[(0, -0.165), (1, 0.187), (2, 0.213), (3, 0.136), (4, -0.328), (5, -0.109), (6, 0.058), (7, -0.134), (8, 0.101), (9, -0.027), (10, -0.115), (11, 0.047), (12, 0.11), (13, 0.07), (14, 0.013), (15, 0.009), (16, 0.017), (17, 0.03), (18, -0.016), (19, 0.05), (20, -0.039), (21, 0.126), (22, 0.049), (23, 0.042), (24, -0.119), (25, -0.073), (26, -0.111), (27, 0.126), (28, -0.015), (29, 0.037), (30, 0.229), (31, -0.177), (32, -0.213), (33, -0.065), (34, -0.063), (35, -0.155), (36, 0.029), (37, -0.029), (38, -0.078), (39, 0.155), (40, 0.087), (41, 0.093), (42, 0.12), (43, 0.186), (44, -0.036), (45, -0.061), (46, -0.044), (47, -0.095), (48, -0.167), (49, 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.92299253 98 acl-2012-Finding Bursty Topics from Microblogs
Author: Qiming Diao ; Jing Jiang ; Feida Zhu ; Ee-Peng Lim
Abstract: Microblogs such as Twitter reflect the general public’s reactions to major events. Bursty topics from microblogs reveal what events have attracted the most online attention. Although bursty event detection from text streams has been studied before, previous work may not be suitable for microblogs because compared with other text streams such as news articles and scientific publications, microblog posts are particularly diverse and noisy. To find topics that have bursty patterns on microblogs, we propose a topic model that simultaneously captures two observations: (1) posts published around the same time are more likely to have the same topic, and (2) posts published by the same user are more likely to have the same topic. The former helps find eventdriven posts while the latter helps identify and filter out “personal” posts. Our experiments on a large Twitter dataset show that there are more meaningful and unique bursty topics in the top-ranked results returned by our model than an LDA baseline and two degenerate variations of our model. We also show some case studies that demonstrate the importance of considering both the temporal information and users’ personal interests for bursty topic detection from microblogs.
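As a rough illustration of the two observations in this abstract (a deliberately simplified sketch, not the paper's exact model; all sizes, priors, and names below are assumed), each post's topic can be drawn either from its time slice's topic distribution (event-driven) or from its author's distribution (personal), via a per-user Bernoulli switch:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5                                              # number of topics (illustrative)
theta_time = rng.dirichlet(np.ones(K), size=10)    # topic dist. per time slice
theta_user = rng.dirichlet(np.ones(K), size=100)   # topic dist. per user
pi_user = rng.beta(2, 2, size=100)                 # per-user "personal post" prob.

def generate_post_topic(user, t):
    """Observation (2): personal posts follow the author's distribution;
    observation (1): event-driven posts follow the time slice's distribution."""
    if rng.random() < pi_user[user]:
        return rng.choice(K, p=theta_user[user])   # personal post
    return rng.choice(K, p=theta_time[t])          # event-driven post

print(generate_post_topic(user=3, t=7))
```

Under such a switch, topics whose probability mass spikes in a few time slices are candidates for bursty topics, while the user route absorbs "personal" posts.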
2 0.78433478 17 acl-2012-A Novel Burst-based Text Representation Model for Scalable Event Detection
Author: Xin Zhao ; Rishan Chen ; Kai Fan ; Hongfei Yan ; Xiaoming Li
Abstract: Mining retrospective events from text streams has been an important research topic. The classic text representation model (i.e., the vector space model) cannot capture temporal aspects of documents. To address this, we propose a novel burst-based text representation model, denoted as BurstVSM. BurstVSM associates dimensions with bursty features instead of terms, which can capture semantic and temporal information. Meanwhile, it significantly reduces the number of non-zero entries in the representation. We test it via scalable event detection, and experiments in a 10-year news archive show that our methods are both effective and efficient.
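To make "dimensions correspond to bursty features" concrete, here is a toy sketch (an assumption for illustration; the abstract does not specify BurstVSM's burst detector, and this simple z-score rule is a stand-in) that extracts bursty intervals of a term from its daily counts; a document can then put weight on a (term, interval) feature only if it is published inside that interval:

```python
import numpy as np

def bursty_intervals(daily_counts, z=1.0):
    """Toy burst detector: flag days where a term's count exceeds
    mean + z * std, then collapse consecutive hot days into intervals."""
    c = np.asarray(daily_counts, dtype=float)
    hot = c > c.mean() + z * c.std()
    out, start = [], None
    for i, h in enumerate(hot):
        if h and start is None:
            start = i
        elif not h and start is not None:
            out.append((start, i - 1))
            start = None
    if start is not None:
        out.append((start, len(c) - 1))
    return out

print(bursty_intervals([1, 2, 1, 1, 9, 11, 2, 1]))  # [(4, 5)]
```

Because most terms burst in only a few intervals, such a representation has far fewer non-zero entries than a term-level vector space model.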
3 0.48878852 79 acl-2012-Efficient Tree-Based Topic Modeling
Author: Yuening Hu ; Jordan Boyd-Graber
Abstract: Topic modeling with a tree-based prior has been used for a variety of applications because it can encode correlations between words that traditional topic modeling cannot. However, its expressive power comes at the cost of more complicated inference. We extend the SPARSELDA (Yao et al., 2009) inference scheme for latent Dirichlet allocation (LDA) to tree-based topic models. This sampling scheme computes the exact conditional distribution for Gibbs sampling much more quickly than enumerating all possible latent variable assignments. We further improve performance by iteratively refining the sampling distribution only when needed. Experiments show that the proposed techniques dramatically improve the computation time.
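For reference, the collapsed-Gibbs conditional that SparseLDA-style samplers decompose is easy to state; the sketch below is the vanilla LDA form (not the paper's tree-based extension), with the bucket idea noted in a comment:

```python
import numpy as np

def lda_conditional(d, w, n_dk, n_kw, n_k, alpha, beta, V):
    """Collapsed Gibbs conditional for LDA:
    p(z = k) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
    SparseLDA splits this into smoothing, document, and topic buckets,
    so only the few non-zero-count terms are recomputed per sample."""
    p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
    return p / p.sum()

K, V = 4, 1000
n_dk, n_kw, n_k = np.zeros((2, K)), np.zeros((K, V)), np.zeros(K)
print(lda_conditional(0, 42, n_dk, n_kw, n_k, alpha=0.1, beta=0.01, V=V))
```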
Author: Viet-An Nguyen ; Jordan Boyd-Graber ; Philip Resnik
Abstract: One of the key tasks for analyzing conversational data is segmenting it into coherent topic segments. However, most models of topic segmentation ignore the social aspect of conversations, focusing only on the words used. We introduce a hierarchical Bayesian nonparametric model, Speaker Identity for Topic Segmentation (SITS), that discovers (1) the topics used in a conversation, (2) how these topics are shared across conversations, (3) when these topics shift, and (4) a person-specific tendency to introduce new topics. We evaluate against current unsupervised segmentation models to show that including person-specific information improves segmentation performance on meeting corpora and on political debates. Moreover, we provide evidence that SITS captures an individual's tendency to introduce new topics in political contexts, via analysis of the 2008 US presidential debates and the television program Crossfire. 1 Topic Segmentation as a Social Process Conversation, interactive discussion between two or more people, is one of the most essential and common forms of communication. Whether in an informal situation or in more formal settings such as a political debate or business meeting, a conversation is often not about just one thing: topics evolve and are replaced as the conversation unfolds. Discovering this hidden structure in conversations is a key problem for conversational assistants (Tur et al., 2010) and tools that summarize (Murray et al., 2005) and display (Ehlen et al., 2007) conversational data. Topic segmentation also can illuminate individuals' agendas (Boydstun et al., 2011), patterns of agreement and disagreement (Hawes et al., 2009; Abbott et al., 2011), and relationships among conversational participants (Ireland et al., 2011). One of the most natural ways to capture conversational structure is topic segmentation (Reynar, 1998; Purver, 2011). Topic segmentation approaches range from simple heuristic methods based on lexical similarity (Morris and Hirst, 1991; Hearst, 1997) to more intricate generative models and supervised methods (Georgescul et al., 2006; Purver et al., 2006; Gruber et al., 2007; Eisenstein and Barzilay, 2008), which have been shown to outperform the established heuristics. However, previous computational work on conversational structure, particularly in topic discovery and topic segmentation, focuses primarily on content, ignoring the speakers. We argue that, because conversation is a social process, we can understand conversational phenomena better by explicitly modeling behaviors of conversational participants. In Section 2, we incorporate participant identity in a new model we call Speaker Identity for Topic Segmentation (SITS), which discovers topical structure in conversation while jointly incorporating a participant-level social component. Specifically, we explicitly model an individual's tendency to introduce a topic. After outlining inference in Section 3 and introducing data in Section 4, we use SITS to improve state-of-the-art topic segmentation and topic identification models in Section 5. In addition, in Section 6, we also show that the per-speaker model is able to discover individuals who shape and influence the course of a conversation. Finally, we discuss related work and conclude the paper in Section 7.
2 Modeling Multiparty Discussions Data Properties We are interested in turn-taking, multiparty discussion. This is a broad category, including political debates, business meetings, and online chats. More formally, such datasets contain C conversations. A conversation c has Tc turns, each of which is a maximal uninterrupted utterance by one speaker.1 1 Note the distinction with phonetic utterances, which by definition are bounded by silence. In each turn t ∈ [1, Tc], a speaker ac,t utters N words {wc,t,n}. Each word is from a vocabulary of size V, and there are M distinct speakers. Modeling Approaches The key insight of topic segmentation is that segments evince lexical cohesion (Galley et al., 2003; Olney and Cai, 2005). Words within a segment will look more like their neighbors than other words. This insight has been used to tune supervised methods (Hsueh et al., 2006) and inspire unsupervised models of lexical cohesion using bags of words (Purver et al., 2006) and language models (Eisenstein and Barzilay, 2008). We too take the unsupervised statistical approach. It requires few resources and is applicable in many domains without extensive training. Like previous approaches, we consider each turn to be a bag of words generated from an admixture of topics. Topics—after the topic modeling literature (Blei and Lafferty, 2009)—are multinomial distributions over terms. These topics are part of a generative model posited to have produced a corpus. However, topic models alone cannot model the dynamics of a conversation. Topic models typically do not model the temporal dynamics of individual documents, and those that do (Wang et al., 2008; Gerrish and Blei, 2010) are designed for larger documents and are not applicable here because they assume that most topics appear in every time slice. Instead, we endow each turn with a binary latent variable lc,t, called the topic shift. This latent variable signifies whether the speaker changed the topic of the conversation. To capture the topic-controlling behavior of the speakers across different conversations, we further associate each speaker m with a latent topic shift tendency, πm. Informally, this variable is intended to capture the propensity of a speaker to effect a topic shift. Formally, it represents the probability that the speaker m will change the topic (distribution) of a conversation. We take a Bayesian nonparametric approach (Müller and Quintana, 2004). Unlike parametric models, which a priori fix the number of topics, nonparametric models use a flexible number of topics to better represent data. Nonparametric distributions such as the Dirichlet process (Ferguson, 1973) share statistical strength among conversations using a hierarchical model, such as the hierarchical Dirichlet process (HDP) (Teh et al., 2006).
Unlike the HDP, where every document (here, every turn) draws a new multinomial distribution from a Dirichlet process, the social and temporal dynamics of a conversation, as specified by the binary topic shift indicator lc,t, determine when new draws happen. The full generative process is as follows: 1. For speaker m ∈ [1, M], draw speaker shift probability πm ∼ Beta(γ) 2. Draw global probability measure G0 ∼ DP(α, H) 3. For each conversation c ∈ [1, C] (a) Draw conversation distribution Gc ∼ DP(α0, G0) (b) For each turn t ∈ [1, Tc] with speaker ac,t i. If t = 1, set the topic shift lc,t = 1. Otherwise, draw lc,t ∼ Bernoulli(πac,t). ii. If lc,t = 1, draw Gc,t ∼ DP(αc, Gc). Otherwise, set Gc,t ≡ Gc,t−1. iii. For each word index n ∈ [1, Nc,t] • Draw ψc,t,n ∼ Gc,t • Draw wc,t,n ∼ Multinomial(ψc,t,n) The hierarchy of Dirichlet processes allows statistical strength to be shared across contexts; within a conversation and across conversations. The per-speaker topic shift tendency πm allows speaker identity to influence the evolution of topics. To make notation concrete and aligned with the topic segmentation, we introduce notation for segments in a conversation. A segment s of conversation c is a sequence of turns [τ, τ′] such that lc,τ = lc,τ′+1 = 1 and lc,t = 0, ∀t ∈ (τ, τ′]. When lc,t = 0, Gc,t is the same as Gc,t−1, and all topics (i.e. multinomial distributions over words) {ψc,t,n} that generate words in turn t and the topics {ψc,t−1,n} that generate words in turn t − 1 come from the same distribution. Thus all topics used in a segment s are drawn from a single distribution, Gc,s: Gc,s | lc,1, lc,2, . . . , lc,Tc, αc, Gc ∼ DP(αc, Gc) (1) For notational convenience, Sc denotes the number of segments in conversation c, and st denotes the segment index of turn t. We emphasize that all segment-related notations are derived from the posterior over the topic shifts l and not part of the model itself. Figure 1: Graphical model representations of our proposed models: (a) the nonparametric version; (b) the parametric version. Nodes represent random variables (shaded ones are observed), lines are probabilistic dependencies. Plates represent repetition. The innermost plates are turns, grouped in conversations. Parametric Version SITS is a generalization of a parametric model (Figure 1b) where each turn has a multinomial distribution over K topics. In the parametric case, the number of topics K is fixed. Each topic, as before, is a multinomial distribution φ1 . . . φK. In the parametric case, each turn t in conversation c has an explicit multinomial distribution over K topics θc,t, identical for turns within a segment. A new topic distribution θ is drawn from a Dirichlet distribution parameterized by α when the topic shift indicator l is 1. The parametric version does not share strength within or across conversations, unlike SITS. When applied on a single conversation without speaker identity (all speakers are identical) it is equivalent to (Purver et al., 2006). In our experiments (Section 5), we compare against both. 3 Inference To find the latent variables that best explain observed data, we use Gibbs sampling, a widely used Markov chain Monte Carlo inference technique (Neal, 2000; Resnik and Hardisty, 2010). The state space is latent variables for topic indices assigned to all tokens z = {zc,t,n} and topic shifts assigned to turns l = {lc,t}. We marginalize over all other latent variables.
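Before the sampling equations, a concrete sketch of the parametric variant from Section 2.1 (a minimal illustration with assumed sizes and priors, not the authors' implementation): a new topic distribution is drawn only when a speaker's Bernoulli shift indicator fires; otherwise the previous turn's distribution is reused.

```python
import numpy as np

rng = np.random.default_rng(1)

def generate_conversation(speakers, K=10, V=500, alpha=0.5, lam=0.1, gamma=1.0):
    """Sketch of the parametric model (Figure 1b). Each turn reuses the
    previous theta unless its speaker's topic shift indicator fires."""
    phi = rng.dirichlet(np.full(V, lam), size=K)        # K topics over V words
    pi = {m: rng.beta(gamma, gamma) for m in set(speakers)}
    theta, turns = None, []
    for t, m in enumerate(speakers):
        shift = 1 if t == 0 else int(rng.random() < pi[m])
        if shift:
            theta = rng.dirichlet(np.full(K, alpha))    # start a new segment
        z = rng.choice(K, p=theta, size=20)             # one topic per token
        words = [rng.choice(V, p=phi[k]) for k in z]
        turns.append((m, shift, words))
    return turns

turns = generate_conversation(["A", "B", "A", "C"])
print([(m, s) for m, s, _ in turns])   # speakers with their shift indicators
```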
Here, we only present the conditional sampling equations; for more details, see our supplement.2 3.1 Sampling Topic Assignments To sample zc,t,n, the index of the shared topic assigned to token n of turn t in conversation c, we need to sample the path assigning each word token to a segment-specific topic, each segment-specific topic to a conversational topic and each conversational topic to a shared topic. For efficiency, we make use of the minimal path assumption (Wallach, 2008) to generate these assignments.3 Under the minimal path assumption, an observation is assumed to have been generated by using a new distribution if and only if there is no existing distribution with the same value. 2 http://www.cs.umd.edu/∼vietan/topicshift/appendix.pdf 3 We also investigated using the maximal assumption and fully sampling assignments. We found the minimal path assumption worked as well as explicitly sampling seating assignments and that the maximal path assumption worked less well. We use Nc,s,k to denote the number of tokens in segment s in conversation c assigned topic k; Nc,k denotes the total number of segment-specific topics in conversation c assigned topic k and Nk denotes the number of conversational topics assigned topic k. TWk,w denotes the number of times the shared topic k is assigned to word w in the vocabulary. Marginal counts are represented with · and ∗ represents all hyperparameters. The conditional distribution for zc,t,n is P(zc,t,n = k | wc,t,n = w, z^(−c,t,n), w^(−c,t,n), l, ∗) ∝ ( N^(−c,t,n)_{c,st,k} + αc ( N^(−c,t,n)_{c,k} + α0 ( N^(−c,t,n)_k + α ) / ( N^(−c,t,n)_· + αK ) ) ) × ( TW^(−c,t,n)_{k,w} + λ ) / ( TW^(−c,t,n)_{k,·} + Vλ ), (2) where the last factor reduces to 1/V for a newly sampled topic k. Here V is the size of the vocabulary, K is the current number of shared topics and the superscript −c,t,n denotes counts without considering wc,t,n. In Equation 2, the first factor is proportional to the probability of sampling a path according to the minimal path assumption; the second factor is proportional to the likelihood of observing w given the sampled topic. Since an uninformed prior is used, when a new topic is sampled, all tokens are equiprobable. 3.2 Sampling Topic Shifts Sampling the topic shift variable lc,t requires us to consider merging or splitting segments. We use kc,t to denote the shared topic indices of all tokens in turn t of conversation c; S_{ac,t,x} to denote the number of times speaker ac,t is assigned the topic shift with value x ∈ {0, 1}; J^x_{c,s} to denote the number of topics in segment s of conversation c if lc,t = x; and N^x_{c,s,j} to denote the number of tokens assigned to the segment-specific topic j when lc,t = x.4 Again, the superscript −c,t is used to denote exclusion of turn t of conversation c in the corresponding counts. Recall that the topic shift is a binary variable. We use 0 to represent the case that the topic distribution is identical to the previous turn. We sample this assignment as P(lc,t = 0 | l^(−c,t), w, k, a, ∗) ∝ ( S^(−c,t)_{ac,t,0} + γ ) / ( S^(−c,t)_{ac,t,·} + 2γ ) × αc^(J^0_{c,st}) ∏_{j=1}^{J^0_{c,st}} ( N^0_{c,st,j} − 1 )! / ∏_{x=1}^{N^0_{c,st,·}} ( x − 1 + αc ). (3) 4 Deterministically knowing the path assignments is the primary efficiency motivation for using the minimal path assumption. The alternative is to explicitly sample the path assignments, which is more complicated (for both notation and computation). This option is spelled out in full detail in the supplementary material. In Equation 3, the first factor is proportional to the probability of assigning a topic shift of value 0 to speaker ac,t and the second factor is proportional to the joint probability of all topics in segment st of conversation c when lc,t = 0.
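Both Equation (3) above and Equation (4) below multiply a speaker factor by Chinese restaurant process marginals of the form αc^J ∏j (Nj − 1)! / ∏x=1..N (x − 1 + αc). A log-space sketch of that shared segment factor (counts below are hypothetical):

```python
from math import lgamma, log

def log_crp_segment(topic_counts, alpha_c):
    """log of alpha_c^J * prod_j (N_j - 1)! / prod_{x=1}^{N} (x - 1 + alpha_c),
    the joint CRP factor over a segment's topics used in Eqs. (3)-(4).
    topic_counts: tokens per segment-specific topic, e.g. [12, 5, 3]."""
    J, N = len(topic_counts), sum(topic_counts)
    val = J * log(alpha_c)
    val += sum(lgamma(n) for n in topic_counts)       # sum of log (N_j - 1)!
    val -= lgamma(N + alpha_c) - lgamma(alpha_c)      # log prod_x (x - 1 + alpha_c)
    return val

# Merge (Eq. 3, one segment) vs. split (Eq. 4, two segments), hypothetical counts:
merged = log_crp_segment([12, 5, 3], alpha_c=1.0)
split = log_crp_segment([8, 3], 1.0) + log_crp_segment([4, 2, 3], 1.0)
print(merged, split)
```

The speaker factor (S + γ)/(S· + 2γ) is then applied in probability space before normalizing over the two outcomes.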
The other alternative is for the topic shift to be 1, which represents the introduction of a new distribution over topics inside an existing segment. We sample this as P(lc,t = 1 | l^(−c,t), w, k, a, ∗) ∝ ( S^(−c,t)_{ac,t,1} + γ ) / ( S^(−c,t)_{ac,t,·} + 2γ ) × [ αc^(J^1_{c,(st−1)}) ∏_{j=1}^{J^1_{c,(st−1)}} ( N^1_{c,(st−1),j} − 1 )! / ∏_{x=1}^{N^1_{c,(st−1),·}} ( x − 1 + αc ) ] × [ αc^(J^1_{c,st}) ∏_{j=1}^{J^1_{c,st}} ( N^1_{c,st,j} − 1 )! / ∏_{x=1}^{N^1_{c,st,·}} ( x − 1 + αc ) ]. (4) As above, the first factor in Equation 4 is proportional to the probability of assigning a topic shift of value 1 to speaker ac,t; the two bracketed factors are proportional to the joint distributions of the topics in segments st − 1 and st. In this case lc,t = 1 means splitting the current segment, which results in two joint probabilities for two segments. 4 Datasets This section introduces the three corpora we use. We preprocess the data to remove stopwords and remove turns containing fewer than five tokens. The ICSI Meeting Corpus: The ICSI Meeting Corpus (Janin et al., 2003) consists of 75 transcribed meetings. For evaluation, we used a standard set of reference segmentations (Galley et al., 2003) of 25 meetings. Segmentations are binary, i.e., each point of the document is either a segment boundary or not, and on average each meeting has 8 segment boundaries. After preprocessing, there are 60 unique speakers and the vocabulary contains 3346 non-stopword tokens. The 2008 Presidential Election Debates Our second dataset contains three annotated presidential debates (Boydstun et al., 2011) between Barack Obama and John McCain and a vice presidential debate between Joe Biden and Sarah Palin. Each turn is one of two types: questions (Q) from the moderator or responses (R) from a candidate. Each clause in a turn is coded with a Question Topic (TQ) and a Response Topic (TR). Thus, a turn has a list of TQ's and TR's, both of length equal to the number of clauses in the turn. Topics are from the Policy Agendas Topics Codebook, a manual inventory of 19 major topics and 225 subtopics.5 Table 1 shows an example annotation. Table 1: Example turns from the annotated 2008 election debates (columns: Speaker, Type, Turn clauses, TQ, TR); for instance, Brokaw (Q): "Sen. McCain, in all candor, do you think the economy is going to get worse before it gets better?" with TQ = 1 and TR = N/A. The topics (TQ and TR) are from the Policy Agendas Topics Codebook, which contains codes such as Macroeconomics (1), Housing & Community Development (14), and Government Operations (20). To get reference segmentations, we assign each turn a real value from 0 to 1 indicating how much a turn changes the topic. For a question-typed turn, the score is the fraction of clause topics not appearing in the previous turn; for response-typed turns, the score is the fraction of clause topics that do not appear in the corresponding question. This results in a set of non-binary reference segmentations.
For evaluation metrics that require binary segmentations, we create a binary segmentation by setting a turn as a segment boundary if the computed score is 1. This threshold is chosen to include only true segment boundaries. CNN's Crossfire Crossfire was a weekly U.S. television "talking heads" program engineered to incite heated arguments (hence the name). Each episode features two recurring hosts, two guests, and clips from the week's news. Our Crossfire dataset contains 1134 transcribed episodes aired between 2000 and 2004.6 There are 2567 unique speakers. Unlike the previous two datasets, Crossfire does not have explicit topic segmentations, so we use it to explore speaker-specific characteristics (Section 6). 5 Topic Segmentation Experiments In this section, we examine how well SITS can replicate annotations of when new topics are introduced. 5 http://www.policyagendas.org/page/topic-codebook 6 http://www.cs.umd.edu/∼vietan/topicshift/crossfire.zip We discuss metrics for evaluating an algorithm's segmentation against a gold annotation, describe our experimental setup, and report those results. Evaluation Metrics To evaluate segmentations, we use Pk (Beeferman et al., 1999) and WindowDiff (WD) (Pevzner and Hearst, 2002). Both metrics measure the probability that two points in a document will be incorrectly separated by a segment boundary. Both techniques consider all spans of length k in the document and count whether the two endpoints of the window are (im)properly segmented against the gold segmentation. However, these metrics have drawbacks. First, they require both hypothesized and reference segmentations to be binary. Many algorithms (e.g., probabilistic approaches) give non-binary segmentations where candidate boundaries have real-valued scores (e.g., probability or confidence). Thus, evaluation requires arbitrary thresholding to binarize soft scores. To be fair, thresholds are set so the number of segments is equal to a predefined value (Purver et al., 2006; Galley et al., 2003). To overcome these limitations, we also use Earth Mover's Distance (EMD) (Rubner et al., 2000), a metric that measures the distance between two distributions. The EMD is the minimal cost to transform one distribution into the other. Each segmentation can be considered a multi-dimensional distribution where each candidate boundary is a dimension. In EMD, a distance function across features allows partial credit for "near miss" segment boundaries. In addition, because EMD operates on distributions, we can compute the distance between non-binary hypothesized segmentations with binary or real-valued reference segmentations. We use the FastEMD implementation (Pele and Werman, 2009). Experimental Methods We applied the following methods to discover topic segmentations in a document: • TextTiling (Hearst, 1997) is one of the earliest general-purpose topic segmentation algorithms, sliding a fixed-width window to detect major changes in lexical similarity. • P-NoSpeaker-S: parametric version without speaker identity run on each conversation (Purver et al., 2006) • P-NoSpeaker-M: parametric version without speaker identity run on all conversations • P-SITS: the parametric version of SITS with speaker identity run on all conversations • NP-HMM: the HMM-based nonparametric model with a single topic per turn. This model can be considered a Sticky HDP-HMM (Fox et al., 2008) with speaker identity. • NP-SITS: the nonparametric version of SITS with speaker identity run on all conversations.
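For concreteness, a small implementation of the Pk metric described above (segmentations given as per-position segment labels; the window-size convention used here is one common choice, not necessarily the authors'):

```python
def pk(reference, hypothesis, k=None):
    """Pk (Beeferman et al., 1999): probability that a width-k window's
    endpoints are in the same segment in one segmentation but not the other."""
    n = len(reference)
    if k is None:  # common convention: half the mean reference segment length
        k = max(1, round(0.5 * n / len(set(reference))))
    errors = 0
    for i in range(n - k):
        same_ref = reference[i] == reference[i + k]
        same_hyp = hypothesis[i] == hypothesis[i + k]
        errors += same_ref != same_hyp
    return errors / (n - k)

ref = [0] * 5 + [1] * 5        # boundary after position 4
hyp = [0] * 4 + [1] * 6        # hypothesis misses by one position
print(pk(ref, hyp))            # 0.25
```

WindowDiff differs only in comparing the number of boundaries inside each window rather than applying a same-segment test.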
Parameter Settings and Implementations In our experiment, all parameters of TextTiling are the same as in (Hearst, 1997). For statistical models, Gibbs sampling with 10 randomly initialized chains is used. Initial hyperparameter values are sampled from U(0, 1) to favor sparsity; statistics are collected after 500 burn-in iterations with a lag of 25 iterations over a total of 5000 iterations; and slice sampling (Neal, 2003) optimizes hyperparameters. Results and Analysis Table 2 shows the performance of various models on the topic segmentation problem, using the ICSI corpus and the 2008 debates. Consistent with previous results, probabilistic models outperform TextTiling. In addition, among the probabilistic models, the models that had access to speaker information consistently segment better than those lacking such information, supporting our assertion that there is benefit to modeling conversation as a social process. Furthermore, NP-SITS outperforms NP-HMM in both experiments, suggesting that assigning a distribution over topics to turns is better than using a single topic. This is consistent with parametric results reported in (Purver et al., 2006). The contribution of speaker identity seems more valuable in the debate setting. Debates are characterized by strong rewards for setting the agenda; dodging a question or moving the debate toward an opponent's weakness can be useful strategies (Boydstun et al., 2011). In contrast, meetings (particularly low-stakes ICSI meetings) are characterized by pragmatic rather than strategic topic shifts. Second, agenda-setting roles are clearer in formal debates; a moderator is tasked with setting the agenda and ensuring the conversation does not wander too much. The nonparametric model does best on the smaller debate dataset. We suspect that an evaluation that directly accessed the topic quality, either via prediction (Teh et al., 2006) or interpretability (Chang et al., 2009), would favor the nonparametric model more. 6 Evaluating Topic Shift Tendency In this section, we focus on the ability of SITS to capture speaker-level attributes. Recall that SITS associates with each speaker a topic shift tendency π that represents the probability of asserting a new topic in the conversation. While topic segmentation is a well studied problem, there are no established quantitative measurements of an individual's ability to control a conversation. To evaluate whether the tendency is capturing meaningful characteristics of speakers, we compare our inferred tendencies against insights from political science. 2008 Elections To obtain a posterior estimate of π (Figure 3; larger values mean greater tendency), we create 10 chains with hyperparameters sampled from the uniform distribution U(0, 1) and average π over the 10 chains (as described in Section 5). In these debates, Ifill is the moderator of the debate between Biden and Palin; Brokaw, Lehrer and Schieffer are the three moderators of three debates between Obama and McCain. Here "Question" denotes questions from audiences in the "town hall" debate. The role of this "speaker" can be considered equivalent to the debate moderator. The topic shift tendencies of moderators are much higher than for candidates. In the three debates between Obama and McCain, the moderators—Brokaw, Lehrer and Schieffer—have significantly higher scores than both candidates. This is a useful reality check, since in a debate the moderators are the ones asking questions and literally controlling the topical focus.
Interestingly, in the vice-presidential debate, the score of moderator Ifill is only slightly higher than those of Palin and Biden; this is consistent with media commentary characterizing her as a weak moderator.7 Similarly, the "Question" speaker had a relatively high variance, consistent with an amalgamation of many distinct speakers. These topic shift tendencies suggest that all candidates manage to succeed at some points in setting and controlling the debate topics. Our model gives Obama a slightly higher score than McCain, consistent with social science claims (Boydstun et al., 2011) that Obama had the lead in setting the agenda over McCain. Table 4 shows examples of SITS-detected topic shifts. Crossfire Crossfire, unlike the debates, has many speakers. This allows us to examine more closely what we can learn about speakers' topic shift tendency. We verified that SITS can segment topics; assuming that changing the topic is useful for a speaker, how can we characterize who does so effectively? We examine the relationship between topic shift tendency, social roles, and political ideology. To focus on frequent speakers, we filter out speakers with fewer than 30 turns. Most speakers have relatively small π, with the mode around 0.3. There are, however, speakers with very high topic shift tendencies. Table 5 shows the speakers having the highest values according to SITS. We find that there are three general patterns for who influences the course of a conversation in Crossfire. First, there are structural "speakers" the show uses to frame and propose new topics. These are audience questions, news clips (e.g. many of Gore's and Bush's turns from 2000), and voice overs. That SITS is able to recover these is reassuring. Second, the stable of regular hosts receives high topic shift tendencies, which is reasonable given their experience with the format and ostensible moderation roles (in practice they also stoke lively discussion). The remaining class is more interesting. The remaining non-hosts with high topic shift tendency are relative moderates on the political spectrum: • John Kasich, one of few Republicans to support the assault weapons ban and now governor of Ohio, a swing state • Christine Todd Whitman, former Republican governor of New Jersey, a very Democratic state • John McCain, who before 2008 was known as a "maverick" for working with Democrats (e.g. Russ Feingold) This suggests that, despite Crossfire's tendency to create highly partisan debates, those who are able to work across the political spectrum may best be able to influence the topic under discussion in highly polarized contexts. Table 4 shows detected topic shifts from these speakers; two of these examples (McCain and Whitman) show disagreement of Republicans with President Bush. In the other, Kasich is defending a Republican plan (school vouchers) popular with traditional Democratic constituencies. 7 http://harpers.org/archive/2008/10/hbc-90003659
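Concretely, the posterior estimate of a speaker's tendency can be read off the sampled shift indicators; a minimal sketch (the smoothing value γ here is illustrative; in the paper hyperparameters are sampled and estimates are averaged over chains):

```python
def shift_tendency(shift_counts, gamma=0.5):
    """Posterior-mean estimate of pi_m from sampled indicators:
    (S_m1 + gamma) / (S_m0 + S_m1 + 2 * gamma).
    shift_counts: {speaker: (n_no_shift, n_shift)} from one chain."""
    return {m: (s1 + gamma) / (s0 + s1 + 2 * gamma)
            for m, (s0, s1) in shift_counts.items()}

chain = {"moderator": (40, 60), "candidate": (80, 20)}   # hypothetical counts
print(shift_tendency(chain))   # moderator ≈ 0.60, candidate ≈ 0.20
```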
Table 4: Examples of detected topic shifts; each row pairs a previous turn with the turn detected as shifting the topic, drawn from speakers with high topic shift tendency. Table 5: Top speakers by topic shift tendencies (rank, speaker, π). We mark hosts (†) and "speakers" who often (but not always) appeared in clips (‡). Apart from those groups, speakers with the highest tendency were political moderates. 7 Related and Future Work In the realm of statistical models, a number of techniques incorporate social connections and identity to explain content in social networks (Chang and Blei, 2009) and scientific corpora (Rosen-Zvi et al., 2004). However, these models ignore the temporal evolution of content, treating documents as static. Models that do investigate the evolution of topics over time typically ignore the identity of the speaker. For example: models having sticky topics over n-grams (Johnson, 2010), sticky HDP-HMM (Fox et al., 2008); models that are an amalgam of sequential models and topic models (Griffiths et al., 2005; Wallach, 2006; Gruber et al., 2007; Ahmed and Xing, 2008; Boyd-Graber and Blei, 2008; Du et al., 2010); or explicit models of time or other relevant features as a distinct latent variable (Wang and McCallum, 2006; Eisenstein et al., 2010). In contrast, SITS jointly models topic and individuals' tendency to control a conversation. Not only does SITS outperform other models using standard computational linguistics baselines, but it also proposes intriguing hypotheses for social scientists. Associating each speaker with a scalar that models their tendency to change the topic does improve performance on standard tasks, but it's inadequate to fully describe an individual. Modeling individuals' perspective (Paul and Girju, 2010), "side" (Thomas et al., 2006), or personal preferences for topics (Grimmer, 2009) would enrich the model and better illuminate the interaction of influence and topic. Statistical analysis of political discourse can help discover patterns that political scientists, who often work via a "close reading," might otherwise miss. We plan to work with social scientists to validate our implicit hypothesis that our topic shift tendency correlates well with intuitive measures of "influence." Acknowledgements This research was funded in part by the Army Research Laboratory through ARL Cooperative Agreement W911NF-09-2-0072 and by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), through the Army Research Laboratory. Jordan Boyd-Graber and Philip Resnik are also supported by US National Science Foundation grant #1018625. Any opinions, findings, conclusions, or recommendations expressed are the authors' and do not necessarily reflect those of the sponsors. References [Abbott et al., 2011] Abbott, R., Walker, M., Anand, P., Fox Tree, J. E., Bowmani, R., and King, J. (2011). How can you say such things?!?: Recognizing disagreement in informal political argument.
In Proceedings of the Workshop on Language in Social Media (LSM 2011), pages 2–11. [Ahmed and Xing, 2008] Ahmed, A. and Xing, E. P. (2008). Dynamic non-parametric mixture models and the recurrent Chinese restaurant process: with applications to evolutionary clustering. In SDM, pages 219–230. [Beeferman et al., 1999] Beeferman, D., Berger, A., and Lafferty, J. (1999). Statistical models for text segmentation. Mach. Learn., 34:177–210. [Blei and Lafferty, 2009] Blei, D. M. and Lafferty, J. (2009). Text Mining: Theory and Applications, chapter Topic Models. Taylor and Francis, London. [Boyd-Graber and Blei, 2008] Boyd-Graber, J. and Blei, D. M. (2008). Syntactic topic models. In Proceedings of Advances in Neural Information Processing Systems. [Boydstun et al., 2011] Boydstun, A. E., Phillips, C., and Glazier, R. A. (2011). It's the economy again, stupid: Agenda control in the 2008 presidential debates. Forthcoming. [Chang and Blei, 2009] Chang, J. and Blei, D. M. (2009). Relational topic models for document networks. In Proceedings of Artificial Intelligence and Statistics. [Chang et al., 2009] Chang, J., Boyd-Graber, J., Wang, C., Gerrish, S., and Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models. In Neural Information Processing Systems. [Du et al., 2010] Du, L., Buntine, W., and Jin, H. (2010). Sequential latent Dirichlet allocation: Discover underlying topic structures within a document. In Data Mining (ICDM), 2010 IEEE 10th International Conference on, pages 148–157. [Ehlen et al., 2007] Ehlen, P., Purver, M., and Niekrasz, J. (2007). A meeting browser that learns. In Proceedings of the AAAI Spring Symposium on Interaction Challenges for Intelligent Assistants. [Eisenstein and Barzilay, 2008] Eisenstein, J. and Barzilay, R. (2008). Bayesian unsupervised topic segmentation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. [Eisenstein et al., 2010] Eisenstein, J., O'Connor, B., Smith, N. A., and Xing, E. P. (2010). A latent variable model for geographic lexical variation. In EMNLP'10, pages 1277–1287. [Ferguson, 1973] Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2):209–230. [Fox et al., 2008] Fox, E. B., Sudderth, E. B., Jordan, M. I., and Willsky, A. S. (2008). An HDP-HMM for systems with state persistence. In Proceedings of International Conference of Machine Learning. [Galley et al., 2003] Galley, M., McKeown, K., Fosler-Lussier, E., and Jing, H. (2003). Discourse segmentation of multi-party conversation. In Proceedings of the Association for Computational Linguistics. [Georgescul et al., 2006] Georgescul, M., Clark, A., and Armstrong, S. (2006). Word distributions for thematic segmentation in a support vector machine approach. In Conference on Computational Natural Language Learning. [Gerrish and Blei, 2010] Gerrish, S. and Blei, D. M. (2010). A language-based approach to measuring scholarly impact. In Proceedings of International Conference of Machine Learning. [Griffiths et al., 2005] Griffiths, T. L., Steyvers, M., Blei, D. M., and Tenenbaum, J. B. (2005). Integrating topics and syntax. In Proceedings of Advances in Neural Information Processing Systems. [Grimmer, 2009] Grimmer, J. (2009). A Bayesian Hierarchical Topic Model for Political Texts: Measuring Expressed Agendas in Senate Press Releases. Political Analysis, 18:1–35. [Gruber et al., 2007] Gruber, A., Rosen-Zvi, M., and Weiss, Y.
(2007). Hidden topic Markov models. In Artificial Intelligence and Statistics. [Hawes et al., 2009] Hawes, T., Lin, J., and Resnik, P. (2009). Elements of a computational model for multiparty discourse: The turn-taking behavior of Supreme Court justices. Journal of the American Society for Information Science and Technology, 60(8):1607–1615. [Hearst, 1997] Hearst, M. A. (1997). TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 23(1):33–64. [Hsueh et al., 2006] Hsueh, P.-y., Moore, J. D., and Renals, S. (2006). Automatic segmentation of multiparty dialogue. In Proceedings of the European Chapter of the Association for Computational Linguistics. [Ireland et al., 2011] Ireland, M. E., Slatcher, R. B., Eastwick, P. W., Scissors, L. E., Finkel, E. J., and Pennebaker, J. W. (2011). Language style matching predicts relationship initiation and stability. Psychological Science, 22(1):39–44. [Janin et al., 2003] Janin, A., Baron, D., Edwards, J., Ellis, D., Gelbart, D., Morgan, N., Peskin, B., Pfau, T., Shriberg, E., Stolcke, A., and Wooters, C. (2003). The ICSI meeting corpus. In IEEE International Conference on Acoustics, Speech, and Signal Processing. [Johnson, 2010] Johnson, M. (2010). PCFGs, topic models, adaptor grammars and learning topical collocations and the structure of proper names. In Proceedings of the Association for Computational Linguistics. [Morris and Hirst, 1991] Morris, J. and Hirst, G. (1991). Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17:21–48. [Müller and Quintana, 2004] Müller, P. and Quintana, F. A. (2004). Nonparametric Bayesian data analysis. Statistical Science, 19(1):95–110. [Murray et al., 2005] Murray, G., Renals, S., and Carletta, J. (2005). Extractive summarization of meeting recordings. In European Conference on Speech Communication and Technology. [Neal, 2000] Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2):249–265. [Neal, 2003] Neal, R. M. (2003). Slice sampling. Annals of Statistics, 31:705–767. [Olney and Cai, 2005] Olney, A. and Cai, Z. (2005). An orthonormal basis for topic segmentation in tutorial dialogue. In Proceedings of the Human Language Technology Conference. [Paul and Girju, 2010] Paul, M. and Girju, R. (2010). A two-dimensional topic-aspect model for discovering multi-faceted topics. In Association for the Advancement of Artificial Intelligence. [Pele and Werman, 2009] Pele, O. and Werman, M. (2009). Fast and robust earth mover's distances. In International Conference on Computer Vision. [Pevzner and Hearst, 2002] Pevzner, L. and Hearst, M. A. (2002). A critique and improvement of an evaluation metric for text segmentation. Computational Linguistics, 28. [Purver, 2011] Purver, M. (2011). Topic segmentation. In Tur, G. and de Mori, R., editors, Spoken Language Understanding: Systems for Extracting Semantic Information from Speech, pages 291–317. Wiley. [Purver et al., 2006] Purver, M., Körding, K., Griffiths, T. L., and Tenenbaum, J. (2006). Unsupervised topic modelling for multi-party spoken discourse. In Proceedings of the Association for Computational Linguistics. [Resnik and Hardisty, 2010] Resnik, P. and Hardisty, E. (2010). Gibbs sampling for the uninitiated. Technical Report UMIACS-TR-2010-04, University of Maryland. http://www.lib.umd.edu/drum/handle/1903/10058. [Reynar, 1998] Reynar, J. C. (1998).
Topic Segmentation: Algorithms and Applications. PhD thesis, University of Pennsylvania. [Rosen-Zvi et al., 2004] Rosen-Zvi, M., Griffiths, T. L., Steyvers, M., and Smyth, P. (2004). The author-topic model for authors and documents. In Proceedings of Uncertainty in Artificial Intelligence. [Rubner et al., 2000] Rubner, Y., Tomasi, C., and Guibas, L. J. (2000). The earth mover's distance as a metric for image retrieval. International Journal of Computer Vision, 40:99–121. [Teh et al., 2006] Teh, Y. W., Jordan, M. I., Beal, M. J., and Blei, D. M. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581. [Thomas et al., 2006] Thomas, M., Pang, B., and Lee, L. (2006). Get out the vote: Determining support or opposition from Congressional floor-debate transcripts. In Proceedings of Empirical Methods in Natural Language Processing. [Tur et al., 2010] Tur, G., Stolcke, A., Voss, L., Peters, S., Hakkani-Tür, D., Dowding, J., Favre, B., Fernández, R., Frampton, M., Frandsen, M., Frederickson, C., Graciarena, M., Kintzing, D., Leveque, K., Mason, S., Niekrasz, J., Purver, M., Riedhammer, K., Shriberg, E., Tien, J., Vergyri, D., and Yang, F. (2010). The CALO meeting assistant system. Trans. Audio, Speech and Lang. Proc., 18:1601–1611. [Wallach, 2006] Wallach, H. M. (2006). Topic modeling: Beyond bag-of-words. In Proceedings of International Conference of Machine Learning. [Wallach, 2008] Wallach, H. M. (2008). Structured Topic Models for Language. PhD thesis, University of Cambridge. [Wang et al., 2008] Wang, C., Blei, D. M., and Heckerman, D. (2008). Continuous time dynamic topic models. In Proceedings of Uncertainty in Artificial Intelligence. [Wang and McCallum, 2006] Wang, X. and McCallum, A. (2006). Topics over time: a non-Markov continuous-time model of topical trends. In Knowledge Discovery and Data Mining.
5 0.46519551 31 acl-2012-Authorship Attribution with Author-aware Topic Models
Author: Yanir Seroussi ; Fabian Bohnert ; Ingrid Zukerman
Abstract: Authorship attribution deals with identifying the authors of anonymous texts. Building on our earlier finding that the Latent Dirichlet Allocation (LDA) topic model can be used to improve authorship attribution accuracy, we show that employing a previously-suggested Author-Topic (AT) model outperforms LDA when applied to scenarios with many authors. In addition, we define a model that combines LDA and AT by representing authors and documents over two disjoint topic sets, and show that our model outperforms LDA, AT and support vector machines on datasets with many authors.
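The Author-Topic (AT) building block referenced here has a compact generative step: each word picks one of the document's authors uniformly, then a topic from that author's distribution. A sketch with illustrative sizes and priors:

```python
import numpy as np

rng = np.random.default_rng(2)

def generate_at_document(authors, n_words, theta_author, phi):
    """Author-Topic generative step (Rosen-Zvi et al., 2004): uniform author
    choice per word, author-specific topic, then word from the topic."""
    K, V = phi.shape
    words = []
    for _ in range(n_words):
        a = rng.choice(authors)                  # uniform over the doc's authors
        z = rng.choice(K, p=theta_author[a])     # topic from that author
        words.append(rng.choice(V, p=phi[z]))    # word from the topic
    return words

K, V, A = 5, 300, 10
phi = rng.dirichlet(np.full(V, 0.1), size=K)
theta_author = rng.dirichlet(np.full(K, 0.1), size=A)
print(generate_at_document([1, 4], 8, theta_author, phi))
```

The combined model described in the abstract would, by contrast, maintain two disjoint topic sets, one indexed by authors and one by documents.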
6 0.4569535 86 acl-2012-Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks
7 0.44489846 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
8 0.39672133 110 acl-2012-Historical Analysis of Legal Opinions with a Sparse Mixed-Effects Latent Variable Model
9 0.36596557 199 acl-2012-Topic Models for Dynamic Translation Model Adaptation
10 0.31345642 144 acl-2012-Modeling Review Comments
11 0.29777765 85 acl-2012-Event Linking: Grounding Event Reference in a News Archive
13 0.27336121 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System
14 0.27030754 88 acl-2012-Exploiting Social Information in Grounded Language Learning via Grammatical Reduction
15 0.25680247 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
16 0.25401676 205 acl-2012-Tweet Recommendation with Graph Co-Ranking
17 0.22551548 216 acl-2012-Word Epoch Disambiguation: Finding How Words Change Over Time
18 0.22223815 203 acl-2012-Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information
19 0.21814905 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool
20 0.21770951 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
topicId topicWeight
[(26, 0.024), (28, 0.039), (30, 0.042), (37, 0.023), (39, 0.059), (45, 0.072), (52, 0.011), (74, 0.015), (76, 0.097), (82, 0.036), (84, 0.012), (85, 0.013), (90, 0.244), (92, 0.144), (94, 0.012), (99, 0.041)]
simIndex simValue paperId paperTitle
same-paper 1 0.96170789 98 acl-2012-Finding Bursty Topics from Microblogs
2 0.93073779 17 acl-2012-A Novel Burst-based Text Representation Model for Scalable Event Detection
3 0.90050912 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
Author: Hiroyuki Shindo ; Yusuke Miyao ; Akinori Fujino ; Masaaki Nagata
Abstract: We propose Symbol-Refined Tree Substitution Grammars (SR-TSGs) for syntactic parsing. An SR-TSG is an extension of the conventional TSG model where each nonterminal symbol can be refined (subcategorized) to fit the training data. We aim to provide a unified model where TSG rules and symbol refinement are learned from training data in a fully automatic and consistent fashion. We present a novel probabilistic SR-TSG model based on the hierarchical Pitman-Yor Process to encode backoff smoothing from a fine-grained SR-TSG to simpler CFG rules, and develop an efficient training method based on Markov Chain Monte Carlo (MCMC) sampling. Our SR-TSG parser achieves an F1 score of 92.4% in the Wall Street Journal (WSJ) English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and better than state-of-the-art discriminative reranking parsers.
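The hierarchical Pitman-Yor backoff mentioned here follows the standard Chinese-restaurant predictive form; a single-level sketch (generic PYP smoothing with hypothetical counts, not the authors' full SR-TSG machinery):

```python
def pyp_predictive(count_w, tables_w, n, T, d, theta, backoff_w):
    """Pitman-Yor CRP predictive probability:
    p(w) = ((c_w - d * t_w) + (theta + d * T) * p0(w)) / (theta + n),
    discounting observed rule counts and backing off to a coarser
    distribution p0 (e.g., simpler CFG rules in an SR-TSG)."""
    return ((count_w - d * tables_w) + (theta + d * T) * backoff_w) / (theta + n)

# A rule seen 5 times at 2 tables; 100 customers and 30 tables in total:
print(pyp_predictive(count_w=5, tables_w=2, n=100, T=30,
                     d=0.5, theta=1.0, backoff_w=0.01))  # ≈ 0.0412
```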
4 0.8913576 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models
Author: Thomas Lippincott ; Anna Korhonen ; Diarmuid O Seaghdha
Abstract: We present a novel approach for building verb subcategorization lexicons using a simple graphical model. In contrast to previous methods, we show how the model can be trained without parsed input or a predefined subcategorization frame inventory. Our method outperforms the state-of-the-art on a verb clustering task, and is easily trained on arbitrary domains. This quantitative evaluation is complemented by a qualitative discussion of verbs and their frames. We discuss the advantages of graphical models for this task, in particular the ease of integrating semantic information about verbs and arguments in a principled fashion. We conclude with future work to augment the approach.
5 0.88841194 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
Author: Xinyan Xiao ; Deyi Xiong ; Min Zhang ; Qun Liu ; Shouxun Lin
Abstract: Previous work using topic models for statistical machine translation (SMT) explores topic information at the word level. However, SMT has advanced from the word-based paradigm to the phrase/rule-based paradigm. We therefore propose a topic similarity model to exploit topic information at the synchronous rule level for hierarchical phrase-based translation. We associate each synchronous rule with a topic distribution, and select desirable rules according to the similarity of their topic distributions with given documents. We show that our model significantly improves the translation performance over the baseline on NIST Chinese-to-English translation experiments. Our model also achieves better performance and faster speed than previous approaches that work at the word level.
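The abstract does not name the similarity measure between a rule's topic distribution and the document's; one plausible instantiation (an assumption for illustration) is a Hellinger-based similarity:

```python
import numpy as np

def hellinger_similarity(p, q):
    """1 minus the Hellinger distance between two topic distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    h = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
    return 1.0 - h

rule_topics = [0.7, 0.2, 0.1]   # hypothetical rule-level distribution
doc_topics = [0.6, 0.3, 0.1]    # inferred distribution of the input document
print(hellinger_similarity(rule_topics, doc_topics))  # ≈ 0.92, a close match
```

A decoder could then add the (log) similarity of each applied rule as a feature, steering derivations toward topic-consistent rules.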
6 0.88522464 31 acl-2012-Authorship Attribution with Author-aware Topic Models
7 0.88473952 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
8 0.88237226 167 acl-2012-QuickView: NLP-based Tweet Search
9 0.88053006 142 acl-2012-Mining Entity Types from Query Logs via User Intent Modeling
10 0.87551707 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval
11 0.87315214 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling
12 0.87106824 182 acl-2012-Spice it up? Mining Refinements to Online Instructions from User Generated Content
13 0.87082523 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
14 0.86942887 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale
15 0.86887431 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition
16 0.86670512 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
17 0.8658188 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
18 0.86483508 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
19 0.8629052 16 acl-2012-A Nonparametric Bayesian Approach to Acoustic Model Discovery
20 0.86180568 18 acl-2012-A Probabilistic Model for Canonicalizing Named Entity Mentions