acl acl2011 acl2011-118 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Rivka Levitan ; Agustin Gravano ; Julia Hirschberg
Abstract: In conversation, when speech is followed by a backchannel, evidence of continued engagement by one’s dialogue partner, that speech displays a combination of cues that appear to signal to one’s interlocutor that a backchannel is appropriate. We term these cues backchannel-preceding cues (BPC)s, and examine the Columbia Games Corpus for evidence of entrainment on such cues. Entrainment, the phenomenon of dialogue partners becoming more similar to each other, is widely believed to be crucial to conversation quality and success. Our results show that speaking partners entrain on BPCs; that is, they tend to use similar sets of BPCs; this similarity increases over the course of a dialogue; and this similarity is associated with measures of dialogue coordination and task success.
Reference: text
sentIndex sentText sentNum sentScore
1 In conversation, when speech is followed by a backchannel, evidence of continued engagement by one’s dialogue partner, that speech displays a combination of cues that appear to signal to one’s interlocutor that a backchannel is appropriate. [sent-9, score-0.742]
2 We term these cues backchannel-preceding cues (BPC)s, and examine the Columbia Games Corpus for evidence of entrainment on such cues. [sent-10, score-1.156]
3 Entrainment, the phenomenon of dialogue partners becoming more similar to each other, is widely believed to be crucial to conversation quality and success. [sent-11, score-0.432]
4 Our results show that speaking partners entrain on BPCs; that is, they tend to use similar sets of BPCs; this similarity increases over the course of a dialogue; and this similarity is associated with measures of dialogue coordination and task success. [sent-12, score-0.606]
5 1 Introduction In conversation, dialogue partners often become more similar to each other. [sent-13, score-0.287]
6 This phenomenon, known in the literature as entrainment, alignment, accommodation, or adaptation, has been found to occur along many acoustic, prosodic, syntactic and lexical dimensions in both human-human interactions (Brennan and Clark, 1996; Coulston et al. [sent-14, score-0.035]
7 , 2003) and has been associated with dialogue success and naturalness (Pickering and Garrod, 2004; Goleman, 2006; Nenkova et al. [sent-19, score-0.198]
8 However, the question of how best to measure this phenomenon has not been well established. [sent-22, score-0.061]
9 Most research has examined similarity of behavior over a conversation, or has compared similarity in early and later phases of a conversation; more recent work has proposed new metrics of synchrony and convergence (Edlund et al. [sent-23, score-0.102]
10 , 2009) and measures of similarity at a more local level (Heldner et al. [sent-24, score-0.054]
11 While a number of dimensions of potential entrainment have been studied in the literature, entrainment in turn-taking behaviors has received little attention. [sent-26, score-1.507]
12 In this paper we examine entrainment in a novel turn-taking dimension: backchannel-preceding cues (BPC)s. [sent-27, score-0.964]
13 Backchannels are short segments of speech uttered to signal continued interest and understanding without taking the floor (Schegloff, 1982). [sent-28, score-0.108]
14 In a study of the Columbia Games Corpus, Gravano and Hirschberg (2009; 2011) identify five speech phenomena that are significantly correlated with speech followed by backchannels. [sent-29, score-0.126]
15 However, they also note that individual speakers produced different combinations of these cues and varied the way cues were expressed. [sent-30, score-0.462]
16 In our work, we look for evidence that speaker pairs negotiate the choice of such cues and their realizations in a conversation; that is, they entrain to one another in their choice and production of such cues. [sent-31, score-0.628]
17 We test for evidence both at the global and at the local level. [sent-32, score-0.089]
18 Footnote 1: Prior studies termed cues that precede backchannels “backchannel-inviting cues.” [sent-33, score-0.182]
19 To avoid suggesting that such cues are a speaker’s conscious decision, we adopt a more neutral term. [sent-34, score-0.182]
20 In Section 3, we present three measures of BPC entrainment. [sent-38, score-0.038]
21 In Section 4, we further show that two of these measures also correlate with dialogue coordination and task success. [sent-39, score-0.35]
22 2 The Columbia Games Corpus The Columbia Games Corpus is a collection of 12 spontaneous dyadic conversations elicited from native speakers of Standard American English. [sent-40, score-0.119]
23 13 people participated in the collection of the corpus. [sent-41, score-0.02]
24 11 participated in two sessions, each time with a different partner. [sent-42, score-0.02]
25 They played a series of computer games requiring collaboration in order to achieve a high score. [sent-44, score-0.119]
26 There are 5641 exchanges in the corpus; of these, approximately 58% are smooth switches, 2% are interruptions, and 11% are backchannels. [sent-47, score-0.018]
27 Other turn types include overlaps and pause interruptions; a full description of the Columbia Games Corpus’ annotation for turn-taking behavior can be found in (Gravano and Hirschberg, 2011). [sent-48, score-0.089]
28 3 Evidence of entrainment Gravano and Hirschberg (2009; 2011) identify five cues that tend to be present in speech preceding backchannels. [sent-49, score-0.992]
29 The likelihood that a segment of speech will be followed by a backchannel increases quadratically with the number of cues present in the speech. [sent-51, score-0.372]
30 However, they note that individual speakers may display different combinations of cues. [sent-52, score-0.098]
31 Furthermore, the realization of a cue may differ from speaker to speaker. [sent-53, score-0.256]
32 We hypothesize that speaker pairs adopt a common set of cues to which each will respond with a backchannel. [sent-54, score-0.374]
33 We look for evidence for this hypothesis using three different measures of entrainment. [sent-55, score-0.104]
34 Two of these measures capture entrainment globally, over the course of an entire dialogue, while the third looks at entrainment on a local level. [sent-57, score-1.564]
35 The unit of analysis we employ for each experiment is an interpausal unit (IPU), defined as a pause-free segment of speech from a single speaker, where pause is defined as a silence of 50ms or more from the same speaker. [sent-58, score-0.105]
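The IPU definition above can be illustrated with a minimal sketch. It assumes a hypothetical input format (a list of `(start, end)` times in seconds for one speaker's speech intervals) and a hypothetical function name `merge_into_ipus`; the source does not describe how the corpus was actually segmented, only the 50ms threshold.

```python
def merge_into_ipus(intervals, min_pause=0.050):
    """Merge one speaker's speech intervals into interpausal units (IPUs).

    intervals: list of (start, end) times in seconds (assumed input format).
    A silence of min_pause seconds or more counts as a pause and starts a
    new IPU; shorter gaps are absorbed into the current IPU.
    """
    ipus = []
    for start, end in sorted(intervals):
        if ipus and start - ipus[-1][1] < min_pause:
            # Gap is shorter than a pause: extend the current IPU.
            ipus[-1] = (ipus[-1][0], max(ipus[-1][1], end))
        else:
            # First interval, or a real pause: open a new IPU.
            ipus.append((start, end))
    return ipus


# Illustrative values only: a 20ms gap is merged, a 500ms gap is a pause.
example_ipus = merge_into_ipus([(0.0, 1.0), (1.02, 2.0), (2.5, 3.0)])
```

Under this sketch, two consecutive IPUs from the same speaker would form what the text calls a hold.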
36 We term consecutive pairs of IPUs from a single speaker holds, and contrast hold-preceding IPUs with backchannel-preceding IPUs to isolate cues that are significant in preceding backchannels. [sent-59, score-0.375]
37 That is, when a speaker pauses without giving up the turn, which IPUs are followed by backchannels and which are not? [sent-60, score-0.276]
38 We consider a speaker to use a certain BPC if, for any of the features modeling that cue, the difference between backchannel-preceding IPUs and hold-preceding IPUs is significant (ANOVA, p < 0. [sent-61, score-0.204]
39 1 Entrainment measure 1: Common cues For our first entrainment metric, we measure the similarity of two speakers’ cue sets by simply counting the number of cues that they have in common over the entire conversation. [sent-64, score-1.266]
40 We hypothesize that speaker pairs will use similar sets of cues. [sent-65, score-0.192]
41 The speakers in our corpus each displayed 0 to 5 of the BPCs described in Table 1 (mean = 2. [sent-66, score-0.098]
42 The number of cues speaker pairs had in common ranged from 0 to 4 (out of a maximum of 5). [sent-68, score-0.349]
43 Let S1 and S2 be two speakers in a given dialogue, and $n_{1,2}$ the number of BPCs they had in common. [sent-69, score-0.098]
44 Let also $n_{1,*}$ and $n_{*,2}$ be the mean number of cues S1 and S2 had in common with all other speakers in the corpus not partnered with them in any session. [sent-70, score-0.346]
45 The results indicate that, on average, the speakers had significantly more cues in common with their interlocutors than with other speakers in the corpus (t = 2. [sent-72, score-0.434]
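The quantities used by Measure 1 can be sketched as follows. The speakers, cue sets, and partner assignments are invented for illustration, and the cue labels simply reuse the cue types named elsewhere in the text; the significance test itself is not reproduced because its details are not given here.

```python
# Hypothetical cue sets per speaker (illustrative data, not from the corpus).
cue_sets = {
    "A": {"pitch", "intensity", "duration"},
    "B": {"pitch", "intensity"},
    "C": {"intonation"},
    "D": {"pitch"},
}
# Hypothetical record of who was partnered with whom in any session.
partners = {"A": {"B"}, "B": {"A"}, "C": {"D"}, "D": {"C"}}


def common_cues(s1, s2):
    """Number of BPCs two speakers have in common."""
    return len(cue_sets[s1] & cue_sets[s2])


def mean_common_with_non_partners(speaker):
    """Mean number of cues shared with speakers never partnered with `speaker`."""
    others = [s for s in cue_sets
              if s != speaker and s not in partners[speaker]]
    return sum(common_cues(speaker, s) for s in others) / len(others)


n_12 = common_cues("A", "B")                      # cues shared by the partners
n_1_star = mean_common_with_non_partners("A")     # A versus non-partners
n_star_2 = mean_common_with_non_partners("B")     # B versus non-partners
```

In this toy data the partners share more cues with each other than with non-partners, which is the pattern the reported comparison tests for across the corpus.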
46 We measure how similarly two speakers S1 and S2 in a conversation realize a BPC as follows: First, we compute the difference $d^f_{1,2}$ between both speakers for the mean value of a feature f over all backchannel-preceding IPUs. [sent-78, score-0.384]
47 Second, we compute the same difference between each of S1 and S2 and the averaged values of all other speakers in the corpus who are not partnered with that speaker in any session ($d^f_{1,*}$ and $d^f_{*,2}$). [sent-79, score-0.331]
48 Finally, if for any feature f modeling a given cue, it holds that $d^f_{1,2} < \min(d^f_{1,*}, d^f_{*,2})$, we say that that session exhibits mutual entrainment on that cue. [sent-80, score-0.033]
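The three steps above can be sketched for a single feature. This is a minimal illustration under stated assumptions: the difference is taken as an absolute difference (the text only says "difference"), the function name and the input dictionaries are hypothetical, and the numbers are invented.

```python
from statistics import mean


def mutual_entrainment(s1, s2, feature_means, non_partners):
    """Check the mutual-entrainment condition for one feature f.

    feature_means: speaker -> mean of f over that speaker's
                   backchannel-preceding IPUs (assumed precomputed).
    non_partners:  speaker -> speakers never partnered with them.
    """
    # Step 1: difference between the two partners.
    d_12 = abs(feature_means[s1] - feature_means[s2])
    # Step 2: difference between each speaker and the averaged non-partners.
    avg_1 = mean([feature_means[s] for s in non_partners[s1]])
    avg_2 = mean([feature_means[s] for s in non_partners[s2]])
    d_1_star = abs(feature_means[s1] - avg_1)
    d_star_2 = abs(feature_means[s2] - avg_2)
    # Step 3: partners must be closer to each other than to the others.
    return d_12 < min(d_1_star, d_star_2)


# Illustrative mean pitch values in Hz (invented).
non_partners = {"A": ["C", "D"], "B": ["C", "D"]}
close_pair = {"A": 200.0, "B": 205.0, "C": 150.0, "D": 170.0}
distant_pair = {"A": 200.0, "B": 120.0, "C": 150.0, "D": 170.0}
```

A session would count as entrained on a cue if this check succeeds for any of the features modeling that cue.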
49 Eleven out of 12 sessions exhibit mutual entrainment on pitch and intensity, 9 exhibit mutual entrainment on voice quality, 8 on intonation, and 7 on duration. [sent-81, score-1.624]
50 Interestingly, the only session not entraining on intensity is the only session not entraining on pitch, but the relationships between the different types of entrainment are not readily observable. [sent-82, score-0.969]
51 For each of the 10 features associated with backchannel invitation, we compare the differences between conversational partners ($d^f_{1,2}$) and the averaged difference between each speaker and the other speakers in the corpus ($d^f_{1,*}$ and $d^f_{*,2}$). [sent-83, score-0.315]
52 Paired t-tests (Table 2) show that the differences in intensity, pitch and voice quality in backchannel-preceding IPUs are smaller between conversational partners than between speakers and their non-partners in the corpus. [sent-84, score-0.418]
53 Table 2: T-tests between partners and their non-partners in the corpus. [sent-85, score-0.121]
54 The differences between interlocutors and their non-partners in features modeling pitch show that there is no single “optimal” value for a pitch level that precedes a backchannel; this value is coordinated between partners on a pair-by-pair basis. [sent-86, score-0.412]
55 Similarly, while varying intensity or voice quality may be considered a universal cue for a backchannel, the specific values of the production appear to be a matter of coordination between individual speaker pairs. [sent-87, score-0.511]
56 While some views of entrainment hold that coordination takes place at the very beginning of a dialogue, others hypothesize that coordination continues to improve over the course of the conversation. [sent-88, score-1.082]
57 T-tests for difference of means show that indeed the differences between conversational partners in mean pitch and intensity in the final 1000 milliseconds of backchannel-preceding IPUs are smaller in the second half of the conversation than in the first (t = 3. [sent-89, score-0.507]
58 01), indicating that entrainment in this dimension is an ongoing process that results in closer alignment after the interlocutors have been speaking for some time. [sent-93, score-0.818]
59 3 Measure 3: Local BPC entrainment Measures 1 and 2 capture global entrainment and can be used to characterize an entire dialogue with respect to entrainment. [sent-95, score-1.682]
60 We now look for evidence to support the hypothesis that a speaker’s realization of BPCs influences how her interlocutor produces BPCs. [sent-96, score-0.17]
61 To capture this, we compile a list of pairs of backchannel-preceding IPUs, in which the second member of each pair follows the first in the conversation and is produced by a different speaker. [sent-97, score-0.096]
62 For each feature, we calculate the Pearson’s correlation between acoustic variables extracted from the first element of each pair and the second. [sent-98, score-0.05]
63 The correlations for mean pitch and intensity are significant (r = 0. [sent-99, score-0.274]
64 These results suggest that entrainment on pitch and intensity at least is a localized phenomenon. [sent-103, score-0.937]
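The pairing and correlation described for Measure 3 can be sketched as below. The pairing rule here is one reading of the text (each backchannel-preceding IPU is paired with the next one when it comes from the other speaker), and the input format, function names, and values are hypothetical.

```python
from math import sqrt


def cross_speaker_pairs(bpc_ipus):
    """Pair each backchannel-preceding IPU with the next one when the
    speaker changes.

    bpc_ipus: chronological list of (speaker, feature_value) tuples
              (assumed input format).
    """
    return [(a[1], b[1])
            for a, b in zip(bpc_ipus, bpc_ipus[1:])
            if a[0] != b[0]]


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Illustrative sequence: the B-to-B transition is skipped.
ipus = [("A", 1.0), ("B", 2.0), ("A", 3.0), ("B", 4.0), ("B", 9.0), ("A", 5.0)]
pairs = cross_speaker_pairs(ipus)
firsts = [p[0] for p in pairs]
seconds = [p[1] for p in pairs]
r = pearson_r(firsts, seconds)
```

A positive `r` for a feature such as mean pitch would indicate that one speaker's realization tends to be followed by a similar realization from the partner.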
65 Spoken dialogue systems may exploit this information, modifying their output to invite a backchannel similar to the user’s own previous backchannel invitation. [sent-104, score-0.424]
66 4 Correlation with dialogue coordination and task success Entrainment is widely believed to be crucial to dialogue coordination. [sent-105, score-0.536]
67 Long latencies (periods of silence) before backchannels can be considered a sign of poor coordination, as when a speaker is waiting for an indication that his partner is still attending, and the partner is slow to realize this. [sent-107, score-0.389]
68 Similarly, interruptions signal poor coordination, as when a speaker has not finished what he has to say, but his partner thinks it is her turn to speak. [sent-108, score-0.419]
69 We thus use mean backchannel latency and proportion of interruptions as measures of coordination of whole sessions. [sent-109, score-0.561]
70 We use the combined score of the games the subjects played as a measure of task success. [sent-110, score-0.157]
71 We correlate all three with our two global entrainment scores and report correlation coefficients in Table 3. [sent-111, score-0.799]
72 Our first metric for identifying entrainment, Measure 1, the number of cues the speaker pair has in common, is negatively correlated with mean latency and proportion of interruptions, our two measures of poor coordination. [sent-113, score-0.555]
73 Its correlation with score, though not significant, is positive. [sent-114, score-0.028]
74 So, more entrainment in BPCs under Measure 1 means smaller latency before backchannels and fewer interruptions, while there is a tendency for such entrainment to be associated with higher scores. [sent-115, score-1.644]
75 Our second entrainment metric, Measure 2, captures the similarities between speaker means of the 10 features associated with BPCs. [sent-116, score-0.912]
76 To test correlations of this measure with task success, we collapse the ten features into a single measure by taking the negated Euclidean distance between each speaker pair’s 2 vectors of means; this measure tells us how close these speakers are across all features examined. [sent-117, score-0.428]
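The collapsed score can be written as a short sketch. It assumes each speaker is represented by a vector of feature means in the same feature order, and that the features are already on comparable scales; the text here does not say whether or how the features were normalized, and the function name is hypothetical.

```python
from math import sqrt


def entrainment_score(means_a, means_b):
    """Negated Euclidean distance between two speakers' vectors of
    feature means; values closer to zero mean more similar speakers."""
    if len(means_a) != len(means_b):
        raise ValueError("both speakers need the same number of features")
    return -sqrt(sum((a - b) ** 2 for a, b in zip(means_a, means_b)))


# Illustrative two-feature vectors (invented values).
score = entrainment_score([0.0, 0.0], [3.0, 4.0])
```

Negating the distance makes a larger score correspond to more entrainment, so a positive correlation with game score reads in the expected direction.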
77 Under this analysis, we find that Measure 2 is negatively correlated with mean latency and positively correlated with score. [sent-118, score-0.177]
78 Again, the correlation with interruptions is negative, although not significant. [sent-120, score-0.176]
79 Thus, more entrainment defined by this metric means shorter latency between turns, fewer interruptions, and, again and more strongly, higher scores. [sent-121, score-0.829]
80 We thus find that, the more entrainment at the global level, the better the coordination between the partners and the better their performance on their joint task. [sent-122, score-1.038]
81 These results provide evidence of the importance of BPC entrainment to dialogue. [sent-123, score-0.792]
82 5 Conclusion In this paper we discuss the role of entrainment in turn-taking behavior and its impact on conversational coordination and task success in the Columbia Games Corpus. [sent-124, score-1.016]
83 We examine a novel form of entrainment, entrainment in BPCs: characteristics of speech segments that are followed by backchannels from the interlocutor. [sent-125, score-0.893]
84 We employ three measures of entrainment, two global and one local, and find evidence of entrainment in all three. [sent-126, score-1.617]
85 We also find correlations between our two global entrainment measures and conversational coordination and task success. [sent-127, score-1.069]
86 In future, we will extend this analysis to the complementary turn-taking category of turn-yielding cues and explore how a spoken dialogue system may take advantage of information about entrainment to improve dialogue coordination and the user experience. [sent-128, score-1.435]
87 Amplitude convergence in children’s conversational speech with animated personas. [sent-178, score-0.136]
88 Lexical and syntactic priming and their impact in deployed spoken dialogue systems. [sent-250, score-0.233]
89 Automatically measuring lexical and acoustic/prosodic convergence in tutorial dialog corpora. [sent-256, score-0.051]
wordName wordTfidf (topN-words)
[('entrainment', 0.745), ('cues', 0.182), ('bpc', 0.168), ('bpcs', 0.168), ('ipus', 0.168), ('speaker', 0.167), ('dialogue', 0.166), ('interruptions', 0.148), ('coordination', 0.146), ('backchannel', 0.129), ('partners', 0.121), ('pitch', 0.108), ('games', 0.099), ('speakers', 0.098), ('conversation', 0.096), ('backchannels', 0.087), ('gravano', 0.086), ('intensity', 0.084), ('columbia', 0.078), ('interlocutor', 0.075), ('latency', 0.067), ('conversational', 0.065), ('cue', 0.06), ('partner', 0.057), ('brennan', 0.056), ('entrain', 0.056), ('heldner', 0.056), ('interlocutors', 0.056), ('correlations', 0.049), ('edlund', 0.049), ('evidence', 0.047), ('hirschberg', 0.044), ('speech', 0.039), ('bell', 0.039), ('measure', 0.038), ('measures', 0.038), ('aires', 0.037), ('backchannelpreceding', 0.037), ('buder', 0.037), ('buenos', 0.037), ('coulston', 0.037), ('entraining', 0.037), ('lumbi', 0.037), ('niederhoffer', 0.037), ('priming', 0.037), ('reitter', 0.037), ('pause', 0.036), ('session', 0.033), ('negotiate', 0.033), ('stoyanchev', 0.033), ('gustafson', 0.033), ('partnered', 0.033), ('mean', 0.033), ('success', 0.032), ('prosodic', 0.032), ('convergence', 0.032), ('spoken', 0.03), ('silence', 0.03), ('pickering', 0.03), ('realization', 0.029), ('switches', 0.028), ('correlation', 0.028), ('production', 0.028), ('behavior', 0.028), ('correlated', 0.026), ('voice', 0.026), ('preceding', 0.026), ('global', 0.026), ('believed', 0.026), ('floor', 0.026), ('turn', 0.025), ('negatively', 0.025), ('alter', 0.025), ('hypothesize', 0.025), ('psychology', 0.024), ('ward', 0.024), ('phenomenon', 0.023), ('df', 0.023), ('followed', 0.022), ('acoustic', 0.022), ('signal', 0.022), ('continued', 0.021), ('realize', 0.021), ('spontaneous', 0.021), ('similarity', 0.021), ('participated', 0.02), ('course', 0.02), ('played', 0.02), ('nenkova', 0.02), ('social', 0.019), ('dialog', 0.019), ('look', 0.019), ('smooth', 0.018), ('adaptation', 0.018), ('metric', 0.017), ('speaking', 0.017), 
('dimensions', 0.017), ('local', 0.016), ('georgetown', 0.016), ('slate', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 118 acl-2011-Entrainment in Speech Preceding Backchannels.
2 0.19284318 312 acl-2011-Turn-Taking Cues in a Human Tutoring Corpus
Author: Heather Friedberg
Abstract: Most spoken dialogue systems are still lacking in their ability to accurately model the complex process that is human turn-taking. This research analyzes a human-human tutoring corpus in order to identify prosodic turn-taking cues, with the hopes that they can be used by intelligent tutoring systems to predict student turn boundaries. Results show that while there was variation between subjects, three features were significant turn-yielding cues overall. In addition, a positive relationship between the number of cues present and the probability of a turn yield was demonstrated.
Author: Fabrizio Morbini ; Kenji Sagae
Abstract: Individual utterances often serve multiple communicative purposes in dialogue. We present a data-driven approach for identification of multiple dialogue acts in single utterances in the context of dialogue systems with limited training data. Our approach results in significantly increased understanding of user intent, compared to two strong baselines.
4 0.12566228 260 acl-2011-Recognizing Authority in Dialogue with an Integer Linear Programming Constrained Model
Author: Elijah Mayfield ; Carolyn Penstein Rose
Abstract: We present a novel computational formulation of speaker authority in discourse. This notion, which focuses on how speakers position themselves relative to each other in discourse, is first developed into a reliable coding scheme (0.71 agreement between human annotators). We also provide a computational model for automatically annotating text using this coding scheme, using supervised learning enhanced by constraints implemented with Integer Linear Programming. We show that this constrained model’s analyses of speaker authority correlates very strongly with expert human judgments (r2 coefficient of 0.947).
5 0.11380421 91 acl-2011-Data-oriented Monologue-to-Dialogue Generation
Author: Paul Piwek ; Svetlana Stoyanchev
Abstract: This short paper introduces an implemented and evaluated monolingual Text-to-Text generation system. The system takes monologue and transforms it to two-participant dialogue. After briefly motivating the task of monologue-to-dialogue generation, we describe the system and present an evaluation in terms of fluency and accuracy.
6 0.11197273 33 acl-2011-An Affect-Enriched Dialogue Act Classification Model for Task-Oriented Dialogue
7 0.10738858 83 acl-2011-Contrasting Multi-Lingual Prosodic Cues to Predict Verbal Feedback for Rapport
8 0.10112531 223 acl-2011-Modeling Wisdom of Crowds Using Latent Mixture of Discriminative Experts
9 0.094015718 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations
10 0.086844057 272 acl-2011-Semantic Information and Derivation Rules for Robust Dialogue Act Detection in a Spoken Dialogue System
11 0.083838798 227 acl-2011-Multimodal Menu-based Dialogue with Speech Cursor in DICO II+
12 0.083830737 95 acl-2011-Detection of Agreement and Disagreement in Broadcast Conversations
13 0.069629289 77 acl-2011-Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech
14 0.064328581 257 acl-2011-Question Detection in Spoken Conversations Using Textual Conversations
15 0.063341476 228 acl-2011-N-Best Rescoring Based on Pitch-accent Patterns
16 0.052824676 50 acl-2011-Automatic Extraction of Lexico-Syntactic Patterns for Detection of Negation and Speculation Scopes
17 0.048065618 101 acl-2011-Disentangling Chat with Local Coherence Models
18 0.040157799 157 acl-2011-I Thou Thee, Thou Traitor: Predicting Formal vs. Informal Address in English Literature
19 0.037285607 252 acl-2011-Prototyping virtual instructors from human-human corpora
20 0.036894895 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life
topicId topicWeight
[(0, 0.075), (1, 0.031), (2, -0.026), (3, 0.021), (4, -0.197), (5, 0.195), (6, -0.029), (7, -0.018), (8, 0.012), (9, 0.016), (10, 0.055), (11, -0.007), (12, 0.023), (13, 0.004), (14, 0.031), (15, -0.008), (16, -0.035), (17, 0.001), (18, 0.007), (19, -0.055), (20, 0.02), (21, -0.025), (22, -0.05), (23, 0.102), (24, 0.015), (25, 0.074), (26, 0.066), (27, -0.004), (28, -0.047), (29, -0.006), (30, -0.023), (31, 0.01), (32, 0.058), (33, 0.016), (34, -0.037), (35, -0.012), (36, 0.084), (37, -0.018), (38, -0.02), (39, 0.033), (40, -0.021), (41, -0.026), (42, 0.12), (43, -0.008), (44, 0.036), (45, 0.082), (46, -0.008), (47, 0.017), (48, 0.028), (49, 0.01)]
simIndex simValue paperId paperTitle
same-paper 1 0.95355374 118 acl-2011-Entrainment in Speech Preceding Backchannels.
2 0.91390836 312 acl-2011-Turn-Taking Cues in a Human Tutoring Corpus
Author: Heather Friedberg
Abstract: Most spoken dialogue systems are still lacking in their ability to accurately model the complex process that is human turn-taking. This research analyzes a human-human tutoring corpus in order to identify prosodic turn-taking cues, with the hopes that they can be used by intelligent tutoring systems to predict student turn boundaries. Results show that while there was variation between subjects, three features were significant turn-yielding cues overall. In addition, a positive relationship between the number of cues present and the probability of a turn yield was demonstrated.
3 0.65324408 83 acl-2011-Contrasting Multi-Lingual Prosodic Cues to Predict Verbal Feedback for Rapport
Author: Siwei Wang ; Gina-Anne Levow
Abstract: Verbal feedback is an important information source in establishing interactional rapport. However, predicting verbal feedback across languages is challenging due to language-specific differences, inter-speaker variation, and the relative sparseness and optionality of verbal feedback. In this paper, we employ an approach combining classifier weighting and SMOTE algorithm oversampling to improve verbal feedback prediction in Arabic, English, and Spanish dyadic conversations. This approach improves the prediction of verbal feedback, up to 6-fold, while maintaining a high overall accuracy. Analyzing highly weighted features highlights widespread use of pitch, with more varied use of intensity and duration.
4 0.62249684 228 acl-2011-N-Best Rescoring Based on Pitch-accent Patterns
Author: Je Hun Jeon ; Wen Wang ; Yang Liu
Abstract: In this paper, we adopt an n-best rescoring scheme using pitch-accent patterns to improve automatic speech recognition (ASR) performance. The pitch-accent model is decoupled from the main ASR system, thus allowing us to develop it independently. N-best hypotheses from recognizers are rescored by additional scores that measure the correlation of the pitch-accent patterns between the acoustic signal and lexical cues. To test the robustness of our algorithm, we use two different data sets and recognition setups: the first one is English radio news data that has pitch accent labels, but the recognizer is trained from a small amount of data and has high error rate; the second one is English broadcast news data using a state-of-the-art SRI recognizer. Our experimental results demonstrate that our approach is able to reduce word error rate relatively by about 3%. This gain is consistent across the two different tests, showing promising future directions of incorporating prosodic information to improve speech recognition.
5 0.62127364 33 acl-2011-An Affect-Enriched Dialogue Act Classification Model for Task-Oriented Dialogue
Author: Kristy Boyer ; Joseph Grafsgaard ; Eun Young Ha ; Robert Phillips ; James Lester
Abstract: Dialogue act classification is a central challenge for dialogue systems. Although the importance of emotion in human dialogue is widely recognized, most dialogue act classification models make limited or no use of affective channels in dialogue act classification. This paper presents a novel affect-enriched dialogue act classifier for task-oriented dialogue that models facial expressions of users, in particular, facial expressions related to confusion. The findings indicate that the affect-enriched classifiers perform significantly better for distinguishing user requests for feedback and grounding dialogue acts within textual dialogue. The results point to ways in which dialogue systems can effectively leverage affective channels to improve dialogue act classification.
6 0.60928988 257 acl-2011-Question Detection in Spoken Conversations Using Textual Conversations
7 0.60597485 95 acl-2011-Detection of Agreement and Disagreement in Broadcast Conversations
8 0.58662784 223 acl-2011-Modeling Wisdom of Crowds Using Latent Mixture of Discriminative Experts
10 0.56910318 260 acl-2011-Recognizing Authority in Dialogue with an Integer Linear Programming Constrained Model
11 0.55995321 272 acl-2011-Semantic Information and Derivation Rules for Robust Dialogue Act Detection in a Spoken Dialogue System
12 0.52606571 227 acl-2011-Multimodal Menu-based Dialogue with Speech Cursor in DICO II+
13 0.49941319 91 acl-2011-Data-oriented Monologue-to-Dialogue Generation
14 0.40562353 77 acl-2011-Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech
15 0.37705213 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life
16 0.36811286 249 acl-2011-Predicting Relative Prominence in Noun-Noun Compounds
17 0.32695672 306 acl-2011-Towards Style Transformation from Written-Style to Audio-Style
18 0.32471639 252 acl-2011-Prototyping virtual instructors from human-human corpora
19 0.3203195 97 acl-2011-Discovering Sociolinguistic Associations with Structured Sparsity
20 0.31937599 301 acl-2011-The impact of language models and loss functions on repair disfluency detection
topicId topicWeight
[(5, 0.032), (17, 0.487), (26, 0.018), (37, 0.036), (39, 0.017), (41, 0.075), (55, 0.014), (59, 0.025), (72, 0.018), (91, 0.023), (96, 0.115), (98, 0.038)]
simIndex simValue paperId paperTitle
1 0.94156009 107 acl-2011-Dynamic Programming Algorithms for Transition-Based Dependency Parsers
Author: Marco Kuhlmann ; Carlos Gomez-Rodriguez ; Giorgio Satta
Abstract: We develop a general dynamic programming technique for the tabulation of transition-based dependency parsers, and apply it to obtain novel, polynomial-time algorithms for parsing with the arc-standard and arc-eager models. We also show how to reverse our technique to obtain new transition-based dependency parsers from existing tabular methods. Additionally, we provide a detailed discussion of the conditions under which the feature models commonly used in transition-based parsing can be integrated into our algorithms.
2 0.89000541 19 acl-2011-A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content
Author: Gunter Neumann ; Sven Schmeier
Abstract: We present a mobile touchable application for online topic graph extraction and exploration of web content. The system has been implemented for operation on an iPad. The topic graph is constructed from N web snippets which are determined by a standard search engine. We consider the extraction of a topic graph as a specific empirical collocation extraction task where collocations are extracted between chunks. Our measure of association strength is based on the pointwise mutual information between chunk pairs which explicitly takes their distance into account. An initial user evaluation shows that this system is especially helpful for finding new interesting information on topics about which the user has only a vague idea or even no idea at all.
3 0.88983715 109 acl-2011-Effective Measures of Domain Similarity for Parsing
Author: Barbara Plank ; Gertjan van Noord
Abstract: It is well known that parsing accuracy suffers when a model is applied to out-of-domain data. It is also known that the most beneficial data to parse a given domain is data that matches the domain (Sekine, 1997; Gildea, 2001). Hence, an important task is to select appropriate domains. However, most previous work on domain adaptation relied on the implicit assumption that domains are somehow given. As more and more data becomes available, automatic ways to select data that is beneficial for a new (unknown) target domain are becoming attractive. This paper evaluates various ways to automatically acquire related training data for a given test set. The results show that an unsupervised technique based on topic models is effective – it outperforms random data selection on both languages exam- ined, English and Dutch. Moreover, the technique works better than manually assigned labels gathered from meta-data that is available for English. 1 Introduction and Motivation Previous research on domain adaptation has focused on the task of adapting a system trained on one domain, say newspaper text, to a particular new domain, say biomedical data. Usually, some amount of (labeled or unlabeled) data from the new domain was given which has been determined by a human. However, with the growth of the web, more and more data is becoming available, where each document “is potentially its own domain” (McClosky et al., 2010). It is not straightforward to determine – 1566 Gertjan van Noord University of Groningen The Netherlands G J M van Noord@ rug nl . . . . . which data or model (in case we have several source domain models) will perform best on a new (unknown) target domain. Therefore, an important issue that arises is how to measure domain similarity, i.e. whether we can find a simple yet effective method to determine which model or data is most beneficial for an arbitrary piece of new text. 
Moreover, if we had such a measure, a related question is whether it can tell us something more about what is actually meant by “domain”. So far, the term has mostly been used arbitrarily to refer to some kind of coherent unit (related to topic, style or genre), e.g. newspaper text, biomedical abstracts, questions, fiction. Most previous work on domain adaptation, for instance Hara et al. (2005), McClosky et al. (2006), Blitzer et al. (2006), Daumé III (2007), sidestepped this problem of automatic domain selection and adaptation. For parsing, to our knowledge only one recent study has started to examine this issue (McClosky et al., 2010); we will discuss their approach in Section 2. Rather, an implicit assumption of all of these studies is that domains are given, i.e. that they are represented by the respective corpora. Thus, a corpus has been considered a homogeneous unit. As more data is becoming available, it is unlikely that domains will be ‘given’. Moreover, a given corpus might not always be as homogeneous as originally thought (Webber, 2009; Lippincott et al., 2010). For instance, recent work has shown that the well-known Penn Treebank (PT) Wall Street Journal (WSJ) actually contains a variety of genres, including letters, wit and short verse (Webber, 2009). In this study we take a different approach. Rather than viewing a given corpus as a monolithic entity, we break it down to the article-level and disregard corpora boundaries. [Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pages 1566–1576, Portland, Oregon, June 19-24, 2011. ©2011 Association for Computational Linguistics]
Given the resulting set of documents (articles), we evaluate various ways to automatically acquire related training data for a given test set, to find answers to the following questions:
• Given a pool of data (a collection of articles from unknown domains) and a test article, is there a way to automatically select data that is relevant for the new domain? If so:
• Which similarity measure is good for parsing?
• How does it compare to human-annotated data?
• Is the measure also useful for other languages and/or tasks?
To this end, we evaluate measures of domain similarity and feature representations and their impact on dependency parsing accuracy. Given a collection of annotated articles, and a new article that we want to parse, we want to select the most similar articles to train the best parser for that new article. In the following, we will first compare automatic measures to human-annotated labels by examining parsing performance within subdomains of the Penn Treebank WSJ. Then, we extend the experiments to the domain adaptation scenario. Experiments were performed on two languages: English and Dutch. The empirical results show that a simple measure based on topic distributions is effective for both languages and works well also for Part-of-Speech tagging. As the approach is based on plain surface-level information (words) and it finds related data in a completely unsupervised fashion, it can be easily applied to other tasks or languages for which annotated (or automatically annotated) data is available.
2 Related Work
The work most related to ours is McClosky et al. (2010). They try to find the best combination of source models to parse data from a new domain, which is related to Plank and Sima’an (2008). In the latter, unlabeled data was used to create several parsers by weighting trees in the WSJ according to their similarity to the subdomain. McClosky et al. (2010) coined the term multiple source domain adaptation.
Inspired by work on parsing accuracy prediction (Ravi et al., 2008), they train a linear regression model to predict the best (linear interpolation) of source domain models. Similar to us, McClosky et al. (2010) regard a target domain as a mixture of source domains, but they focus on phrase-structure parsing. Furthermore, our approach differs from theirs in two respects: we do not treat source corpora as one entity and try to mix models, but rather consider articles as base units and try to find subsets of related articles (the most similar articles); moreover, instead of creating a supervised model (in their case to predict parsing accuracy), our approach is ‘simplistic’: we apply measures of domain similarity directly (in an unsupervised fashion), without the necessity to train a supervised model. Two other related studies are Lippincott et al. (2010) and Van Asch and Daelemans (2010). Van Asch and Daelemans (2010) explore a measure of domain difference (Renyi divergence) between pairs of domains and its correlation to Part-of-Speech tagging accuracy. Their empirical results show a linear correlation between the measure and the performance loss. Their goal is different, but related: rather than finding related data for a new domain, they want to estimate the loss in accuracy of a PoS tagger when applied to a new domain. We will briefly discuss results obtained with the Renyi divergence in Section 5.1. Lippincott et al. (2010) examine subdomain variation in biomedicine corpora and propose that NLP tools be made aware of such variation. However, they did not yet evaluate the effect on a practical task, thus our study is somewhat complementary to theirs. The issue of data selection has recently been examined for Language Modeling (Moore and Lewis, 2010). A subset of the available data is automatically selected as training data for a Language Model based on a scoring mechanism that compares cross-entropy scores.
Their approach considerably outperformed random selection and two previously proposed approaches, both based on perplexity scoring. [Footnote 1: We tested data selection by perplexity scoring, but found the Language Models too small to be useful in our setting.]
3 Measures of Domain Similarity
3.1 Measuring Similarity Automatically
Feature Representations. A similarity function may be defined over any set of events that are considered to be relevant for the task at hand. For parsing, these might be words, characters, n-grams (of words or characters), Part-of-Speech (PoS) tags, bilexical dependencies, syntactic rules, etc. However, to obtain more abstract types such as PoS tags or dependency relations, one would first need to gather respective labels. The necessary tools for this are again trained on particular corpora, and will suffer from domain shifts, rendering labels noisy. Therefore, we want to gauge the effect of the simplest representation possible: plain surface characteristics (unlabeled text). This has the advantage that we do not need to rely on additional supervised tools; moreover, it is interesting to know how far we can get with this level of information only. We examine the following feature representations: relative frequencies of words, relative frequencies of character tetragrams, and topic models. Our motivation was as follows. Relative frequencies of words are a simple and effective representation used e.g. in text classification (Manning and Schütze, 1999), while character n-grams have proven successful in genre classification (Wu et al., 2010). Topic models (Blei et al., 2003; Steyvers and Griffiths, 2007) can be considered an advanced model over word distributions: every article is represented by a distribution over topics, and each topic is in turn a distribution over words. Similarity between documents can be measured by comparing topic distributions.
Similarity Functions. There are many possible similarity (or distance) functions.
They fall broadly into two categories: probabilistically-motivated and geometrically-motivated functions. The similarity functions examined in this study are described in the following. The Kullback-Leibler (KL) divergence $D(q\|r)$ is a classical measure of ‘distance’ between two probability distributions [Footnote 2: It is not a proper distance metric since it is asymmetric.], and is defined as: $D(q\|r) = \sum_y q(y) \log \frac{q(y)}{r(y)}$. It is a non-negative, additive, asymmetric measure, and 0 iff the two distributions are identical. However, the KL-divergence is undefined if there exists an event $y$ such that $q(y) > 0$ but $r(y) = 0$, which is a property that “makes it unsuitable for distributions derived via maximum-likelihood estimates” (Lee, 2001). One option to overcome this limitation is to apply smoothing techniques to gather non-zero estimates for all $y$. The alternative, examined in this paper, is to consider an approximation to the KL divergence, such as the Jensen-Shannon (JS) divergence (Lin, 1991) and the skew divergence (Lee, 2001). The Jensen-Shannon divergence, which is symmetric, computes the KL-divergence between $q$, $r$, and the average between the two. We use the JS divergence as defined in Lee (2001): $JS(q, r) = \frac{1}{2}\left[D(q\|\mathrm{avg}(q, r)) + D(r\|\mathrm{avg}(q, r))\right]$. The asymmetric skew divergence $s_\alpha$, proposed by Lee (2001), mixes one distribution with the other by a degree defined by $\alpha \in [0, 1)$: $s_\alpha(q, r) = D(q\|\alpha r + (1 - \alpha) q)$. As $\alpha$ approaches 1, the skew divergence approximates the KL-divergence. An alternative way to measure similarity is to consider the distributions as vectors and apply geometrically-motivated distance functions. This family of similarity functions includes the cosine $\cos(q, r) = \frac{q(y) \cdot r(y)}{\|q(y)\|\,\|r(y)\|}$, the euclidean $\mathrm{euc}(q, r) = \sqrt{\sum_y (q(y) - r(y))^2}$ and the variational (also known as L1 or Manhattan) distance function, defined as $\mathrm{var}(q, r) = \sum_y |q(y) - r(y)|$.
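As an illustration only, the feature representations and similarity functions of Section 3.1 can be sketched in a few lines of Python. This is not the authors' implementation: the function names, the choice to represent a distribution as a dictionary from events to probabilities, and the decision to report the undefined KL case as infinity are all assumptions made for this sketch. Topic models are left out, since they require a separate training step.

```python
import math
from collections import Counter


def relative_freqs(events):
    """Relative frequency of each event (e.g. a word or a character n-gram)."""
    counts = Counter(events)
    total = sum(counts.values())
    return {event: count / total for event, count in counts.items()}


def char_ngrams(text, n=4):
    """Character n-grams of a string; n=4 yields tetragrams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def mix(q, r, alpha):
    """Pointwise mixture alpha*r + (1 - alpha)*q over the union of events."""
    return {y: alpha * r.get(y, 0.0) + (1 - alpha) * q.get(y, 0.0)
            for y in set(q) | set(r)}


def kl(q, r):
    """D(q||r) = sum_y q(y) log(q(y) / r(y)).

    The divergence is undefined when q(y) > 0 and r(y) = 0; this sketch
    reports that case as infinity (an assumption, not part of the text).
    """
    total = 0.0
    for y, qy in q.items():
        if qy == 0.0:
            continue
        ry = r.get(y, 0.0)
        if ry == 0.0:
            return math.inf
        total += qy * math.log(qy / ry)
    return total


def js(q, r):
    """Jensen-Shannon divergence: mean KL of q and r to their average."""
    avg = mix(q, r, 0.5)
    return 0.5 * (kl(q, avg) + kl(r, avg))


def skew(q, r, alpha):
    """Skew divergence D(q || alpha*r + (1 - alpha)*q), with 0 <= alpha < 1."""
    return kl(q, mix(q, r, alpha))


def cosine(q, r):
    """Cosine of the angle between q and r viewed as vectors."""
    dot = sum(qy * r.get(y, 0.0) for y, qy in q.items())
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_r = math.sqrt(sum(v * v for v in r.values()))
    return dot / (norm_q * norm_r)


def euclidean(q, r):
    """Euclidean distance between q and r."""
    return math.sqrt(sum((q.get(y, 0.0) - r.get(y, 0.0)) ** 2
                         for y in set(q) | set(r)))


def variational(q, r):
    """Variational (L1, Manhattan) distance between q and r."""
    return sum(abs(q.get(y, 0.0) - r.get(y, 0.0)) for y in set(q) | set(r))


# Two toy "articles" (invented for illustration) as word distributions.
q = relative_freqs("the parser reads the text".split())
r = relative_freqs("the tagger reads the news".split())
print(kl(q, r), js(q, r), skew(q, r, 0.99))
print(cosine(q, r), euclidean(q, r), variational(q, r))
```

On these toy inputs the KL divergence is infinite, because each article contains words the other lacks, while the JS and skew divergences stay finite. That is the practical reason the text gives for preferring them over plain KL on maximum-likelihood estimates. The value 0.99 passed to `skew` is an arbitrary choice for the example.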
3.2 Human-annotated data
In contrast to the automatic measures devised in the previous section, we might have access to human-annotated data. That is, we can use label information such as topic or genre to define the set of similar articles.
Genre. For the Penn Treebank (PT) Wall Street Journal (WSJ) section, more specifically, the subset available in the Penn Discourse Treebank, there exists a partition of the data by genre (Webber, 2009). Every article is assigned one of the following genre labels: news, letters, highlights, essays, errata, wit and short verse, quarterly progress reports, notable and quotable. This classification has been made on the basis of meta-data (Webber, 2009). It is well-known that there is no meta-data directly associated with the individual WSJ files in the Penn Treebank. However, meta-data can be obtained by looking at the articles in the ACL/DCI corpus (LDC99T42), and a mapping file that aligns document numbers of DCI (DOCNO) to WSJ keys (Webber, 2009). An example document is given in Figure 1. The metadata field HL contains headlines, SO source info, and the IN field includes topic markers.
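Label-based selection, as described in Section 3.2, amounts to grouping articles by their annotated label and taking the group of the test article as the set of similar articles. The sketch below is a hypothetical illustration: the article identifiers and their label assignments are invented, and only the genre names come from the list above.

```python
from collections import defaultdict

# Hypothetical (article id, genre label) pairs; ids and assignments are invented.
articles = [
    ("a1", "news"),
    ("a2", "letters"),
    ("a3", "news"),
    ("a4", "essays"),
]


def group_by_label(labelled):
    """Partition article ids by their human-assigned label."""
    groups = defaultdict(list)
    for article_id, label in labelled:
        groups[label].append(article_id)
    return dict(groups)


def same_label_pool(labelled, test_id):
    """Ids of articles that share the test article's label, excluding itself."""
    label = dict(labelled)[test_id]
    return [a for a, lab in labelled if lab == label and a != test_id]


print(group_by_label(articles))
print(same_label_pool(articles, "a1"))
```

Unlike the automatic measures, this selection involves no ranking: every article with the same label is treated as equally similar to the test article.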
same-paper 4 0.86621743 118 acl-2011-Entrainment in Speech Preceding Backchannels.
Author: Rivka Levitan ; Agustin Gravano ; Julia Hirschberg
Abstract: In conversation, when speech is followed by a backchannel, evidence of continued engagement by one’s dialogue partner, that speech displays a combination of cues that appear to signal to one’s interlocutor that a backchannel is appropriate. We term these cues backchannel-preceding cues (BPC)s, and examine the Columbia Games Corpus for evidence of entrainment on such cues. Entrainment, the phenomenon of dialogue partners becoming more similar to each other, is widely believed to be crucial to conversation quality and success. Our results show that speaking partners entrain on BPCs; that is, they tend to use similar sets of BPCs; this similarity increases over the course of a dialogue; and this similarity is associated with measures of dialogue coordination and task success.
5 0.85534048 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
Author: Ashish Vaswani ; Haitao Mi ; Liang Huang ; David Chiang
Abstract: Most statistical machine translation systems rely on composed rules (rules that can be formed out of smaller rules in the grammar). Though this practice improves translation by weakening independence assumptions in the translation model, it nevertheless results in huge, redundant grammars, making both training and decoding inefficient. Here, we take the opposite approach, where we only use minimal rules (those that cannot be formed out of other rules), and instead rely on a rule Markov model of the derivation history to capture dependencies between minimal rules. Large-scale experiments on a state-of-the-art tree-to-string translation system show that our approach leads to a slimmer model, a faster decoder, yet the same translation quality (measured using BLEU) as composed rules.
6 0.81172788 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction
7 0.64025742 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
8 0.62592041 30 acl-2011-Adjoining Tree-to-String Translation
9 0.5939908 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules
10 0.55739319 141 acl-2011-Gappy Phrasal Alignment By Agreement
11 0.55169302 32 acl-2011-Algorithm Selection and Model Adaptation for ESL Correction Tasks
12 0.54938966 154 acl-2011-How to train your multi bottom-up tree transducer
13 0.5425573 61 acl-2011-Binarized Forest to String Translation
14 0.54170978 296 acl-2011-Terminal-Aware Synchronous Binarization
15 0.52119333 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
16 0.51763469 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment
18 0.5114435 50 acl-2011-Automatic Extraction of Lexico-Syntactic Patterns for Detection of Negation and Speculation Scopes
19 0.51084602 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
20 0.51080239 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction