acl acl2011 acl2011-312 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Heather Friedberg
Abstract: Most spoken dialogue systems are still lacking in their ability to accurately model the complex process that is human turn-taking. This research analyzes a human-human tutoring corpus in order to identify prosodic turn-taking cues, with the hope that they can be used by intelligent tutoring systems to predict student turn boundaries. Results show that while there was variation between subjects, three features were significant turn-yielding cues overall. In addition, a positive relationship between the number of cues present and the probability of a turn yield was demonstrated.
Reference: text
sentIndex sentText sentNum sentScore
1 Most spoken dialogue systems are still lacking in their ability to accurately model the complex process that is human turn-taking. [sent-3, score-0.333]
2 This research analyzes a human-human tutoring corpus in order to identify prosodic turn-taking cues, with the hope that they can be used by intelligent tutoring systems to predict student turn boundaries. [sent-4, score-1.118]
3 Results show that while there was variation between subjects, three features were significant turn-yielding cues overall. [sent-5, score-0.452]
4 In addition, a positive relationship between the number of cues present and the probability of a turn yield was demonstrated. [sent-6, score-0.67]
5 1 Introduction Human conversation is a seemingly simple, everyday phenomenon that requires a complex mental process of turn-taking, in which participants manage to yield and hold the floor with little pause in between speaking turns. [sent-7, score-0.407]
6 Most linguists subscribe to the idea that this process is governed by a subconscious internal mechanism, that is, a set of cues or rules that steers humans toward proper turntaking (Duncan, 1972). [sent-8, score-0.418]
7 These cues may include lexical features such as the words used to end the turn, or prosodic features such as speaking rate, pitch, and intensity (Cutler and Pearson, 1986). [sent-9, score-0.665]
8 While successful turn-taking is fairly easy for humans to accomplish, it is still difficult to model and implement in spoken dialogue systems. [sent-10, score-0.355]
9 Many systems use a set time-out to decide when a user is finished speaking, often resulting in unnaturally long pauses or awkward overlaps (Ward et al.). [sent-11, score-0.029]
10 Others detect when a user interrupts the system, known as “barge-in”, though this is characteristic of failed turn-taking rather than successful conversation (Glass, 1999). [sent-14, score-0.08]
11 Improper turn-taking can often be a source of user discomfort and dissatisfaction with a spoken dialogue system. [sent-15, score-0.362]
12 Little work has been done to study turn-taking in tutoring, so we hope to investigate it further while using a human-human (HH) tutoring corpus and language technologies to extract useful information about turn-taking cues. [sent-16, score-0.347]
13 This analysis is particularly interesting in a tutoring domain because of the speculated unequal statuses of participants. [sent-17, score-0.359]
14 The goal is to eventually develop a model for turn-taking based on this analysis which can be implemented in an existing tutoring system, ITSPOKE, an intelligent tutor for college-level Newtonian physics (Litman and Silliman, 2004). [sent-18, score-0.621]
15 ITSPOKE currently uses a time-out to determine the end of a student turn and does not recognize student barge-in. [sent-19, score-0.422]
16 We hypothesize that improving upon the turn-taking model this system uses will help engage students and hopefully lead to increased student learning, a standard performance measure of intelligent tutoring systems (Litman et al.). [sent-20, score-0.614]
17 2 Related Work Turn-taking has been a recent focus in spoken dialogue system work, with research producing many different models and approaches. [sent-23, score-0.333]
18 [Proceedings of the ACL 2011 Student Session, pages 94–98, ©2011 Association for Computational Linguistics] … model, which is used to predict end-of-turn and performed significantly better than a fixed-threshold baseline in reducing endpointing latency in a spoken dialogue system. [sent-26, score-0.333]
19 Selfridge and Heeman (2010) took a different approach and presented a bidding model for turn-taking, in which dialogue participants compete for the turn based on the importance of what they will say next. [sent-27, score-0.372]
20 Of considerable inspiration to the research in this paper was Gravano and Hirschberg’s (2009) analysis of their games corpus, which showed that it was possible for turn-yielding cues to be identified in an HH corpus. [sent-28, score-0.433]
21 Since our work is similar to that done by Gravano and Hirschberg (2009), we hypothesize that turn-yielding cues can also be identified in our HH tutoring corpus. [sent-31, score-0.773]
22 However, it is possible that the cues identified will be very different, due to factors specific to a tutoring environment. [sent-32, score-0.719]
23 These include, but are not limited to, status differences between the student and tutor, engagement of the student, and the different goals of the student and tutor. [sent-33, score-0.308]
24 Our hypothesis is that for certain prosodic features, there will be a significant difference between places where students yield their turn (allow the tutor to speak) and places where they hold it (continue talking). [sent-34, score-0.97]
25 This would designate these features as turn-taking cues, and would allow them to be used as features in a turn-taking model for a spoken dialogue system in the future. [sent-35, score-0.377]
26 3 Method The data for this analysis is from an HH tutoring corpus recorded during the 2002-2003 school year. [sent-36, score-0.315]
27 This is an audio corpus of 17 university students, all native Standard English speakers, working with a tutor (the same for all subjects) on physics problems (Litman et al.). [sent-37, score-0.245]
28 Both the student and the tutor were sitting in front of separate work stations, so they could communicate only through microphones or, in the case of a student-written essay, through the shared computer environment. [sent-40, score-0.366]
29 Any potential turn-taking cues that the tutor received from the student were very comparable to what a spoken dialogue system would have to analyze during a user interaction. [sent-41, score-1.08]
30 For each participant, student speech was isolated and segmented into breath groups. [sent-42, score-0.745]
31 A breath group is defined as any segment of speech by one dialogue participant bounded by 200 ms of silence or more, based on a certain threshold of intensity (Liscombe et al.). [sent-43, score-1.097]
32 This break-down allowed for feature measurement and comparison at places that were and were not turn boundaries. [sent-46, score-0.175]
33 Although Gravano and Hirschberg (2009) segmented their corpus by 50 ms of silence, we used 200 ms to divide the breath groups, as this data had already been calculated for another experiment done with the HH corpus (Liscombe et al.). [sent-47, score-0.679]
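The segmentation just described can be sketched as follows. This is a minimal illustration, not the paper's implementation: speech is split into breath groups at silences of 200 ms or more, where "silence" means frames below an intensity threshold. The 10 ms frame size and the threshold value are assumptions made for the example.

```python
def breath_groups(intensity, frame_ms=10, silence_thresh=0.01, min_gap_ms=200):
    """Return (start_frame, end_frame) pairs of speech segments, where a
    gap of min_gap_ms or more of low-intensity frames ends a group."""
    min_gap = min_gap_ms // frame_ms          # frames of silence needed to split
    groups, start, gap = [], None, 0
    for i, v in enumerate(intensity):
        if v >= silence_thresh:               # speech frame
            if start is None:
                start = i
            gap = 0
        elif start is not None:               # silence frame inside a candidate group
            gap += 1
            if gap >= min_gap:                # gap long enough: close the group
                groups.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:                     # close a group left open at the end
        groups.append((start, len(intensity) - gap))
    return groups

# Toy example: two bursts of speech separated by 250 ms of silence (10 ms frames).
frames = [0.5] * 30 + [0.0] * 25 + [0.5] * 40
print(breath_groups(frames))                  # → [(0, 30), (55, 95)]
```

Note that shorter internal silences (under 200 ms) stay inside a single breath group; only longer gaps create a boundary.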
34 Figure 1 is a diagram of a hypothetical conversation between two participants, with examples of HOLD’s and YIELD’s labeled. [sent-52, score-0.051]
35 These groups were determined strictly by time and not by the actual speech being spoken. [sent-53, score-0.095]
36 Speech acts such as backchannels, then, would be included in the YIELD group if they were spoken during clear gaps in the tutor’s speech, but would be placed in the OVERLAP group if they occurred during or overlapping with tutor speech. [sent-54, score-0.494]
37 Four prosodic features were calculated for each breath group: duration, pitch, RMS, and percent silence. [sent-56, score-0.736]
38 Duration is the length of the breath group in seconds. [sent-57, score-0.589]
39 Pitch is the mean fundamental frequency (f0) of the speech. [sent-58, score-0.066]
40 RMS (the root mean squared amplitude) is the energy or loudness. [sent-59, score-0.655]
41 [Table 1: mean HOLD and YIELD values, significance, and N for each prosodic feature for student 111; * denotes a significant p value] [sent-66, score-0.066]
42 (Footnote 1: Many thanks to the researchers at Columbia University for providing the breath group data for this corpus.) [sent-74, score-0.076]
43 Percent silence was the amount of internal silence within the breath group. [sent-75, score-0.74]
44 For pitch and RMS, the mean was taken over the length of the breath group. [sent-76, score-0.708]
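The four per-breath-group features can be sketched as below. This is a hedged illustration, not the paper's code: it assumes a mono waveform given as a list of samples, a sample rate, and a frame-level f0 track in which 0.0 marks silent/unvoiced frames (the pitch tracker itself is outside the scope of the sketch, and using f0 = 0 as a silence proxy is an assumption).

```python
import math

def prosodic_features(samples, sample_rate, f0_track):
    """Duration, mean pitch, RMS, and percent internal silence for one breath group."""
    duration = len(samples) / sample_rate                      # seconds
    voiced = [f for f in f0_track if f > 0]
    mean_pitch = sum(voiced) / len(voiced) if voiced else 0.0  # mean f0 in Hz
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    silent = sum(1 for f in f0_track if f == 0)                # proxy for silence
    pct_silence = silent / len(f0_track)
    return {"duration": duration, "pitch": mean_pitch,
            "rms": rms, "pct_silence": pct_silence}

# One second of a toy square wave with two voiced and two silent frames.
feats = prosodic_features([1.0, -1.0] * 8000, 16000, [100.0, 0.0, 200.0, 0.0])
print(feats)  # duration 1.0 s, mean pitch 150 Hz, RMS 1.0, 50% silence
```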
45 These features were used because they are similar to those used by Gravano and Hirschberg (2009), and are already used in the spoken dialogue system we will be using (Forbes-Riley and Litman, 2011). [sent-77, score-0.355]
46 While only a small set of features is examined here, future work will include expanding the feature set. [sent-78, score-0.022]
47 Mean values for each feature for HOLD's and YIELD's were calculated and compared using Student's t-test in SPSS Statistics software. [sent-79, score-0.177]
48 Two separate tests were done, one to compare the means for each student individually, and one to compare the means across all students. [sent-80, score-0.154]
49 The p-values given are the probability of obtaining the difference between groups by chance. [sent-83, score-0.044]
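The per-student comparison amounts to a two-sample t-test between HOLD and YIELD values of each feature. The sketch below computes the pooled-variance t statistic with the standard library (the paper used SPSS; the duration values here are made up for illustration, and a p-value would additionally require the t-distribution CDF).

```python
from statistics import mean, variance

def t_statistic(a, b):
    """Pooled-variance two-sample t statistic for independent groups."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

# Hypothetical HOLD vs. YIELD durations (seconds) for one student.
hold_duration  = [2.1, 1.8, 2.5, 2.0, 1.9, 2.3]
yield_duration = [1.1, 0.9, 1.3, 1.0, 1.2, 0.8]
print(round(t_statistic(hold_duration, yield_duration), 2))  # large t → likely a cue
```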
50 These individual results indicated that while turn-taking cues could be identified, there was much variation between students. [sent-86, score-0.398]
51 Table 1 displays the results of the analysis for one subject, student 111. [sent-87, score-0.154]
52 For this student, all four prosodic features are turn-taking cues, as there is a significant difference between the HOLD and YIELD groups for all of them. [sent-88, score-0.255]
53 As shown in Table 3, multiple significant cues could be identified for most students, and only one student appeared to have no significant turn-yielding cues. [sent-90, score-0.49]
54 In this analysis, duration, pitch, and RMS were all found to be significant cues. [sent-92, score-0.032]
55 A more detailed look at each of the three significant cues is given below. [sent-95, score-0.438]
56 [Table 3: number of students by number of significant cues] Duration: The mean duration for HOLD's is longer than the mean duration for YIELD's. [sent-97, score-0.304]
57 This suggests that students speak for a longer uninterrupted time when they are trying to hold their turn, and yield their turns with shorter utterances. [sent-98, score-0.386]
58 Pitch: The mean pitch for YIELD’s is higher than the mean pitch for HOLD’s. [sent-100, score-0.42]
59 During tutoring, students are possibly more uncertain, which may raise the mean pitch of the YIELD breath groups. [sent-103, score-0.79]
60 RMS: The mean RMS, or energy, for HOLD’s is higher than the mean energy for YIELD’s. [sent-104, score-0.176]
61 This is consistent with students' speaking more softly, i.e. [sent-105, score-0.058]
62 This is consistent with the results from the Columbia games corpus (Gravano and Hirschberg, 2009). [sent-108, score-0.029]
63 2 Combining Cues Gravano and Hirschberg (2009) were able to show using their cues and corpus that there is a positive relationship between the number of turn-yielding cues present and the probability of a turn actually being taken. [sent-110, score-0.891]
64 This suggests that in order to make sure that the other participant is aware of whether the turn is going to continue or end, the speaker may subconsciously give them more information through multiple cues. [sent-111, score-0.185]
65 To see whether this relationship existed in our data, each breath group was marked with a binary value for each significant cue, representing whether the cue was present or not present within that breath group. [sent-112, score-1.188]
66 A cue was considered present if the value for that breath group was strictly closer to the student’s mean for YIELD’s than HOLD’s. [sent-113, score-0.718]
67 The number of cues present for each breath group was totaled. [sent-114, score-0.963]
68 Only the three features found to be significant cues were used for these calculations. [sent-115, score-0.78]
69 For each number of cues possible x (0 to 3, inclusively), the probability of the turn being taken was calculated by p(x) = Y / T, where Y is the number of YIELD’s with x cues present, and T is the total number of breath groups with x cues present. [sent-116, score-1.801]
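The cue-counting procedure above can be sketched as follows: a cue is "present" in a breath group when the feature value is strictly closer to the student's YIELD mean than to their HOLD mean, and p(x) = Y / T over the groups with exactly x cues present. All means and feature values below are illustrative, not taken from the corpus.

```python
def cue_present(value, hold_mean, yield_mean):
    """True if the value is strictly closer to the YIELD mean than the HOLD mean."""
    return abs(value - yield_mean) < abs(value - hold_mean)

def yield_probability(groups, cue_means):
    """groups: list of (feature_dict, is_yield); cue_means: feature -> (hold_mean, yield_mean).
    Returns p(x) = Y / T for each cue count x observed."""
    yields, totals = {}, {}
    for feats, is_yield in groups:
        x = sum(cue_present(feats[f], h, y) for f, (h, y) in cue_means.items())
        totals[x] = totals.get(x, 0) + 1
        yields[x] = yields.get(x, 0) + is_yield
    return {x: yields[x] / totals[x] for x in totals}

# Hypothetical per-student means for the three significant cues.
cue_means = {"duration": (2.1, 1.05), "pitch": (180.0, 210.0), "rms": (0.30, 0.22)}
groups = [
    ({"duration": 1.0, "pitch": 212.0, "rms": 0.21}, True),   # 3 cues, turn yielded
    ({"duration": 1.1, "pitch": 190.0, "rms": 0.29}, True),   # 1 cue, turn yielded
    ({"duration": 2.2, "pitch": 185.0, "rms": 0.31}, False),  # 0 cues, turn held
    ({"duration": 2.0, "pitch": 205.0, "rms": 0.28}, False),  # 1 cue, turn held
]
print(yield_probability(groups, cue_means))  # → {3: 1.0, 1: 0.5, 0: 0.0}
```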
70 [Figure: probability of YIELD by number of cues present] According to these results, a positive relationship seems to exist for these cues and this corpus. [sent-120, score-0.403]
71 The number of cues present and probability of a turn yield is strongly correlated (r = . [sent-122, score-0.641]
72 A regression analysis done using SPSS showed that the adjusted r2 = . [sent-125, score-0.032]
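The reported correlation between cue count and yield probability is a Pearson r over the four points (0 to 3 cues). A minimal sketch of that computation, with made-up p(x) values chosen only to illustrate it (the paper's actual r and adjusted r² values are not reproduced here):

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

num_cues = [0, 1, 2, 3]
p_yield  = [0.55, 0.70, 0.85, 0.95]   # hypothetical p(x) values
print(round(pearson_r(num_cues, p_yield), 3))  # → 0.996
```

With only four points, a high r is easy to obtain, which is consistent with the text's caution that this is preliminary support.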
73 When no turn-yielding cues are present, there is still a majority chance that the student will yield their turn; however, this is understandable due to the small number of cues being analyzed. [sent-128, score-1.055]
74 Regardless, this gives very preliminary support for the idea that it is possible to predict when a turn will be taken based on the number of cues present. [sent-129, score-0.488]
75 5 Conclusions This paper presented preliminary work in using an HH tutoring corpus to construct a turn-taking model that can later be implemented in a spoken dialogue system. [sent-130, score-0.67]
76 A small set of prosodic features was used to try and identify turn-taking cues by comparing their values at places where students yielded their turn to the tutor and places where they held it. [sent-131, score-1.061]
77 Results show that turn-taking cues such as those investigated can be identified for the corpus, and may hold predictive ability for turn boundaries. [sent-132, score-0.638]
78 While this work uncovers some interesting results in the tutoring domain, there are some shortcomings in the method that may make it difficult to effectively evaluate the results. [sent-135, score-0.315]
79 As the breath group is different from the segment used in Gravano and Hirschberg’s (2009) experiment, and the set of prosodic features is smaller, direct comparison becomes quite difficult. [sent-136, score-0.768]
80 The differences between the two methods cast enough doubt that the results cannot confidently be interpreted as contradicting prior work. [sent-137, score-0.029]
81 Thus the first line of future inquiry is to redo this method using a smaller silence boundary (50 ms) and a different set of prosodic features so that it is truly comparable to Gravano and Hirschberg's (2009) work with the games corpus. [sent-138, score-0.329]
82 This could yield interesting discoveries in the differences between the two corpora, shedding light on phenomena that are particular to tutoring scenarios. [sent-139, score-0.49]
83 Perhaps direct comparison is not entirely necessary, and instead this work should be considered an isolated look at an HH corpus that provides insight into turn-taking, specifically in tutoring and other domains with unequal power levels. [sent-144, score-0.388]
84 Future work in this direction would include growing the set of features by adding more prosodic ones and introducing lexical ones such as bi-grams and uni-grams. [sent-145, score-0.179]
85 Already, work has been done to investigate the features used in the INTERSPEECH 2009 Emotion Challenge using openSMILE (Eyben et al.). [sent-146, score-0.054]
86 When a large feature bank has been developed, significant cues will be used in conjunction with machine learning techniques to build a model for turn-taking which can be implemented in a spoken dialogue tutoring system. [sent-149, score-1.12]
87 The goal would be to learn more about human turn-taking while seeing if better turn-taking by a computer tutor ultimately leads to increased student learning in an intelligent tutoring system. [sent-150, score-0.7]
88 I would like to thank Diane Litman, my advisor, Scott Silliman, for software assistance, Joanna Drummond, for many helpful comments on this paper, and the ITSPOKE research group for their feedback on my work. [sent-152, score-0.091]
89 Some signals and rules for taking speaking turns in conversations. [sent-168, score-0.058]
90 Root causes of lost time and user stress in a simple dialog system. [sent-226, score-0.055]
wordName wordTfidf (topN-words)
[('breath', 0.498), ('cues', 0.374), ('tutoring', 0.315), ('dialogue', 0.211), ('tutor', 0.19), ('prosodic', 0.157), ('gravano', 0.157), ('student', 0.154), ('yield', 0.153), ('hirschberg', 0.147), ('litman', 0.144), ('pitch', 0.144), ('rms', 0.132), ('spoken', 0.122), ('silence', 0.121), ('hold', 0.12), ('turn', 0.114), ('hh', 0.109), ('itspoke', 0.1), ('group', 0.091), ('duration', 0.086), ('students', 0.082), ('diane', 0.08), ('participant', 0.071), ('liscombe', 0.066), ('mean', 0.066), ('places', 0.061), ('speaking', 0.058), ('conversation', 0.051), ('clemens', 0.05), ('cutler', 0.05), ('eyben', 0.05), ('friedberg', 0.05), ('opensmile', 0.05), ('silliman', 0.05), ('spss', 0.05), ('ms', 0.045), ('groups', 0.044), ('energy', 0.044), ('turntaking', 0.044), ('unequal', 0.044), ('raux', 0.044), ('selfridge', 0.044), ('interspeech', 0.042), ('intelligent', 0.041), ('cue', 0.04), ('segmented', 0.036), ('kate', 0.036), ('sigdial', 0.036), ('percent', 0.036), ('done', 0.032), ('intensity', 0.032), ('scott', 0.032), ('significant', 0.032), ('physics', 0.031), ('speak', 0.031), ('identified', 0.03), ('games', 0.029), ('relationship', 0.029), ('user', 0.029), ('isolated', 0.029), ('truly', 0.029), ('speech', 0.028), ('dialog', 0.026), ('julia', 0.026), ('participants', 0.025), ('overlap', 0.025), ('discourse', 0.024), ('audio', 0.024), ('variation', 0.024), ('ward', 0.024), ('calculated', 0.023), ('pittsburgh', 0.023), ('columbia', 0.023), ('strictly', 0.023), ('uncertainty', 0.022), ('hypothesize', 0.022), ('vanlehn', 0.022), ('microphones', 0.022), ('tiple', 0.022), ('hopes', 0.022), ('drummond', 0.022), ('joanna', 0.022), ('rivera', 0.022), ('nigel', 0.022), ('amplitude', 0.022), ('turnyielding', 0.022), ('anp', 0.022), ('discoveries', 0.022), ('oic', 0.022), ('certainness', 0.022), ('ethan', 0.022), ('bhembe', 0.022), ('dumisizwe', 0.022), ('existent', 0.022), ('bidding', 0.022), ('irf', 0.022), ('optical', 0.022), ('features', 0.022), ('implemented', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000008 312 acl-2011-Turn-Taking Cues in a Human Tutoring Corpus
Author: Heather Friedberg
Abstract: Most spoken dialogue systems are still lacking in their ability to accurately model the complex process that is human turntaking. This research analyzes a humanhuman tutoring corpus in order to identify prosodic turn-taking cues, with the hopes that they can be used by intelligent tutoring systems to predict student turn boundaries. Results show that while there was variation between subjects, three features were significant turn-yielding cues overall. In addition, a positive relationship between the number of cues present and the probability of a turn yield was demonstrated. 1
2 0.19791661 33 acl-2011-An Affect-Enriched Dialogue Act Classification Model for Task-Oriented Dialogue
Author: Kristy Boyer ; Joseph Grafsgaard ; Eun Young Ha ; Robert Phillips ; James Lester
Abstract: Dialogue act classification is a central challenge for dialogue systems. Although the importance of emotion in human dialogue is widely recognized, most dialogue act classification models make limited or no use of affective channels in dialogue act classification. This paper presents a novel affect-enriched dialogue act classifier for task-oriented dialogue that models facial expressions of users, in particular, facial expressions related to confusion. The findings indicate that the affectenriched classifiers perform significantly better for distinguishing user requests for feedback and grounding dialogue acts within textual dialogue. The results point to ways in which dialogue systems can effectively leverage affective channels to improve dialogue act classification. 1
3 0.19284318 118 acl-2011-Entrainment in Speech Preceding Backchannels.
Author: Rivka Levitan ; Agustin Gravano ; Julia Hirschberg
Abstract: In conversation, when speech is followed by a backchannel, evidence of continued engagement by one’s dialogue partner, that speech displays a combination of cues that appear to signal to one’s interlocutor that a backchannel is appropriate. We term these cues backchannel-preceding cues (BPC)s, and examine the Columbia Games Corpus for evidence of entrainment on such cues. Entrainment, the phenomenon of dialogue partners becoming more similar to each other, is widely believed to be crucial to conversation quality and success. Our results show that speaking partners entrain on BPCs; that is, they tend to use similar sets of BPCs; this similarity increases over the course of a dialogue; and this similarity is associated with measures of dialogue coordination and task success. 1
Author: Fabrizio Morbini ; Kenji Sagae
Abstract: Individual utterances often serve multiple communicative purposes in dialogue. We present a data-driven approach for identification of multiple dialogue acts in single utterances in the context of dialogue systems with limited training data. Our approach results in significantly increased understanding of user intent, compared to two strong baselines.
5 0.15276414 228 acl-2011-N-Best Rescoring Based on Pitch-accent Patterns
Author: Je Hun Jeon ; Wen Wang ; Yang Liu
Abstract: In this paper, we adopt an n-best rescoring scheme using pitch-accent patterns to improve automatic speech recognition (ASR) performance. The pitch-accent model is decoupled from the main ASR system, thus allowing us to develop it independently. N-best hypotheses from recognizers are rescored by additional scores that measure the correlation of the pitch-accent patterns between the acoustic signal and lexical cues. To test the robustness of our algorithm, we use two different data sets and recognition setups: the first one is English radio news data that has pitch accent labels, but the recognizer is trained from a small amount ofdata and has high error rate; the second one is English broadcast news data using a state-of-the-art SRI recognizer. Our experimental results demonstrate that our approach is able to reduce word error rate relatively by about 3%. This gain is consistent across the two different tests, showing promising future directions of incorporating prosodic information to improve speech recognition.
6 0.15002757 91 acl-2011-Data-oriented Monologue-to-Dialogue Generation
7 0.14697741 83 acl-2011-Contrasting Multi-Lingual Prosodic Cues to Predict Verbal Feedback for Rapport
8 0.12813532 260 acl-2011-Recognizing Authority in Dialogue with an Integer Linear Programming Constrained Model
9 0.11837313 272 acl-2011-Semantic Information and Derivation Rules for Robust Dialogue Act Detection in a Spoken Dialogue System
10 0.11482646 257 acl-2011-Question Detection in Spoken Conversations Using Textual Conversations
11 0.10713279 227 acl-2011-Multimodal Menu-based Dialogue with Speech Cursor in DICO II+
12 0.09596923 95 acl-2011-Detection of Agreement and Disagreement in Broadcast Conversations
13 0.088168502 50 acl-2011-Automatic Extraction of Lexico-Syntactic Patterns for Detection of Negation and Speculation Scopes
14 0.075754888 205 acl-2011-Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments
15 0.072929487 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations
16 0.050458558 77 acl-2011-Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech
17 0.04894872 248 acl-2011-Predicting Clicks in a Vocabulary Learning System
18 0.045815255 177 acl-2011-Interactive Group Suggesting for Twitter
19 0.045750614 149 acl-2011-Hierarchical Reinforcement Learning and Hidden Markov Models for Task-Oriented Natural Language Generation
20 0.045672107 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life
topicId topicWeight
[(0, 0.095), (1, 0.042), (2, -0.025), (3, 0.02), (4, -0.251), (5, 0.257), (6, -0.026), (7, -0.037), (8, -0.002), (9, 0.023), (10, 0.058), (11, -0.0), (12, 0.042), (13, 0.004), (14, 0.051), (15, -0.001), (16, -0.028), (17, -0.038), (18, 0.01), (19, -0.06), (20, 0.006), (21, -0.039), (22, -0.044), (23, 0.091), (24, 0.028), (25, 0.065), (26, 0.082), (27, -0.007), (28, -0.052), (29, 0.014), (30, -0.068), (31, 0.023), (32, 0.072), (33, -0.023), (34, -0.036), (35, -0.017), (36, 0.051), (37, -0.007), (38, -0.013), (39, 0.033), (40, -0.042), (41, -0.027), (42, 0.079), (43, 0.019), (44, 0.045), (45, 0.064), (46, -0.024), (47, 0.032), (48, 0.053), (49, 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.95606434 312 acl-2011-Turn-Taking Cues in a Human Tutoring Corpus
Author: Heather Friedberg
Abstract: Most spoken dialogue systems are still lacking in their ability to accurately model the complex process that is human turntaking. This research analyzes a humanhuman tutoring corpus in order to identify prosodic turn-taking cues, with the hopes that they can be used by intelligent tutoring systems to predict student turn boundaries. Results show that while there was variation between subjects, three features were significant turn-yielding cues overall. In addition, a positive relationship between the number of cues present and the probability of a turn yield was demonstrated. 1
2 0.93246263 118 acl-2011-Entrainment in Speech Preceding Backchannels.
Author: Rivka Levitan ; Agustin Gravano ; Julia Hirschberg
Abstract: In conversation, when speech is followed by a backchannel, evidence of continued engagement by one’s dialogue partner, that speech displays a combination of cues that appear to signal to one’s interlocutor that a backchannel is appropriate. We term these cues backchannel-preceding cues (BPC)s, and examine the Columbia Games Corpus for evidence of entrainment on such cues. Entrainment, the phenomenon of dialogue partners becoming more similar to each other, is widely believed to be crucial to conversation quality and success. Our results show that speaking partners entrain on BPCs; that is, they tend to use similar sets of BPCs; this similarity increases over the course of a dialogue; and this similarity is associated with measures of dialogue coordination and task success. 1
3 0.6972844 33 acl-2011-An Affect-Enriched Dialogue Act Classification Model for Task-Oriented Dialogue
Author: Kristy Boyer ; Joseph Grafsgaard ; Eun Young Ha ; Robert Phillips ; James Lester
Abstract: Dialogue act classification is a central challenge for dialogue systems. Although the importance of emotion in human dialogue is widely recognized, most dialogue act classification models make limited or no use of affective channels in dialogue act classification. This paper presents a novel affect-enriched dialogue act classifier for task-oriented dialogue that models facial expressions of users, in particular, facial expressions related to confusion. The findings indicate that the affectenriched classifiers perform significantly better for distinguishing user requests for feedback and grounding dialogue acts within textual dialogue. The results point to ways in which dialogue systems can effectively leverage affective channels to improve dialogue act classification. 1
4 0.68162912 257 acl-2011-Question Detection in Spoken Conversations Using Textual Conversations
Author: Anna Margolis ; Mari Ostendorf
Abstract: We investigate the use of textual Internet conversations for detecting questions in spoken conversations. We compare the text-trained model with models trained on manuallylabeled, domain-matched spoken utterances with and without prosodic features. Overall, the text-trained model achieves over 90% of the performance (measured in Area Under the Curve) of the domain-matched model including prosodic features, but does especially poorly on declarative questions. We describe efforts to utilize unlabeled spoken utterances and prosodic features via domain adaptation.
5 0.67559338 228 acl-2011-N-Best Rescoring Based on Pitch-accent Patterns
Author: Je Hun Jeon ; Wen Wang ; Yang Liu
Abstract: In this paper, we adopt an n-best rescoring scheme using pitch-accent patterns to improve automatic speech recognition (ASR) performance. The pitch-accent model is decoupled from the main ASR system, thus allowing us to develop it independently. N-best hypotheses from recognizers are rescored by additional scores that measure the correlation of the pitch-accent patterns between the acoustic signal and lexical cues. To test the robustness of our algorithm, we use two different data sets and recognition setups: the first one is English radio news data that has pitch accent labels, but the recognizer is trained from a small amount ofdata and has high error rate; the second one is English broadcast news data using a state-of-the-art SRI recognizer. Our experimental results demonstrate that our approach is able to reduce word error rate relatively by about 3%. This gain is consistent across the two different tests, showing promising future directions of incorporating prosodic information to improve speech recognition.
7 0.65700549 272 acl-2011-Semantic Information and Derivation Rules for Robust Dialogue Act Detection in a Spoken Dialogue System
8 0.64742535 83 acl-2011-Contrasting Multi-Lingual Prosodic Cues to Predict Verbal Feedback for Rapport
9 0.62176484 227 acl-2011-Multimodal Menu-based Dialogue with Speech Cursor in DICO II+
10 0.61262137 95 acl-2011-Detection of Agreement and Disagreement in Broadcast Conversations
11 0.59578514 260 acl-2011-Recognizing Authority in Dialogue with an Integer Linear Programming Constrained Model
12 0.58925128 91 acl-2011-Data-oriented Monologue-to-Dialogue Generation
13 0.57173872 223 acl-2011-Modeling Wisdom of Crowds Using Latent Mixture of Discriminative Experts
14 0.39678276 77 acl-2011-Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech
15 0.37522218 252 acl-2011-Prototyping virtual instructors from human-human corpora
16 0.37111157 226 acl-2011-Multi-Modal Annotation of Quest Games in Second Life
17 0.35875347 26 acl-2011-A Speech-based Just-in-Time Retrieval System using Semantic Search
18 0.35581204 306 acl-2011-Towards Style Transformation from Written-Style to Audio-Style
19 0.34665573 249 acl-2011-Predicting Relative Prominence in Noun-Noun Compounds
20 0.3396813 301 acl-2011-The impact of language models and loss functions on repair disfluency detection
topicId topicWeight
[(5, 0.031), (17, 0.079), (26, 0.017), (37, 0.051), (39, 0.018), (41, 0.119), (55, 0.019), (59, 0.031), (72, 0.03), (91, 0.029), (96, 0.124), (98, 0.367)]
simIndex simValue paperId paperTitle
1 0.80721146 22 acl-2011-A Probabilistic Modeling Framework for Lexical Entailment
Author: Eyal Shnarch ; Jacob Goldberger ; Ido Dagan
Abstract: Recognizing entailment at the lexical level is an important and commonly-addressed component in textual inference. Yet, this task has been mostly approached by simplified heuristic methods. This paper proposes an initial probabilistic modeling framework for lexical entailment, with suitable EM-based parameter estimation. Our model considers prominent entailment factors, including differences in lexical-resources reliability and the impacts of transitivity and multiple evidence. Evaluations show that the proposed model outperforms most prior systems while pointing at required future improvements. 1 Introduction and Background Textual Entailment was proposed as a generic paradigm for applied semantic inference (Dagan et al., 2006). This task requires deciding whether a tex- tual statement (termed the hypothesis-H) can be inferred (entailed) from another text (termed the textT). Since it was first introduced, the six rounds of the Recognizing Textual Entailment (RTE) challenges1 , currently organized under NIST, have become a standard benchmark for entailment systems. These systems tackle their complex task at various levels of inference, including logical representation (Tatu and Moldovan, 2007; MacCartney and Manning, 2007), semantic analysis (Burchardt et al., 2007) and syntactic parsing (Bar-Haim et al., 2008; Wang et al., 2009). Inference at these levels usually 1http://www.nist.gov/tac/2010/RTE/index.html 558 requires substantial processing and resources (e.g. parsing) aiming at high performance. Nevertheless, simple entailment methods, performing at the lexical level, provide strong baselines which most systems did not outperform (Mirkin et al., 2009; Majumdar and Bhattacharyya, 2010). Within complex systems, lexical entailment modeling is an important component. Finally, there are cases in which a full system cannot be used (e.g. lacking a parser for a targeted language) and one must resort to the simpler lexical approach. 
While lexical entailment methods are widely used, most of them apply ad hoc heuristics which do not rely on a principled underlying framework. Typically, such methods quantify the degree of lexical coverage of the hypothesis terms by the text's terms. Coverage is determined either by a direct match of identical terms in T and H or by utilizing lexical-semantic resources, such as WordNet (Fellbaum, 1998), that capture lexical entailment relations (denoted here as entailment rules). Common heuristics for quantifying the degree of coverage are setting a threshold on the percentage coverage of H's terms (Majumdar and Bhattacharyya, 2010), counting the absolute number of uncovered terms (Clark and Harrison, 2010), or applying an Information Retrieval-style vector-space similarity score (MacKinlay and Baldwin, 2009). Other works (Corley and Mihalcea, 2005; Zanzotto and Moschitti, 2006) have applied a heuristic formula to estimate the similarity between text fragments based on a similarity function between their terms. These heuristics do not capture several important aspects of entailment, such as the varying reliability of entailment resources and the impact of rule chaining and multiple evidence on entailment likelihood. An additional observation from these and other systems is that their performance improves only moderately when utilizing lexical resources2. We believe that the textual entailment field would benefit from more principled models for various entailment phenomena. Inspired by the earlier steps in the evolution of Statistical Machine Translation methods (such as the initial IBM models (Brown et al., 1993)), we formulate a concrete generative probabilistic modeling framework that captures the basic aspects of lexical entailment.
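The percentage-of-coverage heuristic described above can be sketched concretely. The following is a minimal toy illustration, not the implementation of any cited system; the function names, the rule set, and the threshold value are all invented for illustration.

```python
# Toy sketch of a percentage-of-coverage lexical entailment baseline.
# A hypothesis term is "covered" by a direct match in T or by an
# entailment rule (lhs -> rhs) whose lhs appears in T.

def covered(h_term, text_terms, rules):
    """True if h_term is directly matched or entailed by a rule from T."""
    if h_term in text_terms:
        return True
    return any(lhs in text_terms and rhs == h_term for lhs, rhs in rules)

def coverage_entails(text_terms, hyp_terms, rules, threshold=0.8):
    """Decide entailment by thresholding the fraction of covered H terms."""
    n_covered = sum(covered(h, text_terms, rules) for h in hyp_terms)
    return n_covered / len(hyp_terms) >= threshold

text = {"a", "cat", "sat"}
hyp = ["cat", "sat", "down"]
rules = {("sat", "down")}  # toy entailment rule lhs -> rhs
print(coverage_entails(text, hyp, rules))  # → True (3/3 terms covered)
```

In practice, the threshold would be tuned on a training set, as the paper does for its coverage baseline.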
Parameter estimation is addressed by an EM-based approach, which enables estimating the hidden lexical-level entailment parameters from entailment annotations that are available only at the sentence level. While heuristic methods are limited in their ability to wisely integrate indications for entailment, probabilistic methods have the advantage of being extendable and of enabling the utilization of well-founded probabilistic methods such as the EM algorithm. We compared the performance of several model variations to previously published results on RTE data sets, as well as to our own implementation of typical lexical baselines. Results show that both the probabilistic model and our percentage-coverage baseline perform favorably relative to prior art. These results support the viability of the probabilistic framework while pointing at certain modeling aspects that need to be improved.

2 Probabilistic Model

Under the lexical entailment scope, our modeling goal is obtaining a probabilistic score for the likelihood that all of H's terms are entailed by T. To that end, we model prominent aspects of lexical entailment, which were mostly neglected by previous lexical methods: (1) distinguishing different reliability levels of lexical resources; (2) allowing transitive chains of rule applications and considering their length when estimating their validity; and (3) considering multiple entailments when entailing a term.

2 See ablation test reports in http://aclweb.org/aclwiki/index.php?title=RTE Knowledge Resources#Ablation Tests

Figure 1: The generative process of entailing terms of a hypothesis from a text. Edges represent entailment rules. There are three pieces of evidence for the entailment of hi: a rule from Resource1, another one from Resource3, both suggesting that tj entails it, and a chain from t1 through an intermediate term t0.
2.1 Model Description

For T to entail H it is usually necessary, but not sufficient, that every term h ∈ H be entailed by at least one term t ∈ T (Glickman et al., 2006). Figure 1 describes the process of entailing hypothesis terms. The trivial case is when identical terms, possibly at the stem or lemma level, appear in T and H (a direct match, as tn and hm in Figure 1). Alternatively, we can establish entailment based on knowledge of entailing lexical-semantic relations, such as synonyms, hypernyms and morphological derivations, available in lexical resources (e.g. the rule inference → reasoning from WordNet). We denote by R(r) the resource which provided the rule r. Since entailment is a transitive relation, rules may compose transitive chains that connect a term t ∈ T to a term h ∈ H through intermediate terms. For instance, from the rules infer → inference and inference → reasoning we can deduce the rule infer → reasoning (where inference is the intermediate term, as t0 in Figure 1). Multiple chains may connect t to h (as for tj and hi in Figure 1) or connect several terms in T to h (as t1 and tj both indicate the entailment of hi in Figure 1), thus providing multiple evidence for h's entailment. It is reasonable to expect that if a term t indeed entails a term h, evidence for this relation is likely to be found in several resources. Taking a probabilistic perspective, we assume a parameter θR for each resource R, denoting its reliability, i.e. the prior probability that applying a rule from R corresponds to a valid entailment instance. Direct matches are considered a special "resource", called MATCH, for which θMATCH is expected to be close to 1. We now present our probabilistic model.
For a text term t ∈ T to entail a hypothesis term h by a chain c, denoted t →c h, the application of every rule r ∈ c must be valid. Note that a rule r in a chain c connects two terms (its left-hand side and its right-hand side, denoted lhs → rhs). The lhs of the first rule in c is t ∈ T and the rhs of the last rule in it is h ∈ H. We denote the event of a valid rule application by lhs →r rhs. Since a priori a rule r is valid with probability θR(r), and assuming independence of all r ∈ c, we obtain Eq. 1, which specifies the probability of the event t →c h. Next, let C(h) denote the set of chains which suggest the entailment of h. The probability that T does not entail h at all (by any chain), specified in Eq. 2, is the probability that all these chains are not valid. Finally, the probability that T entails all of H, assuming independence of H's terms, is the probability that every h ∈ H is entailed, as given in Eq. 3. Notice that there could be a term h which is not covered by any available rule chain. Under this formulation, we assume that each such h is covered by a single rule coming from a special "resource" called UNCOVERED (expecting θUNCOVERED to be relatively small).

p(t →c h) = ∏_{r∈c} p(lhs →r rhs) = ∏_{r∈c} θR(r)   (1)

p(T ↛ h) = ∏_{c∈C(h)} [1 − p(t →c h)]   (2)

p(T → H) = ∏_{h∈H} p(T → h)   (3)

As can be seen, our model indeed distinguishes varying resource reliability, decreases entailment probability as rule chains grow, and increases it when the entailment of a term is supported by multiple chains. The above treatment of uncovered terms in H, as captured in Eq. 3, assumes that their entailment probability is independent of the rest of the hypothesis.
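Under the stated independence assumptions, Eqs. 1–3 reduce to a few lines of code. The sketch below is a toy rendition, not the authors' implementation: the resource names and θ values are invented, and chains are represented simply as lists of resource names.

```python
# Toy sketch of Eqs. 1-3. A chain is a list of resource names, one per
# rule application; theta maps each resource to its reliability prior.
from math import prod

theta = {"MATCH": 0.99, "WordNet": 0.8, "CatVar": 0.7, "UNCOVERED": 0.1}

def p_chain(chain, theta):
    """Eq. 1: probability that every rule in the chain is valid."""
    return prod(theta[r] for r in chain)

def p_entails_term(chains, theta):
    """Complement of Eq. 2: P(some chain validly entails the term)."""
    if not chains:                        # uncovered term: single
        chains = [["UNCOVERED"]]          # rule from the UNCOVERED resource
    return 1 - prod(1 - p_chain(c, theta) for c in chains)

def p_entails_hypothesis(chains_per_term, theta):
    """Eq. 3: product over hypothesis terms."""
    return prod(p_entails_term(cs, theta) for cs in chains_per_term)

# Hypothesis of two terms: the first has a direct match and a two-rule
# WordNet chain; the second is uncovered.
chains_h1 = [["MATCH"], ["WordNet", "WordNet"]]
print(round(p_entails_hypothesis([chains_h1, []], theta), 4))  # → 0.0996
```

Note how the model behaves as described in the text: longer chains lower the probability (0.8² = 0.64 for the two-rule chain), multiple chains raise it, and the uncovered term drags the hypothesis-level score down.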
However, when the number of covered hypothesis terms increases, the probability that the remaining terms are actually entailed by T increases too (even though we do not have supporting knowledge for their entailment). Thus, an alternative model is to group all uncovered terms together and estimate the overall probability of their joint entailment as a function of the lexical coverage of the hypothesis. We denote by Hc the subset of H's terms which are covered by some rule chain, and by Huc the remaining uncovered part. Eq. 3a then provides a refined entailment model for H, in which the second term specifies the probability that Huc is entailed given that Hc is validly entailed and the corresponding lengths:

p(T → H) = [∏_{h∈Hc} p(T → h)] · p(T → Huc | |Hc|, |H|)   (3a)

2.2 Parameter Estimation

The difficulty in estimating the θR values is that these are term-level parameters while the RTE training entailment annotation is given at the sentence level. Therefore, we use EM-based estimation for the hidden parameters (Dempster et al., 1977). In the E step we use the current θR values to compute all whcr(T, H) values for each training pair. whcr(T, H) stands for the posterior probability that the application of the rule r in the chain c for h ∈ H is valid, given that either T entails H or not, according to the training annotation (see Eq. 4). Remember that a rule r provides an entailment relation between its left-hand side (lhs) and its right-hand side (rhs). Therefore Eq. 4 uses the notation lhs →r rhs to designate the application of the rule r (similar to Eq. 1):

E: whcr(T, H) = p(lhs →r rhs | T → H)   if T → H
   whcr(T, H) = p(lhs →r rhs | T ↛ H)   if T ↛ H   (4)

After applying Bayes' rule we get a fraction with Eq. 3 in its denominator and θR(r) as the second term of the numerator. The first numerator term is defined as in Eq. 3, except that for the corresponding rule application we substitute θR(r) by 1 (per the conditioning event).
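For intuition about Eq. 4, consider the special case of a hypothesis with one term supported by two single-rule chains, where the Bayes-rule posterior has a closed form. This toy sketch is illustrative only; the paper computes these posteriors with belief propagation, and the function name and θ values here are invented.

```python
# Closed-form E-step posterior (Eq. 4) for one hypothesis term entailed
# by two single-rule chains with reliabilities theta_r and theta_other.

def posterior_valid(theta_r, theta_other, entailing):
    """P(rule r was validly applied | sentence-level annotation)."""
    # P(T -> H): the term is entailed if either chain is valid (Eqs. 2-3)
    p_entail = 1 - (1 - theta_r) * (1 - theta_other)
    if entailing:
        # Bayes: P(T->H | r valid) = 1 here, since r alone entails the
        # single hypothesis term, so the posterior is theta_r / P(T->H).
        return theta_r / p_entail
    # If the pair is annotated as non-entailing, no chain can be valid.
    return 0.0

print(round(posterior_valid(0.8, 0.6, True), 4))  # → 0.8696
```

The posterior (0.8696) exceeds the prior (0.8), as expected: a positive sentence-level annotation is evidence that the rule application was valid.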
The probabilistic model defined by Eqs. 1–3 is a loop-free directed acyclic graphical model (i.e., a Bayesian network). Hence the E-step probabilities can be efficiently calculated using the belief propagation algorithm (Pearl, 1988). The M step uses Eq. 5 to update the parameter set: for each resource R we average the whcr(T, H) values over all its rule applications in the training data, whose total number is denoted nR.

M: θR = (1/nR) ∑_{T,H} ∑_{h∈H} ∑_{c∈C(h)} ∑_{r∈c : R(r)=R} whcr(T, H)   (5)

For Eq. 3a we also need to estimate p(T → Huc | |Hc|, |H|). This is done directly via maximum-likelihood estimation over the training set, by calculating the proportion of entailing examples within the set of all examples of a given hypothesis length (|H|) and a given number of covered terms (|Hc|). As |Hc| we take the number of identical terms (exact match) in T and H, since in almost all cases terms in H which have an exact match in T are indeed entailed. We also tried initializing the EM algorithm with these direct estimations but did not obtain performance improvements.

3 Evaluations and Results

The 5th Recognizing Textual Entailment challenge (RTE-5) introduced a new search task (Bentivogli et al., 2009) which became the main task in RTE-6 (Bentivogli et al., 2010). In this task participants should find all sentences that entail a given hypothesis in a given document cluster. This task's data sets reflect a natural distribution of entailments in a corpus and demonstrate a more realistic scenario than the previous RTE challenges. In our system, sentences are tokenized and stripped of stop words, and terms are lemmatized and tagged for part of speech.
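The M-step update of Eq. 5 is simply a per-resource average of the E-step weights. Below is a minimal sketch under that reading; the weight values are invented toy numbers standing in for E-step output, and the data layout is an assumption for illustration.

```python
# Minimal M-step sketch (Eq. 5): average the posterior rule-validity
# weights w_hcr per resource, over all rule applications in training.
from collections import defaultdict

def m_step(weighted_rule_apps):
    """weighted_rule_apps: iterable of (resource, w_hcr) pairs collected
    over all training pairs, hypothesis terms, chains and rules."""
    totals, counts = defaultdict(float), defaultdict(int)
    for resource, w in weighted_rule_apps:
        totals[resource] += w
        counts[resource] += 1        # counts[R] accumulates n_R
    return {r: totals[r] / counts[r] for r in totals}  # new theta_R

apps = [("WordNet", 0.9), ("WordNet", 0.5), ("CatVar", 0.6)]
print(m_step(apps))  # averages per resource
```

Iterating this M step with the belief-propagation E step gives the EM loop the paper describes.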
As lexical resources we use WordNet (WN) (Fellbaum, 1998), taking as entailment rules synonyms, derivations, hyponyms and meronyms of the first senses of T and H terms, and the CatVar (Categorial Variation) database (Habash and Dorr, 2003). We allow rule chains of length up to 4 in WordNet (WN4). We compare our model to two types of baselines: (1) published RTE results: the average of the best runs of all systems, the best and second-best performing lexical systems, and the best full system of each challenge; (2) our implementation of a lexical coverage model, tuning the percentage-of-coverage threshold for entailment on the training set. This model uses the same configuration as our probabilistic model. We also implemented an Information Retrieval-style baseline3 (both with and without lexical expansions), but given its poorer performance we omit its results here. Table 1 presents the results. We can see that both our implemented models (probabilistic and coverage) outperform all RTE lexical baselines on both data sets, apart from (Majumdar and Bhattacharyya, 2010), which incorporates additional lexical resources, a named entity recognizer and a co-reference system. On RTE-5, the probabilistic model is comparable in performance to the best full system, while the coverage model achieves considerably better results. We notice that our implemented models successfully utilize resources to increase performance, as opposed to the typically smaller or less consistent improvements in prior works (see Section 1).

Table 1: Evaluation results (F1%) on RTE-5 and RTE-6. Compared RTE systems: (1) (MacKinlay and Baldwin, 2009), (2) (Clark and Harrison, 2010), (3) (Mirkin et al., 2009) (2 submitted runs), (4) (Majumdar and Bhattacharyya, 2010) and (5) (Jia et al., 2010). [The table's numeric columns were not recoverable from the extraction and are omitted.]
While the probabilistic and coverage models are comparable on RTE-6 (with a non-significant advantage for the former), on RTE-5 the latter performs better, suggesting that the probabilistic model needs to be further improved. In particular, WN4 performs better than the single-step WN only on RTE-5, suggesting the need to improve the modeling of chaining. The fluctuations over the data sets and the varying impacts of resources suggest the need for further investigation over additional data sets and resources. As for the coverage model, under our configuration it poses a bigger challenge for RTE systems than previously reported baselines. It is thus proposed as an easy-to-implement baseline for future entailment research.

3 Utilizing the Lucene search engine (http://lucene.apache.org)

4 Conclusions and Future Work

This paper presented, for the first time, a principled and relatively rich probabilistic model for lexical entailment, amenable to estimation of hidden lexical-level parameters from standard sentence-level annotations. The positive results of the probabilistic model compared to prior art, and its ability to exploit lexical resources, indicate its future potential. Yet further investigation is needed. For example, analyzing the current model's limitations, we observed that the multiplicative nature of Eqs. 1 and 3 (reflecting independence assumptions) is too restrictive, resembling a logical AND. Accordingly, we plan to explore relaxing this strict conjunctive behavior through models such as noisy-AND (Pearl, 1988). We also intend to explore the contribution of our model, and particularly its estimated parameter values, within a complex system that integrates multiple levels of inference.

Acknowledgments: This work was partially supported by the NEGEV Consortium of the Israeli Ministry of Industry, Trade and Labor (www.negev-initiative.org), the PASCAL-2 Network of Excellence of the European Community FP7-ICT-2007-1-216886, the FIRB-Israel research project N.
RBIN045PXH and by the Israel Science Foundation grant 1112/08.

References

Roy Bar-Haim, Jonathan Berant, Ido Dagan, Iddo Greental, Shachar Mirkin, Eyal Shnarch, and Idan Szpektor. 2008. Efficient semantic deduction and approximate matching over compact parse forests. In Proceedings of Text Analysis Conference (TAC).
Luisa Bentivogli, Ido Dagan, Hoa Trang Dang, Danilo Giampiccolo, and Bernardo Magnini. 2009. The fifth PASCAL recognizing textual entailment challenge. In Proceedings of Text Analysis Conference (TAC).
Luisa Bentivogli, Peter Clark, Ido Dagan, Hoa Trang Dang, and Danilo Giampiccolo. 2010. The sixth PASCAL recognizing textual entailment challenge. In Proceedings of Text Analysis Conference (TAC).
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2):263–311, June.
Aljoscha Burchardt, Nils Reiter, Stefan Thater, and Anette Frank. 2007. A semantic approach to textual entailment: System evaluation and task analysis. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing.
Peter Clark and Phil Harrison. 2010. BLUE-Lite: a knowledge-based lexical entailment system for RTE6. In Proceedings of Text Analysis Conference (TAC).
Courtney Corley and Rada Mihalcea. 2005. Measuring the semantic similarity of texts. In Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment.
Ido Dagan, Oren Glickman, and Bernardo Magnini. 2006. The PASCAL recognising textual entailment challenge. In Lecture Notes in Computer Science, volume 3944, pages 177–190.
A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38.
Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database (Language, Speech, and Communication). The MIT Press.
Oren Glickman, Eyal Shnarch, and Ido Dagan. 2006. Lexical reference: a semantic matching subtask. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 172–179. Association for Computational Linguistics.
Nizar Habash and Bonnie Dorr. 2003. A categorial variation database for English. In Proceedings of the North American Chapter of the Association for Computational Linguistics.
Houping Jia, Xiaojiang Huang, Tengfei Ma, Xiaojun Wan, and Jianguo Xiao. 2010. PKUTM participation at TAC 2010 RTE and summarization track. In Proceedings of Text Analysis Conference (TAC).
Bill MacCartney and Christopher D. Manning. 2007. Natural logic for textual inference. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing.
Andrew MacKinlay and Timothy Baldwin. 2009. A baseline approach to the RTE5 search pilot. In Proceedings of Text Analysis Conference (TAC).
Debarghya Majumdar and Pushpak Bhattacharyya. 2010. Lexical based text entailment system for main task of RTE6. In Proceedings of Text Analysis Conference (TAC).
Shachar Mirkin, Roy Bar-Haim, Jonathan Berant, Ido Dagan, Eyal Shnarch, Asher Stern, and Idan Szpektor. 2009. Addressing discourse and document structure in the RTE search task. In Proceedings of Text Analysis Conference (TAC).
Judea Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
Marta Tatu and Dan Moldovan. 2007. COGEX at RTE 3. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing.
Rui Wang, Yi Zhang, and Guenter Neumann. 2009. A joint syntactic-semantic representation for recognizing textual relatedness. In Proceedings of Text Analysis Conference (TAC).
Fabio Massimo Zanzotto and Alessandro Moschitti. 2006. Automatic learning of textual entailments with cross-pair similarities. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics.
same-paper 2 0.76801276 312 acl-2011-Turn-Taking Cues in a Human Tutoring Corpus
Author: Heather Friedberg
Abstract: Most spoken dialogue systems are still lacking in their ability to accurately model the complex process that is human turntaking. This research analyzes a humanhuman tutoring corpus in order to identify prosodic turn-taking cues, with the hopes that they can be used by intelligent tutoring systems to predict student turn boundaries. Results show that while there was variation between subjects, three features were significant turn-yielding cues overall. In addition, a positive relationship between the number of cues present and the probability of a turn yield was demonstrated. 1
3 0.65866804 282 acl-2011-Shift-Reduce CCG Parsing
Author: Yue Zhang ; Stephen Clark
Abstract: CCGs are directly compatible with binary-branching bottom-up parsing algorithms, in particular CKY and shift-reduce algorithms. While the chart-based approach has been the dominant approach for CCG, the shift-reduce method has been little explored. In this paper, we develop a shift-reduce CCG parser using a discriminative model and beam search, and compare its strengths and weaknesses with the chart-based C&C parser. We study different errors made by the two parsers, and show that the shift-reduce parser gives competitive accuracies compared to C&C. Considering our use of a small beam, and given the high ambiguity levels in an automatically-extracted grammar and the amount of information in the CCG lexical categories which form the shift actions, this is a surprising result.
Author: Michael Mohler ; Razvan Bunescu ; Rada Mihalcea
Abstract: In this work we address the task of computerassisted assessment of short student answers. We combine several graph alignment features with lexical semantic similarity measures using machine learning techniques and show that the student answers can be more accurately graded than if the semantic measures were used in isolation. We also present a first attempt to align the dependency graphs of the student and the instructor answers in order to make use of a structural component in the automatic grading of student answers.
5 0.58545041 161 acl-2011-Identifying Word Translations from Comparable Corpora Using Latent Topic Models
Author: Ivan Vulic ; Wim De Smet ; Marie-Francine Moens
Abstract: A topic model outputs a set of multinomial distributions over words for each topic. In this paper, we investigate the value of bilingual topic models, i.e., a bilingual Latent Dirichlet Allocation model for finding translations of terms in comparable corpora without using any linguistic resources. Experiments on a document-aligned English-Italian Wikipedia corpus confirm that the developed methods which only use knowledge from word-topic distributions outperform methods based on similarity measures in the original word-document space. The best results, obtained by combining knowledge from wordtopic distributions with similarity measures in the original space, are also reported.
6 0.52707356 33 acl-2011-An Affect-Enriched Dialogue Act Classification Model for Task-Oriented Dialogue
7 0.52323699 112 acl-2011-Efficient CCG Parsing: A* versus Adaptive Supertagging
9 0.50897527 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment
10 0.47835806 207 acl-2011-Learning to Win by Reading Manuals in a Monte-Carlo Framework
11 0.46569529 118 acl-2011-Entrainment in Speech Preceding Backchannels.
12 0.46311188 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction
14 0.4600462 56 acl-2011-Bayesian Inference for Zodiac and Other Homophonic Ciphers
15 0.45956928 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
16 0.45776349 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
17 0.45663399 236 acl-2011-Optimistic Backtracking - A Backtracking Overlay for Deterministic Incremental Parsing
18 0.45406696 94 acl-2011-Deciphering Foreign Language
19 0.45312479 219 acl-2011-Metagrammar engineering: Towards systematic exploration of implemented grammars
20 0.45278367 128 acl-2011-Exploring Entity Relations for Named Entity Disambiguation