acl acl2010 acl2010-227 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Myroslava O. Dzikovska ; Johanna D. Moore ; Natalie Steinhauser ; Gwendolyn Campbell
Abstract: Supporting natural language input may improve learning in intelligent tutoring systems. However, interpretation errors are unavoidable and require an effective recovery policy. We describe an evaluation of an error recovery policy in the BEETLE II tutorial dialogue system and discuss how different types of interpretation problems affect learning gain and user satisfaction. In particular, the problems arising from student use of non-standard terminology appear to have negative consequences. We argue that existing strategies for dealing with terminology problems are insufficient and that improving such strategies is important in future ITS research.
Reference: text
sentIndex sentText sentNum sentScore
1 The impact of interpretation problems on tutorial dialogue Myroslava O. [sent-1, score-0.64]
2 Abstract Supporting natural language input may improve learning in intelligent tutoring systems. [sent-10, score-0.287]
3 However, interpretation errors are unavoidable and require an effective recovery policy. [sent-11, score-0.424]
4 We describe an evaluation of an error recovery policy in the BEETLE II tutorial dialogue system and discuss how different types of interpretation problems affect learning gain and user satisfaction. [sent-12, score-1.203]
5 In particular, the problems arising from student use of non-standard terminology appear to have negative consequences. [sent-13, score-0.593]
6 We argue that existing strategies for dealing with terminology problems are insufficient and that improving such strategies is important in future ITS research. [sent-14, score-0.342]
7 1 Introduction There is a mounting body of evidence that student self-explanation and contentful talk in human-human tutorial dialogue are correlated with increased learning gain (Chi et al. [sent-15, score-1.016]
8 Thus, computer tutors that understand student explanations have the potential to improve student learning (Graesser et al. [sent-18, score-0.811]
9 However, understanding and correctly assessing the student’s contributions is a difficult problem due to the wide range of variation observed in student input, and especially due to students’ sometimes vague and incorrect use of domain terminology. [sent-23, score-0.522]
10 Many tutorial dialogue systems limit the range of student input by asking short-answer questions. [sent-24, score-0.774]
11 This provides a measure of robustness, and previous evaluations of ASR in spoken tutorial dialogue systems indicate that neither word error rate nor concept error rate in such systems affect learning gain (Litman and Forbes-Riley, 2005; Pon-Barry et al. [sent-25, score-0.793]
12 However, limiting the range of possible input limits the contentful talk that the students are expected to produce, and therefore may limit the overall effectiveness of the system. [sent-27, score-0.31]
13 Most of the existing tutoring systems that accept unrestricted language input use classifiers based on statistical text similarity measures to match student answers to open-ended questions with pre-authored anticipated answers (Graesser et al. [sent-28, score-0.862]
14 While such systems are robust to unexpected terminology, they provide only a very coarse-grained assessment of student answers. [sent-32, score-0.407]
15 Recent research aims to develop methods that produce detailed analyses of student input, including correct, incorrect and missing parts (Nielsen et al. [sent-33, score-0.478]
16 , 2008), because the more detailed assessments can help tailor tutoring to the needs of individual students. [sent-35, score-0.314]
17 While the detailed assessments of answers to open-ended questions are intended to improve potential learning, they also increase the probability of misunderstandings, which negatively impact tutoring and therefore negatively impact student learning (Jordan et al. [sent-36, score-1.193]
18 Thus, appropriate error recovery strategies are crucially important for tutorial dialogue applications. [sent-38, score-0.755]
19 We describe an evaluation of an implemented tutorial dialogue system which aims to accept unrestricted student input and limit misunderstandings by rejecting low-confidence interpretations and employing a range of error recovery strategies depending on the cause of interpretation failure. [sent-39, score-1.399]
20 By comparing two different system policies, we demonstrate that with less restricted language input the rate of non-understanding errors impacts both learning gain and user satisfaction, and that problems arising from incorrect use of terminology have a particularly negative impact. [sent-40, score-0.646]
21 At the same time, students appear to be aware that the system does not fully understand them even if it accepts their input without indicating that it is having interpretation problems, and this is reflected in decreased user satisfaction. [sent-45, score-0.548]
22 We argue that this indicates that we need better strategies for dealing with terminology problems, and that accepting non-standard terminology without explicitly addressing the difference in acceptable phrasing may not be sufficient for effective tutoring. [sent-46, score-0.564]
23 In Section 2 we describe our tutoring system, and the two tutoring policies implemented for the experiment. [sent-47, score-0.546]
24 Finally, in Section 4 we discuss the implications of our results for error recovery policies in tutorial dialogue systems. [sent-49, score-0.767]
25 , 2010), a tutorial dialogue system which provides tutoring in basic electricity and electronics. [sent-51, score-0.732]
26 BEETLE II uses a deep parser together with a domain-specific diagnoser to process student input, and a deep generator to produce tutorial feedback automatically depending on the current tutorial policy. [sent-53, score-0.781]
27 It also implements an error recovery policy to deal with interpretation problems. [sent-54, score-0.534]
28 While typing removes the uncertainty and errors involved in speech recognition, expected student answers are considerably more complex and varied than in a typical spoken dialogue system. [sent-56, score-0.823]
29 Therefore, a significant number of interpretation errors arise, primarily during the semantic interpretation process. [sent-57, score-0.406]
30 Our approach to selecting an error recovery policy is to prefer non-understandings to misunderstandings. [sent-59, score-0.375]
31 There is a known trade-off in spoken dialogue systems between allowing misunderstandings, i. [sent-60, score-0.257]
32 , cases in which a system accepts and acts on an incorrect interpretation of an utterance, and non-understandings, i. [sent-62, score-0.321]
33 Since misunderstandings on the part of a computer tutor are known to negatively impact student learning, and since in human-human tutorial dialogue the majority of student responses using unexpected terminology are classified as incorrect (Jordan et al. [sent-65, score-1.925]
34 , 2009), it would be a reasonable approach for a tutorial dialogue system to deal with potential interpretation problems by treating low-confidence interpretations as non-understandings and focusing on an effective non-understanding recovery policy. [sent-66, score-0.835]
35 All student utterances are passed through the standard interpretation pipeline, so that the results can be analyzed later. [sent-69, score-0.564]
36 However, the system does not attempt to address the student content. [sent-70, score-0.418]
37 Instead, regardless of the answer analysis, the system always uses a neutral acceptance and bottom out strategy, giving the student the correct answer every time, e. [sent-71, score-0.838]
38 One way to phrase the correct answer is: the open switch creates a gap in the circuit”. [sent-74, score-0.262]
39 Thus, the students are never given any indication of whether they have been understood or not. [sent-75, score-0.234]
40 The full policy acts differently depending on the analysis of the student answer. [sent-76, score-0.502]
41 For correct answers, it acknowledges the answer as correct and optionally restates it (see (Dzikovska et al. [sent-77, score-0.315]
42 For incorrect answers, it restates the correct portion of the answer (if any) and provides a hint to guide the student towards the completely correct answer. [sent-79, score-0.839]
43 The content of the bottom out is the same as in the baseline, except that the full system indicates clearly that the answer was incorrect or was not understood, e. [sent-83, score-0.399]
44 The help messages are based on the TargetedHelp approach successfully used in spoken dialogue (Hockey et al. [sent-87, score-0.349]
45 , 2003), together with the error classification we developed for tutorial dialogue (Dzikovska et al. [sent-88, score-0.52]
46 The goal of the help messages is to give the student as much information as possible as to why the system failed to understand them but without giving away the answer. [sent-91, score-0.547]
47 In comparing the two policies, we would expect that the students in both conditions would learn something, but that the learning gain and user satisfaction would be affected by the difference in policies. [sent-92, score-0.497]
48 We hypothesized that students who receive feedback on their errors in the full condition would learn more compared to those in the baseline condition. [sent-93, score-0.372]
49 Each session lasted approximately 4 hours, with 232 student language turns in FULL (SD = 25. [sent-97, score-0.367]
50 In informal comments after the session many students said that they were frustrated when the system said that it did not understand them. [sent-117, score-0.322]
51 However, some students in BASE also mentioned that they sometimes were not sure if the system’s answer was correcting a problem with their answer, or simply phrasing it in a different way. [sent-118, score-0.489]
52 We used mean frequency of non-interpretable utterances (out of all student utterances in each session) to evaluate the effectiveness of the two different policies. [sent-119, score-0.516]
53 The frequency of non-understandings was negatively correlated with learning gain in FULL: r = −0. [sent-121, score-0.43]
54 The frequency of non-understandings was negatively correlated with user satisfaction: FULL r = −0. [sent-128, score-0.357]
55 Thus, even though in BASE the system did not indicate non-understanding, students were negatively affected. [sent-133, score-0.476]
56 That is, they were not satisfied with the policy that did not directly address the interpretation problems. [sent-134, score-0.244]
57 We investigated the effect of different types of interpretation errors using two criteria. [sent-136, score-0.247]
58 The reduced frequency means that the recovery strategy for this particular error type is effective in reducing the error frequency. [sent-138, score-0.487]
59 Second, we looked for the cases where the frequency of a given error type is negatively correlated with either learning gain or user satisfaction. [sent-139, score-0.61]
60 An irrelevant answer error occurs when the student makes a statement that uses domain terminology ... (Footnote 2: We do not know the percentage of misunderstandings or concept error rate as yet.) [sent-143, score-0.986]
61 For example, the expected answer to “In circuit 1, which components are in a closed path? [sent-148, score-0.278]
62 ” If that happens, in FULL the system says “Sorry, this isn’t the form of answer that I expected. [sent-151, score-0.279]
63 I am looking for a component”, pointing out to the student the kind of information it is looking for. [sent-152, score-0.367]
64 The BASE system for this error, and for all other errors discussed below, gives away the correct answer without indicating that there was a problem with interpreting the student’s utterance, e. [sent-153, score-0.372]
65 The no appr terms error happens when the student is using terminology inappropriate for the lesson in general. [sent-156, score-0.422]
66 If instead the student says “Voltage is electricity”, FULL responds with “I am sorry, I am having trouble understanding. [sent-160, score-0.44]
67 We had hoped that by telling the student that the content of their utterance is outside the domain as understood by the system, and hinting at the correct terms to use, the system would guide students towards a better answer. [sent-164, score-0.765]
68 Selectional restr failure errors are typically due to incorrect terminology, when the students phrased answers in a way that contradicted the sys- with p <= 0. [sent-165, score-0.626]
69 So if the student says “The path is damaged”, the FULL system would respond with “I am sorry, I am having trouble understanding. [sent-169, score-0.459]
70 ” Program errors were caused by faults in the underlying network software, but usually occurred when the student was using extremely long and complicated utterances. [sent-172, score-0.48]
71 Out of the four important error types described above, only the strategy for irrelevant answer was effective: the frequency of irrelevant answer errors is significantly higher in BASE (t-test, p < 0. [sent-173, score-0.781]
72 05), and it is negatively correlated with learning gain in BASE. [sent-174, score-0.393]
73 However, one other finding is particularly interesting: the frequency of no appr terms errors is negatively correlated with user satisfaction in BASE. [sent-176, score-0.648]
74 This indicates that simply accepting the student’s answer when they are using incorrect terminology and exposing them to the correct answer is not the best strategy, possibly because the students are noticing the unexplained lack of alignment between their utterance and the system’s answer. [sent-177, score-1.085]
75 4 Discussion and Future Work As discussed in Section 1, previous studies of short-answer tutorial dialogue systems produced a counter-intuitive result: measures of interpretation accuracy were not correlated with learning gain. [sent-178, score-0.665]
76 With less restricted language, misunderstandings negatively affected learning. [sent-179, score-0.336]
77 Our study provides further evidence that interpretation quality significantly affects learning gain in tutorial dialogue. [sent-180, score-0.469]
78 Moreover, while it has long been known that user satisfaction is negatively correlated with interpretation error rates in spoken dialogue, this is the first attempt to evaluate the impact of different types of interpretation errors on task success and usability of a tutoring system. [sent-181, score-1.297]
79 In our system, all of the error types negatively correlated with learning gain stem from the same underlying problem: the use of incorrect or vague terminology by the student. [sent-183, score-0.846]
80 With the exception of the irrelevant answer strategy, the targeted help strategies we implemented were not effective in reducing error frequency or improving learning gain. [sent-184, score-0.533]
81 One possibility is that irrelevant answer was easier to remediate compared to other error types. [sent-186, score-0.361]
82 Help messages for other error types were more frequent when the expected answer was a complex sentence, and multiple possible ways of phrasing the correct answer were acceptable. [sent-191, score-0.646]
83 One way to improve the help messages may be to have the system indicate more clearly when user terminology is a problem. [sent-193, score-0.395]
84 Our system apologized each time there was a non-understanding, leading students to believe that they may be answering correctly but the answer is not being understood. [sent-194, score-0.472]
85 Together with an appropriate mechanism to detect paraphrases of correct answers (as opposed to vague answers whose correctness is difficult to determine), this approach could be more beneficial in helping students learn. [sent-197, score-0.546]
86 However, our results also indicate that simply accepting the incorrect terminology may not be the best strategy. [sent-203, score-0.364]
87 Users appear to be sensitive when the system’s language does not align with their terminology, as reflected in the decreased satisfaction ratings associated with higher rates of incorrect terminology problems in BASE. [sent-204, score-0.43]
88 Moreover, prior analysis of human-human data indicates that tutors use different restate strategies depending on the “quality” of the student answers, even if they are accepting them as correct (Dzikovska et al. [sent-205, score-0.579]
89 Together, these point at an important unaddressed issue: existing systems are often built on the assumption that only incorrect and missing parts of the student answer should be remediated, and a wide range of terminology should be accepted (Graesser et al. [sent-207, score-0.85]
90 While it is obviously important for the system to accept a range of different phrasings, our analysis indicates that this may not be sufficient by itself, and students could potentially benefit from addressing the terminology issues with a specifically devised strategy. [sent-210, score-0.505]
91 Finally, it could also be possible that some differences between strategy effectiveness were caused by incorrect error type classification. [sent-211, score-0.307]
92 Manual examination of several dialogues suggests that most of the errors are assigned to the appropriate type, though in some cases incorrect syntactic parses resulted in unexpected interpretation errors, causing the system to give a confusing help message. [sent-212, score-0.496]
93 - An investigation of non-understanding errors and recovery strategies. [sent-230, score-0.265]
94 Beetle II: a system for tutoring and computational linguistics experimentation. [sent-262, score-0.289]
95 Targeted help for spoken dialogue systems: intelligent feedback improves naive users’ performance. [sent-275, score-0.353]
96 Combining competing language understanding approaches in an intelligent tutoring system. [sent-280, score-0.287]
97 Evidence of misunderstandings in tutorial dialogue and their impact on learning. [sent-290, score-0.585]
98 Speech recognition performance and learning in spoken dialogue tutoring. [sent-294, score-0.257]
99 Assessing forward-, reverse-, and average-entailment indices on natural language input from the intelligent tutoring system, iSTART. [sent-305, score-0.287]
100 Content-learning correlations in spoken tutoring dialogs at word, turn and discourse levels. [sent-321, score-0.295]
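Sentences 52 to 54 above describe the evaluation measure: the frequency of non-interpretable utterances out of all student utterances in a session, correlated with learning gain and user satisfaction. The following is a minimal sketch of that computation, not the authors' analysis code. The function names, the per-utterance boolean flags, and the example numbers are all hypothetical, and Pearson's r is assumed as the correlation coefficient because the extracted text reports values as "r".

```python
from math import sqrt


def non_understanding_rate(flags):
    """Fraction of a session's student utterances flagged as non-interpretable.

    `flags` is a hypothetical list of booleans, one per student utterance.
    """
    return sum(flags) / len(flags) if flags else 0.0


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up sessions for illustration only: (utterance flags, learning gain).
    sessions = [
        ([False, False, True, False], 0.70),
        ([True, True, False, False], 0.45),
        ([True, True, True, False], 0.30),
    ]
    rates = [non_understanding_rate(flags) for flags, _ in sessions]
    gains = [gain for _, gain in sessions]
    print(pearson_r(rates, gains))
```

Running the same function once per condition (FULL and BASE) and once per outcome (learning gain, satisfaction) would give the four correlations the text refers to; a significance test would still be needed on top of r.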
wordName wordTfidf (topN-words)
[('student', 0.367), ('tutoring', 0.238), ('students', 0.234), ('dzikovska', 0.218), ('tutorial', 0.207), ('dialogue', 0.2), ('negatively', 0.191), ('answer', 0.187), ('terminology', 0.185), ('recovery', 0.177), ('interpretation', 0.159), ('misunderstandings', 0.145), ('error', 0.113), ('answers', 0.111), ('incorrect', 0.111), ('gain', 0.103), ('correlated', 0.099), ('elaine', 0.099), ('myroslava', 0.099), ('satisfaction', 0.093), ('litman', 0.092), ('graesser', 0.091), ('circuit', 0.091), ('sorry', 0.091), ('errors', 0.088), ('policy', 0.085), ('jordan', 0.084), ('flairs', 0.08), ('beetle', 0.079), ('farrow', 0.079), ('gwendolyn', 0.079), ('natalie', 0.079), ('steinhauser', 0.079), ('tutor', 0.079), ('appr', 0.073), ('policies', 0.07), ('johanna', 0.069), ('sd', 0.068), ('accepting', 0.068), ('phrasing', 0.068), ('utterance', 0.067), ('user', 0.067), ('diane', 0.064), ('irrelevant', 0.061), ('aied', 0.059), ('callaway', 0.059), ('strategies', 0.058), ('spoken', 0.057), ('grove', 0.054), ('pamela', 0.054), ('base', 0.053), ('system', 0.051), ('hockey', 0.051), ('lesson', 0.051), ('coconut', 0.051), ('moore', 0.05), ('full', 0.05), ('intelligent', 0.049), ('campbell', 0.048), ('help', 0.047), ('strategy', 0.047), ('correct', 0.046), ('hint', 0.046), ('bohus', 0.045), ('restr', 0.045), ('messages', 0.045), ('vague', 0.044), ('says', 0.041), ('florida', 0.041), ('problems', 0.041), ('unexpected', 0.04), ('aleven', 0.04), ('bulbs', 0.04), ('contentful', 0.04), ('kurt', 0.04), ('makatchev', 0.04), ('tutors', 0.04), ('voltage', 0.04), ('iam', 0.04), ('utterances', 0.038), ('frequency', 0.037), ('understand', 0.037), ('failure', 0.037), ('restates', 0.036), ('electricity', 0.036), ('nielsen', 0.036), ('damaged', 0.036), ('effectiveness', 0.036), ('charles', 0.036), ('accept', 0.035), ('maxim', 0.034), ('brighton', 0.034), ('purandare', 0.034), ('impact', 0.033), ('responds', 0.032), ('education', 0.032), ('selectional', 0.031), ('targeted', 0.03), ('assessments', 0.029), 
('switch', 0.029)]
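The page does not say how the word weights above were computed. As a sketch only, assuming a standard tf-idf weighting (raw term count times log(N / document frequency)) followed by L2 normalisation, a ranked top-N list of this shape could be produced as follows. The function name `tfidf_top_words` and the toy corpus are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def tfidf_top_words(doc_tokens, corpus, top_n=5):
    """Rank the words of one document by L2-normalised tf-idf weight.

    `corpus` is a list of token lists and is assumed to include the
    document itself, so every word has a document frequency of at least 1.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each word once per document

    term_freq = Counter(doc_tokens)
    raw = {
        word: count * log(n_docs / doc_freq[word])
        for word, count in term_freq.items()
        if doc_freq[word] > 0
    }
    norm = sqrt(sum(value * value for value in raw.values()))
    if norm == 0:
        return []
    weights = [(word, value / norm) for word, value in raw.items() if value > 0]
    # Highest weight first; ties broken alphabetically for a stable order.
    return sorted(weights, key=lambda pair: (-pair[1], pair[0]))[:top_n]


if __name__ == "__main__":
    toy_corpus = [
        ["student", "tutoring", "student", "dialogue"],
        ["dialogue", "policy", "reward"],
        ["parser", "dialogue", "tutoring"],
    ]
    print(tfidf_top_words(toy_corpus[0], toy_corpus, top_n=3))
```

Words that occur in every document get an idf of zero under this weighting and drop out of the list, which is consistent with very common words being absent from the list above, though that alone does not confirm the scheme used.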
simIndex simValue paperId paperTitle
same-paper 1 1.0000008 227 acl-2010-The Impact of Interpretation Problems on Tutorial Dialogue
Author: Myroslava O. Dzikovska ; Johanna D. Moore ; Natalie Steinhauser ; Gwendolyn Campbell
Abstract: Supporting natural language input may improve learning in intelligent tutoring systems. However, interpretation errors are unavoidable and require an effective recovery policy. We describe an evaluation of an error recovery policy in the BEETLE II tutorial dialogue system and discuss how different types of interpretation problems affect learning gain and user satisfaction. In particular, the problems arising from student use of non-standard terminology appear to have negative consequences. We argue that existing strategies for dealing with terminology problems are insufficient and that improving such strategies is important in future ITS research.
2 0.73044652 47 acl-2010-Beetle II: A System for Tutoring and Computational Linguistics Experimentation
Author: Myroslava O. Dzikovska ; Johanna D. Moore ; Natalie Steinhauser ; Gwendolyn Campbell ; Elaine Farrow ; Charles B. Callaway
Abstract: We present BEETLE II, a tutorial dialogue system designed to accept unrestricted language input and support experimentation with different tutorial planning and dialogue strategies. Our first system evaluation used two different tutorial policies and demonstrated that the system can be successfully used to study the impact of different approaches to tutoring. In the future, the system can also be used to experiment with a variety of natural language interpretation and generation techniques.
3 0.16399591 167 acl-2010-Learning to Adapt to Unknown Users: Referring Expression Generation in Spoken Dialogue Systems
Author: Srinivasan Janarthanam ; Oliver Lemon
Abstract: We present a data-driven approach to learn user-adaptive referring expression generation (REG) policies for spoken dialogue systems. Referring expressions can be difficult to understand in technical domains where users may not know the technical ‘jargon’ names of the domain entities. In such cases, dialogue systems must be able to model the user’s (lexical) domain knowledge and use appropriate referring expressions. We present a reinforcement learning (RL) framework in which the system learns REG policies which can adapt to unknown users online. Furthermore, unlike supervised learning methods which require a large corpus of expert adaptive behaviour to train on, we show that effective adaptive policies can be learned from a small dialogue corpus of non-adaptive human-machine interaction, by using a RL framework and a statistical user simulation. We show that in comparison to adaptive hand-coded baseline policies, the learned policy performs significantly better, with an 18.6% average increase in adaptation accuracy. The best learned policy also takes less dialogue time (average 1.07 min less) than the best hand-coded policy. This is because the learned policies can adapt online to changing evidence about the user’s domain expertise.
4 0.12236305 187 acl-2010-Optimising Information Presentation for Spoken Dialogue Systems
Author: Verena Rieser ; Oliver Lemon ; Xingkun Liu
Abstract: We present a novel approach to Information Presentation (IP) in Spoken Dialogue Systems (SDS) using a data-driven statistical optimisation framework for content planning and attribute selection. First we collect data in a Wizard-of-Oz (WoZ) experiment and use it to build a supervised model of human behaviour. This forms a baseline for measuring the performance of optimised policies, developed from this data using Reinforcement Learning (RL) methods. We show that the optimised policies significantly outperform the baselines in a variety of generation scenarios: while the supervised model is able to attain up to 87.6% of the possible reward on this task, the RL policies are significantly better in 5 out of 6 scenarios, gaining up to 91.5% of the total possible reward. The RL policies perform especially well in more complex scenarios. We are also the first to show that adding predictive “lower level” features (e.g. from the NLG realiser) is important for optimising IP strategies according to user preferences. This provides new insights into the nature of the IP problem for SDS.
5 0.12221096 239 acl-2010-Towards Relational POMDPs for Adaptive Dialogue Management
Author: Pierre Lison
Abstract: Open-ended spoken interactions are typically characterised by both structural complexity and high levels of uncertainty, making dialogue management in such settings a particularly challenging problem. Traditional approaches have focused on providing theoretical accounts for either the uncertainty or the complexity of spoken dialogue, but rarely considered the two issues simultaneously. This paper describes ongoing work on a new approach to dialogue management which attempts to fill this gap. We represent the interaction as a Partially Observable Markov Decision Process (POMDP) over a rich state space incorporating both dialogue, user, and environment models. The tractability of the resulting POMDP can be preserved using a mechanism for dynamically constraining the action space based on prior knowledge over locally relevant dialogue structures. These constraints are encoded in a small set of general rules expressed as a Markov Logic network. The first-order expressivity of Markov Logic enables us to leverage the rich relational structure of the problem and efficiently abstract over large regions of the state and action spaces.
6 0.12009714 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering
7 0.10311456 142 acl-2010-Importance-Driven Turn-Bidding for Spoken Dialogue Systems
8 0.093610518 178 acl-2010-Non-Cooperation in Dialogue
9 0.089502126 2 acl-2010-"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives
10 0.087543711 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities
11 0.085925877 82 acl-2010-Demonstration of a Prototype for a Conversational Companion for Reminiscing about Images
12 0.082264625 194 acl-2010-Phrase-Based Statistical Language Generation Using Graphical Models and Active Learning
13 0.071447022 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices
14 0.068902686 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future
15 0.0676184 190 acl-2010-P10-5005 k2opt.pdf
16 0.066197976 58 acl-2010-Classification of Feedback Expressions in Multimodal Data
17 0.060779039 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
18 0.059348702 31 acl-2010-Annotation
19 0.058671545 179 acl-2010-Now, Where Was I? Resumption Strategies for an In-Vehicle Dialogue System
20 0.058387477 189 acl-2010-Optimizing Question Answering Accuracy by Maximizing Log-Likelihood
topicId topicWeight
[(0, -0.15), (1, 0.095), (2, -0.082), (3, -0.252), (4, -0.041), (5, -0.283), (6, -0.219), (7, 0.079), (8, -0.113), (9, -0.002), (10, 0.13), (11, -0.066), (12, -0.049), (13, 0.013), (14, 0.014), (15, 0.052), (16, -0.243), (17, -0.253), (18, 0.099), (19, -0.26), (20, -0.118), (21, 0.361), (22, -0.101), (23, -0.19), (24, 0.031), (25, -0.116), (26, -0.082), (27, -0.083), (28, 0.032), (29, 0.062), (30, 0.05), (31, -0.013), (32, 0.081), (33, -0.115), (34, 0.119), (35, 0.061), (36, 0.092), (37, -0.069), (38, 0.009), (39, -0.088), (40, 0.057), (41, 0.143), (42, 0.067), (43, -0.079), (44, -0.01), (45, 0.061), (46, 0.024), (47, 0.027), (48, 0.03), (49, 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.97622365 227 acl-2010-The Impact of Interpretation Problems on Tutorial Dialogue
Author: Myroslava O. Dzikovska ; Johanna D. Moore ; Natalie Steinhauser ; Gwendolyn Campbell
Abstract: Supporting natural language input may improve learning in intelligent tutoring systems. However, interpretation errors are unavoidable and require an effective recovery policy. We describe an evaluation of an error recovery policy in the BEETLE II tutorial dialogue system and discuss how different types of interpretation problems affect learning gain and user satisfaction. In particular, the problems arising from student use of non-standard terminology appear to have negative consequences. We argue that existing strategies for dealing with terminology problems are insufficient and that improving such strategies is important in future ITS research.
2 0.96947873 47 acl-2010-Beetle II: A System for Tutoring and Computational Linguistics Experimentation
Author: Myroslava O. Dzikovska ; Johanna D. Moore ; Natalie Steinhauser ; Gwendolyn Campbell ; Elaine Farrow ; Charles B. Callaway
Abstract: We present BEETLE II, a tutorial dialogue system designed to accept unrestricted language input and support experimentation with different tutorial planning and dialogue strategies. Our first system evaluation used two different tutorial policies and demonstrated that the system can be successfully used to study the impact of different approaches to tutoring. In the future, the system can also be used to experiment with a variety of natural language interpretation and generation techniques.
3 0.34849674 239 acl-2010-Towards Relational POMDPs for Adaptive Dialogue Management
Author: Pierre Lison
Abstract: Open-ended spoken interactions are typically characterised by both structural complexity and high levels of uncertainty, making dialogue management in such settings a particularly challenging problem. Traditional approaches have focused on providing theoretical accounts for either the uncertainty or the complexity of spoken dialogue, but rarely considered the two issues simultaneously. This paper describes ongoing work on a new approach to dialogue management which attempts to fill this gap. We represent the interaction as a Partially Observable Markov Decision Process (POMDP) over a rich state space incorporating both dialogue, user, and environment models. The tractability of the resulting POMDP can be preserved using a mechanism for dynamically constraining the action space based on prior knowledge over locally relevant dialogue structures. These constraints are encoded in a small set of general rules expressed as a Markov Logic network. The first-order expressivity of Markov Logic enables us to leverage the rich relational structure of the problem and efficiently abstract over large regions of the state and action spaces.
4 0.34200546 179 acl-2010-Now, Where Was I? Resumption Strategies for an In-Vehicle Dialogue System
Author: Jessica Villing
Abstract: In-vehicle dialogue systems often contain more than one application, e.g. a navigation and a telephone application. This means that the user might, for example, interrupt the interaction with the telephone application to ask for directions from the navigation application, and then resume the dialogue with the telephone application. In this paper we present an analysis of interruption and resumption behaviour in human-human in-vehicle dialogues and also propose some implications for resumption strategies in an in-vehicle dialogue system.
5 0.33328596 2 acl-2010-"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives
Author: Marie-Catherine de Marneffe ; Christopher D. Manning ; Christopher Potts
Abstract: Texts and dialogues often express information indirectly. For instance, speakers’ answers to yes/no questions do not always straightforwardly convey a ‘yes’ or ‘no’ answer. The intended reply is clear in some cases (Was it good? It was great!) but uncertain in others (Was it acceptable? It was unprecedented.). In this paper, we present methods for interpreting the answers to questions like these which involve scalar modifiers. We show how to ground scalar modifier meaning based on data collected from the Web. We learn scales between modifiers and infer the extent to which a given answer conveys ‘yes’ or ‘no’. To evaluate the methods, we collected examples of question–answer pairs involving scalar modifiers from CNN transcripts and the Dialog Act corpus and use response distributions from Mechanical Turk workers to assess the degree to which each answer conveys ‘yes’ or ‘no’. Our experimental results closely match the Turkers’ response data, demonstrating that meanings can be learned from Web data and that such meanings can drive pragmatic inference.
6 0.32465065 178 acl-2010-Non-Cooperation in Dialogue
7 0.3087523 167 acl-2010-Learning to Adapt to Unknown Users: Referring Expression Generation in Spoken Dialogue Systems
8 0.3055951 142 acl-2010-Importance-Driven Turn-Bidding for Spoken Dialogue Systems
9 0.29389948 190 acl-2010-P10-5005 k2opt.pdf
10 0.28754464 187 acl-2010-Optimising Information Presentation for Spoken Dialogue Systems
11 0.27949172 82 acl-2010-Demonstration of a Prototype for a Conversational Companion for Reminiscing about Images
12 0.26753029 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering
13 0.24108323 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities
14 0.23782268 58 acl-2010-Classification of Feedback Expressions in Multimodal Data
15 0.22522752 81 acl-2010-Decision Detection Using Hierarchical Graphical Models
16 0.2244065 194 acl-2010-Phrase-Based Statistical Language Generation Using Graphical Models and Active Learning
17 0.21620937 189 acl-2010-Optimizing Question Answering Accuracy by Maximizing Log-Likelihood
18 0.20706603 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study
19 0.20122673 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
20 0.19110405 248 acl-2010-Unsupervised Ontology Induction from Text
topicId topicWeight
[(14, 0.014), (25, 0.046), (32, 0.227), (39, 0.01), (42, 0.041), (44, 0.01), (58, 0.204), (59, 0.061), (73, 0.062), (78, 0.023), (83, 0.09), (84, 0.027), (98, 0.09)]
simIndex simValue paperId paperTitle
same-paper 1 0.78515005 227 acl-2010-The Impact of Interpretation Problems on Tutorial Dialogue
Author: Myroslava O. Dzikovska ; Johanna D. Moore ; Natalie Steinhauser ; Gwendolyn Campbell
Abstract: Supporting natural language input may improve learning in intelligent tutoring systems. However, interpretation errors are unavoidable and require an effective recovery policy. We describe an evaluation of an error recovery policy in the BEETLE II tutorial dialogue system and discuss how different types of interpretation problems affect learning gain and user satisfaction. In particular, the problems arising from student use of non-standard terminology appear to have negative consequences. We argue that existing strategies for dealing with terminology problems are insufficient and that improving such strategies is important in future ITS research.
2 0.77503252 47 acl-2010-Beetle II: A System for Tutoring and Computational Linguistics Experimentation
Author: Myroslava O. Dzikovska ; Johanna D. Moore ; Natalie Steinhauser ; Gwendolyn Campbell ; Elaine Farrow ; Charles B. Callaway
Abstract: We present BEETLE II, a tutorial dialogue system designed to accept unrestricted language input and support experimentation with different tutorial planning and dialogue strategies. Our first system evaluation used two different tutorial policies and demonstrated that the system can be successfully used to study the impact of different approaches to tutoring. In the future, the system can also be used to experiment with a variety of natural language interpretation and generation techniques.
3 0.6421231 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System
Author: Emily M. Bender ; Scott Drellishak ; Antske Fokkens ; Michael Wayne Goodman ; Daniel P. Mills ; Laurie Poulson ; Safiyyah Saleem
Abstract: This demonstration presents the LinGO Grammar Matrix grammar customization system: a repository of distilled linguistic knowledge and a web-based service which elicits a typological description of a language from the user and yields a customized grammar fragment ready for sustained development into a broad-coverage grammar. We describe the implementation of this repository with an emphasis on how the information is made available to users, including in-browser testing capabilities.
4 0.62603378 225 acl-2010-Temporal Information Processing of a New Language: Fast Porting with Minimal Resources
Author: Francisco Costa ; Antonio Branco
Abstract: We describe the semi-automatic adaptation of a TimeML annotated corpus from English to Portuguese, a language for which TimeML annotated data was not available yet. In order to validate this adaptation, we use the obtained data to replicate some results in the literature that used the original English data. The fact that comparable results are obtained indicates that our approach can be used successfully to rapidly create semantically annotated resources for new languages.
5 0.53315687 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation
Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita
Abstract: Lattice decoding in statistical machine translation (SMT) is useful in speech translation and in the translation of German because it can handle input ambiguities such as speech recognition ambiguities and German word segmentation ambiguities. We show that lattice decoding is also useful for handling input variations. Given an input sentence, we build a lattice which represents paraphrases of the input sentence. We call this a paraphrase lattice. Then, we give the paraphrase lattice as an input to the lattice decoder. The decoder selects the best path for decoding. Using these paraphrase lattices as inputs, we obtained significant gains in BLEU scores for IWSLT and Europarl datasets.
6 0.4722591 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews
7 0.46612298 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes
8 0.46422598 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
9 0.46116069 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse
10 0.46018356 214 acl-2010-Sparsity in Dependency Grammar Induction
11 0.45914295 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
12 0.45834965 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People
13 0.45752245 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification
14 0.45694172 158 acl-2010-Latent Variable Models of Selectional Preference
15 0.45670319 1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."
16 0.45628646 39 acl-2010-Automatic Generation of Story Highlights
17 0.45618707 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
18 0.45553917 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
19 0.45519251 71 acl-2010-Convolution Kernel over Packed Parse Forest
20 0.45513368 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries