acl acl2010 acl2010-81 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Trung H. Bui ; Stanley Peters
Abstract: We investigate hierarchical graphical models (HGMs) for automatically detecting decisions in multi-party discussions. Several types of dialogue act (DA) are distinguished on the basis of their roles in formulating decisions. HGMs enable us to model dependencies between observed features of discussions, decision DAs, and subdialogues that result in a decision. For the task of detecting decision regions, an HGM classifier was found to outperform non-hierarchical graphical models and support vector machines, raising the F1-score to 0.80 from 0.55.
Reference: text
sentIndex sentText sentNum sentScore
1 Decision detection using hierarchical graphical models Trung H. Bui, Stanley Peters [sent-1, score-0.25]
2 Abstract We investigate hierarchical graphical models (HGMs) for automatically detecting decisions in multi-party discussions. [sent-3, score-0.271]
3 Several types of dialogue act (DA) are distinguished on the basis of their roles in formulating decisions. [sent-4, score-0.161]
4 HGMs enable us to model dependencies between observed features of discussions, decision DAs, and subdialogues that result in a decision. [sent-5, score-0.282]
5 For the task of detecting decision regions, an HGM classifier was found to outperform non-hierarchical graphical models and support vector machines, raising the F1-score to 0.80 from 0.55. [sent-6, score-0.459]
6 1 Introduction In work environments, people share information and make decisions in multi-party conversations known as meetings. [sent-9, score-0.034]
7 The demand for systems that can automatically process information contained in audio and video recordings of meetings is growing rapidly. [sent-10, score-0.141]
8 We are currently investigating the automatic detection of decision discussions. [sent-13, score-0.299]
9 Our approach involves distinguishing between different dialogue act (DA) types based on their role in the decision-making process. [sent-14, score-0.205]
10 Groups of these decision dialogue acts (DDAs) combine to form a decision region. [sent-16, score-0.218]
11 Previous work (Bui et al., 2009) showed that Directed Graphical Models (DGMs) outperform other machine learning techniques such as Support Vector Machines (SVMs) for detecting individual DDAs. [sent-18, score-0.097]
12 However, the proposed models, which were non-hierarchical, did not significantly improve identification of decision regions. [sent-19, score-0.218]
13 This paper tests whether giving DGMs hierarchical structure (making them HGMs) can improve identification of decision regions. [sent-20, score-0.054]
14 Section 2 discusses related work, and section 3 our data set and annotation scheme for decision discussions. [sent-24, score-0.261]
15 Section 4 summarizes previous decision detection experiments using DGMs. [sent-25, score-0.299]
16 Previous studies (Banerjee et al., 2005) have confirmed that meeting participants consider decisions to be one of the most important meeting outputs, and Whittaker et al. [sent-29, score-0.092]
17 (2006) found that the development of an automatic decision detection component is critical for re-using meeting archives. [sent-30, score-0.345]
18 With the new availability of substantial meeting corpora such as the AMI corpus (McCowan et al. [sent-31, score-0.046]
19 , 2005), recent years have seen an increasing amount of research on decision-making dialogue. [sent-32, score-0.044]
20 This research has tackled issues such as the automatic detection of agreement and disagreement (Galley et al. [sent-33, score-0.111]
21 , 2004), and of the level of involvement of conversational participants (Gatica-Perez et al. [sent-34, score-0.047]
22 Recent work on automatic detection of decisions has been conducted by Hsueh and Moore (2007), Fernández et al. [sent-36, score-0.115]
23 Here, one sub-classifier per DDA class hypothesizes occurrences of that type of DDA and then, based on these hypotheses, a super-classifier determines which regions of dialogue are decision discussions. [sent-45, score-0.555]
24 Results were better than those obtained with Hsueh and Moore's (2007) approach: the F1-score for detecting decision discussions in manual transcripts was 0. [sent-47, score-0.38]
25 DGMs are attractive because they provide a natural framework for modeling sequence and dependencies between variables, including the DDAs. [sent-57, score-0.058]
26 Fernández et al. (2008) obtained much more value from lexical than nonlexical features (and indeed no value at all from prosodic features), but lexical features have limitations. [sent-61, score-0.173]
27 In particular, they can be domain specific, increase the size of the feature space dramatically, and deteriorate more in quality than other features when automatic speech recognition (ASR) is poor. [sent-62, score-0.057]
28 More detail about decision detection using DGMs will be presented in section 4. [sent-63, score-0.299]
29 Beyond decision detection, DGMs are used for labeling and segmenting sequences of observations in many different fields—including bioinformatics, ASR, Natural Language Processing (NLP), and information extraction. [sent-64, score-0.287]
30 Undirected graphical models (UGMs) are also valuable for building probabilistic models for segmenting and labeling sequence data. [sent-72, score-0.232]
31 Conditional random fields (CRFs) were introduced by Lafferty et al. (2001) and outperform maximum entropy Markov models and HMMs. [sent-74, score-0.053]
32 However, the graphical models used in these applications are mainly non-hierarchical, including those in Bui et al. [sent-75, score-0.115]
33 Sutton et al. (2007) proposed a three-level HGM (in the form of a dynamic CRF) for the joint noun phrase chunking and part-of-speech labeling problem; they showed that this model performs better than a non-hierarchical counterpart. [sent-78, score-0.083]
34 3 Data For the experiments reported in this study, we used 17 meetings from the AMI Meeting Corpus, a freely available corpus of multi-party meetings with both audio and video recordings, and a wide range of annotated information including DAs and topic segmentation. [sent-79, score-0.17]
35 The meetings last around 30 minutes each, and are scenario-driven, wherein four participants play different roles in a company’s design team: project manager, marketing expert, interface designer and industrial designer. [sent-80, score-0.059]
36 We use the same annotation scheme as Fernández et al. [sent-81, score-0.043]
37 As stated in section 2, this scheme distinguishes between a small number of DA types based on the role which they perform in the formulation of a decision. [sent-83, score-0.073]
38 Besides improving the detection of decision discussions (Fernández et al. [sent-84, score-0.34]
39 , 2008), such a scheme also aids in summarization of them, because it indicates which utterances provide particular types of information. [sent-85, score-0.172]
40 Hence the scheme distinguishes between three main DDA classes: issue (I), resolution (R), and agreement (A). [sent-87, score-0.22]
41 Class R is further subdivided into resolution proposal (RP) and resolution restatement (RR). [sent-88, score-0.164]
42 I utterances introduce the topic of the decision discussion, examples being “Are we going to have a backup? [sent-89, score-0.347]
43 In comparison, R utterances specify the resolution which is ultimately adopted as the decision.
44 Or we do just– B: But would a backup really be necessary? [sent-95, score-0.087]
45 A: I think maybe we could just go for the kinetic energy and be bold and innovative. [sent-96, score-0.212]
46 Table 1: An excerpt from the AMI dialogue ES2015c. [sent-105, score-0.129]
47 RP utterances propose the resolution, e.g. "I think maybe we could just go for the kinetic energy ...". [sent-110, score-0.212]
48 RR utterances close the discussion by confirming/summarizing the decision. [sent-113, score-0.388]
49 Finally, A utterances agree with the proposed resolution, signaling that it is adopted as the decision. [sent-116, score-0.129]
50 Unsurprisingly, an utterance may be assigned to more than one DDA class; and within a decision discussion, more than one utterance can be assigned to the same DDA class. [sent-119, score-0.434]
51 We use manual transcripts in the experiments described here. [sent-120, score-0.053]
52 Inter-annotator agreement was satisfactory, with kappa values ranging from . [sent-121, score-0.03]
53 The manual transcripts contain a total of 15,680 utterances, and on average 40 DDAs per meeting. [sent-124, score-0.053]
54 Roughly a quarter of all utterances are tagged as decision-related, and on average there are 221 decision-related utterances per meeting. [sent-133, score-0.129]
55 The DDA node is binary valued, with value 1 indicating the presence of a DDA and 0 its absence. [sent-137, score-0.029]
56 These include utterance features (UTT) such as length in words, duration in milliseconds, and position within the meeting (as percentage of elapsed time); manually annotated dialogue act (DA) features such as inform, assess, and suggest; and prosodic features (PROS) such as energy and pitch. [sent-139, score-0.488]
57 These features are the same as the nonlexical features used by Fernández et al. [sent-140, score-0.126]
58 The hidden component node (C) in the -mix models represents the distribution of observable evidence E as a mixture of Gaussian distributions. [sent-142, score-0.104]
59 Figure 1: Simple DGMs for individual decision dialogue act detection (panels a-d: BN-sim, BN-mix, DBN-sim, DBN-mix). [sent-144, score-0.379]
60 More complex models were constructed from the four simple models in Fig. 1. [sent-146, score-0.048]
61 Figure 2: A DGM that takes the dependencies between decision dialogue acts into account. [sent-151, score-0.425]
62 Decision discussion regions were identified using the DGM output and the following two simple rules: (1) A decision discussion region begins with an Issue DDA; (2) A decision discussion region contains at least one Issue DDA and one Resolution DDA. [sent-152, score-0.854]
63 We plan experiments to determine how much using ASR output degrades detection of decision regions. [sent-155, score-0.327]
64 The authors conducted experiments using the AMI corpus and found that when using nonlexical features, the DGMs outperform the hierarchical SVM classification method of Fernández et al. [sent-157, score-0.173]
65 5 Hierarchical graphical models Although the results just discussed showed graphical models are better than SVMs for detecting decision dialogue acts (Bui et al. [sent-165, score-0.689]
66 , 2009), two-level graphical models like those shown in Figs. [sent-166, score-0.115]
67 1 and 2 cannot exploit dependencies between high-level discourse items such as decision discussions and DDAs; and the "superclassifier" rule (Bui et al. [sent-167, score-0.378]
68 , 2009) used for detecting decision regions did not significantly improve the F1-score for decisions. [sent-168, score-0.459]
69 We thus investigate whether HGMs (structured as three or more levels) are superior for discovering the structure and learning the parameters of decision recognition. [sent-169, score-0.218]
70 Our approach composes graphical models into deeper hierarchies by adding a level above or below the existing ones, or by inserting a new level (e.g., for discourse topics) into the interior of a given model. [sent-170, score-0.22]
71 The top level corresponds to high-level discourse regions such as decision discussions. [sent-173, score-0.472]
72 The middle level corresponds to mid-level discourse items such as issues, resolution proposals, resolution restatements, and agreements. [sent-175, score-0.306]
73 These items (the C1, ..., Cn nodes) are represented as a collection of random variables, each corresponding to an individual mid-level utterance class. [sent-179, score-0.108]
74 For example, the middle level of the three-level HGM Fig. [sent-180, score-0.057]
75 2, each middle level node containing random variables for the DDA classes I, RP, RR, and A. [sent-182, score-0.114]
76 The bottom level corresponds to vectors of observed features as before, e. [sent-183, score-0.054]
77 Figure 3: A simple structure of a three-level HGM: DRs are high-level discourse regions; C1, C2, . [sent-186, score-0.273]
78 , Cn are mid-level utterance classes; and Es are vectors of observed features. [sent-189, score-0.108]
79 The classifier hypothesizes that an utterance belongs to a decision region if the marginal probability of the utterance’s DR node is above a hand-tuned threshold. [sent-191, score-0.48]
80 To evaluate the accuracy of hypothesized decision regions, we divided the dialogue into 30-second windows and evaluated on a per window basis. [sent-193, score-0.347]
81 The HGM in Fig. 4b explicitly models the dependency between the decision regions and the observed features. [sent-199, score-0.415]
82 Table 2 shows the results of 17-fold cross-validation for the hierarchical SVM classification (Fernández et al. [sent-202, score-0.054]
83 The features were selected based on the best combination of non-lexical features for each method. [sent-216, score-0.06]
84 Table 2: Detection of decision discussion regions by the SVM super-classifier, rule-based DGM classifier, and HGM classifier, each using its best combination of non-lexical features: SVM (UTT+DA), DGM (UTT+DA+PROS), HGM (UTT+DA). [sent-223, score-0.173]
85 In contrast with the hierarchical SVM and rule-based DGM methods, the HGM method identifies decision-related utterances by exploiting not just DDAs but also direct dependencies between decision regions and UTT, DA, and PROS features. [sent-224, score-0.608]
86 As mentioned in the second paragraph of this section, explicitly modeling the dependency between decision regions and observable features helps to improve detection of decision regions. [sent-225, score-0.747]
87 Furthermore, a three-level HGM can straightforwardly model the composition of each high-level decision region as a sequence of mid-level DDA utterances. [sent-226, score-0.303]
88 While the hierarchical SVM method can also take dependency between successive utterances into account, it has no principled way to associate this dependency with more extended decision regions. [sent-227, score-0.401]
89 In addition, this dependency is only meaningful for lexical features (Fernández et al. [sent-228, score-0.03]
90 7 Conclusions and Future Work To detect decision discussions in multi-party dialogue, we investigated HGMs as an extension of the DGMs of our earlier work (Bui et al. [sent-239, score-0.259]
91 , 2009), and compared them with the hierarchical SVM classification method of Fernández et al. (We used the paired t-test for computing statistical significance.) [sent-246, score-0.054]
92 The F1-score for identifying decision discussion regions increased to 0.80. [sent-248, score-0.432]
93 In future work, we plan to (a) improve the detection of DDAs further by using detected decision regions and (b) extend HGMs beyond three levels in order to integrate useful semantic information such as topic structure. [sent-254, score-0.472]
94 The necessity of a meeting recording and playback system, and the benefit of topic-level annotations to meeting browsing. [sent-259, score-0.092]
95 Extracting decisions from multi-party dialogue using directed graphical models and semantic similarity. [sent-271, score-0.278]
96 Unsupervised classification of dialogue acts using a Dirichlet process mixture model. [sent-275, score-0.173]
97 Identifying agreement and disagreement in conversational speech: Use of Bayesian networks to model pragmatic dependencies. [sent-287, score-0.053]
98 Conditional random fields: Probabilistic models for segmenting and labeling sequence data. [sent-309, score-0.117]
99 Dynamic conditional random fields: Factorized probabilistic models for labeling and segmenting sequence data. [sent-337, score-0.117]
100 Analysing meeting records: An ethnographic study and technological implications. [sent-341, score-0.046]
wordName wordTfidf (topN-words)
[('hgm', 0.35), ('dda', 0.328), ('bui', 0.306), ('dgms', 0.262), ('fern', 0.249), ('decision', 0.218), ('ddas', 0.197), ('dgm', 0.175), ('hgms', 0.175), ('regions', 0.173), ('ami', 0.153), ('ndez', 0.14), ('utt', 0.131), ('utterances', 0.129), ('dialogue', 0.129), ('da', 0.118), ('andez', 0.109), ('utterance', 0.108), ('graphical', 0.091), ('backup', 0.087), ('kinetic', 0.087), ('resolution', 0.082), ('detection', 0.081), ('detecting', 0.068), ('energy', 0.066), ('hsueh', 0.066), ('mccowan', 0.066), ('nonlexical', 0.066), ('purver', 0.066), ('svm', 0.065), ('region', 0.061), ('meetings', 0.059), ('dbns', 0.057), ('discourse', 0.057), ('stanley', 0.055), ('asr', 0.055), ('rr', 0.055), ('hierarchical', 0.054), ('transcripts', 0.053), ('okay', 0.053), ('rp', 0.051), ('sigdial', 0.047), ('prosodic', 0.047), ('pros', 0.047), ('meeting', 0.046), ('acts', 0.044), ('crook', 0.044), ('decisionmaking', 0.044), ('dowding', 0.044), ('ehlen', 0.044), ('trung', 0.044), ('scheme', 0.043), ('discussions', 0.041), ('discussion', 0.041), ('segmenting', 0.04), ('sutton', 0.039), ('frampton', 0.038), ('mlmi', 0.038), ('yeah', 0.038), ('hypothesizes', 0.035), ('gael', 0.035), ('janin', 0.035), ('issue', 0.035), ('decisions', 0.034), ('dependencies', 0.034), ('matthew', 0.034), ('das', 0.033), ('whittaker', 0.033), ('middle', 0.033), ('act', 0.032), ('elizabeth', 0.031), ('maybe', 0.031), ('distinguishes', 0.03), ('agreement', 0.03), ('recordings', 0.03), ('features', 0.03), ('outperform', 0.029), ('classifier', 0.029), ('labeling', 0.029), ('node', 0.029), ('items', 0.028), ('degrades', 0.028), ('csli', 0.028), ('video', 0.028), ('think', 0.028), ('classes', 0.028), ('fields', 0.028), ('stanford', 0.027), ('observable', 0.027), ('banerjee', 0.027), ('naval', 0.027), ('recognition', 0.027), ('dr', 0.026), ('authors', 0.024), ('level', 0.024), ('sequence', 0.024), ('models', 0.024), ('audio', 0.024), ('hidden', 0.024), ('conversational', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 81 acl-2010-Decision Detection Using Hierarchical Graphical Models
Author: Trung H. Bui ; Stanley Peters
Abstract: We investigate hierarchical graphical models (HGMs) for automatically detecting decisions in multi-party discussions. Several types of dialogue act (DA) are distinguished on the basis of their roles in formulating decisions. HGMs enable us to model dependencies between observed features of discussions, decision DAs, and subdialogues that result in a decision. For the task of detecting decision regions, an HGM classifier was found to outperform non-hierarchical graphical models and support vector machines, raising the F1-score to 0.80 from 0.55.
2 0.1051247 239 acl-2010-Towards Relational POMDPs for Adaptive Dialogue Management
Author: Pierre Lison
Abstract: Open-ended spoken interactions are typically characterised by both structural complexity and high levels of uncertainty, making dialogue management in such settings a particularly challenging problem. Traditional approaches have focused on providing theoretical accounts for either the uncertainty or the complexity of spoken dialogue, but rarely considered the two issues simultaneously. This paper describes ongoing work on a new approach to dialogue management which attempts to fill this gap. We represent the interaction as a Partially Observable Markov Decision Process (POMDP) over a rich state space incorporating both dialogue, user, and environment models. The tractability of the resulting POMDP can be preserved using a mechanism for dynamically constraining the action space based on prior knowledge over locally relevant dialogue structures. These constraints are encoded in a small set of general rules expressed as a Markov Logic network. The first-order expressivity of Markov Logic enables us to leverage the rich relational structure of the problem and efficiently abstract over large regions ofthe state and action spaces.
3 0.094969705 194 acl-2010-Phrase-Based Statistical Language Generation Using Graphical Models and Active Learning
Author: Francois Mairesse ; Milica Gasic ; Filip Jurcicek ; Simon Keizer ; Blaise Thomson ; Kai Yu ; Steve Young
Abstract: Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents BAGEL, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. A human evaluation shows that BAGEL can generate natural and informative utterances from unseen inputs in the information presentation domain. Additionally, generation perfor- mance on sparse datasets is improved significantly by using certainty-based active learning, yielding ratings close to the human gold standard with a fraction of the data.
4 0.078276053 58 acl-2010-Classification of Feedback Expressions in Multimodal Data
Author: Costanza Navarretta ; Patrizia Paggio
Abstract: This paper addresses the issue of how linguistic feedback expressions, prosody and head gestures, i.e. head movements and face expressions, relate to one another in a collection of eight video-recorded Danish map-task dialogues. The study shows that in these data, prosodic features and head gestures significantly improve automatic classification of dialogue act labels for linguistic expressions of feedback.
5 0.074592836 142 acl-2010-Importance-Driven Turn-Bidding for Spoken Dialogue Systems
Author: Ethan Selfridge ; Peter Heeman
Abstract: Current turn-taking approaches for spoken dialogue systems rely on the speaker releasing the turn before the other can take it. This reliance results in restricted interactions that can lead to inefficient dialogues. In this paper we present a model we refer to as Importance-Driven Turn-Bidding that treats turn-taking as a negotiative process. Each conversant bids for the turn based on the importance of the intended utterance, and Reinforcement Learning is used to indirectly learn this parameter. We find that Importance-Driven Turn-Bidding performs better than two current turntaking approaches in an artificial collaborative slot-filling domain. The negotiative nature of this model creates efficient dia- logues, and supports the improvement of mixed-initiative interaction.
6 0.065268688 178 acl-2010-Non-Cooperation in Dialogue
7 0.062981173 187 acl-2010-Optimising Information Presentation for Spoken Dialogue Systems
8 0.062266987 149 acl-2010-Incorporating Extra-Linguistic Information into Reference Resolution in Collaborative Task Dialogue
9 0.062074911 82 acl-2010-Demonstration of a Prototype for a Conversational Companion for Reminiscing about Images
10 0.059820224 167 acl-2010-Learning to Adapt to Unknown Users: Referring Expression Generation in Spoken Dialogue Systems
11 0.056395825 47 acl-2010-Beetle II: A System for Tutoring and Computational Linguistics Experimentation
12 0.055195879 227 acl-2010-The Impact of Interpretation Problems on Tutorial Dialogue
13 0.053046957 14 acl-2010-A Risk Minimization Framework for Extractive Speech Summarization
14 0.050588619 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference
15 0.048423003 179 acl-2010-Now, Where Was I? Resumption Strategies for an In-Vehicle Dialogue System
16 0.048131291 86 acl-2010-Discourse Structure: Theory, Practice and Use
17 0.047870032 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information
18 0.043952674 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data
19 0.042805184 4 acl-2010-A Cognitive Cost Model of Annotations Based on Eye-Tracking Data
20 0.042329371 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
topicId topicWeight
[(0, -0.126), (1, 0.053), (2, -0.027), (3, -0.11), (4, -0.047), (5, -0.075), (6, -0.083), (7, 0.033), (8, -0.009), (9, 0.012), (10, -0.002), (11, -0.037), (12, 0.033), (13, 0.012), (14, 0.015), (15, -0.044), (16, -0.001), (17, -0.002), (18, 0.021), (19, -0.042), (20, -0.001), (21, 0.02), (22, -0.016), (23, 0.007), (24, -0.019), (25, 0.014), (26, 0.038), (27, 0.015), (28, 0.013), (29, 0.004), (30, -0.009), (31, 0.155), (32, 0.037), (33, 0.028), (34, 0.002), (35, -0.026), (36, -0.082), (37, 0.007), (38, -0.033), (39, 0.121), (40, 0.025), (41, -0.073), (42, -0.117), (43, -0.028), (44, 0.079), (45, 0.034), (46, -0.033), (47, -0.048), (48, -0.055), (49, -0.135)]
simIndex simValue paperId paperTitle
same-paper 1 0.91636956 81 acl-2010-Decision Detection Using Hierarchical Graphical Models
Author: Trung H. Bui ; Stanley Peters
Abstract: We investigate hierarchical graphical models (HGMs) for automatically detecting decisions in multi-party discussions. Several types of dialogue act (DA) are distinguished on the basis of their roles in formulating decisions. HGMs enable us to model dependencies between observed features of discussions, decision DAs, and subdialogues that result in a decision. For the task of detecting decision regions, an HGM classifier was found to outperform non-hierarchical graphical models and support vector machines, raising the F1-score to 0.80 from 0.55.
2 0.76433754 178 acl-2010-Non-Cooperation in Dialogue
Author: Brian Pluss
Abstract: This paper presents ongoing research on computational models for non-cooperative dialogue. We start by analysing different levels of cooperation in conversation. Then, inspired by findings from an empirical study, we propose a technique for measuring non-cooperation in political interviews. Finally, we describe a research programme towards obtaining a suitable model and discuss previous accounts for conflictive dialogue, identifying the differences with our work.
3 0.69771075 194 acl-2010-Phrase-Based Statistical Language Generation Using Graphical Models and Active Learning
Author: Francois Mairesse ; Milica Gasic ; Filip Jurcicek ; Simon Keizer ; Blaise Thomson ; Kai Yu ; Steve Young
Abstract: Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents BAGEL, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. A human evaluation shows that BAGEL can generate natural and informative utterances from unseen inputs in the information presentation domain. Additionally, generation perfor- mance on sparse datasets is improved significantly by using certainty-based active learning, yielding ratings close to the human gold standard with a fraction of the data.
4 0.6777724 239 acl-2010-Towards Relational POMDPs for Adaptive Dialogue Management
Author: Pierre Lison
Abstract: Open-ended spoken interactions are typically characterised by both structural complexity and high levels of uncertainty, making dialogue management in such settings a particularly challenging problem. Traditional approaches have focused on providing theoretical accounts for either the uncertainty or the complexity of spoken dialogue, but rarely considered the two issues simultaneously. This paper describes ongoing work on a new approach to dialogue management which attempts to fill this gap. We represent the interaction as a Partially Observable Markov Decision Process (POMDP) over a rich state space incorporating both dialogue, user, and environment models. The tractability of the resulting POMDP can be preserved using a mechanism for dynamically constraining the action space based on prior knowledge over locally relevant dialogue structures. These constraints are encoded in a small set of general rules expressed as a Markov Logic network. The first-order expressivity of Markov Logic enables us to leverage the rich relational structure of the problem and efficiently abstract over large regions ofthe state and action spaces.
5 0.6621688 58 acl-2010-Classification of Feedback Expressions in Multimodal Data
Author: Costanza Navarretta ; Patrizia Paggio
Abstract: This paper addresses the issue of how linguistic feedback expressions, prosody and head gestures, i.e. head movements and face expressions, relate to one another in a collection of eight video-recorded Danish map-task dialogues. The study shows that in these data, prosodic features and head gestures significantly improve automatic classification of dialogue act labels for linguistic expressions of feedback.
6 0.62419671 179 acl-2010-Now, Where Was I? Resumption Strategies for an In-Vehicle Dialogue System
7 0.5511238 142 acl-2010-Importance-Driven Turn-Bidding for Spoken Dialogue Systems
8 0.52835524 173 acl-2010-Modeling Norms of Turn-Taking in Multi-Party Conversation
9 0.52684563 256 acl-2010-Vocabulary Choice as an Indicator of Perspective
10 0.50522429 82 acl-2010-Demonstration of a Prototype for a Conversational Companion for Reminiscing about Images
11 0.43872225 187 acl-2010-Optimising Information Presentation for Spoken Dialogue Systems
12 0.41367495 246 acl-2010-Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure
13 0.4047704 112 acl-2010-Extracting Social Networks from Literary Fiction
14 0.38869905 74 acl-2010-Correcting Errors in Speech Recognition with Articulatory Dynamics
15 0.38105872 137 acl-2010-How Spoken Language Corpora Can Refine Current Speech Motor Training Methodologies
16 0.35918388 167 acl-2010-Learning to Adapt to Unknown Users: Referring Expression Generation in Spoken Dialogue Systems
17 0.35663569 61 acl-2010-Combining Data and Mathematical Models of Language Change
19 0.32387909 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
20 0.31098309 40 acl-2010-Automatic Sanskrit Segmentizer Using Finite State Transducers
topicId topicWeight
[(3, 0.364), (14, 0.045), (25, 0.061), (35, 0.018), (42, 0.042), (59, 0.055), (71, 0.011), (73, 0.043), (78, 0.026), (80, 0.011), (83, 0.132), (84, 0.017), (98, 0.07)]
simIndex simValue paperId paperTitle
same-paper 1 0.74524546 81 acl-2010-Decision Detection Using Hierarchical Graphical Models
Author: Trung H. Bui ; Stanley Peters
Abstract: We investigate hierarchical graphical models (HGMs) for automatically detecting decisions in multi-party discussions. Several types of dialogue act (DA) are distinguished on the basis of their roles in formulating decisions. HGMs enable us to model dependencies between observed features of discussions, decision DAs, and subdialogues that result in a decision. For the task of detecting decision regions, an HGM classifier was found to outperform non-hierarchical graphical models and support vector machines, raising the F1-score to 0.80 from 0.55.
2 0.60919017 264 acl-2010-Wrapping up a Summary: From Representation to Generation
Author: Josef Steinberger ; Marco Turchi ; Mijail Kabadjov ; Ralf Steinberger ; Nello Cristianini
Abstract: The main focus of this work is to investigate robust ways for generating summaries from summary representations without recurring to simple sentence extraction and aiming at more human-like summaries. This is motivated by empirical evidence from TAC 2009 data showing that human summaries contain on average more and shorter sentences than the system summaries. We report encouraging preliminary results comparable to those attained by participating systems at TAC 2009.
3 0.46264905 133 acl-2010-Hierarchical Search for Word Alignment
Author: Jason Riesa ; Daniel Marcu
Abstract: We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of features, and trained on a relatively small amount of annotated data. We report results on Arabic-English word alignment and translation tasks. Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.
4 0.43476561 1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."
Author: Mark Sammons ; V.G.Vinod Vydiswaran ; Dan Roth
Abstract: We challenge the NLP community to participate in a large-scale, distributed effort to design and build resources for developing and evaluating solutions to new and existing NLP tasks in the context of Recognizing Textual Entailment. We argue that the single global label with which RTE examples are annotated is insufficient to effectively evaluate RTE system performance; to promote research on smaller, related NLP tasks, we believe more detailed annotation and evaluation are needed, and that this effort will benefit not just RTE researchers, but the NLP community as a whole. We use insights from successful RTE systems to propose a model for identifying and annotating textual infer- ence phenomena in textual entailment examples, and we present the results of a pilot annotation study that show this model is feasible and the results immediately useful.
5 0.42774647 73 acl-2010-Coreference Resolution with Reconcile
Author: Veselin Stoyanov ; Claire Cardie ; Nathan Gilbert ; Ellen Riloff ; David Buttler ; David Hysom
Abstract: Despite the existence of several noun phrase coreference resolution data sets as well as several formal evaluations on the task, it remains frustratingly difficult to compare results across different coreference resolution systems. This is due to the high cost of implementing a complete end-to-end coreference resolution system, which often forces researchers to substitute available gold-standard information in lieu of implementing a module that would compute that information. Unfortunately, this leads to inconsistent and often unrealistic evaluation scenarios. With the aim to facilitate consistent and realistic experimental evaluations in coreference resolution, we present Reconcile, an infrastructure for the development of learning-based noun phrase (NP) coreference resolution systems. Reconcile is designed to facilitate the rapid creation of coreference resolution systems, easy implementation of new feature sets and approaches to coreference res- olution, and empirical evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile and present experimental results showing that Reconcile can be used to create a coreference resolver that achieves performance comparable to state-ofthe-art systems on six benchmark data sets.
6 0.4235712 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
7 0.42259282 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes
8 0.42028657 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference
9 0.41797423 112 acl-2010-Extracting Social Networks from Literary Fiction
10 0.41638619 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years
11 0.41590756 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection
12 0.41567412 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
13 0.41489306 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People
14 0.41209671 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features
15 0.41092348 71 acl-2010-Convolution Kernel over Packed Parse Forest
16 0.41063735 72 acl-2010-Coreference Resolution across Corpora: Languages, Coding Schemes, and Preprocessing Information
17 0.41052544 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse
18 0.41044843 38 acl-2010-Automatic Evaluation of Linguistic Quality in Multi-Document Summarization
19 0.40959939 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews
20 0.40849853 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information